A sunburst diagram visualizes how a quantity is distributed across the levels of a hierarchy. It represents the distribution as a series of rings around a central circle. The central circle represents the total quantity of a particular attribute, and each ring around it shows how that quantity is split at the next level of the hierarchy, with every segment placed directly outside its parent segment in the ring inside it.
The sunburst chart is also sometimes referred to as a multi-level pie chart, ring chart, donut (doughnut) chart, or radial treemap.
A common example to explain the usage of a sunburst chart is the population distribution of the world: the central circle represents the total world population, the ring around it represents the distribution per continent, the next ring represents the distribution per country of each continent, and a further ring can be used for the distribution per state of each country.
It can be used to display any kind of hierarchical or multi-level data.
The sunburst chart is very similar to the treemap chart, with the only difference being that the data is laid out radially. If you are interested in learning treemap plotting using Python, feel free to go through our tutorial on treemaps, which explains various ways to draw a treemap in Python.
As a part of this tutorial, we explain how to create sunburst charts in Python using the interactive data visualization library "plotly". The tutorial covers different ways of creating the charts using the Plotly Python API with simple and easy-to-understand examples.
We'll start by importing necessary libraries.
import pandas as pd
import numpy as np
pd.set_option("display.max_columns", 30)  # full option name; the short "max_columns" is ambiguous in recent pandas versions
import plotly.express as px
import plotly.graph_objects as go
We'll also be using three datasets available from Kaggle to provide data for analysis and plotting.
We suggest that you download all the datasets to follow along with us through the tutorial.
starbucks_locations = pd.read_csv("datasets/starbucks_store_locations.csv")
starbucks_locations.head()
world_countries_data = pd.read_csv("datasets/countries of the world.csv")
world_countries_data["World"] = "World"
world_countries_data.head()
indian_district_population = pd.read_csv("datasets/indian-census-data-with-geospatial-indexing/district wise population for year 2001 and 2011.csv")
indian_district_population["Country"] = "India"
indian_district_population.head()
There are two ways to generate a sunburst chart using plotly: the sunburst() method of the plotly.express module and the Sunburst() method of the plotly.graph_objects module.
We'll be explaining both ways one by one below.
Plotly has a module named express which provides an easy-to-use method named sunburst() for creating sunburst charts. It accepts a dataframe containing the data, the columns to use for the hierarchy, and the column to use for the actual values of the distribution. We pass the columns that form the hierarchy, ordered from the root to the leaves, as a list to the path parameter of the method. The column whose values decide the sizes of the sectors is passed by name to the values parameter. We can also provide the title, width, and height parameters of the figure. The sunburst() method returns a figure object which displays the chart when we call the show() method on it.
We first need to prepare the dataset in order to show the distribution of Starbucks store counts per country, state/province, and city worldwide. We group the original Starbucks dataset by Country, State/Province, and City, and then call count() on it, which counts the entries for each combination of these columns that occurs in the data. We also introduce a new column named World whose value is the string "World" in every row. This column creates the circle in the center that shows the total worldwide count.
starbucks_dist = starbucks_locations.groupby(by=["Country", "State/Province", "City"]).count()[["Store Number"]].rename(columns={"Store Number":"Count"})
starbucks_dist["World"] = "World"
starbucks_dist = starbucks_dist.reset_index()
starbucks_dist.head()
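The grouping step above can be tried out without downloading anything. The sketch below applies the same idea to a few made-up store rows; the dataframe toy_stores and its contents are hypothetical and only mirror the column names of the Starbucks dataset.

```python
import pandas as pd

# Hypothetical store rows that mimic the columns used above.
toy_stores = pd.DataFrame({
    "Store Number": ["S1", "S2", "S3", "S4", "S5"],
    "Country": ["US", "US", "US", "CA", "CA"],
    "State/Province": ["Washington", "Washington", "California", "Ontario", "Ontario"],
    "City": ["Seattle", "Seattle", "San Diego", "Toronto", "Ottawa"],
})

# Count the stores for every (Country, State/Province, City) combination present.
toy_counts = (
    toy_stores.groupby(["Country", "State/Province", "City"])["Store Number"]
    .count()
    .rename("Count")
    .reset_index()
)

# Constant column that becomes the root (center circle) of the chart.
toy_counts["World"] = "World"
print(toy_counts)
```

Each row of toy_counts is one leaf of the hierarchy, which is the shape that the path parameter of sunburst() expects.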
fig = px.sunburst(starbucks_dist,
path=["World", "Country", "State/Province", "City"],
values='Count',
title="Starbucks Store Count Distribution World Wide [Country, State, City]",
width=750, height=750)
fig.show()
Below we are creating a sunburst chart depicting the population distribution per district of India in 2011. We have passed the list of columns that form the hierarchy to the path parameter. We have covered this in our tutorial on treemaps as well.
fig = px.sunburst(indian_district_population,
path=["Country", "State", "District",],
values='Population in 2011',
width=750, height=750,
title="Indian District Population Per State",
)
fig.show()
Below we have created a sunburst chart showing population count per country per region of the world. We have provided necessary columns having a hierarchical relationship to the path parameter of the method.
fig = px.sunburst(world_countries_data,
path=["World", "Region", "Country"],
values='Population',
width=750, height=750,
title="World Population Per Country Per Region",
)
fig.show()
The sunburst chart below shows the area distribution per country per region worldwide.
fig = px.sunburst(world_countries_data,
path=["World", "Region", "Country"],
values='Area (sq. mi.)',
width=750, height=750,
title="World Area Per Country Per Region",
)
fig.show()
Below we have again plotted a sunburst chart showing the population distribution per country per region, but this time we have also color-encoded each sector according to the GDP per capita of that country/region.
We can compare the population and GDP per capita of each country based on this sunburst chart. We can notice that countries like India and China have a low GDP per capita despite their large populations, whereas countries like the US, Japan, Germany, UK, France, Australia, and Hong Kong have smaller populations but a higher GDP per capita.
fig = px.sunburst(world_countries_data,
path=["World", "Region", "Country"],
values='Population',
width=750, height=750,
color_continuous_scale="BrBG",
color='GDP ($ per capita)',
title="World Population Per Country Per Region Color-Encoded By GDP"
)
fig.show()
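A note on how the colors of the inner rings arise: according to the Plotly Express documentation, when path and a numeric color column are given, the color value of a parent sector is the average of the color values of its children, weighted by their values. The sketch below reproduces that arithmetic by hand on made-up numbers. It illustrates the weighting only and is not Plotly's internal code; the dataframe toy_gdp is hypothetical.

```python
import pandas as pd

# Hypothetical countries with a size column (Population) and a color column (GDP per capita).
toy_gdp = pd.DataFrame({
    "Region": ["A", "A", "B"],
    "Country": ["X", "Y", "Z"],
    "Population": [100, 300, 200],
    "GDP ($ per capita)": [10.0, 30.0, 50.0],
})

# Weighted average per region: sum(population * gdp) / sum(population).
weighted_sums = (
    toy_gdp.assign(weighted=toy_gdp["Population"] * toy_gdp["GDP ($ per capita)"])
    .groupby("Region")[["weighted", "Population"]]
    .sum()
)
region_color = weighted_sums["weighted"] / weighted_sums["Population"]
print(region_color)
```

Here region A gets (100 * 10 + 300 * 30) / 400 = 25, so the more populous country Y pulls the region's color toward its own value.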
Below we have again plotted the population distribution per country per region of the world, but this time we have color-encoded the sectors according to the area of the countries and regions. This helps us compare the relationship between population and area.
We can notice that countries like India have a larger population but less area compared to countries like Russia, the United States, and Brazil, which have visibly more area with a smaller population.
fig = px.sunburst(world_countries_data,
path=["World", "Region", "Country"],
values='Population',
width=750, height=750,
color_continuous_scale="RdYlGn",
color='Area (sq. mi.)',
title="World Population Per Country Per Region Color-Encoded By Area"
)
fig.show()
The second way of creating a sunburst chart using plotly is the Sunburst() method of the graph_objects module. We need to provide it with the label of every node in the hierarchy, the label of each node's parent, and each node's value in order to create a chart using this method.
In order to create a sunburst chart using the graph_objects.Sunburst() method, we have done a little preprocessing of the data. The Sunburst() method expects that we provide all parent-child relationship labels and their values to it. We have the region-country labels and values ready in the dataset, but to get the world-region labels and values we have grouped the dataframe by region in order to get region-wise population counts. We have then concatenated the labels, parents, and values of all three levels (world, regions, countries) into three lists of equal length.
# Select the numeric column before summing so that text columns are not aggregated.
region_wise_pop = world_countries_data.groupby(by="Region")[["Population"]].sum().reset_index()
parents = [""] + ["World"] * region_wise_pop.shape[0] + world_countries_data["Region"].values.tolist()
labels = ["World"] + region_wise_pop["Region"].values.tolist() + world_countries_data["Country"].values.tolist()
values = [world_countries_data["Population"].sum()] + region_wise_pop["Population"].values.tolist() + world_countries_data["Population"].values.tolist()
fig = go.Figure(go.Sunburst(
parents=parents,
labels= labels,
values= values,
))
fig.update_layout(title="World Population Per Country Per Region",
width=700, height=700)
fig.show()
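Below we have again created a sunburst chart of the population distribution, but this time it looks like the charts produced by the plotly.express module. We have set the branchvalues parameter to the string "total", which tells Plotly that each parent's value already includes the values of its children, so the children fill the parent's whole arc. By default (branchvalues="remainder"), the Sunburst() method treats a parent's value as an extra amount on top of its children, so the children cover only part of the parent's arc and the rings do not form a full circle.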
# Select the numeric column before summing so that text columns are not aggregated.
region_wise_pop = world_countries_data.groupby(by="Region")[["Population"]].sum().reset_index()
parents = [""] + ["World"] * region_wise_pop.shape[0] + world_countries_data["Region"].values.tolist()
labels = ["World"] + region_wise_pop["Region"].values.tolist() + world_countries_data["Country"].values.tolist()
values = [world_countries_data["Population"].sum()] + region_wise_pop["Population"].values.tolist() + world_countries_data["Population"].values.tolist()
fig = go.Figure(go.Sunburst(
parents=parents,
labels= labels,
values= values,
branchvalues="total",
))
fig.update_layout(title="World Population Per Country Per Region",
width=700, height=700)
fig.show()
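With branchvalues="total", each parent's value is expected to be at least the sum of its children's values; to our knowledge, a parent that is too small can leave the chart empty. The helper below is a hypothetical sanity check (find_undersized_parents is not part of Plotly) that can be run on the labels, parents, and values lists before plotting. It assumes the labels are unique.

```python
def find_undersized_parents(labels, parents, values):
    """Return labels whose value is smaller than the sum of their direct children."""
    value_of = dict(zip(labels, values))
    children_sum = {}
    for parent, value in zip(parents, values):
        if parent:  # the root has an empty parent string and is nobody's child
            children_sum[parent] = children_sum.get(parent, 0) + value
    return [label for label, total in children_sum.items() if value_of[label] < total]


# Made-up hierarchy: World -> North/South -> N1, N2, S1.
toy_labels = ["World", "North", "South", "N1", "N2", "S1"]
toy_parents = ["", "World", "World", "North", "North", "South"]

consistent_values = [60, 40, 20, 10, 30, 20]
inconsistent_values = [50, 40, 20, 10, 30, 20]  # World is smaller than North + South

ok_result = find_undersized_parents(toy_labels, toy_parents, consistent_values)
bad_result = find_undersized_parents(toy_labels, toy_parents, inconsistent_values)
print(ok_result, bad_result)
```

An empty list means the values are consistent with branchvalues="total".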
Below we have combined two sunburst charts into a single figure. One sunburst chart shows the world population distribution per country per region and the other shows the area distribution per country per region. We can combine many related sunburst charts this way to show possible relationships. Each trace is assigned to a grid column through its domain parameter, and the grid itself is declared in the layout. Please go through the code to see the small amount of preprocessing needed to create the charts.
fig = go.Figure()
parents = [""] + ["World"] *region_wise_pop.shape[0] + world_countries_data["Region"].values.tolist()
labels = ["World"] + region_wise_pop["Region"].values.tolist() + world_countries_data["Country"].values.tolist()
values = [world_countries_data["Population"].sum()] + region_wise_pop["Population"].values.tolist() + world_countries_data["Population"].values.tolist()
fig.add_trace(go.Sunburst(
parents=parents,
labels= labels,
values= values,
domain=dict(column=0),
name="Population Distribution"
))
# Select the numeric column before summing so that text columns are not aggregated.
region_wise_area = world_countries_data.groupby(by="Region")[["Area (sq. mi.)"]].sum().reset_index()
parents = [""] + ["World"] * region_wise_area.shape[0] + world_countries_data["Region"].values.tolist()
labels = ["World"] + region_wise_area["Region"].values.tolist() + world_countries_data["Country"].values.tolist()
values = [world_countries_data["Area (sq. mi.)"].sum()] + region_wise_area["Area (sq. mi.)"].values.tolist() + world_countries_data["Area (sq. mi.)"].values.tolist()
fig.add_trace(go.Sunburst(
parents=parents,
labels= labels,
values= values,
domain=dict(column=1)
))
fig.update_layout(
grid= dict(columns=2, rows=1),
margin = dict(t=0, l=0, r=0, b=0),
width=900, height=700
)
fig.show()
This ends our small tutorial explaining how to plot a sunburst chart in Python using plotly.