A chord diagram is a data visualization technique for showing relationships between data attributes. It arranges the attributes radially around a circle and shows the relationship between them by drawing arcs between them. When a graph has many arcs between points, the visualization can look messy. A chord diagram can bundle these arcs, using a technique called hierarchical edge bundling, into a single arc between two data attributes whose size varies with the number of connections between them.
Chord diagrams are commonly used to show the relationship between data attributes, with the size of the arc representing the strength of the relationship. They are also used to show flows or connections between data attributes. Typical applications include population migration studies, airport routes, economic flows, and genome studies.
We'll explain how to plot chord diagrams in Python using holoviews. Holoviews is a wrapper library around bokeh and matplotlib, so it uses them for plotting behind the scenes. We'll use both the bokeh and matplotlib backends to plot chord diagrams with holoviews. If you have no background in holoviews and are interested in learning it, we have a tutorial on holoviews basics. Please feel free to explore that tutorial to learn more about this data visualization library.
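To make the idea of "arc size follows the amount of connection" concrete, here is a minimal sketch using pandas only. The node names and counts are made up, and the sizing rule (a node's share of the circle equals its share of all incoming plus outgoing flow) is one common convention rather than a description of how any particular library lays out its arcs.

```python
import pandas as pd

# Hypothetical flows between three nodes (names and counts are made up).
flows = pd.DataFrame({
    "source": ["A", "A", "B", "C"],
    "target": ["B", "C", "C", "A"],
    "value": [30, 10, 40, 20],
})

# A node's arc covers every flow that leaves it or arrives at it.
outgoing = flows.groupby("source")["value"].sum()
incoming = flows.groupby("target")["value"].sum()
node_totals = outgoing.add(incoming, fill_value=0)

# Share of the full circle (360 degrees) assigned to each node.
arc_degrees = node_totals / node_totals.sum() * 360
print(arc_degrees.round(1))
```

Nodes with more total flow receive a wider slice of the circle, which is what makes busy nodes stand out in the diagram.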
We'll now start by importing necessary libraries.
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", 30)
import holoviews as hv
We'll be using New Zealand migration and Brazil flight data for visualization. Both datasets are available on Kaggle.
We suggest that you download the datasets and follow along with us to better understand the material and get an in-depth idea of the whole process. We'll first load the datasets as pandas dataframes and aggregate them in various ways to create different chord diagrams.
Please note that the original Brazil flights dataset is quite big, at 600+ MB and nearly 2.5 million rows. If you are low on RAM, you can use the nrows parameter of the read_csv() function to load only the first few thousand entries and follow along with the tutorial without getting stuck. We have loaded only 10k entries to keep the explanation simple.
nz_migration = pd.read_csv("datasets/migration_nz.csv")
nz_migration.head()
brazil_flights = pd.read_csv("brazil_flights_data.csv", nrows=10000, encoding="latin")
brazil_flights = brazil_flights.rename(columns={"Cidade.Origem":"City_Orig", "Cidade.Destino":"City_Dest",
"Estado.Origem":"State_Orig", "Estado.Destino":"State_Dest",
"Pais.Origem":"Country_Orig", "Pais.Destino":"Country_Dest",
"Aeroporto.Origem":"Airport_Orig", "Aeroporto.Destino":"Airport_Dest"})
brazil_flights.head()
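As a small illustration of the nrows tip, the sketch below reads only part of a CSV. It uses an in-memory string as a stand-in for a large file, and the column names are hypothetical. The usecols parameter is an additional, optional way to reduce memory by skipping columns that are not needed.

```python
import io
import pandas as pd

# Stand-in for a large CSV file; the column names here are hypothetical.
csv_text = "origin,destination,flights\n" + "\n".join(
    f"city_{i % 5},city_{(i + 1) % 5},{i}" for i in range(1000)
)

# nrows stops parsing after the requested number of data rows.
sample = pd.read_csv(io.StringIO(csv_text), nrows=100)

# usecols additionally skips columns that are not needed.
slim = pd.read_csv(
    io.StringIO(csv_text), nrows=100, usecols=["origin", "destination"]
)

print(sample.shape, slim.shape)  # (100, 3) (100, 2)
```

Keep in mind that the first rows of a file are not necessarily a representative sample, so charts built from them may differ from charts built on the full data.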
We'll start by setting the plotting backend to bokeh. We'll explicitly tell holoviews which backend to use each time.
hv.extension("bokeh")
We'll first group the Brazil flight dataset by origin city and destination city and then count each combination. We'll use this aggregated dataset to plot the chord chart.
flight_counts_bet_cities = brazil_flights.groupby(by=["City_Orig", "City_Dest"]).count()[["Voos"]].rename(columns={"Voos":"Count"}).reset_index()
flight_counts_bet_cities = flight_counts_bet_cities.sort_values(by="Count", ascending=False)
flight_counts_bet_cities.head()
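The aggregation step can be tried on a toy table first. The sketch below uses made-up flight records and groupby().size(), which counts rows per group. This is a slightly different choice from calling count() on a specific column, which counts only the non-missing values of that column.

```python
import pandas as pd

# Hypothetical flight records: one row per flight.
flights = pd.DataFrame({
    "City_Orig": ["Rio", "Rio", "Rio", "Sao Paulo", "Sao Paulo", "Brasilia"],
    "City_Dest": ["Sao Paulo", "Sao Paulo", "Brasilia", "Rio", "Rio", "Rio"],
})

# size() counts rows per (origin, destination) pair, regardless of
# missing values in any other column.
edge_counts = (
    flights.groupby(["City_Orig", "City_Dest"])
    .size()
    .reset_index(name="Count")
    .sort_values("Count", ascending=False, ignore_index=True)
)
print(edge_counts)
```

The result has one row per origin and destination pair with its count, which is the source, destination, value layout a chord chart needs.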
Holoviews provides the Chord element to create chord diagrams. We need to give it a dataframe containing the source of each flow, its destination, and a value. We can explicitly specify which columns to use as the source, destination, and value. If we don't specify column names, it takes the first column as the source, the second column as the destination, and the third column as the value flowing from source to destination.
We can specify chart attributes for holoviews chord charts by using the %%opts jupyter notebook cell magic command. Each attribute and its value go inside either brackets or parentheses: plot options such as figure dimensions are specified in brackets, and style options such as colors are specified in parentheses.
Below we are creating a chord chart showing traffic movement between cities of Brazil.
%%opts Chord [height=600 width=600 title="Traffic Movement Between Cities" ]
chord = hv.Chord(flight_counts_bet_cities)
print(chord)
chord
We can also label each node on the circle. To do so, we need to create a holoviews Dataset object, as shown below, from a dataframe that contains all the city names as a column.
cities = list(set(flight_counts_bet_cities["City_Orig"].unique().tolist() + flight_counts_bet_cities["City_Dest"].unique().tolist()))
cities_dataset = hv.Dataset(pd.DataFrame(cities, columns=["City"]))
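A variation on building the node list, sketched with pandas only on a made-up edge list: concatenating both endpoint columns and sorting the unique names gives the same set of nodes in a fixed order. Since a Python set has no guaranteed order, a sorted list may help keep node positions and colors stable between runs.

```python
import pandas as pd

# Hypothetical edge list used only for illustration.
edges = pd.DataFrame({
    "City_Orig": ["Rio", "Rio", "Brasilia"],
    "City_Dest": ["Sao Paulo", "Brasilia", "Rio"],
    "Count": [5, 3, 2],
})

# Stack both endpoint columns, drop repeats, and sort for a stable order.
node_names = sorted(pd.concat([edges["City_Orig"], edges["City_Dest"]]).unique())
nodes_df = pd.DataFrame({"City": node_names})
print(nodes_df)
```

The resulting dataframe can be wrapped in hv.Dataset in the same way as the cities dataframe in this tutorial.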
Below we create the chord chart for traffic movement between the cities of Brazil again. This time we give both the flight dataset and the cities Dataset object created above as input to Chord. We also set the labels attribute to City, the column name we specified when creating the cities dataset.
%%opts Chord [height=700 width=700 title="Traffic Movement Between Cities" labels="City"]
hv.Chord((flight_counts_bet_cities, cities_dataset))
We'll now set backend as matplotlib in order to plot the same chart using matplotlib.
hv.extension("matplotlib")
hv.output(fig='svg', size=250)
Below we are plotting the same chart as above but using matplotlib.
%%opts Chord [title="Traffic Movement Between Cities" labels="City"]
hv.Chord((flight_counts_bet_cities, cities_dataset))
We'll now create a chord chart showing traffic movement between the states in the dataset.
hv.extension("bokeh")
Below we are creating another dataset that has information about flight movement between states. We first group the original flight dataset by origin state and destination state and then count the flights for each state combination.
flight_counts_bet_states = brazil_flights.groupby(by=["State_Orig", "State_Dest"]).count()[["Voos"]].rename(columns={"Voos":"Count"}).reset_index()
flight_counts_bet_states = flight_counts_bet_states.sort_values(by="Count", ascending=False)
flight_counts_bet_states.head()
states = list(set(flight_counts_bet_states["State_Orig"].unique().tolist() + flight_counts_bet_states["State_Dest"].unique().tolist()))
states_dataset = hv.Dataset(pd.DataFrame(states, columns=["State"]))
Below we have created a chord chart depicting traffic movement between the states in the dataset. We have also modified the chart by setting the colors of nodes and edges.
If you don't remember the configuration options when using the %%opts jupyter magic command, you can press Tab inside the brackets or parentheses. It'll display a list of available options, and you can then try various values for them.
%%opts Chord [height=700 width=700 title="Traffic Movement Between States" labels="State"]
%%opts Chord (node_color="State" node_cmap="Category20" edge_color="State_Orig" edge_cmap='Category20')
hv.Chord((flight_counts_bet_states, states_dataset))
hv.extension("matplotlib")
hv.output(fig='svg', size=200)
%%opts Chord [title="Traffic Movement Between States" labels="State"]
%%opts Chord (node_color="State" node_cmap="Category20" edge_color="State_Orig" edge_cmap='Category20')
hv.Chord((flight_counts_bet_states, states_dataset))
We'll now create a chord diagram showing traffic movement between the airports in the dataset.
hv.extension("bokeh")
We'll first create an aggregated dataset with the flight count between source and destination airports. We have also filtered the dataset to keep only airport pairs with more than 75 flights, to prevent the chord chart from getting crowded.
flight_counts_bet_airports = brazil_flights.groupby(by=["Airport_Orig", "Airport_Dest"]).count()[["Voos"]].rename(columns={"Voos":"Count"}).reset_index()
flight_counts_bet_airports = flight_counts_bet_airports.sort_values(by="Count", ascending=False)
flight_counts_bet_airports = flight_counts_bet_airports[flight_counts_bet_airports["Count"] > 75]
flight_counts_bet_airports.head()
airports = list(set(flight_counts_bet_airports["Airport_Orig"].unique().tolist() + flight_counts_bet_airports["Airport_Dest"].unique().tolist()))
airports_dataset = hv.Dataset(pd.DataFrame(airports, columns=["Airport"]))
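The thresholding idea can be sketched on a toy edge list. The airport codes and counts below are made up for illustration. The sketch keeps only pairs above a minimum count and then derives the node list from the surviving edges, so airports that lost all their edges do not appear as nodes. It also shows nlargest() as an alternative when a fixed number of edges is preferred over a fixed cutoff.

```python
import pandas as pd

# Hypothetical edge list; codes and counts are illustrative only.
edges = pd.DataFrame({
    "Airport_Orig": ["GRU", "GRU", "GIG", "BSB"],
    "Airport_Dest": ["GIG", "BSB", "GRU", "GIG"],
    "Count": [120, 40, 90, 10],
})

# Keep only airport pairs with more than min_flights flights.
min_flights = 75
busy_edges = edges[edges["Count"] > min_flights]

# Only airports that still appear on a surviving edge become nodes.
busy_airports = sorted(
    set(busy_edges["Airport_Orig"]) | set(busy_edges["Airport_Dest"])
)

# Alternative: keep a fixed number of the largest edges instead.
top_edges = edges.nlargest(2, "Count")

print(busy_edges)
print(busy_airports)
```

The right cutoff depends on the data, so it is worth trying a few values and checking how readable the chart stays.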
We have modified many chart attributes below when plotting a chord chart explaining flight movement between airports.
%%opts Chord [height=800 width=800 title="Traffic Movement Between Airports" labels="Airport" bgcolor="black"]
%%opts Chord (node_color="Airport" node_cmap="Category20" edge_color="Airport_Orig" edge_cmap='Category20' edge_alpha=0.8)
%%opts Chord (edge_line_width=2 node_size=25 label_text_color="white")
hv.Chord((flight_counts_bet_airports, airports_dataset))
hv.extension("matplotlib")
hv.output(fig='svg', size=250)
%%opts Chord [labels="Airport" title="Traffic Movement Between Airports"]
%%opts Chord (node_color="Airport" node_cmap="Category20" edge_color="Airport_Orig" edge_cmap='Category20')
%%opts Chord (node_size=15 edge_alpha=0.8 edge_linewidth=1.0)
hv.Chord((flight_counts_bet_airports, airports_dataset))
We'll now be using a chord chart to show population immigration to New Zealand in 2016. We'll start by setting backend as bokeh.
hv.extension("bokeh")
We'll first need to create a dataset with columns for the source country, the destination country, and the population moving between them. We'll filter the original New Zealand migration dataset, keeping only entries for arrivals in the year 2016. We'll then remove entries that hold aggregate data, such as continents, rather than individual countries. We'll then group by source country and sum up the population. We'll also create a new column named DestCountry with the value New Zealand. Finally, we'll keep only entries with a population above 1000 to prevent the chart from getting cluttered.
immigration_to_nz = nz_migration[nz_migration["Measure"] == "Arrivals"]
immigration_to_nz = immigration_to_nz[immigration_to_nz["Year"]==2016]
immigration_to_nz = immigration_to_nz[~immigration_to_nz["Country"].isin(['All countries', 'Not stated', 'Asia', 'Europe',])]
immigration_to_nz = immigration_to_nz.groupby(by="Country")[["Value"]].sum()
immigration_to_nz["DestCountry"] = "New Zealand"
immigration_to_nz = immigration_to_nz.reset_index().rename(columns={"Country":"SourceCountry"})
immigration_to_nz = immigration_to_nz[["SourceCountry", "DestCountry", "Value"]].sort_values(by="Value", ascending=False)
immigration_to_nz = immigration_to_nz[immigration_to_nz.Value > 1000]
immigration_to_nz.head()
immigrate_countries = list(set(immigration_to_nz["SourceCountry"].unique().tolist() + ["New Zealand"]))
immigrate_countries_dataset = hv.Dataset(pd.DataFrame(immigrate_countries, columns=["Country"]))
%%opts Chord [height=800 width=800 title="Immigration to New Zealand [2016]" labels="Country" bgcolor="black"]
%%opts Chord (node_color="Country" node_cmap="Category20" edge_color="SourceCountry" edge_cmap='Category20' edge_alpha=0.8)
%%opts Chord (edge_line_width=3 node_size=25 label_text_color="lime")
hv.Chord((immigration_to_nz, immigrate_countries_dataset))
hv.extension("matplotlib")
hv.output(fig='svg', size=250)
%%opts Chord [title="Immigration to New Zealand [2016]" labels="Country"]
%%opts Chord (node_color="Country" node_cmap="Category20" edge_color="SourceCountry" edge_cmap='Category20')
%%opts Chord (node_size=15 edge_alpha=0.8 edge_linewidth=1.0)
hv.Chord((immigration_to_nz, immigrate_countries_dataset))
We'll now create a chord diagram showing population emigration from New Zealand.
hv.extension("bokeh")
Below we create the emigration dataset in exactly the same way as the immigration dataset, with a few minor changes to the steps.
migration_from_nz = nz_migration[nz_migration["Measure"] == "Departures"]
migration_from_nz = migration_from_nz[migration_from_nz["Year"]==2016]
migration_from_nz = migration_from_nz[~migration_from_nz["Country"].isin(['All countries', 'Not stated', 'Asia', 'Europe',])]
migration_from_nz = migration_from_nz.groupby(by="Country")[["Value"]].sum()
migration_from_nz["SourceCountry"] = "New Zealand"
migration_from_nz = migration_from_nz.reset_index().rename(columns={"Country":"DestCountry"})
migration_from_nz = migration_from_nz[["SourceCountry", "DestCountry", "Value"]].sort_values(by="Value", ascending=False)
migration_from_nz = migration_from_nz[migration_from_nz.Value>1000]
migration_from_nz.head()
emigrate_countries = list(set(migration_from_nz["DestCountry"].unique().tolist() + ["New Zealand"]))
emigrate_countries_dataset = hv.Dataset(pd.DataFrame(emigrate_countries, columns=["Country"]))
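Since the immigration and emigration datasets are built with nearly identical steps, the shared logic could be wrapped in one function. The sketch below is only an illustration: build_nz_flows is a hypothetical helper, not part of pandas or holoviews, and the toy data is made up. It assumes the Measure, Country, Year, and Value columns used in this tutorial.

```python
import pandas as pd

def build_nz_flows(migration, measure, year, min_value, excluded):
    """Hypothetical helper: source/destination/value rows for one direction.

    "Arrivals" puts New Zealand in the destination column; any other
    measure (for example "Departures") puts it in the source column.
    """
    rows = migration[
        (migration["Measure"] == measure)
        & (migration["Year"] == year)
        & ~migration["Country"].isin(excluded)
    ]
    totals = rows.groupby("Country", as_index=False)["Value"].sum()
    totals = totals[totals["Value"] > min_value]
    if measure == "Arrivals":
        totals = totals.rename(columns={"Country": "SourceCountry"})
        totals = totals.assign(DestCountry="New Zealand")
    else:
        totals = totals.rename(columns={"Country": "DestCountry"})
        totals = totals.assign(SourceCountry="New Zealand")
    columns = ["SourceCountry", "DestCountry", "Value"]
    return totals[columns].sort_values("Value", ascending=False, ignore_index=True)

# Made-up records for illustration only.
toy = pd.DataFrame({
    "Measure": ["Arrivals", "Arrivals", "Arrivals", "Departures", "Departures", "Arrivals"],
    "Country": ["Australia", "Australia", "Asia", "Australia", "Fiji", "Fiji"],
    "Year": [2016, 2016, 2016, 2016, 2016, 2015],
    "Value": [1500, 700, 9000, 3000, 400, 5000],
})

arrivals = build_nz_flows(toy, "Arrivals", 2016, 1000, ["Asia"])
departures = build_nz_flows(toy, "Departures", 2016, 1000, ["Asia"])
print(arrivals)
print(departures)
```

Keeping the year, threshold, and exclusion list as parameters makes it easier to redraw the charts for another year or cutoff.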
%%opts Chord [height=800 width=800 title="Emigration from New Zealand [2016]" labels="Country" bgcolor="black"]
%%opts Chord (node_color="Country" node_cmap="Category20" edge_color="DestCountry" edge_cmap='Category20' edge_alpha=0.8)
%%opts Chord (edge_line_width=3 node_size=25 label_text_color="cyan")
hv.Chord((migration_from_nz, emigrate_countries_dataset))
hv.extension("matplotlib")
hv.output(fig='svg', size=250)
%%opts Chord [title="Emigration from New Zealand [2016]" labels="Country"]
%%opts Chord (node_color="Country" node_cmap="Category20" edge_color="DestCountry" edge_cmap='Category20')
%%opts Chord (node_size=15 edge_alpha=0.8 edge_linewidth=1.0)
hv.Chord((migration_from_nz, emigrate_countries_dataset))
This ends our small tutorial explaining how to plot a chord diagram using holoviews.