Updated On : May-08,2020 Time Investment : ~30 mins

Network Analysis in Python: Important Structures and Bipartite Graphs [Networkx]¶

Table of Contents¶

Introduction
1. Load Dataset
2. Create a Graph
3. Visualize Graph
4. Structures in a Graph
5. Bipartite Graphs
References

Introduction ¶

Network analysis helps us get meaningful insights into graph data structures. We already discussed network structure and it's basic analysis in our other tutorial titled "Network Analysis: Node Importance & Paths". We have explained about basic network structure and network creation as well as manipulation using python library networkx. This tutorial is a continuation of that tutorial on further analysis of graph data structures. As a part of this tutorial, we'll be taking a look at presence important structures like cliques, triangles, connected components. We'll also explain the bipartite graph. If you do not have a background about network terminology and networkx library then we suggest that you go through our tutorials on basic network analysis and networkx basics.

To get started with the analysis, we'll define the graph data structure first. Graphs are data structure which has two main entities:

Nodes/Vertices: It's used to represent entities like airports, people, recipe ingredients, etc
Edges: It's used to represent a relationship between nodes like the distance between airports, the relation between people, whether an ingredient is part of a recipe, etc. Edges are the most important properties of graphs.

Graphs are generally represented as G(V, E) where V represents a list of vertices/nodes, and E represents a list of edges between those nodes. When representing graphs as visually each node is represented as a circle and each edge is shown as a line connecting nodes labeling relation between that nodes.

We'll start importing all necessary libraries which will be used as a part of this tutorial.

import pandas as pd
import numpy as np

import networkx as nx

import matplotlib.pyplot as plt

import sys
import warnings
warnings.filterwarnings('ignore')

print('Python Version : '+sys.version)
print('NetworkX version : '+nx.__version__)

%matplotlib inline

Python Version : 3.7.3 (default, Mar 27 2019, 22:11:17)
[GCC 7.3.0]
NetworkX version : 2.3

1. Load Dataset ¶

We'll start loading the dataset that we'll be using for our tutorial. We'll be using physician trust dataset available from Konect.

The dataset has information about the network which captures innovation spread among 246 physicians from Illinois, Peoria, Bloomington, Quincy, and Galesburg collected in 1966. The node in a graph presents physician and edge represent that left physician will contact the right physician for advice or discussion hence trusting that physician.

We'll load this dataset and create a graph out of it. We suggest that you download the dataset as well to follow along with us.

df = pd.read_csv('datasets/moreno_innovation/out.moreno_innovation_innovation', sep=' ', skiprows=2, header=None)
df.columns = ['doctor1', 'doctor2']
df.head()

	doctor1	doctor2
0	1	2
1	1	3
2	1	4
3	1	5
4	1	6

2. Create a Graph ¶

We'll be creating a directed graph using the networkx package. We'll loop through each entry of the dataset and add an edge to a network suggesting the first physician will interact with the second physician. We do not have any metadata present as a part of this dataset to be added to the network. We'll be printing the first few nodes and edges once the graph is created.

undirected_G = nx.Graph()

for idx in df.index:
    undirected_G.add_edge(df.loc[idx]['doctor1'], df.loc[idx]['doctor2'])

list(undirected_G.nodes(data=True))[:5]

[(1, {}), (2, {}), (3, {}), (4, {}), (5, {})]

list(undirected_G.edges(data=True))[:5]

[(1, 2, {}), (1, 3, {}), (1, 4, {}), (1, 5, {}), (1, 6, {})]

3. Visualize Graph ¶

We'll now try to visualize graphs using various network graph plots available like networkx plot, circos plot, arc plot, and matrix plot.

3.1 Networkx Plot ¶

nx.draw(undirected_G, font_color="red", node_color="lime", with_labels=True, edge_size=10)

From above networkx hairball, we can see that the dataset seems to be consist of 4 different graphs.

3.2 Circos Plot ¶

from nxviz import CircosPlot

c = CircosPlot(undirected_G)
c.draw()

By looking at the above circos plot it seems like there are different independent networks present in a dataset.

3.3 Arc Plot ¶

from nxviz import ArcPlot

a = ArcPlot(undirected_G, figsize=(12,5))
a.draw()
plt.tight_layout()
plt. autoscale()

The above arc chart also confirms further that the dataset seems to consist of 4 different networks.

3.4 Matrix Plot [Adjacency Matrix] ¶

from nxviz import MatrixPlot

m = MatrixPlot(undirected_G)
m.draw()

The above matrix plot of the graph adjacency matrix represents the same findings are previous plots.

4. Structures in a Graph ¶

We'll now try to identify various structures available in the graph. We'll look for cliques, triangles, connected components present in graphs.

4.1 Cliques & Triangles ¶

In a network, a clique is a group of nodes that are closely connected with one another. It's a data structure where each node is connected to all other nodes of that data structure hence knows everybody else.

The single edge is the simplest clique where both nodes are connected to each other. The triangles are another simplest type of clique where there are three nodes and each node is connected to the other two nodes.

Networkx API provides a method called find_cliques() which returns all possible cliques. We'll use it to get cliques of different sizes.

cliques = list(nx.find_cliques(undirected_G))

trianlges = [clique for clique in cliques if len(clique) == 3]

print("Number of Triangles Available : ", len(trianlges))
print("First Few Triangles : ", trianlges[:5])

Number of Triangles Available :  224
First Few Triangles :  [[2, 11, 40], [2, 11, 39], [2, 42, 110], [3, 67, 42], [3, 9, 41]]

nx.draw(undirected_G.subgraph(trianlges[0]), with_labels=True, node_color="lime", font_color="red")

cliques_of_len_4 = [clique for clique in cliques if len(clique) == 4]

print("Available Number of Cliques of Length 4 : ", len(cliques_of_len_4))
print("First Few Cliques of Length 4 : ", cliques_of_len_4[:4])

Available Number of Cliques of Length 4 :  89
First Few Cliques of Length 4 :  [[1, 4, 5, 6], [9, 40, 11, 89], [11, 40, 12, 15], [12, 40, 69, 73]]

nx.draw(undirected_G.subgraph(cliques_of_len_4[0]), with_labels=True, node_color="lime", font_color="red")

cliques_of_len_5 = [clique for clique in cliques if len(clique) == 5]

print("Available Number of Cliques of Length 4 : ", len(cliques_of_len_5))
print("First Few Cliques of Length 4 : ", cliques_of_len_5[:4])

Available Number of Cliques of Length 4 :  19
First Few Cliques of Length 4 :  [[13, 15, 18, 74, 14], [15, 23, 54, 40, 37], [20, 25, 21, 117, 95], [127, 121, 132, 128, 129]]

nx.draw(undirected_G.subgraph(cliques_of_len_5[0]), with_labels=True, node_color="lime", font_color="red")

4.2 Connected Components ¶

Connected components of the graph are subgraphs where each node is reachable from another node by following some path. It'll be reachable directly or by following a few other nodes but one can travel from one node to another without break.

These components are not connected to other nodes of the graph. When we first plotted above network through circos plot, arc plot, networkx plot, and matrix plot; we noticed that this network of physicians seems to consist of other independent small networks.

Networkx provides us with methods named connected_component_subgraphs() and connected_components() for generating list of connected components present in graph. We can pass the original graph to them and it'll return a list of connected components as a subgraph. We'll below retrieve all subgraphs from the original network and try to plot them to better understand them.

connected_components = list(nx.connected_component_subgraphs(undirected_G))
connected_components

[<networkx.classes.graph.Graph at 0x7ff80169cf28>,
 <networkx.classes.graph.Graph at 0x7ff7fc6cb390>,
 <networkx.classes.graph.Graph at 0x7ff7fc765898>,
 <networkx.classes.graph.Graph at 0x7ff7fc765be0>]

Below we are using connected_components() for generating list of connected components. It returns a list where each entry is a list itself of nodes comprising connected components. We'll loop through each list entry and convert it to subgraph using Graph.subgraph() method. It'll result in the same output as the output of the above method.

connected_components2 = [undirected_G.subgraph(nodes) for nodes in list(nx.connected_components(undirected_G))]
connected_components2

[<networkx.classes.graph.Graph at 0x7ff8016de898>,
 <networkx.classes.graph.Graph at 0x7ff7fc67a940>,
 <networkx.classes.graph.Graph at 0x7ff7fc67a908>,
 <networkx.classes.graph.Graph at 0x7ff7fc67a240>]

4.3 Plotting Individual Connected Components as Networkx Graph ¶

plt.figure(figsize=(15,6))

plt.subplot(141)
nx.draw(connected_components[0], with_labels=True, node_color="lime", font_color="red")

plt.subplot(142)
nx.draw(connected_components[1], with_labels=True, node_color="lime", font_color="red")

plt.subplot(143)
nx.draw(connected_components[2], with_labels=True, node_color="lime", font_color="red")

plt.subplot(144)
nx.draw(connected_components[3], with_labels=True, node_color="lime", font_color="red")

4.4 Adding Connected Components Index as Metadata to Nodes & Visualizing Graph ¶

We'll now add connected components index as metadata to each node of the original graph. We'll then visualize the modified graph using the circos plot to properly highlight each individual connected component.

## Logic to add nodes metadata
for i, subgraph in enumerate(connected_components):
    for n in subgraph.nodes():
        undirected_G.node[n]['component_idx'] = i
        undirected_G.node[n]['node_size'] = (i+1)*10
        undirected_G.node[n]['node_label'] = "%d"%(i+1)

## Logic to add edges metadata
for i, subgraph in enumerate(connected_components):
    for start, end in subgraph.edges():
        undirected_G.edges[start, end]['component_idx'] = i
        undirected_G.edges[start, end]['edge_width'] = i+1

circos = CircosPlot(undirected_G,
                   node_color='component_idx', node_order='component_idx',
                   edge_color="component_idx", edge_width='edge_width',
                   node_cmap="Dark2",
                   )
circos.draw()

arc = ArcPlot(undirected_G,
                node_color='component_idx', node_order='component_idx',
                edge_color="component_idx", edge_width='edge_width',
                figsize=(10,6))
arc.draw()
plt.tight_layout()
plt. autoscale()

We can notice from the above circos plot that each individual component is highlighted using different colors.

5. Bipartite Graphs ¶

Bipartite graphs (bi-two, partite-partition) are special cases of graphs where there are two sets of nodes as its name suggests. The node from one set can only connect to nodes from another set. The nodes from one set can not interconnect. It can be used to model a relationship between two different sets of points.

The real-life examples of bipartite graphs are person-crime relationship, recipe-ingredients relationship, company-customer relationship, etc.

We'll be loading crime data available from konect to understand bipartite graphs.

5.1 Load Dataset ¶

The dataset consists of three files. The first file has information from person id to crime id relation. The second file has information about the type of crime based on the index of the first file. The third file has information about the gender of a person based on the index of the first file. We'll load all files as a pandas dataframe and display the first few rows below to get an idea about the contents of files.

df = pd.read_csv('datasets/moreno_crime/out.moreno_crime_crime', sep=' ', skiprows=2, names= ["PersonID", "CrimeID", "NAN"])
df = df[['PersonID', 'CrimeID']]
df.head()

	PersonID	CrimeID
0	1	1
1	1	2
2	1	3
3	1	4
4	2	5

roles = pd.read_csv('datasets/moreno_crime/rel.moreno_crime_crime.person.role', header=None)
roles.columns = ['roles']
roles.head()

	roles
0	Suspect
1	Victim
2	Victim
3	Suspect
4	Victim

gender = pd.read_csv('datasets/moreno_crime/ent.moreno_crime_crime.person.sex', header=None)
gender.columns  = ["gender"]
gender.head()

	gender
0	1
1	1
2	1
3	1
4	1

5.2 Create Graph ¶

Below we are first joining the first dataframe with roles dataframe to create dataframe where we have a mapping from person to crime as well as the role of person involved. We'll then loop through rows of dataframe to generate a bipartite graph by adding nodes and edges to the graph. We are also adding a bipartite node attribute to a node to distinguish the set of nodes. We are also adding a role edge attribute which lets us know the role of a person in this crime.

bipartite_G = nx.Graph()

## Logic to add nodes and edges to graph with their metadata
for r, d in df.join(roles).iterrows():
    pid = 'P{0}'.format(d['PersonID'])  # pid = "Person I.D."
    cid = 'C{0}'.format(d['CrimeID'])  # cid = "Crime I.D."
    bipartite_G.add_node(pid, bipartite='person')
    bipartite_G.add_node(cid, bipartite='crime')
    bipartite_G.add_edge(pid, cid, role=d['roles'])

## Logic to add gender metadata to nodes
for idx in gender.index:
    nodeid = 'P{0}'.format(idx+1)
    bipartite_G.node[nodeid]['gender'] = gender.loc[idx]["gender"]

list(bipartite_G.nodes(data=True))[:5]

[('P1', {'bipartite': 'person', 'gender': 1}),
 ('C1', {'bipartite': 'crime'}),
 ('C2', {'bipartite': 'crime'}),
 ('C3', {'bipartite': 'crime'}),
 ('C4', {'bipartite': 'crime'})]

list(bipartite_G.edges(data=True))[:5]

[('P1', 'C1', {'role': 'Suspect'}),
 ('P1', 'C2', {'role': 'Victim'}),
 ('P1', 'C3', {'role': 'Victim'}),
 ('P1', 'C4', {'role': 'Suspect'}),
 ('C1', 'P756', {'role': 'Suspect'})]

5.3 Analyze Properties of Bipartite Graph ¶

Networkx has a module named bipartite which provides a list of methods to find out insights of bipartite graphs. We'll try to analyze the properties of bipartite graphs further below.

nx.bipartite.is_bipartite(bipartite_G)

True

Below we are looping through all nodes and trying to find out-degree centrality of all person nodes.

person_nodes = [node for node in bipartite_G.nodes() if bipartite_G.node[node]['bipartite'] == 'person']

person_nodes_with_centrality = [node for node in nx.bipartite.degree_centrality(bipartite_G, person_nodes).items() if node[0].startswith("P")]

sorted(person_nodes_with_centrality, key=lambda x: x[1], reverse=True)[:5]

[('P815', 0.045372050816696916),
 ('P2', 0.039927404718693285),
 ('P425', 0.032667876588021776),
 ('P220', 0.0308529945553539),
 ('P56', 0.025408348457350273)]

Below we are looping through all nodes and trying to find out-degree centrality of all crime nodes.

crime_nodes = [node for node in bipartite_G.nodes() if bipartite_G.node[node]['bipartite'] == 'crime']


crime_nodes_with_centrality = [node for node in nx.bipartite.degree_centrality(bipartite_G, crime_nodes).items() if node[0].startswith("C")]

sorted(crime_nodes_with_centrality, key=lambda x: x[1], reverse=True)[:5]

[('C110', 0.02171290711700844),
 ('C153', 0.0180940892641737),
 ('C14', 0.015681544028950542),
 ('C43', 0.015681544028950542),
 ('C95', 0.015681544028950542)]

5.4 Projections in Bipartite Graph ¶

We can project bipartite graph to one of the node-set of graph.

Below we'll be creating person-person projection of a person-crime bipartite graph where we'll put an edge between two person nodes related to same crime. We'll then plot it as a circos plot.

person_graph = nx.bipartite.projection.projected_graph(bipartite_G, person_nodes)

for n, d in person_graph.nodes(data=True):
    person_graph.node[n]['neighbors_count'] = len(list(person_graph.neighbors(n)))

c = CircosPlot(person_graph, node_color='gender', node_grouping='gender', node_order='neighbors_count')
c.draw()

Below we'll be creating crime-crime projection of a person-crime bipartite graph where we'll put an edge between two crime nodes related to same person. We'll then plot it using circos plot to understand how crimes are related.

crime_graph = nx.bipartite.projection.projected_graph(bipartite_G, crime_nodes)

for n, d in crime_graph.nodes(data=True):
    crime_graph.node[n]['neighbors_count'] = len(list(crime_graph.neighbors(n)))

c = CircosPlot(crime_graph, node_color='neighbors_count', node_order='neighbors_count')
c.draw()

This ends our small tutorial on basic graph analysis. We tried to cover below-mentioned points:

Graph Definition.
Cliques
Triangles
Connected Components
Bipartite Graphs

Please feel free to let us know your views in the comments section.

References ¶

Sunny Solanki

Comfortable Learning through Video Tutorials?

If you are more comfortable learning through video tutorials then we would recommend that you subscribe to our YouTube channel.

Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

When going through coding examples, it's quite common to have doubts and errors.

If you have doubts about some code examples or are stuck somewhere when trying our code, send us an email at coderzcolumn07@gmail.com. We'll help you or point you in the direction where you can find a solution to your problem.

You can even send us a mail if you are trying something new and need guidance regarding coding. We'll try to respond as soon as possible.

Want to Share Your Views? Have Any Suggestions?

If you want to

provide some suggestions on topic
share your views
include some details in tutorial
suggest some new topics on which we should create tutorials/blogs

Please feel free to contact us at coderzcolumn07@gmail.com. We appreciate and value your feedbacks. You can also support us with a small contribution by clicking DONATE.

network-analysis, networkx

Sunny Solanki

Software Developer | Youtuber | Bonsai Enthusiast

Subscribe to Our YouTube Channel

Tutorial Categories

Artificial Intelligence (83)
Data Science (84)
Digital Marketing (8)
Machine Learning (38)
Python (131)

Network Analysis in Python: Important Structures and Bipartite Graphs [Networkx]¶

Table of Contents¶

Introduction ¶

1. Load Dataset ¶

2. Create a Graph ¶

3. Visualize Graph ¶

3.1 Networkx Plot ¶

3.2 Circos Plot ¶

3.3 Arc Plot ¶

3.4 Matrix Plot [Adjacency Matrix] ¶

4. Structures in a Graph ¶

4.1 Cliques & Triangles ¶

4.2 Connected Components ¶

4.3 Plotting Individual Connected Components as Networkx Graph ¶

4.4 Adding Connected Components Index as Metadata to Nodes & Visualizing Graph ¶

5. Bipartite Graphs ¶

5.1 Load Dataset ¶

5.2 Create Graph ¶

5.3 Analyze Properties of Bipartite Graph ¶

5.4 Projections in Bipartite Graph ¶

References ¶

Sunny Solanki

Comfortable Learning through Video Tutorials?

Stuck Somewhere? Need Help with Coding? Have Doubts About the Topic/Code?

Want to Share Your Views? Have Any Suggestions?

Sunny Solanki

Subscribe to Our YouTube Channel

Tutorial Categories

Newsletter Subscription