Streamlit is an open-source Python library that lets us build dashboards by integrating charts created with other Python libraries such as matplotlib, plotly, bokeh, and Altair. It also provides extensive support for interactive widgets like dropdowns, multi-selects, radio buttons, checkboxes, and sliders. It takes very few lines of code to create a dashboard with streamlit, and its API is easy to use and pythonic. In this tutorial, we'll explain how to create a simple dashboard using streamlit by integrating charts created with matplotlib. We'll also add interactions to the charts so that we can modify them to explore different relationships.
We have already covered, in a separate tutorial, how to create a basic dashboard using streamlit and plotly.
We expect readers to have basic knowledge of matplotlib and of how to create charts with it in order to follow along. We'll also skip the definitions of some streamlit functions that we have already covered in that tutorial, so we recommend that readers go through it as well.
We'll create the charts directly from pandas dataframes. Pandas draws them with matplotlib internally, which takes fewer lines of code than calling matplotlib ourselves.
We'll start by importing the necessary libraries for our tutorial.
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
pd.set_option("display.max_columns", 50)
We'll be using the breast cancer dataset available from scikit-learn. The dataset has various measurements of tumors such as radius, texture, perimeter, area, and smoothness. It also records the type of each tumor, which is either malignant or benign. The dataset has 569 rows, of which 212 are malignant tumors and 357 are benign tumors.
Below we have loaded the dataset into a pandas dataframe. We'll be using this dataframe for plotting the charts of our dashboard.
breast_cancer = datasets.load_breast_cancer(as_frame=True)
breast_cancer_df = pd.concat((breast_cancer["data"], breast_cancer["target"]), axis=1)
breast_cancer_df["target"] = [breast_cancer.target_names[val] for val in breast_cancer_df["target"]]
breast_cancer_df.head()
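As a quick sanity check on the class balance, the number of rows per tumor type can be read off the dataframe with value_counts(). This is a minimal sketch that rebuilds the dataframe so it runs on its own; it assumes a scikit-learn version that supports as_frame=True.

```python
import pandas as pd
from sklearn import datasets

# Rebuild the dataframe exactly as in the loading step above.
breast_cancer = datasets.load_breast_cancer(as_frame=True)
breast_cancer_df = pd.concat((breast_cancer["data"], breast_cancer["target"]), axis=1)
breast_cancer_df["target"] = [breast_cancer.target_names[val] for val in breast_cancer_df["target"]]

# Count the rows that belong to each tumor type.
class_counts = breast_cancer_df["target"].value_counts()
print(class_counts)
print("Total rows:", len(breast_cancer_df))
```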
In this section, we'll explain each chart that goes into our dashboard individually before putting them together.
The scatter plot shows the relationship between two measurements, where each point is colored to represent the tumor type (malignant or benign). This helps us understand how the measurements vary across the two tumor types.
By default, we'll show the relationship between mean texture and mean area. When we include this chart in the dashboard, we'll link the X and Y axes to dropdowns that each list all measurements. By selecting different values in the dropdowns, we'll be able to explore the relationship between different combinations of measurements.
Our code for the scatter chart starts by creating a figure object, and then an axis object on that figure on which we'll draw the chart. We then split the original dataframe into two dataframes: one with the entries for malignant tumors and one with the entries for benign tumors. We create the scatter plot first from the malignant dataframe and then from the benign dataframe by calling the .plot.scatter() method on each. This method creates the plot with matplotlib internally and takes fewer lines of code than using matplotlib directly. Malignant points are colored tomato and benign points dodgerblue. We pass the axis object created at the beginning to both calls so that both draw on the same axis.
scatter_fig = plt.figure(figsize=(8,7))
scatter_ax = scatter_fig.add_subplot(111)
malignant_df = breast_cancer_df[breast_cancer_df["target"] == "malignant"]
benign_df = breast_cancer_df[breast_cancer_df["target"] == "benign"]
malignant_df.plot.scatter(x="mean texture", y="mean area", s=120, c="tomato", alpha=0.6, ax=scatter_ax, label="Malignant")
benign_df.plot.scatter(x="mean texture", y="mean area", s=120, c="dodgerblue", alpha=0.6, ax=scatter_ax,
                       title="Mean Texture vs Mean Area", label="Benign");
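Since the dashboard will later feed dropdown selections into this chart, it can help to see the same logic with the column names as parameters. The sketch below is one possible way to do that, not the code used in the dashboard: make_scatter is a hypothetical helper, and the toy dataframe holds made-up values that only stand in for breast_cancer_df.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def make_scatter(df, x_col, y_col):
    """Hypothetical helper: scatter of x_col vs y_col, colored by tumor type."""
    fig = plt.figure(figsize=(8, 7))
    ax = fig.add_subplot(111)
    colors = {"malignant": "tomato", "benign": "dodgerblue"}
    # Draw one scatter per tumor type on the same axis.
    for tumor_type, color in colors.items():
        subset = df[df["target"] == tumor_type]
        subset.plot.scatter(x=x_col, y=y_col, s=120, c=color, alpha=0.6,
                            ax=ax, label=tumor_type.capitalize())
    ax.set_title("{} vs {}".format(x_col.title(), y_col.title()))
    return fig


# Toy rows with made-up values, standing in for breast_cancer_df.
toy_df = pd.DataFrame({
    "mean texture": [10.4, 17.8, 21.3, 14.1],
    "mean area": [1001.0, 1326.0, 1203.0, 386.1],
    "target": ["malignant", "malignant", "malignant", "benign"],
})
toy_scatter_fig = make_scatter(toy_df, "mean texture", "mean area")
```

Calling the helper with two other column names redraws the chart for that pair, which is what the dropdowns do in the dashboard.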
For creating a side-by-side bar chart showing the value of the average measurements per tumor type, we have created an intermediate dataframe. The intermediate dataframe is created by grouping our original breast cancer dataframe based on tumor type and then taking an average of measurements for each group. Below we have printed an intermediate dataframe that has information about the average value for each measurement per tumor type.
avg_breast_cancer_df = breast_cancer_df.groupby("target").mean()
avg_breast_cancer_df
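To make the grouping step concrete, here is a small worked example on a toy dataframe. The values are made up for illustration and are not taken from the dataset; only the groupby("target").mean() call mirrors the code above.

```python
import pandas as pd

# Four made-up rows: two malignant and two benign tumors.
toy_df = pd.DataFrame({
    "mean radius": [20.0, 18.0, 12.0, 14.0],
    "mean texture": [10.0, 20.0, 15.0, 17.0],
    "target": ["malignant", "malignant", "benign", "benign"],
})

# One row per tumor type, one column per measurement, each cell an average.
avg_toy_df = toy_df.groupby("target").mean()
print(avg_toy_df)
```

The result has the tumor type as its index, which is why .plot.bar() later draws one group of bars per tumor type. If a dataframe contained other non-numeric columns, passing numeric_only=True to mean() would be a reasonable precaution.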
Below we have created a side-by-side bar chart that shows the average of the measurements mean radius, mean texture, mean perimeter, and area error per tumor type. By default, these four values will be displayed in the dashboard.
In the dashboard, we'll create a multi-select that lists all measurements. The chart will show the averages of whichever measurements are selected in that widget.
Our code starts by creating a figure and an axis object. It then filters the average measurements dataframe created in the previous cell down to the four measurements mentioned above. The chart is created by calling the .plot.bar() method on this filtered dataframe.
bar_fig = plt.figure(figsize=(8,7))
bar_ax = bar_fig.add_subplot(111)
sub_avg_breast_cancer_df = avg_breast_cancer_df[["mean radius", "mean texture", "mean perimeter", "area error"]]
sub_avg_breast_cancer_df.plot.bar(alpha=0.8, ax=bar_ax, title="Average Measurements per Tumor Type");
The histogram shows the distribution of measurement values. This is useful for analyzing how the values of a measurement are spread.
By default, we'll display a histogram of mean radius and mean texture. In the dashboard, we'll create a multi-select that lists all measurements and link it to this chart so that a histogram of every selected measurement is included.
Our code starts by creating a figure and an axis object. We then create a dataframe that holds only the mean radius and mean texture columns. The histogram is created by calling the .plot.hist() method on this dataframe.
hist_fig = plt.figure(figsize=(8,7))
hist_ax = hist_fig.add_subplot(111)
sub_breast_cancer_df = breast_cancer_df[["mean radius", "mean texture"]]
sub_breast_cancer_df.plot.hist(bins=50, alpha=0.7, ax=hist_ax, title="Distribution of Measurements");
The hexbin chart is useful for showing the relationship between two attributes together with the density of samples. The chart is made of hexagons, and the color of each hexagon depends on the number of data samples that fall inside it. A darker hexagon holds more points.
By default, we'll create a hexbin chart of mean texture and mean area. In the dashboard, we'll create two dropdowns and link them to the hexbin chart. Both dropdowns will list all tumor measurements, so we can try different combinations of measurements and analyze them with the hexbin chart.
Our code for this example starts by creating a figure and an axis object. It then creates the hexbin chart by calling the .plot.hexbin() method on our original breast cancer dataframe. We have provided mean texture for the x-axis and mean area for the y-axis.
hexbin_fig = plt.figure(figsize=(8,7))
hexbin_ax = hexbin_fig.add_subplot(111)
breast_cancer_df.plot.hexbin(x="mean texture", y="mean area",
                             reduce_C_function=np.mean,
                             gridsize=25,
                             #cmap="Greens",
                             ax=hexbin_ax,
                             title="Concentration of Measurements"
                            );
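To see what the hexagon colors encode, the sketch below calls matplotlib's Axes.hexbin directly, which is what pandas uses under the hood, and reads back the per-hexagon values. It uses randomly generated points rather than the dataset, so the numbers themselves carry no meaning. When no C column is supplied, as in the code above, each value is simply the count of points in that hexagon; as far as we understand, reduce_C_function only comes into play when C is given.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Random points standing in for two measurement columns.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = rng.normal(size=500)

fig, ax = plt.subplots(figsize=(8, 7))
hexes = ax.hexbin(x, y, gridsize=25)

# One value per hexagon: the number of points that fell inside it.
counts = hexes.get_array()
print("Hexagons drawn:", len(counts))
print("Most crowded hexagon holds", int(counts.max()), "points")
```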
In this section, we'll briefly explain the widgets and containers that we'll be using in our dashboard. We'll use the widgets to update charts and explore different relationships.
Please note that we have placed all widgets in the sidebar of the dashboard. The same widgets can also be placed in the main container above the charts, but that requires a little extra layout handling. We have kept them in the sidebar to keep things simple.
In this section, we have included the code for the dashboard. It puts together all the charts, widgets, and containers that we have discussed so far to create the final dashboard. We'll now explain the code of the dashboard.
If you want to see the definitions of the methods used in this tutorial, please check our tutorial explaining how to create a dashboard using streamlit and plotly, whose link is given at the beginning of the tutorial and in the reference section at the end.
Please note that each time you change the dashboard file, a button named Rerun appears in the top-right corner of the dashboard. Clicking it reruns the file to rebuild the dashboard with the new changes.
You can start the dashboard by executing the streamlit run command, followed by the name of the dashboard file, in a shell or command prompt. It serves the dashboard on port 8501 by default.
You can access the dashboard at localhost:8501. The same command also opens the dashboard in the browser.
You can also record a screencast by clicking on a button with three lines in the top-right corner of the page and selecting the option Record a screencast.
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
import warnings
warnings.filterwarnings("ignore")
####### Load Dataset #####################
breast_cancer = datasets.load_breast_cancer(as_frame=True)
breast_cancer_df = pd.concat((breast_cancer["data"], breast_cancer["target"]), axis=1)
breast_cancer_df["target"] = [breast_cancer.target_names[val] for val in breast_cancer_df["target"]]
########################################################
st.set_page_config(layout="wide")
st.markdown("## Breast Cancer Dataset Analysis") ## Main Title
################# Scatter Chart Logic #################
st.sidebar.markdown("### Scatter Chart: Explore Relationship Between Measurements :")
measurements = breast_cancer_df.drop(labels=["target"], axis=1).columns.tolist()
x_axis = st.sidebar.selectbox("X-Axis", measurements)
y_axis = st.sidebar.selectbox("Y-Axis", measurements, index=1)
if x_axis and y_axis:
    scatter_fig = plt.figure(figsize=(6,4))
    scatter_ax = scatter_fig.add_subplot(111)
    malignant_df = breast_cancer_df[breast_cancer_df["target"] == "malignant"]
    benign_df = breast_cancer_df[breast_cancer_df["target"] == "benign"]
    malignant_df.plot.scatter(x=x_axis, y=y_axis, s=120, c="tomato", alpha=0.6, ax=scatter_ax, label="Malignant")
    benign_df.plot.scatter(x=x_axis, y=y_axis, s=120, c="dodgerblue", alpha=0.6, ax=scatter_ax,
                           title="{} vs {}".format(x_axis.capitalize(), y_axis.capitalize()), label="Benign");
########## Bar Chart Logic ##################
st.sidebar.markdown("### Bar Chart: Average Measurements Per Tumor Type : ")
avg_breast_cancer_df = breast_cancer_df.groupby("target").mean()
bar_axis = st.sidebar.multiselect(label="Average Measures per Tumor Type Bar Chart",
                                  options=measurements,
                                  default=["mean radius","mean texture", "mean perimeter", "area error"])
if bar_axis:
    bar_fig = plt.figure(figsize=(6,4))
    bar_ax = bar_fig.add_subplot(111)
    sub_avg_breast_cancer_df = avg_breast_cancer_df[bar_axis]
    sub_avg_breast_cancer_df.plot.bar(alpha=0.8, ax=bar_ax, title="Average Measurements per Tumor Type");
else:
    # Nothing selected: fall back to the four default measurements.
    bar_fig = plt.figure(figsize=(6,4))
    bar_ax = bar_fig.add_subplot(111)
    sub_avg_breast_cancer_df = avg_breast_cancer_df[["mean radius", "mean texture", "mean perimeter", "area error"]]
    sub_avg_breast_cancer_df.plot.bar(alpha=0.8, ax=bar_ax, title="Average Measurements per Tumor Type");
################# Histogram Logic ########################
st.sidebar.markdown("### Histogram: Explore Distribution of Measurements : ")
hist_axis = st.sidebar.multiselect(label="Histogram Ingredient", options=measurements, default=["mean radius", "mean texture"])
bins = st.sidebar.radio(label="Bins :", options=[10,20,30,40,50], index=4)
if hist_axis:
    hist_fig = plt.figure(figsize=(6,4))
    hist_ax = hist_fig.add_subplot(111)
    sub_breast_cancer_df = breast_cancer_df[hist_axis]
    sub_breast_cancer_df.plot.hist(bins=bins, alpha=0.7, ax=hist_ax, title="Distribution of Measurements");
else:
    # Nothing selected: fall back to the two default measurements.
    hist_fig = plt.figure(figsize=(6,4))
    hist_ax = hist_fig.add_subplot(111)
    sub_breast_cancer_df = breast_cancer_df[["mean radius", "mean texture"]]
    sub_breast_cancer_df.plot.hist(bins=bins, alpha=0.7, ax=hist_ax, title="Distribution of Measurements");
#################### Hexbin Chart Logic ##################################
st.sidebar.markdown("### Hexbin Chart: Explore Concentration of Measurements :")
hexbin_x_axis = st.sidebar.selectbox("Hexbin-X-Axis", measurements, index=0)
hexbin_y_axis = st.sidebar.selectbox("Hexbin-Y-Axis", measurements, index=1)
if hexbin_x_axis and hexbin_y_axis:
    hexbin_fig = plt.figure(figsize=(6,4))
    hexbin_ax = hexbin_fig.add_subplot(111)
    breast_cancer_df.plot.hexbin(x=hexbin_x_axis, y=hexbin_y_axis,
                                 reduce_C_function=np.mean,
                                 gridsize=25,
                                 #cmap="Greens",
                                 ax=hexbin_ax, title="Concentration of Measurements");
##################### Layout Application ##################
container1 = st.container()
col1, col2 = st.columns(2)
with container1:
    with col1:
        scatter_fig
    with col2:
        bar_fig
container2 = st.container()
col3, col4 = st.columns(2)
with container2:
    with col3:
        hist_fig
    with col4:
        hexbin_fig
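The bar chart and histogram sections above repeat their plotting code in the if and else branches only to fall back to default columns when nothing is selected. One possible way to avoid that repetition is sketched below. The helper columns_to_plot and the constant DEFAULT_BAR_COLUMNS are hypothetical names introduced here for illustration; they are not part of the dashboard code or of streamlit.

```python
DEFAULT_BAR_COLUMNS = ["mean radius", "mean texture", "mean perimeter", "area error"]


def columns_to_plot(selected, defaults):
    """Hypothetical helper: use the widget selection, or the defaults when it is empty."""
    return list(selected) if selected else list(defaults)


# An empty multi-select falls back to the defaults.
print(columns_to_plot([], DEFAULT_BAR_COLUMNS))

# A non-empty multi-select is used as-is.
print(columns_to_plot(["area error"], DEFAULT_BAR_COLUMNS))
```

With such a helper, the bar chart section could index the averages once, for example avg_breast_cancer_df[columns_to_plot(bar_axis, DEFAULT_BAR_COLUMNS)], and keep a single plotting branch.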
This ends our small tutorial explaining how we can create a basic dashboard using streamlit and matplotlib.