In this blog, we'll discuss the basic principles of visualizing data that you have collected and analyzed, and what to keep in mind when designing visualizations that make sense to the human brain. Our main focus will be on ways to present data so that it is intuitive and can be interpreted easily without any kind of training.
Data visualization is divided into 3 categories:
Examples: static plots using matplotlib, seaborn, etc.
Examples: dashboards using dash, plotly, bokeh, voila, panel, etc.
We'll be concentrating on Information Visualization in this blog.
Information visualization refers to the visual representation of information, keeping the following goals in mind:
Our main focus will be the effectiveness of visualized information, as it helps with faster interpretation, more distinctions and fewer errors.
As the amount of data increases over time, we need an efficient way to represent it so that meaningful insight can be derived from it, which would otherwise be impossible by going through the data manually.
One more motivation behind information visualization is that the human visual system is the highest-bandwidth channel to the human brain. Our brain can easily interpret information shown on a screen that has almost a million pixels of data.
The human brain is also extremely good at detecting patterns in data represented visually.
Data visualizations can reveal patterns in data that you might not be able to find using summary statistics alone.
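As a small illustration of this point, the sketch below builds two made-up series that share exactly the same mean and standard deviation but relate to `x` in completely different ways. The data and the names `y_trend`, `y_scrambled`, `summary` and `pearson` are invented for this example. The summary statistics cannot tell the two series apart, whereas a scatter plot of each against `x` would show a straight line in one case and no clear pattern in the other.

```python
import statistics

# Hypothetical data, invented for illustration only.
x = list(range(10))
y_trend = [2 * v + 1 for v in x]                   # rises steadily with x
y_scrambled = [9, 15, 1, 19, 5, 11, 17, 3, 13, 7]  # same values, reordered


def summary(values):
    """Return (mean, population standard deviation) of a sequence."""
    return statistics.mean(values), statistics.pstdev(values)


def pearson(a, b):
    """Pearson correlation coefficient, computed from its definition."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    spread_a = sum((p - mean_a) ** 2 for p in a) ** 0.5
    spread_b = sum((q - mean_b) ** 2 for q in b) ** 0.5
    return cov / (spread_a * spread_b)


# Both series have the same mean and standard deviation...
print(summary(y_trend), summary(y_scrambled))
# ...but very different relationships with x.
print(pearson(x, y_trend), pearson(x, y_scrambled))
```

The correlation is computed here only to confirm numerically what a plot would show at a glance.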
We'll be covering the following important topics of information visualization, which can help one represent data accurately:
In order to visualize data, we need to map datasets to visual attributes. This is also referred to as data encoding. It generally consists of two steps:
Please look at the small dataset below and try to identify which data type (nominal, ordinal or quantitative) each column represents.
ANSWER:
French cartographer Bertin presented 7 key visual attributes that can be used to represent data in 0/1/2 dimensions. The 7 attributes are listed below:
Bertin also provided levels of organization which you can use with various data types:
Attribute | Level of organization |
---|---|
Position | N O Q |
Size | N O Q |
Value | N O q |
Texture | N o |
Color | N |
Orientation | N |
Shape | N |
N - Nominal, O - Ordered, Q - Quantitative
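The table above can be written down as a small lookup, which makes it easy to check whether a visual attribute suits a given data type. This is only a sketch: the names `LEVELS`, `support` and `attributes_for` are made up for this example, and treating a lowercase letter as "supported, but less well" is an assumption, since the table itself does not spell out what lowercase means.

```python
# Bertin's levels of organization, copied from the table above.
# Uppercase = supported; lowercase is ASSUMED here to mean weaker support.
LEVELS = {
    "Position": "NOQ",
    "Size": "NOQ",
    "Value": "NOq",
    "Texture": "No",
    "Color": "N",
    "Orientation": "N",
    "Shape": "N",
}


def support(attribute, data_type):
    """Return 'full', 'partial' or 'none' for a data type code N, O or Q."""
    levels = LEVELS[attribute]
    if data_type.upper() in levels:
        return "full"
    if data_type.lower() in levels:
        return "partial"
    return "none"


def attributes_for(data_type):
    """List the attributes that fully support the given data type."""
    return [name for name in LEVELS if support(name, data_type) == "full"]


print(attributes_for("Q"))    # ['Position', 'Size']
print(support("Value", "Q"))  # partial
print(support("Color", "O"))  # none
```

Every attribute supports nominal data, while only position and size fully support quantitative data under this reading of the table.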
We can deduce the following points from the table above:
Please go through the image below and identify which data types and mappings are represented in it:
ANSWER:
The above graph represents 2 different variables:
Please go through the image below and identify which data types and mappings are represented in it:
ANSWER:
The above graph represents 4 different variables:
We'll now look at the effect of dimensionality on the mapping of data types to visual attributes. We'll consider data of different dimensions and which visualizations can be used to represent it.
A dataset with a single variable can be represented with various plots like line chart, bar chart, box plot, dot chart, etc.
A dataset with 2 variables can be easily described using 2D scatter plots.
A dataset with 3 variables can be represented with 3D scatter plots but we can't really see an exact representation of the 3rd dimension on a 2-dimensional surface.
As we can see above, we can't really tell where E and F are in relation to one another in the 3D scatter plot. Hence, it's a better choice to use a 2D scatter plot with the 3rd dimension represented by another visual attribute.
Two variables [x, y] can map to points (scatter plots, maps, etc.). The third variable [z] must use color, size, shape, etc.
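A minimal sketch of this idea with matplotlib is shown below: `x` and `y` are mapped to position, while `z` is mapped to marker size and color. The data, the helper names `scale_sizes` and `plot_three_variables`, the size range and the output file name are all assumptions made for illustration. matplotlib is imported inside the plotting function so that the size-scaling helper can be tried without it.

```python
def scale_sizes(z, min_size=20.0, max_size=200.0):
    """Linearly rescale z values to marker areas between min_size and max_size."""
    low, high = min(z), max(z)
    if high == low:                       # all values equal: use the midpoint
        return [(min_size + max_size) / 2.0] * len(z)
    span = high - low
    return [min_size + (v - low) / span * (max_size - min_size) for v in z]


def plot_three_variables(x, y, z, path="scatter.png"):
    """Draw x and y as position and z as both marker size and color."""
    import matplotlib
    matplotlib.use("Agg")                 # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    points = ax.scatter(x, y, s=scale_sizes(z), c=z, cmap="viridis")
    fig.colorbar(points, ax=ax, label="z")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(path)
    plt.close(fig)


# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
z = [10, 20, 30, 40, 50]
print(scale_sizes(z))   # [20.0, 65.0, 110.0, 155.0, 200.0]
```

Calling `plot_three_variables(x, y, z)` writes the figure to `scatter.png`. Encoding `z` twice (size and color) is redundant on purpose here; either attribute alone would also carry the third variable.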
How many variables can be depicted in an image?
Past research suggests that it's not possible to go beyond 3 variables using position alone on a 2D surface, but one can use additional visual attributes to represent data with more than 3 dimensions.
Information visualization is all about choosing effective visual encodings to represent information from a given dataset. But choosing the best encoding (or mapping) from many possibilities is a challenge. Hence, we'll consider the following 3 basic principles when deciding on the best encoding:
Decades of research have identified the following perceptual properties of the human brain:
Our brain can accurately judge differences in position and length, and less accurately differences in color and density. For the best results, one should order variables by importance and give the most important ones the attributes that are perceived most accurately.
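One way to apply this ordering is to rank the variables by importance and hand out visual attributes from the most to the least accurately perceived. The sketch below does just that. The function name and the variable names are hypothetical, and the list only reflects the ordering stated above; the relative order of position versus length, and of color versus density, is an assumption.

```python
# Attributes ordered from most to least accurately perceived, following the
# text above. The order inside each pair (position/length, color/density)
# is an assumption made for this sketch.
CHANNELS_BY_ACCURACY = ["position", "length", "color", "density"]


def assign_channels(variables_by_importance):
    """Pair each variable (most important first) with the best free attribute."""
    if len(variables_by_importance) > len(CHANNELS_BY_ACCURACY):
        raise ValueError("more variables than available attributes")
    return dict(zip(variables_by_importance, CHANNELS_BY_ACCURACY))


# Hypothetical variable names, invented for illustration only.
print(assign_channels(["revenue", "units_sold", "region"]))
# {'revenue': 'position', 'units_sold': 'length', 'region': 'color'}
```

A real assignment would also have to respect the data type of each variable, for example not using an unordered attribute for a quantitative variable.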
Expressiveness is defined as below: a visual representation should express all the facts in the set of data, and only the facts in the data.
Some examples of expressiveness: we cannot use color/hue to show that one value is greater than another, and we cannot use the length attribute to represent a nominal variable.
Consistency means that the properties of the image (visual attributes) should match the properties of the data. E.g., don't map one-dimensional data to two- or three-dimensional representations.
Please go through the dataset below and try to create visualizations that take into account as many dimensions as possible.
ANSWER
Please find below one possible answer showing how the different variables can be encoded. Please note that this is not necessarily the most efficient answer; there can be other, more efficient ways to represent the data.
Please go through the image below and try to answer whether it's an effective visual representation of the data. If yes, try to reason why; if not, why not.
ANSWER
NO. The above visual representation is not expressive because it implies incorrect ordinal relationships among countries.
There are 5 different ways to increase the amount of information encoded by a visual representation on a 2D surface.
So far we have discussed data types, visual mappings, etc. Now we'll move to the other end of the spectrum and talk about how the human mind processes that information. It's important to understand how visual perception works in order to design visualizations effectively.
70% of our body's sense receptors reside in our eyes. The eye and the visual cortex of the brain form a massively parallel processor that provides the highest-bandwidth channel to human cognitive centers. It's important to keep in mind that the eye is not a camera and that attention is selective. A camera has good optics, whereas the eye has relatively poor optics. A camera has a single focus, white balance and exposure, whereas the eye is constantly scanning, adjusting focus and adapting. A camera captures the full image, whereas the eye works with a mental reconstruction of the image.
Please check the example below, which shows what should be kept in mind when designing a visualization.
Please check the example below, which illustrates the above point clearly.
Image 1:
Image 2:
We can clearly see that it takes more time to count the number of fives in the first image than in the second image. It's therefore important, when designing a visualization, to consider what can be perceived immediately, which properties are good discriminators, and which can mislead viewers.
Below is a list of visual properties which are preattentive:
Color (Hue) is preattentive: Detection of the red circle in a group of blue circles is preattentive in the visualization below.
Form (Curvature) is preattentive: A curved form "pops out" of the display in the image below.
Detection of the slanted line in a sea of vertical lines is preattentive.
Note: While color and form can each be preattentive, the conjunction of both is not. We can see in the visualization below that it's hard to find a red circle in a sea of red square and blue circle distractors.
To understand magnitude estimation, let's start with a simple guessing example. Please try to guess how the area of the small circle relates to the area of the big circle in the image below. Make two guesses after some consideration.
ANSWER
The correct answer is that the big circle's area is 25 times that of the small circle.
Research shows the following magnitudes by which people underestimate/overestimate various properties:
Length: 0.9 to 1.1
Area: 0.6 to 0.9 (Underestimation)
Volume: 0.5 to 0.8 (Even more underestimation)
In the above image, the majority of people think that the big circle has an area of around 16 times that of the small circle. People generally estimate the magnitude of a line's length accurately.
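One way to read the ranges above (0.9 to 1.1 for length, 0.6 to 0.9 for area, 0.5 to 0.8 for volume) is as exponents in a power-law model, where the perceived ratio equals the actual ratio raised to the exponent. This reading is an assumption on our part, since the text does not state the model, and the function names below are hypothetical. Under that assumption, an exponent of about 0.86, which lies inside the area range, turns the true ratio of 25 into a perceived ratio of roughly 16, in line with what most people guess.

```python
def perceived_ratio(actual_ratio, exponent):
    """Perceived magnitude ratio under an ASSUMED power-law model."""
    return actual_ratio ** exponent


def compensated_ratio(intended_ratio, exponent):
    """Actual ratio to draw so that viewers perceive intended_ratio."""
    return intended_ratio ** (1.0 / exponent)


# The big circle has 25 times the area of the small one.
for exponent in (0.6, 0.86, 0.9):          # 0.6 and 0.9 bound the area range
    print(exponent, round(perceived_ratio(25, exponent), 1))
```

The second helper inverts the model: if viewers shrink area ratios in their heads, the drawn ratio has to be exaggerated for the intended ratio to come across. In practice it is usually simpler to encode the quantity with length or position instead.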
The image below shows how good the human eye is at estimating relative magnitude for various attributes.
We can notice above that the human eye estimates magnitude most accurately for position and least accurately for color. This suggests that color (hue/saturation/value) is not a good attribute for representing a quantitative variable.
Below is a list of important points to critique when evaluating a visualization:
The basic principles mentioned above, if followed well, will help you design visualizations effectively and get meaningful insights from data. It's also good practice to critique a visualization after it's designed by asking the questions mentioned above, as doing so will help improve the end result.
If you want to