grafixgeek.com

data driven graphics made easy

Why is data visualisation important?

Posted on October 19, 2015

It has often been said that "a picture paints a thousand words". Conventional wisdom is often bandied about by means of quotes. But is it in fact true that a picture tells a better story than words? Or, in the case of data visualisation, a better story than figures?

In general, I think the only answer can be that it is subjective and depends on the situation. To be sure, graphic visualisations can mislead just as easily as they can inform. Any kind of information presentation requires care and honesty, whether it be through text or images.

It is no secret that we live in a world of information overload. Younger generations have grown up in a graphics-intensive world where few can detect patterns among rows and columns of numbers. But almost anyone can interpret a properly constructed data visualisation.

Anscombe's Quartet

In 1973 the statistician Francis Anscombe demonstrated the importance of graphing data when he produced four sets of figures that have almost identical statistical properties yet appear very different when graphed (Anscombe, 1973). Tables and graphs courtesy of Wikipedia.

Anscombe's quartet

     I              II             III             IV
   x      y       x      y       x      y       x      y
10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
 8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
 9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
 6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
 4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
 7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
 5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89

The table above shows Anscombe's four data sets. Even though each set is small, it is almost impossible to see how the values x and y relate to each other just by looking at these figures. For this reason it is very common to summarise data with statistics. Statistical properties can be hard to interpret for anyone not trained in such methods, and as this constructed case shows, even they can be quite uninformative. The table below lists the statistical properties of each data set.

Property                                    Value
Mean of x in each case                      9 (exact)
Sample variance of x in each case           11 (exact)
Mean of y in each case                      7.50 (to 2 decimal places)
Sample variance of y in each case           4.122 or 4.127 (to 3 decimal places)
Correlation between x and y in each case    0.816 (to 3 decimal places)
Linear regression line in each case         y = 3.00 + 0.500x (to 2 and 3 decimal places, respectively)
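
For readers who want to check these figures themselves, here is a minimal Python sketch that recomputes each property from the data table. The names QUARTET and summarise are my own for illustration, and the regression is an ordinary least-squares fit of y on x, which I am assuming is what the table reports.

```python
from math import sqrt
from statistics import mean, variance

# Data sets I to III share the same x values.
X_COMMON = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]

# Each entry maps a data set name to its (x values, y values).
QUARTET = {
    "I": (X_COMMON,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_COMMON,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_COMMON,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarise(xs, ys):
    """Return the summary statistics listed in the table for one data set."""
    mean_x, mean_y = mean(xs), mean(ys)
    # Sums of squared deviations and cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # least-squares slope of y on x
    return {
        "mean_x": mean_x,
        "var_x": variance(xs),  # sample variance (divides by n - 1)
        "mean_y": mean_y,
        "var_y": variance(ys),
        "corr": sxy / sqrt(sxx * syy),  # Pearson correlation
        "intercept": mean_y - slope * mean_x,
        "slope": slope,
    }


for name, (xs, ys) in QUARTET.items():
    s = summarise(xs, ys)
    print(f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.3f} "
          f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.3f} "
          f"corr={s['corr']:.3f} y = {s['intercept']:.2f} + {s['slope']:.3f}x")
```

Running it prints one line per data set, which makes it easy to compare the four sets side by side.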

And yet when we graph the data we get four completely different pictures:
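
One dependency-free way to see the difference is a rough text scatter plot. The sketch below is only an illustration: the function ascii_scatter, the grid size and the axis ranges are my own assumptions, and a proper charting library would give a much nicer result.

```python
# Data sets I to III share the same x values.
X_COMMON = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]

# Each entry maps a data set name to its (x values, y values).
QUARTET = {
    "I": (X_COMMON,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_COMMON,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_COMMON,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8.0] * 7 + [19.0] + [8.0] * 3,
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def ascii_scatter(xs, ys, width=40, height=12, x_range=(0, 20), y_range=(0, 14)):
    """Return a text scatter plot of the points as a list of strings."""
    grid = [[" "] * width for _ in range(height)]
    (x_lo, x_hi), (y_lo, y_hi) = x_range, y_range
    for x, y in zip(xs, ys):
        # Scale each point onto the character grid.
        col = round((x - x_lo) / (x_hi - x_lo) * (width - 1))
        row = round((y - y_lo) / (y_hi - y_lo) * (height - 1))
        # Row 0 is the top line of text, so flip the y axis.
        grid[height - 1 - row][col] = "*"
    # Frame the grid with a simple y axis and x axis.
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]


for name, (xs, ys) in QUARTET.items():
    print(f"Data set {name}")
    print("\n".join(ascii_scatter(xs, ys)))
    print()
```

All four plots use the same axis ranges so that the shapes can be compared directly. Points that land in the same character cell are drawn as a single star.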


However, data visualisation is more than just graphing datasets. It is a form of storytelling. The rise of the infographic phenomenon is testament to this, and data is the basis of these stories whether it is presented in a traditional graph format or not.

Cole Nussbaumer (Saden, 2015) says that good data visualisation leads to a good understanding of the data. If it delivers that "A-ha" moment, then it has done its job well.

It doesn't matter how good your data is: if you can't communicate it effectively, your message will be lost.

References

  • Anscombe, F.J. (1973). Graphs in Statistical Analysis. The American Statistician, Vol. 27, No. 1 (Feb., 1973), pp. 17-21.
  • Saden, C. (2015). Defining Data Visualization - Data Visualization and D3.js. YouTube. https://www.youtube.com/watch?v=TMH43G11OnQ.
  • Wikipedia. (2015). Anscombe's quartet. https://en.wikipedia.org/wiki/Anscombe%27s_quartet.