Data visualization guideline: Get the chart type right for your data

Tell me what data you have. I’ll show you which chart type you should choose to visualize your data. Last updated on November 12, 2023.

MasaKudamatsu
Masa’s Design Reviews

--

An assortment of chart type examples in this article

Motivation

Google “how to choose chart type”, and you’ll get tons of articles advising you on this first step of any data visualization. As far as I can see, all these guidelines introduce one chart type after another and then explain its pros and cons.

That is NOT how we work on data visualization. We don’t pick a chart type first and then choose the data appropriate for it. What kind of data analyst is that? :-)

We have data first, and an idea of what aspect of the data to visualize. Then we pick a chart type. A guideline should follow this sequence.

This article aims to be a guideline like that: first asking you which type of data you have and then suggesting the appropriate type of chart for your data.

1. Cross-sectional data

If your data concerns one point in time, say, the year of 2020, you have four options depending on what you want to emphasize.

1.1 Quantity comparison: Bar Chart

The most basic chart type, a bar chart, is appropriate for comparing quantities.

Image source: Google Charts

In the above example, the population in 2010 is compared across the largest U.S. cities.

The size of a bar is a powerful visual cue for a quantity. Placing these bars side by side helps us immediately see the difference in quantities across the units of observation.

1.2 Share comparison: Multiple Pie Charts

If your data involves percentages, a pie chart provides a powerful visual metaphor of each observation’s share because we’re so used to cutting a pie (or a cake) to share it among ourselves.

However, we should use small multiples of pie charts like this (Camoes 2013):

Image source: Figure 8C of Schwabish (2014)

Few (2007) explains why we should avoid a single pie chart with all the observations included in it. Click the link and read page 2. I’m sure you’ll be illuminated.

1.3 Correlation: Scatterplot

If your data tells us about the correlation between two characteristics, go with a scatterplot:

Image source: Google Charts

In the above example, each dot represents a student. The further to the right, the longer the student studied. The further to the top, the higher the student’s final grade. Perhaps the kind of chart you want to show to your children. :-)

It’s important to know, however, that correlation doesn’t necessarily imply causation. A scatterplot is only suggestive of a possible cause of something.

Image source: xkcd.com
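What a scatterplot shows visually can also be summarized as a single number, the Pearson correlation coefficient. The following is a minimal sketch in plain Python. The study hours and grades are made up for illustration and are not the data behind the Google Charts example, and the function name pearson_r is my own.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Co-movement of x and y around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: hours studied and final grade for six students.
hours = [0, 1, 2, 3, 4, 5]
grades = [60, 65, 63, 74, 80, 85]

r = pearson_r(hours, grades)
print(f"r = {r:.2f}")
```

A value close to 1 means the dots line up along an upward slope. Like the scatterplot itself, it says nothing about which characteristic causes which.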

1.4 Many observations with differences in their importance: Treemap

If your data contains many observations, some of which are more important than others, consider using a treemap:

S&P 500 stock price changes on March 15, 2021. Image source: finviz.com

In the above well-known example, the area of each rectangle represents the market capitalization of each of the 500 stocks, and its color indicates whether and by how much the stock price rose (green) or fell (red). More important companies such as Google, Facebook, and Amazon (see the top middle) take up more space, and thus attract more attention, than lesser-known companies. Still, if the stock prices of these lesser-known companies go up across the board, we can immediately see it as an expanse of green (as in the bottom-right quarter of the above treemap). So a treemap can also show the overall trend.

Treemaps also show observations in the same category next to each other. In the above example, companies are clustered by industry. This makes it easier to locate the observations that the viewer is particularly interested in.

So treemaps can visualize as many as three characteristics of many observations all at once: one by location (e.g. industry), another by size (e.g. market capitalization), and the third by color (e.g. stock price changes).

Another application I can think of is per capita GDP growth across countries. Location refers to continents, size to population, and color to economic growth. Such a treemap can tell us whether and where many people enjoy economic booms in a particular year, because countries with large populations will be visually “overrepresented”.

For more on treemaps, see Shneiderman and Plaisant (2014). It was Martin Wattenberg who first applied treemaps to stock market data back in 1998 (Wattenberg 2008).

2. Time-series data

If you have data tracking the same observation across many periods (e.g., the price of a particular stock, the annual number of deaths due to traffic accidents), you have five choices.

2.1 Changes are more important: Line Chart

Image source: Google Charts

A line chart visualizes changes across time very effectively, because the slope of a line immediately tells us whether it’s an increase or a decline.

In particular, a line chart is powerful when a decline needs to be emphasized. By default, everything falls to the ground due to gravity, so the downward trajectory of a falling object is an image we’re very used to. This cartoon exploits this fact very effectively:

Page 1 of Extracts from an Investigation into the Physical Properties of Books, written by William Addison Dwiggins in 1919. Image source: Wikimedia Commons

2.2 Levels are more important: Bar Chart

In some cases, however, levels, not changes, are more important as in the following example:

The front page of The New York Times on March 27, 2020. Image source: Gakushi Fujiwara

It’s a great example of editorial design combined with data visualization. With the pandemic, the number of job losses in the U.S. hit an unprecedented level. A line chart wouldn’t be able to emphasize this, because it shows the level as just a tiny end point, whereas a bar chart shows it as a super-long rectangle.

Another example of time-series data better represented as a bar chart is Scarr (2011):

It is so effective that I don’t think it needs any explanation. Here is a commentary on this chart by Kosara (2021a).

2.3 Changes are small relative to levels: Bar Chart on changes

Even if changes are more important, there is one situation where the bar chart is more appropriate: when changes are small relative to levels.

Monthly sales of a fictitious coffee chain over January to October. Image source: Kosara (2013)

In this example, monthly changes are small relative to the levels. As a result, it is difficult to compare the degree of changes from month to month.

Kosara (2013) proposes the following solution in this kind of situation: plotting the changes, not the levels, as bars.

Monthly changes in sales of a fictitious coffee chain over January to October. Image source: Kosara (2013)

Now it is very clear in which months there are big changes, positive and negative.
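The transformation behind this second chart is a simple first difference. Here is a minimal sketch in plain Python. The sales figures are hypothetical, not the ones in Kosara (2013), and the function name is my own.

```python
# Hypothetical monthly sales (not the figures from Kosara 2013).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [1000, 1010, 1005, 1030, 1020, 1025]


def month_over_month_changes(levels):
    """Return the change from each period to the next (one item shorter than the input)."""
    return [later - earlier for earlier, later in zip(levels, levels[1:])]


changes = month_over_month_changes(sales)

# Each change is labelled with the month in which it is observed.
for month, change in zip(months[1:], changes):
    print(f"{month}: {change:+d}")
```

These changes, rather than the sales themselves, are what the bars would represent, with the baseline at zero so that positive and negative bars point in opposite directions.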

2.4 Distribution: Scatterplot

If we have time-series data on a distribution, we can use a scatterplot:

Image source: Handa (2019)

In this chart, the distribution of skin tones for the fashion models that appear on the cover of Vogue magazine is plotted over the years. In this particular example, plotting the average or median in a line chart cannot show the whole picture because the diversity of skin tones is what concerns us. A scatterplot allows us to see how heterogeneous or homogeneous Vogue cover models are over time.
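A small numerical sketch may help show why an average alone can mislead here. The lightness scores below are invented for illustration and have nothing to do with the data in Handa (2019). The two hypothetical years share the same mean but differ greatly in spread.

```python
from statistics import mean, pstdev

# Hypothetical skin-tone lightness scores (0 = darkest, 1 = lightest) for
# cover models in two years. Invented for illustration only.
scores_by_year = {
    "year A": [0.70, 0.72, 0.74, 0.76, 0.78],
    "year B": [0.50, 0.62, 0.74, 0.86, 0.98],
}

# For each year, compute the average and the spread (population standard deviation).
summary = {
    year: {"mean": mean(scores), "spread": pstdev(scores)}
    for year, scores in scores_by_year.items()
}

for year, stats in summary.items():
    print(f"{year}: mean = {stats['mean']:.2f}, spread = {stats['spread']:.2f}")
```

A line chart of the mean would draw these two years at the same height. A scatterplot of the individual scores would show a tight cluster in one year and a wide column of dots in the other.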

2.5 Share to the whole after a particular time: Pie Chart

You might be surprised to learn you can use a pie chart for time-series data. Here is a rare example where a pie chart is effective:

Image source: Cal Fire (2021)

This pie chart plots the 20 largest wildfires in California, ordered by date. The size of each slice represents the area burnt.

Kosara (2021b) argues that, in this particular example, a pie chart is successful because it works well to “get the point across about the vast majority of the largest wildfires having happened in the last 20 years.”

And I agree. It is staggering that the last 2 years account for more than half of the damage over the last 90 years, and this point is very powerfully demonstrated with the use of a pie chart.

3. Two-period panel data

If you observe data at two points in time, say, this week and last week, you have three options depending on the type of data you compare across the two periods.

3.1 Quantity comparison: Paired Bar Chart

If a bar chart is appropriate for quantity comparison at one point in time, then it’s a logical progression that pairing bars for each observation will help compare quantities across two points in time:

Image source: Google Charts

For each observation, we can immediately see whether the quantity increases or declines. In the above example, it’s very easy to tell that only Chicago’s population declined from 2000 to 2010.

A drawback of a paired bar chart is the difficulty of comparison across observations at each point in time. Applying a different color to each period as in the above example helps a bit (squint your eyes to focus on red or blue), but not enough.

The “time” dimension doesn’t need to be time, however. Take this example:

Image source: Dholakia (2023)

Here the unit of observation is the type of video game (action, platform, racing, role-playing, and shooter). Each unit has two observations: one for the EU and the other for North America. We can think of space as “time” here.

3.2 Share comparison: Stacked Bar Chart

If your data contains percentages across two time periods, go for a stacked bar chart:

Image source: Google Charts (adapted by the author)

You could use a paired bar chart, but a stacked bar chart is more effective: it shows each observation’s share at a given point in time as well as the changes in those shares across periods.

You can also use a stacked bar chart across more than two periods like this one:
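To make the mechanics of stacking concrete, here is a minimal sketch in plain Python of how shares turn into bar segments. The categories and counts are hypothetical, and the function names to_shares and stack_segments are my own, not part of any charting library.

```python
def to_shares(counts):
    """Convert raw counts per category into percentage shares that sum to 100."""
    total = sum(counts.values())
    return {category: 100 * value / total for category, value in counts.items()}


def stack_segments(shares):
    """Return (category, bottom, height) for each segment of one stacked bar."""
    segments = []
    bottom = 0.0
    for category, share in shares.items():
        segments.append((category, bottom, share))
        bottom += share  # the next segment starts where this one ends
    return segments


# Hypothetical counts for three categories in two periods.
counts_by_period = {
    "2000": {"A": 50, "B": 30, "C": 20},
    "2010": {"A": 40, "B": 40, "C": 20},
}

for period, counts in counts_by_period.items():
    print(period, stack_segments(to_shares(counts)))
```

Every bar ends at 100, which is what makes the shares within one period easy to read. Only the first segment starts from a common baseline across periods, so it may be worth placing the category you care most about at the bottom.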

Japanese Prime Minister’s approval (in blue) & disapproval (in red) ratings from November 2013 to November 2020. The grey indicates the share of people who did not answer. Image source: Asahi Shinbun Digital

I find this particular example a great piece of data visualization. See my other article (Kudamatsu 2020) for why I think so.

3.3 Ranking: Slope Chart

If your data contains ranks or if you want to stress changes in the ranking across two periods, consider using a slope chart:

Image source: Figure 9D of Schwabish (2014)

In the above example, the sources of income for people above the retirement age in the U.S. are compared between 1962 and 2009. It is immediately clear that the Other category slips to the bottom of the ranking, as its downward-sloping line crosses many others.

Unlike a stacked bar chart, however, a slope chart should not be used for multiple periods. It is going to be very messy with many lines crossing each other.
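The rank changes that a slope chart makes visible can also be computed before plotting, which helps decide whether a slope chart is worth drawing at all. Below is a minimal sketch in plain Python. The category names and shares are hypothetical, not the figures in the chart above, and the function name is my own.

```python
def rank_descending(values):
    """Map each category to its rank (1 = largest value). Ties keep input order."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {category: position for position, category in enumerate(ordered, start=1)}


# Hypothetical shares (%) of four income sources in two periods.
shares_then = {"Source A": 30, "Source B": 25, "Other": 28, "Source C": 17}
shares_now = {"Source A": 38, "Source B": 30, "Other": 12, "Source C": 20}

rank_then = rank_descending(shares_then)
rank_now = rank_descending(shares_now)

# Positive = moved down the ranking, negative = moved up, zero = unchanged.
rank_change = {category: rank_now[category] - rank_then[category] for category in shares_then}
print(rank_change)
```

In this made-up data, the Other category drops two places, so its line would cross two others. If every rank change were zero, no lines would cross, and a paired bar chart might serve just as well.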

4. Multi-period panel data

If your data spans across many periods for multiple units of observation, you have five options.

4.1 Difference between two units: Overlapping Area Chart

If there are only two units of observation, and if you want to stress when there is a big difference between the two, the overlapping area chart is the way to go.

Image source: Yi (2021)

In the above example, there are just two units of observation: the number of entries to and the number of exits from Embarcadero subway station in San Francisco. It is visually very clear that entries exceed exits during the morning hours while exits exceed entries during the evening hours.

Once there are more than two units of observation, however, the overlapping area chart gets very messy. As both Yi (2021) and Google (undated) recommend, use it only with two units of observation.

4.2 Changes in the total: Stacked Area Chart

If you want to indicate changes in the total over time, the stacked area chart may be a good option:

Image source: Yi (2021)

In the above example, the total number of active users is clearly shown as well as its breakdown into trial users, basic plan users, and premium plan users.

A drawback is that each unit’s size is gauged with the width of a color band, which is not easy (or actually impossible) to perceive (Few 2011).

If changes in the level of each unit of observation are also important, it is best to show a line chart for the total (see Section 2.1 above) along with sparklines for each unit, which we discuss next.

4.3 Many units of observation: Sparklines

With many units of observation, data points are spread all over two dimensions (e.g. time and space), making any chart too complicated to understand quickly.

Sparklines are a great option in this case:

Image source: Figure 6B of Schwabish (2014)

In the above example, we have four causes of disability: circulatory, mental, musculoskeletal, and cancer. For each, the data indicates its share in the total disability insurance awarded in the U.S. in each year from 1975 to 2010.

By repeating the same line chart with all but one series alternately grayed out, we can clearly see how each observation’s quantity changes over time in comparison to other observations.

This is a great technique to deal with many data points in two dimensions.

4.4 Changes compared to a particular date: Index Chart

Stock price changes relative to February 2005 (image source: Protovis)

If all that matters is the change relative to the level on a particular date, an index chart is the way to go. It is essentially a line chart, but each value is divided by the same observation’s value on a reference date. Subtracting 100% from this ratio then makes the y-axis show the percentage change relative to the reference date.

In the above example, changes in stock prices are shown relative to February 2005. Stock traders do not care about the absolute level of stock prices. They only care about whether the price goes up or down relative to the one on the date of purchase.
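The transformation behind an index chart can be sketched in a few lines of plain Python. The stock names and prices below are hypothetical, and the function name to_index is my own.

```python
def to_index(prices, reference):
    """Return the percentage change of each price relative to prices[reference]."""
    base = prices[reference]
    # Ratio to the reference value, minus 100%, expressed in percent.
    return [100 * (price / base - 1) for price in prices]


# Hypothetical monthly closing prices for two stocks.
prices = {
    "Stock A": [50.0, 55.0, 60.0, 45.0],
    "Stock B": [200.0, 190.0, 220.0, 240.0],
}

reference = 0  # position of the reference date in each series
indexed = {name: to_index(series, reference) for name, series in prices.items()}

for name, series in indexed.items():
    print(name, [round(value, 1) for value in series])
```

After the transformation, every line passes through 0% on the reference date, so a stock priced at 50 and another priced at 200 become directly comparable. Choosing a different reference date rescales every line.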

4.5 Two outcome measures over time: Connected scatterplots

So far in Section 4, we have only looked at the data with a single outcome. What if we have two outcomes over time for several units of observation?

As long as the number of units is up to three or four, we can use a connected scatterplot:

Image source: Koivunen-Niemi and Koponen+Hildén (2021)

Here eight countries from the Nordic and Baltic regions are plotted over time on two outcomes: spending on higher education and the share of people going to college.

This article is based on my thoughts inspired by reading Schwabish (2014)’s excellent discussions on data visualization (during my long flight from Osaka to Madrid back in February 2017).

Any feedback or suggestions would be much appreciated. Post a comment below.

I’ll keep updating this article as I learn more about data visualization. Bookmark this page and come back whenever you need to create a new chart.

Stay tuned.

Changelog

Nov 12, 2023: Section 2.2 is expanded with an additional example.

Nov 11, 2023: Section 2.5 is added.

Mar 11, 2023: Section 2.4 is added.

Mar 10, 2023: Section 4.5 is added.

Mar 6, 2023: Section 4.1 is expanded.

Feb 28, 2023: Section 4.4 is added.

Feb 9, 2023: Section 4 is expanded.

Feb 6, 2023: Section 2.3 is added.

References

Cal Fire. (2021) “All but 3 of the Top 20 Largest #Wildfires have occurred since 2000…”, Twitter, Sep 11, 2021.

Camoes, Jorge. (2013) “Finally revealed: the optimal number of categories in a pie chart”, Excelcharts.com, Feb 14, 2013.

Dholakia, Sara. (2023) “A Guide To Getting Data Visualization Right”, Smashing Magazine, Jan 5, 2023.

Few, Stephen. (2007) “Save the Pies for Dessert”, Visual Business Intelligence Newsletter, August, 2007.

Few, Stephen. (2011) “Quantitative Displays for Combining Time-Series and Part-to-Whole Relationships”, Visual Business Intelligence Newsletter, January-March, 2011.

Google. (undated) “Data visualization”, Material Design 2, undated.

Handa, Malaika. (2019) “Colorism in High Fashion”, The Pudding, April 2019.

Kosara, Robert. (2013) “Continuous Values and Baselines”, Eagereyes, April 2013.

Kosara, Robert. (2021a) “New video: Chart Appreciation, Iraq’s Bloody Toll by Simon Scarr”, Eagereyes, Jul 22, 2021.

Kosara, Robert. (2021b) “Can A Timeline Pie Chart Work?”, Eagereyes, Sep 16, 2021.

Kudamatsu, Masa. (2020) “Stacked bar chart for approval ratings over time”, medium.com, Nov 22, 2020.

Scarr, Simon. (2011) “Iraq’s bloody toll”, South China Morning Post, Dec 17, 2011.

Shneiderman, Ben, and Catherine Plaisant. (2014) “Treemaps for space-constrained visualization of hierarchies”, University of Maryland Human-Computer Interaction Lab, Sep. 2014.

Schwabish, Jonathan A. (2014) “An Economist’s Guide to Visualizing Data”, Journal of Economic Perspectives, 28(1): 209–34.

Koivunen-Niemi, Laura, and Koponen+Hildén. (2021) “Learn to Create a Connected Scatter Plot Using Python With Data From Eurostat (2019)”, Sage Research Methods: Data Visualization, Apr 20, 2021.

Sleeper, Ryan. (undated) “How to Make Connected Scatter Plots in Tableau”, Playfair Data Visual Analytics and Tableau Blog, undated.

Wattenberg, Martin. (2008) “Map of the Market (1998)”, bewitched.com, Aug. 2008.

Yi, Mike. (2021) “A Complete Guide to Area Charts”, Chartio, 2021.
