Order categories by a meaningful value

Rafael Irizarry

When one of the axes is used to show categories, as in barplots, the categories are ordered alphabetically by default when they are defined by character strings. If they are defined by factors, they are ordered by the factor levels. We rarely want alphabetical order. Instead, we should order by a meaningful quantity. In all the cases above, the barplots were ordered by the values being displayed. The exception was the pair of barplots comparing browsers. In that case, we kept the order the same across the two barplots to ease the comparison. Specifically, instead of ordering the browsers separately for each year, we ordered both years by the average of the 2000 and 2015 values.

We previously learned how to use the reorder function, which helps us achieve this goal. To appreciate how the right order can help convey a message, suppose we want to create a plot to compare the murder rate across states. We are particularly interested in the most dangerous and safest states. Note the difference when we order alphabetically (the default) versus when we order by the actual rate:

Two bar graphs that show the murder rate for each state and the District of Columbia. One graph is ordered alphabetically; the other is ordered from highest to lowest murder rate. Vermont has the lowest murder rate, while the District of Columbia has the highest.
The murder rate of each state, ordered alphabetically in one graph and from highest to lowest murder rate in the other.
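
The contrast can be reproduced with a minimal sketch in base R. It assumes the built-in `USArrests` data (murder arrests per 100,000 residents in 1973) as a stand-in, chosen only so the code runs without extra packages; it is not the dataset behind the figure above. The names `df`, `alphabetical`, `by_rate`, and `draw` are illustrative.

```r
# Stand-in data (assumption): USArrests ships with base R and is NOT
# the dataset used for the figure in the text.
df <- data.frame(state = rownames(USArrests), rate = USArrests$Murder)

# Default behaviour: character strings become alphabetical factor levels.
alphabetical <- factor(df$state)

# reorder() sorts the levels by the value being displayed instead.
by_rate <- reorder(df$state, df$rate)

# Helper (hypothetical name): draw horizontal bars in the order of the levels.
draw <- function(f, title) {
  idx <- match(levels(f), df$state)  # row of each level, in level order
  barplot(df$rate[idx], names.arg = levels(f), horiz = TRUE,
          las = 1, cex.names = 0.4, main = title)
}

# Draw both versions side by side.
old_par <- par(mfrow = c(1, 2))
draw(alphabetical, "Alphabetical")
draw(by_rate, "Ordered by rate")
par(old_par)
```

Horizontal bars are drawn from the bottom up, so in the second panel the lowest rate sits at the bottom and the highest at the top, which makes the extremes easy to find.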

Earlier we saw an example related to income distributions across regions. Here are the two versions plotted against each other:

Two dot plots show the average income per day of countries grouped by region. The lowest earners are in Southern Asia and Western Africa, the highest in Northern America and Europe. One graph orders the regions alphabetically; the other orders them from lowest to highest.
Average daily income by region, ordered alphabetically in one plot and from lowest to highest in the other.

The first orders the regions alphabetically, while the second orders them by each group’s median.
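
When each category holds several observations, `reorder` needs a summary function to rank the groups. The sketch below passes `FUN = median` and assumes the built-in `InsectSprays` data as a stand-in for the income data, which is not reproduced here; the name `by_median` is illustrative.

```r
# Stand-in data (assumption): InsectSprays ships with base R; it has one
# numeric column (count) and one grouping column (spray, levels A to F).
by_median <- reorder(InsectSprays$spray, InsectSprays$count, FUN = median)

levels(InsectSprays$spray)  # original, alphabetical order
levels(by_median)           # levels sorted by each group's median

# Compare the two orderings side by side.
old_par <- par(mfrow = c(1, 2))
boxplot(count ~ spray, data = InsectSprays, main = "Alphabetical")
boxplot(InsectSprays$count ~ by_median, main = "Ordered by median",
        xlab = "spray", ylab = "count")
par(old_par)
```

Without the `FUN` argument, `reorder` ranks groups by their mean, so naming the median explicitly matters when the distributions are skewed.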

License

Business Analytics Copyright © by Rafael Irizarry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.