Avoid too many significant digits

Rafael Irizarry

By default, statistical software like R returns many significant digits. The default behavior in R is to show 7 significant digits. That many digits often adds no information and the added visual clutter can make it hard for the viewer to understand the message. As an example, here are the per 10,000 disease rates, computed from totals and population in R, for California across the five decades:

state year Measles Pertussis Polio
California 1940 37.8826320 18.3397861 0.8266512
California 1950 13.9124205 4.7467350 1.9742639
California 1960 14.1386471 NA 0.2640419
California 1970 0.9767889 NA NA
California 1980 0.3743467 0.0515466 NA

We are reporting precision up to 0.00001 cases per 10,000, a very small value in the context of the changes that are occurring across the dates. In this case, two significant figures is more than enough and clearly makes the point that rates are decreasing:

state year Measles Pertussis Polio
California 1940 37.9 18.3 0.8
California 1950 13.9 4.7 2.0
California 1960 14.1 NA 0.3
California 1970 1.0 NA NA
California 1980 0.4 0.1 NA

Another principle related to displaying tables is to place values being compared on columns rather than rows. Note that our table above is easier to read than this one:

state disease 1940 1950 1960 1970 1980
California Measles 37.9 13.9 14.1 1 0.4
California Pertussis 18.3 4.7 NA NA 0.1
California Polio 0.8 2.0 0.3 NA NA

License

Icon for the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Business Analytics Copyright © by Rafael Irizarry is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

Share This Book