Avoid too many significant digits
Rafael Irizarry
By default, statistical software like R returns many significant digits. The default behavior in R is to show 7 significant digits. That many digits often adds no information and the added visual clutter can make it hard for the viewer to understand the message. As an example, here are the per 10,000 disease rates, computed from totals and population in R, for California across the five decades:
state | year | Measles | Pertussis | Polio |
---|---|---|---|---|
California | 1940 | 37.8826320 | 18.3397861 | 0.8266512 |
California | 1950 | 13.9124205 | 4.7467350 | 1.9742639 |
California | 1960 | 14.1386471 | NA | 0.2640419 |
California | 1970 | 0.9767889 | NA | NA |
California | 1980 | 0.3743467 | 0.0515466 | NA |
We are reporting precision up to 0.00001 cases per 10,000, a very small value in the context of the changes that are occurring across the dates. In this case, two significant figures is more than enough and clearly makes the point that rates are decreasing:
state | year | Measles | Pertussis | Polio |
---|---|---|---|---|
California | 1940 | 37.9 | 18.3 | 0.8 |
California | 1950 | 13.9 | 4.7 | 2.0 |
California | 1960 | 14.1 | NA | 0.3 |
California | 1970 | 1.0 | NA | NA |
California | 1980 | 0.4 | 0.1 | NA |
Another principle related to displaying tables is to place values being compared on columns rather than rows. Note that our table above is easier to read than this one:
state | disease | 1940 | 1950 | 1960 | 1970 | 1980 |
---|---|---|---|---|---|---|
California | Measles | 37.9 | 13.9 | 14.1 | 1 | 0.4 |
California | Pertussis | 18.3 | 4.7 | NA | NA | 0.1 |
California | Polio | 0.8 | 2.0 | 0.3 | NA | NA |