Encoding a third variable
Rafael Irizarry
Below is a scatterplot showing the relationship between infant survival and average income. It also encodes three variables: OPEC membership, region, and population.
data:image/s3,"s3://crabby-images/87b3b/87b3bb3d01c8e8dcfddc90fd42a4466729ab487d" alt="A scatter plot titled "Infant survival proportion". It uses colors and shapes to encode information."
It encodes categorical variables with color and shape. These shapes can be controlled with shape
argument. Below are the shapes available for use in R package. For the last five, the color goes inside.
data:image/s3,"s3://crabby-images/37490/3749070cd421224537c4a29ecc12561d9113ca5a" alt="Three rows of shapes usable in R package. All can be used to encode for different variables."
For continuous variables, we can use color, intensity, or size. We now show an example of how we do this with a case study.
When selecting colors to quantify a numeric variable, we choose between two options: sequential and diverging. Sequential colors are suited for data that goes from high to low. High values are clearly distinguished from low values. Here are some examples.
data:image/s3,"s3://crabby-images/33dc3/33dc334c0eb51861bb23a7a9f02dcac7c0fe9c39" alt="Color palettes that can be used to represent changing data, typically using less to more intensity or saturation."
Diverging colors are used to represent values that diverge from a center. We put equal emphasis on both ends of the data range: higher than the center and lower than the center. An example of when we would use a divergent pattern would be if we were to show height in standard deviations away from the average. Here are some examples of divergent patterns:
data:image/s3,"s3://crabby-images/c7aeb/c7aeba45a886074f72e28e4aeef8411e7f98a1ac" alt="A range of color palettes with values ranging from color to color, but become light and unsaturated in the center."