Color has a fundamental role in data visualization through enhancing the readability and interpretability of data, and significantly impacting the viewer's ability to quickly and accurately understand the information being presented. The strategic use of color can highlight key data points, differentiate between data categories, and indicate numerical values through gradients. For instance, color can draw attention to outliers, trends, and patterns, making it easier for viewers to discern the story the data tells at a glance. Furthermore, color aids in memory retention and engagement with the visualization. By creating a visual hierarchy through color coding, viewers can prioritize information and understand complex datasets more easily. A well-designed color scale can simplify the comprehension of gradients and distributions, making it straightforward to compare magnitudes and assess relationships within the data.
This guide covers some important aspects of using color to create effective visualizations:
When thinking about what colors to use it is important to know what venue you are working in. When engaging with a viewer whether in person or indirectly through print, online etc. it is useful to anticipate what there expectations are. Consider also the content of your visualization and its wider context. If the content and the context in which it is communicated are more serious or scholarly then a color scheme reflecting these qualities should be used. If the content is more light-hearted or whimsical than a more adventurous color scheme could be used.
In the example below we see a chart with World War II casualties. In most situations, the viewer, the content and the context would anticipate something more staid such as the chart on the left; and avoid the garish, neon colors shown in the chart to the right.
Sequential
These are used for representing data that has an inherent order or progression, like a temperature gradient or time series data. Use a single color that varies in intensity or saturation.
Example:
Divergent
These are suitable for data with a critical midpoint, like positive and negative values around zero, median, mean, etc.. Divergent color scales use two distinct colors that blend at the midpoint.
Example:
Discrete
When representing distinct categories or groups, choose a set of easily distinguishable colors. Avoid colors that could imply an order or relationship between categories.
Example:
The colors you use should be consistent throughout your visualization, especially if you are comparing visualizations.
In the example below the categorical variables (Yes and no) are colored differently and will lead to confusion.
Inconsistent coloring of variables
In the below example the categories have the same color in both charts:
Consistent coloring of variables
There are a variety of barriers to color accessibility such as color blindness where physiological characteristics affect the ability to perceive color. This in turn affects how intelligible the visualization is and ultimately the story you are trying to tell. Color agnosia is another, cognitive, barrier to color interpretation whereby a viewer has difficulty recognizing and naming colors, though the colors may appear as they do to people with regular vision.
In most circumstances it will be impossible to know in advance what kind of barriers are present among viewers and there is really no color palette that will make visualizations completely accessible to all people (see, for example, the monochromatic simulation below where everything appears in gray scales). However, understanding what kinds of visual impairments exist and how they affect the way people see is a useful skill to have and may make it easier to communicate in some situations when engaging directly with diverse populations
In the examples below you can see a rainbow color ramp on the left with a simulation of what people with various types of colorblindness will see on the right.
Achromatopsia is very rare type of colorblindness resulting in a complete lack of color cells. People with Achromatopsia are also very sensitive to all light.
Blue cone monochromacy also called S-cone monochromacy, is a rare and severe form of color blindness caused by having only one type of functioning cone cell in their eyes, which is typically sensitive to blue light.
Deuteranopia is caused by a lack of green cone cells and people with deuteranopia are unable to distinguish green and red.
Protanopia is caused by a lack of red cone cells and people with protanopia are unable to distinguish green and red.
Protanomaly is caused by a red cone cell that is shifted toward the green cone's sensitivity.
Tritanomaly also called blue-yellow color blindness individuals have all three types of cone cells in their eyes, but the blue cone cells are affected and perceive blue and yellow colors differently than individuals with normal color vision.
Tritanopia (Blue colorblindness) is a rare form of color blindness where individuals have a deficiency in blue cone cells, leading to difficulty distinguishing between blue and green or yellow and red.