Selecting the right data visualization is a fundamental aspect of effective data communication. Here are the principal steps in the decision-making process:
1. Know the Data: The first step is to thoroughly understand the data you are working with. Determine whether the data are quantitative or qualitative; categorical, ordinal, or ratio; and continuous or discrete. All of these factors influence chart-type selection.
2. Define the Objective: Think about what you want to convey with your visualization. Do you want to compare values, show a distribution, examine relationships between variables, demonstrate trends over time, or present a part-to-whole relationship?
3. Consider the Venue: Anticipate who will be looking at your visualization. What is their level of expertise? What do they care about, and what do they already know?
4. Choose an Appropriate Chart Type: Depending on the nature of your data and your objectives, certain types of charts may be more suitable than others. Bar charts and column charts are good for comparing discrete quantities, while line charts are useful for showing trends over time. Scatter plots can display relationships between variables. Pie charts illustrate part-to-whole relationships, and box plots are very useful for depicting statistical distributions.
5. Guiding Principles: Once you have chosen a chart type, you can begin to design your visualization. Simplicity is best when designing a chart. Using colors, labels, and titles effectively helps to guide the viewer's attention and enhance understanding.
6. Get Feedback: Show your visualization to others, especially people who represent your target audience, gather their feedback, and refine your visualization as needed.
Data can be divided into two broad categories: Quantitative, i.e. numbers that represent measurements; and Qualitative, i.e. descriptors such as text, labels, and codes (including numeric codes that stand in for categories). In most areas of statistics, numerical data are further divided into interval and ratio data. Advances in computer science have expanded the range of data types to include text and multimedia such as image, video, and audio data. Please see below for more details about data types.
Numerical Data: Quantitative, meaning they represent a measurable quantity. Numerical data are either:
Interval Data: This is a numerical type of data in which the difference between two values is meaningful. However, there is no absolute zero point (a point that represents a lack of the quantity being measured). An example is temperature in degrees Celsius or Fahrenheit, where the difference between 30 and 40 is the same as between 20 and 30, but 0 degrees does not indicate the absence of temperature.
Ratio Data: This type of data is similar to interval data, but it has an absolute zero point. Examples: height, weight, age, distance. Here, zero means the absence of the variable being measured.
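A short worked example may make the distinction concrete. The sketch below (Python is an illustrative choice, and the helper name celsius_to_kelvin is an assumption introduced here) compares a ratio computed on Celsius values, which form an interval scale, with the same temperatures in kelvins, which form a ratio scale with a true zero.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius (interval scale) to kelvins (ratio scale)."""
    return celsius + 273.15


low_c, high_c = 20.0, 40.0

# Differences are meaningful on an interval scale.
difference = high_c - low_c

# A ratio of Celsius values is not meaningful: 40 degrees C is not
# "twice as hot" as 20 degrees C, because 0 degrees C is an arbitrary zero.
naive_ratio = high_c / low_c

# On the Kelvin scale zero is a true zero, so the ratio is meaningful.
kelvin_ratio = celsius_to_kelvin(high_c) / celsius_to_kelvin(low_c)

print(f"difference: {difference} degrees")
print(f"naive Celsius ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio: {kelvin_ratio:.3f}")
```

In practice this matters for chart design: a bar chart that starts at zero implies that bar lengths can be compared as ratios, which is only appropriate for ratio data.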
Categorical Data: Qualitative, meaning that categories are defined by characteristics of objects or people. Examples include color of a shirt, type of animal, brand of a product, race, sex, or occupation. Categorical data can be subdivided into:
Spatial Data: Another type of data is spatial data, or data about location. Traditionally these data are presented as maps, although advances in computer science have expanded the functionality and utility of map composition and data management, so spatial data require special consideration. Maps can be used to visualize the same data types as described above, but they use spatial units like counties, states, and countries.
See more at the Library's GIS Guide.
Temporal Data: Temporal, or time-series, data provide a chronological sequence of data points over time. This type of data is often found in domains like finance for tracking stock prices, meteorology for weather trends, and digital analytics for website traffic observations. Time-series plots streamline the presentation of continuous data: they display the start and end dates of the observations and make temporal patterns easier to analyze.
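As a minimal sketch of preparing temporal data for a time-series plot, the Python below sorts observations chronologically and finds the start and end dates. The traffic figures and variable names are hypothetical and exist only for illustration.

```python
from datetime import date

# Hypothetical daily website-traffic observations: (date, visits).
observations = [
    (date(2024, 1, 3), 120),
    (date(2024, 1, 1), 95),
    (date(2024, 1, 2), 110),
    (date(2024, 1, 4), 130),
]

# A time-series plot expects points in chronological order.
series = sorted(observations, key=lambda point: point[0])

start_date = series[0][0]
end_date = series[-1][0]

# Change between consecutive observations, a simple view of the trend.
changes = [later[1] - earlier[1] for earlier, later in zip(series, series[1:])]

print(f"{start_date} to {end_date}: changes {changes}")
```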
Hierarchical Data: Hierarchical data is structured in a way that resembles a tree, where each item or node has a parent and possibly many children, except for the root node, which has no parent. This type of data organization is naturally visualized using tree graphs or hierarchical diagrams, such as organizational charts, family trees, and file systems. These visualizations effectively illustrate relationships and hierarchies, allowing viewers to understand complex structures through a clear, nested layout. Hierarchical diagrams can show the breakdown of broad categories into finer sub-categories, depict lines of authority in organizations, trace genealogies, or represent the structure of software systems, making them invaluable tools in various fields for organizing and presenting data in an intuitive and scalable manner.
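One way to picture the parent-and-children structure is as a nested dictionary. The Python sketch below assumes a made-up organization chart and a hypothetical helper, render_tree, that produces the nested layout as indented text. It is one possible representation, not a required one.

```python
# Hypothetical organization chart stored as a nested dictionary:
# each key is a node and its value holds that node's children.
org_chart = {
    "Director": {
        "Research Lead": {"Analyst": {}, "Librarian": {}},
        "Operations Lead": {"Technician": {}},
    }
}


def render_tree(tree: dict, depth: int = 0) -> list[str]:
    """Return one indented line per node, parents before children."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(render_tree(children, depth + 1))
    return lines


print("\n".join(render_tree(org_chart)))
```

The indentation depth of each line corresponds to the node's level in the tree, which is the same information a tree diagram encodes with position.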
Network Data: Network data refer to relationships and interactions between entities within a system, such as social networks of people, ecological systems, and internet networks. Network data are structured around nodes, representing the entities, and edges, which denote the relationships or interactions between them. Analyzing network data facilitates exploration of connectivity patterns, identification of influential nodes, and understanding of network dynamics, and it gives insight into the organization, behavior, and properties of complex systems.
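The node-and-edge structure can be sketched in a few lines of Python. The example below uses a made-up social network and treats degree (the number of edges touching a node) as one simple, assumed measure of influence. Other measures exist and may suit a given analysis better.

```python
from collections import Counter

# Hypothetical social network: each edge links two people (nodes).
edges = [
    ("Ana", "Ben"),
    ("Ana", "Cho"),
    ("Ana", "Dev"),
    ("Ben", "Cho"),
    ("Dev", "Eli"),
]

# Degree = number of edges touching a node (undirected network).
degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

most_connected, connections = degree.most_common(1)[0]
print(f"{most_connected} has the most connections ({connections})")
```

In a network diagram, a value like this is often mapped to node size so that highly connected nodes stand out.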
Unstructured Data: Another type of data is unstructured data, which includes text, images, video, audio, and other multimedia. These data types have emerged in response to innovations in computer science. In some contexts, such as Geographic Information Systems (GIS), a satellite image is a form of data, and a computer can capture and process images as data. Some aspects of these data are relevant to visualization, such as when you visualize the range of pixel values of an image.
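To illustrate visualizing the range of pixel values, the Python sketch below treats a tiny, made-up grayscale image as a grid of intensities from 0 to 255, then counts how many pixels fall into four equal-width bins. The image values and the bin width are assumptions chosen for readability.

```python
# Hypothetical 3x4 grayscale image; each value is a pixel intensity (0-255).
image = [
    [12, 40, 200, 255],
    [30, 45, 180, 220],
    [0, 60, 190, 240],
]

pixels = [value for row in image for value in row]
lowest, highest = min(pixels), max(pixels)

# Count pixels in four equal-width bins: 0-63, 64-127, 128-191, 192-255.
bin_width = 64
counts = [0, 0, 0, 0]
for value in pixels:
    counts[value // bin_width] += 1

# Print a simple text histogram, one row per bin.
print(f"range: {lowest}-{highest}")
for index, count in enumerate(counts):
    start = index * bin_width
    print(f"{start:3d}-{start + bin_width - 1:3d} | {'#' * count}")
```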
After determining the type(s) of data in your dataset, you will want to determine what kind of relationship or function you want to demonstrate.
Compare Values: If you want to visualize a comparison of two or more values, pick a visualization that clearly shows the differences or similarities between data points or categories. For example, you might use a bar or column chart to compare sales figures of different products or revenue generated by different departments in a company.
Illustrate a Distribution: Showing a distribution involves visualizing the spread of data across a range of values, and is useful if you want to illustrate the frequency or pattern of data. Common visualization types include histograms, box plots, and density plots.
Examine Relationships Between Variables: To explore the correlation or association between two or more variables, you can use scatter plots, line charts, or heatmaps. These visualizations help to identify patterns and trends that might exist between different variables.
Demonstrate Trends Over Time: If you want to illustrate trends or patterns that occur over time, line charts, area charts, or time series plots are effective choices. These visualizations reveal how a particular measurement or data point changes over different time intervals.
Show a Part-to-Whole Relationship: Presenting a part-to-whole relationship involves illustrating how individual components contribute to the overall whole. Pie charts and stacked bar charts are commonly used to show the proportion of different parts relative to the total.
Show Location: If you want to show how values change or compare across an area like a city, state, or country, a map or other cartographic product is a good solution.
The Venue encompasses all aspects of where, how, and by whom the visualization will be viewed. Publishing visualizations online creates opportunities for more engaging and dynamic visualizations with interactive features. Some professional journals have requirements for how visualizations are formatted. Furthermore, you have to anticipate who your audience will be and what assumptions and skills they bring to interpreting the visualization.
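The objectives above and the chart types suggested for each can be summarized as a simple lookup. The Python sketch below is only a convenience for illustration: the dictionary keys and the function name suggest_charts are assumptions introduced here, while the chart types are the ones named in the text.

```python
# Chart types named in the guidance above, keyed by objective.
CHART_SUGGESTIONS = {
    "compare values": ["bar chart", "column chart"],
    "distribution": ["histogram", "box plot", "density plot"],
    "relationship": ["scatter plot", "line chart", "heatmap"],
    "trend over time": ["line chart", "area chart", "time series plot"],
    "part-to-whole": ["pie chart", "stacked bar chart"],
    "location": ["map"],
}


def suggest_charts(objective: str) -> list[str]:
    """Return candidate chart types for an objective, or an empty list."""
    return CHART_SUGGESTIONS.get(objective.strip().lower(), [])


print(suggest_charts("Distribution"))
```

A lookup like this is a starting point rather than a rule. The data type, the venue, and the audience still shape the final choice.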
Number of Data Points: If you have a large number of observations, consider using aggregated or summarized visualizations, such as grouped bar charts or box plots, to avoid clutter.
Data Density: For dense data with many overlapping points, consider using scatter plots with transparency or jittering to keep the chart from becoming visually overwhelming.
Data Context and Audience: Anticipate the context in which the visualization will be presented and the data literacy level of your audience. Select visualizations that are comprehensive enough to communicate the information, but are easy to interpret.
Data Integrity: Make sure that the chart accurately represents the data without distorting or misleading the audience.
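Two of the considerations above, summarizing a large number of observations and jittering dense data, can be sketched with Python's standard library. The simulated measurements, the ratings, the jitter amount, and the helper name jitter are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical large dataset: 10,000 simulated measurements.
values = [random.gauss(50, 10) for _ in range(10_000)]

# Aggregation: five numbers are enough to draw a box plot.
q1, median, q3 = statistics.quantiles(values, n=4)
summary = (min(values), q1, median, q3, max(values))


def jitter(points, amount=0.2):
    """Shift each value by a small random offset so overlaps become visible."""
    return [p + random.uniform(-amount, amount) for p in points]


# Dense data: many survey answers share the same rating.
ratings = [3, 3, 3, 4, 4, 5]
jittered = jitter(ratings)

print("five-number summary:", [round(x, 1) for x in summary])
print("jittered ratings:", [round(x, 2) for x in jittered])
```

Jitter changes the plotted positions, not the underlying data, so it is worth noting in a caption when it is used. That keeps the chart consistent with the data integrity point above.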
The Four C's framework (Clear, Clean, Concise, Captivating) guides the design of visualizations that not only communicate data accurately but also effectively engage and inform the audience.
Clear: Clarity ensures that the message within the data is easily understood by the audience. This means the visualization should be straightforward, avoiding any potential confusion or misinterpretation. Clarity can be facilitated by:
Clean: Cleanliness in a visualization typically emphasizes a minimalist approach, where every element serves a purpose. A clean design is free from clutter and superfluous information that could distract from the key message. Cleanliness entails:
Concise: Conciseness involves distilling the data to its most essential elements, and presenting it in a manner that is both brief and comprehensive. A concise visualization communicates the key insights without oversimplification or loss of critical information. To be concise:
Captivating: A captivating visualization draws the viewer in, holds their attention, and makes the data memorable. This principle is about leveraging the power of visual storytelling to engage the audience on an emotional or intellectual level. To captivate your audience:
By adhering to these Four C's, data visualization professionals can create visuals that are not only aesthetically pleasing but also meaningful and effective in communicating complex information. The integration of clarity, cleanliness, conciseness, and captivating elements ensures that visualizations serve their primary purpose: to illuminate insights and facilitate understanding in a way that words alone cannot achieve.
Gathering feedback from friends, colleagues, and potential viewers is a valuable step in the data visualization process. It provides diverse perspectives that can help refine and enhance the effectiveness of your visualization. Here are some strategies to improve your visualization with feedback and iterative testing:
The Value of External Feedback
Effective Iteration and Testing Strategies
Implementing Feedback Effectively
Documentation and Reflection