LibGuides: Data Visualization: Methods

Methods

Selecting the right data visualization is a fundamental aspect of effective data communication. Here are the principal steps in the decision-making process:

1. Know the Data: The first step is to thoroughly understand the data you are working with. Determine whether the data are quantitative or qualitative; categorical, ordinal, or ratio; and continuous or discrete. All of these factors influence chart-type selection.

2. Define the Objective: Think about what you want to convey with your visualization. Do you want to compare values, show a distribution, examine relationships between variables, demonstrate trends over time, or present a part-to-whole relationship?

3. Consider the Venue: Anticipate who will be looking at your visualization. What is their level of expertise? What do they care or know about?

4. Choose an Appropriate Chart Type: Depending on the nature of your data and your objectives, certain types of charts may be more suitable than others. Bar charts and column charts are good for comparing discrete quantities, while line charts are useful for showing trends over time. Scatter plots can display relationships between variables. Pie charts illustrate part-to-whole relationships, and box plots are very useful for depicting statistical distributions.

5. Guiding Principles: Once you have chosen a chart type, you can begin to design your visualization. Simplicity is best when designing a chart. Using colors, labels, and titles effectively helps to guide the viewer's attention and enhance understanding.

6. Get Feedback: Show your visualization to others, especially people who represent your target audience, then refine it based on the feedback you gather.

Know Your Data

Data can be divided into two broad categories: quantitative, i.e. numbers that represent a measurement; and qualitative, i.e. descriptors such as text, labels, and numeric codes that stand for categories. In most areas of statistics, numerical data are further divided into interval and ratio data. Advances in computer science have expanded the range of data types to include text and multimedia such as image, video, and audio data. See below for more details about data types.

Numerical Data: Quantitative, meaning they represent a measurable quantity. Numerical data are either:

Discrete Data: whole numbers that represent countable items. Examples include the number of students in a class or the number of books on a shelf. It is impossible to have half of a student or half of a book in any meaningful sense.
Continuous Data: values that can fall anywhere within a range and can be divided meaningfully into smaller increments, like fractions or decimals. Examples: temperature, height, weight, time.

Interval Data: This is a numerical type of data in which the difference between two values is meaningful. However, there is no absolute zero point (a point that represents a lack of the quantity being measured). Examples include temperature in degrees Celsius or Fahrenheit, where the difference between 30 and 40 is the same as between 20 and 30, but a reading of 0 does not indicate an absence of temperature.

Ratio Data: This type of data is similar to interval data, but it has an absolute zero point. Examples: height, weight, age, distance. Here, zero means the absence of the variable being measured.
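A short worked example may make the interval/ratio distinction concrete. The sketch below assumes temperatures recorded in degrees Celsius and uses a hypothetical helper, `celsius_to_kelvin`, to move the same readings onto the Kelvin scale, which does have a true zero. It illustrates that differences are meaningful on both scales, while ratios are only meaningful on the ratio scale.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert a Celsius reading (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15


cool_c, warm_c = 20.0, 40.0

# Differences are meaningful on an interval scale and survive the conversion.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios need a true zero: 40 C divided by 20 C gives 2.0,
# but 40 C is not "twice as hot" as 20 C.
naive_ratio = warm_c / cool_c
ratio_in_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {difference_c:.1f} C, {difference_k:.1f} K")
print(f"naive ratio: {naive_ratio:.3f}, ratio in kelvin: {ratio_in_kelvin:.3f}")
```

In charting terms, this is one reason a bar chart with a zero baseline suits ratio data better than interval data: bar length invites a "twice as much" reading.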

Categorical Data: Qualitative, meaning that categories are defined by characteristics of objects or people. Examples include color of a shirt, type of animal, brand of a product, race, sex, or occupation. Categorical data can be subdivided into:

Nominal Data: categorical data that have no order or priority. They are simply used to label or categorize. For example, hair color (blonde, brown, black) is a nominal variable, as no color is inherently better or worse, larger or smaller than the others.
Ordinal Data: categorical data with a definite ordering, where the order of the categories is important but the differences between the categories are not necessarily evenly spaced. For example, academic grades like A, B, C, D, F could represent scores of 90 to 100, 80 to 89, 70 to 79, 60 to 69, and 0 to 59 respectively; another example is customer satisfaction survey responses like very unsatisfied, unsatisfied, neutral, satisfied, very satisfied.
Binary/Ternary Data: Binary data classify observations into two groups, e.g. "Yes" or "No," "male" or "female," etc., often represented by "0" or "1." Ternary data classify observations into three groups, e.g. "Yes," "No," "Maybe," and can be represented as 0, 1, 2. Any classification of data into four or more categories is typically just called "categorical."
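Because ordinal categories carry an order that alphabetical sorting would scramble, it can help to encode that order explicitly before charting. The following is a minimal sketch using the satisfaction scale from the ordinal example above; the `responses` list is made-up illustration data, and `SATISFACTION_ORDER` is a name chosen for this example.

```python
from collections import Counter

# Ordinal scale from lowest to highest; list position encodes the rank.
SATISFACTION_ORDER = [
    "very unsatisfied",
    "unsatisfied",
    "neutral",
    "satisfied",
    "very satisfied",
]

# Hypothetical survey responses, for illustration only.
responses = [
    "satisfied", "neutral", "very satisfied", "satisfied",
    "unsatisfied", "neutral", "satisfied",
]

counts = Counter(responses)

# Alphabetical order breaks the scale; the explicit order preserves it
# and keeps categories that received no responses.
alphabetical = sorted(counts)
ordered_counts = [(level, counts[level]) for level in SATISFACTION_ORDER]

for level, n in ordered_counts:
    print(f"{level:>16}: {'#' * n}")
```

Passing the categories to a charting tool in `SATISFACTION_ORDER` rather than alphabetical order keeps the axis readable as a scale.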

Spatial Data: Another type of data is spatial data, or data about location. Traditionally these data have been presented as maps, although advances in computer science have expanded the functionality and utility of map composition and data management, so spatial data require special consideration. Maps can be used to visualize the same data types as described above, but they use spatial units like counties, states, and countries.

See more at the Library's GIS Guide

Temporal Data: Temporal, or time-series, data provide a chronological sequence of data points. This type of data is often found in domains like finance (tracking stock prices), meteorology (weather trends), and digital analytics (website traffic). Time-series plots present the observations in order from the start date to the end date, making it easier to analyze temporal patterns.
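As a rough illustration of preparing time-series data for a plot, the sketch below sorts observations chronologically, reads off the start and end dates, and smooths the values with a simple moving average. The visit counts are invented, and `moving_average` is a hypothetical helper written for this example, not part of any library.

```python
from datetime import date

# Hypothetical daily website visits, deliberately out of order.
observations = [
    (date(2024, 1, 3), 130),
    (date(2024, 1, 1), 100),
    (date(2024, 1, 4), 160),
    (date(2024, 1, 2), 120),
    (date(2024, 1, 5), 150),
]

# A time series should be in chronological order before it is plotted.
series = sorted(observations)
start, end = series[0][0], series[-1][0]


def moving_average(values, window):
    """Average each run of `window` consecutive values to smooth out noise."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


visits = [count for _, count in series]
smoothed = moving_average(visits, window=3)

print(f"{start.isoformat()} to {end.isoformat()}")
print([round(value, 1) for value in smoothed])
```

Plotting `smoothed` alongside `visits` on a line chart is one common way to show the underlying trend without hiding the raw observations.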

Hierarchical Data: Hierarchical data are structured in a way that resembles a tree, where each item or node has a parent and possibly many children, except for the root node, which has no parent. This type of data organization is naturally visualized using tree graphs or hierarchical diagrams, such as organizational charts, family trees, and file-system trees. These visualizations effectively illustrate relationships and hierarchies, allowing viewers to understand complex structures through a clear, nested layout. Hierarchical diagrams can show the breakdown of broad categories into finer sub-categories, depict lines of authority in organizations, trace genealogies, or represent the structure of software systems. This makes them useful in many fields for organizing and presenting data in an intuitive and scalable manner.
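One simple way to hold hierarchical data before drawing it is a nested dictionary, where each key is a node and its value holds that node's children. The sketch below is only an illustration: the organizational chart is invented, and `render_tree` and `tree_depth` are hypothetical helpers.

```python
# Hypothetical organizational chart: each key is a node, each value its children.
org_chart = {
    "Director": {
        "Head of Research": {"Analyst": {}, "Data Librarian": {}},
        "Head of Operations": {"Coordinator": {}},
    }
}


def render_tree(tree, depth=0):
    """Return one indented line per node, parents before their children."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(render_tree(children, depth + 1))
    return lines


def tree_depth(tree):
    """Number of levels in the hierarchy (0 for an empty tree)."""
    if not tree:
        return 0
    return 1 + max(tree_depth(children) for children in tree.values())


outline = render_tree(org_chart)
print("\n".join(outline))
print("levels:", tree_depth(org_chart))
```

The indented outline is itself a basic nested layout; the same structure could feed a tree diagram or treemap in a charting tool.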

Network Data: Network data describe relationships and interactions between entities within a system, such as social networks of people, ecological systems, and internet networks. Network data are structured around nodes, which represent the entities, and edges, which denote the relationships or interactions between them. Analyzing network data facilitates the exploration of connectivity patterns, the identification of influential nodes, and an understanding of network dynamics, and it offers insight into the organization, behavior, and properties of complex systems.
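The node-and-edge structure can be sketched with an edge list and an adjacency map. The example below uses an invented social network and treats degree (the number of direct connections) as a simple stand-in for influence; this is an assumption made for illustration, and other centrality measures exist.

```python
from collections import defaultdict

# Hypothetical undirected social network stored as an edge list.
edges = [
    ("Ana", "Ben"),
    ("Ana", "Cy"),
    ("Ana", "Dee"),
    ("Ben", "Cy"),
    ("Dee", "Eli"),
]

# Build an adjacency map: node -> set of directly connected nodes.
adjacency = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

# Degree is the number of neighbors each node has.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
most_connected = max(degree, key=degree.get)

print(most_connected, degree[most_connected])
```

In a node-link diagram, values like `degree` are often mapped to node size so that well-connected entities stand out.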

Unstructured Data: Another type of data is unstructured data, which includes text, images, video, audio, and other multimedia. These data types have emerged in response to innovations in computer science. In some contexts, such as Geographic Information Systems (GIS) or computer vision, an image such as a satellite photograph is itself captured and processed as data. Some aspects of these data are relevant in the context of visualization, such as when you visualize the range of pixel values of an image.
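To illustrate what visualizing the range of pixel values might involve, the sketch below treats a tiny grayscale image as a grid of intensities and tallies how often each value occurs. The 4x4 `image` is invented for the example; real images would normally be loaded with an imaging library.

```python
from collections import Counter

# Hypothetical 4x4 grayscale image; each value is a pixel intensity (0-255).
image = [
    [0, 0, 64, 64],
    [0, 128, 128, 64],
    [255, 128, 128, 64],
    [255, 255, 128, 0],
]

# Flatten the grid into one list of pixel values.
pixels = [value for row in image for value in row]
value_range = (min(pixels), max(pixels))

# Count how many pixels share each intensity.
histogram = Counter(pixels)

print("range:", value_range)
for intensity in sorted(histogram):
    print(f"{intensity:>3}: {'#' * histogram[intensity]}")
```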

Text Data: Text data include any length of text, from single words to lengthy compositions like novels, or even larger collections called corpora, which may include millions or billions of words.
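Text usually has to be converted into counts before it can be charted. The sketch below shows one simple approach, word frequencies, on an invented sentence. The regular expression is a deliberately naive tokenizer chosen for this example; real corpora generally need more careful handling.

```python
import re
from collections import Counter

# Made-up sample text, for illustration only.
text = "Data tell stories. Good charts help data tell clear stories."

# Lowercase the text and keep runs of letters (and apostrophes) as tokens.
tokens = re.findall(r"[a-z']+", text.lower())
frequencies = Counter(tokens)

# The most frequent words are typical candidates for a bar chart.
top_words = frequencies.most_common(3)
print(top_words)
```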

Define Your Objective

After determining the type(s) of data in your dataset, you will want to determine what kind of relationship or function you want to demonstrate.

Compare Values: If you want to visualize a comparison of two or more values, pick a visualization that clearly shows the differences or similarities between data points or categories. For example, you might use a bar or column chart to compare sales figures of different products or revenue generated by different departments in a company.

Illustrate a Distribution: Showing a distribution involves visualizing the spread of data across a range of values, and is useful if you want to illustrate the frequency or pattern of the data. Common visualization types include histograms, box plots, and density plots.

Examine Relationships Between Variables: To explore the correlation or association between two or more variables, you can use scatter plots, line charts, or heatmaps. These visualizations help to identify patterns and trends that might exist between different variables.

Demonstrate Trends Over Time: If you want to illustrate trends or patterns that occur over time, line charts, area charts, or time series plots are effective choices. These visualizations reveal how a particular measurement or data point changes over different time intervals.

Show a Part-to-Whole Relationship: Presenting a part-to-whole relationship involves illustrating how individual components contribute to the overall whole. Pie charts and stacked bar charts are commonly used to show the proportion of different parts relative to the total.

Show Location: If you want to show how values change or compare across an area like a city, state, or country, a map is a very good solution.
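The objectives above can be summarized as a simple lookup from objective to candidate chart types. The sketch below only restates the pairings listed in this section; `CHARTS_BY_OBJECTIVE` and `suggest_charts` are hypothetical names, and the result is a starting point for choosing a chart, not a rule.

```python
# Objectives and chart types as listed in this section.
CHARTS_BY_OBJECTIVE = {
    "compare values": ["bar chart", "column chart"],
    "illustrate a distribution": ["histogram", "box plot", "density plot"],
    "examine relationships between variables": [
        "scatter plot", "line chart", "heatmap",
    ],
    "demonstrate trends over time": [
        "line chart", "area chart", "time series plot",
    ],
    "show a part-to-whole relationship": ["pie chart", "stacked bar chart"],
    "show location": ["map"],
}


def suggest_charts(objective):
    """Return candidate chart types for an objective, or an empty list."""
    return CHARTS_BY_OBJECTIVE.get(objective.strip().lower(), [])


print(suggest_charts("Compare Values"))
print(suggest_charts("Show Location"))
```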

Consider Your Venue

The Venue encompasses all aspects of where, how, and by whom the visualization will be viewed. Presenting visualizations online presents opportunities for much more engaging and dynamic visualizations with interactive utilities. Some professional journals may have requirements on how visualizations can be formatted. Furthermore, you have to anticipate who your audience will be and what assumptions and skills they bring to interpreting the visualization.

Domain Knowledge: Are you presenting data to a community group, where people may have differing levels of knowledge about a subject, or are you publishing a visualization in a highly specialized venue like a scientific journal? The former group may have little familiarity with concepts like correlation or standard deviation and may find visualizations of these measures pointless, while in a scientific journal such measures are almost always expected. Also match the complexity of the visualization to the anticipated skill set of the audience.
Purpose and Context: Consider the context in which the visualization will be presented. Is it for a technical report, a business presentation, or a public educational setting? Tailor the visualization to suit the specific purpose and tone of the communication.
Accessibility: Try to ensure that the visualization is accessible to all members of your audience, including those with visual impairments, for example by using color-blind-safe palettes, sufficient contrast, and alternative text.
Engagement: Consider how to capture and maintain the audience's interest. Interactive visualizations or engaging storytelling techniques can enhance the audience's engagement with the data.
Device and Medium: Be aware of the platform or medium through which the visualization will be presented. Different devices and mediums may have different constraints, such as screen size or resolution, which could impact the choice of visualization type and design.
Age and Demographics: Take into account the age and demographics of your audience. Younger audiences might be more receptive to interactive and dynamic visualizations, while older audiences might prefer more straightforward and traditional charts.
Key Message: Ensure that the chosen visualization effectively conveys the key message you want to communicate to your audience. Avoid clutter and unnecessary details that might distract from the main point.
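For the accessibility point above, one check that can be automated is color contrast. The sketch below follows the relative luminance and contrast ratio definitions from the Web Content Accessibility Guidelines (WCAG) 2.x, which ask for a ratio of at least 4.5:1 for normal-size text at level AA. The function names and the sample colors are choices made for this example, and contrast is only one part of accessibility.

```python
def relative_luminance(rgb):
    """WCAG 2.x relative luminance of an sRGB color given as 0-255 integers."""
    def linearize(channel):
        c = channel / 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(channel) for channel in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a, color_b):
    """Contrast ratio between two colors: 1.0 (identical) up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(color_a), relative_luminance(color_b)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


WHITE, BLACK, LIGHT_GRAY = (255, 255, 255), (0, 0, 0), (200, 200, 200)

# Black labels on a white background give the maximum possible contrast.
print(round(contrast_ratio(BLACK, WHITE), 2))
# Light gray labels on white fall well short of the 4.5:1 threshold.
print(round(contrast_ratio(LIGHT_GRAY, WHITE), 2))
```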

Choose Chart Type

Number of Data Points: If you have a large number of observations, consider using aggregated or summarized visualizations, such as grouped bar charts or box plots, to avoid clutter.
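A box plot summarizes many observations with five numbers: the minimum, lower quartile, median, upper quartile, and maximum. The sketch below computes them with Python's `statistics.quantiles`; the observations are invented, `five_number_summary` is a hypothetical helper, and the `"inclusive"` method is only one of several quartile conventions, so other tools may report slightly different quartiles.

```python
import statistics

# Hypothetical observations, for illustration only.
observations = [2, 4, 4, 5, 7, 8, 9, 10, 12]


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


summary = five_number_summary(observations)
print(summary)
```

Many box plots also mark outliers separately, which this sketch does not attempt.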

Data Density: For dense data with many overlapping points, consider using scatter plots with transparency or jittering to prevent visually overwhelming charts.
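Jittering can be sketched as adding a small random offset to each value so that identical points no longer sit exactly on top of one another. In the example below, `jitter` is a hypothetical helper, the ratings are invented, and the `width` of 0.2 is an arbitrary choice; the offset should stay small relative to the spacing of the data so the plot does not misrepresent the values.

```python
import random


def jitter(values, width=0.2, seed=0):
    """Shift each value by uniform random noise in [-width, +width]."""
    rng = random.Random(seed)  # fixed seed keeps the plot reproducible
    return [value + rng.uniform(-width, width) for value in values]


# Hypothetical 1-5 ratings with many identical values.
ratings = [3, 3, 3, 4, 4, 5, 5, 5, 5]
jittered = jitter(ratings)

print(len(set(ratings)), "distinct positions before jittering")
print(len(set(jittered)), "distinct positions after jittering")
```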

Data Context and Audience: Anticipate the context in which the visualization will be presented and the data literacy level of your audience. Select visualizations that are comprehensive enough to communicate the information, but are easy to interpret.

Data Integrity: Make sure that the chart accurately represents the data without distorting or misleading the audience.
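One common source of distortion, offered here as an illustration rather than something this guide names, is a bar chart whose value axis does not start at zero. The worked example below uses invented figures and a hypothetical helper, `bar_height_ratio`, to compare how much taller one bar is drawn than another under two different baselines.

```python
def bar_height_ratio(small, large, baseline=0.0):
    """Ratio of drawn bar heights when the value axis starts at `baseline`."""
    if baseline >= small:
        raise ValueError("baseline must be below the smaller value")
    return (large - baseline) / (small - baseline)


# Hypothetical figures: a 10% increase from one year to the next.
last_year, this_year = 50.0, 55.0

honest = bar_height_ratio(last_year, this_year)
truncated = bar_height_ratio(last_year, this_year, baseline=45.0)

print(f"axis from 0:  second bar is {honest:.1f}x the first")
print(f"axis from 45: second bar is {truncated:.1f}x the first")
```

With the axis starting at 45, a 10% increase is drawn as a bar twice as tall, which is the kind of misleading impression this principle warns against.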

Guiding Principles of Visualization Design

The Four C's framework (clear, clean, concise, captivating) guides the design of visualizations that not only communicate data accurately but also effectively engage and inform the audience.

Clear: Clarity ensures that the message within the data is easily understood by the audience. This means the visualization should be straightforward, avoiding any potential confusion or misinterpretation. Clarity can be facilitated by:

Using straightforward language and avoiding technical jargon that might alienate the audience.
Selecting chart types and visual elements that directly contribute to the comprehension of the data.
Employing a logical structure that guides the viewer through the data in a coherent manner.

Clean: Cleanliness in a visualization typically emphasizes a minimalist approach, where every element serves a purpose. A clean design is free from clutter and superfluous information that could distract from the key message. Cleanliness entails:

Eliminating unnecessary decorative graphics, excessive labels, or overwhelming colors.
Using white space effectively to separate and organize elements, making the visualization easier to navigate.
Choosing a color scheme and typography that enhance readability and do not detract from the data being presented.

Concise: Conciseness involves distilling the data to its most essential elements, and presenting it in a manner that is both brief and comprehensive. A concise visualization communicates the key insights without oversimplification or loss of critical information. To be concise:

Focus on the most relevant data points that support the narrative or answer the specific question being addressed.
Avoid overcrowding the visualization with too many data series, categories, or variables that can obscure the main message.
Streamline the content to ensure that every piece of data, text, and graphic element adds value and meaning.

Captivating: A captivating visualization draws the viewer in, holds their attention, and makes the data memorable. This principle is about leveraging the power of visual storytelling to engage the audience on an emotional or intellectual level. To captivate your audience:

Use compelling narratives or themes to frame the data, making it relatable and interesting.
Incorporate elements of surprise or intrigue, such as unexpected findings or trends, to spark curiosity.
Design with the user experience in mind, ensuring the visualization is not only informative but also enjoyable to explore.

By adhering to these Four C's, data visualization professionals can create visuals that are not only aesthetically pleasing but also meaningful and effective in communicating complex information. The integration of clarity, cleanliness, conciseness, and captivating elements ensures that visualizations serve their primary purpose: to illuminate insights and facilitate understanding in a way that words alone cannot achieve.

Get Feedback

Gathering feedback from friends, colleagues, and potential viewers is a valuable step in the data visualization process. It provides diverse perspectives that can help refine and enhance the effectiveness of your visualization. Here are some strategies to improve your visualization with feedback and iterative testing:

The Value of External Feedback

Diverse Perspectives: People from different backgrounds can provide insights that you might not have considered, highlighting areas of confusion or misinterpretation.
Usability Insights: Feedback can reveal how intuitive and user-friendly your visualization is, particularly for those who may not be familiar with the data or the subject matter.
Design Improvement: Suggestions can lead to design improvements, making your visualization more accessible and engaging.

Effective Iteration and Testing Strategies

Prototype Early and Often: Create quick, early versions of your visualizations to test ideas. Early prototyping can save time and resources by identifying potential issues before you invest in more detailed designs.
Compare Chart Types: If uncertain about the best way to present your data, create several versions using different chart types. This can reveal which format most effectively communicates your data's story.
A/B Testing: Present two versions of your visualization to different groups and gather feedback on each. This method can provide clear insights into which design choices work best.
Use Specific Questions: When seeking feedback, ask specific questions to guide reviewers. Instead of asking if they "like" the visualization, ask if they can easily find key information or understand the main message.
Incorporate Interactivity in Testing: If your visualization is interactive, ensure that testing includes interaction elements. Observing how users interact with the visualization can provide valuable insights into its usability and functionality.
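Feedback from an A/B test with specific questions can be tallied very simply. The sketch below assumes each reviewer was asked to state the key message and records whether they got it right; the results are invented, and `comprehension_rate` is a hypothetical helper. With groups this small, the comparison is only suggestive.

```python
# Hypothetical results: True means the reviewer correctly stated the key message.
feedback = {
    "version A (bar chart)": [True, True, False, True, True],
    "version B (pie chart)": [True, False, False, True, False],
}


def comprehension_rate(answers):
    """Share of reviewers who identified the key message."""
    return sum(answers) / len(answers)


rates = {version: comprehension_rate(answers)
         for version, answers in feedback.items()}
best = max(rates, key=rates.get)

for version, rate in rates.items():
    print(f"{version}: {rate:.0%}")
print("better understood:", best)
```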

Implementing Feedback Effectively

Openness to Criticism: Approach feedback with an open mind. Constructive criticism is crucial for improvement, even if it challenges your initial assumptions or design choices.
Prioritize Feedback: You may receive a wide range of feedback. Prioritize changes based on the impact they will have on clarity, understanding, and engagement.
Iterative Design Process: Refinement of a visualization is an iterative process. Implement changes based on feedback, then test again. This cycle should continue until the visualization meets your objectives and effectively communicates the intended message.

Documentation and Reflection

Document Changes: Keep track of the feedback received and the changes made. This documentation can be invaluable for understanding the design decisions and for future projects.
Reflect on the Process: After completing the visualization, reflect on what was learned from the feedback and testing phases. This reflection can improve your process for future visualizations.
Iterative testing and feedback are fundamental to creating effective data visualizations. They ensure that your visualizations not only present data accurately but also resonate with the intended audience, making complex information understandable and engaging.
