LibGuides: GIS Services: Visualization

About Visualization

Symbology refers to the techniques and decisions that determine how features in a map are visualized. From the basic geometric elements of vector datasets (point, line, surface) we can derive a variety of symbols to represent different classes of spatial phenomena, different magnitudes, and ranges of values. To represent these things, we have to make decisions about iconography, size and color.

Guidelines for Effective Symbology

Strategies for Effective Symbologies

There are a lot of factors to consider when symbolizing spatial data, and there is a substantial degree of subjectivity and personal taste in the decision-making process. Furthermore, the amount of data (or number of shapefile layers) in your map must influence your strategy for choosing symbols. In the images below we can see a very detailed topographic map of Baltimore and environs with the legend displayed. There are over 60 symbols in the map, representing things like built-up areas, different types of roads, elevation contours, land use and so on. In contrast, the population density map to the right requires only minimal decision making, such as which color ramp should represent the numeric values of the densities.

When making decisions about how to visualize your data, consider the following:

1. Keep it simple.

The map should be easy to read and comprehend. Remember that a map can convey at a glance what would otherwise take paragraphs and paragraphs of text. We want symbols that are easy to see, easy to distinguish from other data, and that do not overwhelm the eyes.

In the example above we can see two population dot-density maps. Each dot (or star in the left map) represents 3000 people. From both maps we might get the general idea that the suburbs of Washington, DC and Baltimore are more densely populated relative to the rest of the state. However, the lurid colors and proliferation of eccentric symbols on the left are quite jarring, and easily diminish the effectiveness of the map. We can do several things to make a more approachable map.

First, we can symbolize population at the block-group level with dots and then set the block-group outlines to null so that they do not show up. Then we can use the dissolve tool to generate a shapefile that shows only the state outline, leaving the interior clutter-free and the dots more visible. The map on the right is actually two shapefiles: one with the block groups visualized with the dot density technique (see more later), and a second shapefile of the state boundary generated with the dissolve tool.

2. Symbology is guided by convention.

In the image above left, we can see what looks like a lake district in some country; however, by changing the symbology we can see that it is actually the Isthmus of Panama. By relying on conventions like "water=blue," we can ensure that our map communicates information effectively. Furthermore, things like capital cities are often marked by stars, or have their names underlined.

There are other conventions that are less well known outside of the cartography world. For example, if we want to visualize a range of quantities it is customary to use a "color ramp," which is a range of colors like green-yellow-red or blue-yellow-red (shown above far left and left), or a single color range like light blue to dark blue, as seen above middle right. Each of these techniques is useful for showing values relative to other values with related colors, so that dark green features are lower in value than light green ones, and red values are higher than orange ones. The map to the far right shows the same data as the other three, but departs from them by using unrelated colors, so it is difficult to see any pattern in the data.
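
As a rough illustration of a single color range, the sketch below builds a light-blue-to-dark-blue ramp by linear interpolation between two RGB colors. The specific color values and the function name are assumptions chosen for the example; they are not taken from ArcMap.

```python
def color_ramp(start_rgb, end_rgb, steps):
    """Return `steps` RGB tuples evenly spaced between two colors (steps >= 2)."""
    ramp = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first color, 1.0 at the last
        ramp.append(tuple(round(s + (e - s) * t) for s, e in zip(start_rgb, end_rgb)))
    return ramp

# Hypothetical colors: low values get the light color, high values the dark one.
light_blue = (200, 220, 240)
dark_blue = (0, 40, 120)
ramp = color_ramp(light_blue, dark_blue, 5)
```

Because every step lies on the same line between the two end colors, neighboring classes look related, which is what lets a reader see order in the data.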

Generalization

Generalization refers to the loss of detail in a feature. To make a clear and presentable map it is sometimes necessary to select shapefiles that are not as detailed as others, and vice versa. Generalization is a characteristic of polygon and line shapefiles.

[Images: non-generalized and generalized views of the East Coast]

In the two images above, the first map of the East Coast is very detailed. Along the inlets and bays of the coast there are heavy grey lines representing the county boundaries. If we try to focus on the Chesapeake Bay, we can see that the coast is very cluttered. The image to the right is generalized: the extreme detail has been removed, and the map is easier to read.

Now if we zoom in to the upper Chesapeake Bay, we can see how the effect of generalization differs from the zoomed-out view. The image above left is the same generalized shapefile from the first example, but now it looks rather clunky and abstract. The image above right is the same highly detailed map from the first example, and now that it is zoomed in, the problems of clutter and illegibility have given way to a very clear and precise display of the shapefile boundaries.

When your study area is a small area like a county, or a state like Maryland, a detailed shapefile is very useful. However, if you want to work at the scale of a larger state like Texas or the whole United States, a generalized shapefile will be most useful.

Sometimes, when looking for data, you will see "(generalized)" in the file name, which tells you the scale at which the shapefile is useful. For census products, the United States Census Bureau tends to produce generalized spatial units, while the National Historical GIS (NHGIS) produces very detailed shapefiles.

Exaggeration

A common technique for displaying important features is to exaggerate them. For example, if we have a map showing highways, we will display them larger than they otherwise would be at a given scale. In the map below, the highways as drawn would be several miles wide if measured at the scale of the map. On the other hand, if we were to create a highway map without exaggerating the highways, they would be invisible because the scale is too small to depict them at their true width.

[Image: Exaggerated features]

Polygon Symbolization

Polygon or Fill symbologies are used to visualize the interior of polygons. Sometimes the polygon shapefile is a "background" that serves only to indicate the extent of an area and distinguish it from adjacent areas. Sometimes it is enough to have only the outline of the polygons and set the fill to "null," meaning that it will have no color. Often the polygon fill is used to represent statistical data, such as raw count data to show how much of something is in a feature, or ratio data to show what proportion of a value exists in a feature relative to the rest of the shapefile. These types of visualizations can be managed in the categories and classification types, which we will look at later.

Polygon fills can also be used to represent different land-use types like swamp, grassland, forest and so on, and the ArcMap software has several pre-defined fill options for these visualizations.

[Image: Polygon symbols]

Line Symbolization

Line symbology includes different symbols for linear phenomena: transportation corridors such as railroads, bicycle routes, highways and roads; waterways like rivers, streams, canals and aqueducts; and other lines like contour lines and administrative boundaries.

ArcMap contains a wide range of conventional pre-set line symbols.

Point Symbolization

Point symbologies are used to visualize specific locations with point shapefiles. If you have multiple point shapefiles in your project you can visualize them with different colors, or with different shapes like circle, triangle, star and more. If you are representing different classes of locations, it is a good idea to represent them with different symbols rather than different sizes. So if you have a shapefile of schools and a shapefile of grocery stores, it would be best to represent schools with circles, for example, and grocery stores with triangles, or to represent both with circles of different colors, but not to represent both with circles of different sizes. The reason is that different sized circles represent differences of magnitude, and there is a special visualization type called "graduated symbol" that is used in situations like this.

The ArcMap software has several pre-defined point symbols for things like schools, airfields, hospitals and highway markers (see below).

[Image: Point symbology types]

Quantities

The quantities symbology style is one of the most complex and requires a lot of decisions to use properly. The options for visualizing quantities include Graduated Colors, which represents numeric data with a range of colors; Graduated Symbols, which represents differences in numeric data with circles, or other shapes, of different sizes according to a classification scheme; Proportional Symbols, which shows the proportion of a value out of the total of all values with different sized shapes; and Dot Density, which represents a specified number of observations (usually people) with a dot, so that one dot might equal 50 or 100 people.

In contrast to the Categories symbology type, the Quantities symbology type does not assign a unique symbol to each feature, but rather groups the values into ranges. This is useful for grouping related values together, which is a standard practice in data visualization and helps keep the visualization simple and intelligible.

Graduated Colors

The input for the Value Field is the field in the attribute table that you want to symbolize. The normalization field is optional, and can be used to create a proportion, so if you have a field in your attribute table for number of people aged 20 to 25 you can normalize this with the total population field, and then the resulting value will show the percent of people in each feature aged 20 to 25 as a proportion of the total population.
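
The arithmetic behind normalization can be sketched as follows. This is a minimal illustration, not ArcMap's implementation; the row data and the field names `pop_20_25` and `pop_total` are hypothetical.

```python
# Hypothetical attribute-table rows; the field names are assumptions for illustration.
rows = [
    {"name": "Tract A", "pop_20_25": 150, "pop_total": 1000},
    {"name": "Tract B", "pop_20_25": 90, "pop_total": 1200},
    {"name": "Tract C", "pop_20_25": 0, "pop_total": 0},
]

def normalize(value, total):
    """Return value as a percentage of total, or None when total is zero."""
    if total == 0:
        return None  # an empty feature has no meaningful percentage
    return 100.0 * value / total

percents = [normalize(r["pop_20_25"], r["pop_total"]) for r in rows]
```

Tract A has more people aged 20 to 25 than Tract B in raw counts, and normalizing shows that it also has the larger share (15 percent against 7.5 percent).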

The classify field requires some detailed attention. The drop-down window allows you to change the number of classes. In ArcMap the default setting is five classes, and this is practicable for most needs. You may need to change the number if you are visualizing data that is the result of an index or other measure that is typically grouped into a different number of classes. The classify button will open another window with methods of grouping the data into different classes: Manual, Equal Interval, Defined Interval, Quantile, Jenks Natural Breaks, Geometrical Interval and Standard Deviation.

Which Classification to Use?

The principal consideration when choosing a classification is to pick one that makes your data expressive; that is, there aren’t too many observations in one class and too few in another, and values are not over-represented in one class and under-represented in another. The best approach is to look at the histogram in the classification menu and decide how the data is spread. Is the data exponential like we’ve seen in most of the examples, is it distributed normally, or do the values demonstrate little variance so that they might be considered homogenous?

Normal Distribution (also called Gaussian distribution or Bell curve)

A normal distribution is one in which most values are clustered around the mean and taper off to the left and right. The best classification types for a normal distribution are Standard Deviation, Jenks Natural Breaks and Quantile.

Geometric Distribution

[Image: Geometric distribution]

A geometric distribution is one in which most of the values are clustered at one end. In the image above the values are clustered to the left, though they can also be clustered to the right. The best classification types for a geometric distribution are Geometrical Interval and Quantile.

Homogenous and Non-Modal Distribution

Homogenous distributions are defined by little variation in observations, and non-modal distributions have no regularity. Any classification technique can be used for distributions like this.

Manual

Manual classification allows you to define both the number of classes and the number of observations in each class. This is the least prescriptive of the classification methods and is useful if you have specific needs for classifying data that are not satisfied by the other classification techniques.

Equal Interval

The equal interval classification divides the range of attribute values into equal-sized subranges. If you have a range of values from 1 to 1000 and you specify 5 classes, then you will get intervals of about 200. This classification method is not influenced by the number of observations and is best applied to familiar data ranges, such as percentages and temperature, to non-modal distributions, or in special cases where you need to define intervals of equal value for measurements and indices that have predefined ways of displaying results.
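
A minimal sketch of the equal interval arithmetic is shown below. The function name is hypothetical, and the example uses a range of 0 to 1000 so that each interval is exactly 200 wide (a range of 1 to 1000 would give intervals of 199.8).

```python
def equal_interval_breaks(min_value, max_value, num_classes):
    """Return the upper bound of each class when the range is split evenly."""
    width = (max_value - min_value) / num_classes
    return [min_value + width * i for i in range(1, num_classes + 1)]

breaks = equal_interval_breaks(0, 1000, 5)
# breaks is [200.0, 400.0, 600.0, 800.0, 1000.0]
```

Note that only the minimum and maximum enter the calculation, which is why the method is not influenced by how many observations fall in each class.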

Defined Interval

With the defined interval classification, the data are classified into ranges of equal size. In ArcMap you define the interval size and the software automatically updates the number of classes by dividing the range of values in the dataset by the interval size you input.

In the examples below, the first classification has an interval size of 70, which results in 10 classes since the values span a range of nearly 700. In the example below right, the interval size was set to 200, which results in 4 classes.
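
The class counts in this example can be checked with a short calculation. This is a sketch under stated assumptions: the span being divided is taken to be 690, standing in for the "nearly 700" in the text, and the result is rounded up so that the last partial interval still gets a class. The function name is hypothetical.

```python
import math

def defined_interval_class_count(total_span, interval_size):
    """Number of classes needed to cover total_span in steps of interval_size."""
    return math.ceil(total_span / interval_size)

span = 690  # assumed stand-in for "nearly 700"
classes_at_70 = defined_interval_class_count(span, 70)
classes_at_200 = defined_interval_class_count(span, 200)
```

With these assumptions an interval size of 70 gives 10 classes and an interval size of 200 gives 4, matching the two examples.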

Defined interval classification works best when you have a homogenous or non-modal distribution, or in other situations where you may want to specify a defined interval, such as when mapping the result of an index that requires certain ranges of data to be grouped. For example, a measurement like Social Area Analysis requires indices to be grouped into ranges of 25%, so you may want to specify the interval this way even if the resulting image is not very expressive.

Quantile

The quantile classification divides the total number of features by the number of classes specified. It should be noted that this is different from equal interval, which divides the range of values by the number of classes.

The quantile classification can be used for normally distributed or geometrically distributed data, though it should be used with caution, because similar values may be split into different classes, or very different values may end up in the same class, producing a misleading map. In the example below we can see that every value from 48 to 86 is in the same class, though the map is more expressive of data centered around the mean.
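
The grouping behavior of quantile classification can be sketched as follows. The values are invented for illustration and the function is a simple equal-count split, not ArcMap's implementation.

```python
def quantile_classes(values, num_classes):
    """Split the sorted values into num_classes groups of near-equal size."""
    ordered = sorted(values)
    n = len(ordered)
    groups = []
    for i in range(num_classes):
        start = i * n // num_classes
        end = (i + 1) * n // num_classes
        groups.append(ordered[start:end])
    return groups

# Hypothetical, strongly skewed values.
values = [1, 2, 3, 4, 48, 50, 60, 86, 90, 400, 410, 1000]
groups = quantile_classes(values, 4)
```

Each class holds three features, but the second class groups 4 with 48 and 50, and the last class groups 400 with 1000. This is the kind of misleading grouping the caution above refers to.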

Jenks Natural Breaks

The default classification in ArcMap is Jenks Natural Breaks. The Natural Breaks method identifies classes based on natural groupings inherent in the data. These class breaks create groups of similar values and maximize the differences between classes. The class boundaries are set where there are relatively large differences in the data values. Natural breaks are specific to a particular dataset, so the classifications will be different for each map. For that reason, do not use Jenks Natural Breaks when you are comparing multiple maps built from different underlying information.
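
One common way to formalize "similar values within a class, large differences between classes" is to choose the breaks that minimize the total squared deviation of values from their class means. The sketch below does this with dynamic programming, in the style usually attributed to Fisher and Jenks. It is an illustration under that assumption and is not confirmed to match ArcMap's exact implementation; the function name and data are hypothetical.

```python
def natural_breaks(values, num_classes):
    """Split sorted values into contiguous classes with minimal within-class variation."""
    data = sorted(values)
    n = len(data)  # requires n >= num_classes
    # Prefix sums let us compute the cost of any slice in constant time.
    prefix = [0.0]
    prefix_sq = [0.0]
    for v in data:
        prefix.append(prefix[-1] + v)
        prefix_sq.append(prefix_sq[-1] + v * v)

    def cost(i, j):
        """Sum of squared deviations from the mean for data[i:j]."""
        total = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - total * total / (j - i)

    inf = float("inf")
    # best[k][j] is the minimum cost of splitting data[:j] into k classes.
    best = [[inf] * (n + 1) for _ in range(num_classes + 1)]
    split = [[0] * (n + 1) for _ in range(num_classes + 1)]
    best[0][0] = 0.0
    for k in range(1, num_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    split[k][j] = i

    # Walk the recorded split points backwards to recover the classes.
    classes = []
    j = n
    for k in range(num_classes, 0, -1):
        i = split[k][j]
        classes.append(data[i:j])
        j = i
    classes.reverse()
    return classes

classes = natural_breaks([1, 2, 3, 10, 11, 12, 50, 51, 52], 3)
```

For this small dataset the breaks fall in the two large gaps (between 3 and 10, and between 12 and 50). A different dataset would produce different breaks, which is why maps classified this way are hard to compare.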

Geometric Interval

The Geometrical Interval classification scheme creates class breaks based on class intervals that form a geometric series. We can see in the histogram here that most of the values are grouped at the lower end while higher values diminish in number to the right, resulting in a backwards exponential curve or upside-down logarithmic curve. This classification technique is useful for data that are distributed geometrically, that is, grouped on one side or the other.
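
To show what class intervals in a geometric series look like, the sketch below places breaks so that each upper bound is a constant multiple of the previous one. This is a simplified illustration only: ArcMap's Geometrical Interval method chooses its own coefficient, and this function (a hypothetical name) does not reproduce it.

```python
def geometric_breaks(min_value, max_value, num_classes):
    """Return upper class bounds whose successive ratios are constant."""
    if min_value <= 0:
        raise ValueError("this sketch needs a positive minimum value")
    ratio = (max_value / min_value) ** (1 / num_classes)
    return [min_value * ratio ** i for i in range(1, num_classes + 1)]

breaks = geometric_breaks(1, 10000, 4)
# The bounds are roughly 10, 100, 1000 and 10000.
```

The narrow classes at the low end separate the many small values, while the wide classes at the high end absorb the few large ones.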

Standard Deviation

The standard deviation classification groups values according to how many standard deviations they lie from the mean. This technique is useful for normally distributed data.
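
A minimal sketch of this idea follows. It assumes classes one standard deviation wide and uses the population standard deviation; ArcMap's interval options and exact formula may differ, and the function name and data are hypothetical.

```python
import math
import statistics

def standard_deviation_classes(values):
    """Label each value with the whole number of standard deviations below it, counted from the mean."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)  # population standard deviation (an assumption)
    return [math.floor((v - mean) / std_dev) for v in values]

values = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, standard deviation 2
labels = standard_deviation_classes(values)
```

A label of -1 means the value lies between one standard deviation below the mean and the mean itself, and a label of 0 means it lies between the mean and one standard deviation above it.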

Graduated Symbols

The graduated symbol style provides options for displaying quantities as point symbols.

Clicking on the template button will open the point style options. The classification techniques are the same as mentioned earlier. The image below left shows how the graduated symbol type appears. Take care that the symbols do not overlap one another or cover the lines.

Proportional Symbols

The proportional symbol style does not use classifications and offers limited control over display options. In this style only the size for the minimum value can be changed, and larger values will be sized proportionately. Proportional symbols are useful when you have non-numeric data that can be categorized like “high, medium, low.” This technique is highly susceptible to overcrowding of symbols, sometimes to the point that it is not useful. In the image below left, the minimum value size has been reduced to minimize cluttering.
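
The sizing rule can be sketched as follows, assuming the common cartographic convention that a symbol's area, rather than its width, grows in proportion to the value. That convention, the function name and the numbers are assumptions for illustration and are not confirmed to be ArcMap's default behavior.

```python
import math

def proportional_size(value, min_value, min_size):
    """Symbol size for value, scaled so that symbol area is proportional to the value."""
    return min_size * math.sqrt(value / min_value)

# Hypothetical values: the smallest value, 100, is drawn at size 4.
sizes = [proportional_size(v, 100, 4) for v in (100, 400, 900)]
# sizes is [4.0, 8.0, 12.0]
```

Because every size derives from the minimum size, reducing that one setting shrinks all symbols together, which is how the clutter in the example is reduced.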

Dot Density

Dot density represents numeric data proportionately as points. It is important to note that if a feature has a value of 500, the dot density will not draw 500 points but a proportion of the total. So a value of 500 may be represented by 50 points and a value of 10 may be represented by 1 point. In the dot value field above you can specify what value a point should represent. Dot density is useful when you want to imitate a natural representation of data distribution. The maps below show crime data and, while they do not show the exact locations of crimes, the dot density is a useful way of representing point phenomena as a quantitative phenomenon.

In the example above we set the dot value to 50, meaning that each dot represents 50 observations. When there is a very large population, a smaller dot value will result in more dots being drawn on the map. In these cases we have to shrink the dot size, in this case to 1.

In contrast, this second map has a dot value of 500, and so there will be fewer dots visualized. We should increase the size of the dots slightly so they stand out more.
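
The relationship between the dot value and the number of dots drawn is a simple division, sketched below. The population figure is hypothetical, and the sketch assumes that only whole dots are drawn.

```python
def dot_count(feature_value, dot_value):
    """Number of whole dots for a feature when each dot stands for dot_value observations."""
    return feature_value // dot_value

population = 12000  # hypothetical feature value
dots_at_50 = dot_count(population, 50)    # many dots, so use a small dot size
dots_at_500 = dot_count(population, 500)  # few dots, so use a larger dot size
```

With a dot value of 10, a feature with a value of 500 gets 50 dots and a feature with a value of 10 gets 1 dot, which matches the proportions described earlier.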

We can also represent multiple variables using the dot density technique. In the image above we can see the population of men by age, with each lighter dot representing 100 younger men and each darker dot representing 100 older men.

Other Statistical Graphic Representations

ArcMap has options for visualizing data in formats like pie charts, bar graphs and single bars.

Pie Chart

The pie chart can be used to represent multiple variables like crime count and population density. As you can see (below, left), the pie chart is not really useful when you have a lot of attribute rows. When you zoom in, the pie charts are better sized to the feature class (below, right).

Bar Charts

The bar chart works in a similar way and the same display constraints apply as in the pie chart:

Stacked Bar Chart

The stacked bar chart works like the bar chart, except that the variables are visualized in one column: