Despite the many uses of GIS, researchers ask two basic questions when using it: where and what.
Where is something located, and what are its characteristics?
To answer the 'where,' researchers use spatial data; to answer the 'what,' they use attribute data.
Spatial Data - The Where
Spatial data are representations of the Earth, like satellite images, or representations of features on the Earth, like buildings, streets, land use, the county boundaries of Maryland, rivers, addresses and much more.
The principal spatial data types are Rasters and Vectors. Rasters are grids of pixels and are what we normally call "images," such as those taken by satellites and airplanes. Each pixel in a raster has a numeric value that can represent things like color, temperature or elevation. Rasters are most often used in the natural sciences and in some social sciences like city planning.
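As a rough illustration of the idea that a raster is a grid of pixels with numeric values, the sketch below stores a tiny elevation raster as nested Python lists. The grid size and the elevation values are invented for the example; real rasters are far larger and are normally read from image files by GIS software.

```python
# A tiny 3-row by 4-column elevation raster. Each cell (pixel) holds one
# numeric value, here an elevation in meters. All values are made up.
elevation = [
    [12.0, 14.5, 15.0, 13.2],
    [11.8, 16.1, 18.4, 14.0],
    [10.5, 12.9, 13.7, 12.1],
]

n_rows = len(elevation)
n_cols = len(elevation[0])

# Look up the value of a single pixel by its row and column index.
value_at_1_2 = elevation[1][2]

# Summarize the whole raster by flattening it into one list of values.
all_values = [value for row in elevation for value in row]
highest = max(all_values)
mean_elevation = sum(all_values) / len(all_values)

print(f"{n_rows} x {n_cols} raster, highest pixel = {highest} m")
```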
Vector data represent features on the Earth's surface with points, lines and polygons, the elements of Euclidean geometry. A street, for example, would be represented as a line, while a building may be represented as a polygon. Addresses are often displayed as points. Vector data are usually used in connection with attribute data (see more below).
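The three vector geometry types can be sketched with plain Python dictionaries, loosely following the way GeoJSON names them. The coordinates below are hypothetical: the point and line use made-up longitude and latitude pairs, and the building footprint uses made-up local coordinates in meters so that its area is easy to check.

```python
# Hypothetical features. Point and line coordinates are (longitude, latitude);
# the building footprint uses local x/y coordinates in meters.
address = {"type": "Point", "coordinates": (-76.6122, 39.2904)}

street = {
    "type": "LineString",
    "coordinates": [(-76.620, 39.290), (-76.615, 39.291), (-76.610, 39.291)],
}

building = {
    "type": "Polygon",
    # One closed ring: the first and last vertex are the same.
    "coordinates": [[(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]],
}


def ring_area(ring):
    """Planar area of a closed ring, computed with the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


building_area = ring_area(building["coordinates"][0])
print(f"Building footprint: {building_area} square meters")
```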
There is an additional data type called LiDAR, or Light Detection and Ranging. An airplane or drone passes over the land and fires rapid laser pulses toward the ground. Each pulse is reflected back up to a sensor, which measures the time elapsed and records the elevation and coordinates (latitude and longitude) at that point. LiDAR is a sort of hybrid between vector and raster because it begins as vector data (a collection of points), but is almost always processed and converted into raster data and is often used to create 3-D images.
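The time measurement described above can be turned into a distance with simple arithmetic: the pulse travels to the ground and back, so the one-way distance is the speed of light multiplied by half the elapsed time. The sketch below assumes a pulse fired straight down from a sensor at a known altitude; the function names and sample numbers are hypothetical, and real LiDAR processing also corrects for scan angle and aircraft position.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def range_from_round_trip(round_trip_seconds):
    """One-way distance in meters for a pulse that travels out and back."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2.0


def ground_elevation(sensor_altitude_m, round_trip_seconds):
    """Elevation of the reflecting surface, assuming a straight-down pulse."""
    return sensor_altitude_m - range_from_round_trip(round_trip_seconds)


# Hypothetical example: a sensor flying at 1,200 m receives an echo after
# the time light needs to travel 2,000 m (1,000 m down and 1,000 m back).
round_trip = 2000.0 / SPEED_OF_LIGHT_M_PER_S
print(ground_elevation(1200.0, round_trip))
```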
Attribute Data - The What
Attributes are data that describe a location in terms of how many, how much, how often or when, and other characteristics about a population or thing at a given location. The attribute data you select depend on your research needs and can be quantitative or qualitative. These data may be represented as text or numbers in an attribute table.
Figure 1 Baltimore Crime Incidents.
Figure 1 is an example of an attribute table for crime incidents in Baltimore. The attributes include the date, time, address, type, whether the crime happened inside or outside a premises, any weapon used, and district. These descriptive headings and their columns are called fields, while each instance is listed in a row and is called a record or sometimes a tuple.
Each record in the attribute table corresponds to a single spatial feature. In the image below we see three census block groups highlighted along with the corresponding rows in the attribute table.
Attribute tables come in a variety of formats including, most commonly, Microsoft Excel spreadsheets and comma-separated or otherwise delimited text files. Attribute tables are often available separately from shapefiles and must be joined to the spatial data. Occasionally the data must be “cleaned up” before being joined to spatial data.
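A join matches each spatial feature to the attribute record that shares its identifier. The sketch below shows the idea with Python's standard csv module. The field name geoid, the identifier values, and the population figures are all hypothetical, and the geometry is reduced to a placeholder string; GIS software performs the same matching on real geometries.

```python
import csv
import io

# A hypothetical attribute table in comma-separated form.
attribute_csv = io.StringIO(
    "geoid,population\n"
    "245100101001,1200\n"
    "245100101002,850\n"
)

# Hypothetical spatial features; the geometry is only a placeholder here.
features = [
    {"geoid": "245100101001", "geometry": "polygon A"},
    {"geoid": "245100101002", "geometry": "polygon B"},
    {"geoid": "245100101003", "geometry": "polygon C"},
]

# Index the attribute records by the shared key so lookups are fast.
attributes_by_id = {row["geoid"]: row for row in csv.DictReader(attribute_csv)}

# Attach the matching attributes to each feature. Features with no match
# keep a population of None, which flags them for clean-up.
joined = []
for feature in features:
    match = attributes_by_id.get(feature["geoid"])
    record = dict(feature)
    record["population"] = int(match["population"]) if match else None
    joined.append(record)

for record in joined:
    print(record["geoid"], record["population"])
```

Keeping the identifier as text on both sides avoids mismatches caused by lost leading zeros, which is a common reason a table needs cleaning before a join.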
As noted above, attribute data may include text, numbers, and rasters. Each field has a specific property according to the type of data stored in it: Long Integer, Short Integer, Float, Double, Date, Text, Raster, Blob and GUID. When you import a table, the field property is assigned automatically. However, if you need to create a new field (and you will often have to in GIS projects), you must understand the data and select the correct field type. The type of data you need will influence the field property you select:
Numeric Data
Essentially, short and long integers are used for numbers without decimals, and float and double are used for numbers with decimals.
The chief difference between short and long integers is the amount of storage required on the computer: short integers are stored as 2-byte values and long integers are stored as 4-byte values. A short integer can therefore hold values from -32,768 to 32,767, while a long integer can hold values from roughly -2.1 billion to 2.1 billion. These are vestiges of the early days of computing, when storage space was at a far greater premium than in today’s computers. Nevertheless, when working with very large datasets it may help to store data as short integers when possible to maximize processing speed.
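The storage sizes and value ranges of integer fields can be checked with Python's standard struct module. This is only a sketch, and it assumes the common convention of a 2-byte signed short integer and a 4-byte signed long integer; the helper fits_in_short is a hypothetical name used for illustration.

```python
import struct

# Standard-size format codes: "<h" is a 2-byte signed integer and
# "<i" is a 4-byte signed integer.
SHORT_FORMAT = "<h"
LONG_FORMAT = "<i"

short_bytes = struct.calcsize(SHORT_FORMAT)
long_bytes = struct.calcsize(LONG_FORMAT)

# The largest signed value that fits in n bytes is 2**(8n - 1) - 1.
short_max = 2 ** (8 * short_bytes - 1) - 1
long_max = 2 ** (8 * long_bytes - 1) - 1


def fits_in_short(value):
    """Return True if value can be stored in a 2-byte signed integer."""
    try:
        struct.pack(SHORT_FORMAT, value)
        return True
    except struct.error:
        return False


print(short_bytes, short_max)
print(long_bytes, long_max)
print(fits_in_short(40000))
```

A count such as the population of a large county would overflow a short integer field, so it belongs in a long integer field.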
Text Data
Other Types
When doing quantitative analyses using more than one field, it is necessary to make sure the fields have the correct property type. Numeric values can be stored as text, but if you want to find population density, for instance, you cannot use numbers stored in a text field; they must be stored in a number field like integer, float or double. Similarly, when you are calculating a variable in a new field, it is essential that you first specify the correct field property. If you are calculating population density you will likely end up with numbers that have a decimal, so you do not want to use an integer, but rather a float or double.
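The same issue can be reproduced outside of a GIS program. In the sketch below, a hypothetical record stores its population and area as text. Dividing the text values fails, converting them to numbers works, and forcing the result into an integer drops the decimal part. The field names and values are invented for the example.

```python
# A hypothetical record whose numeric values were imported as text.
record = {"population": "1000", "area_sq_km": "0.75"}

# Arithmetic on text values fails.
try:
    record["population"] / record["area_sq_km"]
    text_division_worked = True
except TypeError:
    text_division_worked = False

# Convert to numeric types first: an integer for the count and a
# floating-point number for the area.
population = int(record["population"])
area_sq_km = float(record["area_sq_km"])

# Density has a decimal part, so it needs a float (or double) field.
density = population / area_sq_km

# Storing the same result in an integer field would discard the decimals.
density_as_integer = int(density)

print(text_division_worked, round(density, 2), density_as_integer)
```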
The Mercator Projection is one of the best-known projections, and a variant of it is used by Google Maps. The Mercator projection exaggerates the size of polar regions and makes Greenland, for example, appear to be much larger than it really is relative to the rest of the Earth.
Mercator Projection
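The exaggeration near the poles can be estimated with a short calculation. The sketch below assumes the simple spherical form of the Mercator projection, in which lengths at a given latitude are stretched by 1/cos(latitude) in both directions, so areas are stretched by the square of that factor. The function names are hypothetical, and the latitudes are chosen only for illustration.

```python
import math


def mercator_y(lat_deg, radius=1.0):
    """Northing of a latitude on a spherical Mercator map."""
    lat = math.radians(lat_deg)
    return radius * math.log(math.tan(math.pi / 4 + lat / 2))


def area_exaggeration(lat_deg):
    """How many times larger an area appears at this latitude than at the Equator."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


# Compare the Equator with latitudes 39 (roughly Maryland), 60, and 72
# (roughly central Greenland).
for latitude in (0, 39, 60, 72):
    print(latitude, round(area_exaggeration(latitude), 2))
```

At 60 degrees the factor is about 4, and at 72 degrees it is roughly 10, which is why Greenland looks so large on a Mercator map.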
Other projections, like the Mollweide Projection, tend to flatten the polar regions and distort the shape of land toward the left and right edges of the map, far from the central meridian.
Mollweide Projection
Some projections, like the Sinusoidal Projection, divide the Earth into sections to minimize distortion by localizing it, but make it difficult to interpret the Earth as a whole. In the example below Iceland and Alaska are divided between the west and central sections.
Sinusoidal Projection
Choosing a Projection
The choice of a projection is influenced primarily by the extent of the area you are mapping. For example, if you are mapping the entire Earth, you will want to use a projection intended for the whole Earth. If you are mapping Antarctica, you will want to use a projection for Antarctica that places the South Pole in the middle, as opposed to a world projection, which stretches Antarctica across the bottom of the map.
In the images below we can see the continental United States displayed with a world projection on the left and a conic projection on the right. As we have seen, all projections distort the appearance of the Earth in different ways, and in the world projection on the left the United States appears flattened. By using a regional projection for North America instead of a world projection, the United States will appear more "proportional."
If we look closely at the above right map we can see that the Mason-Dixon line is oriented south-west to north-east. If we zoom in we will get a map of Maryland that looks like the one shown below to the left. The map on the right uses a projection system made especially for Maryland, which orients the state in a way more familiar to most people, with north at the top.
Datums
In modern mapping and GIS, projections rely on a datum to model the surface of the Earth instead of relying on direct measurement of the Earth. Because the Earth is not perfectly spherical and its surface is marked by great contrasts of elevation, from deep canyons to high mountain peaks, it is necessary to make a model that accurately reflects the local landscape and use it to assign a projection that is very accurate for that area. For example, there is a datum, or model, of North America called the "North American Datum," developed by the National Geodetic Survey and its predecessor, the U.S. Coast and Geodetic Survey. The process of developing a datum is within the purview of the science of geodesy, and its most technical details are beyond the scope of this guide.
GIS data come in a wide variety of formats and from as many sources. You can collect GIS data from your phone as you walk or run around, or even as you drive. This is how Google Maps can determine high traffic volume areas when you plan routes between two addresses. By comparing the speeds and numbers of phones as they move through traffic, Google Maps can automatically infer the density of traffic in a given location and indicate it on the map.
Data you collect on your phone can be downloaded and integrated into a GIS program. This is also a cost-effective way to collect custom data for research projects focusing on small or local areas.
Here are some sources for GIS data that are freely available:
Federal Census
Maryland
Baltimore
Other Data Sources
Data.gov is the Federal Government's main open data portal. Here you can find extensive sources for GIS and other data. Much of the data is user-generated during federally funded research.
ICPSR is an international consortium of more than 750 academic institutions and research organizations and maintains a data archive of more than 250,000 files relating to research in the social and behavioral sciences. The archive contains 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.
The General Social Survey (GSS) studies the growing complexity of American society. It is the only full-probability, personal-interview survey designed to monitor changes in both social characteristics and attitudes currently being conducted in the United States.