What is GIS
GIS Data Types
- Vector
- Raster
- LiDAR
- Attribute
Projections and Datums
Where Does GIS Data Come From?

What is GIS?

Data Types

Despite its many uses, there are two basic questions that researchers ask when using GIS: where and what.

Where is something located, and what are its characteristics?

To answer the 'where', researchers use spatial data, and to answer the 'what', researchers use attribute data.

Spatial Data - The Where

Spatial data are representations of the Earth, like satellite images, or representations of features on the Earth, like buildings, streets, land use, the county boundaries of Maryland, rivers, addresses and much more.

Spatial Data Types.png

The principal spatial data types are Rasters and Vectors. Rasters have pixels and are what we normally call "images" like images taken by satellites and airplanes. The pixels in rasters have numeric values that can represent things like color, temperature or elevation. Rasters are most often used in natural sciences and in some social sciences like city planning.
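To make the idea of pixels holding numeric values concrete, here is a minimal sketch of a raster as a plain grid of numbers. The elevation values are invented for illustration, and real GIS software stores rasters in dedicated file formats rather than Python lists.

```python
# A tiny 3 x 4 raster: each cell (pixel) holds one numeric value,
# here a made-up elevation in meters.
elevation = [
    [120, 125, 131, 140],
    [118, 122, 128, 135],
    [115, 119, 124, 130],
]

rows = len(elevation)
cols = len(elevation[0])

# A pixel is addressed by its row and column position in the grid.
top_left = elevation[0][0]

# Because every cell is a number, summaries are simple arithmetic.
all_values = [value for row in elevation for value in row]
highest = max(all_values)
mean_elevation = sum(all_values) / len(all_values)

print(rows, cols, top_left, highest, round(mean_elevation, 2))
```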

Vector data represent features on the Earth's surface with points, lines and polygons, the elements of Euclidean geometry. A street, for example, would be represented as a line, while a building may be represented as a polygon. Addresses are often displayed as points. Vector data are usually used in connection with attribute data (see more below).
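As a rough illustration of points, lines and polygons, the sketch below describes three features using a layout modeled on the GeoJSON format. The coordinates and the "kind" property are made up for the example, and the vertex_count helper is hypothetical.

```python
# Three hypothetical features in a GeoJSON-like layout. Coordinates are
# illustrative (longitude, latitude) pairs, not real survey data.
features = [
    {"geometry": {"type": "Point",
                  "coordinates": (-76.6205, 39.3299)},
     "properties": {"kind": "address"}},
    {"geometry": {"type": "LineString",
                  "coordinates": [(-76.6220, 39.3290), (-76.6190, 39.3290)]},
     "properties": {"kind": "street"}},
    {"geometry": {"type": "Polygon",
                  # One outer ring; the last vertex repeats the first to close it.
                  "coordinates": [[(-76.6210, 39.3300), (-76.6200, 39.3300),
                                   (-76.6200, 39.3305), (-76.6210, 39.3305),
                                   (-76.6210, 39.3300)]]},
     "properties": {"kind": "building"}},
]


def vertex_count(geometry):
    """Return the number of distinct vertices in a simple geometry."""
    if geometry["type"] == "Point":
        return 1
    if geometry["type"] == "LineString":
        return len(geometry["coordinates"])
    if geometry["type"] == "Polygon":
        outer_ring = geometry["coordinates"][0]
        return len(outer_ring) - 1  # do not count the closing vertex twice
    raise ValueError("unsupported geometry type")


counts = [vertex_count(f["geometry"]) for f in features]
print(counts)
```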

There is an additional data type called LiDAR, or Light Detection and Ranging. An airplane or drone passes over the land and rapidly fires laser pulses toward the ground. Each pulse is reflected back up to a sensor, which measures the time elapsed; from that time, together with the aircraft's known position, the system derives the elevation and coordinates (latitude and longitude) of the point that was hit. LiDAR is a sort of hybrid between vector and raster because it begins as a collection of individual points, much like vector data, but is almost always processed and converted into raster data, and it is often used to create 3-D images.
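The step from "time elapsed" to elevation can be sketched with simple arithmetic: the pulse travels down and back at the speed of light, so the distance is half the round trip. This is a simplified sketch that assumes the pulse points straight down and ignores the aircraft's tilt and atmospheric effects; the function names and the sample numbers are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def lidar_range(round_trip_seconds):
    """Distance from sensor to ground: half of the round-trip path."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def ground_elevation(sensor_altitude_m, round_trip_seconds):
    """Elevation of the reflecting point, assuming a straight-down pulse."""
    return sensor_altitude_m - lidar_range(round_trip_seconds)


# Hypothetical pulse: sensor flying at 1000 m, echo received after 6 microseconds.
distance = lidar_range(6.0e-6)
elevation = ground_elevation(1000.0, 6.0e-6)
print(round(distance, 2), round(elevation, 2))
```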

Attribute Data - The What

Attributes are data that describe a location in terms of how many, how much, how often or when, and other characteristics about a population or thing at a given location. The attribute data you select depends on your research needs and can be quantitative or qualitative. These data may be represented as text or numbers in an attribute table.

Attribute data.PNG

Figure 1 Baltimore Crime Incidents.

Figure 1 is an example of an attribute table for crime incidents in Baltimore. The attributes include the date, time, address, type, whether the crime happened in a premise or outside of it, any weapon used, and district. These descriptive headings and their columns are called fields while each instance is listed in a row and is called a record or sometimes a tuple.

Each record in the attribute table corresponds to a single spatial feature. In the image below we see three census block groups highlighted along with the corresponding rows in the attribute table.

Tabel and features.PNG

Attribute tables come in a variety of formats including, most commonly, Microsoft Excel spreadsheets and comma-separated or otherwise delimited text files (CSV). Attribute tables are often available separately from shapefiles and must be joined to the spatial data. Occasionally the data must be "cleaned up" before being joined to spatial data.
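Joining a separate attribute table to spatial features comes down to matching records on a shared key. The sketch below shows the idea with Python's standard csv module; the "geoid" key, the feature list and the CSV contents are all invented for illustration, and GIS programs perform this step through their own join tools.

```python
import csv
import io

# Hypothetical spatial features, each identified by a key field.
features = [
    {"geoid": "A1", "geometry": "polygon A1"},
    {"geoid": "B2", "geometry": "polygon B2"},
    {"geoid": "C3", "geometry": "polygon C3"},
]

# Hypothetical attribute table delivered as CSV text. Note the stray
# space before 1200: small "clean up" problems like this are common.
csv_text = """geoid,population
A1, 1200
B2,950
"""

# Clean up each value, then index the attribute rows by the key field.
reader = csv.DictReader(io.StringIO(csv_text))
attributes = {
    row["geoid"].strip(): int(row["population"].strip())
    for row in reader
}

# Join: copy each feature and attach the matching attribute, if any.
joined = []
for feature in features:
    record = dict(feature)
    record["population"] = attributes.get(feature["geoid"])  # None if no match
    joined.append(record)

for record in joined:
    print(record["geoid"], record["population"])
```

Feature C3 has no row in the table, so its population stays empty (None), which is how an unmatched record typically shows up after a join.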

As noted above, attribute data may be stored as text or numbers, and a table can also hold other kinds of content such as dates and images. Each field has a specific field property according to the type of data stored in it: Long Integer, Short Integer, Float, Double, Date, Text, Raster, Blob and GUID. When you import a table, the field property is assigned automatically. However, if you need to create a new field (and you will often have to in GIS projects), you must understand the data and select the correct field type. The type of data you need will influence the field property you select:

Numeric Data

Short Integer: used for integers (e.g. -3, -2, -1, 0, 1, 2, 3…) in the range -32,768 to 32,767, inclusive
Long Integer: used for integers in the range −2,147,483,648 to 2,147,483,647, inclusive
Float: used for numbers with decimals (single-precision floating point), reliable to about 6 or 7 significant digits, e.g. 125.25
Double: used for numbers with decimals (double-precision floating point), reliable to about 15 significant digits, e.g. 1325.00000001

Essentially, short and long integers are used for numbers without decimals, and float and double are used for numbers with decimals.
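The practical difference between float and double is how many digits survive storage. The sketch below uses Python's standard struct module to round-trip a value through 4-byte (single) and 8-byte (double) floating-point storage; it assumes the GIS field types behave like these standard formats, and the sample value is arbitrary.

```python
import struct


def as_single(value):
    """Store a number as a 4-byte float and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def as_double(value):
    """Store a number as an 8-byte double and read it back."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


original = 1325.123456789

stored_single = as_single(original)
stored_double = as_double(original)

# The single-precision copy keeps only the leading digits;
# the double-precision copy is unchanged.
print(stored_single)
print(stored_double)
print(abs(stored_single - original))
```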

The chief difference between short and long integers is the amount of storage required on the computer: short integers are stored as 2-byte values and long integers are stored as 4-byte values. These are vestiges of the early days of computing, when storage space was at a far greater premium than in today's computers. Nevertheless, when working with very large datasets it may help to store data as short integers where possible to reduce storage and speed up processing.
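The ranges quoted for short and long integers follow directly from the number of bytes used to store each value. The sketch below derives them, assuming standard signed two's-complement storage; the helper names are hypothetical.

```python
import struct


def signed_range(num_bytes):
    """Smallest and largest values a signed integer of this size can hold."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


short_min, short_max = signed_range(2)  # 2-byte short integer
long_min, long_max = signed_range(4)    # 4-byte long integer


def fits_short(value):
    """True if the value can be stored in a short integer field."""
    return short_min <= value <= short_max


# struct confirms the storage sizes of the standard 2- and 4-byte formats.
short_size = struct.calcsize("<h")
long_size = struct.calcsize("<i")

print(short_min, short_max)
print(long_min, long_max)
print(fits_short(30000), fits_short(40000))
```

A county population of 40,000, for example, would not fit in a short integer field and would need a long integer.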

Text Data

Text is used for string data like single words, sentences, and paragraphs if needed.

Other Types

Date is used for dates. You can enter dates in formats like January 23, 1978, or 2010/03/25 and they will automatically be converted to MM/DD/YYYY format.
Blob, or Binary Large Object, is used to store files like video and audio.
Raster is used to link to photographs. The photograph is not displayed in the table, but a link in the records will provide a thumbnail pop-up of the image.
GUID, or Globally Unique Identifier, is used in advanced applications of GIS as an identifier in rows.
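The automatic conversion described for the Date field can be imitated with Python's standard datetime module. This is only a sketch of the idea, not how any particular GIS program implements it; the list of accepted formats and the to_mm_dd_yyyy helper are assumptions, and month names are parsed according to the English default locale.

```python
from datetime import datetime

# Input formats this sketch accepts (an assumption for the example).
KNOWN_FORMATS = ("%B %d, %Y", "%Y/%m/%d")


def to_mm_dd_yyyy(text):
    """Parse a date written in a known format and return MM/DD/YYYY."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%m/%d/%Y")
        except ValueError:
            continue  # try the next format
    raise ValueError("unrecognized date format: " + text)


first = to_mm_dd_yyyy("January 23, 1978")
second = to_mm_dd_yyyy("2010/03/25")
print(first, second)
```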

When doing quantitative analyses using more than one field, it is necessary to make sure the fields have the correct property type. Numeric values can be stored as text, but if you want to calculate population density, for instance, you cannot use numbers stored in a text field; they must be stored in a numeric field such as integer, float or double. Similarly, when you are calculating a variable in a new field, it is essential that you first specify the correct field property. If you are calculating population density you will likely end up with numbers that have a decimal, so you do not want to use an integer, but rather a float or double.
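The population density case can be sketched in a few lines. The records below are invented, with the numbers deliberately stored as text to show why they must be converted before any arithmetic, and why the result belongs in a float or double field rather than an integer field.

```python
# Hypothetical records whose numeric values arrived as text.
records = [
    {"name": "Tract A", "population": "4200", "area_sq_km": "1.5"},
    {"name": "Tract B", "population": "955", "area_sq_km": "0.5"},
    {"name": "Tract C", "population": "3101", "area_sq_km": "2.0"},
]

# Text values cannot be used for arithmetic: "adding" them only joins the characters.
joined_text = records[0]["population"] + records[1]["population"]

# Convert to numeric types first, then compute density as a decimal number.
for record in records:
    population = int(record["population"])
    area = float(record["area_sq_km"])
    record["density"] = population / area  # people per square kilometer

densities = [record["density"] for record in records]

# Storing the result as an integer would drop the decimal part.
truncated = int(densities[2])

print(joined_text)
print(densities)
print(truncated)
```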

Projections and Datums

A projection is a mathematical relationship between the Earth and a map representing it. Because the Earth is three-dimensional it appears to be distorted when represented by a two-dimensional map. The projection is used to make the distortion regular and predictable, and different projections handle the distortion in different ways.

The Mercator Projection is one of the best known, and a variant of it (Web Mercator) is used by web maps such as Google Maps. The Mercator projection tends to exaggerate polar regions and makes Greenland, for example, appear to be much larger than it really is, relative to the rest of the Earth.

Mercator Projection
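The exaggeration of polar regions can be quantified. The sketch below uses the standard formulas for the Mercator projection of a sphere, in which areas are inflated by 1/cos²(latitude). It treats the Earth as a perfect sphere, and the latitude of 72 degrees north is only a rough stand-in for the middle of Greenland.

```python
import math


def mercator_y(lat_deg, radius=1.0):
    """North-south map coordinate of a latitude on a spherical Mercator map."""
    phi = math.radians(lat_deg)
    return radius * math.log(math.tan(math.pi / 4 + phi / 2))


def area_exaggeration(lat_deg):
    """How many times larger an area appears than it would at the Equator."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


equator = area_exaggeration(0)
at_60 = area_exaggeration(60)
at_72 = area_exaggeration(72)  # roughly central Greenland (an approximation)

print(round(equator, 2), round(at_60, 2), round(at_72, 2))
print(round(mercator_y(60), 4), round(mercator_y(80), 4))
```

Under these assumptions, land near 72 degrees north is drawn at roughly ten times its true relative area.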

Other projections, like the Mollweide Projection, keep the relative sizes of land masses accurate but distort their shapes, especially in the polar regions and toward the left and right edges of the map.

Mollweide Projection

Some projections, like the interrupted form of the Sinusoidal Projection, divide the Earth into sections to minimize distortion by localizing it, but make it difficult to interpret the Earth as a whole. In the example below, Iceland and Alaska are divided between the west and central sections.

Sinusoidal Projection
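The reason for dividing the map into sections can be seen in the sinusoidal formula itself: a point's east-west position is measured from a central meridian, and shapes become more sheared the farther they lie from it. Giving each section its own central meridian keeps every place close to one. The sketch below uses the standard spherical formula; the sample coordinates for Iceland are approximate.

```python
import math


def sinusoidal(lon_deg, lat_deg, central_meridian_deg=0.0, radius=1.0):
    """Project longitude/latitude to x/y on a spherical sinusoidal map."""
    lam = math.radians(lon_deg - central_meridian_deg)
    phi = math.radians(lat_deg)
    x = radius * lam * math.cos(phi)
    y = radius * phi
    return x, y


# A point near Iceland (about 20 degrees west, 65 degrees north).
far_x, far_y = sinusoidal(-20.0, 65.0, central_meridian_deg=60.0)
near_x, near_y = sinusoidal(-20.0, 65.0, central_meridian_deg=-20.0)

# With a distant central meridian the point sits far off-center;
# with a nearby one it sits on the undistorted center line (x = 0).
print(round(far_x, 4), round(near_x, 4))
print(round(far_y, 4), round(near_y, 4))
```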

Choosing a Projection

Choosing a projection is influenced primarily by the extent of the area you are mapping. For example, if you are mapping the entire Earth, you will want to use a projection intended for the whole Earth. If you are mapping Antarctica, then you will want to use a projection for Antarctica that places the South Pole in the middle, as opposed to a world projection, which stretches Antarctica across the bottom of the map.

In the images below we can see the continental United States displayed with a world projection on the left and a conic projection on the right. As we have seen, all projections distort the appearance of the Earth in different ways, and in the world projection on the left the United States appears flattened. By using a regional projection for North America instead of a world projection, the United States will appear more "proportional."

If we look closely at the above right map we can see that the Mason-Dixon line is oriented south-west to north-east. If we zoom in we will get a map of Maryland that looks like the one shown below to the left. The map on the right utilizes a projection system made especially for Maryland, which orients the state in a way more familiar to most people with the north to the top.

Datums

In modern mapping and GIS, projections rely on a datum to model the surface of the Earth instead of relying on direct measurement of the Earth. Because the Earth is not perfectly spherical and its surface is marked by great contrasts of elevation, from deep canyons to high mountain peaks, it is necessary to make a model that accurately reflects the local landscape and use it to assign a projection that is very accurate for that area. So, for example, there is a datum or model of North America called the "North American Datum," developed by the U.S. National Geodetic Survey and its predecessor, the U.S. Coast and Geodetic Survey. The process of developing a datum is within the purview of the science of geodesy, and its most technical details are beyond the scope of this guide.
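One concrete ingredient of a datum is the ellipsoid it uses to approximate the Earth's shape. As a hedged illustration, the sketch below compares the ellipsoids commonly associated with the two versions of the North American Datum: Clarke 1866 for NAD27 and GRS 80 for NAD83. The parameter values are the commonly published ones and should be checked against an authoritative source before serious use; a full datum also involves reference points and other details not shown here.

```python
# Commonly published ellipsoid parameters (verify before relying on them).
GRS80_SEMI_MAJOR_M = 6378137.0            # equatorial radius, NAD83
GRS80_INVERSE_FLATTENING = 298.257222101

CLARKE1866_SEMI_MAJOR_M = 6378206.4       # equatorial radius, NAD27
CLARKE1866_SEMI_MINOR_M = 6356583.8       # polar radius, NAD27


def semi_minor_from_flattening(semi_major, inverse_flattening):
    """Polar radius b = a * (1 - f), where f = 1 / inverse flattening."""
    return semi_major * (1.0 - 1.0 / inverse_flattening)


def inverse_flattening(semi_major, semi_minor):
    """Inverse flattening 1/f = a / (a - b)."""
    return semi_major / (semi_major - semi_minor)


grs80_semi_minor = semi_minor_from_flattening(
    GRS80_SEMI_MAJOR_M, GRS80_INVERSE_FLATTENING)
clarke_inverse_f = inverse_flattening(
    CLARKE1866_SEMI_MAJOR_M, CLARKE1866_SEMI_MINOR_M)

# The Earth is slightly squashed: the polar radius is about 21 km shorter.
print(round(grs80_semi_minor, 4))
print(round(GRS80_SEMI_MAJOR_M - grs80_semi_minor, 1))
print(round(clarke_inverse_f, 4))
```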

Where does GIS data come from?

GIS data comes in a wide variety of formats and from just as many sources. You can collect GIS data from your phone as you walk or run around, or even as you drive. This is how Google Maps can determine high traffic volume areas when you plan routes between two addresses. By comparing the speeds and numbers of phones as they move through traffic, Google Maps can automatically infer the density of traffic in a given location and indicate the location on the map.
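The general idea of inferring traffic from phone speeds can be sketched as follows. This is a toy illustration only, not a description of how Google Maps actually works: the road segments, speeds, typical free-flow speeds and thresholds are all invented.

```python
from collections import defaultdict

# Hypothetical anonymized phone observations: (road segment, speed in km/h).
pings = [
    ("I-83 N", 22.0), ("I-83 N", 18.0), ("I-83 N", 25.0),
    ("Charles St", 38.0), ("Charles St", 42.0),
]

# Hypothetical typical uncongested speed for each segment, in km/h.
free_flow_kmh = {"I-83 N": 88.0, "Charles St": 40.0}

# Group the observed speeds by road segment.
speeds_by_segment = defaultdict(list)
for segment, speed in pings:
    speeds_by_segment[segment].append(speed)


def classify(segment):
    """Label a segment by comparing its mean observed speed to free flow."""
    speeds = speeds_by_segment[segment]
    mean_speed = sum(speeds) / len(speeds)
    ratio = mean_speed / free_flow_kmh[segment]
    if ratio < 0.5:      # arbitrary threshold for this sketch
        return "heavy"
    if ratio < 0.8:      # arbitrary threshold for this sketch
        return "moderate"
    return "light"


traffic = {segment: classify(segment) for segment in speeds_by_segment}
print(traffic)
```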

Data you collect on your phone can be downloaded and integrated in a GIS program. This is also a cost-effective way to collect custom data for research projects focusing on small or local areas.

Most of the data that you can download from the internet was generated by the Federal Government for use with Census Bureau programs, or by federally funded research projects whose open access requirements oblige researchers to let others see and use the data they create.
Another source of GIS data is state and local governments, which use GIS to manage infrastructure and natural resources and often provide their data online for free.
A third source of GIS data is universities, as mentioned above. Some universities collaborate in consortia and make data available to the public independent of Federal open data requirements.
Non-profit organizations and private firms also generate a lot of data, though it is often proprietary and quite expensive.

Here are some sources for GIS data that are freely available:

Federal Census

U. S. Census Bureau (data.census.gov) This is the main resource for census data from the U. S. Census Bureau, and contains census data and spatial data from the Decennial Census and the American Community Survey (ACS), from about 2000 to the present. The census contains general health and healthcare accessibility data, especially in the ACS.
National Historical GIS [NHGIS] (via IPUMS at the Univ. of Minnesota) One of the best sources of Census data. Contains Decennial Census and American Community Survey data, along with special censuses like the Religious Bodies census, state and local census data and more, from 1790 to the present.

Maryland

MD iMap Maryland's main portal for GIS data generated by the state and other sources. This resource contains spatial data for a wide range of topics, including health and healthcare access.

Baltimore

Baltimore Open City GIS Data is Baltimore's public platform for exploring and downloading open data, discovering and building apps, and engaging to solve important local issues. Analyze and combine Open Datasets using maps, as well as develop new web and mobile applications.

Other Data Sources

Data.gov

Data.gov is the Federal Government's main open data portal. Here you can find extensive sources for GIS and other data. Much of the data is user-generated during federally funded research.

Inter-university Consortium for Political and Social Research (ICPSR)

ICPSR is an international consortium of more than 750 academic institutions and research organizations and maintains a data archive of more than 250,000 files relating to research in the social and behavioral sciences. The archive contains 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.

General Social Survey (via the National Opinion Research Center [NORC] at the Univ. of Chicago.)

The General Social Survey (GSS) studies the growing complexity of American society. It is the only full-probability, personal-interview survey designed to monitor changes in both social characteristics and attitudes currently being conducted in the United States.

IPUMS (via the University of Minnesota) The umbrella project of NHGIS. It contains data on a variety of topics like health, education, international data and more.
National Center for Education Statistics (NCES) (via the Dept. of Education) A resource for data about schools and school districts.
