Data capture is the process of finding and retrieving, or creating, the data we need for our project. Before searching for data, we must make two decisions about the spatial and temporal scope of the data we are looking for.
Spatial Scope: Identify the precise area we are interested in studying, e.g. Maryland, Baltimore City, Baltimore City and Baltimore County, the Washington Metropolitan region, etc.
Temporal Scope: Identify the precise time period we are looking for, e.g. the most recent data available, a comparison of 2000 and 2010, etc.
Data capture is often the most time-consuming part of a GIS project. Not only are there many data sources that you can choose from, but there are other issues to consider, such as the currency, reliability, and accuracy of the data.
Furthermore, the data you need may not be available in the format you want, or it may not be available at all. For example, you may want a data set of all food stores in Baltimore but find only a data set of food stores for the entire state of Maryland; or you may want food store data and not find any at all.
Although many situations can emerge in the data capture phase, we can generalize all of them into the following scenarios:
Examples:
Scenario 1:
We want to analyze the population of Maryland using the Index of Dissimilarity to measure the level of integration between African-Americans and Whites in Maryland with the American Community Survey data from 2013-2017.
In this case we can retrieve data directly from a source like NHGIS.org or data.census.gov. These sources have census data by race as well as spatial data for the state of Maryland, making them an exact match to our needs.
Scenario 2:
We want to analyze the population of Baltimore using the Index of Dissimilarity to measure the level of integration between African-Americans and Whites in Baltimore.
In this case we can find data from NHGIS or the Census Bureau as above, but we will have to process it to extract Baltimore from the wider Maryland dataset.
Scenario 3:
We want to create a map of police call boxes on Morgan’s campus, but cannot find one on any website, and after contacting several campus offices we determine that there probably is no data about the locations of police call boxes on campus.
We will have to create the data from scratch.
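Scenarios 1 and 2 both rely on the Index of Dissimilarity. In its standard form it is D = 1/2 * sum over all areal units of |b_i/B - w_i/W|, where b_i and w_i are the two group counts in unit i (for example, a census tract) and B and W are the group totals for the whole study area. The sketch below is one minimal way to compute it in Python; the function name and the tract counts are hypothetical illustrations, not real ACS figures.

```python
def dissimilarity_index(group_a, group_b):
    """Index of Dissimilarity for two groups counted over the same areal units.

    group_a and group_b are sequences of counts, one entry per unit
    (e.g. per census tract), listed in the same unit order.
    """
    if len(group_a) != len(group_b):
        raise ValueError("Both groups need one count per areal unit.")
    total_a = sum(group_a)
    total_b = sum(group_b)
    if total_a == 0 or total_b == 0:
        raise ValueError("Each group needs a non-zero total population.")
    # Half the sum of absolute differences between each unit's share of the two groups
    return 0.5 * sum(
        abs(a / total_a - b / total_b) for a, b in zip(group_a, group_b)
    )


# Hypothetical counts for four tracts (illustration only, not real data)
black_counts = [100, 300, 50, 550]
white_counts = [400, 100, 450, 50]

d = dissimilarity_index(black_counts, white_counts)
print(round(d, 3))
```

A value of 0 means both groups are distributed identically across the units, and a value of 1 means they share no units at all.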
Assessing the quality and completeness of spatial data (e.g., checking for spatial coverage, resolution, and precision of coordinate data).
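Data processing is a set of techniques for altering data to make it more responsive to project needs. Before going further, let's review the three scenarios that we saw in the Data Capture Module: data that exactly matches our needs, data that must be processed before we can use it, and data that we must create from scratch.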
These scenarios will influence what tools we can use to process data.
In the ArcMap application there are several tools for processing data:
Joining - This tool is used to connect attribute and spatial data. There are two types of join: joining attribute tables to spatial data and joining spatial data to attribute tables. This process is required in most circumstances.
Analysis Tools - This toolbox may seem out of place, since we have modules on spatial analysis following this one. However, many of the tools in the Analysis toolbox are used to process data and adapt it to our purposes. For example, if we need Baltimore data but only have data for the entire state of Maryland, we can use the Clip tool to extract Baltimore from the state-level dataset, as in Scenario 2. In practice, we will often alternate between data processing and spatial analysis procedures.
Geocoding Tools - These tools are used to convert address data to point data. The process of geocoding results in a point shapefile with the addresses in the attribute table.
Editing Tools - These tools can be used to correct spatial datasets that contain inaccuracies or other problems. Furthermore, there is an editing tool in the ArcMap toolbar that can be used to edit spatial as well as attribute data.
Conversion Tools - These include tools for converting PDF to raster, GPS-captured data to points or lines, rasters to polygons, Google Maps KML format to feature classes, and so on. These are especially useful in Scenario Three above, where we find data that we are looking for, but it requires extensive processing before we can integrate it into a GIS project.
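To make the joining and extraction steps more concrete, here is a minimal sketch in plain Python rather than ArcMap. It joins a hypothetical attribute table to hypothetical tract records on a shared GEOID field, then pulls out the Baltimore City records for Scenario 2. Note that this is an attribute-based selection, not the geometric Clip tool: it assumes each tract GEOID begins with the state and county FIPS code, which is 24510 for Baltimore City. The function names, field names, and values are all illustrative assumptions.

```python
# Hypothetical tract records, standing in for rows of a spatial layer's attribute table
tracts = [
    {"GEOID": "24510010100", "NAME": "Tract A"},
    {"GEOID": "24510010200", "NAME": "Tract B"},
    {"GEOID": "24005400100", "NAME": "Tract C"},
]

# Hypothetical attribute table downloaded separately (values are made up)
population = [
    {"GEOID": "24510010100", "total_pop": 3200},
    {"GEOID": "24005400100", "total_pop": 4100},
]


def attribute_join(features, table, key):
    """Attach fields from table to each feature that shares the same key value.

    Every feature is kept; features with no matching row gain no new fields.
    """
    lookup = {row[key]: row for row in table}
    joined = []
    for feature in features:
        merged = dict(feature)  # copy so the input records are not modified
        for field, value in lookup.get(feature[key], {}).items():
            if field != key:
                merged[field] = value
        joined.append(merged)
    return joined


def select_by_prefix(features, key, prefix):
    """Keep only the features whose key value starts with the given prefix."""
    return [f for f in features if f[key].startswith(prefix)]


BALTIMORE_CITY_FIPS = "24510"  # state 24 (Maryland) + county 510 (Baltimore City)

joined = attribute_join(tracts, population, "GEOID")
baltimore = select_by_prefix(joined, "GEOID", BALTIMORE_CITY_FIPS)

for record in baltimore:
    print(record)
```

Selecting by GEOID prefix only works when the study area lines up with county boundaries. For an arbitrary study area, a geometric operation such as Clip is still needed.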