
Data Preprocessing

A guide to methods and resources in Data Pre-processing

About Pre-Processing

Data pre-processing is the part of a data project in which you modify the data to resolve quality issues and reshape it to meet the needs of the selected analytic technique(s). In some contexts you will hear terms like data wrangling or data munging described as processes distinct from data pre-processing. These terms are not well defined, however, and can generally be used interchangeably.

The steps taken in pre-processing depend on what is learned about the data in the Exploratory Data Analysis (EDA) phase. For example, if you find null values during the EDA, you will decide how to handle them based on their number and distribution, and then implement the mitigation technique during the pre-processing phase.
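As a rough illustration of that EDA-then-mitigate flow, the sketch below measures the share of null values in a column and then picks a mitigation. Everything in it is an assumption made for illustration: the toy records, the helper names, the 25% threshold, and the policy itself (drop rows when nulls are rare, fill with the median otherwise). It uses only the Python standard library and is not a recommended rule for any particular data set.

```python
from statistics import median

# Hypothetical toy records; None marks a missing (null) value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 29, "income": None},
    {"age": 41, "income": 58000},
    {"age": 37, "income": None},
]

# Assumed cut-off: above this share of nulls, dropping rows loses too much data.
MAX_NULL_FRACTION = 0.25


def null_fraction(rows, column):
    """EDA step: fraction of rows in which `column` is missing."""
    return sum(r[column] is None for r in rows) / len(rows)


def drop_nulls(rows, column):
    """Mitigation A: discard rows in which `column` is missing."""
    return [r for r in rows if r[column] is not None]


def impute_median(rows, column):
    """Mitigation B: replace missing values with the median of observed values."""
    fill = median(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def handle_nulls(rows, column, max_fraction=MAX_NULL_FRACTION):
    """Pre-processing step: choose a mitigation based on what the EDA found."""
    if null_fraction(rows, column) <= max_fraction:
        return drop_nulls(rows, column)
    return impute_median(rows, column)


cleaned_age = handle_nulls(rows, "age")        # 1 of 5 missing: rows dropped
cleaned_income = handle_nulls(rows, "income")  # 2 of 5 missing: median imputed
```

In this sketch the decision depends only on the number of nulls. A fuller treatment would also look at their distribution, for example whether the missing values cluster in particular rows or groups, before choosing between dropping and imputing.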

The large number and diversity of data sets, the problems that can occur in them, and the many analytic approaches that can be applied make it difficult to define a standard set of pre-processing approaches. However, this guide offers guidance on common pre-processing steps that apply to many typical data sets.


©2018 Morgan State University | 1700 East Cold Spring Lane Baltimore, Maryland 21251 | 443-885-3333 | Privacy | Accessibility