College of Natural Sciences and Mathematics; Daniels College of Business

Cleaning Data for Effective Data Science: Data Ingestion, Anomaly Detection, Value Imputation, and Feature Engineering

Instructor: Pearson

This course introduces the tools and techniques needed for data ingestion, anomaly detection, value imputation, and feature engineering. It covers ingesting data from many formats, including JSON, CSV, SQL RDBMS, HDF5, NoSQL databases, and binary serialized data structures. Instructor David Mertz explains why some problems are peculiar to how data is represented, while others are inherent in the data itself. To address untidy data, you’ll learn how and when to impute missing values, detect unreliable data and statistical anomalies, and generate the synthetic features needed for successful data analysis and visualization. By the end of this course, you’ll be equipped with highly marketable, in-demand skills in data analysis, machine learning, and data integrity troubleshooting.
