Course Detail
Units:
3.0
Course Components:
Lecture
Enrollment Information
Enrollment Requirement:
Prerequisites: 'C' or better in CS 3500 AND DS 3190 AND Full Major status in Data Science.
Description
Data mining is the study of efficiently finding structures and patterns in large data sets. We will focus on: (1) converting from a messy and noisy raw data set to a structured and abstract one, (2) applying scalable and probabilistic algorithms to these well-structured abstract data sets, and (3) formally modeling and understanding the error and other consequences of parts (1) and (2), including choice of data representation and trade-offs between accuracy and scalability. These steps are essential for training as a data scientist. Topics will include: similarity search, clustering, regression/dimensionality reduction, graph analysis, PageRank, and small space summaries. We will also cover several recent developments and applications.