Course Detail
Units:
3.0
Course Components:
Lecture
Enrollment Information
Enrollment Requirement:
Prerequisites: "C-" or better in (MATH 1170 OR MATH 1210 OR MATH 1250 OR MATH 1310 OR MATH 1311) OR Full Graduate status in Biomedical Informatics.
Requirement Designation:
Quantitative Intensive BS
Description
The course begins by bootstrapping students' coding skills in the Python programming language, followed by a review of the relevant concepts from statistics. After that, we will move through a series of data science methods using real-life, project-based lectures and computer labs. The major goals of this course are to learn how to use tools for acquiring, cleaning, analyzing, exploring, and visualizing data; to make data-driven inferences and decisions; and to communicate results effectively. These goals will be accomplished through an in-depth sequence of topics that introduces students to the following data preparation and analysis methods:
Acquiring data through web-scraping and data APIs,
Cleaning and reshaping messy datasets using data frames, regular expressions, or dedicated tools,
Exploratory data analysis and visualization,
Hypothesis testing,
Clustering and classification,
Rating and ranking,
Recommendations,
Network analysis,
Regression and statistical inference,
Natural language processing,
Working with large data: databases and parallel programming.
A major component of this course will be learning how to use Python-based programming tools to apply these methods to real-life datasets. Students should have a basic level of programming experience before taking this course.