Data Science Modules

Data science modules are short explorations into data science that give students the opportunity to work hands-on with a data set relevant to their course and receive some instruction on the principles of data analysis, statistics, and computing. With help from the Data Science Modules development team, a module can be designed and taught in an existing course from any discipline or field.

Featured Modules

Some great modules to check out!


Linguistics 110

Introduction to Phonetics and Phonology

Susan Lin

This module consists of two labs in which students analyze data collected on themselves and compiled for the whole class. In the first lab, students explore closure duration and voice-onset time, and look at the effect of native language and height on those measurements. In the second lab, students analyze vowel formants, compare their vowel space to the rest of the class, and investigate how well individual metadata predict these characteristics. The module is designed for students with little-to-no coding or statistics experience.

Sociology 130 AC

Neighborhood Mapping

Joanna Reed

This module maps and visualizes socioeconomic and demographic variation across East Bay census tracts using crowdsourced student data. Students go out into neighborhoods, make qualitative observations, and then compare those observations with census data. In the notebook, the individual observations are mapped and combined into an overall map. Students engage with the data they have gathered and can explore it across student groups, socioeconomic categories, or geographic locations. The module is designed for students with little-to-no coding or statistics experience.

Econ 101B

Macroeconomics

Brad DeLong

This core mathematical macroeconomics class has a semester’s worth of problem sets. The first notebook shows how a notebook can be written like a textbook, with LaTeX formulas and graphical illustrations. Its modeling section illustrates a Solow growth model with six graphical panels. The module also uses autograding to give students immediate feedback on early practice problems, so they know whether they are on the right track. The module is designed for students with little-to-no coding or statistics experience.
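
As a rough illustration of the kind of model the first notebook covers, here is a minimal Solow growth sketch. It assumes a Cobb-Douglas production function in per-worker terms, y = k^alpha, a constant saving rate s, a depreciation rate delta, and no population or technology growth; these choices and the parameter values are assumptions for illustration, not necessarily the specification used in the Econ 101B notebook. Capital per worker evolves as k(t+1) = (1 - delta) k(t) + s k(t)^alpha and converges to the steady state k* = (s / delta)^(1 / (1 - alpha)).

```python
def solow_path(k0, s, alpha, delta, periods):
    """Simulate capital per worker: k_{t+1} = (1 - delta) * k_t + s * k_t**alpha.

    Assumes Cobb-Douglas output y = k**alpha and no population or
    technology growth (a simplified, hypothetical specification).
    """
    path = [k0]
    k = k0
    for _ in range(periods):
        k = (1 - delta) * k + s * k ** alpha  # depreciated capital plus new saving
        path.append(k)
    return path


def steady_state(s, alpha, delta):
    """Capital per worker where saving s*k**alpha exactly offsets depreciation delta*k."""
    return (s / delta) ** (1 / (1 - alpha))


if __name__ == "__main__":
    s, alpha, delta = 0.2, 0.3, 0.05  # illustrative parameter values
    path = solow_path(1.0, s, alpha, delta, 500)
    print(f"k after 500 periods: {path[-1]:.3f}")
    print(f"steady state k*:     {steady_state(s, alpha, delta):.3f}")
```

Because each step moves k toward the point where saving exactly offsets depreciation, a path that starts below k* rises steadily toward it without overshooting.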

XEnglish 1A

Chinatown and Culture of Exclusion

Amy Lee

Using demographic data from the 20th and 21st centuries, this module has students analyze how a specific Chinatown, such as SF Chinatown, has changed over time. Students also use simple computational text analysis methods to explore and compare the structures of poems written on Angel Island and in Chinatown publications from the early 20th century. The module is designed for students with little-to-no coding or statistics experience.

Psychology 167 AC

Implicit Bias and Social Outcomes

Rudy Mendoza-Denton

This module introduces students to correlation and regression analysis. Students choose one dataset on health outcomes and one on implicit bias, both measured at the county level for the entire US. They then merge the two datasets by county census code and compute correlations and regressions to see whether biases are associated with health outcomes. The module is designed for students with little-to-no coding or statistics experience.
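
The merge-then-correlate workflow can be sketched with Python's standard library alone. This is a hypothetical example: the dictionaries, the five-digit county FIPS codes used as keys, and all the values are made up, and the actual module may use a table library and different column names. The merge is an inner join, so counties missing from either dataset are dropped before computing Pearson's r and a least-squares regression line.

```python
from math import sqrt

# Hypothetical toy data: county FIPS code -> value. Numbers are made up.
bias_by_county = {"06001": 0.30, "06013": 0.35, "06075": 0.25, "36061": 0.40}
health_by_county = {"06001": 12.0, "06013": 13.1, "06075": 11.2, "17031": 14.0}


def merge_on_code(left, right):
    """Inner join: keep only county codes present in both datasets."""
    shared = sorted(left.keys() & right.keys())
    return [(code, left[code], right[code]) for code in shared]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ols_fit(xs, ys):
    """Least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


rows = merge_on_code(bias_by_county, health_by_county)
bias = [b for _, b, _ in rows]
health = [h for _, _, h in rows]
print(len(rows), "counties in both datasets")
print("r =", round(pearson_r(bias, health), 3))
print("slope, intercept =", ols_fit(bias, health))
```

A nonzero r or slope here describes an association across counties; on its own it does not show that bias causes the health outcome.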

XRhetoric 1A

Moral Foundations Theory

Amy Tick

This module connects word use in political speeches to Moral Foundations Theory. Statistical inferences and visualizations drawn from this data help students look for rhetorical differences between conservative and liberal presidential candidates. Students then engage with and critique the data-driven methods themselves as rhetorical tools. The module is meant to take three class periods and is designed for students with minimal to no coding experience.

Cuneiform 102

Sumerian Text Analysis

Niek Veldhuis

This module works with an interesting data set, the Electronic Text Corpus of Sumerian Literature (ETCSL). These texts are translated from fragmentary tablets roughly 4,000 years old. The module uses techniques that are less common in text analysis, such as k-means clustering, hierarchical clustering, and multidimensional scaling. These methods make it possible to classify a newly translated text against known Sumerian literature and to create tree diagrams and cluster plots. The module is designed for students with little-to-no coding or statistics experience.
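
To give a sense of what k-means does, the sketch below implements Lloyd's algorithm in plain Python on a few made-up vectors, where each vector stands for the relative frequencies of three hypothetical words in one text. Seeding with farthest-first traversal is a choice made here to keep the result deterministic; common libraries such as scikit-learn default to k-means++ initialization instead. Hierarchical clustering and multidimensional scaling are not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic farthest-first seeding (an assumption)."""
    # Seeding: start from the first point, then repeatedly add the point
    # farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = mean(members)
    return labels, centroids


# Made-up relative frequencies of three hypothetical words in six texts.
texts = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.85, 0.15, 0.05],
    [0.10, 0.90, 0.80],
    [0.00, 0.80, 0.90],
    [0.15, 0.85, 0.70],
]
labels, centroids = kmeans(texts, k=2)
print(labels)  # → [0, 0, 0, 1, 1, 1]
```

Texts that share a label have similar vocabulary profiles; a newly translated text could be classified by assigning it to the nearest centroid.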

LEGALST-190

Data, Prediction, and Law

Jonathan Marshall

This module introduces exploratory data analysis as a lab within a semester-long, data-enabled course. Using data from 2016 US presidential campaign speeches, students mine features from the speech text, create visualizations for data exploration, and apply principal component analysis to the extracted features. The module concludes with an example of a three-dimensional feature plot. The course is designed for students who have taken Data 8 or have equivalent experience.
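
Principal component analysis finds the direction along which the extracted features vary most. The stdlib-only sketch below centers a small feature matrix, forms its covariance matrix, and estimates the first principal component by power iteration. The feature rows are hypothetical stand-ins for per-speech features (for example, counts of chosen words), and the module itself may well rely on a library routine based on the singular value decomposition instead.

```python
from math import sqrt


def center(rows):
    """Subtract each column's mean so every feature has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows):
    """Sample covariance matrix of already-centered rows (needs at least 2 rows)."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)] for i in range(d)]


def first_component(cov, iterations=500):
    """Top eigenvector of a covariance matrix by power iteration.

    The all-ones starting vector fails if it happens to be orthogonal to
    the top eigenvector; SVD-based library routines avoid this issue.
    """
    d = len(cov)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            return v  # degenerate case: no variance along the current direction
        v = [x / norm for x in w]
    return v


def project(rows, component):
    """Score of each centered row along the given component."""
    return [sum(x * c for x, c in zip(row, component)) for row in rows]


if __name__ == "__main__":
    # Hypothetical per-speech features, e.g. counts of three chosen words.
    features = [[12, 30, 4], [15, 33, 6], [9, 25, 3], [20, 41, 8], [11, 27, 5]]
    centered = center(features)
    pc1 = first_component(covariance(centered))
    print("first component:", [round(x, 3) for x in pc1])
    print("scores:", [round(s, 2) for s in project(centered, pc1)])
```

Further components, such as the three needed for a 3-dimensional plot, can be obtained by deflation: subtract lambda v v^T from the covariance matrix, where lambda = v^T C v, and repeat the power iteration.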