
Course structure

This page gives a general description of the course structure for Data 8 and explains how to build your course syllabus. It reflects the content covered in the Data 8 textbook, which is freely available at the following link:

https://www.inferentialthinking.com/

The following sections describe some of the major takeaways that students should learn in the class. Data 8 assumes no prior programming or statistics experience, and no math beyond a standard high-school level.

Conceptual understanding of uncertainty and causality

Many of the technical components of the course give students practice with specific skills, such as programming. Students should learn these skills in order to solidify a high-level understanding of how data, statistics, and inference are interrelated. For example, see Chapter 2: Causality and Experiments from the Data 8 textbook.

Below are a few high-level concepts that students should come away with:

Programming fundamentals

Scripting and interactive computing are the primary ways that we operationalize the data science methods covered in the course. While it is possible to find programs that let you carry out various techniques through graphical user interfaces, Data 8 stresses that programming fundamentals facilitate learning the analytic topics and provide a more useful and general skillset in computational methods.

In Data 8, programming fundamentals are taught alongside statistical concepts. For example, iteration is taught alongside random sampling.
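As a rough illustration of how iteration and random sampling can be taught together, the sketch below uses a `for` loop to draw many random samples and record each sample's mean. It uses only Python's standard library rather than the tools used in the course, and the function name, the die example, and the parameter values are assumptions made for this illustration.

```python
import random


def simulate_sample_means(population, sample_size, repetitions, seed=0):
    """Repeatedly draw a random sample (with replacement) and record its mean."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    means = []
    for _ in range(repetitions):  # iteration: one pass per simulated sample
        sample = rng.choices(population, k=sample_size)  # random sampling
        means.append(sum(sample) / sample_size)
    return means


# Hypothetical population: the six faces of a fair die.
die_faces = [1, 2, 3, 4, 5, 6]
means = simulate_sample_means(die_faces, sample_size=50, repetitions=1000)
print("smallest sample mean:", min(means))
print("largest sample mean:", max(means))
```

Each pass through the loop is one repetition of the sampling process, so the list of means approximates the sampling distribution of the sample mean.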

Below are some programming fundamentals that students come away with:

Statistics, sampling, and hypothesis testing

Randomness and statistics are core components of data science. Data 8 has a heavy emphasis on both. It is particularly important that students come away with an appreciation for how a sampling method is used to generate data, as well as an understanding of how statistics can be used (and misused) to understand a dataset given a limited number of data points.

Below are some statistics fundamentals that students come away with:

Inference, prediction, and models

While statistics describe a dataset, they do not inherently make predictions about the underlying distribution from which the data are drawn. Data 8 relies heavily on bootstrapping and permutation methods to estimate the error or confidence in parameters derived from the data.
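A minimal sketch of the bootstrap idea, assuming a percentile-style confidence interval: resample the observed data with replacement many times, recompute the statistic each time, and take the middle portion of the resampled estimates. The function name, default parameters, and the `commute_minutes` values are invented for illustration and are not taken from the course materials.

```python
import random
import statistics


def bootstrap_ci(sample, statistic=statistics.median,
                 repetitions=2000, level=0.95, seed=0):
    """Approximate a percentile confidence interval for `statistic`."""
    rng = random.Random(seed)
    n = len(sample)
    # Each bootstrap resample has the same size as the original sample
    # and is drawn from it with replacement.
    estimates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(repetitions)
    )
    tail = (1 - level) / 2
    lower = estimates[int(tail * repetitions)]
    upper = estimates[int((1 - tail) * repetitions) - 1]
    return lower, upper


# Hypothetical sample, invented purely for this example.
commute_minutes = [12, 15, 17, 18, 21, 22, 25, 28, 30, 31, 35, 41, 44, 52, 60]
lower, upper = bootstrap_ci(commute_minutes)
print("approximate 95% interval for the median:", (lower, upper))
```

The width of the interval reflects how much the estimate could vary from sample to sample, which is the sense of uncertainty described above.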

Beyond estimating the value of a model’s parameter given limited data, models are also used to generate predictions about the world given a new set of data. Data 8 treats prediction as an extension of inference. In the same sense that inference quantifies uncertainty in a model’s parameter, we can also quantify uncertainty in predictions for a data point that the model has not seen before. Data 8 covers this for regression (models with quantitative outputs) as well as classification (models with qualitative outputs).
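To make the link between inference and prediction concrete, here is a hedged sketch that bootstraps a least-squares line and collects the predicted value at a new point. It assumes simple linear regression with one predictor, and the resulting interval describes the variability of the fitted value under resampling. The helper names and the toy data are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def bootstrap_prediction_interval(xs, ys, new_x,
                                  repetitions=2000, level=0.95, seed=0):
    """Percentile interval for the fitted value at new_x."""
    rng = random.Random(seed)
    pairs = list(zip(xs, ys))
    predictions = []
    while len(predictions) < repetitions:
        resample = rng.choices(pairs, k=len(pairs))  # resample (x, y) pairs
        rx, ry = zip(*resample)
        if len(set(rx)) < 2:
            continue  # a line cannot be fit through a single x value
        slope, intercept = fit_line(rx, ry)
        predictions.append(slope * new_x + intercept)
    predictions.sort()
    tail = (1 - level) / 2
    return (predictions[int(tail * repetitions)],
            predictions[int((1 - tail) * repetitions) - 1])


# Hypothetical toy data, invented for this example.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 71, 75, 74]
print(bootstrap_prediction_interval(hours, scores, new_x=9))
```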

Below are some inference, prediction, and modeling fundamentals that students come away with:

Comparing distributions

Once students learn the various steps that go into statistically describing a single dataset, Data 8 covers how to make comparisons between datasets. This is a crucial part of most scientific analysis, as well as of industry data analytics (e.g., A/B testing). Data 8 covers comparisons between distributions as an advanced case of the material described above.
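One common way to compare two groups, as in an A/B test, is a permutation test. The sketch below assumes the test statistic is the absolute difference in group means. It shuffles the pooled values to simulate the null hypothesis that the group labels do not matter. The function name and the two groups of values are invented for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, repetitions=5000, seed=0):
    """Return (observed difference in means, approximate p-value)."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(repetitions):
        rng.shuffle(pooled)  # randomly reassign values to the two groups
        simulated = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if simulated >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / repetitions


# Hypothetical measurements for two versions of a page, invented for this example.
version_a = [3.1, 2.8, 3.6, 3.3, 2.9, 3.4, 3.0, 3.2]
version_b = [3.9, 3.5, 4.1, 3.7, 3.6, 4.0, 3.8, 3.4]
observed, p_value = permutation_test(version_a, version_b)
print("observed difference:", observed)
print("approximate p-value:", p_value)
```

A small p-value means that shuffled labels rarely produce a difference as large as the observed one, which is evidence that the two distributions differ.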

Below are some fundamentals for comparing two distributions that students come away with:

Building a course syllabus page

You may use whatever technology you prefer for managing your course and distributing content. However, we recommend setting up a syllabus page that is used for distributing interact links and course materials. For an example, see the structure of the Data 8 course syllabus: http://data8.org/sp25/

The syllabus has the following structure:

| Date | Topic | Lecture | Reading | Assignment |
| --- | --- | --- | --- | --- |
| Fri 01/24 | Cause and Effect | Slides | Chapter 2 | Homework 01 |
| Mon 01/27 | Tables | Slides, Demos, Video | Chapter 3 | |
| Wed 01/29 | Data Types | Slides, Demos, Video | Chapters 4, 5 | Lab 02: Table Operations |

Each row is a lecture, and each column is a type of material you can distribute. The links in the columns point either to pages in the course textbook or to interact links that connect students with the course JupyterHub for distribution of homeworks and labs.

The videos and slides listed above and on the Data 8 website are restricted to berkeley.edu addresses.

Assignments

Alongside the textbook are several computational homeworks, labs, and projects that let students interact with the ideas covered in class. They can all be run interactively in the Data 8 environment.

These homework, lab, and project materials are freely available in the public course materials repository: https://github.com/data-8/materials-fds