Skip to article frontmatterSkip to article content
Site not loading correctly?

This may be due to an incorrect BASE_URL configuration. See the MyST Documentation for reference.

Introduction to Jupyter and Python


This lab introduces two tools that extend what you can do with data:

  • Jupyter Notebooks, the interactive environment you’re in right now

  • Python, a programming language widely used for data analysis, web development, and automation

By the end, you’ll be able to load a dataset, do basic calculations on it, and make a chart using Python.

What you’ll practice:

  • Navigating and running cells in a Jupyter Notebook

  • Writing basic Python: variables, data types, and lists

  • Loading and exploring a dataset using pandas

  • Creating a simple chart from data

How to run a cell: Click on any code cell below and press Shift + Enter (or click the Run button in the toolbar). The result appears directly below the cell.


Setup: Install and Import Packages

Before we start, run the cell below. It checks whether the required Python packages are already installed and installs any that are missing. This only needs to run once per session.

Run this cell first. Everything else in the lab depends on it.

# Setup: check for required packages and install if missing
import sys
import subprocess

def ensure_package(import_name, install_name=None):
    """Import a package; install it quietly if not found."""
    if install_name is None:
        install_name = import_name
    try:
        __import__(import_name)
        print(f"  ✓ {import_name} already installed")
    except ImportError:
        print(f"  ↓ Installing {install_name}...")
        subprocess.check_call(
            [sys.executable, "-m", "pip", "install", install_name, "--quiet"]
        )
        print(f"  ✓ {install_name} installed")

print("Checking packages...")
ensure_package("pandas")
ensure_package("matplotlib")
print("\nAll packages ready. You're good to go!")
Checking packages...
  ✓ pandas already installed
  ✓ matplotlib already installed

All packages ready. You're good to go!

Section 1: Your First Cell

Jupyter Notebooks are made of cells. There are two main types:

  • Markdown cells (like this one) display formatted text, images, and instructions

  • Code cells (gray background) contain Python code that you can run

Let’s run your first code cell. Press Shift + Enter on the cell below.

# This is a code cell. The # symbol marks a line as a "comment" - Python ignores it.
# Comments are notes for the human reading the code.

print("Welcome to CIS 13!")
Welcome to CIS 13!

You should see Welcome to CIS 13! appear below the cell. The number in the brackets [1] on the left tells you the order cells were run.

TO-DO 1: Your first print statement

In the cell below, replace the ... with a print() statement that displays your own name.

Example: print("Alex Rivera")

# TO-DO 1: Print your name
...

Section 2: Python as a Calculator

The simplest thing Python can do is math. Just like typing a formula in Excel’s formula bar, you can type an expression and Python evaluates it.

Excel formulaPython equivalent
=5+35 + 3
=10*210 * 2
=9/39 / 3
=2^82 ** 8

Run the demo cell below to see Python do some arithmetic:

# Demo: Python arithmetic
print(5 + 3)     # addition
print(10 * 2)    # multiplication
print(9 / 3)     # division
print(2 ** 8)    # exponentiation (2 to the power of 8)
8
20
3.0
256

TO-DO 2: Calculate a total cost

A customer buys 15 notebooks at $2.49 each and 4 pens at $0.99 each.

In the cell below, write a Python expression that calculates the total cost. Replace ... with your expression.

# TO-DO 2: Calculate total cost (notebooks + pens)
total_cost = ...
print("Total cost: $", total_cost)
Total cost: $ Ellipsis

Section 3: Variables

In Excel, you store a value in a cell (like cell B4). In Python, you store a value in a variable, which is a named container you create yourself.

Excel:   B4 = 49.99
Python:  price = 49.99

Variables make your code readable and reusable. Run the demo below:

# Demo: variables
product_name = "Wireless Mouse"
unit_price = 29.99
quantity = 3

total = unit_price * quantity

print("Product:", product_name)
print("Unit price: $", unit_price)
print("Quantity:", quantity)
print("Total: $", total)
Product: Wireless Mouse
Unit price: $ 29.99
Quantity: 3
Total: $ 89.97

Python has different data types for different kinds of values:

TypeExampleWhat it holds
int42Whole numbers
float29.99Decimal numbers
str"Wireless Mouse"Text (always in quotes)
boolTrue / FalseYes/No values

You can check a variable’s type with the built-in type() function:

# Demo: checking data types
print(type(42))          # int
print(type(29.99))       # float
print(type("hello"))     # str
print(type(True))        # bool
<class 'int'>
<class 'float'>
<class 'str'>
<class 'bool'>

TO-DO 3: Create your own variables

Imagine you work at a campus bookstore. Fill in the ... placeholders to create variables for a product of your choice, then print a summary line.

# TO-DO 3: Create variables for a bookstore product
item_name  = ...   # str: name of the product
item_price = ...   # float: price in dollars
in_stock   = ...   # bool: True if available, False if not

print(item_name, "- $", item_price, "- In stock:", in_stock)
Ellipsis - $ Ellipsis - In stock: Ellipsis

Question 1

In Excel, if you change a value in cell B4, any formula that references B4 automatically updates. In Python, if you update a variable after using it in a calculation, does the calculation update automatically? How would you test this?

Type your answer here.


Section 4: Lists

A single variable holds one value. A list holds many values in order. Think of it as one column from an Excel spreadsheet.

Excel column A:   Python list:
  Laptop            products = ["Laptop", "Keyboard", "Monitor", "Webcam"]
  Keyboard
  Monitor
  Webcam

Run the demo:

# Demo: lists
products = ["Laptop", "Keyboard", "Monitor", "Webcam"]
prices   = [899.99, 49.99, 299.99, 79.99]

print("Products:", products)
print("Prices:",   prices)
print("Number of products:", len(products))

# Access individual items using an index (counting starts at 0, not 1!)
print("First product:", products[0])
print("Last product:",  products[-1])
Products: ['Laptop', 'Keyboard', 'Monitor', 'Webcam']
Prices: [899.99, 49.99, 299.99, 79.99]
Number of products: 4
First product: Laptop
Last product: Webcam

Notice that Python starts counting at 0, not 1. So products[0] is "Laptop", and products[1] is "Keyboard".

TO-DO 4: Build a list and find the total

A store sold the following quantities of four items this week: 12, 5, 30, 8.

Create a list called weekly_sales, then use Python’s built-in sum() and max() functions to find the total and the highest single sale.

# TO-DO 4: Create the list and compute summary stats
weekly_sales = [...]   # fill in the four values

print("Total units sold:", sum(weekly_sales))
print("Best-selling item (units):", max(weekly_sales))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-12-5a31889203a2> in <module>
      2 weekly_sales = [...]   # fill in the four values
      3 
----> 4 print("Total units sold:", sum(weekly_sales))
      5 print("Best-selling item (units):", max(weekly_sales))

TypeError: unsupported operand type(s) for +: 'int' and 'ellipsis'

Question 2

Excel’s SUM() function and Python’s sum() function both add up a range of numbers. What is one advantage of using Python lists over an Excel column for storing data? What is one advantage Excel has?

Type your answer here.


Section 5: Introducing pandas

So far we’ve worked with individual values and simple lists. But real-world data has rows and columns, just like an Excel spreadsheet.

pandas is a Python library for working with tabular data. It gives us a DataFrame, which is an object that works like a spreadsheet inside Python.

Excel Workbook ≈ pandas DataFrame
Excel Row      ≈ DataFrame row
Excel Column   ≈ DataFrame Series (column)
=SUM(B:B)      ≈ df["Units Sold"].sum()
=AVERAGE(B:B)  ≈ df["Units Sold"].mean()

Run the cell below to load pandas and a small dataset we’ll use for the rest of the lab:

# Import pandas - we give it the nickname "pd" by convention
import pandas as pd

# Our dataset: Spartan Office Supply Store, Q1 Sales
data = {
    "Product":    ["Notebook", "Ballpoint Pen", "Stapler", "USB Drive",
                   "Desk Lamp", "Binder", "Calculator", "Highlighter Set"],
    "Category":   ["Paper", "Writing", "Supplies", "Technology",
                   "Furniture", "Paper", "Technology", "Writing"],
    "Units_Sold": [120, 340, 45, 98, 27, 88, 61, 215],
    "Unit_Price": [2.49, 0.99, 8.99, 14.99, 24.99, 3.49, 12.99, 5.99]
}

df = pd.DataFrame(data)
print("Dataset loaded! Shape:", df.shape, "(rows, columns)")
Dataset loaded! Shape: (8, 4) (rows, columns)

Viewing the data

Just like scrolling through an Excel spreadsheet, pandas gives you several ways to view your data:

# Display the full DataFrame (renders as a table in Jupyter)
df
Loading...
# .head() shows the first 5 rows - useful when your dataset is thousands of rows long
df.head()
Loading...
# .describe() gives summary statistics - like Excel's descriptive stats tool
df.describe()
Loading...

Selecting a column

To select a single column, use the column name in square brackets, the same way you’d click a column header in Excel:

# Select the Units_Sold column
print(df["Units_Sold"])

print("\nTotal units sold:",   df["Units_Sold"].sum())
print("Average units sold:",  df["Units_Sold"].mean())
print("Most units sold:",     df["Units_Sold"].max())
0    120
1    340
2     45
3     98
4     27
5     88
6     61
7    215
Name: Units_Sold, dtype: int64

Total units sold: 994
Average units sold: 124.25
Most units sold: 340

Creating a new column

In Excel, you’d write a formula like =B2*C2 and drag it down. In pandas, you define the whole column at once and it applies to every row automatically:

# Demo: create a Revenue column (Units_Sold x Unit_Price)
df["Revenue"] = df["Units_Sold"] * df["Unit_Price"]
df
Loading...

TO-DO 5: Add a Discount Price column

The store is running a 10% off promotion. Add a new column called "Discounted_Price" that shows each product’s unit price after the discount.

Hint: 10% off means multiply by 0.90.

# TO-DO 5: Add a Discounted_Price column (10% off each Unit_Price)
df["Discounted_Price"] = ...
df[["Product", "Unit_Price", "Discounted_Price"]]

Question 3

In Excel, applying a formula like =B2*0.90 to 1,000 rows means dragging the formula down (or double-clicking). In pandas, you write one line and it applies to all rows automatically. Why might this matter when working with large datasets in a business setting?

Type your answer here.


Section 6: Filtering Data

In Excel, you use Filters to show only rows that meet a condition. In pandas, you write a condition directly:

# Demo: show only Technology category products
tech_products = df[df["Category"] == "Technology"]
tech_products
# Demo: show products that sold more than 100 units
high_sellers = df[df["Units_Sold"] > 100]
high_sellers

Section 7: Your First Chart

pandas can create charts directly from your data, similar to Excel’s Insert Chart feature. We’ll use matplotlib, the standard Python charting library.

# matplotlib was loaded in the setup cell above
import matplotlib.pyplot as plt

# Demo - bar chart of Units Sold by Product
plt.figure(figsize=(10, 5))
plt.bar(df["Product"], df["Units_Sold"], color="steelblue")
plt.title("Q1 Units Sold by Product - Spartan Office Supply")
plt.xlabel("Product")
plt.ylabel("Units Sold")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.show()

TO-DO 6: Chart the Revenue column

Create a bar chart showing Revenue (not Units Sold) for each product. Copy the demo above and change:

  1. The column on the y-axis from "Units_Sold" to "Revenue"

  2. The chart title to something appropriate

  3. The bar color to any color you like (try "coral", "seagreen", or "mediumpurple")

# TO-DO 6: Bar chart of Revenue by Product
plt.figure(figsize=(10, 5))
plt.bar(df["Product"], ..., color=...)
plt.title(...)
plt.xlabel("Product")
plt.ylabel("Revenue ($)")
plt.xticks(rotation=30, ha="right")
plt.tight_layout()
plt.show()

Question 4

Look at your Revenue chart compared to the Units Sold chart. Are the rankings the same? Which product generates the most revenue even though it might not sell the most units? What does this tell you as a store manager?

Type your answer here.

Question 5

This lab used a small dataset with 8 rows. Imagine you work for a company with 50,000 product transactions per month. List two things Python/pandas could do with that data that would be difficult or impractical to do manually in Excel.

Type your answer here.


That’s it!

Here’s a quick recap of what you worked through:

TopicWhat you tried
Jupyter NotebooksRunning code cells, mixing code and text
Python basicsVariables, data types, lists, arithmetic
pandasLoading data, selecting columns, creating new columns, filtering rows
matplotlibCreating and customizing a bar chart

These tools come up again in the databases and systems topics later in CIS 13, so anything you picked up here will be useful.


CIS 13 · El Camino College · Prof. Hac Le