Importing, Cleaning, and Technical Indicators¶
In this lab, we will walk through the process of:
Importing real-world financial data from APIs.
Cleaning and processing messy data that has missing values and formatting issues.
Thinking about data as complete datasets rather than individual cells.
Calculating daily returns to measure stock performance.
Calculating technical indicators (Moving Averages, Bollinger Bands) to analyze stock trends and volatility.
Important Disclaimer: This lab is for educational purposes only. Stock trading involves significant risk, and past performance does not guarantee future results.
# Run this cell — installs and imports all required packages
try:
import yfinance as yf
except ImportError:
!pip install yfinance
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import warnings
warnings.filterwarnings('ignore')
%matplotlib inline
print("All libraries loaded successfully!")
1. Importing Stock Data¶
In Excel, getting stock data usually means going to a website, copying a table, and pasting it into a spreadsheet. In Python, we can download years of data for multiple stocks in a single line of code.
We’ll use the yfinance library, which pulls historical price data directly from Yahoo Finance.
# Run this cell — download 5 years of data for several stocks
tickers = ['AAPL', 'GOOGL', 'MSFT', 'AMZN', 'TSLA']
stock_data = {}
for ticker in tickers:
df = yf.download(ticker, period='5y', progress=False)
df.columns = df.columns.get_level_values(0)
stock_data[ticker] = df
print(f" {ticker}: {len(df):,} trading days downloaded")
# We'll focus on Apple for most of the lab
aapl = stock_data['AAPL'].copy()
print(f"\nDate range: {aapl.index.min().date()} to {aapl.index.max().date()}")
print(f"Total rows: {len(aapl):,}")
Let’s look at the first few rows. Notice how the data is organized. Each row is a trading day and each column is a measurement. This is fundamentally different from Excel, where you might think about individual cells. In pandas, we think about entire columns at once.
# Run this cell — preview the data
aapl.head(10)
Question 1: How many columns does the stock data have? What does each column represent? How would you describe the difference between “Open” and “Close” price?
Type your answer here, replacing this text.
2. Data Cleaning: Dealing with Messy Real-World Data¶
Real-world data is rarely perfect. Stock data can have missing values (holidays, market closures, data errors), incorrect types, or gaps. Let’s inspect the data quality before we do any analysis.
In Excel, you might scan through rows visually. In pandas, we can check the entire dataset in one command.
# Run this cell — check for missing values across ALL columns at once
print("=== Missing Values per Column ===")
print(aapl.isnull().sum())
print(f"\nTotal missing cells: {aapl.isnull().sum().sum()}")
print(f"Dataset shape: {aapl.shape[0]} rows x {aapl.shape[1]} columns")
print(f"Data types:\n{aapl.dtypes}")
Even if there are no missing values, it’s good practice to check. Now let’s look at basic statistics -- the equivalent of using Excel’s AVERAGE, MIN, MAX, and STDEV functions, but applied to every column at once.
# Run this cell — summary statistics for all columns at once
aapl.describe().round(2)
TO-DO: Sometimes stock data has days where the volume is zero or unusually low, which can indicate data quality issues. Find all days where volume was below 1,000,000.
Hint
aapl[aapl['Volume'] < threshold]# TO-DO: Find days with unusually low volume
low_volume_days = aapl[aapl['Volume'] < ...]
print(f"Days with volume below 1,000,000: {len(low_volume_days)}")
low_volume_days[['Close', 'Volume']].head(10)
Question 2: Why might some trading days have very low volume? What could cause this in the real stock market?
Type your answer here, replacing this text.
3. Calculating Daily Returns¶
Daily returns measure the percentage change in price from one day to the next. This is one of the most fundamental calculations in finance. In Excel, you’d create a formula like =(B3-B2)/B2 and drag it down. In pandas, it’s one method call across the entire column.
# Run this cell — calculate daily returns
aapl['Daily_Return'] = aapl['Close'].pct_change() * 100 # as percentage
print("=== Daily Returns (%) ===")
print(aapl[['Close', 'Daily_Return']].tail(10).to_string())
print(f"\nAverage daily return: {aapl['Daily_Return'].mean():.4f}%")
print(f"Std dev of daily returns: {aapl['Daily_Return'].std():.4f}%")
print(f"Best single day: {aapl['Daily_Return'].max():.2f}%")
print(f"Worst single day: {aapl['Daily_Return'].min():.2f}%")
# Run this cell — visualize the distribution of daily returns
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
# Histogram
axes[0].hist(aapl['Daily_Return'].dropna(), bins=80, color='steelblue', edgecolor='white', alpha=0.8)
axes[0].axvline(x=0, color='red', linestyle='--', linewidth=1)
axes[0].set_title('Distribution of Daily Returns', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Daily Return (%)')
axes[0].set_ylabel('Frequency')
# Time series
axes[1].plot(aapl.index, aapl['Daily_Return'], color='steelblue', linewidth=0.5, alpha=0.7)
axes[1].axhline(y=0, color='red', linestyle='--', linewidth=1)
axes[1].set_title('Daily Returns Over Time', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Date')
axes[1].set_ylabel('Daily Return (%)')
plt.tight_layout()
plt.show()
Question 3: Looking at the histogram, does the distribution of daily returns look symmetric? What does the shape of the distribution tell you about the risk of holding a stock?
Type your answer here, replacing this text.
4. Moving Averages (MA)¶
A moving average smooths out short-term fluctuations to reveal longer-term trends. It’s one of the most widely used tools in stock analysis.
20-day MA responds quickly to price changes (short-term trend)
50-day MA is smoother and slower (medium-term trend)
200-day MA shows the long-term direction of the stock
In Excel, a 20-day moving average would be something like =AVERAGE(B2:B21) dragged down hundreds of rows. In pandas, it’s a single .rolling().mean() call.
# Run this cell — calculate 20-day and 50-day moving averages
aapl['MA_20'] = aapl['Close'].rolling(window=20).mean()
aapl['MA_50'] = aapl['Close'].rolling(window=50).mean()
print("Last 10 days with moving averages:")
aapl[['Close', 'MA_20', 'MA_50']].tail(10)
TO-DO: Add a 200-day moving average to the dataframe. The 200-day MA is commonly used to identify long-term market trends.
Hint
aapl['MA_200'] = aapl['Close'].rolling(window=???).mean()# TO-DO: Calculate the 200-day moving average
aapl['MA_200'] = aapl['Close'].rolling(window=...).mean()
# Verify it worked
print("200-day MA (last 5 days):")
print(aapl[['Close', 'MA_200']].tail())
# Run this cell — plot closing price with all three moving averages (last 2 years)
last_2y = aapl.loc[aapl.index >= aapl.index.max() - pd.DateOffset(years=2)]
plt.figure(figsize=(14, 7))
plt.plot(last_2y.index, last_2y['Close'], label='Closing Price', color='black', linewidth=1.2)
plt.plot(last_2y.index, last_2y['MA_20'], label='20-day MA', color='dodgerblue', linewidth=1)
plt.plot(last_2y.index, last_2y['MA_50'], label='50-day MA', color='orange', linewidth=1)
plt.plot(last_2y.index, last_2y['MA_200'], label='200-day MA', color='red', linewidth=1, linestyle='--')
plt.title('Apple Stock Price with Moving Averages', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Price ($)', fontsize=12)
plt.legend(fontsize=11)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
Question 4: When the 20-day MA crosses above the 50-day MA, traders call this a “Golden Cross” -- a bullish signal. When it crosses below, that’s a “Death Cross” -- a bearish signal. Can you spot any Golden Crosses or Death Crosses in the chart above? Do they seem to predict what happened next?
Type your answer here, replacing this text.
5. Bollinger Bands¶
Bollinger Bands are a volatility indicator built from three lines:
Middle Band: 20-day Simple Moving Average (SMA)
Upper Band: SMA + 2 x (20-day standard deviation)
Lower Band: SMA - 2 x (20-day standard deviation)
The idea is simple. If the price moves outside the bands, the stock may be overbought (above upper band) or oversold (below lower band). Statistically, about 95% of closing prices should fall within the bands.
In Excel, you’d need separate columns for the SMA, standard deviation, upper band, and lower band, each with a formula dragged across hundreds of rows. In pandas, it’s a few lines.
TO-DO: Calculate Bollinger Bands. Fill in the missing parts:
# TO-DO: Calculate Bollinger Bands
bb_window = 20
# Step 1: Calculate the 20-day SMA (middle band)
aapl['BB_Middle'] = aapl['Close'].rolling(bb_window).mean()
# Step 2: Calculate the 20-day rolling standard deviation
# Hint: Use .rolling(bb_window).std()
aapl['BB_Std'] = aapl['Close'].rolling(bb_window)...
# Step 3: Calculate Upper Band = Middle + 2 * Std
aapl['BB_Upper'] = aapl['BB_Middle'] + (... * aapl['BB_Std'])
# Step 4: Calculate Lower Band = Middle - 2 * Std
aapl['BB_Lower'] = aapl['BB_Middle'] - (... * aapl['BB_Std'])
print("Bollinger Bands (last 5 days):")
aapl[['Close', 'BB_Upper', 'BB_Middle', 'BB_Lower']].tail()
# Run this cell — plot Bollinger Bands (last 2 years)
last_2y = aapl.loc[aapl.index >= aapl.index.max() - pd.DateOffset(years=2)]
plt.figure(figsize=(14, 7))
plt.plot(last_2y.index, last_2y['Close'], label='Closing Price', color='blue', linewidth=1.2)
plt.plot(last_2y.index, last_2y['BB_Middle'], label='SMA (20-day)', color='black', linestyle='--', linewidth=1)
plt.plot(last_2y.index, last_2y['BB_Upper'], label='Upper Band', color='red', linewidth=0.8)
plt.plot(last_2y.index, last_2y['BB_Lower'], label='Lower Band', color='green', linewidth=0.8)
plt.fill_between(last_2y.index, last_2y['BB_Upper'], last_2y['BB_Lower'], color='gray', alpha=0.15)
plt.title('Apple Stock -- Bollinger Bands (Last 2 Years)', fontsize=16, fontweight='bold')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Price ($)', fontsize=12)
plt.legend(fontsize=11)
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
TO-DO: Identify overbought and oversold days in the last 2 years. Use boolean indexing to find days where the closing price broke above the upper band (overbought) or below the lower band (oversold).
Hint
last_2y[last_2y['Close'] > last_2y['BB_Upper']]# TO-DO: Find overbought and oversold days
# Hint: overbought means Close > BB_Upper, oversold means Close < BB_Lower
overbought = last_2y[last_2y['Close'] ... last_2y['BB_Upper']]
oversold = last_2y[last_2y['Close'] ... last_2y['BB_Lower']]
print(f"Overbought days (last 2 years): {len(overbought)}")
print(f"Oversold days (last 2 years): {len(oversold)}")
if len(overbought) > 0:
print("\nMost recent overbought dates:")
print(overbought[['Close', 'BB_Upper']].tail())
if len(oversold) > 0:
print("\nMost recent oversold dates:")
print(oversold[['Close', 'BB_Lower']].tail())
Question 5: Looking at the Bollinger Bands chart, can you identify any periods where the bands became very wide or very narrow? What does the width of the bands tell you about the stock’s behavior during those periods?
Type your answer here, replacing this text.
Summary¶
In this lab, you learned to:
Import live stock data using Python APIs, replacing manual copy-paste from websites
Clean and inspect data quality across an entire dataset at once
Calculate daily returns to measure stock performance
Calculate moving averages (20-day, 50-day, 200-day) to identify trends
Calculate Bollinger Bands to measure volatility and identify overbought/oversold conditions
Key takeaway: Python and pandas let you work with data as complete sets, calculating indicators across thousands of rows simultaneously. This is the foundation of data-driven analysis and a major advantage over doing these calculations cell by cell in Excel.
Important reminder: This lab is for educational purposes only. Real stock trading involves significant financial risk. Always consult a qualified financial advisor before making investment decisions.
In Part 2, we will build on these indicators to implement a trading strategy, backtest it against historical data, and measure risk.