
Building a RAG Chatbot for Course Materials

Tutorial: Create a Local AI Tutor with GPT4All

In this tutorial, you’ll learn how to build a Retrieval-Augmented Generation (RAG) chatbot that can answer questions about your course materials using:

  • LangChain for document processing

  • ChromaDB for vector storage and semantic search

  • GPT4All for local AI responses (no API costs!)

  • Gradio for the chat interface

What you’ll build: A Data 88E course tutor that answers questions using only official course materials, never gives away homework answers, and runs 100% locally on your machine.

Time to complete: 30-45 minutes (first run includes ~4GB model download)

Prerequisites & Setup

What each package does:

  • gradio: Creates the chat interface

  • langchain: Handles document loading and text processing

  • chromadb: Vector database for semantic search

  • gpt4all: Local LLM (runs on your CPU, no API needed)

  • sentence-transformers: Creates embeddings for semantic search

Hardware requirements:

  • CPU: Any modern processor (GPU optional but faster)

  • RAM: 8GB minimum, 16GB recommended

  • Storage: ~5GB free space (for model + vector database)

# !pip install gradio langchain langchain-text-splitters langchain-huggingface langchain-chroma chromadb gpt4all sentence-transformers
# !pip install langchain-community
import gradio as gr
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from gpt4all import GPT4All

import os


# ============================================
# CONFIGURATION
# ============================================
DOCUMENTS_PATH = "./88e_training_material-main/F24Lec_MD"  # Change this to your folder
VECTOR_DB_PATH = "./chroma_db"
MODEL_NAME = "mistral-7b-openorca.gguf2.Q4_0.gguf"

Step 2: Configure Paths and Settings

Set up where your documents are located and where to store the vector database.

Important: Change DOCUMENTS_PATH to point to your actual folder containing markdown files!

Data 88E Training Materials

Data 88E (Economics and Data Science) keeps most of its materials in publicly licensed repositories, which makes it a good source of training data for a fine-tuned LLM tutor. The course is designed to teach students how to apply data science tools to economic questions, using Python and real-world datasets. It is built around a set of GitHub repositories that contain all the materials, including:

  • Textbook: The main course content is in the form of a Jupyter Book 88e-textbook

  • Lecture Notebooks: Each lecture has a corresponding Jupyter notebook with code examples and exercises (e.g. LectureNBs)

  • Slides: Lecture slides are also available in Google Drive and converted to markdown in the training materials (google drive)

  • Course Calendar: The schedule and topics covered each week are documented in the calendar from the course website, also converted to markdown for training (Fall 2025 Calendar)

Training Data Preparation

The Making_training_material repo contains the source files and scripts used to convert raw course content into clean markdown, pulling from the textbook, lecture notebooks, slides, and course calendar.

The parsed output lives in 88e_training_material — a self-contained, subject-specific corpus built entirely from the course’s own open-source materials, used to fine-tune the model into a grounded tutor for the course.

Download the materials

Skip this cell if you have already downloaded the materials.

import io
import zipfile
import requests

repo = "https://github.com/data-88e/88e_training_material"

url = f"{repo}/archive/refs/heads/main.zip"
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("./")
!ls -l 88e_training_material-main/

Step 3: Load Course Documents

This step loads all markdown (.md) files from your course materials folder.

What’s happening:

  1. DirectoryLoader scans the folder recursively

  2. glob="**/*.md" finds all markdown files in all subfolders

  3. TextLoader reads each file as plain text

  4. Documents are stored with metadata (filename, path)

Expected: You should see “Loaded X markdown files” where X is the number of .md files in your folder.
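As a stdlib-only sketch of what the glob="**/*.md" pattern does (DirectoryLoader handles this matching internally; this cell just illustrates the recursive search on a throwaway folder):

```python
# Illustration only: pathlib's rglob("*.md") matches the same files
# that DirectoryLoader's glob="**/*.md" would pick up.
import pathlib
import tempfile

root = pathlib.Path(tempfile.mkdtemp())
(root / "week1").mkdir()
(root / "intro.md").write_text("# Intro")
(root / "week1" / "lec01.md").write_text("# Lecture 1")
(root / "week1" / "notes.txt").write_text("not markdown")  # should be skipped

md_files = sorted(p.name for p in root.rglob("*.md"))
print(md_files)  # ['intro.md', 'lec01.md']
```

If this prints an empty list against your real folder, the files are probably not ending in .md, which is the most common cause of "Loaded 0 markdown files."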

DOCUMENTS_PATH = "./88e_training_material-main/F24Textbook_MD"  # Overrides the path set in the configuration cell; point this at your markdown folder
# ============================================
# 1. LOAD DOCUMENTS
# ============================================
print("Loading documents...")
loader = DirectoryLoader(
    DOCUMENTS_PATH,
    glob="**/*.md",
    loader_cls=TextLoader,
    loader_kwargs={'encoding': 'utf-8'}
)
documents = loader.load()
print(f"✓ Loaded {len(documents)} markdown files")

Step 4: Split Documents into Chunks

Large documents are split into smaller chunks for better retrieval.

Why split documents?

  • LLMs have limited context windows (~2000 tokens for GPT4All)

  • Smaller chunks = more precise retrieval

  • Better matching between questions and relevant content
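To get a feel for the context-window limit, a rough rule of thumb is ~4 characters per English token. This is an approximation for sanity checks, not GPT4All's actual tokenizer:

```python
# Rough heuristic: ~4 characters per token for English prose.
# Handy for checking whether retrieved context will fit in a
# ~2000-token window; not the model's real tokenizer.
def approx_tokens(text):
    return len(text) // 4

context = "word " * 800  # 4000 characters of toy context
print(approx_tokens(context))  # ~1000 tokens, comfortably under 2000
```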

Settings explained:

  • chunk_size=500: Each chunk is ~500 characters

  • chunk_overlap=100: 100 characters overlap between chunks (prevents cutting sentences)

  • separators: Split on markdown headers first, then paragraphs, then sentences

Expected: You’ll get many more chunks than documents (usually 5-10x more).
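A toy sliding-window chunker shows how chunk_size and chunk_overlap interact. This is much simpler than RecursiveCharacterTextSplitter, which also prefers to break at the separator hierarchy rather than at fixed positions:

```python
# Simplified illustration of chunking with overlap: slide a fixed
# window across the text, stepping by (chunk_size - overlap).
def chunk_text(text, chunk_size=20, overlap=5):
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "Supply and demand determine market price in equilibrium."
chunks = chunk_text(doc)
print(len(chunks))                           # 4 chunks from one 56-char string
print(chunks[0][-5:], "==", chunks[1][:5])   # consecutive chunks share 5 chars
```

The shared tail/head is what prevents a sentence from being cut cleanly in two with no chunk containing the whole thought.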

# ============================================
# 2. SPLIT DOCUMENTS INTO CHUNKS
# ============================================
print("Splitting documents into chunks...")
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100,
    separators=["\n## ", "\n### ", "\n\n", "\n", " ", ""]
)
splits = text_splitter.split_documents(documents)
print(f"✓ Split into {len(splits)} chunks")

Step 5: Create Vector Database (Embeddings)

This is the most important step for RAG! We convert text into vectors (numbers) so we can search semantically.

What are embeddings?

  • Embeddings convert text into arrays of numbers (vectors)

  • Similar meanings → similar vectors

  • Example: “GDP growth” and “economic expansion” will have similar vectors
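The idea can be shown with hand-made toy vectors. Real embeddings from all-MiniLM-L6-v2 are 384-dimensional; these 3-d numbers are invented purely for illustration:

```python
import math

# Invented 3-d "embeddings" for illustration. The point is only that
# related phrases get nearby vectors, unrelated ones do not.
vectors = {
    "GDP growth":         [0.90, 0.80, 0.10],
    "economic expansion": [0.85, 0.75, 0.20],
    "chocolate cake":     [0.05, 0.10, 0.95],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(vectors["GDP growth"], vectors["economic expansion"]))  # close to 1
print(cosine(vectors["GDP growth"], vectors["chocolate cake"]))      # much smaller
```

Similarity search in ChromaDB is this same comparison, done efficiently over every chunk vector at once.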

How it works:

  1. Each chunk is converted to a 384-dimensional vector using all-MiniLM-L6-v2

  2. Vectors are stored in ChromaDB for fast similarity search

  3. Database is saved to disk (so you only do this once!)

First run: Takes 1-3 minutes to create embeddings
Subsequent runs: Loads from disk in ~5 seconds

Progress indicator: You’ll see a progress bar as embeddings are created.

# ============================================
# 3. CREATE VECTOR STORE (or load if exists)
# ============================================
if os.path.exists(VECTOR_DB_PATH):
    print("Loading existing vector store...")
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={'device': 'cpu'}
    )
    vectorstore = Chroma(
        persist_directory=VECTOR_DB_PATH,
        embedding_function=embeddings
    )
    print("✓ Vector store loaded from disk")
else:
    print("Creating new vector store (this may take a few minutes)...")
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2",
        model_kwargs={'device': 'cpu'}
    )
    vectorstore = Chroma.from_documents(
        documents=splits,
        embedding=embeddings,
        persist_directory=VECTOR_DB_PATH
    )
    print("✓ Vector store created and saved")

Step 6: Load the Language Model

Now we load GPT4All, a local language model that runs entirely on your CPU.

About GPT4All:

  • Runs 100% locally (no internet needed after download)

  • No API costs

  • Privacy-friendly (your data never leaves your machine)

  • Works offline

Model used: mistral-7b-openorca (~4GB)

  • Based on Mistral 7B architecture

  • Fine-tuned for instruction following

  • Good balance of quality and speed

First run only: The model will auto-download (~4GB, takes 3-5 minutes)
Subsequent runs: Model loads from cache in seconds

Models are cached in: ~/.cache/gpt4all/

# ============================================
# 4. LOAD GPT4ALL MODEL
# ============================================
print(f"Loading GPT4All model: {MODEL_NAME}")
print("(First run will download ~4GB - this may take a few minutes)")
model = GPT4All(MODEL_NAME)
print("✓ Model loaded successfully!")

Step 7: Create the RAG Chat Function

This is where the magic happens! The chat() function implements RAG:

RAG Flow (Retrieval-Augmented Generation):

User Question → Retrieve Relevant Chunks → Create Prompt → Generate Answer

How it works:

  1. Retrieval: Search vector store for relevant chunks (k=2 means top 2 results)

  2. Context building: Combine retrieved chunks into context

  3. Prompt creation: Build a prompt with:

    • System instructions (Data 88E Tutor rules)

    • Course material context

    • User’s question

  4. Generation: GPT4All generates response using the context

  5. Post-processing: Add source citations

Data 88E Tutor Rules:

  • ✅ Uses only official course materials

  • ✅ Never gives away homework answers

  • ✅ Provides conceptual guidance instead of solutions

  • ✅ Always cites sources

  • ✅ Assumes questions are from assignments (safe mode)

Response time: 10-60 seconds on CPU (depending on your processor)

# ============================================
# 5. CHAT FUNCTION WITH RAG
# ============================================
chat_history = []

def chat(message, history):
    """
    Process user message with RAG (Retrieval-Augmented Generation)
    """
    global chat_history
    
    # Retrieve relevant documents from vector store
    docs = vectorstore.similarity_search(message, k=2)
    context = "\n\n".join([doc.page_content for doc in docs])
    
    # Build conversation history (last 3 exchanges)
    conversation = ""
    for user_msg, bot_msg in chat_history[-3:]:
        conversation += f"User: {user_msg}\nAssistant: {bot_msg}\n\n"
    
    # Create prompt with Data 88E Tutor system instructions
    prompt = f"""You are "Data 88E Tutor", a course assistant for Foundations of Data Science and Economic Models.

**Core Mission:**
1. Answer student questions only using official FA24 course materials: Slides, Lecture Notebooks, Textbook.
2. Stay within course scope.
3. Never give away assignment answers. Help students learn how to find and verify answers themselves.

**Assignment-Safe Mode (Always On):**
Always assume a question is from homework/labs/projects unless stated otherwise.

**Hard rules:**
- Do not provide final numeric answers, exact code that works on real datasets, or correct options for multiple choice.
- Do not reveal dataset-specific statistics, parameter values, or test expectations.
- Do not run or infer on real assignment filenames/columns.

**Instead, provide only:**
- High-level strategy, conceptual steps, and why they matter.
- Pseudocode or toy Python snippets on fabricated mini-datasets.
- Relevant formulas (symbols, not assignment numbers) + variable definitions + units.
- Diagnostic checklists, plotting advice, and ways to self-verify results.
- Pointers to exact Slides, Lecture Notebooks, and Textbook sections.

**Multiple-Choice Safety:**
For MCQ prompts:
- Never state or imply the correct option.
- Briefly define what each option represents conceptually.
- Give elimination tips.
- End by inviting the student to choose after checking their work.

**Style:**
- Be concise, step-by-step, and student-friendly.
- When using numbers, show the formula first, define variables with units.
- If uncertain, say so and point to the closest reading.

Use the following course materials to answer the question:

{context}

{conversation}User: {message}
Assistant:"""
    
    # Generate response using GPT4All
    response = model.generate(
        prompt,
        max_tokens=512,
        temp=0.7,
        top_p=0.9
    )
    
    # Update chat history
    chat_history.append((message, response))
    
    # Optional: show sources
    sources = set(os.path.basename(doc.metadata.get('source', 'Unknown')) for doc in docs)
    if sources:
        response += f"\n\n📚 *Sources: {', '.join(sorted(sources)[:2])}*"
    
    return response

Step 8: Launch the Chat Interface

Finally, we create a Gradio chat interface so you can interact with your tutor!

Gradio Interface:

  • Clean, chat-like UI

  • Example questions provided

  • Works in browser

  • Can share with others (set share=True)

Interface features:

  • Title and description

  • Pre-loaded example questions

  • Chat history maintained within session

  • Responsive design

After launching:

  1. You’ll see a local URL (e.g., http://127.0.0.1:8768)

  2. Click the link or copy-paste into your browser

  3. Start asking questions!

Tips:

  • First response may be slow (~60 seconds) while model warms up

  • Subsequent responses are faster

  • Questions should be about course content

  • Try the example questions to get started

# ============================================
# 6. LAUNCH GRADIO INTERFACE
# ============================================
print("\n" + "="*50)
print("Starting Gradio interface...")
print("="*50 + "\n")

demo = gr.ChatInterface(
    fn=chat,
    title="📚 Data 88E RAG Chatbot",
    description="Ask me anything about the course materials! Powered by GPT4All running 100% locally.",
    examples=[
        "What topics are covered in this course?",
        "Explain the Kuznets Hypothesis",
        "What is economic data science?",
        "Summarize the main concepts"
    ],
)

if __name__ == "__main__":
    demo.launch(
        share=True,  # Creates a temporary public link; set to False for local-only use
        server_name="0.0.0.0",  # Makes it accessible on your network
        server_port=8768
    )

💡 Usage Tips & Troubleshooting

How to Use the Chatbot:

  1. Ask questions naturally: “What is GDP?” or “Explain regression”

  2. Reference specific topics: “Tell me about Week 5 content”

  3. Ask for clarification: “Can you explain that in simpler terms?”

  4. Test assignment understanding: “How would I approach calculating elasticity?” (gets conceptual guidance, not answers!)

What to Expect:

Good questions:

  • “What is the Kuznets Hypothesis?”

  • “How do I interpret regression coefficients?”

  • “What’s the difference between GDP and GNP?”

Won’t get direct answers to:

  • “What’s the answer to Problem 3?”

  • “Give me the code for Question 2”

  • “Which option is correct: A, B, C, or D?”

Troubleshooting:

Problem: “Loaded 0 markdown files”

  • Fix: Check that DOCUMENTS_PATH points to correct folder

  • Verify folder contains .md files

Problem: “Cannot find empty port”

  • Fix: Change server_port=8768 to a different number (e.g., 7860, 8080)
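If you'd rather not guess, the standard library can ask the OS for a free port before launching. (free_port and port are illustrative names, not part of Gradio's API.)

```python
import socket

# Bind to port 0 and the OS assigns an unused ephemeral port;
# read it back and pass it to demo.launch(server_port=...).
def free_port():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

port = free_port()
print(f"Launching on port {port}")  # e.g. demo.launch(server_port=port)
```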

Problem: Response taking forever (>2 minutes)

  • Fix: This is normal on slow CPUs for first response

  • Consider using a smaller model: orca-mini-3b-gguf2-q4_0.gguf

Problem: Responses are nonsensical

  • Fix: Try lowering max_tokens to 256

  • Check that documents loaded correctly

  • Restart kernel and try again

Problem: Model download fails

  • Fix: Check your internet connection and retry; you can also download the .gguf file manually and place it in ~/.cache/gpt4all/

Performance Tips:

🚀 To make it faster:

  • Use GPU version (see GPU tutorial section)

  • Use smaller model: orca-mini-3b

  • Reduce max_tokens to 256

  • Reduce k=2 to k=1 in retrieval

📊 To improve quality:

  • Increase chunk_size to 1000 for more context

  • Increase k=2 to k=3 for more retrieved chunks

  • Try different models (Mistral 7B for better quality)

Optional GPU check: run this cell to verify whether CUDA is available before trying the GPU speed-ups mentioned above.

import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")