Tutorial: Create a Local AI Tutor with GPT4All¶
In this tutorial, you’ll learn how to build a Retrieval-Augmented Generation (RAG) chatbot that can answer questions about your course materials using:
LangChain for document processing
ChromaDB for vector storage and semantic search
GPT4All for local AI responses (no API costs!)
Gradio for the chat interface
What you’ll build: A Data 88E course tutor that answers questions using only official course materials, never gives away homework answers, and runs 100% locally on your machine.
Time to complete: 30-45 minutes (first run includes ~4GB model download)
Prerequisites & Setup¶
What each package does:
gradio: Creates the chat interface
langchain: Handles document loading and text processing
chromadb: Vector database for semantic search
gpt4all: Local LLM (runs on your CPU, no API needed)
sentence-transformers: Creates embeddings for semantic search
Hardware requirements:
CPU: Any modern processor (GPU optional but faster)
RAM: 8GB minimum, 16GB recommended
Storage: ~5GB free space (for model + vector database)
#!pip install gradio langchain langchain-huggingface langchain-chroma chromadb gpt4all sentence-transformers
#!pip install langchain-text-splitters langchain-community
import gradio as gr
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from gpt4all import GPT4All
import os
Step 2: Configure Paths and Settings¶
Set up where your documents are located and where to store the vector database.
Important: Change DOCUMENTS_PATH to point to your actual folder containing markdown files!
# ============================================
# CONFIGURATION
# ============================================
DOCUMENTS_PATH = "./88e_training_material-main/F24Lec_MD"  # Change this to your folder
VECTOR_DB_PATH = "./chroma_db"
MODEL_NAME = "mistral-7b-openorca.gguf2.Q4_0.gguf"
Data 88E Training Materials¶
Because most of Data 88E (Economics and Data Science) lives in publicly licensed repos, it offers a ready source of training data for a fine-tuned LLM tutor. The course is designed to teach students how to apply data science tools to economic questions, using Python and real-world datasets. The course is built around a set of GitHub repositories that contain all the materials, including:
Textbook: The main course content is in the form of a Jupyter Book (88e-textbook)
Lecture Notebooks: Each lecture has a corresponding Jupyter notebook with code examples and exercises (e.g. LectureNBs)
Slides: Lecture slides are also available in Google Drive and converted to markdown in the training materials (google drive)
Course Calendar: The schedule and topics covered each week are documented in the calendar on the course website, also converted to markdown for training (Fall 2025 Calendar)
Training Data Preparation¶
The Making_training_material repo contains the source files and scripts used to convert raw course content into clean markdown, pulling from the textbook, lecture notebooks, slides, and course calendar.
The parsed output lives in 88e_training_material — a self-contained, subject-specific corpus built entirely from the course’s own open-source materials, used to fine-tune the model into a grounded tutor for the course.
Download the materials. Skip this cell if you have already downloaded them.
import requests, zipfile, io

repo = "https://github.com/data-88e/88e_training_material"
url = f"{repo}/archive/refs/heads/main.zip"
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall("./")
!ls -l 88e_training_material-main/
Step 3: Load Course Documents¶
This step loads all markdown (.md) files from your course materials folder.
What’s happening:
DirectoryLoader scans the folder recursively
glob="**/*.md" finds all markdown files in all subfolders
TextLoader reads each file as plain text
Documents are stored with metadata (filename, path)
Expected: You should see “Loaded X markdown files” where X is the number of .md files in your folder.
DOCUMENTS_PATH = "./88e_training_material-main/F24Textbook_MD"# ============================================
# 1. LOAD DOCUMENTS
# ============================================
print("Loading documents...")
loader = DirectoryLoader(
DOCUMENTS_PATH,
glob="**/*.md",
loader_cls=TextLoader,
loader_kwargs={'encoding': 'utf-8'}
)
documents = loader.load()
print(f"✓ Loaded {len(documents)} markdown files")Step 4: Split Documents into Chunks¶
Large documents are split into smaller chunks for better retrieval.
Why split documents?
LLMs have limited context windows (~2000 tokens for GPT4All)
Smaller chunks = more precise retrieval
Better matching between questions and relevant content
Settings explained:
chunk_size=500: Each chunk is ~500 characters
chunk_overlap=100: 100 characters of overlap between chunks (prevents cutting sentences)
separators: Split on markdown headers first, then paragraphs, then sentences
Expected: You’ll get many more chunks than documents (usually 5-10x more).
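To see exactly what chunk_size and chunk_overlap do before running the real splitter below, here is a minimal toy sketch; the demo text and the exaggerated settings are made up purely for illustration:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Tiny chunks with visible overlap, purely for demonstration
demo_splitter = RecursiveCharacterTextSplitter(
    chunk_size=40,     # much smaller than the real 500
    chunk_overlap=10,  # characters shared by adjacent chunks
    separators=["\n\n", "\n", " ", ""]
)
demo_text = "GDP measures total output. Inflation measures price changes. " * 3
for i, chunk in enumerate(demo_splitter.split_text(demo_text)):
    print(i, repr(chunk))  # note the repeated words at chunk boundaries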
# ============================================
# 2. SPLIT DOCUMENTS INTO CHUNKS
# ============================================
print("Splitting documents into chunks...")
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=500,
chunk_overlap=100,
separators=["\n## ", "\n### ", "\n\n", "\n", " ", ""]
)
splits = text_splitter.split_documents(documents)
print(f"✓ Split into {len(splits)} chunks")Step 5: Create Vector Database (Embeddings)¶
This is the most important step for RAG! We convert text into vectors (numbers) so we can search semantically.
What are embeddings?
Embeddings convert text into arrays of numbers (vectors)
Similar meanings → similar vectors
Example: “GDP growth” and “economic expansion” will have similar vectors
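You can check this claim directly by embedding a few phrases and comparing them. A minimal sketch using the same all-MiniLM-L6-v2 model as the cell below (the example phrases are arbitrary):
from langchain_huggingface import HuggingFaceEmbeddings

emb = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
v1 = emb.embed_query("GDP growth")
v2 = emb.embed_query("economic expansion")
v3 = emb.embed_query("chocolate cake recipe")

def cosine(a, b):
    # Cosine similarity: 1.0 = identical direction, ~0 = unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
    return dot / norm

print(len(v1))          # 384 dimensions
print(cosine(v1, v2))   # higher: related meanings
print(cosine(v1, v3))   # lower: unrelated meanings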
How it works:
Each chunk is converted to a 384-dimensional vector using all-MiniLM-L6-v2
Vectors are stored in ChromaDB for fast similarity search
Database is saved to disk (so you only do this once!)
First run: Takes 1-3 minutes to create embeddings
Subsequent runs: Loads from disk in ~5 seconds
Progress indicator: You’ll see a progress bar as embeddings are created.
# ============================================
# 3. CREATE VECTOR STORE (or load if exists)
# ============================================
if os.path.exists(VECTOR_DB_PATH):
print("Loading existing vector store...")
embeddings = HuggingFaceEmbeddings(
model_name="sentence-transformers/all-MiniLM-L6-v2",
model_kwargs={'device': 'cpu'}
)
vectorstore = Chroma(
persist_directory=VECTOR_DB_PATH,
embedding_function=embeddings
)
print("✓ Vector store loaded from disk")
else:
print("Creating new vector store (this may take a few minutes)...")
embeddings = HuggingFaceEmbeddings(
model_name="sentence-transformers/all-MiniLM-L6-v2",
model_kwargs={'device': 'cpu'}
)
vectorstore = Chroma.from_documents(
documents=splits,
embedding=embeddings,
persist_directory=VECTOR_DB_PATH
)
print("✓ Vector store created and saved")Step 6: Load the Language Model¶
Now we load GPT4All, a local language model that runs entirely on your CPU.
About GPT4All:
Runs 100% locally (no internet needed after download)
No API costs
Privacy-friendly (your data never leaves your machine)
Works offline
Model used: mistral-7b-openorca (~4GB)
Based on Mistral 7B architecture
Fine-tuned for instruction following
Good balance of quality and speed
First run only: The model will auto-download (~4GB, takes 3-5 minutes)
Subsequent runs: Model loads from cache in seconds
Models are cached in: ~/.cache/gpt4all/
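To see what is already cached (for example, to confirm a finished download), here is a small sketch assuming the default cache location:
import os

cache_dir = os.path.expanduser("~/.cache/gpt4all")
if os.path.isdir(cache_dir):
    for name in sorted(os.listdir(cache_dir)):
        print(name)  # downloaded .gguf model files appear here
else:
    print("No GPT4All cache yet - the model will download on first load")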
# ============================================
# 4. LOAD GPT4ALL MODEL
# ============================================
print(f"Loading GPT4All model: {MODEL_NAME}")
print("(First run will download ~4GB - this may take a few minutes)")
model = GPT4All(MODEL_NAME)
print("✓ Model loaded successfully!")Step 7: Create the RAG Chat Function¶
This is where the magic happens! The chat() function implements RAG:
RAG Flow (Retrieval-Augmented Generation):
User Question → Retrieve Relevant Chunks → Create Prompt → Generate Answer
How it works:
Retrieval: Search the vector store for relevant chunks (k=2 means top 2 results)
Context building: Combine retrieved chunks into context
Prompt creation: Build a prompt with:
System instructions (Data 88E Tutor rules)
Course material context
User’s question
Generation: GPT4All generates response using the context
Post-processing: Add source citations
Data 88E Tutor Rules:
✅ Uses only official course materials
✅ Never gives away homework answers
✅ Provides conceptual guidance instead of solutions
✅ Always cites sources
✅ Assumes questions are from assignments (safe mode)
Response time: 10-60 seconds on CPU (depending on your processor)
# ============================================
# 5. CHAT FUNCTION WITH RAG
# ============================================
chat_history = []
def chat(message, history):
"""
Process user message with RAG (Retrieval-Augmented Generation)
"""
global chat_history
# Retrieve relevant documents from vector store
docs = vectorstore.similarity_search(message, k=2)
context = "\n\n".join([doc.page_content for doc in docs])
# Build conversation history (last 3 exchanges)
conversation = ""
for user_msg, bot_msg in chat_history[-3:]:
conversation += f"User: {user_msg}\nAssistant: {bot_msg}\n\n"
# Create prompt with Data 88E Tutor system instructions
prompt = f"""You are "Data 88E Tutor", a course assistant for Foundations of Data Science and Economic Models.
**Core Mission:**
1. Answer student questions only using official FA24 course materials: Slides, Lecture Notebooks, Textbook.
2. Stay within course scope.
3. Never give away assignment answers. Help students learn how to find and verify answers themselves.
**Assignment-Safe Mode (Always On):**
Always assume a question is from homework/labs/projects unless stated otherwise.
**Hard rules:**
- Do not provide final numeric answers, exact code that works on real datasets, or correct options for multiple choice.
- Do not reveal dataset-specific statistics, parameter values, or test expectations.
- Do not run or infer on real assignment filenames/columns.
**Instead, provide only:**
- High-level strategy, conceptual steps, and why they matter.
- Pseudocode or toy Python snippets on fabricated mini-datasets.
- Relevant formulas (symbols, not assignment numbers) + variable definitions + units.
- Diagnostic checklists, plotting advice, and ways to self-verify results.
- Pointers to exact Slides, Lecture Notebooks, and Textbook sections.
**Multiple-Choice Safety:**
For MCQ prompts:
- Never state or imply the correct option.
- Briefly define what each option represents conceptually.
- Give elimination tips.
- End by inviting the student to choose after checking their work.
**Style:**
- Be concise, step-by-step, and student-friendly.
- When using numbers, show the formula first, define variables with units.
- If uncertain, say so and point to the closest reading.
Use the following course materials to answer the question:
{context}
{conversation}User: {message}
Assistant:"""
# Generate response using GPT4All
response = model.generate(
prompt,
max_tokens=512,
temp=0.7,
top_p=0.9
)
# Update chat history
chat_history.append((message, response))
# Optional: show sources
sources = set([os.path.basename(doc.metadata.get('source', 'Unknown')) for doc in docs])
    if sources:
response += f"\n\n📚 *Sources: {', '.join(list(sources)[:2])}*"
    return response
Step 8: Launch the Chat Interface¶
Finally, we create a Gradio chat interface so you can interact with your tutor!
Gradio Interface:
Clean, chat-like UI
Example questions provided
Works in browser
Can share with others (set share=True)
Interface features:
Title and description
Pre-loaded example questions
Chat history maintained within session
Responsive design
After launching:
You’ll see a local URL (e.g., http://127.0.0.1:8768)
Click the link or copy-paste it into your browser
Start asking questions!
Tips:
First response may be slow (~60 seconds) while model warms up
Subsequent responses are faster
Questions should be about course content
Try the example questions to get started
# ============================================
# 6. LAUNCH GRADIO INTERFACE
# ============================================
print("\n" + "="*50)
print("Starting Gradio interface...")
print("="*50 + "\n")
demo = gr.ChatInterface(
fn=chat,
title="📚 Data 88E RAG Chatbot",
description="Ask me anything about the course materials! Powered by GPT4All running 100% locally.",
examples=[
"What topics are covered in this course?",
"Explain the Kuznets Hypothesis",
"What is economic data science?",
"Summarize the main concepts"
],
)
if __name__ == "__main__":
demo.launch(
        share=True,  # creates a temporary public link; set to False for local-only use
server_name="0.0.0.0", # Makes it accessible on your network
server_port=8768
    )
💡 Usage Tips & Troubleshooting¶
How to Use the Chatbot:¶
Ask questions naturally: “What is GDP?” or “Explain regression”
Reference specific topics: “Tell me about Week 5 content”
Ask for clarification: “Can you explain that in simpler terms?”
Test assignment understanding: “How would I approach calculating elasticity?” (gets conceptual guidance, not answers!)
What to Expect:¶
✅ Good questions:
“What is the Kuznets Hypothesis?”
“How do I interpret regression coefficients?”
“What’s the difference between GDP and GNP?”
❌ Won’t get direct answers to:
“What’s the answer to Problem 3?”
“Give me the code for Question 2”
“Which option is correct: A, B, C, or D?”
Troubleshooting:¶
Problem: “Loaded 0 markdown files”
Fix: Check that DOCUMENTS_PATH points to the correct folder
Verify the folder contains .md files
Problem: “Cannot find empty port”
Fix: Change server_port=8768 in demo.launch() to a different number (e.g., 7860, 8080)
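For example (7860 here is an arbitrary free port):
demo.launch(share=True, server_name="0.0.0.0", server_port=7860)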
Problem: Response taking forever (>2 minutes)
Fix: This is normal on slow CPUs for the first response
Consider using a smaller model: orca-mini-3b-gguf2-q4_0.gguf
Problem: Responses are nonsensical
Fix: Try lowering max_tokens to 256
Check that documents loaded correctly
Restart the kernel and try again
Problem: Model download fails
Fix: Check internet connection
Manually download from: https://gpt4all.io/models/
Place in: ~/.cache/gpt4all/
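If the automatic download keeps failing, you can also point GPT4All at the manually downloaded file and disable downloading entirely. A sketch assuming the default cache directory (model_path and allow_download are parameters of the gpt4all package):
import os
from gpt4all import GPT4All

# Load from the local file only; never attempt a network download
model = GPT4All(
    "mistral-7b-openorca.gguf2.Q4_0.gguf",
    model_path=os.path.expanduser("~/.cache/gpt4all"),
    allow_download=False,
)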
Performance Tips:¶
🚀 To make it faster:
Use the GPU version (see GPU tutorial section)
Use a smaller model: orca-mini-3b
Reduce max_tokens to 256
Reduce k=2 to k=1 in retrieval
📊 To improve quality:
Increase chunk_size to 1000 for more context
Increase k=2 to k=3 for more retrieved chunks
Try different models (Mistral 7B for better quality)
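Concretely, the quality tweaks map onto the earlier cells like this sketch (the exact values are suggestions, and vectorstore comes from Step 5):
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Larger chunks -> more context per retrieved passage
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=150,
    separators=["\n## ", "\n### ", "\n\n", "\n", " ", ""]
)
# Retrieve three chunks per question instead of two (was k=2 in chat())
docs = vectorstore.similarity_search("example question", k=3)
Finally, the cell below checks whether a CUDA-capable GPU is available, the first step toward the GPU option mentioned above.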
# Check whether a CUDA-capable GPU is available (relevant to the GPU option above)
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'None'}")
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")