
GPT4All - setup by downloading models

!! This notebook is not meant to be run, just to explain the setup !!

Teaching the LLM workflow using open-source models and GPT4All

GPT4All is a framework for sourcing models and running them in a Jupyter workflow within the confines of limited compute, like a personal computer or a cloud-based server like the one we use for instruction.

This information is current as of the writing of this notebook (May/June 2025), and it is changing rapidly.

Shared Filesystem

In the setup where I was teaching, I used this notebook to download models from Hugging Face and put them in a shared read-write folder where students could access them on JupyterHub. This was possible because the JupyterHub I was using for teaching had a shared folder system.

Your use case may vary. It could look like...

  • a shared read-write folder

  • each student downloading their own models

  • downloading models to a local machine

# Ensure that your Python environment has the gpt4all package installed.
try:
    from gpt4all import GPT4All
except ImportError:
    %pip install gpt4all
    from gpt4all import GPT4All  # may require a kernel restart if this still fails

Which model to download

In the use case of teaching on a JupyterHub with a CPU, I was looking for small models:

  • ~1B parameters

  • quantized (weights stored at lower precision, e.g., 4-bit integers instead of 16-bit floats)

You can explore the world of models at: Hugging Face Model List

GPT4All uses a subset of these models. Here is the description from their documentation:

  • Many LLMs are available at various sizes, quantizations, and licenses.

  • LLMs with more parameters tend to be better at coherently responding to instructions

  • LLMs with a smaller quantization (e.g. 4bit instead of 16bit) are much faster and less memory intensive, and tend to have slightly worse performance

  • Licenses vary in their terms for personal and commercial use
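As a rough sanity check on those size claims, you can estimate a quantized model's file size from its parameter count and bit width. This is a sketch only; real GGUF files run somewhat larger because of metadata and because some tensors (e.g., embeddings) may be kept at higher precision.

# Rough size estimate for a quantized model file: bits -> bytes -> GB
def approx_size_gb(n_params: float, bits_per_weight: float) -> float:
    return n_params * bits_per_weight / 8 / 1e9

print(approx_size_gb(1.5e9, 4))   # ~0.75 GB for a 1.5B model at 4-bit
print(approx_size_gb(1.5e9, 16))  # ~3.0 GB for the same model at 16-bit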

Five that I picked to download are:

  • DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf

  • Phi-3-mini-4k-instruct.Q4_0.gguf

  • Llama-3.2-1B-Instruct-Q4_0.gguf

  • qwen2-1_5b-instruct-q4_0.gguf

  • mistral-7b-instruct-v0.1.Q4_0.gguf

The simplest way to download a model is simply to request it through GPT4All: GPT4All will download the model if you don’t already have it.

Don’t worry if you get llama_model_load: error loading model: error loading model vocabulary: unknown pre-tokenizer type: 'deepseek-r1-qwen'. That error means the version of llama.cpp bundled with GPT4All doesn’t recognize that model’s tokenizer yet; the file still downloads correctly.

Let’s check out our local filesystem path and where we will download the files

Approach 1 - if a Shared Hub is being used

# This only worked for SP 25 instruction on Berkeley Datahub
#!ls /home/jovyan/_shared/econ148-readwrite
# Cal-ICOR workshop Hub
!ls /home/jovyan/shared

Approach 2 - if a local machine is being used

# This is my local path to a directory called shared-rw
!ls shared-rw
# or the full path (this is on my laptop)
!ls /Users/ericvandusen/Documents/GitHub/SmallLM-SP25/shared-rw

Set the path where the models will download

# path for Shared Hub
path = "/home/jovyan/shared_readwrite"
# path for Local
#path = "/Users/ericvandusen/Documents/GitHub/SmallLM-SP25/shared-rw"
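If the directory might not exist yet (say, on a fresh local machine), you can create it first; a minimal sketch:

import os

# Create the target directory if it doesn't already exist.
# exist_ok=True makes this a no-op when the folder is already there.
os.makedirs(path, exist_ok=True)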

Downloading the models

In this cell, we define the model object — the interface through which the notebook will send all prompts and conversations to the local language model.

The call uses the GPT4All class to load a quantized model: here, a small (roughly 1–2 billion parameter) instruction-tuned model, optimized for efficient CPU inference.

We specify:

  • model_name="XXXXXXX.gguf" – identifies the specific quantized model file we’re using (in Q4_0 format; roughly 1 GB for the ~1B-parameter models here).

  • allow_download=True – allows GPT4All to automatically download the model if it isn’t found locally.

  • model_path – the directory path where the model is stored or should be saved.

  • verbose=True – prints detailed logs during model loading, useful for confirming that the model is correctly located and initialized.

Once a cell like this runs successfully, the variable model will serve as our connection point for all local inference: the object we’ll send text prompts and chat messages to throughout the rest of the notebook.

# Define the "model" object to which this notebook's code will send conversations & prompts.
# Each cell below downloads its model file if it isn't already in model_path;
# "model" ends up bound to whichever model was loaded last.
model = GPT4All(
    model_name="DeepSeek-R1-Distill-Qwen-1.5B-Q4_0.gguf",
    allow_download=True,
    model_path=path,
    verbose=True
)
# Phi-3 mini 4k instruct (3.8B, quantized)
model = GPT4All(
    model_name="Phi-3-mini-4k-instruct.Q4_0.gguf",
    allow_download=True,
    model_path=path,
    verbose=True
)
# Llama 3.2 1B instruct (quantized)
model = GPT4All(
    model_name="Llama-3.2-1B-Instruct-Q4_0.gguf",
    allow_download=True,
    model_path=path,
    verbose=True
)
# Qwen2 1.5B instruct (quantized)
model = GPT4All(
    model_name="qwen2-1_5b-instruct-q4_0.gguf",
    allow_download=True,
    model_path=path,
    verbose=True
)
# Mistral 7B instruct (quantized), the largest of the five
model = GPT4All(
    model_name="mistral-7b-instruct-v0.1.Q4_0.gguf",
    allow_download=True,
    model_path=path,
    verbose=True
)

Let’s now check which models we have

!ls -l "{path}"
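A pure-Python alternative (standard library only) that lists just the downloaded .gguf files with their sizes:

from pathlib import Path

# List the .gguf model files in the download directory, with sizes in GB.
for f in sorted(Path(path).glob("*.gguf")):
    print(f"{f.name}: {f.stat().st_size / 1e9:.2f} GB")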

Direct Download approach

Here is a command to do the download directly

  • ! - Jupyter shell escape that runs a shell command from the notebook

  • wget - Command-line tool for downloading files

  • -c - Continue/Resume - if download interrupted, picks up where it left off

  • --progress=bar:force - Shows download progress bar (% complete, speed, time remaining)

  • -O [path] - Output - specifies exact location and filename to save

  • [URL] - Direct download link to the GGUF model file on Hugging Face

The URL structure for getting models from Hugging Face:

https://huggingface.co/{organization}/{repo}/resolve/main/{filename}
  • Organization: allenai (Allen Institute for AI)

  • Repo: OLMo-2-0425-1B-GGUF

  • Branch: main

  • File: OLMo-2-0425-1B-Q4_K_M.gguf

#!wget -c --progress=bar:force \
#  -O /home/jovyan/shared_readwrite/OLMo-2-0425-1B-Q4_K_M.gguf \
#  https://huggingface.co/allenai/OLMo-2-0425-1B-GGUF/resolve/main/OLMo-2-0425-1B-Q4_K_M.gguf
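If you’d rather stay in Python, the huggingface_hub package can fetch the same file; treat this as an optional sketch, since the package isn’t used elsewhere in this notebook. It also resumes interrupted downloads.

# Requires: pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Same file as the wget example: {organization}/{repo} plus the filename.
local_file = hf_hub_download(
    repo_id="allenai/OLMo-2-0425-1B-GGUF",
    filename="OLMo-2-0425-1B-Q4_K_M.gguf",
    local_dir=path,
)
print(local_file)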

Bonus: Searching for models in the GPT4All database

  • We can go to the GPT4All database

  • Make that database into a pandas DataFrame

  • Filter to pick the models we want

import requests
import pandas as pd

# Load JSON from the GPT4All models repository (a small curated list)
url = "https://gpt4all.io/models/models3.json"
models = requests.get(url).json()

# Convert to DataFrame
Models_df = pd.DataFrame(models)
Models_df
# Display the columns of the DataFrame
Models_df.columns
# Dimensions of the DataFrame
Models_df.shape
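As an aside, the gpt4all package can fetch this same catalog itself; its Python bindings provide a list_models() helper (check your installed version’s docs for the exact name):

# Equivalent using the gpt4all package's own helper
# (fetches the same models3.json shown above).
catalog = GPT4All.list_models()
catalog_df = pd.DataFrame(catalog)
catalog_df.head()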

Looks like there are only 32 models in this dataset

GPT4All hasn’t been keeping up with everything that has been happening, but for students this is a nice small world to look through! There are thousands of models on Hugging Face, with different quantizations, and it’s completely overwhelming. For pedagogical purposes the sandbox is helpful!

# Filter models that require less than 4 GB of RAM
# Convert 'ramrequired' to numeric and filter to ramrequired < 4
Models_df[Models_df["ramrequired"].astype(float) < 4]
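To make the filtered table easier to scan, we can select a few columns and sort by RAM required. The column names below (name, filename, parameters, ramrequired) come from the models3.json loaded above, but double-check them against Models_df.columns since the schema can change:

# Show the small-RAM models with a few readable columns, sorted by RAM needed.
small = Models_df[Models_df["ramrequired"].astype(float) < 4].copy()
small["ramrequired"] = small["ramrequired"].astype(float)
small[["name", "filename", "parameters", "ramrequired"]].sort_values("ramrequired")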