Using GPU resources with PyTorch

This notebook walks you through using a GPU with the PyTorch machine learning framework. You can run each cell one by one as you work through the content, or run all cells and read through all output at the same time.

In this notebook you will learn how to:

  1. Import the PyTorch libraries.

  2. List available GPUs.

  3. Check that GPUs are enabled.

  4. Assign a GPU device and retrieve the device name.

  5. Load vectors, matrices, and data onto a GPU.

  6. Load a neural network model onto a GPU.

  7. Train the neural network model.

First, use the following command to get some basic information about the GPU resources available to your notebook server:

!nvidia-smi
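
If you prefer to read the same information from Python rather than from the shell, the following is a minimal sketch. It assumes the nvidia-smi binary is on your PATH and uses its --query-gpu and --format options; the helper name query_gpus is hypothetical and not part of this notebook.

```python
import shutil
import subprocess


def query_gpus():
    """Return one CSV line per GPU, or an empty list if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=name,memory.total,memory.used",
         "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


for line in query_gpus():
    print(line)
```

An empty result here means that no NVIDIA driver tooling is visible to the notebook server, which usually points to a notebook image without GPU support.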

Check that you can see your GPU

Import the torch and torchvision libraries you need in order to work with PyTorch, and ensure that your GPU resources are visible in your notebook server.

The following commands import the torch and torchvision utilities, as well as some plotting and tqdm helpers:

!pip install torchvision==0.9.1
!pip install tqdm
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset
import torch.optim as optim
import torchvision
from torchvision import datasets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
from tqdm import tqdm

Now that these libraries are imported, you can use the following commands to check whether a GPU is available, and how many GPUs this notebook server has access to:

torch.cuda.is_available()  # Do we have a GPU? Should return True.
torch.cuda.device_count()  # How many GPUs do we have access to?
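
The commands above report only whether a GPU exists and how many there are. To list each GPU by index and name, a short sketch such as the following can help. The helper name list_gpus is hypothetical; it falls back to an empty list when no GPU is visible.

```python
import torch


def list_gpus():
    """Return (index, name) pairs for every CUDA device PyTorch can see."""
    if not torch.cuda.is_available():
        return []
    return [(i, torch.cuda.get_device_name(i))
            for i in range(torch.cuda.device_count())]


gpus = list_gpus()
if gpus:
    for index, name in gpus:
        print(f"cuda:{index} -> {name}")
else:
    print("No GPU is visible to PyTorch")
```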

Make sure that you can see at least 1 GPU available before continuing to the next section.

If you see 0 GPUs available, click File -> Hub Control Panel to go back to the notebook server control panel.

Stop your notebook server and start it again, making sure to select a GPU-compatible notebook image and add at least 1 GPU. If you selected 1 or more GPUs and you still cannot see a GPU when you run the previous commands, contact your administrator.

Assign your GPU as a device

Assign the first GPU device to the device variable, and get the device name for your GPU.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)  # Check which device we got

If the output of the print command is cuda:0, continue to the next step.

If the output is cpu instead, your environment is not set up correctly. Make sure that you use a GPU-compatible notebook image and add at least 1 GPU when you start your notebook server.

Next, check the device name of the GPU:

torch.cuda.get_device_name(0)
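
Beyond the name, it can be useful to know how much memory the GPU has before you load data onto it. The following sketch uses torch.cuda.get_device_properties to read that information; the helper name describe_device and the dictionary keys are hypothetical choices for illustration.

```python
import torch


def describe_device(index=0):
    """Summarize one CUDA device, or return None when no GPU is available."""
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(index)
    return {
        "name": props.name,
        "total_memory_gib": round(props.total_memory / 1024 ** 3, 2),
        "compute_capability": f"{props.major}.{props.minor}",
    }


print(describe_device())
```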

Loading vectors, matrices, and other data onto the GPU

Run the following commands to create a data structure and load it onto the GPU device, or create your data structure on the GPU device directly.

X_train = torch.IntTensor([0, 30, 50, 75, 70])  # Initialize a Tensor of Integers with no device specified
print(X_train.is_cuda, ",", X_train.device)  # Check which device Tensor is created on
# Move the Tensor to the device we want to use
X_train = X_train.cuda()
# Alternative method: specify the device using the variable
# X_train = X_train.to(device)
# Confirm that the Tensor is on the GPU now
print(X_train.is_cuda, ",", X_train.device)
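# Alternative method: Initialize the Tensor directly on a specific device.
# torch.tensor with dtype and device replaces the legacy torch.cuda.IntTensor constructor.
X_test = torch.tensor([30, 40, 50], dtype=torch.int32, device=device)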
print(X_test.is_cuda, ",", X_test.device)
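
Once tensors live on a device, operations between them run on that device, and both operands must be on the same device. The following sketch shows a matrix product computed on the selected device and copied back to the CPU. It assumes the same device selection used earlier and falls back to the CPU when no GPU is present.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Create both operands directly on the target device.
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]], device=device)
b = torch.eye(2, device=device)  # 2x2 identity matrix

product = a @ b          # Runs on the device that holds a and b
print(product.device)

# Copy the result back to host memory before using it in plain Python.
result = product.cpu().tolist()
print(result)  # [[1.0, 2.0], [3.0, 4.0]]
```

Mixing a CPU tensor with a GPU tensor in one operation raises an error, so move everything to the same device first.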

Loading a neural network model onto the GPU

Run the following commands to create or load a model onto your GPU device.

The following code is a basic, fully connected neural network built in Torch.

# Here is a basic fully connected neural network built in Torch.
# If we want to load it / train it on our GPU, we must first put it on the GPU
# Otherwise it will remain on CPU by default.

batch_size = 100


class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(784, 784)
        self.fc2 = nn.Linear(784, 10)

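    def forward(self, x):
        # Flatten each image; use the actual batch size so a short final batch still works
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        # F.nll_loss expects log-probabilities, so return log_softmax rather than softmax
        output = F.log_softmax(x, dim=1)
        return output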

The code above only defines the network. When you create an instance of it, its parameters are placed on the CPU by default.

The following code moves the model onto the GPU, so that it can be trained with a large data set more quickly.

model = SimpleNet().to(device)  # Load the neural network model onto the GPU
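
To confirm where a model's weights live, you can inspect the device of any of its parameters. The following sketch uses a single nn.Linear layer as a stand-in for the full model, so it runs on its own; the variable names are illustrative only.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(784, 10)               # Parameters start on the CPU
print(next(layer.parameters()).device)   # cpu

layer = layer.to(device)                 # Move all parameters to the device
print(next(layer.parameters()).device)

# Inputs must be on the same device as the parameters.
x = torch.randn(4, 784, device=device)
out = layer(x)
print(out.shape)
```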

Training the neural network model

The examples in this section show you how to train your neural network model using the FashionMNIST data set.

The following code uses the PyTorch data loader to download the data set, and set up training and testing data sets to work with.

"""
    Data loading, train and test set via the PyTorch dataloader.
"""
# Transform our data into Tensors to normalize the data
train_transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
        ])

test_transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),
        ])

# Set up a training data set
trainset = datasets.FashionMNIST('./data', train=True, download=True,
                   transform=train_transform)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=False, num_workers=2)

# Set up a test data set
testset = datasets.FashionMNIST('./data', train=False,
                   transform=test_transform)
test_loader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)
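
To see what a data loader yields without downloading anything, the following sketch builds a small synthetic data set with the same image shape as FashionMNIST (1 x 28 x 28). The random data and the data set size of 250 are assumptions chosen only to show how batches are formed.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-in data: 250 grayscale 28x28 "images" with labels 0-9.
images = torch.randn(250, 1, 28, 28)
labels = torch.randint(0, 10, (250,))

loader = DataLoader(TensorDataset(images, labels), batch_size=100, shuffle=False)

shapes = [tuple(batch.shape) for batch, _ in loader]
print(shapes)  # [(100, 1, 28, 28), (100, 1, 28, 28), (50, 1, 28, 28)]
```

The last batch is smaller whenever the data set size is not a multiple of the batch size. Flattening with x.view(x.size(0), -1) rather than a fixed batch size stays valid in that case.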

Place the labels from the FashionMNIST data set into dictionary format, and plot a selection of the data to verify:

# A dictionary to map our class numbers to their items.
labels_map = {
    0: "T-Shirt",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle Boot",
}

# Plotting 9 random different items from the training data set, trainset.
figure = plt.figure(figsize=(8, 8))
for i in range(1, 3 * 3 + 1):
    sample_idx = torch.randint(len(trainset), size=(1,)).item()
    img, label = trainset[sample_idx]
    figure.add_subplot(3, 3, i)
    plt.title(labels_map[label])
    plt.axis("off")
    plt.imshow(img.view(28,28), cmap="gray")
plt.show()

Run the following code to train the model and see how well it can classify fashion items into the 10 classes in the dictionary.

def train(model, device, train_loader, optimizer, epoch):
    """Model training function"""
    model.train()
    print(device)
    # Wrap the loader itself so that tqdm knows the number of batches
    for data, target in tqdm(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()


def test(model, device, test_loader):
    """Model evaluation function"""
    model.eval()
    test_loss = 0
    correct = 0
    # Use the no_grad method to increase computation speed
    # since computing the gradient is not necessary in this step.
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


# Number of training epochs
EPOCHS = 5
# Optimization strategy used in training
optimizer = optim.Adadelta(model.parameters(), lr=0.01)
for epoch in range(1, EPOCHS + 1):
    print(f"EPOCH: {epoch}")
    train(model, device, train_loader, optimizer, epoch)
    test(model, device, test_loader)
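
The training and test functions both pass the model output to F.nll_loss, which expects log-probabilities as input. The following sketch, using random logits as stand-in data, checks that F.nll_loss applied to F.log_softmax output matches F.cross_entropy applied to the raw logits.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)           # Stand-in for raw model outputs
target = torch.randint(0, 10, (8,))   # Stand-in for class labels

loss_nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
loss_ce = F.cross_entropy(logits, target)

print(torch.allclose(loss_nll, loss_ce))  # True
```

Passing plain softmax probabilities to F.nll_loss does not raise an error, but the resulting value is no longer the cross-entropy loss.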

Accuracy should increase over the five epochs, for example from about 63% in the first epoch to about 72% in the fifth. (The exact numbers vary, depending on random weight initialization on your notebook server.)

Saving the model state

Now that the model is trained, save it so that it can be used in the next notebook, Loading and Running a PyTorch Model.

# Saving the model's weights!
torch.save(model.state_dict(), "mnist_fashion_SimpleNet.pt")
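
As a quick check that a saved state dictionary can be restored, the following sketch saves and reloads the weights of a small stand-in layer. The file name example_weights.pt and the temporary directory are assumptions for illustration; map_location lets the weights load on whichever device is available.

```python
import os
import tempfile

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

net = nn.Linear(784, 10).to(device)   # Stand-in for the trained model

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_weights.pt")
    torch.save(net.state_dict(), path)
    state = torch.load(path, map_location=device)

# Rebuild the same architecture, then load the saved weights into it.
restored = nn.Linear(784, 10).to(device)
restored.load_state_dict(state)
restored.eval()

same = all(torch.equal(p, q)
           for p, q in zip(net.parameters(), restored.parameters()))
print(same)  # True
```

Because only the weights are saved, the code that defines the model class must be available wherever the file is loaded.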