Recommender systems are the invisible engines behind personalized internet experiences. Whether you’re searching for your next TV show on Netflix, products on Amazon, or music on Spotify, recommender systems are working behind the scenes. They analyze your preferences, compare them with millions of other users, and deliver tailored recommendations that keep you engaged and happy.

In this article, we’ll build a recommender system using the Surprise library on an Ubuntu GPU server. We’ll use the MovieLens 100K dataset, a popular dataset for collaborative filtering, and train a model using SVD.

Prerequisites

Before starting, ensure you have the following:

  • An Ubuntu 24.04 Cloud GPU Server.
  • CUDA Toolkit and cuDNN installed (Surprise itself runs on the CPU, so these are not used by the steps below).
  • Root or sudo privileges.

Step 1: Install Required Dependencies

First, we need to set up the environment and install the necessary dependencies.

1. Update your system packages.

apt update -y

2. Install pip and the venv module so you can create an isolated Python environment:

apt install python3-pip python3-venv -y

3. Create and activate a virtual environment to manage dependencies:

python3 -m venv surprise_env
source surprise_env/bin/activate

4. Install the necessary Python libraries:

pip install "numpy<2" scikit-surprise pandas matplotlib

Explanation:

  • NumPy: For numerical computations.
  • scikit-surprise: A Python library for building and analyzing recommender systems.
  • Pandas: For data manipulation.
  • Matplotlib: For visualization (optional).
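
Before moving on, it can help to confirm that the packages are visible inside the activated virtual environment. The following is a minimal sketch that uses only the standard library. The helper name check_packages is our own, and the module names assume the usual import names (scikit-surprise is imported as surprise).

```python
from importlib.util import find_spec


def check_packages(module_names):
    """Return a dict mapping each module name to True if it can be found."""
    return {name: find_spec(name) is not None for name in module_names}


if __name__ == "__main__":
    # scikit-surprise installs a module named "surprise"
    status = check_packages(["numpy", "surprise", "pandas", "matplotlib"])
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If any line reports MISSING, make sure the virtual environment is activated and rerun the pip install command.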

Step 2: Download the MovieLens 100K Dataset

The MovieLens 100K dataset is a widely used dataset for building recommender systems. It contains 100,000 ratings from 943 users on 1,682 movies.
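
To get a feel for how sparse this data is, the figures above can be turned into a quick calculation. This is a small sketch and the variable names are our own.

```python
n_ratings = 100_000
n_users = 943
n_movies = 1_682

possible_pairs = n_users * n_movies       # every user rating every movie
density = n_ratings / possible_pairs      # fraction of pairs that have a rating

print(f"Possible user-movie pairs: {possible_pairs:,}")
print(f"Density: {density:.2%}, sparsity: {1 - density:.2%}")
```

Only about 6.3% of all possible user-movie pairs carry a rating. Predicting the missing ones is the task a collaborative filtering model is trained for.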

1. Create a directory for the dataset and download it:

mkdir -p ~/datasets && cd ~/datasets
wget https://files.grouplens.org/datasets/movielens/ml-100k.zip
unzip ml-100k.zip

This will create a directory named ml-100k containing the dataset files.

2. The file we need is ml-100k/u.data. Let’s inspect the first few lines:

head ml-100k/u.data

Output:

196	242	3	881250949
186	302	3	891717742
22	377	1	878887116
244	51	2	880606923
166	346	1	886397596
298	474	4	884182806
115	265	2	881171488
253	465	5	891628467
305	451	3	886324817
6	86	3	883603013

The columns represent:

  • user_id: ID of the user.
  • item_id: ID of the movie.
  • rating: Rating given by the user (1-5).
  • timestamp: Timestamp of the rating.
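
To make the layout concrete, here is a minimal sketch that parses a few of the rows shown above using only the standard library. The Rating tuple and the parse_ratings helper are our own names, and the timestamps are assumed to be Unix seconds in UTC.

```python
import csv
import io
from collections import namedtuple
from datetime import datetime, timezone

Rating = namedtuple("Rating", ["user_id", "item_id", "rating", "timestamp"])

# First rows of u.data as shown above (tab-separated, no header line)
SAMPLE = "196\t242\t3\t881250949\n186\t302\t3\t891717742\n22\t377\t1\t878887116\n"


def parse_ratings(text):
    """Parse tab-separated rating lines into Rating tuples with integer fields."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [Rating(*(int(field) for field in row)) for row in reader if row]


rows = parse_ratings(SAMPLE)
for r in rows:
    # Assumption: timestamps are Unix seconds (UTC)
    rated_on = datetime.fromtimestamp(r.timestamp, tz=timezone.utc).date()
    print(r.user_id, r.item_id, r.rating, rated_on)
```

Note that the IDs come out as integers here, which is also how pandas reads them in the scripts that follow.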

Step 3: Load the Dataset

We’ll read the ratings with pandas and then load them into the Surprise library’s Dataset format, which is what its algorithms train on.

1. Create a file named load_data.py:

nano load_data.py

Add the following code:

import pandas as pd
from surprise import Dataset, Reader

# Load MovieLens 100K dataset
file_path = "ml-100k/u.data"
columns = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv(file_path, sep='\t', names=columns)

# Define a reader; load_from_df only uses rating_scale (ratings here are 1-5)
reader = Reader(rating_scale=(1, 5))

# Load the dataset into Surprise format
data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)

print("Data loaded successfully.")

2. Execute the script:

python3 load_data.py

Output:

Data loaded successfully.

Step 4: Build a Recommender System Using Surprise

We’ll use the SVD (Singular Value Decomposition) algorithm, a popular collaborative filtering technique, to build our recommender system.
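
As background, Surprise's SVD estimates a rating as the global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors. The sketch below illustrates that rule in plain Python. The function name and all numbers are made up for illustration and are not values learned from MovieLens.

```python
def predict_rating(global_mean, user_bias, item_bias, user_factors, item_factors,
                   rating_scale=(1, 5)):
    """Estimate a rating as mu + b_u + b_i + q_i . p_u, clipped to the rating scale."""
    interaction = sum(p * q for p, q in zip(user_factors, item_factors))
    estimate = global_mean + user_bias + item_bias + interaction
    low, high = rating_scale
    return min(high, max(low, estimate))


# Hypothetical toy parameters
est = predict_rating(3.53, 0.20, -0.10, [0.1, -0.3], [0.5, 0.2])
print(f"{est:.2f}")

# With no bias and no factors, the estimate is just the global mean
print(predict_rating(3.53, 0.0, 0.0, [], []))
```

During training, the biases and factor vectors are adjusted to reduce the squared error on the known ratings. The last call shows what remains when nothing is known about the user or the item.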

1. Create a Python file to train the model:

nano train_model.py

Add the following code:

import pandas as pd
from surprise import Dataset, Reader, SVD
from surprise.model_selection import cross_validate

# Load dataset
file_path = "ml-100k/u.data"
columns = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv(file_path, sep='\t', names=columns)

# load_from_df only uses rating_scale (ratings here are 1-5)
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)

# Evaluate SVD with 5-fold cross-validation (the model is refit on each fold)
model = SVD()
cross_validate(model, data, cv=5, verbose=True)

print("Model trained successfully.")

2. Run the script.

python3 train_model.py

This performs 5-fold cross-validation and prints the RMSE (Root Mean Square Error) and MAE (Mean Absolute Error) for each fold, together with their mean and standard deviation and the fit and test times:

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9271  0.9318  0.9398  0.9403  0.9423  0.9363  0.0058  
MAE (testset)     0.7311  0.7359  0.7374  0.7412  0.7434  0.7378  0.0043  
Fit time          0.79    0.83    0.85    0.88    0.96    0.86    0.06    
Test time         0.09    0.09    0.09    0.10    0.08    0.09    0.01    
Model trained successfully.
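
To show what these numbers mean, the sketch below computes RMSE and MAE by hand for a handful of hypothetical ratings, then recomputes the Mean and Std columns from the per-fold RMSE values in the table above. The function names are our own. The population standard deviation is used because that is what reproduces the Std value shown.

```python
import math
import statistics


def rmse(actual, predicted):
    """Root of the mean squared difference between actual and predicted ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean of the absolute differences between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical true and predicted ratings
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]
print(f"RMSE: {rmse(actual, predicted):.4f}")
print(f"MAE:  {mae(actual, predicted):.4f}")

# Per-fold RMSE values copied from the table above
fold_rmse = [0.9271, 0.9318, 0.9398, 0.9403, 0.9423]
mean_rmse = statistics.mean(fold_rmse)
std_rmse = statistics.pstdev(fold_rmse)   # population standard deviation
print(f"Mean: {mean_rmse:.4f}  Std: {std_rmse:.4f}")
```

RMSE penalizes large misses more heavily than MAE, which is why it is the larger of the two figures in the table.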

Step 5: Make Predictions Using the Model

Next, let’s train a model on 80% of the ratings and predict the ratings in the remaining 20%.

1. Create a Python file to make predictions.

nano predict.py

Add the following code:

import pandas as pd
from surprise import Dataset, Reader, SVD
from surprise.model_selection import train_test_split

# Load dataset
file_path = "ml-100k/u.data"
columns = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv(file_path, sep='\t', names=columns)

reader = Reader(rating_scale=(1, 5))  # load_from_df only uses rating_scale
data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)

# Split data into train and test set
trainset, testset = train_test_split(data, test_size=0.2)

# Train SVD model
model = SVD()
model.fit(trainset)

# Make predictions
predictions = model.test(testset)

# Print sample predictions
for prediction in predictions[:10]:
    print(f"User {prediction.uid} - Item {prediction.iid}: Predicted Rating {prediction.est:.2f}")

2. Run the script.

python3 predict.py

Output (your values will differ, because the train/test split and the SVD initialization are random):

User 282 - Item 325: Predicted Rating 2.90
User 619 - Item 188: Predicted Rating 3.74
User 336 - Item 1118: Predicted Rating 3.04
User 763 - Item 505: Predicted Rating 4.18
User 735 - Item 286: Predicted Rating 3.45
User 276 - Item 214: Predicted Rating 3.53
User 305 - Item 557: Predicted Rating 2.91
User 682 - Item 71: Predicted Rating 3.20
User 718 - Item 471: Predicted Rating 4.00
User 145 - Item 460: Predicted Rating 3.16
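
Predicted ratings become recommendations once they are ranked per user. The sketch below shows one way to do that in plain Python. Pred is a stand-in for Surprise's prediction tuple that keeps only the uid, iid, and est fields used above, and the top_n helper and sample values are hypothetical.

```python
from collections import defaultdict, namedtuple

# Stand-in for Surprise's prediction tuple, keeping only the fields used here
Pred = namedtuple("Pred", ["uid", "iid", "est"])


def top_n(predictions, n=2):
    """Group predictions by user and keep each user's n highest estimates."""
    by_user = defaultdict(list)
    for p in predictions:
        by_user[p.uid].append((p.iid, p.est))
    return {
        uid: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for uid, items in by_user.items()
    }


# Hypothetical predictions for two users
preds = [
    Pred(1, 10, 3.2), Pred(1, 11, 4.6), Pred(1, 12, 4.1),
    Pred(2, 10, 2.9), Pred(2, 13, 3.8),
]
for uid, items in top_n(preds).items():
    print(uid, items)
```

In a real system the candidates would normally be items the user has not rated yet, rather than rows from a test set.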

Step 6: Save and Load the Model

To reuse the trained model, we’ll save it to a file and load it later.

1. Create a file named save_model.py:

nano save_model.py

Add the following code:

import pandas as pd
import pickle
from surprise import Dataset, Reader, SVD

# Load dataset
file_path = "/root/datasets/ml-100k/u.data"  # Adjust if the dataset is not under /root/datasets
columns = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv(file_path, sep='\t', names=columns)

# load_from_df only uses rating_scale (ratings here are 1-5)
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)

# Train on the full dataset (no held-out split)
trainset = data.build_full_trainset()
model = SVD()
model.fit(trainset)

# Save the trained model
with open("recommender_model.pkl", "wb") as f:
    pickle.dump(model, f)

print("Model saved successfully.")

2. Run the script.

python3 save_model.py

Output:

Model saved successfully.

3. Create a file named load_model.py to load and use the saved model.

nano load_model.py

Add the following code:

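import pickle

# Load the saved model (Surprise must be installed for unpickling to work)
with open("recommender_model.pkl", "rb") as f:
    model = pickle.load(f)

# Make a prediction for user 1 and item 50.
# The IDs must be integers, matching the integer columns the model was trained on;
# string IDs such as "1" would be treated as unknown and return the global mean.
user_id = 1
item_id = 50
predicted_rating = model.predict(user_id, item_id).est

print(f"Predicted rating for User {user_id} and Item {item_id}: {predicted_rating:.2f}")

4. Run the script.

python3 load_model.py

The script prints one line of the form "Predicted rating for User 1 and Item 50: ..." followed by the estimate. The exact value varies between training runs because SVD initializes its factors randomly. A result of about 3.53, the mean rating of the whole dataset, is the fallback SVD returns for IDs it does not recognize, so it usually means the IDs were passed with the wrong type.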
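
One detail worth checking when reusing a saved model is the type of the IDs passed to it. Surprise matches raw IDs exactly as they were loaded, and the DataFrame columns in this tutorial hold integers, so 1 and "1" are different IDs. The sketch below mimics that behaviour with a hypothetical dictionary of parameters rather than a real Surprise model, and round-trips it through pickle in memory.

```python
import pickle


def predict(params, user_id):
    """Global mean plus the user's offset; unknown IDs fall back to the mean."""
    return params["global_mean"] + params["user_offsets"].get(user_id, 0.0)


# Hypothetical "trained" parameters, keyed by integer raw user IDs
params = {"global_mean": 3.5, "user_offsets": {1: 0.5, 2: -0.25}}

# Round trip through pickle, in memory instead of a file
restored = pickle.loads(pickle.dumps(params))

print(predict(restored, 1))     # integer ID: known user
print(predict(restored, "1"))   # string ID: not found, falls back to the mean
```

The second call silently returns the mean instead of raising an error, which is why a mismatch in ID types is easy to overlook.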

Step 7: Evaluate the Model Performance

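Finally, let’s compute the model’s RMSE (Root Mean Square Error) on a test split. Because the model saved in Step 6 was trained on the full dataset, the test ratings here were also seen during training, so this score is optimistic. The cross-validated RMSE from Step 4 is the better estimate of how the model performs on unseen ratings.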

1. Create a file for evaluation.

nano evaluate_model.py

Add the following code:

import pandas as pd
import pickle
from surprise import Dataset, Reader, accuracy
from surprise.model_selection import train_test_split

# Load dataset
file_path = "/root/datasets/ml-100k/u.data"  # Adjust if the dataset is not under /root/datasets
columns = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv(file_path, sep='\t', names=columns)

# load_from_df only uses rating_scale (ratings here are 1-5)
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[['user_id', 'item_id', 'rating']], reader)

# Split data; only the test portion is used below
trainset, testset = train_test_split(data, test_size=0.2)

# Load saved model
# Note: it was fit on the full dataset in Step 6, so these test rows are not unseen
with open("recommender_model.pkl", "rb") as f:
    model = pickle.load(f)

# Predict ratings for the test split
predictions = model.test(testset)

# Compute RMSE; verbose=False stops accuracy.rmse from printing the value a second time
rmse = accuracy.rmse(predictions, verbose=False)
print(f"RMSE: {rmse:.4f}")

2. Run the script.

python3 evaluate_model.py

Output:

RMSE: 0.6721
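
This RMSE is much lower than the cross-validated value of about 0.94 from Step 4. The model loaded here was fit on the full dataset in Step 6, so the test split contains ratings it has already seen. The sketch below illustrates the effect with a deliberately extreme, hypothetical model that simply memorizes its training ratings. It is not how SVD works, but the direction of the bias is the same.

```python
import math


class MemorizingModel:
    """Toy model: recalls ratings it has seen, otherwise predicts the training mean."""

    def fit(self, ratings):
        self.seen = {(u, i): r for u, i, r in ratings}
        self.mean = sum(r for _, _, r in ratings) / len(ratings)
        return self

    def predict(self, u, i):
        return self.seen.get((u, i), self.mean)


def evaluate(model, rows):
    """RMSE of the model's predictions on (user, item, rating) rows."""
    squared = [(r - model.predict(u, i)) ** 2 for u, i, r in rows]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical (user, item, rating) rows
ratings = [(1, 10, 4), (1, 11, 2), (2, 10, 5), (2, 12, 3), (3, 11, 1), (3, 12, 4)]
train, test = ratings[:4], ratings[4:]

leaky = evaluate(MemorizingModel().fit(ratings), test)   # test rows seen in training
honest = evaluate(MemorizingModel().fit(train), test)    # test rows held out

print(f"Trained on everything: RMSE {leaky:.4f}")
print(f"Trained on train only: RMSE {honest:.4f}")
```

For a score that reflects unseen data, fit the model on the trainset returned by train_test_split before calling test, or rely on the cross-validation results.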

Conclusion

In this article, we built a recommender system using the Surprise library on an Atlantic.Net GPU server: we loaded the MovieLens 100K dataset, cross-validated an SVD model, generated predictions, and saved and reloaded the trained model. You now have a working recommender system that can be extended to real-world applications like e-commerce product recommendations, movie suggestions, or content curation.