Deploying a machine learning model is a crucial step in making it accessible to end users. Flask, a lightweight web framework for Python, is a popular choice for deploying machine learning models because of its simplicity and flexibility. When deploying on an Ubuntu GPU server, you can leverage the GPU to accelerate inference, especially for deep learning models.
In this guide, we’ll walk through the steps to deploy a machine learning model using Flask on an Ubuntu GPU server.
Prerequisites
Before starting, ensure you have the following:
- An Ubuntu 22.04 cloud GPU server.
- The latest CUDA Toolkit and cuDNN installed (a quick verification is shown below).
- Root or sudo privileges.
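To confirm that the GPU driver is visible before you begin, you can run:
nvidia-smi
A healthy setup prints the driver version, the CUDA version, and the detected GPU.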
Step 1: Update and Install System Dependencies
First, update the Ubuntu server to ensure you’re working with the latest security patches and package versions.
apt update -y
apt upgrade -y
Next, install Python and other dependencies.
apt install python3 python3-pip python3-virtualenv -y
Step 2: Set Up a Virtual Environment
Using a virtual environment helps isolate your project’s dependencies, preventing potential version conflicts with other projects or system Python packages.
First, create a Python virtual environment named ml-env:
python3 -m venv ml-env
Next, activate the virtual environment.
source ml-env/bin/activate
Next, upgrade pip to the latest version.
pip install --upgrade pip
Then install Flask and other Python libraries.
pip install flask gunicorn torch torchvision tensorflow numpy pandas
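Since the goal is GPU-accelerated inference, it’s worth checking that the PyTorch build you just installed can actually see the GPU (this assumes the CUDA driver from the prerequisites is in place):
python3 -c "import torch; print(torch.cuda.is_available())"
If this prints True, PyTorch can use the GPU; if it prints False, revisit your CUDA installation.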
Step 3: Write and Train a Machine Learning Model
Now, let’s create a script called train_model.py that sets up a simple neural network using PyTorch, trains it on dummy data, and saves the model weights.
nano train_model.py
Add the following code:
import torch
import torch.nn as nn
import torch.optim as optim
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc1 = nn.Linear(4, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 1)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        # Using sigmoid for a binary classification output
        x = torch.sigmoid(self.fc3(x))
        return x

# Create the model
model = SimpleModel()

# Define a binary cross-entropy loss and Adam optimizer
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Dummy training data: 100 samples with 4 features each
data = torch.rand(100, 4)
labels = torch.randint(0, 2, (100, 1), dtype=torch.float32)

# Train for 10 epochs
for epoch in range(10):
    optimizer.zero_grad()
    outputs = model(data)
    loss = criterion(outputs, labels)
    loss.backward()
    optimizer.step()
    if (epoch + 1) % 2 == 0:
        print(f"Epoch [{epoch+1}/10], Loss: {loss.item():.4f}")

# Switch model to evaluation mode (optional but good practice)
model.eval()

# Save the model's state dict only (recommended approach)
torch.save(model.state_dict(), "model_weights.pth")
print("Model weights saved to 'model_weights.pth'")
This code defines a simple feed-forward neural network using PyTorch, trains it on randomly generated data for 10 epochs using binary cross-entropy loss and the Adam optimizer, and then switches the model to evaluation mode. Finally, it saves the trained model parameters (state dict) to the file model_weights.pth.
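As written, the script trains on the CPU. Since this guide targets a GPU server, here is a minimal sketch of how you could move the model and data onto the GPU (assuming torch.cuda.is_available() returns True); the training loop itself is unchanged:
# Pick the GPU if one is visible, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleModel().to(device)  # move the model's parameters to the device
data = data.to(device)            # move the inputs to the same device
labels = labels.to(device)        # move the targets to the same device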
Now, run the training script.
python3 train_model.py
Output:
Epoch [2/10], Loss: 0.6822
Epoch [4/10], Loss: 0.6763
Epoch [6/10], Loss: 0.6745
Epoch [8/10], Loss: 0.6731
Epoch [10/10], Loss: 0.6718
Model weights saved to 'model_weights.pth'
Step 4: Create a Flask API for Model Inference
Now, create an app.py that defines the same PyTorch model architecture used in training, loads the trained weights, and exposes an endpoint for inference.
nano app.py
Add the following code:
import torch
import torch.nn as nn
import numpy as np
from flask import Flask, request, jsonify
app = Flask(__name__)
###########################################################
# 1. Define the EXACT SAME PyTorch architecture
###########################################################
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        # Must match your training script's layers:
        self.fc1 = nn.Linear(4, 64)
        self.fc2 = nn.Linear(64, 64)
        self.fc3 = nn.Linear(64, 1)  # Single output (e.g. for binary classification)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        x = torch.sigmoid(self.fc3(x))
        return x

###########################################################
# 2. Load the PyTorch model state dict (must match architecture)
###########################################################
pytorch_model = SimpleModel()
pytorch_model.load_state_dict(torch.load("model_weights.pth"))
pytorch_model.eval()

###########################################################
# 3. Flask endpoint for PyTorch inference
###########################################################
@app.route("/predict", methods=["POST"])
def predict():
    data = request.get_json()
    # Convert input to a numpy array of shape [1, 4] since the model expects 4 features
    input_data = np.array(data["input"]).reshape(1, 4)
    # PyTorch inference
    tensor_input = torch.tensor(input_data, dtype=torch.float32)
    pytorch_output = pytorch_model(tensor_input).detach().numpy().tolist()
    return jsonify({
        "pytorch": pytorch_output
    })

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000, debug=True)
This code uses PyTorch, NumPy, and Flask to define a simple neural network model, load pre-trained weights, and serve predictions through a /predict endpoint. The endpoint accepts JSON input, converts it to a PyTorch tensor, runs inference, and returns the model’s output as JSON.
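If you want inference to happen on the GPU as well, the same device pattern from the training sketch applies. A minimal variation (again assuming a CUDA-capable PyTorch build): load the model onto the device at startup, then move each request’s tensor there before the forward pass.
# At startup: place the model on the GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
pytorch_model = SimpleModel().to(device)
pytorch_model.load_state_dict(torch.load("model_weights.pth", map_location=device))
pytorch_model.eval()

# ...and inside predict(): move the input to the device, then bring
# the result back to the CPU before converting it to a Python list
tensor_input = torch.tensor(input_data, dtype=torch.float32).to(device)
pytorch_output = pytorch_model(tensor_input).detach().cpu().numpy().tolist()
Note that .cpu() is required before .numpy() whenever the output tensor lives on the GPU.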
Step 5: Run the Flask App
Now, run your Flask app with the following command:
python3 app.py
You will see the following output:
* Serving Flask app 'app'
* Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:5000
* Running on http://your-server-ip:5000
Press CTRL+C to quit
* Restarting with stat
* Debugger is active!
* Debugger PIN: 125-332-575
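As the warning in the output indicates, Flask’s built-in server is meant for development only. Gunicorn, which was installed earlier, is one production-grade option; a typical invocation looks like this (the worker count is an assumption to tune for your server):
gunicorn --bind 0.0.0.0:5000 --workers 2 app:app
Here app:app refers to the Flask object named app inside app.py. Keep in mind that each worker process loads its own copy of the model.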
Step 6: Test the Flask Application
Your Flask app is now running on port 5000. Open another terminal and test the API using the curl command:
curl -X POST -H "Content-Type: application/json" -d '{"input":[0.1, 0.2, 0.3, 0.4]}' http://localhost:5000/predict
You should receive a JSON response similar to:
{
"pytorch": [
[
0.49628227949142456
]
]
}
This value is the model’s predicted probability for the positive class (between 0.0 and 1.0). Because the model was trained on random data, your exact number will differ.
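If you’d rather test from Python than curl, a small client using the requests library (not among the packages installed above, so run pip install requests first) does the same thing:
import requests

# Send one sample with 4 features to the /predict endpoint
resp = requests.post(
    "http://localhost:5000/predict",
    json={"input": [0.1, 0.2, 0.3, 0.4]},
)
print(resp.json())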
Conclusion
You have successfully set up a GPU-enabled Ubuntu server, trained a simple PyTorch model on dummy data, and created a Flask API to serve predictions. This foundational framework can be adapted to more complex architectures or real-world datasets by integrating additional data pipelines and container orchestration tools.