Natural Language Processing (NLP) has become a key technology for helping computers understand and work with human language. It powers applications such as chatbots, sentiment analysis, machine translation, and text summarization. While building a good NLP model is important, deploying it properly so it serves real users is just as crucial. This means ensuring the NLP system is easy to access, can handle many concurrent users, and runs efficiently.
In this article, we will deploy an NLP model as a web application on an Ubuntu GPU server using PyTorch and Flask.
Prerequisites
Before starting, ensure you have the following:
- An Ubuntu 24.04 Cloud GPU Server.
- CUDA Toolkit 11.8 and cuDNN 8.6 installed. Verify that the NVIDIA driver can see the GPU with `nvidia-smi`.
- curl installed: `sudo apt install curl`
- Root or sudo privileges.
Step 1: Setting up the Environment
1. Install the Python virtual environment and pip packages.
sudo apt install python3-venv python3-pip
2. Create a Python virtual environment.
python3 -m venv pytorch-env
3. Activate the virtual environment.
source pytorch-env/bin/activate
Step 2: Install PyTorch with GPU Support
1. Install PyTorch with GPU support.
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
Note: This installation can take a significant amount of time. Once it completes, verify that CUDA is working by running python3 -c "import torch; print(torch.cuda.is_available())". If it prints True, CUDA is correctly configured.
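For a more detailed check, you can run a short script that reports the installed version and the detected GPU (a minimal sketch; the device name printed will depend on your hardware):
import torch

# Report the PyTorch version and whether CUDA is usable
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# If a GPU is visible, print its name and the CUDA version PyTorch was built with
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"CUDA version: {torch.version.cuda}")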
2. Next, install Flask to create the web application:
pip install Flask
3. Install the transformers library if you haven’t already:
pip install transformers
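You can confirm both libraries installed correctly by printing their versions (this uses importlib.metadata, which works regardless of the Flask version):
python3 -c "from importlib.metadata import version; print(version('flask'), version('transformers'))"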
Step 3: Building the NLP Model
We’ll use a pre-trained BERT model from the Hugging Face transformers library for text classification.
Note: BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model that has achieved state-of-the-art results on many NLP tasks. Keep in mind that `bert-base-uncased` ships without a fine-tuned classification head: the head used by `BertForSequenceClassification` is newly initialized, so the predicted class is essentially arbitrary until the model is fine-tuned on labeled data or replaced with an already fine-tuned checkpoint.
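If you need meaningful predictions right away, one option is to load a checkpoint that has already been fine-tuned. The sketch below uses the publicly available sentiment model distilbert-base-uncased-finetuned-sst-2-english as an illustrative example (any fine-tuned sequence-classification checkpoint would work) and maps the predicted index back to a label via model.config.id2label:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# An example checkpoint fine-tuned for binary sentiment classification
checkpoint = 'distilbert-base-uncased-finetuned-sst-2-english'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

inputs = tokenizer("I really enjoyed this movie!", return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits

# id2label maps the class index to a human-readable label (POSITIVE/NEGATIVE here)
predicted_class = torch.argmax(logits, dim=1).item()
print(model.config.id2label[predicted_class])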
Create a Python script to load the pre-trained model and tokenizer, and perform inference on sample text.
nano load_module.py
Add the following code.
from transformers import BertTokenizer, BertForSequenceClassification
import torch

# Load pre-trained model and tokenizer
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

# Move the model to the GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.eval()

# Example input text
input_text = "This is a sample text for classification."

# Tokenize input text and move the tensors to the model's device
inputs = tokenizer(input_text, return_tensors='pt', truncation=True, padding=True)
inputs = {key: value.to(device) for key, value in inputs.items()}

# Perform inference
with torch.no_grad():
    outputs = model(**inputs)

logits = outputs.logits
predicted_class = torch.argmax(logits, dim=1).item()
print(f"Predicted class: {predicted_class}")
Run the script to test the model:
python3 load_module.py
Output.
Predicted class: 0
This confirms that the model loads and runs end to end. (As noted above, the specific class value is arbitrary until the model is fine-tuned, so your output may differ.)
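If you also want a confidence score rather than just the class index, you can convert the logits to probabilities with a softmax. A minimal sketch, appended to the end of load_module.py (it reuses the logits tensor computed above):
# Convert the raw logits into a probability distribution over the classes
probabilities = torch.softmax(logits, dim=1)
print(f"Class probabilities: {probabilities.squeeze().tolist()}")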
Step 4: Creating the Flask Web App
Create a file named app.py.
nano app.py
Add the following code.
from flask import Flask, request, jsonify
from transformers import BertTokenizer, BertForSequenceClassification
import torch

app = Flask(__name__)

# Load pre-trained model and tokenizer once at startup
model_name = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(model_name)
model = BertForSequenceClassification.from_pretrained(model_name)

# Move the model to the GPU if one is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.eval()

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json(silent=True) or {}
    input_text = data.get('text', '')
    if not input_text:
        return jsonify({'error': 'No text provided'}), 400

    # Tokenize input text and move the tensors to the model's device
    inputs = tokenizer(input_text, return_tensors='pt', truncation=True, padding=True)
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Perform inference
    with torch.no_grad():
        outputs = model(**inputs)

    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=1).item()
    return jsonify({'predicted_class': predicted_class})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
Save the file and run the Flask app using the following command:
python3 app.py &
The Flask server will start on http://0.0.0.0:5000.
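Note that Flask's built-in server is intended for development and testing. For production traffic, one common approach (shown here as a sketch, not a definitive configuration) is to serve the app with a WSGI server such as Gunicorn:
pip install gunicorn
gunicorn -w 1 -b 0.0.0.0:5000 app:app
A single worker (-w 1) keeps only one copy of the model in GPU memory; increase the worker count only if your GPU has room for multiple model instances.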
Step 5: Testing the Deployed Model
You can test the deployed model by sending a POST request to the /predict endpoint. Here’s how you can do it using curl.
Run the following command in your terminal:
curl -X POST http://localhost:5000/predict -H "Content-Type: application/json" -d '{"text": "This is a sample text for classification."}'
If everything is set up correctly, you should receive a JSON response like this:
{"predicted_class":0}
The predicted_class value depends on the model's output. As noted in Step 3, with the base bert-base-uncased checkpoint this value is arbitrary; a fine-tuned model would return a meaningful class.
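If you prefer to test from Python instead of curl, the requests library (a separate dependency, installed with pip install requests) can send the same request:
import requests

# Send the same JSON payload to the /predict endpoint
response = requests.post(
    'http://localhost:5000/predict',
    json={'text': 'This is a sample text for classification.'},
)
print(response.json())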
Conclusion
In this article, we walked through deploying an NLP model as a web application on an Ubuntu GPU server using PyTorch and Flask. We also demonstrated how to test the deployed model by sending a POST request. By leveraging the power of GPUs, you can significantly speed up the inference time of your NLP models, making them suitable for real-time applications.