Table of Contents
- Prerequisites
- Step 1: Install Required Packages
- Step 2: Create a Python Virtual Environment
- Step 3: Install PyTorch with CUDA Support
- Step 4: Create a Basic QA Pipeline
- Step 5: Fine-Tune the Model
- Step 6: Update the qa_pipeline.py Script to Use the Fine-Tuned Model
- Step 7: Deploy the Fine-Tuned Model with FastAPI
- Step 8: Test the FastAPI Application
- Conclusion
Question Answering (QA) systems transform how users interact with data, allowing them to query information in natural language and get concise, direct answers. Building a QA system has become more straightforward with the wide availability of advanced NLP models and the growing popularity of frameworks like Hugging Face Transformers and OpenAI GPT.
This guide will show you how to set up and build a QA system on an Ubuntu GPU server.
Prerequisites
Before starting, ensure you have the following:
- An Ubuntu 24.04 Cloud GPU Server.
- CUDA Toolkit and cuDNN Installed.
- Root or sudo privileges.
Step 1: Install Required Packages
First, update all system packages to their latest versions.
apt update
apt upgrade
Next, install Python along with the virtual environment package.
apt install python3-full python3-virtualenv -y
Step 2: Create a Python Virtual Environment
A virtual environment isolates your project’s dependencies, ensuring that installed packages don’t interfere with other projects or system-wide packages.
Let’s create a new virtual environment for your project.
python3 -m venv venv
Activate your virtual environment.
source venv/bin/activate
Step 3: Install PyTorch with CUDA Support
PyTorch is the deep learning framework we will use to run and fine-tune the model. Installing it with CUDA support lets computations run on the GPU, which significantly accelerates both training and inference.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
Next, install additional libraries for building and deploying the QA system.
pip install transformers datasets fastapi uvicorn
- transformers: Provides access to pre-trained models and tools from Hugging Face.
- datasets: Provides ready-to-use datasets (such as SQuAD) and data-processing utilities.
- fastapi: A modern web framework for building APIs with Python.
- uvicorn: A lightning-fast ASGI server for serving FastAPI applications.
Connect to the Python shell and verify that PyTorch is correctly installed.
python3
>>> import torch
>>> print(torch.__version__)
Output:
2.6.0+cu118
Confirm that CUDA is available for GPU acceleration.
>>> print(torch.cuda.is_available())
Output:
True
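Optionally, confirm which GPU PyTorch detected. The printed name depends on your hardware.
>>> print(torch.cuda.get_device_name(0))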
Press CTRL+D to exit the Python shell.
Step 4: Create a Basic QA Pipeline
In this step, we’ll set up a basic question-answering pipeline using a pre-trained model to understand the foundational workings before any customization.
Create a Python script that utilizes a pre-trained model to answer questions based on a provided context.
nano qa_pipeline.py
Add the following code:
from transformers import pipeline

def main():
    # Initialize QA pipeline with a DistilBERT model fine-tuned on SQuAD
    qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

    context = (
        "Ubuntu is a Linux distribution based on Debian and composed mostly of free and open-source software. "
        "Ubuntu is officially released in three editions: Desktop, Server, and Core for IoT devices and robots."
    )
    question = "What is Ubuntu based on?"

    result = qa_pipeline({"context": context, "question": question})
    print("Answer:", result["answer"])
    print("Score:", result["score"])

if __name__ == "__main__":
    main()
This script uses the Hugging Face pipeline to load a pre-trained QA model and processes a sample context and question.
Now, run the above script.
python3 qa_pipeline.py
This will display the model’s answer and its confidence score.
Answer: Debian
Score: 0.9923
Explanation:
- Answer: The model identifies “Debian” as the answer to the question.
- Score: A confidence score (ranging from 0 to 1) indicating the model’s certainty. A score of 0.9923 signifies high confidence.
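If you want to inspect alternative answers, the pipeline also accepts a top_k argument and then returns a list of candidate spans rather than a single result. A minimal sketch, assuming qa_pipeline, question, and context are already defined as in the script above:

# Ask the pipeline for the three highest-scoring spans instead of one
results = qa_pipeline(question=question, context=context, top_k=3)
for r in results:
    print(r["answer"], round(r["score"], 4))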
Step 5: Fine-Tune the Model
Fine-tuning tailors the pre-trained model to better suit our specific dataset, enhancing its performance on domain-specific questions.
Create a script to fine-tune the model using the SQuAD dataset, a widely used benchmark for QA tasks.
nano fine_tuning.py
Add the following code.
# fine_tuning.py
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForQuestionAnswering, TrainingArguments, Trainer

def prepare_train_features(examples):
    # 1. Tokenize question + context
    tokenized_examples = tokenizer(
        examples["question"],
        examples["context"],
        truncation=True,
        max_length=384,
        padding="max_length",
        return_offsets_mapping=True
    )

    # We'll create start_positions and end_positions for each example
    start_positions = []
    end_positions = []

    # Loop over each example in the batch
    for i, offsets in enumerate(tokenized_examples["offset_mapping"]):
        # Each 'examples["answers"]' is a list of dicts for batched data
        answer = examples["answers"][i]

        # We take the first answer (SQuAD usually has one per example)
        answer_start_char = answer["answer_start"][0]
        answer_text = answer["text"][0]
        answer_end_char = answer_start_char + len(answer_text)

        # Initialize
        start_token_idx = None
        end_token_idx = None

        # Find start/end token indices
        for idx, (start, end) in enumerate(offsets):
            if start <= answer_start_char < end:
                start_token_idx = idx
            if start < answer_end_char <= end:
                end_token_idx = idx
                break

        # Fallback in case we don't find a matching token
        if start_token_idx is None:
            start_token_idx = 0
        if end_token_idx is None:
            end_token_idx = len(offsets) - 1

        start_positions.append(start_token_idx)
        end_positions.append(end_token_idx)

    tokenized_examples["start_positions"] = start_positions
    tokenized_examples["end_positions"] = end_positions

    # Remove offset mapping to save memory
    tokenized_examples.pop("offset_mapping")
    return tokenized_examples

def main():
    # 1. Load SQuAD
    squad = load_dataset("squad")

    # 2. Set up model/tokenizer
    model_name = "distilbert-base-uncased"
    global tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForQuestionAnswering.from_pretrained(model_name)

    # 3. Preprocess train/validation
    squad_encoded = squad.map(prepare_train_features, batched=True)

    # 4. Define training arguments
    training_args = TrainingArguments(
        output_dir="./qa_model",
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        num_train_epochs=2,       # adjust as needed
        eval_strategy="epoch",    # newer name for the deprecated `evaluation_strategy`
        save_steps=1000,
        save_total_limit=1,
    )

    # 5. Initialize Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=squad_encoded["train"],
        eval_dataset=squad_encoded["validation"],
    )

    # 6. Train
    trainer.train()

    # 7. Save final model
    trainer.save_model("./qa_model")
    print("Fine-tuning complete. Model saved to ./qa_model")

if __name__ == "__main__":
    main()
This script loads the SQuAD dataset, preprocesses the data, and fine-tunes the model.
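Fine-tuning on the full SQuAD training set can take a while depending on your GPU. For a quick end-to-end check, you can first train on a smaller slice by adding two lines right after load_dataset inside main() (a sketch; the slice sizes are arbitrary):

    # Optional: use a small slice to validate the setup before a full run
    squad["train"] = squad["train"].select(range(5000))
    squad["validation"] = squad["validation"].select(range(500))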
Now, run the script to commence the fine-tuning process.
python3 fine_tuning.py
During training, you’ll observe outputs indicating the model’s progress, such as:
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 87599/87599 [00:24<00:00, 3514.81 examples/s]
Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10570/10570 [00:03<00:00, 3419.28 examples/s]
{'loss': 3.2628, 'grad_norm': 23.89510726928711, 'learning_rate': 4.8858447488584476e-05, 'epoch': 0.05}
{'loss': 2.3261, 'grad_norm': 13.938529968261719, 'learning_rate': 4.7716894977168955e-05, 'epoch': 0.09}
{'loss': 2.0285, 'grad_norm': 20.18623161315918, 'learning_rate': 4.657534246575342e-05, 'epoch': 0.14}
{'loss': 1.8877, 'grad_norm': 17.599531173706055, 'learning_rate': 4.54337899543379e-05, 'epoch': 0.18}
{'loss': 1.8392, 'grad_norm': 20.058273315429688, 'learning_rate': 4.4292237442922375e-05, 'epoch': 0.23}
{'loss': 1.7229, 'grad_norm': 20.78485870361328, 'learning_rate': 4.3150684931506855e-05, 'epoch': 0.27}
{'loss': 1.6989, 'grad_norm': 29.019317626953125, 'learning_rate': 4.200913242009132e-05, 'epoch': 0.32}
{'loss': 1.6686, 'grad_norm': 20.27025604248047, 'learning_rate': 4.08675799086758e-05, 'epoch': 0.37}
{'loss': 1.6013, 'grad_norm': 17.87761878967285, 'learning_rate': 3.9726027397260274e-05, 'epoch': 0.41}
{'loss': 1.6066, 'grad_norm': 25.986677169799805, 'learning_rate': 3.8584474885844754e-05, 'epoch': 0.46}
{'loss': 1.5666, 'grad_norm': 21.410463333129883, 'learning_rate': 3.744292237442922e-05, 'epoch': 0.5}
{'loss': 1.6172, 'grad_norm': 22.285802841186523, 'learning_rate': 3.63013698630137e-05, 'epoch': 0.55}
{'loss': 1.5496, 'grad_norm': 22.68842315673828, 'learning_rate': 3.5159817351598174e-05, 'epoch': 0.59}
Fine-tuning customizes a pre-trained model to perform better on a specific task by training it on a relevant dataset. In this case, fine-tuning the DistilBERT model on the SQuAD dataset enhances its ability to answer questions accurately within given contexts.
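If training runs out of GPU memory, a common adjustment is to shrink the per-device batch size and enable mixed precision. A sketch of alternative TrainingArguments; the exact values depend on your GPU:

    training_args = TrainingArguments(
        output_dir="./qa_model",
        per_device_train_batch_size=4,   # smaller batches fit in less GPU memory
        gradient_accumulation_steps=2,   # keeps the effective batch size at 8
        fp16=True,                       # mixed precision on CUDA GPUs
        num_train_epochs=2,
        eval_strategy="epoch",
        save_steps=1000,
        save_total_limit=1,
    )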
Step 6: Update the qa_pipeline.py Script to Use the Fine-Tuned Model
Open the existing qa_pipeline.py script:
nano qa_pipeline.py
Update the script with the following code:
from transformers import pipeline, AutoModelForQuestionAnswering, AutoTokenizer
model_path = "./qa_model"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForQuestionAnswering.from_pretrained(model_path)
qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer)
context = "Hugging Face is based in New York and Paris."
question = "Where is Hugging Face based?"
result = qa_pipeline({"context": context, "question": question})
print(result)
# e.g., {'score': 0.95, 'start': 17, 'end': 25, 'answer': 'New York'}
Save and close the file.
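Run the updated script with python3 qa_pipeline.py to confirm the fine-tuned model loads from ./qa_model. If the pipeline does not place the model on the GPU automatically, you can pin it explicitly with the device argument (device=0 refers to the first CUDA device):

qa_pipeline = pipeline("question-answering", model=model, tokenizer=tokenizer, device=0)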
Step 7: Deploy the Fine-Tuned Model with FastAPI
After fine-tuning our model, the next step is to deploy it so that it can serve real-time predictions. FastAPI is an excellent choice for this purpose due to its high performance and ease of use.
Create a Python script named app.py to set up our FastAPI application.
nano app.py
Add the following code.
# app.py
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Load a QA model (pretrained or your fine-tuned one)
qa_pipeline = pipeline("question-answering", model="distilbert-base-uncased-distilled-squad")

class QAPayload(BaseModel):
    context: str
    question: str

@app.post("/qa")
def get_answer(payload: QAPayload):
    result = qa_pipeline({"context": payload.context, "question": payload.question})
    return {"answer": result["answer"], "score": result["score"]}
This script initializes the FastAPI app and sets up an endpoint for our QA model.
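To serve the model fine-tuned in Step 5 instead of the stock checkpoint, point the pipeline at the saved directory. A minimal variation, assuming ./qa_model sits next to app.py:

# Serve the fine-tuned checkpoint saved by fine_tuning.py
qa_pipeline = pipeline("question-answering", model="./qa_model", tokenizer="./qa_model")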
Start the FastAPI server using Uvicorn.
uvicorn app:app --host 0.0.0.0 --port 8000
Output:
Device set to use cuda:0
INFO: Started server process [14449]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
The above command starts the FastAPI application, making it accessible at http://0.0.0.0:8000.
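During development, you can optionally add Uvicorn's --reload flag so the server restarts automatically when app.py changes; leave it off in production.

uvicorn app:app --host 0.0.0.0 --port 8000 --reload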
Step 8: Test the FastAPI Application
Testing is crucial to ensure that our API functions as expected. We’ll use the curl utility for manual testing.
Open another terminal and send a POST request to our /qa endpoint using curl.
curl -X POST "http://localhost:8000/qa" -H "Content-Type: application/json" -d '{ "context": "Ubuntu is a Linux distribution based on Debian", "question": "What is Ubuntu based on?" }'
Output:
{
"answer": "Debian",
"score": 0.9965217709541321
}
The model identifies “Debian” as the answer to the question. The confidence score (e.g., 0.9965) indicates the model’s certainty regarding its answer. A score close to 1.0 signifies high confidence.
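You can also exercise the endpoint from Python rather than curl. A small sketch using the requests library (install it with pip install requests if it is not already available):

import requests

payload = {
    "context": "Ubuntu is a Linux distribution based on Debian",
    "question": "What is Ubuntu based on?",
}
response = requests.post("http://localhost:8000/qa", json=payload, timeout=30)
print(response.json())  # e.g., {"answer": "Debian", "score": ...}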
Conclusion
In this guide, we’ve walked through building a question-answering (QA) system on an Atlantic.Net Cloud GPU server, utilizing FastAPI, Hugging Face Transformers, and PyTorch. We began by setting up the necessary environment, installing essential packages, and verifying the installation to ensure our system was ready for development.