BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art model for natural language processing (NLP) tasks, including question answering. Fine-tuning BERT on a custom dataset can significantly improve its performance for specific use cases.

In this guide, we’ll walk through the process of fine-tuning BERT for question answering using TensorFlow on an Ubuntu GPU server.

Prerequisites

Before proceeding, ensure you have the following:

  • An Atlantic.Net Cloud GPU server running Ubuntu 24.04, equipped with an NVIDIA A100 GPU with at least 10 GB of GPU RAM.
  • The CUDA Toolkit and cuDNN installed.
  • Root access or a user with sudo privileges.

Step 1: Set Up a Python Virtual Environment

1. Update the system and install essential packages.

apt update -y
apt install python3-pip python3-venv -y

2. Create and activate a virtual environment:

python3 -m venv qa-env
source qa-env/bin/activate

3. Install the required libraries:

pip install tensorflow transformers datasets tf-keras
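
Optionally, verify that TensorFlow can see the GPU before continuing. The short check below is not part of the tutorial script; it simply lists the visible devices (an empty list means CUDA/cuDNN is not set up correctly):

# Optional sanity check (run in a Python shell or a throwaway script)
import tensorflow as tf

# List the GPUs visible to TensorFlow
print(tf.config.list_physical_devices("GPU"))
print("TensorFlow version:", tf.__version__)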

Step 2: Load the Pre-trained BERT Model and Tokenizer

We’ll fine-tune the pre-trained bert-base-uncased model for question answering. The BertTokenizerFast class is used for tokenization because it supports the return_offsets_mapping feature required to align answers with token positions.

nano fine_tune_bert_qa.py

Add the following code:

# fine_tune_bert_qa.py

from transformers import BertTokenizerFast, TFBertForQuestionAnswering, create_optimizer
from datasets import load_dataset
import tensorflow as tf
from tensorflow.keras.mixed_precision import set_global_policy

# Enable mixed precision for better GPU performance
set_global_policy('mixed_float16')

# Load pre-trained BERT model and fast tokenizer
model_name = "bert-base-uncased"
tokenizer = BertTokenizerFast.from_pretrained(model_name)
model = TFBertForQuestionAnswering.from_pretrained(model_name)

This code loads the necessary libraries, sets up mixed precision for efficient GPU usage, and loads the pre-trained BERT model and tokenizer from Hugging Face’s transformers library.
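
To see why the fast tokenizer matters, the short snippet below (illustrative only, not part of fine_tune_bert_qa.py) tokenizes a single question/context pair and prints the character offsets each token maps back to; the preprocessing in the next step relies on exactly this mapping:

# Illustration only: inspect the fast tokenizer's offset mapping
from transformers import BertTokenizerFast

tok = BertTokenizerFast.from_pretrained("bert-base-uncased")
sample = tok(
    "What is BERT?",
    "BERT is a Transformer-based language model.",
    return_offsets_mapping=True,
)

# Each token is paired with the (start, end) character span it came from;
# special tokens such as [CLS] and [SEP] get the placeholder span (0, 0).
for token_id, offset in zip(sample["input_ids"], sample["offset_mapping"]):
    print(tok.decode([token_id]), offset)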

Step 3: Load and Preprocess the Dataset

We’ll use the SQuAD dataset, a popular benchmark for question answering. Each record contains a question, a context passage, and one or more answers together with their character start positions in the context.
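
For reference, you can peek at the raw structure of a SQuAD record before writing the preprocessing code. This is an optional, illustrative check and not part of the tutorial script:

# Illustration only: inspect a single SQuAD record
from datasets import load_dataset

sample = load_dataset("squad", split="train[:1]")[0]
print(sample["question"])        # the question text
print(sample["context"][:80])    # the passage the answer is drawn from
print(sample["answers"])         # {'text': [...], 'answer_start': [...]}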

Add this to fine_tune_bert_qa.py:

# Load the SQuAD dataset
dataset = load_dataset("squad")

# Preprocess the dataset
def preprocess_function(examples):
    questions = [q.strip() for q in examples["question"]]
    inputs = tokenizer(
        questions,
        examples["context"],
        max_length=128,  # Reduce sequence length
        truncation=True,
        padding="max_length",
        return_offsets_mapping=True
    )
    offset_mapping = inputs.pop("offset_mapping")
    
    processed_inputs = {
        "input_ids": [],
        "attention_mask": [],
        "start_positions": [],
        "end_positions": []
    }

    for i, offsets in enumerate(offset_mapping):
        answer = examples["answers"][i]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])

        # Restrict the search to context tokens (sequence id 1) so that question
        # tokens with overlapping character offsets are never matched by mistake
        sequence_ids = inputs.sequence_ids(i)
        start_index = next((idx for idx, offset in enumerate(offsets)
                            if sequence_ids[idx] == 1 and offset[0] <= start_char < offset[1]), None)
        end_index = next((idx for idx, offset in enumerate(offsets)
                          if sequence_ids[idx] == 1 and offset[0] < end_char <= offset[1]), None)

        if start_index is not None and end_index is not None:
            processed_inputs["input_ids"].append(inputs["input_ids"][i])
            processed_inputs["attention_mask"].append(inputs["attention_mask"][i])
            processed_inputs["start_positions"].append(start_index)
            processed_inputs["end_positions"].append(end_index)
        # Examples whose answer span could not be mapped (e.g. truncated away) are skipped

    return processed_inputs

# Apply preprocessing; examples without a valid answer span are dropped inside preprocess_function
tokenized_datasets = dataset.map(preprocess_function, batched=True, remove_columns=dataset["train"].column_names)

This function preprocesses the data by tokenizing the questions and contexts and mapping each answer’s character-level start and end positions to token indices. Examples for which no valid token span can be found (for instance, when the answer falls beyond the truncated context) are dropped.
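
If you want to confirm the preprocessing worked before training, an optional check such as the one below (temporary lines, not part of the final script) prints the resulting dataset splits and one set of label positions:

# Optional check: inspect the preprocessed dataset (remove after verifying)
print(tokenized_datasets)  # shows the number of rows per split after preprocessing
print("First start/end positions:",
      tokenized_datasets["train"][0]["start_positions"],
      tokenized_datasets["train"][0]["end_positions"])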

Step 4: Prepare TensorFlow Datasets

We’ll convert the tokenized dataset into TensorFlow datasets for training and validation.

Add this to fine_tune_bert_qa.py:

# Function to convert processed data into TensorFlow dataset format
def create_tf_dataset(tokenized_data):
    input_ids = tf.constant(tokenized_data["input_ids"])
    attention_mask = tf.constant(tokenized_data["attention_mask"])
    start_positions = tf.constant(tokenized_data["start_positions"])
    end_positions = tf.constant(tokenized_data["end_positions"])

    return tf.data.Dataset.from_tensor_slices((
        {"input_ids": input_ids, "attention_mask": attention_mask},
        {"start_positions": start_positions, "end_positions": end_positions}
    )).shuffle(10000).batch(2)  # Adjust batch size as needed

# Create TensorFlow datasets for training and validation
train_dataset = create_tf_dataset(tokenized_datasets["train"])
validation_dataset = create_tf_dataset(tokenized_datasets["validation"])

This code converts the preprocessed data into TensorFlow datasets, which are suitable for training by batching and shuffling the data.
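
As a quick optional sanity check (temporary lines, not part of the final script), you can pull one batch from the training dataset and print its tensor shapes; with max_length=128 and a batch size of 2, the input tensors should have shape (2, 128):

# Optional check: inspect a single batch from the TensorFlow dataset
for features, labels in train_dataset.take(1):
    print("input_ids:", features["input_ids"].shape)            # (2, 128)
    print("attention_mask:", features["attention_mask"].shape)  # (2, 128)
    print("start_positions:", labels["start_positions"].shape)  # (2,)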

Step 5: Compile the Model

The TFBertForQuestionAnswering model needs a loss for both the start and end positions. Because the start and end labels are included in the dataset, the model computes this loss internally (a sparse categorical cross-entropy over the start and end logits), so compile() only needs to be given an optimizer.

Add this to fine_tune_bert_qa.py:

# Create the optimizer using Hugging Face's utility
num_train_steps = len(train_dataset) * 1  # batches per epoch x number of epochs (1 here; increase for longer training)
optimizer, _ = create_optimizer(
    init_lr=5e-5,
    num_train_steps=num_train_steps,
    num_warmup_steps=0,
    weight_decay_rate=0.01,
)

# Compile the model with the optimizer
model.compile(optimizer=optimizer)

This sets up the optimizer with training parameters and compiles the model, making it ready for training.
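
For reference, the standalone snippet below illustrates that internal loss with dummy tensors; it is a sketch of the concept, not the model’s actual implementation:

# Sketch of the QA loss using dummy tensors (illustrative only)
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

# Fake logits for a batch of 2 examples over a 128-token sequence
start_logits = tf.random.normal((2, 128))
end_logits = tf.random.normal((2, 128))
start_positions = tf.constant([5, 17])  # true start token indices
end_positions = tf.constant([9, 20])    # true end token indices

start_loss = loss_fn(start_positions, start_logits)
end_loss = loss_fn(end_positions, end_logits)
print(float((start_loss + end_loss) / 2.0))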

Step 6: Fine-Tune the Model

Now, we’ll fine-tune the model on the SQuAD dataset.

Add this to fine_tune_bert_qa.py:

# Fine-tune the model
model.fit(train_dataset, validation_data=validation_dataset, epochs=1)

# Save the fine-tuned model
model.save_pretrained("fine_tuned_bert_qa_model")
tokenizer.save_pretrained("fine_tuned_bert_qa_model")

print("Fine-tuning complete! Model saved to 'fine_tuned_bert_qa_model'.")

This trains the model on the training dataset while evaluating on the validation set. After training, it saves the model and tokenizer for later use.

Step 7: Run the Script

Run the script using the following command to train and save the model.

python3 fine_tune_bert_qa.py

Step 8: Monitor GPU Usage

While the script is running, you can monitor GPU usage to ensure that the fine-tuning process is utilizing the GPU effectively. Use the following command in a separate terminal:

watch -n 1 nvidia-smi

This command will display GPU usage statistics every second.
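
If you prefer to check memory use from inside Python instead, TensorFlow exposes an experimental helper. The snippet below is an optional, illustrative check; the API lives under tf.config.experimental and may change between TensorFlow releases:

# Optional: query GPU memory use from within TensorFlow (experimental API)
import tensorflow as tf

info = tf.config.experimental.get_memory_info("GPU:0")
print("Current GPU memory (MB):", info["current"] / 1024 ** 2)
print("Peak GPU memory (MB):", info["peak"] / 1024 ** 2)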

Step 9: Use the Fine-Tuned Model for Inference

After fine-tuning, you can load the fine-tuned model and use it for inference.

nano run_model.py

Add the following code:

import tensorflow as tf  # Import TensorFlow
from transformers import BertTokenizerFast, TFBertForQuestionAnswering

# Load the fine-tuned model and tokenizer
model = TFBertForQuestionAnswering.from_pretrained("fine_tuned_bert_qa_model")
tokenizer = BertTokenizerFast.from_pretrained("fine_tuned_bert_qa_model")

# Use the model for inference
question = "What is the capital of France?"
context = "France is a country in Europe. The capital of France is Paris."

# Tokenize the input
inputs = tokenizer(question, context, return_tensors="tf")

# Get model predictions
outputs = model(inputs)

# Find the start and end positions of the answer
start_index = tf.argmax(outputs.start_logits, axis=1).numpy()[0]
end_index = tf.argmax(outputs.end_logits, axis=1).numpy()[0]

# Extract the answer tokens
answer_tokens = inputs["input_ids"][0][start_index:end_index+1]

# Decode the answer tokens to text
answer = tokenizer.decode(answer_tokens, skip_special_tokens=True)

print(f"Answer: {answer}")

This script loads the model and tokenizer, processes an input question and context, and predicts the answer using the model.

Run the code:

python3 run_model.py

You should see output similar to the following:

Answer: paris

This output confirms the model’s ability to predict correct answers from the context provided.
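
As an alternative to working with the logits directly, the transformers pipeline API can wrap the same fine-tuned model and handle tokenization, span selection, and decoding for you. A brief sketch:

# Optional alternative: run inference through the question-answering pipeline
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="fine_tuned_bert_qa_model",
    tokenizer="fine_tuned_bert_qa_model",
    framework="tf",
)

result = qa(
    question="What is the capital of France?",
    context="France is a country in Europe. The capital of France is Paris.",
)
print(result["answer"], result["score"])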

Conclusion

In this guide, we’ve walked through the process of fine-tuning BERT for question answering using TensorFlow on an Ubuntu GPU server. You can now fine-tune BERT on your custom dataset and deploy it for production use.