Semantic search is a powerful technique that improves information retrieval by understanding the meaning behind queries and documents. With the help of Deepset.ai’s Haystack framework, you can build and deploy a scalable semantic search application on an Ubuntu GPU server.

This article walks you through deploying a scalable semantic search application with the Haystack framework on an Ubuntu GPU server.
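At its core, semantic search represents queries and documents as embedding vectors and compares them by cosine similarity, so texts with similar meaning score highly even when they share no keywords. A minimal sketch with made-up 3-dimensional vectors (real models emit hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings -- the numbers are invented for illustration
vec_query = [0.9, 0.1, 0.0]  # "How do I find documents by meaning?"
vec_doc_a = [0.8, 0.2, 0.1]  # "Semantic search retrieves documents by meaning."
vec_doc_b = [0.0, 0.1, 0.9]  # "The weather is sunny today."

print(cosine_similarity(vec_query, vec_doc_a))  # high, ~0.98
print(cosine_similarity(vec_query, vec_doc_b))  # low,  ~0.01
```

A real embedding model replaces the hand-written vectors, but the comparison step works exactly like this.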

Prerequisites

Before proceeding, ensure you have the following:

  • An Atlantic.Net Cloud GPU server running Ubuntu 24.04.
  • CUDA Toolkit and cuDNN installed.
  • Root or sudo privileges.

Step 1: Set Up the Environment

1. Install Python Dependencies:

apt update
apt install -y python3 python3-pip python3-venv

2. Create a virtual environment:

python3 -m venv haystack-env
source haystack-env/bin/activate

3. Install Haystack and other dependencies:

pip install "farm-haystack[elasticsearch,gpu]" sentence-transformers torch

4. Ensure PyTorch can detect the GPU:

python3

Run the following in the Python shell:

>>> import torch
>>> print(torch.cuda.is_available())

Output.

True

Press CTRL+D to exit the Python shell.

Step 2: Install Docker and NVIDIA Container Toolkit

Docker simplifies deployment, and the NVIDIA Container Toolkit allows Docker containers to use GPU resources.

1. Install Docker.

apt install -y docker.io

2. Start and enable the Docker service.

systemctl start docker
systemctl enable docker

3. Install NVIDIA Container Toolkit.

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
apt update && apt install -y nvidia-container-toolkit
nvidia-ctk runtime configure --runtime=docker
systemctl restart docker

4. Haystack supports several document stores, such as Elasticsearch, FAISS, and Weaviate. This tutorial uses Elasticsearch. Run the Elasticsearch Docker container:

docker run -d -p 9200:9200 -e "discovery.type=single-node" elasticsearch:7.9.2

5. Verify that Elasticsearch is responding:

curl -X GET "http://localhost:9200/"

Note: You may need to wait a short while for the container to start.
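If you script the deployment, a small standard-library polling helper (a sketch; the URL and timeout values are illustrative) can wait for the container instead of sleeping a fixed time:

```python
import time
import urllib.error
import urllib.request

def wait_for_http(url, timeout_s=60, interval_s=2):
    """Poll url until it returns HTTP 200, or give up after timeout_s seconds."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet; try again
        time.sleep(interval_s)
    return False

# Example: block until the local Elasticsearch container answers
# wait_for_http("http://localhost:9200/", timeout_s=120)
```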

Output.

{
  "name" : "e42a0e2ac752",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "Q76hITRvRu65Gzxo_gnYdg",
  "version" : {
    "number" : "7.9.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "d34da0ea4a966c4e49417f2da2f244e3e97b4e6e",
    "build_date" : "2020-09-23T00:45:33.626720Z",
    "build_snapshot" : false,
    "lucene_version" : "8.6.2",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

Step 3: Delete the Existing Index (if it exists)

Before creating a new index, delete any existing index to avoid conflicts.

nano semantic_app.py

Add the following code. The code shown in Steps 3 through 11 all goes into this one file:

from elasticsearch import Elasticsearch

# Connect to Elasticsearch
es = Elasticsearch(hosts=["http://localhost:9200"])
index_name = "document"  # Replace with your index name

# Delete the index if it exists
if es.indices.exists(index=index_name):
    es.indices.delete(index=index_name)
    print(f"Index '{index_name}' deleted successfully.")
else:
    print(f"Index '{index_name}' does not exist.")

Step 4: Initialize the ElasticsearchDocumentStore

The ElasticsearchDocumentStore is the Haystack component that stores and retrieves documents in Elasticsearch. Initialize it with an embedding dimension that matches your embedding model's output; the all-MiniLM-L6-v2 model used in Step 6 produces 384-dimensional vectors.

from haystack.document_stores import ElasticsearchDocumentStore

document_store = ElasticsearchDocumentStore(
    host="localhost",  # Replace with your Elasticsearch host
    username="",       # Replace with your Elasticsearch username (if applicable)
    password="",       # Replace with your Elasticsearch password (if applicable)
    index="document",  # Name of the index where documents will be stored
    embedding_dim=384  # Set embedding_dim to match the model's output
)

Step 5: Prepare and Index Sample Documents

Add sample documents to the document store.

your_documents = [
    {
        "content": "The quick brown fox jumps over the lazy dog.",
        "meta": {"title": "Example Document 1", "author": "John Doe"}
    },
    {
        "content": "Artificial intelligence is transforming the world.",
        "meta": {"title": "Example Document 2", "author": "Jane Smith"}
    },
    {
        "content": "Semantic search improves information retrieval.",
        "meta": {"title": "Example Document 3", "author": "Alice Johnson"}
    }
]

# Write documents to the document store
document_store.write_documents(your_documents)
print("Documents have been successfully indexed!")

Step 6: Initialize the Retriever

The retriever is responsible for fetching relevant documents based on the query.

from haystack.nodes import EmbeddingRetriever

retriever = EmbeddingRetriever(
    document_store=document_store,  # Pass the document_store here
    embedding_model="sentence-transformers/all-MiniLM-L6-v2"
)
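An aside on what the retriever does (plain Python, not part of semantic_app.py): it embeds the query with the same sentence-transformers model and ranks the stored documents by vector similarity. A sketch with made-up 3-dimensional vectors; on normalized embeddings, the dot product used here equals cosine similarity:

```python
def retrieve(query_vec, docs, top_k=2):
    """Rank documents by dot-product similarity to the query vector."""
    score = lambda doc: sum(q * x for q, x in zip(query_vec, doc["vec"]))
    return sorted(docs, key=score, reverse=True)[:top_k]

# Hypothetical vectors; real all-MiniLM-L6-v2 embeddings have 384 dimensions
docs = [
    {"content": "Semantic search improves retrieval.", "vec": [0.9, 0.1, 0.0]},
    {"content": "AI is transforming the world.",       "vec": [0.1, 0.9, 0.1]},
    {"content": "The quick brown fox.",                "vec": [0.0, 0.2, 0.9]},
]

for doc in retrieve([0.8, 0.2, 0.0], docs):
    print(doc["content"])
# → Semantic search improves retrieval.
# → AI is transforming the world.
```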

Step 7: Generate Embeddings for Documents

Generate embeddings for the indexed documents.

document_store.update_embeddings(retriever)
print("Document embeddings have been generated!")
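In spirit, update_embeddings runs every stored document through the embedding model and writes the resulting vector back to the store. A toy sketch (an aside, not part of semantic_app.py) with a stand-in embedding function:

```python
def update_embeddings(docs, embed):
    """Attach an embedding vector to every document in place."""
    for doc in docs:
        doc["vec"] = embed(doc["content"])

# Stand-in for a real model: counts a few keywords (illustration only)
def toy_embed(text):
    words = text.lower().split()
    return [words.count("search"), words.count("ai"), len(words)]

docs = [{"content": "Semantic search improves search quality"}]
update_embeddings(docs, toy_embed)
print(docs[0]["vec"])  # → [2, 0, 5]
```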

Step 8: Initialize the Reader

The reader extracts answer spans from the retrieved documents; deepset/roberta-base-squad2 is a RoBERTa model fine-tuned on SQuAD 2.0 for extractive question answering.

from haystack.nodes import FARMReader

reader = FARMReader(
    model_name_or_path="deepset/roberta-base-squad2",
    use_gpu=True  # Enable GPU acceleration
)

Step 9: Create the Pipeline

Combine the retriever and reader into a pipeline.

from haystack.pipelines import ExtractiveQAPipeline

pipeline = ExtractiveQAPipeline(reader, retriever)
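An illustrative aside (plain Python, not the Haystack API): the pipeline simply chains the two stages, with the retriever cheaply narrowing the corpus before the expensive reader runs on the few remaining candidates.

```python
def retrieve(query, corpus, top_k=2):
    """Stage 1: cheap filter -- keep documents sharing words with the query."""
    words = set(query.lower().split())
    score = lambda doc: len(words & set(doc.lower().split()))
    return sorted(corpus, key=score, reverse=True)[:top_k]

def read(query, documents):
    """Stage 2: expensive model -- faked here as 'return the best candidate'."""
    return documents[0] if documents else None

def run_pipeline(query, corpus):
    return read(query, retrieve(query, corpus))

corpus = [
    "semantic search improves information retrieval",
    "the quick brown fox jumps over the lazy dog",
]
print(run_pipeline("what improves information retrieval", corpus))
# → semantic search improves information retrieval
```

The real reader scores answer spans inside each document rather than returning whole documents, but the two-stage control flow is the same.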

Step 10: Query the Pipeline

Test the pipeline by running a query.

query = "What is semantic search?"
results = pipeline.run(query=query)

# Print the results
print("Search Results:")
for doc in results["documents"]:
    print(f"Content: {doc.content}")
    print(f"Meta: {doc.meta}")
    print("---")

Step 11: Test the Application

To ensure the application is working correctly, run multiple test queries and verify the results.

test_queries = [
    "Who wrote about the quick brown fox?",
    "How is AI transforming the world?",
    "What improves information retrieval?"
]

for query in test_queries:
    print(f"Query: {query}")
    results = pipeline.run(query=query)
    for doc in results["documents"]:
        print(f"Content: {doc.content}")
        print(f"Meta: {doc.meta}")
        print("---") 

Step 12: Run the Application

Run the Python script:

python3 semantic_app.py

Output:

Documents have been successfully indexed!
Batches: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.45it/s]
Updating embeddings: 10000 Docs [00:00, 15556.09 Docs/s]                                                                                                                
Document embeddings have been generated!
Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 148.67it/s]
Inferencing Samples: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  7.95 Batches/s]
Search Results:
Content: Semantic search improves information retrieval.
Meta: {'author': 'Alice Johnson', 'title': 'Example Document 3'}
---
Content: Artificial intelligence is transforming the world.
Meta: {'author': 'Jane Smith', 'title': 'Example Document 2'}
---
Content: The quick brown fox jumps over the lazy dog.
Meta: {'author': 'John Doe', 'title': 'Example Document 1'}
---
Query: Who wrote about the quick brown fox?
Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 218.53it/s]
Inferencing Samples: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 74.42 Batches/s]
Content: The quick brown fox jumps over the lazy dog.
Meta: {'author': 'John Doe', 'title': 'Example Document 1'}
---
Content: Semantic search improves information retrieval.
Meta: {'author': 'Alice Johnson', 'title': 'Example Document 3'}
---
Content: Artificial intelligence is transforming the world.
Meta: {'author': 'Jane Smith', 'title': 'Example Document 2'}
---
Query: How is AI transforming the world?
Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 256.44it/s]
Inferencing Samples: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 78.72 Batches/s]
Content: Artificial intelligence is transforming the world.
Meta: {'author': 'Jane Smith', 'title': 'Example Document 2'}
---
Content: Semantic search improves information retrieval.
Meta: {'author': 'Alice Johnson', 'title': 'Example Document 3'}
---
Content: The quick brown fox jumps over the lazy dog.
Meta: {'author': 'John Doe', 'title': 'Example Document 1'}
---
Query: What improves information retrieval?
Batches: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 263.43it/s]
Inferencing Samples: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 70.20 Batches/s]
Content: Semantic search improves information retrieval.
Meta: {'author': 'Alice Johnson', 'title': 'Example Document 3'}
---
Content: Artificial intelligence is transforming the world.
Meta: {'author': 'Jane Smith', 'title': 'Example Document 2'}
---
Content: The quick brown fox jumps over the lazy dog.
Meta: {'author': 'John Doe', 'title': 'Example Document 1'}

Conclusion

By following these steps, you can deploy a scalable semantic search application using Deepset.ai’s Haystack framework on an Ubuntu GPU server. This setup leverages the power of Elasticsearch, sentence-transformers, and GPU acceleration to deliver fast and accurate search results.