In the rapidly evolving world of AI and machine learning, generating videos from textual descriptions is no longer confined to the realm of science fiction. Thanks to advancements in generative models, it is now possible to create detailed, dynamic videos from just a few lines of text. One of the most exciting developments in this area is ModelScope’s text-to-video diffusion model, developed by Alibaba’s DAMO Academy and distributed through HuggingFace. This model leverages diffusion techniques to transform written narratives into captivating visual stories.

In this guide, we’ll show you how to use the Text2Video Diffusion Model to generate videos from text.

Prerequisites

Before proceeding, ensure you have the following:

  • An Atlantic.Net Cloud GPU server running Ubuntu 24.04, equipped with an NVIDIA A100 GPU with at least 20 GB of GPU RAM.
  • The CUDA Toolkit and cuDNN installed.
  • Root or sudo privileges.

Step 1: Install Python and Additional Dependencies

The default Python version in Ubuntu 24.04 (Python 3.12) is not compatible with the Stable Diffusion WebUI and its dependencies, which target Python 3.10. You’ll need to install Python 3.10 alongside the system version:

Add the deadsnakes PPA, which provides additional Python versions for Ubuntu.

add-apt-repository ppa:deadsnakes/ppa

Update the package index.

apt update -y

Install Python 3.10 and essential libraries.

apt install python3.10 python3.10-venv python3.10-dev
apt install build-essential libssl-dev libffi-dev zlib1g-dev libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm libncurses5-dev libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev liblzma-dev
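
Once the installation completes, you can confirm that Python 3.10 is available before continuing.

python3.10 --version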

Step 2: Install WebUI and ModelScope

Install Git LFS for managing large files.

apt install git-lfs -y

Initialize Git LFS so that large model files are fetched correctly when cloning the WebUI and ModelScope repositories.

git lfs install

Clone the WebUI repository.

git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git

Navigate to the new WebUI directory.

cd stable-diffusion-webui

Pin the WebUI to version 1.7.0 (commit cf2772f), which is compatible with the ModelScope extension.

git reset --hard cf2772f
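
Optionally, verify that the checkout is on the expected commit.

git log -1 --oneline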

Create a directory structure to set up ModelScope files.

mkdir -p models/ModelScope/t2v

Switch to the t2v directory.

cd models/ModelScope/t2v

Download the ModelScope text-to-video model from HuggingFace into the current directory (note the trailing dot in the command).

git clone https://huggingface.co/ali-vilab/modelscope-damo-text-to-video-synthesis .
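
The clone may take a while because the model weights are several gigabytes in size. Once it finishes, list the directory contents to confirm the download; you should see the model configuration along with multi-gigabyte weight files (the exact file names depend on the current state of the repository).

ls -lh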

Step 3: Configure Nginx as a Reverse Proxy to Access WebUI

By default, the WebUI is only accessible on localhost port 7860. To reach the WebUI from an external network, you must configure Nginx as a reverse proxy.

First, install the Nginx web server package.

apt install nginx

Next, create an Nginx virtual host configuration file.

nano /etc/nginx/conf.d/webui.conf

Add the following configuration, replacing webui.example.com with your own domain name. The Upgrade, Connection, and Sec-WebSocket-* headers are included so that the WebUI’s Gradio interface can use WebSocket connections through the proxy.

upstream webui {
    server 127.0.0.1:7860;
}

server {
    listen 80;
    listen [::]:80;
    server_name webui.example.com;

    proxy_set_header Host $host;
    proxy_http_version 1.1;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header Sec-WebSocket-Extensions $http_sec_websocket_extensions;
    proxy_set_header Sec-WebSocket-Key $http_sec_websocket_key;
    proxy_set_header Sec-WebSocket-Version $http_sec_websocket_version;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";

    location / {
        proxy_pass http://webui;
    }
}

Save and close the file. Then, verify the Nginx configuration.

nginx -t

Output:

nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

Finally, restart Nginx to apply the changes.

systemctl restart nginx
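
For the proxy to be reachable, the domain you used in server_name (webui.example.com in this example) must resolve to your server’s public IP address. If the UFW firewall is enabled on your server, also allow HTTP traffic through it.

ufw allow 'Nginx Full'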

Step 4: Configure Stable Diffusion WebUI

First, navigate to the stable-diffusion-webui directory.

cd ~/stable-diffusion-webui

Then, create a Python virtual environment and activate it.

python3.10 -m venv venv
source venv/bin/activate

Update pip to the latest version.

pip install --upgrade pip

Upgrade the tqdm module, which the WebUI uses to display progress while running models.

pip install --upgrade tqdm

Install FastAPI and other required libraries.

pip install fastapi
pip install clip

Step 5: Run the Stable Diffusion WebUI

At this point, your Stable Diffusion WebUI is configured. You can now launch WebUI, which will serve as the interface for video generation.

python3.10 launch.py

Wait for the application to fully initialize. On the first launch, launch.py installs the remaining Python dependencies and downloads the Stable Diffusion v1.5 checkpoint (about 4 GB), so this may take several minutes depending on your network and system performance. You will see output similar to the following once the WebUI is running.

100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.97G/3.97G [00:19<00:00, 216MB/s]
Calculating sha256 for /root/stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
Running on local URL:  http://127.0.0.1:7860
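
Note that launch.py runs in the foreground, so the WebUI stops when you close your terminal session. If you want it to keep running in the background, one simple approach (assuming the virtual environment created earlier) is to start it with nohup and send the output to a log file.

nohup venv/bin/python launch.py > webui.log 2>&1 &
tail -f webui.log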

Step 6: Access WebUI and Generate Videos using ModelScope

Open your web browser and access WebUI using the URL http://webui.example.com.

In the main WebUI navigation menu, click Extensions, open the Available tab, and click Load from: to load the list of available extensions.

In the search field, type the keyword modelscope to filter the list of extensions.

Click Install next to the text2video extension to install the ModelScope extension.

Next, search for the keyword “video in extras tab” and click Install next to the Video in Extras tab extension.

Once the extension is installed, click Reload UI at the bottom of the web page to reload WebUI.

Navigate to the txt2video tab within the WebUI interface.

Select ModelScope as the Model type and enter your desired text prompt in the Prompt field, for example, “children are studying in the classroom”. Then, click Generate and wait a few minutes for the ModelScope generation process to complete. The generated video appears in the output panel once it is ready.

You can change the text prompt and generate again to produce new videos with ModelScope as needed.

Conclusion

Congratulations! You have successfully installed and configured the ModelScope Text2Video diffusion model from HuggingFace on an Atlantic.Net Cloud GPU server. The process involved setting up a compatible Python environment, managing large model files with Git LFS, configuring Nginx for external access, and running a complex AI model behind a user-friendly web interface. You can now experiment with different prompts, explore the various settings the ModelScope extension offers, and continue to refine your approach to produce unique and impactful video content.