Generating music with transformers has become popular thanks to advances in deep learning and natural language processing (NLP). Transformers, originally designed for NLP tasks, can be adapted for music generation by treating musical sequences as a form of language.

In this article, we will generate music using transformers on an Ubuntu GPU server. We’ll cover setting up the environment, writing the code, and running the code to generate music.

Prerequisites

Before starting, ensure you have the following:

  • An Ubuntu 24.04 Cloud GPU Server.
  • CUDA Toolkit and cuDNN Installed.
  • Root or sudo privileges.

Step 1: Set Up the Environment

1. First, update the package index so the latest package information is available.

apt update -y

2. Add the deadsnakes PPA, which provides newer Python releases for Ubuntu.

add-apt-repository ppa:deadsnakes/ppa
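
If the add-apt-repository command is not found (common on minimal server images), it is provided by the software-properties-common package; install it first and rerun the command above:

apt install software-properties-common -y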

3. Update the package index again so packages from the new repository are visible.

apt update -y

4. Install Python 3.10 and essential libraries.

apt install python3.10 python3.10-venv python3.10-dev -y

5. Create a virtual environment to isolate the dependencies:

python3.10 -m venv musicgen_env
source musicgen_env/bin/activate

6. Install the required Python libraries:

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118
pip install transformers
pip install numpy scipy matplotlib
pip install midiutil

Explanation:

  • torch: PyTorch is the deep-learning framework we’ll use to load and run the transformer model on the GPU.
  • transformers: The Hugging Face Transformers library provides pre-trained models and tools for working with transformers.
  • numpy, scipy, matplotlib: These libraries are used for numerical computations and visualization.
  • midiutil: This library is used to create MIDI files from the generated music.
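
Before writing the generator itself, it’s worth confirming that PyTorch can actually see the GPU. A quick optional check, run inside the activated virtual environment:

python3 - <<'EOF'
import torch

# Report whether CUDA is usable and which device PyTorch will run on
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
EOF

If this prints False, revisit the CUDA Toolkit and cuDNN installation before continuing.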

Step 2: Writing the Code

1. Create a Python Script
Create a new Python script file named music_generator.py:

nano music_generator.py

2. Import Required Libraries

Add the following code:

# Import required libraries
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer
from midiutil import MIDIFile

3. Load Pre-trained Model and Tokenizer
We’ll use a pre-trained GPT-2 model from Hugging Face’s Transformers library. GPT-2 is a transformer-based model that can generate text, and we’ll adapt it for music generation.

# Load pre-trained GPT-2 model and tokenizer
def load_model():
    model_name = "gpt2"
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)

    # Set pad token ID explicitly
    tokenizer.pad_token = tokenizer.eos_token

    # Move model to GPU if available
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)

    return model, tokenizer, device
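
It can help to see what GPT-2’s tokenizer actually does with a prompt of MIDI note numbers. The exact token IDs depend on the GPT-2 vocabulary, but a quick, self-contained inspection (illustrative only) looks like this:

# Illustrative: how the GPT-2 tokenizer splits a note-number prompt
from transformers import GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
ids = tok.encode("60 62 64 65")
print(ids)                              # the integer token IDs GPT-2 receives
print(tok.convert_ids_to_tokens(ids))   # the text pieces behind those IDs

The model never sees the numbers as pitches; it only sees text tokens, which is why we later parse the generated text back into integers.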

4. Define Music Generation Function
Next, we’ll define a function to generate music using the GPT-2 model. We’ll treat the music as a sequence of tokens and generate new tokens based on the input sequence.

# Generate music using the GPT-2 model
def generate_music(prompt, model, tokenizer, device, max_length=100, temperature=0.7):
    # Encode the input prompt
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to(device)
    
    # Generate music tokens
    output = model.generate(
        input_ids,
        max_length=max_length,
        temperature=temperature,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        pad_token_id=tokenizer.eos_token_id,  # Explicitly set pad token ID
        attention_mask=torch.ones_like(input_ids)  # every prompt token is real, so attend to all of them
    )
    
    # Decode the generated tokens to a string
    generated_music = tokenizer.decode(output[0], skip_special_tokens=True)
    
    return generated_music

Explanation:

  • prompt: The initial sequence of tokens that will be used to start the generation.
  • max_length: The maximum length of the generated sequence.
  • temperature: Controls the randomness of the generated sequence. Lower values make the output more deterministic, while higher values make it more random.
  • do_sample: If True, the model will sample from the probability distribution instead of taking the most likely token.
  • top_k: Limits the sampling pool to the top k tokens.
  • top_p: Implements nucleus sampling, where the model samples from the smallest possible set of tokens whose cumulative probability exceeds p.
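
To get a feel for how these settings change the output, you can sweep the temperature and compare results. This is an illustrative sketch that reuses the load_model and generate_music functions defined above:

# Illustrative: compare generations at different temperatures
model, tokenizer, device = load_model()
prompt = "60 62 64 65 67 69 71 72"

for temp in (0.5, 0.7, 1.0):
    sample = generate_music(prompt, model, tokenizer, device, max_length=60, temperature=temp)
    print(f"temperature={temp}: {sample[:80]}")

Lower temperatures tend to repeat the prompt pattern more closely; higher temperatures wander further from it.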

5. Convert Generated Music to MIDI
The generated music is a sequence of tokens. We’ll convert these tokens into a MIDI file that can be played or further processed.

# Convert generated music tokens to a MIDI file
def tokens_to_midi(generated_music, filename="generated_music.mid"):
    # Convert the generated music tokens to a list of integers
    tokens = [int(token) for token in generated_music.split() if token.isdigit()]
    if not tokens:
        print("Error: the generated text contains no numeric tokens. Please check the model output.")
        return
    
    # Create a MIDI file
    midi = MIDIFile(1)
    track = 0
    time = 0
    midi.addTrackName(track, time, "Generated Music")
    midi.addTempo(track, time, 120)
    
    # Add notes to the MIDI file
    for i, token in enumerate(tokens):
        pitch = token % 128  # MIDI pitches range from 0 to 127
        duration = 1  # Each note lasts for 1 beat
        midi.addNote(track, 0, pitch, time + i, duration, 100)
    
    # Save the MIDI file
    with open(filename, "wb") as output_file:
        midi.writeFile(output_file)
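
Before wiring everything together, you can sanity-check the MIDI conversion on its own with a hand-written token string (the values and filename here are just an example):

# Quick test of the MIDI conversion with a fixed token string
tokens_to_midi("60 64 67 72 67 64 60", "test_arpeggio.mid")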

6. Generate and Save Music
Finally, we’ll generate music and save it as a MIDI file.

# Main function to generate and save music
def main():
    # Load the model and tokenizer
    model, tokenizer, device = load_model()

    # Define a prompt to start the music generation
    prompt = "60 62 64 65 67 69 71 72"  # Example: C major scale
    
    # Generate music
    generated_music = generate_music(prompt, model, tokenizer, device, max_length=200, temperature=0.7)
    
    # Save the generated music as a MIDI file
    tokens_to_midi(generated_music, "generated_music.mid")
    
    print("Music generated and saved as 'generated_music.mid'")

# Run the script
if __name__ == "__main__":
    main()

Step 3: Running the Code

Now run the script with the following command. The first run will download the pre-trained GPT-2 weights from the Hugging Face Hub, so it may take a moment.

python3 music_generator.py

When you run the script, it will generate a MIDI file named generated_music.mid in the same directory as the script. The output in the terminal will look something like this:

Music generated and saved as 'generated_music.mid'

The MIDI file contains the generated music in a format that can be played back or processed further. Each note in the file corresponds to one numeric token in the generated sequence.
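
Since matplotlib was installed earlier, you can also plot the generated pitches as a rough piano roll to see what the model produced. A minimal sketch, assuming you have the generated_music string returned by generate_music and the same token-to-pitch mapping used in tokens_to_midi:

# Rough piano-roll plot of the generated pitches (illustrative)
import matplotlib
matplotlib.use("Agg")  # headless server: render to a file instead of a window
import matplotlib.pyplot as plt

pitches = [int(t) % 128 for t in generated_music.split() if t.isdigit()]
plt.scatter(range(len(pitches)), pitches, marker="s")
plt.xlabel("Beat")
plt.ylabel("MIDI pitch")
plt.title("Generated notes")
plt.savefig("generated_music.png")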

Conclusion

In this article, we walked through generating music with transformers on an Ubuntu GPU server: setting up the environment, writing the script, and running it to produce a piece of music. The output is saved as a MIDI file that can be played back or processed further. Try it out! By leveraging the power of transformers and GPUs, you can create unique and interesting musical compositions that push the boundaries of what’s possible with AI-generated music.