Detecting anomalies in time series data matters in many industries, including finance, healthcare, cybersecurity, and industrial IoT. Anomalies can indicate faults, fraud, or unexpected behavior that requires immediate attention. Effective anomaly detection can help prevent financial loss, keep industrial systems safe, and maintain cybersecurity integrity.

With advances in deep learning, techniques such as Long Short-Term Memory (LSTM) autoencoders have become an effective way to detect anomalies in time series data.

In this article, we will describe how to detect anomalies in time series data using an LSTM-based autoencoder on an Ubuntu GPU server.

Prerequisites

Before starting, ensure you have the following:

  • An Ubuntu 24.04 cloud GPU server.
  • CUDA Toolkit and cuDNN installed.
  • Root or sudo privileges.

Step 1: Set Up a Python Environment

In this section, we’ll set up a dedicated Python environment on an Ubuntu GPU server to run our anomaly detection models.

1. Add the deadsnakes PPA, which provides Python 3.10 packages for Ubuntu.

add-apt-repository ppa:deadsnakes/ppa

2. Update the package list.

apt update -y

3. Install Python 3.10 along with the packages needed to create virtual environments and compile Python modules.

apt install python3.10 python3.10-venv python3.10-dev

4. Create and activate a virtual environment to keep your project’s dependencies separate from the system’s Python environment.

python3.10 -m venv venv
source venv/bin/activate

5. Upgrade pip and install wheel.

pip install --upgrade pip
pip install wheel

6. Install the Python packages needed for data analysis, machine learning, and visualization.

pip3 install "numpy<2" pandas matplotlib seaborn tensorflow jupyter scikit-learn
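Before moving on, it can be worth confirming that TensorFlow actually sees the GPU; if it does not, training falls back to the CPU. The script below is a minimal sketch: the file name check_gpu.py and the helper visible_gpus are hypothetical, and it relies only on tf.config.list_physical_devices, which returns an empty list when no GPU is visible.

```python
"""check_gpu.py (hypothetical name): confirm that TensorFlow can see the GPU."""
import importlib.util


def visible_gpus():
    """Return TensorFlow's list of GPU devices, or None if TensorFlow is not installed."""
    if importlib.util.find_spec("tensorflow") is None:
        return None
    import tensorflow as tf  # imported lazily so the check also works without TensorFlow
    return tf.config.list_physical_devices("GPU")


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment.")
    elif not gpus:
        print("TensorFlow is installed, but no GPU is visible; training will run on the CPU.")
    else:
        for gpu in gpus:
            print("Found GPU:", gpu.name)
```

Run it inside the activated virtual environment with python3 check_gpu.py.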

Step 2: Prepare the Time Series Data

Time series data is a sequence of observations recorded at regular time intervals. For this guide, we will generate synthetic time series data with injected anomalies.
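For reference, the generator in prepare_data.py produces a noisy sine wave with a handful of large spikes. Written out, each point is the following, where a_t is 1 at the 10 randomly chosen anomaly positions and 0 elsewhere:

```latex
x_t = \sin(0.02\,t) + \varepsilon_t + a_t\,\delta_t,
\qquad \varepsilon_t \sim \mathcal{N}(0,\,0.1^2),
\qquad \delta_t \sim \mathcal{N}(0,\,2^2),
\qquad t = 0, 1, \dots, 999
```

Because the spike size is drawn from a zero-mean distribution, an injected anomaly can push the value up or down, and a small draw can produce a spike that is hard to distinguish from the regular noise.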

Create a file named prepare_data.py.

nano prepare_data.py

Add the following code.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Generate synthetic time series data
def generate_data(n=1000):
    np.random.seed(42)
    time = np.arange(n)
    values = np.sin(0.02 * time) + np.random.normal(scale=0.1, size=n)
    anomalies = np.random.choice(n, size=10, replace=False)
    values[anomalies] += np.random.normal(scale=2, size=10)  # Injecting anomalies
    return pd.DataFrame({'timestamp': time, 'value': values})

data = generate_data()
data.to_csv("time_series_data.csv", index=False)

Run the script to generate and save the data.

python3 prepare_data.py

This script generates 1,000 points of a noisy sine wave, adds large random offsets at 10 randomly chosen positions to act as anomalies, and saves the result as time_series_data.csv.
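The CSV stores only the values, so you cannot later check which points were injected anomalies. A minimal sketch, assuming you want ground-truth labels for evaluation: the hypothetical generate_labeled_data below makes the same random draws in the same order as generate_data, so with seed 42 it should produce identical values, and it adds an is_anomaly column. The output file name time_series_labels.csv is also an assumption.

```python
import numpy as np
import pandas as pd


def generate_labeled_data(n=1000, n_anomalies=10, seed=42):
    """Same series as generate_data(), plus a ground-truth 'is_anomaly' column.

    Hypothetical helper: the random calls mirror generate_data() in the same order,
    so with the default arguments the 'value' column matches time_series_data.csv.
    """
    np.random.seed(seed)
    time = np.arange(n)
    values = np.sin(0.02 * time) + np.random.normal(scale=0.1, size=n)
    anomalies = np.random.choice(n, size=n_anomalies, replace=False)
    values[anomalies] += np.random.normal(scale=2, size=n_anomalies)
    is_anomaly = np.zeros(n, dtype=bool)
    is_anomaly[anomalies] = True  # mark the injected positions
    return pd.DataFrame({"timestamp": time, "value": values, "is_anomaly": is_anomaly})


if __name__ == "__main__":
    labeled = generate_labeled_data()
    labeled.to_csv("time_series_labels.csv", index=False)  # hypothetical file name
    print(int(labeled["is_anomaly"].sum()), "labeled anomalies")
```

The training and detection scripts read only the value column (and timestamp), so an extra is_anomaly column would not affect them; the detection script would simply carry it through to anomalies_detected.csv.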

Step 3: Build an LSTM Autoencoder for Anomaly Detection

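An autoencoder is a neural network trained to reproduce its input through a compressed internal representation. Because it learns to reconstruct the dominant, normal patterns in the data, it reconstructs anomalous segments poorly, so their reconstruction error is noticeably higher.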
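For reference, this is what the detection script in Step 4 computes. With window length L = 50, N = 1000 points, x the min-max-scaled series, and x̂_{i,t} the model's reconstruction of step t in window i, each window receives an error e_i and is flagged when it exceeds a threshold τ:

```latex
e_i = \frac{1}{L}\sum_{t=1}^{L}\left|x_{i+t-1} - \hat{x}_{i,t}\right|,
\qquad \text{window } i \text{ is flagged if } e_i > \tau,
\qquad \tau = P_{95}\left(e_0, e_1, \dots, e_{N-L-1}\right)
```

Despite its name, the mse variable in detect_anomalies.py holds this mean absolute error; squaring the difference instead of taking its absolute value would give the mean squared error.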

Create a file named anomaly_detection.py to build and train the model.

nano anomaly_detection.py

Add the following code.

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense, Dropout, RepeatVector, TimeDistributed
from sklearn.preprocessing import MinMaxScaler

# Load time series data
data = pd.read_csv("time_series_data.csv")
values = data['value'].values.reshape(-1, 1)

# Normalize data to the [0, 1] range
scaler = MinMaxScaler()
values_scaled = scaler.fit_transform(values)

# Create overlapping windows of shape (num_windows, seq_length, 1) for the LSTM
def create_sequences(data, seq_length):
    sequences = []
    for i in range(len(data) - seq_length):
        sequences.append(data[i: i + seq_length])
    return np.array(sequences)

seq_length = 50
X_train = create_sequences(values_scaled, seq_length)

# Define the LSTM autoencoder. The encoder compresses each window into a
# 32-dimensional vector, RepeatVector copies that vector seq_length times,
# and the decoder reconstructs the window one time step at a time.
# Note: activation='relu' stops Keras from using its fused cuDNN LSTM kernel
# on the GPU; the default activation ('tanh') would allow it.
model = Sequential([
    Input(shape=(seq_length, 1)),
    LSTM(64, activation='relu', return_sequences=True),
    Dropout(0.2),
    LSTM(32, activation='relu', return_sequences=False),
    RepeatVector(seq_length),
    LSTM(32, activation='relu', return_sequences=True),
    Dropout(0.2),
    LSTM(64, activation='relu', return_sequences=True),
    TimeDistributed(Dense(1))
])

model.compile(optimizer='adam', loss='mse')

# Train the model to reproduce its own input (input and target are both X_train)
model.fit(X_train, X_train, epochs=20, batch_size=16, validation_split=0.1)

# Save the trained model (HDF5 format; recent Keras versions may warn that it is a legacy format)
model.save("anomaly_detector.h5")

Run the script to train the model.

python3 anomaly_detection.py

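This script trains the LSTM autoencoder on the full series and saves the model as anomaly_detector.h5. The training data still contains the 10 injected spikes, but because they are rare, the model mostly learns the normal sine pattern. The training output looks similar to the following (truncated after 6 of the 20 epochs; your loss values will differ):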

Epoch 1/20
54/54 [==============================] - 5s 41ms/step - loss: 0.1079 - val_loss: 0.0153
Epoch 2/20
54/54 [==============================] - 2s 34ms/step - loss: 0.0308 - val_loss: 0.0116
Epoch 3/20
54/54 [==============================] - 2s 34ms/step - loss: 0.0196 - val_loss: 0.0103
Epoch 4/20
54/54 [==============================] - 2s 34ms/step - loss: 0.0143 - val_loss: 0.0088
Epoch 5/20
54/54 [==============================] - 2s 34ms/step - loss: 0.0113 - val_loss: 0.0105
Epoch 6/20
54/54 [==============================] - 2s 34ms/step - loss: 0.0096 - val_loss: 0.0075
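The create_sequences helper builds overlapping windows with a Python loop. As an alternative, NumPy's sliding_window_view (NumPy 1.20 or later) produces the same windows without a loop. Note that the loop stops one window early: range(len(data) - seq_length) yields N - L windows, while every full window would give N - L + 1. The sketch below, using a hypothetical helper make_windows, reproduces the loop's count so that the detection script's data.iloc[seq_length:] still lines up.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(values_scaled, seq_length):
    """Return windows shaped (num_windows, seq_length, 1), matching create_sequences().

    values_scaled: array of shape (N, 1), as produced by MinMaxScaler.
    """
    series = np.asarray(values_scaled).reshape(-1)
    windows = sliding_window_view(series, seq_length)  # (N - L + 1, L), read-only view
    windows = windows[:-1]  # drop the final window, as the original loop does
    return windows[..., np.newaxis].copy()  # add the feature axis; copy gives a writable array


if __name__ == "__main__":
    values_scaled = np.linspace(0, 1, 1000).reshape(-1, 1)
    X = make_windows(values_scaled, 50)
    print(X.shape)  # (950, 50, 1)
```

If you would rather keep every full window, remove the [:-1] slice. The error of window i then naturally belongs to its last point, row i + L - 1, so you would attach the errors with data.iloc[seq_length - 1:] instead.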

Step 4: Detect Anomalies

After training the model, we will use it to detect anomalies in the time series data. Create a file named detect_anomalies.py.

nano detect_anomalies.py

Add the following code.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import load_model

# Load the trained model. compile=False skips restoring the training loss,
# which prediction does not need and which some Keras versions may fail to
# deserialize from .h5 files.
model = load_model("anomaly_detector.h5", compile=False)

# Load and preprocess data
data = pd.read_csv("time_series_data.csv")
values = data['value'].values.reshape(-1, 1)

# Normalize data. Re-fitting on the same file reproduces the training scaling;
# for new data, reuse the scaler fitted during training instead.
scaler = MinMaxScaler()
values_scaled = scaler.fit_transform(values)

# Create sequences for prediction (same windowing as in training)
def create_sequences(data, seq_length):
    sequences = []
    for i in range(len(data) - seq_length):
        sequences.append(data[i: i + seq_length])
    return np.array(sequences)

seq_length = 50
X_test = create_sequences(values_scaled, seq_length)

# Reconstruct every window and compute its reconstruction error.
# Note: this is the mean absolute error per window, stored under the name 'mse'.
X_pred = model.predict(X_test)
mse = np.mean(np.abs(X_pred - X_test), axis=(1, 2))
threshold = np.percentile(mse, 95)  # 95th percentile as anomaly threshold

# Mark anomalies. Window i covers rows i..i+seq_length-1, and its error is
# attached to row i+seq_length. .copy() avoids pandas' SettingWithCopyWarning.
data = data.iloc[seq_length:].copy()
data['mse'] = mse
data['anomaly'] = data['mse'] > threshold

# Plot anomalies
plt.figure(figsize=(12, 6))
plt.plot(data['timestamp'], data['value'], label='Value', color='blue')
plt.scatter(data.loc[data['anomaly'], 'timestamp'], data.loc[data['anomaly'], 'value'], color='red', label='Anomaly')
plt.legend()
plt.title("Anomaly Detection in Time Series Data")
plt.xlabel("Timestamp")
plt.ylabel("Value")
# A headless server cannot open a plot window, so save the figure to a file
# (the file name anomaly_plot.png is an arbitrary choice)
plt.savefig("anomaly_plot.png")

# Save detected anomalies to CSV
data.to_csv("anomalies_detected.csv", index=False)

print("Anomaly detection completed. Results saved in anomalies_detected.csv.")

Run the above script.

python3 detect_anomalies.py

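This script reconstructs every window, flags windows whose reconstruction error exceeds the 95th percentile, plots the flagged points, and saves the results to anomalies_detected.csv. When it finishes, it prints: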

Anomaly detection completed. Results saved in anomalies_detected.csv.
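Because the threshold is the 95th percentile, about 5% of windows are flagged by construction, even on a series with no anomalies, and a single spike typically raises the error of several neighboring windows. If that flags too many points, one common alternative is a robust threshold built from the median and the median absolute deviation (MAD). The sketch below is an assumption, not part of the original pipeline; robust_threshold and the factor k = 3.5 are illustrative choices, and the errors in the demo are synthetic.

```python
import numpy as np


def percentile_threshold(errors, q=95.0):
    """Threshold used in detect_anomalies.py: the q-th percentile of the errors."""
    return float(np.percentile(errors, q))


def robust_threshold(errors, k=3.5):
    """Hypothetical alternative: median + k * scaled MAD of the errors.

    The 1.4826 factor makes the MAD comparable to a standard deviation
    when the errors are roughly normally distributed.
    """
    errors = np.asarray(errors, dtype=float)
    median = np.median(errors)
    mad = 1.4826 * np.median(np.abs(errors - median))
    return float(median + k * mad)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    errors = rng.normal(0.05, 0.003, size=950)  # synthetic errors for normal windows
    errors[[100, 400, 700]] += 0.05             # three synthetic outliers
    for name, tau in [("percentile", percentile_threshold(errors)),
                      ("robust", robust_threshold(errors))]:
        print(f"{name}: threshold={tau:.4f}, flagged={(errors > tau).sum()}")
```

To try it in detect_anomalies.py, you would replace the np.percentile line with threshold = robust_threshold(mse). The percentile rule always flags a fixed share of windows; the MAD rule flags only windows that stand out from the typical error.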

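To inspect the results in the terminal, print the CSV file. Each row holds the timestamp, the original value, the window's reconstruction error (mse), and whether the point was flagged (anomaly). You can also open the file in a spreadsheet application.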

cat anomalies_detected.csv

Output (first rows shown):

timestamp,value,mse,anomaly
50,0.873879381747376,0.05099832916747173,False
51,0.8135997939077313,0.05162067832585881,False
52,0.7947120272127426,0.051868111103225124,False
53,0.933523111229073,0.051020112694549,False
54,0.9850577591345426,0.0504320734778334,False
55,0.9843353719730552,0.05125568314028007,False
56,0.8161786898542412,0.05199617866260993,False
57,0.8777122585307618,0.05020586907291115,False
58,0.9499294519121232,0.04897388883144851,False
59,1.022160525120256,0.048887483703040434,False
60,0.8841216621826973,0.048953077273603506,False
61,0.920533458652686,0.04866946629698113,False
62,0.8351505020489361,0.04704867943636561,False
63,0.8324696791824486,0.04447853718451549,False
64,1.0392684425286447,0.045233664400090544,False
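The cat command prints every row, including the unflagged ones. To list only the flagged points, and to merge consecutive flagged timestamps into a single event (a spike usually flags a run of neighboring windows), you could load the CSV with pandas. This is a sketch; the helper name summarize_events is hypothetical, and it assumes the column names written by detect_anomalies.py (timestamp, value, mse, anomaly).

```python
from pathlib import Path

import pandas as pd


def summarize_events(df):
    """Group runs of consecutive flagged timestamps into events.

    Expects the columns written by detect_anomalies.py: timestamp, value, mse, anomaly.
    Returns one row per event with its start, end, number of points, and peak error.
    """
    flagged = df[df["anomaly"].astype(bool)].sort_values("timestamp")
    if flagged.empty:
        return pd.DataFrame(columns=["start", "end", "points", "peak_mse"])
    # A new event starts wherever the gap to the previous flagged timestamp is not 1
    event_id = (flagged["timestamp"].diff() != 1).cumsum().rename("event")
    return (
        flagged.groupby(event_id)
        .agg(start=("timestamp", "min"), end=("timestamp", "max"),
             points=("timestamp", "count"), peak_mse=("mse", "max"))
        .reset_index(drop=True)
    )


if __name__ == "__main__":
    csv_path = Path("anomalies_detected.csv")
    if csv_path.exists():
        results = pd.read_csv(csv_path)
        print(results[results["anomaly"].astype(bool)])  # flagged rows only
        print(summarize_events(results))
    else:
        print("Run detect_anomalies.py first to create", csv_path)
```

Grouping flagged rows into events makes the report easier to act on: a reviewer sees one entry per suspicious period instead of dozens of overlapping window hits.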

Conclusion

In this article, you detected anomalies in time series data using an LSTM-based autoencoder on an Ubuntu GPU server. You can apply a similar approach to your own time series data to find unusual patterns and issues, although you will likely need to tune the window length (seq_length) and the anomaly threshold for your data.