Deep Learning in Quantitative Finance
Traditional quant models (e.g. ARIMA, Kalman filters, GARCH) are often:
Linear
Rigid
Prone to mis-specification
In contrast, deep learning models like LSTMs or Transformers:
Learn nonlinear relationships
Handle sequential dependencies
Fuse heterogeneous data (text + price + volume)
Stock Price Prediction
One typical deep learning task in finance:
Predict the next-day return or price given a sliding window of historical features.
We’ll now implement a PyTorch LSTM model that predicts the next closing price from a sliding window of features: percent price change, moving average, RSI, and percent volume change.
PyTorch Implementation
Step 1: Imports and Data Simulation
```python
import torch
import torch.nn as nn
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Simulate a dataset
np.random.seed(0)
num_samples = 1000

# Features: [pct_change_price, moving_avg, rsi, pct_change_volume]
X = np.random.randn(num_samples, 4).astype(np.float32)

# Target: next-day price (simplified for demo)
y = np.random.randn(num_samples, 1).astype(np.float32)

# Normalize input features
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Reshape for LSTM input: [batch_size, seq_len, input_size]
seq_length = 10
X_sequences = []
y_sequences = []
for i in range(len(X_scaled) - seq_length):
    X_sequences.append(X_scaled[i:i + seq_length])
    y_sequences.append(y[i + seq_length])

X_tensor = torch.tensor(np.array(X_sequences))
y_tensor = torch.tensor(np.array(y_sequences))
```
Step 2: Define the LSTM Model
```python
class StockLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers=1):
        super(StockLSTM, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)  # Regression output

    def forward(self, x):
        out, _ = self.lstm(x)    # out: [batch, seq_len, hidden]
        out = out[:, -1, :]      # Take the output of the last time step
        out = self.fc(out)
        return out
```
Step 3: Training Setup
```python
model = StockLSTM(input_size=4, hidden_size=32)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training loop (full-batch for simplicity)
epochs = 20
for epoch in range(epochs):
    model.train()
    output = model(X_tensor)
    loss = criterion(output, y_tensor)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 5 == 0:
        print(f'Epoch {epoch + 1}, Loss: {loss.item():.4f}')
```
Step 4: Making Predictions and Post-Processing
```python
model.eval()
with torch.no_grad():
    predicted = model(X_tensor).numpy()

# If the target had been scaled, you would inverse-transform here:
# predicted_unscaled = scaler_y.inverse_transform(predicted)
```
Conceptual Implications
Pros:
Learns complex temporal patterns from raw input features
Can generalize better than simple statistical models
Easily extensible: add news sentiment, options data, or macro signals
Cons:
Easy to overfit
Financial time series are non-stationary
Prediction does not imply profitable trading (you still need backtests, slippage models, transaction costs)
Next Steps for Serious Research
Use walk-forward cross-validation to respect temporal order (see the sketch after this list)
Use early stopping and dropout to regularize
Add attention or switch to Transformer architectures
Fuse textual data using pre-trained NLP embeddings (e.g. BERT for earnings headlines)
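As a minimal sketch of the first point, walk-forward (expanding-window) splits can be generated with scikit-learn's TimeSeriesSplit; the fold count and the random data below are illustrative placeholders, not a full backtesting pipeline.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Placeholder feature matrix and targets standing in for real market data
X = np.random.randn(500, 4)
y = np.random.randn(500)

# Each fold trains on an expanding window of past data and tests on the block after it
tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    print(f"Fold {fold}: train [{train_idx[0]}..{train_idx[-1]}], "
          f"test [{test_idx[0]}..{test_idx[-1]}]")
    # model.fit(X[train_idx], y[train_idx]); evaluate on X[test_idx] here
```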
Ideas from the book Machine Learning in Finance: From Theory to Practice
Authors: Matthew F. Dixon, Igor Halperin, Paul Bilokon
Focus: Machine Learning for Financial Time Series
This part of the book addresses the modeling of sequential (time-dependent) data, which is essential in financial contexts where order, timing, and dynamics govern system behavior, such as pricing, trading, risk, and forecasting. It bridges classical econometrics with modern machine learning techniques, equipping practitioners to model financial systems as evolving, uncertain environments in real-world decision-making.
Sequence Modeling
- Introduces autoregressive models (AR, MA, ARMA) as the classical starting point.
- These models assume stationarity and linearity, providing an interpretable foundation.
- Used as baseline models to evaluate performance gains from machine learning approaches.
Why it matters: Many financial time series are non-stationary or exhibit regime switches. Understanding basic models allows practitioners to recognize their limitations and when advanced ML is justified.
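For a concrete baseline of the kind described above, here is a minimal sketch fitting an AR(1) model with statsmodels (assuming statsmodels is installed; the simulated series and coefficient are illustrative only).

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate an AR(1) return series: r_t = 0.5 * r_{t-1} + noise
np.random.seed(0)
returns = np.zeros(500)
for t in range(1, 500):
    returns[t] = 0.5 * returns[t - 1] + np.random.randn() * 0.01

# Fit an AR(1) baseline (an ARMA model with no MA terms) and forecast ahead
model = ARIMA(returns, order=(1, 0, 0))
result = model.fit()
print(result.params)             # estimated constant, AR coefficient, noise variance
print(result.forecast(steps=5))  # next 5 predicted returns
```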
Probabilistic Sequence Modeling
- Covers Hidden Markov Models (HMMs) and State-Space Models (SSMs), including Kalman Filters.
- Introduces particle filtering for non-linear, non-Gaussian systems.
- Useful for modeling latent states such as market regimes or volatility states.
Why it matters: These models allow inference on unobservable (latent) variables, enabling strategies that adapt to market structure changes, such as bull/bear transitions or volatility shocks.
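A minimal sketch of the filtering idea, using a hand-rolled one-dimensional local-level Kalman filter; the noise variances q and r are assumed values for illustration, not calibrated to any market.

```python
import numpy as np

def kalman_filter_1d(observations, q=1e-5, r=1e-2):
    """Local-level model: latent level x_t = x_{t-1} + w_t, observation y_t = x_t + v_t.
    q is the process-noise variance, r the observation-noise variance (assumed values)."""
    x_est, p_est = 0.0, 1.0          # initial state mean and variance
    filtered = []
    for y in observations:
        # Predict step
        x_pred, p_pred = x_est, p_est + q
        # Update step
        k = p_pred / (p_pred + r)    # Kalman gain
        x_est = x_pred + k * (y - x_pred)
        p_est = (1 - k) * p_pred
        filtered.append(x_est)
    return np.array(filtered)

# Toy usage: noisy observations of a slowly drifting latent level
np.random.seed(0)
level = np.cumsum(np.random.randn(200) * 0.01)
obs = level + np.random.randn(200) * 0.1
smoothed = kalman_filter_1d(obs)
```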
Advanced Neural Networks for Sequences
- Presents Recurrent Neural Networks (RNNs), including LSTM and GRU variants.
- Explores CNNs applied to time series for capturing local temporal structure.
- Describes autoencoders for anomaly detection and dimensionality reduction.
Why it matters: These deep models can learn complex, non-linear, long-range dependencies in financial data that traditional models cannot capture. They are highly applicable to forecasting, algorithmic trading, and portfolio optimization.
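As an illustration of the CNN point, here is a minimal 1-D convolutional model in PyTorch that accepts the same [batch, seq_len, features] input shape as the LSTM example above; the architecture and sizes are illustrative, not tuned.

```python
import torch
import torch.nn as nn

class TemporalCNN(nn.Module):
    """Minimal 1-D CNN for sequence regression (a sketch, not a tuned architecture)."""
    def __init__(self, input_size, num_filters=16, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(input_size, num_filters, kernel_size)
        self.pool = nn.AdaptiveAvgPool1d(1)   # collapse the time dimension
        self.fc = nn.Linear(num_filters, 1)

    def forward(self, x):
        # x: [batch, seq_len, input_size] -> Conv1d expects [batch, channels, seq_len]
        x = x.transpose(1, 2)
        x = torch.relu(self.conv(x))
        x = self.pool(x).squeeze(-1)          # [batch, num_filters]
        return self.fc(x)

# Toy usage on the same [batch, seq_len=10, input_size=4] shape as the LSTM example
model = TemporalCNN(input_size=4)
out = model(torch.randn(8, 10, 4))            # -> [8, 1]
```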
Importance of These Ideas
- Time dependency is foundational in finance: prices, interest rates, and risk factors evolve sequentially.
- Improved forecasting: capturing lagged and temporal relationships enhances predictive accuracy.
- Risk modeling benefits: enables path-dependent and scenario-based analysis.
- Structural discovery: helps uncover hidden market patterns, such as cyclical regimes or liquidity states.
Seven Reasons Most Machine Learning Funds Fail
Dr. Marcos López de Prado, in his lecture “The 7 Reasons Most Machine Learning Funds Fail,” outlines seven systemic errors made by quantitative (particularly ML-based) hedge funds. Each reflects a deep misunderstanding of the financial domain or a failure to properly apply machine learning techniques to it.
Here is a detailed explanation of each:
1. The Sisyphus Paradigm (Working in Silos)
Explanation:
In many funds, individual researchers or teams work in isolation, developing predictive models on their own. These models are often overfit to historical data, lack proper vetting, and are not shared or integrated across the firm.
This siloed approach leads to duplication of effort, missed synergies, and most importantly, the accumulation of fragile strategies that appear to perform well in backtests but break down in live trading.
Solution:
Encourage collaborative research, centralize model evaluation and validation, and create reusable infrastructure to prevent constant reinvention of flawed ideas.
2. Integer Differentiation (Stationarity vs. Memory Dilemma)
Explanation:
To apply many ML algorithms, practitioners attempt to make financial time series stationary (i.e., with constant statistical properties) by taking differences (e.g., first-differencing prices to obtain returns).
However, this often destroys important long-term dependencies (autocorrelations or “memory”) that might carry predictive information. In other words, in trying to remove non-stationarity, one may erase the very structure that could yield forecasts.
Solution: Use more nuanced transformation techniques such as fractional differentiation, which maintains memory while achieving stationarity.
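A minimal sketch of fractional differentiation using the standard binomial-weight recursion; the window length and the order d=0.4 below are illustrative, and in practice d is chosen as the smallest value that passes a stationarity test.

```python
import numpy as np

def frac_diff_weights(d, size):
    # Binomial-expansion recursion: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def frac_diff(series, d, window=100):
    # Fixed-width fractional differencing: x_tilde_t = sum_k w_k * x_{t-k}
    w = frac_diff_weights(d, window)
    out = []
    for t in range(window - 1, len(series)):
        trailing = series[t - window + 1 : t + 1]   # x_{t-window+1} .. x_t
        out.append(np.dot(w[::-1], trailing))
    return np.array(out)

# Toy usage: d = 0 keeps the original series (full memory), d = 1 is an ordinary first difference
np.random.seed(0)
prices = 100 * np.exp(np.cumsum(np.random.randn(1000) * 0.01))
stationary_like = frac_diff(np.log(prices), d=0.4)
```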
3. Inefficient Sampling (Ignoring Market Information Flow)
Explanation:
Most ML systems sample data at regular intervals (e.g., every minute or every day), assuming time flows uniformly. However, market information flow is irregular: a lot can happen in a few seconds during volatility, and little during calm periods.
Fixed-time sampling can dilute important signals and overemphasize noise.
Solution:
Use event-based sampling methods like volume bars, dollar bars, or tick bars—sampling based on market activity rather than clock time. These adapt to the actual rate at which new information is incorporated into prices.
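A toy sketch of dollar-bar sampling: ticks are grouped into bars of roughly equal traded dollar value, so bars form faster when market activity is high. The bar size and simulated ticks below are placeholders.

```python
import numpy as np

def dollar_bars(prices, volumes, bar_dollar_size):
    """Group ticks into bars each containing roughly `bar_dollar_size` of traded value.
    Returns the index of the last tick in each bar (an illustrative sketch)."""
    bar_ends, running = [], 0.0
    for i, (p, v) in enumerate(zip(prices, volumes)):
        running += p * v                  # dollar value traded on this tick
        if running >= bar_dollar_size:
            bar_ends.append(i)            # close the bar on this tick
            running = 0.0
    return np.array(bar_ends)

# Toy usage: more bars are produced when price * volume activity is high
np.random.seed(0)
prices = 100 + np.cumsum(np.random.randn(10_000) * 0.05)
volumes = np.random.randint(1, 500, size=10_000)
ends = dollar_bars(prices, volumes, bar_dollar_size=1_000_000)
```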
4. Wrong Labeling (Simplistic Fixed-Horizon Labels)
Explanation:
ML classification models require target labels. A common approach is to define them using fixed horizons (e.g., will price go up over the next 10 minutes?). But this ignores the varying dynamics of price moves and treats all returns within the window equally.
This leads to noisy and unreliable labels, reducing model performance.
Solution:
Adopt dynamic labeling techniques like the triple-barrier method, which sets thresholds for profit-taking, stop-loss, and a time limit. This better reflects realistic trading logic and allows cleaner labeling of outcomes.
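A simplified sketch of triple-barrier labeling for a single entry point; the profit-taking, stop-loss, and holding-period thresholds are illustrative placeholders, not values from the lecture.

```python
import numpy as np

def triple_barrier_label(prices, t0, pt=0.02, sl=0.02, max_holding=20):
    """Label the position entered at index t0: +1 if the profit-taking barrier (pt, a
    fractional return) is hit first, -1 if the stop-loss (sl) is hit first, and 0 if the
    vertical (time) barrier expires before either is touched."""
    entry = prices[t0]
    for t in range(t0 + 1, min(t0 + max_holding, len(prices))):
        ret = prices[t] / entry - 1.0
        if ret >= pt:
            return 1        # upper barrier touched first
        if ret <= -sl:
            return -1       # lower barrier touched first
    return 0                # time limit reached without touching either barrier

# Toy usage on a simulated price path
np.random.seed(0)
prices = 100 * np.exp(np.cumsum(np.random.randn(500) * 0.01))
labels = [triple_barrier_label(prices, t0) for t0 in range(0, 400, 10)]
```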
5. Weighting of Non‑IID Samples (Ignoring Correlation and Overlapping Data)
Explanation:
Standard ML models assume that data samples are independent and identically distributed (IID). But financial data violates this assumption—observations are autocorrelated, overlapping, and heteroscedastic.
Using such data without adjustment causes the model to overestimate its performance and misidentify patterns.
Solution:
Apply techniques like sequential bootstrapping, which accounts for sample overlap and adjusts the weighting of samples in the training process. This ensures that correlated samples don’t bias the model.
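A simplified sketch of the underlying idea: weight each label by its average uniqueness, i.e., the inverse of how many other labels are concurrently active over its lifespan. The full sequential bootstrap resamples using such uniqueness estimates; this snippet only computes the weights.

```python
import numpy as np

def average_uniqueness(label_start, label_end, n_times):
    """Down-weight overlapping labels: each label's weight is the mean of 1/concurrency
    over the bars it spans (a simplified sketch of the idea behind sequential bootstrapping)."""
    concurrency = np.zeros(n_times)
    for s, e in zip(label_start, label_end):
        concurrency[s:e] += 1                    # how many labels are 'alive' at each bar
    weights = []
    for s, e in zip(label_start, label_end):
        weights.append(np.mean(1.0 / concurrency[s:e]))
    return np.array(weights)

# Toy usage: three labels, two of which overlap heavily and therefore get lower weight
starts, ends = [0, 5, 6], [10, 15, 16]
w = average_uniqueness(starts, ends, n_times=20)
```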
6. Cross-Validation Leakage (Temporal Contamination)
Explanation:
Cross-validation (CV) is used to evaluate model generalization. But in time series, standard k-fold CV leaks information from the future into the past—violating causality.
For example, a model trained on data that overlaps with the test set has already “seen” part of the future, inflating its perceived accuracy.
Solution: Use purged k-fold CV and embargoing. Purging removes overlapping samples from the training set, and embargoing enforces a buffer period after the test set to prevent contamination from adjacent data.
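A simplified sketch of purged k-fold splitting with an embargo gap on either side of each test fold; the full procedure purges by label end-times rather than a fixed gap, and the embargo fraction here is an assumed placeholder.

```python
import numpy as np

def purged_kfold_indices(n_samples, n_splits=5, embargo_frac=0.01):
    """Yield (train_idx, test_idx) pairs where training samples adjacent to the test
    fold are dropped, approximating purging (before the fold) and embargo (after it)."""
    embargo = int(n_samples * embargo_frac)
    fold_bounds = np.linspace(0, n_samples, n_splits + 1, dtype=int)
    for i in range(n_splits):
        test_start, test_end = fold_bounds[i], fold_bounds[i + 1]
        test_idx = np.arange(test_start, test_end)
        # keep training data strictly before the purge window and after the embargo window
        train_idx = np.concatenate([
            np.arange(0, max(test_start - embargo, 0)),
            np.arange(min(test_end + embargo, n_samples), n_samples),
        ])
        yield train_idx, test_idx

# Toy usage
for train_idx, test_idx in purged_kfold_indices(1000, n_splits=5):
    print(len(train_idx), len(test_idx))
```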
7. Backtest Overfitting (Multiple-Testing Problem)
Explanation:
Researchers often try many models or strategies and select the one that performs best in backtesting. But if enough strategies are tested, at least one will look good by random chance—despite being useless in production.
This is the multiple testing problem, and it leads to false discoveries.
Solution:
Use techniques like the Deflated Sharpe Ratio (DSR) or Probability of Backtest Overfitting (PBO) to statistically correct for multiple hypothesis testing. These tools estimate how likely a model’s backtest performance is due to luck rather than skill.
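A sketch of the closely related Probabilistic Sharpe Ratio from Bailey and López de Prado, which the DSR builds on by setting the benchmark Sharpe ratio to the expected maximum SR across all tested trials; scipy is assumed available and the sample returns are simulated.

```python
import numpy as np
from scipy.stats import norm, skew, kurtosis

def probabilistic_sharpe_ratio(returns, sr_benchmark=0.0):
    """Probability that the true Sharpe ratio exceeds `sr_benchmark`, given the observed
    returns. The Deflated Sharpe Ratio applies the same formula with sr_benchmark set to
    the expected maximum Sharpe ratio implied by the number of trials."""
    sr = returns.mean() / returns.std(ddof=1)   # per-period Sharpe estimate
    t = len(returns)
    g3 = skew(returns)
    g4 = kurtosis(returns, fisher=False)        # non-excess kurtosis
    denom = np.sqrt(1 - g3 * sr + (g4 - 1) / 4 * sr ** 2)
    return norm.cdf((sr - sr_benchmark) * np.sqrt(t - 1) / denom)

# Toy usage: one year of daily returns with a small positive drift
np.random.seed(0)
rets = np.random.randn(252) * 0.01 + 0.0005
print(probabilistic_sharpe_ratio(rets, sr_benchmark=0.0))
```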
Videos
YouTube: "Deep Learning for Sequences in Quantitative Finance" by David Kriegman (March 16, 2022)