Deep Learning in Quantitative Finance
Traditional quant models (e.g. ARIMA, Kalman filters, GARCH) are often:
Linear
Rigid
Prone to mis-specification
In contrast, deep learning models like LSTMs or Transformers:
Learn nonlinear relationships
Handle sequential dependencies
Fuse heterogeneous data (text + price + volume)
Stock Price Prediction
One typical deep learning task in finance:
Predict the next-day return or price given a sliding window of historical features.
We’ll now implement a PyTorch LSTM model that predicts the next closing price from a sliding window of features: percent price change, moving average, RSI, and percent volume change.
PyTorch Implementation
Step 1: Imports and Data Simulation
```python
import torch
import torch.nn as nn
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Simulate a dataset
np.random.seed(0)
num_samples = 1000

# Features: [pct_change_price, moving_avg, rsi, pct_change_volume]
X = np.random.randn(num_samples, 4).astype(np.float32)

# Target: next-day price (simplified for demo)
y = np.random.randn(num_samples, 1).astype(np.float32)

# Normalize input features
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

# Reshape for LSTM input: [batch_size, seq_len, input_size]
seq_length = 10
X_sequences = []
y_sequences = []
for i in range(len(X_scaled) - seq_length):
    X_sequences.append(X_scaled[i:i + seq_length])
    y_sequences.append(y[i + seq_length])

X_tensor = torch.tensor(np.array(X_sequences))
y_tensor = torch.tensor(np.array(y_sequences))
```
Step 2: Define the LSTM Model
```python
class StockLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers=1):
        super(StockLSTM, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)  # Regression output

    def forward(self, x):
        out, _ = self.lstm(x)    # out: [batch, seq_len, hidden]
        out = out[:, -1, :]      # Take the output of the last time step
        out = self.fc(out)
        return out
```
Step 3: Training Setup
```python
model = StockLSTM(input_size=4, hidden_size=32)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Training loop (full-batch for simplicity)
epochs = 20
for epoch in range(epochs):
    model.train()
    output = model(X_tensor)
    loss = criterion(output, y_tensor)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if (epoch + 1) % 5 == 0:
        print(f'Epoch {epoch + 1}, Loss: {loss.item():.4f}')
```
Step 4: Making Predictions and Post-Processing
```python
model.eval()
with torch.no_grad():
    predicted = model(X_tensor).numpy()

# If the target had been scaled, you would inverse-transform here:
# predicted_unscaled = scaler_y.inverse_transform(predicted)
```
Conceptual Implications
Pros:
Learns complex temporal patterns from raw input features
Can generalize better than simple statistical models
Easily extensible: add news sentiment, options data, or macro signals
Cons:
Easy to overfit
Financial time series are non-stationary
Prediction does not imply profitable trading (you still need backtests, slippage models, transaction costs)
Next Steps for Serious Research
Use walk-forward cross-validation to respect temporal order (see the sketch after this list)
Use early stopping and dropout to regularize
Add attention or switch to Transformer architectures
Fuse textual data using pre-trained NLP embeddings (e.g. BERT for earnings headlines)
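As a minimal sketch of the first point, walk-forward (expanding-window) splits can be generated with scikit-learn's TimeSeriesSplit; the fold count and the random data below are illustrative placeholders, not a full backtesting pipeline.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Placeholder feature matrix and targets standing in for real market data
X = np.random.randn(500, 4)
y = np.random.randn(500)

# Each fold trains on an expanding window of past data and tests on the block after it
tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    print(f"Fold {fold}: train [{train_idx[0]}..{train_idx[-1]}], "
          f"test [{test_idx[0]}..{test_idx[-1]}]")
    # model.fit(X[train_idx], y[train_idx]); evaluate on X[test_idx] here
```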
Ideas from the book Machine Learning in Finance: From Theory to Practice
Authors: Matthew F. Dixon, Igor Halperin, Paul Bilokon
Focus: Machine Learning for Financial Time Series
This part of the book addresses the modeling of sequential (time-dependent) data, which is essential in financial contexts where order, timing, and dynamics govern system behavior, such as pricing, trading, risk, and forecasting. It bridges classical econometrics with modern machine learning techniques, equipping practitioners to model financial systems as evolving, uncertain environments in real-world decision-making.
Sequence Modeling
- Introduces autoregressive models (AR, MA, ARMA) as the classical starting point.
- These models assume stationarity and linearity, providing an interpretable foundation.
- Used as baseline models to evaluate performance gains from machine learning approaches.
Why it matters: Many financial time series are non-stationary or exhibit regime switches. Understanding basic models allows practitioners to recognize their limitations and when advanced ML is justified.
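For a concrete baseline of the kind described above, here is a minimal sketch fitting an AR(1) model with statsmodels (assuming statsmodels is installed; the simulated series and coefficient are illustrative only).

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Simulate an AR(1) return series: r_t = 0.5 * r_{t-1} + noise
np.random.seed(0)
returns = np.zeros(500)
for t in range(1, 500):
    returns[t] = 0.5 * returns[t - 1] + np.random.randn() * 0.01

# Fit an AR(1) baseline (an ARMA model with no MA terms) and forecast ahead
model = ARIMA(returns, order=(1, 0, 0))
result = model.fit()
print(result.params)             # estimated constant, AR coefficient, noise variance
print(result.forecast(steps=5))  # next 5 predicted returns
```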
Probabilistic Sequence Modeling
- Covers Hidden Markov Models (HMMs) and State-Space Models (SSMs), including Kalman Filters.
- Introduces particle filtering for non-linear, non-Gaussian systems.
- Useful for modeling latent states such as market regimes or volatility states.
Why it matters: These models allow inference on unobservable (latent) variables, enabling strategies that adapt to market structure changes, such as bull/bear transitions or volatility shocks.
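A minimal sketch of the filtering idea, using a hand-rolled one-dimensional local-level Kalman filter; the noise variances q and r are assumed values for illustration, not calibrated to any market.

```python
import numpy as np

def kalman_filter_1d(observations, q=1e-5, r=1e-2):
    """Local-level model: latent level x_t = x_{t-1} + w_t, observation y_t = x_t + v_t.
    q is the process-noise variance, r the observation-noise variance (assumed values)."""
    x_est, p_est = 0.0, 1.0          # initial state mean and variance
    filtered = []
    for y in observations:
        # Predict step
        x_pred, p_pred = x_est, p_est + q
        # Update step
        k = p_pred / (p_pred + r)    # Kalman gain
        x_est = x_pred + k * (y - x_pred)
        p_est = (1 - k) * p_pred
        filtered.append(x_est)
    return np.array(filtered)

# Toy usage: noisy observations of a slowly drifting latent level
np.random.seed(0)
level = np.cumsum(np.random.randn(200) * 0.01)
obs = level + np.random.randn(200) * 0.1
smoothed = kalman_filter_1d(obs)
```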
Advanced Neural Networks for Sequences
- Presents Recurrent Neural Networks (RNNs), including LSTM and GRU variants.
- Explores CNNs applied to time series for capturing local temporal structure.
- Describes autoencoders for anomaly detection and dimensionality reduction.
Why it matters: These deep models can learn complex, non-linear, long-range dependencies in financial data that traditional models cannot capture. They are highly applicable to forecasting, algorithmic trading, and portfolio optimization.
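As an illustration of the CNN point, here is a minimal 1-D convolutional model in PyTorch that accepts the same [batch, seq_len, features] input shape as the LSTM example above; the architecture and sizes are illustrative, not tuned.

```python
import torch
import torch.nn as nn

class TemporalCNN(nn.Module):
    """Minimal 1-D CNN for sequence regression (a sketch, not a tuned architecture)."""
    def __init__(self, input_size, num_filters=16, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(input_size, num_filters, kernel_size)
        self.pool = nn.AdaptiveAvgPool1d(1)   # collapse the time dimension
        self.fc = nn.Linear(num_filters, 1)

    def forward(self, x):
        # x: [batch, seq_len, input_size] -> Conv1d expects [batch, channels, seq_len]
        x = x.transpose(1, 2)
        x = torch.relu(self.conv(x))
        x = self.pool(x).squeeze(-1)          # [batch, num_filters]
        return self.fc(x)

# Toy usage on the same [batch, seq_len=10, input_size=4] shape as the LSTM example
model = TemporalCNN(input_size=4)
out = model(torch.randn(8, 10, 4))            # -> [8, 1]
```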
Importance of These Ideas
- Time dependency is foundational in finance: prices, interest rates, and risk factors evolve sequentially.
- Improved forecasting: capturing lagged and temporal relationships enhances predictive accuracy.
- Risk modeling benefits: enables path-dependent and scenario-based analysis.
- Structural discovery: helps uncover hidden market patterns, such as cyclical regimes or liquidity states.
Seven Reasons Most Machine Learning Funds Fail
Dr. Marcos López de Prado, in his lecture “The 7 Reasons Most Machine Learning Funds Fail,” outlines seven systemic errors made by quantitative (particularly ML-based) hedge funds. Each reflects a deep misunderstanding of the financial domain or a failure to properly apply machine learning techniques to it.
Here is a detailed explanation of each:
1. The Sisyphus Paradigm (Working in Silos)
Explanation:
In many funds, individual researchers or teams work in isolation, developing predictive models on their own. These models are often overfit to historical data, lack proper vetting, and are not shared or integrated across the firm.
This siloed approach leads to duplication of effort, missed synergies, and most importantly, the accumulation of fragile strategies that appear to perform well in backtests but break down in live trading.
Solution:
Encourage collaborative research, centralize model evaluation and validation, and create reusable infrastructure to prevent constant reinvention of flawed ideas.
2. Integer Differentiation (Stationarity vs. Memory Dilemma)
Explanation:
To apply many ML algorithms, practitioners attempt to make financial time series stationary (i.e., with constant statistical properties) by taking differences (e.g., first-differencing prices to obtain returns).
However, this often destroys important long-term dependencies (autocorrelations or “memory”) that might carry predictive information. In other words, in trying to remove non-stationarity, one may erase the very structure that could yield forecasts.
Solution: Use more nuanced transformation techniques such as fractional differentiation, which maintains memory while achieving stationarity.
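A minimal sketch of fractional differentiation using the standard binomial-weight recursion; the window length and the order d=0.4 below are illustrative, and in practice d is chosen as the smallest value that passes a stationarity test.

```python
import numpy as np

def frac_diff_weights(d, size):
    # Binomial-expansion recursion: w_0 = 1, w_k = -w_{k-1} * (d - k + 1) / k
    w = [1.0]
    for k in range(1, size):
        w.append(-w[-1] * (d - k + 1) / k)
    return np.array(w)

def frac_diff(series, d, window=100):
    # Fixed-width fractional differencing: x_tilde_t = sum_k w_k * x_{t-k}
    w = frac_diff_weights(d, window)
    out = []
    for t in range(window - 1, len(series)):
        trailing = series[t - window + 1 : t + 1]   # x_{t-window+1} .. x_t
        out.append(np.dot(w[::-1], trailing))
    return np.array(out)

# Toy usage: d = 0 keeps the original series (full memory), d = 1 is an ordinary first difference
np.random.seed(0)
prices = 100 * np.exp(np.cumsum(np.random.randn(1000) * 0.01))
stationary_like = frac_diff(np.log(prices), d=0.4)
```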
3. Inefficient Sampling (Ignoring Market Information Flow)
Explanation:
Most ML systems sample data at regular intervals (e.g., every minute or every day), assuming time flows uniformly. However, market information flow is irregular: a lot can happen in a few seconds during volatility, and little during calm periods.
Fixed-time sampling can dilute important signals and overemphasize noise.
Solution:
Use event-based sampling methods like volume bars, dollar bars, or tick bars—sampling based on market activity rather than clock time. These adapt to the actual rate at which new information is incorporated into prices.
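A toy sketch of dollar-bar sampling: ticks are grouped into bars of roughly equal traded dollar value, so bars form faster when market activity is high. The bar size and simulated ticks below are placeholders.

```python
import numpy as np

def dollar_bars(prices, volumes, bar_dollar_size):
    """Group ticks into bars each containing roughly `bar_dollar_size` of traded value.
    Returns the index of the last tick in each bar (an illustrative sketch)."""
    bar_ends, running = [], 0.0
    for i, (p, v) in enumerate(zip(prices, volumes)):
        running += p * v                  # dollar value traded on this tick
        if running >= bar_dollar_size:
            bar_ends.append(i)            # close the bar on this tick
            running = 0.0
    return np.array(bar_ends)

# Toy usage: more bars are produced when price * volume activity is high
np.random.seed(0)
prices = 100 + np.cumsum(np.random.randn(10_000) * 0.05)
volumes = np.random.randint(1, 500, size=10_000)
ends = dollar_bars(prices, volumes, bar_dollar_size=1_000_000)
```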
4. Wrong Labeling (Simplistic Fixed-Horizon Labels)
Explanation:
ML classification models require target labels. A common approach is to define them using fixed horizons (e.g., will price go up over the next 10 minutes?). But this ignores the varying dynamics of price moves and treats all returns within the window equally.
This leads to noisy and unreliable labels, reducing model performance.
Solution:
Adopt dynamic labeling techniques like the triple-barrier method, which sets thresholds for profit-taking, stop-loss, and a time limit. This better reflects realistic trading logic and allows cleaner labeling of outcomes.
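A simplified sketch of triple-barrier labeling for a single entry point; the profit-taking, stop-loss, and holding-period thresholds are illustrative placeholders, not values from the lecture.

```python
import numpy as np

def triple_barrier_label(prices, t0, pt=0.02, sl=0.02, max_holding=20):
    """Label the position entered at index t0: +1 if the profit-taking barrier (pt, a
    fractional return) is hit first, -1 if the stop-loss (sl) is hit first, and 0 if the
    vertical (time) barrier expires before either is touched."""
    entry = prices[t0]
    for t in range(t0 + 1, min(t0 + max_holding, len(prices))):
        ret = prices[t] / entry - 1.0
        if ret >= pt:
            return 1        # upper barrier touched first
        if ret <= -sl:
            return -1       # lower barrier touched first
    return 0                # time limit reached without touching either barrier

# Toy usage on a simulated price path
np.random.seed(0)
prices = 100 * np.exp(np.cumsum(np.random.randn(500) * 0.01))
labels = [triple_barrier_label(prices, t0) for t0 in range(0, 400, 10)]
```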
5. Weighting of Non‑IID Samples (Ignoring Correlation and Overlapping Data)
Explanation:
Standard ML models assume that data samples are independent and identically distributed (IID). But financial data violates this assumption—observations are autocorrelated, overlapping, and heteroscedastic.
Using such data without adjustment causes the model to overestimate its performance and misidentify patterns.
Solution:
Apply techniques like sequential bootstrapping, which accounts for sample overlap and adjusts the weighting of samples in the training process. This ensures that correlated samples don’t bias the model.
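A simplified sketch of the underlying idea: weight each label by its average uniqueness, i.e., the inverse of how many other labels are concurrently active over its lifespan. The full sequential bootstrap resamples using such uniqueness estimates; this snippet only computes the weights.

```python
import numpy as np

def average_uniqueness(label_start, label_end, n_times):
    """Down-weight overlapping labels: each label's weight is the mean of 1/concurrency
    over the bars it spans (a simplified sketch of the idea behind sequential bootstrapping)."""
    concurrency = np.zeros(n_times)
    for s, e in zip(label_start, label_end):
        concurrency[s:e] += 1                    # how many labels are 'alive' at each bar
    weights = []
    for s, e in zip(label_start, label_end):
        weights.append(np.mean(1.0 / concurrency[s:e]))
    return np.array(weights)

# Toy usage: three labels, two of which overlap heavily and therefore get lower weight
starts, ends = [0, 5, 6], [10, 15, 16]
w = average_uniqueness(starts, ends, n_times=20)
```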
6. Cross-Validation Leakage (Temporal Contamination)
Explanation:
Cross-validation (CV) is used to evaluate model generalization. But in time series, standard k-fold CV leaks information from the future into the past—violating causality.
For example, a model trained on data that overlaps with the test set has already “seen” part of the future, inflating its perceived accuracy.
Solution: Use purged k-fold CV and embargoing. Purging removes overlapping samples from the training set, and embargoing enforces a buffer period after the test set to prevent contamination from adjacent data.
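A simplified sketch of purged k-fold splitting with an embargo gap on either side of each test fold; the full procedure purges by label end-times rather than a fixed gap, and the embargo fraction here is an assumed placeholder.

```python
import numpy as np

def purged_kfold_indices(n_samples, n_splits=5, embargo_frac=0.01):
    """Yield (train_idx, test_idx) pairs where training samples adjacent to the test
    fold are dropped, approximating purging (before the fold) and embargo (after it)."""
    embargo = int(n_samples * embargo_frac)
    fold_bounds = np.linspace(0, n_samples, n_splits + 1, dtype=int)
    for i in range(n_splits):
        test_start, test_end = fold_bounds[i], fold_bounds[i + 1]
        test_idx = np.arange(test_start, test_end)
        # keep training data strictly before the purge window and after the embargo window
        train_idx = np.concatenate([
            np.arange(0, max(test_start - embargo, 0)),
            np.arange(min(test_end + embargo, n_samples), n_samples),
        ])
        yield train_idx, test_idx

# Toy usage
for train_idx, test_idx in purged_kfold_indices(1000, n_splits=5):
    print(len(train_idx), len(test_idx))
```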
7. Backtest Overfitting (Multiple-Testing Problem)
Explanation:
Researchers often try many models or strategies and select the one that performs best in backtesting. But if enough strategies are tested, at least one will look good by random chance—despite being useless in production.
This is the multiple testing problem, and it leads to false discoveries.
Solution:
Use techniques like the Deflated Sharpe Ratio (DSR) or Probability of Backtest Overfitting (PBO) to statistically correct for multiple hypothesis testing. These tools estimate how likely a model’s backtest performance is due to luck rather than skill.
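A sketch of the closely related Probabilistic Sharpe Ratio from Bailey and López de Prado, which the DSR builds on by setting the benchmark Sharpe ratio to the expected maximum SR across all tested trials; scipy is assumed available and the sample returns are simulated.

```python
import numpy as np
from scipy.stats import norm, skew, kurtosis

def probabilistic_sharpe_ratio(returns, sr_benchmark=0.0):
    """Probability that the true Sharpe ratio exceeds `sr_benchmark`, given the observed
    returns. The Deflated Sharpe Ratio applies the same formula with sr_benchmark set to
    the expected maximum Sharpe ratio implied by the number of trials."""
    sr = returns.mean() / returns.std(ddof=1)   # per-period Sharpe estimate
    t = len(returns)
    g3 = skew(returns)
    g4 = kurtosis(returns, fisher=False)        # non-excess kurtosis
    denom = np.sqrt(1 - g3 * sr + (g4 - 1) / 4 * sr ** 2)
    return norm.cdf((sr - sr_benchmark) * np.sqrt(t - 1) / denom)

# Toy usage: one year of daily returns with a small positive drift
np.random.seed(0)
rets = np.random.randn(252) * 0.01 + 0.0005
print(probabilistic_sharpe_ratio(rets, sr_benchmark=0.0))
```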
Videos
YouTube: "Deep Learning for Sequences in Quantitative Finance" by David Kriegman (March 16, 2022)