Preprocessing Stock Price Time-Series for Neural Networks

What Format is the Raw Data?

Normally, stock price data looks like this:

Date	Open	High	Low	Close	Volume
2025-01-01	102.5	105.0	101.0	104.0	5000000
2025-01-02	104.2	106.5	103.5	105.5	4800000

Critical point: Time order must be preserved. You cannot shuffle the rows.
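
As a minimal sketch of loading data in the layout shown above, the snippet below parses tab-separated text with the standard library and sorts it oldest-first so later steps can rely on time order. The sample string, the function name load_rows, and the deliberately reversed row order are assumptions made for illustration.

```python
import csv
import io
from datetime import date

# Hypothetical sample in the tab-separated layout above, deliberately out of order.
RAW = (
    "Date\tOpen\tHigh\tLow\tClose\tVolume\n"
    "2025-01-02\t104.2\t106.5\t103.5\t105.5\t4800000\n"
    "2025-01-01\t102.5\t105.0\t101.0\t104.0\t5000000\n"
)

def load_rows(text):
    """Parse tab-separated OHLCV text into dicts sorted oldest-first."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        rows.append({
            "Date": date.fromisoformat(rec["Date"]),
            "Open": float(rec["Open"]),
            "High": float(rec["High"]),
            "Low": float(rec["Low"]),
            "Close": float(rec["Close"]),
            "Volume": int(rec["Volume"]),
        })
    rows.sort(key=lambda r: r["Date"])  # enforce chronological order
    return rows

rows = load_rows(RAW)
print([r["Date"].isoformat() for r in rows])
```

Sorting once at load time means every later step (filling gaps, scaling, windowing) can assume that row i comes before row i+1.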

What Does It Mean to "Normalize Prices"?

Normalization means scaling the numeric values into a smaller, consistent range.

Neural networks train better when all inputs share a similar scale. Otherwise large-valued features such as Volume (in the millions) dominate small-valued features such as prices (in the hundreds).

Benefits:

Reduces the risk of numerical instability.
Usually gives faster, smoother training.
Helps keep gradients well-scaled during backpropagation.

MinMaxScaler vs StandardScaler

MinMaxScaler

Scales data to a fixed range, usually [0, 1].
Formula: x' = (x - min(x)) / (max(x) - min(x))
Example: If Open prices range from 90 to 110, then 90 maps to 0, 110 maps to 1, and 100 maps to 0.5.
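
The arithmetic in the example above can be checked with a few lines of plain Python. This is a sketch of the formula only, not of any library's implementation; the function name min_max_scale and the choice to map a constant column to 0.0 are assumptions.

```python
def min_max_scale(values):
    """Apply x' = (x - min) / (max - min) to a list of numbers."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.0
        # instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

opens = [90.0, 100.0, 110.0]
scaled = min_max_scale(opens)
print(scaled)  # [0.0, 0.5, 1.0]
```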

StandardScaler

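Rescales data to zero mean and unit variance. This shifts and scales the values but does not change the shape of their distribution, so the result is not necessarily Gaussian.
Formula: x' = (x - mean) / standard_deviation
Useful when values have no natural bounds: each value is expressed as the number of standard deviations it lies from the mean.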
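
The formula above can be sketched in plain Python as follows. The function name standardize and the sample prices are assumptions for illustration. The population standard deviation is used here, which is also what scikit-learn's StandardScaler computes.

```python
import statistics

def standardize(values):
    """Apply x' = (x - mean) / std using the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # Assumption: map a constant column to 0.0 instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

closes = [100.0, 102.0, 104.0, 106.0, 108.0]  # hypothetical prices
z = standardize(closes)
print([round(v, 3) for v in z])
```

After scaling, the values have mean 0 and standard deviation 1, but their relative spacing is unchanged.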

How Should a System Admin Preprocess the Data?

Load the data from CSV, database, or API.
Check for missing dates or fields.
Handle missing data:

Dates absent because of weekends or market holidays are expected (no trading) and need no repair.
Missing values in features (Open, High, etc.) must be filled or interpolated.

Normalize the features:

Apply MinMaxScaler or StandardScaler to each column separately.
Save the scaler for inverse transformation later.

Frame into sequences:

Use the past 60 days as input to predict the next day's Close price.
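
For the "check for missing dates" step above, one possible sketch is to list the weekdays that fall inside the data's date range but have no row and are not known holidays. The function name missing_weekdays and the holiday set are hypothetical; a real holiday calendar depends on the exchange.

```python
from datetime import date, timedelta

def missing_weekdays(dates, holidays=frozenset()):
    """Return Mon-Fri dates between the first and last row that have no data."""
    present = set(dates)
    day, last = min(present), max(present)
    missing = []
    while day <= last:
        if day.weekday() < 5 and day not in present and day not in holidays:
            missing.append(day)
        day += timedelta(days=1)
    return missing

# Hypothetical trading dates: 2025-01-01 is a holiday, 2025-01-06 is a real gap.
dates = [date(2024, 12, 31), date(2025, 1, 2), date(2025, 1, 3), date(2025, 1, 7)]
gaps = missing_weekdays(dates, holidays={date(2025, 1, 1)})
print([d.isoformat() for d in gaps])  # ['2025-01-06']
```

Only the dates this returns need attention; weekends and listed holidays are skipped.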

How to Frame the Data into Sequences

Suppose your dataset is 100 days long:

First input (X) = days 1-60; Target (y) = day 61 Close price.
Second input (X) = days 2-61; Target (y) = day 62 Close price.

In pseudocode:

X, y = [], []
for i in range(60, len(data)):       # data: list of row dicts, oldest first
    X.append(data[i-60:i])           # 60-day window: rows i-60 .. i-1
    y.append(data[i]['Close'])       # target: Close on the day after the window
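
To check the window arithmetic for the 100-day case, here is a small runnable sketch on synthetic data where each value equals its day number. The function name make_windows is an assumption, and only the Close series is windowed to keep the example short.

```python
def make_windows(closes, window=60):
    """Build (X, y) pairs: each X is `window` consecutive values, y is the next one."""
    X, y = [], []
    for i in range(window, len(closes)):
        X.append(closes[i - window:i])  # values at positions i-window .. i-1
        y.append(closes[i])             # value immediately after the window
    return X, y

closes = [float(d) for d in range(1, 101)]  # synthetic: value equals day number
X, y = make_windows(closes)
print(len(X), X[0][0], X[0][-1], y[0])  # 40 1.0 60.0 61.0
```

A 100-day series with a 60-day window yields 100 - 60 = 40 training samples, the last of which uses days 40-99 to predict day 100.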

How to Handle Missing Data

Single missing values: Use forward-fill:

df.ffill(inplace=True)

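Short gaps: Interpolate linearly (this uses the value after the gap, so apply it only to historical rows, never to the most recent rows at prediction time):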

df.interpolate(method='linear', inplace=True)

Major missing periods: Consider dropping those sequences entirely.
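
The strategies above can be combined in a rough sketch: forward-fill at most a fixed number of consecutive missing values, then skip any window that still contains a gap. The function names forward_fill and valid_targets, the limit of 1, and the sample data are assumptions. The limit behaves like the limit argument of pandas' ffill.

```python
def forward_fill(values, limit=1):
    """Replace None with the last seen value, filling at most `limit` per gap."""
    out, last, run = [], None, 0
    for v in values:
        if v is None:
            run += 1
            out.append(last if last is not None and run <= limit else None)
        else:
            last, run = v, 0
            out.append(v)
    return out

def valid_targets(values, window):
    """Indices i where the window values[i-window:i] and target values[i] have no gaps."""
    return [i for i in range(window, len(values))
            if all(v is not None for v in values[i - window:i + 1])]

# Hypothetical Close series: one single missing value and one three-day gap.
closes = [100.0, None, 102.0, None, None, None, 106.0, 107.0]
filled = forward_fill(closes, limit=1)
print(filled)                    # [100.0, 100.0, 102.0, 102.0, None, None, 106.0, 107.0]
print(valid_targets(filled, 2))  # [2, 3]
```

The single missing value is repaired, while the longer gap stays visible, so sequences overlapping it are dropped and never trained on invented prices.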

Quick Summary Table

Task	Method
Normalize prices	MinMaxScaler [0,1] or StandardScaler (mean=0, std=1)
Data format	Tabular with Date, Open, High, Low, Close, Volume
Frame sequences	Past 60 days input to predict next day's Close
Handle missing data	Forward-fill or interpolate; preserve time order

Important Warning

You must apply the same normalization parameters (scaler) to both training and test data.

Train the scaler on the training set only, then apply it to unseen (validation/test) data.

Otherwise, you introduce data leakage and corrupt your results.
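
A minimal sketch of this rule, assuming min-max scaling applied per column and an 80/20 chronological split: the parameters are computed from the training rows only, saved, and then reused unchanged for the test rows and for the inverse transformation. All function names, the synthetic rows, and the JSON round-trip (standing in for saving to disk) are assumptions.

```python
import json

def fit_min_max(rows, columns):
    """Record each column's own min and max from the given rows."""
    return {c: {"min": min(r[c] for r in rows), "max": max(r[c] for r in rows)}
            for c in columns}

def transform(rows, params):
    """Scale each column with its stored min and max."""
    out = []
    for r in rows:
        scaled = dict(r)
        for c, p in params.items():
            span = p["max"] - p["min"]
            scaled[c] = (r[c] - p["min"]) / span if span else 0.0
        out.append(scaled)
    return out

def inverse(value, column, params):
    """Map a scaled value back to the original units of `column`."""
    p = params[column]
    return value * (p["max"] - p["min"]) + p["min"]

# Synthetic rows, oldest first.
closes = [100.0, 102.0, 101.0, 104.0, 106.0, 105.0, 108.0, 110.0, 112.0, 115.0]
rows = [{"Close": c, "Volume": 1_000_000 + 10_000 * i} for i, c in enumerate(closes)]

cut = int(len(rows) * 0.8)           # chronological split, no shuffling
train, test = rows[:cut], rows[cut:]

params = fit_min_max(train, ["Close", "Volume"])  # training rows only
params = json.loads(json.dumps(params))           # stand-in for save and reload

train_scaled = transform(train, params)
test_scaled = transform(test, params)
restored = inverse(test_scaled[-1]["Close"], "Close", params)
print(test_scaled[-1]["Close"], restored)
```

Test values can fall outside [0, 1] when prices move beyond the training range. That is expected and is the honest result of not letting the scaler see future data.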

freeradiantbunny.org
