freeradiantbunny.org

Preprocessing Stock Price Time-Series for Neural Networks

What Format is the Raw Data?

Normally, stock price data looks like this:

Date Open High Low Close Volume
2025-01-01 102.5 105.0 101.0 104.0 5000000
2025-01-02 104.2 106.5 103.5 105.5 4800000

Critical point: Time order must be preserved. You cannot shuffle the rows.
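
As a minimal sketch of loading rows in this format while keeping them in time order, the snippet below uses pandas. The names `RAW_CSV` and `load_prices` are hypothetical, and the comma-separated layout is an assumption; the inline text stands in for a real file, database export, or API response.

```python
import io
import pandas as pd

# Hypothetical CSV text (deliberately out of order) standing in for a real source.
RAW_CSV = """Date,Open,High,Low,Close,Volume
2025-01-02,104.2,106.5,103.5,105.5,4800000
2025-01-01,102.5,105.0,101.0,104.0,5000000
"""

def load_prices(source):
    """Read OHLCV rows and return them sorted chronologically, indexed by date."""
    df = pd.read_csv(source, parse_dates=["Date"])
    # Sort by date so every later step sees the rows in time order.
    return df.sort_values("Date").set_index("Date")

prices = load_prices(io.StringIO(RAW_CSV))
print(prices)
```

Sorting once at load time means later steps can rely on row position as a proxy for time.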


What Does It Mean to "Normalize Prices"?

Normalization means scaling the numeric values into a smaller, consistent range.

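Neural networks train better when all inputs share a similar scale. Otherwise large-valued features (Volume, in the millions) dominate small-valued ones (prices, near 100).

Benefits: gradient-based training tends to converge faster and more stably, and no feature dominates the weight updates purely because of its units.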


MinMaxScaler vs StandardScaler

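MinMaxScaler

Rescales each feature to the range [0, 1]: x' = (x - min) / (max - min). A single extreme value sets min or max, so it is sensitive to outliers.

StandardScaler

Centers each feature at mean 0 and scales it to standard deviation 1: x' = (x - mean) / std. The output is not bounded to a fixed range.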

How Should a System Admin Preprocess the Data?

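  1. Load the data from CSV, database, or API.
  2. Check for missing dates or fields.
  3. Handle missing data: forward-fill or interpolate.
  4. Normalize the features: MinMaxScaler or StandardScaler, fitted on the training set only.
  5. Frame into sequences: the past 60 days as input, the next day's Close as target.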
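
The "check for missing dates or fields" step can be sketched with pandas as follows. The sample frame is made up for illustration, and the business-day calendar from `pd.bdate_range` is an assumption: it skips weekends but knows nothing about exchange holidays, so a real check would need the exchange's trading calendar.

```python
import pandas as pd

# Made-up sample: 2025-01-03 (a Friday) is absent and one Close is empty.
idx = pd.to_datetime(["2025-01-01", "2025-01-02", "2025-01-06"])
df = pd.DataFrame({"Close": [104.0, 105.5, None]}, index=idx)

# Weekdays between the first and last row (assumed calendar, no holidays).
expected = pd.bdate_range(df.index.min(), df.index.max())

missing_dates = expected.difference(df.index)  # days with no row at all
missing_fields = df.isna().sum()               # empty cells per column

print(missing_dates)
print(missing_fields)
```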

How to Frame the Data into Sequences

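Suppose your dataset is 100 days long and the window is 60 days. Days 1-60 form the first input and day 61's Close is the first target. The window then slides forward one day at a time, ending with days 40-99 as input and day 100 as target, for 40 samples in total.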

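In Python:

# data: list of per-day rows (dicts) in chronological order
X, y = [], []
for i in range(60, len(data)):
    X.append(data[i-60:i])        # 60-day slice: rows i-60 .. i-1
    y.append(data[i]['Close'])    # target: Close on the day after the slice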
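
The same loop can be written against NumPy arrays so the result is ready for a network that expects a (samples, timesteps, features) input. This is a sketch under assumptions: `make_windows` and its `window` parameter are hypothetical names, and the synthetic `data` array stands in for 100 days of five scaled features with Close in column 3.

```python
import numpy as np

def make_windows(features, target, window=60):
    """Return (X, y): each X[k] holds `window` consecutive rows, y[k] the next target."""
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X, y = [], []
    for i in range(window, len(features)):
        X.append(features[i - window:i])  # rows i-window .. i-1
        y.append(target[i])               # value on the day after the window
    return np.array(X), np.array(y)

# Synthetic stand-in: 100 days x 5 features (Open, High, Low, Close, Volume).
data = np.arange(100 * 5, dtype=float).reshape(100, 5)
X, y = make_windows(data, data[:, 3], window=60)
print(X.shape, y.shape)  # (40, 60, 5) (40,)
```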
    

How to Handle Missing Data

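Forward-fill carries the last known value forward and uses only past data. Linear interpolation draws a line to the next known value, so it looks ahead in time. Use one or the other; after a forward-fill no interior gaps remain to interpolate.

df = df.ffill()                        # option 1: forward-fill (fillna(method='ffill') is deprecated in recent pandas)
df = df.interpolate(method='linear')   # option 2: linear interpolation between known values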
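
A small comparison of the two strategies on a made-up Close series with a two-day gap shows how they differ. The values are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up series with two missing days in the middle.
idx = pd.to_datetime(["2025-01-01", "2025-01-02", "2025-01-03", "2025-01-06"])
close = pd.Series([104.0, np.nan, np.nan, 107.0], index=idx, name="Close")

filled = close.ffill()                       # repeats 104.0 across the gap
interp = close.interpolate(method="linear")  # evenly spaced steps toward 107.0

print(pd.DataFrame({"ffill": filled, "linear": interp}))
```

Note that `method="linear"` treats the rows as equally spaced and ignores the actual dates in the index.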

Quick Summary Table

Task                  Method
Normalize prices      MinMaxScaler [0,1] or StandardScaler (mean=0, std=1)
Data format           Tabular with Date, Open, High, Low, Close, Volume
Frame sequences       Past 60 days input to predict next day's Close
Handle missing data   Forward-fill or interpolate; preserve time order

Important Warning

You must apply the same normalization parameters (scaler) to both training and test data.

Fit the scaler on the training set only, then apply it unchanged to unseen (validation/test) data.

Otherwise, you introduce data leakage and corrupt your results.
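
The rule can be sketched with plain NumPy min-max arithmetic. The prices and the split point are made up, and the split is chronological (earlier rows train, later rows test) to match the time-order requirement.

```python
import numpy as np

# Made-up closing prices in time order; the last two rows are the test period.
closes = np.array([100.0, 102.0, 101.0, 105.0, 110.0, 120.0]).reshape(-1, 1)
split = 4
train, test = closes[:split], closes[split:]

# Fit: compute min and max from the training rows only.
lo, hi = train.min(axis=0), train.max(axis=0)

# Transform: reuse the same parameters for both sets.
train_scaled = (train - lo) / (hi - lo)
test_scaled = (test - lo) / (hi - lo)

print(train_scaled.ravel())
print(test_scaled.ravel())
```

Here the test prices scale to 2.0 and 4.0, outside [0, 1], because they exceed anything seen in training. That is the honest outcome: refitting on the full series would pull them back into range only by leaking future information into the training inputs.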

