freeradiantbunny.org

Preprocessing Stock Price Time-Series for Neural Networks

What Format is the Raw Data?

Normally, stock price data looks like this:

Date        Open   High   Low    Close  Volume
2025-01-01  102.5  105.0  101.0  104.0  5000000
2025-01-02  104.2  106.5  103.5  105.5  4800000

Critical point: Time order must be preserved. You cannot shuffle the rows.
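
As a minimal sketch of loading such data while guaranteeing time order, the snippet below parses CSV text with Python's standard library and sorts the rows oldest-first. The inline RAW_CSV string and the load_rows helper are hypothetical stand-ins for a real file and loader; the column names follow the table above.

```python
import csv
import io
from datetime import date

# Hypothetical CSV text standing in for a file; rows are deliberately out of order.
RAW_CSV = """Date,Open,High,Low,Close,Volume
2025-01-02,104.2,106.5,103.5,105.5,4800000
2025-01-01,102.5,105.0,101.0,104.0,5000000
"""

PRICE_FIELDS = ("Open", "High", "Low", "Close")


def load_rows(text):
    """Parse CSV text into a list of dicts sorted oldest-first."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        row = {"Date": date.fromisoformat(rec["Date"])}
        for name in PRICE_FIELDS:
            row[name] = float(rec[name])
        row["Volume"] = int(rec["Volume"])
        rows.append(row)
    # Sort by date so every later step can rely on chronological order.
    rows.sort(key=lambda r: r["Date"])
    return rows


rows = load_rows(RAW_CSV)
```

Sorting once at load time means no later step has to re-check the order; nothing downstream should reorder the list.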


What Does It Mean to "Normalize Prices"?

Normalization means scaling the numeric values into a smaller, consistent range.

Neural networks train better when all inputs share a similar scale.

Benefits: large-valued columns such as Volume (in the millions) no longer dominate small-valued columns such as Close (around 100).


MinMaxScaler vs StandardScaler

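MinMaxScaler: rescales each feature linearly into the range [0, 1].

StandardScaler: shifts and rescales each feature to mean 0 and standard deviation 1.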
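
To make the two scalers concrete, here is a small sketch of the arithmetic each one performs on a single column. The function names and sample values are hypothetical; scikit-learn's MinMaxScaler and StandardScaler (in sklearn.preprocessing) apply the same formulas per column, with StandardScaler using the population standard deviation.

```python
from statistics import mean, pstdev

closes = [104.0, 105.5, 103.0, 107.5]  # hypothetical Close values


def min_max_scale(values):
    """Map values linearly so the minimum becomes 0 and the maximum 1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - lo) / span for v in values]


def standard_scale(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: avoid dividing by zero
    return [(v - mu) / sigma for v in values]


scaled_01 = min_max_scale(closes)
scaled_z = standard_scale(closes)
```

Min-max scaling is sensitive to a single extreme value, since that value defines one end of the range; standard scaling has no fixed bounds but is less distorted by one outlier.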


How Should a System Admin Preprocess the Data?

  1. Load the data from CSV, database, or API.
  2. Check for missing dates or fields.
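  3. Handle missing data: forward-fill or interpolate the gaps.
  4. Normalize the features: use MinMaxScaler or StandardScaler, fitted on the training set only.
  5. Frame into sequences: use the past 60 days as input and the next day's Close as the target.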
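
Step 2 is the one most often skipped, so here is a rough sketch of a missing-date check. It assumes, as a simplification, that every Monday-Friday should have a row; the missing_weekdays helper and the sample dates are hypothetical.

```python
from datetime import date, timedelta


def missing_weekdays(dates):
    """Return weekdays between the first and last date that have no row.

    Assumption: every Monday-Friday should have a row. Real exchanges also
    close on holidays, so a production check would consult a trading calendar.
    """
    present = set(dates)
    day, last = min(dates), max(dates)
    gaps = []
    while day <= last:
        if day.weekday() < 5 and day not in present:  # 0-4 are Mon-Fri
            gaps.append(day)
        day += timedelta(days=1)
    return gaps


# Hypothetical dates: Friday 2025-01-03 is absent.
dates = [date(2025, 1, 1), date(2025, 1, 2), date(2025, 1, 6)]
gaps = missing_weekdays(dates)
```

Weekends are skipped on purpose: they are not missing data, and filling them in would add rows that never traded.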

How to Frame the Data into Sequences

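Suppose your dataset is 100 days long and each input window covers 60 days. The first sample uses days 1-60 as input and day 61's Close as the target. The window then slides forward one day at a time until days 40-99 predict day 100, which gives 40 samples.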

In pseudocode:

X, y = [], []  # data: list of per-day dicts, oldest first
for i in range(60, len(data)):
    X.append(data[i-60:i])      # the 60 days before day i
    y.append(data[i]['Close'])  # target: Close on day i, the day right after the window
	
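A runnable version of the same loop might look like the sketch below. The make_sequences function, the WINDOW constant, and the synthetic 100-day series are hypothetical and chosen so the sample count is easy to verify.

```python
WINDOW = 60  # look-back length in days, as in the text


def make_sequences(rows, window=WINDOW, target="Close"):
    """Slide a fixed-length window over time-ordered rows.

    Returns (X, y): X[k] is `window` consecutive rows and y[k] is the
    target field of the row immediately after that window.
    """
    X, y = [], []
    for i in range(window, len(rows)):
        X.append(rows[i - window:i])
        y.append(rows[i][target])
    return X, y


# Hypothetical data: 100 days whose Close simply counts up from 1.0.
rows = [{"Close": float(day)} for day in range(1, 101)]
X, y = make_sequences(rows)
```

With 100 rows and a 60-day window this yields 100 - 60 = 40 samples. Consecutive windows overlap by 59 days, so neighboring samples are highly correlated.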

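How to Handle Missing Data

Forward-fill the gaps (carry the last known value forward) or interpolate between the neighboring values. Either way, keep the rows in time order.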

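Both strategies can be sketched in a few lines of plain Python. The helper names and the sample series are hypothetical, with None marking a missing value; in pandas the corresponding calls are DataFrame.ffill() and DataFrame.interpolate().

```python
def forward_fill(values):
    """Replace each None with the most recent earlier value."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)  # stays None if the series starts with a gap
    return filled


def interpolate(values):
    """Fill interior runs of None on a straight line between neighbors."""
    out = list(values)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    return out


closes = [104.0, None, None, 107.0]  # hypothetical series with a two-day gap
ffilled = forward_fill(closes)
interp = interpolate(closes)
```

Forward-fill uses only past values, so it is safe for anything that will run on live data. Interpolation also uses the value after the gap, which a live system would not yet have, so it suits offline cleanup better.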

Quick Summary Table

Task                 Method
Normalize prices     MinMaxScaler [0,1] or StandardScaler (mean=0, std=1)
Data format          Tabular with Date, Open, High, Low, Close, Volume
Frame sequences      Past 60 days input to predict next day's Close
Handle missing data  Forward-fill or interpolate; preserve time order

Important Warning

You must apply the same normalization parameters (the same fitted scaler) to both training and test data.

Fit the scaler on the training set only, then apply it unchanged to unseen (validation/test) data.

Otherwise, statistics from the held-out data leak into training and make your results look better than they are.
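
A minimal sketch of that rule, using min-max scaling on a single column: the split ratio, the sample prices, and the transform helper are hypothetical. With scikit-learn the same pattern is scaler.fit(train) followed by scaler.transform(test).

```python
closes = [100.0, 102.0, 104.0, 110.0, 120.0]  # hypothetical, oldest first

# Chronological split: the earliest 60% of rows train, the rest is held out.
split = len(closes) * 3 // 5
train, test = closes[:split], closes[split:]

# "Fit": learn the scaling parameters from the training data only.
lo, hi = min(train), max(train)


def transform(values):
    """Apply the training-set min and max to any later data."""
    return [(v - lo) / (hi - lo) for v in values]


train_scaled = transform(train)
test_scaled = transform(test)
```

Here the held-out prices scale to values above 1 because they exceed anything seen in training. That is expected for a rising price series and is not a bug; refitting the scaler on the test data to hide it would be the leakage the warning describes.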


Optional Next Step

Would you like an example of a small Perl script for data preprocessing (forward-fill, scaling) or a Python snippet using pandas and sklearn?