Introduction
Have you ever wondered if computers can predict stock prices? Well, they can—not perfectly, but they can learn from past data to make educated guesses about future prices. In this blog, we’ll explore how we can use a special kind of artificial intelligence (AI) called Long Short-Term Memory (LSTM) to predict the future prices of the Nifty50, India’s leading stock market index.
Don’t worry if you’re not a tech expert—we’ll explain everything in simple terms, just like teaching a friend!
How Does Stock Prediction Work?
Imagine you’re trying to predict the weather. You’d look at past weather patterns—like temperature, humidity, and rainfall—to guess if it’ll rain tomorrow. Similarly, stock prediction works by analyzing past stock prices to forecast future trends.
Here’s how we do it:
Get Historical Data – We download years of Nifty50 stock prices.
Clean the Data – Remove errors or missing values (like a teacher correcting a messy notebook).
Train the AI Model – Teach the computer to recognize patterns in stock prices.
Make Predictions – Ask the AI to predict future prices based on what it learned.
Now, let’s dive deeper into each step!
Step 1: Getting the Data
We use a library called yfinance (Yahoo Finance) to download Nifty50 stock prices from 2005 to 2025. This gives us a big table with daily prices—like a giant Excel sheet with dates and closing prices.
```python
import yfinance as yf

start_date = '2005-05-16'
end_date = '2025-05-15'
nifty_data = yf.download('^NSEI', start=start_date, end=end_date)
```
Think of this as downloading a history book of stock prices.
Step 2: Cleaning the Data
Sometimes, data has missing or incorrect entries (like a torn page in a book). We fix this by:
Sorting dates correctly.
Removing or filling missing values.
print(f"Missing values in the dataset: {nifty_data.isnull().sum().sum()}") nifty_data = nifty_data.sort_index()
This ensures our AI learns from clean, organized data.
Step 3: Scaling the Data (Making Numbers Easier to Work With)
Stock prices can be huge (like ₹20,000), but AI works better with smaller numbers (between 0 and 1). We use MinMaxScaler to shrink the numbers while keeping their relationships intact.
```python
from sklearn.preprocessing import MinMaxScaler

# Extract the closing prices as a column vector
nifty_close = nifty_data['Close'].values.reshape(-1, 1)

scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(nifty_close)
```
This is like converting kilometers into meters—same distance, just easier to handle.
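Under the hood, MinMaxScaler applies a simple formula: each price x becomes (x - min) / (max - min). You don't need to memorize it, but here's a tiny illustration with made-up prices:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

prices = np.array([[10000.0], [15000.0], [20000.0]])  # hypothetical prices
scaler = MinMaxScaler(feature_range=(0, 1))
print(scaler.fit_transform(prices).ravel())  # [0.  0.5 1. ]
```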
Step 4: Preparing the Data for AI Learning
AI learns from sequences. Imagine teaching a child to predict the next number in this sequence:
1, 2, 3, …? (Answer: 4)
Similarly, we give the AI sequences of 60 days’ stock prices and ask it to predict the 61st day.
```python
import numpy as np

def create_dataset(dataset, time_step=60):
    X, y = [], []
    for i in range(len(dataset) - time_step - 1):
        X.append(dataset[i:(i + time_step), 0])  # 60 days of data
        y.append(dataset[i + time_step, 0])      # Next day's price
    return np.array(X), np.array(y)

# train_data is the first 80% of scaled_data (see the full script below)
X_train, y_train = create_dataset(train_data, time_step=60)
```
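One detail worth calling out (it appears in the full script at the end of this post): Keras LSTM layers expect input shaped as [samples, time steps, features], so the windows need one extra reshape before training:

```python
# LSTM expects [samples, time steps, features]; here features = 1 (closing price)
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
```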
This way, the AI learns patterns like:
If prices rise for 10 days, will they fall soon?
Do big drops usually recover?
Step 5: Building the AI (LSTM Model)
Our AI is an LSTM (Long Short-Term Memory) network — a type of neural network great at learning sequences (like stock prices over time).
We build it like this:
First Layer (50 neurons) – Learns basic patterns.
Second Layer (50 neurons) – Learns deeper trends.
Third Layer (50 neurons) – Condenses the learned patterns into a summary, which a final Dense layer turns into a single predicted price.
Dropout Layers – Prevent overfitting (the model equivalent of a student who memorizes answers instead of learning concepts).
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout

model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(time_step, 1)))
model.add(Dropout(0.2))
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(units=50))
model.add(Dropout(0.2))
model.add(Dense(units=1))  # Final prediction
```
We then train the model using past data, just like a student practicing with old exam papers.
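The training step itself (taken from the full script at the end of this post) compiles the model and fits it, with early stopping so training halts once the validation loss stops improving:

```python
from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer='adam', loss='mean_squared_error')

# Stop training if validation loss doesn't improve for 10 epochs
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    epochs=50,
    batch_size=32,
    validation_split=0.1,  # hold out 10% of training data for validation
    callbacks=[early_stop],
    verbose=1
)
```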
Step 6: Making Predictions
After training, the AI can predict future prices. We test it on unseen data (like a surprise test) to see how well it performs.
```python
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)
```
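Remember, the predictions come out scaled between 0 and 1, so (as the full script does) we convert them back to rupee prices before comparing them to reality:

```python
# Undo the 0-1 scaling to recover actual rupee prices
train_predict = scaler.inverse_transform(train_predict)
test_predict = scaler.inverse_transform(test_predict)
y_train_actual = scaler.inverse_transform(y_train.reshape(-1, 1))
y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))
```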
We measure accuracy using:
RMSE (Root Mean Square Error) – How far, on average, predictions are from real prices (in rupees; lower is better).
R² Score – How well the model explains price movements (0 = no better than always guessing the average price, 1 = perfect prediction).
```python
import math
from sklearn.metrics import mean_squared_error, r2_score

train_rmse = math.sqrt(mean_squared_error(y_train_actual, train_predict))
test_r2 = r2_score(y_test_actual, test_predict)
```
If the AI lands within about 1% of the real price roughly 90% of the time, it's doing well!
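That "accuracy" figure comes from a simplified helper in the full script: it counts the fraction of predictions that fall within 1% of the actual price.

```python
import numpy as np

def calculate_accuracy(actual, predicted, threshold=0.01):
    # Fraction of predictions within `threshold` (1%) of the actual price
    within_threshold = np.abs(actual - predicted) <= threshold * actual
    return np.mean(within_threshold) * 100

test_accuracy = calculate_accuracy(y_test_actual, test_predict)
print(f"Testing Accuracy: {test_accuracy:.2f}%")
```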
Step 7: Predicting Future Prices
Finally, we ask the AI: "What will Nifty50 prices be in the next 30 days?"
We feed it the last 60 days’ data and let it predict day by day.
```python
last_60_days = scaled_data[-60:]
future_predictions = []

for _ in range(30):
    # Shape the window as [samples, time steps, features] for the LSTM
    X_future = last_60_days.reshape(1, time_step, 1)
    future_pred = model.predict(X_future)
    # Slide the window: drop the oldest day, append the new prediction
    last_60_days = np.append(last_60_days[1:], future_pred).reshape(-1, 1)
    future_predictions.append(future_pred[0, 0])
```
We then plot the predictions:
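The plotting code (from the full script) converts the predictions back to rupee prices and draws them in red after the last 100 days of real data:

```python
import datetime as dt
import matplotlib.pyplot as plt

# Convert the scaled predictions back to rupee prices
future_predictions = scaler.inverse_transform(
    np.array(future_predictions).reshape(-1, 1))

# Build calendar dates for the next 30 days
last_date = nifty_data.index[-1]
future_dates = [last_date + dt.timedelta(days=i) for i in range(1, 31)]

plt.figure(figsize=(14, 6))
plt.plot(nifty_data.index[-100:], nifty_data['Close'].values[-100:], label='Historical Data')
plt.plot(future_dates, future_predictions, label='Future Predictions', color='red')
plt.legend()
plt.show()
```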
The red line shows what the AI thinks will happen!
Note: the sample size here is kept small so the demo stays quick and reasonably accurate; for serious use, the model should be trained on a larger dataset.
Conclusion: Can AI Really Predict Stocks?
Yes — but with limitations.
✅ Good at:
Finding patterns in historical data.
Making short-term predictions.
❌ Not perfect at:
Predicting sudden crashes (like COVID-19).
Accounting for unexpected news (elections, wars).
Try It Yourself!
Want to run this code? Copy the full script below and try it in Google Colab or a Jupyter Notebook.
Here is the complete script:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM, Dropout
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import datetime as dt
import math

# Download market data
print("Downloading Nifty50 data...")
start_date = '2005-05-16'
end_date = '2025-05-15'
nifty_data = yf.download('^NSEI', start=start_date, end=end_date)

# Data cleaning and preprocessing
print("\nCleaning and preprocessing data...")
print(f"Missing values in the dataset: {nifty_data.isnull().sum().sum()}")
nifty_data = nifty_data.sort_index()
nifty_close = nifty_data['Close'].values.reshape(-1, 1)

# Scale the data
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(nifty_close)

# Split data into training and testing sets
train_size = int(len(scaled_data) * 0.8)
test_size = len(scaled_data) - train_size
train_data, test_data = scaled_data[0:train_size, :], scaled_data[train_size:len(scaled_data), :]
print(f"\nTraining data size: {train_size}, Testing data size: {test_size}")

def create_dataset(dataset, time_step=60):
    X, y = [], []
    for i in range(len(dataset) - time_step - 1):
        X.append(dataset[i:(i + time_step), 0])
        y.append(dataset[i + time_step, 0])
    return np.array(X), np.array(y)

# Create the dataset with time steps
time_step = 60
X_train, y_train = create_dataset(train_data, time_step)
X_test, y_test = create_dataset(test_data, time_step)

# Reshape input to be [samples, time steps, features] which is required for LSTM
X_train = X_train.reshape(X_train.shape[0], X_train.shape[1], 1)
X_test = X_test.reshape(X_test.shape[0], X_test.shape[1], 1)

# Build LSTM model
print("\nBuilding LSTM model...")
model = Sequential()
# First LSTM layer with 50 neurons and return_sequences=True to stack another LSTM layer
model.add(LSTM(units=50, return_sequences=True, input_shape=(time_step, 1)))
model.add(Dropout(0.2))  # Dropout to prevent overfitting
# Second LSTM layer with 50 neurons
model.add(LSTM(units=50, return_sequences=True))
model.add(Dropout(0.2))
# Third LSTM layer with 50 neurons
model.add(LSTM(units=50))
model.add(Dropout(0.2))
# Output layer
model.add(Dense(units=1))

# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')

# Early stopping to prevent overfitting
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

# Train the model
print("\nTraining the model...")
batch_size = 32
epochs = 50
history = model.fit(
    X_train, y_train,
    epochs=epochs,
    batch_size=batch_size,
    validation_split=0.1,  # Use 10% of training data for validation
    callbacks=[early_stop],
    verbose=1
)

# Make predictions and evaluate the model
print("\nMaking predictions...")
train_predict = model.predict(X_train)
test_predict = model.predict(X_test)

# Inverse transform to get actual values
train_predict = scaler.inverse_transform(train_predict)
test_predict = scaler.inverse_transform(test_predict)
y_train_actual = scaler.inverse_transform(y_train.reshape(-1, 1))
y_test_actual = scaler.inverse_transform(y_test.reshape(-1, 1))

# Calculate performance metrics
train_rmse = math.sqrt(mean_squared_error(y_train_actual, train_predict))
test_rmse = math.sqrt(mean_squared_error(y_test_actual, test_predict))
train_mae = mean_absolute_error(y_train_actual, train_predict)
test_mae = mean_absolute_error(y_test_actual, test_predict)
train_r2 = r2_score(y_train_actual, train_predict)
test_r2 = r2_score(y_test_actual, test_predict)

# Display results
print(f"\nTraining RMSE: {train_rmse:.2f}")
print(f"Testing RMSE: {test_rmse:.2f}")
print(f"Training MAE: {train_mae:.2f}")
print(f"Testing MAE: {test_mae:.2f}")
print(f"Training R^2 Score: {train_r2:.2f}")
print(f"Testing R^2 Score: {test_r2:.2f}")

# Calculate accuracy as a percentage (simplified for this context)
def calculate_accuracy(actual, predicted, threshold=0.01):
    within_threshold = np.abs(actual - predicted) <= threshold * actual
    accuracy = np.mean(within_threshold) * 100
    return accuracy

train_accuracy = calculate_accuracy(y_train_actual, train_predict)
test_accuracy = calculate_accuracy(y_test_actual, test_predict)
print(f"\nTraining Accuracy: {train_accuracy:.2f}%")
print(f"Testing Accuracy: {test_accuracy:.2f}%")

# Visualize the results
print("\nVisualizing results...")
train_dates = nifty_data.index[time_step + 1:train_size]
test_dates = nifty_data.index[train_size + time_step:-1]

plt.figure(figsize=(14, 6))
plt.plot(train_dates, y_train_actual, label='Actual Training Data')
plt.plot(train_dates, train_predict, label='Training Predictions')
plt.plot(test_dates, y_test_actual, label='Actual Testing Data')
plt.plot(test_dates, test_predict, label='Testing Predictions')
plt.title('Nifty50 Price Prediction')
plt.xlabel('Date')
plt.ylabel('Price (INR)')
plt.legend()
plt.show()

# Future predictions
print("\nMaking future predictions...")
last_60_days = scaled_data[-60:]
future_predictions = []
for _ in range(30):
    # Reshape data for prediction
    X_future = last_60_days.reshape(1, time_step, 1)
    # Make prediction
    future_pred = model.predict(X_future)
    # Append to the input data
    last_60_days = np.append(last_60_days[1:], future_pred)
    last_60_days = last_60_days.reshape(-1, 1)
    # Store the prediction
    future_predictions.append(future_pred[0, 0])

# Inverse transform to get actual values
future_predictions = np.array(future_predictions).reshape(-1, 1)
future_predictions = scaler.inverse_transform(future_predictions)

# Create future dates
last_date = nifty_data.index[-1]
future_dates = [last_date + dt.timedelta(days=i) for i in range(1, 31)]

# Visualize future predictions
plt.figure(figsize=(14, 6))
plt.plot(nifty_data.index[-100:], nifty_data['Close'].values[-100:], label='Historical Data')
plt.plot(future_dates, future_predictions, label='Future Predictions', color='red')
plt.title('Nifty50 Future Price Prediction')
plt.xlabel('Date')
plt.ylabel('Price (INR)')
plt.legend()
plt.show()

# Print model summary
print("\nModel Summary:")
model.summary()
```
Credits: ezcompounding
Final Thoughts
AI is a powerful tool for stock prediction, but it’s not a crystal ball. It helps investors make educated guesses, not certainties.
Would you trust AI for stock advice? Let us know in the comments! 🚀
Happy investing! 📈