🎯 Student Result Prediction using Random Forest Classifier

This project applies a Random Forest Classifier to predict whether a student will Pass or Fail based on features such as Age and Quiz Score.

The pipeline includes feature preparation, train-test split, model training, evaluation, and feature importance analysis.
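All of the snippets below operate on a pandas DataFrame named df. A minimal setup sketch is shown here; the file name student_results.csv is a placeholder, not necessarily the file shipped with this repository:

```python
import pandas as pd

# Load the student dataset; "student_results.csv" is a placeholder name.
# The columns referenced later include StudentID, Country, Age, Quiz Score,
# and the target column Result.
df = pd.read_csv("student_results.csv")
print(df.head())
```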


📂 Steps in the Pipeline

1️⃣ Prepare Features (X) and Label (y)

We separate the label and drop the columns that are not used as features:

  • StudentID → Not useful for prediction
  • Country → Categorical (not encoded for this task)
  • Result → Target variable (label)
```python
X = df.drop(columns=["StudentID", "Country", "Result"])
y = df["Result"]
```

2️⃣ Train-Test Split

Split the dataset into 80% training and 20% testing. Random state ensures reproducibility.

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```

3️⃣ Build & Train Random Forest Classifier

```python
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
```

4️⃣ Model Evaluation

Use accuracy score and classification report to check performance.

```python
from sklearn.metrics import accuracy_score, classification_report

print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, target_names=["Fail", "Pass"]))
```

5️⃣ Feature Importance

Visualize which features are most important in predicting the result.

```python
import matplotlib.pyplot as plt
import seaborn as sns

importances = model.feature_importances_
features = X.columns

plt.figure(figsize=(8, 5))
sns.barplot(x=importances, y=features, palette="viridis")
plt.title("Feature Importance in Random Forest")
plt.xlabel("Importance Score")
plt.ylabel("Features")
plt.tight_layout()
plt.show()
```

📊 Example Outputs

  • Accuracy: e.g., 0.87 (87%)
  • Classification Report: Precision, Recall, and F1-score for Pass and Fail
  • Feature Importance Plot: Shows which features most influenced the model

🚀 Next Steps

  • Encode categorical features like Country (a sketch follows below)
  • Hyperparameter tuning for Random Forest (GridSearchCV / RandomizedSearchCV); a sketch follows below
  • Compare with other models (Logistic Regression, XGBoost, etc.)
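One possible way to encode Country (first item above) is one-hot encoding. This is a minimal sketch under the same df assumption, not code from the current pipeline:

```python
import pandas as pd

# One-hot encode Country so it can be used as a numeric feature.
# drop_first=True drops one dummy per category to avoid a redundant column.
X_encoded = pd.get_dummies(
    df.drop(columns=["StudentID", "Result"]),
    columns=["Country"],
    drop_first=True,
)
y = df["Result"]
```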
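For hyperparameter tuning (second item above), one option is GridSearchCV over the training split created earlier. The grid values here are illustrative assumptions, not tuned settings for this dataset:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative parameter grid; adjust to the dataset size and runtime budget.
param_grid = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 5, 10],
}

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid=param_grid,
    cv=5,                  # 5-fold cross-validation on the training data
    scoring="accuracy",
    n_jobs=-1,
)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", search.best_score_)
```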

📌 Author

👤 Virul Methdinu Meemana
📍 Sri Lanka | Information & Technology Student at SLIIT
🔗 GitHub | LinkedIn