suubh/Machine-Learning-in-Python

Machine Learning with Python

This repository contains Machine Learning projects written in Python. All the projects are implemented in Jupyter Notebooks.

Libraries Required

The following are required to run the projects.

  • Python 3.6+
  • NumPy (for Linear Algebra)
  • Pandas (for Data Preprocessing)
  • Scikit-learn (for ML models)
  • Matplotlib (for Data Visualization)
  • Seaborn (for statistical data visualization)

The projects are divided into the categories listed below.

Supervised Learning

  • Linear Regression
    - Linear Regression Single Variables : A simple Linear Regression model of the linear relationship between Population and Profit for plot sales.
    - Linear Regression Multiple Variables : In this project, I build a Linear Regression model with multiple variables to predict the house price based on acres and number of rooms.

  • Logistic Regression : In this project, I train a binary Logistic Regression classifier to predict whether a student will be selected, based on mid-semester and end-semester marks.

  • Support Vector Machine : In this project, I build a Support Vector Machine classifier on the Social Network Ads dataset. Given a user's age and estimated salary, it predicts whether the user will buy the product after seeing the ads. It uses the Radial Basis Function (RBF) kernel of SVM. (90.83%)

  • K Nearest Neighbours : K Nearest Neighbours, or KNN, is one of the simplest machine learning algorithms. In this project, I build a KNN classifier on the Iris Species Dataset that predicts the three species of Iris from four features: sepal_length, sepal_width, petal_length and petal_width.

  • Naive Bayes : In this project, I build a Naive Bayes classifier to predict the category of a message, using the 20 Newsgroups dataset loaded with sklearn's fetch_20newsgroups.

  • Decision Tree Classification : In this project, I use the Iris Dataset with a Decision Tree Classifier, which gives an accuracy of 96.7%, lower than KNN (98.33%).

  • Random Forest Classification : In this project, I use a Random Forest Classifier (90.0%) and a Random Forest Regressor (61.8%) on the Social Network Ads dataset.
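
As a rough illustration of how several of the classifiers above can be compared, here is a minimal sketch. It is not the notebooks' code: it assumes scikit-learn's bundled Iris data (`load_iris`) for every model, whereas the SVM and Random Forest projects use the Social Network Ads dataset. The function name `compare_classifiers`, the 80/20 split, the random seed and all hyperparameters are illustrative choices, so the accuracies it prints will not match the figures quoted above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(test_size=0.2, seed=0):
    """Fit four classifiers on Iris and return their test accuracies.

    Hypothetical helper: the split and hyperparameters are assumptions,
    not values taken from the notebooks.
    """
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )

    models = {
        # KNN and SVM are distance-based, so the features are standardized first.
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        # Tree-based models do not need feature scaling.
        "decision_tree": DecisionTreeClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
    }

    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # score() returns mean accuracy on the held-out test set.
        scores[name] = model.score(X_test, y_test)
    return scores


if __name__ == "__main__":
    for name, accuracy in compare_classifiers().items():
        print(f"{name}: {accuracy:.3f}")
```

With only 150 samples in Iris, the accuracy of each model can shift noticeably with the random split, so a single number is best read as indicative rather than as a ranking of the algorithms.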

Unsupervised Learning

  • K Means Clustering : K-Means clustering is used to find intrinsic groups within an unlabelled dataset and draw inferences from them. This is one of the most detailed projects. I implement K-Means Clustering on the Credit Card Dataset to cluster credit card users based on their features. I scaled the data using StandardScaler because standardizing the features improves convergence. I also implemented the Elbow Method to search for the best number of clusters. To visualize the dataset, I used PCA (Principal Component Analysis) for dimensionality reduction, as the dataset has a large number of features. Finally, I used the Silhouette Score to evaluate the quality of the clustering. It ranges from -1 to 1, and I got a score of 0.203.
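
The steps described above (scaling, the Elbow Method, K-Means, Silhouette Score, PCA for plotting) might be strung together roughly as follows. This is a sketch under assumptions, not the notebook's code: `cluster_pipeline` is a hypothetical helper, the candidate range of k and the chosen k are placeholders, and any numeric feature matrix can stand in for the Credit Card Dataset.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_pipeline(X, k_values=range(2, 9), chosen_k=4, seed=0):
    """Scale, run the elbow search, cluster, score and project to 2D.

    Hypothetical helper: k_values and chosen_k are assumptions. In practice
    chosen_k would be read off the elbow in the inertia curve.
    """
    # Standardize so that every feature contributes equally to distances.
    X_scaled = StandardScaler().fit_transform(X)

    # Elbow Method: record the within-cluster sum of squares for each k.
    inertias = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X_scaled)
        inertias[k] = km.inertia_

    # Final clustering with the chosen number of clusters.
    model = KMeans(n_clusters=chosen_k, n_init=10, random_state=seed)
    labels = model.fit_predict(X_scaled)

    # Silhouette Score lies in [-1, 1]; higher means better-separated clusters.
    score = silhouette_score(X_scaled, labels)

    # PCA down to two components, used only for visualization.
    X_2d = PCA(n_components=2, random_state=seed).fit_transform(X_scaled)
    return inertias, labels, score, X_2d
```

Plotting `inertias` against k gives the elbow curve, and a scatter plot of `X_2d` colored by `labels` gives the 2D view of the clusters. The silhouette score here is computed on the scaled features rather than on the PCA projection, which is one reasonable choice among several.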

NLP (Natural Language Processing)

Data Cleaning and Preprocessing