 
How to use ML for Wine Quality Prediction?
This tutorial uses a publicly available wine quality dataset from Kaggle: the "Wine Quality Dataset," available at "https://www.kaggle.com/datasets/yasserh/wine-quality-dataset."
The dataset is a .csv file whose columns describe measured properties of each wine, such as 'fixed acidity,' 'volatile acidity,' 'pH,' and 'density,' along with a 'quality' score. The 'quality' column is separated from the other columns at the start and used as the target variable, and the model is trained on the remaining columns.
Here is the Python code to predict the wine quality.
- Importing the necessary libraries. 
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import matplotlib.pyplot as plt
- Import the wine quality dataset 
wine = pd.read_csv('/Users/someswarpal/Downloads/WineQT.csv')
- Drop the column named quality.
X = wine.drop(columns=['quality'])
y = wine['quality']
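Depending on the copy of the CSV, the file may also contain a row-identifier column that carries no information about the wine. The sketch below assumes such a column is named `Id` (an assumption, not something the tutorial states) and uses a tiny made-up DataFrame in place of the real file to show how the features and target can be separated while leaving identifier columns out.

```python
import pandas as pd

# Synthetic stand-in for WineQT.csv; the values are made up for illustration.
wine = pd.DataFrame({
    "fixed acidity": [7.4, 7.8, 11.2, 6.9],
    "pH": [3.51, 3.20, 3.16, 3.30],
    "quality": [5, 5, 6, 7],
    "Id": [0, 1, 2, 3],  # assumed identifier column; may not exist in every copy
})

# Drop the target plus any identifier-like columns that are actually present.
non_features = [c for c in ["quality", "Id"] if c in wine.columns]
X = wine.drop(columns=non_features)
y = wine["quality"]

print(list(X.columns))
print(y.tolist())
```

Checking `c in wine.columns` first keeps the same code working whether or not the identifier column is present.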
- Split the data into testing and training sets. 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
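To make the effect of `test_size=0.2` and `random_state=42` concrete, here is a minimal sketch on randomly generated data (not the wine dataset). It checks that 20% of the rows go to the test set and that repeating the call with the same `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random placeholder data: 100 rows, 3 features, integer scores from 3 to 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(3, 9, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Same arguments again: the shuffle is seeded, so the split should be identical.
X_train2, X_test2, y_train2, y_test2 = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
print(np.array_equal(X_test, X_test2))
```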
- Create a linear regression model 
model = LinearRegression()
- Train the model 
model.fit(X_train, y_train)
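After fitting, a `LinearRegression` model exposes one coefficient per input column in `coef_` and a constant term in `intercept_`. The sketch below uses made-up data with a known linear relationship (the numbers 0.5, -2.0, and 1.0 are illustrative, not estimates from the wine dataset) to show how the coefficients can be paired with column names for inspection.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = pd.DataFrame({
    "alcohol": rng.uniform(8, 14, size=200),
    "volatile acidity": rng.uniform(0.1, 1.2, size=200),
})
# Made-up noiseless relationship, chosen only so the fitted values are easy to check.
y = 0.5 * X["alcohol"] - 2.0 * X["volatile acidity"] + 1.0

model = LinearRegression()
model.fit(X, y)

# Pair each learned coefficient with the name of its feature column.
coefficients = pd.Series(model.coef_, index=X.columns)
print(coefficients)
print("Intercept:", model.intercept_)
```

On real data the coefficients depend on the scale of each feature, so their magnitudes are not directly comparable unless the features are standardized first.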
- Make predictions on the test set.
y_pred = model.predict(X_test)
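When judging predictions such as `y_pred`, a single error number is easier to interpret next to a reference point. The sketch below, on synthetic data rather than the wine dataset, computes the mean squared error, its square root (which is in the same units as the quality score), the R² score, and the error of a naive baseline that always predicts the training-set mean. The data-generating formula is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic features and integer "quality-like" scores between 3 and 8.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
raw = 6 + X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=300)
y = np.clip(np.round(raw), 3, 8)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)              # error in the same units as the score
r2 = r2_score(y_test, y_pred)    # share of variance explained on the test set

# Naive baseline: always predict the mean score seen during training.
baseline_pred = np.full(len(y_test), y_train.mean())
baseline_mse = mean_squared_error(y_test, baseline_pred)

print("MSE:", mse)
print("RMSE:", rmse)
print("R^2:", r2)
print("Baseline MSE:", baseline_mse)
```

A model is only useful if its error is clearly below the baseline error; an MSE on its own does not show that.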
- Evaluate the model 
mse = mean_squared_error(y_test, y_pred)
print("Mean Squared Error:", mse)
Output
Mean Squared Error: 0.38242835212919696
- Calculate the mean quality for each category
mean_quality = wine.groupby('quality')['quality'].mean()
- Find the category with the highest mean quality 
best_quality = mean_quality.idxmax()
best_mean_quality = mean_quality.max()
- Print the summary for best Wine. 
print("Summary of Wine Quality:")
print("----------------------------")
print("Best Wine Quality Category:", best_quality)
print("Mean Quality Score:", best_mean_quality)
Output
Summary of Wine Quality:
----------------------------
Best Wine Quality Category: 8
Mean Quality Score: 8.0
- Find the category with the lowest mean quality 
worst_quality = mean_quality.idxmin()
worst_mean_quality = mean_quality.min()
- Print the summary for worst Wine 
print("Summary of Wine Quality:")
print("----------------------------")
print("Worst Wine Quality Category:", worst_quality)
print("Mean Quality Score:", worst_mean_quality)
Output
Summary of Wine Quality:
----------------------------
Worst Wine Quality Category: 3
Mean Quality Score: 3.0
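Note that grouping by 'quality' and then averaging 'quality' returns each group's own label, which is why the best and worst categories report 8.0 and 3.0. A possibly more informative summary, sketched here on a small made-up DataFrame (the values are illustrative, not taken from the dataset), is to count how many wines fall into each quality level and to average the other columns within each level.

```python
import pandas as pd

# Tiny synthetic stand-in for the wine data.
wine = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2, 12.8, 13.0],
    "volatile acidity": [0.70, 0.88, 0.50, 0.40, 0.30, 0.35],
    "quality": [5, 5, 6, 6, 8, 8],
})

# Averaging 'quality' within groups of 'quality' reproduces the group labels.
mean_quality = wine.groupby("quality")["quality"].mean()
print((mean_quality.index == mean_quality.values).all())

# Number of wines at each quality level.
counts = wine["quality"].value_counts().sort_index()
print(counts)

# Average of the other columns within each quality level.
feature_means = wine.groupby("quality")[["alcohol", "volatile acidity"]].mean()
print(feature_means)
```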
Conclusion
In conclusion, the code reads the wine quality dataset and separates it into input features (X) and the target variable (y). It splits the data into training and test sets, trains a linear regression model on the training set, makes predictions on the test set, and uses the mean squared error to measure how well the model performs.
The code also computes the average quality for each quality category and reports the categories with the highest and lowest averages. Scatter plots, histograms, box plots, bar charts, line plots, correlation heatmaps, and pie charts are some of the visualizations that could be built on top of this analysis to show how different features relate to the quality of the wine.
Overall, the code covers the basic workflow for the wine quality dataset, from loading the data to modeling and evaluation. It shows how widely used libraries such as Pandas and scikit-learn handle these steps, and libraries such as NumPy, matplotlib, and Seaborn can extend the analysis with numerical work and plots that help in understanding the dataset.
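The tutorial code imports matplotlib but does not draw any plots. As one example of the visualizations mentioned above, the following sketch draws a correlation heatmap with plain matplotlib. It uses synthetic columns in place of the real CSV, and the non-interactive `Agg` backend is an assumption made so the sketch also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; assumed here for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the wine data; the relationships are made up.
rng = np.random.default_rng(0)
alcohol = rng.uniform(8, 14, size=200)
wine = pd.DataFrame({
    "alcohol": alcohol,
    "pH": rng.normal(3.3, 0.15, size=200),
    "quality": np.clip(np.round(0.6 * alcohol + rng.normal(scale=0.7, size=200)), 3, 8),
})

# Pairwise Pearson correlations between all numeric columns.
corr = wine.corr()

fig, ax = plt.subplots(figsize=(4, 4))
image = ax.imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.columns)))
ax.set_yticklabels(corr.columns)
fig.colorbar(image, ax=ax, label="Pearson correlation")
fig.tight_layout()
# Call fig.savefig(...) with a path of your choice, or plt.show() in an
# interactive session, to view the figure.
```

With the real dataset, replacing the synthetic DataFrame with the one loaded by `pd.read_csv` should be the only change needed.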
