Machine Learning has always sounded like something challenging and technical. But the more I study it, the more I realize it’s simply about teaching computers to learn from data. One of the most important branches of Machine Learning I’ve been exploring lately is Supervised Learning—and in this post, I want to focus specifically on classification, sharing what I’ve learned so far, the models I’ve used, and some of the challenges I’ve faced as a student diving into this fascinating world.
What is Supervised Learning?
As a former teacher, I have realized that supervised learning is like teaching a child using flashcards. You show them an apple, tell them “this is an apple,” and do the same with oranges, bananas, and so on. Over time, they start recognizing fruits on their own.
In the same way, supervised learning uses labeled data—meaning the input data already comes with the correct answers (labels). The algorithm studies this relationship and later predicts labels for unseen data.
For example:
- If we feed a model patient data (like age, blood pressure, sugar levels) with labels (“diabetic” or “not diabetic”), the model learns to classify new patients into these categories.
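To make that concrete, here’s a tiny sketch of the idea in scikit-learn. The numbers below are completely made up for illustration—each row is a hypothetical patient described as [age, blood pressure, sugar level], with a label of 1 (“diabetic”) or 0 (“not diabetic”):

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical patient features: [age, blood pressure, sugar level]
X = [[50, 140, 180],
     [30, 115, 90],
     [62, 150, 200],
     [25, 110, 85]]
y = [1, 0, 1, 0]  # made-up labels: 1 = diabetic, 0 = not diabetic

model = LogisticRegression()
model.fit(X, y)  # learn the relationship between features and labels

# Predict the label for an unseen patient
print(model.predict([[45, 135, 170]]))
```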
How Classification Works
Classification is all about sorting things into groups. The data has features (inputs), and the task is to predict which class (output) each data point belongs to.
Here’s the step-by-step way I think about it:
- Collect and label data – You need a dataset where the right answers (classes) are already known.
- Train the model – Feed this data to an algorithm so it can learn the relationship between features and labels.
- Test the model – Check how well it predicts on unseen data.
- Deploy – Use it to make real-world decisions.
A simple example from daily life: Gmail classifying emails into Spam or Not Spam. That’s binary classification. More complex tasks, like classifying animals into cats, dogs, or birds, are called multi-class classification.
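To tie those steps to the spam example, here’s a small sketch I might write with scikit-learn. The emails and labels are invented, and a Naive Bayes classifier just stands in for whatever Gmail actually uses:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Step 1: a toy labeled dataset (1 = spam, 0 = not spam; all made up)
emails = ["win a free prize now", "meeting at noon tomorrow",
          "free money click here", "project report attached"]
labels = [1, 0, 1, 0]

# Turn text into word-count features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Step 2: train the model on the labeled examples
clf = MultinomialNB()
clf.fit(X, labels)

# Step 3: test on an unseen email
test = vectorizer.transform(["free prize inside"])
print(clf.predict(test))  # likely [1], i.e. spam
```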
Types of Classification
When we first started learning about classification in class, I found it really helpful to understand that classification itself has different types:
Binary Classification – Only two classes (e.g., spam vs not spam).
Multi-Class Classification – More than two classes, but each data point belongs to just one (e.g., classifying fruits into apple, banana, or mango).
Multi-Label Classification – Each data point can belong to multiple categories at once (e.g., tagging a photo as “beach,” “sunset,” and “friends”).
Understanding these types cleared up a lot of confusion for me when I was getting started!
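Multi-label classification felt the most abstract to me until I tried it in code. Here’s a minimal sketch, assuming some made-up photo features: MultiLabelBinarizer turns each set of tags into a row of 0s and 1s, and OneVsRestClassifier fits one classifier per tag:

```python
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical photo features and tag sets (invented for illustration)
X = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.1, 0.2]]
tags = [["beach"], ["beach", "sunset"], ["sunset", "friends"], ["friends"]]

# Convert tag sets into a binary indicator matrix (one column per tag)
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

# Fit one binary classifier per tag
clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, Y)

# A new photo can come back with several tags at once
pred = clf.predict([[0.85, 0.6]])
print(mlb.inverse_transform(pred))  # e.g. [("beach", "sunset")]
```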
Common Models Used for Classification
While exploring classification, I came across several algorithms, each with its own strengths and weaknesses:
- Logistic Regression – Despite its name, it’s actually used for classification.
- Decision Trees – Easy to understand and interpret.
- Random Forests – A collection of decision trees working together.
- Support Vector Machines (SVMs) – Great at finding decision boundaries, but can be computationally heavy.
- k-Nearest Neighbors (kNN) – Looks at “neighbors” to decide the class.
- Neural Networks – Super powerful for complex tasks like image and speech recognition.
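If you want a feel for how some of these compare, here’s a quick sketch that cross-validates several of them on the Iris dataset. The scores will vary from dataset to dataset—which is kind of the point:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

models = {
    "Logistic Regression": LogisticRegression(max_iter=200),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(),
    "kNN": KNeighborsClassifier(),
}

# 5-fold cross-validation: average accuracy across five train/test splits
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")
```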
💻 A Simple Python Example
When I was first learning classification, writing the code felt overwhelming. But once I discovered scikit-learn, things clicked. Here’s a simple example using Logistic Regression on the Iris dataset:
```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train model
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)

# Predict & evaluate
y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
```
Seeing the accuracy printed out for the first time was a fulfilling moment for me—the computer had actually “learned” from data!
My Personal Insights
As a student in Machine Learning, I find classification exciting because it’s so close to real life. Every day, we make classifications in our minds—deciding if a matatu is too full, whether a mango is ripe, or even whether it will rain judging from the sky.
One big lesson: no algorithm is a silver bullet. Sometimes a simple logistic regression beats a fancy neural network, depending on the dataset.
Challenges I’ve Faced
It hasn’t been smooth sailing. Here are some of the struggles I’ve encountered while working with classification:
- Understanding the Types of Supervised Learning: Differentiating between classification and regression was tough at first. Writing their Python code also felt intimidating because I didn’t know which libraries to use.
- Data Quality: Missing values, duplicates, or wrong labels can ruin everything.
- Overfitting: My decision trees once performed perfectly on training data but terribly on test data (see the sketch after this list).
- Computational Resources: Neural networks are amazing, but without a good GPU they can be painfully slow, so I use Google Colab.
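Here’s the overfitting sketch I promised above: comparing train vs. test accuracy for an unlimited-depth tree and one capped at max_depth=3. Iris is an easy dataset, so the gap here may be small—on messier data, the gap I saw was much bigger:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# An unlimited tree can memorize the training set (train accuracy 1.0);
# capping the depth trades some training accuracy for generalization.
for depth in [None, 3]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: "
          f"train={tree.score(X_train, y_train):.3f}, "
          f"test={tree.score(X_test, y_test):.3f}")
```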
Conclusion
Supervised learning, and classification in particular, has given me a new appreciation of how data can drive intelligent decisions. From simple logistic regression to powerful neural networks, the journey of trying, failing, debugging, and improving has taught me not just technical skills but also patience.
I’m still learning, but one thing is clear: classification is not just about algorithms—it’s about asking the right questions, preparing the right data, and interpreting results responsibly.
I’d love to hear your thoughts in the comments below!