Andrés Baamonde Lozano

Talking about Machine Learning (I): Setup

The next couple of posts in this series will be a tutorial on machine learning, one of the most popular branches of AI.

Environment

I will work with the following libraries: NumPy, SciPy, scikit-learn, and matplotlib. I built a tiny install script.

mkdir -p talkingaboutml/talkingaboutml
python3 -m virtualenv talkingaboutml/venv
talkingaboutml/venv/bin/pip install numpy scipy scikit-learn matplotlib

Now your talkingaboutml directory looks like:

talkingaboutml/
├── talkingaboutml (here we store our examples)
└── venv
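
To confirm the environment is ready, here is a quick sanity check (a minimal sketch; run it with the venv's interpreter, talkingaboutml/venv/bin/python):

# Hypothetical helper: just verifies that the installed packages import
import numpy
import scipy
import sklearn
import matplotlib

print(numpy.__version__, scipy.__version__,
      sklearn.__version__, matplotlib.__version__)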

First example

In our first example I will use the scikit-learn datasets (available in sklearn.datasets); there are many example datasets to pick from. I chose iris, a multi-class classification dataset.

As a first example, I will train a simple classifier and run a prediction.

We need some imports: the datasets, the accuracy metric, and a linear SVC:

from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

Load the dataset; these datasets come already divided into data and target.

iris = datasets.load_iris()
X = iris.data    # each register, an iris with its features
y = iris.target  # classification for each register

feature_number = X.shape[1]
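
A quick look at what we just loaded (an optional sketch, using only attributes that load_iris provides):

print(X.shape)            # (150, 4): 150 samples, 4 features
print(feature_number)     # 4
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
print(set(y))             # {0, 1, 2}: one label per species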

Create the classifier, train it, and run a prediction.

clf = SVC(kernel='linear', C=1.0, probability=True, random_state=0)  # 'Linear SVC'
clf.fit(X, y)  # Train
y_pred = clf.predict(X)
accuracy = accuracy_score(y, y_pred)
print(accuracy)
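
Note that here we predict on the same data we trained on, so the accuracy is optimistic. A minimal sketch of a fairer evaluation with a held-out split (the 70/30 split is an arbitrary choice):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = SVC(kernel='linear', C=1.0, probability=True, random_state=0)
clf.fit(X_train, y_train)                           # train on 70% of the data
print(accuracy_score(y_test, clf.predict(X_test)))  # evaluate on the unseen 30%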

So... let's do this. In this example I train a single classifier with different C (penalty) values. This parameter tells the SVM how much you want to avoid misclassifying each training example. A good explanation can be found here or here.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data    # each register, an iris with its features
y = iris.target  # classification for each register

feature_number = X.shape[1]

penalties = list(np.arange(0.5, 10.0, 0.1))
accs = []
for C in penalties:
    clf = SVC(kernel='linear', C=C, probability=True, random_state=0)  # 'Linear SVC'
    clf.fit(X, y)  # Train
    y_pred = clf.predict(X)
    accuracy = accuracy_score(y, y_pred)
    accs.append(accuracy)

# plot the data
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(penalties, accs, 'r')
plt.show()

(Figure: accuracy as a function of the penalty C.)

As we can see, the penalty factor matters: if C is too large, the SVM tries very hard to classify every training example correctly, and the model may overfit.
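
One way to see the effect of C empirically is to inspect the fitted model for a small and a large value (an illustrative sketch; SVC exposes the per-class support-vector counts as n_support_):

for C in (0.5, 100.0):
    clf = SVC(kernel='linear', C=C, random_state=0).fit(X, y)
    print(C, clf.n_support_)  # number of support vectors per class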
