Andrés Baamonde Lozano

Talking about Machine Learning (I): Setup

The next couple of posts in this series will be a tutorial on machine learning, one of the most popular branches of AI.

Environment

I will work with the following libraries: NumPy, SciPy, scikit-learn, and matplotlib. I built a tiny install script.

mkdir -p talkingaboutml/talkingaboutml
python3 -m virtualenv talkingaboutml/venv
talkingaboutml/venv/bin/pip install numpy scipy scikit-learn matplotlib

Now your talkingaboutml directory looks like:

talkingaboutml/
├── talkingaboutml (here we store our examples)
└── venv
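
To confirm the environment is ready, here is a quick sanity check (a minimal sketch; run it with the venv's interpreter, talkingaboutml/venv/bin/python):

# Hypothetical helper: just verifies that the installed packages import
import numpy
import scipy
import sklearn
import matplotlib

print(numpy.__version__, scipy.__version__,
      sklearn.__version__, matplotlib.__version__)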

First example

In our first example I will use the scikit-learn datasets (available in sklearn.datasets); there are many example datasets to pick from. I chose iris, a multi-class classification dataset.

As a first example, I will train a simple classifier and run a prediction.

We need some imports: the datasets, the accuracy metric, and a linear SVC:

from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

Load the dataset; these datasets come already divided into data and target.

iris = datasets.load_iris()
X = iris.data    # each register, an iris with its features
y = iris.target  # classification for each register

feature_number = X.shape[1]
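
A quick look at what we just loaded (an optional sketch, using only attributes that load_iris provides):

print(X.shape)            # (150, 4): 150 samples, 4 features
print(feature_number)     # 4
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
print(set(y))             # {0, 1, 2}: one label per species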

Create the classifier, train it, and run a prediction.

clf = SVC(kernel='linear', C=1.0, probability=True, random_state=0)  # 'Linear SVC'
clf.fit(X, y)  # Train
y_pred = clf.predict(X)
accuracy = accuracy_score(y, y_pred)
print(accuracy)
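
Note that here we predict on the same data we trained on, so the accuracy is optimistic. A minimal sketch of a fairer evaluation with a held-out split (the 70/30 split is an arbitrary choice):

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = SVC(kernel='linear', C=1.0, probability=True, random_state=0)
clf.fit(X_train, y_train)                           # train on 70% of the data
print(accuracy_score(y_test, clf.predict(X_test)))  # evaluate on the unseen 30%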

So... let's do this. In this example I train a single classifier with different C (penalty) values. This parameter tells the SVM how much you want to avoid misclassifying each training example. A good explanation can be found here or here.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data    # each register, an iris with its features
y = iris.target  # classification for each register

feature_number = X.shape[1]

penalties = list(np.arange(0.5, 10.0, 0.1))
accs = []
for C in penalties:
    clf = SVC(kernel='linear', C=C, probability=True, random_state=0)  # 'Linear SVC'
    clf.fit(X, y)  # Train
    y_pred = clf.predict(X)
    accuracy = accuracy_score(y, y_pred)
    accs.append(accuracy)

# plot the data
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(penalties, accs, 'r')
plt.show()

(Figure: accuracy as a function of the penalty C.)

As we can see, the penalty factor matters: if C is too large, the SVM tries very hard to classify every training example correctly, and the model may overfit.
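
One way to see the effect of C empirically is to inspect the fitted model for a small and a large value (an illustrative sketch; SVC exposes the per-class support-vector counts as n_support_):

for C in (0.5, 100.0):
    clf = SVC(kernel='linear', C=C, random_state=0).fit(X, y)
    print(C, clf.n_support_)  # number of support vectors per class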
