The next couple of posts in this series will be a tutorial about machine learning, one of the most popular branches of AI.
Environment
I will work with the following libraries: NumPy, SciPy, scikit-learn and matplotlib. I built a tiny install script:
```shell
mkdir -p talkingaboutml/talkingaboutml
python3 -m virtualenv talkingaboutml/venv
talkingaboutml/venv/bin/pip install numpy scipy scikit-learn matplotlib
```

Now your talkingaboutml directory looks like this:
```
talkingaboutml/
├── talkingaboutml   (here we store our examples)
└── venv
```

First example
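For our first example I will use one of the scikit-learn toy datasets (available in sklearn.datasets); there are many example datasets to choose from. I chose iris, a multi-class classification dataset: 150 iris flowers, each described by 4 measurements and labeled with one of 3 species.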
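Before training anything, it can help to take a quick look at the data. A minimal sketch that only uses attributes of the object returned by `load_iris()` (`data`, `target`, `feature_names`, `target_names`):

```python
import numpy as np
from sklearn import datasets

# Load the bundled iris dataset (shipped with scikit-learn, no download needed)
iris = datasets.load_iris()
X, y = iris.data, iris.target

print(X.shape)             # (150, 4): 150 samples, 4 features each
print(iris.feature_names)  # names of the 4 measured features
print(iris.target_names)   # names of the 3 classes
print(np.bincount(y))      # [50 50 50]: number of samples in each class
```

The classes are perfectly balanced, so plain accuracy is a reasonable first metric for this dataset.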
As a first example, I will train a simple classifier and use it to make predictions.
We need a few imports: the datasets module, the accuracy metric and SVC (a support vector classifier), which we will use with a linear kernel:
```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC
```

Load the dataset. These datasets are already split into data (the features) and target (the labels):
```python
iris = datasets.load_iris()
X = iris.data  # each row is one iris flower, described by its features
y = iris.target  # the class label of each row
feature_number = X.shape[1]  # number of features per sample
```

Create the classifier, train it and predict:
```python
clf = SVC(kernel='linear', C=1.0, probability=True, random_state=0)  # SVC with a linear kernel
clf.fit(X, y)  # train
y_pred = clf.predict(X)  # predict on the same data we trained on
accuracy = accuracy_score(y, y_pred)  # fraction of correctly classified samples
print(accuracy)
```

So... let's go a step further: in this example I train the same classifier with different values of C (the penalty parameter). This parameter tells the SVM how much you want to avoid misclassifying each training example. A good explanation can be found here or here.
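For reference, this is the standard soft-margin SVM training problem for two classes (labels $y_i \in \{-1, +1\}$), which is what C weights. C multiplies the total slack, so a larger C makes margin violations and misclassified training points more expensive:

$$
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1,\dots,n
$$

Each slack variable $\xi_i$ measures how far training example $i$ violates the margin. With a small C, violations are cheap and the optimizer prefers a wide margin; with a large C, it narrows the margin to classify more training points correctly. scikit-learn's SVC handles the three iris classes by solving one such binary problem per pair of classes (one-vs-one).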
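```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

iris = datasets.load_iris()
X = iris.data  # each row is one iris flower, described by its features
y = iris.target  # the class label of each row
feature_number = X.shape[1]

penalties = list(np.arange(0.5, 10.0, 0.1))  # C from 0.5 up to (not including) 10.0, step 0.1
accs = []
for C in penalties:
    clf = SVC(kernel='linear', C=C, probability=True, random_state=0)  # SVC with a linear kernel
    clf.fit(X, y)  # train
    y_pred = clf.predict(X)  # predict on the training data
    accuracy = accuracy_score(y, y_pred)
    accs.append(accuracy)

# plot training accuracy against C
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.plot(penalties, accs, 'r')
ax.set_xlabel('C (penalty)')
ax.set_ylabel('training accuracy')
plt.show()
```

The plot shows how the training accuracy changes with the penalty factor C. Keep in mind what C does: a large C gives a narrower margin and usually fewer support vectors, because the classifier tries hard to fit every training example, and this may cause overfitting. A small C allows a wider margin with more support vectors, which acts as stronger regularization.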
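To check the link between C and the number of support vectors on iris, here is a small sketch that prints the total support-vector count of each fitted model, using the fitted attribute `n_support_` (support vectors per class). The list of C values is an arbitrary choice for illustration, spread over several orders of magnitude so the effect is easy to see:

```python
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

support_counts = {}
for C in [0.01, 0.1, 1.0, 10.0, 100.0]:
    clf = SVC(kernel='linear', C=C)
    clf.fit(X, y)
    # n_support_ holds the number of support vectors for each class;
    # their sum is the total number of support vectors the model keeps
    support_counts[C] = int(clf.n_support_.sum())
    print(f"C={C:>6}: {support_counts[C]} support vectors")
```

With a very small C almost every training point ends up inside the wide margin and becomes a support vector; as C grows, the count should drop.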

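Note that the loop above scores each model on the same data it was trained on, so training accuracy alone cannot reveal overfitting. One common way to check, sketched here as an assumption rather than part of the original example, is to score each C with 5-fold cross-validation via `sklearn.model_selection.cross_val_score`, so every score is measured on samples the model did not see while training:

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

penalties = [0.01, 0.1, 1.0, 10.0, 100.0]  # illustrative C values
cv_means = []
for C in penalties:
    clf = SVC(kernel='linear', C=C)
    # 5-fold stratified cross-validation: train on 4 folds, score on the held-out fold
    scores = cross_val_score(clf, X, y, cv=5)
    cv_means.append(scores.mean())
    print(f"C={C:>6}: mean CV accuracy {scores.mean():.3f}")
```

If the cross-validated accuracy drops while the training accuracy keeps rising as C grows, that gap is the overfitting described above.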