Perceptron Algorithm
Published:
- Perceptron Algorithm
- Binary Classification Problem
- Perceptron Algorithm (Primal From)
- Stop in Finite steps
- Perceptron Algorithm(Dual Form)
- Python Code
Perceptron Algorithm
- Basically, this is my studying from NTCU open course - Machine Learning. I will take the key points for my references.
Binary Classification Problem
Given a training dataset
\[S=\{(x_i,y_i) | x_i \in \mathbb{R}^n, y_i \in {-1,1,i=1,2,\dots,l} \}\] \[\begin{aligned} x_i \in A_{+} & \Longleftrightarrow y_i=1 \\ x_i \in A_{-} & \Longleftrightarrow y_i=-1 \end{aligned}\]Main Goal
- Predit the unseen class label for new data
- Find a function $f:\mathbb{R}_n \rightarrow \mathbb{R}$ by learning from data such that
Here the simplest function is linear
\[f(x)=w^{T}x+b\]Perceptron Algorithm (Primal From)
An on-line and mistake-driven produce
Repeat 
 for i =1 to l 
 if $y_i(\langle w_{k},x_i \rangle +b_k) \leq 0$, then
end if 
 Until no mistakes made within the for loop 
 return: $k,(w_k,b_k)$ 
 Remark:
Stop in Finite steps
Theorem(Novikoff) Let $S$ be a non-trival training set, and let
\[R=\max\limits_{1\leq i \leq l} \Vert x_i\Vert\]Suppose that there exists a vector $\omega_{opt}$ such that $\Vert \omega_{opt} \Vert =1 $ and
\[y_i(\langle w_{opt},x_i \rangle +b_{opt}) \geq r \quad \text{for} \quad 1\leq i \leq l\]The number of mistakes made by the on-line perceptron algorithm on $S$ is almost $\left(\frac{2R}{r}\right)^2$
Remark:
- the value of $\eta$ is irreverent.
Perceptron Algorithm(Dual Form)
\[w_i=\sum\limits_{i=1}^l \alpha_i y_i x_i\]Given a linearly seperable training set $S$ and $\alpha=0$, $\alpha \in \mathbb{R}^l$, $b=0$ and $R=\max\limits_{1\leq i \leq l }\Vert x_i \Vert$
\[y_i(\langle w_{k},x_i \rangle +b_{k})=y_i\left(\sum\limits_{i=1}^l \alpha_i y_i \langle w_i,x_i \rangle +b_k \right)\]Repeat 
 for i= 1 to l 
 if $y_i\left(\sum\limits_{i=1}^l \alpha_i y_i \langle w_i,x_i \rangle +b_k \right) \leq 0$, then
end if 
 Until no mistakes made within the for loop 
 Return :$(\alpha, b)$
 Remark:
- The number of updates:
- \[\sum\limits_{i=1}^l \alpha_i=\Vert \alpha \Vert \leq \left( \frac{2R}{r} \right)^2\]
which $\alpha_i >0$ means that $\langle x_i,y_i \rangle$ has been misclassified.
- Training data only appear in the algorithm through the entries of the Gram matrix
Python Code
The following code are from website
- Import datafrom sklearn import datasets import numpy as np import pandas as pd import matplotlib.pyplot as plt import matplotlib.lines as mlines np.random.seed(10) # Load data iris=datasets.load_iris() X = iris.data[0:99,:2] y = iris.target[0:99]
- Plot figure# Plot figure plt.plot(X[:50, 0], X[:50, 1], 'bo', color='blue', label='0') plt.plot(X[50:99, 0], X[50:99, 1], 'bo', color='orange', label='1') plt.xlabel('sepal length') plt.ylabel('sepal width') plt.legend()# Update y into -1 and 1 y=np.array([1 if i==1 else -1 for i in y])
- Perceptron Algorithm (Primal From)######### # Gradient Descent ######### # Initialize parameters w=np.ones((X.shape[1],1)); b=1; learning_rate=0.1; Round=0; All_Correct=False; # Start Gradient Descent while not All_Correct: misclassified_count=0 for i in range(X.shape[0]): XX=X[i,] yy=y[i] if yy * (np.dot(w.T,XX.T)+b)<0: w+=learning_rate * np.dot(XX,yy).reshape(2,1) b+=learning_rate * yy misclassified_count +=1 if misclassified_count==0: All_Correct=True else: All_Correct=False Round += 1 print(Round) print(w) print(b)
- Plot result hyperplanex_points = np.linspace(4,7,10) y_ = -(w[0]*x_points + b)/w[1] plt.plot(x_points, y_) plt.plot(X[:50, 0], X[:50, 1], 'bo', color='blue', label='0') plt.plot(X[50:99, 0], X[50:99, 1], 'bo', color='orange', label='1') plt.xlabel('sepal length') plt.ylabel('sepal width') plt.legend()
