k-Nearest Neighbor Algorithm

1 minute read

Published: September 23, 2019


These are my study notes from the NTCU open course on Machine Learning. I record the key points here for my own reference.

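Define the distance
- if the values are real numbers, use the squared Euclidean distance,
\[d(x,x_0)=\left\Vert x-x_0\right\Vert^2\]
- if the values are discrete (categorical or ordinal), use the Hamming distance, which counts the attributes that differ,
\[d(x,x_0)=\sum_{i=1}^n \mathbb{1}(x_i \neq x_{0,i})\]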
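The two distance definitions above can be sketched in a few lines of plain Python. The function names `squared_euclidean` and `hamming` are my own for illustration, and both assume the two vectors have the same length.

```python
def squared_euclidean(x, x0):
    """Squared Euclidean distance between two equal-length numeric vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, x0))


def hamming(x, x0):
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(x, x0))


print(squared_euclidean([1.0, 2.0], [4.0, 6.0]))  # 25.0
print(hamming(["red", "small", "round"], ["red", "large", "round"]))  # 1
```

Squaring the norm does not change which neighbors are nearest, so the square root can be skipped when distances are only used for ranking.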
Given a query point $x_0$, find its k nearest neighbors and count which class is in the majority.
Assign $x_0$ to the majority class among its k nearest neighbors.
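As a rough sketch of the majority vote just described, the function below classifies one query point using the squared Euclidean distance. The name `knn_classify` and the toy data are assumptions made for illustration, not part of the course material.

```python
from collections import Counter


def knn_classify(X, y, x0, k=3):
    """Predict the label of x0 by majority vote among its k nearest points in X."""
    # Squared Euclidean distance from every training point to the query
    dists = [sum((a - b) ** 2 for a, b in zip(x, x0)) for x in X]
    # Indices of the k training points with the smallest distances
    nearest = sorted(range(len(X)), key=lambda i: dists[i])[:k]
    # Count the labels of those neighbors and return the most frequent one
    votes = Counter(y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two small clusters labeled "a" and "b"
X = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9]]
y = ["a", "a", "a", "b", "b"]

print(knn_classify(X, y, [1.1, 1.0], k=3))  # a
print(knn_classify(X, y, [5.1, 5.0], k=3))  # b
```

For a two-class problem, choosing an odd k avoids tied votes.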
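Be careful when attributes are on different scales, because an attribute with a large range dominates the distance. We might need to normalize each attribute $j$ by its mean $\mu_j$ and standard deviation $\sigma_j$:
$\hat x_j=\frac{x_j-\mu_j}{\sigma_j}$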
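The normalization formula can be tried out by hand with the standard library. This is only a sketch: `standardize` is a hypothetical helper, it uses the population standard deviation, and it assumes no column is constant (otherwise $\sigma_j$ would be zero).

```python
from statistics import mean, pstdev


def standardize(X):
    """Rescale each column of X to zero mean and unit standard deviation."""
    cols = list(zip(*X))                # one tuple per attribute
    mus = [mean(c) for c in cols]       # per-attribute mean
    sigmas = [pstdev(c) for c in cols]  # per-attribute population std (assumed non-zero)
    return [[(v - m) / s for v, m, s in zip(row, mus, sigmas)] for row in X]


# Toy data (made up): the second attribute is 100 times larger than the first
X = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
Z = standardize(X)
print([round(v, 3) for v in Z[0]])  # [-1.225, -1.225]
```

After scaling, both attributes contribute equally to the distance even though their raw ranges differ by a factor of 100.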
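Condensed Nearest Neighbors
Keep only a subset $Z$ of the training set $X$ that still classifies $X$ correctly. $Z$ starts as the empty set.
Repeat
for all $x \in X$ (in random order)
find $x' \in Z$ s.t. $\left\Vert x-x' \right\Vert=\min_{x_j \in Z} \left\Vert x-x_j \right\Vert$
if class($x$) $\neq$ class($x'$), add $x$ to $Z$
Until $Z$ does not change.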
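The condensing loop can be sketched as follows. I assume the usual rule that a point is added to $Z$ whenever its nearest stored point has a different class, and that the very first point is stored unconditionally because $Z$ is empty. The names `condense` and `sq_dist`, the `seed` parameter, and the toy data are all illustrative.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def condense(X, y, seed=0):
    """Return indices of a condensed subset Z of the training set X."""
    rng = random.Random(seed)
    Z = []  # indices of stored points; empty at the start
    changed = True
    while changed:  # repeat until a full pass adds nothing
        changed = False
        order = list(range(len(X)))
        rng.shuffle(order)  # visit the points in random order
        for i in order:
            if i in Z:
                continue  # a stored point is its own nearest neighbor
            if not Z:
                Z.append(i)  # nothing stored yet, so keep the first point
                changed = True
                continue
            # Nearest stored point to X[i]
            j = min(Z, key=lambda z: sq_dist(X[i], X[z]))
            if y[j] != y[i]:  # misclassified by Z, so store it
                Z.append(i)
                changed = True
    return Z


# Toy data (made up): two well-separated clusters
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]
Z = condense(X, y)
print(sorted(y[i] for i in Z))  # [0, 1]
```

On this toy data one stored point per class is enough, because every other point is already classified correctly by its nearest stored neighbor.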
Python

The following code came from a website.

Importing the Dataset

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# Assign column names to the dataset
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
# Read dataset to pandas dataframe
dataset = pd.read_csv(url, names=names)

dataset.head()
# Features: the first four columns; labels: the last column ('Class')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

Train Test Split

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

Feature Scaling

from sklearn.preprocessing import StandardScaler
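scaler = StandardScaler()
# Fit the mean and standard deviation on the training set only
scaler.fit(X_train)
# Apply the same transformation to both the training and test sets
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)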

Training and Predictions

from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

Evaluating the Algorithm

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))


Elsa CHOI
