k-Nearest Neighbor Algorithm
- These are my study notes from the NTCU open course on Machine Learning; I record the key points here for my own reference.
- Define a distance between points:
- if the values are real numbers, use the Euclidean distance $\left\Vert x-x' \right\Vert=\sqrt{\sum_j (x_j-x'_j)^2}$
- if the values are ordinal, map the categories to numeric ranks that preserve the ordering and compare the ranks (a small sketch follows below)
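A tiny sketch of the two cases (the rank mapping below is a made-up example of mine, not from the course):

```python
import numpy as np

def euclidean(x, y):
    # real-valued attributes: straight-line distance
    return np.sqrt(np.sum((np.asarray(x) - np.asarray(y)) ** 2))

# ordinal attributes: map categories to ranks, then compare the ranks
ranks = {'low': 0, 'medium': 1, 'high': 2}  # hypothetical ordering
def ordinal_distance(a, b):
    return abs(ranks[a] - ranks[b])
```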
Given a query point $x_0$, find its k nearest neighbors and count which class is the majority.
Classify $x_0$ to the same class as the majority among the k nearest neighbors.
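A minimal NumPy sketch of this majority-vote rule (the function and variable names are my own, for illustration):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x0, k=5):
    y_train = np.asarray(y_train)
    # Euclidean distance from the query x0 to every training point
    dists = np.sqrt(((np.asarray(X_train) - x0) ** 2).sum(axis=1))
    # indices of the k nearest neighbors
    nearest = np.argsort(dists)[:k]
    # return the majority class among those neighbors
    return Counter(y_train[nearest]).most_common(1)[0][0]
```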
- Be careful when attributes have different scales; we may need to normalize each attribute (z-score standardization):
$\hat x_j=\frac{x_j-\mu_j}{\sigma_j}$
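As a quick sketch, this standardization is one line per statistic in NumPy (`X` here is a toy samples-by-attributes array of my own):

```python
import numpy as np

X = np.array([[150.0, 1.2], [160.0, 3.4], [170.0, 2.1]])  # toy data, mixed scales
X_hat = (X - X.mean(axis=0)) / X.std(axis=0)  # z-score each attribute (column)
```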
Condensed Nearest Neighbors: to cut memory and lookup cost, store only a subset $Z \subseteq X$ that still classifies the training set correctly. $Z$ is the empty set at the start.
Repeat
  for all $x \in X$ (in random order)
    find $x' \in Z$ s.t. \(\left\Vert x-x' \right\Vert=\min_{x_j \in Z} \left\Vert x-x_j \right\Vert\)
    if $\text{class}(x') \ne \text{class}(x)$, add $x$ to $Z$
Until $Z$ does not change.
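A rough Python sketch of this condensing loop (my own illustration; `condense`, `X`, and `y` are assumed names, and $Z$ is seeded with one point so a nearest neighbor always exists):

```python
import numpy as np

def condense(X, y, seed=0):
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    Z_idx = [0]            # seed Z with one stored instance
    changed = True
    while changed:         # Repeat ... Until Z does not change
        changed = False
        for i in rng.permutation(len(X)):   # for all x in X, in random order
            Z = X[Z_idx]
            # x' in Z: the nearest stored instance (squared distance suffices for argmin)
            j = int(np.argmin(((Z - X[i]) ** 2).sum(axis=1)))
            # if the nearest stored instance has a different class, store x
            if y[Z_idx[j]] != y[i]:
                Z_idx.append(i)
                changed = True
    return X[Z_idx], y[Z_idx]
```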
- Python
The following code comes from a tutorial website.
- Importing the Dataset
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Assign column names to the dataset
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
dataset = pd.read_csv(url, names=names)
dataset.head()

# Attributes are the first four columns; the class label is the last
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
```
- Train Test Split
```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
```
- Feature Scaling
```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then apply it to both splits
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```
- Training and Predictions
```python
from sklearn.neighbors import KNeighborsClassifier

# k = 5 nearest neighbors
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
```
- Evaluating the Algorithm
```python
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```
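The code above fixes $k=5$; a common way to pick $k$ (my addition, not from the notes, reusing the matplotlib import above) is to plot the test error over a range of values and look for where it settles:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Mean test error for k = 1..40, using the scaled splits from above
error = []
for k in range(1, 41):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    error.append(np.mean(knn.predict(X_test) != y_test))

plt.plot(range(1, 41), error, marker='o')
plt.xlabel('k')
plt.ylabel('mean test error')
plt.show()
```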
