Given a query point $x_0$, find its k nearest neighbors in the training set and count how many of them belong to each class.
Assign $x_0$ to the majority class among those k nearest neighbors.
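A minimal from-scratch sketch of this majority-vote rule follows. It assumes Euclidean distance and breaks voting ties in favor of the class that `Counter.most_common` lists first (the first tied label encountered); the text does not say how ties should be handled, so that choice is an assumption. The function name `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(X, y, x0, k):
    """Classify query point x0 by majority vote among its k nearest neighbors.

    X: list of feature vectors (tuples of floats); y: list of class labels.
    Uses Euclidean distance. Ties in the vote go to the label that
    Counter.most_common returns first (first encountered among equal counts).
    """
    # Sort training indices by distance to the query point; keep the k closest
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x0))[:k]
    # Count the class labels of those k neighbors
    votes = Counter(y[i] for i in nearest)
    # Return the label with the most votes
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters labeled "a" and "b"
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_classify(X, y, (0.3, 0.3), k=3))
print(knn_classify(X, y, (4.0, 4.5), k=3))
```

Sorting all distances costs $O(n \log n)$ per query; that is fine for a sketch, while libraries such as scikit-learn can use tree-based indexes instead.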
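Because k-NN relies on distances, a feature with a large numeric range can dominate the others, so each feature $j$ is usually standardized with its mean $\mu_j$ and standard deviation $\sigma_j$:
$\hat x_j=\frac{x_j-\mu_j}{\sigma_j}$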
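A minimal plain-Python sketch of this standardization is below. It assumes $\mu_j$ and $\sigma_j$ are estimated on the training set only and then reused for new points, and it uses the population standard deviation (`statistics.pstdev`), which is also what scikit-learn's `StandardScaler` uses. Falling back to $\sigma_j = 1$ for a constant feature is an assumption added to avoid division by zero. The function names are hypothetical.

```python
import statistics

def fit_standardizer(X):
    """Return per-feature means and standard deviations from training rows X."""
    columns = list(zip(*X))
    means = [statistics.mean(col) for col in columns]
    # Population standard deviation; use 1.0 for a constant feature (sigma = 0)
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return means, stds

def standardize(X, means, stds):
    """Apply x_hat_j = (x_j - mu_j) / sigma_j to every row of X."""
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in X]

# Hypothetical training data: the second feature has a 10x larger range
X_train = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
means, stds = fit_standardizer(X_train)
print(standardize(X_train, means, stds))
# A new point is scaled with the *training* statistics
print(standardize([(2.0, 40.0)], means, stds))
```

After this transformation both features have mean 0 and standard deviation 1 on the training set, so neither dominates the Euclidean distance.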
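Condensed Nearest Neighbors (CNN) reduces the memory and computation of k-NN by storing only a subset $Z \subseteq X$ of the training set. Start with $Z$ as the empty set, then:
Repeat
for all $x \in X$ (in random order)
find $x' \in Z$ such that $\left\Vert x-x' \right\Vert = \min_{x_j \in Z} \left\Vert x-x_j \right\Vert$
if $\text{class}(x) \neq \text{class}(x')$, add $x$ to $Z$ (when $Z$ is empty, $x$ is added directly)
Until $Z$ does not change.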
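The condensing loop can be sketched in plain Python as follows, assuming Euclidean distance and 1-NN as the classifier used to test each $x$. The guard that skips points already in $Z$ is an added assumption: it guarantees termination even when identical points carry different labels. The function name `condensed_nn`, its `seed` parameter, and the toy data are hypothetical.

```python
import math
import random

def condensed_nn(X, y, seed=None):
    """Condensed Nearest Neighbors: return indices of a subset Z of (X, y).

    Scans X in random order; a point is added to Z when Z is empty or when
    its nearest neighbor already in Z has a different label. Stops after a
    full pass in which Z does not change.
    """
    rng = random.Random(seed)
    Z = []          # indices of stored prototypes, in insertion order
    in_z = set()    # same indices, for fast membership checks
    changed = True
    while changed:
        changed = False
        order = list(range(len(X)))
        rng.shuffle(order)  # random visiting order on every pass
        for i in order:
            if i in in_z:
                continue  # already stored; never add a point twice
            if Z:
                # x' = nearest stored prototype to x under Euclidean distance
                j = min(Z, key=lambda z: math.dist(X[i], X[z]))
                if y[j] == y[i]:
                    continue  # x is already classified correctly by 1-NN on Z
            Z.append(i)
            in_z.add(i)
            changed = True
    return Z

# Hypothetical toy data: two well-separated clusters
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
y = ["a", "a", "a", "b", "b", "b"]
Z = condensed_nn(X, y, seed=0)
print([(X[i], y[i]) for i in Z])
```

For well-separated clusters like these, one prototype per class is enough; points near a class boundary are the ones that tend to end up in $Z$.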
The following code, taken from a website, applies k-NN with k = 5 to the Iris dataset using scikit-learn.
```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
# Assign column names to the dataset
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']
# Read dataset into a pandas DataFrame
dataset = pd.read_csv(url, names=names)
print(dataset.head())

# Features are the first four columns; the class label is the last column
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values

# Hold out 20% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

# Standardize features using mean and std estimated on the training set only
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# k-NN classifier with k = 5 neighbors
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

# Evaluate predictions on the held-out test set
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```