k-Nearest Neighbor Algorithm
- These are my study notes from the NTCU open course on Machine Learning; I am recording the key points for my own reference.
- Define a distance (a sketch of both follows below):
 - if values are real numbers, a standard choice is the Euclidean distance $d(x, x') = \sqrt{\sum_j (x_j - x'_j)^2}$;
 - if values are ordinal with $M$ ordered levels, a common choice is the normalized rank difference $d(a, b) = \frac{|\text{rank}(a) - \text{rank}(b)|}{M - 1}$.
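A minimal sketch of these two distance definitions (the function names, example values, and the ordinal-levels encoding are my own illustrative assumptions):

```python
import numpy as np

def euclidean_distance(x, y):
    """Distance for real-valued attributes: root of summed squared differences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))

def ordinal_distance(a, b, levels):
    """Distance for ordinal attributes: normalized difference of rank positions."""
    return abs(levels.index(a) - levels.index(b)) / (len(levels) - 1)

print(euclidean_distance([1.0, 2.0], [4.0, 6.0]))                  # 5.0
print(ordinal_distance('low', 'high', ['low', 'medium', 'high']))  # 1.0
```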
- Given a query point $x_0$, find its k nearest neighbors and count which class is in the majority.
- Classify $x_0$ into the same class as the majority of its k nearest neighbors (a minimal NumPy sketch follows below).
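Putting the two bullets above together, here is a bare-bones k-NN classifier in NumPy; it is a sketch under my own naming, assuming Euclidean distance and a simple majority vote:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x0, k=5):
    """Classify query point x0 as the majority class of its k nearest neighbors."""
    # Euclidean distance from x0 to every training point
    dists = np.sqrt(np.sum((X_train - x0) ** 2, axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbors' labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Illustrative toy data: two clusters with labels 0 and 1
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 0.9]), k=3))  # -> 1
```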
- Be careful when attributes have different scales; we may need to normalize each attribute (z-score standardization, sketched below):
- $\hat x_j=\frac{x_j-\mu_j}{\sigma_j}$ 
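A sketch of this standardization; note that $\mu_j$ and $\sigma_j$ should be estimated on the training data only and then reused for query points (the helper names are mine):

```python
import numpy as np

def zscore_fit(X_train):
    """Estimate per-attribute mean and standard deviation from training data."""
    return X_train.mean(axis=0), X_train.std(axis=0)

def zscore_transform(X, mu, sigma):
    """Apply x_hat_j = (x_j - mu_j) / sigma_j to each attribute."""
    return (X - mu) / sigma

# Attributes on very different scales get mapped to comparable ranges
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
mu, sigma = zscore_fit(X_train)
print(zscore_transform(X_train, mu, sigma))
```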
- Condensed Nearest Neighbors: start with $Z$ as the empty set, then
 Repeat
  For all $x \in X$ (in random order)
   Find $x' \in Z$ s.t. $\left\Vert x-x' \right\Vert=\min_{x_j \in Z} \left\Vert x-x_j \right\Vert$
   If $\text{class}(x) \neq \text{class}(x')$, add $x$ to $Z$
 Until $Z$ does not change.
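A rough Python sketch of the loop above; seeding $Z$ with the first visited point (since the minimum over an empty $Z$ is undefined) and the random-seed handling are my own assumptions:

```python
import numpy as np

def condensed_nn(X, y, seed=0):
    """Keep only the points that 1-NN over the condensed set Z misclassifies."""
    rng = np.random.default_rng(seed)
    Z_idx = []                                # indices of points kept in Z
    changed = True
    while changed:                            # repeat until Z does not change
        changed = False
        for i in rng.permutation(len(X)):     # for all x in X, in random order
            if not Z_idx:                     # seed Z with the first point
                Z_idx.append(i)
                changed = True
                continue
            # find x' in Z with minimal distance to x
            dists = np.sqrt(np.sum((X[Z_idx] - X[i]) ** 2, axis=1))
            j = Z_idx[int(np.argmin(dists))]
            # add x to Z if its nearest neighbor in Z has a different class
            if y[j] != y[i]:
                Z_idx.append(i)
                changed = True
    return X[Z_idx], y[Z_idx]

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
Z_X, Z_y = condensed_nn(X, y)
print(len(Z_X), "of", len(X), "points kept")
```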
- Python
- The following code comes from a website tutorial.
- Importing the Dataset

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Assign column names to the dataset
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
dataset = pd.read_csv(url, names=names)
dataset.head()

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
```
- Train Test Split

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)
```
- Feature Scaling

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```
- Training and Predictions

```python
from sklearn.neighbors import KNeighborsClassifier

classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
```
- Evaluating the Algorithm

```python
from sklearn.metrics import classification_report, confusion_matrix

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```
