k-NN algorithm using different distance functions (breast cancer data set)

k-NN using breast cancer data from Wisconsin ( https://data.world/health/breast-cancer-wisconsin/workspace/file?filename=breast-cancer-wisconsin-data%2Fdata.csv )

General information about the k-NN algorithm can be found at: http://dataworldblog.blogspot.com.es/2017/08/k-nn-algorithm.html

A distance function is used to measure the similarity between two instances. There are different ways to calculate distance, but traditionally the k-NN algorithm uses Euclidean distance, which is the “ordinary” or “straight-line” distance between two points. It has been demonstrated that the chosen distance function can affect the classification accuracy of the k-NN classifier.

The distance calculation for k-NN is heavily dependent on the measurement scale of the input features. Since different inputs have different ranges of values, inputs with a larger range of values will have a larger impact on the distance than those with a smaller range. This could potentially cause problems ...
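
The two points made above (the choice of distance function matters, and so does the scale of the features) can be illustrated with a small sketch. This is only a toy example in plain Python, not the code used in the post: the data values are made up rather than taken from the Wisconsin data set, and the helper names `euclidean`, `manhattan`, `min_max_scale` and `nearest` are hypothetical names chosen for illustration.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute differences along each feature."""
    return sum(abs(x - y) for x, y in zip(a, b))


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def nearest(query_index, rows, distance):
    """Index of the row closest to rows[query_index], excluding itself."""
    query = rows[query_index]
    candidates = [i for i in range(len(rows)) if i != query_index]
    return min(candidates, key=lambda i: distance(query, rows[i]))


# Made-up data: one large-range feature and one small-range feature.
rows = [
    [1000.0, 0.10],  # row 0: the query instance
    [1010.0, 0.20],  # row 1: close in feature 0, far in feature 1
    [1200.0, 0.10],  # row 2: far in feature 0, identical in feature 1
    [500.0, 0.15],   # row 3
]

# On the raw values the large-range feature dominates the distance.
raw_neighbor = nearest(0, rows, euclidean)                    # 1
# After min-max scaling both features contribute comparably.
scaled_neighbor = nearest(0, min_max_scale(rows), euclidean)  # 2

# Made-up points where the two distance functions disagree.
points = [[0.0, 0.0], [3.0, 3.0], [0.0, 5.0]]
euclidean_neighbor = nearest(0, points, euclidean)  # 1 (4.24 vs 5.0)
manhattan_neighbor = nearest(0, points, manhattan)  # 2 (6.0 vs 5.0)
```

With these values, scaling alone changes which instance is the nearest neighbor of row 0, and swapping Euclidean for Manhattan distance changes the nearest neighbor of the first point. In a k-NN classifier either change can alter the predicted label.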