Data Scientist. Biology/Chemistry Nerd. Find me on LinkedIn — Jennifer Boyles!
K-nearest neighbors (KNN) is a basic machine learning algorithm used in both classification and regression problems. KNN belongs to the supervised learning domain of machine learning, which strives to find patterns that accurately map inputs to outputs based on known ground truths. KNN is also a non-parametric algorithm, meaning we make no assumptions about the distribution of our data. An example of such an assumption, often seen in linear regression, is assuming the distribution of our data is approximately normal. Moving on, as I have hit my quota for saying the word ‘assumption’ today.
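To make the idea concrete, here is a minimal from-scratch sketch of KNN classification: find the k training points closest to a new point and take a majority vote among their labels. The toy dataset and function name are made up for illustration.

```python
from collections import Counter
import math

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    # Euclidean distance from x_new to every training point
    distances = [math.dist(x, x_new) for x in X_train]
    # Indices of the k smallest distances
    nearest = sorted(range(len(distances)), key=distances.__getitem__)[:k]
    # Majority vote among those neighbors' labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy dataset: two well-separated clusters labeled 'A' and 'B'
X = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
y = ['A', 'A', 'A', 'B', 'B', 'B']

print(knn_predict(X, y, (2, 2)))  # → A
print(knn_predict(X, y, (9, 9)))  # → B
```

Notice there is no "training" step that fits parameters: KNN simply memorizes the training data, which is exactly what makes it non-parametric.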


These past couple of weeks our DSI cohort has been learning the intricacies of supervised machine learning — the linear regression, logistic regression, and k-nearest neighbor algorithms, to be more specific. There are several kinds of machine learning: supervised, unsupervised, semi-supervised, and reinforcement.

In this post, I will be focusing on supervised machine learning. What exactly is supervised machine learning? Supervised machine learning occurs when a ‘supervisor’ helps train the algorithm. This supervisor is a training dataset, in which the ground truth or true values are known. After training the algorithm on this dataset, the coefficients for each predictor variable…
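The idea of learning coefficients from a labeled training set can be sketched with a one-variable ordinary least-squares fit; the numbers below are made up for illustration (targets roughly follow y = 2x).

```python
# Fit y = b0 + b1*x by ordinary least squares on a labeled training set.
X = [1, 2, 3, 4, 5]            # predictor values
y = [2.1, 4.0, 6.2, 8.1, 9.9]  # known ground-truth targets (the 'supervisor')

n = len(X)
mean_x, mean_y = sum(X) / n, sum(y) / n
# Closed-form least-squares slope and intercept
b1 = (sum((x - mean_x) * (t - mean_y) for x, t in zip(X, y))
      / sum((x - mean_x) ** 2 for x in X))
b0 = mean_y - b1 * mean_x
print(round(b1, 2), round(b0, 2))  # slope ≈ 1.97, intercept ≈ 0.15
```

The slope and intercept recovered here are the "coefficients" the training process produces: once learned, they can map new inputs to predicted outputs.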
