Saturday, 9 May 2020

Nearest Neighbor Classifiers

Instance-Based Classifiers:

Rote-Learner:
Memorizes the entire training data and performs classification only if the attributes of a record exactly match one of the training examples.
Nearest neighbor:
Uses the k "closest" points (nearest neighbors) to perform classification.
Nearest Neighbor Classifiers:
Basic idea:
     If it walks like a duck, quacks like a duck, then it’s probably a duck.
Requires three things:
The set of labeled records
Distance Metric to compute the distance between records
The value of k, the number of nearest neighbors to retrieve
To classify an unknown record:
Compute the distance to the other training records
Identify the k nearest neighbors
Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote), as in the sketch below
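To make the three steps concrete, here is a minimal sketch in Python, assuming a small list of labeled numeric records (the knn_classify helper and the toy "duck"/"goose" data are illustrative, not from the original post):

```python
import math
from collections import Counter

def euclidean(p, q):
    # Square root of the sum of squared attribute differences
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(unknown, training, k=3):
    # training: list of (record, label) pairs
    # 1. Compute the distance from the unknown record to every training record
    distances = [(euclidean(unknown, rec), label) for rec, label in training]
    # 2. Identify the k nearest neighbors
    neighbors = sorted(distances)[:k]
    # 3. Take a majority vote over the neighbors' class labels
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two classes of 2-D points
training = [((1.0, 1.0), "duck"), ((1.2, 0.9), "duck"), ((5.0, 5.0), "goose")]
print(knn_classify((1.1, 1.0), training, k=3))  # -> duck
```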
Voronoi Diagram:
The decision boundary of a 1-NN classifier corresponds to a Voronoi diagram: each training point owns the region of space that is closer to it than to any other training point.
Compute the distance between two points:
Euclidean distance: the square root of the sum of the squared differences along each attribute,
d(p, q) = √( Σᵢ (pᵢ − qᵢ)² )
Determine the class from the nearest-neighbor list:
Take the majority vote of the class labels among the k nearest neighbors
Weigh each vote according to distance, e.g., with weight factor w = 1/d²
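A sketch of the distance-weighted variant, reusing the euclidean helper from the sketch above (the eps guard is an assumption added to avoid division by zero when a neighbor coincides with the query):

```python
from collections import defaultdict

def weighted_knn_classify(unknown, training, k=3, eps=1e-9):
    # Each neighbor votes with weight w = 1/d^2, so closer
    # neighbors influence the predicted class more strongly.
    neighbors = sorted((euclidean(unknown, rec), label) for rec, label in training)[:k]
    scores = defaultdict(float)
    for d, label in neighbors:
        scores[label] += 1.0 / (d * d + eps)  # eps avoids division by zero at d == 0
    return max(scores, key=scores.get)
```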
Choosing the value of k:
If k is too small, the classifier is sensitive to noise points
If k is too large, the neighborhood may include points from other classes
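One common way to pick k is to compare a few candidate values on a held-out validation set. A sketch, reusing knn_classify from above (the candidate list and the train/validation split are hypothetical):

```python
def choose_k(train, validation, candidates=(1, 3, 5, 7, 9)):
    # Small k overfits to noise points; large k pulls in other classes.
    # Pick the candidate with the best validation accuracy.
    def accuracy(k):
        hits = sum(knn_classify(rec, train, k) == label for rec, label in validation)
        return hits / len(validation)
    return max(candidates, key=accuracy)
```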
Scaling issues:
Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes
Example:
height of a person may vary from 1.5 m to 1.8 m
weight of a person may vary from 90 lb to 300 lb
income of a person may vary from $10K to $1M
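A minimal min-max scaling sketch that rescales each attribute to [0, 1], so that income in dollars does not swamp height in metres inside the Euclidean distance (min-max is one common choice; standardizing by mean and variance would also work):

```python
def min_max_scale(records):
    # Rescale every attribute to [0, 1] so no single attribute
    # dominates the distance computation.
    cols = list(zip(*records))
    mins = [min(c) for c in cols]
    maxs = [max(c) for c in cols]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(rec, mins, maxs))
        for rec in records
    ]

# Hypothetical (height m, weight lb, income $) records
people = [(1.5, 90, 10_000), (1.8, 300, 1_000_000), (1.65, 150, 50_000)]
print(min_max_scale(people))
```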

Selection of the right similarity measure is critical.
Nearest Neighbor Classification:
k-NN classifiers are lazy learners since they do not build models explicitly
Classifying unknown records is relatively expensive
Can produce arbitrarily shaped decision boundaries
Easy to handle variable interactions since decisions are based on local information
Selection of the right proximity measure is essential
Superfluous or redundant attributes can create problems

Missing attributes are hard to handle

Improving KNN Efficiency:

Avoid having to compute the distance to all objects in the training set
Multi-dimensional access methods (e.g., k-d trees)
Fast approximate similarity search
Locality Sensitive Hashing (LSH)
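A sketch of a k-d tree lookup using scipy.spatial.KDTree (the random data is illustrative): the tree is built once, and each query then finds its nearest neighbors without a linear scan over the training set.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
X = rng.random((10_000, 3))         # 10,000 training records with 3 attributes
tree = KDTree(X)                    # build the multi-dimensional index once

query = rng.random(3)
dist, idx = tree.query(query, k=5)  # 5 nearest neighbors without scanning all of X
print(idx, dist)
```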

Condensing:
Determine a smaller set of objects that gives the same performance (see the sketch below)
Editing:
Remove objects to improve efficiency
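The post does not name a specific condensing algorithm; one classic scheme (Hart's condensed nearest neighbor, an assumption here) keeps only the objects that the current condensed set misclassifies under 1-NN, reusing knn_classify from above:

```python
def condense(training):
    # Start from one object; repeatedly add any object that the
    # current condensed set misclassifies with 1-NN, until stable.
    condensed = [training[0]]
    changed = True
    while changed:
        changed = False
        for rec, label in training:
            if knn_classify(rec, condensed, k=1) != label:
                condensed.append((rec, label))
                changed = True
    return condensed
```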

KNN and Proximity Graphs:
Proximity graphs:
A proximity graph is a graph in which two vertices are connected by an edge if and only if the vertices satisfy particular geometric requirements. Examples include:
k-nearest-neighbor graphs
minimum spanning trees
Delaunay triangulations
relative neighborhood graphs
Gabriel graphs
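As an illustration, a brute-force sketch that builds the edge set of a k-nearest-neighbor graph (the simplest of the proximity graphs listed above), reusing the euclidean helper from the first sketch:

```python
def knn_graph(points, k=2):
    # Connect each point to its k closest points; store edges as
    # unordered vertex-index pairs to avoid duplicates.
    edges = set()
    for i, p in enumerate(points):
        dists = sorted((euclidean(p, q), j) for j, q in enumerate(points) if j != i)
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges
```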