Instance-Based Classifiers:
Rote-Learner:
Memorizes the entire training data and performs classification only if the attributes of a record exactly match one of the training examples.
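As a rough illustration, a rote learner can be sketched as an exact-match lookup table. The class and names below are hypothetical, not from the slides; this is a minimal sketch, assuming hashable attribute tuples.

```python
# Minimal rote-learner sketch: memorize training records, classify only
# on an exact attribute match, otherwise abstain (return None).
class RoteLearner:
    def fit(self, X, y):
        # Memorize the entire training set as an exact-match lookup table.
        self.table = {tuple(record): label for record, label in zip(X, y)}
        return self

    def predict_one(self, record):
        # Classification succeeds only if the record matches a training example.
        return self.table.get(tuple(record))

# Usage:
clf = RoteLearner().fit([[1.5, 90], [1.8, 300]], ["duck", "not-duck"])
print(clf.predict_one([1.5, 90]))   # "duck"
print(clf.predict_one([1.6, 120]))  # None: no exact match
```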
Nearest neighbor:
- Uses the k “closest” points (nearest neighbors) to perform classification
Nearest Neighbor Classifiers:
Basic idea:
If it walks like a duck and quacks like a duck, then it’s probably a duck.
Requires three things:
- The set of labeled records
- A distance metric to compute the distance between records
- The value of k, the number of nearest neighbors to retrieve
To classify an unknown record:
- Compute the distance to the other training records
- Identify the k nearest neighbors
- Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)
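These three steps translate directly into code. The following is a minimal sketch, assuming NumPy, Euclidean distance, and a hypothetical function name knn_classify; it is not an optimized implementation.

```python
import numpy as np
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Classify record x by majority vote among its k nearest training records."""
    # Step 1: compute the distance from x to every training record.
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    # Step 2: identify the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # Step 3: take a majority vote over the neighbors' class labels.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Usage:
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_classify(np.array([1.1, 0.9]), X_train, y_train, k=3))  # "A"
```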
Voronoi Diagram: the 1-NN classifier partitions the space into Voronoi cells, one per training record; the decision boundary follows the cell boundaries between records of different classes.
Compute the distance between two points:
Euclidean distance: d(p, q) = sqrt( sum_i (p_i - q_i)^2 ), the square root of the sum of squared differences along each attribute.
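For example, the Euclidean distance between (0, 0) and (3, 4) is sqrt(3^2 + 4^2) = sqrt(25) = 5.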
Determine the class from the nearest-neighbor list:
- Take the majority vote of the class labels among the k nearest neighbors
- Or weigh each vote according to distance, using weight factor w = 1/d^2
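A sketch of the distance-weighted variant, assuming NumPy; the function name and the eps guard are illustrative choices, not from the slides.

```python
import numpy as np

def knn_classify_weighted(x, X_train, y_train, k=3, eps=1e-12):
    """Distance-weighted k-NN: each neighbor votes with weight w = 1/d^2."""
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    scores = {}
    for i in nearest:
        # eps guards against division by zero when x coincides with a record.
        w = 1.0 / (dists[i] ** 2 + eps)
        scores[y_train[i]] = scores.get(y_train[i], 0.0) + w
    # Predict the class with the largest total weighted vote.
    return max(scores, key=scores.get)
```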
Choosing the value of k:
- If k is too small, the classifier is sensitive to noise points
- If k is too large, the neighborhood may include points from other classes
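In practice, k is often chosen by cross-validation. The sketch below assumes scikit-learn is available; the dataset and the candidate values of k are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
for k in (1, 3, 5, 7, 11, 21):
    # 5-fold cross-validated accuracy for each candidate k.
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    print(f"k={k}: mean accuracy={acc:.3f}")
```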
Scaling issues:
– Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes
– Example:
  – height of a person may vary from 1.5 m to 1.8 m
  – weight of a person may vary from 90 lb to 300 lb
  – income of a person may vary from $10K to $1M
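A minimal standardization sketch, assuming NumPy; z-score scaling is one common choice (min-max scaling is another), and the function name is illustrative.

```python
import numpy as np

def standardize(X_train, X_test):
    """Z-score each attribute using statistics of the training set only."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant attributes
    return (X_train - mu) / sigma, (X_test - mu) / sigma

# Usage: height (m), weight (lb), income ($) live on very different scales,
# so income would otherwise dominate the Euclidean distance.
X_train = np.array([[1.5, 90.0, 10_000.0], [1.8, 300.0, 1_000_000.0]])
X_test = np.array([[1.7, 150.0, 50_000.0]])
X_train_s, X_test_s = standardize(X_train, X_test)
```

Scaling with training-set statistics only (rather than refitting on test data) keeps the test records on the same footing as the records they are compared against.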
Selection of the right similarity measure is critical:
Nearest Neighbor Classification:
– k-NN classifiers are lazy learners since they do not build models explicitly
– Classifying unknown records is relatively expensive
– Can produce arbitrarily shaped decision boundaries
– Easy to handle variable interactions since the decisions are based on local information
– Selection of the right proximity measure is essential
– Superfluous or redundant attributes can create problems
– Missing attributes are hard to handle
Improving KNN Efficiency:
Avoid having to compute the distance to all objects in the training set:
– Multi-dimensional access methods (e.g., k-d trees)
– Fast approximate similarity search
– Locality-Sensitive Hashing (LSH)
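As one concrete option for the first bullet, SciPy provides a k-d tree. The sketch below assumes SciPy is available and uses its cKDTree; the data is randomly generated for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))  # 10k training records, 3 attributes

tree = cKDTree(X_train)            # build the index once
query = rng.random(3)
# Retrieve the 5 nearest neighbors without scanning all 10k records.
dists, idx = tree.query(query, k=5)
```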
Condensing:
Determine a smaller set of objects that give the same performance
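A classic instance of condensing is Hart's condensed nearest neighbor rule; the sketch below is a hypothetical NumPy implementation of that idea, not taken from the slides.

```python
import numpy as np

def condense(X, y):
    """Hart's condensed NN sketch: keep only records the 1-NN rule needs."""
    keep = [0]  # seed the condensed set with the first record
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            # Classify record i with 1-NN over the current condensed set.
            d = np.sqrt(((X[keep] - X[i]) ** 2).sum(axis=1))
            if y[keep[int(np.argmin(d))]] != y[i]:
                keep.append(i)  # misclassified: it carries information, keep it
                changed = True
    return X[keep], y[keep]
```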
Editing:
Remove objects to improve efficiency
KNN and Proximity Graphs:
Proximity graph: a graph in which two vertices are connected by an edge if and only if the vertices satisfy particular geometric requirements.
Examples:
– nearest-neighbor graphs
– minimum spanning trees
– Delaunay triangulations
– relative neighborhood graphs
– Gabriel graphs
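For the first example, a k-nearest-neighbor graph can be built directly with scikit-learn's kneighbors_graph; the sketch below assumes scikit-learn is available, with illustrative data.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
# Sparse adjacency matrix: an edge from each point to its 2 nearest neighbors.
A = kneighbors_graph(X, n_neighbors=2, mode="connectivity")
print(A.toarray())
```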