
Monday 11 May 2020

Imbalanced Class Problem, Cost-Sensitive Classification, Computing Cost of Classification, Handling Class Imbalance Problem, Sampling-based Approaches:

Class Imbalance Problem:

Many classification problems have skewed classes (many more records from one class than from another). Examples:

Credit card fraud.
Intrusion detection.
Defective products in a manufacturing assembly line.


Challenges:

Evaluation measures such as accuracy are not well suited for imbalanced classes.

Detecting the rare class is like finding a needle in a haystack.

Confusion Matrix:

                      Predicted Class +    Predicted Class -
Actual Class +        a (TP)               b (FN)
Actual Class -        c (FP)               d (TN)

The most widely used metric:
Accuracy = (a+d)/(a+b+c+d) = (TP+TN)/(TP+TN+FP+FN)

Problem with Accuracy:

Consider a 2-class problem:
     Number of class NO examples = 990
     Number of class YES examples = 10

If a model predicts everything to be class NO, its accuracy is 990/1000 = 99%.
This is misleading because the model does not detect a single class YES example.
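To see this concretely, here is a minimal Python sketch of the example above (the NO/YES counts are the illustrative numbers from the example):

y_true = ["NO"] * 990 + ["YES"] * 10
y_pred = ["NO"] * 1000   # a model that predicts everything as NO

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)          # 0.99 -- yet not a single YES example is detected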

Detecting the rare class is usually more interesting (e.g., frauds, intrusions, defects, etc.).


Precision (p) = a/(a+c)
Recall (r) = a/(a+b)
F-measure (F) = 2rp/(r+p) = 2a/(2a+b+c)
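As a quick check of these formulas, a minimal Python sketch that computes them from the four confusion-matrix counts (the counts used here are made up for illustration):

def classification_metrics(a, b, c, d):
    """a=TP, b=FN, c=FP, d=TN (same layout as the confusion matrix above)."""
    accuracy  = (a + d) / (a + b + c + d)
    precision = a / (a + c)
    recall    = a / (a + b)
    f_measure = 2 * recall * precision / (recall + precision)  # = 2a/(2a+b+c)
    return accuracy, precision, recall, f_measure

print(classification_metrics(a=40, b=10, c=20, d=930))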





Alpha is the probability that we reject the null hypothesis when it is true. This is a Type I error, or a false positive (FP).
Alpha = 1 - specificity

Beta is the probability that we accept the null hypothesis when it is false. This is a Type II error, or a false negative (FN).
Beta = 1 - sensitivity
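Treating "the example is negative" as the null hypothesis, alpha and beta can be read directly off the confusion-matrix counts. A small sketch with illustrative counts:

TP, FN, FP, TN = 40, 10, 20, 930    # illustrative counts

sensitivity = TP / (TP + FN)        # true positive rate (recall)
specificity = TN / (TN + FP)        # true negative rate

alpha = 1 - specificity             # Type I error rate  = FP/(FP+TN)
beta  = 1 - sensitivity             # Type II error rate = FN/(FN+TP)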


ROC (Receiver Operating Characteristic):

A graphical approach for displaying the trade-off between detection rate and false alarm rate.

Developed in the 1950s for signal detection theory, to analyze noisy signals.
The ROC curve plots TPR against FPR.

The performance of a model is represented as a point on the ROC curve.

Changing the threshold parameter of the classifier changes the location of the point.

ROC Curve:

(TPR, FPR):
(0,0): Declare everything to be a negative class
(1,1): Declare everything to be a positive class
(1,0): ideal
Diagonal line:
           Random guessing
 Below the diagonal line:
           prediction is the opposite of the true class

To draw an ROC curve, the classifier must produce continuous-valued output.
The outputs are used to rank test records from the most likely positive-class record to the least likely positive-class record.

Many classifiers produce only discrete outputs (i.e., the predicted class).

How to get continuous-valued outputs?
Many classifiers can be adapted to produce continuous scores (e.g., class probability estimates): decision trees, rule-based classifiers, neural networks, Bayesian classifiers, k-nearest neighbors, SVM.
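For example, with scikit-learn most classifiers expose class-membership probabilities through predict_proba, which can serve as the continuous score. A sketch on synthetic data, assuming scikit-learn is installed:

from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data: ~95% of examples in class 0
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)

scores = clf.predict_proba(X)[:, 1]   # continuous score: P(class = 1 | x)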


Using ROC for Model Comparison:

When comparing two models M1 and M2, no model may consistently outperform the other:
      M1 is better for small FPR
      M2 is better for large FPR

Area Under the ROC Curve (AUC):
Ideal: area = 1
Random guess: area = 0.5

How to construct a ROC curve:

Use a classifier that produces a continuous-valued score for each instance.
The more likely the instance is to be in the + class, the higher the score.

Sort the instances in decreasing order of score.

Apply a threshold at each unique value of the score.

At each threshold, count the number of TP, FP, TN, FN and compute:
TPR = TP/(TP+FN)
FPR = FP/(FP+TN)
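A minimal NumPy sketch of this procedure, applying a threshold at every unique score and recording an (FPR, TPR) point; the labels and scores are illustrative, and the AUC is computed with the trapezoidal rule:

import numpy as np

def roc_points(y_true, scores):
    """y_true: 0/1 labels; scores: higher = more likely positive."""
    y = np.asarray(y_true)
    s = np.asarray(scores, dtype=float)
    order = np.argsort(-s)             # sort by decreasing score
    y, s = y[order], s[order]
    P = int(y.sum())
    N = len(y) - P
    points = [(0.0, 0.0)]              # threshold above the maximum score
    tp = fp = 0
    for i in range(len(y)):
        tp += int(y[i])
        fp += 1 - int(y[i])
        # record a point at each unique score value
        if i == len(y) - 1 or s[i + 1] != s[i]:
            points.append((fp / N, tp / P))
    return points                      # list of (FPR, TPR) pairs

y_true = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.1]
pts = roc_points(y_true, scores)
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
print(pts, auc)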

Handling Class Imbalance Problem:

Class-based ordering (e.g., RIPPER):
Rules for the rare class have higher priority.

Cost-sensitive classification:
Misclassifying the rare class as the majority class is more expensive than misclassifying the majority class as the rare class.

Sampling-based approaches (detailed in the last section below).

Cost Matrix:
C(i,j): cost of misclassifying a class i example as class j.
Cost = Sum over i,j of C(i,j) * f(i,j), where f(i,j) is the number of class i examples classified as class j.
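A small NumPy sketch of this computation for the 2-class case; f and C are both indexed as (actual class, predicted class), and the numbers are illustrative:

import numpy as np

# f[i][j]: number of class-i examples classified as class j
f = np.array([[40, 10],     # actual +: 40 predicted +, 10 predicted -
              [20, 930]])   # actual -: 20 predicted +, 930 predicted -

# C[i][j]: cost of classifying a class-i example as class j
C = np.array([[0, 100],     # missing a + (rare class) is expensive
              [1,   0]])    # a false alarm is cheap

total_cost = (C * f).sum()  # Sum over i,j of C(i,j) * f(i,j)
print(total_cost)           # 10*100 + 20*1 = 1020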

Computing Cost of Classification:

Given a test record x:

Compute p(i|x) for each class i.
Decision rule: classify x as class k if k = arg max over i of p(i|x).

For 2 classes, classify x as + if p(+|x) > p(-|x).
This decision rule implicitly assumes that C(+,+) = C(-,-) = 0 and C(+,-) = C(-,+).

Cost-Sensitive Classification:
Classify test record x as class k if
                k = arg min over k of (Sum over i of p(i|x) * C(i,k))

2-class case:
cost(+) = p(+|x)C(+,+) + p(-|x)C(-,+)
cost(-) = p(+|x)C(+,-) + p(-|x)C(-,-)

Decision rule: classify x as + if cost(+) < cost(-).
If C(+,+) = C(-,-) = 0, this reduces to:
p(+|x) > C(-,+) / (C(-,+) + C(+,-))
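A sketch of this decision rule in Python; the cost values are illustrative, chosen to make missing the rare (+) class 100 times more expensive than a false alarm:

def classify_cost_sensitive(p_pos, C_pp=0.0, C_np=1.0, C_pn=100.0, C_nn=0.0):
    """p_pos = p(+|x); C_ab = C(a,b), cost of classifying a class-a example as b."""
    p_neg = 1.0 - p_pos
    cost_pos = p_pos * C_pp + p_neg * C_np   # cost of predicting +
    cost_neg = p_pos * C_pn + p_neg * C_nn   # cost of predicting -
    return "+" if cost_pos < cost_neg else "-"

# With C(+,+)=C(-,-)=0, predict + whenever p(+|x) > C(-,+)/(C(-,+)+C(+,-)) = 1/101
print(classify_cost_sensitive(0.05))   # "+": even a 5% chance of + is enough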

Sampling-based Approaches:
Modify the distribution of the training data so that the rare class is well represented in the training set:

     Undersample the majority class
     Oversample the rare class
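A minimal NumPy sketch of both strategies: random undersampling of the majority class, and random oversampling (with replacement) of the rare class. The helper names are my own:

import numpy as np

rng = np.random.default_rng(0)

def undersample_majority(X, y, majority_label):
    maj = np.where(y == majority_label)[0]
    rare = np.where(y != majority_label)[0]
    keep = rng.choice(maj, size=len(rare), replace=False)  # shrink majority
    idx = np.concatenate([keep, rare])
    return X[idx], y[idx]

def oversample_rare(X, y, rare_label):
    rare = np.where(y == rare_label)[0]
    maj = np.where(y != rare_label)[0]
    extra = rng.choice(rare, size=len(maj), replace=True)  # duplicate rare
    idx = np.concatenate([maj, extra])
    return X[idx], y[idx]

Note the trade-off: undersampling discards potentially useful majority-class examples, while naive oversampling duplicates rare examples and can encourage overfitting.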