A Random Forests Quantile Classifier for Class Imbalanced Data


Title:

A Random Forests Quantile Classifier for Class Imbalanced Data

Author:

Robert O'Brien

Date:

2018

Executive Summary:

Extending previous work on quantile classifiers, we propose a random forests quantile classifier for the class-imbalanced data problem. The new classifier assigns an example to the minority class if the minority-class conditional probability exceeds the unconditional probability of observing a minority-class instance. The motivation for our random forests quantile classifier stems from a density-based approach and leads to the useful property that it maximizes the sum of the true positive and true negative rates. Moreover, because the procedure can be equivalently expressed as a cost-weighted Bayes classifier, it also minimizes weighted risk. Because of this dual optimality, and unlike the traditional Bayes classifier, the random forests quantile classifier can achieve near-zero risk in highly imbalanced problems while simultaneously optimizing the true positive and true negative rates. A common strategy employed by classifiers for imbalanced data is to under-sample the majority class and then apply the Bayes rule. We show that this strategy allows the Bayes rule (a median classifier, q = 0.5) to jointly optimize the true positive and true negative rates. At the same time, we show that the random forests quantile classifier is invariant to such sampling strategies and retains its optimality regardless. Moreover, we show that it outperforms under-sampling with respect to G-mean performance and variable selection in rare, high-dimensional, and highly imbalanced settings.
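
The decision rule described above can be illustrated with a minimal sketch. The sketch below assumes scikit-learn's RandomForestClassifier as a generic stand-in for the random forest used in the work (the dataset, parameters, and function names are illustrative, not the authors' implementation): an example is assigned to the minority class whenever the forest's estimated minority-class probability exceeds the empirical minority-class frequency, and performance is compared against the usual 0.5 threshold using the G-mean (the geometric mean of the true positive and true negative rates).

# Minimal sketch, assuming scikit-learn as a stand-in for the authors' random forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Simulated imbalanced data: roughly 5% minority class (label 1).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
p_minority = rf.predict_proba(X_te)[:, 1]   # estimated P(minority | x)

pi_hat = y_tr.mean()                        # unconditional minority-class frequency

# Quantile rule: predict the minority class whenever P(minority | x) exceeds
# the minority-class frequency, versus the usual Bayes/median rule at 0.5.
pred_quantile = (p_minority >= pi_hat).astype(int)
pred_bayes = (p_minority >= 0.5).astype(int)

def g_mean(y_true, y_pred):
    """Geometric mean of the true positive and true negative rates."""
    tpr = np.mean(y_pred[y_true == 1] == 1)
    tnr = np.mean(y_pred[y_true == 0] == 0)
    return np.sqrt(tpr * tnr)

print("G-mean, quantile threshold:", g_mean(y_te, pred_quantile))
print("G-mean, 0.5 threshold:     ", g_mean(y_te, pred_bayes))

On heavily imbalanced data the 0.5 threshold tends to predict the majority class almost everywhere, driving the true positive rate (and hence the G-mean) toward zero, whereas thresholding at the minority-class frequency trades a modest loss in true negative rate for a large gain in true positive rate.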