AUC, or Area Under the Curve, is one of the metrics that can help with machine learning model evaluation. If the value is 0.5, or 50%, the model is no better than a random guess. If the value is 1, the model is 100% correct, which in data science is a red flag: it usually indicates a feature in the model that is perfectly correlated with what we are trying to predict.
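The two boundary cases above can be made concrete with scikit-learn's `roc_auc_score` (the toy labels and scores here are my own illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)  # binary labels: 0 = good guy, 1 = bad guy

# Random scores carry no information, so AUC hovers around 0.5.
random_scores = rng.random(1000)
print(roc_auc_score(y_true, random_scores))

# Scores that equal the labels themselves separate the classes perfectly,
# yielding AUC = 1.0: the classic symptom of a leaked feature.
print(roc_auc_score(y_true, y_true.astype(float)))
```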
Recently I’ve been trying to figure out why my classification model had an AUC ranging between 95% and 99%. An AUC this high is generally too good to be true. It is a strange problem to have, since one usually wants to increase an AUC score rather than decrease it.
So I started with the most obvious suspect: leakage into my model, a variable that would not be available at the time of prediction. Given that I had generated about 160 variables for the model, analyzing each one did not shed any light on the issue. No leakage found.
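One way to script that per-variable check is to rank each column by its single-feature AUC; anything near 1.0 deserves a hard look. The function name `leakage_screen`, the 0.95 threshold, and the toy data are my own illustration, not the analysis I actually ran:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def leakage_screen(X, y, threshold=0.95):
    """Flag features whose single-feature AUC is suspiciously close to 1."""
    suspects = []
    for j in range(X.shape[1]):
        auc = roc_auc_score(y, X[:, j])
        auc = max(auc, 1 - auc)  # a perfectly anti-correlated leak counts too
        if auc >= threshold:
            suspects.append((j, auc))
    return sorted(suspects, key=lambda t: -t[1])

# Toy data: feature 0 is essentially a copy of the label (leaky),
# feature 1 is pure noise.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 500)
X = np.column_stack([y + 0.01 * rng.standard_normal(500),
                     rng.standard_normal(500)])
print(leakage_screen(X, y))  # only feature 0 should be flagged
```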
I was advised to try different techniques, one of which was SMOTE (Synthetic Minority Oversampling Technique). Considering that the ratio of my ‘good guys’ to my ‘bad guys’ (I’m looking to predict ‘bad guys’) is 75% to 25%, the suggestion was that a more even ratio, closer to 50/50, would be more beneficial for the model. SMOTE artificially generated more ‘bad guys’ for my data set, but my AUC did not really change.
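For readers unfamiliar with SMOTE, its core interpolation step can be sketched with scikit-learn's nearest-neighbor search. This is an illustrative sketch, not the imbalanced-learn implementation; `smote_sketch` and the 25/75 toy data are assumptions of mine:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE: each synthetic sample is a random interpolation
    between a minority point and one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)  # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = idx[i, rng.integers(1, k + 1)]  # pick a random true neighbor
        lam = rng.random()                  # interpolation weight in [0, 1)
        synthetic.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# 25/75 imbalance: synthesize enough 'bad guys' to reach 50/50.
rng = np.random.default_rng(2)
X_bad = rng.standard_normal((250, 4))   # minority class
X_good = rng.standard_normal((750, 4))  # majority class
X_new = smote_sketch(X_bad, n_new=500)
print(len(X_bad) + len(X_new), len(X_good))  # now 750 vs 750
```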
Then I tried randomly limiting my ‘good guys’ to the same number as my ‘bad guys’. My AUC actually went up a few points.
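Random undersampling of the majority class is simple enough to show in full; `undersample_majority` is a hypothetical helper and the toy data mirrors my 75/25 ratio:

```python
import numpy as np

def undersample_majority(X, y, majority_label=0, seed=0):
    """Randomly keep only as many majority rows as there are minority rows."""
    rng = np.random.default_rng(seed)
    maj = np.flatnonzero(y == majority_label)
    mino = np.flatnonzero(y != majority_label)
    keep = rng.choice(maj, size=len(mino), replace=False)
    sel = np.sort(np.concatenate([keep, mino]))  # preserve row order
    return X[sel], y[sel]

rng = np.random.default_rng(3)
y = (rng.random(1000) < 0.25).astype(int)  # roughly 25% 'bad guys'
X = rng.standard_normal((1000, 3))
X_bal, y_bal = undersample_majority(X, y)
print(y_bal.mean())  # 0.5: the classes are now exactly balanced
```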
My next attempt to understand the problem was to gradually eliminate the column with the highest Permutation Feature Importance value from my model and check the AUC score. The AUC decreased anywhere from a few to a hundred basis points with each iteration.
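The elimination loop looks roughly like this, using scikit-learn's `permutation_importance`. The synthetic data set and the three-round loop are illustrative, not my actual 160-variable model:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy stand-in for the real data set.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           random_state=0)
cols = list(range(X.shape[1]))  # surviving column indices
aucs = []
for _ in range(3):              # three elimination rounds
    Xtr, Xte, ytr, yte = train_test_split(X[:, cols], y, random_state=0)
    clf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
    aucs.append(roc_auc_score(yte, clf.predict_proba(Xte)[:, 1]))
    # Drop the single most important feature before the next round.
    imp = permutation_importance(clf, Xte, yte, n_repeats=5, random_state=0)
    cols.pop(int(np.argmax(imp.importances_mean)))
print([round(a, 3) for a in aucs])  # watch how the AUC moves per round
```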
It turns out that I was solving a somewhat different problem. Management had used the same data elements that I used in my model to come up with the business rules that categorize someone as a ‘bad guy’. The model picked up those rules, hence the very high AUC, accuracy, and precision. So my model wasn’t really predicting probabilities for the ‘bad guys’, but rather the view of the bad guys through the lens of our business experts.
The model is still very useful. There are at least two benefits as I see them. First, if the expert user is no longer available and his or her rules have not been documented, the model fills that gap. Second, automation: the model can relieve experts from manual work so that they can concentrate on more value-adding activities.
I would like to extend special thanks to Ilya Lipkovich, Ph.D. for his contribution to the resolution of this issue.
I would also like to thank Microsoft Azure ML Team, specifically Akshaya Annavajhala and Xinwei Xue, Ph.D. for their assistance.