Azure Machine Learning SMOTE – Part 2

It may be beneficial for your model to use the Clean Missing Data module when using SMOTE. Let’s consider the following example of stock data.

Vizualize_TEAM_Data

My dataset is missing values in the first row for columns Long and Short. I defined these two fields myself, and their values depend on the next day’s data. If today were February 4th, we wouldn’t have the next trading day’s data yet, hence the missing values.
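To make the source of the gap concrete, here is a minimal sketch of how labels that depend on the next day's data leave the newest row empty. The prices and the labeling rule (Long = 1 when the next close is higher, Short = 1 when it is lower) are assumptions for illustration only; the actual definitions of Long and Short in my dataset are not shown in this post.

```python
# Hypothetical closing prices, newest first (the first row is "today").
closes = [("Feb 4", 21.0), ("Feb 3", 20.5), ("Feb 2", 20.9)]


def label_rows(rows):
    """Attach Long/Short labels derived from the NEXT trading day's close.

    rows is a list of (date, close) tuples ordered newest first, so the
    next trading day for row i is row i - 1. The labeling rule itself is
    an assumption made for this sketch.
    """
    labeled = []
    for i, (date, close) in enumerate(rows):
        if i == 0:
            # The newest row has no next trading day yet: labels are missing.
            long_flag = None
            short_flag = None
        else:
            next_close = rows[i - 1][1]
            long_flag = 1 if next_close > close else 0
            short_flag = 1 if next_close < close else 0
        labeled.append(
            {"Date": date, "Close": close, "Long": long_flag, "Short": short_flag}
        )
    return labeled


labeled = label_rows(closes)
for row in labeled:
    print(row)
```

Only the first row comes back with None for both labels, which is the same shape of gap as in the dataset above.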
Here is what my model looks like with a Cleaner and SMOTE.

TEAM_Model

Here is the output of our data after the Cleaner.
TEAM_Cleaner_Outputl

My Cleaner replaced the empty values with 0s, and I already use 0s in my dataset to indicate negative outcomes, so the rows with missing values now look like negative examples. This is probably not an ideal practice, but for illustrative purposes it should be just fine.
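The difference between the two cleaning choices can be sketched in plain Python. This is not the Clean Missing Data module itself, just an illustration on made-up rows: filling with 0 turns an unknown label into a negative one, while removing the row (the module offers a cleaning mode that removes the entire row) keeps the label column honest.

```python
# Made-up rows; None marks a missing Long label.
rows = [
    {"Date": "Feb 4", "Long": None},
    {"Date": "Feb 3", "Long": 1},
    {"Date": "Feb 2", "Long": 0},
]


def fill_with_zero(rows, column):
    """Replace missing values in `column` with 0 (what my Cleaner does here)."""
    return [{**r, column: 0 if r[column] is None else r[column]} for r in rows]


def drop_missing(rows, column):
    """Remove every row whose value in `column` is missing."""
    return [r for r in rows if r[column] is not None]


filled = fill_with_zero(rows, "Long")
dropped = drop_missing(rows, "Long")

# After filling, the unlabeled row is indistinguishable from a real negative.
negatives_after_fill = sum(1 for r in filled if r["Long"] == 0)
negatives_after_drop = sum(1 for r in dropped if r["Long"] == 0)
print(negatives_after_fill, negatives_after_drop)
```

With filling, the count of negative outcomes grows by one even though nothing is actually known about that day.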

Let’s Visualize the content of Evaluate Model.
TEAM_Evaluate
Pretty high AUC, accuracy, precision, recall and F1 score. Let’s check what happens when we connect our dataset directly to SMOTE, bypassing the Clean Missing Data module.

TEAM_No_Cleaner_Model

Here is the output from SMOTE.
TEAM_SMOTE_output_withOUT_cleaner2.jpg
So we can see that rows with empty values were also added to our dataset. My original dataset had 4 records where Long had a value of 1, and I set SMOTE to 200%, so I was expecting only 8 additional rows; however, 10 were added. Let’s evaluate our model.
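The arithmetic behind my expectation is simple enough to write down. The helper below is hypothetical and only encodes the assumption I made, that the SMOTE percentage is applied to the number of minority rows. The last line tests one possible explanation for the surplus, which I have not confirmed: that the row with the missing Long value was counted together with the four positive rows.

```python
def expected_synthetic_rows(minority_count: int, smote_percentage: int) -> int:
    """Rows SMOTE would add if the percentage applies to the minority count.

    This is my assumption about the setting, not a statement of how the
    module is implemented.
    """
    return minority_count * smote_percentage // 100


expected = expected_synthetic_rows(4, 200)   # 4 positive rows at 200%
observed = 10                                # rows actually added
surplus = observed - expected

# Unconfirmed hypothesis: the row with a missing label was treated as a
# fifth non-majority row.
hypothetical = expected_synthetic_rows(5, 200)

print(expected, surplus, hypothetical)
```

If that hypothesis is right, the two extra rows come from the single unlabeled record, which would also fit the empty values appearing in the new rows.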
TEAM_No_Cleaner_Evaluate.jpg
A significant drop across the board. An AUC of 0.5 means our model is no better than a random guess.
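To see why 0.5 is the "random guess" mark, here is a small sketch that computes AUC as the fraction of positive/negative pairs in which the positive row gets the higher score, counting ties as half. The labels and scores are made up, and this is not how Evaluate Model is implemented, only the standard pairwise definition of AUC.

```python
def auc(labels, scores):
    """Pairwise AUC: P(score of a positive > score of a negative), ties = 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# A model that gives every row the same score cannot rank anything.
uninformative = auc([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5])

# A model that scores every positive above every negative ranks perfectly.
perfect = auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])

print(uninformative, perfect)
```

A model stuck at 0.5 ranks positives above negatives only as often as a coin flip would.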
