It may be beneficial for your model to use the Clean Missing Data module when using SMOTE. Let’s consider the following example of stock data.
My dataset is missing values in the first row for the columns Long and Short. I defined these two fields myself, and their values depend on the next day’s data. If today were February 4th, we wouldn’t have the next trading day’s data yet, hence the missing values.
Here is what my model looks like with a Cleaner and SMOTE.
Here is the output of our data after the Cleaner.
My Cleaner replaced empty values with 0s, and I already use 0s in my dataset to indicate negative outcomes, so this is probably not an ideal practice. For illustrative purposes, though, it should be just fine.
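To make the caveat concrete, here is a minimal Python sketch of what replacing missing values with 0 does to a label column, next to the alternative of dropping the incomplete rows. The rows, dates, and helper names (`fill_missing`, `drop_missing`) are hypothetical and only stand in for the idea; this is not how the Clean Missing Data module is implemented.

```python
# Hypothetical rows: Long/Short use 1 for a positive outcome and 0 for a negative one.
# None marks a value that is missing because the next day's data is not available yet.
rows = [
    {"day": "Feb 4", "Long": None, "Short": None},
    {"day": "Feb 1", "Long": 1, "Short": 0},
    {"day": "Jan 31", "Long": 0, "Short": 1},
]
label_columns = ["Long", "Short"]


def fill_missing(rows, columns, value=0):
    """Return copies of the rows with missing values in `columns` replaced by `value`."""
    filled = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            if new_row[col] is None:
                new_row[col] = value
        filled.append(new_row)
    return filled


def drop_missing(rows, columns):
    """Return only the rows that have a value in every one of `columns`."""
    return [row for row in rows if all(row[col] is not None for col in columns)]


filled = fill_missing(rows, label_columns)
dropped = drop_missing(rows, label_columns)

# After filling, the Feb 4 row looks like a genuine negative outcome for Long.
negatives_before = sum(1 for r in rows if r["Long"] == 0)
negatives_after = sum(1 for r in filled if r["Long"] == 0)
print(negatives_before, negatives_after, len(dropped))
```

After filling, the row with no known outcome is indistinguishable from a true negative, which is why dropping such rows may be the safer choice outside of a demo.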
Let’s Visualize the content of Evaluate Model.
Pretty high AUC, accuracy, precision, recall and F1 score. Let’s check what happens when we connect our dataset directly to SMOTE, bypassing the Clean Missing Data module.
Here is the output from SMOTE.
So we can see that rows with empty values were also added to our dataset. My original dataset had 4 records where Long had a value of 1, and I used 200% in SMOTE, so I was expecting only 8 additional rows; however, 10 were added. Let’s evaluate our model.
A significant drop across the board: an AUC of 0.5 means our model is no better than a random guess.
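The expectation of 8 rows comes from simple arithmetic, sketched below in Python. It assumes the SMOTE percentage is applied to the number of minority rows; the function name `expected_synthetic_rows` is made up for this illustration.

```python
def expected_synthetic_rows(minority_count: int, smote_percentage: int) -> int:
    """Rows added if SMOTE adds `smote_percentage` percent of the minority count."""
    if minority_count < 0 or smote_percentage < 0:
        raise ValueError("counts and percentages must be non-negative")
    return minority_count * smote_percentage // 100


expected = expected_synthetic_rows(4, 200)  # 4 rows with Long == 1, SMOTE at 200%
observed = 10                               # additional rows seen in the SMOTE output
surplus = observed - expected

print(expected, observed, surplus)
```

Note that the same formula gives 10 for 5 minority rows, which would be consistent with the row with missing values being counted alongside the four positives. That is only a guess, not something the output confirms.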
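To see why an AUC of 0.5 corresponds to guessing, here is a small Python sketch that computes AUC as the share of (positive, negative) pairs in which the positive example gets the higher score, with ties counting as half. The labels and scores are made up for illustration, and this is not how Evaluate Model computes the metric internally.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 0]
good_scores = [0.9, 0.8, 0.2, 0.1]     # positives always ranked above negatives
useless_scores = [0.5, 0.5, 0.5, 0.5]  # same score for everything, no ranking ability

auc_good = auc(labels, good_scores)
auc_useless = auc(labels, useless_scores)
print(auc_good, auc_useless)
```

A model that cannot separate the classes at all ends up at 0.5, the same place as a coin flip.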