Azure ML Feature Engineering – Convert to Indicator Values

Feature engineering is probably one of my favorite aspects of data science. This is the area where domain expertise and creativity can pay high dividends. Essentially, feature engineering allows us to come up with our own features, or columns, to make our models better. We can apply numerous tricks using the variety of tools provided by Azure ML Studio. Here is a screenshot of the different manipulation modules:


Convert to Indicator Values is a module that transforms the values in the rows of a column into separate columns with binary values. For example, if we have a data set with a single column A with 3 rows and values ‘b’, ‘c’, and ‘d’, applying Convert to Indicator Values produces a data set with the original column A and 3 new columns b, c and d, with 1s and 0s indicating whether each row holds that value. Transformation of categorical values into columns has been available in most statistical software, for example Minitab. In one of my M.B.A. courses, when studying regression, we called ‘b’, ‘c’ and ‘d’ “dummy variables”.
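To make the A / b, c, d example concrete, here is a minimal sketch of the same transformation outside of Azure ML Studio. It uses pandas as a stand-in (my assumption, not something the module itself exposes), and the variable names are only illustrative.

```python
import pandas as pd

# Single column A with three rows, as in the example above.
df = pd.DataFrame({"A": ["b", "c", "d"]})

# One indicator column per distinct value; dtype=int yields 1s and 0s
# rather than booleans.
indicators = pd.get_dummies(df["A"], dtype=int)

# Keep the original column A next to the new indicator columns.
result = pd.concat([df, indicators], axis=1)
print(result)
```

Each row ends up with a 1 in exactly one of the new columns, which is the "dummy variable" encoding from regression courses.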


When adding Convert to Indicator Values, make sure to first use a Metadata Editor to convert the field to categorical; otherwise you will get the following error: ‘Column with name “xxx” is not in an allowed category. ( Error 0056 )’.
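For readers who think in code, the "mark the field as categorical first" step has a rough analogue in pandas. This is only an analogy under my own assumptions (pandas does not raise Error 0056, and the Metadata Editor is configured in the Studio UI, not in code):

```python
import pandas as pd

df = pd.DataFrame({"A": ["b", "c", "d"]})

# Rough analogue of the Metadata Editor step: declare column A categorical.
df["A"] = df["A"].astype("category")

# The column now carries an explicit list of allowed categories.
categories = list(df["A"].cat.categories)
print(df["A"].dtype, categories)
```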

Check Overwrite categorical columns if you want the original column removed; otherwise the original column stays in the dataset. It has been suggested that keeping the original column may help a decision tree algorithm, whereas removing it may help a simple linear algorithm.
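The difference between the two settings can be sketched in pandas as well. Again, this is an illustration under my own assumptions: the extra numeric column x and the A_ prefix are hypothetical, and the column names Azure ML Studio generates may differ.

```python
import pandas as pd

# Column A plus a hypothetical numeric feature x.
df = pd.DataFrame({"A": ["b", "c", "d"], "x": [1.0, 2.0, 3.0]})

# Analogue of checking the overwrite option: A is replaced by its indicators.
overwritten = pd.get_dummies(df, columns=["A"], dtype=int)

# Analogue of leaving it unchecked: A stays alongside the indicators.
kept = pd.concat([df, pd.get_dummies(df["A"], prefix="A", dtype=int)], axis=1)

print(list(overwritten.columns))
print(list(kept.columns))
```

Which variant to feed a model is the judgment call described above: the version with A kept for tree-based learners, the version without it for simple linear ones.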