I prefer to have an end-to-end solution when evaluating a potential product. This fits well with the “small batch” concept described by Eric Ries in his book The Lean Startup. So, to truly understand how to use a particular technology, I like to build a prototype solution for a problem I’m trying to solve. I applaud Microsoft, and the Azure ML team in particular, for how easy Azure ML is for a novice user. The fact that a web service can be deployed with a single click is absolutely awesome. Another really great feature is the availability of sample code to consume your web service. Open your newly created web service and click BATCH EXECUTION. Scroll to the bottom and you’ll see sample code in C#, Python and R.
I created a console app using Visual Studio 2015 and the sample code mentioned above. I opened the Package Manager Console as described in the sample code instructions:
Tools -> NuGet Package Manager -> Package Manager Console
I also searched NuGet for WindowsAzure.Storage and installed it, which automatically added all the required references, including Microsoft.WindowsAzure.Storage.dll as listed in the documentation.
Search for “replace” in the code to find all the places where you need to add your own data, such as the Azure Blob storage account, the storage key, the key for your newly created web service, etc.
Below are some tips to get the code running. When the program runs, you’ll see a command prompt window come up.
The program uploads a file from my local machine to Azure Blob storage, scores it using a classification model and downloads a new file with scored labels and probabilities to my local machine.
One thing to note when entering file locations: use a verbatim C# string for the file name. Putting @ in front of the opening quote tells the compiler to treat backslashes literally instead of as the start of an escape sequence. Here are the key constants that need to be set:
const string StorageAccountName = "yourstorageaccount"; // Replace this with your Azure Storage Account name
const string StorageAccountKey = "yourkey=="; // Replace this with your Azure Storage Key
const string StorageContainerName = "yourcontainer"; // Replace this with your Azure Storage Container name
const string InputFileLocation = @"C:\Temp\records.csv"; // Replace this with the location of your input file
const string InputBlobName = "scored_records.csv"; // Replace this with the name you would like to use for your Azure blob; this needs to have the same extension as the input file
const string apiKey = "yourAPIKey=="; // Replace this with the API key for the web service
const string OutputFileLocation = @"C:\Temp\Scoring_Output.csv"; // Replace this with the location you would like to use for your output file
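To illustrate the verbatim string tip, here is a minimal sketch comparing a regular string literal with a verbatim one. The class name VerbatimPathDemo is made up for this example and is not part of the sample code. Without the @, a path such as C:\Temp\records.csv would not compile as written, because the compiler tries to read \T and \r as escape sequences.

```csharp
using System;

static class VerbatimPathDemo
{
    // Regular literal: every backslash has to be doubled.
    public const string Escaped = "C:\\Temp\\records.csv";

    // Verbatim literal: backslashes are taken literally.
    public const string Verbatim = @"C:\Temp\records.csv";

    static void Main()
    {
        // Both literals produce the same string at run time.
        Console.WriteLine(Escaped == Verbatim); // prints True
        Console.WriteLine(Verbatim);            // prints C:\Temp\records.csv
    }
}
```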
Also, make sure BaseUrl is set to the POST request URL. You can find this when you click on BATCH EXECUTION. Make sure not to include anything after ‘jobs’, for example https://ussouthcentral.services.azureml.net/workspaces/feb9f1db037d499fa3e3081a318eada2/services/1a7q3vfd95cd4791afc06216f354c697/jobs
Set the timeout in the code to whatever you need. You might want to increase it; I set mine to 30 minutes.
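If you copy the URL straight from the page, a small helper can cut off whatever follows ‘jobs’. This is only a sketch: TrimToJobs is a hypothetical helper, not part of the sample code, and the query string and the workspace and service IDs in the example are placeholders I made up.

```csharp
using System;

static class BaseUrlHelper
{
    // Hypothetical helper: keeps everything up to and including the first "/jobs".
    public static string TrimToJobs(string postUrl)
    {
        const string marker = "/jobs";
        int index = postUrl.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
        if (index < 0)
        {
            throw new ArgumentException("URL does not contain a /jobs segment.", nameof(postUrl));
        }
        return postUrl.Substring(0, index + marker.Length);
    }

    static void Main()
    {
        // Placeholder IDs and an assumed trailing query string, for illustration only.
        string copied = "https://ussouthcentral.services.azureml.net/workspaces/your-workspace-id/services/your-service-id/jobs?api-version=2.0";
        Console.WriteLine(TrimToJobs(copied));
    }
}
```

The result can then be pasted into the BaseUrl constant.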
const int TimeOutInMilliseconds = 1800 * 1000; // Set a timeout of 30 minutes
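The 1800 * 1000 above is simply 30 minutes × 60 seconds × 1000 milliseconds. If you would rather not do that arithmetic by hand, a sketch like the following uses TimeSpan to do the conversion. MinutesToMilliseconds is a hypothetical helper name, and because the result is computed at run time it cannot be assigned to a const field.

```csharp
using System;

static class TimeoutDemo
{
    // Hypothetical helper: converts whole minutes to milliseconds.
    // 'checked' raises an OverflowException if the result does not fit in an int.
    public static int MinutesToMilliseconds(int minutes)
    {
        return checked((int)TimeSpan.FromMinutes((double)minutes).TotalMilliseconds);
    }

    static void Main()
    {
        Console.WriteLine(MinutesToMilliseconds(30)); // prints 1800000
    }
}
```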
It took 28 minutes to process 2.1 million records or about 1.56 GB of data when I ran my initial test.
Once I got the file with scored probabilities, I realized that I did not need the web service to return all the data I was sending to the model for scoring. So I changed my web service, using Project Columns, to return only my key value with Scored Labels and Scored Probabilities, which drastically reduced the size of the file returned by the console app.
This console app can be scheduled as a task with Windows Task Scheduler or integrated into an SSIS package. However, at the time of this writing, the current version of SSIS (VS 2012 – 2013) had older versions of the references that the sample code requires.
Note that Scored Probabilities values appear in scientific notation, e.g. 3.25789456123547E-08, which isn’t easy to work with in a SQL Server table. So I created another field of type decimal(20, 18) and set it to the following value using T-SQL:
CASE
WHEN [Scored Probabilities] LIKE '%E-%' THEN LTRIM(RTRIM(CAST(CAST([Scored Probabilities] AS FLOAT) AS DECIMAL(20,18))))
ELSE [Scored Probabilities]
END
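I did the conversion in T-SQL, but the same thing could be done in the console app before the data ever reaches SQL Server. The sketch below is an alternative I did not use: ToFixedPoint is a hypothetical helper, and it assumes each value is a plain number or exponent-style string like the one above.

```csharp
using System;
using System.Globalization;

static class ScoredProbabilityFormatter
{
    // Hypothetical helper: parses a value that may be in scientific notation
    // and returns it in fixed-point form with 18 decimal places,
    // matching the scale of a decimal(20, 18) column.
    public static string ToFixedPoint(string raw)
    {
        // NumberStyles.Float allows an exponent such as "E-08".
        decimal value = decimal.Parse(raw, NumberStyles.Float, CultureInfo.InvariantCulture);
        return value.ToString("F18", CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(ToFixedPoint("3.25789456123547E-08"));
        Console.WriteLine(ToFixedPoint("0.75"));
    }
}
```

Using the invariant culture keeps the decimal separator a period regardless of the regional settings on the machine running the app.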
Also, I made the mistake of renaming one of my columns to ‘Processed’ instead of ‘processed’ when generating the file to be scored, without updating it in the model. I kept getting an error that the column ‘processed’ could not be found when trying to run the app. Be careful with case sensitivity; it does matter.
I would like to extend special thanks to the Microsoft Azure ML team, and specifically to Akshaya Annavajhala for his assistance and patience.
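A quick check of the input file’s header before uploading would have caught this. Here is a minimal sketch, assuming a simple comma-separated header line with no quoted fields. MissingColumns is a hypothetical helper, and the ‘id’ and ‘amount’ columns are invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class HeaderCheck
{
    // Hypothetical helper: returns the expected column names that have no
    // exact, case-sensitive match in the header line.
    public static List<string> MissingColumns(string headerLine, IEnumerable<string> expected)
    {
        var actual = new HashSet<string>(
            headerLine.Split(',').Select(column => column.Trim()),
            StringComparer.Ordinal); // Ordinal comparison is case-sensitive
        return expected.Where(name => !actual.Contains(name)).ToList();
    }

    static void Main()
    {
        // The file says "Processed" but the model expects "processed".
        List<string> missing = MissingColumns("id,Processed,amount", new[] { "id", "processed", "amount" });
        Console.WriteLine(string.Join(", ", missing)); // prints processed
    }
}
```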