Data wrangling with Azure Machine Learning Workbench
Hi all,
This blog post covers how to transform data with Azure Machine Learning Workbench. One of the most important aspects of any machine learning project is the accuracy and quality of your data. If your data is inaccurate or otherwise bad, the learning or simulation models will generate bad results.
The main components of Azure Machine Learning are:
- Azure Machine Learning Workbench (which is what we will use today to prepare our data)
- Azure Machine Learning Experimentation Service (to be able to use Azure ML Workbench we need an Experimentation Service)
- Azure Machine Learning Model Management Service
- Microsoft Machine Learning Libraries for Apache Spark (MMLSpark Library)
- Visual Studio Code Tools for AI
Have a look here for a quick overview of Azure Machine Learning and a detailed description of the different Machine learning components.
We will create a new Azure ML Experimentation Service first, after which we will download and install Azure ML Workbench.
Azure Machine Learning Experimentation Service
The Experimentation Service handles the execution of machine learning experiments. It also supports the Workbench by providing project management, Git integration, access control, roaming, and sharing.
Through easy configuration, you can execute your experiments across a range of compute environment options:
- Local native
- Local Docker container
- Docker container on a remote VM
- Scale out Spark cluster in Azure
Today we will not execute any experiments. We will just show how to transform the data. Let’s create our new Azure ML Experimentation account:
- Select the New button (+) in the upper-left corner of the Azure portal.
- Enter Machine Learning in the search bar. Select the search result named Machine Learning Experimentation (preview).
I selected the DevTest SKU.
After filling in the required information and submitting the deployment you should see the following:
Azure Machine Learning Workbench
Now that we have our Azure ML Experimentation account, we can install Azure ML Workbench. Azure ML Workbench is a desktop application plus command-line tools, supported on both Windows and macOS. It allows you to manage machine learning solutions through the entire data science life cycle:
- Data ingestion and preparation
- Model development and experiment management
- Model deployment in various target environments
You can install Azure Machine Learning Workbench on your computer running Windows 10, Windows Server 2016, or newer.
- Download the latest Azure Machine Learning Workbench installer AmlWorkbenchSetup.msi.
- Double-click the downloaded installer AmlWorkbenchSetup.msi from File Explorer. The installer downloads all the necessary dependent components, such as Python, Miniconda, and other related libraries. Installing all the components might take around half an hour.
- After the installation process is complete, select the Launch Workbench button on the last screen of the installer.
- Sign in to Workbench by using the same account that you used earlier to provision your Azure resources.
- When the sign-in process has succeeded, Workbench attempts to find the Machine Learning Experimentation accounts that you created earlier. It searches for all Azure subscriptions to which your credential has access. When at least one Experimentation account is found, Workbench opens with that account. It then lists the workspaces and projects found in that account.
Create a new project
- Start the Azure Machine Learning Workbench app and sign in.
- Select File > New Project (or select the + sign in the PROJECTS pane).
- Fill in the Project name and Project directory boxes. Project description is optional but helpful. Leave the Visualstudio.com GIT Repository URL box blank for now.
- Select Blank Project.
I downloaded a dataset from https://www.kaggle.com/datasets that I will be using here as an example. To bring data into a project, use the data source wizard: select the + button next to the search box in the data view and choose Add Data Source.
In my case, I used Excel as the data source.
You can specify one or more sampling strategies for the dataset, and choose one as the active strategy. The default is to load the Top 100 rows. I changed it to load the full file.
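To make the difference between the two strategies concrete, here is a minimal sketch in plain Python. It does not use the Workbench API; the `flights` list is a made-up stand-in for the real dataset, and `top_rows` and `full_file` are hypothetical helper names. It assumes the file is sorted by month, which is where a Top 100 sample can be misleading.

```python
# Hypothetical stand-in for the flights data: (month, delay) rows sorted by month.
flights = [(month, delay) for month in range(1, 13) for delay in range(50)]


def top_rows(rows, n=100):
    """Return only the first n rows, like a 'Top 100' sample."""
    return rows[:n]


def full_file(rows):
    """Return every row, like loading the full file."""
    return list(rows)


sample_top = top_rows(flights)
sample_full = full_file(flights)

# Which months does each sample actually contain?
months_in_top = sorted({month for month, _ in sample_top})
months_in_full = sorted({month for month, _ in sample_full})

print(months_in_top)        # [1, 2]
print(len(months_in_full))  # 12
```

With sorted data, the top sample only ever sees the first two months, so any transformation you design against it is never checked against the other ten. Loading the full file is slower, but for a small dataset like this one it is the safer choice.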
Once the data has been imported, you will see the following screen.
Create a Data Preparation Package
Now that we have our data source, we will prepare our data by creating a Data Preparation Package, which we will use to transform the data.
- Click the ‘Data’ icon again.
- Right-click the flights data source and click ‘Prepare’.
- Add a ‘Data Preparation Package Name’, then press ‘OK’.
This creates a flights.dprep file in your project directory and shows a page very similar to the one we saw after adding the data source.
Derive Column By Example Transformation
We will now transform our month column from numeric values to strings using the Derive Column By Example transformation. Azure ML Workbench synthesizes a program based on the examples you provide and applies that program to the remaining rows, so all other rows are populated automatically. Workbench also analyzes your data and tries to identify edge cases.
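The idea of deriving a transformation from examples can be sketched in a few lines of Python. This is only an illustration of the concept, not how Workbench works internally: the `CANDIDATES` set and the `synthesize` and `derive_column` functions are hypothetical names invented for this sketch, and a real tool searches a far larger space of programs.

```python
import calendar

# Hypothetical candidate transformations a by-example tool might try.
# calendar.month_name is locale-dependent; English names are the default.
CANDIDATES = {
    "full month name": lambda v: calendar.month_name[int(v)],
    "abbreviated month name": lambda v: calendar.month_abbr[int(v)],
    "number as text": lambda v: str(v),
}


def synthesize(examples):
    """Return (name, function) of the first candidate matching every example."""
    for name, fn in CANDIDATES.items():
        try:
            if all(fn(source) == expected for source, expected in examples):
                return name, fn
        except (ValueError, IndexError):
            # This candidate cannot handle one of the inputs; try the next one.
            continue
    raise ValueError("no candidate matches the examples")


def derive_column(values, examples):
    """Apply the transformation learned from the examples to every value."""
    _, fn = synthesize(examples)
    return [fn(v) for v in values]


months = [1, 2, 12, 7]
derived = derive_column(months, [(1, "January")])
print(derived)  # ['January', 'February', 'December', 'July']
```

A single example, 1 to January, is enough to pick the full month name here. Had the example been 3 to Mar, the same search would settle on the abbreviated name instead, which is why giving a second example for an ambiguous row helps.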
Select the column you want to transform, then click Derive Column By Example.
A new column will appear with null values. Click in the first empty cell in the new column and provide an example. In my case I typed January to transform my numeric value of 1 (first month of the year) to January.
Select OK to accept the changes. You should now see the following:
That’s how easy it is to transform data with Azure ML Workbench. Imagine doing this with Excel?! For more examples regarding data transformation have a look here.
Thanks,
Alex