💪Creating an Azure Machine Learning Workspace and Datastores using Bicep | by Dave R – Microsoft Azure MVP☁️ | CodeX | Oct, 2021

Azure Machine Learning is a cloud service for accelerating and managing the machine learning project lifecycle.

In previous articles, I referred to the core components of the Azure Machine Learning service:

  • Workspace: This is the core component. Check how you can create an Azure Machine Learning Workspace.
  • Managed resources: These are Azure Machine Learning compute instances, which you use for your development environment, and compute clusters, which are used for submitting training runs.
  • Linked Services: These include Datastores and Compute targets.
  • Assets: This can be an environment, experiments, pipelines, datasets, models, and/or endpoints.
  • Dependencies: These are resources needed to execute your AML Workspace properly.

The figure below represents the Azure Machine Learning Architecture:

Azure Machine Learning Architecture

This article will use Azure Bicep, the new domain-specific language (DSL) for deploying Azure resources declaratively, to provision an Azure Machine Learning Workspace with multiple datastores.

First, let’s take a look at two basic concepts. AzureML provides two basic assets for working with data: datastores and datasets.

Think of a datastore as the mapping for the actual storage resource to the Azure Machine Learning Workspace.

A Datastore provides an interface for your Azure Machine Learning storage accounts.
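To make the mapping idea concrete, here is a minimal sketch of a single blob datastore declared in Bicep. It assumes the workspace and the storage account already exist in the resource group. The API versions, the datastore name `blob_datastore`, the `data` container, and the account-key credential are illustrative assumptions, not values taken from this article's template, and the property names follow the 2022-10-01 resource schema as I understand it.

```bicep
// Assumption: both resources already exist in the target resource group.
param workspaceName string
param storageAccountName string

// Hypothetical blob container that the datastore points at.
param containerName string = 'data'

resource workspace 'Microsoft.MachineLearningServices/workspaces@2022-10-01' existing = {
  name: workspaceName
}

resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' existing = {
  name: storageAccountName
}

// The datastore stores connection details only; the data stays in the storage account.
resource blobDatastore 'Microsoft.MachineLearningServices/workspaces/datastores@2022-10-01' = {
  parent: workspace
  name: 'blob_datastore'
  properties: {
    datastoreType: 'AzureBlob'
    accountName: storageAccount.name
    containerName: containerName
    endpoint: environment().suffixes.storage
    protocol: 'https'
    credentials: {
      credentialsType: 'AccountKey'
      secrets: {
        secretsType: 'AccountKey'
        key: storageAccount.listKeys().keys[0].value
      }
    }
  }
}
```

Declaring the workspace and storage account with `existing` keeps the sketch focused on the datastore itself. Nothing is copied between services: the datastore only records where the container lives and how to authenticate to it.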

A Dataset is an asset in your Machine Learning Workspace that helps you connect to the data in your storage service and make it available for your machine learning experiments.

When you create a dataset in Azure Machine Learning, you are creating a reference to the data in your storage service. Azure is not copying your data.

This means there’s no storage cost incurred when creating our datasets. Think of a dataset as a pointer to other data that is stored on a storage resource.

Instead of pointing directly to your storage resource, you can use datasets to simplify the access to the data across your team. You only register data once, and then you can reuse it across different experiments.

Another benefit of datasets is that you can use them as a direct input for your scripts or pipelines, and they help you track where data has been used.

You can create datasets in the Azure Machine Learning Studio Portal under the Datasets option, from local files, a datastore, a database, or Open Datasets, as shown in the image below:

Azure Machine Learning Studio — Datasets

Note we must have a Workspace created before we can interact with the Azure Machine Learning Studio portal.

Now that we have gone through the basics, we will simplify the process for creating the Azure Machine Learning Workspace with Datastore and Datasets using Infrastructure-As-Code with a Bicep template.

Bicep is the new domain-specific language (DSL) for deploying Azure resources declaratively. In a previous article, we discussed the importance of Bicep for Infrastructure-as-Code for Azure and how it will impact your environments in Azure.

Pre-requisites:

  • Bicep installed on your local machine
  • Azure PowerShell or Azure CLI installed on your local machine
  • An active Azure Subscription
  • A resource group
  • A user with the owner/contributor role enabled in the Azure subscription

We will use the Bicep file below to create a new Azure Machine Learning Workspace with multiple datastores:
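As a minimal sketch of what such a file could contain, and not necessarily the template used for this deployment, the following Bicep declares the workspace together with the dependencies it needs (a storage account, a key vault, and Application Insights). The `workspaceName` and `location` parameters match the parameters file used in this article. The API versions, the derived resource names, and the SKU choices are assumptions made for illustration. Datastores are left out here; they would be added as child resources of the workspace.

```bicep
@description('Name of the Azure Machine Learning workspace.')
param workspaceName string

@description('Azure region for all resources.')
param location string = resourceGroup().location

// Hypothetical naming scheme for the dependent resources.
var suffix = uniqueString(resourceGroup().id)
var storageAccountName = 'st${suffix}'
var keyVaultName = 'kv-${suffix}'
var appInsightsName = 'appi-${suffix}'

// Default storage for the workspace.
resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' = {
  name: storageAccountName
  location: location
  kind: 'StorageV2'
  sku: {
    name: 'Standard_LRS'
  }
  properties: {
    supportsHttpsTrafficOnly: true
  }
}

// Holds secrets used by the workspace.
resource keyVault 'Microsoft.KeyVault/vaults@2019-09-01' = {
  name: keyVaultName
  location: location
  properties: {
    tenantId: subscription().tenantId
    sku: {
      family: 'A'
      name: 'standard'
    }
    accessPolicies: []
  }
}

// Collects monitoring data for the workspace.
resource appInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: appInsightsName
  location: location
  kind: 'web'
  properties: {
    Application_Type: 'web'
  }
}

// The workspace references its dependencies by resource ID.
resource workspace 'Microsoft.MachineLearningServices/workspaces@2022-10-01' = {
  name: workspaceName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    friendlyName: workspaceName
    storageAccount: storageAccount.id
    keyVault: keyVault.id
    applicationInsights: appInsights.id
  }
}

output workspaceId string = workspace.id
```

Because the workspace refers to the other resources through their `id` properties, Bicep infers the deployment order and no explicit `dependsOn` is needed.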

Now, we will pass a few parameters using a separate parameters file.

The parameters file looks like this:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "workspaceName": {
      "value": "azinsiderj"
    },
    "location": {
      "value": "eastus"
    }
  }
}

Note we only pass the name of the new Workspace and the location. Then we will deploy our Bicep file to a resource group in the Azure subscription using the code below:

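# Build a dated deployment name, for example AzInsiderDeployment10-15-2021
$date = Get-Date -Format "MM-dd-yyyy"
$deploymentName = "AzInsiderDeployment$date"
# -Confirm previews the changes and asks for confirmation before deploying
New-AzResourceGroupDeployment -Name $deploymentName -ResourceGroupName AzInsiderML -TemplateFile .\azuredeploy.json -TemplateParameterFile .\azuredeploy.parameters.json -Confirm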

Once the validation is complete, we will execute the deployment.

The figure below shows the preview of the deployment:

Deployment preview

The image below shows the deployment output.

Deployment output

After a few minutes, our Azure Machine Learning Workspace and the datastores will be ready.

We can access the Machine Learning Studio Portal and then create our dataset from a web file. You can go to the Azure Portal, and from the Machine Learning resource we just deployed, launch the Studio portal:

Azure Portal — Resource group

This will redirect you to the Azure Machine Learning Studio Portal.

Once you’re in the Studio Portal, you can go to the Datastores option, and you will see the datastores recently created using the Bicep file:

Azure Machine Learning Studio — Datastores

After this, you can create additional datastores or new datasets using Bicep or the Azure Machine Learning Studio Portal.
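For the Bicep route, one possible approach is a resource loop that registers one datastore per blob container. This is a sketch under assumptions: the container names `raw` and `curated` and the `ds_` name prefix are hypothetical, the containers are assumed to exist already in the storage account, and the API versions and account-key credential are illustrative choices.

```bicep
// Assumption: the workspace, storage account, and containers already exist.
param workspaceName string
param storageAccountName string

// Hypothetical container names; each one becomes its own datastore.
param containerNames array = [
  'raw'
  'curated'
]

resource workspace 'Microsoft.MachineLearningServices/workspaces@2022-10-01' existing = {
  name: workspaceName
}

resource storageAccount 'Microsoft.Storage/storageAccounts@2021-04-01' existing = {
  name: storageAccountName
}

// One datastore per container, all pointing at the same storage account.
resource datastores 'Microsoft.MachineLearningServices/workspaces/datastores@2022-10-01' = [for containerName in containerNames: {
  parent: workspace
  name: 'ds_${containerName}'
  properties: {
    datastoreType: 'AzureBlob'
    accountName: storageAccount.name
    containerName: containerName
    endpoint: environment().suffixes.storage
    protocol: 'https'
    credentials: {
      credentialsType: 'AccountKey'
      secrets: {
        secretsType: 'AccountKey'
        key: storageAccount.listKeys().keys[0].value
      }
    }
  }
}]
```

Adding another datastore then only means appending a container name to `containerNames` and redeploying.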

Hope this provides you with a better understanding of how you can leverage Bicep to automate the creation of the resources needed when working with Azure Machine Learning.


-Dave R.