AutoML with AWS SageMaker Canvas and Snowflake

Eda Johnson
Snowflake
Jul 25, 2022

AWS SageMaker Canvas offers an easy, user-friendly UI for low-code/no-code machine learning. (Please see some of the previous posts about other Snowflake + SageMaker integrations.) Recently, AWS SageMaker extended its capabilities with SageMaker Canvas to support a true AutoML experience for citizen data scientists who prefer a code-free way to build ML models.

Using SageMaker Canvas, business analysts can now build ML models and generate predictions on their own by importing any dataset from their Snowflake accounts and defining the model they would like to train in a few clicks. They can also generate single and batch predictions and maintain different versions of the model without writing a single line of code.

Here are the steps for a simple demo that uses SageMaker Canvas to create an ML model that predicts customer churn from a dataset in Snowflake:

1. Import your dataset(s) from your Snowflake account using a Storage Integration

Select Datasets in the left sidebar and click the Import button to import your data from Snowflake:

Click Add connection on the Import page and select Snowflake from the drop-down menu:

Enter your Snowflake connection information, including an existing Storage Integration (typically defined by Snowflake admins; see Storage integrations in Snowflake), and click Add connection.
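For readers who have not seen one, a storage integration is created with Snowflake's CREATE STORAGE INTEGRATION statement. The sketch below only renders that statement as text so that an admin can review it before running it in Snowflake. The integration name, IAM role ARN, and S3 location are hypothetical placeholders, not values from this demo.

```python
def storage_integration_sql(name: str, role_arn: str, s3_location: str) -> str:
    """Render a CREATE STORAGE INTEGRATION statement for an S3 location."""
    return (
        f"CREATE STORAGE INTEGRATION {name}\n"
        "  TYPE = EXTERNAL_STAGE\n"
        "  STORAGE_PROVIDER = 'S3'\n"
        "  ENABLED = TRUE\n"
        f"  STORAGE_AWS_ROLE_ARN = '{role_arn}'\n"
        f"  STORAGE_ALLOWED_LOCATIONS = ('{s3_location}');"
    )


# Hypothetical values, for illustration only.
sql = storage_integration_sql(
    name="canvas_demo_integration",
    role_arn="arn:aws:iam::123456789012:role/canvas-demo-role",
    s3_location="s3://example-canvas-bucket/staging/",
)
print(sql)
```

The name you choose for the integration is what you would then enter in the Canvas connection form.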

Once the connection is successful, you can drag and drop the Snowflake tables that contain the source data to prepare your training dataset. Optionally, you can build simple transformations, either without writing a single line of code or with plain SQL.

Once you are satisfied with the result, click the Import data button and save the dataset under a name of your choice.
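To give a feel for the kind of SQL transformation meant here, the sketch below joins two tables into one training dataset with one row per customer. Everything in it is an assumption made for illustration: the table names, the columns other than RETAINED, and the sample rows are hypothetical, and SQLite stands in for Snowflake so that the query can run locally.

```python
import sqlite3

# Hypothetical query of the kind you might type into the Canvas SQL editor.
TRAINING_QUERY = """
SELECT c.customer_id,
       c.tenure_months,
       COUNT(o.order_id) AS order_count,
       c.retained
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.tenure_months, c.retained
ORDER BY c.customer_id
"""

# In-memory SQLite database used as a local stand-in for Snowflake.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER, tenure_months INTEGER, retained INTEGER);
    CREATE TABLE orders (order_id INTEGER, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 24, 1), (2, 3, 0);
    INSERT INTO orders VALUES (10, 1), (11, 1);
    """
)
rows = conn.execute(TRAINING_QUERY).fetchall()
conn.close()

for row in rows:
    print(row)
```

The LEFT JOIN keeps customers who have no orders, so they still appear in the training data with an order count of zero.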

2. Train your model

Once the dataset is ready, you can use the AutoML functionality to train your model: click Models on the left, click the +New model button, and specify a name for your model (e.g. churn_demo):

In your AutoML configuration, select the dataset you created previously:

Now click the Build tab and select the target field you would like to predict in your dataset (in our case, the RETAINED field).

SageMaker Canvas automatically detects which model type fits the target best (time series, regression, classification, etc.). You can always click Change type and select the model type of your choice.

We are selecting the 2 category model (binary classification) to predict the RETAINED field for each customer record in the dataset. After that, we click on Change type:
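To make the idea of a model type concrete, here is a rough sketch of how a target column could be mapped to one. This is an illustrative heuristic only, not the logic SageMaker Canvas actually uses; the function name and the threshold of 10 distinct values are assumptions.

```python
def suggest_model_type(target_values):
    """Guess a model type from the target column (illustrative heuristic only)."""
    distinct = {v for v in target_values if v is not None}
    if len(distinct) < 2:
        raise ValueError("the target needs at least two distinct values")
    if len(distinct) == 2:
        return "2 category (binary classification)"
    # Assumed rule: many distinct numeric values suggest a regression target.
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    if is_numeric and len(distinct) > 10:
        return "numeric (regression)"
    return "3+ category (multi-class classification)"


# A RETAINED-style column with only two values maps to a binary classifier.
print(suggest_model_type([1, 0, 1, 1, 0]))
```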

Now that everything is set, we can click Preview model, which generates an analysis that estimates the accuracy we could expect if we built this model, along with feature importance findings (this takes a couple of minutes):

Now we can click Quick build to start building the model:

After 2–15 minutes, the model status shows up under Analyze, and our model is successfully built and deployed to support predictions:
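As a reminder of what an accuracy figure means for a 2 category model, the sketch below computes the share of records whose predicted label matches the actual one. The labels are made-up sample values, and how Canvas estimates accuracy internally is not shown here.

```python
def accuracy(actual, predicted):
    """Return the fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one record is required")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Made-up RETAINED labels: 1 = retained, 0 = churned.
actual = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 0]
print(f"Accuracy: {accuracy(actual, predicted):.0%}")
```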

You can also create different versions of the same model as needed.

3. Predicting customer churn

It is time to predict! You can use either batch prediction or single-record prediction under the Predict tab. Here is an example of a single-record prediction:

You can also select any dataset in your Snowflake account for batch predictions by clicking Batch prediction and then the Select dataset button:

That’s it!

As many organizations incorporate machine learning into their workloads, AWS SageMaker Canvas gives business analysts an excellent way to experiment with machine learning on their data in the Snowflake Data Cloud, including the plethora of datasets available in the Snowflake Marketplace.

Opinions expressed in this post are solely my own and do not represent the views or opinions of my employer.

Eda Johnson
Snowflake

AWS Machine Learning Specialty | Azure | Databricks | GCP | Snowflake Advanced Architect | Terraform certified Principal Data Cloud Architect