A First Look at dbt Python Models with Snowpark

Date: August 20, 2022

Opinions expressed in this post are solely my own and do not represent the views or opinions of my employer.

A couple of months ago, I published a Medium article on how to execute Snowpark Python in dbt using fal.ai. Since then, Snowpark Python has entered Public Preview (June 2022), and the dbt community now has more native Python support. In August 2022, dbt Labs announced the first beta prerelease of v1.3, which supports dbt models written in Python. This brand new dbt feature is available on only three platforms, and of course Snowflake with Snowpark Python is one of them. You can try this new beta functionality in dbt Core by following the installation instructions here, or in dbt Cloud by selecting the v1.3 beta as your dbt version.

Python models work much the same as SQL models in a dbt project, with the same capabilities around testing, lineage, documentation, and automation. However, there are some differences in this very early beta phase. For example, Python models only support table and incremental materializations, and Python support in dbt is currently limited to models; tests, macros, and snapshots are not supported. Additional Python capabilities are expected as dbt's Python support continues to evolve this year. More about dbt Python models can be found here: https://docs.getdbt.com/docs/building-a-dbt-project/building-models/python-models
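Under the hood, a dbt Python model is simply a file that defines a model() function taking the dbt and session objects and returning a DataFrame, which dbt then materializes. A minimal skeleton (the model and column names here are illustrative):

    def model(dbt, session):
        # Python models currently support only table and incremental materializations
        dbt.config(materialized="table")

        # dbt.ref() / dbt.source() return Snowpark DataFrames on Snowflake
        df = dbt.ref("stg_orders")

        # ...transform df with the Snowpark DataFrame API...

        # the returned DataFrame is what dbt writes to the warehouse
        return df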

There are a lot of good conversations about where Python models can be good alternatives to SQL models. As much as dbt Python support excites all of us, as a general rule, any transformation that can be done in SQL is still best built as a SQL model in dbt. As Python support in dbt evolves, there will be more best practices and recommendations for where Python models are a good fit (e.g., ML model training, or more complex transformation logic that is easily handled with a Python package).

The jaffle_shop project is a popular dbt example that most dbt users are familiar with. It uses customer, order, and payment data for a retail shop to build transformations using dbt models.

Jaffle_shop lineage graph generated by dbt

Even though this project is a perfect candidate for dbt SQL models, I decided to rewrite it in Python to demonstrate dbt Snowpark Python models using the dbt v1.3 beta release.

Here is the repo: https://github.com/edemiraydin/jaffle_shop_snowpark

As an example, we can quickly compare the customers.sql model and the same model written in Python as customers.py. Here is the customers.sql model:

customers.sql

Here is the customers.py model:

customers.py
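To give a flavor of the Snowpark DataFrame API, here is a minimal sketch of what a Python customers model can look like. It follows the jaffle_shop logic (per-customer order statistics plus lifetime value), but it is my own approximation rather than the exact code from the repo:

    import snowflake.snowpark.functions as F

    def model(dbt, session):
        dbt.config(materialized="table")

        customers = dbt.ref("stg_customers")
        orders = dbt.ref("stg_orders")
        payments = dbt.ref("stg_payments")

        # per-customer order statistics
        customer_orders = orders.group_by("customer_id").agg(
            F.min("order_date").alias("first_order"),
            F.max("order_date").alias("most_recent_order"),
            F.count("order_id").alias("number_of_orders"),
        )

        # per-customer payment totals
        customer_payments = (
            payments.join(orders, "order_id", "left")
            .group_by("customer_id")
            .agg(F.sum("amount").alias("total_amount"))
        )

        return (
            customers.join(customer_orders, "customer_id", "left")
            .join(customer_payments, "customer_id", "left")
            .with_column_renamed("total_amount", "customer_lifetime_value")
        )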

Similarly, here is the orders.sql model, which originally uses macros, along with the same model written in Python as orders.py:

orders.sql
orders.py
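The original orders.sql uses a Jinja loop over the payment methods to generate one amount column per method. In Python, a plain list comprehension does the same job, which is one reason macros become less necessary. Again, this is a sketch in the spirit of the repo, not the exact code:

    import snowflake.snowpark.functions as F

    PAYMENT_METHODS = ["credit_card", "coupon", "bank_transfer", "gift_card"]

    def model(dbt, session):
        dbt.config(materialized="table")

        orders = dbt.ref("stg_orders")
        payments = dbt.ref("stg_payments")

        # one conditional-sum column per payment method, built with a plain
        # Python loop instead of a Jinja macro
        method_amounts = [
            F.sum(
                F.when(F.col("payment_method") == method, F.col("amount"))
                .otherwise(F.lit(0))
            ).alias(f"{method}_amount")
            for method in PAYMENT_METHODS
        ]

        order_payments = payments.group_by("order_id").agg(
            *method_amounts, F.sum("amount").alias("amount")
        )

        return orders.join(order_payments, "order_id", "left")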

Snowpark Python and dbt Python models are constantly evolving with new features, but here are some of my observations and thoughts at this early stage:

  1. Building jaffle_shop with all dbt Python models takes a little longer than building it with all SQL models. (Please note that dbt v1.3 is still in development, so the execution times for Python models may differ.)

dbt build time with a Small virtual warehouse using all SQL models: ~23 secs

dbt build time with a Small virtual warehouse using all Python models: ~34.7 secs

The total dbt execution time increases with Python models for a number of reasons, including:

  • Extra steps to create and run a Python stored procedure for each model
  • All models being created as tables rather than views, since Python models do not support view materializations yet

The execution time of the SQL statements generated by Snowpark Python for the customers and orders models in the stored procedures is comparable to that of the SQL statements used in the SQL models. (Yay! to the Snowpark Python query generator.)
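If you want to check this yourself, Snowpark exposes the SQL it generates for any DataFrame. A quick way to peek at it (the table name here is illustrative):

    # inspect the SQL that Snowpark will run for a DataFrame
    df = session.table("stg_orders").group_by("customer_id").count()
    print(df.queries)  # the generated SQL statements
    df.explain()       # the translated SQL plus Snowflake's query plan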

I’d expect the overall execution time gap for Snowpark Python models to narrow over time.

Please also see the related limitations of dbt Python models here.

2. With Python models and functional programming, there is some code reduction as well as opportunities for unit testing, and there is less need for macros (see the sketch below).
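Because a Python model is just a function from DataFrames to a DataFrame, transformation logic can be factored into plain helpers and unit tested against small in-memory inputs. A hypothetical example (the helper and test are mine, and the test needs a pytest fixture providing a live Snowflake session, since Snowpark had no local emulator at the time):

    import snowflake.snowpark.functions as F

    # helper factored out of the model so it can be tested in isolation
    def with_order_stats(orders_df):
        return orders_df.group_by("customer_id").agg(
            F.count("order_id").alias("number_of_orders")
        )

    def model(dbt, session):
        return with_order_stats(dbt.ref("stg_orders"))

    # pytest-style test, given a `session` fixture connected to Snowflake
    def test_with_order_stats(session):
        orders = session.create_dataframe(
            [(1, 100), (1, 101), (2, 102)], schema=["customer_id", "order_id"]
        )
        rows = with_order_stats(orders).collect()
        assert {(r["CUSTOMER_ID"], r["NUMBER_OF_ORDERS"]) for r in rows} == {(1, 2), (2, 1)}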

3. I used dbt Cloud (and the new beta IDE) to develop jaffle_shop_snowpark. I found myself referring to the Snowpark API reference quite a bit, even for this simple project. It would be great to have autocompletion and other ways to access the Snowpark documentation within the dbt Cloud IDE in the future.

4. With Snowpark Python, each dbt Python model is created as a Python stored procedure and then executed. After the dbt build runs, you can view the definitions of these dbt-generated Python stored procedures in Snowsight.

Below are the jaffle_shop dbt Python models created as stored procedures after the dbt project is built:

Here is a quick look at the dbt-generated Python stored procedure for the customers Python model:

5. Even though my jaffle_shop_snowpark example does not cover this, dbt Python and SQL models can be mixed seamlessly in a dbt project, as long as you use unique model names.

We will definitely hear a lot more about dbt Python models in the upcoming months. In the meantime, you can try the v1.3 beta yourself and explore the jaffle_shop_snowpark repo.

Also, please register for Coalesce 2022 and check out my session, “Empowering pythonistas with dbt and snowpark,” on October 17.

Thank you for taking the time to read this blog post!
