Operationalizing your code with Snowpark Python Stored Procedures


What are Python Stored Procedures and why should I be excited?

Snowpark Python stored procedures help you operationalize your Snowpark Python code and run, orchestrate, and schedule your pipelines. A stored procedure is created once and can be executed many times with a simple CALL statement from your orchestration or automation tools, with caller’s or owner’s rights based on your requirements. Snowflake already offers JavaScript, SQL (Snowflake Scripting), and Java/Scala (using Snowpark) stored procedures, and with the new Python stored procedures in Snowpark you can truly create polyglot data pipelines: Snowflake Tasks calling stored procedures in any of these languages, all leveraging the same security and governance features. All in Snowflake… Natively…
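For instance, a run-once CALL and a scheduled Snowflake Task might look like this minimal sketch; the procedure name, arguments, warehouse, and schedule are all hypothetical:

```sql
-- Run the procedure on demand (my_pipeline_sp and its arguments are illustrative)
CALL my_pipeline_sp('RAW.EVENTS', 'ANALYTICS.EVENTS_CLEAN');

-- Or schedule the same call with a Snowflake Task
CREATE OR REPLACE TASK my_pipeline_task
  WAREHOUSE = my_wh
  SCHEDULE = '60 MINUTE'
AS
  CALL my_pipeline_sp('RAW.EVENTS', 'ANALYTICS.EVENTS_CLEAN');

-- Tasks are created suspended; resume to start the schedule
ALTER TASK my_pipeline_task RESUME;
```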

Just to add more to the excitement… If you are a dbt fan, you have more to look forward to! :) Python stored procedures are one of the chosen adapters that will power dbt Python models, letting you run transformations in your dbt DAG at scale. See the dbt discussion for more on this!

How can I create Python Stored Procedures?

Snowpark Python Procedures can be created in several ways:

1. You can use SQL to create them using in-line Python in Snowsight. Here is a simple example of how to do this:
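(A minimal sketch of the in-line approach; the procedure name copy_rows_sp, the table parameters, and the handler body are illustrative, and the runtime version should match what your account supports.)

```sql
CREATE OR REPLACE PROCEDURE copy_rows_sp(from_table STRING, to_table STRING, row_count INT)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
HANDLER = 'run'
AS
$$
def run(session, from_table, to_table, row_count):
    # Snowflake creates the Snowpark session and passes it as the first argument
    session.table(from_table).limit(row_count).write.save_as_table(to_table)
    return "SUCCESS"
$$;
```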

2. Or you can save the Python code in a .py file in a Snowflake stage and use SQL to create the procedure in Snowsight in much the same way, pointing at the staged file:
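(A sketch of the staged-file variant, assuming the handler above lives in a file copy_rows.py that has already been uploaded, for example with PUT, to a stage named @my_stage.)

```sql
CREATE OR REPLACE PROCEDURE copy_rows_sp(from_table STRING, to_table STRING, row_count INT)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python')
IMPORTS = ('@my_stage/copy_rows.py')
HANDLER = 'copy_rows.run';  -- module_name.function_name inside the staged file
```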

3. If you want to take a pure Python approach, you can also use the sproc() function in the Snowpark Python Client API to bundle your Python function into a stored procedure and have Snowpark deploy it on the server side. With this approach, Snowpark serializes your Python code and all of its dependencies with the cloudpickle package and stores them in a Snowflake stage automatically. See the API documentation for more.
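(A minimal sketch of the sproc() route; connection_parameters and the function body are placeholders for your own.)

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import sproc
from snowflake.snowpark.types import IntegerType

# connection_parameters is assumed to be a dict with your account, user, etc.
session = Session.builder.configs(connection_parameters).create()
session.add_packages("snowflake-snowpark-python")

def add_one(session_: Session, x: int) -> int:
    return session_.sql(f"SELECT {x} + 1").collect()[0][0]

# Registers a temporary stored procedure for this session
add_one_sp = sproc(add_one, return_type=IntegerType(), input_types=[IntegerType()], session=session)
add_one_sp(5)  # returns 6; the session argument is injected for you
```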

Stored procedures in Snowpark can be created either as temporary (session-level) or permanent objects in Snowflake. The @sproc decorator is a particularly fun way to create them. Please take a look at this basic example where I am creating a stored procedure using my handler function, transformData():
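(A hedged reconstruction of that pattern: the procedure name transform_data_sp, the stage @my_stage, the table names, and the filter logic are illustrative stand-ins.)

```python
from snowflake.snowpark import Session
from snowflake.snowpark.functions import sproc, col

session = Session.builder.configs(connection_parameters).create()  # connection_parameters assumed
session.add_packages("snowflake-snowpark-python")

# is_permanent=True plus a stage_location makes this a permanent procedure;
# drop both to register a temporary, session-level one instead.
@sproc(name="transform_data_sp", is_permanent=True, stage_location="@my_stage", replace=True, session=session)
def transformData(session: Session, source_table: str, target_table: str) -> str:
    df = session.table(source_table).filter(col("STATUS") == "ACTIVE")
    df.write.mode("overwrite").save_as_table(target_table)
    return f"Copied active rows from {source_table} to {target_table}"

# Call it without the session argument (Snowpark injects it),
# or run CALL TRANSFORM_DATA_SP(...) from SQL.
transformData("RAW_ORDERS", "CLEAN_ORDERS")
```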

A couple of reminders:

The first parameter of the handler function must be a Snowpark Session object; Snowflake creates it and passes it in automatically when the procedure is called, so you don’t include it in the CALL arguments.

Don’t forget to use add_packages() to add your conda dependencies to your session; this is how Snowpark knows to pull them from the secure Snowflake Anaconda channel for server-side processing. Use add_import() to add your non-compiled custom Python dependencies, which Snowpark uploads to a stage for you. If you prefer, you can list your packages in a requirements.txt file and pass it to the add_requirements() function. For example:
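(The package names and module path below are illustrative; session is an existing Snowpark Session.)

```python
session.add_packages("pandas", "numpy")      # resolved from the Snowflake Anaconda channel
session.add_import("utils/my_helpers.py")    # hypothetical local module, uploaded to a stage
# Or, equivalently, pin everything in one file:
# session.add_requirements("requirements.txt")
```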

Also see the Python data types that are supported for the parameters of a stored procedure and how they map to SQL and Snowpark types here.

Hope this blog post gives you an introduction to the brand new Python Stored Procedure experience in the Snowflake Data Cloud. Also, please check out the quickstart for tips on how to improve performance with Snowpark here.

Please stay tuned for all the amazing announcements at the Snowflake Summit 2023 in a few days, and register for the Keynote Live Streams here.

Opinions expressed in this post are solely my own and do not represent the views or opinions of my employer.
