
Commit c2a27f5

Ml with GitHub cicd (#58)

* New folder for ml with github cicd
* Changing name for clarity
* Changing folder name for clarity
* Changing creds path: changed the path so that others copying the code can easily get it working
* Update README.md
* Create build_and_deploy.yaml

1 parent be658a1 · commit c2a27f5

17 files changed: +2176 additions, -0 deletions
Lines changed: 71 additions & 0 deletions (build_and_deploy.yaml)

name: Deploy ML_Snowpark_CI_CD

# Controls when the action will run.
on:
  push:
    branches:
      - master

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Setup Python 3.9
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install Python packages
        run: pip install -r requirements.txt

      - name: Download SnowSQL
        run: curl -O https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/1.2/linux_x86_64/snowsql-1.2.9-linux_x86_64.bash

      - name: Install SnowSQL
        run: SNOWSQL_DEST=~/bin SNOWSQL_LOGIN_SHELL=~/.profile bash snowsql-1.2.9-linux_x86_64.bash

      - name: Configure SnowSQL
        env:
          SNOWSQL_ACCOUNT: ${{ secrets.SNOWSQL_ACCOUNT }}
          SNOWSQL_USER: ${{ secrets.SNOWSQL_USER }}
          SNOWSQL_PWD: ${{ secrets.SNOWSQL_PWD }}
          SNOWSQL_ROLE: ${{ secrets.SNOWSQL_ROLE }}
          SNOWSQL_WAREHOUSE: ${{ secrets.SNOWSQL_WAREHOUSE }}
          SNOWSQL_DATABASE: ${{ secrets.SNOWSQL_DATABASE }}
        run: |
          mkdir -p ~/.snowsql
          echo "[connections.dev]" > ~/.snowsql/config
          echo "accountname = $SNOWSQL_ACCOUNT" >> ~/.snowsql/config
          echo "username = $SNOWSQL_USER" >> ~/.snowsql/config
          echo "password = $SNOWSQL_PWD" >> ~/.snowsql/config
          echo "rolename = $SNOWSQL_ROLE" >> ~/.snowsql/config
          echo "warehousename = $SNOWSQL_WAREHOUSE" >> ~/.snowsql/config
          echo "dbname = $SNOWSQL_DATABASE" >> ~/.snowsql/config

      - name: Test installation
        run: ~/bin/snowsql -v

      - name: Debug current directory
        run: |
          pwd
          ls -al

      # If this step works on your machine, we are good to go for deployment
      - name: Test Python Connection
        run: python test/test_connection.py

      # Deploy the stored procedure that processes data incrementally
      - name: Deploy SPROC using 05_process_data_incrementally.py
        run: python source/05_process_data_incrementally.py

      # Deploy Streams and Tasks for scheduled inference
      - name: Deploy the SQL script with SnowSQL
        run: ~/bin/snowsql -c dev -f $GITHUB_WORKSPACE/source/06_orchestrate_jobs.sql
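The "Configure SnowSQL" step above assembles `~/.snowsql/config` from repository secrets with a series of echo commands. A minimal Python sketch of the same idea, runnable locally with dummy environment variables (the helper function is hypothetical, not part of the repo):

```python
import os
import tempfile

def write_snowsql_config(dest_dir: str) -> str:
    """Render a SnowSQL config for a 'dev' connection from environment
    variables, mirroring the echo commands in the workflow step."""
    mapping = {
        "accountname": "SNOWSQL_ACCOUNT",
        "username": "SNOWSQL_USER",
        "password": "SNOWSQL_PWD",
        "rolename": "SNOWSQL_ROLE",
        "warehousename": "SNOWSQL_WAREHOUSE",
        "dbname": "SNOWSQL_DATABASE",
    }
    lines = ["[connections.dev]"]
    for key, env_var in mapping.items():
        lines.append(f"{key} = {os.environ.get(env_var, '')}")
    path = os.path.join(dest_dir, "config")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path

# Dummy values stand in for the GitHub secrets.
os.environ["SNOWSQL_ACCOUNT"] = "myaccount"
os.environ["SNOWSQL_USER"] = "myuser"

cfg_path = write_snowsql_config(tempfile.mkdtemp())
config_text = open(cfg_path).read()
print(config_text.splitlines()[0])  # [connections.dev]
```

In the workflow itself, the equivalent file ends up in the runner's home directory so the final `snowsql -c dev` step can find it.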
Lines changed: 28 additions & 0 deletions (README.md)

# CI/CD for Machine Learning within Snowflake

Machine Learning (ML) has transformed the way businesses analyze and interpret data. Yet operationalizing ML models remains a challenge. As datasets grow and environments evolve, Continuous Integration/Continuous Deployment (CI/CD) becomes indispensable. In this demo, I'll showcase how to implement CI/CD for an ML workflow using Snowflake.

**Medium Article Link**: https://medium.com/snowflake/ci-cd-for-machine-learning-within-snowflake-a-simple-approach-390cc4cbf8ef

This repository contains code to demonstrate end-to-end machine learning development and deployment using Snowflake.

## Repository Structure

1. Data: This folder contains the dataset used in the demo.
2. Source Code: This directory contains all the scripts required for setting up the environment, data loading, data processing, model training, and deployment.
3. GitHub Actions Code: The files in this directory set up the CI/CD pipeline with GitHub Actions.
4. Conda Environment: This directory includes a requirements.txt file to set up the necessary Python environment.

## Steps for Running the Project

1. Set up the Snowflake environment: configure the connection to the Snowflake data warehouse and create any required databases, schemas, or other resources.
2. Load data into Snowflake: the data in the Data folder is loaded into the Snowflake data warehouse.
3. Prepare data for model training: the data loaded into Snowflake is preprocessed and prepared for machine learning model training.
4. Train and deploy the machine learning model: using the processed data, a model is trained and deployed.
5. Create a stored procedure in Snowflake: a stored procedure is created for cleaning and processing data incrementally; it is used for future batch inference.
6. Orchestrate the machine learning workflow: the entire workflow is orchestrated with Tasks and Streams in Snowflake, so the process is fully automated and can handle new data as it arrives.

## How to Use This Repository

Clone the repository to your local machine or development environment, then run the scripts in the order specified above.

Ensure that your Snowflake environment is set up correctly and that you have the necessary access rights to perform these operations. You also need to set up the Python environment using the requirements.txt file provided in the Conda Environment directory.

This demonstration shows Snowflake's capabilities for machine learning model development, deployment, and orchestration. You can use it as a basis for developing more complex workflows or for working with different types of data and models.
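The ordering of these steps matters, since each one depends on objects created by the step before it. A tiny dry-run sketch of a driver that enforces that order (the step labels are illustrative; the callables stand in for the real scripts in the repository):

```python
from typing import Callable, List, Tuple

# Ordered pipeline steps from the README; the lambdas are placeholders
# standing in for the actual setup/load/train/deploy scripts.
PIPELINE: List[Tuple[str, Callable[[], None]]] = [
    ("setup_snowflake_environment", lambda: None),
    ("load_data_into_snowflake", lambda: None),
    ("prepare_data_for_training", lambda: None),
    ("train_and_deploy_model", lambda: None),
    ("create_incremental_sproc", lambda: None),
    ("orchestrate_with_tasks_and_streams", lambda: None),
]

def run_pipeline(steps):
    """Run each step in order, stopping at the first exception so later
    steps never run against a half-built environment."""
    completed = []
    for name, fn in steps:
        fn()
        completed.append(name)
    return completed

executed = run_pipeline(PIPELINE)
print(len(executed))  # 6
```

The CI/CD workflow above is the automated equivalent of the last two steps: it deploys the stored procedure and the Tasks/Streams script on every push to master.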
Lines changed: 29 additions & 0 deletions (conda environment file)

name: pysnowpark_ml_ops
channels:
  - https://repo.anaconda.com/pkgs/snowflake
  - nodefaults
dependencies:
  - python=3.9
  - pip
  - pip:
      # Basics
      - pandas==1.5.3
      - numpy==1.23.5
      # ML
      - scikit-learn==1.2.2
      - xgboost==1.7.3
      - tensorflow
      # Visualization
      - scipy==1.10.1
      - seaborn==0.12.2
      - matplotlib==3.7.1
      # Misc
      - cloudpickle==2.0.0
      - jupyter==1.0.0
      - cachetools==4.2.2
      - joblib==1.1.1
      - imbalanced-learn==0.10.1
      - torch==1.10.2
      - pytorch-tabnet==3.1.1
      - snowflake-snowpark-python
      - snowflake_ml_python
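The pip section of this environment file overlaps with the two-line requirements.txt below. A small stdlib-only sketch (hypothetical helper, not part of the repo) that extracts the pip pins from an environment file of this shape, so the two files can be kept in sync:

```python
def extract_pip_deps(environment_yml: str) -> list:
    """Pull the entries nested under the 'pip:' key out of a conda
    environment file, skipping comment lines. A naive line-based parse
    for this file's shape, not a general YAML parser."""
    deps, in_pip = [], False
    for raw in environment_yml.splitlines():
        line = raw.strip()
        if line == "- pip:":
            in_pip = True
            continue
        if in_pip:
            if line.startswith("- "):
                deps.append(line[2:])
            elif line and not line.startswith("#"):
                in_pip = False  # left the nested pip list
    return deps

# A shortened copy of the file above, for demonstration.
SNIPPET = """\
dependencies:
  - python=3.9
  - pip
  - pip:
      # Basics
      - pandas==1.5.3
      - numpy==1.23.5
      - snowflake-snowpark-python
"""
print(extract_pip_deps(SNIPPET))
# ['pandas==1.5.3', 'numpy==1.23.5', 'snowflake-snowpark-python']
```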
2 binary files not shown.
Lines changed: 2 additions & 0 deletions (requirements.txt)

snowflake-snowpark-python
snowflake_ml_python
Lines changed: 18 additions & 0 deletions (SQL setup script)

USE ROLE ACCOUNTADMIN;

-- Create database and schemas to separate processing, execution, and deployment
CREATE DATABASE IF NOT EXISTS ML_SNOWPARK_CI_CD;
USE DATABASE ML_SNOWPARK_CI_CD;
CREATE SCHEMA IF NOT EXISTS DATA_PROCESSING;
CREATE SCHEMA IF NOT EXISTS ML_PROCESSING;

-- Create a test UDF
CREATE OR REPLACE FUNCTION DATA_PROCESSING.MULTIPLY(X NUMBER(35,4), Y NUMBER(35,4))
RETURNS NUMBER(35,4)
AS
$$
  X * Y
$$;

SELECT DATA_PROCESSING.MULTIPLY(499, 599) AS PRODUCT;
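The MULTIPLY UDF returns NUMBER(35,4), a fixed-point value with four decimal places. A quick Python sketch of the same semantics using the decimal module (an illustration of the type's behavior, not how Snowflake evaluates the UDF):

```python
from decimal import Decimal

def multiply(x, y) -> Decimal:
    """Mimic DATA_PROCESSING.MULTIPLY: multiply two values and scale the
    result to the four decimal places of NUMBER(35,4)."""
    return (Decimal(str(x)) * Decimal(str(y))).quantize(Decimal("0.0001"))

print(multiply(499, 599))  # 298901.0000
```

The closing SELECT in the script is a smoke test: if the UDF resolves and returns 298901, the database and schema setup succeeded.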
Lines changed: 123 additions & 0 deletions (Jupyter notebook, Python 3 kernel; cells shown below)

## Imports

from snowflake.snowpark.session import Session
import snowflake.snowpark.types as T
import snowflake.snowpark.functions as F
from snowflake.snowpark.functions import col

import os
import json
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

my_dir = os.getcwd()
connection_parameters = json.load(open(f'{my_dir}/creds.json'))
session = Session.builder.configs(connection_parameters).create()

# Load Data

os.chdir('../')

# Loading from local CSV files
application_record_df = pd.read_csv('data/application_record.csv.zip')
credit_record_df = pd.read_csv('data/credit_record.csv.zip')

# Upload to Snowflake
session.sql('USE DATABASE ML_SNOWPARK_CI_CD').collect()
session.sql('USE SCHEMA DATA_PROCESSING').collect()

session.write_pandas(application_record_df, table_name='APPLICATION_RECORD', auto_create_table=True, overwrite=True)
session.write_pandas(credit_record_df, table_name='CREDIT_RECORD', auto_create_table=True, overwrite=True)
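The notebook reads the two datasets straight from .csv.zip archives; pandas infers the compression from the file extension. A self-contained roundtrip sketch of that behavior, using a synthetic stand-in rather than the repo's credit data files:

```python
import os
import tempfile

import pandas as pd

# Synthetic stand-in for data/application_record.csv.zip; the column
# names are illustrative only.
df = pd.DataFrame({"ID": [1, 2, 3],
                   "AMT_INCOME_TOTAL": [42750.0, 112500.0, 270000.0]})

path = os.path.join(tempfile.mkdtemp(), "application_record.csv.zip")

# Both to_csv and read_csv default to compression='infer', so the .zip
# suffix triggers transparent zip compression and decompression.
df.to_csv(path, index=False)
roundtrip = pd.read_csv(path)

print(roundtrip.equals(df))  # True
```

This is why the notebook never unzips the data folder: `pd.read_csv` handles a zip archive containing a single CSV directly.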
