
Commit c2a27f5

Ml with GitHub cicd (#58)

* New folder for ml with github cicd
* Changing name for clarity
* Changing folder name for clarity
* Changing creds path: changed the path so that others copying the code can easily get it working
* Update README.md
* Create build_and_deploy.yaml

1 parent be658a1 · commit c2a27f5

17 files changed: +2176 additions, -0 deletions
Lines changed: 71 additions & 0 deletions (build_and_deploy.yaml)

name: Deploy ML_Snowpark_CI_CD

# Controls when the action will run.
on:
  push:
    branches:
      - master

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3

      - name: Setup Python 3.9
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install Python packages
        run: pip install -r requirements.txt

      - name: Download SnowSQL
        run: curl -O https://sfc-repo.snowflakecomputing.com/snowsql/bootstrap/1.2/linux_x86_64/snowsql-1.2.9-linux_x86_64.bash

      - name: Install SnowSQL
        run: SNOWSQL_DEST=~/bin SNOWSQL_LOGIN_SHELL=~/.profile bash snowsql-1.2.9-linux_x86_64.bash

      - name: Configure SnowSQL
        env:
          SNOWSQL_ACCOUNT: ${{ secrets.SNOWSQL_ACCOUNT }}
          SNOWSQL_USER: ${{ secrets.SNOWSQL_USER }}
          SNOWSQL_PWD: ${{ secrets.SNOWSQL_PWD }}
          SNOWSQL_ROLE: ${{ secrets.SNOWSQL_ROLE }}
          SNOWSQL_WAREHOUSE: ${{ secrets.SNOWSQL_WAREHOUSE }}
          SNOWSQL_DATABASE: ${{ secrets.SNOWSQL_DATABASE }}
        run: |
          mkdir -p ~/.snowsql
          echo "[connections.dev]" > ~/.snowsql/config
          echo "accountname = $SNOWSQL_ACCOUNT" >> ~/.snowsql/config
          echo "username = $SNOWSQL_USER" >> ~/.snowsql/config
          echo "password = $SNOWSQL_PWD" >> ~/.snowsql/config
          echo "rolename = $SNOWSQL_ROLE" >> ~/.snowsql/config
          echo "warehousename = $SNOWSQL_WAREHOUSE" >> ~/.snowsql/config
          echo "dbname = $SNOWSQL_DATABASE" >> ~/.snowsql/config

      - name: Test installation
        run: ~/bin/snowsql -v

      - name: Debug current directory
        run: |
          pwd
          ls -al

      # If this step works on your machine, we are good to go for deployment
      - name: Test Python Connection
        run: python test/test_connection.py

      # Deploy the stored procedure that processes data incrementally
      - name: Deploy SPROC using 05_process_data_incrementally.py
        run: python source/05_process_data_incrementally.py

      # Deploy Streams and Tasks for scheduled inference
      - name: Deploy the SQL script with SnowSQL
        run: ~/bin/snowsql -c dev -f $GITHUB_WORKSPACE/source/06_orchestrate_jobs.sql
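The "Configure SnowSQL" step above assembles `~/.snowsql/config` from repository secrets with a series of echo commands. A minimal Python sketch of the same idea, runnable locally with dummy environment variables (the helper function is hypothetical, not part of the repo):

```python
import os
import tempfile

def write_snowsql_config(dest_dir: str) -> str:
    """Render a SnowSQL config for a 'dev' connection from environment
    variables, mirroring the echo commands in the workflow step."""
    mapping = {
        "accountname": "SNOWSQL_ACCOUNT",
        "username": "SNOWSQL_USER",
        "password": "SNOWSQL_PWD",
        "rolename": "SNOWSQL_ROLE",
        "warehousename": "SNOWSQL_WAREHOUSE",
        "dbname": "SNOWSQL_DATABASE",
    }
    lines = ["[connections.dev]"]
    for key, env_var in mapping.items():
        lines.append(f"{key} = {os.environ.get(env_var, '')}")
    path = os.path.join(dest_dir, "config")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return path

# Dummy values stand in for the GitHub secrets.
os.environ["SNOWSQL_ACCOUNT"] = "myaccount"
os.environ["SNOWSQL_USER"] = "myuser"

cfg_path = write_snowsql_config(tempfile.mkdtemp())
config_text = open(cfg_path).read()
print(config_text.splitlines()[0])  # [connections.dev]
```

In the workflow itself, the equivalent file ends up in the runner's home directory so the final `snowsql -c dev` step can find it.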
Lines changed: 28 additions & 0 deletions (README.md)

# CI/CD for Machine Learning within Snowflake

Machine Learning (ML) has transformed the way businesses analyze and interpret data. Yet operationalizing ML models remains a challenge. As datasets grow and environments evolve, Continuous Integration/Continuous Deployment (CI/CD) becomes indispensable. In this demo, I'll showcase how to implement CI/CD for an ML workflow using Snowflake.

**Medium Article Link**: https://medium.com/snowflake/ci-cd-for-machine-learning-within-snowflake-a-simple-approach-390cc4cbf8ef

This repository contains code to demonstrate end-to-end machine learning development and deployment using Snowflake.

## Repository Structure

1. Data: This folder contains the dataset used in the demo.
2. Source Code: This directory contains all the scripts required for setting up the environment, data loading, data processing, model training, and deployment.
3. GitHub Actions Code: The files in this directory set up the CI/CD pipeline with GitHub Actions.
4. Conda Environment: This directory includes a requirements.txt file to set up the necessary Python environment.

## Steps for Running the Project

1. Set up the Snowflake environment: configure the connection to the Snowflake data warehouse and create any required databases, schemas, or other resources.
2. Load data into Snowflake: the data in the Data folder is loaded into the Snowflake data warehouse.
3. Prepare data for model training: the data loaded into Snowflake is preprocessed and prepared for machine learning model training.
4. Train and deploy the machine learning model: using the processed data, a model is trained and deployed.
5. Create a stored procedure in Snowflake: a stored procedure is created for cleaning and processing data incrementally; it is used for future batch inference.
6. Orchestrate the machine learning workflow: the entire workflow is orchestrated with Tasks and Streams in Snowflake, so the process is fully automated and can handle new data as it arrives.

## How to Use This Repository

Clone the repository to your local machine or development environment, then run the scripts in the order specified above.

Ensure that your Snowflake environment is set up correctly and that you have the necessary access rights to perform these operations. You also need to set up the Python environment using the requirements.txt file provided in the Conda Environment directory.

This demonstration shows Snowflake's capabilities for machine learning model development, deployment, and orchestration. You can use it as a basis for developing more complex workflows or for working with different types of data and models.
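The ordering of these steps matters, since each one depends on objects created by the step before it. A tiny dry-run sketch of a driver that enforces that order (the step labels are illustrative; the callables stand in for the real scripts in the repository):

```python
from typing import Callable, List, Tuple

# Ordered pipeline steps from the README; the lambdas are placeholders
# standing in for the actual setup/load/train/deploy scripts.
PIPELINE: List[Tuple[str, Callable[[], None]]] = [
    ("setup_snowflake_environment", lambda: None),
    ("load_data_into_snowflake", lambda: None),
    ("prepare_data_for_training", lambda: None),
    ("train_and_deploy_model", lambda: None),
    ("create_incremental_sproc", lambda: None),
    ("orchestrate_with_tasks_and_streams", lambda: None),
]

def run_pipeline(steps):
    """Run each step in order, stopping at the first exception so later
    steps never run against a half-built environment."""
    completed = []
    for name, fn in steps:
        fn()
        completed.append(name)
    return completed

executed = run_pipeline(PIPELINE)
print(len(executed))  # 6
```

The CI/CD workflow above is the automated equivalent of the last two steps: it deploys the stored procedure and the Tasks/Streams script on every push to master.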
Lines changed: 29 additions & 0 deletions (conda environment file)

name: pysnowpark_ml_ops
channels:
  - https://repo.anaconda.com/pkgs/snowflake
  - nodefaults
dependencies:
  - python=3.9
  - pip
  - pip:
      # Basics
      - pandas==1.5.3
      - numpy==1.23.5
      # ML
      - scikit-learn==1.2.2
      - xgboost==1.7.3
      - tensorflow
      # Visualization
      - scipy==1.10.1
      - seaborn==0.12.2
      - matplotlib==3.7.1
      # Misc
      - cloudpickle==2.0.0
      - jupyter==1.0.0
      - cachetools==4.2.2
      - joblib==1.1.1
      - imbalanced-learn==0.10.1
      - torch==1.10.2
      - pytorch-tabnet==3.1.1
      - snowflake-snowpark-python
      - snowflake_ml_python
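The pip section of this environment file overlaps with the two-line requirements.txt below. A small stdlib-only sketch (hypothetical helper, not part of the repo) that extracts the pip pins from an environment file of this shape, so the two files can be kept in sync:

```python
def extract_pip_deps(environment_yml: str) -> list:
    """Pull the entries nested under the 'pip:' key out of a conda
    environment file, skipping comment lines. A naive line-based parse
    for this file's shape, not a general YAML parser."""
    deps, in_pip = [], False
    for raw in environment_yml.splitlines():
        line = raw.strip()
        if line == "- pip:":
            in_pip = True
            continue
        if in_pip:
            if line.startswith("- "):
                deps.append(line[2:])
            elif line and not line.startswith("#"):
                in_pip = False  # left the nested pip list
    return deps

# A shortened copy of the file above, for demonstration.
SNIPPET = """\
dependencies:
  - python=3.9
  - pip
  - pip:
      # Basics
      - pandas==1.5.3
      - numpy==1.23.5
      - snowflake-snowpark-python
"""
print(extract_pip_deps(SNIPPET))
# ['pandas==1.5.3', 'numpy==1.23.5', 'snowflake-snowpark-python']
```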
2 binary files not shown.
Lines changed: 2 additions & 0 deletions (requirements.txt)

snowflake-snowpark-python
snowflake_ml_python
Lines changed: 18 additions & 0 deletions (SQL setup script)

USE ROLE ACCOUNTADMIN;

-- Create database and schemas to separate processing, execution, and deployment
CREATE DATABASE IF NOT EXISTS ML_SNOWPARK_CI_CD;
USE DATABASE ML_SNOWPARK_CI_CD;
CREATE SCHEMA IF NOT EXISTS DATA_PROCESSING;
CREATE SCHEMA IF NOT EXISTS ML_PROCESSING;

-- Create a test UDF
CREATE OR REPLACE FUNCTION DATA_PROCESSING.MULTIPLY(X NUMBER(35,4), Y NUMBER(35,4))
RETURNS NUMBER(35,4)
AS
$$
  X * Y
$$;

SELECT DATA_PROCESSING.MULTIPLY(499, 599) AS PRODUCT;
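The MULTIPLY UDF returns NUMBER(35,4), a fixed-point value with four decimal places. A quick Python sketch of the same semantics using the decimal module (an illustration of the type's behavior, not how Snowflake evaluates the UDF):

```python
from decimal import Decimal

def multiply(x, y) -> Decimal:
    """Mimic DATA_PROCESSING.MULTIPLY: multiply two values and scale the
    result to the four decimal places of NUMBER(35,4)."""
    return (Decimal(str(x)) * Decimal(str(y))).quantize(Decimal("0.0001"))

print(multiply(499, 599))  # 298901.0000
```

The closing SELECT in the script is a smoke test: if the UDF resolves and returns 298901, the database and schema setup succeeded.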
Lines changed: 123 additions & 0 deletions (Jupyter notebook, Python 3 kernel; cells shown below)

## Imports

from snowflake.snowpark.session import Session
import snowflake.snowpark.types as T
import snowflake.snowpark.functions as F
from snowflake.snowpark.functions import col

import os
import json
import pandas as pd

import warnings
warnings.filterwarnings("ignore")

my_dir = os.getcwd()
connection_parameters = json.load(open(f'{my_dir}/creds.json'))
session = Session.builder.configs(connection_parameters).create()

# Load Data

os.chdir('../')

# Loading from local CSV files
application_record_df = pd.read_csv('data/application_record.csv.zip')
credit_record_df = pd.read_csv('data/credit_record.csv.zip')

# Upload to Snowflake
session.sql('USE DATABASE ML_SNOWPARK_CI_CD').collect()
session.sql('USE SCHEMA DATA_PROCESSING').collect()

session.write_pandas(application_record_df, table_name='APPLICATION_RECORD', auto_create_table=True, overwrite=True)
session.write_pandas(credit_record_df, table_name='CREDIT_RECORD', auto_create_table=True, overwrite=True)
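The notebook reads the two datasets straight from .csv.zip archives; pandas infers the compression from the file extension. A self-contained roundtrip sketch of that behavior, using a synthetic stand-in rather than the repo's credit data files:

```python
import os
import tempfile

import pandas as pd

# Synthetic stand-in for data/application_record.csv.zip; the column
# names are illustrative only.
df = pd.DataFrame({"ID": [1, 2, 3],
                   "AMT_INCOME_TOTAL": [42750.0, 112500.0, 270000.0]})

path = os.path.join(tempfile.mkdtemp(), "application_record.csv.zip")

# Both to_csv and read_csv default to compression='infer', so the .zip
# suffix triggers transparent zip compression and decompression.
df.to_csv(path, index=False)
roundtrip = pd.read_csv(path)

print(roundtrip.equals(df))  # True
```

This is why the notebook never unzips the data folder: `pd.read_csv` handles a zip archive containing a single CSV directly.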
