Get Started with
Databricks for
Machine Learning
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Learning goals
Upon completion of this content, you should be able to:
1. Explain fundamental concepts about using the Databricks Lakehouse Platform for machine learning.
2. Perform basic notebook tasks using the Databricks Lakehouse Platform.
3. Store and manage data in the Lakehouse for machine learning tasks.
4. Create and use a baseline model using AutoML.
5. Create and use a feature store table for model training.
6. Track, register, and manage the stage of a model with MLflow.
©2023 Databricks Inc. — All rights reserved
Prerequisites/Technical Considerations
Things to keep in mind before you work through this course
Prerequisites
1. Intermediate-level knowledge of Python
2. Basic knowledge of data science and machine learning topics such as regression/classification models and model evaluation metrics
3. Basic knowledge of a machine learning library such as scikit-learn

Technical Considerations
1. A cluster running DBR ML 13.3+
2. A Unity Catalog-enabled workspace
3. A Model Serving-enabled workspace
©2023 Databricks Inc. — All rights reserved
Databricks Lakehouse Fundamentals:
Databricks
Fundamentals
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Learning objectives
Things you’ll be able to do after completing this lesson
• Identify Databricks as the Lakehouse Platform.
• Describe core services of the Databricks Lakehouse Platform for different personas.
• Identify different types of assets in the Databricks Workspace.
• Navigate through different sections of the Workspace.
• Perform common actions available in the Workspace.
• Describe Databricks Repos and its features.
©2023 Databricks Inc. — All rights reserved
Databricks Overview
What is Databricks?
The Lakehouse Company
• Inventor and pioneer of the data lakehouse
• 5,000+ global employees
• $1B+ in revenue
• $3B in investment
• Gartner-recognized Leader in Database Management Systems and in Data Science and Machine Learning Platforms
©2023 Databricks Inc. — All rights reserved
Databricks
Lakehouse Platform
The Lakehouse Platform stack:
• Data Warehousing, Data Engineering, Data Streaming, and Data Science and ML workloads
• Unity Catalog: fine-grained governance for data and AI
• Delta Lake: data reliability and performance
• Cloud Data Lake: all structured and unstructured data

Why the Lakehouse:
• Simple: unify your data warehousing and AI use cases on a single platform
• Open: built on open source and open standards
• Multicloud: one consistent data platform across clouds
©2023 Databricks Inc. — All rights reserved
Databricks ecosystem
©2023 Databricks Inc. — All rights reserved
The lakehouse is for ALL data practitioners
Machine learning practitioners, data engineers, data analysts, and data governance teams all work on the Databricks Lakehouse.
©2023 Databricks Inc. — All rights reserved
Data Engineering workloads on Databricks
• Simplifies data engineering with a curated data lake approach through Delta Lake
• Data orchestration through Databricks Workflows
• Delta Live Tables manage your full data pipelines
©2023 Databricks Inc. — All rights reserved
Data Analyst workloads on Databricks
• Great performance and concurrency for BI and SQL workloads on Delta Lake
• Native SQL interface for analysts
• Support for BI tools to directly query your most recent data in Delta Lake
©2023 Databricks Inc. — All rights reserved
ML & Data Science workloads on Databricks
Machine Learning
• Model registry, reproducibility, and productionization with MLflow
• Leverages Delta Lake for reproducibility
• AutoML for citizen data scientists
Data Science
• Collaborative notebooks and dashboards for interactive analysis
• Native support for Python, Java, R, and Scala
• Delta Lake data natively supported
©2023 Databricks Inc. — All rights reserved
Lakehouse Governance with Unity Catalog
Govern and manage all data assets
• Warehouse, Tables, Columns
• Data Lake, Files
• Machine Learning Models
• Dashboards and Notebooks
Capabilities
• Data lineage
• Attribute-based access control
• Security policies
• Auditing
• Data sharing
©2023 Databricks Inc. — All rights reserved
Demo:
Exploring the
Workspace
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Demo
High-level steps
Overview of the UI
• Landing page
• Navigation
Workspace
• Creating and managing assets
• Search assets
• Repos
• Clone a repo
• Pull/push changes
©2023 Databricks Inc. — All rights reserved
Databricks Lakehouse Fundamentals:
Working with
Notebooks
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Learning objectives
Things you’ll be able to do after completing this lesson
• Describe Databricks Notebooks as the most common interface for
data engineers when working with Databricks.
• Recognize common use cases for data engineers when working with
Notebooks.
• Describe Databricks clusters.
• Describe the basic cloud-based compute structure of Databricks.
©2023 Databricks Inc. — All rights reserved
Compute Resources
©2023 Databricks Inc. — All rights reserved
Clusters
Overview
Clusters are made up of one or more virtual machine (VM) instances and distribute workloads across workers:
• The driver coordinates the activities of the executors
• Workers run the tasks that compose a Spark job
(Diagram: workloads such as notebooks, jobs, and DBSQL queries run on a cluster consisting of a driver VM instance and worker VM instances.)
©2023 Databricks Inc. — All rights reserved
Clusters
Overview
Three main compute types:
• All-purpose clusters for interactive development
• Job clusters for automating workloads
• SQL warehouses: (serverless) instant compute to run DBSQL queries and dashboards
©2023 Databricks Inc. — All rights reserved
Cluster Mode
Single node: a low-cost single-instance cluster catering to single-node machine learning workloads and lightweight exploratory analysis.
Standard (multi node): the default mode for workloads developed in any supported language (requires at least two VM instances).
©2023 Databricks Inc. — All rights reserved
Databricks Runtimes
Databricks Runtime: a set of core components that run on Databricks clusters.
• Standard: Apache Spark and many other components and updates that provide an optimized big data analytics experience
• Photon: an optional add-on to optimize Spark queries (e.g. SQL, DataFrame)
• Machine Learning: adds popular machine learning libraries like TensorFlow, Keras, PyTorch, and XGBoost
©2023 Databricks Inc. — All rights reserved
ML Runtime
Pre-built machine learning infrastructure
Databricks Machine Learning Runtime:
• Optimized and pre-configured ML frameworks
• Turnkey distributed ML
• Built-in AutoML
• GPU support out of the box
Built-in support spans ML frameworks and model explainability, distributed training, AutoML and hyperparameter tuning, and hardware accelerators.
©2023 Databricks Inc. — All rights reserved
Notebooks
©2023 Databricks Inc. — All rights reserved
Databricks Notebooks
Collaborative, reproducible, and enterprise ready
• Multi-language: use Python, SQL, Scala, and R, all in one notebook
• Reproducible: automatically track version history, and use git version control with Repos
• Visualizations: built-in visualizations and support for the most popular visualization libraries (e.g. matplotlib, ggplot)
• Collaborative: real-time co-presence, co-editing, and commenting
• Adaptable: install standard libraries and use local modules
• Enterprise ready: enterprise-grade access controls, identity management, and auditability
©2023 Databricks Inc. — All rights reserved
Ideal for exploratory data analysis
Native tools for visualizing and understanding data in ML workflow
Create interactive charts to visualize data in the notebook with only two clicks, and summarize a data set's essential properties and statistics in a data profile with the push of a button (see the sketch below).
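As a rough illustration, the cell below uses display() and dbutils.data.summarize(), which are available in Databricks notebooks where spark and dbutils are predefined; the table name is a placeholder, not part of the course materials.

```python
# A minimal sketch, assuming a table named "main.default.customers" exists.
df = spark.table("main.default.customers")

# Render an interactive result table; charts can be added from the result's "+" menu.
display(df)

# Generate a data profile (per-column statistics and distributions) programmatically.
dbutils.data.summarize(df)
```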
©2023 Databricks Inc. — All rights reserved
Right tool for quick development
Multi-language support, use standard libraries and custom modules
Mix and match languages based on use case and preferred workflow, choosing from Python, SQL, Scala, and R. Install Python libraries for a notebook without affecting other users with %pip, and import local modules using arbitrary file support when working in Repos (see the sketch below).
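A hedged sketch of both ideas follows; the package, version, and the helpers.cleaning module are hypothetical examples, not files shipped with this course.

```python
# Notebook cell 1: install a library scoped to this notebook only (other users are unaffected).
%pip install lifelines==0.27.8

# Notebook cell 2: import a local Python module stored next to the notebook in a Repo,
# e.g. a file helpers/cleaning.py that defines a clean() function (hypothetical).
from helpers.cleaning import clean
```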
©2023 Databricks Inc. — All rights reserved
Demo:
Working with
Notebooks
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Demo
High-level steps
Compute
• Configure and launch a cluster for ML
Notebooks
• UI Walkthrough
• Using multiple languages
• Working with Markdown
• Data visualization
• Table
• Graphs
• Data Profiler
©2023 Databricks Inc. — All rights reserved
Databricks Lakehouse Fundamentals:
Data Storage and
Management
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Learning objectives
Things you’ll be able to do after completing this lesson
• Describe how data is stored in cloud object storage locations and accessed via Databricks.
• Explain the benefits of data storage in the data lakehouse architecture
across roles and Databricks services.
• Identify Delta Lake as the optimized storage layer that provides the
foundation for data storage for the data lakehouse.
• Describe Unity Catalog as a centralized governance solution in
Databricks.
• Explain the three-tier namespace and its levels.
©2023 Databricks Inc. — All rights reserved
(Architecture diagram: the Databricks control plane)
©2023 Databricks Inc. — All rights reserved
Delta Lake
Open-source, default storage format on Databricks
• Delta Lake is an open-source project.
• It is the default format for the tables created
in Databricks.
• Delta Lake is the optimized storage layer that
provides the foundation for storing data and
tables in the Databricks Lakehouse Platform.
• Designed to improve data reliability, quality, and
performance in data lakes.
©2023 Databricks Inc. — All rights reserved
Delta Lake brings ACID to object storage
• Atomicity means all transactions either succeed or fail completely.
• Consistency guarantees relate to how a given state of the data is observed by simultaneous operations.
• Isolation refers to how simultaneous operations conflict with one another. The isolation guarantees that Delta Lake provides do differ from other systems.
• Durability means that committed changes are permanent.
©2023 Databricks Inc. — All rights reserved
Delta Lake features
Key features
• Unified batch and streaming
• Automatic schema validation
• Supports upserts using the MERGE operation
• Update your table schema without rewriting data
• Track row-level changes with Change Data Feed
• Query previous versions of a table by version number or timestamp
• Performance optimization with OPTIMIZE and ZORDER
• Supports multiple programming languages like Python, Scala, and SQL
A few of these features are sketched below.
©2023 Databricks Inc. — All rights reserved
Delta’s rich ecosystem of connectors
(Logo grid — connector categories: cloud platforms, API languages, SQL engines, and ETL and streaming engines; named examples include Google Dataproc, Azure Synapse, Scala, Ruby, Python, Rust, AWS Redshift Spectrum, AWS Athena, and Power BI.)
©2023 Databricks Inc. — All rights reserved
Data ingestion and transformation for ML
An example data ingestion workflow
©2023 Databricks Inc. — All rights reserved
Data and AI
Governance with
Unity Catalog
©2023 Databricks Inc. — All rights reserved
Today, data and AI governance is complex
Data consumers — data analysts, data engineers, ML engineers, and applications — ask:
• "Where do I discover the datasets, models, notebooks, and dashboards?"
• "Can I trust the data and ML models?"
The data governance team manages:
• Permissions on files (data lake)
• Permissions on tables, rows, and columns (data warehouse)
• Permissions on ML models and features
• Permissions on reports and dashboards (BI dashboards)
and asks:
• "How do we secure these assets?"
• "Who is accessing these assets and how?"
• "Are we meeting regulatory compliance?"
©2023 Databricks Inc. — All rights reserved
Unity Catalog (UC)
Unified governance for data and AI
Unified visibility into data and AI
Single permission model for data and AI
AI-powered monitoring and observability
Open data sharing
Databricks Unity Catalog capabilities: Discovery, Access Controls, Lineage, Monitoring, Auditing, and Data Sharing — across Tables, Files, Models, Notebooks, and Dashboards.
©2023 Databricks Inc. — All rights reserved
Key Capabilities of UC
Governance model:
• Unified governance across clouds
• Centralized metadata and user management
• Centralized access controls
• Grant or revoke permissions on data and AI assets using the UI or the API:
  GRANT … ON … TO …
  REVOKE … ON … FROM …
Securable objects include catalogs, databases (schemas), tables, views, storage credentials, and external locations. A short example follows.
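The catalog, table, and group names below are placeholders; the cell only illustrates the GRANT/REVOKE pattern above.

```python
# A minimal sketch of Unity Catalog privileges, run from a notebook with spark.sql().
spark.sql("GRANT USE CATALOG ON CATALOG main TO `data-scientists`")
spark.sql("GRANT SELECT ON TABLE main.default.users TO `data-scientists`")
spark.sql("REVOKE SELECT ON TABLE main.default.users FROM `data-scientists`")
```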
©2023 Databricks Inc. — All rights reserved
The three level namespace of UC
How to use UC
A Unity Catalog metastore is assigned to one or more Databricks workspaces in a Databricks account. Within the metastore, data objects are organized in a three-level hierarchy:
• Catalog
• Schema (database)
• Tables (managed and external), views, and models
Reference an object with all three levels, for example:
SELECT * FROM catalog1.database1.table1;
©2023 Databricks Inc. — All rights reserved
Demo:
Data Storage and
Management
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Demo
High-level steps
Data Storage and Management
• Ingest data and create a Delta table
• View and manage tables
• Performance optimization for Delta
• Manage permissions with Unity Catalog
©2023 Databricks Inc. — All rights reserved
Databricks for Machine Learning:
Introduction to
Databricks for
Machine Learning
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Learning objectives
Things you’ll be able to do after completing this lesson
• Describe MLflow as an open source platform for managing the
end-to-end machine learning lifecycle that’s built into Databricks.
• Describe MLflow Experiments as a tool for tracking model
development runs and comparing the resulting model parameters
and metrics.
• Describe the Model Registry as a centralized model store for
managing models’ full lifecycle, including versioning and annotating.
• [Extra] Describe AutoML and its features.
©2023 Databricks Inc. — All rights reserved
Databricks supports both code-first and low-code users
• Low-code ML with AutoML: UI-based ML development with a glass-box approach
• Multi-language notebooks: co-edit notebooks in Python, R, Scala, and SQL
©2023 Databricks Inc. — All rights reserved
Databricks Machine Learning
A data-native and collaborative solution for the full ML lifecycle
• Collaborative multi-language notebooks and AutoML
• Data prep, model training, model tuning, and managed runtimes and environments
• Batch scoring and online serving
• Data versioning, Feature Store, jobs and API automation, and monitoring
• MLOps / governance powered by MLflow
• An open, multi-cloud data lakehouse foundation with Delta Lake
©2023 Databricks Inc. — All rights reserved
AutoML
Rapid, simplified machine learning for everyone
• Quick-start ML initiatives: accelerate your time to production and save weeks on ML projects
• Ensure best practices: customize baseline models with your domain expertise
• Wide range of problems: solve classification, regression, and forecasting problems with a variety of ML libraries
Workflow: select and input a dataset (UI or API), then AutoML performs automated data prep, automated feature engineering, automated training and model selection, and automated hyperparameter tuning; explore the generated artifacts and auto-generated notebooks, then deploy and monitor.
©2023 Databricks Inc. — All rights reserved
Databricks AutoML
A glass-box solution empowering data teams without taking away control
• UI and API to start AutoML training
• An auto-created MLflow experiment to track models and metrics
• An auto-generated data exploration notebook to understand and debug data quality and preprocessing
• Auto-generated notebooks with source code for each model, so you can iterate further on models from AutoML and add your expertise
• Easy deployment to the Model Registry
A minimal API sketch is shown below.
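The sketch assumes a Spark DataFrame df with a label column named churn; both are illustrative rather than part of the course dataset.

```python
# A hedged sketch of starting an AutoML classification experiment from the Python API.
from databricks import automl

summary = automl.classify(
    dataset=df,            # Spark DataFrame with features and the label column
    target_col="churn",    # placeholder label column
    timeout_minutes=30,
)

# Every trial is logged to an MLflow experiment; inspect the best run's model.
print(summary.best_trial.model_path)
```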
©2023 Databricks Inc. — All rights reserved
AutoML solves two key pain points
• Quickly verify the predictive power of a dataset: a marketing team hands a dataset to the data science team and asks, "Can this dataset be used to predict customer churn?"
• Get a baseline model to guide project direction: the data science team asks, "What direction should I go in for this ML project, and what benchmark should I aim to beat?"
©2023 Databricks Inc. — All rights reserved
MLflow
©2023 Databricks Inc. — All rights reserved
Core Machine Learning Issues
Modern ML lifecycle comes with many challenges
• Keeping track of experiments or model development
• Reproducing code
• Comparing models
• Standardization of packaging and deploying models
MLflow addresses these issues.
©2023 Databricks Inc. — All rights reserved
MLflow
What is MLflow?
• An open-source platform for the machine learning lifecycle
• Operationalizes machine learning
• Developed by Databricks
• Pre-installed on the Databricks Runtime for ML
©2023 Databricks Inc. — All rights reserved
MLflow Components
The four components of MLflow
• Tracking: record and query experiments — code, data, config, and results
• Projects: a packaging format for reproducible runs on any platform
• Models: a general model format that supports diverse deployment tools
• Model Registry: centralized and collaborative model lifecycle management
APIs: CLI, Python, R, Java, REST
©2023 Databricks Inc. — All rights reserved
Model Tracking and Auto-logging using MLflow
Ensure reproducibility
• Track ML development with one line of code — mlflow.autolog() — capturing data lineage, the model, and the environment
• Automatically logs the model, environment, parameters, metrics, and artifacts
• Inspect, visualize, and compare metrics across runs
• Auto-generated data exploration notebook
A minimal sketch is shown below.
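The sketch uses autologging with scikit-learn on the ML Runtime; the synthetic dataset is only for illustration.

```python
# A minimal autologging sketch using scikit-learn and a synthetic dataset.
import mlflow
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

mlflow.autolog()  # one line: parameters, metrics, artifacts, and the model are logged

X, y = make_classification(n_samples=1_000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

with mlflow.start_run(run_name="autolog-demo"):
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)
    model.score(X_test, y_test)   # logged as a metric by autologging
```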
©2023 Databricks Inc. — All rights reserved
MLflow Model Registry
Features and Architecture
• A collaborative, centralized model hub
• Versioning of ML artifacts
• Facilitates experimentation, testing, and production
• Integrates with approval and governance workflows
• Audit log of stage transitions and requests, with an approval workflow for stage transitions
• Helps with automation through CI/CD integration
(Architecture: the tracking server records parameters, metrics, artifacts, and models; registered model versions (v1, v2, v3) move through Staging, Production, and Archived stages and are used by data scientists and deployment engineers.)
A registration sketch is shown below.
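The run ID and model name in the sketch are placeholders to be replaced with values from your own experiment.

```python
# A hedged sketch of registering a logged model and moving it to Staging.
import mlflow
from mlflow.tracking import MlflowClient

model_version = mlflow.register_model(
    model_uri="runs:/<run_id>/model",   # URI of a model logged in an MLflow run
    name="churn_classifier",            # placeholder registered-model name
)

client = MlflowClient()
client.transition_model_version_stage(
    name="churn_classifier",
    version=model_version.version,
    stage="Staging",                    # stages include None, Staging, Production, Archived
)
```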
©2023 Databricks Inc. — All rights reserved
Demo:
Experimentation
with AutoML
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Demo
High-level steps
Create an Experiment
• Create and run an AutoML experiment
• View the best model
Model Registry
• Register the best model to Model Registry
• Manage model stages
©2023 Databricks Inc. — All rights reserved
Databricks for Machine Learning:
End-to-End ML
on the Lakehouse
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Learning objectives
Things you’ll be able to do after completing this lesson
• Compare and contrast model governance solutions with and without
Unity Catalog.
• Describe the Databricks Feature Store as a centralized repository that
enables data scientists to find and share features.
• Describe Workflows as a capability to productionize data workflows.
• Describe Jobs as a simple solution to schedule and automate one or
more tasks.
• Describe Databricks' built-in model serving capabilities for real-time, streaming, and batch inference.
©2023 Databricks Inc. — All rights reserved
Databricks Machine Learning
A data-native and collaborative solution for the full ML lifecycle
• Collaborative multi-language notebooks and AutoML
• Data prep, model training, model tuning, and managed runtimes and environments
• Batch scoring and online serving
• Data versioning, Feature Store, jobs and API automation, and monitoring
• MLOps / governance powered by MLflow
• An open, multi-cloud data lakehouse foundation with Delta Lake
MLOps: an end-to-end workflow
• Data prep & featurization: ETL + EDA and feature engineering with Koalas, backed by the Feature Store
• Build a baseline model with AutoML: MLflow autologging, Hyperopt + Spark, recorded in MLflow Tracking
• Promote the best run to the Model Registry: annotate the model and request a transition to Staging
• Automated model testing: an MLflow Model Registry webhook triggers testing jobs (schema, demographic accuracy, docs & artifacts, …) with Slack notifications; the request is approved (move to Staging) or rejected
• Run inferences: load the model for batch or real-time (HTTP) scoring, with a monthly retrain scheduled as a Databricks Job
Personas involved: Data Scientist, ML Engineer, and Data Engineer.
©2023 Databricks Inc. — All rights reserved
Feature Store
©2023 Databricks Inc. — All rights reserved
Feature Store
How do feature stores help?
• A feature store provides a centralized repository for managing and serving machine learning (ML) features.
• Feature stores provide auditing and logging capabilities to track who accessed or modified features.
• A feature store helps handle the scaling requirements of feature storage, retrieval, and serving, ensuring that ML pipelines can operate efficiently.
• A feature store allows reusing features across projects, reducing duplication.
A creation sketch is shown below.
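The cell creates a feature table with the Databricks Feature Store client; the source table, feature columns, and table name are placeholders for illustration.

```python
# A minimal sketch of creating a feature table, assuming a source table of customer data.
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

customer_features_df = spark.table("main.default.customers").select(
    "customer_id", "num_purchases", "avg_basket_value"
)

fs.create_table(
    name="main.default.customer_features",   # stored as a Delta table
    primary_keys=["customer_id"],
    df=customer_features_df,
    description="Per-customer aggregate features (illustrative)",
)
```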
©2023 Databricks Inc. — All rights reserved
Why would you need a feature store?
Basic Motivations
Discovery
Multiple Data Scientists are trying to solve similar modeling tasks and come up with different definitions
of the same features. How can I find the features?
Lineage
Model governance requires documentation of the features used to train a model, as well as the
upstream lineage of a feature to reliably use it. How is it computed, and who owns it?
Skew
When multiple teams manage feature computation and ML models in production, minor yet significant
skew in upstream data at the input of a feature pipeline can be very hard to detect and fix.
Online Serving
During exploration and model experimentation phases features are implemented in frameworks that do
not scale to production.
©2023 Databricks Inc. — All rights reserved
Databricks Feature Store
• Feature definitions: define reusable, shareable featurization logic and save it into feature tables
• Feature tables: represent features (e.g., customer features, item features) as tables that can be queried from any language, with SQL access, ACLs, versioning, and performance optimizations
• Training data set creation: snapshot features from feature tables to build training sets
• Batch scoring: load features from feature tables for batch scoring
• Online serving: publish feature tables to online stores for low-latency lookups behind a Databricks Model Serving REST endpoint
A training-set sketch is shown below.
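The sketch assumes the illustrative customer_features table from the earlier sketch and a hypothetical label table.

```python
# A hedged sketch of assembling training data from a feature table with FeatureLookup.
from databricks.feature_store import FeatureLookup, FeatureStoreClient

fs = FeatureStoreClient()

label_df = spark.table("main.default.churn_labels")   # placeholder: customer_id + churn label

training_set = fs.create_training_set(
    df=label_df,
    feature_lookups=[
        FeatureLookup(
            table_name="main.default.customer_features",
            lookup_key="customer_id",
        )
    ],
    label="churn",
)

training_df = training_set.load_df()   # Spark DataFrame ready for model training
```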
©2023 Databricks Inc. — All rights reserved
Model Deployment
©2023 Databricks Inc. — All rights reserved
Model Serving Modes
Serving models for batch, streaming, real-time, and edge inference
• Batch: high latency; leverages databases or object storage; fast retrieval of stored predictions
• Streaming: stream processing; moderately fast scoring on new data
• Real time: low-latency scoring; high availability; usually served over REST (containers, Kubernetes)
• Embedded (edge): special-case deployments; limited connectivity with cloud services
(Diagram: models are trained from Delta Lake / Feature Store data, registered in the Model Registry, and then deployed in one of these modes.)
A batch-scoring sketch is shown below.
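For the batch mode, a common pattern is to score a Delta table with a model loaded as a Spark UDF; the model and table names below are placeholders.

```python
# A minimal batch-scoring sketch using an MLflow model as a Spark UDF.
import mlflow.pyfunc

predict_udf = mlflow.pyfunc.spark_udf(
    spark,
    model_uri="models:/churn_classifier/Staging",   # placeholder registered model and stage
)

scored_df = (
    spark.table("main.default.customer_features")
         .withColumn("prediction", predict_udf("num_purchases", "avg_basket_value"))
)

# Persist predictions for fast retrieval by downstream consumers.
scored_df.write.mode("overwrite").saveAsTable("main.default.churn_predictions")
```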
©2023 Databricks Inc. — All rights reserved
Challenges with building Real-time ML Systems
Most ML models don’t get into production
• ML infrastructure is hard: real-time ML systems require fast and scalable serving infrastructure, which is costly to build and maintain.
• Deploying real-time models needs disparate tools: data teams use diverse tools to develop models, and customers use separate platforms for data, ML, and serving, adding complexity and cost.
• Operating production ML requires expert resources: deployment tools have a steep learning curve, and model deployment is bottlenecked by limited engineering resources, limiting the ability to scale.
©2023 Databricks Inc. — All rights reserved
Databricks Model Serving
World-class model scoring and deployment options:
• Multiple model scoring and deployment choices
• A leading multi-cloud inference provider, giving customers the choice of what, where, and when they score their models
• Ultra-low-latency real-time model serving
©2023 Databricks Inc. — All rights reserved
Model deployment with Model Serving
Flexible deployment at any scale
• Batch scoring: one-click deployment of models from the Model Registry to scalable compute clusters for batch scoring
• Online scoring: one-click deployment of models to REST endpoints for auto-scaling, low-latency scoring (a query sketch is shown below)
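In the sketch, the workspace URL, token, endpoint name, and input record are all placeholders.

```python
# A hedged sketch of calling a Model Serving REST endpoint.
import requests

workspace_url = "https://<your-workspace>.cloud.databricks.com"   # placeholder
endpoint_name = "churn-endpoint"                                   # placeholder
token = "<personal-access-token>"                                  # placeholder

response = requests.post(
    f"{workspace_url}/serving-endpoints/{endpoint_name}/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json={"dataframe_records": [{"num_purchases": 12, "avg_basket_value": 34.5}]},
)
print(response.json())
```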
©2023 Databricks Inc. — All rights reserved
Core Features of Model Serving
Support real-time production ML workloads
Real time
• Low overhead latency: <100 ms
• Throughput: 3K+ QPS
• Availability: 99.5%
• Scalable: automatically scales up/down to handle bursty traffic
• Secure: PrivateLink and IP allowlist

Lakehouse unified
• Feature Store integrated: automated online lookups
• MLflow integrated: fast and easy model deployment
• Quality & diagnostics: payload logging to Delta
• Unified governance: manage data and AI with UC

Simplified deployment
• Simple: endpoints UI and API for simple deployment
• Flexible: traffic splitting for staged roll-outs and A/B testing
• Manageable: endpoint observability with built-in metrics and export options
©2023 Databricks Inc. — All rights reserved
Orchestration with
Workflows
©2023 Databricks Inc. — All rights reserved
Databricks Workflows
Workflows is a fully managed, cloud-based, general-purpose task orchestration service for the entire Lakehouse. It lets data engineers, data scientists, and analysts build reliable data, analytics, and AI workflows on any cloud.
(Diagram: the Lakehouse Platform stack — Data Warehousing, Data Engineering, Data Streaming, Data Science and ML, with Unity Catalog, Delta Lake, and the cloud data lake.)
©2023 Databricks Inc. — All rights reserved
Workflows Features
• Orchestrate anything anywhere: run diverse workloads for the full data and AI lifecycle, on any cloud — notebooks, Delta Live Tables, jobs for SQL, ML models, and more
• Fully managed: remove operational overhead with a fully managed orchestration service, so you focus on your workflows, not on managing your infrastructure
• Simple workflow authoring: an easy point-and-click authoring experience for all your data teams, not just those with specialized skills
©2023 Databricks Inc. — All rights reserved
Workflow features
Key features
Databricks Workflows offers:
• Monitoring and debugging
• Repair of only the failed tasks and sub-tasks, reducing the time and resources required to recover from unsuccessful job runs
• Access control: manage access across different teams
• Scheduling: run jobs immediately or periodically
• Alerts
A job-creation sketch is shown below.
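The sketch creates a scheduled job via the Jobs 2.1 REST API; the notebook path, cluster ID, schedule, and credentials are placeholders.

```python
# A hedged sketch of creating a scheduled job with the Jobs 2.1 API.
import requests

workspace_url = "https://<your-workspace>.cloud.databricks.com"   # placeholder
token = "<personal-access-token>"                                  # placeholder

job_spec = {
    "name": "nightly-batch-inference",
    "tasks": [
        {
            "task_key": "score",
            "notebook_task": {"notebook_path": "/Repos/ml/batch_inference"},  # placeholder
            "existing_cluster_id": "<cluster-id>",                            # placeholder
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",   # every day at 02:00
        "timezone_id": "UTC",
    },
}

response = requests.post(
    f"{workspace_url}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
print(response.json())   # returns the new job_id on success
```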
©2023 Databricks Inc. — All rights reserved
Example Workflow
• Data ingestion funnel (e.g. Auto Loader, DLT)
• Data filtering, quality assurance, and transformation (e.g. DLT, SQL, Python)
• ML feature extraction (e.g. MLflow)
• Persisting features and training a prediction model
©2023 Databricks Inc. — All rights reserved
Demo:
End-to-End ML
on the Lakehouse
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Demo
High-level steps
End-to-end ML
• Create a feature store table
• Train and track a model with MLflow
• Register a model to Model Registry
• Transition model to next stage
• Use model for batch inference
• Automate inference with Workflows
©2023 Databricks Inc. — All rights reserved
Course Summary
and Next Steps
Databricks Academy
2023
©2023 Databricks Inc. — All rights reserved
Extra Resources
©2023 Databricks Inc. — All rights reserved
Feature Store
The first feature store co-designed with a data and MLOps platform
The Feature Store serves features in batch (high throughput) and online (low latency) modes.

Feature Registry
• Discoverability and reusability
• Versioning
• Upstream and downstream lineage
Co-designed with Delta Lake:
• Open format
• Built-in data versioning and governance
• Native access through PySpark, SQL, etc.

Feature Provider
• Batch and online access to features
• Feature lookup packaged with models
• Simplified deployment process
Co-designed with MLflow:
• Open model format that supports all ML frameworks
• Feature version and lookup logic hermetically logged with the model
©2023 Databricks Inc. — All rights reserved