What are Mapping Data Flows?
• Transform data at scale, in the cloud
• Code-free pipelines that do NOT require an understanding of Spark, Scala, Python, or Java
• Serverless, scale-out transformation execution engine
• Resilient data transformation flows built for big data scenarios with unstructured data requirements
• Operationalized with Data Factory scheduling, control flow, and monitoring
Code-free Data Transformation At Scale
• Does not require an understanding of Spark, big data execution engines, clusters, Scala, Python, etc.
• Focus on building business logic and data transformation
• Data cleansing
• Aggregation
• Data conversions
• Data prep
• Data exploration
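Mapping Data Flows let you build these steps visually, without writing code. For readers who want to see what the steps actually compute, here is a minimal plain-Python sketch of cleansing, conversion, and aggregation chained into one pipeline. It only illustrates the logic; it is not how Data Factory executes a data flow (that runs on a managed Spark engine), and the sample rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw rows, as a source might read them from a CSV file:
# every value is a string, with stray whitespace and inconsistent casing.
RAW_ROWS = [
    {"region": " West ", "amount": "120.50"},
    {"region": "west", "amount": "79.50"},
    {"region": "East", "amount": ""},  # missing amount: dropped during cleansing
    {"region": "EAST", "amount": "42.00"},
]

def cleanse(rows):
    """Data cleansing: normalize region names and drop rows with no amount."""
    for row in rows:
        if row["amount"].strip():
            yield {"region": row["region"].strip().title(), "amount": row["amount"]}

def convert(rows):
    """Data conversion: cast the amount column from string to float."""
    for row in rows:
        yield {**row, "amount": float(row["amount"])}

def aggregate(rows):
    """Aggregation: total amount per region (a group-by with a sum)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

result = aggregate(convert(cleanse(RAW_ROWS)))
print(result)  # {'West': 200.0, 'East': 42.0}
```

Each step consumes the previous step's output, which mirrors how transformations are chained one after another in a data flow graph.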
Modern Data Warehouse Pattern Today
[Diagram: Logs, files, and media (unstructured) and business/custom apps (structured) are loaded into the data lake on a schedule by Azure Data Factory. Ingest storage: Azure Storage/Data Lake Store. Data processing: Azure Databricks reads data from files using DBFS and cleans and joins it with stored data; Azure Data Factory extracts and transforms relational data. Serving storage: processed data is loaded into Azure SQL DW tables optimized for analytics, which feed dashboards and applications. Azure Data Factory provides orchestration. Databases also appear in the diagram.]
Modern Data Warehouse Pattern with Mapping Data Flows
[Diagram: The same pattern, with Azure Data Factory now also cleaning and joining disparate data and extracting and transforming relational data, then loading the processed data into Azure SQL DW tables optimized for analytics. Files are still loaded on a schedule into Azure Storage/Data Lake Store from logs, files, and media (unstructured) and business/custom apps (structured); Azure Databricks, databases, applications, and dashboards also appear in the diagram.]
Pipeline execution of a Data Flow Activity
Slowly Changing Dimension Scenario
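The slowly changing dimension scenario is shown on the slide only as a screenshot. As a rough illustration of what such a flow has to do, here is a minimal Type 2 SCD sketch in plain Python: when a tracked attribute changes, the current row is expired and a new current row is appended, so history is preserved. The table layout (`customer_id`, `city`, `start_date`, `end_date`, `is_current`) is a hypothetical example, not taken from the slide.

```python
from datetime import date

# Hypothetical dimension table: one row per version of a customer record.
# is_current marks the active version; end_date is None for the active version.
dimension = [
    {"customer_id": 1, "city": "Seattle", "start_date": date(2020, 1, 1),
     "end_date": None, "is_current": True},
]

def apply_scd2(dimension, updates, as_of):
    """Type 2 SCD: expire the current row when the tracked attribute changes
    and append a new current row; brand-new keys are simply inserted."""
    current = {row["customer_id"]: row for row in dimension if row["is_current"]}
    for update in updates:
        existing = current.get(update["customer_id"])
        if existing is not None and existing["city"] == update["city"]:
            continue  # no change: keep the current version as-is
        if existing is not None:
            existing["end_date"] = as_of  # close out the old version
            existing["is_current"] = False
        new_row = {**update, "start_date": as_of, "end_date": None, "is_current": True}
        dimension.append(new_row)
        current[update["customer_id"]] = new_row
    return dimension

apply_scd2(
    dimension,
    [{"customer_id": 1, "city": "Portland"},  # changed attribute: new version
     {"customer_id": 2, "city": "Boston"}],   # new customer: plain insert
    as_of=date(2021, 6, 1),
)
for row in dimension:
    print(row["customer_id"], row["city"], row["is_current"])
```

The sketch tracks a single attribute for brevity; a real dimension would typically compare several columns, or a hash of them, to detect changes.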
Data De-Duplication
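De-duplication comes down to choosing which columns define "the same row" and keeping one row per distinct value of those columns. The following minimal Python sketch shows both variants: removing exact duplicates (all columns) and keeping one row per business key. The sample rows and the `dedupe` helper are hypothetical.

```python
# Hypothetical source rows with an exact duplicate and a key-level duplicate.
rows = [
    {"id": 1, "email": "a@example.com", "name": "Ann"},
    {"id": 1, "email": "a@example.com", "name": "Ann"},     # exact duplicate
    {"id": 2, "email": "b@example.com", "name": "Bob"},
    {"id": 2, "email": "b@example.com", "name": "Robert"},  # same key, other value
]

def dedupe(rows, key_columns):
    """Keep the first row seen for each distinct combination of key_columns."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

print(len(dedupe(rows, ["id", "email", "name"])))  # 3: only the exact duplicate is removed
print(len(dedupe(rows, ["id"])))                   # 2: one row per id, first occurrence wins
```

"First occurrence wins" depends on input order. On a distributed engine, row order is not guaranteed unless the data is sorted explicitly, so a production flow should state which duplicate to keep (for example, the latest by timestamp).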
Load Fact Table in DW Scenario
Data Lake Data Science Scenario
Microsoft Azure Data Factory Continues to Extend Data Flow Library with a Rich Set of Transformations and Expression Functions
Expression builder
• All available functions, fields, parameters …
• List of columns being modified
• Build expressions here with full auto-complete and syntax checking
• View results of your expression in the data preview pane with live, interactive results
Debug Data Flows with Data Preview and Data Sampling
• Switch to Debug Mode and select sample data to work with for debugging
Deep Monitoring Introspection of Data Transformations
Schema drift
• In most real-world data integration solutions, the shape of source and target data stores changes over time
• Source data fields change names
• The number of columns changes over time
• Traditional ETL processes break when schemas drift
• Mapping Data Flow has built-in facilities for flexible schemas to handle schema drift
• Patterns, rule-based mapping, byName function
• Source: Read additional columns beyond those defined in the source dataset
• Sink: Write additional columns beyond those defined in the sink dataset
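To make the source-side behavior concrete, here is a minimal Python sketch of drift-tolerant reading: the columns defined in the dataset are kept in order, and any extra (drifted) columns are carried along instead of causing a failure. The `by_name` helper is a hypothetical Python analogue of the data flow `byName` function: it looks a column up by name at run time and returns nothing if the column is absent. The schema and rows are made up for illustration.

```python
# Columns defined in the (hypothetical) source dataset's schema.
DEFINED_COLUMNS = ["id", "name"]

# Incoming rows have drifted: an "email" column appears in later files.
incoming = [
    {"id": 1, "name": "Ann"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
]

def by_name(row, column):
    """Look up a column by name at run time; None if this row lacks it.
    (Hypothetical analogue of the data flow byName() function.)"""
    return row.get(column)

def read_with_drift(rows, defined):
    """Emit the defined columns in order, then append any drifted columns
    unchanged rather than rejecting rows with an unexpected shape."""
    for row in rows:
        out = {col: by_name(row, col) for col in defined}
        out.update({k: v for k, v in row.items() if k not in defined})
        yield out

for row in read_with_drift(incoming, DEFINED_COLUMNS):
    print(row, "email:", by_name(row, "email"))
```

Downstream logic that refers to drifted columns by name, rather than by a fixed position in a fixed schema, keeps working when new columns show up.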
Pattern matching
• Match by name, type, stream, ordinal position
Rule-based mapping
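Rule-based mapping replaces a fixed column list with rules: each rule has a match condition on column metadata (name, type, ordinal position) and an expression that names the output column. The minimal Python sketch below models that idea. The `Column` and `Rule` types, the rules themselves, and the "first matching rule wins, unmatched columns are dropped" policy are assumptions made for illustration, not Data Factory's documented semantics.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Column:
    name: str
    type: str
    position: int  # 1-based ordinal position in the incoming stream

@dataclass
class Rule:
    matches: Callable[[Column], bool]     # match condition on column metadata
    output_name: Callable[[Column], str]  # name for the mapped output column

def apply_rules(columns, rules):
    """Map each incoming column with the first rule whose condition matches;
    columns that no rule matches are left out of the mapping."""
    mapping = {}
    for col in columns:
        for rule in rules:
            if rule.matches(col):
                mapping[col.name] = rule.output_name(col)
                break
    return mapping

incoming = [
    Column("cust_id", "integer", 1),
    Column("cust_name", "string", 2),
    Column("load_ts", "timestamp", 3),
    Column("notes", "string", 4),
]

rules = [
    # Match by name: strip the "cust_" prefix.
    Rule(lambda c: c.name.startswith("cust_"), lambda c: c.name[len("cust_"):]),
    # Match by type: remaining string columns get upper-case names.
    Rule(lambda c: c.type == "string", lambda c: c.name.upper()),
    # Match by ordinal position: the third column becomes "loaded_at".
    Rule(lambda c: c.position == 3, lambda c: "loaded_at"),
]

print(apply_rules(incoming, rules))
# {'cust_id': 'id', 'cust_name': 'name', 'load_ts': 'loaded_at', 'notes': 'NOTES'}
```

Because the rules describe columns by their properties rather than listing them, a drifted column that fits an existing rule is mapped without editing the flow.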
Resources
• Tutorial Videos: http://aka.ms/dataflowvideos
• Patterns: http://aka.ms/dataflowpatterns
• Documentation: https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview
• Expression Language: http://aka.ms/dataflowexpressions
• Data Flow Performance guide: https://aka.ms/dfperf
• Combined Links: https://aka.ms/dflinks