Databricks Certified Data Engineer Associate - Practice Questions
Apache Spark & Notebooks
Q: What is a common use of markdown cells in notebooks?
A. C++
B. Returns all elements of the DataFrame as a list
C. Documentation
D. To run another notebook
Answer: C
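For illustration, a minimal sketch of a markdown cell in a Databricks notebook (the heading and text are made up); the %md magic renders the cell as documentation:
    %md
    ## Pipeline overview
    This notebook ingests raw events and writes them to a Delta table.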
Q: What is a benefit of using notebooks in Databricks?
A. Returns all elements of the DataFrame as a list
B. C++
C. Supports interactive development
D. Documentation
Answer: C
Q: Which language is NOT supported in Databricks notebooks?
A. To run another notebook
B. Supports interactive development
C. df.cache()
D. C++
Answer: D
Q: How do you cache a DataFrame in Spark?
A. Documentation
B. df.cache()
C. DataFrame
D. Supports interactive development
Answer: B
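A minimal sketch, assuming a SparkSession is available (the DataFrame here is created with spark.range purely for illustration):
    df = spark.range(1000)   # any DataFrame
    df.cache()               # mark the DataFrame for in-memory caching
    df.count()               # an action materializes the cache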
Q: How is SparkSession accessed in Databricks?
A. spark
B. C++
C. Documentation
D. To run another notebook
Answer: A
Q: How do you write comments in Python notebooks?
A. To run another notebook
B. # This is a comment
C. spark
D. C++
Answer: B
Q: What does `display(df)` do?
A. Supports interactive development
B. # This is a comment
C. Renders a DataFrame in a tabular format with visualization options
D. spark
Answer: C
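A short sketch, assuming a Databricks notebook where display() is available:
    df = spark.range(5)
    display(df)   # renders an interactive table with built-in chart options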
Q: What is `%run` used for in notebooks?
A. Supports interactive development
B. To run another notebook
C. spark
D. DataFrame
Answer: B
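For example (the notebook path ./setup_notebook is hypothetical), %run executes another notebook inline so its variables and functions become available in the current notebook:
    %run ./setup_notebook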
Q: What does the `.collect()` method do?
A. Renders a DataFrame in a tabular format with visualization options
B. DataFrame
C. df.cache()
D. Returns all elements of the DataFrame as a list
Answer: D
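A minimal sketch; because collect() brings every row back to the driver, it is best reserved for small results:
    rows = spark.range(3).collect()   # list of Row objects on the driver
    print(rows)                       # [Row(id=0), Row(id=1), Row(id=2)]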
Q: What does `spark.read.csv()` return?
A. C++
B. df.cache()
C. Documentation
D. DataFrame
Answer: D
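A minimal sketch (the file path is illustrative):
    df = spark.read.csv("file.csv", header=True, inferSchema=True)
    df.printSchema()   # spark.read.csv returns a DataFrame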
Data Governance & Security
Q: Which layer defines table-level access?
A. Stores metadata about data assets
B. Catalog permissions
C. A shared environment for users
D. Data permissions and lineage
Answer: B
Q: Who defines data access policies in Unity Catalog?
A. Data permissions and lineage
B. Through access control lists (ACLs)
C. Stores metadata about data assets
D. Data stewards or admins
Answer: D
Q: What does Unity Catalog manage?
A. A shared environment for users
B. Data permissions and lineage
C. Through access control lists (ACLs)
D. Role-Based Access Control
Answer: B
Q: How are user permissions granted?
A. Role-Based Access Control
B. Stores metadata about data assets
C. Through access control lists (ACLs)
D. Assign roles to users
Answer: C
Q: What is a workspace in Databricks?
A. Tracks access logs and usage history
B. Role-Based Access Control
C. Assign roles to users
D. A shared environment for users
Answer: D
Q: What is one way to restrict data access?
A. Data permissions and lineage
B. Tracking data origin and transformations
C. Catalog permissions
D. Assign roles to users
Answer: D
Q: What is data lineage?
A. A shared environment for users
B. Catalog permissions
C. Data stewards or admins
D. Tracking data origin and transformations
Answer: D
Q: What is RBAC?
A. Assign roles to users
B. Tracks access logs and usage history
C. Role-Based Access Control
D. Data stewards or admins
Answer: C
Q: What is the role of a metastore?
A. Role-Based Access Control
B. Stores metadata about data assets
C. Tracks access logs and usage history
D. Data stewards or admins
Answer: B
Q: How does Unity Catalog improve auditing?
A. Assign roles to users
B. Tracks access logs and usage history
C. Data stewards or admins
D. Catalog permissions
Answer: B
Data Ingestion & Transformation
Q: Which tool helps with transformation jobs?
A. JSON
B. XLS
C. Databricks Workflows
D. df.write.format('delta').save('path')
Answer: C
Q: What is a common data ingestion format in Databricks?
A. XLS
B. Incrementally ingesting data from cloud storage
C. df.write.format('delta').save('path')
D. JSON
Answer: D
Q: Which function applies transformation to each row?
A. JSON
B. Structured Streaming
C. Databricks Workflows
D. map
Answer: D
Q: Which format is NOT typically used in Databricks ingestion?
A. map
B. XLS
C. spark.read.csv('file.csv')
D. dropna
Answer: B
Q: How do you write a DataFrame as Delta?
A. Structured Streaming
B. Incrementally ingesting data from cloud storage
C. JSON
D. df.write.format('delta').save('path')
Answer: D
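A minimal sketch, assuming df is an existing DataFrame; the output path is illustrative:
    df.write.format("delta").mode("overwrite").save("/tmp/events_delta")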
Q: How do you read CSV data into a DataFrame?
A. JSON
B. df.write.format('delta').save('path')
C. Structured Streaming
D. spark.read.csv('file.csv')
Answer: D
Q: Which method is used for cleaning data?
A. Structured Streaming
B. JSON
C. df.write.format('delta').save('path')
D. dropna
Answer: D
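A minimal sketch (the column name user_id is hypothetical):
    clean_df = df.dropna(subset=["user_id"])   # drop rows where user_id is null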
Q: Which method ingests streaming data?
A. JSON
B. Structured Streaming
C. readStream
D. dropna
Answer: C
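A minimal sketch using the built-in rate source, which emits rows continuously for testing:
    stream_df = spark.readStream.format("rate").load()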
Q: What is Auto Loader used for?
A. JSON
B. spark.read.csv('file.csv')
C. df.write.format('delta').save('path')
D. Incrementally ingesting data from cloud storage
Answer: D
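A hedged sketch of Auto Loader (the input and schema-location paths are illustrative):
    df = (spark.readStream
          .format("cloudFiles")                              # Auto Loader source
          .option("cloudFiles.format", "json")
          .option("cloudFiles.schemaLocation", "/tmp/schema")
          .load("/mnt/raw/events"))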
Q: Which API supports streaming in Spark?
A. dropna
B. JSON
C. Structured Streaming
D. map
Answer: C
Databricks Lakehouse Platform
Q: Which storage format does Lakehouse architecture commonly use?
A. Unified BI and ML analytics
B. Lack of schema enforcement and consistency
C. Open formats and APIs
D. Delta Lake
Answer: D
Q: How does Lakehouse support ML workloads?
A. By enabling data scientists to access the same data used in analytics
B. ACID transactions
C. Unified BI and ML analytics
D. Open formats and APIs
Answer: A
Q: What is one way Lakehouse reduces data movement?
A. It combines the benefits of data lakes and data warehouses
B. Unified data platform
C. Unified BI and ML analytics
D. By enabling data scientists to access the same data used in analytics
Answer: B
Q: Which layer of Lakehouse handles governance and security?
A. Open formats and APIs
B. Metadata layer
C. By enabling data scientists to access the same data used in analytics
D. ACID transactions
Answer: B
Q: Which component enables data reliability in a Lakehouse?
A. Unified data platform
B. Lack of schema enforcement and consistency
C. It combines the benefits of data lakes and data warehouses
D. ACID transactions
Answer: D
Q: What is a common use case of a Lakehouse?
A. Unified BI and ML analytics
B. ACID transactions
C. Batch and streaming workloads
D. Unified data platform
Answer: A
Q: Why are traditional data lakes insufficient for BI workloads?
A. Batch and streaming workloads
B. Lack of schema enforcement and consistency
C. Metadata layer
D. Open formats and APIs
Answer: B
Q: Which feature allows multiple tools to access the same data in Lakehouse?
A. Open formats and APIs
B. Delta Lake
C. Metadata layer
D. It combines the benefits of data lakes and data warehouses
Answer: A
Q: What is the primary benefit of the Databricks Lakehouse Platform?
A. Open formats and APIs
B. By enabling data scientists to access the same data used in analytics
C. Batch and streaming workloads
D. It combines the benefits of data lakes and data warehouses
Answer: D
Q: What type of data workloads can be handled by a Lakehouse?
A. It combines the benefits of data lakes and data warehouses
B. Open formats and APIs
C. Delta Lake
D. Batch and streaming workloads
Answer: D
Delta Lake
Q: Which method updates a Delta table conditionally?
A. Parquet
B. MERGE INTO
C. Data reliability with ACID transactions
D. _delta_log
Answer: B
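A hedged sketch using spark.sql; the table names target and updates are hypothetical:
    spark.sql("""
        MERGE INTO target t
        USING updates u
        ON t.id = u.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)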
Q: How can schema evolution be enabled in Delta?
A. RESTORE
B. A table stored in Delta format with transaction support
C. Transaction log
D. mergeSchema=True
Answer: D
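A minimal sketch (the output path is illustrative):
    (df.write.format("delta")
       .mode("append")
       .option("mergeSchema", "true")   # allow new columns to be added to the table schema
       .save("/tmp/events_delta"))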
Q: What is a Delta table?
A. Transaction log
B. Parquet
C. Data reliability with ACID transactions
D. A table stored in Delta format with transaction support
Answer: D
Q: How do you enable the change data feed in Delta Lake?
A. VACUUM
B. Transaction log
C. Set 'delta.enableChangeDataFeed = true'
D. RESTORE
Answer: C
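A minimal sketch (the table name events is hypothetical):
    spark.sql("ALTER TABLE events SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")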
Q: Which command is used to remove old files in Delta tables?
A. Parquet
B. RESTORE
C. A table stored in Delta format with transaction support
D. VACUUM
Answer: D
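A minimal sketch (the table name is hypothetical; 168 hours matches the 7-day default retention):
    spark.sql("VACUUM events RETAIN 168 HOURS")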
Q: What does Delta Lake use for ACID transactions?
A. VACUUM
B. Data reliability with ACID transactions
C. _delta_log
D. A table stored in Delta format with transaction support
Answer: C
Q: What operation allows restoring a table to a previous state?
A. Transaction log
B. RESTORE
C. mergeSchema=True
D. Set 'delta.enableChangeDataFeed = true'
Answer: B
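A minimal sketch (the table name and version number are illustrative):
    spark.sql("RESTORE TABLE events TO VERSION AS OF 3")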
Q: What is one benefit of Delta Lake?
A. Set 'delta.enableChangeDataFeed = true'
B. VACUUM
C. A table stored in Delta format with transaction support
D. Data reliability with ACID transactions
Answer: D
Q: Which file format is used by Delta Lake?
A. VACUUM
B. Set 'delta.enableChangeDataFeed = true'
C. Transaction log
D. Parquet
Answer: D
Q: What enables time travel in Delta Lake?
A. A table stored in Delta format with transaction support
B. Transaction log
C. VACUUM
D. RESTORE
Answer: B
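A minimal sketch of time travel, which reads an earlier version recorded in the transaction log (the path and version are illustrative):
    old_df = (spark.read.format("delta")
              .option("versionAsOf", 0)
              .load("/tmp/events_delta"))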
ETL Pipelines & Workflows
Q: What is a task in Databricks Jobs?
A. Via Widgets or Job Parameters
B. A unit of work like running a notebook or script
C. Single Node
D. Python task
Answer: B
Q: How are job parameters passed?
A. Governance on cluster configurations
B. Jobs UI
C. Via Widgets or Job Parameters
D. max_retries
Answer: C
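A minimal sketch (the widget name run_date and its default value are hypothetical):
    dbutils.widgets.text("run_date", "2024-01-01")   # creates a text widget
    run_date = dbutils.widgets.get("run_date")       # a job parameter can override this at run time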
Q: What is a multi-task job?
A. Workflow with multiple dependent tasks
B. Jobs UI
C. Single Node
D. Use the cron expression
Answer: A
Q: What parameter controls retry attempts?
A. max_retries
B. Via Widgets or Job Parameters
C. Use the cron expression
D. Job run history page
Answer: A
Q: How do you schedule a job to run weekly?
A. Workflow with multiple dependent tasks
B. Via Widgets or Job Parameters
C. Python task
D. Use the cron expression
Answer: D
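For example, a Quartz cron expression in the job schedule (the day and time are illustrative):
    0 0 6 ? * MON   # every Monday at 06:00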
Q: Which task type supports Python scripts?
A. Python task
B. A unit of work like running a notebook or script
C. Governance on cluster configurations
D. max_retries
Answer: A
Q: What is the default cluster mode in a job?
A. Single Node
B. max_retries
C. Jobs UI
D. Job run history page
Answer: A
Q: Where do you find job run logs?
A. Jobs UI
B. max_retries
C. Governance on cluster configurations
D. Job run history page
Answer: D
Q: What is a cluster policy?
A. Via Widgets or Job Parameters
B. Governance on cluster configurations
C. Use the cron expression
D. Single Node
Answer: B
Q: What UI is used to create workflows in Databricks?
A. Via Widgets or Job Parameters
B. Single Node
C. Jobs UI
D. Workflow with multiple dependent tasks
Answer: C