Free Certified Data Engineer Professional Exam Braindumps (page: 20)


To facilitate near real-time workloads, a data engineer is creating a helper function that leverages the schema detection and evolution functionality of Databricks Auto Loader. The desired function will automatically detect the schema of the source directory, incrementally process JSON files as they arrive in that directory, and automatically evolve the table's schema when new fields are detected.

The function is displayed below with a blank:

[The function code and the five answer options appear only as images in the source and are not reproduced here.]

Which response correctly fills in the blank to meet the specified requirements?
Answer(s): E
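

Since the code images are not reproduced above, the following is a minimal sketch of a helper that meets all three stated requirements; the function and parameter names (auto_load_json, source_path, checkpoint_path, table_name) are hypothetical, not taken from the exam. The cloudFiles.schemaLocation option enables schema detection and tracking, and mergeSchema lets the target table's schema evolve when new fields appear:

    def auto_load_json(source_path, checkpoint_path, table_name):
        # spark is the active SparkSession (predefined in Databricks notebooks)
        (spark.readStream
              .format("cloudFiles")                                  # Auto Loader source
              .option("cloudFiles.format", "json")                   # incrementally ingest JSON files
              .option("cloudFiles.schemaLocation", checkpoint_path)  # schema detection and tracking
              .load(source_path)
              .writeStream
              .option("checkpointLocation", checkpoint_path)
              .option("mergeSchema", "true")                         # evolve the table schema on new fields
              .toTable(table_name))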



The data engineering team maintains the following code:

[The code appears only as an image in the source and is not reproduced here.]
Assuming that this code produces logically correct results and the data in the source table has been de-duplicated and validated, which statement describes what will occur when this code is executed?

  A. The silver_customer_sales table will be overwritten by aggregated values calculated from all records in the gold_customer_lifetime_sales_summary table as a batch job.
  B. A batch job will update the gold_customer_lifetime_sales_summary table, replacing only those rows that have different values than the current version of the table, using customer_id as the primary key.
  C. The gold_customer_lifetime_sales_summary table will be overwritten by aggregated values calculated from all records in the silver_customer_sales table as a batch job.
  D. An incremental job will leverage running information in the state store to update aggregate values in the gold_customer_lifetime_sales_summary table.
  E. An incremental job will detect if new rows have been written to the silver_customer_sales table; if new rows are detected, all aggregates will be recalculated and used to overwrite the gold_customer_lifetime_sales_summary table.

Answer(s): C
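

The code image is not reproduced above, but answer C describes a batch read of the full silver table followed by a complete overwrite of the gold aggregate table. A hedged sketch consistent with that description follows; the table names come from the question, but the aggregate columns (sale_total, order_id, and their aliases) are hypothetical:

    from pyspark.sql import functions as F

    (spark.table("silver_customer_sales")                        # batch read of all source records
          .groupBy("customer_id")
          .agg(F.sum("sale_total").alias("total_sales"),         # hypothetical aggregates
               F.countDistinct("order_id").alias("total_orders"))
          .write
          .mode("overwrite")                                     # full overwrite as a batch job
          .saveAsTable("gold_customer_lifetime_sales_summary"))

Because this is a plain batch write in overwrite mode, every execution recomputes the aggregates from all silver records and replaces the gold table; no streaming state store or incremental change detection is involved.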



The data architect has mandated that all tables in the Lakehouse should be configured as external (also known as "unmanaged") Delta Lake tables.

Which approach will ensure that this requirement is met?

  A. When a database is being created, make sure that the LOCATION keyword is used.
  B. When configuring an external data warehouse for all table storage, leverage Databricks for all ELT.
  C. When data is saved to a table, make sure that a full file path is specified alongside the Delta format.
  D. When tables are created, make sure that the EXTERNAL keyword is used in the CREATE TABLE statement.
  E. When the workspace is being configured, make sure that external cloud object storage has been mounted.

Answer(s): C
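

As a hedged illustration of answer C: when a write supplies an explicit storage path along with the Delta format, the resulting table is registered as external (unmanaged) rather than managed. In this sketch, df is an existing DataFrame, and the bucket path and table name are hypothetical:

    # Supplying an explicit path makes the resulting Delta table external (unmanaged)
    (df.write
       .format("delta")
       .option("path", "s3://example-bucket/tables/sales")  # hypothetical storage location
       .saveAsTable("sales"))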



The marketing team is looking to share data in an aggregate table with the sales organization, but the field names used by the teams do not match, and a number of marketing-specific fields have not been approved for the sales org.

Which of the following solutions addresses the situation while emphasizing simplicity?

  A. Create a view on the marketing table selecting only those fields approved for the sales team; alias the names of any fields that should be standardized to the sales naming conventions.
  B. Create a new table with the required schema and use Delta Lake's DEEP CLONE functionality to sync up changes committed to one table to the corresponding table.
  C. Use a CTAS statement to create a derivative table from the marketing table; configure a production job to propagate changes.
  D. Add a parallel table write to the current production pipeline, updating a new sales table that varies as required from the marketing table.
  E. Instruct the marketing team to download results as a CSV and email them to the sales organization.

Answer(s): A
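

A minimal sketch of answer A follows; all schema and column names here are hypothetical, not taken from the question. The view selects only approved fields and aliases them to the sales naming convention, so no data is copied and no extra pipeline is needed:

    spark.sql("""
        CREATE OR REPLACE VIEW sales.customer_summary AS
        SELECT
            mkt_customer_id AS customer_id,  -- alias to the sales naming convention
            mkt_region      AS region,
            total_spend                      -- only fields approved for the sales org
        FROM marketing.customer_aggregates
    """)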






Post your comments and discuss the Databricks Certified Data Engineer Professional exam with other community members:

Puran (New Zealand) commented on September 18, 2024:
Good material and very honest and knowledgeable support team. Contacted the support team and got a reply in less than 30 minutes.