Free Certified Data Engineer Professional Exam Braindumps (page: 24)


You are performing a join operation to combine values from a static userLookup table with a streaming DataFrame streamingDF.

Which code block attempts to perform an invalid stream-static join?

  A. userLookup.join(streamingDF, ["userid"], how="inner")
  B. streamingDF.join(userLookup, ["user_id"], how="outer")
  C. streamingDF.join(userLookup, ["user_id"], how="left")
  D. streamingDF.join(userLookup, ["userid"], how="inner")
  E. userLookup.join(streamingDF, ["user_id"], how="right")

Answer(s): B
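For context: Structured Streaming supports stream-static inner joins, and outer joins only when the streaming side is the preserved side (left outer with the stream on the left, right outer with the stream on the right). A full outer stream-static join is rejected. A minimal sketch, assuming an illustrative rate source and sample lookup data that are not part of the original question:

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # Illustrative static lookup table and streaming source.
  userLookup = spark.createDataFrame([(1, "alice"), (2, "bob")], ["user_id", "name"])
  streamingDF = (spark.readStream.format("rate").load()
                 .withColumnRenamed("value", "user_id"))

  # Supported: inner join, and left outer with the stream on the left.
  valid = streamingDF.join(userLookup, ["user_id"], how="left")

  # Not supported: a full outer stream-static join; Spark raises an
  # AnalysisException (at the latest when the query is started).
  invalid = streamingDF.join(userLookup, ["user_id"], how="outer")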



Spill occurs as a result of executing various wide transformations. However, diagnosing spill requires proactively looking for key indicators.

Where in the Spark UI are two of the primary indicators that a partition is spilling to disk?

  A. Query’s detail screen and Job’s detail screen
  B. Stage’s detail screen and Executor’s log files
  C. Driver’s and Executor’s log files
  D. Executor’s detail screen and Executor’s log files
  E. Stage’s detail screen and Query’s detail screen

Answer(s): E

The question asks where the indicators appear in the Spark UI, so the options naming log files (B, C, D) do not fit: log files are not part of the Spark UI. The Spill (Memory) and Spill (Disk) metrics surface on the Stage’s detail screen and on the Query’s detail screen.
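As background, a hedged sketch of a wide transformation whose shuffle can spill (the generated DataFrame and noop sink are illustrative):

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F

  spark = SparkSession.builder.getOrCreate()

  # The groupBy/agg below forces a shuffle; a shuffle partition that no
  # longer fits in executor memory spills to disk.
  df = spark.range(100_000_000).withColumn("key", F.col("id") % 1000)
  summed = df.groupBy("key").agg(F.sum("id").alias("total"))
  summed.write.format("noop").mode("overwrite").save()  # force execution

  # After the job runs, look for Spill (Memory) / Spill (Disk) on the
  # Stage's detail screen and the Query's detail screen in the Spark UI.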



A task orchestrator has been configured to run two hourly tasks. First, an outside system writes Parquet data to a directory mounted at /mnt/raw_orders/. After this data is written, a Databricks job containing the following code is executed:
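A minimal sketch of such a job, consistent with the 2-hour window referenced in the answer choices: a streaming read with a 2-hour watermark on time and deduplication on the composite key. The schema, checkpoint path, and trigger here are assumptions, not taken from the original question:

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # Assumed job shape: stream newly arrived Parquet files, watermark on the
  # enqueue time, drop duplicates on the composite key, append to orders.
  (spark.readStream
      .schema("customer_id BIGINT, order_id BIGINT, time TIMESTAMP")  # assumed schema
      .format("parquet")
      .load("/mnt/raw_orders/")
      .withWatermark("time", "2 hours")
      .dropDuplicates(["customer_id", "order_id"])
      .writeStream
      .trigger(availableNow=True)                                # assumed hourly trigger
      .option("checkpointLocation", "/mnt/checkpoints/orders")   # assumed path
      .table("orders"))

Because the watermark bounds how long deduplication state is retained, a duplicate arriving more than 2 hours after the original can miss the state entry and be written again.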

Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order, and that the time field indicates when the record was queued in the source system.

If the upstream system is known to occasionally enqueue duplicate entries for a single order hours apart, which statement is correct?

  A. Duplicate records enqueued more than 2 hours apart may be retained and the orders table may contain duplicate records with the same customer_id and order_id.
  B. All records will be held in the state store for 2 hours before being deduplicated and committed to the orders table.
  C. The orders table will contain only the most recent 2 hours of records and no duplicates will be present.
  D. Duplicate records arriving more than 2 hours apart will be dropped, but duplicates that arrive in the same batch may both be written to the orders table.
  E. The orders table will not contain duplicates, but records arriving more than 2 hours late will be ignored and missing from the table.

Answer(s): A



A junior data engineer is migrating a workload from a relational database system to the Databricks Lakehouse. The source system uses a star schema, leveraging foreign key constraints and multi-table inserts to validate records on write.

Which consideration will impact the decisions made by the engineer while migrating this workload?

  A. Databricks only allows foreign key constraints on hashed identifiers, which avoid collisions in highly-parallel writes.
  B. Databricks supports Spark SQL and JDBC; all logic can be directly migrated from the source system without refactoring.
  C. Committing to multiple tables simultaneously requires taking out multiple table locks and can lead to a state of deadlock.
  D. All Delta Lake transactions are ACID compliant against a single table, and Databricks does not enforce foreign key constraints.
  E. Foreign keys must reference a primary key field; multi-table inserts must leverage Delta Lake’s upsert functionality.

Answer(s): D
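To illustrate the consideration in D, a hedged sketch with hypothetical table names and data: each Delta write is atomic against its own table, there is no cross-table transaction, and referential integrity moves into application logic because declared foreign keys are informational rather than enforced.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.getOrCreate()

  # Hypothetical star-schema data.
  dim_customers = spark.createDataFrame([(1, "alice")], ["customer_id", "name"])
  fact_orders = spark.createDataFrame([(100, 1)], ["order_id", "customer_id"])

  # Each Delta write is ACID against its own table only; the pair below is
  # NOT one cross-table transaction, so a failure in the second write does
  # not roll back the first.
  dim_customers.write.format("delta").mode("append").saveAsTable("dim_customers")
  fact_orders.write.format("delta").mode("append").saveAsTable("fact_orders")

  # With foreign keys unenforced, a validation the source system did on
  # write must be done explicitly, e.g. an anti-join for orphaned orders.
  orphans = fact_orders.join(dim_customers, "customer_id", "left_anti")
  assert orphans.count() == 0, "orders reference unknown customers"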





