Free Certified Data Engineer Professional Exam Braindumps (page: 7)


A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes that in one stage the minimum and median task durations are roughly the same, but the maximum task duration is roughly 100 times the minimum.
Which situation is causing the increased duration of the overall job?

  A. Task queueing resulting from improper thread pool assignment.
  B. Spill resulting from attached volume storage being too small.
  C. Network latency due to some cluster nodes being in different regions from the source data.
  D. Skew caused by more data being assigned to a subset of Spark partitions.
  E. Credential validation errors while pulling data from an external system.

Answer(s): D
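A median task duration close to the minimum while the maximum is two orders of magnitude larger is the classic signature of skew: a small subset of partitions receives most of the data, and the tasks processing them become stragglers. As a hedged illustration (the configuration keys are standard Spark 3.x settings; the tables and join key are hypothetical), Adaptive Query Execution can split skewed shuffle partitions at runtime:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("skew-demo").getOrCreate()

# Standard Spark 3.x settings: let AQE detect skewed shuffle
# partitions and split them into smaller tasks at runtime.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

# Hypothetical join where a handful of user_id values dominate
# `events`; without AQE, the tasks assigned those hot keys run far
# longer than the rest of the stage, exactly as described above.
events = spark.table("events")   # assumed table names
users = spark.table("users")
joined = events.join(users, "user_id")
```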



The cluster configurations below are identical in that each cluster has 400 GB of total RAM, 160 total cores, and only one Executor per VM.
Given a job with at least one wide transformation, which of the following cluster configurations will result in maximum performance?


  A. • Total VMs: 1
     • 400 GB per Executor
     • 160 Cores / Executor

  B. • Total VMs: 8
     • 50 GB per Executor
     • 20 Cores / Executor

  C. • Total VMs: 16
     • 25 GB per Executor
     • 10 Cores / Executor

  D. • Total VMs: 4
     • 100 GB per Executor
     • 40 Cores / Executor

  E. • Total VMs: 2
     • 200 GB per Executor
     • 80 Cores / Executor

Answer(s): A
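As background (not part of the original question), a wide transformation is one that requires a shuffle, so the executor layout determines how much shuffled data must cross the network. A minimal sketch with a hypothetical table name, using the ambient `spark` session as in a Databricks notebook:

```python
# groupBy is a wide transformation: all rows for a given store_id
# must be shuffled to the same task before they can be aggregated.
daily_totals = (
    spark.table("sales")      # hypothetical source table
         .groupBy("store_id")
         .sum("amount")
)

# With a single large VM (option A), that shuffle never leaves the
# machine, so no data is transferred over the network between nodes.
```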



A junior data engineer on your team has implemented the following code block.

[The code block is not reproduced in this copy.]

The view new_events contains a batch of records with the same schema as the events Delta table. The event_id field serves as a unique key for this table.
When this query is executed, what will happen to new records that have the same event_id as an existing record?

  A. They are merged.
  B. They are ignored.
  C. They are updated.
  D. They are inserted.
  E. They are deleted.

Answer(s): B
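The code block itself is missing from this copy. For the answer "ignored" to hold, the block would have to be an insert-only merge, one with a WHEN NOT MATCHED clause but no WHEN MATCHED clause; the sketch below is a hedged reconstruction under that assumption (the table and view names come from the question, the statement shape is assumed):

```python
# Insert-only merge: new event_ids are inserted, but rows whose
# event_id already exists match the ON condition and, with no
# WHEN MATCHED clause present, are simply ignored.
spark.sql("""
    MERGE INTO events
    USING new_events
    ON events.event_id = new_events.event_id
    WHEN NOT MATCHED THEN INSERT *
""")
```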



A junior data engineer seeks to leverage Delta Lake's Change Data Feed functionality to create a Type 1 table representing all of the values that have ever been valid for all rows in a bronze table created with the property delta.enableChangeDataFeed = true. They plan to execute the following code as a daily job:

[The code block is not reproduced in this copy.]

Which statement describes the execution and results of running the above query multiple times?

  A. Each time the job is executed, newly updated records will be merged into the target table, overwriting previous values with the same primary keys.
  B. Each time the job is executed, the entire available history of inserted or updated records will be appended to the target table, resulting in many duplicate entries.
  C. Each time the job is executed, the target table will be overwritten using the entire history of inserted or updated records, giving the desired result.
  D. Each time the job is executed, the differences between the original and current versions are calculated; this may result in duplicate entries for some records.
  E. Each time the job is executed, only those records that have been inserted or updated since the last execution will be appended to the target table, giving the desired result.

Answer(s): B
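The code block is again missing here. Answer B implies a job that reads the change feed from the beginning of the table's history on every run and appends the result; the sketch below is a hedged reconstruction under that assumption (readChangeFeed and startingVersion are real Delta Lake CDF reader options; the table names are assumed):

```python
# Reads ALL change data from version 0 on every execution, then
# appends it, so each run re-adds the full history of inserts and
# updates -- the duplicate entries described in answer B.
(spark.read.format("delta")
      .option("readChangeFeed", "true")
      .option("startingVersion", 0)              # always from the start
      .table("bronze")                           # CDF-enabled source
      .filter("_change_type != 'update_preimage'")
      .write.format("delta")
      .mode("append")                            # append, not overwrite
      .saveAsTable("bronze_history"))            # hypothetical target
```

A behavior like answer E would instead require tracking the last processed version between runs (or reading the change feed with Structured Streaming) so each execution picks up only new changes.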


