Free Certified Data Engineer Professional Exam Braindumps (page: 2)

Page 2 of 46

A junior developer complains that the code in their notebook isn't producing the correct results in the development environment. A shared screenshot reveals that while they're using a notebook versioned with Databricks Repos, they're using a personal branch that contains old logic. The desired branch named dev-2.3.9 is not available from the branch selection dropdown.
Which approach will allow this developer to review the current logic for this notebook?

  1. Use Repos to make a pull request, then use the Databricks REST API to update the current branch to dev-2.3.9.
  2. Use Repos to pull changes from the remote Git repository and select the dev-2.3.9 branch.
  3. Use Repos to checkout the dev-2.3.9 branch and auto-resolve conflicts with the current branch.
  4. Merge all changes back to the main branch in the remote Git repository and clone the repo again.
  5. Use Repos to merge the current branch and the dev-2.3.9 branch, then make a pull request to sync with the remote repository.

Answer(s): B



The security team is exploring whether the Databricks secrets module can be leveraged for connecting to an external database.

After testing the code with all Python variables defined as strings, they upload the password to the secrets module and configure the correct permissions for the currently active user. They then modify their code to the following (leaving all other variables unchanged).


Which statement describes what will happen when the above code is executed?

  1. The connection to the external table will fail; the string "REDACTED" will be printed.
  2. An interactive input box will appear in the notebook; if the right password is provided, the connection will succeed and the encoded password will be saved to DBFS.
  3. An interactive input box will appear in the notebook; if the right password is provided, the connection will succeed and the password will be printed in plain text.
  4. The connection to the external table will succeed; the string value of password will be printed in plain text.
  5. The connection to the external table will succeed; the string "REDACTED" will be printed.

Answer(s): E



The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a new DataFrame named preds with the schema "customer_id LONG, predictions DOUBLE, date DATE".


The data science team would like predictions saved to a Delta Lake table with the ability to compare all predictions across time. Churn predictions will be made at most once per day.
Which code block accomplishes this task while minimizing potential compute costs?

  1. preds.write.mode("append").saveAsTable("churn_preds")
  2. preds.write.format("delta").save("/preds/churn_preds")



Answer(s): A
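As a plain-Python illustration (not Spark; Spark's write-mode semantics are only mimicked here), the sketch below contrasts the two options: mode("append") adds each day's predictions to the existing table, preserving history across runs, while option B's .save() uses the default "errorifexists" mode and fails on the second daily run.

```python
# Minimal simulation of DataFrameWriter modes; a "table" is just a list.
def write(df_rows, target, mode="errorifexists"):
    """Mimic Spark write modes: append extends the target; the default
    mode fails if the target already exists."""
    if mode == "append":
        return (target or []) + list(df_rows)
    if target is not None:
        raise ValueError("target already exists")
    return list(df_rows)

day1 = [{"customer_id": 1, "predictions": 0.12, "date": "2024-01-01"}]
day2 = [{"customer_id": 1, "predictions": 0.08, "date": "2024-01-02"}]

table = write(day1, None, mode="append")
table = write(day2, table, mode="append")   # history preserved across days
assert len(table) == 2                       # both days' predictions kept

try:
    write(day2, table)                       # default mode: second run fails
except ValueError as err:
    print(err)
```

Appending a small daily batch is also the cheapest option: no existing data is rewritten, which is why A minimizes compute cost.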



An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previous day as indicated by the date variable:


Assume that the fields customer_id and order_id serve as a composite key to uniquely identify each order.

If the upstream system is known to occasionally produce duplicate entries for a single order hours apart, which statement is correct?

  1. Each write to the orders table will only contain unique records, and only those records without duplicates in the target table will be written.
  2. Each write to the orders table will only contain unique records, but newly written records may have duplicates already present in the target table.
  3. Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, these records will be overwritten.
  4. Each write to the orders table will only contain unique records; if existing records with the same key are present in the target table, the operation will fail.
  5. Each write to the orders table will run deduplication over the union of new and existing records, ensuring no duplicate records are present.

Answer(s): B
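The screenshot of the nightly batch code is not reproduced in this dump. Assuming it deduplicates each day's batch on (customer_id, order_id) before appending, this plain-Python sketch (not Spark) shows why answer B holds: within-batch deduplication cannot see records already appended to the target table, so a duplicate arriving the next day slips through.

```python
# Simulate dropDuplicates on the composite key within a single batch.
def dedupe(batch):
    seen, out = set(), []
    for rec in batch:
        key = (rec["customer_id"], rec["order_id"])
        if key not in seen:
            seen.add(key)
            out.append(rec)
    return out

target = []
day1 = [{"customer_id": 1, "order_id": 100, "ts": "2024-01-01T23:00"}]
day2 = [{"customer_id": 1, "order_id": 100, "ts": "2024-01-02T01:00"},
        {"customer_id": 1, "order_id": 100, "ts": "2024-01-02T02:00"}]

target += dedupe(day1)     # 1 record appended
target += dedupe(day2)     # within-batch duplicate removed, but...
assert len(target) == 2    # ...the cross-batch duplicate survives
```

Each append is internally unique, yet the table accumulates duplicates whenever the same order spans two nightly batches; avoiding that would require a MERGE against the target instead of a blind append.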





