Free Certified Data Engineer Professional Exam Braindumps (page: 3)


A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the code below is to register a view of all sales that occurred in countries on the continent of Africa that appear in the geo_lookup table.
Before executing the code, running SHOW TABLES on the current database indicates that the database contains only two tables: geo_lookup and sales.


Which statement correctly describes the outcome of executing these command cells in order in an interactive notebook?

  A. Both commands will succeed. Executing SHOW TABLES will show that countries_af and sales_af have been registered as views.
  B. Cmd 1 will succeed. Cmd 2 will search all accessible databases for a table or view named countries_af; if this entity exists, Cmd 2 will succeed.
  C. Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable representing a PySpark DataFrame.
  D. Both commands will fail. No new variables, tables, or views will be created.
  E. Cmd 1 will succeed and Cmd 2 will fail. countries_af will be a Python variable containing a list of strings.

Answer(s): E
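Explanation: the two command cells appear only as a screenshot in the source, so the sketch below is a plausible reconstruction rather than the exact exam code; the column names country and continent and the code 'AF' are assumptions. Cmd 1 is a Python cell that collects the African country names into a Python list of strings. Cmd 2 is a SQL cell that references countries_af, but that name exists only in the Python interpreter, not in the catalog, so Spark SQL cannot resolve it and the command fails.

    # Cmd 1 (Python): countries_af becomes a Python list of strings
    countries_af = [row.country
                    for row in spark.table("geo_lookup")
                                    .filter("continent = 'AF'")
                                    .select("country")
                                    .collect()]

    -- Cmd 2 (SQL): countries_af is not a table, view, or function in the catalog,
    -- so this statement fails with an analysis error
    CREATE OR REPLACE VIEW sales_af AS
    SELECT * FROM sales
    WHERE country IN countries_af;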



A Delta table of weather records is partitioned by date and has the following schema:

date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT

To find all the records from within the Arctic Circle, you execute a query with the following filter:

latitude > 66.3

Which statement describes how the Delta engine identifies which files to load?

  A. All records are cached to an operational database and then the filter is applied.
  B. The Parquet file footers are scanned for min and max statistics for the latitude column.
  C. All records are cached to attached storage and then the filter is applied.
  D. The Delta log is scanned for min and max statistics for the latitude column.
  E. The Hive metastore is scanned for min and max statistics for the latitude column.

Answer(s): D
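Explanation: Delta Lake data skipping relies on per-file statistics (numRecords, minValues, maxValues) recorded in the transaction log, not on Parquet footers or the Hive metastore. The engine reads those statistics, compares each file's min/max latitude against the predicate latitude > 66.3, and loads only the files whose range could contain matching rows. A minimal sketch of inspecting those statistics directly from the log (the table path is an assumption):

    import json, glob

    # Each "add" action in a Delta log commit carries a JSON "stats" string with
    # numRecords, minValues, and maxValues for the indexed columns.
    for log_file in sorted(glob.glob("/path/to/weather/_delta_log/*.json")):
        for line in open(log_file):
            action = json.loads(line)
            stats_json = action.get("add", {}).get("stats")
            if stats_json:
                stats = json.loads(stats_json)
                print(action["add"]["path"],
                      stats["minValues"].get("latitude"),
                      stats["maxValues"].get("latitude"))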



The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lake tables using default table settings.

The team has decided to process all deletions from the previous week as a batch job at 1am each Sunday. The total duration of this job is less than one hour. Every Monday at 3am, a batch job executes a series of VACUUM commands on all Delta Lake tables throughout the organization.
The compliance officer has recently learned about Delta Lake's time travel functionality. They are concerned that this might allow continued access to deleted data.

Assuming all delete logic is correctly implemented, which statement correctly addresses this concern?

  A. Because the VACUUM command permanently deletes all files containing deleted records, deleted records may be accessible with time travel for around 24 hours.
  B. Because the default data retention threshold is 24 hours, data files containing deleted records will be retained until the VACUUM job is run the following day.
  C. Because Delta Lake time travel provides full access to the entire history of a table, deleted records can always be recreated by users with full admin privileges.
  D. Because Delta Lake's DELETE statements have ACID guarantees, deleted records will be permanently purged from all storage systems as soon as a delete job completes.
  E. Because the default data retention threshold is 7 days, data files containing deleted records will be retained until the VACUUM job is run 8 days later.

Answer(s): E
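Explanation: with default table settings, delta.deletedFileRetentionDuration is 7 days (168 hours). The Sunday delete job removes the records only logically; the underlying data files remain referenced in the table history. The VACUUM run the next day (roughly 26 hours later) therefore removes nothing, and the deleted records stay reachable via time travel until the VACUUM that runs 8 days after the delete. A minimal sketch of the weekly cycle (table and column names are assumptions):

    # Sunday 01:00 - delete job: rows are removed from the current table version,
    # but the old data files are only marked as removed in the Delta log.
    spark.sql("DELETE FROM user_data WHERE user_id IN (SELECT user_id FROM forget_requests)")

    # Monday 03:00 (about 26 hours later) - default retention is 168 hours,
    # so this VACUUM deletes nothing and time travel still exposes the old rows.
    spark.sql("VACUUM user_data")

    # The following Monday 03:00 (8 days after the delete) - the removed files now
    # exceed the 7-day threshold and are physically deleted, ending time travel access.
    spark.sql("VACUUM user_data")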



A junior data engineer has configured a workload that posts the following JSON to the Databricks REST API endpoint 2.0/jobs/create.


Assuming that all configurations and referenced resources are available, which statement describes the result of executing this workload three times?

  A. Three new jobs named "Ingest new data" will be defined in the workspace, and they will each run once daily.
  B. The logic defined in the referenced notebook will be executed three times on new clusters with the configuration of the provided cluster ID.
  C. Three new jobs named "Ingest new data" will be defined in the workspace, but no jobs will be executed.
  D. One new job named "Ingest new data" will be defined in the workspace, but it will not be executed.
  E. The logic defined in the referenced notebook will be executed three times on the referenced existing all-purpose cluster.

Answer(s): C
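Explanation: the 2.0/jobs/create endpoint only registers a job definition and returns a job_id; it does not start a run (runs are started by 2.0/jobs/run-now or by a schedule attached to the job). Job names are not unique, so posting the same payload three times creates three separate jobs and executes nothing. The original JSON appears only as a screenshot; a minimal sketch of such a request, with assumed field values, is:

    import requests

    payload = {
        "name": "Ingest new data",
        "existing_cluster_id": "1234-567890-abcde123",             # assumed cluster ID
        "notebook_task": {"notebook_path": "/Repos/prod/ingest"},  # assumed notebook path
    }

    # Each POST defines a new job and returns a new job_id; no run is triggered.
    for _ in range(3):
        resp = requests.post(
            "https://<workspace-url>/api/2.0/jobs/create",  # placeholder workspace URL
            headers={"Authorization": "Bearer <token>"},    # placeholder access token
            json=payload,
        )
        print(resp.json())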





