Free Certified Data Engineer Professional Exam Braindumps (page: 10)


A small company based in the United States has recently contracted a consulting firm in India to implement several new data engineering pipelines to power artificial intelligence applications. All the company's data is stored in regional cloud storage in the United States.
The workspace administrator at the company is uncertain about where the Databricks workspace used by the contractors should be deployed.

Assuming that all data governance considerations are accounted for, which statement accurately informs this decision?

  A. Databricks runs HDFS on cloud volume storage; as such, cloud virtual machines must be deployed in the region where the data is stored.
  B. Databricks workspaces do not rely on any regional infrastructure; as such, the decision should be made based upon what is most convenient for the workspace administrator.
  C. Cross-region reads and writes can incur significant costs and latency; whenever possible, compute should be deployed in the same region where the data is stored.
  D. Databricks leverages user workstations as the driver during interactive development; as such, users should always use a workspace deployed in a region they are physically near.
  E. Databricks notebooks send all executable code from the user's browser to virtual machines over the open internet; whenever possible, choosing a workspace region near the end users is the most secure.

Answer(s): C



The downstream consumers of a Delta Lake table have been complaining about data quality issues impacting performance in their applications. Specifically, they have complained that invalid latitude and longitude values in the activity_details table have been breaking the geolocation processes that depend on it.

A junior engineer has written the following code to add CHECK constraints to the Delta Lake table:
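(The original code screenshot is not reproduced here. The following is a plausible reconstruction, assuming the constraint is named valid_coordinates as referenced in option E; the exact names in the original may differ.)

    -- Hypothetical reconstruction: add a CHECK constraint enforcing
    -- valid latitude/longitude ranges on the existing Delta table
    ALTER TABLE activity_details ADD CONSTRAINT valid_coordinates
    CHECK (latitude >= -90 AND latitude <= 90
           AND longitude >= -180 AND longitude <= 180);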


A senior engineer has confirmed that the logic above is correct and that the latitude and longitude ranges used are valid, but the code fails when executed.
Which statement explains the cause of this failure?

  A. Because another team uses this table to support a frequently running application, two-phase locking is preventing the operation from committing.
  B. The activity_details table already exists; CHECK constraints can only be added during initial table creation.
  C. The activity_details table already contains records that violate the constraints; all existing data must pass CHECK constraints in order to add them to an existing table.
  D. The activity_details table already contains records; CHECK constraints can only be added prior to inserting values into a table.
  E. The current table schema does not contain the field valid_coordinates; schema evolution will need to be enabled before altering the table to add a constraint.

Answer(s): C
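For context, the usual remedy is to remove (or correct) the rows that violate the proposed constraint before retrying the ALTER TABLE; a minimal sketch using the same assumed column names:

    -- Delete rows that would fail the valid_coordinates CHECK,
    -- then re-run the ALTER TABLE statement
    DELETE FROM activity_details
    WHERE NOT (latitude >= -90 AND latitude <= 90
               AND longitude >= -180 AND longitude <= 180);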



Which of the following is true of Delta Lake and the Lakehouse?

  A. Because Parquet compresses data row by row, strings will only be compressed when a character is repeated multiple times.
  B. Delta Lake automatically collects statistics on the first 32 columns of each table, which are leveraged in data skipping based on query filters.
  C. Views in the Lakehouse maintain a valid cache of the most recent versions of source tables at all times.
  D. Primary and foreign key constraints can be leveraged to ensure duplicate values are never entered into a dimension table.
  E. Z-order can only be applied to numeric values stored in Delta Lake tables.

Answer(s): B
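As background for the correct answer: the number of leading columns on which Delta Lake collects file-level statistics is governed by a table property that defaults to 32, and those statistics let queries skip files whose min/max ranges cannot match a filter. A minimal sketch, using a hypothetical table name:

    -- Adjust how many leading columns have statistics collected
    -- (delta.dataSkippingNumIndexedCols defaults to 32)
    ALTER TABLE my_table
    SET TBLPROPERTIES ('delta.dataSkippingNumIndexedCols' = '8');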



The view updates represents an incremental batch of all newly ingested data to be inserted into or updated in the customers table.

The following logic is used to process these records.
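(The original logic screenshot is not reproduced here. The following is a plausible Type 2 reconstruction, assuming hypothetical columns customer_id, address, current, effective_date, and end_date; the original code may differ.)

    -- Stage each update twice: once keyed to close out the existing current row,
    -- and once with a NULL merge key so the new version is inserted
    MERGE INTO customers
    USING (
      SELECT updates.customer_id AS merge_key, updates.*
      FROM updates
      UNION ALL
      SELECT NULL AS merge_key, updates.*
      FROM updates
      JOIN customers
        ON updates.customer_id = customers.customer_id
      WHERE customers.current = true
        AND updates.address <> customers.address
    ) staged
    ON customers.customer_id = staged.merge_key
    WHEN MATCHED AND customers.current = true
                 AND customers.address <> staged.address THEN
      UPDATE SET current = false, end_date = staged.effective_date
    WHEN NOT MATCHED THEN
      INSERT (customer_id, address, current, effective_date, end_date)
      VALUES (staged.customer_id, staged.address, true, staged.effective_date, NULL);

In this pattern, old rows are marked current = false rather than overwritten, which is what identifies the table as Type 2 rather than Type 1.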


Which statement describes this implementation?

  A. The customers table is implemented as a Type 3 table; old values are maintained as a new column alongside the current value.
  B. The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.
  C. The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.
  D. The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.
  E. The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.

Answer(s): B





