Free Certified Data Engineer Professional Exam Braindumps

The data engineering team is migrating an enterprise system with thousands of tables and views into the Lakehouse. They plan to implement the target architecture using a series of bronze, silver, and gold tables. Bronze tables will almost exclusively be used by production data engineering workloads, while silver tables will be used to support both data engineering and machine learning workloads. Gold tables will largely serve business intelligence and reporting purposes. While personally identifiable information (PII) exists in all tiers of data, pseudonymization and anonymization rules are in place for all data at the silver and gold levels.
The organization is interested in reducing security concerns while maximizing the ability to collaborate across diverse teams.
Which statement exemplifies best practices for implementing this system?

  1. Isolating tables in separate databases based on data quality tiers allows for easy permissions management through database ACLs and allows physical separation of default storage locations for managed tables.
  2. Because databases on Databricks are merely a logical construct, choices around database organization do not impact security or discoverability in the Lakehouse.
  3. Storing all production tables in a single database provides a unified view of all data assets available throughout the Lakehouse, simplifying discoverability by granting all users view privileges on this database.
  4. Working in the default Databricks database provides the greatest security when working with managed tables, as these will be created in the DBFS root.
  5. Because all tables must live in the same storage containers used for the database they're created in, organizations should be prepared to create between dozens and thousands of databases depending on their data isolation requirements.

Answer(s): A
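
For reference, option A can be sketched as below. This is a minimal, hypothetical example in Python (PySpark) assuming a Databricks workspace with table access control enabled and an active `spark` session; the database names, storage paths, and group names are placeholders, not part of the question.

```python
# Minimal sketch of tier-separated databases (hypothetical names and paths).
# Each database gets its own default storage LOCATION, so managed tables created
# in it are physically isolated, and permissions can be granted per database.

spark.sql("""
    CREATE DATABASE IF NOT EXISTS bronze_db
    LOCATION 'abfss://bronze@examplestorage.dfs.core.windows.net/bronze_db'
""")

spark.sql("""
    CREATE DATABASE IF NOT EXISTS silver_db
    LOCATION 'abfss://silver@examplestorage.dfs.core.windows.net/silver_db'
""")

# Database-level ACLs: production engineering works in bronze, while both
# engineering and ML teams can read silver.
spark.sql("GRANT USAGE, SELECT ON DATABASE bronze_db TO `data-engineers`")
spark.sql("GRANT USAGE, SELECT ON DATABASE silver_db TO `data-engineers`")
spark.sql("GRANT USAGE, SELECT ON DATABASE silver_db TO `ml-team`")
```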



The data architect has mandated that all tables in the Lakehouse should be configured as external Delta Lake tables. Which approach will ensure that this requirement is met?

  1. Whenever a database is being created, make sure that the LOCATION keyword is used
  2. When configuring an external data warehouse for all table storage, leverage Databricks for all ELT.
  3. Whenever a table is being created, make sure that the LOCATION keyword is used.
  4. When tables are created, make sure that the EXTERNAL keyword is used in the CREATE TABLE statement.
  5. When the workspace is being configured, make sure that external cloud object storage has been mounted.

Answer(s): C
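
As a reference for the answer above, the sketch below shows that specifying LOCATION in CREATE TABLE produces an external (unmanaged) Delta table; the table name and storage path are placeholders, and an active `spark` session in a Databricks notebook is assumed.

```python
# Hypothetical example: supplying LOCATION makes this an external Delta table,
# so dropping the table later does not delete the underlying data files.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales_external (
        order_id BIGINT,
        amount   DOUBLE,
        order_ts TIMESTAMP
    )
    USING DELTA
    LOCATION 'abfss://data@examplestorage.dfs.core.windows.net/tables/sales_external'
""")

# DESCRIBE EXTENDED reports Type = EXTERNAL for tables created this way.
spark.sql("DESCRIBE EXTENDED sales_external").show(truncate=False)
```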



To reduce storage and compute costs, the data engineering team has been tasked with curating a series of aggregate tables leveraged by business intelligence dashboards, customer-facing applications, production machine learning models, and ad hoc analytical queries.

The data engineering team has been made aware of new requirements from a customer-facing application, which is the only downstream workload they manage entirely. As a result, an aggregate table used by numerous teams across the organization will need to have a number of fields renamed, and additional fields will also be added.

Which solution addresses the situation with minimal interruption to other teams in the organization and without increasing the number of tables that need to be managed?

  1. Send all users notice that the schema for the table will be changing; include in the communication the logic necessary to revert the new table schema to match historic queries.
  2. Configure a new table with all the requisite fields and new names and use this as the source for the customer-facing application; create a view that maintains the original data schema and table name by aliasing select fields from the new table.
  3. Create a new table with the required schema and new fields and use Delta Lake's deep clone functionality to sync up changes committed to one table to the corresponding table.
  4. Replace the current table definition with a logical view defined with the query logic currently writing the aggregate table; create a new table to power the customer-facing application.
  5. Add a table comment warning all users that the table schema and field names will be changing on a given date; overwrite the table in place to the specifications of the customer-facing application.

Answer(s): B
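
A rough illustration of option B follows, using placeholder table, view, and column names; the renamed columns and added field are invented for the example, and an active `spark` session is assumed.

```python
# Hypothetical sketch: build the new table (renamed plus added fields) from the
# existing aggregate table, then replace the original table with a view that
# restores the historic names so other teams' queries keep working.

spark.sql("""
    CREATE OR REPLACE TABLE agg_orders_v2 AS
    SELECT
        customer_id         AS cust_id,       -- renamed for the application
        order_total         AS total_amount,  -- renamed for the application
        order_count,
        current_timestamp() AS refreshed_at   -- newly added field
    FROM agg_orders
""")

# Drop the old table, then expose a view under the original name and schema
# so downstream queries continue to resolve unchanged.
spark.sql("DROP TABLE agg_orders")
spark.sql("""
    CREATE VIEW agg_orders AS
    SELECT
        cust_id      AS customer_id,
        total_amount AS order_total,
        order_count
    FROM agg_orders_v2
""")
```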



A Delta Lake table representing metadata about content posts from users has the following schema:

user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, post_time TIMESTAMP, date DATE
This table is partitioned by the date column. A query is run with the following filter: longitude < 20 & longitude > -20

Which statement describes how data will be filtered?

  1. Statistics in the Delta Log will be used to identify partitions that might include files in the filtered range.
  2. No file skipping will occur because the optimizer does not know the relationship between the partition column and the longitude.
  3. The Delta Engine will use row-level statistics in the transaction log to identify the files that meet the filter criteria.
  4. Statistics in the Delta Log will be used to identify data files that might include records in the filtered range.
  5. The Delta Engine will scan the parquet file footers to identify each row that meets the filter criteria.

Answer(s): D
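
To make the mechanism concrete, the sketch below applies the filter to a hypothetical posts table; Delta Lake records per-file min/max column statistics in the transaction log (by default for the first 32 columns), and the planner uses them to skip data files whose longitude range cannot match, independent of the date partitioning.

```python
# Hypothetical illustration of file skipping on a non-partition column.
# Files whose [min, max] longitude range in the Delta log falls entirely
# outside (-20, 20) are pruned without being read.
from pyspark.sql import functions as F

posts = spark.read.table("posts")  # table name is a placeholder

filtered = posts.where((F.col("longitude") < 20) & (F.col("longitude") > -20))

# The physical plan shows the pushed-down filters that enable data skipping.
filtered.explain()
```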





