Free DP-600 Exam Braindumps (page: 14)


DRAG DROP
You have a Fabric tenant that contains a lakehouse named Lakehouse1.
Readings from 100 IoT devices are appended to a Delta table in Lakehouse1. Each set of readings is approximately 25 KB. Approximately 10 GB of data is received daily.
All the table and SparkSession settings are set to the default.
You discover that queries are slow to execute. In addition, the lakehouse storage contains data and log files that are no longer used.
You need to remove the files that are no longer used and combine small files into larger files with a target size of 1 GB per file.
What should you do? To answer, drag the appropriate actions to the correct requirements. Each action may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.
NOTE: Each correct selection is worth one point.

  A. See Explanation section for answer.

Answer(s): A

Explanation:

Remove the files that are no longer used: run the VACUUM command.
Combine small files into larger files with a target size of 1 GB per file: run the OPTIMIZE command.
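
A minimal sketch of both commands in a Fabric notebook, assuming the readings land in a Delta table named iot_readings (the table name is illustrative; spark is the notebook's built-in SparkSession):

# Target roughly 1 GB per file when compacting. 1 GB is also the default
# OPTIMIZE bin size, so setting it explicitly just makes the target clear.
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", 1024 * 1024 * 1024)

# Combine small files into larger ones.
spark.sql("OPTIMIZE iot_readings")

# Remove data files that are no longer referenced by the table. Files newer
# than the retention period (7 days by default) are kept.
spark.sql("VACUUM iot_readings")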

You need to create a data loading pattern for a Type 1 slowly changing dimension (SCD).
Which two actions should you include in the process? Each correct answer presents part of the solution.
NOTE: Each correct answer is worth one point.

  A. Update rows when the non-key attributes have changed.
  B. Insert new rows when the natural key exists in the dimension table, and the non-key attribute values have changed.
  C. Update the effective end date of rows when the non-key attribute values have changed.
  D. Insert new records when the natural key is a new value in the table.

Answer(s): A,D
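
A minimal sketch of a Type 1 load as a Delta MERGE, assuming a dimension table named dim_customer with natural key customer_id, a staged source DataFrame named updates, and the notebook's spark session (all names are illustrative):

from delta.tables import DeltaTable

dim = DeltaTable.forName(spark, "dim_customer")

(dim.alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    # A: update rows in place when the non-key attributes have changed.
    .whenMatchedUpdate(
        condition="t.name <> s.name OR t.country <> s.country",
        set={"name": "s.name", "country": "s.country"})
    # D: insert rows whose natural key is new to the dimension.
    .whenNotMatchedInsertAll()
    .execute())

Unlike a Type 2 load, no effective-date bookkeeping or history rows are kept; changed attributes are simply overwritten.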



HOTSPOT
You have a Fabric workspace named Workspace1 and an Azure Data Lake Storage Gen2 account named storage1. Workspace1 contains a lakehouse named Lakehouse1.
You need to create a shortcut to storage1 in Lakehouse1.
Which connection and endpoint should you specify? To answer, select the appropriate options in the answer area.
NOTE: Each correct selection is worth one point.

  A. See Explanation section for answer.

Answer(s): A

Explanation:

Box 1 (connection): abfss
Box 2 (endpoint): dfs
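
An ADLS Gen2 shortcut uses the Azure Blob File System driver (abfss) and must point to the account's Data Lake Storage endpoint (dfs) rather than the blob endpoint, for example abfss://<container>@storage1.dfs.core.windows.net/ (the container name here is a placeholder).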



You are analyzing customer purchases in a Fabric notebook by using PySpark.
You have the following DataFrames:
transactions: contains 10 million rows, one row per transaction, and five columns named transaction_id, customer_id, product_id, amount, and date.
customers: contains customer details in 1,000 rows and three columns named customer_id, name, and country.
You need to join the DataFrames on the customer_id column. The solution must minimize data shuffling.
You write the following code.
from pyspark.sql import functions as F
results =
Which code should you run to populate the results DataFrame?

  A. transactions.join(F.broadcast(customers), transactions.customer_id == customers.customer_id)
  B. transactions.join(customers, transactions.customer_id == customers.customer_id).distinct()
  C. transactions.join(customers, transactions.customer_id == customers.customer_id)
  D. transactions.crossJoin(customers).where(transactions.customer_id == customers.customer_id)

Answer(s): A
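
Broadcasting the 1,000-row customers DataFrame copies it whole to every executor, so the 10 million transactions rows are joined locally instead of being shuffled across the network:

from pyspark.sql import functions as F

# customers easily fits in executor memory, so a broadcast (map-side) join
# avoids repartitioning the large transactions DataFrame on customer_id.
results = transactions.join(
    F.broadcast(customers),
    transactions.customer_id == customers.customer_id)

The other options either shuffle both DataFrames (a plain join), add an unnecessary deduplication pass (distinct), or materialize a 10-billion-row cross product before filtering.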





