Free Microsoft DP-600 Exam Braindumps (page: 5)

DRAG DROP (Drag and Drop is not supported)
You have a Fabric tenant that contains a lakehouse named Lakehouse1.

Readings from 100 IoT devices are appended to a Delta table in Lakehouse1. Each set of readings is approximately 25 KB. Approximately 10 GB of data is received daily.

All the table and SparkSession settings are set to the default.

You discover that queries are slow to execute. In addition, the lakehouse storage contains data and log files that are no longer used.

You need to remove the files that are no longer used and combine small files into larger files with a target size of 1 GB per file.

What should you do? To answer, drag the appropriate actions to the correct requirements. Each action may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

Select and Place:

  1. See Explanation section for answer.

Answer(s): A

Explanation:




Box 1: Run the VACUUM command on a schedule. Remove the files.

Remove old files with the Delta Lake Vacuum Command
You can remove files marked for deletion (aka “tombstoned files”) from storage with the Delta Lake vacuum command. Delta Lake doesn't physically remove files from storage for operations that logically delete the files. You need to use the vacuum command to physically remove files from storage that have been marked for deletion and are older than the retention period.

The main benefit of vacuuming is to save on storage costs. Vacuuming does not make your queries run any faster and can limit your ability to time travel to earlier Delta table versions. You need to weigh the costs and benefits for each of your tables to develop an optimal vacuum strategy. Some tables should be vacuumed frequently; other tables should never be vacuumed.
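As an illustration, a minimal Spark SQL sketch of the cleanup step, assuming the readings land in a hypothetical table named IotReadings:

-- Illustrative only: "IotReadings" is an assumed table name.
-- Remove files marked for deletion that are older than the retention
-- threshold (default 7 days / 168 hours); run this on a schedule.
VACUUM IotReadings;

-- Optionally preview which files would be removed before deleting anything.
VACUUM IotReadings DRY RUN;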

Box 2: Run the OPTIMIZE command on a schedule. Combine the files.

Best practices for Delta Lake: compact files
If you continuously write data to a Delta table, it will over time accumulate a large number of files, especially if you add data in small batches. This can have an adverse effect on the efficiency of table reads, and it can also affect the performance of your file system. Ideally, a large number of small files should be rewritten into a smaller number of larger files on a regular basis. This is known as compaction.

You can compact a table using the OPTIMIZE command.
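A minimal sketch of the compaction step, again using the assumed table name IotReadings; OPTIMIZE targets files of roughly 1 GB by default, which matches the requirement, and the configuration key shown for tuning the target size is taken from Delta Lake documentation (verify it for your runtime):

-- If needed, set the target file size explicitly (1 GB = 1073741824 bytes);
-- this configuration key is an assumption to verify for your environment.
SET spark.databricks.delta.optimize.maxFileSize = 1073741824;

-- Combine small files into larger ones (the default target is also about 1 GB).
OPTIMIZE IotReadings;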


Reference:

https://delta.io/blog/remove-files-delta-lake-vacuum-command/
https://docs.databricks.com/en/delta/best-practices.html



HOTSPOT (Drag and Drop is not supported)
You have a Fabric workspace named Workspace1 and an Azure Data Lake Storage Gen2 account named storage1. Workspace1 contains a lakehouse named Lakehouse1.

You need to create a shortcut to storage1 in Lakehouse1.

Which protocol and endpoint should you specify? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Hot Area:

  1. See Explanation section for answer.

Answer(s): A

Explanation:




Box 1: abfss
Access Azure storage
Once you have properly configured credentials to access your Azure storage container, you can interact with resources in the storage account using URIs. Databricks recommends using the abfss driver for greater security.

spark.read.load("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")
dbutils.fs.ls("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")

CREATE TABLE <database-name>.<table-name>;

COPY INTO <database-name>.<table-name>
FROM 'abfss://container@storageAccount.dfs.core.windows.net/path/to/folder'
FILEFORMAT = CSV
COPY_OPTIONS ('mergeSchema' = 'true');

Box 2: dfs
dfs is used for the endpoint:
dbutils.fs.ls("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-to-data>")


Reference:

https://docs.databricks.com/en/connect/storage/azure-storage.html



You have an Azure Repos Git repository named Repo1 and a Fabric-enabled Microsoft Power BI Premium capacity. The capacity contains two workspaces named Workspace1 and Workspace2. Git integration is enabled at the workspace level.

You plan to use Microsoft Power BI Desktop and Workspace1 to make version-controlled changes to a semantic model stored in Repo1. The changes will be built and deployed to Workspace2 by using Azure Pipelines.

You need to ensure that report and semantic model definitions are saved as individual text files in a folder hierarchy. The solution must minimize development and maintenance effort.

In which file format should you save the changes?

  1. PBIP
  2. PBIDS
  3. PBIT
  4. PBIX

Answer(s): A

Explanation:

Power BI Desktop projects (PREVIEW)
Power BI Desktop introduces a new way to author, collaborate, and save your projects. You can now save your work as a Power BI Project (PBIP). As a project, report and semantic model item definitions are saved as individual plain text files in a simple, intuitive folder structure.
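As a hedged illustration, saving a report named Sales as a PBIP project produces a folder layout along these lines (the project name is hypothetical, and the exact file set varies by Power BI Desktop version and preview settings):

Sales.pbip
Sales.Report/
  definition.pbir
  report.json
Sales.SemanticModel/
  definition.pbism
  model.bim

Because every definition is a plain text file, the folder can be committed to Repo1 and then diffed, reviewed, and built by Azure Pipelines without additional tooling.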


Reference:

https://learn.microsoft.com/en-us/power-bi/developer/projects/projects-overview



You have a Fabric tenant that contains a lakehouse named Lakehouse1. Lakehouse1 contains a Delta table that has one million Parquet files.

You need to remove files that were NOT referenced by the table during the past 30 days. The solution must ensure that the transaction log remains consistent, and the ACID properties of the table are maintained.

What should you do?

  1. From OneLake file explorer, delete the files.
  2. Run the OPTIMIZE command and specify the Z-order parameter.
  3. Run the OPTIMIZE command and specify the V-order parameter.
  4. Run the VACUUM command.

Answer(s): D

Explanation:

VACUUM
Applies to: Databricks SQL and Databricks Runtime.
Removes unused files from a table directory.

VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold.
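A minimal Spark SQL sketch for this scenario, assuming a hypothetical table named Readings; 30 days corresponds to a 720-hour retention threshold:

-- Remove files that have not been referenced by the table for the past
-- 30 days (30 x 24 = 720 hours). The transaction log is honored, so the
-- table's ACID properties are preserved.
VACUUM Readings RETAIN 720 HOURS;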

Incorrect:
Not B: What is Z order optimization?
Z-ordering is a technique to colocate related information in the same set of files. This co-locality is automatically used by Delta Lake on Azure Databricks data-skipping algorithms. This behavior dramatically reduces the amount of data that Delta Lake on Azure Databricks needs to read.
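For comparison, a sketch of Z-ordering with illustrative table and column names; it improves data skipping on reads but does not remove unreferenced files:

-- Co-locate related data by the columns most commonly used in filter predicates.
OPTIMIZE Readings ZORDER BY (DeviceId);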

Not C: Delta Lake table optimization and V-Order
V-Order is a write time optimization to the parquet file format that enables lightning-fast reads under the Microsoft Fabric compute engines, such as Power BI, SQL, Spark, and others.

Power BI and SQL engines make use of Microsoft Verti-Scan technology and V-Ordered parquet files to achieve in-memory like data access times. Spark and other non-Verti-Scan compute engines also benefit from the V-Ordered files with an average of 10% faster read times, with some scenarios up to 50%.

V-Order works by applying special sorting, row group distribution, dictionary encoding and compression on parquet files, thus requiring less network, disk, and CPU resources in compute engines to read it, providing cost efficiency and performance. V-Order sorting has a 15% impact on average write times but provides up to 50% more compression.
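Likewise, a hedged sketch of enabling V-Order in a Fabric Spark session; the setting and OPTIMIZE clause below follow the Fabric documentation referenced at the end of this explanation, but the property name has changed across runtime versions, so treat it as an assumption to verify:

-- Enable V-Order writes for the current session (Fabric-specific setting).
SET spark.sql.parquet.vorder.enabled = TRUE;

-- Rewrite existing files with V-Order applied; this is a read optimization,
-- not a cleanup of unreferenced files.
OPTIMIZE Readings VORDER;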


Reference:

https://docs.databricks.com/en/sql/language-manual/delta-vacuum.html
https://learn.microsoft.com/en-us/fabric/data-engineering/delta-optimization-and-v-order



You have a Fabric tenant that contains a lakehouse named Lakehouse1.

You need to prevent new tables added to Lakehouse1 from being added automatically to the default semantic model of the lakehouse.

What should you configure?

  1. the SQL analytics endpoint settings
  2. the semantic model settings
  3. the workspace settings
  4. the Lakehouse1 settings

Answer(s): A

Explanation:

Default Power BI semantic models in Microsoft Fabric
Sync the default Power BI semantic model
Previously, all tables and views in the Warehouse were automatically added to the default Power BI semantic model. Based on feedback, the default behavior has been modified to not automatically add tables and views to the default Power BI semantic model. This change ensures that the background sync is not triggered. It also disables some actions such as New measure, Create report, and Analyze in Excel.

If you want to change this default behavior, you can:

1. Manually enable the Sync the default Power BI semantic model setting for each Warehouse or SQL analytics endpoint in the workspace. This restarts the background sync, which will incur some consumption costs.
2. Manually pick the tables and views to be added to the semantic model through Manage default Power BI semantic model in the ribbon or info bar.

NOTE: Understand what's in the default Power BI semantic model
When you create a Warehouse or SQL analytics endpoint, a default Power BI semantic model is created. The default semantic model is represented with the (default) suffix.


Reference:

https://learn.microsoft.com/en-us/fabric/data-warehouse/semantic-models


