Free DP-203 Exam Braindumps (page: 15)


You are performing exploratory analysis of the bus fare data in an Azure Data Lake Storage Gen2 account by using an Azure Synapse Analytics serverless SQL pool.
You execute the Transact-SQL query shown in the following exhibit.



What do the query results include?

  1. Only CSV files in the tripdata_2020 subfolder.
  2. All files that have file names beginning with "tripdata_2020".
  3. All CSV files that have file names that contain "tripdata_2020".
  4. Only CSV files that have file names beginning with "tripdata_2020".

Answer(s): D



DRAG DROP (Drag and Drop is not supported)
You use PySpark in Azure Databricks to parse the following JSON input.


You need to output the data in the following tabular format.



How should you complete the PySpark code? To answer, drag the appropriate values to the correct targets. Each value may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.
Select and Place:

  1. See Explanation section for answer.

Answer(s): A

Explanation:



Box 1: select
Box 2: explode
Box 3: alias
pyspark.sql.Column.alias returns this column aliased with a new name or names (in the case of expressions that return more than one column, such as explode).
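
A minimal PySpark sketch of the select/explode/alias pattern; the JSON document and column names below are hypothetical stand-ins, since the original exhibit is not reproduced here.

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

spark = SparkSession.builder.getOrCreate()

# Hypothetical JSON standing in for the exhibit: one record containing a nested array.
sample = ['{"deviceId": "d1", "readings": [{"ts": 1, "value": 20.5}, {"ts": 2, "value": 21.0}]}']
df = spark.read.json(spark.sparkContext.parallelize(sample))

# Box 1: select, Box 2: explode, Box 3: alias
# explode() emits one row per array element; alias() names the resulting column.
flat = df.select("deviceId", explode("readings").alias("reading"))
flat.select("deviceId", "reading.ts", "reading.value").show()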


Reference:

https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.Column.alias.html
https://docs.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/explode



HOTSPOT (Drag and Drop is not supported)
You are designing an application that will store petabytes of medical imaging data.
The data will be accessed frequently during the first week after it is created. After one month, the data will be accessed infrequently but must be accessible within 30 seconds. After one year, the data will be accessed infrequently but must be accessible within five minutes.
You need to select a storage strategy for the data. The solution must minimize costs.

Which storage tier should you use for each time frame? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.
Hot Area:

  1. See Explanation section for answer.

Answer(s): A

Explanation:



Box 1: Hot
Hot tier - An online tier optimized for storing data that is accessed or modified frequently. The Hot tier has the highest storage costs, but the lowest access costs.
Box 2: Cool
Cool tier - An online tier optimized for storing data that is infrequently accessed or modified. Data in the Cool tier should be stored for a minimum of 30 days. The Cool tier has lower storage costs and higher access costs compared to the Hot tier.
Box 3: Cool
Not the Archive tier: Archive is an offline tier optimized for storing data that is rarely accessed and that has flexible latency requirements, on the order of hours, so it cannot meet the five-minute access requirement. Data in the Archive tier must also be stored for a minimum of 180 days.
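
As an illustration only, a minimal azure-storage-blob (Python SDK) sketch of moving a blob between access tiers; the connection string, container, and blob names are hypothetical, and in practice a lifecycle management policy would normally automate these transitions.

from azure.storage.blob import BlobServiceClient

# Hypothetical connection string and names, shown only to illustrate tier changes.
service = BlobServiceClient.from_connection_string("<connection-string>")
blob = service.get_blob_client(container="imaging", blob="scan-0001.dcm")

# New data stays in the default Hot tier for the first month, then moves to Cool.
# Cool keeps the blob online, so both the 30-second and five-minute access
# requirements are met; Archive is offline and takes hours to rehydrate.
blob.set_standard_blob_tier("Cool")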


Reference:

https://docs.microsoft.com/en-us/azure/storage/blobs/access-tiers-overview
https://www.altaro.com/hyper-v/azure-archive-storage/



You have an Azure Synapse Analytics Apache Spark pool named Pool1.
You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1. The structure and data types vary by file.
You need to load the files into the tables. The solution must maintain the source data types. What should you do?

  1. Use a Conditional Split transformation in an Azure Synapse data flow.
  2. Use a Get Metadata activity in Azure Data Factory.
  3. Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics serverless SQL pool.
  4. Load the data by using PySpark.

Answer(s): D
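
A minimal PySpark sketch of this approach; the ADLS Gen2 path and table name are hypothetical. spark.read.json infers each file's schema, so the source data types (strings, numbers, nested structs and arrays) are preserved when the DataFrame is saved as a table in Pool1.

# 'spark' is the SparkSession that a Synapse notebook pre-creates.
# Hypothetical ADLS Gen2 path and table name.
df = spark.read.option("multiLine", "true").json(
    "abfss://files@myaccount.dfs.core.windows.net/json/*.json"
)

df.printSchema()  # inferred schema, including the source data types

# Persist as a table in Pool1; the inferred types carry over.
df.write.mode("overwrite").saveAsTable("json_data")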





