Free Microsoft DP-300 Exam Braindumps (page: 5)

You are designing a streaming data solution that will ingest variable volumes of data. You need to ensure that you can change the partition count after creation.
Which service should you use to ingest the data?

  A. Azure Event Hubs Standard
  B. Azure Stream Analytics
  C. Azure Data Factory
  D. Azure Event Hubs Dedicated

Answer(s): D

Explanation:

The partition count for an event hub in a dedicated Event Hubs cluster can be increased after the event hub has been created.

Incorrect Answers:
A: For Azure Event Hubs Standard, the partition count can't be changed after creation, so consider long-term scale when choosing it.
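
For reference, the increase can be scripted. Below is a minimal Python sketch using the azure-mgmt-eventhub SDK; the subscription, resource group, namespace, and event hub names are placeholders, and the exact SDK surface should be verified against the installed version.

# Minimal sketch (assumes azure-identity and azure-mgmt-eventhub are installed;
# all resource names are placeholders, not values from the question).
from azure.identity import DefaultAzureCredential
from azure.mgmt.eventhub import EventHubManagementClient
from azure.mgmt.eventhub.models import Eventhub

client = EventHubManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Re-issue create_or_update with a higher partition_count. A namespace in a
# dedicated cluster accepts the increase; a Standard namespace rejects it.
client.event_hubs.create_or_update(
    resource_group_name="rg-streaming",
    namespace_name="ns-in-dedicated-cluster",
    event_hub_name="telemetry",
    parameters=Eventhub(partition_count=16),
)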


Reference:

https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-features#partitions



HOTSPOT (Drag and Drop is not supported)
You are building a database in an Azure Synapse Analytics serverless SQL pool. You have data stored in Parquet files in an Azure Data Lake Storage Gen2 container.

Records are structured as shown in the following sample.

The records contain two applicants at most.
You need to build a table that includes only the address fields.

How should you complete the Transact-SQL statement? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.
Hot Area:

  A. See Explanation section for answer.

Answer(s): A

Explanation:



Box 1: CREATE EXTERNAL TABLE
An external table points to data located in Hadoop, Azure Storage blob, or Azure Data Lake Storage. External tables are used to read data from files or write data to files in Azure Storage. With Synapse SQL, you can use external tables to read external data using dedicated SQL pool or serverless SQL pool.

Syntax:
CREATE EXTERNAL TABLE { database_name.schema_name.table_name | schema_name.table_name | table_name }
( <column_definition> [ ,...n ] )
WITH (
    LOCATION = 'folder_or_filepath',
    DATA_SOURCE = external_data_source_name,
    FILE_FORMAT = external_file_format_name
)
[;]

Box 2: OPENROWSET
When using serverless SQL pool, CETAS is used to create an external table and export query results to Azure Storage Blob or Azure Data Lake Storage Gen2.

Example:
CREATE EXTERNAL TABLE population_by_year_state
WITH (
    LOCATION = 'aggregated_data/',
    DATA_SOURCE = population_ds,
    FILE_FORMAT = census_file_format
)
AS
SELECT decennialTime, stateName, SUM(population) AS population
FROM OPENROWSET(
    BULK 'https://azureopendatastorage.blob.core.windows.net/censusdatacontainer/release/us_population_county/year=*/*.parquet',
    FORMAT = 'PARQUET'
) AS [r]
GROUP BY decennialTime, stateName
GO
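
To run a CETAS statement like this programmatically, a minimal Python sketch with pyodbc is shown below; the serverless endpoint, external data source, file format, and address column names are all assumptions for illustration, since the sample records are not reproduced above.

# Minimal sketch (assumes pyodbc and ODBC Driver 18 for SQL Server are installed;
# the data source, file format, and column names are illustrative only).
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>-ondemand.sql.azuresynapse.net;"
    "DATABASE=demo;Authentication=ActiveDirectoryInteractive;",
    autocommit=True,  # CETAS is DDL; run it outside a transaction
)
conn.execute("""
CREATE EXTERNAL TABLE dbo.ApplicantAddresses
WITH (
    LOCATION = 'curated/addresses/',
    DATA_SOURCE = MyDataSource,       -- pre-created external data source
    FILE_FORMAT = ParquetFormat       -- pre-created Parquet file format
)
AS
SELECT r.AddressLine1, r.City, r.PostalCode    -- assumed address columns
FROM OPENROWSET(
    BULK 'applications/*.parquet',
    DATA_SOURCE = 'MyDataSource',
    FORMAT = 'PARQUET'
) AS r;
""")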


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/develop-tables-external-tables



You have an Azure Synapse Analytics Apache Spark pool named Pool1.

You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1. The structure and data types vary by file.

You need to load the files into the tables. The solution must maintain the source data types. What should you do?

  A. Load the data by using PySpark.
  B. Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics serverless SQL pool.
  C. Use a Get Metadata activity in Azure Data Factory.
  D. Use a Conditional Split transformation in an Azure Synapse data flow.

Answer(s): A

Explanation:

Synapse notebooks support four Apache Spark languages:
PySpark (Python)
Spark (Scala)
Spark SQL
.NET Spark (C#)
Note: Bring data to a notebook.
You can load data from Azure Blob Storage, Azure Data Lake Store Gen 2, and SQL pool as shown in the code samples below.
Read a CSV from Azure Data Lake Store Gen2 as a Spark DataFrame:

from pyspark.sql import SparkSession
from pyspark.sql.types import *

account_name = "Your account name"
container_name = "Your container name"
relative_path = "Your path"
adls_path = 'abfss://%s@%s.dfs.core.windows.net/%s' % (container_name, account_name, relative_path)

df1 = spark.read.option('header', 'true') \
    .option('delimiter', ',') \
    .csv(adls_path + '/Testfile.csv')
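
Because this question concerns JSON files whose structure and data types vary, a closer sketch is reading JSON with schema inference, which preserves the source types per file. The path and table name below are placeholders; spark is the session a Synapse notebook provides.

# Minimal sketch for the JSON scenario (path and table name are placeholders).
adls_path = "abfss://<container>@<account>.dfs.core.windows.net/raw/events/"

df = (spark.read
      .option("multiLine", "true")   # tolerate multi-line JSON documents
      .json(adls_path))              # schema and data types inferred from the source

df.printSchema()                     # verify the inferred source data types
df.write.mode("overwrite").saveAsTable("events_raw")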


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-development-using-notebooks



You are designing a date dimension table in an Azure Synapse Analytics dedicated SQL pool. The date dimension table will be used by all the fact tables.

Which distribution type should you recommend to minimize data movement?

  A. HASH
  B. REPLICATE
  C. ROUND_ROBIN

Answer(s): B

Explanation:

A replicated table has a full copy of the table available on every Compute node. Queries run fast on replicated tables since joins on replicated tables don't require data movement. Replication requires extra storage, though, and isn't practical for large tables.

Incorrect Answers:
C: A round-robin distributed table distributes table rows evenly across all distributions. The assignment of rows to distributions is random. Unlike hash-distributed tables, rows with equal values are not guaranteed to be assigned to the same distribution.

As a result, the system sometimes needs to invoke a data movement operation to better organize your data before it can resolve a query.
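
As a concrete illustration, a replicated date dimension is declared with DISTRIBUTION = REPLICATE. A minimal Python sketch using pyodbc follows; the server, database, and column list are placeholders, not the exam's.

# Minimal sketch (assumes pyodbc and ODBC Driver 18 for SQL Server).
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=dw;"
    "Authentication=ActiveDirectoryInteractive;",
    autocommit=True,
)
conn.execute("""
CREATE TABLE dbo.DimDate (
    DateKey   INT         NOT NULL,
    FullDate  DATE        NOT NULL,
    MonthName VARCHAR(20) NOT NULL
)
WITH (
    DISTRIBUTION = REPLICATE,         -- full copy on every Compute node
    CLUSTERED COLUMNSTORE INDEX
);
""")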


Reference:

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-distribute



HOTSPOT (Drag and Drop is not supported)
From a website analytics system, you receive data extracts about user interactions such as downloads, link clicks, form submissions, and video plays.

The data contains the following columns:


You need to design a star schema to support analytical queries of the data. The star schema will contain four tables including a date dimension.

To which table should you add each column? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.
Hot Area:

  A. See Explanation section for answer.

Answer(s): A

Explanation:



Box 1: FactEvents
Fact tables store observations or events, such as sales orders, stock balances, exchange rates, and temperatures.

Box 2: DimChannel
Dimension tables describe business entities – the things you model. Entities can include products, people, places, and concepts including time itself. The most consistent table you'll find in a star schema is a date dimension table. A dimension table contains a key column (or columns) that acts as a unique identifier, and descriptive columns.
Box 3: DimEvent
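
Since the original column list is not reproduced above, the sketch below only illustrates the shape of the four-table star schema the answer implies; every column name is an assumption.

# Minimal sketch of the implied star schema (all column names are assumptions;
# assumes pyodbc and ODBC Driver 18 for SQL Server).
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=<workspace>.sql.azuresynapse.net;DATABASE=dw;"
    "Authentication=ActiveDirectoryInteractive;"
)
conn = pyodbc.connect(CONN_STR, autocommit=True)
for ddl in (
    "CREATE TABLE dbo.DimDate    (DateKey INT NOT NULL, FullDate DATE NOT NULL)",
    "CREATE TABLE dbo.DimChannel (ChannelKey INT NOT NULL, ChannelName VARCHAR(50) NOT NULL)",
    "CREATE TABLE dbo.DimEvent   (EventKey INT NOT NULL, EventName VARCHAR(50) NOT NULL)",
    # The fact table stores one row per interaction: measures plus a key per dimension.
    "CREATE TABLE dbo.FactEvents (DateKey INT NOT NULL, ChannelKey INT NOT NULL, "
    "EventKey INT NOT NULL, EventCount INT NOT NULL)",
):
    conn.execute(ddl)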


Reference:

https://docs.microsoft.com/en-us/power-bi/guidance/star-schema





