Free DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK-3.0 Exam Braindumps (page: 4)

Page 4 of 46

Which of the following describes characteristics of the Spark UI?

  A. Via the Spark UI, workloads can be manually distributed across executors.
  B. Via the Spark UI, stage execution speed can be modified.
  C. The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.
  D. There is a place in the Spark UI that shows the property spark.executor.memory.
  E. Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.

Answer(s): D

Explanation:

There is a place in the Spark UI that shows the property spark.executor.memory.
Correct. Spark properties such as spark.executor.memory are shown in the Environment tab.
Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.
Wrong – Jobs, Stages, Storage, Executors, and SQL are all tabs in the Spark UI, but DAGs are not a separate tab. DAGs can be inspected in the job details of the Jobs tab, or in the Stages or SQL tab.
Via the Spark UI, workloads can be manually distributed across executors.
No, the Spark UI is meant for inspecting the inner workings of Spark, which ultimately helps to understand, debug, and optimize Spark applications.
Via the Spark UI, stage execution speed can be modified.
No, see above.
The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.
No, there is no Scheduler tab.



Which of the following statements about broadcast variables is correct?

  A. Broadcast variables are serialized with every single task.
  B. Broadcast variables are commonly used for tables that do not fit into memory.
  C. Broadcast variables are immutable.
  D. Broadcast variables are occasionally dynamically updated on a per-task basis.
  E. Broadcast variables are local to the worker node and not shared across the cluster.

Answer(s): C

Explanation:

Broadcast variables are local to the worker node and not shared across the cluster.
This is wrong because broadcast variables are meant to be shared across the cluster. As such, they are never just local to the worker node, but available to all worker nodes.
Broadcast variables are commonly used for tables that do not fit into memory.
This is wrong because broadcast variables must be small enough to fit into the memory of every machine in the cluster; a table that does not fit into memory is not a candidate for broadcasting.
Broadcast variables are serialized with every single task.
This is wrong because broadcast variables are cached on every machine in the cluster, which is precisely what avoids having to serialize them with every single task.
Broadcast variables are occasionally dynamically updated on a per-task basis.
This is wrong because broadcast variables are immutable; they are never updated.
More info: Spark: The Definitive Guide, Chapter 14
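To make the immutability and sharing points concrete, here is a plain-Python simulation of the broadcast pattern, not Spark code: a single read-only copy of a small lookup table is reused by every simulated task. The table contents, the partitions, and the run_task helper are hypothetical. In PySpark itself, the real API is SparkContext.broadcast(), and tasks read the data through the returned object's .value attribute.

```python
from types import MappingProxyType

# Plain-Python simulation of the broadcast pattern (not Spark's implementation):
# one read-only copy of a small lookup table is created once and reused
# by every task, instead of being shipped with each task separately.
country_names = {"US": "United States", "DE": "Germany", "IN": "India"}

# MappingProxyType gives a read-only view, mirroring the immutability of
# broadcast variables: tasks can read it but cannot update it.
broadcast_value = MappingProxyType(dict(country_names))

# Each inner list stands in for one partition of a larger dataset (hypothetical data).
partitions = [["US", "DE"], ["IN", "US"], ["DE"]]


def run_task(partition, lookup):
    """Hypothetical task: enrich each record using the shared, read-only lookup."""
    return [(code, lookup[code]) for code in partition]


results = [run_task(p, broadcast_value) for p in partitions]

# A task trying to "update" the broadcast data is rejected.
try:
    broadcast_value["FR"] = "France"
    update_allowed = True
except TypeError:
    update_allowed = False

print(results)
print("update allowed:", update_allowed)
```

Every task reads from the same object, and the attempted write raises a TypeError, so the shared data stays unchanged for all tasks.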



Which of the following is a viable way to improve Spark's performance when dealing with large amounts of data, given that there is only a single application running on the cluster?

  A. Increase values for the properties spark.default.parallelism and spark.sql.shuffle.partitions
  B. Decrease values for the properties spark.default.parallelism and spark.sql.partitions
  C. Increase values for the properties spark.sql.parallelism and spark.sql.partitions
  D. Increase values for the properties spark.sql.parallelism and spark.sql.shuffle.partitions
  E. Increase values for the properties spark.dynamicAllocation.maxExecutors, spark.default.parallelism, and spark.sql.shuffle.partitions

Answer(s): A

Explanation:

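Increase values for the properties spark.default.parallelism and spark.sql.shuffle.partitions
Correct. With large amounts of data, higher values for these properties split the data into more, smaller partitions. spark.sql.shuffle.partitions (default: 200) sets the number of partitions used when DataFrame operations such as joins and aggregations shuffle data, and spark.default.parallelism sets the default number of partitions for RDD operations. More partitions allow more tasks to run in parallel and reduce the amount of data each task has to hold in memory.
Decrease values for the properties spark.default.parallelism and spark.sql.partitions
No, these values need to be increased. In addition, there is no property spark.sql.partitions.
Increase values for the properties spark.sql.parallelism and spark.sql.partitions
Wrong, there is no property spark.sql.parallelism.
Increase values for the properties spark.sql.parallelism and spark.sql.shuffle.partitions
See above.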
Increase values for the properties spark.dynamicAllocation.maxExecutors, spark.default.parallelism, and spark.sql.shuffle.partitions
The property spark.dynamicAllocation.maxExecutors only takes effect if dynamic allocation is enabled via the spark.dynamicAllocation.enabled property, which is disabled by default. Dynamic allocation can be useful when running multiple applications on the same cluster in parallel. However, in this case only a single application is running on the cluster, so enabling dynamic allocation would not yield a performance benefit.
More info: Practical Spark Tips For Data Scientists | Experfy.com and Basics of Apache Spark Configuration Settings | by Halil Ertan | Towards Data Science (https://bit.ly/3gA0A6w, https://bit.ly/2QxhNTr)
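A rough plain-Python illustration of why more partitions help with large data: assuming modulo on an integer key as a stand-in for Spark's hash partitioner, raising the partition count from 200 (the default of spark.sql.shuffle.partitions) to 800 shrinks the amount of data each partition, and therefore each task, has to process. The dataset and the partition_sizes helper are hypothetical, and 800 is an arbitrary value chosen for illustration.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition.

    Assumption: modulo on an integer key stands in for Spark's hash
    partitioner; the real partitioner hashes the key first.
    """
    counts = Counter(key % num_partitions for key in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


keys = range(100_000)  # a toy "large" dataset of integer keys

# 200 mirrors the default of spark.sql.shuffle.partitions; 800 is a larger example value.
for n in (200, 800):
    sizes = partition_sizes(keys, n)
    print(f"{n} partitions -> largest partition holds {max(sizes)} records")
```

This prints 500 records per partition for 200 partitions and 125 for 800. The benefit is not unlimited: every partition becomes a task with its own scheduling overhead, so very small partitions can slow a job down again.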



Which of the following describes a shuffle?

  A. A shuffle is a process that is executed during a broadcast hash join.
  B. A shuffle is a process that compares data across executors.
  C. A shuffle is a process that compares data across partitions.
  D. A shuffle is a Spark operation that results from DataFrame.coalesce().
  E. A shuffle is a process that allocates partitions to executors.

Answer(s): C

Explanation:

A shuffle is a Spark operation that results from DataFrame.coalesce(). No. DataFrame.coalesce() does not result in a shuffle.
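A shuffle is a process that allocates partitions to executors.
This is incorrect. Each partition is processed as a task that the scheduler assigns to an executor; a shuffle instead redistributes data between partitions.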
A shuffle is a process that is executed during a broadcast hash join.
No, broadcast hash joins avoid shuffles and yield performance benefits if at least one of the two tables is small in size (<= 10 MB by default). Broadcast hash joins can avoid shuffles because instead of exchanging partitions between executors, they broadcast a small table to all executors that then perform the rest of the join operation locally.
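The broadcast hash join described above can be sketched in plain Python (a simulation, not Spark's implementation): the small table is turned into a hash table that every partition of the large table receives, and each partition then joins locally, so no rows of the large table cross partition boundaries. The tables and the join_partition helper are hypothetical. In PySpark, a broadcast join can be requested explicitly with pyspark.sql.functions.broadcast(), and the automatic 10 MB cutoff is controlled by spark.sql.autoBroadcastJoinThreshold.

```python
# Plain-Python sketch of a broadcast hash join (not Spark's implementation).
# The large table stays partitioned where it is; only the small table is
# "broadcast" (copied) to every partition, so no rows of the large table move.

small_table = [(1, "books"), (2, "games")]  # (category_id, name), hypothetical data
large_partitions = [                        # (order_id, category_id), hypothetical data
    [(100, 1), (101, 2)],
    [(102, 2), (103, 3)],
]

# Build the hash table once from the small side; this is what gets broadcast.
broadcast_hash = dict(small_table)


def join_partition(partition, lookup):
    """Inner-join one partition locally against the broadcast hash table."""
    return [
        (order_id, cat_id, lookup[cat_id])
        for order_id, cat_id in partition
        if cat_id in lookup  # rows without a match are dropped (inner join)
    ]


# Each partition is joined independently; no data is exchanged between partitions.
joined = [join_partition(p, broadcast_hash) for p in large_partitions]
print(joined)
```

Order 103 has no matching category and is dropped, as in an inner join; all other rows are joined without moving between partitions.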
A shuffle is a process that compares data across executors.
No, in a shuffle, data is compared across partitions, and not executors.
More info: Spark Repartition & Coalesce - Explained (https://bit.ly/32KF7zS)
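To illustrate what "comparing data across partitions" means, here is a plain-Python sketch of the shuffle behind a grouped sum (a simulation, not Spark's implementation): records with the same key start out in different input partitions and are moved so that each key ends up in exactly one output partition, where it can be aggregated locally. The data, the two-partition setup, and the target_partition helper are hypothetical; character codes replace Spark's hash function only so the output is deterministic.

```python
from collections import defaultdict

# Records with the same key are spread over several input partitions (hypothetical data).
input_partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("c", 6)],
]
NUM_OUTPUT_PARTITIONS = 2  # plays the role of spark.sql.shuffle.partitions


def target_partition(key):
    # Deterministic stand-in for Spark's hash partitioner, valid for
    # single-character keys (Python's hash() of str is randomized per process).
    return ord(key) % NUM_OUTPUT_PARTITIONS


# Shuffle: every input partition sends each record to the output
# partition that owns its key, grouping values by key on arrival.
output_partitions = [defaultdict(list) for _ in range(NUM_OUTPUT_PARTITIONS)]
for partition in input_partitions:
    for key, value in partition:
        output_partitions[target_partition(key)][key].append(value)

# After the shuffle, each output partition aggregates its keys locally.
sums = [{key: sum(values) for key, values in p.items()} for p in output_partitions]
print(sums)
```

Here key "b" lands in output partition 0 and keys "a" and "c" in partition 1, so the result is [{"b": 7}, {"a": 4, "c": 10}].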






Post your Comments and Discuss Databricks DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK-3.0 exam with other Community members:

Ravi commented on August 15, 2024
Good documentation
Anonymous

Bano commented on January 19, 2024
What % of questions do we get in the real exam?
UNITED STATES

bharathi commented on January 21, 2024
good explanation
UNITED STATES