Databricks Certified Associate Developer for Apache Spark 3.0 Exam
Certified Associate Developer for Apache Spark (Page 4)

Updated On: 26-Jan-2026

Which of the following describes a difference between Spark's cluster and client execution modes?

  1. In cluster mode, the cluster manager resides on a worker node, while it resides on an edge node in client mode.
  2. In cluster mode, executor processes run on worker nodes, while they run on gateway nodes in client mode.
  3. In cluster mode, the driver resides on a worker node, while it resides on an edge node in client mode.
  4. In cluster mode, a gateway machine hosts the driver, while it is co-located with the executor in client mode.
  5. In cluster mode, the Spark driver is not co-located with the cluster manager, while it is co-located in client mode.

Answer(s): 3

Explanation:

In cluster mode, the driver resides on a worker node, while it resides on an edge node in client mode.
Correct. The idea of Spark's client mode is that workloads can be executed from an edge node, also known as a gateway machine, outside the cluster. The more common way to run Spark in production, however, is cluster mode, where the driver resides on a worker node inside the cluster.
In practice, client mode suffers from tight constraints on data transfer speed between the edge node and the cluster, relative to the transfer speed between worker nodes inside the cluster. Also, any job that is executed in client mode will fail if the edge node fails. For these reasons, client mode is usually not used in a production environment.
In cluster mode, the cluster manager resides on a worker node, while it resides on an edge node in client execution mode.
No. In both execution modes, the cluster manager may reside on a worker node, but it does not reside on an edge node in client mode.
In cluster mode, executor processes run on worker nodes, while they run on gateway nodes in client mode.
This is incorrect. Only the driver runs on gateway nodes (also known as "edge nodes") in client mode, but not the executor processes.
In cluster mode, the Spark driver is not co-located with the cluster manager, while it is co-located in client mode.
No, in client mode, the Spark driver is not co-located with the cluster manager. The whole point of client mode is that the driver is outside the cluster and not associated with the resource that manages the cluster (the machine that runs the cluster manager).
In cluster mode, a gateway machine hosts the driver, while it is co-located with the executor in client mode.
No, it is exactly the opposite: There are no gateway machines in cluster mode, but in client mode, they host the driver.
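Where the driver ends up is decided when the application is submitted. As a minimal sketch (not the author's material), the helper below builds the `spark-submit` command line for each mode; the application file `my_app.py` and the `yarn` master are placeholders:

```python
def submit_command(deploy_mode):
    """Build a spark-submit invocation for the given deploy mode.

    In "cluster" mode the driver is launched on a worker node inside the
    cluster; in "client" mode it runs on the machine (edge node) that
    invokes spark-submit. "yarn" and "my_app.py" are placeholder values.
    """
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,  # "cluster" or "client"
        "my_app.py",
    ]

print(submit_command("cluster"))
# → ['spark-submit', '--master', 'yarn', '--deploy-mode', 'cluster', 'my_app.py']
```

Submitting with `--deploy-mode client` instead keeps the driver on the submitting machine, which is exactly the failure point discussed above.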



Which of the following describes properties of a shuffle?

  1. Operations involving shuffles are never evaluated lazily.
  2. Shuffles involve only single partitions.
  3. Shuffles belong to a class known as "full transformations".
  4. A shuffle is one of many actions in Spark.
  5. In a shuffle, Spark writes data to disk.

Answer(s): 5

Explanation:

In a shuffle, Spark writes data to disk.
Correct! Spark's architecture dictates that intermediate results during a shuffle are written to disk.
A shuffle is one of many actions in Spark.
Incorrect. A shuffle is a transformation, not an action.
Shuffles involve only single partitions.
No, shuffles involve multiple partitions. During a shuffle, Spark generates output partitions from multiple input partitions.
Operations involving shuffles are never evaluated lazily.
Wrong. A shuffle is a costly operation, but Spark evaluates it just as lazily as other transformations. That is, until a subsequent action triggers its evaluation.
Shuffles belong to a class known as "full transformations".

Not quite. Shuffles belong to a class known as "wide transformations". "Full transformation" is not a relevant term in Spark.
More info: Spark – The Definitive Guide, Chapter 2 and Spark: disk I/O on stage boundaries explanation - Stack Overflow
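The "multiple partitions" point can be made concrete with a pure-Python sketch (an analogy, not Spark's API): records are redistributed across output partitions by key hash, so each output partition can receive data from every input partition, which is why the data must be moved and intermediate results written to disk.

```python
from collections import defaultdict

def shuffle(input_partitions, num_output_partitions):
    # Assign each (key, value) record to an output partition by key
    # hash, as a wide transformation would.
    output = defaultdict(list)
    for partition in input_partitions:
        for key, value in partition:
            output[hash(key) % num_output_partitions].append((key, value))
    return [output[i] for i in range(num_output_partitions)]

# Two input partitions; integer keys keep Python's hash deterministic.
parts = [[(0, "a"), (1, "b")], [(0, "c"), (2, "d")]]
print(shuffle(parts, 2))
# → [[(0, 'a'), (0, 'c'), (2, 'd')], [(1, 'b')]]
```

Note how output partition 0 contains records from both input partitions: a single output partition generally depends on many input partitions.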



Which of the following statements about the differences between actions and transformations is correct?

  1. Actions are evaluated lazily, while transformations are not evaluated lazily.
  2. Actions generate RDDs, while transformations do not.
  3. Actions do not send results to the driver, while transformations do.
  4. Actions can be queued for delayed execution, while transformations can only be processed immediately.
  5. Actions can trigger Adaptive Query Execution, while transformations cannot.

Answer(s): 5

Explanation:

Actions can trigger Adaptive Query Execution, while transformations cannot.
Correct. Adaptive Query Execution optimizes queries at runtime. Since transformations are evaluated lazily, Spark does not have any runtime information to optimize the query until an action is called. If Adaptive Query Execution is enabled, Spark will then try to optimize the query based on the feedback it gathers while it is evaluating the query.
Actions can be queued for delayed execution, while transformations can only be processed immediately.
No, there is no such concept as "delayed execution" in Spark. Actions cannot be evaluated lazily, meaning that they are executed immediately.
Actions are evaluated lazily, while transformations are not evaluated lazily.
Incorrect, it is the other way around: Transformations are evaluated lazily and actions trigger their evaluation.
Actions generate RDDs, while transformations do not.
No. Transformations change the data and, since RDDs are immutable, generate new RDDs along the way. Actions produce outputs based on the RDDs, such as native Python data types (integers, lists, ...) or files written to storage, but they do not generate RDDs.
Here is a great tip on how to differentiate actions from transformations: If an operation returns a DataFrame, Dataset, or an RDD, it is a transformation. Otherwise, it is an action.
Actions do not send results to the driver, while transformations do.
No. Actions send results to the driver. Think about running DataFrame.count(). The result of this command will return a number to the driver. Transformations, however, do not send results back to the driver. They produce RDDs that remain on the worker nodes.
More info: What is the difference between a transformation and an action in Apache Spark? | Bartosz Mikulski, How to Speed up SQL Queries with Adaptive Query Execution
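Lazy evaluation is the crux of this question, and it can be illustrated without Spark at all. In this pure-Python analogy (a sketch, not Spark's API), a generator expression plays the role of a transformation and `sum` plays the role of an action:

```python
evaluated = []

def trace(x):
    # Record when an element is actually computed.
    evaluated.append(x)
    return x * 2

data = [1, 2, 3]
doubled = (trace(x) for x in data)  # "transformation": builds a plan, nothing runs yet
assert evaluated == []              # still lazy, no element has been touched

result = sum(doubled)               # "action": forces evaluation of the plan
assert evaluated == [1, 2, 3]       # only now were the elements computed
assert result == 12
```

The same pattern explains the tip above: the "transformation" returns another lazy collection, while the "action" returns a concrete value to the caller (in Spark, to the driver).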



Which of the following is a characteristic of the cluster manager?

  1. Each cluster manager works on a single partition of data.
  2. The cluster manager receives input from the driver through the SparkContext.
  3. The cluster manager does not exist in standalone mode.
  4. The cluster manager transforms jobs into DAGs.
  5. In client mode, the cluster manager runs on the edge node.

Answer(s): 2

Explanation:

The cluster manager receives input from the driver through the SparkContext.
Correct. In order for the driver to contact the cluster manager, the driver launches a SparkContext. The driver then asks the cluster manager for resources to launch executors.
In client mode, the cluster manager runs on the edge node.
No. In client mode, the cluster manager is independent of the edge node and runs in the cluster.
The cluster manager does not exist in standalone mode.
Wrong, the cluster manager exists even in standalone mode. Remember, standalone mode is an easy means to deploy Spark across a whole cluster, with some limitations. For example, in standalone mode, no other frameworks can run in parallel with Spark. The cluster manager is nevertheless part of Spark in standalone deployments and helps launch and maintain resources across the cluster.
The cluster manager transforms jobs into DAGs.
No, transforming jobs into DAGs is the task of the Spark driver.
Each cluster manager works on a single partition of data.
No. Cluster managers do not work on partitions directly. Their job is to coordinate cluster resources so that they can be requested by and allocated to Spark drivers.
More info: Introduction to Core Spark Concepts • BigData
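In PySpark, building a SparkSession (which wraps the SparkContext) is what initiates the driver's conversation with the cluster manager. The following is a minimal configuration sketch, assuming a Spark installation; the standalone master URL is a placeholder:

```python
from pyspark.sql import SparkSession

# Creating the session starts a SparkContext; through it, the driver
# contacts the cluster manager (here a standalone master, placeholder
# URL) and requests resources on which to launch executors.
spark = (
    SparkSession.builder
    .appName("example")
    .master("spark://master-host:7077")  # placeholder cluster manager URL
    .getOrCreate()
)
```

With `.master("local[*]")` instead, no external cluster manager is contacted at all, which is the local execution mode discussed in the next question.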



Which of the following are valid execution modes?

  1. Kubernetes, Local, Client
  2. Client, Cluster, Local
  3. Server, Standalone, Client
  4. Cluster, Server, Local
  5. Standalone, Client, Cluster

Answer(s): 2

Explanation:

This is a tricky question to get right, since it is easy to confuse execution modes and deployment modes. Even in literature, both terms are sometimes used interchangeably.
There are only 3 valid execution modes in Spark: Client, cluster, and local execution modes. Execution modes do not refer to specific frameworks, but to where infrastructure is located with respect to each other.
In client mode, the driver sits on a machine outside the cluster. In cluster mode, the driver sits on a machine inside the cluster. Finally, in local mode, all Spark infrastructure is started in a single JVM (Java Virtual Machine) in a single computer which then also includes the driver.
Deployment modes often refer to ways that Spark can be deployed in cluster mode and how it uses specific frameworks outside Spark. Valid deployment modes are standalone, Apache YARN, Apache Mesos, and Kubernetes.
Client, Cluster, Local
Correct, all of these are valid execution modes in Spark.
Standalone, Client, Cluster
No, standalone is not a valid execution mode. It is a valid deployment mode, though.
Kubernetes, Local, Client
No, Kubernetes is a deployment mode, but not an execution mode.
Cluster, Server, Local
No, Server is not an execution mode.
Server, Standalone, Client
No, standalone and server are not execution modes.
More info: Apache Spark Internals - Learning Journal





