Databricks Certified Associate Developer for Apache Spark 3.0 Exam
Certified Associate Developer for Apache Spark (Page 2)

Updated On: 26-Jan-2026

Which of the following options describes the responsibility of the executors in Spark?

  1. The executors accept jobs from the driver, analyze those jobs, and return results to the driver.
  2. The executors accept tasks from the driver, execute those tasks, and return results to the cluster manager.
  3. The executors accept tasks from the driver, execute those tasks, and return results to the driver.
  4. The executors accept tasks from the cluster manager, execute those tasks, and return results to the driver.
  5. The executors accept jobs from the driver, plan those jobs, and return results to the cluster manager.

Answer(s): C

Explanation:

Executors receive tasks (not whole jobs) from the driver, execute those tasks, and return the results to the driver; splitting a job into tasks and combining the results is the driver's responsibility, not the cluster manager's.
More info: Running Spark: an overview of Spark’s runtime architecture - Manning (https://bit.ly/2RPmJn9)
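As a quick illustration of this split of responsibilities, here is a minimal sketch (assuming a local PySpark installation; the app name is made up). Calling an action makes the driver turn the job into tasks, ship them to the executors, and collect the results the executors send back:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("executor-demo").getOrCreate()

df = spark.range(1_000_000)  # transformation only: the driver just records the plan
print(df.count())            # action: the driver splits the job into tasks; executors run them
                             # and return their results to the driver
spark.stop()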



In which order should the code blocks shown below be run in order to return the number of records that are not empty in column value in the DataFrame resulting from an inner join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively?
1. .filter(~isnull(col('value')))
2. .count()
3. transactionsDf.join(itemsDf, col("transactionsDf.productId")==col("itemsDf.itemId"))
4. transactionsDf.join(itemsDf, transactionsDf.productId==itemsDf.itemId, how='inner')
5. .filter(col('value').isnotnull())
6. .sum(col('value'))

  1. 4, 1, 2
  2. 3, 1, 6
  3. 3, 1, 2
  4. 3, 5, 2
  5. 4, 6

Answer(s): A

Explanation:

Correct code block:
transactionsDf.join(itemsDf, transactionsDf.productId==itemsDf.itemId, how='inner').filter(~isnull(col('value'))).count()
(This assumes the imports from pyspark.sql.functions import col, isnull.)
The expressions col("transactionsDf.productId") and col("itemsDf.itemId") are invalid: col() accepts only a column name, not a name prefixed with the DataFrame variable, so Spark cannot resolve these references.
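A minimal runnable sketch of the same pattern, using tiny made-up stand-ins for transactionsDf and itemsDf (only the column names productId, value, and itemId come from the question; all data values are invented):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, isnull

spark = SparkSession.builder.master("local[*]").getOrCreate()

transactionsDf = spark.createDataFrame(
    [(1, 10.0), (2, None), (3, 4.5)], ["productId", "value"])
itemsDf = spark.createDataFrame(
    [(1, "screwdriver"), (2, "hammer"), (3, "pliers")], ["itemId", "itemName"])

# Inner join on productId == itemId, then count rows whose value is not null
result = (transactionsDf
          .join(itemsDf, transactionsDf.productId == itemsDf.itemId, how='inner')
          .filter(~isnull(col('value')))
          .count())
print(result)  # 2, because the row with a null value is excluded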



The code block displayed below contains an error. The code block should count the number of rows that have a predError of either 3 or 6. Find the error.
Code block: transactionsDf.filter(col('predError').in([3, 6])).count()

  1. The number of rows cannot be determined with the count() operator.
  2. Instead of filter, the select method should be used.
  3. The method used on column predError is incorrect.
  4. Instead of a list, the values need to be passed as single arguments to the in operator.
  5. Numbers 3 and 6 need to be passed as string variables.

Answer(s): C

Explanation:

Correct code block: transactionsDf.filter(col('predError').isin([3, 6])).count()
The isin method is the correct one to use here – the in method does not exist for the Column object. More info: pyspark.sql.Column.isin — PySpark 3.1.2 documentation
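A small sketch of the corrected call (the predError values below are invented):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.master("local[*]").getOrCreate()

transactionsDf = spark.createDataFrame([(3,), (6,), (7,), (None,)], ["predError"])

# isin accepts either a list or the values as separate arguments
print(transactionsDf.filter(col('predError').isin([3, 6])).count())  # 2
print(transactionsDf.filter(col('predError').isin(3, 6)).count())    # 2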



Which of the following code blocks returns a new DataFrame with the same columns as DataFrame transactionsDf, except for columns predError and value which should be removed?

  1. transactionsDf.drop(["predError", "value"])
  2. transactionsDf.drop("predError", "value")
  3. transactionsDf.drop(col("predError"), col("value"))
  4. transactionsDf.drop(predError, value)
  5. transactionsDf.drop("predError & value")

Answer(s): B

Explanation:

drop() takes the names of the columns to remove as separate string arguments, so transactionsDf.drop("predError", "value") is correct. It does not accept a list, and it accepts at most one Column object per call, so the other variants fail.
More info: pyspark.sql.DataFrame.drop — PySpark 3.1.2 documentation
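A minimal sketch of the correct call (the toy DataFrame below is invented; only predError and value come from the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

transactionsDf = spark.createDataFrame(
    [(1, 0.5, 10.0, 25)], ["transactionId", "predError", "value", "storeId"])

# Column names are passed as separate string arguments, not as a list
print(transactionsDf.drop("predError", "value").columns)  # ['transactionId', 'storeId']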



In which order should the code blocks shown below be run in order to read a JSON file from location jsonPath into a DataFrame and return only the rows that do not have value 3 in column productId?
1. importedDf.createOrReplaceTempView("importedDf")
2. spark.sql("SELECT * FROM importedDf WHERE productId != 3")
3. spark.sql("FILTER * FROM importedDf WHERE productId != 3")
4. importedDf = spark.read.option("format", "json").path(jsonPath)
5. importedDf = spark.read.json(jsonPath)

  1. 4, 1, 2
  2. 5, 1, 3
  3. 5, 2
  4. 4, 1, 3
  5. 5, 1, 2

Answer(s): E

Explanation:

Correct code block:
importedDf = spark.read.json(jsonPath)
importedDf.createOrReplaceTempView("importedDf")
spark.sql("SELECT * FROM importedDf WHERE productId != 3")
Option 5 is the only correct way listed to read a JSON file in PySpark. option("format", "json") is not how you tell Spark's DataFrameReader that you want to read a JSON file; you would do this through format("json") instead. Also, the path of the JSON file is passed to the DataFrameReader via the load() method, not a path() method.
In order to use a SQL command through the SparkSession spark, you first need to create a temporary view through DataFrame.createOrReplaceTempView().
The SQL statement needs to start with the SELECT keyword; there is no FILTER statement in Spark SQL, so option 3 is invalid.
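A self-contained sketch of the whole sequence (the path /tmp/importedDf.json and the sample rows are invented for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()
jsonPath = "/tmp/importedDf.json"  # hypothetical location

# Write a tiny JSON file first so the read is reproducible
spark.createDataFrame([(1,), (3,), (5,)], ["productId"]).write.mode("overwrite").json(jsonPath)

importedDf = spark.read.json(jsonPath)                               # code block 5
importedDf.createOrReplaceTempView("importedDf")                     # code block 1
spark.sql("SELECT * FROM importedDf WHERE productId != 3").show()    # code block 2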






