Databricks Certified Associate Developer for Apache Spark 3.0 Exam Questions
Certified Associate Developer for Apache Spark (Page 11)

Updated On: 21-Feb-2026

Which of the following code blocks returns a DataFrame that has all columns of DataFrame transactionsDf and an additional column predErrorSquared which is the squared value of column predError in DataFrame transactionsDf?

  1. transactionsDf.withColumn("predError", pow(col("predErrorSquared"), 2))
  2. transactionsDf.withColumnRenamed("predErrorSquared", pow(predError, 2))
  3. transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))
  4. transactionsDf.withColumn("predErrorSquared", pow(predError, lit(2)))
  5. transactionsDf.withColumn("predErrorSquared", "predError"**2)

Answer(s): C

Explanation:

While only one of these code blocks works, the DataFrame API is pretty flexible when it comes to accepting columns into the pow() method. The following code blocks would also work:
transactionsDf.withColumn("predErrorSquared", pow("predError", 2))
transactionsDf.withColumn("predErrorSquared", pow("predError", lit(2)))
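As a runnable sketch of the accepted answer and the equivalent variants above (the sample data and column values are invented for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import pow, col, lit

spark = SparkSession.builder.getOrCreate()

# Invented stand-in for transactionsDf
transactionsDf = spark.createDataFrame([(1, 3.0), (2, -1.5)], ["transactionId", "predError"])

# Answer C: keep all existing columns and add predErrorSquared = predError squared
result = transactionsDf.withColumn("predErrorSquared", pow(col("predError"), lit(2)))

# Equivalent variants mentioned above
variant1 = transactionsDf.withColumn("predErrorSquared", pow("predError", 2))
variant2 = transactionsDf.withColumn("predErrorSquared", pow("predError", lit(2)))

result.show()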
Static notebook | Dynamic notebook: see test 1, question 26 (https://flrs.github.io/spark_practice_tests_code/#1/26.html); Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions



The code block displayed below contains an error. The code block should return a new DataFrame
that only contains rows from DataFrame transactionsDf in which the value in column predError is at least 5. Find the error. Code block:
transactionsDf.where("col(predError) >= 5")

  1. The argument to the where method should be "predError >= 5".
  2. Instead of where(), filter() should be used.
  3. The expression returns the original DataFrame transactionsDf and not a new DataFrame. To avoid this, the code block should be transactionsDf.toNewDataFrame().where("col(predError) >= 5").
  4. The argument to the where method cannot be a string.
  5. Instead of >=, the SQL operator GEQ should be used.

Answer(s): A

Explanation:

Correct: the argument to the where method should be "predError >= 5". where() accepts a SQL expression string, and inside such a string columns are referenced by name. col() is a PySpark function, not a SQL function, so Spark cannot resolve "col(predError) >= 5" and the block fails.
The argument to the where method cannot be a string.
Incorrect. It can be a string, no problem here.
Instead of where(), filter() should be used.
Incorrect. That does not matter; in PySpark, where() and filter() are equivalent.
Instead of >=, the SQL operator GEQ should be used.
Incorrect. >= is valid in Spark SQL expressions; there is no GEQ operator.
The expression returns the original DataFrame transactionsDf and not a new DataFrame. To avoid this, the code block should be transactionsDf.toNewDataFrame().where("col(predError) >= 5").
Incorrect. Spark transformations always return a new DataFrame; there is no toNewDataFrame() method.
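A minimal runnable sketch of the broken block and its fix (the sample data is invented for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Invented stand-in for transactionsDf
transactionsDf = spark.createDataFrame([(1, 3), (2, 7), (3, 5)], ["transactionId", "predError"])

# Broken: col() is a PySpark function, not SQL, so it cannot appear inside the expression string
# transactionsDf.where("col(predError) >= 5")   # fails at analysis time

# Fixed (answer A): reference the column by name inside the SQL expression string
fixed = transactionsDf.where("predError >= 5")

# Equivalent alternatives
fixed_col = transactionsDf.where(col("predError") >= 5)    # Column expression instead of a string
fixed_filter = transactionsDf.filter("predError >= 5")     # filter() is an alias of where()

fixed.show()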
Static notebook | Dynamic notebook: see test 1, question 27 (https://flrs.github.io/spark_practice_tests_code/#1/27.html); Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions



Which of the following code blocks saves DataFrame transactionsDf in location
/FileStore/transactions.csv as a CSV file and throws an error if a file already exists in the location?

  1. transactionsDf.write.save("/FileStore/transactions.csv")
  2. transactionsDf.write.format("csv").mode("error").path("/FileStore/transactions.csv")
  3. transactionsDf.write.format("csv").mode("ignore").path("/FileStore/transactions.csv")
  4. transactionsDf.write("csv").mode("error").save("/FileStore/transactions.csv")
  5. transactionsDf.write.format("csv").mode("error").save("/FileStore/transactions.csv")

Answer(s): E

Explanation:
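mode("error") (also spelled "errorifexists", the default save mode) makes the write fail if data already exists at the target location, format("csv") selects the output format, and save() takes the path, so answer E is the only block that does all three. Option A defaults to the Parquet format, option C uses mode("ignore"), which silently skips the write instead of throwing, options B and C call a path() method that does not exist on the DataFrameWriter, and option D calls write("csv") although write is a property, not a method. A minimal sketch of the correct block (the sample data is invented for illustration):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Invented stand-in for transactionsDf
transactionsDf = spark.createDataFrame([(1, 3.0)], ["transactionId", "predError"])

# Answer E: write as CSV and throw an error if the location already holds data
transactionsDf.write.format("csv").mode("error").save("/FileStore/transactions.csv")

# Equivalent shorthand using the csv() convenience method
# transactionsDf.write.mode("errorifexists").csv("/FileStore/transactions.csv")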

Static notebook | Dynamic notebook: see test 1, question 28 (https://flrs.github.io/spark_practice_tests_code/#1/28.html); Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions



The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which the column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows derived from rows in DataFrame itemsDf in which the column attributes contains the element cozy.

A sample of DataFrame itemsDf is below.

Code block:
itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))

  1. 1. filter
    2. array_contains("cozy")
    3. select
    4. "itemId"
    5. explode
    6. "attributes"
  2. 1. where
    2. "array_contains(attributes, 'cozy')"
    3. select
    4. itemId
    5. explode
    6. attributes
  3. 1. filter
    2. "array_contains(attributes, 'cozy')"
    3. select
    4. "itemId"
    5. map
    6. "attributes"
  4. 1. filter
    2. "array_contains(attributes, cozy)"
    3. select
    4. "itemId"
    5. explode
    6. "attributes"
  5. 1. filter
    2. "array_contains(attributes, 'cozy')"
    3. select
    4. "itemId"
    5. explode
    6. "attributes"

Answer(s): E

Explanation:

The correct code block is:
itemsDf.filter("array_contains(attributes, 'cozy')").select("itemId", explode("attributes"))
The key here is understanding how to use array_contains(). You can either use it as an expression in a string, or you can import it from pyspark.sql.functions. In that case, the following would also work:
itemsDf.filter(array_contains("attributes", "cozy")).select("itemId", explode("attributes"))
Static notebook | Dynamic notebook: see test 1, question 29 (https://flrs.github.io/spark_practice_tests_code/#1/29.html); Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions
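A self-contained sketch of both forms, using an invented stand-in for itemsDf:

from pyspark.sql import SparkSession
from pyspark.sql.functions import array_contains, explode

spark = SparkSession.builder.getOrCreate()

# Invented sample data standing in for itemsDf
itemsDf = spark.createDataFrame(
    [(1, ["blue", "cozy", "winter"]), (2, ["red", "summer"])],
    ["itemId", "attributes"],
)

# Answer E: keep rows whose attributes array contains 'cozy', then explode the array
# into one row per element; explode()'s output column is named col by default
result = itemsDf.filter("array_contains(attributes, 'cozy')").select("itemId", explode("attributes"))

# Equivalent form using the imported function instead of a SQL expression string
result_alt = itemsDf.filter(array_contains("attributes", "cozy")).select("itemId", explode("attributes"))

result.show()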



The code block displayed below contains an error. The code block should return the average of rows in column value grouped by unique storeId. Find the error.
Code block: transactionsDf.agg("storeId").avg("value")

  1. Instead of avg("value"), avg(col("value")) should be used.
  2. The avg("value") should be specified as a second argument to agg() instead of being appended to it.
  3. All column names should be wrapped in col() operators.
  4. agg should be replaced by groupBy.
  5. "storeId" and "value" should be swapped.

Answer(s): D

Explanation:
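agg() on its own aggregates over the whole DataFrame and does not group rows, so transactionsDf.agg("storeId") cannot produce a per-store average; the grouping has to come from groupBy(), which is why answer D is correct. A minimal sketch of the corrected block (the sample data is invented for illustration):

from pyspark.sql import SparkSession
from pyspark.sql.functions import avg

spark = SparkSession.builder.getOrCreate()

# Invented stand-in for transactionsDf
transactionsDf = spark.createDataFrame([(1, 10.0), (1, 20.0), (2, 5.0)], ["storeId", "value"])

# Answer D: group by unique storeId, then average value within each group
result = transactionsDf.groupBy("storeId").avg("value")

# Equivalent form spelled out with agg()
result_agg = transactionsDf.groupBy("storeId").agg(avg("value"))

result.show()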

Static notebook | Dynamic notebook: see test 1, question 30 (https://flrs.github.io/spark_practice_tests_code/#1/30.html); Databricks import instructions: https://bit.ly/sparkpracticeexams_import_instructions





