Databricks Certified Associate Developer for Apache Spark 3.0
Free Practice Exam Questions (page: 14)
Updated On: 2-Jan-2026

The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, there should be a separate row for each element in column attributes of DataFrame itemsDf, with column itemId containing the associated itemId from itemsDf. The new DataFrame should only contain rows derived from rows in DataFrame itemsDf whose attributes column contains the element cozy.

A sample of DataFrame itemsDf is below.

Code block:
itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))

  1. 1. filter
    2. array_contains("cozy")
    3. select
    4. "itemId"
    5. explode
    6. "attributes"
  2. 1. where
    2. "array_contains(attributes, 'cozy')"
    3. select
    4. itemId
    5. explode
    6. attributes
  3. 1. filter
    2. "array_contains(attributes, 'cozy')"
    3. select
    4. "itemId"
    5. map
    6. "attributes"
  4. 1. filter
    2. "array_contains(attributes, cozy)"
    3. select
    4. "itemId"
    5. explode
    6. "attributes"
  5. 1. filter
    2. "array_contains(attributes, 'cozy')"
    3. select
    4. "itemId"
    5. explode
    6. "attributes"

Answer(s): E

Explanation:

The correct code block is:
itemsDf.filter("array_contains(attributes, 'cozy')").select("itemId", explode("attributes"))
The key here is understanding how to use array_contains(). You can either use it as an expression in a string, or you can import it from pyspark.sql.functions. In that case, the following would also work:
itemsDf.filter(array_contains("attributes", "cozy")).select("itemId", explode("attributes"))
Static notebook | Dynamic notebook: See test 1, Question 29 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/29.html, https://bit.ly/sparkpracticeexams_import_instructions)
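
A minimal runnable sketch of both filter variants; the sample rows below are made up for illustration and are not the exam's actual itemsDf data:

from pyspark.sql import SparkSession
from pyspark.sql.functions import array_contains, explode

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data; only the schema (itemId, attributes) matters here.
itemsDf = spark.createDataFrame(
    [(1, ["blue", "cozy"]), (2, ["red"])],
    ["itemId", "attributes"],
)

# Variant 1: array_contains() used inside a SQL expression string.
itemsDf.filter("array_contains(attributes, 'cozy')").select("itemId", explode("attributes")).show()

# Variant 2: array_contains() imported from pyspark.sql.functions.
itemsDf.filter(array_contains("attributes", "cozy")).select("itemId", explode("attributes")).show()

Both variants return one row per element of attributes (in columns itemId and col), but only for items whose attributes array contains "cozy".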



The code block displayed below contains an error. The code block should return the average of rows in column value grouped by unique storeId. Find the error.
Code block: transactionsDf.agg("storeId").avg("value")

  1. Instead of avg("value"), avg(col("value")) should be used.
  2. The avg("value") should be specified as a second argument to agg() instead of being appended to it.
  3. All column names should be wrapped in col() operators.
  4. agg should be replaced by groupBy.
  5. "storeId" and "value" should be swapped.

Answer(s): D

Explanation:

Static notebook | Dynamic notebook: See test 1, Question 30 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/30.html, https://bit.ly/sparkpracticeexams_import_instructions)
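
The error is that agg() is used where a grouping is needed: agg() aggregates over the entire DataFrame and does not accept a plain column name as a grouping key, so grouping by unique storeId requires groupBy(). A minimal runnable sketch of the corrected block, using hypothetical sample rows:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data; only the columns storeId and value matter here.
transactionsDf = spark.createDataFrame(
    [(1, 10.0), (1, 20.0), (2, 5.0)],
    ["storeId", "value"],
)

# groupBy() creates one group per unique storeId; avg() then averages value within each group.
transactionsDf.groupBy("storeId").avg("value").show()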



Which of the following code blocks returns a copy of DataFrame itemsDf where the column supplier has been renamed to manufacturer?

  1. itemsDf.withColumn(["supplier", "manufacturer"])
  2. itemsDf.withColumn("supplier").alias("manufacturer")
  3. itemsDf.withColumnRenamed("supplier", "manufacturer")
  4. itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))
  5. itemsDf.withColumnsRenamed("supplier", "manufacturer")

Answer(s): C

Explanation:

itemsDf.withColumnRenamed("supplier", "manufacturer")

Correct! This uses the relatively trivial DataFrame method withColumnRenamed for renaming column supplier to column manufacturer.
Note that the question asks for "a copy of DataFrame itemsDf". This may be confusing if you are not familiar with Spark yet. RDDs (Resilient Distributed Datasets) are the foundation of Spark DataFrames and are immutable. As such, DataFrames are immutable, too. Any command that changes anything in the DataFrame therefore necessarily returns a copy, or a new version, of it that has the changes applied.

itemsDf.withColumnsRenamed("supplier", "manufacturer")

Incorrect. Spark's DataFrame API does not have a withColumnsRenamed() method.

itemsDf.withColumnRenamed(col("manufacturer"), col("supplier"))

No. Watch out – although the col() method works for many methods of the DataFrame API, withColumnRenamed is not one of them. As outlined in the documentation linked below, withColumnRenamed expects strings.

itemsDf.withColumn(["supplier", "manufacturer"])

Wrong. While DataFrame.withColumn() exists in Spark, it has a different purpose than renaming columns. withColumn is typically used to add columns to a DataFrame, taking the name of the new column as its first argument and a Column as its second. Learn more via the documentation linked below.

itemsDf.withColumn("supplier").alias("manufacturer")

No. While DataFrame.withColumn() exists, it requires two arguments. Furthermore, the alias() method on DataFrames would not help with renaming a column. DataFrame.alias() can be useful for addressing the inputs of join statements, but that is far outside the scope of this question. If you are curious nevertheless, check out the link below.
More info: pyspark.sql.DataFrame.withColumnRenamed — PySpark 3.1.1 documentation, pyspark.sql.DataFrame.withColumn — PySpark 3.1.1 documentation, and pyspark.sql.DataFrame.alias — PySpark 3.1.2 documentation (https://bit.ly/3aSB5tm, https://bit.ly/2Tv4rbE, https://bit.ly/2RbhBd2)
Static notebook | Dynamic notebook: See test 1, Question 31 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/31.html, https://bit.ly/sparkpracticeexams_import_instructions)
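
A minimal runnable sketch of the renaming, using hypothetical sample data; the withColumn()/drop() variant is shown only to contrast the two methods and is not one of the answer options:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data with a supplier column.
itemsDf = spark.createDataFrame([(1, "Acme")], ["itemId", "supplier"])

# withColumnRenamed() returns a new DataFrame; itemsDf itself is left unchanged.
renamedDf = itemsDf.withColumnRenamed("supplier", "manufacturer")
renamedDf.printSchema()  # columns: itemId, manufacturer

# More verbose route for comparison: add the new column, then drop the old one.
alsoRenamedDf = itemsDf.withColumn("manufacturer", col("supplier")).drop("supplier")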



Which of the following code blocks returns DataFrame transactionsDf sorted in descending order by column predError, showing missing values last?

  1. transactionsDf.sort(asc_nulls_last("predError"))
  2. transactionsDf.orderBy("predError").desc_nulls_last()
  3. transactionsDf.sort("predError", ascending=False)
  4. transactionsDf.desc_nulls_last("predError")
  5. transactionsDf.orderBy("predError").asc_nulls_last()

Answer(s): C

Explanation:

transactionsDf.sort("predError", ascending=False)
Correct! When using DataFrame.sort() and setting ascending=False, the DataFrame will be sorted by the specified column in descending order, putting all missing values last. An alternative, although not listed as an answer here, would be transactionsDf.sort(desc_nulls_last("predError")).

transactionsDf.sort(asc_nulls_last("predError"))

Incorrect. While this is valid syntax, the DataFrame will be sorted on column predError in ascending order rather than descending order, although missing values would still end up last.

transactionsDf.desc_nulls_last("predError")

Wrong, this is invalid syntax. There is no method DataFrame.desc_nulls_last() in the Spark API. There is, however, a Spark function desc_nulls_last() (see the link below).

transactionsDf.orderBy("predError").desc_nulls_last()

No. While transactionsDf.orderBy("predError") is correct syntax (although it sorts the DataFrame by column predError in ascending order) and returns a DataFrame, there is no method DataFrame.desc_nulls_last() in the Spark API. There is, however, a Spark function desc_nulls_last() (see the link below).
transactionsDf.orderBy("predError").asc_nulls_last()
Incorrect. There is no method DataFrame.asc_nulls_last() in the Spark API (see above).
More info: pyspark.sql.functions.desc_nulls_last — PySpark 3.1.2 documentation and pyspark.sql.DataFrame.sort — PySpark 3.1.2 documentation (https://bit.ly/3g1JtbI, https://bit.ly/2R90NCS)
Static notebook | Dynamic notebook: See test 1, Question 32 (Databricks import instructions) (https://flrs.github.io/spark_practice_tests_code/#1/32.html, https://bit.ly/sparkpracticeexams_import_instructions)
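
A minimal runnable sketch of the two descending-sort variants mentioned above, using a hypothetical nullable predError column:

from pyspark.sql import SparkSession
from pyspark.sql.functions import desc_nulls_last

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample data; predError is nullable so the null ordering is visible.
transactionsDf = spark.createDataFrame([(3,), (None,), (7,)], "predError INT")

# ascending=False sorts descending; Spark's descending sort places nulls last by default.
transactionsDf.sort("predError", ascending=False).show()

# Equivalent, spelling the null ordering out explicitly with the column function:
transactionsDf.sort(desc_nulls_last("predError")).show()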


