Databricks Certified Associate Developer for Apache Spark 3.0 Exam (Page 18)

Updated On: 26-Jan-2026

Which of the following code blocks returns a copy of DataFrame transactionsDf in which column productId has been renamed to productNumber?

  1. transactionsDf.withColumnRenamed("productId", "productNumber")
  2. transactionsDf.withColumn("productId", "productNumber")
  3. transactionsDf.withColumnRenamed("productNumber", "productId")
  4. transactionsDf.withColumnRenamed(col(productId), col(productNumber))
  5. transactionsDf.withColumnRenamed(productId, productNumber)

Answer(s): A

Explanation:

withColumnRenamed(existing, new) takes the existing column name first and the new name second, both as plain strings, and returns a new DataFrame with the column renamed. withColumn creates or replaces a column from a Column expression, so it cannot perform a simple rename, and the options that pass unquoted names or col() objects are not valid calls.

More info: pyspark.sql.DataFrame.withColumnRenamed — PySpark 3.1.2 documentation
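As a hedged illustration (a SparkSession named spark and a small stand-in for transactionsDf are assumptions here, since the sample data is not reproduced in this dump), the rename could look like this:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for transactionsDf; the real column layout is not shown here.
transactionsDf = spark.createDataFrame([(1, 3), (2, 6)], ["transactionId", "productId"])

# withColumnRenamed(existing, new) returns a new DataFrame; the original is unchanged.
renamedDf = transactionsDf.withColumnRenamed("productId", "productNumber")
print(renamedDf.columns)  # ['transactionId', 'productNumber']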



Which of the following code blocks returns a copy of DataFrame transactionsDf that only includes columns transactionId, storeId, productId and f?

Sample of DataFrame transactionsDf:

  1. transactionsDf.drop(col("value"), col("predError"))
  2. transactionsDf.drop("predError", "value")
  3. transactionsDf.drop(value, predError)
  4. transactionsDf.drop(["predError", "value"])
  5. transactionsDf.drop([col("predError"), col("value")])

Answer(s): B

Explanation:

Output of correct code block:

To solve this question, you should be familiar with the drop() API. The order of column names does not matter; it differs between some answers only to confuse you. Also, drop() does not take a list. The *cols notation in the documentation means that drop() takes a variable number of arguments, each of which is interpreted as a column name.

More info: pyspark.sql.DataFrame.drop — PySpark 3.1.2 documentation
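A minimal sketch of the point above (again assuming a hypothetical stand-in for transactionsDf with the columns named in the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in with the columns mentioned in the question.
transactionsDf = spark.createDataFrame(
    [(1, 25, 3, 4.0, 10.0, 2.0)],
    ["transactionId", "storeId", "productId", "f", "value", "predError"],
)

# Column names are passed as separate string arguments (*cols), not as a list.
reducedDf = transactionsDf.drop("predError", "value")
print(reducedDf.columns)  # ['transactionId', 'storeId', 'productId', 'f']

# transactionsDf.drop(["predError", "value"])  # raises TypeError in Spark 3.x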



The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to accomplish this.
transactionsDf.__1__(__2__.__3__(__4__))

  1. 1. select
    2. col("storeId")
    3. cast
    4. StringType
  2. 1. select
    2. col("storeId")
    3. as
    4. StringType
  3. 1. cast
    2. "storeId"
    3. as
    4. StringType()
  4. 1. select
    2. col("storeId")
    3. cast
    4. StringType()
  5. 1. select
    2. storeId
    3. cast
    4. StringType()

Answer(s): D

Explanation:

Correct code block: transactionsDf.select(col("storeId").cast(StringType()))
Solving this question involves understanding that types from pyspark.sql.types, such as StringType, need to be instantiated when used in Spark; in simple words, they need to be followed by parentheses, like so: StringType(). You could also use
.cast("string") instead, but that option is not given here.
More info: pyspark.sql.Column.cast — PySpark 3.1.2 documentation

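For illustration, a minimal sketch with a stand-in one-column DataFrame (the real transactionsDf is not shown here), including the string shorthand mentioned above:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

transactionsDf = spark.createDataFrame([(25,), (26,)], ["storeId"])

# StringType must be instantiated: StringType(), not StringType.
stringDf = transactionsDf.select(col("storeId").cast(StringType()))
stringDf.printSchema()  # storeId: string

# Equivalent shorthand that takes the type name as a string:
stringDf2 = transactionsDf.select(col("storeId").cast("string"))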



Which of the following code blocks creates a new one-column, two-row DataFrame dfDates with column date of type timestamp?

  1. 1. dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"])
    2. dfDates = dfDates.withColumn("date", to_timestamp("dd/MM/yyyy HH:mm:ss", "date"))
  2. 1. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])
    2. dfDates = dfDates.withColumnRenamed("date", to_timestamp("date", "yyyy-MM-dd HH:mm:ss"))
  3. 1. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])
    2. dfDates = dfDates.withColumn("date", to_timestamp("date", "dd/MM/yyyy HH:mm:ss"))
  4. 1. dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"])
    2. dfDates = dfDates.withColumnRenamed("date", to_datetime("date", "yyyy-MM-dd HH:mm:ss"))
  5. 1. dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"])

Answer(s): C

Explanation:

This question is tricky.
Two things are important to know here:
First, the syntax for createDataFrame: here you need a list of tuples, like so: [(1,), (2,)]. To define a single-item tuple in Python, you must put a comma after the item so that Python interprets it as a tuple rather than a plain parenthesized expression.
Second, you should understand the to_timestamp syntax. You can find out more about it in the documentation linked below.
For good measure, let's examine in detail why the incorrect options are wrong:
dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"]) This code snippet does everything the Question: asks for – except that the data type of the date column is a string and not a timestamp. When no schema is specified, Spark sets the string data type as default.
dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"]) dfDates = dfDates.withColumn("date", to_timestamp("dd/MM/yyyy HH:mm:ss", "date"))
For the first line of this code block, Spark throws the following error: TypeError: Can not infer schema for type: <class 'str'>. This is because Spark expects to find row information but instead finds plain strings; this is why you need to specify the data as tuples. Fortunately, the Spark documentation (linked below) shows a number of examples for creating DataFrames that should help you get on the right track here.
dfDates = spark.createDataFrame([("23/01/2022 11:28:12",),("24/01/2022 10:58:34",)], ["date"]) dfDates = dfDates.withColumnRenamed("date", to_timestamp("date", "yyyy-MM-dd HH:mm:ss")) The issue with this answer is that the operator withColumnRenamed is used. This operator simply renames a column, but it has no power to modify its actual content. This is why withColumn should be used instead. In addition, the date format yyyy-MM-dd HH:mm:ss does not reflect the format of the actual timestamp: "23/01/2022 11:28:12".
dfDates = spark.createDataFrame(["23/01/2022 11:28:12","24/01/2022 10:58:34"], ["date"]) dfDates = dfDates.withColumnRenamed("date", to_datetime("date", "yyyy-MM-dd HH:mm:ss"))
Here, withColumnRenamed is used instead of withColumn (see above). In addition, the rows are not expressed correctly: they should be written as tuples, using parentheses. On top of that, to_datetime is not a PySpark function (the correct function is to_timestamp), and the date format is off as well (see above).

More info: pyspark.sql.functions.to_timestamp — PySpark 3.1.2 documentation and pyspark.sql.SparkSession.createDataFrame — PySpark 3.1.1 documentation
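Putting the pieces together, a minimal sketch of the correct answer (a SparkSession named spark is assumed):

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_timestamp

spark = SparkSession.builder.getOrCreate()

# Single-element tuples (note the trailing commas) so that Spark can infer the row structure.
dfDates = spark.createDataFrame(
    [("23/01/2022 11:28:12",), ("24/01/2022 10:58:34",)], ["date"]
)

# to_timestamp(column, format): the format must match the data, here dd/MM/yyyy HH:mm:ss.
dfDates = dfDates.withColumn("date", to_timestamp("date", "dd/MM/yyyy HH:mm:ss"))
dfDates.printSchema()  # date: timestamp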



The code block displayed below contains an error. The code block should save DataFrame transactionsDf at path path as a parquet file, appending to any existing parquet file. Find the error.

Code block:

transactionsDf.format("parquet").option("mode", "append").save(path)

  1. The code block is missing a reference to the DataFrameWriter.
  2. save() is evaluated lazily and needs to be followed by an action.
  3. The mode option should be omitted so that the command uses the default mode.
  4. The code block is missing a bucketBy command that takes care of partitions.
  5. Given that the DataFrame should be saved as a parquet file, path is being passed to the wrong method.

Answer(s): A

Explanation:

Correct code block:
transactionsDf.write.format("parquet").option("mode", "append").save(path)

The DataFrame itself has no format() method. Saving requires a DataFrameWriter, which is obtained through the DataFrame's write property; format(), option() and save() are then called on that writer.


