A user new to Databricks is trying to troubleshoot long execution times for some pipeline logic they are working on. Presently, the user is executing code cell-by-cell, using display() calls to confirm code is producing the logically correct results as new transformations are added to an operation. To get a measure of average time to execute, the user is running each cell multiple times interactively. Which of the following adjustments will get a more accurate measure of how code is likely to perform in production?
Answer(s): B
A production cluster has 3 executor nodes and uses the same virtual machine type for the driver and executors. When evaluating the Ganglia Metrics for this cluster, which indicator would signal a bottleneck caused by code executing on the driver?
Answer(s): E
Where in the Spark UI can one diagnose a performance problem induced by not leveraging predicate pushdown?
Review the following error traceback: Which statement describes the error being raised?
Which distribution does Databricks support for installing custom Python code packages?
Answer(s): D
Which Python variable contains a list of directories to be searched when trying to locate required modules?
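For context, a minimal sketch of how Python exposes its module search locations through the standard library's `sys.path`. The directory `/tmp/my_modules` is a hypothetical path used only for illustration; it is not taken from the question.

```python
import sys

# sys.path is an ordinary list of directory strings that the import
# system searches, in order, when resolving an `import` statement.
search_path = list(sys.path)
for directory in search_path:
    print(directory)

# Hypothetical directory, appended only to show that the list is mutable
# and that later imports would also search this location.
extra_dir = "/tmp/my_modules"
sys.path.append(extra_dir)
```

Because the list is consulted in order, a directory placed earlier in it takes precedence over later ones when two locations contain a module with the same name.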
Incorporating unit tests into a PySpark application requires upfront attention to the design of your jobs, or a potentially significant refactoring of existing code. Which statement describes a main benefit that offsets this additional effort?
Answer(s): C
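As a rough illustration of the kind of job design the question refers to, the sketch below isolates one transformation in a small function and tests it on its own. It uses plain Python rows rather than a Spark DataFrame so that it runs anywhere; `add_line_total` and its field names are hypothetical, and in a real PySpark job the function would presumably accept and return a DataFrame instead.

```python
def add_line_total(rows):
    """Return new rows with a 'total' field equal to quantity * unit_price."""
    # Hypothetical transformation: builds new dicts, leaving the input untouched.
    return [{**row, "total": row["quantity"] * row["unit_price"]} for row in rows]


def test_add_line_total():
    rows = [{"quantity": 2, "unit_price": 5.0}]
    result = add_line_total(rows)
    # The function is checked in isolation, without running a whole pipeline.
    assert result == [{"quantity": 2, "unit_price": 5.0, "total": 10.0}]
    # The input rows are not mutated.
    assert "total" not in rows[0]


test_add_line_total()
```

Keeping each transformation in its own function is what makes this style of test possible: a failure points at a single function rather than at an entire job run.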
Which statement describes integration testing?
Answer(s): A