Free Certified Data Engineer Professional Exam Braindumps (page: 22)

Page 22 of 46

A distributed team of data analysts shares computing resources on an interactive cluster with autoscaling configured. To better manage costs and query throughput, the workspace administrator wants to determine whether cluster upscaling is caused by many concurrent users or by resource-intensive queries.

In which location can one review the timeline for cluster resizing events?

  A. Workspace audit logs
  B. Driver's log file
  C. Ganglia
  D. Cluster Event Log
  E. Executor's log file

Answer(s): D
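
As a rough illustration of reading resize events as a timeline, the sketch below filters a list of event records down to the resizing entries and orders them by time. It assumes each record is a dict with a `timestamp` in epoch milliseconds and a `type` string, and that `RESIZING` and `UPSIZE_COMPLETED` are the event types of interest. The helper name `resize_timeline` and the sample records are hypothetical and not taken from any real cluster.

```python
from datetime import datetime, timezone

# Assumption: these event type names mark resizing activity in the event log.
RESIZE_EVENT_TYPES = {"RESIZING", "UPSIZE_COMPLETED"}


def resize_timeline(events):
    """Return (UTC time, event type) pairs for resize events, oldest first."""
    timeline = [
        (datetime.fromtimestamp(e["timestamp"] / 1000, tz=timezone.utc), e["type"])
        for e in events
        if e.get("type") in RESIZE_EVENT_TYPES
    ]
    return sorted(timeline)


# Hypothetical sample records; timestamps are epoch milliseconds.
sample_events = [
    {"timestamp": 1700000120000, "type": "UPSIZE_COMPLETED"},
    {"timestamp": 1700000000000, "type": "RUNNING"},
    {"timestamp": 1700000060000, "type": "RESIZING"},
]

for when, kind in resize_timeline(sample_events):
    print(when.isoformat(), kind)
```

Comparing the times of these events with query and user activity over the same window is one way to judge whether upscaling follows concurrency or a few heavy queries.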



When evaluating the Ganglia metrics for a given cluster with 3 executor nodes, which indicator would signal proper utilization of the VMs' resources?

  A. The Five Minute Load Average remains consistent/flat
  B. Bytes Received never exceeds 80 million bytes per second
  C. Network I/O never spikes
  D. Total Disk Space remains constant
  E. CPU Utilization is around 75%

Answer(s): E



Which of the following technologies can be used to identify key areas of text when parsing Spark Driver log4j output?

  A. Regex
  B. Julia
  C. pyspark.ml.feature
  D. Scala Datasets
  E. C++

Answer(s): A
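
A minimal sketch of the regex approach, using Python's standard `re` module. It assumes the driver log uses a `YY/MM/DD HH:MM:SS LEVEL Logger: message` layout. The actual layout depends on the configured log4j pattern, so the expression would need adjusting to match. The sample line and the helper name `parse_line` are hypothetical.

```python
import re

# Assumed layout: "YY/MM/DD HH:MM:SS LEVEL Logger: message".
LOG_LINE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<logger>[\w.$]+): (?P<message>.*)$"
)


def parse_line(line):
    """Return the named fields of a log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


# Hypothetical log line for illustration only.
sample = "24/09/18 10:15:32 WARN TaskSetManager: Lost task 3.0 in stage 12.0"
fields = parse_line(sample)
print(fields["level"], fields["logger"], fields["message"])
```

Lines that do not match, such as stack trace continuation lines, return `None` and can be attached to the preceding entry or skipped.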



You are testing a collection of mathematical functions, one of which calculates the area under a curve as described by another function.

assert(myIntegrate(lambda x: x*x, 0, 3)[0] == 9)

Which kind of test would the above line exemplify?

  A. Unit
  B. Manual
  C. Functional
  D. Integration
  E. End-to-end

Answer(s): A
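
To make the unit test concrete, here is one possible shape for `myIntegrate`. The question does not show its implementation, so everything below is an assumption: it uses composite Simpson's rule and returns a `(estimate, error_estimate)` tuple, which would explain the `[0]` index in the assertion. Because the result is a float, the sketch compares with `math.isclose` rather than `==`, since exact equality can fail from rounding alone.

```python
import math


def myIntegrate(f, a, b, n=100):
    """Hypothetical implementation: composite Simpson's rule on [a, b].

    Returns (estimate, rough error estimate). n should be a multiple of 4
    so that both the fine and the coarse grid use an even number of steps.
    """
    def simpson(steps):
        h = (b - a) / steps
        total = f(a) + f(b)
        for i in range(1, steps):
            # Odd interior points get weight 4, even interior points weight 2.
            total += (4 if i % 2 else 2) * f(a + i * h)
        return total * h / 3

    fine = simpson(n)
    coarse = simpson(n // 2)
    # Richardson-style estimate based on Simpson's fourth-order convergence.
    return fine, abs(fine - coarse) / 15


# Unit test: one function, one known input, one expected value.
# The integral of x*x from 0 to 3 is 3**3 / 3 = 9.
result = myIntegrate(lambda x: x * x, 0, 3)
assert math.isclose(result[0], 9, rel_tol=1e-9)
```

The test exercises a single function in isolation with no external systems involved, which is what distinguishes it from integration or end-to-end tests.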




Puran commented on September 18, 2024
Good material and very honest and knowledgeable support team. Contacted the support team and got a reply in less than 30 minutes.
New Zealand