Free Professional Data Engineer Exam Braindumps (page: 37)


You launched a new gaming app almost three years ago. You have been uploading log files from the previous day to a separate Google BigQuery table with the table name format LOGS_yyyymmdd. You have been using table wildcard functions to generate daily and monthly reports for all time ranges. Recently, you discovered that some queries that cover long date ranges are exceeding the limit of 1,000 tables and failing. How can you resolve this issue?

  A. Convert all daily log tables into date-partitioned tables
  B. Convert the sharded tables into a single partitioned table
  C. Enable query caching so you can cache data from previous months
  D. Create separate views to cover each month, and query from these views

Answer(s): B

Explanation: Converting each daily table into its own date-partitioned table (option A) still leaves one table per day, so long-range queries keep hitting the 1,000-table limit. Consolidating the sharded LOGS_yyyymmdd tables into a single partitioned table removes that limit entirely and lets BigQuery prune partitions by date.
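For reference, here is a minimal sketch of the before/after query pattern, assuming the google-cloud-bigquery Python client and a hypothetical my_project.logs dataset (names are placeholders, not from the exam):

```python
# Contrast the failing sharded-wildcard query with the equivalent query
# against a single ingestion-time-partitioned table.
from google.cloud import bigquery

client = bigquery.Client()

# Sharded layout: every LOGS_yyyymmdd shard matched by the wildcard counts
# toward the 1,000-tables-per-query limit, so ~3 years of dailies fails.
sharded_sql = """
    SELECT COUNT(*) AS events
    FROM `my_project.logs.LOGS_*`
    WHERE _TABLE_SUFFIX BETWEEN '20150101' AND '20171231'
"""

# Partitioned layout: one table, so the limit no longer applies, and
# BigQuery prunes partitions using the filter on _PARTITIONTIME.
partitioned_sql = """
    SELECT COUNT(*) AS events
    FROM `my_project.logs.LOGS`
    WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2015-01-01')
                             AND TIMESTAMP('2017-12-31')
"""

for row in client.query(partitioned_sql).result():
    print(row.events)
```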



Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly. How should you optimize the cluster for cost?

  A. Migrate the workload to Google Cloud Dataflow
  B. Use preemptible virtual machines (VMs) for the cluster
  C. Use a higher-memory node so that the job runs faster
  D. Use SSDs on the worker nodes so that the job can run faster

Answer(s): B

Explanation: The team specifically wants to run the model on Apache Spark, so migrating the workload to Cloud Dataflow (option A) contradicts the requirements. For a short, weekly, fault-tolerant batch job, preemptible VMs cut the cluster's compute cost substantially; faster nodes or SSDs would raise cost, not lower it.
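As an illustration of the preemptible-worker setup, here is a minimal sketch assuming the google-cloud-dataproc Python client; the project, region, cluster name, and machine types are hypothetical placeholders:

```python
# Create a Dataproc cluster whose extra capacity comes from preemptible
# secondary workers, which are billed at a steep discount.
from google.cloud import dataproc_v1

region = "us-central1"
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": "my-project",
    "cluster_name": "weekly-spark-job",
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        # Keep a small core of standard workers so the job always completes...
        "worker_config": {"num_instances": 5, "machine_type_uri": "n1-standard-4"},
        # ...and supply the remaining capacity as preemptible secondary workers.
        "secondary_worker_config": {
            "num_instances": 10,
            "preemptibility": "PREEMPTIBLE",
        },
    },
}

operation = client.create_cluster(
    request={"project_id": "my-project", "region": region, "cluster": cluster}
)
print(operation.result().cluster_name)
```

Since the workload runs for only ~30 minutes once a week, deleting the cluster after each run (or using workflow templates) compounds the savings.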



Your company receives both batch- and stream-based event data. You want to process the data using Google Cloud Dataflow over a predictable time period. However, you realize that in some instances data can arrive late or out of order. How should you design your Cloud Dataflow pipeline to handle data that is late or out of order?

  A. Set a single global window to capture all the data.
  B. Set sliding windows to capture all the lagged data.
  C. Use watermarks and timestamps to capture the lagged data.
  D. Ensure every datasource type (stream or batch) has a timestamp, and use the timestamps to define the logic for lagged data.

Answer(s): C

Explanation: Watermarks track how far event time has progressed, and together with element timestamps they let the pipeline decide when a window can be considered complete and how to handle data that arrives late or out of order.
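To make the watermark-and-timestamp approach concrete, here is a minimal sketch using the Apache Beam Python SDK (the programming model behind Cloud Dataflow); the Pub/Sub topic and the specific window, trigger, and lateness values are hypothetical:

```python
# Fixed event-time windows driven by the watermark, with an allowed-lateness
# horizon so late or out-of-order events still land in the window their
# timestamp belongs to.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms import window
from apache_beam.transforms.trigger import AccumulationMode, AfterCount, AfterWatermark
from apache_beam.utils.timestamp import Duration

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (
        p
        | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "Window" >> beam.WindowInto(
            window.FixedWindows(60),                     # 1-minute event-time windows
            trigger=AfterWatermark(late=AfterCount(1)),  # re-fire when late data arrives
            allowed_lateness=Duration(seconds=3600),     # accept events up to 1 hour late
            accumulation_mode=AccumulationMode.ACCUMULATING,
        )
        | "PairWithOne" >> beam.Map(lambda event: ("events", 1))
        | "CountPerWindow" >> beam.CombinePerKey(sum)    # per-window counts
        | "Print" >> beam.Map(print)
    )
```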



You have some data, which is shown in the graphic below. The two dimensions are X and Y, and the shade of each dot represents what class it is. You want to classify this data accurately using a linear algorithm.



To do this, you need to add a synthetic feature.
What should the value of that feature be?

  A. X^2+Y^2
  B. X^2
  C. Y^2
  D. cos(X)

Answer(s): A

Explanation: In the graphic, one class clusters around the origin and the other forms a ring around it, so no straight line in (X, Y) can separate them. The radial feature X^2+Y^2 is small for the inner class and large for the outer one, letting a linear model split them with a single threshold; cos(X) (option D) carries no such information.
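To see why the radial feature works, here is a minimal sketch using scikit-learn's make_circles as a stand-in for the exam's graphic (an assumption, since the original image is not reproduced here):

```python
# Show that adding the synthetic feature X^2 + Y^2 lets a linear classifier
# separate two concentric classes that are not linearly separable in (X, Y).
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression

# Two concentric classes: an inner cluster and an outer ring.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)

# Linear model on the raw (X, Y) features: near-chance accuracy,
# because no linear decision boundary exists for this data.
raw_acc = LogisticRegression().fit(X, y).score(X, y)

# Append the synthetic radial feature r = X^2 + Y^2; the classes become
# separable by a single threshold on r, which a linear model can learn.
r = (X ** 2).sum(axis=1, keepdims=True)
X_aug = np.hstack([X, r])
aug_acc = LogisticRegression().fit(X_aug, y).score(X_aug, y)

print(f"raw features: {raw_acc:.2f}, with X^2+Y^2: {aug_acc:.2f}")
```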





