Free Professional Data Engineer Exam Braindumps (page: 35)


You have Cloud Functions written in Node.js that pull messages from Cloud Pub/Sub and send the data to BigQuery. You observe that the message processing rate on the Pub/Sub topic is orders of magnitude higher than anticipated, but there is no error logged in Stackdriver Log Viewer.
What are the two most likely causes of this problem? Choose 2 answers.

  A. Publisher throughput quota is too small.
  B. Total outstanding messages exceed the 10-MB maximum.
  C. Error handling in the subscriber code is not handling run-time errors properly.
  D. The subscriber code cannot keep up with the messages.
  E. The subscriber code does not acknowledge the messages that it pulls.

Answer(s): C,E
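
Why C and E: a run-time error that is swallowed by the subscriber's error handling produces no log entry and, critically, no acknowledgement. Pub/Sub redelivers every message that is not acked before the ack deadline, so the same messages are processed again and again, inflating the processing rate. The question's functions are Node.js, but the ack semantics are the same in any client; below is a minimal ack-on-success sketch in Python using the google-cloud-pubsub client (the project, subscription, and BigQuery helper names are hypothetical):

```python
import logging

from google.cloud import pubsub_v1

PROJECT_ID = "my-project"           # hypothetical project
SUBSCRIPTION_ID = "bq-loader-sub"   # hypothetical subscription

def insert_into_bigquery(payload: bytes) -> None:
    """Hypothetical stand-in for the real BigQuery insert logic."""
    if not payload:
        raise ValueError("empty message payload")

def callback(message) -> None:
    try:
        insert_into_bigquery(message.data)
        message.ack()  # ack ONLY after the work has succeeded
    except Exception:
        # Surface the failure in Cloud Logging instead of swallowing it,
        # and nack so redelivery is deliberate rather than silent.
        logging.exception("Failed to process message %s", message.message_id)
        message.nack()

subscriber = pubsub_v1.SubscriberClient()
path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)
future = subscriber.subscribe(path, callback=callback)

try:
    future.result()  # block until cancelled or an unrecoverable error
except KeyboardInterrupt:
    future.cancel()
```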



Your team is responsible for developing and maintaining ETLs in your company. One of your Dataflow jobs is failing because of errors in the input data, and you need to improve the reliability of the pipeline (including being able to reprocess all failing data).

What should you do?

  A. Add a filtering step to skip these types of errors in the future, extract erroneous rows from logs.
  B. Add a try... catch block to your DoFn that transforms the data, extract erroneous rows from logs.
  C. Add a try... catch block to your DoFn that transforms the data, write erroneous rows to Pub/Sub directly from the DoFn.
  D. Add a try... catch block to your DoFn that transforms the data, use a sideOutput to create a PCollection that can be stored to Pub/Sub later.

Answer(s): D
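
Answer D is the documented dead-letter pattern: the try/catch keeps the pipeline alive, and the side output collects every failing element into its own PCollection, which can be persisted and replayed later. A minimal Apache Beam sketch in Python (the transform and sample data are hypothetical):

```python
import apache_beam as beam
from apache_beam import pvalue

DEAD_LETTER_TAG = "dead_letter"

def transform(row: str) -> int:
    """Hypothetical transform that fails on malformed input."""
    return int(row)

class TransformWithDeadLetter(beam.DoFn):
    def process(self, element):
        try:
            yield transform(element)
        except Exception as err:
            # Route the bad element to a side output instead of crashing
            # the whole job; it can be stored and reprocessed later.
            yield pvalue.TaggedOutput(DEAD_LETTER_TAG, (element, str(err)))

with beam.Pipeline() as p:
    results = (
        p
        | "Read" >> beam.Create(["1", "2", "not-a-number", "4"])
        | "Transform" >> beam.ParDo(TransformWithDeadLetter()).with_outputs(
            DEAD_LETTER_TAG, main="ok"
        )
    )
    results.ok | "PrintOk" >> beam.Map(print)
    # results[DEAD_LETTER_TAG] holds (element, error) pairs; write this
    # PCollection to Pub/Sub, GCS, or BigQuery for later reprocessing.
    results[DEAD_LETTER_TAG] | "PrintFailed" >> beam.Map(print)
```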



Your company is migrating its on-premises data warehousing solution to BigQuery. The existing data warehouse uses trigger-based change data capture (CDC) to apply daily updates from transactional database sources. Your company wants to use BigQuery to improve its handling of CDC and to optimize the performance of the data warehouse. Source system changes must be available for query in near-real time using log-based CDC streams. You need to ensure that changes in the BigQuery reporting table are available with minimal latency and reduced overhead.
What should you do? Choose 2 answers.

  A. Perform a DML INSERT, UPDATE, or DELETE to replicate each CDC record in the reporting table in real time.
  B. Periodically DELETE outdated records from the reporting table. Periodically use a DML MERGE to simultaneously perform DML INSERT, UPDATE, and DELETE operations in the reporting table.
  C. Insert each new CDC record and corresponding operation type into a staging table in real time.
  D. Insert each new CDC record and corresponding operation type into the reporting table in real time, and use a materialized view to expose only the current version of each unique record.

Answer(s): B,D
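
Answers B and D describe the two sides of a common BigQuery CDC pattern: stream every change record as a cheap, low-latency append, expose the latest version per key through a (materialized) view, and periodically consolidate with a single MERGE rather than one DML statement per change. A hedged sketch of the periodic MERGE using the google-cloud-bigquery client (dataset, table, and column names such as pk, op, change_ts, and payload are assumptions):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Consolidate buffered CDC rows into the reporting table in one statement.
# `op` is assumed to be 'I'/'U'/'D'; only the newest row per key is applied.
merge_sql = """
MERGE `my_dataset.reporting` AS r
USING (
  SELECT * EXCEPT (rn)
  FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY pk ORDER BY change_ts DESC) AS rn
    FROM `my_dataset.cdc_staging`
  )
  WHERE rn = 1
) AS s
ON r.pk = s.pk
WHEN MATCHED AND s.op = 'D' THEN
  DELETE
WHEN MATCHED THEN
  UPDATE SET payload = s.payload, change_ts = s.change_ts
WHEN NOT MATCHED AND s.op != 'D' THEN
  INSERT (pk, payload, change_ts) VALUES (s.pk, s.payload, s.change_ts)
"""

client.query(merge_sql).result()  # schedule this, e.g. via Cloud Scheduler
```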



A live TV show asks viewers to cast votes using their mobile phones. The event generates a large volume of data during a 3-minute period. You are in charge of the voting infrastructure and must ensure that the platform can handle the load and that all votes are processed. You must display partial results while voting is open. After voting closes, you need to count the votes exactly once while optimizing cost.
What should you do?

  A. Create a Memorystore instance with a high availability (HA) configuration.
  B. Write votes to a Pub/Sub topic and have Cloud Functions subscribe to it and write votes to BigQuery.
  C. Write votes to a Pub/Sub topic and load into both Bigtable and BigQuery via a Dataflow pipeline. Query Bigtable for real-time results and BigQuery for later analysis. Shut down the Bigtable instance when voting concludes.
  D. Create a Cloud SQL for PostgreSQL database with a high availability (HA) configuration and multiple read replicas.

Answer(s): C
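
Answer C's pipeline fans out from one Pub/Sub topic to two sinks: Bigtable serves the low-latency partial results while voting is open (and can be shut down afterwards to save cost), while BigQuery keeps the complete record for the exact final count. A hedged Beam sketch in Python (topic, instance, table, and field names are assumptions; exactly-once counting is done in BigQuery, e.g. with COUNT(DISTINCT vote_id), since streaming delivery is at-least-once):

```python
import json

import apache_beam as beam
from apache_beam.io.gcp.bigtableio import WriteToBigTable
from apache_beam.options.pipeline_options import PipelineOptions
from google.cloud.bigtable.row import DirectRow

def parse_vote(data: bytes) -> dict:
    return json.loads(data.decode("utf-8"))

def to_bigtable_row(vote: dict) -> DirectRow:
    row = DirectRow(row_key=vote["vote_id"].encode())
    row.set_cell("votes", b"candidate", vote["candidate"].encode())
    return row

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    votes = (
        p
        | "Read" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/votes")
        | "Parse" >> beam.Map(parse_vote)
    )

    # Branch 1: Bigtable for partial results while voting is open.
    (votes
     | "ToRow" >> beam.Map(to_bigtable_row)
     | "WriteBigtable" >> WriteToBigTable(
         project_id="my-project", instance_id="votes-bt", table_id="votes"))

    # Branch 2: BigQuery for the exact count after voting closes
    # (deduplicate on vote_id when counting).
    (votes
     | "WriteBQ" >> beam.io.WriteToBigQuery(
         "my-project:votes.raw_votes",
         schema="vote_id:STRING,candidate:STRING",
         write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
         create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))
```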





