QUESTION: 5 Exam Topic: 1, Main Questions Set A

An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?

A. Use federated data sources, and check data in the SQL query.
B. Enable BigQuery monitoring in Google Stackdriver and create an alert.
C. Import the data into BigQuery using the gcloud CLI and set max_bad_records to 0.
D. Run a Google Cloud Dataflow batch pipeline to import the data into BigQuery, and push errors to another dead-letter table for analysis.

Answer(s): D
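
The dead-letter idea behind option D can be illustrated without Dataflow. The sketch below is plain Python, not a Dataflow or Apache Beam pipeline: it only shows how rows that fail parsing are diverted to a separate collection instead of failing the whole load. The three-column schema, the function name split_rows, and the sample data are all assumptions made for illustration.

```python
import csv
import io


def split_rows(csv_text, expected_columns):
    """Separate parseable rows from malformed ones.

    Returns (good_rows, dead_letters). Each dead letter keeps the raw
    fields and a reason so it can be inspected later.
    """
    good_rows, dead_letters = [], []
    reader = csv.reader(io.StringIO(csv_text))
    for line_no, fields in enumerate(reader, start=1):
        if len(fields) != expected_columns:
            dead_letters.append(
                {"line": line_no, "raw": fields, "error": "wrong column count"}
            )
            continue
        try:
            # Hypothetical schema: (customer_id: int, amount: float, note: str)
            good_rows.append((int(fields[0]), float(fields[1]), fields[2]))
        except ValueError as exc:
            dead_letters.append({"line": line_no, "raw": fields, "error": str(exc)})
    return good_rows, dead_letters


# Made-up input: line 2 has a bad number, line 3 is missing a column.
sample = "1,9.50,ok\n2,not-a-number,bad amount\n3,4.25\n4,7.00,ok\n"
good, dead = split_rows(sample, expected_columns=3)
```

In a real pipeline the good rows would go to the main BigQuery table and the dead letters to a second table, so the bad rows can be analyzed without blocking the daily load.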

QUESTION: 6 Exam Topic: 1, Main Questions Set A

Your weather app queries a database every 15 minutes to get the current temperature. The frontend is powered by Google App Engine and serves millions of users. How should you design the frontend to respond to a database failure?

A. Issue a command to restart the database servers.
B. Retry the query with exponential backoff, up to a cap of 15 minutes.
C. Retry the query every second until it comes back online to minimize staleness of data.
D. Reduce the query frequency to once every hour until the database comes back online.

Answer(s): B

Explanation:

https://cloud.google.com/sql/docs/mysql/manage-connections#backoff
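
Capped exponential backoff can be sketched as follows. This is a minimal illustration rather than App Engine or Cloud SQL client code: the function names, the 1-second base delay, the doubling factor, the attempt limit, and the use of ConnectionError as the failure signal are all assumptions. The only value taken from the question is the 15-minute cap (900 seconds).

```python
import random
import time


def backoff_delays(base=1.0, cap=900.0, factor=2.0):
    """Yield the wait before each retry: base, base*factor, ... up to cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def query_with_backoff(run_query, max_attempts=12,
                       sleep=time.sleep, jitter=random.random):
    """Call run_query until it succeeds or the attempts run out."""
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except ConnectionError:
            if attempt == max_attempts:
                raise
            # Jitter in [0, 1) seconds keeps many clients from retrying in lockstep.
            sleep(next(delays) + jitter())
```

With these assumed defaults the waits grow 1, 2, 4, ... 512 seconds and then stay at 900 seconds, so a recovering database is not flooded by millions of clients retrying every second.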

QUESTION: 7 Exam Topic: 1, Main Questions Set A

You are creating a model to predict housing prices. Due to budget constraints, you must run it on a single resource-constrained virtual machine.
Which learning algorithm should you use?

A. Linear regression
B. Logistic classification
C. Recurrent neural network
D. Feedforward neural network

Answer(s): A
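
The source gives no explanation for this answer. One plausible reading is that a price is a continuous value, so this is a regression problem, and a linear model is cheap enough to fit on a small machine. The sketch below fits a one-feature least-squares line in plain Python. The function name fit_line and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Made-up data: floor area in square metres vs. price in thousands.
areas = [50, 70, 90, 110]
prices = [150, 210, 270, 330]
slope, intercept = fit_line(areas, prices)
predicted = slope * 100 + intercept  # estimated price for a 100 m^2 home
```

Fitting takes a single pass over the data and stores only two parameters, which is the kind of footprint a resource-constrained VM can handle.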

QUESTION: 8 Exam Topic: 1, Main Questions Set A

You are building a new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will only be sent in once, but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data. Which query type should you use?

A. Include ORDER BY DESC on timestamp column and LIMIT to 1.
B. Use GROUP BY on the unique ID column and timestamp column and SUM on the values.
C. Use the LAG window function with PARTITION by unique ID along with WHERE LAG IS NOT NULL.
D. Use the ROW_NUMBER window function with PARTITION by unique ID along with WHERE row equals 1.

Answer(s): D

Explanation:

https://cloud.google.com/bigquery/docs/reference/standard-sql/analytic-function-concepts
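
The ROW_NUMBER pattern from option D can be tried locally. The sketch below uses Python's built-in sqlite3 module as a stand-in for BigQuery so that it runs without a cloud project. It assumes SQLite 3.25 or newer, which added window functions. The table name events and the columns unique_id, event_ts, and value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (unique_id TEXT, event_ts TEXT, value REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("a", "2024-01-01T00:00:00", 1.0),
        ("a", "2024-01-01T00:00:00", 1.0),  # duplicate delivery of row "a"
        ("b", "2024-01-01T00:05:00", 2.0),
        ("c", "2024-01-01T00:10:00", 3.0),
        ("c", "2024-01-01T00:10:00", 3.0),  # duplicate delivery of row "c"
    ],
)

# Number the rows within each unique_id, then keep only the first of each.
DEDUP_QUERY = """
SELECT unique_id, event_ts, value
FROM (
  SELECT unique_id, event_ts, value,
         ROW_NUMBER() OVER (
           PARTITION BY unique_id ORDER BY event_ts DESC
         ) AS row_num
  FROM events
) AS numbered
WHERE row_num = 1
ORDER BY unique_id
"""
deduplicated = conn.execute(DEDUP_QUERY).fetchall()
```

Each unique ID comes back exactly once, however many times the row was streamed. The same SELECT shape should carry over to BigQuery standard SQL, though that is not verified here.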

Free Google Professional Data Engineer Exam Braindumps (page: 3)

Professional Data Engineer Exam Discussions & Posts