QUESTION: 29 Exam Topic: 1, Main Questions Set A

Your company is streaming real-time sensor data from their factory floor into Bigtable and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?

A. Use a row key of the form <timestamp>.
B. Use a row key of the form <sensorid>.
C. Use a row key of the form <timestamp>#<sensorid>.
D. Use a row key of the form <sensorid>#<timestamp>.

Answer(s): D
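
The effect of the leading row key field can be sketched without a Bigtable client. Bigtable stores rows sorted lexicographically by row key, so a key that starts with the timestamp places every new write at the tail of the key space, while a key that starts with the sensor ID spreads new writes across the table and keeps each sensor's readings contiguous for dashboard range scans. The sensor IDs, timestamps, and helper names below are hypothetical, and plain Python sorting stands in for the table's key order.

```python
# Hypothetical sample data: three sensors, four consecutive readings each.
SENSORS = ["s01", "s02", "s03"]
TIMESTAMPS = [1700000000, 1700000001, 1700000002, 1700000003]


def timestamp_first_key(sensor_id: str, ts: int) -> str:
    """Row key of the form <timestamp>#<sensorid>."""
    return f"{ts}#{sensor_id}"


def sensor_first_key(sensor_id: str, ts: int) -> str:
    """Row key of the form <sensorid>#<timestamp>."""
    return f"{sensor_id}#{ts}"


def latest_write_positions(make_key) -> list[int]:
    """Positions, in sorted key order, of the rows written at the newest timestamp."""
    keys = sorted(make_key(s, t) for s in SENSORS for t in TIMESTAMPS)
    latest = TIMESTAMPS[-1]
    return [keys.index(make_key(s, latest)) for s in SENSORS]


def prefix_scan(make_key, sensor_id: str) -> list[str]:
    """Keys a dashboard would read for one sensor using a key-prefix scan."""
    keys = sorted(make_key(s, t) for s in SENSORS for t in TIMESTAMPS)
    return [k for k in keys if k.startswith(sensor_id + "#")]


if __name__ == "__main__":
    # Timestamp-first: the newest rows are adjacent at the end of the key space.
    print(latest_write_positions(timestamp_first_key))
    # Sensor-first: the newest rows are spread across the key space.
    print(latest_write_positions(sensor_first_key))
    # Sensor-first also allows a contiguous per-sensor read.
    print(prefix_scan(sensor_first_key, "s02"))
```

With timestamp-first keys a per-sensor prefix scan returns nothing, because no key begins with a sensor ID; a dashboard would have to read every row in the time range and filter.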

QUESTION: 30 Exam Topic: 1, Main Questions Set A

Your company's customer and order databases are often under heavy load. This makes performing analytics against them difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump. You want to perform analytics with minimal impact on operations.
What should you do?

A. Add a node to the MySQL cluster and build an OLAP cube there.
B. Use an ETL tool to load the data from MySQL into Google BigQuery.
C. Connect an on-premises Apache Hadoop cluster to MySQL and perform ETL.
D. Mount the backups to Google Cloud SQL, and then process the data using Google Cloud Dataproc.

Answer(s): C

QUESTION: 31 Exam Topic: 1, Main Questions Set A

You have a Google Cloud Dataflow streaming pipeline running with a Google Cloud Pub/Sub subscription as the source. You need to make an update to the code that will make the new Cloud Dataflow pipeline incompatible with the current version. You do not want to lose any data when making this update.
What should you do?

A. Update the current pipeline and use the drain flag.
B. Update the current pipeline and provide the transform mapping JSON object.
C. Create a new pipeline that has the same Cloud Pub/Sub subscription and cancel the old pipeline.
D. Create a new pipeline that has a new Cloud Pub/Sub subscription and cancel the old pipeline.

Answer(s): A

QUESTION: 32 Exam Topic: 1, Main Questions Set A

Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine learning model in Google Cloud Bigtable. The team is observing suboptimal performance with reads and writes of their initial load of 10 TB of data. They want to improve this performance while minimizing cost.
What should they do?

A. Redefine the schema by evenly distributing reads and writes across the row space of the table.
B. The performance issue should be resolved over time as the size of the Bigtable cluster is increased.
C. Redesign the schema to use a single row key to identify values that need to be updated frequently in the cluster.
D. Redesign the schema to use row keys based on numeric IDs that increase sequentially per user viewing the offers.

Answer(s): A
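
What "evenly distributing reads and writes across the row space" means can be illustrated with a rough simulation. The sketch below splits a made-up key space into four contiguous ranges (a stand-in for how a sorted table is partitioned) and counts how many writes land in each range for two key designs. The range count, the key formats, and the hashed prefix are all assumptions chosen for illustration; a hashed prefix is only one simple way to spread keys, and it gives up meaningful range scans, so it is not presented here as the recommended schema.

```python
import hashlib
from collections import Counter

NUM_RANGES = 4  # hypothetical number of contiguous key ranges


def key_range(row_key: str) -> int:
    """Assign a key to a contiguous range using its first hex digit."""
    return int(row_key[0], 16) * NUM_RANGES // 16


def sequential_key(n: int) -> str:
    """Zero-padded, sequentially increasing numeric ID."""
    return f"{n:010d}"


def spread_key(n: int) -> str:
    """Same ID with a short hash prefix so neighbouring IDs sort far apart."""
    prefix = hashlib.md5(str(n).encode()).hexdigest()[:4]
    return f"{prefix}#{n:010d}"


def load_per_range(make_key, count: int = 1000) -> Counter:
    """Number of writes landing in each key range for `count` consecutive IDs."""
    return Counter(key_range(make_key(n)) for n in range(count))


if __name__ == "__main__":
    print("sequential:", dict(load_per_range(sequential_key)))
    print("spread:    ", dict(load_per_range(spread_key)))
```

In this simulation every sequential key falls into the same range, while the prefixed keys land in all four, which is the kind of imbalance the question's schema redesign is meant to remove.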

Free Professional Data Engineer Exam Braindumps (page: 19)