Free Professional Data Engineer Exam Braindumps (page: 5)

Page 5 of 68

You want to process payment transactions in a point-of-sale application that will run on Google Cloud Platform. Your user base could grow exponentially, but you do not want to manage infrastructure scaling.

Which Google database service should you use?

  A. Cloud SQL
  B. BigQuery
  C. Cloud Bigtable
  D. Cloud Datastore

Answer(s): A

Your software uses a simple JSON format for all messages. These messages are published to Google Cloud Pub/Sub, then processed with Google Cloud Dataflow to create a real-time dashboard for the CFO. During testing, you notice that some messages are missing in the dashboard. You check the logs, and all messages are being published to Cloud Pub/Sub successfully.
What should you do next?

  A. Check the dashboard application to see if it is not displaying correctly.
  B. Run a fixed dataset through the Cloud Dataflow pipeline and analyze the output.
  C. Use Google Stackdriver Monitoring on Cloud Pub/Sub to find the missing messages.
  D. Switch Cloud Dataflow to pull messages from Cloud Pub/Sub instead of Cloud Pub/Sub pushing messages to Cloud Dataflow.

Answer(s): B
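
As a rough illustration of option B, the sketch below pushes a small fixed dataset through a single parsing step and reports which inputs never reach the output. It is plain Python, not Apache Beam or Cloud Dataflow code; `parse_message`, `run_fixed_dataset`, and the sample messages are hypothetical names and data invented for this example.

```python
import json


def parse_message(raw):
    """Hypothetical parsing step: return a dict, or None if the payload is not a JSON object."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def run_fixed_dataset(raw_messages):
    """Run every message through the step and keep track of the ones that are dropped."""
    parsed, dropped = [], []
    for raw in raw_messages:
        record = parse_message(raw)
        if record is None:
            dropped.append(raw)
        else:
            parsed.append(record)
    return parsed, dropped


# A known, fixed input: because the input is fixed, the expected output is known too.
FIXED_DATASET = [
    '{"id": 1, "amount": 10.5}',
    '{"id": 2, "amount": 7}',
    '{"id": 3, "amount": }',      # deliberately malformed JSON
    '{"id": 4, "amount": 3.25}',
]

parsed, dropped = run_fixed_dataset(FIXED_DATASET)
print(f"{len(parsed)} parsed, {len(dropped)} dropped")
for raw in dropped:
    print("dropped:", raw)
```

With a fixed input, any gap between what went in and what came out points at the pipeline logic rather than at publishing or at the dashboard.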

Your company handles data processing for a number of different clients. Each client prefers to use their own suite of analytics tools, with some allowing direct query access via Google BigQuery. You need to secure the data so that clients cannot see each other's data. You want to ensure appropriate access to the data.
Which three steps should you take? (Choose three.)

  A. Load data into different partitions.
  B. Load data into a different dataset for each client.
  C. Put each client's BigQuery dataset into a different table.
  D. Restrict a client's dataset to approved users.
  E. Only allow a service account to access the datasets.
  F. Use the appropriate identity and access management (IAM) roles for each client's users.

Answer(s): B,D,F
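
The idea behind B, D, and F can be pictured with a toy model: one dataset per client, each with its own list of approved users and roles. This is only a sketch in plain Python and does not call the BigQuery or IAM APIs; the dataset names, user addresses, role labels, and the `can_query` helper are all hypothetical.

```python
# Toy model only: one dataset per client, each with its own approved users and roles.
DATASET_ACCESS = {
    "client_a_dataset": {"alice@client-a.example": "READER"},
    "client_b_dataset": {"bob@client-b.example": "READER"},
}


def can_query(user, dataset):
    """Return True only if the user has been granted a role on that specific dataset."""
    return user in DATASET_ACCESS.get(dataset, {})


print(can_query("alice@client-a.example", "client_a_dataset"))  # approved user, own dataset
print(can_query("alice@client-a.example", "client_b_dataset"))  # no grant on the other client's dataset
```

Because access is granted per dataset, a user approved for one client has no path to another client's data unless a role is explicitly added there.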

You are building a new real-time data warehouse for your company and will use Google BigQuery streaming inserts. There is no guarantee that data will be sent only once, but you do have a unique ID for each row of data and an event timestamp. You want to ensure that duplicates are not included while interactively querying data.
Which query type should you use?

  A. Include ORDER BY DESC on timestamp column and LIMIT to 1.
  B. Use GROUP BY on the unique ID column and timestamp column and SUM on the values.
  C. Use the LAG window function with PARTITION BY unique ID along with WHERE LAG IS NOT NULL.
  D. Use the ROW_NUMBER window function with PARTITION BY unique ID along with WHERE row equals 1.

Answer(s): D
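
The ROW_NUMBER approach can be sketched as follows. To keep the example runnable without a cloud project, it uses Python's built-in `sqlite3` module (SQLite 3.25 or later is needed for window functions) as a stand-in for BigQuery; the `events` table, its columns, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for BigQuery here (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (unique_id TEXT, event_ts TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("a1", "2023-01-01T10:00:00", 10.0),
        ("a1", "2023-01-01T10:00:00", 10.0),  # duplicate delivery of a1
        ("b2", "2023-01-01T10:05:00", 25.0),
        ("c3", "2023-01-01T10:07:00", 5.0),
        ("c3", "2023-01-01T10:07:00", 5.0),   # duplicate delivery of c3
    ],
)

# Number the rows within each unique_id, then keep only the first row of each group.
DEDUP_QUERY = """
SELECT unique_id, event_ts, amount
FROM (
    SELECT
        unique_id,
        event_ts,
        amount,
        ROW_NUMBER() OVER (PARTITION BY unique_id ORDER BY event_ts DESC) AS row_num
    FROM events
)
WHERE row_num = 1
ORDER BY unique_id
"""

deduped = conn.execute(DEDUP_QUERY).fetchall()
for row in deduped:
    print(row)
```

Each unique ID forms its own partition, so exactly one row per ID survives the `row_num = 1` filter, however many times that row was streamed in.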

