Free Professional Data Engineer Exam Braindumps (page: 6)


You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users' privacy?

  A. Grant the consultant the Viewer role on the project.
  B. Grant the consultant the Cloud Dataflow Developer role on the project.
  C. Create a service account and allow the consultant to log on with it.
  D. Create an anonymized sample of the data for the consultant to work with in a different project.

Answer(s): D
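Anonymizing a sample before sharing it is the only option here that keeps the consultant from ever seeing real user data. A minimal sketch of that step, assuming a hypothetical CSV schema with `user_id`, `name`, and `email` columns and a salted-hash pseudonymization scheme (none of which appear in the exam scenario):

```python
import csv
import hashlib
import io

# Columns assumed (hypothetically) to hold direct identifiers.
PII_COLUMNS = {"name", "email"}

def anonymize_csv(raw_csv: str, salt: str = "project-salt") -> str:
    """Drop PII columns and replace user_id with a salted hash,
    producing a sample safe to copy into a separate project."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    kept = [c for c in reader.fieldnames if c not in PII_COLUMNS]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=kept, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row = {k: v for k, v in row.items() if k in kept}
        if "user_id" in row:
            # Salted hash: stable join key, but not reversible to the real ID.
            row["user_id"] = hashlib.sha256(
                (salt + row["user_id"]).encode()).hexdigest()[:16]
        writer.writerow(row)
    return out.getvalue()
```

The anonymized output would then be loaded into a different project that the consultant has access to, leaving the original project untouched.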



Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to re-use Hadoop jobs they have already created and minimize the management of the cluster as much as possible. They also want to be able to persist data beyond the life of the cluster.
What should you do?

  A. Create a Google Cloud Dataflow job to process the data.
  B. Create a Google Cloud Dataproc cluster that uses persistent disks for HDFS.
  C. Create a Hadoop cluster on Google Compute Engine that uses persistent disks.
  D. Create a Cloud Dataproc cluster that uses the Google Cloud Storage connector.
  E. Create a Hadoop cluster on Google Compute Engine that uses Local SSD disks.

Answer(s): D
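The Cloud Storage connector is what lets existing Hadoop jobs keep running unchanged while the data outlives the cluster: jobs read and write `gs://` URIs instead of `hdfs://` ones. Typically the only change a migrated job needs is rewriting its input/output paths, sketched here with a hypothetical helper (the bucket name and paths are illustrative):

```python
def hdfs_to_gcs(path: str, bucket: str) -> str:
    """Rewrite an hdfs:// job path to a gs:// path (illustrative only).

    With the Cloud Storage connector (installed by default on Dataproc),
    a Hadoop job can consume the returned gs:// URI directly, and the
    data persists after the cluster is deleted.
    """
    prefix = "hdfs://"
    if not path.startswith(prefix):
        return path  # already a gs:// or local path
    # Strip the scheme and the namenode authority, keep the file path.
    rest = path[len(prefix):]
    _, _, file_path = rest.partition("/")
    return f"gs://{bucket}/{file_path}"
```

This is also why option B is weaker: persistent disks keep HDFS alive only as long as the cluster exists, while Cloud Storage decouples data lifetime from cluster lifetime.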



You have a Google Cloud Dataflow streaming pipeline running with a Google Cloud Pub/Sub subscription as the source. You need to make an update to the code that will make the new Cloud Dataflow pipeline incompatible with the current version. You do not want to lose any data when making this update.
What should you do?

  A. Update the current pipeline and use the drain flag.
  B. Update the current pipeline and provide the transform mapping JSON object.
  C. Create a new pipeline that has the same Cloud Pub/Sub subscription and cancel the old pipeline.
  D. Create a new pipeline that has a new Cloud Pub/Sub subscription and cancel the old pipeline.

Answer(s): D
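The reason a new subscription avoids data loss: every subscription on a Pub/Sub topic receives its own copy of each message published after the subscription is created, so attaching the new pipeline to a fresh subscription before cancelling the old one means nothing published during the cutover is dropped. A toy in-memory simulation of that fan-out (not the real Pub/Sub API; `FakeTopic` and the subscription names are invented for illustration):

```python
from collections import defaultdict

class FakeTopic:
    """Minimal stand-in for a Pub/Sub topic with per-subscription queues."""
    def __init__(self):
        self.queues = defaultdict(list)

    def subscribe(self, name: str):
        self.queues[name]  # creating the key creates the queue

    def publish(self, msg: str):
        # Every existing subscription gets its own copy of the message.
        for q in self.queues.values():
            q.append(msg)

topic = FakeTopic()
topic.subscribe("old-sub")     # the running pipeline's subscription
topic.publish("m1")            # arrives while only the old pipeline runs

topic.subscribe("new-sub")     # new pipeline attaches BEFORE cutover
topic.publish("m2")            # delivered to both subscriptions

old_seen = list(topic.queues["old-sub"])  # old pipeline processes these
topic.queues.pop("old-sub")               # then the old pipeline is cancelled
topic.publish("m3")                       # only the new pipeline sees this

new_seen = topic.queues["new-sub"]
```

Between them, the two pipelines see every message; reusing the same subscription (option C) would instead race the two pipelines against one queue during cutover.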



An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?

  A. Use federated data sources, and check data in the SQL query.
  B. Enable BigQuery monitoring in Google Stackdriver and create an alert.
  C. Import the data into BigQuery using the bq command-line tool and set max_bad_records to 0.
  D. Run a Google Cloud Dataflow batch pipeline to import the data into BigQuery, and push errors to another dead-letter table for analysis.

Answer(s): D
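The dead-letter pattern validates each row, writes well-formed rows to the main BigQuery table, and routes corrupted rows, together with the failure reason, to a separate table for later analysis. A pure-Python sketch of that branching logic (in a real Beam pipeline this would be a DoFn with tagged outputs; the three-column schema is an assumption):

```python
# Assumed schema: user_id (int), event (str), amount (float).
EXPECTED_FIELDS = 3

def route_rows(lines):
    """Split raw CSV lines into (good_rows, dead_letter), mimicking a
    Dataflow pipeline with a tagged side output for bad records."""
    good, dead = [], []
    for line in lines:
        fields = line.split(",")
        if len(fields) != EXPECTED_FIELDS:
            dead.append({"raw": line, "error": "wrong field count"})
            continue
        try:
            good.append({"user_id": int(fields[0]),
                         "event": fields[1],
                         "amount": float(fields[2])})
        except ValueError as exc:
            # Keep the original line so the bad record can be inspected.
            dead.append({"raw": line, "error": str(exc)})
    return good, dead
```

Nothing is discarded: every input row lands in one of the two outputs, which is why this beats rejecting the whole load (option C) or only discovering problems at query time (option A).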





