Free Professional Data Engineer Exam Braindumps (page: 36)

Page 36 of 68

You are implementing security best practices on your data pipeline. Currently, you are manually executing jobs as the Project Owner. You want to automate these jobs by taking nightly batch files containing non-public information from Google Cloud Storage,
processing them with a Spark Scala job on a Google Cloud Dataproc cluster, and depositing the results into Google BigQuery.

How should you securely run this workload?

  1. Restrict the Google Cloud Storage bucket so only you can see the files
  2. Grant the Project Owner role to a service account, and run the job with it
  3. Use a service account with the ability to read the batch files and to write to BigQuery
  4. Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files and write to BigQuery

Answer(s): C

Explanation:
Security best practice is to run automated jobs with a dedicated service account that holds only the permissions the workload needs: read access to the batch files in Cloud Storage and write access to BigQuery. Granting the Project Owner role to a service account (option B) violates the principle of least privilege.
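
For illustration, a minimal sketch of that setup with the google-cloud-dataproc Python client, using hypothetical names throughout (the batch-etl service account, the nightly-etl cluster, the job class and jar path): the cluster is created to run as a least-privilege service account, and the nightly Spark Scala job is submitted to it instead of being executed by the Project Owner.

    from google.cloud import dataproc_v1

    PROJECT = "my-project"   # hypothetical project ID
    REGION = "us-central1"   # hypothetical region
    ENDPOINT = {"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}

    # Create the cluster so its VMs run as a dedicated, least-privilege
    # service account instead of a human identity or Project Owner.
    cluster_client = dataproc_v1.ClusterControllerClient(client_options=ENDPOINT)
    cluster_client.create_cluster(
        request={
            "project_id": PROJECT,
            "region": REGION,
            "cluster": {
                "project_id": PROJECT,
                "cluster_name": "nightly-etl",
                "config": {
                    "gce_cluster_config": {
                        # Hypothetical service account; granted only read access
                        # to the batch bucket and write access to the dataset.
                        "service_account": "batch-etl@my-project.iam.gserviceaccount.com",
                    },
                },
            },
        }
    ).result()

    # The nightly Spark Scala job then runs with the cluster's service
    # account credentials when reading Cloud Storage and writing BigQuery.
    job_client = dataproc_v1.JobControllerClient(client_options=ENDPOINT)
    job_client.submit_job_as_operation(
        request={
            "project_id": PROJECT,
            "region": REGION,
            "job": {
                "placement": {"cluster_name": "nightly-etl"},
                "spark_job": {
                    "main_class": "com.example.NightlyBatch",              # hypothetical class
                    "jar_file_uris": ["gs://my-bucket/jobs/nightly.jar"],  # hypothetical jar
                },
            },
        }
    ).result()

In practice the service account would be granted only narrow roles such as read access on the batch bucket and write access on the target BigQuery dataset, which is what distinguishes option C from option B.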



You want to rebuild your batch pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over twelve hours to run. To expedite development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting speed and processing requirements?

  1. Convert your PySpark commands into SparkSQL queries to transform the data; and then run your pipeline
    on Dataproc to write the data into BigQuery
  2. Ingest your data into Cloud SQL, convert your PySpark commands into SparkSQL queries to transform the
    data, and then use federated queries from BigQuery for machine learning.
  3. Ingest your data into BigQuery from Cloud Storage, convert your PySpark commands into BigQuery SQL
    queries to transform the data, and then write the transformations to a new table
  4. Use Apache Beam Python SDK to build the transformation pipelines, and write the data into BigQuery

Answer(s): C

Explanation:
BigQuery is the serverless option that supports SQL syntax. Loading the raw files from Cloud Storage into BigQuery and rewriting the PySpark transformations as BigQuery SQL removes cluster management entirely, whereas option A still runs on a Dataproc cluster, which is not serverless.
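
A rough sketch of that flow with the google-cloud-bigquery Python client, assuming Parquet source files and placeholder project, bucket, and table names: load the raw files from Cloud Storage into a staging table, then rewrite the PySpark logic as BigQuery SQL that materializes a new table.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-project")  # hypothetical project

    # 1) Load the raw structured files from Cloud Storage into a staging table
    #    (file format assumed to be Parquet here).
    load_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.PARQUET,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    client.load_table_from_uri(
        "gs://my-bucket/raw/*.parquet",       # hypothetical path
        "my-project.analytics.raw_events",    # hypothetical staging table
        job_config=load_config,
    ).result()

    # 2) Re-express the former PySpark transformations as BigQuery SQL and
    #    materialize the result in a new table.
    client.query(
        """
        CREATE OR REPLACE TABLE `my-project.analytics.daily_summary` AS
        SELECT user_id, DATE(event_ts) AS day, COUNT(*) AS events
        FROM `my-project.analytics.raw_events`
        GROUP BY user_id, day
        """
    ).result()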



You are operating a streaming Cloud Dataflow pipeline. Your engineers have a new version of the pipeline with a different windowing algorithm and triggering strategy. You
want to update the running pipeline with the new version. You want to ensure that no data is lost during the update.
What should you do?

  1. Update the Cloud Dataflow pipeline inflight by passing the --update option with the --jobName set to the existing job name
  2. Update the Cloud Dataflow pipeline inflight by passing the --update option with the --jobName set to a new unique job name
  3. Stop the Cloud Dataflow pipeline with the Cancel option. Create a new Cloud Dataflow job with the updated code
  4. Stop the Cloud Dataflow pipeline with the Drain option. Create a new Cloud Dataflow job with the updated code

Answer(s): A

Explanation:
Passing --update with --jobName set to the name of the running job replaces the job in place. Dataflow transfers intermediate state and buffered, in-flight data from the old job to the replacement job, so no data is lost during the update.
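For illustration, a hedged Beam Python sketch of relaunching the updated code as an in-place update; the Pub/Sub topic, project, bucket, and job name are placeholders, and the key points are that job_name must match the running job and update=True requests the replacement.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
    from apache_beam.transforms.window import FixedWindows
    from apache_beam.transforms.trigger import AfterWatermark, AccumulationMode

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",                # hypothetical
        region="us-central1",                # hypothetical
        temp_location="gs://my-bucket/tmp",  # hypothetical
        streaming=True,
        job_name="clickstream-agg",          # must equal the running job's name
        update=True,                         # replace the running job in place
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
            | "NewWindowing" >> beam.WindowInto(
                FixedWindows(60),                        # new windowing algorithm
                trigger=AfterWatermark(),                # new triggering strategy
                accumulation_mode=AccumulationMode.DISCARDING,
            )
            | "Count" >> beam.CombineGlobally(
                beam.combiners.CountCombineFn()
            ).without_defaults()
            | "Log" >> beam.Map(print)
        )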


You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps.

You have the following requirements:

You will batch-load the posts once per day and run them through the Cloud Natural Language API.
You will extract topics and sentiment from the posts.
You must store the raw posts for archiving and reprocessing.
You will create dashboards to be shared with people both inside and outside your organization.

You need to store both the data extracted from the API to perform analysis as well as the raw social media posts for historical archiving.
What should you do?

  1. Store the social media posts and the data extracted from the API in BigQuery.
  2. Store the social media posts and the data extracted from the API in Cloud SQL.
  3. Store the raw social media posts in Cloud Storage, and write the data extracted from the API into BigQuery.
  4. Feed the social media posts into the API directly from the source, and write the extracted data from the API into BigQuery.

Answer(s): C

Explanation:
The requirements state that the raw posts must be stored for archiving and reprocessing, which rules out feeding them directly from the source into the API (option D). Storing the raw posts in low-cost Cloud Storage and writing only the extracted topics and sentiment to BigQuery satisfies archiving, analysis, and dashboarding at the lowest cost.
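
A rough sketch of that pattern, assuming the google-cloud-storage, google-cloud-language, and google-cloud-bigquery Python clients and placeholder bucket, prefix, and table names: the raw posts remain in Cloud Storage for archiving and reprocessing, and only the sentiment and entity (topic) output of the Natural Language API is written to BigQuery for analysis and dashboarding.

    from google.cloud import bigquery, language_v1, storage

    BUCKET = "social-raw-archive"                 # hypothetical archive bucket
    TABLE = "my-project.social.post_annotations"  # hypothetical table; schema assumed to exist

    storage_client = storage.Client()
    language_client = language_v1.LanguageServiceClient()
    bq_client = bigquery.Client()

    rows = []
    # The raw posts stay in Cloud Storage untouched; only the API output is
    # collected for loading into BigQuery.
    for blob in storage_client.list_blobs(BUCKET, prefix="posts/2024-01-01/"):  # hypothetical daily prefix
        text = blob.download_as_text()
        document = language_v1.Document(
            content=text, type_=language_v1.Document.Type.PLAIN_TEXT
        )
        sentiment = language_client.analyze_sentiment(request={"document": document})
        entities = language_client.analyze_entities(request={"document": document})
        rows.append({
            "post_uri": f"gs://{BUCKET}/{blob.name}",
            "sentiment_score": sentiment.document_sentiment.score,
            "sentiment_magnitude": sentiment.document_sentiment.magnitude,
            "topics": [entity.name for entity in entities.entities],
        })

    # Write only the extracted fields to BigQuery for analysis and dashboards.
    errors = bq_client.insert_rows_json(TABLE, rows)
    if errors:
        raise RuntimeError(errors)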





