Free Professional Data Engineer Exam Braindumps

Which of the following IAM roles does your Compute Engine service account require to run Dataflow pipeline jobs?

  A. dataflow.worker
  B. dataflow.compute
  C. dataflow.developer
  D. dataflow.viewer

Answer(s): A

Explanation:

The dataflow.worker role provides the permissions necessary for a Compute Engine service account to execute work units for a Dataflow pipeline.


Reference:

https://cloud.google.com/dataflow/access-control
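
In practice, this role is granted to the service account through an IAM policy binding. Below is a minimal sketch using the Cloud Resource Manager API via the google-api-python-client library, assuming Application Default Credentials; the project ID and service-account email are placeholders, not values from this exam.

```python
from googleapiclient import discovery

# Placeholder identifiers for illustration only.
PROJECT = "my-project"
MEMBER = "serviceAccount:123456789-compute@developer.gserviceaccount.com"

# Read-modify-write the project's IAM policy to grant roles/dataflow.worker.
crm = discovery.build("cloudresourcemanager", "v1")
policy = crm.projects().getIamPolicy(resource=PROJECT, body={}).execute()
policy.setdefault("bindings", []).append(
    {"role": "roles/dataflow.worker", "members": [MEMBER]})
crm.projects().setIamPolicy(
    resource=PROJECT, body={"policy": policy}).execute()
```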



Which of the following is not true about Dataflow pipelines?

  A. Pipelines are a set of operations
  B. Pipelines represent a data processing job
  C. Pipelines represent a directed graph of steps
  D. Pipelines can share data between instances

Answer(s): D

Explanation:

The data and transforms in a pipeline are unique to, and owned by, that pipeline. While your program can create multiple pipelines, pipelines cannot share data or transforms.


Reference:

https://cloud.google.com/dataflow/model/pipelines
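
To make this concrete, here is a minimal sketch using the Apache Beam Python SDK (the programming model Dataflow executes): each Pipeline object is its own directed graph of steps, and a PCollection created in one pipeline cannot be consumed by another. The sample data is hypothetical.

```python
import apache_beam as beam

# First pipeline: a self-contained directed graph of steps.
with beam.Pipeline() as p1:
    (p1
     | "Create" >> beam.Create(["a", "b", "c"])
     | "Upper" >> beam.Map(str.upper)
     | "Print1" >> beam.Map(print))

# Second pipeline: a separate graph that must create or read its own
# input; it cannot reuse the PCollections owned by p1.
with beam.Pipeline() as p2:
    (p2
     | beam.Create([1, 2, 3])
     | beam.Map(lambda x: x * 2)
     | "Print2" >> beam.Map(print))
```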



By default, which of the following windowing behaviors does Dataflow apply to unbounded data sets?

  A. Windows at every 100 MB of data
  B. Single, Global Window
  C. Windows at every 1 minute
  D. Windows at every 10 minutes

Answer(s): B

Explanation:

Dataflow's default windowing behavior is to assign all elements of a PCollection to a single, global window, even for unbounded PCollections.


Reference:

https://cloud.google.com/dataflow/model/pcollection
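
A short Beam Python sketch illustrates the point: unless you apply WindowInto yourself, every element of a PCollection stays in the single global window. The timestamps and 60-second window size below are made-up values for illustration.

```python
import apache_beam as beam
from apache_beam.transforms import window

with beam.Pipeline() as p:
    (p
     | beam.Create([("k", 1), ("k", 2), ("k", 3)])
     # Attach event-time timestamps (in seconds) so windowing can act.
     | beam.Map(lambda kv: window.TimestampedValue(kv, kv[1] * 45))
     # Without this line, all three elements share one global window and
     # GroupByKey would emit a single group; with 60-second fixed windows,
     # each element lands in its own window.
     | beam.WindowInto(window.FixedWindows(60))
     | beam.GroupByKey()
     | beam.Map(print))
```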



Which of the following job types are supported by Cloud Dataproc (select 3 answers)?

  A. Hive
  B. Pig
  C. YARN
  D. Spark

Answer(s): A, B, D

Explanation:

Cloud Dataproc provides out-of-the-box and end-to-end support for many of the most popular job types, including Spark, Spark SQL, PySpark, MapReduce, Hive, and Pig jobs.


Reference:

https://cloud.google.com/dataproc/docs/resources/faq#what_type_of_jobs_can_i_run
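
As an illustration, each of these job types can be submitted through the same JobControllerClient in the google-cloud-dataproc Python library; the field you populate on the job (hive_job, pig_job, spark_job, and so on) selects the type. The project, region, and cluster names below are placeholders.

```python
from google.cloud import dataproc_v1

# Placeholder identifiers for illustration only.
PROJECT, REGION, CLUSTER = "my-project", "us-central1", "my-cluster"

client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"})

# The job's type is chosen by which field you set: hive_job here, but
# pig_job, spark_job, pyspark_job, etc. work the same way.
job = {
    "placement": {"cluster_name": CLUSTER},
    "hive_job": {"query_list": {"queries": ["SHOW DATABASES;"]}},
}
operation = client.submit_job_as_operation(
    request={"project_id": PROJECT, "region": REGION, "job": job})
print(operation.result().reference.job_id)
```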





