Free Professional Data Engineer Exam Braindumps (page: 19)


Which of the following statements about the Wide & Deep Learning model are true? (Select 2 answers.)

  1. The wide model is used for memorization, while the deep model is used for generalization.
  2. A good use for the wide and deep model is a recommender system.
  3. The wide model is used for generalization, while the deep model is used for memorization.
  4. A good use for the wide and deep model is a small-scale linear regression problem.

Answer(s): 1, 2

Explanation:

Can we teach computers to learn like humans do, by combining the power of memorization and generalization? It's not an easy question to answer, but by jointly training a wide linear model (for memorization) alongside a deep neural network (for generalization), one can combine the strengths of both to bring us one step closer. At Google, we call it Wide & Deep Learning. It's useful for generic large-scale regression and classification problems with sparse inputs (categorical features with a large number of possible feature values), such as recommender systems, search, and ranking problems.
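
For illustration, the joint wide-and-deep training described above can be sketched with TensorFlow's DNNLinearCombinedClassifier (legacy Estimator API). This is a minimal sketch, not part of the question; the feature names ("user_id", "item_id", "age") are hypothetical placeholders.

    # Illustrative sketch only: jointly training a wide linear model and a deep
    # neural network with TensorFlow's legacy Estimator API.
    # Feature names are hypothetical placeholders, not from the exam text.
    import tensorflow as tf

    # Wide side: sparse categorical and crossed features -> memorization of co-occurrences.
    user = tf.feature_column.categorical_column_with_hash_bucket("user_id", hash_bucket_size=10_000)
    item = tf.feature_column.categorical_column_with_hash_bucket("item_id", hash_bucket_size=10_000)
    wide_columns = [user, item,
                    tf.feature_column.crossed_column([user, item], hash_bucket_size=100_000)]

    # Deep side: dense embeddings of the same categoricals -> generalization to unseen pairs.
    deep_columns = [
        tf.feature_column.embedding_column(user, dimension=16),
        tf.feature_column.embedding_column(item, dimension=16),
        tf.feature_column.numeric_column("age"),
    ]

    # The wide (linear) and deep (DNN) parts are trained jointly in a single model.
    model = tf.estimator.DNNLinearCombinedClassifier(
        linear_feature_columns=wide_columns,
        dnn_feature_columns=deep_columns,
        dnn_hidden_units=[128, 64],
    )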


Reference:

https://research.googleblog.com/2016/06/wide-deep-learning-better-together-with.html



When creating a new Cloud Dataproc cluster with the projects.regions.clusters.create operation, these four values are required: project, region, name, and ____.

  1. zone
  2. node
  3. label
  4. type

Answer(s): 1

Explanation:

At a minimum, you must specify four values when creating a new cluster with the projects.regions.clusters.create operation:
The project in which the cluster will be created
The region to use
The name of the cluster
The zone in which the cluster will be created

You can specify many more details beyond these minimum requirements. For example, you can also specify the number of workers, whether preemptible compute should be used, and the network settings.
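
As a rough sketch in the style of the referenced tutorial (Google API Python client), the create request could be built as shown below; the project, region, cluster name, and zone values are placeholders.

    # Illustrative sketch only, following the pattern of the referenced tutorial.
    # The project, region, cluster name, and zone values are placeholders.
    from googleapiclient import discovery

    project_id = "my-project"         # project (required)
    region = "us-central1"            # region (required)
    cluster_name = "example-cluster"  # name (required)
    zone = "us-central1-a"            # zone (required)

    cluster_data = {
        "projectId": project_id,
        "clusterName": cluster_name,
        "config": {
            "gceClusterConfig": {
                "zoneUri": "https://www.googleapis.com/compute/v1/projects/"
                           f"{project_id}/zones/{zone}"
            }
            # Optional settings (worker count, preemptible workers, network) would go here.
        },
    }

    dataproc = discovery.build("dataproc", "v1")
    result = (
        dataproc.projects()
        .regions()
        .clusters()
        .create(projectId=project_id, region=region, body=cluster_data)
        .execute()
    )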


Reference:

https://cloud.google.com/dataproc/docs/tutorials/python-library-example#create_a_new_cloud_dataproc_cluster



Which methods can be used to reduce the number of rows processed by BigQuery?

  1. Splitting tables into multiple tables; putting data in partitions
  2. Splitting tables into multiple tables; putting data in partitions; using the LIMIT clause
  3. Putting data in partitions; using the LIMIT clause
  4. Splitting tables into multiple tables; using the LIMIT clause

Answer(s): 1

Explanation:

If you split a table into multiple tables (such as one table per day), you can limit your query to the data in specific tables (such as particular days). A better method is to use a partitioned table, as long as your data can be separated by day. If you use the LIMIT clause, BigQuery still processes the entire table.
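
As a minimal sketch of the difference, the snippet below uses the google-cloud-bigquery client and dry-run queries to compare bytes processed, assuming an ingestion-time (daily) partitioned table; the project, dataset, table, and column names are placeholders.

    # Illustrative sketch only; names are placeholders. Dry-run queries report
    # bytes processed without running the query, which makes the difference visible.
    from google.cloud import bigquery

    client = bigquery.Client()
    dry_run = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

    # Partition filter: only the matching daily partition is scanned.
    partitioned = client.query(
        "SELECT user_id FROM `my_project.my_dataset.events` "
        "WHERE _PARTITIONDATE = '2024-01-15'",
        job_config=dry_run,
    )

    # LIMIT: only a few rows come back, but the whole table is still processed.
    limited = client.query(
        "SELECT user_id FROM `my_project.my_dataset.events` LIMIT 10",
        job_config=dry_run,
    )

    print(partitioned.total_bytes_processed, limited.total_bytes_processed)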


Reference:

https://cloud.google.com/bigquery/docs/partitioned-tables



You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?

  1. Both batch and streaming
  2. BigQuery cannot be used as a sink
  3. Only batch
  4. Only streaming

Answer(s): 1

Explanation:

When you apply a BigQueryIO.Write transform in batch mode to write to a single table, Dataflow invokes a BigQuery load job.
When you apply a BigQueryIO.Write transform in streaming mode, or in batch mode using a function to specify the destination table, Dataflow uses BigQuery's streaming inserts.
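
The explanation above refers to the legacy Dataflow SDK's BigQueryIO.Write; a rough present-day equivalent in the Apache Beam Python SDK is sketched below, with placeholder table, schema, and row values.

    # Illustrative sketch only; table, schema, and rows are placeholders.
    import apache_beam as beam
    from apache_beam.io.gcp.bigquery import WriteToBigQuery

    with beam.Pipeline() as p:
        rows = p | beam.Create([{"name": "a", "value": 1}, {"name": "b", "value": 2}])

        # Batch-style write to a single table: executed as a BigQuery load job.
        rows | "AsLoadJob" >> WriteToBigQuery(
            "my_project:my_dataset.my_table",
            schema="name:STRING,value:INTEGER",
            method=WriteToBigQuery.Method.FILE_LOADS,
        )

        # Streaming-style write: executed with BigQuery streaming inserts.
        rows | "AsStreamingInserts" >> WriteToBigQuery(
            "my_project:my_dataset.my_table",
            schema="name:STRING,value:INTEGER",
            method=WriteToBigQuery.Method.STREAMING_INSERTS,
        )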


Reference:

https://cloud.google.com/dataflow/model/bigquery-io





