Free Professional Data Engineer Exam Braindumps (page: 29)

Page 29 of 68

Which of the following is not true about Dataflow pipelines?

  A. Pipelines are a set of operations
  B. Pipelines represent a data processing job
  C. Pipelines represent a directed graph of steps
  D. Pipelines can share data between instances

Answer(s): D

Explanation:

The data and transforms in a pipeline are unique to, and owned by, that pipeline. While your program can create multiple pipelines, pipelines cannot share data or transforms.


Reference:

https://cloud.google.com/dataflow/model/pipelines
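The properties above can be illustrated with a minimal, framework-free sketch (this is not the actual Beam/Dataflow API): a pipeline is a directed chain of transform steps, it owns its own steps and data, and two pipeline instances share nothing.

```python
# Conceptual sketch of a Dataflow-style pipeline (NOT the Beam API):
# a pipeline is a directed chain of transform steps that owns its own data.

class Pipeline:
    def __init__(self):
        self.steps = []          # the directed graph of steps (a simple chain here)

    def apply(self, transform):
        self.steps.append(transform)
        return self              # allow chaining, similar to Beam's `|` operator

    def run(self, data):
        # Each step consumes the previous step's output; nothing leaks
        # outside this pipeline instance.
        for step in self.steps:
            data = [step(x) for x in data]
        return data

p1 = Pipeline().apply(lambda x: x * 2).apply(lambda x: x + 1)
p2 = Pipeline()  # a second pipeline: its steps and data are independent of p1

print(p1.run([1, 2, 3]))   # [3, 5, 7]
print(len(p2.steps))       # 0 -- p2 shares no transforms or data with p1
```

The key point for the question: `p2` cannot reach into `p1`'s steps or intermediate data, just as Dataflow pipelines cannot share data between instances.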



What are two of the benefits of using denormalized data structures in BigQuery?

  A. Reduces the amount of data processed, reduces the amount of storage required
  B. Increases query speed, makes queries simpler
  C. Reduces the amount of storage required, increases query speed
  D. Reduces the amount of data processed, increases query speed

Answer(s): B

Explanation:

Denormalization increases query speed for tables with billions of rows because BigQuery's performance degrades when doing JOINs on large tables, but with a denormalized data structure, you don't have to use JOINs, since all of the data has been combined into one table. Denormalization also makes queries simpler because you do not have to use JOIN clauses.
Denormalization increases the amount of data processed and the amount of storage required because it creates redundant data.


Reference:

https://cloud.google.com/solutions/bigquery-data-warehouse#denormalizing_data
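The trade-off described above can be shown with a small Python sketch (hypothetical table names and values): denormalizing copies the customer dimensions onto every order row, so queries need no join, at the cost of redundant data.

```python
# Hypothetical normalized tables, used only to illustrate denormalization.
customers = {101: {"name": "Alice", "region": "EU"}}
orders = [
    {"order_id": 1, "customer_id": 101, "amount": 30},
    {"order_id": 2, "customer_id": 101, "amount": 45},
]

# Normalized access needs a join (a lookup) on every query. Denormalizing
# merges the customer dimensions into each order row up front:
denormalized = [
    {**order, **customers[order["customer_id"]]} for order in orders
]

print(denormalized[0])
# {'order_id': 1, 'customer_id': 101, 'amount': 30, 'name': 'Alice', 'region': 'EU'}
```

Note that `"Alice"` and `"EU"` now appear in both rows: the redundancy that speeds up and simplifies queries is exactly what increases storage and the amount of data processed.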



Which of the following are examples of hyperparameters? (Select 2 answers.)

  A. Number of hidden layers
  B. Number of nodes in each hidden layer
  C. Biases
  D. Weights

Answer(s): A,B

Explanation:

While model parameters are variables that get adjusted by training on existing data, hyperparameters are variables about the training process itself. For example, part of setting up a deep neural network is deciding how many "hidden" layers of nodes to use between the input layer and the output layer, as well as how many nodes each layer should use. These variables are not directly related to the training data at all; they are configuration variables. Another difference is that parameters change during a training job, while hyperparameters usually remain constant during a job. Weights and biases are variables that get adjusted during the training process, so they are not hyperparameters.


Reference:

https://cloud.google.com/ml-engine/docs/hyperparameter-tuning-overview
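The distinction can be made concrete with a short sketch (hypothetical sizes): the hyperparameters are fixed configuration values chosen before training, and they determine how many trainable parameters (weights and biases) the network has.

```python
import random

# Hyperparameters: chosen before training, held constant during a training job.
n_hidden_layers = 2
nodes_per_layer = 4
input_size, output_size = 3, 1

# Parameters (weights and biases): initialized here, then adjusted by training.
layer_sizes = [input_size] + [nodes_per_layer] * n_hidden_layers + [output_size]
weights = [
    [[random.random() for _ in range(n_in)] for _ in range(n_out)]
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
biases = [[0.0] * n_out for n_out in layer_sizes[1:]]

# The hyperparameters fully determine the parameter count:
n_params = sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(n_params)  # 41 = (3*4+4) + (4*4+4) + (4*1+1)
```

Changing `n_hidden_layers` or `nodes_per_layer` changes the shape of `weights` and `biases`, but during a single training job only the values inside `weights` and `biases` change.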



What are two methods that can be used to denormalize tables in BigQuery?

  A. 1) Split table into multiple tables; 2) Use a partitioned table
  B. 1) Join tables into one table; 2) Use nested repeated fields
  C. 1) Use a partitioned table; 2) Join tables into one table
  D. 1) Use nested repeated fields; 2) Use a partitioned table

Answer(s): B

Explanation:

The conventional method of denormalizing data involves simply writing a fact, along with all its dimensions, into a flat table structure. For example, if you are dealing with sales transactions, you would write each individual fact to a record, along with the accompanying dimensions such as order and customer information.

The other method for denormalizing data takes advantage of BigQuery's native support for nested and repeated structures in JSON or Avro input data. Expressing records using nested and repeated structures can provide a more natural representation of the underlying data. In the case of a sales order, the outer part of a JSON structure would contain the order and customer information, and the inner part of the structure would contain the individual line items of the order, represented as nested, repeated elements.


Reference:

https://cloud.google.com/solutions/bigquery-data-warehouse#denormalizing_data
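A small Python sketch (hypothetical order data) contrasts the two methods: flat denormalized rows repeat the order and customer dimensions on every line item, while a nested, repeated structure (expressible in JSON or Avro, matching BigQuery's RECORD/REPEATED column types) states them once.

```python
import json

# Method 1: join tables into one flat table. The order/customer dimensions
# are repeated on every line-item row.
flat_rows = [
    {"order_id": 1, "customer": "Alice", "item": "pen",  "qty": 2},
    {"order_id": 1, "customer": "Alice", "item": "book", "qty": 1},
]

# Method 2: nested repeated fields. The outer record holds the order and
# customer info once; the line items are a repeated inner structure.
nested_record = {
    "order_id": 1,
    "customer": "Alice",
    "line_items": [
        {"item": "pen",  "qty": 2},
        {"item": "book", "qty": 1},
    ],
}

print(json.dumps(nested_record, indent=2))
```

Loaded from JSON like this, the nested form gives BigQuery a more natural representation of the order while still avoiding query-time joins.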





