Free Professional Data Engineer Exam Braindumps (page: 22)


If a dataset contains rows with individual people and columns for year of birth, country, and income, how many of the columns are continuous and how many are categorical?

  A. 1 continuous and 2 categorical
  B. 3 categorical
  C. 3 continuous
  D. 2 continuous and 1 categorical

Answer(s): D

Explanation:

Columns can be grouped into two types, categorical and continuous:

A column is called categorical if its value can only be one of the categories in a finite set. For example, the native country of a person (U.S., India, Japan, etc.) or the education level (high school, college, etc.) are categorical columns.

A column is called continuous if its value can be any numerical value in a continuous range. For example, the capital gain of a person (e.g. $14,084) is a continuous column.

Year of birth and income are continuous columns. Country is a categorical column.

You could use bucketization to turn year of birth and/or income into categorical features, but the raw columns are continuous.
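
As a rough illustration of the distinction, the sketch below encodes a few made-up rows: the two continuous columns pass through as plain numbers, while the categorical column is one-hot encoded over a finite vocabulary. The rows, the `COUNTRY_VOCAB` list, and the helper names are hypothetical and only serve the example.

```python
# Hypothetical rows: (year_of_birth, country, income).
rows = [
    (1985, "U.S.", 52000.0),
    (1992, "India", 31000.0),
    (1978, "Japan", 64500.0),
]

# Assumed finite vocabulary for the categorical column.
COUNTRY_VOCAB = ["U.S.", "India", "Japan"]


def one_hot(value, vocab):
    """Encode a categorical value as a one-hot vector over a finite vocabulary."""
    if value not in vocab:
        raise ValueError(f"{value!r} is not in the vocabulary")
    return [1.0 if value == entry else 0.0 for entry in vocab]


def encode_row(row):
    """Keep continuous columns as numbers and one-hot encode the country."""
    year_of_birth, country, income = row
    return [float(year_of_birth), float(income)] + one_hot(country, COUNTRY_VOCAB)


encoded = [encode_row(row) for row in rows]
for vector in encoded:
    print(vector)
```

Each encoded row has two numeric entries (the continuous columns) followed by one entry per country in the vocabulary (the categorical column).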


Reference:

https://www.tensorflow.org/tutorials/wide#reading_the_census_data



Which of the following are examples of hyperparameters? (Select 2 answers.)

  A. Number of hidden layers
  B. Number of nodes in each hidden layer
  C. Biases
  D. Weights

Answer(s): A,B

Explanation:

Model parameters are variables that are adjusted by training on existing data, whereas hyperparameters are variables that govern the training process itself. For example, part of setting up a deep neural network is deciding how many "hidden" layers of nodes to use between the input layer and the output layer, and how many nodes each layer should have. These variables are not directly related to the training data; they are configuration variables. Another difference is that parameters change during a training job, while hyperparameters usually stay constant for the duration of a job.

Weights and biases are variables that get adjusted during the training process, so they are not hyperparameters.
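
A minimal sketch of this split, using plain Python rather than any particular framework: the number of hidden layers and the nodes per layer are passed in as fixed configuration (hyperparameters), and they determine how many weights and biases (parameters) exist for training to adjust. The function names and layer sizes are assumptions chosen for illustration.

```python
import random


def init_network(n_inputs, hidden_layers, nodes_per_layer, n_outputs, seed=0):
    """Build weight and bias containers for a fully connected network.

    hidden_layers and nodes_per_layer are hyperparameters: chosen before
    training and held constant. The weights and biases created here are the
    parameters that a training loop would adjust.
    """
    rng = random.Random(seed)
    sizes = [n_inputs] + [nodes_per_layer] * hidden_layers + [n_outputs]
    layers = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append({"weights": weights, "biases": biases})
    return layers


def count_parameters(layers):
    """Count every trainable weight and bias in the network."""
    return sum(
        len(layer["biases"]) + sum(len(row) for row in layer["weights"])
        for layer in layers
    )


net = init_network(n_inputs=4, hidden_layers=2, nodes_per_layer=8, n_outputs=1)
print(count_parameters(net))  # 4*8+8 + 8*8+8 + 8*1+1 = 121
```

Changing `hidden_layers` or `nodes_per_layer` changes the shape of the model, which is why these values are set before a training job rather than learned during it.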


Reference:

https://cloud.google.com/ml-engine/docs/hyperparameter-tuning-overview



Which of the following are feature engineering techniques? (Select 2 answers)

  A. Hidden feature layers
  B. Feature prioritization
  C. Crossed feature columns
  D. Bucketization of a continuous feature

Answer(s): C,D

Explanation:

Selecting and crafting the right set of feature columns is key to learning an effective model.

Bucketization is a process of dividing the entire range of a continuous feature into a set of consecutive bins/buckets, and then converting the original numerical feature into a bucket ID (as a categorical feature) depending on which bucket that value falls into.

Using each base feature column separately may not be enough to explain the data. To learn the differences between different feature combinations, we can add crossed feature columns to the model.
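
The two techniques can be sketched in plain Python, without relying on any TensorFlow API. The bucket boundaries, the `_X_` separator, and the function names below are assumptions for illustration; a real pipeline would typically hash the crossed value into a fixed number of buckets.

```python
from bisect import bisect_right

# Assumed age boundaries; any sorted list of cut points works.
AGE_BOUNDARIES = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]


def bucketize(value, boundaries):
    """Map a continuous value to the ID of the bucket it falls into.

    Bucket 0 holds values below the first boundary; bucket len(boundaries)
    holds values at or above the last boundary.
    """
    return bisect_right(boundaries, value)


def cross(*features):
    """Combine several categorical values into a single crossed feature."""
    return "_X_".join(str(feature) for feature in features)


age_bucket = bucketize(33, AGE_BOUNDARIES)
crossed = cross(age_bucket, "Bachelors")
print(age_bucket, crossed)
```

The crossed value lets a linear model learn a separate weight for each combination (for example, a given age bucket together with a given education level) instead of one weight per base feature.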


Reference:

https://www.tensorflow.org/tutorials/wide#selecting_and_engineering_features_for_the_model



You want to use a BigQuery table as a data sink. In which writing mode(s) can you use BigQuery as a sink?

  A. Both batch and streaming
  B. BigQuery cannot be used as a sink
  C. Only batch
  D. Only streaming

Answer(s): A

Explanation:

When you apply a BigQueryIO.Write transform in batch mode to write to a single table, Dataflow invokes a BigQuery load job.

When you apply a BigQueryIO.Write transform in streaming mode, or in batch mode using a function to specify the destination table, Dataflow uses BigQuery's streaming inserts.
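
The rule stated above can be written down as a small decision function. This is only a restatement of the explanation in code, not part of the Dataflow SDK: the function name, its arguments, and the returned strings are hypothetical, and actual SDK behavior may differ by version.

```python
def bigquery_write_method(streaming: bool, dynamic_destination: bool = False) -> str:
    """Return the write path described in the explanation above.

    streaming: True if the pipeline runs in streaming mode.
    dynamic_destination: True if a function picks the destination table
    instead of a single fixed table.
    """
    if streaming or dynamic_destination:
        return "streaming_inserts"
    return "load_job"


print(bigquery_write_method(streaming=False))                            # batch, single table
print(bigquery_write_method(streaming=False, dynamic_destination=True))  # batch, per-row table
print(bigquery_write_method(streaming=True))                             # streaming
```

In every case the data ends up in BigQuery, which is why the table can serve as a sink in both batch and streaming modes.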


Reference:

https://cloud.google.com/dataflow/model/bigquery-io





