Free Professional Data Engineer Exam Braindumps (page: 14)

Page 14 of 68

You work for a manufacturing plant that batches application log files together into a single log file once a day at 2:00 AM. You have written a Google Cloud Dataflow job to process that log file. You need to make sure the log file is processed once per day as inexpensively as possible.
What should you do?

  A. Change the processing job to use Google Cloud Dataproc instead.
  B. Manually start the Cloud Dataflow job each morning when you get into the office.
  C. Create a cron job with Google App Engine Cron Service to run the Cloud Dataflow job.
  D. Configure the Cloud Dataflow job as a streaming job so that it processes the log data immediately.

Answer(s): C



Which of these operations can you perform from the BigQuery Web UI?

  A. Upload a file in SQL format.
  B. Load data with nested and repeated fields.
  C. Upload a 20 MB file.
  D. Upload multiple files using a wildcard.

Answer(s): B

Explanation:

You can load data with nested and repeated fields using the Web UI.
You cannot use the Web UI to:
- Upload a file greater than 10 MB in size
- Upload multiple files at the same time
- Upload a file in SQL format
All three of the above operations can be performed using the "bq" command.
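
To make "nested and repeated fields" concrete, the following is a minimal Python sketch of how such data might look as newline-delimited JSON, a format BigQuery accepts for nested data. The schema, the field names (name, tags, addresses, city, zip), and the sample rows are hypothetical and are not part of the exam question; nothing here calls BigQuery itself.

```python
import json

# Hypothetical schema: a RECORD field contains nested fields, and
# mode REPEATED means the field holds a list of values.
schema = [
    {"name": "name", "type": "STRING", "mode": "REQUIRED"},
    {"name": "tags", "type": "STRING", "mode": "REPEATED"},
    {
        "name": "addresses",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {"name": "city", "type": "STRING", "mode": "NULLABLE"},
            {"name": "zip", "type": "STRING", "mode": "NULLABLE"},
        ],
    },
]

# Hypothetical rows matching the schema above.
rows = [
    {
        "name": "Ana",
        "tags": ["operator", "night-shift"],
        "addresses": [{"city": "Lisbon", "zip": "1000"}],
    },
    {
        "name": "Ben",
        "tags": [],
        "addresses": [
            {"city": "Leeds", "zip": "LS1"},
            {"city": "York", "zip": "YO1"},
        ],
    },
]


def to_ndjson(records):
    """Serialize records as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(record) for record in records)


ndjson = to_ndjson(rows)
print(ndjson)
```

Each output line is one row; the addresses list is both nested (a record with its own fields) and repeated (zero or more entries per row).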


Reference:

https://cloud.google.com/bigquery/loading-data



Which TensorFlow function can you use to configure a categorical column if you don't know all of the possible values for that column?

  A. categorical_column_with_vocabulary_list
  B. categorical_column_with_hash_bucket
  C. categorical_column_with_unknown_values
  D. sparse_column_with_keys

Answer(s): B

Explanation:

If you know the full set of possible values for a column and there are only a few of them, you can use categorical_column_with_vocabulary_list. Each key in the list is assigned an auto-incrementing ID starting from 0.
If you don't know the set of possible values in advance, you can use categorical_column_with_hash_bucket instead.
Each value in the feature column (for example, an occupation column) is then hashed to an integer ID as it is encountered in training.
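
The difference between the two approaches can be sketched in plain Python. This is an illustration of the idea only, not TensorFlow's implementation: the helper names vocabulary_id and hash_bucket_id are hypothetical, MD5 is a stand-in for whatever hash TensorFlow uses internally, and returning -1 for an unseen value is a choice made for this sketch.

```python
import hashlib


def vocabulary_id(value, vocabulary):
    """Return the position of value in a known vocabulary, or -1 if unseen."""
    try:
        return vocabulary.index(value)
    except ValueError:
        return -1


def hash_bucket_id(value, hash_bucket_size):
    """Map any string to a bucket in [0, hash_bucket_size) using a stable hash."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % hash_bucket_size


# Hypothetical occupation values; the vocabulary must be known up front.
vocabulary = ["engineer", "teacher", "nurse"]

known_id = vocabulary_id("teacher", vocabulary)      # found in the list
unseen_id = vocabulary_id("astronaut", vocabulary)   # not in the list

# Hashing needs no vocabulary: every value gets some bucket, including unseen ones.
bucket = hash_bucket_id("astronaut", 1000)
print(known_id, unseen_id, bucket)
```

The trade-off is that two different values can land in the same bucket, so the bucket count is usually chosen to be comfortably larger than the expected number of distinct values.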


Reference:

https://www.tensorflow.org/tutorials/wide



What are two of the characteristics of using online prediction rather than batch prediction?

  A. It is optimized to handle a high volume of data instances in a job and to run more complex models.
  B. Predictions are returned in the response message.
  C. Predictions are written to output files in a Cloud Storage location that you specify.
  D. It is optimized to minimize the latency of serving predictions.

Answer(s): B,D

Explanation:

Online prediction:
- Optimized to minimize the latency of serving predictions.
- Predictions are returned in the response message.
Batch prediction:
- Optimized to handle a high volume of instances in a job and to run more complex models.
- Predictions are written to output files in a Cloud Storage location that you specify.
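
The contrast in how results are delivered can be sketched with a toy example. Everything below is hypothetical: toy_model stands in for a trained model, the request and response shapes are illustrative, and a local temporary directory stands in for a Cloud Storage location. No prediction service is called.

```python
import json
import os
import tempfile


def toy_model(instance):
    """Hypothetical stand-in for a trained model: sums the input features."""
    return sum(instance["features"])


def online_predict(request):
    """Online style: predictions come back in the response itself."""
    return {"predictions": [toy_model(i) for i in request["instances"]]}


def batch_predict(instances, output_dir):
    """Batch style: predictions are written to a file; only its path is returned."""
    path = os.path.join(output_dir, "predictions.jsonl")
    with open(path, "w", encoding="utf-8") as f:
        for instance in instances:
            f.write(json.dumps({"prediction": toy_model(instance)}) + "\n")
    return path


instances = [{"features": [1, 2]}, {"features": [3, 4]}]

response = online_predict({"instances": instances})
print(response)

output_dir = tempfile.mkdtemp()  # local stand-in for a Cloud Storage location
output_path = batch_predict(instances, output_dir)
print(output_path)
```

In the online case the caller reads the predictions directly from the returned value; in the batch case the caller has to open the output file afterwards.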


Reference:

https://cloud.google.com/ml-engine/docs/prediction-overview#online_prediction_versus_batch_prediction





