Free Professional Data Engineer Exam Braindumps (page: 25)

Page 25 of 68

If a dataset contains rows with individual people and columns for year of birth, country, and income, how many of the columns are continuous and how many are categorical?

  A. 1 continuous and 2 categorical
  B. 3 categorical
  C. 3 continuous
  D. 2 continuous and 1 categorical

Answer(s): D

Explanation:

Columns can be grouped into two types, categorical and continuous.
A column is categorical if its value can only be one of the categories in a finite set. For example, the native country of a person (U.S., India, Japan, etc.) or the education level (high school, college, etc.) are categorical columns. A column is continuous if its value can be any numerical value in a continuous range. For example, the capital gain of a person (e.g. $14,084) is a continuous column. In this dataset, year of birth and income are continuous columns, and country is a categorical column. You could use bucketization to turn year of birth and/or income into categorical features, but the raw columns are continuous.
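The bucketization mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the TensorFlow API from the referenced tutorial: the boundary values and the `bucketize` helper are assumptions made up for the example.

```python
from bisect import bisect_right

# Hypothetical bucket boundaries for year of birth, chosen only for illustration.
YEAR_BOUNDARIES = [1950, 1970, 1990, 2010]


def bucketize(value, boundaries):
    """Map a continuous value to the index of the bucket it falls into.

    Bucket 0 holds values below boundaries[0]; bucket i holds values in
    [boundaries[i - 1], boundaries[i]); the last bucket holds values at
    or above the final boundary.
    """
    return bisect_right(boundaries, value)


years = [1945, 1970, 1988, 2015]
buckets = [bucketize(year, YEAR_BOUNDARIES) for year in years]
print(buckets)  # [0, 2, 2, 4]
```

After this step each year of birth is represented by one of a finite set of bucket indices, so the derived feature is categorical even though the raw column is continuous.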


Reference:

https://www.tensorflow.org/tutorials/wide#reading_the_census_data



What are the minimum permissions needed for a service account used with Google Dataproc?

  A. Execute to Google Cloud Storage; write to Google Cloud Logging
  B. Write to Google Cloud Storage; read to Google Cloud Logging
  C. Execute to Google Cloud Storage; execute to Google Cloud Logging
  D. Read and write to Google Cloud Storage; write to Google Cloud Logging

Answer(s): D

Explanation:

Service accounts authenticate applications running on your virtual machine instances to other Google Cloud Platform services. For example, if you write an application that reads and writes files on Google Cloud Storage, it must first authenticate to the Google Cloud Storage API. At a minimum, service accounts used with Cloud Dataproc need permissions to read and write to Google Cloud Storage, and to write to Google Cloud Logging.


Reference:

https://cloud.google.com/dataproc/docs/concepts/service-accounts#important_notes



Dataproc clusters contain many configuration files. To update these files, you will need to use the --properties option. The format for the option is: file_prefix:property=_____.

  A. details
  B. value
  C. null
  D. id

Answer(s): B

Explanation:

To make updating files and properties easy, the --properties option uses a special format to specify the configuration file, and the property and value within that file, that should be updated. The format is: file_prefix:property=value.
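As a rough illustration of the file_prefix:property=value format, the sketch below assembles a --properties argument from a list of entries. The helper names are hypothetical, the prefixes and property values (`spark`, `hdfs`, `4g`, `2`) are example inputs rather than recommendations, and joining several entries with commas is an assumption to check against the reference below.

```python
def format_property(file_prefix, prop, value):
    """Render one entry as file_prefix:property=value."""
    return f"{file_prefix}:{prop}={value}"


def format_properties_flag(entries):
    """Join several entries into a single --properties argument.

    Assumes entries are comma-separated; verify against the Dataproc docs.
    """
    return "--properties=" + ",".join(format_property(*entry) for entry in entries)


# Example inputs only: (file prefix, property name, value).
entries = [
    ("spark", "spark.executor.memory", "4g"),
    ("hdfs", "dfs.replication", "2"),
]
flag = format_properties_flag(entries)
print(flag)  # --properties=spark:spark.executor.memory=4g,hdfs:dfs.replication=2
```

The file prefix selects which configuration file is edited, and the part after the colon is an ordinary property=value pair within that file.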


Reference:

https://cloud.google.com/dataproc/docs/concepts/cluster-properties#formatting



You are planning to use Google's Dataflow SDK to analyze customer data such as that displayed below. Your project requirement is to extract only the customer name from the data source and then write it to an output PCollection.

Tom,555 X street
Tim,553 Y street
Sam, 111 Z street

Which operation is best suited for the above data processing requirement?

  A. ParDo
  B. Sink API
  C. Source API
  D. Data extraction

Answer(s): A

Explanation:

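In the Google Cloud Dataflow SDK, you can use a ParDo transform to extract only the customer name from each element in your PCollection. ParDo applies a user-defined function to every element independently, which fits a per-record extraction like this one.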
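The per-element logic that a ParDo would run here can be sketched in plain Python, without the Dataflow SDK. `ExtractNameFn` and `par_do` are hypothetical stand-ins written for this example: in a real pipeline the class would extend the SDK's DoFn type and the runner would apply it, rather than the simple loop shown.

```python
class ExtractNameFn:
    """Stand-in for a DoFn: processes one element and emits zero or more outputs."""

    def process(self, element):
        # Each element is a line such as "Tom,555 X street".
        # Keep only the text before the first comma.
        name, _, _address = element.partition(",")
        yield name.strip()


def par_do(fn, collection):
    """Hypothetical helper that mimics ParDo by applying fn.process to each element."""
    output = []
    for element in collection:
        output.extend(fn.process(element))
    return output


lines = ["Tom,555 X street", "Tim,553 Y street", "Sam, 111 Z street"]
names = par_do(ExtractNameFn(), lines)
print(names)  # ['Tom', 'Tim', 'Sam']
```

Because each element is handled on its own, with no dependence on other records, this kind of extraction parallelizes naturally, which is the case ParDo is designed for.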


Reference:

https://cloud.google.com/dataflow/model/par-do





