Free Professional Data Engineer Exam Braindumps (page: 21)


For the best possible performance, what is the recommended zone for your Compute Engine instance and Cloud Bigtable instance?

  A. Have the Compute Engine instance in the furthest zone from the Cloud Bigtable instance.
  B. Have the Compute Engine instance and the Cloud Bigtable instance in different zones.
  C. Have the Compute Engine instance and the Cloud Bigtable instance in the same zone.
  D. Have the Cloud Bigtable instance in the same zone as all of the consumers of your data.

Answer(s): C

Explanation:

For the best possible performance, create your Compute Engine instance in the same zone as your Cloud Bigtable instance. If it is not possible to create an instance in the same zone, create it in another zone within the same region. For example, if your Cloud Bigtable instance is located in us-central1-b, you could create your instance in us-central1-f. This change may add several milliseconds of latency to each Cloud Bigtable request. Avoid creating your Compute Engine instance in a different region from your Cloud Bigtable instance, which can add hundreds of milliseconds of latency to each Cloud Bigtable request.
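As an illustration, the zone of a Cloud Bigtable cluster can be read programmatically so the Compute Engine VM can be placed alongside it. The sketch below assumes the google-cloud-bigtable Python client; the project and instance IDs are placeholders.

from google.cloud import bigtable

# "my-project" and "my-bigtable-instance" are placeholder IDs.
client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-bigtable-instance")

clusters, _failed_locations = instance.list_clusters()
for cluster in clusters:
    # location_id is the zone (e.g. "us-central1-b"); creating the
    # Compute Engine VM in this zone keeps request latency lowest.
    print(cluster.cluster_id, cluster.location_id)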


Reference:

https://cloud.google.com/bigtable/docs/creating-compute-instance



Why do you need to split a machine learning dataset into training data and test data?

  A. So you can try two different sets of features
  B. To make sure your model is generalized for more than just the training data
  C. To allow you to create unit tests in your code
  D. So you can use one dataset for a wide model and one for a deep model

Answer(s): B

Explanation:

The flaw in evaluating a predictive model on its training data is that it tells you nothing about how well the model generalizes to new, unseen data. A model selected for its accuracy on the training dataset rather than on a held-out test dataset is very likely to achieve lower accuracy on unseen data, because it has specialized to the structure of the training dataset instead of generalizing. This is called overfitting.
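A minimal illustration of such a split, assuming scikit-learn and its bundled iris dataset (neither is named in the question):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# Hold out 20% of the data that the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
print("train accuracy:", model.score(X_train, y_train))  # often optimistic
print("test accuracy:", model.score(X_test, y_test))     # estimate of generalization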


Reference:

https://machinelearningmastery.com/a-simple-intuition-for-overfitting/



Which of the following is not possible using primitive roles?

  A. Give a user viewer access to BigQuery and owner access to Google Compute Engine instances.
  B. Give UserA owner access and UserB editor access for all datasets in a project.
  C. Give a user access to view all datasets in a project, but not run queries on them.
  D. Give GroupA owner access and GroupB editor access for all datasets in a project.

Answer(s): C

Explanation:

Primitive roles can be used to give owner, editor, or viewer access to a user or group, but they can't be used to separate data access permissions from job-running permissions.
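By contrast, option C can be achieved with predefined roles. The sketch below is only an assumption-laden example using the google-cloud-resource-manager client and placeholder project and user names: it grants roles/bigquery.dataViewer without roles/bigquery.jobUser, so the user can view datasets but cannot run query jobs.

from google.cloud import resourcemanager_v3
from google.iam.v1 import iam_policy_pb2

client = resourcemanager_v3.ProjectsClient()
resource = "projects/my-project"  # placeholder project ID

# Read-modify-write the project-level IAM policy.
policy = client.get_iam_policy(
    request=iam_policy_pb2.GetIamPolicyRequest(resource=resource)
)
policy.bindings.add(
    role="roles/bigquery.dataViewer",      # data access only
    members=["user:analyst@example.com"],  # without roles/bigquery.jobUser, no queries
)
client.set_iam_policy(
    request=iam_policy_pb2.SetIamPolicyRequest(resource=resource, policy=policy)
)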


Reference:

https://cloud.google.com/bigquery/docs/access-control#primitive_iam_roles



Scaling a Cloud Dataproc cluster typically involves ____.

  A. increasing or decreasing the number of worker nodes
  B. increasing or decreasing the number of master nodes
  C. moving memory to run more applications on a single node
  D. deleting applications from unused nodes periodically

Answer(s): A

Explanation:

After creating a Cloud Dataproc cluster, you can scale the cluster by increasing or decreasing the number of worker nodes in the cluster at any time, even when jobs are
running on the cluster. Cloud Dataproc clusters are typically scaled to:
1) increase the number of workers to make a job run faster
2) decrease the number of workers to save money
3) increase the number of nodes to expand available Hadoop Distributed Filesystem (HDFS) storage
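A minimal sketch of such a resize using the google-cloud-dataproc Python client; the project, region, and cluster names are placeholders:

from google.cloud import dataproc_v1

region = "us-central1"  # placeholder region
client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

# Change only the primary worker count; the update mask limits the edit.
operation = client.update_cluster(
    project_id="my-project",    # placeholder project ID
    region=region,
    cluster_name="my-cluster",  # placeholder cluster name
    cluster={"config": {"worker_config": {"num_instances": 5}}},
    update_mask={"paths": ["config.worker_config.num_instances"]},
)
operation.result()  # wait for the scaling operation to finish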


Reference:

https://cloud.google.com/dataproc/docs/concepts/scaling-clusters





