Free Google Cloud Certified Professional Data Engineer Exam Questions (page: 7)

You are working on a sensitive project involving private user data. You have set up a project on Google Cloud Platform to house your work internally. An external consultant is going to assist with coding a complex transformation in a Google Cloud Dataflow pipeline for your project. How should you maintain users' privacy?

  A. Grant the consultant the Viewer role on the project.
  B. Grant the consultant the Cloud Dataflow Developer role on the project.
  C. Create a service account and allow the consultant to log on with it.
  D. Create an anonymized sample of the data for the consultant to work with in a different project.

Answer(s): C



You are building a model to predict whether or not it will rain on a given day. You have thousands of input features and want to see if you can improve training speed by removing some features while having a minimum effect on model accuracy.
What can you do?

  A. Eliminate features that are highly correlated to the output labels.
  B. Combine highly co-dependent features into one representative feature.
  C. Instead of feeding in each feature individually, average their values in batches of 3.
  D. Remove the features that have null values for more than 50% of the training records.

Answer(s): B
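One simple way to combine co-dependent features, sketched under the assumption that both columns are numeric and positively correlated: measure their Pearson correlation and, above a chosen threshold, replace the pair with the mean of their z-scores so that two inputs become one. The class and method names, the sample values, and the 0.9 threshold are all hypothetical; principal component analysis is a common, more general alternative.

```java
import java.util.Arrays;

/** Sketch: merge two strongly correlated feature columns into one representative column. */
class FeatureMerge {

    /** Pearson correlation of two equal-length columns; a constant column is treated as uncorrelated. */
    static double pearson(double[] x, double[] y) {
        double mx = Arrays.stream(x).average().orElse(0);
        double my = Arrays.stream(y).average().orElse(0);
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        double denom = Math.sqrt(sxx * syy);
        return denom == 0 ? 0.0 : sxy / denom;
    }

    /** Standardizes a column to zero mean and unit (population) variance. */
    static double[] zScore(double[] x) {
        double m = Arrays.stream(x).average().orElse(0);
        double sd = Math.sqrt(Arrays.stream(x).map(v -> (v - m) * (v - m)).average().orElse(0));
        return Arrays.stream(x).map(v -> (v - m) / sd).toArray();
    }

    /**
     * Returns the mean of the two z-scored columns when their correlation is at
     * least {@code threshold}; returns null to signal "keep both features".
     */
    static double[] mergeIfCorrelated(double[] a, double[] b, double threshold) {
        if (pearson(a, b) < threshold) {
            return null;
        }
        double[] za = zScore(a), zb = zScore(b);
        double[] merged = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            merged[i] = (za[i] + zb[i]) / 2.0;
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical feature columns: b is a linear function of a, c is not.
        double[] a = {40, 55, 70, 85, 95};
        double[] b = {81, 111, 141, 171, 191};
        double[] c = {1, -1, 1, -1, 1};
        System.out.println("r(a,b) = " + pearson(a, b));
        System.out.println("r(a,c) = " + pearson(a, c));
        System.out.println("merged(a,b) = " + Arrays.toString(mergeIfCorrelated(a, b, 0.9)));
        System.out.println("merged(a,c) = " + Arrays.toString(mergeIfCorrelated(a, c, 0.9)));
    }
}
```

Because b is an exact linear function of a, its z-scores equal a's and the merged column carries the same information as either input, which is the case where dropping one input costs the least accuracy.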



Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour.

The data scientists have written the following code to read the data for new key features in the logs.

BigQueryIO.Read
    .named("ReadLogData")
    .from("clouddataflow-readonly:samples.log_data")

You want to improve the performance of this data read.
What should you do?

  A. Specify the TableReference object in the code.
  B. Use the .fromQuery operation to read specific fields from the table.
  C. Use both the Google BigQuery TableSchema and TableFieldSchema classes.
  D. Call a transform that returns TableRow objects, where each element in the PCollection represents a single row in the table.

Answer(s): D
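The `.fromQuery` option replaces a full-table read with a query, so only the selected columns are scanned and shipped into the pipeline. A stdlib-only sketch that builds such a projection query for the table in the question; the column names are hypothetical (the `log_data` schema is not given), and the square-bracket `[project:dataset.table]` syntax assumes the query runs as legacy SQL.

```java
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: build a column-projecting legacy-SQL query string for a BigQuery read. */
class ProjectionQuery {
    // Conservative check for plain column names: letters, digits, underscores.
    private static final Pattern COLUMN = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** Returns "SELECT c1, c2 FROM [tableSpec]" after validating each column name. */
    static String select(String tableSpec, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("select at least one column");
        }
        for (String c : columns) {
            if (!COLUMN.matcher(c).matches()) {
                throw new IllegalArgumentException("unexpected column name: " + c);
            }
        }
        // Legacy SQL quotes a project:dataset.table spec in square brackets.
        return "SELECT " + String.join(", ", columns) + " FROM [" + tableSpec + "]";
    }

    public static void main(String[] args) {
        // Hypothetical field names standing in for the "new key features".
        String q = select("clouddataflow-readonly:samples.log_data", List.of("event_time", "feature_a"));
        System.out.println(q);
        // prints: SELECT event_time, feature_a FROM [clouddataflow-readonly:samples.log_data]
    }
}
```

The resulting string would be passed to `.fromQuery(...)` in place of `.from(...)` in the read above.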





Your company is streaming real-time sensor data from their factory floor into Bigtable and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?

  A. Use a row key of the form <timestamp>.
  B. Use a row key of the form <sensorid>.
  C. Use a row key of the form <timestamp>#<sensorid>.
  D. Use a row key of the form <sensorid>#<timestamp>.

Answer(s): A
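Bigtable stores rows sorted lexicographically by row key, so the key prefix decides which readings end up next to each other. A minimal sketch, using hypothetical sensor IDs and fixed-width epoch-second timestamps, that builds keys in two of the forms above and sorts them the way Bigtable would order the rows. Bigtable compares raw bytes; Java's `String` ordering matches that for ASCII keys like these.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BiFunction;

/** Sketch: show the row order produced by different Bigtable row-key layouts. */
class RowKeyOrder {
    // Hypothetical readings; fixed-width timestamps keep string order equal to numeric order.
    static final String[] SENSORS = {"s1", "s2"};
    static final long[] TIMES = {1700000000L, 1700000060L};

    /** Builds one key per (sensor, time) reading, then sorts keys as Bigtable orders rows. */
    static List<String> sortedKeys(BiFunction<String, Long, String> keyFn) {
        List<String> keys = new ArrayList<>();
        for (long t : TIMES) {
            for (String s : SENSORS) {
                keys.add(keyFn.apply(s, t));
            }
        }
        keys.sort(Comparator.naturalOrder());
        return keys;
    }

    public static void main(String[] args) {
        // <timestamp>#<sensorid>: readings from all sensors interleave by time.
        System.out.println(sortedKeys((s, t) -> t + "#" + s));
        // [1700000000#s1, 1700000000#s2, 1700000060#s1, 1700000060#s2]

        // <sensorid>#<timestamp>: each sensor's readings form one contiguous range.
        System.out.println(sortedKeys((s, t) -> s + "#" + t));
        // [s1#1700000000, s1#1700000060, s2#1700000000, s2#1700000060]
    }
}
```

With a sensor-ID prefix, a dashboard tile for one sensor can read a single contiguous key range (all keys starting with `s1#`); with a timestamp prefix, the most recent readings from every sensor sit together at the end of the key space.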



Your company's customer and order databases are often under heavy load. This makes performing analytics against them difficult without harming operations. The databases are in a MySQL cluster, with nightly backups taken using mysqldump. You want to perform analytics with minimal impact on operations.
What should you do?

  A. Add a node to the MySQL cluster and build an OLAP cube there.
  B. Use an ETL tool to load the data from MySQL into Google BigQuery.
  C. Connect an on-premises Apache Hadoop cluster to MySQL and perform ETL.
  D. Mount the backups to Google Cloud SQL, and then process the data using Google Cloud Dataproc.

Answer(s): C



Viewing page 7 of 78
Viewing questions 31 - 35 out of 384 questions


