Free Professional Data Engineer Exam Braindumps (page: 7)

Page 7 of 68

Business owners at your company have given you a database of bank transactions. Each row contains the user ID, transaction type, transaction location, and transaction amount. They ask you to investigate what type of machine learning can be applied to the data.
Which three machine learning applications can you use? (Choose three.)

  A. Supervised learning to determine which transactions are most likely to be fraudulent.
  B. Unsupervised learning to determine which transactions are most likely to be fraudulent.
  C. Clustering to divide the transactions into N categories based on feature similarity.
  D. Supervised learning to predict the location of a transaction.
  E. Reinforcement learning to predict the location of a transaction.
  F. Unsupervised learning to predict the location of a transaction.

Answer(s): B,C,D
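
The distinction this question turns on is that the rows carry a location column (usable as a label) but no fraud column (so fraud detection has no labels to learn from). A minimal Python sketch of that contrast follows. The sample rows, the function names, and the z-score threshold are all hypothetical and chosen only for illustration; a real model would be far more involved.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Hypothetical rows: (user_id, transaction_type, location, amount).
rows = [
    ("u1", "debit", "NYC", 20.0),
    ("u1", "debit", "NYC", 25.0),
    ("u1", "credit", "BOS", 22.0),
    ("u2", "debit", "SFO", 30.0),
    ("u2", "debit", "SFO", 900.0),
    ("u2", "debit", "SFO", 28.0),
]

def train_location_model(rows):
    """Supervised baseline: the location column serves as the label.

    Learns the most frequent location per user from labeled examples.
    """
    by_user = defaultdict(Counter)
    for user, _type, location, _amount in rows:
        by_user[user][location] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in by_user.items()}

def amount_outliers(rows, threshold=2.0):
    """Unsupervised: flag amounts far from the mean; no fraud labels are used."""
    amounts = [row[3] for row in rows]
    mu, sigma = mean(amounts), pstdev(amounts)
    if sigma == 0:
        return []
    return [row for row in rows if abs(row[3] - mu) / sigma > threshold]

location_model = train_location_model(rows)
suspicious = amount_outliers(rows)
```

Here `location_model` maps each user to a predicted location learned from the labeled column, while `suspicious` is derived purely from the shape of the data.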



Your company is streaming real-time sensor data from their factory floor into Bigtable and they have noticed extremely poor performance. How should the row key be redesigned to improve Bigtable performance on queries that populate real-time dashboards?

  A. Use a row key of the form <timestamp>.
  B. Use a row key of the form <sensorid>.
  C. Use a row key of the form <timestamp>#<sensorid>.
  D. Use a row key of the form <sensorid>#<timestamp>.

Answer(s): D
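
To compare the candidate key shapes, the sketch below builds row keys in plain Python and relies only on the fact that Bigtable stores rows sorted lexicographically by key. It does not call any Bigtable API. The sensor names, the timestamps, and the helper functions are hypothetical.

```python
def timestamp_key(sensor_id: str, ts: int) -> str:
    """Key of the form <timestamp>; the sensor ID is not part of the key."""
    return f"{ts:010d}"

def sensor_first_key(sensor_id: str, ts: int) -> str:
    """Key of the form <sensorid>#<timestamp>."""
    return f"{sensor_id}#{ts:010d}"

def prefix_scan(keys, prefix):
    """Emulate a row-range read over sorted keys that share a prefix."""
    return [key for key in sorted(keys) if key.startswith(prefix)]

# Hypothetical readings: three sensors reporting at three consecutive seconds.
readings = [
    (f"sensor{n:02d}", 1700000000 + t) for t in range(3) for n in range(3)
]

ts_keys = {timestamp_key(s, t) for s, t in readings}
sensor_keys = {sensor_first_key(s, t) for s, t in readings}

# With timestamp-only keys, sensors reporting in the same second collide,
# and every new write sorts to the end of the table.
collisions = len(readings) - len(ts_keys)

# With sensor-first keys, one sensor's history is a contiguous, time-ordered range.
sensor01_rows = prefix_scan(sensor_keys, "sensor01#")
```

With sensor-first keys, writes arriving at the same moment land in different parts of the key space, and a dashboard query for one sensor becomes a single prefix scan.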



Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine learning model in Google Cloud Bigtable. The team is observing suboptimal performance with reads and writes of their initial load of 10 TB of data. They want to improve this performance while minimizing cost.
What should they do?

  A. Redefine the schema by evenly distributing reads and writes across the row space of the table.
  B. The performance issue should be resolved over time as the size of the Bigtable cluster is increased.
  C. Redesign the schema to use a single row key to identify values that need to be updated frequently in the cluster.
  D. Redesign the schema to use row keys based on numeric IDs that increase sequentially per user viewing the offers.

Answer(s): A
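
The effect of key design on load distribution can be illustrated with a small simulation. The sketch assumes a key space of four-digit, zero-padded IDs split into four equal contiguous ranges, which is a simplification of how a sorted table is partitioned and not Bigtable's actual splitting logic. Reversing the digits is used here as one example of a key that spreads sequential IDs; all names and sizes are hypothetical.

```python
from collections import Counter

NUM_RANGES = 4  # hypothetical number of contiguous key ranges
KEY_WIDTH = 4   # keys run from "0000" to "9999"

def key_range(key: str) -> int:
    """Map a zero-padded key to one of NUM_RANGES equal lexicographic ranges.

    For fixed-width, zero-padded digits, lexicographic order equals numeric order.
    """
    return int(key) * NUM_RANGES // 10 ** KEY_WIDTH

def sequential_key(n: int) -> str:
    """Row key that increases sequentially with the ID."""
    return f"{n:0{KEY_WIDTH}d}"

def reversed_key(n: int) -> str:
    """Row key built from the reversed digits of the ID."""
    return sequential_key(n)[::-1]

# The 1,000 most recently created IDs receive the current write traffic.
latest_ids = range(9000, 10000)

sequential_load = Counter(key_range(sequential_key(n)) for n in latest_ids)
reversed_load = Counter(key_range(reversed_key(n)) for n in latest_ids)
```

In this simulation every write with a sequential key falls into the last range, while the reversed keys place 250 writes in each of the four ranges.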



You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?

  A. Continuously retrain the model on just the new data.
  B. Continuously retrain the model on a combination of existing data and the new data.
  C. Train on the existing data while using the new data as your test set.
  D. Train on the new data while using the existing data as your test set.

Answer(s): B


Reference:

https://cloud.google.com/automl-tables/docs/prepare
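
The difference between retraining on just the new data and retraining on existing plus new data can be sketched with a toy model. The "model" below is only a per-user most-frequent-style lookup, and the users, styles, and function name are hypothetical; it stands in for whatever recommender is actually used.

```python
from collections import Counter

def train(interactions):
    """Toy 'model': for each user, the style they chose most often."""
    counts = {}
    for user, style in interactions:
        counts.setdefault(user, Counter())[style] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}

# Hypothetical historical data and a newly streamed batch.
existing = [("ann", "formal"), ("ann", "formal"), ("bob", "sporty")]
new_batch = [("ann", "casual"), ("cat", "vintage")]

# Retraining on the new batch alone forgets users absent from that batch.
model_new_only = train(new_batch)

# Retraining on the combination keeps history and picks up new users.
model_combined = train(existing + new_batch)
```

In this sketch `model_new_only` has no entry for `bob`, while `model_combined` covers all three users. How strongly to weight recent interactions against older ones is a separate design choice that the sketch does not address.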





