Google PROFESSIONAL MACHINE LEARNING ENGINEER Exam Actual Questions
Professional Machine Learning Engineer (Page 5)

Updated On: 19-Jun-2026

You work for a social media company. You need to detect whether posted images contain cars. Each training example is a member of exactly one class. You have trained an object detection neural network and deployed the model version to AI Platform Prediction for evaluation. Before deployment, you created an evaluation job and attached it to the AI Platform Prediction model version. You notice that the precision is lower than your business requirements allow. How should you adjust the model's final layer softmax threshold to increase precision?

  1. Increase the recall.
  2. Decrease the recall.
  3. Increase the number of false positives.
  4. Decrease the number of false negatives.

Answer(s): B

Explanation:

Option B is correct because raising precision requires making the model more conservative about predicting the positive class: increasing the decision threshold removes false positives at the cost of missing some true positives, so recall decreases. A is incorrect: increasing recall means lowering the threshold, which admits more false positives and hurts precision. C is incorrect: more false positives directly lower precision. D is incorrect: decreasing false negatives also means lowering the threshold, which raises recall rather than precision.
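The trade-off can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming a binary "car" versus "no car" decision; the scores and labels are made-up illustration data, not output from any real model.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical softmax scores for the "car" class and the true labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30]
labels = [1, 1, 1, 0, 1, 0, 0, 1]

low = precision_recall(scores, labels, threshold=0.50)
high = precision_recall(scores, labels, threshold=0.75)

print(f"threshold=0.50 precision={low[0]:.2f} recall={low[1]:.2f}")
print(f"threshold=0.75 precision={high[0]:.2f} recall={high[1]:.2f}")
```

On this toy data, raising the threshold removes both false positives, so precision rises while recall falls, which is the behavior option B describes.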



You are responsible for building a unified analytics environment across a variety of on-premises data marts. Your company is experiencing data quality and security challenges when integrating data across the servers, caused by the use of a wide range of disconnected tools and temporary solutions. You need a fully managed, cloud-native data integration service that will lower the total cost of work and reduce repetitive work. Some members on your team prefer a codeless interface for building Extract, Transform, Load (ETL) process. Which service should you use?

  1. Dataflow
  2. Dataprep
  3. Apache Flink
  4. Cloud Data Fusion

Answer(s): D

Explanation:

Option D is correct because Cloud Data Fusion is a fully managed, cloud-native data integration service that supports both code-based and codeless ETL/ELT workflows, enabling unified data access, governance, and secure data movement across on-premises and cloud environments, which lowers TCO and reduces repetitive work. A is incorrect: Dataflow is a managed stream and batch processing service (Apache Beam) that requires pipeline code, so it is not a codeless integration platform. B is incorrect: Dataprep is a data preparation tool focused on profiling and cleansing, not a full enterprise data integration platform. C is incorrect: Apache Flink is an open-source stream processing framework, not a managed cloud integration service.



You are an ML engineer at a regulated insurance company. You are asked to develop an insurance approval model that accepts or rejects insurance applications from potential customers. What factors should you consider before building the model?

  1. Redaction, reproducibility, and explainability
  2. Traceability, reproducibility, and explainability
  3. Federated learning, reproducibility, and explainability
  4. Differential privacy, federated learning, and explainability

Answer(s): B

Explanation:

Option B is correct because in regulated insurance, traceability, reproducibility, and explainability are essential for compliance, auditability, and stakeholder trust: you must trace data lineage and model decisions, reproduce results for audits, and provide explanations to justify approvals or denials.
A) Redaction focuses on privacy but lacks full emphasis on auditability and reproducibility; not sufficient alone for regulatory needs.
C) Federated learning centers on distributed training, not primarily on regulatory traceability and explainability.
D) Differential privacy and federated learning address privacy and collaboration, but do not directly ensure traceability and explainability required for regulatory approval decisions.



You are training a ResNet model on Vertex AI using TPUs to visually categorize types of defects in automobile engines. You capture the training profile using the Cloud TPU profiler plugin and observe that it is highly input-bound. You want to reduce the bottleneck and speed up your model training process. Which modifications should you make to the tf.data dataset? (Choose two.)

  1. Use the interleave option for reading data.
  2. Reduce the value of the repeat parameter.
  3. Increase the buffer size for the shuffle option.
  4. Set the prefetch option equal to the training batch size.
  5. Decrease the batch size argument in your transformation.

Answer(s): A,D

Explanation:

Option A is correct because using interleave can improve input pipeline throughput by overlapping I/O and preprocessing when reading from multiple data sources, reducing input-bound delays on TPUs. Option D is correct because prefetching allows overlapping data preparation with model execution; setting prefetch to at least the training batch size helps keep TPU steps fed continuously.
B is incorrect: reducing repeat only changes how many times the dataset is iterated; it does not make reading or preprocessing any faster. C is incorrect: a larger shuffle buffer improves randomization but costs more memory and more time to fill, so it does not relieve an input bottleneck. E is incorrect: decreasing batch size usually worsens throughput on TPUs and does not mitigate input-bound bottlenecks.
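Options A and D both work by overlapping input work with training. As a rough, framework-free illustration of the prefetch idea (this is plain Python, not TensorFlow code), the sketch below fills a bounded buffer from a background thread while the consumer works on the current item. In tf.data the corresponding calls are `Dataset.prefetch` and `Dataset.interleave`; the helper names `prefetch` and `load_batches` here are hypothetical.

```python
import queue
import threading


def prefetch(source, buffer_size):
    """Yield items from `source`, produced ahead of time by a background thread."""
    buf = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the source

    def producer():
        try:
            for item in source:
                buf.put(item)  # blocks while the buffer is full
        finally:
            buf.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is done:
            return
        yield item


def load_batches(num_batches, batch_size):
    """Hypothetical stand-in for reading and preprocessing one batch at a time."""
    for i in range(num_batches):
        yield [i * batch_size + j for j in range(batch_size)]


# The consumer (the training step) receives batches in order while the
# producer keeps preparing up to `buffer_size` batches ahead.
batches = list(prefetch(load_batches(num_batches=4, batch_size=2), buffer_size=2))
print(batches)
```

The buffer only helps when producing a batch and consuming a batch can genuinely run at the same time, which is the case for host-side input processing feeding an accelerator.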



You have trained a model on a dataset that required computationally expensive preprocessing operations. You need to execute the same preprocessing at prediction time. You deployed the model on AI Platform for high-throughput online prediction. Which architecture should you use?

  1. Validate the accuracy of the model that you trained on preprocessed data.
    Create a new model that uses the raw data and is available in real time.
    Deploy the new model onto Vertex AI for online prediction.
  2. Send incoming prediction requests to a Pub/Sub topic.
    Transform the incoming data using a Dataflow job.
    Submit a prediction request to Vertex AI using the transformed data.
    Write the predictions to an outbound Pub/Sub queue.
  3. Stream incoming prediction request data into Cloud Spanner.
    Create a view to abstract your preprocessing logic.
    Query the view every second for new records.
    Submit a prediction request to Vertex AI using the transformed data.
    Write the predictions to an outbound Pub/Sub queue.
  4. Send incoming prediction requests to a Pub/Sub topic.
    Set up a Cloud Function that is triggered when messages are published to the Pub/Sub topic.
    Implement your preprocessing logic in the Cloud Function.
    Submit a prediction request to Vertex AI using the transformed data.
    Write the predictions to an outbound Pub/Sub queue.

Answer(s): B

Explanation:

Option B is correct because Pub/Sub decouples request intake from model serving, and a Dataflow job can run the computationally expensive preprocessing at scale before calling Vertex AI for predictions and publishing the results.
A) Incorrect: validating accuracy and training a new model on raw data does not execute the required preprocessing at prediction time.
C) Incorrect: Spanner is a transactional database; polling a view every second adds latency and is not a scalable streaming pipeline for prediction requests.
D) Incorrect: the flow is the same as in B, but Cloud Functions has memory and execution-time limits that make it a poor fit for computationally expensive preprocessing at high throughput.



Your team trained and tested a DNN regression model with good results. Six months after deployment, the model is performing poorly due to a change in the distribution of the input data. How should you address the input differences in production?

  1. Create alerts to monitor for skew using Vertex AI Model Monitoring, and retrain the model.
  2. Perform feature selection on the model, and retrain the model with fewer features.
  3. Retrain the model, and select an L2 regularization parameter with a hyperparameter tuning service.
  4. Perform feature selection on the model, and retrain the model on a monthly basis with fewer features.

Answer(s): A

Explanation:

Option A is correct because Vertex AI Model Monitoring can detect data drift and skew in production inputs, triggering alerts and enabling timely retraining to maintain model accuracy.
B is incorrect because feature selection and reducing features do not address drift detection or data distribution changes; they reduce model capacity and may worsen performance under distribution shift.
C is incorrect because while retraining is helpful, choosing an L2 regularization parameter via hyperparameter tuning does not directly respond to data drift or monitoring signals.
D is incorrect because monthly retraining with fewer features is arbitrary and does not leverage automated drift detection or monitoring insights, and may fail to adapt promptly to drift.
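To make "a change in the distribution of the input data" concrete, the sketch below compares a training-time histogram of one numerical feature with a serving-time histogram using Jensen-Shannon divergence, a common distance for this purpose. It is an illustration only and not the Vertex AI Model Monitoring implementation; the feature values, bin edges, and `ALERT_THRESHOLD` are assumptions.

```python
import math


def histogram(values, edges):
    """Normalized histogram of `values` over bins given by sorted `edges`."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # The last bin is closed on the right so the maximum edge is counted.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence with log base 2, so the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical values of one feature at training time and six months later.
training = [1, 2, 2, 3, 3, 3, 4, 4, 5]
serving = [4, 5, 5, 6, 6, 6, 7, 7, 8]
edges = [0, 2, 4, 6, 8]

ALERT_THRESHOLD = 0.3  # assumed value; tune per feature in practice

drift = js_divergence(histogram(training, edges), histogram(serving, edges))
if drift > ALERT_THRESHOLD:
    print(f"drift score {drift:.3f} exceeds threshold, consider retraining")
```

A score near 0 means the two histograms are almost identical, and a score near 1 means they barely overlap.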



You need to train a computer vision model that predicts the type of government ID present in a given image using a GPU-powered virtual machine on Compute Engine. You use the following parameters:
Optimizer: SGD
Image shape = 224×224
Batch size = 64
Epochs = 10
Verbose = 2

During training you encounter the following error: ResourceExhaustedError: Out Of Memory (OOM) when allocating tensor. What should you do?

  1. Change the optimizer.
  2. Reduce the batch size.
  3. Change the learning rate.
  4. Reduce the image shape.

Answer(s): B

Explanation:

Option B is correct because a ResourceExhaustedError during tensor allocation means the GPU ran out of memory, and activation memory grows in proportion to batch size. Reducing the batch size from 64 lowers per-step memory without changing the model or the input data.
A) Incorrect: plain SGD already keeps little or no per-parameter state, so switching optimizers would not free memory, and optimizers such as Adam would use more.
C) Incorrect: changing the learning rate alters optimization dynamics, not memory consumption.
D) Incorrect: a smaller image shape would also reduce memory, but it discards image detail and may conflict with the input size the network expects, so it is a fallback after reducing the batch size.
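Batch size is a direct multiplier on per-step tensor memory. As a back-of-the-envelope sketch, the snippet below computes the size of the input tensor alone for the parameters in the question, assuming 3 color channels and float32 values (neither is stated in the question). Intermediate activations inside the network scale with batch size in the same way and are usually much larger.

```python
def input_batch_bytes(batch_size, height, width, channels=3, bytes_per_value=4):
    """Bytes needed for one input batch; channels and dtype size are assumptions."""
    return batch_size * height * width * channels * bytes_per_value


full = input_batch_bytes(batch_size=64, height=224, width=224)
half = input_batch_bytes(batch_size=32, height=224, width=224)

print(f"batch 64: {full / 2**20:.2f} MiB")
print(f"batch 32: {half / 2**20:.2f} MiB")
```

Halving the batch size halves this figure, which is why stepping the batch size down (64, 32, 16, ...) is the usual first response to an OOM error.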


Reference:

https://github.com/tensorflow/tensorflow/issues/136



You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine (GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?

  1. Significantly increase the max_batch_size TensorFlow Serving parameter.
  2. Switch to the tensorflow-model-server-universal version of TensorFlow Serving.
  3. Significantly increase the max_enqueued_batches TensorFlow Serving parameter.
  4. Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes.

Answer(s): A

Explanation:

Option A is correct because a larger max_batch_size lets TensorFlow Serving group more concurrent requests into a single inference call, amortizing per-request overhead at a few thousand queries per second without changing infrastructure. Larger batches can add queueing delay for individual requests, so the value is usually tuned together with the batch timeout.
B is incorrect because tensorflow-model-server-universal is built with only basic optimizations so that it runs on a wide range of CPUs; it offers no latency advantage over the standard build.
C is incorrect because max_enqueued_batches only bounds how many batches may wait in the queue; raising it lets more requests pile up, which can increase latency rather than reduce it.
D is incorrect because recompiling TensorFlow Serving and setting a minimum CPU platform for the serving nodes goes beyond the stated constraint of not changing the underlying infrastructure.



Viewing page 5 of 44
Viewing questions 33 - 40 out of 339 questions

