QUESTION: 25

Your team is building several data pipelines that contain a collection of complex tasks and dependencies that you want to execute on a schedule, in a specific order. The tasks and dependencies consist of files in Cloud Storage, Apache Spark jobs, and data in BigQuery. You need to design a system that can schedule and automate these data processing tasks using a fully managed approach.
What should you do?

Use Cloud Scheduler to schedule the jobs to run.
Use Cloud Tasks to schedule and run the jobs asynchronously.
Create directed acyclic graphs (DAGs) in Cloud Composer. Use the appropriate operators to connect to Cloud Storage, Spark, and BigQuery.
Create directed acyclic graphs (DAGs) in Apache Airflow deployed on Google Kubernetes Engine.
Use the appropriate operators to connect to Cloud Storage, Spark, and BigQuery.

Answer(s): C

Explanation:

Using Cloud Composer to create Directed Acyclic Graphs (DAGs) is the best solution because it is a fully managed, scalable workflow orchestration service based on Apache Airflow. Cloud Composer allows you to define complex task dependencies and schedules while integrating seamlessly with Google Cloud services such as Cloud Storage, BigQuery, and Dataproc for Apache Spark jobs. This approach minimizes operational overhead, supports scheduling and automation, and provides an efficient and fully managed way to orchestrate your data pipelines.

Reveal Solution Next Question

QUESTION: 26

You are responsible for managing Cloud Storage buckets for a research company. Your company has well-defined data tiering and retention rules. You need to optimize storage costs while achieving your data retention needs.
What should you do?

Configure the buckets to use the Archive storage class.
Configure a lifecycle management policy on each bucket to downgrade the storage class and remove objects based on age.
Configure the buckets to use the Standard storage class and enable Object Versioning.
Configure the buckets to use the Autoclass feature.

Answer(s): B

Explanation:

Configuring a lifecycle management policy on each Cloud Storage bucket allows you to automatically transition objects to lower-cost storage classes (such as Nearline, Coldline, or Archive) based on their age or other criteria. Additionally, the policy can automate the removal of objects once they are no longer needed, ensuring compliance with retention rules and optimizing storage costs. This approach aligns well with well-defined data tiering and retention needs, providing cost efficiency and automation.

Reveal Solution Next Question

QUESTION: 27

You are using your own data to demonstrate the capabilities of BigQuery to your organization's leadership team. You need to perform a one- time load of the files stored on your local machine into BigQuery using as little effort as possible.
What should you do?

Write and execute a Python script using the BigQuery Storage Write API library.
Create a Dataproc cluster, copy the files to Cloud Storage, and write an Apache Spark job using the spark-bigquery-connector.
Execute the bq load command on your local machine.
Create a Dataflow job using the Apache Beam FileIO and BigQueryIO connectors with a local runner.

Answer(s): C

Explanation:

Using the bq load command is the simplest and most efficient way to perform a one-time load of files from your local machine into BigQuery. This command-line tool is easy to use, requires minimal setup, and supports direct uploads from local files to BigQuery tables. It meets the requirement for minimal effort while allowing you to quickly demonstrate BigQuery's capabilities to your organization's leadership team.

Reveal Solution Next Question

QUESTION: 28

Your organization uses Dataflow pipelines to process real-time financial transactions. You discover that one of your Dataflow jobs has failed. You need to troubleshoot the issue as quickly as possible.
What should you do?

Set up a Cloud Monitoring dashboard to track key Dataflow metrics, such as data throughput, error rates, and resource utilization.
Create a custom script to periodically poll the Dataflow API for job status updates, and send email alerts if any errors are identified.
Navigate to the Dataflow Jobs page in the Google Cloud console. Use the job logs and worker logs to identify the error.
Use the gcloud CLI tool to retrieve job metrics and logs, and analyze them for errors and performance bottlenecks.

Answer(s): C

Explanation:

To troubleshoot a failed Dataflow job as quickly as possible, you should navigate to the Dataflow Jobs page in the Google Cloud console. The console provides access to detailed job logs and worker logs, which can help you identify the cause of the failure. The graphical interface also allows you to visualize pipeline stages, monitor performance metrics, and pinpoint where the error occurred, making it the most efficient way to diagnose and resolve the issue promptly.

Reveal Solution Next Question

Free Associate-Data-Practitioner Exam Braindumps (page: 8)

QUESTION: 25

Explanation:

QUESTION: 26

Explanation:

QUESTION: 27

Explanation:

QUESTION: 28

Explanation: