Free Associate-Data-Practitioner Exam Braindumps


You are designing a pipeline to process data files that arrive in Cloud Storage by 3:00 am each day. Data processing is performed in stages, where the output of one stage becomes the input of the next. Each stage takes a long time to run. Occasionally a stage fails, and you have to address the problem. You need to ensure that the final output is generated as quickly as possible.
What should you do?

  1. Design a Spark program that runs under Dataproc. Code the program to wait for user input when an error is detected. Rerun the last action after correcting any stage output data errors.
  2. Design the pipeline as a set of PTransforms in Dataflow. Restart the pipeline after correcting any stage output data errors.
  3. Design the workflow as a Cloud Workflow instance. Code the workflow to jump to a given stage based on an input parameter. Rerun the workflow after correcting any stage output data errors.
  4. Design the processing as a directed acyclic graph (DAG) in Cloud Composer. Clear the state of the failed task after correcting any stage output data errors.

Answer(s): D

Explanation:

Using Cloud Composer to design the processing pipeline as a Directed Acyclic Graph (DAG) is the most suitable approach because:

Fault tolerance: Cloud Composer (based on Apache Airflow) allows for handling failures at specific stages. You can clear the state of a failed task and rerun it without reprocessing the entire pipeline.

Stage-based processing: DAGs are ideal for workflows with interdependent stages where the output of one stage serves as input to the next.

Efficiency: This approach minimizes downtime and ensures that only failed stages are rerun, leading to faster final output generation.
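Below is a minimal sketch of what such a staged pipeline could look like as a Cloud Composer (Airflow 2.x) DAG. The DAG ID, schedule, and process_stage_* callables are hypothetical placeholders; the point is that the stages form a linear DAG, so clearing a failed task's state in the Airflow UI (or with `airflow tasks clear`) reruns only that task and its downstream tasks rather than the whole pipeline.

```python
# Hypothetical staged pipeline as an Airflow DAG for Cloud Composer.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def process_stage_1():
    # Read the raw files that land in Cloud Storage by 3:00 AM and
    # write the stage-1 output (placeholder logic).
    pass


def process_stage_2():
    # Consume stage-1 output and produce stage-2 output (placeholder logic).
    pass


def process_stage_3():
    # Produce the final output from stage-2 results (placeholder logic).
    pass


with DAG(
    dag_id="daily_file_processing",
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 3 * * *",  # run daily at 3:00 AM
    catchup=False,
) as dag:
    stage_1 = PythonOperator(task_id="stage_1", python_callable=process_stage_1)
    stage_2 = PythonOperator(task_id="stage_2", python_callable=process_stage_2)
    stage_3 = PythonOperator(task_id="stage_3", python_callable=process_stage_3)

    # The output of one stage feeds the next, so the tasks form a linear DAG.
    stage_1 >> stage_2 >> stage_3
```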



Another team in your organization is requesting access to a BigQuery dataset. You need to share the dataset with the team while minimizing the risk of unauthorized copying of data. You also want to create a reusable framework in case you need to share this data with other teams in the future.
What should you do?

  1. Create authorized views in the team's Google Cloud project that is only accessible by the team.
  2. Create a private exchange using Analytics Hub with data egress restriction, and grant access to the team members.
  3. Enable domain restricted sharing on the project. Grant the team members the BigQuery Data Viewer IAM role on the dataset.
  4. Export the dataset to a Cloud Storage bucket in the team's Google Cloud project that is only accessible by the team.

Answer(s): B

Explanation:

Using Analytics Hub to create a private exchange with data egress restrictions ensures controlled sharing of the dataset while minimizing the risk of unauthorized copying. This approach allows you to provide secure, managed access to the dataset without giving direct access to the raw data. The egress restriction ensures that data cannot be exported or copied outside the designated boundaries. Additionally, this solution provides a reusable framework that simplifies future data sharing with other teams or projects while maintaining strict data governance.
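As a rough sketch of this setup, the snippet below uses the google-cloud-bigquery-analyticshub client library to create a private exchange and a listing with restricted export (egress) enabled. The project, location, exchange and listing IDs, and dataset name are placeholders, and the exact message and field names (for example restricted_export_config) are assumptions that should be verified against the library version you install.

```python
# Hypothetical Analytics Hub setup: private exchange + egress-restricted listing.
from google.cloud import bigquery_analyticshub_v1 as analyticshub

client = analyticshub.AnalyticsHubServiceClient()
parent = "projects/my-data-project/locations/us"

# 1. Create a private data exchange that serves as the reusable sharing framework.
exchange = client.create_data_exchange(
    parent=parent,
    data_exchange_id="internal_exchange",
    data_exchange=analyticshub.DataExchange(display_name="Internal exchange"),
)

# 2. Create a listing for the BigQuery dataset with restricted export enabled,
#    so subscribers can query the data but not copy it out of the boundary.
listing = analyticshub.Listing(
    display_name="Shared analytics dataset",
    bigquery_dataset=analyticshub.Listing.BigQueryDatasetSource(
        dataset="projects/my-data-project/datasets/shared_dataset"
    ),
    restricted_export_config=analyticshub.Listing.RestrictedExportConfig(
        enabled=True
    ),
)
client.create_listing(
    parent=exchange.name,
    listing_id="shared_dataset_listing",
    listing=listing,
)
```

Team members would then be granted a subscriber role on the exchange (for example through its IAM policy) so they can subscribe to the listing, and the same exchange can host future listings for other teams.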



Your company has developed a website that allows users to upload and share video files. These files are most frequently accessed and shared when they are initially uploaded. Over time, the files are accessed and shared less frequently, although some old video files may remain very popular.

You need to design a storage system that is simple and cost-effective.
What should you do?

  1. Create a single-region bucket with Autoclass enabled.
  2. Create a single-region bucket. Configure a Cloud Scheduler job that runs every 24 hours and changes the storage class based on upload date.
  3. Create a single-region bucket with custom Object Lifecycle Management policies based on upload date.
  4. Create a single-region bucket with Archive as the default storage class.

Answer(s): C

Explanation:

Creating a single-region bucket with custom Object Lifecycle Management policies based on upload date is the most appropriate solution. This approach allows you to automatically transition objects to less expensive storage classes as their access frequency decreases over time. For example, frequently accessed files can remain in the Standard storage class initially, then transition to Nearline, Coldline, or Archive storage as their popularity wanes. This strategy ensures a cost-effective and efficient storage system while maintaining simplicity by automating the lifecycle management of video files.
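A minimal sketch of such a lifecycle configuration, assuming the google-cloud-storage client library and a hypothetical bucket name, is shown below. The age thresholds are illustrative; once the rules are set, Cloud Storage evaluates them automatically, so no scheduler job is needed.

```python
# Hypothetical Object Lifecycle Management setup for a video-upload bucket.
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("video-uploads-bucket")

# Transition objects to cheaper storage classes as they age past the upload date.
bucket.lifecycle_rules = [
    {
        "action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
        "condition": {"age": 30},
    },
    {
        "action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
        "condition": {"age": 90},
    },
    {
        "action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
        "condition": {"age": 365},
    },
]
bucket.patch()  # persist the updated lifecycle configuration on the bucket
```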



You recently inherited the task of managing Dataflow streaming pipelines in your organization and noticed that proper access had not been provisioned to you. You need to request a Google-provided IAM role so you can restart the pipelines. You need to follow the principle of least privilege.
What should you do?

  1. Request the Dataflow Developer role.
  2. Request the Dataflow Viewer role.
  3. Request the Dataflow Worker role.
  4. Request the Dataflow Admin role.

Answer(s): A

Explanation:

The Dataflow Developer role provides the necessary permissions to manage Dataflow streaming pipelines, including the ability to restart pipelines. This role adheres to the principle of least privilege, as it grants only the permissions required to manage and operate Dataflow jobs without unnecessary administrative access. Other roles, such as Dataflow Admin, would grant broader permissions, which are not needed in this scenario.
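For illustration, the sketch below shows how a project administrator could grant the requested role, assuming the google-cloud-resource-manager client library; the project ID and user email are placeholders.

```python
# Hypothetical grant of the Dataflow Developer role at the project level.
from google.cloud import resourcemanager_v3

client = resourcemanager_v3.ProjectsClient()
resource = "projects/my-dataflow-project"

# Read the current IAM policy, append a binding for the Dataflow Developer
# role, and write the updated policy back.
policy = client.get_iam_policy(request={"resource": resource})
policy.bindings.add(
    role="roles/dataflow.developer",
    members=["user:data-practitioner@example.com"],
)
client.set_iam_policy(request={"resource": resource, "policy": policy})
```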





