Free Associate-Data-Practitioner Exam Braindumps (page: 7)


You need to create a weekly aggregated sales report based on a large volume of data. You want to use Python to design an efficient process for generating this report.
What should you do?

  1. Create a Cloud Run function that uses NumPy. Use Cloud Scheduler to schedule the function to run once a week.
  2. Create a Colab Enterprise notebook and use the bigframes.pandas library. Schedule the notebook to execute once a week.
  3. Create a Cloud Data Fusion and Wrangler flow. Schedule the flow to run once a week.
  4. Create a Dataflow directed acyclic graph (DAG) coded in Python. Use Cloud Scheduler to schedule the code to run once a week.

Answer(s): D

Explanation:

Using a Dataflow pipeline coded in Python (a directed acyclic graph authored with the Apache Beam Python SDK) is the most efficient solution for generating a weekly aggregated sales report from a large volume of data. Dataflow is a fully managed service optimized for large-scale processing and handles aggregation efficiently, Python lets you customize the pipeline logic, and Cloud Scheduler automates the weekly run. The approach is scalable and cost-effective even for very large datasets.
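
As a rough sketch (not part of the question itself), a weekly aggregation pipeline of this kind could look like the following, written with the Apache Beam Python SDK that Dataflow executes. The project, region, bucket paths, and CSV column layout are placeholder assumptions.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    # Placeholder project, region, and bucket values.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",
        region="us-central1",
        temp_location="gs://my-bucket/temp",
    )

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            # Assumes CSV rows shaped like: product_id,order_date,amount
            | "ReadSales" >> beam.io.ReadFromText(
                "gs://my-bucket/sales/*.csv", skip_header_lines=1)
            | "ParseRow" >> beam.Map(lambda line: line.split(","))
            | "KeyByProduct" >> beam.Map(lambda f: (f[0], float(f[2])))
            | "SumPerProduct" >> beam.CombinePerKey(sum)
            | "FormatLine" >> beam.MapTuple(lambda product, total: f"{product},{total:.2f}")
            | "WriteReport" >> beam.io.WriteToText("gs://my-bucket/reports/weekly_sales")
        )


if __name__ == "__main__":
    run()

Cloud Scheduler would then fire a weekly cron job that launches this pipeline, for example through a Dataflow template launch request.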



Your organization has decided to move their on-premises Apache Spark-based workload to Google Cloud. You want to be able to manage the code without needing to provision and manage your own cluster.
What should you do?

  1. Migrate the Spark jobs to Dataproc Serverless.
  2. Configure a Google Kubernetes Engine cluster with Spark operators, and deploy the Spark jobs.
  3. Migrate the Spark jobs to Dataproc on Google Kubernetes Engine.
  4. Migrate the Spark jobs to Dataproc on Compute Engine.

Answer(s): A

Explanation:

Migrating the Spark jobs to Dataproc Serverless is the best approach because it runs Spark workloads without requiring you to provision or manage clusters. Dataproc Serverless automatically scales resources to the workload, which reduces administrative overhead and lets the team focus on the Spark code rather than the underlying infrastructure. It is fully managed and cost-effective, aligning well with the goal of avoiding cluster management.
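
To illustrate the "no cluster to manage" point, a PySpark job can be submitted as a Dataproc Serverless batch with the Python client library, along the lines of the sketch below; the project, region, and Cloud Storage paths are placeholders.

from google.cloud import dataproc_v1

PROJECT_ID = "my-project"   # placeholder
REGION = "us-central1"      # placeholder

# The client must target the regional Dataproc endpoint.
client = dataproc_v1.BatchControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

# Describe the workload: just the main PySpark file in Cloud Storage.
batch = dataproc_v1.Batch(
    pyspark_batch=dataproc_v1.PySparkBatch(
        main_python_file_uri="gs://my-bucket/jobs/spark_job.py"
    )
)

# create_batch returns a long-running operation; result() waits for the
# serverless batch to finish. No cluster is created or managed by you.
operation = client.create_batch(
    parent=f"projects/{PROJECT_ID}/locations/{REGION}",
    batch=batch,
)
response = operation.result()
print(f"Batch finished in state: {response.state.name}")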



You are developing a data ingestion pipeline to load small CSV files into BigQuery from Cloud Storage. You want to load these files upon arrival to minimize data latency. You want to accomplish this with minimal cost and maintenance.
What should you do?

  1. Use the bq command-line tool within a Cloud Shell instance to load the data into BigQuery.
  2. Create a Cloud Composer pipeline to load new files from Cloud Storage to BigQuery and schedule it to run every 10 minutes.
  3. Create a Cloud Run function to load the data into BigQuery that is triggered when data arrives in Cloud Storage.
  4. Create a Dataproc cluster to pull CSV files from Cloud Storage, process them using Spark, and write the results to BigQuery.

Answer(s): C

Explanation:

A Cloud Run function triggered by Cloud Storage events is the best solution because it minimizes both cost and maintenance while providing low-latency ingestion. Cloud Run is serverless and scales automatically with the workload, so no dedicated instance or cluster is required, and it integrates with Cloud Storage event notifications so each file can be loaded into BigQuery as soon as it arrives. This keeps the pipeline cost-effective, scalable, and easy to manage.
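
A minimal sketch of such a function, assuming the Python functions-framework runtime, a Cloud Storage (Eventarc) trigger, and a placeholder destination table:

import functions_framework
from google.cloud import bigquery

TABLE_ID = "my-project.sales_dataset.raw_sales"  # placeholder destination table


@functions_framework.cloud_event
def load_csv_to_bigquery(cloud_event):
    """Triggered when an object is finalized in the Cloud Storage bucket."""
    data = cloud_event.data
    bucket = data["bucket"]
    name = data["name"]

    # Ignore anything that is not a CSV file.
    if not name.endswith(".csv"):
        return

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )

    # BigQuery pulls the file directly from Cloud Storage; the function only
    # starts and waits for the load job.
    uri = f"gs://{bucket}/{name}"
    client.load_table_from_uri(uri, TABLE_ID, job_config=job_config).result()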



Your organization has a petabyte of application logs stored as Parquet files in Cloud Storage. You need to quickly perform a one-time SQL-based analysis of the files and join them to data that already resides in BigQuery.
What should you do?

  1. Create a Dataproc cluster, and write a PySpark job to join the data from BigQuery to the files in Cloud Storage.
  2. Launch a Cloud Data Fusion environment, use plugins to connect to BigQuery and Cloud Storage, and use the SQL join operation to analyze the data.
  3. Create external tables over the files in Cloud Storage, and perform SQL joins to tables in BigQuery to analyze the data.
  4. Use the bq load command to load the Parquet files into BigQuery, and perform SQL joins to analyze the data.

Answer(s): C

Explanation:

Creating external tables over the Parquet files in Cloud Storage lets you run SQL analysis and joins against data that already resides in BigQuery without loading the files first. For a one-time analysis this avoids the time and cost of ingesting a petabyte of data, and because BigQuery reads Parquet natively from Cloud Storage, the queries remain quick and cost-effective.
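
For illustration, the external-table approach could be driven from the BigQuery Python client roughly as follows; the project, dataset, table, column, and bucket names are hypothetical.

from google.cloud import bigquery

client = bigquery.Client()

# One-time DDL: an external table that reads the Parquet files in place.
client.query(
    """
    CREATE EXTERNAL TABLE `my-project.analytics.app_logs_ext`
    OPTIONS (
      format = 'PARQUET',
      uris = ['gs://my-log-bucket/logs/*.parquet']
    )
    """
).result()

# Join the external table to a native BigQuery table (hypothetical schema).
rows = client.query(
    """
    SELECT u.customer_id, COUNT(*) AS error_count
    FROM `my-project.analytics.app_logs_ext` AS l
    JOIN `my-project.analytics.users` AS u
      ON l.user_id = u.user_id
    WHERE l.severity = 'ERROR'
    GROUP BY u.customer_id
    """
).result()

for row in rows:
    print(row.customer_id, row.error_count)

The same CREATE EXTERNAL TABLE statement could equally be run once in the BigQuery console and the join performed interactively.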





