Free AWS Certified Data Engineer - Associate DEA-C01 Exam Braindumps (page: 13)


A data engineer has a one-time task to read data from objects that are in Apache Parquet format in an Amazon S3 bucket. The data engineer needs to query only one column of the data.
Which solution will meet these requirements with the LEAST operational overhead?

  A. Configure an AWS Lambda function to load data from the S3 bucket into a pandas DataFrame. Write a SQL SELECT statement on the DataFrame to query the required column.
  B. Use S3 Select to write a SQL SELECT statement to retrieve the required column from the S3 objects.
  C. Prepare an AWS Glue DataBrew project to consume the S3 objects and to query the required column.
  D. Run an AWS Glue crawler on the S3 objects. Use a SQL SELECT statement in Amazon Athena to query the required column.

Answer(s): B
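
For context on why option B carries the least overhead: S3 Select runs the SQL projection inside Amazon S3 itself, so a one-time, single-column read needs no cluster, crawler, or catalog. A minimal sketch with boto3 follows; the bucket, key, and column names are hypothetical.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket, key, and column names for illustration.
response = s3.select_object_content(
    Bucket="example-bucket",
    Key="data/part-00000.parquet",
    ExpressionType="SQL",
    # Project only the one required column; S3 filters server-side.
    Expression='SELECT s."customer_id" FROM S3Object s',
    InputSerialization={"Parquet": {}},
    OutputSerialization={"CSV": {}},
)

# Results arrive as an event stream; collect the Records payloads.
for event in response["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode("utf-8"), end="")
```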



A company uses Amazon Redshift for its data warehouse. The company must automate refresh schedules for Amazon Redshift materialized views.
Which solution will meet this requirement with the LEAST effort?

  A. Use Apache Airflow to refresh the materialized views.
  B. Use an AWS Lambda user-defined function (UDF) within Amazon Redshift to refresh the materialized views.
  C. Use the query editor v2 in Amazon Redshift to refresh the materialized views.
  D. Use an AWS Glue workflow to refresh the materialized views.

Answer(s): C
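
For context on option C: query editor v2 can run a saved query on a schedule, so the refresh is automated with no Airflow, Glue, or custom Lambda code. The statement it would run is plain SQL; below is a sketch of issuing that same statement through the Redshift Data API, assuming hypothetical cluster, database, user, and view names.

```python
import boto3

redshift_data = boto3.client("redshift-data")

# Hypothetical cluster, database, user, and view names for illustration.
# This is the same statement a query editor v2 scheduled query would run.
response = redshift_data.execute_statement(
    ClusterIdentifier="example-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="REFRESH MATERIALIZED VIEW sales_summary_mv;",
)
print(response["Id"])  # statement ID, usable to poll for completion
```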



A data engineer must orchestrate a data pipeline that consists of one AWS Lambda function and one AWS Glue job. The solution must integrate with AWS services.
Which solution will meet these requirements with the LEAST management overhead?

  A. Use an AWS Step Functions workflow that includes a state machine. Configure the state machine to run the Lambda function and then the AWS Glue job.
  B. Use an Apache Airflow workflow that is deployed on an Amazon EC2 instance. Define a directed acyclic graph (DAG) in which the first task is to call the Lambda function and the second task is to call the AWS Glue job.
  C. Use an AWS Glue workflow to run the Lambda function and then the AWS Glue job.
  D. Use an Apache Airflow workflow that is deployed on Amazon Elastic Kubernetes Service (Amazon EKS). Define a directed acyclic graph (DAG) in which the first task is to call the Lambda function and the second task is to call the AWS Glue job.

Answer(s): A
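
For context on option A: Step Functions has native service integrations for both Lambda and AWS Glue, so the two-step pipeline is a short Amazon States Language definition with nothing to host or patch. A minimal sketch follows; the function name, job name, ARNs, and state machine name are all hypothetical.

```python
import json

import boto3

# Hypothetical names and ARNs for illustration.
definition = {
    "Comment": "Run a Lambda function, then an AWS Glue job",
    "StartAt": "InvokeLambda",
    "States": {
        "InvokeLambda": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "example-preprocess-fn"},
            "Next": "RunGlueJob",
        },
        "RunGlueJob": {
            "Type": "Task",
            # The .sync integration waits for the Glue job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "example-etl-job"},
            "End": True,
        },
    },
}

sfn = boto3.client("stepfunctions")
sfn.create_state_machine(
    name="example-pipeline",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/example-sfn-role",
)
```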



A company needs to set up a data catalog and metadata management for data sources that run in the AWS Cloud. The company will use the data catalog to maintain the metadata of all the objects that are in a set of data stores. The data stores include structured sources such as Amazon RDS and Amazon Redshift. The data stores also include semistructured sources such as JSON files and .xml files that are stored in Amazon S3.
The company needs a solution that will update the data catalog on a regular basis. The solution also must detect changes to the source metadata.
Which solution will meet these requirements with the LEAST operational overhead?

  A. Use Amazon Aurora as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the Aurora data catalog. Schedule the Lambda functions to run periodically.
  B. Use the AWS Glue Data Catalog as the central metadata repository. Use AWS Glue crawlers to connect to multiple data stores and to update the Data Catalog with metadata changes. Schedule the crawlers to run periodically to update the metadata catalog.
  C. Use Amazon DynamoDB as the data catalog. Create AWS Lambda functions that will connect to the data catalog. Configure the Lambda functions to gather the metadata information from multiple sources and to update the DynamoDB data catalog. Schedule the Lambda functions to run periodically.
  D. Use the AWS Glue Data Catalog as the central metadata repository. Extract the schema for Amazon RDS and Amazon Redshift sources, and build the Data Catalog. Use AWS Glue crawlers for data that is in Amazon S3 to infer the schema and to automatically update the Data Catalog.

Answer(s): B
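
For context on option B: a single Glue crawler can target both JDBC sources (Amazon RDS, Amazon Redshift) and files in Amazon S3, and its schedule plus schema-change policy keep the Data Catalog in step with source metadata changes. A sketch of creating such a crawler with boto3 follows; every name, path, connection, and the cron expression is hypothetical.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical names, role, connection, and paths for illustration.
glue.create_crawler(
    Name="example-catalog-crawler",
    Role="arn:aws:iam::123456789012:role/example-glue-role",
    DatabaseName="example_catalog_db",
    Targets={
        # Semistructured JSON and .xml files in S3.
        "S3Targets": [{"Path": "s3://example-bucket/raw/"}],
        # JDBC targets cover structured stores such as Amazon RDS.
        "JdbcTargets": [
            {"ConnectionName": "example-rds-connection", "Path": "exampledb/%"}
        ],
    },
    # Run every 6 hours so metadata changes are picked up regularly.
    Schedule="cron(0 */6 * * ? *)",
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "DEPRECATE_IN_DATABASE",
    },
)
```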






Post your comments and discuss the Amazon AWS Certified Data Engineer - Associate DEA-C01 exam with other community members:

Abhishek commented on December 21, 2024
It was nice.
Anonymous

saif Ali commented on October 24, 2024
For question no. 50, the answer would be to use a Lambda UDF, as that provides automation.
INDIA

Josh commented on October 09, 2024
Team, thanks for the wonderful support. This guide helped me a lot.
UNITED STATES

Ming commented on September 19, 2024
Very cool very precise. I highly recommend this study package.
UNITED STATES

Geovani commented on September 18, 2024
Very useful content and point by point explanation. And also the payment and download process was straight forward. Good job guys.
Italy