Free AWS Certified Data Engineer - Associate DEA-C01 Exam Braindumps (page: 8)

Page 8 of 39

A data engineer needs to use AWS Step Functions to design an orchestration workflow. The workflow must parallel process a large collection of data files and apply a specific transformation to each file.
Which Step Functions state should the data engineer use to meet these requirements?

  1. Parallel state
  2. Choice state
  3. Map state
  4. Wait state

Answer(s): C



A company is migrating a legacy application to an Amazon S3 based data lake. A data engineer reviewed data that is associated with the legacy application. The data engineer found that the legacy data contained some duplicate information.
The data engineer must identify and remove duplicate information from the legacy application data.
Which solution will meet these requirements with the LEAST operational overhead?

  1. Write a custom extract, transform, and load (ETL) job in Python. Use the DataFrame.drop_duplicates() function by importing the Pandas library to perform data deduplication.
  2. Write an AWS Glue extract, transform, and load (ETL) job. Use the FindMatches machine learning (ML) transform to transform the data to perform data deduplication.
  3. Write a custom extract, transform, and load (ETL) job in Python. Import the Python dedupe library. Use the dedupe library to perform data deduplication.
  4. Write an AWS Glue extract, transform, and load (ETL) job. Import the Python dedupe library. Use the dedupe library to perform data deduplication.

Answer(s): B



A company is building an analytics solution. The solution uses Amazon S3 for data lake storage and Amazon Redshift for a data warehouse. The company wants to use Amazon Redshift Spectrum to query the data that is in Amazon S3.
Which actions will provide the FASTEST queries? (Choose two.)

  1. Use gzip compression to compress individual files to sizes that are between 1 GB and 5 GB.
  2. Use a columnar storage file format.
  3. Partition the data based on the most common query predicates.
  4. Split the data into files that are less than 10 KB.
  5. Use file formats that are not splittable.

Answer(s): B,C



A company uses Amazon RDS to store transactional data. The company runs an RDS DB instance in a private subnet. A developer wrote an AWS Lambda function with default settings to insert, update, or delete data in the DB instance.
The developer needs to give the Lambda function the ability to connect to the DB instance privately without using the public internet.
Which combination of steps will meet this requirement with the LEAST operational overhead? (Choose two.)

  1. Turn on the public access setting for the DB instance.
  2. Update the security group of the DB instance to allow only Lambda function invocations on the database port.
  3. Configure the Lambda function to run in the same subnet that the DB instance uses.
  4. Attach the same security group to the Lambda function and the DB instance. Include a self-referencing rule that allows access through the database port.
  5. Update the network ACL of the private subnet to include a self-referencing rule that allows access through the database port.

Answer(s): C,D



Page 8 of 39



Post your Comments and Discuss Amazon AWS Certified Data Engineer - Associate DEA-C01 exam with other Community members:

saif Ali commented on October 24, 2024
for Question no 50 The answer would be using lambda vdf as this provides automation
INDIA
upvote

Josh commented on October 09, 2024
Team, thanks for the wonderful support. This guide helped me a lot.
UNITED STATES
upvote

Ming commented on September 19, 2024
Very cool very precise. I highly recommend this study package.
UNITED STATES
upvote

Geovani commented on September 18, 2024
Very useful content and point by point explanation. And also the payment and download process was straight forward. Good job guys.
Italy
upvote