Amazon AWS Certified Data Engineer - Associate DEA-C01 Exam

Updated On: 1-Feb-2026

A company is building a data lake for a new analytics team. The company is using Amazon S3 for storage and Amazon Athena for query analysis. All data that is in Amazon S3 is in Apache Parquet format.
The company is running a new Oracle database as a source system in the company’s data center. The company has 70 tables in the Oracle database. All the tables have primary keys. Data can occasionally change in the source system. The company wants to ingest the tables every day into the data lake.
Which solution will meet this requirement with the LEAST effort?

  1. Create an Apache Sqoop job in Amazon EMR to read the data from the Oracle database. Configure the Sqoop job to write the data to Amazon S3 in Parquet format.
  2. Create an AWS Glue connection to the Oracle database. Create an AWS Glue bookmark job to ingest the data incrementally and to write the data to Amazon S3 in Parquet format.
  3. Create an AWS Database Migration Service (AWS DMS) task for ongoing replication. Set the Oracle database as the source. Set Amazon S3 as the target. Configure the task to write the data in Parquet format.
  4. Create an Oracle database in Amazon RDS. Use AWS Database Migration Service (AWS DMS) to migrate the on-premises Oracle database to Amazon RDS. Configure triggers on the tables to invoke AWS Lambda functions to write changed records to Amazon S3 in Parquet format.

Answer(s): C
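
For context, option C maps to an AWS DMS task that uses full-load-and-CDC migration with an Oracle source endpoint and an S3 target endpoint that writes Parquet. The boto3 sketch below is a minimal illustration only: the replication instance, the Oracle source endpoint, the S3 bucket, and the IAM role are assumed to already exist, and every identifier and ARN is a placeholder.

```python
import json
import boto3

dms = boto3.client("dms")

# Hypothetical S3 target endpoint that writes Parquet for Athena to query.
s3_target = dms.create_endpoint(
    EndpointIdentifier="datalake-s3-target",
    EndpointType="target",
    EngineName="s3",
    S3Settings={
        "BucketName": "example-datalake-bucket",
        "DataFormat": "parquet",  # matches the Parquet layout of the data lake
        "ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/dms-s3-access",
    },
)

# Full load plus ongoing replication, selecting all 70 tables by wildcard.
dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-to-datalake",
    SourceEndpointArn="arn:aws:dms:us-east-1:123456789012:endpoint:oracle-source",
    TargetEndpointArn=s3_target["Endpoint"]["EndpointArn"],
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:example-instance",
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "all-tables",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```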



A transportation company wants to track vehicle movements by capturing geolocation records. The records are 10 bytes in size. The company receives up to 10,000 records every second. Data transmission delays of a few minutes are acceptable because of unreliable network conditions.
The transportation company wants to use Amazon Kinesis Data Streams to ingest the geolocation data. The company needs a reliable mechanism to send data to Kinesis Data Streams. The company needs to maximize the throughput efficiency of the Kinesis shards.
Which solution will meet these requirements in the MOST operationally efficient way?

  1. Kinesis Agent
  2. Kinesis Producer Library (KPL)
  3. Amazon Kinesis Data Firehose
  4. Kinesis SDK

Answer(s): B
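
The KPL fits here because its aggregation feature packs many small user records into a single Kinesis record, so the 1,000 records-per-second per-shard limit stops being the bottleneck for 10-byte records. A quick back-of-the-envelope calculation in Python, using the standard per-shard write limits of 1 MiB/s and 1,000 records/s; the batch size of 500 records per aggregated record is purely illustrative:

```python
# Back-of-the-envelope shard math for 10,000 records/s of 10-byte records,
# using the standard per-shard write limits of 1 MiB/s and 1,000 records/s.
RECORDS_PER_SEC = 10_000
RECORD_BYTES = 10
SHARD_BYTES_PER_SEC = 1_048_576
SHARD_RECORDS_PER_SEC = 1_000

# Without aggregation every geolocation record is its own Kinesis record,
# so the 1,000 records/s limit dominates even though the byte rate is ~100 KB/s.
shards_without_kpl = max(
    RECORDS_PER_SEC / SHARD_RECORDS_PER_SEC,
    RECORDS_PER_SEC * RECORD_BYTES / SHARD_BYTES_PER_SEC,
)

# With KPL aggregation (an illustrative 500 user records per aggregated record),
# the record-count limit nearly disappears and the byte rate becomes the limit.
AGGREGATED_BATCH = 500
shards_with_kpl = max(
    RECORDS_PER_SEC / AGGREGATED_BATCH / SHARD_RECORDS_PER_SEC,
    RECORDS_PER_SEC * RECORD_BYTES / SHARD_BYTES_PER_SEC,
)

print(f"Shard capacity needed without aggregation: {shards_without_kpl:.2f}")  # 10.00
print(f"Shard capacity needed with KPL aggregation: {shards_with_kpl:.2f}")    # ~0.10
```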



An investment company needs to manage and extract insights from a continuously growing volume of semi-structured data.
A data engineer needs to deduplicate the semi-structured data by removing exact duplicate records and records that are duplicates except for common misspellings.
Which solution will meet these requirements with the LEAST operational overhead?

  1. Use the FindMatches feature of AWS Glue to remove duplicate records.
  2. Use non-window functions in Amazon Athena to remove duplicate records.
  3. Use Amazon Neptune ML and an Apache Gremlin script to remove duplicate records.
  4. Use the global tables feature of Amazon DynamoDB to prevent duplicate data.

Answer(s): A
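
As a rough illustration of option A, the boto3 sketch below creates a FindMatches machine learning transform over an AWS Glue Data Catalog table. The database, table, role, and key-column names are placeholders, and the transform would still need labeled examples and training before it can be applied in a Glue job.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical catalog table holding the semi-structured records.
response = glue.create_ml_transform(
    Name="dedupe-investment-records",
    Role="arn:aws:iam::123456789012:role/glue-service-role",
    InputRecordTables=[
        {"DatabaseName": "analytics_db", "TableName": "raw_records"},
    ],
    Parameters={
        "TransformType": "FIND_MATCHES",
        "FindMatchesParameters": {
            "PrimaryKeyColumnName": "record_id",
            # Bias toward precision so near-duplicates (for example, common
            # misspellings) are matched conservatively; tune after reviewing results.
            "PrecisionRecallTradeoff": 0.9,
        },
    },
    MaxCapacity=10.0,
)
print(response["TransformId"])
```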



A company is building an inventory management system and an inventory reordering system to automatically reorder products. Both systems use Amazon Kinesis Data Streams. The inventory management system uses the Amazon Kinesis Producer Library (KPL) to publish data to a stream. The inventory reordering system uses the Amazon Kinesis Client Library (KCL) to consume data from the stream. The company configures the stream to scale up and down as needed.
Before the company deploys the systems to production, the company discovers that the inventory reordering system received duplicated data.
Which factors could have caused the reordering system to receive duplicated data? (Choose two.)

  1. The producer experienced network-related timeouts.
  2. The stream’s value for the IteratorAgeMilliseconds metric was too high.
  3. There was a change in the number of shards, record processors, or both.
  4. The AggregationEnabled configuration property was set to true.
  5. The max_records configuration property was set to a number that was too high.

Answer(s): A,C
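
Because producer retries (A) and resharding or record-processor changes (C) can replay records, Kinesis consumers are normally written to be idempotent. A minimal, library-agnostic Python sketch, assuming each record carries a unique event ID assigned by the producer:

```python
# Minimal idempotent-consumer sketch (library-agnostic): remember which record
# IDs have already been handled so records replayed after a producer retry or a
# reshard are ignored. In production the "seen" set would live in a durable
# store such as DynamoDB rather than in process memory.
processed_ids = set()

def reorder_product(record: dict) -> None:
    # Hypothetical business logic for the reordering system.
    print(f"Reordering {record['sku']} x{record['quantity']}")

def process_record(record: dict) -> None:
    record_id = record["event_id"]      # assumes the producer assigns a unique ID
    if record_id in processed_ids:
        return                          # duplicate delivery -- skip it
    reorder_product(record)
    processed_ids.add(record_id)

# The second call delivers the same record again and is silently skipped.
event = {"event_id": "evt-001", "sku": "ABC-123", "quantity": 5}
process_record(event)
process_record(event)
```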



An ecommerce company operates a complex order fulfillment process that spans several operational systems hosted in AWS. Each of the operational systems has a Java Database Connectivity (JDBC)-compliant relational database where the latest processing state is captured.
The company needs to give an operations team the ability to track orders on an hourly basis across the entire fulfillment process.
Which solution will meet these requirements with the LEAST development overhead?

  1. Use AWS Glue to build ingestion pipelines from the operational systems into Amazon Redshift. Build dashboards in Amazon QuickSight that track the orders.
  2. Use AWS Glue to build ingestion pipelines from the operational systems into Amazon DynamoDB. Build dashboards in Amazon QuickSight that track the orders.
  3. Use AWS Database Migration Service (AWS DMS) to capture changed records in the operational systems. Publish the changes to an Amazon DynamoDB table in a different AWS region from the source database. Build Grafana dashboards that track the orders.
  4. Use AWS Database Migration Service (AWS DMS) to capture changed records in the operational systems. Publish the changes to an Amazon DynamoDB table in a different AWS region from the source database. Build Amazon QuickSight dashboards that track the orders.

Answer(s): A
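
For option A, the two pieces that carry the hourly requirement are a JDBC connection that AWS Glue can use to read the operational databases and a scheduled trigger for the ingestion job. The boto3 sketch below is illustrative only: the Glue job that loads Amazon Redshift is assumed to already exist, and the connection URL, credentials, and names are placeholders.

```python
import boto3

glue = boto3.client("glue")

# Hypothetical JDBC connection to one of the operational databases.
glue.create_connection(
    ConnectionInput={
        "Name": "orders-system-jdbc",
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:postgresql://orders-db.example.internal:5432/orders",
            "USERNAME": "glue_reader",
            "PASSWORD": "replace-with-secret",
        },
    }
)

# Run the (pre-existing) ingestion job into Amazon Redshift at the top of every hour.
glue.create_trigger(
    Name="hourly-orders-ingest",
    Type="SCHEDULED",
    Schedule="cron(0 * * * ? *)",
    Actions=[{"JobName": "orders-to-redshift"}],
    StartOnCreation=True,
)
```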


