Amazon AWS Certified Data Engineer - Associate DEA-C01 Exam

Updated On: 1-Feb-2026

A data engineer uses Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to run data pipelines in an AWS account.
A workflow recently failed to run. The data engineer needs to use Apache Airflow logs to diagnose the failure of the workflow.
Which log type should the data engineer use to diagnose the cause of the failure?

  A. YourEnvironmentName-WebServer
  B. YourEnvironmentName-Scheduler
  C. YourEnvironmentName-DAGProcessing
  D. YourEnvironmentName-Task

Answer(s): D
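Task logs are the ones that capture the worker output of individual DAG task runs, which is where a workflow failure usually surfaces. The following Python sketch is illustrative only; the environment name "my-airflow-env", the filter pattern, and the log group derivation are assumptions, not values from the question.

# Sketch: confirm task logging is enabled for an MWAA environment and scan the
# Task log group in CloudWatch Logs for recent errors. Names are placeholders.
import boto3

mwaa = boto3.client("mwaa")
logs = boto3.client("logs")

env = mwaa.get_environment(Name="my-airflow-env")["Environment"]
task_logs = env["LoggingConfiguration"]["TaskLogs"]
print("Task logs enabled:", task_logs.get("Enabled"), "level:", task_logs.get("LogLevel"))

# The Task log group is typically named airflow-<EnvironmentName>-Task;
# here we derive its name from the ARN returned by get_environment.
log_group_arn = task_logs["CloudWatchLogGroupArn"]
log_group_name = log_group_arn.split(":log-group:")[1].split(":")[0]

# Pull recent events that mention errors in task execution.
events = logs.filter_log_events(logGroupName=log_group_name,
                                filterPattern="ERROR", limit=20)
for event in events["events"]:
    print(event["timestamp"], event["message"][:200])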



A company currently uses a provisioned Amazon EMR cluster that includes general purpose Amazon EC2 instances. The EMR cluster uses EMR managed scaling between one and five task nodes for the company’s long-running Apache Spark extract, transform, and load (ETL) job. The company runs the ETL job every day.
When the company runs the ETL job, the EMR cluster quickly scales up to five nodes. The EMR cluster often reaches maximum CPU usage, but the memory usage remains under 30%.
The company wants to modify the EMR cluster configuration to reduce the EMR costs to run the daily ETL job.
Which solution will meet these requirements MOST cost-effectively?

  A. Increase the maximum number of task nodes for EMR managed scaling to 10.
  B. Change the task node type from general purpose EC2 instances to memory optimized EC2 instances.
  C. Switch the task node type from general purpose EC2 instances to compute optimized EC2 instances.
  D. Reduce the scaling cooldown period for the provisioned EMR cluster.

Answer(s): C
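A minimal sketch of the change in option C, assuming an existing cluster: add a compute optimized task instance group (the job is CPU-bound and uses little memory) and keep the same 1-5 managed scaling limits. The cluster ID, instance type, and counts below are placeholders.

# Sketch only: switch task nodes to compute optimized instances and re-apply
# the existing managed scaling limits. All identifiers are placeholders.
import boto3

emr = boto3.client("emr")
cluster_id = "j-XXXXXXXXXXXXX"  # placeholder cluster ID

# Add a compute optimized task instance group for the CPU-bound Spark ETL job.
emr.add_instance_groups(
    JobFlowId=cluster_id,
    InstanceGroups=[
        {
            "Name": "task-compute-optimized",
            "Market": "ON_DEMAND",
            "InstanceRole": "TASK",
            "InstanceType": "c5.2xlarge",  # placeholder compute optimized type
            "InstanceCount": 1,
        }
    ],
)

# Keep managed scaling between one and five task nodes, as in the question.
emr.put_managed_scaling_policy(
    ClusterId=cluster_id,
    ManagedScalingPolicy={
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": 1,
            "MaximumCapacityUnits": 5,
        }
    },
)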



A finance company uses Amazon Redshift as a data warehouse. The company stores the data in a shared Amazon S3 bucket. The company uses Amazon Redshift Spectrum to access the data that is stored in the S3 bucket. The data comes from certified third-party data providers. Each third-party data provider has unique connection details.
To comply with regulations, the company must ensure that none of the data is accessible from outside the company's AWS environment.
Which combination of steps should the company take to meet these requirements? (Choose two.)

  A. Replace the existing Redshift cluster with a new Redshift cluster that is in a private subnet. Use an interface VPC endpoint to connect to the Redshift cluster. Use a NAT gateway to give Redshift access to the S3 bucket.
  B. Create an AWS CloudHSM hardware security module (HSM) for each data provider. Encrypt each data provider's data by using the corresponding HSM for each data provider.
  C. Turn on enhanced VPC routing for the Amazon Redshift cluster. Set up an AWS Direct Connect connection and configure a connection between each data provider and the finance company’s VPC.
  D. Define table constraints for the primary keys and the foreign keys.
  E. Use federated queries to access the data from each data provider. Do not upload the data to the S3 bucket. Perform the federated queries through a gateway VPC endpoint.

Answer(s): A,C
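A minimal sketch of the enhanced VPC routing part of option C, assuming an existing cluster; the cluster identifier "finance-dwh" is a placeholder. With enhanced VPC routing, COPY/UNLOAD and Redshift Spectrum traffic to Amazon S3 stays inside the VPC (for example, through a gateway VPC endpoint) instead of traversing the public internet.

# Sketch: enable enhanced VPC routing on an existing Redshift cluster.
import boto3

redshift = boto3.client("redshift")
redshift.modify_cluster(
    ClusterIdentifier="finance-dwh",  # placeholder cluster identifier
    EnhancedVpcRouting=True,
)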



A technology company currently uses Amazon Kinesis Data Streams to collect log data in real time. The company wants to use Amazon Redshift for downstream real-time queries and to enrich the log data.
Which solution will ingest data into Amazon Redshift with the LEAST operational overhead?

  A. Set up an Amazon Kinesis Data Firehose delivery stream to send data to a Redshift provisioned cluster table.
  B. Set up an Amazon Kinesis Data Firehose delivery stream to send data to Amazon S3. Configure a Redshift provisioned cluster to load data every minute.
  C. Configure Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) to send data directly to a Redshift provisioned cluster table.
  D. Use Amazon Redshift streaming ingestion from Kinesis Data Streams to present the data as a materialized view.

Answer(s): D
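The materialized-view approach in option D is set up with SQL; the sketch below issues that SQL through the Redshift Data API. The cluster identifier, database, database user, IAM role ARN, and stream name are all placeholders, not values from the question.

# Sketch of Redshift streaming ingestion from Kinesis Data Streams,
# executed through the Redshift Data API. All identifiers are placeholders.
import boto3

rsd = boto3.client("redshift-data")

# 1) Map the Kinesis data stream into Redshift as an external schema.
create_schema = """
CREATE EXTERNAL SCHEMA kinesis_logs
FROM KINESIS
IAM_ROLE 'arn:aws:iam::111122223333:role/RedshiftStreamingRole';
"""

# 2) Define a materialized view over the stream; AUTO REFRESH keeps it near
#    real time, and downstream queries and enrichment run against the view.
create_mv = """
CREATE MATERIALIZED VIEW log_events_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       partition_key,
       JSON_PARSE(kinesis_data) AS payload
FROM kinesis_logs."application-log-stream"
WHERE CAN_JSON_PARSE(kinesis_data);
"""

for sql in (create_schema, create_mv):
    rsd.execute_statement(
        ClusterIdentifier="logs-cluster",  # placeholder
        Database="dev",                    # placeholder
        DbUser="awsuser",                  # placeholder
        Sql=sql,
    )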



A company maintains a data warehouse in an on-premises Oracle database. The company wants to build a data lake on AWS. The company wants to load data warehouse tables into Amazon S3 and synchronize the tables with incremental data that arrives from the data warehouse every day.
Each table has a column that contains monotonically increasing values. The size of each table is less than 50 GB. The data warehouse tables are refreshed every night between 1 AM and 2 AM. A business intelligence team queries the tables between 10 AM and 8 PM every day.
Which solution will meet these requirements in the MOST operationally efficient way?

  A. Use an AWS Database Migration Service (AWS DMS) full load plus CDC job to load tables that contain monotonically increasing data columns from the on-premises data warehouse to Amazon S3. Use custom logic in AWS Glue to append the daily incremental data to a full-load copy that is in Amazon S3.
  B. Use an AWS Glue Java Database Connectivity (JDBC) connection. Configure a job bookmark for a column that contains monotonically increasing values. Write custom logic to append the daily incremental data to a full-load copy that is in Amazon S3.
  C. Use an AWS Database Migration Service (AWS DMS) full load migration to load the data warehouse tables into Amazon S3 every day. Overwrite the previous day's full-load copy every day.
  D. Use AWS Glue to load a full copy of the data warehouse tables into Amazon S3 every day. Overwrite the previous day's full-load copy every day.

Answer(s): A
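A hedged sketch of the full load plus CDC task in option A, assuming the Oracle source endpoint, the Amazon S3 target endpoint, and the replication instance already exist. All ARNs, the schema name, and the table-mapping rule are placeholders.

# Sketch: create a DMS task that performs an initial full load and then
# replicates ongoing changes (CDC) from Oracle to Amazon S3.
import boto3, json

dms = boto3.client("dms")

table_mappings = {
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-dwh-tables",
            "object-locator": {"schema-name": "DWH", "table-name": "%"},  # placeholders
            "rule-action": "include",
        }
    ]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="oracle-dwh-to-s3",
    SourceEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:SOURCE",    # placeholder
    TargetEndpointArn="arn:aws:dms:us-east-1:111122223333:endpoint:TARGET",    # placeholder
    ReplicationInstanceArn="arn:aws:dms:us-east-1:111122223333:rep:INSTANCE",  # placeholder
    MigrationType="full-load-and-cdc",  # full load first, then ongoing incremental changes
    TableMappings=json.dumps(table_mappings),
)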





