Amazon AWS Certified Data Engineer - Associate DEA-C01 Exam
AWS Certified Data Engineer - Associate DEA-C01 (Page 2)

Updated On: 1-Feb-2026

A data engineer is configuring an AWS Glue job to read data from an Amazon S3 bucket. The data engineer has set up the necessary AWS Glue connection details and an associated IAM role. However, when the data engineer attempts to run the AWS Glue job, the data engineer receives an error message indicating problems with the Amazon S3 VPC gateway endpoint.
The data engineer must resolve the error and connect the AWS Glue job to the S3 bucket.
Which solution will meet this requirement?

  A. Update the AWS Glue security group to allow inbound traffic from the Amazon S3 VPC gateway endpoint.
  B. Configure an S3 bucket policy to explicitly grant the AWS Glue job permissions to access the S3 bucket.
  C. Review the AWS Glue job code to ensure that the AWS Glue connection details include a fully qualified domain name.
  D. Verify that the VPC's route table includes inbound and outbound routes for the Amazon S3 VPC gateway endpoint.

Answer(s): D
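
An S3 gateway endpoint is reachable only if the route table associated with the Glue connection's subnet contains a route whose target is that endpoint, which is why option D resolves the error. As a minimal sketch (not part of the question), the following boto3 snippet lists such routes; the region and VPC ID are placeholders.

```python
# Minimal sketch (boto3): list route tables in the Glue connection's VPC and
# print any route whose target is a gateway VPC endpoint (S3 gateway endpoints
# appear as GatewayId values starting with "vpce-"). Region and VPC ID are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

VPC_ID = "vpc-0123456789abcdef0"  # hypothetical VPC used by the Glue connection

route_tables = ec2.describe_route_tables(
    Filters=[{"Name": "vpc-id", "Values": [VPC_ID]}]
)["RouteTables"]

for rt in route_tables:
    for route in rt["Routes"]:
        # Gateway endpoints add a prefix-list destination with a vpce-* target.
        if route.get("GatewayId", "").startswith("vpce-"):
            print(rt["RouteTableId"], route.get("DestinationPrefixListId"), route["GatewayId"])
```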



A company uploads .csv files to an Amazon S3 bucket. The company’s data platform team has set up an AWS Glue crawler to perform data discovery and to create the tables and schemas.
An AWS Glue job writes processed data from the tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creates the Amazon Redshift tables in the Redshift database appropriately.
If the company reruns the AWS Glue job for any reason, duplicate records are introduced into the Amazon Redshift tables. The company needs a solution that will update the Redshift tables without duplicates.
Which solution will meet these requirements?

  A. Modify the AWS Glue job to copy the rows into a staging Redshift table. Add SQL commands to update the existing rows with new values from the staging Redshift table.
  B. Modify the AWS Glue job to load the previously inserted data into a MySQL database. Perform an upsert operation in the MySQL database. Copy the results to the Amazon Redshift tables.
  C. Use Apache Spark’s DataFrame dropDuplicates() API to eliminate duplicates. Write the data to the Redshift tables.
  D. Use the AWS Glue ResolveChoice built-in transform to select the value of the column from the most recent record.

Answer(s): A
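
Option A describes the standard staging-table merge pattern in Amazon Redshift. Below is a minimal sketch of SQL the Glue job could issue after loading the staging table, sent through the Redshift Data API; the table names, join key (order_id), cluster identifier, and secret ARN are all placeholders.

```python
# Minimal sketch (Redshift Data API): merge a staging table into the target
# table so reruns do not create duplicates. All identifiers are hypothetical.
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")  # assumed region

merge_sql = [
    # Update rows that already exist in the target with values from staging.
    """UPDATE orders
       SET amount = stage_orders.amount, updated_at = stage_orders.updated_at
       FROM stage_orders
       WHERE orders.order_id = stage_orders.order_id;""",
    # Insert rows that are present only in staging.
    """INSERT INTO orders
       SELECT s.* FROM stage_orders s
       LEFT JOIN orders o ON s.order_id = o.order_id
       WHERE o.order_id IS NULL;""",
    # Clear the staging table for the next run.
    "TRUNCATE stage_orders;",
]

rsd.batch_execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # hypothetical cluster
    Database="dev",
    SecretArn="arn:aws:secretsmanager:us-east-1:123456789012:secret:redshift-creds",  # placeholder
    Sqls=merge_sql,
)
```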



A company is using Amazon Redshift to build a data warehouse solution. The company is loading hundreds of files into a fact table that is in a Redshift cluster.
The company wants the data warehouse solution to achieve the greatest possible throughput. The solution must use cluster resources optimally when the company loads data into the fact table.
Which solution will meet these requirements?

  A. Use multiple COPY commands to load the data into the Redshift cluster.
  B. Use S3DistCp to load multiple files into Hadoop Distributed File System (HDFS). Use an HDFS connector to ingest the data into the Redshift cluster.
  C. Use a number of INSERT statements equal to the number of Redshift cluster nodes. Load the data in parallel into each node.
  D. Use a single COPY command to load the data into the Redshift cluster.

Answer(s): D
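
A single COPY command lets Amazon Redshift split the files under an S3 prefix across all node slices in parallel, which is why option D gives the best throughput. A minimal sketch follows; the table name, bucket, IAM role ARN, cluster identifier, and database user are placeholders.

```python
# Minimal sketch (Redshift Data API): issue one COPY over an S3 prefix so
# Redshift parallelizes the load across slices. All identifiers are hypothetical.
import boto3

rsd = boto3.client("redshift-data", region_name="us-east-1")  # assumed region

copy_sql = """
    COPY sales_fact
    FROM 's3://example-bucket/fact-files/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""

rsd.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # hypothetical cluster
    Database="dev",
    DbUser="awsuser",                         # assumed database user
    Sql=copy_sql,
)
```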



A company ingests data from multiple data sources and stores the data in an Amazon S3 bucket. An AWS Glue extract, transform, and load (ETL) job transforms the data and writes the transformed data to an Amazon S3 based data lake. The company uses Amazon Athena to query the data that is in the data lake.
The company needs to identify matching records even when the records do not have a common unique identifier.
Which solution will meet this requirement?

  A. Use Amazon Macie pattern matching as part of the ETL job.
  B. Train and use the AWS Glue PySpark Filter class in the ETL job.
  C. Partition tables and use the ETL job to partition the data on a unique identifier.
  D. Train and use the AWS Lake Formation FindMatches transform in the ETL job.

Answer(s): D
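
FindMatches is the AWS Lake Formation/AWS Glue ML transform for linking records that do not share a unique key. The sketch below shows roughly how a trained transform could be applied inside a Glue PySpark job; the transform ID, catalog database, table name, and output path are placeholders.

```python
# Minimal sketch (Glue PySpark): apply a trained FindMatches ML transform to
# group records that represent the same entity. Identifiers are hypothetical.
from awsglue.context import GlueContext
from awsglueml.transforms import FindMatches
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

records = glue_context.create_dynamic_frame.from_catalog(
    database="datalake_db", table_name="customer_records"  # assumed catalog entries
)

# The trained transform adds a match_id column grouping records it predicts match.
matched = FindMatches.apply(frame=records, transformId="tfm-0123456789abcdef")  # placeholder ID

glue_context.write_dynamic_frame.from_options(
    frame=matched,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/matched/"},  # assumed output path
    format="parquet",
)
```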



A data engineer is using an AWS Glue crawler to catalog data that is in an Amazon S3 bucket. The S3 bucket contains both .csv and .json files. The data engineer configured the crawler to exclude the .json files from the catalog.
When the data engineer runs queries in Amazon Athena, the queries also process the excluded .json files. The data engineer needs to resolve this issue with a solution that will not affect access requirements for the .csv files in the source S3 bucket.
Which solution will meet this requirement with the SHORTEST query times?

  A. Adjust the AWS Glue crawler settings to ensure that the AWS Glue crawler also excludes .json files.
  B. Use the Athena console to ensure the Athena queries also exclude the .json files.
  C. Relocate the .json files to a different path within the S3 bucket.
  D. Use S3 bucket policies to block access to the .json files.

Answer(s): C
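
Relocating the .json files removes them from the S3 path that the Glue table (and therefore Athena) scans, without touching the .csv files, which is why option C keeps query times short. A minimal boto3 sketch of such a move is shown below; the bucket name and prefixes are placeholders.

```python
# Minimal sketch (boto3): move .json objects out of the crawled/queried prefix
# so Athena no longer scans them, leaving the .csv objects in place.
# Bucket and prefixes are placeholders.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-bucket"          # hypothetical bucket
SOURCE_PREFIX = "raw/"             # prefix the Glue table points at (assumed)
DEST_PREFIX = "raw-json-archive/"  # new location outside the table's path (assumed)

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix=SOURCE_PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        if key.endswith(".json"):
            new_key = DEST_PREFIX + key[len(SOURCE_PREFIX):]
            # Copy to the new prefix, then delete the original object.
            s3.copy_object(Bucket=BUCKET, CopySource={"Bucket": BUCKET, "Key": key}, Key=new_key)
            s3.delete_object(Bucket=BUCKET, Key=key)
```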





