Free AWS Certified Data Engineer - Associate DEA-C01 Exam Braindumps (page: 20)


A data engineer is configuring Amazon SageMaker Studio to use AWS Glue interactive sessions to prepare data for machine learning (ML) models.
The data engineer receives an access denied error when trying to prepare the data by using SageMaker Studio.
Which change should the engineer make to gain access to SageMaker Studio?

  1. Add the AWSGlueServiceRole managed policy to the data engineer's IAM user.
  2. Add a policy to the data engineer's IAM user that includes the sts:AssumeRole action for the AWS Glue and SageMaker service principals in the trust policy.
  3. Add the AmazonSageMakerFullAccess managed policy to the data engineer's IAM user.
  4. Add a policy to the data engineer's IAM user that allows the sts:AddAssociation action for the AWS Glue and SageMaker service principals in the trust policy.

Answer(s): C
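
For reference, answer C amounts to attaching one AWS managed policy. A minimal boto3 sketch, where "data-engineer" is a placeholder user name:

    import boto3

    # Attach the AWS managed AmazonSageMakerFullAccess policy to the
    # data engineer's IAM user ("data-engineer" is a placeholder).
    iam = boto3.client("iam")
    iam.attach_user_policy(
        UserName="data-engineer",
        PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerFullAccess",
    )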



A company extracts approximately 1 TB of data every day from data sources such as SAP HANA, Microsoft SQL Server, MongoDB, Apache Kafka, and Amazon DynamoDB. Some of the data sources have undefined data schemas or data schemas that change.
A data engineer must implement a solution that can detect the schema for these data sources. The solution must extract, transform, and load the data to an Amazon S3 bucket. The company has a service level agreement (SLA) to load the data into the S3 bucket within 15 minutes of data creation.
Which solution will meet these requirements with the LEAST operational overhead?

  1. Use Amazon EMR to detect the schema and to extract, transform, and load the data into the S3 bucket. Create a pipeline in Apache Spark.
  2. Use AWS Glue to detect the schema and to extract, transform, and load the data into the S3 bucket. Create a pipeline in Apache Spark.
  3. Create a PySpark program in AWS Lambda to extract, transform, and load the data into the S3 bucket.
  4. Create a stored procedure in Amazon Redshift to detect the schema and to extract, transform, and load the data into a Redshift Spectrum table. Access the table from Amazon S3.

Answer(s): B
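
To illustrate why B carries the least operational overhead: Glue DynamicFrames infer the schema at read time, so one short serverless job covers both schema detection and the load to Amazon S3. A sketch, with placeholder catalog and bucket names:

    import sys
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # DynamicFrames infer the schema at read time, so undefined or
    # changing schemas need no manual definitions. "source_db" and
    # "source_table" are placeholder Data Catalog names.
    frame = glue_context.create_dynamic_frame.from_catalog(
        database="source_db", table_name="source_table"
    )

    # Load the data into the target S3 bucket (placeholder path).
    glue_context.write_dynamic_frame.from_options(
        frame=frame,
        connection_type="s3",
        connection_options={"path": "s3://example-target-bucket/data/"},
        format="parquet",
    )
    job.commit()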



A company has multiple applications that use datasets that are stored in an Amazon S3 bucket. The company has an ecommerce application that generates a dataset that contains personally identifiable information (PII). The company has an internal analytics application that does not require access to the PII.
To comply with regulations, the company must not share PII unnecessarily. A data engineer needs to implement a solution that will redact PII dynamically, based on the needs of each application that accesses the dataset.
Which solution will meet the requirements with the LEAST operational overhead?

  1. Create an S3 bucket policy to limit the access each application has. Create multiple copies of the dataset. Give each dataset copy the appropriate level of redaction for the needs of the application that accesses the copy.
  2. Create an S3 Object Lambda endpoint. Use the S3 Object Lambda endpoint to read data from the S3 bucket. Implement redaction logic within an S3 Object Lambda function to dynamically redact PII based on the needs of each application that accesses the data.
  3. Use AWS Glue to transform the data for each application. Create multiple copies of the dataset. Give each dataset copy the appropriate level of redaction for the needs of the application that accesses the copy.
  4. Create an API Gateway endpoint that has custom authorizers. Use the API Gateway endpoint to read data from the S3 bucket. Initiate a REST API call to dynamically redact PII based on the needs of each application that accesses the data.

Answer(s): B
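
A minimal sketch of the S3 Object Lambda handler from answer B. The SSN-style regex is a placeholder for whatever per-application redaction rules apply:

    import re

    import boto3
    import urllib3

    http = urllib3.PoolManager()
    s3 = boto3.client("s3")

    def handler(event, context):
        ctx = event["getObjectContext"]

        # Fetch the original object through the presigned URL that
        # S3 Object Lambda supplies in the event.
        original = http.request("GET", ctx["inputS3Url"]).data.decode("utf-8")

        # Placeholder redaction: mask SSN-shaped values. Real logic
        # would depend on which application is calling.
        redacted = re.sub(r"\d{3}-\d{2}-\d{4}", "***-**-****", original)

        # Return the transformed object to the requester.
        s3.write_get_object_response(
            RequestRoute=ctx["outputRoute"],
            RequestToken=ctx["outputToken"],
            Body=redacted.encode("utf-8"),
        )
        return {"statusCode": 200}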



A data engineer needs to build an extract, transform, and load (ETL) job. The ETL job will process daily incoming .csv files that users upload to an Amazon S3 bucket. The size of each S3 object is less than 100 MB.
Which solution will meet these requirements MOST cost-effectively?

  1. Write a custom Python application. Host the application on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster.
  2. Write a PySpark ETL script. Host the script on an Amazon EMR cluster.
  3. Write an AWS Glue PySpark job. Use Apache Spark to transform the data.
  4. Write an AWS Glue Python shell job. Use pandas to transform the data.

Answer(s): D
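
Answer D is the cheapest fit because a Glue Python shell job runs on a fraction of a DPU and pandas comfortably handles files under 100 MB in memory. A sketch with placeholder bucket names and a placeholder transform (reading s3:// paths and writing Parquet assume the s3fs and pyarrow dependencies are available to the job):

    import pandas as pd

    # Each incoming .csv object is under 100 MB, so it fits in memory.
    # Bucket and key names are placeholders.
    df = pd.read_csv("s3://example-source-bucket/incoming/daily.csv")

    # Placeholder transform: drop fully empty rows and normalize
    # the column names before loading.
    df = df.dropna(how="all")
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    df.to_parquet("s3://example-target-bucket/processed/daily.parquet")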






Community comments on the Amazon AWS Certified Data Engineer - Associate DEA-C01 exam:

Abhishek commented on December 21, 2024
It was nice.
Anonymous

saif Ali commented on October 24, 2024
For Question no. 50, the answer would be using Lambda vdf, as this provides automation.
INDIA

Josh commented on October 09, 2024
Team, thanks for the wonderful support. This guide helped me a lot.
UNITED STATES

Ming commented on September 19, 2024
Very cool very precise. I highly recommend this study package.
UNITED STATES

Geovani commented on September 18, 2024
Very useful content with point-by-point explanations. The payment and download process was also straightforward. Good job, guys.
Italy