Free AWS Certified Data Engineer - Associate DEA-C01 Exam Braindumps (page: 7)

Page 7 of 39

A data engineer needs to join data from multiple sources to perform a one-time analysis job. The data is stored in Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3.
Which solution will meet this requirement MOST cost-effectively?

  A. Use an Amazon EMR provisioned cluster to read from all sources. Use Apache Spark to join the data and perform the analysis.
  B. Copy the data from DynamoDB, Amazon RDS, and Amazon Redshift into Amazon S3. Run Amazon Athena queries directly on the S3 files.
  C. Use Amazon Athena Federated Query to join the data from all data sources.
  D. Use Redshift Spectrum to query data from DynamoDB, Amazon RDS, and Amazon S3 directly from Redshift.

Answer(s): C
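
Amazon Athena Federated Query can join data in DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3 in place through Lambda-based data source connectors, so a one-time join needs no cluster to provision and no data to copy. Below is a minimal sketch using boto3; it assumes connectors have already been deployed and registered as catalogs named dynamo_catalog, rds_catalog, and redshift_catalog (hypothetical names) and that the S3 data is already registered in the Glue Data Catalog.

Example (Python, boto3):

    import boto3

    # A minimal sketch of submitting an Athena federated query. Catalog,
    # schema, table, and bucket names below are placeholders (assumptions).
    athena = boto3.client("athena", region_name="us-east-1")

    query = """
    SELECT o.order_id, c.customer_name, p.price
    FROM dynamo_catalog.default.orders o
    JOIN rds_catalog.sales.customers c ON o.customer_id = c.customer_id
    JOIN awsdatacatalog.analytics.products p ON o.product_id = p.product_id
    """

    response = athena.start_query_execution(
        QueryString=query,
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    print(response["QueryExecutionId"])

Because the connectors run only for the duration of the query, nothing keeps running (or billing) after the one-time analysis finishes, which is why this option is the most cost-effective.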



A company is planning to use a provisioned Amazon EMR cluster that runs Apache Spark jobs to perform big data analysis. The company requires high reliability. A big data team must follow best practices for running cost-optimized and long-running workloads on Amazon EMR. The team must find a solution that will maintain the company's current level of performance.
Which combination of resources will meet these requirements MOST cost-effectively? (Choose two.)

  A. Use Hadoop Distributed File System (HDFS) as a persistent data store.
  B. Use Amazon S3 as a persistent data store.
  C. Use x86-based instances for core nodes and task nodes.
  D. Use Graviton instances for core nodes and task nodes.
  E. Use Spot Instances for all primary nodes.

Answer(s): B,D
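
Storing persistent data in Amazon S3 decouples storage from compute, so the cluster can be resized or terminated without data loss, and Graviton-based instances typically offer better price-performance than comparable x86 instances for Spark workloads. A minimal sketch of launching such a cluster with boto3 follows; the release label, log bucket, key names, and instance counts are illustrative assumptions.

Example (Python, boto3):

    import boto3

    # A minimal sketch of launching a cost-optimized EMR cluster.
    # Bucket name, roles, and release label are placeholders (assumptions).
    emr = boto3.client("emr", region_name="us-east-1")

    response = emr.run_job_flow(
        Name="spark-analysis",
        ReleaseLabel="emr-6.15.0",
        Applications=[{"Name": "Spark"}],
        LogUri="s3://example-emr-logs/",   # persistent data and logs in S3, not HDFS
        Instances={
            "InstanceGroups": [
                {"Name": "primary", "InstanceRole": "MASTER",
                 "InstanceType": "m6g.xlarge", "InstanceCount": 1},   # Graviton
                {"Name": "core", "InstanceRole": "CORE",
                 "InstanceType": "m6g.xlarge", "InstanceCount": 2},   # Graviton
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    print(response["JobFlowId"])

Spot Instances can further cut costs for task nodes, but they are not appropriate for the primary (master) node of a long-running, high-reliability cluster, which rules out option E.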



A company wants to implement real-time analytics capabilities. The company wants to use Amazon Kinesis Data Streams and Amazon Redshift to ingest and process streaming data at the rate of several gigabytes per second. The company wants to derive near real-time insights by using existing business intelligence (BI) and analytics tools.
Which solution will meet these requirements with the LEAST operational overhead?

  A. Use Kinesis Data Streams to stage data in Amazon S3. Use the COPY command to load data from Amazon S3 directly into Amazon Redshift to make the data immediately available for real-time analysis.
  B. Access the data from Kinesis Data Streams by using SQL queries. Create materialized views directly on top of the stream. Refresh the materialized views regularly to query the most recent stream data.
  C. Create an external schema in Amazon Redshift to map the data from Kinesis Data Streams to an Amazon Redshift object. Create a materialized view to read data from the stream. Set the materialized view to auto refresh.
  D. Connect Kinesis Data Streams to Amazon Kinesis Data Firehose. Use Kinesis Data Firehose to stage the data in Amazon S3. Use the COPY command to load the data from Amazon S3 to a table in Amazon Redshift.

Answer(s): C
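
Redshift streaming ingestion maps a Kinesis data stream to an external schema and exposes it through a materialized view that Redshift can auto refresh, so the data never has to be staged in Amazon S3 or loaded with COPY. A minimal sketch using the Redshift Data API follows; the cluster identifier, database, DB user, IAM role ARN, stream name, and view definition are illustrative assumptions.

Example (Python, boto3):

    import boto3

    # A minimal sketch of enabling Redshift streaming ingestion from Kinesis
    # through the Redshift Data API. Cluster, database, user, role ARN, and
    # stream name are placeholders (assumptions).
    rsd = boto3.client("redshift-data", region_name="us-east-1")

    def run(sql):
        """Submit one SQL statement to the provisioned cluster."""
        return rsd.execute_statement(
            ClusterIdentifier="analytics-cluster",
            Database="dev",
            DbUser="awsuser",
            Sql=sql,
        )

    # Map the Kinesis stream into Redshift as an external schema.
    run("""
    CREATE EXTERNAL SCHEMA kinesis_schema
    FROM KINESIS
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role'
    """)

    # Expose the stream through an auto-refreshing materialized view, so BI
    # tools can query near real-time data without S3 staging or COPY jobs.
    run("""
    CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
    SELECT approximate_arrival_timestamp,
           JSON_PARSE(kinesis_data) AS payload
    FROM kinesis_schema."clickstream-stream"
    """)

Existing BI and analytics tools can then query clickstream_mv like any other Redshift relation, which keeps operational overhead to a minimum.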



A company uses an Amazon QuickSight dashboard to monitor usage of one of the company's applications. The company uses AWS Glue jobs to process data for the dashboard. The company stores the data in a single Amazon S3 bucket. The company adds new data every day.
A data engineer discovers that dashboard queries are becoming slower over time. The data engineer determines that the root cause of the slowing queries is long-running AWS Glue jobs.
Which actions should the data engineer take to improve the performance of the AWS Glue jobs? (Choose two.)

  A. Partition the data that is in the S3 bucket. Organize the data by year, month, and day.
  B. Increase the AWS Glue instance size by scaling up the worker type.
  C. Convert the AWS Glue schema to the DynamicFrame schema class.
  D. Adjust AWS Glue job scheduling frequency so the jobs run half as many times each day.
  E. Modify the IAM role that grants access to AWS Glue to grant access to all S3 features.

Answer(s): A,B
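
Partitioning the S3 data by year, month, and day lets AWS Glue (and the Athena/QuickSight queries behind the dashboard) read only the partitions a run actually needs, and scaling up the worker type gives each Spark executor more memory and vCPUs. A minimal sketch follows; the database, table, S3 path, job name, and role ARN are illustrative assumptions, and it assumes the source data already contains year, month, and day columns.

Example (Python, AWS Glue job plus boto3):

    import boto3
    from awsglue.context import GlueContext
    from pyspark.context import SparkContext

    # A minimal sketch of a Glue job that writes daily data back to S3
    # partitioned by year/month/day so downstream queries can prune partitions.
    # Database, table, and path names are placeholders (assumptions).
    glue_context = GlueContext(SparkContext.getOrCreate())

    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="app_usage", table_name="raw_events"
    )

    glue_context.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={
            "path": "s3://example-dashboard-data/events/",
            "partitionKeys": ["year", "month", "day"],   # Hive-style partitions
        },
        format="parquet",
    )

    # Scaling up the worker type (for example G.1X -> G.2X) can be done with
    # boto3. UpdateJob replaces the whole job definition, so the existing Role
    # and Command must be re-specified (values below are placeholders).
    glue = boto3.client("glue", region_name="us-east-1")
    glue.update_job(
        JobName="dashboard-etl",
        JobUpdate={
            "Role": "arn:aws:iam::123456789012:role/glue-job-role",
            "Command": {"Name": "glueetl",
                        "ScriptLocation": "s3://example-scripts/dashboard_etl.py"},
            "WorkerType": "G.2X",
            "NumberOfWorkers": 10,
        },
    )

Running the jobs less often or broadening the IAM role's S3 permissions (options D and E) would not shorten the job runtime, so they do not address the root cause.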





