Free DAS-C01 Exam Braindumps (page: 17)


A media analytics company consumes a stream of social media posts. The posts are sent to an Amazon Kinesis data stream partitioned on user_id. An AWS Lambda function retrieves the records and validates the content before loading the posts into an Amazon OpenSearch Service (Amazon Elasticsearch Service) cluster. The validation process needs to receive the posts for a given user in the order they were received by the Kinesis data stream.
During peak hours, the social media posts take more than an hour to appear in the Amazon OpenSearch Service (Amazon ES) cluster. A data analytics specialist must implement a solution that reduces this latency with the least possible operational overhead.
Which solution meets these requirements?

  1. Migrate the validation process from Lambda to AWS Glue.
  2. Migrate the Lambda consumers from standard data stream iterators to an HTTP/2 stream consumer.
  3. Increase the number of shards in the Kinesis data stream.
  4. Send the posts stream to Amazon Managed Streaming for Apache Kafka instead of the Kinesis data stream.

Answer(s): C

Explanation:

For real-time processing of streaming data, Amazon Kinesis partitions data into multiple shards that can then be consumed in parallel by multiple consumers, such as Amazon EC2 instances or AWS Lambda functions. With a Kinesis event source, Lambda runs one concurrent invocation per shard, so increasing the number of shards increases both the stream's throughput and the number of records processed in parallel, which reduces the ingestion latency. Because all records that share a partition key (here, user_id) map to the same shard, per-user ordering is preserved.


Reference:

https://d1.awsstatic.com/whitepapers/AWS_Cloud_Best_Practices.pdf
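As a minimal sketch of the chosen fix (the stream name and target shard count are placeholder assumptions, not values from the question), the shard count can be raised with the UpdateShardCount API via boto3:

```python
import boto3

kinesis = boto3.client("kinesis")

# Raise the shard count so more Lambda invocations consume the stream in
# parallel; records for a given user_id still hash to a single shard, so
# per-user ordering is preserved.
response = kinesis.update_shard_count(
    StreamName="social-media-posts",   # placeholder stream name
    TargetShardCount=8,                # placeholder target; at most 2x the current count per call
    ScalingType="UNIFORM_SCALING",
)
print(response["TargetShardCount"])
```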



A company launched a service that produces millions of messages every day and uses Amazon Kinesis Data Streams as the streaming service. The company uses the Kinesis SDK to write data to Kinesis Data Streams. A few months after launch, a data analyst found that write performance is significantly reduced. The data analyst investigated the metrics and determined that Kinesis is throttling the write requests. The data analyst wants to address this issue without significant changes to the architecture.
Which actions should the data analyst take to resolve this issue? (Choose two.)

  1. Increase the Kinesis Data Streams retention period to reduce throttling.
  2. Replace the Kinesis API-based data ingestion mechanism with Kinesis Agent.
  3. Increase the number of shards in the stream using the UpdateShardCount API.
  4. Choose partition keys in a way that results in a uniform record distribution across shards.
  5. Customize the application code to include retry logic to improve performance.

Answer(s): C,D
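Option C can be scripted the same way as the UpdateShardCount sketch above. For option D, a minimal hedged sketch (the stream name and message shape are assumptions) of writing with a high-cardinality partition key so records spread evenly across shards:

```python
import json
import uuid
import boto3

kinesis = boto3.client("kinesis")

def put_message(message: dict) -> None:
    # A random partition key distributes records uniformly across shards,
    # avoiding hot shards that cause WriteProvisionedThroughputExceeded errors.
    kinesis.put_record(
        StreamName="service-messages",            # placeholder stream name
        Data=json.dumps(message).encode("utf-8"),
        PartitionKey=str(uuid.uuid4()),
    )
```

A fully random key gives up per-key ordering; when ordering by an entity matters, hashing an existing high-cardinality attribute (such as a device or user ID) achieves a similar distribution.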



A smart home automation company must efficiently ingest and process messages from various connected devices and sensors. The majority of these messages consist of a large number of small files. These messages are ingested using Amazon Kinesis Data Streams and sent to Amazon S3 using a Kinesis data stream consumer application. The Amazon S3 message data is then passed through a processing pipeline built on Amazon EMR running scheduled PySpark jobs.
The data platform team manages data processing and is concerned about the efficiency and cost of downstream data processing. They want to continue to use PySpark.
Which solution improves the efficiency of the data processing jobs and is well-architected?

  1. Send the sensor and devices data directly to a Kinesis Data Firehose delivery stream to send the data to Amazon S3 with Apache Parquet record format conversion enabled. Use Amazon EMR running PySpark to process the data in Amazon S3.
  2. Set up an AWS Lambda function with a Python runtime environment. Process individual Kinesis data stream messages from the connected devices and sensors using Lambda.
  3. Launch an Amazon Redshift cluster. Copy the collected data from Amazon S3 to Amazon Redshift and move the data processing jobs from Amazon EMR to Amazon Redshift.
  4. Set up AWS Glue Python jobs to merge the small data files in Amazon S3 into larger files and transform them to Apache Parquet format.
    Migrate the downstream PySpark jobs from Amazon EMR to AWS Glue.

Answer(s): A
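On the consumption side of option A, a minimal PySpark sketch is shown below; the bucket, prefix, and column names are illustrative assumptions, not part of the question:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("sensor-processing").getOrCreate()

# Firehose buffers the small messages and delivers larger, columnar Parquet
# objects, so the EMR job reads far fewer files than the raw message stream.
sensors = spark.read.parquet("s3://example-iot-bucket/firehose/")  # placeholder path

daily_counts = (
    sensors
    .groupBy("device_id", F.to_date("event_time").alias("event_date"))  # assumed column names
    .count()
)

daily_counts.write.mode("overwrite").parquet("s3://example-iot-bucket/aggregates/")
```

The efficiency gain comes from Firehose's buffering and Parquet record format conversion, which replaces many small objects with fewer large columnar files before the PySpark jobs ever run.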



A large financial company is running its ETL process. Part of this process is to move data from Amazon S3 into an Amazon Redshift cluster. The company wants to use the most cost-efficient method to load the dataset into Amazon Redshift.
Which combination of steps would meet these requirements? (Choose two.)

  1. Use the COPY command with the manifest file to load data into Amazon Redshift.
  2. Use S3DistCp to load files into Amazon Redshift.
  3. Use temporary staging tables during the loading process.
  4. Use the UNLOAD command to upload data into Amazon Redshift.
  5. Use Amazon Redshift Spectrum to query files from Amazon S3.

Answer(s): A,C


Reference:

https://aws.amazon.com/blogs/big-data/top-8-best-practices-for-high-performance-etl-processing-using-amazon-redshift/
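As a hedged illustration of the two selected steps (connection details, table names, bucket, and IAM role are placeholders), a COPY-with-manifest load into a temporary staging table followed by a merge might look like this in Python:

```python
import psycopg2

# Placeholder connection details for the Redshift cluster
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="etl_user", password="..."
)

with conn, conn.cursor() as cur:
    # Temporary staging table; dropped automatically at the end of the session
    cur.execute("CREATE TEMP TABLE stage_sales (LIKE sales);")

    # COPY with a manifest loads exactly the listed S3 files, in parallel across slices
    cur.execute("""
        COPY stage_sales
        FROM 's3://example-etl-bucket/manifests/sales.manifest'
        IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-copy-role'
        MANIFEST FORMAT AS CSV;
    """)

    # Merge: remove rows being replaced, then append the staged data
    cur.execute("DELETE FROM sales USING stage_sales WHERE sales.id = stage_sales.id;")
    cur.execute("INSERT INTO sales SELECT * FROM stage_sales;")
```

Loading through a staging table keeps the target table available during the load and lets the merge happen in a single transaction.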





