A team of data scientists plans to analyze market trend data for their company's new investment strategy. The trend data comes from five different data sources in large volumes. The team wants to utilize Amazon Kinesis to support their use case. The team uses SQL-like queries to analyze trends and wants to send notifications based on certain significant patterns in the trends. Additionally, the data scientists want to save the data to Amazon S3 for archival and historical re-processing, and use AWS managed services wherever possible. The team wants to implement the lowest-cost solution.
Which solution meets these requirements?
Answer(s): B
A concise explanation of the correct choice and why the others are incorrect follows.
B) Correct: Kinesis Data Analytics (KDA) provides SQL-like queries over streams, enabling real-time trend analysis; Lambda can send notifications via SNS, and Kinesis Data Firehose can persist the data to S3 for archival, all at low operational cost using managed services.
A) Uses a custom KCL application instead of managed KDA for SQL-like queries, increasing maintenance effort and cost; the architecture is not optimized for SQL analytics and serverless outputs.
C) Splits the data across streams unnecessarily and uses Firehose on a second stream; the Lambda-based output relies on reactive processing rather than streamlined real-time analysis on a single stream.
D) Uses a custom KCL application for analysis and Firehose on a second stream, adding complexity and cost; running two streams without leveraging KDA for SQL analytics is suboptimal.
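To make the notification leg concrete, here is a minimal sketch of the Lambda that KDA would invoke as its output destination. The event shape (base64-encoded rows under `records`) matches the KDA destination-Lambda contract, but the payload field names (`pattern`, etc.) and the returned message shape are illustrative assumptions; a real handler would publish each message to the SNS topic rather than return it.

```python
import base64
import json

def build_notifications(event):
    """Turn Kinesis Data Analytics destination-Lambda output records
    into SNS-ready messages. Payload field names are assumptions."""
    messages = []
    for record in event.get("records", []):
        # KDA delivers each emitted row as a base64-encoded JSON blob.
        payload = json.loads(base64.b64decode(record["data"]))
        messages.append({
            "subject": f"Trend alert: {payload['pattern']}",
            "body": json.dumps(payload),
        })
    # In a real Lambda each message would be sent with
    # boto3.client("sns").publish(TopicArn=..., Subject=..., Message=...).
    return messages
```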
A company currently uses Amazon Athena to query its global datasets. The regional data is stored in Amazon S3 in the us-east-1 and us-west-2 Regions. The data is not encrypted. To simplify the query process and manage it centrally, the company wants to use Athena in us-west-2 to query data from Amazon S3 in both Regions. The solution should be as low-cost as possible.
What should the company do to achieve this goal?
Answer(s): B
B) Correct: Athena can query data in other Regions when the Data Catalog is aware of the tables; a single crawler in us-west-2 can catalog the datasets in both Regions, enabling centralized queries at minimal cost without replicating data.
A) A DMS migration of the Glue Data Catalog is unnecessary for cross-Region catalog access and adds cost and time.
C) S3 Cross-Region Replication adds storage and transfer costs; duplicating the data is not required for cataloging.
D) Granting cross-Region catalog access via policy is not a valid mechanism to register and query remote-Region data; the data must be cataloged centrally.
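The single-crawler approach works because S3 bucket names are global, so a crawler running in us-west-2 can target paths in either Region. A hedged sketch of the crawler definition (bucket names, role ARN, and database name are placeholders) might look like:

```python
# Placeholder names throughout; a crawler in us-west-2 can point at S3
# paths in any Region because S3 bucket names are globally unique.
crawler_config = {
    "Name": "global-datasets-crawler",
    "Role": "arn:aws:iam::123456789012:role/GlueCrawlerRole",  # assumed role
    "DatabaseName": "global_datasets",
    "Targets": {
        "S3Targets": [
            {"Path": "s3://company-data-us-east-1/datasets/"},
            {"Path": "s3://company-data-us-west-2/datasets/"},
        ]
    },
}
# The crawler would be created in the central Region with:
# boto3.client("glue", region_name="us-west-2").create_crawler(**crawler_config)
```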
A large company receives files from external parties in Amazon EC2 throughout the day. At the end of the day, the files are combined into a single file, compressed into a gzip file, and uploaded to Amazon S3. The total size of all the files is close to 100 GB daily. Once the files are uploaded to Amazon S3, an AWS Batch program executes a COPY command to load the files into an Amazon Redshift cluster.
Which program modification will accelerate the COPY process?
Answer(s): B
B) Correct: splitting the data into equal-sized files whose count is a multiple of the cluster's slice count enables optimal parallelism during COPY, reducing bottlenecks and improving throughput; compressing the files minimizes I/O and speeds up both the transfer to S3 and the subsequent load.
A) Uploading files individually as they arrive prevents optimal parallelism and can introduce scheduling delays; combining everything into one large file at the end of the day underutilizes parallel COPY across slices.
C) Incorrect because COPY parallelism is driven by the number of slices, not the number of compute nodes; sizing the file count to the node count leaves slices idle.
D) Incorrect because sharding files by distkey values benefits JOIN performance, not COPY throughput.
https://docs.aws.amazon.com/redshift/latest/dg/t_splitting-data-files.html
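The arithmetic behind the correct option can be sketched as a small helper that rounds the file count up to a multiple of the slice count (the ~1 GB per-file target is an assumption in line with the splitting guidance; adjust for your cluster):

```python
import math

def split_plan(total_gb, num_slices, target_file_gb=1.0):
    """Choose a file count that is a multiple of the slice count so that
    every slice loads the same number of files during COPY."""
    raw = math.ceil(total_gb / target_file_gb)        # files of ~target size
    count = math.ceil(raw / num_slices) * num_slices  # round up to a multiple
    return count, total_gb / count                    # (file count, GB per file)

# e.g. 100 GB daily on a cluster with 16 slices:
files, size_gb = split_plan(100, 16)
```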
A large ride-sharing company has thousands of drivers globally serving millions of unique customers every day. The company has decided to migrate an existing data mart to Amazon Redshift. The existing schema includes the following tables.
A trips fact table for information on completed rides.
A drivers dimension table for driver profiles.
A customers fact table holding customer profile information.
The company analyzes trip details by date and destination to examine profitability by region. The drivers data rarely changes. The customers data frequently changes.
What table design provides optimal query performance?
Answer(s): C
C) Correct: distributing the trips fact table by destination key colocates the rows used in destination-based joins and aggregations; the small, rarely changing drivers dimension uses DISTSTYLE ALL so joins never require redistribution; customers, which changes frequently, uses DISTSTYLE EVEN to spread the update load and avoid the maintenance overhead of ALL.
A) DISTSTYLE ALL on customers is suboptimal because the table changes frequently, and replicating a large, volatile table to every node adds maintenance overhead.
B) DISTSTYLE EVEN for trips disperses the data and hurts destination-based joins; ALL for drivers is fine, but the fact table loses co-location.
D) EVEN for drivers and ALL for both fact tables ignores the benefits of destination-based co-location and mishandles the frequently changing customers data.
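The recommended design could be expressed as DDL along these lines (column names and sizes are assumptions; only the DISTSTYLE/DISTKEY/SORTKEY choices come from the answer):

```python
# Illustrative Redshift DDL matching the recommended distribution styles.
trips_ddl = """
CREATE TABLE trips (
    trip_id     BIGINT,
    trip_date   DATE,
    destination VARCHAR(64),
    driver_id   INT,
    customer_id INT
)
DISTSTYLE KEY DISTKEY (destination)
SORTKEY (trip_date);
"""

drivers_ddl = """
CREATE TABLE drivers (driver_id INT, profile VARCHAR(256))
DISTSTYLE ALL;
"""

customers_ddl = """
CREATE TABLE customers (customer_id INT, profile VARCHAR(256))
DISTSTYLE EVEN;
"""
```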
Three teams of data analysts use Apache Hive on an Amazon EMR cluster with the EMR File System (EMRFS) to query data stored within each team's Amazon S3 bucket. The EMR cluster has Kerberos enabled and is configured to authenticate users from the corporate Active Directory. The data is highly sensitive, so access must be limited to the members of each team.
Which steps will satisfy the security requirements?
The correct option ensures least privilege by separating the base EMR service access (which carries no S3 permissions) from per-bucket access granted through team-specific IAM roles, and by mapping those roles to Active Directory groups via the EMR security configuration, so Kerberos/AD-based authorization applies without broad S3 access.
A) Incorrect: binds the additional roles to the EMR role policy instead of the EMR EC2 service role trust, misconfiguring the trust relationship for instance profiles.
C) Incorrect: the EMR service role grants full S3 access, violating least privilege.
D) Incorrect: trusts the base EMR roles instead of configuring trust for the per-team bucket roles, leading to improper permission scoping.
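Each per-team role's permissions policy would scope S3 access to that team's bucket only. A minimal sketch of such a policy, with the bucket name as a placeholder and the action list as an assumption about what Hive/EMRFS needs:

```python
def team_bucket_policy(bucket):
    """Least-privilege IAM policy granting one team's role access to only
    its own bucket (bucket name and action list are placeholders)."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                f"arn:aws:s3:::{bucket}",      # bucket-level (ListBucket)
                f"arn:aws:s3:::{bucket}/*",    # object-level (Get/Put)
            ],
        }],
    }
```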
A company is planning to create a data lake in Amazon S3. The company wants to create tiered storage based on access patterns and cost objectives. The solution must include support for JDBC connections from legacy clients, metadata management that allows federation for access control, and batch-based ETL using PySpark and Scala. Operational management should be limited. Which combination of components can meet these requirements? (Choose three.)
Answer(s): A,C,E
A) AWS Glue Data Catalog for metadata management
C) AWS Glue for Scala-based ETL
E) Amazon Athena for querying data in Amazon S3 using JDBC drivers
A) The Glue Data Catalog provides centralized metadata management with federation-compatible IAM and fine-grained access control, enabling cross-account and cross-service metadata access suitable for a data lake with tiered storage.
C) Glue supports PySpark and Scala ETL jobs, matching the batch-processing requirement with minimal operational management.
E) Athena natively queries S3 data and supports JDBC connections through its drivers, serving legacy JDBC clients without heavy operational overhead.
B) EMR with Spark adds complexity and ongoing cluster operations; not minimal management.
D) EMR with Hive lacks federated metadata management and JDBC client support in a minimal setup.
F) EMR with Hive backed by an RDS metastore introduces an unnecessary managed-RDS dependency and operational overhead.
https://d1.awsstatic.com/whitepapers/Storage/data-lake-on-aws.pdf
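For the legacy JDBC clients, the Athena driver's connection string generally takes the shape below; the exact format varies by driver version, so treat this as an assumption to verify against your driver's documentation (bucket and region are placeholders):

```python
def athena_jdbc_url(region, output_bucket):
    """Build an Athena JDBC connection string of the shape commonly used
    by the Athena JDBC driver; verify against your driver version's docs."""
    return (
        f"jdbc:awsathena://athena.{region}.amazonaws.com:443;"
        f"S3OutputLocation=s3://{output_bucket}/athena-results/"
    )

# A legacy client would pass this URL (plus credentials) to its JDBC layer.
url = athena_jdbc_url("us-west-2", "my-query-results-bucket")
```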
A company wants to optimize the cost of its data and analytics platform. The company is ingesting a number of .csv and JSON files in Amazon S3 from various data sources. Incoming data is expected to be 50 GB each day. The company is using Amazon Athena to query the raw data in Amazon S3 directly. Most queries aggregate data from the past 12 months, and data that is older than 5 years is infrequently queried. The typical query scans about 500 MB of data and is expected to return results in less than 1 minute. The raw data must be retained indefinitely for compliance requirements.
Which solution meets the company's requirements?
Answer(s): A
A) Compress, partition, and convert the data to a columnar format; query it via Athena; move processed data to S3 Standard-IA after 5 years; move raw data to S3 Glacier 7 days after ingestion. The columnar format and partitioning reduce the data scanned per query, meeting the sub-1-minute response target, while Glacier preserves the immutable raw data indefinitely for compliance at the lowest storage cost.
B) A row-based format harms scan efficiency, so queries scan more data and run slower.
C) and D) Base the lifecycle transitions on last-accessed date, which is unsuitable for an immutable retention policy and can misclassify data as hot or cold; transitioning to Glacier 7 days after last access does not align with the 5-year access pattern.
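The two lifecycle transitions from the correct option can be written as an S3 lifecycle configuration. This is a hedged sketch: the `raw/` and `processed/` prefixes are assumptions about bucket layout, and 5 years is approximated as 1825 days.

```python
# Prefixes are placeholders; 5 years ~= 1825 days.
lifecycle_config = {
    "Rules": [
        {
            "ID": "raw-to-glacier",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            # Archive raw data 7 days after ingestion; retained indefinitely.
            "Transitions": [{"Days": 7, "StorageClass": "GLACIER"}],
        },
        {
            "ID": "processed-to-ia",
            "Filter": {"Prefix": "processed/"},
            "Status": "Enabled",
            # Demote processed data once it falls out of the hot 5-year window.
            "Transitions": [{"Days": 1825, "StorageClass": "STANDARD_IA"}],
        },
    ]
}
# Applied with: s3.put_bucket_lifecycle_configuration(
#     Bucket=..., LifecycleConfiguration=lifecycle_config)
```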
An energy company collects voltage data in real time from sensors that are attached to buildings. The company wants to receive notifications when a sequence of two voltage drops is detected within 10 minutes of a sudden voltage increase at the same building. All notifications must be delivered as quickly as possible. The system must be highly available. The company needs a solution that will automatically scale when this monitoring feature is implemented in other cities. The notification system is subscribed to an Amazon Simple Notification Service (Amazon SNS) topic for remediation.
Which solution will meet these requirements?
A) Correct: streaming analysis with Spark on an auto-scaling EMR cluster provides real-time stateful processing, scalable ingestion, and low-latency SNS notifications, meeting the high-availability and on-demand scaling requirements as the feature expands to other cities.
B) A REST API with Lambda and RDS is not suited to real-time sequence detection: polling queries against a relational database add latency and scale poorly for streaming patterns.
C) Kinesis Data Firehose with Lambda is near real time, but Firehose is primarily a delivery service, not a tool for complex stateful sequence detection across events within a time window.
D) Separate streams with Kinesis Data Analytics for Java add unnecessary complexity; dual streams and polling increase latency and operational overhead without a clear advantage over Spark on EMR.
https://aws.amazon.com/kinesis/data-streams/faqs/
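The detection rule itself ("two voltage drops within 10 minutes of a sudden increase at the same building") reduces to a simple windowed check. A minimal, framework-free sketch of that logic, assuming events for one building have already been classified as increases or drops:

```python
def detect_pattern(events, window_seconds=600):
    """events: list of (timestamp_seconds, kind) tuples for one building,
    kind in {"increase", "drop"}, sorted by time. Returns True when two
    drops fall within `window_seconds` (10 min) after a sudden increase."""
    increases = [t for t, kind in events if kind == "increase"]
    drops = [t for t, kind in events if kind == "drop"]
    for inc in increases:
        # Count drops inside the 10-minute window following this increase.
        in_window = [t for t in drops if inc <= t <= inc + window_seconds]
        if len(in_window) >= 2:
            return True
    return False
```

In the streaming solution this check would run continuously over per-building keyed state, firing an SNS publish when it returns True.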