Free AWS Certified Machine Learning - Specialty Exam Braindumps (page: 17)

Page 17 of 84

A Machine Learning Specialist is building a model to predict future employment rates based on a wide range of economic factors. While exploring the data, the Specialist notices that the magnitudes of the input features vary greatly. The Specialist does not want variables with a larger magnitude to dominate the model.

What should the Specialist do to prepare the data for model training?

  A. Apply quantile binning to group the data into categorical bins to keep any relationships in the data by replacing the magnitude with distribution.
  B. Apply the Cartesian product transformation to create new combinations of fields that are independent of the magnitude.
  C. Apply normalization to ensure each field will have a mean of 0 and a variance of 1 to remove any significant magnitude.
  D. Apply the orthogonal sparse bigram (OSB) transformation to apply a fixed-size sliding window to generate new features of a similar magnitude.

Answer(s): C
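
The normalization described in the correct option (mean of 0, variance of 1) can be illustrated with a minimal sketch. The feature names and values below are hypothetical, and the helper `standardize` is written here for illustration only; it is not taken from the referenced AWS documentation.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Rescale values to mean 0 and (population) variance 1."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant feature carries no magnitude information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical economic features on very different scales.
gdp_per_capita = [18_000.0, 21_000.0, 19_500.0, 23_000.0]
interest_rate = [0.5, 2.0, 1.25, 0.75]

scaled_gdp = standardize(gdp_per_capita)
scaled_rate = standardize(interest_rate)
```

After scaling, both features are expressed in standard deviations from their own mean, so neither dominates simply because its raw numbers are larger.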


Reference:

https://docs.aws.amazon.com/machine-learning/latest/dg/data-transformations-reference.html



A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span 5 to 10 columns only.

How should the Machine Learning Specialist transform the dataset to minimize query runtime?

  A. Convert the records to Apache Parquet format.
  B. Convert the records to JSON format.
  C. Convert the records to GZIP CSV format.
  D. Convert the records to XML format.

Answer(s): A

Explanation:

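Apache Parquet is a columnar format, so Amazon Athena reads only the 5 to 10 columns a query references instead of scanning all 200 columns of every record. GZIP-compressed CSV shrinks the files, but Athena must still read every column of every row. Parquet data is also compressed (commonly with SNAPPY), which further reduces both the data scanned by Athena and the S3 storage used, so it lowers the AWS bill on both counts. Supported compression formats: GZIP, LZO, SNAPPY (Parquet) and ZLIB.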
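
A rough back-of-the-envelope calculation shows the scale of the saving. This is only a sketch: it assumes, purely for illustration, that all 200 columns are the same size and it ignores compression entirely. The figures come from the question; the function name `scanned_gb` is made up here.

```python
RECORDS = 800_000      # from the question
RECORD_MB = 1.5        # approximate size of one record, from the question
TOTAL_COLUMNS = 200    # from the question


def scanned_gb(columns_read):
    """Estimate GB scanned when only `columns_read` columns are read.

    Assumes equal-sized columns and no compression (illustrative only).
    """
    total_mb = RECORDS * RECORD_MB
    return total_mb * columns_read / TOTAL_COLUMNS / 1000


row_format_scan = scanned_gb(TOTAL_COLUMNS)  # CSV/JSON/XML: every column is read
columnar_scan_10 = scanned_gb(10)            # columnar: only the queried columns
columnar_scan_5 = scanned_gb(5)
```

Under these assumptions a full scan touches about 1,200 GB, while a 10-column query on a columnar layout touches about 60 GB and a 5-column query about 30 GB.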


Reference:

https://www.cloudforecast.io/blog/using-parquet-on-athena-to-save-money-on-aws/



A Machine Learning Specialist is developing a daily ETL workflow containing multiple ETL jobs. The workflow consists of the following processes:

•Start the workflow as soon as data is uploaded to Amazon S3.
•When all the datasets are available in Amazon S3, start an ETL job to join the uploaded datasets with multiple terabyte-sized datasets already stored in Amazon S3.
•Store the results of joining datasets in Amazon S3.
•If one of the jobs fails, send a notification to the Administrator.

Which configuration will meet these requirements?

  A. Use AWS Lambda to trigger an AWS Step Functions workflow to wait for dataset uploads to complete in Amazon S3. Use AWS Glue to join the datasets. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
  B. Develop the ETL workflow using AWS Lambda to start an Amazon SageMaker notebook instance. Use a lifecycle configuration script to join the datasets and persist the results in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
  C. Develop the ETL workflow using AWS Batch to trigger the start of ETL jobs when data is uploaded to Amazon S3. Use AWS Glue to join the datasets in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
  D. Use AWS Lambda to chain other Lambda functions to read and join the datasets in Amazon S3 as soon as the data is uploaded to Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.

Answer(s): A
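
The first step of the correct option, a Lambda function that starts a Step Functions workflow when S3 reports an upload, might look roughly like the sketch below. It covers only the trigger; the waiting logic, the Glue job and the CloudWatch alarm would live elsewhere. The state machine ARN, the `STATE_MACHINE_ARN` environment variable, the shape of the execution input and the optional `sfn_client` parameter are all assumptions made for illustration. The only AWS API call used is the boto3 Step Functions client's `start_execution`.

```python
import json
import os

# Hypothetical placeholder ARN; a real deployment would set the environment variable.
STATE_MACHINE_ARN = os.environ.get(
    "STATE_MACHINE_ARN",
    "arn:aws:states:us-east-1:123456789012:stateMachine:etl-workflow",
)


def build_execution_input(s3_event):
    """Collect the bucket/key pairs from an S3 event notification."""
    uploads = [
        {"bucket": rec["s3"]["bucket"]["name"], "key": rec["s3"]["object"]["key"]}
        for rec in s3_event.get("Records", [])
    ]
    return {"uploads": uploads}


def handler(event, context, sfn_client=None):
    """Lambda entry point: start one workflow execution per S3 event."""
    if sfn_client is None:
        # Imported lazily so the helper above can be exercised without boto3.
        import boto3

        sfn_client = boto3.client("stepfunctions")
    response = sfn_client.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        input=json.dumps(build_execution_input(event)),
    )
    return response["executionArn"]
```

Passing the client as an optional argument is just a convenience for local testing; Lambda itself calls the handler with `event` and `context` only.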


Reference:

https://aws.amazon.com/step-functions/use-cases/



An agency collects census information within a country to determine healthcare and social program needs by province and city. The census form collects responses for approximately 500 questions from each citizen.

Which combination of algorithms would provide the appropriate insights? (Choose two.)

  A. The factorization machines (FM) algorithm
  B. The Latent Dirichlet Allocation (LDA) algorithm
  C. The principal component analysis (PCA) algorithm
  D. The k-means algorithm
  E. The Random Cut Forest (RCF) algorithm

Answer(s): C,D

Explanation:

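With roughly 500 responses per citizen, the census data is high-dimensional. PCA reduces those responses to a smaller set of components that retain most of the variance, and k-means then groups citizens with similar response profiles, which can be summarized by province and city to identify healthcare and social program needs.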
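
The combination can be sketched on a toy scale in plain Python: project each respondent onto the first principal component, then cluster the projected scores. This is a simplified illustration under stated assumptions, not the SageMaker implementation. The six respondents, the four questions, the use of a single component and the choice of two clusters are all hypothetical.

```python
import random


def center(rows):
    """Subtract each column's mean so the data is centered at zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def first_component(rows, iters=100, seed=0):
    """Approximate the top principal direction of centered rows by power iteration."""
    d = len(rows[0])
    rng = random.Random(seed)
    v = [rng.uniform(0.5, 1.5) for _ in range(d)]
    for _ in range(iters):
        proj = [sum(r[j] * v[j] for j in range(d)) for r in rows]         # X v
        w = [sum(p * r[j] for p, r in zip(proj, rows)) for j in range(d)]  # X^T (X v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def kmeans_1d(values, iters=10):
    """Two-cluster k-means on scalar values, seeded at the min and max."""
    centroids = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [
            0 if abs(x - centroids[0]) <= abs(x - centroids[1]) else 1
            for x in values
        ]
        for c in (0, 1):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels


def cluster_respondents(rows):
    """PCA (one component) followed by k-means (two clusters)."""
    centered = center(rows)
    v = first_component(centered)
    scores = [sum(r[j] * v[j] for j in range(len(v))) for r in centered]
    return kmeans_1d(scores)


# Hypothetical answers (1-5 scale) from six respondents to four questions.
responses = [
    [1, 2, 5, 4],
    [2, 1, 4, 5],
    [1, 1, 5, 5],
    [5, 4, 1, 2],
    [4, 5, 2, 1],
    [5, 5, 1, 1],
]
labels = cluster_respondents(responses)
```

The first three respondents end up in one cluster and the last three in the other. Which cluster is labeled 0 or 1 depends on the sign of the principal direction, which is arbitrary.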





