Free AWS Certified Machine Learning - Specialty Exam Braindumps

A gaming company has launched an online game where people can start playing for free, but they need to pay if they choose to use certain features. The company needs to build an automated system to predict whether or not a new user will become a paid user within 1 year. The company has gathered a labeled dataset from 1 million users.

The training dataset consists of 1,000 positive samples (from users who ended up paying within 1 year) and 999,000 negative samples (from users who did not use any paid features). Each data sample consists of 200 features including user age, device, location, and play patterns.

Using this dataset for training, the Data Science team trained a random forest model that converged with over 99% accuracy on the training set. However, the prediction results on a test dataset were not satisfactory

Which of the following approaches should the Data Science team take to mitigate this issue? (Choose two.)

  1. Add more deep trees to the random forest to enable the model to learn more features.
  2. Include a copy of the samples in the test dataset in the training dataset.
  3. Generate more positive samples by duplicating the positive samples and adding a small amount of noise to the duplicated data.
  4. Change the cost function so that false negatives have a higher impact on the cost value than false positives.
  5. Change the cost function so that false positives have a higher impact on the cost value than false negatives.

Answer(s): C,D



A Data Scientist is developing a machine learning model to predict future patient outcomes based on information collected about each patient and their treatment plans. The model should output a continuous value as its prediction. The data available includes labeled outcomes for a set of 4,000 patients. The study was conducted on a group of individuals over the age of 65 who have a particular disease that is known to worsen with age.

Initial models have performed poorly. While reviewing the underlying data, the Data Scientist notices that, out of 4,000 patient observations, there are 450 where the patient age has been input as 0. The other features for these observations appear normal compared to the rest of the sample population

How should the Data Scientist correct this issue?

  1. Drop all records from the dataset where age has been set to 0.
  2. Replace the age field value for records with a value of 0 with the mean or median value from the dataset
  3. Drop the age feature from the dataset and train the model using the rest of the features.
  4. Use k-means clustering to handle missing features

Answer(s): D

Explanation:

Dropping the Age feature is a NOT ATOLL a good idea - as age plays a critical role in this disease as per the question
Dropping 10% of data is NOT a good idea considering the fact that the number of observations is already low.
The Mean or Median are a potential solutions
But the question says that "Disease worsens after age 65 so there is a correlation between age and other symptoms related feature" So that means that using Unsupervised Learning we can make pretty good prediction of "Age"


Reference:

https://medium.com/jungle-book/missing-data-filling-with-unsupervised-learning-b448964030d



A Data Science team is designing a dataset repository where it will store a large amount of training data commonly used in its machine learning models. As Data Scientists may create an arbitrary number of new datasets every day, the solution has to scale automatically and be cost-effective. Also, it must be possible to explore the data using SQL.

Which storage scheme is MOST adapted to this scenario?

  1. Store datasets as files in Amazon S3.
  2. Store datasets as files in an Amazon EBS volume attached to an Amazon EC2 instance.
  3. Store datasets as tables in a multi-node Amazon Redshift cluster.
  4. Store datasets as global tables in Amazon DynamoDB.

Answer(s): A



A Machine Learning Specialist deployed a model that provides product recommendations on a company's website. Initially, the model was performing very well and resulted in customers buying more products on average. However, within the past few months, the Specialist has noticed that the effect of product recommendations has diminished and customers are starting to return to their original habits of spending less. The Specialist is unsure of what happened, as the model has not changed from its initial deployment over a year ago.

Which method should the Specialist try to improve model performance?

  1. The model needs to be completely re-engineered because it is unable to handle product inventory changes.
  2. The model's hyperparameters should be periodically updated to prevent drift.
  3. The model should be periodically retrained from scratch using the original data while adding a regularization term to handle product inventory changes
  4. The model should be periodically retrained using the original training data plus new data as product inventory changes.

Answer(s): D






Post your Comments and Discuss Amazon AWS Certified Machine Learning - Specialty exam with other Community members:

Perumal commented on March 01, 2024
Very useful
Anonymous
upvote

Reddy commented on December 14, 2023
these are pretty useful
Anonymous
upvote

Reddy commented on December 14, 2023
These are pretty useful
Anonymous
upvote

Nik commented on July 16, 2021
These study guides are the same as any other exam dums except you get them here for a very discounted price. Quality and formatting is good plus the Xengine App software is a good simulator tool which comes for free.
UNITED STATES
upvote