Free AWS Certified Machine Learning - Specialty Exam Braindumps (page: 27)

Page 27 of 84

A Machine Learning Specialist is deciding between building a naive Bayesian model or a full Bayesian network for a classification problem. The Specialist computes the Pearson correlation coefficients between each pair of features and finds that their absolute values range from 0.1 to 0.95.

Which model describes the underlying data in this situation?

  A. A naive Bayesian model, since the features are all conditionally independent.
  B. A full Bayesian network, since the features are all conditionally independent.
  C. A naive Bayesian model, since some of the features are statistically dependent.
  D. A full Bayesian network, since some of the features are statistically dependent.

Answer(s): D

Explanation:

A naive Bayesian model assumes that all features are conditionally independent of one another given the class. A full Bayesian network drops that assumption: edges between feature nodes represent conditional dependence relationships, so it can describe data in which the features are complex, non-linearly related, or otherwise not conditionally independent.

In this situation, Pearson correlation coefficients with absolute values as high as 0.95 indicate strong linear dependence between at least some pairs of features. That violates the independence assumption of the naive model, so a full Bayesian network is the appropriate choice to capture the relationships between the features and model the data.
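The correlation check described in the question can be sketched in a few lines of plain Python. This is only an illustration: the feature names, the toy values, and the 0.8 cutoff are all hypothetical and do not come from the question.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strongly_correlated_pairs(features, threshold=0.8):
    """Return (name_a, name_b, r) for feature pairs with |r| >= threshold.

    The threshold is an arbitrary illustrative cutoff, not a standard value.
    """
    pairs = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical toy data: the two height columns are nearly redundant.
features = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "shoe_color_code": [3.0, 1.0, 4.0, 1.0, 5.0],
}
flagged = strongly_correlated_pairs(features)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

Any pair flagged this way is evidence against treating the features as independent. Note that Pearson correlation only measures linear dependence, so a low coefficient does not by itself prove independence.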


Reference:

https://towardsdatascience.com/basics-of-bayesian-network-79435e11ae7b and https://www.quora.com/Whats-the-difference-between-a-naive-Bayes-classifier-and-a-Bayesian-network?share=1



A Data Scientist is building a linear regression model and will use the resulting p-values to evaluate the statistical significance of each coefficient. Upon inspection of the dataset, the Data Scientist discovers that most of the features are normally distributed. The plot of one feature in the dataset is shown in the graphic.


What transformation should the Data Scientist apply to satisfy the statistical assumptions of the linear regression model?

  A. Exponential transformation
  B. Logarithmic transformation
  C. Polynomial transformation
  D. Sinusoidal transformation

Answer(s): B
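
The graphic is not reproduced on this page, but the reference below points to a positively skewed distribution. Assuming that is what the plot shows, a rough sketch of why a logarithmic transformation helps might look like this. The data is synthetic (log-normal draws with a fixed seed) and the `skewness` helper is written here for illustration only.

```python
import math
import random


def skewness(values):
    """Population skewness: third central moment over the cubed std deviation."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


random.seed(0)
# Hypothetical right-skewed feature: log-normal draws, all strictly positive.
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]
# The log transform requires positive values; it compresses the long right tail.
logged = [math.log(v) for v in raw]

raw_skew = skewness(raw)
log_skew = skewness(logged)
print(f"skewness before: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

The skewness of the raw values should be clearly positive, while the log-transformed values should be close to symmetric. For features that contain zeros, a shifted variant such as log(1 + x) is a common alternative.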


Reference:

https://corporatefinanceinstitute.com/resources/knowledge/other/positively-skewed-distribution/#:~:text=For%20positively%20skewed%20distributions%2C%20the,each%20value%20in%20the%20dataset.



A Machine Learning Specialist is assigned to a Fraud Detection team and must tune an XGBoost model. The model performs well on test data but does not perform as expected on unknown data. The existing parameters are provided as follows.


Which parameter tuning guidelines should the Specialist follow to avoid overfitting?

  A. Increase the max_depth parameter value.
  B. Lower the max_depth parameter value.
  C. Update the objective to binary:logistic.
  D. Lower the min_child_weight parameter value.

Answer(s): B
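
The parameter listing from the question is not reproduced on this page, so the following does not use it. As a rough illustration of why a lower max_depth reduces overfitting, the sketch below fits a single hand-written regression tree (not XGBoost itself) at two depths on synthetic noisy data. All names, depths, and data here are hypothetical.

```python
import random


def sse(ys):
    """Sum of squared errors around the mean of ys."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def build_tree(points, max_depth):
    """Fit a 1-D regression tree on (x, y) points; a leaf is a plain float."""
    ys = [y for _, y in points]
    leaf = sum(ys) / len(ys)
    if max_depth == 0 or len(points) < 2:
        return leaf
    points = sorted(points)
    best = None
    for i in range(1, len(points)):
        if points[i][0] == points[i - 1][0]:
            continue  # cannot split between identical x values
        left, right = points[:i], points[i:]
        cost = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or cost < best[0]:
            threshold = (points[i - 1][0] + points[i][0]) / 2
            best = (cost, threshold, left, right)
    if best is None:
        return leaf
    _, threshold, left, right = best
    return (threshold,
            build_tree(left, max_depth - 1),
            build_tree(right, max_depth - 1))


def predict(tree, x):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree


def mse(tree, points):
    return sum((predict(tree, x) - y) ** 2 for x, y in points) / len(points)


def sample(n, rng):
    """Hypothetical data: a step function of x plus Gaussian noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, (1.0 if x > 0.5 else 0.0) + rng.gauss(0.0, 0.5)))
    return data


rng = random.Random(0)
train, valid = sample(120, rng), sample(200, rng)

results = {}
for depth in (1, 12):
    tree = build_tree(train, depth)
    results[depth] = (mse(tree, train), mse(tree, valid))
    print(f"max_depth={depth}: train MSE={results[depth][0]:.3f}, "
          f"validation MSE={results[depth][1]:.3f}")
```

The deeper tree should fit the training points much more closely while showing a large gap between training and validation error, which is the pattern the question describes. Limiting depth keeps each tree from memorizing noise; in XGBoost, raising min_child_weight (not lowering it) works in the same direction.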



A data scientist is developing a pipeline to ingest streaming web traffic data. The data scientist needs to implement a process to identify unusual web traffic patterns as part of the pipeline. The patterns will be used downstream for alerting and incident response. The data scientist has access to unlabeled historic data to use, if needed.

The solution needs to do the following:
• Calculate an anomaly score for each web traffic entry.
• Adapt unusual event identification to changing web patterns over time.

Which approach should the data scientist implement to meet these requirements?

  A. Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker Random Cut Forest (RCF) built-in model. Use an Amazon Kinesis Data Stream to process the incoming web traffic data. Attach a preprocessing AWS Lambda function to perform data enrichment by calling the RCF model to calculate the anomaly score for each record.
  B. Use historic web traffic data to train an anomaly detection model using the Amazon SageMaker built-in XGBoost model. Use an Amazon Kinesis Data Stream to process the incoming web traffic data. Attach a preprocessing AWS Lambda function to perform data enrichment by calling the XGBoost model to calculate the anomaly score for each record.
  C. Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the k-Nearest Neighbors (kNN) SQL extension to calculate anomaly scores for each record using a tumbling window.
  D. Collect the streaming data using Amazon Kinesis Data Firehose. Map the delivery stream as an input source for Amazon Kinesis Data Analytics. Write a SQL query to run in real time against the streaming data with the Amazon Random Cut Forest (RCF) SQL extension to calculate anomaly scores for each record using a sliding window.

Answer(s): D

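Explanation:

The AWS documentation for the RANDOM_CUT_FOREST SQL function states: "The algorithm starts developing the machine learning model using current records in the stream when you start the application. The algorithm does not use older records in the stream for machine learning, nor does it use statistics from previous executions of the application."

Because the model is built from the records currently in the stream, each record receives its own anomaly score and the scoring adapts as web traffic patterns change, without a separate training step on the historic data.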


Reference:

https://docs.aws.amazon.com/kinesisanalytics/latest/sqlref/sqlrf-random-cut-forest.html
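
To make the "adapts to changing patterns" requirement concrete, here is a small Python sketch. It is not Random Cut Forest and not the Kinesis Data Analytics API; it swaps in a much simpler rolling z-score over a sliding window, purely to show how scoring against recent records lets a detector adjust after traffic shifts. The class name, window size, and traffic values are all hypothetical.

```python
import statistics
from collections import deque


class RollingZScore:
    """Scores each value against the mean and spread of a sliding window."""

    def __init__(self, window=20):
        # deque(maxlen=...) silently drops the oldest record when full.
        self.window = deque(maxlen=window)

    def score(self, value):
        if len(self.window) >= 2:
            mean = statistics.fmean(self.window)
            std = statistics.pstdev(self.window)
            if std == 0:
                result = 0.0 if value == mean else float("inf")
            else:
                result = abs(value - mean) / std
        else:
            result = 0.0  # not enough history to judge yet
        self.window.append(value)
        return result


# Hypothetical request counts: a stable level, then a lasting shift upward.
baseline = [100 + (i % 5) for i in range(40)]
shifted = [500 + (i % 5) for i in range(40)]

detector = RollingZScore(window=20)
scores = [detector.score(v) for v in baseline + shifted]

print(f"first record after the shift: {scores[40]:.1f}")
print(f"last record, after the window has adapted: {scores[-1]:.1f}")
```

The first record at the new level scores very high, but once the window contains only records from the new level, the same traffic scores as normal again. A model trained once on historic data, as in options A and B, would not adjust in this way without retraining.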


