QUESTION: 89

A Machine Learning Specialist is given a structured dataset on the shopping habits of a company’s customer base. The dataset contains thousands of columns of data and hundreds of numerical columns for each customer. The Specialist wants to identify whether there are natural groupings for these columns across all customers and visualize the results as quickly as possible.

What approach should the Specialist take to accomplish these tasks?

A. Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a scatter plot.
B. Run k-means using the Euclidean distance measure for different values of k and create an elbow plot.
C. Embed the numerical features using the t-distributed stochastic neighbor embedding (t-SNE) algorithm and create a line graph.
D. Run k-means using the Euclidean distance measure for different values of k and create box plots for each numerical column within each cluster.

Answer(s): A

Explanation:

Pay attention to what the question is asking:
"Whether there are natural groupings for these columns across all customers and visualize the results as quickly as possible"

The key point is to visualize the groupings, which is exactly what a t-SNE scatter plot does: it projects high-dimensional data points onto a 2D space so that similar points land close together.
The question does not ask how many groups to use. A k-means elbow plot does not visualize the groupings; it is used to choose the number of clusters, k.
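
As a rough illustration of option A, the sketch below embeds numerical columns with t-SNE and prepares a scatter plot. It assumes scikit-learn, NumPy, and matplotlib are installed. The synthetic data, the function names embed_2d and save_scatter, and the perplexity value are assumptions made for the example, not part of the question.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def embed_2d(features, perplexity=30.0, seed=0):
    """Standardize the numerical columns, then project them to 2-D with t-SNE."""
    scaled = StandardScaler().fit_transform(features)
    tsne = TSNE(n_components=2, perplexity=perplexity, init="pca", random_state=seed)
    return tsne.fit_transform(scaled)


def save_scatter(embedding, path="tsne_scatter.png"):
    """Write a scatter plot of the embedding; groupings show up as separate blobs."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 6))
    ax.scatter(embedding[:, 0], embedding[:, 1], s=8)
    ax.set_xlabel("t-SNE dimension 1")
    ax.set_ylabel("t-SNE dimension 2")
    fig.savefig(path)
    plt.close(fig)


# Hypothetical stand-in for the customer data: 300 customers, 50 numerical
# columns, drawn around 3 hidden group centers.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 50))
customers = np.vstack([c + rng.normal(size=(100, 50)) for c in centers])

embedding = embed_2d(customers)  # one 2-D point per customer
```

Calling save_scatter(embedding) writes the plot to disk. Each point is one customer, so no value of k has to be chosen before looking at the picture.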

Reference:

https://towardsdatascience.com/an-introduction-to-t-sne-with-python-example-5a3a293108d1


QUESTION: 90

A Machine Learning Specialist is planning to create a long-running Amazon EMR cluster. The EMR cluster will have 1 master node, 10 core nodes, and 20 task nodes. To save on costs, the Specialist will use Spot Instances in the EMR cluster.

Which nodes should the Specialist launch on Spot Instances?

A. Master node
B. Any of the core nodes
C. Any of the task nodes
D. Both core and task nodes

Answer(s): C

Explanation:

"Long-Running Clusters and Data Warehouses
If you are running a persistent Amazon EMR cluster that has a predictable variation in computational capacity, such as a data warehouse, you can handle peak demand at lower cost with Spot Instances. You can launch your master and core instance groups as On-Demand Instances to handle the normal capacity and launch task instance groups as Spot Instances to handle your peak load requirements."
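
The guidance quoted above could be expressed as an instance group layout like the one sketched here. The dictionary keys follow the InstanceGroups structure accepted by the boto3 EMR run_job_flow call. The instance type, the group names, and the helper name build_instance_groups are assumptions for illustration, and nothing in the sketch contacts AWS.

```python
def build_instance_groups(instance_type="m5.xlarge"):
    """Return an EMR instance group layout: master and core On-Demand, task on Spot.

    instance_type is an assumed value; the question does not name one.
    """
    return [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",  # losing the master terminates the cluster
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            "Name": "Core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",  # core nodes store HDFS data
            "InstanceType": instance_type,
            "InstanceCount": 10,
        },
        {
            "Name": "Task",
            "InstanceRole": "TASK",
            "Market": "SPOT",  # task nodes hold no HDFS data, so interruption is tolerable
            "InstanceType": instance_type,
            "InstanceCount": 20,
        },
    ]


instance_groups = build_instance_groups()
spot_roles = [g["InstanceRole"] for g in instance_groups if g["Market"] == "SPOT"]
```

A list like this would be passed as Instances={"InstanceGroups": instance_groups, ...} when creating the cluster.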

Reference:

https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances-guidelines.html


QUESTION: 91

A manufacturer of car engines collects data from cars as they are being driven. The data collected includes timestamp, engine temperature, rotations per minute (RPM), and other sensor readings. The company wants to predict when an engine is going to have a problem, so it can notify drivers in advance to get engine maintenance. The engine data is loaded into a data lake for training.

Which is the MOST suitable predictive model that can be deployed into production?

A. Add labels over time to indicate which engine faults occur at what time in the future to turn this into a supervised learning problem. Use a recurrent neural network (RNN) to train the model to recognize when an engine might need maintenance for a certain fault.
B. This data requires an unsupervised learning algorithm. Use Amazon SageMaker k-means to cluster the data.
C. Add labels over time to indicate which engine faults occur at what time in the future to turn this into a supervised learning problem. Use a convolutional neural network (CNN) to train the model to recognize when an engine might need maintenance for a certain fault.
D. This data is already formulated as a time series. Use Amazon SageMaker seq2seq to model the time series.

Answer(s): A
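
The labeling step in option A might look like the minimal sketch below, which covers data preparation only and does not build the RNN. It marks each reading with whether a fault occurs within a chosen number of future steps, then cuts the series into fixed-length windows that a sequence model could consume. The horizon, the window length, the sample readings, and both function names are assumptions.

```python
def label_future_faults(fault_flags, horizon):
    """label[t] is 1 if a fault occurs at any step in (t, t + horizon], else 0."""
    n = len(fault_flags)
    return [int(any(fault_flags[t + 1 : t + 1 + horizon])) for t in range(n)]


def make_windows(readings, labels, window):
    """Pair each run of `window` consecutive readings with the label at its last step."""
    sequences, targets = [], []
    for end in range(window, len(readings) + 1):
        sequences.append(readings[end - window : end])
        targets.append(labels[end - 1])
    return sequences, targets


# Hypothetical engine log: (temperature, RPM) per time step, with one fault at step 5.
readings = [
    (90.0, 2000), (91.0, 2100), (93.0, 2200), (97.0, 2500),
    (104.0, 2900), (118.0, 3400), (92.0, 2000), (90.0, 1900),
]
fault_flags = [0, 0, 0, 0, 0, 1, 0, 0]

labels = label_future_faults(fault_flags, horizon=2)
sequences, targets = make_windows(readings, labels, window=3)
```

Each entry in sequences is an ordered run of sensor readings, which is the kind of input an RNN is designed to process step by step.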

Reference:

https://towardsdatascience.com/how-to-implement-machine-learning-for-predictive-maintenance-4633cdbe4860


QUESTION: 92

A company wants to predict the sale prices of houses based on available historical sales data. The target variable in the company’s dataset is the sale price. The features include parameters such as the lot size, living area measurements, non-living area measurements, number of bedrooms, number of bathrooms, year built, and postal code. The company wants to use multi-variable linear regression to predict house sale prices.

Which step should a machine learning specialist take to remove features that are irrelevant for the analysis and reduce the model’s complexity?

A. Plot a histogram of the features and compute their standard deviation. Remove features with high variance.
B. Plot a histogram of the features and compute their standard deviation. Remove features with low variance.
C. Build a heatmap showing the correlation of the dataset against itself. Remove features with low mutual correlation scores.
D. Run a correlation check of all features against the target variable. Remove features with low target variable correlation scores.

Answer(s): D
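
The check in option D can be sketched as follows, using the Pearson correlation between each feature and the sale price. The sample values, the 0.3 threshold, and the function names are assumptions chosen for the example. Pearson correlation only captures linear relationships, and a categorical feature such as postal code would need encoding before a check like this is meaningful.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / sqrt(var_x * var_y)


def weak_features(features, target, threshold=0.3):
    """Return {name: r} for features whose |r| against the target is below threshold."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    return {name: r for name, r in scores.items() if abs(r) < threshold}


# Hypothetical house data (prices in thousands).
sale_price = [200, 250, 300, 350, 400, 450]
features = {
    "living_area": [100, 120, 150, 170, 200, 230],
    "bedrooms": [2, 2, 3, 3, 4, 4],
    "non_living_area": [41, 39, 39, 41, 41, 39],
}

to_remove = weak_features(features, sale_price)
```

Here only non_living_area falls below the threshold, so it would be the candidate for removal before fitting the regression.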


Free MLS-C01 Exam Braindumps (page: 24)
