QUESTION: 1
You are using MADlib for linear regression analysis. Which value does the following statement return?
SELECT (linregr(depvar, indepvar)).r2 FROM zeta1;
A. Goodness of fit
B. Coefficients
C. Standard error
D. P-value
Answer(s): A
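Explanation (illustrative): the r2 field is the coefficient of determination, R^2 = 1 - SS_res / SS_tot, which measures how much of the variance in the dependent variable the fitted line explains. The sketch below computes the same quantity in plain Python for a single independent variable. It is not MADlib code; the function name r_squared and the sample data are assumptions made for illustration only.

```python
def r_squared(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares and return R^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Closed-form least-squares estimates for one independent variable.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual sum of squares versus total sum of squares.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical sample data, not taken from the zeta1 table.
print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))
```

A value near 1 indicates that the line fits the data closely; a value near 0 indicates that it explains little of the variance.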
QUESTION: 2
Which data asset is an example of quasi-structured data?
A. Webserver log
B. XML data file
C. Database table
D. News article
Answer(s): A
QUESTION: 3
What would be considered "Big Data"?
A. An OLAP cube containing customer demographic information about 100,000,000 customers
B. Daily log files from a web server that receives 100,000 hits per minute
C. Aggregated statistical data stored in a relational database table
D. Spreadsheets containing monthly sales data for a Global 100 corporation
Answer(s): B
QUESTION: 4
A data scientist plans to classify the sentiment polarity of 10,000 product reviews collected from
the Internet. Labeled training data is available. What is the most appropriate model to use?
A. Naïve Bayesian classifier
B. Linear regression
C. Logistic regression
D. K-means clustering
Answer(s): A
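Explanation (illustrative): a naïve Bayes classifier scores each label by its prior probability multiplied by the per-word likelihoods learned from the labeled reviews, then picks the highest-scoring label. The following is a minimal sketch of a multinomial variant with add-one (Laplace) smoothing. The function names, the toy reviews, and the whitespace tokenization are all assumptions made for illustration; a real project would more likely use an established library.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (text, label) pairs. Returns the counts used for prediction."""
    label_counts = Counter()             # number of documents per label
    word_counts = defaultdict(Counter)   # word frequencies per label
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior score for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen word/label pairs from zeroing the score.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set standing in for the labeled reviews.
REVIEWS = [
    ("great product works well", "positive"),
    ("love it great value", "positive"),
    ("terrible quality broke quickly", "negative"),
    ("awful waste of money", "negative"),
]
MODEL = train_naive_bayes(REVIEWS)
print(predict(MODEL, "great value"))
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.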
QUESTION: 5
In which lifecycle stage are test and training data sets created?