Free CCA175 Exam Braindumps

Problem Scenario 16 : You have been given the following MySQL database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the below assignment.

1. Create a table in hive as below.
create table departments_hive(department_id int, department_name string);
2. Now import data from the MySQL table departments to this Hive table. Please make sure that
the data is visible using the below Hive command: select * from departments_hive

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create the Hive table as stated.
hive
show tables;
create table departments_hive(department_id int, department_name string);
Step 2: The important point here is that when we create a table without specifying field delimiters, Hive's default field delimiter is ^A (\001). Hence, while importing the data we have to provide the proper delimiter.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table departments \
--hive-home /user/hive/warehouse \
--hive-import \
--hive-overwrite \
--hive-table departments_hive \
--fields-terminated-by '\001'
Step 3: Check the data in the directory.
hdfs dfs -ls /user/hive/warehouse/departments_hive
hdfs dfs -cat /user/hive/warehouse/departments_hive/part*
Check data in hive table.
Select * from departments_hive;
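If you also want to double-check the ^A (\001) delimiter from Spark, below is a minimal pyspark sketch (an assumption on my part: a pyspark shell is available and the table data sits under the default warehouse path /user/hive/warehouse/departments_hive used above).
# Verification sketch (assumption: pyspark shell, default Hive warehouse path).
# The hive-import above stored fields terminated by the default ^A (\001) delimiter,
# so splitting each line on '\001' should give (department_id, department_name).
departments = sc.textFile("/user/hive/warehouse/departments_hive") \
.map(lambda line: line.split("\001"))
for dept in departments.collect(): print(dept)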



Problem Scenario 68 : You have been given a file as below.
spark75/file1.txt
The file contains some text, as given below:
Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework
The core of Apache Hadoop consists of a storage part known as Hadoop Distributed File System (HDFS) and a processing part called MapReduce. Hadoop splits files into large blocks and distributes them across nodes in a cluster. To process data, Hadoop transfers packaged code for nodes to process in parallel based on the data that needs to be processed.
This approach takes advantage of data locality, where nodes manipulate the data they have access to, to allow the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking.
For a slightly more complicated task, let's look into splitting up sentences from our documents into word bigrams. A bigram is a pair of successive tokens in some sequence. We will look at building bigrams from the sequences of words in each sentence, and then try to find the most frequently occurring ones.
The first problem is that values in each partition of our initial RDD describe lines from the file rather than sentences. Sentences may be split over multiple lines. The glom() RDD method is used to create a single entry for each document containing the list of all lines; we can then join the lines up and resplit them into sentences using "." as the separator, using flatMap so that every object in our RDD is now a sentence.
A bigram is a pair of successive tokens in some sequence. Please build bigrams from the sequences of words in each sentence, and then try to find the most frequently occurring ones.

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create the file in hdfs (we will do this using Hue). However, you can first create it in the local filesystem and then upload it to hdfs.
Step 2: The first problem is that values in each partition of our initial RDD describe lines from the file rather than sentences. Sentences may be split over multiple lines. The glom() RDD method is used to create a single entry for each document containing the list of all lines; we can then join the lines up and resplit them into sentences using "." as the separator, using flatMap so that every object in our RDD is now a sentence.
sentences = sc.textFile("spark75/file1.txt") \
.glom() \
.map(lambda x: " ".join(x)) \
.flatMap(lambda x: x.split("."))
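To see what this join-and-resplit step is doing without running Spark, here is a plain-Python illustration of the same idea on a small made-up list of lines (the variable names and the two lines are illustrative only):
# Illustration only: mimic one glom()'ed partition as a Python list of lines.
lines = ["Hadoop splits files into large blocks and distributes them across nodes",
         "in a cluster. To process data, Hadoop transfers packaged code for nodes."]
document = " ".join(lines)       # what map(lambda x: " ".join(x)) produces for the partition
sentences = document.split(".")  # what the flatMap(lambda x: x.split(".")) step emits
print(sentences)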
Step 3: Now that we have isolated each sentence, we can split it into a list of words and extract the word bigrams from it. Our new RDD contains tuples containing the word bigram (itself a tuple containing the first and second word) as the first value and the number 1 as the second value.
bigrams = sentences.map(lambda x: x.split()) \
.flatMap(lambda x: [((x[i], x[i+1]), 1) for i in range(0, len(x)-1)])
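As a quick plain-Python illustration of the tuple shape this produces (the sentence is just a short sample, not the full dataset):
# Illustration only: bigram extraction from one tokenised sentence.
words = "Apache Hadoop is an open-source software framework".split()
bigrams_with_counts = [((words[i], words[i+1]), 1) for i in range(0, len(words)-1)]
# [(('Apache', 'Hadoop'), 1), (('Hadoop', 'is'), 1), (('is', 'an'), 1), ...]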
Step 4: Finally we can apply the same reduceByKey and sort steps that we used in the wordcount example, to count up the bigrams and sort them in order of descending frequency. In reduceByKey the key is not an individual word but a bigram.
freq_bigrams = bigrams.reduceByKey(lambda x, y: x + y) \
.map(lambda x: (x[1], x[0])) \
.sortByKey(False)
freq_bigrams.take(10)
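If you prefer to keep the (bigram, count) shape instead of swapping key and value, an equivalent formulation is sketched below (assuming your Spark version provides RDD.sortBy):
# Sketch: sort the (bigram, count) pairs directly by count, descending.
freq_bigrams = bigrams.reduceByKey(lambda x, y: x + y) \
.sortBy(lambda x: x[1], ascending=False)
# Note: elements stay as (bigram, count) rather than (count, bigram).
freq_bigrams.take(10)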



Problem Scenario 79 : You have been given MySQL DB with following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of products table : (product_id | product_category_id | product_name | product_description | product_price | product_image )
Please accomplish following activities.

1. Copy "retaildb.products" table to hdfs in a directory p93_products
2. Filter out all the empty prices
3. Sort all the products based on price in both ascending as well as descending order.
4. Sort all the products based on price as well as product_id in descending order.
5. Use the below functions to do data ordering or ranking and fetch the top 10 elements:
top(), takeOrdered(), sortByKey()

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Import the single table.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=products --target-dir=p93_products -m 1
Note: Please check that you don't have a space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to hdfs.
Step 2: Read the data from one of the partitions created using the above command.
hadoop fs -cat p93_products/part-m-00000
Step 3: Load this directory as an RDD using Spark and Python (open a pyspark terminal and do the following).
productsRDD = sc.textFile("p93_products")
Step 4: Filter out empty prices, if any exist.
# filter out lines with empty prices
nonempty_lines = productsRDD.filter(lambda x: len(x.split(",")[4]) > 0)
Step 5: Now sort data based on product_price in ascending order.
sortedPriceProducts = nonempty_lines.map(lambda line: (float(line.split(",")[4]), line.split(",")[2])).sortByKey()
for line in sortedPriceProducts.collect(): print(line)
Step 6: Now sort data based on product_price in descending order.
sortedPriceProducts = nonempty_lines.map(lambda line: (float(line.split(",")[4]), line.split(",")[2])).sortByKey(False)
for line in sortedPriceProducts.collect(): print(line)
Step 7: Get the highest priced product's name.
sortedPriceProducts = nonempty_lines.map(lambda line: (float(line.split(",")[4]), line.split(",")[2])).sortByKey(False).take(1)
print(sortedPriceProducts)
Step 8: Now sort data based on product_price as well as product_id in descending order.
# Don't forget to cast the string
# Tuple as key ((price, id), name)
sortedPriceProducts = nonempty_lines.map(lambda line: ((float(line.split(",")[4]), int(line.split(",")[0])), line.split(",")[2])).sortByKey(False).take(10)
print(sortedPriceProducts)
Step 9: Now sort data based on product_price as well as product_id in descending order, using the top() function.
# Don't forget to cast the string
# Tuple as key ((price, id), name)
sortedPriceProducts = nonempty_lines.map(lambda line: ((float(line.split(",")[4]), int(line.split(",")[0])), line.split(",")[2])).top(10)
print(sortedPriceProducts)
Step 10: Now sort data based on product_price ascending and product_id ascending, using the takeOrdered() function.
# Don't forget to cast the string
# Tuple as key ((price, id), name)
sortedPriceProducts = nonempty_lines.map(lambda line: ((float(line.split(",")[4]), int(line.split(",")[0])), line.split(",")[2])).takeOrdered(10, lambda tuple: (tuple[0][0], tuple[0][1]))
Step 11: Now sort data based on product_price descending and product_id ascending, using the takeOrdered() function.
# Don't forget to cast the string
# Tuple as key ((price, id), name)
# Using a minus (-) on the key gives descending ordering, but only for numeric values.
sortedPriceProducts = nonempty_lines.map(lambda line: ((float(line.split(",")[4]), int(line.split(",")[0])), line.split(",")[2])).takeOrdered(10, lambda tuple: (-tuple[0][0], tuple[0][1]))
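To see why the minus sign gives descending order on the numeric price while product_id still ascends, here is a tiny plain-Python sketch using sorted() with the same composite key (the (price, id) values are made up for illustration):
# Illustration only: same key function as takeOrdered above, applied with sorted().
rows = [((99.99, 12), "P12"), ((99.99, 3), "P3"), ((49.50, 7), "P7")]
ordered = sorted(rows, key=lambda tuple: (-tuple[0][0], tuple[0][1]))
# [((99.99, 3), 'P3'), ((99.99, 12), 'P12'), ((49.5, 7), 'P7')]
# price descends first; ties on price fall back to ascending product_id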



Problem Scenario 49 : You have been given the below code snippet (do a sum of values by key), with intermediate output.
val keysWithValuesList = Array("foo=A", "foo=A", "foo=A", "foo=A", "foo=B", "bar=C", "bar=D", "bar=D")
val data = sc.parallelize(keysWithValuesList)
//Create key value pairs
val kv = data.map(_.split("=")).map(v => (v(0), v(1))).cache()
val initialCount = 0;
val countByKey = kv.aggregateByKey(initialCount)(addToCounts, sumPartitionCounts)
Now define two functions (addToCounts, sumPartitionCounts) such that they will produce the following results.
Output 1
countByKey.collect
res3: Array[(String, Int)] = Array((foo, 5), (bar, 3))
import scala.collection._
val initialSet = scala.collection.mutable.HashSet.empty[String]
val uniqueByKey = kv.aggregateByKey(initialSet)(addToSet, mergePartitionSets)
Now define two functions (addToSet, mergePartitionSets) such that they will produce the following results.
Output 2:
uniqueByKey.collect
res4: Array[(String, scala.collection.mutable.HashSet[String])] = Array((foo, Set(B, A)), (bar, Set(C, D)))

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
val addToCounts = (n: Int, v: String) => n + 1
val sumPartitionCounts = (p1: Int, p2: Int) => p1 + p2
val addToSet = (s: mutable.HashSet[String], v: String) => s += v
val mergePartitionSets = (p1: mutable.HashSet[String], p2: mutable.HashSet[String]) => p1 ++= p2
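For comparison only, the same per-key count and per-key unique-set logic can be sketched with PySpark's aggregateByKey (this is not part of the Scala exam answer above, just an illustrative Python equivalent):
# Sketch: PySpark equivalent of the Scala aggregateByKey calls above.
kv = sc.parallelize([("foo", "A"), ("foo", "A"), ("foo", "A"), ("foo", "A"),
                     ("foo", "B"), ("bar", "C"), ("bar", "D"), ("bar", "D")])
count_by_key = kv.aggregateByKey(0,
                                 lambda n, v: n + 1,        # addToCounts
                                 lambda p1, p2: p1 + p2)    # sumPartitionCounts
unique_by_key = kv.aggregateByKey(set(),
                                  lambda s, v: s | {v},     # addToSet
                                  lambda p1, p2: p1 | p2)   # mergePartitionSets
count_by_key.collect()    # [('foo', 5), ('bar', 3)]
unique_by_key.collect()   # [('foo', {'A', 'B'}), ('bar', {'C', 'D'})]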





