Free CCA175 Exam Braindumps (page: 2)


Problem Scenario 59: You have been given the below code snippet.
val x = sc.parallelize(1 to 20)
val y = sc.parallelize(10 to 30)
operation1
z.collect
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[Int] = Array(16, 12, 20, 13, 17, 14, 18, 10, 19, 15, 11)

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:
val z = x.intersection(y)
intersection: Returns only the elements that appear in both RDDs. The result contains no duplicates, and its ordering is not guaranteed because intersection shuffles the data.
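For reference, a minimal end-to-end sketch that can be pasted into spark-shell (the element order in the result may differ from run to run, since intersection shuffles the data):

val x = sc.parallelize(1 to 20)
val y = sc.parallelize(10 to 30)
// operation1: keep only the values present in both RDDs, i.e. 10 to 20
val z = x.intersection(y)
z.collect
// Array[Int] = Array(16, 12, 20, 13, 17, 14, 18, 10, 19, 15, 11)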



Problem Scenario 74: You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of orders table: (order_id, order_date, order_customer_id, order_status)
Columns of order_items table: (order_item_id, order_item_order_id, order_item_product_id, order_item_quantity, order_item_subtotal, order_item_product_price)
Please accomplish the following activities.

1. Copy the "retail_db.orders" and "retail_db.order_items" tables to HDFS in the respective directories p89_orders and p89_order_items.
2. Join these data sets using order_id in Spark and Python.
3. Now fetch selected columns from the joined data: order_id, order_date, and the amount collected on this order.
4. Calculate the total orders placed for each date, and produce the output sorted by date.

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:
Step 1: Import each table on its own.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=orders --target-dir=p89_orders -m 1
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p89_order_items -m 1
Note: Make sure there is no space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to HDFS.
Step 2: Read the data from one of the partitions created by the above commands.
hadoop fs -cat p89_orders/part-m-00000
hadoop fs -cat p89_order_items/part-m-00000

Step 3: Load the two directories above as RDDs using Spark and Python (open a pyspark terminal and do the following).
orders = sc.textFile("p89_orders")
orderItems = sc.textFile("p89_order_items")
Step 4: Convert each RDD into key-value pairs (order_id as the key and the whole line as the value).
# First column of orders is order_id
ordersKeyValue = orders.map(lambda line: (int(line.split(",")[0]), line))
# Second column of order_items is order_item_order_id
orderItemsKeyValue = orderItems.map(lambda line: (int(line.split(",")[1]), line))
Step 5: Join both RDDs on order_id.
joinedData = orderItemsKeyValue.join(ordersKeyValue)
# print the joined data
for line in joinedData.collect():
    print(line)
The format of joinedData is as below:
(order_id, (all columns from order_items, all columns from orders))
Step 6: Now fetch the selected values: order_id, order_date, and the amount collected on this order (order_item_subtotal).
revenuePerOrderPerDay = joinedData.map(lambda row: (row[0], row[1][1].split(",")[1], float(row[1][0].split(",")[4])))
# print the result
for line in revenuePerOrderPerDay.collect():
    print(line)
Step 7: Select the distinct (date, order_id) pairs.
# distinct (date, order_id)
distinctOrdersDate = joinedData.map(lambda row: row[1][1].split(",")[1] + ", " + str(row[0])).distinct()
for line in distinctOrdersDate.collect():
    print(line)
Step 8: Similar to word count, generate a (date, 1) record for each row.
newLineTuple = distinctOrdersDate.map(lambda line: (line.split(", ")[0], 1))
Step 9: Do the count for each key (date) to get the total orders per date.
totalOrdersPerDate = newLineTuple.reduceByKey(lambda a, b: a + b)
# print results
for line in totalOrdersPerDate.collect():
    print(line)

Step 10: Sort the results by date.
sortedData = totalOrdersPerDate.sortByKey().collect()
# print results
for line in sortedData:
    print(line)



Problem Scenario 34: You have been given a file named spark6/user.csv.
Data is given below:
user.csv
id, topic, hits
Rahul, scala, 120
Nikita, spark, 80
Mithun, spark, 1
myself, cca175, 180
Now write Spark code in Scala which will remove the header and create an RDD of values as below for all rows. Also, if the id is "myself", filter out that row.
Map(id -> Rahul, topic -> scala, hits -> 120)

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:
Step 1: Create the file in HDFS (we will do this using Hue). However, you can first create it in the local filesystem and then upload it to HDFS.
Step 2: Load the user.csv file from HDFS and create an RDD of lines.
val csv = sc.textFile("spark6/user.csv")
Step 3: split and clean data
val headerAndRows = csv.map(line => line.split(", ").map(_.trim))
Step 4: Get header row
val header = headerAndRows.first
Step 5: Filter out the header (we need to check whether the first value matches the first header name).
val data = headerAndRows.filter(_(0) != header(0))
Step 6: Zip the header with each row's values to build header -> value maps.
val maps = data.map(splits => header.zip(splits).toMap)
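To see what this produces, here is a quick illustration in spark-shell using the header from Step 4 and the split/trimmed first data row of user.csv:

val row = Array("Rahul", "scala", "120")  // first record after split(", ") and trim
header.zip(row).toMap
// Map[String,String] = Map(id -> Rahul, topic -> scala, hits -> 120)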

Step 7: Filter out the user "myself".
val result = maps.filter(map => map("id") != "myself")
Step 8: Save the output as a text file.
result.saveAsTextFile("spark6/result.txt")
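Putting the steps together, the whole solution can be pasted into spark-shell as one block; with the sample user.csv above, collecting the result should print something like the following (row order may vary with partitioning):

val csv = sc.textFile("spark6/user.csv")
val headerAndRows = csv.map(line => line.split(", ").map(_.trim))
val header = headerAndRows.first
val data = headerAndRows.filter(_(0) != header(0))
val maps = data.map(splits => header.zip(splits).toMap)
val result = maps.filter(map => map("id") != "myself")
result.collect.foreach(println)
// Map(id -> Rahul, topic -> scala, hits -> 120)
// Map(id -> Nikita, topic -> spark, hits -> 80)
// Map(id -> Mithun, topic -> spark, hits -> 1)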



Problem Scenario 39: You have been given two files.
spark16/file1.txt
1, 9, 5
2, 7, 4
3, 8, 3
spark16/file2.txt
1, g, h
2, i, j
3, k, l
Load these two files as Spark RDDs and join them to produce the below results:
(1, ((9, 5), (g, h)))
(2, ((7, 4), (i, j)))
(3, ((8, 3), (k, l)))
Also write a code snippet which will sum the second columns of the above joined results (5+4+3).

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:
Step 1: Create the files in HDFS using Hue.
Step 2: Create a pair RDD for each of the files.
val one = sc.textFile("spark16/file1.txt").map {
  _.split(", ", -1) match {
    case Array(a, b, c) => (a, (b, c))
  }
}
val two = sc.textFile("spark16/file2.txt").map {
  _.split(", ", -1) match {
    case Array(a, b, c) => (a, (b, c))
  }
}
Step 3: Join both the RDDs.
val joined = one.join(two)
Step 4: Sum the second column values.
val sum = joined.map {
  case (_, ((_, num2), (_, _))) => num2.toInt
}.reduce(_ + _)
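As a quick check in spark-shell with the two sample files (tuple order from collect may vary):

joined.collect.foreach(println)
// (1,((9,5),(g,h)))
// (2,((7,4),(i,j)))
// (3,((8,3),(k,l)))
println(sum) // 12, i.e. 5 + 4 + 3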





