Free CCA175 Exam Braindumps (page: 7)


Problem Scenario 17 : You have been given the following mysql database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the below assignment.

1. Create a table in hive as below:
create table departments_hive01(department_id int, department_name string, avg_salary int);
2. Create another table in mysql using below statement:
CREATE TABLE IF NOT EXISTS departments_hive01(id int, department_name varchar(45), avg_salary int);
3. Copy all the data from departments table to departments_hive01 using:
insert into departments_hive01 select a.*, null from departments a;
Also insert the following records:
insert into departments_hive01 values(777, "Not known", 1000);
insert into departments_hive01 values(8888, null, 1000);
insert into departments_hive01 values(666, null, 1100);
4. Now import data from mysql table departments_hive01 to this hive table. Please make
sure that the data is visible using the below hive command. Also, while importing, if a null
value is found for the department_name column, replace it with "" (empty string), and for the id column
replace it with -999.
select * from departments_hive01;

A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create hive table as below.
hive
show tables;
create table departments_hive01(department_id int, department_name string, avg_salary int);
Step 2: Create table in mysql db as well.
mysql --user=retail_dba --password=cloudera
use retail_db;
CREATE TABLE IF NOT EXISTS departments_hive01(id int, department_name varchar(45), avg_salary int);
show tables;

Step 3: Insert data in mysql table.
insert into departments_hive01 select a.*, null from departments a;
Check the data inserted:
select * from departments_hive01;
Now insert the null records as given in the problem.
insert into departments_hive01 values(777, "Not known", 1000);
insert into departments_hive01 values(8888, null, 1000);
insert into departments_hive01 values(666, null, 1100);
Step 4: Now import data in hive as per requirement.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username=retail_dba \
--password=cloudera \
--table departments_hive01 \
--hive-home /user/hive/warehouse \
--hive-import \
--hive-overwrite \
--hive-table departments_hive01 \
--fields-terminated-by '\001' \
--null-string '' \
--null-non-string -999 \
--split-by id \
-m 1
Step 5: Check the data in directory.
hdfs dfs -ls /user/hive/warehouse/departments_hive01
hdfs dfs -cat /user/hive/warehouse/departments_hive01/part*
Check data in hive table.
select * from departments_hive01;
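As a quick sanity check of the --null-string and --null-non-string substitutions, a short pyspark sketch along the following lines can be used; the warehouse path and the '\001' delimiter come from the sqoop command above, while the variable names are illustrative:
rows = sc.textFile("/user/hive/warehouse/departments_hive01")
# split on the '\001' field delimiter set during import
parsed = rows.map(lambda line: line.split('\001'))
# id nulls should now show up as -999 and department_name nulls as empty strings
substituted = parsed.filter(lambda f: f[0] == '-999' or f[1] == '')
for f in substituted.collect():
    print(f)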



Problem Scenario 75 : You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following activities.

1. Copy "retail_db.order_items" table to hdfs in respective directory p90_order_items.
2. Do the summation of entire revenue in this table using pyspark.
3. Find the maximum and minimum revenue as well.
4. Calculate average revenue.
Columns of order_items table : (order_item_id, order_item_order_id,
order_item_product_id, order_item_quantity, order_item_subtotal,
order_item_product_price)

A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Import single table.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=order_items --target-dir=p90_order_items -m 1
Note: Please make sure there is no space before or after the '=' sign. Sqoop uses the
MapReduce framework to copy data from RDBMS to hdfs.
Step 2: Read the data from one of the partitions, created using above command.
hadoop fs -cat p90_order_items/part-m-00000
Step 3: In pyspark, get the total revenue across all days and orders.
entireTableRDD = sc.textFile("p90_order_items")
# Cast string to float
extractedRevenueColumn = entireTableRDD.map(lambda line: float(line.split(",")[4]))
Step 4: Verify extracted data
for revenue in extractedRevenueColumn.collect():
    print(revenue)
# use reduce function to sum a single column value
totalRevenue = extractedRevenueColumn.reduce(lambda a, b: a + b)
Step 5: Calculate the maximum revenue
maximumRevenue = extractedRevenueColumn.reduce(lambda a, b: (a if a>=b else b))
Step 6: Calculate the minimum revenue
minimumRevenue = extractedRevenueColumn.reduce(lambda a, b: (a if a<=b else b))
Step 7: Calculate average revenue.
count=extractedRevenueColumn.count()
averageRev=totalRevenue/count
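As a side note (not part of the recorded solution), the same four statistics can be computed in a single pass with RDD.aggregate(); the accumulator layout and the names zero, seq, and comb below are illustrative assumptions:
# accumulator holds (sum, max, min, count) so one action covers all four stats
zero = (0.0, float("-inf"), float("inf"), 0)
def seq(acc, rev):
    s, mx, mn, c = acc
    return (s + rev, max(mx, rev), min(mn, rev), c + 1)
def comb(a, b):
    return (a[0] + b[0], max(a[1], b[1]), min(a[2], b[2]), a[3] + b[3])
totalRevenue, maximumRevenue, minimumRevenue, count = extractedRevenueColumn.aggregate(zero, seq, comb)
averageRev = totalRevenue / count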



Problem Scenario 69 : Write a Spark application using Python,
which reads a file "Content.txt" (on hdfs) with the following content,
filters out the words which are less than 2 characters, and ignores all empty lines.
Once done, store the filtered data in a directory called "problem84" (on hdfs).
Content.txt

Hello this is ABCTECH.com
This is ABYTECH.com
Apache Spark Training
This is Spark Learning Session
Spark is faster than MapReduce

A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create an application with following code and store it in problem84.py
# Import SparkContext and SparkConf
from pyspark import SparkContext, SparkConf
# Create configuration object and set App name
conf = SparkConf().setAppName("CCA 175 Problem 84")
sc = SparkContext(conf=conf)
# load data from hdfs
contentRDD = sc.textFile("Content.txt")
# filter out empty lines
nonempty_lines = contentRDD.filter(lambda x: len(x) > 0)
# split each line on spaces
words = nonempty_lines.flatMap(lambda x: x.split(' '))
# keep only words longer than 2 characters
finalRDD = words.filter(lambda x: len(x) > 2)
for word in finalRDD.collect():
    print(word)
# Save final data
finalRDD.saveAsTextFile("problem84")

Step 2: Submit this application
spark-submit --master yarn problem84.py
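For reference, the same pipeline can also be run as one chained expression from the pyspark shell; this is an illustrative alternative to the script above, not an addition to it (writing to the same "problem84" output directory twice would fail):
sc.textFile("Content.txt") \
    .filter(lambda x: len(x) > 0) \
    .flatMap(lambda x: x.split(' ')) \
    .filter(lambda w: len(w) > 2) \
    .saveAsTextFile("problem84")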



Problem Scenario 80 : You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.products
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of products table : (product_id | product_category_id | product_name | product_description | product_price | product_image )
Please accomplish the following activities.

1. Copy "retail_db.products" table to hdfs in a directory p93_products
2. Now sort the products data by product price per category; use the product_category_id
column to group by category

A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Import single table.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=products --target-dir=p93_products
Note: Please make sure there is no space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from RDBMS to hdfs.
Step 2: Read the data from one of the partitions, created using above command.
hadoop fs -cat p93_products/part-m-00000
Step 3: Load this directory as RDD using Spark and Python (open pyspark terminal and do following).
productsRDD = sc.textFile("p93_products")
Step 4: Filter empty prices, if they exist.
# filter out lines with empty prices
nonempty_lines = productsRDD.filter(lambda x: len(x.split(",")[4]) > 0)
Step 5: Create a data set like (categoryId, (id, name, price)).
mappedRDD = nonempty_lines.map(lambda line: (line.split(",")[1], (line.split(",")[0], line.split(",")[2], float(line.split(",")[4]))))
for line in mappedRDD.collect():
    print(line)
Step 6: Now groupBy all the records based on categoryId, which is the key on mappedRDD; it will produce output like (categoryId, iterable of all lines for a key/categoryId).
groupByCategoryId = mappedRDD.groupByKey()
for line in groupByCategoryId.collect():
    print(line)

Step 7: Now sort the data in each category based on price in ascending order.
# sorted is a function to sort an iterable; we can also specify the key on which to sort, in this case the price.
groupByCategoryId.map(lambda t: sorted(t[1], key=lambda v: v[2])).take(5)
Step 8: Now sort the data in each category based on price in descending order.
groupByCategoryId.map(lambda t: sorted(t[1], key=lambda v: v[2], reverse=True)).take(5)
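As a readability tweak (an illustrative variant, not the recorded solution), mapValues keeps the category key attached to each sorted list, so the printed pairs stay self-describing; sortedByPrice is a made-up name:
sortedByPrice = groupByCategoryId.mapValues(lambda values: sorted(values, key=lambda v: v[2]))
for categoryId, items in sortedByPrice.take(5):
    print(categoryId, items)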


