Free CCA175 Exam Braindumps (page: 4)


Problem Scenario 58 : You have been given the below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2) val b = a.keyBy(_.length)
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, Seq[String])] = Array((4, ArrayBuffer(lion)), (6, ArrayBuffer(spider)), (3, ArrayBuffer(dog, cat)), (5, ArrayBuffer(tiger, eagle)))

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
b.groupByKey.collect
groupByKey [Pair]
Very similar to groupBy, but instead of supplying a function, the key-component of each pair will automatically be presented to the partitioner.
Listing Variants
def groupByKey(): RDD[(K, Iterable[V])]
def groupByKey(numPartitions: Int): RDD[(K, Iterable[V])]
def groupByKey(partitioner: Partitioner): RDD[(K, Iterable[V])]
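For reference, the complete pipeline can be pasted into spark-shell (where sc is predefined); a minimal sketch. Note that, depending on the Spark version, the grouped values may print as ArrayBuffer (older releases) or CompactBuffer (newer releases):

// Build the pair RDD exactly as in the problem, then group all values per key
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
val b = a.keyBy(_.length)          // pairs such as (3,dog), (5,tiger), (4,lion), ...
val grouped = b.groupByKey.collect // operation1
grouped.foreach(println)
// (4,ArrayBuffer(lion)), (6,ArrayBuffer(spider)), (3,ArrayBuffer(dog, cat)), (5,ArrayBuffer(tiger, eagle))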



Problem Scenario 63 : You have been given the below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))
operation1

Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, String)] = Array((4, lion), (3, dogcat), (7, panther), (5, tigereagle))

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
b.reduceByKey(_ + _).collect
reduceByKey [Pair] : This function provides the well-known reduce functionality in Spark. Please note that any function f you provide should be associative and commutative in order to generate reproducible results.
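For reference, the complete snippet can be pasted into spark-shell (where sc is predefined); a minimal sketch. String concatenation is associative but not commutative, so the concatenation order follows the partition layout; with this fixed input and two partitions it reproduces the output shown above:

// Build the pair RDD from the problem, then concatenate values per key
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.map(x => (x.length, x))          // pairs such as (3,dog), (5,tiger), ...
val reduced = b.reduceByKey(_ + _).collect // operation1
reduced.foreach(println)
// (4,lion), (3,dogcat), (7,panther), (5,tigereagle)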



Problem Scenario 71 :
Write a Spark script using Python which reads a file "Content.txt" (on HDFS) with the following content.
Split each row into (key, value), where the key is the first word in the line and the value is the entire line.
Filter out the empty lines.
Save these key-value pairs in "problem86" as a sequence file (on HDFS).
Part 2 : Save as a sequence file where the key is null and the entire line is the value, then read back the stored sequence files.
Content.txt
Hello this is ABCTECH.com
This is XYZTECH.com
Apache Spark Training
This is Spark Learning Session

Spark is faster than MapReduce

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1:
# Import SparkContext and SparkConf
from pyspark import SparkContext, SparkConf

Step 2:
# Load data from HDFS
contentRDD = sc.textFile("Content.txt")

Step 3:
# Filter out the empty lines
nonempty_lines = contentRDD.filter(lambda x: len(x) > 0)

Step 4:
# Split each line on the first space (remember: the result must be converted to a tuple)
words = nonempty_lines.map(lambda x: tuple(x.split(' ', 1)))
words.saveAsSequenceFile("problem86")

Step 5: Check the contents of directory problem86.
hdfs dfs -cat problem86/part*
Step 6: Create key, value pair (where key is null)
nonempty_lines.map(lambda line: (None, line)).saveAsSequenceFile("problem86_1")
Step 7: Read back the sequence file data using Spark.
seqRDD = sc.sequenceFile("problem86_1")
Step 8: Print the content to validate the same.
for line in seqRDD.collect():
print(line)



Problem Scenario 12 : You have been given the following MySQL database details as well as other info.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following.

1. Create a table in retail_db with the following definition.
CREATE table departments_new (department_id int(11), department_name varchar(45),
created_date TIMESTAMP DEFAULT NOW());
2. Now insert records from the departments table into departments_new.
3. Now import data from the departments_new table to HDFS.
4. Insert the following 5 records into the departments_new table.
Insert into departments_new values(110, "Civil", null);
Insert into departments_new values(111, "Mechanical", null);
Insert into departments_new values(112, "Automobile", null);
Insert into departments_new values(113, "Pharma", null);
Insert into departments_new values(114, "Social Engineering", null);
5. Now do the incremental import based on created_date column.

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Log in to the MySQL database.
mysql --user=retail_dba --password=cloudera
show databases;
use retail_db;
show tables;
Step 2: Create a table as given in the problem statement.
CREATE table departments_new (department_id int(11), department_name varchar(45), created_date TIMESTAMP DEFAULT NOW());
show tables;
Step 3: Insert records from the departments table into departments_new.
insert into departments_new select a.*, null from departments a;
Step 4: Import data from the departments_new table to HDFS.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table departments_new \
--target-dir /user/cloudera/departments_new \
--split-by department_id
Step 5: Check the imported data.
hdfs dfs -cat /user/cloudera/departments_new/part*
Step 6: Insert the following 5 records into the departments_new table.
Insert into departments_new values(110, "Civil", null);
Insert into departments_new values(111, "Mechanical", null);
Insert into departments_new values(112, "Automobile", null);
Insert into departments_new values(113, "Pharma", null);
Insert into departments_new values(114, "Social Engineering", null);
commit;
Step 7: Import the incremental data based on the created_date column.
sqoop import \
--connect jdbc:mysql://quickstart:3306/retail_db \
--username retail_dba \
--password cloudera \
--table departments_new \
--target-dir /user/cloudera/departments_new \
--append \
--check-column created_date \
--incremental lastmodified \
--split-by department_id \
--last-value "2016-01-30 12:07:37.0"
Step 8: Check the imported data.
hdfs dfs -cat /user/cloudera/departments_new/part*





