Free CCA175 Exam Braindumps (page: 8)


Problem Scenario 40 : You have been given sample data as below in a file called spark15/file1.txt
3070811, 1963, 1096, , "US", "CA", , 1,
3022811, 1963, 1096, , "US", "CA", , 1, 56
3033811, 1963, 1096, , "US", "CA", , 1, 23

Below is the code snippet to process this file.
val field = sc.textFile("spark15/file1.txt")
val mapper = field.map(x => A)
mapper.map(x => x.map(x => {B})).collect
Please fill in A and B so it generates the final output below.
Array(Array(3070811, 1963, 1096, 0, "US", "CA", 0, 1, 0)
, Array(3022811, 1963, 1096, 0, "US", "CA", 0, 1, 56)
, Array(3033811, 1963, 1096, 0, "US", "CA", 0, 1, 23)
)

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
A) x.split(",", -1)
B) if (x.isEmpty) 0 else x
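Putting A and B together, a minimal end-to-end sketch for the spark-shell follows (the .trim calls are an addition to strip the space after each comma in the sample data; they are not part of the graded answer):

// Load the raw file; each row has nine comma-separated fields, some of them empty.
val field = sc.textFile("spark15/file1.txt")
// A: split on commas with limit -1 so trailing empty fields are kept.
val mapper = field.map(x => x.split(",", -1))
// B: replace every empty field with 0 and leave the rest unchanged.
mapper.map(x => x.map(x => if (x.trim.isEmpty) "0" else x.trim)).collect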



Problem Scenario 42 : You have been given a file (spark10/sales.txt) with the content given below.
spark10/sales.txt
Department, Designation, costToCompany, State
Sales, Trainee, 12000, UP
Sales, Lead, 32000, AP
Sales, Lead, 32000, LA
Sales, Lead, 32000, TN
Sales, Lead, 32000, AP
Sales, Lead, 32000, TN
Sales, Lead, 32000, LA
Sales, Lead, 32000, LA
Marketing, Associate, 18000, TN
Marketing, Associate, 18000, TN
HR, Manager, 58000, TN
Produce the output as a CSV, grouped by Department, Designation, and State, with additional columns for sum(costToCompany) and TotalEmployeeCount.
The result should look like:
Dept, Desg, state, empCount, totalCost
Sales, Lead, AP, 2, 64000
Sales, Lead, LA, 3, 96000
Sales, Lead, TN, 2, 64000

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create a file first using Hue in hdfs.
Step 2: Load the file as an RDD.
val rawlines = sc.textFile("spark10/sales.txt")
Step 3: Create a case class that can represent the columns.
case class Employee(dep: String, des: String, cost: Double, state: String)
Step 4: Split the data and create an RDD of Employee objects.
val employees = rawlines.map(_.split(", ")).map(row => Employee(row(0), row(1), row(2).toDouble, row(3)))

Step 5: Build the rows we need: all the group-by fields as the key, and (1, cost) per employee as the value.
val keyVals = employees.map(em => ((em.dep, em.des, em.state), (1, em.cost)))
Step 6: Aggregate the records using reduceByKey, summing the employee count and the total cost in one pass.
val results = keyVals.reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2)) // (count + count, cost + cost)
Step 7: Save the results in a text file as below.
results.repartition(1).saveAsTextFile("spark10/group.txt")
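Because the scenario asks for CSV output while saveAsTextFile on a tuple RDD writes Scala tuple syntax, a hedged end-to-end variant that also formats each result row as a comma-separated line follows (the header-skipping and the CSV formatting are additions to the original solution; costs will print as doubles, e.g. 64000.0):

case class Employee(dep: String, des: String, cost: Double, state: String)

val rawlines = sc.textFile("spark10/sales.txt")
// Drop the header row ("Department, Designation, costToCompany, State") before parsing.
val header = rawlines.first()
val employees = rawlines.filter(_ != header)
  .map(_.split(", "))
  .map(row => Employee(row(0), row(1), row(2).toDouble, row(3)))

// Key by (Department, Designation, State); value is (employeeCount, cost).
val results = employees
  .map(em => ((em.dep, em.des, em.state), (1, em.cost)))
  .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))

// Render each record as a CSV line: Dept, Desg, State, empCount, totalCost.
val csv = results.map { case ((dep, des, state), (count, cost)) =>
  s"$dep, $des, $state, $count, $cost"
}
csv.repartition(1).saveAsTextFile("spark10/group.csv")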



Problem Scenario 37 : ABCTECH.com has conducted a survey on their Exam Products feedback using a web-based form, with the following free-text fields as input in the web UI.
Name: String
Subscription Date: String
Rating : String
The survey data has been saved in a file called spark9/feedback.txt
Christopher|Jan 11, 2015|5
Kapil|11 Jan, 2015|5
Thomas|6/17/2014|5
John|22-08-2013|5
Mithun|2013|5
Jitendra||5
Write a Spark program using regular expressions that filters all the valid dates and saves the results in two separate files (good records and bad records).

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create a file first using Hue in hdfs.
Step 2: Write regular expressions to check whether each record has a valid date.
val reg1 = """(\d+)\s(\w{3})(,)\s(\d{4})""".r // 11 Jan, 2015
val reg2 = """(\d+)(/)(\d+)(/)(\d{4})""".r // 6/17/2014
val reg3 = """(\d+)(-)(\d+)(-)(\d{4})""".r // 22-08-2013
val reg4 = """(\w{3})\s(\d+)(,)\s(\d{4})""".r // Jan 11, 2015
Step 3: Load the file as an RDD.
val feedbackRDD = sc.textFile("spark9/feedback.txt")
Step 4: As the data is pipe-separated, split on the pipe.
val feedbackSplit = feedbackRDD.map(line => line.split('|'))
Step 5: Now separate the valid records from the bad records.
val validRecords = feedbackSplit.filter(x =>
(reg1.pattern.matcher(x(1).trim).matches | reg2.pattern.matcher(x(1).trim).matches | reg3.pattern.matcher(x(1).trim).matches | reg4.pattern.matcher(x(1).trim).matches))
val badRecords = feedbackSplit.filter(x =>
!(reg1.pattern.matcher(x(1).trim).matches | reg2.pattern.matcher(x(1).trim).matches | reg3.pattern.matcher(x(1).trim).matches | reg4.pattern.matcher(x(1).trim).matches))
Step 6: Now convert each Array to a tuple of Strings.
val valid = validRecords.map(e => (e(0), e(1), e(2)))
val bad = badRecords.map(e => (e(0), e(1), e(2)))
Step 7: Save the output as text files; each output must be written as a single file.
valid.repartition(1).saveAsTextFile("spark9/good.txt")
bad.repartition(1).saveAsTextFile("spark9/bad.txt")
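As a quick sanity check before running the full job, each pattern can be tested against a single value in the spark-shell (a minimal sketch; the sample dates come from the feedback file above):

val reg3 = """(\d+)(-)(\d+)(-)(\d{4})""".r
reg3.pattern.matcher("22-08-2013".trim).matches // true -> good record
reg3.pattern.matcher("2013".trim).matches // false -> bad record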



Problem Scenario 25 : You have been given the below comma-separated employee information, which needs to be added to the /home/cloudera/flumetest/in.txt file (for a tail source).
sex, name, city
1, alok, mumbai
1, jatin, chennai
1, yogesh, kolkata
2, ragini, delhi
2, jyotsana, pune
1, valmiki, banglore
Create a flume conf file using the fastest non-durable channel, which writes data into the hive warehouse directory, into two separate tables called flumemaleemployee1 and flumefemaleemployee1
(create the hive tables for the given data as well). Please use a tail source with the /home/cloudera/flumetest/in.txt file.
flumemaleemployee1: will contain only male employees' data
flumefemaleemployee1: will contain only female employees' data

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
Step 1: Create hive tables for flumemaleemployee1 and flumefemaleemployee1.
CREATE TABLE flumemaleemployee1
(
sex_type int, name string, city string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ', ';
CREATE TABLE flumefemaleemployee1
(
sex_type int, name string, city string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ', ';
Step 2: Create the below directory.
mkdir /home/cloudera/flumetest/
cd /home/cloudera/flumetest/
Step 3: Create a flume configuration file with the below settings for the source, sinks and channels, and save it as flume5.conf.
agent.sources = tailsrc
agent.channels = mem1 mem2
agent.sinks = std1 std2
agent.sources.tailsrc.type = exec
agent.sources.tailsrc.command = tail -F /home/cloudera/flumetest/in.txt
agent.sources.tailsrc.batchSize = 1
agent.sources.tailsrc.interceptors = i1
agent.sources.tailsrc.interceptors.i1.type = regex_extractor
agent.sources.tailsrc.interceptors.i1.regex = ^(\\d)
agent.sources.tailsrc.interceptors.i1.serializers = t1
agent.sources.tailsrc.interceptors.i1.serializers.t1.name = type
agent.sources.tailsrc.selector.type = multiplexing
agent.sources.tailsrc.selector.header = type
agent.sources.tailsrc.selector.mapping.1 = mem1
agent.sources.tailsrc.selector.mapping.2 = mem2
agent.sinks.std1.type = hdfs
agent.sinks.std1.channel = mem1
agent.sinks.std1.batchSize = 1
agent.sinks.std1.hdfs.path = /user/hive/warehouse/flumemaleemployee1
agent.sinks.std1.hdfs.rollInterval = 0
agent.sinks.std1.hdfs.fileType = DataStream
agent.sinks.std2.type = hdfs
agent.sinks.std2.channel = mem2
agent.sinks.std2.batchSize = 1
agent.sinks.std2.hdfs.path = /user/hive/warehouse/flumefemaleemployee1
agent.sinks.std2.hdfs.rollInterval = 0
agent.sinks.std2.hdfs.fileType = DataStream
agent.channels.mem1.type = memory
agent.channels.mem1.capacity = 100
agent.channels.mem2.type = memory
agent.channels.mem2.capacity = 100
agent.sources.tailsrc.channels = mem1 mem2
Step 4: Run the below command, which will use this configuration file and append data into HDFS.
Start flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume5.conf --name agent
Step 5: Open another terminal create a file at /home/cloudera/flumetest/in.txt.
Step 6: Enter below data in file and save it.
1, alok, mumbai
1, jatin, chennai
1, yogesh, kolkata
2, ragini, delhi
2, jyotsana, pune
1, valmiki, banglore
Step 7: Open Hue and check whether the data is available in the hive tables.
Step 8: Stop flume service by pressing ctrl+c
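If you prefer the terminal over Hue for step 7, the same check can be run with the hive CLI (a hedged alternative; assumes the hive shell is available on the host):

hive -e "SELECT * FROM flumemaleemployee1;"
hive -e "SELECT * FROM flumefemaleemployee1;"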





