Free CCA175 Exam Braindumps (page: 5)


Problem Scenario 24: You have been given the below comma-separated employee information.
Data Set:
name, salary, sex, age
alok, 100000, male, 29
jatin, 105000, male, 32
yogesh, 134000, male, 39
ragini, 112000, female, 35
jyotsana, 129000, female, 39
valmiki, 123000, male, 29
Requirements:
Use the netcat service on port 44444, and nc the above data line by line. Please do the following activities.

1. Create a Flume conf file using the fastest channel, which writes data into the Hive warehouse
directory, in a table called flumemaleemployee (create the Hive table as well for the given data).
2. While importing, make sure only male employee data is stored.

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Step 1: Create the Hive table flumemaleemployee.
CREATE TABLE flumemaleemployee
(
name string,
salary int,
sex string,
age int
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
Step 2: Create the Flume configuration file with the below configuration for source, sink and channel, and save it as flume4.conf.
# Define source, sink, channel and agent.
agent1.sources = source1
agent1.sinks = sink1
agent1.channels = channel1
# Describe/configure source1
agent1.sources.source1.type = netcat
agent1.sources.source1.bind = 127.0.0.1
agent1.sources.source1.port = 44444
# Define interceptors
agent1.sources.source1.interceptors = i1
agent1.sources.source1.interceptors.i1.type = regex_filter
agent1.sources.source1.interceptors.i1.regex = female
agent1.sources.source1.interceptors.i1.excludeEvents = true
## Describe sink1
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /user/hive/warehouse/flumemaleemployee
agent1.sinks.sink1.hdfs.writeFormat = Text
agent1.sinks.sink1.hdfs.fileType = DataStream
# Now we need to define channel1 property.
agent1.channels.channel1.type = memory
agent1.channels.channel1.capacity = 1000
agent1.channels.channel1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.source1.channels = channel1
agent1.sinks.sink1.channel = channel1
Step 3: Run the below command, which will use this configuration file and append data into HDFS.

Start the Flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume4.conf --name agent1
Step 4: Open another terminal and use the netcat service:
nc localhost 44444
Step 5: Enter data line by line.
alok, 100000, male, 29
jatin, 105000, male, 32
yogesh, 134000, male, 39
ragini, 112000, female, 35
jyotsana, 129000, female, 39
valmiki, 123000, male, 29
Step 6: Open Hue and check whether the data is available in the Hive table.
Step 7: Stop the Flume service by pressing Ctrl+C.
Step 8: Calculate the average salary on the Hive table using the below query. You can use either the Hive command line tool or Hue.
select avg(salary) from flumemaleemployee;
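As a cross-check, the same average can be computed in spark-shell directly from the files Flume wrote. This is a minimal sketch, assuming the warehouse path from the sink configuration above and the ", " delimiter used when the data was typed into nc:

// read the raw files written by the HDFS sink (path taken from flume4.conf)
val emp = sc.textFile("/user/hive/warehouse/flumemaleemployee")
// split each line on the assumed ", " delimiter and keep the salary column
val salaries = emp.map(_.split(", ")).map(r => r(1).trim.toDouble)
// mean() averages the numeric RDD, matching avg(salary) in Hive
println(salaries.mean())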



Problem Scenario 27: You need to implement a near-real-time solution for collecting information as it is submitted in files, with the below information.
Data
echo "IBM, 100, 20160104" >> /tmp/spooldir/bb/.bb.txt
echo "IBM, 103, 20160105" >> /tmp/spooldir/bb/.bb.txt
mv /tmp/spooldir/bb/.bb.txt /tmp/spooldir/bb/bb.txt
After a few minutes:
echo "IBM, 100.2, 20160104" >> /tmp/spooldir/dr/.dr.txt
echo "IBM, 103.1, 20160105" >> /tmp/spooldir/dr/.dr.txt
mv /tmp/spooldir/dr/.dr.txt /tmp/spooldir/dr/dr.txt
Requirements:
You have been given the below directory location (if not available, then create it): /tmp/spooldir. You have a financial subscription for getting stock prices from Bloomberg as well as
Reuters, and using ftp you download new files every hour from their respective ftp sites into the directories /tmp/spooldir/bb and /tmp/spooldir/dr respectively.
As soon as a file is committed in these directories, it needs to be available in HDFS in the /tmp/flume/finance location, in a single directory.
Write a Flume configuration file named flume7.conf and use it to load data into HDFS with the following additional properties.

1. Spool /tmp/spooldir/bb and /tmp/spooldir/dr
2. The file prefix in HDFS should be events
3. The file suffix should be .log
4. If a file is not committed and in use, then it should have _ as a prefix.
5. Data should be written as text to HDFS

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:
Step 1: Create the directories:
mkdir -p /tmp/spooldir/bb
mkdir -p /tmp/spooldir/dr
Step 2: Create the Flume configuration file with the below configuration for source, sink and channel, and save it as flume7.conf.
agent1.sources = source1 source2
agent1.sinks = sink1
agent1.channels = channel1
agent1.sources.source1.channels = channel1
agent1.sources.source2.channels = channel1
agent1.sinks.sink1.channel = channel1
agent1.sources.source1.type = spooldir
agent1.sources.source1.spoolDir = /tmp/spooldir/bb
agent1.sources.source2.type = spooldir
agent1.sources.source2.spoolDir = /tmp/spooldir/dr
agent1.sinks.sink1.type = hdfs
agent1.sinks.sink1.hdfs.path = /tmp/flume/finance
agent1.sinks.sink1.hdfs.filePrefix = events
agent1.sinks.sink1.hdfs.fileSuffix = .log
agent1.sinks.sink1.hdfs.inUsePrefix = _
agent1.sinks.sink1.hdfs.fileType = DataStream
agent1.channels.channel1.type = file
Step 3: Run the below command, which will use this configuration file and append data into HDFS.
Start the Flume service:
flume-ng agent --conf /home/cloudera/flumeconf --conf-file /home/cloudera/flumeconf/flume7.conf --name agent1
Step 4: Open another terminal and create the files in /tmp/spooldir/:
echo "IBM, 100, 20160104" >> /tmp/spooldir/bb/.bb.txt
echo "IBM, 103, 20160105" >> /tmp/spooldir/bb/.bb.txt
mv /tmp/spooldir/bb/.bb.txt /tmp/spooldir/bb/bb.txt
After a few minutes:
echo "IBM, 100.2, 20160104" >> /tmp/spooldir/dr/.dr.txt
echo "IBM, 103.1, 20160105" >> /tmp/spooldir/dr/.dr.txt
mv /tmp/spooldir/dr/.dr.txt /tmp/spooldir/dr/dr.txt



Problem Scenario 60: You have been given the below code snippet.
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee"), 3)
val d = c.keyBy(_.length)
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, (String, String))] = Array((6, (salmon, salmon)), (6, (salmon, rabbit)), (6, (salmon, turkey)), (6, (salmon, salmon)), (6, (salmon, rabbit)),
(6, (salmon, turkey)), (3, (dog, dog)), (3, (dog, cat)), (3, (dog, gnu)), (3, (dog, bee)), (3, (rat, dog)), (3, (rat, cat)), (3, (rat, gnu)), (3, (rat, bee)))

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:
b.join(d).collect
join [Pair]: Performs an inner join using two key-value RDDs. Please note that the keys must be generally comparable to make this work.
keyBy: Constructs two-component tuples (key-value pairs) by applying a function to each data item. The result of the function becomes the key, and the original data item becomes the value of the newly created tuples.
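To make the join semantics concrete, here is a minimal plain-Scala sketch of what keyBy followed by an inner join computes (no Spark needed; the val names mirror the snippet above):

// keyBy: pair each word with its length as the key
val b = List("dog", "salmon", "salmon", "rat", "elephant").map(w => (w.length, w))
val d = List("dog", "cat", "gnu", "salmon", "rabbit", "turkey", "wolf", "bear", "bee").map(w => (w.length, w))
// inner join: keep every combination of values whose keys match
val joined = for ((k1, v1) <- b; (k2, v2) <- d if k1 == k2) yield (k1, (v1, v2))
// joined contains (6,(salmon,salmon)), (6,(salmon,rabbit)), (3,(dog,dog)), ... as in the expected output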



Problem Scenario 90: You have been given the below two files.
course.txt
id, course
1, Hadoop
2, Spark
3, HBase
fee.txt
id, fee
2, 3900
3, 4200
4, 2900
Accomplish the following activities.

1. Select all the courses and their fees, whether the fee is listed or not.
2. Select all the available fees and the respective course. If the course does not exist, still list the fee.
3. Select all the courses and their fees, whether the fee is listed or not. However, ignore records having the fee as null.

  A. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:

Step 1:
hdfs dfs -mkdir sparksql4
hdfs dfs -put course.txt sparksql4/
hdfs dfs -put fee.txt sparksql4/
Step 2: Now in spark shell
// load the data into a new RDD
val course = sc.textFile("sparksql4/course.txt")
val fee = sc.textFile("sparksql4/fee.txt")
// Return the first element in this RDD
course.first()
fee.first()
// define the schema using a case class
case class Course(id: Integer, name: String)
case class Fee(id: Integer, fee: Integer)
// create an RDD of Product objects
// (if the files include the header row shown above, filter it out first,
// e.g. course.filter(!_.startsWith("id")))
val courseRDD = course.map(_.split(", ")).map(c => Course(c(0).toInt, c(1)))
val feeRDD = fee.map(_.split(", ")).map(c => Fee(c(0).toInt, c(1).toInt))
courseRDD.first()
courseRDD.count()
feeRDD.first()
feeRDD.count()
// change the RDD of Product objects to a DataFrame
val courseDF = courseRDD.toDF()
val feeDF = feeRDD.toDF()
// register the DataFrames as temp tables
courseDF.registerTempTable("course")
feeDF.registerTempTable("fee")
// Select data from the tables
val results = sqlContext.sql("""SELECT * FROM course""")
results.show()
val results = sqlContext.sql("""SELECT * FROM fee""")
results.show()
// 1. All courses and their fees, whether the fee is listed or not
val results = sqlContext.sql("""SELECT * FROM course LEFT JOIN fee ON course.id = fee.id""")
results.show()
// 2. All available fees and the respective course, even if the course does not exist
val results = sqlContext.sql("""SELECT * FROM course RIGHT JOIN fee ON course.id = fee.id""")
results.show()
// 3. All courses and their fees, ignoring records with a null fee
val results = sqlContext.sql("""SELECT * FROM course LEFT JOIN fee ON course.id = fee.id WHERE fee.id IS NOT NULL""")
results.show()
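The same three results can be produced with the DataFrame join API instead of SQL. A minimal sketch, assuming the courseDF and feeDF frames defined above (Spark 1.x, matching the registerTempTable style of this solution):

// 1. left outer join: all courses, fee columns may be null
courseDF.join(feeDF, courseDF("id") === feeDF("id"), "left_outer").show()
// 2. right outer join: all fees, course columns may be null
courseDF.join(feeDF, courseDF("id") === feeDF("id"), "right_outer").show()
// 3. inner join: only courses that actually have a fee
courseDF.join(feeDF, courseDF("id") === feeDF("id")).show()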





