Free CCA175 Exam Braindumps


Problem Scenario 3: You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.categories
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following activities.

1. Import data from the categories table, where category_id=22 (Data should be stored in categories_subset)
2. Import data from the categories table, where category_id>22 (Data should be stored in categories_subset_2)
3. Import data from the categories table, where category_id is between 1 and 22 (Data should be stored in categories_subset_3)
4. While importing the categories data, change the delimiter to '|' (Data should be stored in categories_subset_6)
5. Import data from the categories table, restricting the import to the category_name and category_id columns only, with '|' as the delimiter
6. Add null values to the table using the SQL statements below: ALTER TABLE categories MODIFY category_department_id int(11); INSERT INTO categories VALUES (60, NULL, 'TESTING');
7. Import data from the categories table (into the categories_subset_17 directory) using the '|' delimiter and category_id between 1 and 61, and encode null values for both string and non-string columns.
8. Import the entire retail_db schema into the directory categories_subset_all_tables

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:

Step 1: Import a single table (subset of the data). Note: the ` character used below is the backquote, found on the same key as ~.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=categories --warehouse-dir=categories_subset --where "\`category_id\`=22" -m 1
Step 2: Check the output partition
hdfs dfs -cat categories_subset/categories/part-m-00000
Step 3: Change the selection criteria (subset of the data).
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=categories --warehouse-dir=categories_subset_2 --where "\`category_id\` > 22" -m 1
Step 4: Check the output partition
hdfs dfs -cat categories_subset_2/categories/part-m-00000
Step 5: Use between clause (Subset data)
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=categories --warehouse-dir=categories_subset_3 --where "\`category_id\` between 1 and 22" -m 1
Step 6: Check the output partition
hdfs dfs -cat categories_subset_3/categories/part-m-00000
Step 7: Changing the delimiter during import.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=categories --warehouse-dir=categories_subset_6 --where "\`category_id\` between 1 and 22" --fields-terminated-by='|' -m 1
Step 8: Check the output partition
hdfs dfs -cat categories_subset_6/categories/part-m-00000
Step 9: Selecting subset columns
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=categories --warehouse-dir=categories_subset_col --where "\`category_id\` between 1 and 22" --fields-terminated-by='|' --columns=category_name,category_id -m 1
Step 10: Check the output partition
hdfs dfs -cat categories_subset_col/categories/part-m-00000
Step 11: Insert a record with null values (using mysql).
ALTER TABLE categories MODIFY category_department_id int(11);
INSERT INTO categories VALUES (60, NULL, 'TESTING');
SELECT * FROM categories;
Step 12: Encode null values for both string and non-string columns.
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=categories --warehouse-dir=categories_subset_17 --where "\`category_id\` between 1 and 61" --fields-terminated-by='|' --null-string='N' --null-non-string='N' -m 1
Step 13: View the content
hdfs dfs -cat categories_subset_17/categories/part-m-00000
Step 14: Import all the tables from a schema (this step will take a little time).
sqoop import-all-tables --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --warehouse-dir=categories_subset_all_tables
Step 15: View the contents
hdfs dfs -ls categories_subset_all_tables
Step 16: Clean up, or revert back to the originals.
DELETE FROM categories WHERE category_id IN (59, 60);
ALTER TABLE categories MODIFY category_department_id int(11) NOT NULL;
ALTER TABLE categories MODIFY category_name varchar(45) NOT NULL;
DESC categories;
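As an optional cross-check (not part of the asked solution), the imported files can also be inspected from spark-shell; this is a minimal sketch that assumes the warehouse directories created above sit in your HDFS home directory:

// read the '|'-delimited, null-encoded import produced in Step 12
val cats = sc.textFile("categories_subset_17/categories")
cats.take(5).foreach(println)     // lines should look like 1|2|Football, with nulls written as N
cats.map(_.split('|')).first()    // split on the literal '|' character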



Problem Scenario GG: You have been given the below code snippet.
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("ant", "falcon", "squid"), 2)
val d = c.keyBy(_.length)
operation1

Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, String)] = Array((4,lion))

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :
b.subtractByKey(d).collect
subtractByKey [Pair]: Very similar to subtract, but instead of supplying a function, the key component of each pair is automatically used as the criterion for removing items from the first RDD.
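As a quick illustration (a self-contained spark-shell sketch that simply repeats the snippet from the question so the variable names match), subtractByKey keeps only those pairs of b whose key does not occur in d:

val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "spider", "eagle"), 2)
val b = a.keyBy(_.length)   // (3,dog), (5,tiger), (4,lion), (3,cat), (6,spider), (5,eagle)
val c = sc.parallelize(List("ant", "falcon", "squid"), 2)
val d = c.keyBy(_.length)   // (3,ant), (6,falcon), (5,squid)
b.subtractByKey(d).collect  // keys 3, 5 and 6 also occur in d, so only Array((4,lion)) remains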



Problem Scenario 89: You have been given the below patient data in CSV format:
patientID, name, dateOfBirth, lastVisitDate
1001, Ah Teck, 1991-12-31, 2012-01-20
1002, Kumar, 2011-10-29, 2012-09-20
1003, Ali, 2011-01-30, 2012-10-21
Accomplish the following activities.

1. Find all the patients whose lastVisitDate is between the current time and '2012-09-15'
2. Find all the patients who were born in 2011
3. Find the age of all the patients
4. List patients whose last visit was more than 60 days ago
5. Select patients who are 18 years old or younger

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution :

Step 1:
hdfs dfs -mkdir sparksql3
hdfs dfs -put patients.csv sparksql3/
Step 2: Now in spark shell
// SQLContext entry point for working with structured data
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
// this is used to implicitly convert an RDD to a DataFrame
import sqlContext.implicits._
// Import Spark SQL data types and Row
import org.apache.spark.sql._
// load the data into a new RDD
val patients = sc.textFile("sparksql3/patients.csv")
// Return the first element in this RDD
patients.first()
// define the schema using a case class
case class Patient(patientid: Integer, name: String, dateOfBirth: String, lastVisitDate: String)
// create an RDD of Patient objects
val patRDD = patients.map(_.split(",")).map(p => Patient(p(0).toInt, p(1), p(2), p(3)))
patRDD.first()
patRDD.count()
// change the RDD of Patient objects to a DataFrame
val patDF = patRDD.toDF()
// register the DataFrame as a temp table
patDF.registerTempTable("patients")
// Select data from the table
val results = sqlContext.sql("""SELECT * FROM patients""")
// display the dataframe in a tabular format
results.show()
// Find all the patients whose lastVisitDate is between the current time and '2012-09-15'
val results = sqlContext.sql("""SELECT * FROM patients WHERE TO_DATE(CAST(UNIX_TIMESTAMP(lastVisitDate, 'yyyy-MM-dd') AS TIMESTAMP)) BETWEEN '2012-09-15' AND current_timestamp() ORDER BY lastVisitDate""")
results.show()
// Find all the patients who were born in 2011
val results = sqlContext.sql("""SELECT * FROM patients WHERE YEAR(TO_DATE(CAST(UNIX_TIMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TIMESTAMP))) = 2011""")
results.show()
// Find the age of all the patients
val results = sqlContext.sql("""SELECT name, dateOfBirth, datediff(current_date(), TO_DATE(CAST(UNIX_TIMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TIMESTAMP)))/365 AS age FROM patients""")
results.show()
// List patients whose last visit was more than 60 days ago
val results = sqlContext.sql("""SELECT name, lastVisitDate FROM patients WHERE datediff(current_date(), TO_DATE(CAST(UNIX_TIMESTAMP(lastVisitDate, 'yyyy-MM-dd') AS TIMESTAMP))) > 60""")
results.show()
// Select patients who are 18 years old or younger
// (MySQL would allow DATE_SUB(current_date(), INTERVAL 18 YEAR); in Spark SQL, DATE_SUB takes a number of days, hence the 18*365 approximation.)
val results = sqlContext.sql("""SELECT * FROM patients WHERE TO_DATE(CAST(UNIX_TIMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TIMESTAMP)) > DATE_SUB(current_date(), 18*365)""")
results.show()
val results = sqlContext.sql("""SELECT DATE_SUB(current_date(), 18*365) FROM patients""")
results.show()
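The same filters can also be written with the DataFrame API instead of SQL. Here is a minimal sketch for the "last visit more than 60 days ago" query, assuming the patDF DataFrame defined above and the Spark 1.5+ functions package:

import org.apache.spark.sql.functions._
// parse lastVisitDate the same way the SQL version does
val lastVisit = to_date(unix_timestamp(col("lastVisitDate"), "yyyy-MM-dd").cast("timestamp"))
// keep rows where the last visit was more than 60 days before today
patDF.filter(datediff(current_date(), lastVisit) > 60).select("name", "lastVisitDate").show()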



Problem Scenario 95: You have to run your Spark application on YARN with each executor's maximum heap size set to 512MB, one processor core allocated to each executor, and your main application requiring three input arguments: V1 V2 V3.

Please replace XXX, YYY, ZZZ
./bin/spark-submit --class com.hadoopexam.MyTask --master yarn-cluster --num-executors 3 --driver-memory 512m XXX YYY lib/hadoopexam.jar ZZZ

  1. See the explanation for Step by Step Solution and configuration.

Answer(s): A

Explanation:

Solution:

XXX: --executor-memory 512m
YYY: --executor-cores 1
ZZZ: V1 V2 V3
With these substitutions, the complete command becomes:
./bin/spark-submit --class com.hadoopexam.MyTask --master yarn-cluster --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/hadoopexam.jar V1 V2 V3
Notes: spark-submit options on YARN
--archives          Comma-separated list of archives to be extracted into the working directory of each executor. The path must be globally visible inside your cluster; see Advanced Dependency Management.
--executor-cores    Number of processor cores to allocate on each executor. Alternatively, you can use the spark.executor.cores property.
--executor-memory   Maximum heap size to allocate to each executor. Alternatively, you can use the spark.executor.memory property.
--num-executors     Total number of YARN containers to allocate for this application. Alternatively, you can use the spark.executor.instances property.
--queue             YARN queue to submit to. For more information, see Assigning Applications and Queries to Resource Pools. Default: default.
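As the notes mention, the same executor settings can be supplied through spark.executor.* properties instead of command-line flags. A minimal sketch of setting them programmatically (the app name simply reuses MyTask from the question):

val conf = new org.apache.spark.SparkConf()
  .setAppName("MyTask")
  .set("spark.executor.memory", "512m")    // equivalent to --executor-memory 512m
  .set("spark.executor.cores", "1")        // equivalent to --executor-cores 1
  .set("spark.executor.instances", "3")    // equivalent to --num-executors 3
val sc = new org.apache.spark.SparkContext(conf)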





