Problem Scenario 89 : You have been given below patient data in csv format,
patientID, name, dateOfBirth, lastVisitDate
1001, Ah Teck, 1991-12-31, 2012-01-20
1002, Kumar, 2011-10-29, 2012-09-20
1003, Ali, 2011-01-30, 2012-10-21
Accomplish following activities.
1. Find all the patients whose lastVisitDate between current time and '2012-09-15'
2. Find all the patients who born in 2011
3. Find all the patients age
4. List patients whose last visited more than 60 days ago
5. Select patients 18 years old or younger
- See the explanation for Step by Step Solution and configuration.
Answer(s): A
Explanation:
Solution :
Step 1:
hdfs dfs -mkdir sparksql3
hdfs dfs -put patients.csv sparksql3/
Step 2: Now in spark shell
// SQLContext entry point for working with structured data val sqlContext = neworg.apache.spark.sql.SQLContext(sc) // this is used to implicitly convert an RDD to a DataFrame.
import sqlContext.impIicits._
// Import Spark SQL data types and Row.
import org.apache.spark.sql._
// load the data into a new RDD
val patients = sc.textFilef'sparksqIS/patients.csv") // Return the first element in this RDD
patients.first()
//define the schema using a case class
case class Patient(patientid: Integer, name: String, dateOfBirth:String , lastVisitDate:
String)
// create an RDD of Product objects
val patRDD = patients.map(_.split(M, M)).map(p => Patient(p(0).tolnt, p(1), p(2), p(3))) patRDD.first()
patRDD.count(}
// change RDD of Product objects to a DataFrame val patDF = patRDD.toDF() // register the DataFrame as a temp table patDF.registerTempTable("patients"} // Select data from table
val results = sqlContext.sql(......SELECT* FROM patients '.....) // display dataframe in a tabular format
results.show()
//Find all the patients whose lastVisitDate between current time and '2012-09-15' val results = sqlContext.sql(......SELECT * FROM patients WHERE TO_DATE(CAST(UNIX_TIMESTAMP(lastVisitDate, 'yyyy-MM-dd') AS TIMESTAMP)) BETWEEN '2012-09-15' AND current_timestamp() ORDER BY lastVisitDate......) results.showQ
/.Find all the patients who born in 2011
val results = sqlContext.sql(......SELECT * FROM patients WHERE YEAR(TO_DATE(CAST(UNIXJTlMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS
TIMESTAMP))) = 2011 ......)
results. show()
//Find all the patients age
val results = sqlContext.sql(......SELECT name, dateOfBirth, datediff(current_date(), TO_DATE(CAST(UNIX_TIMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TlMESTAMP}}}/365 AS age
FROM patients
Mini >
results.show()
//List patients whose last visited more than 60 days ago
-- List patients whose last visited more than 60 days ago val results = sqlContext.sql(......SELECT name, lastVisitDate FROM patients WHERE datediff(current_date(), TO_DATE(CAST(UNIX_TIMESTAMP[lastVisitDate, 'yyyy-MM-dd') AS T1MESTAMP))) > 60......);
results. showQ;
-- Select patients 18 years old or younger
SELECT' FROM patients WHERE TO_DATE(CAST(UNIXJTlMESTAMP(dateOfBirth, 'yyyy-MM-dd') AS TIMESTAMP}) > DATE_SUB(current_date(), INTERVAL 18 YEAR); val results = sqlContext.sql(......SELECT' FROM patients WHERE TO_DATE(CAST(UNIX_TIMESTAMP(dateOfBirth, 'yyyy-MM--dd') AS TIMESTAMP)) > DATE_SUB(current_date(), T8*365)......);
results. showQ;
val results = sqlContext.sql(......SELECT DATE_SUB(current_date(), 18*365) FROM patients......);
results.show();
Reveal Solution
Next Question