QUESTION: 21 Exam Topic: 1, Main Questions Set A

Your company uses a proprietary system to send inventory data every 6 hours to a data ingestion service in the cloud. Transmitted data includes a payload of several fields and the timestamp of the transmission. If there are any concerns about a transmission, the system re-transmits the data.

How should you deduplicate the data most efficiently?
A. Assign global unique identifiers (GUID) to each data entry.
B. Compute the hash value of each data entry, and compare it with all historical data.
C. Store each data entry as the primary key in a separate database and apply an index.
D. Maintain a database table to store the hash value and other metadata for each data entry.

Answer(s): D
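A minimal sketch of option D in Python follows. It assumes that a re-transmitted entry carries the same payload fields and differs only in its transmission timestamp. The in-memory dict `dedup_table` is a hypothetical stand-in for the database table, and `payload_hash` and `ingest` are illustrative names. Each entry's payload, without the timestamp, is serialized canonically and hashed with SHA-256. As a result, a duplicate is detected with a single key lookup instead of a comparison against all historical data (option B).

```python
import hashlib
import json

# Hypothetical in-memory stand-in for the "database table" of option D:
# maps a payload hash to metadata about the first transmission seen.
dedup_table = {}


def payload_hash(entry: dict) -> str:
    """Hash only the payload fields (everything except "timestamp"), so a
    re-transmitted entry hashes to the same value as the original."""
    payload = {k: v for k, v in entry.items() if k != "timestamp"}
    # sort_keys makes the serialization independent of field order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ingest(entry: dict) -> bool:
    """Store the entry's hash and metadata if unseen.
    Returns True for a new entry, False for a duplicate."""
    h = payload_hash(entry)
    if h in dedup_table:
        return False
    dedup_table[h] = {"first_seen": entry["timestamp"]}
    return True
```

For example, calling `ingest` on an entry returns True. Calling it again on a copy whose only difference is a later `timestamp` returns False.

This sketch has one limitation. If an unchanged inventory snapshot can legitimately be sent again in a later 6-hour cycle, the hashed fields would need to include something that distinguishes cycles, such as a batch identifier (hypothetical here).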


QUESTION: 22 Exam Topic: 1, Main Questions Set A

Your company has hired a new data scientist who wants to perform complicated analyses across very large datasets stored in Google Cloud Storage and in a Cassandra cluster on Google Compute Engine. The scientist primarily wants to create labelled data sets for machine learning projects, along with some visualization tasks. She reports that her laptop is not powerful enough to perform her tasks and it is slowing her down. You want to help her perform her tasks.
What should you do?

A. Run a local version of Jupyter on the laptop.
B. Grant the user access to Google Cloud Shell.
C. Host a visualization tool on a VM on Google Compute Engine.
D. Deploy Google Cloud Datalab to a virtual machine (VM) on Google Compute Engine.

Answer(s): B


QUESTION: 23 Exam Topic: 1, Main Questions Set A

You are deploying 10,000 new Internet of Things devices to collect temperature data in your warehouses globally. You need to process, store and analyze these very large datasets in real time.
What should you do?

A. Send the data to Google Cloud Datastore and then export to BigQuery.
B. Send the data to Google Cloud Pub/Sub, stream Cloud Pub/Sub to Google Cloud Dataflow, and store the data in Google BigQuery.
C. Send the data to Cloud Storage and then spin up an Apache Hadoop cluster as needed in Google Cloud Dataproc whenever analysis is required.
D. Export logs in batch to Google Cloud Storage and then spin up a Google Cloud SQL instance, import the data from Cloud Storage, and run an analysis as needed.

Answer(s): B
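The processing stage of option B typically groups the incoming stream into time windows before results are written to BigQuery. The plain-Python sketch below illustrates only that idea, using fixed (tumbling) event-time windows. It is not the Apache Beam or Dataflow API. The `Reading` record, its field names, and the 60-second window are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    device_id: str   # hypothetical device identifier
    event_time: int  # assumed: seconds since epoch when the reading was taken
    celsius: float


def window_averages(readings, window_seconds=60):
    """Assign each reading to a fixed (tumbling) event-time window and
    return {(device_id, window_start): average_celsius}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for r in readings:
        # Start of the window that contains this reading's event time.
        window_start = r.event_time - (r.event_time % window_seconds)
        key = (r.device_id, window_start)
        sums[key] += r.celsius
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

In a real Dataflow pipeline, the runner manages windowing and scaling across all devices. This sketch only shows what a per-device, per-window average computes.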


QUESTION: 24 Exam Topic: 1, Main Questions Set A

You have spent a few days loading data from comma-separated values (CSV) files into the Google BigQuery table CLICK_STREAM. The column DT stores the epoch time of click events. For convenience, you chose a simple schema where every field is treated as the STRING type. Now, you want to compute web session durations of users who visit your site, and you want to change the data type of DT to TIMESTAMP. You want to minimize the migration effort without making future queries computationally expensive.
What should you do?

A. Delete the table CLICK_STREAM, and then re-create it such that the column DT is of the TIMESTAMP type. Reload the data.
B. Add a column TS of the TIMESTAMP type to the table CLICK_STREAM, and populate the numeric values from the column DT for each row. Reference the column TS instead of the column DT from now on.
C. Create a view CLICK_STREAM_V, where strings from the column DT are cast into TIMESTAMP values. Reference the view CLICK_STREAM_V instead of the table CLICK_STREAM from now on.
D. Add two columns to the table CLICK_STREAM: TS of the TIMESTAMP type and IS_NEW of the BOOLEAN type. Reload all data in append mode. For each appended row, set the value of IS_NEW to true. For future queries, reference the column TS instead of the column DT, with the WHERE clause ensuring that the value of IS_NEW must be true.
E. Construct a query to return every row of the table CLICK_STREAM, while using the built-in function to cast strings from the column DT into TIMESTAMP values. Run the query into a destination table NEW_CLICK_STREAM, in which the column TS is the TIMESTAMP type. Reference the table NEW_CLICK_STREAM instead of the table CLICK_STREAM from now on. In the future, new data is loaded into the table NEW_CLICK_STREAM.

Answer(s): D
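Every option depends on the same underlying conversion, from an epoch string to a timestamp. In BigQuery SQL, that cast can be written as `TIMESTAMP_SECONDS(CAST(DT AS INT64))`, assuming DT holds whole seconds.

The Python sketch below mirrors the same conversion and then computes a session duration per user. The helper names are hypothetical. Treating all of a user's clicks as a single session is a simplifying assumption; real sessionization usually splits clicks on an inactivity gap.

```python
from datetime import datetime, timezone


def to_timestamp(dt: str) -> datetime:
    """Convert an epoch-seconds string (the assumed format of column DT)
    into a timezone-aware UTC datetime, like a STRING -> TIMESTAMP cast."""
    return datetime.fromtimestamp(int(dt), tz=timezone.utc)


def session_durations(rows):
    """rows: iterable of (user_id, dt_string) pairs.
    Assumes each user's clicks form one session and returns
    {user_id: seconds between the user's first and last click}."""
    first, last = {}, {}
    for user, dt in rows:
        ts = to_timestamp(dt)
        first[user] = min(first.get(user, ts), ts)
        last[user] = max(last.get(user, ts), ts)
    return {u: (last[u] - first[u]).total_seconds() for u in first}
```

Once DT is available as a true timestamp, duration arithmetic is a plain subtraction. With a STRING column, the cast must be repeated in every query.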


Free Professional Data Engineer Exam Braindumps (page: 7)
