Free Google Professional Data Engineer Exam Questions (page: 37)

Which of these is not a supported method of putting data into a partitioned table?

  A. If you have existing data in a separate file for each day, then create a partitioned table and upload each file into the appropriate partition.
  B. Run a query to get the records for a specific day from an existing table and for the destination table, specify a partitioned table ending with the day in the format "$YYYYMMDD".
  C. Create a partitioned table and stream new records to it every day.
  D. Use ORDER BY to put a table's rows into chronological order and then change the table's type to "Partitioned".

Answer(s): D

Explanation:

You cannot change an existing table into a partitioned table. You must create a partitioned table from scratch. Then you can either stream data into it every day and the data will automatically be put in the right partition, or you can load data into a specific partition by using "$YYYYMMDD" at the end of the table name.
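
As a minimal sketch of the "$YYYYMMDD" suffix described above, the helper below builds the name of a specific daily partition. The function name `partition_target` and the table name `mydataset.events` are hypothetical, chosen only for illustration; the sketch formats the string and does not call BigQuery.

```python
from datetime import date


def partition_target(table_id: str, day: date) -> str:
    """Return the table name with a "$YYYYMMDD" partition suffix appended."""
    # date.__format__ accepts strftime codes, so %Y%m%d yields e.g. 20170305.
    return f"{table_id}${day:%Y%m%d}"


# Hypothetical table name, used only as an example.
target = partition_target("mydataset.events", date(2017, 3, 5))
print(target)
```

The resulting string is what you would pass as the destination table when loading or writing query results into one day's partition.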


Reference:

https://cloud.google.com/bigquery/docs/partitioned-tables



Which of these operations can you perform from the BigQuery Web UI?

  A. Upload a file in SQL format.
  B. Load data with nested and repeated fields.
  C. Upload a 20 MB file.
  D. Upload multiple files using a wildcard.

Answer(s): B

Explanation:

You can load data with nested and repeated fields using the Web UI.

You cannot use the Web UI to:

- Upload a file greater than 10 MB in size

- Upload multiple files at the same time

- Upload a file in SQL format

All three of the above operations can be performed using the "bq" command.


Reference:

https://cloud.google.com/bigquery/loading-data



Which methods can be used to reduce the number of rows processed by BigQuery?

  A. Splitting tables into multiple tables; putting data in partitions
  B. Splitting tables into multiple tables; putting data in partitions; using the LIMIT clause
  C. Putting data in partitions; using the LIMIT clause
  D. Splitting tables into multiple tables; using the LIMIT clause

Answer(s): A

Explanation:

If you split a table into multiple tables (such as one table for each day), then you can limit your query to the data in specific tables (such as for particular days). A better method is to use a partitioned table, as long as your data can be separated by the day.

If you use the LIMIT clause, BigQuery will still process the entire table.
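
The difference can be illustrated with a toy model in plain Python. This is not how BigQuery is implemented; it only mirrors the explanation above, under the assumption that one table per day lets a query read just the requested days, while LIMIT truncates the result after every row has been read. The data and the function names `scan_days` and `scan_all_with_limit` are hypothetical.

```python
from datetime import date

# Hypothetical date-sharded layout: one list of rows per day.
tables = {
    date(2024, 1, 1): [{"user": "a"}, {"user": "b"}, {"user": "c"}],
    date(2024, 1, 2): [{"user": "d"}, {"user": "e"}],
    date(2024, 1, 3): [{"user": "f"}, {"user": "g"}, {"user": "h"}, {"user": "i"}],
}


def scan_days(tables, days):
    """Read only the tables for the requested days; return (rows, rows_scanned)."""
    rows = [row for day in days for row in tables.get(day, [])]
    return rows, len(rows)


def scan_all_with_limit(tables, limit):
    """Toy model of LIMIT: every row is read, then the result is truncated."""
    scanned = [row for day_rows in tables.values() for row in day_rows]
    return scanned[:limit], len(scanned)


one_day_rows, one_day_scanned = scan_days(tables, [date(2024, 1, 2)])
limited_rows, limited_scanned = scan_all_with_limit(tables, 2)
print(one_day_scanned, limited_scanned)
```

Both calls return two rows, but only the per-day scan reduces the number of rows read.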


Reference:

https://cloud.google.com/bigquery/docs/partitioned-tables



Why do you need to split a machine learning dataset into training data and test data?

  A. So you can try two different sets of features
  B. To make sure your model is generalized for more than just the training data
  C. To allow you to create unit tests in your code
  D. So you can use one dataset for a wide model and one for a deep model

Answer(s): B

Explanation:

Evaluating a predictive model on its training data does not tell you how well it has generalized to new, unseen data. A model selected for its accuracy on the training dataset, rather than on a held-out test dataset, is very likely to have lower accuracy on unseen data, because it has specialized to the structure of the training dataset instead of generalizing. This is called overfitting.
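
A small sketch can make the point concrete. It assumes a deliberately extreme model, a hypothetical `Memorizer` that stores every training example and guesses 0 for anything it has not seen, and a synthetic dataset where the label is simply `x % 2`. The split is deterministic (every fifth example is held out) so the result is reproducible; real projects usually shuffle before splitting.

```python
def make_dataset(n=100):
    """Synthetic examples (x, label) where the label is the parity of x."""
    return [(x, x % 2) for x in range(n)]


def split(dataset, test_every=5):
    """Hold out every `test_every`-th example as test data."""
    train = [ex for i, ex in enumerate(dataset) if i % test_every != 0]
    test = [ex for i, ex in enumerate(dataset) if i % test_every == 0]
    return train, test


class Memorizer:
    """Extreme overfitting: a lookup table of the training examples."""

    def fit(self, examples):
        self.table = dict(examples)
        return self

    def predict(self, x):
        # Unseen inputs fall back to a constant guess of 0.
        return self.table.get(x, 0)


def accuracy(model, examples):
    """Fraction of examples whose label the model predicts correctly."""
    return sum(model.predict(x) == y for x, y in examples) / len(examples)


train, test = split(make_dataset())
model = Memorizer().fit(train)
train_acc = accuracy(model, train)
test_acc = accuracy(model, test)
print(train_acc, test_acc)
```

The training accuracy is perfect because the model has memorized those examples, while the held-out accuracy shows it has learned nothing that generalizes. Only the test split reveals this.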


Reference:

https://machinelearningmastery.com/a-simple-intuition-for-overfitting/



Which of these numbers are adjusted by a neural network as it learns from a training dataset (select 2 answers)?

  A. Weights
  B. Biases
  C. Continuous features
  D. Input values

Answer(s): A,B

Explanation:

A neural network is a simple mechanism that's implemented with basic math. The only difference between the traditional programming model and a neural network is that you let the computer determine the parameters (weights and biases) by learning from training datasets.
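
As a rough illustration, the sketch below trains a single linear neuron with plain gradient descent on mean squared error. The function name `train_neuron`, the learning rate, the epoch count, and the tiny dataset (generated from y = 2x + 1) are all assumptions made for the example. Only the weight `w` and the bias `b` are updated; the input values are read but never modified.

```python
def train_neuron(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y ~ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # the learned parameters start at arbitrary values
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Only the parameters are adjusted; xs and ys stay untouched.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


inputs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train_neuron(inputs, targets)
print(round(w, 3), round(b, 3))
```

After training, `w` and `b` should be close to the 2 and 1 used to generate the targets, while `inputs` is unchanged.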


Reference:

https://cloud.google.com/blog/big-data/2016/07/understanding-neural-networks-with-tensorflow-playground


