You are implementing security best practices for your data pipeline. Currently, you run the jobs manually as the Project Owner. You want to automate them: each night, batch files containing non-public information are taken from Google Cloud Storage, processed by a Spark Scala job on a Google Cloud Dataproc cluster, and the results are written to Google BigQuery.
How should you securely run this workload?
- Restrict the Google Cloud Storage bucket so only you can see the files
- Grant the Project Owner role to a service account, and run the job with it
- Use a service account with the ability to read the batch files and to write to BigQuery
- Use a user account with the Project Viewer role on the Cloud Dataproc cluster to read the batch files and write to BigQuery
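For context, here is a minimal sketch of the Spark Scala job this workload describes, assuming the Dataproc cluster runs as a dedicated service account with read access to the batch-file bucket and write access to the target BigQuery dataset, and that the spark-bigquery connector is available on the cluster. The bucket, dataset, and table names below are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object NightlyBatchJob {
  def main(args: Array[String]): Unit = {
    // On Dataproc, the job authenticates as the cluster's service account;
    // no user credentials are embedded in the code.
    val spark = SparkSession.builder()
      .appName("nightly-batch-to-bigquery")
      .getOrCreate()

    // Read the nightly batch files from Cloud Storage (hypothetical bucket and path).
    val batch = spark.read
      .option("header", "true")
      .csv("gs://example-nonpublic-batches/nightly/*.csv")

    // Placeholder for the actual Spark transformations.
    val results = batch

    // Write the results to BigQuery via the spark-bigquery connector;
    // the indirect write method stages data in a Cloud Storage bucket first.
    results.write
      .format("bigquery")
      .option("temporaryGcsBucket", "example-staging-bucket") // hypothetical staging bucket
      .mode("append")
      .save("example_dataset.nightly_results")                // hypothetical dataset.table
  }
}
```

Whichever identity runs this job only needs permission to read the source bucket and write to the destination dataset, which is why broad roles such as Project Owner or Project Viewer are unnecessary for it.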