A large company receives files from external parties in Amazon EC2 throughout the day. At the end of the day, the files are combined into a single file, compressed into a gzip file, and uploaded to Amazon S3. The total size of all the files is close to 100 GB daily. Once the files are uploaded to Amazon S3, an AWS Batch program executes a COPY command to load the files into an Amazon Redshift cluster.
Which program modification will accelerate the COPY process?
A. Upload the individual files to Amazon S3 and run the COPY command as soon as the files become available.
B. Split the files so that the number of files is a multiple of the number of slices in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.
C. Split the files so that the number of files is a multiple of the number of compute nodes in the Amazon Redshift cluster. Gzip and upload the files to Amazon S3. Run the COPY command on the files.
D. Apply sharding by breaking up the files so that rows with the same distkey column values go to the same file. Gzip and upload the sharded files to Amazon S3. Run the COPY command on the files.
Answer(s): B

Redshift's COPY command loads data in parallel by assigning files to slices. Splitting the input into a number of files that is a multiple of the cluster's slice count keeps every slice equally busy, whereas a single large gzip file (or a file count tied to nodes rather than slices) leaves slices idle during the load.
Reference:
https://docs.aws.amazon.com/redshift/latest/dg/t_splitting-data-files.html
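As a rough illustration of option B, here is a minimal Python sketch that issues one COPY over a key prefix covering all the split gzip files. The bucket, key prefix, table name, IAM role, and cluster endpoint are hypothetical placeholders, and connecting through psycopg2 is just one of several ways to reach Redshift; the COPY options themselves (prefix FROM clause, IAM_ROLE, GZIP) follow the documented syntax.

```python
# Minimal sketch, assuming hypothetical names: the bucket, key prefix,
# table, IAM role, and endpoint below are placeholders, not values
# taken from the question.
import psycopg2

# Example: a cluster with 4 nodes x 2 slices = 8 slices, so the daily
# 100 GB would be split into a multiple of 8 files (e.g. 32) and each
# file gzipped before upload.
COPY_SQL = """
    COPY sales
    FROM 's3://example-bucket/daily/part_'  -- key prefix matches every split file
    IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole'
    GZIP                                    -- the files were gzipped before upload
    DELIMITER '|';
"""

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="...",  # elided; prefer Secrets Manager or IAM auth in practice
)
try:
    with conn, conn.cursor() as cur:
        # One COPY loads all matching files in a single parallel operation,
        # distributing the work across the cluster's slices.
        cur.execute(COPY_SQL)
finally:
    conn.close()
```

A single COPY over the prefix lets Redshift distribute the files across slices in one parallel load; running a separate COPY per file would serialize the work. The referenced documentation also recommends keeping the split files at roughly equal sizes, about 1 MB to 1 GB after compression.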