Downloading multiple Amazon S3 files


Situation

I have hundreds (sometimes thousands) of small files (~50 KB each) in Amazon S3, organized into one bucket per day.

Problem

Through my Java application, I need to download all files for a given period and deliver them to the front end. My machine in the cloud has limited memory and disk resources (2 GB of RAM and 5 GB of disk).

Solution 1

Download the files one by one and pass them to the front end? This is rather inefficient, since there are thousands of small files.

Solution 2

Download the files one by one, compress them into a zip (splitting the zip into parts if necessary, given the machine's limits), upload this zip to Amazon S3, and deliver only the zip link to the front end.
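A minimal sketch of Solution 2: the key point is to stream each object into the zip one at a time, so only a small buffer sits in memory. This is plain JDK code; in the real application each `InputStream` would come from the S3 client (e.g. `s3.getObject(bucket, key).getObjectContent()` with the AWS SDK), which is assumed here rather than shown:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class S3Zipper {

    // Copies each named stream into one zip entry. Only an 8 KB buffer
    // is held in memory at a time, which fits the 2 GB RAM limit.
    // In the real app, each InputStream would be the body of an S3 GET.
    public static void zipStreams(Map<String, InputStream> entries,
                                  OutputStream out) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(out)) {
            byte[] buffer = new byte[8192];
            for (Map.Entry<String, InputStream> e : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                try (InputStream in = e.getValue()) {
                    int n;
                    while ((n = in.read(buffer)) != -1) {
                        zip.write(buffer, 0, n);
                    }
                }
                zip.closeEntry();
            }
        }
    }
}
```

Writing `out` to a temporary file, and starting a new zip part once it approaches the disk limit, keeps both memory and disk usage bounded.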

Question

Is there another solution that someone else has used, some native AWS feature, or some more efficient idea to solve this problem?

asked by anonymous 06.06.2018 / 19:00

1 answer


If I understand correctly, the problem is performance. Here are some things I believe can help:

1 - Use a function to build the zip ahead of time: link

2 - Deliver to the client via CloudFront (CDN): link

3 - Deliver via BitTorrent: link

4 - Use the TransferManager class to download in parallel: link

5 - Avoid files that are too small; consider consolidating them into larger batches with Lambda or Glue.
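On point 4: TransferManager parallelizes the transfers for you inside the AWS SDK. As an illustration of the underlying pattern only (the `fetch` function below is a hypothetical stand-in for an S3 GET, not a real SDK call), downloading many keys concurrently with a thread pool looks like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelFetch {

    // Runs one fetch per key on a fixed-size pool and returns the
    // results in the same order as the input keys.
    public static <T> List<T> fetchAll(List<String> keys,
                                       Function<String, T> fetch,
                                       int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>();
            for (String key : keys) {
                futures.add(pool.submit(() -> fetch.apply(key)));
            }
            List<T> results = new ArrayList<>();
            for (Future<T> f : futures) {
                results.add(f.get()); // propagates any download failure
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

With the real SDK you would not manage the pool yourself; `TransferManager` (e.g. its `downloadDirectory` method) handles the concurrency internally.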

Delivering directly from S3 / CloudFront is better in terms of cost, performance, and security.

answered 07.06.2018 / 10:43