What is the best way to save files: database or file system?


I need to let users attach PDFs in my application. The files range in size from 5 MB to 200 MB or more.

My question is: what is the best place to store them?

  • Directly in the database (PostgreSQL)
  • In the file system

What are the advantages and disadvantages?

    
asked by anonymous 14.04.2015 / 15:09

2 answers


The question may be off-topic, since it is very broad and partly opinion-based; there is no single "exact" answer. Even so, I will share an experience I had and mention some points about each approach.

As for your specific question, let's first talk about storing the files in PostgreSQL, with the advantages and disadvantages listed by the PostgreSQL wiki itself:

  • Advantages:

    • Access control and security are simplified;
    • Version control is easy;
    • ACID guarantees (transactions);
    • Backups are simple.
  • Disadvantages:

    • Performance: it depends on the performance of your file system;
    • Memory requirements for processing increase;
    • Backups will be larger, of course;
    • Access from your application becomes heavier: database clients typically create temporary files to read and modify the stored files (have you ever thought of doing this for a 200 MB file? o_O).

Now, using a plain file system (a solution built from scratch):

  • Advantages:

    • Fast access and less overhead, but highly dependent on the
    • Easier to manipulate than data stored in some DBMSs;
    • No loss of performance as the volume grows, as long as you design for it (for example, by not keeping too many files in the same directory);
  • Disadvantages:

    • You have to take care of backups and mirroring yourself;
    • There is no transactional support, so it can leave garbage behind (orphaned files) that you will have to control;
    • In a home-made solution, you may run into problems with concurrent access;
    • You have to handle security (where the files are stored, who can access them, etc.);
    • You have to take care not to store too many files in one place;
    • Files tend to get fragmented, which makes access slower.
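Two of the file-system points above (avoiding too many files in one directory, and the garbage left behind without transactions) can be mitigated in code. The following is a minimal Python sketch built on my own assumptions, not part of the original answer: each file is addressed by a hypothetical `file_id`, placed in subdirectories named after the first characters of its SHA-256 hash (`sharded_path`), and streamed into a temporary file that is then renamed into place (`save_file`). An exception therefore never leaves a partial PDF under its final name, although a hard crash could still leave a stray `.tmp` file for a cleanup job to remove.

```python
import contextlib
import hashlib
import os
import shutil
import tempfile
from pathlib import Path
from typing import BinaryIO


def sharded_path(root: Path, file_id: str, levels: int = 2, width: int = 2) -> Path:
    """Map file_id to root/<aa>/<bb>/<sha256>.pdf.

    With levels=2 and width=2 there are 256 * 256 leaf directories, so files
    are spread out instead of piling up in a single directory.
    """
    digest = hashlib.sha256(file_id.encode("utf-8")).hexdigest()
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, digest + ".pdf")


def save_file(root: Path, file_id: str, src: BinaryIO) -> Path:
    """Stream src into its sharded location without exposing partial files."""
    target = sharded_path(root, file_id)
    target.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the destination directory so that os.replace
    # stays on one file system (where it is an atomic rename on POSIX).
    fd, tmp = tempfile.mkstemp(dir=target.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as dst:
            # Copies in chunks, so a 200 MB PDF is never held fully in memory.
            shutil.copyfileobj(src, dst)
        os.replace(tmp, target)
    except BaseException:
        # On any error, delete the partial temp file instead of leaving garbage.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
    return target
```

Hashing the identifier keeps the directory tree balanced regardless of how the IDs look; the two-level, two-character layout is an arbitrary choice that can be widened for larger volumes.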

All of the above assumes the infrastructure is yours, that is, you control everything; some of these problems, such as memory, are taken care of for you if you use a cloud service. With a home-made solution that stores files in the file system you have a lot to worry about, and I don't see why I would spend time on that nowadays. The PostgreSQL wiki page on this subject is very good; consider reading it and the other references listed there =)

Now the personal experience, which can even solve the problems mentioned above: if you have the option, that is, if your environment allows consuming third-party services, choose services such as Amazon S3 and Glacier. In my case, I use S3 for newer files and move them to Glacier after they reach a certain "age", to reduce cost. The database stores only where each file is saved; this performs very well, and I don't have to worry about backups, etc.
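The "move to Glacier after a certain age" step is usually configured as an S3 lifecycle rule rather than done by hand. This is a minimal Python sketch that only builds the lifecycle configuration as a plain dict; the function name `glacier_lifecycle_config` is hypothetical, and this is one way to express the author's setup, not necessarily how they configured it.

```python
def glacier_lifecycle_config(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration (the dict shape boto3 accepts as
    LifecycleConfiguration) that transitions objects whose key starts with
    `prefix` to the GLACIER storage class `days` days after creation."""
    if days < 1:
        # This sketch only accepts a positive age in days.
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                # Human-readable rule name; any unique string works.
                "ID": f"archive-{prefix.strip('/') or 'all'}-after-{days}d",
                "Filter": {"Prefix": prefix},  # only objects under this key prefix
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "GLACIER"}],
            }
        ]
    }
```

Applying it with boto3 would look like `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="my-attachments", LifecycleConfiguration=glacier_lifecycle_config("pdfs/", 90))`, where the bucket name, the `pdfs/` prefix and the 90-day age are placeholders. Each attachment row in the database then only needs the bucket and the object key.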

There are other services on the market, such as Google's Blobstore and Google Cloud Storage, and Azure Blob Storage on Azure.

This approach solves the problems mentioned earlier, such as access security, backups, scalability, etc. Besides letting you focus on your actual problem, it means you do not need to reinvent the wheel.

If you cannot adopt such services in your environment, consider adopting an off-the-shelf electronic document management system (GED) to make your life easier.


14.04.2015 / 19:17

There is no single right answer to your question; there are many things to keep in mind, such as:

  • the number of files to be generated
  • how often they are accessed
  • security
  • others...

However, I would do the following:

  • Save the files to directories
  • Save each file's location in the database
  • Tips: use a cryptographic algorithm (for example, a hash or a secure random generator) to generate random directory names, making it hard for users to reach other users' files. Also, be sure to set the access permissions on these directories correctly.
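The layout suggested in this answer (files in directories, their locations in the database, hard-to-guess directory names, restricted permissions) can be sketched in Python as follows. The assumptions are mine: SQLite from the standard library stands in for PostgreSQL so the example is self-contained; the `attachment` table, `store_attachment` and `open_attachment` are hypothetical names; and each directory is named with `secrets.token_hex`, a cryptographically secure random generator, rather than an encryption algorithm. The `0o700` mode only has its intended effect on POSIX systems.

```python
import os
import secrets
import shutil
import sqlite3
from pathlib import Path
from typing import BinaryIO

# Hypothetical schema: the database keeps metadata and the file location only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS attachment (
    id       INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,  -- original file name, shown to users
    location TEXT NOT NULL   -- path relative to the storage root
)
"""


def store_attachment(db: sqlite3.Connection, root: Path, name: str, src: BinaryIO) -> int:
    # 16 random bytes -> 32 hex characters: infeasible to guess or enumerate.
    directory = root / secrets.token_hex(16)
    directory.mkdir(parents=True)
    # Owner-only access, set explicitly because mkdir's mode is masked by the umask.
    os.chmod(directory, 0o700)
    # The user-supplied name never becomes part of the path (no path traversal).
    target = directory / "file.pdf"
    try:
        with open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # streams in chunks
        with db:  # commits on success, rolls back on error
            cur = db.execute(
                "INSERT INTO attachment (name, location) VALUES (?, ?)",
                (name, target.relative_to(root).as_posix()),
            )
    except BaseException:
        # If the copy or the INSERT raises, remove the files so nothing is orphaned.
        shutil.rmtree(directory, ignore_errors=True)
        raise
    return cur.lastrowid


def open_attachment(db: sqlite3.Connection, root: Path, attachment_id: int) -> Path:
    row = db.execute(
        "SELECT location FROM attachment WHERE id = ?", (attachment_id,)
    ).fetchone()
    if row is None:
        raise KeyError(attachment_id)
    return root / row[0]
```

Storing the path relative to the storage root means the files can later be moved to another disk or mount point by changing only `root`, without rewriting every row.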

        
    14.04.2015 / 18:47