Creating a database for each user

5

I am setting up a web sales system with PHP and MySQL, where we will have several companies registered, each company being a user that controls its own sales. Our team has a question: should we create a database for each user, or create a single overall database where the tables have a column with the user id to scope the records and queries pertinent to that user? Does either option give better performance on the database SELECTs? What are the advantages and disadvantages of each option?

    
asked by anonymous 11.12.2015 / 02:52

3 answers

3

Depends on scale or volume

It all depends on the scale of the product. When scale increases, many design decisions are counterintuitive.

By scale I do not mean just the volume of data. And scale and volume depend on context.

From a performance standpoint, a reasonable volume would be perhaps in the millions of records. I've worked on systems that processed hundreds of thousands of records without any difficulty, obviously with optimizations. If each client has millions of records of its own, then putting everything in one table can become a problem.
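
To make the shared-table option concrete, here is a minimal sketch of what such a schema might look like, with a company_id column and a composite index so per-client queries stay usable as the table grows; the table and column names are illustrative assumptions, not taken from the question.

```php
<?php
// Illustrative schema for the "single database, tenant column" option.
// The composite index starting with company_id is what keeps per-client
// SELECTs fast once the table reaches millions of rows.
$pdo = new PDO('mysql:host=localhost;dbname=sales', 'app_user', 'secret');

$pdo->exec('
    CREATE TABLE IF NOT EXISTS orders (
        id          BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        company_id  INT UNSIGNED NOT NULL,
        status      VARCHAR(20) NOT NULL DEFAULT \'new\',
        total       DECIMAL(10,2) NOT NULL,
        created_at  DATETIME NOT NULL,
        INDEX idx_company_created (company_id, created_at)
    ) ENGINE=InnoDB
');
```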

From a business standpoint, however, it depends on who the customers will be, how the product is meant to evolve, and which procedures will be adopted for development and maintenance; having all the data in a single database can be either entirely desirable or entirely impossible.

Is the plan to sell the system to a few companies, each of which is expected to generate a large demand for maintenance and evolution? Or to sell it as SaaS to a mass of small businesses that will not have more advanced access (think of a marketplace like Mercado Livre, for example)?

Problems with a shared database

Let's think about some scenarios with a single shared database:

  • When you need to run a backup, every client is slowed down by it
  • A client accidentally deletes data and requests a backup restore (I would not want to be in charge of that)
  • Application errors cause data from one client to show up for another, or worse, data is updated without a suitable filter in the WHERE clause
  • You need to update the application and the new version changes the data structure. You would necessarily have to upgrade the system for all customers at once, all or nothing. It is common practice in serious companies to release updates first to a subset of customers and then, if all goes well, upgrade the rest. That would be impossible.
  • Your company sells the system to customers in different countries around the world. At some point it is decided that, to improve performance in those countries or to comply with local legislation, the system has to host the data in the country in question. If each client's database were separate, it would be enough to move that database to a virtual machine in a data center in that country and put an instance of the application there as well. But now you will have to extract the data from a shared database and load it into another, empty one.
  • If one client uses the system heavily, this can disrupt the others. For example, a large client runs a routine that imports one million records. Other clients trying to read from or write to the same table will be affected.
  • Generating ids and keys becomes more complex and/or less intuitive, since two records with consecutive ids may belong to completely different clients.
  • All system queries and all database access methods require an extra parameter. The domain objects and virtually the entire system end up polluted by the tenant id being passed back and forth (see the sketch after this list).
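
To make the last two points concrete, here is a minimal sketch, assuming illustrative table and column names, of how every read and write in a shared database must carry the company id, and how easy it is to forget it:

```php
<?php
// Shared-database sketch: every query is scoped by the tenant (company) id.
$pdo = new PDO('mysql:host=localhost;dbname=sales', 'app_user', 'secret');

$companyId = 42;   // would come from the authenticated session
$orderId   = 1001;

// The tenant id has to be bound on every read...
$stmt = $pdo->prepare('SELECT id, total FROM orders WHERE company_id = :company_id');
$stmt->execute(['company_id' => $companyId]);
$orders = $stmt->fetchAll();

// ...and on every write. Dropping the company_id condition below would
// silently update matching orders of every client in the system.
$upd = $pdo->prepare(
    'UPDATE orders SET status = :status
      WHERE id = :id AND company_id = :company_id'
);
$upd->execute([
    'status'     => 'shipped',
    'id'         => $orderId,
    'company_id' => $companyId,
]);
```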

Advantages of separate databases

  • Making and restoring backups is trivial
  • Support does not go crazy trying to investigate a data loss or corruption problem
  • Different clients may be on different versions of the system
  • Instances of the system may be scattered around the world and moved as needed
  • Intense use, deadlocks, and other database issues that occur for one client do not affect the others (a sketch of how the application picks the right database per client follows this list)
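
As a hedged illustration of how the separate-database option looks from the application side, the sketch below resolves a PDO connection per client from a hypothetical catalog; the catalog source, host names, and database naming convention are assumptions. With the connection itself providing the isolation, queries no longer need a company_id filter.

```php
<?php
// One-database-per-client sketch: pick the right connection for a tenant.
function connectForTenant(string $tenant): PDO
{
    // In practice this map would live in a small central catalog database
    // or in configuration; it is hard-coded here only for illustration.
    $catalog = [
        'acme' => ['host' => 'db1.example.com', 'dbname' => 'sales_acme'],
        'beta' => ['host' => 'db2.example.com', 'dbname' => 'sales_beta'],
    ];

    if (!isset($catalog[$tenant])) {
        throw new RuntimeException("Unknown tenant: $tenant");
    }

    $cfg = $catalog[$tenant];
    $dsn = "mysql:host={$cfg['host']};dbname={$cfg['dbname']};charset=utf8mb4";

    return new PDO($dsn, 'app_user', 'secret', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);
}

// Queries no longer carry a company_id: the connection already isolates
// the client's data, and a client can be moved by updating the catalog.
$pdo    = connectForTenant('acme');
$orders = $pdo->query('SELECT id, total FROM orders')->fetchAll();
```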

Difficulties in maintaining multiple databases

The biggest problem with one database per client is keeping the changes to the database strictly organized and properly tied to each version of the system.

In languages like PHP, where it is easy to simply overwrite a file in production, this deserves extra care.

First, use a migration library so that your system can always update the database automatically and guarantee that the structure is consistent with the deployed version. This takes a bit more work at the beginning, but it pays off (handsomely, by the way) over time, especially on projects that evolve constantly.
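
In PHP there are ready-made options such as Phinx and Doctrine Migrations; the fragment below is only a rough sketch of the underlying idea, assuming one SQL statement per migration file and an invented control-table layout, so that each tenant database can be brought up to the structure the deployed code expects:

```php
<?php
// Minimal migration-runner sketch: apply pending .sql files in order and
// record each one in a control table, once per database.
function migrate(PDO $pdo, string $dir): void
{
    $pdo->exec('CREATE TABLE IF NOT EXISTS schema_migrations (
                    version    VARCHAR(255) PRIMARY KEY,
                    applied_at DATETIME NOT NULL
                )');

    $applied = $pdo->query('SELECT version FROM schema_migrations')
                   ->fetchAll(PDO::FETCH_COLUMN);

    $files = glob($dir . '/*.sql');
    sort($files); // e.g. 001_create_orders.sql, 002_add_status.sql, ...

    foreach ($files as $file) {
        $version = basename($file);
        if (in_array($version, $applied, true)) {
            continue; // already applied to this database
        }
        $pdo->exec(file_get_contents($file)); // one statement per file here
        $stmt = $pdo->prepare(
            'INSERT INTO schema_migrations (version, applied_at) VALUES (:v, NOW())'
        );
        $stmt->execute(['v' => $version]);
    }
}
```

Run against every client's database at deploy time, the same set of files takes each database from whatever version it is at to the current structure.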

Second, keep strict control over the versions of the system in your code repository. For example, each version that goes to production should have a corresponding tag in Git, so that when a problem occurs you can easily pair the client's database with that specific version and reproduce the error.

    
11.12.2015 / 06:26
5

You probably will not notice any difference in database performance unless you have millions of customers.

Some advantages of the one-database-per-user approach would be:

  • Simpler database model;
  • Easier to export and back up the data of a specific user (see the backup sketch below);
  • When changes, problems, or maintenance occur, only the users involved are affected;
  • It is easier to dedicate extra resources to a specific user who needs them;
  • Easy to scale the database out, since you can simply move some users onto a second server. This is also more economical, as in most cases it is cheaper to buy new servers than to upgrade an existing one.

Disadvantages:

  • Any change that needs to be applied to all users becomes very complicated.
  • Considerably more complicated to create and maintain, especially keeping it organized: versioning each database, change logs, and so on.

The list of advantages could be longer, but in most cases it is not worth the headache this can cause; I advise you to think carefully before implementing it.
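
As a hedged illustration of the per-user backup point above (the database naming convention, credentials read from the MySQL option file, and output path are assumptions), a single mysqldump call is enough to export one client when each company has its own database:

```php
<?php
// Per-tenant backup sketch: dump only the database of the given company.
// Assumes databases are named sales_<tenant> and that mysqldump can read
// its credentials from ~/.my.cnf or the environment.
function backupTenant(string $tenant, string $outDir): string
{
    $dbName = 'sales_' . $tenant;
    $file   = sprintf('%s/%s_%s.sql', rtrim($outDir, '/'), $dbName, date('Ymd_His'));

    $cmd = sprintf(
        'mysqldump --single-transaction %s > %s',
        escapeshellarg($dbName),
        escapeshellarg($file)
    );
    exec($cmd, $output, $status);

    if ($status !== 0) {
        throw new RuntimeException("Backup of $dbName failed");
    }
    return $file; // restore later with: mysql <database> < <file>
}
```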

For more information, see this article by Microsoft on the subject.

11.12.2015 / 04:22

1

I would say that the biggest performance villain is the server, not the way the database is structured.

Regardless of the volume of data generated by each client, you also need to take into account the volume of access to the server, especially simultaneous access. If the server cannot keep up with the demand for access, the final performance will be unsatisfactory either way.

To give a practical example, I currently maintain a system with about 10,000 customers and more than 35,000 orders containing, on average, two to three products each. It uses a single database, referencing customers, products, and orders through their respective IDs, and the system's most complex history query, involving all of these tables, does not exceed 15 seconds.

11.12.2015 / 05:37