Working with ID boundaries in large volumes

Question

Working with ID boundaries in large volumes

Navigation

#1 by (5 votes)
#2 by (4 votes)
#3 by (2 votes)

4

I have in mind some questions about large volumes of data in MYSQL , referring to the ID limit. I'll give an ID type tinyint as an example to simplify the explanation. In the example the field tinyint accepts 127 records.

Assuming 100 new records are entered, this creates IDs from 1 to 100 . Eventually the 50 first records will be deleted, then I will have only 50 records with IDs of 50 to 100 and allocation to plus 27 records, which results in 77 records. That is, we have 77 records only, in a table that should receive up to 127 records.

I know that MYSQL does not regress to occupy vacant positions, the question is another. It is not uncommon to have to remove entries, and with that you lose a space in the table.

Using bigint instead of int which is the most commonly used, would only increase the time to reach the limit?

Disregard any error in the above calculations:)

mysql sql modelagem

asked by anonymous 06.08.2015 / 19:05

3 answers

4

I indicate the use of UUID since it is unique.

Normally we sort the data by other columns and not by the id. To know who comes before who is only order by the creation date if it has and if necessary.

06.08.2015 / 19:23

2

If the ID to which you refer is a surrogate key ( link ), that is, a value that exists just to give identity to each record, having no relation to your application, then you should not worry about it. Put bigint and stay calm. Do not worry about reusing IDs that have been lost by deletions, this will give a lot more work for benefit 0.

Obviously, everything has a limit, but the unsigned bigint of MYSQL goes up to 18446744073709551615 ( link ). There is a lot of registry in the table to achieve this, and if your application hits, there will be another way to resolve it, maybe another table with the same structure that bears the other 18446744073709551615 records.

06.08.2015 / 19:18

How to tell if a port is open or not in Linux Why declare pointer to array if arrays are already pointers?

score 5 · Accepted Answer

Cantoni's answer would basically be mine. I do not share the sympathy of UUID. I think it's useful in some cases but you have to have a reason to use it and I do not think that's the case. The UUID has several problems that are not relevant here, but only the consumption of space is already one of them.

I would use BIGINT same. I doubt you come into trouble with this. Physically, you can not have this number of rows in the table even in the next few decades. If it gets deleting too many lines I think it would make up for having a mechanism that compresses the IDs that it rotates periodically. I doubt you need to, but needing to renumber all the rows by making the proper settings where the IDs were used in other tables. Yes, of course, this is not simple and in some cases it pays off, in others it does not. Remember that if one day you have to change the type of the column, you will have similar work.

I reinforce that I doubt that it is necessary to do all this mechanism, but if you want to avoid this, I think it is better to create a combination of columns to increase the maximum number, provided that this is done in advance. You can also use a string type, which would have to do with the UUID, and control the increment by the database itself.

If you need to save even space, then use the smallest possible type, if there are too many deletions, and the deleted IDs need to be reused soon so as not to pop, I would create a free-list reusing the lines instead of removing them. Each comes that you have to remove, marks the line as removed (a proper column for this that will be used to filter) and places the ID in a table that stores a list of available IDs. When you enter, if you have an ID in this list, remove it and reuse the line by clearing it completely and saving the new data. You have to know how to do it the right way in competing environments, as is common.

No one is good, but there are some alternatives to what can be done, in addition to UUID, which is still an option.