What is the best way to query a SQL Server database using a list as input?

1

I have a question about how best to approach the following problem: I need to query records in a SQL Server table using a list of serial numbers. I know about the IN operator, but it does not work here because a query accepts a maximum of 2100 parameters, and my list has many more than that.

I wonder which is better:

  • Subdivide my list before the call and make multiple queries
  • Pass the list only once and query record by record, using a for loop in Java to go through the list

Which approach would perform best? I will run my own tests on both options, but I imagine someone here has experience with this and can keep me from making a mistake.

    
asked by anonymous 13.01.2015 / 13:35

2 answers

3

Let's look at some possibilities, including the ones you mentioned.

Reading in Blocks

Split items into blocks of equal size and execute queries using IN until you read all the items.

  • Advantage: reads only the required data.
  • Disadvantage: requires multiple queries (number of items / block size).
  • Conclusion: this is the best general-purpose solution when you do not know in advance how many items you will search for or how much data the table holds.
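
Block reading can be sketched in Java roughly as follows. This is only a minimal sketch under assumptions: the column name `serial`, the block size of 1000, and the class and method names are hypothetical, and only `tab1` is borrowed from the example further down. The block size just needs to stay below the 2100-parameter limit mentioned in the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class InBlocks {
    // Hypothetical block size, kept well below the 2100-parameter limit.
    static final int BLOCK_SIZE = 1000;

    // Splits the list into consecutive blocks of at most blockSize items.
    public static <T> List<List<T>> partition(List<T> items, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<List<T>> blocks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += blockSize) {
            int end = Math.min(start + blockSize, items.size());
            blocks.add(new ArrayList<>(items.subList(start, end)));
        }
        return blocks;
    }

    // Builds the query text with one "?" placeholder per item in the block.
    public static String buildQuery(int itemCount) {
        String placeholders = String.join(", ", Collections.nCopies(itemCount, "?"));
        return "SELECT serial FROM tab1 WHERE serial IN (" + placeholders + ")";
    }

    // Runs one parameterized IN query per block and collects the matches.
    public static List<String> findSerials(Connection conn, List<String> serials)
            throws SQLException {
        List<String> found = new ArrayList<>();
        for (List<String> block : partition(serials, BLOCK_SIZE)) {
            try (PreparedStatement ps = conn.prepareStatement(buildQuery(block.size()))) {
                for (int i = 0; i < block.size(); i++) {
                    ps.setString(i + 1, block.get(i)); // JDBC parameters are 1-based
                }
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        found.add(rs.getString("serial"));
                    }
                }
            }
        }
        return found;
    }
}
```

Only the last block can be shorter than the others, so at most two distinct query texts are sent to the server, which gives the database a chance to reuse its execution plans.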

Individual Reading

Read record by record.

  • Advantage: I do not see any.
  • Disadvantage: this greatly increases the number of queries and the processing overhead. Although the amount of data retrieved may look the same, every query executed adds its own overhead, so if we distinguish between gross and net processing and data transmitted, the gross figure will be much larger than with block reading.
  • Conclusion: feasible only if the number of queried items is small.

Single Reading with Auxiliary Table

If the list of items somehow comes from the database itself, it would be easier to do a JOIN between the tables, or even to use IN with a subquery. Example:

select * from tab1 where tab1.id in (select tab1_id from tab2)

If the data is not in the database, it would still be possible to load it into a temporary table, for example.

  • Advantage: reads the required data in one go.
  • Disadvantage: you may need to insert the data into the database before querying.
  • Conclusion: the best fit if the queried items are already available in the database in some form. Otherwise, it is worth measuring whether the extra time needed to insert the items into a temporary table is greater or less than the extra time needed to query the records in blocks using IN with multiple parameters.
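
For the case where the items are not yet in the database, the temporary-table variant could look something like the sketch below. Everything here beyond `tab1` is an assumption made for illustration: the temp table name `#serials`, the column `serial` and its `VARCHAR(50)` type, and the class and method names. The temp table is created through a plain `Statement` on purpose, so that it belongs to the session rather than to the scope of a prepared statement.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TempTableLookup {
    // Removes duplicates (keeping first-seen order) so that the primary key
    // on the hypothetical #serials table is not violated.
    public static Set<String> distinct(List<String> serials) {
        return new LinkedHashSet<>(serials);
    }

    // Loads the serials into a session-local temp table, then joins against it.
    public static List<String> findSerials(Connection conn, List<String> serials)
            throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE #serials (serial VARCHAR(50) PRIMARY KEY)");
        }
        try (PreparedStatement insert =
                 conn.prepareStatement("INSERT INTO #serials (serial) VALUES (?)")) {
            for (String serial : distinct(serials)) {
                insert.setString(1, serial);
                insert.addBatch(); // queue the row instead of sending it alone
            }
            insert.executeBatch();
        }
        List<String> found = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT t.serial FROM tab1 t JOIN #serials s ON s.serial = t.serial")) {
            while (rs.next()) {
                found.add(rs.getString(1));
            }
        }
        try (Statement st = conn.createStatement()) {
            st.execute("DROP TABLE #serials");
        }
        return found;
    }
}
```

Timing this method against the block-reading approach on realistic data is the comparison suggested in the conclusion above.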

Whole Table Single Reading

Read the entire table and filter the records in Java.

  • Advantage: a single query in the database.
  • Disadvantage: reads and transfers records that are not needed, which costs memory and network traffic.
  • Conclusion: worthwhile if the total number of records in the table is not very large and little data is read from each record, for example when you need only the id of 2000 records out of a total of 4000.
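
The filtering step on the Java side can be sketched as below. This is a minimal sketch that assumes the table contents have already been read into a list of serial numbers (for instance with a single `SELECT` over the table); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class WholeTableFilter {
    // Keeps only the serials read from the table that appear in the wanted list.
    public static List<String> filter(List<String> allSerials, Collection<String> wanted) {
        // A hash set makes each membership check constant time on average,
        // instead of scanning the wanted list for every row.
        Set<String> wantedSet = new HashSet<>(wanted);
        List<String> matches = new ArrayList<>();
        for (String serial : allSerials) {
            if (wantedSet.contains(serial)) {
                matches.add(serial);
            }
        }
        return matches;
    }
}
```

Copying the wanted items into a set first matters here: checking each row against a plain list would make the filter cost grow with the product of both sizes.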

Bonus

If performance is critical, an in-memory cache that keeps the records indexed in some easily searchable data structure (map, set, list) might be worth considering.
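
One possible shape for such a cache is a map keyed by serial number. The sketch below is an assumption about how it might be organized, not a finished design: the class name `SerialCache` and the `serialOf` key extractor are hypothetical, and loading, invalidation, and thread safety are left out.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class SerialCache<R> {
    private final Map<String, R> bySerial = new HashMap<>();

    // Indexes the given records by the serial number extracted from each one.
    public SerialCache(Collection<R> records, Function<R, String> serialOf) {
        for (R record : records) {
            bySerial.put(serialOf.apply(record), record);
        }
    }

    // Returns the cached records for the requested serials, skipping unknown ones.
    public List<R> findAll(Collection<String> serials) {
        List<R> found = new ArrayList<>();
        for (String serial : serials) {
            R record = bySerial.get(serial);
            if (record != null) {
                found.add(record);
            }
        }
        return found;
    }
}
```

With the index in memory, a lookup for any number of serials never touches the database, at the price of keeping the cache consistent with the table.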

13.01.2015 / 14:27
0

This seems like a case for batch execution. We often have to process many records in a routine, and many developers reuse the business layer (Java or C#) for this. Experience shows, however, that you end up either compensating with hardware or struggling to optimize code and database, trying to shave off milliseconds by changing small snippets of code, when in fact much of the routine's time is spent on network traffic, opening connections, and managing the batch execution.

There is no definitive solution to this. It depends a lot on your long-term view of the system, on the size of the records and the processing you perform on them, and on the architecture of your system.

If you do not want to give up the layers and really want to make sure that the business rules always stay in the application code (C#, Java, C++, etc.), then you are going to have some trouble gaining performance.

A simple alternative is to reassess whether you really need to pass the list of parameters that results from a filter, when you could simply pass the filter itself and run the select in the database.

One advantage of this is that only a few parameters travel over the network (faster), and since you do not have to break up the list, you only go to the database once.

Another alternative is to serialize the parameters you need into a string and pass that. Still, I think that if the data was selected through a filter, passing the filter values to a stored procedure is faster and easier.
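
The string-serialization idea can be sketched as follows. Note the assumptions: this sketch relies on the `STRING_SPLIT` table-valued function, which exists only in SQL Server 2016 and later, so it would not have been available when this answer was written; on older versions a custom split function would be needed. The column name `serial`, the comma delimiter, and the class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class DelimitedList {
    // Assumes SQL Server 2016+; STRING_SPLIT returns one row per item in a column named "value".
    static final String QUERY =
        "SELECT t.serial FROM tab1 t JOIN STRING_SPLIT(?, ',') s ON s.value = t.serial";

    // Joins the serials into one comma-delimited string, rejecting values
    // that contain the delimiter and would therefore be split incorrectly.
    public static String join(List<String> serials) {
        for (String serial : serials) {
            if (serial.contains(",")) {
                throw new IllegalArgumentException("serial contains delimiter: " + serial);
            }
        }
        return String.join(",", serials);
    }

    // Sends the whole list as a single parameter and lets the database split it.
    public static List<String> findSerials(Connection conn, List<String> serials)
            throws SQLException {
        List<String> found = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(QUERY)) {
            ps.setString(1, join(serials));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    found.add(rs.getString(1));
                }
            }
        }
        return found;
    }
}
```

Because the whole list travels as one parameter, the 2100-parameter limit no longer applies, although the database still has to parse the string on every call.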

Even if you need to run batch processing over the data involved (the many records), you could either do it in the procedure that receives the parameters, or create some form of scheduling based on the filters so that the records are processed in a job. That avoids locking the user's screen while it waits for the transaction to respond.

    
13.01.2015 / 16:28