CURSOR vs table-type variable


I'm facing one of those situations where I need to perform an action for each row resulting from a query.

As far as I can tell, I have two options: use a Cursor or a Table Variable. However, the two seem very similar to me (semantically).

I'd like to know whether using a Table Variable brings any performance improvement over a Cursor.

I believe the difference between the two is that the Table Variable runs a single query and then scrolls through the records in memory, whereas the Cursor performs a query (Fetch) for each row, but I still need to confirm this (that is my doubt).

So which one is better and why?

EDIT

I decided to add a complete example and the statistics.

TABLE

CREATE TABLE [dbo].[CursorTeste](
    [CursorTesteID] [int] IDENTITY(1,1) NOT NULL,
    [Coluna1] [uniqueidentifier] NOT NULL,
    [Coluna2] [uniqueidentifier] NOT NULL,
    [Coluna3] [uniqueidentifier] NOT NULL,
 CONSTRAINT [PK_CursorTeste] PRIMARY KEY CLUSTERED 
(
    [CursorTesteID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

INSERT

DECLARE @count int;
SET @count = 0;

WHILE (@count < 10000)
BEGIN
    INSERT INTO CursorTeste VALUES (NEWID(), NEWID(), NEWID());
    SET @count = @count + 1;
END

CURSOR

DECLARE @coluna1 uniqueidentifier;
DECLARE @coluna2 uniqueidentifier;
DECLARE @coluna3 uniqueidentifier;
DECLARE @CURSOR_teste CURSOR;

SET @CURSOR_teste = CURSOR LOCAL FAST_FORWARD FOR
SELECT Coluna1, Coluna2, Coluna3 FROM CursorTeste

OPEN @CURSOR_teste
WHILE (1 = 1)
BEGIN
    FETCH NEXT FROM @CURSOR_teste INTO @coluna1, @coluna2, @coluna3;
    IF (@@FETCH_STATUS <> 0)
    BEGIN
        BREAK;
    END

    PRINT '{ ' + cast(@coluna1 as varchar(50)) + ' } - { ' + cast(@coluna2 as varchar(50)) + ' } - { ' + cast(@coluna3 as varchar(50)) + ' }';
END

CLOSE @CURSOR_teste   
DEALLOCATE @CURSOR_teste

Table Variable

DECLARE @coluna1 uniqueidentifier;
DECLARE @coluna2 uniqueidentifier;
DECLARE @coluna3 uniqueidentifier;
DECLARE @indice int
DECLARE @count int
DECLARE @tabela table(
    RowNumber int identity,
    Coluna1 uniqueidentifier not null,
    Coluna2 uniqueidentifier not null,
    Coluna3 uniqueidentifier not null,
    PRIMARY KEY (RowNumber)
);

INSERT INTO @tabela
SELECT Coluna1, Coluna2, Coluna3 FROM CursorTeste

SET @count = (SELECT COUNT(RowNumber) FROM @tabela)
SET @indice = 1;
WHILE (@indice <= @count)
BEGIN
    SELECT 
        @indice = RowNumber + 1,
        @coluna1 = Coluna1, 
        @coluna2 = Coluna2, 
        @coluna3 = Coluna3
    FROM @tabela 
    WHERE RowNumber = @indice

    PRINT '{ ' + cast(@coluna1 as varchar(50)) + ' } - { ' + cast(@coluna2 as varchar(50)) + ' } - { ' + cast(@coluna3 as varchar(50)) + ' }';
END

  • Evaluation 1: CURSOR
  • Evaluation 2: CURSOR FAST_FORWARD
  • Evaluation 3: FAST_FORWARD LOCAL CURSOR
  • Evaluation 4: WHILE LOOP WITH @TABLE
asked by anonymous 26.03.2015 / 20:37

2 answers


Cursors vs. Memory Table

Cursors are almost always sub-optimal in terms of performance, but there are exceptions, and depending on the case the difference may not be significant for your problem.

In all the tests involving cursors that I have followed, I have never seen a situation where they performed better, but there are reports on the internet that say otherwise for very specific situations. Examples here and here.

I noticed that you avoided some of these problems by declaring the cursor with FAST_FORWARD, since that allows SQL Server to build a more optimized plan for reading the data.
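To illustrate the point, here is a minimal sketch of the same declaration without and with the option (the cursor names are only illustrative); FAST_FORWARD is documented as a FORWARD_ONLY, READ_ONLY cursor with performance optimizations enabled:

-- Without options the cursor is updatable by default, which restricts
-- the execution plans SQL Server can choose.
DECLARE cur_padrao CURSOR FOR
    SELECT Coluna1, Coluna2, Coluna3 FROM CursorTeste;

-- LOCAL FAST_FORWARD declares a forward-only, read-only cursor with
-- performance optimizations enabled, allowing a cheaper read-only plan.
DECLARE cur_rapido CURSOR LOCAL FAST_FORWARD FOR
    SELECT Coluna1, Coluna2, Coluna3 FROM CursorTeste;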

The main problem with the in-memory table is its heavy use of memory, which ends up limiting the number of records that can be processed per run.

About your performance test

Your detailed performance test shows that there is no real performance gain between the two solutions for this particular case. With that level of detail, it is almost stating the obvious.

So use whichever form is easier for you to maintain. In that respect the cursor wins a little, because the code is more compact and intuitive.

The performance difference between the two solutions may also be insignificant if the operation you perform on each row consumes a lot of resources; in such a context both solutions may be inadequate.

When you can use a single query

Whenever you can, prefer queries that operate on the data set rather than on each row individually.

For example, imagine that we have tables A, B, and C, and we want to insert a record into C for each relationship between A and B. We could then write a query like this:

INSERT INTO C (C1, C2)
SELECT A1, B2
FROM A 
JOIN B ON B.FK = A.PK
    
answered 26.03.2015 / 22:29

As many people have said here many times: you should only try to solve known performance problems. And potential performance problems (those not yet known, but suspected to appear) can well be simulated in advance, so that you then have a known problem to solve.
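As a minimal sketch of how to turn a suspicion into a measurement, you can wrap each variant of the test in SQL Server's time and I/O statistics (the comments below mark where each version of the loop would go):

SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- run the CURSOR version of the loop here ...
-- run the table-variable version of the loop here ...

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;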

However, in your example with the table variable you run one query to fill the variable and then another query for each row loaded into it, whereas with the cursor you execute a single query and then traverse its result row by row.

Considering your examples, because of the number of operations involved, the cursor will have at least the same performance, and will tend to be cheaper and have a shorter response time than the table variable.

But note that there are other alternatives to the cursor besides the one you proposed. For example, if the original query returns a column with unique values, such as an id, or a set of columns that together uniquely identify each record, you can repeat the original query inside the loop, varying this unique identifier, instead of pre-populating a table variable.
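A minimal sketch of that alternative for the CursorTeste table from the question, using the clustered CursorTesteID as the unique identifier (variable names are illustrative):

DECLARE @id int = 0;   -- last key already processed
DECLARE @coluna1 uniqueidentifier, @coluna2 uniqueidentifier, @coluna3 uniqueidentifier;

WHILE (1 = 1)
BEGIN
    -- Repeat the original query, but only for the next key after the last one seen.
    SELECT TOP (1)
        @id      = CursorTesteID,
        @coluna1 = Coluna1,
        @coluna2 = Coluna2,
        @coluna3 = Coluna3
    FROM CursorTeste
    WHERE CursorTesteID > @id
    ORDER BY CursorTesteID;

    IF (@@ROWCOUNT = 0) BREAK;   -- no more rows

    PRINT '{ ' + cast(@coluna1 as varchar(50)) + ' } - { ' + cast(@coluna2 as varchar(50)) + ' } - { ' + cast(@coluna3 as varchar(50)) + ' }';
END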

In addition, only use loops if there is no solution other than processing the query result row by row. Otherwise, remember that SQL Server specializes in joining tables and in reading and processing large sets of records at once. The server will always do this with better performance than issuing multiple commands, and better than using cursors. And when a single command would demand impossibly convoluted logic, you can still use temporary tables to split the processing into parts, with each part acting on a large mass of records at once.
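A sketch of that idea, reusing the A/B/C example from the other answer (table and column names are illustrative): an intermediate result is materialized in a temporary table, and each statement still works on the whole set instead of row by row.

-- Step 1: materialize the intermediate result for the whole set at once.
SELECT A.PK AS A_PK, A.A1, B.B2
INTO #intermediario
FROM A
JOIN B ON B.FK = A.PK;

-- (any extra set-based adjustments on #intermediario would go here)

-- Step 2: still a single set-based command, no row-by-row loop.
INSERT INTO C (C1, C2)
SELECT A1, B2
FROM #intermediario;

DROP TABLE #intermediario;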

If you need to update millions of rows at once, you may need to process them in batches (say, 200,000 rows at a time), because the log file (which is used to keep transactions in progress) can blow up the disk space. But all of this can be verified with tests beforehand.
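A minimal sketch of that batching, assuming a hypothetical dbo.Pedidos table with a Status column (all names are illustrative); each UPDATE touches at most 200,000 rows, so each transaction, and therefore the log, stays small:

WHILE (1 = 1)
BEGIN
    -- Update at most 200,000 pending rows per iteration.
    UPDATE TOP (200000) dbo.Pedidos
    SET Status = 'Processado'
    WHERE Status = 'Pendente';

    IF (@@ROWCOUNT = 0) BREAK;   -- nothing left to update
END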

    
answered 26.03.2015 / 22:24