Why is a temp table a more efficient solution to the Halloween Problem than an eager spool?












13















Consider the following query that inserts rows from a source table only if they aren't already in the target table:



INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);


One possible plan shape includes a merge join and an eager spool. The eager spool operator is present to solve the Halloween Problem:



first plan



On my machine, the above code executes in about 6900 ms. Repro code to create the tables is included at the bottom of the question. If I'm dissatisfied with performance I might try to load the rows to be inserted into a temp table instead of relying on the eager spool. Here's one possible implementation:



DROP TABLE IF EXISTS #CONSULTANT_RECOMMENDED_TEMP_TABLE;
CREATE TABLE #CONSULTANT_RECOMMENDED_TEMP_TABLE (
ID BIGINT,
PRIMARY KEY (ID)
);

INSERT INTO #CONSULTANT_RECOMMENDED_TEMP_TABLE WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);

INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT new_rows.ID
FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
OPTION (MAXDOP 1);


The new code executes in about 4400 ms. I can get actual plans and use Actual Time Statistics™ to examine where time is spent at the operator level. Note that asking for an actual plan adds significant overhead for these queries so totals will not match the previous results.



╔═════════════╦═════════════╦══════════════╗
║ operator ║ first query ║ second query ║
╠═════════════╬═════════════╬══════════════╣
║ big scan ║ 1771 ║ 1744 ║
║ little scan ║ 163 ║ 166 ║
║ sort ║ 531 ║ 530 ║
║ merge join ║ 709 ║ 669 ║
║ spool ║ 3202 ║ N/A ║
║ temp insert ║ N/A ║ 422 ║
║ temp scan ║ N/A ║ 187 ║
║ insert ║ 3122 ║ 1545 ║
╚═════════════╩═════════════╩══════════════╝


The query plan with the eager spool seems to spend significantly more time on the insert and spool operators compared to the plan that uses the temp table.



Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway? I believe I am looking for answers that focus on internals. I'm able to see how the call stacks are different but can't figure out the big picture.



I am on SQL Server 2017 CU 11 in case someone wants to know. Here is code to populate the tables used in the above queries:



DROP TABLE IF EXISTS dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR;

CREATE TABLE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR (
ID BIGINT NOT NULL,
PRIMARY KEY (ID)
);

INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT TOP (20000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
OPTION (MAXDOP 1);


DROP TABLE IF EXISTS dbo.A_HEAP_OF_MOSTLY_NEW_ROWS;

CREATE TABLE dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (
ID BIGINT NOT NULL
);

INSERT INTO dbo.A_HEAP_OF_MOSTLY_NEW_ROWS WITH (TABLOCK)
SELECT TOP (1900000) 19999999 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;









share|improve this question























  • Spool has to work with bags not just sets, and even with sets there's no guarantee it knows any keys. Note that in your temp table, you not only asserted it was a set but also provided an index supporting that, by declaring the primary key.

    – Damien_The_Unbeliever
    Feb 26 at 9:29
















13















Consider the following query that inserts rows from a source table only if they aren't already in the target table:



INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);


One possible plan shape includes a merge join and an eager spool. The eager spool operator is present to solve the Halloween Problem:



first plan



On my machine, the above code executes in about 6900 ms. Repro code to create the tables is included at the bottom of the question. If I'm dissatisfied with performance I might try to load the rows to be inserted into a temp table instead of relying on the eager spool. Here's one possible implementation:



DROP TABLE IF EXISTS #CONSULTANT_RECOMMENDED_TEMP_TABLE;
CREATE TABLE #CONSULTANT_RECOMMENDED_TEMP_TABLE (
ID BIGINT,
PRIMARY KEY (ID)
);

INSERT INTO #CONSULTANT_RECOMMENDED_TEMP_TABLE WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);

INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT new_rows.ID
FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
OPTION (MAXDOP 1);


The new code executes in about 4400 ms. I can get actual plans and use Actual Time Statistics™ to examine where time is spent at the operator level. Note that asking for an actual plan adds significant overhead for these queries so totals will not match the previous results.



╔═════════════╦═════════════╦══════════════╗
║ operator ║ first query ║ second query ║
╠═════════════╬═════════════╬══════════════╣
║ big scan ║ 1771 ║ 1744 ║
║ little scan ║ 163 ║ 166 ║
║ sort ║ 531 ║ 530 ║
║ merge join ║ 709 ║ 669 ║
║ spool ║ 3202 ║ N/A ║
║ temp insert ║ N/A ║ 422 ║
║ temp scan ║ N/A ║ 187 ║
║ insert ║ 3122 ║ 1545 ║
╚═════════════╩═════════════╩══════════════╝


The query plan with the eager spool seems to spend significantly more time on the insert and spool operators compared to the plan that uses the temp table.



Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway? I believe I am looking for answers that focus on internals. I'm able to see how the call stacks are different but can't figure out the big picture.



I am on SQL Server 2017 CU 11 in case someone wants to know. Here is code to populate the tables used in the above queries:



DROP TABLE IF EXISTS dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR;

CREATE TABLE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR (
ID BIGINT NOT NULL,
PRIMARY KEY (ID)
);

INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT TOP (20000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
OPTION (MAXDOP 1);


DROP TABLE IF EXISTS dbo.A_HEAP_OF_MOSTLY_NEW_ROWS;

CREATE TABLE dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (
ID BIGINT NOT NULL
);

INSERT INTO dbo.A_HEAP_OF_MOSTLY_NEW_ROWS WITH (TABLOCK)
SELECT TOP (1900000) 19999999 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;









share|improve this question























  • Spool has to work with bags not just sets, and even with sets there's no guarantee it knows any keys. Note that in your temp table, you not only asserted it was a set but also provided an index supporting that, by declaring the primary key.

    – Damien_The_Unbeliever
    Feb 26 at 9:29














13












13








13


2






Consider the following query that inserts rows from a source table only if they aren't already in the target table:



INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);


One possible plan shape includes a merge join and an eager spool. The eager spool operator is present to solve the Halloween Problem:



first plan



On my machine, the above code executes in about 6900 ms. Repro code to create the tables is included at the bottom of the question. If I'm dissatisfied with performance I might try to load the rows to be inserted into a temp table instead of relying on the eager spool. Here's one possible implementation:



DROP TABLE IF EXISTS #CONSULTANT_RECOMMENDED_TEMP_TABLE;
CREATE TABLE #CONSULTANT_RECOMMENDED_TEMP_TABLE (
ID BIGINT,
PRIMARY KEY (ID)
);

INSERT INTO #CONSULTANT_RECOMMENDED_TEMP_TABLE WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);

INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT new_rows.ID
FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
OPTION (MAXDOP 1);


The new code executes in about 4400 ms. I can get actual plans and use Actual Time Statistics™ to examine where time is spent at the operator level. Note that asking for an actual plan adds significant overhead for these queries so totals will not match the previous results.



╔═════════════╦═════════════╦══════════════╗
║ operator ║ first query ║ second query ║
╠═════════════╬═════════════╬══════════════╣
║ big scan ║ 1771 ║ 1744 ║
║ little scan ║ 163 ║ 166 ║
║ sort ║ 531 ║ 530 ║
║ merge join ║ 709 ║ 669 ║
║ spool ║ 3202 ║ N/A ║
║ temp insert ║ N/A ║ 422 ║
║ temp scan ║ N/A ║ 187 ║
║ insert ║ 3122 ║ 1545 ║
╚═════════════╩═════════════╩══════════════╝


The query plan with the eager spool seems to spend significantly more time on the insert and spool operators compared to the plan that uses the temp table.



Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway? I believe I am looking for answers that focus on internals. I'm able to see how the call stacks are different but can't figure out the big picture.



I am on SQL Server 2017 CU 11 in case someone wants to know. Here is code to populate the tables used in the above queries:



DROP TABLE IF EXISTS dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR;

CREATE TABLE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR (
ID BIGINT NOT NULL,
PRIMARY KEY (ID)
);

INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT TOP (20000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
OPTION (MAXDOP 1);


DROP TABLE IF EXISTS dbo.A_HEAP_OF_MOSTLY_NEW_ROWS;

CREATE TABLE dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (
ID BIGINT NOT NULL
);

INSERT INTO dbo.A_HEAP_OF_MOSTLY_NEW_ROWS WITH (TABLOCK)
SELECT TOP (1900000) 19999999 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;









share|improve this question














Consider the following query that inserts rows from a source table only if they aren't already in the target table:



INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);


One possible plan shape includes a merge join and an eager spool. The eager spool operator is present to solve the Halloween Problem:



first plan



On my machine, the above code executes in about 6900 ms. Repro code to create the tables is included at the bottom of the question. If I'm dissatisfied with performance I might try to load the rows to be inserted into a temp table instead of relying on the eager spool. Here's one possible implementation:



DROP TABLE IF EXISTS #CONSULTANT_RECOMMENDED_TEMP_TABLE;
CREATE TABLE #CONSULTANT_RECOMMENDED_TEMP_TABLE (
ID BIGINT,
PRIMARY KEY (ID)
);

INSERT INTO #CONSULTANT_RECOMMENDED_TEMP_TABLE WITH (TABLOCK)
SELECT maybe_new_rows.ID
FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
WHERE NOT EXISTS (
SELECT 1
FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
WHERE maybe_new_rows.ID = halloween.ID
)
OPTION (MAXDOP 1, QUERYTRACEON 7470);

INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT new_rows.ID
FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
OPTION (MAXDOP 1);


The new code executes in about 4400 ms. I can get actual plans and use Actual Time Statistics™ to examine where time is spent at the operator level. Note that asking for an actual plan adds significant overhead for these queries so totals will not match the previous results.



╔═════════════╦═════════════╦══════════════╗
║ operator ║ first query ║ second query ║
╠═════════════╬═════════════╬══════════════╣
║ big scan ║ 1771 ║ 1744 ║
║ little scan ║ 163 ║ 166 ║
║ sort ║ 531 ║ 530 ║
║ merge join ║ 709 ║ 669 ║
║ spool ║ 3202 ║ N/A ║
║ temp insert ║ N/A ║ 422 ║
║ temp scan ║ N/A ║ 187 ║
║ insert ║ 3122 ║ 1545 ║
╚═════════════╩═════════════╩══════════════╝


The query plan with the eager spool seems to spend significantly more time on the insert and spool operators compared to the plan that uses the temp table.



Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway? I believe I am looking for answers that focus on internals. I'm able to see how the call stacks are different but can't figure out the big picture.



I am on SQL Server 2017 CU 11 in case someone wants to know. Here is code to populate the tables used in the above queries:



DROP TABLE IF EXISTS dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR;

CREATE TABLE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR (
ID BIGINT NOT NULL,
PRIMARY KEY (ID)
);

INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
SELECT TOP (20000000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2
CROSS JOIN master..spt_values t3
OPTION (MAXDOP 1);


DROP TABLE IF EXISTS dbo.A_HEAP_OF_MOSTLY_NEW_ROWS;

CREATE TABLE dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (
ID BIGINT NOT NULL
);

INSERT INTO dbo.A_HEAP_OF_MOSTLY_NEW_ROWS WITH (TABLOCK)
SELECT TOP (1900000) 19999999 + ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values t1
CROSS JOIN master..spt_values t2;






sql-server sql-server-2017 database-internals






share|improve this question













share|improve this question











share|improve this question




share|improve this question










asked Feb 26 at 1:27









Joe ObbishJoe Obbish

21.6k43190




21.6k43190













  • Spool has to work with bags not just sets, and even with sets there's no guarantee it knows any keys. Note that in your temp table, you not only asserted it was a set but also provided an index supporting that, by declaring the primary key.

    – Damien_The_Unbeliever
    Feb 26 at 9:29



















  • Spool has to work with bags not just sets, and even with sets there's no guarantee it knows any keys. Note that in your temp table, you not only asserted it was a set but also provided an index supporting that, by declaring the primary key.

    – Damien_The_Unbeliever
    Feb 26 at 9:29

















Spool has to work with bags not just sets, and even with sets there's no guarantee it knows any keys. Note that in your temp table, you not only asserted it was a set but also provided an index supporting that, by declaring the primary key.

– Damien_The_Unbeliever
Feb 26 at 9:29





Spool has to work with bags not just sets, and even with sets there's no guarantee it knows any keys. Note that in your temp table, you not only asserted it was a set but also provided an index supporting that, by declaring the primary key.

– Damien_The_Unbeliever
Feb 26 at 9:29










2 Answers
2






active

oldest

votes


















13














This is what I call Manual Halloween Protection.



You can find an example of it being used with an update statement in my article Optimizing Update Queries. One has to be a bit careful to preserve the same semantics, for example by locking the target table against all concurrent modifications while the separate queries execute, if that is relevant in your scenario.




Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway?




A spool has some of the characteristics of a temporary table, but the two are not exact equivalents. In particular, a spool is essentially a row-by-row unordered insert to a b-tree structure. It does benefit from locking and logging optimizations, but does not support bulk load optimizations.



Consequently, one can often get better performance by splitting the query in a natural way: Bulk loading the new rows into a temporary table or variable, then performing an optimized insert (without explicit Halloween Protection) from the temporary object.



Making this separation also allows you extra freedom to tune the read and write portions of the original statement separately.



As a side note, it is interesting to think about how the Halloween Problem might be addressed using row versions. Perhaps a future version of SQL Server will provide that feature in suitable circumstances.





As Michael Kutz alluded to in a comment, you could also explore the possibility of exploiting the hole-filling optimization to avoid explicit HP. One way to achieve this for the demo is to create a unique index (clustered if you like) on the ID column of A_HEAP_OF_MOSTLY_NEW_ROWS.



CREATE UNIQUE INDEX i ON dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (ID);


With that guarantee in place the optimizer can use hole-filling and rowset sharing:



MERGE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (SERIALIZABLE) AS HICETY
USING dbo.A_HEAP_OF_MOSTLY_NEW_ROWS AS AHOMNR
ON AHOMNR.ID = HICETY.ID
WHEN NOT MATCHED BY TARGET
THEN INSERT (ID) VALUES (AHOMNR.ID);


MERGE plan



While interesting, you will still be able to achieve better performance in many cases by employing carefully-implemented Manual Halloween Protection.






share|improve this answer































    4














    To expand on Paul's answer a bit, part of the difference in elapsed time between the spool and the temp table approaches seems to come down to the lack of support for the DML Request Sort option in the spool plan. With undocumented trace flag 8795, the elapsed time for the temp table approach jumps from 4400 ms to 5600 ms.



    INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
    SELECT new_rows.ID
    FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
    OPTION (MAXDOP 1, QUERYTRACEON 8795);


    Note that this isn't exactly equivalent to the insert performed by the spool plan. This query writes significantly more data to the transaction log.



    The same effect can be seen in reverse with some trickery. It is possible to encourage SQL Server to use a sort instead of a spool for Halloween Protection. One implementation:



    INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
    SELECT TOP (987654321)
    maybe_new_rows.ID
    FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
    WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
    WHERE maybe_new_rows.ID = halloween.ID
    )
    ORDER BY maybe_new_rows.ID, maybe_new_rows.ID + 1
    OPTION (MAXDOP 1, QUERYTRACEON 7470, MERGE JOIN);


    Now the plan has a TOP N Sort operator in place of the spool. The sort is a blocking operator so the spool is no longer necessary:



    enter image description here



    More importantly, we now have support for the DML Request Sort option. Looking at Actual Time Statistics again, the insert operator now only takes 1623 ms. The entire plan takes about 5400 ms to execute without requesting an actual plan.



    As Hugo explains, the Eager Spool operator does preserve order. That can most easily be seen with a TOP PERCENT plan. It's unfortunate that the original query with the spool cannot take better advantage of the sorted nature of the data in the spool.






    share|improve this answer
























      Your Answer








      StackExchange.ready(function() {
      var channelOptions = {
      tags: "".split(" "),
      id: "182"
      };
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function() {
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled) {
      StackExchange.using("snippets", function() {
      createEditor();
      });
      }
      else {
      createEditor();
      }
      });

      function createEditor() {
      StackExchange.prepareEditor({
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader: {
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      },
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      });


      }
      });














      draft saved

      draft discarded


















      StackExchange.ready(
      function () {
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f230722%2fwhy-is-a-temp-table-a-more-efficient-solution-to-the-halloween-problem-than-an-e%23new-answer', 'question_page');
      }
      );

      Post as a guest















      Required, but never shown

























      2 Answers
      2






      active

      oldest

      votes








      2 Answers
      2






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      13














      This is what I call Manual Halloween Protection.



      You can find an example of it being used with an update statement in my article Optimizing Update Queries. One has to be a bit careful to preserve the same semantics, for example by locking the target table against all concurrent modifications while the separate queries execute, if that is relevant in your scenario.




      Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway?




      A spool has some of the characteristics of a temporary table, but the two are not exact equivalents. In particular, a spool is essentially a row-by-row unordered insert to a b-tree structure. It does benefit from locking and logging optimizations, but does not support bulk load optimizations.



      Consequently, one can often get better performance by splitting the query in a natural way: Bulk loading the new rows into a temporary table or variable, then performing an optimized insert (without explicit Halloween Protection) from the temporary object.



      Making this separation also allows you extra freedom to tune the read and write portions of the original statement separately.



      As a side note, it is interesting to think about how the Halloween Problem might be addressed using row versions. Perhaps a future version of SQL Server will provide that feature in suitable circumstances.





      As Michael Kutz alluded to in a comment, you could also explore the possibility of exploiting the hole-filling optimization to avoid explicit HP. One way to achieve this for the demo is to create a unique index (clustered if you like) on the ID column of A_HEAP_OF_MOSTLY_NEW_ROWS.



      CREATE UNIQUE INDEX i ON dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (ID);


      With that guarantee in place the optimizer can use hole-filling and rowset sharing:



      MERGE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (SERIALIZABLE) AS HICETY
      USING dbo.A_HEAP_OF_MOSTLY_NEW_ROWS AS AHOMNR
      ON AHOMNR.ID = HICETY.ID
      WHEN NOT MATCHED BY TARGET
      THEN INSERT (ID) VALUES (AHOMNR.ID);


      MERGE plan



      While interesting, you will still be able to achieve better performance in many cases by employing carefully-implemented Manual Halloween Protection.






      share|improve this answer




























        13














        This is what I call Manual Halloween Protection.



        You can find an example of it being used with an update statement in my article Optimizing Update Queries. One has to be a bit careful to preserve the same semantics, for example by locking the target table against all concurrent modifications while the separate queries execute, if that is relevant in your scenario.




        Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway?




        A spool has some of the characteristics of a temporary table, but the two are not exact equivalents. In particular, a spool is essentially a row-by-row unordered insert to a b-tree structure. It does benefit from locking and logging optimizations, but does not support bulk load optimizations.



        Consequently, one can often get better performance by splitting the query in a natural way: Bulk loading the new rows into a temporary table or variable, then performing an optimized insert (without explicit Halloween Protection) from the temporary object.



        Making this separation also allows you extra freedom to tune the read and write portions of the original statement separately.



        As a side note, it is interesting to think about how the Halloween Problem might be addressed using row versions. Perhaps a future version of SQL Server will provide that feature in suitable circumstances.





        As Michael Kutz alluded to in a comment, you could also explore the possibility of exploiting the hole-filling optimization to avoid explicit HP. One way to achieve this for the demo is to create a unique index (clustered if you like) on the ID column of A_HEAP_OF_MOSTLY_NEW_ROWS.



        CREATE UNIQUE INDEX i ON dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (ID);


        With that guarantee in place the optimizer can use hole-filling and rowset sharing:



        MERGE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (SERIALIZABLE) AS HICETY
        USING dbo.A_HEAP_OF_MOSTLY_NEW_ROWS AS AHOMNR
        ON AHOMNR.ID = HICETY.ID
        WHEN NOT MATCHED BY TARGET
        THEN INSERT (ID) VALUES (AHOMNR.ID);


        MERGE plan



        While interesting, you will still be able to achieve better performance in many cases by employing carefully-implemented Manual Halloween Protection.






        share|improve this answer


























          13












          13








          13







          This is what I call Manual Halloween Protection.



          You can find an example of it being used with an update statement in my article Optimizing Update Queries. One has to be a bit careful to preserve the same semantics, for example by locking the target table against all concurrent modifications while the separate queries execute, if that is relevant in your scenario.




          Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway?




          A spool has some of the characteristics of a temporary table, but the two are not exact equivalents. In particular, a spool is essentially a row-by-row unordered insert to a b-tree structure. It does benefit from locking and logging optimizations, but does not support bulk load optimizations.



          Consequently, one can often get better performance by splitting the query in a natural way: Bulk loading the new rows into a temporary table or variable, then performing an optimized insert (without explicit Halloween Protection) from the temporary object.



          Making this separation also allows you extra freedom to tune the read and write portions of the original statement separately.



          As a side note, it is interesting to think about how the Halloween Problem might be addressed using row versions. Perhaps a future version of SQL Server will provide that feature in suitable circumstances.





          As Michael Kutz alluded to in a comment, you could also explore the possibility of exploiting the hole-filling optimization to avoid explicit HP. One way to achieve this for the demo is to create a unique index (clustered if you like) on the ID column of A_HEAP_OF_MOSTLY_NEW_ROWS.



          CREATE UNIQUE INDEX i ON dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (ID);


          With that guarantee in place the optimizer can use hole-filling and rowset sharing:



          MERGE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (SERIALIZABLE) AS HICETY
          USING dbo.A_HEAP_OF_MOSTLY_NEW_ROWS AS AHOMNR
          ON AHOMNR.ID = HICETY.ID
          WHEN NOT MATCHED BY TARGET
          THEN INSERT (ID) VALUES (AHOMNR.ID);


          MERGE plan



          While interesting, you will still be able to achieve better performance in many cases by employing carefully-implemented Manual Halloween Protection.






          share|improve this answer













          This is what I call Manual Halloween Protection.



          You can find an example of it being used with an update statement in my article Optimizing Update Queries. One has to be a bit careful to preserve the same semantics, for example by locking the target table against all concurrent modifications while the separate queries execute, if that is relevant in your scenario.




          Why is the plan with the temp table more efficient? Isn't an eager spool mostly just an internal temp table anyway?




          A spool has some of the characteristics of a temporary table, but the two are not exact equivalents. In particular, a spool is essentially a row-by-row unordered insert to a b-tree structure. It does benefit from locking and logging optimizations, but does not support bulk load optimizations.



          Consequently, one can often get better performance by splitting the query in a natural way: Bulk loading the new rows into a temporary table or variable, then performing an optimized insert (without explicit Halloween Protection) from the temporary object.



          Making this separation also allows you extra freedom to tune the read and write portions of the original statement separately.



          As a side note, it is interesting to think about how the Halloween Problem might be addressed using row versions. Perhaps a future version of SQL Server will provide that feature in suitable circumstances.





          As Michael Kutz alluded to in a comment, you could also explore the possibility of exploiting the hole-filling optimization to avoid explicit HP. One way to achieve this for the demo is to create a unique index (clustered if you like) on the ID column of A_HEAP_OF_MOSTLY_NEW_ROWS.



          CREATE UNIQUE INDEX i ON dbo.A_HEAP_OF_MOSTLY_NEW_ROWS (ID);


          With that guarantee in place the optimizer can use hole-filling and rowset sharing:



          MERGE dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (SERIALIZABLE) AS HICETY
          USING dbo.A_HEAP_OF_MOSTLY_NEW_ROWS AS AHOMNR
          ON AHOMNR.ID = HICETY.ID
          WHEN NOT MATCHED BY TARGET
          THEN INSERT (ID) VALUES (AHOMNR.ID);


          MERGE plan



          While interesting, you will still be able to achieve better performance in many cases by employing carefully-implemented Manual Halloween Protection.







          share|improve this answer












          share|improve this answer



          share|improve this answer










          answered Feb 26 at 4:10









          Paul WhitePaul White

          53.9k14287459




          53.9k14287459

























              4














              To expand on Paul's answer a bit, part of the difference in elapsed time between the spool and the temp table approaches seems to come down to the lack of support for the DML Request Sort option in the spool plan. With undocumented trace flag 8795, the elapsed time for the temp table approach jumps from 4400 ms to 5600 ms.



              INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
              SELECT new_rows.ID
              FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
              OPTION (MAXDOP 1, QUERYTRACEON 8795);


              Note that this isn't exactly equivalent to the insert performed by the spool plan. This query writes significantly more data to the transaction log.



              The same effect can be seen in reverse with some trickery. It is possible to encourage SQL Server to use a sort instead of a spool for Halloween Protection. One implementation:



              INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
              SELECT TOP (987654321)
              maybe_new_rows.ID
              FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
              WHERE NOT EXISTS (
              SELECT 1
              FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
              WHERE maybe_new_rows.ID = halloween.ID
              )
              ORDER BY maybe_new_rows.ID, maybe_new_rows.ID + 1
              OPTION (MAXDOP 1, QUERYTRACEON 7470, MERGE JOIN);


              Now the plan has a TOP N Sort operator in place of the spool. The sort is a blocking operator so the spool is no longer necessary:



              enter image description here



              More importantly, we now have support for the DML Request Sort option. Looking at Actual Time Statistics again, the insert operator now only takes 1623 ms. The entire plan takes about 5400 ms to execute without requesting an actual plan.



              As Hugo explains, the Eager Spool operator does preserve order. That can most easily be seen with a TOP PERCENT plan. It's unfortunate that the original query with the spool cannot take better advantage of the sorted nature of the data in the spool.






              share|improve this answer




























                4














                To expand on Paul's answer a bit, part of the difference in elapsed time between the spool and the temp table approaches seems to come down to the lack of support for the DML Request Sort option in the spool plan. With undocumented trace flag 8795, the elapsed time for the temp table approach jumps from 4400 ms to 5600 ms.



                INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
                SELECT new_rows.ID
                FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
                OPTION (MAXDOP 1, QUERYTRACEON 8795);


                Note that this isn't exactly equivalent to the insert performed by the spool plan. This query writes significantly more data to the transaction log.



                The same effect can be seen in reverse with some trickery. It is possible to encourage SQL Server to use a sort instead of a spool for Halloween Protection. One implementation:



                INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
                SELECT TOP (987654321)
                maybe_new_rows.ID
                FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
                WHERE NOT EXISTS (
                SELECT 1
                FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
                WHERE maybe_new_rows.ID = halloween.ID
                )
                ORDER BY maybe_new_rows.ID, maybe_new_rows.ID + 1
                OPTION (MAXDOP 1, QUERYTRACEON 7470, MERGE JOIN);


                Now the plan has a TOP N Sort operator in place of the spool. The sort is a blocking operator so the spool is no longer necessary:



                enter image description here



                More importantly, we now have support for the DML Request Sort option. Looking at Actual Time Statistics again, the insert operator now only takes 1623 ms. The entire plan takes about 5400 ms to execute without requesting an actual plan.



                As Hugo explains, the Eager Spool operator does preserve order. That can most easily be seen with a TOP PERCENT plan. It's unfortunate that the original query with the spool cannot take better advantage of the sorted nature of the data in the spool.






                share|improve this answer


























                  4












                  4








                  4







                  To expand on Paul's answer a bit, part of the difference in elapsed time between the spool and the temp table approaches seems to come down to the lack of support for the DML Request Sort option in the spool plan. With undocumented trace flag 8795, the elapsed time for the temp table approach jumps from 4400 ms to 5600 ms.



                  INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
                  SELECT new_rows.ID
                  FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
                  OPTION (MAXDOP 1, QUERYTRACEON 8795);


                  Note that this isn't exactly equivalent to the insert performed by the spool plan. This query writes significantly more data to the transaction log.



                  The same effect can be seen in reverse with some trickery. It is possible to encourage SQL Server to use a sort instead of a spool for Halloween Protection. One implementation:



                  INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
                  SELECT TOP (987654321)
                  maybe_new_rows.ID
                  FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
                  WHERE NOT EXISTS (
                  SELECT 1
                  FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
                  WHERE maybe_new_rows.ID = halloween.ID
                  )
                  ORDER BY maybe_new_rows.ID, maybe_new_rows.ID + 1
                  OPTION (MAXDOP 1, QUERYTRACEON 7470, MERGE JOIN);


                  Now the plan has a TOP N Sort operator in place of the spool. The sort is a blocking operator so the spool is no longer necessary:



                  enter image description here



                  More importantly, we now have support for the DML Request Sort option. Looking at Actual Time Statistics again, the insert operator now only takes 1623 ms. The entire plan takes about 5400 ms to execute without requesting an actual plan.



                  As Hugo explains, the Eager Spool operator does preserve order. That can most easily be seen with a TOP PERCENT plan. It's unfortunate that the original query with the spool cannot take better advantage of the sorted nature of the data in the spool.






                  share|improve this answer













                  To expand on Paul's answer a bit, part of the difference in elapsed time between the spool and the temp table approaches seems to come down to the lack of support for the DML Request Sort option in the spool plan. With undocumented trace flag 8795, the elapsed time for the temp table approach jumps from 4400 ms to 5600 ms.



                  INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
                  SELECT new_rows.ID
                  FROM #CONSULTANT_RECOMMENDED_TEMP_TABLE new_rows
                  OPTION (MAXDOP 1, QUERYTRACEON 8795);


                  Note that this isn't exactly equivalent to the insert performed by the spool plan. This query writes significantly more data to the transaction log.



                  The same effect can be seen in reverse with some trickery. It is possible to encourage SQL Server to use a sort instead of a spool for Halloween Protection. One implementation:



                  INSERT INTO dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR WITH (TABLOCK)
                  SELECT TOP (987654321)
                  maybe_new_rows.ID
                  FROM dbo.A_HEAP_OF_MOSTLY_NEW_ROWS maybe_new_rows
                  WHERE NOT EXISTS (
                  SELECT 1
                  FROM dbo.HALLOWEEN_IS_COMING_EARLY_THIS_YEAR halloween
                  WHERE maybe_new_rows.ID = halloween.ID
                  )
                  ORDER BY maybe_new_rows.ID, maybe_new_rows.ID + 1
                  OPTION (MAXDOP 1, QUERYTRACEON 7470, MERGE JOIN);


                  Now the plan has a TOP N Sort operator in place of the spool. The sort is a blocking operator so the spool is no longer necessary:



                  enter image description here



                  More importantly, we now have support for the DML Request Sort option. Looking at Actual Time Statistics again, the insert operator now only takes 1623 ms. The entire plan takes about 5400 ms to execute without requesting an actual plan.



                  As Hugo explains, the Eager Spool operator does preserve order. That can most easily be seen with a TOP PERCENT plan. It's unfortunate that the original query with the spool cannot take better advantage of the sorted nature of the data in the spool.







                  share|improve this answer












                  share|improve this answer



                  share|improve this answer










                  answered Feb 27 at 5:02









                  Joe ObbishJoe Obbish

                  21.6k43190




                  21.6k43190






























                      draft saved

                      draft discarded




















































                      Thanks for contributing an answer to Database Administrators Stack Exchange!


                      • Please be sure to answer the question. Provide details and share your research!

                      But avoid



                      • Asking for help, clarification, or responding to other answers.

                      • Making statements based on opinion; back them up with references or personal experience.


                      To learn more, see our tips on writing great answers.




                      draft saved


                      draft discarded














                      StackExchange.ready(
                      function () {
                      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f230722%2fwhy-is-a-temp-table-a-more-efficient-solution-to-the-halloween-problem-than-an-e%23new-answer', 'question_page');
                      }
                      );

                      Post as a guest















                      Required, but never shown





















































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown

































                      Required, but never shown














                      Required, but never shown












                      Required, but never shown







                      Required, but never shown







                      Popular posts from this blog

                      How do I know what Microsoft account the skydrive app is syncing to?

                      When does type information flow backwards in C++?

                      Grease: Live!