Why is the auto created statistic on this column empty?












7















Info



My question relates to a moderately big table (~40GB data space) that is a heap
(Unfortunately, I am not allowed to add a clustered index to the table by the application owners)



An auto created statistic on an Identity column (ID) was created, but is empty.




  • Auto create stats & auto update stats are on

  • Modifications have happened in the table

  • There are other (auto created) statistics that are getting updated

  • There is another statistic on the same column created by an index (duplicate)

  • Build: 12.0.5546


The duplicate statistic is getting updated:
enter image description here



The actual question



To my understanding, all stats could be used and modifications are tracked, even if there are two statistics on exactly the same columns (duplicates), so why does this statistic remain empty?



Stats Info



enter image description here



DB stat info



enter image description here



Table Size



enter image description here



Column Information where the statistic is created on



enter image description here



[ID] [int] IDENTITY(1,1) NOT NULL


Identity column



select * from sys.stats  
where name like '%_WA_Sys_0000000A_6B7099F3%';


enter image description hereAuto created



Getting some info on another statistic



select * From sys.dm_db_stats_properties (1802541555, 3)  


enter image description here



In comparison with my empty stat:



enter image description here



Stats + Histogram from "generate scripts":



/****** Object:  Statistic [_WA_Sys_0000000A_6B7099F3]    Script Date: 2/1/2019 10:18:19 AM ******/

CREATE STATISTICS [_WA_Sys_0000000A_6B7099F3] ON [dbo].[table]([ID]) WITH STATS_STREAM = 0x01000000010000000000000000000000EC03686B0000000040000000000000000000000000000000380348063800000004000A00000000000000000000000000


When creating a copy of the stats, no data is inside



CREATE STATISTICS [_WA_Sys_0000000A_6B7099F3_TEST] ON [dbo].[table]([ID]) WITH STATS_STREAM = 0x01000000010000000000000000000000EC03686B0000000040000000000000000000000000000000380348063800000004000A00000000000000000000000000


enter image description here



When manually updating the stat they do get updated.



UPDATE STATISTICS [dbo].[Table]([_WA_Sys_0000000A_6B7099F3_TEST])


enter image description here










share|improve this question





























    7















    Info



    My question relates to a moderately big table (~40GB data space) that is a heap
    (Unfortunately, I am not allowed to add a clustered index to the table by the application owners)



    An auto created statistic on an Identity column (ID) was created, but is empty.




    • Auto create stats & auto update stats are on

    • Modifications have happened in the table

    • There are other (auto created) statistics that are getting updated

    • There is another statistic on the same column created by an index (duplicate)

    • Build: 12.0.5546


    The duplicate statistic is getting updated:
    enter image description here



    The actual question



    To my understanding, all stats could be used and modifications are tracked, even if there are two statistics on exactly the same columns (duplicates), so why does this statistic remain empty?



    Stats Info



    enter image description here



    DB stat info



    enter image description here



    Table Size



    enter image description here



    Column Information where the statistic is created on



    enter image description here



    [ID] [int] IDENTITY(1,1) NOT NULL


    Identity column



    select * from sys.stats  
    where name like '%_WA_Sys_0000000A_6B7099F3%';


    enter image description hereAuto created



    Getting some info on another statistic



    select * From sys.dm_db_stats_properties (1802541555, 3)  


    enter image description here



    In comparison with my empty stat:



    enter image description here



    Stats + Histogram from "generate scripts":



    /****** Object:  Statistic [_WA_Sys_0000000A_6B7099F3]    Script Date: 2/1/2019 10:18:19 AM ******/

    CREATE STATISTICS [_WA_Sys_0000000A_6B7099F3] ON [dbo].[table]([ID]) WITH STATS_STREAM = 0x01000000010000000000000000000000EC03686B0000000040000000000000000000000000000000380348063800000004000A00000000000000000000000000


    When creating a copy of the stats, no data is inside



    CREATE STATISTICS [_WA_Sys_0000000A_6B7099F3_TEST] ON [dbo].[table]([ID]) WITH STATS_STREAM = 0x01000000010000000000000000000000EC03686B0000000040000000000000000000000000000000380348063800000004000A00000000000000000000000000


    enter image description here



    When manually updating the stat they do get updated.



    UPDATE STATISTICS [dbo].[Table]([_WA_Sys_0000000A_6B7099F3_TEST])


    enter image description here










    share|improve this question



























      7












      7








      7


      2






      Info



      My question relates to a moderately big table (~40GB data space) that is a heap
      (Unfortunately, I am not allowed to add a clustered index to the table by the application owners)



      An auto created statistic on an Identity column (ID) was created, but is empty.




      • Auto create stats & auto update stats are on

      • Modifications have happened in the table

      • There are other (auto created) statistics that are getting updated

      • There is another statistic on the same column created by an index (duplicate)

      • Build: 12.0.5546


      The duplicate statistic is getting updated:
      enter image description here



      The actual question



      To my understanding, all stats could be used and modifications are tracked, even if there are two statistics on exactly the same columns (duplicates), so why does this statistic remain empty?



      Stats Info



      enter image description here



      DB stat info



      enter image description here



      Table Size



      enter image description here



      Column Information where the statistic is created on



      enter image description here



      [ID] [int] IDENTITY(1,1) NOT NULL


      Identity column



      select * from sys.stats  
      where name like '%_WA_Sys_0000000A_6B7099F3%';


      enter image description hereAuto created



      Getting some info on another statistic



      select * From sys.dm_db_stats_properties (1802541555, 3)  


      enter image description here



      In comparison with my empty stat:



      enter image description here



      Stats + Histogram from "generate scripts":



      /****** Object:  Statistic [_WA_Sys_0000000A_6B7099F3]    Script Date: 2/1/2019 10:18:19 AM ******/

      CREATE STATISTICS [_WA_Sys_0000000A_6B7099F3] ON [dbo].[table]([ID]) WITH STATS_STREAM = 0x01000000010000000000000000000000EC03686B0000000040000000000000000000000000000000380348063800000004000A00000000000000000000000000


      When creating a copy of the stats, no data is inside



      CREATE STATISTICS [_WA_Sys_0000000A_6B7099F3_TEST] ON [dbo].[table]([ID]) WITH STATS_STREAM = 0x01000000010000000000000000000000EC03686B0000000040000000000000000000000000000000380348063800000004000A00000000000000000000000000


      enter image description here



      When manually updating the stat they do get updated.



      UPDATE STATISTICS [dbo].[Table]([_WA_Sys_0000000A_6B7099F3_TEST])


      enter image description here










      share|improve this question
















      Info



      My question relates to a moderately big table (~40GB data space) that is a heap
      (Unfortunately, I am not allowed to add a clustered index to the table by the application owners)



      An auto created statistic on an Identity column (ID) was created, but is empty.




      • Auto create stats & auto update stats are on

      • Modifications have happened in the table

      • There are other (auto created) statistics that are getting updated

      • There is another statistic on the same column created by an index (duplicate)

      • Build: 12.0.5546


      The duplicate statistic is getting updated:
      enter image description here



      The actual question



      To my understanding, all stats could be used and modifications are tracked, even if there are two statistics on exactly the same columns (duplicates), so why does this statistic remain empty?



      Stats Info



      enter image description here



      DB stat info



      enter image description here



      Table Size



      enter image description here



      Column Information where the statistic is created on



      enter image description here



      [ID] [int] IDENTITY(1,1) NOT NULL


      Identity column



      select * from sys.stats  
      where name like '%_WA_Sys_0000000A_6B7099F3%';


      enter image description hereAuto created



      Getting some info on another statistic



      select * From sys.dm_db_stats_properties (1802541555, 3)  


      enter image description here



      In comparison with my empty stat:



      enter image description here



      Stats + Histogram from "generate scripts":



      /****** Object:  Statistic [_WA_Sys_0000000A_6B7099F3]    Script Date: 2/1/2019 10:18:19 AM ******/

      CREATE STATISTICS [_WA_Sys_0000000A_6B7099F3] ON [dbo].[table]([ID]) WITH STATS_STREAM = 0x01000000010000000000000000000000EC03686B0000000040000000000000000000000000000000380348063800000004000A00000000000000000000000000


      When creating a copy of the stats, no data is inside



      CREATE STATISTICS [_WA_Sys_0000000A_6B7099F3_TEST] ON [dbo].[table]([ID]) WITH STATS_STREAM = 0x01000000010000000000000000000000EC03686B0000000040000000000000000000000000000000380348063800000004000A00000000000000000000000000


      enter image description here



      When manually updating the stat they do get updated.



      UPDATE STATISTICS [dbo].[Table]([_WA_Sys_0000000A_6B7099F3_TEST])


      enter image description here







      sql-server sql-server-2014 statistics






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Feb 1 at 12:15









      Paul White

      52.6k14281455




      52.6k14281455










      asked Feb 1 at 9:02









      Randi VertongenRandi Vertongen

      2,896721




      2,896721






















          1 Answer
          1






          active

          oldest

          votes


















          8














          I was able to reproduce this, both with an empty statistic, and a populated statistic. I arranged for an automatic statistic to be created on an empty table, and the index was created later:



          IF OBJECT_ID(N'dbo.Heap', N'U') IS NOT NULL
          BEGIN
          DROP TABLE dbo.Heap;
          END;
          GO
          CREATE TABLE dbo.Heap
          (
          id integer NOT NULL IDENTITY,
          val integer NOT NULL,
          );
          GO
          -- Add 1000 rows
          INSERT dbo.Heap
          WITH (TABLOCKX)
          (val)
          SELECT
          SV.number
          FROM master.dbo.spt_values AS SV
          WHERE
          SV.[type] = N'P'
          AND SV.number BETWEEN 1 AND 1000;
          GO
          SELECT COUNT_BIG(*)
          FROM dbo.Heap AS H
          JOIN dbo.Heap AS H2
          ON H2.id = H.id
          WHERE H.id > 0
          AND H2.id > 0;
          GO
          -- Empty table
          TRUNCATE TABLE dbo.Heap;
          GO
          -- Repeat exact same query (RT = 500 + 0.2 * 1000 = 700)
          GO
          SELECT COUNT_BIG(*)
          FROM dbo.Heap AS H
          JOIN dbo.Heap AS H2
          ON H2.id = H.id
          WHERE H.id > 0
          AND H2.id > 0;
          GO
          -- Add 1000 rows
          INSERT dbo.Heap
          WITH (TABLOCKX)
          (val)
          SELECT
          SV.number
          FROM master.dbo.spt_values AS SV
          WHERE
          SV.[type] = N'P'
          AND SV.number BETWEEN 1 AND 1000;
          GO
          -- Add index
          ALTER TABLE dbo.Heap ADD
          CONSTRAINT [PK dbo.Heap id]
          PRIMARY KEY NONCLUSTERED (id);
          GO
          SELECT
          S.[name],
          S.auto_created,
          DDSP.stats_id,
          DDSP.last_updated,
          DDSP.[rows],
          DDSP.rows_sampled,
          DDSP.steps,
          DDSP.unfiltered_rows,
          DDSP.modification_counter
          FROM sys.stats AS S
          CROSS APPLY sys.dm_db_stats_properties(S.[object_id], S.stats_id) AS DDSP
          WHERE
          S.[object_id] = OBJECT_ID(N'dbo.Heap', N'U');


          Output



          I found that modifications continue to be tracked accurately on all non-empty duplicates, but only one statistic is updated automatically (regardless of the asynchronous setting).



          Automatic statistics updates only occur when the query optimizer needs a particular statistic, and finds that it is out of date (an optimality-related recompile).



          The optimizer chooses from duplicate statistics as mentioned in the Plan Caching and Recompilations in SQL Server 2012 paper:




          An issue not directly related to the topic of this document is: given multiple statistics on the same set of columns in the same order, how does the query optimizer decide which ones to load during query optimization? The answer is not simple, but the query optimizer uses such guidelines as: Give preference to recent statistics over older statistics; Give preference to statistics computed using FULLSCAN option to those computed using sampling; and so on.




          The point being that the optimizer chooses one of the available duplicate statistics (the "best one), and that one is automatically updated if found to be stale.



          I believe this is a change in behaviour from older releases - or at least the documentation suggests that all out-of-date statistics for an object would be updated as part of this process, but I have no idea when this changed. It was certainly after August 2013 when Matt Bowler posted Duplicate Statistics, which contains a handy AdventureWorks-based repo. That script now results in only one of the statistics objects being updated, while at the time both were.



          The above explanation matches all the behaviours I observed while attempting to reproduce your scenario, but I doubt it is explicitly documented anywhere. It does seem like a sensible optimization, since there is little value in keeping duplicates fully updated.



          This is probably all at a level of detail below that which Microsoft are willing to support. This also means it could change without notice.






          share|improve this answer

























            Your Answer








            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "182"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228656%2fwhy-is-the-auto-created-statistic-on-this-column-empty%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            8














            I was able to reproduce this, both with an empty statistic, and a populated statistic. I arranged for an automatic statistic to be created on an empty table, and the index was created later:



            IF OBJECT_ID(N'dbo.Heap', N'U') IS NOT NULL
            BEGIN
            DROP TABLE dbo.Heap;
            END;
            GO
            CREATE TABLE dbo.Heap
            (
            id integer NOT NULL IDENTITY,
            val integer NOT NULL,
            );
            GO
            -- Add 1000 rows
            INSERT dbo.Heap
            WITH (TABLOCKX)
            (val)
            SELECT
            SV.number
            FROM master.dbo.spt_values AS SV
            WHERE
            SV.[type] = N'P'
            AND SV.number BETWEEN 1 AND 1000;
            GO
            SELECT COUNT_BIG(*)
            FROM dbo.Heap AS H
            JOIN dbo.Heap AS H2
            ON H2.id = H.id
            WHERE H.id > 0
            AND H2.id > 0;
            GO
            -- Empty table
            TRUNCATE TABLE dbo.Heap;
            GO
            -- Repeat exact same query (RT = 500 + 0.2 * 1000 = 700)
            GO
            SELECT COUNT_BIG(*)
            FROM dbo.Heap AS H
            JOIN dbo.Heap AS H2
            ON H2.id = H.id
            WHERE H.id > 0
            AND H2.id > 0;
            GO
            -- Add 1000 rows
            INSERT dbo.Heap
            WITH (TABLOCKX)
            (val)
            SELECT
            SV.number
            FROM master.dbo.spt_values AS SV
            WHERE
            SV.[type] = N'P'
            AND SV.number BETWEEN 1 AND 1000;
            GO
            -- Add index
            ALTER TABLE dbo.Heap ADD
            CONSTRAINT [PK dbo.Heap id]
            PRIMARY KEY NONCLUSTERED (id);
            GO
            SELECT
            S.[name],
            S.auto_created,
            DDSP.stats_id,
            DDSP.last_updated,
            DDSP.[rows],
            DDSP.rows_sampled,
            DDSP.steps,
            DDSP.unfiltered_rows,
            DDSP.modification_counter
            FROM sys.stats AS S
            CROSS APPLY sys.dm_db_stats_properties(S.[object_id], S.stats_id) AS DDSP
            WHERE
            S.[object_id] = OBJECT_ID(N'dbo.Heap', N'U');


            Output



            I found that modifications continue to be tracked accurately on all non-empty duplicates, but only one statistic is updated automatically (regardless of the asynchronous setting).



            Automatic statistics updates only occur when the query optimizer needs a particular statistic, and finds that it is out of date (an optimality-related recompile).



            The optimizer chooses from duplicate statistics as mentioned in the Plan Caching and Recompilations in SQL Server 2012 paper:




            An issue not directly related to the topic of this document is: given multiple statistics on the same set of columns in the same order, how does the query optimizer decide which ones to load during query optimization? The answer is not simple, but the query optimizer uses such guidelines as: Give preference to recent statistics over older statistics; Give preference to statistics computed using FULLSCAN option to those computed using sampling; and so on.




            The point being that the optimizer chooses one of the available duplicate statistics (the "best one), and that one is automatically updated if found to be stale.



            I believe this is a change in behaviour from older releases - or at least the documentation suggests that all out-of-date statistics for an object would be updated as part of this process, but I have no idea when this changed. It was certainly after August 2013 when Matt Bowler posted Duplicate Statistics, which contains a handy AdventureWorks-based repo. That script now results in only one of the statistics objects being updated, while at the time both were.



            The above explanation matches all the behaviours I observed while attempting to reproduce your scenario, but I doubt it is explicitly documented anywhere. It does seem like a sensible optimization, since there is little value in keeping duplicates fully updated.



            This is probably all at a level of detail below that which Microsoft are willing to support. This also means it could change without notice.






            share|improve this answer






























              8














              I was able to reproduce this, both with an empty statistic, and a populated statistic. I arranged for an automatic statistic to be created on an empty table, and the index was created later:



              IF OBJECT_ID(N'dbo.Heap', N'U') IS NOT NULL
              BEGIN
              DROP TABLE dbo.Heap;
              END;
              GO
              CREATE TABLE dbo.Heap
              (
              id integer NOT NULL IDENTITY,
              val integer NOT NULL,
              );
              GO
              -- Add 1000 rows
              INSERT dbo.Heap
              WITH (TABLOCKX)
              (val)
              SELECT
              SV.number
              FROM master.dbo.spt_values AS SV
              WHERE
              SV.[type] = N'P'
              AND SV.number BETWEEN 1 AND 1000;
              GO
              SELECT COUNT_BIG(*)
              FROM dbo.Heap AS H
              JOIN dbo.Heap AS H2
              ON H2.id = H.id
              WHERE H.id > 0
              AND H2.id > 0;
              GO
              -- Empty table
              TRUNCATE TABLE dbo.Heap;
              GO
              -- Repeat exact same query (RT = 500 + 0.2 * 1000 = 700)
              GO
              SELECT COUNT_BIG(*)
              FROM dbo.Heap AS H
              JOIN dbo.Heap AS H2
              ON H2.id = H.id
              WHERE H.id > 0
              AND H2.id > 0;
              GO
              -- Add 1000 rows
              INSERT dbo.Heap
              WITH (TABLOCKX)
              (val)
              SELECT
              SV.number
              FROM master.dbo.spt_values AS SV
              WHERE
              SV.[type] = N'P'
              AND SV.number BETWEEN 1 AND 1000;
              GO
              -- Add index
              ALTER TABLE dbo.Heap ADD
              CONSTRAINT [PK dbo.Heap id]
              PRIMARY KEY NONCLUSTERED (id);
              GO
              SELECT
              S.[name],
              S.auto_created,
              DDSP.stats_id,
              DDSP.last_updated,
              DDSP.[rows],
              DDSP.rows_sampled,
              DDSP.steps,
              DDSP.unfiltered_rows,
              DDSP.modification_counter
              FROM sys.stats AS S
              CROSS APPLY sys.dm_db_stats_properties(S.[object_id], S.stats_id) AS DDSP
              WHERE
              S.[object_id] = OBJECT_ID(N'dbo.Heap', N'U');


              Output



              I found that modifications continue to be tracked accurately on all non-empty duplicates, but only one statistic is updated automatically (regardless of the asynchronous setting).



              Automatic statistics updates only occur when the query optimizer needs a particular statistic, and finds that it is out of date (an optimality-related recompile).



              The optimizer chooses from duplicate statistics as mentioned in the Plan Caching and Recompilations in SQL Server 2012 paper:




              An issue not directly related to the topic of this document is: given multiple statistics on the same set of columns in the same order, how does the query optimizer decide which ones to load during query optimization? The answer is not simple, but the query optimizer uses such guidelines as: Give preference to recent statistics over older statistics; Give preference to statistics computed using FULLSCAN option to those computed using sampling; and so on.




              The point being that the optimizer chooses one of the available duplicate statistics (the "best one), and that one is automatically updated if found to be stale.



              I believe this is a change in behaviour from older releases - or at least the documentation suggests that all out-of-date statistics for an object would be updated as part of this process, but I have no idea when this changed. It was certainly after August 2013 when Matt Bowler posted Duplicate Statistics, which contains a handy AdventureWorks-based repo. That script now results in only one of the statistics objects being updated, while at the time both were.



              The above explanation matches all the behaviours I observed while attempting to reproduce your scenario, but I doubt it is explicitly documented anywhere. It does seem like a sensible optimization, since there is little value in keeping duplicates fully updated.



              This is probably all at a level of detail below that which Microsoft are willing to support. This also means it could change without notice.






              share|improve this answer




























                8












                8








                8







                I was able to reproduce this, both with an empty statistic, and a populated statistic. I arranged for an automatic statistic to be created on an empty table, and the index was created later:



                IF OBJECT_ID(N'dbo.Heap', N'U') IS NOT NULL
                BEGIN
                DROP TABLE dbo.Heap;
                END;
                GO
                CREATE TABLE dbo.Heap
                (
                id integer NOT NULL IDENTITY,
                val integer NOT NULL,
                );
                GO
                -- Add 1000 rows
                INSERT dbo.Heap
                WITH (TABLOCKX)
                (val)
                SELECT
                SV.number
                FROM master.dbo.spt_values AS SV
                WHERE
                SV.[type] = N'P'
                AND SV.number BETWEEN 1 AND 1000;
                GO
                SELECT COUNT_BIG(*)
                FROM dbo.Heap AS H
                JOIN dbo.Heap AS H2
                ON H2.id = H.id
                WHERE H.id > 0
                AND H2.id > 0;
                GO
                -- Empty table
                TRUNCATE TABLE dbo.Heap;
                GO
                -- Repeat exact same query (RT = 500 + 0.2 * 1000 = 700)
                GO
                SELECT COUNT_BIG(*)
                FROM dbo.Heap AS H
                JOIN dbo.Heap AS H2
                ON H2.id = H.id
                WHERE H.id > 0
                AND H2.id > 0;
                GO
                -- Add 1000 rows
                INSERT dbo.Heap
                WITH (TABLOCKX)
                (val)
                SELECT
                SV.number
                FROM master.dbo.spt_values AS SV
                WHERE
                SV.[type] = N'P'
                AND SV.number BETWEEN 1 AND 1000;
                GO
                -- Add index
                ALTER TABLE dbo.Heap ADD
                CONSTRAINT [PK dbo.Heap id]
                PRIMARY KEY NONCLUSTERED (id);
                GO
                SELECT
                S.[name],
                S.auto_created,
                DDSP.stats_id,
                DDSP.last_updated,
                DDSP.[rows],
                DDSP.rows_sampled,
                DDSP.steps,
                DDSP.unfiltered_rows,
                DDSP.modification_counter
                FROM sys.stats AS S
                CROSS APPLY sys.dm_db_stats_properties(S.[object_id], S.stats_id) AS DDSP
                WHERE
                S.[object_id] = OBJECT_ID(N'dbo.Heap', N'U');


                Output



                I found that modifications continue to be tracked accurately on all non-empty duplicates, but only one statistic is updated automatically (regardless of the asynchronous setting).



                Automatic statistics updates only occur when the query optimizer needs a particular statistic, and finds that it is out of date (an optimality-related recompile).



                The optimizer chooses from duplicate statistics as mentioned in the Plan Caching and Recompilations in SQL Server 2012 paper:




                An issue not directly related to the topic of this document is: given multiple statistics on the same set of columns in the same order, how does the query optimizer decide which ones to load during query optimization? The answer is not simple, but the query optimizer uses such guidelines as: Give preference to recent statistics over older statistics; Give preference to statistics computed using FULLSCAN option to those computed using sampling; and so on.




                The point being that the optimizer chooses one of the available duplicate statistics (the "best one), and that one is automatically updated if found to be stale.



                I believe this is a change in behaviour from older releases - or at least the documentation suggests that all out-of-date statistics for an object would be updated as part of this process, but I have no idea when this changed. It was certainly after August 2013 when Matt Bowler posted Duplicate Statistics, which contains a handy AdventureWorks-based repo. That script now results in only one of the statistics objects being updated, while at the time both were.



                The above explanation matches all the behaviours I observed while attempting to reproduce your scenario, but I doubt it is explicitly documented anywhere. It does seem like a sensible optimization, since there is little value in keeping duplicates fully updated.



                This is probably all at a level of detail below that which Microsoft are willing to support. This also means it could change without notice.






                share|improve this answer















                I was able to reproduce this, both with an empty statistic, and a populated statistic. I arranged for an automatic statistic to be created on an empty table, and the index was created later:



                IF OBJECT_ID(N'dbo.Heap', N'U') IS NOT NULL
                BEGIN
                DROP TABLE dbo.Heap;
                END;
                GO
                CREATE TABLE dbo.Heap
                (
                id integer NOT NULL IDENTITY,
                val integer NOT NULL,
                );
                GO
                -- Add 1000 rows
                INSERT dbo.Heap
                WITH (TABLOCKX)
                (val)
                SELECT
                SV.number
                FROM master.dbo.spt_values AS SV
                WHERE
                SV.[type] = N'P'
                AND SV.number BETWEEN 1 AND 1000;
                GO
                SELECT COUNT_BIG(*)
                FROM dbo.Heap AS H
                JOIN dbo.Heap AS H2
                ON H2.id = H.id
                WHERE H.id > 0
                AND H2.id > 0;
                GO
                -- Empty table
                TRUNCATE TABLE dbo.Heap;
                GO
                -- Repeat exact same query (RT = 500 + 0.2 * 1000 = 700)
                GO
                SELECT COUNT_BIG(*)
                FROM dbo.Heap AS H
                JOIN dbo.Heap AS H2
                ON H2.id = H.id
                WHERE H.id > 0
                AND H2.id > 0;
                GO
                -- Add 1000 rows
                INSERT dbo.Heap
                WITH (TABLOCKX)
                (val)
                SELECT
                SV.number
                FROM master.dbo.spt_values AS SV
                WHERE
                SV.[type] = N'P'
                AND SV.number BETWEEN 1 AND 1000;
                GO
                -- Add index
                ALTER TABLE dbo.Heap ADD
                CONSTRAINT [PK dbo.Heap id]
                PRIMARY KEY NONCLUSTERED (id);
                GO
                SELECT
                S.[name],
                S.auto_created,
                DDSP.stats_id,
                DDSP.last_updated,
                DDSP.[rows],
                DDSP.rows_sampled,
                DDSP.steps,
                DDSP.unfiltered_rows,
                DDSP.modification_counter
                FROM sys.stats AS S
                CROSS APPLY sys.dm_db_stats_properties(S.[object_id], S.stats_id) AS DDSP
                WHERE
                S.[object_id] = OBJECT_ID(N'dbo.Heap', N'U');


                Output



                I found that modifications continue to be tracked accurately on all non-empty duplicates, but only one statistic is updated automatically (regardless of the asynchronous setting).



                Automatic statistics updates only occur when the query optimizer needs a particular statistic, and finds that it is out of date (an optimality-related recompile).



                The optimizer chooses from duplicate statistics as mentioned in the Plan Caching and Recompilations in SQL Server 2012 paper:




                An issue not directly related to the topic of this document is: given multiple statistics on the same set of columns in the same order, how does the query optimizer decide which ones to load during query optimization? The answer is not simple, but the query optimizer uses such guidelines as: Give preference to recent statistics over older statistics; Give preference to statistics computed using FULLSCAN option to those computed using sampling; and so on.




                The point being that the optimizer chooses one of the available duplicate statistics (the "best one), and that one is automatically updated if found to be stale.



                I believe this is a change in behaviour from older releases - or at least the documentation suggests that all out-of-date statistics for an object would be updated as part of this process, but I have no idea when this changed. It was certainly after August 2013 when Matt Bowler posted Duplicate Statistics, which contains a handy AdventureWorks-based repo. That script now results in only one of the statistics objects being updated, while at the time both were.



                The above explanation matches all the behaviours I observed while attempting to reproduce your scenario, but I doubt it is explicitly documented anywhere. It does seem like a sensible optimization, since there is little value in keeping duplicates fully updated.



                This is probably all at a level of detail below that which Microsoft are willing to support. This also means it could change without notice.







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Feb 1 at 12:59

























                answered Feb 1 at 12:15









                Paul WhitePaul White

                52.6k14281455




                52.6k14281455






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Database Administrators Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228656%2fwhy-is-the-auto-created-statistic-on-this-column-empty%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Probability when a professor distributes a quiz and homework assignment to a class of n students.

                    Aardman Animations

                    Are they similar matrix