How to prove the lack of implicit order in a database?












20















Recently I was explaining to colleagues the importance of having a column by which to sort data in a database table if it is necessary to do so, for example for chronologically-ordered data. This proved somewhat difficult because they could simply re-run their query seemingly endlessly and it would always return the same set of rows in the same order.



I have noticed this before and all I could really do is insist that they trust me and not simply assume that a database table will behave like a traditional CSV or Excel file.



For example, executing the (PostgreSQL) query



create table mytable (
id INTEGER PRIMARY KEY,
data TEXT
);
INSERT INTO mytable VALUES
(0, 'a'),
(1, 'b'),
(2, 'c'),
(3, 'd'),
(4, 'e'),
(5, 'f'),
(6, 'g'),
(7, 'h'),
(8, 'i'),
(9, 'j');


will create a table with a clear conceptual order. Selecting that same data in the simplest way would be:



SELECT * FROM mytable;


Always gives me the following results:



 id | data 
----+------
0 | a
1 | b
2 | c
3 | d
4 | e
5 | f
6 | g
7 | h
8 | i
9 | j
(10 rows)


I can do this over and over again and it will always return to me the same data in the same order. However I know that this implicit order can be broken, I have seen it before, particularly in large datasets, where some random value will get apparently thrown up into the "wrong" place when selected. But it has occurred to me that I don't know how this happens or how to reproduce it. I find it difficult to get results on Google because the search query tends to just return general help on sorting result sets.



So, my questions are essentially these:




  1. How can I demonstrably and concretely prove that the return order of rows from a query without an ORDER BY statement is not reliable, preferably by causing and showing a breakdown of the implicit order even when the table in question is not updated or edited?


  2. Does it make any difference at all if the data is only inserted once en masse and then never updated again?



I would prefer a postgres-based answer since that is the one I am most familiar with but I'm more interested in the theory itself.










share|improve this question




















  • 6





    “Never written to or updated again” - why is this a table? Sounds like a file. Or an enum. Or something that doesn’t need to be in a database. If it’s chronological, isn’t there a date column to order by? If chronology matters you’d think that info would be important enough to have in the table. Anyway, plans can change due to someone dropping or creating a new index, or events like memory changes, trace flags, or other influences. Their argument sounds like “I never wear my seatbelt and I’ve never gone through my windshield, so I will continue not wearing my seatbelt.” :-(

    – Aaron Bertrand
    Jan 25 at 22:20






  • 9





    Some logic problems just can’t be solved technically or without HR involvement. If your company wants to allow developer practices that rely on believing in voodoo and ignoring the documentation, and your use case really is limited to a tiny table that is never ever updated, just let them have their way and update your resume. It’s not worth the arguing.

    – Aaron Bertrand
    Jan 25 at 22:25






  • 1





    You have no basis to claim "will always". You can only claim "has always", "when I checked". The language has a definition--that is the contract with the user.

    – philipxy
    Jan 25 at 22:57








  • 10





    I'm curious why these colleagues of yours are against adding the order by clause to their queries? Are they trying to save on source code storage? keyboard wear and tear? time it takes to type the dreaded clause?

    – mustaccio
    Jan 25 at 23:21








  • 2





    I've always thought that database engines should randomly permute the first few rows of queries for which the semantics don't guarantee an ordering, to help facilitate testing.

    – Doug McClean
    Jan 28 at 2:41
















20















Recently I was explaining to colleagues the importance of having a column by which to sort data in a database table if it is necessary to do so, for example for chronologically-ordered data. This proved somewhat difficult because they could simply re-run their query seemingly endlessly and it would always return the same set of rows in the same order.



I have noticed this before and all I could really do is insist that they trust me and not simply assume that a database table will behave like a traditional CSV or Excel file.



For example, executing the (PostgreSQL) query



create table mytable (
id INTEGER PRIMARY KEY,
data TEXT
);
INSERT INTO mytable VALUES
(0, 'a'),
(1, 'b'),
(2, 'c'),
(3, 'd'),
(4, 'e'),
(5, 'f'),
(6, 'g'),
(7, 'h'),
(8, 'i'),
(9, 'j');


will create a table with a clear conceptual order. Selecting that same data in the simplest way would be:



SELECT * FROM mytable;


Always gives me the following results:



 id | data 
----+------
0 | a
1 | b
2 | c
3 | d
4 | e
5 | f
6 | g
7 | h
8 | i
9 | j
(10 rows)


I can do this over and over again and it will always return to me the same data in the same order. However I know that this implicit order can be broken, I have seen it before, particularly in large datasets, where some random value will get apparently thrown up into the "wrong" place when selected. But it has occurred to me that I don't know how this happens or how to reproduce it. I find it difficult to get results on Google because the search query tends to just return general help on sorting result sets.



So, my questions are essentially these:




  1. How can I demonstrably and concretely prove that the return order of rows from a query without an ORDER BY statement is not reliable, preferably by causing and showing a breakdown of the implicit order even when the table in question is not updated or edited?


  2. Does it make any difference at all if the data is only inserted once en masse and then never updated again?



I would prefer a postgres-based answer since that is the one I am most familiar with but I'm more interested in the theory itself.










share|improve this question




















  • 6





    “Never written to or updated again” - why is this a table? Sounds like a file. Or an enum. Or something that doesn’t need to be in a database. If it’s chronological, isn’t there a date column to order by? If chronology matters you’d think that info would be important enough to have in the table. Anyway, plans can change due to someone dropping or creating a new index, or events like memory changes, trace flags, or other influences. Their argument sounds like “I never wear my seatbelt and I’ve never gone through my windshield, so I will continue not wearing my seatbelt.” :-(

    – Aaron Bertrand
    Jan 25 at 22:20






  • 9





    Some logic problems just can’t be solved technically or without HR involvement. If your company wants to allow developer practices that rely on believing in voodoo and ignoring the documentation, and your use case really is limited to a tiny table that is never ever updated, just let them have their way and update your resume. It’s not worth the arguing.

    – Aaron Bertrand
    Jan 25 at 22:25






  • 1





    You have no basis to claim "will always". You can only claim "has always", "when I checked". The language has a definition--that is the contract with the user.

    – philipxy
    Jan 25 at 22:57








  • 10





    I'm curious why these colleagues of yours are against adding the order by clause to their queries? Are they trying to save on source code storage? keyboard wear and tear? time it takes to type the dreaded clause?

    – mustaccio
    Jan 25 at 23:21








  • 2





    I've always thought that database engines should randomly permute the first few rows of queries for which the semantics don't guarantee an ordering, to help facilitate testing.

    – Doug McClean
    Jan 28 at 2:41














20












20








20


5






Recently I was explaining to colleagues the importance of having a column by which to sort data in a database table if it is necessary to do so, for example for chronologically-ordered data. This proved somewhat difficult because they could simply re-run their query seemingly endlessly and it would always return the same set of rows in the same order.



I have noticed this before and all I could really do is insist that they trust me and not simply assume that a database table will behave like a traditional CSV or Excel file.



For example, executing the (PostgreSQL) query



create table mytable (
id INTEGER PRIMARY KEY,
data TEXT
);
INSERT INTO mytable VALUES
(0, 'a'),
(1, 'b'),
(2, 'c'),
(3, 'd'),
(4, 'e'),
(5, 'f'),
(6, 'g'),
(7, 'h'),
(8, 'i'),
(9, 'j');


will create a table with a clear conceptual order. Selecting that same data in the simplest way would be:



SELECT * FROM mytable;


Always gives me the following results:



 id | data 
----+------
0 | a
1 | b
2 | c
3 | d
4 | e
5 | f
6 | g
7 | h
8 | i
9 | j
(10 rows)


I can do this over and over again and it will always return to me the same data in the same order. However I know that this implicit order can be broken, I have seen it before, particularly in large datasets, where some random value will get apparently thrown up into the "wrong" place when selected. But it has occurred to me that I don't know how this happens or how to reproduce it. I find it difficult to get results on Google because the search query tends to just return general help on sorting result sets.



So, my questions are essentially these:




  1. How can I demonstrably and concretely prove that the return order of rows from a query without an ORDER BY statement is not reliable, preferably by causing and showing a breakdown of the implicit order even when the table in question is not updated or edited?


  2. Does it make any difference at all if the data is only inserted once en masse and then never updated again?



I would prefer a postgres-based answer since that is the one I am most familiar with but I'm more interested in the theory itself.










share|improve this question
















Recently I was explaining to colleagues the importance of having a column by which to sort data in a database table if it is necessary to do so, for example for chronologically-ordered data. This proved somewhat difficult because they could simply re-run their query seemingly endlessly and it would always return the same set of rows in the same order.



I have noticed this before and all I could really do is insist that they trust me and not simply assume that a database table will behave like a traditional CSV or Excel file.



For example, executing the (PostgreSQL) query



create table mytable (
id INTEGER PRIMARY KEY,
data TEXT
);
INSERT INTO mytable VALUES
(0, 'a'),
(1, 'b'),
(2, 'c'),
(3, 'd'),
(4, 'e'),
(5, 'f'),
(6, 'g'),
(7, 'h'),
(8, 'i'),
(9, 'j');


will create a table with a clear conceptual order. Selecting that same data in the simplest way would be:



SELECT * FROM mytable;


Always gives me the following results:



 id | data 
----+------
0 | a
1 | b
2 | c
3 | d
4 | e
5 | f
6 | g
7 | h
8 | i
9 | j
(10 rows)


I can do this over and over again and it will always return to me the same data in the same order. However I know that this implicit order can be broken, I have seen it before, particularly in large datasets, where some random value will get apparently thrown up into the "wrong" place when selected. But it has occurred to me that I don't know how this happens or how to reproduce it. I find it difficult to get results on Google because the search query tends to just return general help on sorting result sets.



So, my questions are essentially these:




  1. How can I demonstrably and concretely prove that the return order of rows from a query without an ORDER BY statement is not reliable, preferably by causing and showing a breakdown of the implicit order even when the table in question is not updated or edited?


  2. Does it make any difference at all if the data is only inserted once en masse and then never updated again?



I would prefer a postgres-based answer since that is the one I am most familiar with but I'm more interested in the theory itself.







order-by data-integrity






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Jan 26 at 10:38

























asked Jan 25 at 21:20







anon















  • 6





    “Never written to or updated again” - why is this a table? Sounds like a file. Or an enum. Or something that doesn’t need to be in a database. If it’s chronological, isn’t there a date column to order by? If chronology matters you’d think that info would be important enough to have in the table. Anyway, plans can change due to someone dropping or creating a new index, or events like memory changes, trace flags, or other influences. Their argument sounds like “I never wear my seatbelt and I’ve never gone through my windshield, so I will continue not wearing my seatbelt.” :-(

    – Aaron Bertrand
    Jan 25 at 22:20






  • 9





    Some logic problems just can’t be solved technically or without HR involvement. If your company wants to allow developer practices that rely on believing in voodoo and ignoring the documentation, and your use case really is limited to a tiny table that is never ever updated, just let them have their way and update your resume. It’s not worth the arguing.

    – Aaron Bertrand
    Jan 25 at 22:25






  • 1





    You have no basis to claim "will always". You can only claim "has always", "when I checked". The language has a definition--that is the contract with the user.

    – philipxy
    Jan 25 at 22:57








  • 10





    I'm curious why these colleagues of yours are against adding the order by clause to their queries? Are they trying to save on source code storage? keyboard wear and tear? time it takes to type the dreaded clause?

    – mustaccio
    Jan 25 at 23:21








  • 2





    I've always thought that database engines should randomly permute the first few rows of queries for which the semantics don't guarantee an ordering, to help facilitate testing.

    – Doug McClean
    Jan 28 at 2:41














  • 6





    “Never written to or updated again” - why is this a table? Sounds like a file. Or an enum. Or something that doesn’t need to be in a database. If it’s chronological, isn’t there a date column to order by? If chronology matters you’d think that info would be important enough to have in the table. Anyway, plans can change due to someone dropping or creating a new index, or events like memory changes, trace flags, or other influences. Their argument sounds like “I never wear my seatbelt and I’ve never gone through my windshield, so I will continue not wearing my seatbelt.” :-(

    – Aaron Bertrand
    Jan 25 at 22:20






  • 9





    Some logic problems just can’t be solved technically or without HR involvement. If your company wants to allow developer practices that rely on believing in voodoo and ignoring the documentation, and your use case really is limited to a tiny table that is never ever updated, just let them have their way and update your resume. It’s not worth the arguing.

    – Aaron Bertrand
    Jan 25 at 22:25






  • 1





    You have no basis to claim "will always". You can only claim "has always", "when I checked". The language has a definition--that is the contract with the user.

    – philipxy
    Jan 25 at 22:57








  • 10





    I'm curious why these colleagues of yours are against adding the order by clause to their queries? Are they trying to save on source code storage? keyboard wear and tear? time it takes to type the dreaded clause?

    – mustaccio
    Jan 25 at 23:21








  • 2





    I've always thought that database engines should randomly permute the first few rows of queries for which the semantics don't guarantee an ordering, to help facilitate testing.

    – Doug McClean
    Jan 28 at 2:41








6




6





“Never written to or updated again” - why is this a table? Sounds like a file. Or an enum. Or something that doesn’t need to be in a database. If it’s chronological, isn’t there a date column to order by? If chronology matters you’d think that info would be important enough to have in the table. Anyway, plans can change due to someone dropping or creating a new index, or events like memory changes, trace flags, or other influences. Their argument sounds like “I never wear my seatbelt and I’ve never gone through my windshield, so I will continue not wearing my seatbelt.” :-(

– Aaron Bertrand
Jan 25 at 22:20





“Never written to or updated again” - why is this a table? Sounds like a file. Or an enum. Or something that doesn’t need to be in a database. If it’s chronological, isn’t there a date column to order by? If chronology matters you’d think that info would be important enough to have in the table. Anyway, plans can change due to someone dropping or creating a new index, or events like memory changes, trace flags, or other influences. Their argument sounds like “I never wear my seatbelt and I’ve never gone through my windshield, so I will continue not wearing my seatbelt.” :-(

– Aaron Bertrand
Jan 25 at 22:20




9




9





Some logic problems just can’t be solved technically or without HR involvement. If your company wants to allow developer practices that rely on believing in voodoo and ignoring the documentation, and your use case really is limited to a tiny table that is never ever updated, just let them have their way and update your resume. It’s not worth the arguing.

– Aaron Bertrand
Jan 25 at 22:25





Some logic problems just can’t be solved technically or without HR involvement. If your company wants to allow developer practices that rely on believing in voodoo and ignoring the documentation, and your use case really is limited to a tiny table that is never ever updated, just let them have their way and update your resume. It’s not worth the arguing.

– Aaron Bertrand
Jan 25 at 22:25




1




1





You have no basis to claim "will always". You can only claim "has always", "when I checked". The language has a definition--that is the contract with the user.

– philipxy
Jan 25 at 22:57







You have no basis to claim "will always". You can only claim "has always", "when I checked". The language has a definition--that is the contract with the user.

– philipxy
Jan 25 at 22:57






10




10





I'm curious why these colleagues of yours are against adding the order by clause to their queries? Are they trying to save on source code storage? keyboard wear and tear? time it takes to type the dreaded clause?

– mustaccio
Jan 25 at 23:21







I'm curious why these colleagues of yours are against adding the order by clause to their queries? Are they trying to save on source code storage? keyboard wear and tear? time it takes to type the dreaded clause?

– mustaccio
Jan 25 at 23:21






2




2





I've always thought that database engines should randomly permute the first few rows of queries for which the semantics don't guarantee an ordering, to help facilitate testing.

– Doug McClean
Jan 28 at 2:41





I've always thought that database engines should randomly permute the first few rows of queries for which the semantics don't guarantee an ordering, to help facilitate testing.

– Doug McClean
Jan 28 at 2:41










6 Answers
6






active

oldest

votes


















31














I see three ways to try to convince them:




  1. Let them try the same query but with bigger table (more number of rows) or when the table is being updated between executions. Or new rows are inserted and some old ones are deleted. Or an index is added or removed between executions. Or the table is vacuumed (in Postgres). Or indexes are rebuilt (in SQL Server). Or the table is changed from clustered to a heap. Or the database service is restarted.


  2. You can suggest that they prove that different executions will return the same order. Can they prove it? Can they provide a series of tests that proves that any query will give the result in the same order, no matter how many times it is executed?


  3. Provide the documentation of various DBMS in that matter. For example:



PostgreSQL:




Sorting Rows



After a query has produced an output table (after the select list has been processed) it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.




SQL Server:




SELECT - ORDER BY Clause (Transact-SQL)



Sorts data returned by a query in SQL Server. Use this clause to:



Order the result set of a query by the specified column list and, optionally, limit the rows returned to a specified range. The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified.




Oracle:




order_by_clause



Use the ORDER BY clause to order rows returned by the statement. Without an order_by_clause, no guarantee exists that the same query executed more than once will retrieve rows in the same order.







share|improve this answer


























  • With very small tables that are not modified, you may see this behaviour. That is expected. But it is not guaranteed either. The order may change because you added an index or you modified an index or you restarted the database and possibly many other cases.

    – ypercubeᵀᴹ
    Jan 25 at 21:54






  • 6





    If the order matters, then who ever is responsible for reviewing their code should reject until they use ORDER BY. The developers of the DBMSs (Oracle, SQL Server, Postgres) all say the same thing about what their product guarantess and what not (and they are paid much more than I will be, so they know what they saying, besides having built these damn things).

    – ypercubeᵀᴹ
    Jan 25 at 22:05






  • 1





    Even if the order looks the same now, is it certain that these tables will never be updated in the whole lifetime of the software you are building? That no more rows will be inserted, ever?

    – ypercubeᵀᴹ
    Jan 25 at 22:09








  • 1





    Is there a guarantee that this table will always be this small? Is there a guarantee that no more columns will be added? I can see tens of different cases where the table may be changed in the future (and some of these changes may affect the order of a query result). I suggest you ask them to answer all of these. Can they guarantee that nothing like that will ever happen? And why won't they add a simple ORDER BY, which will guarantee the order, no matter how the table is going to change? Why not have a safe added, which does no harm?

    – ypercubeᵀᴹ
    Jan 25 at 22:14






  • 10





    The documentation should be sufficient. Anything else is second-guessing, and at any rate, will never be seen as definitive, no matter what you prove. It will always be something you did and explainable, probably at your expense, rather than something that is. Armed with the documentation, submit your "warranty" in writing, and simply seek written permission to not return rows in the required order (you won't get it).

    – Haakon Dahl
    Jan 26 at 4:56



















19














This is the black swan story all over again. If you haven't seen one yet it doesn't mean they don't exist. Hopefully in your case it won't lead to another world wide financial crisis, simply to a few unhappy customers.



Postgres documentation says this explicitly:




If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.




"The system" in this case comprises the postgres daemon itself (including implementation of its data access methods and the query optimizer), the underlying operating system, logical and physical layout of the database storage, possibly even CPU caches. Since you as the database user have no control over that stack you should not rely on it continuing to behave forever the way it behaves this very minute.



Your colleagues are committing the hasty generalization fallacy. To disprove their point it is sufficient to show that their assumption is wrong only once, e.g. by this dbfiddle.






share|improve this answer

































    12














    Consider the following example, where we have three related tables. Orders, Users, and OrderDetails. OrderDetails is linked with foreign keys to the Orders table and the Users Table. This is essentially a very typical setup for relational databases; arguably the entire purpose of a relational DBMS.



    USE tempdb;

    IF OBJECT_ID(N'dbo.OrderDetails', N'U') IS NOT NULL
    DROP TABLE dbo.OrderDetails;

    IF OBJECT_ID(N'dbo.Orders', N'U') IS NOT NULL
    DROP TABLE dbo.Orders;

    IF OBJECT_ID(N'dbo.Users', N'U') IS NOT NULL
    DROP TABLE dbo.Users;

    CREATE TABLE dbo.Orders
    (
    OrderID int NOT NULL
    CONSTRAINT OrderTestPK
    PRIMARY KEY
    CLUSTERED
    , SomeOrderData varchar(1000)
    CONSTRAINT Orders_somedata_df
    DEFAULT (CRYPT_GEN_RANDOM(1000))
    );

    CREATE TABLE dbo.Users
    (
    UserID int NOT NULL
    CONSTRAINT UsersPK
    PRIMARY KEY
    CLUSTERED
    , SomeUserData varchar(1000)
    CONSTRAINT Users_somedata_df
    DEFAULT (CRYPT_GEN_RANDOM(1000))
    );

    CREATE TABLE dbo.OrderDetails
    (
    OrderDetailsID int NOT NULL
    CONSTRAINT OrderDetailsTestPK
    PRIMARY KEY
    CLUSTERED
    , OrderID int NOT NULL
    CONSTRAINT OrderDetailsOrderID
    FOREIGN KEY
    REFERENCES dbo.Orders(OrderID)
    , UserID int NOT NULL
    CONSTRAINT OrderDetailsUserID
    FOREIGN KEY
    REFERENCES dbo.Users(UserID)
    , SomeOrderDetailsData varchar(1000)
    CONSTRAINT OrderDetails_somedata_df
    DEFAULT (CRYPT_GEN_RANDOM(1000))
    );

    INSERT INTO dbo.Orders (OrderID)
    SELECT TOP(100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.syscolumns sc;

    INSERT INTO dbo.Users (UserID)
    SELECT TOP(100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM sys.syscolumns sc;

    INSERT INTO dbo.OrderDetails (OrderDetailsID, OrderID, UserID)
    SELECT TOP(10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    , o.OrderID
    , u.UserID
    FROM sys.syscolumns sc
    CROSS JOIN dbo.Orders o
    CROSS JOIN dbo.Users u
    ORDER BY NEWID();

    CREATE INDEX OrderDetailsOrderID ON dbo.OrderDetails(OrderID);
    CREATE INDEX OrderDetailsUserID ON dbo.OrderDetails(UserID);


    Here, we're querying the OrderDetails table where the UserID is 15:



    SELECT od.OrderDetailsID
    , o.OrderID
    , u.UserID
    FROM dbo.OrderDetails od
    INNER JOIN dbo.Users u ON u.UserID = od.UserID
    INNER JOIN dbo.Orders o ON od.OrderID = o.OrderID
    WHERE u.UserID = 15


    The output from the query looks like:




    ╔════════════════╦═════════╦════════╗
    ║ OrderDetailsID ║ OrderID ║ UserID ║
    ╠════════════════╬═════════╬════════╣
    ║ 2200115 ║ 2 ║ 15 ║
    ║ 630215 ║ 3 ║ 15 ║
    ║ 1990215 ║ 3 ║ 15 ║
    ║ 4960215 ║ 3 ║ 15 ║
    ║ 100715 ║ 8 ║ 15 ║
    ║ 3930815 ║ 9 ║ 15 ║
    ║ 6310815 ║ 9 ║ 15 ║
    ║ 4441015 ║ 11 ║ 15 ║
    ║ 2171315 ║ 14 ║ 15 ║
    ║ 3431415 ║ 15 ║ 15 ║
    ║ 4571415 ║ 15 ║ 15 ║
    ║ 6421515 ║ 16 ║ 15 ║
    ║ 2271715 ║ 18 ║ 15 ║
    ║ 2601715 ║ 18 ║ 15 ║
    ║ 3521715 ║ 18 ║ 15 ║
    ║ 221815 ║ 19 ║ 15 ║
    ║ 3381915 ║ 20 ║ 15 ║
    ║ 4471915 ║ 20 ║ 15 ║
    ╚════════════════╩═════════╩════════╝


    As you can see, the order of rows output does not match the order of rows in the OrderDetails table.



    Adding an explicit ORDER BY ensures rows will be returned to the client in the desired order:



    SELECT od.OrderDetailsID
    , o.OrderID
    , u.UserID
    FROM dbo.OrderDetails od
    INNER JOIN dbo.Users u ON u.UserID = od.UserID
    INNER JOIN dbo.Orders o ON od.OrderID = o.OrderID
    WHERE u.UserID = 15
    ORDER BY od.OrderDetailsID;


    ╔════════════════╦═════════╦════════╗
    ║ OrderDetailsID ║ OrderID ║ UserID ║
    ╠════════════════╬═════════╬════════╣
    ║ 3915 ║ 40 ║ 15 ║
    ║ 100715 ║ 8 ║ 15 ║
    ║ 221815 ║ 19 ║ 15 ║
    ║ 299915 ║ 100 ║ 15 ║
    ║ 368215 ║ 83 ║ 15 ║
    ║ 603815 ║ 39 ║ 15 ║
    ║ 630215 ║ 3 ║ 15 ║
    ║ 728515 ║ 86 ║ 15 ║
    ║ 972215 ║ 23 ║ 15 ║
    ║ 992015 ║ 21 ║ 15 ║
    ║ 1017115 ║ 72 ║ 15 ║
    ║ 1113815 ║ 39 ║ 15 ║
    ╚════════════════╩═════════╩════════╝


    If order of rows is imperative, and your engineers know that order is imperative, they should only ever want to use an ORDER BY statement, since it might cost them their designation if there was a failure related to incorrect order.



    A second, perhaps more instructive example, using the OrderDetails table from above, where we're not joining any other tables, but have a simple requirement to find rows matching both the OrderID and the UserID, we see the problem.



    We'll create an index to support the query, as you would likely do in real life if performance is in any way important (when isn't it?).



    CREATE INDEX OrderDetailsOrderIDUserID ON dbo.OrderDetails(OrderID, UserID);


    Here's the query:



    SELECT od.OrderDetailsID
    FROM dbo.OrderDetails od
    WHERE od.OrderID = 15
    AND (od.UserID = 21 OR od.UserID = 22)


    And the results:




    ╔════════════════╗
    ║ OrderDetailsID ║
    ╠════════════════╣
    ║ 21421 ║
    ║ 5061421 ║
    ║ 7091421 ║
    ║ 691422 ║
    ║ 3471422 ║
    ║ 7241422 ║
    ╚════════════════╝


    Adding an ORDER BY clause will most definitely ensure we get the correct sort here, too.



    These mock-ups are just simple examples where rows are not guaranteed to be "in order" without an explicit ORDER BY statement. There are many more example like this, and since DBMS engine code changes quite frequently, the specific behavior may change over time.






    share|improve this answer

































      10














      As a practical example, in Postgres, the order currently changes when you update a row:



      % SELECT * FROM mytable;
      id | data
      ----+------
      0 | a
      1 | b
      2 | c
      3 | d
      4 | e
      5 | f
      6 | g
      7 | h
      8 | i
      9 | j
      (10 rows)

      % UPDATE mytable SET data = 'ff' WHERE id = 5;
      UPDATE 1
      % SELECT * FROM mytable;
      id | data
      ----+------
      0 | a
      1 | b
      2 | c
      3 | d
      4 | e
      6 | g
      7 | h
      8 | i
      9 | j
      5 | ff
      (10 rows)


      I don't think the rules of this existing implicit ordering is documented anywhere, is definitely subject to change without notice, and is definitely not portable behavior across DB engines.






      share|improve this answer


























      • It is documented: ypercube's answer quotes the documentation telling us that the order is unspecified.

        – Lightness Races in Orbit
        Jan 28 at 13:58











      • @LightnessRacesinOrbit I'd take that as the documentation explicitly telling us that it's not documented. I mean, it's also true that anything not in the documentation is unspecified. It's a kind of tautology. Anyway, I edited that part of the answer to be more specific.

        – JoL
        Jan 28 at 16:46



















      3














      not exactly a demo, but too long for a comment.



      On large tables some databases will do interleaved parallel scans:



      If two queries want to scan the same table, and arrive at almost the same time, the first might be part way through the table when the second starts.



      The second query could receive records starting from the middle of the table (as the first query is completing) and then receive the records from the start of the table.






      share|improve this answer































        2














        Create a clustered index that has the "wrong" order. For example, cluster on ID DESC. This will often output the reverse order (although this is not guaranteed either).






        share|improve this answer























          Your Answer








          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "182"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228131%2fhow-to-prove-the-lack-of-implicit-order-in-a-database%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown
























          6 Answers
          6






          active

          oldest

          votes








          6 Answers
          6






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          31














          I see three ways to try to convince them:




          1. Let them try the same query but with bigger table (more number of rows) or when the table is being updated between executions. Or new rows are inserted and some old ones are deleted. Or an index is added or removed between executions. Or the table is vacuumed (in Postgres). Or indexes are rebuilt (in SQL Server). Or the table is changed from clustered to a heap. Or the database service is restarted.


          2. You can suggest that they prove that different executions will return the same order. Can they prove it? Can they provide a series of tests that proves that any query will give the result in the same order, no matter how many times it is executed?


          3. Provide the documentation of various DBMS in that matter. For example:



          PostgreSQL:




          Sorting Rows



          After a query has produced an output table (after the select list has been processed) it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.




          SQL Server:




          SELECT - ORDER BY Clause (Transact-SQL)



          Sorts data returned by a query in SQL Server. Use this clause to:



          Order the result set of a query by the specified column list and, optionally, limit the rows returned to a specified range. The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified.




          Oracle:




          order_by_clause



          Use the ORDER BY clause to order rows returned by the statement. Without an order_by_clause, no guarantee exists that the same query executed more than once will retrieve rows in the same order.







          share|improve this answer


























          • With very small tables that are not modified, you may see this behaviour. That is expected. But it is not guaranteed either. The order may change because you added an index or you modified an index or you restarted the database and possibly many other cases.

            – ypercubeᵀᴹ
            Jan 25 at 21:54






          • 6





            If the order matters, then who ever is responsible for reviewing their code should reject until they use ORDER BY. The developers of the DBMSs (Oracle, SQL Server, Postgres) all say the same thing about what their product guarantess and what not (and they are paid much more than I will be, so they know what they saying, besides having built these damn things).

            – ypercubeᵀᴹ
            Jan 25 at 22:05






          • 1





            Even if the order looks the same now, is it certain that these tables will never be updated in the whole lifetime of the software you are building? That no more rows will be inserted, ever?

            – ypercubeᵀᴹ
            Jan 25 at 22:09








          • 1





            Is there a guarantee that this table will always be this small? Is there a guarantee that no more columns will be added? I can see tens of different cases where the table may be changed in the future (and some of these changes may affect the order of a query result). I suggest you ask them to answer all of these. Can they guarantee that nothing like that will ever happen? And why won't they add a simple ORDER BY, which will guarantee the order, no matter how the table is going to change? Why not have a safe added, which does no harm?

            – ypercubeᵀᴹ
            Jan 25 at 22:14






          • 10





            The documentation should be sufficient. Anything else is second-guessing, and at any rate, will never be seen as definitive, no matter what you prove. It will always be something you did and explainable, probably at your expense, rather than something that is. Armed with the documentation, submit your "warranty" in writing, and simply seek written permission to not return rows in the required order (you won't get it).

            – Haakon Dahl
            Jan 26 at 4:56
















          31














          I see three ways to try to convince them:




          1. Let them try the same query but with bigger table (more number of rows) or when the table is being updated between executions. Or new rows are inserted and some old ones are deleted. Or an index is added or removed between executions. Or the table is vacuumed (in Postgres). Or indexes are rebuilt (in SQL Server). Or the table is changed from clustered to a heap. Or the database service is restarted.


          2. You can suggest that they prove that different executions will return the same order. Can they prove it? Can they provide a series of tests that proves that any query will give the result in the same order, no matter how many times it is executed?


          3. Provide the documentation of various DBMS in that matter. For example:



          PostgreSQL:




          Sorting Rows



          After a query has produced an output table (after the select list has been processed) it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.




          SQL Server:




          SELECT - ORDER BY Clause (Transact-SQL)



          Sorts data returned by a query in SQL Server. Use this clause to:



          Order the result set of a query by the specified column list and, optionally, limit the rows returned to a specified range. The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified.




          Oracle:




          order_by_clause



          Use the ORDER BY clause to order rows returned by the statement. Without an order_by_clause, no guarantee exists that the same query executed more than once will retrieve rows in the same order.







          share|improve this answer


























          • With very small tables that are not modified, you may see this behaviour. That is expected. But it is not guaranteed either. The order may change because you added an index or you modified an index or you restarted the database and possibly many other cases.

            – ypercubeᵀᴹ
            Jan 25 at 21:54






          • 6





            If the order matters, then who ever is responsible for reviewing their code should reject until they use ORDER BY. The developers of the DBMSs (Oracle, SQL Server, Postgres) all say the same thing about what their product guarantess and what not (and they are paid much more than I will be, so they know what they saying, besides having built these damn things).

            – ypercubeᵀᴹ
            Jan 25 at 22:05






          • 1





            Even if the order looks the same now, is it certain that these tables will never be updated in the whole lifetime of the software you are building? That no more rows will be inserted, ever?

            – ypercubeᵀᴹ
            Jan 25 at 22:09








          • 1





            Is there a guarantee that this table will always be this small? Is there a guarantee that no more columns will be added? I can see tens of different cases where the table may be changed in the future (and some of these changes may affect the order of a query result). I suggest you ask them to answer all of these. Can they guarantee that nothing like that will ever happen? And why won't they add a simple ORDER BY, which will guarantee the order, no matter how the table is going to change? Why not have a safe added, which does no harm?

            – ypercubeᵀᴹ
            Jan 25 at 22:14






          • 10





            The documentation should be sufficient. Anything else is second-guessing, and at any rate, will never be seen as definitive, no matter what you prove. It will always be something you did and explainable, probably at your expense, rather than something that is. Armed with the documentation, submit your "warranty" in writing, and simply seek written permission to not return rows in the required order (you won't get it).

            – Haakon Dahl
            Jan 26 at 4:56














          31












          31








          31







          I see three ways to try to convince them:




          1. Let them try the same query but with bigger table (more number of rows) or when the table is being updated between executions. Or new rows are inserted and some old ones are deleted. Or an index is added or removed between executions. Or the table is vacuumed (in Postgres). Or indexes are rebuilt (in SQL Server). Or the table is changed from clustered to a heap. Or the database service is restarted.


          2. You can suggest that they prove that different executions will return the same order. Can they prove it? Can they provide a series of tests that proves that any query will give the result in the same order, no matter how many times it is executed?


          3. Provide the documentation of various DBMS in that matter. For example:



          PostgreSQL:




          Sorting Rows



          After a query has produced an output table (after the select list has been processed) it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.




          SQL Server:




          SELECT - ORDER BY Clause (Transact-SQL)



          Sorts data returned by a query in SQL Server. Use this clause to:



          Order the result set of a query by the specified column list and, optionally, limit the rows returned to a specified range. The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified.




          Oracle:




          order_by_clause



          Use the ORDER BY clause to order rows returned by the statement. Without an order_by_clause, no guarantee exists that the same query executed more than once will retrieve rows in the same order.







          share|improve this answer















          I see three ways to try to convince them:




          1. Let them try the same query but with bigger table (more number of rows) or when the table is being updated between executions. Or new rows are inserted and some old ones are deleted. Or an index is added or removed between executions. Or the table is vacuumed (in Postgres). Or indexes are rebuilt (in SQL Server). Or the table is changed from clustered to a heap. Or the database service is restarted.


          2. You can suggest that they prove that different executions will return the same order. Can they prove it? Can they provide a series of tests that proves that any query will give the result in the same order, no matter how many times it is executed?


          3. Provide the documentation of various DBMS in that matter. For example:



          PostgreSQL:




          Sorting Rows



          After a query has produced an output table (after the select list has been processed) it can optionally be sorted. If sorting is not chosen, the rows will be returned in an unspecified order. The actual order in that case will depend on the scan and join plan types and the order on disk, but it must not be relied on. A particular output ordering can only be guaranteed if the sort step is explicitly chosen.




          SQL Server:




          SELECT - ORDER BY Clause (Transact-SQL)



          Sorts data returned by a query in SQL Server. Use this clause to:



          Order the result set of a query by the specified column list and, optionally, limit the rows returned to a specified range. The order in which rows are returned in a result set are not guaranteed unless an ORDER BY clause is specified.




          Oracle:




          order_by_clause



          Use the ORDER BY clause to order rows returned by the statement. Without an order_by_clause, no guarantee exists that the same query executed more than once will retrieve rows in the same order.








          share|improve this answer














          share|improve this answer



          share|improve this answer








          edited Jan 25 at 21:58

























          answered Jan 25 at 21:43









          ypercubeᵀᴹypercubeᵀᴹ

          76.5k11131213




          76.5k11131213













          • With very small tables that are not modified, you may see this behaviour. That is expected. But it is not guaranteed either. The order may change because you added an index or you modified an index or you restarted the database and possibly many other cases.

            – ypercubeᵀᴹ
            Jan 25 at 21:54






          • 6





            If the order matters, then who ever is responsible for reviewing their code should reject until they use ORDER BY. The developers of the DBMSs (Oracle, SQL Server, Postgres) all say the same thing about what their product guarantess and what not (and they are paid much more than I will be, so they know what they saying, besides having built these damn things).

            – ypercubeᵀᴹ
            Jan 25 at 22:05






          • 1





            Even if the order looks the same now, is it certain that these tables will never be updated in the whole lifetime of the software you are building? That no more rows will be inserted, ever?

            – ypercubeᵀᴹ
            Jan 25 at 22:09








          • 1





            Is there a guarantee that this table will always be this small? Is there a guarantee that no more columns will be added? I can see tens of different cases where the table may be changed in the future (and some of these changes may affect the order of a query result). I suggest you ask them to answer all of these. Can they guarantee that nothing like that will ever happen? And why won't they add a simple ORDER BY, which will guarantee the order, no matter how the table is going to change? Why not have a safe added, which does no harm?

            – ypercubeᵀᴹ
            Jan 25 at 22:14






          • 10





            The documentation should be sufficient. Anything else is second-guessing, and at any rate, will never be seen as definitive, no matter what you prove. It will always be something you did and explainable, probably at your expense, rather than something that is. Armed with the documentation, submit your "warranty" in writing, and simply seek written permission to not return rows in the required order (you won't get it).

            – Haakon Dahl
            Jan 26 at 4:56



















          • With very small tables that are not modified, you may see this behaviour. That is expected. But it is not guaranteed either. The order may change because you added an index or you modified an index or you restarted the database and possibly many other cases.

            – ypercubeᵀᴹ
            Jan 25 at 21:54






          • 6





            If the order matters, then who ever is responsible for reviewing their code should reject until they use ORDER BY. The developers of the DBMSs (Oracle, SQL Server, Postgres) all say the same thing about what their product guarantess and what not (and they are paid much more than I will be, so they know what they saying, besides having built these damn things).

            – ypercubeᵀᴹ
            Jan 25 at 22:05






          • 1





            Even if the order looks the same now, is it certain that these tables will never be updated in the whole lifetime of the software you are building? That no more rows will be inserted, ever?

            – ypercubeᵀᴹ
            Jan 25 at 22:09








          • 1





            Is there a guarantee that this table will always be this small? Is there a guarantee that no more columns will be added? I can see tens of different cases where the table may be changed in the future (and some of these changes may affect the order of a query result). I suggest you ask them to answer all of these. Can they guarantee that nothing like that will ever happen? And why won't they add a simple ORDER BY, which will guarantee the order, no matter how the table is going to change? Why not have a safe added, which does no harm?

            – ypercubeᵀᴹ
            Jan 25 at 22:14






          • 10





            The documentation should be sufficient. Anything else is second-guessing, and at any rate, will never be seen as definitive, no matter what you prove. It will always be something you did and explainable, probably at your expense, rather than something that is. Armed with the documentation, submit your "warranty" in writing, and simply seek written permission to not return rows in the required order (you won't get it).

            – Haakon Dahl
            Jan 26 at 4:56

















          With very small tables that are not modified, you may see this behaviour. That is expected. But it is not guaranteed either. The order may change because you added an index or you modified an index or you restarted the database and possibly many other cases.

          – ypercubeᵀᴹ
          Jan 25 at 21:54





          With very small tables that are not modified, you may see this behaviour. That is expected. But it is not guaranteed either. The order may change because you added an index or you modified an index or you restarted the database and possibly many other cases.

          – ypercubeᵀᴹ
          Jan 25 at 21:54




          6




          6





          If the order matters, then who ever is responsible for reviewing their code should reject until they use ORDER BY. The developers of the DBMSs (Oracle, SQL Server, Postgres) all say the same thing about what their product guarantess and what not (and they are paid much more than I will be, so they know what they saying, besides having built these damn things).

          – ypercubeᵀᴹ
          Jan 25 at 22:05





          If the order matters, then who ever is responsible for reviewing their code should reject until they use ORDER BY. The developers of the DBMSs (Oracle, SQL Server, Postgres) all say the same thing about what their product guarantess and what not (and they are paid much more than I will be, so they know what they saying, besides having built these damn things).

          – ypercubeᵀᴹ
          Jan 25 at 22:05




          1




          1





          Even if the order looks the same now, is it certain that these tables will never be updated in the whole lifetime of the software you are building? That no more rows will be inserted, ever?

          – ypercubeᵀᴹ
          Jan 25 at 22:09







          Even if the order looks the same now, is it certain that these tables will never be updated in the whole lifetime of the software you are building? That no more rows will be inserted, ever?

          – ypercubeᵀᴹ
          Jan 25 at 22:09






          1




          1





          Is there a guarantee that this table will always be this small? Is there a guarantee that no more columns will be added? I can see tens of different cases where the table may be changed in the future (and some of these changes may affect the order of a query result). I suggest you ask them to answer all of these. Can they guarantee that nothing like that will ever happen? And why won't they add a simple ORDER BY, which will guarantee the order, no matter how the table is going to change? Why not have a safe added, which does no harm?

          – ypercubeᵀᴹ
          Jan 25 at 22:14





          Is there a guarantee that this table will always be this small? Is there a guarantee that no more columns will be added? I can see tens of different cases where the table may be changed in the future (and some of these changes may affect the order of a query result). I suggest you ask them to answer all of these. Can they guarantee that nothing like that will ever happen? And why won't they add a simple ORDER BY, which will guarantee the order, no matter how the table is going to change? Why not have a safe added, which does no harm?

          – ypercubeᵀᴹ
          Jan 25 at 22:14




          10




          10





          The documentation should be sufficient. Anything else is second-guessing, and at any rate, will never be seen as definitive, no matter what you prove. It will always be something you did and explainable, probably at your expense, rather than something that is. Armed with the documentation, submit your "warranty" in writing, and simply seek written permission to not return rows in the required order (you won't get it).

          – Haakon Dahl
          Jan 26 at 4:56





          The documentation should be sufficient. Anything else is second-guessing, and at any rate, will never be seen as definitive, no matter what you prove. It will always be something you did and explainable, probably at your expense, rather than something that is. Armed with the documentation, submit your "warranty" in writing, and simply seek written permission to not return rows in the required order (you won't get it).

          – Haakon Dahl
          Jan 26 at 4:56













          19














          This is the black swan story all over again. If you haven't seen one yet it doesn't mean they don't exist. Hopefully in your case it won't lead to another world wide financial crisis, simply to a few unhappy customers.



          Postgres documentation says this explicitly:




          If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.




          "The system" in this case comprises the postgres daemon itself (including implementation of its data access methods and the query optimizer), the underlying operating system, logical and physical layout of the database storage, possibly even CPU caches. Since you as the database user have no control over that stack you should not rely on it continuing to behave forever the way it behaves this very minute.



          Your colleagues are committing the hasty generalization fallacy. To disprove their point it is sufficient to show that their assumption is wrong only once, e.g. by this dbfiddle.






          share|improve this answer






























            19














            This is the black swan story all over again. If you haven't seen one yet it doesn't mean they don't exist. Hopefully in your case it won't lead to another world wide financial crisis, simply to a few unhappy customers.



            Postgres documentation says this explicitly:




            If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.




            "The system" in this case comprises the postgres daemon itself (including implementation of its data access methods and the query optimizer), the underlying operating system, logical and physical layout of the database storage, possibly even CPU caches. Since you as the database user have no control over that stack you should not rely on it continuing to behave forever the way it behaves this very minute.



            Your colleagues are committing the hasty generalization fallacy. To disprove their point it is sufficient to show that their assumption is wrong only once, e.g. by this dbfiddle.






            share|improve this answer




























              19












              19








              19







              This is the black swan story all over again. If you haven't seen one yet it doesn't mean they don't exist. Hopefully in your case it won't lead to another world wide financial crisis, simply to a few unhappy customers.



              Postgres documentation says this explicitly:




              If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.




              "The system" in this case comprises the postgres daemon itself (including implementation of its data access methods and the query optimizer), the underlying operating system, logical and physical layout of the database storage, possibly even CPU caches. Since you as the database user have no control over that stack you should not rely on it continuing to behave forever the way it behaves this very minute.



              Your colleagues are committing the hasty generalization fallacy. To disprove their point it is sufficient to show that their assumption is wrong only once, e.g. by this dbfiddle.






              share|improve this answer















              This is the black swan story all over again. If you haven't seen one yet it doesn't mean they don't exist. Hopefully in your case it won't lead to another world wide financial crisis, simply to a few unhappy customers.



              Postgres documentation says this explicitly:




              If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.




              "The system" in this case comprises the postgres daemon itself (including implementation of its data access methods and the query optimizer), the underlying operating system, logical and physical layout of the database storage, possibly even CPU caches. Since you as the database user have no control over that stack you should not rely on it continuing to behave forever the way it behaves this very minute.



              Your colleagues are committing the hasty generalization fallacy. To disprove their point it is sufficient to show that their assumption is wrong only once, e.g. by this dbfiddle.







              share|improve this answer














              share|improve this answer



              share|improve this answer








              edited Jan 25 at 23:12

























              answered Jan 25 at 21:50









              mustacciomustaccio

              9,55872237




              9,55872237























                  12














                  Consider the following example, where we have three related tables. Orders, Users, and OrderDetails. OrderDetails is linked with foreign keys to the Orders table and the Users Table. This is essentially a very typical setup for relational databases; arguably the entire purpose of a relational DBMS.



                  USE tempdb;

                  IF OBJECT_ID(N'dbo.OrderDetails', N'U') IS NOT NULL
                  DROP TABLE dbo.OrderDetails;

                  IF OBJECT_ID(N'dbo.Orders', N'U') IS NOT NULL
                  DROP TABLE dbo.Orders;

                  IF OBJECT_ID(N'dbo.Users', N'U') IS NOT NULL
                  DROP TABLE dbo.Users;

                  CREATE TABLE dbo.Orders
                  (
                  OrderID int NOT NULL
                  CONSTRAINT OrderTestPK
                  PRIMARY KEY
                  CLUSTERED
                  , SomeOrderData varchar(1000)
                  CONSTRAINT Orders_somedata_df
                  DEFAULT (CRYPT_GEN_RANDOM(1000))
                  );

                  CREATE TABLE dbo.Users
                  (
                  UserID int NOT NULL
                  CONSTRAINT UsersPK
                  PRIMARY KEY
                  CLUSTERED
                  , SomeUserData varchar(1000)
                  CONSTRAINT Users_somedata_df
                  DEFAULT (CRYPT_GEN_RANDOM(1000))
                  );

                  CREATE TABLE dbo.OrderDetails
                  (
                  OrderDetailsID int NOT NULL
                  CONSTRAINT OrderDetailsTestPK
                  PRIMARY KEY
                  CLUSTERED
                  , OrderID int NOT NULL
                  CONSTRAINT OrderDetailsOrderID
                  FOREIGN KEY
                  REFERENCES dbo.Orders(OrderID)
                  , UserID int NOT NULL
                  CONSTRAINT OrderDetailsUserID
                  FOREIGN KEY
                  REFERENCES dbo.Users(UserID)
                  , SomeOrderDetailsData varchar(1000)
                  CONSTRAINT OrderDetails_somedata_df
                  DEFAULT (CRYPT_GEN_RANDOM(1000))
                  );

                  INSERT INTO dbo.Orders (OrderID)
                  SELECT TOP(100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                  FROM sys.syscolumns sc;

                  INSERT INTO dbo.Users (UserID)
                  SELECT TOP(100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                  FROM sys.syscolumns sc;

                  INSERT INTO dbo.OrderDetails (OrderDetailsID, OrderID, UserID)
                  SELECT TOP(10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                  , o.OrderID
                  , u.UserID
                  FROM sys.syscolumns sc
                  CROSS JOIN dbo.Orders o
                  CROSS JOIN dbo.Users u
                  ORDER BY NEWID();

                  CREATE INDEX OrderDetailsOrderID ON dbo.OrderDetails(OrderID);
                  CREATE INDEX OrderDetailsUserID ON dbo.OrderDetails(UserID);


                  Here, we're querying the OrderDetails table where the UserID is 15:



                  SELECT od.OrderDetailsID
                  , o.OrderID
                  , u.UserID
                  FROM dbo.OrderDetails od
                  INNER JOIN dbo.Users u ON u.UserID = od.UserID
                  INNER JOIN dbo.Orders o ON od.OrderID = o.OrderID
                  WHERE u.UserID = 15


                  The output from the query looks like:




                  ╔════════════════╦═════════╦════════╗
                  ║ OrderDetailsID ║ OrderID ║ UserID ║
                  ╠════════════════╬═════════╬════════╣
                  ║ 2200115 ║ 2 ║ 15 ║
                  ║ 630215 ║ 3 ║ 15 ║
                  ║ 1990215 ║ 3 ║ 15 ║
                  ║ 4960215 ║ 3 ║ 15 ║
                  ║ 100715 ║ 8 ║ 15 ║
                  ║ 3930815 ║ 9 ║ 15 ║
                  ║ 6310815 ║ 9 ║ 15 ║
                  ║ 4441015 ║ 11 ║ 15 ║
                  ║ 2171315 ║ 14 ║ 15 ║
                  ║ 3431415 ║ 15 ║ 15 ║
                  ║ 4571415 ║ 15 ║ 15 ║
                  ║ 6421515 ║ 16 ║ 15 ║
                  ║ 2271715 ║ 18 ║ 15 ║
                  ║ 2601715 ║ 18 ║ 15 ║
                  ║ 3521715 ║ 18 ║ 15 ║
                  ║ 221815 ║ 19 ║ 15 ║
                  ║ 3381915 ║ 20 ║ 15 ║
                  ║ 4471915 ║ 20 ║ 15 ║
                  ╚════════════════╩═════════╩════════╝


                  As you can see, the order of rows output does not match the order of rows in the OrderDetails table.



                  Adding an explicit ORDER BY ensures rows will be returned to the client in the desired order:



                  SELECT od.OrderDetailsID
                  , o.OrderID
                  , u.UserID
                  FROM dbo.OrderDetails od
                  INNER JOIN dbo.Users u ON u.UserID = od.UserID
                  INNER JOIN dbo.Orders o ON od.OrderID = o.OrderID
                  WHERE u.UserID = 15
                  ORDER BY od.OrderDetailsID;


                  ╔════════════════╦═════════╦════════╗
                  ║ OrderDetailsID ║ OrderID ║ UserID ║
                  ╠════════════════╬═════════╬════════╣
                  ║ 3915 ║ 40 ║ 15 ║
                  ║ 100715 ║ 8 ║ 15 ║
                  ║ 221815 ║ 19 ║ 15 ║
                  ║ 299915 ║ 100 ║ 15 ║
                  ║ 368215 ║ 83 ║ 15 ║
                  ║ 603815 ║ 39 ║ 15 ║
                  ║ 630215 ║ 3 ║ 15 ║
                  ║ 728515 ║ 86 ║ 15 ║
                  ║ 972215 ║ 23 ║ 15 ║
                  ║ 992015 ║ 21 ║ 15 ║
                  ║ 1017115 ║ 72 ║ 15 ║
                  ║ 1113815 ║ 39 ║ 15 ║
                  ╚════════════════╩═════════╩════════╝


                  If order of rows is imperative, and your engineers know that order is imperative, they should only ever want to use an ORDER BY statement, since it might cost them their designation if there was a failure related to incorrect order.



                  A second, perhaps more instructive example, using the OrderDetails table from above, where we're not joining any other tables, but have a simple requirement to find rows matching both the OrderID and the UserID, we see the problem.



                  We'll create an index to support the query, as you would likely do in real life if performance is in any way important (when isn't it?).



                  CREATE INDEX OrderDetailsOrderIDUserID ON dbo.OrderDetails(OrderID, UserID);


                  Here's the query:



                  SELECT od.OrderDetailsID
                  FROM dbo.OrderDetails od
                  WHERE od.OrderID = 15
                  AND (od.UserID = 21 OR od.UserID = 22)


                  And the results:




                  ╔════════════════╗
                  ║ OrderDetailsID ║
                  ╠════════════════╣
                  ║ 21421 ║
                  ║ 5061421 ║
                  ║ 7091421 ║
                  ║ 691422 ║
                  ║ 3471422 ║
                  ║ 7241422 ║
                  ╚════════════════╝


                  Adding an ORDER BY clause will most definitely ensure we get the correct sort here, too.



                  These mock-ups are just simple examples where rows are not guaranteed to be "in order" without an explicit ORDER BY statement. There are many more example like this, and since DBMS engine code changes quite frequently, the specific behavior may change over time.






                  share|improve this answer






























                    12














                    Consider the following example, where we have three related tables. Orders, Users, and OrderDetails. OrderDetails is linked with foreign keys to the Orders table and the Users Table. This is essentially a very typical setup for relational databases; arguably the entire purpose of a relational DBMS.



                    USE tempdb;

                    IF OBJECT_ID(N'dbo.OrderDetails', N'U') IS NOT NULL
                    DROP TABLE dbo.OrderDetails;

                    IF OBJECT_ID(N'dbo.Orders', N'U') IS NOT NULL
                    DROP TABLE dbo.Orders;

                    IF OBJECT_ID(N'dbo.Users', N'U') IS NOT NULL
                    DROP TABLE dbo.Users;

                    CREATE TABLE dbo.Orders
                    (
                    OrderID int NOT NULL
                    CONSTRAINT OrderTestPK
                    PRIMARY KEY
                    CLUSTERED
                    , SomeOrderData varchar(1000)
                    CONSTRAINT Orders_somedata_df
                    DEFAULT (CRYPT_GEN_RANDOM(1000))
                    );

                    CREATE TABLE dbo.Users
                    (
                    UserID int NOT NULL
                    CONSTRAINT UsersPK
                    PRIMARY KEY
                    CLUSTERED
                    , SomeUserData varchar(1000)
                    CONSTRAINT Users_somedata_df
                    DEFAULT (CRYPT_GEN_RANDOM(1000))
                    );

                    CREATE TABLE dbo.OrderDetails
                    (
                    OrderDetailsID int NOT NULL
                    CONSTRAINT OrderDetailsTestPK
                    PRIMARY KEY
                    CLUSTERED
                    , OrderID int NOT NULL
                    CONSTRAINT OrderDetailsOrderID
                    FOREIGN KEY
                    REFERENCES dbo.Orders(OrderID)
                    , UserID int NOT NULL
                    CONSTRAINT OrderDetailsUserID
                    FOREIGN KEY
                    REFERENCES dbo.Users(UserID)
                    , SomeOrderDetailsData varchar(1000)
                    CONSTRAINT OrderDetails_somedata_df
                    DEFAULT (CRYPT_GEN_RANDOM(1000))
                    );

                    INSERT INTO dbo.Orders (OrderID)
                    SELECT TOP(100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                    FROM sys.syscolumns sc;

                    INSERT INTO dbo.Users (UserID)
                    SELECT TOP(100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                    FROM sys.syscolumns sc;

                    INSERT INTO dbo.OrderDetails (OrderDetailsID, OrderID, UserID)
                    SELECT TOP(10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                    , o.OrderID
                    , u.UserID
                    FROM sys.syscolumns sc
                    CROSS JOIN dbo.Orders o
                    CROSS JOIN dbo.Users u
                    ORDER BY NEWID();

                    CREATE INDEX OrderDetailsOrderID ON dbo.OrderDetails(OrderID);
                    CREATE INDEX OrderDetailsUserID ON dbo.OrderDetails(UserID);


                    Here, we're querying the OrderDetails table where the UserID is 15:



                    SELECT od.OrderDetailsID
                    , o.OrderID
                    , u.UserID
                    FROM dbo.OrderDetails od
                    INNER JOIN dbo.Users u ON u.UserID = od.UserID
                    INNER JOIN dbo.Orders o ON od.OrderID = o.OrderID
                    WHERE u.UserID = 15


                    The output from the query looks like:




                    ╔════════════════╦═════════╦════════╗
                    ║ OrderDetailsID ║ OrderID ║ UserID ║
                    ╠════════════════╬═════════╬════════╣
                    ║ 2200115 ║ 2 ║ 15 ║
                    ║ 630215 ║ 3 ║ 15 ║
                    ║ 1990215 ║ 3 ║ 15 ║
                    ║ 4960215 ║ 3 ║ 15 ║
                    ║ 100715 ║ 8 ║ 15 ║
                    ║ 3930815 ║ 9 ║ 15 ║
                    ║ 6310815 ║ 9 ║ 15 ║
                    ║ 4441015 ║ 11 ║ 15 ║
                    ║ 2171315 ║ 14 ║ 15 ║
                    ║ 3431415 ║ 15 ║ 15 ║
                    ║ 4571415 ║ 15 ║ 15 ║
                    ║ 6421515 ║ 16 ║ 15 ║
                    ║ 2271715 ║ 18 ║ 15 ║
                    ║ 2601715 ║ 18 ║ 15 ║
                    ║ 3521715 ║ 18 ║ 15 ║
                    ║ 221815 ║ 19 ║ 15 ║
                    ║ 3381915 ║ 20 ║ 15 ║
                    ║ 4471915 ║ 20 ║ 15 ║
                    ╚════════════════╩═════════╩════════╝


                    As you can see, the order of rows output does not match the order of rows in the OrderDetails table.



                    Adding an explicit ORDER BY ensures rows will be returned to the client in the desired order:



                    SELECT od.OrderDetailsID
                    , o.OrderID
                    , u.UserID
                    FROM dbo.OrderDetails od
                    INNER JOIN dbo.Users u ON u.UserID = od.UserID
                    INNER JOIN dbo.Orders o ON od.OrderID = o.OrderID
                    WHERE u.UserID = 15
                    ORDER BY od.OrderDetailsID;


                    ╔════════════════╦═════════╦════════╗
                    ║ OrderDetailsID ║ OrderID ║ UserID ║
                    ╠════════════════╬═════════╬════════╣
                    ║ 3915 ║ 40 ║ 15 ║
                    ║ 100715 ║ 8 ║ 15 ║
                    ║ 221815 ║ 19 ║ 15 ║
                    ║ 299915 ║ 100 ║ 15 ║
                    ║ 368215 ║ 83 ║ 15 ║
                    ║ 603815 ║ 39 ║ 15 ║
                    ║ 630215 ║ 3 ║ 15 ║
                    ║ 728515 ║ 86 ║ 15 ║
                    ║ 972215 ║ 23 ║ 15 ║
                    ║ 992015 ║ 21 ║ 15 ║
                    ║ 1017115 ║ 72 ║ 15 ║
                    ║ 1113815 ║ 39 ║ 15 ║
                    ╚════════════════╩═════════╩════════╝


                    If order of rows is imperative, and your engineers know that order is imperative, they should only ever want to use an ORDER BY statement, since it might cost them their designation if there was a failure related to incorrect order.



                    A second, perhaps more instructive example, using the OrderDetails table from above, where we're not joining any other tables, but have a simple requirement to find rows matching both the OrderID and the UserID, we see the problem.



                    We'll create an index to support the query, as you would likely do in real life if performance is in any way important (when isn't it?).



                    CREATE INDEX OrderDetailsOrderIDUserID ON dbo.OrderDetails(OrderID, UserID);


                    Here's the query:



                    SELECT od.OrderDetailsID
                    FROM dbo.OrderDetails od
                    WHERE od.OrderID = 15
                    AND (od.UserID = 21 OR od.UserID = 22)


                    And the results:




                    ╔════════════════╗
                    ║ OrderDetailsID ║
                    ╠════════════════╣
                    ║ 21421 ║
                    ║ 5061421 ║
                    ║ 7091421 ║
                    ║ 691422 ║
                    ║ 3471422 ║
                    ║ 7241422 ║
                    ╚════════════════╝


                    Adding an ORDER BY clause will most definitely ensure we get the correct sort here, too.



                    These mock-ups are just simple examples where rows are not guaranteed to be "in order" without an explicit ORDER BY statement. There are many more example like this, and since DBMS engine code changes quite frequently, the specific behavior may change over time.






                    share|improve this answer




























                      12












                      12








                      12







                      Consider the following example, where we have three related tables. Orders, Users, and OrderDetails. OrderDetails is linked with foreign keys to the Orders table and the Users Table. This is essentially a very typical setup for relational databases; arguably the entire purpose of a relational DBMS.



                      USE tempdb;

                      IF OBJECT_ID(N'dbo.OrderDetails', N'U') IS NOT NULL
                      DROP TABLE dbo.OrderDetails;

                      IF OBJECT_ID(N'dbo.Orders', N'U') IS NOT NULL
                      DROP TABLE dbo.Orders;

                      IF OBJECT_ID(N'dbo.Users', N'U') IS NOT NULL
                      DROP TABLE dbo.Users;

                      CREATE TABLE dbo.Orders
                      (
                      OrderID int NOT NULL
                      CONSTRAINT OrderTestPK
                      PRIMARY KEY
                      CLUSTERED
                      , SomeOrderData varchar(1000)
                      CONSTRAINT Orders_somedata_df
                      DEFAULT (CRYPT_GEN_RANDOM(1000))
                      );

                      CREATE TABLE dbo.Users
                      (
                      UserID int NOT NULL
                      CONSTRAINT UsersPK
                      PRIMARY KEY
                      CLUSTERED
                      , SomeUserData varchar(1000)
                      CONSTRAINT Users_somedata_df
                      DEFAULT (CRYPT_GEN_RANDOM(1000))
                      );

                      CREATE TABLE dbo.OrderDetails
                      (
                      OrderDetailsID int NOT NULL
                      CONSTRAINT OrderDetailsTestPK
                      PRIMARY KEY
                      CLUSTERED
                      , OrderID int NOT NULL
                      CONSTRAINT OrderDetailsOrderID
                      FOREIGN KEY
                      REFERENCES dbo.Orders(OrderID)
                      , UserID int NOT NULL
                      CONSTRAINT OrderDetailsUserID
                      FOREIGN KEY
                      REFERENCES dbo.Users(UserID)
                      , SomeOrderDetailsData varchar(1000)
                      CONSTRAINT OrderDetails_somedata_df
                      DEFAULT (CRYPT_GEN_RANDOM(1000))
                      );

                      INSERT INTO dbo.Orders (OrderID)
                      SELECT TOP(100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                      FROM sys.syscolumns sc;

                      INSERT INTO dbo.Users (UserID)
                      SELECT TOP(100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                      FROM sys.syscolumns sc;

                      INSERT INTO dbo.OrderDetails (OrderDetailsID, OrderID, UserID)
                      SELECT TOP(10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                      , o.OrderID
                      , u.UserID
                      FROM sys.syscolumns sc
                      CROSS JOIN dbo.Orders o
                      CROSS JOIN dbo.Users u
                      ORDER BY NEWID();

                      CREATE INDEX OrderDetailsOrderID ON dbo.OrderDetails(OrderID);
                      CREATE INDEX OrderDetailsUserID ON dbo.OrderDetails(UserID);


                      Here, we're querying the OrderDetails table where the UserID is 15:



                      SELECT od.OrderDetailsID
                      , o.OrderID
                      , u.UserID
                      FROM dbo.OrderDetails od
                      INNER JOIN dbo.Users u ON u.UserID = od.UserID
                      INNER JOIN dbo.Orders o ON od.OrderID = o.OrderID
                      WHERE u.UserID = 15


                      The output from the query looks like:




                      ╔════════════════╦═════════╦════════╗
                      ║ OrderDetailsID ║ OrderID ║ UserID ║
                      ╠════════════════╬═════════╬════════╣
                      ║ 2200115 ║ 2 ║ 15 ║
                      ║ 630215 ║ 3 ║ 15 ║
                      ║ 1990215 ║ 3 ║ 15 ║
                      ║ 4960215 ║ 3 ║ 15 ║
                      ║ 100715 ║ 8 ║ 15 ║
                      ║ 3930815 ║ 9 ║ 15 ║
                      ║ 6310815 ║ 9 ║ 15 ║
                      ║ 4441015 ║ 11 ║ 15 ║
                      ║ 2171315 ║ 14 ║ 15 ║
                      ║ 3431415 ║ 15 ║ 15 ║
                      ║ 4571415 ║ 15 ║ 15 ║
                      ║ 6421515 ║ 16 ║ 15 ║
                      ║ 2271715 ║ 18 ║ 15 ║
                      ║ 2601715 ║ 18 ║ 15 ║
                      ║ 3521715 ║ 18 ║ 15 ║
                      ║ 221815 ║ 19 ║ 15 ║
                      ║ 3381915 ║ 20 ║ 15 ║
                      ║ 4471915 ║ 20 ║ 15 ║
                      ╚════════════════╩═════════╩════════╝


                      As you can see, the order of rows output does not match the order of rows in the OrderDetails table.



                      Adding an explicit ORDER BY ensures rows will be returned to the client in the desired order:



                      SELECT od.OrderDetailsID
                      , o.OrderID
                      , u.UserID
                      FROM dbo.OrderDetails od
                      INNER JOIN dbo.Users u ON u.UserID = od.UserID
                      INNER JOIN dbo.Orders o ON od.OrderID = o.OrderID
                      WHERE u.UserID = 15
                      ORDER BY od.OrderDetailsID;


                      ╔════════════════╦═════════╦════════╗
                      ║ OrderDetailsID ║ OrderID ║ UserID ║
                      ╠════════════════╬═════════╬════════╣
                      ║ 3915 ║ 40 ║ 15 ║
                      ║ 100715 ║ 8 ║ 15 ║
                      ║ 221815 ║ 19 ║ 15 ║
                      ║ 299915 ║ 100 ║ 15 ║
                      ║ 368215 ║ 83 ║ 15 ║
                      ║ 603815 ║ 39 ║ 15 ║
                      ║ 630215 ║ 3 ║ 15 ║
                      ║ 728515 ║ 86 ║ 15 ║
                      ║ 972215 ║ 23 ║ 15 ║
                      ║ 992015 ║ 21 ║ 15 ║
                      ║ 1017115 ║ 72 ║ 15 ║
                      ║ 1113815 ║ 39 ║ 15 ║
                      ╚════════════════╩═════════╩════════╝


                      If order of rows is imperative, and your engineers know that order is imperative, they should only ever want to use an ORDER BY statement, since it might cost them their designation if there was a failure related to incorrect order.



                      A second, perhaps more instructive example, using the OrderDetails table from above, where we're not joining any other tables, but have a simple requirement to find rows matching both the OrderID and the UserID, we see the problem.



                      We'll create an index to support the query, as you would likely do in real life if performance is in any way important (when isn't it?).



                      CREATE INDEX OrderDetailsOrderIDUserID ON dbo.OrderDetails(OrderID, UserID);


                      Here's the query:



                      SELECT od.OrderDetailsID
                      FROM dbo.OrderDetails od
                      WHERE od.OrderID = 15
                      AND (od.UserID = 21 OR od.UserID = 22)


                      And the results:




                      ╔════════════════╗
                      ║ OrderDetailsID ║
                      ╠════════════════╣
                      ║ 21421 ║
                      ║ 5061421 ║
                      ║ 7091421 ║
                      ║ 691422 ║
                      ║ 3471422 ║
                      ║ 7241422 ║
                      ╚════════════════╝


                      Adding an ORDER BY clause will most definitely ensure we get the correct sort here, too.



                      These mock-ups are just simple examples where rows are not guaranteed to be "in order" without an explicit ORDER BY statement. There are many more example like this, and since DBMS engine code changes quite frequently, the specific behavior may change over time.






                      share|improve this answer















                      Consider the following example, where we have three related tables. Orders, Users, and OrderDetails. OrderDetails is linked with foreign keys to the Orders table and the Users Table. This is essentially a very typical setup for relational databases; arguably the entire purpose of a relational DBMS.



                      USE tempdb;

                      IF OBJECT_ID(N'dbo.OrderDetails', N'U') IS NOT NULL
                      DROP TABLE dbo.OrderDetails;

                      IF OBJECT_ID(N'dbo.Orders', N'U') IS NOT NULL
                      DROP TABLE dbo.Orders;

                      IF OBJECT_ID(N'dbo.Users', N'U') IS NOT NULL
                      DROP TABLE dbo.Users;

                      CREATE TABLE dbo.Orders
                      (
                      OrderID int NOT NULL
                      CONSTRAINT OrderTestPK
                      PRIMARY KEY
                      CLUSTERED
                      , SomeOrderData varchar(1000)
                      CONSTRAINT Orders_somedata_df
                      DEFAULT (CRYPT_GEN_RANDOM(1000))
                      );

                      CREATE TABLE dbo.Users
                      (
                      UserID int NOT NULL
                      CONSTRAINT UsersPK
                      PRIMARY KEY
                      CLUSTERED
                      , SomeUserData varchar(1000)
                      CONSTRAINT Users_somedata_df
                      DEFAULT (CRYPT_GEN_RANDOM(1000))
                      );

                      CREATE TABLE dbo.OrderDetails
                      (
                      OrderDetailsID int NOT NULL
                      CONSTRAINT OrderDetailsTestPK
                      PRIMARY KEY
                      CLUSTERED
                      , OrderID int NOT NULL
                      CONSTRAINT OrderDetailsOrderID
                      FOREIGN KEY
                      REFERENCES dbo.Orders(OrderID)
                      , UserID int NOT NULL
                      CONSTRAINT OrderDetailsUserID
                      FOREIGN KEY
                      REFERENCES dbo.Users(UserID)
                      , SomeOrderDetailsData varchar(1000)
                      CONSTRAINT OrderDetails_somedata_df
                      DEFAULT (CRYPT_GEN_RANDOM(1000))
                      );

                      INSERT INTO dbo.Orders (OrderID)
                      SELECT TOP(100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                      FROM sys.syscolumns sc;

                      INSERT INTO dbo.Users (UserID)
                      SELECT TOP(100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                      FROM sys.syscolumns sc;

                      INSERT INTO dbo.OrderDetails (OrderDetailsID, OrderID, UserID)
                      SELECT TOP(10000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                      , o.OrderID
                      , u.UserID
                      FROM sys.syscolumns sc
                      CROSS JOIN dbo.Orders o
                      CROSS JOIN dbo.Users u
                      ORDER BY NEWID();

                      CREATE INDEX OrderDetailsOrderID ON dbo.OrderDetails(OrderID);
                      CREATE INDEX OrderDetailsUserID ON dbo.OrderDetails(UserID);


                      Here, we're querying the OrderDetails table where the UserID is 15:



                      SELECT od.OrderDetailsID
                      , o.OrderID
                      , u.UserID
                      FROM dbo.OrderDetails od
                      INNER JOIN dbo.Users u ON u.UserID = od.UserID
                      INNER JOIN dbo.Orders o ON od.OrderID = o.OrderID
                      WHERE u.UserID = 15


                      The output from the query looks like:




                      ╔════════════════╦═════════╦════════╗
                      ║ OrderDetailsID ║ OrderID ║ UserID ║
                      ╠════════════════╬═════════╬════════╣
                      ║ 2200115 ║ 2 ║ 15 ║
                      ║ 630215 ║ 3 ║ 15 ║
                      ║ 1990215 ║ 3 ║ 15 ║
                      ║ 4960215 ║ 3 ║ 15 ║
                      ║ 100715 ║ 8 ║ 15 ║
                      ║ 3930815 ║ 9 ║ 15 ║
                      ║ 6310815 ║ 9 ║ 15 ║
                      ║ 4441015 ║ 11 ║ 15 ║
                      ║ 2171315 ║ 14 ║ 15 ║
                      ║ 3431415 ║ 15 ║ 15 ║
                      ║ 4571415 ║ 15 ║ 15 ║
                      ║ 6421515 ║ 16 ║ 15 ║
                      ║ 2271715 ║ 18 ║ 15 ║
                      ║ 2601715 ║ 18 ║ 15 ║
                      ║ 3521715 ║ 18 ║ 15 ║
                      ║ 221815 ║ 19 ║ 15 ║
                      ║ 3381915 ║ 20 ║ 15 ║
                      ║ 4471915 ║ 20 ║ 15 ║
                      ╚════════════════╩═════════╩════════╝


                      As you can see, the order of rows output does not match the order of rows in the OrderDetails table.



                      Adding an explicit ORDER BY ensures rows will be returned to the client in the desired order:



                      SELECT od.OrderDetailsID
                      , o.OrderID
                      , u.UserID
                      FROM dbo.OrderDetails od
                      INNER JOIN dbo.Users u ON u.UserID = od.UserID
                      INNER JOIN dbo.Orders o ON od.OrderID = o.OrderID
                      WHERE u.UserID = 15
                      ORDER BY od.OrderDetailsID;


                      ╔════════════════╦═════════╦════════╗
                      ║ OrderDetailsID ║ OrderID ║ UserID ║
                      ╠════════════════╬═════════╬════════╣
                      ║ 3915 ║ 40 ║ 15 ║
                      ║ 100715 ║ 8 ║ 15 ║
                      ║ 221815 ║ 19 ║ 15 ║
                      ║ 299915 ║ 100 ║ 15 ║
                      ║ 368215 ║ 83 ║ 15 ║
                      ║ 603815 ║ 39 ║ 15 ║
                      ║ 630215 ║ 3 ║ 15 ║
                      ║ 728515 ║ 86 ║ 15 ║
                      ║ 972215 ║ 23 ║ 15 ║
                      ║ 992015 ║ 21 ║ 15 ║
                      ║ 1017115 ║ 72 ║ 15 ║
                      ║ 1113815 ║ 39 ║ 15 ║
                      ╚════════════════╩═════════╩════════╝


                      If order of rows is imperative, and your engineers know that order is imperative, they should only ever want to use an ORDER BY statement, since it might cost them their designation if there was a failure related to incorrect order.



                      A second, perhaps more instructive example, using the OrderDetails table from above, where we're not joining any other tables, but have a simple requirement to find rows matching both the OrderID and the UserID, we see the problem.



                      We'll create an index to support the query, as you would likely do in real life if performance is in any way important (when isn't it?).



                      CREATE INDEX OrderDetailsOrderIDUserID ON dbo.OrderDetails(OrderID, UserID);


                      Here's the query:



                      SELECT od.OrderDetailsID
                      FROM dbo.OrderDetails od
                      WHERE od.OrderID = 15
                      AND (od.UserID = 21 OR od.UserID = 22)


                      And the results:




                      ╔════════════════╗
                      ║ OrderDetailsID ║
                      ╠════════════════╣
                      ║ 21421 ║
                      ║ 5061421 ║
                      ║ 7091421 ║
                      ║ 691422 ║
                      ║ 3471422 ║
                      ║ 7241422 ║
                      ╚════════════════╝


                      Adding an ORDER BY clause will most definitely ensure we get the correct sort here, too.



                      These mock-ups are just simple examples where rows are not guaranteed to be "in order" without an explicit ORDER BY statement. There are many more example like this, and since DBMS engine code changes quite frequently, the specific behavior may change over time.







                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited Jan 25 at 22:42

























                      answered Jan 25 at 22:13









                      Max VernonMax Vernon

                      50.9k13112224




                      50.9k13112224























                          10














                          As a practical example, in Postgres, the order currently changes when you update a row:



                          % SELECT * FROM mytable;
                          id | data
                          ----+------
                          0 | a
                          1 | b
                          2 | c
                          3 | d
                          4 | e
                          5 | f
                          6 | g
                          7 | h
                          8 | i
                          9 | j
                          (10 rows)

                          % UPDATE mytable SET data = 'ff' WHERE id = 5;
                          UPDATE 1
                          % SELECT * FROM mytable;
                          id | data
                          ----+------
                          0 | a
                          1 | b
                          2 | c
                          3 | d
                          4 | e
                          6 | g
                          7 | h
                          8 | i
                          9 | j
                          5 | ff
                          (10 rows)


                          I don't think the rules of this existing implicit ordering is documented anywhere, is definitely subject to change without notice, and is definitely not portable behavior across DB engines.






                          share|improve this answer


























                          • It is documented: ypercube's answer quotes the documentation telling us that the order is unspecified.

                            – Lightness Races in Orbit
                            Jan 28 at 13:58











                          • @LightnessRacesinOrbit I'd take that as the documentation explicitly telling us that it's not documented. I mean, it's also true that anything not in the documentation is unspecified. It's a kind of tautology. Anyway, I edited that part of the answer to be more specific.

                            – JoL
                            Jan 28 at 16:46
















                          10














                          As a practical example, in Postgres, the order currently changes when you update a row:



                          % SELECT * FROM mytable;
                          id | data
                          ----+------
                          0 | a
                          1 | b
                          2 | c
                          3 | d
                          4 | e
                          5 | f
                          6 | g
                          7 | h
                          8 | i
                          9 | j
                          (10 rows)

                          % UPDATE mytable SET data = 'ff' WHERE id = 5;
                          UPDATE 1
                          % SELECT * FROM mytable;
                          id | data
                          ----+------
                          0 | a
                          1 | b
                          2 | c
                          3 | d
                          4 | e
                          6 | g
                          7 | h
                          8 | i
                          9 | j
                          5 | ff
                          (10 rows)


                          I don't think the rules of this existing implicit ordering is documented anywhere, is definitely subject to change without notice, and is definitely not portable behavior across DB engines.






                          share|improve this answer


























                          • It is documented: ypercube's answer quotes the documentation telling us that the order is unspecified.

                            – Lightness Races in Orbit
                            Jan 28 at 13:58











                          • @LightnessRacesinOrbit I'd take that as the documentation explicitly telling us that it's not documented. I mean, it's also true that anything not in the documentation is unspecified. It's a kind of tautology. Anyway, I edited that part of the answer to be more specific.

                            – JoL
                            Jan 28 at 16:46














                          10












                          10








                          10







                          As a practical example, in Postgres, the order currently changes when you update a row:



                          % SELECT * FROM mytable;
                          id | data
                          ----+------
                          0 | a
                          1 | b
                          2 | c
                          3 | d
                          4 | e
                          5 | f
                          6 | g
                          7 | h
                          8 | i
                          9 | j
                          (10 rows)

                          % UPDATE mytable SET data = 'ff' WHERE id = 5;
                          UPDATE 1
                          % SELECT * FROM mytable;
                          id | data
                          ----+------
                          0 | a
                          1 | b
                          2 | c
                          3 | d
                          4 | e
                          6 | g
                          7 | h
                          8 | i
                          9 | j
                          5 | ff
                          (10 rows)


                          I don't think the rules of this existing implicit ordering is documented anywhere, is definitely subject to change without notice, and is definitely not portable behavior across DB engines.






                          share|improve this answer















                          As a practical example, in Postgres, the order currently changes when you update a row:



                          % SELECT * FROM mytable;
                          id | data
                          ----+------
                          0 | a
                          1 | b
                          2 | c
                          3 | d
                          4 | e
                          5 | f
                          6 | g
                          7 | h
                          8 | i
                          9 | j
                          (10 rows)

                          % UPDATE mytable SET data = 'ff' WHERE id = 5;
                          UPDATE 1
                          % SELECT * FROM mytable;
                          id | data
                          ----+------
                          0 | a
                          1 | b
                          2 | c
                          3 | d
                          4 | e
                          6 | g
                          7 | h
                          8 | i
                          9 | j
                          5 | ff
                          (10 rows)


                          I don't think the rules of this existing implicit ordering is documented anywhere, is definitely subject to change without notice, and is definitely not portable behavior across DB engines.







                          share|improve this answer














                          share|improve this answer



                          share|improve this answer








                          edited Jan 28 at 16:43

























                          answered Jan 26 at 1:11









                          JoLJoL

                          26114




                          26114













                          • It is documented: ypercube's answer quotes the documentation telling us that the order is unspecified.

                            – Lightness Races in Orbit
                            Jan 28 at 13:58











                          • @LightnessRacesinOrbit I'd take that as the documentation explicitly telling us that it's not documented. I mean, it's also true that anything not in the documentation is unspecified. It's a kind of tautology. Anyway, I edited that part of the answer to be more specific.

                            – JoL
                            Jan 28 at 16:46



















                          • It is documented: ypercube's answer quotes the documentation telling us that the order is unspecified.

                            – Lightness Races in Orbit
                            Jan 28 at 13:58











                          • @LightnessRacesinOrbit I'd take that as the documentation explicitly telling us that it's not documented. I mean, it's also true that anything not in the documentation is unspecified. It's a kind of tautology. Anyway, I edited that part of the answer to be more specific.

                            – JoL
                            Jan 28 at 16:46

















                          It is documented: ypercube's answer quotes the documentation telling us that the order is unspecified.

                          – Lightness Races in Orbit
                          Jan 28 at 13:58





                          It is documented: ypercube's answer quotes the documentation telling us that the order is unspecified.

                          – Lightness Races in Orbit
                          Jan 28 at 13:58













                          @LightnessRacesinOrbit I'd take that as the documentation explicitly telling us that it's not documented. I mean, it's also true that anything not in the documentation is unspecified. It's a kind of tautology. Anyway, I edited that part of the answer to be more specific.

                          – JoL
                          Jan 28 at 16:46





                          @LightnessRacesinOrbit I'd take that as the documentation explicitly telling us that it's not documented. I mean, it's also true that anything not in the documentation is unspecified. It's a kind of tautology. Anyway, I edited that part of the answer to be more specific.

                          – JoL
                          Jan 28 at 16:46











                          3














                          not exactly a demo, but too long for a comment.



                          On large tables some databases will do interleaved parallel scans:



                          If two queries want to scan the same table, and arrive at almost the same time, the first might be part way through the table when the second starts.



                          The second query could receive records starting from the middle of the table (as the first query is completing) and then receive the records from the start of the table.






                          share|improve this answer




























                            3














                            not exactly a demo, but too long for a comment.



                            On large tables some databases will do interleaved parallel scans:



                            If two queries want to scan the same table, and arrive at almost the same time, the first might be part way through the table when the second starts.



                            The second query could receive records starting from the middle of the table (as the first query is completing) and then receive the records from the start of the table.






                            share|improve this answer


























                              3












                              3








                              3







                              not exactly a demo, but too long for a comment.



                              On large tables some databases will do interleaved parallel scans:



                              If two queries want to scan the same table, and arrive at almost the same time, the first might be part way through the table when the second starts.



                              The second query could receive records starting from the middle of the table (as the first query is completing) and then receive the records from the start of the table.






                              share|improve this answer













                              not exactly a demo, but too long for a comment.



                              On large tables some databases will do interleaved parallel scans:



                              If two queries want to scan the same table, and arrive at almost the same time, the first might be part way through the table when the second starts.



                              The second query could receive records starting from the middle of the table (as the first query is completing) and then receive the records from the start of the table.







                              share|improve this answer












                              share|improve this answer



                              share|improve this answer










                              answered Jan 28 at 7:20









                              JasenJasen

                              1,201410




                              1,201410























                                  2














                                  Create a clustered index that has the "wrong" order. For example, cluster on ID DESC. This will often output the reverse order (although this is not guaranteed either).






                                  share|improve this answer




























                                    2














                                    Create a clustered index that has the "wrong" order. For example, cluster on ID DESC. This will often output the reverse order (although this is not guaranteed either).






                                    share|improve this answer


























                                      2












                                      2








                                      2







                                      Create a clustered index that has the "wrong" order. For example, cluster on ID DESC. This will often output the reverse order (although this is not guaranteed either).






                                      share|improve this answer













                                      Create a clustered index that has the "wrong" order. For example, cluster on ID DESC. This will often output the reverse order (although this is not guaranteed either).







                                      share|improve this answer












                                      share|improve this answer



                                      share|improve this answer










                                      answered Jan 27 at 13:04









                                      usrusr

                                      5,25722052




                                      5,25722052






























                                          draft saved

                                          draft discarded




















































                                          Thanks for contributing an answer to Database Administrators Stack Exchange!


                                          • Please be sure to answer the question. Provide details and share your research!

                                          But avoid



                                          • Asking for help, clarification, or responding to other answers.

                                          • Making statements based on opinion; back them up with references or personal experience.


                                          To learn more, see our tips on writing great answers.




                                          draft saved


                                          draft discarded














                                          StackExchange.ready(
                                          function () {
                                          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228131%2fhow-to-prove-the-lack-of-implicit-order-in-a-database%23new-answer', 'question_page');
                                          }
                                          );

                                          Post as a guest















                                          Required, but never shown





















































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown

































                                          Required, but never shown














                                          Required, but never shown












                                          Required, but never shown







                                          Required, but never shown







                                          Popular posts from this blog

                                          Index of /

                                          Tribalistas

                                          Listed building