Does order matter in a JOIN clause?
May 14, 2009 1 Comment
Note: This is a repost of a popular article I wrote just over a year ago on my SQLTeam blog site. The original posting received quite a few comments and I’ve gotten more than a few direct emails about it. Many people were asking about the answer to questions #2 and #3, which I never did get around to posting. I recently received another request for the follow up posts so I pledge to get those posts written and posted soon. In the meantime, I thought I’d share the original article with you here.
A very astute SQL Server professional and reader of this blog posed the following question in a comment to a prior posting entitled “Does the order of criteria the WHERE clause matter?” The comment, submitted through Plaxo Pulse, is not available outside the Pulse community so I’ve included it below since it’s the genesis of today’s post.
“What if you were using join statements? Does it matter what order your columns are used in the ON statement? ~ Ariel M.”
There are several derivatives of the original question:
- Does the order of tables referenced in the ON clause of the JOIN matter?
- Does the order of tables referenced in the FROM clause matter?
- Does the order of columns referenced in the ON clause of the JOIN matter when multiple columns are used in the JOIN?
I’ll address questions 2 and 3 in another post; for now let’s consider the first question. To paraphrase and exemplify the question: Will the following two queries produce differing execution plans and thus varying performance? Or does the Query Optimizer take this into account as it analyzes a statement before choosing an execution plan?
--Query #1 SELECT c.Customer_ID ,c.Last_Name ,c.First_Name ,s.Order_ID ,s.Order_Date FROM dbo.Customers AS c JOIN dbo.Sales_Orders AS s ON s.Customer_ID = c.Customer_ID
--Query #2 SELECT c.Customer_ID ,c.Last_Name ,c.First_Name ,s.Order_ID ,s.Order_Date FROM dbo.Customers AS c JOIN dbo.Sales_Orders AS s ON c.Customer_ID = s.Customer_ID
Notice that the only difference in the two queries lies in the ON clause of the JOIN. The first query references Sales then Customers while the second reverses the order.
To answer this question, let’s look at the execution plan as provided by SQL Server Management Studio. The first query produces the following execution plan.
In this query, SQL Server has chosen a Clustered Index Scan of the Sales_Orders table and an Index Scan for the Customers table. The two tables are joined using a Hash Match Inner Join.
Now, let’s look at the execution plan for the second query.
Query #2 produced the exact same execution plan! So, we can conclude from this simple example that the order of tables referenced in the ON clause of a JOIN doesn’t affect the performance of a query.