This post has already been read 2045 times!

SQL Server APPLY operator has two variants; CROSS APPLY and OUTER APPLY

  • The CROSS APPLY operator returns only those rows from the left table expression (in its final output) if it matches with the right table expression. In other words, the right table expression returns rows for the left table expression match only.
  • The OUTER APPLY operator returns all the rows from the left table expression irrespective of its match with the right table expression. For those rows for which there are no corresponding matches in the right table expression, it contains NULL values in columns of the right table expression.
  • So you might conclude, the CROSS APPLY is equivalent to an INNER JOIN (or to be more precise its like a CROSS JOIN with a correlated sub-query) with an implicit join condition of 1=1 whereas the OUTER APPLY is equivalent to a LEFT OUTER JOIN.
TEMPORARY TABLES AND OBJECTS
--Script #1 - Creating some temporary objects to work on...

USE [tempdb] 
GO
 
IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[Employee]') AND type IN (N'U')) 
BEGIN 
   DROP TABLE [Employee] 
END 
GO 

IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[Department]') AND type IN (N'U')) 
BEGIN 
   DROP TABLE [Department] 
END 

CREATE TABLE [Department]( 
   [DepartmentID] [int] NOT NULL PRIMARY KEY, 
   [Name] VARCHAR(250) NOT NULL, 
) ON [PRIMARY] 

INSERT [Department] ([DepartmentID], [Name])  
VALUES (1, N'Engineering') 
INSERT [Department] ([DepartmentID], [Name])  
VALUES (2, N'Administration') 
INSERT [Department] ([DepartmentID], [Name])  
VALUES (3, N'Sales') 
INSERT [Department] ([DepartmentID], [Name])  
VALUES (4, N'Marketing') 
INSERT [Department] ([DepartmentID], [Name])  
VALUES (5, N'Finance') 
GO 

CREATE TABLE [Employee]( 
   [EmployeeID] [int] NOT NULL PRIMARY KEY, 
   [FirstName] VARCHAR(250) NOT NULL, 
   [LastName] VARCHAR(250) NOT NULL, 
   [DepartmentID] [int] NOT NULL REFERENCES [Department](DepartmentID), 
) ON [PRIMARY] 
GO
 
INSERT [Employee] ([EmployeeID], [FirstName], [LastName], [DepartmentID]) 
VALUES (1, N'Orlando', N'Gee', 1 ) 
INSERT [Employee] ([EmployeeID], [FirstName], [LastName], [DepartmentID]) 
VALUES (2, N'Keith', N'Harris', 2 ) 
INSERT [Employee] ([EmployeeID], [FirstName], [LastName], [DepartmentID]) 
VALUES (3, N'Donna', N'Carreras', 3 ) 
INSERT [Employee] ([EmployeeID], [FirstName], [LastName], [DepartmentID]) 
VALUES (4, N'Janet', N'Gates', 3 ) 

SQL Server CROSS APPLY vs INNER JOIN example

CROSS APPLY and INNER JOIN
--Script #2 - CROSS APPLY and INNER JOIN

SELECT * FROM Department D 
CROSS APPLY 
   ( 
   SELECT * FROM Employee E 
   WHERE E.DepartmentID = D.DepartmentID 
   ) A 
GO
 
SELECT * FROM Department D 
INNER JOIN Employee E ON D.DepartmentID = E.DepartmentID 
GO 

If you look at the results, you can see see they are the same.

cross apply and inner join result set

Also, the execution plans for these queries are similar and they have an equal query cost, as you can see in the image below.

cross apply and inner join query plan

SQL Sever OUTER APPLY vs LEFT OUTER JOIN example

OUTER APPLY and LEFT OUTER JOIN
--Script #3 - OUTER APPLY and LEFT OUTER JOIN

SELECT * FROM Department D 
OUTER APPLY 
   ( 
   SELECT * FROM Employee E 
   WHERE E.DepartmentID = D.DepartmentID 
   ) A 
GO
 
SELECT * FROM Department D 
LEFT OUTER JOIN Employee E ON D.DepartmentID = E.DepartmentID 
GO 
outer apply and left outer join result set

Even though the above two queries return the same information, the execution plan is a bit different. Although cost wise there is not much difference, the query with the OUTER APPLY uses a Compute Scalar operator (with estimated operator cost of 0.0000103 or around 0%) before the Nested Loops operator to evaluate and produce the columns of the Employee table.

outer apply and left outer join query plan

Joining table valued functions and tables using APPLY operators

In Script #4, I am creating a table-valued function which accepts DepartmentID as its parameter and returns all the employees who belong to this department. The next query selects data from the Department table and uses a CROSS APPLY to join with the function we created. It passes the DepartmentID for each row from the outer table expression (in our case Department table) and evaluates the function for each row similar to a correlated subquery. The next query uses the OUTER APPLY in place of the CROSS APPLY and hence unlike the CROSS APPLY which returned only correlated data, the OUTER APPLY returns non-correlated data as well, placing NULLs into the missing columns.

APPLY with TABLE-VALUED FUNCTION
--Script #4 - APPLY with table-valued function

IF EXISTS (SELECT * FROM sys.objects WHERE OBJECT_ID = OBJECT_ID(N'[fn_GetAllEmployeeOfADepartment]') AND type IN (N'IF')) 
BEGIN 
   DROP FUNCTION dbo.fn_GetAllEmployeeOfADepartment 
END 
GO
 
CREATE FUNCTION dbo.fn_GetAllEmployeeOfADepartment(@DeptID AS INT)  
RETURNS TABLE 
AS 
RETURN 
   ( 
   SELECT * FROM Employee E 
   WHERE E.DepartmentID = @DeptID 
   ) 
GO
 
SELECT * FROM Department D 
CROSS APPLY dbo.fn_GetAllEmployeeOfADepartment(D.DepartmentID) 
GO
 
SELECT * FROM Department D 
OUTER APPLY dbo.fn_GetAllEmployeeOfADepartment(D.DepartmentID) 
GO 

You might be wondering if we can use a simple join in place of the above queries, the answer is NO. If you replace the CROSS/OUTER APPLY in the above queries with an INNER JOIN/LEFT OUTER JOIN, specifying the ON clause with 1=1 and run the query, you will get the error "The multi-part identifier "D.DepartmentID" could not be bound.". This is because with JOINs the execution context of the outer query is different from the execution context of the function (or a derived table), and you cannot bind a value/variable from the outer query to the function as a parameter. Hence the APPLY operator is required for such queries.

So in summary the APPLY operator is required when you have to use a table-valued function in the query, but it can also be used with inline SELECT statements.

Now, let's create the sample tables and check all these solutions for efficiency:

-- VERIFY THE EFFICIENCY
-- VERIFY THE EFFICIENCY

SET NOCOUNT ON
GO
DROP TABLE [20090716_cross].table1
DROP TABLE [20090716_cross].table2
DROP SCHEMA [20090716_cross]
GO
CREATE SCHEMA [20090716_cross]
CREATE TABLE table1
        (
        id INT NOT NULL PRIMARY KEY,
        row_count INT NOT NULL
        )
CREATE TABLE table2
        (
        id INT NOT NULL PRIMARY KEY,
        value VARCHAR(20) NOT NULL
        )
GO
BEGIN TRANSACTION
DECLARE @cnt INT
SET @cnt = 1
WHILE @cnt <= 100000
BEGIN
        INSERT
        INTO    [20090716_cross].table2 (id, value)
        VALUES  (@cnt, 'Value ' + CAST(@cnt AS VARCHAR))
        SET @cnt = @cnt + 1
END
INSERT
INTO    [20090716_cross].table1 (id, row_count)
SELECT  TOP 5
        id, id % 2 + 1
FROM    [20090716_cross].table2
ORDER BY
        id
COMMIT
GO

table2 contains 100,000 rows with sequential ids.

table1 contains the following:

id row_count
1 2
2 1
3 2
4 1
5 2

Now let's run the first query (with COUNT):

SELECT  *
FROM    [20090716_cross].table1 t1
JOIN    (
        SELECT  t2o.*,
                (
                SELECT  COUNT(*)
                FROM    [20090716_cross].table2 t2i
                WHERE   t2i.id <= t2o.id
                ) AS rn
        FROM    [20090716_cross].table2 t2o
        ) t2
ON      t2.rn <= t1.row_count
ORDER BY
        t1.id, t2.id
id row_count id value rn
1 2 1 Value 1 1
1 2 2 Value 2 2
2 1 1 Value 1 1
3 2 1 Value 1 1
3 2 2 Value 2 2
4 1 1 Value 1 1
5 2 1 Value 1 1
5 2 2 Value 2 2
8 rows fetched in 0.0000s (498.4063s)
Table 'table1'. Scan count 2, logical reads 200002, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Worktable'. Scan count 100000, logical reads 8389920, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'table2'. Scan count 4, logical reads 1077, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 947655 ms,  elapsed time = 498385 ms. 

This query, as was expected, is very unoptimal. It runs for more than 500 seconds.

Here's the query plan:

SELECT
  Sort
    Compute Scalar
      Parallelism (Gather Streams)
        Inner Join (Nested Loops)
          Inner Join (Nested Loops)
            Clustered Index Scan ([20090716_cross].[table2])
            Compute Scalar
              Stream Aggregate
                Eager Spool
                  Clustered Index Scan ([20090716_cross].[table2])
          Clustered Index Scan ([20090716_cross].[table1])

For each row selected from table2, it counts all previous rows again and again, never recording the intermediate result. The complexity of such an algorithm is O(n^2), that's why it takes so long.

Let's run the second query, which uses ROW_NUMBER():

SELECT  *
FROM    [20090716_cross].table1 t1
JOIN    (
        SELECT  t2o.*, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM    [20090716_cross].table2 t2o
        ) t2
ON      t2.rn <= t1.row_count
ORDER BY
        t1.id, t2.id

This is much faster, only 0.5 ms.

Let's look into the query plan:

SELECT
  Inner Join (Nested Loops)
    Clustered Index Scan ([20090716_cross].[table1])
  Lazy Spool
    Sequence Project (Compute Scalar)
      Compute Scalar
        Segment
          Clustered Index Scan ([20090716_cross].[table2])

This is much better, since this query plan keeps the intermediate results while calculating the ROW_NUMBER.

However, it still calculates ROW_NUMBERs for all 100,000 of rows in table2, then puts them into a temporary index over rn created by Lazy Spool, and uses this index in a nested loop to range the rns for each row from table1.

Calculating and indexing all ROW_NUMBERs is quite expensive, that's why we see 214,093 logical reads in the query statistics.

Finally, let's try a CROSS APPLY:

SELECT  *
FROM    [20090716_cross].table1 t1
CROSS APPLY
        (
        SELECT  TOP (t1.row_count) *
        FROM    [20090716_cross].table2
        ORDER BY
                id
        ) t2
ORDER BY
        t1.id, t2.id
id row_count id value
1 2 1 Value 1
1 2 2 Value 2
2 1 1 Value 1
3 2 1 Value 1
3 2 2 Value 2
4 1 1 Value 1
5 2 1 Value 1
5 2 2 Value 2
8 rows fetched in 0.0004s (0.0008s)
Table 'table2'. Scan count 5, logical reads 10, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 
Table 'table1'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. 

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms. 

This query is instant, as it should be.

The plan is quite simple:

SELECT
  Inner Join (Nested Loops)
    Clustered Index Scan ([20090716_cross].[table1])
    Top
      Clustered Index Scan ([20090716_cross].[table2])

For each row from table1, it just takes first row_count rows from table2. So simple and so fast.

Summary:

While most queries which employ CROSS APPLY can be rewritten using an INNER JOIN, CROSS APPLY can yield better execution plan and better performance, since it can limit the set being joined yet before the join occurs.

Article code source : https://www.mssqltips.com/sqlservertip/1958/sql-server-cross-apply-and-outer-apply/

Leave a Reply

Post Navigation