Friday, May 6, 2011

Are LEFT JOIN subquery table arguments evaluated more than once?

Hello. I have a query that looks like this:

SELECT  *
FROM    employees e
LEFT JOIN
        (
        SELECT  *
        FROM    timereports
        WHERE   date = '2009-05-04'
        ) t
ON      e.id = t.employee_id

As you can see, my LEFT JOIN second table parameter is generated by a a subquery.

Does the db evaluate this subquery only once, or multiple times?

thanks. matti

From stackoverflow
  • This depends on the RDBMS.

    In most of them, a HASH OUTER JOIN will be employed, in which case the subquery will be evaluated once.

    MySQL, on the other hand, isn't capable of making HASH JOIN's, that's why it will most probably push the predicate into the subquery and will issue this query:

    SELECT  *
    FROM    timereports t
    WHERE   t.employee_id = e.id
            AND date = '2009-05-04'
    

    in a nested loop. If you have an index on timereports (employee_id, date), this will also be efficient.

  • If you are using SQL Server, you can take a look at the Execution Plan of the query. The SQL Server query optimizer will optimize the query so that it takes the least time in execution. Best time will be based on some conditions viz. indexing and the like.

  • You have to ask the database to show the plan. The algorithm for doing this is chosen dynamically (at query time) based on many factors. Some databases use statistics of the key distribution to decide which algorithm to use. Other databases have relatively fixed rules.

    Further, each database has a menu of different algorithms. The database could use a sort-merge algorithm, or nested loops. In this case, there may be a query flattening strategy.

    You need to use your database's unique "Explain Plan" feature to look at the query execution plan.

    You also need to know if your database uses hints (usually comments embedded in the SQL) to pick an algorithm.

    You also need to know if your database uses statistics (sometimes called a "cost-based query optimizer) to pick an algorithm.

    One you know all that, you'll know how your query is executed and if an inner query is evaluated multiple times or flattened into the parent query or evaluated once to create a temporary result that's used by the parent query.

  • What do you mean by evaluated?

    The database has a couple of different options how to perform a join, the two most common ones being

    • Nested loops, in which case each row in one table will be looped through and the corresponding row in the other table will be looked up, and
    • Hash join, which means that both tables will be scanned once and the results are then merged using some hash algorithm.

    Which of those two options is chosen depends on the database, the size of the table and the available indexes (and perhaps other things as well).

0 comments:

Post a Comment