Friday, May 6, 2011

What indexes optimize this query with four joins?

I have an sql query with inner joins of four tables that takes more than 30 seconds with the current indexes and query structure. I would like to make it as fast as possible; at least faster than 5 seconds.

I first thought about denormalizing, but read here that generally it should be possible to optimize via correct indexes etc. I cannot figure it out in this case. The current query plan contains an index scan on the smallest table and a 'no join predicate' warning on one of the inner joins.

  • How can I optimize the speed of the following?
  • Which indexes?
  • Which query structure?
  • Other considerations?

We have the following tables (with number of rows and relevant fields indicated):

TableName           Rows  Fields
------------------- ----- ----------------------------------------------
ProjectType         150   ProjectTypeID, ProjectTypeName
Employee            200   EmployeeID, RefDepartmentID
Project             0.2M  ProjectID, RefProjectTypeID
ProjectTransaction  3.5M  Hours, RefEmployeeID, RefProjectID, Date, Type

The query should sum the hours for a given department, date range, etc. Currently I try:

SELECT E.RefDepartmentID, SUM(PTran.Hours)
FROM Employee E
JOIN ProjectTransaction PTran
    ON E.EmployeeID = PTran.RefEmployeeID
JOIN Project P
    ON PTran.RefProjectID = P.ProjectID
JOIN ProjectType PType
    ON P.RefProjectTypeID = PType.ProjectTypeID
WHERE E.RefDepartmentID = @departmentID
    AND @from <= PTran.Date AND PTran.Date <= @to
    AND PTran.Type = 0
    AND PType.ProjectTypeName NOT IN (N'1', N'2', N'3')
GROUP BY E.RefDepartmentID


Thanks for all the quick answers. (I already had indexes on 'foreign keys' and criteria in WHERE clause.) I reordered the query to have the two small tables first, then the medium sized, and the big one last. And voila takes around one second:

SELECT E.RefDepartmentID, SUM(PTran.Hours)
FROM Employee E
JOIN ProjectType PType
    ON E.RefCustomerID = PType.RefCustomerID
JOIN Project P
    ON PType.ProjectTypeID = P.RefProjectTypeID
JOIN ProjectTransaction PTran
    ON E.EmployeeID = PTran.RefEmployeeID
    AND P.ProjectID = PTran.RefProjectID
WHERE E.RefDepartmentID = @departmentID
    AND @from <= PTran.Date AND PTran.Date <= @to
    AND PTran.Type = 0
    AND PType.ProjectTypeName NOT IN (N'1', N'2', N'3')
GROUP BY E.RefDepartmentID
From stackoverflow
  • Make sure you have indexes on foreign key columns.

    Ole Lynge : Thanks. Yes. Columns with Ref* (like RefEmployeeID, RefDepartmentID) refer to ID columns in other tables. I have indexes on those columns. However, they are not set up explicitly as foreign keys in the sql database. Could the speed depend on the fact that they are not 'true' foreign keys?
    Mehrdad Afshari : It *might* help. This require testing. If you have a foreign key on a column and a primary key on the other, the query optimizer will be sure there's a single row on the parent table. This might help it run the query more efficiently. It might hurt INSERT performance. The good thing is they also enforce data integrity. To see if it's a performance gain or loss, you should try both and compare.
  • It not always work, but try to:

    1. Reorder tables in joins from the smallest one to the biggest one.
    2. Use subquery instead of ProjectTransaction table:

      JOIN (SELECT RefEmployeeID, RefProjectID FROM ProjectTransaction WHERE @from <= PTran.Date AND PTran.Date <= @to AND PTran.Type = 0) AS trans

    Ole Lynge : Yes, indeed. It is faster. Now it takes 1 second. Thanks.
    Ole Lynge : I forgot to say what I did: I used your first tip: reordered the tables with the smallest first and biggest last.
    codeulike : +1 crikey! 30 seconds to 1 second.
  • Assuming that P.ProjectID and PType.ProjectTypeID are PRIMARY KEY's, create this index:

    CREATE INDEX ix_ptran_emloyee_type_date ON ProjectTransaction(RefEmployeeID, Type, Date)
    

    , and get rid of GROUP BY:

    SELECT  @departmentID, SUM(PTran.Hours)
    FROM    Employee E
    JOIN    ProjectTransaction PTran
    ON      PTran.RefEmployeeID = E.EmployeeID
    JOIN    Project P
    ON      P.ProjectID = PTran.RefProjectID
    JOIN    ProjectType PType
    ON      PType.ProjectTypeID = P.RefProjectTypeID
    WHERE   E.RefDepartmentID = @departmentID
            AND PTran.Date BETWEEN @from AND @to
            AND PTran.Type = 0
            AND PType.ProjectTypeName NOT IN (N'1', N'2', N'3')
    
    codeulike : How can you get rid of GroupBy() and still have Sum() in the Select?
    Ole Lynge : I think Quassnoi forgot E.RefDepartmentID = @departmentID in the WHERE clause. With that included it is not necessary to use GROUP BY because besides the SUM there are no result fields in the SELECT list. Quassnoi only SELECT the parameter @departmentID and SUM there.
    Quassnoi : Sure, I forgot E.RefDepartmentID = @departmentID
  • What does your query plan say? If you have two index scans on the same table, followed by a hash join (or similar) between the results of them both, create a compound index with the fields from both of the index scans - the DB can then do a single index lookup in the compound index.

    Make sure you have up-to-date statistics. The database can optimise better when those are provided.

  • Indexes on all the foreign keys, obviously.

    Also ProjectTransaction is your only really big table, and you are doing lots of filtering on it in the Where clause. I would add indexes to all the columns that you are filtering on:

    ProjectTransaction.Date
    ProjectTransaction.Type

    edit: As you have a table there with 3.5 million rows, another optimisation 'technique' you might want to consider is upgrading the hardware on your SQL Server. Some extra memory or a few extra processors can sometimes be more cost-effective than development time spent optimising (depending on development costs and how much optimisation you are doing, obviously) ... e.g. see Hardware is Cheap, Programmers are Expensive.

  • The Database Engine Tuning Advisor will tell you if you just ask it.

  • All the advice so far has been sound, but if you want the expert's (i.e. SQL Server's) view, you could do worse than use the Database Engine Tuning Advisor. Even if you don't have extensive real-world data, the Adviser can make some sensible suggestions based on a small set of test data.

0 comments:

Post a Comment