Monday, March 7, 2011

How to do equivalent of "limit distinct"?

How can I limit a result set to n distinct values of a given column(s), where the actual number of rows may be higher?

Input table:

client_id, employer_id, other_value
1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj
6, 1, 34kjf
7, 7, 34kjf
8, 6, lkjkj
8, 7, 23kj

desired output, where limit distinct=5 distinct values of client_id:

1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj

Platform this is intended for is MySQL.

From stackoverflow
  • You can use a subselect

    select * from table where client_id in 
    (select distinct client_id from table order by client_id limit 5)
    
    ʞɔıu : I think this is right but I'll leave the question open for a little while to see if anyone else has a better answer
  • This is for SQL Server. I can't remember, MySQL may use a LIMIT keyword instead of TOP. That may make the query more efficient if you can get rid of the inner most subquery by using the LIMIT and DISTINCT in the same subquery. (It looks like Vinko used this method and that LIMIT is correct. I'll leave this here for the second possible answer though.)

    SELECT
         client_id,
         employer_id,
         other_value
    FROM
         MyTable
    WHERE
         client_id IN
         (
              SELECT TOP 5
                   client_id
              FROM
              (
                   SELECT DISTINCT
                        client_id
                   FROM
                        MyTable
              ) SQ
              ORDER BY
                   client_id
         )
    

    Of course, add in your own WHERE clause and ORDER BY clause in the subquery.

    Another possibility (compare performance and see which works out better) is:

    SELECT
         client_id,
         employer_id,
         other_value
    FROM
         MyTable T1
    WHERE
         T1.code IN
         (
              SELECT
                   T2.code
              FROM
                   MyTable T2
              WHERE
                   (SELECT COUNT(*) FROM MyTable T3 WHERE T3,code < T2.code) < 5
         )
    

0 comments:

Post a Comment