Friday, February 4, 2011

How do I (or can I) SELECT DISTINCT on multiple columns (postgresql)?

I need to retrieve all rows from a table where the combination of two columns is unique. That is, I want every sale that has no other sale on the same day for the same price. The sales that are unique by day and price will then be updated to an active status.

So I'm thinking:

UPDATE sales
SET status = 'ACTIVE'
WHERE id IN (SELECT DISTINCT (saleprice, saledate), id, count(id)
             FROM sales
             HAVING count = 1)

But my brain hurts going any further than that.

  • SELECT DISTINCT a,b,c FROM t
    

    is roughly equivalent to:

    SELECT a,b,c FROM t GROUP BY a,b,c
    

    It's a good idea to get used to the GROUP BY syntax, as it's more powerful.

    For your query, I'd do it like this:

    UPDATE sales
    SET status='ACTIVE'
    WHERE id IN
    (
        SELECT id
        FROM sales S
        INNER JOIN
        (
            SELECT saleprice, saledate
            FROM sales
            GROUP BY saleprice, saledate
            HAVING COUNT(*) = 1 
        ) T
        ON S.saleprice=T.saleprice AND S.saledate=T.saledate
     )
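
As a rough check of both points above, here is a minimal sketch that runs the statements against a throwaway table. It assumes Python's sqlite3 module as a stand-in for PostgreSQL, and the four sample rows, the column types, and the 'PENDING' default status are hypothetical, chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption;
# the statements below use only syntax both engines accept).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        id        INTEGER PRIMARY KEY,
        saleprice NUMERIC,
        saledate  TEXT,
        status    TEXT DEFAULT 'PENDING'
    );
    -- Hypothetical sample data: sales 1 and 2 share price and date.
    INSERT INTO sales (id, saleprice, saledate) VALUES
        (1, 10, '2011-02-01'),
        (2, 10, '2011-02-01'),
        (3, 10, '2011-02-02'),
        (4, 25, '2011-02-01');
""")

# SELECT DISTINCT and GROUP BY over the same columns return the same pairs.
distinct_rows = conn.execute(
    "SELECT DISTINCT saleprice, saledate FROM sales"
).fetchall()
grouped_rows = conn.execute(
    "SELECT saleprice, saledate FROM sales GROUP BY saleprice, saledate"
).fetchall()
same_pairs = sorted(distinct_rows) == sorted(grouped_rows)

# The join-based UPDATE: activate only sales whose (price, date) pair
# occurs exactly once in the table.
conn.execute("""
    UPDATE sales
    SET status = 'ACTIVE'
    WHERE id IN (
        SELECT S.id
        FROM sales S
        INNER JOIN (
            SELECT saleprice, saledate
            FROM sales
            GROUP BY saleprice, saledate
            HAVING COUNT(*) = 1
        ) T
        ON S.saleprice = T.saleprice AND S.saledate = T.saledate
    )
""")
active_ids = [row[0] for row in conn.execute(
    "SELECT id FROM sales WHERE status = 'ACTIVE' ORDER BY id")]

print(same_pairs, active_ids)  # True [3, 4]
```

Sales 1 and 2 stay inactive because they share a price and date; only sales 3 and 4 are activated.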
    
  • The problem with your query is that when you use a GROUP BY clause (which is essentially what DISTINCT does), the select list may contain only the columns you group by and aggregate functions. You cannot select the column id directly, because a group could contain several different id values. In your case each group has exactly one value because of the HAVING clause, but most RDBMSs are not smart enough to recognize that.

    This should work however (and doesn't need a join):

    UPDATE sales
    SET status='ACTIVE'
    WHERE id IN (
      SELECT MIN(id) FROM sales
      GROUP BY saleprice, saledate
      HAVING COUNT(id) = 1
    )
    

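    You could also use MAX instead of MIN. All that matters is that the function returns the column's own value when the group contains exactly one row. AVG satisfies that too, but in PostgreSQL it returns a numeric rather than an integer, so MIN or MAX is the tidier choice.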
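
To see why the choice of aggregate does not matter here, the following sketch compares MIN(id) and MAX(id) for the groups that survive the HAVING clause, then runs the UPDATE. It again assumes Python's sqlite3 module as a stand-in for PostgreSQL, with hypothetical sample rows and a hypothetical 'PENDING' default status.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (
        id        INTEGER PRIMARY KEY,
        saleprice NUMERIC,
        saledate  TEXT,
        status    TEXT DEFAULT 'PENDING'
    );
    -- Hypothetical sample data: sales 1 and 2 share price and date.
    INSERT INTO sales (id, saleprice, saledate) VALUES
        (1, 10, '2011-02-01'),
        (2, 10, '2011-02-01'),
        (3, 10, '2011-02-02'),
        (4, 25, '2011-02-01');
""")

# For every group with exactly one row, MIN(id) and MAX(id) are that row's id.
min_max = conn.execute("""
    SELECT MIN(id), MAX(id)
    FROM sales
    GROUP BY saleprice, saledate
    HAVING COUNT(id) = 1
    ORDER BY MIN(id)
""").fetchall()

# The aggregate-based UPDATE from the answer, with no join needed.
conn.execute("""
    UPDATE sales
    SET status = 'ACTIVE'
    WHERE id IN (
        SELECT MIN(id) FROM sales
        GROUP BY saleprice, saledate
        HAVING COUNT(id) = 1
    )
""")
active_ids = [row[0] for row in conn.execute(
    "SELECT id FROM sales WHERE status = 'ACTIVE' ORDER BY id")]

print(min_max, active_ids)  # [(3, 3), (4, 4)] [3, 4]
```

Each surviving group reports the same id for both aggregates, which is why either one can feed the IN list.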
