Thursday, April 14, 2011

How to outer-join two tables (main and many-to-one sub-table) to get only ONE item from second table?

I have two tables that are something like this:

Main table: id (int), title (varchar), etc. Sub-table: main_table_id (foreign key into main table), tag (varchar), etc.

There can be zero or more subtable rows for a given row in the main table.

I want to do a query that will return every row of the main table, with the columns of the main table, and a the columns from only a single row (doesn't matter which) of the sub-table, if there are any, otherwise NULL in those columns.

Of course if I just do a basic LEFT OUTER JOIN then of course I get the main table repeated multiple times, one for each match in the sub-table.

I'm sure I have seen this done before using a LEFT OUTER JOIN and some sort of trickery that forces only one row to be selected from the sub-table, not all of them -- maybe picking out the minimum or maximum OID. However, more than an hour of googling has not yielded any solutions.

Does anybody have this trick in their toolbelt?

From stackoverflow
  • If you're just checking of there is something in the second table that goes with the items in the first table you can use an outer join with a group by clause and an aggregate:

    select t1.[name], count(t1.[name]) as t2count
       from table1 t1
       left outer join table2 t2 on t1.fk = t2.pk
    group by t1.[name]
    

    Then, anything that has 0 for t2count will be those that have nothing in table2

    edit: actually, I don't remember if t2count will have null or 0 in it...it should be one of those values.

    danwood : This one sort of works, though in reality my main table (t1) has multiple columns, and from what I can tell, I can't use the group-by trick for more than just the one column. select t1.*, t1.[name], .... --> (other column) must appear in the GROUP BY clause or be used in an aggregate function
    dustyburwell : you can just include multiple items in the group by if you need to select more than one column like "group by t1.[name], t1.id, t1.whatever
  • If you're using SQL Server, you can use the TOP clause. If it is something other than SQL Server, you'll have to see if that database offers something equivalent (many of them do). Something like this ...

    Select * from Main m
    left outer join 
    (select top 1 * from subtable s
      where s.main_table_id = m.id) q
    on q.main_table_id = m.id;
    

    Note: That is to show you the general idea. I didn't have a chance to run it, so there might be a couple of changes necessary, but the concept is there.

    danwood : Alas I'm on postgres so I don't think that can apply to me. Thanks anyhow!
    cbp : Including the inner WHERE clause doesn't work for me ("The multi-part identifier m.id could not be bound"). Removing it works ok.
    Charlie Flowers : I"m not surprised, I just typed it off top of my head without being able to run it. But the general idea is there. Thanks for supplying the detail.
  • I like Charlie's answer best, but i'm not sure if Postges has a TOP/LIMIT function, so here's an alternate solution that doesn't need one (but does assume sub_table has a primary key called "id"):

    SELECT * 
    FROM main_table m LEFT OUTER JOIN sub_table s 
    ON s.main_table_id = m.id
    WHERE s.id IS NULL OR s.id IN ( 
      SELECT MAX(id) FROM sub_table GROUP BY main_table_id
    )
    
    danwood : Hey it worked! I had to add a primary key to the sub-table but that did the trick.

0 comments:

Post a Comment