Code Answer: grouping records in one temp table

I have a table where one column has duplicate records but other columns are distinct. so something like this

Code SubCode version status

1234 D1 1 A

1234 D1 0 P

1234 DA 1 A

1234 DB 1 P

5678 BB 1 A

5678 BB 0 P

5678 BP 1 A

5678 BJ 1 A

0987 HH 1 A

So in the above table. subcode and Version are unique values whereas Code is repeated. I want to transfer records from the above table into a temporary table. Only records I would like to transfer are where ALL the subcodes for a code have status of 'A' and I want them in the temp table only once.

So from example above. the temporary table should only have 5678 and 0987 since all the subcodes relative to 5678 have status of 'A' and all subcodes for 0987 (it only has one) have status of A. 1234 is ommited because its subcode 'DB' has status of 'P'

I'd appreciate any help!

From stackoverflow

In your select, add a where clause that reads:
```
Select [stuff]
From Table T
Where Exists
    (Select * From Table 
     Where Code = T.Code
        And Status = 'A')
  And Not Exists
    (Select * From Table I
     Where Code = T.Code 
       And Not Exists
          (Select * From Table
           Where Code = I.Code
               And SubCode = I.SubCode
               And Status = 'A'))
```
In English, Show me the rows, where there is at least one row with status 'A', and there are NO rows with any specific subcode, that do not have at least one row with that code/subcode, with status 'A'

Tom H. : This doesn't return the correct results. 5678 will be excluded because at least one row exists with a non-"A" status. (also, your query has SubCode where I think you meant status)

Charles Bretana : Yes thx, corrected that

Tom H. : This still doesn't work.

Charles Bretana : Now it works... , I had a "<>" where it shouild have been an "="
```
INSERT theTempTable (Code)
SELECT t.Code
FROM   theTable t
       LEFT OUTER JOIN theTable subT ON (t.Code = subT.Code AND subT.status <> 'A')
WHERE  subT.Code IS NULL
GROUP BY t.Code
```
This should do the trick. The logic is a little tricky, but I'll do my best to explain how it is derived.

The outer join combined with the IS NULL check allows you to search for the absence of a criteria. Combine that with the inverse of what you're normally looking for (in this case status = 'A') and the query succeeds when there are no rows that do not match. This is the same as ((there are no rows) OR (all rows match)). Since we know that there are rows due to the other query on the table, all rows must match.

Tom H. : This doesn't return the correct results. 5678 will be excluded because at least one row exists with a non-"A" status.

It's a little unclear as to whether or not the version column comes into play. For example, do you only want to consider rows with the largest version or if ANY subcde has an "A" should it count. Take 5678, BB for example, where version 1 has an "A" and version 0 has a "B". Is 5678 included because at least one of subcode BB has an "A" or is it because version 1 has an "A".

The following code assumes that you want all codes where every subcode has at least one "A" regardless of the version.

SELECT
    T1.code,
    T1.subcode,
    T1.version,
    T1.status
FROM
    MyTable T1
WHERE
    (
      SELECT COUNT(DISTINCT subcode)
      FROM MyTable T2
      WHERE T2.code = T1.code
    ) =
    (
      SELECT COUNT(DISTINCT subcode)
      FROM MyTable T3
      WHERE T3.code = T1.code AND T3.status = 'A'
    )

Performance may be abysmal if your table is large. I'll try to come up with a query that is likely to have better performance since this was off the top of my head.

Also, if you explain the full extent of your problem maybe we can find a way to get rid of that temp table... ;)

Here are two more possible methods. Still a lot of subqueries, but they look like they will perform better than the method above. They are both very similar, although the second one here had a better query plan in my DB. Of course, with limited data and no indexing that's not a great test. You should try all of the methods out and see which is best for your database.

SELECT
    T1.code,
    T1.subcode,
    T1.version,
    T1.status
FROM
    MyTable T1
WHERE
    EXISTS
    (
     SELECT *
     FROM MyTable T2
     WHERE T2.code = T1.code
       AND T2.status = 'A'
    ) AND
    NOT EXISTS
    (
     SELECT *
     FROM MyTable T3
     LEFT OUTER JOIN MyTable T4 ON
      T4.code = T3.code AND
      T4.subcode = T3.subcode AND
      T4.status = 'A'
     WHERE T3.code = T1.code
       AND T3.status <> 'A'
       AND T4.code IS NULL
    )

SELECT
    T1.code,
    T1.subcode,
    T1.version,
    T1.status
FROM
    MyTable T1
WHERE
    EXISTS
    (
     SELECT *
     FROM MyTable T2
     WHERE T2.code = T1.code
       AND T2.status = 'A'
    ) AND
    NOT EXISTS
    (
     SELECT *
     FROM MyTable T3
     WHERE T3.code = T1.code
       AND T3.status <> 'A'
       AND NOT EXISTS
      (
       SELECT *
       FROM MyTable T4
       WHERE T4.code = T3.code
         AND T4.subcode = T3.subcode
         AND T4.status = 'A'
      )
    )

David B : Good solution. FROM is the enemy of performance, but you know this.

Here's my solution

SELECT Code
FROM
(
  SELECT
    Code,
    COUNT(SubCode) as SubCodeCount
    SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END)
      as SubCodeCountWithA
  FROM
  (
    SELECT
      Code,
      SubCode,
      SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END)
        as ACount
    FROM CodeTable
    GROUP BY Code, SubCode
  ) sub
  GROUP BY Code
) sub2
WHERE SubCodeCountWithA = SubCodeCount

Let's break it down from the inside out.

    SELECT
      Code,
      SubCode,
      SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END)
        as ACount
    FROM CodeTable
    GROUP BY Code, SubCode

Group up the codes and subcodes (Each row is a distinct pairing of Code and Subcode). See how many A's occured in each pairing.

  SELECT
    Code,
    COUNT(SubCode) as SubCodeCount
    SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END)
      as SubCodeCountWithA
  FROM
    --previous
  GROUP BY Code

Regroup those pairings by Code (now each row is a Code) and count how many subcodes there are, and how many subcodes had an A.

SELECT Code
FROM
  --previous
WHERE SubCodeCountWithA = SubCodeCount

Emit those codes with have the same number of subcodes as subcodes with A's.

Tom H. : Also an interesting solution. On the limited dataset above on a table with no indexes I'm seeing a high cost for sorting (relatively), but with an actual table with data this may be better than the table scans from my method.

Code Answer

Monday, March 7, 2011

grouping records in one temp table

0 comments:

Post a Comment

Blog Archive