I have a table where one column has duplicate records but other columns are distinct. so something like this
Code SubCode version status
1234 D1 1 A
1234 D1 0 P
1234 DA 1 A
1234 DB 1 P
5678 BB 1 A
5678 BB 0 P
5678 BP 1 A
5678 BJ 1 A
0987 HH 1 A
So in the above table. subcode and Version are unique values whereas Code is repeated. I want to transfer records from the above table into a temporary table. Only records I would like to transfer are where ALL the subcodes for a code have status of 'A' and I want them in the temp table only once.
So from example above. the temporary table should only have 5678 and 0987 since all the subcodes relative to 5678 have status of 'A' and all subcodes for 0987 (it only has one) have status of A. 1234 is ommited because its subcode 'DB' has status of 'P'
I'd appreciate any help!
-
In your select, add a where clause that reads:
Select [stuff] From Table T Where Exists (Select * From Table Where Code = T.Code And Status = 'A') And Not Exists (Select * From Table I Where Code = T.Code And Not Exists (Select * From Table Where Code = I.Code And SubCode = I.SubCode And Status = 'A'))
In English, Show me the rows, where there is at least one row with status 'A', and there are NO rows with any specific subcode, that do not have at least one row with that code/subcode, with status 'A'
Tom H. : This doesn't return the correct results. 5678 will be excluded because at least one row exists with a non-"A" status. (also, your query has SubCode where I think you meant status)Charles Bretana : Yes thx, corrected thatTom H. : This still doesn't work.Charles Bretana : Now it works... , I had a "<>" where it shouild have been an "=" -
INSERT theTempTable (Code) SELECT t.Code FROM theTable t LEFT OUTER JOIN theTable subT ON (t.Code = subT.Code AND subT.status <> 'A') WHERE subT.Code IS NULL GROUP BY t.Code
This should do the trick. The logic is a little tricky, but I'll do my best to explain how it is derived.
The outer join combined with the IS NULL check allows you to search for the absence of a criteria. Combine that with the inverse of what you're normally looking for (in this case status = 'A') and the query succeeds when there are no rows that do not match. This is the same as ((there are no rows) OR (all rows match)). Since we know that there are rows due to the other query on the table, all rows must match.
Tom H. : This doesn't return the correct results. 5678 will be excluded because at least one row exists with a non-"A" status. -
It's a little unclear as to whether or not the version column comes into play. For example, do you only want to consider rows with the largest version or if ANY subcde has an "A" should it count. Take 5678, BB for example, where version 1 has an "A" and version 0 has a "B". Is 5678 included because at least one of subcode BB has an "A" or is it because version 1 has an "A".
The following code assumes that you want all codes where every subcode has at least one "A" regardless of the version.
SELECT T1.code, T1.subcode, T1.version, T1.status FROM MyTable T1 WHERE ( SELECT COUNT(DISTINCT subcode) FROM MyTable T2 WHERE T2.code = T1.code ) = ( SELECT COUNT(DISTINCT subcode) FROM MyTable T3 WHERE T3.code = T1.code AND T3.status = 'A' )
Performance may be abysmal if your table is large. I'll try to come up with a query that is likely to have better performance since this was off the top of my head.
Also, if you explain the full extent of your problem maybe we can find a way to get rid of that temp table... ;)
Here are two more possible methods. Still a lot of subqueries, but they look like they will perform better than the method above. They are both very similar, although the second one here had a better query plan in my DB. Of course, with limited data and no indexing that's not a great test. You should try all of the methods out and see which is best for your database.
SELECT T1.code, T1.subcode, T1.version, T1.status FROM MyTable T1 WHERE EXISTS ( SELECT * FROM MyTable T2 WHERE T2.code = T1.code AND T2.status = 'A' ) AND NOT EXISTS ( SELECT * FROM MyTable T3 LEFT OUTER JOIN MyTable T4 ON T4.code = T3.code AND T4.subcode = T3.subcode AND T4.status = 'A' WHERE T3.code = T1.code AND T3.status <> 'A' AND T4.code IS NULL ) SELECT T1.code, T1.subcode, T1.version, T1.status FROM MyTable T1 WHERE EXISTS ( SELECT * FROM MyTable T2 WHERE T2.code = T1.code AND T2.status = 'A' ) AND NOT EXISTS ( SELECT * FROM MyTable T3 WHERE T3.code = T1.code AND T3.status <> 'A' AND NOT EXISTS ( SELECT * FROM MyTable T4 WHERE T4.code = T3.code AND T4.subcode = T3.subcode AND T4.status = 'A' ) )
David B : Good solution. FROM is the enemy of performance, but you know this. -
Here's my solution
SELECT Code FROM ( SELECT Code, COUNT(SubCode) as SubCodeCount SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END) as SubCodeCountWithA FROM ( SELECT Code, SubCode, SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END) as ACount FROM CodeTable GROUP BY Code, SubCode ) sub GROUP BY Code ) sub2 WHERE SubCodeCountWithA = SubCodeCount
Let's break it down from the inside out.
SELECT Code, SubCode, SUM(CASE WHEN Status = 'A' THEN 1 ELSE 0 END) as ACount FROM CodeTable GROUP BY Code, SubCode
Group up the codes and subcodes (Each row is a distinct pairing of Code and Subcode). See how many A's occured in each pairing.
SELECT Code, COUNT(SubCode) as SubCodeCount SUM(CASE WHEN ACount > 0 THEN 1 ELSE 0 END) as SubCodeCountWithA FROM --previous GROUP BY Code
Regroup those pairings by Code (now each row is a Code) and count how many subcodes there are, and how many subcodes had an A.
SELECT Code FROM --previous WHERE SubCodeCountWithA = SubCodeCount
Emit those codes with have the same number of subcodes as subcodes with A's.
Tom H. : Also an interesting solution. On the limited dataset above on a table with no indexes I'm seeing a high cost for sorting (relatively), but with an actual table with data this may be better than the table scans from my method.
0 comments:
Post a Comment