I have 3 lists, I will make them simple here.
list of letters
A
B
C
list of numbers
1
2
3
Mixed
A,1
A,2
B,2
B,3
C,1
C,3
I need to know what is missing:
A,3
B,1
C,2
The list of letters has about 85 entries
and the list of numbers has about 500 entries.
The mixed list has about 75,000 entries.
I can use either a database query (mysql 5.0) or Turbo Delphi 2006 to process text files. What would be the best way to find what is missing?
Thanks,
Dave
-
A cross join would produce every combination there is, given that you have both of your lists in SQL tables:
SELECT Letter + ',' + Number AS Combination FROM NumberList, LetterList
Using the combined result (maybe save it into a temporary table), you can use a NOT EXISTS query to find what is missing:
SELECT Combination FROM AllCombinations AS a WHERE NOT EXISTS (SELECT 1 FROM MyCombitations AS m WHERE m.Combination = a.Combination)
This would require a table
MyCombitations
, which lists all of the combinations you actually have and want to check against the full list.If you want to speed things up, you should use a permanent table of combinations and an index on the
MyCombitations.Combination
field. For repeated querying this is definitely advisable. -
75.000 is not much. Load list of letters and list of numbers into two separate TStringLists. Create dynamic array (indices would be indexes into those two string lists) with the appropriate dimensions. Fill it up. Load the data and mark each line in the array. Output all elements that were not marked.
In pseudocode (untested):
var i1, i2: integer; sl1, sl2: TStringList; cross: array of array of boolean; begin // load data into sl1, sl2 SetLength(cross, sl1.Count, sl2.Count); for i1 := 0 to sl1.Count - 1 do for i2 := 0 to sl2.Count - 1 do cross[i1, i2] := false; // for each element in 'combined' list // split it into elements s1, s2 i1 := sl1.IndexOf(s1); i2 := sl2.IndexOf(s2); if (i1 < 0) or (i2 < 0) then // report error else cross[i1, i2] := true; for i1 := 0 to sl1.Count - 1 do for i2 := 0 to sl2.Count - 1 do if not cross[i1, i2] then // output sl1[i1], sl2[i2] end;
-
SELECT letter,number FROM lettersTable l , numbersTable n WHERE ( SELECT count(*) FROM ( SELECT * FROM combinationsTable WHERE l.letter=combinationsTable.letter AND n.number = combinationsTable .number ) AS temp ) = 0;
This relies on SELECT * FROM A,B testing all combinations (implicit cross-join).
-
If you can sort the data (Turbo powers SysTools has a good sort routine which works well) then you can do this in code fairly quickly with two input lists and an output list. The concept behind this is simple:
- Take two lists, sorted in the same manner
- If the left side is less than the right side, then the right side is missing that value so add it to your "missing list" and increment the cursor for the left side.
- If they are equal, then increment both,
- If the right side is less than the left side, then only increment the right side (optionally add to the "must delete" list).
This process is sometimes refered to as the "Old Master/New Master" process and is extremely fast as your only walking both lists once.
Simple example:
var ListL : tStringList; // the left list ListR : tSTringList; // the right list ListA : tSTringList; // the Add List (should start empty) ListD : tStringList; // the Delete list (should start empty) iCurL : integer; // Left Cursor iCurR : integer; // Right Cursor iRes : integer; // result of compare begin iCurL := 0; iCurR := 0; ListL := tStringList.create; ListR := tSTringList.create; ListA := tSTringList.create; ListD := tStringList.create; InitAndLoadLists(ListL,ListR,ListA,ListD); while (iCurL <= ListL.Count-1) and (iCurR <= ListR.Count-1) do begin iRes := CompareStr(ListL.Strings[iCurL],ListR.Strings[iCurR]); if iRes = 0 then begin inc(iCurL); inc(iCurR); end; if iRes < 0 then begin ListA.Add(ListL.Strings[iCurL]); inc(iCurL); end; if iRes > 0 then begin listD.Add(ListR.Strings[iCurR]); inc(iCurR); end; end; while (iCurL <= ListL.Count-1) do begin listA.Add(ListL.Strings[iCurL]); inc(iCurL); end; while (iCurR <= ListR.Count-1) do begin listD.Add(ListR.Strings[iCurR]); inc(iCurR); end; ShowMessage( 'ADDS' + ^M+^J + ListA.Text); ShowMessage( 'DELS' + ^M+^J + ListD.Text); end;
The following code is what I used for testing. This is just an example, but in a real world situation I would build my keys so that they would sort properly, right padding numbers, and forcing case where appropriate. If your using the turbo power sort routines, you can use TWO sorts instead of two lists, but the end effect is the same.
procedure InitAndLoadLists(ListL, ListR, ListA, ListD: TStringList); begin ListL.Add('A,1'); ListL.Add('B,3'); ListL.Add('C,2'); ListR.Add('A,2'); ListR.Add('B,3'); ListR.Add('C,4'); end;
Edit: Code tested in Delphi 2009 and behaves properly.
-
There's no need to create extra tables. The following query would work just as well:
SELECT c.chr, n.num FROM chars c, nums n WHERE NOT EXISTS (SELECT 1 FROM mix m WHERE m.chr = c.chr AND m.num = n.num)
Dave Albert : This along with LIMIT 100 is exactly what I was looking for. Thank you all!
0 comments:
Post a Comment