Friday, February 4, 2011

How do you implement Levenshtein distance in Delphi?

I'm posting this in the spirit of answering your own questions.

The question I had was: How can I implement the Levenshtein algorithm for calculating the edit distance between two strings in Delphi? (Links describing the algorithm are in the comment at the top of the function.)

A note on performance: this implementation is very fast. On my desktop (2.33 GHz dual-core, 2 GB RAM, Windows XP), I can run through an array of 100K strings in less than one second.

    function EditDistance(s, t: string): integer;
    var
      d : array of array of integer;
      i,j,cost : integer;
    begin
      {
      Compute the edit-distance between two strings.
      Algorithm and description may be found at either of these two links:
      http://en.wikipedia.org/wiki/Levenshtein_distance
      http://www.google.com/search?q=Levenshtein+distance
      }
    
      try
        //initialize our cost array
        SetLength(d,Length(s)+1);
        for i := Low(d) to High(d) do begin
          SetLength(d[i],Length(t)+1);
        end;
    
        //seed the first column and the first row with the base-case distances
        for i := Low(d) to High(d) do begin
          d[i,0] := i;
        end;
        for j := Low(d[0]) to High(d[0]) do begin
          d[0,j] := j;
        end;
    
        //store our costs in a 2-d grid  
        for i := Low(d)+1 to High(d) do begin
          for j := Low(d[i])+1 to High(d[i]) do begin
            if s[i] = t[j] then begin
              cost := 0;
            end
            else begin
              cost := 1;
            end;
    
            //to use "Min", add "Math" to your uses clause!
            d[i,j] := Min(Min(
                       d[i-1,j]+1,      //deletion
                       d[i,j-1]+1),     //insertion
                       d[i-1,j-1]+cost  //substitution
                       );
          end;  //for j
        end;  //for i
    
        //now that we've stored the costs, return the final one
        Result := d[Length(s),Length(t)];
      finally
        //cleanup; strictly optional, since Delphi frees dynamic arrays
        //automatically when they go out of scope
        for i := Low(d) to High(d) do begin
          SetLength(d[i],0);
        end;  //for i
        SetLength(d,0);
      end;  //try-finally
    end;
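
The function above keeps the whole grid in memory, but each cell depends only on the current row and the row before it. As a rough sketch (not part of the original code), the same recurrence can be run with just two rows. The names EditDistanceTwoRows and TIntRow are made up for this illustration, and the small console program around the function is only there to make the example self-contained.

```pascal
program EditDistanceDemo;

{$APPTYPE CONSOLE}

uses
  Math;  // for Min

type
  // Hypothetical helper type for this sketch: one row of the cost grid.
  TIntRow = array of Integer;

// Hypothetical variant of EditDistance: same recurrence, but only the
// previous and the current row of the grid are kept.
function EditDistanceTwoRows(const s, t: string): Integer;
var
  prev, curr, swap: TIntRow;
  i, j, cost: Integer;
begin
  SetLength(prev, Length(t) + 1);
  SetLength(curr, Length(t) + 1);

  // Row 0: building the first j characters of t from an empty string
  // takes j insertions.
  for j := 0 to Length(t) do
    prev[j] := j;

  for i := 1 to Length(s) do
  begin
    // Column 0: removing the first i characters of s takes i deletions.
    curr[0] := i;
    for j := 1 to Length(t) do
    begin
      if s[i] = t[j] then
        cost := 0
      else
        cost := 1;
      curr[j] := Min(Min(prev[j] + 1,        // deletion
                         curr[j - 1] + 1),   // insertion
                     prev[j - 1] + cost);    // substitution
    end;
    // The finished row becomes the "previous" row for the next pass.
    // Dynamic arrays are references, so this swaps without copying.
    swap := prev;
    prev := curr;
    curr := swap;
  end;

  // After the final swap, prev holds the last completed row.
  Result := prev[Length(t)];
end;

begin
  WriteLn(EditDistanceTwoRows('kitten', 'sitting'));  // 3
  WriteLn(EditDistanceTwoRows('', 'abc'));            // 3
  WriteLn(EditDistanceTwoRows('flaw', 'lawn'));       // 2
end.
```

I have not benchmarked this variant, so treat any speed difference as a guess. The main point is that memory use grows with the length of t alone instead of with Length(s) times Length(t).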
    
