Wednesday, March 16, 2011

.NET 2.0 - Tokenizing space separated text

Suppose you have output like this:

Word1           Word2   Word3      Word4

The number of spaces between words is arbitrary, and I want to break the line into an array of words.

I used the following code:

string[] tokens =
         new List<String>(input.Split(' '))
             .FindAll
             (
                 delegate(string token)
                 {
                      return token != String.Empty;
                 }
             ).ToArray();

It's not exactly efficient, since it builds a list that still contains every empty token before filtering them out, but it does the job nicely.

How would you do it?

From stackoverflow
  • I would use a regex for the split with "\w+" for the pattern.

    FlySwat : Then I'd have two problems.
    EBGreen : :) - So twice as many chances to expand your abilities.
    EBGreen : Actually I had the pattern wrong anyway...D'OH!
    FlySwat : See, that's why I'm afraid of regex.
  • He already mentions string.Split(). What he's missing is StringSplitOptions.RemoveEmptyEntries:

    string[] tokens = input.Split(new char[] { ' ' },
        StringSplitOptions.RemoveEmptyEntries);
    
    FlySwat : Genius. I didn't notice that overload.
    EBGreen : +1 for using what is already there.
    FlySwat : You have to admit, anonymous delegates make it look cooler =)
    John Rudy : Cooler, yes, more legible and obvious ... not so much. I never noticed this overload either, but it absolutely rocks! +1!
    Joel Coehoorn : The best part is that it's easy to include tabs, newlines, and other whitespace in the array if you want.
    Mitchel Sellers : Wow, in all my time using things I've never noticed this overload...
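
To make the two suggestions above concrete, here is a minimal sketch that should compile against .NET 2.0. The class and method names (TokenizeDemo, SplitOnWhitespace, MatchTokens) and the sample input are made up for illustration. The first method extends the separator array with tabs and newlines, as Joel Coehoorn's comment suggests. The second shows one way the regex idea could work: rather than splitting, it collects runs of non-whitespace with Regex.Matches and the pattern \S+ (a pattern chosen for this sketch, not the one from the answer). Splitting on "\w+" would split on the words themselves and throw them away, which is presumably the mistake EBGreen noticed.

```csharp
using System;
using System.Text.RegularExpressions;

public class TokenizeDemo
{
    // Split on spaces, tabs, and line breaks. RemoveEmptyEntries drops the
    // empty strings that runs of consecutive separators would otherwise produce.
    public static string[] SplitOnWhitespace(string input)
    {
        char[] separators = new char[] { ' ', '\t', '\r', '\n' };
        return input.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }

    // Regex alternative: each match of \S+ is one run of non-whitespace
    // characters, so the matches themselves are the tokens.
    public static string[] MatchTokens(string input)
    {
        MatchCollection matches = Regex.Matches(input, @"\S+");
        string[] tokens = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            tokens[i] = matches[i].Value;
        }
        return tokens;
    }

    public static void Main()
    {
        // Hypothetical sample input mixing spaces, a tab, and a line break.
        string input = "Word1           Word2\tWord3 \r\n   Word4";

        Console.WriteLine(string.Join("|", SplitOnWhitespace(input)));
        Console.WriteLine(string.Join("|", MatchTokens(input)));
    }
}
```

Both lines print Word1|Word2|Word3|Word4. For plain space-separated input, the Split overload is the simpler choice. The regex version is mainly useful if the definition of a token later becomes more involved than "anything that is not whitespace".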
