Sunday, March 6, 2011

split string on a number of different characters

I'd like to split a string using one or more separator characters.

E.g. "a b.c", split on " " and "." would give the list ["a", "b", "c"].

At the moment, I can't see anything in the standard library to do this, and my own attempts are a bit clumsy. For example:

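def my_split(string_L, split_chars):
    # Accept either a single string or a list of pieces that are already split.
    if isinstance(string_L, str):
        string_L = [string_L]
    try:
        split_char = split_chars[0]
    except IndexError:
        # No separators left, so the pieces are fully split.
        return string_L

    res = []
    for s in string_L:
        res.extend(s.split(split_char))
    # Recurse on the pieces with the remaining separators.
    return my_split(res, split_chars[1:])

print(my_split("a b.c", [' ', '.']))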

Horrible! Any better suggestions?
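
For comparison, the same repeated-split idea can be written as a plain loop, with no recursion and no string-or-list type check. This is a minimal sketch, not code from the original question; split_on_each is a hypothetical name. Like my_split, it assumes every separator is a non-empty string, since str.split raises ValueError on an empty separator.

```python
def split_on_each(text, separators):
    """Split text on every separator in turn (hypothetical helper name).

    Same approach as my_split: split on the first separator, then split each
    resulting piece on the next separator, and so on.
    """
    pieces = [text]
    for sep in separators:
        # Re-split every piece produced so far on the current separator.
        pieces = [part for piece in pieces for part in piece.split(sep)]
    return pieces


print(split_on_each("a b.c", [" ", "."]))  # ['a', 'b', 'c']
```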

From Stack Overflow:
  • >>> import re
    >>> re.split('[ .]', 'a b.c')
    ['a', 'b', 'c']
    
  • This one replaces every other separator with the first separator in the list, then splits the string on that single character.

    def split(string, divs):
        # Turn every other separator into the first one...
        for d in divs[1:]:
            string = string.replace(d, divs[0])
        # ...so a single str.split handles all of them.
        return string.split(divs[0])
    

    Output:

    >>> split("a b.c", " .")
    ['a', 'b', 'c']
    
    >>> split("a b.c", ".")
    ['a b', 'c']
    

    I do like that 're' solution, though.

  • Not very fast, but it does the job:

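    def my_split(text, seps):
        # Map every separator onto seps[0]; replacing seps[0] with itself is a no-op.
        for sep in seps:
            text = text.replace(sep, seps[0])
        # Then split once on that character.
        return text.split(seps[0])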
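
The re.split answer generalizes to an arbitrary list of separator characters by building the character class at run time. This is a minimal sketch, assuming every separator is a single character; split_any and its collapse flag are hypothetical names, not part of any library. re.escape is used so that characters such as ']', '^' or '-' are matched literally inside the class.

```python
import re


def split_any(text, separators, collapse=False):
    """Split text on any single character in separators (hypothetical helper).

    If collapse is True, a run of consecutive separators counts as one split
    point, so no empty strings appear between them.
    """
    if not separators:
        # An empty character class is not a valid pattern; nothing to split on.
        return [text]
    # Escape each character so regex metacharacters are taken literally.
    pattern = "[" + "".join(re.escape(c) for c in separators) + "]"
    if collapse:
        pattern += "+"
    return re.split(pattern, text)


print(split_any("a b.c", " ."))                   # ['a', 'b', 'c']
print(split_any("a  b..c", " ."))                 # ['a', '', 'b', '', 'c']
print(split_any("a  b..c", " .", collapse=True))  # ['a', 'b', 'c']
```

Without collapse, consecutive separators produce empty strings, which is also what the replace-then-split answers return for input like "a  b..c".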
    
