Wednesday, March 16, 2011

End-line characters from lines read from text file, using Python

When reading lines from a text file using Python, the end-line character often needs to be stripped before processing the text, as in the following example:

    f = open("myFile.txt", "r")
    for line in f:
        line = line[:-1]
        # do something with line

Is there an elegant way or idiom for retrieving text lines without the end-line character?

From stackoverflow
  • What's wrong with your code? I find it to be quite elegant and simple. The only problem is that if the file doesn't end in a newline, the last line returned won't have a '\n' as the last character, and therefore doing line = line[:-1] would incorrectly strip off the last character of the line.

    The most elegant way to solve this problem would be to define a generator which took the lines of the file and removed the last character from each line only if that character is a newline:

    def strip_trailing_newlines(file):
        for line in file:
            if line[-1] == '\n':
                yield line[:-1]
            else:
                yield line
    
    f = open("myFile.txt", "r")
    for line in strip_trailing_newlines(f):
        pass  # do something with line
    
    too much php : Mac files use '\r' and Windows uses '\r\n', so it starts to get chunky. Much better to use str.rstrip().
    Adam Rosenfield : If the file is opened in text mode, the platform's native line endings are automatically converted to a single '\n' as they are read in. And only _really old_ Mac OSs use plain '\r'. You can't use rstrip() if you want to retain trailing spaces and tabs.
    pythonquick : Good idea, with the generator. Would be handy in a reusable library. I would combine your solution with efotinis' solution (to save the if:else:). Without the reusable library at hand, I would prefer efotinis' solution (using line.rstrip('\n')).
  • Simple. Use splitlines()

    L = open("myFile.txt", "r").read().splitlines()
    for line in L:
        process(line)  # this 'line' will not have a '\n' character at the end
    
    Matthew Trevor : But do note this loads the entire file into memory first, which may render it unsuitable for some situations.
    Vijay Dev : @Matthew: Yes, you are right.
    David Sykes : renders it exactly right for me, thanks
  • You may also consider using line.rstrip() to remove the whitespace at the end of your line.

    too much php : I use rstrip() as well, but you have to keep in mind it also takes out trailing spaces and tabs
    monkut : As efotinis has shown, if you specify the chars argument, you can specify what to strip. From the documentation: """rstrip([chars]) The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace."""
  • The idiomatic way to do this in Python is to use rstrip('\n'):

    for line in open('myfile.txt'):  # opened in text-mode; all EOLs are converted to '\n'
        line = line.rstrip('\n')
        process(line)
    

    Each of the other alternatives has a gotcha:

    • open('...').read().splitlines() has to load the whole file in memory at once.
    • line = line[:-1] will chop off the last character of the last line if that line has no EOL.
    too much php : HTTP and other protocols specify '\r\n' for line endings, so you should use line.rstrip('\r\n') for robustness.
  • A long time ago, there was dear, clean, old BASIC code that could run on machines with 16 KB of core memory, like this:

    if (not open(1,"file.txt")) error "Could not open 'file.txt' for reading"
    while(not eof(1)) 
      line input #1 a$
      print a$
    wend
    close
    

    Now, to read a file line by line, with far better hardware and software (Python), we must reinvent the wheel:

    def line_input (file):
        for line in file:
            if line[-1] == '\n':
                yield line[:-1]
            else:
                yield line
    
    f = open("myFile.txt", "r")
    for line in line_input(f):
        pass  # do something with line
    

    I am led to think that something has gone wrong somewhere along the way...
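
To see the gotchas from the answers above side by side, here is a minimal sketch that runs each approach on a small file whose first line has trailing spaces and whose last line has no end-line character. The file contents and the helper name read_lines are made up for illustration and do not come from any of the answers.

```python
import os
import tempfile


def read_lines(path):
    """Yield each line of a text file without its trailing newline."""
    with open(path, "r") as f:  # text mode: line endings arrive as '\n'
        for line in f:
            yield line.rstrip("\n")


# Hypothetical sample: trailing spaces on line 1, no newline after the last line.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha  \nbeta\ngamma")
    path = tmp.name

with open(path, "r") as f:
    raw = f.readlines()

sliced = [line[:-1] for line in raw]                # loses the final 'a' of 'gamma'
newline_only = [line.rstrip("\n") for line in raw]  # keeps the trailing spaces
all_whitespace = [line.rstrip() for line in raw]    # drops the trailing spaces too

with open(path, "r") as f:
    split = f.read().splitlines()                   # whole file in memory at once

from_generator = list(read_lines(path))
os.remove(path)

print(sliced)          # ['alpha  ', 'beta', 'gamm']
print(newline_only)    # ['alpha  ', 'beta', 'gamma']
print(all_whitespace)  # ['alpha', 'beta', 'gamma']
print(split)           # ['alpha  ', 'beta', 'gamma']
```

Only the line[:-1] version damages the data here. rstrip('\n') and splitlines() give the same result, and the difference between them is whether the file is read lazily or all at once.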
