Code Answer: End-line characters from lines read from text file, using Python

When reading lines from a text file using Python, the end-of-line character often needs to be stripped before processing the text, as in the following example:

    f = open("myFile.txt", "r")
    for line in f:
        line = line[:-1]
        # do something with line

Is there an elegant way or idiom for retrieving text lines without the end-line character?

From stackoverflow

What's wrong with your code? I find it to be quite elegant and simple. The only problem is that if the file doesn't end in a newline, the last line returned won't have a '\n' as the last character, and therefore doing line = line[:-1] would incorrectly strip off the last character of the line.
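To make that failure mode concrete, here is a minimal sketch. It uses io.StringIO as a stand-in for a file whose last line lacks a newline; the sample text is made up for illustration.

```python
import io

# Hypothetical file contents: the last line has no trailing newline.
fake_file = io.StringIO("first\nsecond\nlast")

# Unconditional slicing removes the final character of every line,
# whether or not that character is a newline.
stripped = [line[:-1] for line in fake_file]

print(stripped)  # ['first', 'second', 'las']
```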

The most elegant way to solve this problem would be to define a generator which takes the lines of the file and removes the last character from each line only if that character is a newline:
```
def strip_trailing_newlines(file):
    for line in file:
        if line[-1] == '\n':
            yield line[:-1]
        else:
            yield line

f = open("myFile.txt", "r")
for line in strip_trailing_newlines(f):
    pass  # do something with line
```
too much php : Mac files use '\r', Windows uses '\r\n'; it starts to get chunky. Much better to use str.rstrip().

Adam Rosenfield : If the file is opened in text mode, the platform's native line endings are automatically converted to a single '\n' as they are read in. And only _really old_ Mac OSs use plain '\r'. You can't use rstrip() if you want to retain trailing spaces and tabs.
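
The text-mode behaviour described above can be checked with a short sketch. It assumes Python 3, where open() translates '\r\n' and '\r' to '\n' by default (newline=None); the temporary file and its contents are made up for the demonstration.

```python
import os
import tempfile

# Write raw bytes so the file mixes Windows, old-Mac, and Unix line endings.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"windows\r\nold mac\runix\n")

# Default text mode (newline=None) translates every line ending to '\n'.
with open(path, "r") as f:
    translated = list(f)

# newline="" turns the translation off, so the original endings stay visible.
with open(path, "r", newline="") as f:
    untranslated = list(f)

os.remove(path)

print(translated)    # ['windows\n', 'old mac\n', 'unix\n']
print(untranslated)  # ['windows\r\n', 'old mac\r', 'unix\n']
```

If the file is opened in binary mode or with newline="", no translation happens, and stripping only '\n' would leave a stray '\r' behind.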

pythonquick : Good idea, with the generator. Would be handy in a reusable library. I would combine your solution with efotinis' solution (to save the if:else:). Without the reusable library at hand, I would prefer efotinis' solution (using line.rstrip('\n')).

Simple. Use splitlines()
```
L = open("myFile.txt", "r").read().splitlines()
for line in L:
    process(line)  # this 'line' will not have a '\n' character at the end
```
Matthew Trevor : But do note this loads the entire file into memory first, which may render it unsuitable for some situations.
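
If memory is a concern, the same stripped lines can be produced lazily, one at a time. The following is only a sketch: iter_lines is a hypothetical helper name, and io.StringIO stands in for a real file.

```python
import io

def iter_lines(file):
    # Yield each line without its trailing newline, one line at a time.
    for line in file:
        yield line.rstrip('\n')

text = "alpha\nbeta  \ngamma"

# Eager: the whole text is read and split into a list up front.
eager_lines = io.StringIO(text).read().splitlines()

# Lazy: lines are pulled from the file object and stripped on demand.
# list() is used here only to display the result.
lazy_lines = list(iter_lines(io.StringIO(text)))

print(eager_lines)  # ['alpha', 'beta  ', 'gamma']
print(lazy_lines)   # ['alpha', 'beta  ', 'gamma']
```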

Vijay Dev : @Matthew: Yes, you are right.

David Sykes : renders it exactly right for me, thanks

You may also consider using line.rstrip() to remove the whitespace at the end of your line.

too much php : I use rstrip() as well, but you have to keep in mind it also takes out trailing spaces and tabs

monkut : As efotinis has shown, if you specify the chars argument, you can specify what to strip. From the documentation: """rstrip([chars]) The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace."""

The idiomatic way to do this in Python is to use rstrip('\n'):
```
for line in open('myfile.txt'):  # opened in text-mode; all EOLs are converted to '\n'
    line = line.rstrip('\n')
    process(line)
```
Each of the other alternatives has a gotcha:
- open('...').read().splitlines() has to load the whole file in memory at once.
- line = line[:-1] will silently drop the last character of the last line if that line has no EOL.

too much php : HTTP and other protocols specify '\r\n' for line endings, so you should use line.rstrip('\r\n') for robustness.
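
Because the argument to rstrip() is a set of characters rather than a suffix, rstrip('\r\n') removes any trailing run of '\r' and '\n' while leaving spaces and tabs alone. A small sketch with made-up sample strings:

```python
samples = ["text  \r\n", "text\t\n", "text\r", "text"]

# No argument: all trailing whitespace goes, including spaces and tabs.
all_whitespace = [s.rstrip() for s in samples]
print(all_whitespace)  # ['text', 'text', 'text', 'text']

# '\n' only: a '\r' left over from a '\r\n' ending survives.
newline_only = [s.rstrip('\n') for s in samples]
print(newline_only)    # ['text  \r', 'text\t', 'text\r', 'text']

# '\r\n' as a character set: line endings go, spaces and tabs stay.
cr_and_lf = [s.rstrip('\r\n') for s in samples]
print(cr_and_lf)       # ['text  ', 'text\t', 'text', 'text']
```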

A long time ago, there was dear, clean, old BASIC code that could run on machines with 16 KB of core memory, like this:

    if (not open(1,"file.txt")) error "Could not open 'file.txt' for reading"
    while(not eof(1))
      line input #1 a$
      print a$
    wend
    close

Now, to read a file line by line, with far better hardware and software (Python), we must reinvent the wheel:

    def line_input(file):
        for line in file:
            if line[-1] == '\n':
                yield line[:-1]
            else:
                yield line

    f = open("myFile.txt", "r")
    for line in line_input(f):
        pass  # do something with line

I am led to think that something has gone wrong somewhere...

Wednesday, March 16, 2011
