When reading lines from a text file in Python, the end-of-line character often needs to be removed before the text is processed, as in the following example:
f = open("myFile.txt", "r")
for line in f:
    line = line[:-1]
    # do something with line
Is there an elegant way or idiom for retrieving text lines without the end-line character?
-
What's wrong with your code? I find it to be quite elegant and simple. The only problem is that if the file doesn't end in a newline, the last line returned won't have a '\n' as the last character, and therefore doing line = line[:-1] would incorrectly strip off the last character of the line.
The most elegant way to solve this problem would be to define a generator which took the lines of the file and removed the last character from each line only if that character is a newline:
def strip_trailing_newlines(file):
    for line in file:
        if line[-1] == '\n':
            yield line[:-1]
        else:
            yield line

f = open("myFile.txt", "r")
for line in strip_trailing_newlines(f):
    # do something with line
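To make the pitfall concrete, here is a minimal sketch that uses io.StringIO as a stand-in for a file whose last line has no trailing newline. The sample text is made up for illustration, and the generator is a lightly restated copy of the one above (using endswith instead of indexing), not a different technique.

```python
import io

def strip_trailing_newlines(lines):
    """Yield each line without its trailing newline, if it has one."""
    for line in lines:
        if line.endswith('\n'):
            yield line[:-1]
        else:
            yield line

# Hypothetical sample data: the last line has no trailing newline.
SAMPLE = "first\nsecond  \nlast"

# Unconditional slicing drops the final 't' of the last line.
sliced = [line[:-1] for line in io.StringIO(SAMPLE)]

# The generator only removes a newline when one is present.
stripped = list(strip_trailing_newlines(io.StringIO(SAMPLE)))

print(sliced)    # ['first', 'second  ', 'las']
print(stripped)  # ['first', 'second  ', 'last']
```

Note that the trailing spaces on the second line survive in both cases, which is the property Adam Rosenfield's approach is meant to preserve.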
too much php : Mac files use '\r', Windows uses '\r\n', it starts to get chunky. Much better to use str.rstrip()
Adam Rosenfield : If the file is opened in text mode, the platform's native line endings are automatically converted to a single '\n' as they are read in. And only _really old_ Mac OSs use plain '\r'. You can't use rstrip() if you want to retain trailing spaces and tabs.
pythonquick : Good idea, with the generator. Would be handy in a reusable library. I would combine your solution with efotinis' solution (to save the if:else:). Without the reusable library at hand, I would prefer efotinis' solution (using line.rstrip('\n')).
-
Simple. Use splitlines()
L = open("myFile.txt", "r").read().splitlines()
for line in L:
    process(line)  # this 'line' will not have '\n' character at the end
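As a small illustration of why splitlines() is convenient here, the sketch below compares it with split('\n') on a made-up string standing in for the file contents. When the text ends with a newline, splitlines() does not produce a trailing empty entry.

```python
# Hypothetical sample text standing in for the contents of a file.
text = "alpha\nbeta\ngamma\n"

with_split = text.split('\n')        # keeps an empty string after the final '\n'
with_splitlines = text.splitlines()  # no trailing empty entry

print(with_split)       # ['alpha', 'beta', 'gamma', '']
print(with_splitlines)  # ['alpha', 'beta', 'gamma']
```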
Matthew Trevor : But do note this loads the entire file into memory first, which may render it unsuitable for some situations.
Vijay Dev : @Matthew: Yes, you are right.
David Sykes : renders it exactly right for me, thanks
-
You may also consider using line.rstrip(), which removes all trailing whitespace, including the newline, from the end of your line.
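A quick sketch of the difference between calling rstrip() with and without an argument, using a made-up line that ends in a tab, two spaces and a newline:

```python
line = "value\t  \n"  # hypothetical line with trailing tab and spaces

no_arg = line.rstrip()             # strips all trailing whitespace
newline_only = line.rstrip('\n')   # strips only trailing newline characters

print(repr(no_arg))        # 'value'
print(repr(newline_only))  # 'value\t  '
```

If trailing spaces or tabs carry meaning in your data, the second form is the safer choice.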
too much php : I use rstrip() as well, but you have to keep in mind it also takes out trailing spaces and tabs.
monkut : As efotinis has shown, if you specify the chars argument, you can specify what to strip. From the documentation: """rstrip([chars]) The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace."""
-
The idiomatic way to do this in Python is to use rstrip('\n'):
for line in open('myfile.txt'):  # opened in text-mode; all EOLs are converted to '\n'
    line = line.rstrip('\n')
    process(line)
Each of the other alternatives has a gotcha:
- open('...').read().splitlines() has to load the whole file in memory at once.
- line = line[:-1] will silently drop the last character of the final line if that line has no EOL.
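For completeness, here is one way the rstrip('\n') idiom might be wrapped so that the file is also closed promptly. This is only a sketch: read_lines is a hypothetical helper name, and the temporary file exists purely to make the example self-contained.

```python
import os
import tempfile

def read_lines(path):
    """Lazily yield the lines of a text file without their trailing newline."""
    with open(path) as f:  # text mode
        for line in f:
            yield line.rstrip('\n')

# Hypothetical demo file; the last line deliberately has no EOL.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("one\ntwo  \nthree")
    demo_path = tmp.name

result = list(read_lines(demo_path))
os.remove(demo_path)

print(result)  # ['one', 'two  ', 'three']
```

The file is read one line at a time, trailing spaces are kept, and the last line is intact even though it has no EOL.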
too much php : HTTP and other protocols specify '\r\n' for line endings, so you should use line.rstrip('\r\n') for robustness.
-
A long time ago, there was dear, clean, old BASIC code that could run on machines with 16 KB of core memory, like this:
if (not open(1,"file.txt")) error "Could not open 'file.txt' for reading"
while(not eof(1))
  line input #1 a$
  print a$
wend
close
Now, to read a file line by line, with far better hardware and software (Python), we must reinvent the wheel:
def line_input(file):
    for line in file:
        if line[-1] == '\n':
            yield line[:-1]
        else:
            yield line

f = open("myFile.txt", "r")
for line in line_input(f):
    # do something with line
I am led to think that something has gone wrong somewhere along the way...