For debugging purposes, I need to recursively search a directory for all files which start with a UTF-8 byte order mark (BOM). My current solution is a simple shell script:
find . -type f |
while IFS= read -r file
do
    if [ "$(head -c 3 -- "$file")" == $'\xef\xbb\xbf' ]
    then
        echo "found BOM in: $file"
    fi
done
Or, if you prefer short, unreadable one-liners:
find -type f|while read file;do [ "`head -c3 -- "$file"`" == $'\xef\xbb\xbf' ] && echo "found BOM in: $file";done
It doesn't work with filenames that contain a line break, but such files are not to be expected anyway.
Is there any shorter or more elegant solution?
Are there any interesting text editors or macros for text editors?
-
If you accept some false positives (in case there are non-text files, or in the unlikely case there is a ZWNBSP in the middle of a file), you can use grep:
fgrep -rl `echo -ne '\xef\xbb\xbf'` .
From CesarB -
find -type f -print0 | xargs -0 grep -l `printf '^\xef\xbb\xbf'` | sed 's/^/found BOM in: /'
- find -print0 puts a null byte (\0) between file names instead of a newline.
- xargs -0 expects null-separated arguments instead of line-separated ones.
- grep -l lists the files which match the regex.
- The regex ^\xef\xbb\xbf isn't entirely correct, as it will match non-BOMed UTF-8 files if they have a zero-width no-break space at the start of a line.
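To sidestep the start-of-line caveat, one option is to keep the null-separated pipeline but compare only the first three bytes of each file instead of running a regex over every line. The following is a minimal sketch under that assumption; find_bom_files is a hypothetical helper name, and it relies on head -c, od and xargs -0 being available, as they are on GNU and BSD systems.

```shell
# find_bom_files is a hypothetical helper name used only for this sketch.
find_bom_files() {
    # -print0 and xargs -0 pass each path NUL-terminated, so spaces and
    # newlines in file names survive the pipe.
    find "${1:-.}" -type f -print0 |
    xargs -0 sh -c '
        for file do
            # Hex-dump only the first three bytes and compare them.
            sig=$(head -c 3 -- "$file" | od -An -tx1 | tr -d " \n")
            if [ "$sig" = "efbbbf" ]; then
                printf "found BOM in: %s\n" "$file"
            fi
        done
    ' sh
}
```

Calling find_bom_files with no argument searches the current directory; passing a path searches that directory instead. Because only three bytes per file are read, a ZWNBSP later in the file does not produce a match.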
MSalters : You still need a "head 1" in the pipe before the grep
From Jonathan Wright -
I would use something like:
grep -orHbm1 "^`echo -ne '\xef\xbb\xbf'`" . | sed '/:0:/!d;s/:0:.*//'
Which will ensure that the BOM occurs starting at the first byte of the file.
From Marcus Griep -
What about this one simple command, which not only finds but also removes the nasty BOM? :)
find . -type f -exec sed '1s/^\xEF\xBB\xBF//' -i.bak {} \; -exec rm {}.bak \;
I love "find" :)
If you want just to show BOM files, use this one:
grep -rl $'\xEF\xBB\xBF' .
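The removal command earlier in this answer depends on GNU sed (the \x escapes and -i are extensions). As a hedged alternative, the sketch below drops the first three bytes with tail, and only when the file really starts with EF BB BF. strip_bom is a hypothetical helper name, not something from the original answer, and the sketch assumes mktemp, head -c and tail -c are available.

```shell
# strip_bom is a hypothetical helper; the ( ... ) body runs in a subshell
# so its variables do not leak into the caller.
strip_bom() (
    file=$1
    # Read only the first three bytes and render them as hex.
    sig=$(head -c 3 -- "$file" | od -An -tx1 | tr -d ' \n')
    # Leave files that do not start with a BOM untouched.
    [ "$sig" = "efbbbf" ] || exit 0
    tmp=$(mktemp) || exit 1
    # tail -c +4 copies everything from the fourth byte onward;
    # writing back with cat keeps the original inode and permissions.
    tail -c +4 -- "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

It could be combined with find in the same style as above, for example: find . -type f -exec sh -c '...' sh {} + with the function body inlined, or by calling strip_bom in a loop over the files reported by the grep command.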
From Denis -
find . -type f -print0 | xargs -0r awk ' /^\xEF\xBB\xBF/ {print FILENAME} {nextfile}'
Most of the solutions given above test more than the first line of the file, even if some (such as Marcus's solution) then filter the results. This solution only tests the first line of each file so it should be a bit quicker.