Friday, April 15, 2011

Converting webpages from UTF-8 to ISO-8859-1 in Linux

Does anyone have a neat trick for converting a number of PHP and HTML files from UTF-8 to ISO-8859-1 in Linux (Ubuntu)?

From stackoverflow
  • I think iconv is your answer...

    From man iconv:

      NAME
          iconv - Convert encoding of given files from one encoding to another
    
      SYNOPSIS
          iconv -f encoding -t encoding inputfile
    
      DESCRIPTION
          The iconv program converts the encoding of characters in inputfile from one coded 
          character set to another. The result is written to standard output unless otherwise 
          specified by the --output option.
    
          .....
    

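    So you could probably do something like:

    find "$my_base_dir" \( -name "*.php" -o -name "*.html" \) -exec sh -c '
       # write to a temporary file; replace the original only if iconv succeeded
       iconv -f UTF-8 -t ISO-8859-1 -o "$1.iconv" "$1" &&
       mv "$1.iconv" "$1"
    ' sh {} \;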
    

    This recursively finds the matching files and re-encodes them. The temporary file is necessary because iconv truncates its output file before it starts reading, so writing straight back to the input file would destroy it.
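
    The temporary-file step can also be wrapped in a small function so that a failed conversion never touches the original. This is only a sketch: the name to_latin1 is made up for illustration, and it assumes GNU iconv, which exits with a non-zero status when the input is not valid UTF-8 or contains a character that ISO-8859-1 cannot represent.

```shell
# Hypothetical helper: convert one file from UTF-8 to ISO-8859-1 in place.
# The original is overwritten only if iconv reports success.
to_latin1() {
    src=$1
    # Temporary file next to the original, e.g. page.php.a1B2c3
    tmp=$(mktemp "${src}.XXXXXX") || return 1
    if iconv -f UTF-8 -t ISO-8859-1 "$src" > "$tmp"; then
        # Copy the bytes back instead of using mv, so the original
        # file keeps its owner and permissions.
        cat "$tmp" > "$src" && rm -f "$tmp"
    else
        rm -f "$tmp"
        echo "skipped (not convertible): $src" >&2
        return 1
    fi
}
```

    Calling to_latin1 on each path that find reports gives the same result as the one-liner above, except that files iconv rejects are left as they were and reported on standard error.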

  • Ubuntu has recode

    $ sudo apt-get install recode
    $ recode UTF-8..latin1 *.php
    

    Recursively, thanks to Ted Dziuba:

    $ find . -name "*.php" -exec recode UTF-8..latin1 {} \;
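
    Since recode rewrites the files in place, it can be worth listing which files actually decode as UTF-8 before converting a whole tree. The following is a rough sketch that uses iconv purely as a validator; the function names is_utf8 and list_utf8_pages are invented for this example, and file names containing newlines are not handled.

```shell
# Hypothetical helper: succeed if the file decodes cleanly as UTF-8.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Hypothetical helper: print the .php and .html files under the given
# directory that are valid UTF-8.
list_utf8_pages() {
    find "$1" -type f \( -name '*.php' -o -name '*.html' \) |
    while IFS= read -r f; do
        if is_utf8 "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

    Pure ASCII files also count as valid UTF-8 and will be listed, but converting them changes nothing. Files that were already converted to ISO-8859-1 and contain accented characters will usually fail the check and be left out.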
    
    David Zaslavsky : recode is a fairly standard Linux program - not so standard that it's always installed by default, but it should be available on all distributions, not just Ubuntu.
    Svish : How can I do this recursively?
    Ted Dziuba : Recursively, it's find . -name "*.php" -exec recode UTF-8..latin1 {} \;
    Luiz Damim : +1 Found your answer while searching google for this conversion. It saved my day :)
