Sunday, February 13, 2011

Escape path separator in a regular expression

I need to write a regular expression that finds JavaScript files whose paths match the form

<anypath><slash>js<slash><anything>.js

For example, it should work for both:

  • c:\mysite\js\common.js (Windows)
  • /var/www/mysite/js/common.js (UNIX)

The problem is that the Windows file separator is not properly escaped. On Windows, File.separator is "\", which the regex engine reads as the start of an escape sequence:

pattern = Pattern.compile(
     "^(.+?)" + 
     File.separator +
     "js" +
     File.separator +
     "(.+?).js$" );

This throws:

java.util.regex.PatternSyntaxException: Illegal/unsupported escape sequence

Is there any way to use a common regular expression that works on both Windows and UNIX systems?
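
To make the failure concrete, here is a minimal sketch that reproduces it on any operating system by passing the separator in explicitly instead of reading File.separator. The class name SeparatorEscapeDemo and the helpers buildPattern and compiles are made up for illustration; they are not part of the original code.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SeparatorEscapeDemo {
    // Hypothetical helper: builds the pattern from the question, but with the
    // separator supplied by the caller so the Windows case can be simulated.
    static String buildPattern(String separator) {
        return "^(.+?)" + separator + "js" + separator + "(.+?).js$";
    }

    // Hypothetical helper: reports whether the regex compiles at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // UNIX separator: "/" has no special meaning in a regex.
        System.out.println(compiles(buildPattern("/")));   // prints "true"
        // Windows separator: the regex contains "\j", which is not a valid escape.
        System.out.println(compiles(buildPattern("\\")));  // prints "false"
    }
}
```

The second call fails because the single backslash ends up directly in front of the "j" of "js", and the regex engine rejects a backslash before a letter that does not denote an escaped construct.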

  • Does Pattern.quote(File.separator) do the trick?

    EDIT: Pattern.quote is available only in Java 1.5 and later. For 1.4, simply escape the file separator character with a backslash:

    "\\" + File.separator
    

    Escaping punctuation characters will not break anything, but escaping letters or numbers unconditionally will either change them to their special meaning or lead to a PatternSyntaxException. (Thanks Alan M for pointing this out in the comments!)

    Guido: Great, but what a pity it is only available in Java 1.5+ (I still need it to work in 1.4).
    From Tomalak
  • Can't you just use a backslash to escape the path separator like so:

    pattern = Pattern.compile(
         "^(.+?)\\" + 
         File.separator +
         "js\\" +
         File.separator +
         "(.+?)\\.js$" );
    
  • Why don't you escape File.separator:

    ... +
    "\\" + File.separator +
    ...
    

    to meet Pattern.compile's requirements? I hope "\/" (the Unix case) is processed as a single "/".

    From gimel
  • I've tested gimel's answer on a Unix system: putting "\\" + File.separator works fine, and the resulting "\/" in the pattern correctly matches a single "/".

    From Alnitak
  • Is there any way to use a common regular expression that works on both Windows and UNIX systems?

    Yes, just use a regex that matches both kinds of separator.

    pattern = Pattern.compile(
        "^(.+?)" + 
        "[/\\\\]" +
        "js" +
        "[/\\\\]" +
        "(.+?)\\.js$" );
    

    It's safe in practice: Windows permits neither character in a file or directory name, and Unix does not permit "/" (a backslash is technically legal in a Unix name, but rare).

    From Alan Moore
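
The two main suggestions above can be compared side by side. The following is only a sketch: the class name JsPathMatcher, the constant EITHER and the method forSeparator are hypothetical names chosen for this example, and the Pattern.quote variant assumes Java 1.5 or later.

```java
import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JsPathMatcher {
    // Separator-agnostic pattern: the character class accepts "/" or "\",
    // so the same regex works whatever the host OS is.
    static final Pattern EITHER =
        Pattern.compile("^(.+?)[/\\\\]js[/\\\\](.+?)\\.js$");

    // Separator-specific pattern: Pattern.quote turns the separator into a
    // literal, so a Windows "\" is no longer read as an escape.
    static Pattern forSeparator(String separator) {
        String sep = Pattern.quote(separator);
        return Pattern.compile("^(.+?)" + sep + "js" + sep + "(.+?)\\.js$");
    }

    public static void main(String[] args) {
        String windowsPath = "c:\\mysite\\js\\common.js";
        String unixPath = "/var/www/mysite/js/common.js";

        System.out.println(EITHER.matcher(windowsPath).matches()); // prints "true"
        System.out.println(EITHER.matcher(unixPath).matches());    // prints "true"

        // The quoted variant only matches paths that use the given separator.
        Matcher m = forSeparator(File.separator).matcher(
            "x" + File.separator + "js" + File.separator + "common.js");
        if (m.matches()) {
            System.out.println(m.group(2)); // prints "common"
        }
    }
}
```

The character-class version is the simpler choice when paths may come from either platform; the quoted version is stricter and only accepts the separator it was built with.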
