Sunday, January 23, 2011

Apache rewrite rules and special characters

I have a server where some files have an actual %20 in their name (they are generated by an automated tool which handles spaces this way, and I can't do anything about this); this is not a space: it's "%" followed by "2" followed by "0".

On this server, there is an Apache web server, and there are some web pages which link to those files, using their names in URLs like http://servername/file%20with%20a%20name%20like%20this.html; those pages are also generated by the same tool, so I (again!) can't do anything about that. A full search-and-replace on all files, pages and URLs is out of the question here.

The problem: when Apache gets called with a URL like the one above, it (correctly) translates the "%20"s into spaces, and then of course it can't find the files, because they don't have actual spaces in their names.

How can I solve this?

I discovered that using a URL like http://servername/file%2520name.html works nicely, because then Apache translates "%25" into a "%" sign, and thus the correct filename gets built.
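To make the decoding step concrete, here is a minimal Python sketch of a single pass of percent-decoding. It only illustrates the general URL-decoding rule, not Apache's internals; the function name decode_once and the sample paths are made up for the illustration.

```python
from urllib.parse import unquote


def decode_once(url_path: str) -> str:
    """Percent-decode a URL path exactly once (hypothetical helper)."""
    return unquote(url_path)


# "%20" is decoded to a space, so the on-disk name "file%20name.html" is missed.
print(decode_once("/file%20name.html"))

# "%25" is decoded to "%", which leaves the literal text "%20" in the path.
print(decode_once("/file%2520name.html"))
```

The second call yields a path that contains the literal three characters "%20", which is what the files on disk are actually called.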

I tried using an Apache rewrite rule, and I can successfully replace spaces with hyphens with a syntax like this:

RewriteRule    (.*)\ (.*)      $1-$2

The problem: when I try to replace them with a "%2520" sequence, this just doesn't happen. If I use

RewriteRule    (.*)\ (.*)      $1%2520$2

then the resulting URL is http://servername/file520name.html; I've tried "%25" too, but then I only get a "5"; it looks like the initial "%2" gets discarded somehow.

The questions:

  • How can I build such a regexp to replace spaces with "%2520"?
  • Is this the only way I can deal with this issue (other than a full search-and-replace which, as I said, can't be done), or do you have any better idea?

Edit:

Escaping was the key; it works using this rule:

RewriteRule    (.*)\ (.*)      $1\%2520$2

But it only works if there is one "%20" in the initial URL; I get an "internal server error" if there is more than one.

Looks like I'm almost there... please help :-)


Edit 2:

I was able to get it to work for two spaces using the following rule:

RewriteRule    (.*)\ (.*)\ (.*)     $1\%2520$2\%2520$3

This is enough for my needs, as URLs generated by the tool can contain at most two "%20"s; but, out of curiosity: is there any way to make this work with any number of spaces? The first rule handles any number of spaces when it replaces them with a normal character; the problem happens only when special characters are involved.

  • The % is being read as a back reference (in mod_rewrite, %N refers to a group captured by a RewriteCond), so you need to escape the %.

    Massimo : Ok, but **how**? I tried "%%25" and "\%25", but neither worked.
    Massimo : Ok, it worked using "$1\%2520$2", but see my edit on the main question for another problem.
    Nerdling : You can nest parentheses to match any number of something and capture it: ((pattern)*). You won't be able to reference each repetition in the URL rewrite, though, as the quantity may be unbounded.
    From Nerdling
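
As an aside on the "any number of spaces" question: the pattern (.*)\ (.*) is greedy, so a single application can only replace the last space in the path. The Python sketch below illustrates that regex behaviour and the "repeat until no space is left" idea. It assumes Python's re module treats this simple pattern the same way as the PCRE engine used by mod_rewrite; the function names and sample path are hypothetical, and it is not a tested Apache configuration.

```python
import re

# Same shape as the rewrite pattern in the question: greedy text, one space, more text.
PATTERN = re.compile(r"(.*) (.*)")


def replace_one_space(path: str) -> str:
    """Apply the pattern once; the greedy first group leaves only the last space to match."""
    return PATTERN.sub(r"\1%2520\2", path, count=1)


def replace_all_spaces(path: str) -> str:
    """Re-apply the single-space replacement until no space remains."""
    while " " in path:
        path = replace_one_space(path)
    return path


print(replace_one_space("/a b c.html"))   # only the last space is replaced
print(replace_all_spaces("/a b c.html"))  # every space is replaced
```

In mod_rewrite, the documented [N] (next) flag restarts the ruleset after a match and is the usual way to express such a loop; whether it behaves as hoped together with the escaped % in this setup is not something the thread confirms.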
