Thursday, May 5, 2011

XSLT character encoding problem

Hey guys,

I'm using XSLT with great success, but I've run into a problem:

Warning: DOMDocument::loadXML() [domdocument.loadxml]: Entity 'Aring' not defined in Entity

Basically for some non-standard characters I am getting the above type of error. I really want to fix this once and for all. I know about character mapping but I cannot possibly write down every possible combination of special characters.

Thanks!

From stackoverflow
  • Include a DTD that defines the entities, like this one

    Here's a post at PHP.net that hints at how to successfully include it.

    The DTD above should probably cover you; &Aring; is an HTML entity, and that DTD covers all HTML 4.01 entities.

    James : How do I include those? And do you have a list of DTDs that may be relevant?
    James : That is the code at the top of my .xsl page, but the problem still persists. Sorry for being slow if I am making a basic mistake.
    Ciaran McNulty : James, you need the DOCTYPE on the XML file that contains the entities.
  • When used without a DTD, XML only supports a very limited number of named entities. (&lt;, &gt;, &amp;, &quot;, and &apos;, as I recall.) To include other characters without using a DTD, simply refer to them with a numeric character reference instead.

    For example, &Aring; corresponds to Unicode character U+00C5, "Latin Capital Letter A With Ring Above". You can therefore use &#x00C5; (leading zeroes can be omitted, so &#xC5; works too) to include it in your document. If you're on Windows, the Character Map tool (on XP: Start > Programs > Accessories > System Tools) is a big help.

  • &Aring; is not a standard XML entity. In order to support it in your XML document, your XML parser needs to be DTD-aware and the document must have a DOCTYPE declaration which either defines that entity or refers to a DTD that defines that entity. An XHTML DTD, for example, defines &Aring; to mean &#197; (Å).

    It is correct for your DOM XML parser to throw an error when it sees a named entity that it does not know, either because the parser is not DTD-aware or because no DOCTYPE declaration says what that entity means. XML itself defines only the entities &lt;, &gt;, &amp;, &quot;, and &apos;. These are the named entities that can be safely used in any XML application.

    If you are writing the document yourself, then just don't use &Aring; - use a numeric equivalent instead or, assuming you're using Unicode, just use the character literal.

    If you need to be able to parse XML documents from other people containing any other named entity, and the documents don't have a DOCTYPE, then as Frank mentioned you will need to fix the document yourself by inserting a correct DOCTYPE after the XML declaration.
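
The two fixes suggested in the answers above (numeric references, or a DOCTYPE that declares the entities) can both be automated, so nobody has to list every special character by hand. What follows is only a rough sketch in Python using the standard library, not the original poster's PHP setup. The helper names named_to_numeric and add_entity_doctype and the sample document are made up for illustration, and the entity table is Python's html.entities.name2codepoint, which covers the HTML 4.01 names.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint  # HTML 4.01 entity names -> code points

# The five entities every XML parser already knows; leave these alone.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(xml_text):
    """Hypothetical helper: rewrite known HTML named entities as numeric references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # predefined or unknown: keep unchanged
        return "&#x{:X};".format(name2codepoint[name])
    return ENTITY_RE.sub(replace, xml_text)


def add_entity_doctype(xml_text, root_name):
    """Hypothetical helper: insert a DOCTYPE whose internal subset declares the entities."""
    declarations = "".join(
        '<!ENTITY {} "&#{};">'.format(name, codepoint)
        for name, codepoint in sorted(name2codepoint.items())
        if name not in XML_PREDEFINED
    )
    doctype = "<!DOCTYPE {} [{}]>".format(root_name, declarations)
    # The DOCTYPE has to come after the XML declaration, if there is one.
    xml_decl = re.match(r"<\?xml[^>]*\?>", xml_text)
    split_at = xml_decl.end() if xml_decl else 0
    return xml_text[:split_at] + doctype + xml_text[split_at:]


# Made-up sample document that uses HTML-only entity names.
sample = '<?xml version="1.0"?><city>&Aring;lesund &amp; Troms&oslash;</city>'

# Approach 1: replace the named entities before parsing.
numeric = named_to_numeric(sample)
text_via_numeric = ET.fromstring(numeric).text

# Approach 2: keep the named entities and declare them in a DOCTYPE.
with_doctype = add_entity_doctype(sample, "city")
text_via_doctype = ET.fromstring(with_doctype).text
```

Both routes should give the parser the same text. Note that the regex-based rewrite is naive: it would also touch entity-like text inside CDATA sections or comments, so the DOCTYPE route is the safer one for documents you do not control.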
