Thursday, March 24, 2011

Can I know how much of an XML document is processed by an XSL transformation?

I have rather large input XML files that must be converted to a given format with XSL. These XSL files have been hand-crafted, and we want to make sure that every non-empty text node in the original document is taken into account, i.e. either value-of'ed, copy-of'ed, or read and concatenated with something else.

Is this even possible? I.e., is it possible to know how much of an input document is "covered" by an XSL transformation?

From stackoverflow
  • You could extract all the text from the original XML and then check whether these strings appear in the resulting document. This doesn't prove that everything was converted, but it might reveal some obviously missing parts.

    lindelof : Yes but my problem is that some text might be converted, e.g. date fields might be converted from one locale to another.
    sth : I feared that would be the case...
  • The best I can come up with now is to add something like this to the end of your xsl:

    <xsl:template match="text()[normalize-space()]">
      To battle stations!
      This sneaky tag tried to escape: <xsl:value-of select="name(..)"/>
    </xsl:template>
    

    But it really depends on <xsl:apply-templates/> being called in all the right places and probably won't do it for non-trivial stylesheets...

    Dimitre Novatchev : Unfortunately, the described technique is useful only to capture unmatched text nodes. A text node may not be matched by any template, yet it can still be accessed/copied by the code of other templates. On the other hand, even if a text node is matched by a template, it still may not be copied!
    Dimitre Novatchev : Note, I haven't downvoted you :)
  • The only way I have done this in the past is by stepping through the XSL with a product like Altova's XMLSpy. This is very tedious for large XSL and XML documents, of course, but I have sometimes found it necessary in order to find out what is going on in a transformation.

  • The general answer is negative.

    Also generally impossible is the less ambitious idea of finding all text nodes to which no template was applied at run time. This is because, even if matching templates are defined, they can only be selected for processing as a result of an <xsl:apply-templates> whose "select" attribute selects those specific text nodes.

    Whether or not this is done during run time is generally impossible to analyze.

    It is also not possible in general to analyze every XPath expression used in the select attribute of <xsl:value-of> and of <xsl:copy-of>, because such an expression may contain an XSLT variable, and we must know the run-time contents of this variable in order to determine which nodes will be selected.
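
The heuristic from the first answer (extract every non-empty text node from the input and look for it in the output) can be sketched roughly as follows. This is only an illustration using Python's standard library; the function names and the sample data are made up for the example, and, as the comments on that answer point out, text that gets reformatted (such as dates) will be reported as missing even though it was processed.

```python
import xml.etree.ElementTree as ET


def non_empty_text_nodes(xml_source):
    """Return the whitespace-normalized, non-empty text nodes of an XML string."""
    root = ET.fromstring(xml_source)
    texts = []
    for element in root.iter():
        # ElementTree keeps the text before an element's first child in .text
        # and the text that follows its end tag in .tail.
        for chunk in (element.text, element.tail):
            if chunk and chunk.strip():
                texts.append(" ".join(chunk.split()))
    return texts


def missing_text(input_xml, output_text):
    """List input text nodes that do not occur verbatim in the output."""
    normalized_output = " ".join(output_text.split())
    return [t for t in non_empty_text_nodes(input_xml) if t not in normalized_output]


# Hypothetical sample data, not taken from the question.
SAMPLE_INPUT = "<order><id>42</id><date>2011-03-24</date><note>rush  delivery</note></order>"
SAMPLE_OUTPUT = "<html><body><p>Order 42</p><p>24/03/2011</p></body></html>"

if __name__ == "__main__":
    for text in missing_text(SAMPLE_INPUT, SAMPLE_OUTPUT):
        print("not found in output:", text)
```

With this sample, the note is genuinely dropped, while the date is flagged only because its format changed. A short value like "42" can also match by coincidence, so an empty report is a hint and not a proof of coverage.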
