Thursday, February 10, 2011

XPath + Firebug + XML/HTML + HTML Agility Pack C#

Hi! When I use Firebug or a bookmarklet like this one:

javascript:(function(){var a=document.createElement("script");a.setAttribute("src","http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.js");if(typeof jQuery=="undefined"){document.getElementsByTagName("head")[0].appendChild(a)}(function(){if(typeof jQuery=="undefined"){setTimeout(arguments.callee,100)}else{jQuery("*").one("click",function(d){jQuery(this)[0].scrollIntoView();for(var e="",c=jQuery(this)[0];c&&c.nodeType==1;c=c.parentNode){var b=jQuery(c.parentNode).children(c.tagName).index(c)+1;b>1?(b="["+b+"]"):(b="");e="/"+c.tagName.toLowerCase()+b+e}window.location.hash="#xpath:"+e;prompt('Your expression:',e);d.preventDefault();d.stopPropagation();jQuery("*").unbind("click",arguments.callee)})}})()})();
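The core of the bookmarklet is the loop that walks from the clicked element up through `parentNode`, writing each tag name plus its 1-based position among siblings with the same tag, and omitting the predicate when that position is 1. Below is a de-minified sketch of just that loop as a plain function. To make it runnable outside a browser it accepts any node-like object with `nodeType`, `tagName`, `parentNode` and `children`; the names `xpathFor` and `mockElement` and the mock-node shape are illustrative assumptions, not part of the bookmarklet.

```javascript
// De-minified sketch of the bookmarklet's path-building loop.
// Accepts any node-like object { nodeType, tagName, parentNode, children };
// in a real page you would pass a DOM element.
function xpathFor(node) {
  let path = "";
  // Walk upward while we are on element nodes (nodeType 1).
  for (let current = node; current && current.nodeType === 1; current = current.parentNode) {
    const parent = current.parentNode;
    // 1-based position among siblings with the same tag name,
    // mirroring jQuery(parent).children(tagName).index(current) + 1.
    let position = 1;
    if (parent && parent.children) {
      const sameTag = Array.from(parent.children).filter(
        (sibling) => sibling.tagName === current.tagName
      );
      position = sameTag.indexOf(current) + 1;
    }
    // Like the bookmarklet, only write a predicate when position > 1.
    const step = current.tagName.toLowerCase() + (position > 1 ? "[" + position + "]" : "");
    path = "/" + step + path;
  }
  return path;
}

// Hypothetical helper that builds mock element nodes for testing.
function mockElement(tag, kids = []) {
  const node = { nodeType: 1, tagName: tag.toUpperCase(), parentNode: null, children: kids };
  for (const kid of kids) kid.parentNode = node;
  return node;
}

// <html><head/><body><div/><div><table><tbody><tr/></tbody></table></div></body></html>
const row = mockElement("tr");
const html = mockElement("html", [
  mockElement("head"),
  mockElement("body", [
    mockElement("div"),
    mockElement("div", [mockElement("table", [mockElement("tbody", [row])])]),
  ]),
]);
const documentNode = { nodeType: 9, parentNode: null, children: [html] };
html.parentNode = documentNode;

console.log(xpathFor(row)); // "/html/body/div[2]/table/tbody/tr"
```

Note that the result is only as good as the tree it walks: in a browser that tree is the live DOM, including any elements the parser or page scripts added.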

This gives me the XPath of the element in the HTML page. To parse the HTML with HTML Agility Pack or Sgml, I need to convert it to XHTML (XML). The problem, I think, is that the XPath in the XHTML differs from the XPath in the HTML. That's why the path from Firebug's "Copy XPath" feature doesn't work when I use it with

HtmlNode valueNode = doc.DocumentNode.SelectSingleNode(Firebugs_XPath);
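A common first step is to drop the `tbody` steps, although on its own it was not enough for the page in this question. Browsers insert an implicit `tbody` into every `table` whose rows are not already wrapped in one, so the DOM that Firebug shows contains it even when the HTML source does not. HTML Agility Pack builds its tree from the source as written and, as commonly reported, does not add one. The sketch below rewrites the path under these assumptions: every step is a plain `tag` or `tag[n]` (as the bookmarklet emits), and the HTML source itself contains no `tbody` elements. The name `stripTbody` is hypothetical.

```javascript
// Remove "/tbody" and "/tbody[n]" steps from a simple XPath string.
// Assumes steps are plain "tag" or "tag[n]" with no "/" inside predicates,
// and that the original HTML source has no <tbody> elements of its own.
function stripTbody(xpath) {
  return xpath
    .split("/")
    .filter((step) => !/^tbody(\[\d+\])?$/i.test(step))
    .join("/");
}

// The Firebug path from this question:
const firebugPath =
  "/html/body/div[2]/table/tbody/tr/td[2]/table/tbody/tr[2]/td[2]/form/table/tbody/tr[2]/td/div/table/tbody/tr/td[2]/table/tbody/tr[2]/td[2]/u";
console.log(stripTbody(firebugPath));
// "/html/body/div[2]/table/tr/td[2]/table/tr[2]/td[2]/form/table/tr[2]/td/div/table/tr/td[2]/table/tr[2]/td[2]/u"
```

The cleaned string is what you would then pass as `Firebugs_XPath` to `SelectSingleNode`. If the source does contain explicit `tbody` elements, stripping them makes the path wrong rather than right.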

For example, Firebug or the bookmarklet gives this (removing the tbody steps doesn't help):

/html/body/div[2]/table/tbody/tr/td[2]/table/tbody/tr[2]/td[2]/form/table/tbody/tr[2]/td/div/table/tbody/tr/td[2]/table/tbody/tr[2]/td[2]/u

while the XPath that actually works is (give or take):

/html/body/div/table/tr[1]/td[2]/table/tr[1]//td[2]/table[2]/tr[1]//td[2]/table/tr/tr/td[2]/u
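Any hand-tweaking starts with finding where two such paths part ways, and a step-by-step comparison makes that point explicit. The sketch below, with the hypothetical name `firstDivergence`, splits both absolute paths on `/` and returns the first step index at which they differ. It is purely textual, so it reports `tr` versus `tr[1]` as a difference even though those can select the same node.

```javascript
// Compare two absolute XPaths step by step; return the first difference,
// or null if they are identical. Purely textual comparison.
function firstDivergence(a, b) {
  const left = a.split("/");
  const right = b.split("/");
  const n = Math.max(left.length, right.length);
  for (let i = 0; i < n; i++) {
    if (left[i] !== right[i]) {
      // undefined on one side means that path ran out of steps.
      return { index: i, left: left[i], right: right[i] };
    }
  }
  return null;
}

// The two paths from this question:
const fromFirebug =
  "/html/body/div[2]/table/tbody/tr/td[2]/table/tbody/tr[2]/td[2]/form/table/tbody/tr[2]/td/div/table/tbody/tr/td[2]/table/tbody/tr[2]/td[2]/u";
const working =
  "/html/body/div/table/tr[1]/td[2]/table/tr[1]//td[2]/table[2]/tr[1]//td[2]/table/tr/tr/td[2]/u";

const diff = firstDivergence(fromFirebug, working);
console.log(diff.index, diff.left, diff.right); // 3 div[2] div
```

With this pair the paths already diverge at the third element step (`div[2]` versus `div`), and the working path also has no `form` step at all. The missing `form` step would fit the way older HTML Agility Pack versions treat `<form>` by default: its children end up as siblings rather than descendants, a behavior controlled through `HtmlNode.ElementsFlags`.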

My questions are: how can I fix this so that an XPath copied from Firebug works with HtmlAgilityPack? And is it possible to use the bookmarklet with the built-in C# WebBrowser component?

I'd really appreciate your help.

  • Firebug's representation of your markup might differ from the actual XHTML because it tries to normalise the markup, and the XPath queries are generated against that normalised tree rather than the underlying XHTML. I'm not sure it's possible to change this behaviour; you might just need to tweak the XPaths by hand.

    : I don't think hand-tweaking is feasible; just look at these two XPaths, they're really different. Maybe there's another solution?
    From cxfx
