Friday, April 29, 2011

InterPage Linking Problem in MHT

Hello sir,

I had a task where I had to process one big .xml file and generate several .html files from it using XSLT. Now, after release, the customer wants to combine all the HTML files into a single .mht file. I have converted the source code to generate the .mht file, but I am having problems linking the HTML files inside the .mht.

I have created a .mht file in the following format (see attachment).

EXPLANATION:-

  1. The .mht file contains the following files:
    a. MHTLinkingProblem.html
    b. Left.html
    c. Right.html
    d. Start.html
    e. Plus.gif
    f. Mktree.css
  2. MHTLinkingProblem.html is the main HTML file, which is divided into 2 frames.
  3. Start.html is a dummy HTML file that displays the start of the page.
  4. Right.html contains:
    a. An image with a source attribute pointing to the content-id of the image embedded in the .mht file. Code: <img src="cid:plus.gif" alt="plus.gif">
    b. An anchor tag with a name attribute. Code: <a name="LinkName">
  5. Left.html contains two links to Right.html:
    a. LinkWithoutFragmentRefURL
    b. LinkWithFragmentRefURL
  6. The mktree.css file contains some URLs that refer to images present in the .mht file.

THE PROBLEM:-

  1. The .mht file includes a mktree.css file that uses url() to reference one of the images embedded in the .mht file, like this: ul.mktree li.liOpen .bullet { cursor: pointer; background: url(cid:minus.gif); center left no-repeat; } It is not able to pick up the image from the .mht file.

  2. The link shown in 5.b is not working. That is, I am not able to refer to a particular fragment of an HTML file inside the .mht. This is because the URL is resolved as cid:Right.html#LinkName. There is no content-id named cid:Right.html#LinkName, so it returns the error "Page cannot be displayed".

QUESTION:-

  1. How can I use a relative reference URL in the "href" attribute of an anchor tag so that it refers to a fragment (an anchor tag with a name attribute) of ANOTHER HTML file in the same .mht file?

  2. If an image is embedded in the .mht file, how can I use that image from the .css file?

Please help me with this problem. I have been searching for a solution for two weeks, without success.

Regards, Thanks in advance, Naveen Murthy

From stackoverflow
  • Hello all,

    I apologize for not pasting the code in the correct format. I have now attached the code in the correct format, so please save it as a file such as test.mht.

    I have posted this topic in the MIME as well as the HTML categories, as it is relevant to both.

    The core problems remain the same. We have one .mht file containing multiple HTML files:

    1. How can I use a relative reference URL in the "href" attribute of an anchor tag (a.html#some_locn1) so that it refers to a fragment (an anchor tag with a name attribute) of ANOTHER HTML file contained within the same .mht file?

    EXPLANATION:-

    1. The .mht file contains the following files:

      a. MHTLinkingProblem.html
      b. Left.html
      c. Right.html
      d. Start.html
      e. Plus.gif
      f. Mktree.css
      
    2. MHTLinkingProblem.html is the main HTML file, which is divided into 2 frames.

    3. Start.html serves no purpose other than bringing up a filler page on load.
    4. Right.html contains:

      a. An image with a source attribute pointing to the content-id of the image embedded in the .mht file.

         <img src="cid:plus.gif" alt="plus.gif">
      

      b. An anchor tag with a name attribute.

         <a name="LinkName">
      

    5. Left.html contains two links to Right.html.

      <a href="cid:Right.html" target="Data">LinkWithoutFragmentRefURL</a>
      <a href="cid:Right.html#LinkName" target="Data">LinkWithFragmentRefURL</a>
    

    THE PROBLEM (Revisited):-

    1. The .mht file includes a mktree.css file that uses url() to reference one of the images embedded in the .mht file, like this: ul.mktree li.liOpen .bullet { cursor: pointer; background: url(cid:minus.gif); center left no-repeat; } It is not able to pick up the image from the .mht file.

    2. The link shown in 5.b is not working. That is, I am not able to refer to a particular fragment of an HTML file inside the .mht. This is because the URL is resolved as cid:Right.html#LinkName. There is no content-id named cid:Right.html#LinkName, so it returns the error "Page cannot be displayed".

    Regards,

    Thanks in advance,

    Naveen Murthy

    Code follows from here


       From: Regression
      To: Regression-User
      Subject: Regression
      Message-ID: Regression.html
      Mime-Version: 1.0
      AbsoluteURI: "thismessage:/"
      Content-Type: multipart/related; boundary="----------1234567890------00-----";
                    type="text/html"
    
      ------------1234567890------00-----
      Content-Type: text/html; charset="US-ASCII"   
    
      <html>
       <head>   
           <title>MHTLinkingProblem</title>
        </head>
        <frameset cols="350,*">
           <frame src="file://c:/Left.html" name="Navigation" scrolling="yes"   noresize/> 
           <frame src="cid:Start.html" name="Data" scrolling="yes" noresize/>
           <noframes>
               <h1>Sorry, this browser does not support frames.</h1>
           </noframes>
        </frameset>
      </html>
    
      ------------1234567890------00-----
      Content-Location: file://c:/Left.html
      <!-- Content-ID: Left.html -->
      Content-Type:text/html;
    
      <html>
         <head>
            <title>Right</title>     
         </head>    
         <body> 
            <ul>
               <li><a href="cid:Right" target="Data">LinkWithoutFragmentRefURL</a></li>         
               <li><a href="cid:Right#LinkName" target="Data">LinkWithFragmentRefURL</a></li>     
            </ul>
         </body>
      </html>
    
      ------------1234567890------00-----
      Content-Location: file://c:/Right.html
      Content-ID: Right
      Content-Type:  text/html    
    
      <html>
         <head>
         <title>Right</title>        
         </head>
         <body >
           This is a Right Hand Side of Page.<br/><br/>   
           <IMG SRC="cid:plus.gif" ALT="plus.gif"></body>     
           <a href="#LinkName">Name</a>
           <!-- Please dont delete these br. These are required. -->    <br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/>           <br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/><br/>
           <a name="LinkName"><b>This the Location where LinkWithFragmentRefURL should refer to.</b></a>     
          </body>
       </html>
    
      ------------1234567890------00-----
      Content-Location: cid:Start.html
      Content-ID: Start.html
      Content-Type: text/html     
    
      <html>
         <head>
           <title>Start</title>
         </head>
         <body>
            This is a Starting Page.
         </body>
      </html>
    
     ------------1234567890------00-----
     Content-Type: image/gif
     Content-ID: plus.gif
     Content-Transfer-Encoding: base64
     Content-Disposition: inline; filename="plus.gif"
    
    
     R0lGODlhEAAQAKIAAMDAwICAgACAgAAA/wAAAP///wAAAAAAACH5BAUUAAUALAAAAAAQABAAAANC
     WLpFxHC5MV5Uk1LLstZcIwBfyRHDWH4WmpLr1mhq7LhUDRJCMNMwx6iH+QiEgKRDAholAcvO5hmN
     oHjVSyOr7SoSADs=
    
     ------------1234567890------00-------
    
  • A quick way to solve your problem would be to build what you want in html, then open it inside a browser that saves pages as mht files (like IE or Opera), save your html as mht, then inspect the generated code. I'll post some code as soon as I get it working.

    Edit 1:
    Ok. I've made a few tests and here's what I found out.

    1. Browser support for mhtml is incomplete
    2. Opera's mhtml support seems to be better than IE's, or at least more reasonable
    3. Saving a frameset from Internet Explorer doesn't include the frames inside the mht file (who was the idiot who came up with this!?)
    4. Saving a frameset from Opera does include the frames inside the mht file
    5. Opera supports interlinking between internal pages inside mht files using relative urls, but only from the starting page
    6. Opera supports interlinking between internal pages inside mht files using cid urls from all internal pages
    7. Opera doesn't support framesets inside mht files at all, even though it is able to save them
    8. IE doesn't support interlinking between internal pages inside mht files at all, nor does it support framesets inside mht files

    Maybe the Safari web archive format will achieve what you want. I'll try that later and if it works I'll post the results.

    Edit 2:
    I noticed your question was posted on many other forums. I hope this also clarifies some of the answers posted elsewhere.

    IETF RFC 2110 (the "mhtml spec") links to the HTML 2 spec, so I guess it was written long before frames existed. RFC 2557, which superseded 2110, doesn't talk about interlinking html files inside mht, but Jacob Palme's page about mhtml (Mr. Palme was one of the 3 people who wrote the rfc) states that:

    The main idea of the MHTML standard is that you send a HTML document, together with in-line graphics, applets, etc., and also other linked documents if you so wish, in a MIME multipart/related body part. Links in the HTML to other included parts can be provided by CID (Content-ID) URLs or by any other kind of URI, and the linked body part is identified in its heading by either a Content-ID (linked to by CID URLs) or a Content-Location (linked to by any other kind of URL). (In fact, the "Content-ID: foo@bar header" can be seen as a special case of the "Content-Location: CID: foo@bar header".)

    So I guess what you're trying to do is perfectly legal, it's just not supported yet.
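
    The "Page cannot be displayed" error from the question is consistent with a reader looking up the whole string cid:Right.html#LinkName, fragment included, as a Content-ID. Below is a minimal Python sketch of the lookup a reader could do instead: split off the fragment first, then match the rest against the Content-ID headers. This is only an illustration of the idea, not how IE or Opera actually work; the function name resolve_cid and the sample message are made up for the example.

```python
from email import message_from_string
from urllib.parse import unquote


def resolve_cid(message, url):
    """Return (part, fragment) for a cid: URL, or (None, fragment) if no part matches.

    Hypothetical helper: the fragment is split off *before* the Content-ID lookup,
    so "cid:Right.html#LinkName" searches for the part whose Content-ID is Right.html.
    """
    base, _, fragment = url.partition("#")
    if not base.lower().startswith("cid:"):
        raise ValueError("not a cid: URL: " + url)
    wanted = unquote(base[4:])
    for part in message.walk():
        cid = part.get("Content-ID")
        # Content-ID values are normally wrapped in angle brackets.
        if cid is not None and cid.strip().strip("<>") == wanted:
            return part, fragment
    return None, fragment


# Tiny sample archive with a single HTML part (made up for this example).
SAMPLE = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/related; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/html\n"
    "Content-ID: <Right.html>\n"
    "\n"
    '<a name="LinkName">target</a>\n'
    "--b--\n"
)

if __name__ == "__main__":
    msg = message_from_string(SAMPLE)
    part, fragment = resolve_cid(msg, "cid:Right.html#LinkName")
    print(part is not None, fragment)
```

    A reader that works this way would load the Right.html part and then scroll to the named anchor, which is the behaviour the question asks for.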

    I also took a look at the Safari web archive format. It saves everything inside a binary xml plist, which is not very intuitive to edit by hand. Also, the page is encoded inside the xml instead of being stored as plain html, which is even less friendly, so I won't even bother to test whether it supports frames, because it would be a nightmare to create a whole site inside a plist.

  • Here's a minimal mht file with 2 internal pages and an image, for anyone who wishes to test mht interlinking. It works fine with Opera. It opens with IE, but the links are broken.

    Content-Type: multipart/related; start=<index@local>; boundary=next_part
    Content-Location: /
    Subject: =?utf-8?Q?MHT_Interlinking?=
    MIME-Version: 1.0
    
    --next_part
    Content-Disposition: inline; filename=index.htm
    Content-Type: text/html; charset=utf-8; name=index.htm
    Content-Id: <index@local>
    Content-Location: /
    Content-Transfer-Encoding: 8bit
    
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
    <title>Index</title>
    </head>
    <body>
    <p>this is the start page</p>
    <p><a href="cid:internal@local">internal page</a><img src="cid:gif@local" /></p>
    </body>
    </html>
    
    --next_part
    Content-Disposition: inline; filename=internal.htm
    Content-Type: text/html; charset=utf-8; name=internal.htm
    Content-Id: <internal@local>
    Content-Location: /
    Content-Transfer-Encoding: 8bit
    
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
    <head>
        <title>Right</title>
    </head>
    <body>
        <p>this is the internal page</p>
        <p><a href="cid:index@local">start page</a></p>
    </body>
    </html>
    
    --next_part
    Content-Type: image/gif
    Content-Id: <gif@local>
    Content-Transfer-Encoding: base64
    Content-Location: right.gif
    
    R0lGODlhCgAaANU/AAQWPAkcQwARNBYuWQAOLB03Yw8lTnqGky5NfTVVhy1LezdYiUFOZBozX6Cw
    x3N+i4SYt1VxmwsaN0tplS07VCZCcP39/m+Gqh44ZVRwmwwhSn+UtFtxlRkyXSM+ay1Jdxw1YbXH
    2zBGbQAPLwMRL5WmwZ2txitIeBwuTlNvmoCMmggXNxEnUBMoUDFRghgsUp2svR8yWAsgSIuYpw4j
    TIKXtmN2lE5rlxgxXC5MfB02Yg0hSgAOK3iEkTZXif///yH5BAEAAD8ALAAAAAAKABoAQAaMQIfD
    VNpwRLaQxcdk5nQ7Xq/phEp5WKyE8fj9qD4ExkBSgZ/RKTPhqrAEhF4kkkndEpUXCgZGWNVNaFJn
    f2AKIDJSDQ0dAzQAAjwMBxAQNRsXEx8xSU0LCwgeLCszfQU0BAeEaauDVIKAVayvhbQarmsnOAFS
    Cr4nHgMAIxI9AceQIwQUXT3OzwdeP0EAOw==
    
    --next_part--
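
    For anyone who would rather generate such a test file than type the MIME parts by hand, here is a minimal Python sketch using the standard email package. The function name build_mht and the sample content-ids are illustrative, and the output is only structurally similar to the hand-written file above (the library picks its own boundary and encodes UTF-8 HTML as base64). Whether a given browser follows the cid: links still depends on that browser's MHT support.

```python
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mht(pages, images):
    """Build a multipart/related archive and return it as bytes.

    pages:  dict mapping content-id -> HTML string (the first entry is the start page)
    images: dict mapping content-id -> (image subtype, raw bytes)
    Hypothetical helper for illustration only.
    """
    start_cid = next(iter(pages))
    root = MIMEMultipart("related", type="text/html", start="<%s>" % start_cid)
    root["Subject"] = "MHT Interlinking"

    for cid, html in pages.items():
        part = MIMEText(html, "html", "utf-8")
        part["Content-ID"] = "<%s>" % cid
        root.attach(part)

    for cid, (subtype, data) in images.items():
        # Passing the subtype explicitly avoids relying on image type detection.
        part = MIMEImage(data, _subtype=subtype)
        part["Content-ID"] = "<%s>" % cid
        root.attach(part)

    return root.as_bytes()


if __name__ == "__main__":
    pages = {
        "index@local": '<p><a href="cid:internal@local">internal page</a></p>',
        "internal@local": '<p><a href="cid:index@local">start page</a></p>',
    }
    images = {"gif@local": ("gif", b"GIF89a")}  # placeholder bytes, not a real image
    with open("test.mht", "wb") as handle:
        handle.write(build_mht(pages, images))
```

    Each page links to the other through its cid: URL, the same pattern used in the hand-written example.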
    
