Friday, January 21, 2011

How to save a public HTML page with all media and preserve structure

I'm looking for a Linux application (or Firefox extension) that will let me scrape an HTML mockup and keep the page's integrity. Firefox does an almost perfect job but doesn't grab images referenced in the CSS.

The ScrapBook extension for Firefox gets everything, but flattens the directory structure.

I wouldn't mind terribly if all folders became children of the index page.

  • Have you tried wget?
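
    A rough sketch of what that might look like for a single page, assuming a reasonably recent wget (newer versions also pull in images referenced from CSS when fetching page requisites); the URL is a placeholder:

    # -p  fetch page requisites (inline images, stylesheets, scripts)
    # -k  convert links so the saved copy works offline
    # -E  save HTML documents with an .html extension
    # -H  allow requisites hosted on other domains
    wget -p -k -E -H http://www.example.com/page.html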

  • Teleport Pro is great for this sort of thing. You can point it at a complete website and it will download a copy locally, maintaining the directory structure and replacing absolute links with relative ones as necessary. You can also specify whether you want content from third-party websites linked from the original site.

    From X-Cubed
  • See Website Mirroring With wget

    wget --mirror -w 2 -p --html-extension --convert-links http://www.yourdomain.com
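
    Roughly what those options do, per the wget manual:

    # --mirror          turn on recursion and time-stamping (infinite depth)
    # -w 2              wait 2 seconds between requests
    # -p                download everything needed to display the page (images, CSS, etc.)
    # --html-extension  save HTML documents with an .html suffix
    # --convert-links   rewrite links so the local copy works offline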
    
    From Gilean
  • wget -r does what you want, and if not, there are plenty of flags to configure it. See man wget.

    Another option is curl, which is even more powerful. See http://curl.haxx.se/.
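
    Note that curl fetches individual URLs rather than mirroring recursively; a basic single-page fetch might look like this (the URL is a placeholder):

    # -L  follow redirects
    # -O  save the response under its remote filename
    curl -L -O http://www.example.com/index.html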

    From Thomas
  • /palmface, I didn't even consider checking the man pages for wget/curl.

    wget, though those options should do it all, doesn't seem to be working for me; I'll have to toy with the command line.

    From Adam
