Looking for a Linux application (or Firefox extension) that will let me scrape an HTML mockup and keep the page's integrity. Firefox does an almost perfect job but doesn't grab images referenced in the CSS.
The ScrapBook extension for Firefox gets everything, but flattens the directory structure.
I wouldn't terribly mind if all folders became children of the index page.
-
Have you tried wget?
From etchasketch -
Teleport Pro is great for this sort of thing. You can point it at a complete website and it will download a copy locally, maintaining the directory structure and replacing absolute links with relative ones as necessary. You can also specify whether you want content from third-party websites linked from the original site.
From X-Cubed -
See Website Mirroring With wget
wget --mirror -w 2 -p --html-extension --convert-links http://www.yourdomain.com
From Gilean -
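For reference, the flags in that command break down as follows (annotations are mine, taken from the wget manual):
--mirror          recursion plus time-stamping with unlimited depth (shorthand for -r -N -l inf --no-remove-listing)
-w 2              wait two seconds between requests so the server isn't hammered
-p                also grab page requisites such as images and stylesheets
--html-extension  save pages with an .html extension
--convert-links   rewrite links so the local copy browses offline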
wget -r
does what you want, and if not, there are plenty of flags to configure it. See man wget.
Another option is curl, which is even more powerful. See http://curl.haxx.se/.
From Thomas -
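Since curl came up: on its own it fetches single URLs rather than recursing, so a one-page grab would look something like this (www.yourdomain.com is just a placeholder):
curl -L -o index.html http://www.yourdomain.com/
-L follows any redirects and -o names the local output file.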
/palmface, I didn't even consider checking the man pages for wget/curl.
wget, though those options should do it all, doesn't seem to be working for me. I'll have to toy with the command line.
From Adam -
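For what it's worth, a variant worth toying with might look like this (www.yourdomain.com is a placeholder, and these flags are a guess at a starting point rather than a known fix):
wget -r -l inf -p -k -E -e robots=off http://www.yourdomain.com/
-r with -l inf recurses without a depth limit, -p pulls page requisites, -k converts links for offline browsing, -E adds .html extensions, and -e robots=off stops robots.txt from blocking the crawl. If the stylesheet pulls images from a different host, adding -H -D with the relevant domains lets wget span to them; and older wget releases don't follow url() references inside CSS at all, which would explain the missing images.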