[Coco] Site downloader help
Andrew
keeper63 at cox.net
Tue Jan 24 16:18:44 EST 2023
HTTrack is a hit-or-miss thing sometimes; you usually can't rely on the
default settings for a lot of sites, and you have to really understand
what it's doing under the hood to configure the settings "just right"
for each site.
If you don't - you either end up with little-to-nothing - or you end up
with the world (i.e. - all of the website and then some, plus other
websites if you're really unlucky).
I usually use a custom script I wrote for wget that grabs everything
from a site - and sometimes a bit more. Then it does some post-cleanup,
and a couple of other tidbits. It ain't perfect, but it usually gets the
job done, provided there ain't a rate-limit monitor on the site that
boots me off and bans my access. If I suspect such a thing, I'll put in
randomized requests and a rate limit to better simulate a human. That
usually gets around it - but it takes forever to pull down a large site.
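For the randomized, rate-limited approach, here is a rough sketch of the
kind of wget options involved. This is not my actual script; the helper
name throttle_flags, the default values, and the example.com URL are all
made up for illustration. The three wget options themselves (--wait,
--random-wait, --limit-rate) are standard ones.

```python
import shlex


def throttle_flags(wait_seconds=5, rate="100k"):
    """Return wget options that slow a crawl down.

    Hypothetical helper; the default values are illustrative guesses,
    not settings known to work for any particular site.
    """
    return [
        f"--wait={wait_seconds}",  # base pause between retrievals, in seconds
        "--random-wait",           # vary each pause between 0.5x and 1.5x of --wait
        f"--limit-rate={rate}",    # cap the download bandwidth
    ]


# Print a command line you could paste into a shell (placeholder URL).
print(shlex.join(["wget"] + throttle_flags() + ["https://example.com/"]))
```

Bigger waits and lower rates are gentler on the server, at the cost of a
much longer run on a large site.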
You can probably come up with a decent enough wget command to pull what
you need down to your PC; my only concern would be getting banned for
doing it without permission. For some sites, I don't care - but there
are ones - like the CoCo Archive - that I would care quite a bit about,
so if I'm grabbing more than a few files automagically - I ask,
first...
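As a starting point, a "decent enough" wget command might look something
like the sketch below. The function name mirror_command, the "mirror"
output directory, and the example.com URL are assumptions for
illustration only; the options are standard wget ones, but you'd likely
need to tune them per site.

```python
import shlex


def mirror_command(url, dest="mirror"):
    """Build a wget command line for mirroring one section of a site.

    Hypothetical helper; adjust the options for the site in question.
    """
    return [
        "wget",
        "--mirror",                    # recursive download with timestamping
        "--no-parent",                 # never climb above the starting directory
        "--page-requisites",           # also fetch images, CSS, etc. needed by pages
        "--convert-links",             # rewrite links so the copy browses offline
        "--adjust-extension",          # add .html and similar suffixes where needed
        f"--directory-prefix={dest}",  # save everything under this directory
        url,
    ]


# Print a command line you could paste into a shell (placeholder URL).
print(shlex.join(mirror_command("https://example.com/files/")))
```

wget stays on the starting host unless told otherwise (--span-hosts), and
--no-parent keeps it below the starting directory, which helps avoid
ending up with "the world".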
Andrew L. Ayers
Glendale, Arizona
phoenixgarage.org
github.com/andrew-ayers