[Coco] Site downloader help

Andrew keeper63 at cox.net
Tue Jan 24 16:18:44 EST 2023


HTTrack is a hit-or-miss thing sometimes; you usually can't rely on the 
default settings for a lot of sites, and you have to really understand 
what it's doing under-the-hood to configure the settings "just right" 
for each site.

If you don't - you either end up with little-to-nothing - or you end up 
with the world (i.e. all of the website and then some, plus other 
websites if you're really unlucky).

I usually use a custom script I wrote for wget that will grab 
everything from a site - and sometimes a bit more. Then it does some 
post-cleanup, and a couple of other tidbits. It ain't perfect, but 
it usually gets the job done, provided there ain't a rate-limiter 
monitor on the site that boots me off and bans my access. If I suspect 
such a thing, I'll put in a randomized delay between grabs and a 
rate-limit to better simulate a human, which will usually get around 
that - but it takes forever to pull down a large site.
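
As a rough illustration of the rate-limited, randomized approach 
described above (this is not the actual script, which isn't shown in 
this message), here is a minimal Python sketch that assembles a 
"polite" wget command line. The helper name build_wget_command, the 
example URL, and the default wait and rate values are all assumptions 
made for the example; the wget options themselves are standard GNU wget 
flags.

```python
import shlex


def build_wget_command(url, wait_seconds=2, rate_limit="200k", polite=True):
    """Return a wget argument list for mirroring `url`.

    Hypothetical helper for illustration only; the defaults are guesses,
    not values taken from the original script.
    """
    cmd = [
        "wget",
        "--mirror",            # recursive download with timestamping
        "--no-parent",         # never climb above the starting directory
        "--page-requisites",   # also fetch images/CSS needed to render pages
        "--convert-links",     # rewrite links so the local copy works offline
        "--adjust-extension",  # add .html where the server left it off
    ]
    if polite:
        cmd += [
            f"--wait={wait_seconds}",      # base pause between requests
            "--random-wait",               # vary that pause around the base value
            f"--limit-rate={rate_limit}",  # cap download bandwidth
        ]
    cmd.append(url)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so nothing gets fetched
    # until you've looked it over (and asked permission where it matters).
    print(shlex.join(build_wget_command("https://example.com/files/")))
```

Printing the command first makes it easy to check the scope before 
anything is downloaded; to actually run it, the same list could be 
passed to subprocess.run(). Larger wait values and a lower rate limit 
are gentler on the server, at the cost of a much longer download.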

You can probably come up with a decent enough wget command to pull what 
you need down to your PC; I'd only be concerned about getting banned 
for doing it without permission (for some sites, I don't care - but 
there are ones - like the CoCo Archive - that I would care quite a bit 
about, and so if I'm grabbing more than a few files automagically - I 
ask first)...

Andrew L. Ayers
Glendale, Arizona
phoenixgarage.org
github.com/andrew-ayers

