Amiga.org

Amiga computer related discussion => Amiga Tutorials => Topic started by: ID4 on March 30, 2007, 01:19:25 PM

Title: How to backup or download this?????
Post by: ID4 on March 30, 2007, 01:19:25 PM
http://web.archive.org/web/20040415065133/www.nethkin.com/bmori/amiga/dos1.html

Any idea? I tried web download software, but no luck :-(
Title: Re: How to backup or download this?????
Post by: motorollin on March 30, 2007, 01:24:55 PM
I just tried it with SurfOffline 2.0 beta, and got loads of "Forbidden" errors. Maybe they recognise bot activity and block it? If so, you might have to download the pages manually and, where the image tags use absolute URLs, modify them to point to local paths.
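
A minimal sketch of that image-tag fix-up, assuming the images have been saved next to the HTML file under their original file names. The helper name localize_img_srcs is made up for illustration, and the regular expression only handles double-quoted src attributes.

```python
import posixpath
import re
from urllib.parse import urlparse

# Matches <img ... src="..."> with a double-quoted src (case-insensitive).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc\s*=\s*")([^"]+)(")', re.IGNORECASE)


def localize_img_srcs(html):
    """Rewrite absolute image URLs to same-named files in the current directory."""
    def repl(match):
        url = match.group(2)
        if not urlparse(url).scheme:
            # Already a relative path, so leave the tag untouched.
            return match.group(0)
        local_name = posixpath.basename(urlparse(url).path)
        return match.group(1) + local_name + match.group(3)

    return IMG_SRC.sub(repl, html)


sample = '<IMG SRC="http://www.nethkin.com/bmori/amiga/ados7.gif" ALT="x">'
fixed = localize_img_srcs(sample)
```

For pages with unusual markup a real HTML parser would be more robust than a regular expression.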

--
moto
Title: Re: How to backup or download this?????
Post by: Colani1200 on March 30, 2007, 01:39:32 PM
You might want to try wget (http://aminet.net/comm/tcp/wget-1.8.2.lha), maybe with the option --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" to fake the user agent string.  ;-)
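
For anyone scripting this instead of using wget, here is a rough sketch of the same user-agent trick with Python's standard library. The helper name browser_request is hypothetical, and whether archive.org accepts the request is not something this sketch can promise.

```python
from urllib.request import Request

# The user agent string suggested above.
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"

PAGE_URL = "http://web.archive.org/web/20040415065133/www.nethkin.com/bmori/amiga/dos1.html"


def browser_request(url, user_agent=BROWSER_UA):
    """Build a request that announces itself with a browser-like User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


req = browser_request(PAGE_URL)
# Passing req to urllib.request.urlopen() would perform the actual download.
```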
Title: Re: How to backup or download this?????
Post by: motorollin on March 30, 2007, 01:41:41 PM
I think I have spotted the problem. In the source code of dos1.html there is a BASE tag. When I look in the log file for SurfOffline I see that it is trying to download, for example, http://www.nethkin.com/bmori/amiga/ados7.gif, when the file is actually located at http://web.archive.org/web/20010619122216/www.nethkin.com/bmori/amiga/ados7.gif. I think you would need web spider software which ignores the BASE tag.
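
To illustrate, here is a small sketch of how a BASE tag changes link resolution, and one way to drop the tag before processing a page. The href value in ASSUMED_BASE is a guess based on the URLs in the log, not the actual attribute from dos1.html, and strip_base_tag is a made-up helper name.

```python
import re
from urllib.parse import urljoin

PAGE_URL = "http://web.archive.org/web/20040415065133/www.nethkin.com/bmori/amiga/dos1.html"
# Assumption: the real href is not shown in the thread; this value is inferred
# from the URL the spider tried to fetch.
ASSUMED_BASE = "http://www.nethkin.com/bmori/amiga/"

# Matches a <base ...> tag but not, for example, <basefont ...>.
BASE_TAG = re.compile(r"<base\b[^>]*>", re.IGNORECASE)


def strip_base_tag(html):
    """Remove BASE tags so relative links resolve against the page URL again."""
    return BASE_TAG.sub("", html)


# With the BASE tag honoured, a relative link goes to the original host.
with_base = urljoin(ASSUMED_BASE, "ados7.gif")
# Without it, the same link resolves against the archived page's own URL.
without_base = urljoin(PAGE_URL, "ados7.gif")
```

With the tag stripped, the relative link stays on web.archive.org (the archive may still serve the file under a different snapshot timestamp).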

--
moto
Title: Re: How to backup or download this?????
Post by: blobrana on March 30, 2007, 02:26:42 PM
Hum,
just view source code, and use screen capture to rip the images (if needed)
[img=http://img339.imageshack.us/img339/4072/image3nw7.th.gif] (http://img339.imageshack.us/my.php?image=image3nw7.gif)
Title: Re: How to backup or download this?????
Post by: James on March 30, 2007, 03:41:52 PM
Have you emailed the author? He's already sharing all the info for free, so I don't see why he wouldn't agree to give you a way to back it up for personal use.
Title: Re: How to backup or download this?????
Post by: Piru on March 30, 2007, 04:07:01 PM
This will allow you to get at least some of the files:

wget --user-agent "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705)" http://web.archive.org/web/20040415065133/www.nethkin.com/bmori/amiga/dos1.html --output-document - | perl -p -e 's/\/\/www.nethkin.com/\/\/web.archive.org\/web\/20040415065133\/www.nethkin.com/g' | wget --input-file - --force-html --user-agent "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705)" --convert-links --force-directories --no-host-directories --cut-dirs 3 --wait 20 --random-wait

The pages will appear in the bmori/amiga/ directory (and subdirectories).
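
For reference, the perl step in the middle of that pipeline only rewrites links to the original host so they point at the archived copy. A rough Python equivalent, as a sketch (to_archive_urls is a made-up name, and unlike the perl regex it matches the host literally rather than treating the dots as wildcards):

```python
ORIGINAL_HOST = "//www.nethkin.com"
ARCHIVE_PREFIX = "//web.archive.org/web/20040415065133/www.nethkin.com"


def to_archive_urls(html):
    """Point every link to the original host at the archived snapshot instead."""
    return html.replace(ORIGINAL_HOST, ARCHIVE_PREFIX)


sample = '<img src="http://www.nethkin.com/bmori/amiga/ados7.gif">'
rewritten = to_archive_urls(sample)
```

Running the rewrite a second time changes nothing, because the archived URLs no longer contain "//www.nethkin.com" with a double slash.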

Note that archive.org has a robots.txt file that, if followed, prohibits apps from recursively grabbing content. In this case I've added "--wait 20 --random-wait" to make the leeching less disruptive. Downloading takes longer, but shouldn't piss off the archive.org admins.
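
A sketch of both ideas in Python, for anyone writing their own downloader. The robots.txt rules below are invented for illustration and are NOT a copy of archive.org's real file. The delay range assumes the behaviour described in recent wget manuals (0.5 to 1.5 times the --wait value), which may differ in older versions. The names allowed, polite_delay and ExampleSpider are hypothetical.

```python
import random
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not archive.org's actual robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /web/
"""

PAGE_URL = "http://web.archive.org/web/20040415065133/www.nethkin.com/bmori/amiga/dos1.html"


def allowed(robots_txt, agent, url):
    """Return True if the given robots.txt text lets this agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def polite_delay(base_seconds=20.0, rng=random):
    """Pick a randomised pause between requests, centred on base_seconds."""
    return rng.uniform(0.5 * base_seconds, 1.5 * base_seconds)


page_ok = allowed(SAMPLE_ROBOTS, "ExampleSpider", PAGE_URL)
pause = polite_delay(20.0)
# A downloader would call time.sleep(pause) before each request.
```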

I know this is far from a perfect solution, but at least it works somewhat (without the need to download everything by hand).
Title: Re: How to backup or download this?????
Post by: AmiKit on March 30, 2007, 07:07:34 PM
Or you might want to try HTTrack (http://www.httrack.com/).