Topic: How to back up or download this? (Read 2595 times)

ID4 · « **on:** March 30, 2007, 01:19:25 PM »

http://web.archive.org/web/20040415065133/www.nethkin.com/bmori/amiga/dos1.html

Any ideas? I tried website download software, but had no luck :-(

motorollin · « **Reply #1 on:** March 30, 2007, 01:24:55 PM »

I just tried it with SurfOffline 2.0 beta and got loads of "Forbidden" errors. Maybe they recognise bot activity and block it? If so, you might have to download the pages manually and change any image tags that use absolute URLs so they point to local paths.
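If you do end up saving the pages by hand, the absolute image URLs can be rewritten with `sed` instead of being edited one by one. This is a minimal sketch that assumes each image is saved in the same folder as its HTML page and that the `src` attributes are lowercase and double-quoted. `localize_images` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: turn src="http://host/any/path/file.gif" into
# src="file.gif" so each image is loaded from the page's own folder.
# Assumes lowercase, double-quoted src attributes and images saved beside the HTML.
# Relative src values (no http:// or https://) are left untouched.
localize_images() {
  sed -E 's#src="https?://[^"]*/([^"/]+)"#src="\1"#g'
}

# Example (file names are illustrative):
# localize_images < dos1.html > dos1.local.html
```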

--
moto

Colani1200 · « **Reply #2 on:** March 30, 2007, 01:39:32 PM »

You might want to try wget, maybe with the option --user-agent="Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" to fake the user agent string. ;-)

motorollin · « **Reply #3 on:** March 30, 2007, 01:41:41 PM »

I think I have spotted the problem. The source code of dos1.html contains the tag `<BASE HREF="http://www.nethkin.com/bmori/amiga/dos1.html">`. When I look in the SurfOffline log file, I see that it is trying to download, for example, http://www.nethkin.com/bmori/amiga/ados7.gif, when the file is actually located at http://web.archive.org/web/20010619122216/www.nethkin.com/bmori/amiga/ados7.gif. Because of the BASE tag, relative links are resolved against the original site instead of the archived copy, so I think you would need web spider software that ignores the BASE tag.
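To see why the BASE tag matters, here is a simplified sketch of how a relative file name is resolved against a base URL. `resolve_relative` is a hypothetical helper. The rule it applies (drop the last path segment, then append the name) ignores `../`, query strings and root-relative paths.

```shell
#!/bin/sh
# Hypothetical helper: resolve a plain relative file name against a base URL
# by removing everything after the last "/" and appending the name.
# Simplified: does not handle "../", "?query" or paths starting with "/".
resolve_relative() {
  base=$1
  name=$2
  printf '%s/%s\n' "${base%/*}" "$name"
}

# The live-site BASE HREF sends the request to the original host:
resolve_relative "http://www.nethkin.com/bmori/amiga/dos1.html" "ados7.gif"
# -> http://www.nethkin.com/bmori/amiga/ados7.gif

# An archive.org base keeps the same name inside the Wayback Machine:
resolve_relative "http://web.archive.org/web/20040415065133/www.nethkin.com/bmori/amiga/dos1.html" "ados7.gif"
# -> http://web.archive.org/web/20040415065133/www.nethkin.com/bmori/amiga/ados7.gif
```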

--
moto

blobrana · « **Reply #4 on:** March 30, 2007, 02:26:42 PM »

Hmm,
just view the source code, and use screen capture to rip the images (if needed).
[img=http://img339.imageshack.us/img339/4072/image3nw7.th.gif]

James · « **Reply #5 on:** March 30, 2007, 03:41:52 PM »

Have you emailed the author? He's already sharing all the info for free, so I don't see why he wouldn't agree to give you a way to back it up for personal use.

Piru · « **Reply #6 on:** March 30, 2007, 04:07:01 PM »

This will allow you to get at least some of the files:

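```shell
# 1) fetch the archived page to stdout,
# 2) rewrite every //www.nethkin.com reference (including the BASE HREF) to the archive copy,
# 3) let a second wget download everything the rewritten HTML links to, pausing between requests.
wget --user-agent "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705)" \
     http://web.archive.org/web/20040415065133/www.nethkin.com/bmori/amiga/dos1.html \
     --output-document - |
  perl -p -e 's/\/\/www.nethkin.com/\/\/web.archive.org\/web\/20040415065133\/www.nethkin.com/g' |
  wget --input-file - --force-html \
       --user-agent "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.0.3705)" \
       --convert-links --force-directories --no-host-directories --cut-dirs 3 \
       --wait 20 --random-wait
```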

The pages will appear in the bmori/amiga/ directory (and its subdirectories).

Note that archive.org has a robots.txt file that, if followed, prohibits programs from recursively grabbing content. So I've added "--wait 20 --random-wait" to make the leeching less disruptive. Downloading takes longer, but it shouldn't piss off the archive.org admins.
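As a rough guide to how much longer the polite download takes: according to the GNU Wget manual, `--random-wait` varies each pause between 0.5 and 1.5 times the `--wait` value, so the average pause stays at about 20 seconds. This minimal arithmetic sketch uses a request count of 60, which is a made-up example and not a count of the files on this site. `wait_budget` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: rough pause budget for "--wait DELAY --random-wait".
# Each pause is 0.5x to 1.5x DELAY (per the GNU Wget manual), so the average
# total is about REQUESTS * DELAY seconds. Counts one pause per request (rough).
wait_budget() {
  requests=$1
  delay=$2
  min=$(( requests * delay / 2 ))
  max=$(( requests * delay * 3 / 2 ))
  avg=$(( requests * delay ))
  echo "min=${min}s avg=${avg}s max=${max}s"
}

wait_budget 60 20   # -> min=600s avg=1200s max=1800s
```

In other words, 60 files at `--wait 20` costs roughly 20 minutes of pausing on top of the transfer time itself.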

I know this is far from a perfect solution, but at least it works to some extent (without the need to download everything by hand).

AmiKit · « **Reply #7 on:** March 30, 2007, 07:07:34 PM »

Or you might want to try HTTrack.
