Topic: New CopyMem & CopyMemQuick patch

Offline matthey (Topic starter)

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
New CopyMem & CopyMemQuick patch
« on: May 04, 2009, 05:21:34 AM »
It's called CopyMem060, and I need some testers before I upload it to Aminet.
It's faster than CMQ060, mostly on small copies, which are *very* common.
Logging the calls with Snoopy used up over 100MB of memory in about a minute.
I also just added an English workbench.guide in AmigaGuide format that brings up some useful help when the Help key is pressed in Workbench under AmigaOS 3.9.
I'm looking for any feedback, comments, or bug reports.
The programs are here...

http://www.heywheel.com/matthey/Amiga/programming.html
 

Offline HammerD

Re: New CopyMem & CopyMemQuick patch
« Reply #1 on: May 04, 2009, 08:45:57 AM »
@matthey

First, thanks a lot for this. I am always looking for speedups :)

OK, if I run this from the shell after booting, it works fine.

If I put it in startup-sequence, or even user-startup, my machine doesn't finish booting; it just stays at a black screen.

[Edit: I have to patch my scsi.device, as Chris Hodges just pointed out that there is a serious memory bug in it. Perhaps this is what keeps it from working on my system. That bug certainly kills TLSFMem.]

I am using:
A4000T with Quikpak 060
Deneb with rom updates in flashrom
OS 3.9 with AFA_OS
TLSFmem

AmigaOS 4.x Beta Tester - Classic Amiga enthusiast - http://www.hd-zone.com is my Amiga Blog, check it out!
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
Re: New CopyMem & CopyMemQuick patch
« Reply #2 on: May 04, 2009, 12:31:11 PM »
I had a crazy idea for large copies once. If a PPC is detected, use that for copies that are large enough to mask the effects of the context switch. Only downside there is that such copies are rare.
 

Offline wawrzon

Re: New CopyMem & CopyMemQuick patch
« Reply #3 on: May 04, 2009, 12:57:54 PM »
@HammerD, matthey:
Same here: no boot if it is started from the startup-sequence. I'm using an A4000 with a CSPPC 060, a custom 3.9 Kickstart with all the newest patches, and AfA_OS 4.4 (+PowerWindows) at the moment.

Edit: typo
 

Offline Piru

  • \' union select name,pwd--
  • Hero Member
  • *****
  • Join Date: Aug 2002
  • Posts: 6946
    • http://www.iki.fi/sintonen/
Re: New CopyMem & CopyMemQuick patch
« Reply #4 on: May 04, 2009, 03:56:56 PM »
This patch has to be started with the Run command. The documentation is too vague about this (or rather, it looks like a mistake in the documentation).

This should work in startup-sequence / user-startup:

run <>nil: copymem060 <>nil:
 

Offline matthey (Topic starter)

Re: New CopyMem & CopyMemQuick patch
« Reply #5 on: May 04, 2009, 04:23:58 PM »
The readme should say to start it with...

Run >NIL: CopyMem060

Contrary to what the readme stated, it doesn't detach from the CLI/shell. I guess I need to work on the docs. English is not my strong point, as I was born in the U.S. ;-).

Edit: CopyMem060.readme has been fixed now. Should the readme have an icon?
 

Offline Karlos

Re: New CopyMem & CopyMemQuick patch
« Reply #6 on: May 04, 2009, 05:04:02 PM »
@matthey

Incidentally, if you want to speed up your unrolled move16 loop a tiny bit, move the subq instruction before the last move16.

The move16 doesn't affect the condition codes (CC), so you can do this safely. Not having to test the CC in the instruction immediately after the one that sets them might give a small speedup.
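
A toy model of that reordering, written in Python rather than 68k assembly. The function and flag names are made up for illustration and are not taken from the CopyMem060 source; the 4x unrolling is an assumption. It only shows why the swap is safe (the copy step never touches the flag that the branch tests). It says nothing about speed, which has to be measured on a real 68060.

```python
LINE = 16  # bytes moved by one move16


def unrolled_copy(src: bytes, iterations: int, subq_before_last: bool) -> bytes:
    """Model a 4x-unrolled move16 loop; needs len(src) == iterations * 64."""
    dst = bytearray(len(src))
    pos = 0
    counter = iterations
    zero_flag = False

    def move16():
        # Copies one 16-byte line. Like the real instruction, it leaves
        # the condition flag untouched.
        nonlocal pos
        dst[pos:pos + LINE] = src[pos:pos + LINE]
        pos += LINE

    def subq():
        # Decrements the loop counter and sets the flag the branch tests.
        nonlocal counter, zero_flag
        counter -= 1
        zero_flag = (counter == 0)

    while True:
        for _ in range(3):
            move16()
        if subq_before_last:
            subq()      # flag is set one instruction earlier...
            move16()    # ...and this copy does not disturb it
        else:
            move16()
            subq()
        if zero_flag:   # models "bne loop" falling through at zero
            break
    return bytes(dst)


data = bytes(range(256))
print(unrolled_copy(data, 4, False) == unrolled_copy(data, 4, True))
```

Both orderings copy the same bytes and run the same number of iterations; only the distance between the flag-setting instruction and the branch changes.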
 

Offline matthey (Topic starter)

Re: New CopyMem & CopyMemQuick patch
« Reply #7 on: May 04, 2009, 06:11:46 PM »
@Karlos

I tried the subq in the middle of the move16s and it was slower in my tests. I'll try it at the beginning and see if there is a speedup. I thought there might be some kind of branch optimization for a subq followed by a bcc on the 68060, but there shouldn't be. The default prediction for a backward branch is taken anyway. The whole instruction cache line is already loaded, so that shouldn't be the problem either, and branch target alignment doesn't seem to make much, if any, difference on the 68060. I'll try again, though, and run some more speed tests.

Most of the gains over Piru's CMQ060 come from optimizing the branches. Where some of the code could have been combined, I left it separate, because the branch behavior differs and combining it messes up the branch prediction cache.
 

Offline matthey (Topic starter)

Re: New CopyMem & CopyMemQuick patch
« Reply #8 on: May 04, 2009, 06:33:52 PM »
@Piru

I tried to assemble your CMQ060.asm v1.5d with different flag settings at the top of the file, and it failed. I believe line 757...

   IFNE USE_MOVEM&SAFE_MOVEM

maybe should be...

   IFNE USE_MOVEM|SAFE_MOVEM

I had something like SAFE_MOVE16=0 and SAFE_MOVEM=0 and maybe USE_MOVEM=0. I got all kinds of errors about undefined references and such. Also, thanks again for your programs and making your source available. I've learned a lot.
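
To make the difference between the two conditions concrete, here is a small truth table in Python. It assumes the flags only ever hold 0 or 1 and that IFNE assembles its block whenever the expression is non-zero; whether `|` is actually the right fix for CMQ060.asm cannot be confirmed from this thread.

```python
def ifne(value: int) -> bool:
    """IFNE assembles its block only when the expression is non-zero."""
    return value != 0


def truth_table():
    """Rows of (USE_MOVEM, SAFE_MOVEM, result with &, result with |)."""
    rows = []
    for use_movem in (0, 1):
        for safe_movem in (0, 1):
            rows.append((use_movem, safe_movem,
                         ifne(use_movem & safe_movem),
                         ifne(use_movem | safe_movem)))
    return rows


for use_movem, safe_movem, with_and, with_or in truth_table():
    print(f"USE_MOVEM={use_movem} SAFE_MOVEM={safe_movem}  "
          f"&: {with_and!s:5}  |: {with_or!s:5}")
```

The two forms differ only when exactly one of the flags is set. With both flags at 0, neither form assembles the block.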
 

Offline matthey (Topic starter)

Re: New CopyMem & CopyMemQuick patch
« Reply #9 on: May 05, 2009, 03:56:46 AM »
I added some speed test results to the readme and also mentioned there that CopyMem060 is free.
Here are the test results, by the way...

Using "TestIt" from CopyMemQuicker V2.8

orig=original AmigaOS 3.9 routines
CMQ28=CopyMemQuicker v2.8
CMQ060=CopyMemQuick v1.5
CM060=CopyMem060 v1.0

 orig  CMQ28  CMQ060  CM060  test

 1.30   1.28    0.91   0.91  CM 565×64kB L->L
 0.64   0.43    0.35   0.35  CM 147×64kB L->L+1
 1.06   1.06    0.99   0.98  CM 413×64kB L->E
 0.61   0.43    0.36   0.35  CM 147×64kB L->E+1
 0.60   0.41    0.35   0.34  CM 147×64kB L+1->L
 1.01   0.86    0.61   0.61  CM 382×64kB L+1->L+1
 0.66   0.43    0.35   0.35  CM 147×64kB L+1->E
 1.18   1.18    1.21   1.18  CM 501×64kB L+1->E+1
 1.20   1.18    1.18   1.18  CM 501×64kB E->L
 0.59   0.41    0.35   0.34  CM 147×64kB E->L+1
 1.01   0.86    0.61   0.61  CM 382×64kB E->E
 0.64   0.43    0.35   0.34  CM 147×64kB E->E+1
 0.59   0.41    0.35   0.34  CM 147×64kB E+1->L
 1.06   1.06    0.96   0.98  CM 413×64kB E+1->L+1
 0.60   0.41    0.34   0.35  CM 147×64kB E+1->E
 1.30   1.26    0.91   0.91  CM 564×64kB E+1->E+1
 0.30   0.29    0.26   0.24  CM 33900×1kB L->L
 0.39   0.21    0.13   0.13  CM 9400×1kB L->L+1
 0.45   0.18    0.18   0.18  CM 24000×1kB E->E
 0.36   0.30    0.24   0.21  CM 196000×128B L->L
 0.48   0.26    0.20   0.18  CM 155000×128B E->E
 0.54   0.41    0.34   0.33  CM 588000×19B L->L
 0.55   0.33    0.31   0.31  CM 622000×18B L->L
 0.48   0.45    0.33   0.33  CM 663000×17B L->L
 0.53   0.48    0.46   0.41  CM 956000×16B L->L
 0.56   0.48    0.35   0.23  CM 1060000×8B L->L
 0.51   0.43    0.26   0.06  CM 1430000×4B L->L
 0.45   0.41    0.38   0.18  CM 2190000×1B L->L

 1.30   1.28    0.91   0.91  CMQ 565×64kB L->L
 0.30   0.30    0.25   0.25  CMQ 33900×1kB L->L
 0.34   0.28    0.21   0.20  CMQ 196000×128B L->L
 0.46   0.43    0.36   0.28  CMQ 956000×16B L->L
 0.34   0.40    0.25   0.18  CMQ 1060000×8B L->L
 0.24   0.36    0.15   0.08  CMQ 1430000×4B L->L
-----  -----  ------  -----
22.79  19.53   15.89  14.96  total

16.69% speedup for CMQ v2.8
43.42% speedup for CMQ060 v1.5
52.34% speedup for CopyMem060 v1.0

Actual results will vary but these are typical.
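
The three percentages above follow from the column totals if "speedup" is taken as the original total divided by the patched total, minus one. That formula is inferred from the numbers rather than stated in the readme, and the function name below is made up for illustration.

```python
ORIG_TOTAL = 22.79  # total for the original AmigaOS 3.9 routines

PATCHED_TOTALS = {
    "CMQ v2.8": 19.53,
    "CMQ060 v1.5": 15.89,
    "CopyMem060 v1.0": 14.96,
}


def speedup_percent(baseline: float, patched: float) -> float:
    """Speedup of `patched` over `baseline`, as a percentage."""
    return (baseline / patched - 1.0) * 100.0


speedups = {name: round(speedup_percent(ORIG_TOTAL, total), 2)
            for name, total in PATCHED_TOTALS.items()}

for name, percent in speedups.items():
    print(f"{percent:.2f}% speedup for {name}")
```

Note that this is a ratio of run times, so 52.34% means the original routines took about 1.52 times as long in total, not that the patched run took half the time.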

Hmm. So can the rest of the AmigaOS be sped up by 50%?
Can all those C programs out there be sped up by 50%?
 

Offline ChaosLord

  • Hero Member
  • *****
  • Join Date: Nov 2003
  • Posts: 2608
    • http://totalchaoseng.dbv.pl/news.php
Re: New CopyMem & CopyMemQuick patch
« Reply #10 on: May 05, 2009, 04:24:07 AM »
Looks like some good work!

If you can beat Piru at coding, you have really done something!

I would be more motivated to install your patch if you supplied timing results for copying a 327680-byte block of RAM.
That is the exact size of a 640x512 8-bit AGA screen. My game code sometimes has to copy screens around: sometimes fast RAM to fast RAM, sometimes fast RAM to chip RAM.
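
The screen-size figure can be checked with a few lines of arithmetic. The helper name is made up for illustration, and the comparison with the benchmark assumes that "kB" in the test results means 1024 bytes.

```python
def screen_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Size in bytes of a screen with the given dimensions and depth."""
    return width * height * bits_per_pixel // 8


size = screen_bytes(640, 512, 8)
blocks_64k = size / (64 * 1024)

print(size)        # bytes in one 640x512 8-bit screen
print(blocks_64k)  # the same size expressed in 64kB blocks
```

Under that assumption one full-screen copy moves as much data as five of the 64kB copies in the benchmark, so the 64kB rows are the closest match in the posted results.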

Wanna try a wonderfull strategy game with lots of handdrawn anims,
Magic Spells and Monsters, Incredible playability and lastability,
English speech, etc. Total Chaos AGA
 

Offline ChaosLord

Re: New CopyMem & CopyMemQuick patch
« Reply #11 on: May 05, 2009, 04:42:20 AM »
After thinking about this in my head for 5 minutes...
I do not see how Move16 could possibly be legal on the Amiga.  In fastram or chipram.  It will corrupt any device that is DMAing into ram.

It might work on UAE since all UAE DMA is fake.  Actually the Move16 itself would be fake.
 

Offline matthey (Topic starter)

Re: New CopyMem & CopyMemQuick patch
« Reply #12 on: May 05, 2009, 05:42:56 AM »
Quote

ChaosLord wrote:
After thinking about this in my head for 5 minutes...
I do not see how Move16 could possibly be legal on the Amiga.  In fastram or chipram.  It will corrupt any device that is DMAing into ram.


I don't think so. When move16 is being used, you are doing PIO, not DMA. DMA doesn't use the main CPU to move data: some other device accesses the memory bus directly and transfers the data, leaving the main CPU free to do other things. The objection to move16 rests, I believe, on the wrong assumption that it writes 16-byte bursts into chip memory, which could cause problems. However, if the MMU sets up these memory areas correctly, bursts will not happen into them. The instruction and data caches are filled with 16-byte bursts as needed, and they do not cause problems in chip RAM either when the MMU has mapped chip memory properly. This is because the MMU tells the CPU not to burst into chip RAM (or other 24-bit address space), where it would be dangerous.

SetPatch, which loads the 68060.library and sets up the MMU, should always be run before CopyMem060. SysSpeed uses the move16 instruction into chip memory in its speed test and works reliably on a wide range of Amigas. Using move16 into chip RAM has little if any speed advantage, except that it doesn't replace the contents of the data cache with data that likely won't be accessed again soon.
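
To make "24-bit address space" concrete: it is the lowest 16 MB of the address map, 0x00000000 to 0x00FFFFFF, which is where chip RAM lives. The sketch below only classifies addresses. It is not something CopyMem060 is known to do; per the post above, the decision is left to the MMU setup. The function names and the example addresses are hypothetical.

```python
ADDRESS_SPACE_24BIT = 1 << 24  # 16 MB: 0x00000000..0x00FFFFFF


def in_24bit_space(address: int) -> bool:
    """True if the address can be expressed in 24 bits."""
    return 0 <= address < ADDRESS_SPACE_24BIT


def copy_touches_24bit_space(source: int, dest: int, length: int) -> bool:
    """True if a non-empty copy starts reading or writing below 16 MB."""
    if length <= 0:
        return False
    return in_24bit_space(source) or in_24bit_space(dest)


# Hypothetical addresses: one inside chip RAM, two in 32-bit fast RAM.
CHIP_BUFFER = 0x00020000
FAST_BUFFER_A = 0x08000000
FAST_BUFFER_B = 0x08100000

print(copy_touches_24bit_space(FAST_BUFFER_A, CHIP_BUFFER, 64))
print(copy_touches_24bit_space(FAST_BUFFER_A, FAST_BUFFER_B, 64))
```

A fast RAM to chip RAM copy falls in the first category, a fast RAM to fast RAM copy in the second.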

The CopyMem and CopyMemQuick functions are in exec.library and are free for any program to use. I encourage their use.
 

Offline matthey (Topic starter)

Re: New CopyMem & CopyMemQuick patch
« Reply #13 on: May 05, 2009, 06:09:09 AM »
Quote

ChaosLord wrote:
If you can beat Piru at coding, you have really done something!


Thanks for the compliment, but my code was built on the shoulders of Piru's, using many of his ideas, so I can't really claim to be beating him. I tried several different ways of rewriting the code and most of the time found no gain, or a slowdown. It always amazes me how a few words of code can almost always be optimized a little more somehow.
 

Offline ChaosLord

Re: New CopyMem & CopyMemQuick patch
« Reply #14 on: May 05, 2009, 06:51:04 AM »
It sounds like you are saying your code will fail on any 060 or 040 that does not have an MMU.