Author Topic: New CopyMem & CopyMemQuick patch  (Read 4847 times)

matthey (Topic starter)

  • Hero Member
  • Join Date: Aug 2007
  • Posts: 1294
New CopyMem & CopyMemQuick patch
« on: May 04, 2009, 05:21:34 AM »
It's called CopyMem060 and I need some software testers before uploading to Aminet.
It's faster than CMQ060, mostly with small copies which are *very* common.
I used up over 100MB of memory logging the calls with Snoopy in about a minute.
I also just added an English workbench.guide AmigaGuide file that brings up some useful help when the Help key is pressed in Workbench under AmigaOS 3.9.
I'm looking for any feedback, comments, or bug reports.
Programs are here...

http://www.heywheel.com/matthey/Amiga/programming.html
 

matthey (Topic starter)
Re: New CopyMem & CopyMemQuick patch
« Reply #1 on: May 04, 2009, 04:23:58 PM »
The readme should say to start it with...

Run >NIL: CopyMem060

It doesn't detach from the cli/shell as stated. I guess I need to work on the docs. English is not my strong point as I was born in the U.S. ;-).

Edit: CopyMem060.readme has been fixed now. Should the readme have an icon?
 

matthey (Topic starter)
Re: New CopyMem & CopyMemQuick patch
« Reply #2 on: May 04, 2009, 06:11:46 PM »
@Karlos

I tried the subq in the middle of the move16's and it was slower in my tests. I'll try it at the beginning and see if there is a speedup. I thought it might be some kind of branch optimization for a subq followed by a bcc on the 68060, but it shouldn't be. The default branch prediction for a backward branch is taken anyway. The whole instruction cache line is already loaded, so that shouldn't be the problem either, and branch target alignment doesn't seem to make much if any difference on the 68060. I'll try again though and run some more speed tests. Most of the gains over Piru's CMQ060 come from optimizing the branches. I left some code uncombined where it could have been merged, because the branch behavior differs and merging it messes up the branch prediction cache.
 

matthey (Topic starter)
Re: New CopyMem & CopyMemQuick patch
« Reply #3 on: May 04, 2009, 06:33:52 PM »
@Piru

I tried to assemble your CMQ060.asm v1.5d with different flags set at the top and it didn't work. I believe line 757...

   IFNE USE_MOVEM&SAFE_MOVEM

maybe should be...

   IFNE USE_MOVEM|SAFE_MOVEM

I had something like SAFE_MOVE16=0 and SAFE_MOVEM=0 and maybe USE_MOVEM=0. I got all kinds of errors about undefined references and such. Also, thanks again for your programs and making your source available. I've learned a lot.
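
To see when the two conditions actually differ, here is a minimal Python sketch. It assumes the flags only take the values 0 and 1, that the assembler's & and | are bitwise operators, and that IFNE assembles its block when the expression is non-zero. The function names are made up for illustration and are not taken from CMQ060.asm.

```python
from itertools import product

def ifne(value):
    """Model of IFNE: the guarded block is assembled when value is non-zero."""
    return value != 0

def compare_conditions():
    """Return (USE_MOVEM, SAFE_MOVEM, assembled_with_and, assembled_with_or) rows."""
    rows = []
    for use_movem, safe_movem in product((0, 1), repeat=2):
        with_and = ifne(use_movem & safe_movem)  # IFNE USE_MOVEM&SAFE_MOVEM
        with_or = ifne(use_movem | safe_movem)   # IFNE USE_MOVEM|SAFE_MOVEM
        rows.append((use_movem, safe_movem, with_and, with_or))
    return rows

if __name__ == "__main__":
    print("USE_MOVEM SAFE_MOVEM  &      |")
    for u, s, a, o in compare_conditions():
        print(f"{u:9d} {s:10d}  {a!s:6} {o!s}")
```

Under those assumptions the two forms only disagree when exactly one of the flags is set. With both flags at 0, neither form assembles the block, so the errors in that configuration may have another cause.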
 

matthey (Topic starter)
Re: New CopyMem & CopyMemQuick patch
« Reply #4 on: May 05, 2009, 03:56:46 AM »
I added some speed test results to the readme and also mentioned there that CopyMem060 is free.
Here are the test results, by the way...

Using "TestIt" from CopyMemQuicker V2.8

orig=original AmigaOS 3.9 routines
CMQ28=CopyMemQuicker v2.8
CMQ060=CopyMemQuick v1.5
CM060=CopyMem060 v1.0

orig CMQ28 CMQ060 CM060 test (calls×size, source->destination)

1.30 1.28 0.91 0.91 CM565×64kB L->L
0.64 0.43 0.35 0.35 CM147×64kB L->L+1
1.06 1.06 0.99 0.98 CM413×64kB L->E
0.61 0.43 0.36 0.35 CM147×64kB L->E+1
0.60 0.41 0.35 0.34 CM147×64kB L+1->L
1.01 0.86 0.61 0.61 CM382×64kB L+1->L+1
0.66 0.43 0.35 0.35 CM147×64kB L+1->E
1.18 1.18 1.21 1.18 CM501×64kB L+1->E+1
1.20 1.18 1.18 1.18 CM501×64kB E->L
0.59 0.41 0.35 0.34 CM147×64kB E->L+1
1.01 0.86 1.61 0.61 CM382×64kB E->E
0.64 0.43 1.35 0.34 CM147×64kB E->E+1
0.59 0.41 0.35 0.34 CM147×64kB E+1->L
1.06 1.06 0.96 0.98 CM413×64kB E+1->L+1
0.60 0.41 0.34 0.35 CM147×64kB E+1->E
1.30 1.26 0.91 0.91 CM564×64kB E+1->E+1
0.30 0.29 0.26 0.24 CM33900×1kB L->L
0.39 0.21 0.13 0.13 CM9400×1kB L->L+1
0.45 0.18 0.18 0.18 CM24000×1kB E->E
0.36 0.30 0.24 0.21 CM196000×128B L->L
0.48 0.26 0.20 0.18 CM155000×128B E->E
0.54 0.41 0.34 0.33 CM588000×19B L->L
0.55 0.33 0.31 0.31 CM622000×18B L->L
0.48 0.45 0.33 0.33 CM663000×17B L->L
0.53 0.48 0.46 0.41 CM956000×16B L->L
0.56 0.48 0.35 0.23 CM1060000×8B L->L
0.51 0.43 0.26 0.06 CM1430000×4B L->L
0.45 0.41 0.38 0.18 CM2190000×1B L->L

1.30 1.28 0.91 0.91 CMQ565×64kB L->L
0.30 0.30 0.25 0.25 CMQ33900×1kB L->L
0.34 0.28 0.21 0.20 CMQ196000×128B L->L
0.46 0.43 0.36 0.28 CMQ956000×16B L->L
0.34 0.40 0.25 0.18 CMQ1060000×8B L->L
0.24 0.36 0.15 0.08 CMQ1430000×4B L->L
---- ---- ---- ----
22.79 19.53 15.89 14.96

16.69% speedup for CMQ v2.8
43.42% speedup for CMQ060 v1.5
52.34% speedup for CopyMem060 v1.0

Actual results will vary but these are typical.
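
The percentages appear to be the total original time divided by the total patched time, minus one. That reading is inferred from the numbers, not stated in the readme. A minimal Python sketch that reproduces them from the totals row (the function name is made up for illustration):

```python
# Column totals copied from the results above.
TOTALS = {
    "orig": 22.79,
    "CMQ v2.8": 19.53,
    "CMQ060 v1.5": 15.89,
    "CopyMem060 v1.0": 14.96,
}

def speedup_percent(baseline, patched):
    """Speedup of `patched` over `baseline`, in percent: (baseline / patched - 1) * 100."""
    return (baseline / patched - 1.0) * 100.0

if __name__ == "__main__":
    for name, total in TOTALS.items():
        if name == "orig":
            continue  # the baseline is not compared against itself
        print(f"{speedup_percent(TOTALS['orig'], total):.2f}% speedup for {name}")
```

Note that this is a ratio of times, so a 52% speedup means the total time dropped by roughly a third, not by half.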

Hmm. So can the rest of the AmigaOS be sped up by 50%?
Can all those C programs out there be sped up by 50%?
 

matthey (Topic starter)
Re: New CopyMem & CopyMemQuick patch
« Reply #5 on: May 05, 2009, 05:42:56 AM »
Quote

ChaosLord wrote:
After thinking about this in my head for 5 minutes...
I do not see how Move16 could possibly be legal on the Amiga.  In fastram or chipram.  It will corrupt any device that is DMAing into ram.


I don't think so. When move16 is being used, you are doing PIO, not DMA. DMA doesn't use the main CPU to move data: some other device accesses the memory bus directly and transfers the data, leaving the main CPU free to do other things. The concern with move16 comes, I believe, from wrongly assuming that it writes 16-byte bursts into chip memory, which could cause problems. However, if the MMU sets up these memory areas correctly, bursts will not happen into them. The instruction and data caches are filled with 16-byte bursts as needed, and they do not cause problems in chip RAM either when the MMU has mapped chip memory properly. This is because the MMU tells the CPU not to burst into chip RAM (or other 24-bit address space) where it would be dangerous.

SetPatch, which loads the 68060.library and sets up the MMU, should always be run before CopyMem060. SysSpeed uses the move16 instruction into chip memory in its speed test and works reliably on a wide range of Amigas. Using move16 into chip RAM has little if any speed advantage, except that it doesn't replace the contents of the data cache with data that likely won't be accessed again soon.

The CopyMem and CopyMemQuick functions are in exec.library and free for any program to use. I encourage their use.
 

matthey (Topic starter)
Re: New CopyMem & CopyMemQuick patch
« Reply #6 on: May 05, 2009, 06:09:09 AM »
Quote

ChaosLord wrote:
If you can beat Piru at coding, you have really done something!


Thanks for the compliment, but it was more like my code was built on the shoulders of Piru's, with many of his ideas, so it can't really be beating him. I tried several different ways of rewriting the code and most of the time found no gains or worse. It always amazes me how a few words of code can almost always be optimized more somehow.
 

matthey (Topic starter)
Re: New CopyMem & CopyMemQuick patch
« Reply #7 on: May 05, 2009, 07:36:58 AM »
Quote

ChaosLord wrote:
It sounds like you are saying your code will fail on any 060 or 040 that does not have an MMU.


Yes! Let me read you a paragraph from Amiga Mail, dated Nov/Dec 1992, titled "The 68030 and 68040 on the Zorro III Bus" by Michael Sinz...

"As the 68040.library requires an MMU to map address space, the fix described above will not work on systems with a MC68EC040. Because burst mode on the 68040 is activated along with the cache, there is no way to prevent a 68EC040-equipped Amiga from doing full line bursts when accessing cachable address space. This means a 68EC040 cannot prevent the excessive reads and writes when reading non-cachable ZorroIII devices that reside in cachable address space. A 68EC040-equipped Amiga will experience a significant decrease in performance when accessing non-cachable Zorro III devices. For this reason we cannot recommend that anyone use a 68EC040 (or any future 68000 series CPU that has no MMU) as the CPU on a Zorro III bus system."

So, CopyMem060 is dangerous to use on a 68040 or 68060 without an MMU unless all the caches are left off (the default after reset). The MMU needs to map memory so that bursts do not occur to certain areas of memory. After that, the CPU is smart enough not to burst into those areas. This is all done by the 68060.library when the SetPatch command is run in the S:Startup-Sequence. Using move16 is no more dangerous than turning on the caches on a 68040 or 68060 without the MMU set up properly, and nobody in their right mind would run with the caches off. All of this does NOT apply to an emulated 68040 or 68060.
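
The rule of thumb in this post can be written down as a small Python sketch. It only restates the conditions given above; the function and parameter names are hypothetical, and nothing here probes real hardware.

```python
def move16_copy_is_safe(has_mmu, mmu_set_up, caches_enabled, emulated=False):
    """Restate the post's rule for a 68040/68060 system.

    has_mmu        -- False for an EC part without an MMU (assumption: the caller knows this)
    mmu_set_up     -- True once SetPatch has loaded the 68060.library and mapped memory
    caches_enabled -- False only if all caches are left off (the default after reset)
    emulated       -- True for an emulated 68040/68060, where the post says none of this applies
    """
    if emulated:
        return True   # the burst concern does not apply to emulated CPUs
    if not caches_enabled:
        return True   # with all caches off, the post treats move16 as safe even without an MMU
    # Caches are on: the MMU must exist and be set up so bursts avoid unsafe memory areas.
    return has_mmu and mmu_set_up

if __name__ == "__main__":
    print(move16_copy_is_safe(has_mmu=True, mmu_set_up=True, caches_enabled=True))
    print(move16_copy_is_safe(has_mmu=False, mmu_set_up=False, caches_enabled=True))
```

The second call models a 68EC040 or 68EC060 with caches on, which is the case the Amiga Mail paragraph warns about.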
 

matthey (Topic starter)
Re: New CopyMem & CopyMemQuick patch
« Reply #8 on: May 05, 2009, 01:59:37 PM »
Quote

mongo wrote:
Quote

ChaosLord wrote:
@mongo

Apparently Move16 is only illegal on Amigas lacking an MMU.

So if you have a 68EC040 or 68EC060 then you are screwed.



No.


Your posts are not too helpful, mongo. Do you care to share some of your great intellect with a reason or two, or are we just incapable of understanding such superior intelligence? What ChaosLord said here is not worded well but is practically correct. A better wording might be...

"Apparently move16 should not be used on Amigas lacking an MMU unless all the caches are off. So if you have a 68EC040 or 68EC060 then you are screwed."
 

matthey (Topic starter)
Re: New CopyMem & CopyMemQuick patch
« Reply #9 on: September 08, 2011, 01:56:19 AM »
@psxphill
Nice dig-up of a 2-year-old thread. There must be kludge methods for handling EC processors, but they probably aren't fast or well tested. I don't recommend using move16 with an EC processor, but probably the most it would do is crash. Non-EC processors are not expensive and are a worthwhile upgrade in my opinion.

Quote from: itix;658334

Anyway, didnt AmigaOS have big issues with 040/060 when DMAing to fast memory? Those pre/postdma calls... Because of caches the CPU could accidentally trash data and DMAing to fast ram would only work if you had an MMU. This is completely unrelated to move16. And it would be only an issue if you had a devices with DMA support.


Maybe you are thinking of the Buster chips that caused DMA problems on 4000s? I vaguely remember a bug in the AmigaOS cache flushing too. Either way, they should be fixed or worked around and shouldn't affect move16.