Amiga.org
Topic started by: matthey on May 04, 2009, 05:21:34 AM
-
It's called CopyMem060, a patch for the exec.library CopyMem() and CopyMemQuick() functions, and I need some software testers before uploading it to Aminet.
It's faster than CMQ060, mostly with small copies, which are *very* common.
Logging the calls with Snoopy used up over 100MB of memory in about a minute.
I also just added an English workbench.guide AmigaGuide file that brings up some useful help when the Help key is pressed in Workbench with AmigaOS 3.9.
I'm looking for any feedback, comments, or bug reports.
Programs are here...
http://www.heywheel.com/matthey/Amiga/programming.html
-
@matthey
First, thanks a lot for this, I am always looking for speedups :)
OK, if I run this from the shell after booting, it works fine.
If I put it in startup-sequence, or even user-startup, my machine doesn't finish booting; it just stays at a black screen.
[Edit: I have to patch my scsi.device, as Chris Hodges just pointed out that it has a serious memory bug. Perhaps this is why it doesn't work on my system. That bug certainly kills TLSFMem.]
I am using:
A4000T with Quikpak 060
Deneb with rom updates in flashrom
OS 3.9 with AFA_OS
TLSFmem
-
I had a crazy idea for large copies once: if a PPC is detected, use it for copies large enough to mask the cost of the context switch. The only downside is that such copies are rare.
-
@hammerd, matthey:
Same here. No boot if set in startup-sequence. I'm using an A4000 with a CSPPC 060, a custom 3.9 Kickstart with all the newest patches, and AFA_OS 4.4 (+PowerWindows) at the moment.
Edit: typo
-
This patch must be started with Run. The documentation is too vague about this (or rather, it looks like a mistake in the documentation).
This should work in startup-sequence / user-startup:
run <>nil: copymem060 <>nil:
-
The readme should say to start it with...
Run >NIL: CopyMem060
It doesn't detach from the CLI/Shell as stated. I guess I need to work on the docs. English is not my strong point, as I was born in the U.S. ;-).
Edit: CopyMem060.readme has been fixed now. Should the readme have an icon?
-
@matthey
Incidentally, if you want to speed up your unrolled move16 loop a tiny bit, move the subq instruction before the last move16.
move16 doesn't affect the condition codes, so you can do this safely. Not having to test the condition codes in the instruction immediately after the one that sets them might give a small speedup.
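To make the suggestion concrete, here is a portable C model of the loop shape. It is only an illustration: a C compiler schedules instructions however it likes, so the benefit being discussed exists only in the 68060 assembly. The comment shows roughly the sequence in question, assuming a counter in d0 that counts 64-byte blocks; that register layout and the name copy_blocks64 are hypothetical, not taken from CopyMem060.

```c
#include <assert.h>
#include <string.h>

/* Copy one 16-byte line and advance both pointers, standing in for
   a single "move16 (a0)+,(a1)+". */
static void copy_line(unsigned char **dst, const unsigned char **src)
{
    memcpy(*dst, *src, 16);
    *dst += 16;
    *src += 16;
}

/*
 * Model of an unrolled move16 loop: four 16-byte lines (64 bytes) per
 * iteration, with the counter decremented before the final line copy:
 *
 *   loop: move16 (a0)+,(a1)+
 *         move16 (a0)+,(a1)+
 *         move16 (a0)+,(a1)+
 *         subq.l #1,d0        ; sets Z
 *         move16 (a0)+,(a1)+  ; MOVE16 leaves the condition codes alone
 *         bne.b  loop         ; tests Z set one instruction earlier
 *
 * Real move16 also needs 16-byte-aligned addresses; memcpy does not.
 */
void copy_blocks64(void *dst, const void *src, unsigned long blocks)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    int more;

    assert(blocks > 0);   /* the do/while body always runs at least once */

    do {
        copy_line(&d, &s);
        copy_line(&d, &s);
        copy_line(&d, &s);
        more = (--blocks != 0);   /* the "subq", moved up */
        copy_line(&d, &s);        /* last line; does not touch `more` */
    } while (more);               /* the "bne", reusing the earlier result */
}
```

Calling copy_blocks64(dst, src, 4) copies 256 bytes in four iterations.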
-
@Karlos
I tried the subq in the middle of the move16s and it was slower in my tests. I'll try it at the beginning and see if there is a speedup. I thought the 68060 might have some kind of branch optimization for a subq followed by a Bcc, but it shouldn't: a backward branch is predicted taken by default anyway. The whole instruction cache line is already loaded, so that shouldn't be the problem either, and branch target alignment doesn't seem to make much, if any, difference on the 68060. I'll try again though and run some more speed tests. Most of the gains over Piru's CMQ060 come from optimizing the branches. Where some of the code could have been combined, I kept it separate because the branch behavior is different and sharing it would mess up the branch prediction cache.
-
@Piru
I tried to assemble your CMQ060.asm v1.5d with different flags set at the top and it didn't work. I believe line 757...
IFNE USE_MOVEM&SAFE_MOVEM
maybe should be...
IFNE USE_MOVEM|SAFE_MOVEM
I had something like SAFE_MOVE16=0 and SAFE_MOVEM=0 and maybe USE_MOVEM=0. I got all kinds of errors about undefined references and such. Also, thanks again for your programs and making your source available. I've learned a lot.
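A quick way to see what the one-character change does is to model the two IFNE conditions (IFNE assembles the block when the expression is non-zero) in C, assuming each flag is a 0/1 switch as in the post. The function names are made up for this sketch. The two expressions disagree only when exactly one flag is set; with &, the block is then skipped, so anything it defines would be missing. Whether that block is actually meant to be assembled for either flag is for the CMQ060 source to confirm.

```c
#include <assert.h>

/* "IFNE USE_MOVEM&SAFE_MOVEM": assembled only when BOTH flags are set. */
int assembled_with_and(int use_movem, int safe_movem)
{
    assert((use_movem == 0 || use_movem == 1) &&
           (safe_movem == 0 || safe_movem == 1));
    return (use_movem & safe_movem) != 0;
}

/* "IFNE USE_MOVEM|SAFE_MOVEM": assembled when EITHER flag is set. */
int assembled_with_or(int use_movem, int safe_movem)
{
    assert((use_movem == 0 || use_movem == 1) &&
           (safe_movem == 0 || safe_movem == 1));
    return (use_movem | safe_movem) != 0;
}
```

For example, USE_MOVEM=1 with SAFE_MOVEM=0 gives 0 for the & form but 1 for the | form. With both flags at 0, both forms skip the block, so the & versus | question only matters for the mixed settings.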
-
I added some speed test results to the readme and mentioned that CopyMem060 is free.
Here are the test results, by the way...
Using "TestIt" from CopyMemQuicker V2.8
orig=original AmigaOS 3.9 routines
CMQ28=CopyMemQuicker v2.8
CMQ060=CopyMemQuick v1.5
CM060=CopyMem060 v1.0
orig CMQ28 CMQ060 CM060 Test
1.30 1.28 0.91 0.91 CM565×64kB L->L
0.64 0.43 0.35 0.35 CM147×64kB L->L+1
1.06 1.06 0.99 0.98 CM413×64kB L->E
0.61 0.43 0.36 0.35 CM147×64kB L->E+1
0.60 0.41 0.35 0.34 CM147×64kB L+1->L
1.01 0.86 0.61 0.61 CM382×64kB L+1->L+1
0.66 0.43 0.35 0.35 CM147×64kB L+1->E
1.18 1.18 1.21 1.18 CM501×64kB L+1->E+1
1.20 1.18 1.18 1.18 CM501×64kB E->L
0.59 0.41 0.35 0.34 CM147×64kB E->L+1
1.01 0.86 0.61 0.61 CM382×64kB E->E
0.64 0.43 0.35 0.34 CM147×64kB E->E+1
0.59 0.41 0.35 0.34 CM147×64kB E+1->L
1.06 1.06 0.96 0.98 CM413×64kB E+1->L+1
0.60 0.41 0.34 0.35 CM147×64kB E+1->E
1.30 1.26 0.91 0.91 CM564×64kB E+1->E+1
0.30 0.29 0.26 0.24 CM33900×1kB L->L
0.39 0.21 0.13 0.13 CM9400×1kB L->L+1
0.45 0.18 0.18 0.18 CM24000×1kB E->E
0.36 0.30 0.24 0.21 CM196000×128B L->L
0.48 0.26 0.20 0.18 CM155000×128B E->E
0.54 0.41 0.34 0.33 CM588000×19B L->L
0.55 0.33 0.31 0.31 CM622000×18B L->L
0.48 0.45 0.33 0.33 CM663000×17B L->L
0.53 0.48 0.46 0.41 CM956000×16B L->L
0.56 0.48 0.35 0.23 CM1060000×8B L->L
0.51 0.43 0.26 0.06 CM1430000×4B L->L
0.45 0.41 0.38 0.18 CM2190000×1B L->L
1.30 1.28 0.91 0.91 CMQ565×64kB L->L
0.30 0.30 0.25 0.25 CMQ33900×1kB L->L
0.34 0.28 0.21 0.20 CMQ196000×128B L->L
0.46 0.43 0.36 0.28 CMQ956000×16B L->L
0.34 0.40 0.25 0.18 CMQ1060000×8B L->L
0.24 0.36 0.15 0.08 CMQ1430000×4B L->L
---- ---- ---- ----
22.79 19.53 15.89 14.96 Total
16.69% speedup for CMQ v2.8
43.42% speedup for CMQ060 v1.5
52.34% speedup for CopyMem060 v1.0
Actual results will vary but these are typical.
Hmm. So can the rest of the AmigaOS be sped up by 50%?
Can all those C programs out there be sped up by 50%?
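The speedup percentages appear to be computed from the column totals as orig_total / new_total - 1 (the numbers check out: 22.79 / 14.96 is about 1.5234). A short C sketch of that arithmetic, together with the complementary "time saved" figure, which is often what people mean by a percentage speedup; the function names are just for illustration:

```c
#include <assert.h>

/* Speedup as used in the results: how many times faster the new total
   is than the original, minus one, as a percentage. */
double speedup_percent(double orig_total, double new_total)
{
    assert(new_total > 0.0);
    return (orig_total / new_total - 1.0) * 100.0;
}

/* Fraction of the original total time that is saved, as a percentage. */
double time_saved_percent(double orig_total, double new_total)
{
    assert(orig_total > 0.0);
    return (1.0 - new_total / orig_total) * 100.0;
}
```

speedup_percent(22.79, 14.96) comes out at about 52.34, matching the readme figure, while time_saved_percent(22.79, 14.96) is about 34.4: the CopyMem060 run takes roughly a third less total time than the original routines.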
-
Looks like some good work!
If you can beat Piru at coding, you have really done something!
I would be more motivated to install your patch if you supplied timing results for copying a 327680-byte block of RAM.
That is the exact size of a 640x512x8-bit AGA screen. My game code sometimes has to copy screens around, sometimes fastram to fastram, sometimes fastram to chipram.
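For reference, the 327680 figure follows from the screen geometry. The sketch below assumes a planar AGA screen with 8 bitplanes and no row padding (one bit per pixel per plane); a chunky 8-bit buffer of the same resolution comes to the same total. It also counts how many 16-byte move16 lines a copy of that size spans. The function names are only illustrative.

```c
#include <assert.h>

/* Bytes for a planar screen: one bit per pixel in each bitplane,
   assuming no padding at the end of each row. */
unsigned long screen_bytes(unsigned long width, unsigned long height,
                           unsigned long depth)
{
    assert(width % 8 == 0);   /* each plane row is a whole number of bytes */
    return (width / 8) * height * depth;
}

/* Number of 16-byte lines (one move16 each) in a copy of `bytes`. */
unsigned long move16_lines(unsigned long bytes)
{
    return bytes / 16;
}
```

screen_bytes(640, 512, 8) is 327680 and move16_lines(327680) is 20480. Since 327680 is exactly 5 × 64kB, the 64kB rows in the CopyMem060 results are the closest existing match for a screen copy.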
-
After thinking about this in my head for 5 minutes...
I do not see how Move16 could possibly be legal on the Amiga. In fastram or chipram. It will corrupt any device that is DMAing into ram.
It might work on UAE since all UAE DMA is fake. Actually the Move16 itself would be fake.
-
ChaosLord wrote:
After thinking about this in my head for 5 minutes...
I do not see how Move16 could possibly be legal on the Amiga. In fastram or chipram. It will corrupt any device that is DMAing into ram.
I don't think so. When move16 is being used, you are doing PIO, not DMA. DMA doesn't use the main CPU to move data: some other device accesses the memory bus directly and transfers the data, leaving the main CPU free to do other things. The concern with move16 is, I believe, based on the mistaken assumption that it writes 16-byte bursts into chip memory, which could cause problems. However, if the MMU sets up these memory areas correctly, bursts will not happen into them. The instruction and data caches are filled with 16-byte bursts as needed, and they do not cause problems in chip RAM either when the MMU has mapped chip memory properly, because the MMU tells the CPU not to burst into chip RAM (or other 24-bit address space) where it would be dangerous. SetPatch, which loads the 68060.library and sets up the MMU, should always be run before CopyMem060. SysSpeed uses the move16 instruction into chip memory in its speed test and works reliably on a wide range of Amigas. Using move16 into chip RAM has little if any speed advantage, except that it doesn't fill the data cache with data that likely won't be accessed again soon.
The CopyMem and CopyMemQuick functions are in exec.library and free for any program to use. I encourage their use.
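Using them is just a matter of calling CopyMem(source, dest, size) or CopyMemQuick(source, dest, size) from exec.library. CopyMemQuick() requires longword-aligned source and destination addresses and a size that is a multiple of 4, while CopyMem() has no such restriction, so odd-sized copies like the 17- and 19-byte tests can only use CopyMem(). The portable C sketch below shows that eligibility check; the helper pick_copy_routine and its enum are hypothetical and not how exec or CopyMem060 work internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum copy_routine { USE_COPYMEM, USE_COPYMEMQUICK };

/* Hypothetical helper: does a copy meet CopyMemQuick()'s requirements
   (longword-aligned source and destination, size a multiple of 4)?
   Anything else has to go through CopyMem(). */
enum copy_routine pick_copy_routine(const void *src, const void *dst,
                                    size_t size)
{
    uintptr_t s = (uintptr_t)src;
    uintptr_t d = (uintptr_t)dst;

    assert(src != NULL && dst != NULL);
    if ((s & 3u) == 0 && (d & 3u) == 0 && (size & 3u) == 0)
        return USE_COPYMEMQUICK;
    return USE_COPYMEM;
}
```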
-
ChaosLord wrote:
If you can beat Piru at coding, you have really done something!
Thanks for the compliment, but my code was built on the shoulders of Piru's, using many of his ideas, so it can't really be called beating him. I tried several different ways of rewriting the code, and most of the time found no gains or worse. It always amazes me how even a few words of code can almost always be optimized a little more somehow.
-
It sounds like you are saying your code will fail on any 060 or 040 that does not have an MMU.
-
ChaosLord wrote:
It sounds like you are saying your code will fail on any 060 or 040 that does not have an MMU.
Yes! Let me read you a paragraph from Amiga Mail, Nov/Dec 1992, titled "The 68030 and 68040 on the Zorro III Bus" by Michael Sinz...
"As the 68040.library requires an MMU to map address space, the fix described above will not work on systems with a MC68EC040. Because burst mode on the 68040 is activated along with the cache, there is no way to prevent a 68EC040-equipped Amiga from doing full line bursts when accessing cachable address space. This means a 68EC040 cannot prevent the excessive reads and writes when reading non-cachable ZorroIII devices that reside in cachable address space. A 68EC040-equipped Amiga will experience a significant decrease in performance when accessing non-cachable Zorro III devices. For this reason we cannot recommend that anyone use a 68EC040 (or any future 68000 series CPU that has no MMU) as the CPU on a Zorro III bus system."
So, CopyMem060 is dangerous to use on a 68040 or 68060 without an MMU unless all the caches are left off (the default after reset). The MMU needs to map memory so that bursts do not occur to certain areas of memory; after that, the CPU is smart enough not to burst into those areas. This is all done by the 68060.library when the SetPatch command is run in S:Startup-Sequence. Using move16 is no more dangerous than turning on the caches in a 68040 or 68060 without the MMU set up properly, and nobody in their right mind would run with the caches off. None of this applies to an emulated 68040 or 68060.
-
ChaosLord wrote:
After thinking about this in my head for 5 minutes...
I do not see how Move16 could possibly be legal on the Amiga. In fastram or chipram. It will corrupt any device that is DMAing into ram.
Uh... No. Why would it?
-
@mongo
Apparently Move16 is only illegal on Amigas lacking an MMU.
So if you have a 68EC040 or 68EC060 then you are screwed.
-
ChaosLord wrote:
@mongo
Apparently Move16 is only illegal on Amigas lacking an MMU.
So if you have a 68EC040 or 68EC060 then you are screwed.
No.
-
mongo wrote:
ChaosLord wrote:
@mongo
Apparently Move16 is only illegal on Amigas lacking an MMU.
So if you have a 68EC040 or 68EC060 then you are screwed.
No.
Your posts are not too helpful mongo. Do you care to spread some of your great intellect with a reason or two or are we just incapable of understanding such superior intelligence? What ChaosLord said here is not worded well but practically correct. A better wording might be...
"Apparently move16 should not be used on Amigas lacking an MMU unless all the caches are off. So if you have a 68EC040 or 68EC060 then you are screwed."
-
Quote:
"As the 68040.library requires an MMU to map address space, the fix described above will not work on systems with a MC68EC040. Because burst mode on the 68040 is activated along with the cache, there is no way to prevent a 68EC040-equipped Amiga from doing full line bursts when accessing cachable address space. This means a 68EC040 cannot prevent the excessive reads and writes when reading non-cachable ZorroIII devices that reside in cachable address space. A 68EC040-equipped Amiga will experience a significant decrease in performance when accessing non-cachable Zorro III devices. For this reason we cannot recommend that anyone use a 68EC040 (or any future 68000 series CPU that has no MMU) as the CPU on a Zorro III bus system."
All the EC processors have access control registers for controlling the cache, so you don't need an MMU. However, since only the high 8 bits of the address can be used, and the custom chip registers need to be made non-cachable as well, you would have to be careful where you put your cachable and non-cachable boards to get the best performance.
The following is also true of a 68EC040:
"On an Amiga with a 68040, Exec uses one of these registers to map the low 24-bits of the Amiga's address space (the Zorro II range, $00000000-$00FFFFFF) as non-cachable and serialized. The Amiga uses the second register to map the remaining memory ($01000000-$FFFFFFFF) as cachable and non-serialized. Because of its mapping, any RAM in this region will yield considerably higher performance than RAM in Zorro II space. Unfortunately, this mapping can cause problems for a Zorro III device that is not RAM."
By changing the ranges that are cached/non-cached and making expansion.library put boards in the correct place depending on whether or not they are RAM, you would avoid the problem. I don't know why they didn't do that, especially at a time when dedicating nearly 4GB of address space to RAM was insane.
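The 16MB granularity mentioned above comes from how the 68040/68060 transparent translation registers (and the EC parts' access control registers) select addresses: an 8-bit address base is compared with the top 8 address bits, and an 8-bit address mask marks which of those bits to ignore. A minimal sketch of just that comparison, assuming nothing about the other register fields (enable, supervisor/user, cache mode), which it leaves out; the helper name tt_match is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Does `addr` fall inside the block selected by an 8-bit address base
   and mask? A set bit in `mask` means "ignore this address bit".
   Only A31-A24 take part, so regions come in 16MB (2^24) blocks. */
int tt_match(uint8_t base, uint8_t mask, uint32_t addr)
{
    uint8_t top = (uint8_t)(addr >> 24);
    return (uint8_t)((top ^ base) & (uint8_t)~mask) == 0;
}
```

Base 0x00 with mask 0x00 selects exactly $00000000-$00FFFFFF, the low 24-bit space from the Amiga Mail text, including the custom chips at $DFF000. Base 0x40 with mask 0x3F selects $40000000-$7FFFFFFF. Nothing finer than a 16MB block can be expressed, which is why board placement matters.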
-
Quote:
Your posts are not too helpful mongo. Do you care to spread some of your great intellect with a reason or two or are we just incapable of understanding such superior intelligence? What ChaosLord said here is not worded well but practically correct. A better wording might be...
"Apparently move16 should not be used on Amigas lacking an MMU unless all the caches are off. So if you have a 68EC040 or 68EC060 then you are screwed."
Quote:
"For this reason we cannot recommend that anyone use a 68EC040 (or any future 68000 series CPU that has no MMU) as the CPU on a Zorro III bus system."
Anyway, didn't AmigaOS have big issues with 040/060 when DMAing to fast memory? Those pre/postdma calls... Because of caches, the CPU could accidentally trash data, and DMAing to fast RAM would only work if you had an MMU. This is completely unrelated to move16, and it would only be an issue if you had devices with DMA support.
(Geez, I have lost so much information about Amiga in these years....)
-
@psxphill
Nice dig-up of a 2-year-old thread. There must be kludge methods for handling EC processors, but they probably aren't fast or well tested. I don't recommend using move16 with an EC processor, but probably the worst it would do is crash. Non-EC processors are not expensive and are a worthwhile upgrade in my opinion.
Quote:
Anyway, didn't AmigaOS have big issues with 040/060 when DMAing to fast memory? Those pre/postdma calls... Because of caches, the CPU could accidentally trash data, and DMAing to fast RAM would only work if you had an MMU. This is completely unrelated to move16, and it would only be an issue if you had devices with DMA support.
Maybe you are thinking of the Buster chips that caused DMA problems on 4000s? I vaguely remember a bug in the AmigaOS cache flushing too. Either way, they should be fixed or worked around and shouldn't affect move16.
-
Quote:
Because of caches, the CPU could accidentally trash data, and DMAing to fast RAM would only work if you had an MMU. This is completely unrelated to move16, and it would only be an issue if you had devices with DMA support.
You can flush the cache without an MMU. So it really shouldn't be a problem.
No idea why I thought this was a current thread :-)
-
Quote:
You can flush the cache without an MMU. So it really shouldn't be a problem.
It won't help because caches could get dirty again.
Quote:
No idea why I thought this was a current thread :-)
Me neither :)
@matthey
Quote:
Maybe you are thinking of the Buster chips that caused DMA problems on 4000s? I vaguely remember a bug in the AmigaOS cache flushing too. Either way, they should be fixed or worked around and shouldn't affect move16.
I don't think it has anything to do with Buster... If you DMA to fast RAM, the CPU won't know about it. It could accidentally trash data that shares a cache line with the DMA buffer.
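To illustrate the cache-line point: the 68040 and 68060 use 16-byte cache lines, so a DMA buffer that does not start and end on a 16-byte boundary shares its first or last line with unrelated data. If the CPU later pushes that dirty line back, it can overwrite freshly DMA'd bytes; if it invalidates the line to see the DMA data, it can lose the neighbour's pending writes. The sketch below only computes which lines are shared and does not touch any cache; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 16u  /* 68040/68060 cache line size in bytes */

/* Nonzero if the first cache line of [addr, addr+len) also holds bytes
   that lie before the buffer. */
int head_line_shared(uintptr_t addr, size_t len)
{
    assert(len > 0);
    return (addr % CACHE_LINE) != 0;
}

/* Nonzero if the last cache line of [addr, addr+len) also holds bytes
   that lie after the buffer. */
int tail_line_shared(uintptr_t addr, size_t len)
{
    assert(len > 0);
    return ((addr + len) % CACHE_LINE) != 0;
}
```

A 64-byte buffer at $1000 shares no lines, while the same buffer at $1004 shares both its first and last line. Aligning and padding DMA buffers to whole cache lines is one common way to sidestep the problem alongside the pre/post DMA cache calls.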