I am with utri007 on this. What is the point of being faster at 1mio copies of 128 bytes if this situation never happens in reality? Where is the bottleneck you are trying to fix, and how did you even find it? Optimizations are probably fun for you, but if the gain only shows up in artificial tests - what is the point?
A few more thoughts from Mr. Spock:
Logically, if 1mio copies of 128 bytes are faster, then 100K copies of 128 bytes should also be faster, and so should 10K copies, 1K copies, and so on.
The problem is that most lamers don't understand that coding a benchmark program which offers even a moderate degree of accuracy is no easy task. In fact, given the limitations of the Amiga hardware (timer.device) and the Amiga OS (multitasking and serious interrupt dependencies), compounded by the limitations of the developer tools (SAS Crap compiler), it's truly amazing and very impressive that some Amiga benchmark programs work as well as they do.
But the one thing all Benchmark coders will appreciate (given the above constraints) is that
More Iterations = Better Accuracy. With a coarse timer, the measurement error is roughly fixed, so dividing it across a million copies makes the per-copy figure far more precise. So, logically speaking, the 1mio copies of 128 bytes has a practical purpose (even if some lamer still doesn't get it).
The bottleneck fix has already been explained, and I don't like repeating myself. Optimizations are often more work than fun, but there is some satisfaction in finally solving the bottleneck problem. Now, if I could only do that and avoid these silly lamer questions...