Author Topic: Copymem Quick & Big Released!  (Read 8065 times)

SpeedGeek (Topic starter)

Copymem Quick & Big Released!
« on: January 25, 2015, 05:34:58 PM »
Here is a link to the thread on EAB:

http://eab.abime.net/showthread.php?p=999829
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #1 on: January 27, 2015, 11:53:52 AM »
** NEWS UPDATE **

CMQ&B040 v1.8 released

v1.8 minor change
- removed obsolete Copymemquick source address compare code
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #2 on: January 27, 2015, 01:00:09 PM »
@ral-clan

Testit always compares the old Copymem + Copymemquick routines against the new routines from Copymemquicker2.8. So the "Old" results are those of whichever CMQ patch was installed before Copymemquicker2.8 (or of the original routines if no CMQ patch was installed). If you read the results carefully you will see the "Old" results are much faster than the "New" results. ;)
« Last Edit: January 27, 2015, 01:49:59 PM by SpeedGeek »
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #3 on: January 28, 2015, 03:14:49 PM »
Why is this so hard to understand?

1) Old results = CMQ patch routines (or original routines if no CMQ patch installed)
2) New results = Copymemquicker2.8 routines

You guys must need some coffee, tea and vitamins... :lol:
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #4 on: January 28, 2015, 04:23:30 PM »
Quote from: Oldsmobile_Mike;782501
That still doesn't make any sense, man.  Because in your example, you said:

Old CopyMem    :  1.46 secs
New CopyMem    :  1.51 secs (+ 3.4%)        

So the old routine completes in 1.46 seconds.  The new routine completes in 1.51 seconds.  Any way of reading that the old routine is faster, then.

Oh well, I give up.  ?????  Keep up the good work!  :roflmao:

A few thoughts from Mr. Spock...

Logically, for the old routines to be faster than the new routines there are 2 possible choices here:

1) The old routines (from exec) were patched with a CMQ patch which has faster routines than the new routines (from Copymemquicker2.8)

OR

2) The old routines (from exec) are faster than the new routines (from Copymemquicker2.8)

P.S. If the old routines (from exec) are really faster, then why would anyone bother coding or using CMQ patches?
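To put numbers on that: the percentage printed beside each "New" line looks like the relative change of the new time against the old time, so a positive value means the new routine took longer. A minimal Python sketch, assuming that formula (the function name is made up for this post, and Testit itself probably works from unrounded timings, so recomputing from the printed values can be off by a fraction of a percent):

```python
def slowdown_percent(old_secs, new_secs):
    """Relative change of the new time against the old time, in percent.

    Positive: the new routine took longer (the old one is faster).
    Negative: the new routine took less time.
    This formula is an assumption inferred from the printed results.
    """
    return (new_secs - old_secs) / old_secs * 100.0


# Old CopyMem 1.46 secs, New CopyMem 1.51 secs (values from the quote above)
print(f"{slowdown_percent(1.46, 1.51):+.1f}%")  # prints +3.4%
```

A plus sign therefore counts against the "New" routine, not for it.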
« Last Edit: January 28, 2015, 04:55:46 PM by SpeedGeek »
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #5 on: January 29, 2015, 12:57:07 PM »
OK, I replaced "Old" and "New" with the names of the patch routines:
(If you still don't get it I'm sorry I can't help you)

Some Testit results for CMQ&B:
Code:
This test will compare the CMQ&B CopyMem/CopyMemQuick routines with
the Copymemquicker2.8 ones you have installed.  A great variety of tests will be
run, and this might take some time, especially if your system has a
slow processor.

Initiating test (please be patient...)

Copying 65536 bytes 282 times (long -> long offset)
CMQ&B CopyMem    :  1.46 secs
Copymemquicker2.8 CopyMem    :  1.51 secs (+ 3.4%)
CMQ&B CopyMemQuick:  1.46 secs
Copymemquicker2.8 CopyMemQuick:  1.48 secs (+ 0.7%)

Copying 65536 bytes 73 times (long -> long+1 offset)
CMQ&B CopyMem    :  0.39 secs
Copymemquicker2.8 CopyMem    :  0.79 secs (+102.6%)

Copying 65536 bytes 206 times (long -> even offset)
CMQ&B CopyMem    :  1.13 secs
Copymemquicker2.8 CopyMem    :  1.56 secs (+38.0%)

Copying 65536 bytes 73 times (long -> even+1 offset)
CMQ&B CopyMem    :  0.39 secs
Copymemquicker2.8 CopyMem    :  0.81 secs (+105.1%)

Copying 65536 bytes 73 times (long+1 -> long offset)
CMQ&B CopyMem    :  0.45 secs
Copymemquicker2.8 CopyMem    :  0.81 secs (+80.0%)

Copying 65536 bytes 191 times (long+1 -> long+1 offset)
CMQ&B CopyMem    :  0.99 secs
Copymemquicker2.8 CopyMem    :  1.03 secs (+ 3.0%)

Copying 65536 bytes 73 times (long+1 -> even offset)
CMQ&B CopyMem    :  0.40 secs
Copymemquicker2.8 CopyMem    :  0.79 secs (+97.5%)

Copying 65536 bytes 250 times (long+1 -> even+1 offset)
CMQ&B CopyMem    :  1.38 secs
Copymemquicker2.8 CopyMem    :  1.48 secs (+ 6.5%)

Copying 65536 bytes 250 times (even -> long offset)
CMQ&B CopyMem    :  1.51 secs
Copymemquicker2.8 CopyMem    :  1.46 secs (- 3.3%)

Copying 65536 bytes 73 times (even -> long+1 offset)
CMQ&B CopyMem    :  0.43 secs
Copymemquicker2.8 CopyMem    :  0.81 secs (+88.4%)

Copying 65536 bytes 191 times (even -> even offset)
CMQ&B CopyMem    :  0.98 secs
Copymemquicker2.8 CopyMem    :  1.01 secs (+ 3.1%)

Copying 65536 bytes 73 times (even -> even+1 offset)
CMQ&B CopyMem    :  0.39 secs
Copymemquicker2.8 CopyMem    :  0.80 secs (+102.6%)

Copying 65536 bytes 73 times (even+1 -> long offset)
CMQ&B CopyMem    :  0.44 secs
Copymemquicker2.8 CopyMem    :  0.78 secs (+75.0%)

Copying 65536 bytes 206 times (even+1 -> long+1 offset)
CMQ&B CopyMem    :  1.25 secs
Copymemquicker2.8 CopyMem    :  1.55 secs (+24.0%)

Copying 65536 bytes 73 times (even+1 -> even offset)
CMQ&B CopyMem    :  0.43 secs
Copymemquicker2.8 CopyMem    :  0.81 secs (+88.4%)

Copying 65536 bytes 282 times (even+1 -> even+1 offset)
CMQ&B CopyMem    :  1.46 secs
Copymemquicker2.8 CopyMem    :  1.53 secs (+ 4.1%)

Copying 1024 bytes 16950 times (long -> long offset)
CMQ&B CopyMem    :  1.38 secs
Copymemquicker2.8 CopyMem    :  1.49 secs (+ 8.0%)
CMQ&B CopyMemQuick:  1.38 secs
Copymemquicker2.8 CopyMemQuick:  1.49 secs (+ 8.0%)

Copying 1024 bytes 4700 times (long -> long+1 offset)
CMQ&B CopyMem    :  0.40 secs
Copymemquicker2.8 CopyMem    :  0.81 secs (+102.5%)

Copying 1024 bytes 12000 times (even -> even offset)
CMQ&B CopyMem    :  1.01 secs
Copymemquicker2.8 CopyMem    :  1.08 secs (+ 5.9%)

Copying 128 bytes 98000 times (long -> long offset)
CMQ&B CopyMem    :  1.08 secs
Copymemquicker2.8 CopyMem    :  1.06 secs (- 0.9%)
CMQ&B CopyMemQuick:  0.98 secs
Copymemquicker2.8 CopyMemQuick:  0.99 secs (+ 1.0%)

Copying 128 bytes 77500 times (even -> even offset)
CMQ&B CopyMem    :  0.89 secs
Copymemquicker2.8 CopyMem    :  0.93 secs (+ 3.4%)

Copying 19 bytes 294000 times (long -> long offset)
CMQ&B CopyMem    :  0.53 secs
Copymemquicker2.8 CopyMem    :  0.68 secs (+28.3%)

Copying 18 bytes 311000 times (long -> long offset)
CMQ&B CopyMem    :  0.54 secs
Copymemquicker2.8 CopyMem    :  0.66 secs (+20.4%)

Copying 17 bytes 331500 times (long -> long offset)
CMQ&B CopyMem    :  0.56 secs
Copymemquicker2.8 CopyMem    :  0.71 secs (+25.0%)

Copying 16 bytes 478000 times (long -> long offset)
CMQ&B CopyMem    :  0.76 secs
Copymemquicker2.8 CopyMem    :  0.91 secs (+18.4%)
CMQ&B CopyMemQuick:  0.61 secs
Copymemquicker2.8 CopyMemQuick:  0.54 secs (- 9.8%)

Copying 8 bytes 530000 times (long -> long offset)
CMQ&B CopyMem    :  0.46 secs
Copymemquicker2.8 CopyMem    :  1.08 secs (+132.6%)
CMQ&B CopyMemQuick:  0.30 secs
Copymemquicker2.8 CopyMemQuick:  0.28 secs (- 3.3%)

Copying 4 bytes 715000 times (long -> long offset)
CMQ&B CopyMem    :  0.26 secs
Copymemquicker2.8 CopyMem    :  0.83 secs (+215.4%)
CMQ&B CopyMemQuick:  0.05 secs
Copymemquicker2.8 CopyMemQuick:  0.23 secs (+360.0%)

Copying 1 bytes 1095000 times (long -> long offset)
CMQ&B CopyMem    :  0.41 secs
Copymemquicker2.8 CopyMem    :  0.51 secs (+24.4%)

Total timing:
-------------
CMQ&B routines    :  26.71 secs
Copymemquicker2.8 routines    :  33.48 secs
Total slowdown    :  25.31 %
Some Testit results for CMQ&B040:
Code:
This test will compare the CMQ&B040 CopyMem/CopyMemQuick routines with
the Copymemquicker2.8 ones you have installed.  A great variety of tests will be
run, and this might take some time, especially if your system has a
slow processor.

Initiating test (please be patient...)

Copying 65536 bytes 565 times (long -> long offset)
CMQ&B040 CopyMem    :  4.08 secs
Copymemquicker2.8 CopyMem    :  5.89 secs (+44.4%)
CMQ&B040 CopyMemQuick:  4.08 secs
Copymemquicker2.8 CopyMemQuick:  5.86 secs (+43.6%)

Copying 65536 bytes 147 times (long -> long+1 offset)
CMQ&B040 CopyMem    :  1.46 secs
Copymemquicker2.8 CopyMem    :  1.61 secs (+10.3%)

Copying 65536 bytes 413 times (long -> even offset)
CMQ&B040 CopyMem    :  4.13 secs
Copymemquicker2.8 CopyMem    :  4.40 secs (+ 6.3%)

Copying 65536 bytes 147 times (long -> even+1 offset)
CMQ&B040 CopyMem    :  1.46 secs
Copymemquicker2.8 CopyMem    :  1.59 secs (+ 8.9%)

Copying 65536 bytes 147 times (long+1 -> long offset)
CMQ&B040 CopyMem    :  1.49 secs
Copymemquicker2.8 CopyMem    :  1.60 secs (+ 6.7%)

Copying 65536 bytes 382 times (long+1 -> long+1 offset)
CMQ&B040 CopyMem    :  2.71 secs
Copymemquicker2.8 CopyMem    :  3.96 secs (+46.1%)

Copying 65536 bytes 147 times (long+1 -> even offset)
CMQ&B040 CopyMem    :  1.48 secs
Copymemquicker2.8 CopyMem    :  1.61 secs (+ 8.8%)

Copying 65536 bytes 501 times (long+1 -> even+1 offset)
CMQ&B040 CopyMem    :  5.06 secs
Copymemquicker2.8 CopyMem    :  5.26 secs (+ 3.9%)

Copying 65536 bytes 501 times (even -> long offset)
CMQ&B040 CopyMem    :  5.16 secs
Copymemquicker2.8 CopyMem    :  5.24 secs (+ 1.5%)

Copying 65536 bytes 147 times (even -> long+1 offset)
CMQ&B040 CopyMem    :  1.53 secs
Copymemquicker2.8 CopyMem    :  1.59 secs (+ 3.9%)

Copying 65536 bytes 382 times (even -> even offset)
CMQ&B040 CopyMem    :  2.71 secs
Copymemquicker2.8 CopyMem    :  3.96 secs (+46.1%)

Copying 65536 bytes 147 times (even -> even+1 offset)
CMQ&B040 CopyMem    :  1.48 secs
Copymemquicker2.8 CopyMem    :  1.59 secs (+ 7.4%)

Copying 65536 bytes 147 times (even+1 -> long offset)
CMQ&B040 CopyMem    :  1.53 secs
Copymemquicker2.8 CopyMem    :  1.58 secs (+ 2.6%)

Copying 65536 bytes 413 times (even+1 -> long+1 offset)
CMQ&B040 CopyMem    :  4.30 secs
Copymemquicker2.8 CopyMem    :  4.41 secs (+ 2.5%)

Copying 65536 bytes 147 times (even+1 -> even offset)
CMQ&B040 CopyMem    :  1.53 secs
Copymemquicker2.8 CopyMem    :  1.59 secs (+ 3.9%)

Copying 65536 bytes 564 times (even+1 -> even+1 offset)
CMQ&B040 CopyMem    :  3.99 secs
Copymemquicker2.8 CopyMem    :  5.85 secs (+46.4%)

Copying 1024 bytes 33900 times (long -> long offset)
CMQ&B040 CopyMem    :  0.29 secs
Copymemquicker2.8 CopyMem    :  0.35 secs (+17.2%)
CMQ&B040 CopyMemQuick:  0.30 secs
Copymemquicker2.8 CopyMemQuick:  0.34 secs (+13.3%)

Copying 1024 bytes 9400 times (long -> long+1 offset)
CMQ&B040 CopyMem    :  0.16 secs
Copymemquicker2.8 CopyMem    :  0.28 secs (+68.7%)

Copying 1024 bytes 24000 times (even -> even offset)
CMQ&B040 CopyMem    :  0.24 secs
Copymemquicker2.8 CopyMem    :  0.25 secs (+ 0.0%)

Copying 128 bytes 196000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.30 secs
Copymemquicker2.8 CopyMem    :  0.38 secs (+26.7%)
CMQ&B040 CopyMemQuick:  0.26 secs
Copymemquicker2.8 CopyMemQuick:  0.35 secs (+30.8%)

Copying 128 bytes 155000 times (even -> even offset)
CMQ&B040 CopyMem    :  0.28 secs
Copymemquicker2.8 CopyMem    :  0.31 secs (+10.7%)

Copying 19 bytes 588000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.39 secs
Copymemquicker2.8 CopyMem    :  0.51 secs (+28.2%)

Copying 18 bytes 622000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.38 secs
Copymemquicker2.8 CopyMem    :  0.41 secs (+ 7.9%)

Copying 17 bytes 663000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.41 secs
Copymemquicker2.8 CopyMem    :  0.55 secs (+31.7%)

Copying 16 bytes 956000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.53 secs
Copymemquicker2.8 CopyMem    :  0.58 secs (+ 7.5%)
CMQ&B040 CopyMemQuick:  0.53 secs
Copymemquicker2.8 CopyMemQuick:  0.51 secs (- 1.9%)

Copying 8 bytes 1060000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.33 secs
Copymemquicker2.8 CopyMem    :  0.58 secs (+75.7%)
CMQ&B040 CopyMemQuick:  0.35 secs
Copymemquicker2.8 CopyMemQuick:  0.46 secs (+31.4%)

Copying 4 bytes 1430000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.23 secs
Copymemquicker2.8 CopyMem    :  0.51 secs (+121.7%)
CMQ&B040 CopyMemQuick:  0.24 secs
Copymemquicker2.8 CopyMemQuick:  0.41 secs (+66.7%)

Copying 1 bytes 2190000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.41 secs
Copymemquicker2.8 CopyMem    :  0.50 secs (+19.5%)

Total timing:
-------------
CMQ&B040 routines    :  53.98 secs
Copymemquicker2.8 routines    :  65.04 secs
Total slowdown    :  20.49 %  

« Last Edit: January 29, 2015, 01:20:19 PM by SpeedGeek »
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #6 on: July 04, 2020, 01:04:10 PM »
** 2ND NEWS UPDATE **

CMQ&B040 1.9 released!

v1.9 New Smart buffer copy code provides a BIG SPEED UP
since the MOVE16 alignment restrictions are now handled well!
(See new Testit results on EAB).
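As a rough illustration of the alignment problem (this is not the actual 68040 assembly in the patch): MOVE16 moves one 16-byte line at a time and wants both addresses on a 16-byte boundary. The Python sketch below only classifies a copy request. The function name and the three labels are made up for this post, and the real Smart buffer code may decide differently.

```python
LINE = 16  # MOVE16 transfers one 16-byte line per instruction


def classify_copy(src_addr, dst_addr):
    """Sort a copy request by how its addresses sit relative to 16-byte lines."""
    src_off = src_addr % LINE
    dst_off = dst_addr % LINE
    if src_off == 0 and dst_off == 0:
        return "aligned"        # both already on a line boundary
    if src_off == dst_off:
        return "same-offset"    # one short head copy lines both up together
    return "misaligned"         # offsets differ: no head copy can fix both


print(classify_copy(0x1000, 0x2000))
print(classify_copy(0x1004, 0x2004))
print(classify_copy(0x1000, 0x2002))
```

The last case is the kind of large block copy that does not meet the MOVE16 requirements on its own.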

P.S. If you still don't get it just look at the Old Copymem results. Old = CMQ&B040 and New = Copymemquicker 2.8!
« Last Edit: July 06, 2020, 01:59:34 PM by SpeedGeek »
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #7 on: July 05, 2020, 06:33:39 PM »
What is the practical difference? What is faster with these and how much? Do I get 2fps faster Quake or what?

The Testit results speak for themselves. What is faster is large block copies which don't meet the alignment requirements of Move16. Please direct your application specific questions (e.g. Quake) to their respective developers.

I suggest you download Testit (included with COPMQR28) and determine your own results. I suspect the reason so many here didn't get it was because they were too lazy to determine their own results.

http://aminet.net/search?query=copmqr28
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #8 on: July 07, 2020, 01:14:29 AM »
I am with utri007 on this. What is the point of being faster on 1mio copies of 128 bytes if in reality this situation never happens? Where is the bottleneck you are trying to fix, how did you even find that? Optimizations are probably fun for you, but if it only shows in arbitrary tests - what is the point?

A few more thoughts from Mr. Spock:

Logically, if 1mio copies of 128 bytes are faster, then 100K copies of 128 bytes should also be faster, and 10K copies and 1K copies, and so on.

The problem is that most lamers don't understand that coding a benchmark program (which offers even a moderate degree of accuracy) is no easy task. In fact, given the limitations of the Amiga Hardware (timer.device) and the Amiga OS (multitasking and serious interrupt dependencies), compounded by the limitations of Software developer tools (SAS Crap compiler), it's truly amazing and very impressive that some Amiga Benchmark programs work as well as they do.

But the one thing all Benchmark coders will appreciate (given the above constraints) is that More Iterations = Better Accuracy.

So logically speaking, running 1mio copies of 128 bytes has a practical purpose (even if some lamer still doesn't get it).
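To put a number on "More Iterations = Better Accuracy": a timer can only report whole ticks, so a measured duration can be off by up to one tick, and that fixed error shrinks relative to the total as the run gets longer. A minimal Python sketch, assuming a made-up tick length of 0.02 secs and a made-up cost of 1 microsecond per copy (neither value comes from Testit or timer.device):

```python
def worst_case_error_percent(iterations, secs_per_copy, tick_secs):
    """Upper bound on the relative timing error caused by tick granularity.

    A measured duration can be wrong by at most one tick, so the bound is
    tick / total run time. All inputs here are illustrative assumptions.
    """
    total_secs = iterations * secs_per_copy
    return tick_secs / total_secs * 100.0


for n in (1_000, 10_000, 100_000, 1_000_000):
    err = worst_case_error_percent(n, 1e-6, 0.02)
    print(f"{n:>9} copies: up to {err:.0f}% error")
```

With these assumed values, only the longest run gets the error bound down to a few percent, which is why the iteration counts are so high.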

The bottleneck fix has already been explained and I don't like repeating myself. Optimizations are often more work than fun, but there is some satisfaction in finally solving the bottleneck problem. Now, if I could only do that and avoid these silly lamer questions...  ::)   

   
« Last Edit: July 08, 2020, 03:22:03 PM by SpeedGeek »
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #9 on: July 07, 2020, 09:03:23 PM »
@SpeedGeek My lamer question comes from your own statement
Quote
The main goal is to give the fastest possible results with Testit from COPMQR28.

...because this Benchmark tool was used by nearly all previous CopymemQuick patch developers. Now obviously, if you want to compare performance results on an "Apples to Apples" basis you will need the very same Benchmark tool running on the very same system. Now, if you read this thread completely you can see there are several requests for comparison results with these previously developed patches.
     
As this benchmark seems to do thousands of sequential copies (maybe even from/to the same address every time?) one can assume this optimization only really shows in this kind of case. Once we remove the repetition from the copy procedure (like from 1mio down to 10 or 50) the results are probably still faster, but the absolute effect is only marginal, as normal copy routines don't repeat a million times. So the gain is a few ms every minute of normal operation, say Workbench or whatever?
If the installation of this patch takes one more reboot or a few seconds to install, then I have lost all speed benefit for, say, the whole week in advance?

Ok, so then why didn't you develop your own Benchmark tool which provides the specific results which (in your opinion) are more practical and reliable? Apparently, you are just not up to the task.  :(
 
It is still interesting, but as long as LightWave or PPaint or Quake or Doom are not getting accelerated, I pass on this patch. ;)

Really? Some assumptions you easily made but what efforts did you make to provide conclusive proof? Thanks (for the pass), but it's really too bad you didn't realize just how much of your valuable time (and mine) you could have saved by passing on the lamer questions too.  ::)
« Last Edit: July 08, 2020, 01:53:10 PM by SpeedGeek »
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #10 on: July 11, 2020, 02:17:11 AM »
** 3RD NEWS UPDATE **

CMQ&B040 2.0 released!

v2.0 Fixed a seldom occurring but serious bug with internal Smart
buffer usage.
- Nested call large block copies (WHEN MISALIGNED!) could corrupt
each other's data when sharing the same buffer. This fix uses a
stack-based buffer solution which results in a private buffer for each call.
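
For the curious, here is the nested call problem in miniature. This is a Python sketch, not the patch code: the buffer size, the function names and the `nested` callback (standing in for a second copy that arrives while the first one is still in progress) are all made up for illustration.

```python
SHARED_BUF = bytearray(16)  # one buffer shared by every call (the buggy layout)


def copy_shared(dst, src, nested=None):
    """Stage up to 16 bytes through the shared buffer; not safe for nested calls."""
    n = len(src)
    SHARED_BUF[:n] = src
    if nested is not None:
        nested()                 # a second copy runs before this one finishes
    dst[:n] = SHARED_BUF[:n]     # may now hold the nested call's data


def copy_private(dst, src, nested=None):
    """Stage the data through a buffer owned by this call only."""
    n = len(src)
    buf = bytearray(src)         # private buffer, like one carved from the stack
    if nested is not None:
        nested()
    dst[:n] = buf


a, b = bytearray(4), bytearray(4)
copy_shared(a, b"AAAA", nested=lambda: copy_shared(b, b"BBBB"))
print(a)   # corrupted: holds the nested call's bytes

a, b = bytearray(4), bytearray(4)
copy_private(a, b"AAAA", nested=lambda: copy_private(b, b"BBBB"))
print(a)   # intact
```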
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #11 on: February 13, 2021, 02:25:06 PM »
** 4TH NEWS UPDATE **

CMQ&B 1.7 released!

v1.7 Updated Big loop code with faster instructions. Increased
Big loop copy size to 112 bytes. Replaced Small loop copy code
with new JMP copy code for <= 108 bytes (See new Testit results for 1.7).
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #12 on: April 26, 2021, 01:18:38 PM »
** 5TH NEWS UPDATE **

CMQ&B040 2.1 released!

v2.1 Many changes
- Fixed a rarely occurring stack size bug when the stack was word
aligned and offset by one word from a 16 byte aligned address.
- Added code to test for the Move16 address bug and safely exit upon
detection
- Added code to restrict Smart buffer copy usage when the
destination address is in Chip RAM.
- Added code to change the default Block size
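
The stack size fix is easier to picture with numbers. The Python sketch below only illustrates why a stack-based aligned buffer needs more room than its nominal size when the stack pointer is not on a 16-byte boundary. The function, the 64-byte size and the addresses are all made up, and the real patch may lay out its buffer differently.

```python
LINE = 16


def aligned_stack_buffer(sp, size):
    """Place a 16-byte aligned buffer below sp on a downward-growing stack.

    Returns (buffer_start, bytes_used). size must be a multiple of 16.
    Hypothetical helper for illustration only.
    """
    top = sp & ~(LINE - 1)       # round sp down to a line boundary
    start = top - size           # aligned because top and size both are
    return start, sp - start     # bytes_used includes the alignment padding


for sp in (0x8000, 0x8002, 0x800E):
    start, used = aligned_stack_buffer(sp, 64)
    print(hex(sp), hex(start), used)
```

When the stack pointer sits one word past a 16-byte boundary (0x8002 here), the buffer costs more stack than its nominal 64 bytes, so a size check that only counts the nominal size comes up short.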
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #13 on: May 12, 2021, 12:57:41 PM »
** 6TH NEWS UPDATE **

CMQ&B040 2.2 released!

v2.2 minor change
- Removed "Move16 Bug" detection code. This was a blunder due to
Ax = Ay meaning the same registers rather than the same addresses.
 

SpeedGeek (Topic starter)

Re: Copymem Quick & Big Released!
« Reply #14 on: June 07, 2021, 08:28:52 PM »
** 7TH NEWS UPDATE **

CMQ&B040 2.3 released!

v2.3 minor change
- Changed address register longword math to word math for the
Smart buffer copy loop. This is a small optimization but we always
want the fastest possible results.