Amiga.org

Amiga computer related discussion => Amiga Software Issues and Discussion => Topic started by: SpeedGeek on January 25, 2015, 05:34:58 PM

Title: Copymem Quick & Big Released!
Post by: SpeedGeek on January 25, 2015, 05:34:58 PM
Here is a link to this thread:

http://eab.abime.net/showthread.php?p=999829
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on January 27, 2015, 11:53:52 AM
** NEWS UPDATE **

CMQ&B040 v1.8 released

v1.8 minor change
- removed obsolete Copymemquick source address compare code
Title: Re: Copymem Quick & Big Released!
Post by: Ral-Clan on January 27, 2015, 12:51:19 PM
I don't understand the results of the tests --- all of these seem to show the new library takes LONGER to copy the same amount of data than the old library.

For example:

Copying 65536 bytes 282 times (long -> long offset)
Old CopyMem    :  1.46 secs
New CopyMem    :  1.51 secs (+ 3.4%)
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on January 27, 2015, 01:00:09 PM
@ral-clan

Testit always compares the old Copymem + Copymemquick routines against the new routines from Copymemquicker2.8. So the "Old" results are the results of the CMQ patch installed before Copymemquicker2.8 (or of the original routines if no CMQ patch was installed). If you read the results carefully you will see the "Old" results are much faster than the "New" results. ;)
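
For anyone puzzling over the sign of the percentage: here is a minimal sketch of how the figure in parentheses appears to be derived, assuming it is the relative change of the "New" time against the "Old" time. The function name is made up for illustration, and this is not taken from Testit's source.

```python
def slowdown_percent(old_secs, new_secs):
    """Relative change of the 'New' time versus the 'Old' time, in percent.

    Positive: the 'New' routine took longer (it is slower).
    Negative: the 'New' routine took less time (it is faster).
    """
    return (new_secs - old_secs) / old_secs * 100.0


# Figures quoted in this thread: Old CopyMem 1.46 secs, New CopyMem 1.51 secs.
print(f"{slowdown_percent(1.46, 1.51):+.1f}%")  # +3.4%
```

Read this way, a "+" next to the "New" line means the routine labelled "New" needed more time than the one labelled "Old".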
Title: Re: Copymem Quick & Big Released!
Post by: Oldsmobile_Mike on January 27, 2015, 03:48:32 PM
So the "new" results are with the old version of the patch, and the "old" results are with the new version of the patch?

Sorry, I haven't had my coffee yet today, but that makes no sense to me, either.  ;)

I'd write them something like:

Speed with no patch: x
Speed with old version of patch: y
Speed with new version of patch: z

But maybe that's just me.  ;)
Title: Re: Copymem Quick & Big Released!
Post by: motrucker on January 27, 2015, 05:18:11 PM
Quote from: Oldsmobile_Mike;782432
So the "new" results are with the old version of the patch, and the "old" results are with the new version of the patch?

Sorry, I haven't had my coffee yet today, but that makes no sense to me, either.  ;)

I'd write them something like:

Speed with no patch: x
Speed with old version of patch: y
Speed with new version of patch: z

But maybe that's just me.  ;)

No, it's not just you. And I have had several cups of tea (Irish Breakfast!)
Title: Re: Copymem Quick & Big Released!
Post by: paul1981 on January 27, 2015, 06:01:47 PM
Quote from: motrucker;782437
No, it's not just you. And I have had several cups of tea (Irish Breakfast!)


Should have made them Irish coffees... Then you would have understood those results :)
Title: Re: Copymem Quick & Big Released!
Post by: Oldsmobile_Mike on January 27, 2015, 06:07:26 PM
Quote from: paul1981;782440
Should have made them Irish coffees... Then you would have understood those results :)

:pint::pint:

I think @speedgeek thinks differently than the rest of us.  But as long as he keeps churning out his awesome performance hacks, I'm okay with that!  :D
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on January 28, 2015, 03:14:49 PM
Why is this so hard to understand?

1) Old results = CMQ patch routines (or original routines if no CMQ patch installed)
2) New results = Copymemquicker2.8 routines

You guys must need some coffee, tea and vitamins... :lol:
Title: Re: Copymem Quick & Big Released!
Post by: Oldsmobile_Mike on January 28, 2015, 03:53:07 PM
Quote from: SpeedGeek;782497
Why is this so hard to understand?

1) Old results = CMQ patch routines (or original routines if no CMQ patch installed)
2) New results = Copymemquicker2.8 routines

You guys must need some coffee, tea and vitamins... :lol:

That still doesn't make any sense, man.  Because in your example, you said:

Old CopyMem    :  1.46 secs
New CopyMem    :  1.51 secs (+ 3.4%)       

So the old routine completes in 1.46 seconds.  The new routine completes in 1.51 seconds.  Any way of reading that, the old routine is faster, then.

Oh well, I give up.  ?????  Keep up the good work!  :roflmao:
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on January 28, 2015, 04:23:30 PM
Quote from: Oldsmobile_Mike;782501
That still doesn't make any sense, man.  Because in your example, you said:

Old CopyMem    :  1.46 secs
New CopyMem    :  1.51 secs (+ 3.4%)        

So the old routine completes in 1.46 seconds.  The new routine completes in 1.51 seconds.  Any way of reading that, the old routine is faster, then.

Oh well, I give up.  ?????  Keep up the good work!  :roflmao:

A few thoughts from Mr. Spock...

Logically, for the old routines to be faster than the new routines there are 2 possible choices here:

1) The old routines (from exec) were patched with a CMQ patch which has faster routines than the new routines (from Copymemquicker2.8)

OR

2) The old routines (from exec) are faster than the new routines (from Copymemquicker2.8)

P.S. If the old routines (from exec) are really faster, then why would anyone bother coding or using CMQ patches?
Title: Re: Copymem Quick & Big Released!
Post by: Rabbi on January 28, 2015, 07:14:56 PM
Quote from: paul1981;782440
Should have made them Irish coffees... Then you would have understood those results :)


Well, as everyone knows, Irish coffee provides your 4 basic food nutrient groups:

1) Sugar
2) Fat
3) Caffeine
4) Alcohol
Title: Re: Copymem Quick & Big Released!
Post by: Oldsmobile_Mike on January 28, 2015, 07:33:41 PM
Quote from: SpeedGeek;782503
A few thoughts from Mr. Spock...

Logically, for the old routines to be faster than the new routines there are 2 possible choices here:

1) The old routines (from exec) were patched with a CMQ patch which has faster routines than the new routines (from Copymemquicker2.8)

OR

2) The old routines (from exec) are faster than the new routines (from Copymemquicker2.8)

P.S. If the old routines (from exec) are really faster, then why would anyone bother coding or using CMQ patches?

This still doesn't make sense. Although I think I'm starting to get your logic, it's rather confusing to the layman. Trying to dumb it down for the average person, I'd still write it like this:

Performance of routines without any patch: x
Performance of routines with version of patch you released a month or so back: y
Performance of routines with new version of patch you released this week: z

Logically, that should be a descending sequence of numbers. For example: 20 seconds without any patch, 15 seconds with the old version of the patch, and 10 seconds with the latest new and improved version of the patch. Why would you write a new version of the patch that's slower than the old version of the patch? :D

Would still like to see a speed comparison against this version, BTW! :)

http://aminet.net/package/util/boot/CopyMem

Anyhow, just messing with ya. Keep up the good work! :)
Title: Re: Copymem Quick & Big Released!
Post by: XDelusion on January 28, 2015, 08:21:51 PM
I guess I would not care if it is a tad slower, so long as it is friendlier with resources.
Title: Re: Copymem Quick & Big Released!
Post by: paul1981 on January 28, 2015, 09:24:49 PM
I still don't understand the results table. Never mind though, I can't help being stupid!

Is it quicker than this for 060's? : http://m68k.aminet.net/package/util/boot/NewCMQ060
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on January 29, 2015, 12:57:07 PM
OK, I replaced "Old" and "New" with the names of the patch routines:
(If you still don't get it I'm sorry I can't help you)

Some Testit results for CMQ&B:
Code:
This test will compare the CMQ&B CopyMem/CopyMemQuick routines with
the Copymemquicker2.8 ones you have installed.  A great variety of tests will be
run, and this might take some time, especially if your system has a
slow processor.

Initiating test (please be patient...)

Copying 65536 bytes 282 times (long -> long offset)
CMQ&B CopyMem    :  1.46 secs
Copymemquicker2.8 CopyMem    :  1.51 secs (+ 3.4%)
CMQ&B CopyMemQuick:  1.46 secs
Copymemquicker2.8 CopyMemQuick:  1.48 secs (+ 0.7%)

Copying 65536 bytes 73 times (long -> long+1 offset)
CMQ&B CopyMem    :  0.39 secs
Copymemquicker2.8 CopyMem    :  0.79 secs (+102.6%)

Copying 65536 bytes 206 times (long -> even offset)
CMQ&B CopyMem    :  1.13 secs
Copymemquicker2.8 CopyMem    :  1.56 secs (+38.0%)

Copying 65536 bytes 73 times (long -> even+1 offset)
CMQ&B CopyMem    :  0.39 secs
Copymemquicker2.8 CopyMem    :  0.81 secs (+105.1%)

Copying 65536 bytes 73 times (long+1 -> long offset)
CMQ&B CopyMem    :  0.45 secs
Copymemquicker2.8 CopyMem    :  0.81 secs (+80.0%)

Copying 65536 bytes 191 times (long+1 -> long+1 offset)
CMQ&B CopyMem    :  0.99 secs
Copymemquicker2.8 CopyMem    :  1.03 secs (+ 3.0%)

Copying 65536 bytes 73 times (long+1 -> even offset)
CMQ&B CopyMem    :  0.40 secs
Copymemquicker2.8 CopyMem    :  0.79 secs (+97.5%)

Copying 65536 bytes 250 times (long+1 -> even+1 offset)
CMQ&B CopyMem    :  1.38 secs
Copymemquicker2.8 CopyMem    :  1.48 secs (+ 6.5%)

Copying 65536 bytes 250 times (even -> long offset)
CMQ&B CopyMem    :  1.51 secs
Copymemquicker2.8 CopyMem    :  1.46 secs (- 3.3%)

Copying 65536 bytes 73 times (even -> long+1 offset)
CMQ&B CopyMem    :  0.43 secs
Copymemquicker2.8 CopyMem    :  0.81 secs (+88.4%)

Copying 65536 bytes 191 times (even -> even offset)
CMQ&B CopyMem    :  0.98 secs
Copymemquicker2.8 CopyMem    :  1.01 secs (+ 3.1%)

Copying 65536 bytes 73 times (even -> even+1 offset)
CMQ&B CopyMem    :  0.39 secs
Copymemquicker2.8 CopyMem    :  0.80 secs (+102.6%)

Copying 65536 bytes 73 times (even+1 -> long offset)
CMQ&B CopyMem    :  0.44 secs
Copymemquicker2.8 CopyMem    :  0.78 secs (+75.0%)

Copying 65536 bytes 206 times (even+1 -> long+1 offset)
CMQ&B CopyMem    :  1.25 secs
Copymemquicker2.8 CopyMem    :  1.55 secs (+24.0%)

Copying 65536 bytes 73 times (even+1 -> even offset)
CMQ&B CopyMem    :  0.43 secs
Copymemquicker2.8 CopyMem    :  0.81 secs (+88.4%)

Copying 65536 bytes 282 times (even+1 -> even+1 offset)
CMQ&B CopyMem    :  1.46 secs
Copymemquicker2.8 CopyMem    :  1.53 secs (+ 4.1%)

Copying 1024 bytes 16950 times (long -> long offset)
CMQ&B CopyMem    :  1.38 secs
Copymemquicker2.8 CopyMem    :  1.49 secs (+ 8.0%)
CMQ&B CopyMemQuick:  1.38 secs
Copymemquicker2.8 CopyMemQuick:  1.49 secs (+ 8.0%)

Copying 1024 bytes 4700 times (long -> long+1 offset)
CMQ&B CopyMem    :  0.40 secs
Copymemquicker2.8 CopyMem    :  0.81 secs (+102.5%)

Copying 1024 bytes 12000 times (even -> even offset)
CMQ&B CopyMem    :  1.01 secs
Copymemquicker2.8 CopyMem    :  1.08 secs (+ 5.9%)

Copying 128 bytes 98000 times (long -> long offset)
CMQ&B CopyMem    :  1.08 secs
Copymemquicker2.8 CopyMem    :  1.06 secs (- 0.9%)
CMQ&B CopyMemQuick:  0.98 secs
Copymemquicker2.8 CopyMemQuick:  0.99 secs (+ 1.0%)

Copying 128 bytes 77500 times (even -> even offset)
CMQ&B CopyMem    :  0.89 secs
Copymemquicker2.8 CopyMem    :  0.93 secs (+ 3.4%)

Copying 19 bytes 294000 times (long -> long offset)
CMQ&B CopyMem    :  0.53 secs
Copymemquicker2.8 CopyMem    :  0.68 secs (+28.3%)

Copying 18 bytes 311000 times (long -> long offset)
CMQ&B CopyMem    :  0.54 secs
Copymemquicker2.8 CopyMem    :  0.66 secs (+20.4%)

Copying 17 bytes 331500 times (long -> long offset)
CMQ&B CopyMem    :  0.56 secs
Copymemquicker2.8 CopyMem    :  0.71 secs (+25.0%)

Copying 16 bytes 478000 times (long -> long offset)
CMQ&B CopyMem    :  0.76 secs
Copymemquicker2.8 CopyMem    :  0.91 secs (+18.4%)
CMQ&B CopyMemQuick:  0.61 secs
Copymemquicker2.8 CopyMemQuick:  0.54 secs (- 9.8%)

Copying 8 bytes 530000 times (long -> long offset)
CMQ&B CopyMem    :  0.46 secs
Copymemquicker2.8 CopyMem    :  1.08 secs (+132.6%)
CMQ&B CopyMemQuick:  0.30 secs
Copymemquicker2.8 CopyMemQuick:  0.28 secs (- 3.3%)

Copying 4 bytes 715000 times (long -> long offset)
CMQ&B CopyMem    :  0.26 secs
Copymemquicker2.8 CopyMem    :  0.83 secs (+215.4%)
CMQ&B CopyMemQuick:  0.05 secs
Copymemquicker2.8 CopyMemQuick:  0.23 secs (+360.0%)

Copying 1 bytes 1095000 times (long -> long offset)
CMQ&B CopyMem    :  0.41 secs
Copymemquicker2.8 CopyMem    :  0.51 secs (+24.4%)

Total timing:
-------------
CMQ&B routines    :  26.71 secs
Copymemquicker2.8 routines    :  33.48 secs
Total slowdown    :  25.31 %

Some Testit results for CMQ&B040:
Code:
This test will compare the CMQ&B040 CopyMem/CopyMemQuick routines with
the Copymemquicker2.8 ones you have installed.  A great variety of tests will be
run, and this might take some time, especially if your system has a
slow processor.

Initiating test (please be patient...)

Copying 65536 bytes 565 times (long -> long offset)
CMQ&B040 CopyMem    :  4.08 secs
Copymemquicker2.8 CopyMem    :  5.89 secs (+44.4%)
CMQ&B040 CopyMemQuick:  4.08 secs
Copymemquicker2.8 CopyMemQuick:  5.86 secs (+43.6%)

Copying 65536 bytes 147 times (long -> long+1 offset)
CMQ&B040 CopyMem    :  1.46 secs
Copymemquicker2.8 CopyMem    :  1.61 secs (+10.3%)

Copying 65536 bytes 413 times (long -> even offset)
CMQ&B040 CopyMem    :  4.13 secs
Copymemquicker2.8 CopyMem    :  4.40 secs (+ 6.3%)

Copying 65536 bytes 147 times (long -> even+1 offset)
CMQ&B040 CopyMem    :  1.46 secs
Copymemquicker2.8 CopyMem    :  1.59 secs (+ 8.9%)

Copying 65536 bytes 147 times (long+1 -> long offset)
CMQ&B040 CopyMem    :  1.49 secs
Copymemquicker2.8 CopyMem    :  1.60 secs (+ 6.7%)

Copying 65536 bytes 382 times (long+1 -> long+1 offset)
CMQ&B040 CopyMem    :  2.71 secs
Copymemquicker2.8 CopyMem    :  3.96 secs (+46.1%)

Copying 65536 bytes 147 times (long+1 -> even offset)
CMQ&B040 CopyMem    :  1.48 secs
Copymemquicker2.8 CopyMem    :  1.61 secs (+ 8.8%)

Copying 65536 bytes 501 times (long+1 -> even+1 offset)
CMQ&B040 CopyMem    :  5.06 secs
Copymemquicker2.8 CopyMem    :  5.26 secs (+ 3.9%)

Copying 65536 bytes 501 times (even -> long offset)
CMQ&B040 CopyMem    :  5.16 secs
Copymemquicker2.8 CopyMem    :  5.24 secs (+ 1.5%)

Copying 65536 bytes 147 times (even -> long+1 offset)
CMQ&B040 CopyMem    :  1.53 secs
Copymemquicker2.8 CopyMem    :  1.59 secs (+ 3.9%)

Copying 65536 bytes 382 times (even -> even offset)
CMQ&B040 CopyMem    :  2.71 secs
Copymemquicker2.8 CopyMem    :  3.96 secs (+46.1%)

Copying 65536 bytes 147 times (even -> even+1 offset)
CMQ&B040 CopyMem    :  1.48 secs
Copymemquicker2.8 CopyMem    :  1.59 secs (+ 7.4%)

Copying 65536 bytes 147 times (even+1 -> long offset)
CMQ&B040 CopyMem    :  1.53 secs
Copymemquicker2.8 CopyMem    :  1.58 secs (+ 2.6%)

Copying 65536 bytes 413 times (even+1 -> long+1 offset)
CMQ&B040 CopyMem    :  4.30 secs
Copymemquicker2.8 CopyMem    :  4.41 secs (+ 2.5%)

Copying 65536 bytes 147 times (even+1 -> even offset)
CMQ&B040 CopyMem    :  1.53 secs
Copymemquicker2.8 CopyMem    :  1.59 secs (+ 3.9%)

Copying 65536 bytes 564 times (even+1 -> even+1 offset)
CMQ&B040 CopyMem    :  3.99 secs
Copymemquicker2.8 CopyMem    :  5.85 secs (+46.4%)

Copying 1024 bytes 33900 times (long -> long offset)
CMQ&B040 CopyMem    :  0.29 secs
Copymemquicker2.8 CopyMem    :  0.35 secs (+17.2%)
CMQ&B040 CopyMemQuick:  0.30 secs
Copymemquicker2.8 CopyMemQuick:  0.34 secs (+13.3%)

Copying 1024 bytes 9400 times (long -> long+1 offset)
CMQ&B040 CopyMem    :  0.16 secs
Copymemquicker2.8 CopyMem    :  0.28 secs (+68.7%)

Copying 1024 bytes 24000 times (even -> even offset)
CMQ&B040 CopyMem    :  0.24 secs
Copymemquicker2.8 CopyMem    :  0.25 secs (+ 0.0%)

Copying 128 bytes 196000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.30 secs
Copymemquicker2.8 CopyMem    :  0.38 secs (+26.7%)
CMQ&B040 CopyMemQuick:  0.26 secs
Copymemquicker2.8 CopyMemQuick:  0.35 secs (+30.8%)

Copying 128 bytes 155000 times (even -> even offset)
CMQ&B040 CopyMem    :  0.28 secs
Copymemquicker2.8 CopyMem    :  0.31 secs (+10.7%)

Copying 19 bytes 588000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.39 secs
Copymemquicker2.8 CopyMem    :  0.51 secs (+28.2%)

Copying 18 bytes 622000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.38 secs
Copymemquicker2.8 CopyMem    :  0.41 secs (+ 7.9%)

Copying 17 bytes 663000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.41 secs
Copymemquicker2.8 CopyMem    :  0.55 secs (+31.7%)

Copying 16 bytes 956000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.53 secs
Copymemquicker2.8 CopyMem    :  0.58 secs (+ 7.5%)
CMQ&B040 CopyMemQuick:  0.53 secs
Copymemquicker2.8 CopyMemQuick:  0.51 secs (- 1.9%)

Copying 8 bytes 1060000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.33 secs
Copymemquicker2.8 CopyMem    :  0.58 secs (+75.7%)
CMQ&B040 CopyMemQuick:  0.35 secs
Copymemquicker2.8 CopyMemQuick:  0.46 secs (+31.4%)

Copying 4 bytes 1430000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.23 secs
Copymemquicker2.8 CopyMem    :  0.51 secs (+121.7%)
CMQ&B040 CopyMemQuick:  0.24 secs
Copymemquicker2.8 CopyMemQuick:  0.41 secs (+66.7%)

Copying 1 bytes 2190000 times (long -> long offset)
CMQ&B040 CopyMem    :  0.41 secs
Copymemquicker2.8 CopyMem    :  0.50 secs (+19.5%)

Total timing:
-------------
CMQ&B040 routines    :  53.98 secs
Copymemquicker2.8 routines    :  65.04 secs
Total slowdown    :  20.49 %  

Title: Re: Copymem Quick & Big Released!
Post by: ChaosLord on January 29, 2015, 05:39:32 PM
Sorry I am late to the party.

I didn't read every single test result but the ones I read from the very latest post all make sense to me.

One thing everyone needs to keep in mind is that there are 2 different OS routines for copying memory: CopyMem(), CopyMemQuick()

2 different routines that basically do the same thing.  They both should be patched.
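
A rough sketch of how the two routines differ from the caller's side, assuming the documented exec.library contract that CopyMemQuick() needs longword-aligned source and destination plus a size that is a multiple of 4, while CopyMem() takes anything. The helper names are hypothetical and this is plain Python, not Amiga code.

```python
def can_use_copymemquick(src_addr, dst_addr, size):
    """True when a copy satisfies CopyMemQuick()'s stricter requirements:
    both addresses longword (4-byte) aligned and size a multiple of 4."""
    return src_addr % 4 == 0 and dst_addr % 4 == 0 and size % 4 == 0


def pick_routine(src_addr, dst_addr, size):
    """Name of the routine a caller could legally use for this copy."""
    if can_use_copymemquick(src_addr, dst_addr, size):
        return "CopyMemQuick"
    return "CopyMem"


print(pick_routine(0x1000, 0x2000, 16))  # aligned, size multiple of 4
print(pick_routine(0x1000, 0x2001, 16))  # odd destination address
print(pick_routine(0x1000, 0x2000, 19))  # size not a multiple of 4
```

That would also be consistent with the Testit output above, where CopyMemQuick lines only show up for the long -> long tests with sizes such as 65536, 1024, 128, 16, 8 and 4 bytes.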

p.s. I luv 680x0 asm. :knuddel:
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on July 04, 2020, 01:04:10 PM
** 2ND NEWS UPDATE **

CMQ&B040 1.9 released!

-v1.9 New smart buffer copy code provides a BIG SPEED UP
since the MOVE16 alignment restrictions are well handled!
(See new Testit results on EAB).

P.S. If you still don't get it just look at the Old Copymem results. Old = CMQ&B040 and New = Copymemquicker 2.8!
Title: Re: Copymem Quick & Big Released!
Post by: utri007 on July 05, 2020, 12:31:09 AM
What is the practical difference? What is faster with these, and by how much? Do I get 2fps faster Quake or what?
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on July 05, 2020, 06:33:39 PM
Quote
What is the practical difference? What is faster with these, and by how much? Do I get 2fps faster Quake or what?

The Testit results speak for themselves. What is faster is large block copies which don't meet the alignment requirements of Move16. Please direct your application specific questions (e.g. Quake) to their respective developers.

I suggest you download Testit (included with COPMQR28) and determine your own results. I suspect the reason so many here didn't get it was because they were too lazy to determine their own results.

http://aminet.net/search?query=copmqr28
Title: Re: Copymem Quick & Big Released!
Post by: TribbleSmasher on July 05, 2020, 10:44:37 PM
I am with utri007 on this. What is the point of being faster on 1mio copies of 128 bytes if in reality this situation never happens? Where is the bottleneck you are trying to fix, how did you even find that? Optimizations are probably fun for you, but if it only shows in arbitrary tests - what is the point?

Quote
What is the practical difference? What is faster with these, and by how much? Do I get 2fps faster Quake or what?
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on July 07, 2020, 01:14:29 AM
Quote
I am with utri007 on this. What is the point of being faster on 1mio copies of 128 bytes if in reality this situation never happens? Where is the bottleneck you are trying to fix, how did you even find that? Optimizations are probably fun for you, but if it only shows in arbitrary tests - what is the point?

A few more thoughts from Mr. Spock:

Logically, if 1mio copies of 128 bytes are faster, then 100K copies of 128 bytes should also be faster, and 10K copies and 1K copies, and so on.

The problem is that most lamers don't understand that coding a benchmark program  (which offers even a moderate degree of accuracy) is no easy task. In fact, given the limitations of the Amiga Hardware (timer.device) and the Amiga OS (multitasking and serious interrupt dependencies) and compounded by Software developer tools limitations (SAS Crap compiler) it's truly amazing and very impressive that some Amiga Benchmark programs work as well as they do.

But the one thing all Benchmark coders will appreciate (given the above constraints) is that More Iterations = Better Accuracy.
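
To put a number on that: a minimal sketch, assuming the clock used for timing only advances in fixed steps. The 0.02 second tick below is purely an illustrative value, not the actual timer.device resolution, and the function name is made up.

```python
def per_copy_uncertainty(tick_secs, iterations):
    """Worst-case timing error attributed to ONE copy when a whole run of
    `iterations` copies is timed with a clock that advances in steps of
    `tick_secs`. The total can be off by up to one tick, and that error
    is spread over all iterations."""
    return tick_secs / iterations


TICK = 0.02  # assumed tick length in seconds, for illustration only

for n in (1_000, 100_000, 1_000_000):
    print(f"{n:>9} iterations: +/- {per_copy_uncertainty(TICK, n):.2e} secs per copy")
```

The per-copy error shrinks in proportion to the iteration count, which is why a run of a million tiny copies can still be a sensible way to time a single tiny copy.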

So logically speaking, the 1mio copies of 128 bytes has a practical purpose (even if some lamer still doesn't get it).

The bottleneck fix has already been explained and I don't like repeating myself. Optimizations are often more work than fun, but there is some satisfaction in finally solving the bottleneck problem. Now, if I could only do that and avoid these silly lamer questions...  ::)
Title: Re: Copymem Quick & Big Released!
Post by: kolla on July 07, 2020, 05:28:42 AM
Behind every bottleneck there is a new bottleneck, and the biggest bottleneck sits behind the keyboard, and you will never ever be able to fix that :)
Title: Re: Copymem Quick & Big Released!
Post by: TribbleSmasher on July 07, 2020, 05:43:16 AM
@SpeedGeek My lamer question comes from your own statement
Quote
The main goal is to give the fastest possible results with Testit from COPMQR28
. As this benchmark seems to do thousands of sequential copies (maybe even from/to the same address every time?), one can assume this optimization only really shows in these kinds of cases. Once we take the repetition out of the copy procedure (say, from 1mio down to 10 or 50), the results are probably still faster, but the absolute effect is only marginal, as normal copy routines don't repeat a million times. So the gain is a few ms for every minute of normal operation, say in Workbench or whatever?
If this patch takes one more reboot or a few seconds to install, then I have lost all the speed benefit for, say, the whole week in advance?
It is still interesting, but as long as LightWave or PPaint or Quake or Doom are not getting accelerated, I'll pass on this patch. ;)
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on July 07, 2020, 09:03:23 PM
@SpeedGeek My lamer question comes from your own statement
Quote
The main goal is to give the fastest possible results with Testit from COPMQR28
.

...because this Benchmark tool was used by nearly all previous CopymemQuick patch developers. Now obviously, if you want to compare performance results on an "Apples to Apples" basis you will need the very same Benchmark tool running on the very same system. Now, if you read this thread completely you can see there are several requests for comparison results with these previously developed patches.
     
Quote
As this benchmark seems to do thousands of sequential copies (maybe even from/to the same address every time?), one can assume this optimization only really shows in these kinds of cases. Once we take the repetition out of the copy procedure (say, from 1mio down to 10 or 50), the results are probably still faster, but the absolute effect is only marginal, as normal copy routines don't repeat a million times. So the gain is a few ms for every minute of normal operation, say in Workbench or whatever?
If this patch takes one more reboot or a few seconds to install, then I have lost all the speed benefit for, say, the whole week in advance?

Ok, so then why didn't you develop your own Benchmark tool which provides the specific results (which in your opinion) are more practical and reliable? Apparently, you are just not up to the task.  :(
 
Quote
It is still interesting, but as long as LightWave or PPaint or Quake or Doom are not getting accelerated, I'll pass on this patch. ;)

Really? Some assumptions you easily made but what efforts did you make to provide conclusive proof? Thanks (for the pass), but it's really too bad you didn't realize just how much of your valuable time (and mine) you could have saved by passing on the lamer questions too.  ::)
Title: Re: Copymem Quick & Big Released!
Post by: kolla on July 08, 2020, 07:28:35 AM
People, just let this go - what SpeedGeek is doing, is called "sport" and has little to do with real life.

Compare it with a 100m runner who just found some new shoes to wear that might earn him/her a few ms in the race.

For normal people it is quite acceptable to slowly walk 100m, and to use other means of transport if one is in a hurry, typically for longer distance than 100m.
Title: Re: Copymem Quick & Big Released!
Post by: utri007 on July 08, 2020, 08:46:56 AM
@kolla You are wrong. I'm interested in getting the most out of my Amiga, but only if it makes a practical difference. I would be interested in the practical meaning of this patch, and which apps benefit from it.

Do you happen to know a good guide for that, one that would explain which patches make a difference?
Title: Re: Copymem Quick & Big Released!
Post by: kolla on July 08, 2020, 01:25:32 PM
Quote
@kolla You are wrong.

Really.

Quote
I'm interested in getting the most out of my Amiga, but only if it makes a practical difference.

You here demonstrate that I am correct.

Quote
I would be interested in the practical meaning of this patch, and which apps benefit from it.

Yeah, that would be interesting, huh? Did you not read what SpeedGeek answered to this? He does not care; what he cares about is the sport - the benchmarks. Whether there are programs that benefit depends entirely on the programs and how they are coded. Do we know how Amiga programs are coded? Some yes, most no. So what are the gains? Hard to know; you would have to compare each and every program with and without the patches. Who would do that? People like you, who are so interested in the practical meaning of such patches. Have people who have shown interest in the practical meaning of such patches been able to do so? To a very small degree - instead they whine to the "athletes" to compile such lists, which they are not at all interested in doing. So there you are.

Quote
Do you happen to know a good guide for that, one that would explain which patches make a difference?

The fact that you must ask for this just demonstrates again that I am right.
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on July 11, 2020, 02:17:11 AM
** 3RD NEWS UPDATE **

CMQ&B040 2.0 released!

v2.0 Fixed a seldom occurring but serious bug with internal Smart
buffer usage.
- Nested call large block copies (WHEN MISALIGNED!) could corrupt
each others data when sharing the same buffer. This fix uses a stack
based buffer solution which results in a private buffer for each call.
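
A toy model of the bug described above, written in Python rather than 68k assembly. It assumes a copy that stages data in an intermediate buffer and a nested copy that arrives before the outer one has finished. All names are hypothetical and this is not the patch's actual code.

```python
SHARED_BUFFER = bytearray(16)  # one buffer used by every call (the buggy design)


def copy_via_shared(dst, src, nested_call=None):
    """Stage src in the shared buffer, then write the buffer to dst."""
    n = len(src)
    SHARED_BUFFER[:n] = src
    if nested_call is not None:
        nested_call()              # another copy runs before this one finishes
    dst[:n] = SHARED_BUFFER[:n]


def copy_via_private(dst, src, nested_call=None):
    """Same copy, but each call stages data in its own local buffer."""
    buf = bytearray(src)           # private to this call, like a stack buffer
    if nested_call is not None:
        nested_call()
    dst[:len(src)] = buf


def run(copy):
    """Run an outer copy of AAAA that is interrupted by a copy of BBBB."""
    outer_dst, inner_dst = bytearray(4), bytearray(4)
    copy(outer_dst, b"AAAA", nested_call=lambda: copy(inner_dst, b"BBBB"))
    return bytes(outer_dst), bytes(inner_dst)


print(run(copy_via_shared))   # outer result is overwritten by the nested call
print(run(copy_via_private))  # both copies arrive intact
```

The private-buffer variant costs some stack per call, but no call can see another call's staged data.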
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on February 13, 2021, 02:25:06 PM
** 4TH NEWS UPDATE **

CMQ&B 1.7 released!

v1.7 Updated Big loop code with faster instructions. Increased
Big loop copy size to 112 bytes. Replaced Small loop copy code
with new JMP copy code for <= 108 bytes (See new testit results for 1.7).
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on April 26, 2021, 01:18:38 PM
** 5TH NEWS UPDATE **

CMQ&B040 2.1 released!

v2.1 Many changes
- Fixed a rarely occurring stack size bug when the stack was word
aligned and offset by one word from a 16 byte aligned address.
- Added code to test for the Move16 address bug and safely exit upon
detection
- Added code to restrict Smart buffer copy usage when the
destination address is in Chip RAM.
- Added code to change the default Block size
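
As an illustration of the alignment arithmetic behind the stack size fix: a sketch of how many bytes are lost when a word-aligned address is rounded down to a 16-byte boundary. Which direction the real code aligns, and the exact cause of the bug, are not stated in the release note, so this is one plausible reading only, with made-up helper names.

```python
def align_down(addr, boundary=16):
    """Round addr down to a multiple of boundary (a power of two)."""
    return addr & ~(boundary - 1)


def padding_when_aligning_down(addr, boundary=16):
    """Bytes skipped between addr and the aligned address below it."""
    return addr - align_down(addr, boundary)


# Every word-aligned (even) address inside one 16-byte line.
word_aligned = range(0x1000, 0x1010, 2)
worst = max(padding_when_aligning_down(a) for a in word_aligned)

print(hex(align_down(0x100E)), padding_when_aligning_down(0x100E))
print("worst-case padding:", worst)
```

A stack allocation sized without that worst-case padding would come up short only for the one unlucky alignment, which would fit the "rarely occurring" description.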
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on May 12, 2021, 12:57:41 PM
** 6TH NEWS UPDATE **

CMQ&B040 2.2 released!

v2.2 minor change
- Removed "Move16 Bug" detection code. This was a blunder due to
Ax = Ay meaning the same registers rather than the same addresses.
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on June 07, 2021, 08:28:52 PM
** 7TH NEWS UPDATE **

CMQ&B040 2.3 released!

v2.3 minor change
- Changed address register longword math to word math for the
Smart buffer copy loop. This is a small optimization but we always
want the fastest possible results
Title: Re: Copymem Quick & Big Released!
Post by: SpeedGeek on January 08, 2024, 01:21:31 PM
** 8TH NEWS UPDATE **

CMQ&B040_SAFER2.3 released!

First safer version
- Added code to test for equal source and destination addresses
and avoid using Move16 for this specific case.
- Added code to test specified destination addresses and avoid
using Move16 for those cases.
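
A sketch of the kind of guard these two changes describe, assuming a check that runs before a MOVE16-based copy is chosen. The function name and the excluded range used in the example are hypothetical; the release note does not say which destination addresses are tested. The 16-byte alignment condition reflects the fact that MOVE16 transfers 16-byte aligned lines.

```python
def may_use_move16(src, dst, excluded_dst_ranges=()):
    """Decide whether a direct MOVE16 copy is allowed for this call.

    excluded_dst_ranges: iterable of (start, end) address pairs, end exclusive,
    naming destinations that must never be written with MOVE16 (hypothetical).
    """
    if src == dst:
        return False                  # equal source and destination: avoid MOVE16
    if src % 16 != 0 or dst % 16 != 0:
        return False                  # MOVE16 works on 16-byte aligned lines
    for start, end in excluded_dst_ranges:
        if start <= dst < end:
            return False              # destination is on the exclusion list
    return True


EXCLUDED = [(0x2000, 0x3000)]         # made-up range, for illustration only

print(may_use_move16(0x1000, 0x1000))            # same address
print(may_use_move16(0x1000, 0x4000))            # aligned, not excluded
print(may_use_move16(0x1000, 0x2000, EXCLUDED))  # excluded destination
```

When the guard says no, the copy would simply fall back to an ordinary move loop.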