Amiga.org

Amiga computer related discussion => Amiga Software Issues and Discussion => Topic started by: SpeedGeek on October 05, 2017, 09:34:29 AM

Title: FastCache040+ Released!
Post by: SpeedGeek on October 05, 2017, 09:34:29 AM
FastCache040+ 1.0 ©SpeedGeek 2017
             
INTRODUCTION:
FastCache040+ is a patch to replace the CachePreDMA() and
CachePostDMA() functions of most 68040/060 libraries. While
the old functions are adequate they are far from optimal.
These old functions have up to 3x more code than the new ones
provided with this patch!

Also, the new functions implement a much more efficient method
of managing the Copyback cache for DMA. While every system
will have some CPU performance loss under DMA conditions, the
new functions keep this performance loss to a bare minimum.          
               
FEATURES:
- Replaces CachePreDMA() and CachePostDMA() with smaller
  and more efficient code
- Replaces complex MMU code with simple and fast DTTR code
- Temporarily changes Copyback mode to Write Through for DMA
  (but only when required!)
- Never flushes the ATC!
- Never flushes the DC for Chip RAM DMA!          
- Uses 68040/060 library detection code
- Will not patch itself
- 100% Assembler code
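
A rough sketch of what "simple and fast DTTR code" can mean in practice: on the 68040/68060 a data transparent translation register (DTTR) overrides the cache mode for a whole 16MB block without touching the MMU tables. The C below only models the register value. The bit positions follow the Motorola TTR layout (E in bit 15, S field in bits 14-13, CM in bits 6-5), while the macro and function names are hypothetical and this is not the patch's actual source.

```c
#include <stdint.h>

/* Field layout of a 68040/68060 transparent translation register (TTR). */
#define TTR_ENABLE      0x00008000u  /* E bit: register is active              */
#define TTR_S_USER      0x00000000u  /* S field 00: match user accesses only   */
#define TTR_S_SUPER     0x00002000u  /* S field 01: match supervisor only      */
#define TTR_S_BOTH      0x00004000u  /* S field 1x: ignore user/supervisor     */
#define TTR_CM_WT       0x00000000u  /* CM 00: cacheable, write-through        */
#define TTR_CM_CB       0x00000020u  /* CM 01: cacheable, copyback             */
#define TTR_CM_NOCACHE  0x00000040u  /* CM 10: cache inhibited
                                        (serialized on 040, precise on 060)    */

/* Build a TTR value covering the single 16MB block that contains addr.
 * Hypothetical helper for illustration only. */
static uint32_t ttr_for_block(uint32_t addr, uint32_t s_field, uint32_t cache_mode)
{
    uint32_t base = addr & 0xFF000000u;  /* logical address base, bits 31-24    */
    uint32_t mask = 0x00u << 16;         /* address mask 0: compare all of A31-A24 */
    return base | mask | TTR_ENABLE | s_field | cache_mode;
}
```

With user matching and write-through mode the low word of the result is $8000, which is the same constant that appears in the `ORI.W #$8000,D1` line quoted later in this thread.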

CODE SIZE COMPARISONS:
- FastCache040+ 1.0  (NewFunc 132 bytes)
- 68060.library 46.7 (OldFunc 304 bytes)
- 68040.library 44.2 (OldFunc 414 bytes)  
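
For reference, the ratios behind the size comparison can be worked out from the byte counts listed above. This is plain arithmetic on those three numbers; the helper name is made up for the example.

```c
/* Size ratio in hundredths, so 313 means "3.13x".
 * Hypothetical helper, integer math only. */
static unsigned ratio_x100(unsigned old_bytes, unsigned new_bytes)
{
    return old_bytes * 100u / new_bytes;
}
```

`ratio_x100(414, 132)` gives 313 and `ratio_x100(304, 132)` gives 230, so the old code is roughly 3.1x larger in 68040.library 44.2 and roughly 2.3x larger in 68060.library 46.7.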

REQUIREMENTS:
- Amiga with 68040 or 68060 CPU and MMU
- 68040.library or 68060.library

WARNING:
Do NOT use this patch with GigaMEM, VMM or any similar
virtual memory software! Do NOT use this patch with any
code which uses the MMU to write protect or remap modified
data structures!

NOTES:
Remapping a mirror image of the Kickstart ROM with the MMU
is OK! The new functions still have one thing in common with
the old functions. They do NOT translate virtual addresses
as specified in the Amiga RKRM! For more info on the old
functions see the Enforcer.guide by Michael Sinz.          
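
To illustrate what "translate virtual addresses" refers to: a driver passes an address and a length to CachePreDMA() and gets back the address to give to the DMA hardware, and the length may come back shortened if the memory is not contiguous, in which case the driver calls again for the rest. The sketch below models only that calling pattern with stand-in functions so it can run anywhere. `sketch_pre_dma`, `sketch_dma_transfer`, `SKETCH_DMA_CONTINUE` and the `contiguous_limit` parameter are all hypothetical, and the matching CachePostDMA() call is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DMA_CONTINUE 0x2u  /* stand-in for exec's DMA_Continue flag */

/* Hypothetical stand-in for CachePreDMA(): returns the address to hand to
 * the DMA hardware and may shorten *length when memory is not contiguous.
 * contiguous_limit == 0 models functions that never translate or split,
 * which is how this thread describes both the old and the new code. */
static void *sketch_pre_dma(void *vaddr, uint32_t *length, uint32_t flags,
                            uint32_t contiguous_limit)
{
    (void)flags;
    if (contiguous_limit != 0 && *length > contiguous_limit)
        *length = contiguous_limit;
    return vaddr;  /* identity mapping: virtual == physical */
}

/* Caller-side loop: returns how many DMA chunks were programmed. */
static unsigned sketch_dma_transfer(uint8_t *buf, uint32_t total,
                                    uint32_t contiguous_limit)
{
    unsigned chunks = 0;
    uint32_t flags = 0;

    while (total > 0) {
        uint32_t len = total;
        void *phys = sketch_pre_dma(buf, &len, flags, contiguous_limit);
        (void)phys;                    /* a real driver would start DMA here */
        buf += len;
        total -= len;
        flags |= SKETCH_DMA_CONTINUE;  /* later calls continue the same transfer */
        chunks++;
    }
    return chunks;
}
```

With `contiguous_limit` set to 0 every transfer is a single chunk, which is the identity-mapped case the note above describes.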

HISTORY:
v1.0 - First release

Here is the link:

http://eab.abime.net/showthread.php?p=1189690#post1189690
Title: Re: FastCache040+ Released!
Post by: Oldsmobile_Mike on October 05, 2017, 01:20:47 PM
How do these patches compare to THoR's mmu libraries?
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 05, 2017, 03:31:48 PM
Quote from: Oldsmobile_Mike;831335
How do these patches compare to THoR's mmu libraries?

Well, you probably don't know that he has insisted the old function code was the only way to guarantee reliable DMA transfers... and of course I strongly disagree with his claim. :furious:

Nevertheless, I don't recommend my patches for use with his MMU libraries.
They are not tested for compatibility, so if they do work it's by chance rather than by design.  ;)
Title: Re: FastCache040+ Released!
Post by: Oldsmobile_Mike on October 05, 2017, 10:12:25 PM
Quote from: SpeedGeek;831339
Well, you probably don't know that he has insisted the old function code was the only way to guarantee reliable DMA transfers... and of course I strongly disagree with his claim. :furious:

Well, that should be of no surprise to anyone. :laughing:

Good work on the patch, anyway! One of these years I still need to finish your A2091 speedup hack on my A2091.  :)
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 06, 2017, 03:45:25 AM
** NEWS UPDATE **

Sorry, there was a bug in v1.0 with the patch install code. :angry:

v1.1 - Fixed a bug which prevented the patch from installing
     - Added code to use OldCachePreDMA for MEMF_24BIT
       transfers (I don't know why errors occurred here)
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 06, 2017, 02:13:31 PM
** 2ND NEWS UPDATE **

v1.2 released (updated patch size info)  
 - Added code to use OldCachePostDMA for MEMF_24BIT
transfers (So MMU Pages can be restored to original)

EDIT:
OK, I believe I have found a solution to the MEMF_24BIT transfer
error problem without OldPre/OldPost calls. Unfortunately, the cache mode would have to be changed to NoCache.

This would make the NewFunc code a little smaller but could reduce CPU performance a little for MEMF_24BIT transfers.

So it's a trade off situation... will give it some more thought! :biglaugh:
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 10, 2017, 09:14:05 PM
** 3RD NEWS UPDATE **

v1.3 Released!
- Added code to change MEMF_24BIT transfers to NoCache.
This eliminated all OldFunc calls. MEMF_24BIT transfers may have
some CPU performance loss but the NewFunc code performance
benefits should still justify this.

NOTES: v1.2 will still be available for download for users if they
believe using OldFunc calls is still justified. The v1.2 NewFuncSrc
for lbC00004E should read as follows:
CINVA    NC        ;Support 060, 040 not sure?

EDIT:
v1.4 Released!
- Removed MEMF_24BIT code from PreDMA/PostDMA for the
case of 16 byte aligned transfers. This will allow
some MEMF_24BIT transfers to be cache enabled!
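
A minimal sketch of the 16 byte alignment test, assuming "16 byte aligned" means that both the start address and the length are multiples of the 68040/68060 cache line size, so the DMA buffer shares no cache line with unrelated data. It also assumes MEMF_24BIT memory can be recognised by an address below 16MB. Both assumptions and all names are illustrative, not taken from the patch source.

```c
#include <stdbool.h>
#include <stdint.h>

#define CACHE_LINE 16u  /* 68040/68060 cache line size in bytes */

/* True when the buffer starts and ends on cache line boundaries. */
static bool is_line_aligned(uint32_t addr, uint32_t len)
{
    return ((addr | len) & (CACHE_LINE - 1u)) == 0;
}

/* Assumption: 24-bit DMA memory lives in the first 16MB of address space. */
static bool in_24bit_space(uint32_t addr)
{
    return addr < 0x01000000u;
}

/* One reading of the v1.3/v1.4 notes: only unaligned transfers in 24-bit
 * space still need the NoCache treatment. */
static bool needs_nocache(uint32_t addr, uint32_t len)
{
    return in_24bit_space(addr) && !is_line_aligned(addr, len);
}
```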

EDIT2:
The v1.4 NewFuncSrc for lbC000080 should read as follows:
ORI.W   #$8000,D1    ;Cache WT mode + User FC
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 14, 2017, 01:43:26 PM
Ok guys, now it's your turn to post your compatibility results!

Please provide information on 68040.library or 68060.library vendor and version. Accelerator card type and vendor are requested too. Thank you! :)
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 15, 2017, 05:58:20 PM
** 4TH NEWS UPDATE **

There was another stupid version bug in v1.4 which has now been fixed (It was just a fully functional v1.4 reporting itself as v1.3).

I now have a simple benchmark tool called "CacheDMAmips" (see attached  image). I will probably release it when I am satisfied with the  compatibility results. ;)

EDIT: CacheDMAmips was removed for  providing bogus results. Obviously, programs compiled on an old "Pile  of Crap" C compiler and using v34 timer.device functions are not so  reliable. Mips benchmark results are generally bogus anyway! Thus a new  improved benchmark tool is called for! :biglaugh:
Title: Re: FastCache040+ Released!
Post by: matt3k on October 15, 2017, 10:57:15 PM
Hey Speed,

Been running 1.4 for a few days.  No issues so far.  System feels faster, not sure if it's reality. :)  Thanks for the great work.

System used for test:
Amiga 3000
Phase 5 Cyberstorm PPC
68060 version: 46.15

I have the command in my user-startup behind some other performance programs:
My CPU 060 Best
MemTrailer 96
MinStack 70000
CopyMem060
UtilPatch060
FastCache040+
Title: Re: FastCache040+ Released!
Post by: kolla on October 16, 2017, 12:41:20 PM
Quote from: matt3k;831781

My CPU 060 Best


Whaaaat? :confused:
Title: Re: FastCache040+ Released!
Post by: matt3k on October 16, 2017, 01:01:54 PM
Quote from: kolla;831798
Whaaaat? :confused:


http://aminet.net/package/util/misc/MyCpu060
Title: Re: FastCache040+ Released!
Post by: guest11527 on October 16, 2017, 03:11:31 PM
Quote from: matt3k;831781
Hey Speed,

Been running 1.4 for a few days.  No issues so far.

No wonder, there's nothing in your system that calls CachePre/PostDMA(), even though it should. Thus, a rather pointless exercise on your side.

There are reasons for these functions, and what this patch essentially does is that it disables or bypasses one of the functionalities the functions should have.

Their API is certainly not very wisely designed, though that is not a reason to break them...
Title: Re: FastCache040+ Released!
Post by: kolla on October 16, 2017, 03:47:44 PM
Quote from: matt3k;831801
http://aminet.net/package/util/misc/MyCpu060


Oh, so you run "MyCpu060 BEST", and not "My CPU 060 Best" :laughing:
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 17, 2017, 12:28:58 AM
Quote from: matt3k;831781
Hey Speed,

Been running 1.4 for a few days.  No issues so far.  System feels faster, not sure if it's reality. :)  Thanks for the great work.

System used for test:
Amiga 3000
Phase 5 Cyberstorm PPC
68060 version: 46.15

I have the command in my user-startup behind some other performance programs:
My CPU 060 Best
MemTrailer 96
MinStack 70000
CopyMem060
UtilPatch060
FastCache040+

Thanks for the info Matt! :)

Unfortunately, your system is very similar to my system (A3000, A3660, 68060.library 46.7). So hopefully, some users with different systems will post their results too.
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 17, 2017, 12:34:52 AM
Quote from: Thomas Richter;831804
No wonder, there's nothing in your system that calls CachePre/PostDMA(), even though it should. Thus, a rather pointless exercise on your side.

There are reasons for these functions, and what this patch essentially does is that it disables or bypasses one of the functionalities the functions should have.

Their API is certainly not very wisely designed, though that is not a reason to break them...

The A3000 scsi.device uses these functions, and just in case you didn't know Commodore was testing the 040 CPU prototype card (which eventually was replaced by the A3640) on the A3000 long before the A4000 was released!

http://www.bigbookofamigahardware.com/bboah/product.aspx?id=221
Title: Re: FastCache040+ Released!
Post by: matt3k on October 17, 2017, 01:14:13 AM
Lol I didn't notice that.  Funny.  Yes using mycpu060.  Love auto spell correct.
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 23, 2017, 04:39:06 PM
Ok, here are images of the new improved benchmark tool. Sadly, only 1 user has provided compatibility results so far? :rolleyes:
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 28, 2017, 04:42:41 PM
** 5TH NEWS UPDATE **

The new benchmark tool has now been released! The lamers who failed to  provide compatibility feedback owe a BIG THANKS to the users who did. A  very special Thanks to thebajaguy for providing feedback on multiple systems! :)

BTW, these benchmark results were easily predictable. It's a No-Brainer!
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on March 30, 2018, 02:45:03 PM
** 6TH NEWS UPDATE **

v1.5 - Found an occasional Recoverable Alert bug which could
possibly result in a crash but only on 060 systems!
The simple fix was to move "CINVA NC" in PostDMA to the
end of the code.
- Removed the "+" character from the executable name due
to an unknown "Feature" of the Amiga Shell causing script
execution and version command problems.

EDIT: [CPU060 NOWRITEBUFFER] with the Phase5 46.7 68060.library seems to  be a more reliable solution than the v1.5 update. Some more testing is  required.
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on April 01, 2018, 05:50:24 PM
** 7TH NEWS UPDATE **

v1.6 - Added code to PostDMA to Flush the cache conditionally
       (if the Store buffer and cache are enabled). Added NOPs
       to sync the pipelines before RTE (CINVA is now obsolete)

UPDATE:
68040 users can use v1.4 or v1.5 if they like since they will
be a little faster than v1.6 but 68060 users should use v1.6!
68060 users will now have a performance trade off to consider
in deciding whether to enable the store buffer.
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on April 04, 2018, 02:34:55 PM
** 8TH NEWS UPDATE **

v1.6P5 Removed code to allow PostDMA cache Flush for the case of      
       16 byte aligned transfers. Added code to skip PostDMA
       cache Flush for the case of cache disabled MEMF_24BIT
       transfers.

UPDATE:
v1.6P5 is my last attempt to solve compatibility problems with
the Phase5 68060.library and Store buffer enabled. This
library is unstable and buggy WITH or WITHOUT FastCache040+
so either disable the Store buffer or expect the problems to
continue with only a MINIMAL improvement provided by this
patch!
       
v1.7 - Removed all v1.6P5 PostDMA cache flush code so most users
       (except Phase5 68060.library users) can run at full speed!

UPDATE:
Phase5 68060.library users should use v1.6P5. All other users
can (probably) use v1.4, v1.5 or v1.7 without any problems.
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on April 21, 2018, 01:09:45 PM
** 9TH NEWS UPDATE **

FastCache040+ v1.6P5 has been removed. Phase5 68060.library users should use FixMapP5 before using this patch.

FixMapP5 1.2 ©SpeedGeek 2018 (MMU Handler ©Michael Sinz 2001)
             
INTRODUCTION:
FixMapP5 is a tool to modify some of the default MMU mapping of
the Phase5 68040 and 68060 libraries. This can improve stability
and prevent crashing under the following condition:          

- Hardware or software interrupts which occur during a Chip RAM access by the 68060 (In particular when Store buffer is enabled).

Software bugs which allow illegal writes to the $F80000 Standard Kickstart ROM can cause a debugging problem in Copyback mode so this patch corrects that problem as well.

FEATURES:
- Changes Chip RAM mode to Precise (68060 only)
- Changes Standard ROM cache to Writethrough (68040 or 68060)
- Uses 68040/060 library detection code
- 100% Assembler code

REQUIREMENTS:
- Amiga with 68040 or 68060 CPU and MMU
- Phase5 68040.library or 68060.library

WARNING:
This tool was developed ONLY for use with the Phase5 libraries but
it does NOT actually verify such usage. So it can and probably
will mess up the mapping of ANY other libraries!        

CREDITS:
Thanks to Michael Sinz for his freely distributable MMU handler.

HISTORY:
v1.0 - First release
v1.1 - Added code to skip mapping $F00000 space (which includes $F80000 space) for CyberstormPPC, CyberstormMK3 and BlizzardPPC
v1.2 - Replaced FindName() with FindResident() since v1.1 wasn't working at all. Also, fixed a typo on module names.
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on May 19, 2018, 07:30:09 PM
** 10TH NEWS UPDATE **

v1.8 Released!
- Reworked the code to eliminate a serious (but seldom noticed) data  transfer corruption bug for the case of multiple DMA drivers in the same  system. Special Thanks to Ralph Babel for his excellent knowledge on  this topic.
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on May 21, 2018, 04:59:07 PM
** 11TH NEWS UPDATE **

v1.9 Released!
- Fixed "D2 Register Not Preserved" coding bug in PreDMA.
Most DMA drivers don't seem to need it preserved but
Thanks to Cosmos for reporting it anyway. Moved PostDMA
Nest count code to user section of code. This eliminates
any calls to Supervisor when the count is more than 1.
v1.9BR Added new "Experimental" code which should allow only
DMA targeted 16MB blocks of Fast RAM to change to Write
Through mode. This "In Theory" allows the other 16MB
 blocks to remain in Copyback mode. This can only benefit
 "Big RAM" systems with 32MB+ of Fast RAM and ONLY when
 these systems run apps which use the extra Fast RAM.
 WARNING: Use at your own risk!
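
The nest count mentioned for v1.9 can be pictured roughly as follows: the first PreDMA call switches the cache mode, further calls from other DMA drivers only bump a counter, and only the PostDMA call that brings the counter back to zero restores Copyback. This is a plain C model with hypothetical names. It is not the patch's code, and a real implementation would have to make the counter updates safe against interrupts and task switches.

```c
#include <stdbool.h>

typedef struct {
    unsigned nest;           /* DMA transfers currently in flight          */
    bool     write_through;  /* true while the cache mode is switched      */
    unsigned mode_switches;  /* counts the expensive supervisor-level work */
} DmaCacheState;

/* PreDMA side: only the first caller changes the cache mode. */
static void sketch_pre(DmaCacheState *s)
{
    if (s->nest++ == 0) {
        s->write_through = true;
        s->mode_switches++;
    }
}

/* PostDMA side: only the last caller restores Copyback. */
static void sketch_post(DmaCacheState *s)
{
    if (s->nest == 0)
        return;              /* unmatched call: nothing to undo */
    if (--s->nest == 0) {
        s->write_through = false;
        s->mode_switches++;
    }
}
```

With two overlapping transfers the mode is switched twice in total instead of four times, which matches the idea of avoiding Supervisor calls while the count is more than 1.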

CACHEDMABENCH:
v1.0 - First release
v1.1 - Fixed address and size bugs in FC loop code which
could have affected the results.
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 21, 2018, 05:33:12 PM
** 12TH NEWS UPDATE **

FixMapP5 1.3 released

v1.3 - Swapped order of 68040/060 library test. Some OS 3.1
systems use a dummy 68040.library (which does not expunge)
and this prevented the Chip RAM change to precise. Thanks to
Northway for reporting this bug.
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 23, 2018, 02:21:48 PM
** 13TH NEWS UPDATE **

FixMapP5 1.4 released

v1.4 - Added code to determine the Chip RAM start address from the
       system memory list. Hopefully, this solves the problem with
       Kickstart versions which config the Chip RAM differently. 
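
As an illustration of reading the Chip RAM start from the system memory list, the sketch below walks a linked list of memory headers and picks the lowest bound among entries flagged as Chip RAM. `SketchMemHeader` is a simplified stand-in for exec's struct MemHeader (the fields mirror mh_Attributes, mh_Lower and mh_Upper), and the function name is hypothetical. On a real system the list would be SysBase->MemList, it should be walked under Forbid(), and mh_Lower may sit somewhat above the physical start of the memory.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MEMF_CHIP 0x2u  /* same bit value as exec's MEMF_CHIP */

/* Simplified stand-in for exec's struct MemHeader (only the fields used). */
typedef struct SketchMemHeader {
    struct SketchMemHeader *next;
    uint16_t attributes;  /* like mh_Attributes */
    uint32_t lower;       /* like mh_Lower      */
    uint32_t upper;       /* like mh_Upper      */
} SketchMemHeader;

/* Return the lowest lower bound among Chip RAM headers,
 * or UINT32_MAX if the list contains no Chip RAM entry. */
static uint32_t chip_ram_lower(const SketchMemHeader *list)
{
    uint32_t lowest = UINT32_MAX;

    for (const SketchMemHeader *mh = list; mh != NULL; mh = mh->next) {
        if ((mh->attributes & SKETCH_MEMF_CHIP) && mh->lower < lowest)
            lowest = mh->lower;
    }
    return lowest;
}
```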
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on November 25, 2018, 04:52:13 PM
** 14TH NEWS UPDATE **

FastCache040+ 2.0 released.

2.0 - Added code to enable only one DTTR when the Nest count
is one. Most systems have only one DMA driver and only need to
have 16MB of address space managed for this case.
Removed 1.9BR version which was over-rated due to most DMA
drivers operating at higher priority than typical user tasks.
Title: Re: FastCache040+ Released!
Post by: Pyromania on November 27, 2018, 03:18:06 AM
@SpeedGeek

Thanx for the release and the hard work.
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on December 20, 2018, 07:22:12 PM
** 15TH NEWS UPDATE **

FastCache040+ 2.1 released

v2.1 - Reworked the code to fix a problem with Snoopy 2.0 (Aminet).
Sorry, this version no longer supports 16 byte aligned cache enabled
MEMF_24BIT transfers. NOTE: The original P5 library functions have
problems with Snoopy too. I suppose FastCache040+ 2.0 should remain
available for the non-snoopers.

@Pyromania
Positive comments are always welcome!  :)
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on December 24, 2018, 01:49:49 PM
** 16TH NEWS UPDATE **

FastCache040+ 2.2 released

v2.2 - The Snoopy fix broke MEMF_24BIT transfers. So another
bug fix was required. Let's hope it's the last.
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on December 18, 2019, 01:34:18 AM
** 17TH NEWS UPDATE **

FastCache040+ 2.0 is no longer available and 2.2 is now the recommended version for all users. 2.2 is a little slower than 2.0 but it is also much more stable than 2.0. I was able to use 2.2 with the P5 68060.library (without FixMapP5) and there was only an occasional "Recoverable Alert" but never any hard crashing on my A3000/A3660 system.

THOR reported finding no instability problems at all with the P5 68060.library on his A2000/2060 system so there appear to be some hardware issues complicating the stability problem too. Hence, FixMapP5 is now optional and its usage should be based on the user's determination of improved stability.
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on March 07, 2020, 05:22:57 PM
** 18TH NEWS UPDATE **

FastCache040+ 2.4 released.

2.3 - The 16 byte alignment code is back and now avoids the
change of cache mode for this specific case. Removed
Continue case from PreDMA since the expected results are
the same as the Non-Continue case. The cache disable test
code was removed to save the overhead of this very
uncommon case.

2.4 - Reworked PostDMA code to fix Nested call cache flush bugs.
We really don't want to forget about systems with multiple
DMA drivers do we?
Title: Re: FastCache040+ Released!
Post by: klx300r on March 08, 2020, 06:56:09 AM
thanks for the update :)
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on October 26, 2022, 05:17:32 PM
** 19TH NEWS UPDATE **

FastCache040+ 2.5 released!

v2.5 - Fixed another rare but possible bug with DMA transfers
crossing the 16MB boundary of the DTTR! So now (except for
MEMF_24BIT DMA transfers) both DTTRs are enabled to
manage the full 32MB of address space.
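
A small sketch of the boundary case described for v2.5, assuming each DTTR is set up to match one 16MB block (address bits A31-A24). A transfer that starts near the end of one block can run into the next one, so a single DTTR no longer covers it. The helper names are hypothetical, and the check assumes a non-zero length that does not wrap around the 32-bit address space.

```c
#include <stdbool.h>
#include <stdint.h>

#define TTR_BLOCK_SHIFT 24u  /* 16MB blocks: compare address bits 31-24 */

/* Index of the 16MB block that contains addr. */
static uint32_t block_of(uint32_t addr)
{
    return addr >> TTR_BLOCK_SHIFT;
}

/* True when [addr, addr + len) touches more than one 16MB block.
 * len must be greater than 0. */
static bool crosses_16mb(uint32_t addr, uint32_t len)
{
    uint32_t last = addr + (len - 1u);  /* last byte of the transfer */
    return block_of(addr) != block_of(last);
}
```

For example, a 512 byte transfer starting at $08FFFF00 ends in the block starting at $09000000, so it needs the second DTTR as well.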
Title: Re: FastCache040+ Released!
Post by: SpeedGeek on March 01, 2024, 01:30:17 PM
** 20TH NEWS UPDATE **

FastCache040+ 2.6 released!

v2.6 - The previous bug fix only worked for addresses in the 16MB
range. This fix should now work with any address.