It's called "Use the OS-provided function".
So you really don't have a patch that small. If the "OS-provided function" were fast enough, then why would anyone bother to make a patch in the first place?
No, it's not. Again, whether bursting works over Zorro or not is a matter of luck. On my A2000, MOVE16 is *slow* when I move into the graphics card RAM of the GVP Spectrum. That memory is mapped non-cacheable (!), imprecise, non-serial. Thus the CPU may reorder accesses and does not need to expect bus errors, but may not cache them, and yet, surprisingly, MOVE16 is slower than four moves. I already said why that is: bursts over Zorro are a no-no, and the hardware may have to run in circles to get the data over the bus. We tested all this back then for P96, as it was suggested that MOVE16 might improve some blitter emulation cases. It does not. Worse, it may break things. Simply don't try that, it's a bad idea.
Other than that, have you measured what speed benefit this program actually gives? I mean, in a realistic use case? If so, I would be interested to learn about your results. Which programs did you run, what did they do, and how did you measure?
The Zorro2 bus does NOT support Burst, and so again, as with Chip RAM, Burst is a non-issue. Move16 does NOT need Burst to obtain a performance benefit. While it's certainly true that Burst-capable memory can improve Move16 performance, it does so only to the same extent that Burst would improve the performance of MoveL and any other instruction.
Move16 gets its main performance benefit because it interacts differently with the data cache than MoveL does. This means Move16 is not affected by the worst-case performance problem when the Copyback cache is enabled. It also means it can't benefit from the best-case performance the way MoveL can.
I have already posted a Testit result on EAB indicating a 44% speed increase with Move16. As I said previously, it's the SIZE of the copy which determines whether or not Move16 offers any performance benefit.
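To make the size argument concrete, here is a rough sketch in portable C of how a copy routine might pick its path. It is only an illustration under stated assumptions: the names `copy_mem`, `choose_copy_path` and `LINE_COPY_MIN_BYTES` are hypothetical, the 4096-byte cut-over is a placeholder that would have to come from measurement on the actual machine, and the 16-byte `memcpy` merely stands in for a MOVE16 line transfer, which needs both addresses on a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-over size: a placeholder, not a measured value. */
#define LINE_COPY_MIN_BYTES 4096u

typedef enum { COPY_PATH_LONGWORD, COPY_PATH_LINE } copy_path;

/* Use the line path only for large copies whose source and destination
   both sit on a 16-byte boundary. */
static copy_path choose_copy_path(const void *dst, const void *src, size_t len)
{
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (len < LINE_COPY_MIN_BYTES)
        return COPY_PATH_LONGWORD;
    if (((d | s) & 15u) != 0)
        return COPY_PATH_LONGWORD;
    return COPY_PATH_LINE;
}

/* Copy len bytes and report which path was taken.
   Simplification: a misaligned large copy falls back to the longword path
   instead of aligning its head first. */
copy_path copy_mem(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t step;
    copy_path path;

    assert(dst != NULL && src != NULL);
    path = choose_copy_path(dst, src, len);
    step = (path == COPY_PATH_LINE) ? 16u : 4u;

    /* Whole blocks: 16 bytes stand in for one MOVE16, 4 bytes for one MOVE.L. */
    while (len >= step) {
        memcpy(d, s, step);
        d += step;
        s += step;
        len -= step;
    }
    /* Remaining tail bytes. */
    if (len > 0)
        memcpy(d, s, len);
    return path;
}
```

With these placeholder numbers, a 64-byte copy stays on the longword path, while an aligned 8 KB copy takes the line path. Where the cut-over really belongs depends on the cache mode and the memory being written to.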
Move16 should not cause any problems with the MMU reordering a write to any Zorro2 or Chip RAM since the write cycle will be completed as 4 separate longword writes. But if you want to play it safe you can always fix the MMU config.
What's really surprising here is how people can continue to read the 040 and 060 documentation and ignore the very obvious:
5.4.6 Transfer Burst Inhibit (TBI)
This input signal indicates to the processor that the accessed device cannot support burst mode accesses and that the requested line transfer should be divided into individual longword transfers. Asserting TBI with TA terminates the first data transfer of a line access, which causes the processor to terminate the burst and access the remaining data for the line as three successive long-word transfers. During alternate bus master accesses, the M68040 samples the TBI to detect completion of each bus transfer.
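As a back-of-the-envelope illustration of that paragraph, the following C sketch just tallies bus transfers for a run of line accesses. It is a toy model, not a description of real bus timing: `tally_line_accesses` and `bus_tally` are made-up names, and the only behaviour modelled is the one quoted above, namely that with TBI asserted a 16-byte line is completed as one longword transfer plus three more.

```c
#include <assert.h>
#include <stddef.h>

enum { LINE_BYTES = 16, LONGWORD_BYTES = 4 };

typedef struct {
    unsigned burst_transfers;    /* lines completed as a single burst */
    unsigned longword_transfers; /* individual longword transfers */
} bus_tally;

/* Count the transfers needed for `lines` line accesses to one device.
   tbi_asserted is nonzero when the device inhibits bursts. */
bus_tally tally_line_accesses(unsigned lines, int tbi_asserted)
{
    bus_tally t = { 0u, 0u };
    unsigned i;

    for (i = 0; i < lines; i++) {
        if (tbi_asserted) {
            /* First transfer ends the burst, three more longwords follow. */
            t.longword_transfers += 1u + 3u;
        } else {
            t.burst_transfers += 1u;
        }
    }
    return t;
}

/* Bytes moved are the same in both cases; only the transfer pattern differs. */
size_t tally_bytes(bus_tally t)
{
    size_t bytes = (size_t)t.burst_transfers * LINE_BYTES
                 + (size_t)t.longword_transfers * LONGWORD_BYTES;

    assert(bytes % LONGWORD_BYTES == 0);
    return bytes;
}
```

In this model, 256 line accesses to a device that asserts TBI come out as 1024 longword transfers and no bursts, moving the same 4096 bytes as 256 burst line transfers would. How long each of those transfers takes on a given board is a separate question that the model does not address.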