Topic: CopyMem Quick & Small released!

Thorham · « **on:** January 02, 2015, 07:22:48 PM »

Quote from: Thomas Richter;780921

I would rather say that an application that critically depends on memory copies implements the copy itself, without going through the OS, as there are many other factors only the calling program can know. For example, a "move" moves into and out of the cache. A MOVE16 does not. Is that good or bad? MOVE16 doesn't "pollute" the cache. move "already fills the cache with the target data". Whether that is something you want or do not want cannot be distinguished by CopyMemQuick(). It is something only the calling program can know - and hence, only the calling program can select the optimal strategy. CopyMemQuick() is the "Ford Escort" you may select if it is "fast enough", so it's usually not worth the trouble patching into this call, even more so as it is rarely used.

Remember our OS blitting routine argument? You just stated the reason for writing one's own blit routine: A one size fits all routine isn't always the best solution.

Thorham · « **Reply #1 on:** January 03, 2015, 07:03:08 AM »

Quote from: Thomas Richter;780935

For the blitter, however, you get a substantial disadvantage from not using the OS: if you try to implement a graphics primitive, it might simply not work on an RTG system if you don't use the OS. Is it worth not using the OS? Typically not, because you "shoot yourself in the foot".

Whether it's worth it or not depends entirely on one's requirements. The OS blit routine is generic, and therefore unsuitable for fast, non-generic blits, even when the nature of the blits is very simple (you can see this clearly when you look at the function call). The right solution for getting both maximum performance on native screen modes and GFX card compatibility is to simply implement both methods.

Thorham · « **Reply #2 on:** January 03, 2015, 02:52:15 PM »

Quote from: Thomas Richter;780956

And how do you know that you have a gfx card in the system?

1. Ask the user.
2. Use an icon tool type and ship two icons.
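
A minimal sketch of the tool type idea, written as portable C so it can be tried anywhere. Everything named here is an assumption for illustration: the tool type value `NATIVE`, the `blit_fn` signature and both blit functions are hypothetical stand-ins. On a real Amiga the value would be read from the program's icon rather than passed in as a string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature shared by both blit paths. */
typedef void (*blit_fn)(const unsigned char *src, unsigned char *dst, size_t len);

/* Stand-in for a hand-written routine that targets native chipset screens. */
void blit_native(const unsigned char *src, unsigned char *dst, size_t len)
{
    memcpy(dst, src, len);
}

/* Stand-in for the path that goes through the OS so RTG boards keep working. */
void blit_os(const unsigned char *src, unsigned char *dst, size_t len)
{
    memcpy(dst, src, len);
}

/* Pick a path from a tool type value (hypothetical value "NATIVE").
   Anything missing or unknown falls back to the OS path, the safe default. */
blit_fn select_blit(const char *gfxmode)
{
    if (gfxmode != NULL && strcmp(gfxmode, "NATIVE") == 0)
        return blit_native;
    return blit_os;
}
```

The selection happens once at startup, so the rest of the program just calls through the returned pointer and the choice costs nothing per blit.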

Quote from: Thomas Richter;780956

And is it worth implementing both methods?

Depends on the software. How much code are we talking about anyway? Two, maybe three KB extra? Hardly a waste if it means more users can enjoy the software.

Quote from: Thomas Richter;780956

For rendering graphics, it *usually* doesn't matter - not worth spending bytes on this decision.

It matters for the case I'm talking about:

2x 16 pixel wide background tile.
2x 16 pixel wide sprite mask.
2x 16 pixel wide sprite data.
2x 16 pixel wide second sprite mask.
2x 16 pixel wide second sprite data.
2x 16 pixel wide status gfx mask.
2x 16 pixel wide status gfx data.

That's seven sources. With a handwritten routine you can do this:

Code:

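    move.l  (a0)+,d0    ; d0 = background tile
    and.l   (a1)+,d0    ; cut a hole with the first sprite mask
    or.l    (a1)+,d0    ; merge the first sprite data
    and.l   (a2)+,d0    ; cut a hole with the second sprite mask
    or.l    (a2)+,d0    ; merge the second sprite data
    and.l   (a3)+,d0    ; cut a hole with the status gfx mask
    or.l    (a3)+,d0    ; merge the status gfx data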

Do that twice, transpose, write to chip memory. After that you can unroll to use the pipeline on 68020+ and get the transposes almost for free. I don't see how that's going to be anywhere near as fast with the OS blit function, so in this case it's crystal clear that it matters, because it lowers the CPU requirements.
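
For readers who don't read 68k assembly, here is a rough C model of the and/or chain above. It only models the compositing of one longword; the transpose and the chip memory write are left out. The `layer_t` type and the `compose` name are made up for this sketch, and it assumes each mask longword is followed by its data longword, which is what the two `(aN)+` reads per register suggest.

```c
#include <stdint.h>

/* One masked layer. 'mask' has 1 bits where the layer is transparent
   (the background shows through); 'data' holds the layer's pixels and
   is expected to be 0 wherever the mask is 1. */
typedef struct {
    uint32_t mask;
    uint32_t data;
} layer_t;

/* Combine one background longword with three masked layers:
   first sprite, second sprite, status graphics. */
uint32_t compose(uint32_t background, const layer_t layers[3])
{
    uint32_t d = background;      /* move.l (a0)+,d0 */
    for (int i = 0; i < 3; i++) {
        d &= layers[i].mask;      /* and.l  (aN)+,d0 */
        d |= layers[i].data;      /* or.l   (aN)+,d0 */
    }
    return d;
}
```

The value stays in one register through all seven sources and is written out once. A generic blit call would more likely handle one layer at a time and touch the destination after each.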

Quote from: Thomas Richter;780956

Yes, as usual, it depends on the situation, but I currently cannot come up with a situation where I would need to blit something and not use the OS.

I have another example. I wrote a simple real-time memory viewer. It opens a single-bitplane 640x512 screen and blits 8x8 chars to the screen with my own code (which contains some optimized transposes from kalms' c2p routines). Very fast, and I highly doubt the OS can match that speed. It's important that such a program is fast, because you're also running the program you're working on.
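
To give an idea of why this case is so cheap, here is a sketch in C of an 8x8 character write into a single-bitplane 640x512 bitmap. This is not the viewer's actual code (that is assembly and includes the transposes mentioned above); `draw_glyph` and the font layout of 8 bytes per glyph, top row first, are assumptions for illustration.

```c
#include <stdint.h>
#include <stddef.h>

enum { SCREEN_W = 640, SCREEN_H = 512, BYTES_PER_ROW = SCREEN_W / 8 };

/* Draw one 8x8 glyph at character cell (col, row) of a 1-bit-deep bitmap.
   With 8-pixel-wide cells every glyph row is exactly one byte on a byte
   boundary, so no shifting or masking is needed: eight byte stores. */
void draw_glyph(uint8_t *plane, const uint8_t glyph[8], int col, int row)
{
    uint8_t *dst = plane + (size_t)row * 8 * BYTES_PER_ROW + col;
    for (int y = 0; y < 8; y++) {
        *dst = glyph[y];          /* one scanline of the character */
        dst += BYTES_PER_ROW;     /* step down to the next scanline */
    }
}
```

A generic text or blit call has to cope with arbitrary positions, depths and draw modes, none of which apply here.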

Quote from: Thomas Richter;780956

For example, we had the problem in P96 of copying memory around quickly (from the board to an off-board buffer, and the reverse), but CopyMemQuick() is not even close to sufficient for that, as it was necessary to copy "rectangular" memory regions with a different "modulo" factor from A to B. CopyMemQuick() cannot do that.

That's the whole reason. Something doesn't run at sufficient speed, or you know this is going to happen, or you simply want to reduce CPU usage as much as possible, so you write your own code. Nothing wrong with that.

This is Amiga land after all. There are lots of not-so-fast, sub-68060 machines out there, and you can do more on those lower-end machines if your code is faster.
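
As a small illustration of the rectangular copy from the quote above, here is a sketch in portable C. It is not P96's routine; `copy_rect` and its parameters are made up for the example. It takes a per-buffer stride (row width plus modulo, in bytes), which is the piece of information a flat copy like CopyMemQuick() has no way to receive.

```c
#include <stddef.h>
#include <string.h>

/* Copy a rectangle of 'rows' lines, each 'row_bytes' wide, between two
   buffers whose lines are 'src_stride' and 'dst_stride' bytes apart.
   Stride = row_bytes + modulo, so the two sides may use different modulos. */
void copy_rect(const unsigned char *src, size_t src_stride,
               unsigned char *dst, size_t dst_stride,
               size_t row_bytes, size_t rows)
{
    for (size_t y = 0; y < rows; y++) {
        memcpy(dst, src, row_bytes);  /* copy one line of the rectangle */
        src += src_stride;            /* skip the source modulo */
        dst += dst_stride;            /* skip the destination modulo */
    }
}
```

A tuned version would replace the per-line `memcpy` with a copy loop chosen by the caller, which is the point made at the start of the thread: only the calling program knows which strategy suits its data.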

Thorham · « **Reply #3 on:** January 03, 2015, 08:08:59 PM »