And from where do you know that you have a gfx card in the system?
1. Ask the user.
2. Icon tool type and have two icons.
And is it worth to implement both methods?
Depends on the software. How much code are we talking about anyway? Two, maybe three kb extra? Hardly a waste if it means more users can enjoy the software.
For rendering graphics, it *usually* doesn't matter - not worth spending bytes on this decision.
It matters for the case I'm talking about:
2x 16 pixel wide background tile.
2x 16 pixel wide sprite mask.
2x 16 pixel wide sprite data.
2x 16 pixel wide second sprite mask.
2x 16 pixel wide second sprite data.
2x 16 pixel wide status gfx mask.
2x 16 pixel wide status gfx data.
That's seven sources. With a handwritten routine you can do this:
move.l (a0)+,d0
and.l (a1)+,d0
or.l (a1)+,d0
and.l (a2)+,d0
or.l (a2)+,d0
and.l (a3)+,d0
or.l (a3)+,d0
Do that twice, transpose, write to chipmem. After that you can unroll to use the pipeline on 20+ and get the transposes almost for free. I don't see how that's going to be anywhere near as fast with the OS blit function, so in this case it's crystal clear that it matters, because it lowers the CPU requirements.
Yes, as usual, it depends on the situation, but I currently cannot come up with a situation where I would need to blit something and not use the Os
I have another example. I wrote a simple real-time memory viewer. It opens a single bit plane 640x512 screen and blits 8x8 chars to the screen with my own code (which contains some optimized transposes from kalms' c2p routines). Very fast, and I highly doubt the OS can match that speed. It's important that such a program is fast because you're also running the program you're working on.
For example, we had the problem in P96 to copy memory quickly around (from the board to an off-board buffer, and reverse), but CopyMemQuick() is not even closely sufficient for that as it was necessary to copy "rectangular" memory regions with a different "modulo" factor from A to B. CopyMemQuick() cannot do that.
That's the whole reason. Something doesn't run at sufficient speed, or you know this is going to happen, or simply want to reduce CPU usage as much as possible, so you write your own code. Nothing wrong with that.
This is Amiga land after all. Lots of not so fast < 68060s out there, and you can do more on those lower end machines if your code is faster.