I'm not aware that this is a general problem, i.e. a SCSI controller only has RAM on it as a convenience to the user, to reduce the number of cards required. If there is an additional card that has RAM with the same MEMF flag, then that is also acceptable (usually 24-bit DMA).
Unfortunately, that is not true in this generality. For example, there is a batch of A4000s where CBM in their wisdom installed the wrong bus drivers for motherboard RAM (inverting instead of non-inverting). For the CPU, it doesn't make a difference since the data is just read and written "upside down". For DMA, it means that you get all your data inverted from the disk into the RAM. Thus, the only safe destination is really RAM on the card. Yes, surely it's broken.
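To make the effect concrete, here is a toy model in plain C, not real hardware code. It assumes the CPU goes through the inverting drivers in both directions while the DMA path stores its bits without that inversion; the names `ram_cells`, `cpu_write`, `cpu_read` and `dma_write` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RAM_SIZE 16

/* Physical contents of the (hypothetical) motherboard RAM cells. */
static uint8_t ram_cells[RAM_SIZE];

/* CPU writes pass through the inverting drivers: the cell holds ~value. */
static void cpu_write(size_t addr, uint8_t value)
{
    assert(addr < RAM_SIZE);
    ram_cells[addr] = (uint8_t)~value;
}

/* CPU reads are inverted again, so the CPU sees its own data unchanged. */
static uint8_t cpu_read(size_t addr)
{
    assert(addr < RAM_SIZE);
    return (uint8_t)~ram_cells[addr];
}

/* Assumption: the DMA path does not go through the faulty drivers,
 * so the raw bits from the disk land in the cell as they are. */
static void dma_write(size_t addr, uint8_t value)
{
    assert(addr < RAM_SIZE);
    ram_cells[addr] = value;
}
```

In this model, a byte written by `cpu_write()` reads back unchanged through `cpu_read()`, whereas a byte stored by `dma_write()` reads back with every bit flipped.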
I agree that there are some theoretical issues with MEMF. It is also inconvenient, as it requires external knowledge, and it would have been better if Amiga had been forced into providing a better solution. My argument is that a central system which coordinates memory allocation between components so that they are as optimal as possible, with buffer allocation only as a fallback if all of that fails, is a much better solution than what you propose.
Then again, where do you propose to make the cut? At file system level? At user program level? If you make it at file system level, there is no freaking difference. Whether the FFS copies the data or the device copies the data makes no difference. The data needs copying anyhow. Broken hardware causes problems. If the user has to copy data, you're putting a lot of burden and responsibility at the user level, and given how loosely users generally treat AmigaOs requirements, this does not exactly help to improve the robustness of the overall system. Let alone that you're also creating problems at the interoperability level: POSIX and the C standard library do not know anything about memory types, thus any program requiring any type of interoperability would require a C-library layer which performs the copy. One way or another, you cannot get rid of the copy except in exceptionally good circumstances, and *that* is something you could arrange *even though* the device implements a copy if it has to.
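As a rough sketch of "the device copies only if it has to", in portable C rather than against the real exec.library or device driver API: everything here (`struct dma_device`, `hw_dma_read`, `device_read`, the reach range) is a made-up stand-in. The hardware transfer is simulated by filling the destination with a byte pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOUNCE_SIZE 512

/* Hypothetical per-device state. */
struct dma_device {
    uintptr_t reach_lo;           /* lowest address the hardware can DMA into */
    uintptr_t reach_hi;           /* one past the highest reachable address */
    uint8_t bounce[BOUNCE_SIZE];  /* stands in for RAM on the card */
    unsigned copies;              /* how often the fallback copy was needed */
};

/* Stand-in for the hardware transfer: fills dst with a known pattern. */
static void hw_dma_read(uint8_t *dst, size_t len, uint8_t seed)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (uint8_t)(seed + i);
}

/* True if the whole buffer lies inside the DMA-reachable range. */
static int dma_can_reach(const struct dma_device *dev, const void *buf, size_t len)
{
    uintptr_t start = (uintptr_t)buf;
    return start >= dev->reach_lo && start <= dev->reach_hi &&
           len <= dev->reach_hi - start;
}

/* Read len bytes into buf; the caller never sees the DMA restrictions. */
static void device_read(struct dma_device *dev, void *buf, size_t len, uint8_t seed)
{
    uint8_t *out = buf;

    if (dma_can_reach(dev, buf, len)) {
        hw_dma_read(out, len, seed);   /* good circumstances: no copy */
        return;
    }
    while (len > 0) {                  /* otherwise go through the bounce buffer */
        size_t chunk = len < BOUNCE_SIZE ? len : BOUNCE_SIZE;
        hw_dma_read(dev->bounce, chunk, seed);
        memcpy(out, dev->bounce, chunk);
        dev->copies++;
        out += chunk;
        seed = (uint8_t)(seed + chunk);
        len -= chunk;
    }
}
```

The point of the sketch is only that the decision sits in one place: a caller who happens to pass a reachable buffer gets the direct transfer, and everybody else gets correct data at the price of a copy.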
I don't have a problem with a device signalling "oh, by the way, I would work faster if you would align buffers in this specific way". I do have a problem with devices "constructed" the way we find them today: "Oh, if you don't allocate memory properly, I will just overwrite some other random stuff and probably hang or crash, depending on my mood..."
AmigaOS is the only one that sends messages without copying them too. I understand your argument, I understand why the other operating systems do it that way. I don't buy that it's the correct way that AmigaOS should work.
AmigaOs passes by reference because that's simply the only thing that could reasonably be done back then with the limited resources of the 68K. Other systems had no multithreading at all (Apple, MS) and would require running in circles to create the illusion (MacOs copying entire patch-tables for its system resource to keep programs happy, as in "SetFunction() would be task specific"). If you were to construct a system nowadays, you would probably just copy the messages from one address space to another, just to make sure that you isolate tasks properly. But that's an entirely different discussion. At least theoretically, message passing by pointers works. But the fact that there is not even an interface to query the memory requirements of devices makes the current system completely corrupt. The end user simply "has to know" what a device can do, and there is not even a central repository to store such information. This information is used exclusively by the filing system, but *other* programs that have to access the harddisk somehow have to know the right values "by magic". Or not - and create a disaster.
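For illustration only, this is roughly what such a query interface could look like. Nothing like it exists in AmigaOs; `struct dma_constraints`, `example_device_constraints()` and `buffer_acceptable()` are invented names, and the fields merely mirror the kind of values (Mask, MaxTransfer) that are otherwise supplied to the filing system from outside. Addresses are modelled as plain 32-bit numbers, and the mask covers reachability only, with alignment kept as a separate field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of what a device can transfer into. */
struct dma_constraints {
    uint32_t addr_mask;     /* address bits the device can drive */
    uint32_t alignment;     /* required buffer alignment, a power of two */
    uint32_t max_transfer;  /* largest single transfer in bytes */
};

/* Hypothetical query: a controller limited to 24-bit DMA. */
static struct dma_constraints example_device_constraints(void)
{
    struct dma_constraints c = { 0x00FFFFFFu, 2u, 0x0001FE00u };
    return c;
}

/* Any program, not just the filing system, could check a buffer up front. */
static int buffer_acceptable(const struct dma_constraints *c,
                             uint32_t addr, uint32_t len)
{
    uint32_t last;

    assert(c->alignment != 0 && (c->alignment & (c->alignment - 1)) == 0);
    if (len == 0 || len > c->max_transfer)
        return 0;                       /* too large for one transfer */
    if (addr > UINT32_MAX - (len - 1))
        return 0;                       /* range would wrap around */
    last = addr + (len - 1);
    if ((addr & ~c->addr_mask) != 0 || (last & ~c->addr_mask) != 0)
        return 0;                       /* outside the reachable address range */
    return (addr & (c->alignment - 1)) == 0;
}
```

With something along these lines, a program could either allocate a suitable buffer in the first place or knowingly accept that the device will fall back to copying.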
I've been in a similar position to you where I've justified each component doing copies on a framework that I built, but it all ended up being memory hungry and slow. On a further redesign I was able to reverse my decision and the difference was visible.
But we're talking about AmigaOs, and as I already said: At *some* point, the copy has to be made. At file system level, or at device level. The question is just *where*. And the answer is quite obvious: Whoever created the mess must clean up. So it's the device, because that's the only point where all the knowledge about constraints and restrictions is available. Hopefully. The filing system or the end user can hardly know all the details.