This course is more about embedded computers than desktops, and perhaps that context influences things; I'm not sure. For the class lab assignments we're programming a Rabbit 3000 board, which is 8-bit stuff. But still, I just had to try to find some information about these things.
One example I think of is the DMA on two serial ports in our ARM chip at work, which we're designing into an SoC. The ARM runs a lot faster than a serial port, at least the serial ports from when I had a modem, and I do realize that speeds there have increased far beyond what I ever used.
But, with a serial port doing DMA, it seems the CPU could have a few bus cycles between each serial port DMA bus cycle to do something else. Even with two serial ports each doing its own DMA, it would seem the CPU could have some free bus cycles to make use of. And that's ignoring internal caches.
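To put rough numbers on that, here's a back-of-the-envelope sketch in C. The 8N1 framing is an assumption, as are the figures in the note under the code (50 MHz bus clock, 115200 baud, and two bus cycles per byte: one read from the UART and one write to memory). I picked them for illustration, they are not figures from our chip, and the function names are made up.

```c
#include <stdint.h>

/* Bytes per second a UART delivers, ASSUMING 8N1 framing:
 * 1 start + 8 data + 1 stop = 10 bit times per byte. */
uint32_t uart_bytes_per_sec(uint32_t baud)
{
    return baud / 10u;
}

/* Bus cycles per second that DMA needs to service `ports` UARTs,
 * ASSUMING each byte costs `cycles_per_byte` bus cycles. */
uint64_t dma_cycles_per_sec(uint32_t baud, uint32_t ports,
                            uint32_t cycles_per_byte)
{
    return (uint64_t)uart_bytes_per_sec(baud) * ports * cycles_per_byte;
}

/* Share of the bus taken by that DMA traffic, in parts per million
 * (integer division, so the result is rounded down). */
uint32_t dma_bus_share_ppm(uint32_t bus_hz, uint32_t baud, uint32_t ports,
                           uint32_t cycles_per_byte)
{
    uint64_t cycles = dma_cycles_per_sec(baud, ports, cycles_per_byte);
    return (uint32_t)(cycles * 1000000u / bus_hz);
}
```

With those assumed numbers, `dma_bus_share_ppm(50000000, 115200, 2, 2)` comes out to 921 parts per million, so under 0.1% of the bus cycles go to the two serial ports and the rest are available to the CPU.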
Perhaps there are things that do saturate the bus with DMA; I suppose a big gigabit Ethernet or SATA transfer might be capable of that, or USB 3. And I can accept that some things really do not leave any free cycles for the CPU until the DMA is finished. But it seems some things are slow enough that sharing the bus (each taking its own cycles, of course) is possible and, in my/our minds, worth doing.
And I don't think it makes sense to use DMA no matter what. There is some overhead involved in configuring the DMA controller: telling it where to read from, where to write to, how many transfers to do, and whether to increment each address or not. (If you're reading from a serial port, for example, you're probably reading the same single register every time, while the memory side would of course increment to the next address on every cycle.) A single-cycle DMA probably won't make any sense to do. I think the transfer needs to be larger than your overhead cycles before it starts being worth it.
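That break-even idea can be sketched like this. It's a simplified model where the only costs are a fixed number of CPU cycles to set up the controller versus a fixed number of CPU cycles per item if the CPU copies the data itself. The function names are hypothetical and the numbers in the note afterwards are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when handing the transfer to DMA costs the CPU fewer cycles than
 * copying the items itself.
 * ASSUMPTION: the CPU pays `setup_cycles` once for DMA and nothing after,
 * or `cpu_cycles_per_item` for every item if it does the copy. */
bool dma_worth_it(uint32_t items, uint32_t setup_cycles,
                  uint32_t cpu_cycles_per_item)
{
    return (uint64_t)items * cpu_cycles_per_item > setup_cycles;
}

/* Smallest item count at which DMA starts to win under the same model.
 * Returns UINT64_MAX if a CPU copy is modelled as free (never worth it). */
uint64_t dma_break_even_items(uint32_t setup_cycles,
                              uint32_t cpu_cycles_per_item)
{
    if (cpu_cycles_per_item == 0) {
        return UINT64_MAX;
    }
    return (uint64_t)setup_cycles / cpu_cycles_per_item + 1u;
}
```

For example, with an assumed 40 cycles of setup and 4 CPU cycles per item, `dma_break_even_items(40, 4)` gives 11: ten items is a tie, and from eleven items on the DMA setup pays for itself. A real decision would also weigh interrupt handling and bus contention, which this model ignores.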
In the very little DMA software I've written or looked at recently, the CPU does wait for the DMA to complete. But that is a case of simulating the chip RTL before silicon, to make sure we have the DMA controller hooked up correctly inside the chip. There's nothing else to do but wait for the test results. There are no applications, no GUI to update, no OS, nothing else at all. I don't consider that a normal situation for writing software around a DMA controller.
I was just really surprised that I was the only one surprised by this detail in class that day. Thank you for making me feel like I'm on the sane side of this debate.
