Author Topic: question about DMA (Read 4845 times)

billt · « **Reply #14 from previous page:** September 13, 2011, 07:29:08 PM »

This course is more about embedded computers than desktops, and perhaps there's a bit of context influencing things in that distinction, I'm not sure. For the class lab assignments we're programming a Rabbit 3000 board, 8-bit stuff. But still, I just had to try and find some information about these things.

One of the examples I think of is the DMA on two serial ports in our ARM chip at work, which we're designing into an SOC. It seems the ARM goes a lot faster than serial ports, at least the serial ports when I had a modem, and I do realize that speeds have increased there far beyond what I ever used.

But, with a serial port doing DMA, it seems the CPU could have a few bus cycles between each serial port DMA bus cycle to do something else. Even with two serial ports each doing its own DMA, it would seem the CPU could have some free bus cycles to make use of. And that's ignoring internal caches.

Perhaps there are things that do saturate the bus with DMA; I suppose a big Gb Ethernet or SATA transfer might be capable of that, or USB3. I can accept that some things really do not leave any free cycles for the CPU until the DMA is finished. But some things seem slow enough that sharing the bus (each side taking its own cycles, of course) is possible and, in my/our minds, worth doing.
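To put rough numbers on the "free bus cycles" argument, here is a small C sketch. The function name and all figures in the note below are illustrative assumptions, not measurements from any particular chip; it only estimates what fraction of bus cycles a steady DMA stream would occupy.

```c
/* Estimate the fraction of bus cycles a steady DMA stream occupies.
 *
 * bytes_per_sec   : data rate of the peripheral (assumed, not measured)
 * cycles_per_xfer : bus cycles each one-byte DMA transfer takes
 * bus_hz          : bus clock frequency in Hz
 *
 * Returns a value where 1.0 means the bus is fully saturated. */
double dma_bus_utilization(double bytes_per_sec,
                           double cycles_per_xfer,
                           double bus_hz)
{
    if (bus_hz <= 0.0)
        return 0.0;  /* avoid dividing by zero on nonsense input */
    return (bytes_per_sec * cycles_per_xfer) / bus_hz;
}
```

For example, assuming a 115200 baud 8N1 port (about 11520 bytes/s), two bus cycles per byte and a 50 MHz bus, `dma_bus_utilization(11520.0, 2.0, 50e6)` comes out well under 0.1% of the bus, while a stream of 100 MB/s at the same cost per byte would ask for more cycles than that bus has.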

And I don't think that it makes sense to DMA no matter what. There is some overhead involved in configuring the DMA controller, telling it where to read from, where to write to, and how many times, to increment or not (if you're reading from a serial port, for example, you're probably reading from the same register every time, while the memory side would increment to the next address on every cycle). A single-cycle DMA probably won't make any sense to do. The transfer would need to save more cycles than the setup costs before it starts being worth it, I think.
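A minimal sketch of that trade-off in C, under stated assumptions: `struct dma_desc` is a hypothetical descriptor that just mirrors the settings listed above (it is not any real controller's register layout), and `dma_break_even` assumes the CPU pays a fixed setup-plus-completion overhead for DMA versus a fixed cost per element when it copies the data itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: the things the CPU has to tell the controller. */
struct dma_desc {
    uintptr_t src;            /* where to read from */
    uintptr_t dst;            /* where to write to */
    size_t    count;          /* how many transfers */
    bool      src_increment;  /* false for a fixed peripheral register */
    bool      dst_increment;  /* true when filling a memory buffer */
};

/* Smallest transfer count at which DMA costs the CPU fewer cycles than
 * copying by hand. DMA wins once n * pio_cycles_per_elem exceeds the
 * fixed overhead. Returns SIZE_MAX if copying by hand is free. */
size_t dma_break_even(unsigned overhead_cycles, unsigned pio_cycles_per_elem)
{
    if (pio_cycles_per_elem == 0)
        return SIZE_MAX;  /* DMA can never win in this model */
    return (size_t)(overhead_cycles / pio_cycles_per_elem) + 1;
}
```

With an assumed 40 cycles of overhead and 4 cycles per element copied by the CPU, `dma_break_even(40, 4)` gives 11, so in this simple model anything shorter than that is cheaper to copy by hand.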

For the very little DMA software I've written or looked at recently, the CPU does wait for DMA to complete. But this is a case of simulating the chip RTL before silicon, to make sure we have the DMA controller hooked up correctly inside the chip. There's nothing else to do but wait for the test results. There's no applications, no GUI to update, no OS, nothing else at all. I don't consider that a normal situation in writing software around a DMA controller.

I was just really surprised that I was the only one surprised by this detail in class that day. Thank you for making me feel like I'm on the sane side of this debate.

Zac67 · « **Reply #15 on:** September 13, 2011, 08:27:16 PM »

I guess he was talking about that particular system, which may have a rather restrictive way of doing DMA - nothing wrong with that in embedded.

For serial, it does make sense to do even single cycle, repetitive DMA for longer I/Os (depending on serial and memory speed) - you can have a higher throughput with much less overhead than with PIO. For DMA:
- set up buffer start address
- set up buffer size / max I/O size (for input)
- set up timeout
- go play somewhere else
- until an interrupt tells you it's done

For PIO:
- install IRQ handler
- set up buffer start address
- set up buffer length
- set up timeout timer IRQ
- do something else
- on each interrupt in the IRQ handler:
  - restore current buffer address
  - read serial port buffer
  - copy to buffer
  - increase buffer pointer
  - compare buffer length
- on timer IRQ:
  - cancel the pending I/O
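The PIO steps above can be sketched in C roughly as follows. This is a host-side simulation, not a real driver: `fake_uart_data` stands in for a memory-mapped data register, and all names (`pio_rx`, `pio_rx_irq`, `sim_uart_receive`, ...) are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* State the IRQ handler has to restore on every interrupt. */
struct pio_rx {
    uint8_t *next;       /* current buffer address */
    size_t   remaining;  /* bytes still expected */
    int      done;       /* set when full or cancelled */
};

/* Stand-in for the UART data register (assumption for this sketch). */
static volatile uint8_t fake_uart_data;

/* Set up buffer start address and buffer length. */
void pio_rx_start(struct pio_rx *rx, uint8_t *buf, size_t len)
{
    rx->next = buf;
    rx->remaining = len;
    rx->done = (len == 0);
}

/* Runs once per received byte; this per-byte work is what DMA removes. */
void pio_rx_irq(struct pio_rx *rx)
{
    uint8_t byte = fake_uart_data;  /* read serial port buffer */
    if (rx->done)
        return;                     /* no I/O pending, drop the byte */
    *rx->next++ = byte;             /* copy to buffer, increase pointer */
    if (--rx->remaining == 0)       /* compare buffer length */
        rx->done = 1;
}

/* Timer IRQ: cancel the pending I/O. */
void pio_rx_timeout(struct pio_rx *rx)
{
    rx->done = 1;
}

/* Simulation helper: pretend one byte arrived and the IRQ fired. */
void sim_uart_receive(struct pio_rx *rx, uint8_t byte)
{
    fake_uart_data = byte;
    pio_rx_irq(rx);
}
```

In the DMA variant, everything inside `pio_rx_irq` would be done by the controller, and the CPU would see a single completion interrupt instead of one per byte.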

You see, DMA is much less hassle. IRQ handling can be much shorter, which is a very good thing in embedded, where you usually try to be very deterministic.

Starting with PCI, NICs started using the main memory as buffers with DMA. The achievable speed is high enough and once you've built the (simple) DMA engine you get away with extremely small hardware buffers, saving costs and reducing latency(!): when the NIC signals the frame done the data's already present in main memory and the software doesn't need to wait for the driver to copy the data out of the NIC.

Zac67 · « **Reply #16 on:** September 13, 2011, 08:37:44 PM »

Quote from: billt;659245

For the very little DMA software I've written or looked at recently, the CPU does wait for DMA to complete.

That highly depends on the system it's all running on. A single tasking OS has no way to spend the time elsewhere so you have to wait. (Unless you've implemented asynchronous I/O which usually isn't the case.) In a multitasking OS, the driver simply lets go of the CPU after DMA setup and is revived once the 'finished' IRQ drops in. All the other threads can do what they want during that time.

itix · « **Reply #17 on:** September 13, 2011, 10:01:38 PM »

Quote from: billt;659245

And I don't think that it makes sense to DMA no matter what. There is some overhead involved in configuring the DMA controller, telling it where to read from, where to write to, and how many times, to increment or not

Not only that, but on embedded systems you often have to maintain cache coherency yourself. Allocating cache-aligned buffers, flushing caches and other housekeeping just for DMA can be quite a burden sometimes. A simple CPU copy can skip all that stuff, and the data could already be in the CPU caches when you start processing it.
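As a rough illustration of the buffer housekeeping, here is a C11 sketch that allocates a buffer aligned to, and padded out to, whole cache lines, so that no unrelated data shares a line with the DMA buffer. The 32-byte `CACHE_LINE` value and the function names are assumptions; the actual flush/invalidate calls are platform-specific and are deliberately left out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed cache line size; the real value depends on the CPU. */
#define CACHE_LINE 32u

/* Round a byte count up to a whole number of cache lines. */
size_t round_up_to_line(size_t n)
{
    return (n + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
}

/* Allocate a DMA buffer that starts on a cache line boundary and covers
 * whole lines only. Release it with free(). Returns NULL for n == 0 or
 * on allocation failure. A real driver would still have to flush or
 * invalidate these lines around each transfer. */
void *dma_buf_alloc(size_t n)
{
    if (n == 0)
        return NULL;
    /* C11 aligned_alloc wants the size to be a multiple of the alignment. */
    return aligned_alloc(CACHE_LINE, round_up_to_line(n));
}
```

Padding to whole lines matters because invalidating a line that also holds some other variable would throw away that variable's latest value.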

And of course if the DMA controller cannot transfer more than 64kB at once (for example), you have to split transfers, but this can open new issues with the software design... :-)
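A minimal sketch of such splitting in C. The 64 kB figure is just the example limit mentioned above, and `dma_split` with its `dma_submit_fn` callback is a hypothetical interface, not a real controller API; the callback is where a driver would program one transfer.

```c
#include <stddef.h>

/* Example limit from the post; real controllers differ. */
#define DMA_MAX_CHUNK 65536u

/* Hypothetical hook that programs one transfer of `len` bytes starting
 * `offset` bytes into the buffer. */
typedef void (*dma_submit_fn)(size_t offset, size_t len, void *ctx);

/* Break a transfer of `total` bytes into pieces of at most `max_chunk`
 * bytes, calling `submit` (if not NULL) for each piece in order.
 * Returns the number of pieces. */
size_t dma_split(size_t total, size_t max_chunk,
                 dma_submit_fn submit, void *ctx)
{
    size_t offset = 0;
    size_t chunks = 0;

    if (max_chunk == 0)
        return 0;  /* nothing sensible to do */

    while (offset < total) {
        size_t len = total - offset;
        if (len > max_chunk)
            len = max_chunk;
        if (submit)
            submit(offset, len, ctx);
        offset += len;
        chunks++;
    }
    return chunks;
}
```

The design issue hinted at above shows up right here: each piece now completes separately, so the software has to decide whether to chain the pieces from the completion interrupt or wait for each one in turn.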

On the other hand, DMA controllers are often good at memory fill, so they could be a good substitute for good old memset().
