And I don't think it makes sense to use DMA no matter what. There is some overhead involved in configuring the DMA controller: telling it where to read from, where to write to, how many transfers to do, and whether to increment the addresses or not. For small blocks that setup cost can outweigh whatever the DMA saves.
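To make the overhead point concrete, here is a rough sketch of a copy routine that only hands a block to the DMA when it is big enough to be worth the setup. Everything named here is my own assumption for illustration: `DMA_MIN_BYTES` is a made-up break-even size you would have to measure on your hardware, `dma_copy_fn` stands for whatever blocking copy call your driver offers, and `fake_dma_copy` is just a software stand-in so the sketch can be tried on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical break-even size; measure it on your own hardware. */
#define DMA_MIN_BYTES 512u

/* Hypothetical driver hook: performs a DMA copy and waits for it to finish.
   Returns 0 on success. */
typedef int (*dma_copy_fn)(void *dst, const void *src, size_t len);

/* Software stand-in for a real DMA driver, only for trying this on a PC. */
static int fake_dma_calls;
static int fake_dma_copy(void *dst, const void *src, size_t len)
{
    fake_dma_calls++;
    memcpy(dst, src, len);
    return 0;
}

/* Copy small blocks with the CPU and pass large ones to the DMA hook.
   Returns 0 on success, or the error code reported by the hook. */
static int copy_auto(void *dst, const void *src, size_t len, dma_copy_fn dma)
{
    assert(dst != NULL && src != NULL);
    if (len < DMA_MIN_BYTES || dma == NULL) {
        memcpy(dst, src, len);   /* setup cost not worth it */
        return 0;
    }
    return dma(dst, src, len);
}
```

With this, `copy_auto(dst, src, 16, fake_dma_copy)` stays on the CPU, while a 1024-byte copy goes through the hook. The right threshold depends entirely on the controller and the bus, so treat 512 as a placeholder.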
Not only that, but on embedded systems you often have to maintain cache coherency yourself. Allocating cache-aligned buffers, flushing caches and other housekeeping just for DMA can be quite a burden sometimes. A simple CPU copy skips all of that, and the data may already be in the CPU caches when you start processing it.
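Part of that housekeeping is plain address arithmetic: cache maintenance works on whole cache lines, so the range you clean or invalidate has to be widened to line boundaries. A small sketch of that calculation follows. The 32-byte `CACHE_LINE` and the names `cache_range` and `cache_range_for` are assumptions of mine, and the actual clean/invalidate call is platform specific, so it is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; check the documentation of your core. */
#define CACHE_LINE 32u

typedef struct {
    uintptr_t start;  /* buffer address rounded down to a line boundary */
    size_t    len;    /* length in bytes, a multiple of CACHE_LINE */
} cache_range;

/* Widen [addr, addr + len) so that it covers only whole cache lines. */
static cache_range cache_range_for(uintptr_t addr, size_t len)
{
    const uintptr_t mask = (uintptr_t)CACHE_LINE - 1u;
    cache_range r;
    uintptr_t end;

    assert((CACHE_LINE & (CACHE_LINE - 1u)) == 0u);  /* power of two */

    r.start = addr & ~mask;              /* round start down */
    end = (addr + len + mask) & ~mask;   /* round end up */
    r.len = (size_t)(end - r.start);
    return r;
}
```

If the widened range is bigger than the buffer itself, the buffer shares a cache line with some other data, and invalidating that line after a DMA write can throw away the CPU's pending writes to the neighbour. That is the reason for allocating DMA buffers cache aligned and padded to a whole number of lines.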
And of course, if the DMA controller cannot transfer more than 64 kB at once (for example), you have to split transfers, which can open up new issues in the software design... :-)
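The splitting itself is not hard. Below is a minimal sketch, assuming a blocking per-chunk call. `DMA_MAX_BYTES` just uses the 64 kB example figure, `dma_chunk_fn` is a hypothetical driver hook, and `fake_chunk` is a software stand-in for trying it on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Example limit from the text; real controllers differ. */
#define DMA_MAX_BYTES 65536u

/* Hypothetical driver hook: transfers one chunk and waits for completion.
   Returns 0 on success. */
typedef int (*dma_chunk_fn)(unsigned char *dst, const unsigned char *src,
                            size_t len);

/* Software stand-in that records how it was called. */
static size_t chunk_count;
static size_t chunk_largest;
static int fake_chunk(unsigned char *dst, const unsigned char *src, size_t len)
{
    chunk_count++;
    if (len > chunk_largest)
        chunk_largest = len;
    memcpy(dst, src, len);
    return 0;
}

/* Split one large copy into chunks of at most DMA_MAX_BYTES each.
   Stops at the first failing chunk and returns its error code. */
static int dma_copy_chunked(void *dst, const void *src, size_t len,
                            dma_chunk_fn start_chunk)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(start_chunk != NULL);
    while (len > 0) {
        size_t n = len < DMA_MAX_BYTES ? len : DMA_MAX_BYTES;
        int err = start_chunk(d, s, n);
        if (err != 0)
            return err;
        d += n;
        s += n;
        len -= n;
    }
    return 0;
}
```

The design trouble starts when the chunks are not blocking: with completion interrupts, something has to remember the remaining length and kick off the next chunk from the interrupt handler, and callers then see a transfer that is only partly finished for a while.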
On the other hand, DMA controllers are often good at memory fill, so they can be a good substitute for good old memset().