I haven't actually seen the new code yet, but going by my Atari work, this sort of thing is easy to do.
Planar mode is the least efficient from a DRAM point of view. What I did on the ST was to burst-read a chunk of each plane from DRAM and hold it in a local RAM cache. I can then combine the planes however I like. The DRAM access is very efficient because it reads an 8-word burst per plane. I wrote it to support 32 planes, and they all share one RAM block.
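To make the plane-combining step concrete, here is a rough software model in C. It is only a sketch, not the actual hardware design: the names (plane_cache_t, planes_to_pixels) are made up for illustration, and it assumes 16-bit words and the usual ST ordering, where bit 15 of a word is the leftmost pixel and plane 0 supplies bit 0 of the colour index.

```c
#include <assert.h>
#include <stdint.h>

#define BURST_WORDS      8                    /* words read per plane in one burst */
#define MAX_PLANES       32                   /* planes sharing the one RAM block */
#define PIXELS_PER_BURST (BURST_WORDS * 16)   /* assumes 16-bit words */

/* Hypothetical model of the local RAM cache: one 8-word burst per plane. */
typedef struct {
    uint16_t words[MAX_PLANES][BURST_WORDS];
} plane_cache_t;

/* Combine num_planes bitplanes into one colour index per pixel.
 * Assumed ordering: bit 15 is the leftmost pixel, plane 0 is index bit 0. */
static void planes_to_pixels(const plane_cache_t *cache, unsigned num_planes,
                             uint32_t out[PIXELS_PER_BURST])
{
    assert(num_planes >= 1 && num_planes <= MAX_PLANES);

    for (unsigned px = 0; px < PIXELS_PER_BURST; px++) {
        unsigned word = px / 16;          /* which word of the burst */
        unsigned bit  = 15 - (px % 16);   /* MSB first within the word */
        uint32_t index = 0;

        for (unsigned p = 0; p < num_planes; p++) {
            uint32_t b = (uint32_t)(cache->words[p][word] >> bit) & 1u;
            index |= b << p;              /* plane p contributes bit p */
        }
        out[px] = index;
    }
}
```

Because every plane's burst sits in the same cache before any pixel is built, the combine step is free to pick bits in whatever order a given mode needs.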
For chunky modes you just cut up the data as it arrives into suitable size, um, chunks.
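The chunky case can be sketched the same way. Again this is just an illustration under assumptions: slice_word is a hypothetical name, words are taken to be 16 bits wide, the pixel depth is assumed to divide 16 evenly, and the leftmost pixel is assumed to sit in the top bits of the word.

```c
#include <assert.h>
#include <stdint.h>

/* Cut one 16-bit word into 16/bpp pixels of bpp bits each.
 * Assumed ordering: the first (leftmost) pixel is in the top bits.
 * Returns the number of pixels written to out. */
static unsigned slice_word(uint16_t word, unsigned bpp, uint16_t out[16])
{
    assert(bpp == 1 || bpp == 2 || bpp == 4 || bpp == 8 || bpp == 16);

    unsigned count = 16 / bpp;
    uint32_t mask  = (1u << bpp) - 1u;    /* bpp low bits set */

    for (unsigned i = 0; i < count; i++) {
        unsigned shift = 16 - bpp * (i + 1);
        out[i] = (uint16_t)((word >> shift) & mask);
    }
    return count;
}
```

No cache or recombination is needed here, since each word already holds whole pixels as it arrives.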
It has to work in legacy mode as well, of course; these new modes are no use unless we write a driver to support them.
/Mike