There's no real sense in DMAing into fast RAM and then copying to chip RAM, so it's easiest to DMA (or PIO-read) directly into a pair of frame buffers (double buffering). If there's a bit of fast RAM (to absorb the PIO overhead), PIO probably carries no large penalty compared to DMA. Accesses to the IDE registers will happen outside chip RAM, as will ROM accesses. However, driver and system structures residing in chip RAM will slow things down considerably.
A 6-bitplane display costs you half of the CPU (or SCSI DMA) bandwidth to chip RAM while bitplane DMA is active. Since you're not using overscan, you can make up for part of that loss during horizontal and vertical blanking.
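To see where the "half" comes from, here's a rough Python sketch. It assumes the commonly quoted OCS/ECS low-res fetch order within each 8-slot fetch unit (`- 4 6 2 - 3 5 1`, where `-` is a free slot), that the CPU competes for the even-numbered slots of each unit, and a 320-pixel line fetched 16 pixels per unit. `FETCH_ORDER` and both function names are made up for illustration.

```python
# Assumed low-res fetch order within one 8-slot unit (index = slot 0..7,
# value = bitplane number, 0 = free slot): "- 4 6 2 - 3 5 1".
FETCH_ORDER = [0, 4, 6, 2, 0, 3, 5, 1]


def cpu_slots_lost_per_unit(planes: int) -> int:
    """Count the even (CPU) slots in one fetch unit taken by bitplane DMA."""
    return sum(1 for slot in range(0, 8, 2)
               if 0 < FETCH_ORDER[slot] <= planes)


def cpu_slots_lost_per_line(planes: int, width: int = 320) -> int:
    """CPU slots lost on one display line; one unit fetches 16 low-res pixels."""
    return (width // 16) * cpu_slots_lost_per_unit(planes)


for planes in range(1, 7):
    print(planes, "planes:",
          cpu_slots_lost_per_unit(planes), "of 4 CPU slots lost per unit,",
          cpu_slots_lost_per_line(planes), "per line")
```

Under these assumptions, up to 4 planes leave the CPU slots alone; planes 5 and 6 each take one of the four even slots, so 6 planes cost the CPU 2 of 4 slots per unit, 40 per 320-pixel line.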
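You've got 225 DMA slots per line and 312 lines per PAL frame. Of the 225 slots, the CPU can potentially use 112 (113?). On each of the 256 display lines, 40 of those are lost to bitplane DMA, leaving 72; on the remaining 312 - 256 = 56 lines the CPU gets all 112. That gives 72*256 + 112*56 = 24704 CPU slots per frame.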
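The per-frame budget as a small Python helper, using the figures above as defaults (the function name is hypothetical). It also shows how little the 112-vs-113 question matters.

```python
def cpu_slots_per_frame(cpu_per_line: int = 112,
                        lost_to_bitplanes: int = 40,
                        display_lines: int = 256,
                        total_lines: int = 312) -> int:
    """CPU-usable chip RAM slots per PAL frame (defaults are the post's figures)."""
    blank_lines = total_lines - display_lines            # 56 lines with no overscan
    display = (cpu_per_line - lost_to_bitplanes) * display_lines
    blank = cpu_per_line * blank_lines                   # no bitplane DMA in blanking
    return display + blank


print(cpu_slots_per_frame())
print(cpu_slots_per_frame(cpu_per_line=113))
```

With 112 CPU slots per line this gives 24704; with 113 it gives 25016, about 1% more.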
With 16-bit chip RAM each slot moves 2 bytes, so you can write 49408 bytes per frame; the A3000's 32-bit chip RAM would double that to 98816 bytes. At 25 fps you get two frames per full refresh of 61440 bytes (320*256*6/8), i.e. up to 98816 bytes of write bandwidth even with 16-bit chip RAM, so it shouldn't really be a big problem with optimized code, possibly even on a chip-RAM-only system!
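A quick Python check of the byte budget against one full 6-bitplane refresh. It takes the 24704-slot figure as given and assumes every CPU slot carries one full bus-width write (2 bytes on 16-bit chip RAM, 4 on the A3000); the names are illustrative only.

```python
CPU_SLOTS_PER_FRAME = 24704   # slot count derived in the post
FRAME_RATE_HZ = 50            # PAL
VIDEO_FPS = 25


def bytes_per_frame(bus_bytes: int, slots: int = CPU_SLOTS_PER_FRAME) -> int:
    """Chip RAM bytes the CPU can write per frame, one bus-width write per slot."""
    return slots * bus_bytes


def refresh_bytes(width: int = 320, height: int = 256, planes: int = 6) -> int:
    """Size of one full planar screen in bytes."""
    return width * height * planes // 8


frames_per_refresh = FRAME_RATE_HZ // VIDEO_FPS   # 2 display frames per video frame
for name, bus in (("16-bit", 2), ("32-bit A3000", 4)):
    budget = frames_per_refresh * bytes_per_frame(bus)
    print(f"{name}: {budget} bytes available, {refresh_bytes()} needed "
          f"({refresh_bytes() / budget:.0%} of budget)")
```

This only counts raw write slots: any instruction fetches or data reads that also hit chip RAM (as they would on a chip-RAM-only system) come out of the same budget.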