The A4091 ROM won't replace anything in Kickstart. Instead, it should show up under the name "2nd.scsi.device", and any IDE drives will continue to be handled by the scsi.device in the Kickstart ROM.
Because of reliability issues with Z3 DMA, the transfer rate you're reporting is normal for the 4091. DMA to motherboard RAM (with a 3640, for example) is generally about 1 MB/s faster than DMA to accelerator RAM, and a bit faster still if the motherboard RAM is set to 60ns. In my tests, filesystem operations come out roughly on par with the accelerators I have here that have onboard SCSI. In practice the 4091 "feels" very responsive with PFS3, and CPU usage is negligible. (Side note: the Fastlane Z3 can be configured for a higher transfer rate than the 4091, but at the cost of more CPU time.) I use a 4091 with a CF AztecMonster in my 4000D and really love the card.
Dave Haynie has written a lot of interesting material about the 4091. Look on various websites, and check the Usenet archives.