Is there a documented limit other than the 32-bit address space and the addresses used by other devices?
Local bus slaves (at least on A3000, the only reference I have) are within $08000000 to $0fffffff, so there is a limit there.
I don't see why you couldn't map some of the RAM into another unused area like $80000000 to $FEFFFFFF, decode it yourself instead of using Gary, then link it in at runtime.
That's about 2GB of additional space without even touching the Z3 address space.