In hindsight, I'm also of the opinion that a less modular warp3d implementation makes sense for classic. For example, there could be a whole array of precompiled warp3d libraries, and you install the one specific to your rtg/3d chip.
This may sound like heresy and a huge retrograde step, but it has a lot of advantages for older systems. For example, in warp3d the rtg library component handles memory allocation and the 3d component has no say, but the allocation strategy that works best on an 8MB card will not be the same as on a 128MB one. It would also remove many layers of indirection, along with partial features like multiple-card support that add a lot of complexity. Finally, it could just support v4 and v5 vertex arrays and emulate the v3 calls using them. The result is a single drop-in library that will either work or fail to open if it isn't correct for your system.
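To make the idea a bit more concrete, here is a rough sketch in plain C of what "work or fail to open" plus a per-card allocation policy might look like. None of this is real warp3d or rtg API: the chip enum, `BUILD_CHIP`, `driver_open` and the budget numbers are all made up for illustration, and the 2MB frame buffer reserve and the "half of VRAM" rule are just assumed placeholder policies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical chip identifiers, not real warp3d or rtg constants. */
enum chip_id { CHIP_VIRGE, CHIP_PERMEDIA2, CHIP_VOODOO3 };

/* The single chip this build of the library is compiled for (assumption). */
#define BUILD_CHIP CHIP_VIRGE

/* Assumed fixed reserve for frame buffers and Z buffer on small cards. */
#define SMALL_CARD_LIMIT    (8u * 1024u * 1024u)
#define FRAMEBUFFER_RESERVE (2u * 1024u * 1024u)

struct driver {
    enum chip_id chip;
    size_t vram_bytes;
    size_t texture_budget;   /* bytes the 3d side may use for textures */
};

/*
 * Open the driver for the detected chip.
 * Returns 0 on success, -1 if this build does not match the hardware,
 * which is the "fail to open" case.
 */
int driver_open(struct driver *d, enum chip_id detected, size_t vram_bytes)
{
    assert(d != NULL);

    if (detected != BUILD_CHIP)
        return -1;              /* wrong library for this system */

    d->chip = detected;
    d->vram_bytes = vram_bytes;

    if (vram_bytes <= SMALL_CARD_LIMIT) {
        /* Small card: textures get whatever is left after the reserve. */
        d->texture_budget = vram_bytes > FRAMEBUFFER_RESERVE
                          ? vram_bytes - FRAMEBUFFER_RESERVE
                          : 0;
    } else {
        /* Large card: placeholder policy, give textures half of VRAM. */
        d->texture_budget = vram_bytes / 2;
    }
    return 0;
}
```

With this build, calling `driver_open` for a 4MB card of the matching chip succeeds and leaves a 2MB texture budget, while calling it for any other chip returns -1 straight away. The point is only that the 3d side gets to own the policy because it knows exactly which card it is talking to.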