The whole shared pens system sucks beyond belief when you are in RGB mode.
While working on my own code, I designed an abstract Draw2D class with several implementations, which you obtain at runtime via a factory:
1) pure software
2) pure software + OS routines (fills/blits)
3) Warp3D
Guess which one I prefer?
Seriously, I found developing (2) the biggest nightmare going, and if it wasn't for the possibility of HW-accelerated fills I would have left it well alone.
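
For anyone curious what that kind of setup looks like, here is a rough sketch of an abstract Draw2D with a runtime factory. This is not my actual code: the Surface struct, the Backend enum, fillRect() and createDraw2D() are all made-up names for illustration, and the OS-assisted and Warp3D versions just fall back to the plain software fill where the real OS or Warp3D calls would go.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical 32-bit ARGB surface, not a real AmigaOS structure.
struct Surface {
    int width;
    int height;
    std::vector<std::uint32_t> pixels;

    Surface(int w, int h)
        : width(w), height(h),
          pixels(static_cast<std::size_t>(w) * static_cast<std::size_t>(h), 0u) {
        assert(w > 0 && h > 0);
    }
};

// Abstract interface: callers only ever see this.
class Draw2D {
public:
    virtual ~Draw2D() = default;
    virtual std::string name() const = 0;
    virtual void fillRect(Surface& s, int x, int y, int w, int h,
                          std::uint32_t argb) = 0;
};

// Plain software fill, clipped to the surface bounds.
inline void softwareFill(Surface& s, int x, int y, int w, int h,
                         std::uint32_t argb) {
    const int x0 = std::max(x, 0);
    const int y0 = std::max(y, 0);
    const int x1 = std::min(x + w, s.width);
    const int y1 = std::min(y + h, s.height);
    for (int row = y0; row < y1; ++row) {
        for (int col = x0; col < x1; ++col) {
            s.pixels[static_cast<std::size_t>(row) * s.width + col] = argb;
        }
    }
}

// (1) pure software
class SoftwareDraw2D : public Draw2D {
public:
    std::string name() const override { return "software"; }
    void fillRect(Surface& s, int x, int y, int w, int h,
                  std::uint32_t argb) override {
        softwareFill(s, x, y, w, h, argb);
    }
};

// (2) software + OS routines. Placeholder: a real version would hand
// fills/blits to the OS here instead of calling softwareFill().
class OsAssistedDraw2D : public Draw2D {
public:
    std::string name() const override { return "software+os"; }
    void fillRect(Surface& s, int x, int y, int w, int h,
                  std::uint32_t argb) override {
        softwareFill(s, x, y, w, h, argb);
    }
};

// (3) Warp3D. Placeholder: no actual Warp3D calls are made here.
class Warp3DDraw2D : public Draw2D {
public:
    std::string name() const override { return "warp3d"; }
    void fillRect(Surface& s, int x, int y, int w, int h,
                  std::uint32_t argb) override {
        softwareFill(s, x, y, w, h, argb);
    }
};

enum class Backend { Software, SoftwareWithOS, Warp3D };

// Factory: pick the implementation at runtime.
inline std::unique_ptr<Draw2D> createDraw2D(Backend b) {
    switch (b) {
        case Backend::Software:
            return std::unique_ptr<Draw2D>(new SoftwareDraw2D());
        case Backend::SoftwareWithOS:
            return std::unique_ptr<Draw2D>(new OsAssistedDraw2D());
        case Backend::Warp3D:
            return std::unique_ptr<Draw2D>(new Warp3DDraw2D());
    }
    return nullptr;
}
```

The point of doing it this way is that the rest of the program just calls createDraw2D() once and then talks to a Draw2D pointer, so swapping the OS-assisted version for Warp3D (or dropping back to pure software) never touches the drawing code itself.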