In my not so humble opinion, existing RTG on 3.x is a big fat, dissatisfying hack. Of course, it has to be in order to work; as Piru points out, the graphics.library, layers.library etc. were designed around the original hardware. The RTG software patches all the critical stuff that opens screens, allocates bitmaps and so on, and performs the rendering on it. Completely OS-legal, non-hardware-banging programs will generally work fine under RTG.
What irks me about RTG under 3.x is that many calls the driver could hand off to the GPU on your graphics card don't appear to be accelerated, if my old experiments are anything to go by; apart from basic blitting and block fills, that is. Most GPUs presently used in Amiga graphics cards offer rather more than that, and the graphics.library has several routines that would make good candidates: BltBitMapScale(), for instance, or colourspace conversion when blitting between different colour formats (provided both BitMaps are in memory addressable by the GPU, that is).
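Just to show the sort of call I mean, here's a rough sketch of a stretched blit through plain graphics.library, nothing driver specific. The BitMaps and sizes are made up for the example; the point is that everything a driver would need to dispatch this to the card's blitter is sitting right there in the BitScaleArgs:

#include <proto/graphics.h>
#include <graphics/gfx.h>
#include <graphics/scale.h>

/* Stretch a 128x128 region of srcBM up to 256x256 in dstBM.
 * Whether this ends up on the GPU or in a CPU loop is entirely
 * down to the RTG driver; the call itself is plain OS API. */
void StretchExample(struct BitMap *srcBM, struct BitMap *dstBM)
{
    struct BitScaleArgs bsa = {0};

    bsa.bsa_SrcBitMap   = srcBM;
    bsa.bsa_DestBitMap  = dstBM;
    bsa.bsa_SrcX        = 0;
    bsa.bsa_SrcY        = 0;
    bsa.bsa_SrcWidth    = 128;
    bsa.bsa_SrcHeight   = 128;
    bsa.bsa_DestX       = 0;
    bsa.bsa_DestY       = 0;

    /* scale factors are simple ratios: 128 -> 256 doubles both axes */
    bsa.bsa_XSrcFactor  = 128;
    bsa.bsa_XDestFactor = 256;
    bsa.bsa_YSrcFactor  = 128;
    bsa.bsa_YDestFactor = 256;

    BltBitMapScale(&bsa);
}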
Another problem, with the 3.x OS routines at least, is that they are tied to an 8-bit view of the world. When you have a 16-bit RGB screen or better, having to deal with ObtainPen() and friends is just irritating beyond belief.
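A minimal sketch of what I mean, assuming rp and cm come from an already-open RTG screen: to draw a rectangle in one specific colour on what is, underneath, a pure RGB framebuffer, the OS way still makes you go through the ColorMap and rent a pen.

#include <proto/graphics.h>
#include <graphics/rastport.h>
#include <graphics/view.h>

void DrawOrangeBox(struct RastPort *rp, struct ColorMap *cm)
{
    /* colour components are 32-bit left-justified values */
    LONG pen = ObtainBestPenA(cm,
                              0xFFFFFFFF,   /* red   */
                              0x88888888,   /* green */
                              0x00000000,   /* blue  */
                              NULL);
    if (pen != -1)
    {
        SetAPen(rp, pen);
        RectFill(rp, 10, 10, 100, 50);
        ReleasePen(cm, pen);
    }
}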
CGX and P96 both allow you to get hold of the pixel representation of a chunky/RGB bitmap and do low-level custom rendering. Since they already provide that ability, what they really missed was the chance to build upon it and provide an alternative low-level graphics interface for games and multimedia apps that could be fully hardware accelerated (stretched blits, transparent blits, primitive rasterization and so on).
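To illustrate what "get hold of the pixel representation" looks like, here's a sketch using the cybergraphics.library lock calls (P96 offers the same interface through its CGX compatibility). The function name is mine, and I'm assuming the BitMap happens to be in RGB565:

#include <exec/types.h>
#include <utility/tagitem.h>
#include <graphics/gfx.h>
#include <proto/cybergraphics.h>
#include <cybergraphx/cybergraphics.h>

/* Fill a locked 16-bit (RGB565) BitMap with red, pixel by pixel.
 * Once you're in here, the card's blitter is out of the picture:
 * any stretching, transparency or rasterization is CPU work. */
void FillRed565(struct BitMap *bm, ULONG width, ULONG height)
{
    ULONG baseAddress = 0, bytesPerRow = 0;
    APTR handle;

    handle = LockBitMapTags(bm,
                            LBMI_BASEADDRESS, (ULONG)&baseAddress,
                            LBMI_BYTESPERROW, (ULONG)&bytesPerRow,
                            TAG_DONE);
    if (handle)
    {
        UWORD *row = (UWORD *)baseAddress;
        ULONG x, y;

        for (y = 0; y < height; y++)
        {
            for (x = 0; x < width; x++)
                row[x] = 0xF800;   /* pure red in RGB565 */
            row = (UWORD *)((UBYTE *)row + bytesPerRow);
        }
        UnLockBitMap(handle);
    }
}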