@patrik: the impact that using >4 colors has on chip mem speed depends heavily on how much overscan you use. If you use extreme overscan (I always did before I had a gfx card, though I did have fast mem), the impact will be a lot higher.
With 16 colors the bus is 100% loaded while the scan line data is being fetched, so the CPU stalls. It has to wait until the fetch for that line is finished, and with max overscan there are only a handful of cycles left before the next line starts fetching. AFAIR that leaves <1/2 of the non-overscan bandwidth.
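
To put rough numbers on that, here is a back-of-the-envelope sketch. The figures are my assumptions, not measurements: about 226 usable bus slots per scan line, a hires screen with 4 bitplanes (16 colors), and one slot per 16-bit word fetched. The name `free_slots` and the 46-word overscan width are only for illustration. It ignores refresh, disk, audio, sprite, copper and blitter DMA, and how many of the free slots a given CPU can actually use, so real numbers will be lower.

```python
# Rough chip-bus budget for one scan line (assumed figures, not measured).
SLOTS_PER_LINE = 226  # assumption: usable bus slots per scan line
PLANES = 4            # 4 bitplanes = 16 colors


def free_slots(words_per_plane, planes=PLANES, slots_per_line=SLOTS_PER_LINE):
    """Slots left on one line after the bitplane fetches (hypothetical helper)."""
    fetch_slots = words_per_plane * planes  # one slot per 16-bit word fetched
    return max(slots_per_line - fetch_slots, 0)


normal = free_slots(40)    # 640 pixels wide = 40 words per plane
overscan = free_slots(46)  # assumed 736-pixel overscan = 46 words per plane

print("free slots, no overscan:", normal)
print("free slots, overscan:   ", overscan)
print("fraction remaining:     ", round(overscan / normal, 2))
```

Plug in your own fetch width for `words_per_plane`; the wider the fetch window, the faster the leftover shrinks, since every extra word costs 4 slots at 16 colors.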