SIMD instructions?
vector processing instructions?
maybe also a chunky-to-planar (c2p) instruction? although it's been said that the '060 can already max out the chipmem bus bandwidth with software c2p, so maybe it isn't needed.
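
For anyone not familiar with what c2p actually has to do, here is a rough sketch in C. It is the naive version, not how a tuned '060 routine would be written (those usually work on whole 32-bit registers with merge/swap steps). The function name `c2p_8pixels` and the layout (8 pixels in, one byte per bitplane out, leftmost pixel in bit 7) are my own assumptions for illustration.

```c
#include <stdint.h>

/* Naive chunky-to-planar for one group of 8 pixels (hypothetical helper).
 * chunky[x] holds the 8-bit colour index of pixel x (x = 0 is leftmost).
 * planes[p] receives bit p of every pixel, leftmost pixel in bit 7.
 * In effect this is an 8x8 bit-matrix transpose. */
void c2p_8pixels(const uint8_t chunky[8], uint8_t planes[8])
{
    for (int p = 0; p < 8; p++) {
        uint8_t out = 0;
        for (int x = 0; x < 8; x++) {
            /* shift previous pixels left, append bit p of pixel x */
            out = (uint8_t)((out << 1) | ((chunky[x] >> p) & 1u));
        }
        planes[p] = out;
    }
}
```

That is 64 shift/mask/or steps for 8 pixels, which is the work a dedicated instruction would be collapsing.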
how about something to speed up the DCT and inverse DCT?
http://en.wikipedia.org/wiki/Discrete_cosine_transform