While much maligned, MMX was actually a pretty big leap in mainstream computing at the time. It was awkward to use for some types of processing, but using it for mixing/DSP in my audio engine sped things up considerably: up to 8x in some parts of the engine, and somewhere between 1.5x and 8x in the rest. That's a pretty darn good speedup for the time! It's one of those rare things that got me excited about computing again.
In all my years since then, I'm not sure I recall any architectural change that brought that much of a further leap in performance. Sure, SSE and all its variants were added, but they weren't actually any faster than MMX, just much easier and more precise to use.