My old 50 MHz 68030 could just about do it, but you'd probably need a 68040.
It would be cheaper and easier to use a dedicated MP3 decoder chip: you just feed MP3 data in one end and get audio data out the other. Any cheap CPU (a PIC, for example) can feed it data and check for user input.
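
To give a rough idea of how little the small CPU has to do, here is a minimal sketch in C of the feed loop. Everything in it is made up for illustration: the 32-byte FIFO, the "wants data" flag, and the names decoder_write, decoder_drain, feed_once and play are assumptions, not the interface of any particular chip. The decoder is simulated so the loop can run on a PC; on real hardware decoder_write would be a serial or parallel port write and decoder_wants_data would read a status pin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the decoder chip: a small input FIFO and a
   "wants data" flag. The FIFO size is an assumption for this sketch. */
#define FIFO_SIZE 32

typedef struct {
    size_t fill;      /* bytes currently waiting in the chip's FIFO */
    size_t total_in;  /* total bytes the CPU has pushed so far */
} Decoder;

/* True while the (simulated) chip can accept at least one more byte. */
static bool decoder_wants_data(const Decoder *d) {
    return d->fill < FIFO_SIZE;
}

/* Push one byte of MP3 data into the chip. The simulation only counts it. */
static void decoder_write(Decoder *d, uint8_t byte) {
    assert(d->fill < FIFO_SIZE);
    (void)byte;
    d->fill++;
    d->total_in++;
}

/* Simulate the chip consuming n bytes as it plays audio. */
static void decoder_drain(Decoder *d, size_t n) {
    d->fill = (n > d->fill) ? 0 : d->fill - n;
}

/* Top up the FIFO until it is full or the data runs out.
   Returns the number of bytes sent on this pass. */
static size_t feed_once(Decoder *d, const uint8_t *mp3, size_t len, size_t *pos) {
    size_t sent = 0;
    while (*pos < len && decoder_wants_data(d)) {
        decoder_write(d, mp3[(*pos)++]);
        sent++;
    }
    return sent;
}

/* Hypothetical "stop button pressed?" check; may be NULL. */
typedef bool (*StopFn)(size_t pos);

/* Main loop: feed the chip, check for user input, repeat.
   Returns how many bytes of the stream were sent. */
static size_t play(Decoder *d, const uint8_t *mp3, size_t len, StopFn stop_pressed) {
    size_t pos = 0;
    while (pos < len) {
        feed_once(d, mp3, len, &pos);
        if (stop_pressed && stop_pressed(pos)) {
            break;
        }
        decoder_drain(d, 8);  /* simulated playback between passes */
    }
    return pos;
}
```

The point is that the CPU never touches the audio itself. It only copies bytes when the chip asks for them and polls the buttons in between, which is why something as small as a PIC should be enough.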