The benchmark is just supposed to test CPU and bus speed.
This is what you're really testing:
The version of avcodec that mplayer was compiled with; there is a big difference between each version.
newlib or clib, depending on how mplayer was compiled.
OS API that newlib or clib depends on.
OS API that mplayer depends on.
The hardware you're running on.
The other programs that run in the background and steal CPU cycles.
Even when you have the same code, once you compile it, it can end up doing something completely different on a different OS.
To make any sense of anything, you need to unit test the smallest of things to find the bottlenecks.
If you have the same code running on two different OSes, and it is slower on one and faster on the other, then it's not the CPU or bus speed that is the cause of it, but how the OS and libraries were compiled and what has been optimized and what has not.
If what you're testing is not 100% the same, you never know why it is slower or faster.
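
As a rough illustration of timing the smallest pieces separately, here is a minimal Python sketch. It is not mplayer code: copy_buffer and sum_buffer are hypothetical stand-ins for whatever small routines you want to compare, and time_min and profile are helper names made up for this example. Each piece is timed several times and the smallest result is kept, on the assumption that the fastest run is the one least disturbed by background programs.

```python
import time


def time_min(func, *args, repeats=5, loops=200):
    """Return the best (smallest) per-call time in seconds over several repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for _ in range(loops):
            func(*args)
        elapsed = time.perf_counter() - start
        # Keep the fastest repeat: it is the one least affected by other programs.
        best = min(best, elapsed / loops)
    return best


# Hypothetical stand-ins for the "smallest things" being measured.
def copy_buffer(buf):
    return bytes(buf)


def sum_buffer(buf):
    return sum(buf)


def profile(pieces, *args):
    """Time each piece on its own and return (name, seconds), slowest first."""
    results = [(name, time_min(func, *args)) for name, func in pieces.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    data = bytearray(64 * 1024)
    pieces = {"copy_buffer": copy_buffer, "sum_buffer": sum_buffer}
    for name, seconds in profile(pieces, data):
        print(f"{name}: {seconds * 1e6:.1f} us per call")
```

Running the same small pieces on each OS and comparing them one by one shows which layer is slower, which a single end-to-end playback number cannot do.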