Benchmarking between systems isn't just a matter of running a program with the same parameters on two systems and timing the difference. You have to be certain that ALL other parameters are equal.
If you are attempting a scientific test of a microprocessor running a specific piece of code, then the above might sometimes be valid. But the result will be practically meaningless, because a bare microprocessor is not what people buy.
In this case, it is whole systems being compared, with neither the same hardware nor the same operating system. The supported compiler versions are likely to differ, and the same version of the code might not even compile on both.
It is not some small part of the system being compared but the entire thing. The question is how fast system X can produce a result compared to system Y.
Look at the industry-standard benchmarks, SPEC and TPC. They run on wildly different systems, and you are allowed to compile them however you want, using whatever compiler you want, on whatever OS / hardware combination you want. There is no requirement other than that you can actually buy the system.
Consider the TOP500 supercomputer list. It ranks systems where *everything* is different. You are not only allowed to modify the source code, you are expected to!