If you are attempting a scientific test of a microprocessor running a specific bit of code, then the above might be valid, sometimes. But the result will be completely meaningless, because that's not what people buy.
It's still meaningless. If you want, I can benchmark a Core i7 and have it produce results slower than an Amiga 500. That's why it's meaningless: you need to know all the facts, and there have to be constraints.
If a benchmark isn't the same across machines, then it has to be transparent, and the differences must be documented.
It's kind of like running Quake benchmarks. One person runs it at 1600x1200 with a software renderer; the other runs GLQuake at 320x200.
Which is the better computer? Without all the facts, you can't say. Ergo the benchmark is useless.
It is not some small part of the system being compared but the entire thing. It's how fast system X can produce a result compared to system Y.
Which only makes that benchmark valid for that operation using that software, and it shouldn't be used in the computing equivalent of "my dad can beat up your dad" arguments. It tells you nothing, because it doesn't tell you where the bottleneck is.
You could run the same task now and get a wildly different result, so you're really testing the software (which can change) and attributing the result to the system. That is not good benchmarking! At the very least, the EXACT system being benchmarked should be made clear, along with the software used, the compiler, and the compilation options. For all we know he could have used numbers from a debug build!
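Just to illustrate what "recording the facts" could look like in practice, here's a minimal Python sketch. The dummy workload, the field names, and the call to `cc` are assumptions for illustration only, not any standard benchmark format; the point is simply that the number travels with its context.

```python
# Minimal sketch: capture the environment alongside the result so the
# number can be compared later. Workload and field names are placeholders.
import json
import platform
import subprocess
import time


def run_benchmark():
    # Placeholder workload; substitute the real task being measured.
    start = time.perf_counter()
    sum(i * i for i in range(10_000_000))
    return time.perf_counter() - start


def compiler_version(cmd="cc"):
    # Best-effort query of the system C compiler; "unknown" if unavailable.
    try:
        out = subprocess.run([cmd, "--version"], capture_output=True, text=True)
        return out.stdout.splitlines()[0]
    except (OSError, IndexError):
        return "unknown"


report = {
    "machine": platform.machine(),
    "processor": platform.processor(),
    "os": platform.platform(),
    "python": platform.python_version(),
    "compiler": compiler_version(),   # compiler used to build the software under test
    "build_flags": "-O2",             # record the ACTUAL flags; "-O2" is just an example
    "elapsed_seconds": run_benchmark(),
}
print(json.dumps(report, indent=2))
```

With something like that attached to every posted number, at least nobody has to guess whether it came from a debug build.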
Consider the Top500 supercomputer list. It tests systems where *everything* is different. You are not only allowed to modify the source code but you are expected to!
I suspect those results are more tightly regulated. I severely doubt the Top500 is worked out by trawling forums for numbers without even knowing what changes were made or what the systems were, which is how that graph came about.