the GX (Gekko) has added SIMD instructions (also found in the Nintendo Gamecube), so the G4's vector processing advantage is lessened. The GX is also a newer design, so it's internally more efficient than the G4 for most things.
The 750GX has almost nothing in common with the Gamecube's Gekko, which seems to be just a simplified 750CXe with a basic SIMD unit smashed on. The 750GX, like all the current 750 series, has no SIMD capability at all.
My pet theory is that the GX's large L2, coupled with improvements in the way the cache is managed and interfaced to the core, has reduced the number of times the pipeline is stalled by main memory accesses to some critical point where the core can get much closer to its theoretical maximum throughput than earlier 750 designs can.
And that theoretical maximum is significantly higher for the 750's 4-stage core, at least on integer code, than for the G4's 7-stage core. The G4 is supposed to counter that by scaling to higher clocks than the 750 (which it does, of course: 1.6GHz against 1.0GHz currently), but the G4s on the A1 and Peg2 are simply not clocked high enough to overcome the 750GX's advantages.
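To put rough shape on that argument, here is a back-of-the-envelope sketch in Python. It is only the standard textbook stall model (cycles per instruction = ideal CPI + memory stall cycles + pipeline flush cycles), not anything measured on real hardware. The function names and every number you would feed into it (miss rates, penalties, flush costs) are my own assumptions for illustration, not IBM or Motorola figures.

```python
def effective_cpi(base_cpi, misses_per_instr, miss_penalty,
                  mispredicts_per_instr, flush_penalty):
    """Average cycles per instruction under a simple additive stall model.

    base_cpi              -- ideal CPI with no stalls (assumed value)
    misses_per_instr      -- accesses that go all the way to main memory,
                             per instruction (a bigger, better-managed L2
                             lowers this)
    miss_penalty          -- cycles lost per main memory access
    mispredicts_per_instr -- branch mispredictions per instruction
    flush_penalty         -- cycles lost per misprediction (roughly grows
                             with pipeline depth)
    """
    memory_stalls = misses_per_instr * miss_penalty
    flush_stalls = mispredicts_per_instr * flush_penalty
    return base_cpi + memory_stalls + flush_stalls


def instructions_per_second(clock_hz, cpi):
    """Throughput is simply clock rate divided by average CPI."""
    return clock_hz / cpi
```

Plug in a lower `misses_per_instr` for the chip with the bigger L2 and a larger `flush_penalty` for the deeper pipeline, and you can see how a part with a lower clock can still come out ahead once the stall terms shrink enough. The actual values are anyone's guess without real measurements.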
If IBM can ever cure the Condition Register bug and actually get the 750GX working reliably above 1GHz, then even the latest 1.6GHz G4 will have trouble staying ahead of it.