Incidentally, I ended up getting my OGL code working with glTexSubImage2D, and it's a lot faster than CALayer (using an image in RGB5551, which is the fast path in CG). I now get a consistent 60fps in full screen with 2x scaling on i4 (960x640).
Using shader effects has been interesting. It doesn't take much to drop below 60fps, and dependent texture reads totally annihilate perf on the retina display. It dropped to as low as 23fps with a simple 2-pixel multi-texture shader to create a scanline effect (and that's pretty consistent no matter how big the effect texture is). If you calculate tex-coords inside the shader, perf goes out the window.
Implemented a native scanline shader effect (see below) that doesn't touch a second texture, and it's consistently back at 60fps, so it's definitely the texture reads.
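varying mediump vec2 TextureCoordOut;
uniform sampler2D DisplayTexture;
uniform lowp float EffectAmount;

void main(void)
{
    // Single texture read, using the interpolated coords unmodified.
    gl_FragColor = texture2D(DisplayTexture, TextureCoordOut);
    // Alternate alpha between even and odd screen rows; no effect texture needed.
    gl_FragColor.a = mod(gl_FragCoord.y, 2.0) + EffectAmount;
}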
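If a second texture is still wanted for the effect, one thing that may be worth trying is moving the tex-coord maths out of the fragment shader and into the vertex shader, so both texture reads use varyings exactly as they arrive. This is only a rough, untested sketch, and whether it would get back to 60fps here is not known. Position, TextureCoord, ScanlineRepeat, ScanlineCoordOut and ScanlineTexture are made-up names for illustration, and it assumes the effect texture is power-of-two with GL_REPEAT wrapping.

```glsl
// Vertex shader (sketch; attribute and uniform names are hypothetical)
attribute vec4 Position;
attribute vec2 TextureCoord;
uniform mediump float ScanlineRepeat;   // hypothetical: how many times the effect texture repeats vertically
varying mediump vec2 TextureCoordOut;
varying mediump vec2 ScanlineCoordOut;

void main(void)
{
    gl_Position = Position;
    TextureCoordOut = TextureCoord;
    // Do the coordinate maths once per vertex rather than once per pixel.
    ScanlineCoordOut = vec2(TextureCoord.x, TextureCoord.y * ScanlineRepeat);
}
```

```glsl
// Fragment shader (sketch): both reads use their varyings untouched
varying mediump vec2 TextureCoordOut;
varying mediump vec2 ScanlineCoordOut;
uniform sampler2D DisplayTexture;
uniform sampler2D ScanlineTexture;      // hypothetical effect texture

void main(void)
{
    lowp vec4 colour = texture2D(DisplayTexture, TextureCoordOut);
    lowp vec4 scan = texture2D(ScanlineTexture, ScanlineCoordOut);
    // Darken the display colour by the scanline pattern, keep the original alpha.
    gl_FragColor = vec4(colour.rgb * scan.rgb, colour.a);
}
```

Each set of coords gets its own vec2 varying here so the fragment shader never has to modify or swizzle a coordinate before sampling.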
Cheers,
Stu