I don't agree with this. There are many differences among PPCs.
1) Some of them have Altivec and some don't. (Is that a small difference??)
2) Clock speeds range from less than 300 MHz (I don't remember exactly, I guess the first 603s were even less than 200 MHz) up to 1.4 GHz. And the 970 is coming...
3) The number of functional units, and so the degree of parallelism, is different.
Both families have their peculiarities, but I think that...
Some 680x0s don't have an FPU and others don't have an MMU. That's a big difference in my opinion. Some 680x0s have caches and others have none at all. Some are superscalar (like the 060) and others are not...
And the first generations have instructions that don't work on later ones. For example some 68000 instructions, or the change from the 882 to the 040 or 060 FPU...
On the other hand, afaik the instruction set hasn't changed across the PPC series, and code written for the first chips works without problems on the latest ones, with no changes and no emulation of missing instructions. OK, Altivec is a BIG change, big enough to make it interesting to develop demos/intros just to get the most out of that unit.
Speed. Well, going from 300 MHz to 1.4 GHz means the clock is about 4.7 times higher (roughly 467%).
From a 7 MHz 68000 to a 50 MHz 68030 is a bigger change: we go from a 16-bit bus to a 32-bit one, and the clock is about 7.1 times higher (714%) if we only look at the MHz figures.
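If you want to check the two clock figures yourself, here is a minimal sketch. It just expresses the new clock as a percentage of the old one; the function names are mine, only for illustration.

```c
#include <stdio.h>

/* New clock expressed as a percentage of the old clock. */
double clock_percent(double old_mhz, double new_mhz)
{
    return 100.0 * new_mhz / old_mhz;
}

/* Print both comparisons: PPC (300 MHz -> 1.4 GHz) and
   680x0 (7 MHz 68000 -> 50 MHz 68030). */
void print_clock_comparison(void)
{
    printf("PPC   300 -> 1400 MHz: %.0f%%\n", clock_percent(300.0, 1400.0));
    printf("680x0   7 ->   50 MHz: %.0f%%\n", clock_percent(7.0, 50.0));
}
```

Of course this only compares MHz; it says nothing about bus width, caches or how many instructions per clock each chip can do.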
So there's more difference between 680x0 generations than between different PPCs. PPCs are all superscalar while most 680x0s are not, and you have to take care of this...
A PPC without Altivec versus one with Altivec is as different as a 68030 without an FPU versus one with an FPU. And it's funnier still, because Altivec instructions haven't changed and aren't emulated, while the 040's FPU instructions are different from the 882's, etc...
So I think that there are more differences in the 680x0 family than in the PPC family. In the 680x0 family we even have the 68008 with an 8-bit data bus, but I will not count it because it hasn't been used in Amigas.
That is what I mean, I am even interested in optimizing for a specific board, not just for a specific CPU!
There's nothing stopping you from optimizing for a specific AmigaOne with a G4 and also making optimized code for the AmigaOne G3 if you want. OK ;-) I know that you have no interest in RTG, but if the problem was the CPU, it need not be a problem: you can optimize for different CPU modules (and if different boards appear, for different boards).
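To show what I mean by optimizing per CPU module, here is a rough sketch in C. Everything in it is hypothetical: the `cpu_kind` enum, the `fade_*` routines and `pick_fade` are names I made up, and how you detect the CPU is left to your own startup code. The "G4" routine is only a stand-in (unrolled scalar code); a real one would be written for the vector unit.

```c
#include <stddef.h>

/* Hypothetical CPU classes; detection is left to your startup code. */
enum cpu_kind { CPU_G3, CPU_G4 };

typedef void (*fade_fn)(unsigned char *buf, size_t n);

/* Plain version: halve each pixel value, one byte at a time. */
void fade_generic(unsigned char *buf, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        buf[i] >>= 1;
}

/* Stand-in for a G4 version. A real one would use Altivec;
   this placeholder only unrolls by four so the sketch stays portable. */
void fade_g4(unsigned char *buf, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        buf[i]     >>= 1;
        buf[i + 1] >>= 1;
        buf[i + 2] >>= 1;
        buf[i + 3] >>= 1;
    }
    for (; i < n; i++)   /* leftover bytes */
        buf[i] >>= 1;
}

/* Choose the routine once at startup, then call it through the pointer. */
fade_fn pick_fade(enum cpu_kind cpu)
{
    return cpu == CPU_G4 ? fade_g4 : fade_generic;
}
```

You pick the routine once and the rest of the demo doesn't care which CPU it runs on. The same trick works per board if boards start to differ.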
I'm not going to talk again about different graphics cards; I think my point is clear: for me the biggest bottleneck is the CPU. Anyway, if you don't have problems supporting different CPU boards (that give around 50 MB/s with fast RAM like yours, or around 30 MB/s like mine), you wouldn't have problems supporting a few gfx boards where the only difference is bandwidth. You have that problem with AGA too if you are using a CBM 3640 in an A4000 and someone else is using another board, for example yours... yours will give nearly 7 MB/s and the poor 3640 only 4 MB/s... So the problem is not bandwidth; the most important thing I see is the use of specific AGA features like the ones psyco and you have talked about (blitter, copper, perfect sync, sprites...)
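To put the 7 MB/s versus 4 MB/s figures in perspective, here is a small sketch that turns a copy bandwidth into an upper bound on full-screen updates per second. The screen size in the comment (320x256 at one byte per pixel) is my own assumption for illustration, I take 1 MB as 10^6 bytes, and it ignores everything except the raw copy.

```c
/* Upper bound on full-screen copies per second for a given bandwidth.
   Assumes 1 MB = 1,000,000 bytes and that the copy is the only cost. */
double max_fps(double mb_per_s, int width, int height, int bytes_per_pixel)
{
    double frame_bytes = (double)width * height * bytes_per_pixel;
    return mb_per_s * 1000000.0 / frame_bytes;
}

/* Example (assumed 320x256, 1 byte per pixel = 81920 bytes per frame):
   max_fps(7.0, 320, 256, 1) is about 85, max_fps(4.0, 320, 256, 1) about 49. */
```

So the slower board still has room for a full-screen update every frame at this size; it just has less headroom left.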
You can optimize a lot, but sometimes people who have a machine slower than the one you picked as the target will try it, and it will run unoptimized and slowly. And if I do a demo for my mk2, I'll find that it could be optimized further for the mk3... So we have more control with AGA, but you can only optimize to 99% for a few machines. There are always small differences between machines that mean not every machine gets used at 100%.
Why use RTG instead of a PC? Well, you still have the CPU and the rest of the Amiga. I think that writing optimized Altivec code could be quite interesting... I'm not talking about converting existing routines to work with Altivec (that usually doesn't give such impressive results), but designing new routines from scratch to make the best use of the vector unit. It could be as much fun as coding the DSP in a Falcon.
And talking about 3D libraries, afaik Warp3D works at a lower level than Direct3D and OpenGL, so you can optimize your code a lot. Just look at the Warp3D demos using a Virge and compare that to a Virge in a PC. You will see that the Amiga Virge 3D stuff runs faster.
People with a real interest in demos usually have scandoublers. For example I have two, both 24-bit. Don't use the DCE/phase5 ones like the one included with the CV3D; it's not 24-bit, and you will notice it soon if you are used to the real colours of demos and look at some gradients.
I agree somewhat with MagicSN: it's important to be able to watch the demo. For example, A3000 users may be quite happy watching an RTG production. I guess that A2k users would prefer watching something rather than nothing. But this is always the decision of the makers of the demo; if they had to care about everything that exists (like making a demo that uses an A500 at 100%, but uses Warp3D and AHI if you have a 4000PPC/Voodoo5...), that would be ridiculous and very tedious, testing every hardware configuration.