Author Topic: Demos using a GFX mode please ! (Read 30481 times)

Crumb · « **Reply #14 from previous page:** February 06, 2003, 11:23:08 AM »

@psycho
thanks for joining the thread

Quote

you actually have to differentiate between 1260 and 4060.

I've seen that a 1240/40 is slower than a 4040/40 when writing to chipram, so I understand that. But looking at the web page of speeds provided by Darkcoder, the speeds of a Blizzard 1260 and a CS4060 MK2 seem to be quite similar. I guess you do that because Apollo 1260s seem to write more slowly to chipram. Now I see why you use 2 different c2p routines :-)

Quote

1) mode changes is a bad idea. So no hires (ham8) pictures etc..

What about using an ASL requester before starting the demo? Hmm, although I guess you do it because some monitors need too much time to switch from one mode to another.

Quote

2) no hardware sprites. So overlays have to be rendered on top of the buffer, which is slower and doesn't allow smooth interrupt-driven movement on top of less smooth routines.

And what if you use BltMaskBitMapRastPort() with an image you keep in gfx card memory? The blitter of gfx cards is usually faster, doesn't that help? Couldn't an interrupt server be set up to run every 1/50 second, for example, and do a BltMaskBitMapRastPort() with the image we want to draw? I think that would work at 50fps, and we could have a slow routine in the background running at 15fps, for example, that gets interrupted to paint our bob; the slow routine would then continue painting its screen. This is just an idea of how I would try to do it... I haven't tried it and I don't have a clue whether it works, but I think it should.

Quote

3) No well known refresh rates. So you can't do an exactly 25/50fps effect, and remember that exactly 25fps on a 50hz display would usually look smoother than 35fps on hz display. (ie. I can only recommend TheCastle.cgx to noaga or winuae ppl).

Yes, that's difficult to do with a gfx card :-/
I would try to read the actual refresh rate of the screen and choose the closest submultiple to the frame rate I want. For example, with a 72hz screen I would try to use 24fps. With a 60hz screen, 20 or 30, depending on whether you want it smoother or think that your graphic functions and graphic buffer will help enough. There's a function called WaitTOF() that should help to keep the graphics synchronized with the vsync (afaik the bad point is that some gfx cards don't have a vsync interrupt at all). Then you may use all the gfx ram as a buffer as you did in "The Castle". The test to get the maximum fps should be done once you have opened the screen. Anyway, AGA is more elegant for doing that...
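The "closest submultiple" choice can be sketched in a few lines of C. This is only an illustration under assumptions: the function name `pick_frame_divisor` is made up, the refresh rate is taken as a whole number of hz that has already been read from the opened screen, and a tie is resolved in favour of the higher frame rate.

```c
#include <assert.h>

/* Returns the divisor n (n >= 1) for which refresh_hz / n is closest to
   wanted_fps. On a tie the smaller divisor (the higher frame rate) wins.
   Hypothetical helper, not part of any Amiga library. */
unsigned pick_frame_divisor(unsigned refresh_hz, unsigned wanted_fps)
{
    unsigned long refresh = refresh_hz;
    unsigned long wanted = wanted_fps;
    unsigned long best_n = 1;
    unsigned long best_num;
    unsigned long n;

    assert(refresh_hz > 0 && wanted_fps > 0);

    /* The error for divisor n is |refresh - wanted * n| / n. Only the
       numerator is stored; two errors are compared by cross-multiplying,
       so no floating point is needed. */
    best_num = refresh > wanted ? refresh - wanted : wanted - refresh;

    for (n = 2; n <= refresh; n++) {
        unsigned long prod = wanted * n;
        unsigned long num = refresh > prod ? refresh - prod : prod - refresh;
        if (num * best_n < best_num * n) {
            best_n = n;
            best_num = num;
        }
    }
    return (unsigned)best_n;
}
```

With a 72hz screen and a wanted rate of 25fps this returns 3, so a new frame would be shown on every third vertical blank (24fps). With 60hz and 25fps both 30 and 20 are equally far away, and the sketch picks the divisor 2 (30fps).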

Quote

no palette-synchronization (ie a new palette for every frame in an effect).

If you wait until the frame is completed and then change the palette, wouldn't it work? :-(
And with WaitTOF()?
I'm talking about 8bit screens; with 16 bits it would be a lot of work... gfx cards seem to be quite fast at changing the palette, so I thought it would be possible to change it on every refresh.

I'm sorry for any mistakes I've made, but I don't have much experience. What do you think of some of my ideas? I hope at least one of them works ;-D

@darkcoder:

Quote

You say it's better to do:

write 4 long
some instructions
write 4 long

??

Might that be thanks to the small cache the processor has for writing in burst mode? That may be easy to do in asm, but I'm not sure if I could do that in C... maybe using a pointer to the cache address? I'm not sure. I use a 32bit variable that will probably be stored in a register... I don't know how to control the cpu cache in so much detail in C...
I have another doubt... when I write something to ram it goes to the burst cache (I think it's also called the write pending buffer), but is it copied to ram as soon as I try to write another longword, or does the cpu wait until the burst cache is full? I've read somewhere that when the cpu accesses Zorro III the caches are flushed.
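For what it's worth, the "write 4 long, some instructions, write 4 long" pattern can at least be expressed at source level in C. The following is only a sketch under assumptions: `shade` is a made-up stand-in for "some instructions", the names are hypothetical, and nothing in C guarantees that the compiler keeps this order or that the cpu merges the four stores; checking the generated asm would still be necessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-longword work standing in for "some instructions". */
static uint32_t shade(uint32_t x)
{
    return x * 3u + 1u;
}

/* Fills dst[0..count-1] with shade(src[i]). count must be a multiple of 4.
   Each loop pass first computes four results into local variables (which
   the compiler will hopefully keep in registers) and only then stores the
   four longwords to consecutive addresses, back to back. */
void fill_in_groups_of_four(uint32_t *dst, const uint32_t *src, size_t count)
{
    size_t i;

    assert(count % 4 == 0);

    for (i = 0; i < count; i += 4) {
        /* "some instructions": all the computation for this group */
        uint32_t a = shade(src[i]);
        uint32_t b = shade(src[i + 1]);
        uint32_t c = shade(src[i + 2]);
        uint32_t d = shade(src[i + 3]);

        /* "write 4 long": four stores with nothing in between */
        dst[i] = a;
        dst[i + 1] = b;
        dst[i + 2] = c;
        dst[i + 3] = d;
    }
}
```

Whether this helps at all depends on the cpu and on the memory being written, which is exactly the open question above.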

Best Regards from a newbie ;-D

Crumb · « **Reply #15 on:** February 06, 2003, 02:25:42 PM »

Quote

Last time I checked (ok, it's a long time ago) BltMaskBitMapRastPort was not accelerated.

I haven't tried much... you are probably right (but it's a pity that it's not accelerated). With 16bit screens I guess that it will be done by the cpu, with 8 bits it may be different.

Thanks for the tips about using WaitBOVP() instead of WaitTOF() ;-)

Crumb · « **Reply #16 on:** February 06, 2003, 10:51:25 PM »

Quote

I don't agree with this. There are many differences in PPC.
1) Some of them have Altivec and some not. (is it a small difference??)
2) clock speeds range from less than 300 Mhz (I don't remember exactly, I guess the first 603 were even less than 200Mhz) up to 1.4Ghz. And the 970 is coming..
3) the number of functional units => the degree of parallelism is different.

Both families have their peculiarities, but I think that...
Some 680x0 don't have an FPU and others don't have an MMU. That's a big difference in my opinion. Some 680x0 have caches and others don't have any at all. Some are superscalar (like the 060) and others are not...
And the first generations have instructions that don't work on later ones. For example some 68000 instructions, or the change from the 882 to the 040 or 060 FPU...

On the other hand, afaik the instruction set hasn't changed in the ppc series, and code written for the first series works without problems on the latest ones, without making changes or emulating missing instructions. Ok, Altivec is a BIG change, big enough to make it interesting to develop demos/intros just to get the most out of that unit.

Speed. Well, going from 300Mhz to 1.4Ghz multiplies the frequency by about 4.7 (467%).
Going from a 7Mhz 68000 to a 50Mhz 68030 is a bigger change: we move from a 16bit bus to a 32bit one, and the clock alone is multiplied by about 7.1 (714%).
So there's more difference between different 680x0 generations than between different ppcs. PPCs are all superscalar while most 680x0 are not, and you have to take care about this...
A ppc without Altivec versus one with Altivec is as different as a 68030 without FPU versus one with FPU. And it's even funnier because Altivec instructions haven't changed and aren't emulated, while 040 FPU instructions are different from the 882's, etc...
So I think that there are more differences in the 680x0 family than in the ppc family. In the 680x0 family we even have the 68008 with an 8 bit data bus, but I will not count it because it hasn't been used in Amigas.

Quote

That is what i mean, I even have interest in optimizing for a specific board, not just for a specific CPU!

There's nothing stopping you from optimizing for a specific AmigaOne with a G4 and making optimized code for the AmigaOne G3 if you want. Ok ;-) I know that you aren't interested in RTG, but if the problem was the cpu, it need not be a problem: you may optimize for different cpu modules (and if different boards appear, for different boards).

I'm not going to talk again about different graphic cards, I know that my point is clear: for me the biggest bottleneck is the cpu. Anyway, if you don't have problems supporting different cpu boards (ones that give around 50MB/s with fastram like yours and ones that give around 30MB/s like mine), you wouldn't have problems supporting a few gfx boards where the only difference is the bandwidth. You have that problem with AGA too if you are using a cbm 3640 in an A4000 and someone else is using another board, for example yours... yours will give nearly 7MB/s and the poor 3640 only 4MB/s... so the problem is not bandwidth; the most important thing I see is the use of specific AGA features like the ones psycho and you have talked about (blitter, copper, perfect sync, sprites...)

You can optimize a lot, but sometimes people who have a machine slower than the one you have chosen as the target will try it, and it will run unoptimized and slowly. And if I do a demo for my mk2 I'll find that it could be more optimized for the mk3... so we have more control with AGA, but you can only optimize to 99% for a few machines. There are always small differences between machines that mean not all of them are used at 100%.

Why use RTG instead of a PC? Well, you still have the cpu and the rest of the Amiga. I think that writing optimized altivec code could be quite interesting... I'm not talking about converting existing routines to work with altivec (that usually doesn't give such impressive results), but designing new routines from scratch to make the best use of the vector unit. It could be as fun as coding the DSP in a Falcon.

And talking about 3D libraries, afaik Warp3D works at a lower level than Direct3D and OpenGL, so you can optimize your code a lot. Just look at the Warp3D demos using a Virge and compare that to a Virge in a PC. You will see that Amiga Virge 3D stuff runs faster.

People with a real interest in demos usually have scandoublers. For example I have two, both 24bit. Don't use DCE/phase5 ones like the one included with the CV3D; it's not 24bit and you will notice it soon if you are used to the real colours of demos and see some gradients.

I agree somewhat with MagicSN, it's important to be able to watch the demo. For example, A3000 users may be quite happy watching an RTG production. I guess that A2k users would prefer watching something rather than nothing. But this is always the decision of the makers of the demo; it would be ridiculous and very tedious if they had to care about everything that exists and test every hardware configuration (like making a demo that uses an A500 at 100%, but uses warp3d and AHI if you have a 4000PPC/voodoo5).

Crumb · « **Reply #17 on:** February 07, 2003, 05:37:03 PM »

@darkcoder & the AGA gurus

Quote

There is not much to answer about slow speed of AGA with 18 bit HAM8 mode or hires/superhires displays. IT IS slow, I know.

HAM8 uses 8 bitplanes, so copying to it shouldn't be much slower than copying to a normal 8 bitplane screen... but it requires lots of cpu power... 040s at 40Mhz achieve almost copyspeed when they do chunky to planar; wouldn't it be possible with an 060/50 to also do the conversion from 16 bits to ham8? Converting from a 16bit screen to ham8 may take cpu power, but once the conversion is done, the AGA display should be refreshed as fast as if it wasn't using a ham8 mode. I'm not sure if an 060 will be able to do this, but a ppc should be able to do this kind of stuff.

If the game is designed to run at least at 640x480, the chunky to planar routines can be done as in "Phase One" of Capsule: if you are using an interlaced screen there's no need to convert the lines that aren't going to be shown in the current refresh, so the amount of data that has to be written to chipram is halved. So the speed using interlaced modes will be even higher... With a ppc board with a 604e/233 the frame rate should be lower than with a graphic card, but it should still be playable...
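The "only convert the lines of the current field" idea can be sketched with a deliberately naive c2p in C. This is not how a fast c2p is written, and it is not the routine from the demo; the function name, the separate-plane layout and the "leftmost pixel in the top bit" order are assumptions made for the illustration. The only point is the outer loop, which steps two lines at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PLANES 8

/* Converts only the lines of one interlace field (field 0 = even lines,
   field 1 = odd lines) from an 8bit chunky buffer into 8 bitplanes.
   chunky holds width * height bytes; each planes[p] holds
   (width / 8) * height bytes. Lines of the other field are not touched,
   so only half of the planar data is written per call. */
void c2p_field(const uint8_t *chunky, uint8_t *planes[PLANES],
               size_t width, size_t height, unsigned field)
{
    size_t bytes_per_row = width / 8;
    size_t y, xb;
    unsigned p, bit;

    assert(width % 8 == 0 && field < 2);

    for (y = field; y < height; y += 2) {        /* skip the other field */
        const uint8_t *src = chunky + y * width;
        for (xb = 0; xb < bytes_per_row; xb++) {
            for (p = 0; p < PLANES; p++) {
                uint8_t out = 0;
                /* Collect bit p of 8 neighbouring pixels into one byte,
                   leftmost pixel in the most significant bit. */
                for (bit = 0; bit < 8; bit++) {
                    out = (uint8_t)((out << 1) |
                                    ((src[xb * 8 + bit] >> p) & 1u));
                }
                planes[p][y * bytes_per_row + xb] = out;
            }
        }
    }
}
```

Calling it with field 0 on one refresh and field 1 on the next converts the whole picture over two refreshes, with half the chipram writes each time.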

Best Regards
Crumb

Crumb · « **Reply #18 on:** February 07, 2003, 06:58:36 PM »

Quote

the truecolor-18bit HAM8 mode is a SHRES screen used to simulate a LORES one.

I read something about that technique on the amycoders web site, but I thought that there was a way to do that in plain HIRES or LORES... :-/
I still don't know why SHRES is used with HAM8; I understand that HAM6 in SHRES helps to simulate a hi/true color screen but...
why isn't it faster to use a real LORES or HIRES ham8 screen instead of a SHRES ham8 one that needs a lot more resources?
Sorry for my ignorance... is it to make each group of 4 pixels "independent" of the last pixel written, or what? Wouldn't it be fast enough to store the colour of the previous pixel in a register or variable?

Crumb · « **Reply #19 on:** February 08, 2003, 02:38:55 PM »

Quote

btw that capsule demo aint truecolor btw , but its a well designed demo .

Yes, I wrote about that demo because the idea of only doing c2p on the lines that are being shown is quite good in my opinion (I haven't seen many hi-res demos).
Whether the screen is 8bit or ham8 is irrelevant for implementing that idea. Well... now that you are here, could you explain (if you aren't very busy and if you want) in a few lines why SHRES is used instead of LORES to make hi/true color screens? Sorry for insisting but I'm quite curious about this...

Quote

i have a feeling that next u want the c64 guys to make RTG demos also, eh??

No, because there are few people with gfx cards in c= 64s... but on the Amiga it's now quite usual to have a gfx card, I know more people with gfx cards than people without one... anyway I understand your point, because although coding for RTG may be fun, AGA has a lot more interesting features to play with. People should understand that some AGA demos can't be done for RTG easily, so there's not much sense in almost rewriting the entire demo to make it RTG compatible. The situation is different with those demos that don't use many special AGA features: they can be easily converted and make a nice present for A3000/A2000 users.
Friendship rules ;-D

Crumb · « **Reply #20 on:** February 10, 2003, 05:17:00 PM »

@Darkcoder

Quote

the usual HAM8 is a planar mode. The 18bit HAM8 is a technique to do a sort of truecolor chunky mode. (yep a fake one, as Agony says)

I know that HAM8 is a planar mode... what I want to know is why no one has done a routine that converts a 24/16bit chunky pixel fastram buffer to lores planar ham8 screens. TurboEVD uses a HAM8 mode to display modes like 640x480 in pseudo 15 bits, and it seems to do it without using a SHRES screen mode.
I'm not talking about putting 1 red, 1 green and 1 blue pixel very close together in SHRES to fake a 24 bit mode, I'm talking about a screen where each colour depends on the previous pixel... a ham planar one. And a routine that turns a 16bit or 24bit chunky buffer into a ham8 planar screen. Chunky to planar, but using more colours in the buffer and ham8 on the screen. Am I explaining better now what I want to know?
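To make the question concrete, a routine of that kind could start from a greedy per-line encoder like the sketch below. It is only an illustration under assumptions: the source pixels are already reduced to 6 bits per channel, every line starts from black, the 64 palette entries (the "set" code) are never used, and packing the result into bitplanes is left out. All names are made up. The control codes are the usual HAM ones as far as I know: 00 = palette entry, 01 = modify blue, 10 = modify red, 11 = modify green.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* HAM control codes (HAM_SET is listed for completeness, unused here). */
enum ham_op { HAM_SET = 0, HAM_MOD_BLUE = 1, HAM_MOD_RED = 2, HAM_MOD_GREEN = 3 };

struct rgb6 { uint8_t r, g, b; };             /* each component 0..63 */
struct ham_pixel { uint8_t op; uint8_t value; /* 6 bit data */ };

/* Greedy encoder for one scanline: the colour of the previous pixel is
   kept in cur, and for each new pixel the component that is furthest
   from the wanted colour is replaced. The other two components keep the
   previous pixel's values. */
void ham8_encode_line(const struct rgb6 *src, struct ham_pixel *dst, size_t n)
{
    struct rgb6 cur = {0, 0, 0};  /* assumption: the line starts from black */
    size_t i;

    for (i = 0; i < n; i++) {
        int dr, dg, db;

        assert(src[i].r < 64 && src[i].g < 64 && src[i].b < 64);

        dr = abs((int)src[i].r - (int)cur.r);
        dg = abs((int)src[i].g - (int)cur.g);
        db = abs((int)src[i].b - (int)cur.b);

        if (dr >= dg && dr >= db) {
            dst[i].op = HAM_MOD_RED;
            dst[i].value = src[i].r;
            cur.r = src[i].r;
        } else if (dg >= db) {
            dst[i].op = HAM_MOD_GREEN;
            dst[i].value = src[i].g;
            cur.g = src[i].g;
        } else {
            dst[i].op = HAM_MOD_BLUE;
            dst[i].value = src[i].b;
            cur.b = src[i].b;
        }
    }
}
```

The sketch also shows the limitation: since only one component can change per pixel, going from black to white needs three pixels, so sharp edges get coloured fringes unless palette entries are used as well.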

Crumb · « **Reply #21 on:** February 11, 2003, 11:02:21 AM »

@Darkcoder&Ag0ny
thank you for your explanation. The rasterline example is very good to show the limitations of the ham mode.

Umm remember the values of chipram speed given by that URL? In the TurboEVD documents Aki shows the difference of chipram access speed using different screenmodes, it's quite interesting.

| Screen mode | Speed in MB/s |
|---|---|
| PAL: Lores 320x256x2 | 6.2 |
| PAL: Lores 320x256x256 | 5.2 |
| DBLPAL: Lores 320x256x256 | 4.2 |
| PAL: Hires Lace 640x512x2 | 6.2 |
| PAL: Hires Lace 640x512x16 | 6.0 |
| PAL: Hires Lace 640x512x256 | 4.3 |
| Multiscan: Productivity 640x480x16 | 5.7 |
| Multiscan: Productivity 640x480x256 | 2.1 |

I don't know what machine he used for testing but it gives a good idea about how AGA performance drops when you increase the resolution.
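To get a feeling for what those figures mean, the chipram speed can be turned into an upper bound on full-screen updates per second. A rough sketch under assumptions: 1 MB is taken as 1,000,000 bytes (the documents don't say), the whole bandwidth goes into writing the screen, and the last number of each mode is the colour count, so x256 means 8 bits per pixel. The function name is made up.

```c
#include <assert.h>

/* Upper bound on full-screen writes per second for a planar screen of
   width x height pixels with bits_per_pixel bitplanes, given a chipram
   write speed in MB/s (1 MB assumed to be 1,000,000 bytes). */
double max_fps(unsigned width, unsigned height, unsigned bits_per_pixel,
               double mb_per_s)
{
    double bytes_per_frame = (double)width * height * bits_per_pixel / 8.0;

    assert(bytes_per_frame > 0.0);

    return mb_per_s * 1000000.0 / bytes_per_frame;
}
```

For PAL Lores 320x256x256 at 5.2 MB/s this gives a bit over 63 updates per second, while Productivity 640x480x256 at 2.1 MB/s gives fewer than 7, and that is before the cpu has done any real work.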

The speed Aki achieves with his ham8 mode is only a bit slower than using a normal 8bitplane screen in shapeshifter... without using the compare buffer it may be even faster.

@Karlos
I don't think that it would produce a perfect picture in lo-res, but it seems to be 100% system friendly and is quite fast...
