
Author Topic: What is the real power of Akiko chip in cd32 ?  (Read 36307 times)

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: What is the real power of Akiko chip in cd32 ?
« Reply #44 on: October 30, 2011, 05:41:12 AM »
Quote from: bbond007;665773
Yes, it runs that slow. I have a Bliz1260 50MHz with 64MB RAM.

There's something wrong with your setup. On the Doom benchmark page, there is an Amiga 1200 with 060@50MHz using AGA getting 24.6 fps and even a 1200 with 040@40MHz using AGA getting 13.4 fps. You should be getting at least 20 fps. Here is my setup...

All 060 caches turned on and MAPROM off (for BlizKick) in accelerator menu (ESC)
AmigaOS 3.9 BB4
"BlizKick myROM QUIET" line in S:Startup-Sequence using my custom made 3.9 ROM
ThoR's MuLibs installed (http://aminet.net/util/libs/MMULib.lha)
"MuFastZero MOVESSP ON" line added to S:Startup-Sequence for above
CopyMem installed (http://aminet.net/util/boot/CopyMem.lha)

The most likely problems would be incorrectly installed 68040 and 68060 libraries and not turning on the caches. The 68040.library in Libs: should be tiny. Mine is 748 bytes. If yours is over 1k then that is your problem. Open a shell and type "CPU". You should see "System: 68060 68882 (INST: Cache Burst) (DATA: Cache CopyBack)". If these are fine, then try removing programs from your WBStartup, S:User-Startup and S:Startup-Sequence that might be causing a slowdown.

Edit: I installed the 060 executable for Doom Attack and it made a large difference. I now get 1945 realtics (38.4 fps) instead of 2420 realtics (30.9 fps). ADoom is still faster though. The Doom Attack manual mentions removing the divsu.l and divsl.l instructions because the author thought they didn't exist on the 68060, but they do (the 68060 manual is not clear in some places), and his compiler did not support the 060 at all. The executables probably do contain some assembler, which provides a speed increase. There is probably room for improvement and the source is on Aminet. Here is Doom Attack with CPU specific executables...

http://aminet.net/game/shoot/DoomAttack.lha
« Last Edit: October 30, 2011, 06:55:12 AM by matthey »
 

Offline bbond007

  • Hero Member
  • *****
  • Join Date: Mar 2009
  • Posts: 1517
Re: What is the real power of Akiko chip in cd32 ?
« Reply #45 on: October 30, 2011, 06:48:35 AM »
Quote from: matthey;665782
There's something wrong with your setup. On the Doom benchmark page, there is an Amiga 1200 with 060@50MHz using AGA getting 24.6 fps and even a 1200 with 040@40MHz using AGA getting 13.4 fps. You should be getting at least 20 fps. Here is my setup...

All 060 caches turned on and MAPROM off (for BlizKick) in accelerator menu (ESC)
AmigaOS 3.9 BB4
"BlizKick myROM QUIET" line in S:Startup-Sequence using my custom made 3.9 ROM
ThoR's MuLibs installed (http://aminet.net/util/libs/MMULib.lha)
"MuFastZero MOVESSP ON" line added to S:Startup-Sequence for above
CopyMem installed (http://aminet.net/util/boot/CopyMem.lha)

The most likely problems would be incorrectly installed 68040 and 68060 libraries and not turning on the caches. The 68040.library in Libs: should be tiny. Mine is 748 bytes. If yours is over 1k then that is your problem. Open a shell and type "CPU". You should see "System: 68060 68882 (INST: Cache Burst) (DATA: Cache CopyBack)". If these are fine, then try removing programs from your WBStartup, S:User-Startup and S:Startup-Sequence that might be causing a slowdown.


FaLLeNOnE was correct. The problem was with the DoomAttack executable. I downloaded the DoomAttack package from Aminet and the 060 one is much better...

2134 gametics 3385 realtics
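
As a rough cross-check of the timedemo figures, here is a small Python sketch that converts realtics to fps. It assumes Doom's fixed rate of 35 tics per second, and that the 1945 and 2420 realtics results quoted earlier in the thread came from the same 2134-gametic demo; the function name is only for illustration.

```python
TICRATE = 35  # Doom game tics per second (assumed fixed rate)

def timedemo_fps(gametics, realtics):
    """Average frames per second for a timedemo run."""
    if realtics <= 0:
        raise ValueError("realtics must be positive")
    return gametics * TICRATE / realtics

GAMETICS = 2134  # from the timedemo output above

# Realtics values reported in this thread.
for label, realtics in (("060 executable (bbond007)", 3385),
                        ("060 executable (matthey)", 1945),
                        ("previous executable (matthey)", 2420)):
    print(f"{label}: {timedemo_fps(GAMETICS, realtics):.1f} fps")
```

Under those assumptions the 3385 realtics run works out to roughly 22 fps, which is in line with the 20 fps or more suggested for a 060@50MHz.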

I have never been able to use MuFastZero on my 1260; it says the zero page is already remapped.

I use WB 3.1 with ClassicWB ADV

I am using BlizKick and I'll look into CopyMem.

Anyway I have created an icon/script to run the timedemo. I was getting tired of retyping...

thanks

_nate
« Last Edit: October 30, 2011, 06:52:34 AM by bbond007 »
 

Offline psxphill

Re: What is the real power of Akiko chip in cd32 ?
« Reply #46 on: October 30, 2011, 07:16:05 AM »
Quote from: Digiman;665756
Chunky AND planar don't mix and are hugely expensive... i.e. the AAA chipset trainwreck.

VGA did it, which is what AGA was supposed to compete with.
 
AAA was a trainwreck because engineering and management had a dysfunctional relationship. There was no way that AAA was ever going to be financially successful, but they burned money on it for years.
 
Adding chunky to AGA during its design phase would have been free in comparison to the money spent on AAA.
 
And adding a texture mapper would have saved them.
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Re: What is the real power of Akiko chip in cd32 ?
« Reply #47 on: October 30, 2011, 11:23:39 AM »
Quote from: Digiman;665755
You miss the point. Big deal: a game that requires a 486 33MHz requires an 040 25MHz Amiga, shocker.

The benchmarks show a 50% speed improvement. So if you wrote any game engine and it managed 13-15fps on A1200 sans Akiko you can make it run 22fps for free on Akiko. The issue is 95% of Amiga games had sloppy n00b coding.

I think it's you that has rather missed the point. The 50% it gave for an 020 won't scale upwards linearly with faster systems. Akiko gave a 50% speed increase on slow systems where the time it takes to do C2P per frame is comparable to the time it takes to do everything else, that is to say, systems which are too slow to run the game at all. Going from 3FPS to 5FPS doesn't at all imply a 15FPS game will go to 22FPS.

For an engine of a given complexity, as the CPU gets faster, the time it takes to do C2P on the CPU becomes a smaller and smaller portion of the overall execution time until you hit the magical copyspeed bottleneck. At that point, performing C2P takes no more processor time than simply copying data from Fast RAM to Chip RAM.
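
To put some shape on that argument, here is a toy Python model of frame time as rendering plus C2P. Every timing in it is a made-up illustrative value rather than a measurement, and it assumes the Akiko path costs a fixed, bus-bound amount per frame while software C2P scales with the CPU until it hits a copyspeed floor.

```python
# All timings are hypothetical, in milliseconds per frame. They are chosen
# only so the slow case lands near the 3 fps -> 5 fps example above.
BASE_RENDER_MS = 180.0    # hypothetical game logic + rendering on the slow CPU
BASE_SOFT_C2P_MS = 153.0  # hypothetical software C2P on the slow CPU
COPY_FLOOR_MS = 12.0      # hypothetical copyspeed limit for software C2P
AKIKO_C2P_MS = 20.0       # hypothetical Akiko cost, assumed bus-bound and fixed

def frame_rates(cpu_speedup):
    """Return (software C2P fps, Akiko fps) for a CPU `cpu_speedup` times faster."""
    render = BASE_RENDER_MS / cpu_speedup
    # Software C2P gets faster with the CPU but never beats copyspeed.
    soft_c2p = max(BASE_SOFT_C2P_MS / cpu_speedup, COPY_FLOOR_MS)
    soft_fps = 1000.0 / (render + soft_c2p)
    akiko_fps = 1000.0 / (render + AKIKO_C2P_MS)
    return soft_fps, akiko_fps

for speedup in (1, 5, 10):
    soft, akiko = frame_rates(speedup)
    print(f"CPU x{speedup}: software {soft:.1f} fps, "
          f"Akiko {akiko:.1f} fps, ratio {akiko / soft:.2f}")
```

With these invented numbers the Akiko advantage is large on the slowest CPU, shrinks as the CPU speeds up, and eventually turns into a loss. The exact crossover depends entirely on the assumed values.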

Some fast 030 C2P routines are already approaching copyspeed; once you start using an 040, they get there. On these systems, using an Akiko would likely result in an even slower FPS count than a copyspeed software C2P routine. The reason for that is the way Akiko works. You have to write your chunky data to it, then read back the planar result, then write that to Chip RAM, all using the CPU. All that copying back and forth takes longer than simply reading the source data from Fast RAM, transforming it, then writing it to Chip RAM.
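
For anyone unsure what the transform itself involves, here is a minimal Python sketch of chunky-to-planar for one group of 32 eight-bit pixels. It only illustrates the bit shuffle. The 32-pixel group size, the function name and the leftmost-pixel-in-the-top-bit layout are assumptions for illustration, and it makes no attempt to model Akiko's registers or real timings.

```python
def chunky_to_planar(pixels, depth=8):
    """Convert 32 chunky pixels into `depth` 32-bit bitplane words.

    Plane word n collects bit n of every pixel, with the leftmost pixel
    placed in the most significant bit (assumed layout).
    """
    if len(pixels) != 32:
        raise ValueError("expected exactly 32 pixels")
    planes = [0] * depth
    for x, pixel in enumerate(pixels):
        for plane in range(depth):
            bit = (pixel >> plane) & 1
            planes[plane] |= bit << (31 - x)
    return planes

# Colour 5 (binary 101) in the first pixel sets the top bit of planes 0 and 2.
example = chunky_to_planar([5] + [0] * 31)
print([hex(word) for word in example])
```

A software routine does this on data it has already read from Fast RAM and writes the plane words straight to Chip RAM, whereas the Akiko route adds a write to the chip and a read back before that final write.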

At copyspeed for AGA, even an infinitely fast CPU/Fast RAM combination (where the Chip RAM write speed of ~7MB/s becomes the only rate-limiting factor) would be limited to around 90fps at 320x256 in 256 colours.
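
The arithmetic behind that ceiling can be sketched in a few lines of Python. It assumes one byte per pixel for 256 colours and reads the ~7MB/s figure as 7 x 1024 x 1024 bytes per second; the function name is only illustrative.

```python
def copyspeed_fps_cap(width, height, bytes_per_pixel, write_bytes_per_sec):
    """Upper bound on fps when writing each frame to Chip RAM is the only cost."""
    frame_bytes = width * height * bytes_per_pixel
    return write_bytes_per_sec / frame_bytes

CHIP_WRITE_BYTES_PER_SEC = 7 * 1024 * 1024  # ~7MB/s, taken as MiB (assumption)

cap = copyspeed_fps_cap(320, 256, 1, CHIP_WRITE_BYTES_PER_SEC)
print(f"{cap:.1f} fps")  # 89.6 fps
```

Reading the figure as 7,000,000 bytes per second instead gives about 85 fps, so "around 90" holds either way.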

Doom uses a fairly simple (and efficient) 2D BSP and optimised column and span texturing routines for walls and floors and a sprite routine for objects, each of which has been pretty well-tuned. For more complex game engines, the time taken to render the frame becomes dominant and so C2P time becomes even less important. Take Quake for example. Here you have a full 3D BSP and a much more general-case textured/shaded/depth-buffered polygon rendering system. All much more CPU work than Doom. On 68K, eliminating the C2P altogether (i.e. using an RTG card) only gives a comparatively small increase in fps over AGA. For 603/604 PPC systems, you see a bigger improvement because they are able to render the frame significantly faster than the 040/060 can and the whole argument about time spent rendering versus time spent C2P/copying becomes relevant again.
int p; // A
 

Offline commodorejohn

  • Hero Member
  • *****
  • Join Date: Mar 2010
  • Posts: 3165
    • http://www.commodorejohn.com
Re: What is the real power of Akiko chip in cd32 ?
« Reply #48 on: October 30, 2011, 03:10:53 PM »
Quote from: Karlos;665804
Doom uses a fairly simple (and efficient) 2D BSP and optimised column and span texturing routines for walls and floors and a sprite routine for objects, each of which has been pretty well-tuned.
Just curious, you make it sound like Doom has separate rendering code for sprites? I guess I'd kinda figured it'd just reuse the column routine, since that's capable of transparency anyway.
Computers: Amiga 1200, DEC VAXStation 4000/60, DEC MicroPDP-11/73
Synthesizers: Roland JX-10/MT-32/D-10, Oberheim Matrix-6, Yamaha DX7/FB-01, Korg MS-20 Mini, Ensoniq Mirage/SQ-80, Sequential Circuits Prophet-600, Hohner String Performer

"'Legacy code' often differs from its suggested alternative by actually working and scaling." - Bjarne Stroustrup
 

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: What is the real power of Akiko chip in cd32 ?
« Reply #49 on: October 30, 2011, 03:38:06 PM »
Quote from: Karlos;665804
For 603/604 PPC systems, you see a bigger improvement because they are able to render the frame significantly faster than the 040/060 can and the whole argument about time spent rendering versus time spent C2P/copying becomes relevant again.


I agree with your point about C2P becoming a smaller percentage of CPU usage as the CPU becomes faster. I'm not so sure that the 603/604 PPC on the Phase5 boards were significantly faster than the 68060. I obtained the same result of 40 fps in the timedemo test with a 68060 at 1/2 the clock speed of an Amiga with a 604e. The 68060 can compete with, and sometimes beat, a RISC processor at 2x the clock speed. The 2x faster clock speed, 4x greater CPU cache and faster memory access of the PPC look better on paper and in advertisements than real-life results show. It was a while before the low-end PPC Macs even outperformed the 68040@40MHz. The code and compilers continued to improve for the PPC as that was the focus of marketing and development, while the 68060 was abandoned before its limits were explored on any desktop platform. On the Amiga, the bottlenecks in WarpUp and the poor PPC optimization in AmigaOS 4.x certainly remove any significant speed advantage of the PPC beyond clocking it up. Do you want to buy this computer with a 2GHz CPU or this one with a 3GHz CPU? The vast majority of people will think the 3GHz sounds faster so it should play games faster and it doesn't even cost 50% more.
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Re: What is the real power of Akiko chip in cd32 ?
« Reply #50 on: October 30, 2011, 04:48:49 PM »
Quote from: commodorejohn;665814
Just curious, you make it sound like Doom has separate rendering code for sprites? I guess I'd kinda figured it'd just reuse the column routine, since that's capable of transparency anyway.


It might do, but sprites are a special case as they are always "face on" and thus don't require anything beyond scaling - certainly not the recalculation of scale per column that walls do. It would seem an obvious case where an optimisation could be made.

Quote from: matthey
I'm not so sure that the 603/604 PPC on the Phase5 boards were significantly faster than the 68060


I only mentioned PPC from the perspective of Quake, not Doom, as Quake is an example of an engine where even the fastest 68K spends most of its time doing something other than C2P or copying data to the video device.

On 68060, moving from C2P+AGA to an RTG board shows only a modest performance increase, since doing C2P takes only a small fraction of the overall time per frame. When you move up to PPC, going from C2P+AGA to RTG (at least when the bus is suitably faster than AGA, such as BVisionPPC/CVision/G-REX), you observe a more noticeable increase.

Still, now that you mention it, the 040 version of DoomAttack runs significantly faster on my 603e under emulation. Fast enough for 640x400 (RTG) to be playable (only the transition effects are slow):
[youtube]AQ1t5q3xmYk[/youtube]

Out of curiosity, how is it at 640x400 RTG with 68060 (I can't test as I only have 040, but it's cripplingly slow for me at that resolution)?
 

Offline commodorejohn

  • Hero Member
  • *****
  • Join Date: Mar 2010
  • Posts: 3165
    • http://www.commodorejohn.com
Re: What is the real power of Akiko chip in cd32 ?
« Reply #51 on: October 30, 2011, 05:10:35 PM »
Quote from: Karlos;665828
It might do, but sprites are a special case as they are always "face on" and thus don't require anything beyond scaling - certainly not the recalculation of scale per column that walls do. It would seem an obvious case where an optimisation could be made.
Oh. Thought you were talking about the column renderer specifically.

Though I suppose it might help to have a separate column renderer for sprites and walls with transparency in their textures, only I don't know if Doom marks those specially or not.
 

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: What is the real power of Akiko chip in cd32 ?
« Reply #52 on: October 30, 2011, 05:24:04 PM »
Quote from: Karlos;665828
Out of curiosity, how is it at 640x400 RTG with 68060 (I can't test as I only have 040, but it's cripplingly slow for me at that resolution)?

ADoom at 640x400 RTG (Mediator with Voodoo 4 and no gfx bus overclocking) is ~11 fps using the timedemo test on my 68060@75MHz. The actual gameplay feels a little faster than Quake at 20 fps (HW 3D) on the same setup. It takes VERY heavy combat situations to perceive any slowdown at all. There is no slowdown just walking around, turning and shooting a barrel like in your video, or was that my slow 1.86 GHz Windows machine pausing just playing the low resolution video :/. That would be a good show for a 68040 though. I'm surprised Doom Attack worked at 640x400 as the docs say it doesn't support increasing the horizontal resolution.
« Last Edit: October 30, 2011, 05:31:56 PM by matthey »
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Re: What is the real power of Akiko chip in cd32 ?
« Reply #53 on: October 30, 2011, 06:48:16 PM »
Quote from: matthey;665832
ADoom at 640x400 RTG (Mediator with Voodoo 4 and no gfx bus overclocking) is ~11 fps using the timedemo test on my 68060@75MHz.

Mediator1200 or Mediator4000?

Quote
The actual gameplay feels a little faster than Quake at 20 fps (HW 3D) on the same setup. It takes VERY heavy combat situations to perceive any slowdown at all. There is no slowdown just walking around, turning and shooting a barrel like in your video, or was that my slow 1.86 GHz Windows machine pausing just playing the low resolution video :/. That would be a good show for a 68040 though.

It wasn't an 040, it was the 040 executable running under OS4's emulation on a 603e. The device recording the footage was a bit slow too. My old mouse was acting up; it has a tendency to move in jumps (you can see that when opening the preferences application and it overshoots). The performance of the game itself was pretty consistent.

Quote
I'm surprised Doom Attack worked at 640x400 as the docs say it doesn't support increasing the horizontal resolution.

Give it a bash, seems to work regardless :)
 

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: What is the real power of Akiko chip in cd32 ?
« Reply #54 on: October 30, 2011, 07:53:36 PM »
Quote from: Karlos;665835
Mediator1200 or Mediator4000?


Mediator 3000T/4000T (works through Zorro II/III slots but only 6-8 MB/s as I recall)

Quote from: Karlos;665835

It wasn't an 040, it was the 040 executable running under OS4's emulation on a 603e. The device recording the footage was a bit slow too. My old mouse was acting up; it has a tendency to move in jumps (you can see that when opening the preferences application and it overshoots). The performance of the game itself was pretty consistent.


The 68k emulation on PPC in AmigaOS 4.x is fast at least. Maybe AmigaOS 4.x should just use emulated 68k code as applets like Android OS uses Java ;). The faster 604 CyberStorm PPCs should be close to emulating the 68k at 68060 speed with 3x the clock speed, 4x the cache, over 2x the memory bandwidth and much more memory needed. That's progress though :/.
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Re: What is the real power of Akiko chip in cd32 ?
« Reply #55 on: October 30, 2011, 08:18:29 PM »
Quote from: matthey;665837
Mediator 3000T/4000T (works through Zorro II/III slots but only 6-8 MB/s as I recall)


Hmm, I thought it was at least faster than the 1200 version.

Quote
The 68k emulation on PPC in AmigaOS 4.x is fast at least. Maybe AmigaOS 4.x should just use emulated 68k code as applets like Android OS uses Java ;). The faster 604 CyberStorm PPCs should be close to emulating the 68k at 68060 speed with 3x the clock speed, 4x the cache, over 2x the memory bandwidth and much more memory needed. That's progress though :/.


Emulation is a funny thing. Sometimes it's faster, sometimes it isn't, it all depends.
 

Offline Piru

  • ' union select name,pwd--
  • Hero Member
  • *****
  • Join Date: Aug 2002
  • Posts: 6946
    • http://www.iki.fi/sintonen/
Re: What is the real power of Akiko chip in cd32 ?
« Reply #56 on: October 30, 2011, 08:29:40 PM »
Quote from: Karlos;665828
I can't test as I only have 040, but it's cripplingly slow for me at that resolution
Data cache size? Or is it rendering to the framebuffer directly (in which case the writes would be going to cache-inhibited memory anyway...)? If I remember correctly, there were some specific resolutions that were optimal for specific cache configurations, while other resolutions were way worse.

Speaking of emulation: MorphOS JIT was significantly faster than the parent 68k from very early on. Even the slowest 603 can easily beat 060@50 in pretty much everything.
« Last Edit: October 30, 2011, 08:36:48 PM by Piru »
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Re: What is the real power of Akiko chip in cd32 ?
« Reply #57 on: October 30, 2011, 08:51:39 PM »
Quote from: Piru;665842
Data cache size? Or is it rendering to framebuffer directly (in which case the writes would be going to cache inhibited memory anyway...)?

Not sure with DoomAttack, to be honest. I doubt it's rendering to the framebuffer directly, given the way Doom renders the scene a column at a time. You'd at least want to use a buffer that could hold, say, four 8-bit pixel columns so that you could then transfer them as 32-bit words over the bus. That would be 800 bytes for a 200 pixel display. Or 3200, if you wanted to have a nice 16 column, cache-aligned wide buffer :)
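
Here is a rough Python sketch of that column-buffer idea, showing the buffer size for four 8-bit columns and how one row of four pixels would pack into a single 32-bit word. The helper names and the big-endian, leftmost-pixel-first packing are assumptions for illustration; nothing here is taken from the DoomAttack source.

```python
def column_buffer_bytes(columns, height, bytes_per_pixel=1):
    """Size of a buffer holding `columns` pixel columns of `height` rows."""
    return columns * height * bytes_per_pixel

def pack_row(pixels):
    """Pack four 8-bit pixels (leftmost first) into one big-endian 32-bit word."""
    if len(pixels) != 4:
        raise ValueError("expected exactly 4 pixels")
    # bytes() rejects values outside 0..255, so bad pixels fail loudly.
    return int.from_bytes(bytes(pixels), "big")

# Four columns at 200 rows, one byte per pixel.
print(column_buffer_bytes(4, 200))

# One row across the four columns becomes a single 32-bit bus write.
print(hex(pack_row([0x11, 0x22, 0x33, 0x44])))
```

Each row of the small buffer then goes out as one longword rather than four separate byte writes.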
« Last Edit: October 30, 2011, 09:05:47 PM by Karlos »
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Re: What is the real power of Akiko chip in cd32 ?
« Reply #58 on: October 30, 2011, 09:11:42 PM »
Quote from: Piru;665842
Speaking of emulation: MorphOS JIT was significantly faster than the parent 68k from very early on. Even the slowest 603 can easily beat 060@50 in pretty much everything


I don't doubt it, at least for anything processor intensive. There's little that can be done for the rest of the system though. Slow buses, in particular.

My remark about emulation performance was intended to be a general observation, not specific to emulation of 68K under OS4. There is the expectation that whenever you emulate something - particularly via any sort of JIT mechanism - that the result is going to be faster as long as the host processor is faster. It isn't always the case, however. How is 6888x extended precision floating point performance, for example? IIRC, the PPC only supports 64-bit IEEE 754, so any accurate emulation is probably going to end up hampered trying to find interesting ways of handling the extra precision. Or, is it (as I suspect) the case that only extended precision read/write are really emulated and any extended precision operations are all carried out at 64-bit instead?
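
A small Python illustration of the precision gap in question, assuming the usual figures of a 64-bit significand for 6888x extended precision and a 53-bit significand for an IEEE 754 double. It says nothing about how any particular emulator actually handles this; it only shows a value that fits the former and not the latter.

```python
EXTENDED_SIGNIFICAND_BITS = 64  # 6888x extended precision (assumed figure)
DOUBLE_SIGNIFICAND_BITS = 53    # IEEE 754 double, including the implicit bit

# An integer that needs 54 significant bits.
value = 2**DOUBLE_SIGNIFICAND_BITS + 1

fits_in_extended = value.bit_length() <= EXTENDED_SIGNIFICAND_BITS

# Python floats are IEEE 754 doubles on typical platforms, so this
# round trip shows what a 64-bit-only FPU would store.
as_double = int(float(value))
lost = value - as_double

print("fits in extended significand:", fits_in_extended)
print("survives as a double:", as_double == value)
print("difference after rounding to double:", lost)
```

An emulator that quietly computes everything at 64-bit would drop that last bit, which is the kind of discrepancy an accurate emulation has to pay extra to avoid.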
 

Offline Piru

  • ' union select name,pwd--
  • Hero Member
  • *****
  • Join Date: Aug 2002
  • Posts: 6946
    • http://www.iki.fi/sintonen/
Re: What is the real power of Akiko chip in cd32 ?
« Reply #59 on: October 30, 2011, 10:02:20 PM »
Quote from: Karlos;665843
Not sure with DoomAttack, to be honest. I doubt it's rendering to the framebuffer directly, given the way Doom renders the scene a column at a time. You'd at least want to use a buffer that could hold, say, four 8-bit pixel columns so that you could then transfer them as 32-bit words over the bus. That would be 800 bytes for a 200 pixel display. Or 3200, if you wanted to have a nice 16 column, cache-aligned wide buffer :)
Ah the cache size vs resolutions were discussed in the PPC ADoom port readme:
http://www.aminet.net/game/shoot/ADoomPPC.readme

Dunno where that overflow comes from though, or how you'd go about calculating fast/slow resolutions for various 68k.