Author Topic: PowerPC accelerator - how does that work then? (Read 12593 times)

Mrs Beanbag · « **on:** January 30, 2012, 07:52:53 PM »

Ok so I know the PowerPC accelerators for classic Amigas have both a PPC chip and a 68040 on board, but when does the PowerPC chip come in and how does it know? Does a program have to run on the 68k at first (like a bootstrap) and then trigger some kind of switch, throw an exception, or call a library function or something?

Karlos · « **Reply #1 on:** January 30, 2012, 08:22:37 PM »

Quote from: Mrs Beanbag;678299

Ok so I know the PowerPC accelerators for classic Amigas have both a PPC chip and a 68040 on board, but when does the PowerPC chip come in and how does it know? Does a program have to run on the 68k at first (like a bootstrap) and then trigger some kind of switch, throw an exception, or call a library function or something?

Neglecting OS4.x and MorphOS 1.x for the moment, there are 2 solutions on OS3.x. The following description is somewhat generalized and may have mistakes and omissions as it was a long time ago that I last looked at it.

First there was PowerUP (or just PUP). This was developed by phase5 as the software to allow applications to take advantage of the PPC. Essentially, it comprises a kernel for the PPC, along with some APIs for developers and a set of patches to the host operating system to bootstrap the processor.

Once the system was up and running, the (patched) host OS was able to load PPC binaries in ELF format (it may have been modified a bit, but it was essentially the same format described in the old System V ABI). An ELF could be an entire application for the PPC, or just a set of PPC-optimised functions to be used by an otherwise 68K application. Code running on the PPC was basically independent of the 68K until it needed to call a host OS function (note that the PUP kernel provided memory management, tasks and so on for the PPC side).

The mechanism required to call 68K code from the PPC (or vice versa) necessitated that both processors flush their data caches. The parameters for the call would then be passed through memory from one to the other. While the "other" CPU was working, the corresponding process on the calling CPU was basically asleep until the other CPU completed and returned. This process was simply called a "context switch" (a widely used term in other areas of computing, but technically accurate nonetheless). I'm afraid I forget the specific underlying details of how all this was achieved. If memory serves, there was an arbitration "server" process involved in the communication between the two processors. Perhaps someone better qualified can answer that.
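To make the flush, hand over, sleep sequence a bit more concrete, here is a toy model in Python. It is only a sketch of the idea as described above, not how PUP actually did it: the `Cpu` class, `cross_call` and the use of host threads are all invented for illustration, and the arbitration "server" is left out entirely.

```python
import threading

class Cpu:
    """Toy CPU with a private copyback cache in front of shared RAM."""
    def __init__(self, name, memory):
        self.name = name
        self.memory = memory      # shared RAM (a dict), visible to both CPUs
        self.cache = {}           # dirty lines that have not reached RAM yet

    def write(self, addr, value):
        self.cache[addr] = value  # the write lands in the cache only

    def read(self, addr):
        return self.cache.get(addr, self.memory.get(addr))

    def flush(self):
        self.memory.update(self.cache)  # write dirty lines back to RAM
        self.cache.clear()              # invalidate, so later reads hit RAM


def cross_call(caller, callee, func, args):
    """Hypothetical cross-CPU call: flush, hand over, sleep until the reply."""
    done = threading.Event()
    caller.write("args", args)
    caller.flush()                       # parameters must reach RAM first

    def run_on_callee():
        callee.flush()                   # drop stale lines before reading args
        result = func(*callee.read("args"))
        callee.write("result", result)
        callee.flush()                   # result must reach RAM before waking
        done.set()

    threading.Thread(target=run_on_callee).start()
    done.wait()                          # the caller's task sleeps here
    return caller.read("result")


ram = {}
m68k, ppc = Cpu("68K", ram), Cpu("PPC", ram)
result = cross_call(m68k, ppc, lambda a, b: a * b, (6, 7))
print(result)
```

Without the `flush()` calls the arguments would sit in the caller's cache and the other side would read whatever stale data was in RAM, which is why the flush could not simply be skipped.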

The system worked, but early versions suffered poor performance due to the overhead of said context switches (this was addressed later). In the time it took either processor to flush its cache, it could have done millions of instructions. Other criticisms revolved around the use of ELF, which was seen as an alien standard on the Amiga*, which had its own hunk format for binaries. Soon thereafter, WarpUP appeared (later called WarpOS), a rival system that extended the Amiga hunk format to allow PPC code segments in executables, leading to "fat" binaries.

Fundamentally, it was similar to PUP: a PPC kernel, API and a set of OS patches. The main differences were in the implementation. WarpUP tried to mirror Exec's structure for the services it provided. Threads were in pairs, one for each CPU, and only one of which would be active at any time (there were asynchronous context switches but they were tricky to use). A single-threaded application that used the PPC would thus really be two threads, one on either chip, only one of which was running at any instant, the other sleeping and waiting for its counterpart to return. However, they shared many fundamental components, such as signals. If you sent CTRL-C to one, the corresponding signal bit would be set in both the Task and TaskPPC (or at least it was routed to whichever was active, I forget).
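A rough way to picture the paired threads is as one logical task with two halves and a single signal mask. The Python below is a sketch under that assumption only. `TaskPair` and its methods are invented names, and since I can't remember whether the bit was set in both structures or routed to the active one, the shared mask here is just one possible reading. The CTRL-C bit value is the usual AmigaOS one (bit 12).

```python
SIGBREAKF_CTRL_C = 1 << 12   # same bit AmigaOS uses for CTRL-C

class TaskPair:
    """One logical thread: a 68K half and a PPC half, only one running."""
    def __init__(self):
        self.active = "68K"
        self.signals = 0          # single mask shared by both halves (assumption)

    def switch_to(self, cpu):
        self.active = cpu         # the other half goes to sleep

    def signal(self, mask):
        self.signals |= mask      # whichever half is active will see it

    def check(self, cpu, mask):
        """Called by the half on `cpu`; returns and clears the matched bits."""
        if cpu != self.active:
            return 0              # a sleeping half never observes signals
        hit = self.signals & mask
        self.signals &= ~mask
        return hit


pair = TaskPair()
pair.switch_to("PPC")                              # work is on the PPC side
pair.signal(SIGBREAKF_CTRL_C)                      # user hits CTRL-C
seen_68k = pair.check("68K", SIGBREAKF_CTRL_C)     # asleep, sees nothing
seen_ppc = pair.check("PPC", SIGBREAKF_CTRL_C)     # active, gets the break
print(seen_68k, seen_ppc)
```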

It still required cache flushing in the same way as PUP; the main advantage of the system (at the time) was that it was considerably quicker at it. That performance gap closed in later versions of PUP, but by then we'd enjoyed the "kernel" wars and WarpUP had become the unofficial standard. At least until OS3.9, where the picture.datatype used it. That's about as official as it ever got.

Each system had its strengths and weaknesses. The Achilles heel of both was the context switch. It may have been faster on WarpUP, but in real terms it was still shockingly expensive. You had to carefully design code to minimise these if you wanted your application to run acceptably.
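As a back-of-envelope illustration of why the number of switches mattered more than anything else, here is a tiny cost model in Python. Every number in it is made up for the sake of the arithmetic (I have no measured figures to hand), and `frame_time_ms` is a hypothetical helper, not anything from either kernel.

```python
def frame_time_ms(work_ms, os_calls, switch_ms, batched):
    """Time for one frame: useful work plus context-switch overhead."""
    if batched:
        switches = 1 if os_calls else 0   # one switch covers every OS call
    else:
        switches = os_calls               # one switch per OS call
    return work_ms + switches * switch_ms

# All three inputs are invented purely for illustration.
per_call = frame_time_ms(work_ms=10, os_calls=50, switch_ms=0.5, batched=False)
grouped = frame_time_ms(work_ms=10, os_calls=50, switch_ms=0.5, batched=True)
print(per_call, grouped)
```

With these invented inputs the overhead term dominates the per-call version, and grouping the OS work behind a single switch brings the frame time back close to the cost of the real work.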

*Ironically, all the offshoot operating systems use it for their native binary format.

Tension · « **Reply #2 on:** January 30, 2012, 08:27:59 PM »

barry do teh money dance

itix · « **Reply #3 on:** January 30, 2012, 08:29:36 PM »

Or, to make a long story short: the PPC is usable only as a co-processor, doing heavy computation in the background. Calling the OS from the PPC is too expensive.

Karlos · « **Reply #4 on:** January 30, 2012, 08:36:34 PM »

Quote from: itix;678304

Or, to make a long story short: the PPC is usable only as a co-processor, doing heavy computation in the background. Calling the OS from the PPC is too expensive.

That's not exactly true, though, is it? There were plenty of applications that ran almost entirely on the PPC, calling the OS to do IO, graphics etc.

The way I used to write code (albeit for WarpUP, never really did much for PUP) was to have a main 68K function that was polled from the PPC (usually once per frame in a graphical application). That function would then do all the OS stuff: switch screen buffers, collect the contents of IDCMP messages and so on, before returning to the PPC. The PPC would then do a bunch of native callbacks for said messages before continuing on with its main loop. It was surprisingly effective.
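The shape of that loop, sketched in Python rather than real WarpUP C. Everything here is a stand-in: `idcmp_port`, `poll_os_68k` and `run_frames_ppc` are invented names, the deque plays the part of a window's message port, and the "cross-CPU call" is just an ordinary function call that gets counted.

```python
from collections import deque

idcmp_port = deque()   # stand-in for a window's IDCMP message port

def poll_os_68k():
    """68K side: one context switch covers all OS work for the frame."""
    # A real version would also swap screen buffers here.
    events = list(idcmp_port)    # copy the message contents out as plain data
    idcmp_port.clear()           # "reply" to everything that was queued
    return events

def run_frames_ppc(frames, handlers):
    """PPC side: poll once per frame, then dispatch native callbacks."""
    switches = 0
    for _ in range(frames):
        events = poll_os_68k()   # the only cross-CPU call in this frame
        switches += 1
        for kind, payload in events:
            handler = handlers.get(kind)
            if handler:
                handler(payload)  # runs natively, no further switches
        # ... the frame itself would be rendered on the PPC here ...
    return switches


keys = []
idcmp_port.extend([("key", "a"), ("mouse", (1, 2)), ("key", "b")])
switches = run_frames_ppc(3, {"key": keys.append})
print(keys, switches)
```

The point of the design is that the number of switches depends only on the frame count, not on how many messages arrived or how many callbacks ran.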

Karlos · « **Reply #5 on:** January 30, 2012, 08:46:39 PM »

An interesting side-effect of the way both systems worked is that you actually got a degree of multiprocessing.

When a 68K thread invoked a PPC operation, it basically went to sleep until the PPC finished working and returned, and vice versa.

The PPC kernels were both, as far as I know, preemptive, as of course is Exec. So, while a given 68K thread is asleep and the PPC is doing some work for it, Exec is able to run a different 68K thread during that time (memory bandwidth permitting). Likewise, when any given PPC thread is asleep waiting for the 68K, the PPC kernel is able to run a different PPC thread.

Both processors ran with cache and copyback enabled and so if whatever they were executing didn't hit memory at the same time, you could get a 68K thread and (unrelated) PPC thread running concurrently.

Still, it probably didn't happen that often given that most people ran only one PPC application at a time.
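For what it's worth, the scheduling side of that can be sketched in a few lines of Python. This is a deliberately crude model with invented task names. It ignores preemption, priorities and memory contention, and only shows that each CPU's scheduler can pick some other ready task while one is blocked on the opposite chip.

```python
def pick_next(tasks, blocked):
    """Return the first task not blocked on the other CPU, or None to idle."""
    for task in tasks:
        if task not in blocked:
            return task
    return None

tasks_68k = ["app.68k", "shell"]
tasks_ppc = ["app.ppc"]

# app.68k has just called into its PPC half, so it sleeps until that returns.
blocked_68k = {"app.68k"}
blocked_ppc = set()

running = (pick_next(tasks_68k, blocked_68k), pick_next(tasks_ppc, blocked_ppc))
print(running)
```

Here the 68K runs the unrelated shell task while the PPC works on the application's behalf, which is the accidental multiprocessing described above.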

Karlos · « **Reply #6 on:** January 30, 2012, 09:15:10 PM »

Quote from: Tension;678303

barry do teh money dance

Also, the above...

/me slowly edges backwards towards the door, smiling at the strange man from Belfast in a non-threatening manner :lol:

commodorejohn · « **Reply #7 on:** January 30, 2012, 09:23:26 PM »

So nobody outside of OS4/MOS went the Apple route and simply did 68k in emulation? Huh.

Karlos · « **Reply #8 on:** January 30, 2012, 09:29:15 PM »

Quote from: commodorejohn;678313

So nobody outside of OS4/MOS went the Apple route and simply did 68k in emulation? Huh.

That was the plan for later PPC accelerators that never materialized. Phase5 had announced the CyberstormG4 and BlizzardG4 line. I badly wanted the latter and had already located a source of spare kidneys to fund one, but alas it never appeared. Which is probably good news for one black market transplant recipient; I don't think any credible surgeon would be fooled by what I found at my local butcher's...

Later, there was the AmiJoe G3, but again, it was never released.

All of these boards lacked a 68K processor and would require emulation resources in their firmware.

Mrs Beanbag · « **Reply #9 on:** January 30, 2012, 09:48:00 PM »

Wow, now that's a far more detailed answer than I was expecting! Thanks!

Tension · « **Reply #10 on:** January 30, 2012, 10:01:54 PM »

Quote from: Karlos;678311

Also, the above...

/me slowly edges backwards towards the door, smiling at the strange man from Belfast in a non-threatening manner :lol:

Sorry, I think moobunny is starting to get inside my head.

vox · « **Reply #11 on:** January 30, 2012, 10:04:43 PM »

Quote from: commodorejohn;678313

So nobody outside of OS4/MOS went the Apple route and simply did 68k in emulation? Huh.

That's exactly why OS 4 and MOS are seen as the natural progression from the Amiga PPC cards, unlike the CUSA trolls who try to argue it was meant to go the x86 way.

Sadly, while AmigaOS and MOS went the PPC route, PPC kind of died as a desktop platform, primarily because it was abandoned by Microsoft, IBM, Apple and most Linux distros, in that order (and all of them were part of the "let's replace x86 with PPC" alliance in the beginning).

strim · « **Reply #12 on:** January 30, 2012, 10:34:24 PM »

Quote from: vox;678322

PPC kind of died as a desktop platform

Now that's a common misconception. It's no more dead than the Amiga itself. As long as we, the users, are alive, the platform is not dead.

commodorejohn · « **Reply #13 on:** January 30, 2012, 10:37:17 PM »

My Power Macs still work like new...

wawrzon · « **Reply #14 on:** January 30, 2012, 10:37:25 PM »

Oh yeah! Karlos provided the in-depth answer I was hoping for.
I'd like to mention that even though I now own one, I considered and still consider the PPC accelerators too complicated, too clumsy, too expensive and, in the long run, not dependable enough a solution to be worth copying. In their time they marked the point when I finally decided to turn my back on what was going to happen to the Amiga, and I am very happy that I was not aware of the whole PPC tragedy.
