Topic: PowerPC accelerator - how does that work then?

Offline Mrs Beanbag (Topic starter)

  • Sr. Member
  • ****
  • Join Date: Sep 2011
  • Posts: 455
PowerPC accelerator - how does that work then?
« on: January 30, 2012, 07:52:53 PM »
Ok so I know the PowerPC accelerators for classic Amigas have both a PPC chip and a 68040 on board, but when does the PowerPC chip come in and how does it know?  Does a program have to run on the 68k at first (like a bootstrap) and then trigger some kind of switch, throw an exception, or call a library function or something?
Signature intentionally left blank
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
Re: PowerPC accelerator - how does that work then?
« Reply #1 on: January 30, 2012, 08:22:37 PM »
Quote from: Mrs Beanbag;678299
Ok so I know the PowerPC accelerators for classic Amigas have both a PPC chip and a 68040 on board, but when does the PowerPC chip come in and how does it know?  Does a program have to run on the 68k at first (like a bootstrap) and then trigger some kind of switch, throw an exception, or call a library function or something?

Neglecting OS4.x and MorphOS 1.x for the moment, there were two solutions on OS3.x. The following description is somewhat generalized and may contain mistakes and omissions, as it has been a long time since I last looked at this.

First there was PowerUP (or just PUP). This was developed by phase5 as the software to allow applications to take advantage of the PPC. Essentially, it comprises a kernel for the PPC, along with some APIs for developers and a set of patches to the host operating system to bootstrap the processor.

Once the system was up and running, the (patched) host OS was able to load PPC binaries in ELF format (it may have been modified a bit, but it was essentially the same format described in the old System V ABI). An ELF could be an entire application for the PPC, or just a set of PPC-optimised functions to be used by an otherwise 68K application. Code running on the PPC was basically independent of the 68K until it needed to call a host OS function (note that the PUP kernel provided memory management, tasks and so on for the PPC side).

The mechanism required to call 68K code from the PPC (or vice versa) necessitated that both processors flush their data caches. The parameters for the call would then be passed through memory from one to the other. While the "other" CPU was working, the corresponding process on the calling CPU was basically asleep until the other CPU completed and returned. This process was simply called a "context switch" (a widely used term in other areas of computing, but technically accurate nonetheless). I'm afraid I forget the specific underlying details of how all this was achieved. If memory serves, there was an arbitration "server" process involved in the communication between the two processors. Perhaps someone better qualified can answer that.
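
To make the flush / hand over / sleep / return sequence a bit more concrete, here is a toy Python model of it. This is only a sketch of the idea, not how PUP or WarpUP were actually implemented: the `Cpu` class, the `cross_call` helper and the "args"/"result" slots are all invented for illustration, and a dictionary stands in for a copyback data cache.

```python
import threading


class Cpu:
    """Toy CPU with a private copyback cache in front of shared memory."""

    def __init__(self, name, shared):
        self.name = name
        self.shared = shared   # memory visible to both CPUs
        self.cache = {}        # dirty data not yet written back

    def write(self, addr, value):
        self.cache[addr] = value          # stays private until flushed

    def read(self, addr):
        return self.cache.get(addr, self.shared.get(addr))

    def flush(self):
        self.shared.update(self.cache)    # write dirty data back
        self.cache.clear()                # and invalidate the cache


def cross_call(caller, callee, func, args):
    """Hypothetical cross-CPU call: flush, hand over, sleep, resume."""
    done = threading.Event()

    def run_on_callee():
        params = callee.read("args")      # parameters arrive via memory
        callee.write("result", func(*params))
        callee.flush()                    # make the result visible
        done.set()                        # wake the sleeping caller

    caller.write("args", args)
    caller.flush()                        # make the parameters visible
    worker = threading.Thread(target=run_on_callee)
    worker.start()
    done.wait()                           # the caller is asleep meanwhile
    worker.join()
    return caller.read("result")


shared_memory = {}
m68k = Cpu("68K", shared_memory)
ppc = Cpu("PPC", shared_memory)
total = cross_call(ppc, m68k, lambda a, b: a + b, (2, 3))
```

The point of the model is the two `flush()` calls: skip either one and the other side reads stale memory, which is why every cross-CPU call paid the flush cost twice.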

The system worked, but early versions suffered poor performance due to the overhead of said context switches (this was addressed later). In the time it took either processor to flush its cache, it could have executed millions of instructions. Other criticisms revolved around the use of ELF, which was seen as an alien standard on the Amiga*, which had its own hunk format for binaries. Soon thereafter, WarpUP appeared (later called WarpOS), a rival system that extended the Amiga hunk format to allow PPC code segments in executables, leading to "fat" binaries.

Fundamentally, it was similar to PUP: a PPC kernel, API and a set of OS patches. The main differences were in the implementation. WarpUP tried to mirror Exec's structure for the services it provided. Threads were in pairs, one for each CPU, only one of which would be active at any time (there were asynchronous context switches, but they were tricky to use). A single-threaded application that used the PPC would thus really be two threads, one on either chip, only one of which was running at any instant, the other sleeping and waiting for its counterpart to return. However, they shared many fundamental components, such as signals. If you sent CTRL-C to one, the corresponding signal bit would be set in both the Task and TaskPPC (or at least it was routed to whichever was active, I forget).
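
The paired-thread idea can be pictured with a small Python model. It is a sketch under stated assumptions rather than WarpUP's real data structures: `TaskPair` and its methods are made up, and it assumes the simpler of the two behaviours mentioned above, namely a single signal mask shared by both halves. The CTRL-C bit number (12) is the one AmigaOS defines in dos/dos.h.

```python
SIGBREAKF_CTRL_C = 1 << 12   # CTRL-C break signal bit, as in dos/dos.h


class TaskPair:
    """Toy model of a 68K task and its PPC mirror (hypothetical names)."""

    def __init__(self):
        self.active = "68K"   # only one half runs at any instant
        self.signals = 0      # assumption: one mask shared by both halves

    def switch(self):
        """Context switch: the active half sleeps, the other one runs."""
        self.active = "PPC" if self.active == "68K" else "68K"
        return self.active

    def signal(self, mask):
        """Deliver a signal; it does not matter which half is running."""
        self.signals |= mask

    def check(self, mask):
        """Return and clear the requested bits, like polling for a break."""
        hit = self.signals & mask
        self.signals &= ~mask
        return hit


pair = TaskPair()
pair.switch()                   # the PPC half is now running
pair.signal(SIGBREAKF_CTRL_C)   # user presses CTRL-C
pair.switch()                   # back on the 68K half
seen = pair.check(SIGBREAKF_CTRL_C)
```

Whichever half happens to be awake when it polls will see the break, which is the practical effect described above.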

It still required cache flushing in the same way as PUP; the main advantage of the system (at the time) was that it was considerably quicker at it. That performance gap closed in later versions of PUP, but by then we'd enjoyed the "kernel" wars and WarpUP had become the unofficial standard. At least until OS3.9, where the picture.datatype used it. That's about as official as it ever got.

Each system had its strengths and weaknesses. The Achilles heel of both was the context switch. It may have been faster on WarpUP, but in real terms it was still shockingly expensive. You had to carefully design code to minimise these if you wanted your application to run acceptably.
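
A back-of-the-envelope cost model shows why minimising context switches mattered so much. The figures below are invented purely for illustration (nobody in this thread quotes real timings), and `frame_time_us` is a hypothetical helper, not part of either kernel's API.

```python
def frame_time_us(work_us, os_calls, switch_cost_us):
    """Time for one frame: useful work plus one round trip per OS call."""
    return work_us + os_calls * switch_cost_us


SWITCH_COST_US = 500   # hypothetical cost of one cross-CPU round trip
WORK_US = 10_000       # hypothetical PPC compute time per frame

# Calling the OS 40 times per frame, each call paying its own switch.
chatty = frame_time_us(WORK_US, os_calls=40, switch_cost_us=SWITCH_COST_US)

# Doing all the OS work in a single call per frame.
batched = frame_time_us(WORK_US, os_calls=1, switch_cost_us=SWITCH_COST_US)
```

With these made-up numbers the chatty version spends twice as long switching as it does computing, while the batched version loses only 5% to the switch. Whatever the real figures were, the cost scales with the number of switches, not with the amount of work done per switch.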

*Ironically, all the offshoot operating systems use ELF as their native binary format.
« Last Edit: January 30, 2012, 08:25:55 PM by Karlos »
int p; // A
 

Offline Tension

Re: PowerPC accelerator - how does that work then?
« Reply #2 on: January 30, 2012, 08:27:59 PM »
barry do teh money dance

Offline itix

  • Hero Member
  • *****
  • Join Date: Oct 2002
  • Posts: 2380
Re: PowerPC accelerator - how does that work then?
« Reply #3 on: January 30, 2012, 08:29:36 PM »
Or, to make a long story short: the PPC is usable only as a co-processor doing heavy computation in the background. Calling the OS from the PPC is too expensive.
My Amigas: A500, Mac Mini and PowerBook
 

Offline Karlos

Re: PowerPC accelerator - how does that work then?
« Reply #4 on: January 30, 2012, 08:36:34 PM »
Quote from: itix;678304
Or, to make a long story short: the PPC is usable only as a co-processor doing heavy computation in the background. Calling the OS from the PPC is too expensive.


That's not exactly true, though, is it? There were plenty of applications that ran almost entirely on the PPC, calling the OS to do IO, graphics etc.

The way I used to write code (albeit for WarpUP, never really did much for PUP) was to have a main 68K function that was polled from the PPC (usually once per frame in a graphical application). That function would then do all the OS stuff: switch screen buffers, collect the contents of IDCMP messages and so on, before returning to the PPC. The PPC would then do a bunch of native callbacks for said messages before continuing on with its main loop. It was surprisingly effective :)
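
The shape of that loop can be sketched in Python. This is only a model of the pattern, with everything in it hypothetical: `Host68k` stands in for the 68K function that does the OS work, `poll` represents the single cross-CPU call per frame, and the message kinds are invented rather than real IDCMP classes.

```python
from collections import deque


class Host68k:
    """Stand-in for the 68K side, which owns all OS interaction."""

    def __init__(self):
        self.pending = deque()   # messages the OS has queued for us
        self.switches = 0        # cross-CPU round trips so far

    def poll(self):
        """One cross-CPU call: do the OS work, return all messages at once."""
        self.switches += 1
        messages = list(self.pending)
        self.pending.clear()
        return messages


def run_frames(host, handlers, frames):
    """PPC-side main loop: one poll per frame, then native callbacks."""
    handled = []
    for _ in range(frames):
        for kind, payload in host.poll():
            handler = handlers.get(kind)
            if handler is not None:
                handled.append(handler(payload))
        # ... the PPC would render the next frame here ...
    return handled


host = Host68k()
host.pending.extend([("key", "a"), ("mouse", (1, 2)), ("key", "b")])
results = run_frames(host, {"key": str.upper}, frames=3)
```

However many messages arrive, the number of context switches stays at one per frame, and all the per-message handling runs natively on the PPC side.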
 

Offline Karlos

Re: PowerPC accelerator - how does that work then?
« Reply #5 on: January 30, 2012, 08:46:39 PM »
An interesting side-effect of the way both systems worked is that you actually got a degree of multiprocessing.

When a 68K thread invoked a PPC operation, it basically went to sleep until the PPC finished working and returned, and vice versa.

The PPC kernels were both, as far as I know, preemptive, as of course is Exec. So, while a given 68K thread was asleep and the PPC was doing some work for it, Exec was able to run a different 68K thread during that time (memory bandwidth permitting). Likewise, when any given PPC thread was asleep waiting for the 68K, the PPC kernel was able to run a different PPC thread.

Both processors ran with cache and copyback enabled and so if whatever they were executing didn't hit memory at the same time, you could get a 68K thread and (unrelated) PPC thread running concurrently.

Still, it probably didn't happen that often given that most people ran only one PPC application at a time.
 

Offline Karlos

Re: PowerPC accelerator - how does that work then?
« Reply #6 on: January 30, 2012, 09:15:10 PM »
Quote from: Tension;678303
barry do teh money dance


Also, the above...

/me slowly edges backwards towards the door, smiling at the strange man from Belfast in a non-threatening manner :lol:
 

Offline commodorejohn

  • Hero Member
  • *****
  • Join Date: Mar 2010
  • Posts: 3165
    • http://www.commodorejohn.com
Re: PowerPC accelerator - how does that work then?
« Reply #7 on: January 30, 2012, 09:23:26 PM »
So nobody outside of OS4/MOS went the Apple route and simply did 68k in emulation? Huh.
Computers: Amiga 1200, DEC VAXStation 4000/60, DEC MicroPDP-11/73
Synthesizers: Roland JX-10/MT-32/D-10, Oberheim Matrix-6, Yamaha DX7/FB-01, Korg MS-20 Mini, Ensoniq Mirage/SQ-80, Sequential Circuits Prophet-600, Hohner String Performer

"'Legacy code' often differs from its suggested alternative by actually working and scaling." - Bjarne Stroustrup
 

Offline Karlos

Re: PowerPC accelerator - how does that work then?
« Reply #8 on: January 30, 2012, 09:29:15 PM »
Quote from: commodorejohn;678313
So nobody outside of OS4/MOS went the Apple route and simply did 68k in emulation? Huh.

That was the plan for later PPC accelerators that never materialized. Phase5 had announced the CyberstormG4 and BlizzardG4 line. I badly wanted the latter and had already located a source of spare kidneys to fund one, but alas it never appeared. Which is probably good news for one black market transplant recipient; I don't think any credible surgeon would be fooled by what I found at my local butcher's...

Later, there was the AmiJoe G3, but again, it was never released.

All of these boards lacked a 68K processor and would have required emulation resources in their firmware.
« Last Edit: January 30, 2012, 09:31:37 PM by Karlos »
 

Offline Mrs Beanbag (Topic starter)

Re: PowerPC accelerator - how does that work then?
« Reply #9 on: January 30, 2012, 09:48:00 PM »
Wow, now that's a far more detailed answer than I was expecting!  Thanks!
 

Offline Tension

Re: PowerPC accelerator - how does that work then?
« Reply #10 on: January 30, 2012, 10:01:54 PM »
Quote from: Karlos;678311
Also, the above...

/me slowly edges backwards towards the door, smiling at the strange man from Belfast in a non-threatening manner :lol:


Sorry, I think moobunny is starting to get inside my head. ;)

Offline vox

  • Hero Member
  • *****
  • Join Date: Feb 2011
  • Posts: 862
    • http://anticusa.wordpress.com
Re: PowerPC accelerator - how does that work then?
« Reply #11 on: January 30, 2012, 10:04:43 PM »
Quote from: commodorejohn;678313
So nobody outside of OS4/MOS went the Apple route and simply did 68k in emulation? Huh.


That is exactly why OS4 and MOS are seen as the natural progression of the Amiga PPC cards, unlike the CUSA trolls who try to argue that x86 was meant to be the way.

Sadly, while AmigaOS and MOS went the PPC route, PPC kind of died as a desktop platform, primarily because it was abandoned by Microsoft, IBM, Apple and most Linux distros, in that order (and all of them were part of the "let's replace x86 with PPC" alliance in the beginning).
Future Acube and MOS supporter, fi di good, nothing fi di unprofessionals. Learn it harder way! http://www.youtube.com/user/rasvoja and https://www.facebook.com/rasvoja
 

Offline strim

  • Jr. Member
  • **
  • Join Date: Apr 2010
  • Posts: 89
    • http://c0ff33.net/
Re: PowerPC accelerator - how does that work then?
« Reply #12 on: January 30, 2012, 10:34:24 PM »
Quote from: vox;678322
PPC kind of died as a desktop platform

Now that's a common misconception. It's no more dead than the Amiga itself. As long as we, the users, are alive, the platform is not dead.
 

Offline commodorejohn

Re: PowerPC accelerator - how does that work then?
« Reply #13 on: January 30, 2012, 10:37:17 PM »
My Power Macs still work like new...
 

Offline wawrzon

Re: PowerPC accelerator - how does that work then?
« Reply #14 on: January 30, 2012, 10:37:25 PM »
oh yeah! karlos provided the in-depth answer i was hoping for.
i'd like to mention that, even though i now own one, i have considered and still consider the ppc accelerators too complicated, too clumsy, too expensive and, in the long run, not dependable enough a solution to be worth copying. in their time they marked the point when i decided to finally turn my back on what was going to happen to the amiga, and i am very happy i was not aware of the whole ppc tragedy.