Author Topic: Why is SMT (multicore) support hard for Amiga OS? (Read 11657 times)

Karlos · « **Reply #14 on:** June 26, 2010, 10:06:29 PM »

Quote from: bloodline;567434

True, but Forbid() was quicker and easier... Let's also not forget that pointer passing also removed data locality... Two threads on different CPUs could be sharing the same data... and that data could quite easily be located in two different caches unless marked non-cacheable... but no provision was ever made for that!

This assumes that two CPUs are running entirely independently and that no mechanism for enforcing cache coherency exists. This is a very old problem that has been mitigated by hardware advancements quite some time ago. Are you familiar with the MOESI cache coherency model?

Most multicore processors use this or a similar strategy, where the assumption is that pushing data between cores to keep their caches coherent is not a bottleneck, but that talking to memory is.
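For anyone who has not met MOESI before, the per-line state changes can be sketched roughly as below. This is a toy model under simplifying assumptions (one cache line, events delivered as plain function calls); the function names are invented for illustration and do not correspond to any real hardware interface.

```c
#include <assert.h>

/* The five MOESI states a cache line can be in. */
typedef enum { INVALID, SHARED, EXCLUSIVE, OWNED, MODIFIED } LineState;

/* Local CPU reads the line. `others_have_copy` says whether any other
 * cache currently holds a valid copy of it. */
LineState on_local_read(LineState s, int others_have_copy)
{
    if (s != INVALID)
        return s;                         /* hit: state unchanged */
    return others_have_copy ? SHARED : EXCLUSIVE;
}

/* Local CPU writes the line. Other copies are invalidated by the
 * protocol, so this cache ends up as the sole, dirty holder. */
LineState on_local_write(LineState s)
{
    (void)s;
    return MODIFIED;
}

/* Another CPU reads a line this cache holds (a snooped read). */
LineState on_remote_read(LineState s)
{
    switch (s) {
    case MODIFIED:  return OWNED;   /* dirty data goes cache-to-cache,
                                       with no write-back to memory */
    case EXCLUSIVE: return SHARED;
    default:        return s;       /* OWNED, SHARED, INVALID unchanged */
    }
}

/* Another CPU writes the line: the local copy is now stale. */
LineState on_remote_write(LineState s)
{
    (void)s;
    return INVALID;
}

/* The case from the discussion: CPU A writes, then CPU B reads. */
LineState demo_write_then_remote_read(void)
{
    LineState a = on_local_read(INVALID, 0);  /* A loads the line alone */
    a = on_local_write(a);                    /* A dirties it */
    a = on_remote_read(a);                    /* B asks for it */
    assert(a == OWNED);                       /* A still holds the dirty copy */
    return a;
}
```

The Owned state is the part that matters for the argument here: the dirty line is handed to the other core directly, so sharing data between cores does not force a round trip through main memory.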

bloodline · « **Reply #15 on:** June 26, 2010, 10:21:12 PM »

Quote from: Karlos;567439

This assumes that two CPUs are running entirely independently and that no mechanism for enforcing cache coherency exists. This is a very old problem that has been mitigated by hardware advancements quite some time ago. Are you familiar with the MOESI cache coherency model?

Most multicore processors use this or a similar strategy, where the assumption is that pushing data between cores to keep their caches coherent is not a bottleneck, but that talking to memory is.

I'm not aware of that model (until now, will read in a bit), but I am aware of the x86's powerful cache coherency hardware... What I'm not sure about is how well that can work in a non-memory protected environment. I'm not convinced the AmigaOS data model allows for the hardware to detect what apps in AmigaOS are trying to do... please though, do do do prove me wrong

NorthWay · « **Reply #16 on:** June 26, 2010, 10:54:48 PM »

The quick and dirty answer?

Way too many programmers (including the C= library ones) have been too quick and dirty.
Forbid()/Permit() is actually documented as a macro in the official includes. Same with Disable()/Enable().
That means you can't insert your fix in the jumptables and expect all uses of the Big Lock to get caught. Possible solutions are to do Enforcer style ExecBase hacks to mmu-track the lock or interrupt register.
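The macro problem can be illustrated with a small mock. This is only a sketch: `MockExecBase`, `forbid_vector` and the two macros are made-up stand-ins, not the real exec structures or includes. The counter is named after exec's task-switch nesting count, and it is assumed to start at -1 (switching enabled).

```c
#include <assert.h>

/* Toy stand-in for ExecBase: one nesting counter plus one patchable
 * "jump table" entry. All names here are illustrative only. */
struct MockExecBase {
    signed char TDNestCnt;                         /* -1 = task switching enabled */
    void (*forbid_vector)(struct MockExecBase *);  /* library calls go through this */
    int calls_seen_by_patch;                       /* what an SMP patch would observe */
};

/* Original library routine reached through the jump table. */
static void forbid_original(struct MockExecBase *eb)
{
    eb->TDNestCnt++;
}

/* Hypothetical SMP-aware replacement patched into the table. */
static void forbid_patched(struct MockExecBase *eb)
{
    eb->calls_seen_by_patch++;   /* e.g. where other cores would be held off */
    eb->TDNestCnt++;
}

/* Call through the table: a patch sees this. */
#define FORBID_VIA_VECTOR(eb) ((eb)->forbid_vector(eb))

/* Inlined macro form: bumps the counter directly, no call at all. */
#define FORBID_INLINE(eb)     ((void)((eb)->TDNestCnt++))

/* Returns how many of the two locks taken were invisible to the patch. */
int missed_locks(void)
{
    struct MockExecBase eb = { -1, forbid_original, 0 };

    eb.forbid_vector = forbid_patched;   /* install the patch */

    FORBID_VIA_VECTOR(&eb);              /* caught by the patch */
    FORBID_INLINE(&eb);                  /* never reaches the patch */

    assert(eb.TDNestCnt == 1);           /* two locks taken: -1 -> 0 -> 1 */
    return 2 - eb.calls_seen_by_patch;
}
```

Here `missed_locks()` reports one lock that the patched vector never saw, which is why the fallback suggested above is to watch the memory location itself rather than the call.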

From there on in I bet there are many more pitfalls.

Karlos · « **Reply #17 on:** June 26, 2010, 11:00:01 PM »

Quote from: bloodline;567443

I'm not aware of that model (until now, will read in a bit), but I am aware of the x86's powerful cache coherency hardware... What I'm not sure about is how well that can work in a non-memory protected environment. I'm not convinced the AmigaOS data model allows for the hardware to detect what apps in AmigaOS are trying to do... please though, do do do prove me wrong

Memory protection is a tangential issue. It's nice to have, but not a pre-requisite for SMP itself.

There already is a multiprocessing (as defined by having more than one hardware processor running concurrently) precedent on AmigaOS. It surprises me how quickly people forget it.

Think about old WOS (and PUP) for a moment. You had two completely different CPUs sharing the same memory in a non-memory protected environment, in which both CPUs had L1 cache and delayed writes (copyback in 68K parlance) enabled. Cache coherency was maintained in software, using the performance-killing context switch we all learned to hate.

On WOS, every exec Task that invoked a PPC call ended up with a mirror PPC Task. Most inter CPU calls were synchronous (async also possible, if you care to manage the side effects yourself), so your 68K code calls the PPC and goes to sleep until the PPC returns and vice versa. Caches were flushed on whichever CPU was transferring ownership of the data. Conceptually, in WOS, this 68K/PPC pair of Tasks was considered to be a single "virtual Task". All Task signals were routed to whichever one was running.
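The flush-on-handover idea can be modelled in a few lines. This is a toy simulation under heavy assumptions: a single shared word, a one-line copyback cache per CPU, and invented names such as `hand_over`. It is not how WOS was actually coded, only the principle described above.

```c
#include <assert.h>

/* One-line write-back cache in front of a single shared memory word. */
struct Cache { int value; int valid; int dirty; };

static int shared_memory = 0;

static int cpu_read(struct Cache *c)
{
    if (!c->valid) {               /* miss: fetch from memory */
        c->value = shared_memory;
        c->valid = 1;
        c->dirty = 0;
    }
    return c->value;
}

static void cpu_write(struct Cache *c, int v)
{
    c->value = v;                  /* copyback: memory is NOT updated yet */
    c->valid = 1;
    c->dirty = 1;
}

/* The CPU giving up ownership writes back dirty data and drops its
 * copy, so it cannot see a stale value the next time it is woken. */
static void hand_over(struct Cache *from)
{
    if (from->valid && from->dirty)
        shared_memory = from->value;
    from->valid = 0;
    from->dirty = 0;
}

/* 68K code prepares a value and "calls" the PPC side. */
int cross_cpu_call_demo(void)
{
    struct Cache m68k = {0, 0, 0}, ppc = {0, 0, 0};
    shared_memory = 0;

    (void)cpu_read(&ppc);          /* PPC ran earlier and cached the old 0 */
    hand_over(&ppc);               /* PPC returned to the 68K: flushed */

    cpu_write(&m68k, 42);          /* 68K prepares the argument */
    assert(shared_memory == 0);    /* still only in the 68K cache */

    hand_over(&m68k);              /* 68K Task sleeps, PPC Task wakes */
    return cpu_read(&ppc);         /* miss, refetch from memory */
}
```

In this model the PPC side reads back 42 only because the 68K flushed before sleeping. Every handover pays for a full write-back and a later refetch from memory, which is the cost that hardware coherency avoids.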

Imagine I have a WOS application which has the usual mixture of 68K and PPC code. At this instant, the 68K code is executing and is about to invoke a PPC call.

The call is now invoked and my 68K Task goes to sleep and the PPC one wakes up. Since the 68K task is now asleep, exec is free to schedule another. Which it will happily do. At this point, the code for a totally unrelated application is now running on the 68K concurrently with code from my application which is presently busy on the PPC.

You can almost describe the process of putting the 68K thread to sleep and waking up the complementary PPC one during the context switch as moving the "virtual Task" from one CPU to the other. Whether you are switching from 68K to PPC or the other way around, the newly vacated CPU is free to continue executing other Tasks.

This is clearly not the best example of SMP since it's nothing like symmetric and there are in fact two kernels (pretty much unavoidable given that they execute different object code). However, it does demonstrate that two processors can share data in a non-memory protected environment and run concurrently. The main drawback is the software "brute force" cache coherency model used in which each CPU has to flush anything it's touched during the switch, but no alternative was available. Conceptually, you could replace the PPC with an identical 680x0 and the model still holds. It works, it's just let down by the agonizingly slow method of managing cache coherency.

If you can do that, implementing a method for concurrent execution on a dual core CPU that implements a hardware cache coherency model like MOESI (that has no need to flush caches to memory) has got to be somewhat easier.

Karlos · « **Reply #18 on:** June 26, 2010, 11:11:20 PM »

^one point to note about the above example however is that the "switch" from one processor to another is a decision made in the application code itself, rather than being pre-emptively moved. In that respect, you could regard it as "cooperative" multiprocessing :lol:

bloodline · « **Reply #19 on:** June 26, 2010, 11:15:08 PM »

Well, I hardly want to point to the mess of PUP and WOS as examples of multiprocessor systems

But I refer back to Staf, who pointed out that SMP is possible on AmigaOS; the problem is that it might actually hurt performance rather than increase it. Anyway we can argue this as much as we like, I know that Michal (I'm pretty sure it was Michal... he did almost all the other CPU resource stuff) wants to set up SMP support on AROS... so let's see a real example when it comes

Karlos · « **Reply #20 on:** June 26, 2010, 11:51:05 PM »

Quote from: bloodline;567453

Well, I hardly want to point to the mess of PUP and WOS as examples of multiprocessor systems

They may be a mess but they do demonstrate there's nothing fundamentally impossible about it, which is the point. And to be entirely fair, most of that mess in their implementation comes down to the following points, none of which apply to a modern multi core processor:

1) two totally unrelated CPUs are used
2) cache coherency is implemented in software
3) no direct method exists to update the caches in either CPU with data in the other, so flushing cache to memory and rereading it back in must be used

Imagine none of the above complications were the case and you have two identical cores on the same die, with hardware arbitrated coherency that doesn't depend on cache flushes. Suddenly, the entire thing looks a lot less of a mess, doesn't it?

bloodline · « **Reply #21 on:** June 27, 2010, 12:08:39 AM »

Again, I would never point to the impossibility of SMP in Amiga systems, I only suggest that it serves no purpose. What use are two cores when they will run slower than one?

Karlos · « **Reply #22 on:** June 27, 2010, 12:14:33 AM »

Quote from: bloodline;567457

Again, I would never point to the impossibility of SMP in Amiga systems, I only suggest that it serves no purpose. What use are two cores when they will run slower than one?

None, of course. However, that is an assumption that has not been proven.

Louis Dias · « **Reply #23 on:** June 27, 2010, 03:13:44 AM »

Quote from: Karlos;567458

None, of course. However, that is an assumption that has not been proven.

This sounds like a job that could be tested on Minimig/Replay...
Make a 2 68K core version and offload tasks like C2P...

Zac67 · « **Reply #24 on:** June 27, 2010, 10:48:33 AM »

Quote from: Karlos;567449

There already is a multiprocessing (as defined by having more than one hardware processor running concurrently) precedent on AmigaOS. It surprises me how quickly people forget it.

Very good point - but actually this is asymmetric multiprocessing rather than SMP. Just like the old 'coprocessor architecture' with Copper and Blitter it shows the possibilities of offloading special tasks or subroutines to other hardware. With the PPC and more advanced 68ks, cache coherency becomes an issue - which can be solved at some cost as you've stated.

A somewhat similar approach could be used to utilize a 2nd core, but it'd be very troublesome to get the kernel itself to run multithreaded. Novell Netware 4 SMP used something similar where only special programs could make use of additional CPUs.

Real SMP however makes no assumptions as to where the code is actually running - this applies to the OS itself as well. Apart from the Forbid/Permit workarounds, you're looking at a complete redesign of the kernel. Remember the Windows 95/98/ME line? These relatively simple kernel designs stood no chance of running SMP, so MS didn't even bother trying.

Karlos · « **Reply #25 on:** June 27, 2010, 11:54:33 AM »

Quote from: Zac67;567517

Very good point - but actually this is asymmetric multiprocessing rather than SMP. Just like the old 'coprocessor architecture' with Copper and Blitter it shows the possibilities of offloading special tasks or subroutines to other hardware. With the PPC and more advanced 68ks, cache coherency becomes an issue - which can be solved at some cost as you've stated.

I acknowledged the example was asymmetric here:

Quote

This is clearly not the best example of SMP since it's nothing like symmetric and there are in fact two kernels (pretty much unavoidable given that they execute different object code).

Nevertheless, both processors can be executing code concurrently.

Quote

A somewhat similar approach could be used to utilize a 2nd core, but it'd be very troublesome to get the kernel itself to run multithreaded. Novell Netware 4 SMP used something similar where only special programs could make use of additional CPUs.

Indeed, this was the sort of thing I was hinting at here:

Quote

Conceptually, you could replace the PPC with an identical 680x0 and the model still holds. It works, it's just let down by the agonizingly slow method of managing cache coherency.

...

^one point to note about the above example however is that the "switch" from one processor to another is a decision made in the application code itself, rather than being pre-emptively moved. In that respect, you could regard it as "cooperative" multiprocessing :lol:

It's possible to imagine a card with two 680x0 processors on it and a WOS-like system that allows applications to invoke a library call to get a mirror thread running on the other CPU. These mirror threads don't run concurrently, but as soon as such a switch is made, the current CPU is free to run any ready-to-run task from a different application.

This approach is almost certainly going to be agonizingly slow (and slower than a single CPU) though, as you'll be left with a software cache coherency implementation. However, this really only applies to old 680x0 and PPC where you have physically separate processors. A lot of R&D effort has been put into hardware cache coherency for true multicore processors.

It isn't that hard to imagine the scheme working fairly well (well, at least as well as any such "cooperative" SMP model can, it's never going to be as good as genuine SMP) on a modern multicore processor.

Quote

Real SMP however makes no assumptions as to where the code is actually running - this applies to the OS itself as well. Apart from the Forbid/Permit workarounds, you're looking at a complete redesign of the kernel. Remember the Windows 95/98/ME line? These relatively simple kernel designs stood no chance of running SMP, so MS didn't even bother trying.

Yeah, I'm not suggesting that exec as it stands is suitable for "real" SMP, as you say, it would need a redesign from the ground up.

dammy · « **Reply #26 on:** June 27, 2010, 12:56:04 PM »

Quote

Yeah, I'm not suggesting that exec as it stands is suitable for "real" SMP, as you say, it would need a redesign from the ground up.

That's my vote. :-) We need something modern that has the spirit of AOS but that would be a massive undertaking requiring a significant size dev team and funding for that clean sheet of paper.

Zac67 · « **Reply #27 on:** June 27, 2010, 01:23:16 PM »

Well, the usual way to avoid redesigning the kernel is to use an established one and add the desired API to it.
Smell Amithlon, or OS X here?

bloodline · « **Reply #28 on:** June 27, 2010, 02:17:28 PM »

Quote from: Karlos;567533

Yeah, I'm not suggesting that exec as it stands is suitable for "real" SMP, as you say, it would need a redesign from the ground up.

Ergo, we no longer would be using AmigaOS!

Karlos · « **Reply #29 on:** June 27, 2010, 02:25:38 PM »

Quote from: bloodline;567558

Ergo, we no longer would be using AmigaOS!

That boat already sailed with OS4, AROS and MOS
