Author Topic: Why is SMT (multicore) support hard for Amiga OS? (Read 20457 times)

Karlos · « **on:** June 26, 2010, 09:17:05 PM »

Quote from: bloodline;567411

@Bif

As Fats (or Staf as we know him on the AROS list), has pointed out, AmigaOS shares resources... Ok, the Forbid() problem is a big one, but now think about HOW the Amiga shares resources. It's all pointer passing, you basically have no idea which task/process is using what memory... if you need atomic access to a resource you NEED to stop all tasks/processes that are currently running on the system...

This is just another case of one simple design decision, to share memory pointers for a bit of extra speed, coming back to bite us 25 years later

Strictly speaking, Forbid()/Permit() are not needed in the majority of cases where an AmigaOS task wants exclusive access to a (non-hardware) resource. For the most part, SignalSemaphores do a decent job of mediating access to items and lists in the system.
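To make the distinction concrete, here is a minimal sketch in Python rather than Amiga C. It is only an analogy: `GuardedList`, `add_unique` and `worker` are names I made up for illustration, and a `threading.Lock` stands in for a SignalSemaphore (what you would use with ObtainSemaphore()/ReleaseSemaphore() in exec). The point is that each resource carries its own lock, so a task touching one list only ever blocks tasks that want that same list, whereas Forbid() stops everything.

```python
import threading

class GuardedList:
    """A list paired with its own lock, loosely analogous to a
    structure protected by an embedded SignalSemaphore."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def add_unique(self, item):
        # The check and the insert must happen as one step, otherwise
        # two tasks could both decide the item is missing and add it twice.
        with self._lock:
            if item not in self._items:
                self._items.append(item)

    def snapshot(self):
        with self._lock:
            return list(self._items)

def worker(target, names):
    for name in names:
        target.add_unique(name)

# Two unrelated resources, each with its own lock (hypothetical names).
ports = GuardedList()
windows = GuardedList()
names = [f"item{i}" for i in range(200)]

threads = [
    threading.Thread(target=worker, args=(ports, names)),
    threading.Thread(target=worker, args=(ports, names)),      # contends with the first
    threading.Thread(target=worker, args=(windows, names[:50])),  # never blocked by the others
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(ports.snapshot()), len(windows.snapshot()))
```

The two workers hammering `ports` serialise against each other, but the one using `windows` is never held up by them. That per-resource granularity is what you lose when the answer to every locking question is Forbid().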

Karlos · « **Reply #1 on:** June 26, 2010, 10:06:29 PM »

Quote from: bloodline;567434

True, but Forbid() was quicker and easier... Let us also not forget that pointer passing also removed Data locality... Two threads on different CPUs could be sharing the same data... and that data could quite easily be located in two different caches unless marked non-cacheable... but no provision was ever made for that!

This assumes that two CPUs are running entirely independently and that no mechanism for enforcing cache coherency exists. This is a very old problem that has been mitigated by hardware advancements quite some time ago. Are you familiar with the MOESI cache coherency model?

Most multicore processors use this or a similar strategy, where the assumption is that pushing data between cores to keep their caches coherent is not a bottleneck, but that talking to memory is.
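For anyone who hasn't met MOESI before, a toy model may help. This is a sketch under heavy simplifying assumptions, not a description of any real processor: it tracks a single cache line, never evicts anything, and assumes that any cache holding the line can supply it to another core. The `Line` class and its counters are invented for illustration. What it does show correctly is the basic state movement: a dirty (Modified) line that another core reads becomes Owned and is handed over cache-to-cache, without a trip to memory.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    O = "Owned"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

class Line:
    """One cache line as seen by several cores (simplified MOESI)."""

    def __init__(self, cores):
        self.state = [State.I] * cores
        self.memory_reads = 0  # times the line was fetched from main memory
        self.transfers = 0     # times the line was passed cache-to-cache

    def read(self, core):
        if self.state[core] is not State.I:
            return  # read hit: no state change
        holders = [c for c, s in enumerate(self.state) if s is not State.I]
        if not holders:
            self.memory_reads += 1
            self.state[core] = State.E  # sole copy, clean
            return
        # Assumption: some other cache supplies the data directly.
        self.transfers += 1
        for c in holders:
            if self.state[c] is State.M:
                self.state[c] = State.O  # still dirty, now shared
            elif self.state[c] is State.E:
                self.state[c] = State.S
        self.state[core] = State.S

    def write(self, core):
        others = [c for c in range(len(self.state)) if c != core]
        if self.state[core] is State.I:
            # Write miss: the data has to come from somewhere first.
            if any(self.state[c] is not State.I for c in others):
                self.transfers += 1
            else:
                self.memory_reads += 1
        for c in others:
            self.state[c] = State.I  # invalidate every other copy
        self.state[core] = State.M

line = Line(cores=2)
line.read(0)   # core 0 loads from memory          -> E, I
line.write(0)  # core 0 modifies its private copy  -> M, I
line.read(1)   # core 1 reads; core 0 supplies it  -> O, S
line.write(1)  # core 1 writes; core 0 invalidated -> I, M
print([s.name for s in line.state], line.memory_reads, line.transfers)
```

In that whole sequence, memory is touched exactly once, for the initial load. Everything after that moves between the two caches, which is the assumption I described above: core-to-core traffic is cheap, memory traffic is not.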

Karlos · « **Reply #2 on:** June 26, 2010, 11:00:01 PM »

Quote from: bloodline;567443

I'm not aware of that model (until now, will read in a bit), but I am aware of the x86's power cache coherency hardware... What I'm not sure about is how well that can work in a non memory protected environment. I'm not convinced the AmigaOS data model allows for the hardware to detect what apps in AmigaOS are trying to do... please though, do do do prove me wrong

Memory protection is a tangential issue. It's nice to have, but not a pre-requisite for SMP itself.

There already is a multiprocessing (as defined by having more than one hardware processor running concurrently) precedent on AmigaOS. It surprises me how quickly people forget it.

Think about old WOS (and PUP) for a moment. You had two completely different CPUs sharing the same memory in a non-memory protected environment, in which both CPUs had L1 cache and delayed writes (copyback in 68K parlance) enabled. Cache coherency was maintained in software, using the performance-killing context switch we all learned to hate.

On WOS, every exec Task that invoked a PPC call ended up with a mirror PPC Task. Most inter CPU calls were synchronous (async also possible, if you care to manage the side effects yourself), so your 68K code calls the PPC and goes to sleep until the PPC returns and vice versa. Caches were flushed on whichever CPU was transferring ownership of the data. Conceptually, in WOS, this 68K/PPC pair of Tasks was considered to be a single "virtual Task". All Task signals were routed to whichever one was running.

Imagine I have a WOS application which has the usual mixture of 68K and PPC code. At this instant, the 68K code is executing and is about to invoke a PPC call.

The call is now invoked and my 68K Task goes to sleep and the PPC one wakes up. Since the 68K task is now asleep, exec is free to schedule another. Which it will happily do. At this point, the code for a totally unrelated application is now running on the 68K concurrently with code from my application which is presently busy on the PPC.

You can almost describe the process of putting the 68K thread to sleep and waking up the complementary PPC one during the context switch as moving the "virtual Task" from one CPU to the other. Whether you are switching from 68K to PPC or the other way around, the newly vacated CPU is free to continue executing other Tasks.
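If it helps, the idea can be boiled down to a tiny tick-by-tick simulation. This is purely illustrative and assumes a lot: each "virtual Task" is just a list saying which CPU must execute each tick of its work, scheduling is first-come in dictionary order rather than anything exec or the PPC kernel really does, and the cost of the context switch and cache flush is ignored entirely. The names `simulate`, `mine` and `other` are made up.

```python
def simulate(apps):
    """Return a timeline of (task on 68K, task on PPC) for each tick.

    apps maps a task name to its program: a list naming the CPU
    ("68K" or "PPC") that has to execute each successive tick of work.
    """
    progress = {name: 0 for name in apps}
    timeline = []
    while any(progress[name] < len(program) for name, program in apps.items()):
        slot = {}
        for cpu in ("68K", "PPC"):
            # Each CPU takes the first task whose next step belongs to it.
            # A task's next step names exactly one CPU, so the same task
            # can never be picked by both processors in one tick.
            for name, program in apps.items():
                pos = progress[name]
                if pos < len(program) and program[pos] == cpu:
                    slot[cpu] = name
                    break
        for name in slot.values():
            progress[name] += 1
        timeline.append((slot.get("68K"), slot.get("PPC")))
    return timeline

apps = {
    # My application: 68K code, a synchronous PPC call lasting two ticks, then back.
    "mine": ["68K", "PPC", "PPC", "68K"],
    # A totally unrelated, 68K-only application.
    "other": ["68K", "68K", "68K"],
}
timeline = simulate(apps)
for tick, (m68k, ppc) in enumerate(timeline):
    print(tick, "68K:", m68k, "PPC:", ppc)
```

During ticks 1 and 2 the 68K is running `other` while the PPC is busy with `mine`. My 68K Task is asleep for those ticks, but the 68K itself is not.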

This is clearly not the best example of SMP since it's nothing like symmetric and there are in fact two kernels (pretty much unavoidable given that they execute different object code). However, it does demonstrate that two processors can share data in a non-memory protected environment and run concurrently. The main drawback is the software "brute force" cache coherency model used in which each CPU has to flush anything it's touched during the switch, but no alternative was available. Conceptually, you could replace the PPC with an identical 680x0 and the model still holds. It works, it's just let down by the agonizingly slow method of managing cache coherency.

If you can do that, implementing a method for concurrent execution on a dual core CPU that implements a hardware cache coherency model like MOESI (that has no need to flush caches to memory) has got to be somewhat easier.

Karlos · « **Reply #3 on:** June 26, 2010, 11:11:20 PM »

^one point to note about the above example however is that the "switch" from one processor to another is a decision made in the application code itself, rather than being pre-emptively moved. In that respect, you could regard it as "cooperative" multiprocessing :lol:

Karlos · « **Reply #4 on:** June 26, 2010, 11:51:05 PM »

Quote from: bloodline;567453

Well, I hardly want to point to the mess of PUP and WOS as examples of multiprocessor systems

They may be a mess but they do demonstrate there's nothing fundamentally impossible about it, which is the point. And to be entirely fair, most of that mess in their implementation comes down to the following points, none of which apply to a modern multi core processor:

1) two totally unrelated CPUs are used
2) cache coherency is implemented in software
3) no direct method exists to update the caches in either CPU with data in the other, so flushing cache to memory and rereading it back in must be used

Imagine none of the above complications were the case and you have two identical cores on the same die, with hardware arbitrated coherency that doesn't depend on cache flushes. Suddenly, the entire thing looks a lot less of a mess, doesn't it?

Karlos · « **Reply #5 on:** June 27, 2010, 12:14:33 AM »

Quote from: bloodline;567457

Again, I would never point to the impossibility of SMP in Amiga systems, I only suggest that it serves no purpose. What use is two cores when they will run slower than one?

None, of course. However, that is an assumption that has not been proven.

Karlos · « **Reply #6 on:** June 27, 2010, 11:54:33 AM »

Quote from: Zac67;567517

Very good point - but actually this is asymmetric multiprocessing rather than SMP. Just like the old 'coprocessor architecture' with Copper and Blitter it shows the possibilities of offloading special tasks or subroutines to other hardware. With the PPC and more advanced 68ks, cache coherency becomes an issue - which can be solved at some cost as you've stated.

I acknowledged the example was asymmetric here:

Quote

This is clearly not the best example of SMP since it's nothing like symmetric and there are in fact two kernels (pretty much unavoidable given that they execute different object code).

Nevertheless, both processors can be executing code concurrently.

Quote

A somewhat similar approach could be used to utilize a 2nd core, but it'd be very troublesome to get the kernel itself to run multithreaded. Novell Netware 4 SMP used something similar where only special programs could make use of additional CPUs.

Indeed, this was the sort of thing I was hinting at here:

Quote

Conceptually, you could replace the PPC with an identical 680x0 and the model still holds. It works, it's just let down by the agonizingly slow method of managing cache coherency.

...

^one point to note about the above example however is that the "switch" from one processor to another is a decision made in the application code itself, rather than being pre-emptively moved. In that respect, you could regard it as "cooperative" multiprocessing :lol:

It's possible to imagine a card with two 680x0 processors on it and a WOS-like system that allows applications to invoke a library call to get a mirror thread running on the other CPU. These mirror threads don't run concurrently, but as soon as such a switch is made, the current CPU is free to run any ready-to-run task from a different application.

This approach is almost certainly going to be agonizingly slow (and slower than a single CPU) though, as you'll be left with a software cache coherency implementation. However, this really only applies to old 680x0 and PPC where you have physically separate processors. A lot of R&D effort has been put into hardware cache coherency for true multicore processors.

It isn't that hard to imagine the scheme working fairly well (well, at least as well as any such "cooperative" SMP model can, it's never going to be as good as genuine SMP) on a modern multicore processor.

Quote

Real SMP however makes no assumptions as to where the code is actually running - this applies to the OS itself as well. Apart from the Forbid/Permit workarounds, you're looking at a complete redesign of the kernel. Remember the Windows 95/98/ME line? These relatively simple kernel designs stood no chance of running SMP, so MS didn't even bother trying.

Yeah, I'm not suggesting that exec as it stands is suitable for "real" SMP, as you say, it would need a redesign from the ground up.

Karlos · « **Reply #7 on:** June 27, 2010, 02:25:38 PM »

Quote from: bloodline;567558

Ergo, we no longer would be using AmigaOS!

That boat already sailed with OS4, AROS and MOS

Karlos · « **Reply #8 on:** June 27, 2010, 04:49:04 PM »

Quote from: Amiga_Nut;567575

Yes but in classic Amiga + PPC card systems you are killing one processor to allow the other to function. If you have to kill the 680x0 CPU for the 603/604 PPC to execute then how is that multi processor supporting OS?

Wrong. You have clearly misunderstood how the system worked. Nothing is "killed off", certainly not the CPU.

Your "virtual task" has just moved from 68K to PPC (or vice versa). In practice, your physical 680x0 task goes to sleep whilst the mirror PPC one is running (and vice versa when it returns).

Each processor is running a pre-emptive multitasking kernel. When either the currently active 68K or PPC task goes to sleep, the corresponding processor is immediately able to continue running a different task when the kernel next runs its scheduling routine.

Karlos · « **Reply #9 on:** June 28, 2010, 07:21:40 PM »

Quote from: AmigoHD;567813

But without software which would use it, smp is useless and a waste of time to implement it IMHO.

Sorry for this offtopic answer (as my first answer ever here :-)), but it's just true.

All software can take advantage of SMP; it doesn't have to be written for it. All you have to do is run more than one application at once to get the benefit of having additional processor cores.

Beyond that, any application that has multiple threads will also potentially benefit if it ever has two threads that are ready to run. Some apps create lower priority background threads for various maintenance purposes that would normally only get the chance to run when nothing more important was using the CPU. In a true SMP system, they'd not be held back.

Lastly, remember that on AmigaOS, for example, the user interface layout often happens in a different thread than the application itself (not so sure about MUI there).
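As a back-of-envelope illustration of the first point, here is a sketch that works out how long a set of completely independent, single-threaded jobs takes to finish on one core versus two. It assumes an idealised scheduler that hands each job to the least-loaded core and it ignores every overhead (locking, cache traffic, the scheduler itself), so treat the numbers as a best case. The function name and the job lengths are made up.

```python
import heapq

def makespan(task_lengths, cores):
    """Time to finish all tasks when each one goes to the least-loaded core."""
    loads = [0] * cores
    heapq.heapify(loads)
    # Place the longest jobs first; each goes to whichever core is lightest.
    for length in sorted(task_lengths, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + length)
    return max(loads)

jobs = [4, 3, 2, 1]  # four unrelated applications, in arbitrary time units
print(makespan(jobs, cores=1))  # 10
print(makespan(jobs, cores=2))  # 5
```

None of those four jobs knows anything about SMP, yet the batch finishes in half the time. Whether a real AmigaOS-like kernel could get anywhere near that once its own locking costs are counted is, of course, the open question in this thread.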

Karlos · « **Reply #10 on:** June 28, 2010, 07:46:51 PM »

Quote from: trekiej;567829

I hope this is not off topic.
I understand that the Amiga is an Asymmetric type machine.
Is it possible to make a library to run software on another core?
If I was running an MPEG decoder, could it be decoded on another core?

You do that every time you use a ppc datatype in OS3.x or run any application that supports WarpOS/PowerUp.

Karlos · « **Reply #11 on:** June 28, 2010, 08:02:36 PM »

Quote from: trekiej;567832

Unfortunately, I do not think this happens to help programs run any faster or be concurrent.

Nope, offloading image decoding to a CPU with 9.6x the clockspeed of your 68040 makes nooo difference

You should read further up the thread to understand how WOS works. The 680x0 is not idle when the PPC is called to do a task; it is instead busy running some other application's code. The reason is that the calling 68K task goes to sleep. The moment the presently active task goes to sleep, exec is free to switch it out for a task that is ready to run at the next scheduling interrupt.

There seems to be this idea that either the 680x0 is running, or the PPC is running. This is not the case. Both CPUs are running at the same time and only when they both attempt to access memory at the same moment is one physically blocked by another.

What is true is that a synchronous PPC <-> 68K call puts the calling thread to sleep and waits until the callee's thread has finished executing the code.

There are also asynchronous calls but due to the complexity of using them, not many applications do.

Karlos · « **Reply #12 on:** June 28, 2010, 11:58:14 PM »

Quote from: Arkhan;567865

To: Amiga_Nut
<3: Karlos

^ top right: Product Placement Fail.

Karlos · « **Reply #13 on:** June 29, 2010, 12:25:26 AM »

I should point out that I'm in no way advocating a WOS-like approach, which, given my numerous references to it in this thread, people might think. The only reason I mention it at all is that it serves as a demonstration that general multiprocessing (i.e. two processors that can run code simultaneously, rather than one processor and a custom chip or FPU etc) already has a precedent of sorts.

True SMP is a different beast altogether, of course.

Karlos · « **Reply #14 on:** June 29, 2010, 08:25:14 PM »

Quote from: psxphill;567981

The PPC isn't running AmigaOS though. The hardware isn't the issue.

If you wanted to run two 68000's, then ExecBase would have to change.

Not in a WOS style implementation, where tasks are "transferred" cooperatively from one CPU to another, it wouldn't (as I said earlier, this isn't a serious option for many good reasons). In a true SMP implementation, of course it would.
