Author Topic: Why is SMT (multicore) support hard for Amiga OS? (Read 30849 times)

Bif · « **on:** June 26, 2010, 01:06:18 AM »

I've been lurking here for a while, and I've seen many posts saying that adding support for more than one CPU to any flavour of Amiga OS is difficult or impossible for some technical reason, especially lately with the dual-core CPU in the X1000.

However, I've yet to find a detailed explanation/reference as to why it would be difficult. I'm sure such info has been posted somewhere, but I can't find it with a cursory search of Google and amiga.org. If you can point me to it that would be much appreciated.

I can't seem to shake my mid-life crisis, which has had me spending way too much time over the last year reading about the Amiga. I was in love with it 20 years ago but then got busy with a software engineering career that took me away from it. I may now have ample free time to return to programming as a hobby, and advancing Amiga OS (I guess it will have to be AROS) is something that seems to be drawing me in. While I have nothing against them, I can't stand the idea of Windows/Linux programming as a hobby, as it has been beaten to death and there isn't much I feel I can contribute. I always wanted to program the Amiga but never got far into it before getting sidetracked, so I don't know much about Amiga OS.

Anyway, for the last 5 years I did almost nothing but multi-core software design and programming across a diverse array of hardware. I've never written a multi-core OS, though. But from my user's view, I don't get why multi-core is hard: threads can run on whatever CPU, the synchronization primitives built into the OS need to be updated to work correctly across cores, and the kernel needs to know how to schedule across cores. I must be missing something, obviously.
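For readers less familiar with the multi-core side, here is roughly what "updating the synchronization primitives" means. On one CPU, a critical section can be protected simply by not switching tasks; once two cores can touch the same data at the same instant, the kernel needs an atomic read-modify-write instead. Below is a minimal sketch of a test-and-set spinlock using C11 atomics; `SpinLock` and the `spin_*` functions are hypothetical names, not part of any Amiga API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical test-and-set spinlock (not an Amiga API). */
typedef struct {
    atomic_bool held;
} SpinLock;

static void spin_lock(SpinLock *l)
{
    /* exchange returns the old value: loop until we are the one that flipped
       false -> true. Acquire ordering keeps the critical section below this. */
    while (atomic_exchange_explicit(&l->held, true, memory_order_acquire)) {
        /* busy-wait; a real kernel would add a CPU pause hint and back-off */
    }
}

static bool spin_trylock(SpinLock *l)
{
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void spin_unlock(SpinLock *l)
{
    /* Release ordering: writes made inside the critical section are
       visible to the next owner before it sees the lock as free. */
    bool was_held = atomic_exchange_explicit(&l->held, false, memory_order_release);
    assert(was_held && "unlocking a SpinLock that was not held");
    (void)was_held; /* avoid an unused-variable warning when NDEBUG is set */
}
```

A spinning lock like this only makes sense for very short critical sections; longer waits would normally block the task instead.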

While searching, I read some AmigaOS 4.0 slides saying that multi-threading was added or being added. Does that mean an AmigaOS task cannot have multiple threads? I'd find that surprising, but not unbelievable, and I guess it would be a key problem, since cores could then only be allocated per task instead of per thread. And if AmigaOS/MorphOS/AROS only share a common AmigaOS API up to 3.x, I guess it becomes a real challenge to create new OS APIs that they all standardize on?

Anyway, sorry for all the clueless hypothesizing, but I really am curious where to get the lowdown. If I had more time I'd break out the AROS source and check, but I won't have much time on my hands for a few months yet.

Thanks

Bif · « **Reply #1 on:** June 26, 2010, 06:20:50 PM »

Quote from: Fats;567366

AmigaOS uses a lot of public memory, such as ExecBase. To allow safe access to these structures under multitasking, the accesses are often surrounded by Forbid()/Permit() pairs. This happens in system code, libraries and user programs.
The problem is that on a multi-core/multi-CPU system, the programs running on the other processor(s) also have to be paused when a Forbid() call is made. This will make it very difficult to get any advantage from multiple cores, and may even make things run slower due to the overhead.
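A rough sketch of what Fats describes, assuming a naive SMP port keeps the Forbid() semantics by turning it into one global, re-entrant "big lock" owned by a CPU. Every name here (`smp_forbid`, `smp_permit`, `cpu_may_schedule`) is hypothetical, not the exec.library implementation. It only shows the check a scheduler would make at its scheduling points; a real port would also have to interrupt the other CPUs to stop tasks already running there, which is where much of the overhead comes from.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_CPU (-1)

/* CPU currently inside a Forbid() section, or NO_CPU. */
static atomic_int forbid_owner = NO_CPU;
/* Nesting depth; only ever touched by the owning CPU. */
static int forbid_nest = 0;

/* Hypothetical SMP Forbid(): one global re-entrant "big lock". */
static void smp_forbid(int cpu)
{
    if (atomic_load_explicit(&forbid_owner, memory_order_relaxed) == cpu) {
        forbid_nest++;                 /* nested Forbid() on the same CPU */
        return;
    }
    int expected = NO_CPU;
    /* Spin while another CPU is inside Forbid(): this CPU is serialized. */
    while (!atomic_compare_exchange_weak_explicit(&forbid_owner, &expected, cpu,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
        expected = NO_CPU;             /* the failed CAS stored the current owner */
    }
    forbid_nest = 1;
}

/* Hypothetical SMP Permit(): release the lock when the outermost level ends. */
static void smp_permit(int cpu)
{
    assert(atomic_load_explicit(&forbid_owner, memory_order_relaxed) == cpu);
    assert(forbid_nest > 0);
    if (--forbid_nest == 0)
        atomic_store_explicit(&forbid_owner, NO_CPU, memory_order_release);
}

/* Forbid() promises that no *other* task runs, so every other CPU's
   scheduler must refuse to dispatch anything while the lock is held. */
static bool cpu_may_schedule(int cpu)
{
    int owner = atomic_load_explicit(&forbid_owner, memory_order_acquire);
    return owner == NO_CPU || owner == cpu;
}
```

While CPU 0 is inside the outermost Forbid(), `cpu_may_schedule(1)` stays false, so the second core has nothing useful to do.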

greets,
Staf.

Oops, yes, I did mean to say SMP, duh. SMT presents the same problem though.

Thanks for the insight Fats, that sheds a bit of light on it.

So it seems Forbid()/Permit() is a coarse-grained synchronization mechanism that allows shared access to resources. I don't think that should be a huge obstacle to multicore. Admittedly, while one core holds a Forbid(), a thread on another core might have to block/idle, which is not desirable.

However, I think every multicore OS already has this problem and suffers from it. There is not much way around it: access to shared resources requires synchronization. The Win32 API, for example, may not have an equivalent of Forbid()/Permit(), but when you call into the API to allocate memory or access the file system, it performs its own synchronization anyway. If two threads are accessing the file system, one will have to wait for the other to exit the critical parts of the API.

Probably a key difference here is the granularity: in Win32 the file system may have one or more mutexes of its own, and the memory allocator a different one, so file system access won't block memory allocation, and so on. In other ways the Forbid()/Permit() scheme probably has advantages, because each AmigaOS API call doesn't have to do its own synchronization and can therefore run more efficiently. I've designed a number of APIs that work just this way, and they execute on multicore just fine. Taking out a lock on every API call adds up, because locks themselves are not free.
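To illustrate the granularity point, here is a minimal sketch with one POSIX mutex per subsystem. The `fs_*` and `mem_*` functions are hypothetical stand-ins for file-system and allocator bookkeeping; the point is only that a thread holding the file-system lock never blocks a thread that is accounting memory, which a single Forbid()-style lock cannot offer.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One lock per subsystem instead of one global lock. */
static pthread_mutex_t fs_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mem_lock = PTHREAD_MUTEX_INITIALIZER;

static long   open_files   = 0;  /* guarded by fs_lock  */
static size_t bytes_in_use = 0;  /* guarded by mem_lock */

static void fs_open(void)
{
    pthread_mutex_lock(&fs_lock);
    open_files++;                  /* directory lookups etc. would happen here */
    pthread_mutex_unlock(&fs_lock);
}

static void mem_account_alloc(size_t n)
{
    pthread_mutex_lock(&mem_lock); /* never waits on fs_lock */
    bytes_in_use += n;
    pthread_mutex_unlock(&mem_lock);
}

static void mem_account_free(size_t n)
{
    pthread_mutex_lock(&mem_lock);
    assert(bytes_in_use >= n);     /* catch double frees in this toy model */
    bytes_in_use -= n;
    pthread_mutex_unlock(&mem_lock);
}
```

The trade-off Bif mentions still applies: every call now pays for a lock, and code that needs both subsystems must take the locks in a consistent order to avoid deadlock.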

The main reason I don't see AmigaOS calls blocking as a huge issue is that a well-written program using heaps of CPU probably shouldn't be spending most of it inside AmigaOS calls. You shouldn't have to call into AmigaOS to perform DSP, physics, rendering, or whatever other kind of expensive logic needs all that CPU. Memory allocation may be an issue, but I don't think it should be too bad, and if you have to, you can create your own private heap to allocate from without any synchronization. Where I come from, you try to allocate everything up front and never reallocate, so you don't have to deal with the memory system at all. I'd think most AmigaOS calls would come down to user interface and IO. I can't see 10 programs all trying to access the UI at once and stalling things out. I think only file IO is likely to be a real problem, and every other multicore OS has the same blocking problem there. It's really not much of a problem in practice.
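The "private heap" idea can be as simple as a per-thread bump (arena) allocator: grab one block from the shared allocator up front, hand out pieces of it without any locking, and release everything at once. A minimal sketch, assuming one `Arena` per thread and objects that share a lifetime; the names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A private bump allocator: one per thread, so it needs no lock. */
typedef struct {
    unsigned char *base;
    size_t size;
    size_t used;
} Arena;

static int arena_init(Arena *a, size_t size)
{
    a->base = malloc(size);        /* the only trip into the shared allocator */
    a->size = a->base ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

static void *arena_alloc(Arena *a, size_t n)
{
    assert(a->base != NULL);
    const size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) & ~(align - 1);  /* round up */
    if (start > a->size || n > a->size - start)
        return NULL;               /* out of space: caller decides what to do */
    a->used = start + n;
    return a->base + start;
}

/* Objects are never freed individually; the whole arena is recycled. */
static void arena_reset(Arena *a) { a->used = 0; }

static void arena_destroy(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```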

I think for multicore, all you can do is try to minimize the time the OS holds locks on shared resources. If programs are poorly written and put Forbid() around large chunks of code, that will prevent good multicore execution. But it also hurts single-core execution, since other tasks can't get their timeslice while they are locked out.

Ideally you would create lockless APIs... but realistically that's nearly impossible, especially for OS functionality, I would think.
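Lock-free structures are practical for narrow cases even if a whole lockless OS API is not. A common example is a single-producer/single-consumer ring buffer, sketched here with C11 atomics. It is only correct with exactly one producer thread and one consumer thread; `Ring`, `ring_push` and `ring_pop` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_CAP 8u   /* must be a power of two */
static_assert((RING_CAP & (RING_CAP - 1)) == 0, "RING_CAP must be a power of two");

typedef struct {
    int slots[RING_CAP];
    atomic_size_t head;   /* next slot to read;  written only by the consumer */
    atomic_size_t tail;   /* next slot to write; written only by the producer */
} Ring;

/* Producer side. */
static bool ring_push(Ring *r, int v)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_CAP)
        return false;                                 /* full */
    r->slots[tail & (RING_CAP - 1)] = v;
    /* Release: the slot write is visible before the new tail. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side. */
static bool ring_pop(Ring *r, int *out)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;                                 /* empty */
    *out = r->slots[head & (RING_CAP - 1)];
    /* Release: the slot is read before the producer may reuse it. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

The indices only ever grow and are masked on use, so unsigned wrap-around keeps `tail - head` correct.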

Anyway, I'm just kind of babbling... I'm still wondering if there is something else big preventing AmigaOS from going multi-core.

Bif · « **Reply #2 on:** June 28, 2010, 06:41:12 PM »

Lots of interesting stuff here; I am learning a lot just sitting back and watching for a bit =).

Quote from: amigadave;567402

@Bif,

First off, if it hasn't already been done, allow me to welcome you to the Amiga.org forums. It is always good to see Amiga users with programming experience/talent return to the Amiga community, as we are woefully short on competent programmers (at least compared to the number of Amiga programmers there once were).

Lastly, even if it proves to be impossible or impractical to implement support for multi-core CPUs in AmigaOS 4.x/MorphOS 2.x/AROS, I hope you will find the time to do some other Amiga programming in the future, and that whatever code you write can be compiled for all three OSes mentioned above.

Thanks amigadave =). I am keen on a number of potential Amiga projects, so I hope to help out at some point. The multicore thing was just the only real mystery gnawing at me. When I used the Amiga back in the early 90s, my pet peeve was the lack of memory protection and virtual memory, which caused the system or apps to go down in flames (multicore wasn't an issue back then =)). So my number one desire is to see Amiga OS get some modern OS conveniences.

In all honesty, I'll probably only be doing AROS work, because the others are simply impractical for me, at least at this time. I will try to keep everything compatible, though, and I'm an open source kind of guy, so someone else can recompile/patch. As a habit I like to write cross-platform code where possible, in case I want what I am working on to run on something totally different.

Quote from: bloodline;567411

@Bif

As Fats (or Staf, as we know him on the AROS list) has pointed out, AmigaOS shares resources... OK, the Forbid() problem is a big one, but now think about HOW the Amiga shares resources. It's all pointer passing; you basically have no idea which task/process is using what memory... If you need atomic access to a resource, you NEED to stop all tasks/processes that are currently running on the system...

This is just another case of one simple design decision, to share memory pointers for a bit of extra speed, coming back to bite us 25 years later.

That sounds plausible. However, how does this even work on a single-core system? Isn't access to these resources already synchronized across tasks/threads? Otherwise, how do you know that whatever you have written to various memory locations isn't in a half-baked state when the other thread that consumes it (or also writes to it) wakes up? Does the kernel have some intelligence when swapping tasks to ensure this somehow? I can't imagine that. I'd think this is exactly what Forbid()/Permit() are for, if there is no other synchronization mechanism protecting these shared memory locations. I would think the kernel would not allow a task with a Forbid() in place to be swapped out for this reason.
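Bif's guess matches how a single-CPU kernel can work: Forbid() just bumps a nest count, and the scheduler defers any task switch until the count drops back to zero. The model below sketches that idea; the struct and functions are illustrative, not exec internals (on real AmigaOS the count lives in ExecBase, and interrupts still run during a Forbid(), which is what Disable() is for). Note that nothing here stops a second CPU, which is the whole SMP problem.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-CPU model of a Forbid()-style nest count (illustrative names). */
typedef struct {
    int  forbid_nest;     /* > 0 while the running task is inside Forbid() */
    bool switch_pending;  /* a timeslice ended while switching was forbidden */
    int  running;         /* id of the task that owns the CPU */
    int  next;            /* id of the task the scheduler would pick next */
} Sched;

static void reschedule(Sched *s)
{
    int prev = s->running;          /* trivial round robin between two tasks */
    s->running = s->next;
    s->next = prev;
}

static void forbid(Sched *s) { s->forbid_nest++; }

static void permit(Sched *s)
{
    assert(s->forbid_nest > 0);
    if (--s->forbid_nest == 0 && s->switch_pending) {
        s->switch_pending = false;
        reschedule(s);              /* perform the switch that was deferred */
    }
}

/* Called from the timer interrupt when the current timeslice expires. */
static void on_timeslice_end(Sched *s)
{
    if (s->forbid_nest > 0)
        s->switch_pending = true;   /* defer: the current task keeps the CPU */
    else
        reschedule(s);
}
```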

I think multi-threaded programs have to live under pretty much the same constraints on single-core and multicore systems (except for cache coherency issues). The difference when going multicore is that you hit multi-threading bugs much sooner, because everything really is running at the same time. On single-core systems you only run into potential problems during a context switch.
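The "half-baked state" problem can be shown deterministically. `counter++` is really a load followed by a store; if another task gets to run between those two steps (a context switch on one core, or simply a second core), one increment is lost. This is a simulation with hypothetical names, not real threads, so the bad interleaving is forced rather than timing-dependent.

```c
#include <assert.h>

/* Hypothetical per-task register holding the value read from memory. */
typedef struct { int reg; } TaskCtx;

static int counter;   /* the shared variable */

/* counter++ split into its two machine-level steps. */
static void step_load(TaskCtx *t)  { t->reg = counter; }
static void step_store(TaskCtx *t) { counter = t->reg + 1; }

/* Both tasks load before either stores: what a badly timed context switch
   can produce on one core, and what two cores produce routinely. */
static int run_interleaved(void)
{
    TaskCtx a, b;
    counter = 0;
    step_load(&a);
    step_load(&b);      /* the "switch" happens here */
    step_store(&a);
    step_store(&b);     /* overwrites a's increment */
    return counter;
}

/* Each read-modify-write completes before the other starts:
   what Forbid() guarantees on a single core. */
static int run_serialized(void)
{
    TaskCtx a, b;
    counter = 0;
    step_load(&a); step_store(&a);
    step_load(&b); step_store(&b);
    return counter;
}
```

`run_interleaved()` returns 1 and `run_serialized()` returns 2, even though both perform two increments.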

Karlos provided a lot of useful information regarding memory/cache coherency. I share the view that multicore is fairly mature these days and not as challenging as it once was.

Quote from: NorthWay;567447

The quick and dirty answer?

Way too many programmers (including the C= library ones) have been too quick and dirty.
Forbid()/Permit() is actually documented as a macro in the official includes. Same with Disable()/Enable().
That means you can't insert your fix in the jump tables and expect all uses of the Big Lock to get caught. Possible solutions are Enforcer-style ExecBase hacks that use the MMU to track accesses to the lock or the interrupt register.
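NorthWay's point about macros can be modeled in a few lines. The sketch is loosely based on the assembler-level FORBID macro, which, if memory serves, increments ExecBase's TDNestCnt field directly; `ExecBaseModel` and the other names are hypothetical. Patching the library vector catches callers that go through it, but code with the macro compiled in never touches the vector.

```c
#include <assert.h>

/* Illustrative stand-in for ExecBase: a nest count plus one jump-table slot. */
typedef struct ExecBaseModel {
    signed char TDNestCnt;                      /* task-switch disable count */
    void (*Forbid)(struct ExecBaseModel *eb);   /* library vector */
} ExecBaseModel;

static void orig_forbid(ExecBaseModel *eb) { eb->TDNestCnt++; }

/* An SMP-aware replacement someone might patch into the vector. */
static int patched_calls = 0;
static void patched_forbid(ExecBaseModel *eb)
{
    patched_calls++;        /* a real patch would also take a global lock here */
    eb->TDNestCnt++;
}

static void install_patch(ExecBaseModel *eb) { eb->Forbid = patched_forbid; }

/* Macro form: writes the field directly, so it never goes through the
   vector and the patch above never runs for code compiled with it. */
#define FORBID_MACRO(eb) ((eb)->TDNestCnt++)
```

After one call through the patched vector and one macro use, the nest count is 2 but the patch has only seen one caller.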

From there on in I bet there are many more pitfalls.

Fair enough, but at this point I think we are looking at next-gen Amiga systems, and I don't think requiring an application recompile is too unreasonable. Is the AROS Forbid() a macro, for example? As for 68k apps, I don't think we care too much about them; they can all run on one CPU, or under emulation if need be. I can see it being a problem if you let 68k apps call directly into an OS kernel running on PPC/x86 and you can't trap those calls.

Quote from: bloodline;567457

Again, I would never claim that SMP is impossible on Amiga systems; I only suggest that it serves no purpose. What use are two cores if they run slower than one?

This is something I struggled with in my day job for years, and I think I have a pretty good idea of what kind of code benefits from multicore and what doesn't. It's an argument often thrown up as an easy way to avoid the problem of multicore coding. Indeed, some code is bloody impossible to make multicore. In the case of an OS, I see it as a facilitator that lets programs go multicore. The OS itself shouldn't be using up all the CPU; your apps should. A well-written app should not be banging on the OS like crazy. In this context your app shouldn't even be aware that it is running on one particular core, unless you multi-thread it to take advantage of multiple cores (a good idea if it can be done). In that case the app is pretty much running on the bare metal, and the OS shouldn't be getting in the way much. There is no excuse for slowdowns at this point, except for general memory bandwidth limits, which all CPUs suffer on all OSes and which are well understood.
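The "slower than one" argument can be framed with Amdahl's law. If a fraction s of the run time is serialized (for example, time spent inside Forbid() sections that stall every other core), the best possible speedup on n cores is 1 / (s + (1 - s) / n). A minimal C helper, assuming the serialized fraction is known:

```c
#include <assert.h>

/* Amdahl's law: best-case speedup on `cores` CPUs when a fraction
   `serial` (0..1) of the run time cannot overlap with anything else. */
static double amdahl_speedup(double serial, unsigned cores)
{
    assert(serial >= 0.0 && serial <= 1.0);
    assert(cores > 0);
    return 1.0 / (serial + (1.0 - serial) / cores);
}
```

With 5% of the run time serialized, two cores top out at about 1.9x. The formula never drops below 1x, so an actual slowdown would have to come from costs it ignores, such as lock and cache-coherency traffic.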

One thing is probably true: a multicore-friendly kernel will run a bit slower on a single-core CPU because of the extra code involved. I wouldn't expect it to be much slower, though. I have no real data, but my guess is less than 1%.

I can only see a multicore kernel slowing things down if it needs to be full of crazy hacks like MMU traps and the like to make some arcane thing work.

Multicore is here to stay and is becoming more prevalent every day. For AROS/x86 it's hard to find a single-core system anymore, and now with the X1000 there is no excuse to waste that second core.

Quote from: Zac67;567517

Real SMP, however, makes no assumptions about where code is actually running, and this applies to the OS itself as well. Apart from the Forbid/Permit workarounds, you're looking at a complete redesign of the kernel. Remember the Windows 95/98/ME line? Those relatively simple kernel designs stood no chance of running SMP, so MS didn't even bother trying.

Quote from: dammy;567547

That's my vote. :-) We need something modern that has the spirit of AOS, but that would be a massive undertaking, requiring a sizeable dev team and funding for that clean sheet of paper.

I'm not sure whether adding multicore calls for a complete redesign of the kernel. Honestly, I don't know the answer to this; I've never even perused any kernel code, so I'm out of my element here. However, in the W95->NT case, I think there was a lot more going on than just multicore. The NT kernel was built from the ground up with security and stability in mind, and I think those probably mattered more in its selection than multicore alone. In my crude estimation, bringing AmigaOS up to the NT standard of stability and security would be a far larger undertaking than adding multicore.

In the end I think Amiga OS probably needs a hugely revamped kernel if we want to get all the modern stuff, and I'm not against that. But I'm thinking multicore alone may not require that.

Bif · « **Reply #3 on:** July 05, 2010, 10:16:43 PM »

After learning more here and doing some other research, it's becoming clearer to me why SMP is hard. I think I probably underestimated the complexity of developing a multi-core kernel itself. I probably spend too much time on the other side of the kernel to think about all the ramifications of handling PCI bus stuff, etc. And I guess some variants of AmigaOS are based on microkernels, which might not make it easy to provide SMP (or might make it easier, I dunno).

Quote from: NorthWay;567863

Let me rephrase what I said:
The Forbid/Permit semaphore is a publicly readable and writeable kernel structure.
You are free to change the definition of how the OS works, but then you are on your own. AROS (very sensibly, IMNSHO) decided to take 3.1, warts and all, and implement that first. When they are 100% done you can add New Stuff.

I wasn't suggesting changing how it works. I think I understood what you said, and I think it is fine if Forbid()/Permit() serializes execution of any apps waiting on a Forbid(). Even with all this contention, I think multi-core would be effective where needed, because apps that suck up lots of CPU wouldn't be doing it all within Forbid() sections, as that would be terrible programming practice. And if they do, too bad for those apps. From a classic Amiga perspective, there wasn't really enough CPU available for heavy number crunching, so I can see how apps back then could spend much of their CPU time just dealing with the OS for GUI, file IO, etc. I don't think that applies as much going forward.

Also understood about AROS wanting to get the basics covered first. I wasn't even suggesting adding multicore to AROS right away; I was just saying it would have to be the sandbox for the experiment if someone like me wanted to play with it, as it's the only one whose source code is available to play with.

Quote from: Trev;567876

@Karlos

Personally, I don't care whether or not my OS4 box runs classic (680x0 or PPC) software, so long-term, I'd like to see support for legacy software disappear entirely. If you wanted to keep it, though, you could reserve the lowest 4 GB of a 64-bit address space (the part reachable with 32-bit pointers) for that stuff and create as many individual or shared sandboxes as the user wants. Think VM86 and DOS under OS/2 and Windows NT.

Quote from: Trev;568007

Whose requirements are those? What is AmigaOS, exactly? The kernel? The API? The coupled hardware? Regarding new applications, isn't that the point?

If the legacy kernel were itself a process under a new kernel (one that did all the magic necessary to support the legacy kernel as a process), then, barring hardware incompatibilities, how would old applications know the difference? New applications could use the same APIs we use today, minus the bits we know to be "bad."

An Amiga without the legacy apps wouldn't be too interesting to me. My vision of an Amiga box going forward is something that can be all things Amiga, including historical, with as much modernization as possible. I don't really care how it does this (running on some Unix-variant kernel, through emulation, WINE-style, etc.), though the better/faster/more compatible it runs, the better, IMO. I think in the end the OS X model is probably where it needs to go. Obviously it might be difficult to get the resources for that; I have no idea how hard it would be.

Quote from: hardlink;567944

The 'original' Los Gatos Amiga crew were very much aware of the problems related to the 1983-1985 design decisions, and tried to apply what they learned in their next design, the 3DO operating system. I don't know if they made it SMP friendly or not, but it suffered the same ultimate fate as Amiga.

Heh, I am probably one of the rare few who actually developed for the 3DO M2. It's funny, because at the time I had no clue there was any kind of Amiga connection. I didn't learn about the people behind the Amiga until this past year, when I got back into it, since that info was scarce in the pre-internet world of my earlier Amiga days. I've since learned that I previously crossed paths with a few people and technologies related to the Amiga. It would have been nice to know at the time; I would have got a kick out of it and maybe brought it up.

The M2 seemed quite elegant to program for, and quite polished for where it was in development. Perhaps because of this, I don't remember much about the system. I had to look it up on Wikipedia to confirm that it was indeed a dual-processor PowerPC box, so it supported SMP! That probably would have thrown programmers for a loop at the time, so maybe not a great design decision. The audio DSP also made the system AMP =). I remember you had to cobble together a bunch of Fortran-authored objects to create a plugin/signal path. It seemed pretty advanced at the time, and it was hard to believe it even worked, but it did! So there you go: the M2 had both SMP and AMP, which makes it somewhat topical to this thread. =) No idea about the original 3DO.
