Lots of interesting stuff here, I am learning a lot just sitting back and watching this for a bit =).
@Bif,
First off, if it has not already been done, allow me to welcome you to the Amiga.org forums. It is always good to see Amiga users with programming experience/talent return to the Amiga community, as we are woefully short on competent programmers (at least compared to the number of Amiga programmers there once were).
Lastly, even if it proves to be impossible or impractical to implement support for multi-core CPUs within AmigaOS4.x/MorphOS2.x/AROS, I hope that you will find the time to do some other Amiga programming in the future, and that whatever code you do write can be compiled for all three OSes mentioned above.
Thanks amigadave =). I am keen on a number of potential Amiga projects so hope to help out at some point. The multicore thing was just the only real mystery gnawing at me. When I used the Amiga back in the early 90s my pet peeve was the lack of memory protection and virtual memory that caused the system or apps to go down in flames (multicore wasn't an issue back then =)). So my number one desire is to see Amiga OS get some modern OS conveniences.
In all honesty I'll probably only be doing AROS work because the others are simply impractical for me, at least at this time. I will be cognizant of maintaining compatibility with everything though and I'm an open source kind of guy so someone else can recompile/patch. As a habit I like to write cross-platform code where possible in case I want what I am working on running on something totally different.
@Bif
As Fats (or Staf as we know him on the AROS list) has pointed out, AmigaOS shares resources... Ok, the Forbid() problem is a big one, but now think about HOW the Amiga shares resources. It's all pointer passing, you basically have no idea which task/process is using what memory... if you need atomic access to a resource you NEED to stop all tasks/processes that are currently running on the system...
This is just another case of one simple design decision, to share memory pointers for a bit of extra speed, coming back to bite us 25 years later 
That sounds plausible. However, how does this even work on a single core system? Isn't access to these resources synchronized across tasks/threads already? Otherwise how do you know if whatever you have written to various memory locations is not in a half baked state when the other thread consuming it or also writing to it wakes up? Does the kernel have some intelligence when swapping tasks to ensure this somehow? I can't imagine that. I'd think this must be what the Forbid()/Permit() type of thing are exactly for if there is no other synchronization mechanism involved in these shared memory locations? I would think the kernel would not allow a task with a Forbid() in place to be swapped for this reason.
I think that multi-threaded programs need to live under pretty much the same constraints on single core systems vs. multicore systems (except for cache coherency issues). The difference when going multicore is that you hit multi-threading bugs much faster because it's all running at the same time. On single core systems you only run into potential problems during a context switch.
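To make the single-core case concrete, here is a minimal sketch in C of a read-modify-write that gets preempted halfway through. Everything in it is a hypothetical stand-in: sim_forbid(), sim_permit() and try_switch() only simulate a scheduler inside one program and are not the real exec.library calls. It just shows the kind of half-baked state I mean, and how a Forbid-style nesting counter avoids it by refusing the task switch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for illustration, not real exec.library symbols. */
static int  forbid_nest;     /* > 0 means task switching is disabled      */
static int  shared_balance;  /* memory shared between the two "tasks"     */
static bool other_pending;   /* the other task is ready and wants to run  */

/* Simulated scheduler: switch to the other task only if no Forbid is held.
 * The other task simply adds 10 to the shared balance. */
static void try_switch(void)
{
    if (forbid_nest == 0 && other_pending) {
        other_pending = false;
        shared_balance += 10;
    }
}

static void sim_forbid(void) { forbid_nest++; }

/* Dropping the last Forbid lets a deferred task switch happen. */
static void sim_permit(void)
{
    forbid_nest--;
    try_switch();
}

/* Our task: add 5 using a separate read and write, with a timer
 * interrupt (a switch attempt) landing between the two. */
static void add_five(bool use_forbid)
{
    if (use_forbid) sim_forbid();
    int tmp = shared_balance;   /* read                              */
    try_switch();               /* preemption attempt mid-update     */
    shared_balance = tmp + 5;   /* write back, stale if we were cut  */
    if (use_forbid) sim_permit();
}

/* Runs one scenario and returns the final balance (15 is correct). */
int run_demo(bool use_forbid)
{
    forbid_nest = 0;
    shared_balance = 0;
    other_pending = true;
    add_five(use_forbid);
    try_switch();               /* let the other task run if it has not yet */
    return shared_balance;
}
```

Without the Forbid pair the other task's update is lost and run_demo(false) ends at 5 instead of 15; with it, run_demo(true) ends at 15. Note that the counter only stops the local scheduler. On a second core the other task would already be running, which is exactly why this mechanism does not carry over to multicore as-is.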
Karlos provided a lot of useful information regarding memory/cache coherency. Indeed, I think I share the view that these days multicore is fairly evolved and not as challenging as it once was.
The quick and dirty answer?
Way too many programmers (including the C= library ones) have been too quick and dirty.
Forbid()/Permit() is actually documented as a macro in the official includes. Same with Disable()/Enable().
That means you can't insert your fix in the jumptables and expect all uses of the Big Lock to get caught. Possible solutions are to do Enforcer style ExecBase hacks to mmu-track the lock or interrupt register.
From there on in I bet there are many more pitfalls.
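A rough sketch of why the macro form is a problem, written as a self-contained C simulation. The struct, the function pointer slot and SIM_FORBID_INLINE are all made up for illustration and only loosely modelled on the nesting counter in ExecBase; the real layout and macros are not reproduced here. The point is only that a patch installed in the jump table sees calls that go through the table, and nothing else.

```c
/* Hypothetical model: one counter plus one patchable vector. */
struct SimExecBase {
    signed char td_nest_cnt;                      /* nesting counter       */
    void (*forbid_vector)(struct SimExecBase *);  /* stand-in for a jump   */
                                                  /* table entry           */
};

static int trapped_calls;  /* how many Forbid calls the patch noticed */

/* The unpatched routine just bumps the counter. */
static void original_forbid(struct SimExecBase *sb)
{
    sb->td_nest_cnt++;
}

/* The "fix": a replacement routine that could take a real SMP lock.
 * Here it only records that it was reached. */
static void patched_forbid(struct SimExecBase *sb)
{
    trapped_calls++;
    sb->td_nest_cnt++;
}

/* Macro form: pokes the counter directly and never touches the vector. */
#define SIM_FORBID_INLINE(sb) ((sb)->td_nest_cnt++)

/* Returns how many of the two Forbids the patch saw; the final
 * nesting count is written to *nest_out. */
int count_trapped(int *nest_out)
{
    struct SimExecBase sb = { 0, original_forbid };

    trapped_calls = 0;
    sb.forbid_vector = patched_forbid;  /* install the patch            */

    sb.forbid_vector(&sb);              /* call through table: caught   */
    SIM_FORBID_INLINE(&sb);             /* inlined macro: NOT caught    */

    *nest_out = sb.td_nest_cnt;
    return trapped_calls;
}
```

Both Forbids take effect (the nesting count ends at 2), but the patch only sees one of them. Any binary compiled with the macro would slip past a jump table fix in the same way, hence the MMU tracking idea.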
Fair enough, but at this point I think we are looking at next-gen Amiga systems, and I don't think an application recompile is too unreasonable if required. Is the AROS Forbid() a macro, for example? If it's 68k apps then I don't think we care too much about them; they can all run on one CPU or under emulation if need be. I guess I can see it being a problem where you let 68k apps call directly into an OS kernel that is running PPC/X86 if you can't trap that.
Again, I would never point to the impossibility of SMP in Amiga systems, I only suggest that it serves no purpose. What use is two cores when they will run slower than one?
This is something I struggled with in my day job for years. I think I have a pretty good idea of what kind of code benefits from multicore and what doesn't. This is an argument that is often thrown up to easily avoid the problem of multicore coding. Indeed some code is bloody impossible to make multicore. In the case of an OS, I see it as a facilitator of letting programs go multicore. The OS itself shouldn't be using up all the CPU, your apps should. A well written app should not be banging the OS like crazy. In this context your app shouldn't even be aware that it is running on one particular core of the CPU, unless you multi-thread it to take advantage of multiple cores (a good idea if it can be done). In this case the app is pretty much running on the bare metal and the OS shouldn't be getting in the way much. There is no excuse for slowdowns at this point except for general memory bandwidth limits that all CPUs will suffer on all OSs, but are well known.
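As an illustration of the kind of app code I mean, here is a small sketch using POSIX threads (not an Amiga API, just what I can show portably). The names parallel_sum, sum_slice and NUM_WORKERS are mine, and two workers is an assumption for a dual-core machine. Each worker sums its own slice of an array and writes only to its own result slot, so the hot loop makes no OS calls and needs no locking; the OS is only involved when the threads are created and joined.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_WORKERS 2  /* assumption: one worker per core on a dual core */

/* Work description and private result slot for one worker. */
struct slice {
    const int *data;
    size_t     count;
    long       partial;
};

/* Worker body: pure computation on its own slice, no shared writes. */
static void *sum_slice(void *arg)
{
    struct slice *s = arg;
    long total = 0;
    for (size_t i = 0; i < s->count; i++)
        total += s->data[i];
    s->partial = total;
    return NULL;
}

/* Sums data[0..count) across NUM_WORKERS threads.
 * Returns 0 on success, -1 if a thread could not be started. */
int parallel_sum(const int *data, size_t count, long *result)
{
    pthread_t    threads[NUM_WORKERS];
    struct slice slices[NUM_WORKERS];
    size_t       chunk = count / NUM_WORKERS;
    int          started = 0;
    int          status = 0;

    for (int w = 0; w < NUM_WORKERS; w++) {
        size_t offset = (size_t)w * chunk;
        slices[w].data = data + offset;
        /* The last worker also takes any leftover elements. */
        slices[w].count = (w == NUM_WORKERS - 1) ? count - offset : chunk;
        slices[w].partial = 0;
        if (pthread_create(&threads[w], NULL, sum_slice, &slices[w]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* Joining is the only synchronization point. */
    *result = 0;
    for (int w = 0; w < started; w++) {
        pthread_join(threads[w], NULL);
        *result += slices[w].partial;
    }
    return status;
}
```

Whether this actually runs faster than a single loop depends on the data size and memory bandwidth, but nothing in it requires the kernel to do anything clever while the workers are running.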
I think you can probably say one thing is true, a multicore friendly kernel will probably run a bit slower on a single core CPU because of the extra code involved. However, I wouldn't think very much slower. No idea at all really, but maybe less than 1%.
I can only see a multicore kernel slowing things down if it needs to be full of crazy hacks like MMU traps and the like to make some arcane thing work.
Multicore is here to stay and becoming more prevalent every day. For AROS/X86 it's hard to find a single core system, and now with the X1000 there is no excuse to waste that second core.
Real SMP however makes no assumptions about where the code is actually running, and this applies to the OS itself as well. Apart from the Forbid/Permit workarounds, you're looking at a complete redesign of the kernel. Remember the Windows 95/98/ME line? These relatively simple kernel designs stood no chance of running SMP, so MS didn't even bother trying.
That's my vote. :-) We need something modern that has the spirit of AOS but that would be a massive undertaking requiring a significant size dev team and funding for that clean sheet of paper.
I'm not sure whether adding multicore calls for a complete redesign of the kernel or not. Honestly I don't know the answer to this, I've never even perused any kernel code. I'm out of my element here. However, in the W95->NT case, I think there was a lot more going on there than just multicore. The NT kernel was built from the ground up with security and stability in mind. I think that probably took more precedence for its selection than just multicore. In my crude estimation, bringing AmigaOS up to the NT standard of stability and security would be a far larger undertaking than adding multicore.
In the end I think Amiga OS probably needs a hugely revamped kernel if we want to get all the modern stuff, and I'm not against that. But I'm thinking multicore alone may not require that.