I disagree about the OS sucking so bad. It's the compilers that suck. If we could get an up-to-date compiler for 68k, we could write software in C, C++, and (by extension) PortablE that didn't suck instead of twiddling around in Assembly for our larger apps. Also, since AROS is written in C we'll need the compilers to work better for that as well.
I agree. AmigaOS looks poor on paper, but most of the missing features are "nice" slow bloat that isn't really needed. Compilers (the biggest problem is 68k optimization) are the big limitation on AmigaOS and AROS coming back to the classic 68k, which would unite AROS x86, classic 68k, FPGA 68k and probably, to a certain extent, UAE. That's the majority of Amiga users.
The reason Exec and the other libraries are stuck in the past is that they were written in 68k Assembly. That just made them harder to bring into the 21st century when 3.5 and 3.9 were written.
I don't think 68k assembler is so bad to work with for an assembler programmer who knows what they are doing. An assembler programmer needs to be more organized, as there is no structure enforced like in C. A well-written assembler program is easier to maintain and enhance than a poorly written C program. The Amiga lacks good compilers and a good source-level debugger, which would make C programming easier and faster. I can write a little C code, but the Amiga C programming experience is frustrating.
As for the compilers, LLVM will soon be able to use a PBQP register allocator that can store multiple small variables in one register. This would improve register loading and make the output look more like somebody sat down and hand-assembled the whole thing. The GCC compiler could do similar things but doesn't because the 68k backend is so antiquated. Nobody will put the time or energy into GCC 68k because it's thought to be a dead architecture outside of embedded controller use.
I doubt you will gain much by using small variables in one register on 68040+ and N68k. The need to work on 32 bit values for efficient pipelining and register forwarding limits how much can be done. The fast bitfield instructions in N68k will allow more data per register as well as helping out GCC which likes to use them indiscriminately. You could store multiple Boolean values as a bit each in 1 register or memory address too. There wouldn't be any more overhead (vs GCC 16 bit BOOL) if the N68k bit instructions were enhanced (but not expanded) to be conditional bit instructions like...
bseteq #3,d0 ;set bit 3 of register d0 if CC Z flag is set
It's already easy to test a bit, but it's not as easy to set, change or clear a bit based on a condition code. The N68k should be able to do it with predication at the same speed with 1 more word...
bne .skip
bset #3,d0
.skip:
The bne and bset will get combined into a conditional instruction also taking 1 cycle. It's a lot better than the 68060 could do where there would be some missed branches and a branch cache entry.
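For anyone who wants to see the idea from the C side, here is a rough sketch of keeping several Booleans as single bits in one 32-bit word and setting or clearing a bit based on a condition without a branch in the source. The flag names and the helper functions (flag_test, flag_set_if, flag_clear_if) are made up for illustration, and whether a given compiler turns this into bset/bclr, a shift and or, or a branch depends entirely on its backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag positions, chosen only for illustration. */
enum { FLAG_READY = 0, FLAG_DIRTY = 3, FLAG_LOCKED = 7 };

/* Test one flag: roughly the job btst does on the 68k. */
static bool flag_test(uint32_t flags, unsigned bit)
{
    assert(bit < 32);
    return (flags >> bit) & 1u;
}

/* Set the bit only when cond is true, with no branch in the source.
   This mirrors the proposed "bseteq #3,d0" idea; the code a compiler
   actually emits is an assumption and will vary. */
static uint32_t flag_set_if(uint32_t flags, unsigned bit, bool cond)
{
    assert(bit < 32);
    return flags | ((uint32_t)cond << bit);
}

/* Clear the bit only when cond is true. */
static uint32_t flag_clear_if(uint32_t flags, unsigned bit, bool cond)
{
    assert(bit < 32);
    return flags & ~((uint32_t)cond << bit);
}
```

Calling flag_set_if(flags, FLAG_DIRTY, x == 0) is the C equivalent of the bne/bset pair above: the bit is set when the comparison holds and the word is left alone otherwise.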
The 68k beats ARM hands down. It was a properly designed CISC instruction set with a RISC core, which is what ARM evolved into with Thumb-2, but with legacy instruction-set growing pains worse than the 68k's. The 68k is easier to read, more logical and more consistent, and I think the N68k will beat Thumb-2 in code density once the compilers get good enough.