Amiga.org

Operating System Specific Discussions => AROS Research Operating System => Topic started by: Ezrec on August 22, 2013, 05:56:14 PM

Title: AROS SMP Research: Technical Discussion
Post by: Ezrec on August 22, 2013, 05:56:14 PM: Continued from thread: http://www.amiga.org/forums/showthread.php?t=65748
Title: Re: AROS SMP Research: Technical Discussion
Post by: Ezrec on August 22, 2013, 06:02:28 PM: Quote from: psxphill;745838
Is that how your forbid works? That the other cpu is only stopped when its quantum expires? That would probably be a mistake; the first cpu might be trying to send a message to a port that is on the other cpu (which currently requires a forbid to make sure the port doesn't disappear before you send the message).

In that case, you (the programmer) need to update your code anyway, since you could have gotten pre-empted right before the SendMsg()/Signal() and lost that port, even on AmigaOS 3.x

Quote from: psxphill;745838
How is the second cpu task switching if it has no hardware interrupts? Is it controlled by the timer on cpu0?

That is architecture specific. On the 'unix hosted' AROS environment, CPU0 proxies the timer scheduling interrupt for the other CPUs.

On pc-x86, we will probably use the Local APIC timers, and an IPI signal to tell the cores to 'please stop running for a bit, until I tell you otherwise'.
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 22, 2013, 06:12:55 PM: Quote from: Ezrec;745841
In that case, you (the programmer) need to update your code anyway, since you could have gotten pre-empted right before the SendMsg()/Signal() and lost that port, even on AmigaOS 3.x

You need a forbid around the find/sendmsg, but you don't want the forbid to wait for all CPUs to finish their quantum. When the forbid happens, the other CPUs need to stop what they are doing immediately.
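A minimal C sketch of the pattern described above, assuming that on SMP Forbid()/Permit() behave like one global lock that every CPU respects. `sketch_forbid()`, `sketch_permit()`, `struct sketch_port` and the helper functions are hypothetical stand-ins, not the Exec API (the real calls would be Forbid(), FindPort(), PutMsg() and Permit()), and unlike the real Forbid() this lock does not nest.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an SMP Forbid()/Permit(): one global lock that
   every CPU must hold while touching the shared list of public ports. */
static atomic_bool big_lock;   /* static storage: starts out false (unlocked) */

static void sketch_forbid(void)
{
    while (atomic_exchange_explicit(&big_lock, true, memory_order_acquire))
        ;   /* spin until the current holder calls sketch_permit() */
}

static void sketch_permit(void)
{
    atomic_store_explicit(&big_lock, false, memory_order_release);
}

struct sketch_port {
    const char *name;
    int queued;                  /* messages delivered so far */
    struct sketch_port *next;
};

static struct sketch_port *port_list = NULL;

/* Caller must hold the lock, just as FindPort() must be called under Forbid(). */
static struct sketch_port *find_port_locked(const char *name)
{
    for (struct sketch_port *p = port_list; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Lookup and delivery happen in one critical section, so a concurrent
   remove_port() cannot unlink the port between the two steps. */
static bool send_to_named_port(const char *name)
{
    sketch_forbid();
    struct sketch_port *p = find_port_locked(name);
    if (p != NULL)
        p->queued++;
    sketch_permit();
    return p != NULL;
}

static void add_port(struct sketch_port *p)
{
    assert(p != NULL);
    sketch_forbid();
    p->next = port_list;
    port_list = p;
    sketch_permit();
}

static void remove_port(struct sketch_port *p)
{
    sketch_forbid();
    for (struct sketch_port **pp = &port_list; *pp != NULL; pp = &(*pp)->next)
        if (*pp == p) {
            *pp = p->next;
            break;
        }
    sketch_permit();
}
```

The point of the sketch is the shape of send_to_named_port(): if the lock were dropped between the lookup and the delivery, another CPU could remove the port in that gap.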
Title: Re: AROS SMP Research: Technical Discussion
Post by: Ezrec on August 22, 2013, 06:17:47 PM: Quote from: psxphill;745843
You need a forbid around the find/sendmsg, but you don't want the forbid to wait for all CPUs to finish their quantum. When the forbid happens, the other CPUs need to stop what they are doing immediately.

I think you're misunderstanding: the Forbid() doesn't return until the other CPUs have stopped.
Title: Re: AROS SMP Research: Technical Discussion
Post by: Ezrec on August 22, 2013, 06:31:44 PM: Quote from: Ezrec;745844
I think you're misunderstanding: the Forbid() doesn't return until the other CPUs have stopped.

And I think *I* have something wrong. Michal Schulz did some rough performance calculations, and even though my method (wait for the quantum to expire) is semantically correct, the performance penalty is terrifying.

I'll experiment with signalling the other cores to stop immediately, and see how that works out.
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 22, 2013, 06:57:01 PM: Quote from: Ezrec;745846
And I think *I* have something wrong. Michal Schulz did some rough performance calculations, and even though my method (wait for the quantum to expire) is semantically correct, the performance penalty is terrifying.

Yeah that was my point. Making forbid wait for the other cpus will mean that these four lines of code will take over 1 task quantum.

forbid()
permit()
forbid()
permit()

The first forbid() will take anywhere from nothing to 1 task quantum depending on how it aligns with the other cpu's tasks.
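To put numbers on this, here is a small C model of the "wait for the quantum to expire" Forbid(). It assumes a single other CPU, ignores the cost of forbid()/permit() themselves, and uses a 20 ms quantum purely as an example value; the function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model: Forbid() blocks until the other CPU reaches the end
   of its current quantum. elapsed_us is how far into that quantum it is. */
static unsigned wait_for_quantum_end(unsigned quantum_us, unsigned elapsed_us)
{
    assert(elapsed_us < quantum_us);
    return quantum_us - elapsed_us;   /* anything from almost nothing up to a full quantum */
}

/* forbid(); permit(); forbid(); permit(); */
static unsigned two_forbid_permit_pairs(unsigned quantum_us, unsigned elapsed_us)
{
    unsigned first = wait_for_quantum_end(quantum_us, elapsed_us);
    /* After the first permit() the other CPU starts a fresh quantum, so the
       second forbid() arrives just as it begins and waits for all of it. */
    unsigned second = wait_for_quantum_end(quantum_us, 0);
    return first + second;
}
```

For any starting point the total comes out above one quantum, because the second forbid() lands at the start of the other CPU's fresh quantum.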

Stopping the other CPUs immediately will have some performance penalty; even though it's much higher than the overhead in AOS 3.1, it should be nowhere near a quantum.

You also don't want the other CPUs' tasks to lose their quantum when another CPU does a forbid(); their quantum should be extended by the time they spend suspended.

Rather than signalling the other CPUs, it might be enough to actually stop them. The performance might depend on architecture, plus I don't know how you're abstracting all this stuff, so either way might make more sense.
Title: Re: AROS SMP Research: Technical Discussion
Post by: Ezrec on August 22, 2013, 07:19:51 PM: Quote from: psxphill;745851
Yeah that was my point. Making forbid wait for the other cpus will mean that these four lines of code will take over 1 task quantum.

Ok, looks like we're on the same page now.

Michal's planning on using IPI to signal the other CPUs to stop (on x86 SMP, there isn't some "magic register" you can use to stop other CPUs, you have to ask them nicely), but it's a lot faster than waiting until they reach a Switch()/Dispatch() point.
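A sequential C sketch of the IPI handshake as described, under the assumption that Forbid() raises a request, each other CPU acknowledges it from its IPI handler, and Forbid() spins until every CPU has parked. The names and the fixed CPU count are hypothetical; actually sending the IPI through the local APIC is left out, and Michal's design may well differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_NCPUS 4   /* hypothetical CPU count */

/* Set by the CPU calling Forbid(); cleared again by Permit(). */
static atomic_bool stop_requested;
/* One flag per CPU: "I have taken the IPI and am now parked". */
static atomic_bool parked[SKETCH_NCPUS];

/* Forbid() side, step 1: raise the request. On real hardware this is where
   the IPI would be sent to every other CPU. */
static void request_stop(int self)
{
    atomic_store(&stop_requested, true);
    atomic_store(&parked[self], true);   /* the caller counts as stopped */
}

/* Body of the hypothetical IPI handler that runs on each other CPU. */
static void ipi_handler(int cpu)
{
    if (atomic_load(&stop_requested))
        atomic_store(&parked[cpu], true);
    /* A real handler would now spin until stop_requested is cleared. */
}

/* Forbid() side, step 2: the caller spins on this until it returns true. */
static bool all_parked(void)
{
    for (int i = 0; i < SKETCH_NCPUS; i++)
        if (!atomic_load(&parked[i]))
            return false;
    return true;
}

/* Permit(): let the other CPUs continue. */
static void release_all(void)
{
    for (int i = 0; i < SKETCH_NCPUS; i++)
        atomic_store(&parked[i], false);
    atomic_store(&stop_requested, false);
}
```

The cost of Forbid() in this scheme is the IPI round trip to the slowest CPU rather than the remainder of its quantum.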
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 22, 2013, 07:44:41 PM: Quote from: Ezrec;745855
Michal's planning on using IPI to signal the other CPUs to stop (on x86 SMP, there isn't some "magic register" you can use to stop other CPUs, you have to ask them nicely), but it's a lot faster than waiting until they reach a Switch()/Dispatch() point.

Cool, that should work better.

Do you think the time the CPU is suspended not counting towards the current task's quantum makes sense? Otherwise the fairness will depend on what is running on the other CPUs & you could get one task that is permanently starved in pathological cases. If it's got its own timer that fires when the quantum is up then it might just be a case of pausing it, but if you can only stop it you'd need to keep track of the current time left and use that when you start the CPU again.
Title: Re: AROS SMP Research: Technical Discussion
Post by: Ezrec on August 22, 2013, 07:46:22 PM: Quote from: psxphill;745857
Do you think the time the CPU is suspended not counting towards the quantum makes sense? Otherwise the fairness will depend on what is running on the other CPUs & you could get one task that is permanently starved in pathological cases.

Right now, suspended CPUs do not have their Elapsed updated when they are suspended, so they should not be starved.
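A sketch of that accounting rule in C, assuming a per-CPU `elapsed_us` counter that plays the role of the Elapsed field Ezrec mentions; the struct and function names are hypothetical. Timer ticks that arrive while a CPU is parked for someone else's Forbid() are simply not charged.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU scheduling state. */
struct sketch_cpu {
    unsigned elapsed_us;   /* time the current task has used of its quantum */
    bool suspended;        /* parked because another CPU is in Forbid() */
};

/* Timer tick: only a CPU that is actually running charges time to its task. */
static void account_tick(struct sketch_cpu *cpu, unsigned tick_us)
{
    if (!cpu->suspended)
        cpu->elapsed_us += tick_us;
}

static bool quantum_expired(const struct sketch_cpu *cpu, unsigned quantum_us)
{
    return cpu->elapsed_us >= quantum_us;
}
```

Because suspended time is never added, a task's quantum is effectively extended by however long it was parked, which is the fairness property psxphill asked about.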
Title: Re: AROS SMP Research: Technical Discussion
Post by: minator on August 22, 2013, 08:00:51 PM: It's interesting that this is being tried but I suspect it will never get past the experimental phase.

Even if it can be made to work, it's going to serialise the CPUs so much that there's no point having multiple CPUs.

It might be possible to show a nice speedup on some long running highly parallelisable benchmark but that's it. In any real system apps will be constantly stalling the system and you don't need to be Gene Amdahl to know what the result will be.
Title: Re: AROS SMP Research: Technical Discussion
Post by: matthey on August 22, 2013, 09:12:22 PM: @Ezrec
Congratulations! Great effort! You have already proved some people wrong with your experiments.

How are you handling the ENABLE/DISABLE FORBID/PERMIT macros (ables.i) that increment and decrement the ExecBase IDNestCnt and TDNestCnt?

Quote from: minator;745861
It's interesting that this is being tried but I suspect it will never get past the experimental phase.

Even if it can be made to work, it's going to serialise the CPUs so much that there's no point having multiple CPUs.

It might be possible to show a nice speedup on some long running highly parallelisable benchmark but that's it. In any real system apps will be constantly stalling the system and you don't need to be Gene Amdahl to know what the result will be.

The performance of most current SMP processors would be limited by the limitations of AmigaOS. However, specialized hardware (FPGA-ware, or whatever you want to call it) could drastically reduce this overhead and increase compatibility. ExecBase could be set up in a particular area of memory, with certain addresses monitored for changes that trigger some FPGA action affecting all cores. Some of the multitasking and multi-core handling could even move into hardware (FPGA code).

Think of the Fido processor (68k), with its semi-hardware handling of multitasking (it has a per-task time slice countdown value with an automatic hardware interrupt when the time is up), being upgraded to SMP. It would be a little bit complex in hardware, but it could then offer the advantage of more protection of SMP and multitasking from errant and malicious software. Add partial memory protection with an MMU and virtual addressing for >4GB memory support (each task would be limited to 2GB or so), and the Amiga with 68k might be competitive again (with an ASIC).

Gunnar von Boehn would like to make a multi-core version of the 68k Apollo processor. Duplicating the cores in FPGA is very simple. The rest is just giving Jason what he needs, provided his ideas do not have flaws ;).
Title: Re: AROS SMP Research: Technical Discussion
Post by: Ezrec on August 22, 2013, 09:34:39 PM: Quote from: minator;745861
Even if it can be made to work, it's going to serialise the CPUs so much that there's no point having multiple CPUs.

Very true.

I'm investigating a spinlock-style SignalSemaphore that has a lower latency for protecting frequently used internal data structures in Exec.
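One possible shape for such a semaphore, sketched with C11 atomics: like a SignalSemaphore it records an owner and a nest count, but a contending CPU spins on a compare-and-swap instead of going through Forbid() and the scheduler. This is an illustration under that assumption, not Ezrec's code; `struct spin_semaphore` and its functions are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical spinning semaphore with SignalSemaphore-style nesting. */
struct spin_semaphore {
    _Atomic(const void *) owner;  /* NULL when free, else the owning task */
    int nest;                     /* only touched by the current owner */
};

static void spin_init(struct spin_semaphore *s)
{
    atomic_init(&s->owner, NULL);
    s->nest = 0;
}

static void spin_obtain(struct spin_semaphore *s, const void *task)
{
    if (atomic_load_explicit(&s->owner, memory_order_relaxed) == task) {
        s->nest++;                /* re-entry by the task that already owns it */
        return;
    }
    const void *expected = NULL;
    while (!atomic_compare_exchange_weak_explicit(&s->owner, &expected, task,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = NULL;          /* a failed CAS stored the current owner here; retry */
    s->nest = 1;
}

static void spin_release(struct spin_semaphore *s, const void *task)
{
    assert(atomic_load_explicit(&s->owner, memory_order_relaxed) == task);
    if (--s->nest == 0)
        atomic_store_explicit(&s->owner, NULL, memory_order_release);
}
```

Spinning only pays off when the protected sections are a handful of instructions; a real version would also have to cope with the owner being preempted while other CPUs spin.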

Just so people know - even though I am one of the m68k developers for AROS, AROS SillySMP is *not* targeted for m68k. It is only for *existing* processors with *actual* SMP hardware.

Once someone puts an actual piece of SMP m68k hardware into my hands, I'll be happy to develop for it.
Title: Re: AROS SMP Research: Technical Discussion
Post by: Zac67 on August 22, 2013, 09:48:20 PM: Sorry if I'm a bit naive here - but what would be the problem with leaving the other cores running on a Forbid() as long as they stay in userland?
Of course, you'd need to take care of them not running over another Forbid() or into an interrupt. Both could be prevented with a simple semaphore in Forbid() and all relevant interrupts (which need to be deferred until Permit()).
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 23, 2013, 12:12:19 AM: Quote from: Zac67;745872
Sorry if I'm a bit naive here - but what would be the problem with leaving the other cores running on a Forbid() as long as they stay in userland?

Because Forbid() is used to protect userland data structures shared between tasks too. Since those tasks might be running on different CPUs, you have no choice but to stop the other CPUs.
Title: Re: AROS SMP Research: Technical Discussion
Post by: kamelito on August 23, 2013, 12:22:32 AM: Might be interesting to ask Carl Sassenrath what he thinks about it.
Kamelito
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 23, 2013, 12:24:31 AM: Quote from: minator;745861
It might be possible to show a nice speedup on some long running highly parallelisable benchmark but that's it.

It doesn't have to be parallelisable, you could have two independent algorithms running on each cpu core.

Of course it needs to be long running, because if it ran in a short amount of time then an 8 MHz 68000 would be enough.

If you only have one task that is CPU bound then SMP can't help you, it doesn't matter what OS you use.

Quote from: minator;745861
In any real system apps will be constantly stalling the system and you don't need to be Gene Amdahl to know what the result will be.

If one of the tasks spends all its time in a forbid then there will be no point in using SMP, but I don't believe this is a common case.

If it spends 1% of its time in forbid then you will lose 1% of each cpu core; 20% of its time in forbid and you will lose 20% of each cpu.

The forbid issue is not ideal, but I think you're overestimating the amount of time it will spend in forbid state.
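A back-of-the-envelope C model of that estimate, assuming one task spends a fraction `f` of its time in Forbid(), the forbidding CPU keeps working while the other `ncpus - 1` CPUs are stopped, and the cost of stopping and restarting them is ignored. `usable_cpus()` is a hypothetical helper.

```c
#include <assert.h>

/* CPUs' worth of useful work per unit time: the forbidding CPU always runs,
   the other (ncpus - 1) run only while nobody is in Forbid(). */
static double usable_cpus(int ncpus, double f)
{
    assert(ncpus >= 1 && f >= 0.0 && f <= 1.0);
    return 1.0 + (ncpus - 1) * (1.0 - f);
}
```

With four CPUs this gives 3.97 CPUs' worth of work at 1% and 3.4 at 20%, i.e. each of the other cores loses 1% or 20%.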

Quote from: Ezrec;745870
I'm investigating a spinlock-style SignalSemaphore that has a lower latency for protecting frequently used internal data structures in Exec.

I'd get this working first, once you've got it working properly then you can see what effect the Forbid() has. Changing how the Exec structures are protected will mean changing applications too.

SMP 68000 is more likely to happen in an emulator than it is in hardware.
Title: Re: AROS SMP Research: Technical Discussion
Post by: bloodline on August 23, 2013, 12:35:29 AM: It would be a little bit weird and amazing to have Carl Sassenrath contribute to AROS :)
Title: Re: AROS SMP Research: Technical Discussion
Post by: takemehomegrandma on August 23, 2013, 02:09:49 AM: Quote from: matthey;745867
@Ezrec
You have already proved some people wrong with your experiments.

How so?
Title: Re: AROS SMP Research: Technical Discussion
Post by: Terminills on August 23, 2013, 02:22:30 AM: Quote from: takemehomegrandma;745909
How so?

Lets just wait and see what comes of it.
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 23, 2013, 02:32:55 AM: Quote from: matthey;745867
How are you handling the ENABLE/DISABLE FORBID/PERMIT macros (ables.i) that increment and decrement the ExecBase IDNestCnt and TDNestCnt?

That won't work anymore.

AFAICT that is from Commodore's includes and isn't in AROS, so no software built for AROS should be legitimately manipulating that field in execbase already.

If someone creates an SMP 68k machine then they'll need to decide how to support binaries that do that. They could add hardware that checks for a write to that location, stores the value in another register and causes an interrupt on the relevant cpu; in the interrupt handler it can check the value and call forbid or permit.

It's not a problem that needs solving yet anyway.
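A C sketch of the trap handler psxphill describes, assuming the hardware forwards every write to TDNestCnt and that, as matthey describes, the ables.i macros change the count directly. Exec's nest counters use -1 for "not nested". `tdnestcnt_shadow`, `on_tdnestcnt_write()` and the `smp_forbid()`/`smp_permit()` stand-ins are hypothetical.

```c
#include <assert.h>

/* Hypothetical shadow copy of ExecBase->TDNestCnt; -1 means "not nested". */
static signed char tdnestcnt_shadow = -1;
static int forbids_forwarded, permits_forwarded;

/* Stand-ins for the real, SMP-aware Forbid()/Permit(). */
static void smp_forbid(void) { forbids_forwarded++; }
static void smp_permit(void) { permits_forwarded++; }

/* Run by the interrupt the hardware raises when a legacy binary writes the
   field: the direction of the change says which call the macro meant. */
static void on_tdnestcnt_write(signed char new_value)
{
    if (new_value > tdnestcnt_shadow)
        smp_forbid();
    else if (new_value < tdnestcnt_shadow)
        smp_permit();
    tdnestcnt_shadow = new_value;
}
```

The direction of each write is enough to tell a FORBID from a PERMIT; a write that jumps by more than one would need extra handling.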
Title: Re: AROS SMP Research: Technical Discussion
Post by: takemehomegrandma on August 23, 2013, 02:46:35 AM: Quote from: Terminills;745911
(http://i231.photobucket.com/albums/ee23/scanlant/not_again_cat.jpg)

Lets just wait and see what comes of it.

What's that supposed to mean?

:confused:

I merely asked matthey to clarify his "You have already proved some people wrong with your experiments" statement.
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 23, 2013, 02:57:54 AM: Quote from: bloodline;745901
It would be a little bit weird and amazing to have Carl Sassenrath contribute to AROS :)

He's unlikely to have thought about exec in nearly 30 years. If he has any sense he's forgotten everything he knew about it.

There is unlikely to be any major design work left right now, although there might be some minor design work depending on what is found during coding/testing.

Coding, testing and fixing the current design and then testing the speed to see whether any changes are required is the current goal.

Even if it wastes 10% of each core then SMP could still have a big win.

However moving data between cpu cores might have an overhead, so sharing tasks across cpu's might not be the best strategy. It might make more sense to saturate a cpu and only spin up another cpu if there are still more tasks ready.

Only when the simple implementation is done can you get enough information to make those decisions. It's complex enough that guessing isn't easy.

Quote from: takemehomegrandma;745915
I merely asked matthey to clarify his "You have already proved some people wrong with your experiments" statement.

Some people said you couldn't do SMP with exec. Technically he hasn't proved them wrong as he's moved fields out of execbase, which was the only reason you can't do SMP with exec. His plan is to avoid the theoretical discussions of the implications of that and just try to code it, often this is the only way to solve an argument & it's actually how AROS came to exist.

Once you have an implementation, then you have a baseline. After evaluating it for any drawbacks you can try to address those and then when you test it you can know whether it's better or worse. The problem with arguing over technical concepts is that it is very difficult to judge their merit. Until you see the code running it's unlikely you'll have any idea what the cache implications are of using SMP etc.
Title: Re: AROS SMP Research: Technical Discussion
Post by: takemehomegrandma on August 23, 2013, 04:32:21 AM: Quote from: psxphill;745916
Some people said you couldn't do SMP with exec.

Wasn't the issue rather about SMP and full Amiga backward compatibility?

Either you have full compatibility, or software needs to be built for your particular system, right? If the latter, wouldn't it be better to take the opportunity to create a new multithreaded SMP environment/API once and for all (together with proper memory protection, 64-bit, etc while you're at it)? A "little" more work of course ;), but if you are making a "new" system anyway, why not go all the way?

It's an interesting little AROS experiment though, and maybe something that's usable for real for AROS will come out of it eventually, who knows? :) I just wanted a clarification about the "proved people wrong" statement!

;)
Title: Re: AROS SMP Research: Technical Discussion
Post by: matthey on August 23, 2013, 05:00:58 AM: Quote from: matthey;745867
@Ezrec
You have already proved some people wrong with your experiments.

Quote from: takemehomegrandma;745909
How so?

Some people said you can't rocket to the moon and walk on it. They were proved wrong when it was accomplished by Neil Armstrong. Some people said that SMP couldn't be done with AmigaOS. Jason McMullan walked on the moon. It may be one small step but it's one giant leap for Amiga-kind ;).
Title: Re: AROS SMP Research: Technical Discussion
Post by: takemehomegrandma on August 23, 2013, 07:58:14 AM: Quote from: matthey;745921
Some people said that SMP couldn't be done with AmigaOS.

I don't think anyone has claimed that SMP couldn't be done in a situation where the precondition is that SW base is being built explicitly for that system? Rather the opposite, actually; this is a given! What "people" said is that true SMP can't be done without breaking the Amiga compatibility, which I suppose is a more relevant issue on MorphOS/OS4 than on AROS anyway, since the latter has been CPU/ISA agnostic since pretty much the beginning hence most people are happy to either run AROS builds of whatever SW they use, or run it in an UAE environment.
Title: Re: AROS SMP Research: Technical Discussion
Post by: wawrzon on August 23, 2013, 08:12:18 AM: back to topic please. this is supposed to be technical discussion.
Title: Re: AROS SMP Research: Technical Discussion
Post by: itix on August 23, 2013, 08:25:56 AM: Quote from: psxphill;745900

If it spends 1% of its time in forbid then you will lose 1% of each cpu core; 20% of its time in forbid and you will lose 20% of each cpu.

I think the problem is not how much time is spent in Forbid() but the overhead required to synchronize CPUs. On single-core systems Forbid() is an extremely lightweight call. The same goes for Disable(), which was implemented as a simple assembler macro in the old 68k days.

Now the problem is that many Exec calls depend heavily on Disable()/Enable() to protect internal data structures. The message passing system in Exec is extremely lightweight and used a lot in the system and in applications, but on multicore systems it is far from ideal.

Having to stop other CPUs only to AddTail() a new message is a massive overhead.
Title: Re: AROS SMP Research: Technical Discussion
Post by: itix on August 23, 2013, 08:43:53 AM: Quote from: psxphill;745900
If it spends 1% of its time in forbid then you will lose 1% of each cpu core; 20% of its time in forbid and you will lose 20% of each cpu.

Oh, btw... you have to consider that other CPUs can call forbid, not just the first one. When adding more CPUs, the chance of being in the forbid state increases.

The problem could be demonstrated using a silly ping-pong task sending a message back and forth constantly. Because sending a message requires a forbid, that task could easily render other cores useless. Even when running at low priority it would disrupt higher-priority tasks on other cores, due to forbid/disable semantics.

The net effect is that high-priority tasks run slower on a multicore system than they would on a single-core system.
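A rough C model of that effect, assuming each of `n` CPUs independently spends a fraction `f` of its time in Forbid(); then the chance that at least one of them is forbidding at any moment is 1 - (1 - f)^n. `p_any_forbid()` is a hypothetical helper.

```c
#include <assert.h>

/* Probability that at least one of ncpus CPUs is inside Forbid(),
   assuming each spends fraction f there independently of the others. */
static double p_any_forbid(int ncpus, double f)
{
    assert(ncpus >= 1 && f >= 0.0 && f <= 1.0);
    double none = 1.0;
    for (int i = 0; i < ncpus; i++)
        none *= 1.0 - f;          /* probability that no CPU is in Forbid() */
    return 1.0 - none;
}
```

At 1% per CPU that is about 3.9% with four CPUs and about 7.7% with eight.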
Title: Re: AROS SMP Research: Technical Discussion
Post by: NorthWay on August 23, 2013, 09:40:35 AM: This reminds me of the BSD(?) Big Lock(?) discussion. All SMP stuff passed through a single lock and they did a lot of work to split the different parts into having their own lock.

First port of call is of course to drop Forbid/Permit use and go to using semaphores where possible, and perhaps find a lockless design.

IIRC I optimized my Exec patch to make semaphores assume success on first try and so not call Forbid if the semaphore was not held by someone else.
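A C11 sketch of that optimisation: a single compare-and-swap when the semaphore is free, and only on contention the slow path that would have needed Forbid(). The slow path here just spins and counts how often it was taken; `struct fast_sem` and its functions are hypothetical, and a real slow path would queue the task and Wait().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Counts entries into the slow path; each one would have cost a Forbid(). */
static atomic_int slow_path_entries;

struct fast_sem { _Atomic(const void *) owner; };   /* NULL when free */

static void slow_obtain(struct fast_sem *s, const void *task)
{
    atomic_fetch_add(&slow_path_entries, 1);
    const void *expected = NULL;
    while (!atomic_compare_exchange_weak(&s->owner, &expected, task))
        expected = NULL;          /* placeholder for "Forbid(), enqueue, Wait()" */
}

/* Optimistic path: assume success on the first try. */
static void fast_obtain(struct fast_sem *s, const void *task)
{
    const void *expected = NULL;
    if (atomic_compare_exchange_strong(&s->owner, &expected, task))
        return;                   /* uncontended: no Forbid() at all */
    slow_obtain(s, task);
}

static void fast_release(struct fast_sem *s)
{
    atomic_store(&s->owner, NULL);
}
```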
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 23, 2013, 12:35:10 PM: Quote from: itix;745940
Oh, btw... you have to consider that other CPUs can call forbid, not just the first one. When adding more CPUs, the chance of being in the forbid state increases.

Tell me how you will find out what the percentage of time various software will spend in forbid without trying it?

Quote from: itix;745940
The problem could be demonstrated using a silly ping-pong task sending a message back and forth constantly. Because sending a message requires a forbid, that task could easily render other cores useless. Even when running at low priority it would disrupt higher-priority tasks on other cores, due to forbid/disable semantics.

You have always been able to write software for AmigaOS which disturbs high-priority tasks. If it turns out that this is a problem that needs solving, then you could try changing it so that CPUs will only ever be running tasks that have the same priority. As high-priority tasks are supposed to run for a short period of time, wasting the other cores during that time may not be a big deal. If you have a high-priority task on AmigaOS that takes a long time, then the system becomes unusable (standard-priority tasks like Workbench won't be allowed to run at all). The priority in AmigaOS is quite fine-grained (-128 to 127 IIRC), which means you could also derail this by using as many different priorities as possible. But there is no reason why software that can take advantage of SMP shouldn't have limitations (like all tasks that run at the same time having to run at the same priority).
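A C sketch of the "same priority only" rule, assuming the ready list is kept sorted by descending priority as Exec's is. `cpus_to_use()` is hypothetical: it returns how many CPUs may run tasks right now, leaving the rest idle whenever the top priority is not shared.

```c
#include <assert.h>

/* ready_pri: priorities of ready tasks, highest first (Exec-style ordering).
   Only tasks sharing the top priority are handed to CPUs. */
static int cpus_to_use(const signed char *ready_pri, int nready, int ncpus)
{
    assert(nready >= 0 && ncpus >= 1);
    if (nready == 0)
        return 0;
    int n = 0;
    while (n < nready && n < ncpus && ready_pri[n] == ready_pri[0])
        n++;
    return n;
}
```

With ready priorities 5, 5, 0 and -128 on a four-core machine only two cores run, so the -128 task never runs alongside the priority 5 tasks.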

I understand about the Forbid() overhead, if something spins in a Forbid()/Permit() call then it could cause problems. But is that something that any software should need/want to do? We pretty much have the source to all AROS software at this point & this isn't going to affect AROS 68k.

There is no reason why the number of cores in use at a time couldn't be dynamic & when you're only using 1 core then the Forbid()/Permit() overhead could be reduced to current levels. If it can detect situations where the SMP implementation will help and which will hurt then you could always end up benefiting.
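A C sketch of a Forbid() that adapts to the number of online CPUs, assuming the count only changes while nobody is in Forbid(). With one CPU it is just a counter increment, as on a single-core Amiga; otherwise the outermost call has to stop every other CPU. `online_cpus`, `ipis_sent` and the functions are hypothetical, and the IPI is only counted, not sent.

```c
#include <assert.h>

static int online_cpus = 1;
static int forbid_nest = -1;   /* -1 = not forbidden, as in Exec */
static int ipis_sent;          /* stand-in for "stop" IPIs actually sent */

static void adaptive_forbid(void)
{
    if (++forbid_nest > 0)
        return;                          /* nested: others are already stopped */
    if (online_cpus > 1)
        ipis_sent += online_cpus - 1;    /* one stop request per other CPU */
}

static void adaptive_permit(void)
{
    assert(forbid_nest >= 0);
    forbid_nest--;                       /* a real Permit() would release the CPUs at -1 */
}
```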

Technologies like Intel Turbo Boost benefit from only using as many cores as necessary, i.e. when you're only using 1 core it can boost the clock speed but when you're saturating all cpu cores then it drops back to the default (some chips can sustain constant boosting, but in a laptop you'd want to minimize it for power usage).
Title: Re: AROS SMP Research: Technical Discussion
Post by: Terminills on August 23, 2013, 01:02:41 PM: Quote from: wawrzon;745932
back to topic please. this is supposed to be technical discussion.

Good point. Edited my post accordingly. :)
Title: Re: AROS SMP Research: Technical Discussion
Post by: ChaosLord on August 23, 2013, 01:37:06 PM: The whole Forbid() Permit() problem only affects old software on the old AmigaOS.

New software written for a new OS, such as a New AROS or new MorphOS can use semaphores to access the various protected OS structures.

So new software on a new OS can make full use of multiple processors.

Someone just has to code up a SMP-friendly new AROS, right?

Then I can start coding up gamez that make use of multiple cores, right?
Title: Re: AROS SMP Research: Technical Discussion
Post by: itix on August 23, 2013, 01:38:53 PM: Quote from: psxphill;745951
Tell me how you will find out what the percentage of time various software will spend in forbid without trying it?

Getting accurate results can be difficult, but it could be profiled. At least how many calls to Forbid() or Disable() there are per minute...

Quote
You have always been able to write software for AmigaOS which disturbs high-priority tasks. If it turns out that this is a problem that needs solving, then you could try changing it so that CPUs will only ever be running tasks that have the same priority. As high-priority tasks are supposed to run for a short period of time, wasting the other cores during that time may not be a big deal. If you have a high-priority task on AmigaOS that takes a long time, then the system becomes unusable (standard-priority tasks like Workbench won't be allowed to run at all). The priority in AmigaOS is quite fine-grained (-128 to 127 IIRC), which means you could also derail this by using as many different priorities as possible. But there is no reason why software that can take advantage of SMP shouldn't have limitations (like all tasks that run at the same time having to run at the same priority).

I know what you mean. There is a hope that at least sometimes some software could run in parallel. Even if you get only +10% instead of +100% it is better than nothing. In the future bottlenecks could be removed one by one.

Quote

I understand about the Forbid() overhead, if something spins in a Forbid()/Permit() call then it could cause problems. But is that something that any software should need/want to do? We pretty much have the source to all AROS software at this point & this isn't going to affect AROS 68k.

Often Forbid() or Disable() is called indirectly. Take this example:

Code:
sillypseudocode() { SetTaskPri(SysBase->ThisTask, -128); PutMsg(port, msg); while (true) PutMsg(port, GetMsg(port)); }
PutMsg() and GetMsg() have a hidden Disable(), but that is fine because on a single-core system higher-priority tasks get scheduled as soon as Enable() is called.

On multicore this would steal almost all available CPU time from each core.

Having a Disable()-free messaging system would solve this problem, but it would have implications for all software. Another solution could be limiting the scheduler so it does not schedule lower-priority tasks on other cores, as you mentioned.
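A C11 sketch of one Disable()-free alternative: each message port gets its own spinlock around the AddTail()/RemHead() of its message list, so two CPUs only contend when they use the same port. The types and functions are hypothetical; existing code that walks mp_MsgList under Forbid() or Disable() would no longer be safe, which is the compatibility cost mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_msg { struct sk_msg *next; int payload; };

struct sk_port {
    atomic_bool locked;            /* per-port lock instead of a global Disable() */
    struct sk_msg *head, *tail;    /* FIFO, like an Exec message list */
};

static void sk_port_init(struct sk_port *p)
{
    atomic_init(&p->locked, false);
    p->head = p->tail = NULL;
}

static void port_lock(struct sk_port *p)
{
    while (atomic_exchange_explicit(&p->locked, true, memory_order_acquire))
        ;                          /* spin: the critical section is a few pointer writes */
}

static void port_unlock(struct sk_port *p)
{
    atomic_store_explicit(&p->locked, false, memory_order_release);
}

/* AddTail() under the port lock; a real PutMsg() would then Signal() the owner. */
static void sk_put_msg(struct sk_port *p, struct sk_msg *m)
{
    m->next = NULL;
    port_lock(p);
    if (p->tail != NULL)
        p->tail->next = m;
    else
        p->head = m;
    p->tail = m;
    port_unlock(p);
}

/* RemHead() under the port lock; returns NULL when the port is empty. */
static struct sk_msg *sk_get_msg(struct sk_port *p)
{
    port_lock(p);
    struct sk_msg *m = p->head;
    if (m != NULL) {
        p->head = m->next;
        if (p->head == NULL)
            p->tail = NULL;
    }
    port_unlock(p);
    return m;
}
```

In the ping-pong example, a loop like this would only ever spin against CPUs touching the same port, instead of stopping every core on each message.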

Anyway, I just wanted to point out that biggest culprit is the OS itself and write some silly example :)
Title: Re: AROS SMP Research: Technical Discussion
Post by: NorthWay on August 23, 2013, 01:47:56 PM: Quote from: itix;745954
Getting accurate results can be difficult, but it could be profiled. At least how many calls to Forbid() or Disable() there are per minute...

I seem to remember some _old_ tool that counted OS calls of your choice.
(When I say old I am thinking Fred Fish age.)
Title: Re: AROS SMP Research: Technical Discussion
Post by: bloodline on August 23, 2013, 01:51:56 PM: Quote from: itix;745954

Often Forbid() or Disable() is called indirectly. Take this example:

Code:
sillypseudocode() { SetTaskPri(SysBase->ThisTask, -128); PutMsg(port, msg); while (true) PutMsg(port, GetMsg(port)); }

Not quite relevant, but for SillySMP (currently) SysBase->ThisTask doesn't work anymore and you have to use FindTask(NULL).
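A C sketch of why a single ThisTask field stops working: with several CPUs there are several running tasks, so the current task has to be looked up per CPU, which a call like FindTask(NULL) can hide but a direct SysBase->ThisTask read cannot. The per-CPU array and `sk_find_self()` are hypothetical, and the CPU number is passed in rather than read from hardware.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_CPUS 8   /* hypothetical */

struct sk_task { const char *name; };

/* One slot per CPU instead of one ExecBase-wide ThisTask. */
static struct sk_task *this_task_per_cpu[SK_MAX_CPUS];

/* What a FindTask(NULL)-style call has to do internally on SMP. */
static struct sk_task *sk_find_self(int current_cpu)
{
    assert(current_cpu >= 0 && current_cpu < SK_MAX_CPUS);
    return this_task_per_cpu[current_cpu];
}
```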
Title: Re: AROS SMP Research: Technical Discussion
Post by: warpdesign on August 23, 2013, 01:53:27 PM: @Itix: why do we need to halt multitask by using enable/disable ?
How do OS that support real SMP work ? I mean: what's the main difference with AmigaOS and "modern" OS ?
Title: Re: AROS SMP Research: Technical Discussion
Post by: bloodline on August 23, 2013, 02:16:51 PM: Quote from: warpdesign;745959
@Itix: why do we need to halt multitasking by using enable/disable?
How do OS that support real SMP work ? I mean: what's the main difference with AmigaOS and "modern" OS ?
One of the key problems with AmigaOS design is the sheer amount of freedom to access system structures that it allows :) but there are plenty of design decisions that never considered the machine would ever have more than one CPU.
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 23, 2013, 02:31:54 PM: Quote from: ChaosLord;745953
New software written for a new OS, such as a New AROS or new MorphOS can use semaphores to access the various protected OS structures.

I don't think this should even be considered unless as an absolute last resort. Making that compromise without knowing what the benefits are would be a mistake.

Quote from: warpdesign;745959
@Itix: why do we need to halt multitask by using enable/disable ?

Blame Carl Sassenrath.

Quote from: warpdesign;745959
How do OS that support real SMP work ? I mean: what's the main difference with AmigaOS and "modern" OS ?

AmigaOS has a lot of design mistakes in it, which didn't matter so much on a games console from the early 1980's that would be around for a few years.

Worrying about not being able to use every ounce of cpu power when using SMP is a mistake. Windows/Linux have high latency on a lot of their API calls.

Quote from: itix;745954
Getting accurate results can be difficult, but it could be profiled. At least how many calls to Forbid() or Disable() there are per minute...

The number of calls is not the metric you need. It's how long it spends in Forbid(). You could make one call and stay in Forbid() for 99% of the time, or 10 calls and only stay in Forbid() for 1% of the time. This affects how much of each CPU you'll lose. The overhead of stopping and starting each cpu would also need to be taken into account; however, this is even harder to calculate because unless you've written the code and tested it you don't even know what the overhead will be. Plus just counting instructions doesn't help, as modern CPUs are way too complex.
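A C sketch of instrumentation that measures the metric described here, assuming Forbid()/Permit() can be wrapped: it counts calls but also accumulates the time spent inside the outermost Forbid()/Permit() pair. The names are hypothetical; C11 timespec_get() is used for portability, and a monotonic clock would be preferable on a real system.

```c
#include <assert.h>
#include <time.h>

static int prof_nest;                /* 0 = not forbidden (this sketch's convention) */
static long prof_calls;              /* what a call counter would report */
static long long prof_forbidden_ns;  /* what actually costs the other CPUs */
static struct timespec prof_start;

static long long to_ns(const struct timespec *t)
{
    return (long long)t->tv_sec * 1000000000LL + t->tv_nsec;
}

static void prof_forbid(void)
{
    prof_calls++;
    if (prof_nest++ == 0)            /* only the outermost call starts the clock */
        timespec_get(&prof_start, TIME_UTC);
}

static void prof_permit(void)
{
    assert(prof_nest > 0);
    if (--prof_nest == 0) {          /* only the outermost call stops it */
        struct timespec now;
        timespec_get(&now, TIME_UTC);
        prof_forbidden_ns += to_ns(&now) - to_ns(&prof_start);
    }
}
```

Dividing prof_forbidden_ns by the wall-clock length of a run gives the fraction of time in Forbid(), which is the number the "lose X% of each cpu" estimate needs.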

Quote from: itix;745954
Code:
sillypseudocode() { SetTaskPri(SysBase->ThisTask, -128); PutMsg(port, msg); while (true) PutMsg(port, GetMsg(port)); }

I can play too.

Code:
sillypseudocode() { Forbid(); while (true); }
Sure, there are pathological cases. The easiest way to speed up your program is for the user to not run it.

Quote from: itix;745954
Anyway, I just wanted to point out that biggest culprit is the OS itself and write some silly example :)

But you don't have any idea what the overhead of the biggest culprit is. It depends on how many messages are being processed, what work is done on each message.

The whole point of coding it was to avoid the constant arguments based on contrived examples & be able to see how real software that people might want to run will behave. It doesn't matter if it's not perfect, it's research. It could be derailed by something that nobody has considered.
Title: Re: AROS SMP Research: Technical Discussion
Post by: wawrzon on August 23, 2013, 02:58:18 PM: Quote from: NorthWay;745955
I seem to remember some _old_ tool that counted OS calls of your choice.
(When I say old I am thinking Fred Fish age.)

wait a minute. couldn't you come up with what it was? there is tremendous overhead on some aros68k operations, as i see on a slow system, and it would be great to identify the most frequently called functions while it happens without doing a profiling job, which i'm not able to.
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 23, 2013, 03:16:50 PM: Quote from: wawrzon;745965
wait a minute. couldn't you come up with what it was? there is tremendous overhead on some aros68k operations, as i see on a slow system, and it would be great to identify the most frequently called functions while it happens without doing a profiling job, which i'm not able to.

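You could, but it might lead you down the wrong path. If you have a function that is called 1000 times which takes 10ms or a function that is called 1 time which takes 100s, then counting calls won't help: it points you at the first function, even though at 10ms per call that is only 10s in total against 100s for the second.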

Profiling is the key, often bottlenecks show up in completely unexpected parts of the code. I've seen people spend time optimising code that when they'd finished made no perceivable difference, even though they could measure a 2x speed up in the function they sped up.

Some of the aros68k problems are caused by adding a level of abstraction to the graphics library, which wasn't designed to be as fast as it possibly could be, because on x86 it was fast enough that nobody would care.

Also, due to the small or non-existent caches on 68k hardware, it's actually very hard to guess where the delays are going to be. For example, what you consider a good algorithm choice could end up thrashing the cache, while a less optimal design could end up being faster if its memory access patterns suit the cache better. Making aros68k faster will take a lot of research and effort. Just counting calls and then spending ten minutes rewriting the function with the most calls is quite dangerous: it might be slower in all cases except the one you tested, and even if you speed it up it could end up broken. Admittedly, I'm cynical after watching people do it repeatedly and fail (although they generally get to claim the credit before anyone finds out).
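A generic illustration of the memory-access point, sketched in C. It is not specific to 68k or AROS and the matrix size is arbitrary; both functions do exactly the same additions and return the same sum, and only the order in which memory is walked differs.

```c
#include <stddef.h>

#define ROWS 512
#define COLS 512

/* Sums a matrix row by row. C stores arrays row-major, so consecutive
 * iterations touch adjacent addresses and reuse each fetched cache line. */
static long sum_row_major(int m[ROWS][COLS])
{
    long s = 0;
    for (size_t r = 0; r < ROWS; r++)
        for (size_t c = 0; c < COLS; c++)
            s += m[r][c];
    return s;
}

/* Same arithmetic and same result, but walks down the columns: each access
 * jumps COLS * sizeof(int) bytes, so with a small cache most accesses miss. */
static long sum_col_major(int m[ROWS][COLS])
{
    long s = 0;
    for (size_t c = 0; c < COLS; c++)
        for (size_t r = 0; r < ROWS; r++)
            s += m[r][c];
    return s;
}
```

How much slower the column-order version is depends entirely on the cache in question (with no data cache at all there may be no difference), so the only way to know is to measure it on the target hardware.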
Title: Re: AROS SMP Research: Technical Discussion
Post by: wawrzon on August 23, 2013, 03:41:37 PM: Quote

If you have a function that is called 1000 times which takes 10ms or a function that is called 1 time which takes 100s then it won't help.

Yes, I know, and especially as I'm not a programmer I don't expect a lot, though being able to see what happens during a stall may at least suggest something. But let's not derail the thread.
Title: Re: AROS SMP Research: Technical Discussion
Post by: itix on August 23, 2013, 09:13:52 PM: Quote from: psxphill;745962

I can play too.

Code: [Select]
sillypseudocode() { Forbid(); while (true); }

Sure, there are pathological cases. The easiest way to speed up your program is for the user not to run it.

You missed one very important difference. My example would run just fine on any traditional non-SMP Amiga OS system. Your example wouldn't.

Change Forbid() to SetTaskPri(task, 127) and you would have an example where SMP is superior to a non-SMP system.
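The point about SetTaskPri(task, 127) can be illustrated with a toy model of a strict-priority scheduler, sketched in C. This is not how exec's scheduler is implemented; `simulate()`, `MAX_TASKS` and the tick model are hypothetical. Task 0 plays the role of a busy loop raised to priority 127; the other tasks are ordinary priority-0 tasks that are always ready to run.

```c
/* Toy strict-priority scheduler. On every tick each CPU runs the
 * highest-priority ready task not already running on another CPU.
 * Task 0: hypothetical busy loop at priority 127 that never waits.
 * Tasks 1..ntasks-1: priority-0 tasks, always ready.
 * ntasks must be between 1 and MAX_TASKS. */
#define MAX_TASKS 8

static void simulate(int ncpus, int ntasks, int ticks, int ran[MAX_TASKS])
{
    int next = 1;  /* round-robin cursor among the priority-0 tasks */

    for (int i = 0; i < ntasks; i++)
        ran[i] = 0;
    for (int t = 0; t < ticks; t++) {
        ran[0]++;  /* highest priority and always ready: always gets a CPU */
        /* Any remaining CPUs share the low-priority tasks round-robin. */
        for (int cpu = 1; cpu < ncpus && cpu < ntasks; cpu++) {
            ran[next]++;
            next = next % (ntasks - 1) + 1;
        }
    }
}
```

With one CPU and 100 ticks, the priority-0 tasks get 0 ticks; with two CPUs, the busy loop keeps the first CPU while two priority-0 tasks get 50 ticks each on the second.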
Title: Re: AROS SMP Research: Technical Discussion
Post by: Bif on August 23, 2013, 09:51:07 PM: I'm happy to see this work going on.

I've done nothing but write code for SMP (and AMP) game systems for the last 10 years, and almost every day I have to think about parallel programming problems. Based on this experience, my personal opinion is that an Amiga SMP system will perform better than most people expect. That said, I have zero recent programming experience on Amiga so I could also be talking out of my arse.

There's all sorts of talk of Forbid/Permit/Enable/Disable and messages flying around. I think a key concept in writing a highly efficient CPU-intensive program is to reduce all interactions with the OS as much as possible. This is true regardless of which OS you are coding for: OSes always have overhead. If you take something like an MP3 encoder or decoder, how often do you need to interact with the OS and thus get stuck in a Forbid()? If you are smart, you will malloc all your memory up front at program startup, so there will only be that initial interaction with the OS. You then probably only need to ask the OS to handle file I/O and maybe some output to the console. 99% of your CPU time should be spent on computation, outside of any kind of Forbid(). I think this will generally be true for any type of program where SMP is useful. Of course, you could do a really crap job of writing an MP3 encoder that reads and writes files 1 byte at a time instead of in blocks, and there would be Forbid() calls everywhere. But I'd also bet that program would run slowly on AmigaOS as it is now.
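The block-versus-byte point, sketched with standard C stdio. This is a stand-in under stated assumptions: it uses `fread()`/`fwrite()` rather than dos.library, and it counts library calls, not real OS requests (stdio does its own buffering underneath). `copy_blocks()` is a hypothetical helper.

```c
#include <stdio.h>
#include <stdlib.h>

/* Copies `in` to `out` through one buffer allocated up front, reading and
 * writing `block` bytes at a time. Returns the number of fread() calls
 * made, or -1 on error. With block == 1 this degenerates into the
 * byte-at-a-time pattern and makes one library call per byte. */
static long copy_blocks(FILE *in, FILE *out, size_t block)
{
    char *buf;
    long calls = 0;
    size_t n;

    if (block == 0)
        return -1;
    buf = malloc(block);          /* allocate once, before the work loop */
    if (buf == NULL)
        return -1;
    do {
        n = fread(buf, 1, block, in);
        calls++;
        if (n > 0 && fwrite(buf, 1, n, out) != n) {
            free(buf);
            return -1;
        }
    } while (n == block);         /* a short read means EOF or an error */
    free(buf);
    return ferror(in) ? -1 : calls;
}
```

Copying 10000 bytes with a 4096-byte block takes 3 `fread()` calls; with a 1-byte block it takes 10001 (one per byte plus the final call that hits end-of-file).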

I think the main culprits issuing heaps of Forbid() calls will be programs with intensive GUIs, e.g. a paint program. But are you going to be running a whole bunch of programs like this at once, or writing a multi-threaded GUI? I would think not much. Something like a movie player in one window while painting in another could probably cause some conflict. But in the worst case I think we are basically serializing all the drawing, and even on a proper SMP OS that's probably largely the case anyway.

Anyway, I think that if Forbid() can be made to block only when another CPU is already inside a Forbid() section, SMP should give a pretty darn good benefit in the cases that actually need the extra CPU horsepower. If it can't work that way, then one Forbid()-heavy program could certainly lock things down a fair bit and introduce additional overhead that causes a net decrease in performance.
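One way to get that behaviour is to treat Forbid() as a single global lock that a CPU only has to wait for when another CPU already holds it, with nesting counted per task. The sketch below models this with POSIX threads standing in for CPUs; `smp_forbid()`, `smp_permit()`, `worker()` and `shared_counter` are hypothetical names, not AROS APIs. It only models cross-CPU exclusion: it does not stop tasks on other CPUs that never call Forbid(), which is the compatibility question raised at the start of this thread.

```c
#include <pthread.h>

/* Hypothetical SMP-aware Forbid()/Permit(): one global lock, taken only by
 * the outermost Forbid() of each thread, so nesting works as it does on
 * a single CPU. A thread waits only if another thread is inside. */
static pthread_mutex_t forbid_lock = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local int forbid_nest = 0;

static void smp_forbid(void)
{
    if (forbid_nest++ == 0)
        pthread_mutex_lock(&forbid_lock);   /* blocks only if held elsewhere */
}

static void smp_permit(void)
{
    if (--forbid_nest == 0)
        pthread_mutex_unlock(&forbid_lock);
}

static long shared_counter = 0;

/* Example worker: nested forbid sections around a shared update. */
static void *worker(void *arg)
{
    int n = *(int *)arg;

    for (int i = 0; i < n; i++) {
        smp_forbid();
        smp_forbid();           /* nested call: must not deadlock */
        shared_counter++;
        smp_permit();
        smp_permit();
    }
    return NULL;
}
```

Two threads each running `worker()` for 100000 iterations end with `shared_counter == 200000`; the nested call does not deadlock because only the outermost `smp_forbid()` takes the lock.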

Very curious to see the results of the experiment, good luck.
Title: Re: AROS SMP Research: Technical Discussion
Post by: vidarh on August 23, 2013, 11:43:06 PM: Quote from: takemehomegrandma;745931
I don't think anyone has claimed that SMP couldn't be done in a situation where the precondition is that the SW base is being built explicitly for that system? Rather the opposite, actually; this is a given! What "people" said is that true SMP can't be done without breaking Amiga compatibility, which I suppose is a more relevant issue on MorphOS/OS4 than on AROS anyway, since the latter has been CPU/ISA agnostic pretty much since the beginning, hence most people are happy to either run AROS builds of whatever SW they use, or run it in an UAE environment.

I think the "breaking Amiga compatibility" argument is a bit too dogmatic. After all, none of the modern alternatives run m68k Amiga software directly, other than the 68k port of AROS; in all other cases, any old software that we don't have sources for is run through an emulator/JIT. That makes a huge difference, in that it is fully possible to detect attempts to access system structures etc., which means careful changes can be made while letting the emulator compensate (e.g. by taking a mutex/semaphore before accessing certain values).

The amount of "native" proprietary software for these new OS's where the author isn't still active is vanishingly small. And the open source software for these OS's can be updated reasonably easily. So if SMP support was added to all of these "tomorrow", how many applications would we realistically "lose"? And if done properly, the "worst case" scenario is to disable SMP while running them.

AROS is in the process of breaking all binary compatibility with past AROS versions anyway, so it's perfect timing for the SMP work as all apps will need to be recompiled, but frankly I just don't believe that it's worth being all that concerned about breaking compatibility over this - the systems where we may want to care about compatibility are the classics, and it's not like they are SMP systems anyway...
Title: Re: AROS SMP Research: Technical Discussion
Post by: vidarh on August 23, 2013, 11:53:13 PM: Quote from: psxphill;745962
I don't think this should even be considered unless as an absolute last resort. Making that compromise without knowing what the benefits are would be a mistake.

It's not a compromise. It is the clean alternative.

Forbid()/Disable() is the hacky compromise. In a single-CPU system, when used in cases where you're "only" swapping a pointer or two, it can be forgivable. In pretty much all other instances, it is a big, giant, lazy cop-out that made coding a tiny bit easier back in the day compared to using semaphores and mutexes to protect the *specific* structures that an application needs to access safely.
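Protecting the specific structure can be sketched as a list that carries its own lock. On AmigaOS this would naturally be a `struct SignalSemaphore` with `ObtainSemaphore()`/`ReleaseSemaphore()`; the sketch below uses a POSIX mutex as a portable stand-in, and the list type and functions are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* A shared list that carries its own lock, in the spirit of an exec
 * SignalSemaphore embedded in the structure it protects. */
struct node {
    struct node *next;
    int value;
};

struct locked_list {
    pthread_mutex_t lock;   /* protects head and count, nothing else */
    struct node *head;
    int count;
};

static void list_init(struct locked_list *l)
{
    pthread_mutex_init(&l->lock, NULL);
    l->head = NULL;
    l->count = 0;
}

/* Only users of *this* list wait here; tasks touching other lists, or
 * doing pure computation, keep running on other CPUs. */
static void list_push(struct locked_list *l, struct node *n)
{
    pthread_mutex_lock(&l->lock);
    n->next = l->head;
    l->head = n;
    l->count++;
    pthread_mutex_unlock(&l->lock);
}

static int list_count(struct locked_list *l)
{
    pthread_mutex_lock(&l->lock);
    int c = l->count;
    pthread_mutex_unlock(&l->lock);
    return c;
}
```

Tasks working on one list never wait for tasks working on another, whereas a Forbid() around each update would hold up every other task in the system.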

Now, I agree that it's not necessary to jump straight into tearing out every Forbid()/Disable() you can find. But we can categorically state that having them there is bad. It's just that for the most part it won't be bad enough to prevent us from seeing some benefit from SMP, and so removing them can wait until there's some SMP support working.

It certainly makes sense to prioritise *where* to clean up Forbid()/Disable() calls first based on profiling which ones actually hurt the most.
Title: Re: AROS SMP Research: Technical Discussion
Post by: matthey on August 24, 2013, 01:26:36 AM: Quote from: NorthWay;745955
I seem to remember some _old_ tool that counted OS calls of your choice.
(When I say old I am thinking Fred Fish age.)

Quote from: wawrzon;745965
Wait a minute, couldn't you come up with what it was? There is tremendous overhead in some aros68k operations, as I see on a slow system, and it would be great to identify the most frequently called functions while it happens without doing a profiling job, which I'm not able to do.

LibraryTimer perhaps?

http://aminet.net/search?query=LibraryTimer
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 24, 2013, 01:43:17 AM: Quote from: vidarh;746023
It certainly makes sense to prioritise *where* to clean up Forbid()/Disable() calls first based on profiling which ones actually hurt the most.

Removing Forbid calls is a compromise because it will break software in ways that are difficult to detect, but if the speed-up is huge and can't be achieved in any better way, then there may be some justification.
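Why removing Forbid() can break software silently: old code often assumed that a read-modify-write done under Forbid() could not be interleaved with another task. The C sketch below splits the update into separate load and store steps so that the problematic interleaving can be replayed deterministically; the helpers are hypothetical and no real threads are involved.

```c
/* Old single-CPU code often relied on Forbid() to make "load, add, store"
 * atomic: no other task could run between the load and the store.
 * The steps are split out so an interleaving can be replayed exactly. */
static int load(const int *p) { return *p; }
static void store(int *p, int v) { *p = v; }

/* Task A and task B each add 1, one after the other (what Forbid() gave). */
static int serial_run(void)
{
    int counter = 0;
    store(&counter, load(&counter) + 1);   /* A, inside Forbid() */
    store(&counter, load(&counter) + 1);   /* B, after A's Permit() */
    return counter;
}

/* The same two updates if Forbid() no longer excludes another CPU. */
static int interleaved_run(void)
{
    int counter = 0;
    int a = load(&counter);     /* CPU 0: task A reads 0 */
    int b = load(&counter);     /* CPU 1: task B reads 0 before A stores */
    store(&counter, a + 1);     /* A writes 1 */
    store(&counter, b + 1);     /* B writes 1: A's update is lost */
    return counter;
}
```

`serial_run()` returns 2 and `interleaved_run()` returns 1. In practice the bad interleaving may be rare, so such code can pass testing for a long time before a lost update shows up, which is what makes this kind of breakage hard to detect.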

If you're allowed to break software then it becomes a whole lot easier.

Quote from: itix;746016
You missed one very important difference. My example would run just fine on any traditional non-SMP Amiga OS system. Your example wouldn't.

Change Forbid() to SetTaskPri(task, 127) and you would have an example where SMP is superior to a non-SMP system.

Your software doesn't do anything useful and neither does mine; I don't see a difference. However, when Windows started supporting multiple CPU cores, there was software that ran worse on SMP systems than on single-CPU systems. Until we can benchmark it, you can't tell if it's worth worrying about.
Title: Re: AROS SMP Research: Technical Discussion
Post by: itix on August 24, 2013, 02:12:56 AM: Quote from: vidarh;746022
AROS is in the process of breaking all binary compatibility with past AROS versions anyway, so it's perfect timing for the SMP work as all apps will need to be recompiled, but frankly I just don't believe that it's worth being all that concerned about breaking compatibility over this - the systems where we may want to care about compatibility are the classics, and it's not like they are SMP systems anyway...

This is not just a user application problem. The OS internals also depend on Forbid/Enable, and replacing Forbid/Enable in one OS call can have serious implications for other parts of the OS.
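An example of this kind of internal dependency is the FindPort()/PutMsg() pattern discussed at the start of the thread: the lookup and the send must be covered by the same exclusion, or the port can be removed in between. The toy registry below (hypothetical names, with a POSIX mutex standing in for Forbid()) shows that every path touching the registry, including port removal, has to take the same lock, which is why replacing Forbid() in one OS call affects others.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Toy message-port registry. `pending` stands in for a message list. */
#define MAX_PORTS 4

struct port {
    const char *name;
    int pending;
};

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static struct port *registry[MAX_PORTS];

static int add_port(struct port *p)
{
    int ok = 0;
    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < MAX_PORTS && !ok; i++)
        if (registry[i] == NULL) {
            registry[i] = p;
            ok = 1;
        }
    pthread_mutex_unlock(&registry_lock);
    return ok;
}

/* Removal must take the same lock, or it could race with find_and_send(). */
static void remove_port(struct port *p)
{
    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < MAX_PORTS; i++)
        if (registry[i] == p)
            registry[i] = NULL;
    pthread_mutex_unlock(&registry_lock);
}

/* Lookup and delivery under one lock. Returns 1 if delivered, 0 if the
 * port does not exist. */
static int find_and_send(const char *name)
{
    int sent = 0;
    pthread_mutex_lock(&registry_lock);
    for (int i = 0; i < MAX_PORTS && !sent; i++)
        if (registry[i] != NULL && strcmp(registry[i]->name, name) == 0) {
            registry[i]->pending++;   /* port cannot vanish while locked */
            sent = 1;
        }
    pthread_mutex_unlock(&registry_lock);
    return sent;
}
```

After `remove_port()` returns, a later `find_and_send()` reports the port as missing instead of writing to a port that may already have been freed.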
Title: Re: AROS SMP Research: Technical Discussion
Post by: matthey on August 24, 2013, 02:43:15 AM: Quote from: psxphill;746034
However when Windows started supporting multiple cpu cores there was software that ran worse on SMP systems than on single cpu systems. Until we can benchmark it then you can't tell if it's worth worrying about.

For completely linear processing, a single-core scalar processor is most efficient. Single-core processors are generally more powerful than a single core of an SMP processor with the same resources. Likewise, a scalar processor is generally going to be faster than a superscalar or OoO processor with the same resources on code where every instruction is dependent. There is hardware overhead in parallel processing, and that's not even counting the software overhead.
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 24, 2013, 09:12:14 AM: Quote from: matthey;746038
There is hardware overhead in parallel processing and that's not even counting the software overhead.

My point is that there is always a software overhead. Saying forbid will have to go because of an unmeasured software overhead is putting the cart before the horse.

It's not like an ABI change where just a recompile is needed; it would require code audits and potentially redesigns. So changing from Forbid to a semaphore should be avoided until it's known what the advantage is (and whether there is indeed a visible advantage).
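A standard way to see why that measurement matters is Amdahl's law (a general formula, not something specific to AROS): if a fraction of the run time is serialized, for example by code holding a global Forbid(), that fraction caps the possible speedup no matter how many CPUs are added. A minimal C version, with a hypothetical function name:

```c
/* Amdahl's law: if a fraction `serial` (0..1) of the run time cannot run
 * in parallel, e.g. time spent holding a global Forbid(), the best possible
 * speedup on `cpus` processors is 1 / (serial + (1 - serial) / cpus). */
static double amdahl_speedup(double serial, int cpus)
{
    return 1.0 / (serial + (1.0 - serial) / cpus);
}
```

With half the time serialized, two CPUs give at most about 1.33x. This is an upper bound: it ignores lock contention and cache traffic between CPUs, which is how some software ended up running slower on SMP than on a single CPU.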
Title: Re: AROS SMP Research: Technical Discussion
Post by: Karlos on August 24, 2013, 01:32:18 PM: Quote from: itix;746036
This is not just a user application problem. Also the OS internals depend on Forbid/Enable and replacing Forbid/Enable in one OS call can cause serious implications to other parts in the OS.

Of course. That said, since it's the OS you'd be working on, one would hope these are the ones that are easiest to find and engineer replacements for ;)
Title: Re: AROS SMP Research: Technical Discussion
Post by: wawrzon on August 24, 2013, 01:37:46 PM: Quote from: matthey;746033
LibraryTimer perhaps?

http://aminet.net/search?query=LibraryTimer

Thanks, Matt, great proggy! Exactly what I was looking for.
Title: Re: AROS SMP Research: Technical Discussion
Post by: minator on August 24, 2013, 02:02:56 PM: Quote from: matthey;746038
For completely linear processing, a single-core scalar processor is most efficient. Single-core processors are generally more powerful than a single core of an SMP processor with the same resources.

Quote
Likewise, a scalar processor is generally going to be faster than a superscalar or OoO processor with the same resources on code where every instruction is dependent.

For that to be true, every single instruction would have to depend on the previous one.
That is pretty much never the case, so the OoO processor will win.
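The dependency point can be illustrated in C. Both functions below compute the same sum; the first is one long dependency chain, the second splits it into four independent chains that a superscalar or OoO core can execute in parallel. This is a sketch only: optimizing compilers may already reorder or vectorize the first loop themselves, so any timing difference has to be measured.

```c
#include <stddef.h>

/* Every addition depends on the previous one: a single dependency chain,
 * so extra execution units have nothing to overlap. */
static long sum_chain(const long *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Four independent accumulators: the four additions in each iteration do
 * not depend on each other, so a superscalar or OoO core can overlap them. */
static long sum_split(const long *v, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)          /* leftover elements when n % 4 != 0 */
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}
```

For the values 1 to 10, both functions return 55.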

Quote
There is hardware overhead in parallel processing and that's not even counting the software overhead.

Yes.
Title: Re: AROS SMP Research: Technical Discussion
Post by: Blizz1220 on August 24, 2013, 03:39:40 PM: I remember that when AMD went dual-core, it put out a press release saying it was working on a single-core simulator/emulator in hardware to deal with the amount of software that only used a single core...

What happened to that project?
Title: Re: AROS SMP Research: Technical Discussion
Post by: psxphill on August 24, 2013, 06:25:04 PM: Quote from: Karlos;746058
Of course. That said, since it's the OS you'd be working on, one would hope these are the ones that are easiest to find and engineer replacements for ;)

Changing the OS will break the user programs too.

Quote from: Blizz1220;746075
I remember that when AMD went dual-core, it put out a press release saying it was working on a single-core simulator/emulator in hardware to deal with the amount of software that only used a single core...

What happened to that project?

Wasn't that a joke/fake? http://www.theinquirer.net/inquirer/news/1009078/reverse-hyperthreading-exist
http://forums.anandtech.com/archive/index.php/t-2198930.html