
Author Topic: AROS SMP Research: Technical Discussion  (Read 33788 times)


Offline Ezrec (Topic starter)

  • Jr. Member
  • **
  • Join Date: Aug 2010
  • Posts: 58
    • Show only replies by Ezrec
    • http://www.evillabs.net
AROS SMP Research: Technical Discussion
« on: August 22, 2013, 05:56:14 PM »
 

Offline Ezrec (Topic starter)

Re: AROS SMP Research: Technical Discussion
« Reply #1 on: August 22, 2013, 06:02:28 PM »
Quote from: psxphill;745838
Is that how your Forbid() works? That the other CPU is only stopped when its quantum expires? That would probably be a mistake: the first CPU might be trying to send a message to a port that is on the other CPU (which currently requires a Forbid() to make sure the port doesn't disappear before you send the message).

 
In that case, you (the programmer) need to update your code anyway, since you could have been pre-empted right before the SendMsg()/Signal() and lost that port, even on AmigaOS 3.x.

 
Quote from: psxphill;745838
How is the second cpu task switching if it has no hardware interrupts? Is it controlled by the timer on cpu0?


That is architecture specific. On the 'unix hosted' AROS environment, CPU0 proxies the timer scheduling interrupt for the other CPUs.

On pc-x86, we will probably use the Local APIC timers, and an IPI signal to tell the cores to 'please stop running for a bit, until I tell you otherwise'.
 

Offline psxphill

Re: AROS SMP Research: Technical Discussion
« Reply #2 on: August 22, 2013, 06:12:55 PM »
Quote from: Ezrec;745841
In that case, you (the programmer) need to update your code anyway, since you could have been pre-empted right before the SendMsg()/Signal() and lost that port, even on AmigaOS 3.x.

You need a Forbid() around the find/sendmsg, but you don't want the Forbid() to wait for all CPUs to finish their quantum. When the Forbid() happens, the other CPUs need to stop what they are doing immediately.
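A minimal sketch of the find-then-send pattern under discussion, written as a self-contained C model rather than real exec.library code. Everything here (`model_port`, `port_forbid`, `port_permit`, `port_find`, `safe_send`) is a hypothetical stand-in; the only point is that the lookup and the send sit inside one forbid/permit pair, so the port cannot go away between them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PORTS 8   /* arbitrary size for the model */

/* Hypothetical stand-in for a public message port. */
struct model_port {
    const char *name;   /* NULL means the slot is free */
    int pending;        /* number of messages queued on the port */
};

struct model_port ports[MAX_PORTS];
int forbid_nest = 0;    /* > 0 means "nothing else may run" in this model */

void port_forbid(void) { forbid_nest++; }
void port_permit(void) { assert(forbid_nest > 0); forbid_nest--; }

/* Look up a port by name; the result is only trustworthy while forbidden. */
struct model_port *port_find(const char *name)
{
    assert(forbid_nest > 0);
    for (size_t i = 0; i < MAX_PORTS; i++)
        if (ports[i].name && strcmp(ports[i].name, name) == 0)
            return &ports[i];
    return NULL;
}

/* Find and send as one unit. Returns 1 if a message was queued, 0 if not. */
int safe_send(const char *name)
{
    int sent = 0;
    port_forbid();
    struct model_port *p = port_find(name);
    if (p) {
        p->pending++;   /* stands in for queueing the message */
        sent = 1;
    }
    port_permit();
    return sent;
}
```

On SMP the interesting part is what `port_forbid()` has to do to the other cores before it returns, which is what the rest of the thread is about.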
 

Offline Ezrec (Topic starter)

Re: AROS SMP Research: Technical Discussion
« Reply #3 on: August 22, 2013, 06:17:47 PM »
Quote from: psxphill;745843
You need a Forbid() around the find/sendmsg, but you don't want the Forbid() to wait for all CPUs to finish their quantum. When the Forbid() happens, the other CPUs need to stop what they are doing immediately.


I think you're misunderstanding: the Forbid() doesn't return until the other CPUs have stopped.
 

Offline Ezrec (Topic starter)

Re: AROS SMP Research: Technical Discussion
« Reply #4 on: August 22, 2013, 06:31:44 PM »
Quote from: Ezrec;745844
I think you're misunderstanding: the Forbid() doesn't return until the other CPUs have stopped.


And I think *I* have something wrong. Michal Schulz did some rough performance calculations, and even though my method (wait for the quantum to expire) is semantically correct, the performance penalty is terrifying.

I'll experiment with signalling the other cores to stop immediately, and see how that works out.
 

Offline psxphill

Re: AROS SMP Research: Technical Discussion
« Reply #5 on: August 22, 2013, 06:57:01 PM »
Quote from: Ezrec;745846
And I think *I* have something wrong. Michal Schulz did some rough performance calculations, and even though my method (wait for the quantum to expire) is semantically correct, the performance penalty is terrifying.

Yeah, that was my point. Making Forbid() wait for the other CPUs will mean that these four lines of code will take over 1 task quantum.
 
Forbid();
Permit();
Forbid();
Permit();
 
The first forbid() will take anywhere from nothing to 1 task quantum depending on how it aligns with the other cpu's tasks.
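A rough worked version of that arithmetic, as a sketch only. It assumes a single other CPU, that a quantum-waiting Forbid() blocks until that CPU's current quantum runs out, and that the CPU starts a fresh full quantum as soon as Permit() releases it. The function name and the tick units are made up for illustration.

```c
#include <assert.h>

/*
 * Cost, in timer ticks, of Forbid(); Permit(); Forbid(); Permit(); when
 * Forbid() waits for the other CPU's quantum to expire.
 *   quantum   - length of one task quantum
 *   remaining - ticks left in the other CPU's current quantum (0..quantum)
 */
unsigned forbid_pair_cost(unsigned quantum, unsigned remaining)
{
    assert(remaining <= quantum);
    unsigned first  = remaining;  /* wait out the quantum already in progress */
    unsigned second = quantum;    /* the resumed CPU has just begun a new one */
    return first + second;       /* never less than one full quantum */
}
```

With a 20-tick quantum and 5 ticks left on the other CPU, `forbid_pair_cost(20, 5)` gives 25 ticks for four lines that do no work.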
 
Stopping the other CPUs immediately will carry some performance penalty. Although that penalty is much higher than the overhead in AOS 3.1, it should be nowhere near a quantum.
 
You also don't want the other CPUs' tasks to lose their quantum when another CPU does a Forbid(); their quantum should be extended by the length of time they were suspended.
 
Rather than signalling the other CPUs, it might be enough to actually stop them. The performance might depend on the architecture, plus I don't know how you're abstracting all this stuff, so either way might make more sense.
« Last Edit: August 22, 2013, 07:04:01 PM by psxphill »
 

Offline Ezrec (Topic starter)

Re: AROS SMP Research: Technical Discussion
« Reply #6 on: August 22, 2013, 07:19:51 PM »
Quote from: psxphill;745851
Yeah, that was my point. Making Forbid() wait for the other CPUs will mean that these four lines of code will take over 1 task quantum.


Ok, looks like we're on the same page now.

Michal's planning on using an IPI to signal the other CPUs to stop. On x86 SMP there isn't some "magic register" you can use to stop other CPUs, so you have to ask them nicely, but it's still a lot faster than waiting until they reach a Switch()/Dispatch() point.
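To make the "ask them nicely" handshake concrete, here is a small single-threaded C model of it. This is not AROS code and it sends no real IPI: `request_stop`, `ipi_handler`, `all_others_parked`, `release_all` and the `NCPUS` value are all hypothetical names chosen for the sketch. It only shows the bookkeeping that lets Forbid() know when every other core has acknowledged.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NCPUS 4   /* assumed core count for the sketch */

atomic_bool stop_requested;   /* set by the CPU that is inside Forbid() */
atomic_int  parked_count;     /* how many other CPUs have acknowledged */
bool        parked[NCPUS];    /* per-CPU "I am parked" flag */

/* Caller side: raise the request. A real kernel would also send the IPI. */
void request_stop(void)
{
    atomic_store(&stop_requested, true);
}

/* What each other CPU's IPI handler would do: park once and acknowledge. */
void ipi_handler(int cpu)
{
    if (atomic_load(&stop_requested) && !parked[cpu]) {
        parked[cpu] = true;
        atomic_fetch_add(&parked_count, 1);
    }
}

/* Forbid() may return only once every other CPU has parked. */
bool all_others_parked(void)
{
    return atomic_load(&parked_count) == NCPUS - 1;
}

/* Permit() side: let the parked CPUs run again. */
void release_all(void)
{
    for (int i = 0; i < NCPUS; i++)
        parked[i] = false;
    atomic_store(&parked_count, 0);
    atomic_store(&stop_requested, false);
}
```

The cost of a Forbid() in this scheme is the IPI delivery plus the slowest core's acknowledgement, not the remainder of anyone's quantum.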
 

Offline psxphill

Re: AROS SMP Research: Technical Discussion
« Reply #7 on: August 22, 2013, 07:44:41 PM »
Quote from: Ezrec;745855
Michal's planning on using an IPI to signal the other CPUs to stop. On x86 SMP there isn't some "magic register" you can use to stop other CPUs, so you have to ask them nicely, but it's still a lot faster than waiting until they reach a Switch()/Dispatch() point.

Cool, that should work better.
 
Do you think it makes sense for the time a CPU is suspended not to count towards the current task's quantum? Otherwise the fairness will depend on what is running on the other CPUs, and you could get one task that is permanently starved in pathological cases. If the CPU has its own timer that fires when the quantum is up, then it might just be a case of pausing it, but if you can only stop the timer you'd need to keep track of the time left and use that when you start the CPU again.
« Last Edit: August 22, 2013, 07:47:18 PM by psxphill »
 

Offline Ezrec (Topic starter)

Re: AROS SMP Research: Technical Discussion
« Reply #8 on: August 22, 2013, 07:46:22 PM »
Quote from: psxphill;745857
Do you think it makes sense for the time a CPU is suspended not to count towards the quantum? Otherwise the fairness will depend on what is running on the other CPUs, and you could get one task that is permanently starved in pathological cases.


Right now, suspended CPUs do not have their Elapsed updated while they are suspended, so their tasks should not be starved.
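A tiny C model of that accounting rule, as a sketch rather than the actual AROS scheduler. The struct and function names are hypothetical; `elapsed` plays the role of the Elapsed counter mentioned above, and ticks that arrive while the CPU is parked are simply not charged to the running task.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU scheduling state. */
struct model_cpu {
    unsigned quantum;    /* ticks a task may run before rescheduling */
    unsigned elapsed;    /* ticks charged to the current task so far */
    bool     suspended;  /* true while parked by another CPU's Forbid() */
};

/* Advance one timer tick; returns true when the task's quantum is used up. */
bool model_tick(struct model_cpu *cpu)
{
    if (cpu->suspended)
        return false;    /* parked time is not charged to the task */
    cpu->elapsed++;
    return cpu->elapsed >= cpu->quantum;
}
```

A task that is parked for ten ticks in the middle of a four-tick quantum still gets its full four ticks of actual running time.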
 

Offline minator

  • Hero Member
  • *****
  • Join Date: Jan 2003
  • Posts: 592
    • Show only replies by minator
    • http://www.blachford.info
Re: AROS SMP Research: Technical Discussion
« Reply #9 on: August 22, 2013, 08:00:51 PM »
It's interesting that this is being tried but I suspect it will never get past the experimental phase.

Even if it can be made to work, it's going to serialise the CPUs so much that there's no point having multiple CPUs.

It might be possible to show a nice speedup on some long running highly parallelisable benchmark but that's it.  In any real system apps will be constantly stalling the system and you don't need to be Gene Amdahl to know what the result will be.
 

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
    • Show only replies by matthey
Re: AROS SMP Research: Technical Discussion
« Reply #10 on: August 22, 2013, 09:12:22 PM »
@Ezrec
Congratulations! Great effort! You have already proved some people wrong with your experiments.

How are you handling the ENABLE/DISABLE FORBID/PERMIT macros (ables.i) that increment and decrement the ExecBase IDNestCnt and TDNestCnt?  
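For readers who haven't looked at those macros, here is a simplified C model of the two nesting counters. It is a sketch, not the contents of ables.i: the struct is a cut-down stand-in for ExecBase, the helper names are invented, and the idle value of -1 is the classic exec convention as I recall it, so treat that as an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down ExecBase holding only the two nesting counters. */
struct model_execbase {
    signed char IDNestCnt;   /* interrupt-disable nesting; -1 means enabled */
    signed char TDNestCnt;   /* task-switch-disable nesting; -1 means enabled */
};

void nest_init(struct model_execbase *sb)
{
    sb->IDNestCnt = -1;
    sb->TDNestCnt = -1;
}

/* FORBID/PERMIT: bump the task-disable count up and down. */
void nest_forbid(struct model_execbase *sb) { sb->TDNestCnt++; }
void nest_permit(struct model_execbase *sb)
{
    assert(sb->TDNestCnt >= 0);   /* unbalanced Permit() is a bug */
    sb->TDNestCnt--;
}

/* DISABLE/ENABLE: same idea for the interrupt-disable count. */
void nest_disable(struct model_execbase *sb) { sb->IDNestCnt++; }
void nest_enable(struct model_execbase *sb)
{
    assert(sb->IDNestCnt >= 0);   /* unbalanced Enable() is a bug */
    sb->IDNestCnt--;
}

/* The scheduler may switch tasks only when neither count is held. */
bool may_switch_tasks(const struct model_execbase *sb)
{
    return sb->TDNestCnt < 0 && sb->IDNestCnt < 0;
}
```

Code that uses the macros pokes the counters inline instead of calling into exec, so on SMP nothing gets a chance to notify the other cores; that is presumably the concern behind the question.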

Quote from: minator;745861
It's interesting that this is being tried but I suspect it will never get past the experimental phase.

Even if it can be made to work, it's going to serialise the CPUs so much that there's no point having multiple CPUs.

It might be possible to show a nice speedup on some long running highly parallelisable benchmark but that's it.  In any real system apps will be constantly stalling the system and you don't need to be Gene Amdahl to know what the result will be.


The performance of most current SMP processors would be limited by the limitations of AmigaOS. However, specialized hardware (or FPGA-ware, whatever you want to call it) could drastically reduce this overhead and increase compatibility. ExecBase could be set up in a particular area of memory, with certain addresses monitored for changes that trigger some FPGA action affecting all cores. Some of the multi-tasking and multi-core handling could even move into hardware (FPGA code). Think of the Fido processor (68k), with its semi-hardware handling of multi-tasking (it has a per-task time slice countdown value with an automatic hardware interrupt when the time is up), being upgraded to SMP. It would be a little complex in hardware but could then offer the advantage of better protection of SMP and multi-tasking from errant and malicious software. Add partial memory protection with an MMU and virtual addressing for >4MB memory support (each task would be limited to 2MB or so) and the Amiga with 68k might be competitive again (with an ASIC). Gunnar von Boehn would like to make a multi-core version of the 68k Apollo processor. Duplicating the cores in FPGA is very simple. The rest is just giving Jason what he needs, provided his ideas do not have flaws ;).
 

Offline Ezrec (Topic starter)

Re: AROS SMP Research: Technical Discussion
« Reply #11 on: August 22, 2013, 09:34:39 PM »
Quote from: minator;745861
Even if it can be made to work, it's going to serialise the CPUs so much that there's no point having multiple CPUs.


Very true.

I'm investigating a spinlock-style SignalSemaphore that has a lower latency for protecting frequently used internal data structures in Exec.
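For anyone unfamiliar with the term, this is roughly what a bare spinlock looks like in portable C11. It is a generic sketch, not the SignalSemaphore variant being investigated: the `spinlock` type and the `spin_*` names are made up here, and a real kernel version would add a pause hint in the wait loop and decide what to do about interrupts.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal test-and-set spinlock built on the C11 atomic_flag type. */
struct spinlock {
    atomic_flag held;
};

void spin_init(struct spinlock *l)
{
    atomic_flag_clear(&l->held);
}

/* Try once; returns true if the lock was taken by this call. */
bool spin_trylock(struct spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is free, then take it. */
void spin_lock(struct spinlock *l)
{
    while (!spin_trylock(l))
        ;   /* spin */
}

void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

The appeal over a global Forbid() is that only cores contending for the same structure wait on each other; everyone else keeps running.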

Just so people know - even though I am one of the m68k developers for AROS, AROS SillySMP is *not* targeted for m68k. It is only for *existing* processors with *actual* SMP hardware.

Once someone puts an actual piece of SMP m68k hardware into my hands, I'll be happy to develop for it.
 

Offline Zac67

  • Hero Member
  • *****
  • Join Date: Nov 2004
  • Posts: 2890
    • Show only replies by Zac67
Re: AROS SMP Research: Technical Discussion
« Reply #12 on: August 22, 2013, 09:48:20 PM »
Sorry if I'm a bit naive here - but what would be the problem with leaving the other cores running on a Forbid() as long as they stay in userland?
Of course, you'd need to take care of them not running over another Forbid() or into an interrupt. Both could be prevented with a simple semaphore in Forbid() and all relevant interrupts (which need to be deferred until Permit()).
 

Offline psxphill

Re: AROS SMP Research: Technical Discussion
« Reply #13 on: August 23, 2013, 12:12:19 AM »
Quote from: Zac67;745872
Sorry if I'm a bit naive here - but what would be the problem with leaving the other cores running on a Forbid() as long as they stay in userland?

Because Forbid() is used to protect userland data structures shared between tasks too. Since those tasks might be running on different CPUs, you have no choice but to stop the other CPUs.
 

Offline kamelito

Re: AROS SMP Research: Technical Discussion
« Reply #14 on: August 23, 2013, 12:22:32 AM »
Might be interesting to ask Carl Sassenrath what he thinks about it.
Kamelito