Amiga.org
Amiga computer related discussion => Amiga Hardware Issues and discussion => Topic started by: rkauer on March 27, 2008, 11:33:12 PM
-
Yeah, I know this was discussed to the end of the world, but this time I found some feed for the kitty:
Looking inside this PPC emulator (http://aminet.net/package/dev/lib/libocpuis), I think:
If someone stripped the emulation code down and constructed a table to emulate the calls from a real Amiga to the Coldfire accelerator, could it be used as a real Amiga upgrade?
Critics, please.
BTW: I know the translation process (catching everything) is slow as hell. But even then an MPC54xx CPU can run circles around a 060@50MHz; maybe the slowdown (on certain instructions) makes it comparable to a 040@40MHz (not bad at all!).
-
Hello rkauer,
I wrote a similar emulator (not completely finished):
Mine is targeted at CF parts with internal SRAM.
I run the emulation code from it, for two reasons:
- The code is self-modifying (I know it is bad, but it is so efficient)
- The SRAM is located at address 0xFFFF8000 (the upper 2 GB cannot be used as memory on the Amiga) -> the 1024-entry jump table is done with WORDs (move.w xxx,An does a sign extension :-D)
With all these tricks I can achieve an average of 20 instructions per emulated instruction.
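In case the WORD-table trick is not obvious, here is a rough C sketch of the dispatch (hypothetical, the names are invented for illustration). Any address in the top 32 KB of the 4 GB space fits in a signed 16-bit word, so each table entry is half the size of a pointer and the sign extension rebuilds the full handler address for free:

    #include <stdint.h>

    /* Hypothetical sketch of the WORD jump table (illustration only).
     * Handlers live in on-chip SRAM at 0xFFFF8000; every address in the
     * top 32 KB fits in a signed 16-bit value, so the table holds WORDs
     * (2 KB instead of 4 KB) and the sign extension of move.w xxx,An
     * rebuilds the full 32-bit handler address for free. */
    typedef void (*handler_t)(void);

    static int16_t jump_table[1024];            /* one WORD per entry */

    static void dispatch(uint16_t opcode)
    {
        int16_t word = jump_table[opcode >> 6]; /* 1024 handler groups */
        /* sign extension: 0x8000..0xFFFF -> 0xFFFF8000..0xFFFFFFFF */
        handler_t handler = (handler_t)(intptr_t)word;
        handler();
    }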
Regards,
Frederic
-
Good job!
That's an elegant solution for Coldfire's 68k emulation!
Since it has a RAM controller, it's just a matter of using the upper "RAM" space to manage the table.
/me thinks we are coming to a solution!
-
@rkauer
What's the point?
I mean, either you run it natively on the hardware or you emulate it.
If you accept some buggy opcodes (compared to the m68k series) you can use ColdFire... if you emulate, ColdFire isn't really the smartest choice.
RWO
-
FrenchShark wrote:
Hello rkauer,
I wrote a similar emulator (not completely finished):
Mine is targeted at CF parts with internal SRAM.
I run the emulation code from it, for two reasons:
- The code is self-modifying (I know it is bad, but it is so efficient)
- The SRAM is located at address 0xFFFF8000 (the upper 2 GB cannot be used as memory on the Amiga) -> the 1024-entry jump table is done with WORDs (move.w xxx,An does a sign extension :-D)
With all these tricks I can achieve an average of 20 instructions per emulated instruction.
Regards,
Frederic
Sounds very cool. Are you saying you are emulating the whole 68k, though, or are you still just talking about emulating the missing opcodes?
@rkauer
What's the point?
I don't know much about how all this stuff works, but yeah, if you were just going to emulate "all" the 68k opcodes, it seems like it would make more sense to just use a more widely available CPU like a Core 2 Duo or something. It seems like it would be 'as' difficult. But I don't know. Hard work either way, it seems.
-
@rkauer
What's the point?
I don't know much about how all this stuff works, but yeah, if you were just going to emulate "all" the 68k opcodes, it seems like it would make more sense to just use a more widely available CPU like a Core 2 Duo or something. It seems like it would be 'as' difficult. But I don't know. Hard work either way, it seems.
Hey, not so quick with your conclusions. :-)
Let's look at all the PROS and CONS first!
There is a major advantage that the Coldfire has got.
You can buy the "source" of the Coldfire for an affordable sum. This means that you can "bake" your own Coldfire including AGA/SuperAGA in one chip.
Continuing from FrenchShark's point:
You can bake a Coldfire including AGA => which is then basically an AMIGA on a single chip.
You can get a Coldfire including the AGA chipset to 400/500 MHz.
If you are into comparing numbers then:
- The resulting SuperAGA blitter can be about 200 times faster than the AMIGA AGA blitter was.
- The CPU is, net, about 10 times faster than a 68060.
- More than about 100 times faster than an A1200/020.
Even in FrenchShark's "68k emul mode" it's still about 10 times faster than an A1200/020.
Nothing to sneeze at.
The key is having it all in a single chip.
Based on this you can create a 500 MHz AMIGA of the size and price of a fat USB stick.
-
From my knowledge, the goal is to emulate only the bad instructions, but even this way you have to emulate the supervisor mode of the CPU.
So here is the catch: construct a resident hardware interpreter outside the Amiga memory space (a simple "mini" Spartan or other FPGA can do this) and, with the right code, forward only the "good" code to the CPU untouched. Almost the same approach as a table interpreter.
-
rkauer wrote:
From my knowledge, the goal is to emulate only the bad instructions, but even this way you have to emulate the supervisor mode of the CPU.
So here is the catch: construct a resident hardware interpreter outside the Amiga memory space (a simple "mini" Spartan or other FPGA can do this) and, with the right code, forward only the "good" code to the CPU untouched. Almost the same approach as a table interpreter.
I see where you are coming from, but I think this is not worth doing.
I'll try to explain why:
For the sake of easy discussion, let's quickly compare the net performance of the Coldfire and the 68k.
Every application of course has different needs, but let's just use one example for the sake of argument.
Let's say we have a piece of code (a loop) that uses:
4 register operations (e.g. cmp or add.l dx,dy)
2 memory operations (e.g. add <ea>,dx)
1 multiplication
2 indexed addressing modes (e.g. move.l 2(a0,d0),d1)
1 branch (taken)
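As a rough illustration (a hypothetical example of mine, not taken from any real program), a small C loop like the one below would compile to approximately that mix:

    /* Hypothetical loop matching the instruction mix above (rough
     * illustration only): per iteration the compiler emits register
     * compares/adds, two memory operands, one multiply, two indexed
     * addressing modes (data[i], weights[i]) and one taken branch. */
    long weighted_sum(long *data, short *weights, int n)
    {
        long sum = 0;
        for (int i = 0; i < n; i++)
            sum += data[i] * weights[i];
        return sum;
    }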
Now look at how many clocks a 68000 will need for this:
4 reg x 6 = 24
2 mem x 18 = 36
1 mul x 40 = 40
2 ind x 18 = 36
1 bra x 10 = 10
------
146 clocks
Now look at how many clocks a 68020 will need for this:
4 reg x 2 = 8
2 mem x 6 = 12
1 mul x 28 = 28
2 ind x 9 = 18
1 bra x 6 = 6
------
72 clocks
Now look at how many clocks a 68060 will need for this:
4 reg x 1 = 4
2 mem x 1 = 2
1 mul x 2 = 2
2 ind x 1 = 2
1 bra x 0 = 0
------
10 clocks => 7 clocks
Yes, the 68060 can execute every instruction in one clock (the multiplication takes 2).
The 68060 can do loops for free (taken branch = 0 clocks).
And the 68060 has two instruction units, allowing it to do two instructions per clock! Depending on your code structure, the 10 instructions in the example loop could be folded down to 4.5 clocks in the best case.
For the sake of argument let's say the 68060 needs 7 clocks.
Looking at these numbers will give us a better feeling for the CPUs.
Now compare the CPUs at their usual clock rates (clock rate divided by clocks per loop = loop iterations per second):
68000 @ 7.09 MHz = 48K loops/s
68020 @ 14.2 MHz = 196K loops/s
68060 @ 50.0 MHz = 7142K loops/s
In other words:
The A1200 (14 MHz 68020) is 4.0 times faster than a 68000.
A 50 MHz 68060 is 148 times faster than a 68000.
A 50 MHz 68060 is 36 times faster than a 68020.
The above example is a realistic code sequence.
It shows us the net CPU performance.
Now, the behavior of the Coldfire V5 is very much comparable to a 68060, but higher clocked.
In other words, (our assumed) 500 MHz Coldfire AMIGA has the CPU power to be 1480 times faster than an A500 (68000) with fastmem. Yes, that's over one thousand four hundred times!
This single-chip 500 MHz AMIGA likewise has the CPU power to be 360 times faster than an A1200 (68020) with fastmem. Yes, that's over three hundred times!
Of course this is net CPU power.
Considering cache misses and memory latency, the effective gained performance is of course a bit lower.
The point that we are trying to make here is:
If you ran an application on an original Amiga, its speed was limited by the GFX subsystem (blitter) and by the CPU power.
As the SuperAGA blitter is over 100 times faster - there is no GFX speed limit anymore!
The Coldfire is so much faster that even in full emulation mode you have many times the CPU power of the A1200.
The Coldfire is, net, 360 times faster than the A1200.
If FrenchShark's code divides this by 20, it's still over 10 times faster than the A1200 was.
New applications with "Coldfire-clean" 68k code will be able to leverage the full performance of the Coldfire.
In other words, this single-chip Amiga will then be 10 times faster than the fastest Cyberstorm was.
I don't know about you, but for me this is enough!
If you forced me to add something to this chip, then I would add a programmable DSP into the chipset.
Something like the AXE of the new e300 Freescale chip.
Such a SuperCopper could be used to decode MP4, MP3, DivX etc. for free.
Then this one-chip Amiga could do all that I need.
-
Some of these "so called" technical discussions don't half make me laugh.
-
If someone is serious about doing Coldfire development:
There is a limited number of sponsored (free) Coldfire V4 Development systems available on http://www.powerdeveloper.org/
If you propose a sensible project there is a good chance that you get a free board.
I assume that I'll get my board early next week. :-)
-
I really don't get the obsession with the Coldfire! If you want a small core to use for emulating a 68k... either design one yourself, I made a big post about this in a previous thread... or licence an ARM or MIPS, both of which are smaller, better supported and preferable to the hacked-up mess of a CPU that the Coldfire is...
In fact I would go as far as to say the MIPS is the better choice, since it has more registers than the 68k, which is a really good idea....
-
Here is a good thread to read:
http://www.embeddedrelated.com/usenet/embedded/show/74090-1.php
Nice little quote:
Coldfire looks not so exciting, since it's based on the CISC m68k, though
I gather it's got a cleaned-up instruction set. The AVR32 looks really
cool, like it is designed to be a do-everything architecture... good code
density, DSP capabilities, SIMD capabilities, MMU, Java, wow. It looks
like Atmel is aiming to really shake up the 32-bit embedded market.
> Few these days will care about assembler level code, so choose the best
> uC for the task you have.
>
> Debug support, and tools, will start to matter in many design starts.
Yeah, the thing that's appealing about MIPS and ARM to me is that they are
extremely well-supported by Linux and the GCC toolchain. Also, MIPS is
even cooler cause there are free Verilog implementations to play around
with!
In summary, it would seem the MIPS core is ideal for our task... Verilog cores are free, it's well supported, its design is suited to register-based tasks like emulation... it's small...
-
bloodline wrote:
I really don't get the obsession with the Coldfire! If you want a small core to use for emulating a 68k... either design one yourself, I made a big post about this in a previous thread... or licence an ARM or MIPS, both of which are smaller, better supported and preferable to the hacked-up mess of a CPU that the Coldfire is...
In fact I would go as far as to say the MIPS is the better choice, since it has more registers than the 68k, which is a really good idea....
Come on Bloodline,
you can create a higher quality post than this, can't you?
hacked-up mess of a CPU that the Coldfire
WTF? The Coldfire is a very logical, clean design.
Design one yourself
Very thoughtless proposal.
How many people do you know that can design a fully-fledged CPU like the Coldfire, and can do it cheaper than the core is at Freescale?
MIPS is the better choice
On what experience do you base your claim?
Have you ever developed for MIPS?
-
biggun wrote:
bloodline wrote:
I really don't get the obsession with the Coldfire! If you want a small core to use for emulating a 68k... either design one yourself, I made a big post about this in a previous thread... or licence an ARM or MIPS, both of which are smaller, better supported and preferable to the hacked-up mess of a CPU that the Coldfire is...
In fact I would go as far as to say the MIPS is the better choice, since it has more registers than the 68k, which is a really good idea....
Come on Bloodline,
you can create a higher quality post than this, can't you?
Apparently not... :-(
hacked-up mess of a CPU that the Coldfire
WTF? The Coldfire is a very logical, clean design.
The Coldfire is an interesting design, for sure! But if we are talking about a small, efficient CPU core for an ASIC/FPGA that is to be used for emulating a 68k... then the Coldfire offers us nothing... Unlike the other cores I suggested.
Design one yourself
Very thoughtless proposal.
How many people do you know that can design a fully-fledged CPU like the Coldfire, and can do it cheaper than the core is at Freescale?
There are websites full of people's CPU experiments on FPGAs... If we want a CPU that is specifically for emulating a 68k, then we could probably design a much better one than the Coldfire.
MIPS is the better choice
On what experience do you base your claim?
Only on what I've read... I am a big fan of the MIPS design, I love its simplicity.
Have you ever developed for MIPS?
That's true, I have never developed for MIPS... but I wrote a VM (as a DSP plugin engine) that ended up looking very much like a MIPS, which is what led me to read up about the architecture... and I like the design choices made.
-
Someone had better start stripping down that emulator then... :-)
-
The Coldfire is an interesting design, for sure! But if we are talking about a small, efficient CPU core for an ASIC/FPGA that is to be used for emulating a 68k...
then the Coldfire offers us nothing...
You are aware that you can run 68k code on the Coldfire, aren't you?
Yes, the Coldfire does not implement ALL 68k instructions natively, but it implements many 68k instructions natively.
The other day I ran an old AMIGA packer with the Coldfire library and, funnily enough, there were only 4-5 instructions in the whole binary which were not Coldfire-native.
Maybe this was a lucky example, but it shows that you do not need to emulate every instruction.
Depending on your application, the Coldfire can get away with running 90% of the instructions natively.
Cheers
-
biggun wrote:
The Coldfire is an interesting design, for sure! But if we are talking about a small, efficient CPU core for an ASIC/FPGA that is to be used for emulating a 68k...
then the Coldfire offers us nothing...
You are aware that you can run 68k code on the Coldfire, aren't you?
I've never had the chance to develop for a Coldfire so I can't say for sure just how 68k compatible it is. I have read the developer documents though.
Yes, the Coldfire does not implement ALL 68k instructions natively, but it implements many 68k instructions natively.
Many... but the 68k offers something like 1500 possible opcode/operand combinations... The Coldfire seems to lack a significant number of instructions (not a problem, as they can be trapped, but a speed penalty nonetheless), and it's missing a lot of addressing modes, a big problem with a CISC design like the 68k (still, they can be trapped etc...)... now, the two big show-stoppers for me are the instructions that are functionally different, which would require a proper emulator to correct... and the final thing that puts me off the Coldfire... the supervisor mode is totally different, and the only way around that is to build a new AmigaOS... or preferably use AROS, if we can get some more 68k devs on board.
Until I get a chance to play with one, I can't be convinced from the documentation that the Coldfire is good for our use.
The other day I ran an old AMIGA packer with the Coldfire library and, funnily enough, there were only 4-5 instructions in the whole binary which were not Coldfire-native.
4 or 5? With something like a CPU we can't be vague... programs either work or they don't... there are no half measures.
Maybe this was a lucky example, but it shows that you do not need to emulate every instruction.
Depending on your application, the Coldfire can get away with running 90% of the instructions natively.
But HOW do you know which instructions you need to emulate without a full Emulator?
-
biggun wrote:
Depending on your application, the Coldfire can get away with running 90% of the instructions natively.
Which is why Coldfire is the best upgrade for 68k architecture, and if it works, would be a good choice for the Natami.
biggun wrote:
You can buy the "source" of the Coldfire for an affordable sum. This means that you can "bake" your own Coldfire including AGA/SuperAGA in one chip.
If you can "bake" your own Coldfire, would it be possible to fix the few misbehaving instructions to make a fully 68k-compatible custom Coldfire CPU? I'm assuming the fact that one CPU uses a 16-bit architecture and the other a 32-bit architecture doesn't matter, as the vast majority of instructions already work perfectly.
-
HenryCase wrote:
biggun wrote:
Depending on your application, the Coldfire can get away with running 90% of the instructions natively.
Which is why Coldfire is the best upgrade for 68k architecture, and if it works, would be a good choice for the Natami.
Sure... but only if we had our 68k source code... With the functionally different instructions and totally different supervisor mode... you may as well use a CPU that has nothing to do with the 68k... but is better supported...
biggun wrote:
You can buy the "source" of the Coldfire for an affordable sum. This means that you can "bake" your own Coldfire including AGA/SuperAGA in one chip.
If you can "bake" your own Coldfire, would it be possible to fix the few misbehaving instructions to make a fully 68k-compatible custom Coldfire CPU?
You buy a licence to use the core, not modify it. You would need the development documents too... and there are reasons why the instructions work differently, it's to get the speed up!!!
I'm assuming the fact that one CPU uses a 16-bit architecture and the other a 32-bit architecture doesn't matter, as the vast majority of instructions already work perfectly.
Err... the Coldfire is 32-bit, just like the 68k... :-?
-
bloodline wrote:
You buy a licence to use the core, not modify it.
Shame.
bloodline wrote:
You would need the development documents too... and there are reasons why the instructions work differently, it's to get the speed up!!!
Surely a hardware-implemented function would be faster than an emulated one?
bloodline wrote:
I'm assuming the fact that one CPU uses a 16-bit architecture and the other a 32-bit architecture doesn't matter, as the vast majority of instructions already work perfectly.
Err... the Coldfire is 32-bit, just like the 68k... :-?
I thought the 68k family of CPUs was 16-bit; classic Amigas were always referred to as 16-bit computers, right?
-
HenryCase wrote:
bloodline wrote:
You buy a licence to use the core, not modify it.
Shame.
bloodline wrote:
You would need the development documents too... and there are reasons why the instructions work differently, it's to get the speed up!!!
Surely a hardware-implemented function would be faster than an emulated one?
The Coldfire engineers removed all the bits of the 68k that slowed the design down... if you put them back in, you slow the design down.
bloodline wrote:
I'm assuming the fact that one CPU uses a 16-bit architecture and the other a 32-bit architecture doesn't matter, as the vast majority of instructions already work perfectly.
Err... the Coldfire is 32-bit, just like the 68k... :-?
I thought the 68k family of CPUs was 16-bit; classic Amigas were always referred to as 16-bit computers, right?
Just the external data bus of the 68000... nothing to do with the architecture of the CPU.
-
You need a binary scanner of some form: it scans the binary before you run it and adds in routines to replace the unsupported instructions.
It'd be a lot easier than writing a full emulator or JIT engine for a different CPU.
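Something like this, perhaps (a hypothetical C sketch, with the decoding grossly simplified for illustration):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical sketch of a pre-run binary scanner (illustration
     * only). A real scanner must decode each variable-length 68k
     * instruction to find the next one; here we only show the idea of
     * spotting opcodes the Coldfire cannot run natively. MULS.L/MULU.L
     * encode as 0100 1100 00xx xxxx, i.e. 0x4C00..0x4C3F. */
    static int is_unsupported(uint16_t op)
    {
        return (op & 0xFFC0) == 0x4C00;     /* 32-bit multiply family */
    }

    void scan(const uint16_t *code, size_t words)
    {
        for (size_t i = 0; i < words; i++)  /* naive: one word at a time */
            if (is_unsupported(code[i]))
                printf("patch needed at word %zu (opcode %04X)\n",
                       i, (unsigned)code[i]);
    }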
-
minator wrote:
You need a binary scanner of some form: it scans the binary before you run it and adds in routines to replace the unsupported instructions.
It'd be a lot easier than writing a full emulator or JIT engine for a different CPU.
Such a solution might run faster than a full emulator... though what you are suggesting is just a JIT, that sometimes spits out the instructions unchanged...
-
though what you are suggesting is just a JIT, that sometimes spits out the instructions unchanged.
*cough* Dynamo-style JIT *cough* ;-)
Dynamo (a JIT made by Hewlett-Packard) demonstrates the amusing (and at first glance ludicrous) fact that a hotspot JIT can 'emulate' code targeting the same processor it is running on faster than the CPU can run that code natively.
The reason this is possible is down to the fact that at runtime you know more state information than you ever did at compile time. Consequently, a lot of if/else/switch/case/for/while etc. code ends up taking only one or two possible paths at runtime (compared to the many more paths that were possible at compile time), and unused code paths can be optimised away by the JIT.
The main overhead of any JIT system is the on-the-fly recompilation stage that's kicked off when the system encounters new code. Translating code for one CPU to another can be quite expensive where their architectures are very different. However, when most of your "recompilation" involves simply copying (rather than translating) the original code, that overhead is mitigated substantially.
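To make the copy-versus-translate point concrete, here is a hypothetical C sketch (invented for illustration, nothing like Dynamo's real internals): the translation loop degenerates into a plain copy whenever the source instruction runs natively on the target.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical translation loop (illustration only). Where the
     * source and target ISAs overlap, "recompiling" a word is just
     * copying it; only unsupported opcodes pay the translation cost. */
    static int runs_natively(uint16_t op)
    {
        return (op & 0xFFC0) != 0x4C00;     /* e.g. 32-bit multiplies */
    }

    static size_t emit_trap(uint16_t *dst, uint16_t op)
    {
        dst[0] = 0x4E40;  /* trap #0: enter the emulation handler */
        dst[1] = op;      /* keep the original opcode for the handler */
        return 2;
    }

    size_t translate_block(const uint16_t *src, uint16_t *cache, size_t n)
    {
        size_t out = 0;
        for (size_t i = 0; i < n; i++) {
            if (runs_natively(src[i]))
                cache[out++] = src[i];      /* copy verbatim: cheap */
            else
                out += emit_trap(&cache[out], src[i]);
        }
        return out;
    }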
Using such a mechanism, I expect a current generation coldfire core could run 680x0 code extremely well and without any of the performance problems trapping individual unimplemented instructions cause.
If only there were 24 more hours in my day I'd look at it.
-
Karlos wrote:
though what you are suggesting is just a JIT, that sometimes spits out the instructions unchanged.
*cough* Dynamo-style JIT *cough* ;-)
Dynamo (a JIT made by Hewlett-Packard) demonstrates the amusing (and at first glance ludicrous) fact that a hotspot JIT can 'emulate' code targeting the same processor it is running on faster than the CPU can run that code natively.
The reason this is possible is down to the fact that at runtime you know more state information than you ever did at compile time. Consequently, a lot of if/else/switch/case/for/while etc. code ends up taking only one or two possible paths at runtime (compared to the many more paths that were possible at compile time), and unused code paths can be optimised away by the JIT.
The main overhead of any JIT system is the on-the-fly recompilation stage that's kicked off when the system encounters new code. Translating code for one CPU to another can be quite expensive where their architectures are very different. However, when most of your "recompilation" involves simply copying (rather than translating) the original code, that overhead is mitigated substantially.
Using such a mechanism, I expect a current generation coldfire core could run 680x0 code extremely well and without any of the performance problems trapping individual unimplemented instructions cause.
If only there were 24 more hours in my day I'd look at it.
Dynamo is a bit more than a JIT :-) since it's more like the front end of a CPU like the Athlon, done in software! Which is out of the scope of this project... especially while we don't know the implementation details of the Coldfire.
If we are to use the Coldfire, then the JIT is the only way to go... even though I see this as a good opportunity to rid ourselves of the 68k (no matter how much I like it).
-
Semantics, my dear fellow :-D. Dynamo has been described by its creators as a hotspot JIT (like most other JIT implementations it also allows non-critical code to run through in interpreted mode). It dynamically recompiles critical sections to eliminate dead code branches, early returns etc. It simply happens to be the case that the target CPU is the same class as the source.
What you are alluding to are the deep implementation details of how it works. That it is similar to the AthlonXP's instruction queue/decoder doesn't mean it is fundamentally different from any existing optimizing JIT, as most of them employ the same sorts of code pruning.
-
But HOW do you know which instructions you need to emulate without a full Emulator?
Bloodline, you are funny :-)
You are not too shy to give advice on which CPU to use and to rant about the Coldfire, but are you sure that you understood the Coldfire correctly?
No offence, but the risk that a 68k program runs into problems on the Coldfire is very, very small.
If you reread the Coldfire manual, you will realize that there are nearly no 68000 instructions that are executed differently and could cause a problem.
A good starting point is:
http://www.microapl.co.uk/Porting/ColdFire/Download/pa68kcf.pdf
Cheers
-
biggun wrote:
But HOW do you know which instructions you need to emulate without a full Emulator?
Bloodline, you are funny :-)
I like to think so! :-)
You are not too shy to give advice on which CPU to use and to rant about the Coldfire, but are you sure that you understood the Coldfire correctly?
I may well rant, but I have stated clearly that I've not actually used a coldfire, ever.
No offence, but the risk that a 68k program runs into problems on the Coldfire is very, very small.
Any risk is too much... but that's not the point, you don't have any detailed stats yet... and I don't intend to test it out myself. Until it's been tested we can't know.
There is however sufficient evidence that the Coldfire is unsuitable. Number one: no Coldfire boards exist for the Amiga, despite it being nearly a decade since the Coldfire was released. Number two: from what I've read, it doesn't seem that 68k object-code compatible...
If you reread the Coldfire manual, you will realize that there are nearly no 68000 instructions that are executed differently and could cause a problem.
A good starting point is:
http://www.microapl.co.uk/Porting/ColdFire/Download/pa68kcf.pdf
Cheers
Yes, I've read it :-)
-
Well, FWIW, I don't think a straightforward trap-and-emulate based amiga accelerator mechanism would work that well, otherwise we'd have seen one by now.
I seem to recall, but I may be wrong, the problem is that certain opcodes actually behave differently to the same operations on m68k. That is to say, they are implemented but operate slightly differently to the 680x0.
I mean an instruction that works but works differently to what you expect is probably worse than one that isn't implemented at all as you can't really trap it in the first place.
-
Karlos wrote:
I seem to recall, but I may be wrong, the problem is that certain opcodes actually behave differently to the same operations on m68k. That is to say, they are implemented but operate slightly differently to the 680x0.
Can you give a real example, or is this hearsay from the rumor mill?
-
biggun wrote:
Karlos wrote:
I seem to recall, but I may be wrong, the problem is that certain opcodes actually behave differently to the same operations on m68k. That is to say, they are implemented but operate slightly differently to the 680x0.
Can you give a real example, or is this hearsay from the rumor mill?
Well, for one, I seem to recall that MULS and MULU fail to set the overflow bit of the condition code register.
If your 68k code looks at the CCR to see if an overflow occurred after a multiplication and performs some specific action, it isn't going to behave the same on both CPUs under all circumstances.
There were a few other nuances like this, but I'd need to check and don't have time.
-
Personally, I'd rather see a Core Duo accelerator with a RAM boost and a JIT emulator, a la Amithlon. With the custom chip code removed, that sh!t flies. *THEN* we could also do away with the AROS 68K and just use the x86 code, with minimal updates (I assume here - I'm a graphic designer, I'm not a coder).
Wouldn't it be more prudent to build a new accelerator card that *WASN'T* a 68k, but something more modern? Seriously. All joking aside, our Miggys aren't getting any younger and there's no way in hell most of us could afford a "new" Amiga anyway. I like the Clone A and I really like the NatAmi, but I want something faster than a 68k. I understand that these engineers are working in their own time, doing something they love and want to do, but I can't afford the prices of all this new hardware for a hobby machine. That's why instead of buying a MiniMig, I just got a (free, no less) A500 that meets all my retro needs, so far.
That's just me, though.
Someday, I'm going to win the lotto and when I do, it's on. :lol:
-
Well, for one, I seem to recall that MULS and MULU fail to set the overflow bit of the condition code register.
Correct, but the 68Klib provided free by Freescale can emulate these instructions: you simply have to add an instruction before it to trigger the CPU's invalid instruction trap, and then the emulator will give you fully 68K-compatible MULS and MULU. (The other affected instructions are the DIV ones, I think.)
This wouldn't need to be done at compile time; a program could be written to insert the trap code into a binary file at the correct places.
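Roughly what the fix-up has to compute once the trap fires - a hypothetical C sketch, not the actual 68Klib code:

    #include <stdint.h>

    /* Hypothetical MULS.L fix-up (illustration only, not Freescale's
     * 68Klib). The 68020+ muls.l sets V when the full 64-bit product
     * does not fit in 32 bits; that is the bit the Coldfire drops. */
    typedef struct { uint32_t d[8]; uint8_t ccr; } regs_t;

    #define CCR_C 0x01  /* carry    */
    #define CCR_V 0x02  /* overflow */
    #define CCR_Z 0x04  /* zero     */
    #define CCR_N 0x08  /* negative */

    void emulate_muls_l(regs_t *r, int src, int dst)
    {
        int64_t full = (int64_t)(int32_t)r->d[src] *
                       (int64_t)(int32_t)r->d[dst];
        int32_t res  = (int32_t)full;
        r->d[dst] = (uint32_t)res;
        r->ccr &= ~(CCR_C | CCR_V | CCR_Z | CCR_N);
        if (full != (int64_t)res) r->ccr |= CCR_V;  /* the missing bit */
        if (res == 0)             r->ccr |= CCR_Z;
        if (res < 0)              r->ccr |= CCR_N;
    }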
/me goes back to watching all the Coldfire threads
-
Oli_hd wrote:
Well, for one, I seem to recall that MULS and MULU fail to set the overflow bit of the condition code register.
Correct, but the 68Klib provided free by Freescale can emulate these instructions: you simply have to add an instruction before it to trigger the CPU's invalid instruction trap, and then the emulator will give you fully 68K-compatible MULS and MULU. (The other affected instructions are the DIV ones, I think.)
This wouldn't need to be done at compile time; a program could be written to insert the trap code into a binary file at the correct places.
/me goes back to watching all the Coldfire threads
Doesn't that change the length of the instruction stream? If so, presumably you then need to update all the branches too?
-
Oli_hd wrote:
Well, for one, I seem to recall that MULS and MULU fail to set the overflow bit of the condition code register.
Correct, but the 68Klib provided free by Freescale can emulate these instructions: you simply have to add an instruction before it to trigger the CPU's invalid instruction trap, and then the emulator will give you fully 68K-compatible MULS and MULU. (The other affected instructions are the DIV ones, I think.)
This wouldn't need to be done at compile time; a program could be written to insert the trap code into a binary file at the correct places.
And recalculate the offsets...? What about checksums? I think this idea is really difficult!
/me goes back to watching all the Coldfire threads
You have the most experience with the CF on this board, you should say more!!
-
@Methuselas
Thomas intends to make information on the CPU slot available to third parties, so they can create their own accelerator cards.
-
bloodline wrote:
You have the most experience with the CF on this board, you should say more!!
Quite.
-
Karlos wrote:
Well, for one, I seem to recall that MULS and MULU fail to set the overflow bit of the condition code register.
To be precise here:
The 68000 instruction MULS.W NEVER set the overflow bit on the 68k.
This instruction works 100% the same on the Coldfire.
The instruction that you are referring to is MULS.L, and this instruction was 68020-only!
Programs compiled for the 68000 could never include this instruction.
So if you have an A500 program, this issue can never show up.
BTW, if muls.l does set the overflow, then the calculation is 100% wrong anyway and there is NO way of recovering from it!
The only way to correct this is to use the 64-bit MUL instruction or a proper multiplication routine.
The proper usage for this instruction is to use it only when your values will not overflow. And in that case the Coldfire version will work 100% the same.
Please remember that the issue you are referring to does not exist for A500 programs.
-
biggun wrote:
Karlos wrote:
Well, for one, I seem to recall that MULS and MULU fail to set the overflow bit of the condition code register.
To be precise here:
The 68000 instruction MULS.W NEVER set the overflow bit on the 68k.
This instruction works 100% the same on the Coldfire.
The instruction that you are referring to is MULS.L, and this instruction was 68020-only!
I have a lot of 020+ software... since I had my A1200 much longer than my A500, if I ever had the option I went for the 020+ version of a program.
Programs compiled for the 68000 could never include this instruction.
So if you have an A500 program, this issue can never show up.
I cannot fault your logic here... but your design is supposed to be SuperAGA... If we were discussing the MiniMig, then I think I would agree with you that the Coldfire might be a good solution.
BTW, if muls.l does set the overflow, then the calculation is 100% wrong anyway and there is NO way of recovering from it!
The only way to correct this is to use the 64-bit MUL instruction or a proper multiplication routine.
The proper usage for this instruction is to use it only when your values will not overflow. And in that case the Coldfire version will work 100% the same.
Please remember that the issue you are referring to does not exist for A500 programs.
But the NatAmi is intended to supersede the A1200/A4000, not the A500...
-
biggun wrote:
BTW, muls.l would calculate wrong if you get the overflow, and there is NO way of recovering from it besides using the 64-bit MUL version or a proper multiplication routine.
In other words, if your code can overflow, you will never use this instruction in the first place.
I think you'll find it's used in most 68020+ compiler-generated code where the effects of overflow aren't really defined by the language standard.
In hand-coded ASM you would still use it, for example if you are writing saturation-based fixed-point arithmetic routines for some visual or audio application. You'd optionally fill the result with your maximum fixed-point value on overflow.
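For instance (a hypothetical example of my own), a saturating fixed-point multiply leans on V exactly like this; the C below makes the same overflow test explicit:

    #include <stdint.h>

    /* Hypothetical example of the pattern described above. Hand-coded
     * 68k saturation arithmetic typically reads:
     *     muls.l d1,d0
     *     bvc.s  .ok            ; V clear -> product fitted in 32 bits
     *     move.l #$7fffffff,d0  ; else clamp to the maximum
     * .ok:
     * On a Coldfire that never sets V, the clamp silently never fires. */
    int32_t sat_mul(int32_t a, int32_t b)
    {
        int64_t full = (int64_t)a * (int64_t)b;
        if (full > INT32_MAX) return INT32_MAX;  /* saturate high */
        if (full < INT32_MIN) return INT32_MIN;  /* saturate low  */
        return (int32_t)full;                    /* V would be clear */
    }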
The issue that you are referring to does not exist for A500 programs.
Perhaps, but it is possibly not the only such difference. Anyway, I would have thought that the 68020 would be the base level for any 'revived' m68k Amiga platform (other than the MiniMig)? After all, you need 68020 compatibility to be able to run OS3.5/3.9, right?
Isn't the NatAmi going for a minimum of AGA compatibility? I'm unaware of any working plain 68000+AGA hardware combination.
-
Karlos wrote:
biggun wrote:
BTW, muls.l would calculate wrong if you get the overflow, and there is NO way of recovering from it besides using the 64-bit MUL version or a proper multiplication routine.
In other words, if your code can overflow, you will never use this instruction in the first place.
I think you'll find it's used in most 68020+ compiler-generated code where the effects of overflow aren't really defined by the language standard.
But in this case the MUL.L of the Coldfire will behave correctly too.
It's clear that there can be a certain number of cases where this behavior was expected and where the saturation will not hit.
The fact is that A500 applications are 100% unaffected by this.
And if you think about it, you will agree that over 95% of all 68020 games/applications are certainly not using this affected sequence either.
So what is the effect: 100% of A500 games unaffected.
And if 1 out of 100 A1200 applications is affected, how terrible is this?
So what is the real effect?
A very limited number of tools might become buggy.
But 99% of AMIGA applications will run correctly on the Coldfire.
This is how it really looks.
That some people state the Coldfire cannot run 68k code at all is certainly a 100% overstatement.
-
Methuselas wrote:
Personally, I'd rather see a Core Duo accelerator with a RAM boost and a JIT emulator, a la Amithlon. With the custom chip code removed, that sh!t flies. *THEN* we could also do away with the AROS 68K and just use the x86 code, with minimal updates (I assume here - I'm a graphic designer, I'm not a coder).
Now just bear with me a minute!
Can't we just buy something off the shelf like this....
(http://www.gigabyte-usa.com/FileList/Image/motherboard_productimage_ga-x38-dq6_big.jpg)
It takes a Core 2 Duo processor.
Now, what I'm talking about is that the ONLY things on this board that change are the BIOS chips. Basically, the BIOS ROMs are modified to include an Amiga emulation layer. Sort of like Amithlon included in the BIOS. Or UAE included in the BIOS!
BAM!! A new Amiga motherboard!
That's all I want. When I power it on, it comes up to a Kickstart screen.
-
AmigaHeretic wrote:
Methuselas wrote:
Personally, I'd rather see a Core Duo accelerator with a RAM boost and a JIT emulator, a la Amithlon. With the custom chip code removed, that sh!t flies. *THEN* we could also do away with the AROS 68K and just use the x86 code, with minimal updates (I assume here - I'm a graphic designer, I'm not a coder).
Now just bear with me a minute!
Can't we just buy something off the shelf like this....
(http://www.gigabyte-usa.com/FileList/Image/motherboard_productimage_ga-x38-dq6_big.jpg)
It takes a Core 2 Duo processor.
Now, what I'm talking about is that the ONLY things on this board that change are the BIOS chips. Basically, the BIOS ROMs are modified to include an Amiga emulation layer. Sort of like Amithlon included in the BIOS. Or UAE included in the BIOS!
BAM!! A new Amiga motherboard!
That's all I want. When I power it on, it comes up to a Kickstart screen.
Sure, AROS with this:
http://openbios.info/Welcome_to_OpenBIOS
http://www.coreboot.org/Welcome_to_coreboot
:-D
-
bloodline wrote:
HenryCase wrote:
bloodline wrote:
You buy a licence to use the core, not modify it.
Shame.
I have been thinking about this, and in reality it makes little difference if the Coldfire core has to remain untouched. If you're building a custom Coldfire SoC, you can design logic around the core that fixes certain instructions (instructions can reach custom logic before core logic). Since this logic will run at the full core speed the performance hit is negligible.
So why can't we have a fully 68k compatible Coldfire again? :-D
-
HenryCase wrote:
bloodline wrote:
HenryCase wrote:
bloodline wrote:
You buy a licence to use the core, not modify it.
Shame.
I have been thinking about this, and in reality it makes little difference if the Coldfire core has to remain untouched. If you're building a custom Coldfire SoC, you can design logic around the core that fixes certain instructions (instructions can reach custom logic before core logic). Since this logic will run at the full core speed the performance hit is negligible.
So why can't we have a fully 68k compatible Coldfire again? :-D
Really you would be wasting silicon here... you are suggesting a JIT in Hardware... Keep the JIT in software :-)
-
So what is the real effect?
A very limited number of tools might become buggy.
But 99% of AMIGA applications will run correctly on the Coldfire.
This is how it really looks.
That some people state the Coldfire cannot run 68k code at all is certainly a 100% overstatement.
I don't think anybody is saying the Coldfire can't run 68K code; I think they are saying AmigaOS and applications may not work readily on the Coldfire. In addition to the behavioural differences mentioned, how many byte- and word-sized logic/arithmetic operations are there in typical Amiga 68K object code that are not directly supported on the Coldfire (existing only in a long version)? It may be the case that there will be more trap-and-emulate overhead than you think.
Remember, some applications using 64-bit integer multiplication on the 020/030/040 ran like treacle on the first 060 cards that relied on trap-emulate (anybody remember Breathless)?
I'm not saying that you can't run a Coldfire-based Amiga system, but I really do think the difficulties are more than you seem to admit. There are a lot of things to consider beyond basic instruction implementation counts.
So far we've only looked at the user mode. Coldfire supervisor mode is a bit different and, if I recall clearly, it doesn't have a separate supervisor stack pointer. This might not sound like a big deal, but it does have very real implications.
Any code that writes local data below the current stack depth (e.g. using negative offsets from a7), whilst working perfectly on a 680x0 Amiga, risks having that data trashed by an interrupt on a Coldfire system. This might sound unlikely, but in fact code that has been optimised not to use stack frames within a function may well assume it can safely use addressing modes such as -4(a7) etc. to hold local variables (if it doesn't need to immediately call another function) rather than decrementing a7 first and using positive offsets for them, thus typically saving the instructions needed to modify a7.
Can you say with certainty that the 100% of A500 applications you refer to as being compatible aren't doing anything like this?
-
bloodline wrote:
HenryCase wrote:
bloodline wrote:
HenryCase wrote:
bloodline wrote:
You buy a licence to use the core, not modify it.
Shame.
I have been thinking about this, and in reality it makes little difference if the Coldfire core has to remain untouched. If you're building a custom Coldfire SoC, you can design logic around the core that fixes certain instructions (instructions can reach custom logic before core logic). Since this logic will run at the full core speed the performance hit is negligible.
So why can't we have a fully 68k compatible Coldfire again? :-D
Really you would be wasting silicon here... you are suggesting a JIT in Hardware... Keep the JIT in software :-)
Hardly wasting silicon. What you'd be doing (in effect) is adding to the table of 68k opcodes that the Coldfire already contains. So with the mul.l instruction, for instance, every time the mul.l code comes in (0x0007 in hex?) you perform actions that provide a result identical to a 68k CPU's. That wouldn't take up a lot of chip space, would it?
You just don't want to admit I'm on to something. :-P
-
HenryCase wrote:
bloodline wrote:
HenryCase wrote:
bloodline wrote:
HenryCase wrote:
bloodline wrote:
You buy a licence to use the core, not modify it.
Shame.
I have been thinking about this, and in reality it makes little difference if the Coldfire core has to remain untouched. If you're building a custom Coldfire SoC, you can design logic around the core that fixes certain instructions (instructions can reach custom logic before core logic). Since this logic will run at the full core speed the performance hit is negligible.
So why can't we have a fully 68k compatible Coldfire again? :-D
Really you would be wasting silicon here... you are suggesting a JIT in Hardware... Keep the JIT in software :-)
Hardly wasting silicon. What you'd be doing (in effect) is adding to the table of 68k opcodes that the Coldfire already contains. So with the mul.l instruction, for instance, every time the mul.l code comes in (0x0007 in hex?) you perform actions that provide a result identical to a 68k CPU's. That wouldn't take up a lot of chip space, would it?
You just don't want to admit I'm on to something. :-P
You wish! :-)
How does this hardware know what is Code and what is Data? It can sit there as a parasite on the Data Bus, but it won't know what it's looking at...
-
bloodline wrote:
How does this hardware know what is Code and what is Data? It can sit there as a parasite on the Data Bus, but it won't know what it's looking at...
Surely in 68k ASM the first "half" of code is the instruction and the second "half" of the code is the data? As the code would be of fixed length (16-bit? 32-bit?) the 'parasite' would know exactly where to look for an instruction, right?
-
HenryCase wrote:
bloodline wrote:
How does this hardware know what is Code and what is Data? It can sit there as a parasite on the Data Bus, but it won't know what it's looking at...
Surely in 68k ASM the first "half" of code is the instruction and the second "half" of the code is the data? As the code would be of fixed length (16-bit? 32-bit?) the 'parasite' would know exactly where to look for an instruction, right?
Nope. Not the instruction format... the actual information traversing the Data Bus could be Code or Data... Only the CPU knows what the information actually is.
-Edit- And the 68k has variable-length instructions... The parasite doesn't have a hope in hell's chance of ever correctly identifying the Code.
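To illustrate with a hypothetical example of my own: the very same words are code or data depending purely on context, so a bus snooper has nothing to go on.

    #include <stdint.h>

    /* Hypothetical illustration: 0x4E75 is the RTS opcode, yet here it
     * is a perfectly ordinary data constant. An instruction stream
     * containing the same words, e.g.
     *     move.l (a0),d0   ; 0x2010
     *     rts              ; 0x4E75
     * is code. Only the CPU's own fetch context tells them apart; a
     * snooper on the bus sees identical words either way. */
    const uint16_t table[] = { 0x4E75, 0x2010, 0x0007 };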
-
Karlos wrote:
So far we've only looked at the user mode. Coldfire supervisor mode is a bit different and if I recall clearly, it doesn't have a separate supervisor stack pointer. This might not sound a big deal but it does have very real implications.
[...]
Can you say with certainty that the 100% of A500 applications you refer to as being compatible aren't doing anything like this?
This is no problem.
You are referring to the very first Coldfire versions.
The V4 and V5 Coldfire have a separate supervisor stack pointer.
-
bloodline wrote:
HenryCase wrote:
bloodline wrote:
How does this hardware know what is Code and what is Data? It can sit there as a parasite on the Data Bus, but it won't know what it's looking at...
Surely in 68k ASM the first "half" of code is the instruction and the second "half" of the code is the data? As the code would be of fixed length (16-bit? 32-bit?) the 'parasite' would know exactly where to look for an instruction, right?
Nope. Not the instruction format... the actual information traversing the Data Bus could be Code or Data... Only the CPU knows what the information actually is.
How does the CPU know what the information is? By knowing and following the instruction format, surely! :-D
-
biggun wrote:
This is no problem.
You are referring to the very first Coldfire versions.
The V4 and V5 Coldfire have a separate supervisor stack pointer.
That's good. What other incompatibilities do they address?
-
biggun wrote:
You are referring to the very first Coldfire versions.
The V4 and V5 Coldfire have a separate supervisor stack pointer.
@thread
You may want to look here:
http://www.elbox.com/faq_dragon.html
"Q: Why do you make a card with ColdFire processors only now? Motorola has been producing ColdFire processors for a long time...
A: Recently, Motorola has developed and produced several series of the ColdFire processors but none of them were compatible enough with 68k processors to be able to run AmigaOS3.x. This changed with appearance of the MCF54xx processors family. These are the first ColdFire processors based on the V4e core."
-
HenryCase wrote:
bloodline wrote:
HenryCase wrote:
bloodline wrote:
How does this hardware know what is Code and what is Data? It can sit there as a parasite on the Data Bus, but it won't know what it's looking at...
Surely in 68k ASM the first "half" of code is the instruction and the second "half" of the code is the data? As the code would be of fixed length (16-bit? 32-bit?) the 'parasite' would know exactly where to look for an instruction, right?
Nope. Not the instruction format... the actual information traversing the Data Bus could be Code or Data... Only the CPU knows what the information actually is.
How does the CPU know what the information is? By knowing and following the instruction format, surely! :-D
Hmmm... this is going to turn into CPU 101... The CPU makes requests to the Memory, sometimes those requests will be for an instruction, sometimes those requests will be for Data (as requested by an instruction).
Since the CPU made the request, it knows what will be delivered on the bus. Anything watching the bus won't know what is being transmitted.
-
bloodline wrote:
HenryCase wrote:
bloodline wrote:
How does this hardware know what is Code and what is Data? It can sit there as a parasite on the Data Bus, but it won't know what it's looking at...
Surely in 68k ASM the first "half" of code is the instruction and the second "half" of the code is the data? As the code would be of fixed length (16-bit? 32-bit?) the 'parasite' would know exactly where to look for an instruction, right?
Nope. Not the instruction format... the actual information traversing the Data Bus could be Code or Data... Only the CPU knows what the information actually is.
-Edit- And the 68k has variable-length instructions... The parasite doesn't have a hope in hell's chance of ever correctly identifying the Code.
The 68k "flags" instruction fetched on the debug pins. (Harvard Architecture).
So yes, you can from the outside distinguishe data from code fetched.
Saying that the idea of the paraside is not worth doing, as its much to complicate for the possible benefit.
If you really want to do a lot of work then you would rather alter the Coldfire core directly.
BTW, why did your claim that you are not allowed to change the Coldfire if you buy it?
I think the real story is that you would not want to change it - as its too much work.
The Coldfire is quite fast and powerful as it is.
-
@HenryCase
Drag(-)on, ey? Elbox don't seem to be in a rush to release it...
-
biggun wrote:
bloodline wrote:
HenryCase wrote:
bloodline wrote:
How does this hardware know what is Code and what is Data? It can sit there as a parasite on the Data Bus, but it won't know what it's looking at...
Surely in 68k ASM the first "half" of code is the instruction and the second "half" of the code is the data? As the code would be of fixed length (16-bit? 32-bit?) the 'parasite' would know exactly where to look for an instruction, right?
Nope. Not the instruction format... the actual information traversing the Data Bus could be Code or Data... Only the CPU knows what the information actually is.
-Edit- And the 68k has variable-length instructions... The parasite doesn't have a hope in hell's chance of ever correctly identifying the Code.
The 68k "flags" instruction fetches on the debug pins (Harvard architecture).
So yes, from the outside you can distinguish data fetches from code fetches.
Does the Coldfire have MM capabilities of that scale?
That said, the idea of the parasite is not worth doing.
BTW, why did you claim that you are not allowed to change the Coldfire if you buy it?
You know full well that when you licence a core you don't buy the right to modify it. That was the reason why the Amiga Team chose the PA-RISC, because HP did allow the licensee of the Core to modify it.
I think the real story is that you would not want to change it - as it's too much work.
I stated that in my original post on the subject.
The Coldfire is quite fast and powerful as it is.
We have yet to see proof of this!
-
OK... here is a CF emulator... let's see what it can do...
http://www.slicer.ca/coldfire/
-
biggun wrote:
The 68k "flags" instruction fetched on the debug pins. (Harvard Architecture).
ColdFire doesn't.
-
bloodline wrote:
Hmmm... this is going to turn into CPU 101...
I'm willing to learn. You never know, through my ignorance a good idea may emerge.
bloodline wrote:
The CPU makes requests to the Memory, sometimes those requests will be for an instruction, sometimes those requests will be for Data (as requested by an instruction).
Does the outgoing instruction know what it is requesting, or is it simply calling on a set memory space? Also, does memory get divided up into program space and data space or is it all jumbled up?
-
HenryCase wrote:
bloodline wrote:
Hmmm... this is going to turn into CPU 101...
I'm willing to learn. You never know, through my ignorance a good idea may emerge.
I used to think that... sadly, while ignorance used to allow abstract thinking, now it just causes you to fall into the same traps that people in the past fell into. It is always best to stand on the shoulders of giants :-)
bloodline wrote:
The CPU makes requests to the Memory, sometimes those requests will be for an instruction, sometimes those requests will be for Data (as requested by an instruction).
Does the outgoing instruction know what it is requesting, or is it simply calling on a set memory space? Also, does memory get divided up into program space and data space or is it all jumbled up?
???
There is no outgoing instruction. At boot, the CPU requests data from a predetermined (by the CPU manufacturer) location; this data will be the first instruction to be executed. Often this first instruction will be a jump instruction that tells the CPU where to fetch the next instruction from. From there it's up to the system designer what happens... the next instruction could tell the CPU to move data around the internal registers... or to add a register to a memory location, or to copy the data from a location to a register... only the CPU knows :-)
Memory on the Amiga is jumbled up; Code and Data are all over the place... the x86 has an MMU that marks memory pages as Code or Data... but then you would still have the problem of variable-length instructions.
-
bloodline wrote:
???
There is no outgoing instruction.
I worded that badly, I should have used 'outgoing signal' instead. What I'm trying to get at is does the CPU give clues about what it is trying to fetch?
If not outgoing signals then are there CPU pins you can monitor to get information on what the CPU wants to do with the information it gets next?
bloodline wrote:
Memory on the Amiga is jumbled up; Code and Data are all over the place... the x86 has an MMU that marks memory pages as Code or Data... but then you would still have the problem of variable-length instructions.
Is the variable-length instruction issue not something we can work around? Let's say the 'parasite' is looking for the instruction 0x000189. Whenever this code comes up, the 'parasite' looks at the code following it. If it follows the format it is expecting then it tries to use it; if not, then it lets the CPU try to handle it.
-
HenryCase wrote:
bloodline wrote:
???
There is no outgoing instruction.
I worded that badly, I should have used 'outgoing signal' instead. What I'm trying to get at is does the CPU give clues about what it is trying to fetch?
If not outgoing signals then are there CPU pins you can monitor to get information on what the CPU wants to do with the information it gets next?
The CPU is not required to let anyone else on the bus know what it is trying to fetch. Systems with an MMU (and, as Biggun states, Harvard architectures) do provide signals to let other bus users know what's going on; this is useful for multi-CPU systems where caches need to be kept coherent. The x86 provides this sort of data...
bloodline wrote:
Memory on the Amiga is jumbled up; Code and Data are all over the place... the x86 has an MMU that marks memory pages as Code or Data... but then you would still have the problem of variable-length instructions.
Is the variable-length instruction issue not something we can work around? Let's say the 'parasite' is looking for the instruction 0x000189. Whenever this code comes up, the 'parasite' looks at the code following it. If it follows the format it is expecting then it tries to use it; if not, then it lets the CPU try to handle it.
The problem here is that you would need to simulate the entire frontend of the CPU to follow what's going on... and even then, exceptions and interrupts will throw the whole thing out... I hope now you can see how difficult this idea would be!
Don't forget that an instruction is just a number, that number means something to the CPU, but only the CPU knows if that number is telling the CPU what to do next, or if the CPU needs that number for a calculation.
-
@bloodline
Well, I'm convinced now that the 'parasite' idea isn't worth pursuing; it's a little more difficult than I had hoped. Thanks for your help bloodline.
@all
Coldfire still shows promise IMO. Maybe we could do a bit of research into 68k-Coldfire compatibility. This should be a good place to start looking:
http://tinyurl.com/3exnxn
Oli_hd, please contribute to this discussion, your input will be very valuable here.
-
HenryCase wrote:
@bloodline
Well, I'm convinced now that the 'parasite' idea isn't worth pursuing; it's a little more difficult than I had hoped. Thanks for your help bloodline.
No worries, CPU design is a fun topic... I should say, was a fun topic... it's all very complex now... the Amiga was the last of the understandable architectures... now you need teams of people.
Coldfire still shows promise IMO. Maybe we could do a bit of research into 68k-Coldfire compatibility. This should be a good place to start looking:
http://tinyurl.com/3exnxn
Well, the V4 and V5 cores, with their separate supervisor stack, do seem to have what it takes to emulate the 68000 (not sure about the data-width issues, though)... but the instruction traps which will be required are very expensive in terms of CPU cycles... I really do think we are looking at a massive penalty with the Coldfire... until I see one in action I can't be 100% sure, but from what I've read I'm not hopeful.
Oli_hd, please contribute to this discussion, your input will be very valuable here.
Indeed...
-
You know full well that when you licence a core you don't buy the right to modify it.
That will depend on the exact terms of the license. These aren't exactly shrink-wrap licenses; they are probably specific to each licensee.
E.g. DEC, the designers of the StrongARM, had an ARM architectural license that allowed them to design their own processor from scratch.
-
minator wrote:
You know full well that when you licence a core you don't buy the right to modify it.
That will depend on the exact terms of the license. These aren't exactly shrink wrap licenses, probably specific to each licensee.
e.g. DEC, the designers of StrongARM had an ARM architectural license, that allowed them to design their own processor from scratch.
But for the tiny sums of money (and low production runs) we are talking about, there is no way Freescale would allow any modification to the core.
-
bloodline wrote:
But for the tiny sums of money (and low production runs) we are talking about, there is no way Freescale would allow any modification to the core.
Why would money/low production runs come into it? If the redesign work was done by a 3rd party, and the cost of manufacture was the same, where would Freescale be losing out?
-
HenryCase wrote:
bloodline wrote:
But for the tiny sums of money (and low production runs) we are talking about, there is no way Freescale would allow any modification to the core.
Why would money/low production runs come into it? If the redesign work was done by a 3rd party, and the cost of manufacture was the same, where would Freescale be losing out?
That has nothing to do with it, it's about access to the technology. The less you pay the less you are allowed to do with the IP.
-Edit- missing word :-) oops too late :lol:
-
bloodline wrote:
HenryCase wrote:
bloodline wrote:
But for the tiny sums of money (and low production runs) we are talking about, there is no way Freescale would allow any modification to the core.
Why would money/low production runs come into it? If the redesign work was done by a 3rd party, and the cost of manufacture was the same, where would Freescale be losing out?
That has nothing to do with it, it's about access to the technology. The less you pay the less you are allowed to do with the IP.
OIC. Fair enough.
-
bloodline wrote:
Well, the V4 and V5 cores, with their separate supervisor stack do seem to have what it takes to emulate the 68000 (not sure about the data width issues though)... but the instruction traps which will be required are very expensive in terms of CPU cycles... I really do think we are looking at a massive penalty with the coldfire.
As I understand it, the Coldfire is cheap; that is really the only thing it has going for it.
The Amiga is a multitasking computer, so why not use two Coldfires running separate tasks: while one chip is trapping and emulating code, the other Coldfire continues running its task, hiding the speed penalty that emulation brings.
A few extensions to Kickstart (in RAM) will enable the OS to use two processors simultaneously.
-
A6000 wrote:
bloodline wrote:
Well, the V4 and V5 cores, with their separate supervisor stack do seem to have what it takes to emulate the 68000 (not sure about the data width issues though)... but the instruction traps which will be required are very expensive in terms of CPU cycles... I really do think we are looking at a massive penalty with the coldfire.
As I understand it, the Coldfire is cheap; that is really the only thing it has going for it.
hmmm, it's not that cheap...
The Amiga is a multitasking computer, so why not use two Coldfires running separate tasks: while one chip is trapping and emulating code, the other Coldfire continues running its task, hiding the speed penalty that emulation brings.
A few extensions to Kickstart (in RAM) will enable the OS to use two processors simultaneously.
No chance of Exec supporting 2 CPUs (this would require a complete rewrite, and memory protection would help). Michal Schulz and NicJA did some work on getting AROS's Exec SMP-ready, but... that was for x86 CPUs, which are built for SMP (with extremely powerful MMUs and extensive cache coherency hardware).
Other than that, most Amiga programs spend their time sleeping, so it wouldn't really offer much improvement...
-
AmigaHeretic wrote:
Sounds very cool. Are you saying you are emulating all 68k though or you are still just talking about emulating the missing op codes?
I am emulating all 68k opcodes for maximum compatibility.
The core routine is pretty simple:
exec_68k_inst:
move.w (A5)+,D6 ; Read a 68k instruction word
move.l D6,D5
lsr.l #6,D6 ; D6 : bits 15-6 (jump table index)
andi.l #$3F,D5 ; D5 : bits 5-0 (effective address)
movea.w dispatch_tab(PC,D6.l*4),A0 ; A0 : emulation routine address
movea.w dispatch_tab+2(PC,D6.l*4),A2 ; A2 : effective address table
move.l (A2),D0 ; D0 : offset in the emulation routine
lea 0(A0,D0.l),A1 ; A1 : patching address
move.l 4(A2,D5.l*4),(A1) ; Patch the emulation routine
jmp (A0) ; Call the emulation routine
Then, the dispatch table:
(first word : routine address, second word : EA table address)
dispatch_tab:
;$0000 - $0FFF
dc.w inst_ORI_B,ea_ORI_B
dc.w inst_ORI_W,ea_ORI_W
dc.w inst_ORI_L,ea_ORI_L
dc.w inst_illegal,0
dc.w inst_BTST_D0,ea_BTST_reg
dc.w inst_BCHG_D0,ea_BCHG_reg
dc.w inst_BCLR_D0,ea_BCLR_reg
dc.w inst_BSET_D0,ea_BSET_reg
dc.w inst_ANDI_B,ea_ANDI_B
dc.w inst_ANDI_W,ea_ANDI_W
dc.w inst_ANDI_L,ea_ANDI_L
dc.w inst_illegal,0
...
One of the EA tables:
;Effective address table for:
; ORI.B #xx,<ea>
; ANDI.B #xx,<ea>
; EORI.B #xx,<ea>
; ADDI.B #xx,<ea>
; SUBI.B #xx,<ea>
; CMPI.B #xx,<ea>
ea_ORI_B:
ea_ANDI_B:
ea_EORI_B:
ea_ADDI_B:
ea_SUBI_B:
ea_CMPI_B:
dc.l 2
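; One 4-byte code fragment per EA mode (0-63) follows; the dc.w pairs
; are pre-assembled instructions ($4EF8 = jmp xxx.w, $4EB8 = jsr xxx.w)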
lea reg_D0+3(A6),A0
lea reg_D1+3(A6),A0
lea reg_D2+3(A6),A0
lea reg_D3+3(A6),A0
lea reg_D4+3(A6),A0
lea reg_D5+3(A6),A0
lea reg_D6+3(A6),A0
lea reg_D7+3(A6),A0
dc.w $4EF8,inst_illegal
dc.w $4EF8,inst_illegal
dc.w $4EF8,inst_illegal
dc.w $4EF8,inst_illegal
dc.w $4EF8,inst_illegal
dc.w $4EF8,inst_illegal
dc.w $4EF8,inst_illegal
dc.w $4EF8,inst_illegal
move.l reg_A0(A6),A0
move.l reg_A1(A6),A0
move.l reg_A2(A6),A0
move.l reg_A3(A6),A0
move.l reg_A4(A6),A0
move.l reg_A5(A6),A0
move.l reg_A6(A6),A0
move.l reg_A7(A6),A0
dc.w $4EB8,calc_ea_18_B
dc.w $4EB8,calc_ea_19_B
dc.w $4EB8,calc_ea_1A_B
dc.w $4EB8,calc_ea_1B_B
dc.w $4EB8,calc_ea_1C_B
dc.w $4EB8,calc_ea_1D_B
dc.w $4EB8,calc_ea_1E_B
dc.w $4EB8,calc_ea_1F_B
dc.w $4EB8,calc_ea_20_B
dc.w $4EB8,calc_ea_21_B
dc.w $4EB8,calc_ea_22_B
dc.w $4EB8,calc_ea_23_B
dc.w $4EB8,calc_ea_24_B
dc.w $4EB8,calc_ea_25_B
dc.w $4EB8,calc_ea_26_B
dc.w $4EB8,calc_ea_27_B
dc.w $4EB8,calc_ea_28
dc.w $4EB8,calc_ea_29
dc.w $4EB8,calc_ea_2A
dc.w $4EB8,calc_ea_2B
dc.w $4EB8,calc_ea_2C
dc.w $4EB8,calc_ea_2D
dc.w $4EB8,calc_ea_2E
dc.w $4EB8,calc_ea_2F
dc.w $4EB8,calc_ea_30
dc.w $4EB8,calc_ea_31
dc.w $4EB8,calc_ea_32
dc.w $4EB8,calc_ea_33
dc.w $4EB8,calc_ea_34
dc.w $4EB8,calc_ea_35
dc.w $4EB8,calc_ea_36
dc.w $4EB8,calc_ea_37
movea.w (A5),A0
addq.l #2,A5
movea.l (A5),A0
addq.l #4,A5
dc.w $4EF8,inst_illegal
dc.w $4EF8,inst_illegal
dc.w $60F8,$4E71
dc.w $4EF8,inst_illegal
dc.w $4EF8,inst_illegal
dc.w $4EF8,inst_illegal
The emulation routines:
;$0000 - $003F : ORI.B #xx,<ea>
inst_ORI_CCR:
or.l D1,D7
rts
inst_ORI_B:
move.w (A5)+,D1
dc.l 0
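; ^ 4-byte slot patched at run time with the selected EA fragment
; (it sits at offset 2, matching the dc.l 2 at the top of the EA table)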
move.b (A0),D2
or.l D1,D2
move D7,CCR
move.b D2,(A0)
move CCR,D7
rts
...
Quite straightforward :-D
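In C terms, the core loop boils down to something like this (a simplified sketch: no self-modifying code or SRAM tricks, and the handler names are placeholders):

#include <stdint.h>

typedef void (*handler_t)(unsigned ea_field);

/* Hypothetical handler; the real table has one entry per opcode
   pattern: inst_ORI_B, inst_ANDI_W, inst_illegal, and so on. */
static void inst_illegal(unsigned ea_field) { (void)ea_field; }

static handler_t dispatch_tab[1024]; /* filled in at start-up */
static uint16_t *emu_pc;             /* emulated 68k program counter */

static void exec_68k_inst(void)
{
    uint16_t op = *emu_pc++;          /* fetch one opcode word */
    dispatch_tab[op >> 6](op & 0x3F); /* bits 15-6 pick the routine,
                                         bits 5-0 are the EA field */
}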
Regards,
Frederic
-
If I recall correctly, the 060 is purely 32-bit and handles <32-bit accesses in hardware by simply stripping the communication, since it would take more resources to do it "the right way" (and it wouldn't shave any cycles anyway). As others pointed out, modifying the CF to do this is not a viable option. So (finally) my question is: will doing this with a JIT have a notable impact on performance?
By "notable" I don't mean a vanishing factor-of-4 reduction against a x1400 or x140 gain.
-
JetRacer wrote:
If I recall correctly, the 060 is purely 32-bit and handles <32-bit accesses in hardware by simply stripping the communication, since it would take more resources to do it "the right way" (and it wouldn't shave any cycles anyway). As others pointed out, modifying the CF to do this is not a viable option. So (finally) my question is: will doing this with a JIT have a notable impact on performance?
By "notable" I don't mean a vanishing factor-of-4 reduction against a x1400 or x140 gain.
A JIT shouldn't cause too much of a performance problem; it is the best solution really... the big question is, if we are using a JIT, then why bother with a Coldfire when you could use any CPU... a cheaper, more powerful one...
-
A wild stab in the dark: because it needs to emulate everything all the time, as opposed to the CF, which can fall back to emulation only when it needs to?
But then again, I don't know what I'm talking about when it comes to JIT and CF. Less than the thread average, anyway.
-
The Amiga is a multitasking computer, so why not use two Coldfires running separate tasks: while one chip is trapping and emulating code, the other Coldfire continues running its task, hiding the speed penalty that emulation brings.
Running two "normal" applications on two Coldfire CPUs (like SMP) does not work.
For this, cache coherency (bus snooping) is required.
The Coldfire does not do bus snooping.
The 68040 and 68060 supported bus snooping.
You could create great working multi-processor systems out of the 68040 and 68060, but not out of the Coldfire.
Let's be clear here.
The Coldfire has some advantages:
1st)
Freescale has the Coldfire set up to work like LEGO.
You can easily put parts together, as you please.
If you look at the Freescale site you will see that there are dozens of different Coldfire CPUs put together.
This somewhat shows how simple it is to put new Coldfires together.
This LEGO feature is what makes the Coldfire interesting, as the key for the AMIGA is to get something like SuperAGA into the chip quickly.
Compared to the classic AMIGAs, the Coldfire is quite fast.
Yes, there are other higher-clocked CPUs available, but
the Coldfire V5 runs at 400 MHz and has about the same performance as a 68060 clocked at 400 MHz.
So for a slim AMIGA OS system, a 400 MHz Coldfire / 400 MHz 68060 does fly.
The only situation where you would want more power is for something like video encoding. As you might know, the Coldfire has a MAC unit, which is something like AltiVec for the poor. The MAC unit helps accelerate stuff like FFT transformations quite well.
So for tasks like video encoding the Coldfire has more power than a 400 MHz 68060 would have.
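Roughly speaking, what a MAC unit accelerates is the inner loop of filters and transforms: one multiply-accumulate per sample. A toy C sketch (the function is made up; on a Coldfire with a MAC/EMAC unit the multiply-plus-add in the loop body maps onto a single mac.w):

#include <stdint.h>

/* Dot product of two 16-bit sample buffers: the basic building
   block of FIR filters and FFT butterflies. */
int32_t dot16(const int16_t *a, const int16_t *b, int n)
{
    int32_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i]; /* multiply-accumulate */
    return acc;
}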
Another option which will fit perfectly into the AMIGA spirit is putting a dedicated unit on these tasks. Freescale offers DSP cores which would be perfect for doing video encoding. Freescale could without any problem put SuperAGA, a 400 MHz Coldfire and a 400 MHz DSP into one chip.
The resulting chip will be relatively small (read: cheap),
will be quick to produce, and it will be powerful enough to let AMIGA OS fly.
The DSP could be nicely integrated into AMIGA OS.
Think of it as a 2nd Copper, aka a SuperCopper.
The DSP could be used to play video and audio datatypes.
The SuperAGA chipset already includes DMA engines for stuff like YUV conversion. The idea of upgrading this with a powerful SuperCopper specialized for the heavy lifting needed for video encoding makes good sense.
We were thinking about designing our own mini-DSP for this, but as Freescale has powerful DSP cores in their LEGO toy box it might be clever to just take an existing DSP which already has a wealth of audio and video datatypes developed for it.
This is what makes sense to me.
Please mind that the target is not to create a CPU which is faster than an 8-way Opteron or CELL.
The goal is to create a CPU which is not expensive, which runs passively (fanless) and which is fast enough to let AMIGA OS fly.
The big advantage of the AMIGA OS system is the elegant design and the resulting low memory and speed requirements of it.
There is a market for AMIGA OS for a small system.
You can think of it as an Amiga joystick, an AMIGA smartphone or an AMIGA Wii.
There is no market for a desktop system anymore.
The desktop area is fully saturated with Windows, Apple and now even Linux.
If you want to build a new high-end system, take x86 and Linux.
I believe that the future (if any) for AMIGA OS is to power a sub-$100 device.
And for this target market the Coldfire is a quite sensible choice.
AMIGA OS again needs something like an A500.
Low price but powerful for its size and price.
-
@Biggun
I totally agree with your criteria for choosing the Coldfire in your last post, since you are not looking at it as a derivative of the 68k... but my argument revolves around the fact that we should be looking at all the available CPUs which can be licensed for use in the NatAmi project, and not just select the Coldfire because of its history.
MIPS and ARM provide all the same services as Freescale, but with the advantage that both of those architectures are more widely used and better supported. Also, in the case of the MIPS, its implementations are tiny!
If the V4 and V5 versions do offer a very high degree of compatibility with the 68k and require only minimal work to support the existing Amiga software base, then and only then does the Coldfire have an advantage over the MIPS and ARM. Otherwise it's probably not the best choice, since the same amount of work would be required to get it to work regardless of the CPU architecture used.
-
I know, let's use PowerPC...
*hides*
-
Karlos wrote:
I know, let's use PowerPC...
*hides*
Actually, in this case... If it fit the criteria, I'd agree! :-D
-
biggun wrote:
There is a market for AMIGA OS for a small system.
You can think of it as an Amiga joystick, an AMIGA smartphone or an AMIGA Wii.
There is no market for a desktop system anymore.
The desktop area is fully saturated with Windows, Apple and now even Linux.
I really loved this idea, and I do agree with you, but I think it would be nice to leave the doors open so that it remains possible to raise the system up to a desktop level in the future (if that is possible).
I'm not an expert in Coldfire. Do you think it would be feasible to have more than one CPU on the same board in the future, in a cluster-like fashion? Maybe talking to each other over fast Ethernet or a serial bus?
I believe this fills the other side of the problem. Let's say you can have a sub-US$100 device, and the software for it could even run in a big multiprocessor box. One could add boards with multiple CPUs to speed up the machine.
Amiga OS is multitasking, so we could distribute new tasks to different CPUs and write software that launches many tasks.
It would take advantage of multiple CPUs when they exist.
It is just an idea...
-
It has the advantage that it could run the PPC descendants of AmigaOS too...
-
AeroMan wrote:
I'm not an expert in Coldfire. Do you think it would be feasible to have more than one CPU on the same board in the future, in a cluster-like fashion?
The Natami board is supposed to have a CPU expansion slot.
http://www.natami.net/specification.htm
"CPU expansion slot
Allows upgrade to another CPU (Coldfire/PowerPC/Cell)
SLOT layout will be fully documented to allow 3rd party upgrade designs."
-
Karlos wrote:
It has the advantage that it could run the PPC descendants of AmigaOS too...
Hmmm, it does... perhaps an embedded PPC would be right for this application... no wait, Karlos, you are not going to get me to endorse the PPC! ;-) :lol:
-
I do... :-D
But this thread made me look at Coldfire with a different vision...
-
biggun wrote:
The big advantage of the AMIGA OS system is the elegant design and the resulting low memory and speed requirements of it.
If you want to build a new high end system take x86 and Linux.
If I want to enjoy a minimum of AmigaOS's elegance, I can in Windows (and Linux for that matter): all I need to do is use only the buggiest alphas of applications. But again, as I said, that's just a minimum of the sweet taste of AmigaOS's elegance; a minimum because the cursed Winblows (and Linsux) does not permit the sweet buggy apps to destroy the data of the others (and consequently mine) and kill the OS. What a shame.
-
In terms of software, PPC is the easiest option, as there already exist 3 JIT engines for running 68K software. There's also, as pointed out, the option of going to one of the newer OSs.
On the other hand getting hold of one of these engines is another matter...
--
Another option would be to do a combo of some sort, i.e. an FPGA for backwards compatibility and a modern SoC for new stuff.
This is what a modern SoC can provide:
OpenPandora (http://openpandora.org/)
-
Einstein wrote:
the cursed Winblows (and Linsux) does not permit the sweet buggy apps to destroy the data of the others (and consequently mine) and kill the OS. What a shame.
When was the last time this happened to you, and what version of AmigaOS were you using at the time?
-
You really got him there, HenryCase; no Amiga user would ever recall any of the week's last 57 gurus and which of the 3 OS versions he used at the time :-)
-
@JetRacer
The different versions of AmigaOS aren't equally stable, so my question still stands.
-
HenryCase wrote:
When was the last time this happened to you, and what version of AmigaOS were you using at the time?
Eons ago, since I don't take the "OS" seriously enough, so to speak, to consider it useful (lol).
And there is no difference between any AmigaOS since *there is no memory protection*.
-
JetRacer wrote:
You really got him there HenryCase; no Amiga user would ever recall any of the weeks last 57 gurus and which of the 3 OS versions he used at the time :-)
Of course not :roflmao:
-
HenryCase wrote:
@JetRacer
The different versions of AmigaOS aren't equally stable, so my question still stands.
It's pretty irrelevant how long the different versions of AmigaOS can stand on their feet before collapsing, since the supreme champions in the arena are the tasks :-)
-
Why do all the discussions about doing something new with the Amiga deteriorate into no-one-wins Windows vs Linux vs Mac wars? :-(
Yes, AmigaOS has no "memory protection", but for some reason those blue screens and lock-ups keep telling me that there is something wrong with Windows "memory protection"... :-D
Can we go back to Coldfire?
-
AeroMan wrote:
Why do all the discussions about doing something new with the Amiga deteriorate into no-one-wins Windows vs Linux vs Mac wars? :-(
Because.. (quoting you)
Yes, AmigaOS has no "memory protection", but for some reason those blue screens and lock-ups keep telling me that there is something wrong with Windows "memory protection"... :-D
If something goes wrong *inside* the *kernel* (hint: buggy drivers) then the kernel will "crash", because modules (drivers) that are loaded into the kernel "become" part of the kernel, and hence run in kernel mode; therefore they are privileged to destroy data in kernel space, generally speaking. This is very different from processes that run in user space: those cannot {bleep} around in kernel space or in pages which have not been mapped into their virtual address space. Hence no normal app can give Windows "blue screens", in contrast to AmigaOS tasks/processes, about which the story is pretty obvious.
Can we go back to Coldfire ?
You are a free man, but why enter a discussion and ask for permission to leave it?
-
Ok, I recognize it was my fault to express my frustration with Windows problems, and I apologize for that, even if I don't agree with you.
I promise not to be part in those issues anymore :-)
Can we go back to Coldfire? :-D It was getting really interesting...
-
Einstein wrote:
And there is no difference between any AmigaOS since *there is no memory protection*.
Not true. AmigaOS4, MorphOS, AROS64 all have at least partial memory protection. I am currently investigating how easy it will be to add full memory protection to AROS (without throwing away AmigaOS 3.1 API compatibility), and so I asked you which version you had used so I could get an idea of the stability compared to the other memory protection options.
So I'll ask again, which version of AmigaOS were you using?
-
HenryCase wrote:
I am currently investigating how easy it will be to add full memory protection to AROS (without throwing away AmigaOS 3.1 API compatibility),
The answer is: not very, i.e. impossible. While the exec design team clearly envisaged memory protection at some future time... none of the rest of the OS design team did (I think intuition alone is a very bad boy!), and public/private memory flags were never enforced.
Tasks and libraries happily pass memory pointers around without a care in the world.
-
Memory protection exists to clean up the mess of bad coders...
*runs away*
-
bloodline wrote:
The answer is: not very, i.e. impossible. While the exec design team clearly envisaged memory protection at some future time... none of the rest of the OS design team did (I think intuition alone is a very bad boy!), and public/private memory flags were never enforced.
Tasks and libraries happily pass memory pointers around without a care in the world.
That is useful information to me bloodline, so thank you for that. None of what you describe makes retrofitting memory protection impossible though. Let me explain...
The memory doesn't have to flag whether it is public or private as long as the API functions asking for memory space are not allowed direct access to the memory. You put an OS layer in between the memory allocation functions and the real memory. This MMU gives the memory allocation functions the impression that they are writing directly to memory, when in reality it is controlling memory allocation/deallocation and preventing unwanted memory access.
For those memory functions called without a flag, the memory status is set to private by default. You then have the issue of inter-program communication. For this, you want to allow the programs to attempt communication, but you want to prevent incorrect values being entered into the program space and potentially crashing the OS. You allow the value to be passed by the first program, but the MMU checks to see if the value is valid before passing it on to the second program.
For the few programs where this method is not suitable (where you need as much speed as possible, for example) the MMU can either be switched off or the program can be patched. The MMU will only be known to the new kernel and the user; the API functions will not be aware of its existence, so they will not mind using non-memory-protected mode.
Those are my ideas as they stand at the moment. As I said I am still conducting research into the feasibility of a memory protected AROS.
Do you think my ideas are sound in principle?
-
HenryCase wrote:
bloodline wrote:
The answer is: not very, i.e. impossible. While the exec design team clearly envisaged memory protection at some future time... none of the rest of the OS design team did (I think intuition alone is a very bad boy!), and public/private memory flags were never enforced.
Tasks and libraries happily pass memory pointers around without a care in the world.
That is useful information to me bloodline, so thank you for that. None of what you describe makes retrofitting memory protection impossible though.
It does if you want all your old software and large parts of the OS to keep working... :-D
Let me explain...
:nervous:
The memory doesn't have to flag whether it is public or private as long as the API functions asking for memory space are not allowed direct access to the memory.
When you request memory from exec you are supposed to flag whether the memory is to be shared or private. Although in AmigaOS this actually does nothing (due to the early machines not having an MMU), this tells the OS whether the memory should be allocated in the task's private address space or in some public memory pool (where anyone can mess around).
Due to the lack of an MMU, these flags were never enforced, so no Amiga software actually bothers to set them properly... Enforcing them now, after 20 years, would break a lot of apps.
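For reference, this is roughly what a classic allocation looks like (AllocMem()/FreeMem() and the MEMF_ flags are the real exec.library API; the size and usage here are made up):

#include <exec/memory.h>
#include <proto/exec.h>

void example(void)
{
    /* MEMF_PUBLIC is the "shared" flag that was never enforced;
       plenty of software sets it carelessly, or not at all. */
    UBYTE *buf = AllocMem(256, MEMF_PUBLIC | MEMF_CLEAR);
    if (buf != NULL)
    {
        /* ... hand the pointer to any other task; nothing stops you ... */
        FreeMem(buf, 256);
    }
}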
You put an OS layer in between the memory allocation functions and the real memory. This MMU gives the memory allocation functions the impression that they are writing directly to memory, when in reality it is controlling memory allocation/deallocation and preventing unwanted memory access.
I don't understand... The way memory protection would work is that each task gets its own address space... in effect each task gets up to 4 GB of address space to itself. I shall explain the repercussions of this later...
A good book on how MMUs work should give you a better understanding... The Intel developer docs are a great place to start!
For those memory functions called without a flag, the memory status is set to private by default.
Well that would break EVERY app, since it is the exact opposite of the current situation... for there to be any hope of this working at all... the default behaviour needs to be the same as current behaviour.
You then have the issue of inter-program communication. For this, you want to allow the programs to attempt communication, but you want to prevent incorrect values being entered into the program space and potentially crashing the OS. You allow the value to be passed by the first program, but the MMU checks to see if the value is valid before it passing it on to the second program.
One of the great things about AmigaOS is that it places no restrictions on what is passed in an exec message... In effect the OS has no idea what the programs are sending to each other (it's the same problem as the parasite emulator we discussed before).
Now here is where the problems begin... You have two tasks, both with their own address spaces... one task sends a pointer to a data block to another task in a message... that pointer is meaningless to the receiving task, since the pointer is either not valid in its address space or points to something completely different!
The only way around this is to have a public pool of memory mapped to the same address window in all task address spaces... but since the public/private flags were ignored, this can't work.
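To make the problem concrete, here is the typical Amiga pattern in a rough C sketch (struct Message, struct MsgPort and PutMsg() are the real exec API; the payload field and the names are invented, and proper message allocation/reply handling is omitted):

#include <exec/ports.h>
#include <proto/exec.h>

struct MyMsg
{
    struct Message msg; /* standard exec message header */
    APTR payload;       /* a bare pointer into the sender's memory */
};

void send_pointer(struct MsgPort *port, struct MyMsg *m, APTR data)
{
    m->payload = data;     /* only meaningful in the sender's address
                              space once tasks are separated... */
    PutMsg(port, &m->msg); /* ...yet the receiver will happily
                              dereference it */
}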
For the few programs where this method is not suitable (where you need as much speed as possible for example) the MMU can either be switched off or the program can be patched. The MMU will only be known to the new kernel and the user, the API functions will not be aware of its existence, so they will not mind using non memory protected mode.
You can't just switch off the MMU once it is on... the whole integrity of the memory structure is maintained by this device... Switching it off would leave the memory as, in effect, a random jumble of 4k blocks that used to mean something...
Those are my ideas as they stand at the moment. As I said I am still conducting research into the feasibility of a memory protected AROS.
Do you think my ideas are sound in principal?
Not really :-)
MMUs are hard to get your head around at first, but some solid study and a lot of running it out on paper with a pencil will give you a good idea of the issues here...
-
bloodline wrote:
You put an OS layer in between the memory allocation functions and the real memory. This MMU gives the memory allocation functions the impression that they are writing directly to memory, when in reality it is controlling memory allocation/deallocation and preventing unwanted memory access.
I don't understand... The way memory protection would work is that each task gets its own address space... in effect each task gets up to 4 GB of address space to itself. I shall explain the repercussions of this later...
A good book on how MMUs work should give you a better understanding... The Intel developer docs are a great place to start!
You don't need to create separate memory spaces to have full memory protection. In fact, now that many CPUs have 64-bit addressing, there are those who argue that a single linear address space is a good idea. Doing this would allow the existing messaging system to work fine, with the added restriction that programmers do need to mark the appropriate memory as shared.
In order for Amiga OS to have full memory protection whilst still allowing old software to run, there is only one option: place the old apps in a sandbox. There the old apps can do all the naughty things that they used to get away with because the rules were never enforced. Unfortunately, this does mean significant rewrites of some OS modules such as intuition, which didn't follow the rules either.
Hans
-
"the existing messaging system" would have to be updated to 64Bit in order to make full use of it, but that would require a recompile.
Wether it would actually do anything in regard to MP is questionable and would atleast require further measures (which will break source-compability too).
-
bloodline wrote:
MMUs are hard to get your head around at first, but some solid study and a lot of running it out on paper with a pencil will give you a good idea of the issues here...
I am studying memory protection, but as you failed to grasp my ideas correctly allow me to try and explain them again...
You can't just switch off the MMU once it is on... the whole integrity of the memory structure is maintained by this device... Switching it off would leave the memory as, in effect, a random jumble of 4k blocks that used to mean something...
Yes you can. Imagine this scenario: you have two kernels available, MP-Exec (Memory Protected Exec) and Exec (the standard no-memory-protection version). You've started running a program, but need it to run faster. When you give the command to switch from MP-Exec to Exec (I know this would remove memory protection for all processes), what effectively happens is that the MMU updates the 'virtual' memory addresses with the real memory addresses (like saying to the program 'here's where your data was'). Once this process is complete, a kernel-switcher program takes over, freezing all processes and keeping the memory as it was (kind of like when you put a laptop into hibernation). The kernels are switched and the processes and memory are unfrozen. The programs don't see any difference because the APIs of MP-Exec and Exec give the same results.
I really can't see a problem with what I've described, perhaps you can?
One of the great things about AmigaOS is that is places no restrictons on what is passed in an exec message... In effect the OS has no idea what the programs are sending to each other. (it's the same problem as the parasite emulator we discussed before).
But this can be changed by adding an MMU layer to the kernel and updating the kernel functions, so it's not an issue. Of course no memory protection gives you more freedom, what I am describing is a way to add memory protection for those who want it (which seems to be a lot of people).
bloodline wrote:
Now here is where the problems begin... You have two tasks, both have their own address spaces... one task sends a pointer to a data block to another task in a message... that pointer is meaningless to the receiving task, since the pointer is either not vaild in its address space or pointer for somthing completely different!
As the MMU is dishing out memory, it knows what kind of data will be valid for the space. So, for example, if you try to pass a message suitable for a graphics routine into a memory space designated for sound functions then the MMU will not allow it. This is just one example, there will be other protection measures in place, such as if a program crashes you could limit the crash to its own allocated memory space, seeing as all reading and writing of memory goes through the MMU.
bloodline wrote:
The only way around this is to have a public pool of memory mapped to the same address window in all task address spaces... but since the public/private flags were ignored this can't work.
FYI, that's not what I'm trying to propose.
bloodline wrote:
I don't understand... The way memory protection would work is that each task gets it's own address space... in effect each task gets up to 4gig of address space to itself. I shall expalin the reprocussions of this later...
Don't understand? Okay, this is how it works...
When an API call requests a memory space (doesn't matter if it has flagged anything) the MMU layer of the OS looks for a suitable space to store that memory and then tells the API call (I stored your data 0x001 here: 1F2C, or whatever). The point is, that doesn't have to be where the data is stored, the app just has to think that's where it is stored. The MMU has a look up table linking the 'virtual' location with the real memory location. When the memory stored at 1F2C gets called the MMU then fetches the real data...
My laptop battery is about to die. I do have more to say but I'm hoping the penny has dropped by now. I'll be back later, feel free to comment on what I've said so far.
-
...what effectively happens is that the MMU updates the 'virtual' memory addresses with the real memory addresses (like saying to the program 'here's where your data was').
...
I really can't see a problem with what I've described, perhaps you can?
How do you take, say, 16 4KiB pages scattered across the 4G physical address space that an application requested and originally thought was one contiguous 64KiB lump of memory and tell it "here is where your data was" ?
A single allocation of memory on a VM system using an MMU that an application uses a single pointer to refer to can translate into many unrelated chunks of genuine physical memory. You can't assume contiguous address mapped memory is contiguous in physical RAM.
-
@HenryCase
Problem is, the OS can't decide whether one task made a legitimate request for data owned by another (let's say stuff running inside the input.device task) or if it is just some rogue pointer...
The next idea is usually to just copy system structures into each task's private part of memory, but once you take a look into the interlinked (and open) system structures you'll notice that ain't practical either.
-
AeroMan wrote:
Ok, I recognize it was my fault to express my frustration with Windows problems, and I apologize for that, even if I don't agree with you.
Reality and facts ignore personal opinions, sorry to say.
Can we go back to Coldfire ? :-D It was getting really interesting...
You are still free to do so.
HenryCase wrote:
Not true. AmigaOS4, MorphOS, AROS64 all have at least partial memory protection.
Yeah, remember the (tiny) shield and the samurai?
I am currently investigating how easy it will be to add full memory protection to AROS (without throwing away AmigaOS 3.1 API compatibility),
No can do without an emulation/compatibility layer; you still need an emulation layer, #?UAE style, if you want a portable yet Amiga-compatible OS. I believe bloodline gave you the reason why you can't just add MP to exec and think everything's gonna be alright.
and so I asked you which version you had used so I could get an idea of the stability compared to the other memory protection options.
Like I pointed out, it's *irrelevant*: if a buggy app compatible with OS 3.1, 2.1 and 1.3 runs in these, it will bring 'em down regardless of how "stable" (bugless) any particular OS version is.
So I'll ask again, which version of AmigaOS were you using?
Last time it was 1.3, 2.0 and 3.something AIAB, when several soundtrackers sent the OS to hell. Of course those soundtrackers weren't compatible with the higher versions, but my point remains. Since then I haven't touched any of 'em.
Karlos wrote:
Memory protection exists to clean up the mess of bad coders...
*runs away*
And traffic lights exist to prevent the mess that would arise due to bad and "bad" drivers.
-
Karlos wrote:
How do you take, say, 16 4KiB pages scattered across the 4G physical address space that an application requested and originally thought was one contiguous 64KiB lump of memory and tell it "here is where your data was" ?
By arranging it into a 64KiB lump before you give memory control back to the program.
Karlos wrote:
A single allocation of memory on a VM system using an MMU that an application uses a single pointer to refer to can translate into many unrelated chunks of genuine physical memory. You can't assume contiguous address mapped memory is contiguous in physical RAM.
Give me an example of when you'd use a pointer to address more than one memory location so I can explain how it's done.
-
Kronos wrote:
"the existing messaging system" would have to be updated to 64Bit in order to make full use of it, but that would require a recompile.
Wether it would actually do anything in regard to MP is questionable and would atleast require further measures (which will break source-compability too).
Updating to 64-bit would only need to be done on a 64-bit processor. Seeing as that jump is going to break stuff anyway, migration to a 64-bit processor would be a good time at which to make the necessary changes.
At a bare minimum, memory for messages should be marked as shared. Even better, message allocation/deallocation functions should be created that do this. What I was trying to say is that we wouldn't need a completely new message passing system. There would be additional restrictions, but the fundamentals would remain the same. The differences would be enough that old apps would have to run in a sandbox.
Hans
-
Kronos wrote:
@HenryCase
Problem is, the OS can't decide whether one task made a legitimate request for data owned by another (let's say stuff running inside the input.device task) or if it is just some rogue pointer...
I am not asking it to. I am asking the OS to make some limits on the commands that are physically not possible, like trying to put a 6-bit instruction into a 3-bit space. It was just one example of how the memory layer would keep the system stable.
Kronos wrote:
The next idea is usually to just copy system structures into each task's private part of memory, but once you take a look into the interlinked (and open) system structures you'll notice that ain't practical either.
Please expand on that point so I can understand it.
-
I see that in the old OS the memory protection flags were ignored... but isn't the real question: were they actually set?
If they were actually set, couldn't the OS allocate a block of public memory available to all applications, as well as private memory for each individual app...
If some apps didn't set the flag in the call, wouldn't it be simple to find those calls and patch them by flipping a bit? Can't be hard to see where the app crashes... and find the offending call... ;)
This could be done with a "debug" Amiga emulator on x86 where you can poke around and step through stuff...
I know the manuals I read said to always set the flag appropriately, but I only scratched the surface of Amiga programming 11 years ago... so I probably have things quite off...
-
HenryCase wrote:
Karlos wrote:
How do you take, say, 16 4KiB pages scattered across the 4G physical address space that an application requested and originally thought was one contiguous 64KiB lump of memory and tell it "here is where your data was" ?
By arranging it into a 64KiB lump before you give memory control back to the program.
Wow you can do that? (note: this is sarcasm)
OK, technically it might be possible to go and find all of those fragments of memory allocated to a process using the MMU somehow, but even once you've found them and "given them back", the addresses that you "give back" to the program when you congeal that memory aren't going to be the same ones it originally had when they were allocated under the MMU scheme.
Karlos wrote:
A single allocation of memory on a VM system using an MMU that an application uses a single pointer to refer to can translate into many unrelated chunks of genuine physical memory. You can't assume contiguous address mapped memory is contiguous in physical RAM.
Give me an example of when you'd use a pointer to address more than one memory location so I can explain how it's done.
Striding through an array :-D
With all due respect Henry both bloodline and karlos are trying to explain how MMUs work from a position of knowing, at least from a code point-of-view, how they function. So some of your replies have read as being rather rude.
Andy
-
HenryCase wrote:
My laptop battery is about to die. I do have more to say but I'm hoping the penny has dropped by now. I'll be back later, feel free to comment on what I've said so far.
Continuing from where I left off...
Thought of an analogy about API calls that may be useful. In a restaurant, think of the API calls as the customers and the memory management and kernel as the waiter and chef. The customer can make any number of ridiculous requests about what they'd like to eat, but in a restaurant it's the chef and waiter that decide what they're having. A non-memory-protected OS is like a restaurant without a waiter: the customers have to go into the kitchen to get their food, and can make a mess. If you put the waiter as a barrier between the kitchen and the customers then they can't make a mess in the kitchen. Hmm, must have food on the brain!
The point is, API calls are not in control of the system; they are only a means to make requests of that system. As long as the waiter delivers the same food that the chef was providing before, the customer is not going to complain. Capiche?
Let me set a challenge. I've hopefully explained a little about how this memory abstraction layer will work. I'm going to stop calling it an MMU now and call it a MAU (Memory Allocation Unit), because although I am confident my ideas will work, I don't know if this is how MMUs traditionally work.
My challenge is this. I have a program running which has got an important string "ABCDEF" stored in memory at real location "0x001" and virtual location "0x0FF". Using your knowledge of the Amiga's API architecture, I want you to try and corrupt this data. I am confident that whatever you suggest can be dealt with without breaking AmigaOS compatibility, so try and prove me wrong.
-
Hans_ wrote:
Kronos wrote:
"the existing messaging system" would have to be updated to 64Bit in order to make full use of it, but that would require a recompile.
Wether it would actually do anything in regard to MP is questionable and would atleast require further measures (which will break source-compability too).
Updating to 64-bit would only need to be done on a 64-bit processor. Seeing as that jump is going to break stuff anyway, migration to a 64-bit processor would be a good time at which to make the necessary changes.
At a bare minimum, memory for messages should be marked as shared. Even better, message allocation/deallocation functions should be created that do this. What I was trying to say is that we wouldn't need a completely new message passing system. There would be additional restrictions, but the fundamentals would remain the same. The differences would be enough that old apps would have to run in a sandbox.
Hans
Hans!! There is nothing wrong with the message system, the problem is what is sent in those messages! The OS doesn't know what they contain and they could contain anything! :-)
-
AJCopland wrote:
Wow you can do that? (note: this is sarcasm)
Think of it like HD defragmentation if it helps you understand it.
AJCopland wrote:
With all due respect Henry both bloodline and karlos are trying to explain how MMUs work from a position of knowing, at least from a code point-of-view, how they function. So some of your replies have read as being rather rude.
I'm not trying to be rude, but it's hard to maintain composure when you are being misunderstood. As I've just mentioned, let's call my idea a MAU (Memory Allocation Unit) as opposed to an MMU, as I don't know if what I'm describing is your daddy's memory management system, if you catch my drift.
-
HenryCase wrote:
My challenge is this. I have a program running which has got an important string "ABCDEF" stored in memory at real location "0x001" and virtual location "0x0FF". Using your knowledge of the Amiga's API architecture, I want you to try and corrupt this data. I am confident that whatever you suggest can be dealt with without breaking AmigaOS compatibility, so try and prove me wrong.
10: program#1 has address 0x0FF
20: tells program#2 that the string it wants is at 0x0FF
30: program#2 accesses 0x0FF and catches a waiter whacking off into program#1's soup...
...they never eat at that restaurant again :crazy:
-
HenryCase wrote:
As I've just mentioned, let's call my idea a MAU (Memory Allocation Unit) as opposed to an MMU, as I don't know if what I'm describing is your daddy's memory management system, if you catch my drift.
You could retain MMU, and call it Memory Mapping Unit :)
-
HenryCase wrote:
Karlos wrote:
How do you take, say, 16 4KiB pages scattered across the 4G physical address space that an application requested and originally thought was one contiguous 64KiB lump of memory and tell it "here is where your data was" ?
By arranging it into a 64KiB lump before you give memory control back to the program.
Memory allocated in a memory-protected environment is built up from small blocks of RAM... they are usually 4k in size (this is convenient on a 32-bit system) and they are called pages... The MMU can take pages from anywhere in the physical address space, and it puts them together into a virtual address space that the task sees as a single contiguous area of RAM. This is why MP systems don't suffer from RAM fragmentation like the Amiga does. What you are suggesting is a serious amount of housekeeping... MP is expensive enough in terms of RAM and CPU cycles... rebuilding the virtual address space would probably require another 4 GB on top of your existing RAM!!!
You really do need to read about MMUs; I suggest the x86 MMU as it is the most logical, IMHO... I would be more than happy to help answer any questions!
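A toy sketch of the mapping idea (assuming 4 KiB pages and a single-level table; real MMUs use multi-level tables and TLBs, and the names here are made up):

#include <stdint.h>

#define PAGE_SIZE 4096u

/* One entry per virtual page: the physical base address it maps to.
   Scattered physical pages look contiguous through this table. */
static uintptr_t page_table[1024];

uintptr_t translate(uintptr_t vaddr)
{
    uintptr_t vpage  = vaddr / PAGE_SIZE; /* which virtual page */
    uintptr_t offset = vaddr % PAGE_SIZE; /* position inside it */
    return page_table[vpage] + offset;    /* physical address */
}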
Karlos wrote:
A single allocation of memory on a VM system using an MMU that an application uses a single pointer to refer to can translate into many unrelated chunks of genuine physical memory. You can't assume contiguous address mapped memory is contiguous in physical RAM.
Give me an example of when you'd use a pointer to address more than one memory location so I can explain how it's done.
In an MP system, a contiguous block of virtual memory is more than one physical memory location. :-)
-
HenryCase wrote:
HenryCase wrote:
My laptop battery is about to die. I do have more to say but I'm hoping the penny has dropped by now. I'll be back later, feel free to comment on what I've said so far.
Continuing from where I left off...
Thought of an analogy about API calls that may be useful. In a restaurant, think of the API calls as the customers and the memory management and kernel as the waiter and chef. The customer can make any number of ridiculous requests about what they'd like to eat, but in a restaurant it's the chef and waiter that decide what they're having. A non-memory-protected OS is like a restaurant without a waiter: the customers have to go into the kitchen to get their food, and can make a mess. If you put the waiter as a barrier between the kitchen and the customers then they can't make a mess in the kitchen. Hmm, must have food on the brain!
The point is, API calls are not in control of the system; they are only a means to make requests of that system. As long as the waiter delivers the same food that the chef was providing before, the customer is not going to complain. Capiche?
Let me set a challenge. I've hopefully explained a little about how this memory abstraction layer will work. I'm going to stop calling it an MMU now and call it a MAU (Memory Allocation Unit), because although I am confident my ideas will work, I don't know if this is how MMUs traditionally work.
My challenge is this. I have a program running which has got an important string "ABCDEF" stored in memory at real location "0x001" and virtual location "0x0FF". Using your knowledge of the Amiga's API architecture, I want you to try and corrupt this data. I am confident that whatever you suggest can be dealt with without breaking AmigaOS compatibility, so try and prove me wrong.
Task A stores "ABCDEF" at 0x0FF.
Task A sends a message to Task B.
The message simply contains 0x0FF.
Task B looks at Address 0x0FF and it doesn't find "ABCDEF".
Reason:
0x0FF refers to two different physical memory locations in the two separate tasks; in fact Task B might even have some code at its 0x0FF, or anything!!!
The OS has no idea that physical address 0x001 has to be mapped to the same place in both tasks... in fact it might not even be possible to map that physical address to the same virtual address!
Neither of the tasks can ever know the physical address, as their 0x001 is mapped somewhere else... different for each task!!!
-
bloodline wrote:
Hans!! There is nothing wrong with the message system, the problem is what is sent in those messages! The OS doesn't know what they contain and they could contain anything! :-)
I agree. The thing that would change in a system with full MP is that programmers will have to allocate memory appropriately and be more careful about what's in the messages.
Hans
-
HenryCase wrote:
AJCopland wrote:
Wow you can do that? (note: this is sarcasm)
Think of it like HD defragmentation if it helps you understand it.
Oh, I understand what you're saying perfectly clearly. An old version of an arcade game conversion that I worked on used a shared pool of memory instead of doing full-blown networking (don't ask, 'cos I can't tell you which). Each machine was given a physical ID (using jumpers) and this told it what addresses it was allowed to write into, and conversely what addresses other machines on the "network" would be writing into.
This situation is similar to the AmigaOS how? you ask. Well, with AOS you've got a linear address space, which means that to program#1 a pointer with the address 0x0000FEED is the same as program#2's address of 0x0000FEED.
Now at this point my knowledge of Workbench/AOS gets wonky, 'cos I've never done any coding on it.
The above means that programs could, and probably did, pass around pointer addresses to data rather than the actual data itself, because since nothing ever set or respected the memory protection flags, it was more efficient to pass a 4-byte memory address than a larger data structure like a string.
Implementing memory protection or memory management et al at this stage would break all of those existing programs, and probably Workbench as well, because they'd no longer be able to communicate with either the underlying OS or the other programs like they used to, as each program would be within its own 4GB protected memory space, and so the pointer addresses that it was merrily passing around would no longer match those of the program on the receiving end.
EDIT: The relevance of the first paragraph I wrote is that if those arcade machines' OS had memory protection then they'd never get each other's messages, because whilst they'd all be accessing their _own_ versions of the correct memory addresses, the physical section of memory that they'd be writing to would be different from the physical address that the other machines were reading from :EDIT
Now that might all be bollox because I've had a long day.
I also just realised that this was the ColdFIRE thread and I only came in here in the hope that someone would be commenting on THAT topic. So I apologise profusely to everyone for continuing this off-topic subject.
Andy
-
bloodline wrote:
Task A stores "ABCDEF" at 0x0FF.
Task A sends a message to Task B.
The message simply contains 0x0FF.
Task B looks at Address 0x0FF and it doesn't find "ABCDEF".
Reason:
0x0FF refers to two different physical memory locations in the two separate tasks; in fact Task B might even have some code at its 0x0FF, or anything!!!
That's what I said... OK, so I was more uncouth :-D
-
AJCopland wrote:
HenryCase wrote:
AJCopland wrote:
Wow you can do that? (note: this is sarcasm)
Think of it like HD defragmentation if it helps you understand it.
Oh, I understand what you're saying perfectly clearly. An old version of an arcade game conversion that I worked on used a shared pool of memory instead of doing full-blown networking (don't ask, 'cos I can't tell you which). Each machine was given a physical ID (using jumpers) and this told it what addresses it was allowed to write into, and conversely what addresses other machines on the "network" would be writing into.
This situation is similar to the AmigaOS how? you ask. Well, with AOS you've got a linear address space, which means that to program#1 a pointer with the address 0x0000FEED is the same as program#2's address of 0x0000FEED.
Now at this point my knowledge of Workbench/AOS gets wonky, 'cos I've never done any coding on it.
The above means that programs could, and probably did, pass around pointer addresses to data rather than the actual data itself, because since nothing ever set or respected the memory protection flags, it was more efficient to pass a 4-byte memory address than a larger data structure like a string.
Implementing memory protection or memory management et al at this stage would break all of those existing programs, and probably Workbench as well, because they'd no longer be able to communicate with either the underlying OS or the other programs like they used to, as each program would be within its own 4GB protected memory space, and so the pointer addresses that it was merrily passing around would no longer match those of the program on the receiving end.
EDIT: The relevance of the first paragraph I wrote is that if those arcade machines' OS had memory protection then they'd never get each other's messages, because whilst they'd all be accessing their _own_ versions of the correct memory addresses, the physical section of memory that they'd be writing to would be different from the physical address that the other machines were reading from :EDIT
Now that might all be bollox because I've had a long day.
You are spot on. This is exactly the situation we are in with AmigaOS.
I also just realised that this was the ColdFIRE thread and I only came in here in the hope that someone would be commenting on THAT topic. So I apologise profusely to everyone for continuing this off-topic subject.
Andy
This is sort of related :-)
-
Hans_ wrote:
bloodline wrote:
You put an OS layer in between the memory allocation functions and the real memory. This MMU gives the memory allocation functions the impression that they are writing directly to memory, when in reality it is controlling memory allocation/deallocation and preventing unwanted memory access.
I don't understand... The way memory protection would work is that each task gets its own address space... in effect each task gets up to 4 GB of address space to itself. I shall explain the repercussions of this later...
A good book on how MMUs work should give you a better understanding... The Intel developer docs are a great place to start!
You don't need to create separate memory spaces to have full memory protection. In fact, now that many CPUs have 64-bit addressing, there are those who argue that a single linear address space is a good idea. Doing this would allow the existing messaging system to work fine, with the added restriction that programmers do need to mark the appropriate memory as shared.
If you have a flat VM... then you only have a single page table and you can't do per-task protection!! Sure, some pages can be read-only... but really the only thing protected in that scheme would be the OS/kernel... and that would give you a situation like Win95... shudder...
-Edit- Also, it would be very hard to implement a VM paging-to-disk system with a flat address space... we all want to be able to page memory to and from disk :-)
-Edit2- No, wait I see what you mean now! Separate spaces mapped to the same physical locations... yes that's doable, but the house keeping would be quite complex :-)
In order for Amiga OS to have full memory protection whilst still allowing old software to run, there is only one option: place the old apps in a sandbox. There the old apps can do all the naughty things that they used to get away with because the rules were never enforced. Unfortunately, this does mean significant rewrites of some OS modules such as intuition, which didn't follow the rules either.
Hans
This is very true. As Apple did!!! Sandbox out the old apps.
-
HenryCase wrote:
AJCopland wrote:
Wow you can do that? (note: this is sarcasm)
Think of it like HD defragmentation if it helps you understand it.
If you understand HD fragmentation... then you can apply that to how the MMU works... which is basically the hardware that allows the memory fragments (or pages) to be addressed as though they were ordinary RAM.
AJCopland wrote:
With all due respect Henry both bloodline and karlos are trying to explain how MMUs work from a position of knowing, at least from a code point-of-view, how they function. So some of your replies have read as being rather rude.
I'm not trying to be rude, but it's hard to maintain composure when you are being misunderstood. As I've just mentioned, lets call my idea an MAU (Memory Allocation Unit) as apposed to a MMU as I don't know if what I'm describing is your daddy's memory management system, if you catch my drift.
The problem is that the OS doesn't and can't know the mind of the programmer... The Programmer, on AmigaOS, is free to do whatever he/she wants. Thus the OS can't control what the tasks do, and they can happily break all the rules (The Memory protection) that you might want to put into place, and they suffer no ill in AmigaOS... but enforce those rules and the tasks break.
-
AJCopland wrote:
With all due respect Henry, both bloodline and Karlos are trying to explain how MMUs work from a position of knowing, at least from a code point of view, how they function. So some of your replies have read as rather rude.
Andy
Henry is just passionate, I can understand that! If I can help steer his passion then I am happy to answer his questions!
OS Design is not an easy topic, it's good that he wants to think about it and does think about it!
If he is getting frustrated, then I blame my inability to explain the situation... I will endeavour to improve my communication skills!
-Edit- I think we can also blame the AmigaOS designers for cutting corners :-D
-
HenryCase wrote:
Karlos wrote:
How do you take, say, 16 4KiB pages scattered across the 4G physical address space that an application requested and originally thought was one contiguous 64KiB lump of memory and tell it "here is where your data was" ?
By arranging it into a 64KiB lump before you give memory control back to the program.
Assuming you could do this, do you have any idea how complex an algorithm would be required to sort all the scattered physical blocks into contiguous lumps that reflect what the code originally allocated, while ensuring pointers everywhere in the system are updated? That's not even including the overhead of copying pages of memory around.
Give me an example of when you'd use a pointer to address more than one memory location so I can explain how it's done.
Any code that walks arrays, traverses containers, manipulates strings, etc.
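The simplest illustration (a sketch; every string routine looks like this inside):

#include <stddef.h>

/* Classic pointer-walking code: each iteration dereferences a fresh
   address computed from the previous one. If the OS relocated the
   underlying pages mid-walk, p would silently point at garbage. */
size_t my_strlen(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

There is no moment at which the OS could find and update p for you; it's just a number in a register.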
-
HenryCase wrote:
Kronos wrote:
@HenryCase
Problem is the OS can't decide whether one task made a legitimate request for data owned by another (let's say stuff running inside the input.device task) or if it is just some rogue pointer .....
I am not asking it to. I am asking the OS to place some limits on commands that are physically not possible, like trying to put a 6-bit instruction into a 3-bit space. It was just one example of how the memory layer would keep the system stable.
But the OS can't limit anything, because it has no idea what the Tasks are passing to each other, or what they are accessing in the system...
Kronos wrote:
Next idea is usually to just copy system structures into each task's private part of memory, but once you take a look into the interlinked (and open) system structures you'll notice that ain't practical either.
Please expand on that point so I can understand it.
On a system like Mac OS X, to alter a system structure you have to use a specific OS function; this means that Apple can mess around with how the OS works and the Tasks are none the wiser... In AmigaOS, due to its open nature, tasks tended to alter system structures directly, which means the Amiga design team couldn't alter the system under the hood... it also means they couldn't protect the OS anymore... as soon as you protect the OS, the tasks can't access the data they are used to accessing. Intuition, for example, actually documents this as an official method of working!!!! Apple would never allow anything like that!!!!
-
bloodline wrote:
On a system like Mac OS X, to alter a system structure you have to use a specific OS function; this means that Apple can mess around with how the OS works and the Tasks are none the wiser... In AmigaOS, due to its open nature, tasks tended to alter system structures directly, which means the Amiga design team couldn't alter the system under the hood... it also means they couldn't protect the OS anymore... as soon as you protect the OS, the tasks can't access the data they are used to accessing. Intuition, for example, actually documents this as an official method of working!!!! Apple would never allow anything like that!!!!
Way back when PowerPC cards were just coming out, programming guidelines were published that explicitly said something along the lines of:
In future OS structures will not be directly accessible. Use OS functions to access OS structures/data or your code will break in future versions of the OS.
The problem was, they never got round to providing all the accessor functions required.
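To make the gap concrete (a sketch; the accessor named in the comment is hypothetical, which is rather the point):

#include <intuition/intuitionbase.h>

extern struct IntuitionBase *IntuitionBase;

/* The classic, documented way: peek straight into the library base.
   An accessor like GetActiveWindow() -- exactly the sort of function
   that never got provided -- would have let the OS hide this field. */
struct Window *active_window_old_way(void)
{
    return IntuitionBase->ActiveWindow;
}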
Hans
-
Hans_ wrote:
bloodline wrote:
On a system like Mac OS X, to alter a system structure you have to use a specific OS function; this means that Apple can mess around with how the OS works and the Tasks are none the wiser... In AmigaOS, due to its open nature, tasks tended to alter system structures directly, which means the Amiga design team couldn't alter the system under the hood... it also means they couldn't protect the OS anymore... as soon as you protect the OS, the tasks can't access the data they are used to accessing. Intuition, for example, actually documents this as an official method of working!!!! Apple would never allow anything like that!!!!
Way back when PowerPC cards were just coming out, programming guidelines were published that explicitly said something along the lines of:
In future OS structures will not be directly accessible. Use OS functions to access OS structures/data or your code will break in future versions of the OS.
The problem was, they never got round to providing all the accessor functions required.
Hans
By the time the PPC cards were coming out, the OS had been out of development for 3 years... I think, accessor functions or not, that was far too little, much too late :-) Commodore cut corners with AmigaOS from the very beginning... to meet deadlines or keep performance up, I don't know... but AOS2.0 really should have fixed an awful lot more than it did!
-
@HenryCase
I think you'd probably find "Managed Code" operating systems rather interesting, the most famous being Microsoft's Singularity (http://en.wikipedia.org/wiki/Singularity_%28operating_system%29)!
These put such controls on the programmer that no memory protection hardware is needed; this offers some advantages of AmigaOS without the fragile nature of AmigaOS's open memory model! Obviously it has disadvantages too... But still, some of the ideas are likely to make it into the mainstream!
-
I've never programmed a processor or microcontroller with an MMU (unless you count PC code here, but I can't touch it), so let me ask the MMU gurus here if my way of thinking is correct:
1: I suppose a task/process must be launched by the user or by the system at first instance.
2: The launched task may launch other daughter tasks if it needs.
Now the questions:
1: Am I wrong to suppose that the memory areas will be shared only between the initial task, the system and the daughters?
2: If so, why would somebody share memory with alien tasks for other purposes than snooping?
3: How often would software do this? One per cent? Ten? One in a million? :-)
4: Could we protect the memory in such a way that the task could only see what was allocated by itself and its sub-tasks?
This could work in a way like having the Amiga/AROS just for this task, and could avoid trashing other tasks' memory.
It looks fine to me, as it would ensure that a runaway task would not mess with others.
-
AeroMan wrote:
I've never programmed a processor or microcontroller with an MMU (unless you count PC code here, but I can't touch it), so let me ask the MMU gurus here if my way of thinking is correct:
1: I suppose a task/process must be launched by the user or by the system at first instance.
2: The launched task may launch other daughter tasks if it needs.
Now the questions:
1: Am I wrong to suppose that the memory areas will be shared only between the initial task, the system and the daughters?
Depends upon the OS; Windows threads exist very much in the parent's address space... I think the same is true of Linux, though I'm sure they are more heavyweight than Windows threads.
On these two OSs I believe only threads share address space, programs don't, and they MUST communicate via carefully defined and controlled methods.
2: If so, why would somebody share memory with alien tasks for other purposes than snooping?
AmigaOS doesn't have threads... everything is a task; if you added Memory Protection these tasks should not share address space... You would really need to add a threading library (AROS has added some fields to the task structure to make them pthread-compatible*) :-)
*I would note that new functions would need to be added to exec or a new thread library to make use of them!
3: How often would software do this? One per cent? Ten? One in a million? :-)
We use Tasks as an example because it's easy to see what's going on... but all the arguments apply to Libraries, devices and in fact all system structures... The scale of the problem is huge!
Tasks and libraries are always poking around in structures they don't own.
4: Could we protect the memory in such a way that the task could only see what was allocated by itself and its sub-tasks?
As stated above, that is only a tiny part of the problem.
This could work in a way like having the Amiga/AROS just for this task, and could avoid trashing other tasks' memory.
It looks fine to me, as it would ensure that a runaway task would not mess with others.
Memory Protection will break so much you wouldn't believe it! Blame Commodore :-)
-
@thread
First of all, I'd like to thank those that have responded to my memory protection posts, I've got a lot to digest there, and will do ASAP. However, it is late so I'll concentrate on bloodline's post quoted below, as it seems the most concise.
I would also like to state that I think memory protection would be useful with Coldfire, especially if we want to move away from emulating every 68k instruction.
@bloodline
bloodline wrote:
HenryCase wrote:
My challenge is this. I have a program running which has got an important string "ABCDEF" stored in memory at real location "0x001" and virtual location "0x0FF". Using your knowledge of the Amiga's API architecture, I want you to try and corrupt this data. I am confident that whatever you suggest can be dealt with without breaking AmigaOS compatibility, so try and prove me wrong.
Task A Stores "ABCDEF" at 0x0FF.
Task A sends a message to Task B.
The message simply contains 0x0FF.
Task B looks at Address 0x0FF and it doesn't find "ABCDEF".
Reason:
0x0FF refers to two different Physical memory locations in the two separate tasks. In fact Task B might even have some code at its 0x0FF, or anything!!!
The OS has no idea that Physical address 0x0001 has to be mapped to the same place in both tasks... in fact it might not even be possible to map that physical address to the same virtual address!
Neither of the Tasks can ever know the Physical address, as their 0x0001 is mapped somewhere else... different for each task!!!
So if I understand correctly you're saying that the data cannot be read by more than one task because the OS doesn't know how to share this information?
If I have understood correctly, allow me to use your example but fix it with the MAU (yes I am still going to use that term to avoid confusion):
Task A requests to store "ABCDEF", the MAU stores it at location 0x0001 and passes back a pointer to that location through assigning a 'virtual memory space' 0x0FF.
Task A sends a message through the MAU to Task B to look at memory space 0x0FF.
Task B makes the request to look at address 0x0FF, the MAU does not see a reason to not show this information, and fetches the string "ABCDEF" to pass on to Task B.
So in terms of memory usage, what do you have? You have one location where the string is stored in memory, a simple look up table stored in memory that is controlled by the MAU, the MAU itself which provides this abstraction layer and offers opportunities to control memory usage, and the memory spaces used by the tasks.
The address 0x0FF can be called as many times as you want (please note this virtual memory value can only be assigned to one real memory space at a time), but each time the MAU will make a decision based on certain rules to determine whether that access is given. It will also be giving programs non-overlapping memory spaces to run in.
Does this explanation help?
EDIT: Should also say that I intend to separate kernel commands from the kernel itself so that its commands go through the MAU too (obviously with a high level of privileges). I thought that was worth pointing out in case you thought I was intending to do MP with the standard design Amiga kernel.
-
Thanks for the explanation Bloodline,
One more question:
Would it be possible to treat memory under three different areas like this:
1) System (system structures and stuff that wasn't previously allocated) - memory has no protection and everyone can see and modify it.
2) Code (all of it...) - only the system can play around with that. Let's sacrifice self-modifying code.
3) Data (all allocated memory) - only tasks and sub tasks may modify it.
It is not perfect, but maybe it can improve a little bit the situation. ;-)
-
HenryCase wrote:
Task A requests to store "ABCDEF", the MAU stores it at location 0x0001 and passes back a pointer to that location through assigning a 'virtual memory space' 0x0FF.
Task A sends a message through the MAU to Task B to look at memory space 0x0FF.
Task B makes the request to look at address 0x0FF, the MAU does not see a reason to not show this information, and fetches the string "ABCDEF" to pass on to Task B.
The reason that won't work is that there are no "requests to look at address 0x0FF" - the task simply does it. There's nowhere for you to "insert" a virtual->physical memory lookup.
On systems with an MMU the hardware itself does the work of changing the address but it does this in the programs _own_ virtual memory space. Which is what brings us back to the original problem that Bloodline and I have pointed out to you, i.e. that virtual memory is _private_ to that program/task/etc.
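Roughly, per access (a toy single-level model, purely illustrative -- real MMUs use multi-level tables and a TLB):

#include <stdint.h>

#define PAGE_SHIFT 12  /* 4 KiB pages */

/* What the MMU does in hardware on every load and store: index the
   current task's table with the virtual page number, then splice the
   page offset back on. */
uint32_t translate(const uint32_t *page_table, uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_SHIFT;
    uint32_t offset = vaddr & ((1u << PAGE_SHIFT) - 1);
    return (page_table[vpn] << PAGE_SHIFT) | offset;
}

Swap page_table on a task switch and the same vaddr lands somewhere completely different - which is the whole point, and the whole problem for shared pointers.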
If your MMU just let everyone read and write into the same memory space then it wouldn't be much use as an MMU, because you'd still have the same linear address space that we have NOW on AmigaOS and so you'd have no memory protection!
Your description would give things virtual memory, but it would give everyone the same virtual memory and thus it wouldn't give anything that ran in that virtual memory any memory protection.
Nice try but the system you've described doesn't fix any existing programs.
The only solution to this is to run the older Apps in a sandbox as someone else suggested and NEW apps can be run in a system friendly way using the MMU to gain memory protection.
The trouble is that this requires a rewrite of AmigaOS because AmigaOS itself is one of the main offenders for doing this kind of direct manipulation.
I suggest that if you want to continue this discussion we start a new thread.
Someone actually had an interesting topic going on here about the ColdFire and I was getting the impression that new versions V4e and V5 of the cores might actually have the necessary supervisor stacks to run 680x0 code using the cf68klib.
@FrenchShark
That asm listing you dumped earlier on, what version of the CF core was it for that made it necessary for you to emulate the entire instruction set?
Andy
-
My 2 cents,
* Memory protection is nearly impossible to implement under the idea of AMIGA OS.
* That AMIGA OS does not require memory protection gives it a VERY BIG speed boost.
* To secure a system, CPU-based memory protection can help.
But the system can still be destroyed by the BLITTER or wrongly set-up DMA channels berserking through your system.
To prevent this the OS needs to forbid the direct usage of the Blitter or userspace disk DMA. If you do this you will sacrifice a huge amount of performance.
=> This is 100% the opposite of the idea and spirit of the Amiga OS.
* I would like to point out that there are other ways to stabilize a system. 99% of crashes come from bad pointer arithmetic. You can try to reduce the harm caused by the bad pointer by enforcing memory protection (for a high cost) or you can use coding styles which will not cause this problem in the first place. I would like to point out that the Amiga Oberon programs NEVER crashed!
I agree that this topic has nothing to do with the Coldfire.
And that for continued discussion, opening another thread makes good sense.
Someone actually had an interesting topic going on here about the ColdFire and I was getting the impression that new versions V4e and V5 of the cores might actually have the necessary supervisor stacks to run 680x0 code using the cf68klib.
Yes, this is true.
@FrenchShark
That asm listing you dumped earlier on, what version of the CF core was it for that made it necessary for you to emulate the entire instruction set?
FrenchShark's post looked very interesting.
I would like to know if he is interested in working together on this project?
-
-Edit- @HenryCase I would be delighted to discuss these ideas with you, but as others have suggested please start a new thread, so we don't upset anyone :-)
Reply here:
http://www.amiga.org/forums/showthread.php?t=35580
HenryCase wrote:
@thread
First of all, I'd like to thank those that have responded to my memory protection posts, I've got a lot to digest there, and will do ASAP. However, it is late so I'll concentrate on bloodline's post quoted below, as it seems the most concise.
Actually, I think AJCopland's games console story is probably the easiest to understand...
I would also like to state that I think memory protection would be useful with Coldfire, especially if we want to move away from emulating every 68k instruction.
@bloodline
bloodline wrote:
HenryCase wrote:
My challenge is this. I have a program running which has got an important string "ABCDEF" stored in memory at real location "0x001" and virtual location "0x0FF". Using your knowledge of the Amiga's API architecture, I want you to try and corrupt this data. I am confident that whatever you suggest can be dealt with without breaking AmigaOS compatibility, so try and prove me wrong.
Task A Stores "ABCDEF" at 0x0FF.
Task A sends a message to Task B.
The message simply contains 0x0FF.
Task B looks at Address 0x0FF and it doesn't find "ABCDEF".
Reason:
0x0FF refers to two different Physical memory locations in the two separate tasks. In fact Task B might even have some code at its 0x0FF, or anything!!!
The OS has no idea that Physical address 0x0001 has to be mapped to the same place in both tasks... in fact it might not even be possible to map that physical address to the same virtual address!
Neither of the Tasks can ever know the Physical address, as their 0x0001 is mapped somewhere else... different for each task!!!
So if I understand correctly you're saying that the data cannot be read by more than one task because the OS doesn't know how to share this information?
No, The OS can easily put in place measures to share data... Due to the design of AmigaOS, the OS cannot know what data needs to be shared. The systems put in place to provide this information at Exec's design phase were never enforced, and thus ignored... Not only by Developers but by the OS design team themselves!
If I have understood correctly, allow me to use your example but fix it with the MAU (yes I am still going to use that term to avoid confusion):
The problem is CPUs don't come with MAUs... they come with MMUs :-(
Task A requests to store "ABCDEF", the MAU stores it at location 0x0001 and passes back a pointer to that location through assigning a 'virtual memory space' 0x0FF.
But what is an MAU? Is it a piece of hardware? If so how on earth does it work? Where do we get one from?
Is it a piece of software? If so, are you really suggesting that we trap every single memory access made by a program on the off chance that something written to memory might need to be shared? This would be so slow as to make the whole system pointless.
Task A sends a message through the MAU to Task B to look at memory space 0x0FF.
Task B makes the request to look at address 0x0FF, the MAU does not see a reason to not show this information, and fetches the string "ABCDEF" to pass on to Task B.
As I have pointed out, as far as Task B is concerned 0x0FF does not relate to the same memory as what Task A sees as 0x0FF. Each task has its own Address space... With Memory protection, each task effectively has its own computer... It has its own OS, its own CPU and its own memory; no other tasks exist in its universe... that is why memory protection is so safe.
Think of the Address space as being an island (imagine that the island is divided into 9 equal squares; this gives us a good 3x3 mental image, like a noughts and crosses board), and the Task as a small man living on that island.
The tasks can talk by sending messages... message in a bottle if you like!
In AmigaOS, Both Tasks (little men) live on the same Island, if Task A puts something in square one, then Task B can find that data in square one, since Square one is the same for both tasks.
With Memory Protection:
Each Task (little man) lives in its own address space (on its own little island). When Task A puts something in Square one on its own island, that data doesn't magically appear in square one on Task B's island... In fact Task B probably has something different in square one! Neither Task can ever know the layout of the other's island.
The reason for this is the way AmigaOS is designed... In operating systems that have had Memory Protection from the start, the developers have had to make sure that data that needs to be shared is shared via an operating-system-mandated method. In AmigaOS this was never necessary, and thus Tasks do not provide the OS with the information it needs to allow memory to be shared.
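Here is the shape of the code that makes it unfixable; a minimal sketch using the real exec.library calls (the MyMsg layout is invented for the example, the calls are not):

#include <exec/ports.h>
#include <proto/exec.h>

struct MyMsg {
    struct Message msg;   /* standard Exec message header      */
    char *data;           /* raw pointer into the sender's RAM */
};

/* Task A: ship a pointer to Task B. */
void send_it(struct MsgPort *dest, struct MsgPort *reply, char *buffer)
{
    struct MyMsg m;
    m.msg.mn_Length    = sizeof(m);
    m.msg.mn_ReplyPort = reply;
    m.data = buffer;            /* only meaningful in A's address space */
    PutMsg(dest, &m.msg);
    WaitPort(reply);            /* don't let m go out of scope early */
    GetMsg(reply);
}

/* Task B: use the pointer as if it were its own. */
void receive_it(struct MsgPort *port)
{
    struct MyMsg *in;
    WaitPort(port);
    in = (struct MyMsg *)GetMsg(port);
    /* Dereferencing in->data ASSUMES one shared address space;
       nothing here tells the OS that "data" must be visible to B. */
    ReplyMsg(&in->msg);
}

To the OS the pointer is just 4 bytes of message payload; it cannot tell it apart from an integer.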
So in terms of memory usage, what do you have? You have one location where the string is stored in memory, a simple look up table stored in memory that is controlled by the MAU, the MAU itself which provides this abstraction layer and offers opportunities to control memory usage, and the memory spaces used by the tasks.
What you are suggesting is a look-up table that records every memory access on the off chance that it might need to be shared... which is a massively resource-heavy undertaking... both CPU cycles and Memory... and even then HOW does the OS actually know that when Task B refers to a particular memory location it needs to be shared with another task? The tasks are not required by AmigaOS to let anyone know what they are up to!
What you are failing to see is that you can only second-guess what's going on because you know all sides of the equation... Task A and Task B only care about each other and expect to be able to mess around with each other's data structures. And the operating system (with memory protection) only cares about keeping tasks from messing around with each other's data structures.
With the design of AmigaOS there has never been the need to let the OS know that it's OK for task A to interfere with Task B... Thus no software has ever provided the OS with the information it needs to allow the tasks to Share Memory, in a Memory Protected environment.
The address 0x0FF can be called as many times as you want (please note this virtual memory value can only be assigned to one real memory space at a time), but each time the MAU will make a decision based on certain rules to determine whether that access is given.
But this is just it, programs written for AmigaOS do not provide enough information to the OS/MMU/MAU to allow it to make a decision as to what is shared and what is private.
It will also be giving programs non-overlapping memory spaces to run in.
Does this explanation help?
EDIT: Should also say that I intend to separate kernel commands from the kernel itself so that its commands go through the MAU too (obviously with a high level of privileges). I thought that was worth pointing out in case you thought I was intending to do MP with the standard design Amiga kernel.
It's easy to design an OS with Memory Protection, there are numerous examples... what we have with AmigaOS is a system that has a 25 year development legacy of no Memory Protection, All existing programs assume this to be the case. You add memory protection and you have put them into a new environment that is no longer AmigaOS and they won't run.
-
In FreeMiNT (ok, I'm from the other side of the fence, sorry), it's possible to specify the level of protection for each process through dedicated flags in the program header. In practice, this means that you can flag two apps as belonging to "global" memory, causing them to share the same address space. These two applications can then access each other's memory freely. (Ok, it's not considered perfectly clean, but it provides some degree of compatibility for older applications.) Well-behaved applications can run as "private", and their memory can't be touched, nor can they touch memory belonging to other processes.
Couldn't this approach be used in AmigaOS as well?
-
shoggoth wrote:
In FreeMiNT (ok, I'm from the other side of the fence, sorry), it's possible to specify the level of protection for each process through dedicated flags in the program header. In practice, this means that you can flag two apps as belonging to "global" memory, causing them to share the same address space. These two applications can then access each other's memory freely. (Ok, it's not considered perfectly clean, but it provides some degree of compatibility for older applications.) Well-behaved applications can run as "private", and their memory can't be touched, nor can they touch memory belonging to other processes.
Couldn't this approach be used in AmigaOS as well?
Reply in new thread!
http://www.amiga.org/forums/showthread.php?t=35580
-
biggun wrote:
My 2 cents,
* Memory protection is nearly impossible to implement under the idea of AMIGA OS.
So Apple fans don't regard Mac OS X as Mac(OS)?
* That AMIGA OS does not require memory protection gives it a VERY BIG speed boost.
What is the performance boost useful for when you will *not* use it for anything meaningful, when the slightest bug in any running task can destroy, say, the CD/DVD I was burning? Now the CD/DVD is just useless and destined for the garbage can, and I'm loading XP to do it the *safe* way, sheesh!
* To secure a system, CPU-based memory protection can help.
But the system can still be destroyed by the BLITTER or wrongly set-up DMA channels berserking through your system.
To prevent this the OS needs to forbid the direct usage of the Blitter or userspace disk DMA. If you do this you will sacrifice a huge amount of performance.
Read my reply above.
=> This is 100% the opposite of the idea and spirit of the Amiga OS.
The spirit gets its rear end kicked by spiritless OSs!
* I would like to point out that there are other ways to stabilize a system. 99% of crashes come from bad pointer arithmetic. You can try to reduce the harm caused by the bad pointer by enforcing memory protection (for a high cost) or you can use coding styles which will not cause this problem in the first place. I would like to point out that the Amiga Oberon programs NEVER crashed!
It's like saying we don't need Police Departments; if only people behaved then we could get rid of 'em and gain an economic boost, but unfortunately this is not reality.
I agree that this topic has nothing to do with the Coldfire.
And that for continues discussion opening another thread makes good sense.
Well, claims need to be answered, on the spot, sorry about that.
-
Einstein wrote:
* I would like to point out that there are other ways to stabilize a system. 99% of crashes come from bad pointer arithmetic. You can try to reduce the harm caused by the bad pointer by enforcing memory protection (for a high cost) or you can use coding styles which will not cause this problem in the first place. I would like to point out that the Amiga Oberon programs NEVER crashed!
It's like saying we don't need Police Departments; if only people behaved then we could get rid of 'em and gain an economic boost, but unfortunately this is not reality.
Your post clearly shows that you are NOT understanding the concept of AMIGA OS.
Memory protection on CLASSIC AMIGA is impossible and useless! That's a fact which is obvious to those with some programming experience.
If you want to argue about useless things ok.
But please use another thread for this
-
biggun wrote:
Your post clearly shows that you are NOT understanding the concept of AMIGA OS.
What makes you think that? Trust me, I know enough to understand that *one cannot just add MP to exec while leaving the (future) userland modules in the same condition*; it seems you have not been reading my posts.
And FWIW, I have read Andrew Tanenbaum's Modern Operating Systems. Let's not post strange replies out of frustration :)
Memory protection on CLASSIC AMIGA is impossible and useless! That's a fact which is obvious to those with some programming experience.
Great, I just started programming an hour ago, happy now ? no more frustration ?
Yes, MP on classic Amiga as in A500 is impossible as the 68000 lacks an MMU. If you mean classic AmigaOS though (you should have added "OS" in that case) then it's impossible also, unless you rewrite *and* sandbox the whole environment including the applications (preferably UAE-style if one wants to run HW-banging stuff; note HW banging does not necessarily imply apps that bypass the OS at boot time, but those that do it *while in the OS*)
If you want to argue about useless things ok.
But please use another thread for this
What useless things? The only thing useless is AmigaOS' glorified performance BS and your ridiculous claims.
-
It's not about glorifying. It's about facts!
For "proper" protection you need to sandbox and abstract all and everything.
This means no direct access to Blitter or any HW anymore!
This is the opposite direction of the original AMIGA and the NATAMI or Coldfire designs.
Asking for memory protection makes only sense for people and systems that are willing to trade a lot of performance for abstraction.
Full memory protection => No more direct access to blitter.
A Coldfire with SuperAGA will be very fast but
if you ask for MP then it will crawl.
Please understand that Coldfire and MP are mutually exclusive.
If you want to argue for MP DON'T do it in the Coldfire thread.
-
biggun wrote:
It's not about glorifying.
It's about facts.
I don't consider an airplane elegant if something as small as a fly sneezes at it causing the plane to go down taking all the passengers with it.
For "proper" protection you need to sandbox and abstract all and everything.
This means no direct access to Blitter anymore!
Yup, thats one of the points.
This is the opposite direction of the NATAMI or Coldfire designs.
The issue isn't what NATAMI project wants to achieve, it can be a refrigerator as far as I'm concerned, my "useless" posts were simply aimed at a claim.
A Coldfire with SuperAGA will be very fast but
if you ask for MP then it will crawl.
I'm not asking anything regarding ColdNAT, read above.
Please understand that Coldfire and MP are mutually exclusive.
If you want to argue for MP DON'T do it in the Coldfire thread.
Just counter-arguments to your claims. If you claim something, you cannot seriously demand not to receive a reply; are you kidding me?
And please stop typing in bold, there you have a useless action.
Thank you, and good luck with your project.
-
biggun wrote:
Someone actually had an interesting topic going on here about the ColdFire and I was getting the impression that new versions V4e and V5 of the cores might actually have the necessary supervisor stacks to run 680x0 code using the cf68klib.
Yes, this is true.
Have you got your developer board yet so that this could be tested in some way? I'm not initially interested in the performance, merely in the compatibility of the actual V4e/V5 cores. Speed can come later for me; I doubt that at 266MHz it'll be much slower than an 040/40.
For speed, for the games that won't run or are just unplayable, I'd be willing to learn ColdFire/68k asm again just to contribute WHDLoad-based fixes! :-D
(I can already do some MIPS and x86 but it's been almost 15 years since I did any serious 68k! :-o )
@FrenchShark
That asm listing you dumped earlier on, what version of the CF core was it for that made it necessary for you to emulate the entire instruction set?
FrenchShark's post looked very interesting.
I would like to know if he is interested in working together on this project?
Yeah I'm curious about what the listing was for if the core is now meant to be so close to being compatible.
I hasten to add that I'm not saying it isn't necessary; I'm just wondering what it is that makes full emulation of the 68k necessary on the Coldfire.
Andy
-
@ biggun
Some posts ago, before the discussions about MP and related stuff, I made a post asking about the feasibility of using many Coldfires connected together to speed things up....
The idea is to make it scalable, so it could go from a really small and cheap device up to a desktop, with some software compatibility between them
Do you think it is possible ? (please, check my post...)
-
AeroMan wrote:
@ biggun
Some posts ago, before the discussions about MP and related stuff, I made a post asking about the feasibility of using many Coldfires connected together to speed things up....
The idea is to make it scalable, so it could go from a really small and cheap device up to a desktop, with some software compatibility between them
Do you think it is possible ? (please, check my post...)
I don't think Coldfire has support for Multiprocessing, so you'd have to build some complex arbitration hardware and software... Certainly AmigaOS doesn't support multiprocessor environments...
-
@BigGun, AJCopland
Hello,
The software is intended for the 5282 Coldfire, which has a USP and SSP like the V4 even though it is a V2 core.
The CPU only has 60 MIPS of computing power.
My idea is to emulate all the instructions for maximum compatibility and to rewrite the OS in native ColdFire.
Then, the memory must be divided in two: one part for 68k applications (running with the emulator), one part for CF applications. The CF and 68k areas are created by using the special debugging registers, no need for an MMU.
I already rewrote exec.library and partly timer.device in native ColdFire (exec multitasking needs timer to work).
As I said in some previous posts, the biggest issue seems to be the non-atomic access with a CF to IDNestCnt (TDNestCnt is not an issue since it is updated by the taskswitching routine).
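For anyone wondering why that counter is the sticking point, roughly (a hypothetical C rendering of just the nesting half; the real Disable() also touches INTENA):

#include <exec/execbase.h>

extern struct ExecBase *SysBase;

/* On a 68k this increment compiles to a single ADDQ.B to memory,
   which is atomic with respect to interrupts. ColdFire arithmetic
   is longword-only, so the same line becomes byte load, register
   add, byte store -- and an interrupt landing between the load and
   the store corrupts the nest count. */
void disable_nest_sketch(void)
{
    SysBase->IDNestCnt++;
}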
I do not want to make any commitment about helping since I do not have a lot of time.
I am also doing some FPGA development; I have a nice idea to make a killing 2D video pipeline with up to 8 independent layers, colorspace conversion, fully configurable DMA scheduler, etc...
Regards,
Frederic
-
FrenchShark wrote:
@BigGun, AJCopland
Hello,
The software is intended for the 5282 Coldfire, which has a USP and SSP like the V4 even though it is a V2 core.
The CPU only has 60 MIPS of computing power.
I think you and we are working in the same direction, towards the same goal.
Our goal is to integrate the SuperAGA and a CPU into one single Chip.
This single chip will reduce cost and enable people to again produce really cheap AMIGAs.
We are evaluating the Coldfire right now.
If we are happy with the Coldfire evaluation then we want to combine the SuperAGA + a new Coldfire (700 MIPS) into one chip.
Thanks to Genesi's help we have relatively fast V4 development systems that we use for testing right now. The Natami will come shortly. You can "kludge" the Coldfire board onto the Natami - then it will look like a CPU expansion card to the Natami. This makes a good system to evaluate AMIGA OS running on Coldfire combined with a very high-performing AGA chipset.
If you are interested in joining this project then tell us.
We could get you a V4 board.
Cheers
Gunnar
-
@ FrenchShark
Let me see if I understand what you are saying. You are planning to re-write AOS natively for coldfire ?
-
JJ wrote:
@ FrenchShark
Let me see if I understand what you are saying. You are planning to re-write AOS natively for coldfire ?
If so, I don't think he could sell it... Otherwise he could license the 'C' sources that Hyperion own and just recompile for CF.
Why not recompile AROS for CF and see how that runs native 68k apps... :) and I don't mean on Amiga hardware but on these CF developer boards...
-
@BigGun:
Full memory protection => No more direct access to blitter.
A Coldfire with SuperAGA will be very fast but
if you ask for MP then it will crawl.
Sorry to pick this up one more time, but I suppose access to the blitter (if it went wrong) would not crash the system but only produce corrupted graphic output, and even that only till the next refresh. Access to SuperAGA (and other output peripherals) could be treated as an exception outside of memory protection, and in this case we would not need to sacrifice that much speed.
I am not at all familiar with hardware and software engineering, so treat this post only as a noob's idea. No need to get upset about it :)
-
wawrzon wrote:
@BigGun:
Full memory protection => No more direct access to blitter.
A Coldfire with SuperAGA will be very fast but
if you ask for MP then it will crawl.
Sorry to pick this up one more time, but I suppose access to the blitter (if it went wrong) would not crash the system but only produce corrupted graphic output, and even that only till the next refresh. Access to SuperAGA (and other output peripherals) could be treated as an exception outside of memory protection, and in this case we would not need to sacrifice that much speed.
I am not at all familiar with hardware and software engineering, so treat this post only as a noob's idea. No need to get upset about it :)
No, the whole Chip RAM (which includes the custom chip register space) would have to be outside of any Task address space; only the OS would be allowed to touch that! The alternative is to map the Chip RAM address space (which I believe is the lower 8 meg) to every task, and that defeats the purpose of Memory Protection in the first place!
-
biggun wrote:
This means no direct access to Blitter or any HW anymore!
This is the opposite direction of the original AMIGA and the NATAMI or Coldfire designs.
I don't think that direct access to the Blitter was what Commodore wanted. IIRC, they begged people to NOT bang the hardware directly (after giving them the HW reference manuals :lol:). I'd suggest creating APIs for the NatAMI hardware; not necessarily for MP purposes, but to avoid HW conflicts, and to save people from having to write duplicate code unnecessarily.
Hans
-
Hans_ wrote:
biggun wrote:
This means no direct access to Blitter or any HW anymore!
This is the opposite direction of the original AMIGA and the NATAMI or Coldfire designs.
I don't think that direct access to the Blitter was what Commodore wanted. IIRC, they begged people to NOT bang the hardware directly (after giving them the HW reference manuals :lol:). I'd suggest creating APIs for the NatAMI hardware; not necessarily for MP purposes, but to avoid HW conflicts, and to save people from having to write duplicate code unnecessarily.
Hans
Why not simply have a CustomChip.library (obviously a better name would need to be chosen), which has functions for reading or writing to the chipset registers... that would act as a simple and suitable HAL...
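Something as small as this would do for a first cut (a sketch; all the names are invented):

#include <exec/types.h>

/* Hypothetical CustomChip.library-style calls: a thin wrapper over
   the register file instead of programs poking absolute $DFFxxx
   addresses directly. */
static volatile UBYTE *const custom_base = (volatile UBYTE *)0xDFF000;

void CC_WriteReg(UWORD offset, UWORD value)
{
    *(volatile UWORD *)(custom_base + offset) = value;
}

UWORD CC_ReadReg(UWORD offset)
{
    return *(volatile UWORD *)(custom_base + offset);
}

Today these compile down to the same pokes, but locking, range checks or a remapped register file could later go behind the call without breaking callers.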
-
Question was: Can trashing Chipmem trash the system?
Answer: Yes.
There is more than pictures in chipmem.
For example, there are copper lists in chipmem.
If you trash the copper list then the copper could even *fuck* up the Disk DMA.
In chipmem there could be IO buffers where disk-IO reads/writes too.
Why not simply have a CustomChip.library (obviously a better name would need to be chosen), which has functions for reading or writing to the chipset registers... that would act as a simple and suitable HAL...
The problem is not the access alone.
Example:
You have a program which has an IO buffer at address $10000.
The program calls a disk function to load data into this buffer, but the program has a typo in this call and asks to load to address $40000 instead.
"Kabunga!" Your IO call now screws up the program at this address.
Solution: The IO routine needs to have a list of all memory blocks belonging to your program; it needs to compare your address and length with all entries in this list to verify that your DMA request will actually go to your block.
Will this be fast?
Same example:
You want to use the blitter to draw ONE! char on the screen.
Same risk: instead of blitting into your buffer you could send a wrong address and the blitter will blit onto someone else's memory.
Solution:
The blit function will now need to compare your address pointer and length with all memory blocks belonging to your program.
Effect: Checking if the pointers are in legal ranges takes a lot of time. Much more time than using the blitter saves.
The original AMIGA allowed usage of DMA and Blitter to accelerate the system.
If you need to check ranges for each blitter call the overhead will be HUUGEE!
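In code, the validation being objected to looks something like this, run on every single blit or DMA request (structures invented for the sketch):

#include <stdint.h>

/* One node per memory block the calling task owns. */
struct MemBlock {
    struct MemBlock *next;
    uint32_t base, len;
};

/* Walk the whole ownership list before letting the transfer start. */
int dma_range_ok(const struct MemBlock *owned, uint32_t addr, uint32_t len)
{
    for (; owned != NULL; owned = owned->next)
        if (addr >= owned->base && addr + len <= owned->base + owned->len)
            return 1;   /* request fits inside one owned block */
    return 0;           /* reject: it would land in someone else's RAM */
}

Doing that list walk just to blit one character is the overhead in question.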
-
biggun wrote:
Question was: Can trashing Chipmem trash the system?
Answer: Yes.
There is more than pictures in chipmem.
For example, there are copper lists in chipmem.
If you trash the copper list then the copper could even *fuck* up the Disk DMA.
In chipmem there could be IO buffers where disk-IO reads/writes too.
Why not simply have a CustomChip.library (obviously a better name would need to be chosen), which has functions for reading or writing to the chipset registers... that would act as a simple and suitable HAL...
The problem is not the access alone.
Example:
You have a program which has an IO buffer at address $10000.
The program calls a disk function to load data into this buffer, but the program has a typo in this call and asks to load to address $40000 instead.
"Kabunga!" Your IO call now screws up the program at this address.
Solution: The IO routine needs to have a list of all memory blocks belonging to your program; it needs to compare your address and length with all entries in this list to verify that your DMA request will actually go to your block.
Will this be fast?
Same example:
You want to use the blitter to draw ONE! char on the screen.
Same risk: instead of blitting into your buffer you could send a wrong address and the blitter will blit onto someone else's memory.
Solution:
The blit function will now need to compare your address pointer and length with all memory blocks belonging to your program.
Effect: Checking if the pointers are in legal ranges takes a lot of time. Much more time than using the blitter saves.
The original AMIGA allowed usage of DMA and Blitter to accelerate the system.
If you need to check ranges for each blitter call the overhead will be HUUGEE!
Yeah, I didn't mean for memory protection; as you already know, and as I already pointed out in my email, you would need the OS to mediate all hardware access if you want Memory Protection! What I meant was a simple software-level arbiter to allow low-level access but with a bit more flexibility... locking registers, allowing slight hardware changes without worrying about compatibility problems etc. Simple stuff... but I know, it goes against your hardware banging :-)
-
call it DirectAGA, lol
Anyway, I don't think the goal of NatAmi is to move the OS forward, but to move the hardware forward.
The goal is to have a fast 3.9 system... All this talk of memory protection is OT imho...
-
@all
I am sorry for the mess I instigated in this thread. I will be making further MP comments in the designated thread, and I recommend everyone else does the same. You can still quote from this thread if you wish.
Back to the Coldfire discussion. I think we should be looking to get help with implementing full 68k compatibility on Coldfire. FrenchShark obviously has skills in this area, and I'm sure Oli-HD has useful knowledge too. Thomas and Gunnar are busy enough without adding more to their workload. I've been looking at projects on the Internet where people have tried to move operating systems from 68k to Coldfire. The two projects I've found so far are:
Debian/68k
http://www.nabble.com/debian-68k-f12565.html
Atari Coldfire Project
http://acp.atari.org/
Would anyone object to me attempting to make contact and getting others involved with CFNatami?
-
biggun wrote:
For "proper" protection you need to sandbox and abstract all and everything.
This means no direct access to Blitter or any HW anymore!
This is the opposite direction of the original AMIGA and the NATAMI or Coldfire designs.
Asking for memory protection makes only sense for people and systems that are willing to trade a lot of performance for abstraction.
Full memory protection => No more direct access to blitter.
A Coldfire with SuperAGA will be very fast but
if you ask for MP then it will crawl.
I don't think this is true. It is entirely dependent on how you organise the system.
What you want to do is allow both an MP system and a classic system to run side by side but not allow the Classic OS to wreck the rest of the system. You can use the MMU to contain Classic within a fixed area, you can also give it direct access to the chipset - the cost of going through the MMU is basically zero so it may as well be direct.
The only real problem is what happens if the Classic OS writes to the chipset while you are using the non-classic OS. It could then mess up your display.
This however can be fixed by making the new chipset "modal", that is you restrict certain features depending on which OS you are using. Classic will see the chipset, the MP OS will see something slightly extra.
Basically what I'm thinking of is setting aside a memory area specifically for the use of the MP mode; the MP OS's display port gets put in there and the Classic OS can't use the chipset to write to it - this should be pretty easy to implement in the FPGA (a control bit, a comparator and a couple of address registers which hold the range you don't want it to write to).
There is the matter of other I/O ports but these are slow anyway so abstracting them will make no difference.
It's not a perfect solution but it'd give you full graphics speed in both modes.
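Modelled in C, the whole mechanism is this small (a sketch; the names are invented):

#include <stdint.h>

/* The FPGA-side guard: one mode bit plus one protected range. In
   hardware this is a comparator evaluated on every chipset write;
   shown as a function for clarity. */
struct ChipGuard {
    uint32_t lo, hi;     /* range reserved for the MP OS's display  */
    int classic_mode;    /* 1 = Classic OS active, 0 = MP OS active */
};

int chipset_write_allowed(const struct ChipGuard *g, uint32_t addr)
{
    if (!g->classic_mode)
        return 1;                         /* MP OS polices itself */
    return addr < g->lo || addr > g->hi;  /* Classic is fenced out */
}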
-
biggun wrote:
Question was: Can trashing Chipmem trash the system?
Answer: Yes.
There is more than pictures in chipmem.
For example, there are copper lists in chipmem.
If you trash the copper list then the copper could even *fuck* up the Disk DMA.
In chipmem there could be IO buffers where disk-IO reads/writes too.
...
The original AMIGA allowed usage of DMA and Blitter to accelerate the system.
If you need to check ranges for each blitter call the overhead will be HUUGEE!
You could do the same trick in the copper list, DMA and blitter as is done by the MMU. You could design the SuperAGA chipset in such a way that the OS can tell the chipset which actions are allowed for certain tasks, just like the MMU allows limiting access to certain memory areas. This would normally not impact speed as the check is done in parallel. Only when some non-allowed action is requested is an exception raised, just like an exception is raised by the MMU when a wrong memory address is accessed.
This would work best when the SuperAGA and CPU are integrated in one chip.
greets,
Staf.
-
You could do the same trick in the copper list, DMA and blitter as is done by the MMU. You could design the SuperAGA chipset in such a way that the OS can tell the chipset which actions are allowed for certain tasks, just like the MMU allows limiting access to certain memory areas. This would normally not impact speed as the check is done in parallel. Only when some non-allowed action is requested is an exception raised, just like an exception is raised by the MMU when a wrong memory address is accessed.
This would work best when the SuperAGA and CPU are integrated in one chip.
Don't even have to do that - address range restrictions could be imposed by including an MMU of sorts in the FPGA. Hit specific memory areas and an exception could be raised on the Coldfire, the MP OS would then kick in and handle it - i.e. the MP OS could handle things like the disc I/O.
BTW it doesn't necessarily need to be an MP OS, it could be, say, AROS; the aim is just to ensure the Classic OS couldn't mangle the second OS's memory or display, without affecting the speed of the Classic OS.
-
minator wrote:
You could do the same trick in the copper list, DMA and blitter as is done by the MMU. You could design the SuperAGA chipset in such a way that the OS can tell the chipset which actions are allowed for certain tasks, just like the MMU allows limiting access to certain memory areas. This would normally not impact speed as the check is done in parallel. Only when some non-allowed action is requested is an exception raised, just like an exception is raised by the MMU when a wrong memory address is accessed.
This would work best when the SuperAGA and CPU are integrated in one chip.
Don't even have to do that - address range restrictions could be imposed by including an MMU of sorts in the FPGA. Hit specific memory areas and an exception could be raised on the Coldfire, the MP OS would then kick in and handle it - i.e. the MP OS could handle things like the disc I/O.
No offense, but these proposals are frankly total nonsense.
A Blitter that has an MMU will be extremely expensive to build and its performance will be disappointing.
Using the Blitter will have a huge overhead as your MMU Blitter table will need to be replaced when switching tasks.
Please get a clue about the HW costs of implementing an MMU and the overhead of maintaining the MMU tables by the OS.
-
biggun wrote:
No offense, but these proposals are frankly total nonsense.
A Blitter that has an MMU will be extremely expensive to build and its performance will be disappointing.
Using the Blitter will have a huge overhead as your MMU Blitter table will need to be replaced when switching tasks.
Please get a clue about the HW costs of implementing an MMU and the overhead of maintaining the MMU tables by the OS.
I don't think that the performance will be as bad as you claim. The MMU table could be quite coarse, and would only cover chip RAM. Swapping the tables would only have to be done for tasks that actually use the blitter. Probably just changing the base-pointer to the table would be best; the table itself only needs to be checked if the blitter is actually used. So you don't have to swap the entire table every time.
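For scale, a sketch with assumed numbers (2 MB of chip RAM in 4 KB pages is 512 bits, i.e. 64 bytes of table per task):

#include <stdint.h>

#define CHIP_RAM   (2UL * 1024 * 1024)   /* assumed chip RAM size */
#define PAGE_SIZE  4096UL
#define CHIP_PAGES (CHIP_RAM / PAGE_SIZE)

/* One bit per chip-RAM page: may the current task's blits touch it? */
struct ChipPerms {
    uint8_t bits[CHIP_PAGES / 8];
};

/* The chipset checks against whatever table this points at;
   a task switch just swaps the pointer. */
static const struct ChipPerms *current;

int blit_page_ok(uint32_t addr)
{
    uint32_t page = addr / PAGE_SIZE;
    return (current->bits[page / 8] >> (page % 8)) & 1;
}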
Having said that, I'm pretty sure that PCI cards can DMA to memory that a particular task doesn't own. The solution, of course, is to provide drivers that do the access. In the case of the Amiga Blitter, having all functions contain pointers to bitmap structures and thus, enforcing blitter within those bitmaps only, would be an easier, and more efficient, solution. This way you could use the CPU's MMU to check that the bitmaps are accessible by the current task.
To be honest, I think that MP can help programmers detect more bugs. I find OS4's current level of protection very useful, as it catches mistakes that would have gone undetected on OS3.x and lower.
Hans
-
... and back to Coldfire. Where did the Coldfire discussion leave off before the MP tangent?
Hans
-
i don't get it.
and i'll be the first to put my hand up and say so.
68k can be emulated,
coldfire can be emulated,
PPC can even be emulated.
so. why bother with a coldfire core at all? why not go for a system-on-a-chip or x86 based mobile core design?
readily available parts, core2duo mobile, bit of ddr2 so-dimm action, onboard USB,firewire,ethernet,gfx,audio, PCI hook up to mate up to an amiga's CPU slot, emulate a 68k and PPC on the x86. we've apparently already got the software to do the job.
bish bash bosh, job's a good'un.
£250 to you guv'ner.
done and done.
that is the only way, except using real 68k chips, i can see a new accelerator seeing light of day going forward from this point.
:-?
-
To be honest, I think that MP can help programmers detect more bugs. I find OS4's current level of protection very useful, as it catches mistakes that would have gone undetected on OS3.x and lower.
Hans
As I said before. UAE is great for development.
You can patch UAE to get your MMU feature for no money.
Then you can use this feature during development to detect bugs.
Adding an MMU to real HW would increase the price of the device significantly!
If you add the MMU into the HW you will double the costs and lower performance on the final device.
A clever solution might be to use a free UAE version for the testing. This development UAE can include all the bells and whistles of an emulated MMU. It will allow you to detect and fix your bugs early.
-
A Blitter that has an MMU will be extremely expensive to build and its performance will be disappointing.
Using the Blitter will have a huge overhead as your MMU Blitter table will need to be replaced when switching tasks.
Please get a clue about the HW costs of implementing an MMU and the overhead of maintaining the MMU tables by the OS.
You seem to have an odd idea of just how big and slow MMUs are - they're neither big nor slow. They're so small in fact they can be found in almost every processor except the very smallest microcontrollers.
They might be big in high end PC processors but I'm not talking about those.
I'm talking about a much simpler mechanism that checks for writes into forbidden memory areas. There'll be a small number of these at most and the permissions can be represented by a single bit (i.e. 0 = Yes, 1 = No).
It can be set up like an MMU with (very simple) tables, but there's no need for virtual memory or anything like that.
The cost of switching tables on a task switch is zero - you won't need to switch tables on tasks and you won't even need to change them on switching OSs; you just need a single bit to represent the mode (MP OS or classic) and you just switch it. The blitter then gets access to everything - the MP OS can look after its own memory...
-
biggun wrote:
Adding MMU to real HW would increase the price of the device significant!
If you add the MMU into the HW you will double the costs and lowering performance on the final device.
Not that I'm advocating the creation of a chipset MMU, but I don't think that it will be anywhere near as expensive as you claim. You wouldn't need the address remapping part, just a single bit per page indicating if the task is allowed to access the RAM or not. You could even group multiple contiguous pages into blocks for efficiency.
Of course it will lower performance slightly, but no more than the MMU in modern CPUs. All modern processors have them.
Personally I'd be more in favour of an API that prevents you from doing anything stupid (look at the rest of my previous post).
Hans
-
I see you guys have more idea about hardware than me (I can build every think... but I don't now much "how thinks work" ;) ), BUT...
For me Workbench (even 3.x) "is" dead... because of hardware... and lack of software since hardware "is slow". One big circle...
Personally I don't like when system has small lake with swimming fish in the background ;) (Vista - think). For me system should be FAST, EASY to understand how it works (e.g when I install program I know EXACTLY where are the parts needed to run the think...) yes WB is VERY good here. SO...
We NEVER had a chance to use a WB (expect UAE which I just don't like...) on FAST "native" computer, we always build this "sandwiches" A1200 is like a double burger... A4000 just a burger ;) and to make everythink work after 10 years... ehh...
We need some nice motherboard which will be compatible with OS3.x enough fast to run: DVD (etc.), mp3 (etc.), play with pictures, run the WB in higher resolution smoooooth ;), make a presentation in Hollywood or use new OWB for 68k?
And we need some people who will developer software for that... I can try "donate". I'm just a poor user :(
Kreciu
Ps. In general computers DO NOT develop so much today. There is a set of software we "need" for everyday use. Sure we can develop/change some stuff, but don't be :crazy: ;)
-
biggun wrote:
As I said before: UAE is great for development.
You can patch UAE to get your MMU feature for no money.
Then you can use this feature during development to detect bugs.
You can't detect all bugs that way, as they rarely reveal themselves within a few sessions - only later, when the poor user gets his/her data in other process(es) sent to neverland (directly, or indirectly through corrupted OS data/code).
-
minator wrote:
A Blitter that has an MMU will be extremely expensive to build and its performance will be disappointing.
Using the Blitter will have a huge overhead, as your Blitter MMU table will need to be replaced when switching tasks.
Please get a clue about the HW costs of implementing an MMU and the overhead of maintaining the MMU tables by the OS.
You seem to have an odd idea of just how big and slow MMUs are - they're neither big nor slow. They're so small, in fact, that they can be found in almost every processor except the very smallest microcontrollers.
In other words, my friend, you have no clue about HW design.
Please get a clue about how many resources that would eat in an FPGA design and then come back with proposals.
If I had the choice to either
A) add an MMU around my blitter, or
B) get for the same resources a vector unit comparable to CELL/AltiVec/SSE,
then for me the choice would be clear.
-
People have been referencing a ColdFire V5 chip in this thread, but this is the current top-of-the-line ColdFire chip that I can find listed on the Freescale site.
MCF5484 (http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MCF548X&nodeId=0162468rH3YTLC00M93426)
It's listed as a V4e core. (The new V4 versions for running Linux® applications are the MCF5445X series.) Are there any actual V5 core chips being produced?
The MCF5485 has a Digi-Key price of $33 at quantity 1 (it includes an MMU, FPU, Ethernet, USB, DDR/SDR-SDRAM controller, etc...)
The ColdFire series is similar to the 680x0 series but not 100% code compatible. It seems like missing instructions have been added back to the ColdFire as new core generations are released. It's possible that some future V5 core ColdFire chips might make it possible to emulate a 680x0, but based on the problems raised in this thread, it seems unlikely.
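As an aside, the way such emulation usually works is that the unimplemented opcodes take the illegal-instruction exception and get emulated in the handler. Here is a very rough, hypothetical C sketch of the idea; the exception-frame layout is invented for illustration, a real handler would be written in assembly, and real decoding handles far more cases:
[code]
#include <stdint.h>

/* Hypothetical register frame saved by a handler stub - the layout is
 * an assumption for this sketch, not the real ColdFire stack frame. */
struct frame {
    uint32_t d[8];   /* data registers                      */
    uint32_t a[8];   /* address registers                   */
    uint32_t pc;     /* address of the trapping instruction */
};

/* Entered from the illegal-instruction vector whenever the ColdFire
 * hits a 68k opcode it lacks: decode it, do the equivalent work in
 * software, then resume past the instruction. */
void emulate_68k_opcode(struct frame *f)
{
    uint16_t op = *(uint16_t *)(uintptr_t)f->pc;

    /* Example: ROL.L #imm,Dn - the rotate instructions were dropped
     * from ColdFire. Encoding mask per the 68k Programmer's Manual. */
    if ((op & 0xF1F8) == 0xE198) {
        unsigned n   = (op >> 9) & 7;   /* rotate count; 0 encodes 8 */
        unsigned reg =  op       & 7;
        if (n == 0)
            n = 8;
        f->d[reg] = (f->d[reg] << n) | (f->d[reg] >> (32 - n));
        /* (condition-code updates omitted for brevity) */
    }
    /* ...further patterns for the other missing instructions... */

    f->pc += 2;   /* step over the 16-bit opcode and resume */
}
[/code]
The performance question then boils down to how often such traps fire in typical 68k code.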
So wouldn't it be more practical to just use a current ColdFire V4e core (like the MCF5485) as a system co-processor: run the OS on an actual 680x0 and only run blitter routines etc. that can be rewritten to run as native code on the ColdFire, while getting the benefit of the additional HW functions the MCF5485 makes available as part of the 68000 family of chips?
-
Hi Metalman,
metalman wrote:
People have been referencing a ColdFire V5 chip in this thread, but this is the current top-of-the-line ColdFire chip that I can find listed on the Freescale site.
MCF5484 (http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MCF548X&nodeId=0162468rH3YTLC00M93426)
It's listed as a V4e core. (The new V4 versions for running Linux® applications are the MCF5445X series.) Are there any actual V5 core chips being produced?
Yes, there are V5 cores.
V5 ColdFire Core: Full Superscalar (http://www.freescale.com/webapp/sps/site/overview.jsp?nodeId=0162468rH3YTLC6951fgqk7SDQBWB3)
For example, the HP LaserJet P2015dn printers use a 400 MHz Motorola ColdFire V5 as their processor.
The MCF5485 has a Digi-Key price of $33 at quantity 1 (it includes an MMU, FPU, Ethernet, USB, DDR/SDR-SDRAM controller, etc...)
Yes, you can get a 266 MHz ColdFire V4e for $20.
The ColdFire series is similar to the 680x0 series but not 100% code compatible. It seems like missing instructions have been added back to the ColdFire as new core generations are released. It's possible that some future V5 core ColdFire chips might make it possible to emulate a 680x0, but based on the problems raised in this thread, it seems unlikely.
The V4 can execute 68k binaries.
The question is not if it works, but how big the average performance impact is.
BTW several people are evaluating in this direction already:
http://projects.powerdeveloper.org/project/coldfire/707
So wouldn't it be more practical to just use a current ColdFire V4e core (like the MCF5485) as a system co-processor: run the OS on an actual 680x0 and only run blitter routines etc. that can be rewritten to run as native code on the ColdFire, while getting the benefit of the additional HW functions the MCF5485 makes available as part of the 68000 family of chips?
I see where you are coming from, and for a test system your idea is okay.
Regarding using the ColdFire: I can only speak for the concept idea of the NatAmi (www.natami.net) here.
The NatAmi draws a lot of performance out of the SuperAGA Blitter. A good Blitter will always be faster than a good CPU. This is because a Blitter can pipeline more effectively and can fully use the chip-select lines, which a CPU cannot. When you connect the same memory to a Blitter and a CPU, the CPU can reach at best 50% of the possible Blitter speed.
As the SuperAGA Blitter is many times faster than the ColdFire, it makes no sense to use the ColdFire as a blitter.
It could make sense to use the ColdFire as the main CPU.
I'm very curious to see the results of the Coldfire performance evaluation.
I think that a ColdFire combined with SuperAGA in one chip has the potential to be a winner.
The beauty of this SOC is that you can get a blisteringly fast Blitter plus a decent CPU in one chip for $20.
The question that I'm wondering about a bit is how fast we need to be. Yes, I know it's cool to be faster than the fastest Cell.
But seriously, how fast does an Amiga OS system need to be to be fast?
Is the performance of a 68030 with 50 MHz OK?
Is the performance of a 68030 with 100 MHz OK?
Is the performance of a 68030 with 200 MHz OK?
Is the performance of a 68030 with 500 MHz OK?
Is the performance of a 68030 with 1000 MHz OK?
Cheers
-
You seem to have an odd idea of just how big and slow MMUs are - they're neither big nor slow. They're so small, in fact, that they can be found in almost every processor except the very smallest microcontrollers.
In other words, my friend, you have no clue about HW design.
If you are going to reply to me and quote me you could at least read the entire post.
Secondly, you've blatantly quoted me out of context.
If you really think that comment is inaccurate or shows any faulty knowledge on my part, please explain why.
However...
An MMU might have been a big deal in 1985, but it's not today. Some of the high-end processors may have large MMUs, but as I said (on the next line that you didn't quote) we're not talking about those. In any case, much of that area will be taken up by the TLBs, and these will not be necessary here.
Please get a clue about how many resources that would eat in an FPGA design and then come back with proposals.
If I had the choice to either
A) add an MMU around my blitter, or
B) get for the same resources a vector unit comparable to CELL/AltiVec/SSE,
then for me the choice would be clear.
A vector unit comparable to those is going to be considerably larger than any MMU; if you don't believe me, have a look at a die photo of one of the Cell's SPEs - then compare how big it is to the MMU *it contains*.
--
There is a long-standing general aversion to using MMUs in the Amiga community, possibly based on the assumption that they slow memory access. More knowledgeable folks could argue that page-table walks are slow and that you have to switch page tables every time you switch tasks.
However, today, none of this is true.
MMUs will increase memory latency, but the effect of this is utterly insignificant. I learned this when I first used BeOS about ten years ago: it had full memory protection, but it was every bit as responsive as any Amiga.
CPU designers know the slow parts in their designs and fix them in subsequent designs. TLBs cache page entries, and this means page-table walks are relatively rare. Modern processors also shouldn't need to change the page table every time they switch tasks; e.g. the processors in my mobile phone have full memory protection and virtual memory support, and they do not switch tables on a task switch - I know because I happened to have the features of those particular processors explained to me yesterday. I cannot say if it's true for all modern processors though.
However, as I said in my previous post, what I'm suggesting is much simpler than this, so it won't have any of these complexities.
I have some ideas for implementation so I will put these down.
-
biggun wrote:
A Blitter that has an MMU will be extremely expensive to build and its performance will be disappointing.
Using the Blitter will have a huge overhead, as your Blitter MMU table will need to be replaced when switching tasks.
Please get a clue about the HW costs of implementing an MMU and the overhead of maintaining the MMU tables by the OS.
I don't think you need a full-blown MMU. I think one or a few mask registers can do the trick. Paged tables are needed when implementing virtual memory and page swapping, which is not needed in this case.
greets,
Staf.
-
biggun wrote:
metalman wrote:
People have been referencing a ColdFire V5 chip in this thread, but this is the current top-of-the-line ColdFire chip that I can find listed on the Freescale site.
MCF5484 (http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=MCF548X&nodeId=0162468rH3YTLC00M93426)
It's listed as a V4e core. (The new V4 versions for running Linux® applications are the MCF5445X series.) Are there any actual V5 core chips being produced?
Yes, there are V5 cores.
V5 ColdFire Core: Full Superscalar (http://www.freescale.com/webapp/sps/site/overview.jsp?nodeId=0162468rH3YTLC6951fgqk7SDQBWB3)
For example, the HP LaserJet P2015dn printers use a 400 MHz Motorola ColdFire V5 as their processor.
You found what I did when I searched for a V5 ColdFire the first time. The document you linked is a roadmap document; what I can't find is a link to an actual V5 chip datasheet.
biggun wrote:
metalman wrote:
The MCF5485 has a Digi-Key price of $33 at quantity 1 (it includes an MMU, FPU, 10/100 Ethernet, USB 2.0, DDR/SDR-SDRAM controller, PCI interface, etc...)
Yes, you can get a 266 MHz ColdFire V4e for $20.
Which one?
The MCF5445X series and the MCF548x series, which are designed to work with the Linux development kits (with royalty-free, open-source software demonstration applications provided), seem to me to be the best choices.
M5484LITE: Linux Development Kit for the ColdFire MCF548x Family (http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=M5484LITE&parentCode=MCF548X&fpsp=1&nodeId=0162468rH3YTLC00M93426)
biggun wrote:
metalman wrote:
The ColdFire series is similar to the 680x0 series but not 100% code compatible. It seems like missing instructions have been added back to the ColdFire as new core generations are released. It's possible that some future V5 core ColdFire chips might make it possible to emulate a 680x0, but based on the problems raised in this thread, it seems unlikely.
The V4 can execute 68k binaries.
The question is not if it works, but how big the average performance impact is.
BTW several people are evaluating in this direction already:
Coldfire MCF54455 Project (http://projects.powerdeveloper.org/project/coldfire/707)
Seems there are some major problems, or products like the Dragon would be shipping by now. Maybe if some more 680x0 instructions are added back in a V5 chip it might work.
So wouldn't it be more practical to just use a current ColdFire V4e core (like the MCF5485) as a system co-processor: run the OS on an actual 680x0 and only run blitter routines etc. that can be rewritten to run as native code on the ColdFire, while getting the benefit of the additional HW functions the MCF5485 makes available as part of the 68000 family of chips?
I see where you are coming from, and for a test system your idea is okay.
Regarding using the ColdFire: I can only speak for the concept idea of the NatAmi (www.natami.net) here.
The NatAmi draws a lot of performance out of the SuperAGA Blitter. A good Blitter will always be faster than a good CPU. This is because a Blitter can pipeline more effectively and can fully use the chip-select lines, which a CPU cannot. When you connect the same memory to a Blitter and a CPU, the CPU can reach at best 50% of the possible Blitter speed.
As the SuperAGA Blitter is many times faster than the ColdFire, it makes no sense to use the ColdFire as a blitter.
It could make sense to use the ColdFire as the main CPU.
Cool!!! I wasn't considering someone designing a new AGA hardware blitter.
Use the ColdFire as a co-processor to run coldfire.library routines that have been rewritten to run as native code on the ColdFire, such as floating-point math etc...
biggun wrote:
I'm very curious to see the results of the Coldfire performance evaluation.
I think that a ColdFire combined with SuperAGA in one chip has the potential to be a winner.
The beauty of this SOC is that you can get a blisteringly fast Blitter plus a decent CPU in one chip for $20.
I see the ColdFire as a way to add MMU, FPU, USB, Ethernet, DDR/SDR-SDRAM controller, PCI interface and other hardware functions, using the ColdFire as a co-processor.
MCF548x Reference Manual (http://www.freescale.com/files/32bit/doc/ref_manual/MCF5485RM.pdf?fpsp=1)
biggun wrote:
The question that I'm wondering about a bit is how fast we need to be. Yes, I know it's cool to be faster than the fastest Cell.
But seriously, how fast does an Amiga OS system need to be to be fast?
Is the performance of a 68030 with 50 MHz OK?
Is the performance of a 68030 with 100 MHz OK?
Is the performance of a 68030 with 200 MHz OK?
Is the performance of a 68030 with 500 MHz OK?
Is the performance of a 68030 with 1000 MHz OK?
Cheers
A computer only seems as fast as its slowest bottleneck.
The Amiga hardware design philosophy was to offload as many functions as possible to fast co-processors.
Giving the main CPU more idle time by offloading more routines to co-processors (video, FPU, etc.) makes the whole system's apparent speed higher.
-
What's this about the PMMU slowing things down? I've used the PMMU in the 030 and 060 in some personal projects, and I can't really notice any real performance hit when using it. Maybe it's an issue when you're running on an 8 MHz CPU with no cache, but it's definitely not an issue on a 060, for example.
Maybe I missed the point completely. Can someone shed some light on this for me?
-
:oops:
-
shoggoth wrote:
What's this about the PMMU slowing things down? I've used the PMMU in the 030 and 060 in some personal projects, and I can't really notice any real performance hit when using it. Maybe it's an issue when you're running on an 8 MHz CPU with no cache, but it's definitely not an issue on a 060, for example.
Maybe I missed the point completely. Can someone shed some light on this for me?
If you have an address range of 1 GB and 4 KB pages, then you need about 262,000 MMU entries for this. That is a megabyte of MMU table data!
Your MMU can remember a limited number of page entries on chip (64 pages in the case of the 68060).
64 page entries equal 256 KB of memory.
If your program is bigger than 256 KB and jumps around in memory a lot, then your on-chip MMU entries are too few. This means your MMU needs to re-read these entries from memory constantly.
In the worst-case scenario it can go as far as your MMU ending up needing to reload one MMU table entry per memory access.
It's common to see this behavior in certain memory stress tests. Affected algorithms can degrade in performance by up to 50%.
An MMU adds overhead to the chip.
The Motorola 68060 chip with FPU and MMU was advertised at a 60 MHz clock rate. The same CPU without FPU and MMU ran at 75 MHz.
An MMU will add overhead and latency.
The latency can be hidden by creating a more complex address-generation pipeline and by using on-chip cache (to cache the MMU tables).
Please mind that even the simplest MMU, with just one bit per page to indicate whether access is allowed, will need an MMU table of 32 KB.
Creating an MMU with an on-chip cache and dynamic reloading on MMU table misses is complex.
Putting the whole MMU table on chip is a simpler design, but it would eat 32 KB of on-chip cache memory.
An MMU on the chip is a lot of overhead.
There is a good reason that no one puts an MMU on a blitter!
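For the curious, here is the arithmetic behind these numbers as a small sanity check. The 4-byte descriptor size is my assumption; the 64-entry figure is the 68060's on-chip ATC:
[code]
#include <stdio.h>

int main(void)
{
    unsigned long space   = 1UL << 30;      /* 1 GB address range */
    unsigned long page    = 4UL << 10;      /* 4 KB pages         */
    unsigned long entries = space / page;   /* 262,144 entries    */

    /* Assuming 4-byte page descriptors, a full table is 1 MB: */
    printf("entries: %lu, full table: %lu KB\n",
           entries, entries * 4 / 1024);

    /* 64 on-chip entries only ever cover 64 * 4 KB = 256 KB: */
    printf("on-chip coverage: %lu KB\n", 64 * page / 1024);

    /* One access bit per page instead: 262,144 bits = 32 KB: */
    printf("one-bit table: %lu KB\n", entries / 8 / 1024);
    return 0;
}
[/code]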
-
biggun wrote:
An MMU adds overhead to the chip.
The Motorola 68060 chip with FPU and MMU was advertised at a 60 MHz clock rate. The same CPU without FPU and MMU ran at 75 MHz.
An MMU will add overhead and latency.
The latency can be hidden by creating a more complex address-generation pipeline and by using on-chip cache (to cache the MMU tables).
Please mind that even the simplest MMU, with just one bit per page to indicate whether access is allowed, will need an MMU table of 32 KB.
Creating an MMU with an on-chip cache and dynamic reloading on MMU table misses is complex.
Putting the whole MMU table on chip is a simpler design, but it would eat 32 KB of on-chip cache memory.
An MMU on the chip is a lot of overhead.
There is a good reason that no one puts an MMU on a blitter!
The presence of an FPU was the reason why the full '060 could not officially run at 75 MHz; some rev 6 '060s can run at 100 MHz.
Leaving out the MMU was an economic decision.
The benefits of an MMU, like memory protection, outweigh the speed penalty. Even though AmigaOS is not suited to memory protection, an MMU IS still useful, as proven by ENFORCER.
-
A6000 wrote:
The benefits of an MMU, like memory protection, outweigh the speed penalty. Even though AmigaOS is not suited to memory protection, an MMU IS still useful, as proven by ENFORCER.
You are missing the point of the discussion.
One statement was that a CPU MMU can help to create a more robust system - this is clear.
The discussion was about the fact that to have "full" protection you need to encapsulate all memory writes - this includes memory writes from the CPU and from other chips like the Blitter.
The Amiga system is full of devices which can perform DMA writes: Blitter/IDE disk DMA/SCSI DMA/floppy/PCI cards/...
For the "dream" protection you would need to encapsulate all these DMA sources in their own MMU bubbles.
Developing a chipset MMU is expensive in both development time and chip resources!
Setting up a chipset MMU would impose a big performance penalty on the custom chips.
Cheers
-
Since we were originally talking about ColdFire and I don't care about the MMU :-D I was wondering why no-one ever seems to mention the Turbo-CF (http://turbo-cf.narod.ru/eng/index.html) (Yahoo group for it (http://tech.groups.yahoo.com/group/turbo_cf/))?
It has downloadable designs etc., though I don't believe it's ever been built.
I think I once read a thread about it, but it was discounted because of the usual "ColdFire can't execute 68k code" blahblahblah. Since people on here are saying that it can, I'd like to know what other things are wrong with the Turbo-CF design and whether it'd be worth trying to build one from the schematics.
If it IS a totally worthless design, then could someone explain why?
Andy
EDIT: corrected _some_ of my appalling grammar :lol:
-
*bump*
anyone?
-
If ColdFire could be made to run 68k code without error, then wouldn't Elbox have done it by now? They have had the Dragon ColdFire accelerator in "development" for at least 4 years now. I think only biggun really believes ColdFire can work.
-
Elbox have had the SharkPPC "in development" since October 2000.
I guess we can't use AmigaOS with PPC then?
Gunnar is evaluating the CPU with a proper dev platform right now. He is at least conducting serious work instead of spinning.
regards
Bandis
-
There is a good reason that no one puts an MMU on a blitter!
I think you'll find GPUs have MMUs.
If your program is bigger than 256 KB and jumps around in memory a lot, then your on-chip MMU entries are too few. This means your MMU needs to re-read these entries from memory constantly.
That's why they invented large pages. These can be anything up to 1 GB in size on some processors.
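The effect on TLB reach is easy to see with the same back-of-the-envelope arithmetic as before (the 64-entry TLB is again just an assumed size):
[code]
#include <stdio.h>

int main(void)
{
    /* 64 TLB entries with 4 KB pages reach only 256 KB; the same 64
     * entries with 16 MB pages reach a full 1 GB without a single
     * table walk. */
    printf("4 KB pages:  %u KB\n", 64u * 4u);    /* 256 KB  */
    printf("16 MB pages: %u MB\n", 64u * 16u);   /* 1024 MB */
    return 0;
}
[/code]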
In the worst-case scenario it can go as far as your MMU ending up needing to reload one MMU table entry per memory access.
It's common to see this behavior in certain memory stress tests. Affected algorithms can degrade in performance by up to 50%.
This hurt Cell at one point: the initial FFT benchmarks showed it outrunning a Pentium 4 by 40x. They later added support for 16 MB pages and performance went up to 100x faster than a P4.
-
About this idea I was on about...
What's needed is a way to protect certain memory regions and registers from errant data. This can't be done properly for the existing Classic Amiga, of course, but I think it can be used to allow a second OS to exist alongside without Classic writing over its data.
The memory system doesn't need to be divided into lots of pages. You could, say, assign an area above address X to OS2 and simply forbid any access from OS1. If this address is on a neat power-of-2 boundary, you won't even need to check every address bit. A set of comparison registers would allow a set of regions to be protected.
You could also partition the registers in the chipset so that new registers are in one region while old registers are in another. This would allow you to protect these registers from "classic" OS1 apps. New OS1 apps could be allowed to access these registers by assigning them an intermediate mode.
e.g. you can have new chunky modes, and you can add to them the notion of an OS switch (i.e. these registers will be saved and restored when the system switches between OSs - otherwise you'll get corruption).
If you wanted to protect specific registers, that could be done by checking against the current mode. OS2 would set the mode every time it switches between itself and OS1. This could be useful for specific registers such as disc control and I/O: if these are written to, OS2 can be woken up to pipe the I/O to a HD (or whatever) in a safe manner.
This probably isn't well explained, but adding a few registers, comparators and a few mode bits will give a form of protection that allows one OS to be protected from another; this gives you a path to evolve towards using a fully memory-protected OS in the future. Yes, there will be some performance impact, but it will be insignificant, especially compared to memory access, which is the real performance bottleneck.
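To make the comparator part concrete, here is a minimal C model of one such region register pair; the addresses, masks and names are all invented for this sketch:
[code]
#include <stdbool.h>
#include <stdint.h>

/* One protected region aligned to a power-of-2 boundary, so the
 * hardware only compares the masked high address bits - no per-page
 * tables anywhere. */
struct region_reg {
    uint32_t base;   /* e.g. 0x80000000: OS2 lives above here       */
    uint32_t mask;   /* e.g. 0xC0000000: which high bits to compare */
};

enum { MODE_OS1 = 0, MODE_OS2 = 1 };   /* one of the "few mode bits" */

/* True if a write must be blocked. In OS2 mode everything is allowed,
 * so OS2 can manage its own memory (and the chipset registers) as it
 * sees fit; OS1 is simply kept out of the OS2 region. */
static bool write_blocked(struct region_reg r, uint32_t addr, int mode)
{
    return mode == MODE_OS1 && (addr & r.mask) == r.base;
}

int main(void)
{
    struct region_reg os2 = { 0x80000000u, 0xC0000000u };
    return write_blocked(os2, 0x90000000u, MODE_OS1);   /* 1 = blocked */
}
[/code]
Because the boundary is a power of 2, the whole check is one AND and one compare per write - nothing that would show up next to DRAM latency.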
-
minator wrote:
About this idea I was on about...
What's needed is a way to protect certain memory regions and registers from errant data. This can't be done properly for the existing Classic Amiga, of course, but I think it can be used to allow a second OS to exist alongside without Classic writing over its data.
The memory system doesn't need to be divided into lots of pages. You could, say, assign an area above address X to OS2 and simply forbid any access from OS1. If this address is on a neat power-of-2 boundary, you won't even need to check every address bit. A set of comparison registers would allow a set of regions to be protected.
You could also partition the registers in the chipset so that new registers are in one region while old registers are in another. This would allow you to protect these registers from "classic" OS1 apps. New OS1 apps could be allowed to access these registers by assigning them an intermediate mode.
e.g. you can have new chunky modes, and you can add to them the notion of an OS switch (i.e. these registers will be saved and restored when the system switches between OSs - otherwise you'll get corruption).
If you wanted to protect specific registers, that could be done by checking against the current mode. OS2 would set the mode every time it switches between itself and OS1. This could be useful for specific registers such as disc control and I/O: if these are written to, OS2 can be woken up to pipe the I/O to a HD (or whatever) in a safe manner.
This probably isn't well explained, but adding a few registers, comparators and a few mode bits will give a form of protection that allows one OS to be protected from another; this gives you a path to evolve towards using a fully memory-protected OS in the future. Yes, there will be some performance impact, but it will be insignificant, especially compared to memory access, which is the real performance bottleneck.
Let's say we enclose old apps in an address-range bubble.
Does it help anything?
The applications of OS1 can still trash OS1.
And the applications of OS2 can still trash OS2.
I see where you're coming from, and I know that you only have the best intentions (whether they are possible to implement or not).
The thing is that this discussion has nothing to do with ColdFire or Classic Amigas. Please respect that this discussion is off-topic here.
If you want to continue the discussion about the MMU and memory protection, it would be nice to do it in the thread about that topic.
Thanks in advance
-
The thing is that this discussion has nothing to do with ColdFire or Classic Amigas. Please respect that this discussion is off-topic here.
If you want to continue the discussion about the MMU and memory protection, it would be nice to do it in the thread about that topic.
First you slag off the idea of an MMU, then you ask that I suggest something, and then when I do it's off-topic? Have you ever considered going into politics?
Let's say we enclose old apps in an address-range bubble.
Does it help anything?
The applications of OS1 can still trash OS1.
And the applications of OS2 can still trash OS2.
I see where you're coming from, and I know that you only have the best intentions (whether they are possible to implement or not).
It's up to OS2 how it handles things; I assume it'll just abstract the hardware in a sensible way and thus not allow apps to trash one another.
--
However, here's a question for you.
The Amiga, when it was launched, was ahead in both software and hardware.
If the aim of the NatAmi is to bring the Amiga up to date, why is it being thought of in terms of hardware only?
It makes sense to do a first revision as an Amiga clone for 3.x, but going forward, do you really intend to stick with OS 3.x?
-
minator wrote:
However, here's a question for you.
The Amiga, when it was launched, was ahead in both software and hardware.
If the aim of the NatAmi is to bring the Amiga up to date, why is it being thought of in terms of hardware only?
It makes sense to do a first revision as an Amiga clone for 3.x, but going forward, do you really intend to stick with OS 3.x?
How about: because one is possible, while the other means adding memory protection retroactively whilst maintaining full compatibility :-D
Also, it was me that's been pushing to get this ColdFire thread back onto, er... well, talking about ColdFire, crazily enough. Which was why the other thread was started by bloodline over in "Memory Protection AGAIN (http://www.amiga.org/forums/showthread.php?t=35580)".
Andy
-
I've been reading this thread with great interest. Darksun raises a very good point, I think. If you look at the performance level possible with a modern Intel or AMD chip, the CPU time lost to emulating the 68k instructions is very small.
Also, please consider this: the best solution to buggy old software crashing a nice new and fast Amiga system is to have a system that is nice enough that people will want to program for it _now_ - correcting the problem not only by replacing those applications with new ones, but also by releasing patches for old games and apps that have these sorts of bugs.
It is not so hard to run a hardware-level debugger, see that, say, oh, that guy ran out of coffee right here where there's this pointer-arithmetic error, and just correct it by writing a small wrapper for the app that loads patches! I did this all the time on Macs - with ResEdit - and on winbloze boxes too, and I can't imagine that Amiga programmers from back in the day wrote code that is any harder to understand than that of other coders from that era.
People will flock to an OS that can meet the challenge of not tormenting its users. Here is a good example of why:
http://picasaweb.google.com/patrick.killourhy/Winbloze/photo#5199703060705812642
This is not a fake image; it's a pic I took yesterday. Imagine if you were this gas station chain's management and could order Amigas to replace the damned things, to correct the loss of ad revenue (this sign is next to the I-5 freeway and clearly visible from a _long_ way off) every time winbloze goes wonky! I really think people are at that point with these damned machines of ours.
I really wish luck to anyone trying to make an Amiga that can hold its own in performance these days, or even just be built out of available parts, and this is my small advice to them: dare to piss people off by 'breaking' some old functionality, because you'll never get it done otherwise. :) Hopefully I can buy a new 'Amiga' someday, even if it is not 100% compatible with the old stuff - I say this simply because I am really frustrated, as a programmer, at how poorly designed operating systems are when compared to other types of software (relational databases come to mind here).
-p.
-
Dude, if that was an option you'd be taking pictures of GURU MEDITATION on that sign.