Amiga.org
Amiga computer related discussion => Amiga Software Issues and Discussion => Topic started by: Erebos on April 10, 2012, 12:45:46 AM
-
Hi,
I was just wondering: why is the onboard 68k disabled in the Classic version of OS4?
If it wasn't, would it be possible to run things without RunInUAE or the like? Or maybe it would speed things up, since there'd be no need to emulate the 68k?
-
From what I understand, the context switches between 68K and PPC were way too slow, and it introduced a whole lot more complexity.
-
You'd run into the same problems as you do with WarpOS and PowerUp - context switches that zap performance.
Besides, the main issues with OS4 and compatibility have less to do with the processor and more to do with the substantial rewrite of the OS. It's the same kind of problem we experienced going from 1.3 to 2.04. In other words, even if we had a full 68K build of OS4, the stuff that requires UAE would almost certainly still require UAE to work.
-
Hi,
I was just wondering: why is the onboard 68k disabled in the Classic version of OS4?
If it wasn't, would it be possible to run things without RunInUAE or the like? Or maybe it would speed things up, since there'd be no need to emulate the 68k?
This seems to be a common misunderstanding. You don't need UAE or RunInUAE to use 68k software with AmigaOS 4.x (or MorphOS, for that matter).
-
Hi,
I was just wondering: why is the onboard 68k disabled in the Classic version of OS4?
If it wasn't, would it be possible to run things without RunInUAE or the like? Or maybe it would speed things up, since there'd be no need to emulate the 68k?
The simplest answer is that the whole of OS4 is PowerPC, and running native PPC apps is a whole new level of experience. 68k Workbench-friendly software will work under Petunia, maybe even faster than on an 040 if the PPC is a 604, I suppose. The AGA chipset should even be accessible, so even more software should work. (Has anyone tested, e.g., apps that used to crash on the A1 on Classic OS4?)
http://www.intuitionbase.com/ossoftware.php?category=3&letter=NON
Plus, context switching was a known slowdown problem of the PPC cards, so your PPC experience is complete now :-)
-
In theory, the '040/060 could run asynchronously in parallel to the PPC (WarpOS + 3.x worked that way). However, OS4 isn't equipped to handle SMP/AMP; it would need to be designed for that.
-
In theory, the '040/060 could run asynchronously in parallel to the PPC (WarpOS + 3.x worked that way). However, OS4 isn't equipped to handle SMP/AMP; it would need to be designed for that.
It's not just OS4 that needs a redesign. The Phase5 accelerator card has only one memory bus so it CANNOT run both processors at the same time. It can switch between them, but not run them in parallel.
-
The issue Samurai describes is also present on the Saturn, and was a reason for its demise. The CPUs were weak on their own, but together they ran well. Only one CPU could access memory at a time, but the memory was divided up between the CPUs.
-
In theory, the '040/060 could run asynchronously in parallel to the PPC (WarpOS + 3.x worked that way).
Really? I don't think it worked that way.
-
Indeed, very interesting answers here.
So it's because of the hardware design that OS4 does it that way...
But if WarpOS + 3.x managed to do it, it must be feasible ;-)
-
The Phase5 accelerator card has only one memory bus so it CANNOT run both processors at the same time. It can switch between them, but not run them in parallel.
The CPUs most definitely do run in parallel.
I don't know who invented this nonsense and for what purposes, but it is not true.
-
Really? I don't think it worked that way.
It was possible to run 68k and PPC code independently of each other, at least with PowerUP. There were certain highly optimized apps that used to do this. For instance, the infamous FastQuake used a special cache-inhibited memory area as a ring buffer to avoid the need for cache flushes. This way both CPUs could run at full speed with as few context switches as possible.
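Conceptually, such a scheme is just a single-producer/single-consumer ring buffer sitting in memory neither CPU caches. A minimal C sketch of the idea (not FastQuake's actual code; all names are invented, and it assumes the structure really is allocated in cache-inhibited memory):

#define RING_SIZE 4096  /* must be a power of two */

typedef struct {
    volatile unsigned long head;          /* written only by the producer */
    volatile unsigned long tail;          /* written only by the consumer */
    volatile unsigned char data[RING_SIZE];
} RingBuffer;

/* Producer (say, the PPC side): returns 1 on success, 0 if full. */
static int ring_put(RingBuffer *rb, unsigned char byte)
{
    unsigned long head = rb->head;
    unsigned long next = (head + 1) & (RING_SIZE - 1);
    if (next == rb->tail)
        return 0;                         /* ring full */
    rb->data[head] = byte;                /* store the payload first...   */
    rb->head = next;                      /* ...then publish the new head */
    return 1;
}

/* Consumer (say, the 68k side): returns 1 and a byte, 0 if empty. */
static int ring_get(RingBuffer *rb, unsigned char *out)
{
    unsigned long tail = rb->tail;
    if (tail == rb->head)
        return 0;                         /* ring empty */
    *out = rb->data[tail];
    rb->tail = (tail + 1) & (RING_SIZE - 1);
    return 1;
}

Because each index is written by exactly one side and the buffer is never cached, neither CPU needs a flush to see the other's progress.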
-
But if WarpOS + 3.x managed to do it, it must be feasible ;-)
It is feasible, but it would slow down the PPC CPU considerably: Each time you switch from one CPU to the other on a PowerUp board, you have to flush all CPU caches (because the CPUs are sharing the same memory, and this is the only way to maintain memory integrity). Now imagine doing that hundreds of times per second (as you would have to, when running 68k and PPC tasks in parallel) - the system would crawl.
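To put rough, purely illustrative numbers on that (invented for scale, not measured): flushing, say, 32 KB of dirty copyback cache over a bus sustaining around 50 MB/s costs about 32/50000 s, roughly 0.6 ms per flush. At 500 context switches per second, that alone eats 0.6 ms x 500 = 300 ms out of every second - close to a third of all wall-clock time gone before either CPU has done any useful work.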
As it is, depending on the speed of your PPC CPU, the 68k CPU emulated by OS4 is definitely faster than a 68040 anyway.
-
@cgutjahr
As I was talking about the Classic version: I can't see how the PPC side of my Blizzard PPC at 240 MHz can be faster emulating 68k than my real onboard 060...
Plus, I'd say that performance doesn't drop at all when, on 3.9, I have 68k tasks running alongside PPC ones (like WOS MP3 decoding).
Maybe the reason is that it's much easier to maintain a single branch of the OS that is primarily targeted at NG Amigas and not designed for the classics first.
Maybe the context switches are too heavy to handle, but... I'm skeptical.
Anyway, please don't misunderstand my words: I'm glad that OS4 exists for the classics, and I thank Hyperion for it ;-)
-
@cgutjahr
As I was talking about the Classic version: I can't see how the PPC side of my Blizzard PPC at 240 MHz can be faster emulating 68k than my real onboard 060...
Plus, I'd say that performance doesn't drop at all when, on 3.9, I have 68k tasks running alongside PPC ones (like WOS MP3 decoding).
Maybe the reason is that it's much easier to maintain a single branch of the OS that is primarily targeted at NG Amigas and not designed for the classics first.
Maybe the context switches are too heavy to handle, but... I'm skeptical.
Anyway, please don't misunderstand my words: I'm glad that OS4 exists for the classics, and I thank Hyperion for it ;-)
Well, you would need to take the same approach they took with 3.x.
Because your native OS is PPC, you would need to write a specialized kernel for your 68K, as well as supporting PPC libraries to communicate and exchange data with said 68K kernel.
You would then have to convince application developers to compile parts of their applications 68K-native and to retool them around a design pattern suited to offloading tasks to a slower processor. To convince the developers, you'd just need to remind them of the Amiga's central design philosophy concerning multitasking and the offloading of work from the central CPU.
The first PPC->68K API would be called "Power Down", but soon after its introduction a new standard would emerge called "WarpOS/2".
Fortunately, specific "Power Down" and "WarpOS/2" compatibility layers would be created for the respective APIs, allowing some degree of compatibility.
People would bicker over the technical merits of both APIs for years, until the entire OS is finally ported to 68K.
-
It was possible to run 68k and PPC code independently of each other, at least with PowerUP. There were certain highly optimized apps that used to do this. For instance, the infamous FastQuake used a special cache-inhibited memory area as a ring buffer to avoid the need for cache flushes. This way both CPUs could run at full speed with as few context switches as possible.
I must admit that I've never really "seen" applications that ran in parallel in that sense on a phase5 board - regardless of which kernel or OS was used. Using that kind of "cache-protected" buffer looks like a workaround to build up some kind of transient parallel time frame, indeed. But every now and then, the context switch will show up again... Thanks for that info!
Power Down
Oh dear! :-) A new kernel war, PowerDown vs. WarpDown, will arise...
-
Has a WOS proggy ever been coded to show the duration of the context switches?
-
As I was talking about the Classic version: I can't see how the PPC side of my Blizzard PPC at 240 MHz can be faster emulating 68k than my real onboard 060...
Just give it a try, and draw your conclusions after you've seen it in action. Apart from stuff that hits the CPU badly (like, say, Maxon Cinema rendering a scene) you probably won't miss your 68060. Having the whole OS running on the PPC makes a whole lot of difference.
A lot of OS4 software (read: ports) runs way too slow on OS4 Classic, because the PPC CPU is really not all that powerful. But I don't think 68k software is a problem. Anybody still using Maxon Cinema to render scenes on real Amiga hardware belongs in a mental asylum anyway ;)
-
I must admit that I've never really "seen" applications that ran in parallel in that sense on a phase5 board - regardless of which kernel or OS was used.
I am sure everyone has used the PPC-accelerated mpega.library or datatypes. The PPC decodes the next MPEG audio frames while the 68k CPU is doing something else, but of course the memory bus is always a bottleneck there.
But the CPUs can't run in parallel in the sense that they could work on the same data structures simultaneously. Something like reading system structures is strictly forbidden from the PPC side because it is not coherent (and no, the cache flush technique is not going to fix that).
-
There seems to be a recurring misconception about the PowerUP hardware and the older PPC kernels that ran on it.
First off, the hardware. Both processors (all three, if you include the SCSI script processor on the 603e+ boards) most assuredly run concurrently. However, they do share a single bus which means that only one of them can perform IO on it at any given instant.
However, both processors also had instruction and data caches large enough, run in copyback mode (meaning that reads and writes were generally full cache lines), that either one being forced to wait for the other's in-progress IO was not a particularly limiting factor.
Next, the software. A consequence of the hardware design is that each processor ends up with its own, potentially out-of-date view of what is in RAM, based on its own cache. I may be wrong, but I don't think that bus snooping worked so well (or perhaps at all) with the design, so whichever PPC kernel you used, caches had to be kept in sync by flushing them when one processor called the other. The kernel generally managed this cache coherency problem, allowing developers to write 68K apps treating the PPC as a co-processor they'd compile certain expensive functions for, or alternatively write mostly PPC-native apps that call the 68K for OS stuff, IO or event handling.
I can't recall the details for PowerUP, but under WarpOS each WarpOS application had at least two tasks: one that ran on the 68K and one that ran on the PPC. Together, these "mirror tasks" formed a single "virtual" thread of execution. Signals and such were routed to both tasks, but at any instant in time only the PPC or the 68K task would be running, with its counterpart asleep, awaiting the flow of execution to come back to it. There was an exception to this rule, an asynchronous calling method that was rarely used as it required the application software to ensure cache coherency.
I think it's this notion of a given application running on only one CPU at any given instant that causes the misconception of a single-processor model. However, Exec, WarpOS and PowerUP are all pre-emptively multitasking kernels. So, when your WarpOS task goes to sleep waiting for a slow 68K call to return, in principle WarpOS is free to schedule any other ready-to-run PPC task in its place; vice versa for Exec on the 68K.
In theory, this means that if you had two WarpOS applications, one could be executing some OS calls on the 68K while the other is running PPC-native code on the PPC. In practice, however, the context switching step whenever any application jumped CPU starved both processors of time until it was complete, and more often than not the actual time spent on a 68K function call (or vice versa) was dwarfed by the time spent just doing the context switching work. Consequently, whether you coded for PowerUP or WarpOS, the golden optimization rule was to minimize context switches. So, if you were going to need to do several OS calls, refactor your code so that they can all be done from a single 68K call that you invoke from the PPC (or vice versa). Sadly, it was not uncommon to see applications that didn't do this.
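To illustrate the golden rule with a sketch - Call68K() here is an invented stand-in for the real WarpOS/PowerUP cross-CPU call mechanism, and the Thing functions are placeholders, so this shows the shape of the technique rather than actual API:

typedef long (*Fn68K)(void *arg);
extern long Call68K(Fn68K fn, void *arg);   /* hypothetical: run fn on the 68K,
                                               costing one full context switch */
extern long OpenThing(void *arg);           /* placeholder 68K-side OS calls */
extern long ConfigureThing(void *arg);
extern long StartThing(void *arg);

/* Naive PPC code: three crossings, three rounds of cache flushing. */
static void setup_naive(void *ctx)
{
    Call68K(OpenThing, ctx);
    Call68K(ConfigureThing, ctx);
    Call68K(StartThing, ctx);
}

/* Batched: one 68K-compiled stub does all three calls locally,
 * so the PPC pays for a single crossing. */
static long setup_stub(void *ctx)
{
    OpenThing(ctx);
    ConfigureThing(ctx);
    return StartThing(ctx);
}

static void setup_batched(void *ctx)
{
    Call68K(setup_stub, ctx);
}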
This brings us back to the original question. Well, the answer is that emulating 68K code on the PPC is faster due to not having this cache-flushing limitation. Even a 603e is generally faster than it would be on the real 68K, certainly if your 68K is an 040 like mine. Instantaneous JIT performance is, of course, variable and highly context sensitive, so while there may be some 68K code out there that would prove pathological under emulation and thus run faster on the real 68K, any benefit would be instantly lost under the overhead of having to implement a WarpOS/PowerUP-style cache coherency strategy for both processors.
I did have a half-baked idea of my own that I might try if I ever get around to experimenting with it, but it is probably doomed to the wastebasket of silly ideas already. Essentially, I thought of allocating a lump of memory to install some 68K code in and seeing if I can get it running. However, I have no intention of using the 68K for running existing 68K applications. Instead, I envisage it as a sort of general-purpose programmable DMA controller for the classic hardware.
It would use an MMU setup that marks all memory uncached except for the space allocated for the code it executes and some private data workspace. As all other regions are uncached, coherency is only a problem from the host OS side, and that's what CachePre/PostDMA() is there to manage. Having complete access to the hardware and memory space in the system might make it pretty useful for data transfer tasks.
You might have a 68K-uncacheable page of memory somewhere that represents a memory-mapped "register file" for your virtual DMA device. This would ideally need to be uncached from the PPC side too, so a page of ChipRAM might even be an idea. It doesn't matter that it's slow, because you are only putting parameter data in it (e.g. address of memory region, size of memory region); then, via an interrupt or other mechanism, you get the 68K to do the transfer (which, if done properly, could use move16 for block transfers to/from uncached memory).
It's all very pie in the sky, but when the idea first occurred to me it seemed theoretically possible that you might be able to produce virtual DMA controllers for things like the onboard IDE, parallel port, PCMCIA or whatever, and then drivers that utilise them.
There are probably many very good reasons why this is probably impossible, though.
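To make the "register file" idea concrete, a purely hypothetical layout (every name invented) for the uncacheable page both CPUs agree on:

#include <exec/types.h>

/* Lives in a page that is uncached from both the 68K and PPC sides
 * (a page of ChipRAM, for instance). */
struct VDMARegFile {
    volatile ULONG command;   /* e.g. VDMA_CMD_READ / VDMA_CMD_WRITE      */
    volatile APTR  address;   /* base of the host memory region           */
    volatile ULONG length;    /* number of bytes to transfer              */
    volatile ULONG device;    /* which hardware to drive: IDE, parallel...*/
    volatile ULONG status;    /* written back by the 68K: busy/done/error */
};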
-
This brings us back to the original question. Well, the answer is that emulating 68K code on the PPC is faster due to not having this cache-flushing limitation. Even a 603e is generally faster than it would be on the real 68K, certainly if your 68K is an 040 like mine.
68k code that is executed only once probably runs faster on real 68k, though.
It's all very pie in the sky, but when the idea first occurred to me it seemed theoretically possible that you might be able to produce virtual DMA controllers for things like the onboard IDE, parallel port, PCMCIA or whatever, and then drivers that utilise them.
Since interrupts are executed on the PPC side, you would have to make some substantial changes there... and since those devices still wouldn't run any faster, the only advantage would be having more CPU time for the idle task =P
-
68k code that is executed only once probably runs faster on real 68k, though.
Yeah, but as soon as it makes an OS call to the PPC native host, any such advantage would be lost instantly.
Since interrupts are executed on the PPC side, you would have to make some substantial changes there... and since those devices still wouldn't run any faster, the only advantage would be having more CPU time for the idle task =P
I think you misunderstood me. I was suggesting using an interrupt "or other" mechanism to signal the 68K to do something asynchronously from the host.
Consider disk access on the motherboard IDE. It's cripplingly slow whether you do it on the 68K or the PPC. However, if you were able to do block transfers via your "virtual DMA device" (in reality a bit of 68K code doing PIO to/from some range of memory), the PPC would be free to do something else while it waits for it to complete. In that respect, no different than the 68K waiting for the NCR7xx to complete DMA transfer to/from a SCSI device. Why tie up your host processor doing such PIO transfers when the 68K can do it just as quickly in this case while your host schedules some other PPC task to run while this one sleeps?
If the required 68K transfer code were small enough to fit in the instruction cache (which a simple implementation should be), it might work pretty well. It's not hard to imagine reading 16-bit words from the IDE into a small 16-byte-aligned buffer in the cacheable area you reserved for the 68K and then move16'ing them over to wherever they are actually needed, all without performing any memory accesses during the loop except those required for the data transfer.
Once finished, the 68K could raise an interrupt for the PPC so that the actual "driver" task for this would wake up, call CachePostDMA() and then return.
Since only its private working area would be cacheable, most of the coherency issues that dogged the traditional software are not a problem. You just have to remember to treat it like any other DMA device within any host drivers you write that use this mechanism.
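For the IDE example, the 68K-side inner loop might look something like this in C - the port address is just an assumption, and the copy-out is written as a plain loop where a real implementation would use a move16 burst in assembly:

#include <exec/types.h>

#define IDE_DATA ((volatile UWORD *)0xDA0000)  /* assumed IDE data port */

/* One 68040 cache line (16 bytes) in the 68K's private cacheable area. */
static UWORD chunk[8] __attribute__((aligned(16)));

static void vdma_ide_read(UWORD *dst, ULONG words)
{
    while (words >= 8) {
        ULONG i;
        for (i = 0; i < 8; i++)   /* fill a cache-line-sized chunk     */
            chunk[i] = *IDE_DATA; /* from the data port                */
        for (i = 0; i < 8; i++)   /* copy it out to the (68K-uncached) */
            dst[i] = chunk[i];    /* destination; stands in for move16 */
        dst   += 8;
        words -= 8;
    }
    while (words--)               /* trailing partial chunk */
        *dst++ = *IDE_DATA;
}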
-
I may be wrong, but I don't think that bus snooping worked so well (or perhaps at all) with the design
Sadly, bus snooping doesn't solve the problem: since any CPU could be modifying data segments (that may even have been cached by the other side) without writing them back right away, data caches can diverge substantially. x86/x64 systems are having quite a tough time ensuring cache coherence - just look at AMD's MOESI protocol (with the added challenge of multiple local RAM busses). With a heterogeneous AMP system without MESI, you have little choice.
The only way to get around the extremely expensive context switches is to divide the RAM into regions for each CPU, with a shared (non-cached) region for message passing - if I understood correctly, this is what you were thinking of. However, shared memory is also expensive, as it must not be cached and you need to copy all data passed back and forth to this slow memory - so you'll still starve a CPU that depends on its caches (though not as badly).
-
Sadly, bus snooping doesn't solve the problem: since any CPU could be modifying data segments (that may even have been cached by the other side) without writing them back right away, data caches can diverge substantially.
Indeed. I probably should have worded it a bit more clearly - this is essentially the point I was trying to make. Bus snooping may be implemented on one (or even both, but ISTR it was touted as a feature on the 68040 - did it ever actually work?) of the processors, but in the dual-processor design, cache flushing was the only viable option.
The only way to get around the extremely expensive context switches is to divide the RAM into regions for each CPU, with a shared (non-cached) region for message passing - if I understood correctly, this is what you were thinking of. However, shared memory is also expensive, as it must not be cached and you need to copy all data passed back and forth to this slow memory - so you'll still starve a CPU that depends on its caches (though not as badly).
More or less. To reiterate: a section of RAM allocated by the host for the 68K, in which it stores the code, MMU tables and working data. The rest of the entire address range would then be marked as completely uncacheable by the 68K.
The shared memory would only be a few pages in which to implement a register file for the PPC to talk to whatever virtual DMA device you were getting the 68K to pretend to be (for simplicity it could even be in chip RAM). Furthermore, it would be used for passing parameters, never bulk data, so even if it were in some very slow uncacheable location it wouldn't make any significant difference, as you'd only ever be doing a few reads and writes to set up a "DMA" operation. The 68K would then read/write other memory locations as directed by the PPC. The PPC would have to ensure it called CachePreDMA() on the affected region beforehand and CachePostDMA() afterwards, just as it would with any other DMA device. The 68K, however, wouldn't need to worry about that, as the only area it can cache is its own private working set, which the PPC would never touch after initializing it.
The idea is not to make data transfers faster since the bottleneck would be the device you were transferring data to or from. The idea was to make data transfers not chew up PPC cycles.
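Putting that together, the PPC-side transfer call might be sketched like this. CachePreDMA() and CachePostDMA() are the real exec calls you'd wrap around any DMA; the register file (as sketched a few posts up), KickVDMA() and the signal plumbing are all invented for illustration:

#include <exec/types.h>
#include <proto/exec.h>

struct VDMARegFile {              /* as sketched a few posts up        */
    volatile ULONG command, length, device, status;
    volatile APTR  address;
};

extern struct VDMARegFile *regs;  /* hypothetical, in uncached memory  */
extern void KickVDMA(void);       /* hypothetical: interrupt the 68K   */
extern ULONG vdma_sigmask;        /* signal raised by the completion   */
                                  /* interrupt                         */
LONG vdma_read(APTR buffer, ULONG bytes)
{
    ULONG len = bytes;

    CachePreDMA(buffer, &len, 0); /* make the buffer safe for an       */
                                  /* external writer, as with any DMA  */
    regs->command = 1;            /* VDMA_CMD_READ, say                */
    regs->address = buffer;
    regs->length  = len;
    KickVDMA();                   /* 68K starts the transfer           */

    Wait(vdma_sigmask);           /* sleep; the host schedules other   */
                                  /* PPC tasks in the meantime         */
    CachePostDMA(buffer, &len, 0);/* re-validate caches afterwards     */
    return (LONG)regs->status;
}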
-edit-
I hope that's a bit clearer, but as I said, there are probably many other technical reasons why the whole idea would fall over that might not be obvious until trying it :)
-
@Karlos
Seriously, when I asked this I didn't expect the thread to go this way, and it's very interesting. Your views on the possibility of using the 68k to do something, instead of it just sitting there in the cold, are great - even if the technical explanation is a little over my head, I have to admit. Anyway, thanks. That's the Amiga hacking spirit :-)
@all
Do you know of some benchmarking software I can use under OS4 Classic to compare the performance of an 060 to a 68k emulated with Petunia?
@cgutjahr
I bought OS4.0 at launch but wasn't happy with software compatibility back in the day, and the fact that games (mostly WHDLoad'ed) under RunInUAE were unplayable with the configuration I had at the time, so I switched back to 3.9. But maybe I'll have to try the 4.1 Classic version or... 4.2 XD
-
Running classic games on OS4 on a classic Amiga will always be slow as hell.
Unless I am mistaken, you are emulating the whole chipset as well as the CPU.
-
Running classic games on OS4 on a classic Amiga will always be slow as hell.
Unless I am mistaken, you are emulating the whole chipset as well as the CPU.
You are mistaken. The native chipset is still supported in OS4 on classics. Even hardware banging stuff tends to work since the hardware being banged is actually present.
Stuff that is critically dependent on 68K v custom chip timing might not work so well but then that's often a problem for faster 68K processors too.
Unless you are talking about using UAE on the classic machines, in which case, yeah, it's pretty slow.
-
Oh OK, I was always under the impression that running AGA games, for instance, under OS4 for classic had to be done in UAE and was hence slow as hell. But you can just run an AGA game in OS4 and it will bang the hardware, and just the CPU will be emulated?
That's pretty neat/clever.
-
Oh OK, I was always under the impression that running AGA games, for instance, under OS4 for classic had to be done in UAE and was hence slow as hell. But you can just run an AGA game in OS4 and it will bang the hardware, and just the CPU will be emulated?
As a rule, yes. Obviously there are some incompatibilities, just as there are with accelerated 68K systems trying to run old 68000/OCS titles generally (without resorting to patching the games a la WHDLoad, for instance).
-
That is pretty cool. Has anyone ever written a version of WHDLoad for MorphOS?
That would make the classic version of MorphOS interesting.
-
That is pretty cool. Has anyone ever written a version of WHDLoad for MorphOS?
No idea, but I wasn't implying there was a version for OS4, either. That was thrown in as an example that even without 68K emulation you often need a bit of an assist to get some classic games working on a classic system :)
That would make the classic version of MorphOS interesting.
I've not had cause to boot it for a while, but my 1.4.5 install only seemed to work with RTG compatible titles.
-
I could never get MorphOS to boot on my 1200. I am guessing, though, that hardware banging stuff didn't work on it? Piru?
Does WHDLoad run on OS4, though?
-
I could never get MorphOS to boot on my 1200. I am guessing, though, that hardware banging stuff didn't work on it? Piru?
I can't say I had a lot of joy on that front but OS/RTG friendly stuff ran well. My kit is pretty much A1200 + phase5 accelerator and graphics card so no surprises there :)
Does WHDLoad run on OS4, though?
Not tried it, but I'd hazard a guess it probably doesn't. Or if it does, it will very much depend on what an individual installer patch does in order to get the specific game working. Poking around the supervisor mode of the (emulated) 68K is probably a non-starter.
-
@JJ
Nope, you just can't run WHDLoad on OS4 without UAE (RunInUAE).
Some say it needs a rewrite for hardware banging, but no skilled coder has taken up the challenge so far (sniff...)
-
I found this: http://aminet.net/package/dev/c/measureconwos
RAM set to 60 ns in the firmware on these BlizzardPPCs:
WOS (060@72 rev5 & 603e@360, 72 MHz FSB): 668 ms
PUP (060@72 rev5 & 603e@360, 72 MHz FSB): 1031 ms
WOS (060@72 rev1 & 603e@200, 66 MHz FSB): 791 ms
PUP (060@72 rev1 & 603e@200, 66 MHz FSB): cannot load ppc.library
-
You are mistaken. The native chipset is still supported in OS4 on classics. Even hardware banging stuff tends to work since the hardware being banged is actually present.
Stuff that is critically dependent on 68K v custom chip timing might not work so well but then that's often a problem for faster 68K processors too.
Unless you are talking about using UAE on the classic machines, in which case, yeah, it's pretty slow.
I must say that some kind of "Radeon emulating AGA", or an FPGA doing it on the Sam440/460, would be a nice thing to see: with the AGA chipset available and Petunia's 68k JIT, it would make OS 4.1 backward compatible without the need for UAE, with almost all WB and productivity apps working.
In these terms, Classic OS 4.0/4.1 is the most backward-compatible OS 4.x, simply because AGA is onboard (and supported by the OS), making UAE mostly unnecessary.
-
Karlos, if I understand what you're saying, this method of operation relegates the '060 to interfacing with the hardware while the PowerPC waits for the memory bus to be free for use?
If so, this is a lot like what the Sega Saturn does in practice when you use both CPUs. The good development houses had to make use of the caches of the CPU that was not performing I/O, but it did work well in games such as VF2 and Fighting Vipers, where one CPU handled the player character and the other did the AI of the enemy. Parallel processing on older machines with only one memory bus is certainly a complex venture, it seems.
-
The CPUs most definitely do run in parallel.
I don't know who invented this nonsense and for what purposes, but it is not true.
I just think a lot of people misunderstood the context switch problem, as parallel processing wasn't mainstream at the time. Using a real 68k processor was a bad idea, though.
-
Karlos, if I understand what you're saying, this method of operation relegates the '060 to interfacing with the hardware while the PowerPC waits for the memory bus to be free for use?
Operationally, it would be something like this. Imagine you have implemented a virtual DMA driver for the motherboard IDE. This is a small 68K application located in memory designated as 68K-cacheable, along with some workspace for the working set and MMU tables. The rest of the entire address space is marked as uncacheable.
This means the only things the 68K will cache are accesses to the 68K code, local working set, stack and so on, allowing it to operate at full speed on local data.
The host PPC now wants to load 128K into some nice cache-aligned address in fast RAM. It writes the base address, the number of bytes to transfer and any other relevant information to a "register file" belonging to this device. In practice, this register file area needs to be uncached on the PPC too.
Having loaded the parameters, it then invokes the 68K to process the request. How this would be done is still somewhat vague, but the 68K would likely have to be able to respond to interrupts issued by the PPC. The PPC task now goes to sleep, awaiting some indication that the process has completed. At this point, the host OS will just find something else to do.
Concurrently, the 68K starts retrieving data from the motherboard IDE into its local working store. Once it has a 68K cache line's worth, it can transfer that whole line to the fast RAM address and increment. Obviously, non-aligned bits will have to be handled, but this side of it is not rocket science.
When the 68K has finished loading the 128K, or some error occurs, it updates some output values accordingly in its "register file" and signals the PPC to wake up the calling task. Said task ensures the PPC cache isn't now out of date, checks the return status in the register file and takes whatever action is necessary, be it returning success or invoking some error strategy.
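That wake-up could be a bog-standard exec interrupt server on the host side. Signal() and AddIntServer() are the real exec calls; the choice of INTB_PORTS, the server's register-passing conventions (glossed over here) and the 68K's ability to pull that line at all are assumptions of the design:

#include <exec/types.h>
#include <exec/nodes.h>
#include <exec/tasks.h>
#include <exec/interrupts.h>
#include <hardware/intbits.h>
#include <proto/exec.h>

struct VDMAIntData {
    struct Task *waiter;      /* driver task sleeping in Wait()  */
    ULONG        sigmask;     /* signal to deliver on completion */
};

/* A real server would first check the interrupt actually came from
 * our "device" and pass it on otherwise (these chains are shared). */
static LONG VDMAIntServer(struct VDMAIntData *data)
{
    Signal(data->waiter, data->sigmask);
    return 0;
}

static struct VDMAIntData intdata;
static struct Interrupt   vdmaInt;

void vdma_install(struct Task *waiter, ULONG sigmask)
{
    intdata.waiter  = waiter;
    intdata.sigmask = sigmask;
    vdmaInt.is_Node.ln_Type = NT_INTERRUPT;
    vdmaInt.is_Node.ln_Name = "vdma.done";
    vdmaInt.is_Data = &intdata;
    vdmaInt.is_Code = (VOID (*)())VDMAIntServer;
    AddIntServer(INTB_PORTS, &vdmaInt);   /* INTB_PORTS: a guess */
}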
If so, this is a lot like what the Sega Saturn does in practice when you use both CPUs. The good development houses had to make use of the caches of the CPU that was not performing I/O, but it did work well in games such as VF2 and Fighting Vipers, where one CPU handled the player character and the other did the AI of the enemy. Parallel processing on older machines with only one memory bus is certainly a complex venture, it seems.
I'm not really clear on how the Saturn works, but yes, my idea is that the 68K runs as an uncached IO processor, with the sole exception that it can cache some private working area that the PPC never touches.
-
I just think a lot of people misunderstood the context switch problem, as parallel processing wasn't mainstream at the time. Using a real 68k processor was a bad idea, though.
When the P5 cards were designed, the 68k was mandatory for running a 68k AmigaOS.
There was neither MorphOS nor AmigaOS 4.
-
@Karlos
I think I got it now... I'd call it 'fine-grained cache coordination' - definitely doable, provided the quick signalling bit (interrupt passing) works, though I can hardly imagine how anything could work without it. You'd have to write new drivers for the 68k side, though (or provide elaborate frontends to the existing ones; I'm not sure a generic one can do the trick). I can imagine that it may be easier to write new PPC drivers from scratch (definitely for something as generic as IDE - yes, only an example of course). I got a bit distracted by the 'DMA' bit, but that's just one of the ways you could use the passing of jobs back and forth - very much like the AOS messaging system, actually!
RAM set to 60 ns in the firmware on these BlizzardPPCs:
WOS (060@72 rev5 & 603e@360, 72 MHz FSB): 668 ms
PUP (060@72 rev5 & 603e@360, 72 MHz FSB): 1031 ms
WOS (060@72 rev1 & 603e@200, 66 MHz FSB): 791 ms
PUP (060@72 rev1 & 603e@200, 66 MHz FSB): cannot load ppc.library
Woah - those definitely are killer times - far worse than I could have imagined. But then again, flushing a decently sized cache can take a while... Well, as Karlos has pointed out, there are better ways; simply flushing everything is just the easiest by far.
-
@Karlos
I think I got it now... I'd call it 'fine-grained cache coordination' - definitely doable, provided the quick signalling bit (interrupt passing) works, though I can hardly imagine how anything could work without it. You'd have to write new drivers for the 68k side, though (or provide elaborate frontends to the existing ones; I'm not sure a generic one can do the trick). I can imagine that it may be easier to write new PPC drivers from scratch (definitely for something as generic as IDE - yes, only an example of course). I got a bit distracted by the 'DMA' bit, but that's just one of the ways you could use the passing of jobs back and forth - very much like the AOS messaging system, actually!
The reason I mentioned DMA is that this seems to me the most useful thing the 68K could be doing as a service for the PPC-native host OS, simply because it could be implemented in the non-cache-flush-context-switchy manner described. That is to say, the PPC, on which the whole OS now runs, sees the 68K as an IO processor that can read and write data anywhere in the machine's entire address space - memory, ports, everything. And the most obvious use for this capability would be transferring data to/from the motherboard resources (IDE, parallel etc.) from/to host memory buffers. This makes the 68K a very versatile IO processor for all the legacy hardware, one that is virtually limitlessly reprogrammable.
If you want to run actual 68K applications, the existing JIT emulation is much more sensible than some inverted WarpOS-style idea, which would suffer all the same problems WarpOS itself does.
However, even if my idea could be made to work (and there's no guarantee, but I can't see any massive obstacles), as you suggest, entirely new PPC-native drivers (and their 68K counterpart code) would be needed to leverage such a system. If it could be made to work, though, it seems to me like a legitimate use for the old silicon. My 040 can read and write data from the old ports just as fast as the PPC can, since the limit is usually the port itself. The main benefit of using the 68K to do this is to allow the PPC to do other stuff while the IO is in progress.
-
When the P5 cards were designed, the 68k was mandatory for running a 68k AmigaOS.
There is no reason they couldn't have included a 68k emulator in ROM to run AmigaOS.
-
There is no reason they couldn't have included a 68k emulator in ROM to run AmigaOS.
I disagree. First of all, 68K emulation on PPC was not as mature back when these cards were devised. They'd have spent a long time developing a 68K emulation that probably ended up slower than their faster 68K boards - substantially so if they'd had to use interpretive methods. Apple had this very issue when they first moved to PPC. Except at least they had the benefit of a PPC OS to mitigate it; we didn't even have that then.
-
I replaced the rev5 with a rev6 and I get a stable 060@78 now.
RAM set to 60 ns in the firmware on my BlizzardPPC:
WOS (060@78 rev6 & 603e@324, 72 MHz FSB): 547 ms
-
I plugged in a 40 ns SIMM: http://leblogdecosmos.blogspot.fr/2012/04/simms-40ns.html
RAM set to 70 ns in the firmware on my BlizzardPPC:
WOS (060@80 rev6 & 603e@351, 78 MHz FSB): 619 ms