Author Topic: FPGA Replay Board (Read 995993 times)

vidarh · « **on:** January 07, 2011, 04:27:17 PM »

Quote from: Retro_71;604456

Any ETA Mike?? My money is burning a hole and before the wife sees it!!!

Wives are why bankers invented numbered Swiss bank accounts :lol:

vidarh · « **Reply #1 on:** January 23, 2011, 07:22:16 PM »

Quote from: DCAmiga;608472

Very Nice 112 MB/s but just out of curiosity, wouldn't it be better to also add a CPU 32-bit data bus as well?

Well, yeah, that's presumably the reason yaqube said earlier that it'll get faster once 32-bit transfers are turned on.

vidarh · « **Reply #2 on:** March 10, 2011, 01:21:25 PM »

Quote from: freqmax;620861

Exactly sell it with free open source Boot-ROM and OS. And get rid of the whole issue. Doesn't seem reasonable to be responsible for user actions.

(any link where to download this AROS m68k ROM?)

aros.org has a download link for nightlies, or you can get the latest WinUAE betas - WinUAE now comes with a "built in" AROS ROM.

vidarh · « **Reply #3 on:** March 26, 2011, 09:59:50 AM »

Quote from: freqmax;624782

The purpose of the 68060 board is to reverse engineer the CPU, AFAIR? So it really doesn't matter too much unless that's the purpose.

I don't think they need to reverse engineer it. All the timings etc., as well as fairly detailed block diagrams, are well documented both from official documentation and years of testing, and the N68050 is, according to their team, already more advanced in many ways.

I believe it's more of a development tool while they finish up the 050 and verify it's working properly.

vidarh · « **Reply #4 on:** March 28, 2011, 09:20:17 AM »

Quote from: Mathias;625239

There are some ideas/concepts, but work on it will start this summer, as we also plan to do our own 68k handler, once the first series is out to the customers.

An idea for you guys: Look into a tracing JIT. Effectively you start by putting a breakpoint at the very start of the application. Then when the breakpoint is hit, the trap handler checks each following instruction until a branch point. At the branch point you insert another breakpoint.

If any of the found instructions are "unsafe" (differ between M68k and Coldfire), you rewrite each one to an equivalent safe variation, or, if there's no space to change it inline, patch in a jump to a tiny dynamically generated function that emulates the right behavior and jumps back. Then you let the instruction stream execute until the next trap, and repeat.

Initially things will run slowly, but after each code path has been executed once, it'll have been rewritten and will run *far* faster than having to run a trap handler for every hit.

This approach is what's used by modern JavaScript JITs - there it's complex because they have to do complex code generation, but for M68k => Coldfire most of the code is already valid, so the JIT can pass over all safe instructions and only needs to recognize the few cases that are not valid Coldfire code and know how to generate code to emulate just those instructions.

You could probably do a decent functioning tracer/JIT for M68k => Coldfire in a couple of thousand lines of code.
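
To make the loop concrete, here's a toy sketch in Python rather than anything resembling real M68k code. Everything in it is an assumption for illustration: the instruction kinds ("safe", "unsafe", "jump", "halt"), the "patched" stand-in for rewritten code, and the function names. A real tracer would decode actual opcodes and plant breakpoints in memory instead of keeping a set of translated addresses.

```python
# Toy model of a tracing translator. Instructions are (kind, operand) tuples;
# the kinds "safe", "unsafe", "jump" and "halt" are invented for illustration.

def translate_trace(program, start, translated):
    """Scan from `start` to the next branch point, fixing unsafe instructions."""
    pc = start
    while True:
        kind, operand = program[pc]
        if kind == "unsafe":
            # Rewrite in place; "patched" stands in for equivalent safe code.
            program[pc] = ("patched", operand)
        translated.add(pc)
        if kind in ("jump", "halt"):
            return  # branch point: the next trace starts wherever this leads
        pc += 1


def run(program, max_steps=100):
    """Execute the toy program, translating each trace the first time it is hit."""
    translated = set()
    handler_calls = 0
    pc = 0
    for _ in range(max_steps):
        if pc not in translated:
            handler_calls += 1  # models the "breakpoint hit" case
            translate_trace(program, pc, translated)
        kind, operand = program[pc]
        assert kind != "unsafe", "untranslated code must never execute"
        if kind == "halt":
            break
        pc = operand if kind == "jump" else pc + 1
    return handler_calls


demo = [
    ("safe", None), ("unsafe", None), ("jump", 4),
    ("unsafe", None),                      # never reached, so never touched
    ("safe", None), ("unsafe", None), ("jump", 0),
]
print(run(demo), demo)
```

The demo loops between two traces. The handler only runs once per trace no matter how many times the loop goes around, and the unreachable instruction is left alone because nothing ever executes it.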

vidarh · « **Reply #5 on:** March 28, 2011, 11:09:05 AM »

Quote from: Mathias;625256

Something like this is planned by Fredi Aschwanden. I just avoided the word JIT, because I do not completely understand it myself, and the next question normally is "why not use a PPC then" ;) Such a "partial JIT" will solve everything except self-modifying code as far as I understood. I will post your message inside our development forum. Thanks.

Well, you could just as well ask "why not switch to emulation", but then I guess most of us here aren't all that rational when it comes to our attachment to the M68k architecture.

And the other thing is that a JIT from M68k to PPC is massively more complicated. An M68k to Coldfire JIT can just skip past most instructions, so for almost all instructions all it needs to know is how to determine that they fall in the "safe/compatible" category, and how to figure out the length of the operands to skip them. For M68k that's very easy for most instructions - there aren't that many variations of the encoding.

In theory there's nothing really stopping you from undoing relocation and generating/caching a new fully or partially translated binary through this process either, though there might not be much point in adding the complexity.

Looking at the size of M68k instruction decoders, you can do enough of the instruction decode to skip the safe instructions in at most a couple of hundred lines of code (the only one I happen to have sitting on my hd is 457 lines of C to decode instructions fully to text - a decoder that for most cases only needs to figure out the type of instruction and size would be far smaller).

Beyond that it's the code to manage breakpoints and splice in "fixed" instructions, but that shouldn't be huge either.
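
As a rough illustration of how little is needed just to skip an instruction, here's a Python sketch that computes the length of a MOVE instruction from its opcode word. It assumes the classic 68000 addressing modes with brief extension words only (no 68020+ full extension words), and it doesn't check that the destination mode is actually legal. The function names are made up for this example.

```python
def ea_extension_bytes(mode, reg, size_bytes):
    """Bytes of extension words that follow the opcode for one effective address.

    Covers only the classic 68000 addressing modes (brief extension words).
    """
    if mode <= 4:                # Dn, An, (An), (An)+, -(An)
        return 0
    if mode in (5, 6):           # (d16,An) and (d8,An,Xn)
        return 2
    if mode == 7:
        if reg in (0, 2, 3):     # abs.W, (d16,PC), (d8,PC,Xn)
            return 2
        if reg == 1:             # abs.L
            return 4
        if reg == 4:             # #imm: byte immediates still occupy a full word
            return 4 if size_bytes == 4 else 2
    raise ValueError("unsupported addressing mode")


# MOVE encodes its operand size in bits 13-12 of the opcode word.
MOVE_SIZE_BYTES = {0b01: 1, 0b11: 2, 0b10: 4}


def move_length(opcode):
    """Total length in bytes of a MOVE/MOVEA instruction with this opcode word."""
    size_bits = (opcode >> 12) & 0b11
    if (opcode >> 14) != 0 or size_bits not in MOVE_SIZE_BYTES:
        raise ValueError("not a MOVE opcode")
    size = MOVE_SIZE_BYTES[size_bits]
    dst_reg, dst_mode = (opcode >> 9) & 7, (opcode >> 6) & 7
    src_mode, src_reg = (opcode >> 3) & 7, opcode & 7
    return (2
            + ea_extension_bytes(src_mode, src_reg, size)
            + ea_extension_bytes(dst_mode, dst_reg, size))


print(move_length(0x3200))  # MOVE.W D0,D1
print(move_length(0x203C))  # MOVE.L #imm,D0
```

A skipper for the whole instruction set is mostly more of the same: classify the opcode word, then add up the extension words for whatever effective addresses it carries.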

vidarh · « **Reply #6 on:** March 28, 2011, 03:59:55 PM »

Quote from: Mathias;625295

For the rest of your posting, volunteers are highly welcome at any time :)

See, the thing is I'm very tempted to write a tracer for M68k, but I've overcommitted myself to way too many projects already that are all proceeding at a snail's pace, so I'm hoping to trick someone else into doing the work :-P

The Coldfire part isn't really *that* interesting for me (though it'd be kind of cool to get AROS M68k running on Firebee...), because it's not that relevant for Amiga, but there's tons of other cool stuff you could do if someone wrote a generic M68k tracer (e.g. one that's configurable as to which instructions it traps), like building all kinds of funky debugging and analysis tools that'd work even on non-MMU systems.

vidarh · « **Reply #7 on:** March 28, 2011, 05:54:01 PM »

Quote from: psxphill;625330

The major problem is latency. You'll have to sit between the cpu and ram and insert something like a line-a/illegal instruction.

You misunderstand what I was describing. I was describing a purely software solution similar to a JIT (just in time compiler): loadseg() the binary, run a function to decode and process instructions up until the first potential branch, insert a breakpoint, jump into the (possibly modified) instruction stream. Repeat.

*No* hardware support is needed for this.

Quote

You'll also need some hackery to determine which is the opcode and which is the operands, which may also require lookups.

This "hackery" for M68k is really easy - as I mentioned, a full instruction decoder for M68k is at most a couple of hundred lines of C.

Quote

CF is fine for new software, you can even make new software that works on CF & 68k. But there are always going to be doubts over old software.

A JIT for instruction streams that deviate massively from the target can approach compiled C in performance, e.g. Java bytecode JITs or Lua. For M68k => CF, getting performance far outstripping the original M68k CPUs ought to be fairly trivial.

vidarh · « **Reply #8 on:** March 28, 2011, 06:19:31 PM »

Quote from: ChaosLord;625339

You make it sound very very very easy.

If it is that easy then why couldn't Elbox get it working at high speed on their Dragon?

Did they ever try? AFAIK most or all attempts at dealing with the CF incompatibilities so far have focused on trapping the instructions, which is horribly inefficient. Most likely they simply never thought about this precise approach, or they didn't have anyone who understood JITs - you can probably still count the number of people who have ever implemented a JIT worldwide in the low 3 digits.

The tracing JIT mechanism is well tested in far more challenging circumstances (JIT'ing from things like Java and Lua bytecode, which is massively more complicated because it doesn't match the target machine code at all) and has been shown to work well. The new Lua JIT in particular is very impressive.

Quote

What do you do about games that don't use LoadSeg() ?

The method you use to load the binary is secondary. The only caveat is that you need to be able to start the translator *before* the code you want to process, and to have some reasonable guarantee that you can prevent the JIT translator itself from being overwritten, but for modern reimplementations, providing a write barrier shouldn't be a major obstacle.

vidarh · « **Reply #9 on:** March 28, 2011, 09:01:43 PM »

Quote from: psxphill;625363

JIT may work, but it'll be slower & need more ram.

Slower and need more RAM than what? The alternative is to not run the application at all, or run it under emulation. For stuff you have source to, recompiling it is the better alternative.

In terms of RAM, unless the code is self-modifying, you can get away with only very minor amounts for cases like m68k to CF, in order to patch in emulation of instructions that are not supported and that can't be replaced in-line with code of the same size.

You only need to maintain two copies of the code *if* you need to deal with self modifying code. A simple solution is to not deal with it in ordinary cases, and possibly not at all (frankly, given the small, finite amount of legacy code relying on self modifying code, it's probably better to spend the time patching the few programs that do).
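
Here's a small Python sketch of where the extra RAM would actually go. Code is modelled as a list of 16-bit words. The NOP (0x4E71) and JMP (xxx).L (0x4EF9) opcodes are real M68k encodings, but everything else is an assumption for illustration: the `patch` function, the stub area, the addresses and the placeholder replacement words. It also gives up when the jump doesn't fit in the slot, where a real translator would have to relocate neighbouring instructions into the stub.

```python
NOP = 0x4E71          # M68k NOP opcode
JMP_ABS_L = 0x4EF9    # M68k JMP (xxx).L opcode, followed by a 32-bit address
JMP_WORDS = 3


def jmp_to(address):
    """Encode JMP (xxx).L to a 32-bit address as three 16-bit words."""
    return [JMP_ABS_L, (address >> 16) & 0xFFFF, address & 0xFFFF]


def patch(code, base, index, old_words, replacement, stub_area, stub_base):
    """Replace `old_words` words at code[index]; return extra words of RAM used.

    `code` is a list of words loaded at byte address `base`. Out-of-line
    stubs are appended to `stub_area`, which is located at `stub_base`.
    """
    if len(replacement) <= old_words:
        # Fits inline: overwrite and pad the remainder with NOPs.
        padding = [NOP] * (old_words - len(replacement))
        code[index:index + old_words] = replacement + padding
        return 0
    if old_words < JMP_WORDS:
        raise NotImplementedError("jump does not fit; neighbours need relocating")
    stub_address = stub_base + 2 * len(stub_area)
    resume_address = base + 2 * (index + old_words)
    stub = replacement + jmp_to(resume_address)   # emulate, then jump back
    stub_area.extend(stub)
    filler = [NOP] * (old_words - JMP_WORDS)
    code[index:index + old_words] = jmp_to(stub_address) + filler
    return len(stub)


code = [0x1111, 0x2222, 0x3333, 0x4444, 0x5555]   # placeholder words
stubs = []
print(patch(code, 0x1000, 0, 2, [0xAAAA], stubs, 0x8000))
print(patch(code, 0x1000, 2, 3, [0xB1, 0xB2, 0xB3, 0xB4], stubs, 0x8000))
```

The only growth is the stub itself, and only for instructions that can't be fixed inline. The original code stays where it is.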

Quote

Code accessed through jump tables is also difficult to find until you actually get to it.

Exactly, and that's the reason to do a tracing JIT instead of static translator.

With a tracing JIT it's easy, as you'll always hit a breakpoint when the branch should happen until all paths have been completely traced. A major point of a tracing JIT as opposed to a method based JIT is exactly to make it trivial to handle control flow.

Quote

There is a difference between writing a JIT for a language that was designed for it & one that isn't.

Yes, but in this case it's vastly *easier*, as the mapping function for the vast majority of instructions is simply the identity function (that is, nothing is done other than to skip to the next instruction).

The existence of JITs that translate m68k to i386 or PPC demonstrates a worst case bound where all instructions need to be JIT'd. Yet there are decently performing JITs that do that. For m68k to Coldfire the case is far simpler.

vidarh · « **Reply #10 on:** March 28, 2011, 09:07:03 PM »

Quote from: psxphill;625374

self modifying code is fine as long as you clear the cpu caches afterwards, so as long as you ditch the jit cache when the caches are cleared then it'll work.

Having to clear the cache is another reason why it's seen as bad practice, beyond being horribly unmaintainable. There's a reason people pretty much stopped doing it in the mid 80's.

Quote

How about pushing an address on the stack and then returning?

Simple enough to detect.

Quote

Although rts will need to cope anyway as the code you're returning to might have been flushed if your jit cache fills up. So it will have to always do a lookup to find the real code.

There would be no "jit cache" - that's the entire point of how to make it fast and simple - you'd patch the live code directly, so no, it doesn't need to do any lookups because the pushed address would be the address of the real code.

vidarh · « **Reply #11 on:** March 28, 2011, 09:56:42 PM »

Quote from: freqmax;625394

Self modifying code can have significant perfomance gains when cycles are hard to come by.

Let's separate two definitions here. *Generating* code at runtime is not necessarily a bad thing - after all that's what a JIT does. *Modifying* code by writing into already in-use parts of the code segment is a nasty thing.

The former is easy enough to handle with JIT too - any indirect jump would necessarily need to be replaced with a guard/breakpoint (not necessarily a trap - a jump to a handler in the JIT is sufficient, and can be much cheaper) that ensures no direct jump to untranslated code happens unless the indirection can be shown to be "safe" (relative to a known base, such as a library base).
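
A guard like that can be very small. The Python sketch below is only a model of the idea: `make_guard`, the `translate` callback and the address ranges are all hypothetical names and numbers, and in a real translator the guard would be a few machine instructions jumped to from the patched code.

```python
def make_guard(translated, translate, safe_ranges=()):
    """Build a guard for indirect jumps.

    translated:  set of addresses that have already been traced
    translate:   callback that traces/translates code starting at an address
    safe_ranges: (low, high) address pairs known not to need translation,
                 e.g. targets relative to a known base
    """
    def guard(target):
        in_safe_range = any(low <= target < high for low, high in safe_ranges)
        if not in_safe_range and target not in translated:
            translate(target)          # trace before letting execution continue
            translated.add(target)
        return target                  # the caller then jumps here
    return guard


traced_now = []
guard = make_guard({0x1000}, traced_now.append, [(0xF80000, 0x1000000)])
for address in (0x1000, 0x2000, 0x2000, 0xF80010):
    guard(address)
print([hex(a) for a in traced_now])
```

Only the first jump to an unknown address pays for a translation. After that the guard costs a range check and a lookup.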

Actual self modifying code as opposed to code that safely generates new code is not necessary for performance at all in my view - I believe you can get all the benefits of it by generating code in cleaner ways. But even self-modifying code is not _necessarily_ a big problem to handle - in most cases you can reasonably easily determine with tracing which instruction sequences can lead to writes to address ranges in the code segment, though it does complicate the tracer for very little benefit.

Frankly, I haven't seen self modifying code used for any good purpose since my Commodore 64 days (and then for cycle exact timing for raster effects, not for performance)... I'd be very interested in seeing a good example of it being used in a way where it couldn't easily be avoided without sacrificing a lot of performance.

vidarh · « **Reply #12 on:** March 28, 2011, 11:14:32 PM »

Quote from: psxphill;625408

So all you're going to do is patch at load time?

No. At runtime. It wouldn't be a JIT if it tried to do it all at once.

Quote

You might find some software that works for, but you can't get 100% coverage of all opcodes on all software at load time.

Which is why you trace the execution until each branching point and JIT trace by trace rather than the whole thing at once, at which point determining the instruction stream is trivial (a couple of hundred lines of C, at most, as I said - I have about half a dozen M68k instruction decoders sitting around on my hard disk from various disassemblers and other tools).

Doing it this way means you can analyze each trace fairly easily to determine if the branch point is static (return to caller or branch to a specific address) or dynamic. In the latter case you'd need to insert a jump to a small guard function to ensure you don't jump to untranslated code, unless you can compute the full set of branch points. If in doubt you err on the side of treating it as dynamic, at a slight performance cost.

In reality, the cases where you'd need a guard are so rare that it's most likely not even worth optimizing (though there are a number of well understood ways of doing it, such as polymorphic inline caching, first developed for Self).

Note that jump tables, for example, for the most part do *not* fall in this category, as recognizing a sufficient number of the most common jump table approaches is fairly simple, and handling them is easy enough. You add a small guard function that checks bounds and adds breakpoints for all functions between the previous highest/lowest jump table values used if they can't be statically determined; otherwise it just jumps to them, triggering a breakpoint if the code hasn't been traced yet. You suffer a worst case cost of a couple of compares and branches once the translation has been done.
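
Roughly what I have in mind for the jump table guard, as a Python model. The class name, the `add_breakpoint` callback and the table contents are invented for the example. It tracks the lowest and highest index used so far and only does extra work when an index falls outside that range.

```python
class JumpTableGuard:
    """Guard for one jump table; tracks the range of indices used so far."""

    def __init__(self, table, add_breakpoint):
        self.table = table                    # list of target addresses
        self.add_breakpoint = add_breakpoint  # callback taking a target address
        self.low = None
        self.high = None

    def dispatch(self, index):
        """Return the jump target, planting breakpoints on newly covered entries."""
        if not 0 <= index < len(self.table):
            raise IndexError("jump table index out of bounds")
        if self.low is None:                  # first use of this table
            new_entries = [index]
            self.low = self.high = index
        elif index < self.low:                # extend the known range downwards
            new_entries = range(index, self.low)
            self.low = index
        elif index > self.high:               # extend the known range upwards
            new_entries = range(self.high + 1, index + 1)
            self.high = index
        else:                                 # fast path: two compares, then jump
            new_entries = ()
        for entry in new_entries:
            self.add_breakpoint(self.table[entry])
        return self.table[index]


planted = []
guard = JumpTableGuard([100, 200, 300, 400], planted.append)
for i in (1, 3, 2, 0):
    guard.dispatch(i)
print(planted)
```

Once the whole table has been covered, every dispatch takes the fast path, which is the "couple of compares and branches" worst case.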

Quote

Is copy protection a good purpose?

No. Given that few of them prevented anything from getting copied for more than days back in the day, I'd say that's an exceedingly good example of how pointless it is. While getting originals to run would be nice, and handling the most basic self modifying code is reasonably straightforward, it's a clear example of what I'd consider a waste of time given that finding cracks is easy enough.

vidarh · « **Reply #13 on:** March 30, 2011, 10:36:24 AM »

Quote from: espskog;625943

I got mine yesterday (first batch).

Trying hard not to be jealous.... Any first impressions?

vidarh · « **Reply #14 on:** March 30, 2011, 10:39:42 PM »

Quote from: psxphill;626058

What you're suggesting is impossible. You say it won't take more ram and yet you're somehow going to have the entire program in memory and have it patch itself. Just try and code it.

I say it won't take a lot more RAM. For a typical program where only a small percentage of the instructions need to be patched, the growth would hardly be noticeable.

My point was that there's no reason to generate the whole program all over again at a new location in memory.
