Author Topic: Imaginary 64-bit 680x0 (Read 5178 times)

Karlos · « **Reply #14 on:** March 19, 2022, 12:54:05 PM »

Quote

Even lazy flag calculation in an interpreter would be a win, I did that when I wrote an interpreter in 6502 assembler

For an emulator yes. For an interpreter? Why? Unless you intend to support arithmetic overflow and other special cases, I can't think of a single reason to implement it. If you want to take a branch because an operand is zero, not zero or bigger than some other operand, comparing them directly means you don't have to evaluate the side effects of any operation.
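As a rough illustration of the difference (a sketch only; the class and function names are made up for this example and come from neither project), an emulator has to keep enough state to answer any later flag query, while an interpreter with compare-and-branch instructions can test the operands directly:

```python
MASK32 = 0xFFFFFFFF


class LazyFlags:
    """Emulator style: remember the last subtraction, derive flags on demand."""

    def __init__(self):
        self.a = self.b = self.result = 0

    def record_sub(self, a, b):
        # Cheap bookkeeping done by every flag-setting instruction.
        self.a, self.b = a & MASK32, b & MASK32
        self.result = (self.a - self.b) & MASK32

    def zero(self):
        # Only evaluated if some later instruction actually asks for it.
        return self.result == 0

    def carry(self):
        # Unsigned borrow out of the recorded subtraction.
        return self.a < self.b


def branch_if_equal(pc, a, b, displacement):
    """Interpreter style: compare the operands directly, no flags involved."""
    return pc + displacement if a == b else pc
```

The second form never stores or evaluates side effects, which is the point being made above.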

trekiej · « **Reply #15 on:** March 21, 2022, 03:01:19 AM »

Are there many 64-bit processors that are programmed in an FPGA?

Karlos · « **Reply #16 on:** March 21, 2022, 10:33:00 AM »

Quote from: trekiej on March 21, 2022, 03:01:19 AM

Are there many 64-bit processors that are programmed in an FPGA?

I don't know. This project is probably unsuited to an FPGA implementation for the moment (it's not "hardware" enough), although having seen an FPGA that was built to just run E1M1 of Doom, I'm less sure of that.

For real 64-bit processors, with the exception of the DEC Alpha and a few other very early 64-bit processors, I imagine the biggest reason not to have FPGA implementations would be that they're all still in mainstream use.

psxphill · « **Reply #17 on:** March 24, 2022, 09:02:01 AM »

Quote from: Karlos on March 21, 2022, 10:33:00 AM

I imagine the biggest reason not to have FPGA implementations would be that they're all still in mainstream use.

I think it comes down to cost of the FPGA. Something like the DE10 Nano is pushing it running a 486 PC. Apollo isn't particularly better.

If you simplify your CPU then sure you could do a 256 bit one in an FPGA, but it wouldn't necessarily be able to do anything worthwhile.

Similar to how the TI-99/4 had a 16-bit CPU but was so crippled that it was outperformed by machines with 8-bit CPUs.

Karlos · « **Reply #18 on:** March 24, 2022, 07:52:30 PM »

The benefit of writing a software CPU (and I use the term in the loosest possible sense, as it's missing critical features any self-respecting emulation of a CPU has) is that you can do whatever you want. I did once think about a VM design that was entirely vector based except for a set of address registers (including stacks and PC). To do scalar operations, or otherwise control which elements of each vector were affected, you set a mask. It was an interesting thought experiment but I never took it much further. It would be a pain to program for.
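To make the mask idea concrete, here is a minimal sketch. It assumes vectors are plain lists and that bit i of the mask enables lane i; the function name and lane layout are invented for illustration and nothing here comes from an actual design.

```python
def masked_add(dst, src, mask):
    """Add src to dst lane by lane, touching only lanes whose mask bit is set."""
    return [
        d + s if (mask >> lane) & 1 else d
        for lane, (d, s) in enumerate(zip(dst, src))
    ]


# A "scalar" operation is the same instruction with a single lane enabled.
scalar = masked_add([1, 2, 3, 4], [10, 10, 10, 10], 0b0001)
```

Every scalar operation needing a mask set up first is a fair picture of why it would be a pain to program for.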

Karlos · « **Reply #19 on:** April 17, 2022, 05:20:58 PM »

I got bored with the basic virtual framebuffer and added a tiny interpreter to that as well at the point where pixels are converted to the XImage / GL Texture that's displayed. This copies the basic idea of the Amiga's Copper and executes instructions on specific "beam" coordinates. These instructions can modify palette entries and the basic X/Y viewport offsets. They can also modify the script itself for maximum horror.
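A much simplified sketch of the idea, not the project's code: the script is assumed to be a list of tuples sorted by line, instructions fire at the start of a scanline rather than at arbitrary beam X positions, and only a horizontal offset and palette writes are modelled. All names are hypothetical.

```python
def render(pixels, palette, script):
    """Convert rows of palette indices to rows of RGB tuples.

    script entries are (line, "palette", index, rgb) or (line, "xoffset", value)
    and must be sorted by line. Each entry runs when the beam reaches its line.
    """
    palette = list(palette)  # work on a copy; the script mutates it
    width = len(pixels[0])
    x_offset = 0
    ip = 0  # index of the next script instruction
    out = []
    for y, row in enumerate(pixels):
        # Run every instruction scheduled for this line or earlier.
        while ip < len(script) and script[ip][0] <= y:
            _, op, *args = script[ip]
            if op == "palette":
                index, rgb = args
                palette[index] = rgb
            elif op == "xoffset":
                (x_offset,) = args
            else:
                raise ValueError(f"unknown script op: {op}")
            ip += 1
        out.append([palette[row[(x + x_offset) % width]] for x in range(width)])
    return out
```

Scrolling only a band of the display is then a matter of setting the offset on the band's first line and resetting it on the line after the band ends.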

The middle section of the display is scrolled here by using these hacks...
https://youtu.be/OgThodWAVOk

Waccoon · « **Reply #20 on:** April 19, 2022, 06:35:41 AM »

I'm a bit disappointed. It's not much like a 68000 at all, even if the assembler mnemonics are similar.

I was interested in the instruction encoding, because that's a very important detail for judging how much hardware would be needed to turn it into a real CPU, not to mention the code density. Oh yeah, and the novelty factor as well. 68K is pretty clever with how it encodes instructions and has very dense code. Among the "complex" CISC designs, 68K is pretty interesting.

The encoding of MC64K is actually a lot like x86 (!), as it just tacks on numerous extra extension bytes as modifiers, which is lazy. Like x86, you need to parse almost the whole instruction to determine how long it is, which is just silly. You can't easily add new instructions in the future without really going down the x86 route by adding opcode pages, which is messy. If you're going to support variable instruction length, this implementation is a terrible way to do it, regardless of whether it's done in software or hardware.

Karlos · « **Reply #21 on:** April 19, 2022, 11:16:23 AM »

It's a bytecode interpreter. It's not remotely intended for running on an FPGA. If it looks like x86 that's because x86 looks a lot like bytecode. Not the other way around. The bytecode here is essentially a binary tokenised representation of the assembler source code.

Not all instructions vary in length; that depends on the operands. Register to register operations are 3 bytes, whether encoded using the generic EA mode or the register to register fast path. The operand size is a fixed function of the opcode. As for the suitability of the format, the simple fact is it's efficient enough. It runs functionally equivalent code faster than UAE does using interpretive mode on the same hardware, faster than a 100MHz 68060 does natively (memory bandwidth doesn't help it) and, on my few-years-old i7-7500U laptop, faster than my 603 runs functionally equivalent PPC native code at 240MHz.

Given the intent is just to have fun writing old demo style effects in a syntactically familiar setting, I think it's fast enough. And when it isn't, there's always the option of adding a JIT.

People seem to misunderstand what this is, despite the fact it's spelled out explicitly in the project readme page. It will only run as an interpreter with the goal of providing JIT at some stage.

Think of it as an Interpreter / Programming Language environment. Like Java was originally. Only instead of a high level compiled language syntax, it's an assembler syntax that is inspired by 680x0. Note the word inspired. Not identical. Not compatible.

Karlos · « **Reply #22 on:** April 20, 2022, 10:31:37 AM »

@Waccoon

I thought this criticism was worthy of an additional note:

Quote

The encoding of MC64K is actually a lot like x86 (!), as it just tacks on numerous extra extension bytes as modifiers, which is lazy. Like x86, you need to parse almost the whole instruction to determine how long it is, which is just silly.

What you describe as lazy is probably true for a hardware implementation. However, for a bytecode interpreter, it's pretty sensible because parsing is part of execution regardless. Let's consider what is just about the longest example I can think of:

fbgt.d #4.669201609, $ABADCAFE(a0, d0.l * 4), .target

The bytecode layout of this instruction would be (from lowest address to highest)

{ fbgt.d : 1 byte } { dst ea mode : 1 byte } { dst ea mode reg pair : 1 byte } { dst ea mode offset : 4 bytes } { src ea mode : 1 byte } { src ea literal : 8 bytes } { branch displacement : 4 bytes }

That's 20 bytes in total, which is pretty long. However, consider that this is executed serially by the interpreter:

1. Get the opcode => jump to handler for fbgt.d
2. Get the destination EA mode => jump to handler for scaled index with displacement
2.1 Get the register number pair for the scaled indexing, calculate base effective address
2.2 Get the displacement from the opcode stream and add to the base effective address.
2.3 Fetch the double at the address
3. Get the source EA mode => jump to handler for immediate 8 byte
3.1 Get the double in the opcode stream
4. Compare the operands
4.1 If the comparison is true, get the branch displacement in the opcode stream and add to the PC ready to branch.
4.2 If the comparison is false, step over the branch displacement to the next instruction.

At each step, the data that is needed is next in the instruction stream. All of these operations deal with bytes apart from the one packed register pair, which allows for very simplistic C code (read: the kind the compiler can optimise most readily) to be used. As there is no performance penalty for misaligned access to a cached value on x64, the 4 and 8 byte entities are read as such. Each step of the decode and execution advances the PC. Thus the PC is always aligned at an instruction boundary on completion.
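The serial decode can be sketched as follows. This is an illustration in Python rather than the interpreter's actual C/C++, and several things are assumptions: the opcode and EA mode byte values, the nibble packing of the register pair, little-endian fields, a branch displacement relative to the end of the instruction, and "branch if source > destination" as the meaning of fbgt.d. A small displacement is used instead of $ABADCAFE so the example fits in a toy memory.

```python
import struct

# Hypothetical encodings: the thread does not give the real MC64K values.
OP_FBGT_D = 0x01
EA_INDEXED_DISP = 0x10  # d32(aN, dM.l * 4)
EA_IMM_DOUBLE = 0x20    # 8-byte literal in the opcode stream


class Stream:
    """Byte stream with a program counter that advances on every read."""

    def __init__(self, code, pc=0):
        self.code = code
        self.pc = pc

    def take(self, fmt):
        (value,) = struct.unpack_from(fmt, self.code, self.pc)
        self.pc += struct.calcsize(fmt)
        return value


def encode_fbgt_d(literal, a, d, disp, branch):
    """Lay the fields out in the order listed above (20 bytes, no padding)."""
    return struct.pack(
        "<BBBiBdi",
        OP_FBGT_D, EA_INDEXED_DISP, (a << 4) | d, disp,
        EA_IMM_DOUBLE, literal, branch,
    )


def step(stream, mem, a_regs, d_regs):
    """Execute one fbgt.d instruction and return the new PC."""
    if stream.take("<B") != OP_FBGT_D:            # 1. opcode
        raise ValueError("unsupported opcode")
    if stream.take("<B") != EA_INDEXED_DISP:      # 2. destination EA mode
        raise ValueError("unsupported destination mode")
    pair = stream.take("<B")                      # 2.1 packed register pair
    address = a_regs[pair >> 4] + d_regs[pair & 0x0F] * 4
    address += stream.take("<i")                  # 2.2 displacement
    (dst,) = struct.unpack_from("<d", mem, address)  # 2.3 fetch the double
    if stream.take("<B") != EA_IMM_DOUBLE:        # 3. source EA mode
        raise ValueError("unsupported source mode")
    src = stream.take("<d")                       # 3.1 literal from the stream
    branch = stream.take("<i")                    # PC is now past the instruction
    if src > dst:                                 # 4. compare (assumed operand order)
        stream.pc += branch                       # 4.1 taken
    return stream.pc                              # 4.2 otherwise fall through
```

Nothing in `step` needs to know the instruction length up front; the PC simply ends up at the next instruction once the last field has been consumed.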

Note that it's not easy to create even longer instructions, since immediate values are illegal destination address modes. Any other operation that can be statically evaluated by the assembler, for example comparing an EA mode to itself (as long as the EA mode has no increment/decrement semantics) is evaluated by the assembler and folded to a fixed branch (for true) or nothing (for false).
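A sketch of that kind of folding, under stated assumptions: operands are compared as source text, side effects are recognised by a leading or trailing + or - around the parentheses, and only integer conditions are handled (floating point equality would need more care because of NaN). The function name and the return values are made up for the example.

```python
import re

# Matches EA text such as "(a0)+", "(a0)-", "-(a0)" or "+(a0)".
_HAS_SIDE_EFFECT = re.compile(r"^[+-]\(|\)[+-]$")

ALWAYS_TRUE = {"eq", "ge", "le"}    # x == x, x >= x, x <= x
ALWAYS_FALSE = {"ne", "gt", "lt"}   # x != x, x > x, x < x


def fold_self_compare(cond, src, dst):
    """Return "bra", "nothing", or "emit" for a compare-and-branch."""
    src, dst = src.strip(), dst.strip()
    if src != dst or _HAS_SIDE_EFFECT.search(src):
        return "emit"        # must be evaluated at run time
    if cond in ALWAYS_TRUE:
        return "bra"         # becomes an unconditional branch
    if cond in ALWAYS_FALSE:
        return "nothing"     # the instruction is dropped entirely
    return "emit"
```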

Regarding the need to determine the instruction size, the only part of the current workflow that needs to care about that is the assembler itself. It's a simple two pass design that during pass 1 resolves any references to things it's already seen immediately, and resolves references to things it hasn't seen yet during pass 2.
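The two pass scheme can be sketched like this. It is a toy, not the MC64K assembler: the input is a list of tuples instead of source text, there are only two invented instructions (a 1-byte nop and a 5-byte bra), and displacements are assumed to be little-endian and relative to the end of the branch.

```python
import struct

OP_NOP = 0x00  # hypothetical opcode values
OP_BRA = 0x01


def assemble(program):
    """Assemble a list of ("label", name), ("nop",) and ("bra", name) items."""
    code = bytearray()
    labels = {}   # name -> offset, filled in as labels are seen
    fixups = []   # (offset of the displacement field, label name)

    # Pass 1: emit code, resolving backward references immediately.
    for item in program:
        kind = item[0]
        if kind == "label":
            labels[item[1]] = len(code)
        elif kind == "nop":
            code.append(OP_NOP)
        elif kind == "bra":
            code.append(OP_BRA)
            field = len(code)
            target = labels.get(item[1])
            if target is None:
                code += bytes(4)                  # placeholder for pass 2
                fixups.append((field, item[1]))
            else:
                code += struct.pack("<i", target - (field + 4))
        else:
            raise ValueError(f"unknown item: {kind}")

    # Pass 2: patch the forward references now that every label is known.
    for field, name in fixups:
        struct.pack_into("<i", code, field, labels[name] - (field + 4))
    return bytes(code)
```

Instruction sizes only matter here, where offsets are being computed; the interpreter never needs them.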

Finally while you contend that using extension bytes is a bad idea for extending the instruction set, I'd say yes and no. First of all, I'm not especially interested in adding extra instructions, I'm interested in writing code that feels familiar. Secondly, there is an instruction that can call the host, the basic mechanism for which costs about the same as 2 of the fastest class of register to register operations. This allows the host to provide all sorts of native code solutions for things, including classically vectorisable stuff. We use this for the basic IO, graphics, etc. but there's also a vector algebra set for 2D/3D calculation, bulk memory operations and so on.

Adding an extension instruction set to the interpreter is totally possible and would just reserve a prefix byte to jump to the corresponding handler. However, for stuff like SIMD this is not ideal since the cost of the scaffolding around the instruction would limit the gain made by using it. However, if JIT is ever introduced, then the multibyte instruction encoding becomes somewhat moot and at that point introduction of SIMD operations as an instruction set extension would definitely have some merit.
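For illustration, a prefix byte dispatch could look like the sketch below. The opcode values, the one-register "machine" and the handler tables are all invented; the point is only that the reserved byte costs one extra fetch and one extra table lookup before the extension handler runs.

```python
OP_HALT = 0x00
OP_INC = 0x01
PREFIX_EXT = 0xFF   # hypothetical reserved prefix byte
EXT_DOUBLE = 0x01   # same byte value as OP_INC, but on the extension page

BASE_HANDLERS = {OP_INC: lambda acc: acc + 1}
EXT_HANDLERS = {EXT_DOUBLE: lambda acc: acc * 2}


def run(code):
    """Run a toy accumulator machine and return the accumulator on halt."""
    acc = 0
    pc = 0
    while True:
        opcode = code[pc]
        pc += 1
        if opcode == OP_HALT:
            return acc
        if opcode == PREFIX_EXT:
            # Second fetch and second dispatch: the scaffolding cost.
            handler = EXT_HANDLERS[code[pc]]
            pc += 1
        else:
            handler = BASE_HANDLERS[opcode]
        acc = handler(acc)
```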

TheBilgeRat · « **Reply #23 on:** May 09, 2022, 07:43:08 AM »

That's rather cool - although I wrote almost no 680x0 assembly and haven't a lot of nostalgia for it

I was thinking the other day about going the opposite direction back to 8 bit just on faster/tighter packages...

Karlos · « **Reply #24 on:** May 09, 2022, 01:51:17 PM »

There's two kinds of people. Those that loved 68K style assembly and those that never tried it

TheBilgeRat · « **Reply #25 on:** May 09, 2022, 03:34:03 PM »

Then I have no choice!

Karlos · « **Reply #26 on:** May 09, 2022, 10:34:00 PM »

Can't beat a good nerdsnipe.

psxphill · « **Reply #27 on:** May 10, 2022, 10:15:17 PM »

Quote from: TheBilgeRat on May 09, 2022, 07:43:08 AM

I was thinking the other day about going the opposite direction back to 8 bit just on faster/tighter packages...

Ultimate 64 has a nearly 48MHz 6510, mega65 gets to around 40MHz, turbo chameleon & supercpu do 20MHz.

supercpu & mega65 have cpu extensions that allow direct access to more than 64k, but you are coding specifically for one of the two platforms and nothing else.

Karlos · « **Reply #28 on:** May 11, 2022, 01:34:26 PM »

I wonder if anyone's ever put an eZ80 into a spectrum?

TheBilgeRat · « **Reply #29 on:** May 12, 2022, 01:27:10 AM »

Not sure - I do know there are a few proto boards out there like Cerberus... and that one that has an eZ80 that runs, I think, at around 300MHz
