Amiga.org

Amiga computer related discussion => General chat about Amiga topics => Topic started by: Karlos on March 10, 2022, 07:55:41 PM

Title: Imaginary 64-bit 680x0
Post by: Karlos on March 10, 2022, 07:55:41 PM: Have you ever wondered what it would be like to have a 64-bit 68000? Not specifically the hardware architecture per se, but the familiar programming environment? Of all the assembly languages I ever learned, 68000 was my favourite. During lockdown, a few of us wondered about it a bit too much, so we decided to build our own. We aren't hardware engineers but I've implemented a few virtual machines in my time, so we went that route...

https://youtu.be/3W7gG8MXit4
Title: Re: Imaginary 64-bit 680x0
Post by: nicholas on March 16, 2022, 03:09:11 PM: Have all the creatives left? :'(
Title: Re: Imaginary 64-bit 680x0
Post by: BozzerBigD on March 16, 2022, 04:21:24 PM: @Karlos

It would just address more memory, I can't see what other difference it would make and it would be pointless unless 64-bit apps became standard.
Title: Re: Imaginary 64-bit 680x0
Post by: nicholas on March 16, 2022, 11:03:09 PM: Whoosh!

Here is the repo. The readme is quite upfront about what this is and why we are doing it.

https://github.com/intuitionamiga/mc64000
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on March 16, 2022, 11:16:47 PM: @BozzerBigD - I think you may have missed the point of this. It's not about what difference it would make to end users, it's about what it might be like to write code for. When AMD extended the x86 to 64 bit, it wasn't just wider registers, there were more of them, newer instructions, etc.

For MC64K, for example, we merged the integer and address registers into 16 64-bit ones. Similarly we increased the FPU to 16 registers too. We also allowed effective addressing modes to be used for source and destination operands for all dyadic instructions that made sense and likewise made some of the monadic ones dyadic. As we were making a virtual machine to run the code on, we were free to experiment in this way without worrying about how difficult it would be to implement in silicon.
Title: Re: Imaginary 64-bit 680x0
Post by: psxphill on March 17, 2022, 07:40:02 AM: Is it really a 64 bit 680x0 though?

It kinda reminds me more of a 64bit MIPS cpu.
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on March 17, 2022, 10:10:58 AM: Quote from: psxphill on March 17, 2022, 07:40:02 AM
Is it really a 64 bit 680x0 though?

It kinda reminds me more of a 64bit MIPS cpu.

It's a virtual machine so any similarity to any silicon (living or dead) is coincidental. MIPS is a load-store, register-to-register RISC style architecture. MC64K is not. You can use most of the familiar 68000 addressing modes for each operand. It just doesn't support the "memory indirect" modes added in the 020. It's also much easier (and preferred) to write position independent code because branch offsets are 32-bit and you'd be doing well to create bytecode gigabytes in size.

The largest divergence from 680x0 is in the handling of conditional logic. Maintaining a status register is a pain and for the vast majority of executed instructions it's wasted effort since the condition codes are not used. Instead we implemented a "compare and branch" model for simplicity. We aren't creating a simulator or emulator and accept that this will not be able to do "anything a 68000 can" because it doesn't do overflow or carry, for example. However for the things it can do, you can often do in fewer instructions because there are wider types, more available registers, ability to use effective addresses for both source and destination at the same time, etc.
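
As a rough illustration of the "compare and branch" idea (a toy sketch, not taken from the MC64K source; the mnemonics, register names and tuple encoding are all made up for the example), an interpreter loop without a status register might look like this. Note that the arithmetic handler updates no flags at all, and the branch handler compares its two operands directly:

```python
def run(program, regs):
    """Run a toy program; each instruction is a tuple (mnemonic, operands...)."""
    pc = 0
    executed = 0
    while pc < len(program):
        op, *args = program[pc]
        pc += 1                      # pc now points at the next instruction
        executed += 1
        if op == "add":              # add src, dst: no condition codes are updated
            src, dst = args
            regs[dst] += regs[src]
        elif op == "blt":            # blt a, b, offset: branch if regs[a] < regs[b]
            a, b, offset = args
            if regs[a] < regs[b]:
                pc += offset         # offset is relative to the next instruction
        else:
            raise ValueError(f"unknown op {op}")
    return executed

# Sum 1..5: r0 = counter, r1 = constant 1, r2 = limit, r3 = running total
regs = {"r0": 0, "r1": 1, "r2": 5, "r3": 0}
program = [
    ("add", "r1", "r0"),         # r0 += 1
    ("add", "r0", "r3"),         # r3 += r0
    ("blt", "r0", "r2", -3),     # loop back to the start while r0 < r2
]
run(program, regs)               # leaves 15 in r3
```

The trade-off is the one described above: nothing here could implement a carry or overflow test, but the common path does no bookkeeping that is thrown away.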

These simplifications allow the bytecode interpreter to run at speeds that make it actually fun to play with. The simple Rotozoom demo in the video, for example, is running full RGB 640*480 at 30fps with time to spare despite being the most naive implementation I could think of. Factoring in the idle time and instructions per pixel, it's about 350 MIPS on my laptop (i7-7500U). Not going to win any prizes but it's fast enough for most things. Obviously a JIT would be better but that's a long way off.
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on March 17, 2022, 11:25:29 AM: It's also little endian, so there's that I suppose. Not as big of a deal as it sounds as it's not an emulator and doesn't have to have binary compatibility with anything. The choice was dictated by the fact the immediate target hosts are x64 and AArch64. It means little from an assembly language perspective since registers have always behaved little endian on 68000 and the only gotcha is when accessing memory at the same address.
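
To make the "same address" gotcha concrete, here is a small sketch using Python's struct module on a plain byte buffer (the buffer and variable names are just for illustration and have nothing to do with the MC64K source). Storing a 32-bit value and then reading a single byte back from the same address gives the least significant byte on a little endian machine, but the most significant byte on a big endian one such as the 68000:

```python
import struct

memory = bytearray(8)

# Little endian store of a 32-bit value, then read one byte at the same address
struct.pack_into("<I", memory, 0, 0x11223344)
byte_little = memory[0]      # least significant byte comes first

# Big endian (68000 style) store of the same value at the same address
struct.pack_into(">I", memory, 0, 0x11223344)
byte_big = memory[0]         # most significant byte comes first
```

Code that only ever reads a value back at the size it was written behaves identically in both cases.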
Title: Re: Imaginary 64-bit 680x0
Post by: AJCopland on March 18, 2022, 02:14:32 PM: It's cool that you can just decide to create a virtual CPU because it's fun!

Interesting and cool work
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on March 18, 2022, 09:10:20 PM: Quote from: AJCopland on March 18, 2022, 02:14:32 PM
It's cool that you can just decide to create a virtual CPU because it's fun!

Exactly. We were having a nostalgic conversation about the days when assembler was fun to write and still legible, things we'd wished the 68000 had, or had done differently from a user programming perspective.

Designing the bytecode machine was pretty easy as I've done more than one in the past. By far the largest effort was actually writing the assembler tool itself. It's not especially brilliant but it does work.
Right now the focus is on building a "host" that provides the IO, graphics, audio etc. that are exposed through a trap like mechanism.
Title: Re: Imaginary 64-bit 680x0
Post by: Pyromania on March 18, 2022, 10:32:23 PM: Very cool project.
Title: Re: Imaginary 64-bit 680x0
Post by: psxphill on March 19, 2022, 11:29:54 AM: Quote from: Karlos on March 17, 2022, 10:10:58 AM
Obviously a JIT would be better but that's a long way off.

Even lazy flag calculation in an interpreter would be a win, I did that when I wrote an interpreter in 6502 assembler.

The no flags is what made me think of MIPS, but then 88100 doesn't appear to have flags either. But the 68k does.

You can create a virtual CPU if you want, I just think it's not inspired enough by the 68k to make a big deal out of it.

6502 was inspired by pdp11, but that was never once mentioned in documentation.
Title: Re: Imaginary 64-bit 680x0
Post by: nicholas on March 19, 2022, 12:30:15 PM: Quote
6502 was inspired by pdp11, but that was never once mentioned in documentation.

As was the 68000, which is sod all like the 6502.

mc64k is inspired by the 68k because that's what we know and love the best.

Feel free to use it or not. No skin off our noses.
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on March 19, 2022, 12:35:04 PM: I think you're conflating the implementation specifics with the intent. The intent is to provide a programming experience familiar to those used to writing 680x0 assembly but providing 64 bit support. The no flags implementation was a simplification trade off. If your code is mostly comprised of conditional branching instructions then sure, it's going to be less familiar. You're also probably doing something a bit weird.

The Rotozoom demo code is here: https://github.com/IntuitionAmiga/MC64000/blob/main/assembler/test_projects/rotozoom/src/main.s

I'm sure most 68000 programmers would have no difficulty understanding it and navigating the differences from actual 68000.
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on March 19, 2022, 12:54:05 PM: Quote
Even lazy flag calculation in an interpreter would be a win, I did that when I wrote an interpreter in 6502 assembler

For an emulator yes. For an interpreter? Why? Unless you intend to support arithmetic overflow and other special cases, I can't think of a single reason to implement it. If you want to take a branch because an operand is zero, not zero or bigger than some other operand, comparing them directly means you don't have to evaluate the side effects of any operation.
Title: Re: Imaginary 64-bit 680x0
Post by: trekiej on March 21, 2022, 03:01:19 AM: Are there many 64 bit processors that are programmed in an FPGA?
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on March 21, 2022, 10:33:00 AM: Quote from: trekiej on March 21, 2022, 03:01:19 AM
Are there many 64 bit processors that are programmed in an FPGA?

I don't know. This project is probably unsuited to an FPGA implementation for the moment (it's not "hardware" enough), although having seen an FPGA that was built to just run E1M1 of Doom, I'm less sure of that.

For real 64 bit processors, with the exception of the Dec Alpha and a few other very early 64 bit processors, I imagine the biggest reason not to have FPGA implementations would be that they're all still in mainstream use.
Title: Re: Imaginary 64-bit 680x0
Post by: psxphill on March 24, 2022, 09:02:01 AM: Quote from: Karlos on March 21, 2022, 10:33:00 AM
I imagine the biggest reason not to have FPGA implementations would be that they're all still in mainstream use.

I think it comes down to cost of the FPGA. Something like the DE10 Nano is pushing it running a 486 PC. Apollo isn't particularly better.

If you simplify your CPU then sure you could do a 256 bit one in an FPGA, but it wouldn't necessarily be able to do anything worthwhile.

Similar to how the TI99/4 had a 16 bit CPU but was so crippled that it was outperformed by machines with 8 bit CPU's
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on March 24, 2022, 07:52:30 PM: The benefit of writing a software CPU (and I use the term in the loosest possible sense as it's missing critical features any self respecting emulation of a CPU has) is that you can do whatever you want. I did once think about a VM design that was entirely vector based except for a set of address registers (including stacks and PC). In order to do scalar operations, or to control which elements of each vector were affected, you set a mask. It was an interesting thought experiment but I never took it much further. It would be a pain to program for.
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on April 17, 2022, 05:20:58 PM: I got bored with the basic virtual framebuffer and added a tiny interpreter to that as well at the point where pixels are converted to the XImage / GL Texture that's displayed. This copies the basic idea of the Amiga's Copper and executes instructions on specific "beam" coordinates. These instructions can modify palette entries and the basic X/Y viewport offsets. They can also modify the script itself for maximum horror.
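
A minimal sketch of the general idea, assuming a palette-indexed framebuffer and a script keyed by beam position (the instruction names, the dictionary layout and the render function are all hypothetical and not the actual MC64K host code). Instructions fire when the conversion loop reaches their coordinates and affect every pixel converted after that point:

```python
def render(pixels, width, height, palette, script):
    """Convert indexed pixels to RGB, running script entries at beam positions.

    script maps (x, y) to a list of instructions, each a tuple (op, args...).
    """
    palette = list(palette)      # working copy, so changes last for one frame only
    x_offset = 0                 # horizontal viewport offset
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            for op, *args in script.get((x, y), ()):
                if op == "set_palette":        # set_palette index, rgb
                    index, rgb = args
                    palette[index] = rgb
                elif op == "set_x_offset":     # set_x_offset pixels
                    (x_offset,) = args
            src_x = (x + x_offset) % width
            row.append(palette[pixels[y][src_x]])
        out.append(row)
    return out

pixels = [[0, 1], [0, 1]]
palette = [(0, 0, 0), (255, 255, 255)]
# At the start of line 1: recolour entry 0 and scroll the rest of the frame by 1
script = {(0, 1): [("set_palette", 0, (255, 0, 0)), ("set_x_offset", 1)]}
frame = render(pixels, 2, 2, palette, script)
```

Self-modifying scripts are left out of the sketch to keep it short.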

The middle section of the display is scrolled here by using these hacks...
https://youtu.be/OgThodWAVOk
Title: Re: Imaginary 64-bit 680x0
Post by: Waccoon on April 19, 2022, 06:35:41 AM: I'm a bit disappointed. It's not much like a 68000 at all, even if the assembler mnemonics are similar.

I was interested in the instruction encoding, because that's a very important detail for judging how much hardware would be needed to turn it into a real CPU, not to mention the code density. Oh yeah, and the novelty factor as well. 68K is pretty clever with how it encodes instructions and has very dense code. Among the "complex" CISC designs, 68K is pretty interesting.

The encoding of MC64K is actually a lot like x86 (!), as it just tacks on numerous extra extension bytes as modifiers, which is lazy. Like x86, you need to parse almost the whole instruction to determine how long it is, which is just silly. You can't easily add new instructions in the future without really going down the x86 route by adding opcode pages, which is messy. If you're going to support variable instruction length, this implementation is a terrible way to do it, regardless of whether it's done in software or hardware.
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on April 19, 2022, 11:16:23 AM: It's a bytecode interpreter. It's not remotely intended for running on an FPGA. If it looks like x86 that's because x86 looks a lot like bytecode. Not the other way around. The bytecode here is essentially a binary tokenised representation of the assembler source code.

Not all instructions are varying length, that depends on the operands. Register to register operations are 3 bytes, whether encoded using the generic EA mode or the register to register fast path. The operand size is a fixed function of the opcode. As for the suitability of the format, the simple fact is it's efficient enough. It runs functionally equivalent code faster than UAE does using interpretive mode on the same hardware, faster than a 100MHz 68060 does natively (memory bandwidth doesn't help it) and on my few years old i7-7500U laptop faster than my 603 runs functionally equivalent PPC native code at 240MHz.

Given the intent is just to have fun writing old demo style effects in a syntactically familiar setting, I think it's fast enough. And when it isn't, there's always the option of adding a JIT.

People seem to misunderstand what this is, despite the fact it's spelled out explicitly in the project readme page. It will only run as an interpreter with the goal of providing JIT at some stage.

Think of it as an Interpreter / Programming Language environment. Like Java was originally. Only instead of a high level compiled language syntax, it's an assembler syntax that is inspired by 680x0. Note the word inspired. Not identical. Not compatible.
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on April 20, 2022, 10:31:37 AM: @Waccoon

I thought this criticism was worthy of an additional note:
Quote
The encoding of MC64K is actually a lot like x86 (!), as it just tacks on numerous extra extension bytes as modifiers, which is lazy. Like x86, you need to parse almost the whole instruction to determine how long it is, which is just silly.

What you describe as lazy is probably true for a hardware implementation. However, for a bytecode interpreter, it's pretty sensible because parsing is part of execution regardless. Let's consider what is just about the longest example I can think of:

fbgt.d #4.669201609, $ABADCAFE(a0, d0.l * 4), .target

The bytecode layout of this instruction would be (from lowest address to highest)

{ fbgt.d : 1 byte } { dst ea mode : 1 byte } { dst ea mode reg pair : 1 byte } { dst ea mode offset : 4 bytes } { src ea mode : 1 byte } { src ea literal : 8 bytes } { branch displacement : 4 bytes }

That's 20 bytes in total, which is pretty long. However, consider that this is executed serially by the interpreter:

1. Get the opcode => jump to handler for fbgt.d
2. Get the destination EA mode => jump to handler for scaled index with displacement
2.1 Get the register number pair for the scaled indexing, calculate base effective address
2.2 Get the displacement from the opcode stream and add to the base effective address.
2.3 Fetch the double at the address
3. Get the source EA mode => jump to handler for immediate 8 byte
3.1 Get the double in the opcode stream
4. Compare the operands
4.1 If the comparison is true, get the branch displacement in the opcode stream and add to the PC ready to branch.
4.2 if the comparison is false, step over the branch displacement to the next instruction.

At each step, the data that is needed is next in the instruction stream. All of these operations deal with bytes apart from the one packed register pair, which allows for very simplistic C code (read: the kind the compiler can optimise most readily) to be used. As there is no performance penalty for misaligned access to a cached value on x64, the 4 and 8 byte entities are read as such. Each step of the decode and execution advances the PC. Thus the PC is always aligned at an instruction boundary on completion.
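
The serial decode above can be sketched in a few lines. This is a toy model in Python rather than the real C/C++ interpreter, and several details are assumptions made for the example: the opcode and EA mode numbers, the packing of the register pair as two 4-bit fields, the branch displacement being relative to the end of the instruction, and the branch being taken when the source is greater than the destination. A small displacement is used instead of $ABADCAFE so the address stays inside the toy memory.

```python
import struct

# Hypothetical numbering; the real MC64K values may differ.
OP_FBGT_D = 0x01
EA_INDEXED_X4_DISP32 = 0x02      # disp32(base, index * 4)
EA_IMMEDIATE_DOUBLE = 0x03

def encode_fbgt_d(literal, disp, base_reg, index_reg, branch):
    """Lay the instruction out in the order described: 20 bytes in total."""
    return (
        struct.pack("<BBB", OP_FBGT_D, EA_INDEXED_X4_DISP32,
                    (base_reg << 4) | index_reg)     # opcode, dst mode, reg pair
        + struct.pack("<i", disp)                    # dst displacement, 4 bytes
        + struct.pack("<B", EA_IMMEDIATE_DOUBLE)     # src mode
        + struct.pack("<d", literal)                 # src literal, 8 bytes
        + struct.pack("<i", branch)                  # branch displacement, 4 bytes
    )

def step(code, pc, regs, memory):
    """Decode and execute one fbgt.d, returning the new PC."""
    opcode = code[pc]; pc += 1                       # 1. opcode
    assert opcode == OP_FBGT_D
    mode = code[pc]; pc += 1                         # 2. destination EA mode
    assert mode == EA_INDEXED_X4_DISP32
    pair = code[pc]; pc += 1                         # 2.1 register pair
    address = regs[pair >> 4] + regs[pair & 0x0F] * 4
    (disp,) = struct.unpack_from("<i", code, pc); pc += 4
    address += disp                                  # 2.2 add displacement
    (dst,) = struct.unpack_from("<d", memory, address)   # 2.3 fetch the double
    mode = code[pc]; pc += 1                         # 3. source EA mode
    assert mode == EA_IMMEDIATE_DOUBLE
    (src,) = struct.unpack_from("<d", code, pc); pc += 8 # 3.1 literal
    (branch,) = struct.unpack_from("<i", code, pc); pc += 4
    return pc + branch if src > dst else pc          # 4. compare and branch

code = encode_fbgt_d(4.669201609, 16, 8, 0, 100)
regs = [0] * 16
regs[0] = 2                                          # index register
memory = bytearray(64)
struct.pack_into("<d", memory, 24, 1.0)              # 16 + 2 * 4 = 24
new_pc = step(code, 0, regs, memory)
```

Every read consumes exactly the bytes that follow the previous one, so the PC lands on the next instruction whether or not the branch is taken.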

Note that it's not easy to create even longer instructions, since immediate values are illegal destination address modes. Any other operation that can be statically evaluated by the assembler, for example comparing an EA mode to itself (as long as the EA mode has no increment/decrement semantics) is evaluated by the assembler and folded to a fixed branch (for true) or nothing (for false).
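
A sketch of that folding rule, assuming integer compare-and-branch mnemonics and 68000-style operand strings (the mnemonic names, the tuple format and the function names are hypothetical; the real assembler is a separate tool and may do this quite differently):

```python
# Outcome of each condition when both operands are known to be identical
ALWAYS_TRUE_ON_EQUAL = {"beq", "bge", "ble"}
ALWAYS_FALSE_ON_EQUAL = {"bne", "bgt", "blt"}

def has_side_effects(ea):
    """True for post-increment "(a0)+" and pre-decrement "-(a0)" operands."""
    return ea.endswith(")+") or ea.startswith("-(")

def fold(mnemonic, src, dst, target):
    """Return the instruction to emit, or None if it can be dropped entirely."""
    if src == dst and not has_side_effects(src):
        if mnemonic in ALWAYS_TRUE_ON_EQUAL:
            return ("bra", target)       # always taken: fixed branch
        if mnemonic in ALWAYS_FALSE_ON_EQUAL:
            return None                  # never taken: emit nothing
    return (mnemonic, src, dst, target)  # cannot be evaluated statically
```

For example, fold("beq", "d0", "d0", ".loop") gives ("bra", ".loop"), while the same comparison on "(a0)+" is left alone because evaluating the operand changes a0. Floating point comparisons are deliberately not covered by the sketch, since a NaN does not compare equal to itself.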

Regarding the need to determine the instruction size, the only part of the current workflow that needs to care about that is the assembler itself. It's a simple two pass design that during pass 1 resolves any references to things it's already seen immediately, and references to things it hasn't seen yet during pass 2.
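
Roughly, the two pass scheme works like the sketch below. It is a toy with a single made-up branch instruction (1 opcode byte plus a 4 byte displacement relative to the end of the instruction), and the opcode value and input format are assumptions for the example, not how the actual assembler is written:

```python
import struct

BRA = 0x10   # hypothetical opcode value

def assemble(lines):
    """lines is a list of ("label", name) or ("bra", name) tuples."""
    code = bytearray()
    labels = {}      # label name -> byte position
    fixups = []      # (position of displacement field, label name)

    # Pass 1: emit code, resolving backward references immediately
    for kind, name in lines:
        if kind == "label":
            labels[name] = len(code)
            continue
        code.append(BRA)
        if name in labels:                       # already seen
            disp = labels[name] - (len(code) + 4)
            code += struct.pack("<i", disp)
        else:                                    # not seen yet: leave a hole
            fixups.append((len(code), name))
            code += bytes(4)

    # Pass 2: patch the forward references now that every label is known
    for pos, name in fixups:
        struct.pack_into("<i", code, pos, labels[name] - (pos + 4))
    return bytes(code)

binary = assemble([
    ("label", "start"),
    ("bra", "end"),      # forward reference, patched in pass 2
    ("bra", "start"),    # backward reference, resolved in pass 1
    ("label", "end"),
])
```

An undefined label simply raises KeyError in pass 2 here; a real assembler would report it properly.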

Finally while you contend that using extension bytes is a bad idea for extending the instruction set, I'd say yes and no. First of all, I'm not especially interested in adding extra instructions, I'm interested in writing code that feels familiar. Secondly, there is an instruction that can call the host, the basic mechanism for which costs about the same as 2 of the fastest class of register to register operations. This allows the host to provide all sorts of native code solutions for things, including classically vectorisable stuff. We use this for the basic IO, graphics, etc. but there's also a vector algebra set for 2D/3D calculation, bulk memory operations and so on.
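
As a rough picture of what a host call amounts to, here is a sketch with a table of native functions indexed by a vector number. The VM state layout, the function names and the register calling convention are all invented for the example; the thread only says that an instruction exists to call the host:

```python
def host_print(vm):
    """Hypothetical IO routine: record the value in register 0."""
    vm["output"].append(vm["regs"][0])

def host_fill(vm):
    """Hypothetical bulk memory routine: fill regs[1] bytes at address regs[0]."""
    start, count, value = vm["regs"][0], vm["regs"][1], vm["regs"][2]
    vm["memory"][start:start + count] = bytes([value & 0xFF]) * count

HOST_VECTORS = [host_print, host_fill]

def host_call(vm, vector):
    """What the interpreter would do on reaching a host call instruction."""
    HOST_VECTORS[vector](vm)

vm = {"regs": [4, 3, 0xAB] + [0] * 13, "memory": bytearray(8), "output": []}
host_call(vm, 1)     # fill 3 bytes starting at address 4 with 0xAB
host_call(vm, 0)     # "print" register 0
```

The point of the design is that one cheap dispatch buys an arbitrarily large amount of native work, which is why it suits bulk and vectorisable operations better than adding more bytecode instructions.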

Adding an extension instruction set to the interpreter is totally possible and would just reserve a prefix byte to jump to the corresponding handler. However, for stuff like SIMD this is not ideal since the cost of the scaffolding around the instruction would limit the gain made by using it. However, if JIT is ever introduced, then the multibyte instruction encoding becomes somewhat moot and at that point introduction of SIMD operations as an instruction set extension would definitely have some merit.
Title: Re: Imaginary 64-bit 680x0
Post by: TheBilgeRat on May 09, 2022, 07:43:08 AM: That's rather cool - although I wrote almost no 680x0 assembly and haven't a lot of nostalgia for it

I was thinking the other day about going the opposite direction back to 8 bit just on faster/tighter packages...
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on May 09, 2022, 01:51:17 PM: There's two kinds of people. Those that loved 68K style assembly and those that never tried it :)
Title: Re: Imaginary 64-bit 680x0
Post by: TheBilgeRat on May 09, 2022, 03:34:03 PM: Then I have no choice! ;D
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on May 09, 2022, 10:34:00 PM: Can't beat a good nerdsnipe.
Title: Re: Imaginary 64-bit 680x0
Post by: psxphill on May 10, 2022, 10:15:17 PM: Quote from: TheBilgeRat on May 09, 2022, 07:43:08 AM
I was thinking the other day about going the opposite direction back to 8 bit just on faster/tighter packages...

Ultimate 64 has a nearly 48mhz 6510, mega65 gets to around 40mhz, turbo chameleon & supercpu does 20mhz.

supercpu & mega65 have cpu extensions that allow direct access to more than 64k, but you are coding specifically for one of the two platforms and nothing else.
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on May 11, 2022, 01:34:26 PM: I wonder if anyone's ever put an eZ80 into a spectrum?
Title: Re: Imaginary 64-bit 680x0
Post by: TheBilgeRat on May 12, 2022, 01:27:10 AM: Not sure - I do know there are a few proto boards out there like Cerebrus....and that one that has an eZ80 that runs I think around 300Mhz
Title: Re: Imaginary 64-bit 680x0
Post by: Karlos on May 12, 2022, 08:39:42 PM: @TheBilgeRat

https://youtu.be/bIj3xBmALxE

Absolute, top-shelf hardest 8-bit nerd pr0n. Viewer discretion advised.