Author Topic: in case you are interested to test new fpga accelerators for a600/a500 (Read 39801 times)

alphadec · « **Reply #209 on:** April 01, 2015, 09:03:36 PM »

Quote from: biggun;787159

The new CPU is so much faster that we can now look at stuff like H264 playback, which is totally senseless to try on an old 68030.
So whether a new instruction is used or not makes no difference: the new video datatype will not run on an old, slow 68030 anyway.
And even a 68060 will be too slow in many cases.

When can we see some screengrabs where SAGA is used, or is it too early?

wawrzon · « **Reply #210 on:** April 01, 2015, 09:07:00 PM »

not every application needs full-screen updates at a high frame rate.

though if you want to introduce incompatibility to begin with, then what is the point of a 68k type cpu? other than that, it looks like a road similar to the one the amiga ppc offshoots took: being able to execute old code but otherwise promoting a completely new binary target. it would probably be even faster to put an x86 with 68k emu there.

matthey · « **Reply #211 on:** April 01, 2015, 09:56:58 PM »

Quote from: Thomas Richter;787092

So wait. Why exactly do we need a "move zero extended" instruction again?

After all, "moveq #0,d0; move.b (a0),d0" could also be merged into a single "meta"-instruction, right?

Similarly, "move.w (a0),d0;ext.l d0" could also be merged into one instruction....

I see now even less the need to extend the ISA.

I find it interesting that Gunnar talks about the technical design of the CPU, including assembler examples, but ISA details, questions about them, and MacOS compatibility can't be discussed in public. I see this as hypocrisy and lack of openness. I tried to get you involved in the ISA creation when he was "taking control" but I don't even think you could have stopped him now.

The ColdFire MVS and MVZ instructions can be added in the same encoding location as the ColdFire with practically no compatibility issues for the 68k. I find your complaint about this odd considering how tame it is compared to Gunnar's hole filling encodings for E0-E7, a non-orthogonal A8 and possible use of A-line (goodbye MacOS compatibility).

Yes, you are correct with your equivalent "merged" instructions, but these CF instructions can also be used with a register as source, including the destination register itself.

MOVEQ #0,Dn + MOVE.W EA,Dn -> MVZ.W EA,Dn ; 2 bytes saved
MOVEQ #0,Dn + MOVE.B EA,Dn -> MVZ.B EA,Dn ; 2 bytes saved
SWAP Dn + CLR.W Dn + SWAP Dn -> MVZ.W Dn,Dn ; 4 bytes saved
AND(I).L #$ffff,Dn -> MVZ.W Dn,Dn ; 4 bytes saved
AND(I).L #$ff,Dn -> MVZ.B Dn,Dn ; 4 bytes saved

The latter 2 are vasm peephole CF optimizations (there are more using MVZ and MVS which are less common). Multi-instruction inputs are not allowed for vasm peephole optimizations. MVS at first appears less useful than MVZ.

MOVE.W EA,Dn + EXT.L Dn -> MVS.W EA,Dn ; 2 bytes saved
MOVE.B EA,Dn + EXTB.L Dn -> MVS.B EA,Dn ; 2 bytes saved
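
For anyone who wants to check these equivalences without a 68k at hand, here is a minimal sketch in Python rather than assembler. It only models the 32-bit register results, not condition codes or timing, and all helper names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def move_w(dn, value):
    """MOVE.W EA,Dn: only the low word of Dn is replaced."""
    return (dn & 0xFFFF0000) | (value & 0xFFFF)

def ext_l(dn):
    """EXT.L Dn: sign-extend the low word of Dn to 32 bits."""
    low = dn & 0xFFFF
    return low | 0xFFFF0000 if low & 0x8000 else low

def swap(dn):
    """SWAP Dn: exchange the upper and lower words."""
    return ((dn << 16) | (dn >> 16)) & MASK32

def clr_w(dn):
    """CLR.W Dn: clear the low word, keep the upper word."""
    return dn & 0xFFFF0000

def mvz_w(value):
    """MVZ.W: zero-extend a word to 32 bits."""
    return value & 0xFFFF

def mvs_w(value):
    """MVS.W: sign-extend a word to 32 bits."""
    return ext_l(value & 0xFFFF)

old_dn = 0xDEADBEEF   # stale register contents before the sequence
word = 0x8001         # word fetched from memory (bit 15 set)

# MOVEQ #0,Dn + MOVE.W EA,Dn  ->  MVZ.W EA,Dn
zero_extended = move_w(0, word)
# MOVE.W EA,Dn + EXT.L Dn  ->  MVS.W EA,Dn
sign_extended = ext_l(move_w(old_dn, word))
# SWAP Dn + CLR.W Dn + SWAP Dn  ->  MVZ.W Dn,Dn
cleared_upper = swap(clr_w(swap(old_dn)))

print(hex(zero_extended), hex(sign_extended), hex(cleared_upper))
```

The byte variants work the same way with an 8-bit mask and EXTB.L in place of EXT.L.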

Compilers like these types of instructions because they are common on modern processors. The 68k and CF backends are shared by most compilers so support only needs to be turned on.

I would prefer to have 68k names of SXTB.L, SXTW.L, ZXTB.L and ZXTW.L like EXTB.L but specifying the type of extend. I would also allow "SXTW.L EA,An" for "MOVE.W EA,An" to improve orthogonality and description of the operation. Too many 68k compilers have had problems with the sign extension into address registers and none that I have seen have been able to take advantage of this auto word to longword sign extension for optimizations. Compilers have had trouble here and the CF instructions simplify the code.

Instruction folding/fusion has a cost and it doesn't catch reordered combinations. Let's look at your layers.library 45.24 for example.

Statistics
----------------------------------------
Instructions 2 bytes in length = 2574
Instructions 4 bytes in length = 1910
Instructions 6 bytes in length = 88
Instructions 8 bytes in length = 0
Instructions 10 bytes in length = 0
Instructions 12 bytes in length = 0
Instructions 14 bytes in length = 0
Instructions 16 bytes in length = 0
Instructions 18 bytes in length = 0
Instructions 20 bytes in length = 0
Instructions 22 bytes in length = 0
Instruction total = 4572
Code total bytes = 13316

1 op.l #,Rn -> op.l #.w,Rn : bytes saved = 2
3 opi.l #,Dn -> op.l #.w,Dn : 68kF1 bytes saved = 6
3 opi.l #,EA -> opi.l #.w,EA : 68kF2 bytes saved = 6
0 move.b EA,Dn + extb.l Dn -> mvs.b EA,Dn : bytes saved = 0
90 move.w EA,Dn + ext.l Dn -> mvs.w : bytes saved = 180
0 moveq #0,Dn + move.b EA,Dn -> mvz.b EA,Dn : bytes saved = 0
3 moveq #0,Dn + move.w EA,Dn -> mvz.w EA,Dn : bytes saved = 6

0 pea (xxx).w -> mov3q #,EA : bytes saved = 0
0 move.l #,EA -> mov3q #,EA : bytes saved = 0 68kF bytes saved = 0

EA modes used
----------------------------------------
Dn = 743
An = 976
# = 1
# = 57
# = 6
(xxx).l = 2
(An) = 217
(An)+ = 382
-(An) = 163
(d16,An) = 1449
(d8,PC,Xn*SF) = 2
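
The totals in the statistics above can be cross-checked with a few lines of arithmetic. This is only a sketch in Python using the numbers from the listing; the variable names are mine, and the 2 bytes per pair is the saving given in the report for the MVS/MVZ replacements.

```python
# Instruction-length histogram copied from the statistics above
# (length in bytes -> number of instructions).
length_histogram = {2: 2574, 4: 1910, 6: 88}

instruction_total = sum(length_histogram.values())
code_bytes = sum(length * count for length, count in length_histogram.items())
average_length = code_bytes / instruction_total

# Pair counts from the same listing; each replaced pair saves 2 bytes.
pair_counts = {"mvs.w": 90, "mvz.w": 3}
bytes_saved = 2 * sum(pair_counts.values())
percent_saved = 100.0 * bytes_saved / code_bytes

print(instruction_total, code_bytes, round(average_length, 2))
print(bytes_saved, round(percent_saved, 2))
```

Both totals agree with the report (4572 instructions, 13316 bytes), which works out to an average of roughly 2.9 bytes per instruction and about 1.4% of the code size saved by MVS.W and MVZ.W alone.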

This is from a version of ADis Statistics this "outsider" modified for Gunnar. The first thing that sticks out is the instruction lengths, which are very short. That is why I adamantly recommended 3 superscalar integer units, which should be a success even if dual ported memory with 2 cache reads/cycle would make it a monster. The results are a little misleading as SAS/C avoids many immediate longword operations like this:

MOVEQ #,Dn + LSL.L #8,Dn (loads 3rd most significant byte)
MOVEQ #,Dn + SWAP Dn (loads 2nd most significant byte)
MOVEQ #,Dn + ROR.L #8,Dn (loads the most significant byte)

Tricks like this shorten the average instruction length but they generate 2 dependent instructions (without instruction scheduling or folding/fusion). It's more efficient and simpler for the compiler to use the OP.L #,Dn, especially with a new addressing mode that could sometimes optimize "op(i).l #,Dn -> op.l #.w,Dn". More modern compilers which promote variables to longword show much more savings using a new sign extended addressing mode and MVS/MVZ instructions, which are needed for the promotion to 32 bits. Many processors like the 68060 can only forward 32 bit results and can reach up to twice the performance with variable promotion to 32 bits.
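
To make the three SAS/C tricks concrete, here is a small Python model of what each pair leaves in the register. It is a sketch only: the helper names are invented, and the immediate is kept in the range 0..127 because MOVEQ sign-extends its 8-bit value, so larger values would also fill the other bytes with ones.

```python
MASK32 = 0xFFFFFFFF

def moveq(imm8):
    """MOVEQ #imm,Dn: sign-extend an 8-bit immediate to 32 bits."""
    imm8 &= 0xFF
    return (imm8 | 0xFFFFFF00) if imm8 & 0x80 else imm8

def lsl_l(dn, count):
    """LSL.L #count,Dn: logical shift left within 32 bits."""
    return (dn << count) & MASK32

def swap(dn):
    """SWAP Dn: exchange the upper and lower words."""
    return ((dn << 16) | (dn >> 16)) & MASK32

def ror_l(dn, count):
    """ROR.L #count,Dn: rotate right within 32 bits."""
    count %= 32
    return ((dn >> count) | (dn << (32 - count))) & MASK32

imm = 0x12  # illustrative immediate, must fit 0..127 for these tricks

third_byte = lsl_l(moveq(imm), 8)    # MOVEQ + LSL.L #8
second_byte = swap(moveq(imm))       # MOVEQ + SWAP
top_byte = ror_l(moveq(imm), 8)      # MOVEQ + ROR.L #8

print(hex(third_byte), hex(second_byte), hex(top_byte))
```

Each pair is 4 bytes instead of the 6 bytes of a MOVE.L with a longword immediate, but the second instruction has to wait for the result of the first.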

My ADis Statistics finds 90 occurrences of "move.w EA,Dn + ext.l Dn" which would save 180 bytes with MVS.W and 3 occurrences of "moveq #0,Dn + move.w EA,Dn" which would save another 6 bytes with MVZ.W (the total instruction count reduction for MVS and MVZ would be 93). This also gives a good indication of how much could be saved with instruction folding/fusion of these popular pairs. Let's look for additional savings that weren't caught though.

move.w d0,d7 ; 26e : 3e00
movea.l a3,a0 ; 270 : 204b
ext.l d7 ; 272 : 48c7

move.w ($10,a0),d0 ; 330 : 3028 0010
move.w ($10,a1),d1 ; 334 : 3229 0010
ext.l d0 ; 338 : 48c0
ext.l d1 ; 33a : 48c1

move.w ($12,a1),d2 ; 33e : 3429 0012
move.l d1,d7 ; 342 : 2e01
move.w ($12,a0),d1 ; 344 : 3228 0012
ext.l d2 ; 348 : 48c2
ext.l d1 ; 34a : 48c1

move.w ($44,sp),d7 ; 3f8 : 3e2f 0044
move.w ($46,sp),d6 ; 3fc : 3c2f 0046
ext.l d7 ; 400 : 48c7
ext.l d6 ; 402 : 48c6

move.w ($46,sp),d0 ; 440 : 302f 0046
move.w ($4a,sp),d1 ; 444 : 322f 004a
ext.l d0 ; 448 : 48c0
ext.l d1 ; 44a : 48c1

move.w ($48,sp),d2 ; 44c : 342f 0048
sub.l d0,d1 ; 450 : 9280
move.w ($44,sp),d0 ; 452 : 302f 0044
ext.l d2 ; 456 : 48c2
ext.l d0 ; 458 : 48c0

...

Did someone turn on the instruction scheduler? I think it's helping some, but if you want instruction folding/fusion then the scheduler is a problem because it separates the pairs. The instruction scheduler by itself removes some of the need for the folding/fusion and is probably better overall. The MVS/MVZ instructions also have limited capability to increase the IPC (Instructions Per Cycle) with only 1 cache read/cycle but they do still improve code density. If 2 cache reads per cycle were ever implemented then they would make a big difference. The MVS/MVZ instructions would have been added to ColdFire after doing code analysis. I wish they had made it into the 68060 where they would have been especially helpful considering the instruction fetch bottleneck of this processor.
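
The difference between matching only adjacent pairs and matching scheduled pairs can be shown with a toy scanner. This is a Python sketch, not how ADis or any real fusion logic works: instructions are (mnemonic, source, destination) tuples taken from the first two excerpts above, and an operand only counts as touching a register if it is exactly that register, so index registers inside memory operands are ignored.

```python
def count_fusable_pairs(instructions, adjacent_only):
    """Count "move.w EA,Dn ... ext.l Dn" pairs that could become MVS.W."""
    count = 0
    for i, (mnemonic, _src, dst) in enumerate(instructions):
        if mnemonic != "ext.l":
            continue
        # Walk backwards to the closest earlier instruction touching Dn.
        for j in range(i - 1, -1, -1):
            prev_mnemonic, prev_src, prev_dst = instructions[j]
            if dst in (prev_src, prev_dst):
                if prev_mnemonic == "move.w" and prev_dst == dst:
                    count += 1
                break
            if adjacent_only:
                break  # a strict peephole only looks at the previous instruction
    return count

# Excerpts at offsets $26e and $330 of the listing above.
layers_excerpt = [
    ("move.w", "d0", "d7"),
    ("movea.l", "a3", "a0"),
    ("ext.l", None, "d7"),
    ("move.w", "($10,a0)", "d0"),
    ("move.w", "($10,a1)", "d1"),
    ("ext.l", None, "d0"),
    ("ext.l", None, "d1"),
]

adjacent = count_fusable_pairs(layers_excerpt, adjacent_only=True)
scheduled = count_fusable_pairs(layers_excerpt, adjacent_only=False)
print(adjacent, scheduled)
```

On this excerpt the adjacent-only scan finds nothing while the scan that looks past unrelated instructions finds all three pairs, which is the effect of the scheduler described above.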

I think better ColdFire compatibility would be useful for the 68k and would win some support from ColdFire customers and developers. Just a few days ago, I compiled some code with a ColdFire target (-cpu=5329) into my Amiga hunk executable to see what code it produced for someone. It would be nice if I could run the executable and the Amiga could become a CF CPU option and development platform. It could also be a next generation CPU option for the Atari Firebee. The more code and developers the better, especially when relatively small changes can gain CF compatibility.

ChaosLord · « **Reply #212 on:** April 01, 2015, 10:30:05 PM »

For those of you who don't understand what is being discussed, I will now summarize:

Matt Hey has shown up to the debate with a library full of facts and figures and has won the debate on all counts.

Matt shoots.... he scores! Matt Hey FTW!

PlaySound CrowdGoesWild.8svx

xboxOwn · « **Reply #213 on:** April 02, 2015, 12:21:00 AM »

Quote from: ChaosLord;787165

For those of you who don't understand what is being discussed, I will now summarize:

Matt Hey has shown up to the debate with a library full of facts and figures and has won the debate on all counts.

Matt shoots.... he scores! Matt Hey FTW!

PlaySound CrowdGoesWild.8svx

Hey ChaosLord, do you have any idea where I can buy the latest version of Total Chaos?

Thanks in advance

kolla · « **Reply #214 on:** April 02, 2015, 04:52:29 AM »

All the plans for boards are a little confusing.

Quote

In order of time, we should see (but read carefully, because this is part of the plan to confuse the enemy):

1) A developer board with the Arrow Be Micro. It has HDMI, Ethernet, but only 25.000 L.E. This board seemed adequate only one month ago, but the core is grown too much. These boards are almost ready, are few, so they will be used for testing purpose

2) The Vampire 500. It has 40.000 LE, hdmi, ide... seems to me that Igor already purchased the PCB and it is on the way

3) The Vampire 600b. It is the old vampire "upgraded".
At beginning, igor has planned to have 10.000 LE, but later he decided to switch 40.000 LE so it is delayed. No HDMI.

4) A new board from Christoph Hoehne. It should be the developer board with some upgrades and a new design to fit also in the a600 (with a different connector).

5) A new PCI board with the FPGA, to speedup Winuae.

http://www.apollo-core.com/knowledge.php?b=1&note=2742&x=2&z=aIEhYP

My biggest question though, is: who is "the enemy"?

kolla · « **Reply #215 on:** April 02, 2015, 04:55:49 AM »

And yeah, pretty weird that it's impossible to get a straight answer to a boolean question: will MacOS work under emulation on Phoenix?

Lurch · « **Reply #216 on:** April 02, 2015, 04:58:53 AM »

Who cares about MacOS??? Anyway, stop trying to shoot Gunnar down and move on.

It's "my epenis is bigger" etc., like it's a school yard or something.

biggun · « **Reply #217 on:** April 02, 2015, 06:42:44 AM »

Quote from: matthey;787163

possible use of A-line (goodbye MacOS compatibility).

THIS IS NOT TRUE

Matt, all new encodings are gated with an Apollo-Mode switch.
You know this fact, this was discussed over a year ago.
This means there is no incompatibility with old MacOS code.
You should know that ALL your posts about incompatibility are NOT TRUE.

Matt, what do you call people who, on purpose and while knowing better, do not write the truth?

xboxOwn · « **Reply #218 on:** April 02, 2015, 06:56:30 AM »

Quote from: biggun;787189

THIS IS NOT TRUE

Matt, all new encodings are gated with an Apollo-Mode switch.
You know this fact, this was discussed over a year ago.
This means there is no incompatibility with old MacOS code.
You should know that ALL your posts about incompatibility are NOT TRUE.

Matt, what do you call people who, on purpose and while knowing better, do not write the truth?

Hold on a second, I thought this project is about Amiga and not Mac. Why is Matt focusing on Mac emulation? To me even IF this card does not allow Mac emulation I don't care...I am into improving Amiga hardware to get modern software for the Amiga system, not Mac.

It seems this entire fight is about Mac.

ferrellsl · « **Reply #219 on:** April 02, 2015, 07:43:07 AM »

Quote from: xboxOwn;787190

Hold on a second, I thought this project is about Amiga and not Mac. Why is Matt focusing on Mac emulation? To me even IF this card does not allow Mac emulation I don't care...I am into improving Amiga hardware to get modern software for the Amiga system, not Mac.

It seems this entire fight is about Mac.

Yeah, if Matt isn't whining about Macintosh compatibility, he's whining about ColdFire compatibility. He really needs to take his trash talk to a ColdFire forum or start hanging out at a classic Mac site. He's gotten to be more than just tiring. He's either trying to "wow" us with his assembler knowledge or he has a personal problem with Gunnar, or maybe both. Just wish he'd leave it alone.

biggun · « **Reply #220 on:** April 02, 2015, 08:02:51 AM »

To Thomas,

Thomas, I understand your point that new instructions can have
the side effect that new compiles will not run on old CPUs.
This is clear and this is understood.

What I do not understand is how relevant this is in our case.

Let's look at some numbers.
Phoenix today can reach over 300 Mips - this means Phoenix is like 20-30 times faster than today's ACA cards.
We see the next gen FPGAs on the horizon.
This means we can have a future roadmap where we know we can create cards with over 1000 Mips.

The performance difference between Phoenix and old 68030 cards is so ridiculous
that I think the wish to run future killer applications on both platforms is pointless.

xboxOwn · « **Reply #221 on:** April 02, 2015, 08:04:46 AM »

Quote from: ferrellsl;787192

Yeah, if Matt isn't whining about Macintosh compatibility, he's whining about ColdFire compatibility. He really needs to take his trash talk to a ColdFire forum or start hanging out at a classic Mac site. He's gotten to be more than just tiring. He's either trying to "wow" us with his assembler knowledge or he has a personal problem with Gunnar, or maybe both. Just wish he'd leave it alone.

For me...even if Mac emulation is the greatest importance for me and even if this Vampire 500 is 0% mac compatible and I start pounding my head on the table..I WANT MAC, I WANT MAC EMULATION and I start pulling my hair and freaking out like enraged PMS woman...I have the solution to my problem:

A) Burn down the house with the Amiga in it..just because I could not get a Mac emulator that emulates a 20 year old Mac system

OR
B) Have both ACA500+ACA1233 and Vampire installed at the same time.

You will say to yourself...well..how can that be? We know that Amiga 500 hardware is genius...if it detects said CPU in the zorro expansion it simply disables the CPU in the motherboard where the Vampire 500 is installed. Now I boot in full 68k accelerator and here I can run Mac emulation no problem.

If I get tired of the 68k+Mac emulation and I want to use the modern technology with modern RTG, SAGA, etc...I do not need to remove the ACA500 and install it again....that will cause wear and tear on both the card and my A500 computer...what I will do as soon as I turn my A500 on I get the ACA500 menu right? Well...all I have to do is hit F10...and Del to disable the ACA500 card while it is installed in my A500..the A500 resets...the ACA500 is disabled during the entire time it is turned on...and POOOF VAMPIRE 500 TAKES OVER.

When I am finished...I can either turn the Amiga 500 off and turn it back on...and use ACA500 mode..or turn off the A500 for the day.

IN THE END MY A500 CAN RUN EVERYTHING...from the OLD 68K to the new Phoenix...without any problem!!

CASE CLOSED!!!!!!!!

Guys...can we please move on in the future. Like I said before...if we need past technology...use ACA500+ACA1233...want modern technology, faster than ACA1233, wanna watch movies on Amiga classic, run DOSBox, run VICE, blah blah...switch to Vampire 500 (Phoenix), and both of these are installed in your A500. Through the menu configuration you can switch between old and new without needing to remove or install hardware all the time, and you get the best of both worlds. So no need to complain here and let Biggun do his job and we enjoy the result, ok?

biggun · « **Reply #222 on:** April 02, 2015, 08:19:22 AM »

Last post from my side here.

I appreciate technical discussion.
I appreciate any discussion about which enhancements make most sense, about what benefit 100% ColdFire compatibility can give us.

What I cannot stand is when people try to strengthen their points by posting manipulative lies.

I have a real problem here with how Matt behaves.

Matt posted on some forum that the way we design the CPU/FPU internally would hinder us from going for an ASIC.
This is pure bull%&$#?@!.
The fact is that Phoenix design proposal here matches 100% what INTEL, AMD and IBM are planning to do for their next CPU generation coming to market in 2 years.
This means our design is state of the art, absolutely modern.
And what we plan to do is perfect for going for ASIC.

Matt posted here 2 times that Phoenix would have problems running MacOS.
This is bull%&$#?@!.
I do not understand why Matt posts lies on purpose.

This thread started as a very valuable post by someone pointing out the option to get early developer cards.
Unfortunately this thread's quality quickly got worse through off-topic posts.
And some of these posts are pure lies.

I see no reason for us to continue to talk on this level.
I will not post here anymore.

If people think it makes sense to spread lies then continue without me.

Lurch · « **Reply #223 on:** April 02, 2015, 09:06:14 AM »

@biggun I appreciate the work you are doing and hope to hear more great things coming our way because of it. I can see Phoenix shaking up the classic hardware community in a good way and perhaps some people are scared of this?

Anyway look forward to hearing further updates and hope that I get a chance to take part :-)

guest11527 · « **Reply #224 from previous page:** April 02, 2015, 10:09:25 AM »

Quote from: biggun;787193

What I do not understand is how relevant this is in our case.

Let's look at some numbers.
Phoenix today can reach over 300 Mips - this means Phoenix is like 20-30 times faster than today's ACA cards.
We see on the horizon the next gen FPGAs.
This means we can have a future roadmap where we know we can create cards with over 1000 Mips.

Very fine then. Why is it then relevant to have new instructions in the first place? Let's check what we have:

*) LineA: This instruction space is not usable because MacOS has its OS traps here.

*) More data registers, A8: Not usable in a multitasking OS because exec does not save and restore these on a context switch. To support them, the exec scheduler would have to be extended. Possible, but it creates problems with monitors and system tools like Xoper and friends that analyze the exec stack frame. Hence, more problems for software developers and users, instead of fewer.

*) Instructions like "MVZ": Not useful because they can be replaced by a sequence of two instructions without any additional cost. Ok, the code gets two bytes longer. Big deal.

So in the end, there are no benefits or no usable benefits at the cost of compatibility. The question is: Would anyone use these instructions in new code? If so, new code would have to be compiled for both the old instruction set and the new instruction set. So basically, the user has to know on which system the software is to run - or would receive crashes.

Does it make sense to write a CPU-dispatcher in a program for such small benefits? Likely not. I will not add a dispatcher to save two bytes for some instructions (after all, I would have to duplicate code, thus making things longer instead of shorter). I will not use additional registers because they are not saved and restored by exec.

The only place where I believe some extra instructions are useful is in highly specialized bottleneck algorithms, where it makes sense to have a CPU-specific dispatcher and two versions of the same code because it makes a noticeable speed difference for the user.
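
As a rough illustration of such a dispatcher, here is a sketch in Python. Everything in it is hypothetical: the routine, the names and the `has_extended_isa` flag are placeholders, and the "extended" variant merely stands in for code that would use the new instructions. The point is only the structure: two versions of one bottleneck routine, selected once at startup, with identical results.

```python
def sign_extend_word(word):
    """Interpret a 16-bit value as a signed word."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def sum_words_baseline(words):
    """Baseline version: load each word, then extend it in a second step."""
    total = 0
    for word in words:
        loaded = word & 0xFFFF              # stands in for MOVE.W EA,Dn
        total += sign_extend_word(loaded)   # stands in for EXT.L Dn
    return total

def sum_words_extended(words):
    """Stand-in for a version built for the extended instruction set."""
    return sum(sign_extend_word(word) for word in words)

def select_sum_words(has_extended_isa):
    """Pick the implementation once; callers use the returned function."""
    return sum_words_extended if has_extended_isa else sum_words_baseline

# The flag would come from a CPU check at startup (assumed, not shown).
sum_words = select_sum_words(has_extended_isa=False)
print(sum_words([0xFFFF, 0x0001, 0x7FFF]))
```

The duplicated routine is the cost mentioned above: both versions have to be shipped and kept in sync, which only pays off where the speed difference is noticeable.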

Quote from: biggun;787193

The performance difference between Phoenix and old 68030 cards is so ridiculous that I think the wish to run future killer applications on both platforms is pointless.

Then why do we need new instructions in the first place?
