Author Topic: FPGA Replay Board  (Read 822550 times)

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
    • Show all replies
Re: FPGA Replay Board
« on: May 15, 2011, 03:01:55 AM »
Quote from: digiflip;637825
Imagine our fpga arcade 2 will be able to use the latest versions of AMD radeon or nvidia cards,  while they be stuck with super natami aga or whatever they decide to call it lol.


The Natami 1 should be able to use a gfx card out of the box because it has PCI :P. I doubt the first versions of the Natami texture mapper will be as powerful or full featured as gfx cards from the 90s. That doesn't mean they shouldn't do it. Advantages are a smaller profile case, lower power use, being standard, integration with SAGA (1 monitor output) and no gfx bus bottleneck. It can improve over time as needed. I would love to see any fpga solutions that support gfx cards though. I would also like to see standards that allow the same OS to run on these different fpga Amigas. Please add at least ColdFire instructions and compatible RTG "chunky" gfx. I might buy both if they both look good. A little competition will be good for the consumer ;).
 

Offline matthey

Re: FPGA Replay Board
« Reply #1 on: May 15, 2011, 05:29:33 PM »
Quote from: Iggy;637981
PCI-e would be great, but a single lane connection would be a severe bottleneck (so would a PCI connection).
However, a PCI-e X1 or X4 connection could be useful for expansion.


Actually, I think the fpga CPU (and FPU) is going to be the bottleneck even with PCI. Note that an FPU would almost be a requirement with gfx boards. PCI gfx cards also have more documented hardware and more Amiga support. The most popular would be the Radeon series and Voodoo 3+. I think I could get the Voodoo 3+ Warp3D driver to work if there was a 2D CyberGFX/P96 driver available and an FPU. There is no 68k Warp3D driver for Radeon cards at this time. The other route would be to go through AROS, which has support for some Nvidia cards but is less Amiga backward compatible.
 

Offline matthey

Re: FPGA Replay Board
« Reply #2 on: October 11, 2011, 05:20:48 AM »
I hope the FPGA Arcade can do better with the update. If it's the 68020 enhancements slowing it down, then it would be hard to imagine the Natami N68k achieving 68060 performance.
 

Offline matthey

Re: FPGA Replay Board
« Reply #3 on: October 11, 2011, 02:49:07 PM »
Quote from: mikej;663219
The cache and prefetch must be disabled in the current core. This is what is hurting performance, not the cpu extensions. The new memory controller and cache system does a lot better.
/MikeJ


That should make a HUGE difference in performance. Maybe up to a 68030@50MHz? The fpga CPU isn't really pipelined, is it? How big is the cache? Is the branch logic like the 68020/68030 or the better 68040/68060/CF, where backward branches are predicted taken and forward branches are predicted not taken? The latter provides much improved performance, especially on loops, and is what the N68k CPU will use also. I suggested that the least significant bit of the branch displacement (there are no odd branch targets) be used to reverse this logic. It would make for prettier, more compact code, easier design for compilers, and easier branch optimization that could, for example, change 68020/68030 branches to the newer style. A setting could allow this bit in the instruction to be inverted on the fly as the code is executed. Please consider cooperating with the Natami team to create standard ISA enhancements to the 68k after you have the fpga Arcade soft CPU working better ;).
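A minimal sketch of the static scheme described above, assuming the branch is known only by its signed displacement. The function names and the `use_hint_bit` switch are hypothetical; the hint bit models the suggestion that the otherwise unused least significant bit could reverse the default guess, and it is not an existing 68k or ColdFire feature.

```python
def predict_taken(displacement: int, use_hint_bit: bool = False) -> bool:
    """Static prediction: backward branches taken, forward branches not taken.

    With use_hint_bit set, the least significant bit of the displacement
    (always 0 for a legal, even branch target) reverses the default guess.
    This hint encoding is a hypothetical illustration of the proposal.
    """
    hint = (displacement & 1) if use_hint_bit else 0
    target_offset = displacement & ~1      # the real target is always even
    backward = target_offset < 0
    return backward != bool(hint)


def mispredictions(outcomes, displacement, use_hint_bit=False):
    """Count how often the static guess disagrees with the real outcomes."""
    guess = predict_taken(displacement, use_hint_bit)
    return sum(1 for taken in outcomes if taken != guess)


# A loop-closing branch: taken 9 times, then falls through once.
loop_outcomes = [True] * 9 + [False]
print(mispredictions(loop_outcomes, -8))   # only the final fall-through is missed
```

The point of the sketch is that a backward loop branch is guessed wrong only once, on exit, while the hint bit would let a compiler flip the guess for the rare branch where the default is wrong.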
 

Offline matthey

Re: FPGA Replay Board
« Reply #4 on: October 11, 2011, 05:34:42 PM »
Quote from: freqmax;663228
Earlier results show 256 byte instruction- and data cache improves performance to twice that of A4000 68040 @ 25 MHz. Dhrystone 9868, MFlops 10.30 measured with sysinfo.


You mean Mips=10.30, not MFlops, right? There's probably not room to add FPU support in the fpga, although it would only take 4 FPU instructions to do well in the SysInfo MFlops test. I posted the SysInfo FPU MFlops test on the Natami board...

http://www.natami.net/knowledge.php?b=2&note=35355

The SysInfo tests are a joke. SysSpeed is a little better.

Quote from: billt;663229
Are Natami team open to this as well, or are they a closed-project group wanting to do their own thing?


I think they are open to standards for 68k ISA enhancements and (S)AGA/RTG enhancements. It would mean more software would support it. Software needs to support enhancements to get a benefit from them. More than 1 CPU standard could be defined if full support is too burdensome to add. For example, a 68CF standard could add ColdFire instructions only and a 68CF2 standard could add further enhancements. What is needed is a defined specification, a name with which to refer to it and willingness of developers to support it. I would be willing to help if I could find enough interest. I think developers that would support it include the Natami Team, Frank Wille (vasm, vbcc) and me (new version of the ADis disassembler, and I also worked with Frank on some CF optimizations for vasm, with more to come). Rune Stensland may do an Asm-Pro update as well. The 68k is a great development platform and we can make it better with a little bit of effort and cooperation.

Quote from: mikej;663234
1 block ram is about 2K byte. This is quite sufficient as a cache in this case.


Even a 2k cache will cause trouble for poorly written games using self-modifying code. Can the cache size be changed or is there bus sniffing (snooping)? I believe the N68k will get bus sniffing eventually so the caches can be as large as they want.
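A toy model of why self-modifying code and an instruction cache clash, and how snooping fixes it. The `InstructionCache` class is a hypothetical illustration and not how the FPGA Arcade or N68k cache is built; the only real details are the 68k opcodes for nop ($4E71) and rts ($4E75).

```python
class InstructionCache:
    """Toy instruction cache holding one opcode word per address."""

    def __init__(self, memory, snoop=False):
        self.memory = memory    # backing store: dict of address -> opcode word
        self.lines = {}         # cached copies of fetched words
        self.snoop = snoop      # if True, writes invalidate matching entries

    def fetch(self, address):
        """Instruction fetch: served from the cache once the word is loaded."""
        if address not in self.lines:
            self.lines[address] = self.memory[address]
        return self.lines[address]

    def write(self, address, value):
        """Data write that goes to memory and bypasses the instruction cache."""
        self.memory[address] = value
        if self.snoop:
            self.lines.pop(address, None)   # drop the now stale copy


NOP, RTS = 0x4E71, 0x4E75

plain = InstructionCache({0x100: NOP})
plain.fetch(0x100)                  # the word is now cached
plain.write(0x100, RTS)             # the game patches its own code
print(hex(plain.fetch(0x100)))      # still the old nop: stale instruction

snooping = InstructionCache({0x100: NOP}, snoop=True)
snooping.fetch(0x100)
snooping.write(0x100, RTS)
print(hex(snooping.fetch(0x100)))   # the patched rts is seen
```

Without snooping, the larger the cache, the longer a patched instruction can survive as a stale copy, which is why snooping removes the pressure to keep the cache small.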
 

Offline matthey

Re: FPGA Replay Board
« Reply #5 on: October 11, 2011, 08:21:33 PM »
Quote from: freqmax;663248
I hope there won't be any huge list of extra instruction ISA that all Amiga now should implement. Rather a few instructions that would give a great boost.

The FPGA capacity is limited.

Bloatwarning ..

I hate bloat and the Natami guys hate bloat. The 68k bloat is in the compilers, not the CPU. The 68k has one of the most compact 16 bit word encodings of any full featured CPU. ARM came back to a 16 bit CISC encoding because its RISC encoding was bloated and now it has support for original RISC ARM + Thumb 1 + Thumb 2. They should have come back to the 68k and added CF and a few other carefully chosen extensions instead. 68k does not have the extensions from hell problem that x86 does. Making an instruction set wider should not impact performance much, as Mike pointed out about moving to the 68020+ ISA. This already is a huge improvement over the 68000 for power and ease of programming. Do you think this was bloat even though CALLM/RTM and pre and post indexed indirect memory addressing modes should have been left out? The CF instructions are well thought out. This is from the CFPRM...

"As the ColdFire Family grew, input from users and tool developers as well as internal performance analysis suggested a number of ISA enhancements could improve performance and code density. Accordingly, different revisions to the baseline instruction set architecture have been defined and implemented in the various ColdFire processor cores."

Improving the code density increases the CPU performance as do any gains from fewer and more powerful instructions. New instructions should be encoded and processed in a similar way to existing instructions so as not to increase complexity in the decoder but could improve overall efficiency (reduce bloat). If you would like examples of the savings and how often the CF instructions could be used, I can start another thread about ISA enhancements. There are examples and talk about ISA enhancements on the Natami forum if you search for ColdFire.

Quote from: freqmax;663251
What method difference between Sysinfo and SysSpeed is there that makes SysSpeed benchmark so much more accurate?

The SysSpeed benchmark uses a more realistic set of instructions. The SysInfo MFlops test measures the performance of fnop and fmove register to register mostly. Maybe the 68881/68882 can perform nearly as well as the 68040 and 68060 at fnop but it says nothing about fp performance. Even the SysSpeed benchmarks are limited. Many benchmarks of realistic code that fits in the cache would need to be done and averaged to come up with a realistic Mips and that says nothing about how powerful the instructions are (makes RISC look good). SysSpeed does give me 99 Mips (and 40 MFlops) for my 68060@75MHz which is close to what Motorola claimed so I have to give it some credit ;).
 

Offline matthey

Re: FPGA Replay Board
« Reply #6 on: October 11, 2011, 09:13:54 PM »
Quote from: freqmax;663263
So SysInfo only fails at measure FPU performance accurately, not the CPU performance?


It fails at measuring CPU performance also. My 68060@75MHz gives 55820 Dhrystones, 58.26 Mips, 41.77 MFlops, and 6.00 Chip speed vs A600. Sadly, the MFlops is the closest to reality and I know how flawed the test is. I only found the MFlops test in the code as it's easy to spot as there aren't many other fp instructions. The results speak for themselves though.
 

Offline matthey

Re: FPGA Replay Board
« Reply #7 on: October 13, 2011, 08:52:04 PM »
Quote from: psxphill;663434

On one hand a cpu that was compatible with 68000-68060 + CF software would be nice. On the other it is not actually possible to do that 100%, especially because of stack frames & MMU differences.

The 68000-68060 were not compatible with each other as far as stack frames and MMU differences. Adding "normal" instructions that operate in the same way as previous would not affect these. It's the user level compatibility that we want and it's quite good...

"In most cases, an instruction/addressing mode which does exist in ColdFire behaves exactly like its 680x0 equivalent, which makes it easy for experienced 680x0 programmers to understand ColdFire code. It also means that user-mode code written for ColdFire can generally run unchanged on a 680x0 processor, provided the new ColdFire-only instructions are not used.

However, there are a few subtle cases where the ColdFire instruction is not exactly the same as its 680x0 counterpart. The most important of these is that multiply instructions (MULU and MULS) do not set the overflow bit. This means that a 680x0 code sequence which checks for overflow on multiply may assemble and run under ColdFire, but give incorrect results.

ASL and ASR also differ in that they do not set the overflow bit - but this is less likely to cause problems for real programs!"

MULU/MULS/ASL/ASR will not be a compatibility problem for 68k programs as the overflow flag will continue to be set the 68k way. ColdFire programs would be slightly incompatible because of this but it's extremely rare for a program to use the overflow flag. It's entirely possible to make a 68k+CF CPU that's more compatible with the 68k line than the 68060 was.
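A minimal sketch of the multiply difference quoted above, assuming the 32x32 -> 32 bit form of MULU. Each function returns the result register and the overflow (V) flag; the function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF


def mulu_l_68k(a, b):
    """68020+ style: V is set when the product does not fit in 32 bits."""
    product = (a & MASK32) * (b & MASK32)
    return product & MASK32, product > MASK32


def mulu_l_coldfire(a, b):
    """ColdFire style: same 32 bit result, but V is never set."""
    product = (a & MASK32) * (b & MASK32)
    return product & MASK32, False


# Code that tests V after this multiply only behaves differently on overflow.
print(mulu_l_68k(0x10000, 0x10000))        # truncated result, V set
print(mulu_l_coldfire(0x10000, 0x10000))   # same result, V clear
```

A 68k+CF CPU that keeps the first behavior stays fully compatible with 68k code, and only ColdFire code that depends on V staying clear after a multiply could notice the difference.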

Quote from: psxphill;663434

Extending the instruction set beyond what is available now is a dangerous game though. I wouldn't necessarily even add CF, I'd spend the time and gates on getting the instruction rate up.

No, it's not dangerous. You would need to double the instruction rate to do the work of a mvs or mvz instruction and they would be common. You would need to at least triple the instruction rate to do the work of a byterev which is less common but used intensely in some drivers and data conversions for loaders/pictures etc. The code reduction also allows the cache to be used more effectively and reduces branch sizes improving overall efficiency. The CF instructions make the job of developers and compiler writers easier also.
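For readers who don't know the ColdFire instructions named above, here is a sketch of what mvz, mvs and byterev compute on a 32 bit register. The Python helper names are made up for illustration. The three-step byte reversal mirrors a common 68k idiom (rotate the low word by 8, swap the halves, rotate the low word by 8 again), which is where the "triple" comes from.

```python
MASK32 = 0xFFFFFFFF


def mvz(value, bits):
    """Zero-extend the low `bits` (8 or 16) of value to 32 bits."""
    return value & ((1 << bits) - 1)


def mvs(value, bits):
    """Sign-extend the low `bits` (8 or 16) of value to 32 bits."""
    low = value & ((1 << bits) - 1)
    sign = 1 << (bits - 1)
    return ((low ^ sign) - sign) & MASK32


def byterev(value):
    """Reverse the four bytes of a 32 bit value in one step."""
    return int.from_bytes((value & MASK32).to_bytes(4, "big"), "little")


def rotate_low_word_by_8(value):
    """Exchange the two bytes of the low word and keep the high word."""
    low = value & 0xFFFF
    low = ((low << 8) | (low >> 8)) & 0xFFFF
    return (value & 0xFFFF0000) | low


def swap_halves(value):
    """Exchange the high and low words."""
    return ((value << 16) | (value >> 16)) & MASK32


def byterev_three_steps(value):
    """The same byte reversal done as three separate operations."""
    value = rotate_low_word_by_8(value & MASK32)
    value = swap_halves(value)
    return rotate_low_word_by_8(value)


print(hex(mvs(0x80, 8)), hex(mvz(0x1234FF, 8)))
print(hex(byterev(0x12345678)), hex(byterev_three_steps(0x12345678)))
```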

Quote from: psxphill;663434
In terms of AGA enhancements, higher bandwidth and chunky pixels is all you really need. Copper and blitter etc can stay register compatible.

That's true. It's easy to keep the old and add more while staying compatible. The same applies to the instruction set. Poorly written software will find a way to break no matter what. They will have to be fixed instead of the hardware held back.
« Last Edit: October 13, 2011, 08:54:55 PM by matthey »
 

Offline matthey

Re: FPGA Replay Board
« Reply #8 on: October 13, 2011, 10:42:23 PM »
Quote from: freqmax;663459
Thus ColdFire and 680x0 compatibility is mutually exclusive when the same op-code is supposed to act in different ways?

Right. It's not possible to have 100% 68k compatibility and 100% CF compatibility at the same time in the same CPU. It is possible to have 100% 68k compatibility and 99.9% CF (user level) compatibility at the same time. That's what I think should be targeted.

Quote from: freqmax;663459
Then it's no longer a re-implementation, but something completely new. I see the main goal as being able to run the existing software base independently of rotting hardware.

The 68000 was not completely compatible with the 68020 at the user level either. The movem instruction was slightly different and some instructions became supervisor level. The 68020 callm/rtm instructions were dropped in later 68k processors. The CF is similar with different ISA levels. There is a big difference between CF ISA_A, which is minimalist, and ISA_C, which is close to the power of the 68k. There are other options which can exist or not from one CF CPU to the next, like MAC (3 different versions), FPU and division. If you want a new name for the slight differences, we could call 68k+CF "Cold Fusion". It would avoid the trademark infringement ;).

Quote from: freqmax;663459
Instruction sequences that involves some kind of looping can be optimized to identify those in the instruction queue. And performing them directly with logic gates with way less clock cycles per operation.

The 68k should be a good candidate for loop optimization logic like a loop counter combined with branch prediction. We know when a dbra instruction will terminate ahead of time making it easy to predict which code target of the branch not to load. The post-increment/pre-decrement addressing modes give a hint as to what memory the memory controller should be pre-loading. I suggested as much on the Natami forum but the simple gains need to be realized first. Using the 68040/68060 branch prediction scheme instead of 68020/68030 would save 2 cycles on every iteration of a loop (except last) assuming the same timing as the 68020/68030. That savings adds up fast. Branch prediction schemes generally don't save as many cycles on the 68k due to the short pipeline compared to most other processors.
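To put a number on "adds up fast", here is a small sketch that simulates the dbra counter and multiplies the taken branches by the 2 cycle figure from above. The 2 cycles per taken branch is the estimate in this post, not a measured value, and the function names are made up for illustration.

```python
def simulate_dbra(initial_count):
    """Run a dbra loop and return (body executions, taken branches).

    dbra decrements the low word of the counter and branches back
    until the counter reaches -1 ($FFFF as a 16 bit value).
    """
    counter = initial_count & 0xFFFF
    body_runs = 0
    taken = 0
    while True:
        body_runs += 1
        counter = (counter - 1) & 0xFFFF
        if counter == 0xFFFF:
            break               # loop exit: dbra falls through
        taken += 1              # dbra branches back to the loop top
    return body_runs, taken


def cycles_saved(initial_count, saving_per_taken_branch=2):
    """Cycles saved if every taken dbra is predicted correctly."""
    _, taken = simulate_dbra(initial_count)
    return taken * saving_per_taken_branch


print(simulate_dbra(9))       # 10 passes through the body, 9 taken branches
print(cycles_saved(9))        # 18 cycles saved on a 10 pass loop
print(cycles_saved(319))      # one 320 pass loop, e.g. a row of 320 pixels
```

Since the exit point is known from the counter, the only pass that gains nothing is the last one.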
« Last Edit: October 13, 2011, 11:41:43 PM by matthey »
 

Offline matthey

Re: FPGA Replay Board
« Reply #9 on: October 14, 2011, 02:14:06 PM »
Quote from: psxphill;663498
It depends on the software. If you could get a 25% overall increase then you'd still be faster than CF instructions unless your software was all mvs/mvz.


I see your point. Making all the instructions 25% faster would be faster overall. I think after a few easy changes, like enabling the cache and prefetch in the case of the fpga Arcade, finding a 25% overall performance increase will start to get difficult. Adding instructions is an easy way to increase performance. Making the decoding path wider should have little impact on performance. Mike verified that adding the 68020 support has little effect on the speed...

Quote from: mikej;663219
The cache and prefetch must be disabled in the current core. This is what is hurting performance, not the (68020) cpu extensions. The new memory controller and cache system does a lot better.
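A rough worked example of the 25% trade-off discussed above, under a deliberately crude model: every instruction takes the same time, and a fraction of the executed instructions sits in two-instruction sequences that a single mvs/mvz could replace. The model and the function names are assumptions for illustration only.

```python
def relative_time_faster_core(speedup=1.25):
    """Run time relative to baseline when every instruction is sped up."""
    return 1.0 / speedup


def relative_time_fused(pair_fraction):
    """Run time relative to baseline when 2-instruction pairs become 1.

    pair_fraction is the share of executed instructions that belong
    to such pairs, so half of that share disappears.
    """
    return 1.0 - pair_fraction / 2.0


def break_even_pair_fraction(speedup=1.25):
    """Pair share at which both approaches give the same run time."""
    return 2.0 * (1.0 - 1.0 / speedup)


print(relative_time_faster_core())        # 0.8 of the baseline time
print(relative_time_fused(0.10))          # 0.95 if 10% of instructions pair up
print(break_even_pair_fraction())         # about 0.4
```

Under this model the new instructions alone only match a 25% faster core when about 40% of executed instructions are such pairs, so the two approaches complement each other rather than compete. The model ignores the cache and branch size effects of denser code.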



Quote from: psxphill;663498

The danger comes from platform fragmentation. It's rather early for natami to try dominating the Amiga market. There never has been a coldfire Amiga, so there is no software that takes advantage of it except for that which you produce yourself & that software won't run on anything but a natami. That would kill my interest pretty quickly.


If you didn't notice, we already have platform fragmentation :(. We need supported open standards (helps to glue the fragments back together) and that's why I would like to see standards for an enhanced CPU and AGA. There won't be much new software without standards. It's the chicken and the egg problem that I'm trying to solve. The executables that have CF support would be much like for current 68k CPUs (000, 020, 030, 040, 060, 68CF, 68CF2, etc). Compilers only need a recompile to support whatever ISA/CPU.
 

Offline matthey

Re: FPGA Replay Board
« Reply #10 on: October 14, 2011, 07:05:04 PM »
@billt
That's the right attitude. Technology has moved forward, why keep the limits of the past? We can have high and true color chunky screens. We can add a few instructions and larger caches to the CPU as the few logic gates required are minor today. We can have many times the memory and storage space as we had in the past. The one thing we can't upgrade is the closed minds of Amiga users living in the past :/.
 

Offline matthey

Re: FPGA Replay Board
« Reply #11 on: October 14, 2011, 09:57:10 PM »
Quote from: freqmax;663504
I was under the impression we were in the business of re-implementing existing  Commodore computers. Not inventing new APIs for which there is no software for. Let the existing Commodore models be the "standard".

There is software available for the ColdFire! Add ColdFire instructions to the 68k and this software will run with a very high degree of compatibility on the new 68k + CF.

Quote from: freqmax;663540
The problem shows when you alter instruction sets, register structure, etc.. which will kill the capability to run existing software. Increasing memory, frequency, caches, etc..

Well, Mike better go back to the 68000 and throw out all the 68020 changes because he's killed compatibility with 68000 code, right? This is blatantly wrong! The 68000 to a 68020 was a very radical change, was not 100% compatible and made a few mistakes in my opinion but it was still a big enhancement in a positive direction. Adding CF instructions would be much more compatible than 68000 to 68020. It would be more compatible than enlarging the caches. Maybe you should stick with the MiniMig and ECS so you have all that compatibility. Oh wait, they enlarged memory and added new storage devices. There went compatibility. I guess your only option is an original unexpanded Amiga 1000 so you can enjoy your compatibility.

Quote from: freqmax;663540
If you want updated performance, try Intel Core (MIPS for freedom, or ARM for efficiency might be alternatives), Intel graphics (free API), it ends up being a PC. Though ARM + Graphics is becoming more common as well. However any compatibility goes out the window. Also modern systems use memory protection, preemptive multitasking, virtual memory, and user accounts which asfair AmigaOS doesn't support.

I think an updated 68k processor has more potential than ARM. I think with a few relatively minor additions the 68k can have...

+ better performance
+ better code density (I think 68CF+ can have 5-10% better code density than 68020)
+ easier to program

with the negatives of...

- higher power requirements (but still good and better than x86 derivatives)
- more gates (cost not really a problem today, power requirements higher)

I'm sorry if you can't see the potential.
« Last Edit: October 14, 2011, 09:59:24 PM by matthey »
 

Offline matthey

Re: FPGA Replay Board
« Reply #12 on: October 12, 2012, 07:39:46 PM »
Quote from: wawrzon;711138
@mikej
I think there was a copyback issue that the previous mask were not capable of carrying up properly.

Nothing to do with caches. It's probably a data register forwarding issue/bug that you are thinking of. The store-load bypass needs to be turned off on all full "RC" 68060 masks except 0E41J (Rev 6). This bug is unlikely with properly optimized code. It can only happen when writing a data register to memory as a longword and immediately reading it back. There is an easy solution that doesn't affect performance much. There are 21 known bugs in a 1F43G mask, 20 in a 1G65V mask and 0 in a 0E41J mask. Most of the bugs are minor or have workarounds.

Quote from: mikej;711141
We are just reading the product code register which contains mask revision. If you have any data this is not reliable I would love to know about it.

Product Code Register? You must mean "Processor Control Register". Yes, that's a good way to get the 68060 Revision Number. When the Chinese are sophisticated enough to change the PCR revision number, they might as well make us some new 68060 processors ;).
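A sketch of pulling the revision number out of a PCR value once it has been read (on real hardware that takes a supervisor-level movec). The field layout used here (identification in bits 31-16, revision in bits 15-8, DFP in bit 1, ESS in bit 0) is an assumption to check against the MC68060 User's Manual, and the sample value is made up.

```python
def decode_pcr(pcr):
    """Split a 32 bit 68060 PCR value into its fields (assumed layout)."""
    return {
        "identification": (pcr >> 16) & 0xFFFF,   # assumed: bits 31-16
        "revision": (pcr >> 8) & 0xFF,            # assumed: bits 15-8
        "fpu_disabled": bool(pcr & 0x02),         # assumed: DFP, bit 1
        "superscalar_enabled": bool(pcr & 0x01),  # assumed: ESS, bit 0
    }


# Made-up sample value: identification $0430, revision 6, superscalar on.
fields = decode_pcr(0x04300601)
print(hex(fields["identification"]), fields["revision"])
```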
« Last Edit: October 12, 2012, 11:33:30 PM by matthey »
 

Offline matthey

Re: FPGA Replay Board
« Reply #13 on: October 14, 2012, 05:22:42 PM »
Quote from: Darrin;711407
If that is the case, I guess in theory we could have several Amiga cores loaded on the SD card which could be selected at boot time.  One that is 100% accurate and another that takes advantage of additional features, but might have a few issues with certain software.


Correct me if I'm wrong, but adding new instructions and addressing modes to the fpga would not even affect whether the fpga CPU is cycle exact to a previous CPU, provided none of the new functionality is used. It could affect compatibility where the programmer was relying on certain trapping behavior of illegal instructions or addressing modes but that is very rare. Adding the missing 64 bit integer instructions into the 68060 fpga CPU would be just as incompatible but offer a nice speedup. The Amiga rarely needs cycle exact as it relies more on custom chip timing but the Atari is more reliant on CPU timing. I would think a cycle exact 68000 and 68020 would be all that is needed for game consoles, Atari and a few old Amiga games. In my opinion, a cycle exact 68040 or 68060 is wasting time and resources that would be better spent on making a faster fpga CPU. Think of the new demos that could be created rather than the 2% of demos that don't work because they were poorly programmed. New instructions and addressing modes can provide a significant speedup, code density improvements to conserve memory and better use caches and add modern functionality the 68k was missing. With new fpga CPU functionality, we need standard ISAs in order to gain support in compilers and developer tools, gain support among programmers and users and create a common platform among multiple fpga CPUs that would make an ASIC more likely in the future. A little planning now could make a bright future possible.
 

Offline matthey

Re: FPGA Replay Board
« Reply #14 on: October 16, 2012, 04:52:52 AM »
Quote from: Dozer;711603
No it is not; demos bang the hardware quite bad.


Banging the hardware doesn't have to mean incompatible. Frank Wille's SqrxzOCS game port bangs the hardware but works on 68000-68060, OCS-AGA, AmigaOS 1.x-3.x, is hard drive installable and properly exits.

Quote from: Dozer;711603

So, yes, we NEED proper settings, with cpu-selection and speed-setting to be able to watch the demos as intended

As a minimum:
7.14 MHz 68000
14.28 MHz 68020
50 MHz 68030
50 MHz 68060
66 MHz 68060 (some demos actually rely on this... )

Then again, this is to watch demos.. If all you want is to fire up the good old games, then "stock" settings would probably be enough.


So we need settings for every possible 68k processor at every possible speed of every possible accelerator with every speed of memory with OCS, ECS, or AGA on an Amiga 500, 600, 1000, 1200, 1500, 2000, 2500, 3000, 3000T, 4000, 4000T, CDTV or CD32 and cycle exact in both CPU and all custom chips so we can watch poorly programmed old demos. Mike might be busy for a while. We could have new and better demos (as well as apps and games) on a faster and enhanced CPU and custom chips in a fraction of the time. Basic compatibility is good but supporting every poorly written program is ludicrous. We can patch what is important and make videos of troublesome demos.