Interesting to see how Schoenfeld-bashing goes on here.
Jen-SS: You're ignoring various facts:
- first of all, please consider writing my name correctly. Single d at the end, not dt.
- it's not three full-time programmers, but two hardware developers on the project - both only part-time.
- Dennis pointed out himself that my approach will produce a smaller design.
- Dennis and I are not competitors. I will even provide hardware for his core to run on. Why didn't anybody read that comment?
- I haven't "copied Dennis' effort". Dennis announced his project in early December 2005, when Clone-A was already in the making.
Some people might care less about inter-chip communication. I do care a lot, because it tells me so much about the way the chips work. Being able to communicate with the original chips means that I'm on the right path. This approach leaves no room for errors, whereas the "could care less" approach leaves freedom for interpretation.
It's also funny to see how many people all of a sudden become hardware experts, just because hardware can be "produced" by writing Verilog code. Let me tell you that this is not the case. You don't become a C programmer just because you can write a "hello world" in C. Therefore, please take the time to *technically* understand the following (and if you continue bashing, I take it that you're not interested in the truth):
If you're doing your own design and afterwards tweak it to be cycle-accurate, you will most probably end up with a bloated design, so in a way, you're right. Take the example of the "miracle two pixel delay" in Denise that the emulator programmers found out about. If you're doing it cycle-accurate, you have to add a 2-bit delay, which will cost you chip space and routing space.
The trick is to find out what Denise is *really* doing during this 2-cycle delay - apart from delaying. The question "why not output the data when it's already known" must be re-phrased to "why is data not yet ready to be sent to the outputs" - that's why Clone-A will not only use less FPGA space, but also be more compatible by design.
Other things are even Verilog-related. If for example a 10-bit counter value must be compared, you compare the 10-bit value and the counter with a simple expression in the code. No matter how good the compiler and optimizer are, they will always produce a 10-bit comparator, which is correct.
However, on a hardware level, you can reduce things to a minimum if you take other knowledge into account. Let's assume the counter is the horizontal line counter (x-coordinate), and you want to generate a signal that (dis-)allows sprite DMA at positions 20 and 772. The Verilog code would be absolutely simple about this, as it's two 10-bit comparators. My approach to the thing is to only compare single bits in order to save FPGA space:
I know that the counter is only counting up, so the number 20 is the first where bits 4 and 2 are set at the same time. My first comparator is only 2 bits instead of 10. The same knowledge applied to the number 772: it's the first number (when counting up) where bits 9, 8 and 2 are set at the same time. This comparator is 3 bits wide, so I'm using 5 comparator inputs where a correct Verilog implementation would use 20. That saves 75% in this example, which is NOT representative.
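The arithmetic behind this bit trick can be checked with a small sketch. It is written in Python rather than Verilog, purely to verify the numbers. The names `first_match`, `all_matches`, `MASK_20` and `MASK_772` are made up for this illustration and are not taken from Clone-A. The sketch assumes a 10-bit counter that counts up from 0. Note that the reduced comparators also match later counts, so it is an assumption here that only the first match per line is used.

```python
# Hypothetical illustration: checking the reduced-comparator arithmetic.
# Assumes a 10-bit horizontal counter that counts up from 0.

COUNTER_WIDTH = 10

# Only the bits that are set in the target value are compared.
MASK_20 = (1 << 4) | (1 << 2)               # 20  = 0b0000010100
MASK_772 = (1 << 9) | (1 << 8) | (1 << 2)   # 772 = 0b1100000100


def first_match(mask, width=COUNTER_WIDTH):
    """Return the first count at which all bits in mask are set."""
    for count in range(1 << width):
        if count & mask == mask:
            return count
    return None


def all_matches(mask, width=COUNTER_WIDTH):
    """Return every count at which the reduced comparator would fire."""
    return [c for c in range(1 << width) if c & mask == mask]


print(first_match(MASK_20))        # 20
print(first_match(MASK_772))       # 772

# The reduced comparator is not an equality test: it also fires later.
print(all_matches(MASK_20)[:5])    # [20, 21, 22, 23, 28]

# Comparator inputs: 2 + 3 reduced versus 2 * 10 full.
reduced_inputs = bin(MASK_20).count("1") + bin(MASK_772).count("1")
full_inputs = 2 * COUNTER_WIDTH
print(1 - reduced_inputs / full_inputs)   # 0.75
```

The third line of output shows why the "counting up" knowledge matters: the 2-bit comparison is only equivalent to "equals 20" if the logic reacts to the first match and ignores the later ones.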
I am all for your claim to have people make up their own mind about things. In turn, you should feed them with the correct information, and not just with your way to see the world. If you don't have *all* the information, you know nothing. I explained it on a hardware level above, but you can also hear it on the radio if you take the song by the Scissor Sisters:
I don't feel like dancin'
is a totally different line compared to
I don't feel like dancin' without you.
Jens Schönfeld