
Author Topic: Of FPGAs and the way forward  (Read 6677 times)



Offline SamuraiCrowTopic starter

  • Hero Member
  • *****
  • Join Date: Feb 2002
  • Posts: 2281
  • Country: us
  • Gender: Male
Of FPGAs and the way forward
« on: August 05, 2008, 04:04:55 AM »
This post by me from the other Natami thread is what started this thread:

Quote

The Natami is pushing the hardware limits, but it will only be able to run OS 4+ if they add a PowerPC accelerator card to it. Before the Mac went to Intel, the PowerPC was a vibrant design with lots of attention; now it is relegated to game machines, where hypervisors prevent ordinary people from experiencing its full potential.

I think PowerPC is as much of a dead end now as ever, and the Intel and AMD processors are reaching a dead end too: the heat restrictions of higher clock frequencies are forcing them to turn their attention toward multiple cores. Since their software doesn't run well on multiple cores, they are going to be at a standstill.

I think the Intels will make it further than the PowerPCs because they have a more compact instruction set, but that advantage will wear thin when easier-to-use instruction sets prevail. I think the way of the future is asymmetric multicore design, where the cores are dedicated to the functionality they are intended for.

What makes an Amiga, in my opinion, is the dual-bus architecture. While most computers are stuck with a left-brain-dominant design that doesn't allow for parallelism, the Amiga introduced a computer with a right hemisphere for creative thinking. On most computers that hardware may be considered a required peripheral for graphics and sound, but on the Amiga it was standard, integrated, and elegant. The stock A1200 was even right-brain dominant, due to the underpowered main processor.

I could say more but if you want to hear more of why I think the way I do, I'll start a new thread.


So here I go:

Most software is serial with branches.  Most hardware is parallel but controlled by a processor.  Now the industry expects software engineers to develop parallel applications while hardware engineers take the lazy way out and duplicate cores.  This can only lead to tears, because it asks hardware and software engineers each to use the other's working techniques.

Software engineering is supposed to be easier than hardware engineering because unique software is more common than unique hardware.  The symmetric multiprocessing path will be used in high-end applications but will be impractical for low-end applications until new programming languages come out that take advantage of parallelism.

Hardware engineering is always more parallel than software engineering because that's the way gate layout works on a chip.  It's always quickest to have many things done at once and only resort to serialization when you run out of space on the die.

What I'm proposing is a compromise:  FPGAs are programmable hardware, and the software that controls them is designed to convert serial programs into parallel programs whenever possible but, at the same time, maintain synchronization with the other parts of the chip by counting clock cycles.
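
To make that concrete, here's a minimal C sketch (assuming an HLS-style tool along the lines of the LLVM work linked at the end of this post, not any particular product). Because no iteration of the loop depends on an earlier one, a synthesis tool is free to lay down one adder per element and finish the whole loop in a single clock:

Code:

#include <stdint.h>

/* Serial form, as a CPU runs it: one add per step. */
void add_arrays(const uint8_t *a, const uint8_t *b, uint8_t *out, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = (uint8_t)(a[i] + b[i]);   /* iterations are independent */
}

On an FPGA the same loop can become n adders side by side, all firing in the same clock cycle: space traded for time.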

The type of parallelism that takes place on an FPGA is much more flexible than on parallel processors because, as long as you've got space on the chip, you can expand the bus width to 128 bits or narrow it to a 1-bit serial connection depending on the data type.
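
As a toy illustration in C (the function names are mine, purely for the example), the same reduction can be computed a whole word per step or one bit per step; on an FPGA that choice is just a matter of how much logic you spend:

Code:

#include <stdint.h>

/* Wide form: 64 bits of XOR per step (more gates, fewer steps). */
uint64_t xor_fold_wide(const uint64_t *data, int n)
{
    uint64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc ^= data[i];
    return acc;
}

/* Bit-serial form: one bit of XOR per step (one gate, 64x the steps).
 * Same result either way; only the space/time trade-off changes. */
uint64_t xor_fold_serial(const uint64_t *data, int n)
{
    uint64_t acc = 0;
    for (int i = 0; i < n; i++)
        for (int bit = 0; bit < 64; bit++)
            acc ^= data[i] & ((uint64_t)1 << bit);
    return acc;
}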

The Amiga led the way into the multimedia age by allowing a serial, threaded processor to multitask and by running the most processor-intensive tasks on a separate bus for all of the right-brain graphics calculations and sound mixing.

The next generation of hardware will start there, with asymmetric cores like Intel's Atom and the Natami70.  These multicore chips have a processor, a graphics core, and sound capabilities, all on one chip.  To keep costs down they will have to cache accesses differently from the dual-bus architecture of the Amiga, but there is a unique characteristic of the Amiga chipsets that isn't on the others:  The and/or/invert logic in the blitter's bit-masking mode is the same bit-twiddling technique used in the FPGA!
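
To show what I mean, here is the blitter's minterm logic written out in C (a sketch of the principle, doing in a loop what the blitter does in hardware). The 8-bit LF value is an 8-entry truth table over the three sources A, B and C, and an 8-entry truth table is exactly what a 3-input lookup table in an FPGA is:

Code:

#include <stdint.h>

/* Apply an Amiga blitter minterm (LF byte) to one word from each
 * of the three DMA sources.  Each output bit is looked up in the
 * truth table, indexed by the matching A, B and C source bits. */
uint16_t blit_minterm(uint8_t lf, uint16_t a, uint16_t b, uint16_t c)
{
    uint16_t d = 0;
    for (int bit = 0; bit < 16; bit++) {
        int idx = (((a >> bit) & 1) << 2)   /* A is the high index bit */
                | (((b >> bit) & 1) << 1)
                |  ((c >> bit) & 1);
        d |= (uint16_t)((lf >> idx) & 1) << bit;
    }
    return d;
}

The classic "cookie cut" D = AB + /AC, for example, is LF value 0xCA. Fill a chip with thousands of little truth tables like that and you have, in effect, an FPGA.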

Now, by practicing the custom bit-masking modes of the blitter, software engineers can learn to become hardware engineers, and hardware engineers gain a more important role.  Hardware engineers can take the bit-masking modes of the blitter programmers and make them into custom chips, thus completing the cycle.  Software makes new hardware, and new hardware makes more efficient ways to make software.

For an example of software being translated into hardware, download this PDF slideshow from the LLVM Developers' Conference that took place last Friday.  There's a QuickTime movie to go with it if you can view it.
 

Offline QuikSanz

Re: Of FPGAs and the way forward
« Reply #1 on: August 05, 2008, 04:50:20 AM »
SamuraiCrow,

I guess this is the bottom-line reason I have to have a Natami. It is the closest modern example of a new Amiga I've seen yet. None of the other projects (even the vaporware ones) come close. The final version of this thing may just catch on with other types of users. The only problem I see is making new software. I'm not too optimistic that much old software will go back into production; I have a good collection myself, but new users may have a problem.

Chris
 

Offline Trev

  • Hero Member
  • *****
  • Join Date: May 2003
  • Posts: 1550
  • Country: 00
Re: Of FPGAs and the way forward
« Reply #2 on: August 05, 2008, 05:30:17 AM »
Lazy or not, single-threaded software still benefits from multiple cores, as users tend to run more than one single-threaded program at a time. At the same time, however, we're expecting more out of those single cores than we have in the past by way of hardware virtualization.

My own preference is for hardware solutions that keep my current Amiga platform running. I'm not really interested in reinventing the Amiga as something that does all the things a cost-effective *nix or Windows workstation does.

That said, I should probably just keep my mouth shut. ;-)
 

Offline cicero790

  • Sr. Member
  • ****
  • Join Date: Mar 2008
  • Posts: 322
Re: Of FPGAs and the way forward
« Reply #3 on: August 05, 2008, 08:22:38 AM »
This sounds like Amiga to me. Amiga was all about forward thinking. Jay Miner would have been all over this. It sounds really promising.

Can you see a system like this with AROS on it, apart from all the platforms it's running on already?

A super NATAMI ONE Amiga for WB AROS with seamless backward compatibility thanks to UAE.  Is this what lies ahead? Closer cooperation between the software and hardware sides?
A1200 030 40MHz: 2/32MB Indivision AGA MkII
A600 7 MHz: 2MB
AROS 600 MHz
PC 13600 MHz: quad core i7 2600K 3.4GHz: 16GB RAM: ATI HD6950 2GB   (Yes I know)

WINUAE AmiKit ClassicWB AmigaSYS UAE4Droid  

 

Offline alexh

  • Hero Member
  • *****
  • Join Date: Apr 2005
  • Posts: 3644
    • http://thalion.atari.org
Re: Of FPGAs and the way forward
« Reply #4 on: August 05, 2008, 08:35:01 AM »
Quote

SamuraiCrow wrote:
Hardware engineering is always more parallel [snip] and only resort to serialization when you run out of space on the die.

Bollox.

Quote

SamuraiCrow wrote:
The software that controls [FPGAs] is designed to convert serial programs into parallel programs whenever possible but

Again, not true. The software you speak of does not control the FPGA; it creates the contents of the FPGA. And it does not convert serial programs into parallel programs (not quite sure where you got that idea from). The high-level languages used to define the contents have sequential (serial) and concurrent (parallel) statements, so the designer can choose serial or parallel. The synthesis tools do not (or at least very rarely) try to change this.

Quote

SamuraiCrow wrote:
at the same time, maintain synchronization with the other parts of the chip by counting clock cycles.

Yup, static timing analysis... all done over several minutes while creating the image to load onto the FPGA.

Quote

SamuraiCrow wrote:
The type of parallelism that takes place on an FPGA is much more flexible than on parallel processors because, as long as you've got space on the chip, you can expand the bus width to 128 bits or narrow it to a 1-bit serial connection depending on the data type.

Not in real time, you can't.

You seem to be talking about reconfigurable hardware using FPGAs. While this has been discussed and experimented with over the years, devices that support partial reprogramming are not common (although there are some on the market).

You could not reprogram the entire FPGA in real time, as it currently takes many, many hours to recalculate complex designs. You have to cut the design into manageable hierarchical structures which can be individually reconfigured. However, if a reconfiguration exceeds the resources previously owned by its block... you're screwed. You need to have worked out the sizes of all possible reconfigurations before you implement the overall FPGA.

And as I said... most devices do not support partial reconfiguration. Not to mention holding the current data in the pipelines.

BUT: It's very interesting stuff... undoubtedly the future of some areas of electronics.

Quote

SamuraiCrow wrote:
The and/or/invert logic in the blitter's bit-masking mode is the same bit-twiddling technique used in the FPGA!

At the lowest possible level.

Quote

SamuraiCrow wrote:
Now, by practicing the custom bit-masking modes of the blitter, software engineers can learn to become hardware engineers, and hardware engineers gain a more important role. Hardware engineers can take the bit-masking modes of the blitter programmers and make them into custom chips, thus completing the cycle.

Erm, I don't think so. :-)

The overheads of reprogramming will be (for a long time) worse than just doing it in software.

What would be better is if software engineers knew how to use the profiling tools that come with their systems: profile where the code is spending most of its time, work out what "acceleration" hardware they would like... and then reconfigure some "spare" logic to do that task. Reconfiguring at the data-path level is not practical yet.
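
Something like this toy C timing harness is the level of homework I mean (the checksum routine is a made-up stand-in for whatever a real profiler would flag):

Code:

#include <stdio.h>
#include <time.h>

/* Made-up stand-in for the routine a profiler might flag. */
static unsigned checksum(const unsigned char *p, unsigned len)
{
    unsigned sum = 0;
    while (len--)
        sum += *p++;
    return sum;
}

int main(void)
{
    static unsigned char buf[1 << 16];   /* 64 KB test buffer */
    unsigned sum = 0;

    clock_t t0 = clock();
    for (int i = 0; i < 10000; i++)
        sum += checksum(buf, sizeof buf);
    clock_t t1 = clock();

    /* Only if this number dominates the whole run is the routine
     * a candidate for "acceleration" hardware in spare logic. */
    printf("checksum: %.3f s (sum=%u)\n",
           (double)(t1 - t0) / CLOCKS_PER_SEC, sum);
    return 0;
}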

Edit: Just read the slides. Interesting how they brush over the reconfigurability aspect.

P.S. I think I had a board like the one in those slides years ago. We're all Virtex4 in our office now.
 

Offline cicero790

  • Sr. Member
  • ****
  • Join Date: Mar 2008
  • Posts: 322
Re: Of FPGAs and the way forward
« Reply #5 on: August 05, 2008, 09:29:05 AM »
Well, let's hope there is a way forward, and you know who travels those roads.
A1200 030 40MHz: 2/32MB Indivision AGA MkII
A600 7 MHz: 2MB
AROS 600 MHz
PC 13600 MHz: quad core i7 2600K 3.4GHz: 16GB RAM: ATI HD6950 2GB   (Yes I know)

WINUAE AmiKit ClassicWB AmigaSYS UAE4Droid  

 

Offline bloodline

  • Master Sock Abuser
  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 12113
    • http://www.troubled-mind.com
Re: Of FPGAs and the way forward
« Reply #6 on: August 05, 2008, 11:09:39 AM »
@SamuraiCrow,

I'm not sure if I follow your thinking here. The FPGA issues aside, since they have been covered by AlexH... I don't understand your left-brain/right-brain metaphor...

Let's look at my aged Athlon64 PC (and contrast it with one of my A1200s with 4 MB of Fast RAM)...

The Athlon: It has a CPU on a bus with some memory, local and for the CPU only.
The A1200: It has a CPU on a bus with some memory, local and for the CPU only.

The Athlon: It has a separate main System bus for all support systems (GFX, Audio, I/O).
The A1200: It has a separate main System bus for all support systems (GFX, Audio, I/O).

The Athlon: It has a GFX CoProcessor (an Nvidia 8600) with its own RAM (512 MB) that performs all gfx functions, and is capable of massively parallel GP processing.
The A1200: It has a GFX CoProcessor (the Blitter, Copper and barrel shifter) with its own RAM (well, shared with the Audio and I/O).

The Athlon: It has a dedicated Audio DSP and its own RAM.
The A1200: It has a DMA-fed DAC, and RAM shared with the GFX and I/O.

I could go on... but my point is made: the Athlon is structurally rather similar to the A1200, but massively improved on the idea. Each of the various subsystems is a powerful, independent device in its own right. I don't really see how this relates at all to your metaphor?


Offline SamuraiCrowTopic starter

  • Hero Member
  • *****
  • Join Date: Feb 2002
  • Posts: 2281
  • Country: us
  • Gender: Male
Re: Of FPGAs and the way forward
« Reply #7 on: August 05, 2008, 01:25:43 PM »
@alexh

I never said configuring the FPGA in real time was a reality.  I didn't know how long the timing analysis took, though.  What I was thinking of was that more cores could be added to the chipset by converting some of the most frequently called subroutines in the OS into custom operations in hardware.

@bloodline
The right-brain analogy was about the custom chips having their own memory bus.  On the NatAmi there are 16 megs of Chip RAM built into the FPGA.  It is used as local-store memory for the custom chips and functions like a software cache for multimedia chip accesses.

Oddly, the custom chips on the Natami can also access the Fast bus and gain a huge amount of memory.  So whether it's fully a dual-bus Amiga remains to be seen.

Your analogy of the PC being like an Amiga holds very true.  Many graphics cards do have their own memory and are like Amigas in that respect.  If the GPU were used to do all of the sound mixing, as is proposed on the NatAmi, your AROS system could be an Amiga.
 

Offline Hans_

Re: Of FPGAs and the way forward
« Reply #8 on: August 05, 2008, 04:19:49 PM »
This is slightly off-topic, but people might want to take a look at Stanford's ELM architecture. They've found that 70% of all power in a CPU goes to loading data and instructions. The ELM architecture is designed to lower that significantly, and they claim its power consumption is as little as 3x that of an ASIC implementation of the same task (i.e., the same algorithms implemented as software on the ELM vs. an ASIC-based design). This is down from 50x for typical CPUs.

Hans
Join the Kea Campus - upgrade your skills; support my work; enjoy the Amiga corner.
https://keasigmadelta.com/ - see more of my work
 

Offline bloodline

  • Master Sock Abuser
  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 12113
    • http://www.troubled-mind.com
Re: Of FPGAs and the way forward
« Reply #9 on: August 07, 2008, 05:14:15 PM »
Quote

SamuraiCrow wrote:

@bloodline
The right-brain analogy was about the custom chips having their own memory bus.


Well, all systems use separate busses for different parts of the system... that's nothing unique to the Amiga.

Quote

On the NatAmi there are 16 megs of Chip RAM built into the FPGA.


Is that an FPGA with 16megs?

Quote

It is used as local-store memory for the custom chips and functions like a software cache for multimedia chip accesses.


I don't understand... software cache?

Quote

Oddly, the custom chips on the Natami can also access the Fast bus and gain a huge amount of memory.  So whether it's fully a dual-bus Amiga remains to be seen.


Again, I don't understand... if the Custom chips can access the CPU local bus, then why bother with it?

Quote

Your analogy of the PC being like an Amiga holds very true.  Many graphics cards do have their own memory and are like Amigas in that respect.  If the GPU were used to do all of the sound mixing, as is proposed on the NatAmi, your AROS system could be an Amiga.


Why not just use a DSP for the Audio mixing? Leave the GPU to do GFX stuff... perhaps?

Offline SamuraiCrowTopic starter

  • Hero Member
  • *****
  • Join Date: Feb 2002
  • Posts: 2281
  • Country: us
  • Gender: Male
Re: Of FPGAs and the way forward
« Reply #10 on: August 07, 2008, 08:30:42 PM »
Quote

bloodline wrote:
Is that an FPGA with 16megs? [...] I don't understand... software cache? [...] if the Custom chips can access the CPU local bus, then why bother with it? [...] Why not just use a DSP for the Audio mixing? Leave the GPU to do GFX stuff... perhaps?


The Natami is a hybrid between PC architecture and Amiga architecture.  It will still have 16 megs of static memory on the chip for use as chip memory, but that memory will be many times faster than the fast-page memory used in the AGA machines, and faster even than the main memory on the motherboard.  That's how it will function as local-store memory.  When I referred to it as a software cache, I meant that it will work much like a disk-caching system: recently accessed data stays in the 10-nanosecond chip memory, ready for future operations.
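
In C terms the pattern I have in mind looks roughly like this (a sketch only; the scratchpad size and the XOR work are made up for illustration):

Code:

#include <stdint.h>
#include <string.h>

#define SCRATCH_SIZE 4096            /* stands in for the 10 ns chip RAM */
static uint8_t scratch[SCRATCH_SIZE];

/* Stage a block from slower board memory into the fast local store,
 * do all the work there, then write the result back.  Same idea as
 * a disk cache, but managed by software instead of a controller. */
void process_block(uint8_t *slow_mem, size_t len)
{
    if (len > SCRATCH_SIZE)
        len = SCRATCH_SIZE;
    memcpy(scratch, slow_mem, len);    /* fetch into local store */
    for (size_t i = 0; i < len; i++)
        scratch[i] ^= 0xFF;            /* placeholder for real work */
    memcpy(slow_mem, scratch, len);    /* write back */
}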

If you used a DSP for sound, then it wouldn't be a single-chip solution.  As it is, there will likely be some external memory chips on the system board, but only because it is more configurable that way.  The idea is to have a mostly self-contained computer on just one chip.

As for the multi-bus architecture, it is only common in desktop models; single-chip solutions like the Intel Atom will use a bunch of backside caches and other tricks to make their systems work.  Since the Amiga is designed to run from 2 megs of chip RAM, there will be no conflict in using software to detect and use the chip memory manually on the Natami, without wasting chip space on yet another cache controller.
 

Offline alexh

  • Hero Member
  • *****
  • Join Date: Apr 2005
  • Posts: 3644
    • http://thalion.atari.org
Re: Of FPGAs and the way forward
« Reply #11 on: August 07, 2008, 10:18:10 PM »
Quote

SamuraiCrow wrote:
On the NatAmi there are 16 megs of Chip RAM built into the FPGA.

Yeah right... NOT!

An FPGA with 16 Mbytes of on-chip SRAM would cost more than the gross debt of Northern Rock!
 

Offline SamuraiCrowTopic starter

  • Hero Member
  • *****
  • Join Date: Feb 2002
  • Posts: 2281
  • Country: us
  • Gender: Male
Re: Of FPGAs and the way forward
« Reply #12 on: August 07, 2008, 10:25:44 PM »
At least, that was the impression I got.  Maybe it will use external SRAM, but either way it's going to be fast for a 200 MHz chip, assuming it gets that far.
 

Offline alexh

  • Hero Member
  • *****
  • Join Date: Apr 2005
  • Posts: 3644
    • http://thalion.atari.org
Re: Of FPGAs and the way forward
« Reply #13 on: August 07, 2008, 10:27:18 PM »
It's gotta be external. The biggest FPGAs today only have about 3 Mbytes of SRAM, and they cost about $10,000 each.

You have to take everything written by the Natami wannabe "Gunnar von Boehn" as bollox.

Only Thomas Hirsch knows what is really going on.
 

Offline downix

  • Hero Member
  • *****
  • Join Date: Jan 2003
  • Posts: 1587
    • http://www.applemonthly.com
Re: Of FPGAs and the way forward
« Reply #14 on: August 08, 2008, 12:37:57 AM »
Quote

bloodline wrote:
@SamuraiCrow,

I'm not sure if I follow your thinking here. The FPGA issues aside, since they have been covered by AlexH... I don't understand your left-brain/right-brain metaphor...

Let's look at my aged Athlon64 PC (and contrast it with one of my A1200s with 4 MB of Fast RAM)...
Correcting a lot of mistakes here:
Quote


The Athlon: It has a CPU on a bus with some memory, local and for the CPU only.
The A1200: It has a CPU on a bus with some memory, local and for the CPU only.
The Athlon has the memory controller built into the CPU, requiring all system accesses to go through the Athlon whenever they need memory.  It is limited to an 8-bit DMA system.
By comparison, the Amiga has memory used exclusively by the CPU, and memory used by the rest of the system.
Quote


The Athlon: It has a separate main System bus for all support systems (GFX, Audio, I/O).
The A1200: It has a separate main System bus for all support systems (GFX, Audio, I/O).

No, the Athlon has all support systems running through the CPU bus, the opposite of the Amiga, which keeps them separate.
Quote


The Athlon: It has a GFX CoProcessor (an Nvidia 8600) with its own RAM (512 MB) that performs all gfx functions, and is capable of massively parallel GP processing.
The A1200: It has a GFX CoProcessor (the Blitter, Copper and barrel shifter) with its own RAM (well, shared with the Audio and I/O).

Here is the real difference.  The Athlon has this the opposite way round from the Amiga: it shares the CPU memory, while the Amiga shares the video memory.  The advantage is that on the Amiga the CPU gets undivided memory access, unlike on the Athlon.

Quote

The Athlon: It has a dedicated Audio DSP and its own RAM.
The A1200: It has a DMA-fed DAC, and RAM shared with the GFX and I/O.

I don't see many Athlons with audio DSPs. In addition, you remain stuck with the 8-bit DMA system of the Athlon vs. the 16-bit DMA of the Amigas.
Quote

I could go on... but my point is made: the Athlon is structurally rather similar to the A1200, but massively improved on the idea. Each of the various subsystems is a powerful, independent device in its own right. I don't really see how this relates at all to your metaphor?

No, your example fails due to not understanding the underlying design.
Try blazedmongers new Free Universal Computer kit, available with the GUI toolkit Your Own Universe, the popular IT edition, Extremely Reliable System for embedded work, Enhanced Database development and Wide Area Development system for telecommuting.