Author Topic: a golden age of Amiga  (Read 14555 times)

Offline Mrs Beanbag

  • Sr. Member
  • Join Date: Sep 2011
  • Posts: 455
Re: a golden age of Amiga
« Reply #14 on: February 01, 2012, 08:42:39 PM »
Well, that makes sense, but...

Quote from: Karlos;678632
Think of these as very simple CPU cores where stuff like conditional branching is expensive but data processing is not. Then imagine them in clusters, each cluster running the same code but on different data. Not like a SIMD unit, but as an array of cores, able to branch independently but optimal when in step.

...see, this is where I'm getting stuck: surely independent conditional branching is exactly what ray tracing needs a lot of?

Also, the kernel doesn't have random access into a large area of memory (where your scene might be stored, for instance), but only to the small portion that comes in on the stream, yes?
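If I've got the lockstep model right, when the cores in a cluster diverge at a branch, the cluster effectively runs both sides and a mask picks each core's result, so you pay for both paths.  Something like this, as a C sketch of the idea (the names and numbers are mine, not any real GPU's):

Code: [Select]
#include <stdint.h>

#define LANES 8  /* one "cluster" of lockstep cores */

/* Divergent branching by predication: every lane executes BOTH
   paths of the if, and a mask selects which result each lane
   keeps.  The cost is the sum of the two paths, not the max. */
void shade(const float in[LANES], float out[LANES])
{
    uint8_t mask = 0;

    /* evaluate the condition per lane */
    for (int i = 0; i < LANES; i++)
        if (in[i] > 0.5f)
            mask |= (uint8_t)(1u << i);

    for (int i = 0; i < LANES; i++) {
        float t = in[i] * 2.0f;   /* "then" path, run by all lanes */
        float e = in[i] * 0.5f;   /* "else" path, run by all lanes */
        out[i] = (mask & (1u << i)) ? t : e;
    }
}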

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #15 on: February 01, 2012, 10:00:34 PM »
Quote from: HenryCase;678651
Of course, it depends on what you're looking for. If you just want a ray trace accelerator then these issues are not so pressing.

Quote from: Mrs Beanbag
The model I have in mind is one where your operating system will run on a fairly standard dual or quad core CPU, but bulk calculations can be done on a set of co-processors with the barrel architecture mentioned above.

How much memory do you anticipate being adequate? Graphics card memory is fairly large these days; you can get cards with 1GB directly on the graphics card, for example. Plus, the PCI-E bus these cards are plugged into isn't exactly sluggish.

It's not the total amount of memory that's the problem, it's the random access.  The entire point of a "streaming processor" is that the data comes in sequentially (more or less).  For ray tracing you need to traverse the scene as a tree structure independently for each pixel, so it's difficult to serialise into a stream, because the access pattern is recursive rather than linear.

But the tree itself is static during rendering, so it can sit in read-only memory.  The complication is that this read-only memory has to be accessed by all the threads simultaneously; because it's guaranteed not to change during rendering, though, there are no cache coherency issues.  Each thread can have its own local data area and output buffer, so there are no interdependency issues either, and the outputs can be combined after all the threads complete.
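Here's a minimal sketch of that arrangement in C with POSIX threads; the scene structure and the trace() maths are placeholders of my own invention, it's just to show the sharing pattern:

Code: [Select]
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 4
#define WIDTH  320
#define HEIGHT 256

/* The scene tree is built once and is read-only while rendering,
   so every thread can walk it with no locks and no coherency traffic. */
typedef struct Node {
    float bounds[6];
    const struct Node *left, *right;
} Node;

typedef struct {
    const Node *scene;    /* shared, read-only        */
    int first_row, rows;  /* this thread's slice      */
    float *out;           /* this thread's own buffer */
} Job;

static float trace(const Node *node, int x, int y)
{
    (void)node; (void)x; (void)y;
    return 0.0f;          /* placeholder for the real traversal */
}

static void *worker(void *arg)
{
    Job *job = arg;
    for (int y = 0; y < job->rows; y++)
        for (int x = 0; x < WIDTH; x++)
            job->out[y * WIDTH + x] =
                trace(job->scene, x, job->first_row + y);
    return NULL;
}

int main(void)
{
    Node root = {{0}, NULL, NULL};
    pthread_t tid[NTHREADS];
    Job job[NTHREADS];
    int rows = HEIGHT / NTHREADS;

    for (int i = 0; i < NTHREADS; i++) {
        job[i] = (Job){ &root, i * rows, rows,
                        malloc(sizeof(float) * WIDTH * rows) };
        pthread_create(&tid[i], NULL, worker, &job[i]);
    }
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);  /* then combine the output buffers */
    return 0;
}

No locks anywhere, because the only shared data is never written during the render.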

I'm thinking of a cyclic arrangement something like this:

[CPU] --> [input data] --> [barrel coprocessors] --> [output buffers] --> [CPU]

which is kind of like a semi-streaming set-up, I guess.  The output can still be streamed even if the input cannot.

There could be some kind of "burst unit" that transfers the output back to the input for the next iteration, which would be useful for physics simulations.
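In software terms that burst unit is just a pointer swap between a pair of buffers.  A sketch, assuming some step() kernel standing in for whatever the coprocessors actually compute:

Code: [Select]
#include <stddef.h>

/* Ping-pong iteration: each pass reads one buffer and writes the
   other, then the two swap roles, so nothing is ever copied. */
void simulate(float *a, float *b, size_t n, int iterations,
              void (*step)(const float *in, float *out, size_t n))
{
    float *in = a, *out = b;
    for (int i = 0; i < iterations; i++) {
        step(in, out, n);          /* the coprocessors' job */
        float *tmp = in; in = out; out = tmp;
    }
    /* after an odd number of iterations the result is in b,
       after an even number it is back in a */
}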

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #16 on: February 01, 2012, 10:04:05 PM »
Quote from: Karlos;678648
Example: (youtube)kcP1NzB49zU

I did see that, and it's neat, but it seems to do some cruder form of rendering while you're moving the model about, and only does full ray tracing, after a short pause, once you leave it still.  The Intel Xeon demonstrations were full ray tracing in real time.  Granted, a Xeon setup like that would set you back a few grand...

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #17 on: February 01, 2012, 10:33:54 PM »
Quote from: Karlos;678666
Even the crude render is ray traced, it's just not iterated as far. I'm a GPU fan for a number of reasons. They are massively powerful, cheap (comparatively) and, to me, represent the logical evolution of what the Amiga's custom chips could have been.

No argument; I've always been a fan of offloading things to separate units so the CPU can twiddle its thumbs, I'm just not convinced that streaming processors are the ultimate solution.  Maybe there is some way to efficiently "stream" recursively...

You know, the Amiga's old blitter was this >< close to being able to do texture mapping.  If only you could use the B source DMA in line drawing mode...

Quote
Incidentally, if you want to see what you can do on a slightly more serious (remember, the garage demo was running on a gaming card) GPU in realtime, check what the quadro 6000 can do:
(youtube)QaKwLp77kjQ

Nice. Anyone got £4k to spare?

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #18 on: February 01, 2012, 10:47:56 PM »
Quote from: Karlos;678674
Did you see the thread recently where someone (sorry, I forgot the username) got sub-pixel correct line drawing and (slightly buggy) sub-pixel correct polygon rendering out of ECS? Damned impressive stuff.

I did not...

Quote
Still cheaper than a bucketful of high end Xeons ;)

Heh, maybe, but Xeons are over-spec anyway; as I've been saying, we don't need all that superscalar jazz if we can throw enough threads at the problem.

But this is even better...
grmanet.sogang.ac.kr/seminar/RPU.pdf

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #19 on: February 01, 2012, 11:24:35 PM »
Quote from: Karlos;678679
Actually, you've just brought the conversation full circle. The reason I brought up GPU in the first place was this.

In that case, it was full circle as soon as I started!  This is exactly why I brought up the UltraSPARC T1, a CPU designed specifically for large numbers of threads with no regard for single-thread performance.

But, I hasten to point out, I'm considering this for a co-processor, not the main processor.  I'm still thinking about how the memory system would work.  I reckon there must be some way for a streaming unit to send the root of a tree to each core, after which each core can choose its own path down the tree, so you get the best of both worlds.  Judging by the FPGA article above, efficient real-time ray tracing is still a long way off for both current GPUs and CPUs.  Both can do it, but both have to sweat.

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #20 on: February 01, 2012, 11:27:12 PM »
Quote from: Karlos;678682
Yep, that's the one. I read his entire set of blog articles on the subject in the end. It was quite informative. So much cool stuff was locked away in some of that old hardware, never to be really exploited by anybody. More's the pity.

Indeed.  One trick I've yet to try, but know must be possible, is to use HAM as a zero-cost polygon filler!

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #21 on: February 02, 2012, 06:56:54 PM »
Quote from: Richard42;678774
Amdahl's law doesn't have anything to do with memory bandwidth.  It's very simple, and trust me, there is no way around it.  Some algorithms, or parts of algorithms, cannot be parallelized; they are inherently serial.  CABAC in H.264 video compression is a good example.  Your overall software performance will be bounded by the execution time of the most complex piece of the algorithm which cannot be parallelized.  Once that piece of your algorithm is taking up 100% of an execution core, your software cannot go any faster, regardless of how many more CPUs you throw at it.

This is exactly right.  However, ray tracing is in the class of problems called "embarrassingly parallel": every pixel (indeed every ray) can be computed independently, so the serial fraction is essentially nil.
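For reference, Amdahl's law in its usual form, with $p$ the parallelisable fraction of the work and $n$ the number of cores:

$$S(n) = \frac{1}{(1 - p) + p/n}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}$$

With something like CABAC, $p$ is well below 1 and that limit bites hard; with ray tracing, $p$ is essentially 1 and the speedup stays close to $n$.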

Quote
I read a paper a few days ago claiming that it's better to have a few beefy cores than a bunch of wimpy cores.  This is exactly what I've also seen in my experience with high-performance computing, and the reason why serious people run HPC compute loads on badass x86 chips and not ARM or Atom.

Well, that rather depends on what you want to do with your computer!  If you know you are going to use a lot of threads, a bunch of wimpy cores is a good choice.

A GPU is exactly a "bunch of wimpy cores" designed to maximise data throughput.  So is an UltraSPARC T1.  They go about it in different ways, but each is the right solution to the right problem.

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #22 on: February 02, 2012, 07:58:20 PM »
Quote from: HenryCase;678804
Let me put this question to you: what stops serial processing tasks being shared amongst different cores?

Umm, it's the fact that every stage in the computation has a direct dependency on the previous stage.  Instruction-level parallelism might squeeze some extra performance out of each stage, but even that has its limits (more than four-way superscalar is typically more effort than it's worth).
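The textbook illustration (my example, not HenryCase's) is a loop-carried dependency:

Code: [Select]
#include <stdio.h>

static double f(double v) { return v * v + 0.25; }

int main(void)
{
    enum { N = 8 };
    double in[N], out[N];

    for (int i = 0; i < N; i++)
        in[i] = i * 0.1;

    /* Parallelisable: every iteration is independent, so the
       work can be split across any number of cores. */
    for (int i = 0; i < N; i++)
        out[i] = f(in[i]);

    /* Inherently serial: iteration i needs the result of
       iteration i-1, so extra cores cannot help. */
    double x = 0.5;
    for (int i = 0; i < N; i++)
        x = f(x);

    printf("%f %f\n", out[N - 1], x);
    return 0;
}

No clever scheduling rescues the second loop; its dependency chain is as long as the loop itself.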

There's an argument that this is often down to unimaginative formulations of the problem, but that's another story; a lot of existing software has been designed this way.

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #23 on: February 07, 2012, 09:48:43 AM »
I don't even know what "official" means in this context, and it certainly doesn't motivate me.  I don't much care whether something is "official" or not; I only care whether it is any good.

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #24 on: February 08, 2012, 05:39:37 PM »
Quote from: HenryCase;679788
I have a plan that would achieve exactly that. Issue now is getting the technical skills to implement it, but that's something I'm working on. The architecture isn't exactly like the Amiga, but things I picked up from the Amiga and discussions around it have partly inspired it.

What's your plan?  I might be able to help.

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #25 on: February 08, 2012, 10:10:03 PM »
One thing I have thought about FPGAs is that they're designed for "rapid prototyping", in other words with the idea that what you make on them will become a "proper" chip at some stage in the future.  This manifests in the scheme whereby you "flash" your design to the chip, into non-volatile storage of some sort.  But if FPGAs are going to be used as reconfigurable processors, they don't need to be non-volatile at all; the configuration could be held in ordinary RAM, so it can be rewritten much more quickly.  In fact they could be wired up in such a way that they can reconfigure themselves as they go along.

This has a lot of advantages.  If your program doesn't use some feature of a CPU, such as floating point, but does a lot of integer maths, the fabric could be reconfigured as extra integer cores when needed.  But perhaps we can go even further than an FPGA design here...

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #26 on: February 09, 2012, 12:22:49 PM »
Quote from: HenryCase;679892
Thank you for your ideas Mrs Beanbag.

With regards to FPGAs, you are correct that currently they are often used for rapid prototyping. However, they are occasionally used in commercial products,

I should have said rapid prototyping and low-volume production.  But in either case the problem being solved is, to put it simply, "make some hardware".  They are not yet thinking outside the box.  Once the design is put on the chip (by some external device or circuit), it is a constant, so that the chip behaves just like any other special-purpose chip.

Quote
Use of RAM for reconfigurable computers is an interesting idea, but I'm not quite sure how it would work. Could you explain more?

Basically, I'm suggesting that instead of the Look-Up Tables (LUTs) being initialised once by some external device, the device could configure and reconfigure itself on the fly.  Currently (as I understand it, at least) the device has to be turned off, loaded with a design, and turned on again.  Instead, it could load its own design through a DMA channel or suchlike as it goes along.

In fact the Amiga already has an FPGA in it: it's called the Blitter.  The blitter has three sources, which it can combine using any combinatorial logic supplied in the Minterms field of its control registers.  That Minterms field is exactly analogous to the LUT in an FPGA's logic cell.  The blitter applies the same LUT to every bit of a 16-bit word at once, and then sequentially to word after word, which it pulls in through DMA and writes out again.  Now imagine if it could pull the Minterms in through DMA as well; that would open up all kinds of possibilities.  And if you could connect lots of blitters together so they could pipe their outputs to each other, many things become possible.
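To make the analogy concrete, here's the blitter's minterm evaluation emulated in C (a sketch of the logic, not of the real register interface):

Code: [Select]
#include <stdint.h>

/* Each output bit of D is looked up from the corresponding bits of
   A, B and C using the 8-bit minterm mask lf, exactly like a
   3-input LUT in an FPGA logic cell. */
uint16_t blit_word(uint8_t lf, uint16_t a, uint16_t b, uint16_t c)
{
    uint16_t d = 0;
    for (int bit = 0; bit < 16; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     |  ((c >> bit) & 1u);
        d |= (uint16_t)(((lf >> idx) & 1u) << bit);
    }
    return d;
}

For example, lf = 0xCA gives the classic "cookie cut": D = (A AND B) OR (NOT A AND C).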

I've never seen anyone mention this, but you can configure the blitter to do arithmetic addition.  Presumably this is why the fill operation works in descending mode only: it is to propagate the carry bit.

So really, using FPGAs isn't so different from the Amiga after all!

Quote
Yes, I intended the CPU to be programmable in the way you suggest. When you say we could go further than an FPGA design here, what do you have in mind?

What I have in mind is a pipelined scheme where not only the data but the functionality as well can be passed down the pipeline!  Say, for instance, you want a processing unit that can perform several different kinds of pipelined operation.  Instead of having a separate pipeline for each operation, you pull the circuits themselves in from memory and pass the functionality down the one pipeline along with the data.
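In software terms the idea looks something like this (a loose sketch; in hardware the "function" would be LUT contents streamed in, not a pointer):

Code: [Select]
#include <stddef.h>

/* Each packet carries its own operation down the pipeline, so one
   physical pipeline can have many kinds of work in flight. */
typedef struct {
    int (*op)(int);   /* the "circuit" travelling with the data */
    int value;
} Packet;

static int dbl(int v) { return v * 2; }
static int neg(int v) { return -v; }

void run_pipeline(Packet *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        p[i].value = p[i].op(p[i].value);
}

/* usage: Packet work[] = { { dbl, 3 }, { neg, 7 } };
   run_pipeline(work, 2);   ->  6 and -7 */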

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #27 on: February 09, 2012, 11:14:43 PM »
Quote from: HenryCase;680011
@Mrs Beanbag

I quoted this particular text, as I think it shows I've confused you by using the term FPGA. The FPGAs that are possible with memristors do not need to follow the same restrictions that traditional FPGA designs have.

For the conversation to move forward, I think it is absolutely vital that you understand the benefits that memristors could bring.

OK, memristors are very exciting, I'll give you that.  I did watch the long video after I composed my previous reply, and it's got me thinking even more.  I need to get my head round the "imaginary current" you see in passive circuit analysis, because as much as it works, it makes very little physical sense.  I suspect there is something more complex going on behind it, but when I try to research it I hit a brick wall.  I suspect that SU(2) might come into play at some point, and then we can start inventing some really weird passives.  But I'm struggling to devise a Lorentz-invariant formulation of Ohm's law now...

But what I propose, completely reconfigurable FPGAs, is already possible with existing technology.  In fact, now I think about it, I reckon it would even be possible to do it in an FPGA.  META.

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #28 on: February 09, 2012, 11:19:09 PM »
Quote from: actung_bab;680020
I understand what you're saying on a very basic level, but what is the difference between an FPGA and the simple PIC controller chips I've seen my friend program?  I know those have very limited storage, but it seems you can do a lot with less, which is what you're saying.  Apart from the elegance of working this way, what are the practical benefits?  Lower power use?

A PIC is a normal, simple CPU with some flash ROM on-chip.  You just put a program there and it goes.  It is programmed in C or assembly language and runs sequentially, typically quite slowly.  In theory you could build one around an ARM core or an x86 core or whatever, but the ones you can actually buy are really simple and tiny.

An FPGA can be rewired electronically to become any kind of chip at all.  It doesn't run a program (unless you design a CPU core on it); it simply routes data around and performs any logical operation you want, in any combination or sequence.  So you can make it do specialist tasks, and do umpteen things at once; the only limitation is the number of logic cells.  (OK, that's not the only limitation, but it will do for explanation's sake.)

Offline Mrs Beanbag
Re: a golden age of Amiga
« Reply #29 on: February 10, 2012, 11:11:29 AM »
Quote from: HenryCase;680050
I'm sorry, what imaginary current are you referring to?

Well, we all know V = IR, right?  Ohm's law.  If you let R be a complex number (we write it as Z and call it "impedance" rather than resistance) you can model inductance and capacitance as well.  The voltage and current end up complex too, of course, which makes no physical sense, but it works nevertheless.  I don't know what the "impedance" of a memristor would be; I can only surmise that this simple "electronic theory hack" isn't quite up to the task of representing it.  The problem is that it's quite ad hoc: as far as I can tell it's not properly derived from fundamental laws, it's just made up and used because it works.
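Concretely, for a sinusoidal signal at angular frequency $\omega$:

$$Z_R = R, \qquad Z_L = j\omega L, \qquad Z_C = \frac{1}{j\omega C}, \qquad V = IZ$$

where $V$ and $I$ are complex phasors and the physical voltage is the real part of $V e^{j\omega t}$.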

Quote
The function of present day FPGA devices is fixed at boot time, how do you intend to get around this? You may be able to get around it using multiple fast booting FPGAs (see article here: http://electronicdesign.com/article/digital/fpgas-boot-in-a-flash15649.aspx ), is that what you intended?

You can configure the LUTs to act as register files.  The same LUTs that would normally hold your fixed design can still be written internally; they are read/write.  What's needed is a design scheme that lets you do this in a useful way.

Whether our FPGA is built on memristors or not, we still need such a design.  You can't just throw memristors at it and have it magically become reconfigurable on the fly.  It might reconfigure more quickly, but it will still have the same limitations; it's not the SRAM that's the problem.  There are strategies for partial reconfiguration, but they all assume the design is fed in from some outside source.

You can design a DMA controller in an FPGA, and if your design can access external memory, it can pump data in and populate its own LUTs.  I think a good analogy is something like Conway's Game of Life.  (Surprisingly, it is Turing complete!  In fact I'm amused by the idea that, given a big enough board, one could simulate John Conway himself.  But I digress.)  The external bootstrap circuit feeds in a small "agent" that has a DMA channel and a ruleset; the rest of the unconfigured cells are basically its playground, where it can wander about, pulling in design blocks through its DMA and writing them to the LUTs around it.  We'd need it to be able to grow and branch and create paths that packets can be sent along.