
Author Topic: a golden age of Amiga  (Read 27876 times)


Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« on: February 01, 2012, 01:34:33 PM »
@Mrs Beanbag
Quote from: Mrs Beanbag;678487
Forget GPUs, massive parallelism is the way to go.


I do agree that FPGAs represent a big opportunity to make computing architectures far more flexible, but the line I quoted above doesn't make sense. The very reason GPGPU is a growing field is the massively parallel nature of modern GPUs. GPU computing and FPGA computing are not identical, but they are clearly related.

Quote from: Mrs Beanbag;678487

We could call them Juggler Chips!


I've got good news for you; your Juggler chips already exist:
http://www.eetimes.com/electronics-products/processors/4115523/Xilinx-puts-ARM-core-into-its-FPGAs
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #1 on: February 01, 2012, 07:00:42 PM »
Quote from: Mrs Beanbag;678578
These are not barrel processors.  They are FPGAs with an ARM core attached.  Which is also cool and useful, but not "Juggler chip" as described above.  "Juggler chip" is similar design strategy to UltraSPARC T1 but with ARM instruction set.

I guess I misread what you meant. Perhaps it would be best to outline in more detail what design you had in mind for the 'juggler chip'; I'm interested to hear your thoughts.

In the meantime, here's another couple of links about massively parallel chips that you may be interested in following up on:
http://www.greenarraychips.com/
http://www.tilera.com/
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #2 on: February 01, 2012, 09:12:13 PM »
Quote from: Mrs Beanbag;678634
UltraSPARC T1 took a more holistic approach.  Knowing servers always run umpteen threads at once, there's really no point in all that extra complexity to get the most single threaded performance.  So they ditched it all and instead made a CPU core that could switch threads on every cycle.  They only have to have a register file for each thread and rotate them round (hence the term "barrel processor"), and you can get rid of a whole load of complexity and go back to a very simple core that only does one instruction at once, which gives you room for loads more cores on a die, and cache misses can be made to vanish into the background.  Single-thread performance is terrible, but if you can throw enough threads at it it can keep up with CPUs that run at far faster clock speeds.  The T1 typically ran at 1.2GHz and, given the right sort of workloads, could keep pace with 3GHz Xeons.

I see, thank you for the information. That design makes sense for server chips, where you have a high number of unrelated threads, but expanding this type of architecture to be more generally useful does require looking at memory management. The main issue with parallel computing is managing memory; having the processing power to deal with multiple threads is easy in comparison.

Of course, it depends on what you're looking for. If you just want a ray trace accelerator then these issues are not so pressing. If you'd like to explore the memory issues more (and solutions to them), I can highly recommend this video on concurrency in Clojure, which offers the best solution I've found to date for making the programming of parallel systems (comparatively) easy to manage:
http://blip.tv/clojure/clojure-concurrency-819147
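For anyone who'd rather skim code than watch the whole video, here's a minimal sketch of the same lockless style using Haskell's STM, which is a rough analogue of Clojure's refs (the account names and figures are made up purely for illustration):

```haskell
import Control.Concurrent.STM

-- Two shared "accounts" updated with no explicit locks: the transaction
-- commits atomically or is retried, so no thread ever sees a half-done
-- transfer. (Illustrative sketch only.)
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amount = do
  a <- readTVar from
  b <- readTVar to
  writeTVar from (a - amount)
  writeTVar to   (b + amount)

main :: IO ()
main = do
  alice <- newTVarIO 100
  bob   <- newTVarIO 0
  atomically (transfer alice bob 30)
  balances <- atomically ((,) <$> readTVar alice <*> readTVar bob)
  print balances   -- (70,30)
```

The point is the same one the video makes: coordination is handled by the runtime, not by the programmer juggling locks.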

Quote from: Mrs Beanbag;678644
Also the kernel doesn't have random access into a large area of memory

How much memory do you anticipate being adequate? Graphics card memory is fairly large these days; you can get cards with 1GB on the card itself, for example. Plus, the PCI-E bus these cards are plugged into isn't exactly sluggish.
« Last Edit: February 01, 2012, 09:15:18 PM by HenryCase »
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #3 on: February 01, 2012, 11:02:18 PM »
Quote from: Karlos;678674
Did you see the thread recently where someone (sorry, I forgot the username) got sub-pixel correct line drawing and (slightly buggy) sub-pixel correct polygon rendering out of ECS? Damned impressive stuff.


Karlos, is this thread referring to the same demonstration?
http://www.natami.net/knowledge.php?b=6&note=43776

EDIT: Ah, I see you found what you were looking for.
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #4 on: February 01, 2012, 11:50:30 PM »
Quote from: Karlos;678679

For multiple CPU approaches to massive threading, though there are other complications. Amdahl's law, for one :-/


There are ways around Amdahl's Law. The main cause of it, AFAIR, is the memory architecture of traditional computer systems (the bottleneck is the memory bus of designs like the Von Neumann architecture).

However, this isn't necessarily a property of the CPUs themselves. In systems that have moved away from traditional memory routing setups, Amdahl's Law can be defeated. I originally looked into this after reading an article (or maybe watching a video) about the XMOS chip, which apparently scales really well as new cores are added, without hitting the same memory bottleneck. I can't find the article where this was mentioned, but if I do I'll link to it here.

In my opinion, there is a need to make managing memory as parallel an operation as can be achieved. Lockless concurrency is definitely possible at the software level (as shown in the Clojure video I linked to earlier), and it'll be interesting to see how memory management evolves at the hardware level.

For those reading who don't know Amdahl's Law, it basically says there are limits to the efficiency of multi-core systems, and past a certain point adding more cores does little for, and can even hurt, overall efficiency. This tipping point varies depending on the architecture; for example, for modern x86 CPUs it is said that there is not much benefit beyond 4 cores (in this case it is CPU dependent, as the Northbridge has now moved onto the CPU die).
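The law itself is just a one-line formula, so here it is as a small Haskell function (a sketch; the figures in the comment are only there to show the shape of the curve):

```haskell
-- Amdahl's Law: speedup from n cores when a fraction p of the work
-- can be parallelised (0 <= p <= 1).
amdahl :: Double -> Double -> Double
amdahl p n = 1 / ((1 - p) + p / n)

-- e.g. amdahl 0.9 4 ~= 3.08, amdahl 0.9 64 ~= 8.77, and no matter how many
-- cores you add the speedup can never pass 1 / (1 - 0.9) = 10.
```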

Quote from: Mrs Beanbag;678688
Indeed.  One trick I've still yet to try, but I know must be possible, is to use HAM as a zero-cost polygon filler!


Sounds like a good idea!
« Last Edit: February 01, 2012, 11:53:14 PM by HenryCase »
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #5 on: February 02, 2012, 07:34:32 PM »
Quote from: Richard42;678774
Amdahl's law doesn't have anything to do with memory bandwidth.  It's very simple, and trust me, there is no way around it.  Some algorithms, or parts of algorithms, cannot be parallelized; they are inherently serial.


Let me put this question to you: what stops serial processing tasks being shared amongst different cores?

The issue with parallelising serial tasks is not access to more cores; out-of-order execution shows it's possible to streamline processing based on the computing resources available. What holds things back is that the memory holding the data being worked on is not shared out. That is why I stated that commonly employed memory architectures are the bottleneck. If it helps, think about it like this. What we have now is multiple cores working on a single data set. Now think about a network of computers working on a problem together: a key part of making this efficient is ensuring they block each other as little as possible. Now consider that it's possible to build a 'network' of computers within a single computing device, so long as each has control of its own memory. Hopefully you can see where this is going; if not, this page should give a few more clues:
http://www.eetimes.com/design/eda-design/4211228/Overcoming-32-28-nm-IC-implementation-challenges
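If a code sketch makes the 'network inside one machine' idea clearer, here's a toy version in Haskell where each thread owns its slice of the data and the only communication is a message back with its result (the names and numbers are made up for illustration):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan
import Control.Monad (forM_, replicateM)

-- Each worker owns its slice of the data and never touches anyone else's
-- memory; results come back over a channel, so there is nothing to lock.
worker :: Chan Int -> [Int] -> IO ()
worker results slice = writeChan results (sum slice)

main :: IO ()
main = do
  results <- newChan
  let slices = [[1..250], [251..500], [501..750], [751..1000]]
  forM_ slices (forkIO . worker results)
  partials <- replicateM (length slices) (readChan results)
  print (sum partials)   -- 500500, same as summing 1..1000 serially
```

Nothing here ever waits on a lock, because no two workers touch the same memory.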

Amdahl's Law applies only to a certain set of programs. Yes, there are parts of algorithms that must be executed in a certain order. However, there are many ways to write code that lends itself to parallel execution. Here's one example of an article that discusses ways to beat Amdahl's law:
http://drdobbs.com/cpp/205900309?pgno=1

Generally speaking, one of the key things when designing highly parallelised programs is avoiding the need to manipulate state. For example, the programming language Haskell is 'pure' by design, in the sense that ordinary functions don't alter the state of the program while it runs; the parts of the program that do require changing state, along with the side effects that come from this, are sandboxed in structures called monads. This allows Haskell programs to take advantage of multi-core CPUs without the programmer needing to worry so much about the order of execution.

If this is new to you, it's worth explaining what is meant by side effects. Imagine that every time you asked a certain question you got the same answer; such a question is an example of something without side effects. Now imagine the opposite: with a question that has side effects, the answer is partly determined by when you ask it. By removing side effects, it no longer matters when you ask the question.
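Since Haskell came up, here's the smallest code version of those two kinds of question I can manage (just a sketch; the function names are my own):

```haskell
import Data.Time.Clock (getCurrentTime)

-- No side effects: the same question always gets the same answer, so calls
-- can be reordered or run in parallel freely.
area :: Double -> Double
area r = pi * r * r

-- Side effects: the answer depends on *when* you ask, so the type is IO and
-- the language forces you to be explicit about ordering.
timestamp :: IO String
timestamp = fmap show getCurrentTime
```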
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #6 on: February 02, 2012, 09:33:13 PM »
Quote from: Mrs Beanbag;678808
A lot of existing software has been designed this way.


You won't find me disagreeing with that. Of course it's possible to design software that doesn't scale well to multiple CPU cores.

What is being suggested by Amdahl's Law is that there's a limit beyond which you can't improve performance by adding a new processing core, even if you code from scratch. Looking at it another way, if you're trying to execute an algorithm, you'll only be as fast as the slowest non-divisible element in the algorithm. This is true, but what I feel is being overlooked is that it's possible to make smaller non-divisible elements in a program by altering the architecture that the program runs on. It's about maximising performance, and from what I've seen we've got plenty of room to optimise both the hardware and the software of parallel systems.
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #7 on: February 03, 2012, 09:01:44 PM »
@yssing
FWIW, the people claiming that Amiga OS4.x somehow needs to share the 'official' title are just being pedantic IMO, and I say this as an AROS fan.
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #8 on: February 08, 2012, 09:24:46 AM »
Quote from: Digiman;679740
What I meant about Jay Miner etc was a new generation of designers who designed a machine architecture radically different and superior to current desktop PC or Mac.


I have a plan that would achieve exactly that. The issue now is getting the technical skills to implement it, but that's something I'm working on. The architecture isn't exactly like the Amiga's, but things I picked up from the Amiga, and from discussions around it, have partly inspired it.
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #9 on: February 08, 2012, 08:11:06 PM »
@Mrs Beanbag, @Tripitaka
Thanks for your interest. My plan is a mishmash of ideas from all over the place, and I've not tried to summarise them succinctly before, but I'll try.

The plan is based around two central themes: FPGA computing and an OS that has a similar structure throughout its construction. As hinted at before, I do have Amiga-inspired ideas for this system too, but it's best to explain the core structure first.

With regards to FPGA computing, FPGAs are already improving quickly in both power and cost, but I believe we're on the verge of seeing a real game changer emerge: FPGAs built on memristor technology. There are a few reasons for this. Besides being fast storage devices, memristors can also be used for computation. This video is what inspired me with this particular idea; if you wish, skip to 29:10 to hear the bit about the memristor-based FPGA:
http://www.youtube.com/watch?v=bKGhvKyjgLY

The implication logic section starting at around 38:37 is also worth pointing out.

Next, the OS. The main source of inspiration behind the OS came from watching this video by Ian Piumarta, who currently works for VPRI:
http://www.youtube.com/watch?v=cn7kTPbW6QQ

The video describes a programming language being developed for VPRI that basically combines the functional programming of Lisp with the object-oriented nature of Smalltalk, which appeared to be a powerful combination for approaching programming tasks (FWIW, I realise that there are Lisps with object systems in place already, but these previous object systems were afterthoughts rather than at the core of the language).

What particularly struck me was his description of the way VPRI was trying to build an OS using this language, one which would be compact (the whole OS in approximately 20,000 lines of code) as well as structurally similar at every level (in other words, if you understand the code for the high level apps, you'd also understand the code at the lower levels of the OS).

What's so good about this? There are a few things. One of the advantages touched on in the video is that this same language can be used to define the hardware. So, if you run the OS on an FPGA, learning one language doesn't just give you the ability to create programs; it also allows you to define hardware accelerators for those programs.

To further illustrate what's possible with this approach, it's worth knowing about the Bluespec language. A simple explanation of what's possible in Bluespec is that when you evaluate Bluespec code, what is produced is both an accelerator design and the code that makes use of this accelerator. Essentially, it tries to create an optimal solution utilising both hardware and software:
http://www.gizmag.com/bluespec-code-circuit-system/20827/
I'd attempt the same thing with this system I'm proposing.

Another advantage that comes with memristor-based FPGAs is the re-programmability. One thing you're taught early when learning Lisp is to view code and data as different ways of interpreting the same object, in that code can be data and data can be code. Think about what we have with a memristor-based FPGA: we have a device that can store programs, and we have a device that can perform computation. What if you made these areas of the chip interchangeable? In other words, a section of the chip could be storage one minute and a logic circuit the next. What this gives you is on-the-fly re-programmability of FPGAs: you program a new accelerator as 'data', then you flip the switch and it's now an accelerator.
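To make that code-is-data point concrete, here's a toy Haskell sketch of a chip region that can be read as plain storage one moment and as a small lookup-table circuit the next (entirely hypothetical; nothing here describes a real device):

```haskell
-- Toy model: the same bits can be viewed as stored data or as the truth
-- table of a small logic function. "Flipping the switch" is just a change
-- of interpretation, which is what on-the-fly reconfiguration amounts to.
data Block
  = Storage [Bool]          -- the region holding bits as plain data
  | Logic ([Bool] -> Bool)  -- the same region acting as a circuit

asLogic :: [Bool] -> Block
asLogic table = Logic (\inputs -> table !! toIndex inputs)
  where toIndex = foldl (\n b -> 2 * n + fromEnum b) 0

-- e.g. asLogic [False, False, False, True] behaves as a 2-input AND gate
```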

As I mentioned before, I have other ideas for this system, and if you'd like information about the Amiga-inspired parts I'd be happy to share them. So to summarise, the basic plan is for flexible FPGA computing + simple OS + architecture that scales from high level to low level. In a way this simple OS + hardware accelerators approach is a spiritual successor to the Amiga anyway, just taken to the next evolutionary level.

Why is this better than what's out there? Other than the simplicity of the system, it has a lot of potential to speed up code execution. Not only would you have as many specialised accelerators as you could fit on your hardware, you'd also be concentrating the processing on a single chip (apart from RAM; there are some engineering challenges to overcome before memristors replace RAM), which should remove a lot of processing bottlenecks.

So what do you think? :-)

@Fats
Quote
All I wanted to say is that a.org is not the right place to look for that

Are you sure? ;-)
« Last Edit: February 08, 2012, 09:10:37 PM by HenryCase »
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #10 on: February 08, 2012, 10:11:40 PM »
Quote from: amigadave;679885
@HenryCase,

Is this something you are really working on, or just thinking about?


Working on in the sense of educating myself, but not working on in the sense of building it yet. Do you have any feedback?
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #11 on: February 08, 2012, 10:40:16 PM »
Quote from: Mrs Beanbag;679886
One thing that I have thought about FPGAs is that they're designed for "rapid prototyping" or in otherwords with the idea in mind that they are something you make with the intention of making a "proper" chip at some stage in the future... how this manifests is in the scheme where you "flash" your design to the chip, into non-volatile storage of some sort.  But if these are going to be used as reconfigurable processors, they don't need to be non-volatile at all and could be made of ordinary RAM so that they can be reconfigured much more quickly.  In fact they could be wired up in such a way that they can reconfigure themselves as they go along.


Thank you for your ideas Mrs Beanbag.

With regards to FPGAs, you are correct that currently they are often used for rapid prototyping. However, they are occasionally used in commercial products, and again I must stress that memristors bring a number of advantages to typical FPGAs that will allow their performance to get much closer to that of fixed-function devices. Allow me to explain...

The simple explanation of current FPGAs is that they are organised into blocks called logic elements. You can see a diagram of one here:
http://www.eecg.toronto.edu/~pc/research/fpga/des/Logic_Element/logic_element.html

Through configuring these logic elements you change the function of FPGAs to fit your needs. However, the architecture is sub-optimal compared to ASICs as the extra bulk required to manage the re-programmability limits their logic density (in other words, ASICs allow circuitry to be simplified).

However, using memristors in FPGAs has the potential to make FPGAs a lot more efficient. There are a few ways to build FPGAs using memristors, but let's first focus on the first way described in the video I shared. The simple way to think about this is that you have two layers in the chip; on one layer you have transistors, on the other layer you have memristors. Memristors act as the wiring between the transistors. They are ideally suited to this. Memristors are electrically controlled variable resisitors that remember their resistance. Increase the resistance high enough and you have a 'blocked' connection. Reduce the resistance and you have an 'open' connection. The circuitry to manage the memristor wiring would be minimal, and the simpler connections would allow more efficient designs to be implemented on FPGAs.

As suggested, this is just one way to implement FPGAs utilising memristors. The other way hinted at in the video was to use implication logic. What has been discovered is that memristors can do logic by themselves; you don't need transistors at all, but you do need to use a different form of logic. My knowledge of implication logic is sketchy at the moment, but from what was suggested in the video, the preliminary work done at HP indicates that when compiling C code you end up with smaller binaries using implication logic. It's an area I need to research more, but I should learn to walk before I can run! ;-)
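For anyone curious, the logical core of it is simple enough to write down; this is just the textbook construction, not anything HP-specific:

```haskell
-- Material implication, the primitive a memristor pair can compute in this
-- scheme: "p `imp` q" is false only when p is true and q is false.
imp :: Bool -> Bool -> Bool
imp p q = not p || q

-- IMPLY together with a constant False is functionally complete, e.g.:
notGate :: Bool -> Bool
notGate p = imp p False

nandGate :: Bool -> Bool -> Bool
nandGate p q = imp p (imp q False)   -- NAND, from which any logic can be built
```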

Use of RAM for reconfigurable computers is an interesting idea, but I'm not quite sure how it would work. Could you explain more?

Quote from: Mrs Beanbag;679886

This has a lot of advantages.  If your program doesn't use some feature of a CPU, such as floating point, but does a lot of integer maths, it could reconfigure it all as integer cores when needed.  But perhaps we can go even further than an FPGA design here...


Yes, I intended the CPU to be programmable in the way you suggest. When you say we could go further than an FPGA design here, what do you have in mind?
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #12 on: February 08, 2012, 11:19:23 PM »
@Tripitaka
Thank you for your words of support, I'm glad you see the benefits of blurring the boundaries between hardware and software.

Regarding the price, I don't have much control over this, but FPGAs are clearly improving fast when it comes to their price/performance ratio, and concentrating the processing and storage in a single chip should help decrease costs. To be honest, I say single chip, but there's no reason you have to limit this design to one chip; it would scale well to multiple chips too, enhancing upgradability no end. Want a bit more processing power? Stick in another memristor FPGA. You keep the processing power you had before, now almost seamlessly enhanced by the capacity you just added.

I hope you will feel free to post any more ideas and feedback you have, and I'd be interested to hear from your software engineer friend too.

Thanks again. :-)
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #13 on: February 08, 2012, 11:24:47 PM »
Quote from: bloodline;679897
The big question is "what problem does it solve"... In engineering almost anything is possible, but few things are actually useful :-/


What problem do you want it to solve? Computers are programmable devices; you can turn them to anything that can be symbolically represented. Altering the architecture in the way being described allows increased efficiency, enhanced extensibility (at both the hardware and software level), easier maintainability, and even potentially lower power draw and lower cost. What's not to like! ;-)
"OS5 is so fast that only Chuck Norris can use it." AeroMan
 

Offline HenryCase

  • Hero Member
  • *****
  • Join Date: Oct 2007
  • Posts: 800
    • Show all replies
Re: a golden age of Amiga
« Reply #14 on: February 09, 2012, 09:53:13 PM »
@Tripitaka
Quote from: Tripitaka;679908
We thought it was a terrible waste as the hardware encoder was now redundant and had cost a whopping amount of cash. Anyway, we got into talking about what a shame it was that we couldn't re-write the damn chip to do something more useful.


I agree, it's a shame when dedicated hardware no longer adds value to a system. This is something reconfigurable computing aims to do away with.

@bloodline
Quote
It's really fun to think about the technology, and what you could put together in an interesting way... But really when building a product, you need to find a core, basic need that is unforfilled and then try and meet that need in the simplest cheapest way...


Make no mistake about it, I'm not going for a single niche here, I'm going for general purpose computing. Let's go through the benefits I listed before:

Increased efficiency - Important where performance is key. Whatever tasks need high performance computing could benefit. If you want a single example, think of physics simulations.

Enhanced extensibility (both at the hardware and software level) - Important where you have tasks with specific niche requirements. This matters in markets that are just beginning, or that are too small for large investment. For an example of a market where extensibility could be a benefit, think of home automation.

Easier maintainability - This is just common sense. By keeping the core operating system compact and easier to understand, you increase the number of people that are adept at reasoning about its function. The more accurately that people can build a mental model of how something works, the more adept they will be at using, fixing and enhancing it. Benefits will be felt in all industries where it is used.

(Potentially) lower power draw - The obvious answer is that this is important for mobile devices, but it matters for much more than that; power draw in servers, for example, is also a big concern. Memristor-based FPGAs have a number of ways to reduce power draw. With modern CPUs you hear people talking about 'dark silicon': sections of the processor that sit unused because it's getting increasingly hard to power the whole chip all of the time. We can use this to our advantage; by switching off the unused chip real estate through memristor switches, you can optimise your device for low power draw. Then, when more power is available, you switch the extra circuitry on again.

(Potentially) lower cost - If I really have to explain why this will help this succeed, I really don't know what to say!

With all that said, there are markets I think will adopt this earlier than others. The obvious place to look is where FPGAs are already in use, i.e. embedded systems. Embedded systems are specialised enough that you don't need a large, expansive software ecosystem to build what you need. Plus, the engineers building embedded systems are more attuned to choosing hardware on its actual merits rather than from any emotional ties.

As the software ecosystem develops, the system will become attractive to more and more markets. However, I'm not going to waste my time imagining what those markets will be; I am confident that people will see the benefits, the system just needs to be built so those benefits can be realised. Hope that answers your question.

@Mrs Beanbag
Quote
Currently (as I understand it at least) the device has to be turned off, loaded with a design, and then turned on again.


I quoted this particular text, as I think it shows I've confused you by using the term FPGA. The FPGAs that are possible with memristors do not need to follow the same restrictions that traditional FPGA designs have.

For the conversation to move forward, I think it is absolutely vital that you understand the benefits that memristors could bring.

Firstly, they can make FPGAs reprogrammable on the fly.

Secondly, they can make FPGAs more efficient.

Thirdly, memristors could potentially replace RAM, as well as long term storage, and processing. Essentially you can get all three in one chip.

Now, I hinted before that there are some issues with memristors replacing RAM at the moment, but considering how new memristors are, I anticipate memristors will be used in RAM in the future, once these challenges are overcome.

If you'd like a shorter introduction than the memristor video I posted before, please watch this one; it's only 6 minutes long, and should help you understand how memristors can change the structure of FPGAs:
http://www.youtube.com/watch?v=rvA5r4LtVnc

@all
Happy to field any more questions. Also happy to hear more feedback, positive or negative. Thanks.
« Last Edit: February 09, 2012, 10:04:38 PM by HenryCase »
"OS5 is so fast that only Chuck Norris can use it." AeroMan