Amdahl's law doesn't have anything to do with memory bandwidth. It's very simple, and trust me, there is no way around it. Some algorithms, or parts of algorithms, cannot be parallelized; they are inherently serial.
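For reference, the law itself is one formula: if a fraction $p$ of a program's work can be parallelised across $n$ cores, the best possible speedup is

\[ S(n) = \frac{1}{(1 - p) + p/n}, \]

so even with unlimited cores the speedup can never exceed $1/(1-p)$. A program that is 95% parallelisable is capped at a 20x speedup no matter how much hardware you throw at it.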
Let me put this question to you: what stops serial processing tasks from being shared amongst different cores?
The issue with parallelising serial tasks is not access to more cores; out-of-order execution shows it's possible to streamline processing based on the computing resources available. What does hold things back is that the memory holding the data being worked on is not divided up between those cores. That is why I said that commonly employed memory architectures are the bottleneck. If it helps, think of it like this: what we have now is multiple cores working on a single data set. Now think about a network of computers working on a problem together. A key part of making this efficient is ensuring they block each other as little as possible. Now consider that it's possible to build a 'network' of computers within a single computing device, so long as each has control of its own memory. Hopefully you can see where this is going; a rough share-nothing sketch is below.
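To make the share-nothing idea concrete, here is a minimal sketch in Haskell (assuming GHC's threaded runtime; compile with -threaded and run with +RTS -N). The slice sizes and the summing workload are made up for illustration; the point is that each worker owns its slice of the data outright and communicates only through a channel, never through shared mutable memory.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Monad (forM_, replicateM)

-- Each worker owns its slice outright and only ever communicates
-- by sending a message back, like a node on a network.
worker :: Chan Integer -> [Integer] -> IO ()
worker results slice = writeChan results (sum slice)

main :: IO ()
main = do
  let slices = [[1..1000], [1001..2000], [2001..3000], [3001..4000]]
  results <- newChan
  forM_ slices (forkIO . worker results)  -- one 'node' per slice
  partials <- replicateM (length slices) (readChan results)
  print (sum partials)                    -- combine the partial answers
```

The workers never contend for the same memory; the only synchronisation point is the channel they report back on.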
If not, this page should give a few more clues:
http://www.eetimes.com/design/eda-design/4211228/Overcoming-32-28-nm-IC-implementation-challenges

Amdahl's Law applies only to a certain set of programs. Yes, there are parts of algorithms that must be executed in a certain order. However, there are many ways to write code that lends itself to parallel execution. Here's one example of an article that discusses ways to beat Amdahl's law:
http://drdobbs.com/cpp/205900309?pgno=1

Generally speaking, one of the key things when designing highly parallelised programs is avoiding the need to manipulate state. For example, the programming language Haskell is 'pure' by design, in the sense that running a program does not alter its state; the elements of a program that do require changing state, and the side effects that come with it, are sandboxed in structures called monads. This allows Haskell programs to take advantage of multi-core CPUs without needing to worry about the order in which things are evaluated.
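As a minimal sketch of what that buys you (assuming the parallel package and GHC's threaded runtime), a pure function can be mapped over a list in parallel without any locks, because no call can interfere with another. The 'expensive' function here is just a stand-in workload:

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)

-- A pure function: no side effects, so calls may run in any order,
-- on any core, without changing the result.
expensive :: Integer -> Integer
expensive n = sum [1 .. n]

main :: IO ()
main = print (sum (parMap rdeepseq expensive [100000, 200000 .. 1000000]))
```

Compiled with -threaded and run with +RTS -N, the sparks created by parMap are spread across the available cores; nothing about the program's logic has to change.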
If this is new to you, it's worth exploring what is meant by side effects. Imagine that every time you asked a certain question you got the same answer. Having such a question in a program is an example of something without side effects. Next, imagine the opposite: with a question that has side effects, the answer is partly determined by when you ask it. By removing side effects, it no longer matters when you ask the question, so it can be asked on any core at any time.
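In Haskell terms (a small illustrative sketch; the function names are made up), the two kinds of question look like this. The pure one has an ordinary type, while the one with side effects is tagged with IO, so the compiler knows its answer depends on when it is asked:

```haskell
import Data.Time.Clock (getCurrentTime)

-- No side effects: the same question always gets the same answer,
-- so it doesn't matter when, or on which core, you ask it.
area :: Double -> Double
area r = pi * r * r

-- A side effect: the answer depends on when you ask, so the type
-- is tagged with IO and kept out of pure code.
timeNow :: IO String
timeNow = fmap show getCurrentTime

main :: IO ()
main = do
  print (area 2.0)      -- always 12.566370614359172
  timeNow >>= putStrLn  -- different on every run
```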