All I am stating regarding hardware standards is that they should be based on I/O ports and memory maps, as VGA/EGA/CGA were, rather than on API calls. That way you don't have to rely on any drivers or API calls, although those can also be present in a system.
>From the above statement, clearly not. I could program a "palette change" for my GPU that sets every colour register in parallel in a couple of shader clock cycles.
Now call an API to do that which works on the majority of PCs, and see how well it performs compared to an Amiga swapping two palette registers, or to a standard VGA swapping color registers. You can't just target your own machine, since we are talking about making it work in general; that's why I am talking about hardware compatibility to begin with.
1) Not just my machine: any CUDA 1.0+ capable machine. That's every G80 upwards. There are more of those installed in machines today than there are Amigas. Hence my code will run on more machines than yours.
2) It won't run on the rest. Well, there's a pity. Perhaps this is why API calls exist in the first place, eh?
>However, palette changes are a thing of the past for modern hardware. I haven't used an indexed colour mode for more than a few hours (usually when retrogaming) in almost 10 years, even on the Amiga.
That's subjective.
No it isn't, it's FACT. The only time I have used indexed colour modes is on a physical AGA machine when playing old games. The rest of the time I run 32-bit truecolour displays or, at worst, 16-bit high-colour ones.
But regardless, I was giving an example where I/O accesses are better than API calls, and Amiga I/O accesses aren't that slow.
Compared to the speed of modern graphics memory it is ACHINGLY SLOW. Why don't you time how many palette registers you can update in 1 second? Doing it from either the copper or the CPU, you'll hit the limit of the bus speed. However you decide to divide the workload between the copper and the CPU, you are going to be restricted by the write bandwidth, which is at best 7-8 MB/s for AGA-class hardware and even less for OCS/ECS.
That particular limitation is mitigated in current hardware, where memory bandwidth is in the GiB/s range. So, in the time it takes you to set just one palette register with an I/O instruction, a modern G200-class GPU could have set all of them dozens of times over, and that's before you even parallelise the code.
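To put rough numbers on that (assuming a 256-entry palette at 32 bits per entry, i.e. 1 KiB of writes): at 7 MB/s the whole block costs about 1024 / 7,000,000 s, roughly 150 microseconds of bus time, whereas at a conservative 10 GiB/s the same 1 KiB takes on the order of 0.1 microseconds. That's well over a thousandfold difference before any parallelism even enters the picture.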
Assuming you want to set 256 32-bit registers, you can do so by having each thread move a vector of four 32-bit integers to the target location. CUDA executes parallel threads in "warps" of 32 threads. In the absence of any conditional branching, every thread in a warp runs the same instruction concurrently as its neighbours.
So, you'd need a GPU kernel that you invoke as a pair of concurrent warps (giving 64 threads in total), where each thread sets a block of 4 registers using a vector of 4 ints, ensuring coalesced memory access.
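A minimal sketch of what such a kernel might look like, assuming the palette simply lives in an ordinary device buffer; setPalette, d_palette and d_src are illustrative names, not any real driver interface:

    #include <cuda_runtime.h>

    // Each of the 64 threads (two warps) writes one int4, i.e. four
    // 32-bit palette entries, as a single coalesced 16-byte store.
    __global__ void setPalette(int4 *palette, const int4 *src)
    {
        int i = threadIdx.x;      // 0..63
        palette[i] = src[i];
    }

    int main()
    {
        int4 *d_palette, *d_src;
        cudaMalloc(&d_palette, 256 * sizeof(int));  // 256 entries = 64 int4s
        cudaMalloc(&d_src, 256 * sizeof(int));
        // ... fill d_src with the new palette values ...
        setPalette<<<1, 64>>>(d_palette, d_src);    // one block of 64 threads = 2 warps
        cudaDeviceSynchronize();
        cudaFree(d_src);
        cudaFree(d_palette);
        return 0;
    }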
64 threads, each setting 4 registers, all running in parallel. On my GPU, which will happily run 24576 parallel threads at any one instant, that's only 64/24576 = 0.26% utilisation.
In theory, that's pretty damn fast, especially at 1.3 GHz (the shader clock of said GPU).
In practice, it would take much longer to set up than it does to execute. That dominates the timing, so much so that I'd almost guarantee that a basic API call on the CPU that just does 256 successive 32-bit writes to the same address space would be faster in "real" time and would work on any graphics card.
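For comparison, here is roughly what the "plain API call" alternative amounts to: one 1 KiB transfer and no custom kernel. Again a hedged sketch rather than how any particular driver actually implements palette updates; d_palette is just an illustrative device-side buffer:

    #include <cuda_runtime.h>

    int main()
    {
        unsigned int newPalette[256] = {0};   // host-side palette, filled elsewhere
        unsigned int *d_palette;              // illustrative device-side palette buffer
        cudaMalloc(&d_palette, sizeof(newPalette));
        // One ordinary API call performs all 256 successive 32-bit writes;
        // there is no kernel launch or per-thread setup to pay for.
        cudaMemcpy(d_palette, newPalette, sizeof(newPalette), cudaMemcpyHostToDevice);
        cudaFree(d_palette);
        return 0;
    }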
So, for all that I can actually set the registers faster by "hitting the metal", the end result is less portable and most likely slower than using the API, since the job at hand is so easy to do anyway that it ceases to benefit from HW acceleration.