Your name made me recall the first generation of next-gen consoles and how they relate directly to this discussion. The developers on the original PSX titles were coding games in C and getting nothing but porkish performance; for the first couple of generations, nothing came close to arcade-quality responsiveness and framerates.
The PlayStation series is an interesting study on this topic. I think each machine faced different problems and required something quite different to achieve high performance.
For PSX, indeed I'm really not aware of much ASM being used on projects. I don't think I ever used any in my area. I think this was for two reasons: 1) all the graphics and sound heavy lifting was done via dedicated hardware. The main CPU was really too slow to waste precious cycles on heavy lifting, and 2) there wasn't anything terribly special about the CPU that would let ASM produce vastly better performance than C (no vector unit, etc.). The interesting irony here is that the PSX (of all PlayStations) had the hardware setup (mediocre CPU with custom hardware to do the heavy lifting) closest to the Amiga's. In this case it's the old, slow platform that required no ASM to squeeze performance out of it - the opposite of what you might expect.
Now for PSX, I believe one of the things that really dragged down early game performance was the piss-poor Sony APIs we were forced to use. Not only did they not always make a lot of sense, their performance was atrocious in some cases (for no great reason, just brain-dead code / API design), with no legal way around it. In my game area, using their stuff robbed the R3000 of 10% of its total cycles across the whole game, and I'm sure that was pretty much true for almost any game that ever shipped. I got frustrated and bypassed the problem area - 10% of game cycles back. I did fear the trouble I might get in; they eventually found out what I was doing through performance analysis of our games, but just gave me the nudge nudge wink wink, as games performing that much better was not going to look bad for their brand. I'm sure other gameplay areas ran into similar issues and worked out improvements over time.
For PS2 I spent a crapload of time writing ASM. Gobs of heavy-lifting code written for both the main vector unit and the R3000. Luckily it supported inline ASM, so you only had to code the critical part of each function in ASM - it's really not bloody exciting, fun, or useful coding function call/entry code in ASM. At that point ASM was the only way to use the vector unit to full advantage, and it could provide a huge boost in performance. In the PS2 the R3000 also sat pretty much unused, so I used the crap out of it for my stuff, and I think coding in ASM did help in many cases. When the loop to get something done is just several cycles, it really helps to knock one or two cycles off. The R3000 was interesting in its instruction pairings, and I think the compiler wasn't daring enough to get as aggressive as it could have. That said, I also got a lot of performance out of trial and error with straight C code. With those older compilers it could be a lot of trial and error: how a loop is constructed, pointer increment vs. index increment, the magic number of times to unroll a loop, etc.
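To make that trial and error concrete, here's a minimal sketch in plain C (nothing PS2-specific; the function names, and the assumption that n is a multiple of 4, are mine) of the kind of variations we'd race against each other:

    #include <stddef.h>

    /* Straightforward indexed version - some compilers of that era
       recomputed the address of src[i] on every iteration. */
    float sum_indexed(const float *src, size_t n)
    {
        float acc = 0.0f;
        size_t i;
        for (i = 0; i < n; i++)
            acc += src[i];
        return acc;
    }

    /* Same job with a pointer increment, unrolled 4x, using four
       accumulators so the adds don't serialize on one register.
       Assumes n % 4 == 0 to keep the sketch short. */
    float sum_unrolled(const float *src, size_t n)
    {
        float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
        const float *end = src + n;
        while (src != end) {
            a0 += src[0];
            a1 += src[1];
            a2 += src[2];
            a3 += src[3];
            src += 4;
        }
        return (a0 + a1) + (a2 + a3);
    }

Which version won, and the magic unroll count, genuinely varied from compiler to compiler - which is why it was trial and error rather than a rule you could write down.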
The PS3 requires even more gobs of ASM to make it shine, as all the power is in the SPUs and there is less other hardware you can offload work to. You need ASM to take advantage of the vector processing in the SPUs. Actually, that's not fully true - unless you are insane, you use "intrinsics" instead to get at vector or other instructions that a compiler can't easily use on its own. Intrinsics are essentially ASM instructions, but they don't take registers as arguments; they take C variables, and the compiler does the work of register allocation. It's a beautiful compromise, as register allocation/tracking is always what drives me totally nuts about ASM programming when dealing with a more complex algorithm, and a good compiler is going to do a better job of it than you unless you REALLY want to spend a lot of time thinking about how to beat it. I did have to work with a code base that was programmed in a large amount of real SPU ASM, probably out of stubbornness, as I couldn't see a performance advantage to it - I really wanted to bitchslap the programmer, as it's brutally hard to understand and debug that much of someone else's ASM.
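For anyone who never saw SPU code, here's a hedged sketch of what the intrinsic style looked like - assuming the standard spu_intrinsics.h header from the Cell SDK, with the function itself made up for illustration:

    #include <spu_intrinsics.h>

    /* y[i] = a * x[i] + y[i], four floats per vector.
       spu_splats replicates the scalar across all four lanes, and
       spu_madd maps straight onto the SPU fused multiply-add. */
    void saxpy(float a, const vec_float4 *x, vec_float4 *y, int nvec)
    {
        vec_float4 va = spu_splats(a);
        int i;
        for (i = 0; i < nvec; i++)
            y[i] = spu_madd(va, x[i], y[i]);
    }

There isn't a register name in sight: you get the exact vector instructions you'd write by hand, but the compiler tracks which of the 128 SPU registers holds what - the part that makes a big hand-ASM code base so miserable to inherit.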
Now I haven't touched a bit of Amiga code in 25 years, but if I had to get something going, I think I'd be inclined to code in C/C++ as much as possible and only ASM-optimize the heavy-lifting tight loops where the compiler is sucking. I'd try to use the libraries provided, but if they became a problem, I'd bypass them. Just saying I'm not sure there's one 100% right or wrong way to do anything - it will depend - though I'd certainly avoid 100% ASM coding just for the sake of it. I'm too old for that crap now.