Author Topic: newb questions, hit the hardware or not?

Offline SamuraiCrow

  • Hero Member
  • Join Date: Feb 2002
  • Posts: 2280
  • Country: us
  • Gender: Male
Re: newb questions, hit the hardware or not?
« on: July 14, 2014, 05:55:52 PM »
@Thomas Richter

I've researched the OS functions extensively with the aim of encapsulating the most efficient special effects of the Amiga chipsets in shared libraries.  That's not only to make things easier for newbies on the Amiga, but also to be able to reroute some of those effects to emulations on graphics cards for better compatibility.

BltBitMap only works on the Blitter.  What about MrgCop, the rather incomplete interface to the Copper?  CINIT, CBUMP, CWAIT, and CMOVE are hardly enough macros to suffice!  The Copper can behave like a display list and queue the Blitter much more efficiently than QBlit can using the CPU and an interrupt.  I'd also like to be able to merge partial Copper lists based on different starting raster positions so that multiple effects can be combined.  (The two usages need not be concurrent, since waiting on the Blitter and on a raster position in the same CWAIT tends to result in race conditions.)  And neither the CNEXTBUFFER macro nor its underlying subroutine to skip to another buffer of Copper instructions was ever implemented, even though infrastructure hinting at its existence was added to graphics/copper.h.
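For reference, this is roughly as far as the existing macros get you — a minimal sketch in C of a user Copper list that changes the background color partway down an Intuition screen's ViewPort (the wait line and color value are just illustrative):
Code: [Select]
#include <exec/memory.h>
#include <graphics/gfxmacros.h>
#include <graphics/copper.h>
#include <graphics/view.h>
#include <hardware/custom.h>
#include <proto/exec.h>
#include <proto/graphics.h>
#include <proto/intuition.h>

extern struct Custom custom;

/* Attach a tiny user Copper list to an Intuition screen's ViewPort. */
void AddBackgroundSplit(struct ViewPort *vp)
{
    struct UCopList *ucl = AllocMem(sizeof(struct UCopList), MEMF_PUBLIC | MEMF_CLEAR);
    if (!ucl) return;

    CINIT(ucl, 4);                        /* room for a few Copper instructions */
    CWAIT(ucl, 100, 0);                   /* wait for raster line 100           */
    CMOVE(ucl, custom.color[0], 0x0F00);  /* then turn the background red       */
    CEND(ucl);                            /* terminate the list                 */

    vp->UCopIns = ucl;                    /* graphics.library frees it with the ViewPort */
    RethinkDisplay();                     /* intuition.library; on a raw View you would
                                             call MakeVPort()/MrgCop()/LoadView() yourself */
}

That's a palette split and not much else; anything fancier (queuing the Blitter, merging lists from different raster positions) is on you.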
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #1 on: July 15, 2014, 11:26:39 AM »
The main advantage of using the OS functions for blitting is that they work asynchronously, so the CPU can multitask while the Blitter is still plotting graphics.
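For instance, a copy through the OS looks roughly like this (offsets, sizes, and the function name are just placeholders) — BltBitMap() returns once the Blitter is started, and you only synchronize when the CPU actually needs the result:
Code: [Select]
#include <graphics/gfx.h>
#include <proto/graphics.h>

/* Copy a 32x32 block from srcBM to dstBM through the OS. */
void CopyBlock(struct BitMap *srcBM, struct BitMap *dstBM)
{
    BltBitMap(srcBM, 0, 0,        /* source bitmap and offset          */
              dstBM, 16, 16,      /* destination bitmap and offset     */
              32, 32,             /* width and height in pixels        */
              0xC0,               /* minterm: straight copy of source  */
              0xFF,               /* operate on all bitplanes          */
              NULL);              /* no temp buffer needed             */

    /* ... the CPU is free to do other work here ... */

    WaitBlit();                   /* sync only when we must touch the result */
}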

Thorham's approach is the old-fashioned way of using the CPU to copy pixels.  It works best when you have sufficient cache memory and CPU time to do it that way.

The Blitter is slowly clocked (about 3.5 MHz) and doesn't cache the mask plane when blitting BOBs.  I'd call that a shortcoming of the Blitter rather than of the OS routines, though.

Once faster, FPGA-based chipsets come on the scene, other advantages of using the OS functions appear: using native chunky modes instead of chunky-to-planar conversion will save a lot of CPU time, for example.  If you're banging the hardware, your program will be oblivious to this new chunky hardware.

Does that sum it up?
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #2 on: July 15, 2014, 11:36:30 AM »
Quote from: Thomas Richter;768982
It is at least better than writing a software that does not work for some configurations. That is what the operating system is good for.


Or it would be, if it adequately supported all the features of the chipset...  Copper lists, anyone?

My opinion is that until we run all of our software in a statically compiled VM, or have some way of propagating macros at install time rather than at compile/assemble time, the OS will keep a compatibility advantage.  Once we reach that point, however, a lot of work shifts from runtime to the ABI behind the API defined by the OS, so that unnecessary runtime checks can be optimized away via constant propagation and dead-code elimination in an ahead-of-time compiler.  That's one thing (and maybe the only thing) that AmigaDE did right.
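A trivial illustration of what I mean — the flag and function names below are made up, but imagine the installer's ahead-of-time compiler pinning the flag to 0 or 1 for the machine it is installing on:
Code: [Select]
/* HAS_CHUNKY_MODE is hypothetical: fixed per machine at install time. */
#ifndef HAS_CHUNKY_MODE
#define HAS_CHUNKY_MODE 0
#endif

static void render_chunky(unsigned char *frame) { (void)frame; /* write pixels directly     */ }
static void render_c2p(unsigned char *frame)    { (void)frame; /* chunky-to-planar fallback */ }

void render_frame(unsigned char *frame)
{
    /* With HAS_CHUNKY_MODE a compile-time constant, constant propagation
     * and dead-code elimination reduce this to a direct call: the check
     * costs nothing at runtime and the unused path isn't even shipped.  */
    if (HAS_CHUNKY_MODE)
        render_chunky(frame);
    else
        render_c2p(frame);
}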
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #3 on: July 15, 2014, 12:00:37 PM »
Quote from: Thorham;768987
It's also the only way to get good performance on low end machines. It's old fashioned because the hardware is old, and many Amiga users use this hardware.


Hand-optimizing is a pain, but being able to reinstall and re-optimize software from bytecode that is compiled and optimized at install time lets you have your cake and eat it too (at the cost of some install time).
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #4 on: July 15, 2014, 12:37:27 PM »
Quote from: nicholas;768991
It's not like we can't detect the machine we are running on and use a different codepath for each architecture. Best of both worlds. :)


We could if we coded for a static VM and let the final optimization take place at install time.

Quote from: Thorham;768993
Do we currently have such tools? How fast would that be for 68020/30 compared to hand optimized code?


About all that exists at this point is compiler middleware like LLVM and fixed-function compilers like GCC and VBCC.  But LLVM does have a PBQP register allocator that is smart enough to pack multiple small values into one large register when it makes sense to, for example:
Code: [Select]
move.w  var1(a0),d0   ; load var1 into the low word of d0
swap    d0            ; stuff it in the top half of d0
move.w  var2(a0),d0   ; load var2 into the bottom half of d0

and so on.  Compilers can be smart if they are programmed to be.

Once implemented, a virtual machine based on this could make a difference for anybody with otherwise unsupported hardware!  This technology is already in use by the PNaCl VM in the Google Chrome browser, but for little-endian machines only.  That is why low-level coding is dying out: not only is high-level code cheaper to write, but the optimization can be automated!

I've been pushing for this stuff since 1998, and if Amiga, Inc. hadn't been so pig-headedly stubborn we'd have had it on the Amigas by now!  AmigaDE could have made the Amiga much more compatible without the need to alter the hardware.  Since LLVM is Apple-supported open source, and the backend used by the PNaCl VM is as well (except that one is Google-supported), it is almost within reach again!
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #5 on: July 15, 2014, 12:48:31 PM »
Quote from: itix;769001
At overall they are not better than OS routines. Maybe some functions are optimized better for some hardware but you still can't get max performance out from older Amigas. Sprites are hard to abstract since they have many HW restrictions on Amiga and Copper is completely missing from RTG.


I've been studying how to do this since 1995.  Sprites may be hard to abstract, but they are easy to emulate.

One of my planned libraries will make it possible to "tile" sprites side-by-side and top-to-bottom.  All it needs is for a rectangle of a layer to be allocated for each 16-color sprite pair (using a non-standard interleave so that it appears to the OS as a narrow 16-color screen).  Layers.library will then queue a separate blitter operation to each rectangle, with clipping applied, so that the seams between the sprites won't be detectable.  Of course the main graphics will have to be specially coded for the Amiga, but getting it to work on other systems will be dead simple by comparison.
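Underneath, a library like that would presumably wrap the standard graphics.library sprite calls; a minimal sketch for one channel (the helper name, the 16-line height, and the data layout are just illustrative — the caller keeps the SimpleSprite and the Chip RAM data alive while the sprite is shown):
Code: [Select]
#include <graphics/gfx.h>
#include <graphics/sprite.h>
#include <proto/graphics.h>

/* Grab any free hardware sprite channel and park it at (x,y) on the ViewPort.
 * spriteData is in Chip RAM: 2 control words, 2 words per line, 2 end words. */
LONG ShowOneSprite(struct ViewPort *vp, struct SimpleSprite *sprite,
                   UWORD *spriteData, WORD x, WORD y)
{
    LONG num = GetSprite(sprite, -1);     /* -1 = any free sprite channel        */
    if (num == -1) return -1;

    sprite->x = 0;
    sprite->y = 0;
    sprite->height = 16;                  /* lines of image data; set this before
                                             ChangeSprite() so it knows the size */

    ChangeSprite(vp, sprite, (void *)spriteData);  /* point the channel at the data */
    MoveSprite(vp, sprite, x, y);
    return num;                           /* caller does FreeSprite(num) when done */
}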

I've got similar plans for the effects created with the Copper as well.
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #6 on: July 15, 2014, 02:38:54 PM »
@Thorham

That's cool.  Just don't throw away your Assembly source.  The comments help others to figure out how to cross-assemble it into bytecode.  :-)
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #7 on: July 16, 2014, 11:15:54 AM »
Quote from: Thomas Richter;769028
Why do you make this needlessly complicated? First, layers is *not* the right library or abstraction for sprite emulation or blitting. It only maintains cliprects and damage regions, nothing else. But, if you want movable objects, there is already perfect support for this. Graphics.library BOBs exist since ever, provide an unlimited number of moving and animated objects. The correct abstraction is there, is in the Os and is completely supported. IIRC, the workbench uses them for moving drawers and icons around.

Now that I think about it, you're right.  Calling ClipBlit on multiple clipping regions would do the job just as well.
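Something like this, just as a sketch of the ClipBlit idea (the helper name, sizes, and minterm are only illustrative) — ClipBlit() goes through each RastPort's layer ClipRects, so every destination region gets its trimmed share of the copy:
Code: [Select]
#include <graphics/rastport.h>
#include <proto/graphics.h>

/* Copy the same 32x32 source block into several clipped destination RastPorts. */
void BlitToRegions(struct RastPort *src, struct RastPort **dests, int count,
                   WORD x, WORD y)
{
    int i;
    for (i = 0; i < count; i++)
        ClipBlit(src,      0, 0,      /* source RastPort and offset      */
                 dests[i], x, y,      /* destination RastPort and offset */
                 32, 32,              /* width, height                   */
                 0xC0);               /* minterm: plain copy             */
}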

BOBs won't work for what I have planned, because split-screen effects require seams in the display DMA.  Sprites may be able to overlap those seams, though.

Quote from: Thomas Richter;769028
The copper is too specialized and, given the number of colors a graphics card supports, not even required there. The copper was a chip from the 80s required to work around the limited bandwidth chips had back then. Similar effects do not require a copper nowadays, and allow a simple straight-foreward approach that was not available back then.

Who said anything about JUST using palette changes?  I'm talking about seamless horizontal and vertical split-screen effects (unlike the buggy ones the OS produces, which leave a one-pixel seam), combined with palette changes and maybe a few more Copper effects at the same time.

The Copper and sprite hacks would be less necessary if Commodore hadn't cheated their customers on the amount of Chip RAM addressable on AGA; it should have been expandable beyond 2 megs!  For what it's worth, the compatible version of my libraries won't use hacks at all.  It will only support hacks on the low-end systems that need them for performance reasons.  Encapsulation is the goal.
« Last Edit: July 16, 2014, 11:19:36 AM by SamuraiCrow »
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #8 on: July 16, 2014, 11:33:47 AM »
@Thorham

I'm not going to support double-scan resolutions without a hardware scan-doubler.  Besides, in double-scan modes there isn't enough bandwidth on the AGA Chip bus for what I'm planning to do anyway.
« Last Edit: July 16, 2014, 11:34:14 AM by SamuraiCrow »
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #9 on: July 16, 2014, 12:52:51 PM »
Quote from: Thorham;769119
What exactly are you planning?


One of the libraries I was planning to write was a video codec that would stream Copper lists and Chip-memory data from a CD-ROM, hard drive, or Flash memory in real time.  The Blitter and Copper will need as much bandwidth as they can get, and it's well known that AGA's scan-doubling hardware takes twice the display DMA bandwidth to produce the same resolution as a single-scan mode.  (I'm still debating the format for the disk-based portion, since the CPU will have to run address relocation on each frame's Copper list at a minimum.  Maybe a simple JIT will make sense as well, to improve disk transfer speeds.)
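The per-frame fix-up would be something like the sketch below.  Everything here is a made-up illustration of one possible on-disk layout (the FrameFixups struct, the offset table, and storing pointers as offsets are all assumptions, not a defined format): each pointer-carrying MOVE in the frame's Copper list is listed in a patch table, so only its high/low data words need touching after the frame lands in Chip RAM.
Code: [Select]
#include <exec/types.h>

/* Hypothetical per-frame relocation info: word offsets (into the Copper
 * list) of the data words of pointer MOVEs, stored as hi/lo pairs.      */
struct FrameFixups
{
    ULONG patchCount;          /* number of entries in patchOffset (always even) */
    const UWORD *patchOffset;  /* offsets of the pointer MOVE data words         */
};

/* Rebase the pointer MOVEs of one streamed Copper list.  copList points at
 * the Copper words already loaded into Chip RAM; the stored values are
 * offsets relative to chipBase, the address the frame's data loaded at.   */
void RelocateCopperFrame(UWORD *copList, const struct FrameFixups *fix, ULONG chipBase)
{
    ULONG i;
    for (i = 0; i < fix->patchCount; i += 2)
    {
        UWORD hi = fix->patchOffset[i];      /* data word of the high-pointer MOVE */
        UWORD lo = fix->patchOffset[i + 1];  /* data word of the low-pointer MOVE  */
        ULONG ptr = chipBase +
            (((ULONG)copList[hi] << 16) | copList[lo]);

        copList[hi] = (UWORD)(ptr >> 16);
        copList[lo] = (UWORD)(ptr & 0xFFFF);
    }
}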
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #10 on: July 16, 2014, 01:00:59 PM »
@Thomas Richter

If you used GCC to generate 68k code, I'd have to ask which version.  The 68k backends have bit-rotted terribly from lack of maintenance.  (And as Matthey observed, GCC misses loads of 68k optimizations and may never have been fully complete in the first place.  Simply using an optimizing assembler like VASM instead of GAS would help too.)

Also, cycle counting works for compiler designers.  But if you want to avoid cycle counting yourself, you ought to choose your compiler more carefully than to settle for a bit-rotted heap of old code.  x86 may get nearly optimal code generation from free compilers, but the 68000 has never had terribly good ones.
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #11 on: July 16, 2014, 02:44:28 PM »
Quote from: Thomas Richter;769125
That's a plain simple SAS C 6.51 simply because the Os development chain depends on it (with the exception of intuition, actually, which depended on a really rotten compiler.)


I've worked with that one also.  It generates pretty good code most of the time, but if you use deeply nested expressions in a formula it spills the temporaries to the stack regardless of how many registers are free.  Also, ChaosLord used SAS/C for his game writing, and it would occasionally get confused and generate pure nonsense code that wouldn't even execute.  In that event inline Assembly is unavoidable.

Quote from: Thomas Richter;769125
Anyhow, I stand for my opinion. Pointless argument. If you want to write video codec, the *blitter* is your least problem. The decoding algorithm will make a huge difference, and even there it makes sense to optimize the algorithm first. Been there, done that. That was actually a JPEG 2000 if you care.


I would care if I were making a bitmap-based codec, but I was planning on using mostly filled vectors.  I know how to reduce a full-screen vector scene to the minimum number of line draws so that the whole screen can be filled in a single vector-fill operation.  That full-screen, full-bitplane-depth pass is going to be costly, though, as are the uncompressed audio samples.  I may have to triple-buffer the display and use the CPU to clear each screen after it's been displayed, just to take some strain off the Blitter.
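For what it's worth, the OS side of a filled vector is straightforward.  A minimal sketch of one filled polygon through the area routines, which drive the Blitter's line-draw and fill modes underneath (the helper name, triangle coordinates, and buffer sizes are only illustrative):
Code: [Select]
#include <exec/types.h>
#include <graphics/gfx.h>
#include <graphics/rastport.h>
#include <proto/graphics.h>

#define MAX_VECTORS 20

/* Draw one filled triangle into rp using the OS area-fill routines. */
BOOL FillTriangle(struct RastPort *rp, WORD w, WORD h)
{
    static WORD areaBuffer[(MAX_VECTORS * 5 + 1) / 2];  /* 5 bytes per queued vertex */
    struct AreaInfo areaInfo;
    struct TmpRas tmpRas;
    PLANEPTR raster;

    raster = AllocRaster(w, h);               /* Chip RAM workspace for the fill */
    if (!raster) return FALSE;

    InitArea(&areaInfo, areaBuffer, MAX_VECTORS);
    rp->AreaInfo = &areaInfo;
    rp->TmpRas   = InitTmpRas(&tmpRas, raster, RASSIZE(w, h));

    SetAPen(rp, 1);
    AreaMove(rp, 10, 10);                     /* start the outline         */
    AreaDraw(rp, w - 10, h / 2);
    AreaDraw(rp, 10, h - 10);
    AreaEnd(rp);                              /* close, line-draw and fill */

    rp->TmpRas = NULL;                        /* don't leave rp pointing at locals */
    rp->AreaInfo = NULL;
    FreeRaster(raster, w, h);
    return TRUE;
}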