Author Topic: newb questions, hit the hardware or not? (Read 33380 times)

LiveForIt · « **Reply #44 on:** July 16, 2014, 06:04:47 AM »

Quote from: Thorham;769052

On 20s and 30s there's no need for that. The mouse and keyboard especially are very easy to handle properly with the OS.

It's even easier to check the bits; that's why some developers do it.
No need to reply to Intuition messages and so on. It's maybe more common in games and demos.

Quote

and if some user can't use this, then I wonder how their Amiga is set up.

Some people like to use ModePro or some tool like that :-)
lol

Quote

Only using the OS in general is unacceptable for me, because it only helps NG users.

Not really. Many Amiga 1200/4000 users have OpalVision, CyberVision64, CyberVision3D or BlizzardVision cards, or a G-REX/Mediator busboard upgrade with some Radeon graphics card. Sure, they get around it by simply selecting a different video source on the monitor or something like that, or by having two monitors connected. It's maybe more of a convenience for these people, but it might be a problem for those without a scan doubler, if they exist.

:-)

LiveForIt · « **Reply #45 on:** July 16, 2014, 07:07:53 AM »

@biggun

I'm sure it will help, but it's just a drop in the sea.

Quote from: matthey;769066

The SAM 440 and 460 have a small Lattice fpga. The mentality of some of the so called next generation Amiga guys is to get away from hardware dependency.

It might not be connected in a way that allows it to be used to emulate CIAA/CIAB.

I believe it's used to control the clock speed. The FPGA is also programmed as GPIO, so in theory it can easily be used as a joystick port: just configure the pins as inputs and make a small 9-pin D-sub cable.

But to get it at the right address, I think you need to do some MMU magic.

Quote

They also may be trying to keep their AmigaOS closed for proprietary and security reasons.

Most things are no harder to do than on AmigaOS3.x. AmigaOS4.x is hackable if you like, but it's not open, as you say.

Thorham · « **Reply #46 on:** July 16, 2014, 08:20:03 AM »

Quote from: matthey;769066

With a little bit of learning, it's possible to write code that is fairly optimal on the 68020-68060. Instruction scheduling for a 68060 generally doesn't affect 68020/68030 performance but can double 68060 performance with some code.

Can't do it. 20s and 30s have priority for me. Not to mention that instruction scheduling sucks. My goal is to get something to run well on the lower end machines (25 MHz 68020). When something runs well on such machines, why would I need to optimize for 68060s? For me anything above 68030 is irrelevant in terms of optimizing, because if a '30 can run it fast enough, then so can a '40 or '60. I also don't have a '40 or '60.

Quote from: matthey;769066

Learning about modern CPU pipelining, superscalar execution, caches, hazards/bubbles, etc. (not just 68020/68030 timings and specifics) will improve your code for the 68020/68030 also.

Really? So, you're telling me that on 20/30 there's more than cache+timings+pipeline? Interesting!

matthey · « **Reply #47 on:** July 16, 2014, 09:23:23 AM »

Quote from: Thorham;769105

Can't do it. 20s and 30s have priority for me. Not to mention that instruction scheduling sucks. My goal is to get something to run well on the lower end machines (25 MHz 68020). When something runs well on such machines, why would I need to optimize for 68060s? For me anything above 68030 is irrelevant in terms of optimizing, because if a '30 can run it fast enough, then so can a '40 or '60. I also don't have a '40 or '60.

More performance is always useful. Settings with better gfx, more sound effects/music and more options can be turned on on a 68040/68060. Some games are nicer at 30fps than 20fps even if they are playable and fun at 20fps on a 68020/030. It does take a little more time to instruction schedule code, but the code becomes reusable for more and expanded projects.

Quote from: Thorham;769105

Really? So, you're telling me that on 20/30 there's more than cache+timings+pipeline? Interesting!

The 020/030 is friendly, being lightly pipelined, but performance is affected by alignment and data sizes (32 bit is sometimes faster than 16 bit) at least. Unfortunately, documentation is lacking in general for 68k instructions with regard to hazards/bubbles and instruction scheduling. I know the 020/030 has some instruction overlap but I don't know if it's enough to affect resource availability from instruction to instruction. Contrary to most 68k Amiga programmers, I have studied the 040 and 060 more (and I also know more about the AmigaOS functions than about banging the Amiga hardware). Avoiding general slowdowns for the 040/060 rarely hurts and sometimes helps 020/030 performance. This is in contrast to the 68000, where optimizing for 68000-68060 is difficult as the 68000 is a 16 bit processor.

Thorham · « **Reply #48 on:** July 16, 2014, 09:52:54 AM »

Quote from: matthey;769110

More performance is always useful. Settings with better gfx, more sound effects/music and more options can be turned on on a 68040/68060. Some games are nicer at 30fps than 20fps even if they are playable and fun at 20fps on a 68020/030. It does take a little more time to instruction schedule code, but the code becomes reusable for more and expanded projects.

Sure, but it depends on what you're writing. I'm writing a Fire Emblem clone with a full tiled display with several layers (16x16 pixel anim tiles, 16x24 map sprites, 16x8 status icons) and it's already much faster than needed on my 50 MHz '30 (you only need around seven FPS to get the animations to look like the original), and that's without pipelining. I see no reason to optimize for 40/60 at all in this case, and will instead try to get as close as I can to A1200 with trapdoor fastmem.

Another example might be a text editor that has to run well in 640x512+overscan in eight colors (nice for double scan modes). Getting this to run as fast as possible on 20/30 obviously has priority over 40/60 where optimized 20/30 code will be fast enough anyway.

There is some software where it might be important, but for those cases it might be a better idea to write separate loops for 20/30 and 60. It's not as if it's much extra work.

Quote from: matthey;769110

The 020/030 is friendly, being lightly pipelined, but performance is affected by alignment and data sizes (32 bit is sometimes faster than 16 bit) at least. Unfortunately, documentation is lacking in general for 68k instructions with regard to hazards/bubbles and instruction scheduling. I know the 020/030 has some instruction overlap but I don't know if it's enough to affect resource availability from instruction to instruction. Contrary to most 68k Amiga programmers, I have studied the 040 and 060 more (and I also know more about the AmigaOS functions than about banging the Amiga hardware). Avoiding general slowdowns for the 040/060 rarely hurts and sometimes helps 020/030 performance. This is in contrast to the 68000, where optimizing for 68000-68060 is difficult as the 68000 is a 16 bit processor.

Should be interesting to check out.

SamuraiCrow · « **Reply #49 on:** July 16, 2014, 11:15:54 AM »

Quote from: Thomas Richter;769028

Why do you make this needlessly complicated? First, layers is *not* the right library or abstraction for sprite emulation or blitting. It only maintains cliprects and damage regions, nothing else. But if you want movable objects, there is already perfect support for this. Graphics.library BOBs have existed forever and provide an unlimited number of moving and animated objects. The correct abstraction is there, is in the OS and is completely supported. IIRC, the Workbench uses them for moving drawers and icons around.

Now that I think about it you're right. Calling ClipBlit on multiple clipping regions would do the job just as well.

BOBs won't work for what I have planned because split-screen effects require seams in the display DMA. Sprites may be able to overlap this though.

Quote from: Thomas Richter;769028

The copper is too specialized and, given the number of colors a graphics card supports, not even required there. The copper was a chip from the 80s required to work around the limited bandwidth chips had back then. Similar effects do not require a copper nowadays, and allow a simple, straightforward approach that was not available back then.

Who said anything about JUST using palette-changes? I'm talking about seamless both horizontal and vertical split-screen effects (unlike those buggy ones that leave a pixel seam like the OS does) at the same time as palette changes and maybe a few more Copper effects at the same time.

The Copper and sprite hacks would be less necessary if Commodore hadn't cheated their customers on the amount of Chip RAM that was addressable on AGA. It should have been expandable beyond 2 megs! For what it's worth, the compatible version of my libraries won't use hacks at all! It only will support hacks on the low-end systems that require them for performance reasons. Encapsulation is the goal.

Thorham · « **Reply #50 on:** July 16, 2014, 11:28:54 AM »

Quote from: SamuraiCrow;769115

Sprites may be able to overlap this though.

That won't work well for double scan users, because of the limited number of sprites. I think there may actually be only one sprite available in double scan modes.

SamuraiCrow · « **Reply #51 on:** July 16, 2014, 11:33:47 AM »

@Thorham

I'm not going to support double-scan resolutions without a hardware scan-doubler. There's also not enough bandwidth on the AGA Chip bus for what I'm planning to do anyway, in double-scan modes.

Thorham · « **Reply #52 on:** July 16, 2014, 12:13:28 PM »

Quote from: SamuraiCrow;769117

for what I'm planning to do

What exactly are you planning?

guest11527 · « **Reply #53 on:** July 16, 2014, 12:19:42 PM »

Quote from: matthey;769066

I agree with your point on using the AmigaOS (where possible given constraints) but I disagree with the "no overhead" claim to using the AmigaOS, even if it "goes directly to the hardware". Function calls through the jump table have overhead and compiled AmigaOS code is not optimal. For example, your new layers.library is riddled with instructions like:

lea (0,a6),a4 ; optimize to move.l a6,a4
move.l #0,-(sp) ; optimize to clr.l -(sp)
lea (a3),a6 ; optimize to move.l a3,a6

That's exactly what I call a "cycle counter party argument". It is completely pointless because it makes no observable difference. Probably the reverse: the compiler likely made the choice for a reason. Anyhow, the low-level graphics.library is in assembly, if that makes you feel any better. Still, it does not make a difference. Fast code comes from smart algorithms, not micro-optimizations. V45 is smarter in many respects because it avoids thousands of CPU cycles of worthless copy operations in most cases, probably at the expense of a couple of hundred CPU cycles elsewhere.

Quote from: matthey;769066

We are not talking about a cycle or 2. All these lack of optimizations add up and then programmers roll their own code to gain 10%+ speed over the AmigaOS.

Which I actually doubt, and even if it were true, it would be hardly noticeable, because there is more that adds up to the complexity of moving windows than a couple of trivial move operations. Actually, V45 is faster, not slower, because it is smarter algorithmically.

Quote from: matthey;769066

I want programmers to use the AmigaOS functions (but not required). We need to improve compilers and try to make code close to optimal for this to happen. Call me a cycle counter and ignore me if you like.

Pointless argument, see above. It requires algorithmic improvements, or probably additional abstractions to make it fit to the requirements of its time. Arguing about a

SamuraiCrow · « **Reply #54 on:** July 16, 2014, 12:52:51 PM »

Quote from: Thorham;769119

What exactly are you planning?

One of the libraries I was planning to write was a video codec that would stream Copperlists and Chip memory data from a CD-ROM, hard drive, or Flash memory in realtime. The Blitter and Copper will need as much bandwidth as they can get. It's a well-known fact that AGA's scan-doubling hardware takes twice as much display DMA bandwidth to create the same resolution as a single-scan display mode. (I'm debating the format to use for the disk-based portion since the CPU will have to run address relocation on each frame's Copper-list at the minimum. Maybe a simple JIT will make sense as well, to improve disk transfer speeds.)
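As a rough illustration of the per-frame address relocation mentioned above, here is a minimal Python sketch. It assumes a hypothetical on-disk format in which each Copper MOVE is stored as a (register, value) pair of 16-bit words and pointer registers hold offsets relative to the start of the frame's Chip memory data. The format, the name `relocate_copper_list`, and the restriction to the BPL1PTH/BPL1PTL pair are assumptions made for illustration, not the planned codec.

```python
BPL1PTH = 0x0E0  # bitplane 1 pointer, high word (custom chip register offset)
BPL1PTL = 0x0E2  # bitplane 1 pointer, low word
COLOR00 = 0x180  # background colour register, not a pointer


def relocate_copper_list(moves, chip_base, pointer_pairs=((BPL1PTH, BPL1PTL),)):
    """Return a copy of `moves` with frame-relative pointers made absolute.

    `moves` is a list of (register, value) Copper MOVE pairs (assumed format).
    `chip_base` is the Chip memory address the frame data was loaded at.
    """
    high_to_low = dict(pointer_pairs)
    out = list(moves)
    for i in range(len(out) - 1):
        reg, high = out[i]
        next_reg, low = out[i + 1]
        # Only patch a high-word MOVE that is directly followed by its low word.
        if high_to_low.get(reg) == next_reg:
            address = ((high << 16) | low) + chip_base
            out[i] = (reg, (address >> 16) & 0xFFFF)
            out[i + 1] = (next_reg, address & 0xFFFF)
    return out


frame = [(BPL1PTH, 0x0000), (BPL1PTL, 0x2000), (COLOR00, 0x0F00)]
relocated = relocate_copper_list(frame, chip_base=0x00040000)
print([(hex(reg), hex(value)) for reg, value in relocated])
```

The loop touches each MOVE once, so the cost per frame grows linearly with the length of the Copper list; non-pointer MOVEs such as the colour write pass through unchanged.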

commodorejohn · « **Reply #55 on:** July 16, 2014, 12:53:05 PM »

You seem to have misread the "algorithm first, implementation later" rule of optimization as "algorithm first, implementation never, also you're stupid and horrible for thinking that a human being could ever be smarter than a piece of software engineered by human beings, or that multiple small numbers can add up into a larger number!" there, Thomas.

SamuraiCrow · « **Reply #56 on:** July 16, 2014, 01:00:59 PM »

@Thomas Richter

If you used GCC to generate 68k code, I'd have to ask which version. The 68k backends have bit-rotted terribly due to lack of maintenance. (And as Matthey observed, it misses loads of optimizations and may have never been fully complete in the first place. Simply using an optimizing assembler like VASM instead of GAS would help too.)

Also, cycle counting works for compiler designers. But if you want to avoid cycle counting you ought to choose your compiler more carefully than to use a bit-rotted heap of old code. The x86 may have nearly-optimal code generation in free compilers but the 68000 has never had terribly good compilers.

guest11527 · « **Reply #57 on:** July 16, 2014, 01:41:59 PM »

Quote from: SamuraiCrow;769124

@Thomas Richter

If you used GCC to generate 68k code, I'd have to ask which version.

That's a plain simple SAS C 6.51, simply because the OS development chain depends on it (with the exception of intuition, actually, which depended on a really rotten compiler.) Anyhow, I stand by my opinion. Pointless argument. If you want to write a video codec, the *blitter* is the least of your problems. The decoding algorithm will make a huge difference, and even there it makes sense to optimize the algorithm first. Been there, done that. That was actually a JPEG 2000 if you care.

guest11527 · « **Reply #58 on:** July 16, 2014, 02:37:34 PM »

Quote from: commodorejohn;769123

or that multiple small numbers can add up into a larger number!" there, Thomas.

Please get your math fixed. If you have n algorithms, each of them spends 1/nth of the time in solving a problem, and each of them is sped up by 10%, the overall speedup is still 10%. In fact, if you only speed up one of them (e.g. layers) by 10%, the overall improvement is much smaller, depending on n, and even marginal.
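The arithmetic above can be checked with a short sketch. It assumes that "sped up by 10%" means the affected part runs 1.1 times faster, and the helper `overall_speedup` is a hypothetical name introduced here, not something from the thread:

```python
def overall_speedup(n, parts_improved, factor=1.1):
    """Overall speedup when `parts_improved` of `n` equal-cost parts
    each run `factor` times faster (assumed reading of "10% faster")."""
    old_time = 1.0                # normalised total running time
    share = old_time / n          # each part takes 1/n of the time
    new_time = (parts_improved * share / factor
                + (n - parts_improved) * share)
    return old_time / new_time


# All ten parts improved: the overall gain is the same 10%.
print(round(overall_speedup(10, 10), 3))  # 1.1
# Only one of ten parts improved: the overall gain is under 1%.
print(round(overall_speedup(10, 1), 3))   # 1.009
```

With a larger n, improving a single part matters even less, which is the "depending on n" point.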

If, however, you have an algorithm whose running time grows as O(N^2) (N being the number of layers being moved, arranged or resized) and that is replaced by an O(N) algorithm (as it happened, actually), then even for suitably small N the improvement can be enormous. It is really that simple. Do not waste your time optimizing the useless details. Get the big picture correct. Then, if performance is still not right, check where the problem is, find the bottlenecks, and either get rid of them by changing the algorithm, or optimize only there.
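To put rough numbers on that comparison, here is a small sketch using abstract step counts. The functions are hypothetical stand-ins and do not model the real layers.library; they only show that a 10% micro-optimized quadratic routine still loses to an untuned linear one as N grows:

```python
def quadratic_steps(n):
    """Step count of a hypothetical O(N^2) routine over n layers."""
    return n * n


def linear_steps(n):
    """Step count of a hypothetical O(N) replacement."""
    return n


def tuned(steps, factor=1.1):
    """Apply an assumed 10% micro-optimization to a step count."""
    return steps / factor


for n in (4, 16, 64):
    ratio = tuned(quadratic_steps(n)) / linear_steps(n)
    print(n, round(ratio, 1))
# 4 3.6
# 16 14.5
# 64 58.2
```

The ratio is N/1.1, so the algorithmic change wins by a margin that keeps growing with the number of layers, while the micro-optimization stays a fixed 10%.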

SamuraiCrow · « **Reply #59 on:** July 16, 2014, 02:44:28 PM »

Quote from: Thomas Richter;769125

That's a plain simple SAS C 6.51, simply because the OS development chain depends on it (with the exception of intuition, actually, which depended on a really rotten compiler.)

I've worked with that one also. Generates pretty good code most of the time. If you use deep orders of operation in a formula, it stuffs the temporaries to the stack regardless of how many registers are free for temporary variables. Also, ChaosLord used SAS/C for his game writing and it occasionally would get confused and generate pure nonsense code that wouldn't even execute. In that event inline Assembly is unavoidable.

Quote from: Thomas Richter;769125

Anyhow, I stand by my opinion. Pointless argument. If you want to write a video codec, the *blitter* is the least of your problems. The decoding algorithm will make a huge difference, and even there it makes sense to optimize the algorithm first. Been there, done that. That was actually a JPEG 2000 if you care.

I would care, if I were making a bitmap-based codec. I was planning on using mostly filled vectors though. I know how to optimize a full-screen vector into the minimum number of line-draws so that the whole screen can do a single vector-fill operation. That full-screen, full bitplane-depth pass is going to be costly though, as are the uncompressed audio samples. I may have to triple-buffer the display and use the CPU to clear the screen after it's been displayed just to take some strain off the Blitter.
