
Author Topic: newb questions, hit the hardware or not?  (Read 33380 times)

Offline LiveForIt

Re: newb questions, hit the hardware or not?
« Reply #44 on: July 16, 2014, 06:04:47 AM »
Quote from: Thorham;769052
On 20s and 30s there's no need for that. The mouse and keyboard especially are very easy to handle properly with the OS.


It's even easier to check the bits, that's why some developers do it.
No need to reply to Intuition messages and so on. It's maybe more common in games and demos.
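
For anyone wondering what "checking the bits" amounts to, here is a minimal sketch. It assumes the usual classic Amiga layout, where the two fire buttons show up in bits 6 and 7 of CIAA port register A ($BFE001) and read 0 when pressed. The helper names are made up for illustration, and Python is used only to show the bit test. On a real machine this is a single btst from assembly or a masked read in C.

```python
# Assumed layout of CIAA port register A (PRA, $BFE001 on classic Amigas).
# Both button lines are active low: a cleared bit means "pressed".
CIAA_PRA_FIR0 = 1 << 6  # port 1: left mouse button / joystick fire
CIAA_PRA_FIR1 = 1 << 7  # port 2: joystick fire


def left_mouse_down(pra: int) -> bool:
    """Return True if the port 1 button bit is cleared in the PRA value."""
    return (pra & CIAA_PRA_FIR0) == 0


def port2_fire_down(pra: int) -> bool:
    """Return True if the port 2 fire bit is cleared in the PRA value."""
    return (pra & CIAA_PRA_FIR1) == 0
```

That really is all there is to it, which is the appeal. The catch is that nothing tells the OS you are reading its input behind its back.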

Quote
and if some user can't use this, then I wonder how their Amiga is set up.


Some people like to use ModePro or some tool like that :-)
lol

Quote
Only using the OS in general is unacceptable for me, because it only helps NG users.


Not really. Many Amiga 1200/4000 users have OpalVision, CyberVision 64, CyberVision 64/3D or BlizzardVision cards, or Grex/Mediator bus board upgrades with some Radeon graphics card. Sure, they get around it by simply selecting a different video source on the monitor or something like that, or by having two monitors connected. It's maybe more a matter of convenience for these people, but it might be a problem for those without a scan doubler, if they exist.

:-)
 

Offline LiveForIt

Re: newb questions, hit the hardware or not?
« Reply #45 on: July 16, 2014, 07:07:53 AM »
@biggun

I'm sure it will help. But it's just a drop in the sea.

Quote from: matthey;769066
The SAM 440 and 460 have a small Lattice fpga. The mentality of some of the so called next generation Amiga guys is to get away from hardware dependency.

It might not be connected in a way that allows it to be used to emulate CIAA/CIAB.

I believe it's used to control the clock speed. The FPGA is also programmed as GPIO, so in theory it can easily be used as a joystick port. Just configure the pins as inputs and make a small 9-pin D-sub cable.

But to get it at the right address I think you need to do some MMU magic.

Quote
They also may be trying to keep their AmigaOS closed for proprietary and security reasons.

Most things are no harder to do than on AmigaOS 3.x. AmigaOS 4.x is hackable if you want it to be, but it's not open, as you say.
« Last Edit: July 16, 2014, 07:10:46 AM by LiveForIt »
 

Offline Thorham

  • Hero Member
  • *****
  • Join Date: Oct 2009
  • Posts: 1149
Re: newb questions, hit the hardware or not?
« Reply #46 on: July 16, 2014, 08:20:03 AM »
Quote from: matthey;769066
With a little bit of learning, it's possible to write code that is fairly optimal on the 68020-68060. Instruction scheduling for a 68060 generally doesn't affect 68020/68030 performance but can double 68060 performance with some code.
Can't do it. 20s and 30s have priority for me. Not to mention that instruction scheduling sucks. My goal is to get something to run well on the lower end machines (25 MHz 68020). When something runs well on such machines, why would I need to optimize for 68060s? For me anything above 68030 is irrelevant in terms of optimizing, because if a '30 can run it fast enough, then so can a '40 or '60. I also don't have a '40 or '60.

Quote from: matthey;769066
Learning about modern CPU pipelining, superscalar execution, caches, hazards/bubbles, etc. (not just 68020/68030 timings and specifics) will improve your code for the 68020/68030 also.
Really? So, you're telling me that on 20/30 there's more than cache+timings+pipeline? Interesting!
 

Offline matthey

  • Hero Member
  • *****
  • Join Date: Aug 2007
  • Posts: 1294
Re: newb questions, hit the hardware or not?
« Reply #47 on: July 16, 2014, 09:23:23 AM »
Quote from: Thorham;769105
Can't do it. 20s and 30s have priority for me. Not to mention that instruction scheduling sucks. My goal is to get something to run well on the lower end machines (25 MHz 68020). When something runs well on such machines, why would I need to optimize for 68060s? For me anything above 68030 is irrelevant in terms of optimizing, because if a '30 can run it fast enough, then so can a '40 or '60. I also don't have a '40 or '60.


More performance is always useful. Settings with better gfx, more sound effects/music and more options can be turned on for a 68040/68060. Some games are nicer at 30fps than 20fps even if they are playable and fun at 20fps on a 68020/030. It does take a little more time to instruction schedule code, but the code becomes reusable for more and expanded projects.

Quote from: Thorham;769105

Really? So, you're telling me that on 20/30 there's more than cache+timings+pipeline? Interesting!


The 020/030 is friendly, being lightly pipelined, but performance is affected by alignment and data sizes (32 bit is sometimes faster than 16 bit) at least. Unfortunately, documentation is lacking in general for 68k instructions in regards to hazards/bubbles and instruction scheduling. I know the 020/030 has some instruction overlap but I don't know if it's enough to affect resource availability from instruction to instruction. Contrary to most 68k Amiga programmers, I have studied the 040 and 060 more (and I also know more about the AmigaOS functions than about banging the Amiga hardware). Avoiding general slowdowns for the 040/060 rarely hurts and sometimes helps 020/030 performance. This is in contrast to the 68000, where optimizing for 68000-68060 is difficult as the 68000 is a 16 bit processor.
 

Offline Thorham

Re: newb questions, hit the hardware or not?
« Reply #48 on: July 16, 2014, 09:52:54 AM »
Quote from: matthey;769110
More performance is always useful. Settings with better gfx, more sound effects/music and more options can be turned on for a 68040/68060. Some games are nicer at 30fps than 20fps even if they are playable and fun at 20fps on a 68020/030. It does take a little more time to instruction schedule code, but the code becomes reusable for more and expanded projects.
Sure, but it depends on what you're writing. I'm writing a Fire Emblem clone with a full tiled display with several layers (16x16 pixel anim tiles, 16x24 map sprites, 16x8 status icons) and it's already much faster than needed on my 50 MHz '30 (you only need around seven FPS to get the animations to look like the original), and that's without pipelining. I see no reason to optimize for 40/60 at all in this case, and will instead try to get as close as I can to A1200 with trapdoor fastmem.

Another example might be a text editor that has to run well in 640x512+overscan in eight colors (nice for double scan modes). Getting this to run as fast as possible on 20/30 obviously has priority over 40/60 where optimized 20/30 code will be fast enough anyway.

There is some software where it might be important, but for those cases it might be a better idea to write separate loops for 20/30 and 60. It's not as if it's much extra work.
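
One way the "separate loops" idea could look is a sketch that picks a loop variant once at startup from the CPU flags. The flag values below mirror what I remember of the AFF_ bits for ExecBase->AttnFlags in exec/execbase.h, so treat them as an assumption and check the header. The two loop functions are placeholders, not real routines.

```python
# CPU flag bits as found in ExecBase->AttnFlags (assumed values, verify
# against exec/execbase.h before relying on them).
AFF_68020 = 1 << 1
AFF_68030 = 1 << 2
AFF_68040 = 1 << 3
AFF_68060 = 1 << 7


def inner_loop_020_030(tiles):
    # Placeholder for the version tuned for 68020/68030 timings.
    return ("020/030 loop", len(tiles))


def inner_loop_060(tiles):
    # Placeholder for the version scheduled for the 68060.
    return ("060 loop", len(tiles))


def pick_inner_loop(attn_flags):
    """Choose the loop variant once, at startup, instead of per frame."""
    if attn_flags & AFF_68060:
        return inner_loop_060
    # Everything else, including a 68040, gets the 020/030 version.
    return inner_loop_020_030
```

The program then calls whatever `pick_inner_loop` returned for the rest of its run, so the check costs nothing inside the loop itself.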

Quote from: matthey;769110
The 020/030 is friendly, being lightly pipelined, but performance is affected by alignment and data sizes (32 bit is sometimes faster than 16 bit) at least. Unfortunately, documentation is lacking in general for 68k instructions in regards to hazards/bubbles and instruction scheduling. I know the 020/030 has some instruction overlap but I don't know if it's enough to affect resource availability from instruction to instruction. Contrary to most 68k Amiga programmers, I have studied the 040 and 060 more (and I also know more about the AmigaOS functions than about banging the Amiga hardware). Avoiding general slowdowns for the 040/060 rarely hurts and sometimes helps 020/030 performance. This is in contrast to the 68000, where optimizing for 68000-68060 is difficult as the 68000 is a 16 bit processor.
Should be interesting to check out.
 

Offline SamuraiCrow

  • Hero Member
  • *****
  • Join Date: Feb 2002
  • Posts: 2280
  • Country: us
  • Gender: Male
Re: newb questions, hit the hardware or not?
« Reply #49 on: July 16, 2014, 11:15:54 AM »
Quote from: Thomas Richter;769028
Why do you make this needlessly complicated? First, layers is *not* the right library or abstraction for sprite emulation or blitting. It only maintains cliprects and damage regions, nothing else. But if you want movable objects, there is already perfect support for this. Graphics.library BOBs have existed since forever and provide an unlimited number of moving and animated objects. The correct abstraction is there, is in the OS and is completely supported. IIRC, the Workbench uses them for moving drawers and icons around.

Now that I think about it you're right.  Calling ClipBlit on multiple clipping regions would do the job just as well.

BOBs won't work for what I have planned because split-screen effects require seams in the display DMA.  Sprites may be able to overlap this though.

Quote from: Thomas Richter;769028
The copper is too specialized and, given the number of colors a graphics card supports, not even required there. The copper was a chip from the 80s required to work around the limited bandwidth chips had back then. Similar effects do not require a copper nowadays, and allow a simple, straightforward approach that was not available back then.

Who said anything about JUST using palette-changes?  I'm talking about seamless both horizontal and vertical split-screen effects (unlike those buggy ones that leave a pixel seam like the OS does) at the same time as palette changes and maybe a few more Copper effects at the same time.

The Copper and sprite hacks would be less necessary if Commodore hadn't cheated their customers on the amount of Chip RAM that was addressable on AGA.  It should have been expandable beyond 2 megs!  For what it's worth, the compatible version of my libraries won't use hacks at all!  It only will support hacks on the low-end systems that require them for performance reasons.  Encapsulation is the goal.
« Last Edit: July 16, 2014, 11:19:36 AM by SamuraiCrow »
 

Offline Thorham

Re: newb questions, hit the hardware or not?
« Reply #50 on: July 16, 2014, 11:28:54 AM »
Quote from: SamuraiCrow;769115
Sprites may be able to overlap this though.
That won't work well for double scan users, because of the limited number of sprites. I think there may actually be only one sprite available in double scan modes :(
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #51 on: July 16, 2014, 11:33:47 AM »
@Thorham

I'm not going to support double-scan resolutions without a hardware scan-doubler.  There's also not enough bandwidth on the AGA Chip bus for what I'm planning to do anyway, in double-scan modes.
« Last Edit: July 16, 2014, 11:34:14 AM by SamuraiCrow »
 

Offline Thorham

Re: newb questions, hit the hardware or not?
« Reply #52 on: July 16, 2014, 12:13:28 PM »
Quote from: SamuraiCrow;769117
for what I'm planning to do
What exactly are you planning?
 

guest11527

  • Guest
Re: newb questions, hit the hardware or not?
« Reply #53 on: July 16, 2014, 12:19:42 PM »
Quote from: matthey;769066
I agree with your point on using the AmigaOS (where possible given constraints) but I disagree with the "no overhead" claim to using the AmigaOS, even if it "goes directly to the hardware". Function calls through the jump table have overhead and compiled AmigaOS code is not optimal. For example, your new layers.library is riddled with instructions like:

   lea (0,a6),a4  ; optimize to move.l a6,a4
   move.l #0,-(sp) ; optimize to clr.l -(sp)
   lea (a3),a6 ; optimize to move.l a3,a6

That's exactly what I call a "cycle counter party argument". It is completely pointless because it makes no observable difference. Probably the reverse: the compiler likely made that choice for a reason. Anyhow, the low-level graphics.library is in assembly, if that makes you feel any better. Still, it does not make a difference. Fast code comes from smart algorithms, not micro-optimizations. V45 is smarter in many respects because it avoids thousands of CPU cycles of worthless copy operations in most cases, probably at the expense of a couple of hundred CPU cycles elsewhere.
Quote from: matthey;769066
We are not talking about a cycle or 2. All these missed optimizations add up and then programmers roll their own code to gain 10%+ speed over the AmigaOS.
Which I actually doubt, and even if so, it would be hardly noticeable, because there is more that adds up to the complexity of moving windows than a couple of trivial move operations. Actually, V45 is faster, not slower, because it is smarter algorithmically.
Quote from: matthey;769066
I want programmers to use the AmigaOS functions (though it's not required). We need to improve compilers and try to make code close to optimal for this to happen. Call me a cycle counter and ignore me if you like.
Pointless argument, see above. It requires algorithmic improvements, or probably additional abstractions to make it fit to the requirements of its time. Arguing about a
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #54 on: July 16, 2014, 12:52:51 PM »
Quote from: Thorham;769119
What exactly are you planning?


One of the libraries I was planning to write was a video codec that would stream Copperlists and Chip memory data from a CD-ROM, hard drive, or Flash memory in realtime.  The Blitter and Copper will need as much bandwidth as they can get.  It's a well-known fact that AGA's scan-doubling hardware takes twice as much display DMA bandwidth to create the same resolution as a single-scan display mode.  (I'm debating the format to use for the disk-based portion since the CPU will have to run address relocation on each frame's Copper-list at the minimum.  Maybe a simple JIT will make sense as well, to improve disk transfer speeds.)
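
To make the "address relocation on each frame's Copper-list" step concrete, here is a rough sketch. The on-disk format is entirely hypothetical. It assumes each frame stores its Copper list as a flat run of 16-bit words (register, data, register, data, ...), that pointers are stored as offsets from the start of the frame's Chip RAM chunk, and that a small relocation table gives the index of the data word holding the high half of each pointer, with the low half in the next MOVE. The only hardware facts used are that a Copper MOVE is a register/data word pair and that pointers are loaded as two 16-bit halves (BPL1PTH at $0E0, BPL1PTL at $0E2).

```python
def relocate_copper_list(words, reloc_indices, chip_base):
    """Return a copy of the Copper list with stored offsets turned into
    absolute Chip RAM addresses.

    words         -- flat list of 16-bit words: reg, data, reg, data, ...
    reloc_indices -- index of each data word holding the HIGH half of a
                     pointer; the LOW half is assumed to be at index + 2
    chip_base     -- address where this frame's chunk was loaded
    """
    out = list(words)
    for i in reloc_indices:
        offset = (out[i] << 16) | out[i + 2]      # rebuild the 32-bit offset
        address = (chip_base + offset) & 0xFFFFFFFF
        out[i] = address >> 16                    # patch the high word
        out[i + 2] = address & 0xFFFF             # patch the low word
    return out


# Hypothetical frame: point bitplane 1 at offset $1000, then end the list.
frame = [0x00E0, 0x0000,   # MOVE to BPL1PTH, high half of the offset
         0x00E2, 0x1000,   # MOVE to BPL1PTL, low half of the offset
         0xFFFF, 0xFFFE]   # WAIT that never triggers (end of list)
patched = relocate_copper_list(frame, [1], 0x00021000)
```

A table-driven pass like this only touches the pointer words, which is the minimum the CPU has to do per frame. Whether a JIT beats it depends on how much else changes between frames.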
 

Offline commodorejohn

  • Hero Member
  • *****
  • Join Date: Mar 2010
  • Posts: 3165
    • http://www.commodorejohn.com
Re: newb questions, hit the hardware or not?
« Reply #55 on: July 16, 2014, 12:53:05 PM »
You seem to have misread the "algorithm first, implementation later" rule of optimization as "algorithm first, implementation never, also you're stupid and horrible for thinking that a human being could ever be smarter than a piece of software engineered by human beings, or that multiple small numbers can add up into a larger number!" there, Thomas.
Computers: Amiga 1200, DEC VAXStation 4000/60, DEC MicroPDP-11/73
Synthesizers: Roland JX-10/MT-32/D-10, Oberheim Matrix-6, Yamaha DX7/FB-01, Korg MS-20 Mini, Ensoniq Mirage/SQ-80, Sequential Circuits Prophet-600, Hohner String Performer

"'Legacy code' often differs from its suggested alternative by actually working and scaling." - Bjarne Stroustrup
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #56 on: July 16, 2014, 01:00:59 PM »
@Thomas Richter

If you used GCC to generate 68k code, I'd have to ask which version.  The 68k backends have bit-rotted terribly due to lack of maintenance.  (And as Matthey observed, it misses loads of optimizations and may have never been fully complete in the first place.  Simply using an optimizing assembler like VASM instead of GAS would help too.)

Also, cycle counting works for compiler designers.  But if you want to avoid cycle counting you ought to choose your compiler more carefully than to use a bit-rotted heap of old code.  The x86 may have nearly-optimal code generation in free compilers but the 68000 has never had terribly good compilers.
 

guest11527

  • Guest
Re: newb questions, hit the hardware or not?
« Reply #57 on: July 16, 2014, 01:41:59 PM »
Quote from: SamuraiCrow;769124
@Thomas Richter

If you used GCC to generate 68k code, I'd have to ask which version.  

That's plain SAS/C 6.51, simply because the OS development chain depends on it (with the exception of Intuition, actually, which depended on a really rotten compiler). Anyhow, I stand by my opinion. Pointless argument. If you want to write a video codec, the *blitter* is the least of your problems. The decoding algorithm will make a huge difference, and even there it makes sense to optimize the algorithm first. Been there, done that. That was actually a JPEG 2000 codec, if you care.
 

guest11527

  • Guest
Re: newb questions, hit the hardware or not?
« Reply #58 on: July 16, 2014, 02:37:34 PM »
Quote from: commodorejohn;769123
or that multiple small numbers can add up into a larger number!" there, Thomas.

Please get your math fixed. If you have n algorithms, each of them spends 1/nth of the time in solving a problem, and each of them is sped up by 10%, the overall speedup is still 10%. In fact, if you only speed up one of them (e.g. layers) by 10%, the overall improvement is much smaller, depending on n, and even marginal.
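
The arithmetic can be checked with a few lines. This is a sketch under the simplest reading of the claim: n stages that each take 1/n of the run time, and "sped up by 10%" meaning a stage's time drops by 10%. The function name and the numbers are only for illustration.

```python
def total_time_saved(n, stages_sped_up, saving=0.10):
    """Fraction of total run time saved when `stages_sped_up` of `n`
    equally long stages each get `saving` (e.g. 0.10 = 10%) faster."""
    before = [1.0 / n] * n
    after = [
        t * (1.0 - saving) if i < stages_sped_up else t
        for i, t in enumerate(before)
    ]
    return 1.0 - sum(after)


all_ten = total_time_saved(10, 10)   # every stage 10% faster
just_one = total_time_saved(10, 1)   # only one stage 10% faster
```

With ten stages, speeding up all of them saves about 10% overall, while speeding up only one saves about 1%.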

If, however, you have an algorithm whose running time grows as O(N^2) (N being the number of layers being moved, arranged or resized) and that is replaced by an O(N) algorithm (as it happened, actually), then even for suitably small N the improvement can be enormous. It is really that simple. Do not waste your time optimizing the useless details. Get the big picture correct. Then, if performance is still not right, check where the problem is, find the bottlenecks, and either get rid of them by changing the algorithm, or optimize only there.
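
To put rough numbers on that, here is a toy cost model. It assumes one unit of work per layer pair for the quadratic version and one unit per layer for the linear one. Those unit costs are invented for the comparison and say nothing about what layers.library actually does per layer.

```python
def quadratic_cost(n_layers):
    """Work units for an O(N^2) approach: every layer against every layer."""
    return n_layers * n_layers


def linear_cost(n_layers):
    """Work units for an O(N) approach: each layer handled once."""
    return n_layers


def micro_optimised(cost):
    """The same algorithm with every step made 10% cheaper."""
    return cost * 9 // 10


# Print: layers, O(N^2) cost, O(N^2) cost after a 10% tweak, O(N) cost.
for n in (5, 20, 50):
    slow = quadratic_cost(n)
    print(n, slow, micro_optimised(slow), linear_cost(n))
```

Under this model the 10% tweak never changes the shape of the curve, while the switch to the linear version wins by a factor of N.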
 

Offline SamuraiCrow

Re: newb questions, hit the hardware or not?
« Reply #59 on: July 16, 2014, 02:44:28 PM »
Quote from: Thomas Richter;769125
That's plain SAS/C 6.51, simply because the OS development chain depends on it (with the exception of Intuition, actually, which depended on a really rotten compiler).


I've worked with that one also.  Generates pretty good code most of the time.  If you use deep orders of operation in a formula, it stuffs the temporaries to the stack regardless of how many registers are free for temporary variables.  Also, ChaosLord used SAS/C for his game writing and it occasionally would get confused and generate pure nonsense code that wouldn't even execute.  In that event inline Assembly is unavoidable.

Quote from: Thomas Richter;769125
Anyhow, I stand by my opinion. Pointless argument. If you want to write a video codec, the *blitter* is the least of your problems. The decoding algorithm will make a huge difference, and even there it makes sense to optimize the algorithm first. Been there, done that. That was actually a JPEG 2000 codec, if you care.


I would care, if I were making a bitmap-based codec.  I was planning on using mostly filled vectors though.  I know how to optimize a full-screen vector into the minimum number of line-draws so that the whole screen can do a single vector-fill operation.  That full-screen, full bitplane-depth pass is going to be costly though, as are the uncompressed audio samples.  I may have to triple-buffer the display and use the CPU to clear the screen after it's been displayed just to take some strain off the Blitter.