
Author Topic: New Replacement Workbench 3.1 Disk Sets www.amigakit.com  (Read 6746 times)


Offline matthey

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #74 from previous page: November 14, 2014, 12:36:42 PM »
Quote from: olsen;777325
Well, I'm not using it ;) So far I'm reasonably satisfied with the icon.library which I wrote for the OS 3.5 update. If there is a problem badly in need of a solution, it's in how workbench.library and icon.library interact for directory scanning and display. It just does not scale: larger icon files, more files, no matter what, the performance and responsiveness quickly goes down the drain.

I can't comment on the size and functionality of the replacement icon.library, as I have never used it. I only spent a couple of months rewriting the icon.library from scratch, integrating NewIcons support, colour icon support, etc., making it work better with workbench.library, building new APIs, etc. The focus was not on optimizations for size or speed, because icon loading is pretty much restricted by what the file system can do (and that isn't much). My focus was more on making the whole thing as robust as I could, and on opening up the API.

I appreciate that you are proud of your own work, and I'm not saying it's bad, but PeterK's icon.library really is significantly faster and it supports PNG and AmigaOS 4 icons, as well as everything it did before, while shrinking by over a third. There is good and then there is amazing ;).

Quote from: olsen;777325
Hey, I wrote "*more* refined and effective", and the reference was Lattice 'C' 5.04. SAS/C 6 was definitely an improvement considering the quality of the code optimization. However, it did take a couple of years to mature (1995-1996), by which time Commodore could no longer put it to good use.

The guys that did SAS/C were professional, fixing a lot of bugs and giving a lot of Amiga support. The basic code generation was OK, but they did some weird stuff like branching into CMP.L #imm,Dn instructions for little if any advantage, and they loved the double memory indirect addressing modes like ([d16,An],od.w), which were used more in later versions (IBrowse has 1968 uses). These didn't hurt the 68020 code as much as the 68040 and 68060, where instruction scheduling is sorely needed. There are way too many byte and word operations for the 68060, which performs best with longword operations. The direct FPU code generation is poor for the 6888x and worse for the 68040+. It should be possible to generate good quality code for the 68020-68060, excluding the 16-bit 68000.

Quote from: olsen;777325
I was told that Commodore was a driving force in getting SAS, Inc. to improve the code generator and the optimizer. They would submit samples of code as produced by the Green Hills compiler (obviously, they could not share compiler source code) and ask the compiler developers at SAS to replicate the results. Step by step the compiler improved.

Looking at other compilers' code generation is a good start. It's hard to imagine that the Green Hills compiler was once better after looking at the intuition.library disaster. The Green Hills compiler is still around and pretty well respected in the embedded market for its optimizing capabilities. They still have a ColdFire backend, but I couldn't tell whether they have dropped 68k support.

Quote from: olsen;777325
Colour me curious. Where do I start?

The lack of an interactive source debugger is something of a dealbreaker, though. I'd hate to go back to where I was back in 1987. Life's too short for peppering your code with printf()s and assert()s, rerunning it, watching it crash, modifying it and rerunning it all over again. Now CodeProbe may not be much fun, but it's not as big a productivity sink as "old school" debugging is.

Frank Wille's vbcc site is here:

http://sun.hasenbraten.de/vbcc/

Unfortunately, the version there is pretty old now. A new version should be available any time now (surely before the end of the year) with a huge number of bug fixes and improvements. You can also always e-mail Frank for the newest sources.

The newest version of vbcc for the Amiga 68k target generates Amiga symbols and debug information (with -g) in Amiga Hunk format executables, which is compatible with CodeProbe and BDebug. CodeProbe is a good debugger, but I prefer BDebug from the Barfly package.

http://aminet.net/dev/asm/BarflyDisk2_00.lha

BDebug is another great developer tool you should try if you have not.

Quote from: Thomas Richter;777326
Sorry, that's not exactly what the problem is. The trick that is used here is the following: If you have a file like this:

struct Library *GfxBase;

void LIBFUNC foo(struct RastPort *rp)
{
 SetAPen(rp,0);
}

then, with a properly defined LIBFUNC macro, SAS/C will place "GfxBase" as an object into the library you are creating (*not* the data segment), will create a library entry for "foo", and will use a6 to load a4 with the "near" data pointer, i.e. it will use the library base as "near" data. SAS/C can also create the fd file for you, or place the functions at specific offsets in the library.

Don't you mean the LIBFUNC macro causes the function to use A4 like a small data pointer to load A6 from GfxBase in the library base? What does the LIBFUNC macro look like?

Quote from: Thomas Richter;777326
As said, SAS/C provides a lot of system-specific "magic" to support AmigaOs development and a bit of the toolchain depends on this magic.

There are a couple of other "magical" support features it provides, so it does require some work to move from one compiler to another for system components. For user programs, this is much less of an issue.

The C language did not specify much back then, so every compiler had its own customized features and pragmas. We have better C standards now with C99, which should be used where possible over custom compiler features. It's always a pain to convert the old stuff though. You should see the GCCisms that the AROS 68k build system uses, which would need to be updated to compile with vbcc. It makes these problems look easy.
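
As one small example (mine, not from the AROS tree), the old GCC-only designated initializer syntax is a typical GCCism that C99 made standard:

Code: [Select]
  struct Node { int pri; const char *name; };

  /* struct Node n = { pri: 5, name: "timer" };     old GCC-only extension syntax */
  struct Node n = { .pri = 5, .name = "timer" };  /* C99 designated initializers */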
« Last Edit: November 14, 2014, 12:39:18 PM by matthey »
 

Offline wawrzon

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #75 on: November 14, 2014, 01:08:25 PM »
Quote from: Thomas Richter;777321
Relax, nobody is going to release a new kickstart anyhow, exactly due to the legal problems it would cause.


therefore i am trying to push for compiling the aros kickstart and eventually aros68k with vbcc, at least partly:

http://eab.abime.net/showthread.php?p=986550#post986550
http://aros-exec.org/modules/newbb/viewtopic.php?topic_id=8929&forum=2

i will document the progress also here (for starters):
http://www.amigaforum.de/index.php?topic=42.msg216#new
please understand that this is an initiative of a noob, so it relies on the expertise of others, may take ages and lead to nowhere. but "learning by doing"

anyway any help is appreciated. i successfully built a vbcc crosscompiler and am currently getting the aros amiga-m68k target built the regular way under gcc on ubuntu. one step at a time.
 

guest11527

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #76 on: November 14, 2014, 02:53:42 PM »
Quote from: matthey;777337
The basic code generation was OK, but they did some weird stuff like branching into CMP.L #imm,Dn instructions for little if any advantage
That's what the peephole optimizer does, to avoid a branch around an instruction. Instead, this instruction is hiding in the data of the cmp.l# instruction. That's probably not an advantage on the 060 as it probably invalidates the branch-prediction cache, but it was at least a common optimization on even older microprocessors, like the 6502 (yes, really) where the BIT instruction served a similar purpose for avoiding "short branches".

Quote from: matthey;777337
There are way too many byte and word operations for the 68060, which performs best with longword operations.
That rather depends on the source code. If the source uses a WORD, then what can the compiler do? There's an interesting interaction with the C language here that I only mention for the curious (it doesn't make the compiler better or worse, it is just a feature that makes the optimizer's life harder), and that is integer promotion. As soon as you have an operation with an integer literal (or any wider data type in general), the operand is first promoted to int. Thus, even something trivial like

short x = 2;
short y = x+1;

requires (per the language) first widening x to an int, then adding one, then casting it back down to a short. This is a trivial example where the optimizer will likely remove all the cruft of widening and narrowing, but there are more complicated examples like

if (x+1 == y)

which first widens x on the left, adds the integer one, then widens y, and then a full 32 bit comparison has to be made. And that's of course not the same as just adding one to x as a word, since it differs in a single corner case (the wrap-around), so all the widening cannot be optimized away. If I write that in assembler, and I know from other constraints that a wrap-around cannot happen (or I don't want to bother about it for other reasons, who knows..), then I can of course do much better with a single addq.w #1,dx, cmp.w. But it's strictly speaking not correct, and not the same comparison.

In the end, it doesn't really matter much, unless you're in a tight loop somewhere in a computing-intense algorithm, and then you would probably look closer on what is actually happening there.

So, long story short: Some of the "seemingly useless" instructions are really there to follow the C language specs.
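
For the curious, here is a minimal standalone illustration of that corner case (my own example, assuming a 16-bit short as on the 68k):

Code: [Select]
  #include <stdio.h>

  int main(void)
  {
      short x = 32767, y = -32768;
      /* Per the promotion rules, x+1 is computed as an int: 32768 != -32768. */
      printf("%d\n", x + 1 == y);          /* prints 0 */
      /* A word-sized addq.w/cmp.w shortcut effectively computes this instead;
         32767+1 wraps to -32768 on a two's-complement machine: */
      printf("%d\n", (short)(x + 1) == y); /* prints 1 */
      return 0;
  }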


Quote from: matthey;777337
Looking at other compilers' code generation is a good start. It's hard to imagine that the Green Hills compiler was once better after looking at the intuition.library disaster.
It's not really a disaster. Green Hills didn't have registerized parameters, thus you see a lot of register ping-pong, but that's probably the only bad thing about it. Besides, it isn't heavy-duty code to begin with.


Quote from: matthey;777337
Don't you mean the LIBFUNC macro causes the function to use A4 like a small data pointer to load A6 from GfxBase in the library base? What does the LIBFUNC macro look like?
LIBFUNC for SAS/C is just __saveds, i.e. requires the compiler to reload its NEAR data pointer, i.e. a4. Then there is another magic compiler switch that tells the compiler that the near data pointer comes actually from A6 plus an offset, where the offset depends on the size of the library base and whether there is any other magic that requires an offset from the library, to be determined at link time.

Thus, what the compiler essentially generates is a

lea NEAR(a6),a4

for __saveds in library code. Since NEAR is unknown until link time, the instruction remains in, even when the linker replaces NEAR with zero. Which is the reason why you see some seemingly useless "lea 0(a6),a4" in layers: the compiler could not possibly figure out that NEAR=0 here, and at link time it is too late to remove it.
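
The macro itself is tiny, by the way; a plausible sketch of how it could be defined (not the actual SAS/C header, just an illustration) is:

Code: [Select]
  #ifdef __SASC
  #define LIBFUNC __saveds   /* entry code reloads the near data pointer (a4) from a6 */
  #else
  #define LIBFUNC            /* other compilers: expands to nothing */
  #endif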


Quote from: matthey;777337
The C language did not specify much back then, so every compiler had its own customized features and pragmas. We have better C standards now with C99, which should be used where possible over custom compiler features.
Yes, but they don't have anything to say about library generation. A "shared library" is nothing C (or C99) has anything to say about, let alone an Amiga shared library. So in one way or another, it requires some compiler support to build one, even nowadays. SAS/C offered a pretty good infrastructure for that, which is the reason why it is still what I use today. There's nothing in C99 to help you with that. The only other alternative would be to use a couple of assembler stubs (aka "register ping-pong"), which is what happened for intuition. You didn't like that either. (-:

Quote from: matthey;777337
It's always a pain to convert the old stuff though. You should see the GCCisms that the AROS 68k build system uses, which would need to be updated to compile with vbcc. It makes these problems look easy.

I have no doubt about that, but that's pretty much the reason why I'm reluctant to switch compilers. These are code-generation problems I want to stay away from. I would reconsider if there were the potential for a dramatic speedup or a dramatic size reduction when switching, but that doesn't look too likely.
 

Offline matthey

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #77 on: November 14, 2014, 04:46:37 PM »
Quote from: Thomas Richter;777348
That's what the peephole optimizer does, to avoid a branch around an instruction. Instead, this instruction is hiding in the data of the cmp.l# instruction. That's probably not an advantage on the 060 as it probably invalidates the branch-prediction cache, but it was at least a common optimization on even older microprocessors, like the 6502 (yes, really) where the BIT instruction served a similar purpose for avoiding "short branches".


I don't see other compilers like GCC or vbcc doing this trick. The 68020+ can hide code in a TRAPcc.W or TRAPcc.L instruction (TPF on ColdFire) as well, to sometimes avoid a branch with an if-then-else (actually recommended and described in the ColdFire PRM). This technique can save a couple of cycles on the 68040 with its large instruction fetch, but it's not worth it on the 68020 and is usually slower on the 68060. It's not very friendly for debugging either.

Quote from: Thomas Richter;777348

That rather depends on the source code. If the source uses a WORD, then what can the compiler do? There's an interesting interaction with the C language here that I only mention for the curious (it doesn't make the compiler better or worse, it is just a feature that makes the optimizer's life harder), and that is integer promotion. As soon as you have an operation with an integer literal (or any wider data type in general), the operand is first promoted to int. Thus, even something trivial like

...

In the end, it doesn't really matter much, unless you're in a tight loop somewhere in a computing-intense algorithm, and then you would probably look closer on what is actually happening there.


Modern compilers will commonly promote shorter integer sizes to the register size if there isn't too much overhead. Many superscalar processors like the 68060 have internal optimizations for full register results and can only forward full register results. Unfortunately, the 68060 doesn't have the instructions needed to make this happen efficiently, like the MVS/MVZ ColdFire instructions and the MOVSX/MOVZX x86 instructions. I've been trying to get the ColdFire instructions (as encoded on the CF) into a new 68k-like ISA for the new FPGA processors coming out, where most recently I have referred to them as SXT/ZXT (SXT replacing the EXT name, which is still supported).

http://www.heywheel.com/matthey/Amiga/68kF_PRM.pdf

Promoting integers to the register size also simplifies the backend. Vbcc does this a lot and generates better 68060 code as a result, at some cost to 68020 code speed and size (the 68040 handles the big instructions but is slowed somewhat by the extra instructions). GCC is somewhere in the middle between vbcc and SAS/C, trying to be smart about whether to promote registers or not.
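
In C source terms, this is also an argument for keeping intermediates at the natural register width where you can; a trivial sketch (my own example):

Code: [Select]
  /* Illustrative only: a register-width accumulator and loop counter avoid some
     of the narrowing a 68060 backend would otherwise have to emit. */
  long sum_words(const short *p, int n)
  {
      long sum = 0;                 /* register-width accumulator */
      for (int i = 0; i < n; i++)   /* int loop counter rather than short */
          sum += p[i];              /* only the load itself needs to be word-sized */
      return sum;
  }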

Quote from: Thomas Richter;777348

It's not really a disaster. Green Hills didn't have registerized parameters, thus you see a lot of register ping-pong, but that's probably the only bad thing about it. Besides, it isn't heavy-duty code to begin with.


It may not be a processor-intensive library, but it's one that could have a significant percentage of its code optimized away. Register ping-pong usually isn't too bad, but it can commonly result in register spilling, which is bad. This compiler also did a poor job of peephole optimizing.

Quote from: Thomas Richter;777348

LIBFUNC for SAS/C is just __saveds, i.e. requires the compiler to reload its NEAR data pointer, i.e. a4. Then there is another magic compiler switch that tells the compiler that the near data pointer comes actually from A6 plus an offset, where the offset depends on the size of the library base and whether there is any other magic that requires an offset from the library, to be determined at link time.

Thus, what the compiler essentially generates is a

lea NEAR(a6),a4

for __saveds in library code. Since NEAR is unknown until link time, the instruction remains in, even when the linker replaces NEAR with zero. Which is the reason why you see some seemingly useless "lea 0(a6),a4" in layers: the compiler could not possibly figure out that NEAR=0 here, and at link time it is too late to remove it.


OK, so vbcc already has the __saveds attribute. It just needs a way to set the global data pointer in A4 to something other than what small data uses. It would be nice to add support for resident/pure executables that use global variables (in allocated memory) like SAS/C supports, as this is also similar. I need to do some further research and investigating (I have the SAS/C manuals, which are good). It may help if you could show how the custom data pointer is set up.

The "lea 0(a6),a4" may be difficult for the compiler to optimize but it's not a problem for a peephole optimizing assembler like vasm. There are only a few link code related optimizations that vasm can't make (where the section is unknown before linking) which are usually branches (JSR->BSR and JMP->BRA).
 

guest11527

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #78 on: November 14, 2014, 06:19:18 PM »
Quote from: matthey;777359
Modern compilers will commonly promote shorter integer sizes to the register size if there isn't too much overhead. Many superscalar processors like the 68060 have internal optimizations for full register results and can only forward full register results. Unfortunately, the 68060 doesn't have the instructions needed to make this happen efficiently, like the MVS/MVZ ColdFire instructions and the MOVSX/MOVZX x86 instructions. I've been trying to get the ColdFire instructions (as encoded on the CF) into a new 68k-like ISA for the new FPGA processors coming out, where most recently I have referred to them as SXT/ZXT (SXT replacing the EXT name, which is still supported).
That's pretty much what ColdFire was about: drop everything from the ISA that is not required for C, and arithmetic on words or bytes isn't. Instead, the move-and-extend instructions came in, which are useful for reading unsigned words or bytes.
Quote from: matthey;777359
It may not be a processor-intensive library, but it's one that could have a significant percentage of its code optimized away. Register ping-pong usually isn't too bad, but it can commonly result in register spilling, which is bad. This compiler also did a poor job of peephole optimizing.
Maybe so. A recompile wouldn't hurt, but see above. It ain't going to happen.    
Quote from: matthey;777359
OK, so vbcc already has the __saveds attribute. It just needs a way to set the global data pointer in A4 to something other than what small data uses. It would be nice to add support for resident/pure executables that use global variables (in allocated memory) like SAS/C supports, as this is also similar. I need to do some further research and investigating (I have the SAS/C manuals, which are good). It may help if you could show how the custom data pointer is set up.
It's basically set at the beginning of the __MERGED segment, or, if that grows larger than 32K, right in the middle, so data can be accessed with positive or negative offsets. I don't think the manual states that, at least I don't remember having seen it there. The trick for library code is that it is reloaded relative to the library base, and not relative to the __MERGED segment, so the data is allocated by exec when the library is created, allowing the lib to be placed in ROM.
Quote from: matthey;777359
The "lea 0(a6),a4" may be difficult for the compiler to optimize but it's not a problem for a peephole optimizing assembler like vasm. There are only a few link code related optimizations that vasm can't make (where the section is unknown before linking) which are usually branches (JSR->BSR and JMP->BRA).

No, look, the lea __NEAR(a6),a4 is never seen by the assembler. The __NEAR section is generated by the linker (or not generated at all, in the case of the library), and SLink is smart about which point of the segment a4 will point to, depending on how large the data is. Thus, in the end, __NEAR can be zero, or 0x8000, or some other value if constant data is moved into the __MERGED segment and addressed by absolute addresses rather than relative to a4. Thus, it is only the linker that knows what the symbol will be, and it is the linker that puts it in. For layers, the library base is so small that it is just put at the beginning and __NEAR remains zero, but when the linker puts in the zero offset, it is too late to patch up the code for a smaller move instruction, as the linker would have to resolve all relative branches around such lea's. Which, of course, it cannot do, since the references are simply lost at this point. The best it could do is to replace it with a move and a NOP, but that's not exactly an improvement either (in fact, it stalls the pipeline).
 

Offline wawrzon

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #79 on: November 14, 2014, 10:37:51 PM »
@thor

if you think vbcc is lacking in comparison with sas/c it might be worth talking to phx personally. perhaps there is room for improvement. i have a feeling he is open to suggestions and it would be great to have an up to date compiler for amiga-m68k that is actively maintained and aware of system requirements, which apparently is not the simplest case when going with gcc.
 

Offline olsen

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #80 on: November 15, 2014, 08:58:26 AM »
Quote from: matthey;777337
I appreciate that you are proud of your own work, and I'm not saying it's bad, but PeterK's icon.library really is significantly faster and it supports PNG and AmigaOS 4 icons, as well as everything it did before, while shrinking by over a third. There is good and then there is amazing ;).
I understand that PeterK's icon.library is written entirely in assembly language. Given the complexity of the whole design, warts and everything, that's nothing short of very impressive. Even the PNG decoder is genuinely integrated into the library code and not merely latched onto it. It appears that the library has been under constant development for almost four years now, and it shows.

Suffice it to say that spending four years on polishing an assembly language implementation of icon.library is the kind of luxury that few are able to afford, and which in the context of the OS 3.5 project would not have been an option.
Quote
The guys that did SAS/C were professional, fixing a lot of bugs and giving a lot of Amiga support. The basic code generation was OK, but they did some weird stuff like branching into CMP.L #imm,Dn instructions for little if any advantage, and they loved the double memory indirect addressing modes like ([d16,An],od.w), which were used more in later versions (IBrowse has 1968 uses). These didn't hurt the 68020 code as much as the 68040 and 68060, where instruction scheduling is sorely needed. There are way too many byte and word operations for the 68060, which performs best with longword operations. The direct FPU code generation is poor for the 6888x and worse for the 68040+. It should be possible to generate good quality code for the 68020-68060, excluding the 16-bit 68000.
It's possible that this development path was eventually taken. SAS, Inc. acquired the compiler so that it could more easily port their stochastic analysis package to more platforms and produce better quality ports. At the time the interest was less in making a better Amiga compiler than in providing other 68k platforms (namely the Apple Macintosh) with the SAS flagship software.

As far as I know the Amiga compiler business did not actually make much money (probably lost money), but it became a convenient test bed for compiler development. At the time it's likely that there were more users of the SAS/C compiler for the Amiga than there were customers for the SAS software that was built using the same compiler technology. Commercial support for SAS/C ended long before the last patch for SAS/C was released for the Amiga, and it's possible that further enhancements to the code generation were made that never saw integration into SAS/C for the Amiga.
Quote
Looking at other compilers' code generation is a good start. It's hard to imagine that the Green Hills compiler was once better after looking at the intuition.library disaster. The Green Hills compiler is still around and pretty well respected in the embedded market for its optimizing capabilities. They still have a ColdFire backend, but I couldn't tell whether they have dropped 68k support.
As far as I know the Green Hills compiler (referred to as "metacc" in the slim "AmigaDOS developer's manual", ca. 1985) used to have a major advantage not just in performing data flow analysis, but also in generating code sequences, back in 1984/1985.

This was an optimizing 'C' compiler intended for use on Sun 2 / Sun 3 workstations, which was adapted so that it emitted Amiga compatible 68k assembly language source code (as an intermediate language). That source code was then translated using a 'C' language precursor version of the ancient "assem" assembler into object code format suitable for linking. Mind you, this was not an optimizing assembler, just a plain translator. All optimizations happened strictly within the 'C' compiler.

What exactly rubs you the wrong way with Intuition?
« Last Edit: November 15, 2014, 11:06:47 AM by olsen »
 

Offline matthey

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #81 on: November 15, 2014, 08:38:49 PM »
Quote from: Thomas Richter;777374
It's basically set at the beginning of the __MERGED segment, or, if that grows larger than 32K, right in the middle, so data can be accessed with positive or negative offsets. I don't think the manual states that, at least I don't remember having seen it there. The trick for library code is that it is reloaded relative to the library base, and not relative to the __MERGED segment, so the data is allocated by exec when the library is created, allowing the lib to be placed in ROM.

It looks like vlink for the 68k Amiga will use the value 0x7ffe (32766) for the small data __MERGED section offset unless overridden.

Quote from: Thomas Richter;777374
No, look, the lea __NEAR(a6),a4 is never seen by the assembler. The __NEAR section is generated by the linker (or not generated at all, in the case of the library), and SLink is smart about which point of the segment a4 will point to, depending on how large the data is. Thus, in the end, __NEAR can be zero, or 0x8000, or some other value if constant data is moved into the __MERGED segment and addressed by absolute addresses rather than relative to a4. Thus, it is only the linker that knows what the symbol will be, and it is the linker that puts it in. For layers, the library base is so small that it is just put at the beginning and __NEAR remains zero, but when the linker puts in the zero offset, it is too late to patch up the code for a smaller move instruction, as the linker would have to resolve all relative branches around such lea's. Which, of course, it cannot do, since the references are simply lost at this point. The best it could do is to replace it with a move and a NOP, but that's not exactly an improvement either (in fact, it stalls the pipeline).

I see what you mean about the lea __NEAR(a6),a4 optimization now. The symbol isn't evaluated until link time, after the assembler. Vbcc does have cross-module optimizations (high optimization levels have bugs, so I don't generally use more than -O2) which could take care of the JSR->BSR and JMP->BRA optimizations I talked about earlier, but it's highly doubtful it would be able to take care of symbols that are defined for the linker. The assembler code vbcc's 68k backend generates for loading the small data base looks like this:

Code: [Select]
  xref t_LinkerDB
   lea t_LinkerDB,a4

That's not going to work, as the library base is dynamically allocated, making it impossible to put a label (t_LinkerDB) there. I believe it would need a "MOVE.L custom_DB,a4" or similar. So much for my hopes of being able to use most of the small data handling. It looks like it would need some custom work in the backend to make it happen, and it's tricky. Here are links to the vbcc 68k backend (machine.c) and startup.asm so you can have a look.

http://www.heywheel.com/matthey/Amiga/machine.c
http://www.heywheel.com/matthey/Amiga/startup.asm

Frank Wille can be e-mailed for the latest version of the vbcc sources.

Quote from: wawrzon;777418
if you think vbcc is lacking in comparison with sas/c it might be worth talking to phx personally. perhaps there is room for improvement. i have a feeling he is open to suggestions and it would be great to have an up to date compiler for amiga-m68k that is actively maintained and aware of system requirements, which apparently is not the simplest case when going with gcc.

If we could figure out what to do, we could make the changes and do the testing but Volker would need to look it over and ok it. The backend is complex enough that it's easy to have unintended consequences with changes.

Quote from: olsen;777487
This was an optimizing 'C' compiler intended for use on Sun 2 / Sun 3 workstations, which was adapted so that it emitted Amiga compatible 68k assembly language source code (as an intermediate language). That source code was then translated using a 'C' language precursor version of the ancient "assem" assembler into object code format suitable for linking. Mind you, this was not an optimizing assembler, just a plain translator. All optimizations happened strictly within the 'C' compiler.

What exactly rubs you the wrong way with Intuition?

There isn't anything wrong with how the Green Hills compiler works, but there are signs of a lack of maturity in the compiler, like this:

Code: [Select]
  movea.w ($c,a3),a0  ; movea.w moves and then sign extends to a longword
   move.l a0,d7  ; d7 is used later so this is needed
   ext.l d7  ; Unnecessary instruction! We are already sign extended!
   movea.l d7,a0  ; Unnecessary instruction! We are already sign extended!
Waste: 2 instructions, 4 bytes

Code: [Select]
  movea.w ($1c,a0),a1  ; movea.w moves and then sign extends to a longword
   move.l a1,d1  ; Unnecessary instruction! d1 is not used later!
   ext.l d1  ; Unnecessary instruction! We are already sign extended!
   movea.l d1,a1  ; Unnecessary instruction! We are already sign extended!
   cmpa.l a3,a1
Waste: 3 instructions, 6 bytes, 1 scratch register

Code: [Select]
  cmpi.l #$f3333334,d2
   blt lab_1521c
   cmpi.l #$f3333334,d2  ; Unnecessary big instruction! CC from first cmpi still valid
   bne lab_1521c
Waste: 1 instruction, 6 bytes

Code: [Select]
  divs.w #2,d0  ; could be asr.l #1,d0 as the remainder is unused
Waste: 2 bytes and a bunch of cycles

Code: [Select]
  move.l d0,d0  ; funny or should we say scary way to do a tst.l
   seq d0  ; the next 3 instructions could be replaced with and.l #1,d0
   neg.b d0
   ext.w d0  ; 68020 extb.l could replace next 2 instructions
   ext.l d0
   lea ($c,sp),sp
   ext.l d0  ; Unnecessary instruction!
Waste: 3 instructions, 2 bytes

When I see MOVE.L Dn,Dn, I know the compiler has problems with its register management. Compilers repeat the same mistakes, of course. Add in all the function stubs because it can't do registerized function parameters, and it's pretty ugly. It might be passable for a 68000, but for a 68020+ there are a lot of places where EXTB.L could be used, MOVE.L mem,mem instead of MOVE.B mem,mem, and index register scaling in addressing modes.
« Last Edit: November 15, 2014, 08:43:21 PM by matthey »
 

guest11527

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #82 on: November 15, 2014, 09:49:04 PM »
Quote from: matthey;777555
It looks like vlink for the 68k Amiga will use the value 0x7ffe (32766) for the small data __MERGED section offset unless overridden.
Looks like I got it almost right; it's been a long time. I had the impression that it also places a4 at the beginning of __MERGED when this is short enough. It may also depend on some other options, for example whether constant data is moved into this section or remains in the text section.
Quote from: matthey;777555
That's not going to work as the library base is dynamically allocated making it impossible to put a label (t_LinkerDB) there.
It would need a lea _DATA(a6),a4 here for libraries, as the "bss segment" is part of the library base, and the compiler-generated library startup code would have taken care of copying the constant data there. SAS also has a couple of additional options, such as giving every program that opens the library a new library base, if you configure it right. As said, there is a lot of magic happening here, also for "load and stay resident" programs, where SAS/C also provides a startup that copies the data segment into a private memory segment and relocates data from a private database of relocation offsets. I personally never had a use for this, but it's just another example of what the compiler was able to offer as a service and how well it was integrated into the system. As far as the quality of the compiled code goes, gcc 2.95 was ahead of SAS/C, but it was harder to use, even slower, and it was hard to use it for anything but POSIX-compliant C, i.e. integration was much worse.
Quote from: matthey;777555
There isn't anything wrong with how the Green Hills compiler works, but there are signs of a lack of maturity in the compiler, like this...
Just a couple of short comments here: First of all, you're describing a set of low-level peephole optimizations the compiler misses. I don't have any experience with Green Hills, but it's not untypical that many optimizations happen at a higher level through code-flow analysis, e.g. leading to dead-code removal, which does not show at the instruction level. Thus, just from looking at the instruction level, you only get a very incomplete view of the compiler and its performance. What you rather see here is probably a lack of maturity of the code generator, but that's only one out of many phases a compiler has to go through. That doesn't mean that the upper levels were any better, I just don't know. All I'm saying is that your judgement is a bit premature.

In addition, allow me to fix a couple of your suggestions: First, divs #2,d0 is *not* equivalent to asr.w #1,d0. The former rounds to zero, the latter rounds to minus infinity. Specifically, (-1/2) = 0, but (-1 >> 1) = -1, so you cannot replace one with the other. Condition codes are also not equivalent. tst.l d0, seq d0 is also not equivalent to and.l #1,d0. move.l d0,d0 is probably untypical for a tst, but it would probably work completely alike; it is indeed more likely that the code generator had a hiccup here and did not notice that there was no need to generate the code in the first place.

Allow me to add that if you look nowadays at the code gcc generates on x64 platforms, you'll find that it generates considerably ugly assembler from your sources, with many code repetitions and jumps around labels etc. Still, the results show that the decisions made by the compiler were not that bad; it's partially due to loop unrolling, and it also features pretty good high-level code analysis that can shorten a lot of computations, but at a much higher level than you would usually notice. Thus, it's not always easy to see from the assembler what the compiler really intended, or what the original code should have been. With modern compilers, that works, at best, only at the very low optimizer settings.
 

Offline matthey

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #83 on: November 16, 2014, 12:23:03 AM »
Quote from: Thomas Richter;777560
It would need a lea _DATA(a6),a4 here for libraries, as the "bss segment" is part of the library base, and the compiler-generated library startup code would have taken care of copying the constant data there. SAS also has a couple of additional options, such as giving every program that opens the library a new library base, if you configure it right. As said, there is a lot of magic happening here, also for "load and stay resident" programs, where SAS/C also provides a startup that copies the data segment into a private memory segment and relocates data from a private database of relocation offsets. I personally never had a use for this, but it's just another example of what the compiler was able to offer as a service and how well it was integrated into the system. As far as the quality of the compiled code goes, gcc 2.95 was ahead of SAS/C, but it was harder to use, even slower, and it was hard to use it for anything but POSIX-compliant C, i.e. integration was much worse.

I agree that it would be good to add support for several of the SAS/C custom data pointer features at the same time, as the support needed will be similar. It would probably need to link with a custom startup, as well as requiring changes in the 68k backend and some new options in vc (the vbcc launch program). I still need to do some more research, and I would probably need Frank's help. He wrote vlink, so he knows the linker stuff like the back of his hand, which could be very useful. If you look at the source, it's complex but at least manageable, unlike GCC. Jason McMullan tried to change some things in GCC for AROS, and what follows are some of his comments.

 
Quote from: Jason McMullan
Fix gcc to give diagnostics when it feels 'forced' to use a frame pointer, sufficient to allow a programmer to make the correct changes so that gcc would not make a frame pointer in that routine.
 
 - This code is very convoluted and ugly. I tried this once, and ran away screaming.
 

 
Quote from: Jason McMullan
Fix gcc to never need a frame pointer on m68k
 
 - This may be impossible. reload1.c is an impenetrable morass of evil.
 

With vbcc we have friendly, helpful support, manageable sources and source availability. That's about as good as it gets.

Quote from: Thomas Richter;777560
  Just a couple of short comments here: First of all, you're describing a set of low-level peephole optimizations the compiler misses.

I don't believe what I have described are peephole optimizations, except possibly for the DIVS instruction (MOVE Dn,Dn -> TST Dn would be a peephole optimization, but it's ridiculous enough that the best peephole assemblers don't look for it). Integer division by an immediate can involve a complex algorithm that converts the DIV to a multiply by a magic number, shifts and adds. This goes beyond a simple one-for-one low-level substitution. GCC has been doing magic-number integer DIV-to-MUL since the Amiga days, but Motorola ignorantly took away an important tool for the multiply on the 68k by removing the 64-bit multiply. The other examples show that the compiler was at times not aware of the data types and sizes in its registers and had issues with its register management. These are serious issues that could potentially lead to a crash, but it's bad enough that they cause poor quality code.
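
As a rough sketch of that kind of transformation (my own example, not actual compiler output), an unsigned 32-bit division by 10 becomes a 64-bit multiply by a magic constant:

Code: [Select]
  #include <stdint.h>

  /* x / 10 without a divide: multiply by 0xCCCCCCCD (= ceil(2^35 / 10)) and keep
     the top bits. This needs the full 64-bit product, which is why losing the
     64-bit multiply hurts. */
  uint32_t div10(uint32_t x)
  {
      return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
  }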

Quote from: Thomas Richter;777560
I don't have any experience with Green Hills, but it's not untypical that many optimizations happen at a higher level through code-flow analysis, e.g. leading to dead-code removal, which does not show at the instruction level. Thus, just from looking at the instruction level, you only get a very incomplete view of the compiler and its performance. What you rather see here is probably a lack of maturity of the code generator, but that's only one out of many phases a compiler has to go through. That doesn't mean that the upper levels were any better, I just don't know. All I'm saying is that your judgement is a bit premature. In addition, allow me to fix a couple of your suggestions: First, divs #2,d0 is *not* equivalent to asr.w #1,d0. The former rounds to zero, the latter rounds to minus infinity. Specifically, (-1/2) = 0, but (-1 >> 1) = -1, so you cannot replace one with the other. Condition codes are also not equivalent.

DIVS.W is a 32/16 division, so it is necessary to start with an ASR.L #1,D0. You are correct that a correction is necessary. Maybe something like this:

Code: [Select]
  asr.l #1,d0    ; C holds the bit shifted out, N the sign of the result
   bcc.b .skip   ; even operand: no correction needed
   bpl.b .skip   ; positive odd operand: asr already rounds toward zero
   addq.l #1,d0  ; negative odd operand: fix round-to-minus-infinity up to round-to-zero
.skip:

This is still a big savings over a hardware divide. This is normally not done by a peephole optimizer.

Quote from: Thomas Richter;777560
 tst.l d0, seq d0 is also not equivalent to and.l #1,d0.

My point was that Scc+NEG.B+EXT.W+EXT.L can be replaced by Scc+AND.L #1, which is significantly faster on most 68k processors.

Quote from: Thomas Richter;777560
move.l d0,d0 is probably untypical for a tst, but it would probably work completely alike; it is indeed more likely that the code generator had a hiccup here and did not notice that there was no need to generate the code in the first place. Allow me to add that if you look nowadays at the code gcc generates on x64 platforms, you'll find that it generates considerably ugly assembler from your sources, with many code repetitions and jumps around labels etc. Still, the results show that the decisions made by the compiler were not that bad; it's partially due to loop unrolling, and it also features pretty good high-level code analysis that can shorten a lot of computations, but at a much higher level than you would usually notice. Thus, it's not always easy to see from the assembler what the compiler really intended, or what the original code should have been. With modern compilers, that works, at best, only at the very low optimizer settings.

Optimized code may be ugly, especially if it's big and increases complexity. I didn't give any marks for looks though, other than the MOVE Dn,Dn remark. This code does not look particularly complex. It looks like it's mostly data movement with a lot of sign extensions, especially word->longword. It's too bad compilers hadn't learned that this comes for free when moving to an address register.

I did an ADis disassembly and a vasm reassembly with peephole optimizations, and intuition.library went from 114536 to 107776 bytes, a savings of 6760 bytes or 5.90%. I would expect that compiling for the 68020 would save somewhere around that much again. I don't know if vbcc would do any better, as it doesn't always generate the most optimized code yet either (vasm tries to clean it up, but only safe peephole optimizations can be made). Vbcc wouldn't need the function stubs anymore, so that could save more.
« Last Edit: November 16, 2014, 12:28:47 AM by matthey »
 

Offline danbeaver

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #84 on: November 16, 2014, 09:18:42 AM »
Did we go off topic?
 

guest11527

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #85 on: November 16, 2014, 09:29:13 AM »
Quote from: danbeaver;777597
Did we go off topic?

Pretty much so, but that's actually a pretty good discussion here and more fruitful than most others I've seen. Thus, if we can stay off-topic in that direction, I would appreciate it.
 

Offline amigadave

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #86 on: November 16, 2014, 09:32:56 AM »
Quote from: wawrzon;777418
@thor

............ it would be great to have an up to date compiler for amiga-m68k that is actively maintained and aware of system requirements, which apparently is not the simplest case when going with gcc.

With the possibility of getting new accelerators for Commodore Amiga computers using an FPGA loaded with a soft-core 680x0 running at 400MHz or more, new programming tools for 68k coding would be a great thing to see created and maintained.
How are you helping the Amiga community? :)
 

Offline olsen

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #87 on: November 16, 2014, 11:27:19 AM »
Quote from: matthey;777555

There isn't anything wrong with how the Green Hills compiler works, but there are signs of a lack of maturity in the compiler, like this:

...


When I see MOVE.L Dn,Dn, I know the compiler has problems with its register management. Compilers repeat the same mistakes, of course. Add in all the function stubs because it can't do registerized function parameters, and it's pretty ugly. It might be passable for a 68000, but for a 68020+ there are a lot of places where EXTB.L could be used, MOVE.L mem,mem instead of MOVE.B mem,mem, and index register scaling in addressing modes.
I think I understand your criticism. My experience with 68k assembly language could charmingly be described as "finding exceedingly clever ways not to use it", so I'm not an expert in writing or optimizing it ;)

From what I gather, an optimizing assembler would have had difficulties improving upon the sequences the compiler emitted, because the relationships between the sign extension and the repetition thereof are not easy to spot.

Shuffling register contents around in order to avoid pushing them to the stack looks like a reasonable strategy in 1983 terms. This sort of thing only started to become a problem when it caused pipeline stalls, didn't it? For a 68010 or 68020, which would have been the targets of the compiler, it should have worked fine.

The compiler seems to be restricted to emitting 68000 code only, so no "extb.l" for you ;)

Given its age (the 68000 didn't become available until 1979, if I remember correctly, and the core of the compiler seems to date back to 1983), this still makes it a pretty good compiler, which no doubt would have continued to mature over the 1980s when the 68000 family saw widespread adoption, not just in desktop computers.

Aside from the ABI issues (function parameters passed on the stack), I would criticize the compiler for being obtuse in its error reporting (at least, the logs which I saw are not particularly helpful). It (of course) follows K&R syntax rules, with a few oddities included. For example, you could pass structures "by value", and the compiler would cleverly pack 'struct Point { SHORT x; SHORT y; }' into a 32-bit scalar value which would be passed on the stack. The problem is, Intuition *assumes* that the 'struct Point' parameter will be passed as a scalar value, and if you change compilers (say to SAS/C 6.50) then this assumption no longer holds.
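
In source terms, the situation looks roughly like this (my reconstruction; the function name is made up):

Code: [Select]
  #include <exec/types.h>

  struct Point { SHORT x; SHORT y; };   /* two 16-bit members, 32 bits total */

  /* Passed "by value": metacc packs 'p' into a single 32-bit stack slot. */
  LONG AddPoint(struct Point p)
  {
      return (LONG)p.x + (LONG)p.y;
  }

  /* A caller built with a different compiler (say SAS/C 6.50) may lay the
     argument out differently, so code that assumes the packed-scalar form breaks. */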
 

Offline wawrzon

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #88 on: November 16, 2014, 11:27:19 AM »
Quote from: Thomas Richter;777599
Pretty much so, but that's actually a pretty good discussion here and more fruitful than most others I've seen. Thus, if we can stay off-topic in that direction, I would appreciate it.


im all for it.
 

Offline Minuous

Re: New Replacement Workbench 3.1 Disk Sets www.amigakit.com
« Reply #89 on: November 16, 2014, 12:57:48 PM »
Quote from: kolla;777229
Updated kickstarts would be great too. In general, an AmigaOS "3.2" with all essential bits updated to what is in 3.9 + Boing Bags, things like the shell, various libs, handlers and devices - a 3.9 without all the fluff (ReAction etc.) from 3.5 and 3.9.

That wouldn't be great at all. ReAction is hardly "fluff"; it's the official graphical interface for AmigaOS. If it were removed then most of Workbench would likewise cease to function (e.g. Preferences); it's not some optional component.