Author Topic: GCC asm() warning suppression options? (Read 19940 times)

Karlos · « **Reply #59 from previous page:** July 01, 2009, 07:51:18 AM »

Quote from: Trev;514073

I suspect your template will be faster, but only because the optimizer isn't doing rol's:

Well, that and the fact it doesn't require an additional register to hold the shift value for many of the sizes. Saving a register gives the optimizer more breathing space in 'real' code.

Quote

I don't know anything about how the optimizer works, really, so I don't know why it's always opting for one solution over another.

It could be that the RTL model only supports rotation in one direction? Just a guess.
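A sketch can't confirm or refute the guess about the RTL model. It can show that a backend which only models one rotate direction loses nothing in meaning, because a right rotate by n is a left rotate by 32 - n. What it may lose is the choice of the cheaper instruction. The sketch below assumes 32-bit values. `rotl32` and `rotr32_via_rotl` are illustrative names, not GCC internals.

```c
#include <stdint.h>

/* Rotate left by n (taken modulo 32). The complementary count is masked
   too, so n == 0 never produces an undefined shift by 32. */
static inline uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31;
    return (v << n) | (v >> ((32 - n) & 31));
}

/* A right rotate expressed only in terms of the left rotate:
   rotating right by n is the same as rotating left by 32 - n. */
static inline uint32_t rotr32_via_rotl(uint32_t v, unsigned n)
{
    return rotl32(v, (32 - (n & 31)) & 31);
}
```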

Trev · « **Reply #60 on:** July 01, 2009, 07:52:05 AM »

I've been digging into GCC's SSA trees, but it's getting late here. Maybe I'll have a moment of clarity tomorrow and actually understand how they work. :-P

Trev · « **Reply #61 on:** July 01, 2009, 07:53:48 AM »

Quote from: Karlos;514076

Well, that and the fact it doesn't require an additional register to hold the shift value for many of the sizes. Saving a register gives the optimizer more breathing space in 'real' code.

And I suspect that GCC will reduce to constant values anything that isn't defined as or determined to be volatile.

Karlos · « **Reply #62 on:** July 01, 2009, 07:54:13 AM »

Well, the day has just started here and I have to head off for work in 2 minutes.

Karlos · « **Reply #63 on:** July 01, 2009, 07:55:17 AM »

Quote from: Trev;514078

And I suspect that GCC will reduce to constant values anything that isn't defined as or determined to be volatile.

Not so sure it can do that inside an asm() though.
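GCC treats the text of an asm() as opaque, so it can't fold a constant through it. A common workaround is to test `__builtin_constant_p` and use a plain C expression, which the compiler can fold, when the operands are known at compile time. The asm is used only otherwise. The sketch below is minimal and rests on several assumptions. The m68k asm line uses the MIT syntax seen in the listing later in this thread, and it has not been tested on real hardware. On any other host, the code falls back to C so that it still builds. The names `rotl32_c`, `rotl32_var` and `ROTL32` are hypothetical.

```c
#include <stdint.h>

/* Plain C rotate: GCC can constant-fold this and pattern-match it. */
static inline uint32_t rotl32_c(uint32_t v, unsigned n)
{
    n &= 31;
    return (v << n) | (v >> ((32 - n) & 31));
}

/* Variable path. On m68k (assumed MIT syntax, untested) this is a
   non-volatile asm, so GCC may still CSE it or drop it if the result is
   unused; elsewhere it falls back to the C expression. */
static inline uint32_t rotl32_var(uint32_t v, unsigned n)
{
#if defined(__GNUC__) && (defined(__m68k__) || defined(__mc68000__))
    uint32_t cnt = n & 31;
    __asm__("roll %1,%0" : "+d"(v) : "d"(cnt));
    return v;
#else
    return rotl32_c(v, n);
#endif
}

/* Pick the foldable C form when both operands are compile-time constants.
   Arguments should be side-effect free. */
#define ROTL32(v, n) \
    ((__builtin_constant_p(v) && __builtin_constant_p(n)) \
        ? rotl32_c((v), (n)) : rotl32_var((v), (n)))
```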

Trev · « **Reply #64 on:** July 01, 2009, 07:24:56 PM »

In gcc 3.4.4, the traditional shift-or expressions are reduced to rotates while the SSA tree is being built, during constant folding and arithmetic reduction, before tree optimization and RTL generation occur. gcc 4.4.0 is probably similar. (I'm on a system without gcc 4.4.0 at the moment.) No idea what gcc 2.95.3 does yet.

EDIT: Hope to have an understanding later today of why gcc 4.4.0 m68k reduces to a shifted right rotate instead of a left rotate. None of this helps gcc 2.95.3, of course, but it's fun nonetheless.

EDIT2: Constant folding and arithmetic reduction should be done prior to RTL generation in gcc 2.95.3 as well.
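The folding Trev describes applies to the shift-or idiom in which both shift counts are constants that add up to the word width. That is the shape a constant folder can rewrite into a single rotate. Whether a particular GCC version actually does this is best checked with `-S`. The sketch below shows that shape for a count of 9, alongside a slow bit-by-bit reference used only to check it. `rotl9` and `rotl_slow` are hypothetical helpers.

```c
#include <stdint.h>

/* Constant-count shift-or: 9 + 23 == 32, the pattern that can be
   recognised as a 32-bit rotate left by 9. */
static inline uint32_t rotl9(uint32_t v)
{
    return (v << 9) | (v >> 23);
}

/* Reference rotate, one bit at a time, used only to verify the idiom. */
static uint32_t rotl_slow(uint32_t v, unsigned n)
{
    for (n &= 31; n != 0; --n)
        v = (v << 1) | (v >> 31);
    return v;
}
```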

Trev · « **Reply #65 on:** July 01, 2009, 09:38:42 PM »

gcc 2.95.3 isn't that bad, actually. For the most part, it optimizes in the same way your template would.

Code:


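static inline unsigned _rotl(unsigned val, int shift)
{
    shift &= 0x1f;
    /* Mask the complementary count too: for shift == 0, (0x20 - shift)
       would be 32, and shifting a 32-bit unsigned by 32 is undefined in C. */
    val = (val >> ((0x20 - shift) & 0x1f)) | (val << shift);
    return val;
}

static inline unsigned _rotr(unsigned val, int shift)
{
    shift &= 0x1f;
    /* Same masking as _rotl, so shift == 0 never shifts by 32. */
    val = (val << ((0x20 - shift) & 0x1f)) | (val >> shift);
    return val;
}

int main(void)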
{
    volatile unsigned x = 1;

    volatile unsigned c = _rotl(x, 64);
    volatile unsigned d = _rotl(x, 48);
    volatile unsigned e = _rotl(x, 41);
    volatile unsigned f = _rotl(x, 36);
    volatile unsigned g = _rotl(x, 32);
    volatile unsigned h = _rotl(x, 24);
    volatile unsigned i = _rotl(x, 16);
    volatile unsigned j = _rotl(x, 9);
    volatile unsigned k = _rotl(x, 4);
    volatile unsigned l = _rotl(x, 0);

    volatile unsigned m = _rotr(x, 0);
    volatile unsigned n = _rotr(x, 4);
    volatile unsigned o = _rotr(x, 9);
    volatile unsigned p = _rotr(x, 16);
    volatile unsigned q = _rotr(x, 24);
    volatile unsigned r = _rotr(x, 32);
    volatile unsigned s = _rotr(x, 36);
    volatile unsigned t = _rotr(x, 41);
    volatile unsigned u = _rotr(x, 48);
    volatile unsigned v = _rotr(x, 64);

    return 0;
}

Code:

00000000 :
   0:   4e56 ffac       linkw %fp,#-84
   4:   4eb9 0000 0000  jsr 0

volatile unsigned x = 1;
   a:   7001            moveq #1,%d0
   c:   2d40 fffc       movel %d0,%fp@(-4)

volatile unsigned c = _rotl(x, 64);
  10:   202e fffc       movel %fp@(-4),%d0
  14:   2d40 fff8       movel %d0,%fp@(-8)

volatile unsigned d = _rotl(x, 48);
  18:   202e fffc       movel %fp@(-4),%d0
  1c:   4840            swap %d0
  1e:   2d40 fff4       movel %d0,%fp@(-12)

volatile unsigned e = _rotl(x, 41);
  22:   202e fffc       movel %fp@(-4),%d0
  26:   7209            moveq #9,%d1
  28:   e3b8            roll %d1,%d0
  2a:   2d40 fff0       movel %d0,%fp@(-16)
  
volatile unsigned f = _rotl(x, 36);
  2e:   202e fffc       movel %fp@(-4),%d0
  32:   e998            roll #4,%d0
  34:   2d40 ffec       movel %d0,%fp@(-20)

volatile unsigned g = _rotl(x, 32);
  38:   202e fffc       movel %fp@(-4),%d0
  3c:   2d40 ffe8       movel %d0,%fp@(-24)
  
volatile unsigned h = _rotl(x, 24);  
  40:   202e fffc       movel %fp@(-4),%d0
  44:   e098            rorl #8,%d0
  46:   2d40 ffe4       movel %d0,%fp@(-28)

volatile unsigned i = _rotl(x, 16);
  4a:   202e fffc       movel %fp@(-4),%d0
  4e:   4840            swap %d0
  50:   2d40 ffe0       movel %d0,%fp@(-32)

volatile unsigned j = _rotl(x, 9);
  54:   202e fffc       movel %fp@(-4),%d0
  58:   e3b8            roll %d1,%d0
  5a:   2d40 ffdc       movel %d0,%fp@(-36)

volatile unsigned k = _rotl(x, 4);
  5e:   202e fffc       movel %fp@(-4),%d0
  62:   e998            roll #4,%d0
  64:   2d40 ffd8       movel %d0,%fp@(-40)
  
volatile unsigned l = _rotl(x, 0);
  68:   202e fffc       movel %fp@(-4),%d0
  6c:   2d40 ffd4       movel %d0,%fp@(-44)

volatile unsigned m = _rotr(x, 0);
  70:   202e fffc       movel %fp@(-4),%d0
  74:   2d40 ffd0       movel %d0,%fp@(-48)

volatile unsigned n = _rotr(x, 4);
  78:   202e fffc       movel %fp@(-4),%d0
  7c:   e898            rorl #4,%d0
  7e:   2d40 ffcc       movel %d0,%fp@(-52)

volatile unsigned o = _rotr(x, 9);
  82:   202e fffc       movel %fp@(-4),%d0
  86:   e2b8            rorl %d1,%d0
  88:   2d40 ffc8       movel %d0,%fp@(-56)

volatile unsigned p = _rotr(x, 16);
  8c:   202e fffc       movel %fp@(-4),%d0
  90:   7210            moveq #16,%d1
  92:   e2b8            rorl %d1,%d0
  94:   2d40 ffc4       movel %d0,%fp@(-60)

volatile unsigned q = _rotr(x, 24);
  98:   202e fffc       movel %fp@(-4),%d0
  9c:   7218            moveq #24,%d1
  9e:   e2b8            rorl %d1,%d0
  a0:   2d40 ffc0       movel %d0,%fp@(-64)

volatile unsigned r = _rotr(x, 32);
  a4:   202e fffc       movel %fp@(-4),%d0
  a8:   2d40 ffbc       movel %d0,%fp@(-68)

volatile unsigned s = _rotr(x, 36);
  ac:   202e fffc       movel %fp@(-4),%d0
  b0:   e898            rorl #4,%d0
  b2:   2d40 ffb8       movel %d0,%fp@(-72)

volatile unsigned t = _rotr(x, 41);
  b6:   202e fffc       movel %fp@(-4),%d0
  ba:   7209            moveq #9,%d1
  bc:   e2b8            rorl %d1,%d0
  be:   2d40 ffb4       movel %d0,%fp@(-76)

volatile unsigned u = _rotr(x, 48);
  c2:   202e fffc       movel %fp@(-4),%d0
  c6:   7210            moveq #16,%d1
  c8:   e2b8            rorl %d1,%d0
  ca:   2d40 ffb0       movel %d0,%fp@(-80)

volatile unsigned v = _rotr(x, 64);
  ce:   202e fffc       movel %fp@(-4),%d0
  d2:   2d40 ffac       movel %d0,%fp@(-84)

return 0;
  d6:   4280            clrl %d0

  d8:   4e5e            unlk %fp
  da:   4e75            rts

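If I had to choose a compiler based on this alone, I'd go with gcc 2.95.3. Notice, though, how it uses swap for _rotl(x, 16) and _rotl(x, 48) but not for _rotr(x, 16) and _rotr(x, 48), which load 16 into %d1 and use rorl instead. The same goes for direction changes on large shifts: _rotl(x, 24) becomes rorl #8, but _rotr(x, 24) is not turned into roll #8.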
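The asymmetry in the listing comes down to a few selection rules. The hypothetical planner below is not GCC code. It describes one register-free m68k sequence for a constant rotate. It assumes that the immediate rotate count on the 68000 is limited to 1..8 and that swap exchanges the 16-bit halves of a data register, which makes it equivalent to a rotate by 16. It does not model cycle counts, so it says nothing about which sequence is actually faster. Right rotates reuse the left-rotate plan, which is the symmetry that the _rotr() cases above did not get.

```c
#include <stdbool.h>

/* Illustrative plan for a 32-bit rotate; not a GCC data structure. */
struct m68k_rot_plan {
    bool     use_swap;   /* emit "swap" first (rotate by 16) */
    int      dir;        /* +1 = roll, -1 = rorl, 0 = no rotate */
    unsigned count;      /* immediate count, 1..8 when dir != 0 */
};

static struct m68k_rot_plan plan_rotl(unsigned n)
{
    struct m68k_rot_plan p = { false, 0, 0 };
    n &= 31;                               /* counts wrap at 32 bits */
    if (n == 0) {
        /* nothing to emit */
    } else if (n <= 8) {
        p.dir = +1; p.count = n;           /* roll #n */
    } else if (n >= 24) {
        p.dir = -1; p.count = 32 - n;      /* rorl #(32-n): flip direction */
    } else if (n == 16) {
        p.use_swap = true;                 /* swap alone */
    } else if (n < 16) {
        p.use_swap = true; p.dir = -1; p.count = 16 - n;  /* swap; rorl */
    } else {
        p.use_swap = true; p.dir = +1; p.count = n - 16;  /* swap; roll */
    }
    return p;
}

/* A right rotate by n is a left rotate by 32 - n, so it shares the plan. */
static struct m68k_rot_plan plan_rotr(unsigned n)
{
    return plan_rotl((32 - (n & 31)) & 31);
}
```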

Your template is better in that regard, but as you noted, you might exclude the templated asm from further optimization. I think, though, that the code should be optimized (or at least scheduled) properly as long as you don't use asm volatile (...).

Karlos · « **Reply #66 on:** July 01, 2009, 09:47:24 PM »

I don't use asm volatile in my templates as there's no reason to presuppose the code has to be emitted in every case. If the compiler can see the code is redundant it should be allowed to remove it.

Strange, though, I didn't get the anticipated rotate instructions generated by gcc 2.95.3. I wonder why?

-edit-

How is it with rotation of 8/16-bit types?
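In plain C, 8- and 16-bit rotates carry an extra wrinkle. The narrow operand is promoted to int before the shift, so the result has to be cast back to discard the bits pushed above the type's width. The sketch below is minimal. `rotl8` and `rotl16` are hypothetical names, and whether a given GCC turns them into rolb/rolw is exactly what would need checking in the generated assembly.

```c
#include <stdint.h>

/* 8-bit rotate: v is promoted to int, so the cast drops bits above bit 7. */
static inline uint8_t rotl8(uint8_t v, unsigned n)
{
    n &= 7;
    return (uint8_t)((v << n) | (v >> ((8 - n) & 7)));
}

/* 16-bit rotate: same idea, masking the count to 0..15. */
static inline uint16_t rotl16(uint16_t v, unsigned n)
{
    n &= 15;
    return (uint16_t)((v << n) | (v >> ((16 - n) & 15)));
}
```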

2.95.3's behaviour is slightly moot at this point as I'm hoping to use a later version anyway. Still a bit confused by your findings above though. Perhaps this could be down to stormgcc's backend? I was under the impression they hadn't messed about with the m68k compiler part at all.

Trev · « **Reply #67 on:** July 01, 2009, 10:08:01 PM »

Quote from: Karlos;514157

How is it with 8/16-bit rotate?

I'll take a look.

Quote

2.95.3's behaviour is slightly moot at this point as I'm hoping to use a later version anyway. Still a bit confused by your findings above though. Perhaps this could be down to stormgcc's backend? I was under the impression they hadn't messed about with the m68k compiler part at all.

I don't know. If the source on Alinea's web site is current, we can take a look.

I'm thinking I'll have a go at amigaos targets. I'm building win32 native, non-Cygwin tools, which I'm sure would be useful to others, particularly people who don't want their Cygwin environment hijacked by a single target à la the current solutions out there.

EDIT: The StormC gcc (m68k-storm) is a bit of a mess. They built a modified m68k-amigaos binutils, added an m68k-storm target to gcc (modified from the Geek Gadgets m68k-amigaos), configured for the target, and then created a bunch of StormC projects to bootstrap the compiler, probably from a vanilla Geek Gadgets install. Funky. Anyhow, I don't have it built yet, but I'm not seeing a benefit to completing it. StormC 4 is based on gcc 2.95.2. It's not difficult to get a new native m68k compiler.

And what I was really interested in is why gcc 4.4.0 doesn't optimize this correctly; in fact, it does worse than gcc 2.95.3 (which still isn't optimal). A shiny new gcc 4.4.0 m68k-*-amigaos* with fixed optimization (for this particular issue, anyway) and a native newlib implementation would be, well, shiny.

Trev · « **Reply #68 on:** July 03, 2009, 08:23:39 PM »

I've started adding m68k*-*-amigaos* target support to gcc 4.4.0, and I have a freestanding compiler built. There's a bug in the adtools gas parser (or in my build of it), however, that causes assembly like 'jsr a6@(-0x228:W)' to be assembled as 'js a6@(-0x228:W)', resulting in an assembler error. 'jsrr a6@(-0x228:W)' assembles as 'jsr a6@(-0x228:W)', so that's a bit funny. Anyway, I think it has something to do with the way the offsets are parsed. If the bit after -0x is longer than two characters, the parser eats the r in jsr.

So, I need to fix that before I can move forward.

Karlos · « **Reply #69 on:** July 03, 2009, 08:55:13 PM »

Quote from: Trev;514158

And what I was really interested in is why gcc 4.4.0 doesn't optimize this correctly; in fact, it does worse than gcc 2.95.3 (which still isn't optimal). A shiny new gcc 4.4.0 m68k-*-amigaos* with fixed optimization (for this particular issue, anyway) and a native newlib implementation would be, well, shiny.

Indeed it would

Kind of scary that what started out as what I hoped was a simple "is there an -Wno-complain-about-asm" option turned into this :laughing:

-edit-

Do I take it my crazy Machine::rot8/16/32() are still fair game, then?

Trev · « **Reply #70 on:** July 03, 2009, 09:04:28 PM »

Quote from: Karlos;514409

Kind of scary that what started out as what I hoped was a simple "is there an -Wno-complain-about-asm" option turned into this :laughing:

:-) Anything to pass the time.

Quote

Do I take it my crazy Machine::rot8/16/32() are still fair game, then?

I think so, yes. A "fixed" gcc should be able to properly reduce and generate optimal code for shift-or operations, however, and after that, the templates will be redundant. There's no reason why you shouldn't/wouldn't continue to use templates, though, if that fits your coding style. You think? I'd still get rid of the direction and width and use a set of overloaded rotate templates, though. ;-) Keep it generic.
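The "overloaded rotate templates" idea is C++, but plain C11 has a close analogue in `_Generic`. With it, the width comes from the argument's type, so callers never pass a width or direction flag. The sketch below is minimal, and its names (`rotl_u8`, `rotl_u16`, `rotl_u32`, `rotate_left`) are hypothetical. It is not Karlos's Machine::rot*() API.

```c
#include <stdint.h>

/* One helper per width; each masks the count to its own width. */
static inline uint8_t rotl_u8(uint8_t v, unsigned n)
{
    n &= 7;
    return (uint8_t)((v << n) | (v >> ((8 - n) & 7)));
}

static inline uint16_t rotl_u16(uint16_t v, unsigned n)
{
    n &= 15;
    return (uint16_t)((v << n) | (v >> ((16 - n) & 15)));
}

static inline uint32_t rotl_u32(uint32_t v, unsigned n)
{
    n &= 31;
    return (v << n) | (v >> ((32 - n) & 31));
}

/* C11 type-based dispatch. A type not listed here (e.g. a plain int
   literal) is a compile-time error rather than a silent wrong width. */
#define rotate_left(v, n) _Generic((v),  \
        uint8_t:  rotl_u8,               \
        uint16_t: rotl_u16,              \
        uint32_t: rotl_u32)((v), (n))
```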

Karlos · « **Reply #71 on:** July 03, 2009, 09:07:57 PM »

The reason I didn't go for "signed direction" with the rotate operations was that you don't get that behaviour with << or >> either, by default.

I should point out that the template versions only exist to optimise "constant" rotates. There are normal inline methods where the number of bits to rotate is a variable.

Karlos · « **Reply #72 on:** July 03, 2009, 09:14:51 PM »

I should probably rename this thread "Trev builds gcc 4.4 for m68k target"

Trev · « **Reply #73 on:** July 03, 2009, 09:22:12 PM »

Quote from: Karlos;514413

I should probably rename this thread "Trev builds gcc 4.4 for m68k target"

Not yet! My track record with finishing hobby projects isn't so great. It might be "Trev got bored porting gcc to m68k-amigaos and watched Red Dwarf and Star Trek instead."

Re: gas, I might be running into a problem with Microsoft's implementation of snprintf, which differs from the ISO C99 definition with regard to when the terminating NUL is appended. So, 'jsr' is probably being truncated to 'js\0' somewhere, but only if the operands are over a certain length. Maybe there's a buffer that's too small somewhere in the code. Anyhow, I'm just guessing, as I ran into a similar problem when cross-compiling vbcc.
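Under C99, snprintf always NUL-terminates when the size is nonzero and returns the length it would have written. Microsoft's older `_snprintf` instead returns a negative value on truncation and may leave the buffer unterminated. A defensive wrapper can cope with either convention. The sketch below is minimal, and `format_checked` is a hypothetical name. It makes no claim about where the adtools gas code actually formats the mnemonic.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Format into buf and report whether the whole string fit. Terminates the
   buffer itself and treats a negative return as truncation, so it also
   behaves if the underlying vsnprintf follows the pre-C99 convention. */
static bool format_checked(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (size == 0)
        return false;
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    buf[size - 1] = '\0';            /* always terminate, whatever the CRT did */
    return n >= 0 && (size_t)n < size;
}
```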

Karlos · « **Reply #74 on:** July 03, 2009, 09:25:15 PM »

From what I gather looking at MS, they pretty much want to deprecate the C standard library in favour of their own "safe" versions of everything.

I have to question the validity of their safe "strcpy" alternative. All you have to do is give it a bad destination buffer size. How is that any safer?
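The objection is easy to illustrate. A bounded copy only knows the size it is told, and one classic way to tell it the wrong size is to apply `sizeof` to a pointer parameter instead of to the array itself. The sketch below is minimal. `copy_bounded` and `wrong_size` are hypothetical helpers, not Microsoft's strcpy_s.

```c
#include <stddef.h>
#include <string.h>

/* Copies at most size - 1 bytes and always terminates; returns strlen(src)
   so the caller can detect truncation (result >= size). It cannot check
   that 'size' really matches dst. */
static size_t copy_bounded(char *dst, size_t size, const char *src)
{
    size_t len = strlen(src);
    if (size == 0)
        return len;
    size_t n = len < size - 1 ? len : size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return len;
}

/* Classic mistake: buf is a pointer here, so this is the pointer size,
   not the caller's buffer size. */
static size_t wrong_size(char *buf)
{
    return sizeof buf;
}
```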
