Author Topic: GCC asm() warning suppression options? (Read 9551 times)

Trev · « **on:** June 28, 2009, 12:13:56 AM »

Is the warning thrown for all values of N or just values outside the range (0..31) of the I constraint? Can you use N%32? EDIT: A value outside 0..31 might throw "error: impossible constraint in `asm'" in addition to the warning, depending on your version of GCC.

I don't know if that particular warning can be suppressed. Does it show up when optimizations are disabled?

Trev · « **Reply #1 on:** June 28, 2009, 12:31:06 AM »

Quote from: Karlos;513631

Turning off optimisations would somewhat defeat the purpose of this code. Dead code elimination at the very least is required to ensure the inlining only emits the one branch of the code.

I was just thinking GCC might have a bug in the parser when optimizations are enabled. It's been noted on x86 and Alpha. What does the preproccesed output look like for each possible branch?

Trev · « **Reply #2 on:** June 28, 2009, 12:56:25 AM »

Quote from: Karlos;513622

Code: [Select]
namespace Machine { // ... template<const uint32 N> inline uint32 rotRight32(uint32 val) { // N is shift size if (!(N&31)) { return val; } if (N<9) { asm("ror.l %1, %0" : "=d"(val) : "I"(N), "0"(val) : "cc"); } else if (N==16) { asm("swap %0" : "=d"(val) : "0"(val) : "cc"); } else if (N>23) { asm("rol.l %1, %0" : "=d"(val) : "I"(32-N), "0"(val) : "cc"); } else { [b]asm("ror.l %1, %0" : "=d"(val) : "d"(N), "0"(val) : "cc");[/b] } return val; } // ... }

Shouldn't the bold line be the same as the N<9 branch, or is your intent to use a data register if N<9 || N>23? If so, don't you need to move.l the value first?

Trev · « **Reply #3 on:** June 28, 2009, 01:01:57 AM »

Quote from: Karlos;513638

I should point out it's always bugged me that C never had operators for integer rotate :lol:

Yeah, I think it's a common gripe. A lot of compilers implement rotation as intrinsic functions.

Quote

If N is outside the range, this is considered a programming error and I'm happy for it to fail compilation.

Good call.

Trev · « **Reply #4 on:** June 28, 2009, 01:19:42 AM »

Which version of gcc are you using? Native or cross?

You could also specialize your templates a bit to perhaps make things a bit clearer:

Code: [Select]


template <uint32 N> inline uint32 rotRight32(uint32 val)
{
  if (N%32 < 9) {
    asm(&quot;ror.l %1, %0;&quot; : &quot;=d&quot;(val) : &quot;I&quot;(N%32), &quot;0&quot;(val) : &quot;cc&quot;);
  }
  else if (N%32 > 23) {
    asm(&quot;rol.l %1, %0;&quot; : &quot;=d&quot;(val) : &quot;I&quot;(32-(N%32)), &quot;0&quot;(val) : &quot;cc&quot;);
  }
  else {
    asm(&quot;ror.l %1, %0;&quot; : &quot;=d&quot;(val) : &quot;d&quot;(N%32), &quot;0&quot;(val) : &quot;cc&quot;);
  }

  return val;
}

template <> inline uint32 rotRight32<0>(uint32 val)
{
  return val;
}

template <> inline uint32 rotRight32<16>(uint32 val)
{
  asm(&quot;swap %0;&quot; : &quot;=d&quot;(val) : &quot;0&quot;(val) : &quot;cc&quot;);
  return val;
}

EDIT: For extra credit, let's do , too, i.e. rotRight32(-1) rotates left 1 bit. You could then generalize it as rotate(), where the sign of the operand determines the direction. Anyway....

Trev · « **Reply #5 on:** June 28, 2009, 01:24:12 AM »

Quote from: Karlos;513641

(which is not for N<9 | N>23 but for the intermediate ranges between 8 and 16 bit either direction)

Yeah, I'm not doing so well with the ranges today.

Trev · « **Reply #6 on:** June 28, 2009, 01:57:20 AM »

@x303

On PowerPC, yes, but Karlos is talking 680x0.

Trev · « **Reply #7 on:** June 28, 2009, 02:00:07 AM »

That's still not a rotation function. ;-) I don't think he's trying to do endian conversions.

Trev · « **Reply #8 on:** June 28, 2009, 02:22:51 AM »

Looks like gcc 2.95 falls through to a default warning("asm operand %d probably doesn't match constraints", i) for constraints that it doesn't know how to test.

Trev · « **Reply #9 on:** June 28, 2009, 02:40:43 AM »

And here's the snippet from gcc 2.95.3's recog.c in asm_operand_ok():

Code: [Select]


        case '0': case '1': case '2': case '3': case '4':
        case '5': case '6': case '7': case '8': case '9':
          /* For best results, our caller should have given us the
             proper matching constraint, but we can't actually fail
             the check if they didn't.  Indicate that results are
             inconclusive.  */
          result = -1;
          break;

So, it automatically falls through on matching constraints. What happens if you change "0" to "d0" to tell it operand 0 goes into a data register? EDIT: The point being that matched input operands aren't checked against the constraints previously specified for the output operand.

Trev · « **Reply #10 on:** June 28, 2009, 07:33:01 PM »

Quote from: Karlos;513700

As I expected, changing "0" to "d0", breaks, since there's nothing forcing the input into d0 in the first place:

Hmmm, well the intent wasn't to tell it to use d0, but rather that it should use the data register that matches operand 0. (Incidentally, "r0" works correctly on x86, but there is, of course, no r0 register to work around.) Whitespace should be ignored, so a constraint of "d0" might produce the correct result and get rid of the warning; otherwise, I guess the warning is there to stay.

Trev · « **Reply #11 on:** June 28, 2009, 07:41:24 PM »

Quote from: Karlos;513702

That said, I've gone back to the single function(s) and just made it handle all values of N:

I think your solution is best anyway, as values equivalent to 0 and 16 won't be reduced by the specialized templates, and you won't get the expected result.

I also think your idea of letting N < 0 (if signed inputs are allowed) and N > 31 fail at compile time is a good idea if you're worried about source typos.

For fun, though, I still think a general purpose rotation template is a cool idea. ;-) It's too bad C++ won't let you create custom operators without using macros or some other magic. You could throw in <<<, >>>, ^<<, ^>>, or whatever you want. Actually, would it be that difficult to add those to gcc? I don't know. Nothing wrong with non-standard extensions as long as they're well documented.

Trev · « **Reply #12 on:** June 28, 2009, 09:52:28 PM »

Looking at Microsoft's implementation, 8-bit and 16-bit rotations are intrinsics, and 32-bit and 64-bit rotations are functions. The functions are implemented using the traditional bitwise shift and bitwise or combination.

16-bit rotations are actually implemented using 32-bit registers, but the result is truncated:

x86

Code: [Select]

    volatile unsigned char a    = _rotl8(1, 1);
013213DE  mov         al,1 
013213E0  rol         al,1 
013213E2  mov         byte ptr [a],al 
    volatile unsigned short b   = _rotl16(1, 1);
013213E5  mov         eax,1 
013213EA  rol         ax,1 
013213ED  mov         word ptr [b],ax 
    volatile unsigned c         = _rotl(1, 1);
013213F1  push        1    
013213F3  push        1    
013213F5  call        @ILT+155(__rotl) (13210A0h) 
013213FA  add         esp,8 
013213FD  mov         dword ptr [c],eax 
    volatile unsigned long d    = _lrotl(1, 1);
01321400  push        1    
01321402  push        1    
01321404  call        @ILT+65(__lrotl) (1321046h) 
01321409  add         esp,8 
0132140C  mov         dword ptr [d],eax 
    volatile unsigned __int64 e = _rotl64(1, 1);
0132140F  push        1    
01321411  push        0    
01321413  push        1    
01321415  call        @ILT+280(__rotl64) (132111Dh) 
0132141A  add         esp,0Ch 
0132141D  mov         dword ptr [e],eax 
01321420  mov         dword ptr [ebp-38h],edx

x64

Code: [Select]

    volatile unsigned char a    = _rotl8(1, 1);
000000013FBF102A  mov         al,1 
000000013FBF102C  rol         al,1 
000000013FBF102E  mov         byte ptr [a],al 
    volatile unsigned short b   = _rotl16(1, 1);
000000013FBF1032  mov         ax,1 
000000013FBF1036  rol         ax,1 
000000013FBF1039  mov         word ptr [b],ax 
    volatile unsigned c         = _rotl(1, 1);
000000013FBF103E  mov         edx,1 
000000013FBF1043  mov         ecx,1 
000000013FBF1048  call        _rotl (13FBF10A8h) 
000000013FBF104D  mov         dword ptr [c],eax 
    volatile unsigned long d    = _lrotl(1, 1);
000000013FBF1051  mov         edx,1 
000000013FBF1056  mov         ecx,1 
000000013FBF105B  call        _lrotl (13FBF10A2h) 
000000013FBF1060  mov         dword ptr [d],eax 
    volatile unsigned __int64 e = _rotl64(1, 1);
000000013FBF1064  mov         edx,1 
000000013FBF1069  mov         ecx,1 
000000013FBF106E  call        _rotl64 (13FBF109Ch) 
000000013FBF1073  mov         qword ptr [e],rax

The good news--for Visual C++, anyway--is that the shift-or combination is correctly optimized into a rotate. Microsoft reduces with N&31 as well:

Code: [Select]

___rotl	PROC						; COMDAT

; 8    :     shift &= 0x1f;
; 9    :     val = (val>>(0x20 - shift)) | (val << shift);

  00000	b8 01 00 00 00	 mov	 eax, 1
  00005	d1 c0		 rol	 eax, 1

; 10   :     return val;
; 11   : }

  00007	c3		 ret	 0
___rotl	ENDP

Anyhow, just food for thought. I know GCC's m68k optimizer needs work, even in current versions.

Trev · « **Reply #13 on:** June 29, 2009, 11:19:07 PM »

Quote from: Karlos;513766

Thanks to the power of templated inline assembler, it doesn't matter how bad the m68k optimizer is, you've basically taken matters into your own hands :lol:

The things we do for cycles, eh?

Too true, but our compilers should be smart enough to know a rotation when they see it. I've got a marathon gcc-4.4.0 m68k build going at home. Taking hours--on a Core i7. That's what I get for not limiting newlib and g++ to m680x0 CPUs. (Plus, shouldn't we have schnazzy parallel compilers by now? Even a loopback distcc would be nice.) So, we'll see if newer GCCs are any better at optimizing rotatations.

EDIT: Oh, I guess we have 'make -j n', but it's not very robust.

Trev · « **Reply #14 on:** June 30, 2009, 05:50:28 AM »

Quote from: Karlos;513878

You're building gcc 4.4 for an m68k backend just to check this? That's hardcore

Hardcore boredom, maybe. ;-) gcc 4.4.0 is definitely smarter than gcc 2.95.3:

intput:

Code: [Select]

static inline unsigned _rotl(unsigned val, int shift)
{
    shift &= 0x1f;
    val = (val>>(0x20 - shift)) | (val << shift);
    return val;
}

int main(void)
{
    volatile unsigned val = 1;
    volatile unsigned a = _rotl(val, 1);
    return 0;
}

output:

Code: [Select]

00000000

:
   0:   4e56 fff8       linkw %fp,#-8
   4:   7001            moveq #1,%d0
   6:   2d40 fffc       movel %d0,%fp@(-4)
   a:   202e fffc       movel %fp@(-4),%d0
   e:   721f            moveq #31,%d1
  [b]10:   e2b8            rorl %d1,%d0[/b]
  12:   2d40 fff8       movel %d0,%fp@(-8)
  16:   4280            clrl %d0
  18:   4e5e            unlk %fp
  1a:   4e75            rts

It's even decided that a register based rotate right is faster than an immediate rotate left, I guess. Or maybe not. I can't get it to produce a rotate left. I guess gcc isn't an ambiturner.

In this case, at least, the optimizer does OK. I don't know anything about gcc internals, but maybe the optimization occurs at the RTL level based on the capabilities of the underlying architecture. That would mean an optimization should apply regardless of the architecture, as long as the architecture supports it.

But why doesn't it rotate left? The execution times are the same for both. Correction: The execution times are the same if the bit count is the same: 8+2n for register operand, 12+2n for immediate operand. What am I missing?

I compiled with an m68k-elf target, but now the m68k-amigaos gods are calling. I know there are some gcc 4.x.x builds out there somewhere, but I haven't used one. I'd also like to see ixemul and libnix go away and be replaced with newlib or another current library.

If necessary, one could even target specific releases, e.g. m68k-*-amigaos1.2, m68k-*-amigaos2.0, m68k-*-amigaos3.0, et al. It depends on how tightly coupled the tool chain is to the target environment. The main differences from a tool chain perspective, though, should be in the hunks supported. Everything else could be handled as it is today, and newlib, crt0.s, and amiga.lib could be written to run optimally on arbitrary releases. Having that magic at compile time, though, would result in much tighter binaries.

Author Topic: GCC asm() warning suppression options? (Read 9551 times)

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?

Trev

Re: GCC asm() warning suppression options?