Author Topic: GCC asm() warning suppression options? (Read 19937 times)

Karlos · « *Last Edit: June 30, 2009, 05:02:21 PM by Karlos* »

Quote from: Trev;513971

You have to trust that your target makefile is essentially thread-safe, i.e. all dependencies are properly documented for synchronization, no race conditions exist in similar commands used by different rules, etc.

That's basically what I was trying to suggest. I've not built anything as large as gcc and the toolchain with multiple concurrent jobs, but I can imagine concurrency issues can exist.

-edit-

Quote

Yeah, I'm really not sure why, since ror.l #n,Dn and ror.l Dx,Dn execute in the same number of cycles assuming n == Dx. Barring outside influences, the only reason to use a register is for a shift >8 and <24 (>8 in the opposite direction), as you've done in your template. Right?

Well, you could probably do it with two successive rotates, but I figured that the register method might be better than a pair of rotates. Having said that, I didn't test the latter. You'd swap a move for a rotate but you'd gain a free register overall.

I'll have to look into that.
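
For reference, the two-rotate idea can be sanity-checked with a small portable sketch. It assumes a 32-bit value and a compile-time count of 9 to 16, so that both steps fit the 1..8 immediate range of ror.l #n,Dn. The names `ror32` and `ror_two_step` are made up for illustration, and plain C++ stands in for the actual instructions, so this says nothing about cycle counts.

```cpp
#include <stdint.h>

// Portable right rotate for counts 1..31 (stand-in for a single ror.l).
static inline uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32u - n));
}

// Rotate right by a compile-time count N in 9..16 using two rotates
// whose counts (8 and N-8) would both fit an immediate operand.
template <unsigned N> inline uint32_t ror_two_step(uint32_t v)
{
    static_assert(N >= 9 && N <= 16, "sketch only covers counts 9..16");
    return ror32(ror32(v, 8u), N - 8u);
}
```

Something like `ror_two_step<12>(x)` should then match a single rotate by 12. Counts 17 to 23 would need the same trick with left rotates, which this sketch leaves out.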

Trev · « **Reply #45 on:** June 30, 2009, 06:10:34 PM »

Quote from: Karlos;513975

Well, you could probably do it with two successive rotates, but I figured that the register method might be better than a pair of rotates. Having said that, I didn't test the latter. You'd swap a move for a rotate but you'd gain a free register overall.

I'm leaning towards trusting the compiler. One can always hand optimize code, but writing specializations for all possible cases? Tedium. ;-)

Karlos · « **Reply #46 on:** June 30, 2009, 07:05:52 PM »

Quote from: Trev;513983

I'm leaning towards trusting the compiler. One can always hand optimize code, but writing specializations for all possible cases? Tedium. ;-)

The fact is, I needed rotate operations, so I wrote the functions (in pure C++) for 8/16 and 32-bit rotate.

After analysing the output (from older gcc) it was clear that the emitted code wasn't very good. So I wrote the inline assembler implementations, one of which you saw in this thread. There's no question they produce better code now than when they used shifts and ors.

Of course, if I use a better compiler, there's nothing stopping me turning off the inline assembler versions. It's just a compilation directive to use the asm tuned or normal C ones.
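
A minimal sketch of what such a switch could look like, assuming a made-up macro `USE_ASM_ROTATE` and a made-up function name `rotate_right`; the real codebase is not shown in this thread, so treat every name here as hypothetical. The asm branch is only selected on an m68k GCC build, and everything else falls back to plain C++.

```cpp
#include <stdint.h>

#if defined(USE_ASM_ROTATE) && defined(__GNUC__) && defined(__m68k__)
// Hand-tuned version: rotate count supplied in a data register.
static inline uint32_t rotate_right(uint32_t val, unsigned count)
{
    asm("rorl %1, %0" : "+d"(val) : "d"(count) : "cc");
    return val;
}
#else
// Plain C++ version: left to the compiler to turn into a rotate.
static inline uint32_t rotate_right(uint32_t val, unsigned count)
{
    count &= 31u;
    return (val >> count) | (val << ((32u - count) & 31u));
}
#endif
```

Callers only ever see `rotate_right`, so switching implementations is a matter of adding or dropping `-DUSE_ASM_ROTATE` on the command line.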

Trev · « **Reply #47 on:** June 30, 2009, 07:17:27 PM »

Quote from: Karlos;513995

After analysing the output (from older gcc) it was clear that the emitted code wasn't very good. So I wrote the inline assembler implementations, one of which you saw in this thread. There's no question they produce better code now than when they used shifts and ors.

Oh, for sure. I was just thinking that for the newer compiler, the optimizer should be much better at dynamically optimizing for all possible scenarios than I would be at hand coding them. I'm no amigaksi, after all.

Karlos · « **Reply #48 on:** June 30, 2009, 07:57:32 PM »

Quote from: Trev;514002

Oh, for sure. I was just thinking that for the newer compiler, the optimizer should be much better at dynamically optimizing for all possible scenarios than I would be at hand coding them. I'm no amigaksi, after all.

ROFL (+1)

Tell you what. Seeing as you've compiled a working 4.4 compiler, we could try some synthetic benchmarks. My template rotate versus a standard implementation based on shifting and or'ing that the compiler is left to optimize.

I'd actually be quite interested in the results.

Trev · « **Reply #49 on:** June 30, 2009, 09:59:17 PM »

Quote from: Karlos;514007

ROFL (+1)

;-)

Quote

Tell you what. Seeing as you've compiled a working 4.4 compiler, we could try some synthetic benchmarks. My template rotate versus a standard implementation based on shifting and or'ing that the compiler is left to optimize.

I'd actually be quite interested in the results.

Well, it's m68k-elf with no real back end. We could count cycles in a simulator, I suppose. :-)

Also, fun with generic templates. Here's an x86 rotate that reduces positive and negative shift values to a positive shift (assuming +right, -left), and "optimizes" based on the width of the shift:

Code:


template <signed N> inline unsigned rotate(unsigned val)
{
    // (32-(-(N%32)))%32 reduces any N to a right-rotate count in 0..31.
    if ((32-(-(N%32)))%32 != 0) {
        if ((32-(-(N%32)))%32 < 16) {
            // Short right rotate: ror with the count as an immediate.
            asm("rorl %1, %0;" : "=r"(val) : "I"((32-(-(N%32)))%32), "0"(val) : "cc");
        }
        else {
            // Long right rotate: use the equivalent shorter left rotate.
            asm("roll %1, %0;" : "=r"(val) : "I"(32-((32-(-(N%32)))%32)), "0"(val) : "cc");
        }
    }

    return val;
}

(I haven't looked at the execution times, so the optimization might not even make sense. But that wasn't the point, regardless.)

But guess what! N==0 (or any value that reduces to 0) throws this:

Code:


warning: asm operand 1 probably doesn't match constraints

Bugger! It still compiles, still runs, and doesn't leave any dead code. Not sure how to get rid of the warning, though, if it's parsing code it shouldn't be parsing after templatization. Template misuse, maybe?
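
A plausible explanation, offered as a guess rather than a confirmed diagnosis: both asm statements are still compiled even when the surrounding `if` is false at compile time. For a count that reduces to 0, the `roll` operand works out to 32, which is outside the 0..31 range that the x86 "I" constraint accepts. One way to avoid ever instantiating the offending statement is to pick the code through a template specialization instead of an `if`. The sketch below assumes GCC-style inline asm on x86 and falls back to plain C++ elsewhere; `reduced` and `rotate_impl` are made-up helper names, and the ror/rol split of the original is dropped to keep it short.

```cpp
// Reduce any signed N to a right-rotate count in 0..31.
template <int N> struct reduced {
    static const unsigned value = static_cast<unsigned>(((N % 32) + 32) % 32);
};

// Primary template: only ever instantiated with K in 1..31.
template <unsigned K> struct rotate_impl {
    static unsigned apply(unsigned val)
    {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
        asm("rorl %1, %0" : "+r"(val) : "I"(K) : "cc");
        return val;
#else
        // Portable fallback for other targets and compilers.
        return (val >> K) | (val << (32u - K));
#endif
    }
};

// Specialization for a zero count: contains no asm statement at all,
// so there is no out-of-range operand for the compiler to complain about.
template <> struct rotate_impl<0u> {
    static unsigned apply(unsigned val) { return val; }
};

template <int N> inline unsigned rotate(unsigned val)
{
    return rotate_impl<reduced<N>::value>::apply(val);
}
```

With this shape, `rotate<0>` and `rotate<32>` resolve to the specialization, while `rotate<-4>` becomes a right rotate by 28.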

Karlos · « **Reply #50 on:** June 30, 2009, 10:39:51 PM »

Quote from: Trev;514024

But guess what! N==0 (or any value that reduces to 0) throws this:

Code:

warning: asm operand 1 probably doesn't match constraints

Bugger! It still compiles, still runs, and doesn't leave any dead code. Not sure how to get rid of the warning, though, if it's parsing code it shouldn't be parsing after templatization. Template misuse, maybe?

Boo! So you basically get the same warning I started this whole thread in aid of? :roflmao:

It's only taken us 50 posts to come full circle

-edit-

Template misuse? Are you suggesting that the use of high level metaprogramming devices like templates to emit conditionally selected hand generated code directly for the assembler stage might be outside the original scope?

It isn't quite as cheeky as the processor trap -> C++ exception throw that I used. Frankly, I'm amazed that bugger worked at all. Inside the (asm) m68k trap handler (which you install into your exec Task structure), you poke the stack frame to change the return address to a function which does nothing other than throw an exception of a type suitably mapped to the nature of the trap. Saves having to check for divide by zero when you can just put a try/catch block around a bit of code and trap ZeroDivide.

Karlos · « **Reply #51 on:** June 30, 2009, 10:47:04 PM »

Quote

Well, it's m68k-elf with no real back end. We could count cycles in a simulator, I suppose. :-)

Or I could write the function to be benchmarked, you can compile it and post the assembler output of the function and I'll put that source back into a test project?

Trev · « **Reply #52 on:** June 30, 2009, 11:08:33 PM »

Quote from: Karlos;514035

Or I could write the function to be benchmarked, you can compile it and post the assembler output of the function and I'll put that source back into a test project?

Or that. :-P

Karlos · « **Reply #53 on:** June 30, 2009, 11:11:41 PM »

Quote from: Trev;514036

Or that. :-P

It will have to wait though, I have a date with the shower then bed. I'm wiped.

Trev · « **Reply #54 on:** June 30, 2009, 11:12:12 PM »

Quote from: Karlos;514033

Boo! So you basically get the same warning I started this whole thread in aid of? :roflmao:

It's only taken us 50 posts to come full circle

I think we're safe as long as no one invokes Sir Elton.

Quote

It isn't quite as cheeky as the processor trap -> C++ exception throw that I used. Frankly, I'm amazed that bugger worked at all. Inside the (asm) m68k trap handler (which you install into your exec Task structure), you poke the stack frame to change the return address to a function which does nothing other than throw an exception of a type suitably mapped to the nature of the trap. Saves having to check for divide by zero when you can just put a try/catch block around a bit of code and trap ZeroDivide.

Actually, that sounds like a quite valid use. Within the design of the operating system even. (Well, sort of. But manipulating stack frames is kind of at the core of exception handling, isn't it?)

Karlos · « **Reply #55 on:** June 30, 2009, 11:34:56 PM »

Quote from: Trev;514038

Actually, that sounds like a quite valid use. Within the design of the operating system even. (Well, sort of. But manipulating stack frames is kind of at the core of exception handling, isn't it?)

Well, yes, but not quite like this. Normally C++ exceptions operate entirely in userland and unwind the stack of the process they were fired in (well, if you omit threadsafe.lib in old gcc, watch the fun when that assertion fails).

Here, we are actually in the supervisor state, altering the saved stack frame of the thread that performed the illegal op and altering the return address such that when the trap is complete, it returns to a completely different location. Right into our code that throws the exception.

The old thread about that is on here somewhere. Amazingly it really does work very well and I built it into my codebase. I'm currently figuring out how to accomplish the same thing inside a signal handler under posix, but it always seems as if the exception occurred inside main() rather than where it really happened.

-edit-

Here it is: http://www.amiga.org/forums/showthread.php?t=25181

Trev · « **Reply #56 on:** July 01, 2009, 12:15:44 AM »

Were you ever able to simulate a null pointer exception, short of wrapping all pointers in a class and overloading the indirection operator?
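
For anyone wondering what the wrapper approach mentioned here would look like, a minimal sketch follows. `checked_ptr` and `null_pointer_error` are hypothetical names, not anything from the codebase under discussion, and the check costs a compare and branch on every dereference, which is exactly the overhead the trap-based approach avoids.

```cpp
#include <stdexcept>

// Hypothetical exception type thrown on a null dereference.
struct null_pointer_error : std::runtime_error {
    null_pointer_error() : std::runtime_error("null pointer dereference") {}
};

// Non-owning pointer wrapper that checks for null on every access.
template <typename T> class checked_ptr {
public:
    explicit checked_ptr(T* p = 0) : ptr_(p) {}

    T& operator*() const { return *check(); }
    T* operator->() const { return check(); }

private:
    // Throws instead of letting a null pointer reach the hardware.
    T* check() const
    {
        if (!ptr_) throw null_pointer_error();
        return ptr_;
    }

    T* ptr_;
};
```

A `try`/`catch (const null_pointer_error&)` around code using `checked_ptr` then behaves much like the ZeroDivide case, only in software.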

Trev · « **Reply #57 on:** July 01, 2009, 07:07:31 AM »

I suspect your template will be faster, but only because the optimizer isn't doing rol's:

Code:

template <signed N> static inline unsigned rotate(unsigned val)
{
    // (32-(-(N%32)))%32 reduces any N to a right-rotate count in 0..31.
    if ((32-(-(N%32)))%32 != 0) {
        if ((32-(-(N%32)))%32 < 9) {
            // Count 1..8: right rotate with an immediate count.
            asm("rorl %1, %0;" : "=d"(val) : "I"((32-(-(N%32)))%32), "0"(val) : "cc");
        }
        else if ((32-(-(N%32)))%32 > 23) {
            // Count 24..31: equivalent left rotate by 1..8, immediate count.
            asm("roll %1, %0;" : "=d"(val) : "I"(32-((32-(-(N%32)))%32)), "0"(val) : "cc");
        }
        else if ((32-(-(N%32)))%32 == 16) {
            // Count 16: swap the two 16-bit halves.
            asm("swap %0;" : "=d"(val) : "0"(val) : "cc");
        }
        else {
            // Everything else: right rotate with the count in a data register.
            asm("rorl %1, %0;" : "=d"(val) : "d"((32-(-(N%32)))%32), "0"(val) : "cc");
        }
    }

    return val;
}

static inline unsigned _rotl(unsigned val, int shift)
{
    shift &= 0x1f;
    val = (val>>(0x20 - shift)) | (val << shift);
    return val;
}

static inline unsigned _rotr(unsigned val, int shift)
{
    shift &= 0x1f;
    val = (val<<(0x20 - shift)) | (val >> shift);
    return val;
}

int main(void)
{
    volatile unsigned x = 1;

    volatile unsigned a = _rotl(x, 1);
    volatile unsigned b = _rotr(x, 1);

    volatile unsigned c = rotate<-1>(x);
    volatile unsigned d = rotate<1>(x);

    return 0;
}

/*
00000000 <main>:
   0:   4e56 ffec       linkw %fp,#-20

volatile unsigned x = 1;
   4:   7001            moveq #1,%d0

volatile unsigned a = _rotl(x, 1);
   6:   2d40 fffc       movel %d0,%fp@(-4)
   a:   202e fffc       movel %fp@(-4),%d0
   e:   721f            moveq #31,%d1
  10:   e2b8            rorl %d1,%d0
  12:   2d40 fff8       movel %d0,%fp@(-8)

volatile unsigned b = _rotr(x, 1);
  16:   202e fffc       movel %fp@(-4),%d0
  1a:   e298            rorl #1,%d0
  1c:   2d40 fff4       movel %d0,%fp@(-12)

volatile unsigned c = rotate<-1>(x);
  20:   202e fffc       movel %fp@(-4),%d0
  24:   e398            roll #1,%d0
  26:   2d40 fff0       movel %d0,%fp@(-16)

volatile unsigned d = rotate<1>(x);
  2a:   202e fffc       movel %fp@(-4),%d0
  2e:   e298            rorl #1,%d0
  30:   2d40 ffec       movel %d0,%fp@(-20)

return 0;
  34:   4280            clrl %d0
  36:   4e5e            unlk %fp
  38:   4e75            rts
*/

I don't know anything about how the optimizer works, really, so I don't know why it's always opting for one solution over another.
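
One caveat on the `_rotl`/`_rotr` helpers above: when `shift` reduces to 0, `0x20 - shift` is 32, and shifting a 32-bit unsigned value by 32 is undefined behaviour in C and C++. The test in `main` only uses a shift of 1, so it is not affected. A variant that masks the second count as well avoids the problem; `rotl32` and `rotr32` are made-up names, and whether a given GCC version still recognises this form as a rotate is something to check in the assembler output rather than assume.

```cpp
#include <stdint.h>

// Left rotate that stays well defined for every shift value, including 0.
static inline uint32_t rotl32(uint32_t val, unsigned shift)
{
    shift &= 31u;
    // (32 - shift) & 31 maps a shift of 0 to 0 instead of 32.
    return (val << shift) | (val >> ((32u - shift) & 31u));
}

// Right rotate, same idea with the shift directions swapped.
static inline uint32_t rotr32(uint32_t val, unsigned shift)
{
    shift &= 31u;
    return (val >> shift) | (val << ((32u - shift) & 31u));
}
```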

Karlos · « **Reply #58 on:** July 01, 2009, 07:41:32 AM »

Quote from: Trev;514045

Were you ever able to simulate a null pointer exception, short of wrapping all pointers in a class and overloading the indirection operator?

No. Well, I didn't try too hard as I was never able to get mmu.library working on my system. Every time I'd install it, it would drop to bits.

Karlos · « **Reply #59 on:** July 01, 2009, 07:51:18 AM »

Quote from: Trev;514073

I suspect your template will be faster, but only because the optimizer isn't doing rol's:

Well, that and the fact it doesn't require an additional register to hold the shift value for many of the sizes. Saving a register gives the optimizer more breathing space in 'real' code.

Quote

I don't know anything about how the optimizer works, really, so I don't know why it's always opting for one solution over another.

It could be that the RTL model only supports rotation in one direction? Just a guess.
