Author Topic: Disassembled ASM problem.  (Read 2852 times)

Offline Doobrey (Topic starter)

  • Hero Member
  • *****
  • Join Date: Oct 2002
  • Posts: 1876
    • Show only replies by Doobrey
    • http://www.doobreynet.co.uk
Disassembled ASM problem.
« on: November 11, 2004, 12:40:33 AM »
Just wondered if any of you ASM gurus could tell me wtf is going on with the branch to +2 ?
Code:

; a1=dos
; a0=exec
; on stack =proc struct,temp,dunno
LAB_0206:
MOVEM.L D2-D3/A3-A4/A6,-(A7) ;2D8A: 48E7301A
MOVEA.L 36(A7),A3 ;2D8E: 266F0024 Addr of buffer to a3
MOVEA.L 24(A7),A4 ;2D92: 286F0018

;## Clear whitespace from start.
LAB_0207:
CMPI #$528B,D0 ;2D96: 0C40528B
MOVE.B (A3),D0 ;2D9A: 1013
MOVEQ #32,D1 ;2D9C: 7220
CMP.B D1,D0 ;2D9E: B001 ;## Check for space.
BEQ.S LAB_0207+2 ;2DA0: 67F6
MOVEQ #9,D1 ;2DA2: 7209
CMP.B D1,D0 ;2DA4: B001 ;## Check for tab
BEQ.S LAB_0207+2 ;2DA6: 67F0


Seeing as the code at 0207+2 is actually 'addq.l #1,a3', why the compare?
Is it simply some compiler trick to shave a couple of bytes off the exe, and speed it up by not having another branch?
(Both IRA and VDA68k produced the same code)

BTW, I'll donate £10 to amiga.org if the first person to correctly answer this *isn't* Piru  :-)
On schedule, and suing
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
    • Show only replies by Karlos
Re: Disassembled ASM problem.
« Reply #1 on: November 11, 2004, 12:50:27 AM »
That's quite strange.

Assuming the code entry is at LAB_0206, the compare operation at LAB_0207 would seem to be entirely wasted, since the cc fields will be set by the move.b that follows it.

The compare also seems to take no part in the loop because the branch offsets skip past it. Actually, I am not sure they do. They jump to the address + 2, which, assuming a byte-based offset, actually jumps into the immediate word data of the instruction (the #$528B) - which is your addq #1, a3.

So the effect is that it harmlessly eliminates the addition on initial loop entry, since the word is interpreted as part of a compare (which does nothing useful in itself), and on each successive iteration it is interpreted as the addq.

Hence, I guess it's simply a method of suppressing the add operation the first time through the loop, as a form of flow optimisation in a do/while style construct.
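
To make the overlap concrete, here is a small Python sketch (an illustration added to the thread, not output from IRA or VDA68k). It recomputes both BEQ.S targets from the hex dump in the first post and decodes the word they land on. The helper names are made up for the example, and it assumes plain 8-bit short-branch displacements, which is what $67F6 and $67F0 carry.

```python
def sign_extend_8(value):
    """Interpret an 8-bit value as a signed two's-complement displacement."""
    return value - 0x100 if value & 0x80 else value


def bcc_short_target(address, opcode_word):
    """Target of a 68k short branch: (address + 2) + signed low byte of the opcode."""
    return address + 2 + sign_extend_8(opcode_word & 0xFF)


def decode_addq(word):
    """Decode ADDQ #q,Dn / ADDQ #q,An (0101 qqq 0 ss mmm rrr); None otherwise."""
    if (word >> 12) != 0b0101 or (word >> 8) & 1:
        return None
    size_bits = (word >> 6) & 3
    mode, reg = (word >> 3) & 7, word & 7
    if size_bits == 3 or mode > 1:
        return None                      # Scc/DBcc, or a mode this sketch skips
    quick = ((word >> 9) & 7) or 8       # a count field of 0 encodes the value 8
    return f"addq.{'bwl'[size_bits]} #{quick},{'da'[mode]}{reg}"


LAB_0207 = 0x2D96                        # address of the CMPI opcode word ($0C40)
first_beq = bcc_short_target(0x2DA0, 0x67F6)
second_beq = bcc_short_target(0x2DA6, 0x67F0)

# Both branches land on the CMPI's immediate word, i.e. LAB_0207+2.
print(hex(first_beq), hex(second_beq), decode_addq(0x528B))
# prints: 0x2d98 0x2d98 addq.l #1,a3
```

So when execution falls in at LAB_0207 the four bytes read as one CMPI.W, and when it is entered two bytes later the same $528B reads as the ADDQ.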
int p; // A
 

Offline Doobrey (Topic starter)

Re: Disassembled ASM problem.
« Reply #2 on: November 11, 2004, 01:00:10 AM »
Quote

Karlos wrote:
 They jump to the address + 2, which assuming a byte based offset actually jumps into the immediate word data of the instruction (the #$528B).


Yup.. and $528B is the 'addq.l #1,a3' I mentioned, which makes sense given what the code does (it's part of a routine that reads a config file and strips all whitespace and comments from each line before parsing with ParseArgs() )

I figured the compare is just a dummy op to save on branching past the addq when it's first called, otherwise it'd scan from the 2nd char in the line.
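
A rough Python model of that first-pass behaviour, added as an illustration. It assumes the loop simply falls through once it meets a byte that is neither space nor tab (the listing is cut off after the tab check, so what really follows is unknown), and the function name and the `bump_on_entry` flag are invented for the sketch.

```python
SPACE, TAB = 32, 9


def skip_whitespace(buf: bytes, bump_on_entry: bool = False) -> int:
    """Return the index of the first byte that is neither space nor tab.

    bump_on_entry=False models entering at LAB_0207, where the ADDQ is
    hidden inside the CMPI on the first pass. True models entering
    directly at the ADDQ. buf is assumed to end with a non-whitespace
    byte such as NUL, so the scan always stops inside the buffer.
    """
    a3 = 0
    first_pass = True
    while True:
        if bump_on_entry or not first_pass:
            a3 += 1                      # addq.l #1,a3
        first_pass = False
        d0 = buf[a3]                     # move.b (a3),d0
        if d0 == SPACE or d0 == TAB:     # the two beq.s LAB_0207+2
            continue
        return a3


print(skip_whitespace(b"  \tkey=1\0"))                    # 3
print(skip_whitespace(b"key=1\0"))                        # 0
print(skip_whitespace(b"key=1\0", bump_on_entry=True))    # 1
```

The last call shows the problem the dummy compare avoids: with the increment live on entry, a line with no leading whitespace would be scanned from its second character.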

It's something I've seen quite a bit lately.. poking about inside the OS. Some parts of the OS are riddled with it, others don't have it at all.

Anyway, ta for confirming my suspicions.
 

Offline Karlos

Re: Disassembled ASM problem.
« Reply #3 on: November 11, 2004, 01:01:38 AM »
I think you replied whilst I was editing my post :-)

As optimisations go, it's a bit pointless, as at best it saves one unconditional branch on loop entry. It's most likely a space-saving optimisation rather than a performance one.
 

Offline Doobrey (Topic starter)

Re: Disassembled ASM problem.
« Reply #4 on: November 11, 2004, 01:06:59 AM »
Ditto  :lol:
 

Offline Karlos

Re: Disassembled ASM problem.
« Reply #5 on: November 11, 2004, 01:10:26 AM »
The places that have this type of optimisation were possibly written in asm in the first instance. Those that don't are perhaps the result of C compilers?

Or even vice versa, if a C compiler is optimising for size ;-)

-edit-

Would this count as a form of self modifying code? It's certainly a "self reinterpreting" code ;-)
 

Offline Piru

  • \' union select name,pwd--
  • Hero Member
  • *****
  • Join Date: Aug 2002
  • Posts: 6946
    • Show only replies by Piru
    • http://www.iki.fi/sintonen/
Re: Disassembled ASM problem.
« Reply #6 on: November 11, 2004, 01:36:56 AM »
Quote
The places that have this type of optimisation were possibly written in asm in the first instance. Those that don't are perhaps the result of C compilers?

Such code is commonly generated by C compilers because it's typically faster than branching, and often reduces code size too.

Quote
Would this count as a form of self modifying code? It's certainly a "self reinterpreting" code

This indeed can cause some problems with 68060's Branch Prediction cache. Specifically it can result in Access Error exception with Branch Prediction Exception bit set (BPE bit in FSLW). In such cases the exception handler needs to flush the branch cache before continuing.

NOTE: This isn't something coders need to worry about, unless you're implementing your own OS or replacing the access error exception vector. :-)
 

Offline Karlos

Re: Disassembled ASM problem.
« Reply #7 on: November 11, 2004, 01:41:31 AM »
@Piru

As I said, in this instance it's wasted as a speed optimisation; it affects the first iteration of the loop only. Surely a size optimisation then - it would save 2 bytes overall compared to a bra.b into the loop body after the add instruction, right?

Given that on the 060 it may invoke an exception, it's hardly a speed optimisation anymore :-)
 

Offline Piru

Re: Disassembled ASM problem.
« Reply #8 on: November 11, 2004, 01:48:47 AM »
Quote
As I said, in this instance it's wasted as a speed optimisation, it affects the first iteration only. Surely a size optimisation then - it would save 2 bytes overall compared to a bra.b into the loop body after the add instruction, right?

Yeah.. But in cases where two sets of instructions are equally fast, if one of them needs fewer memory/cache instruction fetches, the smaller code is generally faster. This is highly academic though, as it depends heavily on the state of the cache and the total size of the code being executed, as well as the target CPU. Anyway, in essence a size optimization that doesn't slow down execution can be considered a speed optimization, too.

Quote
Given that on the 060 it may invoke an exception, it's hardly a speed optimisation anymore

Well, it was ok for 68020/68030 at least, possibly with 68040. But if the code is to be executed on 68060, it's obviously not recommended.
 

Offline Karlos

Re: Disassembled ASM problem.
« Reply #9 on: November 11, 2004, 01:59:02 AM »
All true, of course. I guess it's because I always see the loop unrolled in my mind that optimisations such as this appear virtually futile to me, other than for space saving ;-)

-edit-

Speaking of all things low down and dirty, did you solve my float to int problem yet? I need a non conditional branching way of handling the x=1.0 case and if anybody can solve that one it's you. Don't make me come and extinguish the sauna now.... :-D

-edit2-

Dang, I forgot, I posted that problem prior to the great thread massacre of 2004...
 

Offline Doobrey (Topic starter)

Re: Disassembled ASM problem.
« Reply #10 on: November 11, 2004, 02:14:18 AM »
Quote

Piru wrote:
But if the code is to be executed on 68060, it's obviously not recommended.


So would you be horrified to learn that snippet of code came from Setpatch (v44.38).. specifically the NSDPatch part?
 

Offline Doobrey (Topic starter)

Re: Disassembled ASM problem.
« Reply #11 on: November 11, 2004, 02:49:49 AM »
Quote

Karlos wrote:
at best it saves one unconditional branch on loop entry.


I dunno why they didn't just do..

 subq.l #1,a3
Loop:
 addq.l #1,a3

It doesn't make the code any bigger or use any extra branches.
As you've both said, it looks like it was the result of a C compiler being clever (maybe too clever!)
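
As a quick check on the "doesn't make the code any bigger" point, the sketch below (an added illustration) counts the bytes each loop-entry variant spends before the MOVE.B. Only $0C40 and $528B come from the listing. The SUBQ.L ($538B) and BRA.S ($6002) words are hand-encoded here from the 68k instruction formats, so treat them as assumptions to verify against an assembler.

```python
# Three ways to enter the loop without bumping A3 on the first pass.
# Each entry lists the 16-bit words emitted ahead of "move.b (a3),d0".
variants = {
    "overlap trick: cmpi.w #$528B,d0": [0x0C40, 0x528B],
    "subq.l #1,a3 then addq.l #1,a3":  [0x538B, 0x528B],
    "bra.s over addq.l #1,a3":         [0x6002, 0x528B],
}

# Every 68k opcode or extension word is two bytes long.
sizes = {name: 2 * len(words) for name, words in variants.items()}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```

If those encodings are right, all three variants come to four bytes, so any difference between them would be in cycles on entry rather than in size.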
 

Offline Gofromiel

  • Newbie
  • *
  • Join Date: Jul 2004
  • Posts: 38
    • Show only replies by Gofromiel
    • http://www.gofromiel.com
Re: Disassembled ASM problem.
« Reply #12 on: November 11, 2004, 06:20:48 AM »
Have you tried another disassembler like d68k? I've used it since birth and I've never seen an "addr+2" thing!

Offline PiR

  • Full Member
  • ***
  • Join Date: Apr 2003
  • Posts: 148
    • Show only replies by PiR
Re: Disassembled ASM problem.
« Reply #13 on: November 18, 2004, 06:29:33 PM »
@Gofromiel

Sorry, but I think you've missed the main point of this listing... Code like this cannot be disassembled cleanly.

@Doobrey

I guess this comes from the days of optimisation for the plain 68000. Notice that a single cmpi.w seems to be cheaper to execute than two arithmetical operations (both of which also set CC afterwards, so it is calculation + cmp for both).

But for newer processors your solution is obviously better (and much better than bra).

I think I've also read somewhere that on the 68040 it is faster to have fewer, longer instructions than a larger number of simpler ones.

No such problems for PowerPC code.  :-)

And as a last word - to extend what Piru said about the 68060 Branch Prediction Cache - for WinUAE users:
Imagine what all this can do to JIT emulation.
Let's assume that we even have an optimising JIT that is able to analyse and store whether the CC needs to be prepared by an instruction (or even eliminates dead-code instructions).