Amiga.org

Operating System Specific Discussions => Amiga OS => Amiga OS -- Development => Topic started by: Franko on August 29, 2011, 05:16:58 PM

Title: OT - assembler versus C for Amiga development
Post by: Franko on August 29, 2011, 05:16:58 PM
OT discussion continued from here: http://www.amiga.org/forums/showthread.php?t=58923

@ Jose @ Golem

Afraid C is not the programming language for me, for the simple reason that, as Golem said, the generated code is waaay bigger (normally increasing the size by a third at least) and it's a lot slower and highly inefficient... :(

I know a lot of folk back in the day who tried C and its variants but could never produce the results they wanted with it, especially when working with fast moving gfx, and most of them ended up coming back to assembler... :)

Assembler to me personally is very easy and simple to understand and use, and much easier to follow when writing large pieces of code, but best of all you get the absolute best speed and efficiency from your code, and at the end of the day on the Amiga that's what's important... :)
Title: Re: Intuition questions...
Post by: SamuraiCrow on August 29, 2011, 05:23:25 PM
Quote from: Franko;656712
@ Jose @ Golem

Afraid C is not the programming language for me for the simple reason that Golem said the generated code is waaay bigger (normally increasing the size by a third at least) and it's a lot slower and highly inefficient... :(


That's not a problem with the language, that is a problem with the compiler.  If Sidewinder and I can get Clang to work on AROS 68k, we'll have a bit newer compiler technology and maybe we'll get closer to parity with Assembly developers.

One example is that sometimes assembly programmers could pack 2 words in a long variable to get better register loading.  Clang does this if you use the PBQP register allocator.  I don't know of any existing Amiga compilers that do this.
Title: Re: Intuition questions...
Post by: Karlos on August 29, 2011, 05:36:34 PM
GCC has options for packing suitably small structures into single register fields. Certainly you can do it for return values using -freg-struct-return.

Packing several values, eg two shorts, into a register doesn't always make sense depending on the CPU you are coding for. For example, on 68040, almost any memory access that results in a direct cache hit is probably going to be faster than having to write several instructions to transform a register to get hold of the element you want, do the operation and then put it all back together again. Yet on the 020, the reverse is almost always true.

Of course, a good compiler should be able to apply the appropriate analysis to the code generation and see which makes more sense here.
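
A minimal sketch of the packing trick under discussion, written in plain C++ rather than taken from any compiler's output. The helper names (pack2, hi16, lo16, addToLow) are made up for illustration. It shows why the trade-off exists: the packed form saves a register, but touching one element costs extra shift and mask work.

```cpp
#include <cstdint>

// Hypothetical helpers: pack two 16-bit words into one 32-bit value,
// high word first, the way a 68k programmer would picture a data register.
inline std::uint32_t pack2(std::uint16_t hi, std::uint16_t lo)
{
  return (static_cast<std::uint32_t>(hi) << 16) | lo;
}

inline std::uint16_t hi16(std::uint32_t packed)
{
  return static_cast<std::uint16_t>(packed >> 16);
}

inline std::uint16_t lo16(std::uint32_t packed)
{
  return static_cast<std::uint16_t>(packed & 0xFFFFu);
}

// Updating one element means unpack, operate, repack. These are the extra
// instructions that may cost more than a cached memory access on some CPUs.
inline std::uint32_t addToLow(std::uint32_t packed, std::uint16_t delta)
{
  std::uint16_t lo = static_cast<std::uint16_t>(lo16(packed) + delta);
  return pack2(hi16(packed), lo);
}
```

Note that addToLow wraps the low word on overflow without disturbing the high word, which is the behaviour separate 16-bit variables would give.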
Title: Re: Intuition questions...
Post by: Franko on August 29, 2011, 05:37:13 PM
Quote from: SamuraiCrow;656715
That's not a problem with the language, that is a problem with the compiler.  If Sidewinder and I can get Clang to work on AROS 68k, we'll have a bit newer compiler technology and maybe we'll get closer to parity with Assembly developers.

One example is that sometimes assembly programmers could pack 2 words in a long variable to get better register loading.  Clang does this if you use the PBQP register allocator.  I don't know of any existing Amiga compilers that do this.


Whether it's the language itself or the compiler, it makes no difference... :)

C and its derivatives will always produce larger code and will never be as efficient in terms of speed as something written in pure assembler... :)

Way, way back we used to have this argument all the time especially with mates who went to Uni and for whatever reason they had to learn C. They would produce a bit of code in C on the Amiga claiming it was every bit as efficient as Assembler but ALWAYS those of us who coded in Assembler would prove this wrong by producing the same routine in Assembler smaller & faster... :)

The thing was, back then most folk were using an unexpanded A500 (as it cost about 250 quid for a 512KB RAM board), so with just the chipmem available coding had to be as efficient and fast as possible for such machines... :)

It taught us to be very proficient and not to be lazy when coding something so that we could get the very best out of such small resources and to this day that has always been to me the way to do things on the Amiga, small, efficient and speedy... :)
Title: Re: Intuition questions...
Post by: Thorham on August 29, 2011, 05:45:07 PM
Quote from: Franko;656712
Assembler to me personally is very easy and simple to understand and use and much easier to follow when writing large pieces of code but best of all you get the absolute best speed and efficiency from you code at the end of the day on the Amiga that's what's important... :)
Indeed. C is nice on the peecee, but on 680x0 Amigas, assembly language rules.
Title: Re: Intuition questions...
Post by: Karlos on August 29, 2011, 06:06:40 PM
Personally, I go for C(++) first. Anything that still isn't fast enough after reviewing algorithm changes and profiling, I will look at writing assembler replacements for.

Some stuff just doesn't need optimizing. Basic IO, for example, is almost always going to be limited by the speed of the device being communicated with. Event handling is another example. No amount of assembler will basically speed up having to wait for an asynchronous event to happen.

The problem with assembly coding is that unless you keep absolutely up-to-date with each revision of your target architecture, you will always fall foul of bad assumptions in the end.

There are many clock cycle optimisations for the basic 68000 that are slower on the 68020. Likewise, once you master the 020's behaviour, a lot of it ends up being counter-productive on the 68040.

Then there are general changes in system architecture. On the cacheless 68000, precomputed lookup tables were king to speed up various complex operations. As processors have gotten faster in relation to memory, it often ends up quicker to evaluate the expression than it does to precompute it and perform memory lookups, unless you can arrange your precomputed data in a very cache friendly way.
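
To make the table-versus-compute trade-off concrete, here is a small sketch that counts the set bits in a byte both ways. The function and type names are invented for this example, and nothing here says which version is faster: that depends on the CPU and on whether the table is in cache, which is exactly the point being made.

```cpp
#include <cstdint>

// Computed version: a few shifts, masks and adds, no memory access.
inline unsigned bitCountComputed(std::uint8_t v)
{
  unsigned x = v;
  x = (x & 0x55u) + ((x >> 1) & 0x55u);  // sums of adjacent bit pairs
  x = (x & 0x33u) + ((x >> 2) & 0x33u);  // sums within each nibble
  return (x & 0x0Fu) + (x >> 4);         // add the two nibble sums
}

// Table version: 256 precomputed answers, one memory read per call.
struct BitCountTable {
  std::uint8_t count[256];
  BitCountTable()
  {
    count[0] = 0;
    for (unsigned i = 1; i < 256; ++i) {
      // bits in i = lowest bit of i + bits in (i / 2), already filled in
      count[i] = static_cast<std::uint8_t>((i & 1u) + count[i >> 1]);
    }
  }
};

inline unsigned bitCountLookup(std::uint8_t v)
{
  static const BitCountTable table;  // built once, on first call
  return table.count[v];
}
```

Timing both on the actual target is the only reliable way to pick between them.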

Anyhow, this is somewhat off-topic.
Title: Re: Intuition questions...
Post by: Franko on August 29, 2011, 06:33:42 PM
Quote from: Karlos;656729
Personally, I go for C(++) first, then anything that still isn't fast enough after reviewing algorithm changes and profiling I will look at writing assembler replacements for.

Some stuff just doesn't need optimizing. Basic IO, for example, is almost always going to be limited by the speed of the device being communicated with. Event handling is another example. No amount of assembler will basically speed up having to wait for an asynchronous event to happen.

The problem with assembly coding is that unless you keep absolutely up-to-date with each revision of your target architecture, you will always fall foul of bad assumptions in the end.

There are many clock cycle optimisations for the basic 68000 that are slower on the 68020. Likewise, once you master the 020's behaviour, a lot of it ends up being counter-productive on the 68040.

Then there are general changes in system architecture. On the cacheless 68000, precomputed lookup tables were king to speed up various complex operations. As processors have gotten faster in relation to memory, it often ends up quicker to evaluate the expression than it does to precompute it and perform memory lookups, unless you can arrange your precomputed data in a very cache friendly way.

Anyhow, this is somewhat off-topic.


While I see the points you're making, they are not exactly correct however... :)

Take for example something written for doing IO like an HD DOS driver or device. If you wrote that entirely in C it would be highly inefficient in comparison to coding it in assembler... ;)

Sure, no matter which method you choose to write your code in, they both are restricted by hardware & the hardware bus and physical speeds of IO lines... :)

BUT that's not where it ends, as the actual code for shifting all this data back and forth is in the driver you write, and if you write it in C and not in highly optimised assembler you lose speed overall, as your routine performs its code x number of times per second... :)

No matter what you write and which version of the OS or processor you write it for, at the end of the day C will never outperform Assembler, proven fact and easy to prove... :)
Title: Re: Intuition questions...
Post by: Tension on August 29, 2011, 06:57:59 PM
Quote from: Karlos;656729
Personally, I go for C(++) first, then anything that still isn't fast enough after reviewing algorithm changes and profiling I will look at writing assembler replacements for.

Some stuff just doesn't need optimizing. Basic IO, for example, is almost always going to be limited by the speed of the device being communicated with. Event handling is another example. No amount of assembler will basically speed up having to wait for an asynchronous event to happen.

The problem with assembly coding is that unless you keep absolutely up-to-date with each revision of your target architecture, you will always fall foul of bad assumptions in the end.

There are many clock cycle optimisations for the basic 68000 that are slower on the 68020. Likewise, once you master the 020's behaviour, a lot of it ends up being counter-productive on the 68040.

Then there are general changes in system architecture. On the cacheless 68000, precomputed lookup tables were king to speed up various complex operations. As processors have gotten faster in relation to memory, it often ends up quicker to evaluate the expression than it does to precompute it and perform memory lookups, unless you can arrange your precomputed data in a very cache friendly way.

Anyhow, this is somewhat off-topic.


but very interesting!
Title: Re: Intuition questions...
Post by: golem on August 29, 2011, 07:07:02 PM
Quote from: Tension;656743
but very interesting!


+1.
Title: Re: Intuition questions...
Post by: SamuraiCrow on August 29, 2011, 07:13:34 PM
@Karlos

Could you split from post 11 onward into a separate thread?  I'd like to continue this discussion without going off-topic.
Title: Re: Intuition questions...
Post by: Karlos on August 29, 2011, 07:18:43 PM
Quote from: SamuraiCrow;656750
@Karlos

Could you split from post 11 onward into a separate thread?  I'd like to continue this discussion without going off-topic.


Done.
Title: Re: Intuition questions...
Post by: Karlos on August 29, 2011, 07:29:41 PM
Quote from: franko
While I see the points your making they are not exactly correct however...

Take for example something written for doing IO like an HD DOS driver or device. If you wrote that entirely in C it would be highly inefficient in comparison to coding it in assembler...

Sure not matter which method you choose to write your code in they both are restricted by hardware & the hardware bus and physical speeds of IO lines...

BUT that's not where it ends, as the actual code for shifting all this data back and forth is in the driver you write and if you write it in C and not in highly optimised assembler you lose speed overall as your routine performs it's code x amount of time per second...

You are making some poor assumptions there. If you are dealing with a slow bus, then "inefficient" generated code can often be as fast as hand-written assembler simply because the latency of the IO hides the cost of the operation being performed.

If you want demonstrable proof of this, look no further than C2P to chip RAM on any decent 040 or higher. The most highly tuned implementations tend to run at copy speed, that is to say, as fast as a vanilla unrolled move.l (a0)+, (a1) style loop. And yet they have many more instructions per longword transferred than the latter. The point being that the cost of many instructions (compared to a basic copy loop) is entirely masked by the slow bus.

Likewise, a naive C longword copy loop such as the following

Code: [Select]
while (count--) {
   *dest++ = *src++;
}

will perform almost as well as a hand written move.l based loop when it comes to slow buses like the Chip RAM or Zorro-II interface. However, the compiler will almost certainly unroll the above at any modest level of optimization, resulting in more efficient code than the above loop implies.

Sure there are other tricks you can try, like playing around with MMU settings and imprecise cache modes on 060 that can get you a boost, so you can definitely improve upon what vanilla C can do in some cases, but not all.

I've tested various techniques to try and burst data faster to my graphics card, using hand generated move16 and other such contrivances, and in the end they simply weren't significantly faster than well tuned C code (I used a 16x unrolled Duff's device loop).
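
For readers who haven't met it, a 16x unrolled Duff's device longword copy looks roughly like this. It is a generic sketch, not the code from the post; the name copyLongsDuff16 is hypothetical, and it assumes both pointers are valid for count longwords and that the regions don't overlap.

```cpp
#include <cstddef>
#include <cstdint>

// Copies `count` longwords from src to dest, 16 per full loop iteration.
inline void copyLongsDuff16(std::uint32_t* dest, const std::uint32_t* src,
                            std::size_t count)
{
  if (count == 0) {
    return;
  }
  std::size_t passes = (count + 15) / 16;  // iterations, incl. the partial first one
  switch (count % 16) {                    // jump into the middle of the unrolled body
    case 0:  do { *dest++ = *src++;
    case 15:      *dest++ = *src++;
    case 14:      *dest++ = *src++;
    case 13:      *dest++ = *src++;
    case 12:      *dest++ = *src++;
    case 11:      *dest++ = *src++;
    case 10:      *dest++ = *src++;
    case 9:       *dest++ = *src++;
    case 8:       *dest++ = *src++;
    case 7:       *dest++ = *src++;
    case 6:       *dest++ = *src++;
    case 5:       *dest++ = *src++;
    case 4:       *dest++ = *src++;
    case 3:       *dest++ = *src++;
    case 2:       *dest++ = *src++;
    case 1:       *dest++ = *src++;
             } while (--passes > 0);
  }
}
```

The switch handles the leftover count % 16 longwords on the first pass, so no separate clean-up loop is needed. Some compilers may warn about the deliberate case fall-through.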

Anyway, the two aren't in opposition. One of the best features of C is that it's usually fairly easy to add assembler in places where you know the compiler isn't going to be able to compete with your own ingenuity or domain knowledge. Meanwhile, it takes the drudgery out of almost everything else.
Title: Re: Intuition questions...
Post by: golem on August 29, 2011, 07:37:43 PM
Quote from: Karlos;656729
Personally, I go for C(++) first, then anything that still isn't fast enough after reviewing algorithm changes and profiling I will look at writing assembler replacements for.

Some stuff just doesn't need optimizing. Basic IO, for example, is almost always going to be limited by the speed of the device being communicated with. Event handling is another example. No amount of assembler will basically speed up having to wait for an asynchronous event to happen.

The problem with assembly coding is that unless you keep absolutely up-to-date with each revision of your target architecture, you will always fall foul of bad assumptions in the end.

There are many clock cycle optimisations for the basic 68000 that are slower on the 68020. Likewise, once you master the 020's behaviour, a lot of it ends up being counter-productive on the 68040.

Then there are general changes in system architecture. On the cacheless 68000, precomputed lookup tables were king to speed up various complex operations. As processors have gotten faster in relation to memory, it often ends up quicker to evaluate the expression than it does to precompute it and perform memory lookups, unless you can arrange your precomputed data in a very cache friendly way.

Anyhow, this is somewhat off-topic.


I am not a programmer, only a hobbyist and professionally in IT support, but I get your point that whether you go to machine code is dependent upon what you are trying to do. I had a big project that was very CPU intensive (mainly subset generating algorithms) and I coded this in 68k machine code, which was fast but stupid. It taught me about the 68000, but when I converted it to C 12 years later I could even recompile it on Linux with very few changes and it worked. If I was banging the Amiga hardware then obviously this wouldn't have been possible, and I suppose that is one of the cases where assembler rules.
Title: Re: Intuition questions...
Post by: SamuraiCrow on August 29, 2011, 07:43:47 PM
Most of the reason you write C code isn't for performance but maintainability.

I've rewritten the hash function of my AmigaE hash table class in Assembly.  It cut out a lot of cruft but mostly the cruft was the result of E not supporting bit rotations.

In order to run it on a non-Classic Amiga, such as an AROS system, I also had to write the code in PortablE. It generated some hacky-looking C++ code, but the GCC compiler knows how to convert a couple of shifts and an OR to a rotate internally.  Now suddenly I don't have to worry about writing new Assembly code for my x86 AROS hosted environment for the Mac, nor for PPC AROS, nor anything else.
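
The shift-and-OR idiom mentioned here can be sketched as follows. The function names rotl32 and rotr32 are assumptions for the example; whether a particular compiler and optimization level turns them into a single rotate instruction is worth checking in the generated assembly rather than taking on faith.

```cpp
#include <cstdint>

// Rotate left written as two shifts and an OR. Masking both shift counts
// keeps them in 0..31, so a count of 0 or 32 is not undefined behaviour.
inline std::uint32_t rotl32(std::uint32_t val, unsigned bits)
{
  bits &= 31u;
  return (val << bits) | (val >> ((32u - bits) & 31u));
}

// Rotate right, the mirror image of the above.
inline std::uint32_t rotr32(std::uint32_t val, unsigned bits)
{
  bits &= 31u;
  return (val >> bits) | (val << ((32u - bits) & 31u));
}
```

Because this is ordinary C++, the same source builds for 68k, x86 or PPC without a per-CPU assembly version.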

It's a tradeoff that's becoming increasingly biased against hand-optimized code beyond what C can offer.
Title: Re: Intuition questions...
Post by: Karlos on August 29, 2011, 07:53:54 PM
@SamuraiCrow

LOL, regarding bitwise rotate, I totally agree:
Code: [Select]

///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
//
//  File:         platforms/amigaos3_68k/systemlib/machine_bitops_native.hpp
//  Tab Size:     2
//  Max Line:     120
//  Description:  AmigaOS Specific implementation of systemlib internals
//  Comment(s):
//  Library:      System
//  Created:      2006-10-08
//  Updated:      2006-10-08
//  Author(s):    Karl Churchill
//  Note(s):
//  Copyright:    (C)2006+, eXtropia Studios
//                Karl Churchill
//                All Rights Reserved.
//
///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

#ifndef _EXNG2_SYSTEMLIB_BITOPS_NATIVE_HPP
# define _EXNG2_SYSTEMLIB_BITOPS_NATIVE_HPP

////////////////////////////////////////////////////////////////////////////////
//
//  Native bit operations
//
////////////////////////////////////////////////////////////////////////////////

namespace Machine {

  template<typename T>
  inline T _rotLeft8(uint32 bits, T val)
  {
    if (__builtin_constant_p(bits)) {
      if (bits&7) {
        asm(
          "rol.b %1, %0\n"
          : "=d"(val) : "I"(bits&7), "0"(val) : "cc"
        );
      }
    } else {
      asm(
        "rol.b %1, %0\n"
        : "=d"(val) : "d"(bits), "0"(val) : "cc"
      );
    }
    return val;
  }

  template<typename T>
  inline T _rotLeft16(uint32 bits, T val)
  {
    if (__builtin_constant_p(bits)) {
      if (bits&15) {
        // only rotate when modulus 16 > 0
        if ((bits&15) < 9) {
          asm(
            "rol.w %1, %0\n"
            : "=d"(val) : "I"(bits&15), "0"(val) : "cc"
          );
        }
        else {
          // use opposite rotate for N > 8
          asm(
            "ror.w %1, %0\n"
            : "=d"(val) : "I"(16-(bits&15)), "0"(val) : "cc"
          );
        }
      }
    }
    else {
      asm(
        "rol.w %1, %0\n"
        : "=d"(val) : "d"(bits), "0"(val) : "cc"
      );
    }
    return val;
  }

  template<typename T>
  inline T  _rotRight8(uint32 bits, T val)
  {
    if (__builtin_constant_p(bits)) {
      if (bits&7) {
        // only rotate when modulus 8 > 0
        asm(
          "ror.b %1, %0\n"
          : "=d"(val) : "I"(bits&7), "0"(val) : "cc"
        );
      }
    }
    else {
      asm(
        "ror.b %1, %0\n"
        : "=d"(val) : "d"(bits), "0"(val) : "cc"
      );
    }
    return val;
  }

  template<typename T>
  inline T _rotRight16(uint32 bits, T val)
  {
    if (__builtin_constant_p(bits)) {
      if (bits&15) {
        // only rotate when modulus 16 > 0
        if ((bits&15) < 9) {
          asm(
            "ror.w %1, %0\n"
            : "=d"(val) : "I"(bits&15), "0"(val) : "cc"
          );
        }
        else {
          // use opposite rotate for N > 8
          asm(
            "rol.w %1, %0\n"
            : "=d"(val) : "I"(16-(bits&15)), "0"(val) : "cc"
          );
        }
      }
    }
    else {
      asm(
        "ror.w %1, %0\n"
        : "=d"(val) : "d"(bits), "0"(val) : "cc"
      );
    }
    return val;
  }


  inline uint16 swap16(uint16 val)
  {
    if (__builtin_constant_p(val)) {
      val = val<<8|val>>8;
    } else {
      asm(
        "rol.w #8, %0\n"
        : "=d"(val)
        : "0"(val)
        : "cc"
      );
    }
    return val;
  }
  #define _EXNG2_MACHINE_HAS_SWAP16

  inline uint32 swap32(uint32 val)
  {
    if (__builtin_constant_p(val)) {
      val = val<<16 | val>>16;
      val = ((val&0x00FF00FF)<<8) | ((val&0xFF00FF00)>>8);
    } else {
      asm(
        "rol.w #8, %0\n\t"
        "swap %0\n\t"
        "rol.w #8, %0\n"
        : "=d"(val)
        : "0"(val)
        : "cc"
      );
    }
    return val;
  }
  #define _EXNG2_MACHINE_HAS_SWAP32

  inline uint64 swap64(uint64 val)
  {
    if (__builtin_constant_p(val)) {
      return  (((val & 0xff00000000000000ull) >> 56)
            | ((val & 0x00ff000000000000ull) >> 40)
            | ((val & 0x0000ff0000000000ull) >> 24)
            | ((val & 0x000000ff00000000ull) >> 8)
            | ((val & 0x00000000ff000000ull) << 8)
            | ((val & 0x0000000000ff0000ull) << 24)
            | ((val & 0x000000000000ff00ull) << 40)
            | ((val & 0x00000000000000ffull) << 56));
    }
    else {
      union { uint64 u64; uint32 u32[2]; };
      u64 = val;
      uint32 msw  = swap32(u32[0]);
      u32[0]      = swap32(u32[1]);
      u32[1]      = msw;
      return u64;
    }
  }
  #define _EXNG2_MACHINE_HAS_SWAP64

  // runtime known rotate
  inline uint32 rotLeft8_32(uint32 bits, uint32 val)  { return _rotLeft8<uint32>(bits, val); }
  inline uint16 rotLeft8_16(uint32 bits, uint16 val)  { return _rotLeft8<uint16>(bits, val); }
  inline uint8  rotLeft8(uint32 bits, uint8 val)      { return _rotLeft8<uint8>(bits, val); }



  #define _EXNG2_MACHINE_HAS_ROL8

  inline uint32 rotLeft16_32(uint16 bits, uint32 val) { return _rotLeft16<uint32>(bits, val); }
  inline uint16 rotLeft16(uint32 bits, uint16 val)    { return _rotLeft16<uint16>(bits, val); }


  #define _EXNG2_MACHINE_HAS_ROL16

  inline uint32 rotLeft32(uint32 bits, uint32 val)
  {
    if (__builtin_constant_p(bits)) {
      if (bits&31) {
        // only rotate when modulus 32 > 0
        if ((bits&31) < 9) {
          asm(
            "rol.l %1, %0\n"
            : "=d"(val) : "I"(bits&31), "0"(val) : "cc"
          );
        }
        else if ((bits&31)==16) {
          asm(
            "swap %0\n"
            : "=d"(val) : "0"(val) : "cc"
          );
        }
        else if ((bits&31)>23) {
          // use opposite rotate for N > 23
          asm(
            "ror.l %1, %0\n"
            : "=d"(val) : "I"(32-(bits&31)), "0"(val) : "cc"
          );
        }
        else {
          // use register rotate for all intermediate sizes
          asm(
            "rol.l %1, %0\n"
            : "=d"(val) : "d"(bits&31), "0"(val) : "cc"
          );
        }
      }
    }
    else {
      asm(
        "rol.l %1, %0\n"
        : "=d"(val) : "d"(bits), "0"(val) : "cc"
      );
    }
    return val;
  }
  #define _EXNG2_MACHINE_HAS_ROL32

  inline uint32 rotRight8_32(uint32 bits, uint32 val) { return _rotRight8<uint32>(bits, val); }
  inline uint16 rotRight8_16(uint32 bits, uint16 val) { return _rotRight8<uint16>(bits, val); }
  inline uint8  rotRight8(uint32 bits, uint8 val)     { return _rotRight8<uint8>(bits, val);  }

  #define _EXNG2_MACHINE_HAS_ROR8

  inline uint32 rotRight16_32(uint32 bits, uint32 val)  { return _rotRight16<uint32>(bits, val); }
  inline uint16 rotRight16(uint32 bits, uint16 val)     { return _rotRight16<uint16>(bits, val); }

  #define _EXNG2_MACHINE_HAS_ROR16

  inline uint32 rotRight32(uint32 bits, uint32 val)
  {
    if (__builtin_constant_p(bits)) {
      if (bits&31) {
        // only rotate when modulus 32 > 0
        if ((bits&31) < 9) {
          asm(
            "ror.l %1, %0\n"
            : "=d"(val) : "I"(bits&31), "0"(val) : "cc"
          );
        }
        else if ((bits&31)==16) {
          asm(
            "swap %0\n"
            : "=d"(val) : "0"(val) : "cc"
          );
        }
        else if ((bits&31)>23) {
          // use opposite rotate for N > 23
          asm(
            "rol.l %1, %0\n"
            : "=d"(val) : "I"(32-(bits&31)), "0"(val) : "cc"
          );
        }
        else {
          // use register rotate for all intermediate sizes
          asm(
            "ror.l %1, %0\n"
            : "=d"(val) : "d"(bits&31), "0"(val) : "cc"
          );
        }
      }
    }
    else {
      asm(
        "ror.l %1, %0\n"
        : "=d"(val) : "d"(bits), "0"(val) : "cc"
      );
    }
    return val;
  }
  #define _EXNG2_MACHINE_HAS_ROR32

  inline sint32 mostSigBit32(uint32 val)
  {
    asm(
      "bfffo %0 {#0:#32}, %0" "\n\t"
      "eor.w #31,%0\n"
      : "=d"(val) : "0"(val) : "cc"
    );
    return val;
  }
  #define _EXNG2_MACHINE_HAS_BFFFO

};

#endif

;-)

If you know your GNU C, you'll recognise that almost all of that reduces down to inserting just the right bitwise rotate operation. Despite the apparent awesome size of the C++ code, it usually boils down to 1-3 instructions that are identical to what you'd write for an assembler version. The __builtin_constant_p() test is a compile time operation that, when the operand is determined to be a constant value, ends up emitting a constant value for the output. After all, there's no sense in rotating a constant when you can just use the constant it would evaluate to.
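
A cut-down sketch of the same __builtin_constant_p() pattern, assuming GCC or Clang. The 68k inline asm is swapped for __builtin_bswap32 so it builds on other targets; the name swap32Sketch is made up and this is not part of the header above.

```cpp
#include <cstdint>

// Same shape as swap32() in the header: one path for values the compiler
// can see are constant, one for values only known at run time.
inline std::uint32_t swap32Sketch(std::uint32_t val)
{
  if (__builtin_constant_p(val)) {
    // Constant path: plain expressions the optimizer can fold away.
    val = (val << 16) | (val >> 16);
    val = ((val & 0x00FF00FFu) << 8) | ((val & 0xFF00FF00u) >> 8);
  } else {
    // Run-time path: leave the instruction choice to the compiler builtin
    // (the original header uses hand-picked 68k instructions here).
    val = __builtin_bswap32(val);
  }
  return val;
}
```

Both paths must compute the same result, since which one the compiler picks can change with the optimization level and with inlining.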
Title: Re: Intuition questions...
Post by: Sidewinder on August 29, 2011, 08:38:19 PM
Quote
Some stuff just doesn't need optimizing. Basic IO, for example, is almost always going to be limited by the speed of the device being communicated with. Event handling is another example. No amount of assembler will basically speed up having to wait for an asynchronous event to happen.


This is a good argument when thinking about a single process or thread, but on multi-tasking systems like the Amiga a different process may be able to use the open processor time for further computation.  Thus an optimized I/O routine would be preferable to an unoptimized one in terms of overall system performance.

If I had unlimited time I'd probably want to write everything in assembly to be the most efficient possible.  But, since I do not, the question boils down to priorities.  If it will take another 100 hours of coding to rewrite the I/O system to save only a few cycles, it's probably not worth it unless overall system speed and efficiency are the priority.  In most cases the 100 hours would be better spent speeding up the most used code sections.

In addition, there is the case of specialization.  Very few people can say they are experts in all areas of computer architecture.  There are many cases where I'm fairly certain I would not be able to improve upon the work of others.  For me, I/O is one such case.  I'm content to trust that the authors of the I/O library have done their homework and made their code as efficient as possible.


And the original poster made the claim that with assembler...

Quote
you get the absolute best speed and efficiency from your code...


Clearly this statement cannot always be true.  In the right hands an assembly program can be a masterpiece, but in naive hands it can be a disaster.  Writing efficient assembly code takes skill and extensive knowledge of system architecture.  If one doesn't have this knowledge, C would probably be a better choice--unless, of course, the goal is to gain this knowledge.
Title: Re: Intuition questions...
Post by: commodorejohn on August 29, 2011, 08:47:31 PM
The problem with C on small systems is that everybody uses GCC, which just plain isn't designed for efficiency so much as for massively multi-platform reliability. I don't know if there's a decent C99 compiler for 68k out there, but it would sure help things if there were.

That said, I kind of agree with Franko - 68k assembler is the nicest I've ever used in terms of programmer-friendliness, and it's pretty much guaranteed to be better than GCC at least. If you're doing an Amiga-specific project that's fairly simple in organization, I don't see a reason not to use it - it'll help majorly on low-end Amiga systems (and yes, people still use them,) and for those of us with a slightly beefier setup, the extra efficiency is just gravy :)
Title: Re: Intuition questions...
Post by: SamuraiCrow on August 29, 2011, 08:50:56 PM
@CommodoreJohn

VBCC (http://sun.hasenbraten.de/vbcc/) is largely C99 compliant.
Title: Re: Intuition questions...
Post by: Karlos on August 29, 2011, 08:51:55 PM
Quote from: Sidewinder;656765
This is a good argument when thinking about a single process or thread, but on multi-tasking systems like the Amiga a different process may be able to use the open processor time for further computation.  Thus an optimized I/O routine would be preferable to an unoptimized one in terms of overall system performance.

Not necessarily. While the processor is waiting for the bus, as it would be with slow IO, you can't just assume you can go away and run another thread. The OS divides processor time into quanta that are much larger than the granularity we are talking about here.

What you are saying is true when you are literally Wait()ing for IO, that is, having put the thread to sleep while waiting for an interrupt or IPC event of some kind.
Title: Re: OT - assembler versus C for Amiga development
Post by: billt on August 29, 2011, 08:55:33 PM
What is a good 68k assembler to use today? I may have use for it more for system-agnostic 68000 work than for Amiga-specific. Something I could use with the easy68K or ide68k simulators and other things that don't have much of a system attached to the CPU. For all 68K flavors 68000 to 68060, but at least 68000.

For Amiga programming I'll go C and maybe learn a little C++.

Can one use WinUAE to test generic 68k assembler binaries without an OS in the way?

I'd like to tinker with the free/open verilog/vhdl 68k CPUs and compare particular things with simulator to a known working chip or software sim, and it seems UAE or easy68k is probably easier to get/use than some 68k experimenter board today.
Title: Re: OT - assembler versus C for Amiga development
Post by: Karlos on August 29, 2011, 08:59:50 PM
Quote from: billt;656773
What is a good 68k assembler to use today?


Quite a few. If you can find it, DevPac was great. As I'm only writing subcomponents of code in assembler these days, I tend to use PhxAss. Failing that, I just inject assembler directly into C code when using gcc.
Title: Re: OT - assembler versus C for Amiga development
Post by: Karlos on August 29, 2011, 09:02:30 PM
One final comment on the overall subject: as far as I'm concerned, you don't need to know anything about assembler to be a C programmer, but writing assembly language gives you a much better insight into how to write C optimized for a given platform.

Everybody that wants to write fast code in any compiled language should be a bit familiar with assembler at least, just to understand the inner workings of how their kit works.
Title: Re: OT - assembler versus C for Amiga development
Post by: SamuraiCrow on August 29, 2011, 09:04:21 PM
Quote from: Karlos;656774
Quite a few. If you can find it, DevPac was great. As I'm only writing subcomponents of code in assembler these days, I tend to use PhxAss. Failing that, I just inject assembler directly into C code when using gcc.


The author of PhxAss has written a newer Assembler that is more flexible.  See VAsm (http://sun.hasenbraten.de/vasm/) for details.
Title: Re: OT - assembler versus C for Amiga development
Post by: Karlos on August 29, 2011, 09:07:36 PM
Quote from: SamuraiCrow;656776
The author of PhxAss has written a newer Assembler that is more flexible.  See VAsm (http://sun.hasenbraten.de/vasm/) for details.


It was probably vasm that I meant :lol: Old naming traditions die hard.
Title: Re: Intuition questions...
Post by: commodorejohn on August 29, 2011, 09:34:04 PM
Quote from: SamuraiCrow;656769
VBCC (http://sun.hasenbraten.de/vbcc/) is largely C99 compliant.
Hmm. How's the code quality?
Quote from: billt;656773
What is a good 68k assembler to use today? I may have more use for it in system-agnostic 68000 work than in Amiga-specific work. Something I could use with the easy68K or ide68k simulators and other things that don't have much of a system attached to the CPU. For all 68K flavors 68000 to 68060, but at least 68000.
I agree with Karlos that Devpac is quite nice; if you want to code on a non-Amiga platform, vasm (http://sun.hasenbraten.de/vasm/) (which SamuraiCrow already linked, but it bears repeating) apparently supports Devpac's directives on top of the standard Motorola syntax.
Quote from: Karlos;656775
One final comment on the overall subject: as far as I'm concerned, you don't need to know anything about assembler to be a C programmer, but writing assembly language gives you a much better insight into how to write C optimized for a given platform.

Everybody that wants to write fast code in any compiled language should be a bit familiar with assembler at least, just to understand the inner workings of how their kit works.
Amen. Amen. Even if you never write a single project in assembler, understanding the nuances of your architecture(s) is crucial to being able to write good code for them.
Title: Re: OT - assembler versus C for Amiga development
Post by: itix on August 29, 2011, 10:04:13 PM
Quote from: Karlos;656775
Everybody that wants to write fast code in any compiled language should be a bit familiar with assembler at least, just to understand the inner workings of how their kit works.

I sort of agree, but I wouldn't recommend it. It can lead to bad habits. I know a coder who used to write lengthy C# methods because he knew there is always a small overhead when calling subroutines in assembler code. Obviously that is not relevant to C# anymore, and often not even to low-level languages like C/C++.

I sometimes see similar behaviour in my own code when I am monitoring generated assembly to optimize software-pipelined loops...

But of course assembly coding has its place in time-critical routines, and sometimes you just can't get compilers to produce efficient code for your task (e.g. AltiVec optimizations or using special PPC instructions, move16 on 68k and so on).

Quote
There are many clock cycle optimisations for the basic 68000 that are slower on the 68020. Likewise, once you master the 020's behaviour, a lot of it ends up being counter-productive on the 68040.

That is so true. It is also possible to beat machine language if you select a better algorithm. Bubble sort is always slow no matter how many hours are spent squeezing the last clock cycles away. Refactoring is so much easier in higher-level languages, and you are also more productive.
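
The bubble sort point can be made concrete by counting comparisons rather than clock cycles. What follows is a rough sketch, assuming a hosted C compiler. The function names are made up for illustration. The C standard does not promise any particular algorithm for qsort, so the library's count will differ from one C library to another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Comparison counter used by the qsort callback below. */
static unsigned long g_compares;

/* qsort callback: orders ints ascending and counts each call. */
static int counting_cmp(const void *pa, const void *pb)
{
    int a = *(const int *)pa;
    int b = *(const int *)pb;
    ++g_compares;
    return (a > b) - (a < b);
}

/* Plain bubble sort with no early exit.
 * Returns the number of comparisons made, which is n*(n-1)/2. */
unsigned long bubble_sort_count(int *a, size_t n)
{
    unsigned long compares = 0;
    assert(a != NULL || n == 0);
    for (size_t pass = 0; pass + 1 < n; ++pass) {
        for (size_t j = 0; j + 1 < n - pass; ++j) {
            ++compares;
            if (a[j] > a[j + 1]) {
                int tmp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = tmp;
            }
        }
    }
    return compares;
}

/* Sort with the C library's qsort and return how many
 * comparisons it asked for. */
unsigned long library_sort_count(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    g_compares = 0;
    qsort(a, n, sizeof a[0], counting_cmp);
    return g_compares;
}
```

For 1000 elements the bubble sort above always performs 499500 comparisons. Hand-tuning its inner loop in assembler makes each comparison cheaper but does not reduce that number, whereas a typical library sort needs far fewer.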
Title: Re: OT - assembler versus C for Amiga development
Post by: Karlos on August 29, 2011, 10:11:22 PM
Quote from: itix;656787
I sort of agree, but I wouldn't recommend it. It can lead to bad habits. I know a coder who used to write lengthy C# methods because he knew there is always a small overhead when calling subroutines in assembler code. Obviously that is not relevant to C# anymore, and often not even to low-level languages like C/C++.

Agreed. You should understand how your high-level language works first and foremost, particularly how it is optimized by your compiler. C# adds an extra layer of indirection through the CLR that means a lot of assumptions you might make about low level performance of language constructs may be invalid. However, I stand by the assertion that being able to look at code performance from both ends is better than understanding it from one end only.
Title: Re: Intuition questions...
Post by: SamuraiCrow on August 29, 2011, 10:12:43 PM
Quote from: commodorejohn;656782
Hmm. How's the code quality?


Better than GCC.
Title: Re: OT - assembler versus C for Amiga development
Post by: commodorejohn on August 29, 2011, 10:17:36 PM
Quote from: itix;656787
I sort of agree, but I wouldn't recommend it. It can lead to bad habits. I know a coder who used to write lengthy C# methods because he knew there is always a small overhead when calling subroutines in assembler code. Obviously that is not relevant to C# anymore, and often not even to low-level languages like C/C++.

I sometimes see similar behaviour in my own code when I am monitoring generated assembly to optimize software-pipelined loops...

That is so true. It is also possible to beat machine language if you select a better algorithm. Bubble sort is always slow no matter how many hours are spent squeezing the last clock cycles away. Refactoring is so much easier in higher-level languages, and you are also more productive.
Yes and no. Unthinking application of optimization techniques learned by rote is going to lead to convoluted code that probably isn't even that optimal, whether you're writing in C, assembler, Forth, or what-the-hell-have-you. And while you're absolutely right that intelligent refactoring can make much more difference than a couple assembler tweaks ever will, I wouldn't say that's "beating machine language" - refactoring is refactoring no matter what it's written in.

This is why I love Michael Abrash. His Black Book (http://www.gamedev.net/page/resources/_/reference/programming/140/283/graphics-programming-black-book-r1698) is specifically geared towards 386/486 optimization, but so much of the information in it applies just as well to any architecture... one key point of which is "the best optimizer is between your ears." Learning how to identify problem areas and optimize them intelligently will serve you well in any language.