Amiga.org

Amiga News and Community Announcements => Amiga Software News => Topic started by: unusedunused on May 05, 2010, 03:46:37 PM

Title: new ixemul V63.1 Version
Post by: unusedunused on May 05, 2010, 03:46:37 PM: http://amiga.sourceforge.net/

Version 63.1

* I found functions that are not in the MOS (MorphOS) ixemul but are in the 48.3 fork by megacz,
and I have now added them to this ixemul:

realpath
inet_ntop
inet_pton
gai_strerror
freeaddrinfo
getaddrinfo
getnameinfo
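For readers unfamiliar with the newly added resolver helpers, here is a small sketch in standard POSIX C showing how inet_pton/inet_ntop and getaddrinfo/gai_strerror/freeaddrinfo fit together. It is written against the generic POSIX API, not tested on ixemul, and the wrapper names canonical_address and count_numeric_results are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Parse a textual IPv4 or IPv6 address with inet_pton() and print it back
   in canonical form with inet_ntop(). Returns 0 on success, -1 otherwise. */
int canonical_address(const char *text, char *out, size_t outlen)
{
    unsigned char buf[sizeof(struct in6_addr)];  /* big enough for both */
    int family = strchr(text, ':') ? AF_INET6 : AF_INET;

    if (inet_pton(family, text, buf) != 1)
        return -1;
    return inet_ntop(family, buf, out, (socklen_t)outlen) ? 0 : -1;
}

/* Count the results getaddrinfo() returns for a numeric host and port.
   AI_NUMERICHOST | AI_NUMERICSERV disables name and service lookups.
   Returns the count, or -1 with *err set from gai_strerror(). */
int count_numeric_results(const char *host, const char *port,
                          const char **err)
{
    struct addrinfo hints, *res, *p;
    int n = 0, rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;

    rc = getaddrinfo(host, port, &hints, &res);
    if (rc != 0) {
        *err = gai_strerror(rc);
        return -1;
    }
    for (p = res; p != NULL; p = p->ai_next)
        n++;
    freeaddrinfo(res);  /* every successful getaddrinfo() needs this */
    *err = NULL;
    return n;
}
```

With AI_NUMERICHOST set, getaddrinfo only parses the string, so the example does not depend on a working name server.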

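* Added the ix_UseAmigaPaths(mode) function.
When mode = 1, Amiga path mode is used.
For example, when the current directory is sys:wbstartup,
ix_UseAmigaPaths(1);
h = fopen("/tools.info","r");

works: in Amiga path mode a leading "/" means the parent directory, so this opens sys:tools.info.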
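The rule behind that example can be sketched as a string rewrite. This is a hypothetical helper (amiga_to_unix_path is not an ixemul function), assuming the usual AmigaDOS convention that a leading "/" means the parent directory and that each "/" directly following another "/" goes up one more level; volume names such as sys: are left out for brevity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrite an AmigaDOS-style relative path into the
   equivalent Unix-style relative path. A "/" at the start, or right after
   another "/", means "parent directory" and becomes "../". Volume and
   device names such as "sys:" are NOT handled.
   Returns 0 on success, -1 if out is too small. */
int amiga_to_unix_path(const char *in, char *out, size_t outlen)
{
    size_t n = 0;
    char prev = '/';  /* treat the start of the path like a preceding "/" */

    for (; *in != '\0'; prev = *in++) {
        char one[2] = { *in, '\0' };
        const char *piece = (*in == '/' && prev == '/') ? "../" : one;

        for (; *piece != '\0'; piece++) {
            if (n + 1 >= outlen)  /* keep room for the terminator */
                return -1;
            out[n++] = *piece;
        }
    }
    if (outlen == 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```

So "/tools.info" from the example above becomes "../tools.info", which is what a Unix program would have to write to reach the same file.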

* Added C99 functions:
hypot
exp2
exp2f
log10f
expf
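A quick way to see what the new C99 functions compute is to check a few textbook identities. This is a minimal sketch in standard C (link with -lm as usual); close_enough and check_c99_math are made-up names, and the tolerances are arbitrary choices.

```c
#include <math.h>

/* 1 if got lies within tol of want. */
static int close_enough(double got, double want, double tol)
{
    return fabs(got - want) <= tol;
}

/* One identity per newly added function; returns 1 if all of them hold. */
int check_c99_math(void)
{
    int ok = 1;
    ok &= close_enough(hypot(3.0, 4.0), 5.0, 1e-12);    /* sqrt(3*3 + 4*4) */
    ok &= close_enough(exp2(10.0), 1024.0, 1e-9);       /* 2^10 */
    ok &= close_enough(exp2f(0.5f), sqrt(2.0), 1e-6);   /* 2^0.5 */
    ok &= close_enough(log10f(1000.0f), 3.0, 1e-6);     /* log10(1000) */
    ok &= close_enough(expf(1.0f), 2.718281828, 1e-6);  /* e^1 */
    return ok;
}
```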

* Reverted jDc's change in __write.c back to the V48 code, because with the new code some programs (e.g. gdb) do not output line feeds correctly in the Shell.
I don't know what the new code is needed for.

* Added a dummy libm.a in the lib directory. It helps some
configure scripts that want to link with libm to detect
the available math functions correctly.

* The ixemul build with the poolmem allocator prints the message
"ixemul for poolmem used" in Sashimi the first time ixemul is used.
For the least memory fragmentation, don't use it for now, but it is useful for development,
because then an Amiga memory tracker can watch all malloc and free calls.
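The benefit described here, a debugging tool that sees every malloc and free, can be illustrated with a generic bookkeeping wrapper. This sketch is not how ixemul's poolmem allocator or any Amiga memory tracker works; tracked_malloc, tracked_free and the counters are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every block; the extra union members only
   give the user pointer a conservative alignment. */
typedef union {
    size_t size;
    long double ld;
    void *vp;
    long long ll;
} block_header;

static size_t live_blocks, live_bytes;

void *tracked_malloc(size_t size)
{
    block_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;          /* remember the size for tracked_free() */
    live_blocks++;
    live_bytes += size;
    return h + 1;            /* user memory starts right after the header */
}

void tracked_free(void *p)
{
    if (p == NULL)
        return;
    block_header *h = (block_header *)p - 1;
    live_blocks--;
    live_bytes -= h->size;
    free(h);
}

size_t tracked_live_blocks(void) { return live_blocks; }
size_t tracked_live_bytes(void) { return live_bytes; }
```

Any block still counted at program exit is a leak, which is the kind of report a tracker can only give if all allocations pass through code it can observe.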

* usleep now uses the precise Amiga timer and works correctly with the newest ffplay builds.
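The changelog does not show the new usleep code. As a rough POSIX analogue (not ixemul's implementation, which uses the Amiga timer), a usleep built on nanosleep() that resumes after signal interruptions could look like this; precise_usleep and measure_sleep_ms are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <errno.h>
#include <time.h>

/* Sleep for usec microseconds with nanosleep(), continuing with the
   remaining time whenever a signal interrupts the sleep. */
int precise_usleep(unsigned long usec)
{
    struct timespec req, rem;
    req.tv_sec = (time_t)(usec / 1000000UL);
    req.tv_nsec = (long)(usec % 1000000UL) * 1000L;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;
        req = rem;  /* interrupted: sleep for whatever is left */
    }
    return 0;
}

/* Milliseconds measured on the monotonic clock around a sleep of usec. */
double measure_sleep_ms(unsigned long usec)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    precise_usleep(usec);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```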
Title: Re: new ixemul V63.1 Version
Post by: amigagr on May 05, 2010, 08:08:09 PM: Is this new version only for the 060?
Title: Re: new ixemul V63.1 Version
Post by: matthey on May 05, 2010, 08:33:16 PM: @amigagr
Knowing Bernd, it's probably compiled for the 68060 without bitfield instructions. This should be compatible with the 68040, although it is not optimal. It should work on the 68020 and 68030 with good speed, but there is a much greater chance of incompatibility if floating point is used. Bernd leaves out the bit field instructions because they are slow on UAE and GCC does not make good speed trade-offs when deciding whether to use them. The 68040 would probably benefit from them, though. My best answer to your question is: probably, so try it. Bernd might offer some different advice, though.
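To make the trade-off concrete: without bit field instructions, an unsigned bit field extract (BFEXTU on a data register) becomes a shift followed by an AND. The sketch below shows that shift-and-mask form in C; the function names are hypothetical, and it assumes the field does not wrap past bit 0.

```c
#include <stdint.h>

/* Shift-and-mask equivalent of BFEXTU on a data register. The 68k counts
   bit field offsets from the most significant bit (offset 0 = bit 31).
   Assumes 1 <= width <= 32 and offset + width <= 32 (no wrap-around). */
uint32_t bf_extract_u(uint32_t word, unsigned offset, unsigned width)
{
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (word >> (32u - offset - width)) & mask;
}

/* Signed variant (like BFEXTS): sign-extend the extracted field. */
int32_t bf_extract_s(uint32_t word, unsigned offset, unsigned width)
{
    uint32_t v = bf_extract_u(word, offset, width);
    uint32_t sign = 1u << (width - 1u);
    return (int32_t)((v ^ sign) - sign);  /* flip, then subtract sign bit */
}
```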
Title: Re: new ixemul V63.1 Version
Post by: amigagr on May 05, 2010, 11:26:03 PM: A friend of mine told me some days ago that the new NetSurf 68k was asking for an FPU.
I told him to start SnoopDos and then run NetSurf, and we found that it was the new ixemul that was asking for the FPU.
After a version check he realized that he had the 68060 ixemul, but his Amiga only has a Blizzard 030/50.
Title: Re: new ixemul V63.1 Version
Post by: Karlos on May 05, 2010, 11:57:30 PM: @matthey

I'll be honest, whenever I've used bitfield operations in assembler on a real 68040 I've often found them to be slower than the equivalent mask/shift/combine operation. Never did figure out why that was. The whole point of having them was to speed up those sorts of operations.
Title: Re: new ixemul V63.1 Version
Post by: matthey on May 06, 2010, 08:48:21 AM: @amigagr
Quite possible. If your friend has a 68881 or 68882 it might work. There are a few new FPU instructions in the 68040+ that aren't really necessary (fdxxx and fsxxx). If it fails because of those, then he might ask Bernd to compile a different version. If he doesn't have an FPU, then it is most likely going to crash.

@Karlos
Yes, the bit field instructions are often slower for a simple shift and mask, even on the 68040. A bf instruction usually needs to replace at least 3 shift and mask instructions on the 68040 to be faster. Shifting and rotating is not the strong point of the 68040. The 68060 can usually do the shift and mask in 1 cycle, yet its bf instructions are about the same speed as on the 68040. Therefore, on the 68060 a bf instruction usually needs to replace 6 to 18 shift and mask instructions to be faster. That makes most of the bf instructions not very useful for time-critical code on the 68060. bfffo and bfextu/bfexts are exceptions if branches would be needed in the replacement. A mispredicted branch (which always occurs at least once in a loop) still takes 7 cycles on the 68060, which is about what a bf instruction takes. GCC uses the bf instructions about right for the 68040 but way too much for the 68060.
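As an illustration of how many simple operations a single bit field instruction stands in for, here is the shift-and-mask equivalent of a bit field insert (BFINS into a data register) in C. It is a sketch with a hypothetical function name, assuming the field does not wrap past bit 0; the exact instruction count a compiler produces for it will vary.

```c
#include <stdint.h>

/* Shift-and-mask equivalent of BFINS into a data register: replace the
   width-bit field starting offset bits below the MSB with the low bits
   of value. Assumes 1 <= width <= 32 and offset + width <= 32. */
uint32_t bf_insert(uint32_t word, unsigned offset, unsigned width,
                   uint32_t value)
{
    uint32_t mask = (width == 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    unsigned shift = 32u - offset - width;

    word &= ~(mask << shift);          /* 1. clear the old field        */
    word |= (value & mask) << shift;   /* 2. mask, shift in, and merge  */
    return word;
}
```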
Title: Re: new ixemul V63.1 Version
Post by: unusedunused on May 06, 2010, 09:02:50 AM: Quote from: amigagr;556836
A friend of mine told me some days ago that the new NetSurf 68k was asking for an FPU.
I told him to start SnoopDos and then run NetSurf, and we found that it was the new ixemul that was asking for the FPU.
After a version check he realized that he had the 68060 ixemul, but his Amiga only has a Blizzard 030/50.

Even if you had an ixemul without FPU support, the problem is that on today's Linux systems an FPU is standard and is used heavily. For example, NetSurf needs it for CSS, and CSS layout is a speed-critical part.

We all know the Amiga has the fast floating point (FFP) format, which the old Amiga compilers support directly. But this needs specially compiled programs and translation code for data, because the format is not compatible with the FPU format.
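To show what such "translation code for data" involves, here is a sketch converting between IEEE single precision and FFP. It assumes the commonly documented FFP layout (a normalized 24-bit mantissa in bits 31-8, the sign in bit 7, an excess-64 exponent in bits 6-0, and zero as all zero bits); the function names are hypothetical, and denormals, infinities and NaNs are not handled.

```c
#include <stdint.h>
#include <string.h>

/* IEEE single -> FFP (assumed layout: mantissa bits 31-8 with MSB set,
   sign bit 7, exponent bits 6-0 excess-64; value = 0.m * 2^(exp - 64)).
   Out-of-range exponents give 0 (underflow) or the largest FFP value. */
uint32_t ieee_to_ffp(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    uint32_t sign = bits >> 31;
    int32_t e = (int32_t)((bits >> 23) & 0xFFu);
    uint32_t mant = (bits & 0x7FFFFFu) | 0x800000u;  /* hidden bit back */

    if (e == 0)
        return 0;                 /* zero (and denormals) become FFP 0 */
    /* IEEE: 1.m * 2^(e-127) = 0.1m * 2^(e-126); FFP adds a bias of 64 */
    int32_t ffp_e = e - 126 + 64;
    if (ffp_e < 0)
        return 0;                 /* too small for FFP */
    if (ffp_e > 127) {            /* too large: clamp to max magnitude */
        ffp_e = 127;
        mant = 0xFFFFFFu;
    }
    return (mant << 8) | (sign << 7) | (uint32_t)ffp_e;
}

/* FFP -> IEEE single; every normalized FFP value fits in IEEE single. */
float ffp_to_ieee(uint32_t ffp)
{
    float f;
    if ((ffp >> 8) == 0)
        return 0.0f;
    uint32_t sign = (ffp >> 7) & 1u;
    int32_t e = (int32_t)(ffp & 0x7Fu) - 64 + 126;   /* rebias */
    uint32_t bits = (sign << 31) | ((uint32_t)e << 23)
                  | ((ffp >> 8) & 0x7FFFFFu);        /* drop hidden bit */
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Because both formats carry 24 significant bits, the conversion only reshuffles fields and rebiases the exponent; the cost is that every value crossing between FFP code and IEEE code has to go through it.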

GCC only supports the IEEE software float format, which is compatible with the FPU. And this is slow.

So the result is that when you compile a program for software float only, it becomes unusably slow.

Here is the code for multiplying single floats (__mulsf3 from GCC's m68k soft-float support). Normally an FPU can do that in 2 clocks, but this code needs more than 80 clocks, which is around 40 times slower. If you don't believe it, you can try a non-FPU build of NetSurf; the source is on Aminet.

| float __mulsf3(float, float);
   FUNC(__mulsf3)
SYM (__mulsf3):
#ifndef __mcoldfire__
   link   a6,IMM (0)
   moveml   d2-d7,sp@-
#else
   link   a6,IMM (-24)
   moveml   d2-d7,sp@
#endif
   movel   a6@(8),d0   | get a into d0
   movel   a6@(12),d1   | and b into d1
   movel   d0,d7      | d7 will hold the sign of the product
   eorl   d1,d7      |
   andl   IMM (0x80000000),d7
   movel   IMM (INFINITY),d6   | useful constant (+INFINITY)
   movel   d6,d5         | another (mask for fraction)
   notl   d5         |
   movel   IMM (0x00800000),d4   | this is to put hidden bit back
   bclr   IMM (31),d0      | get rid of a's sign bit '
   movel   d0,d2         |
   beq   Lmulsf$a$0      | branch if a is zero
   bclr   IMM (31),d1      | get rid of b's sign bit '
   movel   d1,d3      |
   beq   Lmulsf$b$0   | branch if b is zero
   cmpl   d6,d0      | is a big?
   bhi   Lmulsf$inop   | if a is NaN return NaN
   beq   Lmulsf$inf   | if a is INFINITY we have to check b
   cmpl   d6,d1      | now compare b with INFINITY
   bhi   Lmulsf$inop   | is b NaN?
   beq   Lmulsf$overflow | is b INFINITY?
| Here we have both numbers finite and nonzero (and with no sign bit).
| Now we get the exponents into d2 and d3.
   andl   d6,d2      | and isolate exponent in d2
   beq   Lmulsf$a$den   | if exponent is zero we have a denormalized
   andl   d5,d0      | and isolate fraction
   orl   d4,d0      | and put hidden bit back
   swap   d2      | I like exponents in the first byte
#ifndef __mcoldfire__
   lsrw   IMM (7),d2   |
#else
   lsrl   IMM (7),d2   |
#endif
Lmulsf$1:         | number
   andl   d6,d3      |
   beq   Lmulsf$b$den   |
   andl   d5,d1      |
   orl   d4,d1      |
   swap   d3      |
#ifndef __mcoldfire__
   lsrw   IMM (7),d3   |
#else
   lsrl   IMM (7),d3   |
#endif
Lmulsf$2:         |
#ifndef __mcoldfire__
   addw   d3,d2      | add exponents
   subw   IMM (F_BIAS+1),d2 | and subtract bias (plus one)
#else
   addl   d3,d2      | add exponents
   subl   IMM (F_BIAS+1),d2 | and subtract bias (plus one)
#endif

| We are now ready to do the multiplication. The situation is as follows:
| both a and b have bit FLT_MANT_DIG-1 set (even if they were
| denormalized to start with!), which means that in the product
| bit 2*(FLT_MANT_DIG-1) (that is, bit 2*FLT_MANT_DIG-2-32 of the
| high long) is set.

| To do the multiplication let us move the number a little bit around ...
   movel   d1,d6      | second operand in d6
   movel   d0,d5      | first operand in d4-d5
   movel   IMM (0),d4
   movel   d4,d1      | the sums will go in d0-d1
   movel   d4,d0

| now bit FLT_MANT_DIG-1 becomes bit 31:
   lsll   IMM (31-FLT_MANT_DIG+1),d6

| Start the loop (we loop #FLT_MANT_DIG times):
   moveq   IMM (FLT_MANT_DIG-1),d3
1:   addl   d1,d1      | shift sum
   addxl   d0,d0
   lsll   IMM (1),d6   | get bit bn
   bcc   2f      | if not set skip sum
   addl   d5,d1      | add a
   addxl   d4,d0
2:
#ifndef __mcoldfire__
   dbf   d3,1b      | loop back
#else
   subql   IMM (1),d3
   bpl   1b
#endif

| Now we have the product in d0-d1, with bit (FLT_MANT_DIG - 1) + FLT_MANT_DIG
| (mod 32) of d0 set. The first thing to do now is to normalize it so bit
| FLT_MANT_DIG is set (to do the rounding).
#ifndef __mcoldfire__
   rorl   IMM (6),d1
   swap   d1
   movew   d1,d3
   andw   IMM (0x03ff),d3
   andw   IMM (0xfd00),d1
#else
   movel   d1,d3
   lsll   IMM (8),d1
   addl   d1,d1
   addl   d1,d1
   moveq   IMM (22),d5
   lsrl   d5,d3
   orl   d3,d1
   andl   IMM (0xfffffd00),d1
#endif
   lsll   IMM (8),d0
   addl   d0,d0
   addl   d0,d0
#ifndef __mcoldfire__
   orw   d3,d0
#else
   orl   d3,d0
#endif

   moveq   IMM (MULTIPLY),d5

   btst   IMM (FLT_MANT_DIG+1),d0
   beq   Lround$exit
#ifndef __mcoldfire__
   lsrl   IMM (1),d0
   roxrl   IMM (1),d1
   addw   IMM (1),d2
#else
   lsrl   IMM (1),d1
   btst   IMM (0),d0
   beq   10f
   bset   IMM (31),d1
10:   lsrl   IMM (1),d0
   addql   IMM (1),d2
#endif
   bra   Lround$exit

Lmulsf$inop:
   moveq   IMM (MULTIPLY),d5
   bra   Lf$inop

Lmulsf$overflow:
   moveq   IMM (MULTIPLY),d5
   bra   Lf$overflow

Lmulsf$inf:
   moveq   IMM (MULTIPLY),d5
| If either is NaN return NaN; else both are (maybe infinite) numbers, so
| return INFINITY with the correct sign (which is in d7).
   cmpl   d6,d1      | is b NaN?
   bhi   Lf$inop      | if so return NaN
   bra   Lf$overflow   | else return +/-INFINITY

| If either number is zero return zero, unless the other is +/-INFINITY,
| or NaN, in which case we return NaN.
Lmulsf$b$0:
| Here d1 (==b) is zero.
   movel   a6@(8),d1   | get a again to check for non-finiteness
   bra   1f
Lmulsf$a$0:
   movel   a6@(12),d1   | get b again to check for non-finiteness
1:   bclr   IMM (31),d1   | clear sign bit
   cmpl   IMM (INFINITY),d1 | and check for a large exponent
   bge   Lf$inop      | if b is +/-INFINITY or NaN return NaN
   movel   d7,d0      | else return signed zero
   PICLEA   SYM (_fpCCR),a0   |
   movew   IMM (0),a0@   |
#ifndef __mcoldfire__
   moveml   sp@+,d2-d7   |
#else
   moveml   sp@,d2-d7
   | XXX if frame pointer is ever removed, stack pointer must
   | be adjusted here.
#endif
   unlk   a6      |
   rts         |
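The core of the listing above is the loop that adds the shifted multiplicand once per multiplier bit, 24 times for a single-precision mantissa, with unpacking, normalizing and rounding around it. A simplified C sketch of the same idea follows; soft_mulsf3 is a hypothetical name, and unlike __mulsf3 it skips zero, NaN, infinity, denormal and overflow handling.

```c
#include <stdint.h>
#include <string.h>

/* Shift-and-add product of two 24-bit mantissas: one iteration per
   multiplier bit, from its most significant bit down, like the dbf
   loop in the listing. */
static uint64_t mul_mantissas(uint32_t a, uint32_t b)
{
    uint64_t sum = 0;
    for (int i = 23; i >= 0; i--) {
        sum <<= 1;                /* shift the partial sum */
        if (b & (1u << i))        /* is this multiplier bit set? */
            sum += a;             /* then add the multiplicand */
    }
    return sum;
}

/* Simplified single-precision multiply for finite, normal operands whose
   product is also normal. Rounds to nearest, ties to even. */
float soft_mulsf3(float x, float y)
{
    uint32_t a, b, r;
    float out;

    memcpy(&a, &x, sizeof a);
    memcpy(&b, &y, sizeof b);

    uint32_t sign = (a ^ b) & 0x80000000u;          /* sign of product */
    int32_t e = (int32_t)((a >> 23) & 0xFFu)
              + (int32_t)((b >> 23) & 0xFFu) - 127; /* add, unbias once */
    uint32_t ma = (a & 0x7FFFFFu) | 0x800000u;      /* put hidden bit back */
    uint32_t mb = (b & 0x7FFFFFu) | 0x800000u;

    uint64_t p = mul_mantissas(ma, mb);  /* 2^46 <= p < 2^48 */
    int shift = 23;
    if (p & (1ull << 47)) {              /* mantissa product >= 2 */
        shift = 24;
        e += 1;
    }

    uint64_t mant = p >> shift;
    uint64_t rem = p & ((1ull << shift) - 1);
    uint64_t half = 1ull << (shift - 1);
    if (rem > half || (rem == half && (mant & 1u)))
        mant += 1;                       /* round to nearest even */
    if (mant & (1ull << 24)) {           /* rounding carried up to 2.0 */
        mant >>= 1;
        e += 1;
    }

    r = sign | ((uint32_t)e << 23) | ((uint32_t)mant & 0x7FFFFFu);
    memcpy(&out, &r, sizeof out);
    return out;
}
```

Even with a hardware integer multiply replacing the loop, the unpack, normalize and round steps remain, which is why soft-float stays far behind a single FPU instruction.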
Title: Re: new ixemul V63.1 Version
Post by: unusedunused on May 06, 2010, 09:26:46 AM: > If that is the case, then he might ask Bernd to compile a different version. If he doesn't have a fpu then he is most likely going to crash.

The 68060 version works well on a 68020 with a 68881.

If somebody can show me a real-world program where a real 68020 build gives a speedup, and the program is not too slow to be usable, I can build a 68020 version too.

But I don't have the time to do builds that nobody needs.

But I think this can't happen, because when I look with ixtrace at what code programs mostly execute in ixemul, it is memory allocation and string functions.

And here no difference is possible: the string functions are written in assembly, and memory allocation time depends a lot on cache misses.

I also tested the ixemul poolmem version with TLSFMem and NetSurf. I changed the AFA exec so that the pool threshold is set to 4, so every time AllocPooled is called it does an AllocMem, because the requested size is greater than the threshold, and so the pool is not used.

In benchmarks this gives the fastest speed,
but in the real world, with NetSurf and other C++ programs (which allocate and free small pieces of memory very often), TLSF is really slow on my AMD64 3000+ system (64 KB level 1 cache, 256 KB level 2 cache) under WinUAE.

The main problem with TLSF is that it is a best-fit allocator, so allocations that other memory allocators would place together (and that therefore fit better in the cache) get scattered around and cause lots of cache misses. That slows NetSurf down by a factor of 2 on my system. But of course TLSFMem is still faster than using no TLSFMem and always calling AllocMem with this special exec;
there the system slows down by a factor of 4, but then no crashes occur.
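The locality argument can be illustrated with a toy pool. In this hypothetical, much simplified model (not exec's AllocPooled code), requests up to the threshold are carved one after another out of a shared puddle, so consecutive small allocations end up adjacent, while larger ones get a separate block; with a threshold of 4, as in the modified exec, practically every request takes the separate path.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_BLOCKS 64

/* Toy pool: no per-block free; pool_delete() releases everything. */
struct pool {
    size_t puddle_size, thresh;
    char *puddle;              /* current puddle */
    size_t used;               /* bytes handed out from that puddle */
    void *blocks[MAX_BLOCKS];  /* every malloc'd block, freed at the end */
    int nblocks;
};

static void *pool_track(struct pool *p, void *mem)
{
    if (mem == NULL || p->nblocks == MAX_BLOCKS) {
        free(mem);
        return NULL;
    }
    p->blocks[p->nblocks++] = mem;
    return mem;
}

void pool_init(struct pool *p, size_t puddle_size, size_t thresh)
{
    p->puddle_size = puddle_size;
    p->thresh = thresh;
    p->puddle = NULL;
    p->used = 0;
    p->nblocks = 0;
}

/* *from_puddle is set to 1 if the memory came from a shared puddle. */
void *pool_alloc(struct pool *p, size_t size, int *from_puddle)
{
    size = (size + 7u) & ~(size_t)7u;    /* keep 8-byte alignment */
    if (size > p->thresh || size > p->puddle_size) {
        *from_puddle = 0;                /* separate block */
        return pool_track(p, malloc(size));
    }
    if (p->puddle == NULL || p->used + size > p->puddle_size) {
        p->puddle = pool_track(p, malloc(p->puddle_size));
        p->used = 0;
        if (p->puddle == NULL)
            return NULL;
    }
    *from_puddle = 1;
    void *mem = p->puddle + p->used;     /* right after the previous one */
    p->used += size;
    return mem;
}

void pool_delete(struct pool *p)
{
    for (int i = 0; i < p->nblocks; i++)
        free(p->blocks[i]);
    p->nblocks = 0;
    p->puddle = NULL;
}
```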

I see that NetSurf and other ixemul programs (that often call malloc and free) crash when TLSFMem is used; this happens on a classic too,
so I could not get speed results there.

I don't know why this happens: NetSurf works stably, the memory tracker shows no illegal accesses, and it works fine with poolmem and the buddy allocator.

But if somebody wants to test TLSFMem and NetSurf further for speed on a classic, I can send an exec version for AFA. Be warned, though: it's possible that it kills your install.
Title: Re: new ixemul V63.1 Version
Post by: Karlos on May 06, 2010, 10:01:37 AM: @matthey

Yeah, I did find a use for bfffo when implementing a quick integer log2 estimation.
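A minimal sketch of that trick in portable C: find_first_one mimics BFFFO over a full 32-bit register (offset counted from the MSB, returning 32 when no bit is set, which is assumed to match BFFFO's offset + width result), and floor(log2(x)) is then 31 minus that offset. The loop is the branchy replacement that a single BFFFO avoids; the names are hypothetical.

```c
#include <stdint.h>

/* Offset of the first set bit counted from the MSB (offset 0 = bit 31),
   or 32 if x == 0. This loop is what BFFFO does in one instruction. */
unsigned find_first_one(uint32_t x)
{
    unsigned offset = 0;
    while (offset < 32 && !(x & 0x80000000u)) {
        x <<= 1;
        offset++;
    }
    return offset;
}

/* floor(log2(x)) for x > 0: the index of the highest set bit. */
unsigned ilog2(uint32_t x)
{
    return 31u - find_first_one(x);
}
```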
Title: Re: new ixemul V63.1 Version
Post by: amigagr on May 06, 2010, 01:51:51 PM: @bernd_afa (http://www.amiga.org/forums/member.php?u=3520) and matthey (http://www.amiga.org/forums/member.php?u=5111)

Thanks for all this info, I will suggest to my friend that he gets a 68882. He's interested
in NetBSD on his A1200 too, so an FPU is a must.
Title: Re: new ixemul V63.1 Version
Post by: x303 on May 09, 2010, 09:25:56 PM: So why not compile with -m68020-60 -m68881? Everybody's happy. :angel: