Amiga.org
Amiga News and Community Announcements => Amiga Software News => Topic started by: unusedunused on May 05, 2010, 03:46:37 PM
-
http://amiga.sourceforge.net/
Version 63.1
* I found functions that are not in the MOS ixemul but are in the 48.3 fork from megacz,
and I have now added them to this ixemul:
realpath
inet_ntop
inet_pton
gai_strerror
freeaddrinfo
getaddrinfo
getnameinfo
* Add ix_UseAmigaPaths(mode) func.
When mode = 1, Amiga path mode is used.
For example, when the current dir is sys:wbstartup,
ix_UseAmigaPaths(1);
h = fopen("/tools.info","r");
works (in Amiga paths a leading "/" means the parent dir, so this opens sys:tools.info).
* Add C99 funcs
hypot
exp2
exp2f
log10f
expf
* Revert the change from jDc in __write.c back to V48, because with the new code some programs (gdb) do not output line feeds correctly in the Shell.
I don't know what the new code is needed for.
* Add a dummy libm.a that goes in the lib dir. It helps some
configure scripts that want to link with libm to detect
the available math functions correctly.
* The ixemul with the poolmem allocator outputs a message (visible when Sashimi is running) on first ixemul use:
"ixemul for poolmem used".
For the least mem fragmentation don't use it currently, but it's useful for development,
because then an Amiga memtracker can watch every malloc and free.
* usleep now uses the precise Amiga timer and works correctly with the newest ffplay builds.
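
As a small illustration of the networking functions listed in the changelog above, here is a minimal sketch of an inet_pton / inet_ntop round trip. The helper name roundtrip_ipv4 is made up for this example and is not part of ixemul. The calls are the standard POSIX ones, and the sketch assumes this ixemul build follows the usual signatures and headers.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET, socklen_t */
#include <netinet/in.h>   /* struct in_addr, INET_ADDRSTRLEN */
#include <arpa/inet.h>    /* inet_pton, inet_ntop */

/* Hypothetical helper (not an ixemul function): parse a dotted-quad
   IPv4 string into binary form and format it back into `out`.
   Returns 1 on success, 0 if the text is not a valid address or the
   output buffer is too small. */
int roundtrip_ipv4(const char *text, char *out, size_t outlen)
{
    struct in_addr addr;

    /* inet_pton returns 1 only when the text was a valid address */
    if (inet_pton(AF_INET, text, &addr) != 1)
        return 0;

    /* inet_ntop returns NULL when the buffer cannot hold the result */
    if (inet_ntop(AF_INET, &addr, out, (socklen_t)outlen) == NULL)
        return 0;

    return 1;
}
```

With a buffer of INET_ADDRSTRLEN bytes, roundtrip_ipv4("192.168.0.1", buf, sizeof buf) should return 1 and leave the same string in buf, while a string that is not an address returns 0.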
-
is this new version only for 060?
-
@amigagr
Knowing Bernd, it's probably compiled for the 68060 without bitfield instructions. This should be compatible with the 68040, although it is not optimal. It should work on the 68020 and 68030 with good speed, but there is a much greater chance of incompatibility if floating point is used. Bernd leaves out the bitfield instructions because they are slow on UAE and the GCC compiler does not make good speed trade-offs when deciding whether to use them. The 68040 would probably benefit from them though. My best answer to your question is "probably, so try it". Bernd might offer some different advice though.
-
A friend of mine told me some days ago that the new netsurf 68k was asking for an FPU.
I told him to run netsurf after SnoopDos, and we found that it was the new ixemul that was asking for the FPU.
After a version check he realized that he had the 68060 ixemul, but his Amiga has only a Blizzard 030/50.
-
@matthey
I'll be honest, whenever I've used bitfield operations in assembler on a real 68040 I've often found them to be slower than the equivalent mask/shift/combine operation. Never did figure out why that was. The whole point of having them was to speed up those sorts of operations.
-
@amigagr
Quite possible. If your friend has a 68881 or 68882 it might work. There are a few new FPU instructions in the 68040+ that aren't really necessary (fdxxx and fsxxx). If that is the case, then he might ask Bernd to compile a different version. If he doesn't have an FPU then he is most likely going to crash.
@Karlos
Yes, the bit field instructions are often slower than a simple shift and mask, even on the 68040. A bf instruction usually needs to replace at least 3 shift and mask instructions on the 68040 to be faster. Shifting and rotating are not the strong point of the 68040. The 68060 can usually do the shift and mask in 1 cycle, yet its bf instructions are about the same speed as on the 68040. Therefore, on the 68060 a bf instruction usually needs to replace 6 to 18 shift and mask instructions to be faster. That makes most of the bf instructions not very useful for time-critical code on the 68060. bfffo and bfext are exceptions if branches would be needed in the replacements. A missed branch (which always occurs in a loop) still takes 7 cycles on the 68060, which is about what a bf instruction takes. GCC uses the bf instructions about right on the 68040 but way too much for the 68060.
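
To make the comparison concrete, this is roughly what "shift and mask" means when written in C: the work a single unsigned bit field extract would otherwise do. It is only a sketch. The function name extract_field is invented here, the offset is counted from the most significant bit the way the 68020+ bit field instructions count it, and it says nothing about cycle counts on any particular CPU.

```c
#include <stdint.h>

/* Hypothetical helper: extract `width` bits from a 32-bit value,
   starting `offset` bits below the most significant bit (offset 0 is
   bit 31, as in the 68020+ bit field numbering).
   Assumes 1 <= width <= 32 and offset + width <= 32. */
uint32_t extract_field(uint32_t value, unsigned offset, unsigned width)
{
    /* Shift the field down so its lowest bit lands on bit 0. */
    uint32_t shifted = value >> (32u - offset - width);

    /* Build a mask of `width` one bits; width 32 is handled apart
       because shifting a 32-bit value by 32 is undefined in C. */
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);

    return shifted & mask;
}
```

For example, extract_field(0x12345678, 8, 8) gives 0x34. A compiler can turn this into one shift and one and, which is the sequence being weighed against a single bf instruction.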
-
>A friend of mine told me some days ago that the new netsurf 68k was asking for an FPU.
>I told him to run netsurf after SnoopDos, and we found that it was the new ixemul that was asking for the FPU.
>After a version check he realized that he had the 68060 ixemul, but his Amiga has only a Blizzard 030/50.
Even if you have an ixemul without FPU support, the problem is that today on Linux systems an FPU is standard and is widely used. For example, netsurf needs it for CSS, and CSS layout is a speed-critical part.
We all know the Amiga has a fast floating point (FFP) format, which the old Amiga compilers support directly. But this needs specially compiled programs and translation code for data, because the format is not compatible with the FPU format.
GCC only supports the software float format that is compatible with the FPU, and this is slow.
So the result is that when you compile a program for software float only, it gets unusably slow.
See here: this is the code to multiply a single float. Normally an FPU can do that in 2 clocks, but this code needs more than 80 clocks. That is around 40 times slower. If you don't believe it, you can try a non-FPU build of netsurf. The source is on Aminet.
| float __mulsf3(float, float);
FUNC(__mulsf3)
SYM (__mulsf3):
#ifndef __mcoldfire__
link a6,IMM (0)
moveml d2-d7,sp@-
#else
link a6,IMM (-24)
moveml d2-d7,sp@
#endif
movel a6@(8),d0 | get a into d0
movel a6@(12),d1 | and b into d1
movel d0,d7 | d7 will hold the sign of the product
eorl d1,d7 |
andl IMM (0x80000000),d7
movel IMM (INFINITY),d6 | useful constant (+INFINITY)
movel d6,d5 | another (mask for fraction)
notl d5 |
movel IMM (0x00800000),d4 | this is to put hidden bit back
bclr IMM (31),d0 | get rid of a's sign bit '
movel d0,d2 |
beq Lmulsf$a$0 | branch if a is zero
bclr IMM (31),d1 | get rid of b's sign bit '
movel d1,d3 |
beq Lmulsf$b$0 | branch if b is zero
cmpl d6,d0 | is a big?
bhi Lmulsf$inop | if a is NaN return NaN
beq Lmulsf$inf | if a is INFINITY we have to check b
cmpl d6,d1 | now compare b with INFINITY
bhi Lmulsf$inop | is b NaN?
beq Lmulsf$overflow | is b INFINITY?
| Here we have both numbers finite and nonzero (and with no sign bit).
| Now we get the exponents into d2 and d3.
andl d6,d2 | and isolate exponent in d2
beq Lmulsf$a$den | if exponent is zero we have a denormalized
andl d5,d0 | and isolate fraction
orl d4,d0 | and put hidden bit back
swap d2 | I like exponents in the first byte
#ifndef __mcoldfire__
lsrw IMM (7),d2 |
#else
lsrl IMM (7),d2 |
#endif
Lmulsf$1: | number
andl d6,d3 |
beq Lmulsf$b$den |
andl d5,d1 |
orl d4,d1 |
swap d3 |
#ifndef __mcoldfire__
lsrw IMM (7),d3 |
#else
lsrl IMM (7),d3 |
#endif
Lmulsf$2: |
#ifndef __mcoldfire__
addw d3,d2 | add exponents
subw IMM (F_BIAS+1),d2 | and subtract bias (plus one)
#else
addl d3,d2 | add exponents
subl IMM (F_BIAS+1),d2 | and subtract bias (plus one)
#endif
| We are now ready to do the multiplication. The situation is as follows:
| both a and b have bit FLT_MANT_DIG-1 set (even if they were
| denormalized to start with!), which means that in the product
| bit 2*(FLT_MANT_DIG-1) (that is, bit 2*FLT_MANT_DIG-2-32 of the
| high long) is set.
| To do the multiplication let us move the number a little bit around ...
movel d1,d6 | second operand in d6
movel d0,d5 | first operand in d4-d5
movel IMM (0),d4
movel d4,d1 | the sums will go in d0-d1
movel d4,d0
| now bit FLT_MANT_DIG-1 becomes bit 31:
lsll IMM (31-FLT_MANT_DIG+1),d6
| Start the loop (we loop #FLT_MANT_DIG times):
moveq IMM (FLT_MANT_DIG-1),d3
1: addl d1,d1 | shift sum
addxl d0,d0
lsll IMM (1),d6 | get bit bn
bcc 2f | if not set skip sum
addl d5,d1 | add a
addxl d4,d0
2:
#ifndef __mcoldfire__
dbf d3,1b | loop back
#else
subql IMM (1),d3
bpl 1b
#endif
| Now we have the product in d0-d1, with bit (FLT_MANT_DIG - 1) + FLT_MANT_DIG
| (mod 32) of d0 set. The first thing to do now is to normalize it so bit
| FLT_MANT_DIG is set (to do the rounding).
#ifndef __mcoldfire__
rorl IMM (6),d1
swap d1
movew d1,d3
andw IMM (0x03ff),d3
andw IMM (0xfd00),d1
#else
movel d1,d3
lsll IMM (8),d1
addl d1,d1
addl d1,d1
moveq IMM (22),d5
lsrl d5,d3
orl d3,d1
andl IMM (0xfffffd00),d1
#endif
lsll IMM (8),d0
addl d0,d0
addl d0,d0
#ifndef __mcoldfire__
orw d3,d0
#else
orl d3,d0
#endif
moveq IMM (MULTIPLY),d5
btst IMM (FLT_MANT_DIG+1),d0
beq Lround$exit
#ifndef __mcoldfire__
lsrl IMM (1),d0
roxrl IMM (1),d1
addw IMM (1),d2
#else
lsrl IMM (1),d1
btst IMM (0),d0
beq 10f
bset IMM (31),d1
10: lsrl IMM (1),d0
addql IMM (1),d2
#endif
bra Lround$exit
Lmulsf$inop:
moveq IMM (MULTIPLY),d5
bra Lf$inop
Lmulsf$overflow:
moveq IMM (MULTIPLY),d5
bra Lf$overflow
Lmulsf$inf:
moveq IMM (MULTIPLY),d5
| If either is NaN return NaN; else both are (maybe infinite) numbers, so
| return INFINITY with the correct sign (which is in d7).
cmpl d6,d1 | is b NaN?
bhi Lf$inop | if so return NaN
bra Lf$overflow | else return +/-INFINITY
| If either number is zero return zero, unless the other is +/-INFINITY,
| or NaN, in which case we return NaN.
Lmulsf$b$0:
| Here d1 (==b) is zero.
movel a6@(8),d1 | get a again to check for non-finiteness
bra 1f
Lmulsf$a$0:
movel a6@(12),d1 | get b again to check for non-finiteness
1: bclr IMM (31),d1 | clear sign bit
cmpl IMM (INFINITY),d1 | and check for a large exponent
bge Lf$inop | if b is +/-INFINITY or NaN return NaN
movel d7,d0 | else return signed zero
PICLEA SYM (_fpCCR),a0 |
movew IMM (0),a0@ |
#ifndef __mcoldfire__
moveml sp@+,d2-d7 |
#else
moveml sp@,d2-d7
| XXX if frame pointer is ever removed, stack pointer must
| be adjusted here.
#endif
unlk a6 |
rts |
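
For readers who don't read 68k asm: the expensive part of the routine above is the shift-and-add loop at label 1, which runs once per mantissa bit. A rough C model of just that loop could look like the sketch below. mul_mantissa and MANT_BITS are names invented here, both inputs are assumed to be 24-bit mantissas, and sign, exponent, rounding and the special cases (zero, INFINITY, NaN, denormalized) are left out.

```c
#include <stdint.h>

#define MANT_BITS 24   /* FLT_MANT_DIG for IEEE single precision */

/* Hypothetical model of the multiply loop: shift-and-add over the
   bits of b, most significant bit first. In the asm, the 64-bit sum
   lives in d0-d1, a in d4-d5 and the bits of b in d6. */
uint64_t mul_mantissa(uint32_t a, uint32_t b)
{
    uint64_t sum = 0;
    /* Move bit MANT_BITS-1 of b up to bit 31, like the lsll before the loop. */
    uint32_t bits = b << (32 - MANT_BITS);
    int i;

    for (i = 0; i < MANT_BITS; i++) {
        sum <<= 1;                  /* shift sum (addl/addxl)           */
        if (bits & 0x80000000u)     /* next bit of b (lsll sets carry)  */
            sum += a;               /* add a if the bit was set         */
        bits <<= 1;
    }
    return sum;
}
```

With both hidden bits set, for example mul_mantissa(0x800000, 0x800000), the result has bit 46 set, which matches the comment about bit 2*(FLT_MANT_DIG-1) in the listing. The loop body executing 24 times is where most of the clocks go compared with a hardware multiply.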
-
>If that is the case, then he might ask Bernd to compile a different version. If he doesn't have a fpu then he is most likely going to crash.
The 68060 version works well on a 68020 with a 68881.
If somebody can show me with a real-world program that a real 68020 build gives a speedup, and the program is not too slow to be usable, I can build a 68020 version too.
But I don't have the time to do builds that nobody needs.
But I think this can't happen, because when you look with ixtrace at what code programs mostly execute in ixemul, it is memory allocation and string functions.
And no difference is possible here: the string functions are written in asm, and the memalloc time depends a lot on cache misses.
I also tested the ixemul poolmem version with tlsfmem and netsurf. I changed the AFA exec so that the pool threshold is set to 4, so every time AllocPooled is called it does an AllocMem, because the allocation size is greater than the threshold and so the pool is not used.
In benchmarks this gives the fastest speed.
But in the real world, with netsurf and other C++ programs (that alloc and free small pieces of mem very often), tlsf is really slow on my AMD64 3000+ system (64 KB level 1 cache, 256 KB level 2 cache) with WinUAE.
The main problem with tlsf is that it is a best-fit allocator, so mem allocs that other allocators place together (so they fit better in cache) get scattered around and cause lots of cache misses, which slows down netsurf by a factor of 2 on my system. But of course tlsfmem is faster than using no tlsfmem and always doing AllocMem with this special exec.
There the system slows down by a factor of 4, but then no crashes occur.
I see that netsurf and other ixemul programs (that malloc and free often) crash when tlsfmem is used; this happens on a classic too.
No speed tests were done there.
I don't know why this happens: netsurf works stably, memtracker shows no illegal access, and it works well with the poolmem and buddy allocators.
But if somebody wants to test tlsfmem and netsurf on a classic some more for speed, I can send an exec version for AFA. But be warned, it's possible that you can kill your install.
-
@matthey
Yeah, I did find a use for bfffo when implementing a quick integer log2 estimation.
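
The post doesn't show the routine, so the following is only a guess at the idea: bfffo over a whole 32-bit register gives the offset of the first set bit counted from the most significant bit, and 31 minus that offset is floor(log2(x)). A portable C sketch, with the invented names first_one_from_msb and ilog2, and a plain loop standing in for the single instruction:

```c
#include <stdint.h>

/* Hypothetical stand-in for bfffo on a full 32-bit register: returns
   the offset of the first set bit counted from bit 31 downwards, or
   32 when no bit is set (offset plus width, as bfffo reports for an
   all-zero field). */
unsigned first_one_from_msb(uint32_t x)
{
    unsigned offset = 0;

    while (offset < 32u && !(x & 0x80000000u)) {
        x <<= 1;        /* bring the next lower bit up to bit 31 */
        offset++;
    }
    return offset;
}

/* Integer log2 estimate: floor(log2(x)) for x > 0, and -1 for x == 0. */
int ilog2(uint32_t x)
{
    return 31 - (int)first_one_from_msb(x);
}
```

For example, ilog2(1000) gives 9, because 512 <= 1000 < 1024. On a 68020+ the loop would collapse into one bfffo, which is the case where the instruction replaces a branchy sequence.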
-
@bernd_afa (http://www.amiga.org/forums/member.php?u=3520) and matthey (http://www.amiga.org/forums/member.php?u=5111)
Thanks for all this info. I will suggest that my friend get a 68882. He's interested
in NetBSD on his A1200 too, so an FPU is a must.
-
So why not compile with -m68020-60 -m68881 ? Everybody happy. :angel: