Author Topic: CopyMem Quick & Small released! (Read 14221 times)

Cosmos Amiga · « **on:** January 11, 2015, 09:30:20 AM »

Great ideas and work, SpeedGeek!

Your patch gave me the motivation to update the BVisionPPC monitor:

BVisionPPC v4.4

- section removed
- four fmovecr instructions (emulated on the 060) removed
- two runtime 68030 checks removed (what were they for?)
- one runtime FPU check removed
- ugly internal SAS/C copy routine replaced by a _LVOCopyMem call

There are still two ugly, ugly, ugly copy routines using move16 in this new version: their slowness is shameful...

I'll email them to you, you will see...

Cosmos Amiga · « **Reply #1 on:** January 13, 2015, 02:38:42 PM »

Quote from: olsen;781601

(SASC compiles it properly)

For the love of the Amiga, please DO NOT USE this compiler, or even mention it on the forum, at all...

Here is an example of what I found in the GRex Voodoo3 monitor:

Code: [Select]

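JL_0_CC68
 	move.l	a5,-(sp)	; save a5
 	move.l	d0,d1		; d1 = index (this move sets the flags)
 	move.l	a0,a5		; a5 = new entry (movea leaves the flags alone)
 	bmi.b	JL_0_CC76	; index < 0: error
 	moveq	#8,d0
 	cmp.l	d0,d1
 	ble.b	JL_0_CC7E	; index <= 8: in range
JL_0_CC76
 	move.w	#-$0001,a0	; a0 = -1 (sign-extended)
 	move.l	a0,d0		; return -1
 	bra.b	JL_0_CC92
JL_0_CC7E
 	asl.l	#2,d1		; index * 4 = longword table offset
 	lea	$2AC(a4),a0	; table base
 	lea	$2AC(a4),a1	; same table base, computed a second time
 	move.l	(a0,d1.l),a0	; a0 = old entry
 	move.l	a5,(a1,d1.l)	; store new entry
 	move.l	a0,d0		; return old entry
JL_0_CC92
 	move.l	(sp)+,a5	; restore a5
 	rts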

It's AWFUL: DO NOT USE THIS COMPILER!!
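For readers who do not read 68k assembly, here is a minimal C sketch of what the listing above appears to do: check that an index is in range, store a new entry in a table, and return the previous one. The original source is not known, so the names `slot_table`, `SLOT_COUNT` and `swap_slot` are hypothetical, and the table size of 9 entries is only inferred from the `ble` comparison against 8.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the longword table at $2AC(a4).
 * 9 entries is an assumption inferred from the "index <= 8" test. */
#define SLOT_COUNT 9
static void *slot_table[SLOT_COUNT];

/*
 * Store new_entry at slot_table[index] and return the previous entry.
 * Returns (void *)-1 when index is outside 0..8, mirroring the -1
 * path of the listing.
 */
void *swap_slot(long index, void *new_entry)
{
    void *old_entry;

    if (index < 0 || index > SLOT_COUNT - 1)
        return (void *)(intptr_t)-1;

    old_entry = slot_table[index];
    slot_table[index] = new_entry;
    return old_entry;
}
```

Seen this way, the routine is a bounds check plus one load and one store, which is why the duplicated `lea` and the detour through `a0` to build the -1 return value stand out.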

Cosmos Amiga · « **Reply #2 on:** January 13, 2015, 04:33:15 PM »

Quote from: Thomas Richter;781612

Not really. It is "getting the job done with the resources available". In real life, the cost factor of software is *your* time, not *computer time*. Thus, if I can get something done in a high-level scripting language that satisfies the needs of my customers, and I take two days for that, then that's a better solution than working on the same project for a month in C, even if the resulting C code would probably run at three times the speed. There are situations where this slow-down is not acceptable, of course; that depends on the problem. But the cases where you can justify to your clients a six-month development time for an assembly program that runs 50% faster than C code that could have been done in a month are pretty rare. Yes, I've seen such cases, but that's really the exception.

I don't care about coding time: I just want PERFECT code...

Sometimes I can think for many days about one small routine before finally finding its Truth... Just like SpeedGeek with CopyMem & CopyMemQuick!

Cosmos Amiga · « **Reply #3 on:** January 22, 2015, 04:37:54 PM »

"Little by little the bird makes its nest"

Cosmos Amiga · « **Reply #4 on:** February 22, 2015, 11:17:09 AM »

Here are some very interesting and precise benchmarks of the cycle penalties for Fast RAM accesses at different address offsets!

Test machine: Apollo 1260 with 68060@90

DataCache and Store Buffer enabled:

Code: [Select]

1)
addr read.l 0(a0) : 1060 us (19 cycles)
addr read.l 1(a0) : 1115 us (20 cycles) => +1
addr read.l 2(a0) : 1115 us (20 cycles) => +1
addr read.l 3(a0) : 1115 us (20 cycles) => +1

2)
addr read.w 0(a0) : 1060 us (19 cycles)
addr read.w 1(a0) : 1059 us (19 cycles) => +0
addr read.w 2(a0) : 1059 us (19 cycles) => +0
addr read.w 3(a0) : 1115 us (20 cycles) => +1

3)
addr write.l 0(a0) : 1170 us (21 cycles)
addr write.l 1(a0) : 1170 us (21 cycles) => +0
addr write.l 2(a0) : 1282 us (23 cycles) => +2
addr write.l 3(a0) : 1282 us (23 cycles) => +2

4)
addr write.w 0(a0) : 1170 us (21 cycles)
addr write.w 1(a0) : 1170 us (21 cycles) => +0
addr write.w 2(a0) : 1170 us (21 cycles) => +0
addr write.w 3(a0) : 1282 us (23 cycles) => +2

DataCache and Store Buffer disabled:

Code: [Select]

1)
addr read.l 0(a0) : 4985 us  (89 cycles)
addr read.l 1(a0) : 6665 us (119 cycles) => +30
addr read.l 2(a0) : 5880 us (105 cycles) => +16
addr read.l 3(a0) : 6665 us (119 cycles) => +30

2)
addr read.w 0(a0) : 4986 us  (89 cycles)
addr read.w 1(a0) : 5774 us (103 cycles) => +14
addr read.w 2(a0) : 4986 us  (89 cycles) => +0
addr read.w 3(a0) : 5880 us (105 cycles) => +16

3)
addr write.l 0(a0) : 5102 us  (91 cycles)
addr write.l 1(a0) : 5102 us  (91 cycles) => +0
addr write.l 2(a0) : 5883 us (105 cycles) => +14
addr write.l 3(a0) : 6664 us (119 cycles) => +28

4)
addr write.w 0(a0) : 5102 us  (91 cycles)
addr write.w 1(a0) : 5883 us (105 cycles) => +14
addr write.w 2(a0) : 5102 us  (91 cycles) => +0
addr write.w 3(a0) : 5883 us (105 cycles) => +14

The instruction timed in each numbered test:

Code: [Select]

1)
nop
move.l (a0),d0
nop

2)
nop
move.w (a0),d0
nop

3)
nop
move.l d0,(a0)
nop

4)
nop
move.w d0,(a0)
nop
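The tables give both an elapsed time in microseconds and a cycle count, but the post does not say how one was derived from the other. A minimal C sketch of one plausible conversion follows, assuming the cycle figure is simply elapsed time multiplied by the clock rate and divided by the number of loop iterations. The function name `cycles_per_iteration` is hypothetical, and the iteration count is NOT stated in the post, so it is left as a parameter.

```c
#include <stdint.h>

/*
 * Cycles per loop iteration for one measured run.
 *   elapsed_us : total elapsed time in microseconds
 *   clock_mhz  : CPU clock in MHz, i.e. cycles per microsecond
 *   iterations : number of loop iterations (an assumption; the post
 *                does not give this value)
 * Integer division truncates toward zero. Returns 0 if iterations is 0.
 */
uint32_t cycles_per_iteration(uint32_t elapsed_us,
                              uint32_t clock_mhz,
                              uint32_t iterations)
{
    uint64_t total_cycles;

    if (iterations == 0)
        return 0;

    total_cycles = (uint64_t)elapsed_us * clock_mhz;
    return (uint32_t)(total_cycles / iterations);
}
```

For example, with a hypothetical count of 5000 iterations, `cycles_per_iteration(1060, 90, 5000)` gives 19, which agrees with the first 68060 row. That agreement is consistent with the assumption but does not prove it.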

Cosmos Amiga · « **Reply #5 on:** February 22, 2015, 01:23:19 PM »

Here are some very interesting and precise benchmarks of the cycle penalties for Fast RAM accesses at different address offsets!

Test machine: GVP Turbo+ Jaws 1230 with 68030@40

DataCache enabled:

Code: [Select]


1)
addr read.l 0(a0) : 1424 us (14 cycles)
addr read.l 1(a0) : 1530 us (15 cycles) => +1
addr read.l 2(a0) : 1530 us (15 cycles) => +1
addr read.l 3(a0) : 1530 us (15 cycles) => +1

2)
addr read.w 0(a0) : 1424 us (14 cycles)
addr read.w 1(a0) : 1426 us (14 cycles) => +0
addr read.w 2(a0) : 1426 us (14 cycles) => +0
addr read.w 3(a0) : 1530 us (15 cycles) => +1

3)
addr write.l 0(a0) : 1540 us (15 cycles)
addr write.l 1(a0) : 1540 us (15 cycles) => +0
addr write.l 2(a0) : 2174 us (21 cycles) => +6
addr write.l 3(a0) : 2174 us (21 cycles) => +6

4)
addr write.w 0(a0) : 1530 us (15 cycles)
addr write.w 1(a0) : 1530 us (15 cycles) => +0
addr write.w 2(a0) : 1530 us (15 cycles) => +0
addr write.w 3(a0) : 2174 us (21 cycles) => +6

DataCache disabled:

Code: [Select]


1)
addr read.l 0(a0) : 2173 us (21 cycles)
addr read.l 1(a0) : 2813 us (28 cycles) => +7
addr read.l 2(a0) : 2813 us (28 cycles) => +7
addr read.l 3(a0) : 2813 us (28 cycles) => +7

2)
addr read.w 0(a0) : 2173 us (21 cycles)
addr read.w 1(a0) : 2173 us (21 cycles) => +0
addr read.w 2(a0) : 2173 us (21 cycles) => +0
addr read.w 3(a0) : 2813 us (28 cycles) => +7

3)
addr write.l 0(a0) : 1792 us (17 cycles)
addr write.l 1(a0) : 1792 us (17 cycles) => +0
addr write.l 2(a0) : 2419 us (24 cycles) => +7
addr write.l 3(a0) : 2419 us (24 cycles) => +7

4)
addr write.w 0(a0) : 1431 us (14 cycles)
addr write.w 1(a0) : 1792 us (17 cycles) => +3
addr write.w 2(a0) : 1792 us (17 cycles) => +3
addr write.w 3(a0) : 2419 us (24 cycles) => +10
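One pattern worth checking against the tables: in the read rows measured with the data cache enabled, on both machines, a penalty shows up exactly when the access straddles a longword boundary (read.l at offsets 1, 2 and 3, read.w at offset 3 only). The write rows and the uncached 68060 rows do not follow this simple rule. A minimal C sketch of that test follows; the function name `crosses_longword` is hypothetical, and the rule is only an observation about these numbers, not a statement about how either CPU works internally.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * True when an access of `size` bytes, starting `offset` bytes past a
 * longword-aligned address, spills into the next longword.
 */
bool crosses_longword(size_t offset, size_t size)
{
    return (offset % 4) + size > 4;
}
```

With `size` set to 4 this returns true for offsets 1 to 3, and with `size` set to 2 only for offset 3, which lines up with the "+1" entries in the cached read tables.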
