
Author Topic: CopyMem Quick & Small released!  (Read 14221 times)


Offline Cosmos Amiga

  • Hero Member
  • *****
  • Join Date: Jan 2007
  • Posts: 954
    • Show all replies
    • http://leblogdecosmos.blogspot.com
Re: CopyMem Quick & Small released!
« on: January 11, 2015, 09:30:20 AM »
Great ideas and work, SpeedGeek!

Your patch gave me the motivation to update the BVisionPPC monitor:

 BVisionPPC v4.4

  - section removed
  - four fmovecr instructions (emulated on the 060) removed
  - two run-time 68030 checks removed (what were they for?)
  - one run-time FPU check removed
  - ugly internal SAS/C copy routine replaced by a _LVOCopyMem call


There are still two very, very ugly copy routines using move16 in this new version: shamefully slow...

I'll email them to you, you will see...



:)

Offline Cosmos Amiga

Re: CopyMem Quick & Small released!
« Reply #1 on: January 13, 2015, 02:38:42 PM »
Quote from: olsen
(SASC compiles it properly)

For the love of the Amiga, please DO NOT USE this compiler, or even mention it on the forum...


Here is an example of what I found in the GRex Voodoo3 monitor:

Code:
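JL_0_CC68
  move.l a5,-(sp)        ; save a5
  move.l d0,d1           ; d1 = index (sets N for the bmi below)
  move.l a0,a5           ; a5 = new value (movea leaves the flags alone)
  bmi.b JL_0_CC76        ; index < 0: error
  moveq #8,d0
  cmp.l d0,d1
  ble.b JL_0_CC7E        ; index <= 8: valid
JL_0_CC76
  move.w #-$0001,a0      ; a0 = -1 (sign-extended)
  move.l a0,d0           ; return -1
  bra.b JL_0_CC92
JL_0_CC7E
  asl.l #2,d1            ; index * 4 = byte offset
  lea $2AC(a4),a0        ; table base
  lea $2AC(a4),a1        ; same table base again
  move.l (a0,d1.l),a0    ; a0 = old entry
  move.l a5,(a1,d1.l)    ; store new value
  move.l a0,d0           ; return old entry
JL_0_CC92
  move.l (sp)+,a5        ; restore a5
  rts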


It's AWFUL: DO NOT USE THIS COMPILER!!
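
For anyone who does not read 68k assembly fluently, the listing above appears to do the following: check that the index in d0 is in the range 0..8, store the pointer passed in a0 into a table of longwords at $2AC(a4), and return the previous entry (or -1 for a bad index). A rough C equivalent is sketched below. The original source is not available, so the names `slot_table` and `swap_slot` are hypothetical and the table is a stand-in for whatever lives at $2AC(a4).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the nine longword slots at $2AC(a4). */
static void *slot_table[9];

/* Store new_value in slot `index` and return the previous entry.
   Returns (void *)-1 when index is outside 0..8, as the listing does. */
void *swap_slot(long index, void *new_value)
{
    if (index < 0 || index > 8)
        return (void *)(intptr_t)-1;

    void *old = slot_table[index];
    slot_table[index] = new_value;
    return old;
}
```

Seen this way, the objections are presumably that a5 is saved and restored although a scratch register would do, the table address is loaded twice, and -1 is built through a0 where a single `moveq #-1,d0` would be enough.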



:(

Offline Cosmos Amiga

Re: CopyMem Quick & Small released!
« Reply #2 on: January 13, 2015, 04:33:15 PM »
Quote from: Thomas Richter
Not really. It is "getting the job done with the resources available". In real life, the cost factor of software is *your* time, not *computer time*. Thus, if I can get something done in a high-level scripting language that satisfies the needs of my customers, and I take two days for that, then that's a better solution than working on the same project for a month in C even if the resulting C code would run probably at three times the speed. There are situations where this slow-down is not acceptable, of course, that depends on the problem. But the cases where you can justify for your clients a six-month development time for an Assembly program that runs 50% faster than a C code that could have been done in a month are pretty rare. Yes, I've seen such cases, but that's really the exception.

I don't care how long the coding takes me: I just want PERFECT code...

Sometimes I can think for many days about one small routine before finally finding its Truth... Just like SpeedGeek with CopyMem & CopyMemQuick!



:)

Offline Cosmos Amiga

Re: CopyMem Quick & Small released!
« Reply #3 on: January 22, 2015, 04:37:54 PM »
"Little by little the bird makes its nest"



:)

Offline Cosmos Amiga

Re: CopyMem Quick & Small released!
« Reply #4 on: February 22, 2015, 11:17:09 AM »
Here are some very interesting and precise benchmarks of the cycle penalty for Fast RAM accesses at different address offsets!

Test machine: Apollo 1260 with 68060@90


Data cache and store buffer enabled:
Code:
1)
addr read.l 0(a0) : 1060 us (19 cycles)
addr read.l 1(a0) : 1115 us (20 cycles) => +1
addr read.l 2(a0) : 1115 us (20 cycles) => +1
addr read.l 3(a0) : 1115 us (20 cycles) => +1

2)
addr read.w 0(a0) : 1060 us (19 cycles)
addr read.w 1(a0) : 1059 us (19 cycles) => +0
addr read.w 2(a0) : 1059 us (19 cycles) => +0
addr read.w 3(a0) : 1115 us (20 cycles) => +1

3)
addr write.l 0(a0) : 1170 us (21 cycles)
addr write.l 1(a0) : 1170 us (21 cycles) => +0
addr write.l 2(a0) : 1282 us (23 cycles) => +2
addr write.l 3(a0) : 1282 us (23 cycles) => +2

4)
addr write.w 0(a0) : 1170 us (21 cycles)
addr write.w 1(a0) : 1170 us (21 cycles) => +0
addr write.w 2(a0) : 1170 us (21 cycles) => +0
addr write.w 3(a0) : 1282 us (23 cycles) => +2


Data cache and store buffer disabled:
Code:
1)
addr read.l 0(a0) : 4985 us  (89 cycles)
addr read.l 1(a0) : 6665 us (119 cycles) => +30
addr read.l 2(a0) : 5880 us (105 cycles) => +16
addr read.l 3(a0) : 6665 us (119 cycles) => +30

2)
addr read.w 0(a0) : 4986 us  (89 cycles)
addr read.w 1(a0) : 5774 us (103 cycles) => +14
addr read.w 2(a0) : 4986 us  (89 cycles) => +0
addr read.w 3(a0) : 5880 us (105 cycles) => +16

3)
addr write.l 0(a0) : 5102 us  (91 cycles)
addr write.l 1(a0) : 5102 us  (91 cycles) => +0
addr write.l 2(a0) : 5883 us (105 cycles) => +14
addr write.l 3(a0) : 6664 us (119 cycles) => +28

4)
addr write.w 0(a0) : 5102 us  (91 cycles)
addr write.w 1(a0) : 5883 us (105 cycles) => +14
addr write.w 2(a0) : 5102 us  (91 cycles) => +0
addr write.w 3(a0) : 5883 us (105 cycles) => +14


The instruction timed in each case 1) to 4), shown here for offset 0, with a nop on either side:
Code:
1)
nop
move.l (a0),d0
nop

2)
nop
move.w (a0),d0
nop

3)
nop
move.l d0,(a0)
nop

4)
nop
move.w d0,(a0)
nop
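
The "=> +N" column in the tables above is simply the cycle count at each offset minus the count at offset 0. As a small sketch of that arithmetic in C (the function name `alignment_penalty` is made up for illustration and is not part of the benchmark tool):

```c
#include <stddef.h>

/* Fill penalty[i] with cycles[i] - cycles[0], i.e. the extra cycles
   each offset costs compared with the access at offset 0. */
void alignment_penalty(const int *cycles, size_t count, int *penalty)
{
    for (size_t i = 0; i < count; i++)
        penalty[i] = cycles[i] - cycles[0];
}
```

Fed with the 68060 read.l figures, 19, 20, 20, 20 with the cache enabled and 89, 119, 105, 119 with it disabled, this gives the +1, +1, +1 and +30, +16, +30 columns shown in the tables.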



:)

Offline Cosmos Amiga

Re: CopyMem Quick & Small released!
« Reply #5 on: February 22, 2015, 01:23:19 PM »
Here are some very interesting and precise benchmarks of the cycle penalty for Fast RAM accesses at different address offsets!

Test machine: GVP Turbo+ Jaws 1230 with 68030@40


Data cache enabled:
Code:

1)
addr read.l 0(a0) : 1424 us (14 cycles)
addr read.l 1(a0) : 1530 us (15 cycles) => +1
addr read.l 2(a0) : 1530 us (15 cycles) => +1
addr read.l 3(a0) : 1530 us (15 cycles) => +1

2)
addr read.w 0(a0) : 1424 us (14 cycles)
addr read.w 1(a0) : 1426 us (14 cycles) => +0
addr read.w 2(a0) : 1426 us (14 cycles) => +0
addr read.w 3(a0) : 1530 us (15 cycles) => +1

3)
addr write.l 0(a0) : 1540 us (15 cycles)
addr write.l 1(a0) : 1540 us (15 cycles) => +0
addr write.l 2(a0) : 2174 us (21 cycles) => +6
addr write.l 3(a0) : 2174 us (21 cycles) => +6

4)
addr write.w 0(a0) : 1530 us (15 cycles)
addr write.w 1(a0) : 1530 us (15 cycles) => +0
addr write.w 2(a0) : 1530 us (15 cycles) => +0
addr write.w 3(a0) : 2174 us (21 cycles) => +6


Data cache disabled:
Code:

1)
addr read.l 0(a0) : 2173 us (21 cycles)
addr read.l 1(a0) : 2813 us (28 cycles) => +7
addr read.l 2(a0) : 2813 us (28 cycles) => +7
addr read.l 3(a0) : 2813 us (28 cycles) => +7

2)
addr read.w 0(a0) : 2173 us (21 cycles)
addr read.w 1(a0) : 2173 us (21 cycles) => +0
addr read.w 2(a0) : 2173 us (21 cycles) => +0
addr read.w 3(a0) : 2813 us (28 cycles) => +7

3)
addr write.l 0(a0) : 1792 us (17 cycles)
addr write.l 1(a0) : 1792 us (17 cycles) => +0
addr write.l 2(a0) : 2419 us (24 cycles) => +7
addr write.l 3(a0) : 2419 us (24 cycles) => +7

4)
addr write.w 0(a0) : 1431 us (14 cycles)
addr write.w 1(a0) : 1792 us (17 cycles) => +3
addr write.w 2(a0) : 1792 us (17 cycles) => +3
addr write.w 3(a0) : 2419 us (24 cycles) => +10
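
In all four tables above, the longword writes at offsets 2 and 3 cost extra cycles. One common way a copy routine limits that cost is to bring the destination to a longword boundary before entering the main loop. The C sketch below illustrates that general idea only; it is not SpeedGeek's CopyMem patch, and the name `copy_dst_aligned` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes, aligning the destination to 4 bytes before the
   longword loop. Returns dst, like memcpy. */
void *copy_dst_aligned(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Head: single bytes until the destination is longword aligned. */
    while (n > 0 && ((uintptr_t)d & 3u) != 0) {
        *d++ = *s++;
        n--;
    }

    /* Body: one longword per iteration. memcpy keeps the possibly
       misaligned source read well defined in portable C. */
    while (n >= 4) {
        uint32_t word;
        memcpy(&word, s, 4);
        memcpy(d, &word, 4);
        d += 4;
        s += 4;
        n -= 4;
    }

    /* Tail: the remaining 0..3 bytes. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Only the destination is aligned here, so the source reads may still be misaligned. Whether that trade is the right one depends on the read and write penalties of the particular CPU, which is exactly what the figures above measure.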




:)