Welcome, Guest. Please login or register.

Author Topic: My C homework  (Read 16011 times)

Description:

0 Members and 1 Guest are viewing this topic.

Offline Smack

  • Newbie
  • *
  • Join Date: Jul 2003
  • Posts: 16
    • Show all replies
Re: Mel's C homework
« on: March 01, 2007, 12:18:56 PM »
@Karlos

Quote
Since your 030 has only 256 bytes of data cache, you might want to redesign your character table like so:

May I express my doubts that this will make the table lookups any faster.

While the bit array needs less memory and can therefore fit into the 68030's tiny cache, the extra instructions required for the bitwise access will probably nullify the performance boost achieved by the cache hits. (I haven't counted the instruction cycles, yet)

Here's some example code in M68k assembly to support my claim:
Code: [Select]

;  input: d0 = character (values 0...255)
;  input: a0 = address of lookup table
; output: status bit Z (Zero)

; alternative (1) byte array lookup
tst.b   (a0,d0.w)       ;(cache-miss likely on 68030)

; alternative (2) bit array lookup (supposed C compiler output)
move.w  d0,d1
lsr.w   #5,d1           ;d1=table offset (long words)
move.l  (a0,d1.w*4),d1  ;d1=long word value
and.l   #31,d0          ;d0=bit number (modulo 32)
moveq   #1,d2
lsl.l   d0,d2           ;d2=bit mask
and.l   d2,d1

; alternative (3) bit arry lookup (asm optimized)
move.w  d0,d1
lsr.w   #3,d1           ;d1=table offset (bytes)
btst    d0,(a0,d1.w)    ;(implied: d0=bit number modulo 8)

Depending on the number of wait states caused by the cache-miss during byte array lookup (alternative 1), even the hand-optimised bit array lookup (alternative 3) may be slower due to the extra instructions, let alone the C version (alternative 2). If the code runs on a 68040 or 060 then things will look even worse for the bit array...

I think this kind of performance tweaking is way off-topic for a beginners session in C, anyway.

Sorry for nagging.  ;-)
 

Offline Smack

  • Newbie
  • *
  • Join Date: Jul 2003
  • Posts: 16
    • Show all replies
Re: Mel's C homework
« Reply #1 on: March 01, 2007, 03:17:08 PM »
Quote
It was faster than the basic char array on my 040. I reasoned that it may well be faster on the 030 but as you say, mileage varies ;-)

Interesting! Why's that? Can the longword memory access be that much faster than the byte memory access?
 

Offline Smack

  • Newbie
  • *
  • Join Date: Jul 2003
  • Posts: 16
    • Show all replies
Re: Mel's C homework
« Reply #2 on: March 01, 2007, 04:32:49 PM »
Quote
SamuraiCrow wrote:
No...  The '040 has a 4k cache on the die instead of a 256 byte cache.

Yeah, I know. The byte array (256 bytes) should easily fit into that larger cache, which should give the byte lookup a boost. But Karlos observed a better performance of the bit lookup with its smaller table (32 bytes) but more instructions.

Why? Cache-misses due to lookup table of 256 bytes not being in 68040 data cache? Longword lookups faster than byte lookups? Hm...
 

Offline Smack

  • Newbie
  • *
  • Join Date: Jul 2003
  • Posts: 16
    • Show all replies
Re: Mel's C homework
« Reply #3 on: March 01, 2007, 05:28:21 PM »
Quote
Karlos wrote:
The source data also enters the cache.
...
Therefore, perhaps the smaller bit-table is less affected due to the fact that it's both smaller and fact that a lot of the time the same 32-bit word will be repeatedly hit for many different characters.

Yes, that's probably the explanation of the results.

But as you said, in the presence of file I/O it seems useless to optimise this part of the program. It's just a nice technical exercise with no practical benefit. Have fun.  ;-)