Welcome, Guest. Please login or register.

Author Topic: Threaded Code  (Read 6355 times)

Description:

0 Members and 1 Guest are viewing this topic.

Offline itix

  • Hero Member
  • *****
  • Join Date: Oct 2002
  • Posts: 2380
    • Show only replies by itix
Re: Threaded Code
« Reply #14 on: November 24, 2010, 06:38:20 PM »
Quote from: bloodline;594081
Not really, your advice was good, I didn't provide enough specification for you! As Karlos has also pointed out function pointers are the legal way to do this... I just don't want to save the registers for every function call!


Necessarily it is not saving the registers for every function call.  It is platform specific...
My Amigas: A500, Mac Mini and PowerBook
 

Offline bloodlineTopic starter

  • Master Sock Abuser
  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 12113
    • Show only replies by bloodline
    • http://www.troubled-mind.com
Re: Threaded Code
« Reply #15 on: November 24, 2010, 07:26:07 PM »
Quote from: itix;594110
Necessarily it is not saving the registers for every function call.  It is platform specific...
If I have a function that takes no arguments and returns nothing, and acts only on global variables... then will the compiler optimize for not saving any registers? I doubt it :(

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
    • Show only replies by Karlos
Re: Threaded Code
« Reply #16 on: November 24, 2010, 07:48:16 PM »
Quote from: bloodline;594117
If I have a function that takes no arguments and returns nothing, and acts only on global variables... then will the compiler optimize for not saving any registers? I doubt it :(


That depends. If shovel it all into the same translation unit so it can see the scope of everything whilst it is compiling, you'd be surprised...
int p; // A
 

Offline bloodlineTopic starter

  • Master Sock Abuser
  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 12113
    • Show only replies by bloodline
    • http://www.troubled-mind.com
Re: Threaded Code
« Reply #17 on: November 24, 2010, 08:13:37 PM »
Quote from: Karlos;594120
That depends. If shovel it all into the same translation unit so it can see the scope of everything whilst it is compiling, you'd be surprised...
Unfortunately the ARM compiler I'm using doesn't allow me access to the asm (I support I could disassemble it...) so I can't see what's really going on...

But you are right though, I've seen some pretty incredible optimization done by a C complier before...

I'm rapidly losing interest in this idea... as my real life workload has just increased :(

Offline Trev

  • Hero Member
  • *****
  • Join Date: May 2003
  • Posts: 1550
  • Country: 00
    • Show only replies by Trev
Re: Threaded Code
« Reply #18 on: November 24, 2010, 09:27:33 PM »
setjmp() and longjmp()? "sjlj" exceptions are used in many C++ implementations, including G++ on AmigaOS. In C, see the Protothreads library (more like fibers than threads) for creative uses http://www.sics.se/~adam/pt/. Protothreads is used by uIP.
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
    • Show only replies by Karlos
Re: Threaded Code
« Reply #19 on: November 24, 2010, 09:35:03 PM »
Hmm, IIRC, setjmp / longjmp save more state information than a normal function call does, so if the latter are too expensive, the former aren't likely to be any better?
int p; // A
 

Offline Trev

  • Hero Member
  • *****
  • Join Date: May 2003
  • Posts: 1550
  • Country: 00
    • Show only replies by Trev
Re: Threaded Code
« Reply #20 on: November 24, 2010, 10:19:00 PM »
Then you're just talking about a state machine of some sort, e.g. in pseudo code:

Code: [Select]
unsigned thread = 0;
unsigned state0 = 0;
unsigned state1 = 0;
unsigned state2 = 0;
unsigned state3 = 0;

while (1) {
  switch (thread++ % 4)
  {
  case 0:
    switch (state0++ % 2)
    {
      case 0:
        /* do something */
        break; /* yield */
     
      case 1:
        /* continue doing something */
    }
    break;

  case 1:
    switch (state1++ % 2)
    {
      case 0:
        /* do something */
        break; /* yield */
     
      case 1:
        /* continue doing something */
    }
    break;

  case 2:
    switch (state2++ % 2)
    {
      case 0:
        /* do something */
        break; /* yield */
     
      case 1:
        /* continue doing something */
    }
    break;

  case 3:
    switch (state3++ % 2)
    {
      case 0:
        /* do something */
        break; /* yield */
     
      case 1:
        /* continue doing something */
    }
  }
}

That's essentially what Protothreads does, albeit with the overhead of setjmp/longjmp.
« Last Edit: November 24, 2010, 10:21:48 PM by Trev »
 

Offline yakumo9275

  • Sr. Member
  • ****
  • Join Date: Jun 2008
  • Posts: 301
    • Show only replies by yakumo9275
    • http://mega-tokyo.com/blog
Re: Threaded Code
« Reply #21 on: November 24, 2010, 10:40:09 PM »
protothreads / coroutines. But mostly, C was not built to do what you want to do. Look at a lot of Forth interpreters and see how they do it, they pioneered the threaded code model, but mostly the ones I've seen use this method are written in assembler.
--/\\-[ Stu ]-/\\--
Commodore 128DCR, JiffyDOS, Ultimate 1541 II, uIEC/SD, CBM 1902A  Monitor
 

Offline bloodlineTopic starter

  • Master Sock Abuser
  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 12113
    • Show only replies by bloodline
    • http://www.troubled-mind.com
Re: Threaded Code
« Reply #22 on: November 24, 2010, 11:27:32 PM »
Quote from: yakumo9275;594170
protothreads / coroutines. But mostly, C was not built to do what you want to do. Look at a lot of Forth interpreters and see how they do it, they pioneered the threaded code model, but mostly the ones I've seen use this method are written in assembler.
Yeah, just a note to say I'm not talking about Multithreading (I don't want to run more than one task) I'm talking about a method of implementing a Virtual machine using a table of functions/subroutines... What little reading I have done has mostly been around forth... And yeah.. C really wasn't build for this :(

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
    • Show only replies by Karlos
Re: Threaded Code
« Reply #23 on: November 24, 2010, 11:30:52 PM »
Quote from: bloodline;594186
Yeah, just a note to say I'm not talking about Multithreading (I don't want to run more than one task) I'm talking about a method of implementing a Virtual machine using a table of functions/subroutines... What little reading I have done has mostly been around forth... And yeah.. C really wasn't build for this :(

If it helps, I've built a "virtual processor" using function tables / giant switch case depending on compiler settings as an experiment. Can send you the source if it helps.

PM me if you are interested.
« Last Edit: November 24, 2010, 11:38:50 PM by Karlos »
int p; // A
 

Offline bloodlineTopic starter

  • Master Sock Abuser
  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 12113
    • Show only replies by bloodline
    • http://www.troubled-mind.com
Re: Threaded Code
« Reply #24 on: November 25, 2010, 12:02:23 AM »
Quote from: Karlos;594188
If it helps, I've built a "virtual processor" using function tables / giant switch case depending on compiler settings as an experiment. Can send you the source if it helps.

PM me if you are interested.
Very kind, but I already have that! When last discussed this topic I was convinced that a giant switch case was the way to go and you helpfully sent me your own experiments! My project never progressed very far it was just too slow for the task I had intended (basic DSP work)...

Now I'm playing with ARM microcontrollers I'm wondering if I can get them to do something rather fun... That is to say replace an ASIC in a circuit... Or maybe even an old 16bit CPU... ;)

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
    • Show only replies by Karlos
Re: Threaded Code
« Reply #25 on: November 25, 2010, 12:04:36 AM »
Quote from: bloodline;594200
Very kind, but I already have that! When last discussed this topic I was convinced that a giant switch case was the way to go and you helpfully sent me your own experiments! My project never progressed very far it was just too slow for the task I had intended (basic DSP work)...

Now I'm playing with ARM microcontrollers I'm wondering if I can get them to do something rather fun... That is to say replace an ASIC in a circuit... Or maybe even an old 16bit CPU... ;)


Is it the one that has a test virtual program to generate the Mandelbrot set as a PPM file?
int p; // A
 

Offline bloodlineTopic starter

  • Master Sock Abuser
  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 12113
    • Show only replies by bloodline
    • http://www.troubled-mind.com
Re: Threaded Code
« Reply #26 on: November 25, 2010, 12:19:28 AM »
Quote from: Karlos;594202
Is it the one that has a test virtual program to generate the Mandelbrot set as a PPM file?
I don't recall any test code...

Offline Trev

  • Hero Member
  • *****
  • Join Date: May 2003
  • Posts: 1550
  • Country: 00
    • Show only replies by Trev
Re: Threaded Code
« Reply #27 on: November 25, 2010, 12:39:51 AM »
This is from the hip and probably dragon-infested, but it looks legal:

Code: [Select]
int thread[] = {
  1, 2, 3, n, 0
};                 /* thread of instructions */

int *ip = thread;  /* initialize instruction pointer */

jmp_buf buf[0xff]; /* 0x00..0xff bytecode operations */
jmp_buf top;       /* top-level interpreter */

if (setjmp(buf[0x00])) {
  /* example uses op 0x00 as terminator, so this should never be executed */
  longjmp(top, 1); /* optionally, use second parameter to raise exceptions */
}

if (setjmp(buf[0x01])) {
  /* do op 0x01, manipulate ip (also the stack) as needed */
  longjmp(top, 1); /* optionally, use second parameter to raise exceptions */
}


if (setjmp(buf[0x02])) {
  /* do op 0x02, manipulate ip (also the stack) as needed */
  longjmp(top, 1); /* optionally, use second parameter to raise exceptions */
}

if (setjmp(buf[n])) {
  /* do op n, manipulate ip (also the stack) as needed */
  longjmp(top, 1); /* optionally, use second parameter to raise exceptions */
}

while (*ip) {
  if (!setjmp(top)) {
    longjmp(buf[*ip++], 1);
  }
  else {
    /* process exception */
  }
}

The overhead of setjmp/longjmp is system specific but probably less than a function call.
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16879
  • Country: gb
  • Thanked: 5 times
    • Show only replies by Karlos
Re: Threaded Code
« Reply #28 on: November 25, 2010, 01:02:16 AM »
Quote from: bloodline;594208
I don't recall any test code...


Then maybe the version you have is old? This one even had a whole bunch of evil C macros that allowed the VM code to be written within the C source:

Code: [Select]



void nativeAllocBuffer(VMCore* vm)
{
  // width/height in r2/r3
  // return buffer in r1
  int w = vm->getReg(_r2).s32();
  int h = vm->getReg(_r3).s32();
  vm->getReg(_r1).pU8() = new uint8[w*h];
  printf("Allocated buffer [%d x %d]\n", w, h);
}

void nativeFreeBuffer(VMCore* vm)
{
  // expects buffer in r1
  delete[] vm->getReg(_r1).pCh();
  vm->getReg(_r1).pCh() = 0;
  printf("Freed buffer\n");
}

void nativeWriteBuffer(VMCore* vm)
{
  // writes buffer in r1 to filename in r4
  // expects width/height in r2/r3
  const char* fileName = vm->getReg(_r16).pCh();
  int w = vm->getReg(_r2).s32();
  int h = vm->getReg(_r3).s32();
  if (fileName) {
    FILE *f = fopen(fileName, "wb");
    fprintf(f, "P5\n%d\n%d\n255\n", w, h);
    fwrite(vm->getReg(_r1).pCh(), 1, w*h, f);
    fclose(f);
    printf("Wrote buffer '%s'\n", fileName);
  }
}

void nativePrintCoords(VMCore* vm)
{
  printf(
    "Coords %4d, %4d (%.6f, %.6f)\n",
    (int)vm->getReg(_r7).s32(),
    (int)vm->getReg(_r5).s32(),
    vm->getReg(_r9).f32(),
    vm->getReg(_r4).f32()
  );
}

_VM_CODE(makeFractal)
{
  // r1 = pixel data address
  // r2 = width in pixels
  // r3 = height in pixels
  // r4 = cY (float pos, starting at yMin)
  // r5 = y (int) pixel
  // r6 = xMin (float)
  // r7 = cX (float pos, starting at xMin)
  // r8 = fStep
  // r9 = x (int) pixel
  // r10 = iStep (1)

  _save       (_mr1)           // 2 : save r1
  _ldq        (0, _r5)         // 1 : y (r5) = 0
  _ld_16_i32  (255, _r10)      // 2 : max iters

  // y-loop
  _ldq        (0, _r9)         // 1 : x = 0
  _move_32    (_r6, _r7)       // 1 : cX = xMin

  // x-loop                        do {

  _move_32    (_r7, _r11)            // 1 : zx = cX
  _move_32    (_r4, _r12)            // 1 : zy = cY
  _ldq        (0, _r13)              // 1 :  n = 0

                                      // do {

  _move_32    (_r11, _r14)           // 1
  _mul_f32    (_r11, _r14)           // 1 : zx2 = zx*zx
  _move_32    (_r12, _r15)           // 1
  _mul_f32    (_r12, _r15)           // 1 : zy2   = zy*zy

  _move_32    (_r7,  _r16)           // 1 : new_zx = cX
  _add_f32    (_r14, _r16)           // 1 : new_zx += zx2
  _sub_f32    (_r15, _r16)           // 1 : new_zx -= zy2

  _add_f32    (_r15, _r14)           // 1 : r14 = zx*zx + zy*zy (for loop test)

  _move_32    (_r11, _r15)           // 1 : tmp = zx
  _mul_f32    (_r12, _r15)           // 1 : tmp *= zy
  _add_f32    (_r15, _r15)           // 1 : tmp += tmp2
  _add_f32    (_r4,  _r15)           // 1 : tmp += cY (tmp = 2*zx*zy+cY)

  _move_32    (_r15, _r12)           // 1 : zy = tmp
  _move_32    (_r16, _r11)           // 1 : zx = new_zx
  _addi_16    (1, _r13)              // 2 : n++

  _ld_32_f32  (4.0f, _r16)             // 3
  _bgr_f32    (_r14, _r16, 2)          // 2
  _bls_32     (_r13, _r10, -23)        // 2

  _mul_u16    (_r13, _r13)             // 1
  _st_ripi_8  (_r13, _r1)              // 1 : out = n
  _add_f32    (_r8, _r7)               // 1 : cX += fStep
  _addi_16    (1, _r9)                 // 2 : x += iStep

  _bls_32     (_r9, _r2, -(6+23+3+1))    // 2 : } while (x < width)

  _add_f32    (_r8, _r4)                 // 1 : cY += fStep
  _addi_16    (1, _r5)                   // 2 : y += iStep
  _bls_32     (_r5, _r3, -(5+5+6+23+1))  // 2 : } while (y < height)

  _restore    (_mr1)                     // 1
  _ret
};

_VM_CODE(calculateRanges)
{
  // calculates xMin in r6, xMax in r7, step in r8
  _move_32    (_r5, _r6)
  _sub_f32    (_r4, _r6)         // r6 = r5-r4 (total y range)
  _s32to_f32  (_r2, _r7)         // r7 = (float) r2
  _move_32    (_r7, _r9)
  _mul_f32    (_r6, _r7)         // r7 *= r6
  _s32to_f32  (_r3, _r6)         // r6 = (float) r3
  _div_f32    (_r6, _r7)         // r7 /= r6
  _move_32    (_r7, _r8)
  _div_f32    (_r9, _r8)
  _ld_32_f32  (0.75f, _r6)
  _sub_f32    (_r7, _r6)
  _add_f32    (_r6, _r7)

  _ret
};


_VM_CODE(virtualProgram)          // a vm function
{
  _ld_16_i16  (512, _r2)
  _ld_16_i16  (512, _r3)
  _calln      (nativeAllocBuffer)
  _ld_32_f32  (-1.25f, _r4)       // yMin
  _ld_32_f32  (1.25f, _r5)        // yMax
  _call       (calculateRanges)
  _call       (makeFractal)
  _lda        (&quot;framebuffer.pgm&quot;, _r16)
  _calln      (nativeWriteBuffer)
  _calln      (nativeFreeBuffer)
  _ret
};
int p; // A
 

Offline bloodlineTopic starter

  • Master Sock Abuser
  • Hero Member
  • *****
  • Join Date: Mar 2002
  • Posts: 12113
    • Show only replies by bloodline
    • http://www.troubled-mind.com
Re: Threaded Code
« Reply #29 from previous page: November 25, 2010, 01:05:24 AM »
Quote from: Karlos;594217
Then maybe the version you have is old? This one even had a whole bunch of evil C macros that allowed the VM code to be written within the C source:

Code: [Select]



void nativeAllocBuffer(VMCore* vm)
{
  // width/height in r2/r3
  // return buffer in r1
  int w = vm->getReg(_r2).s32();
  int h = vm->getReg(_r3).s32();
  vm->getReg(_r1).pU8() = new uint8[w*h];
  printf(&quot;Allocated buffer [%d x %d]\n&quot;, w, h);
}

void nativeFreeBuffer(VMCore* vm)
{
  // expects buffer in r1
  delete[] vm->getReg(_r1).pCh();
  vm->getReg(_r1).pCh() = 0;
  printf(&quot;Freed buffer\n&quot;);
}

void nativeWriteBuffer(VMCore* vm)
{
  // writes buffer in r1 to filename in r4
  // expects width/height in r2/r3
  const char* fileName = vm->getReg(_r16).pCh();
  int w = vm->getReg(_r2).s32();
  int h = vm->getReg(_r3).s32();
  if (fileName) {
    FILE *f = fopen(fileName, &quot;wb&quot;);
    fprintf(f, &quot;P5\n%d\n%d\n255\n&quot;, w, h);
    fwrite(vm->getReg(_r1).pCh(), 1, w*h, f);
    fclose(f);
    printf(&quot;Wrote buffer '%s'\n&quot;, fileName);
  }
}

void nativePrintCoords(VMCore* vm)
{
  printf(
    &quot;Coords %4d, %4d (%.6f, %.6f)\n&quot;,
    (int)vm->getReg(_r7).s32(),
    (int)vm->getReg(_r5).s32(),
    vm->getReg(_r9).f32(),
    vm->getReg(_r4).f32()
  );
}

_VM_CODE(makeFractal)
{
  // r1 = pixel data address
  // r2 = width in pixels
  // r3 = height in pixels
  // r4 = cY (float pos, starting at yMin)
  // r5 = y (int) pixel
  // r6 = xMin (float)
  // r7 = cX (float pos, starting at xMin)
  // r8 = fStep
  // r9 = x (int) pixel
  // r10 = iStep (1)

  _save       (_mr1)           // 2 : save r1
  _ldq        (0, _r5)         // 1 : y (r5) = 0
  _ld_16_i32  (255, _r10)      // 2 : max iters

  // y-loop
  _ldq        (0, _r9)         // 1 : x = 0
  _move_32    (_r6, _r7)       // 1 : cX = xMin

  // x-loop                        do {

  _move_32    (_r7, _r11)            // 1 : zx = cX
  _move_32    (_r4, _r12)            // 1 : zy = cY
  _ldq        (0, _r13)              // 1 :  n = 0

                                      // do {

  _move_32    (_r11, _r14)           // 1
  _mul_f32    (_r11, _r14)           // 1 : zx2 = zx*zx
  _move_32    (_r12, _r15)           // 1
  _mul_f32    (_r12, _r15)           // 1 : zy2   = zy*zy

  _move_32    (_r7,  _r16)           // 1 : new_zx = cX
  _add_f32    (_r14, _r16)           // 1 : new_zx += zx2
  _sub_f32    (_r15, _r16)           // 1 : new_zx -= zy2

  _add_f32    (_r15, _r14)           // 1 : r14 = zx*zx + zy*zy (for loop test)

  _move_32    (_r11, _r15)           // 1 : tmp = zx
  _mul_f32    (_r12, _r15)           // 1 : tmp *= zy
  _add_f32    (_r15, _r15)           // 1 : tmp += tmp2
  _add_f32    (_r4,  _r15)           // 1 : tmp += cY (tmp = 2*zx*zy+cY)

  _move_32    (_r15, _r12)           // 1 : zy = tmp
  _move_32    (_r16, _r11)           // 1 : zx = new_zx
  _addi_16    (1, _r13)              // 2 : n++

  _ld_32_f32  (4.0f, _r16)             // 3
  _bgr_f32    (_r14, _r16, 2)          // 2
  _bls_32     (_r13, _r10, -23)        // 2

  _mul_u16    (_r13, _r13)             // 1
  _st_ripi_8  (_r13, _r1)              // 1 : out = n
  _add_f32    (_r8, _r7)               // 1 : cX += fStep
  _addi_16    (1, _r9)                 // 2 : x += iStep

  _bls_32     (_r9, _r2, -(6+23+3+1))    // 2 : } while (x < width)

  _add_f32    (_r8, _r4)                 // 1 : cY += fStep
  _addi_16    (1, _r5)                   // 2 : y += iStep
  _bls_32     (_r5, _r3, -(5+5+6+23+1))  // 2 : } while (y < height)

  _restore    (_mr1)                     // 1
  _ret
};

_VM_CODE(calculateRanges)
{
  // calculates xMin in r6, xMax in r7, step in r8
  _move_32    (_r5, _r6)
  _sub_f32    (_r4, _r6)         // r6 = r5-r4 (total y range)
  _s32to_f32  (_r2, _r7)         // r7 = (float) r2
  _move_32    (_r7, _r9)
  _mul_f32    (_r6, _r7)         // r7 *= r6
  _s32to_f32  (_r3, _r6)         // r6 = (float) r3
  _div_f32    (_r6, _r7)         // r7 /= r6
  _move_32    (_r7, _r8)
  _div_f32    (_r9, _r8)
  _ld_32_f32  (0.75f, _r6)
  _sub_f32    (_r7, _r6)
  _add_f32    (_r6, _r7)

  _ret
};


_VM_CODE(virtualProgram)          // a vm function
{
  _ld_16_i16  (512, _r2)
  _ld_16_i16  (512, _r3)
  _calln      (nativeAllocBuffer)
  _ld_32_f32  (-1.25f, _r4)       // yMin
  _ld_32_f32  (1.25f, _r5)        // yMax
  _call       (calculateRanges)
  _call       (makeFractal)
  _lda        (&quot;framebuffer.pgm&quot;, _r16)
  _calln      (nativeWriteBuffer)
  _calln      (nativeFreeBuffer)
  _ret
};
Old! I'll say... We were last talking about this in 2003... Hmmm ilike the c macros idea... That allows you to test the functions! :)