Author Topic: Threaded Code (Read 6391 times)

Karlos · « **on:** November 24, 2010, 02:59:42 PM »

Quote from: bloodline;594067

So here I am thinking about an idea... And it turns out my idea already has a name, "Threaded Code"!

My question is, is there any C/C++ legal way to jump to an address? A computed Goto if you will...

goto *someaddress;

The most obvious legal way to do this is to use a table of function pointers.
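For reference, a minimal sketch of the function-pointer-table approach in standard C++. The three-opcode stack machine, the `VM` struct and every name below are made up for illustration; they are not taken from any VM discussed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical three-opcode stack VM, for illustration only.
enum Op : std::uint8_t { OP_PUSH1, OP_ADD, OP_HALT, OP_COUNT };

struct VM {
    const std::uint8_t* pc;   // next opcode to fetch
    int stack[16];
    std::size_t sp;           // number of values currently on the stack
    bool running;
};

using Handler = void (*)(VM&);

static void opPush1(VM& vm) { assert(vm.sp < 16); vm.stack[vm.sp++] = 1; }
static void opAdd(VM& vm) {
    assert(vm.sp >= 2);
    int top = vm.stack[--vm.sp];
    vm.stack[vm.sp - 1] += top;
}
static void opHalt(VM& vm) { vm.running = false; }

// One entry per opcode, indexed directly by the opcode byte.
static const Handler kHandlers[OP_COUNT] = { opPush1, opAdd, opHalt };

// Runs until OP_HALT and returns the value on top of the stack.
int runTable(const std::uint8_t* code) {
    VM vm{code, {}, 0, true};
    while (vm.running) {
        std::uint8_t op = *vm.pc++;
        assert(op < OP_COUNT);
        kHandlers[op](vm);    // indirect call through the table
    }
    assert(vm.sp > 0);
    return vm.stack[vm.sp - 1];
}
```

Every instruction costs one indirect call and return, which is the overhead the rest of the thread is about.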

Karlos · « **Reply #1 on:** November 24, 2010, 03:03:36 PM »

Quote from: bloodline;594074

No, that's a function pointer, which will invoke the calling convention (saving registers etc.) and add a lot of overhead to a threaded code virtual machine...

Well, the gcc extension (computed goto, which gcc documents as "Labels as Values") is as portable as gcc is. If you keep your dubious computed branch target code to just one translation unit, you could always make it an exception in your makefile and have everything else all nice and ANSI.
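A rough sketch of what that one non-ANSI translation unit could look like, using the GNU extension (`&&label` plus `goto *ptr`), which Clang accepts as well. The opcodes and the `runThreaded` name are invented for the example, and the file would have to be built without strict pedantic checks.

```cpp
#include <cassert>
#include <cstdint>

#if !defined(__GNUC__)
#error "this file relies on the GNU labels-as-values extension (GCC or Clang)"
#endif

// Hypothetical opcodes, for illustration only.
enum Op : std::uint8_t { OP_INC, OP_DOUBLE, OP_HALT };

int runThreaded(const std::uint8_t* pc) {
    // Label addresses, in the same order as the Op enum.
    static void* const labels[] = { &&do_inc, &&do_double, &&do_halt };
    int acc = 0;

// Fetch the next opcode and jump straight to its handler: no call, no return.
#define DISPATCH() \
    do { assert(*pc <= OP_HALT); goto *labels[*pc++]; } while (0)

    DISPATCH();
do_inc:
    ++acc;
    DISPATCH();
do_double:
    acc *= 2;
    DISPATCH();
do_halt:
#undef DISPATCH
    return acc;
}
```

Each handler ends with its own dispatch, so there is no central loop and no calling convention involved.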

Karlos · « **Reply #2 on:** November 24, 2010, 07:48:16 PM »

Quote from: bloodline;594117

If I have a function that takes no arguments and returns nothing, and acts only on global variables... then will the compiler optimize for not saving any registers? I doubt it

That depends. If you shovel it all into the same translation unit so the compiler can see the scope of everything whilst it is compiling, you'd be surprised...

Karlos · « **Reply #3 on:** November 24, 2010, 09:35:03 PM »

Hmm, IIRC, setjmp / longjmp save more state information than a normal function call does, so if the latter are too expensive, the former aren't likely to be any better?

Karlos · « **Reply #4 on:** November 24, 2010, 11:30:52 PM »

Quote from: bloodline;594186

Yeah, just a note to say I'm not talking about multithreading (I don't want to run more than one task). I'm talking about a method of implementing a virtual machine using a table of functions/subroutines... What little reading I have done has mostly been around Forth... And yeah... C really wasn't built for this.

If it helps, I've built a "virtual processor" using function tables / giant switch case depending on compiler settings as an experiment. Can send you the source if it helps.

PM me if you are interested.

Karlos · « **Reply #5 on:** November 25, 2010, 12:04:36 AM »

Quote from: bloodline;594200

Very kind, but I already have that! When we last discussed this topic I was convinced that a giant switch case was the way to go, and you helpfully sent me your own experiments! My project never progressed very far; it was just too slow for the task I had intended (basic DSP work)...

Now that I'm playing with ARM microcontrollers, I'm wondering if I can get them to do something rather fun... That is to say, replace an ASIC in a circuit... Or maybe even an old 16-bit CPU...

Is it the one that has a test virtual program to generate the Mandelbrot set as a PGM file?

Karlos · « **Reply #6 on:** November 25, 2010, 01:02:16 AM »

Quote from: bloodline;594208

I don't recall any test code...

Then maybe the version you have is old? This one even had a whole bunch of evil C macros that allowed the VM code to be written within the C source:

Code:

void nativeAllocBuffer(VMCore* vm)
{
  // width/height in r2/r3
  // return buffer in r1
  int w = vm->getReg(_r2).s32();
  int h = vm->getReg(_r3).s32();
  vm->getReg(_r1).pU8() = new uint8[w*h];
  printf("Allocated buffer [%d x %d]\n", w, h);
}

void nativeFreeBuffer(VMCore* vm)
{
  // expects buffer in r1
  // release through the same pointer type the buffer was allocated with
  delete[] vm->getReg(_r1).pU8();
  vm->getReg(_r1).pU8() = 0;
  printf("Freed buffer\n");
}

void nativeWriteBuffer(VMCore* vm)
{
  // writes buffer in r1 to filename in r16
  // expects width/height in r2/r3
  const char* fileName = vm->getReg(_r16).pCh();
  int w = vm->getReg(_r2).s32();
  int h = vm->getReg(_r3).s32();
  if (fileName) {
    FILE *f = fopen(fileName, "wb");
    if (!f) {
      // don't write through a null FILE* if the file can't be opened
      printf("Could not open '%s'\n", fileName);
      return;
    }
    fprintf(f, "P5\n%d\n%d\n255\n", w, h);
    fwrite(vm->getReg(_r1).pCh(), 1, w*h, f);
    fclose(f);
    printf("Wrote buffer '%s'\n", fileName);
  }
}

void nativePrintCoords(VMCore* vm)
{
  printf(
    "Coords %4d, %4d (%.6f, %.6f)\n",
    (int)vm->getReg(_r7).s32(),
    (int)vm->getReg(_r5).s32(),
    vm->getReg(_r9).f32(),
    vm->getReg(_r4).f32()
  );
}

_VM_CODE(makeFractal)
{
  // r1 = pixel data address
  // r2 = width in pixels
  // r3 = height in pixels
  // r4 = cY (float pos, starting at yMin)
  // r5 = y (int) pixel
  // r6 = xMin (float)
  // r7 = cX (float pos, starting at xMin)
  // r8 = fStep
  // r9 = x (int) pixel
  // r10 = max iterations (255)

  _save       (_mr1)           // 2 : save r1
  _ldq        (0, _r5)         // 1 : y (r5) = 0
  _ld_16_i32  (255, _r10)      // 2 : max iters

  // y-loop
  _ldq        (0, _r9)         // 1 : x = 0
  _move_32    (_r6, _r7)       // 1 : cX = xMin

  // x-loop                        do {

  _move_32    (_r7, _r11)            // 1 : zx = cX
  _move_32    (_r4, _r12)            // 1 : zy = cY
  _ldq        (0, _r13)              // 1 :  n = 0

                                      // do {

  _move_32    (_r11, _r14)           // 1
  _mul_f32    (_r11, _r14)           // 1 : zx2 = zx*zx
  _move_32    (_r12, _r15)           // 1
  _mul_f32    (_r12, _r15)           // 1 : zy2   = zy*zy

  _move_32    (_r7,  _r16)           // 1 : new_zx = cX
  _add_f32    (_r14, _r16)           // 1 : new_zx += zx2
  _sub_f32    (_r15, _r16)           // 1 : new_zx -= zy2

  _add_f32    (_r15, _r14)           // 1 : r14 = zx*zx + zy*zy (for loop test)

  _move_32    (_r11, _r15)           // 1 : tmp = zx
  _mul_f32    (_r12, _r15)           // 1 : tmp *= zy
  _add_f32    (_r15, _r15)           // 1 : tmp += tmp (tmp = 2*zx*zy)
  _add_f32    (_r4,  _r15)           // 1 : tmp += cY (tmp = 2*zx*zy+cY)

  _move_32    (_r15, _r12)           // 1 : zy = tmp
  _move_32    (_r16, _r11)           // 1 : zx = new_zx
  _addi_16    (1, _r13)              // 2 : n++

  _ld_32_f32  (4.0f, _r16)             // 3 : escape threshold
  _bgr_f32    (_r14, _r16, 2)          // 2 : if (zx*zx + zy*zy > 4.0) break
  _bls_32     (_r13, _r10, -23)        // 2 : } while (n < max iters)

  _mul_u16    (_r13, _r13)             // 1 : n *= n
  _st_ripi_8  (_r13, _r1)              // 1 : out = n
  _add_f32    (_r8, _r7)               // 1 : cX += fStep
  _addi_16    (1, _r9)                 // 2 : x += iStep

  _bls_32     (_r9, _r2, -(6+23+3+1))    // 2 : } while (x < width)

  _add_f32    (_r8, _r4)                 // 1 : cY += fStep
  _addi_16    (1, _r5)                   // 2 : y += iStep
  _bls_32     (_r5, _r3, -(5+5+6+23+1))  // 2 : } while (y < height)

  _restore    (_mr1)                     // 1
  _ret
};

_VM_CODE(calculateRanges)
{
  // calculates xMin in r6, xMax in r7, step in r8
  _move_32    (_r5, _r6)
  _sub_f32    (_r4, _r6)         // r6 = r5-r4 (total y range)
  _s32to_f32  (_r2, _r7)         // r7 = (float) r2
  _move_32    (_r7, _r9)
  _mul_f32    (_r6, _r7)         // r7 *= r6
  _s32to_f32  (_r3, _r6)         // r6 = (float) r3
  _div_f32    (_r6, _r7)         // r7 /= r6
  _move_32    (_r7, _r8)
  _div_f32    (_r9, _r8)
  _ld_32_f32  (0.75f, _r6)
  _sub_f32    (_r7, _r6)
  _add_f32    (_r6, _r7)

  _ret
};


_VM_CODE(virtualProgram)          // a vm function
{
  _ld_16_i16  (512, _r2)
  _ld_16_i16  (512, _r3)
  _calln      (nativeAllocBuffer)
  _ld_32_f32  (-1.25f, _r4)       // yMin
  _ld_32_f32  (1.25f, _r5)        // yMax
  _call       (calculateRanges)
  _call       (makeFractal)
  _lda        ("framebuffer.pgm", _r16)
  _calln      (nativeWriteBuffer)
  _calln      (nativeFreeBuffer)
  _ret
};
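As a guess at how macros like these can work (the real definitions are not shown in this post), each instruction macro can expand to one 16-bit word followed by a comma, with the function macro opening an array definition. That would also explain why the listing has no separators between instructions, and the `// 1`, `// 2`, `// 3` annotations appear to count such words. All macro names, opcode numbers and the bit layout below are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcode numbers; not the original VM's.
enum : std::uint16_t { OP_LDQ = 1, OP_ADD = 2, OP_RET = 3 };

// Opens an array definition; the braces and the trailing ';' come from the caller.
#define VM_CODE(name) static const std::uint16_t name[] =

// Each instruction expands to one 16-bit word plus a trailing comma.
#define VM_WORD(op, hi, lo) \
    static_cast<std::uint16_t>(((op) << 8) | (((hi) & 0xF) << 4) | ((lo) & 0xF)),
#define LDQ(imm4, dst) VM_WORD(OP_LDQ, imm4, dst)   // dst = small immediate
#define ADD(src, dst)  VM_WORD(OP_ADD, src, dst)    // dst += src
#define RET            VM_WORD(OP_RET, 0, 0)

VM_CODE(addThree)
{
    LDQ (1, 0)      // r0 = 1
    LDQ (2, 1)      // r1 = 2
    ADD (0, 1)      // r1 += r0
    RET
};

// Tiny interpreter for the words above; returns r1 by (assumed) convention.
int run(const std::uint16_t* pc) {
    int reg[16] = {};
    for (;;) {
        std::uint16_t w = *pc++;
        int hi = (w >> 4) & 0xF;
        int lo = w & 0xF;
        switch (w >> 8) {
            case OP_LDQ: reg[lo] = hi;       break;
            case OP_ADD: reg[lo] += reg[hi]; break;
            case OP_RET: return reg[1];
            default: assert(!"unknown opcode"); return -1;
        }
    }
}
```

Since the program is just a constant array in the C++ source, the test program and the interpreter can be built into a single executable.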

Karlos · « **Reply #7 on:** November 25, 2010, 01:08:48 AM »

Quote from: bloodline;594218

Old! I'll say... We were last talking about this in 2003... Hmmm, I like the C macros idea... That allows you to test the functions!

I can send you this version. It's a bit crap in that it compiles to a single executable containing the embedded test, because the final target was a library that included all the necessary loading/linking stuff. It should compile on any POSIX-compliant system.

Karlos · « **Reply #8 on:** November 25, 2010, 01:25:28 AM »

@bloodline

Well there are some things I'd do differently if I was going to do it again. That VM had 16 general purpose 64-bit registers that could contain any 8/16/32/64-bit wide elemental type at once. The opcode defined how they were to be interpreted. Each opcode was (at least) a 2-byte entity, with a byte for the operation and usually a byte that encoded the source and destination register. As such it was a load/store architecture.
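To make that encoding concrete, here is a small sketch of a 2-byte instruction acting on 64-bit registers whose type is chosen by the opcode. The opcode numbers, the `Reg` helper and the choice of high nibble for the source register are assumptions; the post only says that one byte encodes source and destination.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// One 64-bit register; the opcode, not the register, decides the type.
struct Reg {
    std::uint64_t bits;

    template <typename T> T get() const {
        static_assert(sizeof(T) <= sizeof(std::uint64_t), "type must fit in 64 bits");
        T v;
        std::memcpy(&v, &bits, sizeof v);
        return v;
    }
    template <typename T> void set(T v) {
        static_assert(sizeof(T) <= sizeof(std::uint64_t), "type must fit in 64 bits");
        std::memcpy(&bits, &v, sizeof v);
    }
};

// Hypothetical opcode numbers, for illustration only.
enum : std::uint8_t { OP_MOVE_32 = 0, OP_ADD_F32 = 1 };

// Executes one instruction: insn[0] = operation, insn[1] = src:dst nibbles.
void step(Reg (&r)[16], const std::uint8_t insn[2]) {
    unsigned src = insn[1] >> 4;     // assumed: high nibble is the source
    unsigned dst = insn[1] & 0x0F;   // assumed: low nibble is the destination
    switch (insn[0]) {
        case OP_MOVE_32:
            r[dst].set<std::uint32_t>(r[src].get<std::uint32_t>());
            break;
        case OP_ADD_F32:
            r[dst].set<float>(r[dst].get<float>() + r[src].get<float>());
            break;
        default:
            assert(!"unknown opcode");
    }
}
```

With 16 registers, two 4-bit indices fit exactly in the second byte, which is presumably why the operand byte can name both registers.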

This made it easy to design and write, but doing it from scratch, I'd probably go for a stack-frame machine. It wouldn't actually be any slower since the above registers are still memory locations anyway, and if done correctly, would allow you to have as many "registers" as you have local data inside any function. That is to say, I'd use the same register-like topology but have only as many of them in a function context as needed.
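One way to picture the stack-frame idea: operands are still small "register" numbers, but they index slots relative to the current frame, and each function reserves only as many slots as it needs. `FrameVM` and its members are hypothetical names for this sketch, not part of the VM described above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FrameVM {
    std::vector<std::int64_t> stack;  // the "registers" of every live frame
    std::size_t base = 0;             // first slot of the current frame

    // Slot n of the current frame, used exactly like register n.
    // The reference is invalidated by the next enter() or leave().
    std::int64_t& slot(std::size_t n) {
        assert(base + n < stack.size());
        return stack[base + n];
    }

    // Enter a function that needs `count` slots; returns the caller's base.
    std::size_t enter(std::size_t count) {
        std::size_t saved = base;
        base = stack.size();
        stack.resize(stack.size() + count, 0);
        return saved;
    }

    // Drop the current frame and restore the caller's base.
    void leave(std::size_t savedBase) {
        stack.resize(base);
        base = savedBase;
    }
};
```

A function with two locals gets two slots and one with forty gets forty, while access stays a base-plus-index memory read, much like a register file that lives in memory anyway.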

I did write some documentation, but it is rather out of date, I expect: http://extropia.co.uk/projects/vm/

Karlos · « **Reply #9 on:** November 25, 2010, 11:11:25 AM »

^ that's precisely how mine works when compiled with -D_VM_INTERPRETER=_VM_INTERPRETER_SWITCH_CASE. Otherwise it generates a function table.
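A stripped-down sketch of selecting the dispatch style with a `-D` flag. The macro names `VM_DISPATCH`, `VM_DISPATCH_SWITCH` and `VM_DISPATCH_TABLE` are stand-ins invented here, not the real `_VM_INTERPRETER` definitions, and the opcodes are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

#define VM_DISPATCH_TABLE  0
#define VM_DISPATCH_SWITCH 1
#ifndef VM_DISPATCH
#define VM_DISPATCH VM_DISPATCH_TABLE   // default when no -D flag is given
#endif

enum Op : std::uint8_t { OP_INC, OP_DEC, OP_HALT, OP_COUNT };

struct Core { const std::uint8_t* pc; int acc; bool running; };

// Handlers are shared by both back ends.
static inline void doInc(Core& c)  { ++c.acc; }
static inline void doDec(Core& c)  { --c.acc; }
static inline void doHalt(Core& c) { c.running = false; }

int run(const std::uint8_t* code) {
    Core c{code, 0, true};
#if VM_DISPATCH == VM_DISPATCH_SWITCH
    // Switch back end: handlers can be inlined into the loop.
    while (c.running) {
        switch (*c.pc++) {
            case OP_INC:  doInc(c);  break;
            case OP_DEC:  doDec(c);  break;
            case OP_HALT: doHalt(c); break;
            default: assert(!"unknown opcode"); c.running = false;
        }
    }
#else
    // Table back end: one indirect call per instruction.
    static void (*const table[OP_COUNT])(Core&) = { doInc, doDec, doHalt };
    while (c.running) {
        std::uint8_t op = *c.pc++;
        assert(op < OP_COUNT);
        table[op](c);
    }
#endif
    return c.acc;
}
```

Building with `-DVM_DISPATCH=VM_DISPATCH_SWITCH` picks the switch loop; both variants should give the same result for the same program.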

There is an incomplete 68K version which uses assembler for the core interpreter. In this model, each opcode handler is at a 64-byte boundary relative to a base address and each handler ends with the code required to read the next opcode and calculate which handler to branch to next. The rest of the space is left to implement the handler or jump out to an external block (for the few that did not fit).
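The address arithmetic behind that layout is simple enough to show in a few lines. This sketch assumes a one-byte opcode, which the 68K description does not state, and the function names are made up.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uintptr_t kSlotSize = 64;   // bytes per handler slot, as in the post

// Address of the handler for `opcode`, given the address of slot 0.
inline std::uintptr_t handlerAddress(std::uintptr_t base, std::uint8_t opcode) {
    return base + static_cast<std::uintptr_t>(opcode) * kSlotSize;
}

// Inverse mapping, handy when checking a layout: which opcode owns `addr`?
inline std::uint8_t opcodeAt(std::uintptr_t base, std::uintptr_t addr) {
    assert(addr >= base && (addr - base) / kSlotSize < 256);
    return static_cast<std::uint8_t>((addr - base) / kSlotSize);
}
```

Because 64 is a power of two, the multiply is a 6-bit shift, so a handler can compute its successor's address with a shift and an add instead of a table lookup. Under the one-byte assumption, 256 slots would occupy 16 KiB.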

This design showed a lot of promise, performance-wise.
