Author Topic: Threaded Code (Read 2389 times)

bloodline · « **Reply #29 on:** November 25, 2010, 01:05:24 AM »

Quote from: Karlos;594217

Then maybe the version you have is old? This one even had a whole bunch of evil C macros that allowed the VM code to be written within the C source:

```cpp
#include <cstdio>  // printf, fopen, fprintf, fwrite, fclose
// VMCore, uint8, the _rN / _mrN register names and the _VM_CODE and
// opcode macros come from the VM's own headers (not shown in this post).

void nativeAllocBuffer(VMCore* vm)
{
  // width/height in r2/r3
  // return buffer in r1
  int w = vm->getReg(_r2).s32();
  int h = vm->getReg(_r3).s32();
  vm->getReg(_r1).pU8() = new uint8[w*h];
  printf("Allocated buffer [%d x %d]\n", w, h);
}

void nativeFreeBuffer(VMCore* vm)
{
  // expects buffer in r1; it was allocated as uint8[], so release it as uint8[]
  delete[] vm->getReg(_r1).pU8();
  vm->getReg(_r1).pU8() = 0;
  printf("Freed buffer\n");
}

void nativeWriteBuffer(VMCore* vm)
{
  // writes buffer in r1 to filename in r16
  // expects width/height in r2/r3
  const char* fileName = vm->getReg(_r16).pCh();
  int w = vm->getReg(_r2).s32();
  int h = vm->getReg(_r3).s32();
  if (fileName) {
    FILE *f = fopen(fileName, "wb");
    if (!f) {
      printf("Could not open '%s'\n", fileName);
      return;
    }
    fprintf(f, "P5\n%d\n%d\n255\n", w, h);
    fwrite(vm->getReg(_r1).pCh(), 1, w*h, f);
    fclose(f);
    printf("Wrote buffer '%s'\n", fileName);
  }
}

void nativePrintCoords(VMCore* vm)
{
  // pixel x/y in r9/r5, float cX/cY in r7/r4 (makeFractal's register map)
  printf(
    "Coords %4d, %4d (%.6f, %.6f)\n",
    (int)vm->getReg(_r9).s32(),
    (int)vm->getReg(_r5).s32(),
    vm->getReg(_r7).f32(),
    vm->getReg(_r4).f32()
  );
}

_VM_CODE(makeFractal)
{
  // r1  = pixel data address
  // r2  = width in pixels
  // r3  = height in pixels
  // r4  = cY (float pos, starting at yMin)
  // r5  = y (int) pixel
  // r6  = xMin (float)
  // r7  = cX (float pos, starting at xMin)
  // r8  = fStep
  // r9  = x (int) pixel
  // r10 = iStep (1)
  _save       (_mr1)              // 2 : save r1
  _ldq        (0, _r5)            // 1 : y (r5) = 0
  _ld_16_i32  (255, _r10)         // 2 : max iters

  // y-loop
  _ldq        (0, _r9)            // 1 : x = 0
  _move_32    (_r6, _r7)          // 1 : cX = xMin

  // x-loop do {
  _move_32    (_r7, _r11)         // 1 : zx = cX
  _move_32    (_r4, _r12)         // 1 : zy = cY
  _ldq        (0, _r13)           // 1 : n = 0

  // do {
  _move_32    (_r11, _r14)        // 1
  _mul_f32    (_r11, _r14)        // 1 : zx2 = zx*zx
  _move_32    (_r12, _r15)        // 1
  _mul_f32    (_r12, _r15)        // 1 : zy2 = zy*zy
  _move_32    (_r7, _r16)         // 1 : new_zx = cX
  _add_f32    (_r14, _r16)        // 1 : new_zx += zx2
  _sub_f32    (_r15, _r16)        // 1 : new_zx -= zy2
  _add_f32    (_r15, _r14)        // 1 : r14 = zx*zx + zy*zy (for loop test)
  _move_32    (_r11, _r15)        // 1 : tmp = zx
  _mul_f32    (_r12, _r15)        // 1 : tmp *= zy
  _add_f32    (_r15, _r15)        // 1 : tmp += tmp (tmp = 2*zx*zy)
  _add_f32    (_r4, _r15)         // 1 : tmp += cY (tmp = 2*zx*zy+cY)
  _move_32    (_r15, _r12)        // 1 : zy = tmp
  _move_32    (_r16, _r11)        // 1 : zx = new_zx
  _addi_16    (1, _r13)           // 2 : n++
  _ld_32_f32  (4.0f, _r16)        // 3
  _bgr_f32    (_r14, _r16, 2)     // 2 : if (r14 > 4.0) skip the next branch (escaped)
  _bls_32     (_r13, _r10, -23)   // 2 : } while (n < max iters)

  _mul_u16    (_r13, _r13)        // 1 : n = n*n
  _st_ripi_8  (_r13, _r1)         // 1 : out = n
  _add_f32    (_r8, _r7)          // 1 : cX += fStep
  _addi_16    (1, _r9)            // 2 : x += iStep
  _bls_32     (_r9, _r2, -(6+23+3+1))    // 2 : } while (x < width)

  _add_f32    (_r8, _r4)          // 1 : cY += fStep
  _addi_16    (1, _r5)            // 2 : y += iStep
  _bls_32     (_r5, _r3, -(5+5+6+23+1))  // 2 : } while (y < height)

  _restore    (_mr1)              // 1
  _ret
};

_VM_CODE(calculateRanges)
{
  // calculates xMin in r6, xMax in r7, step in r8
  _move_32    (_r5, _r6)
  _sub_f32    (_r4, _r6)          // r6 = r5-r4 (total y range)
  _s32to_f32  (_r2, _r7)          // r7 = (float) r2
  _move_32    (_r7, _r9)
  _mul_f32    (_r6, _r7)          // r7 *= r6
  _s32to_f32  (_r3, _r6)          // r6 = (float) r3
  _div_f32    (_r6, _r7)          // r7 /= r6 (total x range)
  _move_32    (_r7, _r8)
  _div_f32    (_r9, _r8)          // r8 = x range / width (step)
  _ld_32_f32  (0.75f, _r6)
  _sub_f32    (_r7, _r6)          // r6 = 0.75 - x range (xMin)
  _add_f32    (_r6, _r7)          // r7 = xMin + x range (xMax)
  _ret
};

_VM_CODE(virtualProgram) // a vm function
{
  _ld_16_i16  (512, _r2)
  _ld_16_i16  (512, _r3)
  _calln      (nativeAllocBuffer)
  _ld_32_f32  (-1.25f, _r4)       // yMin
  _ld_32_f32  (1.25f, _r5)        // yMax
  _call       (calculateRanges)
  _call       (makeFractal)
  _lda        ("framebuffer.pgm", _r16)
  _calln      (nativeWriteBuffer)
  _calln      (nativeFreeBuffer)
  _ret
};
```
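For anyone decoding the register map, here is a plain C++ restatement of what `calculateRanges` and `makeFractal` appear to compute, read off the comments and branch offsets in the VM listing above. It is a sketch under assumptions: two-operand ops are read as `dst op= src`, `_bls_32 (a, b, off)` as "branch if a < b" and `_bgr_f32 (a, b, off)` as "branch if a > b"; the `Ranges` struct and both function signatures are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical plain-C++ restatement of the VM routines calculateRanges and
// makeFractal. Comments name the VM register each variable stands for.
struct Ranges {
    float xMin;  // r6
    float xMax;  // r7
    float step;  // r8
};

// calculateRanges: x range keeps the image aspect ratio and ends at 0.75.
Ranges calculateRanges(int width, int height, float yMin, float yMax) {
    float yRange = yMax - yMin;                                  // r6 = r5 - r4
    float xRange = static_cast<float>(width) * yRange
                   / static_cast<float>(height);                 // r7 = w * yRange / h
    Ranges r;
    r.step = xRange / static_cast<float>(width);                 // r8 = xRange / w
    r.xMin = 0.75f - xRange;                                     // r6 = 0.75 - xRange
    r.xMax = xRange + r.xMin;                                    // r7 += r6
    return r;
}

// makeFractal: escape-time Mandelbrot, one byte per pixel, row by row.
// `out` must hold width * height bytes.
void makeFractal(std::vector<uint8_t>& out, int width, int height,
                 float yMin, const Ranges& rg) {
    const int maxIters = 255;                  // r10
    std::size_t idx = 0;                       // r1, post-incremented per store
    float cY = yMin;                           // r4
    for (int y = 0; y < height; ++y) {         // r5
        float cX = rg.xMin;                    // r7 = r6
        for (int x = 0; x < width; ++x) {      // r9
            float zx = cX, zy = cY;            // r11, r12: z starts at c
            int n = 0;                         // r13
            for (;;) {
                float zx2 = zx * zx;           // r14
                float zy2 = zy * zy;           // r15
                float newZx = cX + zx2 - zy2;  // r16
                float mag2 = zx2 + zy2;        // r14, from the old z
                float t = zx * zy;
                zy = t + t + cY;               // 2*zx*zy + cY
                zx = newZx;
                ++n;
                if (mag2 > 4.0f || n >= maxIters) break;   // _bgr_f32 / _bls_32
            }
            out[idx++] = static_cast<uint8_t>(n * n);      // _mul_u16, 8-bit store
            cX += rg.step;
        }
        cY += rg.step;
    }
}
```

With `width = height = 512`, `yMin = -1.25f` and `yMax = 1.25f` (the values `virtualProgram` loads), this gives `xMin = -1.75`, `xMax = 0.75` and a step of `2.5 / 512`. Because the stored byte is `n * n` truncated to 8 bits, a point that never escapes (`n = 255`) is written as 1.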

Old! I'll say... We were last talking about this in 2003... Hmmm, I like the C macros idea... That allows you to test the functions!

Karlos · « **Reply #30 on:** November 25, 2010, 01:08:48 AM »

Quote from: bloodline;594218

Old! I'll say... We were last talking about this in 2003... Hmmm, I like the C macros idea... That allows you to test the functions!

I can send you this version. It's a bit crap in that it compiles to a single executable with the test embedded. The reason is that the final target was a library which included all the necessary loading/linking stuff. It should compile on any POSIX-compliant system.

bloodline · « **Reply #31 on:** November 25, 2010, 01:13:02 AM »

Quote from: Karlos;594219

I can send you this version. It's a bit crap in that it compiles to a single executable with the test embedded. The reason is that the final target was a library which included all the necessary loading/linking stuff. It should compile on any POSIX-compliant system.

Kind offer, but I don't have much time right now to develop this idea further.

I am keen to discuss VM techniques, though, so that as time permits I'll have a clear idea of the issues and maybe build something!

Though right now I'm just fighting insomnia... I've got an early start tomorrow, and for some reason that means I can't get the rest I need.

Karlos · « **Reply #32 on:** November 25, 2010, 01:25:28 AM »

@bloodline

Well, there are some things I'd do differently if I were going to do it again. That VM had 16 general-purpose 64-bit registers, each of which could hold any 8/16/32/64-bit elemental type; the opcode defined how the contents were to be interpreted. Each instruction was (at least) a 2-byte entity: one byte for the operation and, usually, one byte encoding the source and destination registers. Since operations worked between registers, it was a load/store architecture.
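A minimal sketch of that register file and base instruction format, for illustration only. It assumes a C++ union for the untyped 64-bit register and a layout in which the second byte holds the source register in its high nibble and the destination in its low nibble (16 registers fit in 4 bits each); `Reg`, `Insn`, `regByte`, `decode`, `addS32` and `addF32` are hypothetical names, not Karlos's API. Operand order follows the listing earlier in the thread, where `op (src, dst)` means `dst op= src`.

```cpp
#include <cassert>
#include <cstdint>

// One 64-bit register able to hold any 8/16/32/64-bit elemental type.
// Nothing in the register records which member is live: the opcode decides.
union Reg {
    int64_t  s64;   // first member, so Reg{} zeroes all 8 bytes
    int8_t   s8;
    int16_t  s16;
    int32_t  s32;
    float    f32;
    double   f64;
    uint8_t* pU8;
};

// Decoded 2-byte base instruction (assumed layout):
// byte 0 = opcode, byte 1 = (src << 4) | dst.
struct Insn {
    uint8_t op;
    uint8_t src;
    uint8_t dst;
};

inline uint8_t regByte(unsigned src, unsigned dst) {
    return static_cast<uint8_t>(((src & 0x0F) << 4) | (dst & 0x0F));
}

inline Insn decode(const uint8_t* pc) {
    return Insn{pc[0], static_cast<uint8_t>(pc[1] >> 4),
                static_cast<uint8_t>(pc[1] & 0x0F)};
}

// Two handlers that read the same register file in different ways.
inline void addS32(Reg* r, const Insn& i) { r[i.dst].s32 += r[i.src].s32; }
inline void addF32(Reg* r, const Insn& i) { r[i.dst].f32 += r[i.src].f32; }
```

The register carries no type tag, which is the point: `addS32` and `addF32` treat the same 8-byte slot differently, and only the opcode chooses between them.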

This made it easy to design and write, but if I were doing it from scratch, I'd probably go for a stack-frame machine. It wouldn't actually be any slower, since the registers above are just memory locations anyway, and, done correctly, it would let you have as many "registers" as you have local data in any given function. That is to say, I'd keep the same register-like topology but have only as many of them in a function context as needed.
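The frame-based alternative could look roughly like this, assuming each VM function declares how many slots it needs on entry and register operands become offsets from the current frame's base. `Slot`, `FrameStack`, `enter`, `leave` and `reg` are hypothetical names; this sketches the idea, not a design Karlos published.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Untyped 64-bit slot, as in a fixed register file, but allocated per call.
union Slot {
    int64_t s64;   // first member, so Slot{} zeroes all 8 bytes
    int32_t s32;
    float   f32;
    double  f64;
};

class FrameStack {
public:
    // Enter a function that needs `slots` registers: push a zeroed frame.
    void enter(std::size_t slots) {
        std::size_t base = data_.size();
        bases_.push_back(base);
        data_.resize(base + slots, Slot{});
    }
    // Return from the current function: drop its frame.
    void leave() {
        data_.resize(bases_.back());
        bases_.pop_back();
    }
    // "Register" n of the current frame.
    Slot& reg(std::size_t n) { return data_[bases_.back() + n]; }
    std::size_t depth() const { return bases_.size(); }

private:
    std::vector<Slot> data_;           // all live frames, contiguous
    std::vector<std::size_t> bases_;   // start index of each live frame
};
```

An operand that used to name a fixed register such as `_r5` would instead mean slot 5 of the current frame, so a function needing three values only reserves three slots. Since `reg()` hands out a reference into a `std::vector`, it shouldn't be held across an `enter()` that may reallocate the storage.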

I did write some documentation, but it is rather out of date, I expect: http://extropia.co.uk/projects/vm/

Trev · « **Reply #33 on:** November 25, 2010, 05:04:58 AM »

My abuse of setjmp/longjmp worked as expected. Add a stack and an op to push values, and you have a very simple (yet poorly designed ;-) virtual machine.

ganyaik · « **Reply #34 on:** November 25, 2010, 06:01:06 AM »

Hi,

I did (and am still doing) something similar: a VM on a resource-constrained system. I experimented with jump tables, ifs and such, and settled on a plain "switch/case".

In my VM the opcode is always the first byte of the instruction, so I can branch on it easily, and the C compiler (gcc) generates a nice big jump table. The code that jumps to the case executing the emulated instruction is just a few assembly instructions to form the target pointer, plus an indirect jump. No register saving and such involved.
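A minimal sketch of that style of dispatch, using a made-up five-instruction register machine (the opcodes, operand layout and `run` are hypothetical, not ganyaik's VM). The opcode is always the first byte, and a single `switch` on it selects the handler; with dense case values, compilers such as gcc typically lower this to a bounds check plus an indirect jump through a table.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical opcodes; the opcode is always the first byte.
enum : uint8_t { OP_LDI = 0, OP_ADD = 1, OP_DEC = 2, OP_BNZ = 3, OP_HALT = 4 };

// Runs until OP_HALT and returns register 0.
//   OP_LDI  reg, imm8     r[reg] = imm8
//   OP_ADD  (src<<4)|dst  r[dst] += r[src]
//   OP_DEC  reg           r[reg] -= 1
//   OP_BNZ  reg, off8     if r[reg] != 0, jump by (int8_t)off8 from the next instruction
int32_t run(const uint8_t* code) {
    int32_t r[16] = {};
    const uint8_t* pc = code;
    for (;;) {
        switch (pc[0]) {   // dense cases: typically compiled to a jump table
        case OP_LDI:
            r[pc[1] & 0x0F] = pc[2];
            pc += 3;
            break;
        case OP_ADD:
            r[pc[1] & 0x0F] += r[pc[1] >> 4];
            pc += 2;
            break;
        case OP_DEC:
            r[pc[1] & 0x0F] -= 1;
            pc += 2;
            break;
        case OP_BNZ: {
            bool taken = r[pc[1] & 0x0F] != 0;
            int8_t off = static_cast<int8_t>(pc[2]);
            pc += 3;
            if (taken) pc += off;
            break;
        }
        case OP_HALT:
            return r[0];
        default:
            return -1;   // unknown opcode (no real error handling in this sketch)
        }
    }
}
```

For example, `{OP_LDI,0,0, OP_LDI,1,6, OP_LDI,2,7, OP_ADD,0x20, OP_DEC,1, OP_BNZ,1,0xF9, OP_HALT}` computes 6 * 7 by repeated addition and returns 42; the branch offset `0xF9` is -7, relative to the instruction after the `OP_BNZ`.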

Hope this helps,
Chris

Karlos · « **Reply #35 on:** November 25, 2010, 11:11:25 AM »

^ that's precisely how mine works when compiled with -D_VM_INTERPRETER=_VM_INTERPRETER_SWITCH_CASE. Otherwise it generates a function table.
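For comparison, a function-table variant could look roughly like this: each case becomes a separate handler function, and dispatch is an indirect call through an array indexed by the opcode. This is a hypothetical sketch (`Cpu`, `Handler`, `kTable`, `runTable` and the three opcodes are made up) and does not reproduce the `_VM_INTERPRETER` build switch.

```cpp
#include <cassert>
#include <cstdint>

struct Cpu {
    int32_t r[16];
    bool halted;
};

// Each handler executes one instruction and returns the address of the next.
using Handler = const uint8_t* (*)(Cpu&, const uint8_t*);

// Hypothetical opcodes: 0 = LDI reg,imm8   1 = ADD (src<<4)|dst   2 = HALT
static const uint8_t* opLdi(Cpu& c, const uint8_t* pc) {   // r[reg] = imm8
    c.r[pc[1] & 0x0F] = pc[2];
    return pc + 3;
}
static const uint8_t* opAdd(Cpu& c, const uint8_t* pc) {   // r[dst] += r[src]
    c.r[pc[1] & 0x0F] += c.r[pc[1] >> 4];
    return pc + 2;
}
static const uint8_t* opHalt(Cpu& c, const uint8_t* pc) {
    c.halted = true;
    return pc + 1;
}

// The opcode indexes this table directly.
static const Handler kTable[] = {opLdi, opAdd, opHalt};

int32_t runTable(const uint8_t* code) {
    Cpu c{};
    const uint8_t* pc = code;
    while (!c.halted) {
        pc = kTable[pc[0]](c, pc);   // no bounds check on the opcode in this sketch
    }
    return c.r[0];
}
```

One trade-off versus a switch: every instruction now costs a call and a return, and the VM state has to be passed to each handler instead of living in the interpreter loop's locals.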

There is an incomplete 68K version which uses assembler for the core interpreter. In this model, each opcode handler sits at a 64-byte boundary relative to a base address (so the handler for opcode n starts at base + n*64), and each handler ends with the code required to read the next opcode and calculate which handler to branch to next. The rest of each 64-byte slot holds the handler itself, or a jump out to an external block for the few handlers that did not fit.

This design showed a lot of promise, performance-wise.
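Portable C or C++ can't place handlers at fixed 64-byte offsets, but GCC's "labels as values" extension (also accepted by Clang) gets close to the same structure: each handler ends with its own copy of the fetch-and-branch, so there is no central loop to return to. This is an illustration under assumptions, not a port of the 68K code: the target comes from a table of label addresses rather than being computed as base + opcode * 64, the opcodes are made up, and `&&label` / `goto *` are non-standard, so it only builds with compilers implementing the extension.

```cpp
#include <cassert>
#include <cstdint>

// Threaded-style interpreter using the GCC/Clang labels-as-values extension.
// Hypothetical opcodes: 0 = LDI reg,imm8   1 = ADD (src<<4)|dst
//                       2 = DEC reg        3 = BNZ reg,off8   4 = HALT
int32_t runThreaded(const uint8_t* code) {
    static void* const handlers[] = {&&op_ldi, &&op_add, &&op_dec,
                                     &&op_bnz, &&op_halt};
    int32_t r[16] = {};
    const uint8_t* pc = code;

// Every handler finishes with its own copy of this fetch-and-branch.
#define NEXT goto *handlers[pc[0]]

    NEXT;

op_ldi:
    r[pc[1] & 0x0F] = pc[2];
    pc += 3;
    NEXT;
op_add:
    r[pc[1] & 0x0F] += r[pc[1] >> 4];
    pc += 2;
    NEXT;
op_dec:
    r[pc[1] & 0x0F] -= 1;
    pc += 2;
    NEXT;
op_bnz:
    // offset is relative to the next instruction (pc + 3)
    if (r[pc[1] & 0x0F] != 0) {
        pc += 3 + static_cast<int8_t>(pc[2]);
    } else {
        pc += 3;
    }
    NEXT;
op_halt:
    return r[0];
}
#undef NEXT
```

For example, `{0,0,0, 0,1,6, 0,2,7, 1,0x20, 2,1, 3,1,0xF9, 4}` computes 6 * 7 by repeated addition and returns 42. Replicating the dispatch in every handler gives each handler its own indirect jump instead of one shared one, which is a commonly cited reason this style tends to beat a plain switch loop on branch prediction.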
