Amiga.org
The "Not Quite Amiga but still computer related category" => Amiga Emulation => Topic started by: bloodline on November 24, 2010, 02:31:29 PM
-
So here I am thinking about an idea... And it turns out my idea already has a name, "Threaded Code"!
My question is, is there any C/C++ legal way to jump to an address? A computed Goto if you will...
goto *someaddress;
-
Don't know too much about C, but I read that it explicitly doesn't allow such things.
-
Don't know too much about C, but I read that it explicitly doesn't allow such things.
Yeah... I think I've found a gcc extension that allows it... But I'd prefer something more portable :(
-
Yes, something like this should work
void (*app)(void) = (void (*)(void))0x123456;
and then later
app();
I can't test this right now, but I'm pretty sure the above would jump to address 0x123456.
-
So here I am thinking about an idea... And it turns out my idea already has a name, "Threaded Code"!
My question is, is there any C/C++ legal way to jump to an address? A computed Goto if you will...
goto *someaddress;
The most obvious legal way to do this is to use a table of function pointers.
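For reference, a minimal sketch of what a function pointer table could look like in standard C. The opcodes, the VM struct and the name run_table are made up for illustration, not taken from anyone's code in this thread:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration; not from the thread. */
enum { OP_HALT, OP_PUSH1, OP_ADD, OP_COUNT };

typedef struct VM {
    const unsigned char *ip;  /* next opcode to execute */
    int stack[16];
    int sp;                   /* number of values on the stack */
    int running;
} VM;

typedef void (*Handler)(VM *);

static void op_halt(VM *vm)  { vm->running = 0; }
static void op_push1(VM *vm) { assert(vm->sp < 16); vm->stack[vm->sp++] = 1; }
static void op_add(VM *vm)
{
    assert(vm->sp >= 2);
    vm->sp--;
    vm->stack[vm->sp - 1] += vm->stack[vm->sp];
}

/* One handler per opcode, indexed by the opcode value. */
static const Handler handlers[OP_COUNT] = { op_halt, op_push1, op_add };

/* Runs until OP_HALT; returns the top of the stack (0 if it is empty). */
int run_table(const unsigned char *code)
{
    VM vm = { 0 };
    vm.ip = code;
    vm.running = 1;
    while (vm.running) {
        unsigned char op = *vm.ip++;
        assert(op < OP_COUNT);
        handlers[op](&vm);    /* indirect call through the table */
    }
    return vm.sp ? vm.stack[vm.sp - 1] : 0;
}
```

Every opcode costs one indirect call here, so whatever the platform's calling convention demands is paid per instruction unless the compiler manages to optimise it away.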
-
Yes, something like this should work
void (*app)(void) = (void (*)(void))0x123456;
and then later
app();
I can't test this right now, but I'm pretty sure the above would jump to address 0x123456.
No, that's a function pointer, which will invoke the calling convention (saving registers etc.), which would add a lot of overhead to a threaded code Virtual Machine...
-
No, that's a function pointer, which will invoke the calling convention (saving registers etc.), which would add a lot of overhead to a threaded code Virtual Machine...
Well the gcc extension is as portable as gcc is. If you keep your dubious computed branch target code to just one translation unit, you could always make it an exception in your makefile and have everything else all nice and ANSI.
-
The most obvious legal way to do this is to use a table of function pointers.
I appreciate the advice (and from skurk too), but a function pointer is just too expensive in this regard... My VM needs to run on my 100MHz ARM M3 microcontroller... We have to think lightweight! But I also need it to be portable so I can test it on my test machine... An x86... So ARM Asm is going to be problematic...
-
No, that's a function pointer, which will invoke the calling convention (saving registers etc.), which would add a lot of overhead to a threaded code Virtual Machine...
Oh, then I misunderstood your question. :-)
-
Well the gcc extension is as portable as gcc is. If you keep your dubious computed branch target code to just one translation unit, you could always make it an exception in your makefile and have everything else all nice and ANSI.
Yeah, that's almost certainly the most intelligent way to do it! :)
-
No, that's a function pointer, which will invoke the calling convention (saving registers etc.), which would add a lot of overhead to a threaded code Virtual Machine...
Oh, then I misunderstood your question. :-)
Not really, your advice was good, I didn't provide enough specification for you! As Karlos has also pointed out function pointers are the legal way to do this... I just don't want to save the registers for every function call!
-
If I'm happy to stay with gcc, then this seems to be the winner:
http://docs.freebsd.org/info/gcc/gcc.info.Labels_as_Values.html
Bit sucky really...
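A rough sketch of how the "labels as values" extension could be used for dispatch, with a plain switch fallback so the same source still builds on other compilers. The opcodes, the VM_NO_COMPUTED_GOTO switch and the function name are assumptions for illustration only:

```c
#include <assert.h>

/* Hypothetical opcodes for illustration. */
enum { OP_HALT, OP_INC, OP_DOUBLE };

/* VM_NO_COMPUTED_GOTO is a made-up macro to force the fallback path. */
#if defined(__GNUC__) && !defined(VM_NO_COMPUTED_GOTO)
#define VM_COMPUTED_GOTO 1
#else
#define VM_COMPUTED_GOTO 0
#endif

/* Executes opcodes until OP_HALT and returns the accumulator.
   Opcodes are not range-checked in this sketch. */
int run_threaded(const unsigned char *code)
{
    const unsigned char *ip = code;
    int acc = 0;

#if VM_COMPUTED_GOTO
    /* GCC extension: the table order must match the enum above. */
    static void *const labels[] = { &&L_OP_HALT, &&L_OP_INC, &&L_OP_DOUBLE };
#define CASE(op)   L_##op:
#define DISPATCH() goto *labels[*ip++]
    DISPATCH();
    {
#else
    /* Fallback: standard C, one switch per instruction. */
#define CASE(op)   case op:
#define DISPATCH() break
    for (;;) switch (*ip++) {
#endif
        CASE(OP_INC)    acc += 1; DISPATCH();
        CASE(OP_DOUBLE) acc *= 2; DISPATCH();
        CASE(OP_HALT)   return acc;
    }
#undef CASE
#undef DISPATCH
}
```

With the extension enabled, each handler ends in its own indirect jump straight to the next handler, with no call or return in between. Whether that actually beats the switch on a given target is something to measure rather than assume.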
-
If you're just writing for ARM and testing on x86, couldn't you use a conditionally-compiled bit of assembler like:
#ifdef ARM
asm { /* whatever */}
#else
asm { /* x86 equivalent */}
#endif
or somesuch?
-
If you're just writing for ARM and testing on x86, couldn't you use a conditionally-compiled bit of assembler like:
#ifdef ARM
asm { /* whatever */}
#else
asm { /* x86 equivalent */}
#endif
or somesuch?
It's not the code paths that I have a problem with... it's maintaining two code paths :lol: and in ASM too... kinda defeats the point of using C in the first place ;)
-
Not really, your advice was good, I didn't provide enough specification for you! As Karlos has also pointed out function pointers are the legal way to do this... I just don't want to save the registers for every function call!
It is not necessarily saving the registers for every function call. It is platform specific...
-
It is not necessarily saving the registers for every function call. It is platform specific...
If I have a function that takes no arguments and returns nothing, and acts only on global variables... then will the compiler optimize for not saving any registers? I doubt it :(
-
If I have a function that takes no arguments and returns nothing, and acts only on global variables... then will the compiler optimize for not saving any registers? I doubt it :(
That depends. If you shovel it all into the same translation unit so it can see the scope of everything whilst it is compiling, you'd be surprised...
-
That depends. If you shovel it all into the same translation unit so it can see the scope of everything whilst it is compiling, you'd be surprised...
Unfortunately the ARM compiler I'm using doesn't allow me access to the asm (I suppose I could disassemble it...) so I can't see what's really going on...
But you are right though, I've seen some pretty incredible optimization done by a C compiler before...
I'm rapidly losing interest in this idea... as my real life workload has just increased :(
-
setjmp() and longjmp()? "sjlj" exceptions are used in many C++ implementations, including G++ on AmigaOS. In C, see the Protothreads library (more like fibers than threads) for creative uses http://www.sics.se/~adam/pt/. Protothreads is used by uIP.
-
Hmm, IIRC, setjmp / longjmp save more state information than a normal function call does, so if the latter are too expensive, the former aren't likely to be any better?
-
Then you're just talking about a state machine of some sort, e.g. in pseudo code:
unsigned thread = 0;
unsigned state0 = 0;
unsigned state1 = 0;
unsigned state2 = 0;
unsigned state3 = 0;
while (1) {
    switch (thread++ % 4)
    {
    case 0:
        switch (state0++ % 2)
        {
        case 0:
            /* do something */
            break; /* yield */
        case 1:
            /* continue doing something */
        }
        break;
    case 1:
        switch (state1++ % 2)
        {
        case 0:
            /* do something */
            break; /* yield */
        case 1:
            /* continue doing something */
        }
        break;
    case 2:
        switch (state2++ % 2)
        {
        case 0:
            /* do something */
            break; /* yield */
        case 1:
            /* continue doing something */
        }
        break;
    case 3:
        switch (state3++ % 2)
        {
        case 0:
            /* do something */
            break; /* yield */
        case 1:
            /* continue doing something */
        }
    }
}
That's essentially what Protothreads does, albeit with the overhead of setjmp/longjmp.
-
protothreads / coroutines. But mostly, C was not built to do what you want to do. Look at a lot of Forth interpreters and see how they do it, they pioneered the threaded code model, but mostly the ones I've seen use this method are written in assembler.
-
protothreads / coroutines. But mostly, C was not built to do what you want to do. Look at a lot of Forth interpreters and see how they do it, they pioneered the threaded code model, but mostly the ones I've seen use this method are written in assembler.
Yeah, just a note to say I'm not talking about multithreading (I don't want to run more than one task). I'm talking about a method of implementing a Virtual Machine using a table of functions/subroutines... What little reading I have done has mostly been around Forth... And yeah... C really wasn't built for this :(
-
Yeah, just a note to say I'm not talking about multithreading (I don't want to run more than one task). I'm talking about a method of implementing a Virtual Machine using a table of functions/subroutines... What little reading I have done has mostly been around Forth... And yeah... C really wasn't built for this :(
If it helps, I've built a "virtual processor" using function tables / giant switch case depending on compiler settings as an experiment. Can send you the source if it helps.
PM me if you are interested.
-
If it helps, I've built a "virtual processor" using function tables / giant switch case depending on compiler settings as an experiment. Can send you the source if it helps.
PM me if you are interested.
Very kind, but I already have that! When we last discussed this topic I was convinced that a giant switch case was the way to go and you helpfully sent me your own experiments! My project never progressed very far; it was just too slow for the task I had intended (basic DSP work)...
Now I'm playing with ARM microcontrollers I'm wondering if I can get them to do something rather fun... That is to say replace an ASIC in a circuit... Or maybe even an old 16bit CPU... ;)
-
Very kind, but I already have that! When we last discussed this topic I was convinced that a giant switch case was the way to go and you helpfully sent me your own experiments! My project never progressed very far; it was just too slow for the task I had intended (basic DSP work)...
Now I'm playing with ARM microcontrollers I'm wondering if I can get them to do something rather fun... That is to say replace an ASIC in a circuit... Or maybe even an old 16bit CPU... ;)
Is it the one that has a test virtual program to generate the Mandelbrot set as a PPM file?
-
Is it the one that has a test virtual program to generate the Mandelbrot set as a PPM file?
I don't recall any test code...
-
This is from the hip and probably dragon-infested, but it looks legal:
#include <setjmp.h>
int thread[] = {
    1, 2, 3, n, 0
}; /* thread of instructions */
int * volatile ip = thread; /* instruction pointer; volatile because it is modified between setjmp and longjmp */
jmp_buf buf[0x100]; /* 0x00..0xff bytecode operations */
jmp_buf top; /* top-level interpreter */
if (setjmp(buf[0x00])) {
    /* example uses op 0x00 as terminator, so this should never be executed */
    longjmp(top, 1); /* optionally, use second parameter to raise exceptions */
}
if (setjmp(buf[0x01])) {
    /* do op 0x01, manipulate ip (also the stack) as needed */
    longjmp(top, 1); /* optionally, use second parameter to raise exceptions */
}
if (setjmp(buf[0x02])) {
    /* do op 0x02, manipulate ip (also the stack) as needed */
    longjmp(top, 1); /* optionally, use second parameter to raise exceptions */
}
if (setjmp(buf[n])) {
    /* do op n, manipulate ip (also the stack) as needed */
    longjmp(top, 1); /* optionally, use second parameter to raise exceptions */
}
while (*ip) {
    if (!setjmp(top)) {
        longjmp(buf[*ip++], 1);
    }
    else {
        /* process exception */
    }
}
The overhead of setjmp/longjmp is system specific but probably less than a function call.
-
I don't recall any test code...
Then maybe the version you have is old? This one even had a whole bunch of evil C macros that allowed the VM code to be written within the C source:
void nativeAllocBuffer(VMCore* vm)
{
    // width/height in r2/r3
    // return buffer in r1
    int w = vm->getReg(_r2).s32();
    int h = vm->getReg(_r3).s32();
    vm->getReg(_r1).pU8() = new uint8[w*h];
    printf("Allocated buffer [%d x %d]\n", w, h);
}
void nativeFreeBuffer(VMCore* vm)
{
    // expects buffer in r1
    delete[] vm->getReg(_r1).pCh();
    vm->getReg(_r1).pCh() = 0;
    printf("Freed buffer\n");
}
void nativeWriteBuffer(VMCore* vm)
{
    // writes buffer in r1 to filename in r16
    // expects width/height in r2/r3
    const char* fileName = vm->getReg(_r16).pCh();
    int w = vm->getReg(_r2).s32();
    int h = vm->getReg(_r3).s32();
    if (fileName) {
        FILE *f = fopen(fileName, "wb");
        if (!f) return; // could not open the output file
        fprintf(f, "P5\n%d\n%d\n255\n", w, h);
        fwrite(vm->getReg(_r1).pCh(), 1, w*h, f);
        fclose(f);
        printf("Wrote buffer '%s'\n", fileName);
    }
}
void nativePrintCoords(VMCore* vm)
{
    printf(
        "Coords %4d, %4d (%.6f, %.6f)\n",
        (int)vm->getReg(_r7).s32(),
        (int)vm->getReg(_r5).s32(),
        vm->getReg(_r9).f32(),
        vm->getReg(_r4).f32()
    );
}
_VM_CODE(makeFractal)
{
// r1 = pixel data address
// r2 = width in pixels
// r3 = height in pixels
// r4 = cY (float pos, starting at yMin)
// r5 = y (int) pixel
// r6 = xMin (float)
// r7 = cX (float pos, starting at xMin)
// r8 = fStep
// r9 = x (int) pixel
// r10 = iStep (1)
_save (_mr1) // 2 : save r1
_ldq (0, _r5) // 1 : y (r5) = 0
_ld_16_i32 (255, _r10) // 2 : max iters
// y-loop
_ldq (0, _r9) // 1 : x = 0
_move_32 (_r6, _r7) // 1 : cX = xMin
// x-loop do {
_move_32 (_r7, _r11) // 1 : zx = cX
_move_32 (_r4, _r12) // 1 : zy = cY
_ldq (0, _r13) // 1 : n = 0
// do {
_move_32 (_r11, _r14) // 1
_mul_f32 (_r11, _r14) // 1 : zx2 = zx*zx
_move_32 (_r12, _r15) // 1
_mul_f32 (_r12, _r15) // 1 : zy2 = zy*zy
_move_32 (_r7, _r16) // 1 : new_zx = cX
_add_f32 (_r14, _r16) // 1 : new_zx += zx2
_sub_f32 (_r15, _r16) // 1 : new_zx -= zy2
_add_f32 (_r15, _r14) // 1 : r14 = zx*zx + zy*zy (for loop test)
_move_32 (_r11, _r15) // 1 : tmp = zx
_mul_f32 (_r12, _r15) // 1 : tmp *= zy
_add_f32 (_r15, _r15) // 1 : tmp += tmp
_add_f32 (_r4, _r15) // 1 : tmp += cY (tmp = 2*zx*zy+cY)
_move_32 (_r15, _r12) // 1 : zy = tmp
_move_32 (_r16, _r11) // 1 : zx = new_zx
_addi_16 (1, _r13) // 2 : n++
_ld_32_f32 (4.0f, _r16) // 3
_bgr_f32 (_r14, _r16, 2) // 2
_bls_32 (_r13, _r10, -23) // 2
_mul_u16 (_r13, _r13) // 1
_st_ripi_8 (_r13, _r1) // 1 : out = n
_add_f32 (_r8, _r7) // 1 : cX += fStep
_addi_16 (1, _r9) // 2 : x += iStep
_bls_32 (_r9, _r2, -(6+23+3+1)) // 2 : } while (x < width)
_add_f32 (_r8, _r4) // 1 : cY += fStep
_addi_16 (1, _r5) // 2 : y += iStep
_bls_32 (_r5, _r3, -(5+5+6+23+1)) // 2 : } while (y < height)
_restore (_mr1) // 1
_ret
};
_VM_CODE(calculateRanges)
{
// calculates xMin in r6, xMax in r7, step in r8
_move_32 (_r5, _r6)
_sub_f32 (_r4, _r6) // r6 = r5-r4 (total y range)
_s32to_f32 (_r2, _r7) // r7 = (float) r2
_move_32 (_r7, _r9)
_mul_f32 (_r6, _r7) // r7 *= r6
_s32to_f32 (_r3, _r6) // r6 = (float) r3
_div_f32 (_r6, _r7) // r7 /= r6
_move_32 (_r7, _r8)
_div_f32 (_r9, _r8)
_ld_32_f32 (0.75f, _r6)
_sub_f32 (_r7, _r6)
_add_f32 (_r6, _r7)
_ret
};
_VM_CODE(virtualProgram) // a vm function
{
_ld_16_i16 (512, _r2)
_ld_16_i16 (512, _r3)
_calln (nativeAllocBuffer)
_ld_32_f32 (-1.25f, _r4) // yMin
_ld_32_f32 (1.25f, _r5) // yMax
_call (calculateRanges)
_call (makeFractal)
_lda ("framebuffer.pgm", _r16)
_calln (nativeWriteBuffer)
_calln (nativeFreeBuffer)
_ret
};
-
Then maybe the version you have is old? This one even had a whole bunch of evil C macros that allowed the VM code to be written within the C source...
Old! I'll say... We were last talking about this in 2003... Hmmm, I like the C macros idea... That allows you to test the functions! :)
-
Old! I'll say... We were last talking about this in 2003... Hmmm, I like the C macros idea... That allows you to test the functions! :)
I can send you this version. It basically is a bit crap in that it compiles to a single executable containing the embedded test. The reason being, the final target was for a library which included all the necessary loading/linking stuff. It should compile on any POSIX-compliant system.
-
I can send you this version. It basically is a bit crap in that it compiles to a single executable containing the embedded test. The reason being, the final target was for a library which included all the necessary loading/linking stuff. It should compile on any POSIX-compliant system.
Kind offer, but I don't have much time right now to develop this idea further :) but I am keen to discuss VM techniques so that as time permits I will have a clear idea of the issues and maybe build something! :idea:
Though right now I'm just fighting insomnia... I've got an early start tomorrow and for some reason that means I am unable to get the rest I need :(
-
@bloodline
Well there are some things I'd do differently if I was going to do it again. That VM had 16 general purpose 64-bit registers that could contain any 8/16/32/64-bit wide elemental type at once. The opcode defined how they were to be interpreted. Each opcode was (at least) a 2-byte entity, with a byte for the operation and usually a byte that encoded the source and destination register. As such it was a load/store architecture.
This made it easy to design and write, but doing it from scratch, I'd probably go for a stack-frame machine. It wouldn't actually be any slower since the above registers are still memory locations anyway, and if done correctly, would allow you to have as many "registers" as you have local data inside any function. That is to say, I'd use the same register-like topology but have only as many of them in a function context as needed.
I did write some documentation, but it is rather out of date, I expect: http://extropia.co.uk/projects/vm/
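To make the encoding above a bit more concrete, here is a small sketch of a 2-byte instruction (operation byte plus a byte naming source and destination register) acting on 16 registers of 64 bits each. The nibble layout, the opcode numbers and the names Reg and exec are assumptions for illustration and are not taken from the actual VM:

```c
#include <assert.h>
#include <stdint.h>

/* One 64-bit register viewed as any elemental type; the opcode picks the view. */
typedef union {
    int8_t  s8;  uint8_t  u8;
    int16_t s16; uint16_t u16;
    int32_t s32; uint32_t u32;
    int64_t s64; uint64_t u64;
    float   f32; double   f64;
} Reg;

enum { NUM_REGS = 16 };
enum { OP_HALT, OP_MOVE_32, OP_ADD_S32 };  /* hypothetical opcode numbers */

/* Assumed layout: [opcode byte][source in high nibble, destination in low nibble]. */
void exec(Reg r[NUM_REGS], const uint8_t *code)
{
    for (;;) {
        uint8_t op = code[0];
        if (op == OP_HALT)
            return;
        uint8_t src = code[1] >> 4;
        uint8_t dst = code[1] & 0x0F;
        switch (op) {
        case OP_MOVE_32: r[dst].u32 = r[src].u32;  break;
        case OP_ADD_S32: r[dst].s32 += r[src].s32; break;
        default:         return;   /* unknown opcode: stop */
        }
        code += 2;                 /* every instruction here is two bytes */
    }
}
```

Since the register file is just an array in memory, a stack-frame design could point r at the current frame instead of a fixed block of 16.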
-
My abuse of setjmp/longjmp worked as expected. Add a stack and an op to push values, and you have a very simple (yet poorly designed ;-) virtual machine.
-
Hi,
I did(/am doing) something similar: a VM on a resource constrained system. I made some experiments with jump tables, ifs and such and settled with plain "switch/case".
In my VM the opcode is always the first byte of the instruction, so I can branch on it easily and the C compiler (gcc) generates a nice big jump table. The code that jumps to the case which executes the emulated instruction is just a few assembly instructions to create a pointer, plus an indirect jump. No register saving and such involved.
Hope this helps,
Chris
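A sketch of that kind of interpreter loop, assuming a made-up instruction set where the opcode byte is followed by any operand bytes. Whether the compiler turns the switch into a jump table depends on the compiler and its settings:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical instruction set: opcode byte first, then any operand bytes. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADDI = 2, OP_NEG = 3 };

/* Runs until OP_HALT (or an unknown opcode) and returns the accumulator. */
int32_t run_switch(const uint8_t *code)
{
    const uint8_t *pc = code;
    int32_t acc = 0;

    for (;;) {
        switch (*pc++) {                    /* opcode is always the first byte */
        case OP_LOADI: acc = *pc++;  break; /* one unsigned operand byte */
        case OP_ADDI:  acc += *pc++; break; /* one unsigned operand byte */
        case OP_NEG:   acc = -acc;   break; /* no operands */
        case OP_HALT:  return acc;
        default:       return acc;          /* unknown opcode: stop */
        }
    }
}
```

Keeping the opcode values small and dense gives the compiler the best chance of emitting a table rather than a chain of comparisons.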
-
^ that's precisely how mine works when compiled with -D_VM_INTERPRETER=_VM_INTERPRETER_SWITCH_CASE. Otherwise it generates a function table.
There is an incomplete 68K version which uses assembler for the core interpreter. In this model, each opcode handler is at a 64-byte boundary relative to a base address and each handler ends with the code required to read the next opcode and calculate which handler to branch to next. The rest of the space is left to implement the handler or jump out to an external block (for the few that did not fit).
This design showed a lot of promise, performance wise.
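The address arithmetic behind that layout is simple enough to write down. This only models the offsets in C (the real thing is 68K assembler), and the function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HANDLER_SLOT_SIZE = 64 };  /* bytes reserved per opcode handler */

/* Byte offset of an opcode's handler from the base address: opcode * 64. */
size_t handler_offset(uint8_t opcode)
{
    return (size_t)opcode << 6;
}

/* Total size of the handler block for a given number of opcodes. */
size_t handler_block_size(size_t opcode_count)
{
    return opcode_count * HANDLER_SLOT_SIZE;
}
```

Because the slot size is a power of two, finding the next handler is a shift and an add on the opcode, with no table lookup needed.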