Amiga.org
Amiga computer related discussion => Amiga Software Issues and Discussion => Topic started by: bloodline on February 21, 2005, 12:10:25 PM
-
Is there any "nice" way to do this in C, pointer magic etc?
Imagine I have an array of pointers to allow me to jump to a subroutine based on the index stored in a variable:
int* jump[3];
jump[0]= &task1;
jump[1]= &task2;
jump[2]= &task3;
a=user_input(); // where the user could enter 1, 2 or 3
Goto jump(a);
task_return:
task1:
Do Something;
goto task_return;
task2:
Do Something else;
goto task_return;
task3:
Do Something different;
goto task_return;
I have no idea why I'm asking this... :crazy:
-
I won't give you the exact solution, but this should help still:
#include <stdio.h>
typedef int (*funcptr_t)(int, char *);
int func1(int x, char *str);
int func2(int x, char *str);
int func3(int x, char *str);
int main(void)
{
funcptr_t funcarray[3] =
{
func1,
func2,
func3
};
int i;
for (i = 0; i < 3; i++)
{
int ret;
ret = funcarray[i](i * i + 2, "moo");
printf("ret %d\n", ret);
}
return 0;
}
int func1(int x, char *str)
{
printf("func1 x=%d, str=<%s>\n", x, str);
return 1;
}
int func2(int x, char *str)
{
printf("func2 x=%d, str=<%s>\n", x, str);
return 2;
}
int func3(int x, char *str)
{
printf("func3 x=%d, str=<%s>\n", x, str);
return 3;
}
[EDIT]changed i*i to i*i+2 to make it more obvious the input isn't always the index ;-)[/EDIT]
-
Piru wrote:
I won't give you the exact solution, but this should help still:
#include <stdio.h>
typedef int (*funcptr_t)(int, char *);
int func1(int x, char *str);
int func2(int x, char *str);
int func3(int x, char *str);
int main(void)
{
funcptr_t funcarray[3] =
{
func1,
func2,
func3
};
int i;
for (i = 0; i < 3; i++)
{
int ret;
ret = funcarray[i](i*i, "moo");
printf("ret %d\n", ret);
}
return 0;
}
int func1(int x, char *str)
{
printf("func1 x=%d, str=<%s>\n", x, str);
return 1;
}
int func2(int x, char *str)
{
printf("func2 x=%d, str=<%s>\n", x, str);
return 2;
}
int func3(int x, char *str)
{
printf("func3 x=%d, str=<%s>\n", x, str);
return 3;
}
Hmmm, yeah that actually makes more sense than what I was trying to achive!! Cheers :-D
-
Function lookup is great :-)
Generally, the syntax for a function pointer is ugly, but typedefs help a lot
eg
typedef (*)();
So, suppose you want a pointer to function of the signature "int func(const char* blah);"
you would use
typedef int (*MyFuncPtrType)(const char*);
and then you can do stuff like
int foo(const char* blah);
int moo(const char* blah);
int poo(const char* blah);
...
MyFuncPtr p = &foo; /* the & is optional here btw */
p("hello!");
Or create arrays of pointers of this kind,
MyFuncPtr jumpTable[] = { &foo, &moo, &poo };
and do
jumpTable[1]("Hello");
etc...
-
@bloodline
Of course C++ virtual functions are a bit nicer to use ;-)
-
Karlos wrote:
@bloodline
Of course C++ virtual functions are a bit nicer to use ;-)
Que?
-
@bloodline
Consider
class GameMonsterAI {
public:
virtual void huntTarget(); // AI behaviours
virtual void foundTarget();
virutal void attackTarget();
//..
};
Then later
class UglyMonster : public GameMosterAI {
public:
void huntTarget(); // unique implementation for this monster
void foundTarget(); // ditto
void attackTarget(); // ditto
//...
};
class FlyingMonster : public GameMosterAI {
public:
void huntTarget(); // unique implementation for this monster
void foundTarget(); // ditto
void attackTarget(); // ditto
//...
};
...and you might have loads of different monsters types.
You don't need to know exactly what monster you have to tell it how to huntTarget(), you just need to know that it has that method. This is the root of your "polymorphic" behaviour.
Whatever real monster type you have, simply handling it as an instance of GameMonsterAI and calling the huntTarget() method will invoke the appropriate behavour for the actual monster at hand.
That's about it for virtual functions. Each class that implements a set of virtual functions generally uses a jump table internally, so they're the same thing really.
It's just a nicer syntax :-)
-
yeah.. polymorphism rules, just too bad c++ is such a killer on the 68k's :(
-
Hmmm, But I can't see how virtual functions are applicable to my initial problem, which is to speed up a conditional path... (I've only realised that is what I was trying to achive after I saw Piru's example).
-Edit-
If you look at my original post, I could have written:
switch(a) {
case 1 : Do_something;
case 2 : Do_Something_else;
case 3 : Do_something_different;
}
I'm not sure why I chose to do it the other way... except maybe to eliminate any conditionality from the code.
-
So I suppose my question is, How does one eliminate uncertainty from a program :-D
-
Jump tables can be much faster than switch/case constructs. The latter often end up like large nested if/else if, especially when the blocks of code are different sizes or if the range has gaps. The further down the switch you go, the longer it takes.
A function lookup table can be written instead. It totally doesnt matter if the functions are different sizes or if there are holes in the range of cases - provided your overall table is big enough for every known case value.
The function lookup table effectively takes a constant time to evaluate for any case.
I use them in quite a few places where you might think a switch/case would do (I only use the switch/case when the range is small and the code for each case is so small that physically calling the function would take longer).
-
LP wrote:
yeah.. polymorphism rules, just too bad c++ is such a killer on the 68k's :(
It isn't that bad. It all depends on how you design your code - C++ can be more efficient than C code even on 680x0, provided you designed it properly in the first instance (no pun intended).
-
Karlos wrote:
Jump tables can be much faster than switch/case constructs. The latter often end up like large nested if/else if, especially when the blocks of code are different sizes or if the range has gaps. The further down the switch you go, the longer it takes.
Yup, I though I was on the right path... the idea of having to evaluate 60000 odd possible conditions made me panic... The Jumptable/function look up table is the best option!
A function lookup table can be written instead. It totally doesnt matter if the functions are different sizes or if there are holes in the range of cases - provided your overall table is big enough for every known case value.
The function lookup table effectively takes a constant time to evaluate for any case.
"Constant time to evaluate" are the magic words here :-)
I use them in quite a few places where you might think a switch/case would do (I only use the switch/case when the range is small and the code for each case is so small that physically calling the function would take longer).
Have you go any nice and simple, I mean really dumb, examples of function look up's that I can compare and contrast with Piru's? I've never seen these before and I want to get a few in to my brain :-D
-
Jump tables can be much faster than switch/case constructs. The latter often end up like large nested if/else if, especially when the blocks of code are different sizes or if the range has gaps. The further down the switch you go, the longer it takes.
Decent compilers (and gcc is one such), compile switch/case statements down to a jump table for when there are no holes in the range of possibilities, or to a binary search tree in the other case (it prefers saving space, if the range is very sparse).
Therefore, you go from costant time to O(log n) time.
@bloodline
Using functions is perhaps cleaner, but can be overkill for the functions prologue/epilogue. If you are using gcc, you may want to use its "computed goto's" extension.
It works like this (notice the double ampersand!):
void *jmptable[] =
{
&&lab1,
&&lab2,
&&lab3,
...
&&labN
};
...
...
goto jmptable[foo];
returnpoint:
...
lab1:
...
goto returnpoint;
lab2:
...
goto returnpoint;
lab3:
...
goto returnpoint;
...
labN:
...
goto returnpoint;
This is pretty much like the GOSUB statement from BASIC.
-
bloodline wrote:
If you look at my original post, I could have written:
switch(a) {
case 1 : Do_something;
case 2 : Do_Something_else;
case 3 : Do_something_different;
}
I'm not sure why I chose to do it the other way... except maybe to eliminate any conditionality from the code.
If that's really what it would look like, save your efforts for other parts of the code :) Gcc will compile that down to a jumptable.
-
falemagn wrote:
Jump tables can be much faster than switch/case constructs. The latter often end up like large nested if/else if, especially when the blocks of code are different sizes or if the range has gaps. The further down the switch you go, the longer it takes.
Decent compilers (and gcc is one such), compile switch/case statements down to a jump table for when there are no holes in the range of possibilities, or to a binary search tree in the other case (it prefers saving space, if the range is very sparse).
Well you know that I'll be using gcc (I don't think I have any choice!) :-D ;-)
Would I have to specify an optimisation flag to get it to generate the table, or is it a standard feature?
Therefore, you go from costant time to O(log n) time.
@bloodline
Using functions is perhaps cleaner, but can be overkill for the functions prologue/epilogue. If you are using gcc, you may want to use its "computed goto's" extension.
It works like this (notice the double ampersand!):
void *jmptable[] =
{
&&lab1,
&&lab2,
&&lab3,
...
&&labN
};
...
...
goto jmptable[foo];
returnpoint:
...
lab1:
...
goto returnpoint;
lab2:
...
goto returnpoint;
lab3:
...
goto returnpoint;
...
labN:
...
goto returnpoint;
This is pretty much like the GOSUB statement from BASIC.
That is pretty much what I was trying to do in my original example, I rather like the simplicity of it. Cheers Fabio!
-
@bloodline
Not that I can show you right now as I'm at work.
Basically the example I posted before the C++ thing is it really
typedef (*)();
So, imagine you were writing a z80 emulation...
typedef struct Z80_t {
/* register set, etc etc */
} Z80;
typedef void (*Z80InstFn)(Z80*); /* z80 instruction handler */
/* z80 instruction handling functions */
static void z80_NOP(Z80* cpu)
{
cpu->regPC++;
}
static void z80_LD_BC_NN(Z80* cpu)
{
cpu->regB = cpu->regPC[2];
cpu->regC = cpu->regPC[1];
cpu->regPC+=3;
}
/* and lots more! */
static Z80InstFn instTable[256] = {
/* all the function names (& is optional) */
&z80_NOP,
&z80_LD_BC_NN,
....all the others...
&z80_RST_38H
};
...and you might have an execute function like this...
void execute(Z80* cpu, int maxInsts)
{
while (maxInsts--) {
instTable[cpu->regPC](cpu);
}
}
-
falemagn wrote:
bloodline wrote:
If you look at my original post, I could have written:
switch(a) {
case 1 : Do_something;
case 2 : Do_Something_else;
case 3 : Do_something_different;
}
I'm not sure why I chose to do it the other way... except maybe to eliminate any conditionality from the code.
If that's really what it would look like, save your efforts for other parts of the code :) Gcc will compile that down to a jumptable.
Yeah, my code really would look like that. I was thinking about some kind of command parser, where commands could quite happily be a simple range of numbers, with no gaps.
-
falemagn wrote:
bloodline wrote:
If you look at my original post, I could have written:
switch(a) {
case 1 : Do_something;
case 2 : Do_Something_else;
case 3 : Do_something_different;
}
I'm not sure why I chose to do it the other way... except maybe to eliminate any conditionality from the code.
If that's really what it would look like, save your efforts for other parts of the code :) Gcc will compile that down to a jumptable.
I have seen hundreds of instances to the contrary with gcc on about 4 different platforms. It depends on far too many variables outside your control wether or not a partuclar switch/case will be reduced to a jump table or not.
-
Karlos wrote:
@bloodline
Not that I can show you right now as I'm at work.
Basically the example I posted before the C++ thing is it really
typedef (*)();
So, imagine you were writing a z80 emulation...
typedef struct Z80_t {
/* register set, etc etc */
} Z80;
typedef void (*Z80InstFn)(Z80*); /* z80 instruction handler */
/* z80 instruction handling functions */
static void z80_NOP(Z80* cpu)
{
cpu->regPC++;
}
static void z80_LD_BC_NN(Z80* cpu)
{
cpu->regB = cpu->regPC[2];
cpu->regC = cpu->regPC[1];
cpu->regPC+=3;
}
/* and lots more! */
static Z80InstFn instTable[256] = {
/* all the function names (& is optional) */
&z80_NOP,
&z80_LD_BC_NN,
....all the others...
&z80_RST_38H
};
...and you might have an execute function like this...
void execute(Z80* cpu, int maxInsts)
{
while (maxInsts--) {
instTable[cpu->regPC](cpu);
}
}
That's grand! and a good example that fits in well with my thoughts.
-
True, Karlos true, but you'll have to design it damn efficient to make it really fly.. Were IMHO, C is easier to make efective and faster.. were if you use some of the more heavy mechanisms in c++ you often enter the badlands at first.. But well any god programmer can make anything fly on asorted platforms in different languages... :-)
-
LP wrote:
True, Karlos true, but you'll have to design it damn efficient to make it really fly.. Were IMHO, C is easier to make efective and faster.. were if you use some of the more heavy mechanisms in c++ you often enter the badlands at first.. But well any god programmer can make anything fly on asorted platforms in different languages... :-)
Steer clear of RTTI, template and STL abuse and you are fine ;-)
-
bloodline wrote:
that fits in well with my thoughts.
I thought it might ;-)
-
:lol:
containers and the dear templates you say, eh?
.. yeah, I'll think of it next time the Amiga is on for coding.. :-)
-
Templates are great - used properly. Used improperly (and the STL itself in places is a prime example) it's a bloatmongers nightmare...
-
@falemagn:
Funny how we are supposed to unlearn all these dirty goto methods, only to reintroduce them via the backdoor route of the compiler... Yes, I realise what a compiler creates is not meant for mortal eyes, but still :-).
-
Ok, good. Now I've got one gripe sorted out... What about the adition of 5 new keywords/functions and a new datatype to standard C/C++ :-D
You're gonna love this.. :lol: I have far too much free time...
I'm having a horrid time with multithreading... mutexs and semaphores take far more work that I can be bothered with... soooooo, I want to change C/C++ slightly to make my life easier.
My new keywords/functions are:
lock()
unlock()
spawn(function pointer)
int test() (returns true or false)
void wait(flag)
My new datatype is:
flag
Today is the day of reinventing the wheel... :-)
The idea is that all global variable variables/data structures are shared between all threads. But in order to use one, you have to lock() it first (all global variables and data structures must be locked before use). lock() is a synchronous function, so if you lock a global variable which is already locked the prorgam flow will stop until it's unlocked(). You should unlock() the variable after you have used it.
The test() function will test a global variable. If the variable is already locked the function will fail (return false). If it's unlocked, it will be locked() by the function, and will have to be unlocked() for any other thread to use it.
spawn() is simple, it just spawns the function as a thread.
wait() is synchronous and will wait for a flag to be set to true, if the flag is already true it will set the flag to false and continue execution.
Ok bored now... back to my jumptable thing :-D
-
No likey mutex/semaphore?
Check the totally piece of p*ss multithreading example here (http://megaburken.net/~karlos/demos/)
/self->trumpet->blow();
-
Karlos wrote:
No likey mutex/semaphore?
Check the totally piece of p*ss multithreading example here (http://megaburken.net/~karlos/demos/)
/self->trumpet->blow();
Yeah, I've already had a little sit down with your code, I'm still not convinced that the amount of effort I put in is worth what I'd get out! :-D
-
Someone needs to do the AROS version of the framework core. And by that, I mean really low level, using the most direct API's possible....
If only I knew someone insanely into AROS...
-edit-
I'm still not convinced that the amount of effort I put in is worth what I'd get out!
Thanks... I think :lol:
-
I have seen hundreds of instances to the contrary with gcc on about 4 different platforms. It depends on far too many variables outside your control wether or not a partuclar switch/case will be reduced to a jump table or not.
If the compiler has the option to use a jump table and it choses not to, obviously it must have some good reasons.
You'll find that indirect jumps often are worth, in terms of cpu cycles, as much as 2 or 3 consecutive if/else statements.
If the range of possibilities is very narrow, gcc is likely to emit a series of if/else stamements. If, on the other hand, the list of possibilities contains more than a certain platform-specific minimum number of items, then a jumptable will be output.
The compiler makes always the right choice for the target cpu. Bugs apart, of course.
-
falemagn wrote:
If the compiler has the option to use a jump table and it choses not to, obviously it must have some good reasons.
Agreed, but it is not always the case that it's reasons are correct.
GCC is touted as the best code generator etc, but frankly it isn't even remotely true. It is a great compiler, with many source and RTL level optimisations, but the implementation quality of the codegenerator itself varies considerably from platform to platform. This is the trade off you pay for massive portability.
You'll find that indirect jumps often are worth, in terms of cpu cycles, as much as 2 or 3 consecutive if/else statements.
Again this totally depends on the platform.
-
Anyway, we are getting off topic ;-)
Getting back to bloodlines case
If he is doing what I think he is doing, a manual jump table makes more sense given the number of cases he will have. It's easier to manage a large jump table than it is to manage a large switch statement...
-
Karlos wrote:
Anyway, we are getting off topic ;-)
Getting back to bloodlines case
If he is doing what I think he is doing, a manual jump table makes more sense given the number of cases he will have. It's easier to manage a large jump table than it is to manage a large switch statement...
I dunno... The switch/case structure seems quite manageable, and certainly requires less coding... Also I rather like the idea of being able to use functions rather than gotos.
-
Again this totally depends on the platform.
No one said otherwise. Generally speaking, though, an indirect jump spoils branch prediction, regardless of the platform, and may cause cache misses. Depending on the target cpu, the compiler will always make the right choice.
-
Karlos wrote:
Anyway, we are getting off topic ;-)
Getting back to bloodlines case
If he is doing what I think he is doing, a manual jump table makes more sense given the number of cases he will have. It's easier to manage a large jump table than it is to manage a large switch statement...
I don't see how. Either you have to build a table with the addresses of the functions/labels, or you need to build a case/switch table. Practically the same amount of code needs to be typed in, with the advantage that, in the case of the switch/case structure, you don't risk putting an address at the wrong place.
-
Whichever route you choose, a good compiler can only manage bad code so far ;-)
If you use a large case/switch, make sure your handling code is nicely inline and not just trampolining into functions, otherwise you have the overhead of the switch/case (regardless of how it is implemented by the compiler) and the function call overhead on top. If you do start to find you are writing more and more functions for handling the cases, then a jump table is better in the first instance.
As I said at the start, use a switch/case when the range is unbroken and the code required to perform the operation for each option is small compared to the overhead of calling a function. This way, as falemagan says, the compiler will be able to best optimise your code.
-
falemagn wrote:
I don't see how. Either you have to build a table with the addresses of the functions/labels, or you need to build a case/switch table. Practically the same amount of code needs to be typed in, with the advantage that, in the case of the switch/case structure, you don't risk putting an address at the wrong place.
Try debugging a logical error in a switch case with several thousand case values - even little things like a missing break. The sheer size of the construct makes it difficult to see what you are doing.
-
Karlos wrote:
If you use a large case/switch, make sure your handling code is nicely inline and not just trampolining into functions, otherwise you have the overhead of the switch/case (regardless of how it is implemented by the compiler) and the function call overhead on top.
Uh, how's that different from an hand-made jumptable? At least with the switch/case statements the compiler _does_ have the possibility of inlining.
As I said at the start, use a switch/case when the range is unbroken and the code required to perform the operation for each option is small compared to the overhead of calling a function. This way, as falemagan says, the compiler will be able to best optimise your code.
The compiler will always optimize your code, making always the right choice. It's got more knowledge than you about the target cpu. Unless you can find a patological case, gcc produces the best possible code for switch/case statements.
-
Karlos wrote:
Whichever route you choose, a good compiler can only manage bad code so far ;-)
:lol: Yeah, I probably shouldn't have been worring about this.
If you use a large case/switch, make sure your handling code is nicely inline and not just trampolining into functions, otherwise you have the overhead of the switch/case (regardless of how it is implemented by the compiler) and the function call overhead on top. If you do start to find you are writing more and more functions for handling the cases, then a jump table is better in the first instance.
What about some inline code... and some functions...?
As I said at the start, use a switch/case when the range is unbroken and the code required to perform the operation for each option is small compared to the overhead of calling a function. This way, as falemagan says, the compiler will be able to best optimise your code.
The Range can be unbroken, that's not a problem. Some options will be small, some will be large...
Suppose for a moment we have an event stream, some events can be handled simply by updating a variable... others would require calling several functions....
-
Try debugging a logical error in a switch case with several thousand case values - even little things like a missing break. The sheer size of the construct makes it difficult to see what you are doing.
Again, how's that different from, say, putting the function's address at the wrong place in the table?
Moreover, if you have a switch/case construct where you simply jump to functions, you don't need 'break' at all, use 'return' instead.
-
falemagn wrote:
Uh, how's that different from an hand-made jumptable? At least with the switch/case statements the compiler _does_ have the possibility of inlining.
Never mind, I think we are talking cross purposes here :-)
I am saying that if you make a large switch case, that construct itself ends up as a jump table, where you index it to reach the code address of a given case handling code block. This is similar to the function call overhead of a handmade jump table aleady.
If the code for the case is all inline once you get theree, all is good. If, on the other hand, you simply trampoline into another function, you have made 2 jumps to reach the code (the case/switch jump and then the function call it trampolined into).
Unless your function definitions are visible in the same translation unit as the case/switch construct during compilation there is no guarentee they will be inlined.
-
Again, how's that different from, say, putting the function's address at the wrong place in the table?
It isn't different - but it is easier to find when looking at the code for it.
-
@bloodline
The Range can be unbroken, that's not a problem. Some options will be small, some will be large...
Suppose for a moment we have an event stream, some events can be handled simply by updating a variable... others would require calling several functions....
My personal preference is for a handmade jumptable for this kind of thing - like the z80 example earlier.
However, as gcc will try to do that for you anyway in your switch/case, go for that. The only advice I can give is to try to inline those functions that or ensure the compiler can by making their definition visible to the switch/case.
(Actually, in the real z80 example, I went pure ASM for an emulate() core and had a literal jump table; the code was physically in the table so that every 128 bytes was a block of code to emulate an instruction with the code needed to calculate the next block to jmp to on the end of each handler, then nop padded to the next boundary.... but that is another story entirely ;-))
You will get larger code, but less jumping about within it.
-
Suppose for a moment we have an event stream, some events can be handled simply by updating a variable... others would require calling several functions....
If you are updating same variable then switch/case is better... depending on code used GCC can produce code without conditional branches.
Regarding jump tables... it seems that it does not good for branch prediction...
-
itix wrote:
Regarding jump tables... it seems that it does not good for branch prediction...
Erm, didn't we just establish that the compiler typically optimises large case/switch by making jump tables :-?
-
@Bloodline
Don't listen to any of us. We are just confusing the issue :-D
Even if you write the most atrocious code ever, the compiler and CPU will get you out of it ;-)
-
Karlos wrote:
@Bloodline
Don't listen to any of us. We are just confusing the issue :-D
Even if you write the most atrocious code ever, the compiler and CPU will get you out of it ;-)
Hmmm... 2Ghz should get me out of it :lol:
-
@bloodline
Just scanning back over this thread and realised it looked like I was denigrating your coding ability, which I wasn't actually trying to do.
I was trying to point out that compilers are generally only able to optimize code for which it has no real idea what the purpose is, therefore if you choose a bad algorithm or control mechanism, all the compiler optimisation in the world won't help that much - it doesn't actually have any knowledge of what you are ultimately trying to accomplish.
There is a common trend to assume that you can approach something without much forethought and just let the compiler decide what to do. Just looking at some of the argument in this thread is interesting:
1) switch/case can be optimised by the compiler.
2) jump tables can be bad because of mis-prediction and cache misses.
Therefore one should always use the switch/case.
And yet, one of the main optimisations employed by the compiler for switch/case is to create a jump table to allow it to evaluate the switch in constant time, which is exactly what a jump table is for :-)
So, how bad is the original jump table idea really?
See my point?
It ultimately all depends on what you are planning to do with the code and what the desired runtime behaviour is, how likely repetetive calls to the same code are etc. Only you as the developer know these criteria.
-
Karlos wrote:
@bloodline
Just scanning back over this thread and realised it looked like I was denigrating your coding ability, which I wasn't actually trying to do.
My coding ability isn't that great, so I'm not offended :-D
I was trying to point out that compilers are generally only able to optimize code for which it has no real idea what the purpose is, therefore if you choose a bad algorithm or control mechanism, all the compiler optimisation in the world won't help that much - it doesn't actually have any knowledge of what you are ultimately trying to accomplish.
There is a common trend to assume that you can approach something without much forethought and just let the compiler decide what to do. Just looking at some of the argument in this thread is interesting:
1) switch/case can be optimised by the compiler.
2) jump tables can be bad because of mis-prediction and cache misses.
Therefore one should always use the switch/case.
WooHoo! the compiler does the hardwork, so I don't have too. I'm happy about that.
And yet, one of the main optimisations employed by the compiler for switch/case is to create a jump table to allow it to evaluate the switch in constant time, which is exactly what a jump table is for :-)
So, how bad is the original jump table idea really?
Well I got thinking about the Jumptable idea, as that is how I would do it in ASM, none of this fancy conditional m'larkie... :crazy:
See my point?
It ultimately all depends on what you are planning to do with the code and what the desired runtime behaviour is, how likely repetetive calls to the same code are etc. Only you as the developer know these criteria.
I just want to be able to parse a stream of command to control some settings in real time, where each command must take the same amount of time...
-
bloodline wrote:
I just want to be able to parse a stream of command to control some settings in real time, where each command must take the same amount of time...
That's guarenteed for a jump table and highly likely for a packed (meaning no gaps) switch/case construct.
Pick one and get on with it :-D
So, this command stream. Would it resemble 680x0 opcodes? You did say about 64K of them at some point?
-
Karlos wrote:
bloodline wrote:
I just want to be able to parse a stream of command to control some settings in real time, where each command must take the same amount of time...
That's guarenteed for a jump table and highly likely for a packed (meaning no gaps) switch/case construct.
Pick one and get on with it :-D
So, this command stream. Would it resemble 680x0 opcodes? You did say about 64K of them at some point?
Actually I was thinking more about platform independant DSP processing... but there would be a similarity to 68K, yes... as in 16bit instrection word length, etc :-D
-
Actually I was thinking more about platform independant DSP processing... but there would be a similarity to 68K, yes... as in 16bit instrection word length, etc
None of the techniques above will ensure anything runs in a fixed time - you can't know at compile time what is cached or which branches are mispredicted.
You shold however be able to figure out the worst case scenario and base your timing around that (leaving some CPU time free just in case) - that's the only way you *know* everything will run on time and I believe that's the way it's done in "hard" real time OSs.
I'm in the process of designing a modular synth which has to work in real time so I'm going to have similar problems soon...
I'll probably go for "soft" real time - let the user set the maximum number of voices. But just to make sure I'll be doing the important processing in very simple C and using AltiVec extensively.
-
We were talking about constant time in the code rather than strict "fixed time". Of course nothing can ever run truly fixed time on any machine with a cache/branch predictions/etc, as you say.
However, we can say the complexity of taking a from a jump table is independent of the position in the table.
A non-jump table based switch/case (ie those that are equivalent to nested if/else if) has a (usually linear) dependency on the position in the construct.
-
A non-jump table based switch/case (ie those that are equivalent to nested if/else if) has a (usually linear) dependency on the position in the construct.
You can also do it with if/else if you can base your condition on a number, Pain in the ar$e to maintain and not terribly readable though.
if (x<2)
{
if (x == 0)
{
// do 0;
}
else
{
// do 1;
}
else
}
else // x is greater than 2
{
if (x == 2)
{
// do 2;
}
else
{
// do 3;
}
else
}
// Naa, stick to case statements....
Um, it's lost all the indentation.
-
minatorb wrote:
You can also do it with if/else if you can base your condition on a number, Pain in the ar$e to maintain and not terribly readable though.
I think this is precisely what bloodline was trying to avoid doing. This kind of construct is really bad unless you have a small range for x and some values are statistically much more likely than others (put those first).
-
1) switch/case can be optimised by the compiler.
2) jump tables can be bad because of mis-prediction and cache misses.
Therefore one should always use the switch/case.
And yet, one of the main optimisations employed by the compiler for switch/case is to create a jump table to allow it to evaluate the switch in constant time, which is exactly what a jump table is for :-)
You don't seem to get the point... the jumptable, regardless of the fact it spoils branch prediction, is still the fastest approach from a certain number of choices on.
If you have only an handfull (processor-specific) number of choices, the compiler will emit if/else code. The compiler does that automatically, depending on the relative average costs of the jumptable vs if/else on the basis of the number of available case choices.
-
Karlos wrote:
minatorb wrote:
You can also do it with if/else if you can base your condition on a number, Pain in the ar$e to maintain and not terribly readable though.
I think this is precisely what bloodline was trying to avoid doing. This kind of construct is really bad unless you have a small range for x and some values are statistically much more likely than others (put those first).
Yup exactly. I just need a constant time, so that the interpretor doesn't suddenly change it's behavour, when the branch prediction kicks in, or the command stream uses a slightly different algorythm etc...
I started with the idea of a modular synth, which allowed me to have defined "modules" (actually implemented using C++ objects) which performed a particular action on an audio signal... Ocilator, filter, amplifier etc, which could then be arranged in any order to suite the users needs. While this idea is the most efficient in terms of CPU usage, it is limited in terms of usage, having to hard code the modules etc...
My idea now is to have a simple DSP interpreter which can be programmed to perform any action using a platform independant instruction set. This way the user only needs to string together a single type of DSP object to perform any given action. It also allows several operations to be performed in one step. Ultimately the DSP object should actually be a JIT compiler, but that can be though about if my idea is actually a valid one, since it's an oop design, upgrading that would not be a problem.
-
A non-jump table based switch/case (ie those that are equivalent to nested if/else if) has a (usually linear) dependency on the position in the construct.
Any decent compiler will emit code that does a binary search over the possible choices, hence it's O(log n), which means that, in the worse case, that is a totally sparse set of choices which spans the whole int domain[1], assuming an integer is 32 bit, you get a maximum of 32 comparisons. That's of course far from being the average case. The typical scenario is 3 or 4 comparisons.
[1] That means it comprises of 2^31, that is 2Giga, choices.
-
Now let's muddle the topic even more.....
@Bloodline
Is this virtual DSP supposed to work on the audio-stream in real-time (while it is being played), or will it modify audio-files ?
In the 1st case you would need some sort of external/reliable clock as you never know at what clock the CPU is running, and what other SW sucks at it.
In the 2nd case timing would be pretty much irrelvant (your not planning any parallel-computing I assume).
If timing is important, you can just forget about JIT (since translation takes time and it takes it while the DSP-program is running). You might use some sort of pre-compiling, but I how doubt that you will get much speed-benefit over the func-table.
-
Kronos wrote:
Now let's muddle the topic even more.....
@Bloodline
Is this virtual DSP supposed to work on the audio-stream in real-time (while it is being played), or will it modify audio-files ?
Real time
In the 1st case you would need some sort of external/reliable clock as you never know at what clock the CPU is running, and what other SW sucks at it.
A simple 1ms clock is what was I planning... The Audio stream would be cut up into 1/1000th of a second frames with an expiry time... if the frame reaches the play buffer after it's time limit has expired it would be dropped, this causes clicks and pops if the the machine can't process the frames in time... in which case the user would need to specify larger frames.
In the 2nd case timing would be pretty much irrelvant (your not planning any parallel-computing I assume).
I'm struggeling with Multithreading... it just takes too long to set up. And I'm not even doign anything difficult, just using SDL's multithreading functions.
It would be nice to have certain parts of the program asynchronous... but I can work around that.
If timing is important, you can just forget about JIT (since translation takes time and it takes it while the DSP-program is running). You might use some sort of pre-compiling, but I how doubt that you will get much speed-benefit over the func-table.
Then I'll forget about JIT, it doesn't matter. I'm just doing this to improve my understanding of the problem. It's a hobby, people! :-D
-
Since you alllrady cuted the stream into pieces, with "best used by" date, you shouldn't even have to worry about timing at all.
The question is just if the maximum time a filter needs is still in time on the specific computer in the a specific load-situation is still within limits. Thats something decided by the user and the coder of the filter. If a package is done faster, no prob....
So the question is, how many free cycles you still have after applying the effect to every sound-bit of the stream, and wether that is enough for decoding.
If not, you'll need pre-compiling (no decoding during run-time).
Hah, now I'm quite happy again doing the completly timing-issue free stuff I'm doing :-D
-
falemagn wrote:
You don't seem to get the point... the jumptable, regardless of the fact it spoils branch prediction, is still the fastest approach from a certain number of choices on.
:lol: That is the point I was making. I had assumed that bloodline had a very large number of choices (thousands) based on what he said earlier in the thread - ie it appeared to be an emulator core, with a case per opcode style affair and also he explicitly asked about how to implement jump tables in C.
I think you and I are having a communication barrier :-) I am certianly not arguing with you at all that if/else can be better than a jump table under the appropriate circumstances. This I am fully aware of, along with the factors that determine which is better for a given instance.
In this specific instance, with the assumption that a large number of case options (more than any if/else would efficiently cope with) and considering the fact that he asked how to make a handmade jump table but was unsure how, I simply answered the question.
The compiler's ability to judge the cost of if/else versus jump table (for bloodlines case) was irrelavent here because aside from the fact it was not what he asked, a switch/case construct would almost definately generate a jump table if he were to use a large number of cases (as already assumed).
Given this it comes down to style. I personally find large switch/case constructs extremely ugly and difficult to manage compared to a jumptable / index.
With that, can we please drop the issue now? It's getting silly.
-
Kronos wrote:
Since you alllrady cuted the stream into pieces, with "best used by" date, you shouldn't even have to worry about timing at all.
The question is just if the maximum time a filter needs is still in time on the specific computer in the a specific load-situation is still within limits. Thats something decided by the user and the coder of the filter. If a package is done faster, no prob....
Each frame is on 1ms in size so the it must get fromt the start to the end of the effects chain in 1ms, or it gets dropped. I would probably add a check at every DSP, to make sure the Frame is still valid before wasting CPU time on it.
If the user specifies larger frames it has more time to pass through the effects chain... but the longer it takes to pass through the DSP.
The signal starts life with a DSP object, which either mathmatically creates the signal or reads a sample buffer, or maybe a bit of both ;-), passes through other DSP objects (which act upon the frame and then pass it on) and ends at the play buffer, from which the DACs play the sound.
The code for the DSP objects is not really time dependant, only to say that it must execute as fast as possible and behave the same each time.
My idea for 1ms frame comes from the idea that each audio signal is 16bit and sampled at 44100Hz... so each frame is about 86kb... this easily fits into a modern L2 cache and probably some L1 caches (though my Athlon64 only has a 64kb L1 data cache). Now if the processor concentrates on 1 audio stream at a time, it will spend pretty much all it's time working from the cache.
Now I need to get the DSP interpretor to fit into the L1 cache, so that Main Memory access is minimal.
So the question is, how many free cycles you still have after applying the effect to every sound-bit of the stream, and wether that is enough for decoding.
That's a good question, tests have shown that my old Athlon 600Mhz could pump a single "1ms Frames" through about 12 hardcoded Lowpass filter objects in 1ms, though trying to get that Data to the Sound card to about 50ms. Obvously a virtual DSP object would be much slower. But speed isn't an issue as I now have a much better CPU :-D
If not, you'll need pre-compiling (no decoding during run-time).
Hah, now I'm quite happy again doing the completly timing-issue free stuff I'm doing :-D
:-D Nah, this is more fun.
-
Karlos wrote:
falemagn wrote:
You don't seem to get the point... the jumptable, regardless of the fact it spoils branch prediction, is still the fastest approach from a certain number of choices on.
:lol: That is the point I was making. I had assumed that bloodline had a very large number of choices (thousands) based on what he said earlier in the thread - ie it appeared to be an emulator core, with a case per opcode style affair and also he explicitly asked about how to implement jump tables in C.
I think you and I are having a communication barrier :-) I am certianly not arguing with you at all that if/else can be better than a jump table under the appropriate circumstances. This I am fully aware of, along with the factors that determine which is better for a given instance.
In this specific instance, with the assumption that a large number of case options (more than any if/else would efficiently cope with) and considering the fact that he asked how to make a handmade jump table but was unsure how, I simply answered the question.
The compiler's ability to judge the cost of if/else versus jump table (for bloodlines case) was irrelavent here because aside from the fact it was not what he asked, a switch/case construct would almost definately generate a jump table if he were to use a large number of cases (as already assumed).
Given this it comes down to style. I personally find large switch/case constructs extremely ugly and difficult to manage compared to a jumptable / index.
With that, can we please drop the issue now? It's getting silly.
I must say from my point of view, watching you two argue on the same side has been an huge source of entertainment... I was going to say something but I decided to put a sock in it ;-)
WooHoo... an impressive 60 odd posts, dominated by Karlos and Bloodline before the sock showed up :-P
-
That is the point I was making. I had assumed that bloodline had a very large number of choices (thousands) based on what he said earlier in the thread - ie it appeared to be an emulator core, with a case per opcode style affair and also he explicitly asked about how to implement jump tables in C.
I was commening on the part where you said, more or less "so why if jumptables are that bad the compiler still builds them?". I know perfectly well that that's not what Matt asked, I've answered to his original post already, that was an answer to your post. I thought it was clear.
-
Anyway, cheers for your help guys!
I've learned a lot of cool stuff.
a) How to make a Jumptable in C
b) Function pointer tables
c) And that gcc will try and get the best out of the code, I certainly feel much more comfortable using switch/case structures now :-D
-
falemagn wrote:
I was commening on the part where you said, more or less "so why if jumptables are that bad the compiler still builds them?". I know perfectly well that that's not what Matt asked, I've answered to his original post already, that was an answer to your post. I thought it was clear.
Ah, that was supposed to be rhetorical :-D I wasn't actually asking a question and I never expected anybody to answer it :-)
Anyhoo, we are agreed, the compiler knows where to do if/else versus jump table for a switch/case construct :lol:
-
More food for thought:
I don't know much about DSPs, but....
One filter-function could easily consist out of hundred commands.
The sequence would need to be called for every word of the stream (at
the same fequence as the sampling-frequency).
Could result into 10000 and more commands to be decoded for each 10ms
part.
But since you don't plan to simulate a real DSP, you could create one
that applies commands to every sample in the 10ms-part.
Commands that has commands more sophistacted that the typical
RISC-like DSP-opcodes.
And/or make sure that the interaction between 2 opcodes is clearly
defined, allowing the translation-blocks to be just cued together,
with the right add filled in for any conditional possible (that would
pretty much be the pre-complile case).
-
Kronos wrote:
More food for thought:
I don't know much about DSPs, but....
One filter-function could easily consist out of hundred commands.
The sequence would need to be called for every word of the stream (at
the same fequence as the sampling-frequency).
Could result into 10000 and more commands to be decoded for each 10ms
part.
But since you don't plan to simulate a real DSP, you could create one
that applies commands to every sample in the 10ms-part.
Commands that has commands more sophistacted that the typical
RISC-like DSP-opcodes.
And/or make sure that the interaction between 2 opcodes is clearly
defined, allowing the translation-blocks to be just cued together,
with the right add filled in for any conditional possible (that would
pretty much be the pre-complile case).
I though I would implement a good, but simple general purpose instruction set, and then have some "Application Specific" SIMD instructions which perform a sequence of operations suited to Audio processing (Filtering, amplifying, modulating, attenuating, mixing etc) and acting over a large area (or even the entire frame).
This way, the Audio DSP would be as fast as possible, but each object would still be able to be programmed to perform tasks I hadn't thought of.
My instruction set would be 16bit, and use an 8bit opcode (256 instructions), with 2 4bit operands (to allow 16 registers, chosen because I'm a 68k fan, the AMD64 has 16regs and that amount easilly fits into the PPC).
Also I'll have a special "immediate" intruction to allow the loading of immediate data.
-
What's wrong with writing filters in C/C++/whatever other compiled language?
If realtime usage is your goal, that is.
Of course if you only provide a limited set of things your interpreded filters can do, then you might achieve realtime capabilities anyway.
-
Why are you using 16 bit values for audio? Pro audio stuff uses 32 bit floating point or even 64 bit FP these days.
As for registers remember x86 CPUs use rename reisters. You can't access them directly but you can indirectly just allocate a load of variables in C and work on them.
You can do the same trick with the cache, you can test this by moving big chunks of data around using loads of variables - I tired this years ago on my PC and the speed difference was *amazing*.
-
I don't think he ever said that the audio data itself would be 16-bit. Just that the instruction set for his virtual DSP would use 16-bit opcodes. Much like how the SuperH series uses 16-bit opcodes even though it can work on 32 and 64-bit floating point numbers (well certain members of the SuperH series anyway).
Of course 16-bit opcodes can be quite limiting if you have a fixed width instruction set. There's a reason why most RISC architectures use 32-bit opcodes.
-
falemagn wrote:
What's wrong with writing filters in C/C++/whatever other compiled language?
If realtime usage is your goal, that is.
Of course if you only provide a limited set of things your interpreded filters can do, then you might achieve realtime capabilities anyway.
I've done that, it's very fast, but not very interesting and litmits you to the algorythm you choose when you wrote the program. The Idea of a virtual DSP sounds like more fun. Fun being the goal :-)
minatorb wrote:
Why are you using 16 bit values for audio? Pro audio stuff uses 32 bit floating point or even 64 bit FP these days.
As for registers remember x86 CPUs use rename reisters. You can't access them directly but you can indirectly just allocate a load of variables in C and work on them.
You can do the same trick with the cache, you can test this by moving big chunks of data around using loads of variables - I tired this years ago on my PC and the speed difference was *amazing*.
Yeah, my Edirol FA-101 actually runs at 192Khz @ 24bit, but as I'm just developing this idea I wanted to keep it simple and fast as possible, while keeping it useable. CD quality seems like a good trade off.
MskoDestny wrote:
I don't think he ever said that the audio data itself would be 16-bit. Just that the instruction set for his virtual DSP would use 16-bit opcodes. Much like how the SuperH series uses 16-bit opcodes even though it can work on 32 and 64-bit floating point numbers (well certain members of the SuperH series anyway).
Of course 16-bit opcodes can be quite limiting if you have a fixed width instruction set. There's a reason why most RISC architectures use 32-bit opcodes.
Yup, while I did say that my tests were done using 16bit audio data, there is no reason why it can't be user selectable.
And you are right that I have pretty much settled on a fixed width 16bit instruction word lenght, and practally no addressing modes, everything will be "register to register". This should give me enough room for a simple general purpose set + the application specific instruction required to get maximum speed.
-
I've done that, it's very fast, but not very interesting and litmits you to the algorythm you choose when you wrote the program.
Err... ever heard of the work "plugin"? :-P
-
falemagn wrote:
I've done that, it's very fast, but not very interesting and litmits you to the algorythm you choose when you wrote the program.
Err... ever heard of the word "plugin"? :-P
You honestly expect me to figure out how to implement a plugin system?!?! :-D
I prefer the virtual DSP idea anyway, as it is platform independant and is not limited to this project ;-)
-
@bloodline
You know, function jumpt tables are one way to implement a plug in mechansim ;-)
Unlike switch/case constructs (booo rubbish) a function table can me modified at runtime :-)
-
You honestly expect me to figure out how to implement a plugin system?!?!
Uh... you have functions, you have a functiontable, you take the address of the functions and put them in the table...
Amiga-like shared libraries offer plugin functionality for free :-)
I prefer the virtual DSP idea anyway, as it is platform independant and is not limited to this project
You may just as well compile the functions on the fly. Have them written in C/C++, invoke the compiler, load them.
Fast and portable. Yes, you need to ship with or rely on a C compiler, but so what?
The other approach would be to compile your own scripting/opcode language. Unless you want to allow for self-modifying filters (:-o), compiling them will always give better perfomances than interpreting them.
-
falemagn wrote:
You honestly expect me to figure out how to implement a plugin system?!?!
Uh... you have functions, you have a functiontable, you take the address of the functions and put them in the table...
Amiga-like shared libraries offer plugin functionality for free :-)
Yeah, I had thought about that. But at the end of the day that is more hassle than I want to be bothered with :-)
I prefer the virtual DSP idea anyway, as it is platform independant and is not limited to this project
You may just as well compile the functions on the fly. Have them written in C/C++, invoke the compiler, load them.
Fast and portable. Yes, you need to ship with or rely on a C compiler, but so what?
The other approach would be to compile your own scripting/opcode language. Unless you want to allow for self-modifying filters (:-o), compiling them will always give better perfomances than interpreting them.
No the DSP won't have (write) access to the memory where the opcode language is stored, so no Self-modifying code. :-)
As mentioned eariler in the thread some kind of JIT would be fun, but that is later project :-)
Also the DSP could have basic functions (standard filters etc...) which would obviously be compiled into the interpretor.
-
The binary script interpreter may be slower than a native mechanism, but 1000x more fun to do :-)
-
Karlos wrote:
The binary script interpreter may be slower than a native mechanism, but 1000x more fun to do :-)
Well, I kind of think it distracts you from the real code, doesn't it? I mean, Matt should be concentrating on writing the "sound" part of the code, not the interpreter :-)
Anyway, just came across this SMALL (http://www.compuphase.com/small.htm) language. Quite interesting...
-
Well, I kind of think it distracts you from the real code, doesn't it? I mean, Matt should be concentrating on writing the "sound" part of the code, not the interpreter
He's doing this as a hobby so I think this is more of an intellectual excercise than anything else.
Anyway, just came across this SMALL language. Quite interesting...
If you want a small language do it properly (http://www.muppetlabs.com/~breadbox/bf/) :-D
It was even written on an Amiga!
-
minatorb wrote:
Well, I kind of think it distracts you from the real code, doesn't it? I mean, Matt should be concentrating on writing the "sound" part of the code, not the interpreter
He's doing this as a hobby so I think this is more of an intellectual excercise than anything else.
That's true. I have several Commercial and opensource software synth packages, that are far more powerful and useful than any program I have the ability or time to make.
So I thought I would try something innovative rather than useful :-)
Anyway, just came across this SMALL language. Quite interesting...
If you want a small language do it properly (http://www.muppetlabs.com/~breadbox/bf/) :-D
It was even written on an Amiga!
SMALL looks cool! :-o
I don't like BF... try writing a virtual DSP in it :-P
-
SMALL looks cool!
Indeed it does. And it gets compiled down to bytecode. It's precisely what you need, give it a look :-)