Author Topic: jump tables (-Edit- Eliminating conditional statements :-)) (Read 13667 times)

bloodline · « **Reply #14 from previous page:** February 21, 2005, 05:41:40 PM »

Quote

Karlos wrote:
@Bloodline

Don't listen to any of us. We are just confusing the issue :-D

Even if you write the most atrocious code ever, the compiler and CPU will get you out of it ;-)

Hmmm... 2Ghz should get me out of it :lol:

bloodline · « **Reply #15 on:** February 21, 2005, 06:59:15 PM »

Quote

Karlos wrote:
@bloodline

Just scanning back over this thread and realised it looked like I was denigrating your coding ability, which I wasn't actually trying to do.

My coding ability isn't that great, so I'm not offended :-D

Quote

I was trying to point out that compilers are generally only able to optimize code for which it has no real idea what the purpose is, therefore if you choose a bad algorithm or control mechanism, all the compiler optimisation in the world won't help that much - it doesn't actually have any knowledge of what you are ultimately trying to accomplish.

There is a common trend to assume that you can approach something without much forethought and just let the compiler decide what to do. Just looking at some of the argument in this thread is interesting:

1) switch/case can be optimised by the compiler.
2) jump tables can be bad because of mis-prediction and cache misses.

Therefore one should always use the switch/case.

WooHoo! the compiler does the hardwork, so I don't have too. I'm happy about that.

Quote

And yet, one of the main optimisations employed by the compiler for switch/case is to create a jump table to allow it to evaluate the switch in constant time, which is exactly what a jump table is for :-)

So, how bad is the original jump table idea really?

Well I got thinking about the Jumptable idea, as that is how I would do it in ASM, none of this fancy conditional m'larkie... :crazy:

Quote

See my point?

It ultimately all depends on what you are planning to do with the code and what the desired runtime behaviour is, how likely repetetive calls to the same code are etc. Only you as the developer know these criteria.

I just want to be able to parse a stream of command to control some settings in real time, where each command must take the same amount of time...

bloodline · « **Reply #16 on:** February 21, 2005, 09:03:51 PM »

Quote

Karlos wrote:
Quote

bloodline wrote:

I just want to be able to parse a stream of command to control some settings in real time, where each command must take the same amount of time...

That's guarenteed for a jump table and highly likely for a packed (meaning no gaps) switch/case construct.

Pick one and get on with it :-D

So, this command stream. Would it resemble 680x0 opcodes? You did say about 64K of them at some point?

Actually I was thinking more about platform independant DSP processing... but there would be a similarity to 68K, yes... as in 16bit instrection word length, etc :-D

bloodline · « **Reply #17 on:** February 22, 2005, 12:40:37 PM »

Quote

Karlos wrote:
Quote

minatorb wrote:

You can also do it with if/else if you can base your condition on a number, Pain in the ar$e to maintain and not terribly readable though.

I think this is precisely what bloodline was trying to avoid doing. This kind of construct is really bad unless you have a small range for x and some values are statistically much more likely than others (put those first).

Yup exactly. I just need a constant time, so that the interpretor doesn't suddenly change it's behavour, when the branch prediction kicks in, or the command stream uses a slightly different algorythm etc...

I started with the idea of a modular synth, which allowed me to have defined "modules" (actually implemented using C++ objects) which performed a particular action on an audio signal... Ocilator, filter, amplifier etc, which could then be arranged in any order to suite the users needs. While this idea is the most efficient in terms of CPU usage, it is limited in terms of usage, having to hard code the modules etc...

My idea now is to have a simple DSP interpreter which can be programmed to perform any action using a platform independant instruction set. This way the user only needs to string together a single type of DSP object to perform any given action. It also allows several operations to be performed in one step. Ultimately the DSP object should actually be a JIT compiler, but that can be though about if my idea is actually a valid one, since it's an oop design, upgrading that would not be a problem.

bloodline · « **Reply #18 on:** February 22, 2005, 01:09:32 PM »

Quote

Kronos wrote:
Now let's muddle the topic even more.....

@Bloodline
Is this virtual DSP supposed to work on the audio-stream in real-time (while it is being played), or will it modify audio-files ?

Real time

Quote

In the 1st case you would need some sort of external/reliable clock as you never know at what clock the CPU is running, and what other SW sucks at it.

A simple 1ms clock is what was I planning... The Audio stream would be cut up into 1/1000th of a second frames with an expiry time... if the frame reaches the play buffer after it's time limit has expired it would be dropped, this causes clicks and pops if the the machine can't process the frames in time... in which case the user would need to specify larger frames.

Quote

In the 2nd case timing would be pretty much irrelvant (your not planning any parallel-computing I assume).

I'm struggeling with Multithreading... it just takes too long to set up. And I'm not even doign anything difficult, just using SDL's multithreading functions.

It would be nice to have certain parts of the program asynchronous... but I can work around that.

Quote

If timing is important, you can just forget about JIT (since translation takes time and it takes it while the DSP-program is running). You might use some sort of pre-compiling, but I how doubt that you will get much speed-benefit over the func-table.

Then I'll forget about JIT, it doesn't matter. I'm just doing this to improve my understanding of the problem. It's a hobby, people! :-D

bloodline · « **Reply #19 on:** February 22, 2005, 02:49:44 PM »

Quote

Kronos wrote:
Since you alllrady cuted the stream into pieces, with "best used by" date, you shouldn't even have to worry about timing at all.

The question is just if the maximum time a filter needs is still in time on the specific computer in the a specific load-situation is still within limits. Thats something decided by the user and the coder of the filter. If a package is done faster, no prob....

Each frame is on 1ms in size so the it must get fromt the start to the end of the effects chain in 1ms, or it gets dropped. I would probably add a check at every DSP, to make sure the Frame is still valid before wasting CPU time on it.

If the user specifies larger frames it has more time to pass through the effects chain... but the longer it takes to pass through the DSP.

The signal starts life with a DSP object, which either mathmatically creates the signal or reads a sample buffer, or maybe a bit of both ;-), passes through other DSP objects (which act upon the frame and then pass it on) and ends at the play buffer, from which the DACs play the sound.

The code for the DSP objects is not really time dependant, only to say that it must execute as fast as possible and behave the same each time.

My idea for 1ms frame comes from the idea that each audio signal is 16bit and sampled at 44100Hz... so each frame is about 86kb... this easily fits into a modern L2 cache and probably some L1 caches (though my Athlon64 only has a 64kb L1 data cache). Now if the processor concentrates on 1 audio stream at a time, it will spend pretty much all it's time working from the cache.

Now I need to get the DSP interpretor to fit into the L1 cache, so that Main Memory access is minimal.

Quote

So the question is, how many free cycles you still have after applying the effect to every sound-bit of the stream, and wether that is enough for decoding.

That's a good question, tests have shown that my old Athlon 600Mhz could pump a single "1ms Frames" through about 12 hardcoded Lowpass filter objects in 1ms, though trying to get that Data to the Sound card to about 50ms. Obvously a virtual DSP object would be much slower. But speed isn't an issue as I now have a much better CPU :-D

Quote

If not, you'll need pre-compiling (no decoding during run-time).

Hah, now I'm quite happy again doing the completly timing-issue free stuff I'm doing :-D

:-D Nah, this is more fun.

bloodline · « **Reply #20 on:** February 22, 2005, 02:53:41 PM »

Quote

Karlos wrote:
Quote

falemagn wrote:

You don't seem to get the point... the jumptable, regardless of the fact it spoils branch prediction, is still the fastest approach from a certain number of choices on.

:lol: That is the point I was making. I had assumed that bloodline had a very large number of choices (thousands) based on what he said earlier in the thread - ie it appeared to be an emulator core, with a case per opcode style affair and also he explicitly asked about how to implement jump tables in C.

I think you and I are having a communication barrier :-) I am certianly not arguing with you at all that if/else can be better than a jump table under the appropriate circumstances. This I am fully aware of, along with the factors that determine which is better for a given instance.

In this specific instance, with the assumption that a large number of case options (more than any if/else would efficiently cope with) and considering the fact that he asked how to make a handmade jump table but was unsure how, I simply answered the question.

The compiler's ability to judge the cost of if/else versus jump table (for bloodlines case) was irrelavent here because aside from the fact it was not what he asked, a switch/case construct would almost definately generate a jump table if he were to use a large number of cases (as already assumed).

Given this it comes down to style. I personally find large switch/case constructs extremely ugly and difficult to manage compared to a jumptable / index.

With that, can we please drop the issue now? It's getting silly.

I must say from my point of view, watching you two argue on the same side has been an huge source of entertainment... I was going to say something but I decided to put a sock in it ;-)

WooHoo... an impressive 60 odd posts, dominated by Karlos and Bloodline before the sock showed up :-P

bloodline · « **Reply #21 on:** February 22, 2005, 03:41:44 PM »

Anyway, cheers for your help guys!

I've learned a lot of cool stuff.

a) How to make a Jumptable in C
b) Function pointer tables
c) And that gcc will try and get the best out of the code, I certainly feel much more comfortable using switch/case structures now :-D

bloodline · « **Reply #22 on:** February 22, 2005, 06:16:38 PM »

Quote

Kronos wrote:
More food for thought:

I don't know much about DSPs, but....

One filter-function could easily consist out of hundred commands.
The sequence would need to be called for every word of the stream (at
the same fequence as the sampling-frequency).

Could result into 10000 and more commands to be decoded for each 10ms
part.

But since you don't plan to simulate a real DSP, you could create one
that applies commands to every sample in the 10ms-part.

Commands that has commands more sophistacted that the typical
RISC-like DSP-opcodes.

And/or make sure that the interaction between 2 opcodes is clearly
defined, allowing the translation-blocks to be just cued together,
with the right add filled in for any conditional possible (that would
pretty much be the pre-complile case).

I though I would implement a good, but simple general purpose instruction set, and then have some "Application Specific" SIMD instructions which perform a sequence of operations suited to Audio processing (Filtering, amplifying, modulating, attenuating, mixing etc) and acting over a large area (or even the entire frame).

This way, the Audio DSP would be as fast as possible, but each object would still be able to be programmed to perform tasks I hadn't thought of.

My instruction set would be 16bit, and use an 8bit opcode (256 instructions), with 2 4bit operands (to allow 16 registers, chosen because I'm a 68k fan, the AMD64 has 16regs and that amount easilly fits into the PPC).

Also I'll have a special "immediate" intruction to allow the loading of immediate data.

bloodline · « **Reply #23 on:** February 23, 2005, 08:02:10 AM »

Quote

falemagn wrote:
What's wrong with writing filters in C/C++/whatever other compiled language?

If realtime usage is your goal, that is.

Of course if you only provide a limited set of things your interpreded filters can do, then you might achieve realtime capabilities anyway.

I've done that, it's very fast, but not very interesting and litmits you to the algorythm you choose when you wrote the program. The Idea of a virtual DSP sounds like more fun. Fun being the goal :-)

Quote

minatorb wrote:
Why are you using 16 bit values for audio? Pro audio stuff uses 32 bit floating point or even 64 bit FP these days.

As for registers remember x86 CPUs use rename reisters. You can't access them directly but you can indirectly just allocate a load of variables in C and work on them.

You can do the same trick with the cache, you can test this by moving big chunks of data around using loads of variables - I tired this years ago on my PC and the speed difference was *amazing*.

Yeah, my Edirol FA-101 actually runs at 192Khz @ 24bit, but as I'm just developing this idea I wanted to keep it simple and fast as possible, while keeping it useable. CD quality seems like a good trade off.

Quote

MskoDestny wrote:
I don't think he ever said that the audio data itself would be 16-bit. Just that the instruction set for his virtual DSP would use 16-bit opcodes. Much like how the SuperH series uses 16-bit opcodes even though it can work on 32 and 64-bit floating point numbers (well certain members of the SuperH series anyway).

Of course 16-bit opcodes can be quite limiting if you have a fixed width instruction set. There's a reason why most RISC architectures use 32-bit opcodes.

Yup, while I did say that my tests were done using 16bit audio data, there is no reason why it can't be user selectable.

And you are right that I have pretty much settled on a fixed width 16bit instruction word lenght, and practally no addressing modes, everything will be "register to register". This should give me enough room for a simple general purpose set + the application specific instruction required to get maximum speed.

bloodline · « **Reply #24 on:** February 23, 2005, 10:02:36 AM »

Quote

falemagn wrote:
Quote

I've done that, it's very fast, but not very interesting and litmits you to the algorythm you choose when you wrote the program.

Err... ever heard of the word "plugin"? :-P

You honestly expect me to figure out how to implement a plugin system?!?! :-D

I prefer the virtual DSP idea anyway, as it is platform independant and is not limited to this project ;-)

bloodline · « **Reply #25 on:** February 23, 2005, 11:35:21 AM »

Quote

falemagn wrote:
Quote

You honestly expect me to figure out how to implement a plugin system?!?!

Uh... you have functions, you have a functiontable, you take the address of the functions and put them in the table...

Amiga-like shared libraries offer plugin functionality for free :-)

Yeah, I had thought about that. But at the end of the day that is more hassle than I want to be bothered with :-)

Quote

Quote

I prefer the virtual DSP idea anyway, as it is platform independant and is not limited to this project

You may just as well compile the functions on the fly. Have them written in C/C++, invoke the compiler, load them.

Fast and portable. Yes, you need to ship with or rely on a C compiler, but so what?

The other approach would be to compile your own scripting/opcode language. Unless you want to allow for self-modifying filters (:-o), compiling them will always give better perfomances than interpreting them.

No the DSP won't have (write) access to the memory where the opcode language is stored, so no Self-modifying code. :-)

As mentioned eariler in the thread some kind of JIT would be fun, but that is later project :-)

Also the DSP could have basic functions (standard filters etc...) which would obviously be compiled into the interpretor.

bloodline · « **Reply #26 on:** February 24, 2005, 02:56:46 PM »

Quote

minatorb wrote:
Quote
Well, I kind of think it distracts you from the real code, doesn't it? I mean, Matt should be concentrating on writing the "sound" part of the code, not the interpreter

He's doing this as a hobby so I think this is more of an intellectual excercise than anything else.

That's true. I have several Commercial and opensource software synth packages, that are far more powerful and useful than any program I have the ability or time to make.

So I thought I would try something innovative rather than useful :-)

Quote

Quote
Anyway, just came across this SMALL language. Quite interesting...

If you want a small language do it properly :-D
It was even written on an Amiga!

SMALL looks cool! :-o

I don't like BF... try writing a virtual DSP in it :-P

Author Topic: jump tables (-Edit- Eliminating conditional statements :-)) (Read 13667 times)

bloodline

Re: jump tables

bloodline

Re: jump tables

bloodline

Re: jump tables

bloodline

Re: jump tables

bloodline

Re: jump tables

bloodline

Re: jump tables

bloodline

Re: jump tables

bloodline

Re: jump tables

bloodline

Re: jump tables

bloodline

Re: jump tables

bloodline

Re: jump tables

bloodline

Re: jump tables

bloodline

Re: jump tables