Author Topic: Simple Amiga audio question. (Read 8065 times)

Karlos · « **Reply #29 on:** June 07, 2010, 11:15:08 AM »

Quote from: bubblebobble;563238

Karlos' algorithm follows a so called "code-book based" approach.
From what I can see it has several drawbacks:

1. On 256/4bit is has more than 25% data overhead because he stores individual codebooks for each frame. Means instead of 4:1, you will get ~3:1.

It's actually ~3.5:1. If you download the zip file, you'll see for yourself.

Quote

2. How to find the "best" delta representants is not well defined and might need a lot of experimenting to find the optimal algoritm.

Absolutely, which is why I wanted to dig out the source.

Quote

3. The encoder has a very high complexity because it needs to do vector quantisation. Luckily only for the encoder.

It isn't massively complex and it could certainly be implemented more simply than I did. Once it was working satisfactorily for my needs, I didn't bother improving it, since playback was my main concern.

Quote

4. Doesn't make use of the assumption that the left and right channel of a stereo signal are correlated.

That's not actually true. In the stereo case, there is still only one delta table derived from both channels. The independent variation of left and right will produce similar spread of delta values when there is a strong correlation between them. An advantage here is that the correlation of delta value spread isn't really affected by phase differences between the channels.

Experimentation with mid and side band encoding did not produce any real difference from a QSNR perspective.

Quote

5. "Wastes" precious 4 bits in the 256/4bit case ;-)

Yeah, you got me there. Of course, a modified algorithm would simply pack an extra source sample into that and live with odd sized frames. I just happened to require an arrangement that decompressed an even number of samples per frame.

Quote

6. The choosen Deltas may cause an error of up to 4096 quantisation steps (=reduces to 4bit PCM quality) in the worst case. However, very unlikely of course, but unlike in ADPCM, the error is not correlated with the high frequencies, so the error is not "masked".

> Regarding my old code versus ADPCM, ADPCM is probably better but is more expensive to decode and less fault tolerant.
Decoding is cheaper than ADPCM yes, but why less fault tolerent? Because you have a "sync" Sample at the beginning of a block? ADPCM can be "resetted" every N samples too. It wouldn't even need an explicit sync value.

My experience with ADPCM decode was that corrupt data in the compressed stream can (but won't necessarily) knock the decode out permanently from that point onwards. Explicit audio frames mean that at most only the remaining samples in the current frame will be corrupted.

Karlos · « **Reply #30 on:** June 07, 2010, 12:04:04 PM »

Perhaps the title should be changed to "(Not quite so) Simple..."

bubblebobble · « **Reply #31 on:** June 07, 2010, 12:21:34 PM »

Quote from: Karlos;563244

It's actually ~3.5:1. If you download the zip file, you'll see for yourself.

original datasize =
samples*sizeof(word) =
256 * 2 =
512 Bytes

compressed datasize =
samples*sizeof(4bit) + table*sizeof(word) + startsample*sizeof(word) + header =
(256*0.5) + (16*2) + 2 + 2 =
164 Bytes

=> ratio = 512/164 = 3.1219...

Quote

It isn't massively complex and it could certainly be implemented more simply than I did.

Complexity in sense of information theory, not implemtation wise. Implementing VQ (at high complexity) is easy.

Quote

That's not actually true. In the stereo case, there is still only one delta table derived from both channels. The independent variation of left and right will produce similar spread of

Yes (phase) and no (correlation).
If you would encode both independently, you would need 2 tables. But you use only one which does not resolve the tiny differences between left and right. Actually it is very likely that the stereo difference is killed completely, because it needs to compete with the other deltas. So you are actually worse than 2 independent channels with 2 tables, which is again worse than 2 joint stereo channels.

Quote

> the correlation of delta value spread isn't really affected by phase differences between the channels.

This is cool, didn't think of the phase difference. You got me here ;-)
(but only if the phase is shifted consistantly accross all sinus waves, which is almost never the case in reality)

Quote

Experimentation with mid and side band encoding did not produce any real difference from a QSNR perspective.

Yes, not with this algorithm, because the sideband suffers to much from competing against the mid band channel. You would need 2 Tables. (can be used together, but "trained" seperately).

Quote

My experience with ADPCM decode was that corrupt data in the compressed stream can (but won't necessarily) knock the decode out permanently from that point onwards. Explicit audio frames mean that at most only the remaining samples in the current frame will be corrupted.

Nobody stopps you from putting ADPCM into chunks of N samples. If this should be a stream, you need to do this anyway, because otherwise you cannot join the stream at any position you want, which is the main point of being a "stream".
But putting ADPCM into chunks has zero overhead. You just reset the adaptation factor to something average all N samples. Of course, setting the factor to the best value is better and can be achieved by adding 1 extra byte each frame.

bubblebobble · « **Reply #32 on:** June 07, 2010, 12:28:11 PM »

@Karlos

What about a competition? ;-)

Let Thorham choose a short sample as uncompressed 16bit .wav (lets say 10 secs).
(maybe with some music, fading, voice, sound effect etc.)
We can both compress and uncompress it again, and let him decide what sounds better?
(as a proof, we both must provide the compressed file too, of course).

Karlos · « **Reply #33 on:** June 07, 2010, 12:31:16 PM »

Quote from: bubblebobble;563256

@Karlos

What about a competition? ;-)

Let Thorham choose a short sample as uncompressed 16bit .wav (lets say 10 secs).
(maybe with some music, fading, voice, sound effect etc.)
We can both compress and uncompress it again, and let him decide what sounds better?
(as a proof, we both must provide the compressed file too, of course).

LOL!

I don't care enough really. I am sure ADPCM is capable of better compression/quality than my method, but you are welcome to try if it amuses you to do so

The only reason I mentioned it is that he was asking for a low CPU usage playback routine and I happened to have one I'd made earlier that will run on a 7MHz 68000 (though the codec application is foolishly compiled for 020/fpu)

bubblebobble · « **Reply #34 on:** June 07, 2010, 12:38:11 PM »

C'mon, it is just for fun and curiosity. I implemented my own version of ADPCM4, which is much more lightweight than e.g. the G.711 standard implementation of ADPCM4. I'd like to see how it compares to other home-brewn codecs.
We could even messure encoding/decoding time of e.g. 10min of audio on a Classic Amiga.

Karlos · « **Reply #35 on:** June 07, 2010, 12:50:21 PM »

Quote from: bubblebobble;563253

original datasize =
samples*sizeof(word) =
256 * 2 =
512 Bytes

compressed datasize =
samples*sizeof(4bit) + table*sizeof(word) + startsample*sizeof(word) + header =
(256*0.5) + (16*2) + 2 + 2 =
164 Bytes

=> ratio = 512/164 = 3.1219...

I might have lied about stereo streams. Sample pairs may actually be treated as single entities, in which case, 256 sample pairs are compressed per frame, in which case it's 256*2*2 bytes of sample data. I wrote it in 1996 or so, so my memory is slightly hazy on the details. However:

Code: [Select]


-rwxr-xr-x 1 karlos karlos 1763382 2010-06-06 15:36 test_decoded.aiff
-rwxr-xr-x 1 karlos karlos  506328 2010-06-06 15:36 test.xdac

That's ~3.48:1 based on filesize alone

Also consider that the algorithm will happily use less than N bit per frame if it finds it can. Doesn't happen much in music, but speech is a different matter.

Until I dig out the source, it's hard to say.

Thorham · « **Reply #36 on:** June 07, 2010, 04:11:19 PM »

Quote from: bubblebobble;563238

> Rreally? But doesn't entropy mostly apply to entropy coders?
Everybody has to obey the laws of nature. If you want to or not.

Sure, but that doesn't mean I can't play with this.

Quote from: bubblebobble;563238

Plus, it is quite pathetic to think that what hundreds of PhD-Level researchers achieved over decades can be wiped away by a hobbyist in a few afternoon-sessions without even undestanding the fundamentals of information theory.

Yes, but I don't think that. I know that I don't know much at all about this, and because of this, I also know it's likely that I'll fail to do it better. However, this doesn't mean that I have to take what all the PhD's say at face value, without challenging it. Like I said, trying is half the fun, and if I stop doing that, then I might as well throw my whole programming hobby in to the bin.

Quote from: bubblebobble;563238

I'll post you the code for my ADPCM4 implementation soon. It should be fast enough for a vanilla A1200, and doesn't need a lot of stuff around it, just one function to encode and one to decode.

Sounds good, thanks

Quote from: bubblebobble;563238

You can use my Tool "AudioConverter" or Samplemanager to generate ADPCM files and listen to the result.

I'll certainly do that.

Quote from: bubblebobble;563238

Karlos' algorithm follows a so called "code-book based" approach.
From what I can see it has several drawbacks:

1. On 256/4bit is has more than 25% data overhead because he stores individual codebooks for each frame. Means instead of 4:1, you will get ~3:1.
2. How to find the "best" delta representants is not well defined and might need a lot of experimenting to find the optimal algoritm.
3. The encoder has a very high complexity because it needs to do vector quantisation. Luckily only for the encoder.
4. Doesn't make use of the assumption that the left and right channel of a stereo signal are correlated.
5. "Wastes" precious 4 bits in the 256/4bit case ;-)
6. The choosen Deltas may cause an error of up to 4096 quantisation steps (=reduces to 4bit PCM quality) in the worst case. However, very unlikely of course, but unlike in ADPCM, the error is not correlated with the high frequencies, so the error is not "masked".

Those are interesting considerations, although it wouldn't matter much, because of the 3:1 compression, which is already enough. It's a different story if the audio quality of ADPCM is better.

Quote from: Karlos;563251

Perhaps the title should be changed to "(Not quite so) Simple..."

Yes, it has become less than simple, hasn't it?

Quote from: bubblebobble;563256

@Karlos

What about a competition? ;-)

Let Thorham choose a short sample as uncompressed 16bit .wav (lets say 10 secs).
(maybe with some music, fading, voice, sound effect etc.)
We can both compress and uncompress it again, and let him decide what sounds better?
(as a proof, we both must provide the compressed file too, of course).

Sounds interesting. Perhaps I'll do that.

Quote from: bubblebobble;563260

Classic Amiga.

I have one here, so it's no problem to try this.

bubblebobble · « **Reply #37 on:** June 09, 2010, 12:46:07 PM »

Here is the C-port of the simpliest version of the ADPCM4 decoder for 16bit/mono. It packs exactly 1:4.
Check if this is easy and fast enough for your purposes. The code can be optimized, this here is tuned to be a clean example.
The audio quality degrades of course, but how much depends on the actual material.
There are also better versions, but the computation power is higher, and the result is only slightly better.

Code: [Select]


/* C-port from &quot;file format audio&quot; include, Amiblitz3 */ 

/* ADPCM4 defines */
#define FFA_ADPCM4_MAXFB 12
#define FFA_ADPCM4_MINFB 0
#define FFA_ADPCM4_LOW   0x0
#define FFA_ADPCM4_HI    0xF -5

/* context struct for all ADPCM/ADDPCM codecs */
typedef struct ffa_ADPCM_ctx {
  int cSampM;
  int cSampS;
  int cDeltaM;
  int cDeltaS;
  int fbM;
  int fbS;
  int fpos;
};


/* ADPCM4 lookup table (Fibonacci) */
int ffa_ADPCM4_LUT[0xF];
ffa_ADPCM4_LUT[0x00] =     0;
ffa_ADPCM4_LUT[0x01] =    -1;
ffa_ADPCM4_LUT[0x02] =     1;
ffa_ADPCM4_LUT[0x03] =    -2;
ffa_ADPCM4_LUT[0x04] =     2;
ffa_ADPCM4_LUT[0x05] =    -3;
ffa_ADPCM4_LUT[0x06] =     3;
ffa_ADPCM4_LUT[0x07] =    -5;
ffa_ADPCM4_LUT[0x08] =     5;
ffa_ADPCM4_LUT[0x09] =    -8;
ffa_ADPCM4_LUT[0x0a] =     8;
ffa_ADPCM4_LUT[0x0b] =   -13;
ffa_ADPCM4_LUT[0x0c] =    13;
ffa_ADPCM4_LUT[0x0d] =   -21;
ffa_ADPCM4_LUT[0x0e] =    21;
ffa_ADPCM4_LUT[0x0f] =   -34;

/* ADPCM4 decoder function */
void ffa_DecodeADPCM4(BYTE *srcP, short* dstP, int flength, ffa_ADPCM_ctx *ctx) {

  /* declare/init variables */ 
  int cSamp = ctx->cSampM           // previous sample
  int fb    = ctx->fbM              // current bit shift
  int n, cValue;
   
  /* decoder loop */
  for (n=0; n<flength; n++) {                  // for all samples
    if (n&1) cValue = (*srcP++ & 0xF0) >> 4;   // get the upper 4bit and go to next byte
    else     cValue = (*srcP   & 0x0F);        // get the lower 4bit 

    cSamp += (ffa_ADPCM4_LUT[cValue] << fb);   // uncompress the sample
    *dstP++ = (short)cSamp;                    // store in destination PCM 16bit buffer

    if (cValue>=FFA_ADPCM4_HI && fb<FFA_ADPCM4_MAXFB) fb++;       // raise the bit shifter
    else if (cValue<=FFA_ADPCM4_LOW && fb>FFA_ADPCM4_MINFB) fb--; // lower the bit shifter
  }

  /* rescue decoder context for next call */
  ctx->cSampM  = cSamp           // previous sample
  ctx->fbM     = fb              // current bit shift
  ctx->fpos    + flength
}

Thorham · « **Reply #38 on:** June 09, 2010, 03:02:05 PM »

Quote from: bubblebobble;563682

Here is the C-port of the simpliest version of the ADPCM4 decoder for 16bit/mono. It packs exactly 1:4.

Thanks for the code

Quote from: bubblebobble;563682

Check if this is easy and fast enough for your purposes. The code can be optimized, this here is tuned to be a clean example.

This should be fine, and can easily be rewritten in assembler.

Quote from: bubblebobble;563682

The audio quality degrades of course, but how much depends on the actual material.
There are also better versions, but the computation power is higher, and the result is only slightly better.

So far I've only tried IMA ADPCM, and I must say that the quality isn't good enough. Will your encoder do better? Normally I'm not much of an audiophile, but in this case quality is everything (which is why I'm still trying to do this with some sort of lossless encoder as well).

I've tried Samplemanager, but this fails on my WinUae setup (which is the same as my Amiga setup). I've also checked Aminet for AudioConverter, but it doesn't seem to be on there. Should I try/search a little harder?

bubblebobble · « **Reply #39 on:** June 09, 2010, 03:23:24 PM »

> I've tried Samplemanager, but this fails on my WinUae setup
Can you specify "fails"? Normaly it should work. Maybe a lib is missing that I forgot to ask at the begining?
(it needs minimum 68020+FPU, and ~16MB RAM or more, and I guess 24bit datatypes)

Neighter the released Samplemanager nor AudioConverter offer this codec. I just added it now.

I haven't compared it to IMA, but my codec is surely not better (but faster), otherwise you should break those IMA guys all 10 fingers and not let them touch an computer again.
But it is probably close to and it depends on the parameters you have used.
You may give me a wav and I can create a preview for you. I didnt release the ADPCM codec yet, because I still tweak it here and there, and files get immediately incompatible of course. So in case of codec you have to be sure that it doesn't need an update when you release it.
If 1:3 is enough for you, or even 1:2, one could easily adjust the codec from 4 bits to 6 or 8 bits. In 8bit I expect a quality you wont hear any difference. 6bit should be still much better than 4.
I tried 5 bit when I wrote the stereo version (5+3bits), and 5 bit truly halves the artefacts from 4 bits.
Another thing that can be observed that the artefacts are at very high frequencies. So if you use 44kH, you barley here them, if you use 16kHz it is clearly crackling. The same effect should be observable with IMA or any other ADPCM codec.

If you use looseless, 1:2 is the maximum you can get, ever.
Some files may get smaller, but some will remain larger, 1:2 is the average maximum of "expensive" codecs like FLAC etc.).

Thorham · « **Reply #40 on:** June 09, 2010, 04:21:53 PM »