Author Topic: Simple Amiga audio question. (Read 17662 times)

bubblebobble · « **on:** June 04, 2010, 11:29:52 PM »

Reducing the bit resolution does nothing to the frequency range, which means bass and treble are not affected.
Two bits fewer raise the quantisation noise about 4 times. Nothing more, but nothing less. Depending on the audio material this can become audible, of course, especially in very quiet passages. For "loud" regular pop music it won't be very audible, though. If the dynamic range is well used, quantisation noise starts to become audible at roughly 12 bits.
On the other hand, if you want to compress the audio data, removing 2 bits out of 16 is ridiculous. Better use 8 or 4 bit ADPCM; the 8-bit variant will sound almost like 16 bits and halves the data size. 4 bit is good for voices, but can get problematic for complex music with a lot of high frequencies, where it starts to crackle a bit.
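
To make the "about 4 times" figure concrete, here is a minimal C sketch that drops the low bits of a 16-bit sample. The function names are made up for illustration, and plain truncation by masking is an assumption; a real converter would round and dither. Each removed bit doubles the step size (roughly 6 dB more noise), so two bits give a step of 4.

```c
#include <stdint.h>

/* Hypothetical helper: keep only the top `bits` bits (1..16) of a 16-bit
   sample by clearing the low (16 - bits) bits. Plain truncation, no dither. */
int16_t reduce_bits(int16_t sample, int bits)
{
    uint16_t mask = (uint16_t)(0xFFFFu << (16 - bits));
    return (int16_t)((uint16_t)sample & mask);
}

/* Worst-case truncation error, in 16-bit quantisation steps: 2^(16-bits) - 1. */
int max_truncation_error(int bits)
{
    return (1 << (16 - bits)) - 1;
}
```

Going from 16 to 14 bits gives a worst-case error of 3 steps; going to 12 bits gives 15.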

bubblebobble · « **Reply #1 on:** June 06, 2010, 08:55:59 AM »

Just to clarify: by "compressing" you mean data compression, not dynamic range compression as a DSP effect?

Lossless:
With lossless compression you cannot guarantee any compression ratio. If the data to compress doesn't show any redundancy, no compression is possible (like zipping a zip file).
In reality, true lossless compression achieves a ratio of about 2:1 on average, complex audio data.

Lossless compression almost always works with a predictor (a function that predicts the next sample from the previous samples), calculates the error of the prediction (the delta between the predicted and the real value) and zips the result, e.g. like FLAC for audio or PNG for images.
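
The predictor-plus-error idea can be sketched in a few lines of C. This is only an illustration under my own assumptions: the function names are hypothetical, the predictor is a simple second-order one (a straight line through the last two samples), and the final "zip" (entropy coding) stage is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Order-2 linear predictor: extrapolate a straight line through the
   two previous samples. */
static int32_t predict(int32_t prev1, int32_t prev2)
{
    return 2 * prev1 - prev2;
}

/* Encoder side: turn samples into prediction errors (residuals). */
void to_residual(const int16_t *pcm, int32_t *res, size_t n)
{
    int32_t p1 = 0, p2 = 0;
    for (size_t i = 0; i < n; i++) {
        res[i] = pcm[i] - predict(p1, p2);
        p2 = p1;
        p1 = pcm[i];
    }
}

/* Decoder side: rebuild the samples bit-identically from the residuals. */
void from_residual(const int32_t *res, int16_t *pcm, size_t n)
{
    int32_t p1 = 0, p2 = 0;
    for (size_t i = 0; i < n; i++) {
        int32_t s = res[i] + predict(p1, p2);
        pcm[i] = (int16_t)s;
        p2 = p1;
        p1 = s;
    }
}
```

On smooth material the residuals are much smaller than the samples, which is what makes the following entropy coder effective. On noise they are not, and nothing is gained.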

DPCM or ADPCM is not lossless. ADPCM8 uses 8 bits to store the error of the prediction and unpacks into 16-bit samples. This is lossy, but almost inaudible. ADPCM4 uses 4 bits per 16-bit sample and is audible, depending on the material of course. Music will sound a bit harsh in the high frequencies. MP3 uses 1-2 bits per 16-bit sample and is designed for music. So if you want to pack music, this should be your first choice. If you use 192 or 256 kbps and a decent encoder like LAME, the compression is almost inaudible. Better than 12 bit or ADPCM4. ADPCM8 is much faster to encode/decode than MP3, but packs only 1:2.

bubblebobble · « **Reply #2 on:** June 06, 2010, 12:55:34 PM »

Lossless compression will always be too expensive for your purpose, since it always involves some kind of zip compression.

MP3 is too expensive as well, otherwise it would be the best choice.

What's left is only DPCM or ADPCM. Those are relatively cheap to decode, with linear complexity.
If you are calculating deltas, you are already using a predictor. The predictor says "the next sample will have the same value as the current one", which isn't a bad predictor at all. This already gives you 80% of the quality.
ADPCM is significantly better than DPCM, since the value range adapts to the current audio data. But I would say it is roughly 4 times more expensive to calculate.

If this is a game project, I would also consider using a lower sampling rate and maybe mono, because nobody would notice your nearly-lossless efforts anyway. 44100Hz/16bit/stereo is already quite expensive to squeeze through the Zorro slot without any decoding involved.
In 16kHz/8bit/mono you could also consider using MP3 and decoding it "offline" into a RAM buffer. If the songs are not too long, this should be affordable. 8 bit isn't too bad either if you do proper dithering (same effect as with pictures: with dithering, even 12-bit colour looks acceptable).
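
For the dithering part, here is a rough C sketch of a 16-bit to 8-bit conversion. It assumes triangular (TPDF) dither of about one 8-bit step, generated with the standard `rand()`; the function name is hypothetical and no noise shaping is done.

```c
#include <stdint.h>
#include <stdlib.h>

/* Convert one 16-bit sample to signed 8-bit with triangular (TPDF) dither.
   One 8-bit step equals 256 16-bit steps. */
int8_t dither_to_8bit(int16_t sample)
{
    /* Sum of two uniform values in [0,255], centred: range -255..+255. */
    int noise = (rand() % 256) + (rand() % 256) - 255;
    int v = sample + noise + 128;              /* +128 rounds to nearest */
    /* floor division by 256 that also works for negative values */
    int q = (v >= 0) ? v / 256 : -((-v + 255) / 256);
    if (q > 127)  q = 127;                     /* clamp to the 8-bit range */
    if (q < -128) q = -128;
    return (int8_t)q;
}
```

The added noise turns the signal-dependent truncation error into a steady low-level hiss, which is usually less annoying than the distortion you get on quiet passages without it.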

bubblebobble · « **Reply #3 on:** June 06, 2010, 02:22:04 PM »

Again, lossless is expensive and gives you 1:2 compression on average. If you are even considering a Huffman decoder and some nifty bit tricks, then you can afford ADPCM, which is cheaper. ADPCM8 can guarantee 1:2 compression and sounds almost as good as PCM16. Given your lo-fi conditions (Paula14/28kHz), the quality loss is absolutely negligible.
You could also encode the stereo (difference) channel with 4 bits; then you end up with 12 bits per stereo sample instead of 32, not too bad.

You should also consider 24kHz, because of the integer ratio of downsampling. The downsampling in your case has the biggest quality impact, much more than ADPCM8 would harm your data. All this of course depends on the actual audio data.
If this is sampled MOD music, produced with 8kHz samples in 8 bit, all those assumptions might be wrong. I assume random high-fidelity pop music as you hear it on the radio.

bubblebobble · « **Reply #4 on:** June 06, 2010, 06:06:35 PM »

I think you should first define how much memory you want to spend to store the music, and from that estimate the compression ratio that you need.
Given the compression ratio and the CPU power, we can evaluate what your options are.

A fact is: on average music, lossless compression will give you approximately 2:1. You cannot break this barrier, otherwise you would be a good candidate for the Nobel Prize in natural science. ;-)

An experience: lossy doesn't necessarily mean the result sounds worse than the original. Lossy just means that the data is not reproduced bit-identically, as is needed for exact data like executables. Many people like a moderate MPEG compression on audio, because a lot of "garbage" gets filtered out and the result is somewhat easier and more transparent to listen to.

I would always prefer ADPCM8 over lossless, because the difference is not audible to humans and the compression ratio is a predictable, fixed 2:1.

bubblebobble · « **Reply #5 on:** June 06, 2010, 08:26:40 PM »

Quote from: Thorham;563157

@Karlos: Thanks for the explanation.
It would be great if I could find the loop times somewhere, because then this would be a done deal.

You could write a tool that tries to find the loop points, but that would probably take longer than editing them manually.

Quote

Don't you mean computer science? Somehow I doubt nature has set this ratio to 2:1, though, and I like to believe it can be done, but that's just me :lol:

No, I do mean natural science. And unfortunately yes, nature has set this to 2:1. Without extra world knowledge, the entropy of an average music signal in the time domain is roughly 0.5, meaning 1 bit of data carries 0.5 bits of information.
You will never ever be able to compress better than 2:1. The sooner you accept this, the better for your precious spare time.

Check out this page:
http://wiki.hydrogenaudio.org/index.php?title=Lossless_comparison

Many wise men have worked on lossless codecs. Here are some examples (compressed size as a percentage of the original):
FLAC        58.70%
WavPack     58.0%
TAK         57.0%
Monkey's    55.50%
OptimFROG   54.70%
ALAC        58.50%
WMA         56.30%

So don't fool yourself by thinking you can beat this.

Quote

While ADPCM and similar lossy techniques are certainly an option now, ADPCM8's 2:1 ratio still isn't good enough, I'm afraid. Karlos's method, however, might be the solution to this problem.

If you need more than 2:1, lossless is out of the game anyway. Lossy is your only option.

The best is MPEG, e.g. MP3 can easily reach 10:1 without significant degradation. With ADPCM you could get ~3:1, I'd say (ADPCM8 for the mid channel and ADPCM4 for the stereo channel). ADPCM is fast and easy to implement compared to MPEG.
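
The mid/stereo split mentioned above can be sketched like this in C. The function names are hypothetical and this only shows the reversible channel transform; in the scheme described here, the two resulting channels would then go through ADPCM8 and ADPCM4 respectively, which is the lossy part.

```c
#include <stdint.h>

/* Split a stereo pair into mid (average) and side (difference). */
void ms_encode(int16_t left, int16_t right, int32_t *mid, int32_t *side)
{
    int32_t sum = (int32_t)left + right;
    *side = (int32_t)left - right;
    *mid  = (sum - (sum & 1)) / 2;   /* floor(sum / 2), exact division */
}

/* Undo the split. sum and side always share the same parity, so the
   bit dropped from mid can be taken back from side. */
void ms_decode(int32_t mid, int32_t side, int16_t *left, int16_t *right)
{
    int32_t sum = mid * 2 + (side & 1);
    *left  = (int16_t)((sum + side) / 2);
    *right = (int16_t)((sum - side) / 2);
}
```

For typical music the side channel is much quieter than the mid channel, which is why it can get away with fewer bits.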

bubblebobble · « **Reply #6 on:** June 07, 2010, 09:37:50 AM »

> Really? But doesn't entropy mostly apply to entropy coders?
Everybody has to obey the laws of nature, whether you want to or not.

> The point in trying is that people may have missed things.
You didn't get the point. It is a law of nature. Unless you are Q from Star Trek, you won't be able to change this. It is just not as intuitive as the apple that falls down from the tree, but it is the same thing.
Plus, it is quite pathetic to think that what hundreds of PhD-level researchers achieved over decades can be wiped away by a hobbyist in a few afternoon sessions without even understanding the fundamentals of information theory.

I'll post you the code for my ADPCM4 implementation soon. It should be fast enough for a vanilla A1200, and doesn't need a lot of stuff around it, just one function to encode and one to decode.
You can use my Tool "AudioConverter" or Samplemanager to generate ADPCM files and listen to the result.
I am currently tuning some parameters to minimize the error, and adding stereo support.
Right now, ADPCM4 gives me an average error of ~300 quantisation steps of a 16bit sample. This is roughly like 8bit PCM, but the distribution of the errors is better.
If the material contains a lot of high frequencies, the errors go up, but are less audible.
If the material contains more low frequencies or is quieter, the error goes down. E.g. if your music fades out, there will be no audible noise like with 8bit PCM.

Karlos' algorithm follows a so called "code-book based" approach.
From what I can see it has several drawbacks:

1. On 256/4bit it has more than 25% data overhead because he stores individual codebooks for each frame. This means that instead of 4:1, you will get ~3:1.
2. How to find the "best" delta representatives is not well defined and might need a lot of experimenting to find the optimal algorithm.
3. The encoder has a very high complexity because it needs to do vector quantisation. Luckily only for the encoder.
4. Doesn't make use of the assumption that the left and right channel of a stereo signal are correlated.
5. "Wastes" precious 4 bits in the 256/4bit case ;-)
6. The chosen deltas may cause an error of up to 4096 quantisation steps (i.e. quality drops to that of 4-bit PCM) in the worst case. This is very unlikely, of course, but unlike in ADPCM the error is not correlated with the high frequencies, so it is not "masked".

> Regarding my old code versus ADPCM, ADPCM is probably better but is more expensive to decode and less fault tolerant.
Decoding is cheaper than ADPCM, yes, but why less fault tolerant? Because you have a "sync" sample at the beginning of a block? ADPCM can be reset every N samples too. It wouldn't even need an explicit sync value.

bubblebobble · « **Reply #7 on:** June 07, 2010, 12:21:34 PM »

Quote from: Karlos;563244

It's actually ~3.5:1. If you download the zip file, you'll see for yourself.

original datasize =
samples*sizeof(word) =
256 * 2 =
512 Bytes

compressed datasize =
samples*sizeof(4bit) + table*sizeof(word) + startsample*sizeof(word) + header =
(256*0.5) + (16*2) + 2 + 2 =
164 Bytes

=> ratio = 512/164 = 3.1219...
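
The same arithmetic as a small C helper, in case you want to play with other frame sizes. The function and parameter names are made up for this example and are not taken from Karlos' code.

```c
/* Bytes needed for one code-book frame (all parameters are illustrative). */
int frame_bytes(int samples, int code_bits, int table_entries,
                int start_bytes, int header_bytes)
{
    int payload = (samples * code_bits + 7) / 8;  /* packed codes, rounded up */
    int table   = table_entries * 2;              /* one 16-bit word per delta */
    return payload + table + start_bytes + header_bytes;
}
```

`frame_bytes(256, 4, 16, 2, 2)` gives the 164 bytes from above. A 1024-sample frame with the same table would need 548 bytes for 2048 bytes of PCM, so the table overhead shrinks as the frame grows.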

Quote

It isn't massively complex and it could certainly be implemented more simply than I did.

Complexity in the sense of information theory, not implementation-wise. Implementing VQ (at high complexity) is easy.

Quote

That's not actually true. In the stereo case, there is still only one delta table derived from both channels. The independent variation of left and right will produce similar spread of

Yes (phase) and no (correlation).
If you encoded both independently, you would need 2 tables. But you use only one, which does not resolve the tiny differences between left and right. Actually it is very likely that the stereo difference is killed completely, because it needs to compete with the other deltas. So you are actually worse than 2 independent channels with 2 tables, which is again worse than 2 joint stereo channels.

Quote

> the correlation of delta value spread isn't really affected by phase differences between the channels.

This is cool, didn't think of the phase difference. You got me here ;-)
(but only if the phase is shifted consistently across all sine waves, which is almost never the case in reality)

Quote

Experimentation with mid and side band encoding did not produce any real difference from a QSNR perspective.

Yes, not with this algorithm, because the side band suffers too much from competing against the mid band channel. You would need 2 tables (they can be used together, but must be "trained" separately).

Quote

My experience with ADPCM decode was that corrupt data in the compressed stream can (but won't necessarily) knock the decode out permanently from that point onwards. Explicit audio frames mean that at most only the remaining samples in the current frame will be corrupted.

Nobody stops you from putting ADPCM into chunks of N samples. If this is meant to be a stream, you need to do this anyway, because otherwise you cannot join the stream at any position you want, which is the main point of being a "stream".
But putting ADPCM into chunks has zero overhead. You just reset the adaptation factor to something average every N samples. Of course, setting the factor to the best value is better and can be achieved by adding 1 extra byte to each frame.

bubblebobble · « **Reply #8 on:** June 07, 2010, 12:28:11 PM »

@Karlos

What about a competition? ;-)

Let Thorham choose a short sample as uncompressed 16-bit .wav (let's say 10 secs).
(maybe with some music, fading, voice, sound effect etc.)
We can both compress and uncompress it again, and let him decide what sounds better?
(as a proof, we both must provide the compressed file too, of course).

bubblebobble · « **Reply #9 on:** June 07, 2010, 12:38:11 PM »

C'mon, it is just for fun and curiosity. I implemented my own version of ADPCM4, which is much more lightweight than e.g. the G.726 standard implementation of ADPCM4. I'd like to see how it compares to other home-brewed codecs.
We could even measure the encoding/decoding time of e.g. 10 min of audio on a Classic Amiga.

bubblebobble · « **Reply #10 on:** June 09, 2010, 12:46:07 PM »

Here is the C port of the simplest version of the ADPCM4 decoder for 16bit/mono. It packs exactly 1:4.
Check if this is easy and fast enough for your purposes. The code can be optimized; this version is tuned to be a clean example.
The audio quality degrades of course, but how much depends on the actual material.
There are also better versions, but they need more computation power, and the result is only slightly better.

Code:

/* C port of the "file format audio" include, Amiblitz3 */

/* ADPCM4 defines */
#define FFA_ADPCM4_MAXFB 12
#define FFA_ADPCM4_MINFB 0
#define FFA_ADPCM4_LOW   0x0
#define FFA_ADPCM4_HI    (0xF - 5)

/* context struct for all ADPCM/ADDPCM codecs */
typedef struct ffa_ADPCM_ctx {
  int cSampM;
  int cSampS;
  int cDeltaM;
  int cDeltaS;
  int fbM;
  int fbS;
  int fpos;
} ffa_ADPCM_ctx;

/* ADPCM4 lookup table (Fibonacci), one entry per 4-bit code */
static const int ffa_ADPCM4_LUT[16] = {
    0,  -1,   1,  -2,   2,  -3,   3,  -5,
    5,  -8,   8, -13,  13, -21,  21, -34
};

/* ADPCM4 decoder function: unpacks flength samples from srcP into dstP */
void ffa_DecodeADPCM4(const unsigned char *srcP, short *dstP, int flength, ffa_ADPCM_ctx *ctx) {

  /* declare/init variables */
  int cSamp = ctx->cSampM;          // previous sample
  int fb    = ctx->fbM;             // current bit shift
  int n, cValue;

  /* decoder loop */
  for (n=0; n<flength; n++) {                  // for all samples
    if (n&1) cValue = (*srcP++ & 0xF0) >> 4;   // odd sample: upper 4 bits, then go to next byte
    else     cValue = (*srcP   & 0x0F);        // even sample: lower 4 bits

    // uncompress the sample: delta scaled by 2^fb
    // (multiply instead of <<, since shifting a negative value is undefined in C)
    cSamp += ffa_ADPCM4_LUT[cValue] * (1 << fb);
    *dstP++ = (short)cSamp;                    // store in destination PCM 16bit buffer

    if (cValue>=FFA_ADPCM4_HI && fb<FFA_ADPCM4_MAXFB) fb++;       // raise the bit shifter
    else if (cValue<=FFA_ADPCM4_LOW && fb>FFA_ADPCM4_MINFB) fb--; // lower the bit shifter
  }

  /* rescue decoder context for next call */
  ctx->cSampM  = cSamp;          // previous sample
  ctx->fbM     = fb;             // current bit shift
  ctx->fpos   += flength;        // advance the sample position
}
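
Since the decoder alone is only half the story, here is a rough sketch of what a matching encoder could look like. This is not the Amiblitz3 encoder; it is a guess that simply tries all 16 codes per sample, keeps the one that lands closest to the target, and then updates its state exactly like the decoder does. The names and the short macro names are hypothetical, and the table and thresholds are copied from the decoder so the block stands on its own.

```c
#include <stdlib.h>

#define MAXFB 12
#define MINFB 0
#define LOW   0x0
#define HI    (0xF - 5)

static const int lut[16] = {0, -1, 1, -2, 2, -3, 3, -5,
                            5, -8, 8, -13, 13, -21, 21, -34};

/* Greedy ADPCM4 encoder sketch. *cSampP and *fbP hold the codec state
   (previous decoded sample, current bit shift) and are updated on return. */
void encode_adpcm4(const short *src, unsigned char *dst, int length,
                   int *cSampP, int *fbP)
{
    int cSamp = *cSampP, fb = *fbP;
    for (int n = 0; n < length; n++) {
        /* code 0 (no change) is always valid, so start from it */
        int best = 0, bestErr = abs(src[n] - cSamp);
        for (int c = 1; c < 16; c++) {
            int cand = cSamp + lut[c] * (1 << fb);   /* what the decoder would get */
            int err  = abs(src[n] - cand);
            /* skip candidates that would overflow the 16-bit output */
            if (cand >= -32768 && cand <= 32767 && err < bestErr) {
                best = c;
                bestErr = err;
            }
        }
        cSamp += lut[best] * (1 << fb);              /* track the decoder state */
        if (n & 1) dst[n >> 1] |= (unsigned char)(best << 4); /* odd: upper nibble */
        else       dst[n >> 1]  = (unsigned char)best;        /* even: lower nibble */
        if (best >= HI && fb < MAXFB) fb++;          /* same adaptation rule */
        else if (best <= LOW && fb > MINFB) fb--;    /* as in the decoder */
    }
    *cSampP = cSamp;
    *fbP    = fb;
}
```

The important design point is that the encoder predicts from the decoded value, not from the original sample, so the errors do not accumulate. A greedy per-sample choice is probably not optimal, because the chosen code also steers the bit shift for the following samples.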

bubblebobble · « **Reply #11 on:** June 09, 2010, 03:23:24 PM »

> I've tried Samplemanager, but this fails on my WinUae setup
Can you specify "fails"? Normally it should work. Maybe a lib is missing that I forgot to ask for at the beginning?
(it needs minimum 68020+FPU, and ~16MB RAM or more, and I guess 24bit datatypes)

Neither the released Samplemanager nor AudioConverter offers this codec. I just added it now.

I haven't compared it to IMA, but my codec is surely not better (but faster), otherwise you should break all ten fingers of those IMA guys and not let them touch a computer again.
But it is probably close, and it depends on the parameters you have used.
You may give me a wav and I can create a preview for you. I didn't release the ADPCM codec yet, because I still tweak it here and there, and files immediately become incompatible, of course. So with a codec you have to be sure that it doesn't need an update once you have released it.
If 1:3 is enough for you, or even 1:2, one could easily adjust the codec from 4 bits to 6 or 8 bits. In 8 bit I expect a quality where you won't hear any difference. 6 bit should still be much better than 4.
I tried 5 bit when I wrote the stereo version (5+3 bits), and 5 bit truly halves the artefacts compared to 4 bits.
Another thing that can be observed is that the artefacts are at very high frequencies. So if you use 44kHz, you barely hear them; if you use 16kHz, it is clearly crackling. The same effect should be observable with IMA or any other ADPCM codec.

If you use lossless, 1:2 is the maximum you can get, ever.
(Some files may get smaller, but some will remain larger; 1:2 is the average maximum of "expensive" codecs like FLAC etc.)

bubblebobble · « **Reply #12 on:** June 09, 2010, 05:13:35 PM »

A download of 32 MB is no problem.

> Probably won't be a problem, because I'll do my own assembly language versions
Yes, but you need the encoder too.

>> If you use lossless, 1:2 is the maximum you can get, ever
> 'Perhaps'
No.

> I'll try to find some hard proof
This is hard because in theory the compression can be anything from 0 to 99.9%, because it is not defined what audio data actually has to look like.
The only thing you can do is measure the entropy of a large amount of real-world audio data (or of the songs you plan to compress). This will give you an idea of the lossless packing potential.
Only the extremes can be proven rigorously. E.g. a random walk cannot be packed at all. Otherwise the "walk" would not be random. Entropy is 1.0.
If the audio data is completely perfect silence, it can be reduced completely; the entropy is 0.
Real-world audio data is somewhere in between, usually ~0.5. That's why, if you try hard to predict the signal, you end up with a 50% packing rate. BTW, none of the currently existing codecs packs lower than 50%; all are higher. And some of them are really, really complex and sophisticated. But it doesn't help. You cannot overcome laws of nature.
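
The entropy measurement mentioned above can be written down roughly like this. The notation is my own for this post: run a predictor over the samples, count how often each prediction error value k occurs, and turn the counts into relative frequencies p_k. Note that the result is an estimate for that particular predictor followed by an ideal entropy coder.

```latex
% p_k: relative frequency of prediction error value k
% H:   average information per sample, in bits
H = -\sum_{k} p_k \log_2 p_k ,
\qquad
\text{packed size} \approx \frac{H}{16} \times \text{original size}
\quad \text{(for 16-bit samples)}
```

An "entropy of 0.5" in the sense used here corresponds to H being around 8 bits per 16-bit sample.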

Just zipping audio data usually gives you 75% of the size. Using a good predictor gives you 50-60%. But this will get very expensive to decode. Try to run FLAC on a vanilla A1200. It will be far slower than realtime and gives you 54% on average.

bubblebobble · « **Reply #13 on:** June 13, 2010, 02:24:15 PM »

Here is an example of my implementation of ADPCM4.

All files are converted back to .wav so you can easily play them.

If you decide to use this codec, I can pass you the source along.
I personally find 4 bit quite good, although with extreme material you can hear a slight crackling. With 8 bit I don't hear any difference, whatever I encode.
The 2-bit version is more of an experiment in how low you can go. I find it amazing what 2 bits per sample can still do; however, you can hear a significant degradation in quality.
Probably 6 bit is quite hi-fi too, but it's a little hassle to implement since 6 is not a divisor of 8, so you always need to decode 4 samples at once. But if 4 bits turn out to be too poor, 6 bit would be doable too.

FF10 (original 16bit PCM) - File size ~5MB
FF10 (8bit ADPCM) - File size ~ 2.5MB
FF10 (4bit ADPCM) - File Size ~ 1.3MB
FF10 (2bit ADPCM) - File size ~ 0.8 MB
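
The "decode 4 samples at once" part for a 6-bit variant could look like the following C sketch. The byte layout is purely an assumption for illustration (codes packed MSB-first, three bytes per four codes), and the function names are hypothetical.

```c
#include <stdint.h>

/* Unpack four 6-bit codes from three bytes.
   Assumed layout: aaaaaabb bbbbcccc ccdddddd (first code in the top bits). */
void unpack_6bit(const uint8_t in[3], uint8_t out[4])
{
    uint32_t w = ((uint32_t)in[0] << 16) | ((uint32_t)in[1] << 8) | in[2];
    out[0] = (w >> 18) & 0x3F;
    out[1] = (w >> 12) & 0x3F;
    out[2] = (w >>  6) & 0x3F;
    out[3] =  w        & 0x3F;
}

/* Pack four 6-bit codes into three bytes, inverse of unpack_6bit. */
void pack_6bit(const uint8_t in[4], uint8_t out[3])
{
    uint32_t w = ((uint32_t)(in[0] & 0x3F) << 18)
               | ((uint32_t)(in[1] & 0x3F) << 12)
               | ((uint32_t)(in[2] & 0x3F) <<  6)
               |  (uint32_t)(in[3] & 0x3F);
    out[0] = (uint8_t)(w >> 16);
    out[1] = (uint8_t)(w >> 8);
    out[2] = (uint8_t)w;
}
```

Each unpacked code would then index a 64-entry delta table in the same way the 4-bit codes index the 16-entry table.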

bubblebobble · « **Reply #14 on:** August 06, 2010, 09:15:00 PM »

@Thorham
This won't work. A MOD is smaller because it uses samples, usually at a low quality (short, low sample rate, low bit resolution, mono etc.). In a regular music production (if done using samples), they could easily exceed the memory needed to store the final mixdown. As soon as you start to use analog things like voice recordings, real instruments etc., the amount of information needed to describe this goes up immediately.

Apart from that, mixing the channels is a lossy process and cannot be reverted. What we hear is extremely lossy, even worse than an MP3.

What the human ear is doing is tracking instruments/noises by harmonic frequencies or other frequency distributions that have been learned before. Doing that, it misses a lot of stuff that is actually in the music. You might hear it the next time you listen, but then ignore something else. The ear uses a lot of information available prior to listening to the music.

E.g., taken to the extreme, you could write a codec that stores all the music that will ever be encoded/decoded with this codec in a database. Then encoding/decoding comes down to an index lookup; every piece of music is compressed to an integer.
But in the end you have to store the information somewhere, either in the encoded data or outside as meta knowledge.

The same goes for predictors in lossless codecs. They represent prior knowledge that is assumed and put into the decoder. Once the data doesn't obey this knowledge, your codec screws up, e.g. on white noise.

The best predictors achieve roughly 1:2. Just accept this. Anything else is lossy.
MP3 or Ogg are pretty good, compressing at about 1:12. They make assumptions about how the human ear works.
