Author Topic: Simple Amiga audio question. (Read 8060 times)

bubblebobble · « **Reply #14 on:** June 06, 2010, 12:55:34 PM »

Lossless compression will always be too expensive for your purpose, since it always involves some kind of zip compression.

MP3 is too expensive as well; otherwise it would be the best choice.

What's left is only DPCM or ADPCM. Those are relatively cheap to decode in linear complexity.
If you are calculating deltas, you are already using a predictor. The predictor says "the next sample will have the same value as the current one", which isn't a bad predictor at all. This already gives you 80% of the quality.
ADPCM is significantly better than DPCM, since the value range adapts to the current audio data, but I would say it is roughly 4 times more expensive to calculate.
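
To illustrate the point about deltas being a predictor, here is a minimal sketch of plain (lossless) delta coding. The function names are made up for this example, and the initial predictor state of 0 is an assumption.

```python
def delta_encode(samples):
    """Replace each sample with its difference from the previous one."""
    deltas = []
    previous = 0  # assumed predictor state before the first sample
    for sample in samples:
        # Prediction is "same as the previous sample"; store the error.
        deltas.append(sample - previous)
        previous = sample
    return deltas


def delta_decode(deltas):
    """Undo delta_encode by accumulating the differences."""
    samples = []
    current = 0
    for delta in deltas:
        current += delta
        samples.append(current)
    return samples
```

For smooth audio the deltas tend to be much smaller than the samples themselves, which is what makes them cheaper to store.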

If this is a game project, I would also consider using a lower sampling rate and maybe mono, because nobody would notice your nearly lossless efforts anyway. 44100Hz/16bit/stereo is already quite expensive to squeeze through the Zorro slot without any decoding involved.
In 16kHz/8bit/mono you could also consider using mp3, and decode it "offline" into a RAM buffer. If the songs are not too long, this should be affordable. 8bit isn't too bad either if you do proper dithering. (Same effect as with pictures: with dithering, even 12bit color looks acceptable.)
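
A rough sketch of what "proper dithering" could look like when reducing 16-bit samples to 8 bits. Triangular (TPDF) dither is one common choice and is an assumption here, as are the function name and the fixed seed; the post does not say which method is meant.

```python
import random


def to_8bit_dithered(samples16, rng=None):
    """Quantize signed 16-bit samples to signed 8-bit with TPDF dither."""
    rng = rng or random.Random(0)  # fixed seed only to keep the sketch repeatable
    out = []
    for sample in samples16:
        # Sum of two uniform values gives triangular noise spanning
        # +/- one 8-bit step (256 units on the 16-bit scale).
        noise = rng.uniform(-128, 128) + rng.uniform(-128, 128)
        quantized = round((sample + noise) / 256)
        out.append(max(-128, min(127, quantized)))  # clamp to the 8-bit range
    return out
```

The added noise turns the quantization error into a steady low-level hiss instead of distortion that follows the signal.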

Thorham · « **Reply #15 on:** June 06, 2010, 01:57:02 PM »

Quote from: Karlos;563116

It supported, if I recall correctly, between 2-5 bit encoding. It was lossy but didn't sound too awful.

Cool, but I really want lossless encoding. Lossy encoding is simply not an option.

Quote from: Karlos;563116

Each compressed frame contained a 16-bit word that had some encoding flags, followed by a 16-bit signed sample (2 for stereo and so on) that comes unmodified from the input frame and serves as the starting point. After that, the next 4/8/16/32 16-bit words contain the best 4/8/16/32 delta values as determined by the encoder.

The remainder of the compressed frame was simply bitfield lookups into that table. Stereo data was interleaved IIRC. For 2-bit encoding, each 16-bit word contained 8 entries, 3-bit encoding stored 5 entries (LSB aligned), 4-bit encoding stored 4 entries, 5-bit stored 3 entries (LSB aligned).
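
The entry counts quoted here follow from integer division of the 16-bit word by the field width. A small sketch of packing and unpacking such fields; the helper names are hypothetical, and placing the first entry in the lowest bits is an assumption about the layout.

```python
WORD_BITS = 16


def fields_per_word(bits):
    """Whole bitfields that fit in one 16-bit word."""
    return WORD_BITS // bits


def pack_fields(indices, bits):
    """Pack table indices into 16-bit words, first index in the lowest bits."""
    per_word = fields_per_word(bits)
    mask = (1 << bits) - 1
    words = []
    for start in range(0, len(indices), per_word):
        word = 0
        for slot, index in enumerate(indices[start:start + per_word]):
            word |= (index & mask) << (slot * bits)
        words.append(word)
    return words


def unpack_fields(words, bits, count):
    """Extract the first `count` indices from packed words."""
    per_word = fields_per_word(bits)
    mask = (1 << bits) - 1
    out = []
    for word in words:
        for slot in range(per_word):
            if len(out) == count:
                return out
            out.append((word >> (slot * bits)) & mask)
    return out
```

With 3-bit and 5-bit fields one bit per word stays unused, which matches the "LSB aligned" remark.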

The replay algorithm simply takes the start value and then extracts each field value, looks up the delta value from the table and adds it to the current value to recreate the next sample.

I am sure I still have the sources somewhere.

There are some interesting ideas in here that I may be able to use in my compressor, thanks

Quote from: bubblebobble;563122

Lossless compression will always be too expensive for your purpose, since it always involves some kind of zip compression.

Not necessarily. I'm first going to try the simplest of Huffman implementations. This is very cheap. Basically, when I calculate the deltas, and the deltas of the deltas, I simply store the sign separately, and make negative values positive.

This has the effect of creating easily compressed delta values. They're currently stored in the following way: I add two bits per delta. These bits tell how many bits the delta contains. The bit lengths are simply 4, 8, 12 and 16. This could be improved by setting those ranges in a better way. After this, a very simple implementation of Huffman encoding can be used.
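
A minimal sketch of the two-bit length prefix described here, assuming the prefix values 0 to 3 map to 4, 8, 12 and 16 bits in that order. The function names and the exact mapping are guesses for illustration, not the actual compressor.

```python
BIT_LENGTHS = (4, 8, 12, 16)  # assumed meaning of the 2-bit prefix values 0..3


def classify(delta):
    """Return (sign_bit, prefix_code, magnitude) for one delta."""
    sign = 1 if delta < 0 else 0  # the sign goes to a separate stream
    magnitude = abs(delta)
    for code, bits in enumerate(BIT_LENGTHS):
        if magnitude < (1 << bits):
            return sign, code, magnitude
    raise ValueError("magnitude does not fit in 16 bits")


def payload_bits(deltas):
    """Bits used by prefixes and magnitudes (sign bits not counted)."""
    return sum(2 + BIT_LENGTHS[classify(d)[1]] for d in deltas)
```

For example, the deltas 3, -200 and 5000 would cost 6, 10 and 18 bits before any Huffman stage.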

The sign data is stored in 'bit toggled' form (each time a bit in the data is different than the previous bit, a one is written out, for repeating bits a zero is written out).
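
The "bit toggled" form can be sketched as an XOR with the previous bit. The starting state of 0 is an assumption, and the function names are invented for this example.

```python
def toggle_encode(bits):
    """Write 1 where a bit differs from the previous bit, else 0."""
    out = []
    previous = 0  # assumed state before the first bit
    for bit in bits:
        out.append(bit ^ previous)
        previous = bit
    return out


def toggle_decode(toggled):
    """Rebuild the original bits by flipping the state on every 1."""
    out = []
    current = 0
    for flag in toggled:
        current ^= flag
        out.append(current)
    return out
```

Long runs of identical signs turn into runs of zeros, which is the property a simple entropy coder can exploit.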

I'm hoping that those signs can be reasonably well compressed with a simple Huffman encoder.

All this is very cheap to decode, and should give reasonable compression rates.

Quote from: bubblebobble;563122

If you are calculating deltas, you are already using a predictor. The predictor says "the next sample will have the same value as the current one". Which isn't a bad predictor at all. This gives you already 80% of the quality.

Do you mean sound quality? If that's the case, then that isn't correct. The process I use is lossless. If that's not what you mean, then I don't understand what you mean.

Quote from: bubblebobble;563122

If this is a game project, I would also consider using a lower sampling rate and maybe mono, because nobody would notice your nearly lossless efforts anyway. 44100Hz/16bit/stereo is already quite expensive to squeeze through the Zorro slot without any decoding involved.

In 16kHz/8bit/mono you could also consider using mp3, and decode it "offline" into a RAM buffer. If the songs are not too long, this should be affordable. 8bit isn't too bad either if you do proper dithering. (Same effect as with pictures: with dithering, even 12bit color looks acceptable.)

No, it's not a game project. It's for a music CD, something like an old-school music disk, but a lot more extensive.

That's the main reason for me wanting maximum quality. Also, there are already two lossy factors: the downsampling from 48 kHz to 28 kHz, and playing back on a miggy so that two bits are lost. I really hope I don't have to use lossy compression.

bubblebobble · « **Reply #16 on:** June 06, 2010, 02:22:04 PM »

Again, lossless is expensive, and gives you on average 2:1 compression. If you are even considering a Huffman decoder and some nifty bit tricks, then you can afford ADPCM, which is cheaper. ADPCM8 can guarantee 2:1 compression and sounds almost as good as PCM16. Given your lo-fi conditions (Paula14/28kHz), the quality loss is absolutely negligible.
You could also encode the stereo channel with 4 bits; then you end up with 12 bits per stereo sample instead of 32 bits, not too bad.
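
One way to read the "stereo channel" suggestion is a mid/side split, with the mid signal coded at 8 bits and the side signal at 4 bits. That reading, and the helper names below, are assumptions; the transform itself is the usual reversible integer mid/side pair.

```python
def to_mid_side(left, right):
    """Split a stereo pair into mid (average) and side (difference)."""
    mid = (left + right) >> 1  # floor of the average
    side = left - right        # needs one bit more than the inputs
    return mid, side


def from_mid_side(mid, side):
    """Recover the exact left/right pair from mid and side."""
    # left+right and left-right share their lowest bit, so the bit
    # dropped by the shift above can be restored from side.
    left = mid + ((side + (side & 1)) >> 1)
    right = left - side
    return left, right


PCM_BITS_PER_PAIR = 2 * 16       # plain 16-bit stereo
CODED_BITS_PER_PAIR = 8 + 4      # 8-bit mid plus 4-bit side
```

The bit budget works out to 32 / 12, or roughly 2.67:1, before any container overhead.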

You should also consider 24kHz, because of the integer ratio of downsampling. The downsampling in your case has the biggest quality impact, much more than ADPCM8 would harm your data. All this of course depends on the actual audio data.
If this is sampled MOD music, produced with 8kHz samples in 8bit, all those assumptions might be wrong. I assume high-fidelity random pop music as you can hear on the radio.
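
The "integer ratio" remark can be checked with a couple of lines of arithmetic. The helper name is made up for this example.

```python
from fractions import Fraction


def resample_ratio(source_hz, target_hz):
    """Output samples produced per input sample, as a reduced fraction."""
    return Fraction(target_hz, source_hz)


def is_integer_decimation(source_hz, target_hz):
    """True when every Nth input sample lines up with an output sample."""
    return resample_ratio(source_hz, target_hz).numerator == 1
```

Going from 48 kHz to 28 kHz gives 7/12, so 7 output samples have to be interpolated from every 12 inputs, while 48 kHz to 24 kHz gives 1/2, a plain decimation by two after low-pass filtering.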

Karlos · « **Reply #17 on:** June 06, 2010, 03:01:25 PM »

Quote

Cool, but I really want lossless encoding. Lossy encoding is simply not an option.

Quote

Also, there are already two lossy factors: the downsampling from 48 kHz to 28 kHz, and playing back on a miggy so that two bits are lost. I really hope I don't have to use lossy compression.

Depending on your algorithm, your loss error might be limited to the bits you can't replay anyway. Also, consider how human hearing works. For example, you can't perceive the same degree of error in a quiet sound immediately after a loud one.

Experiment, I say.

Karlos · « **Reply #18 on:** June 06, 2010, 03:50:18 PM »

Anyway, if you are curious, I have dug around my old HD and found it.

I've uploaded the codec binary for you to play with. It only imports/exports AIFF 16-bit (mono and stereo supported).

I've encoded a short section of 44.1kHz stereo music, provided the compressed version and the decoded version for your appraisal. The default encode options were used which IIRC are 4-bit, frame length 256. This gives a compression of about 3.5:1.

Speech, with properly gated silences can compress much better, since an entire silence frame can be encoded as a single word, more or less.

An interesting side effect of the codec is that it is "first time lossy" only. If you re-encode the decoded output then, except in very rare cases, it will produce the same compressed interpretation as the first pass of the original did.

If you do a waveform subtraction of the decoded from the original, you'll see what has been thrown away (and it is quite noticeable), yet it's a lot harder to perceive when just listening.

http://extropia.co.uk/_temp/xdac_codec.zip

-edit-

I think the codec has been compiled with FPU support, which isn't used in the codec but may be used when interpreting the AIFF sample rate (which is stored as an 80-bit long float)
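
For reference, the 80-bit extended value can be unpacked with integer operations plus one final scaling. This is a minimal sketch, not the codec's code; the function name is hypothetical, and infinities, NaNs and denormals are deliberately ignored since sample rates never need them.

```python
def decode_extended80(data):
    """Decode a big-endian 80-bit extended float (as used for the AIFF sample rate)."""
    if len(data) != 10:
        raise ValueError("expected exactly 10 bytes")
    sign_exponent = int.from_bytes(data[:2], "big")
    mantissa = int.from_bytes(data[2:], "big")  # 64 bits, explicit integer bit
    sign = -1 if sign_exponent & 0x8000 else 1
    exponent = sign_exponent & 0x7FFF
    if exponent == 0 and mantissa == 0:
        return 0.0
    # Exponent bias is 16383; the mantissa is an integer scaled by 2**63.
    return sign * mantissa * 2.0 ** (exponent - 16383 - 63)
```

Feeding it the bytes `40 0E AC 44 00 00 00 00 00 00` yields 44100.0.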

Thorham · « **Reply #19 on:** June 06, 2010, 04:39:21 PM »

Quote from: bubblebobble;563135

Again, lossless is expensive, and gives you on average 2:1 compression. If you are even considering a Huffman decoder and some nifty bit tricks, then you can afford ADPCM, which is cheaper. ADPCM8 can guarantee 2:1 compression and sounds almost as good as PCM16. Given your lo-fi conditions (Paula14/28kHz), the quality loss is absolutely negligible.

You sure like your lossy encoding :lol: Anyway, I've already said that my lossless encoder isn't very heavy, certainly fast enough to decode on an A1200 with some fastmem in the trapdoor slot. But enough of that

Karlos has uploaded a nice archive for me, and I must say that the lossy encoding he uses sounds quite good, actually

In other words, lossy encoding is now a serious option, rather than just a last resort.

Quote from: bubblebobble;563135

You could also encode the stereo channel with 4 bits; then you end up with 12 bits per stereo sample instead of 32 bits, not too bad.

I'll try that.

Quote from: bubblebobble;563135

You should also consider 24kHz, because of the integer ratio of downsampling. The downsampling in your case has the biggest quality impact, much more than ADPCM8 would harm your data.

I'm using a high quality algorithm from Sox on the peecee. Even when halving the sample rate, just taking the average may not be enough. I've used cheap methods, and they're bad

Quote from: bubblebobble;563135

All this of course depends on the actual audio data.
If this is sampled MOD music, produced with 8kHz samples in 8bit, all those assumptions might be wrong. I assume high-fidelity random pop music as you can hear on the radio.

The music is all the music from Final Fantasy 10, ripped to PSF format. This is the original, tracked audio data, and includes the player code from the game (!). PSF players 'simply' emulate Playstation 1 and 2 audio hardware and CPU (and various other bits, of course), producing the original sound.

Quote from: Karlos;563139

Depending on your algorithm, your loss error might be limited to the bits you can't replay anyway. Also, consider how human hearing works. For example, you can't perceive the same degree of error in a quiet sound immediately after a loud one.

I didn't know that. Very interesting

Quote from: Karlos;563139

Experiment, I say.

Absolutely, and I'm also not even remotely done with my lossless experiments, yet.

Quote from: Karlos;563142

Anyway, if you are curious, I have dug around my old HD and found it.

I've uploaded the codec binary for you to play with. It only imports/exports AIFF 16-bit (mono and stereo supported).

Thanks

Quote from: Karlos;563139

I've encoded a short section of 44.1kHz stereo music, provided the compressed version and the decoded version for your appraisal. The default encode options were used which IIRC are 4-bit, frame length 256. This gives a compression of about 3.5:1.

Again, thanks

Sounds good! I expected a lot worse, to be honest, and now that I've heard this kind of lossy compression, I must say that it has definitely become a serious option for me to consider. However, I do hear the difference, unfortunately, and that's without high end equipment, so I would need a solution for that.

Quote from: Karlos;563139

-edit-

I think the codec has been compiled with FPU support, which isn't used in the codec but may be used when interpreting the AIFF sample rate (which is stored as an 80-bit long float)

Oh, good to know, I don't have an FPU on my Blizzard '030. Guess I'll use WinUae, then, no problem. If I'm going to use this, then I have 1.67 gigabytes to encode, and this would take forever on my miggy anyway.

Karlos · « **Reply #20 on:** June 06, 2010, 05:01:34 PM »

The codec tool is very old. Not sure if I even implemented proper streaming to/from disk with it. When I find the source code (alas it wasn't in the same place as the bin), I'll put it up.

Run the codec without any parameters to see what options it takes.

-snd is to specify the aiff source for compression, target for decompression
-xdac is to specify xdac target for compression, source for decompression

-encode - pretty obvious, encodes the aiff to the xdac target (default is to decode)

-fsize to set the framesize. Default is 256 IIRC, think it maxes out at 1024. Longer frames give better compression, at the expense of quality.

-brate to set the maximum bitrate for encoding. This is not really a bitrate value in the mp3 sense but the maximum number of bits (thus delta table size) per frame to use.

Note that the compressor detects those cases in which there are fewer delta values to store than the current bit rate specifies and reduces those frames accordingly, with silence being compressed out altogether. Doesn't happen in music much, but is common in speech.

Quote

However, I do hear the difference, unfortunately, and that's without high end equipment, so I would need a solution for that.

Try encoding with -brate 5 and -fsize 128. That should produce better quality, at the expense of file size.

bubblebobble · « **Reply #21 on:** June 06, 2010, 06:06:35 PM »

I think you should first define how much memory you want to spend to store the music, and from that estimate the compression ratio that you need.
Given the compression ratio and the CPU power, we can evaluate what your options are.

A fact is: on average music, lossless compression will give you approximately 2:1. You cannot break this barrier, otherwise you would be a good candidate for the Nobel Prize in natural science. ;-)

An experience: lossy doesn't necessarily mean the result sounds worse than the original. Lossy just says that the data is not reproduced bit-identically, as is needed for exact data like executables. Many people like a moderate mpeg compression on audio, because a lot of "garbage" gets filtered out and the result is somewhat easier and more transparent to listen to.

I would always prefer ADPCM8 over lossless, because the difference is not audible to humans and the compression ratio is a predictable, fixed 2:1.

Thorham · « **Reply #22 on:** June 06, 2010, 06:46:43 PM »

@Karlos: Thanks for the explanation.

Quote from: Karlos;563150

Try encoding with -brate 5 and -fsize 128. That should produce better quality, at the expense of file size.

I'll try that.

Quote from: bubblebobble;563152

I think you should first define how much memory you want to spend to store the music, and from that estimate the compression ratio that you need.
Given the compression ratio and the CPU power, we can evaluate what your options are.

Okay, here goes:

There are 93 WAVs which use up 1.67 gigabytes. Many of them (80+) are looped, and thus have repeating data. The repeats may take up 25% to 50% of the data. It's probably less than 50%. The problem with this is that although the loops can be chopped off and done in software easily enough, it has to be done by hand, and for so many tracks this is a downright pain in the backside, and it's certainly something I don't want to have to do if it's avoidable.

I want to store them on a CD with a couple of megabytes to spare for code and graphics (one megabyte will probably be more than enough).
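
A quick back-of-the-envelope check of the ratio this implies. The disc capacities (650 MB and 700 MB) and the 2 MB reserve are assumptions, since the post does not give them.

```python
def required_ratio(source_mb, disc_mb, reserved_mb):
    """Compression ratio needed to fit the audio next to the reserved space."""
    return source_mb / (disc_mb - reserved_mb)


SOURCE_MB = 1.67 * 1024  # 1.67 gigabytes of WAV data
RATIO_650 = required_ratio(SOURCE_MB, 650, 2)  # assumed 650 MB disc
RATIO_700 = required_ratio(SOURCE_MB, 700, 2)  # assumed 700 MB disc
```

Under these assumptions the target lands at roughly 2.6:1 for a 650 MB disc and about 2.45:1 for a 700 MB disc, before trimming any repeated loop data.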

The CPU I'm working on is a 50 MHz '030, but the lowest target should be something like an A1200 with some fastmem in the trap door. Or, at max, a 28 MHz '020 board (a Blizzard, I believe).

It would be great if I could find the loop times somewhere, because then this would be a done deal.

Quote from: bubblebobble;563152

A fact is: on average music, lossless compression will give you approximately 2:1. You cannot break this barrier, otherwise you would be a good candidate for the Nobel Prize in natural science. ;-)

Don't you mean computer science?

Somehow I doubt nature has set this ratio to 2:1, though, and I like to believe it can be done, but that's just me :lol:

Quote from: bubblebobble;563152

An experience: lossy doesn't necessarily mean the result sounds worse than the original. Lossy just says that the data is not reproduced bit-identically, as is needed for exact data like executables.

That's a good point, I never considered that.

Quote from: bubblebobble;563152

I would always prefer ADPCM8 over lossless, because the difference is not audible to humans and the compression ratio is a predictable, fixed 2:1.

While ADPCM and similar lossy techniques are certainly an option now, ADPCM8's 2:1 ratio still isn't good enough, I'm afraid.

Karlos's method, however, might be the solution to this problem.

Karlos · « **Reply #23 on:** June 06, 2010, 08:10:50 PM »

I don't currently have the source code handy (I'll have to dig through a lot of backup cd's), but I do remember the technique well enough:

1) Choose a frame length and bit rate (say 256 samples / 4-bit for example)

2) For one complete frame of audio, transform the samples into a sequence of delta values, leaving the first sample as is (ie, in a mono stream with frame length 256, you now have 1 sample and 255 subsequent delta values). Another way of looking at it is that you have 256 delta values from 257 samples, where sample 0 had the value 0.

Note that for a stereo stream the source samples are usually interleaved, so keep that in mind when performing this step. Unless you plan to do some mid + side encoding, treat the channels separately.

3) Now find all the unique delta values for your frame and the popularity of each one. Don't include the first one here. My method simply did a qsort() and then walked through them counting duplicates as it went. Not particularly fast, but for encoding, who cares?

4) Use a reduction algorithm (I tried several) to find the best fit 2^N delta values for the above set, where N is your "bit rate".

5) Store the first delta value (which is the same as the first sample in the original frame) exactly (or pair of samples for a stereo stream) as 16-bit signed data.

6) Store these best fit delta values as 16-bit signed data. This is now your delta table with which to encode the rest of the frame.

7) Starting with your unblemished "start" sample, for each successive sample in the original frame, choose the delta value from your table that gets you nearest to that sample without clipping. Store the index of the used delta value as a bitfield, packing successive bitfields into 16-bit words.

Repeat from (7) until you've encoded the entire frame.
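
The encoder steps above can be sketched for a mono frame as follows. This is not the original codec: the reduction step simply keeps the most popular deltas as a stand-in for the unspecified "reduction algorithm", and all names are hypothetical.

```python
from collections import Counter

SAMPLE_MIN, SAMPLE_MAX = -32768, 32767


def choose_table(deltas, bits):
    """Stand-in reduction: keep the 2**bits most frequent delta values."""
    popular = Counter(deltas).most_common(1 << bits)
    return sorted(delta for delta, _count in popular)


def encode_frame(samples, bits=4):
    """Return (start sample, delta table, table indices) for one mono frame."""
    start = samples[0]
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    table = choose_table(deltas, bits)
    indices = []
    current = start  # tracks what the decoder will reconstruct
    for target in samples[1:]:
        candidates = [i for i, d in enumerate(table)
                      if SAMPLE_MIN <= current + d <= SAMPLE_MAX]
        if not candidates:
            raise ValueError("no table entry avoids clipping; "
                             "a real encoder needs a fallback here")
        best = min(candidates, key=lambda i: abs(current + table[i] - target))
        indices.append(best)
        current += table[best]
    return start, table, indices
```

Note that each choice is made against the reconstructed value rather than the original previous sample, so quantization errors do not accumulate unchecked.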

If I remember correctly, my compressed frame now looks something like this, assuming a mono source with 4-bit encoding:

Code:

word
000: [        frame header word         ]
001: [           start sample           ]
002: [        best fit delta  0         ]
003: [        best fit delta  1         ]
004: [        best fit delta  2         ]
                    ...
016: [        best fit delta 14         ]
017: [        best fit delta 15         ]
018: [ev  004][ev  003][ev  002][ev  001]
019: [ev  008][ev  007][ev  006][ev  005]
                    ...
081: [ empty ][ev  255][ev  254][ev  253]

ev N: encoded delta value for original sample N. Note we don't bother encoding the first (zeroth) sample as we already have it. Thus the last bitfield is always empty in a word aligned stream such as above. For 3/5-bit encoding, this may or may not always be true.

A stereo stream with the same frame length is encoded as follows:

Code:

word
000: [        frame header word         ]
001: [          start sample R          ]
002: [          start sample L          ]
003: [        best fit delta  0         ]
004: [        best fit delta  1         ]
005: [        best fit delta  2         ]
                    ...
017: [        best fit delta 14         ]
018: [        best fit delta 15         ]
019: [evL 002][evR 002][evL 001][evR 001]
020: [evL 004][evR 004][evL 003][evR 003]
                    ...
082: [ empty ][ empty ][evL 127][evR 127]

Notice that the encoder regards frame length as the total number of samples; it doesn't consider a stereo frame of length 256 as having 256 sample pairs.
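
The frame sizes implied by these layouts can be worked out with a little arithmetic. This sketch assumes one header word, one verbatim start sample per channel, a full 2^N entry table, and 16-bit words throughout; the function name is made up.

```python
def frame_words(frame_length, bits, channels=1):
    """Number of 16-bit words in one full-size compressed frame."""
    table_entries = 1 << bits
    fields = frame_length - channels      # start samples are stored verbatim
    per_word = 16 // bits
    field_words = -(-fields // per_word)  # ceiling division
    return 1 + channels + table_entries + field_words
```

For a 256-sample frame at 4 bits this gives 82 words for mono and 83 words for stereo, so about 3.1:1 for a full-size frame. The roughly 3.5:1 mentioned earlier in the thread is a little higher, which may reflect frames that shrink when fewer delta values are needed.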

Decoding the above data is so easy that even a vanilla 68000 can do it. Assuming you have a compressed frame in memory, you simply:

1) set a pointer into the best fit area
2) set your current sample value to the start value
3) write your current value to the output
4) extract the next ev bitfield from the compressed block
5) look up the delta value indexed by your value from (4)
6) add it to the current sample
7) repeat from 3 until the entire frame has been decoded.
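
The decode loop above translates almost line for line into code. A minimal mono sketch, assuming the first bitfield sits in the lowest bits of each word; the function and parameter names are illustrative only.

```python
def decode_frame(start, table, words, bits, count):
    """Rebuild `count` samples from a start value, delta table and packed words."""
    mask = (1 << bits) - 1
    per_word = 16 // bits
    current = start
    out = [current]                      # steps 2 and 3: emit the start value
    for word in words:
        for slot in range(per_word):
            if len(out) == count:
                return out
            index = (word >> (slot * bits)) & mask  # step 4: extract bitfield
            current += table[index]                 # steps 5 and 6: look up, add
            out.append(current)                     # step 3 again
    return out
```

Per sample this is one shift, one mask, one table read and one add, which is why even a plain 68000 can keep up.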

bubblebobble · « **Reply #24 on:** June 06, 2010, 08:26:40 PM »

Quote from: Thorham;563157

@Karlos: Thanks for the explanation.
It would be great if I could find the loop times somewhere, because then this would be a done deal.

You could write a tool that tries to find the loop points, but that would probably take longer than editing them manually.

Quote

Don't you mean computer science? Somehow I doubt nature has set this ratio to 2:1, though, and I like to believe it can be done, but that's just me :lol:

No, I do mean natural science. And unfortunately yes, nature has set this to 2:1. Without extra world knowledge, the entropy of an average music signal in the time domain is roughly 0.5, meaning 1 bit carries 0.5 bits of information.
You will never ever be able to compress better than 2:1. The sooner you accept this, the better for your precious spare time.
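
Anyone who wants to measure this on their own material can estimate the entropy of the delta stream. The sketch below computes a zeroth-order estimate only: it treats every symbol as independent, so it says what a simple memoryless coder could reach on that data, not what the best possible codec could do. The function name is made up.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(symbols):
    """Zeroth-order Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    # Add 0.0 so an all-identical (or empty) input reports 0.0, not -0.0.
    return 0.0 - sum((n / total) * math.log2(n / total)
                     for n in counts.values())
```

Dividing the result for 16-bit deltas by 16 gives a fraction comparable to the 0.5 figure quoted above.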

Check out this page:
http://wiki.hydrogenaudio.org/index.php?title=Lossless_comparison

Many wise men have worked on lossless codecs. Here are some examples:
FLAC       58.70%
WavPack    58.0%
TAK        57.0%
Monkey's   55.50%
OptimFROG  54.70%
ALAC       58.50%
WMA        56.30%

So don't fool yourself by thinking you can beat this.

Quote

While ADPCM and similar lossy techniques are certainly an option now, ADPCM8's 2:1 ratio still isn't good enough, I'm afraid. Karlos's method, however, might be the solution to this problem.

If you need more than 2:1, lossless is out of the game anyway. Lossy is your only option.

The best is mpeg, e.g. mp3 can easily reach 10:1 without significant degradation. With ADPCM, you could get ~3:1 I'd say (ADPCM8 for the mid channel, and ADPCM4 for the stereo channel). ADPCM is fast and easy to implement compared to mpeg.

Thorham · « **Reply #25 on:** June 06, 2010, 09:35:55 PM »

Quote from: Karlos;563163

I don't currently have the source code handy (I'll have to dig through a lot of backup cd's)

Oh no, don't search for it, I might not use it, and I much prefer a good explanation anyway. Usually, even when I don't end up using something, an explanation always contains interesting and useful ideas, and is thus much more enlightening than source code (where the source code is basically stripped to what's needed, and that's then used as is).

Quote from: Karlos;563163

1) Choose a frame length and bit rate (say 256 samples / 4-bit for example)

2) For one complete frame of audio, transform the samples into a sequence of delta values, leaving the first sample as is (ie, in a mono stream with frame length 256, you now have 1 sample and 255 subsequent delta values). Another way of looking at it is that you have 256 delta values from 257 samples, where sample 0 had the value 0.

Note that for a stereo stream the source samples are usually interleaved, so keep that in mind when performing this step. Unless you plan to do some mid + side encoding, treat the channels separately.

3) Now find all the unique delta values for your frame and the popularity of each one. Don't include the first one here. My method simply did a qsort() and then walked through them counting duplicates as it went. Not particularly fast, but for encoding, who cares?

4) Use a reduction algorithm (I tried several) to find the best fit 2^N delta values for the above set, where N is your "bit rate".

5) Store the first delta value (which is the same as the first sample in the original frame) exactly (or pair of samples for a stereo stream) as 16-bit signed data.

6) Store these best fit delta values as 16-bit signed data. This is now your delta table with which to encode the rest of the frame.

7) Starting with your unblemished "start" sample, for each successive sample in the original frame, choose the delta value from your table that gets you nearest to that sample without clipping. Store the index of the used delta value as a bitfield, packing successive bitfields into 16-bit words.

Repeat from (7) until you've encoded the entire frame.

If I remember correctly, my compressed frame now looks something like this, assuming a mono source with 4-bit encoding:

Code:

word
000: [        frame header word         ]
001: [           start sample           ]
002: [        best fit delta  0         ]
003: [        best fit delta  1         ]
004: [        best fit delta  2         ]
                    ...
016: [        best fit delta 14         ]
017: [        best fit delta 15         ]
018: [ev  004][ev  003][ev  002][ev  001]
019: [ev  008][ev  007][ev  006][ev  005]
                    ...
081: [ empty ][ev  255][ev  254][ev  253]

ev N: encoded delta value for original sample N. Note we don't bother encoding the first (zeroth) sample as we already have it. Thus the last bitfield is always empty in a word aligned stream such as above. For 3/5-bit encoding, this may or may not always be true.

A stereo stream with the same frame length is encoded as follows:

Code:

word
000: [        frame header word         ]
001: [          start sample R          ]
002: [          start sample L          ]
003: [        best fit delta  0         ]
004: [        best fit delta  1         ]
005: [        best fit delta  2         ]
                    ...
017: [        best fit delta 14         ]
018: [        best fit delta 15         ]
019: [evL 002][evR 002][evL 001][evR 001]
020: [evL 004][evR 004][evL 003][evR 003]
                    ...
082: [ empty ][ empty ][evL 127][evR 127]

Notice that the encoder regards frame length as the total number of samples; it doesn't consider a stereo frame of length 256 as having 256 sample pairs.

Decoding the above data is so easy that even a vanilla 68000 can do it. Assuming you have a compressed frame in memory, you simply:

1) set a pointer into the best fit area
2) set your current sample value to the start value
3) write your current value to the output
4) extract the next ev bitfield from the compressed block
5) look up the delta value indexed by your value from (4)
6) add it to the current sample
7) repeat from 3 until the entire frame has been decoded.

That's quite clear, and very interesting, thanks a tonne, much appreciated