Topic: Simple Amiga audio question.

Offline bubblebobble

  • Jr. Member
  • **
  • Join Date: Dec 2003
  • Posts: 66
    • http://www.hd-rec.de
Re: Simple Amiga audio question.
« Reply #14 on: June 06, 2010, 12:55:34 PM »
Lossless compression will always be too expensive for your purpose, since it always involves some kind of zip-style compression.

MP3 is too expensive as well; otherwise it would be the best choice.

What's left is only DPCM or ADPCM. Those are relatively cheap to decode, with linear complexity.
If you are calculating deltas, you are already using a predictor. The predictor says "the next sample will have the same value as the current one", which isn't a bad predictor at all. This already gives you 80% of the quality.
ADPCM is significantly better than DPCM, since the value range adapts to the current audio data. But I would say it is roughly 4 times more expensive to calculate.

If this is a game project, I would also consider using a lower sampling rate and maybe mono, because nobody would notice your nearly-lossless efforts anyway. 44100Hz/16bit/stereo is already quite expensive to squeeze through the Zorro slot without any decoding involved.
In 16kHz/8bit/mono you could also consider using MP3 and decoding it "offline" into a RAM buffer. If the songs are not too long, this should be affordable. 8bit isn't too bad either if you do proper dithering. (It's the same effect as with pictures: with dithering, even 12bit color looks acceptable.)
« Last Edit: June 06, 2010, 01:06:53 PM by bubblebobble »
--
Author of
HD-Rec, Sweeper, Samplemanager, ArTKanoid, Monkeyscript, Toadies, AsteroidsTR, TuiTED, PosTED, TKPlayer, AudioConverter, ScreenCam, PerlinFX, MapEdit, AB3 Includes and many more...
Homepage: http://www.hd-rec.de
 

Offline Thorham (Topic starter)

  • Hero Member
  • *****
  • Join Date: Oct 2009
  • Posts: 1149
« Reply #15 on: June 06, 2010, 01:57:02 PM »
Quote from: Karlos;563116
It supported, if I recall correctly, between 2-5 bit encoding. It was lossy but didn't sound too awful.
Cool, but I really want lossless encoding. Lossy encoding is simply not an option.
Quote from: Karlos;563116
Each compressed frame contained a 16-bit word that had some encoding flags, followed by a 16-bit signed sample (2 for stereo and so on) that comes unmodified from the input frame and serves as the starting point. After that, the next 4/8/16/32 16-bit words contain the best 4/8/16/32 delta values as determined by the encoder.

The remainder of the compressed frame was simply bitfield lookups into that table. Stereo data was interleaved IIRC. For 2-bit encoding, each 16-bit word contained 8 entries, 3-bit encoding stored 5 entries (LSB aligned), 4-bit encoding stored 4 entries, 5-bit stored 3 entries (LSB aligned).

The replay algorithm simply takes the start value and then extracts each field value, looks up the delta value from the table and adds it to the current value to recreate the next sample.

I am sure I still have the sources somewhere.
There are some interesting ideas in here that I may be able to use in my compressor, thanks :)
Quote from: bubblebobble;563122
Lossless compression will always be too expensive for your purpose, since it always involves some kind of zip-style compression.
Not necessarily. I'm first going to try the simplest of Huffman implementations. This is very cheap. Basically, when I calculate the deltas, and the deltas of the deltas, I simply store the sign separately, and make negative values positive.

This has the effect of creating easily compressed delta values. They're currently stored in the following way: I add two bits per delta. These bits tell how many bits the delta contains. The bit lengths are simply 4, 8, 12 and 16. This could be improved by setting those ranges in a better way. After this, a very simple implementation of Huffman encoding can be used.

The sign data is stored in 'bit toggled' form (each time a bit in the data is different from the previous bit, a one is written out; for repeating bits a zero is written out).

I'm hoping that those signs can be reasonably well compressed with a simple Huffman encoder.

All this is very cheap to decode, and should give reasonable compression rates.
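
A minimal sketch of the pre-Huffman stage described above, under stated assumptions: only first-order deltas are shown (the second-order "deltas of the deltas" can exceed 16 bits and are left out), the toggle coder assumes an initial previous bit of 0, and the names `split_signs`, `toggle_encode` and `length_class` are made up for illustration rather than taken from the actual compressor.

```python
def split_signs(deltas):
    """Separate each delta into a sign bit (1 = negative) and a magnitude."""
    signs = [1 if d < 0 else 0 for d in deltas]
    magnitudes = [abs(d) for d in deltas]
    return signs, magnitudes


def toggle_encode(bits, previous=0):
    """Emit 1 where a bit differs from the one before it, 0 where it repeats."""
    out = []
    for bit in bits:
        out.append(bit ^ previous)
        previous = bit
    return out


def length_class(magnitude, widths=(4, 8, 12, 16)):
    """Return (2-bit prefix, field width) for the smallest width that fits."""
    for prefix, width in enumerate(widths):
        if magnitude < (1 << width):
            return prefix, width
    raise ValueError("magnitude does not fit in 16 bits")


samples = [100, 103, 101, 401, 101, 101]
deltas = [b - a for a, b in zip(samples, samples[1:])]
signs, magnitudes = split_signs(deltas)
classes = [length_class(m) for m in magnitudes]
# Size before any Huffman stage: 2 prefix bits plus the field for each delta,
# plus one (toggled) sign bit per delta.
total_bits = sum(2 + width for _, width in classes) + len(signs)
```

Everything here is a table lookup, shift or XOR, which is consistent with the claim that decoding stays cheap; how well the toggled sign stream compresses depends on the audio.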
Quote from: bubblebobble;563122
If you are calculating deltas, you are already using a predictor. The predictor says "the next sample will have the same value as the current one", which isn't a bad predictor at all. This already gives you 80% of the quality.
Do you mean sound quality? If that's the case, then that isn't correct. The process I use is lossless. If that's not what you mean, then I don't understand what you mean :D
Quote from: bubblebobble;563122
If this is a game project, I would also consider using a lower sampling rate and maybe mono, because nobody would notice your nearly-lossless efforts anyway. 44100Hz/16bit/stereo is already quite expensive to squeeze through the Zorro slot without any decoding involved.

In 16kHz/8bit/mono you could also consider using MP3 and decoding it "offline" into a RAM buffer. If the songs are not too long, this should be affordable. 8bit isn't too bad either if you do proper dithering. (It's the same effect as with pictures: with dithering, even 12bit color looks acceptable.)
No, it's not a game project. It's for a music CD, something like an old-school music disk, but a lot more extensive.

That's the main reason for me wanting maximum quality. Also, there are already two lossy factors: the downsampling from 48kHz to 28kHz, and playing back on a miggy so that two bits are lost. I really hope I don't have to use lossy compression :(
« Last Edit: June 06, 2010, 01:59:31 PM by Thorham »
 

Offline bubblebobble

« Reply #16 on: June 06, 2010, 02:22:04 PM »
Again, lossless is expensive, and gives you 2:1 compression on average. If you are even considering a Huffman decoder and some nifty bit tricks, then you can afford ADPCM, which is cheaper. ADPCM8 can guarantee 2:1 compression and sounds almost as good as PCM16. Given your lo-fi conditions (Paula14/28kHz), the quality loss is absolutely negligible.
You could also encode the stereo channel with 4 bits; then you end up with 12 bits per stereo sample instead of 32 bits, which is not too bad.

You should also consider 24kHz, because of the integer downsampling ratio. The downsampling in your case has the biggest quality impact, much more than ADPCM8 would harm your data. All this of course depends on the actual audio data.
If this is sampled MOD music, produced with 8kHz samples in 8bit, all those assumptions might be wrong. I assume high-fidelity random pop music such as you hear on the radio.
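
A small sketch of the 12-bits-per-stereo-sample idea, assuming that "the stereo channel" means a side (left minus right) channel next to a mid channel. The transform below is the usual reversible integer mid/side split; the function names are hypothetical, and the ADPCM stage itself is not shown.

```python
def to_mid_side(left, right):
    """Reversible integer mid/side split of one stereo sample pair."""
    mid = (left + right) >> 1   # floor of the average
    side = left - right         # carries the bit that the floor dropped
    return mid, side


def from_mid_side(mid, side):
    """Exact inverse of to_mid_side."""
    # left + right and left - right share the same parity, so side & 1
    # restores the bit that was shifted out when computing mid.
    left = mid + ((side + (side & 1)) >> 1)
    right = left - side
    return left, right


# Bit budget per stereo sample: 8-bit codes for mid, 4-bit codes for side.
bits_per_pair = 8 + 4
ratio = 32 / bits_per_pair      # versus 2 x 16-bit PCM
```

With 12 bits against 32 the ratio comes out at about 2.67:1. Whether 4 bits is enough for the side channel depends on how wide the stereo image of the material is.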
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
« Reply #17 on: June 06, 2010, 03:01:25 PM »
Quote
Cool, but I really want lossless encoding. Lossy encoding is simply not an option.

Quote
Also, there are already two lossy factors: The downsampling from 48Khz to 28Khz and playing back on a miggy so that two bits are lost. I really hope I don't have to use lossy compression

Depending on your algorithm, your loss error might be limited to the bits you can't replay anyway. Also, consider how human hearing works. For example, you can't perceive the same degree of error in a quiet sound immediately after a loud one.

Experiment, I say.
« Last Edit: June 06, 2010, 03:03:33 PM by Karlos »
int p; // A
 

Offline Karlos

« Reply #18 on: June 06, 2010, 03:50:18 PM »
Anyway, if you are curious, I have dug around my old HD and found it.

I've uploaded the codec binary for you to play with. It only imports/exports AIFF 16-bit (mono and stereo supported).

I've encoded a short section of 44.1kHz stereo music, provided the compressed version and the decoded version for your appraisal. The default encode options were used which IIRC are 4-bit, frame length 256. This gives a compression of about 3.5:1.

Speech with properly gated silences can compress much better, since an entire silence frame can be encoded as a single word, more or less.

An interesting side effect of the codec is that it is "first time lossy" only. If you re-encode the decoded output, it will, except in very rare cases, produce the same compressed interpretation as the first pass of the original did.

If you do a waveform subtraction of the decoded from the original, you'll see what has been thrown away (and it is quite noticeable), yet it's a lot harder to perceive when just listening.

http://extropia.co.uk/_temp/xdac_codec.zip

-edit-

I think the codec has been compiled with FPU support, which isn't used in the codec but may be used when interpreting the AIFF sample rate (which is stored as an 80-bit long float)
« Last Edit: June 06, 2010, 04:05:40 PM by Karlos »
 

Offline Thorham (Topic starter)

« Reply #19 on: June 06, 2010, 04:39:21 PM »
Quote from: bubblebobble;563135
Again, lossless is expensive, and gives you 2:1 compression on average. If you are even considering a Huffman decoder and some nifty bit tricks, then you can afford ADPCM, which is cheaper. ADPCM8 can guarantee 2:1 compression and sounds almost as good as PCM16. Given your lo-fi conditions (Paula14/28kHz), the quality loss is absolutely negligible.
You sure like your lossy encoding :lol: Anyway, I've already said that my lossless encoder isn't very heavy, certainly fast enough to decode on an A1200 with some fastmem in the trapdoor slot. But enough of that ;) Karlos has uploaded a nice archive for me, and I must say that the lossy encoding he uses sounds quite good, actually :) In other words, lossy encoding is now a serious option, rather than just a last resort.
Quote from: bubblebobble;563135
You could also encode the stereo channel with 4bit, then you end up in 12 bits per stereo sample instead of 32bit, not too bad.
I'll try that.
Quote from: bubblebobble;563135
You should also consider 24kHz, because of the integer downsampling ratio. The downsampling in your case has the biggest quality impact, much more than ADPCM8 would harm your data.
I'm using a high quality algorithm from Sox on the peecee. Even when halving the sample rate, just taking the average may not be enough. I've used cheap methods, and they're bad :)
Quote from: bubblebobble;563135
All this of course depends on the actual audio data.
If this is sampled MOD music, produced with 8kHz samples in 8bit, all those assumptions might be wrong. I assume high-fidelity random pop music as you can hear in the radio.
The music is all the music from Final Fantasy 10, ripped to PSF format. This is the original, tracked audio data, and includes the player code from the game (!). PSF players 'simply' emulate Playstation 1 and 2 audio hardware and CPU (and various other bits, of course), producing the original sound.
Quote from: Karlos;563139
Depending on your algorithm, your loss error might be limited to the bits you can't replay anyway. Also, consider how human hearing works. For example, you can't perceive the same degree of error in a quiet sound immediately after a loud one.
I didn't know that. Very interesting :)
Quote from: Karlos;563139
Experiment, I say.
Absolutely, and I'm also not even remotely done with my lossless experiments, yet.
Quote from: Karlos;563142
Anyway, if you are curious, I have dug around my old HD and found it.

I've uploaded the codec binary for you to play with. It only imports/exports AIFF 16-bit (mono and stereo supported).
Thanks :)
Quote from: Karlos;563139
I've encoded a short section of 44.1kHz stereo music, provided the compressed version and the decoded version for your appraisal. The default encode options were used which IIRC are 4-bit, frame length 256. This gives a compression of about 3.5:1.
Again, thanks :) Sounds good! I expected a lot worse, to be honest, and now that I've heard this kind of lossy compression, I must say that it has definitely become a serious option for me to consider. However, I do hear the difference, unfortunately, and that's without high-end equipment, so I would need a solution for that.
Quote from: Karlos;563139
-edit-

I think the codec has been compiled with FPU support, which isn't used in the codec but may be used when interpreting the AIFF sample rate (which is stored as an 80-bit long float)
Oh, good to know, I don't have an FPU on my Blizzard '030. Guess I'll use WinUAE, then, no problem. If I'm going to use this, then I have 1.67 gigabytes to encode, and this would take forever on my miggy anyway.
 

Offline Karlos

« Reply #20 on: June 06, 2010, 05:01:34 PM »
The codec tool is very old. Not sure if I even implemented proper streaming to/from disk with it. When I find the source code (alas it wasn't in the same place as the bin), I'll put it up.

Run the codec without any parameters to see what options it takes.

-snd is to specify the aiff source for compression, target for decompression
-xdac is to specify xdac target for compression, source for decompression

-encode - pretty obvious, encodes the aiff to the xdac target (default is to decode)

-fsize to set the framesize. Default is 256 IIRC, think it maxes out at 1024. Longer frames give better compression, at the expense of quality.

-brate to set the maximum bitrate for encoding. This is not really a bitrate value in the mp3 sense but the maximum number of bits (thus delta table size) per frame to use.

Note that the compressor detects those cases in which there are fewer delta values to store than the current bit rate specifies and reduces those frames accordingly, with silence being compressed out altogether. This doesn't happen much in music, but is common in speech.

Quote
However, I do hear the difference, unfortunately, and that's without high end equipment, so I would need a solution for that.

Try encoding with -brate 5 and -fsize 128. That should produce better quality, at the expense of file size.
« Last Edit: June 06, 2010, 05:04:22 PM by Karlos »
 

Offline bubblebobble

« Reply #21 on: June 06, 2010, 06:06:35 PM »
I think you should first define how much memory you want to spend to store the music, and from that estimate the compression ratio that you need.
Given the compression ratio and the CPU power, we can evaluate what your options are.

A fact is: on average music, lossless compression will give you approximately 2:1. You cannot break this barrier, otherwise you would be a good candidate for the Nobel Prize in natural science. ;-)

An experience: lossy doesn't necessarily mean the result sounds worse than the original. Lossy just says that the data is not reproduced bit-identically, as one needs for exact data like executables. Many people like a moderate MPEG compression on audio, because a lot of "garbage" gets filtered out and the result is somewhat easier and more transparent to listen to.

I would always prefer ADPCM8 over lossless, because the difference is not audible to humans and the compression ratio is a predictable, fixed 2:1.
« Last Edit: June 06, 2010, 06:09:31 PM by bubblebobble »
 

Offline Thorham (Topic starter)

« Reply #22 on: June 06, 2010, 06:46:43 PM »
@Karlos: Thanks for the explanation.
Quote from: Karlos;563150
Try encoding with -brate 5 and -fsize 128. That should produce better quality, at the expense of file size.
I'll try that.

Quote from: bubblebobble;563152
I think you should first define how much memory you want to spend to store the music, and from that estimate the compression ratio that you need.
Given the compression ratio and the CPU power, we can evaluate what your options are.
Okay, here goes:

There are 93 WAVs which use up 1.67 gigabytes. Many of them (80+) are looped, and thus have repeating data. The repeats may take up 25% to 50% of the data. It's probably less than 50%. The problem with this is that although the loops can be chopped off and done in software easily enough, it has to be done by hand, and for so many tracks this is a downright pain in the backside, and it's certainly something I don't want to have to do if it's avoidable.

I want to store them on a CD with a couple of megabytes to spare for code and graphics (one megabyte will probably be more than enough).

The CPU I'm working on is a 50 MHz '030, but the lowest target should be something like an A1200 with some fastmem in the trapdoor. Or, at most, a 28 MHz '020 board (a Blizzard, I believe).

It would be great if I could find the loop times somewhere, because then this would be a done deal.
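
As a rough check on what those numbers imply, here is a small calculation of the compression ratio needed to fit the data on one disc. The disc capacities (650 and 700 MiB), the 2 MiB reserve and the reading of "gigabytes" as GiB are all assumptions, not figures from this thread.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def required_ratio(data_bytes, disc_bytes, reserved_bytes):
    """Compression ratio needed so the data fits in the space left on the disc."""
    return data_bytes / (disc_bytes - reserved_bytes)


data = 1.67 * GIB                   # size of the 93 WAVs (assumed to be GiB)
reserve = 2 * MIB                   # assumed room for code and graphics
ratio_650 = required_ratio(data, 650 * MIB, reserve)
ratio_700 = required_ratio(data, 700 * MIB, reserve)
print(f"650 MiB disc: {ratio_650:.2f}:1, 700 MiB disc: {ratio_700:.2f}:1")
```

Under these assumptions the target lands somewhere around 2.5:1 to 2.6:1, before any saving from trimming the repeated loop sections.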
Quote from: bubblebobble;563152
A fact is: on average music, lossless compression will give you approximately 2:1. You cannot break this barrier, otherwise you would be a good candidate for the Nobel Prize in natural science. ;-)
Don't you mean computer science ;) Somehow I doubt nature has set this ratio to 2:1, though, and I like to believe it can be done, but that's just me :lol:
Quote from: bubblebobble;563152
An experience: lossy doesn't necessarily mean the result sounds worse than the original. Lossy just says that the data is not reproduced bit-identically, as one needs for exact data like executables.
That's a good point, I never considered that.
Quote from: bubblebobble;563152
I would always prefer ADPCM8 over lossless, because the difference is not audible to humans and the compression ratio is a predictable, fixed 2:1.
While ADPCM and similar lossy techniques are certainly an option now, ADPCM8's 2:1 ratio still isn't good enough, I'm afraid :( Karlos's method, however, might be the solution to this problem.
« Last Edit: June 06, 2010, 06:49:07 PM by Thorham »
 

Offline Karlos

« Reply #23 on: June 06, 2010, 08:10:50 PM »
I don't currently have the source code handy (I'll have to dig through a lot of backup CDs), but I do remember the technique well enough:

1) Choose a frame length and bit rate (say 256 samples / 4-bit for example)

2) For one complete frame of audio, transform the samples into a sequence of delta values, leaving the first sample as is (i.e., in a mono stream with frame length 256, you now have 1 sample and 255 subsequent delta values). Another way of looking at it is that you have 256 delta values from 257 samples, where sample 0 had the value 0.

Note that for a stereo stream the source samples are usually interleaved, so remember that when performing this step. Unless you plan to do some mid + side encoding, treat the channels separately.

3) Now find all the unique delta values for your frame and the popularity of each one. Don't include the first one here. My method simply did a qsort() and then walked through them counting duplicates as it went. Not particularly fast, but for encoding, who cares?

4) Use a reduction algorithm (I tried several) to find the best fit 2^N delta values for the above set, where N is your "bit rate".

5) Store the first delta value (which is the same as the first sample in the original frame) exactly (or pair of samples for a stereo stream) as 16-bit signed data.

6) Store these best fit delta values as 16-bit signed data. This is now your delta table with which to encode the rest of the frame.

7) Starting with your unblemished "start" sample, for each successive sample in the original frame, choose the delta value from your table that gets you nearest to that sample without clipping. Store the index of the used delta value as a bitfield, packing successive bitfields into 16-bit words.

8) Repeat from (7) until you've encoded the entire frame.
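
The steps above can be sketched roughly as follows for a mono frame. This is not the original encoder: the table reduction here simply keeps the most frequent deltas, as a stand-in for the unspecified "reduction algorithm", and forcing a 0 entry into the table is an extra safety choice so that a non-clipping step always exists. The names `choose_delta_table` and `encode_frame` are hypothetical, and bit packing is left out.

```python
from collections import Counter


def choose_delta_table(deltas, bits):
    """Stand-in reduction: keep up to 2**bits of the most frequent deltas."""
    size = 1 << bits
    table = [d for d, _ in Counter(deltas).most_common(size)]
    if 0 not in table:
        # Assumption: always keep a zero step so one entry never clips.
        if len(table) < size:
            table.append(0)
        else:
            table[-1] = 0
    return sorted(table)


def encode_frame(samples, bits=4):
    """Return (start sample, delta table, list of table indices)."""
    start = samples[0]
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    table = choose_delta_table(deltas, bits)
    indices = []
    current = start                      # tracks what the decoder will hold
    for target in samples[1:]:
        candidates = [i for i, d in enumerate(table)
                      if -32768 <= current + d <= 32767]
        best = min(candidates, key=lambda i: abs(current + table[i] - target))
        indices.append(best)
        current += table[best]
    return start, table, indices


start, table, indices = encode_frame([0, 10, 20, 15, 15, 25])
```

Because each step is chosen against the decoder's running value rather than the original delta, quantisation errors do not pile up unchecked. A frame with no more than 2^N distinct deltas, like the toy one here, comes back exactly.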

If I remember correctly, my compressed frame now looks something like this, assuming a mono source with 4-bit encoding:

Code:
word
000: [        frame header word         ]
001: [           start sample           ]
002: [        best fit delta  0         ]
003: [        best fit delta  1         ]
004: [        best fit delta  2         ]
                    ...
016: [        best fit delta 14         ]
017: [        best fit delta 15         ]
018: [ev  004][ev  003][ev  002][ev  001]
019: [ev  008][ev  007][ev  006][ev  005]
                    ...
081: [ empty ][ev  255][ev  254][ev  253]

ev N: encoded delta value for original sample N. Note we don't bother encoding the first (zeroth) sample as we already have it. Thus the last bitfield is always empty in a word aligned stream such as above. For 3/5-bit encoding, this may or may not always be true.

A stereo stream with the same frame length is encoded as follows:

Code:
word
000: [        frame header word         ]
001: [          start sample R          ]
002: [          start sample L          ]
003: [        best fit delta  0         ]
004: [        best fit delta  1         ]
005: [        best fit delta  2         ]
                    ...
017: [        best fit delta 14         ]
018: [        best fit delta 15         ]
019: [evL 002][evR 002][evL 001][evR 001]
020: [evL 004][evR 004][evL 003][evR 003]
                    ...
082: [ empty ][ empty ][evR 127][evL 127]

Notice that the encoder regards frame length as the total number of samples; it doesn't consider a stereo frame of length 256 as having 256 sample pairs.
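
The word counts in the two layouts above can be reproduced with a short calculation. This is a sketch that assumes one header word, one verbatim start sample per channel, a full 2^N-entry delta table, and bitfields packed `16 // bits` to a word; `frame_words` is a made-up helper name.

```python
import math


def frame_words(frame_len, bits, channels=1):
    """16-bit words in one compressed frame for the layout sketched above."""
    fields_per_word = 16 // bits
    fields = frame_len - channels        # start samples are stored verbatim
    header = 1
    table = 1 << bits
    return header + channels + table + math.ceil(fields / fields_per_word)


mono = frame_words(256, 4)                # words 000..081
stereo = frame_words(256, 4, channels=2)  # words 000..082
print(mono, stereo, round(256 / mono, 2))
```

For full frames this gives a little over 3:1 at 4 bits and frame length 256. Frames that need fewer table entries would be smaller, which may be where a higher overall figure comes from.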


Decoding the above data is so easy that even a vanilla 68000 can do it. Assuming you have a compressed frame in memory, you simply:

1) set a pointer into the best fit area
2) set your current sample value to the start value
3) write your current value to the output
4) extract the next ev bitfield from the compressed block
5) look up the delta value indexed by your value from (4)
6) add it to the current sample
7) repeat from 3 until the entire frame has been decoded.
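
A minimal sketch of that decode loop for a mono frame. It assumes the header word has already been consumed and that the first encoded value sits in the lowest bits of each 16-bit word, which is one reading of the diagram above. `unpack_fields` and `decode_frame` are illustrative names, not the original routines.

```python
def unpack_fields(words, bits, count):
    """Extract `count` bitfields, lowest bits of each 16-bit word first."""
    per_word = 16 // bits
    mask = (1 << bits) - 1
    fields = []
    for word in words:
        for slot in range(per_word):
            if len(fields) == count:
                return fields
            fields.append((word >> (slot * bits)) & mask)
    return fields


def decode_frame(start, table, packed, bits, frame_len):
    """Rebuild frame_len samples from a start value, delta table and indices."""
    current = start
    out = [current]
    for index in unpack_fields(packed, bits, frame_len - 1):
        current += table[index]          # one lookup and one add per sample
        out.append(current)
    return out


# Table indices 2, 2, 0, 1, 2 packed as 4-bit fields into two words.
decoded = decode_frame(0, [-5, 0, 10], [0x1022, 0x0002], bits=4, frame_len=6)
```

Per output sample the work is a shift, a mask, a table lookup and an add, which fits the claim that a plain 68000 can keep up.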
« Last Edit: June 06, 2010, 08:27:42 PM by Karlos »
 

Offline bubblebobble

« Reply #24 on: June 06, 2010, 08:26:40 PM »
Quote from: Thorham;563157
@Karlos: Thanks for the explanation.
It would be great if I could find the loop times somewhere, because then this would be a done deal.
You could write a tool that tries to find the loop points, but that would probably take longer than editing them manually.

Quote
Don't you mean computer science ;) Somehow I doubt nature has set this ratio to 2:1, though, and I like to believe it can be done, but that's just me :lol:
No, I do mean natural science. And unfortunately yes, nature has set this to 2:1. Without extra world knowledge, the entropy of an average music signal in the time domain is roughly 0.5, meaning 1 bit carries 0.5 bit of information.
You will never ever be able to compress better than 2:1. The sooner you accept this, the better for your precious spare time.
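
For anyone who wants to measure this on their own material, here is a sketch of an order-0 entropy estimate over first-order deltas. It only bounds coders that treat each delta independently; a better predictor changes the symbol statistics, so this neither confirms nor refutes the 2:1 figure. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Order-0 Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def delta_entropy(samples):
    """Entropy of the first-order deltas of a sample sequence."""
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    return entropy_bits(deltas)


# Toy input; with real 16-bit audio, delta_entropy(samples) / 16 gives a
# rough estimate of the achievable size for this simple model.
print(delta_entropy([0, 1, 2, 3, 3, 3, 4, 5]))
```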

Check out this page:
http://wiki.hydrogenaudio.org/index.php?title=Lossless_comparison

Many wise men have worked on lossless codecs. Here are some examples:
FLAC         58.70%
WavPack      58.0%
TAK          57.0%
Monkey's     55.50%
OptimFROG    54.70%
ALAC         58.50%
WMA          56.30%

So don't fool yourself by thinking you can beat this.

Quote
While ADPCM and similar lossy techniques are certainly an option now, ADPCM8's 2:1 ratio still isn't good enough, I'm afraid :( Karlos's method, however, might be the solution to this problem.
If you need more than 2:1, lossless is out of the game anyway. Lossy is your only option.

The best is mpeg, e.g. mp3 can easily reach 10:1 without significant degradation. With ADPCM, you could get ~3:1 I'd say (ADPCM8 for the mid channel, and ADPCM4 for the stereo channel). ADPCM is fast and easy to implement compared to mpeg.
« Last Edit: June 06, 2010, 09:19:55 PM by bubblebobble »
 

Offline Thorham (Topic starter)

« Reply #25 on: June 06, 2010, 09:35:55 PM »
Quote from: Karlos;563163
I don't currently have the source code handy (I'll have to dig through a lot of backup CDs)
Oh no, don't search for it, I might not use it, and I much prefer a good explanation anyway. Usually, even when I don't end up using something, an explanation always contains interesting and useful ideas, and is thus much more enlightening than source code (where the source code is basically stripped to what's needed, and that's then used as is).
Quote from: Karlos;563163
1) Choose a frame length and bit rate (say 256 samples / 4-bit for example)

2) For one complete frame of audio, transform the samples into a sequence of delta values, leaving the first sample as is (i.e., in a mono stream with frame length 256, you now have 1 sample and 255 subsequent delta values). Another way of looking at it is that you have 256 delta values from 257 samples, where sample 0 had the value 0.

Note that for a stereo stream the source samples are usually interleaved, so remember that when performing this step. Unless you plan to do some mid + side encoding, treat the channels separately.

3) Now find all the unique delta values for your frame and the popularity of each one. Don't include the first one here. My method simply did a qsort() and then walked through them counting duplicates as it went. Not particularly fast, but for encoding, who cares?

4) Use a reduction algorithm (I tried several) to find the best fit 2^N delta values for the above set, where N is your "bit rate".

5) Store the first delta value (which is the same as the first sample in the original frame) exactly (or pair of samples for a stereo stream) as 16-bit signed data.

6) Store these best fit delta values as 16-bit signed data. This is now your delta table with which to encode the rest of the frame.

7) Starting with your unblemished "start" sample, for each successive sample in the original frame, choose the delta value from your table that gets you nearest to that sample without clipping. Store the index of the used delta value as a bitfield, packing successive bitfields into 16-bit words.

8) Repeat from (7) until you've encoded the entire frame.

If I remember correctly, my compressed frame now looks something like this, assuming a mono source with 4-bit encoding:

Code:

word
000: [        frame header word         ]
001: [           start sample           ]
002: [        best fit delta  0         ]
003: [        best fit delta  1         ]
004: [        best fit delta  2         ]
                    ...
016: [        best fit delta 14         ]
017: [        best fit delta 15         ]
018: [ev  004][ev  003][ev  002][ev  001]
019: [ev  008][ev  007][ev  006][ev  005]
                    ...
081: [ empty ][ev  255][ev  254][ev  253]


ev N: encoded delta value for original sample N. Note we don't bother encoding the first (zeroth) sample as we already have it. Thus the last bitfield is always empty in a word aligned stream such as above. For 3/5-bit encoding, this may or may not always be true.

A stereo stream with the same frame length is encoded as follows:

Code:

word
000: [        frame header word         ]
001: [          start sample R          ]
002: [          start sample L          ]
003: [        best fit delta  0         ]
004: [        best fit delta  1         ]
005: [        best fit delta  2         ]
                    ...
017: [        best fit delta 14         ]
018: [        best fit delta 15         ]
019: [evL 002][evR 002][evL 001][evR 001]
020: [evL 004][evR 004][evL 003][evR 003]
                    ...
082: [ empty ][ empty ][evR 127][evL 127]


Notice that the encoder regards frame length as the total number of samples; it doesn't consider a stereo frame of length 256 as having 256 sample pairs.


Decoding the above data is so easy that even a vanilla 68000 can do it. Assuming you have a compressed frame in memory, you simply:

1) set a pointer into the best fit area
2) set your current sample value to the start value
3) write your current value to the output
4) extract the next ev bitfield from the compressed block
5) look up the delta value indexed by your value from (4)
6) add it to the current sample
7) repeat from 3 until the entire frame has been decoded.
That's quite clear, and very interesting, thanks a tonne, much appreciated :)
Quote from: bubblebobble;563165
You could write a tool that tries to find the loop points, but that would probably take longer than editing them manually.
Yes, it would, even a quick and dirty one. But at least it would be much less boring, though ;)
Quote from: bubblebobble;563165
No, I do mean natural science. And unfortunately yes, nature has set this to 2:1. Without extra world knowledge, the entropy of an average music signal in the time domain is roughly 0.5, meaning 1 bit carries 0.5 bit of information.
Really? But doesn't entropy mostly apply to entropy coders?
Quote from: bubblebobble;563165
You will never ever be able to compress better than 2:1. The sooner you accept this, the better for your precious spare time.
I can never accept these things, and always have to challenge them :lol: And don't worry about my precious time, because I like spending my free time on things like this ;) Even when this sort of thing fails (which tends to happen most of the time, of course :lol:), I've still learned something. Going through these kinds of failures is better than simply taking someone's word for it (too easy) ;)
Quote from: bubblebobble;563165
So don't fool yourself by thinking you can beat this.
The point in trying is that people may have missed things. It happens. Also, if no one challenges existing methods, then in my opinion there's no progress. Although I'm certainly not kidding myself in believing that I can beat these ratios, I also have to say that I won't know until I try. While I probably won't beat them, half the fun is in trying :)
Quote from: bubblebobble;563165
If you need more than 2:1, lossless is out of the game anyway.
Probably ;)
Quote from: bubblebobble;563165
The best is mpeg, e.g. mp3 can easily reach 10:1 without significant degradation. With ADPCM, you could get 3:1 I'd say. ADPCM is fast and easy to implement compared to mpeg.
If I'm forced to use lossy compression, I think I'll first try Karlos's method. It seems easy enough to implement, so I'll experiment with that first, or better yet, compare it to ADPCM and see which produces the best quality at the right compression rate.
 

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • *****
  • Join Date: Nov 2002
  • Posts: 16867
  • Country: gb
  • Thanked: 4 times
Re: Simple Amiga audio question.
« Reply #26 on: June 06, 2010, 10:05:31 PM »
Regarding my old code versus ADPCM, ADPCM is probably better but is more expensive to decode and less fault tolerant.

I designed the above codec for a very specific purpose. I wanted a sound format that would allow a mixing engine to decode N streams of compressed audio straight from memory with as little CPU usage as possible. Using a frame-based mechanism helped in the following ways:

1) Mixing generally works by taking a "packet" of sound data and mixing it into a buffer. Having your sound in discrete chunks already facilitates this.

2) It's relatively cheap to apply a volume to the compressed data. In essence, you only do as many multiplications as you have start samples / delta values. By pre-multiplying (a copy of) these data by the desired volume, you save having to calculate the volume of every output sample.
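
A minimal sketch of point 2, assuming a hypothetical frame layout: one start sample, a 16-entry delta table, and one 4-bit table index per remaining sample (stored one index per byte here for clarity). The names `Frame`, `scale_frame` and `decode_frame` are made up for illustration and are not taken from the original source.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_SAMPLES 256   /* assumed frame length */
#define TABLE_SIZE    16    /* assumed: 4-bit codes index 16 deltas */

/* Hypothetical frame layout, for illustration only. */
typedef struct {
    int32_t start;                    /* first sample of the frame */
    int32_t delta[TABLE_SIZE];        /* per-frame delta table */
    uint8_t code[FRAME_SAMPLES - 1];  /* table indices, one per byte here */
} Frame;

/* Copy a frame with the start sample and delta table multiplied by volume:
   17 multiplications instead of one per output sample. */
static Frame scale_frame(const Frame *f, int32_t volume)
{
    Frame s = *f;
    s.start = f->start * volume;
    for (int i = 0; i < TABLE_SIZE; i++)
        s.delta[i] = f->delta[i] * volume;
    return s;
}

/* Decode: output the start sample, then accumulate looked-up deltas. */
static void decode_frame(const Frame *f, int32_t *out)
{
    int32_t acc = f->start;
    out[0] = acc;
    for (int i = 1; i < FRAME_SAMPLES; i++) {
        assert(f->code[i - 1] < TABLE_SIZE);
        acc += f->delta[f->code[i - 1]];
        out[i] = acc;
    }
}
```

Because decoding is just a running sum, decoding the scaled copy gives the same result as decoding the original and multiplying every output sample. With a fractional (fixed-point) volume the scaled deltas would probably need to keep their extra precision until after the accumulate, otherwise rounding could drift within a frame.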

On my 68040, hand-optimised decode routines for mono sound were arguably faster than replaying uncompressed audio. As silly as it sounds, it's true. The reason is that, for all the extra shift/add work we are doing (which is a tiny loop in reality and fits the cache even on an 020), we are doing far less memory reading for the amount of data we are spitting back out.
« Last Edit: June 06, 2010, 10:09:05 PM by Karlos »
int p; // A
 

Offline Karlos

Re: Simple Amiga audio question.
« Reply #27 on: June 07, 2010, 12:21:30 AM »
Quote from: Thorham;563173
If I'm forced to use lossy compression, I think I'll first try Karlos's method. It seems easy enough to implement, so I'll experiment with that first, or better yet, compare it to ADPCM and see which produces the best quality at the right compression rate.


One thing to consider with the method I used is the choice of reduction algorithm for finding the "best fit" delta table. For any given frame length/bit-depth, how you choose the values for this table will have the most obvious effect on quantization error and thus the overall quality.
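
To make the "reduction algorithm" question concrete, here is one deliberately naive option, sketched under assumptions: take all sample-to-sample deltas in a frame, sort them, and keep 16 evenly spaced quantiles. This is not the algorithm from the original encoder, and `build_delta_table` and `nearest_entry` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 16   /* assumed: 4-bit codes */

static int cmp_i32(const void *a, const void *b)
{
    int32_t x = *(const int32_t *)a, y = *(const int32_t *)b;
    return (x > y) - (x < y);
}

/* Fill table[] with 16 evenly spaced quantiles of the frame's deltas.
   n is the number of samples in the frame; a scratch copy is sorted. */
static void build_delta_table(const int16_t *pcm, int n,
                              int32_t table[TABLE_SIZE])
{
    assert(n >= 2);
    int m = n - 1;                       /* number of deltas */
    int32_t *d = malloc((size_t)m * sizeof *d);
    assert(d != NULL);
    for (int i = 0; i < m; i++)
        d[i] = (int32_t)pcm[i + 1] - pcm[i];
    qsort(d, (size_t)m, sizeof *d, cmp_i32);
    for (int k = 0; k < TABLE_SIZE; k++)
        table[k] = d[(long)k * (m - 1) / (TABLE_SIZE - 1)];
    free(d);
}

/* Index of the table entry closest to the wanted delta. */
static int nearest_entry(const int32_t table[TABLE_SIZE], int32_t want)
{
    int best = 0;
    for (int k = 1; k < TABLE_SIZE; k++)
        if (labs((long)table[k] - want) < labs((long)table[best] - want))
            best = k;
    return best;
}
```

Whatever table is chosen, an encoder would normally pick each code against the previously reconstructed sample rather than the original one, so that quantization error does not accumulate across the frame.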
 

Offline bubblebobble

Re: Simple Amiga audio question.
« Reply #28 on: June 07, 2010, 09:37:50 AM »
> Really? But doesn't entropy mostly apply to entropy coders?
Everybody has to obey the laws of nature, whether you want to or not.

> The point in trying is that people may have missed things.
You didn't get the point. It is a law of nature. Unless you are Q from Star Trek, you won't be able to change this. It is just not as intuitive as the apple that falls down from the tree, but it is the same thing.
Plus, it is quite pathetic to think that what hundreds of PhD-level researchers achieved over decades can be wiped away by a hobbyist in a few afternoon sessions without even understanding the fundamentals of information theory.

I'll post you the code for my ADPCM4 implementation soon. It should be fast enough for a vanilla A1200, and doesn't need a lot of stuff around it, just one function to encode and one to decode.
You can use my tool "AudioConverter" or Samplemanager to generate ADPCM files and listen to the result.
I am currently tuning some parameters to minimize the error, and adding stereo support.
Right now, ADPCM4 gives me an average error of ~300 quantisation steps of a 16bit sample. This is roughly like 8bit PCM, but the distribution of the errors is better.
If the material contains a lot of high frequencies, the errors go up, but are less audible.
If the material contains more low frequencies or is quieter, the error goes down. E.g. if your music fades out, there will be no audible noise like with 8bit PCM.
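
For anyone who wants to compare codecs the same way, a minimal sketch of such a measurement follows. It assumes "average error" means the mean absolute difference between original and decoded 16-bit samples; the post does not say which metric is actually used, and `mean_abs_error` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Mean absolute difference between original and decoded samples,
   expressed in 16-bit quantisation steps (assumed metric). */
static double mean_abs_error(const int16_t *orig, const int16_t *decoded,
                             size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)labs((long)orig[i] - (long)decoded[i]);
    return sum / (double)n;
}
```

Running this over the same material once for the decoded ADPCM output and once for a plain 8-bit version gives two numbers that can be compared directly, though a single average says nothing about how audible the error is.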

Karlos' algorithm follows a so-called "code-book based" approach.
From what I can see it has several drawbacks:

1. At 256/4bit it has more than 25% data overhead because he stores individual codebooks for each frame. This means that instead of 4:1, you will get ~3:1.
2. How to find the "best" delta representatives is not well defined and might need a lot of experimenting to find the optimal algorithm.
3. The encoder has a very high complexity because it needs to do vector quantisation. Luckily only for the encoder.
4. Doesn't make use of the assumption that the left and right channel of a stereo signal are correlated.
5. "Wastes" precious 4 bits in the 256/4bit case ;-)
6. The chosen deltas may cause an error of up to 4096 quantisation steps (i.e. quality drops to that of 4bit PCM) in the worst case. That is very unlikely, of course, but unlike in ADPCM the error is not correlated with the high frequencies, so it is not "masked".


> Regarding my old code versus ADPCM, ADPCM is probably better but is more expensive to decode and less fault tolerant.
Decoding is cheaper than ADPCM, yes, but why less fault tolerant? Because you have a "sync" sample at the beginning of a block? ADPCM can be reset every N samples too. It wouldn't even need an explicit sync value.
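
A rough sketch of what "reset every N samples" could look like. The step adaptation below is a toy scheme invented for this example; it is neither IMA ADPCM nor the ADPCM4 implementation mentioned above, and `BLOCK`, `State`, `decode_code` and `decode_stream` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK 64   /* assumed reset interval N */

typedef struct { int32_t pred; int32_t step; } State;

/* Known starting point shared by encoder and decoder: no sync value stored. */
static void reset_state(State *s) { s->pred = 0; s->step = 16; }

/* Toy 4-bit adaptive delta step: bit 3 = sign,
   bits 0-2 = magnitude in units of the current step size. */
static int16_t decode_code(State *s, uint8_t code)
{
    int32_t mag  = code & 7;
    int32_t diff = mag * s->step;
    s->pred += (code & 8) ? -diff : diff;
    if (s->pred > 32767)  s->pred = 32767;
    if (s->pred < -32768) s->pred = -32768;
    /* adapt: grow the step after large codes, shrink after small ones */
    if (mag >= 5 && s->step < 2048)   s->step *= 2;
    else if (mag <= 1 && s->step > 1) s->step /= 2;
    return (int16_t)s->pred;
}

/* Decode n codes (one per byte), resetting the state every BLOCK codes so
   that damage cannot propagate past the next block boundary. */
static void decode_stream(const uint8_t *codes, size_t n, int16_t *out)
{
    State s;
    reset_state(&s);
    for (size_t i = 0; i < n; i++) {
        if (i % BLOCK == 0)
            reset_state(&s);
        assert(codes[i] < 16);
        out[i] = decode_code(&s, codes[i]);
    }
}
```

With this arrangement a corrupted code can only disturb the rest of its own block; the next reset brings predictor and step size back to known values. The cost is that the encoder has to restart from the same state at every block boundary, which may hurt quality for the first few samples of each block.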
« Last Edit: June 07, 2010, 09:42:26 AM by bubblebobble »
 

Offline Karlos

Re: Simple Amiga audio question.
« Reply #29 on: June 07, 2010, 11:15:08 AM »
Quote from: bubblebobble;563238
Karlos' algorithm follows a so-called "code-book based" approach.
From what I can see it has several drawbacks:

1. At 256/4bit it has more than 25% data overhead because he stores individual codebooks for each frame. This means that instead of 4:1, you will get ~3:1.

It's actually ~3.5:1. If you download the zip file, you'll see for yourself.
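
For readers following the arithmetic, both figures can be reproduced under an assumed frame layout: one 16-bit start sample, a table of 16 deltas, and a 4-bit code for each of the remaining 255 samples. The width of a table entry is the unknown here; the thread does not state it, and `frame_ratio` is a hypothetical helper.

```c
#include <assert.h>

/* Compression ratio for one frame of 16-bit mono samples under an ASSUMED
   layout: one 16-bit start sample, a table of 2^bits deltas, and one
   bits-wide code for each remaining sample (rounded up to whole bytes). */
static double frame_ratio(int samples, int bits, int table_entry_bytes)
{
    assert(samples > 1 && bits > 0 && bits <= 8 && table_entry_bytes > 0);
    int raw   = samples * 2;                       /* uncompressed bytes */
    int table = (1 << bits) * table_entry_bytes;   /* per-frame codebook */
    int codes = ((samples - 1) * bits + 7) / 8;    /* packed code bytes */
    return (double)raw / (double)(2 + table + codes);
}
```

Under these assumptions `frame_ratio(256, 4, 2)` works out to 512/162, about 3.16:1, and `frame_ratio(256, 4, 1)` to 512/146, about 3.51:1. Which of these, if either, matches the real file format can only be checked against the files in the zip.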

Quote
2. How to find the "best" delta representatives is not well defined and might need a lot of experimenting to find the optimal algorithm.

Absolutely, which is why I wanted to dig out the source.

Quote
3. The encoder has a very high complexity because it needs to do vector quantisation. Luckily only for the encoder.

It isn't massively complex and it could certainly be implemented more simply than I did. Once it was working satisfactorily for my needs, I didn't bother improving it, since playback was my main concern.

Quote
4. Doesn't make use of the assumption that the left and right channel of a stereo signal are correlated.

That's not actually true. In the stereo case, there is still only one delta table, derived from both channels. The independent variation of left and right will produce a similar spread of delta values when there is a strong correlation between them. An advantage here is that the correlation of delta value spread isn't really affected by phase differences between the channels.

Experimentation with mid and side band encoding did not produce any real difference from a QSNR perspective.
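
For reference, a sketch of the common reversible integer mid/side transform (floor of the average plus the difference). This is an assumption about what such an experiment typically looks like, not necessarily the variant that was tried here; `ms_encode` and `ms_decode` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Reversible integer mid/side transform. Assumes two's complement ints. */
static void ms_encode(int16_t l, int16_t r, int32_t *mid, int32_t *side)
{
    int32_t sum = (int32_t)l + r;
    *side = (int32_t)l - r;
    *mid  = (sum - (sum & 1)) / 2;   /* floor(sum / 2), no negative shifts */
}

static void ms_decode(int32_t mid, int32_t side, int16_t *l, int16_t *r)
{
    /* sum and side always share parity, so side's low bit restores
       the bit that the floor in ms_encode dropped */
    int32_t sum = 2 * mid + (side & 1);
    assert(((sum + side) & 1) == 0);
    *l = (int16_t)((sum + side) / 2);
    *r = (int16_t)((sum - side) / 2);
}
```

The side channel needs one more bit of range than the inputs, so a delta coder fed with mid/side data sees a different value distribution than one fed with left/right, which may be part of why the gain can turn out to be small.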

Quote
5. "Wastes" precious 4 bits in the 256/4bit case ;-)

Yeah, you got me there. Of course, a modified algorithm would simply pack an extra source sample into that and live with odd sized frames. I just happened to require an arrangement that decompressed an even number of samples per frame.

Quote
6. The chosen deltas may cause an error of up to 4096 quantisation steps (i.e. quality drops to that of 4bit PCM) in the worst case. That is very unlikely, of course, but unlike in ADPCM the error is not correlated with the high frequencies, so it is not "masked".

> Regarding my old code versus ADPCM, ADPCM is probably better but is more expensive to decode and less fault tolerant.
Decoding is cheaper than ADPCM, yes, but why less fault tolerant? Because you have a "sync" sample at the beginning of a block? ADPCM can be reset every N samples too. It wouldn't even need an explicit sync value.

My experience with ADPCM decode was that corrupt data in the compressed stream can (but won't necessarily) knock the decode out permanently from that point onwards. Explicit audio frames mean that at most only the remaining samples in the current frame will be corrupted.