I don't currently have the source code handy (I'll have to dig through a lot of backup cd's), but I do remember the technique well enough:
1) Choose a frame length and bit rate (say 256 samples / 4-bit for example)
2) For one complete frame of audio, transform the samples into a sequence of delta values, leaving the first sample as is (ie, in a mono stream with frame length 256, you now have 1 sample and 255 subsequent delta values). Another way of looking at it is that you have 256 delta values from 257 samples, where sample 0 had the value 0.
Note that for a stereo stream, remember that the source samples are usually interleaved so remember that when performing this step. Unless you plan to do some mid + side encoding, treat them separately.
3) Now find all the
unique delta values for your frame and the popularity of each one. Don't include the first one here. My method simply did a qsort() and then walked through them counting duplicates as it went. Not particularly fast, but for encoding, who cares?
4) Use a reduction algorithm (I tried several) to find the
best fit 2^N delta values for the above set, where N is your "bit rate".
5) Store the first delta value (which is the same as the first sample in the original frame) exactly (or pair of samples for a stereo stream) as 16-bit signed data.
6) Store these
best fit delta values as 16-bit signed data. This is now your delta table with which to encode the rest of the frame.
7) Starting with your unblemished "start" sample, for each successive sample in the original frame, choose the delta value from your table that gets you nearest to that sample without clipping. Store the index of the used delta value as a bitfield, packing successive bitfields into 16-bit words.
Repeat from (7) until you've encoded the entire frame.
If I remember correctly, my compressed frame, now looks something like this, assuming a mono source with 4-bit encoding
word
000: [ frame header word ]
001: [ start sample ]
002: [ best fit delta 0 ]
003: [ best fit delta 1 ]
004: [ best fit delta 2 ]
...
016: [ best fit delta 14 ]
017: [ best fit delta 15 ]
018: [ev 004][ev 003][ev 002][ev 001]
019: [ev 008][ev 007][ev 006][ev 005]
...
081: [ empty ][ev 255][ev 254][ev 253]
ev N: encoded delta value for original sample N. Note we don't bother encoding the first (zeroth) sample as we already have it. Thus the last bitfield is always empty in a word aligned stream such as above. For 3/5-bit encoding, this may or may not always be true.
A stereo stream with the same frame length is encoded as follows:
word
000: [ frame header word ]
001: [ start sample R ]
002: [ start sample L ]
003: [ best fit delta 0 ]
004: [ best fit delta 1 ]
005: [ best fit delta 2 ]
...
017: [ best fit delta 14 ]
018: [ best fit delta 15 ]
019: [evL 002][evR 002][evL 001][evR 001]
020: [evL 004][evR 004][evL 003][evR 003]
...
021: [ empty ][ empty ][evR 127][evL 127]
Notice that the encoder regards frame length as total number of samples, it doesn't consider a stereo frame of length 256 as having 256 sample pairs.
Decoding the above data is so easy that even a vanilla 68000 can do it. Assuming you have a compressed frame in memory, you simply:
1) set a pointer into the best fit area
2) set your current sample value to the start value
3) write your current value to the output
4) extract the next ev bitfield from the compressed block
5) look up the delta value indexed by your value from (4)
6) add it to the current sample
7) repeat from 3 until the entire frame has been decoded.