Author Topic: Simple Amiga audio question.  (Read 17620 times)

Offline Karlos

  • Sockologist
  • Global Moderator
  • Hero Member
  • Join Date: Nov 2002
  • Posts: 16878
  • Country: gb
  • Thanked: 5 times
Re: Simple Amiga audio question.
« on: June 04, 2010, 08:33:44 PM »
Quote
So if you have good quality, clean 16 bit audio files without audio or filesize compression and you convert them to 14 bit and play them on the Amiga with the filters off, you'll have audio quality which exceeds most consumer device outputs, in my opinion.


No conversion to 14-bit is necessary. The 14-bit replay routines do all that anyway.
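For reference, the usual 14-bit trick pairs two Paula channels per output channel: one plays the top 8 bits at volume 64, the other plays the next 6 bits at volume 1 (i.e. 1/64 of full scale). A rough C sketch of the sample-splitting step (illustrative only, with my own naming, not the actual replay routines):

Code: [Select]
#include <stdint.h>
#include <stddef.h>

/* Split 16-bit samples into the two 8-bit channel buffers used by the
 * usual Paula 14-bit trick: hi[] is played at volume 64, lo[] at volume 1. */
void split_14bit(const int16_t *src, int8_t *hi, int8_t *lo, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int16_t s = src[i];
        hi[i] = (int8_t)(s >> 8);          /* top 8 bits */
        lo[i] = (int8_t)((s >> 2) & 0x3F); /* next 6 bits, weighted 1/64 by volume */
    }
}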

Offline Karlos

Re: Simple Amiga audio question.
« Reply #1 on: June 06, 2010, 11:45:13 AM »
I wrote a low-CPU-use audio compression format. Encoding was relatively expensive, though. Essentially, it split 16-bit audio into frames and found the best N delta values for each frame (N depending on the number of bits used to encode); that was the expensive bit.

It supported, if I recall correctly, between 2- and 5-bit encoding. It was lossy, but didn't sound too awful.

The idea was that each frame could be independently replayed.

Each compressed frame contained a 16-bit word holding some encoding flags, followed by a 16-bit signed sample (two for stereo, and so on) taken unmodified from the input frame, which serves as the starting point. After that, the next 4/8/16/32 16-bit words contain the best 4/8/16/32 delta values as determined by the encoder.

The remainder of the compressed frame was simply bitfield lookups into that table. Stereo data was interleaved, IIRC. For 2-bit encoding, each 16-bit word contained 8 entries; 3-bit encoding stored 5 entries (LSB aligned); 4-bit encoding stored 4 entries; and 5-bit encoding stored 3 entries (LSB aligned).

The replay algorithm simply takes the start value and then extracts each field value, looks up the delta value from the table and adds it to the current value to recreate the next sample.

I am sure I still have the sources somewhere.

Offline Karlos

Re: Simple Amiga audio question.
« Reply #2 on: June 06, 2010, 11:53:27 AM »
^ I never implemented it, but the replay routines were aimed at keeping the sample data compressed in memory and simply generating output directly to a mixing stage. In order to support volume, each frame would have its start sample and delta table attenuated to the current volume, resulting in an output stream at the correct level of attenuation.

This meant that the mixing routine only ever had to add the output sample values. Overall the aim was to avoid excessive multiplication or table lookup.
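To sketch that idea in C (illustrative structure and naming only, not the original code): only the start sample and the 16 table entries get multiplied, and the mix loop is pure addition.

Code: [Select]
#include <stdint.h>

#define TABLE_SIZE 16  /* 4-bit encoding => 16 delta entries per frame */

typedef struct {
    int16_t start;              /* start sample (mono case) */
    int16_t delta[TABLE_SIZE];  /* best-fit delta table for this frame */
} FrameHeader;

/* Pre-attenuate a copy of the frame header; volume is 0..64 as on Amiga. */
void attenuate_frame(const FrameHeader *in, FrameHeader *out, int volume)
{
    out->start = (int16_t)(((int32_t)in->start * volume) >> 6);
    for (int i = 0; i < TABLE_SIZE; ++i)
        out->delta[i] = (int16_t)(((int32_t)in->delta[i] * volume) >> 6);
}

/* The mixer then only ever adds the already-attenuated decoded samples. */
void mix_sample(int32_t *mixbuf, int index, int16_t decoded)
{
    mixbuf[index] += decoded;
}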

Offline Karlos

Re: Simple Amiga audio question.
« Reply #3 on: June 06, 2010, 03:01:25 PM »
Quote
Cool, but I really want lossless encoding. Lossy encoding is simply not an option.

Quote
Also, there are already two lossy factors: The downsampling from 48Khz to 28Khz and playing back on a miggy so that two bits are lost. I really hope I don't have to use lossy compression

Depending on your algorithm, your loss error might be limited to the bits you can't replay anyway. Also, consider how human hearing works. For example, you can't perceive the same degree of error in a quiet sound immediately after a loud one.

Experiment, I say.
« Last Edit: June 06, 2010, 03:03:33 PM by Karlos »

Offline Karlos

Re: Simple Amiga audio question.
« Reply #4 on: June 06, 2010, 03:50:18 PM »
Anyway, if you are curious, I have dug around my old HD and found it.

I've uploaded the codec binary for you to play with. It only imports/exports AIFF 16-bit (mono and stereo supported).

I've encoded a short section of 44.1 kHz stereo music and provided both the compressed version and the decoded version for your appraisal. The default encode options were used, which IIRC are 4-bit with a frame length of 256. This gives a compression ratio of about 3.5:1.

Speech, with properly gated silences, can compress much better, since an entire silent frame can be encoded as a single word, more or less.

An interesting side effect of the codec is that it is "first time lossy" only. If you re-encode the decoded output, it will, except in very rare cases, produce the same compressed result as the first pass over the original did.

If you do a waveform subtraction of the decoded from the original, you'll see what has been thrown away (and it is quite noticeable), yet it's a lot harder to perceive when just listening.

http://extropia.co.uk/_temp/xdac_codec.zip

-edit-

I think the codec has been compiled with FPU support, which isn't used in the codec itself but may be used when interpreting the AIFF sample rate (which is stored as an 80-bit extended-precision float).
« Last Edit: June 06, 2010, 04:05:40 PM by Karlos »

Offline Karlos

Re: Simple Amiga audio question.
« Reply #5 on: June 06, 2010, 05:01:34 PM »
The codec tool is very old. Not sure if I even implemented proper streaming to/from disk with it. When I find the source code (alas it wasn't in the same place as the bin), I'll put it up.

Run the codec without any parameters to see what options it takes.

-snd specifies the AIFF source for compression, or the target for decompression.

-xdac specifies the xdac target for compression, or the source for decompression.

-encode - pretty obvious, encodes the AIFF to the xdac target (the default is to decode).

-fsize sets the frame size. The default is 256 IIRC, and I think it maxes out at 1024. Longer frames give better compression, at the expense of quality.

-brate sets the maximum bit rate for encoding. This is not really a bit rate in the mp3 sense but the maximum number of bits per encoded delta (and thus the delta table size) to use for each frame.

Note that the compressor detects those cases in which there are fewer delta values to store than the current bit rate specifies and reduces those frames accordingly, with silence being compressed out altogether. That doesn't happen much in music, but it is common in speech.

Quote
However, I do hear the difference, unfortunately, and that's without high end equipment, so I would need a solution for that.

Try encoding with -brate 5 and -fsize 128. That should produce better quality, at the expense of file size.
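Purely as an illustration (I'm guessing at the executable's name here; substitute whatever the binary in the archive is actually called), an encode and decode round trip would look something like:

Code: [Select]
xdac -encode -snd input.aiff -xdac output.xdac -brate 5 -fsize 128
xdac -xdac output.xdac -snd decoded.aiff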
« Last Edit: June 06, 2010, 05:04:22 PM by Karlos »

Offline Karlos

Re: Simple Amiga audio question.
« Reply #6 on: June 06, 2010, 08:10:50 PM »
I don't currently have the source code handy (I'll have to dig through a lot of backup CDs), but I do remember the technique well enough:

1) Choose a frame length and bit rate (say 256 samples / 4-bit for example)

2) For one complete frame of audio, transform the samples into a sequence of delta values, leaving the first sample as-is (i.e., in a mono stream with frame length 256, you now have 1 sample and 255 subsequent delta values). Another way of looking at it is that you have 256 delta values from 257 samples, where sample 0 had the value 0.

Note that for a stereo stream the source samples are usually interleaved, so bear that in mind when performing this step. Unless you plan to do some mid + side encoding, treat the channels separately.

3) Now find all the unique delta values for your frame and the popularity of each one. Don't include the first one here. My method simply did a qsort() and then walked through them counting duplicates as it went. Not particularly fast, but for encoding, who cares?

4) Use a reduction algorithm (I tried several) to find the best fit 2^N delta values for the above set, where N is your "bit rate".

5) Store the first sample of the original frame (or the first pair of samples for a stereo stream) exactly, as 16-bit signed data. This is the same as the first delta value from step 2.

6) Store these best fit delta values as 16-bit signed data. This is now your delta table with which to encode the rest of the frame.

7) Starting with your unblemished "start" sample, for each successive sample in the original frame, choose the delta value from your table that gets you nearest to that sample without clipping. Store the index of the used delta value as a bitfield, packing successive bitfields into 16-bit words.

8) Repeat from (7) until you've encoded the entire frame.

If I remember correctly, my compressed frame now looks something like this, assuming a mono source with 4-bit encoding:

Code: [Select]
word
000: [        frame header word         ]
001: [           start sample           ]
002: [        best fit delta  0         ]
003: [        best fit delta  1         ]
004: [        best fit delta  2         ]
                    ...
016: [        best fit delta 14         ]
017: [        best fit delta 15         ]
018: [ev  004][ev  003][ev  002][ev  001]
019: [ev  008][ev  007][ev  006][ev  005]
                    ...
081: [ empty ][ev  255][ev  254][ev  253]

ev N: encoded delta value for original sample N. Note we don't bother encoding the first (zeroth) sample as we already have it. Thus the last bitfield is always empty in a word aligned stream such as above. For 3/5-bit encoding, this may or may not always be true.
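Putting steps 2-7 and the layout above together, a rough mono-only encoder sketch in C might look like this (all names are mine, not the original source; choose_best_deltas() stands in for whichever reduction algorithm you use for step 4):

Code: [Select]
#include <stdint.h>

#define FRAME_LEN  256
#define BITS       4
#define TABLE_SIZE (1 << BITS)            /* 16 delta entries */

/* Step 4 placeholder: pick the best TABLE_SIZE deltas for this frame. */
void choose_best_deltas(const int16_t *deltas, int count, int16_t table[TABLE_SIZE]);

/* Encode one mono frame of FRAME_LEN samples into out[]; returns words written. */
int encode_frame(const int16_t *samples, uint16_t *out)
{
    int16_t deltas[FRAME_LEN - 1];
    int16_t table[TABLE_SIZE];
    int w = 0;

    /* Step 2: delta transform, leaving sample 0 as-is. */
    for (int i = 1; i < FRAME_LEN; ++i)
        deltas[i - 1] = (int16_t)(samples[i] - samples[i - 1]);

    /* Steps 3-4: reduce the observed deltas to a 16-entry table. */
    choose_best_deltas(deltas, FRAME_LEN - 1, table);

    /* Steps 5-6: header word, start sample, delta table. */
    out[w++] = 0;                          /* frame header/flags (sketch only) */
    out[w++] = (uint16_t)samples[0];       /* start sample, stored exactly */
    for (int i = 0; i < TABLE_SIZE; ++i)
        out[w++] = (uint16_t)table[i];

    /* Step 7: track the *reconstructed* value, since that is what the
     * decoder will accumulate, and pick the nearest non-clipping delta. */
    int32_t current = samples[0];
    uint16_t packed = 0;
    int fields = 0;

    for (int i = 1; i < FRAME_LEN; ++i) {
        int best = 0;
        int32_t best_err = INT32_MAX;
        for (int k = 0; k < TABLE_SIZE; ++k) {
            int32_t next = current + table[k];
            if (next > 32767 || next < -32768)     /* would clip */
                continue;
            int32_t err = next > samples[i] ? next - samples[i] : samples[i] - next;
            if (err < best_err) { best_err = err; best = k; }
        }
        current += table[best];
        packed |= (uint16_t)(best << (BITS * fields));  /* LSB-first packing */
        if (++fields == 16 / BITS) {
            out[w++] = packed;
            packed = 0;
            fields = 0;
        }
    }
    if (fields)                       /* flush the final, partly empty word */
        out[w++] = packed;

    return w;                         /* 82 words for this configuration */
}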

A stereo stream with the same frame length is encoded as follows:

Code: [Select]
word
000: [        frame header word         ]
001: [          start sample R          ]
002: [          start sample L          ]
003: [        best fit delta  0         ]
004: [        best fit delta  1         ]
005: [        best fit delta  2         ]
                    ...
017: [        best fit delta 14         ]
018: [        best fit delta 15         ]
019: [evL 002][evR 002][evL 001][evR 001]
020: [evL 004][evR 004][evL 003][evR 003]
                    ...
082: [ empty ][ empty ][evL 127][evR 127]

Notice that the encoder regards frame length as the total number of samples; it doesn't consider a stereo frame of length 256 as having 256 sample pairs.


Decoding the above data is so easy that even a vanilla 68000 can do it. Assuming you have a compressed frame in memory, you simply (see the sketch after this list):

1) set a pointer into the best fit area
2) set your current sample value to the start value
3) write your current value to the output
4) extract the next ev bitfield from the compressed block
5) look up the delta value indexed by the field extracted in (4)
6) add it to the current sample
7) repeat from (3) until the entire frame has been decoded.
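In C, the whole thing is little more than this (a sketch of the loop above for the mono 4-bit layout, with my own naming rather than the original 68k code):

Code: [Select]
#include <stdint.h>

#define FRAME_LEN  256
#define BITS       4
#define TABLE_SIZE (1 << BITS)

void decode_frame(const uint16_t *frame, int16_t *out)
{
    /* Word 0 is the header, word 1 the start sample, words 2..17 the table. */
    const int16_t  *table = (const int16_t *)&frame[2];
    const uint16_t *ev    = &frame[2 + TABLE_SIZE];

    int32_t current = (int16_t)frame[1];     /* step 2: current = start value */
    out[0] = (int16_t)current;               /* step 3 */

    uint16_t packed = 0;
    int fields = 0;

    for (int i = 1; i < FRAME_LEN; ++i) {
        if (fields == 0)
            packed = *ev++;                   /* fetch the next packed word */
        int idx = packed & (TABLE_SIZE - 1);  /* step 4: extract the bitfield */
        packed >>= BITS;
        fields = (fields + 1) % (16 / BITS);
        current += table[idx];                /* steps 5-6: look up and add */
        out[i] = (int16_t)current;            /* step 3 again for the next sample */
    }
}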
« Last Edit: June 06, 2010, 08:27:42 PM by Karlos »

Offline Karlos

Re: Simple Amiga audio question.
« Reply #7 on: June 06, 2010, 10:05:31 PM »
Regarding my old code versus ADPCM, ADPCM is probably better but is more expensive to decode and less fault tolerant.

I designed the above codec for a very specific purpose. I wanted a sound format that would allow a mixing engine to decode N streams of compressed audio straight from memory with as little CPU usage as possible. Using a frame-based mechanism helped in the following ways:

1) Mixing generally works on taking a "packet" of sound data and mixing it into a buffer. Having your sound in discrete chunks already facilitates this.

2) It's relatively cheap to apply a volume to the compressed data. In essence, you only do as many multiplications as you have start samples / delta values. By pre-multiplying (a copy of) these data by the desired volume, you save having to calculate the volume of every output sample.

On my 68040, hand-optimised decode routines for mono sound were arguably faster than replaying uncompressed audio. As silly as it sounds, it's true. The reason is that for all the extra shift/add work we are doing (which is a tiny loop in reality and fits in the cache even on an 020), we are doing far less memory reading for the amount of data we are spitting back out.
« Last Edit: June 06, 2010, 10:09:05 PM by Karlos »

Offline Karlos

Re: Simple Amiga audio question.
« Reply #8 on: June 07, 2010, 12:21:30 AM »
Quote from: Thorham;563173
If I'm forced to use lossy compression, I think I'll first try Karlos's method. Seem easy enough to implement, so I'll experiment with that first, or better yet, compare it to ADPCM and see wich produces the best quality at the right compression rate.


One thing to consider with the method I used is the choice of reduction algorithm for finding the "best fit" delta table. For any given frame length/bit-depth, how you choose the values for this table will have the most obvious effect on quantization error and thus the overall quality.
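Purely as an illustration of one possible reduction (not necessarily what my encoder did), a weighted k-means style refinement over the sorted unique delta values and their occurrence counts from step 3 of the earlier post would do the job:

Code: [Select]
#include <stdint.h>
#include <stdlib.h>

#define TABLE_SIZE 16

/* One possible reduction: weighted k-means over the unique deltas (sorted,
 * as produced by the qsort pass) and their occurrence counts. */
void reduce_deltas(const int16_t *uniq, const int *count, int n,
                   int16_t table[TABLE_SIZE])
{
    /* Seed by spreading entries evenly across the sorted unique values. */
    for (int k = 0; k < TABLE_SIZE; ++k)
        table[k] = uniq[(long)k * (n - 1) / (TABLE_SIZE - 1)];

    for (int iter = 0; iter < 32; ++iter) {
        long sum[TABLE_SIZE]    = {0};
        long weight[TABLE_SIZE] = {0};

        /* Assign each unique delta to its nearest table entry. */
        for (int i = 0; i < n; ++i) {
            int best = 0;
            int best_d = 1 << 30;
            for (int k = 0; k < TABLE_SIZE; ++k) {
                int d = abs((int)uniq[i] - table[k]);
                if (d < best_d) { best_d = d; best = k; }
            }
            sum[best]    += (long)uniq[i] * count[i];
            weight[best] += count[i];
        }
        /* Move each entry to the weighted mean of its cluster. */
        for (int k = 0; k < TABLE_SIZE; ++k)
            if (weight[k])
                table[k] = (int16_t)(sum[k] / weight[k]);
    }
}

How you seed and weight something like this makes an audible difference, which is exactly the point above.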

Offline Karlos

Re: Simple Amiga audio question.
« Reply #9 on: June 07, 2010, 11:15:08 AM »
Quote from: bubblebobble;563238
Karlos' algorithm follows a so called "code-book based" approach.
From what I can see it has several drawbacks:

1. On 256/4bit is has more than 25% data overhead because he stores individual codebooks for each frame. Means instead of 4:1, you will get ~3:1.

It's actually ~3.5:1. If you download the zip file, you'll see for yourself.

Quote
2. How to find the "best" delta representants is not well defined and might need a lot of experimenting to find the optimal algoritm.

Absolutely, which is why I wanted to dig out the source.

Quote
3. The encoder has a very high complexity because it needs to do vector quantisation. Luckily only for the encoder.

It isn't massively complex and it could certainly be implemented more simply than I did. Once it was working satisfactorily for my needs, I didn't bother improving it, since playback was my main concern.

Quote
4. Doesn't make use of the assumption that the left and right channel of a stereo signal are correlated.

That's not actually true. In the stereo case, there is still only one delta table, derived from both channels. The independent variation of left and right will produce a similar spread of delta values when there is a strong correlation between them. An advantage here is that the correlation of delta value spread isn't really affected by phase differences between the channels.

Experimentation with mid and side band encoding did not produce any real difference from a QSNR perspective.

Quote
5. "Wastes" precious 4 bits in the 256/4bit case ;-)

Yeah, you got me there. Of course, a modified algorithm would simply pack an extra source sample into that and live with odd sized frames. I just happened to require an arrangement that decompressed an even number of samples per frame.

Quote
6. The choosen Deltas may cause an error of up to 4096 quantisation steps (=reduces to 4bit PCM quality) in the worst case. However, very unlikely of course, but unlike in ADPCM, the error is not correlated with the high frequencies, so the error is not "masked".

> Regarding my old code versus ADPCM, ADPCM is probably better but is more expensive to decode and less fault tolerant.
Decoding is cheaper than ADPCM yes, but why less fault tolerent? Because you have a "sync" Sample at the beginning of a block? ADPCM can be "resetted" every N samples too. It wouldn't even need an explicit sync value.

My experience with ADPCM decode was that corrupt data in the compressed stream can (but won't necessarily) knock the decode out permanently from that point onwards. Explicit audio frames mean that at most only the remaining samples in the current frame will be corrupted.

Offline Karlos

Re: Simple Amiga audio question.
« Reply #10 on: June 07, 2010, 12:04:04 PM »
Perhaps the title should be changed to "(Not quite so) Simple..." :)

Offline Karlos

Re: Simple Amiga audio question.
« Reply #11 on: June 07, 2010, 12:31:16 PM »
Quote from: bubblebobble;563256
@Karlos

What about a competition? ;-)

Let Thorham choose a short sample as uncompressed 16bit .wav (lets say 10 secs).
(maybe with some music, fading, voice, sound effect etc.)
We can both compress and uncompress it again, and let him decide what sounds better?
(as a proof, we both must provide the compressed file too, of course).


LOL!

I don't care enough really. I am sure ADPCM is capable of better compression/quality than my method, but you are welcome to try if it amuses you to do so :D

The only reason I mentioned it is that he was asking for a low-CPU-usage playback routine and I happened to have one I'd made earlier that will run on a 7 MHz 68000 (though the codec application is foolishly compiled for 020/FPU) :)

Offline Karlos

Re: Simple Amiga audio question.
« Reply #12 on: June 07, 2010, 12:50:21 PM »
Quote from: bubblebobble;563253
original datasize =
samples*sizeof(word) =
256 * 2 =
512 Bytes

compressed datasize =
samples*sizeof(4bit) + table*sizeof(word) + startsample*sizeof(word) + header =
(256*0.5) + (16*2) + 2 + 2 =
164 Bytes

=> ratio = 512/164 = 3.1219...



I might have lied about stereo streams. Sample pairs may actually be treated as single entities, in which case 256 sample pairs are compressed per frame and the raw data is 256*2*2 bytes per frame. I wrote it in 1996 or so, so my memory is slightly hazy on the details. However:

Code: [Select]

-rwxr-xr-x 1 karlos karlos 1763382 2010-06-06 15:36 test_decoded.aiff
-rwxr-xr-x 1 karlos karlos  506328 2010-06-06 15:36 test.xdac


That's ~3.48:1 based on file size alone.
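For what it's worth, the sample-pairs-per-frame interpretation lines up with that figure, assuming the same one-word frame header as the mono layout:

Code: [Select]
original frame   = 256 pairs * 2 channels * 2 bytes                 = 1024 bytes
compressed frame = header (2) + start L/R (2*2) + table (16*2)
                 + encoded fields (255 pairs * 2 * 4 bits, word aligned)
                 = 2 + 4 + 32 + 256                                  =  294 bytes

=> ratio = 1024/294 = ~3.48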

Also consider that the algorithm will happily use fewer than N bits per frame if it finds it can. That doesn't happen much in music, but speech is a different matter.

Until I dig out the source, it's hard to say.

Offline Karlos

Re: Simple Amiga audio question.
« Reply #13 on: August 06, 2010, 06:42:10 PM »
Quote from: Thorham;573668
One more thought: If one could do software channel separation and sound recognition/reconstruction, I'm sure the entropy of music would be better than 0.5, but just try to write it :roflmao:


If by channel you mean left/right, then many encoders already do take advantage of the correlation between left and right to increase compression*.

If by channel you mean frequency band, then basically this is what mp3 and similar systems use.

*The encoder I wrote doesn't do mid/side band analysis. The reason for this is that the method of encoding tries to pick the best n deltas per frame of audio, using both channels anyway. As a consequence, when the channels are correlated (that is to say, they have a similar range of delta values, regardless of the overall phase difference between the left/right signals), the encoder picks values equally suitable for each channel. When one channel has much more variation than the other (for instance, a loud sound in one and a quiet one in the other), the louder channel dominates the pool of chosen delta values. The result is that the channel with the most going on is always given a bigger share of the available "bits" than the other.

Offline Karlos

Re: Simple Amiga audio question.
« Reply #14 on: August 06, 2010, 09:14:05 PM »
Quote from: Thorham;573688
With a tracked format you can easily beat 1:2 lossless compression. A MOD recorded to WAV and compressed with, say, FLAC would be much bigger than the original MOD.

This is why I don't believe that the 1:2 ratio is dictated by nature, as bubblebobble claims. People simply don't know how to do it yet, because WAV to tracked format conversion isn't exactly easy to write, especially not without neural algorithms.


Ok, now I get you. Theoretically, assuming you could perform the required convolutions to identify the individual components, this might work for a piece of music where each "channel" is played by some easily identified oscillator function (square wave, sine wave, triangle wave). You could then translate to a format that describes how those oscillators are played over time (pitch, volume, pan) in order to be able to reproduce it using those same oscillator functions. What you have there is something like your SMF (Standard MIDI File) format.

However, real music produced on real instruments does not follow this paradigm. An individual instrument can produce a near infinite variation in tonal quality depending on how it is played, never mind the complexities added by production effects.

There are only two real analyses you can do with a recorded waveform in PCM format:

1) Analyse the sample data in the time domain.
2) Perform a discrete FFT to convert it into the frequency domain and analyse that.

In either case, you can perform an additional analysis based on the known effects of human audio perception and head-related transfer functions.