Author Topic: Simple Amiga audio question. (Read 17586 times)

Thorham · « **on:** June 04, 2010, 07:17:48 PM »

Hi,

If I take a 16 bit audio file, and chop off the bottom two bits from each sample, it seems that the sound quality (when playing back in 14 bit on the miggy) isn't reduced. What I want to know is simple: Is there really no audible difference, or is there still some?

Any thoughts are appreciated

Thorham · « **Reply #1 on:** June 04, 2010, 08:55:28 PM »

Quote from: johnklos;562892

I used to use my Amiga 1200 for ALL audio, and I've had it connected to very nice, very accurate speakers and a very clean amplifier at a time when one of my roommates was an audiophile.

That's cool, that will certainly sound great

Quote from: johnklos;562892

14 bit on the Amiga really doesn't make a very audible difference.

Sorry, but I've asked the question in an unclear way

What I'm really asking is if a 14 bit player actually chops off bits 0 and 1 of each sample. I should've been more clear here

Quote from: johnklos;562892

Classical, for instance, will die from audio compression.

Yeah, I know, just try Classical internet radio stations. The music itself is still very enjoyable, but the quality certainly isn't optimal.

Quote from: johnklos;562892

So if you have good quality, clean 16 bit audio files without audio or filesize compression and you convert them to 14 bit and play them on the Amiga with the filters off, you'll have audio quality which exceeds most consumer device outputs, in my opinion.

Cool

This is about a little project of mine. I'm using 16 bit, 48 kHz stereo WAVs which I've downsampled to 28 kHz using SoX, so the quality is quite good.

Quote from: johnklos;562892

There are other factors - make sure your power supply is clean, and make sure you have good grounds so you don't need a ground loop isolator (which affects the quality of the sound), et cetera. An Amiga can sound quite good!

Oh man, everything here is ungrounded

Quote from: Karlos;562902

No conversion to 14-bit is necessary. The 14-bit replay routines do all that anyway.

Ah, yes, that's what I wanted to hear, thanks

The bit chopping I'm doing is for a lossless compression algorithm I'm writing. Of course bit chopping isn't lossless, but the resulting data is meant to be replayed on an Amiga in 14 bit, so effectively it should be lossless.
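For illustration, chopping the bottom two bits can be sketched as below. This is only a sketch: the function name `chop_to_14_bit` is made up for the example, and it assumes each sample is a plain integer in the signed 16-bit range.

```python
def chop_to_14_bit(sample):
    """Clear bits 0 and 1 of a signed 16-bit sample, keeping the top 14 bits."""
    if not -32768 <= sample <= 32767:
        raise ValueError("sample out of signed 16-bit range")
    # Python treats negative ints as two's complement for bitwise operations,
    # so this rounds down (towards minus infinity) to a multiple of 4.
    return sample & ~3


samples = [0, 1, 2, 3, 4, -1, -4, -5, 32767, -32768]
chopped = [chop_to_14_bit(s) for s in samples]
print(chopped)
```

Every chopped value is a multiple of 4, so a compressor only needs to store `sample >> 2` and can shift back up on replay.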

Nice to have this and some other things cleared up, thanks guys

Thorham · « **Reply #2 on:** June 06, 2010, 08:02:54 AM »

Quote from: platon42;562918

No, not if the output uses the calibration table. In this case, the conversion is not linear and all the 16 bits are used for mapping the input to the 2 x 8 bit output channels (one at full volume, the other at volume 1).

Perhaps the table includes the bit reduction?
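For comparison, the plain (uncalibrated) way of splitting a sample over two 8-bit channels can be sketched like this. It assumes an idealised linear mix, with one channel at volume 64 and the other at volume 1; the function names are invented for the example, and a calibrated player as described in the quote would use a lookup table instead.

```python
def split_for_14_bit(sample):
    """Split a signed 16-bit sample into (high, low) for two 8-bit channels.

    high: signed top 8 bits, for the channel at volume 64.
    low:  the next 6 bits as 0..63, for the channel at volume 1.
    Bits 0 and 1 of the input are dropped.
    """
    high = sample >> 8            # arithmetic shift, gives -128..127
    low = (sample & 0xFF) >> 2    # gives 0..63
    return high, low


def mixed_output(high, low):
    """Idealised sum of both channels (assumes perfectly linear volumes)."""
    return high * 64 + low * 1


print(split_for_14_bit(-1), mixed_output(*split_for_14_bit(-1)))
```

In this idealised model the mixed output always equals `sample >> 2`, which is the 14-bit value with bits 0 and 1 removed.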

Quote from: koshman;562922

I think johnklos was referring more to the so-called "loudness war" in this case - i.e. severely reducing (compressing) dynamics by evening out the volume levels of different sections of a track (http://www.youtube.com/watch?v=xkkqsN69Jac&feature=fvst). Not that classical music would be the usual victim...

Ugh, that's a pretty bad practice (and I'm not even an audiophile)

Quote from: bubblebobble;562934

Reducing bit resolution doesn't do anything to the frequency range, meaning bass and treble are not affected.
2 bits less raise the quantisation noise about 4 times. Nothing more, but nothing less. Depending on the audio material, this can become audible of course, especially in very quiet passages. For "loud" regular pop music, this won't be very audible though. If the dynamic range is well used, quantisation noise starts to be audible roughly at 12 bits.

Very interesting
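The "about 4 times" figure follows from the quantisation step doubling with every bit removed. A minimal sketch of that arithmetic (the function name is made up for the example):

```python
import math


def noise_increase(bits_removed):
    """Return (amplitude factor, decibels) by which the quantisation step grows."""
    factor = 2 ** bits_removed
    return factor, 20 * math.log10(factor)


factor, db = noise_increase(2)
print(factor, round(db, 2))
```

So going from 16 to 14 bits raises the noise floor by a factor of 4 in amplitude, which is roughly 12 dB.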

Quote from: bubblebobble;562934

On the other hand, if you want to compress the audio data, removing 2 bits of 16 is ridiculous.

It's for an audio compressor I'm writing for a little project of mine, the bit chopping isn't used by itself.

Quote from: bubblebobble;562934

Better use 8 or 4 bit ADPCM; the 8-bit variant will sound almost like 16 bits, and halves the data size. 4 bit is good for voices, but can get problematic for complex music with a lot of high frequencies, where it starts to crackle a bit.

ADPCM is a last resort in case I can't fit all the music on a single CD. I really want lossless 14 bit compression.

Thorham · « **Reply #3 on:** June 06, 2010, 09:17:26 AM »

Quote from: bubblebobble;563105

Just to clarify: by "compressing" you mean data compression, not dynamic range compression as a DSP effect?

Yes, I mean data compression. Adding audio effects is absolutely out of the question, even if it improves compressibility.

Quote from: bubblebobble;563105

Lossless compression almost always works with a predictor

Yes, but I've been unable to come up with a reasonable predictor (it's really easy to make a bad one :lol:), so I'm experimenting with the deltas of the samples, or, to be more accurate, the deltas between the deltas. I'll try those predictors again, but I just wanted to try something else after failing initially.
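To make "deltas between the deltas" concrete, here is a small sketch. The helper names are invented for the example, and it works on plain integers, so it ignores the fact that a delta of 16-bit samples can need 17 bits and a delta of deltas 18 bits.

```python
def deltas(values):
    """Keep the first value as is; every later entry becomes value minus predecessor."""
    out = []
    prev = 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out


def undo_deltas(diffs):
    """Inverse of deltas(): a running sum."""
    out = []
    total = 0
    for d in diffs:
        total += d
        out.append(total)
    return out


samples = [100, 104, 110, 118, 128, 140]
first = deltas(samples)     # [100, 4, 6, 8, 10, 12]
second = deltas(first)      # [100, -96, 2, 2, 2, 2]
restored = undo_deltas(undo_deltas(second))
print(second, restored == samples)
```

On a smoothly changing signal the second-order values are small and repetitive, which is what makes them attractive for an entropy coder, and the round trip is exact.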

Quote from: bubblebobble;563105

DPCM or ADPCM is not lossless.

Yep, I know, hence the 'last resort' remark

This might be a last resort because I'm trying to fit 1.67 gigs of audio on a CD with a few megs to spare (for code and graphics) :lol:

Anyway, I want to store the audio without any loss, and while mp3 could be useful, it's too heavy for the target hardware (as low as possible).

Thorham · « **Reply #4 on:** June 06, 2010, 11:09:25 AM »

Quote from: Piru;563110

http://flac.sourceforge.net/

You're unlikely to beat flac with your custom "predictors".

Sorry, but FLAC isn't an option, because it's too heavy. Someone is doing a FLAC decoder for Amiga, and while it's quite optimized, it still uses about 90% CPU time on a 50 MHz '030. This is sadly much too heavy. Not a problem for me personally, but I want this to work on lesser CPUs than that.

Also, I'm not using predictors at all right now (tried something quick, and failed :lol:), just deltas

Thorham · « **Reply #5 on:** June 06, 2010, 01:57:02 PM »

Quote from: Karlos;563116

It supported, if I recall correctly, between 2-5 bit encoding. It was lossy but didn't sound too awful.

Cool, but I really want lossless encoding. Lossy encoding is simply not an option.

Quote from: Karlos;563116

Each compressed frame contained a 16-bit word that had some encoding flags, followed by a 16-bit signed sample (2 for stereo and so on) that comes unmodified from the input frame and serves as the starting point. After that, the next 4/8/16/32 16-bit words contain the best 4/8/16/32 delta values as determined by the encoder.

The remainder of the compressed frame was simply bitfield lookups into that table. Stereo data was interleaved IIRC. For 2-bit encoding, each 16-bit word contained 8 entries, 3-bit encoding stored 5 entries (LSB aligned), 4-bit encoding stored 4 entries, 5-bit stored 3 entries (LSB aligned).

The replay algorithm simply takes the start value and then extracts each field value, looks up the delta value from the table and adds it to the current value to recreate the next sample.

I am sure I still have the sources somewhere.

There are some interesting ideas in here that I may be able to use in my compressor, thanks
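The entries-per-word figures in the quote (8, 5, 4 and 3) are simply how many whole fields fit in 16 bits. A rough sketch of that packing, assuming "LSB aligned" means the first entry sits in the lowest bits and any leftover bits at the top stay zero (the function names are made up):

```python
def fields_per_word(bits):
    """How many whole 'bits'-wide fields fit into one 16-bit word."""
    return 16 // bits


def pack_word(indices, bits):
    """Pack table indices into one 16-bit word, first index in the lowest bits."""
    if len(indices) > fields_per_word(bits):
        raise ValueError("too many fields for one word")
    word = 0
    for position, index in enumerate(indices):
        if not 0 <= index < (1 << bits):
            raise ValueError("index does not fit in the field")
        word |= index << (position * bits)
    return word


def unpack_word(word, bits, count):
    """Extract 'count' fields again, lowest bits first."""
    mask = (1 << bits) - 1
    return [(word >> (position * bits)) & mask for position in range(count)]


print([fields_per_word(b) for b in (2, 3, 4, 5)])
print(hex(pack_word([1, 2, 3, 4], 4)))
```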

Quote from: bubblebobble;563122

Lossless compression will always be too expensive for your purpose, since it always involves some kind of zip compression.

Not necessarily. I'm first going to try the simplest of Huffman implementations. This is very cheap. Basically, when I calculate the deltas, and the deltas of the deltas, I simply store the sign separately, and make negative values positive.

This has the effect of creating easily compressed delta values. They're currently stored in the following way: I add two bits per delta. These bits tell how many bits the delta contains. The bit lengths are simply 4, 8, 12 and 16. This could be improved by setting those ranges in a better way. After this, a very simple implementation of Huffman encoding can be used.

The sign data is stored in 'bit toggled' form (each time a bit in the data is different than the previous bit, a one is written out, for repeating bits a zero is written out).

I'm hoping that those signs can be reasonably well compressed with a simple Huffman encoder.

All this is very cheap to decode, and should give reasonable compression rates.
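As a rough sketch of the scheme described above: the helper names are invented, the mapping of the two prefix bits to 4, 8, 12 and 16 is an assumption, and so is the starting state of 0 for the sign toggling.

```python
def split_sign(delta):
    """Return (sign_bit, magnitude); sign_bit is 1 for negative values."""
    return (1 if delta < 0 else 0), abs(delta)


def length_class(magnitude):
    """Two-bit prefix 0..3 selecting a 4, 8, 12 or 16 bit field for the magnitude."""
    for prefix, width in enumerate((4, 8, 12, 16)):
        if magnitude < (1 << width):
            return prefix, width
    raise ValueError("magnitude needs more than 16 bits")


def toggle_encode(bits):
    """Write 1 where a bit differs from the previous one, 0 where it repeats."""
    out = []
    prev = 0  # assumed starting state
    for b in bits:
        out.append(b ^ prev)
        prev = b
    return out


def toggle_decode(toggles):
    """Inverse of toggle_encode(): flip the running bit on every 1."""
    out = []
    prev = 0
    for t in toggles:
        prev ^= t
        out.append(prev)
    return out


signs = [0, 0, 1, 1, 1, 0]
print(split_sign(-300), length_class(300), toggle_encode(signs))
```

Long runs of equal signs turn into long runs of zeros, which is the property a simple Huffman or run-length stage can exploit.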

Quote from: bubblebobble;563122

If you are calculating deltas, you are already using a predictor. The predictor says "the next sample will have the same value as the current one". Which isn't a bad predictor at all. This gives you already 80% of the quality.

Do you mean sound quality? If that's the case, then that isn't correct. The process I use is lossless. In case it's not, then I don't understand what you mean

Quote from: bubblebobble;563122

If this is a game project, I would also consider using a lower sampling rate and maybe mono, because nobody would recognize your nearly-lossless efforts anyway. 44100Hz/16bit/stereo is already quite expensive to squeeze through the Zorro slot without any decoding involved.

In 16kHz/8bit/mono you could also consider using mp3, and decode it "offline" into a RAM buffer. If the songs are not too long, this should be affordable. 8bit isn't too bad either if you do proper dithering. (same effect as with pictures, when doing dithering even 12bit color looks acceptable)

No, it's not a game project. It's for a music CD, something like an old-school music disk, but a lot more extensive.

That's the main reason for me wanting maximum quality. Also, there are already two lossy factors: The downsampling from 48 kHz to 28 kHz and playing back on a miggy so that two bits are lost. I really hope I don't have to use lossy compression

Thorham · « **Reply #6 on:** June 06, 2010, 04:39:21 PM »

Quote from: bubblebobble;563135

Again, lossless is expensive, and gives you on average 1:2 compression. If you even consider a Huffman decoder and some nifty bit tricks, then you can afford ADPCM, which is cheaper. ADPCM8 can guarantee 1:2 compression and sounds almost as good as PCM16. Given your lo-fi conditions (Paula14/28kHz), the quality loss is absolutely negligible.

You sure like your lossy encoding :lol: Anyway, I've already said that my lossless encoder isn't very heavy, certainly fast enough to decode on an A1200 with some fastmem in the trapdoor slot. But enough of that

Karlos has uploaded a nice archive for me, and I must say that the lossy encoding he uses sounds quite good, actually

In other words, lossy encoding is now a serious option, rather than just a last resort.

Quote from: bubblebobble;563135

You could also encode the stereo channel with 4bit, then you end up in 12 bits per stereo sample instead of 32bit, not too bad.

I'll try that.

Quote from: bubblebobble;563135

You should also consider 24kHz, because of the integer ratio of downsampling. The downsampling in your case has the biggest quality impact, much more than ADPCM8 would harm to your data.

I'm using a high quality algorithm from Sox on the peecee. Even when halving the sample rate, just taking the average may not be enough. I've used cheap methods, and they're bad

Quote from: bubblebobble;563135

All this of course depends on the actual audio data.
If this is sampled MOD music, produced with 8kHz samples in 8bit, all those assumptions might be wrong. I assume high-fidelity random pop music as you can hear in the radio.

The music is all the music from Final Fantasy 10, ripped to PSF format. This is the original, tracked audio data, and includes the player code from the game (!). PSF players 'simply' emulate Playstation 1 and 2 audio hardware and CPU (and various other bits, of course), producing the original sound.

Quote from: Karlos;563139

Depending on your algorithm, your loss error might be limited to the bits you can't replay anyway. Also, consider how human hearing works. For example, you can't perceive the same degree of error in a quiet sound immediately after a loud one.

I didn't know that. Very interesting

Quote from: Karlos;563139

Experiment, I say.

Absolutely, and I'm also not even remotely done with my lossless experiments, yet.

Quote from: Karlos;563142

Anyway, if you are curious, I have dug around my old HD and found it.

I've uploaded the codec binary for you to play with. It only imports/exports AIFF 16-bit (mono and stereo supported).

Thanks

Quote from: Karlos;563139

I've encoded a short section of 44.1kHz stereo music, provided the compressed version and the decoded version for your appraisal. The default encode options were used which IIRC are 4-bit, frame length 256. This gives a compression of about 3.5:1.

Again, thanks

Sounds good! I expected a lot worse, to be honest, and now that I've heard this kind of lossy compression, I must say that it has definitely become a serious option to consider for me. However, I do hear the difference, unfortunately, and that's without high end equipment, so I would need a solution for that.

Quote from: Karlos;563139

-edit-

I think the codec has been compiled with FPU support, which isn't used in the codec but may be used when interpreting the AIFF sample rate (which is stored as an 80-bit long float)

Oh, good to know, I don't have an FPU on my Blizzard '030. Guess I'll use WinUae, then, no problem. If I'm going to use this, then I have 1.67 gigabytes to encode, and this would take forever on my miggy anyway.

Thorham · « **Reply #7 on:** June 06, 2010, 06:46:43 PM »

@Karlos: Thanks for the explanation.

Quote from: Karlos;563150

Try encoding with -brate 5 and -fsize 128. That should produce better quality, at the expense of file size.

I'll try that.

Quote from: bubblebobble;563152

I think you should first define how much memory you want to spend to store the music, and from that estimate the compression ratio that you need.
Given the compression ratio and the CPU power, we can evaluate what your options are.

Okay, here goes:

There are 93 WAVs which use up 1.67 gigabytes. Many of them (80+) are looped, and thus have repeating data. The repeats may take up 25% to 50% of the data. It's probably less than 50%. The problem with this is that although the loops can be chopped off and done in software easily enough, it has to be done by hand, and for so many tracks this is a downright pain in the backside, and it's certainly something I don't want to have to do if it's avoidable.

I want to store them on a CD with a couple of megabytes to spare for code and graphics (one megabyte will probably be more than enough).
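A quick back-of-the-envelope check of the compression ratio this implies. The CD capacities of 650 MB and 700 MB are assumptions (the post doesn't say which disc size is meant), as is treating a gigabyte as 1024 MB and reserving 2 MB for code and graphics.

```python
def required_ratio(audio_mb, disc_mb, reserved_mb):
    """Compression ratio needed to fit the audio next to the reserved space."""
    return audio_mb / (disc_mb - reserved_mb)


audio_mb = 1.67 * 1024                       # 1.67 gigabytes of WAV data
ratio_650 = required_ratio(audio_mb, 650, 2)
ratio_700 = required_ratio(audio_mb, 700, 2)
print(round(ratio_650, 2), round(ratio_700, 2))
```

Under these assumptions the target lands somewhere around 2.5:1 to 2.6:1 before any loop trimming, so a scheme that tops out at 2:1 would not be enough on its own.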

The CPU I'm working on is a 50 MHz '030, but the lowest target should be something like an A1200 with some fastmem in the trap door. Or, at max, a 28 MHz '020 board (a Blizzard, I believe).

It would be great if I could find the loop times somewhere, because then this would be a done deal.

Quote from: bubblebobble;563152

A fact is: on average music, lossless compression will give you approximately 2:1. You cannot break this barrier, otherwise you would be a good candidate for the Nobel Prize in natural science. ;-)

Don't you mean computer science

Somehow I doubt nature has set this ratio to 2:1, though, and I like to believe it can be done, but that's just me :lol:

Quote from: bubblebobble;563152

An experience: lossy doesn't necessarily mean the result sounds worse than the original. Lossy just means that the data is not reproduced bit-identically, as one needs for exact data like executables.

That's a good point, I never considered that.

Quote from: bubblebobble;563152

I would always prefer ADPCM8 over lossless, because the difference is not audible to humans and the compression ratio is a predictable, fixed 2:1.

While ADPCM and similar lossy techniques are certainly an option now, ADPCM8's 2:1 ratio still isn't good enough, I'm afraid

Karlos's method, however, might be the solution to this problem.

Thorham · « **Reply #8 on:** June 06, 2010, 09:35:55 PM »

Quote from: Karlos;563163

I don't currently have the source code handy (I'll have to dig through a lot of backup cd's)

Oh no, don't search for it, I might not use it, and I much prefer a good explanation anyway. Usually, even when I don't end up using something, an explanation always contains interesting and useful ideas, and is thus much more enlightening than source code (where the source code is basically stripped to what's needed, and that's then used as is).

Quote from: Karlos;563163

1) Choose a frame length and bit rate (say 256 samples / 4-bit for example)

2) For one complete frame of audio, transform the samples into a sequence of delta values, leaving the first sample as is (ie, in a mono stream with frame length 256, you now have 1 sample and 255 subsequent delta values). Another way of looking at it is that you have 256 delta values from 257 samples, where sample 0 had the value 0.

Note that for a stereo stream, the source samples are usually interleaved, so remember that when performing this step. Unless you plan to do some mid + side encoding, treat the channels separately.

3) Now find all the unique delta values for your frame and the popularity of each one. Don't include the first one here. My method simply did a qsort() and then walked through them counting duplicates as it went. Not particularly fast, but for encoding, who cares?

4) Use a reduction algorithm (I tried several) to find the best fit 2^N delta values for the above set, where N is your "bit rate".

5) Store the first delta value (which is the same as the first sample in the original frame) exactly (or pair of samples for a stereo stream) as 16-bit signed data.

6) Store these best fit delta values as 16-bit signed data. This is now your delta table with which to encode the rest of the frame.

7) Starting with your unblemished "start" sample, for each successive sample in the original frame, choose the delta value from your table that gets you nearest to that sample without clipping. Store the index of the used delta value as a bitfield, packing successive bitfields into 16-bit words.

Repeat from (7) until you've encoded the entire frame.
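The encoding steps above can be sketched for a mono frame as follows. This is only an illustration, not Karlos's code: the reduction step in `choose_table` simply keeps the most frequent deltas, which is a stand-in because the post doesn't say which reduction algorithm was used, and all names are invented. Bit packing of the indices is left out.

```python
from collections import Counter


def choose_table(deltas, bits):
    """Hypothetical reduction step: keep the 2**bits most frequent deltas."""
    size = 1 << bits
    table = [d for d, _ in Counter(deltas).most_common(size)]
    table += [0] * (size - len(table))    # pad short tables with zeros
    return table


def encode_frame(samples, bits=4):
    """Return (start_sample, delta_table, indices) for one mono frame."""
    start = samples[0]
    deltas = [b - a for a, b in zip(samples, samples[1:])]
    table = choose_table(deltas, bits)
    indices = []
    current = start                        # tracks what the decoder will have
    for target in samples[1:]:
        # Consider only entries that stay inside the signed 16-bit range,
        # then take the one that lands nearest to the original sample.
        usable = [(abs(current + d - target), i)
                  for i, d in enumerate(table)
                  if -32768 <= current + d <= 32767]
        if not usable:
            raise ValueError("every table entry would clip here")
        _, index = min(usable)
        indices.append(index)
        current += table[index]
    return start, table, indices


start, table, indices = encode_frame([0, 10, 20, 15, 25, 35, 30], bits=2)
print(start, table, indices)
```

Note that `current` follows the decoded value rather than the original one, so an error made on one sample can be corrected by the choice for the next instead of piling up.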

If I remember correctly, my compressed frame now looks something like this, assuming a mono source with 4-bit encoding:

Code:

    word
    000: [ frame header word ]
    001: [ start sample ]
    002: [ best fit delta 0 ]
    003: [ best fit delta 1 ]
    004: [ best fit delta 2 ]
    ...
    016: [ best fit delta 14 ]
    017: [ best fit delta 15 ]
    018: [ev 004][ev 003][ev 002][ev 001]
    019: [ev 008][ev 007][ev 006][ev 005]
    ...
    081: [ empty ][ev 255][ev 254][ev 253]

ev N: encoded delta value for original sample N. Note we don't bother encoding the first (zeroth) sample as we already have it. Thus the last bitfield is always empty in a word aligned stream such as above. For 3/5-bit encoding, this may or may not always be true.

A stereo stream with the same frame length is encoded as follows:

Code:

    word
    000: [ frame header word ]
    001: [ start sample R ]
    002: [ start sample L ]
    003: [ best fit delta 0 ]
    004: [ best fit delta 1 ]
    005: [ best fit delta 2 ]
    ...
    017: [ best fit delta 14 ]
    018: [ best fit delta 15 ]
    019: [evL 002][evR 002][evL 001][evR 001]
    020: [evL 004][evR 004][evL 003][evR 003]
    ...
    082: [ empty ][ empty ][evR 127][evL 127]

Notice that the encoder regards frame length as total number of samples, it doesn't consider a stereo frame of length 256 as having 256 sample pairs.
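The size of a compressed frame follows directly from these layouts. A small sketch of the arithmetic, assuming one header word, one start sample per channel, a single shared table of 2^N deltas, and fields that never straddle a word boundary (the function name is made up):

```python
def frame_words(frame_len, bits, channels=1):
    """Number of 16-bit words in one compressed frame."""
    header = 1
    start = channels                   # one unmodified start sample per channel
    table = 1 << bits                  # 2**bits best fit deltas
    fields = frame_len - channels      # the start samples are not encoded
    per_word = 16 // bits
    data = -(-fields // per_word)      # ceiling division
    return header + start + table + data


mono = frame_words(256, 4)
stereo = frame_words(256, 4, channels=2)
print(mono, stereo, round(256 / mono, 2), round(256 / stereo, 2))
```

For 256 samples at 4 bits this gives 82 words for mono and 83 for stereo, so a little over 3:1 against the 256 words of raw 16-bit data.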

Decoding the above data is so easy that even a vanilla 68000 can do it. Assuming you have a compressed frame in memory, you simply:

1) set a pointer into the best fit area
2) set your current sample value to the start value
3) write your current value to the output
4) extract the next ev bitfield from the compressed block
5) look up the delta value indexed by your value from (4)
6) add it to the current sample
7) repeat from 3 until the entire frame has been decoded.
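In sketch form, the decoding loop for a mono frame might look like this. It assumes the first field of each word sits in the lowest bits and that the frame header has already been parsed; the function and parameter names are invented for the example.

```python
def decode_frame(start, table, words, bits, count):
    """Rebuild 'count' samples from a start value, a delta table and packed words."""
    mask = (1 << bits) - 1
    per_word = 16 // bits
    out = [start]                      # the start sample is output unchanged
    current = start
    for word in words:
        for position in range(per_word):
            if len(out) == count:      # ignore empty fields at the end
                return out
            index = (word >> (position * bits)) & mask
            current += table[index]    # look up the delta and apply it
            out.append(current)
    return out


table = [0, 1, -1, 5, -5] + [0] * 11   # 16 entries for 4-bit indices
decoded = decode_frame(100, table, [0x4321], bits=4, count=5)
print(decoded)
```

Per sample this is one shift, one mask, one table read and one add, which is why it is cheap enough for a plain 68000.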

That's quite clear, and very interesting, thanks a tonne, much appreciated

Quote from: bubblebobble;563165

You could write a tool that tries to find the loop points, but that would probably take longer than editing them manually.

Yes, it would, even a quick and dirty one. But at least it would be much less boring, though

Quote from: bubblebobble;563165

No, I do mean natural science. And unfortunately yes, nature has set this to 2:1. Without extra world knowledge, the entropy of an average music signal in the time domain is roughly 0.5, meaning 1 bit gives 0.5 bits of information.

Really? But doesn't entropy mostly apply to entropy coders?
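For anyone who wants to measure this on their own data, a zeroth-order entropy estimate is easy to compute. This sketch treats every symbol as independent, so it is only a bound for coders that look at one symbol at a time; it says nothing by itself about the 0.5 figure quoted above.

```python
import math
from collections import Counter


def entropy_bits(symbols):
    """Zeroth-order Shannon entropy in bits per symbol."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


print(entropy_bits([0, 0, 1, 1]), entropy_bits([0, 1, 2, 3]))
```

Running it on raw samples and then on their deltas shows how much a simple predictor changes the number of bits an entropy coder would need per sample.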

Quote from: bubblebobble;563165

You will never ever be able to compress better than 2:1. The sooner you accept this, the better for your precious spare time.

I can never accept these things. And always have to challenge them :lol: And don't worry about my precious time, because I like spending my free time on things like this

Even when this sort of thing fails (which tends to happen most of the time, of course :lol:), I've still learned something. Going through these kinds of failures is better than simply taking someone's word for it (too easy)

Quote from: bubblebobble;563165

So dont fool yourself by thinking you can beat this.

The point in trying is that people may have missed things. It happens. Also, if no one challenges existing methods, then in my opinion there's no progress. Although I'm certainly not kidding myself in believing that I can beat these ratios, I also have to say that I won't know until I try. While I probably won't beat them, half the fun is in trying

Quote from: bubblebobble;563165

If you need more than 2:1, lossless is out of the game anyway.

Probably

Quote from: bubblebobble;563165

The best is mpeg, e.g. mp3 can easily reach 1:10 without significant degradation. With ADPCM, you could get 3:1 I'd say. ADPCM is fast and easy to implement compared to mpeg.

If I'm forced to use lossy compression, I think I'll first try Karlos's method. Seems easy enough to implement, so I'll experiment with that first, or better yet, compare it to ADPCM and see which produces the best quality at the right compression rate.

Thorham · « **Reply #9 on:** June 07, 2010, 04:11:19 PM »

Quote from: bubblebobble;563238

> Really? But doesn't entropy mostly apply to entropy coders?
Everybody has to obey the laws of nature. If you want to or not.

Sure, but that doesn't mean I can't play with this.

Quote from: bubblebobble;563238

Plus, it is quite pathetic to think that what hundreds of PhD-level researchers achieved over decades can be wiped away by a hobbyist in a few afternoon sessions without even understanding the fundamentals of information theory.

Yes, but I don't think that. I know that I don't know much at all about this, and because of this, I also know it's likely that I'll fail to do it better. However, this doesn't mean that I have to take what all the PhDs say at face value, without challenging it. Like I said, trying is half the fun, and if I stop doing that, then I might as well throw my whole programming hobby into the bin.

Quote from: bubblebobble;563238

I'll post you the code for my ADPCM4 implementation soon. It should be fast enough for a vanilla A1200, and doesn't need a lot of stuff around it, just one function to encode and one to decode.

Sounds good, thanks

Quote from: bubblebobble;563238

You can use my Tool "AudioConverter" or Samplemanager to generate ADPCM files and listen to the result.

I'll certainly do that.

Quote from: bubblebobble;563238

Karlos' algorithm follows a so called "code-book based" approach.
From what I can see it has several drawbacks:

1. On 256/4bit it has more than 25% data overhead because he stores individual codebooks for each frame. This means that instead of 4:1, you will get ~3:1.
2. How to find the "best" delta representatives is not well defined and might need a lot of experimenting to find the optimal algorithm.
3. The encoder has a very high complexity because it needs to do vector quantisation. Luckily this applies only to the encoder.
4. Doesn't make use of the assumption that the left and right channel of a stereo signal are correlated.
5. "Wastes" precious 4 bits in the 256/4bit case ;-)
6. The chosen deltas may cause an error of up to 4096 quantisation steps (=reduces to 4bit PCM quality) in the worst case. This is very unlikely, of course, but unlike in ADPCM, the error is not correlated with the high frequencies, so the error is not "masked".

Those are interesting considerations, although it wouldn't matter much, because of the 3:1 compression, which is already enough. It's a different story if the audio quality of ADPCM is better.

Quote from: Karlos;563251

Perhaps the title should be changed to "(Not quite so) Simple..."

Yes, it has become less than simple, hasn't it?

Quote from: bubblebobble;563256

@Karlos

What about a competition? ;-)

Let Thorham choose a short sample as uncompressed 16bit .wav (lets say 10 secs).
(maybe with some music, fading, voice, sound effect etc.)
We can both compress and uncompress it again, and let him decide what sounds better?
(as a proof, we both must provide the compressed file too, of course).

Sounds interesting. Perhaps I'll do that.

Quote from: bubblebobble;563260

Classic Amiga.

I have one here, so it's no problem to try this.

Thorham · « **Reply #10 on:** June 09, 2010, 03:02:05 PM »

Quote from: bubblebobble;563682

Here is the C port of the simplest version of the ADPCM4 decoder for 16bit/mono. It packs exactly 1:4.

Thanks for the code

Quote from: bubblebobble;563682

Check if this is easy and fast enough for your purposes. The code can be optimized, this here is tuned to be a clean example.

This should be fine, and can easily be rewritten in assembler.

Quote from: bubblebobble;563682

The audio quality degrades of course, but how much depends on the actual material.
There are also better versions, but the computation power is higher, and the result is only slightly better.

So far I've only tried IMA ADPCM, and I must say that the quality isn't good enough. Will your encoder do better? Normally I'm not much of an audiophile, but in this case quality is everything (which is why I'm still trying to do this with some sort of lossless encoder as well).

I've tried Samplemanager, but this fails on my WinUae setup (which is the same as my Amiga setup). I've also checked Aminet for AudioConverter, but it doesn't seem to be on there. Should I try/search a little harder?

Thorham · « **Reply #11 on:** June 09, 2010, 04:21:53 PM »