Author Topic: Fourier analysis (of a sound) (Read 8047 times)

Karlos · « **on:** October 04, 2007, 11:29:45 PM »

@Motorollin

It isn't quite as simplistic as that. You can't break down a sound into sine waves in quite the way you might think. The Fourier Transform essentially translates amplitude data into the frequency domain. You end up with a spectrum of frequencies and phase shifts. The information you thus end up with is time independent.

If your source sound is a continuous, invarying waveform then you can reduce it to sine waves with this technique and reproduce it. This works because the loss of time information is not relevent when reconstructing a time invarying waveform.

However, complex sounds obviously change with time. Translation into the frequency domain produces time invariant data so what you have to do is to split your sound into tiny windows and perform a discrete FFT on each one. Each frame of sound will have a different spectrum that you can turn back into an impulse of sound.

It gets more complicated than this because simply playing back the decoded frames of sound still has the problem that each one was converted into a time invarying representation, so the frame data won't exactly align neatly from one frame to the next. To mitigate this, you need to further work on overlapping the frames with a function that smooths the transitions and so on.

All that aside, yes you can convert complex time-varying sounds to frequency domain data and back. If you want to hear what that sounds like, look no further than your MP3 player ;-)

Karlos · « **Reply #1 on:** October 05, 2007, 09:04:12 AM »

Well, the mathematical ideal is that, yes, but we were talking about actual software applications of FT for encoding signals. This is almost always used as a form of data reduction, where anything resulting in a larger dataset (in the frequency domain) than the original signal would be pointless.

The greatest resolution FT analysis I've come across in real life is the generation of NMR spectra in the lab where a radio freqency impulse is used to excite the nuclei resulting in a complex signal (it actually sounds like striking a bell) which is then sampled at high resolution and given a very large transform into the frequency domain (took several minutes on a machine that can encode wave to mp3 many times realtime).

Certainly you could reproduce the signal from that data with a high degree of accuracy if you wanted to, but the actual dataset was several times larger than the original impulse samples :-D

Karlos · « **Reply #2 on:** October 05, 2007, 12:47:20 PM »

@Motorollin

Quote

I understand that the frequencies and amplitudes of a complex sound will vary over time, and thus so will the amplitudes of the constituent pure tones. So how would you get around this

You don't get around it. When taking a finite duration signal, the (F)FT provides you with a wide spectrum of frequencies, their intensities and their phase shifts. However, the data you get is time invarying.

When doing a full transform, the decomposition produces a spectrum of frequencies. When you reproduce the sine waves with the appropriate intensity, frequency and phase shift, the complex constructive/destructive interaction of them (re)produces the time varying properties you heard in the origninal sound.

What you can't do is take an indefinately long stream of random sound and convert it, since your frequency spectrum would have an indefinite number of components (a looping waveform is of course a special case since it ultimately has a finite length before it repeats). Therefore you have to work with a finite signal durations which will result in a finite number of frequency components in the resulting transformation.

It's not just sound that this works for. Frequency domain transformations are the basis for JPEG and most video compression technologies too.

Karlos · « **Reply #3 on:** October 06, 2007, 03:36:09 PM »

Quote

motorollin wrote:
Quote
Karlos wrote:
When doing a full transform, the decomposition produces a spectrum of frequencies. When you reproduce the sine waves with the appropriate intensity, frequency and phase shift, the complex constructive/destructive interaction of them (re)produces the time varying properties you heard in the origninal sound.

That seems to contradict what you said earlier, unless I misunderstand this:

Quote
Karlos wrote:
However, complex sounds obviously change with time. Translation into the frequency domain produces time invariant data so what you have to do is to split your sound into tiny windows and perform a discrete FFT on each one. Each frame of sound will have a different spectrum that you can turn back into an impulse of sound.

It sure does sound that way, but that's because I wasn't being entirely clear. You have the mathematical ideal on one hand and computational reality on the other.

In the pure mathematical case, a finite duration signal of arbitary complexity (that varies across time) can be broken down into a spectrum composed of (up to) an infinite number of discrete sine waves, each having a distinct frequency, amplitude and phase shift. These sine waves do not vary over time (except in the regular sense), ie, they don't have an envelope and they don't slowly change phase or frequency. Therefore, this spectrum is not a function of time in the same way the original signal was.

Now, if you take this (potentially infinite) number of sine waves and mix them back together, the interference between them reproduces the original signal for the duration of that original signal. Mathematically you cannot do this for an infinitely long random signal since you'd require an infinitely large spectrum which is incalculable.

In computational reality, you simply cannot do this. The longer your complex sound, the larger your resulting spectrum is going to get and the longer it's going to take to calculate (you're looking at O(N log N) for FFT or O(N*N) for 'simple' FT).

Instead, you slice up your sound into frames which are individually a much shorter length and perform the (F)FT on each one individually, where you'd get a much more reasonable (and hence managable) spectrum per frame than you would for the entire original signal.

The only time when you can reasonably encode a long duration sound in one FT is when that sound is basically time invarying (or varies in a cyclic way) since you get a managable spectrum out of it.

Is that any clearer?

Karlos · « **Reply #4 on:** October 06, 2007, 05:34:37 PM »

Quote

Is that another way of saying that even if the original complex signal is aperiodic, the resultant sine waves will be periodic?

An elegant summary, sir!

Quote

Understood. But I'm not talking about using an infinitely long signal - just a short one, e.g. somebody saying "hello".

Of course. However, the reason I mentioned that was to tie in with things like live stream encoding, which are essentially indefinately long. You have no choice but to encode such things in discrete chunks.

Quote

Ahhh I think I understand what you're saying. A FT on a whole 10 second signal may create 10 million pure tones, whereas a FT on each of the 10 one-second chunks would create 10 much smaller spectra, maybe of 1000 pure tones each, because each chunk is less complex (by virtue of the fact that it is shorter). Is that kind of what you mean?

Exactly that. To give you an idea, MP3 typically encodes chunks of source audio that are 576 samples long. If there is a transient (a sudden, sharp signal) in the source data, it uses a chunk of 192 samples. These slices are but only a few milliseconds long at 44kHz.

In contrast, the machines that I used to use to get NMR spectra for the compounds I was preparing would FFT a signal that was several seconds long, repeating the process several times and averaging the results until a sharp, well defined spectrum was obtained. These machines had horsepower could encode normal mp3 many times realtime, but would take several minutes to produce the FFT for the NMR spectrum.

Karlos · « **Reply #5 on:** October 07, 2007, 05:15:00 AM »

Quote

So what are the implications for some half-baked home experiment?

There's only one way to find out :-D It's almost certainly doable.

You can find a lot of example FFT code in C/C++ out there. Depending on the implementation of the algorithm you might have to pad your sample to the nearest power of two length (in samples).

I have some pretty fast code for performing FFT on audio (unless the backup CD rotted) but unfortunately it's written in assembler for a Zilog DSP...

Author Topic: Fourier analysis (of a sound) (Read 8047 times)

Karlos

Re: Fourier analysis (of a sound)

Karlos

Re: Fourier analysis (of a sound)

Karlos

Re: Fourier analysis (of a sound)

Karlos

Re: Fourier analysis (of a sound)

Karlos

Re: Fourier analysis (of a sound)

Karlos

Re: Fourier analysis (of a sound)