Author Topic: Fourier analysis (of a sound) (Read 7113 times)

motorollin · « **on:** October 04, 2007, 07:59:43 PM »

I was fascinated to learn recently that any sound, no matter how complex, can be broken down into constituent sine waves of varying frequencies, and that if all of those sine waves are played back simultaneously they will be perceived as the original sound.

I would like to try this. So does anyone know of any software which will perform the Fourier analysis on a sound file and show the parameters for the resulting sine waves? Once I have these I can generate the sine waves and layer them up, and hopefully the result will be the original sound (more or less).

--
moto

Cymric · « **Reply #1 on:** October 04, 2007, 10:43:59 PM »

I know of a DOS program which did this, but I wouldn't know where to find it nowadays, let alone how to run a DOS program on 'modern' machines without some serious virtualisation :-). (... Oh wait, here it is. I remembered the author mentioning it being used for studying whale migration data, and with that Google can easily find it...) You'll be wanting a software library which does the so-called fast Fourier transform or FFT: there are a few available on Aminet. The FFT generates exactly the same result as a normal (discrete) FT; the 'fast' simply means it is a much faster algorithm for computing it.

If you want to code the Fourier transform yourself, then start with the 'slow' algorithm as it is easier to grok by far. For small samples to transform, the difference in execution time isn't really noticeable, and the fast algorithm requires a few arcane bookkeeping tricks to make it fast in practice, too.
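
For anyone following along, here is a minimal sketch of that 'slow' transform in plain Python. The function name `slow_dft` and the eight-sample test signal are assumptions made for illustration; this is not the DOS program or any of the Aminet libraries.

```python
import cmath
import math

def slow_dft(samples):
    """Direct O(N^2) discrete Fourier transform (hypothetical helper name)."""
    n = len(samples)
    spectrum = []
    for k in range(n):                      # one output bin per frequency index
        total = 0j
        for t, x in enumerate(samples):     # correlate the signal with bin k's complex sinusoid
            total += x * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(total)
    return spectrum

# Assumed test input: exactly one cycle of a unit-amplitude sine over 8 samples.
signal = [math.sin(2 * math.pi * t / 8) for t in range(8)]
spectrum = slow_dft(signal)
magnitudes = [abs(c) for c in spectrum]
```

For a real sine that fits the block exactly, the energy should land in bin 1 and its mirror image in bin 7, with the other bins at (numerically) zero.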

nicholas · « **Reply #2 on:** October 04, 2007, 11:08:20 PM »

I think Karlos is your man for this, Mark.

Karlos · « **Reply #3 on:** October 04, 2007, 11:29:45 PM »

@Motorollin

It isn't quite as simplistic as that. You can't break down a sound into sine waves in quite the way you might think. The Fourier Transform essentially translates amplitude data into the frequency domain. You end up with a spectrum of frequencies and phase shifts. The information you thus end up with is time independent.

If your source sound is a continuous, unvarying waveform then you can reduce it to sine waves with this technique and reproduce it. This works because the loss of time information is not relevant when reconstructing a time-invariant waveform.

However, complex sounds obviously change with time. Translation into the frequency domain produces time invariant data so what you have to do is to split your sound into tiny windows and perform a discrete FFT on each one. Each frame of sound will have a different spectrum that you can turn back into an impulse of sound.

It gets more complicated than this because simply playing back the decoded frames of sound still has the problem that each one was converted into a time-invariant representation, so the frame data won't align neatly from one frame to the next. To mitigate this, you need to overlap the frames and apply a function that smooths the transitions and so on.
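
A rough sketch of that frame-and-overlap idea, assuming a Hann window with 50% overlap (one common choice, not necessarily what any particular codec uses). All the function names here are made up for the example, and the transform is the slow one to keep the code short.

```python
import cmath
import math

def dft(x, inverse=False):
    # Slow O(N^2) transform; the inverse flips the exponent sign and divides by N.
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def hann(n):
    # "Periodic" Hann window: two copies shifted by n/2 sum to exactly 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]

def analyse(signal, frame_len):
    # Cut the signal into half-overlapping frames, window each, transform each.
    hop = frame_len // 2
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft(chunk))
    return frames

def resynthesise(frames, frame_len, total_len):
    # Inverse-transform every frame and overlap-add it back at its position.
    hop = frame_len // 2
    out = [0.0] * total_len
    for idx, spectrum in enumerate(frames):
        chunk = dft(spectrum, inverse=True)
        for i in range(frame_len):
            out[idx * hop + i] += chunk[i].real
    return out

frame_len = 16
# Assumed test signal: a steady tone plus a tone whose pitch drifts over time.
signal = [math.sin(0.3 * t) + 0.5 * math.sin(0.011 * t * t) for t in range(64)]
frames = analyse(signal, frame_len)
rebuilt = resynthesise(frames, frame_len, len(signal))
hop = frame_len // 2
# The first and last half-frame are covered by only one window, so skip them.
worst = max(abs(rebuilt[t] - signal[t]) for t in range(hop, len(signal) - hop))
```

Because the overlapping windows sum to one, the interior of `rebuilt` should match the original to within rounding error, while each frame still has its own spectrum.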

All that aside, yes you can convert complex time-varying sounds to frequency domain data and back. If you want to hear what that sounds like, look no further than your MP3 player ;-)

Boot_WB · « **Reply #4 on:** October 05, 2007, 12:37:20 AM »

@Karlos

I believe you are not entirely correct here.
Any complex signal, regardless of complexity, can be expressed using the Fourier transform as long as it has a finite duration. However, the more complex the signal, the larger the frequency domain becomes.
Consider a single directional component of a seismic signal - essentially a finite, but random (and highly complex) acoustic signal. This can be expressed on a single graph as a range of frequencies and amplitudes (and phase displacements) which, depending on the complexity of the sampling used, express this essentially random acoustic signal accurately.
Of course, as time and complexity increase the frequency domain must be increased to allow the superposition of the sine waves to result in the intended complex waveform.

Karlos · « **Reply #5 on:** October 05, 2007, 09:04:12 AM »

Well, the mathematical ideal is that, yes, but we were talking about actual software applications of FT for encoding signals. This is almost always used as a form of data reduction, where anything resulting in a larger dataset (in the frequency domain) than the original signal would be pointless.

The greatest resolution FT analysis I've come across in real life is the generation of NMR spectra in the lab where a radio frequency impulse is used to excite the nuclei resulting in a complex signal (it actually sounds like striking a bell) which is then sampled at high resolution and given a very large transform into the frequency domain (took several minutes on a machine that can encode wave to mp3 many times realtime).

Certainly you could reproduce the signal from that data with a high degree of accuracy if you wanted to, but the actual dataset was several times larger than the original impulse samples :-D

motorollin · « **Reply #6 on:** October 05, 2007, 10:31:54 AM »

Well, I don't care if the resulting dataset is larger than the original sound, since I'm not interested in this for the purpose of compression. I just want to try it out of curiosity, to see if it really does work :-) (and no, I can't just take your word for it ;-) )

I understand that the frequencies and amplitudes of a complex sound will vary over time, and thus so will the amplitudes of the constituent pure tones. So how would you get around this? Perform the Fourier analysis on each individual sample, then generate the pure tones for each sample one sample in length, then play back all of the resulting (different) samples consecutively?

Again, I know this is pointless since the goal is to end up with pretty much the same sound you started with. It is purely for personal interest not for any practical application that I want to try this. Imagine how exciting it would be to take a sample of your own voice saying "hello", perform the necessary analyses to determine the parameters for its constituent pure tones, then generate all of those tones simultaneously and magically hear your own voice saying "hello"!

--
moto

Karlos · « **Reply #7 on:** October 05, 2007, 12:47:20 PM »

@Motorollin

Quote

I understand that the frequencies and amplitudes of a complex sound will vary over time, and thus so will the amplitudes of the constituent pure tones. So how would you get around this

You don't get around it. When taking a finite duration signal, the (F)FT provides you with a wide spectrum of frequencies, their intensities and their phase shifts. However, the data you get is time-invariant.

When doing a full transform, the decomposition produces a spectrum of frequencies. When you reproduce the sine waves with the appropriate intensity, frequency and phase shift, the complex constructive/destructive interaction of them (re)produces the time varying properties you heard in the original sound.
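
To make that concrete, here is a small sketch that pulls a (frequency, amplitude, phase) triple out of each bin and then rebuilds the sound by literally adding up steady cosines. The helper names, the 8000 Hz sample rate and the decaying test burst are all assumptions for the example, and it expects an even number of samples.

```python
import cmath
import math

def dft(x):
    # Slow O(N^2) transform, enough for a short example.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def sine_parameters(signal, sample_rate):
    """Return one (frequency_hz, amplitude, phase) triple per component.

    Assumes a real signal with an even number of samples.
    """
    n = len(signal)
    spectrum = dft(signal)
    params = []
    for k in range(n // 2 + 1):
        # Bins 0 and n/2 have no mirror image; the rest share energy with bin n-k.
        scale = 1 if k in (0, n // 2) else 2
        params.append((k * sample_rate / n,
                       scale * abs(spectrum[k]) / n,
                       cmath.phase(spectrum[k])))
    return params

def play_back(params, n, sample_rate):
    # Every component is a steady cosine: no envelope, no drift in pitch or phase.
    return [sum(a * math.cos(2 * math.pi * f * t / sample_rate + p)
                for f, a, p in params)
            for t in range(n)]

sample_rate = 8000
# Assumed test signal: a burst that dies away, so it clearly varies over time.
signal = [math.exp(-t / 20) * math.sin(0.9 * t) for t in range(64)]
params = sine_parameters(signal, sample_rate)
rebuilt = play_back(params, len(signal), sample_rate)
worst = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

None of the 33 cosines fades out on its own, yet their sum should follow the decaying burst to within rounding error over the 64 samples that were analysed.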

What you can't do is take an indefinitely long stream of random sound and convert it, since your frequency spectrum would have an indefinite number of components (a looping waveform is of course a special case since it ultimately has a finite length before it repeats). Therefore you have to work with finite signal durations which will result in a finite number of frequency components in the resulting transformation.

It's not just sound that this works for. Frequency domain transformations are the basis for JPEG and most video compression technologies too.

Oliver · « **Reply #8 on:** October 05, 2007, 04:29:22 PM »

Hi Moto,

If you really want to understand Fourier Transforms, you can't avoid the maths/physics of waves, signals, and systems. This page has a fair introduction.

If you would like to see an intuitively satisfying example, with a periodic signal, have a look at this demonstration of Gibbs Phenomenon. One can see how the summation of a series of sinusoidal waves, each with the appropriate magnitudes and phase shifts, can approximate a square wave. You can also see how a bandwidth limit (finite number of sine waves) affects the approximation, producing ringing in the reconstructed signal.
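
If you want to reproduce the square wave case numerically, a sketch along these lines should do it. It uses the standard Fourier series of a +/-1 square wave (odd harmonics with amplitude 4/(pi k)); the function names and grid size are arbitrary choices for the example.

```python
import math

def square_partial_sum(x, num_terms):
    """First num_terms odd harmonics of the Fourier series of a +/-1 square wave."""
    return (4 / math.pi) * sum(math.sin((2 * j - 1) * x) / (2 * j - 1)
                               for j in range(1, num_terms + 1))

def overshoot_peak(num_terms, grid=4000):
    # Scan half a period (0..pi), where the ideal square wave equals +1.
    return max(square_partial_sum(math.pi * i / grid, num_terms)
               for i in range(grid + 1))

peak_5 = overshoot_peak(5)      # band-limited to 5 sine waves
peak_25 = overshoot_peak(25)    # band-limited to 25 sine waves
middle_25 = square_partial_sum(math.pi / 2, 25)
```

Adding more sine waves flattens the middle of the plateau towards 1, but the peak next to the jump should stay at roughly 1.18: the ringing gets narrower, not smaller.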

Examining a signal in a series of sliding windows doesn't really get around the nature of waves and signals, but is a functional way of examining signal characteristics. The specific window functions also affect the nature of the spectral information being examined. With something like digital audio, it is already characterised to a fair extent, and a sliding window is a very useful approach.

You may also like to look at the phenomenon of wave packets. Wave packets can describe a non repetitive pulse in time/frequency (actually time and frequency are just one pair of inverse variables, it's true for all pairs of inverse variables, such as wavelength lambda, and spatial frequency). Have a look at this link for a description (sorry I wasn't able to find a good animation for this one). Sinusoidal signals covering a very wide bandwidth, can sum to form a localised pulse in time/space.
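
A tiny numerical illustration of that last sentence, as a sketch only: it adds up 50 equal-amplitude cosines (an arbitrary choice) and compares the sum at t = 0 with the sum further away. Because it uses a discrete set of frequencies rather than a continuous band, the pulse recurs every 2*pi instead of appearing only once.

```python
import math

def packet(t, num_waves=50):
    """Sum of equal-amplitude cosines at angular frequencies 1..num_waves."""
    return sum(math.cos(k * t) for k in range(1, num_waves + 1))

# At t = 0 every cosine is at its maximum, so they all add up.
peak = packet(0.0)

# Away from t = 0 the waves drift out of step and largely cancel.
away = max(abs(packet(t / 10)) for t in range(10, 31))   # t from 1.0 to 3.0
```

The wider the band of frequencies included, the taller and narrower the pulse around t = 0 becomes relative to the residue elsewhere.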

If you really want to explore this material, MATLAB is a really good tool, but is quite mathematical. MATLAB has some GUI demonstration tools, which are quite good, but I can't recall if there is anything appropriate to this.

A really good book for this stuff is Signals & Systems, by Haykin and Van Veen. I think you're at university now, right? I think most electronic engineering libraries will have a copy of this, if you want to have a flip through it. Some really good diagrams of signals in there.

You asked about Fourier as related to sound: interestingly enough, all waves have a set of properties in common, so the principles are transferable. The same principles are relevant to the quantum mechanical wave model of matter, and have a bearing on concepts like the uncertainty principle.

Btw, you seem to be thinking on a number of maths/physics/programming/engineering type questions; are you sure you're not being lured into an engineering career? :crazy:

-Oli

Boot_WB · « **Reply #9 on:** October 05, 2007, 07:37:06 PM »

Quote

Boot_WB wrote:
Consider a single directional component of a seismic signal - essentially a finite, but random (and highly complex) acoustic signal. This can be expressed on a single graph as a range of frequencies and amplitudes (and phase displacements) which, depending on the complexity of the sampling used, express this essentially random acoustic signal accurately.

Slight correction - there would, of course, be two graphs: a frequency-amplitude graph and a frequency-phase graph (The phase and amplitude spectra).

Quote

Btw, you seem to be thinking on a number of maths/physics/programming/engineering type questions; are you sure you're not being lured into an engineering career?

What's wrong with an engineering career? :-D

motorollin · « **Reply #10 on:** October 06, 2007, 10:04:24 AM »

Quote

Karlos wrote:
When doing a full transform, the decomposition produces a spectrum of frequencies. When you reproduce the sine waves with the appropriate intensity, frequency and phase shift, the complex constructive/destructive interaction of them (re)produces the time varying properties you heard in the original sound.

That seems to contradict what you said earlier, unless I misunderstand this:

Quote

Karlos wrote:
However, complex sounds obviously change with time. Translation into the frequency domain produces time invariant data so what you have to do is to split your sound into tiny windows and perform a discrete FFT on each one. Each frame of sound will have a different spectrum that you can turn back into an impulse of sound.

@Oliver
I will have a look at those web sites when I have time. Unfortunately I have too much reading to do anything else :-(

Quote

Oliver wrote:
Btw, you seem to be thinking on a number of maths/physics/programming/engineering type questions; are you sure you're not being lured into an engineering career?

Heh, no. The maths/trig/programming questions were to do with my port of GridWars to the GP2X. Part of my Speech & Language Therapy course (which I started in September) is a module on Hearing & Acoustics, which is all about the physics of sound. So that's why I've been asking questions about Fourier. Though I find it interesting, I struggle to see how it is of use to a Speech & Language Therapist.

--
moto

Oliver · « **Reply #11 on:** October 06, 2007, 02:44:20 PM »

Quote

What's wrong with an engineering career?

Well, it's certainly not without its good points, but there are plenty of headaches along the way, for the unsuspecting.

Some of my engineering classes had an over 40% failure rate, and very low levels of satisfaction. A very common comment by people in my electronic engineering classes was "it's nothing like what you expect it to be" or "damn it, there's just too much maths". For me, it wasn't too far from what I expected, but there were still plenty of headaches. Anyway, I love physics, and tinkering, so it's still rewarding for me.

Quote

Though I find it interesting, I struggle to see how it is of use to a Speech & Language Therapist.

Yeah, I'm not too sure if you would need a deep understanding of Fourier analysis for that. Maybe a couple of things to understand would be:

* Frequency content of a signal can be used to characterise a system which the signal comes from or a system which it passes through. So, I guess there could be some application to dealing with speech impediments (is that what your course is about?). I'm not sure it would be any better than using your own ear, though.

* All waves have a fundamentally sinusoidal nature (this is paraphrasing somewhat ;-) )

* Complex variables (complex number type variables, typically with i,j,\theta,f) being used in Fourier analysis, provide a mathematically convenient/powerful form to deal with signals and systems; if you don't need to work with them in detail, then don't worry too much about understanding things like the complex plane, or negative frequency components, which seem to deter a lot of beginners.

I'm guessing an understanding of acoustic propagation may be useful though, and I think it could be understood quite well without delving too much into things like Fourier analysis.

There's also an interesting historical side note to all this: Joseph Fourier was a member of Napoleon's army, and if I remember correctly, developed notions used in Fourier analysis well before they could be practically applied for what they are used today.

-Oli

Karlos · « **Reply #12 on:** October 06, 2007, 03:36:09 PM »

Quote

motorollin wrote:
Quote
Karlos wrote:
When doing a full transform, the decomposition produces a spectrum of frequencies. When you reproduce the sine waves with the appropriate intensity, frequency and phase shift, the complex constructive/destructive interaction of them (re)produces the time varying properties you heard in the original sound.

That seems to contradict what you said earlier, unless I misunderstand this:

Quote
Karlos wrote:
However, complex sounds obviously change with time. Translation into the frequency domain produces time invariant data so what you have to do is to split your sound into tiny windows and perform a discrete FFT on each one. Each frame of sound will have a different spectrum that you can turn back into an impulse of sound.

It sure does sound that way, but that's because I wasn't being entirely clear. You have the mathematical ideal on one hand and computational reality on the other.

In the pure mathematical case, a finite duration signal of arbitrary complexity (that varies across time) can be broken down into a spectrum composed of (up to) an infinite number of discrete sine waves, each having a distinct frequency, amplitude and phase shift. These sine waves do not vary over time (except in the regular sense), ie, they don't have an envelope and they don't slowly change phase or frequency. Therefore, this spectrum is not a function of time in the same way the original signal was.

Now, if you take this (potentially infinite) number of sine waves and mix them back together, the interference between them reproduces the original signal for the duration of that original signal. Mathematically you cannot do this for an infinitely long random signal since you'd require an infinitely large spectrum which is incalculable.

In computational reality, you simply cannot do this. The longer your complex sound, the larger your resulting spectrum is going to get and the longer it's going to take to calculate (you're looking at O(N log N) for FFT or O(N*N) for 'simple' FT).
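
To put some numbers on that, here is a sketch comparing a textbook recursive radix-2 FFT against the 'simple' transform. The radix-2 scheme is one common variant (it needs a power-of-two length) and is not claimed to be what any particular encoder uses. The cost figures are just N*N and N*log2(N), ignoring constant factors.

```python
import cmath
import math

def slow_dft(x):
    # The 'simple' FT: every output bin visits every input sample.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])          # transform of the even-indexed samples
    odd = fft(x[1::2])           # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

# Assumed test signal: 256 samples of two mixed tones.
signal = [math.sin(0.2 * t) + 0.3 * math.cos(1.7 * t) for t in range(256)]
worst = max(abs(a - b) for a, b in zip(fft(signal), slow_dft(signal)))

n = len(signal)
slow_cost = n * n                     # 65536 for 256 samples
fast_cost = n * int(math.log2(n))     # 2048 for 256 samples
```

Both routines should agree to within rounding error; the gap between the two cost figures widens quickly as the signal gets longer, which is part of why short frames are attractive.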

Instead, you slice up your sound into frames which are individually a much shorter length and perform the (F)FT on each one individually, where you'd get a much more reasonable (and hence manageable) spectrum per frame than you would for the entire original signal.

The only time when you can reasonably encode a long duration sound in one FT is when that sound is basically time-invariant (or varies in a cyclic way) since you get a manageable spectrum out of it.

Is that any clearer?

motorollin · « **Reply #13 on:** October 06, 2007, 05:17:57 PM »

Quote

Oliver wrote:
Frequency content of a signal can be used to characterise a system which the signal comes from or a system which it passes through. So, I guess there could be some application to dealing with speech impediments (is that what your course is about?). I'm not sure it would be any better than using your own ear, though.

Yes that is one aspect of my course. The only reason I can think of for teaching us about sound spectra is to demonstrate that a complex signal like speech is different from a pure tone in that it has a variety of frequencies embedded within it. If somebody's ears are damaged and they are unable to perceive certain frequencies, then auditory feedback of their own speech will be less effective, which can cause degradation in ability to produce speech sounds properly. Of course, detailed knowledge of Fourier analysis is not really needed. It has just piqued my interest :-)

Quote

Karlos wrote:
In the pure mathematical case, a finite duration signal of arbitrary complexity (that varies across time) can be broken down into a spectrum composed of (up to) an infinite number of discrete sine waves, each having a distinct frequency, amplitude and phase shift. These sine waves do not vary over time (except in the regular sense), ie, they don't have an envelope and they don't slowly change phase or frequency. Therefore, this spectrum is not a function of time in the same way the original signal was.

Is that another way of saying that even if the original complex signal is aperiodic, the resultant sine waves will be periodic?

Quote

Karlos wrote:
Now, if you take this (potentially infinite) number of sine waves and mix them back together, the interference between them reproduces the original signal for the duration of that original signal. Mathematically you cannot do this for an infinitely long random signal since you'd require an infinitely large spectrum which is incalculable.

Understood. But I'm not talking about using an infinitely long signal - just a short one, e.g. somebody saying "hello".

Quote

Karlos wrote:
In computational reality, you simply cannot do this. The longer your complex sound, the larger your resulting spectrum is going to get and the longer it's going to take to calculate

That makes sense. And I would assume that a speech sound would be fairly complex, even if it is short.

Quote

Karlos wrote:
Instead, you slice up your sound into frames which are individually a much shorter length and perform the (F)FT on each one individually, where you'd get a much more reasonable (and hence manageable) spectrum per frame than you would for the entire original signal.

Ahhh I think I understand what you're saying. A FT on a whole 10 second signal may create 10 million pure tones, whereas a FT on each of the 10 one-second chunks would create 10 much smaller spectra, maybe of 1000 pure tones each, because each chunk is less complex (by virtue of the fact that it is shorter). Is that kind of what you mean?

Quote

Karlos wrote:
The only time when you can reasonably encode a long duration sound in one FT is when that sound is basically time-invariant (or varies in a cyclic way) since you get a manageable spectrum out of it.

Sure, but as I said it would only be a very small signal anyway.

Quote

Karlos wrote:
Is that any clearer?

Yes, unless I have totally misunderstood :lol:

--
moto

Karlos · « **Reply #14 on:** October 06, 2007, 05:34:37 PM »

Quote

Is that another way of saying that even if the original complex signal is aperiodic, the resultant sine waves will be periodic?

An elegant summary, sir!

Quote

Understood. But I'm not talking about using an infinitely long signal - just a short one, e.g. somebody saying "hello".

Of course. However, the reason I mentioned that was to tie in with things like live stream encoding, which are essentially indefinitely long. You have no choice but to encode such things in discrete chunks.

Quote

Ahhh I think I understand what you're saying. A FT on a whole 10 second signal may create 10 million pure tones, whereas a FT on each of the 10 one-second chunks would create 10 much smaller spectra, maybe of 1000 pure tones each, because each chunk is less complex (by virtue of the fact that it is shorter). Is that kind of what you mean?

Exactly that. To give you an idea, MP3 typically encodes chunks of source audio that are 576 samples long. If there is a transient (a sudden, sharp signal) in the source data, it uses a chunk of 192 samples. These slices are only a few milliseconds long at 44kHz.
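
As a quick check on those durations, assuming "44kHz" means the usual CD rate of 44,100 samples per second (the helper name is made up for the example):

```python
SAMPLE_RATE_HZ = 44_100  # assumption: CD-quality audio; the post only says "44kHz"

def chunk_duration_ms(num_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    # Duration of a block of samples, in milliseconds.
    return 1000 * num_samples / sample_rate_hz

long_ms = chunk_duration_ms(576)            # normal chunk
short_ms = chunk_duration_ms(192)           # chunk used around transients
chunks_per_second = SAMPLE_RATE_HZ / 576    # normal chunks in one second of audio
```

That works out to roughly 13 ms for a normal chunk and a little over 4 ms for a transient chunk, so a one-second "hello" would span something like 77 normal chunks.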

In contrast, the machines that I used to use to get NMR spectra for the compounds I was preparing would FFT a signal that was several seconds long, repeating the process several times and averaging the results until a sharp, well defined spectrum was obtained. These machines had the horsepower to encode normal mp3 at many times realtime, but would take several minutes to produce the FFT for the NMR spectrum.
