My basic assumptions were as follows:
1) We analyse a window of sound to get the frequency spectrum for that window.
2) We identify the lowest and highest frequencies for which the intensity (contribution to the overall signal strength) is greater than a threshold set by the user.
3) We use those frequencies as the band limits of an FIR filter and process the original window samples to remove as much as possible of the content below the lower limit and above the upper limit.
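To make points 1 to 3 concrete, here is a rough Python sketch using only NumPy. Nothing in it is a tested method; the names (band_limits, bandpass_fir, filter_window) are made up for illustration, and several choices are my own assumptions: "intensity" is taken as a bin's share of the total power, a Hann taper is applied before the FFT, the FIR filter is a Hamming-windowed sinc with 101 taps, and the band is widened by one FFT bin so it never collapses to zero width.

```python
import numpy as np

def band_limits(window, fs, threshold):
    """Lowest and highest frequency (Hz) whose share of total power exceeds threshold."""
    # Assumption: Hann taper before the FFT to reduce spectral leakage.
    power = np.abs(np.fft.rfft(window * np.hanning(len(window)))) ** 2
    freqs = np.fft.rfftfreq(len(window), d=1.0 / fs)
    total = power.sum()
    if total == 0:
        return None  # silent window, nothing to measure
    above = np.nonzero(power / total > threshold)[0]
    if above.size == 0:
        return None  # no bin is strong enough
    return freqs[above[0]], freqs[above[-1]]

def bandpass_fir(f_low, f_high, fs, numtaps=101):
    """Windowed-sinc band-pass: difference of two ideal low-pass responses."""
    n = np.arange(numtaps) - (numtaps - 1) / 2

    def lowpass(fc):
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)

    # Assumption: Hamming window to control stop-band ripple.
    return (lowpass(f_high) - lowpass(f_low)) * np.hamming(numtaps)

def filter_window(window, fs, threshold, numtaps=101):
    """Steps 1-3 for one window; assumes len(window) >= numtaps."""
    window = np.asarray(window, dtype=float)
    limits = band_limits(window, fs, threshold)
    if limits is None:
        return window.copy()
    margin = fs / len(window)  # one FFT bin of slack on each side
    f_low = max(limits[0] - margin, 0.0)
    f_high = min(limits[1] + margin, fs / 2)
    taps = bandpass_fir(f_low, f_high, fs, numtaps)
    # mode="same" keeps the length and compensates the filter delay.
    return np.convolve(window, taps, mode="same")
```

Calling filter_window(samples, fs, threshold) on one window returns the filtered samples for that window. The first and last few dozen samples will show edge transients, which is one reason the windowing in point 2a matters.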
OK, those points are all more or less straightforward, so I'll add two other points that must be considered.
2a) Decide (in some way) the number and size of the windows.
4) Develop some kind of method to change the parameters of the filter as a function of time (position). This is probably the hardest part, and it depends on the windowing technique we want to use. It is important that the filter parameters don't change too much from one position to the next (highly nonlinear filters can give rise to unwanted results).
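One simple way to keep the parameters from changing too much (just an idea, not something I have analysed) is to cap how far each band edge may move from one window to the next. The function name smooth_limits and the max_step parameter below are hypothetical, and the input is assumed to be one (f_low, f_high) pair per window, in time order.

```python
def smooth_limits(limits, max_step):
    """Cap the per-window change of each band edge at max_step (Hz)."""
    if not limits:
        return []
    smoothed = [limits[0]]
    for f_low, f_high in limits[1:]:
        prev_low, prev_high = smoothed[-1]
        # Clamp the requested move to the range [-max_step, +max_step].
        low = prev_low + max(-max_step, min(max_step, f_low - prev_low))
        high = prev_high + max(-max_step, min(max_step, f_high - prev_high))
        smoothed.append((low, high))
    return smoothed
```

For example, smooth_limits([(100, 1000), (400, 900), (120, 2000)], 50) lets the lower edge move only from 100 to 150 in the second window, even though the measured value jumped to 400. Whether a hard cap like this or some weighted average works better would have to be tested on real data.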
To test the method we could skip point (2a) and fix the number and size of the windows, but to get a good and robust method we probably need to find some kind of (local) maximum for the number and size of the windows (we need to at least know that the specific size and number work for the data).
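For a first test with fixed windows, something like the following would do. The name split_windows and the hop parameter (the distance between window starts, so hop smaller than size gives overlapping windows) are my own assumptions; samples left over at the end that do not fill a whole window are simply dropped.

```python
def split_windows(samples, size, hop):
    """Cut samples into fixed-size windows whose starts are hop apart."""
    last_start = len(samples) - size
    return [samples[start:start + size] for start in range(0, last_start + 1, hop)]
```

With size and hop fixed, the number of windows follows from the length of the data, so there is really only one pair of values to tune.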
I would love to do some calculations on this, but I'm leaving for vacation tomorrow and will be back in about two weeks. If you are still interested, we can continue our discussion then and I can do some basic analysis (if someone else hasn't already done it by then).
Good luck, and if you get any results, please post them here (or pmail them to me).
/W