Dear Raskolnikov, Brian and Sebastian, thanks you very much for sharing your ideas.
I cross posted my question in two forums, so you can check the complete discussion by looking here and herehere.
From the discussion I got three main elements:
- Haar wavelet provide a solution to my problem, although a very suboptimal one
- This problem could be solved via an approach similar matching pursuit (but the full enumeration of the dictionary of bases needs to avoided, because it is too large in my problem)
- Sebastian Reichelt provided code for an approach based on maximum and minimum subarray search
All and all I ended up with four different solutions to the presented problem:
explore_a
indicates my original solution suggested in question (quantizing the $a_i$ values and solving multiple maximum subarray problems)brute_force
indicates the approach based on approximating $\left|w' - w\right|$ by $\left(w' - w\right)^2$. In this approach, it turns out that each rectangle possibility can be evaluated with only two memory reads, one multiplication and one division, making a brute force exploration tractable.reichelt
indicates a port of Sebastian's code to python (which I used to test the ideas)abs_max
indicates an approach where at each iteration a rectangle of a single element, placed at the maximum absolute value of the signal, is selected.
All the code implementing these methods and some example results are available at http://lts.cr/Hzg
At each run the code will generate a new "random" signal (with a noisy step). An example signal (labeled a[0:n]
) and the first rectangle selected by each method can be seen below. Example signal http://lts.cr/adL
Then each method is run recursively to approximate the input signal, until the approximation error is very low or the maximum number of iterations have been reached.
At typical result can be seen below (and another here) Example result 1 http://lts.cr/adT
After running the code multiple time (for different random signals), the behavior seems consistent:
- Unsurprisingly,
abs_max
reaches convergence in a number of iterations equal to the signal length. It does so quite fast (exponential decay). explore_a
decrease the energy fast initially and then tends to stagnate.reichelt
is consistently worse thanexplore_a
, getting stuck at a higher energy level. I hope I did not do any dumb mistake in the port from C to Python. By visual inspection the first rectangle selected seems reasonable.brute_force
is consistently the method that decreases the energy the fastest. It is also consistently intersects theabs_max
solution, which indicates that a better strategy would be to switch from one method to the other.
Obviously the exact behavior changes from run to run and would certainly change depending on the method used to generate the "random" signal. However I believe these initial results are a good indicator. I feel that it is reasonable now to proceed to generate my real data, and evaluate how well/fast brute_force
and explore_a
run there.
Feel free to add comments or play around with the code.
Again, thank you very much for your inputs and insights !