Developed independently in the fields of mathematics, quantum physics, electrical engineering, and seismic geology, the theory of wavelets has already found applications in image compression, vision analysis, and earthquake prediction. Understanding how it works in lay terms, however, is quite difficult.
Wavelets are functions that satisfy certain mathematical requirements and are used to represent data or other functions. This idea is not new. Approximation by superposition of functions has existed since the early 1800s, when Joseph Fourier discovered that he could superpose sines and cosines to represent other functions. In wavelet analysis, however, the scale at which we look at the data plays a special role. Wavelet algorithms process data at different scales or resolutions. If we look at a signal through a large "window," we notice gross features. If we look at the same signal through a small "window," we notice small features. Wavelet analysis lets us see both the forest and the trees, so to speak. This makes wavelets interesting and useful. For many decades, scientists have wanted functions more appropriate than the sines and cosines that form the basis of Fourier analysis for approximating choppy signals (1). By definition, sines and cosines are non-local (they stretch out to infinity), so they do a very poor job of approximating sharp spikes. With wavelet analysis, we can instead use approximating functions that are contained neatly in finite domains. Wavelets are therefore well suited to approximating data with sharp discontinuities.
One thing to remember is that wavelet transforms do not have a single set of basis functions like the Fourier transform, which utilizes just the sine and cosine functions. Instead, wavelet transforms have an infinite set of possible basis functions. Thus wavelet analysis provides immediate access to information that can be obscured by other time-frequency methods such as Fourier analysis.
All of the wavelet algorithms that I am aware of must be applied to a data set (a time series or a signal) with a power-of-two number of elements (e.g., 256 elements = 2^8).
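The power-of-two requirement is easy to check before running a transform. The sketch below is illustrative only: the helper names `is_power_of_two` and `num_levels` are my own assumptions and do not come from any particular wavelet library.

```python
def is_power_of_two(n: int) -> bool:
    """Return True if n is a positive power of two (1, 2, 4, 8, ...)."""
    # A power of two has exactly one bit set, so n & (n - 1) is zero.
    return n > 0 and (n & (n - 1)) == 0


def num_levels(n: int) -> int:
    """How many times a length-n signal can be halved until one element remains."""
    if not is_power_of_two(n):
        raise ValueError(f"length {n} is not a power of two")
    return n.bit_length() - 1


print(is_power_of_two(256), num_levels(256))  # True 8
print(is_power_of_two(300))                   # False
```

A signal of 256 elements can therefore be halved 8 times before a single value is left.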
The wavelet literature covers a wide variety of wavelet algorithms, drawn from an infinite set of possibilities. The choice of wavelet algorithm depends on the application.
The wavelet transform produces a "down sampled" smoothed version of the signal (calculated by the wavelet scaling function) and a "down sampled" version of the signal that reflects change between signal elements (calculated by the wavelet function). The scaling (smoothing) function is sometimes referred to as a low pass filter. The wavelet function is sometimes referred to as a high pass filter.
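As a concrete illustration, here is a minimal sketch of one such step for the Haar wavelet. It assumes the simplest normalization (pairwise average for the low pass result, half the pairwise difference for the high pass result); other texts scale by 1/sqrt(2) instead. The function name `haar_step` is hypothetical.

```python
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One Haar step: return (low pass, high pass), each half the input length."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    # Low pass (scaling function): average of each even/odd pair.
    smooth = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    # High pass (wavelet function): half the difference of each even/odd pair.
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return smooth, detail


smooth, detail = haar_step([4.0, 6.0, 10.0, 12.0])
print(smooth)  # [5.0, 11.0]
print(detail)  # [-1.0, -1.0]
```

Both outputs have half as many elements as the input, which is what "down sampled" means here.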
If we compare the Haar forward transform matrix to the Daubechies D4 transform matrix, we see that in the Haar matrix there is no overlap between successive pairs of scaling and wavelet functions, whereas in the Daubechies matrix there is.
The Haar high pass filter (wavelet function) produces a result that reflects the difference between an even element and the odd element that follows it. The difference between an odd element and its even successor will not be reflected in the coefficient band calculated by a single step of the Haar high pass filter (although this change will be picked up by later steps). In contrast, there is overlap between successive Daubechies high pass filters, so change between any two elements will be reflected in the result.
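A small example makes this blind spot visible. The sketch below assumes the average/half-difference form of the Haar step (the name `haar_step` and the normalization are my assumptions) and a signal whose only jump falls between an odd element and its even successor.

```python
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One Haar step: pairwise averages (low pass) and half-differences (high pass)."""
    smooth = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return smooth, detail


# The only change in this signal is between index 1 (odd) and index 2 (even).
signal = [1.0, 1.0, 5.0, 5.0]

smooth1, detail1 = haar_step(signal)
print(detail1)  # [0.0, 0.0]  first step: the jump is invisible

smooth2, detail2 = haar_step(smooth1)
print(detail2)  # [-2.0]      second step: the jump is picked up
```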
The Daubechies D4 wavelet transform is more "accurate", since change in the input data set is reflected in the high pass filter results at each transform step. The cost of using the Daubechies algorithm is higher computation overhead (twice the number of operations, compared to Haar) and a more complicated algorithm (the algorithm must properly handle the edge condition where i=0). Whether the higher accuracy of the Daubechies algorithm is worth the cost is application dependent.
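To make the overlap concrete, here is a minimal sketch of one forward D4 step using the standard four D4 scaling coefficients. The edge handling is an assumption on my part: the filter window simply wraps around to the start of the signal (periodic boundary), which is one common choice but not the only one. The name `d4_step` is hypothetical.

```python
import math
from typing import List, Tuple

SQRT3 = math.sqrt(3.0)
DENOM = 4.0 * math.sqrt(2.0)

# Standard Daubechies D4 scaling (low pass) coefficients.
H = [(1 + SQRT3) / DENOM, (3 + SQRT3) / DENOM,
     (3 - SQRT3) / DENOM, (1 - SQRT3) / DENOM]
# Wavelet (high pass) coefficients derived from the scaling coefficients.
G = [H[3], -H[2], H[1], -H[0]]


def d4_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One forward D4 step with wrap-around (periodic) edge handling (assumed)."""
    n = len(signal)
    if n < 4 or n % 2 != 0:
        raise ValueError("signal length must be even and at least 4")
    smooth, detail = [], []
    for i in range(0, n, 2):
        # Four-element window; successive windows overlap by two elements.
        window = [signal[(i + k) % n] for k in range(4)]
        smooth.append(sum(h * x for h, x in zip(H, window)))
        detail.append(sum(g * x for g, x in zip(G, window)))
    return smooth, detail


# The jump sits between index 3 (odd) and index 4 (even).
smooth, detail = d4_step([1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0])
print(detail)
```

For this signal a single Haar step would give all-zero high pass coefficients, because every even/odd pair is constant. The D4 step instead gives a coefficient of about 1.414 (sqrt(2)) for the window that straddles the jump. The last coefficient is also nonzero, but only because the assumed wrap-around joins the end of the signal to its beginning. Each output pair uses four multiplications per filter rather than Haar's two, which is where the doubled operation count comes from.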
Wavelet packets are particular linear combinations of wavelets. They form bases which retain many of the orthogonality, smoothness, and localization properties of their parent wavelets. The coefficients in the linear combinations are computed by a recursive algorithm making each newly computed wavelet packet coefficient sequence the root of its own analysis tree.
One step in the wavelet transform calculates a low pass (scaling function) result and a high pass (wavelet function) result. The low pass result is a smoother version of the original signal (the average, in the case of the Haar wavelet). The low pass result recursively becomes the input to the next wavelet step, which calculates another low and high pass result, until only a single low pass (2^0) result is calculated. For more on the wavelet transform see Basic Lifting Scheme Wavelets.
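The recursion described above can be sketched as follows for the Haar wavelet. This is an illustration under stated assumptions, not the lifting scheme implementation: the names `haar_step` and `haar_transform`, the average/half-difference normalization, and the output ordering (final average first, then high pass bands from coarsest to finest) are all my own choices.

```python
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One Haar step: pairwise averages (low pass) and half-differences (high pass)."""
    smooth = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return smooth, detail


def haar_transform(signal: List[float]) -> List[float]:
    """Repeat the Haar step on the low pass result until one value remains."""
    n = len(signal)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("signal length must be a power of two")
    smooth = list(signal)
    coefficients: List[float] = []
    while len(smooth) > 1:
        smooth, detail = haar_step(smooth)
        # Put each new (coarser) high pass band in front of the finer ones.
        coefficients = detail + coefficients
    return smooth + coefficients


print(haar_transform([4.0, 6.0, 10.0, 12.0]))  # [8.0, -3.0, -1.0, -1.0]
```

The first element, 8.0, is the single remaining low pass result, which for Haar is the average of the whole signal.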
The wavelet transform applies the wavelet transform step to the low pass result. The wavelet packet transform applies the transform step to both the low pass and the high pass result.
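The difference is easiest to see in code. The sketch below builds a Haar wavelet packet decomposition by applying the step to every band at each level. The names `haar_step` and `haar_packet_transform` are hypothetical, the normalization is the simple average/half-difference form, and the bands are returned in the order the recursion produces them, which is not necessarily sorted by frequency.

```python
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One Haar step: pairwise averages (low pass) and half-differences (high pass)."""
    if len(signal) % 2 != 0:
        raise ValueError("band length must be even")
    smooth = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return smooth, detail


def haar_packet_transform(signal: List[float], levels: int) -> List[List[float]]:
    """Apply the Haar step to BOTH the low and high pass bands at each level."""
    bands = [list(signal)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            smooth, detail = haar_step(band)
            next_bands.append(smooth)  # low pass half of this band
            next_bands.append(detail)  # high pass half of this band
        bands = next_bands
    return bands


print(haar_packet_transform([4.0, 6.0, 10.0, 12.0], levels=2))
# [[8.0], [-3.0], [-1.0], [0.0]]
```

After `levels` steps there are 2**levels bands of equal length. The ordinary wavelet transform would have left the first-level high pass band [-1.0, -1.0] untouched; the packet transform splits it again into [-1.0] and [0.0].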
Haar Wavelet Packet Transform Example
By loading a wav file consisting of superimposed 280 cps and 440 cps signals, sampling to size 1024 frames, and invoking the Haar Wavelet (Packet) transform, then building a 3dm file under Surface mode, the following Rhino 3dm representation is produced:

The front wire-frame projection of this surface showing the mixed signal harmonic peaks is shown below.

A 2D plot of the generating matrix (column 3) is shown below.

Daubechies Wavelet Packet Transform Example
By loading a wav file consisting of superimposed 280 cps and 440 cps signals, sampling to size 1024 frames, and invoking the Daubechies Wavelet (Packet) transform, then building a 3dm file under Surface mode, the following Rhino 3dm representation is produced:

The front wire-frame projection of this surface showing the mixed signal harmonic peaks is shown below.

What we see by comparing the results of these two wavelet packet transforms is that, for this periodic signal data, the results are quite similar.