In one embodiment, a music detection (MD) module accumulates sets of one or more frames and performs FFT processing on each set to recover a set of coefficients, each corresponding to a different frequency k. For each frame, the module identifies candidate musical tones by searching for peak values in the set of coefficients. If a coefficient corresponds to a peak, then a variable TONE[k] corresponding to the coefficient is set equal to one. Otherwise, the variable is set equal to zero. For each variable TONE[k] having a value of one, a corresponding accumulator A[k] is increased. Candidate musical tones that are short in duration are filtered out by comparing each accumulator A[k] to a minimum duration threshold. A determination is made as to whether or not music is present based on a number of candidate musical tones and a sum of candidate musical tone durations using a state machine.