

structure provides a method for computing the reflection coefficients in the lattice
predictor.

A pipelined architecture for implementing the Schur algorithm. Kung and Hu (1983) developed a pipelined lattice-type processor for implementing the Schur algorithm. The processor consists of a cascade of p lattice-type stages, where each stage consists of two processing elements (PEs), which we designate as upper PEs denoted as A_1, A_2, ..., A_p and lower PEs denoted as B_1, B_2, ..., B_p, as shown in Fig. 11.5. The PE designated as A_1 is assigned the task of performing divisions. The remaining PEs perform one multiplication and one addition per iteration (one clock cycle).
Initially, the upper PEs are loaded with the elements of the first row of the generator matrix G_0, as illustrated in Fig. 11.5. The lower PEs are loaded with the elements of the second row of the generator matrix G_0. The computational process begins with the division PE, A_1, which computes the first reflection coefficient as K_1 = -γ_xx(1)/γ_xx(0). The value of K_1 is sent simultaneously to all the PEs in the upper branch and lower branch.
The second step in the computation is to update the contents of all processing
elements simultaneously. The contents of the upper and lower 𝑃𝐸𝑠 are updated
as follows:
PE A_m:  A_m ← A_m + K_1 B_m        m = 2, 3, ..., p

PE B_m:  B_m ← B_m + K_1* A_m       m = 1, 2, ..., p

The third step involves the shifting of the contents of the upper PEs one place to the left. Thus we have

PE A_m:  A_{m-1} ← A_m        m = 2, 3, ..., p

At this point, PE A_1 contains γ_xx(2) + K_1 γ_xx(1), while PE B_1 contains γ_xx(0) + K_1* γ_xx(1). Hence the processor A_1 is ready to begin the second cycle by computing

Figure 11.5 Pipelined parallel processor for computing the reflection coefficients

the second reflection coefficient K_2 = -A_1/B_1. The three computational steps beginning with this division are repeated until all p reflection coefficients are computed. Note that PE B_1 provides the minimum mean-square error E_m^f at each iteration.
If τ_d denotes the time for PE A_1 to perform a (complex) division and τ_ma is the time required to perform a (complex) multiplication and an addition, the time required to compute all p reflection coefficients is p(τ_d + τ_ma) for the Schur algorithm.
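To make the recursion concrete, the following Python sketch emulates the pipelined computation serially (one loop iteration per clock cycle). The function name and the exact loading of the generator-matrix rows into the PEs are our own choices, arranged so that the intermediate values quoted above (e.g., A_1 = γ_xx(2) + K_1 γ_xx(1) after the first cycle) are reproduced.

```python
import numpy as np

def schur_reflection_coeffs(gamma):
    """Serial emulation of the pipelined Schur recursion: compute the reflection
    coefficients K_1..K_p from the autocorrelation values gamma(0)..gamma(p)."""
    gamma = np.asarray(gamma, dtype=complex)
    p = len(gamma) - 1
    A = gamma[1:].copy()        # upper PEs A_1..A_p  <- gamma(1), ..., gamma(p)
    B = gamma[:-1].copy()       # lower PEs B_1..B_p  <- gamma(0), ..., gamma(p-1)
    K = np.zeros(p, dtype=complex)
    for m in range(p):
        K[m] = -A[0] / B[0]                 # division PE A_1
        A_new = A + K[m] * B                # simultaneous update of all PEs
        B_new = B + np.conj(K[m]) * A
        A = np.roll(A_new, -1)              # shift the upper PEs one place to the left
        A[-1] = 0.0
        B = B_new                           # B_1 now holds the prediction error E_m^f
    return K, B[0].real

# Example: for gamma(m) = (0.6)^m the process is AR(1), so K_1 = -0.6 and K_2 = K_3 = 0
K, E = schur_reflection_coeffs([1.0, 0.6, 0.36, 0.216])
print(K.real, E)        # [-0.6  0.  0.]   0.64
```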

11.4 PROPERTIES OF THE LINEAR PREDICTION-ERROR FILTERS

Linear prediction filters possess several important properties, which we now describe. We begin by demonstrating that the forward prediction-error filter is minimum phase.

Minimum-phase property of the forward prediction-error filter. We have already demonstrated that the reflection coefficients {K_i} are correlation coefficients, and consequently |K_i| ≤ 1 for all i. This condition and the relation E_m^f = (1 - |K_m|^2) E_{m-1}^f can be used to show that the zeros of the prediction-error filter are either all inside the unit circle or they are all on the unit circle.
First, we show that if E_p^f > 0, the zeros |z_i| < 1 for every i. The proof is by induction. Clearly, for p = 1 the system function for the prediction-error filter is

A_1(z) = 1 + K_1 z^{-1}        (11.4.1)

Hence z_1 = -K_1 and E_1^f = (1 - |K_1|^2) E_0^f > 0. Now, suppose that the hypothesis is true for p - 1. Then, if z_i is a root of A_p(z), we have from (11.2.16) and (11.2.18),
A_p(z_i) = A_{p-1}(z_i) + K_p z_i^{-1} B_{p-1}(z_i)
         = A_{p-1}(z_i) + K_p z_i^{-p} A_{p-1}*(1/z_i) = 0        (11.4.2)

Hence

1/K_p = - z_i^{-p} A_{p-1}*(1/z_i) / A_{p-1}(z_i) ≡ Q(z_i)        (11.4.3)

We note that the function Q(z) is all pass. In general, an all-pass function of the form

P(z) = ∏_{k=1}^{N} (z z_k* + 1)/(z + z_k)        |z_k| < 1        (11.4.4)

satisfies the property that |P(z)| > 1 for |z| < 1, |P(z)| = 1 for |z| = 1, and |P(z)| < 1 for |z| > 1. Since Q(z) = -P(z)/z, it follows that |z_i| < 1 if |Q(z_i)| > 1. Clearly, this is the case, since |Q(z_i)| = 1/|K_p| and E_p^f > 0 implies |K_p| < 1.
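A quick numerical illustration of this property: build A_p(z) from an arbitrary set of reflection coefficients with |K_m| < 1 via the step-up (Levinson) recursion for real data, and check that every zero lies inside the unit circle. The helper name below is ours.

```python
import numpy as np

def step_up(K):
    """Step-up recursion (real data): A_m(z) = A_{m-1}(z) + K_m z^{-m} A_{m-1}(z^{-1})."""
    a = np.array([1.0])
    for k in K:
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    return a

rng = np.random.default_rng(0)
K = rng.uniform(-0.99, 0.99, size=8)      # any reflection coefficients with |K_m| < 1
a = step_up(K)                            # coefficients of A_8(z)
print(np.abs(np.roots(a)).max())          # always < 1: the prediction-error filter is minimum phase
```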
On the other hand, suppose that E_{p-1}^f > 0 and E_p^f = 0. In this case |K_p| = 1 and |Q(z_i)| = 1. Since the MMSE is zero, the random process x(n) is called predictable or deterministic. Specifically, a purely sinusoidal random process of the form
x(n) = Σ_{k=1}^{M} α_k e^{j(n ω_k + θ_k)}        (11.4.5)
where the phases 𝜃𝑘 are statistically independent and uniformly distributed over (0,2𝜋) has the
autocorrelation
γ_xx(m) = Σ_{k=1}^{M} α_k^2 e^{j m ω_k}        (11.4.6)

and the power density spectrum


Γ_xx(f) = Σ_{k=1}^{M} α_k^2 δ(f - f_k)        f_k = ω_k / 2π        (11.4.7)
This process is predictable with a predictor of order 𝑝 ≥ 𝑀.
To demonstrate the validity of the statement, consider passing this process through a prediction-error filter of order p ≥ M. The MSE at the output of this filter is
E_p^f = ∫_{-1/2}^{1/2} Γ_xx(f) |A_p(f)|^2 df

      = ∫_{-1/2}^{1/2} [ Σ_{k=1}^{M} α_k^2 δ(f - f_k) ] |A_p(f)|^2 df        (11.4.8)

      = Σ_{k=1}^{M} α_k^2 |A_p(f_k)|^2

By choosing M of the p zeros of the prediction-error filter to coincide with the frequencies {f_k}, the MSE E_p^f can be forced to zero. The remaining p - M zeros can be selected arbitrarily to be anywhere inside the unit circle.
Finally, the reader can prove that if a random process consists of a mixture of a continuous power spectral density and a discrete spectrum, the prediction-error filter must have all its roots inside the unit circle.

Maximum-phase property of the backward prediction-error filter. The system function for the backward prediction-error filter of order p is

B_p(z) = z^{-p} A_p*(z^{-1})        (11.4.9)

Consequently, the roots of B_p(z) are the reciprocals of the roots of the forward prediction-error filter with system function A_p(z). Hence if A_p(z) is minimum phase, then B_p(z) is maximum phase. However, if the process x(n) is predictable, all the roots of B_p(z) lie on the unit circle.
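This reciprocal-root relationship is easy to verify numerically; in the real-data case b_p(k) = a_p(p - k), so B_p(z) simply has the coefficients of A_p(z) in reverse order. The coefficient values below are arbitrary.

```python
import numpy as np

a = np.array([1.0, -0.9, 0.2])       # a minimum-phase A_2(z): zeros at 0.5 and 0.4
b = a[::-1]                          # b_2(k) = a_2(2-k)  =>  B_2(z) = z^{-2} A_2(z^{-1})
print(np.sort(np.abs(np.roots(a))))  # [0.4  0.5]   all inside the unit circle
print(np.sort(np.abs(np.roots(b))))  # [2.   2.5]   the reciprocals, outside the unit circle
```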

Whitening property. Suppose that the random process x(n) is an AR(p) stationary random process that is generated by passing white noise with variance σ_w^2 through an all-pole filter with system function

H(z) = 1 / (1 + Σ_{k=1}^{p} a_k z^{-k})        (11.4.10)

Then the prediction-error filter of order p has the system function

A_p(z) = 1 + Σ_{k=1}^{p} a_k z^{-k}        (11.4.11)

where the predictor coefficients a_p(k) = a_k. The response of the prediction-error filter is the white noise sequence {w(n)}. In this case the prediction-error filter whitens the input random process x(n) and is called a whitening filter, as indicated in Section 11.2.
More generally, even if the input process x(n) is not an AR process, the prediction-error filter attempts to remove the correlation among the signal samples of the input process. As the order of the predictor is increased, the predictor output x̂(n) becomes a closer approximation to x(n) and hence the difference f(n) = x(n) - x̂(n) approaches a white noise sequence.
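The whitening property is easy to demonstrate by simulation: generate an AR(2) process by passing white noise through 1/A_2(z) and then filter it with A_2(z); the output autocorrelation is (essentially) an impulse. The coefficients below are arbitrary stable values chosen for the illustration.

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
a = np.array([1.0, -0.75, 0.5])      # A_2(z) = 1 - 0.75 z^-1 + 0.5 z^-2 (poles inside |z| = 1)
w = rng.standard_normal(100_000)
x = lfilter([1.0], a, w)             # AR(2) process generated by the all-pole filter 1/A_2(z)
f = lfilter(a, [1.0], x)             # forward prediction error f(n) = A_2(z) x(n)

def acorr(v, lags=4):
    v = v - v.mean()
    return np.array([np.dot(v[:v.size - l], v[l:]) / v.size for l in range(lags)])

print(np.round(acorr(x) / acorr(x)[0], 3))   # clearly correlated input
print(np.round(acorr(f) / acorr(f)[0], 3))   # ~[1, 0, 0, 0]: the prediction error is white
```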

Orthogonality of the backward prediction errors. The backward prediction errors {g_m(k)} from different stages in the FIR lattice filter are orthogonal. That is,

E[g_m(n) g_l*(n)] = { 0,       0 ≤ l ≤ m-1
                    { E_m^b,   l = m        (11.4.12)
This property is easily proved by substituting for g_m(n) and g_l(n) into (11.4.12) and carrying out the expectation. Thus

E[g_m(n) g_l*(n)] = Σ_{k=0}^{m} b_m(k) Σ_{j=0}^{l} b_l*(j) E[x(n-k) x*(n-j)]

                  = Σ_{j=0}^{l} b_l*(j) Σ_{k=0}^{m} b_m(k) γ_xx(j-k)        (11.4.13)

But the normal equations for the backward linear predictor require that

Σ_{k=0}^{m} b_m(k) γ_xx(j-k) = { 0,       j = 0, 1, ..., m-1
                               { E_m^b,   j = m        (11.4.14)

Therefore,
E[g_m(n) g_l*(n)] = { E_m^b = E_m^f,   m = l
                    { 0,               0 ≤ l ≤ m-1        (11.4.15)
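The following sketch checks (11.4.12) numerically, assuming real data: the reflection coefficients of a test AR(2) process are obtained from its estimated autocorrelation with a small Levinson-Durbin routine, the backward errors g_m(n) are generated with the FIR lattice recursions, and their sample cross-correlations are shown to be approximately zero for l ≠ m. All function names are ours.

```python
import numpy as np
from scipy.signal import lfilter

def levinson_reflection(gamma):
    """Reflection coefficients K_1..K_p from the autocorrelation gamma(0)..gamma(p)."""
    a, E, K = np.array([1.0]), gamma[0], []
    for m in range(1, len(gamma)):
        k = -np.dot(a, gamma[m:0:-1]) / E
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
        E *= (1.0 - k * k)
        K.append(k)
    return K

def backward_errors(x, K):
    """FIR lattice: f_m(n) = f_{m-1}(n) + K_m g_{m-1}(n-1), g_m(n) = K_m f_{m-1}(n) + g_{m-1}(n-1)."""
    f, g, G = x.copy(), x.copy(), [x.copy()]
    for k in K:
        g1 = np.concatenate([[0.0], g[:-1]])        # g_{m-1}(n-1)
        f, g = f + k * g1, k * f + g1
        G.append(g.copy())
    return G

rng = np.random.default_rng(2)
x = lfilter([1.0], [1.0, -0.9, 0.5], rng.standard_normal(200_000))   # AR(2) test process
gamma = np.array([np.dot(x[:x.size - m], x[m:]) / x.size for m in range(4)])
G = backward_errors(x, levinson_reflection(gamma))
C = np.array([[np.mean(gm * gl) for gl in G] for gm in G])
print(np.round(C, 2))     # off-diagonal terms ~0; diagonal terms are the E_m^b
```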

Additional properties. There are a number of other interesting properties regarding the forward and
backward prediction errors in the 𝐹𝐼𝑅 lattice filter. These are given here for real-valued data. Their
proof is left as an exercise for the reader.
(𝒃) 𝑬[𝒈𝒎 (𝒏)𝒙(𝒏 − 𝒊)] = 𝟎 𝟏≤𝒊≤𝒎−𝟏

(𝒄) 𝑬[𝒇𝒎 (𝒏)𝒙(𝒏)] = 𝑬[𝒈𝒎 (𝒏)𝒙(𝒏 − 𝒎)] = 𝑬𝒎

(𝒅) 𝑬[𝒇𝒊 (𝒏)𝒇𝒋 (𝒏)] = 𝑬𝒎𝒂𝒙 (𝒊, 𝒋)

𝟏≤𝒕 ≤𝒊−𝒋 , 𝒊>𝒋


(𝒆) 𝑬[𝒇𝒊 (𝒏)𝒇𝒋 (𝒏 − 𝒕)] = 𝟎, 𝒇𝒐𝒓 {
−𝟏 ≥ 𝒕 ≥ 𝒊 − 𝒋, 𝒊<𝒋

𝟎 ≤𝒕 ≤𝒊−𝒋 , 𝒊>𝒋
(𝒇) 𝑬[𝒈𝒊 (𝒏)𝒈𝒋 (𝒏 − 𝒕)] = 𝟎, 𝒇𝒐𝒓 {
𝟎 ≥ 𝒕 ≥ 𝒊 − 𝒋 + 𝟏, 𝒊<𝒋

𝑬𝒊 , 𝒊=𝒋
(𝒈) 𝑬[𝒇𝒊 (𝒏 + 𝒊)𝒇𝒋 (𝒏 + 𝒋)] = {
𝟎, 𝒊≠𝒋

(𝒉) 𝑬[𝒈𝒊 (𝒏 + 𝒊)𝒈𝒋 (𝒏 + 𝒋)] = 𝑬𝒎𝒂𝒙 (𝒊, 𝒋)

(𝒊) 𝑬[𝒇𝒊 (𝒏)𝒈𝒋 (𝒏)]

𝒌𝒋 𝑬𝒊 , 𝒊 ≥ 𝒋, 𝒊, 𝒋 ≥ 𝟎, 𝒌𝟎 = 𝟏
={
𝟎, 𝒊<𝒋

(𝒋) 𝑬[𝒇𝒊 (𝒏)𝒈𝒋 (𝒏 − 𝟏)] = 𝒌𝒊+𝟏 𝑬𝒊

(𝒌) 𝑬[𝒈𝒊 (𝒏 − 𝟏)𝒙(𝒏)] = 𝑬[𝒇𝒊 (𝒏 + 𝟏)𝒙(𝒏 − 𝟏)] = 𝒌𝒊+𝟏 𝑬𝒊

𝟎 𝒊 > 𝒋,
(𝒍) 𝑬[𝒇𝒊 (𝒏)𝒈𝒋 (𝒏 − 𝟏)] = { −𝒌𝒋+𝟏 𝑬𝒊 , 𝒊≤𝒋

11.5 AR LATTICE AND ARMA LATTICE-LADDER FILTERS

In Section 11.2 we showed the relationship of the all-zero FIR lattice to linear prediction. The linear predictor with transfer function

A_p(z) = 1 + Σ_{k=1}^{p} a_p(k) z^{-k}        (11.5.1)

when excited by an input random process {x(n)}, produces an output that approaches a white noise sequence as p → ∞. On the other hand, if the input process is an AR(p) process, the output of A_p(z) is white. Since A_p(z) generates an MA(p) process when excited with a white noise sequence, the all-zero lattice is sometimes called an MA lattice.
In the following section, we develop the lattice structure for the inverse filter 1/A_p(z), called the AR lattice, and the lattice-ladder structure for an ARMA process.

11.5.1 AR Lattice Structure

Let us consider an all-pole system with system function

H(z) = 1 / (1 + Σ_{k=1}^{p} a_p(k) z^{-k})        (11.5.2)

The difference equation for this IIR system is

y(n) = - Σ_{k=1}^{p} a_p(k) y(n-k) + x(n)        (11.5.3)
Now suppose that we interchange the roles of the input and output [i.e., interchange x(n) with y(n) in (11.5.3)], obtaining the difference equation

x(n) = - Σ_{k=1}^{p} a_p(k) x(n-k) + y(n)

or, equivalently,

y(n) = x(n) + Σ_{k=1}^{p} a_p(k) x(n-k)        (11.5.4)

We observe that (11.5.4) is a difference equation for an FIR system with system function A_p(z). Thus an all-pole IIR system can be converted to an all-zero system by interchanging the roles of the input and output.
Based on this observation, we can obtain the structure of an AR(p) lattice from an MA(p) lattice by interchanging the input with the output. Since the MA(p) lattice has y(n) = f_p(n) as its output and x(n) = f_0(n) as its input, we let

x(n) = f_p(n)
y(n) = f_0(n)

These definitions dictate that the quantities {f_m(n)} be computed in descending order. This computation can be accomplished by rearranging the recursive equation for {f_m(n)} in (11.2.4) and solving for f_{m-1}(n) in terms of f_m(n). Thus we obtain

f_{m-1}(n) = f_m(n) - K_m g_{m-1}(n-1)        m = p, p-1, ..., 1

The equation for g_m(n) remains unchanged. The result of these changes is the set of equations

x(n) = f_p(n)
f_{m-1}(n) = f_m(n) - K_m g_{m-1}(n-1)        m = p, p-1, ..., 1        (11.5.6)
g_m(n) = K_m* f_{m-1}(n) + g_{m-1}(n-1)        m = p, p-1, ..., 1
y(n) = f_0(n) = g_0(n)


The corresponding structure for the AR(p) lattice is shown in Fig. 11.6. Note that the all-pole lattice structure has an all-zero path with input g_0(n) and output g_p(n), which is identical to the all-zero path in the MA(p) lattice structure. This is not surprising, since the equation for g_m(n) is identical in the two lattice structures.

Figure 11.6 Lattice structure for an all-pole system

We also observe that the AR(p) and MA(p) lattice structures are characterized by the same parameters, namely, the reflection coefficients {K_i}. Consequently, the equations given in (11.2.21) and (11.2.23) for converting between the system parameters {a_p(k)} in the direct-form realizations of the all-zero system A_p(z) and the lattice parameters {K_i} of the MA(p) lattice structure apply as well to the all-pole structures.
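A direct sample-by-sample implementation of the all-pole lattice equations in (11.5.6) is sketched below for real-valued reflection coefficients; it is checked against the equivalent direct-form filter 1/A_p(z), with A_p(z) obtained by the step-up recursion. The function names are ours.

```python
import numpy as np
from scipy.signal import lfilter

def ar_lattice(x, K):
    """All-pole (AR) lattice per (11.5.6): x(n) = f_p(n), output y(n) = f_0(n) = g_0(n)."""
    p = len(K)
    g_del = np.zeros(p)                   # g_del[m] holds g_m(n-1), m = 0..p-1
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        f = np.zeros(p + 1)
        f[p] = xn
        for m in range(p, 0, -1):         # f_{m-1}(n) = f_m(n) - K_m g_{m-1}(n-1)
            f[m - 1] = f[m] - K[m - 1] * g_del[m - 1]
        g = np.zeros(p)
        g[0] = f[0]                       # g_0(n) = f_0(n) = y(n)
        for m in range(1, p):             # g_m(n) = K_m f_{m-1}(n) + g_{m-1}(n-1)
            g[m] = K[m - 1] * f[m - 1] + g_del[m - 1]
        y[n] = f[0]
        g_del = g
    return y

def step_up(K):                           # build A_p(z) from the reflection coefficients
    a = np.array([1.0])
    for k in K:
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    return a

rng = np.random.default_rng(4)
x = rng.standard_normal(50)
K = [-0.6, 0.3, 0.2]
print(np.allclose(ar_lattice(x, K), lfilter([1.0], step_up(K), x)))   # True
```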

11.5.2 ARMA Processes and Lattice-Ladder Filters

The all-pole lattice provides the basic building block for lattice-type structures that implement IIR systems that contain both poles and zeros. To construct the appropriate structure, let us consider an IIR system with system function

H(z) = C_q(z) / A_p(z) = Σ_{k=0}^{q} c_q(k) z^{-k} / (1 + Σ_{k=1}^{p} a_p(k) z^{-k})        (11.5.7)
Without loss of generality, we assume that 𝑝 ≥ 𝑞.
This system is described by the difference equations

v(n) = - Σ_{k=1}^{p} a_p(k) v(n-k) + x(n)
                                                        (11.5.8)
y(n) = Σ_{k=0}^{q} c_q(k) v(n-k)
obtained by viewing the system as a cascade of an all-pole system followed by an
all-zero system. From (11.5.8) we observe that the output y(n) is simply a linear
combination of delayed outputs from the all-pole system.
Since zeros result from forming a linear combination of previous outputs, we can
carry over this observation to construct a pole-zero system using the all-pole

lattice structure as the basic building block. We have clearly observed that g_m(n) in the all-pole lattice can be expressed as a linear combination of present and past outputs. In fact, the system

H_b(z) = G_m(z) / Y(z)        (11.5.9)

is an all-zero system. Therefore, any linear combination of {g_m(n)} is also an all-zero filter.
Let us begin with an all-pole lattice filter with coefficients K_m, 1 ≤ m ≤ p, and add a ladder part by taking as the output a weighted linear combination of {g_m(n)}. The result is a pole-zero filter that has the lattice-ladder structure shown in Fig. 11.7. Its output is

y(n) = Σ_{k=0}^{q} β_k g_k(n)        (11.5.10)

where {β_k} are the parameters that determine the zeros of the system. The system function corresponding to (11.5.10) is

H(z) = Y(z)/X(z) = Σ_{k=0}^{q} β_k G_k(z)/X(z)        (11.5.11)
Figure 11.7 Lattice-ladder structure for pole-zero system: (a) pole-zero system; (b) mth stage of lattice
Since X(z) = F_p(z) and F_0(z) = G_0(z), (11.5.11) can be expressed as

H(z) = Σ_{k=0}^{q} β_k [G_k(z) F_0(z)] / [G_0(z) F_p(z)]

     = (1 / A_p(z)) Σ_{k=0}^{q} β_k B_k(z)        (11.5.12)
Therefore,

C_q(z) = Σ_{k=0}^{q} β_k B_k(z)        (11.5.13)

This is the desired relationship that can be used to determine the weighting coefficients {β_k}, as previously shown in Section 7.3.5.
Given the polynomials C_q(z) and A_p(z), where p ≥ q, the reflection coefficients {K_i} are determined first from the coefficients {a_p(k)}. By means of the step-down recursive relation given by (11.2.22), we also obtain the polynomials B_k(z), k = 1, 2, ..., p. Then the ladder parameters can be obtained from (11.5.13), which can be expressed as

C_m(z) = Σ_{k=0}^{m-1} β_k B_k(z) + β_m B_m(z)
       = C_{m-1}(z) + β_m B_m(z)        (11.5.14)

or, equivalently,

C_{m-1}(z) = C_m(z) - β_m B_m(z)        m = p, p-1, ..., 1        (11.5.15)

By running this recursive relation backward, we can generate all the lower-degree polynomials C_m(z), m = p-1, ..., 1. Since b_m(m) = 1, the parameters β_m are determined from (11.5.15) by setting

β_m = c_m(m)        m = p, p-1, ..., 1, 0
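The sketch below (real-valued data, our own function names) carries out this conversion: the step-down recursion recovers {K_m} and the polynomials A_m(z), hence B_m(z), whose coefficients are those of A_m(z) in reverse order, and the backward recursion (11.5.15) then yields the ladder weights β_m. The final lines verify that Σ_k β_k B_k(z) reproduces C_q(z).

```python
import numpy as np

def step_down(a):
    """From A_p(z) (a[0] = 1) recover the reflection coefficients K_1..K_p and A_m(z), m = 0..p."""
    A = [np.asarray(a, dtype=float)]
    K = []
    am = A[0]
    for m in range(len(a) - 1, 0, -1):
        km = am[m]
        K.append(km)
        am = (am[:m] - km * am[m:0:-1]) / (1.0 - km * km)   # A_{m-1}(z)
        A.append(am)
    return K[::-1], A[::-1]                                 # K_1..K_p and [A_0, ..., A_p]

def ladder_coefficients(c, a):
    """Ladder weights beta_0..beta_q for H(z) = C_q(z)/A_p(z), per (11.5.13)-(11.5.15)."""
    K, A = step_down(a)
    q = len(c) - 1
    c = np.asarray(c, dtype=float).copy()
    beta = np.zeros(q + 1)
    for m in range(q, -1, -1):
        beta[m] = c[m]                        # beta_m = c_m(m), since b_m(m) = 1
        c[:m + 1] -= beta[m] * A[m][::-1]     # C_{m-1}(z) = C_m(z) - beta_m B_m(z)
    return K, beta

a = [1.0, 0.4, -0.2, 0.1]                     # A_3(z)
c = [1.0, 0.5, 0.25]                          # C_2(z), q = 2 <= p = 3
K, beta = ladder_coefficients(c, a)
_, A = step_down(a)
recon = np.zeros(len(c))
for k, bk in enumerate(beta):                 # rebuild C_q(z) = sum_k beta_k B_k(z)
    recon[:k + 1] += bk * A[k][::-1]
print(np.allclose(recon, c), K)               # True, and all |K_m| < 1 here
```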
When excited by a white noise sequence, this lattice-ladder filter structure generates an ARMA(p, q) process that has a power density spectrum

Γ_xx(f) = σ_w^2 |C_q(f)|^2 / |A_p(f)|^2        (11.5.16)

and an autocorrelation function that satisfies (11.1.18), where σ_w^2 is the variance of the input white noise sequence.

11.6 WIENER FILTERS FOR FILTERING AND PREDICTION

In many practical applications we are given an input signal {𝑥(𝑛)}, consisting of the

sum of a desired signal {s(n)} and an undesired noise or interference {w(n)}, and we are asked to design a filter that suppresses the undesired interference component. In such a case, the objective is to design a system that filters out the additive interference while preserving the characteristics of the desired signal {s(n)}.

Figure 11.8 Model for linear estimation problem
In this section we treat the problem of signal estimation in the presence of an additive
noise disturbance. The estimator is constrained to be a linear filter with impulse response
{ℎ(𝑛)}, designed so that its output approximates some specified desired signal sequence {𝑑(𝑛)}.
Figure 11.8 illustrates the linear estimation problem.
The input sequence to the filter is x(n) = s(n) + w(n), and its output sequence is
y(n). The difference between the desired signal and the filter output is the error sequence 𝑒(𝑛) =
𝑑(𝑛) − 𝑦(𝑛).
We distinguish three special cases:

1. If 𝑑(𝑛) = 𝑠(𝑛), the linear estimation problem is referred to as filtering.


2. If d(n) = s(n + D), where D > 0, the linear estimation problem is referred to
as signal prediction. Note that this problem is different than the prediction considered earlier in
this chapter, where d(n) = x(n + D), D ≥ 0.
3. If 𝑑(𝑛) = 𝑠(𝑛 − 𝐷), where 𝐷 > 0, the linear estimation problem is referred to as
signal smoothing.

Our treatment will concentrate on filtering and prediction.


The criterion selected for optimizing the filter impulse response {h(n)} is the minimization of the mean-square error. This criterion has the advantages of simplicity and mathematical tractability.
The basic assumptions are that the sequences {s(n)}, {w(n)}, and {d(n)} are zero mean and wide-sense stationary. The linear filter will be assumed to be either FIR or IIR. If it is IIR, we assume that the input data {x(n)} are available over the infinite past. We begin with the design of the optimum FIR filter. The optimum linear filter, in the sense of minimum mean-square error (MMSE), is called a Wiener filter.

11.6.1 FIR Wiener Filter

Suppose that the filter is constrained to be of length M with coefficients {h(k), 0 ≤ k ≤ M-1}. Hence its output y(n) depends on the finite data record x(n), x(n-1), ..., x(n-M+1),

y(n) = Σ_{k=0}^{M-1} h(k) x(n-k)        (11.6.1)

The mean-square value of the error between the desired output d(n) and y(n) is

ℰ_M = E|e(n)|^2
    = E| d(n) - Σ_{k=0}^{M-1} h(k) x(n-k) |^2        (11.6.2)
Since this is a quadratic function of the filter coefficients, the minimization of ℰ𝑀 yields the set of linear
equations
Σ_{k=0}^{M-1} h(k) γ_xx(l-k) = γ_dx(l)        l = 0, 1, ..., M-1        (11.6.3)

where γ_xx(k) is the autocorrelation of the input sequence {x(n)} and γ_dx(k) = E[d(n) x*(n-k)] is the crosscorrelation between the desired sequence {d(n)} and the input sequence {x(n)}, 0 ≤ n ≤ M-1. The set of linear equations that specify the optimum filter is called the Wiener-Hopf equation. These equations are also called the normal equations, encountered earlier in the chapter in the context of linear one-step prediction.
In general, the equations in (11.6.3) can be expressed in matrix form as

Γ_M h_M = γ_d        (11.6.4)

where Γ_M is an M × M (Hermitian) Toeplitz matrix with elements Γ_lk = γ_xx(l-k) and γ_d is the M × 1 crosscorrelation vector with elements γ_dx(l), l = 0, 1, ..., M-1. The solution for the optimum filter coefficients is

h_opt = Γ_M^{-1} γ_d        (11.6.5)

and the resulting minimum MSE achieved by the Wiener filter is

MMSE_M = min_{h_M} ℰ_M = σ_d^2 - Σ_{k=0}^{M-1} h_opt(k) γ_dx*(k)        (11.6.6)

or, equivalently,

MMSE_M = σ_d^2 - γ_d^{*T} Γ_M^{-1} γ_d        (11.6.7)

where σ_d^2 = E|d(n)|^2.
Let us consider some special cases of (11.6.3). If we are dealing with filtering, then d(n) = s(n).
Furthermore, if 𝑠(𝑛) 𝑎𝑛𝑑 𝑤(𝑛) are uncorrelated random sequences,
as is usually the case in practice, then
𝜸𝒙𝒙 (𝒌) = 𝜸𝒔𝒔 (𝒌) + 𝜸𝒘𝒘 (𝒌)
(𝟏𝟏. 𝟔. 𝟖)
𝜸𝒅𝒙 (𝒌) = 𝜸𝒔𝒔 (𝒌)

and the normal equations in (11.6.3) become


Σ_{k=0}^{M-1} h(k) [γ_ss(l-k) + γ_ww(l-k)] = γ_ss(l)        l = 0, 1, ..., M-1        (11.6.9)
If we are dealing with prediction, then d(n) = s(n + D) where D > 0. Assuming that s(n) and w(n) are uncorrelated random sequences, we have

γ_dx(l) = γ_ss(l + D)        (11.6.10)
Hence the equations for the Wiener prediction filter become
Σ_{k=0}^{M-1} h(k) [γ_ss(l-k) + γ_ww(l-k)] = γ_ss(l + D)        l = 0, 1, ..., M-1        (11.6.11)

In all these cases, the correlation matrix to be inverted is Toeplitz. Hence the (generalized) Levinson-Durbin algorithm may be used to solve for the optimum filter coefficients.

Example 11.6.1
Let us consider a signal x(n) = s(n) + w(n), where s(n) is an AR(1) process that satisfies the difference equation

s(n) = 0.6 s(n-1) + v(n)

where {v(n)} is a white noise sequence with variance σ_v^2 = 0.64 and {w(n)} is a white noise sequence with variance σ_w^2 = 1. We will design a Wiener filter of length M = 2 to estimate {s(n)}.

Solution  Since {s(n)} is obtained by exciting a single-pole filter by white noise, the power spectral density of s(n) is

Γ_ss(f) = σ_v^2 |H(f)|^2
        = 0.64 / |1 - 0.6 e^{-j2πf}|^2
        = 0.64 / (1.36 - 1.2 cos 2πf)

The corresponding autocorrelation sequence {𝛾𝑠𝑠 (𝑚)} is

𝛾𝑠𝑠 (𝑚) = (0.6)|𝑚|

The equations for the filter coefficients are


2h(0) + 0.6h(1) = 1
0.6h(0) + 2h(1) = 0.6

The solution of these equations yields the result

h(0) = 0.451        h(1) = 0.165
The corresponding minimum 𝑀𝑆𝐸 is
𝑀𝑀𝑆𝐸2 = 1 − ℎ(0)𝛾𝑠𝑠 (0) − ℎ(1)𝛾𝑠𝑠 (1)
= 1 − 0.451 − (0.165)(0.6)
= 0.45
This error can be reduced further by increasing the length of the Wiener filter (see
Problem 11.27).
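The numbers in this example are easy to reproduce; since Γ_M is Toeplitz, the short sketch below uses scipy.linalg.solve_toeplitz to solve (11.6.4) and then evaluates (11.6.6).

```python
import numpy as np
from scipy.linalg import solve_toeplitz

gamma_ss = 0.6 ** np.arange(2)            # [gamma_ss(0), gamma_ss(1)] = [1, 0.6]
gamma_xx = gamma_ss.copy()
gamma_xx[0] += 1.0                        # gamma_xx(k) = gamma_ss(k) + gamma_ww(k), sigma_w^2 = 1
h = solve_toeplitz(gamma_xx, gamma_ss)    # solve Gamma_M h = gamma_d, (11.6.4)
mmse = 1.0 - np.dot(h, gamma_ss)          # (11.6.6) with sigma_d^2 = gamma_ss(0) = 1
print(np.round(h, 3), np.round(mmse, 2))  # [0.451 0.165]   0.45
```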

11.6.2 Orthogonality Principle in Linear Mean-Square Estimation

The normal equations for the optimum filter coefficients given by (11.6.3) can be obtained directly by applying the orthogonality principle in linear mean-square estimation. Simply stated, the mean-square error ℰ_M in (11.6.2) is a minimum if the filter coefficients {h(k)} are selected such that the error is orthogonal to each of the data points in the estimate,

E[e(n) x*(n-l)] = 0        l = 0, 1, ..., M-1        (11.6.12)
where

e(n) = d(n) - Σ_{k=0}^{M-1} h(k) x(n-k)        (11.6.13)

Conversely, if the filter coefficients satisfy (11.6.12), the resulting MSE is a minimum.
When viewed geometrically, the output of the filter, which is the estimate

d̂(n) = Σ_{k=0}^{M-1} h(k) x(n-k)        (11.6.14)

is a vector in the subspace spanned by the data {x(k), 0 ≤ k ≤ M-1}. The error e(n) is a vector from d(n) to d̂(n) [i.e., d(n) = e(n) + d̂(n)], as shown in Fig. 11.9. The orthogonality principle states that the length ℰ_M = E|e(n)|^2 is a minimum when e(n) is perpendicular to the data subspace [i.e., e(n) is orthogonal to each data point x(k), 0 ≤ k ≤ M-1].
We note that the solution obtained from the normal equations in (11.6.3) is
unique if the data {𝑥(𝑛)} in the estimate 𝒅 ̂ (𝒏) are linearly independent. In this
case, the correlation matrix 𝜞𝑚 is nonsingular. On the other hand, if the data are
linearly dependent, the rank of 𝜞𝑚 is less than M and therefore the solution is not
unique. In this case, the estimate 𝒅̂ (𝒏) can be expressed as a linear combination
of a reduced set of linearly independent data points equal to the rank of 𝜞𝑚 .
Since the MSE is minimized by selecting the filter coefficients to satisfy the
orthogonality principle, the residual minimum MSE is simply
MMSE_M = E[e(n) d*(n)]        (11.6.15)
which yields the result given in (11.6.6).

Figure 11.9 Geometric interpretation of linear MSE problem

11.6.3 IIR Wiener Filter

In the preceding section we constrained the filter to be FIR and obtained a set of M linear equations for the optimum filter coefficients. In this section we allow the filter to be infinite in duration (IIR) and the data sequence to be infinite as well. Hence the filter output is

y(n) = Σ_{k=0}^{∞} h(k) x(n-k)        (11.6.16)
The filter coefficients are selected to minimize the mean-square error between the desired output d(n) and y(n), that is,

ℰ_∞ = E|e(n)|^2
    = E| d(n) - Σ_{k=0}^{∞} h(k) x(n-k) |^2        (11.6.17)
Application of the orthogonality principle leads to the Wiener-Hopf equation

Σ_{k=0}^{∞} h(k) γ_xx(l-k) = γ_dx(l)        l ≥ 0        (11.6.18)
The residual MMSE is simply obtained by application of the condition given by (11.6.15). Thus we obtain

MMSE_∞ = min_h ℰ_∞ = σ_d^2 - Σ_{k=0}^{∞} h_opt(k) γ_dx*(k)        (11.6.19)
The Wiener-Hopf equation given by (11.6.18) cannot be solved directly with z-transform techniques, because the equation holds only for l ≥ 0. We shall solve for the optimum IIR Wiener filter based on the innovations representation of the stationary random process {x(n)}.
Recall that a stationary random process {x(n)} with autocorrelation γ_xx(m) and power spectral density Γ_xx(f) can be represented by an equivalent innovations process {i(n)}, by passing {x(n)} through a noise-whitening filter with system function 1/G(z), where G(z) is the minimum-phase part obtained from the spectral factorization of Γ_xx(z):

Γ_xx(z) = σ_i^2 G(z) G(z^{-1})        (11.6.20)

Hence G(z) is analytic in the region |z| > r_1, where r_1 < 1.
Now, the optimum Wiener filter can be viewed as the cascade of the whitening filter 1/G(z) with a second filter, say Q(z), whose output y(n) is identical to the output of the optimum Wiener filter. Since

y(n) = Σ_{k=0}^{∞} q(k) i(n-k)        (11.6.21)
and e(n) = d(n) - y(n), application of the orthogonality principle yields the new Wiener-Hopf equation as

Σ_{k=0}^{∞} q(k) γ_ii(l-k) = γ_di(l)        l ≥ 0        (11.6.22)

But since {i(n)} is white, it follows that γ_ii(l-k) = 0 unless l = k. Thus we obtain the solution as

q(l) = γ_di(l)/γ_ii(0) = γ_di(l)/σ_i^2        l ≥ 0        (11.6.23)

The z-transform of the sequence {q(l)} is

Q(z) = Σ_{k=0}^{∞} q(k) z^{-k}
     = (1/σ_i^2) Σ_{k=0}^{∞} γ_di(k) z^{-k}        (11.6.24)
If we denote the z-transform of the two-sided crosscorrelation sequence γ_di(k) by Γ_di(z),

Γ_di(z) = Σ_{k=-∞}^{∞} γ_di(k) z^{-k}        (11.6.25)

and define [Γ_di(z)]_+ as

[Γ_di(z)]_+ = Σ_{k=0}^{∞} γ_di(k) z^{-k}        (11.6.26)

then

Q(z) = (1/σ_i^2) [Γ_di(z)]_+        (11.6.27)

To determine [Γ_di(z)]_+, we begin with the output of the noise-whitening filter, which can be expressed as

i(n) = Σ_{k=0}^{∞} v(k) x(n-k)        (11.6.28)

where {v(k), k ≥ 0} is the impulse response of the noise-whitening filter,

1/G(z) ≡ V(z) = Σ_{k=0}^{∞} v(k) z^{-k}        (11.6.29)
Then

γ_di(k) = E[d(n) i*(n-k)]
        = Σ_{m=0}^{∞} v(m) E[d(n) x*(n-m-k)]        (11.6.30)
        = Σ_{m=0}^{∞} v(m) γ_dx(k+m)

The z-transform of the crosscorrelation γ_di(k) is

Γ_di(z) = Σ_{k=-∞}^{∞} [ Σ_{m=0}^{∞} v(m) γ_dx(k+m) ] z^{-k}

        = Σ_{m=0}^{∞} v(m) Σ_{k=-∞}^{∞} γ_dx(k+m) z^{-k}        (11.6.31)

        = Σ_{m=0}^{∞} v(m) z^{m} Σ_{k=-∞}^{∞} γ_dx(k) z^{-k}

        = V(z^{-1}) Γ_dx(z) = Γ_dx(z) / G(z^{-1})
Therefore,

Q(z) = (1/σ_i^2) [ Γ_dx(z) / G(z^{-1}) ]_+        (11.6.32)

Finally, the optimum IIR Wiener filter has the system function

H_opt(z) = Q(z)/G(z)
         = (1 / (σ_i^2 G(z))) [ Γ_dx(z) / G(z^{-1}) ]_+        (11.6.33)
In summary, the solution for the optimum IIR Wiener filter requires that we perform a spectral factorization of Γ_xx(z) to obtain G(z), the minimum-phase component, and then solve for the causal part of Γ_dx(z)/G(z^{-1}). The following example illustrates the procedure.

Example 11.6.2
Let us determine the optimum IIR Wiener filter for the signal given in Example 11.6.1.

Solution  For this signal we have

Γ_xx(z) = Γ_ss(z) + 1 = 1.8 (1 - (1/3)z^{-1})(1 - (1/3)z) / [(1 - 0.6z^{-1})(1 - 0.6z)]

where σ_i^2 = 1.8 and

G(z) = (1 - (1/3)z^{-1}) / (1 - 0.6z^{-1})
The z-transform of the crosscorrelation γ_dx(m) is

Γ_dx(z) = Γ_ss(z) = 0.64 / [(1 - 0.6z^{-1})(1 - 0.6z)]

Hence

[ Γ_dx(z) / G(z^{-1}) ]_+ = [ 0.64 / ((1 - 0.6z^{-1})(1 - (1/3)z)) ]_+

                          = [ 0.8/(1 - 0.6z^{-1}) + 0.266z/(1 - (1/3)z) ]_+

                          = 0.8 / (1 - 0.6z^{-1})
The optimum IIR filter has the system function

H_opt(z) = (1/1.8) [ (1 - 0.6z^{-1}) / (1 - (1/3)z^{-1}) ] [ 0.8 / (1 - 0.6z^{-1}) ]
         = (4/9) / (1 - (1/3)z^{-1})

and an impulse response

h_opt(n) = (4/9) (1/3)^n        n ≥ 0
We conclude this section by expressing the minimum MSE given by (11.6.19) in terms of the frequency-domain characteristics of the filter. First, we note that σ_d^2 = E|d(n)|^2 is simply the value of the autocorrelation sequence {γ_dd(k)} evaluated at k = 0. Since

γ_dd(k) = (1/2πj) ∮ Γ_dd(z) z^{k-1} dz        (11.6.34)

it follows that

σ_d^2 = γ_dd(0) = (1/2πj) ∮ Γ_dd(z)/z dz        (11.6.35)

where the contour integral is evaluated along a closed path encircling the origin in the region of convergence of Γ_dd(z).

The second term in (11.6.19) is also easily transformed to the frequency domain by application of Parseval's theorem. Since h_opt(k) = 0 for k < 0, we have

Σ_{k=0}^{∞} h_opt(k) γ_dx*(k) = (1/2πj) ∮ H_opt(z) Γ_dx(z^{-1}) z^{-1} dz        (11.6.36)

where C is a closed contour encircling the origin that lies within the common region of convergence of H_opt(z) and Γ_dx(z^{-1}).
By combining (11.6.35) with (11.6.36), we obtain the desired expression for the MMSE_∞ in the form

MMSE_∞ = (1/2πj) ∮ [ Γ_dd(z) - H_opt(z) Γ_dx(z^{-1}) ] z^{-1} dz        (11.6.37)

Example 11.6.3
For the optimum Wiener filter derived in Example 11.6.2, the minimum MSE is

MMSE_∞ = (1/2πj) ∮ [ 0.3555 / ((z - 1/3)(1 - 0.6z)) ] dz

There is a single pole inside the unit circle at z = 1/3. By evaluating the residue at the pole, we obtain

MMSE_∞ = 0.444

We observe that this MMSE is only slightly smaller than that for the optimum two-tap Wiener filter in Example 11.6.1.
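These residue calculations can be cross-checked by evaluating (11.6.37) on the unit circle, where the contour integral becomes an integral over one period in f; a crude Riemann sum is enough. The integrand Γ_ss(f)[1 - H_opt(f)] uses H_opt(f) from Example 11.6.2.

```python
import numpy as np

f = np.linspace(-0.5, 0.5, 200_001)
z = np.exp(2j * np.pi * f)
gamma_ss = 0.64 / np.abs(1 - 0.6 / z) ** 2      # Gamma_ss(f) of Example 11.6.1
h_opt = (4.0 / 9.0) / (1 - (1.0 / 3.0) / z)     # causal Wiener filter of Example 11.6.2
integrand = np.real(gamma_ss * (1 - h_opt))     # the imaginary part integrates to zero
print(np.mean(integrand))                       # ~0.444, in agreement with the residue result
```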

11.6.4 Noncausal Wiener Filter

In the preceding section we constrained the optimum Wiener filter to be causal [i.e., h_opt(n) = 0 for n < 0]. In this section we drop this condition and allow the filter to include both the infinite past and the infinite future of the sequence {x(n)} in forming the output y(n), that is,

𝒚(𝒏) = ∑ 𝒉(𝒌)𝒙(𝒏 − 𝒌) (𝟏𝟏. 𝟔. 𝟑𝟖)


𝒌=−∞

The resulting filter is physically unrealizable. It can also be viewed as a smoothing filter in which the infinite future signal values are used to smooth the estimate d̂(n) = y(n) of the desired signal d(n).
Application of the orthogonality principle yields the Wiener-Hopf equation for the noncausal filter in the form

Σ_{k=-∞}^{∞} h(k) γ_xx(l-k) = γ_dx(l)        -∞ < l < ∞        (11.6.39)
and the resulting MMSE_nc as

MMSE_nc = σ_d^2 - Σ_{k=-∞}^{∞} h(k) γ_dx*(k)        (11.6.40)
Since (11.6.39) holds for -∞ < l < ∞, this equation can be transformed directly to yield the optimum noncausal Wiener filter as
H_nc(z) = Γ_dx(z) / Γ_xx(z)        (11.6.41)
The MMSE_nc can also be simply expressed in the z-domain as

MMSE_nc = (1/2πj) ∮ [ Γ_dd(z) - H_nc(z) Γ_dx(z^{-1}) ] z^{-1} dz        (11.6.42)

In the following example we compare the form of the optimum noncausal filter with the optimum causal filter obtained in the previous section.

Example 11.6.4
The optimum noncausal Wiener filter for the signal characteristics given in Example 11.6.1 is given by (11.6.41), where

Γ_dx(z) = Γ_ss(z) = 0.64 / [(1 - 0.6z^{-1})(1 - 0.6z)]

and

Γ_xx(z) = Γ_ss(z) + 1 = 2(1 - 0.3z^{-1} - 0.3z) / [(1 - 0.6z^{-1})(1 - 0.6z)]

Then

H_nc(z) = 0.3555 / [(1 - (1/3)z^{-1})(1 - (1/3)z)]

This filter is clearly noncausal.
The minimum MSE achieved by this filter is determined from evaluating (11.6.42). The integrand is

(1/z) Γ_ss(z) [1 - H_nc(z)] = 0.3555 / [(z - 1/3)(1 - (1/3)z)]

The only pole inside the unit circle is z = 1/3. Hence the residue is

0.3555 / (1 - (1/3)z) |_{z=1/3} = 0.3555 / (8/9) = 0.40

Hence the minimum achievable 𝑴𝑺𝑬 obtained with the optimum noncausal Wiener
filter is
𝑴𝑴𝑺𝑬𝑵𝑪 = 𝟎. 𝟒𝟎
Note that this is lower than the 𝑴𝑴𝑺𝑬 for the causal filter, as expected.
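The same frequency-domain evaluation applies to (11.6.42). Here H_nc(f) = Γ_ss(f)/(Γ_ss(f) + 1), so the MMSE integrand reduces to Γ_ss(f)[1 - H_nc(f)]; a numerical average over one period reproduces the residue result.

```python
import numpy as np

f = np.linspace(-0.5, 0.5, 200_001)
gamma_ss = 0.64 / np.abs(1 - 0.6 * np.exp(-2j * np.pi * f)) ** 2
h_nc = gamma_ss / (gamma_ss + 1.0)            # noncausal Wiener filter, (11.6.41)
print(np.mean(gamma_ss * (1 - h_nc)))         # ~0.40, below the causal-filter MMSE of 0.444
```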

11.7 SUMMARY AND REFERENCES

The major focal point in this chapter is the design of optimum linear systems for linear
prediction and filtering. The criterion for optimality is the minimization of the mean-square
error between a specified desired filter output and the actual filter output.

In the development of linear prediction, we demonstrated that the equations for the forward and backward prediction errors specify a lattice filter whose parameters, the reflection coefficients {K_m}, are simply related to the filter coefficients {a_p(k)} of the direct-form FIR linear predictor and the associated prediction-error filter. The optimum filter coefficients {K_m} and {a_m(k)} are easily obtained from the solution of the normal equations.
We described two computationally efficient algorithms for solving the normal equations, the Levinson-Durbin algorithm and the Schur algorithm. Both algorithms are suitable for solving a Toeplitz system of linear equations and have a computational complexity of O(p^2) when executed on a single processor. However, with full parallel processing, the Schur algorithm solves the normal equations in O(p) time, whereas the Levinson-Durbin algorithm requires O(p log p) time.
In addition to the all-zero lattice filter resulting from linear prediction, we also derived the AR lattice (all-pole) filter structure and the ARMA lattice-ladder (pole-zero) filter structure. Finally, we described the design of the class of optimum linear filters, called Wiener filters.
Linear estimation theory has had a long and rich history of development over the past four decades. Kailath (1974) presents a historical account of the first three decades. The pioneering work of Wiener (1949) on optimum linear filtering for statistically stationary signals is especially significant. The generalization of the Wiener filter theory to dynamical systems with random inputs was developed by Kalman (1960) and Kalman and Bucy (1961). Kalman filters are treated in the books by Meditch (1969), Brown (1983), and Chui and Chen (1987). The monograph by Kailath (1981) treats both Wiener and Kalman filters.
There are numerous references on linear prediction and lattice filters. Tutorial treatments on these subjects have been published in the journal papers by Makhoul (1975, 1978) and Friedlander (1982a, b). The books by Haykin (1991), Markel and Gray (1976), and Tretter (1976) provide comprehensive treatments of these subjects. Applications of linear prediction to spectral analysis are found in the books by Kay (1988) and Marple (1987), to geophysics in the book by Robinson and Treitel (1980), and to adaptive filtering in the book by Haykin (1991).
The Levinson-Durbin algorithm for solving the normal equations recursively was given by Levinson (1947) and later modified by Durbin (1959). Variations of this classical algorithm, called split Levinson algorithms, have been developed by Delsarte and Genin (1986) and by Krishna (1988). These algorithms exploit additional symmetries in the Toeplitz correlation matrix and save about a factor of 2 in the number of multiplications.
The Schur algorithm was originally described by Schur (1917) in a paper published in German. An English translation of this paper appears in the book edited by Gohberg (1986). The Schur algorithm is intimately related to the polynomials {A_m(z)}, which can be interpreted as orthogonal polynomials. A treatment of orthogonal polynomials is given in the books by Szegö (1967), Grenander and Szegö (1958), and Geronimus (1958). The thesis of Vieira (1977) and the papers by Kailath et al. (1978), Delsarte et al. (1978), and Youla and Kazanjian (1978)
