
Digital Signal Processing 1

0.1 DSP Architecture


Unlike general-purpose microprocessors and microcontrollers, digital signal processors use a
dedicated hardware architecture. General microprocessors and microcontrollers are typically
based on the Von Neumann architecture. A Von Neumann processor contains a single, shared
memory for programs and data, a single bus for memory access, an arithmetic unit, and a
program control unit. It operates in cycles of fetching and execution: it fetches an instruction
from memory, decodes it in the program control unit, and finally executes the instruction.
Because a single, shared memory holds both program instructions and data, an instruction and
its data cannot be accessed at the same time, and this is the main bottleneck of the
Von Neumann-based processor.

To accelerate the execution of digital signal processing, digital signal processors are
designed based on the Harvard architecture. The digital signal processor has two separate
memory spaces: one is dedicated to the program code, while the other is used for data.
To accommodate the two memory spaces, two corresponding address buses and two data
buses are used. Thus the Harvard processor can fetch a program instruction and data in
parallel, the former via the program memory bus and the latter via the data memory bus.
A digital signal processor based on the Harvard architecture is shown in Fig. 1. There is an
additional unit called a multiplier and accumulator (MAC), which is the dedicated hardware
used for the digital filtering operation. The shift unit is used for the scaling operation in
fixed-point implementations when the processor performs digital filtering.

The Von Neumann architecture generally has the execution cycles described in Fig. 2.
The fetch cycle obtains the opcode from the memory, and the control unit decodes the
instruction to determine the operation. Next is the execute cycle. Based on the decoded
information, execution modifies the content of a register or the memory. Once this is
completed, the processor fetches the next instruction and continues. As depicted in Fig. 3, in
the Harvard architecture the execute and fetch cycles overlap. We call this the pipelining
operation: the digital signal processor performs one execution cycle while also fetching the
next instruction to be executed. Hence, the processing speed is increased.
2 Module-0

[Figure 1 (block diagram). Blocks: address generator, program control, program memory, data
memory, I/O devices, ALU, MAC, shifter. Buses: program memory address bus, data memory
address bus, program memory data bus, data memory data bus.]

Figure 1: Digital signal processors based on the Harvard architecture.

[Figure 2 (timing diagram): three successive Fetch/Execute cycles.]

Figure 2: Von Neumann based execution cycle.

[Figure 3 (timing diagram): three Fetch/Execute cycles, with each fetch overlapping the
previous execute.]

Figure 3: Harvard based execution cycle.



0.2 DSP Hardware Units


The MAC unit performs both multiplication and addition. It operates in two stages:
first it computes the product of the two given numbers, and then it forwards the result to the
second stage, where it is added to the accumulator (accumulate). This is dedicated hardware,
and the corresponding instruction is generally referred to as a MAC operation. The basic
structure of the MAC is shown in Fig. 4. The multiplier has a pair of input registers, each
holding a 16-bit input to the multiplier. The 32-bit result of the multiplication is accumulated
in the 32-bit accumulator unit.

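[Figure 4 (block diagram): two operands are loaded into the 16-bit X-register and Y-register,
which feed the multiplier; the 32-bit product goes to the accumulator, whose 32-bit output is
held in the result register.]

Figure 4: MAC for digital signal processor.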
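
The MAC operation described above can be sketched in software as follows. This is only an illustration under stated assumptions: the helper names `to_signed`, `mac`, and `dot_product` are hypothetical, and the accumulator is assumed to wrap around on overflow (real processors may instead saturate or provide guard bits).

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mac(acc, x, y):
    """One multiply-accumulate step: acc + x * y with 16-bit inputs."""
    product = to_signed(x, 16) * to_signed(y, 16)  # 16 x 16 -> fits in 32 bits
    return to_signed(acc + product, 32)            # assumed: 32-bit wrap-around


def dot_product(xs, ys):
    """Sum of products, the core of a digital filter, as repeated MAC steps."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

For example, `dot_product([1, 2, 3], [4, 5, 6])` accumulates 4 + 10 + 18 in three MAC steps.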

Shifters are required to scale operands and results down or up to avoid errors resulting
from overflows and underflows during computations. Shifting data to the right by one bit is the
same as dividing the data by 2 and discarding the fractional part (scale down); shifting data
to the left by one bit is equivalent to multiplying the data by 2 (scale up). The digital signal
processor often shifts data by several bits for each data word. To speed up such operations,
a special hardware shift unit is designed to perform the scaling operation.
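
The scaling effect of shifting can be checked with a short sketch. The function names are hypothetical; the sketch relies on the fact that Python's `>>` on integers is an arithmetic shift, so for negative values the discarded fraction rounds toward minus infinity rather than toward zero.

```python
def scale_down(value, bits):
    """Arithmetic right shift: divide by 2**bits and discard the fraction."""
    return value >> bits


def scale_up(value, bits):
    """Left shift: multiply by 2**bits (no overflow check in this sketch)."""
    return value << bits
```

With these definitions, `scale_down(13, 2)` gives 3 and `scale_down(-13, 2)` gives -4, which shows the rounding direction for negative data.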
Address generators: The digital signal processor generates the address of each datum in
the data buffer to be processed. A special hardware unit for circular buffering is
used. Fig. 5 describes the basic mechanism of circular buffering for a buffer holding eight
data samples. In circular buffering, a pointer always points to the newest data sample.
When the next sample is obtained, it is placed at the location of x(n − 7),
and the oldest sample is pushed out. Thus, the location of x(n − 7) becomes the location
of the current sample, and the original location of x(n) becomes the location of the past
sample x(n − 1). For each new data sample, only one location in the circular buffer needs to
be updated. The circular buffer therefore acts like a first-in/first-out (FIFO) buffer, but the
data in the buffer do not have to be moved.

[Figure 5 (diagram): eight buffer locations arranged in a circle, holding x(n), x(n-1), ...,
x(n-7); the data pointer marks x(n), whose neighbour is x(n-7).]

Figure 5: Demonstration of circular buffering.
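
The circular buffering mechanism can be sketched in software as below. The class name `CircularBuffer`, its methods, and the direction in which the pointer moves are assumptions made for illustration; a DSP performs the modulo address update in the address generator hardware.

```python
class CircularBuffer:
    """Fixed-size buffer in which only the oldest location is overwritten."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.newest = 0  # index of the location holding x(n)

    def push(self, sample):
        # The oldest sample x(n - (size-1)) sits just behind the newest one;
        # moving the pointer there and overwriting it updates a single location.
        self.newest = (self.newest - 1) % len(self.data)
        self.data[self.newest] = sample

    def delayed(self, k):
        """Return x(n - k) for 0 <= k < size."""
        return self.data[(self.newest + k) % len(self.data)]
```

After pushing the samples 1, 2, ..., 10 into an eight-sample buffer, `delayed(0)` returns 10 and `delayed(7)` returns 3; no stored sample was ever moved.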

0.3 Fixed-point and floating-point formats


The number formats used in DSP implementations can be classified as fixed point or floating
point. A fixed-point digital signal processor represents data in 2’s complement integer format
and manipulates data using integer arithmetic, while a floating-point processor represents
numbers using a mantissa (fractional part) and an exponent, in addition to the integer format.

0.3.1 Fixed-Point Format


With a 3-bit 2’s complement representation, we can represent all the decimal numbers shown in
Table 5.1.

Table 5.1: A 3-bit 2’s complement number representation.

Decimal Number    2’s Complement

       3               011
       2               010
       1               001
       0               000
      -1               111
      -2               110
      -3               101
      -4               100

Converting a decimal number to its 2’s complement form requires the following steps:

• Convert the magnitude of the decimal number to its binary number using the required number
of bits.
• If the decimal number is positive, its binary number is its 2’s complement representation;
if the decimal number is negative, perform the 2’s complement operation, where we
negate the binary number by changing the logic 1s to logic 0s and logic 0s to logic 1s
and then add a logic 1 to the data.

Example 1: Convert the following decimal numbers to their 2’s complement form.
(i) 2
(ii) -2

Solution:
(i) Given $2_d$:
$2_d = (010)_2$

(ii) Given $-2_d$:
$2_d = (010)_2$
1’s complement of the binary number = 101
Adding a logic 1 to the data: 101 + 1 = 110
$-2_d = (110)_2$
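
The conversion steps can be sketched in code as follows. The function name `twos_complement` is hypothetical, and the sketch simply mirrors the invert-and-add-one procedure for negative numbers.

```python
def twos_complement(value, bits):
    """Return the `bits`-wide 2's complement bit string of a decimal integer."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not lowest <= value <= highest:
        raise ValueError(f"{value} is outside the {bits}-bit range")
    mask = (1 << bits) - 1
    if value >= 0:
        code = value                      # positive: plain binary
    else:
        inverted = ~(-value) & mask       # 1's complement of the magnitude
        code = (inverted + 1) & mask      # add a logic 1
    return format(code, f"0{bits}b")
```

With three bits, `twos_complement(2, 3)` returns "010" and `twos_complement(-2, 3)` returns "110", in agreement with Example 1 and Table 5.1.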

The 2’s complement integer number system has a very narrow dynamic range (only -4 to 3
with 3 bits). Since the basic DSP operations include multiplications and additions, the results
of these operations can cause overflow problems.

Example 2: Perform the following multiplications in 3-bit 2’s complement form.

(i) 2 and -1
(ii) 2 and -3

Solution:
(i) Multiply the magnitudes first:

        0 1 0    −→ 2d
    ×   0 0 1    −→ 1d
    ---------
        0 1 0
      0 0 0
  + 0 0 0
    ---------
    0 0 0 1 0

Since the product is negative, we take the 2’s complement of 00010, which is 11110. Removing
the two extended sign bits gives 110. The answer is 110 = −2, which is within the system range.

(ii) Multiply the magnitudes first:

        0 1 0    −→ 2d
    ×   0 1 1    −→ 3d
    ---------
        0 1 0
      0 1 0
  + 0 0 0
    ---------
    0 0 1 1 0

The 2’s complement of 00110 is 11010. Removing the two extended sign bits gives 010. Since
the binary number 010 is 2, which is not the expected -6, overflow occurs; that is, the
result of the multiplication (-6) is out of the dynamic range.

Example 3: Perform the multiplication of 2/4 and -3/4 in fractional binary 2’s complement.

Solution:
$2/4 = 0 \times 2^{0} + 1 \times 2^{-1} + 0 \times 2^{-2} = (0.10)_2$
$3/4 = 0 \times 2^{0} + 1 \times 2^{-1} + 1 \times 2^{-2} = (0.11)_2$

        0. 1 0    −→ 2/4
    ×   0. 1 1    −→ 3/4
    ----------
        0 1 0
      0 1 0
  + 0 0 0
    ----------
    0. 0 1 1 0

The 2’s complement of 0.0110 is 1.1010. In decimal form, the answer is
$(1.1010)_2 = (-1) \times (0.0110)_2 = -(0 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3} + 0 \times 2^{-4}) = -3/8$.
If we truncate the two least significant bits to keep a 3-bit binary number, we have an
approximate answer:
$(1.10)_2 = (-1) \times (0.10)_2 = -(1 \times 2^{-1} + 0 \times 2^{-2}) = -1/2$.

Truncation error occurs. With such a scheme, we can avoid overflow due to multiplication but
cannot prevent overflow due to addition. Consider the addition example
$(0.11)_2 + (0.01)_2 = (1.00)_2$, where the result 1.00 is a negative number. Adding two
positive fractional numbers yields a negative number. Hence, overflow occurs.

0.3.2 Q-format
Q-format number representation is the most common one used in fixed-point DSP
implementation. Fig. 6 shows the Q-15 format. Q-15 means that the data are in a 2’s complement
fractional form in which there are 15 bits for magnitude and one bit for sign (weight $-2^{0}$).

Q15 bit weights: $-2^{0}$ $2^{-1}$ $2^{-2}$ $2^{-3}$ $2^{-4}$ $2^{-5}$ $2^{-6}$ $2^{-7}$ $2^{-8}$ $2^{-9}$ $2^{-10}$ $2^{-11}$ $2^{-12}$ $2^{-13}$ $2^{-14}$ $2^{-15}$

Figure 6: Q-15 (fixed-point) format.

Example 4: Find the signed Q-15 representation for the decimal number 0.450321.

Solution: For a positive fractional number, we multiply the number by 2. If the product
is larger than 1, we carry bit 1 and copy the fractional part to the next line for the next
multiplication by 2; if the product is less than 1, we carry bit 0. The first carry is the most
significant bit (MSB). The procedure continues until all 15 magnitude bits are collected.

Number Product Carry


0.450321 × 2 0.900642 0 (MSB)
0.900642 × 2 1.801284 1
0.801284 × 2 1.602568 1
0.602568 × 2 1.205136 1
0.205136 × 2 0.410272 0
0.410272 × 2 0.820544 0
0.820544 × 2 1.641088 1
0.641088 × 2 1.282176 1
0.282176 × 2 0.564352 0
0.564352 × 2 1.128704 1
0.128704 × 2 0.257408 0
0.257408 × 2 0.514816 0
0.514816 × 2 1.029632 1
0.029632 × 2 0.059264 0
0.059264 × 2 0.118528 0 (LSB)

Q-15 representation: $(0.011\,1001\,1010\,0100)_2 = (39A4)_h$
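
The conversion procedure can be cross-checked with a short sketch. The function names `to_q15` and `from_q15` are hypothetical; the sketch truncates the magnitude to 15 bits, as in the table above, and takes the 2’s complement for negative inputs.

```python
def to_q15(x):
    """Truncating conversion of -1 <= x < 1 to a 16-bit Q-15 word."""
    if not -1.0 <= x < 1.0:
        raise ValueError("value is outside the Q-15 range")
    magnitude = int(abs(x) * 32768)      # keep 15 magnitude bits, drop the rest
    return magnitude if x >= 0 else (-magnitude) & 0xFFFF


def from_q15(word):
    """Convert a 16-bit Q-15 word back to a decimal number."""
    signed = word - 0x10000 if word & 0x8000 else word
    return signed / 32768
```

Here `hex(to_q15(0.450321))` gives 0x39a4, and converting the word back gives 0.4503173828125, the truncated value.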



Example 5: Find the signed Q-15 representation for the decimal number -0.450321.

Solution:
Converting the corresponding positive number to the Q-15 format, we have
$0.450321 = (0.011\,1001\,1010\,0100)_2$
Applying the 2’s complement, the Q-15 format becomes
$-0.450321 = (1.100\,0110\,0101\,1100)_2 = (C65C)_h$

Example 6: Convert the Q-15 signed number 0.011100110100100 to a decimal number.

Solution:
The decimal number is $2^{-2} + 2^{-3} + 2^{-4} + 2^{-7} + 2^{-8} + 2^{-10} + 2^{-13} = 0.450317382$.

Compared with the value 0.450321 converted in Example 4, the truncation error is
$|0.450321 - 0.450317382| = 0.000003618$, which is within the bound $2^{-15} \approx 0.0000305$.

Example 7: Add the two numbers 1.110101110000010 and 0.100011110110010 in Q-15 format.

Solution:

        1. 1 1 0 1 0 1 1 1 0 0 0 0 0 1 0
  +     0. 1 0 0 0 1 1 1 1 0 1 1 0 0 1 0
  --------------------------------------
    1   0. 0 1 1 0 0 1 1 0 0 1 1 0 1 0 0

Discarding the carry out of the sign bit, the result is 0.011001100110100.
The decimal number is $2^{-2} + 2^{-3} + 2^{-6} + 2^{-7} + 2^{-10} + 2^{-11} + 2^{-13} = 0.400024$.
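
The Q-15 addition can be sketched on 16-bit words as below. The function names are hypothetical, and the overflow test (both operands share a sign that differs from the sign of the result) is a common convention assumed here rather than something prescribed by the text.

```python
def q15_add(a, b):
    """Add two 16-bit Q-15 words; return (sum_word, overflow_flag)."""
    total = (a + b) & 0xFFFF  # the carry out of the sign bit is discarded
    # Overflow: the result's sign differs from the sign of both operands.
    overflow = ((a ^ total) & (b ^ total) & 0x8000) != 0
    return total, overflow


def q15_to_decimal(word):
    """Convert a 16-bit Q-15 word to a decimal number."""
    return (word - 0x10000 if word & 0x8000 else word) / 32768
```

The operands of Example 7 are the words 0xEB82 and 0x47B2; their sum is 0x3334 with no overflow, while adding 0x6000 (0.75) and 0x2000 (0.25) sets the overflow flag.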

Example 8: Determine the fixed-point multiplication of 0.25 and 0.5 in Q-3 fixed-point 2’s
complement format.

Solution:

Q-3 bit weights: $-2^{0}$ $2^{-1}$ $2^{-2}$ $2^{-3}$

Therefore, 0.25 = 0.010 and 0.5 = 0.100.

           0. 0 1 0
      ×    0. 1 0 0
      -------------
            0 0 0 0
          0 0 0 0
        0 0 1 0
    + 0 0 0 0
      -------------
      0. 0 0 1 0 0 0

Truncating the least significant bits to convert the result to Q-3 format, we have 0.001
($0.25 \times 0.5 = 0.125 = (0.001)_2$).

Example 9: Determine the fixed-point multiplication of 0.75 and 0.5 in Q-3 fixed-point 2’s
complement format.

Solution:
0.75 = 0.110 and 0.5 = 0.100.

           0. 1 1 0
      ×    0. 1 0 0
      -------------
            0 0 0 0
          0 0 0 0
        0 1 1 0
    + 0 0 0 0
      -------------
      0. 0 1 1 0 0 0

Truncating the least significant bits to convert the result to Q-3 format, we have 0.011
($0.75 \times 0.5 = 0.375 = (0.011)_2$).

Example 10: Determine the square of 0.25 in Q-2 fixed-point 2’s complement format.

Solution:

Q-2 bit weights: $-2^{0}$ $2^{-1}$ $2^{-2}$

0.25 = 0.01.

          0. 0 1
      ×   0. 0 1
      ----------
          0 0 1
        0 0 0
    + 0 0 0
      ----------
      0. 0 0 0 1

Truncating the least significant bits to convert the result to Q-2 format, we have 0.00,
which is zero. Hence, underflow occurs.
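
The Q-format multiplications in Examples 8 to 10 can be reproduced with a small sketch. The helper names `to_q`, `from_q`, and `q_mul` are hypothetical, values are held as plain integers with n fractional bits, and `to_q` is assumed to be used with non-negative inputs only.

```python
def to_q(x, n):
    """Truncate a non-negative fraction to an integer with n fractional bits."""
    return int(x * (1 << n))


def from_q(value, n):
    """Convert an integer with n fractional bits back to a decimal number."""
    return value / (1 << n)


def q_mul(a, b, n):
    """Multiply two Q-n integers and truncate the Q-2n product back to Q-n."""
    product = a * b      # 2n fractional bits
    return product >> n  # drop the n least significant bits
```

For instance, `from_q(q_mul(to_q(0.25, 2), to_q(0.25, 2), 2), 2)` evaluates to 0.0, which is the underflow of Example 10.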

The Q-format number representation is a better choice than the 2’s complement integer
representation because it can prevent multiplication overflow. Some of the issues with the
Q-format are:

• When converting a decimal number to its Q-format, we may lose accuracy due to the
truncation error, which is bounded by $2^{-N}$ for $N$ magnitude bits.

• Addition and subtraction may cause overflow, where adding two positive numbers leads
to a negative number, or adding two negative numbers yields a positive number; similarly,
subtracting a positive number from a negative number may give a positive number, while
subtracting a negative number from a positive number may result in a negative number.

• Multiplying two numbers in Q-15 format leads to a Q-30 product. In Q-30 format,
there is one sign-extended bit. We may get rid of it by shifting left by one bit
to obtain Q-31 format and maintaining the Q-31 format for each MAC operation.

Q15 (S | 15 magnitude bits) × Q15 (S | 15 magnitude bits) = Q30 (S S | 30 magnitude bits)

• Underflow can happen when the result of a multiplication is too small to be represented
in the Q-format.

0.4 Floating-Point Format


A floating-point variable can represent a wider range of numbers than a fixed-point vari-
able of the same bit width at the cost of precision. The general format for floating-point
number representation is given by

floating-point number = $M \cdot 2^{exp}$

where M is the mantissa, or fractional part in Q format, and exp is the exponent. The
mantissa and exponent are signed numbers. The bigger the number of bits designated to the
exponent, the larger the dynamic range.

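4-bit exponent bit weights: $-2^{3}$ $2^{2}$ $2^{1}$ $2^{0}$
12-bit mantissa bit weights: $-2^{0}$ $2^{-1}$ $2^{-2}$ $2^{-3}$ $2^{-4}$ $2^{-5}$ $2^{-6}$ $2^{-7}$ $2^{-8}$ $2^{-9}$ $2^{-10}$ $2^{-11}$

Figure 7: Floating-point format.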



Example 11: Convert each of the following decimal numbers to a floating-point number using
the 12-bit mantissa and 4-bit exponent format.
(i) 0.1601230
(ii) -20.430527

Solution:
(i) Scale the number 0.1601230 to $0.160123 / 2^{-2} = 0.640492$ with an exponent of -2.

Thus $0.160123 = 0.640492 \times 2^{-2}$

Exponent = -2 = 1110 (4-bit 2’s complement)

Now we convert the value 0.640492 using the Q-11 format.

Number          Product     Carry

0.640492 × 2    1.280984    1 (MSB)
0.280984 × 2    0.561968    0
0.561968 × 2    1.123936    1
0.123936 × 2    0.247872    0
0.247872 × 2    0.495744    0
0.495744 × 2    0.991488    0
0.991488 × 2    1.982976    1
0.982976 × 2    1.965952    1
0.965952 × 2    1.931904    1
0.931904 × 2    1.863808    1
0.863808 × 2    1.727616    1 (LSB)

Mantissa in Q-11 = (0101 0001 1111)

Cascading the exponent bits and the mantissa bits yields:

0.1601230 = 1110 0101 0001 1111

(ii) Scale the number: $20.430527 / 2^{5} = 0.638454$

Thus $-20.430527 = -0.638454 \times 2^{5}$

Exponent = 5 = 0101

Now we convert the value 0.638454 using the Q-11 format: (0101 0001 1011).

Applying the 2’s complement: mantissa in Q-11 = (1010 1110 0101)

Cascading the exponent bits and the mantissa bits yields:

-20.430527 = 0101 1010 1110 0101
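
The encoding in Example 11 can be sketched as follows. The function names `encode` and `decode` are hypothetical; the sketch assumes the number is normalized so that the mantissa magnitude lies in [0.5, 1), which is what `math.frexp` returns, and that the mantissa magnitude is truncated to 11 bits.

```python
import math


def encode(x):
    """Pack x into a 16-bit word: 4-bit exponent, then 12-bit Q-11 mantissa."""
    mantissa, exponent = math.frexp(x)   # x = mantissa * 2**exponent
    if not -8 <= exponent <= 7:
        raise OverflowError("exponent does not fit in 4 bits")
    magnitude = int(abs(mantissa) * 2048)            # truncate to 11 bits
    m_bits = magnitude if x >= 0 else (-magnitude) & 0xFFF
    return ((exponent & 0xF) << 12) | m_bits


def decode(word):
    """Recover the decimal value represented by a 16-bit word."""
    e = (word >> 12) & 0xF
    m = word & 0xFFF
    e = e - 16 if e & 0x8 else e        # 4-bit 2's complement exponent
    m = m - 4096 if m & 0x800 else m    # 12-bit 2's complement mantissa
    return (m / 2048) * 2.0 ** e
```

Under these assumptions `encode(0.1601230)` gives 0xE51F and `encode(-20.430527)` gives 0x5AE5; decoding 0x5AE5 returns -20.421875, which shows the effect of the 12-bit mantissa truncation.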

0.4.1 Floating-point arithmetic


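Arithmetic addition of two floating-point numbers $x_1$ and $x_2$, given as

$x_1 = M_1 \cdot 2^{exp_1}$
$x_2 = M_2 \cdot 2^{exp_2}$

is performed as

$x_1 + x_2 = \begin{cases} \left(M_1 + M_2 \times 2^{-(exp_1 - exp_2)}\right) \times 2^{exp_1}, & \text{if } exp_1 \ge exp_2 \\ \left(M_1 \times 2^{-(exp_2 - exp_1)} + M_2\right) \times 2^{exp_2}, & \text{if } exp_1 < exp_2 \end{cases}$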

Example 12: Add 0.1601230 and -20.430527.

Solution:

$0.1601230 = 0.640492 \times 2^{-2}$ = 1110 0101 0001 1111; $exp_1 = -2$

$-20.430527 = -0.638454 \times 2^{5}$ = 0101 1010 1110 0101; $exp_2 = 5$

To perform the addition, both numbers must have the same exponent. Therefore, we change the
first number so that it has the same exponent as the second number. Scale the number:
$0.1601230 / 2^{5} = 0.0050038$

Thus, $0.1601230 = 0.0050038 \times 2^{5}$

Now, the exponent $exp_1$ = 5 = 0101 and $M_1$ = mantissa in Q-11 = (0000 0000 1010)

Cascading the exponent bits and the mantissa bits yields

0.1601230 = 0101 0000 0000 1010

M1         0000 0000 1010
M2         1010 1110 0101
M1 + M2    1010 1110 1111

Therefore, (0.1601230) + (-20.430527) = 0101 1010 1110 1111
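
The addition can be sketched directly on the 16-bit words. The names `unpack`, `pack`, and `fp_add` are hypothetical. The sketch aligns the mantissa of the smaller exponent with an arithmetic right shift, handles mantissa overflow by one extra shift, and, as a simplifying assumption, does not renormalize the result after cancellation.

```python
def unpack(word):
    """Split a 16-bit word into a signed exponent and a signed Q-11 mantissa."""
    e = (word >> 12) & 0xF
    m = word & 0xFFF
    return (e - 16 if e & 0x8 else e), (m - 4096 if m & 0x800 else m)


def pack(exponent, mantissa):
    """Cascade the exponent bits and the mantissa bits into one word."""
    if not -8 <= exponent <= 7:
        raise OverflowError("exponent does not fit in 4 bits")
    return ((exponent & 0xF) << 12) | (mantissa & 0xFFF)


def fp_add(w1, w2):
    """Add two floating-point words after aligning them to the larger exponent."""
    e1, m1 = unpack(w1)
    e2, m2 = unpack(w2)
    if e1 < e2:                          # make (e1, m1) the larger-exponent term
        e1, m1, e2, m2 = e2, m2, e1, m1
    m = m1 + (m2 >> (e1 - e2))           # scale the smaller term down
    if not -2048 <= m <= 2047:           # mantissa overflow: rescale by 2
        m >>= 1
        e1 += 1
    return pack(e1, m)
```

With the words of Example 12, `hex(fp_add(0xE51F, 0x5AE5))` gives 0x5aef, that is 0101 1010 1110 1111.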



0.4.2 Floating-point multiplication


Multiplication of two floating-point numbers $x_1$ and $x_2$ is given as

$x_1 \times x_2 = (M_1 \times M_2) \times 2^{exp_1 + exp_2} = M \times 2^{exp}$

That is, the mantissas are multiplied while the exponents are added.

Example 13: Perform multiplication of 0.1601230 and -20.430527.

Solution:

$0.1601230 = 0.640492 \times 2^{-2}$ = 1110 0101 0001 1111

$-20.430527 = -0.638454 \times 2^{5}$ = 0101 1010 1110 0101

$exp_1$ = 1110 and $exp_2$ = 0101
$M_1$ = (0101 0001 1111) and $M_2$ = (1010 1110 0101)

Adding the two exponents in 2’s complement form leads to

E = 1110 + 0101 = 0011

$M_1$ = (0101 0001 1111) is a positive mantissa. However, the MSB of $M_2$ = (1010 1110
0101) is 1, so it is a negative mantissa.
Taking the 2’s complement of $M_2$ = (1010 1110 0101) gives its magnitude, (0101 0001 1011).

Multiplying the two magnitudes in Q-11 and truncating the product to 12 bits:

M = 0.10100011111 × 0.10100011011 ≈ 0.01101000100, that is, M = (0011 0100 0100)

Since the product is negative, we take the 2’s complement of M: (0011 0100 0100) becomes
(1100 1011 1100).

Cascading the exponent bits and the mantissa bits yields the product

0.1601230 × (−20.430527) = 0011 1100 1011 1100

During an operation, overflow will occur when a number is too large to be represented
in the floating point number system. Adding two mantissa numbers may lead to a number
larger than 1 or less than -1; and multiplying two numbers causes the addition of their two
exponents so that the sum of the two exponents could overflow. Underflow will occur when
a number is too small to be represented in the number system.

0.5 IEEE Floating point formats


The IEEE floating-point number has three parts: a sign bit (S), an exponent (E), and a
fraction (F). Two IEEE floating-point formats (IEEE 754 standard) are considered here. One
is the IEEE single precision format, and the other is the IEEE double precision format. The
IEEE single precision floating-point standard representation requires 23 fraction
bits F, 8 exponent bits E, and 1 sign bit S, with a total of 32 bits for each word. The IEEE
double precision floating-point standard representation requires a 64-bit word. The first bit
is the sign bit S, the next eleven bits are the exponent bits E, and the final 52 bits are the
fraction bits F. The IEEE floating-point format in double precision significantly increases
the dynamic range of number representation.

IEEE single precision: $x = (-1)^{S} \times (1.F) \times 2^{E-127}$

IEEE double precision: $x = (-1)^{S} \times (1.F) \times 2^{E-1023}$

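Single precision (32 bits): bit 31 = S | bits 30 to 23 = exponent | bits 22 to 0 = fraction
Double precision (64 bits): bit 63 = S | bits 62 to 52 = exponent | bits 51 to 0 = fraction

Figure 8: IEEE single precision and double precision floating-point formats.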

Example 14: Convert the following IEEE 754 standard (single precision) numbers to decimal
numbers.
(i) 0 10000000 00000000000000000000000
(ii) 0 10000001 10100000000000000000000
(iii) 1 10000001 10100000000000000000000
(iv) 1 10000000 01000000000000000000000

Solution: IEEE single precision: $x = (-1)^{S} \times (1.F) \times 2^{E-127}$

(i) S = 0, E = 128, F = 0; $x = (-1)^{0} \times (1.0)_2 \times 2^{128-127} = 1 \times 2^{1} = 2$.

(ii) S = 0, E = 129, F = 101; $x = (-1)^{0} \times (1.101)_2 \times 2^{129-127} = 1.625 \times 2^{2} = 6.5$.

(iii) S = 1, E = 129, F = 101; $x = (-1)^{1} \times (1.101)_2 \times 2^{129-127} = -1.625 \times 2^{2} = -6.5$.

(iv) S = 1, E = 128, F = 01; $x = (-1)^{1} \times (1.01)_2 \times 2^{128-127} = -1.25 \times 2^{1} = -2.5$.

Example 15: Convert the following IEEE 754 standard (double precision) numbers to decimal
numbers.
(i) 0 01000000000 110.....0
(ii) 1 10000000000 110.....0

Solution: IEEE double precision: $x = (-1)^{S} \times (1.F) \times 2^{E-1023}$

(i) S = 0, E = 512, F = 11; $x = (-1)^{0} \times (1.11)_2 \times 2^{512-1023} = 1.75 \times 2^{-511}$.

(ii) S = 1, E = 1024, F = 11; $x = (-1)^{1} \times (1.11)_2 \times 2^{1024-1023} = -1.75 \times 2^{1} = -3.5$.
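
The two decoding formulas can be checked with a short sketch. The function names are hypothetical; `decode_ieee` applies the formula for normalized numbers only (it does not handle the special exponent values for zero, denormals, infinity, or NaN), and `decode_single_via_struct` uses Python's standard `struct` module to reinterpret the same 32 bits as a hardware float for comparison.

```python
import struct


def decode_ieee(bit_string, exp_bits, frac_bits):
    """Apply x = (-1)^S * (1.F) * 2^(E - bias) to a normalized bit pattern."""
    bits = bit_string.replace(" ", "")
    if len(bits) != 1 + exp_bits + frac_bits:
        raise ValueError("unexpected word length")
    s = int(bits[0], 2)
    e = int(bits[1:1 + exp_bits], 2)
    f = int(bits[1 + exp_bits:], 2) / 2 ** frac_bits
    bias = 2 ** (exp_bits - 1) - 1       # 127 for single, 1023 for double
    return (-1) ** s * (1 + f) * 2.0 ** (e - bias)


def decode_single_via_struct(bit_string):
    """Reinterpret a 32-bit pattern as an IEEE single precision float."""
    word = int(bit_string.replace(" ", ""), 2)
    return struct.unpack(">f", struct.pack(">I", word))[0]
```

For pattern (ii) of Example 14, both `decode_ieee(pattern, 8, 23)` and `decode_single_via_struct(pattern)` return 6.5; the double precision patterns of Example 15 are decoded with `exp_bits=11` and `frac_bits=52`.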

0.6 Fixed-Point Digital Signal Processors


The typical TMS320C54x fixed-point DSP architecture appears in Fig. 9. These processors have
one program memory space and three data memory spaces with separate buses, which provide
simultaneous access to a program instruction and two data operands and enable a result to be
written at the same time. Part of the memory is implemented on-chip and consists of
combinations of ROM, dual-access RAM, and single-access RAM. Transfers between the memory
spaces are also possible. As shown in Fig. 9, the C and D data memory address buses and the C
and D data memory data buses deal with fetching data from the data memory, while the E data
memory address bus and the E data memory data bus are dedicated to moving data into data
memory.

[Figure 9 (block diagram). Blocks: data address generator, program address control, program
control unit, program memory, data memory, I/O devices, ALU, MAC, shifter. Buses: program
memory address bus; C, D, and E data memory address buses; program memory data bus; C, D,
and E data memory data buses.]

Figure 9: Basic architecture of TMS320C54x processor family.



The central processing unit (CPU) of the TMS320C54x processors consists of a 40-bit ALU,
two 40-bit accumulators, a barrel shifter, a 17 × 17 multiplier, a 40-bit adder, data address
generation logic (DAGEN) with its own arithmetic unit, and program address generation
logic (PAGEN). These major functional units are supported by a number of registers and logic
in the architecture. A powerful instruction set, with hardware-supported single-instruction
repeat and block repeat operations, block memory move instructions, instructions that pack
two or three simultaneous reads, and arithmetic instructions with parallel store and load,
makes these devices very efficient for running high-speed DSP algorithms. An advanced Harvard
architecture is employed, in which several instructions operate at the same time within a given
single instruction cycle. The processing performance reaches 40 million instructions per second
(MIPS).

0.7 Floating-Point Digital Signal Processors


Fig. 10 shows the typical architecture of the Texas Instruments TMS320C3x family of
processors. The TMS320C3x is a 32-bit floating-point DSP from Texas Instruments. The
processor has a large memory space (16M words × 32 bits), is a highly efficient engine for
C language programs, and is equipped with dual-access on-chip memories. A program cache is
employed to speed up the execution of commonly used code. The processor uses the Harvard
architecture. There are also memory buses and data buses for direct memory access (DMA),
which allow concurrent I/O and CPU operations, and for peripheral access such as serial ports,
I/O ports, memory expansion, and an external clock. The C3x CPU contains the
floating-point/integer multiplier; an ALU, which is capable of both integer and floating-point
arithmetic; a 32-bit barrel shifter; internal buses; a CPU register file; and dedicated
auxiliary register arithmetic units.

[Figure 10 (block diagram). Memory: program cache (64 × 32), RAM block 0 (1K × 32), RAM
block 1 (1K × 32), ROM block 0 (4K × 32). CPU: integer/floating-point multiplier,
integer/floating-point ALU, 8 extended-precision registers, address generators 0 and 1,
8 auxiliary registers, 12 control registers. Other blocks: controller, DMA with address
generators and control registers, serial ports 0 and 1, timers 0 and 1, data bus, peripheral
bus. External signals: XRDY, IOSTRB, XR/W, XD31-0, XA12-0, MSTRB, RDY, INT3-0, IACK, XF1-0,
MCBL/MP, X1, X2/CLKIN, VDD, VSS, SHZ.]

Figure 10: Basic architecture of TMS320C3x floating-point processor family.



The CPU register file offers 28 registers, which can be operated on by the multiplier
and ALU. Floating-point arithmetic allows DSP algorithms to be implemented with far less
concern for problems such as overflow and coefficient quantization. Three floating-point
formats are supported: a short 16-bit floating-point format, a 32-bit single precision format,
and a 40-bit extended precision format. The TMS320C30 offers high-speed performance with a
60-nanosecond single-cycle instruction execution time, which is equivalent to 16.7 MIPS.

0.8 FIR and IIR filter implementations in Fixed point systems


In any signal processing system, the number of bits per data word is fixed and is limited by
the digital signal processor used. In Q-15 notation, the range of numbers that can be
represented is -1 to just under 1 ($1 - 2^{-15}$). If the value of a number exceeds these
limits, overflow occurs. Data are therefore scaled down to avoid overflow. The idea is that
scaling down makes the coefficients and the adder output smaller than 1 in magnitude, and the
filtered output is later scaled up by the same amount before it is sent to the DAC. First, to
avoid overflow at the adder, we scale the input down by a scale factor S. The adder output can
be expressed as a convolution output:

adder output = h(0)x(n) + h(1)x(n − 1) + h(2)x(n − 2) + · · ·

so its magnitude is bounded by Imax (|h(0)| + |h(1)| + |h(2)| + · · ·), where Imax is the
maximum input magnitude, and S is chosen to be at least this bound.

Fig. 11 describes the modified implementation of the FIR filter. In the figure, the scale
factor B makes the coefficients bk/B convertible to the Q-format. The scale factors S and
B are usually chosen to be powers of 2, so that simple shift operations can be used in the
program.

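[Figure 11 (signal-flow diagram): the input x(n) is multiplied by 1/S to give xs(n), which
feeds a chain of $z^{-1}$ delay elements; the taps are weighted by b0/B, b1/B, b2/B, ..., bK/B
and summed to give ys(n), which is multiplied by B and by S to give y(n).]

Figure 11: Direct-form I implementation of the FIR filter.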

Example 16: Given the FIR filter y(n) = 0.75x(n) + 3x(n − 1) + 0.75x(n − 2) with a passband
gain of 4, and assuming that the input range only occupies one quarter of the full range for
a particular application, develop the DSP implementation equations in the Q-15 fixed-point
system.

Solution: Given y(n) = 0.75x(n) + 3x(n − 1) + 0.75x(n − 2),

h(0) = 0.75; h(1) = 3; h(2) = 0.75.

The scale factor is determined using the impulse response, which consists of the FIR filter
coefficients:

$S = \frac{1}{4}\left(|h(0)| + |h(1)| + |h(2)|\right) = \frac{1}{4}(0.75 + 3 + 0.75) = \frac{4.5}{4} = 1.125$

Since S > 1, overflow may occur without input scaling. Hence, we select S = 2 (a power of 2).
We choose B = 4 to scale all the coefficients to be less than 1. According to Fig. 11, the
developed difference equations are given by

$x_s(n) = \frac{x(n)}{S} = \frac{x(n)}{2}$ and $y(n) = B \times S \times y_s(n) = 4 \times 2 \times y_s(n) = 8y_s(n)$

$y_s(n) = \frac{0.75}{4} x_s(n) + \frac{3}{4} x_s(n-1) + \frac{0.75}{4} x_s(n-2)$

$y_s(n) = 0.1875x_s(n) + 0.75x_s(n-1) + 0.1875x_s(n-2)$
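
The scaled difference equations of Example 16 can be checked against the original filter with a short sketch. The function names are hypothetical, and ordinary floating-point arithmetic stands in for Q-15 words; the sketch only verifies that the scaled equations reproduce y(n) and that ys(n) stays inside the Q-15 range.

```python
def fir_direct(x):
    """Reference: y(n) = 0.75x(n) + 3x(n-1) + 0.75x(n-2)."""
    y, x1, x2 = [], 0.0, 0.0
    for xn in x:
        y.append(0.75 * xn + 3 * x1 + 0.75 * x2)
        x2, x1 = x1, xn
    return y


def fir_scaled(x, S=2, B=4):
    """Scaled form: xs = x/S, coefficients divided by B, output times B*S."""
    y, xs1, xs2 = [], 0.0, 0.0
    for xn in x:
        xs = xn / S
        ys = 0.1875 * xs + 0.75 * xs1 + 0.1875 * xs2
        if not -1.0 <= ys < 1.0:
            raise OverflowError("ys(n) left the Q-15 range")
        y.append(B * S * ys)
        xs2, xs1 = xs1, xs
    return y
```

For a constant input at the assumed maximum of 0.25, the scaled adder output settles at ys(n) = 0.140625, and the restored output 8 ys(n) equals the direct output 1.125.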

The direct-form I implementation of the IIR filter is illustrated in Fig. 12. As shown in
the figure, the purpose of the scale factor C (power of 2) is to scale down the original filter
coefficients to the Q-format.

Example 17: Given the IIR filter y(n) = 2x(n) + 0.5y(n − 1), realized in direct-form I, and
a particular application in which the maximum input is Imax = 0.25, develop the DSP
implementation equations in the Q-15 fixed-point system.

Solution: Given y(n) = 2x(n) + 0.5y(n − 1), applying the z-transform gives

$Y(z) = 2X(z) + 0.5z^{-1}Y(z)$

$Y(z) - 0.5z^{-1}Y(z) = 2X(z)$

$Y(z)\left(1 - 0.5z^{-1}\right) = 2X(z)$

$H(z) = \frac{Y(z)}{X(z)} = \frac{2}{1 - 0.5z^{-1}} = \frac{2z}{z - 0.5}$

The corresponding impulse response h(n) is



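[Figure 12 (signal-flow diagram): the input x(n) is multiplied by 1/S to give xs(n), which
feeds a chain of $z^{-1}$ delay elements with taps b0/C, b1/C, ..., bM/C; the adder output
ys(n) is multiplied by C to give yf(n), which feeds a second chain of $z^{-1}$ delay elements
with feedback taps -a1/C, ..., -aN/C; yf(n) is multiplied by S to give y(n).]

Figure 12: Direct-form I implementation of the IIR filter.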

$h(n) = 2(0.5)^{n} u(n)$

To prevent overflow in the adder, we compute the S factor with the help of the
Maclaurin (geometric) series:

$S = 0.25 \times \left(2(0.5)^{0} + 2(0.5)^{1} + 2(0.5)^{2} + 2(0.5)^{3} + \cdots\right) = \frac{0.25 \times 2 \times 1}{1 - 0.5} = \frac{0.5}{0.5} = 1$

Hence, we do not need to perform input scaling. However, we need to scale down all the
coefficients to use the Q-15 format. A factor of C = 4 is selected. From Fig. 12, we get the
difference equations as

$x_s(n) = x(n)$
$y_s(n) = \frac{2}{4} x_s(n) + \frac{0.5}{4} y_f(n-1)$
$y_s(n) = 0.5x_s(n) + 0.125y_f(n-1)$
$y_f(n) = 4y_s(n)$ and $y(n) = y_f(n)$
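
The difference equations of Example 17 can be verified numerically with the sketch below. The function names are hypothetical and floating-point arithmetic is used in place of Q-15 words, so the sketch checks only that the scaled recursion matches the original filter.

```python
def iir_direct(x):
    """Reference: y(n) = 2x(n) + 0.5y(n-1)."""
    y, y1 = [], 0.0
    for xn in x:
        yn = 2 * xn + 0.5 * y1
        y.append(yn)
        y1 = yn
    return y


def iir_scaled(x, C=4):
    """Scaled direct-form I: coefficients divided by C, adder output times C."""
    y, yf1 = [], 0.0
    for xn in x:
        xs = xn                          # S = 1, so no input scaling
        ys = 0.5 * xs + 0.125 * yf1      # scaled adder output
        yf = C * ys
        y.append(yf)                     # y(n) = yf(n)
        yf1 = yf
    return y
```

For a constant input of Imax = 0.25 the output approaches 1 from below, which agrees with the computed scale factor S = 1.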

0.8.1 Direct-form II implementation of the IIR filter


The direct-form II implementation of the IIR filter is shown in Fig. 13. The difference
equations are given as

w(n) = x(n) − a1 w(n − 1) − a2 w(n − 2) − · · · − aM w(n − M )


y(n) = b0 w(n) + b1 w(n − 1) + · · · + bM w(n − M )

Two scale factors, A and B, are designated to scale the denominator coefficients and the
numerator coefficients to their Q-format representations, respectively. Further, S is a special
factor used to scale down the input sample so that numerical overflow in the first sum in
Fig. 13 can be prevented. The scale factors A, B, and S are usually chosen to be powers of 2.

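[Figure 13 (signal-flow diagram): the input x(n) is multiplied by 1/S and by 1/A and enters
the first adder, whose output ws(n) is multiplied by A to give w(n); w(n) feeds a chain of
$z^{-1}$ delay elements with feedback taps -a1/A, ..., -aM/A into the first adder and
feedforward taps b0/B, b1/B, ..., bM/B into the second adder; the second adder output ys(n) is
multiplied by B and by S to give y(n).]

Figure 13: Direct-form II implementation of the IIR filter.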

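$w_s(n) = \frac{1}{A} w(n) = \frac{1}{A} x(n) - \frac{1}{A} a_1 w(n-1) - \frac{1}{A} a_2 w(n-2) - \cdots - \frac{1}{A} a_M w(n-M)$

$w(n) = A \times w_s(n)$

$y_s(n) = \frac{1}{B} y(n) = \frac{1}{B} b_0 w(n) + \frac{1}{B} b_1 w(n-1) + \cdots + \frac{1}{B} b_M w(n-M)$

$y(n) = B \times y_s(n)$

When the input is also scaled by S, as in Fig. 13, x(n) in these equations is replaced by
$x_s(n) = x(n)/S$ and the output is recovered as $y(n) = B \times S \times y_s(n)$.

To avoid overflow at the first adder, the scale factor S can be safely determined by

$S = I_{max}\left(|h(0)| + |h(1)| + |h(2)| + \cdots\right)$

where h(k) is the impulse response due to the denominator polynomial of the IIR filter:

$h(n) = Z^{-1}\left\{\frac{1}{1 + a_1 z^{-1} + \cdots + a_M z^{-M}}\right\}$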

Example 18: Given the IIR filter y(n) = 0.75x(n) + 1.49x(n − 1) + 0.75x(n − 2) − 1.52y(n − 1)
− 0.64y(n − 2) with a passband gain of 1 and a full range of input, use the direct-form II
implementation to develop the DSP implementation equations in the Q-15 fixed-point system.

Solution: Given y(n) = 0.75x(n) + 1.49x(n − 1) + 0.75x(n − 2) − 1.52y(n − 1) − 0.64y(n − 2),
applying the z-transform gives

$Y(z) = 0.75X(z) + 1.49z^{-1}X(z) + 0.75z^{-2}X(z) - 1.52z^{-1}Y(z) - 0.64z^{-2}Y(z)$

$Y(z)\left(1 + 1.52z^{-1} + 0.64z^{-2}\right) = X(z)\left(0.75 + 1.49z^{-1} + 0.75z^{-2}\right)$

$H(z) = \frac{Y(z)}{X(z)} = \frac{0.75 + 1.49z^{-1} + 0.75z^{-2}}{1 + 1.52z^{-1} + 0.64z^{-2}}$

$A(z) = \frac{1}{1 + 1.52z^{-1} + 0.64z^{-2}}$ and $B(z) = 0.75 + 1.49z^{-1} + 0.75z^{-2}$

$a_1 = 1.52$, $a_2 = 0.64$ and $b_0 = 0.75$, $b_1 = 1.49$, $b_2 = 0.75$

$w(n) = x(n) - a_1 w(n-1) - a_2 w(n-2) = x(n) - 1.52w(n-1) - 0.64w(n-2)$
$y(n) = b_0 w(n) + b_1 w(n-1) + b_2 w(n-2) = 0.75w(n) + 1.49w(n-1) + 0.75w(n-2)$

We choose the S factor as S = 16, and we choose A = 2 to scale down the denominator
coefficients by half. The second adder output after scaling is

$y_s(n) = \frac{0.75}{B} w(n) + \frac{1.49}{B} w(n-1) + \frac{0.75}{B} w(n-2)$

To avoid overflow at the second adder, we have to ensure that each coefficient is less than 1,
along with the sum of the absolute values:

$\frac{0.75}{B} + \frac{1.49}{B} + \frac{0.75}{B} = \frac{2.99}{B} < 1$

Therefore B > 2.99. Hence B = 4 is selected.

$x_s(n) = \frac{1}{S} x(n) = \frac{x(n)}{16}$

$w_s(n) = \frac{1}{A} w(n) = \frac{1}{2} w(n)$
$w_s(n) = \frac{1}{2} x_s(n) - \frac{1}{2} \times 1.52\,w(n-1) - \frac{1}{2} \times 0.64\,w(n-2)$
$w_s(n) = 0.5x_s(n) - 0.76w(n-1) - 0.32w(n-2)$
$w(n) = 2w_s(n)$

$y_s(n) = \frac{1}{B \times S} y(n) = \frac{1}{64} y(n)$
$y_s(n) = \frac{1}{B} b_0 w(n) + \frac{1}{B} b_1 w(n-1) + \frac{1}{B} b_2 w(n-2)$
$y_s(n) = \frac{1}{4} \times 0.75\,w(n) + \frac{1}{4} \times 1.49\,w(n-1) + \frac{1}{4} \times 0.75\,w(n-2)$
$y_s(n) = 0.1875w(n) + 0.3725w(n-1) + 0.1875w(n-2)$
$y(n) = (B \times S)\,y_s(n) = (4 \times 16)\,y_s(n) = 64y_s(n)$
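
The scaled direct-form II equations of Example 18 can be compared with the original difference equation using the sketch below. The function names are hypothetical, and floating-point arithmetic is used instead of Q-15 words, with w(n) taken here from the scaled input xs(n) = x(n)/S; the sketch therefore checks the algebra of the scaling only.

```python
def iir_reference(x):
    """Reference: the original difference equation of Example 18."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = 0.75 * xn + 1.49 * x1 + 0.75 * x2 - 1.52 * y1 - 0.64 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


def iir_df2_scaled(x, S=16, A=2, B=4):
    """Scaled direct-form II with S = 16, A = 2, B = 4."""
    y, w1, w2 = [], 0.0, 0.0
    for xn in x:
        xs = xn / S                                   # input scaling
        ws = 0.5 * xs - 0.76 * w1 - 0.32 * w2         # first adder (scaled by 1/A)
        w = A * ws
        ys = 0.1875 * w + 0.3725 * w1 + 0.1875 * w2   # second adder (scaled by 1/B)
        y.append(B * S * ys)                          # undo both scalings
        w2, w1 = w1, w
    return y
```

For a unit impulse, both functions return 0.75 as the first output sample and 0.35 as the second, as expected from the original difference equation.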
