Master Thesis FPGA
A THESIS SUBMITTED TO
THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES
OF
MIDDLE EAST TECHNICAL UNIVERSITY
BY
NOVEMBER 2006
This is to certify that we have read this thesis and that in our opinion it is fully
adequate, in scope and quality, as a thesis for the degree of Master of Science.
(METU, EE)
(METU, EE)
(METU, EE)
(METU, EE)
(TÜBİTAK SAGE)
I hereby declare that all information in this document has been obtained
and presented in accordance with academic rules and ethical conduct. I also
declare that, as required by these rules and conduct, I have fully cited and
referenced all material and results that are not original to this work.
ABSTRACT
A NOVEL FAULT TOLERANT ARCHITECTURE ON A
RUNTIME RECONFIGURABLE FPGA
ÖZ
ÇALIŞIRKEN YENİDEN BİÇİMLENDİRİLEBİLİR FPGA
ÜZERİNDE HATAYA DAYANIMLI YENİ BİR YAPI
To My Family
ACKNOWLEDGMENTS
TABLE OF CONTENTS
ABSTRACT
ÖZ
ACKNOWLEDGMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTERS
1 INTRODUCTION
1.1 Overview
1.2
1.3
1.4
2 BACKGROUND
2.1
2.2
2.3 Reconfiguration Approaches
2.4 Reconfiguration Time
2.5
2.6
2.6.2 In-Field Upgrades
2.7
2.7.2 Adaptable Computing
2.7.3 Speeding-up Computations
2.7.4
2.8
3.1.2
3.1.3
3.2
3.2.2 Glitchless Reconfiguration
3.2.3 Clocking Logic
3.2.4
3.3
3.4
3.4.1 XAPP290
3.4.2 JBITS
4.1.2
4.1.3 Clocking Logic
4.2
4.3
4.3.2
4.3.3 Implementation
4.3.3.1
4.3.3.2
4.3.3.3
4.3.4
4.4
4.4.1
4.4.2
5.1 Background
5.1.1
5.1.1.1 Redundancy
5.1.1.2 Availability
5.1.2
5.1.3
5.1.4
5.1.4.1 Transient Faults
5.1.4.2
5.2 Related Work
5.3
5.3.1.1
5.3.1.2
5.3.2
5.3.2.1
5.3.2.2
5.3.2.3
5.3.2.4
5.3.3
5.3.4
5.3.4.1 Voter Module
5.3.4.2 A Redundant Module
5.3.5
5.3.5.1
5.3.5.2
5.3.5.3
5.3.6 Eliminating Faults
5.3.6.1
5.3.6.2
5.3.7 PC Program
5.3.7.1
5.3.7.2
5.3.7.3
5.3.8
5.3.9
5.3.10 Fault Injection
5.3.10.1
5.3.10.2
6 CONCLUSIONS
6.1
6.2
REFERENCES
APPENDICES
A
LIST OF TABLES
LIST OF FIGURES
Figure 5-13: Modified Bus Macro that connects Two Non-Adjacent Modules
Figure 5-14: FPGA Editor Snapshots of Bus Macros a) Standard Bus Macro
connecting Two Adjacent Modules b) Modified Bus Macro connecting Two
Non-Adjacent Modules
Figure 5-15: Alternative Partial Configurations of Module Three
Figure 5-16: Connections of Bus Macros on a Redundant Module
Figure 5-17: Alternative Configurations of a Module
Figure 5-18: Screenshot of the Supervisor PC Program
Figure 5-19: An example of Communication Protocol Commands during Error
Recovery Operation of a Module
Figure 5-20: Flowchart of Fault Recovery Algorithm that Runs on the PC Program
Figure 5-21: Configurable Logic Block in Editing Mode
Figure 5-22: A virtual faulty CLB and its mapping on alternative placements
Figure A-1: Top Layer PCB of RS232 Circuit
Figure A-2: Top Overlay PCB of RS232 Circuit
Figure A-3: Schematic of RS232 Circuit
Figure B-1: Simulation of Roll Forwarding Method 1 (Constant Frequency Rate)
Figure B-2: Simulation of Roll Forwarding Method 2 (Variable Frequency Rate)
Figure D-1: Module Placements of the TMR Design (Snapshot is taken with PACE)
Figure D-2: FPGA Editor View of TMR Design
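LIST OF ABBREVIATIONS
ALU     Arithmetic Logic Unit
API     Application Programming Interface
ASIC    Application Specific Integrated Circuit
CAD     Computer Aided Design
CRC     Cyclic Redundancy Check
DSP     Digital Signal Processing
FPGA    Field Programmable Gate Array
FSM     Finite State Machine
GUI     Graphical User Interface
HDL     Hardware Description Language
I/O     Input-Output
IP      Intellectual Property
LUT     Look-up Table
PCB     Printed Circuit Board
PE      Processing Element
PROM    Programmable Read Only Memory
RA      Reconfigurable Architecture
RAM     Random Access Memory
RTR     Runtime Reconfiguration
SDR
SEU     Single Event Upset
SoC     System on Chip
TMR     Triple Modular Redundancy
UART    Universal Asynchronous Receiver Transmitter
VHDL    VHSIC Hardware Description Language
VHSIC   Very High Speed Integrated Circuit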
CHAPTER I
INTRODUCTION
1.1 OVERVIEW
Microprocessors provide a flexible environment for programmers.
SRAM-based reconfigurable devices allow their configuration data to be changed
whenever required. Some devices use this property to change configuration data
while the device is running, so demands that change during operation can be
satisfied by reconfiguring the device. This type of reconfiguration is called
Runtime Reconfiguration (RTR). RTR introduced the Virtual Hardware concept: the
same hardware resources can be used for different purposes at different times
by reconfiguring the hardware. A runtime reconfigurable architecture therefore
allows a practically unlimited number of circuits to share a single chip by
time-multiplexing them.
RTR can be used in adaptable hardware applications and for in-field
upgrades of hardware. Other advantages of time-multiplexing resources with RTR
are reduced system cost and reduced power consumption. Most importantly,
speed-up can be obtained for different types of computations. Consequently,
adding the RTR property to reconfigurable architectures offers new
opportunities for digital systems.
1.2
1.3 TOOLS USED
In order to implement a runtime reconfigurable system, some hardware
and software tools were used. The tools are the following:
Hardware Tools
VHDL
Xilinx ISE is the CAD tool used to generate designs for Xilinx FPGAs. It
has a Graphical User Interface (GUI) that is sufficient for standard FPGA
designs, but not for a runtime reconfigurable design. Therefore, the command
line tools of ISE, such as NgdBuild, MAP, PAR, and BitGen, are used in this
design.
VHDL is a hardware description language. It is used to describe the
circuits implemented on the FPGA. Files written in VHDL are synthesized using
the Xilinx Synthesis Tool (XST).
Borland C++ Builder 5 is used to develop a visual PC program. This
program communicates with the FPGA board and manages the reconfiguration
processes. It also provides a user interface that lets the user control the
system and shows the system status.
1.4
following:
In Chapter 2, a literature survey on reconfigurable computing is presented.
Basic terms and concepts of reconfigurable architectures are explained, and the
application areas of reconfigurable architectures are given. Alternative
reconfigurable FPGAs from different vendors are discussed and their critical
characteristics are compared.
In Chapter 3, the Xilinx FPGA and its features that enable runtime
reconfiguration are discussed. Some properties of Xilinx FPGAs are explained
from this viewpoint.
In Chapter 4, a simple reconfigurable application is mapped onto a Xilinx
FPGA. The steps of designing a reconfigurable system are explained using this
simple application. All tools and their batch files are described in detail.
In Chapter 5, a runtime reconfigurable TMR system that is designed to be
highly fault tolerant is presented.
In Chapter 6, the conclusions of this thesis are given, together with
planned future work.
CHAPTER II
BACKGROUND
2.1 RECONFIGURABLE COMPUTING
In the last few decades, Reconfigurable Computing has become popular in
2.1.1
2.2
Reconfigurable architectures are generally composed of an array of
reconfigurable unit blocks and routing resources that connect these blocks. The
size of the unit blocks determines the granularity of the architecture, which
ranges from fine to coarse grain. They can be mainly classified as
Fine-Grained,
Coarse-Grained and
Heterogeneous Architectures.
[Figure: FPGA structure showing an array of Logic Cells (LC), Switch Matrix, I/O Cells, and Routing Lines]
Logic Cells (or Logic Tiles) are used to implement logic functions. Most
FPGA vendors use a Look-up Table (LUT) to implement bit-level combinational
logic functions in Logic Cells. For example, on Virtex family devices of Xilinx
a LUT takes four input signals and produces one output signal. The
combinational function of the LUT (4 inputs, 1 output) is encoded in 16 bits
and stored in the configuration memory of the FPGA. In addition to the LUT, a
flip-flop (FF) is placed in the same logic cell so that synchronous circuits
can be built. The Logic Cell structure of an SRAM-based FPGA is shown in
Figure 2-3.
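The 16-bit encoding of a 4-input LUT can be illustrated with a short software model. The sketch below is only an illustration: the function names are hypothetical, and the bit ordering (input a as the least significant address bit) is an assumption that does not necessarily match the ordering used in an actual Virtex bitstream.

```python
def make_lut4(func):
    """Encode a 4-input Boolean function as a 16-bit configuration word.

    Bit i of the word holds the output for the input combination whose
    address is i (assumed ordering: a is address bit 0, d is address bit 3).
    """
    word = 0
    for index in range(16):
        a, b, c, d = [(index >> bit) & 1 for bit in range(4)]
        if func(a, b, c, d):
            word |= 1 << index
    return word


def eval_lut4(word, a, b, c, d):
    """Look up the output bit for inputs a, b, c, d in a 16-bit LUT word."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (word >> index) & 1


# Two example functions: 4-input AND and 4-input XOR (parity).
and4 = make_lut4(lambda a, b, c, d: a & b & c & d)
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
```

Changing the function of the cell then amounts to rewriting these 16 bits, which is the operation that reconfiguration performs on the configuration memory.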
The elements of the array are connected by configurable routing. I/O
ports connect the PEs to the outside world. The arrangement of the array differs
according to the target application. Different array structures are available,
such as Mesh, Crossbar, Linear Array, and 2-Dimensional Array. These structures
are shown in Figure 2-5.
Linear arrays are designed as a pipeline with reconfigurable connections.
Rapid and PipeRench are popular linear array designs. Mesh arrays arrange PEs
in two dimensions and connect each PE to its nearest neighbors. Popular
mesh-based coarse-grained structures are MorphoSys, CHESS, Matrix, RAW, and
Garp. Some mesh structures add global connections to increase the performance
of the array. These structures are also called 2-Dimensional arrays and enable
connections between arbitrary PEs. Crossbar structures connect all PEs with
each other. However, this increases the cost of the routing resources. PADDI-1
and PADDI-2 are crossbar structures that are intended for prototyping datapaths
for Digital Signal Processing (DSP) algorithms [4].
[Figure 2-5: Array structures of coarse-grained architectures, panels a) to d), built from PEs, Registers, RAM, and a Crossbar Switch]
Figure 2-6: A Datapath Equation and Hardware Mapping [6] a) Equation mapped to
the node levels b) Hardware mapping of the equation
Heterogeneous Architectures
Heterogeneous architectures contain both fine- and coarse-grained elements
to take advantage of both worlds. Coarse-grained elements increase system
performance, while fine-grained elements maintain flexibility.
Therefore, newer reconfigurable architectures are designed
Therefore, efficient than
2.3 RECONFIGURATION APPROACHES
Dynamic (Run-Time) Reconfiguration
If device is reconfigured according to the changing demands during the
the FPGA. Different coprocessor configurations are prepared off-line and they are
loaded to the reconfigurable parts with changing demands.
Another advantage of partial reconfiguration is reduced reconfiguration
time. Since the full device does not need to be reconfigured, the size of the
reconfiguration data also decreases. In other words, reconfiguration time is
directly proportional to the size of the reconfigured module. For example, if
reconfiguring the entire device takes 4 ms, then a quarter of the device can be
reconfigured in 1 ms.
Self Reconfiguration
If the reconfigurable device reconfigures itself without any aid from the
outside world, it is called a self-reconfigurable system. The data required for
the different configurations are generally stored on standard storage media. A
part of the device is responsible for reading data from the storage medium and
sending it to the configuration port of the device. The configuration of the
device changes after the port receives the data.
The main advantage of this kind of reconfiguration is that no external
configuration controller is needed, which reduces the total system cost.
Moreover, configuration data can be compressed on the storage side and
decompressed by the configuration controller, which reduces the size of the
stored configuration data.
Different configuration port types can be used for self-reconfiguration. For
example, if the device only has a configuration port available at external
pins, it can be used as shown in Figure 2-9. In this structure, the
configuration controller reads the configuration data and sends it to the
external configuration port of the device. However, this approach has some
drawbacks. Firstly, the pins used by the configuration controller cannot be
used for other purposes. Secondly, the configuration data sent from the
configuration controller to the configuration port cannot be kept secure, since
the data signals must pass through the PCB.
Some devices (such as the Xilinx Virtex-II FPGA) have an integrated
configuration port inside the fabric of the device. The configuration
controller can access this port internally (without going through pins), as
shown in Figure 2-10. As a result, pins are not wasted on reconfiguration and
reconfiguration can be done securely.
configuration data to the internal configuration port of the device and user
application switches to another one. As a result, reconfiguration of user
application becomes secure with this method since raw configuration data cannot
be monitored from the outside world.
2.4 RECONFIGURATION TIME
Reconfiguration time is an important criterion on runtime reconfigurable
Some works also try to reduce the reconfiguration delay by using offline
scheduling algorithms. For example, [14] assumes that the sequence of tasks is
already known before the system runs (i.e. at design time) and reduces the
reconfiguration overhead by up to 40%.
2.5
System on Chip (SoC). The FPGA can be divided into two parts, one static and
the other reconfigurable. A soft CPU can then be mapped onto the static part to
manage the reconfiguration processes of the reconfigurable part.
On a partially reconfigurable FPGA, more than one area can be
reconfigurable at a time. Therefore, multiple tasks can be loaded at the same
time and each can be reconfigured independently of the others. This is another
advantage of partial reconfiguration of FPGAs.
Altera, Atmel, Lattice, QuickLogic, and Xilinx are the major FPGA vendors
in the world. About half of them have FPGA products that offer partial runtime
reconfiguration. These partially reconfigurable FPGA devices are listed below:
Atmel AT6K
Atmel AT40K
Atmel AT94K
Lattice ORCA
Xilinx Virtex
Xilinx Spartan
configuration sequence that tells the FPGA not to reset the entire RAM
configuration during a reconfiguration [15].
2.5.1
parallel mode at 60 MHz with handshaking, where the XCV50 is the smallest
device of the Virtex series FPGAs. The reconfiguration time of the Atmel
AT40K40 FPGA is 631 µs in parallel mode, writing 16-bit wide words at 33 MHz
[16]. Full reconfiguration of the ORCA OR4E06 takes 5.94 ms [17]. Note that
these are the smallest devices of the vendors. Newer and higher capacity FPGAs
have larger configuration data. However, their configuration ports are also
faster, which keeps reconfiguration times at roughly the same order of
magnitude. For example, the Xilinx Virtex-4 has a 32-bit SelectMAP
configuration port, which can reach clock rates of up to 100 MHz.
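The relation between port width, clock rate, and reconfiguration time can be sketched as follows. This is a rough estimate that assumes one port-width word is written per clock cycle and ignores handshaking and protocol overhead. The function name is hypothetical, and the bitstream size in the usage lines is an illustrative figure, not a value taken from any data sheet.

```python
def reconfiguration_time_s(bitstream_bits, port_width_bits, clock_hz, fraction=1.0):
    """Estimate the time to write a (partial) bitstream through a parallel port.

    fraction is the share of the device being reconfigured (1.0 = full device).
    Assumes one word per clock cycle and no protocol overhead.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    words = bitstream_bits * fraction / port_width_bits
    return words / clock_hz


# Hypothetical 1,000,000-bit bitstream, used for illustration only.
full = reconfiguration_time_s(1_000_000, port_width_bits=8, clock_hz=50e6)
quarter = reconfiguration_time_s(1_000_000, 8, 50e6, fraction=0.25)
wide = reconfiguration_time_s(1_000_000, port_width_bits=32, clock_hz=100e6)
```

Under these assumptions, reconfiguring a quarter of the device takes a quarter of the time, and a wider, faster port shortens the time in proportion to its throughput.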
2.6
2.6.1
2.6.2 In-Field Upgrades
Being a reconfigurable architecture also provides some other unique
2.7
Speeding-up Computations
2.7.1
divided into multiple parts. These smaller parts can be mapped to the hardware
by generating a configuration for each of them. The configurations can then be
loaded onto the device at different times by using RTR. A scheduler arranges
the reconfiguration operations according to the demands. Therefore, a device of
smaller capacity can be sufficient to implement a larger circuit. This reduces
the cost and power consumption of the system.
For example, Lianos et al. proposed a space-efficient method for
calculating the Fast Fourier Transform (FFT) using a dynamically reconfigurable
architecture [18]. One reconfigurable vector calculates a column of the FFT and
then feeds its outputs back into the reconfigurable vector to calculate the
consecutive stages of the FFT. Therefore, with RTR a single reconfigurable
vector is enough to calculate the FFT on dedicated hardware.
In another work [19], a reconfigurable architecture that behaves as a
Programmable Logic Controller (PLC) is implemented. The architecture uses the
Temporal Petri Net language to describe applications. The sequential structure
of Petri Nets allows applications to be split into multiple parts. These parts
are then mapped to the same FPGA and used sequentially by reconfiguring it.
This architecture can divide a whole application into up to 40 parts, so an
FPGA with up to 40 times less capacity can be sufficient in place of a large
one. This can reduce the device cost from $317 to $38.
The widespread use of mobile systems has increased the demand for low power
consumption combined with high performance. Some works deal with mobile
systems that use dynamic reconfiguration to reduce the total power of the
system. In [20], the control units of an automobile are implemented on a
runtime reconfigurable FPGA. The user area is divided into four smaller parts.
A large number of control units (e.g. 20 units) that cannot fit into one device
share the available resources by time multiplexing. A scheduler determines the
reconfiguration processes of the control units. As a result, the system
consumes only the power of four control units while implementing a much larger
number of them. In addition, a part of the FPGA always stays in contact with
the outside world, since only the necessary parts are reconfigured. This
eliminates the need for an external controller for the reconfiguration process,
which contributes to power and cost reduction.
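The time multiplexing of many control units over a few reconfigurable areas can be sketched with a small software model. The sketch below is not the scheduler of [20]: the class name is hypothetical and the least-recently-used replacement policy is an assumption chosen only to make the example concrete.

```python
from collections import OrderedDict


class SlotScheduler:
    """Toy model of time-multiplexing units over a fixed number of areas.

    Assumption: when all areas are occupied, the least recently used unit
    is replaced. A real scheduler may use a different policy.
    """

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.loaded = OrderedDict()  # unit id -> None, least recently used first
        self.reconfigurations = 0

    def request(self, unit):
        """Make `unit` available; return True if a reconfiguration was needed."""
        if unit in self.loaded:
            self.loaded.move_to_end(unit)  # already on the device, mark as recent
            return False
        if len(self.loaded) >= self.num_slots:
            self.loaded.popitem(last=False)  # evict the least recently used unit
        self.loaded[unit] = None
        self.reconfigurations += 1
        return True


# Four reconfigurable areas shared by a sequence of control-unit requests.
scheduler = SlotScheduler(num_slots=4)
for unit in [0, 1, 2, 3, 0, 4, 1, 5]:
    scheduler.request(unit)
resident_units = list(scheduler.loaded)
```

In this model only four units are resident at any time, regardless of how many distinct units are requested, which corresponds to the power argument made above.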
2.7.2 Adaptable Computing
Some types of applications require adaptation of hardware to changing
of the adaptable-computing applications absolutely need
2.7.3 Speeding-up Computations
Reconfigurable Architectures (RAs) provide a flexible structure as
on microprocessor allows only serial operations. Therefore,
2.7.4
and replacing faulty resources with spare ones. Reserving spare resources is
straightforward on reconfigurable devices, since they are composed of an array
of identical elements. Many studies, such as [28], [29], and [30], use the
inherent reconfiguration property of FPGAs to tolerate faults on them. Studies
dealing with this topic are discussed in more detail in Chapter 5.
2.8
CHAPTER III
3.1
Output Blocks (IOB), BlockRAMs (internal RAM), and the configurable routing
matrix. An array of CLBs forms the FPGA structure. The CLBs are connected by
routing lines and implement logic functions. For example, the device used in
this work, the XC2S200E, has 28 rows and 42 columns of CLBs. The structure of
the Spartan 2E FPGA is shown in Figure 3-1.
3.1.1
which have two Logic Cells (LCs). These logic cells are the basic building
blocks of the FPGA. Each LC contains one flip-flop as a storage element and one
look-up table that implements combinational logic. Carry logic elements are
also included to speed up arithmetic operations. The CLB structure of a
Virtex-E or Spartan 2E device is shown in Figure 3-2. Note that the CLB
architectures of Virtex-E and Spartan 2E are the same.
3.1.2
Output Blocks (IOBs). As shown in Figure 3-3, an IOB includes flip-flops (FFs)
for the input, output, and tri-state enable signals. These FFs can be used to
obtain the minimum FF-to-pin delay. In addition, a number of IOBs are grouped
to form a bank. The voltage levels of the banks can be selected from different
I/O standards.
3.1.3 Routing Structure
Routing structure is reconfigurable on Xilinx FPGAs, which is one of the
Local Routings are used to make connections inside the CLB, between
CLB and General Routing Matrix (GRM), and between two CLBs.
Figure 3-5: Horizontal Longlines that traverse all along the FPGA
Global Routings are used for low-skew and high-fanout signals such as
clock signals.
3.2
3.2.1
3.2.2 Glitchless Reconfiguration
FPGA memory cells have glitchless transitions, when rewritten, the
3.2.3 Clocking Logic
The same clock can be routed to all partial modules. However, the clocking
logic (clock routing paths, clock IOB) is always separate from the
reconfigurable module, and clocks have separate bitstream frames [35]. As a
result, reconfiguring a module does not affect synchronous circuits in another
module.
3.2.4
different configuration interfaces for these initial and run-time
reconfigurations. However, not all of these methods are suitable for run-time
reconfiguration. The methods suitable for run-time reconfiguration are
SelectMAP Interface,
BitGen tool, -g Persist:Yes option must be used. This option ensures that the
SelectMAP interface will remain active after first configuration.
The essential signals of the SelectMAP configuration port are given in
Figure 3-7. Configuration data is sent or received through the DATA pins,
synchronized with the CCLK clock. BUSY is used for handshaking and is not
necessary at low clock rates. CS is the Chip Select signal that enables the
port for data transfers. WRITE selects the operation type, either write or
read. The PROG, INIT, and DONE signals carry SelectMAP protocol commands and
acknowledgements, such as resetting the configuration logic and verifying
successful operation. More details about the SelectMAP protocol can be found in
[37].
[Figure 3-8: ICAP interface signals: I[0:7], O[0:7], CLK, BUSY, WRITE, CE]
The ICAP interface signals are shown in Figure 3-8. The CLK, WRITE, and BUSY
signals function identically on ICAP and SelectMAP. In addition, CE has the
same function as CS on SelectMAP. The only difference is the data bus, which is
divided into two parts on ICAP. One part (I[0:7]) is used for writing
configuration data to the port, while the other part (O[0:7]) is used for
reading back the configuration data.
Boundary Scan (JTAG) Mode
The Joint Test Action Group (JTAG) designed a test standard, also named
JTAG, for testing Printed Circuit Boards (PCBs). Its Boundary Scan architecture
is designed to test the physical connections of I/O pins at the board level.
JTAG became a widely used test port as PCB structures grew more complicated and
Integrated Circuits (ICs) became smaller [38]. Owing to its many benefits, it
has become an IEEE standard (IEEE 1149.1). Most current ICs contain JTAG port
pins for debugging. The boundary scan architecture has a four-wire serial
interface that travels along all the pins of the device, forming a chain.
Serial data enters the device through the Test Data In (TDI) pin and is stored
in a shift (instruction) register. The data is sent to the output of the device
through the Test Data Out (TDO) pin. All data shifting on the JTAG chain is
synchronized to the Test Clock (TCK). The reserved pins of the JTAG port and
their acronyms are listed in Table 3-1.
Table 3-1: JTAG Pins and their descriptions
Pin Name    Description
TDI         Test Data In
TDO         Test Data Out
TMS         Test Mode Select
TCK         Test Clock
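The serial shifting described above can be illustrated with a small behavioral model. The sketch below models only the TDI to TDO data path clocked by TCK; the TMS-driven state machine of IEEE 1149.1 is deliberately left out, and the class and function names are hypothetical.

```python
class ShiftRegister:
    """A serial shift register: one bit enters at TDI and one leaves at TDO per TCK."""

    def __init__(self, length):
        self.bits = [0] * length

    def clock(self, tdi):
        """Apply one TCK edge; return the bit that appears on TDO."""
        tdo = self.bits[-1]
        self.bits = [tdi] + self.bits[:-1]
        return tdo


def shift_through_chain(registers, tdi_bits):
    """Shift a bit sequence through devices chained TDO-to-TDI.

    On every clock, each register receives the bit that its predecessor
    held at its output before the clock edge.
    """
    tdo_bits = []
    for bit in tdi_bits:
        for register in registers:
            bit = register.clock(bit)
        tdo_bits.append(bit)
    return tdo_bits


# A single 3-bit register delays the input pattern by three clocks.
single = shift_through_chain([ShiftRegister(3)], [1, 0, 1, 0, 0, 0])

# Two chained devices (3 bits and 2 bits) delay it by five clocks.
chained = shift_through_chain(
    [ShiftRegister(3), ShiftRegister(2)], [1, 0, 1, 0, 0, 0, 0, 0]
)
```

The model shows why the length of the chain determines how many TCK cycles are needed before data written at TDI becomes visible at TDO.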
[Figure: JTAG configuration port signals (TMS, TDO, TCK, TDI) and mode pins (M0, M1, M2)]
3.3
interface (GUI) of Xilinx ISE software. The GUI takes the circuit information
from the user as an HDL file (e.g. VHDL or Verilog) or a schematic file. Using
these files,
The circuit netlist and constraints are combined into one file by a
translation operation (not shown in Figure 3-10). In the mapping phase, the
circuit is partitioned and its elements are grouped so that they map to Logic
Cells (LCs). Afterwards, these logic cells are placed and routed on the FPGA
using CLBs, routing resources, IOBs, etc. In the last step, the configuration
information is extracted from the placed and routed design and written to the
configuration file (i.e. the bitstream).
The tools used for the operations of the standard design flow are given in
Table 3-2. Note that these tools accept additional options that enable
different design flows. This feature is used in the creation of runtime
reconfigurable designs and is explained in Chapters 4 and 5.
Table 3-2: Standard Design Flow Operations and Tools of Xilinx FPGAs
Operation            Tool
Synthesis            XST
Translation          NgdBuild
Mapping              Map
Place and Route      PAR
Creating Bitstream   Bitgen
3.4
3.4.1 XAPP290
XAPP290 is an application note published by Xilinx. It includes reference
3.4.2 JBITS
JBits is a Java-based Application Programming Interface (API) developed by
Xilinx. This API can be used to construct digital designs and parametric cores
that can be executed on Xilinx Virtex II FPGA devices. It runs in a
Java-enabled environment (usually a PC). At present it is published only for
Virtex II, but it may be extended to other devices in the future.
JBits can be used for runtime reconfigurable applications. Circuits can be
configured on the fly by executing a Java application that communicates with
the circuit board containing the Virtex II device. By using the XHWIF API, it
is possible to download the design from within the same Java application. This
enables run-time configuration and reconfiguration of the Virtex II device
[39]. The design flow of runtime reconfiguration using JBits is shown in
Figure 3-11.
[Figure 3-11 blocks: Bitstream from Xilinx ISE tools; Design App; JBits API; XHWIF; Design Verification and Execution; Virtex II Hardware]
Figure 3-11: Design Flow of Runtime Reconfiguration using JBits [39]
The main steps involved in a JBits application are the object construction,
reading bitstream from a .bit file, modifying the bitstream, and writing bitstream to
a file again. This application flow on JBits is shown in Figure 3-12.
[Figure 3-12 flowchart: Start, Create JBits, Read Bitstream, Modify Bitstream, Write Bitstream, Stop]
Figure 3-12: JBits Application Flow
The disadvantage of JBits is that it is too low-level: it changes the
routing of the device, LUT configurations, etc. The designer must know the
entire device architecture to modify bitstreams. Therefore, JBits remained a
research tool and was not adopted for implementing complex designs.
CHAPTER IV
Xilinx FPGAs support runtime reconfiguration (RTR). Partial
4.1
4.1.1
A reconfigurable module can use only the Input Output Blocks (IOBs) that lie
on its boundaries.
4.1.2 Bus Macros
Reconfigurable modules must communicate with other modules via bus
macros, as seen in Figure 4-2. Bus macros provide fixed routing for signals that
pass to other modules. Therefore, every configuration of a module uses the same
path to exchange signals with other modules. Otherwise, communication would be
broken by a reconfiguration.
[Figure 4-2: A Reconfigurable Module connected through Bus Macros to another Reconfigurable or Static Module]
Tri-state buffers and horizontal long lines are used to implement bus
macros. The physical implementation of a bus macro is shown in Figure 4-3.
LO[3:0] and RO[3:0] are the horizontal tri-state long lines and are used for
tri-state signals. LI[3:0] can drive these long lines if the LT[3:0] enable
signals are held active. Likewise, RI[3:0] can drive these long lines if the
RT[3:0] enable signals are held active.
At any instant, only one side may actively drive the bus, to prevent
contention (i.e. RT and LT must not be active at the same time). It is also
suggested that a bus macro be used in only one direction (i.e. not
bidirectionally). In addition, its direction must not change with
reconfiguration.
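The driving rule of the bus macro can be illustrated with a behavioral model of the four tri-state long lines. This is a sketch under simplifying assumptions: an "active" enable is modeled as True regardless of the electrical polarity of the real tri-state buffers, an undriven line is represented by None, and the function name is hypothetical.

```python
def drive_long_lines(li, lt, ri, rt, width=4):
    """Resolve the value on each tri-state long line of a bus macro.

    li, ri: data bits offered by the left and right side.
    lt, rt: per-bit enables; True means that side actively drives the line.
    Returns a list with 0 or 1 for driven lines and None for undriven lines.
    Raises ValueError if both sides drive the same line (contention).
    """
    resolved = []
    for bit in range(width):
        if lt[bit] and rt[bit]:
            raise ValueError(f"contention on long line {bit}")
        if lt[bit]:
            resolved.append(li[bit])
        elif rt[bit]:
            resolved.append(ri[bit])
        else:
            resolved.append(None)
    return resolved


# Left-to-right use: the left module drives, the right side stays disabled.
left_to_right = drive_long_lines(
    li=[1, 0, 1, 1], lt=[True] * 4, ri=[0, 0, 0, 0], rt=[False] * 4
)
```

Fixing the enables so that only one side ever drives, as in the usage line, corresponds to the recommendation that a bus macro be used in a single direction.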
4.1.3 Clocking Logic
As mentioned before, clocking logic is independent from reconfiguration
4.2
The static module only stores the input operands and the result of the
arithmetic operation. The reconfigurable module is used as an Arithmetic Logic
Unit (ALU) that has three different configurations. These configurations are
used to implement
4.3
and Xilinx Modular Design Flow [40] is followed. In addition, the restrictions
given in the Xilinx application note [35] are taken into consideration.
4.3.1
Assemble Phase
[Figure 4-7 annotations: Includes Bus Macro; Output of Active Implementation, used in Final Assembly; To synthesize all Modules and Top Design, separate directories are used]
Figure 4-7: Directory Structure Used For A Module Based Partial Reconfigurable
Design
The bus macro used in the design is placed in the BusMacro directory. It
is taken from the Xilinx application note files [35]. However, some
corrections were made to these files for the Virtex-E device (explained in
Section 4.4).
Table 4-1: Descriptions of Files that are used for Module Based Partial
Reconfiguration
File Extension  Constructed by Program         Description                                          Used by
.vhd            User                           the design.                                          XST (Xilinx synthesis tool)
.ucf            User / Constraints Editor      how the logical design is                            Ngdbuild command line program
.nmc            N/A                                                                                 Ngdbuild command line program
.ngc            XST (Xilinx synthesis tool)    Output of. Synthesized module                        Ngdbuild command line program
.ngd            Ngdbuild command line program                                                       MAP command line program
.ncd            MAP command line program       The Native Circuit Description design mapped to the  PAR command line program
.ncd            PAR command line program                                                            BitGen command line program
.bit            BitGen command line program                                                         To load device
.bld            Ngdbuild command line program                                                       User
.mrp            MAP command line program                                                            User
.par            PAR command line program                                                            User
4.3.2
created. VHDL is used to describe the logic functions, which are synthesized
using Xilinx ISE 6.3i XST. There are five different VHDL files. These are
Top.vhd,
Right.vhd,
Separate projects are created for the Top, Right, Left Adder, Left
Multiplier, and Left Subtractor modules. The Add IO Buffers option is selected
in the Xilinx Specific Options tab when synthesizing the top module; for the
other modules, this option is deselected. Therefore, I/O buffers are added only
to the Top module, as is also done for non-reconfigurable designs.
Also, for all modules, the Bus Delimiter option is set to <>. Note that the
<> sign is called the angle delimiter. Therefore, the angle-delimiter bus macro
provided by Xilinx is used in the following steps.
After synthesis, files with the .ngc extension are created. These netlist
files are used in the implementation.
4.3.3
Implementation
In the implementation flow, an initial configuration bitstream that configures the
whole FPGA is generated. In addition, a partial bitstream is generated for every
configuration of each reconfigurable module.
In this example, two modules lie on the FPGA. One of them is reconfigurable
while the other is static. There are three different configurations (adder,
multiplier, and subtractor) for the reconfigurable module. In summary, three
partial bitstreams for the reconfigurable part and one full bitstream for the
whole device are generated.
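The bitstream count above can be checked with a small sketch. The function and the full bitstream file name are illustrative assumptions; only the partial_<module>_<configuration>.bit naming pattern follows the file name that appears in Figure 4-8.

```python
def list_bitstreams(reconfigurable):
    """Return the bitstream names needed for a design: one full bitstream
    plus one partial bitstream per configuration of each reconfigurable
    module. `reconfigurable` maps a module name to its configurations."""
    names = ["top_full.bit"]  # hypothetical name for the full bitstream
    for module, configs in reconfigurable.items():
        for cfg in configs:
            names.append(f"partial_{module}_{cfg}.bit")
    return names


# Example design of this section: one reconfigurable module (Left) with
# adder, multiplier and subtractor configurations.
bitstreams = list_bitstreams({"Left": ["add", "mult", "sub"]})
print(len(bitstreams), bitstreams)
```

For the example design this yields four files: one full bitstream and three partial ones.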
The implementation flow has the following three phases: the Initial Budgeting
Phase, the Active Implementation Phase, and the Assemble Phase.
A general overview of the flow is shown in Figure 4-8 and Figure 4-9.
[Figure 4-8 flow labels. Top design: Top.vhd (VHDL file) -> Synthesis (XST Tool) -> Top.NGC; Top.UCF (created by user) and Bm_4b.NMC (bus macro, given by Xilinx) -> Translate (NGDBuild Tool) -> Top.NGD. Module: Left_add.vhd (VHDL file) -> Synthesis (XST Tool) -> Left_add.NGC; Left_add.NGD -> Map (MAP Tool) -> Left_add.NCD; Generate Bitstream (BitGen Tool) -> partial_Left_add.bit (bitstream file); PimCreate Tool -> Pim Folder]
Figure 4-8: Initial Budgeting and Active Implementation Phases of Module Based
Partial Reconfiguration Flow.
4.3.3.1
the overall design. A user constraint file is created and used with the NgdBuild
tool to annotate constraints to the synthesized top design file.
After saving the user constraint file, the following constraints are added by PACE:
AREA_GROUP "AG_left_module" RANGE = CLB_R1C1:CLB_R28C21 ;
AREA_GROUP "AG_left_module" RANGE = TBUF_R1C1:TBUF_R28C21 ;
INST "left_module" AREA_GROUP = "AG_left_module" ;
as shown in Figure 4-11. The upper one is called TBUF 0; the lower one is called
TBUF 1.
A bus macro occupies one row by eight columns of CLBs. The origin of a
bus macro is the upper TBUF in the leftmost CLB. The placement and origin of a
bus macro are shown in Figure 4-12.
[Figure 4-12 labels: Bus Macro with Origin; Column 12; Column 15; Column 16; CLB; Reconfigurable; Fixed; Module; Boundary]
An example statement in the ucf constraint file to define the origin of the
bus macro is the following:
INST busmacroname LOC = TBUF_R1C1.0
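The placement rule for a bus macro (one row by eight CLB columns, with the origin at the upper TBUF of the leftmost CLB) can be sketched as follows. The helper names are hypothetical, and the generated string only mirrors the constraint form shown above with a terminating semicolon added.

```python
import re

BUS_MACRO_WIDTH = 8  # a bus macro spans one row by eight CLB columns


def bus_macro_footprint(origin):
    """Return the CLB sites covered by a bus macro whose origin is a TBUF
    site name such as 'TBUF_R1C12.0'."""
    match = re.fullmatch(r"TBUF_R(\d+)C(\d+)\.([01])", origin)
    if match is None:
        raise ValueError(f"not a TBUF site name: {origin}")
    row, col, tbuf = (int(group) for group in match.groups())
    if tbuf != 0:
        raise ValueError("the origin must be the upper TBUF (TBUF 0)")
    return [f"CLB_R{row}C{c}" for c in range(col, col + BUS_MACRO_WIDTH)]


def loc_constraint(instance, origin):
    """Build a UCF LOC line for a bus macro instance after validating the origin."""
    bus_macro_footprint(origin)
    return f"INST {instance} LOC = {origin} ;"


print(loc_constraint("busmacroname", "TBUF_R1C1.0"))
print(bus_macro_footprint("TBUF_R1C12.0"))
```

A bus macro with its origin in column 12 therefore covers CLB columns 12 to 19 of the same row.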
The following constraints are also entered manually to lock the placement of the
LUTs used for VCC and GND in each module (the reason for adding VCC and GND
is explained in Section 4.3.4).
INST "Internal_Gnd_Left" AREA_GROUP = "AG_left_module" ;
INST "Internal_Vcc_Left" AREA_GROUP = "AG_left_module" ;
INST "Internal_Gnd_Right" AREA_GROUP = "AG_right_module" ;
The -uc option ensures that the constraints from the top.ucf file are annotated to
the top.ngd file.
The -p xc2s200e-pq208-7 option tells ngdbuild that the device is a Xilinx
Spartan-IIE XC2S200E, the package is PQ208 and the speed grade is -7.
4.3.3.2
level constraints. Partial bitstreams are generated for all reconfigurable modules
(Left_mult, Left_sub, Left_add) and the static module (right), as illustrated in
Figure 4-13.
Figure 4-13: Partial Bitstreams for Reconfigurable Modules and Static Module
For all modules, the Top.ucf file and the associated synthesized .ngc file are
copied to the module directories in the implementation directory.
Then the NgdBuild, MAP, PAR and BitGen commands are executed successively for
each module. In addition, pimcreate is executed for each module to publish the
routed and mapped partial module design. The published files are placed in the
Physically Implemented Modules (PIM) folder and are used later in the final
assembly phase.
As an example, the following commands are executed for the adder
configuration of the left module.
ngdbuild -p xc2s200e-pq208-7 -modular module -active left
meaning that the device remains in full operation while the new partial bitstream is being
downloaded.
combined to obtain a complete FPGA design. These partial design files are taken
from the Pims directory and used for map, place, and route operations to create a
full FPGA design.
Only one complete assembly is performed. It includes the left adder and right
modules (the other two possibilities are combining the left multiplier and right
modules, or the left subtractor and right modules).
The Top_final directory is used for the final assemble phase. The Top.ucf file
created in the initial budgeting phase, bm_4b.nmc (in the BusMacro directory) and
the synthesized design top.ngc are copied to the Top_final directory.
After copying the files, the following commands are executed successively:
par -w top_map.ncd top.ncd : PAR takes the mapped design as input, places and
routes it, and outputs the top.ncd file
[Figure labels: Adder Circuit; Bus Macros; Boundary of Modules]
[Figure 4-15 labels: Multiplier Circuit; Bus Macros; Boundary of Modules]
Figure 4-15: Placement of a Multiplier Circuit and Bus Macro on the FPGA
[Figure 4-16 labels: Subtractor Circuit; Bus Macros; Boundary of Modules]
Figure 4-16: Placement of a Subtractor Circuit and Bus Macro on the FPGA
[Figure 4-17 labels: Adder Circuit; Storage Module; Boundary of Modules]
Figure 4-17: Final Layout of the Circuit on the FPGA with Adder Module on the Left
Side
4.3.4
macros. In addition, as explained before, the direction of a bus macro is set
by using its LT or RT enable ports. These ports are driven by logic 1 (VCC) and
logic 0 (Ground), so that one direction is selected for the bus macro. These VCC
and Ground signals must also remain alive during the reconfiguration of a module.
In addition, it is forbidden to use the same constant logic 1 (VCC) and logic
0 (Ground) signals in different reconfigurable modules. The reason is that
reconfiguring a module can cause a problem on another module that shares these
signals with it. Therefore, instead of being shared, the logic 1 and logic 0
signals must be supplied to each module separately.
These limitations force each module to have its own VCC and Ground signals,
which must always be available (even while the module is being reconfigured).
There are two methods for supplying logic 1 and logic 0 signals to the modules.
The first is to take these signals from the outside world through FPGA pins
[41]. The second is to create dummy Look-up Tables (LUTs) for each module and
take the logic 1 and logic 0 signals from them [42]. The second method is used
in this design.
1-input LUTs are used, and the LUT functions are selected so that, whatever the
input is, one LUT generates logic 0 and the other generates logic 1. The truth
tables of the LUTs are given in Table 4-2.
Logic 0 LUT        Logic 1 LUT
Input  Output      Input  Output
0      0           0      1
1      0           1      1
The two LUTs are connected to each other in order to create dummy inputs.
The LUT connections are shown in Figure 4-18. The LUT on the left side generates
logic 0 and the LUT on the right side generates logic 1.
The same structure is used for each module (fixed and reconfigurable).
The VHDL code added to the top.vhd file to generate the left module's logic 1
and logic 0 is the following:
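-- Fake Gnd and Fake Vcc of Left Module
-- Each LUT1 drives the dummy input of the other one.
Internal_Gnd_Left: LUT1
  generic map (INIT => b"00")  -- output is 0 for either input value
  port map (O => Gnd_Left, I0 => Vcc_Left);
Internal_Vcc_Left: LUT1
  generic map (INIT => b"11")  -- output is 1 for either input value
  port map (O => Vcc_Left, I0 => Gnd_Left);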
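The behaviour of the two dummy LUTs can be illustrated with a short behavioural model. This is only a sketch of how a 1-input LUT is evaluated from its INIT value; the function name is hypothetical and the model is not part of the design files.

```python
def lut1(init_bits, i0):
    """Evaluate a 1-input LUT. `init_bits` is the INIT value written as in
    the VHDL code (for example "00" or "11"); the rightmost character is
    the output for I0 = 0 and the leftmost is the output for I0 = 1."""
    table = init_bits[::-1]  # table[0] -> output for I0 = 0
    return int(table[i0])


# Whatever the dummy input is, the outputs stay constant.
for dummy_input in (0, 1):
    gnd_left = lut1("00", dummy_input)
    vcc_left = lut1("11", dummy_input)
    print(dummy_input, gnd_left, vcc_left)
```

Since both outputs are independent of the input, cross-connecting the two LUTs only serves to give each of them a driven input.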
4.4
was done on Virtex-100E. The bus macro provided by Xilinx for the Virtex-E device
contains some errors. These errors are corrected to achieve partial reconfiguration
on the Virtex-E device.
4.4.1
connected on the outside of block "t4H_<0>" but left unconnected within the block: T
WARNING:PhysSimExpander:5 - TBUF symbol `t3G_<1>':
connected on the outside of block "t3G_<1>" but left unconnected within the block: T
....
These warnings cause the mapping tool (MAP) to fail with errors. The reason
for the warnings is an error in the bus macro file of the Virtex-E family. The
bus macros with the .nmc extension must be converted to .xdl files to make them
editable with a text editor. A Xilinx command line utility is used for the
conversion. For example, the bus macro bm_4b_ve.nmc is converted to
bm_4b_ve.xdl by the following command:
xdl -ncd2xdl bm_4b_ve.nmc
The converted xdl file is opened with a text editor. Some lines in the file are
problematic, such as
inst "t4H_<0>" "TBUF" , placed R1C16 TBUF_R1C16.1 ,
cfg "TMUX::0 IMUX::I _SUPERBEL::TRUE";
inst "t3G_<1>" "TBUF" , placed R1C15 TBUF_R1C15.0 ,
cfg "TMUX::0 IMUX::I _SUPERBEL::TRUE";
........
In these lines
cfg "TMUX::0 IMUX::I _SUPERBEL::TRUE"
is changed to
cfg "TMUX::T IMUX::I _SUPERBEL::TRUE"
The changes explained above are made to the .xdl file, which is then converted
back to an .nmc file by the following command:
xdl -xdl2ncd bm_4b_ve.xdl
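The manual edit of the TBUF instances could also be scripted. The following is a minimal sketch that assumes the xdl text has the layout shown in the excerpt above (each inst statement terminated by a semicolon); the function name is hypothetical.

```python
import re

# Matches the part of a TBUF inst statement up to the opening of its cfg
# string, followed by TMUX::0. "[^;]*?" keeps the match inside one statement.
_TBUF_TMUX = re.compile(r'(inst\s+"[^"]+"\s+"TBUF"\s*,[^;]*?cfg\s+")TMUX::0')


def fix_tbuf_tmux(xdl_text):
    """Change TMUX::0 to TMUX::T in the cfg string of every TBUF instance."""
    return _TBUF_TMUX.sub(r"\1TMUX::T", xdl_text)


sample = (
    'inst "t4H_<0>" "TBUF" , placed R1C16 TBUF_R1C16.1 ,\n'
    '  cfg "TMUX::0 IMUX::I _SUPERBEL::TRUE";\n'
    'inst "t3G_<1>" "TBUF" , placed R1C15 TBUF_R1C15.0 ,\n'
    '  cfg "TMUX::0 IMUX::I _SUPERBEL::TRUE";\n'
)
print(fix_tbuf_tmux(sample))
```

Instances of other types are left untouched, so the script changes only the lines that are edited by hand in the text.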
4.4.2
Again, the bus macro has problems that need to be corrected. The
problematic lines in the converted bus macro file (.xdl) are the following:
net "TNET<3>" ,
outpin "t1A_<3>"
outpin "t1E_<3>"
is changed to:
net "TNET<3>" ,
cfg "_NET_PROP::IS_BUS_MACRO:" ,
outpin "t1A_<3>"
outpin "t1E_<3>"
For all nets (net "TNET<3>", net "TNET<2>", net "TNET<1>", net "TNET<0>") the
same correction is done. In other words cfg
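The net correction above can be sketched as a small text transformation. It assumes that every TNET net statement starts on its own line in the form shown in the excerpt; the function name is hypothetical.

```python
import re

NET_PROPERTY = 'cfg "_NET_PROP::IS_BUS_MACRO:" ,'


def mark_bus_macro_nets(xdl_text):
    """Insert the bus macro net property after every TNET<n> net header,
    unless the following line already carries it."""
    lines = xdl_text.splitlines()
    result = []
    for index, line in enumerate(lines):
        result.append(line)
        if re.match(r'\s*net\s+"TNET<\d+>"\s*,\s*$', line):
            following = lines[index + 1] if index + 1 < len(lines) else ""
            if "_NET_PROP::IS_BUS_MACRO" not in following:
                result.append(NET_PROPERTY)
    return "\n".join(result)


sample_net = 'net "TNET<3>" ,\noutpin "t1A_<3>"\noutpin "t1E_<3>"'
print(mark_bus_macro_nets(sample_net))
```

Running the function a second time leaves the text unchanged, so it can be applied safely to a file that has been partly edited by hand.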
CHAPTER V
5.1
BACKGROUND
Reliability is an important issue for mission critical and safety critical
applications. High reliability must be maintained in systems where a failure can
cost lives and money. For instance, the breakdown of a satellite is unacceptable
when the total system cost is a few billion dollars. Similarly, the brake system
of an automobile must be highly reliable, since people's lives depend on it. Hence,
designers of such systems must take into consideration the hardware faults
that can arise during operation.
A reliable hardware environment is designed in this work, which is mainly
built on a run-time reconfigurable FPGA. The fault types seen on such FPGAs
and their recovery methods are discussed in this section.
5.1.1
Fault Tolerance
A fault tolerant system can continue to operate even when a fault has occurred.
Redundancy
Reserving extra resources to mitigate the effects of faults is called redundancy.
Availability
Availability is the ratio of the time the system runs without breakdown
to its total running time. High availability is the main aim of a fault
tolerant architecture. For example, mission critical applications require very high
availability. The ideal availability for such systems is 100%.
5.1.2
faults. TMR is composed of three redundant modules and one voter module, as
seen in Figure 5-1. All redundant modules are exact copies of each other. The size
of a redundant module can range from a single gate to a complex circuit. A majority
voter compares the outputs of these identical modules.
When no error is present in the system, the outputs of all modules agree
with each other, and the voter uses this output. If one of the modules fails, it
gives a different output from the other modules. The outputs of the two correct
modules then agree, and the voter forwards this output. If more than one module
becomes faulty, the modules no longer agree with each other and the system breaks
down. Therefore, TMR alone can mask the effect of only a single failure. If a
recovery approach is applied to TMR after a failure, the system can return to its
initial state and more than one error can be masked.
The advantage of TMR is high system availability (i.e. 100%) even if an
error is present. Another advantage is that no extra error detection circuit is
required inside a module. The voter immediately detects an error on a redundant
module and can reflect this status to the output.
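The voting behaviour described above can be summarised in a short behavioural sketch. It is a software model under the stated description, not the voter circuit of this work; the function name and the return convention are assumptions.

```python
def majority_vote(out_a, out_b, out_c):
    """Return (voted_output, faulty_index) for three redundant outputs.

    faulty_index is None when all outputs agree, or 0, 1 or 2 for the single
    module that disagrees. If no two outputs agree, no majority exists."""
    if out_a == out_b == out_c:
        return out_a, None
    if out_a == out_b:
        return out_a, 2
    if out_a == out_c:
        return out_a, 1
    if out_b == out_c:
        return out_b, 0
    raise RuntimeError("no majority: more than one module is faulty")


print(majority_vote(5, 5, 5))
print(majority_vote(5, 7, 5))
```

The second call shows the masking property: the wrong output of the middle module is outvoted and the module is identified at the same time.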
5.1.3
If one of the duplicate modules encounters an error, the other can keep the
system alive, as in TMR. The error must be eliminated as soon as possible, since
an error on the correctly operating module would then crash the system. After
the fault is eliminated, the state must be recovered to a well-known point. This
can be accomplished either by copying the state from the correctly working module
or by returning to a checkpoint. Restoring the state from a past checkpoint is
called a rollback operation, while copying it from a correctly running system is
called a roll-forward operation.
5.1.4
Fault Types
The faults encountered during the operation of a digital circuit can be
classified into two groups: transient faults and permanent faults. A transient
fault causes the circuit to work incorrectly for a limited time interval, after
which the fault disappears. It is important to be aware of the fault and take
precautions such as re-evaluating circuit operations. A permanent fault cannot be
corrected without intervention from the outside world [43]. Spare resources must
be reserved in the system in order to replace permanently faulty elements.
Whenever a permanent fault occurs, these spare resources take over the function
of the faulty blocks.
5.1.4.1
Transient Faults
In space applications, heavy ions and protons may hit a memory
element such as a latch, flip-flop or RAM cell. This cosmic radiation can change
the state of the memory element, which is called a Single Event Upset (SEU). Such
events are seen more frequently as transistor sizes continue to shrink with
improving implementation technology [44]. Normally these bit-flips are transient
and disappear after a new value is stored in the memory cell or the cell is reset.
However, these transient errors become an important issue on SRAM
based FPGAs, where circuit behaviour is determined by configuration memory
elements. If a configuration memory element of the FPGA encounters an SEU, the
behaviour of the corresponding circuit changes. In normal operation of an FPGA
(i.e. no reconfiguration is done), this memory is never refreshed until a
power-down power-up sequence. Hence, these transient errors become permanent
errors if no configuration memory refresh is done. The system can be damaged
because of these functional errors. Therefore, they must be corrected by rewriting
the correct configuration data after an SEU in order to recover the circuits. An
example bit-flip (SEU) on a Lookup Table (LUT) and the resulting functional error
are illustrated in Figure 5-2.
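The effect of an SEU on a LUT can be reproduced with a small model. The sketch below represents a 2-input LUT as a list of four configuration bits and flips one of them; the function names and the bit ordering are illustrative assumptions.

```python
def lut_eval(config_bits, inputs):
    """Evaluate a LUT. config_bits[i] is the output for input combination i,
    where i is formed with inputs[0] as the least significant bit."""
    index = 0
    for position, value in enumerate(inputs):
        index |= value << position
    return config_bits[index]


def inject_seu(config_bits, position):
    """Return a copy of the configuration with one bit flipped."""
    upset = list(config_bits)
    upset[position] ^= 1
    return upset


and_lut = [0, 0, 0, 1]            # 2-input AND function
upset_lut = inject_seu(and_lut, 3)  # flip the only bit that stores a 1
for a in (0, 1):
    for b in (0, 1):
        print(a, b, lut_eval(and_lut, (a, b)), lut_eval(upset_lut, (a, b)))
```

After the flip the LUT returns zero for every input combination, which corresponds to the change from AND to constant zero in the figure.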
Figure 5-2 Effect of a Single Event Upset (SEU) a) Original Configuration with
function AND b) Configuration after a SEU with Function Constant Zero [45]
5.1.4.2
Permanent Faults
Permanent faults can arise during the operation of a circuit due to long
usage or manufacturing impurities that are not detected by initial tests [28].
Long usage of a circuit in a high radiation environment can also cause
permanent faults by modifying the threshold voltage levels of the transistors in
the circuit [46]. Researchers use several models for operational permanent faults;
the most common is the stuck-at model.
Stuck-At Model
Stuck-at fault models are widely used for permanent error modelling due
to their simplicity. The models are stuck-at 0, stuck-at 1, switch stuck open and
switch stuck closed. Stuck-at 0 (SA0) means that the input or output of a logic
gate is locked to 0 and can no longer be changed. Similarly, stuck-at 1 (SA1)
means that the input or output of a logic gate is locked to 1.
The other models are used for switches. A switch can be locked in the
connecting state (stuck closed) or the non-connecting state (stuck open).
On an FPGA, any signal of a CLB can encounter an SA0 or SA1 error. A
connection point on the switch matrix, or Programmable Interconnect Point (PIP),
can also encounter stuck-open or stuck-closed errors. These elements must be
replaced by undamaged ones.
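As an illustration of the stuck-at model, the sketch below applies SA0 and SA1 faults to the output of a 2-input AND gate and lists the input vectors that expose each fault. The gate and the helper names are chosen only for this example.

```python
def and_gate(a, b, stuck_at=None):
    """2-input AND gate. If stuck_at is 0 or 1, the output is locked to it."""
    if stuck_at is not None:
        return stuck_at
    return a & b


def detecting_vectors(stuck_at):
    """Input vectors for which the faulty output differs from the good one."""
    return [
        (a, b)
        for a in (0, 1)
        for b in (0, 1)
        if and_gate(a, b) != and_gate(a, b, stuck_at)
    ]


print("SA0 detected by", detecting_vectors(0))
print("SA1 detected by", detecting_vectors(1))
```

The output SA0 fault is visible only when both inputs are 1, whereas the SA1 fault is visible for the other three input combinations.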
5.2
RELATED WORK
The main research topics about the Fault Tolerance that use Runtime
Detection of Errors
Simulating faults
Some studies propose methods for more than one topic at the same
time. They are discussed under one subject in the following section.
Error Detection
There are different methods available to test the resources inside an FPGA.
Some of them are online methods, in which the system continues to operate.
Runtime reconfiguration is used to enable uninterrupted operation of the user
circuit. For example, M. G. Gericota et al. [28] proposed a non-intrusive CLB test
method. The method uses a dynamic rotation mechanism to test all CLBs inside the
FPGA, and the rotation is based on RTR of the hardware. In order to test a CLB in
a non-intrusive manner, its contents are copied to another CLB, called the
replica. Then the replicated CLB is tested. If no errors are found on the
replicated CLB, the function is copied back from the replica CLB. This method is
able to detect permanent errors on CLBs and can recover from transient errors on
CLBs.
J. Emmert et al. [29] used another method for error detection, based on
BIST (Built-In Self-Test). The BIST structure includes a Test Pattern Generator
and an Output Response Analyzer to test the functionality of the block under test.
Their method implements roving Self-Testing Areas (STARs), which reserve a test
area inside the FPGA. A STAR contains vertical and horizontal blocks to be tested.
After the test of the blocks is completed, the STAR moves to another position. The
logic blocks outside the STAR are always active inside the FPGA. Partial
reconfiguration of the FPGA allows the system to keep working even while the STAR
is moving to another place. Moreover, a reconfiguration can eliminate the usage of
faulty logic cells.
SEU Mitigation
Single event upsets on the configuration memory of SRAM based FPGAs
become permanent errors, as mentioned before. Two methods can be used to
eliminate an SEU in SRAM configuration memory. The first method uses a readback,
compare, and repair strategy. The configuration memory is continuously read and
compared with the original configuration data. If an error is found in a frame,
the frame is partially reconfigured to correct the error. Instead of comparing all
the configuration bits of a frame, a CRC can be generated and compared with
a previously prepared CRC to find an error. One restriction of this method is that
a LUT cannot be used as a shift register or RAM, since the readback operation can
disrupt the data in it [47] [48].
The second method continuously writes the correct configuration data to the
configuration memory of the FPGA. This method is called scrubbing. The reload
period is selected according to the rate of SEU events. As a rule of thumb, the
scrubbing rate must be 10 times higher than the average SEU rate.
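Both strategies can be sketched in a few lines. The frame contents, the use of CRC-32 from the standard library, and the function names are assumptions made for illustration; the CRC actually used by a device or by the cited works may differ.

```python
import zlib


def find_upset_frames(golden_frames, readback_frames):
    """Readback and compare: return the indices of frames whose CRC differs
    from the CRC prepared in advance for the original configuration."""
    golden_crcs = [zlib.crc32(frame) for frame in golden_frames]
    return [
        index
        for index, frame in enumerate(readback_frames)
        if zlib.crc32(frame) != golden_crcs[index]
    ]


def max_scrub_period(mean_time_between_seus, factor=10):
    """Rule of thumb: scrub `factor` times more often than SEUs occur, so the
    scrub period is the mean time between SEUs divided by `factor`."""
    return mean_time_between_seus / factor


golden = [b"\x00\x01", b"\xff\x10", b"\x55\xaa"]
readback = list(golden)
readback[1] = b"\xff\x11"  # one flipped bit in frame 1
print(find_upset_frames(golden, readback))
print(max_scrub_period(3600.0))
```

Only the frames reported by the comparison need to be rewritten, while scrubbing rewrites all frames periodically without checking them first.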
Both methods must be combined with a fault tolerance technique such as TMR in
order to increase availability. Otherwise, the system can stop working until the
repair process corrects the fault. Xilinx proposed [49] such an SEU mitigation
method, which uses partial reconfiguration (scrubbing) and TMR.
R. F. Demara et al. [50] proposed a TMR-like solution
in which there are two redundant modules instead of three. These two redundant
modules are called Discrepancy Mirrors. The discrepancy mirrors are exact copies of
each other and their outputs are compared by a discrepancy detection circuit. When
a fault arises in one of the redundant modules or in the detection circuit, the
detection circuit indicates unmatched outputs. Therefore, no golden circuit is
needed for the detection circuit. This method enables immediate detection of
errors, as opposed to the test vector method, which has a high error detection
latency. After an error is detected, a reconfiguration can eliminate the single
event upsets on the discrepancy mirrors and the detection circuit.
Tolerating Permanent Faults
In a fault tolerant system, extra resources must be reserved in order to
eliminate permanent faults. These resources are kept as spares until a fault
appears. When damage occurs, the faulty element is replaced by a spare
(non-faulty) one that takes over its function.
Runtime reconfiguration (RTR) of hardware resources can be very helpful to
mitigate the effects of faults on FPGAs. Configuration schemes can be prepared
that do not use the faulty resources. Since RTR enables uninterrupted operation,
faulty resources can be replaced with spare ones by using such configurations.
This is a very useful property, since the system can stay active even during
recovery operations. Therefore, studies that deal with the fault tolerance of
FPGAs use their RTR property. In this work, RTR is also used to mitigate the
effects of faults on FPGAs.
W.J. Huang et al. [30] proposed a column-based precompiled
configuration technique that can eliminate permanent faults on the FPGA. In this
method, the FPGA is divided into multiple columns. One or more columns are
reserved as spares and the user design is mapped to the remaining columns. For
each function, multiple configuration schemes are prepared offline, and they are
used immediately if an error appears. For example, if an error appears in a CLB of
a column, the function inside it is moved to a neighbouring error-free column. The
function in the neighbouring column is also moved by one column and replaces
another function. All functions are moved until a spare column is used, so that
the erroneous column becomes empty. Therefore, each function has a configuration
mapped to each column. This method can tolerate permanent faults until the number
of faults equals the number of spare columns.
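The column shifting idea can be sketched as a simple remapping. This is only an illustration of the principle, assuming that the spare columns lie at the end of the column range; it does not reproduce the precompiled configurations of the cited work, and the names are hypothetical.

```python
def remap_columns(functions, num_columns, faulty_columns):
    """Assign functions, in order, to the fault-free columns.

    Functions located at or after a faulty column shift towards the spare
    columns, which leaves the faulty column empty. Returns a dictionary
    that maps a column index to the function placed in it."""
    usable = [c for c in range(num_columns) if c not in faulty_columns]
    if len(usable) < len(functions):
        raise RuntimeError("more faulty columns than spare columns")
    return dict(zip(usable, functions))


functions = ["f0", "f1", "f2"]
print(remap_columns(functions, 4, set()))   # column 3 is the spare
print(remap_columns(functions, 4, {1}))     # f1 and f2 shift by one column
```

With one spare column a single faulty column can be avoided; a second faulty column would require a second spare.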
The TMR architecture can be used to tolerate permanent faults. It can tolerate
one erroneous redundant module if no additional method is used. Its
disadvantage is high area overhead. S.Y. Yu et al. [51] provide
a solution to reduce the area overhead of TMR. Their technique divides a
redundant module into two parts and applies different fault tolerance methods
to the two parts. For example, in one of the implementations, one part
uses a redundant architecture (i.e. TMR) while the other part is protected by a
Concurrent Error Detection (CED) unit. If the TMR part encounters an error, it is
reconfigured to eliminate the fault and the errors are then corrected by
roll-forward recovery. In another implementation, one part uses TMR while the
other uses a duplex scheme. If an error occurs in the part implemented with the
duplex scheme, it is reconfigured to eliminate the fault and the error is then
corrected by rollback recovery.
Simulating Faults
Some studies try to find solutions for simulating Single Event Upsets.
This simulation is necessary before launch to analyze the behaviour of the circuits
in a real space environment. Two methods can be used to simulate SEUs on
Earth. In the first, SEUs are directly injected by using radiation (proton beam)
test equipment. The second method reconfigures the FPGA with configuration data
in which bit-flip errors are embedded. The second method is a low cost solution,
since no external equipment is necessary to observe the effect of an SEU.
For instance, Gokhale et al. [48] used RTR to induce SEUs in the
configuration memory of the FPGA. P. Kenterlis et al. [52] used JBits to emulate
configuration data corruption.
5.3
DESIGNED ARCHITECTURE
5.3.1
General Overview of the System
The system is mainly composed of a board containing runtime
5.3.1.1
5.3.1.2
Using only Triple Modular Redundancy (TMR) for fault tolerance cannot
compensate for errors on different modules. In other words, if an error occurs on
one redundant module, the system can continue to operate. However, if a second
error occurs on another module in this state, the system will halt.
If the error on a redundant module of the TMR can be eliminated
whenever it appears, the system can compensate for more than one error. The
designed architecture uses partially reconfigurable hardware (i.e. an FPGA), in
which parts of the circuit can be changed while others are operating. With the
help of partial reconfiguration, faulty redundant modules are replaced with
repaired ones while the whole circuit continues to operate.
The reconfiguration of the FPGA is based on Module Based Partial
Reconfiguration, which was described in Chapter 4. Hence, the guidelines of
module based partial reconfiguration are followed for this design and its
restrictions are considered.
5.3.2
[Figure 5-4 labels: RS232 to TTL Converter Board; DIO Board; USB to RS232 Converter Cable; Xilinx FPGA; D2SB Board; Parallel Cable III]
Figure 5-4: Picture of the Reconfigurable System without a PC
5.3.2.1
peripherals to run the FPGA. The block diagram of the board is shown in Figure
5-5. The Xilinx XC2S200E is a reconfigurable FPGA, as explained in Chapter 4. A
1.8V regulator supplies the FPGA core, and the FPGA I/O voltage is supplied
by a 3.3V regulator. For simple tests, one LED and one push button are
placed on the board. There is also a socket for a configuration Programmable
Read Only Memory (PROM). A JTAG connector is provided to enable
configuration of the FPGA and the PROM through a JTAG cable. 143 I/O pins of the
FPGA are routed to the connectors to enable connection with daughter boards or
user circuits [53].
5.3.2.2
takes the necessary data and shows it on its seven segment displays. DIO1 is
a daughter board that can be connected directly to the connectors of the
D2SB board. However, the designed architecture implements module based
reconfiguration, in which a module can access only the I/Os at its boundaries (as
explained in detail in Chapter 4). This limits the connection of the D2SB board to
the DIO1 board to certain pins. Therefore, the daughter board is not plugged
directly into the D2SB but is connected by wiring up the connector pins.
5.3.2.3
levels of the serial port and the Xilinx FPGA are incompatible, since the serial
port uses RS232 voltage levels and the Xilinx FPGA is configured to work at TTL
voltage levels. In order to convert voltage levels from RS232 to TTL and vice
versa, a converter board was constructed. The schematic and PCB figures of the
board are given in Appendix A.
Today, some computers such as laptop PCs no longer have a serial port.
To overcome this problem, a USB to serial port converter is also added to
the board. The design is otherwise the same as when a standard serial port is used.
5.3.2.4
configuration bitstream to the FPGA using its JTAG port. It is connected to the
Parallel Port of the PC and works at 200 kHz.
5.3.3
redundant modules and a voter module. The logic circuits in the redundant modules
are exact copies of each other and include the user circuit. The voter module
compares the outputs of the redundant modules. All modules are partially
reconfigurable. Alternative partial configurations are prepared for alternative
placements of a redundant module.
An initial configuration that contains the TMR is loaded to the FPGA. After the
initial configuration, the redundant modules continuously send their state
information to the voter module. The voter module checks whether all of them have
the same output. The voter then sends state information to the PC at fixed time
intervals. The connections of the Redundant Modules and the Voter Module are shown
in Figure 5-6.
5.3.4
5.3.4.1
Voter Module
The Voter Module is mainly responsible for checking whether all modules
give the same output and for informing the PC about the module states. Its other
responsibilities are recovering the states of redundant modules after
reconfiguration and driving the display units. Figure 5-7 shows the internal units
of the Voter module.
Error Checker
The Error Checker unit checks whether all units give the same output. To
accomplish this, a majority voter circuit is implemented in the Error Checker unit.
There are three comparators, which check the equality of Module One and Module
Two, Module One and Module Three, and Module Two and Module Three. If all
comparators output 1, the Error Checker generates the All Modules are OK
signal, as shown in Figure 5-8a. If the comparators belonging to a module output
zero, this module is treated as faulty and reported with a corresponding
signal. For example, the logic that generates Error on Module
One is shown in Figure 5-8b.
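The comparator logic of the Error Checker can be modelled as follows. Since the exact gate-level circuit of Figure 5-8 is not reproduced here, the sketch assumes that a module is reported as faulty when both comparators involving it output zero; the signal names are illustrative.

```python
def error_checker(module_one, module_two, module_three):
    """Behavioural model of the three comparators and the status signals."""
    eq_12 = module_one == module_two
    eq_13 = module_one == module_three
    eq_23 = module_two == module_three
    return {
        "all_modules_ok": eq_12 and eq_13 and eq_23,
        # Assumption: a module is faulty when both of its comparators give 0.
        "error_on_module_one": (not eq_12) and (not eq_13),
        "error_on_module_two": (not eq_12) and (not eq_23),
        "error_on_module_three": (not eq_13) and (not eq_23),
    }


print(error_checker(4, 4, 4))
print(error_checker(9, 4, 4))
```

Under this assumption, three mutually different outputs raise all three error signals, which corresponds to the case in which no majority exists.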
Figure 5-8: Internal Logic Circuits of Error Checker Unit a) Circuit giving All
Modules are OK signal b) Circuit giving Error on Module One signal
Error checker sends status messages to the PC via UART. The messages
are shown in Table 5-1.
Table 5-1: Status Descriptions and their corresponding ASCII values
ASCII Decimal Number   ASCII Character   Status Description
65                     A
66                     B
67                     C
68                     D
69                     E
Command Decoder
The Command Decoder unit decodes the data coming from the PC. It reads the
receive buffer of the UART whenever data is available. It then decodes the data
and, if necessary, sends commands to the individual units of the Voter Module.
A command is one byte of data. It has three fields: Module Number, Module
Command and Generic Command, as shown in Figure 5-9.
The Module field indicates the recipient of a Module Command. It takes the value
01 for Module One, 10 for Module Two and 11 for Module Three. The
Module Command is sent to the individual modules according to the Module field.
The Module Commands and their codes are listed in Table 5-2.
Command Code   Command Name                      Command Definition
0001           Reset                             Reset module
0010           Rollforward
0011           AskDiscrepency_BusMacros
0100           UseAlternateBusMacro
0101           DeleteDiscrepencyInfo_BusMacros
0110           UseOriginalBusMacro
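Decoding of a command byte can be sketched as below. The field values and command codes are taken from the text, but the bit positions are an assumption made only for this example (Module Number in bits 7-6, Module Command in bits 5-2, Generic Command in bits 1-0), since the layout of Figure 5-9 is not reproduced here.

```python
MODULES = {0b01: "Module One", 0b10: "Module Two", 0b11: "Module Three"}

MODULE_COMMANDS = {
    0b0001: "Reset",
    0b0010: "Rollforward",
    0b0011: "AskDiscrepency_BusMacros",
    0b0100: "UseAlternateBusMacro",
    0b0101: "DeleteDiscrepencyInfo_BusMacros",
    0b0110: "UseOriginalBusMacro",
}


def decode_command(byte):
    """Split a one-byte command into its three fields.

    Assumed layout: bits 7-6 Module Number, bits 5-2 Module Command,
    bits 1-0 Generic Command. Unknown codes are returned as None."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a command is exactly one byte")
    module = (byte >> 6) & 0b11
    command = (byte >> 2) & 0b1111
    generic = byte & 0b11
    return MODULES.get(module), MODULE_COMMANDS.get(command), generic


print(decode_command(0b01_0010_00))
```

With the assumed layout, the byte 01001000 addresses Module One with the Rollforward command and an empty Generic Command field.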
A Redundant Module
A redundant module includes a user circuit. The circuit can be composed
The user must take the usage of the Clock Enable (CE) and Load signals into
consideration. These signals are necessary to roll forward redundant modules
when a fault occurs. Any synchronous circuit must use ce_ModX to enable
loading input data into a flip-flop on the clock. In addition, load_ModX must be
used to load data coming from the Voter module. An example VHDL code for
Module One is given below:
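process (clk, reset_ModOne)
begin
  if (reset_ModOne = '1') then
    -- Load initial FF states
    null;
  elsif rising_edge(clk) then
    if (load_ModOne = '1') then
      -- Load FF states supplied by the Voter module (roll-forward)
      null;
    elsif (ce_ModOne = '1') then
      -- Normal sequential FF state transitions
      null;
    end if;
  end if;
end process;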
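The priority of the reset, load and clock enable signals in the template above can be illustrated with a behavioural model of a single state register. The class and its argument names are hypothetical; the asynchronous reset is modelled simply as the highest priority input.

```python
class ModuleRegister:
    """Software model of a state register inside a redundant module."""

    def __init__(self, initial_state=0):
        self.initial_state = initial_state
        self.state = initial_state

    def clock(self, next_state, reset=0, load=0, ce=0, voter_state=None):
        """Apply one clock edge and return the new state."""
        if reset:
            self.state = self.initial_state  # load initial FF states
        elif load:
            self.state = voter_state         # state supplied by the Voter
        elif ce:
            self.state = next_state          # normal state transition
        return self.state                    # otherwise hold the state


reg = ModuleRegister(initial_state=0)
print(reg.clock(5, ce=1))                         # normal operation
print(reg.clock(6))                               # clock enable low: hold
print(reg.clock(7, load=1, ce=1, voter_state=9))  # load wins over ce
print(reg.clock(8, reset=1, load=1, voter_state=3))
```

The load input has priority over the clock enable, so a module can take over the state supplied by the Voter regardless of its own next state.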
5.3.5
Module Name
Spare Area       1-7
Module Three     8-15
Module Two       16-23
Module One       24-31
Voter Module     32-42
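The allocation above can be turned into a lookup and into area constraints of the form used in Chapter 4. The sketch assumes that the listed ranges are CLB column numbers and that each area spans rows 1 to 28, as in the RANGE constraint shown earlier; the function names are hypothetical.

```python
# Assumption: the numbers in the table are CLB column ranges.
FLOORPLAN = {
    "Spare Area": (1, 7),
    "Module Three": (8, 15),
    "Module Two": (16, 23),
    "Module One": (24, 31),
    "Voter Module": (32, 42),
}


def owner_of_column(column):
    """Return the name of the area that contains the given column."""
    for name, (first, last) in FLOORPLAN.items():
        if first <= column <= last:
            return name
    raise ValueError(f"column {column} is outside the floorplan")


def clb_range(name, rows=28):
    """Build a CLB range string such as CLB_R1C8:CLB_R28C15 for an area."""
    first, last = FLOORPLAN[name]
    return f"CLB_R1C{first}:CLB_R{rows}C{last}"


print(owner_of_column(20))
print(clb_range("Module Three"))
```

The ranges cover columns 1 to 42 without gaps or overlaps, so every column belongs to exactly one area.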
Module based partial reconfiguration does not allow signals to pass from
one module to another except through Bus Macro structures. Therefore, bus macros
are used for the communication of the redundant modules with the voter module.
However, some extra effort is needed for communication between two non-adjacent
modules, since Xilinx provides only a bus macro that connects adjacent modules.
The bus macro provided by Xilinx is modified to enable communication between two
non-adjacent modules.
5.3.5.1
Figure 5-13: Modified Bus Macro that connects Two Non-Adjacent Modules
The working principle of the modified bus macro relies on the fact that FPGA cells
can be reconfigured without glitches. Writing the same configuration data to the
configuration cells does not cause a glitch on the cell connection. Furthermore,
bus macros are placed on exactly the same horizontal lines in each configuration
of a module. Therefore, while an intermediate module is being reconfigured, the
bus crossing this module is not corrupted, thanks to the glitchless configuration
of the cells. Otherwise, the programmable interconnection points (PIPs) that
reside in the middle area would disconnect the bus macro.
Table 5-4: Different Bus Macro Functions and Their Sources
Bus Macro Name    Connecting Modules    Source
bm_one_4b.ncd                           Provided by Xilinx (bm_s2e_4b.ncd)
bm_two_4b.ncd                           Edited from bm_one_4b (custom)
bm_thr_4b.ncd                           Edited from bm_one_4b (custom)
[Figure 5-14 labels: a) Module Boundary; TBUF Connection Points. b) Module Boundary; TBUF Connection Points; Horizontal Routings]
Figure 5-14: FPGA Editor Snapshots of Bus Macros a) Standard Bus Macro connecting Two Adjacent Modules b) Modified Bus Macro
connecting Two Non-Adjacent Modules
5.3.5.2
Partial Configurations
To eliminate permanent faults, alternative placements are done for
given in Appendix C). The modular design flow then automatically places the bus
macros in all partial configurations.
5.3.5.2.1
To increase the reliability of the TMR system, two redundant bus macros are
used for each module output. If a redundant module gives an erroneous output, the
Voter can change the output data path from the normal bus macro to the alternative
one. For this purpose, the output of a redundant module is replicated and passed
to the voter through two bus macros.
In the case of an error, the voter side checks the equivalence of the bus
macros. If a discrepancy is seen at their outputs, the voter uses the alternative
one.
5.3.5.3
files call the necessary Xilinx tools as explained in Chapter 4. Three main
batch files are prepared, one for each step of the modular design flow:
Initial.bat, Active.bat and Assemble.bat.
Initial.bat copies the necessary files to the top_initial directory and calls
initial.cmd. Initial.cmd runs the initial phase of the modular design flow with
the following command:
ngdbuild
par
top.pcf
Note that the pimcreate command is not necessary for the other configurations
of module one. It is enough to run pimcreate for the configuration that will be
used in the generation of the full bitstream (i.e. during the Assemble phase).
Assemble.bat creates the final full bitstream file, which is the file initially
downloaded to the FPGA. It copies the necessary files to the top_final
directory and calls the assemble.cmd file. Assemble.cmd includes the following
commands:
ngdbuild -p xc2s200e-pq208-7 -modular assemble -uc top.ucf -pimpath ..\Pim -use_pim modone
5.3.6
Eliminating Faults
5.3.6.1
Eliminating Single Event Upsets
Single event upsets (SEUs) occurred on the configuration memory of the
5.3.6.2
configurations that do not use them. For this purpose, empty areas are reserved
in alternative configurations. Then, if an error occurs on a CLB, a
configuration that maps the faulty CLB onto an empty area is loaded. Spare
areas are reserved in a configuration by using the Prohibit constraint of the
Xilinx PAR (Place and Route) tool. The Prohibit constraint can be applied to the
CLBs that must be excluded during the place and route operation. For example,
the following constraint is added to the User Constraint File to prohibit the
usage of two CLBs during place and route:
#CONFIG PROHIBIT=CLB_R17C20, CLB_R17C21;
The granularity of the empty spaces can range from fine to coarse. Reserving
only one CLB to replace a faulty CLB is the finest granularity. Such a fine
granularity uses the FPGA area efficiently, but it increases the number of
alternative configurations and the amount of configuration data. In contrast,
coarse-grain elimination lowers the number of alternative configuration
bitstreams at the cost of inefficient resource usage on the FPGA.
In this work, it is assumed that the FPGA has spare resources, and faults are
eliminated in a coarse-grain manner. A module is divided into multiple columns,
and one of the columns is left empty. For each column, a configuration
bitstream that leaves that column empty is prepared offline. In other words, no
logic is placed on the CLBs in the selected empty column. Figure 5-17 shows
alternative configurations that reserve different empty CLB columns.
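As an illustration of the column-based reservation, the constraint text for one
empty column could be generated as sketched below. This is a hypothetical
helper, not part of the thesis tool flow. It assumes the CLB_RrCc naming and
the comma-separated PROHIBIT form shown above, and the function name and
parameters are chosen only for this example.

```cpp
#include <string>

// Builds a UCF PROHIBIT constraint that reserves one whole CLB column,
// using the comma-separated location list form shown in the text.
// column, firstRow and lastRow are CLB coordinates (assumed 1-based).
std::string prohibitColumn(int column, int firstRow, int lastRow) {
    std::string line = "CONFIG PROHIBIT=";
    for (int row = firstRow; row <= lastRow; ++row) {
        if (row != firstRow) line += ", ";
        line += "CLB_R" + std::to_string(row) + "C" + std::to_string(column);
    }
    line += ";";
    return line;
}
```

For a module that spans rows 1 to 28, as the area groups in Appendix C do,
calling prohibitColumn(c, 1, 28) once per column c of the module would yield
one constraint for each alternative configuration.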
5.3.7
PC Program
The intelligence of the system is placed in the PC program in order to
maintain fault-free operation. The TMR structure on the FPGA already provides
fault tolerance; however, it is strengthened by the reconfiguration operations.
The PC program is responsible for selecting an appropriate reconfiguration
scheme.
The PC program communicates with the Voter module on the FPGA via the serial
port of the PC. It sends commands and receives status information. All
bitstream files are stored on the computer and downloaded to the FPGA through
the JTAG configuration port. The PC program manages the bitstream download
operations.
The PC program has been designed with Borland C++ Builder and written in the
C++ language. The source code of the program is given in Appendix E (in the
FTArchitecture/Borland-Project directory).
5.3.7.1
Rate. The same baud rate is used in the PC program to synchronize with the Voter.
5.3.7.2
bitstream to the FPGA. Impact, a Xilinx tool, is used to ease this task. Impact
is a PC program that can connect to the JTAG chain with standard Xilinx
configuration cables. It can detect the devices on the JTAG chain and program
them. It has a graphical user interface for manual operations and a command
line interface for batch operations.
Batch files that use the command line interface of Impact were written. These
batch files are executed by the PC program to reconfigure the FPGA. Each batch
file has a corresponding bitstream file. For example, the following batch file
commands reconfigure the FPGA with the top_final.bit bitstream file.
1) setmode bscan
2) setCable -p lpt1
3) addDevice -p 1 -file top_final.bit
4) program -p 1
5) quit
The first line of the batch file instructs Impact to use the Boundary Scan
(JTAG) interface. The second line selects Parallel Port 1 as the connection
port of the JTAG cable. The third line selects the first device (i.e. the FPGA)
on the JTAG chain for configuration and assigns top_final.bit as its bitstream
file. The fourth line programs the device with the assigned configuration file.
The last line exits the command line interface.
The batch files are given in Appendix E (in the
FTArchitecture/Borland-Project/Configurations directory).
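Since each batch file differs only in the name of its bitstream, the batch text
could also be produced programmatically. The following C++ sketch is a
hypothetical helper, not the code of the PC program. It copies the five
commands verbatim from the listing above (without the listing's line numbers)
and takes the bitstream file name as a parameter.

```cpp
#include <string>

// Returns the text of an Impact batch file that programs the first device
// on the JTAG chain with the given bitstream. The commands are copied from
// the listing in the text; only the bitstream name varies.
std::string impactBatch(const std::string& bitFile) {
    std::string text;
    text += "setmode bscan\n";                         // use Boundary Scan (JTAG)
    text += "setCable -p lpt1\n";                      // cable on Parallel Port 1
    text += "addDevice -p 1 -file " + bitFile + "\n";  // assign bitstream to device 1
    text += "program -p 1\n";                          // program device 1
    text += "quit\n";                                  // leave the command line
    return text;
}
```

The returned text could be written to a .cmd file in the Configurations
directory before Impact is called on it in batch mode.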
5.3.7.3
Prompt. The following command calls the Impact tool and runs the
example_file.cmd batch file.
Impact -batch example_file.cmd
The PC program must also be able to run the above command. In Borland C++
Builder, the command is executed by using the following structure, its member
variables, and function:
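#include <windows.h>
#include <shellapi.h>                            // SHELLEXECUTEINFO, ShellExecuteEx
SHELLEXECUTEINFO ShExecInfo = {0};               // zero all unused members
ShExecInfo.cbSize = sizeof(SHELLEXECUTEINFO);
ShExecInfo.fMask = 0;                            // no optional members are used
ShExecInfo.hwnd = NULL;                          // no parent window
ShExecInfo.lpVerb = NULL;                        // default verb
ShExecInfo.lpFile = "impact";                    // program to start
ShExecInfo.lpParameters = "-batch example_file.cmd";
ShExecInfo.lpDirectory = "Configurations";       // working directory
ShExecInfo.nShow = SW_SHOW;                      // or SW_HIDE
ShExecInfo.hInstApp = NULL;
ShellExecuteEx(&ShExecInfo);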
5.3.8
FPGA and PC Program. The errors on the module(s) are reported to the PC program
by the Voter. The PC performs the reconfiguration operations and sends commands
to the Voter. An example of the communication protocol commands used during the
fault elimination process is shown in Figure 5-19.
5.3.9
source of the error. For each module, the same algorithm runs independently.
Briefly, the algorithm works as follows. It first checks whether the error is
transient. If the error is not transient, it successively tries changing the
bus macro, refreshing the configuration memory and loading alternative
configurations until the error disappears. If the error persists, it gives up
and proceeds to the recovery of the voter module. Figure 5-20 shows the
flowchart of the algorithm in more detail.
Figure 5-20: Flowchart of Fault Recovery Algorithm that Runs on the PC Program
[Flowchart states: INITIAL, RUNNING, WAIT TIMER 1, WAIT TIMER 2, WAIT TIMER 3, WAIT TIMER 4, CONFIGURATION MEMORY ERROR-SEU RECOVERY, PERMANENT ERROR RECOVERY, ERROR]
The algorithm starts in the Initial state and resets all the variables. The
Voter periodically sends the status output of the module. The algorithm passes
to the Running state with the first status output. If an error arrives in the
Running state, the algorithm passes to the Wait Timer 1 state and waits for a
second error status. If no further error arrives during the Wait Timer 1 state,
it returns to the Running state. This timer is necessary to determine whether
the error is transient.
If another error arrives during the Wait Timer 1 state, the algorithm starts to
check the bus macro status. It asks whether a discrepancy is seen between the
original bus macro and the alternative bus macro. If no discrepancy is seen, it
refreshes the configuration memory (to eliminate SEUs) with the partial
bitstream of the module in the Configuration Memory Error Recovery state. After
refreshing the memory, it requests reset and roll-forward operations for the
module from the Voter. In addition, a counter that holds the number of memory
refresh operations is incremented. If the counter exceeds three, further errors
are treated as permanent faults.
The errors that cannot be corrected by switching the bus macro or refreshing
the memory are considered permanent errors. In the Permanent Error Recovery
state, the FPGA is reconfigured with alternative partial bitstreams of the
module. The alternative configuration files reserve empty spaces as described
before. When the faulty resource falls into the empty space, the error
disappears. Therefore, reconfiguration is repeated with these bitstreams until
the error disappears. Again, reset and roll-forward operations are performed
after each reconfiguration for correct operation. If the fault cannot be
eliminated after trying all configurations, the algorithm passes to the Error
state.
After each recovery operation, the algorithm waits in a Timer state (i.e. Wait
Timer 2, Wait Timer 3 or Wait Timer 4) to ensure that the fault is eliminated.
If no error status arrives during the Timer state, the algorithm returns to the
Running state.
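The escalation order of the recovery steps can be summarised in a few lines of
code. The following C++ sketch is a simplified model under stated assumptions,
not the source of the PC program. Timers, serial commands and the reset and
roll-forward requests are omitted. The limit of three memory refreshes is one
reading of "exceeds three". The rule that the bus macro is switched at most
once is also an assumption. All names are hypothetical.

```cpp
// Recovery steps the supervisor can take, in escalation order.
enum class Recovery {
    SwitchBusMacro,         // use the alternative bus macro
    RefreshConfigMemory,    // reload the module's partial bitstream (SEU repair)
    LoadAlternativeConfig,  // try the next configuration with a spare area
    GiveUp                  // Error state: all configurations were tried
};

// Per-module bookkeeping; each module runs its own copy of the algorithm.
struct RecoveryState {
    bool busMacroSwitched = false;
    int  refreshCount     = 0;  // memory refresh operations so far
    int  configsTried     = 0;  // alternative configurations loaded so far
};

const int kMaxRefreshes = 3;  // assumed limit before a fault counts as permanent

// Chooses the next step for an error that has been confirmed as non-transient.
Recovery nextRecovery(RecoveryState& s, bool busMacroDiscrepancy,
                      int alternativeConfigs) {
    if (busMacroDiscrepancy && !s.busMacroSwitched) {
        s.busMacroSwitched = true;
        return Recovery::SwitchBusMacro;
    }
    if (s.refreshCount < kMaxRefreshes) {
        ++s.refreshCount;
        return Recovery::RefreshConfigMemory;
    }
    if (s.configsTried < alternativeConfigs) {
        ++s.configsTried;
        return Recovery::LoadAlternativeConfig;
    }
    return Recovery::GiveUp;
}
```

In the full algorithm, each returned step would be followed by the
corresponding Wait Timer state, and the function would be called again only if
a further error status arrives before that timer expires.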
5.3.10
Fault Injection
It is necessary to test the behaviour of the system in the presence of faults.
For this purpose, faults are injected artificially by loading an incorrect
partial bitstream to the FPGA. Two methods are used to obtain an incorrect
partial bitstream. The first method directly modifies a correct partial
bitstream. The second method modifies the source VHDL file, which is then
synthesized to produce a faulty bitstream.
5.3.10.1
Bitstream Modification
The functions of the LUTs (i.e. Feqn, Geqn) can be changed to any combination
of the input signals. The operators given in Table 5-5 can be used to describe
a logical function. For instance, to inject a stuck-at-1-like fault, Geqn (the
function of the upper LUT) is changed from (A1*(A3*(A4*A2))) to 1. This
emulates a stuck-at-1 fault on the output of the LUT.
Symbol    Operation
*         Logical AND
+         Logical OR
@         Logical XOR
~         Unary NOT
Finally, the ncd file is saved with the Save Changes and Closes Window button.
At this point, the only remaining operation needed to generate the bitstream is
running the Bitgen tool. The following command is executed to generate a faulty
partial bitstream:
bitgen -d -g ActiveReconfig:yes ModOne.ncd modone_faulty.bit
5.3.10.1.1
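The effect of an edited LUT equation can be checked by enumerating all 16 input
combinations. The following C++ sketch does this for the two equations compared
in Table 5-6. It is an illustrative model only: the function names are
hypothetical, and the bit order follows the row order of the table (A1 as the
most significant input), which is not necessarily the bit order of the LUT
contents in the device.

```cpp
#include <cstdint>

// A 4-input LUT function of the inputs A1..A4.
typedef bool (*LutFunction)(bool a1, bool a2, bool a3, bool a4);

// F1 = (~A1*(A4*~A3)*~A2): LUT function before the injected upset.
bool f1(bool a1, bool a2, bool a3, bool a4) { return !a1 && (a4 && !a3) && !a2; }

// F2 = (~A1*(A4*~A3)): LUT function after the injected upset.
bool f2(bool a1, bool a2, bool a3, bool a4) { return !a1 && (a4 && !a3); }

// Packs the 16 outputs of a 4-input function into a bit mask,
// bit i holding the output for table row i (row = A1 A2 A3 A4 in binary).
std::uint16_t truthTable(LutFunction f) {
    std::uint16_t mask = 0;
    for (int row = 0; row < 16; ++row) {
        bool a1 = (row & 8) != 0, a2 = (row & 4) != 0;
        bool a3 = (row & 2) != 0, a4 = (row & 1) != 0;
        if (f(a1, a2, a3, a4)) mask |= static_cast<std::uint16_t>(1u << row);
    }
    return mask;
}
```

With this row order, truthTable(f1) ^ truthTable(f2) has a single bit set, for
the row A1=0, A2=1, A3=0, A4=1. The two functions therefore differ in exactly
one LUT entry, which is what a single-bit upset in the LUT contents would
produce.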
Table 5-6: Truth Table of LUT Function Before and After a SEU Injection
Inputs          Output Functions
A1 A2 A3 A4     Before SEU: F1=(~A1*(A4*~A3)*~A2)    After SEU: F2=(~A1*(A4*~A3))
0  0  0  0      0                                    0
0  0  0  1      1                                    1
0  0  1  0      0                                    0
0  0  1  1      0                                    0
0  1  0  0      0                                    0
0  1  0  1      0                                    1
0  1  1  0      0                                    0
0  1  1  1      0                                    0
1  0  0  0      0                                    0
1  0  0  1      0                                    0
1  0  1  0      0                                    0
1  0  1  1      0                                    0
1  1  0  0      0                                    0
1  1  0  1      0                                    0
1  1  1  0      0                                    0
1  1  1  1      0                                    0
5.3.10.1.2
5.3.10.2
Faults were also injected on the bus macro connections, since they are the
only connection path from the redundant modules to the Voter. This is achieved
by editing the VHDL code of the redundant modules. A partial configuration is
then produced from the edited VHDL file.
For instance, the most significant bit of the state output is inverted as shown
in the following code.
DataoutModOne(2 downto 0) <= stateOuputModOne(2 downto 0);
DataoutModOne(3) <= not stateOuputModOne(3);
After the generated faulty bitstream is loaded, the supervisor program detects
the fault on the bus macro. It then tries recovery operations such as selecting
the alternative bus macro.
CHAPTER VI
6 CONCLUSIONS
6.1
from
reconfigurable
computing.
Later,
designed
fault
tolerant
6.2
the PC used as a reconfiguration controller can be removed from the system and
replaced by a part of the FPGA. This solution requires an embedded memory and
an embedded configuration controller. The ICAP port can also be used by the
embedded configuration controller. Note that a fault tolerant memory
architecture is necessary for this system.
New Bus Macro Design
Xilinx has not yet published bus macro structures for new generation devices
such as Spartan 3 and Virtex 4. Therefore, the bus macro structure used in the
current designs is not suitable for these devices. Some research has
concentrated on new bus macro architectures [57]; these works implement slice
based bus macros. A new device family with a new bus macro architecture can be
used in future work.
Automated Design
In the current structure of the system, all VHDL code is edited three times,
since fault tolerance is maintained by three identical circuits. The generation
of the TMR structure should be automated to reduce user intervention. The user
should supply only the design, and the rest of the operations should be
performed by the batch files. Since building such a framework is very time
consuming, it is left as future work.
Self-Checking
The errors on the voter module can be detected by Concurrent Error Detection
(CED) circuits. Embedding a CED circuit in the voter will increase the
reliability of the system.
REFERENCES
[1]
Gericota M.G.; Alves G.R.; Silva M.L.; Ferreira J.M., Programmable Logic
Devices: A Test Approach for the Input/Output Blocks and Pad-to-Pin
Interconnections, 4th IEEE Latin-American Test Workshop (LATW'2003),
pp. 72-77, February 2003
[2]
[3]
[4]
[5]
Hartenstein, R.W.; Herz M.; Hoffmann T.; Nageldinger U., Synthesis and
Domain-specific Optimization of KressArray-based Reconfigurable
Computing Engines, Proceedings of the 2000 ACM/SIGDA eighth
international symposium on Field programmable gate arrays, pp. 222-232
2000
[6]
[7]
Hannig, F.; Dutta, H.; Teich, J., "Regular mapping for coarse-grained
reconfigurable architectures," Acoustics, Speech, and Signal Processing,
2004. Proceedings. (ICASSP '04). IEEE International Conference on ,
vol.5, pp. 57-60, 17-21 May 2004
[8]
Upegui A.; Moeckel R.; Dittrich E.; Ijspeert A.; Sanchez E., "An FPGA
Dynamically Reconfigurable Framework for Modular Robotics", Workshop
on Dynamically Reconfigurable Systems at the 18th International
Conference on Architecture of Computing Systems, ARCS '05, pp. 83-89,
Innsbruck, Austria, March 14-17, 2005
[9]
[10]
Fong, R.J.; Harper, S.J.; Athanas, P.M., "A versatile framework for FPGA
field updates: an application of partial self-reconfiguration," Rapid Systems
Prototyping, 2003. Proceedings. 14th IEEE International Workshop on , pp.
117- 123, 9-11 June 2003
[11]
[12]
Resano, J.; Mozos, D.; Verkest, D.; Catthoor, F.; Vernalde S., "Specific
scheduling support to minimize the reconfiguration overhead of dynamically
reconfigurable hardware" Design Automation Conference, 2004.
Proceedings. 41st , pp. 119-124, 2004
[13]
Walder, H.; Steiger, C.; Platzner, M., "Fast online task placement on
FPGAs: free space partitioning and 2D-hashing," Parallel and Distributed
Processing Symposium, 2003. Proceedings. International , pp. 178-185,
22-26 April 2003
[14]
[15]
[16]
[17]
[18]
Llanos C.; Jacobi R.P.; Rincón M.A.; Hartenstein R.W., "A Dynamically
Reconfigurable System for Space-Efficient Computation of the FFT",
Proceedings. International Conference on Reconfigurable Computing and
FPGAs 2004 - ReConFig'04, pp 360-369, Colima, Mexico, 2004
[19]
[20]
Ullmann, M.; Huebner, M.; Grimm, B.; Becker, J., "An FPGA run-time
system for dynamical on-demand reconfiguration," Parallel and Distributed
Processing Symposium, 2004. Proceedings. 18th International , pp. 135-142,
26-30 April 2004
[21]
[22]
[23]
Hollingworth, G.; Smith, S.; Tyrrell, A., "Safe intrinsic evolution of Virtex
devices," Evolvable Hardware, 2000. Proceedings. The Second NASA/DoD
Workshop on , pp.195-202, 2000
[24]
Berthelot, F.; Nouvel, F.; Houzet, D., "Partial and dynamic reconfiguration
of FPGAs: a top down design methodology for an automatic
implementation," Parallel and Distributed Processing Symposium, 2006.
IPDPS 2006. 20th International , pp. 436-439, 25-29 April 2006
[25]
Berthelot, F.; Nouvel, F.; Houzet, D., "Design methodology for runtime
reconfigurable FPGA: from high level specification down to
implementation," Signal Processing Systems Design and Implementation,
2005. IEEE Workshop on , pp. 497-502, 2-4 Nov. 2005
[26]
Faust, O.; Sputh, B.; Nathan, D.; Rezgui, S.; Weisensee, A.; Allen, A., "A
single-chip supervised partial self-reconfigurable architecture for software
defined radio," Parallel and Distributed Processing Symposium, 2003.
Proceedings. International , pp. 191-196, 22-26 April 2003
[27]
[28]
Gericota, M.G.; Alves, G.R.; Silva, M.L.; Ferreira, J.M., "Active replication:
towards a truly SRAM-based FPGA on-line concurrent testing," On-Line
Testing Workshop, 2002. Proceedings of the Eighth IEEE International ,
pp. 165-169, 2002
[29]
Emmert, J.; Stroud, C.; Skaggs, B.; Abramovici, M., "Dynamic fault
tolerance in FPGAs via partial reconfiguration," Field-Programmable
Custom Computing Machines, 2000 IEEE Symposium on , pp.165-174,
2000
[30]
[31]
Xilinx Inc., Spartan-IIE 1.8V FPGA Family: Complete Data Sheet, Xilinx
DS077, 2004
[32]
[33]
[34]
[35]
[36]
Xilinx Inc., Virtex FPGA Series Configuration and Readback, Xilinx XAPP
138 v2.8, 2005
[37]
[38]
[39]
Xilinx Inc., JBits SDK 3 for Virtex-II Documentation / JBits Tutorial , 2003
[40]
[41]
[42]
Braeckman G.; Branden G.V.; Touhafi A.; Dessel G.V. Module Based
Partial Reconfiguration: a quick tutorial,
http://iwt5.ehb.be/typo3/index.php?id=415 accessed at 2006,
Erasmushogeschool IWT Department, 2004,
[43]
[44]
Lima, F.; Carro, L.; Reis, R., "Designing fault tolerant systems into
SRAM-based FPGAs," Design Automation Conference, 2003. Proceedings , pp.
650-655, 2-6 June 2003
[45]
Graham P.; Caffrey M.; Zimmerman J.; Johnson D.E.; Sundararajan P.;
Patterson C., "Consequences and Categories of SRAM FPGA
Configuration SEUs," Proceedings of the Military and Aerospace
Applications of Programmable Logic Devices (MAPLD), Washington DC,
September 2003
[46]
Pontarelli, S.; Cardarilli, G.C.; Malvoni, A.; Ottavi, M.; Re, M.; Salsano, A.,
"System-on-chip oriented fault-tolerant sequential systems implementation
methodology," Defect and Fault Tolerance in VLSI Systems, 2001.
Proceedings. 2001 IEEE International Symposium on , pp.455-460, 2001
[47]
[48]
Gokhale, M.; Graham, P.; Johnson, E.; Rollins, N.; Wirthlin, M., "Dynamic
reconfiguration for management of radiation-induced faults in FPGAs,"
Parallel and Distributed Processing Symposium, 2004. Proceedings. 18th
International , pp. 145-150, 26-30 April 2004
[49]
[50]
[51]
Shu-Yi Yu; McCluskey, E.J., "Permanent fault repair for FPGAs with limited
redundant area," Defect and Fault Tolerance in VLSI Systems, 2001.
Proceedings. 2001 IEEE International Symposium on , pp. 125-133,
2001
[52]
Kenterlis P.; Kranitis N.; Paschalis A.; Gizopoulos D.; Psarakis M., "A
low-cost SEU fault emulation platform for SRAM-based FPGAs," On-Line
Testing Symposium, 2006. IOLTS 2006. 12th IEEE International , pp. 235-241,
10-12 July 2006
[53]
[54]
[55]
[56]
Bobda C.; Huebner M.; Niyonkuru A.; Bloget B.; Majer M.; Ahmedinia A.,
Designing Partial and Dynamically Reconfigurable Applications on Xilinx
Virtex-II FPGAs using HandelC, University of Erlangen-Nuremberg,
Germany, Technical Report 03-2004
[57]
Board
Reference Manual,
APPENDIX A
A PCB AND SCHEMATICS OF THE RS232 CIRCUIT
A Jumper
must be
placed
Figure A-3: Schematic of RS232 Circuit
APPENDIX B
B SIMULATION OF TWO ROLL FORWARDING
METHODS
APPENDIX C
C USER CONSTRAINT FILE OF THE TMR DESIGN
User Constraint File for the First Configuration of Module One
# Start of PACE Area Constraints
AREA_GROUP "AG_Inst_Voter" RANGE = CLB_R1C32:CLB_R28C42 ;
AREA_GROUP "AG_Inst_Voter" RANGE = TBUF_R1C32:TBUF_R8C42 ;
INST "Inst_Voter" AREA_GROUP = "AG_Inst_Voter" ;
AREA_GROUP "AG_Inst_Voter" MODE = RECONFIG ;
AREA_GROUP "AG_Inst_ModOne" RANGE = CLB_R1C24:CLB_R28C31 ;
AREA_GROUP "AG_Inst_ModOne" RANGE = TBUF_R1C24:TBUF_R28C31 ;
INST "Inst_ModOne" AREA_GROUP = "AG_Inst_ModOne" ;
AREA_GROUP "AG_Inst_ModOne" MODE = RECONFIG ;
AREA_GROUP "AG_Inst_ModTwo" RANGE = CLB_R1C16:CLB_R28C23 ;
AREA_GROUP "AG_Inst_ModTwo" RANGE = TBUF_R1C16:TBUF_R28C23 ;
INST "Inst_ModTwo" AREA_GROUP = "AG_Inst_ModTwo" ;
AREA_GROUP "AG_Inst_ModTwo" MODE = RECONFIG ;
AREA_GROUP "AG_Inst_ModThr" RANGE = CLB_R1C8:CLB_R28C15 ;
AREA_GROUP "AG_Inst_ModThr" RANGE = TBUF_R1C8:TBUF_R28C15 ;
INST "Inst_ModThr" AREA_GROUP = "AG_Inst_ModThr" ;
AREA_GROUP "AG_Inst_ModThr" MODE = RECONFIG ;
#AREA_GROUP "AG_Inst_Spare" RANGE = CLB_R1C1:CLB_R28C7 ;
#AREA_GROUP "AG_Inst_Spare" RANGE = TBUF_R1C1:TBUF_R28C7 ;
#INST "Inst_Spare" AREA_GROUP = "AG_Inst_Spare" ;
#AREA_GROUP "AG_Inst_Spare" MODE = RECONFIG ;
APPENDIX D
D PACE AND FPGA EDITOR VIEW OF THE TMR DESIGN
Figure D-1: Module Placements of the TMR Design (Snapshot is taken with PACE)
Figure D-2: FPGA Editor View of TMR Design
APPENDIX E
E SOURCE FILES OF DESIGNED ARCHITECTURES
A CD-ROM is attached to the back cover of the thesis. It contains the source
code, batch files, and generated files of the designed architectures. The
contents of the CD-ROM are given in Table E-1.
Table E-1: The Directories and Files in the CDROM
Reconfig-ALU/
BusMacro/
Implementation/
left_mult/
left_sub/
Pim/
right/
right/
top_final/
top_initial/
Top.ucf
1-Initial.bat
2-Active.bat
3-Assemble.bat
Reconfig-ALU/
Synthesis/
Borland-Project/
FTArchitecture/
Contains Xilinx ISE projects and VHDL files for partial modules and
top module
left_add/
left_mult/
left_sub/
right/
top /
Configurations/
Macros/
Ucf/
Implementation/
Modone_X/
Modtwo_X/
Modthr_X/
Voter_1/
Pim/
ModOne/
ModTwo/
ModThr/
Voter/
FTArchitecture/
Implementation/
Synthesis/
Borland-Project/
Top.ucf
top_final/
top_initial/
0-Reset.bat
1-Initial.bat
2-Active.bat
3-Assemble.bat
Contains Xilinx ISE projects and VHDL files for partial modules and
top module
Modone_1/
Modtwo_1/
Modthr_1/
Modone_bme/
Modtwo_bme/
Modthr_bme/
Voter_1/
top /
Configurations/