Efficient Design of Majority-Logic-Based Approximate Arithmetic Circuits
Abstract— Approximate computing (AC) reduces power consumption and area by relaxing the requirement for full accuracy. The majority logic (ML) gate functions as the fundamental logic block of many emerging nanotechnologies. In this article, ML-based arithmetic circuits, i.e., multibit adders and multipliers, are proposed. The adders are designed to prevent inexact carry-out signals from propagating to the higher order parts of the computation, which enhances accuracy. The proposed multiplier is implemented using a unique partial product reduction (PPR) circuit based on a parallel approximate 6:3 compressor. Logic implementation costs, error metrics, and layouts implemented in quantum-dot cellular automata (QCA) are analyzed to evaluate the adder designs. The experimental results show a significant improvement over previous ML-based designs. The proposed designs are further evaluated in a neural network (NN) accelerator and in image processing. The proposed adder design achieves a structural similarity (SSIM) value of 1 and a peak signal-to-noise ratio (PSNR) value of infinity.

Index Terms— Approximate adder, approximate compressor, approximate computing (AC), approximate multiplier, image processing, majority logic (ML).

I. INTRODUCTION

WITH the increasing integration of circuits, traditional CMOS technologies have become increasingly limited in the design of VLSI circuits. The power dissipation of computing systems remains a serious problem, despite advances in semiconductor technology and energy-efficient design techniques [1].

As a new computing paradigm at the nanoscale, approximate computing (AC) offers a promising solution to the VLSI industry by trading precision for reduced complexity and power consumption. AC exploits the inherent error tolerance of applications to balance the performance and accuracy of a circuit [2]. As a result, AC can be applied to many applications and architectures, such as data analysis, image recognition, multimedia, and signal processing [3], [4].

A number of emerging nanotechnologies have been proposed in recent years, including quantum-dot cellular automata (QCA) [5], nanomagnet logic [6], and spin-wave devices [7]. These technologies are based on the majority logic (ML) abstraction, which differs from traditional Boolean logic. The intrinsic energy consumption of these nanotechnologies is lower than that of CMOS. Moreover, the ML function is more expressive than the traditional two-input Boolean operations. Thus, this article uses ML to implement the proposed approximate circuits.

Adders and multipliers are arithmetic units that are widely used in computing systems, so the performance of a computing system is significantly influenced by the speed and power consumption of its arithmetic circuits. Although researchers have proposed a variety of approximate circuit designs in transistor-based technologies [8], [9], [10], [11], [12], [13], [14], these designs are less attractive when implemented in nontransistor technologies that use different logic gates. For example, the design in [12] uses many XOR operations for carry generation and propagation. However, ML is inefficient at representing XOR gates with two or more inputs; for more details, see Section II-A.

In this article, we propose both ML-based approximate full adders (MLAFAs) and ML-based approximate multipliers (MLAMs). The contributions are as follows.

1) We present a direct method for designing multibit approximate circuits, which significantly reduces the critical path delay and enhances the accuracy of the proposed 2- and 4-bit adders. Owing to the special structure of the proposed adders, long computation sequences are less prone to accumulating errors.
2) We propose an approximate parallel 6:3 compressor and show how it can be combined with a distinctive Wallace-based partial product reduction (PPR) circuit to produce a simple and efficient 8 × 8 multiplier. As an alternative to the conventional 4:2 compressor, the proposed compressor can compress six partial products simultaneously with a simpler circuit structure.
3) The proposed adders and multipliers are applied to image processing, and the results are evaluated using the structural similarity (SSIM) and the peak signal-to-noise ratio (PSNR). In addition, the multipliers are used to develop low-power neural network (NN) accelerators for machine learning.

Manuscript received 2 May 2022; revised 9 August 2022 and 6 September 2022; accepted 21 September 2022. Date of publication 11 October 2022; date of current version 9 December 2022. This work was supported in part by the NSFC under Grant 61871242, Grant 61871216, and Grant 62022041. (Corresponding author: Zhufei Chu.)
Zhufei Chu, Chuanhe Shang, Yinshui Xia, and Lunyao Wang are with the Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo 315211, China (e-mail: [email protected]).
Tingting Zhang is with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB T6G 1H9, Canada.
Weiqiang Liu is with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China.
Color versions of one or more figures in this article are available at https://doi.org/10.1109/TVLSI.2022.3210252.
Digital Object Identifier 10.1109/TVLSI.2022.3210252
1063-8210 © 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: PES University Bengaluru. Downloaded on May 09,2023 at 06:57:30 UTC from IEEE Xplore. Restrictions apply.
1828 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 30, NO. 12, DECEMBER 2022
A. Majority Logic
The ML operation acts as a voter, denoted as $M(x_1, \ldots, x_n)$, where $n$ is typically an odd number. The function evaluates to true if more than $(n-1)/2$ of its variables are true. The logic expression of a majority-of-three function (see Fig. 1) over three Boolean variables $A$, $B$, and $C$ is
$$F = M(A, B, C) = AB + AC + BC. \tag{1}$$
By setting any one of the inputs to constant zero or one, the majority-of-three function reduces to AND or OR, respectively. For example, $M(A, B, 0) = AB$ and $M(A, B, 1) = A + B$. Hence, ML can be seen as a generalization of traditional AND/OR-based logic. Recently, ML-based graph representations have been established for synthesizing Boolean functions [23], [24], yielding promising synthesis results for both FPGA/ASIC [25], [26] and nanocircuit designs [27], [28].
Arithmetic circuits often use XOR gates. A two-input XOR requires three majority-of-three operations:
$$A \oplus B = M(\bar{A}, M(A, B, 1), M(A, \bar{B}, 0)).$$

B. ML-Based Exact Full Adder
Given three Boolean variables $A$, $B$, and carry input $C_{in}$, the carry output of an ML-based exact full adder (MLEFA) is natively a majority gate, i.e., $C_{out} = M(A, B, C_{in})$. The sum function acts as a three-input XOR gate. Exact synthesis can be used to find optimal logic expressions based on specified logic primitives [29]; in terms of the number of majority gates, at least three are required. The implementation proposed in [30] shows that an MLEFA requires three ML gates and two inverters, as shown in Fig. 2, where
$$S = M(\overline{C_{out}}, M(A, B, \overline{C_{in}}), C_{in}).$$
An alternative realization of the sum is
$$S = M(C_{in}, \overline{M(A, B, C_{in})}, M(A, B, \overline{M(A, B, C_{in})})),$$
in which only one inverter is required, but the logic depth increases from 2 to 3.

C. Exact 4:2 Compressor
Compressors are used to implement the PPR stage in high-performance and energy-efficient multipliers [31]. The general schematic of an exact 4:2 compressor is shown in Fig. 3. The exact 4:2 compressor has four inputs ($x_1$, $x_2$, $x_3$, and $x_4$) and two outputs (Sum and Carry). The carry input ($C_{in}$) comes from the preceding block of lower significance, and the carry output ($C_{out}$) is passed to the next block of higher significance. The logic expression of the
CHU et al.: EFFICIENT DESIGN OF MAJORITY-LOGIC-BASED APPROXIMATE ARITHMETIC CIRCUITS 1829
TABLE II
REDUCED TRUTH TABLE OF MLAFA-a
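The majority-of-three identities of Section II-A and the two MLEFA sum expressions of Section II-B can be checked exhaustively in a few lines. The following Python sketch is illustrative only: the helper names (`maj`, `inv`, `xor2_ml`, `mlefa_fig2`, `mlefa_one_inverter`) are hypothetical, and logic values are modeled as the integers 0 and 1.

```python
from itertools import product


def maj(*xs: int) -> int:
    """n-input majority (n odd): 1 if more than (n - 1) / 2 inputs are 1."""
    if len(xs) % 2 == 0:
        raise ValueError("majority is defined here for an odd number of inputs")
    return int(sum(xs) > (len(xs) - 1) // 2)


def inv(x: int) -> int:
    """Inverter on a 0/1 value."""
    return 1 - x


def xor2_ml(a: int, b: int) -> int:
    # A xor B = M(~A, M(A, B, 1), M(A, ~B, 0)): three majority-of-three gates
    return maj(inv(a), maj(a, b, 1), maj(a, inv(b), 0))


def mlefa_fig2(a: int, b: int, cin: int) -> tuple[int, int]:
    # Three majority gates and two inverters; S has logic depth 2
    cout = maj(a, b, cin)
    s = maj(inv(cout), maj(a, b, inv(cin)), cin)
    return s, cout


def mlefa_one_inverter(a: int, b: int, cin: int) -> tuple[int, int]:
    # Three majority gates; the single inverter on Cout feeds two gates; depth 3
    cout = maj(a, b, cin)
    ncout = inv(cout)
    s = maj(cin, ncout, maj(a, b, ncout))
    return s, cout


if __name__ == "__main__":
    for a, b in product((0, 1), repeat=2):
        assert maj(a, b, 0) == (a & b) and maj(a, b, 1) == (a | b)  # AND / OR
        assert xor2_ml(a, b) == a ^ b
    for a, b, cin in product((0, 1), repeat=3):
        reference = (a ^ b ^ cin, int(a + b + cin >= 2))
        assert mlefa_fig2(a, b, cin) == reference
        assert mlefa_one_inverter(a, b, cin) == reference
    print("all majority-logic identities hold")
```

Exhaustive checking is practical here because each function has at most three inputs; for wide adders, random sampling of the input space is used instead.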
TABLE V
COMPARISON OF IMAGE PROCESSING RESULTS USING 8-bit MLAFAs
Fig. 6. Image processing of 8-bit MLAFAs. (a) Original image, (b) MLAFA1212-1212, (c) MLAFA1212-1233, (d) MLAFA1212-12MSA,
(e) MLAFA1212-12a, (f) MLAFA1212-12b, (g) MLAFA1212-3333, (h) MLAFA1212-LSAMSA, (i) MLAFA1212-aa, (j) MLAFA1212-ab,
(k) MLAFA1212-bb, (l) MLAFA1212-I, (m) MLAFA1212-II, (n) MLAFA1233-3333, (o) MLAFA12LSA-LSAMSA, (p) MLAFA12b-bb, (q) MLAFA3333-
3333, (r) MLAFALSALSA-LSAMSA, (s) MLAFAbb-bb, (t) MLAFAI-I, and (u) MLAFAII-II.
from −1 to 1, measures the similarity between two images and equals one when the images are identical. The PSNR is ten times the base-10 logarithm of the ratio of the squared maximum signal value to the mean squared error (MSE) between the original and processed images; its unit is the decibel (dB). The higher the PSNR value, the lower the distortion.
For comparison, we use MLAFA1212-1212 [15], which achieves a relatively efficient tradeoff between accuracy and the number of logic gates, as the reference 8-bit RCA. Since 8-bit adders can be implemented by cascading 2- or 4-bit adders, we use the designs proposed in [15] and [16] and ours to construct several representative 8-bit adders for comparison. The comparisons are shown in Table V, and the images are shown in Fig. 6.
1) Accuracy: When larger bit-width adders are cascaded, the MAE and NMED of the 8-bit adders are reduced significantly. For example, the proposed MLAFAI-I uses the same number of majority gates and inverters as MLAFA1212-1212, but with a logic depth of two instead of five. Specifically, the MAE is reduced from 170 to 85 and the NMED from 0.0904 to 0.0560, improvements of 50% and 38.1%, respectively. There are two reasons for this.
On one hand, the proposed larger bit-width adders are noncascaded designs, which inherently have low MAEs. This means that when these adders are used as building blocks for larger circuits, the resulting ED is relatively smaller than that of a cascaded design built from several smaller bit-width adders. On the other hand, the proposed adders are independent of the carry chain, so they do not introduce unexpected errors caused by the carries of previous blocks.
TABLE VI
COMPARISON OF LAYOUT RESULTS OF 8-bit MLAFAs
2) Logic Implementation Cost: Among the designs, MLAFA12b-bb, MLAFAbb-bb, MLAFAI-I, and MLAFAII-II have the minimum logic depth of two. This is also because the outputs of these designs do not depend on the carry inputs, which prevents the carry chain from growing along the critical path.
3) SSIM and PSNR: The SSIM and PSNR results indicate that the proposed designs perform better than MLAFA12, MLAFA33, LSA, and MSA. For the most efficient hybrid design, MLAFALSALSA-LSAMSA, the SSIM and PSNR are 0.7313 and 36.9670 dB, respectively. When our proposed designs are used as building modules, six designs achieve an SSIM of more than 0.8 and seven achieve a PSNR greater than 40 dB. For image processing, the proposed 4-bit designs generally outperform the proposed 2-bit adders, with the exception of MLAFA-b. Interestingly, as the number of MLAFA-b modules increases, the SSIM and PSNR values become much better than those of the other designs. When the number of MLAFA-b modules is four, the output image is identical to the original: the SSIM is one and the PSNR is infinite (i.e., the MSE is zero).
4) Layouts: Both the proposed and the compared circuits are implemented in QCA technology, in which ML gates are the basic logic blocks. The circuit layouts are designed, simulated, and characterized using the QCADesigner-E 2.2 software tool with default settings. For the QCA layouts, we use a multilayer wire-crossing approach with a uniform layout strategy and a four-phase clocking scheme.
The comparison results of the 8-bit approximate adders are shown in Table VI, which lists the number of QCA cells, area, delay (in clocking phases), and energy. The design MLAFAI-I achieves the minimum number of QCA cells, area, delay, and energy. In general, the layout performance is consistent with the logic implementation cost. However, there are some exceptions due to the adopted layout strategy. For example, the design MLAFA12b-bb, which has a logic depth of two, uses four clocking phases in QCA, more than the other three depth-two designs (MLAFAbb-bb, MLAFAI-I, and MLAFAII-II).
Table VI shows that the proposed designs outperform the designs in [15] in all metrics when constructing large-bit-width adders. This is mainly because the carry signals of the proposed adders are independent of the carry chain. As shown in Fig. 7, the layouts of the proposed designs extend in only one direction, which greatly reduces the area, delay, and energy consumption of the circuit. Although the designs proposed in [16] are likewise independent of the carry chain, their complex structure results in slightly worse overall performance.
Small-bit-width arithmetic circuits are extensively studied due to their energy efficiency. Researchers have demonstrated that the computational bit width of NN inference can be scaled down to just a few bits [37]. In addition to the 2-, 4-, and 8-bit adders, we also construct some special 16-bit adders for further evaluation. The accuracy and QCA layout metrics are shown in Table VII. The accuracy metrics, MAE and NMED, are obtained by simulating $2^{10} \times 2^{10} \times 2$ combinations of randomly generated, uniformly distributed input operands. The proposed adders still perform well at large bit widths. Compared with a general adder, adders that are independent of the carry chain exhibit progressively larger advantages in area, delay, and power consumption as the bit width increases. Compared with MLAFAII-II-II-II, the MLAFALSALSA-LSALSA-LSALSA-LSAMSA design has better MAE and NMED; however, our design has a lower implementation cost in the QCA layouts.

V. PROPOSED APPROXIMATE MULTIPLIERS
This section proposes a parallel 6:3 compressor and a PPR circuit that achieve an efficient balance between logic implementation cost and accuracy. We then use them to build an efficient approximate multiplier for image multiplication and for energy-efficient NN accelerators.

A. Proposed Approximate Compressor
Multiplication consists of three steps: 1) partial product generation; 2) PPR; and 3) final product generation by an RCA. Among these three steps, PPR contributes significantly to latency, power consumption, and design complexity. Adding proper arithmetic blocks, e.g., compressors,
TABLE VII
COMPARISON OF ACCURACY AND LAYOUT RESULTS OF 16-bit MLAFAs
Fig. 7. QCA layouts of sample 8-bit MLAFAs. (a) MLAFA1212-1212, (b) MLAFA12LSA-LSAMSA, (c) MLAFAbb-bb, and (d) MLAFAI-I.
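The random-sampling accuracy evaluation described for the 16-bit adders in Table VII can be sketched as follows. This is a minimal sketch under stated assumptions: the approximate adder is a stand-in, the lower-part OR adder (LOA) of [10], not one of the MLAFA designs; MAE is taken to be the maximum absolute error distance, which is consistent with the integer MAE values reported in this section; NMED is the mean error distance normalized by the maximum exact output, following [32]; the carry input is omitted; and the sample count is arbitrary. All function names are hypothetical.

```python
import random


def loa_add(a: int, b: int, k: int) -> int:
    """Lower-part OR adder (LOA) of [10], used only as a stand-in approximate
    adder: the k low bits are a | b, and the carry into the exact upper part
    is the AND of the two operands' bit k - 1."""
    if k == 0:
        return a + b
    low = (a | b) & ((1 << k) - 1)
    carry = (a >> (k - 1)) & (b >> (k - 1)) & 1
    high = (a >> k) + (b >> k) + carry
    return (high << k) | low


def sample_pairs(n_bits: int, n_samples: int, seed: int = 0):
    """Uniformly distributed random operand pairs, reproducible via the seed."""
    rng = random.Random(seed)
    return [(rng.getrandbits(n_bits), rng.getrandbits(n_bits)) for _ in range(n_samples)]


def error_metrics(approx, exact, pairs, d_max):
    """Return (MAE, NMED) over the sampled pairs.
    Assumptions: MAE = maximum absolute error distance;
    NMED = mean error distance / maximum exact output."""
    eds = [abs(approx(a, b) - exact(a, b)) for a, b in pairs]
    return max(eds), sum(eds) / len(eds) / d_max


if __name__ == "__main__":
    n, k = 16, 4
    pairs = sample_pairs(n, 100_000)
    mae, nmed = error_metrics(
        lambda a, b: loa_add(a, b, k),
        lambda a, b: a + b,
        pairs,
        d_max=2 * ((1 << n) - 1),  # largest exact sum of two n-bit operands
    )
    print(f"{n}-bit LOA, k={k}: MAE={mae}, NMED={nmed:.2e}")
```

Because the upper part of the LOA is exact, its error distance is confined to the k low-order columns and never exceeds $2^{k-1}$, so the sampled MAE stays bounded as the bit width grows while NMED shrinks with the larger normalizing term.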
TABLE VIII
COMPARISON OF APPROXIMATE COMPRESSORS
Fig. 10. Reduction process of an unsigned 8 × 8 multiplier. (a) General PPR circuit. (b) Proposed PPR circuit.
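For reference when reading the compressor comparison in Table VIII and the reduction process in Fig. 10, the exact counterparts of these compressors can be modeled behaviorally. The Python sketch below uses hypothetical function names. It builds the exact 4:2 compressor of Section II-C from two cascaded MLEFAs (a common construction, assumed here rather than taken from Fig. 3) and an exact 6:3 counter that reduces six equal-weight partial-product bits to a 3-bit count. It models only exact behavior, not the proposed approximate 6:3 compressor; an approximate design can be compared against these reference models.

```python
from itertools import product


def maj3(a: int, b: int, c: int) -> int:
    """Majority-of-three, Eq. (1): AB + AC + BC."""
    return (a & b) | (a & c) | (b & c)


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """MLEFA in the Fig. 2 form: Cout = M(A, B, C), S = M(~Cout, M(A, B, ~C), C)."""
    cout = maj3(a, b, c)
    s = maj3(1 - cout, maj3(a, b, 1 - c), c)
    return s, cout


def exact_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Exact 4:2 compressor from two cascaded full adders.
    Invariant: x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout)."""
    s1, cout = full_adder(x1, x2, x3)  # Cout never depends on Cin
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def exact_6_3(bits) -> tuple[int, int, int]:
    """Exact 6:3 counter: six equal-weight bits -> 3-bit count (s2, s1, s0)."""
    x1, x2, x3, x4, x5, x6 = bits
    sa, ca = full_adder(x1, x2, x3)
    sb, cb = full_adder(x4, x5, x6)
    s0 = sa ^ sb                     # weight-1 bit (plain XOR for brevity)
    t = maj3(sa, sb, 0)              # weight-2 carry: AND realized as M(., ., 0)
    s1, s2 = full_adder(ca, cb, t)   # add the three weight-2 bits
    return s2, s1, s0


if __name__ == "__main__":
    for x in product((0, 1), repeat=5):
        total, carry, cout = exact_4_2(*x)
        assert total + 2 * (carry + cout) == sum(x)
    for bits in product((0, 1), repeat=6):
        s2, s1, s0 = exact_6_3(bits)
        assert 4 * s2 + 2 * s1 + s0 == sum(bits)
    print("exact 4:2 and 6:3 reference models verified")
```

The 6:3 counter needs no carry input or output because six bits of equal weight always fit in a 3-bit count (at most 6), which is what allows six partial products to be compressed in one parallel step.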
TABLE IX
COMPARISONS OF 8 × 8 APPROXIMATE MULTIPLIERS
MATLAB is used to multiply two images pixel by pixel. While the proposed MLAPC uses the proposed PPR circuit, the other 4:2 compressors use a general Dadda tree compression technique. Note that two exact full adders are used in the second stage of the PPR circuit to improve the accuracy of the multiplier. Therefore, for a fair comparison, exact compression is also applied to the corresponding columns in the general reduction circuit.
Table IX presents the image processing results of the multipliers using the proposed MLAPC and the designs from other works. All 65 536 input combinations were applied to the circuits to obtain these results. Fig. 11 illustrates the image multiplication results for various examples.
1) Accuracy: In addition to having the lowest NMED, the proposed approximate multiplier also achieves a noteworthy reduction in MAE. Compared with the approximate design in [17], which has the best NMED among the other designs, our design reduces the MAE from 10 320 to 7746 and the NMED from 0.0304 to 0.0277, reductions of 24.94% and 8.88%, respectively.
2) Logic Implementation Cost: The logic implementation cost is significantly reduced in terms of the number of majority gates, the number of inverters, and the logic depth. The main reasons are that several primary inputs are discarded and that the unique structure allows a further area reduction in the second stage of the simplification. For example, compared with the best results in [20], the number of majority gates is reduced from 96 to 80, the number of inverters from 36 to 18, and the logic depth from 21 to 12, improvements of 16.67%, 50%, and 42.86%, respectively.
3) SSIM and PSNR: The approximate multiplier using MLAPC for approximate compression performs well, with an SSIM of at least 0.9 and a PSNR of at least 46 dB. Compared with MLAC1, which performs relatively well among the compared compressors, the SSIM and PSNR still improve by 8.11% and 14.91%, respectively. On direct visual inspection, there is no significant difference between the output images of the exact multiplier and those of the proposed approximate multiplier.
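The exhaustive evaluation over all 65 536 input pairs of an 8 × 8 multiplier, together with the PSNR and SSIM image metrics, can be sketched as follows. The multiplier here is a stand-in, a truncated multiplier that drops every partial-product bit in the k least significant columns, not the MLAPC-based design. As before, MAE is assumed to be the maximum absolute error distance and NMED the mean error distance normalized by the maximum exact product, 255 × 255. The image step uses hypothetical synthetic gradient images, scales each 16-bit product back to 8 bits by discarding the low byte (an assumed normalization), and computes a single-window (global) SSIM, which is a simplification of the usual locally windowed SSIM. Function names are hypothetical.

```python
import math
from functools import partial

WIDTH = 8
FULL = (1 << WIDTH) - 1  # 255


def truncated_mul(a: int, b: int, k: int) -> int:
    """Stand-in approximate multiplier: drops every partial-product bit a_i * b_j
    whose column i + j is below k (k = 0 gives the exact product)."""
    result = 0
    for j in range(WIDTH):
        if (b >> j) & 1:
            low = (1 << max(k - j, 0)) - 1  # a-bits that would land in columns < k
            result += (a & FULL & ~low) << j
    return result


def exhaustive_metrics(mul):
    """Apply all 2^8 x 2^8 = 65 536 input pairs and return (MAE, NMED).
    Assumption: MAE = maximum absolute error distance."""
    eds = [abs(mul(a, b) - a * b) for a in range(FULL + 1) for b in range(FULL + 1)]
    return max(eds), sum(eds) / len(eds) / (FULL * FULL)


def psnr(ref, test, max_val=255):
    """PSNR = 10 log10(max_val^2 / MSE); infinite when the images are identical."""
    mse = sum((r - t) * (r - t) for r, t in zip(ref, test)) / len(ref)
    return math.inf if mse == 0 else 10 * math.log10(max_val * max_val / mse)


def global_ssim(x, y, max_val=255):
    """Single-window SSIM over whole images (simplified, not windowed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) * (a - mx) for a in x) / n
    vy = sum((b - my) * (b - my) for b in y) / n
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    return ((2 * mx * my + c1) * (2 * cxy + c2)) / ((mx * mx + my * my + c1) * (vx + vy + c2))


def multiply_images(img_a, img_b, mul):
    # Assumed normalization: keep the high byte of each 16-bit product
    return [mul(a, b) >> 8 for a, b in zip(img_a, img_b)]


if __name__ == "__main__":
    approx = partial(truncated_mul, k=4)
    mae, nmed = exhaustive_metrics(approx)
    print(f"truncated multiplier (k=4): MAE={mae}, NMED={nmed:.6f}")

    # Hypothetical 64 x 64 gradient images, flattened row by row
    img_a = [4 * x for y in range(64) for x in range(64)]
    img_b = [4 * y for y in range(64) for x in range(64)]
    exact = multiply_images(img_a, img_b, lambda a, b: a * b)
    out = multiply_images(img_a, img_b, approx)
    print(f"PSNR = {psnr(exact, out):.2f} dB, SSIM = {global_ssim(exact, out):.4f}")
```

An exhaustive sweep is feasible for 8 × 8 operands (65 536 pairs); the same `exhaustive_metrics` routine becomes impractical for wider multipliers, where random sampling would be used instead.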
Fig. 11. Image processing of 8 × 8 MLAMs. (a) Exact-compressor-based, (b) MLAC1-based, (c) MLAC2-based, (d) MLAC4-based, (e) MLAC22-2-
based, (f) MLAC12-1-based, (g) Taheri-based, (h) Sabetzadeh-based, (i) Salmanpour-based, (j) MLAPC-based (case 1), (k) MLAPC-based (case 2), and
(l) MLAPC-based (case 3).
a unique PPR circuit are proposed for the parallel compressor in the multiplier. They greatly reduce the area and delay without significantly degrading the quality of the multiplier. In addition, the proposed compressor is highly generalizable: it requires only one majority gate with a constant input of “1,” which is an OR gate and is readily implemented in other logic primitives.
Compared with other existing approximate designs, the proposed designs achieve a significant improvement in logic implementation cost, accuracy, and QCA layout cost. Based on the comprehensive experimental results, the following conclusions can be drawn.
1) By directly designing multibit approximate adders, the proposed designs reduce hardware overhead while incurring less accuracy loss. For example, compared with the 4-bit adders in [15], the MAE and NMED are decreased by 47.39% and 40.25%, respectively. Compared with the cascaded proposed 2-bit adders, the proposed 4-bit designs show a 29.22% reduction in NMED on average. Furthermore, the proposed approximate adders, which implement a truncated carry chain, have significant advantages over the other adders implemented in QCA technology.
2) The proposed PPR circuit offers the best overall performance in MLAPC. On average, NMED and MAE are decreased by 37.45% and 19.37%, respectively, compared with other approximate compressors. It also has lower hardware overhead than the other designs. Compared with the exact multiplier, the proposed multiplier saves 65.52% of the majority gates, 83.93% of the inverters, and 60% of the critical path delay.
The proposed approximate adders and multipliers have proven suitable for applications such as image processing and machine learning that tolerate reduced accuracy and require high speed. A number of case studies demonstrate that the proposed designs are effective in error-tolerant applications.

REFERENCES
[1] Q. Xu, T. Mytkowicz, and N. S. Kim, “Approximate computing: A survey,” IEEE Design Test, vol. 33, no. 1, pp. 8–22, Feb. 2016.
[2] S. Mittal, “A survey of techniques for approximate computing,” ACM Comput. Surv., vol. 48, no. 4, pp. 1–33, 2016.
[3] S. Venkataramani, S. T. Chakradhar, K. Roy, and A. Raghunathan, “Approximate computing and the quest for computing efficiency,” in Proc. 52nd Annu. Design Autom. Conf., Jun. 2015, pp. 1–6.
[4] W. Liu, F. Lombardi, and M. Schulte, “A retrospective and prospective view of approximate computing,” Proc. IEEE, vol. 108, no. 3, pp. 394–399, Mar. 2020.
[5] C. S. Lent and P. D. Tougaw, “A device architecture for computing with quantum dots,” Proc. IEEE, vol. 85, no. 4, pp. 541–557, Apr. 1997.
[6] M. Vacca et al., “Nanomagnet logic: An architectural level overview,” in Field-Coupled Nanocomputing. Cham, Switzerland: Springer, 2014, pp. 223–256.
[7] A. Khitun and K. L. Wang, “Nano scale computational architectures with spin wave bus,” Superlattices Microstruct., vol. 38, no. 3, pp. 184–200, 2005.
[8] W. Liu, L. Qian, C. Wang, H. Jiang, J. Han, and F. Lombardi, “Design of approximate radix-4 Booth multipliers for error-tolerant computing,” IEEE Trans. Comput., vol. 66, no. 8, pp. 1435–1441, Aug. 2017.
[9] Z. Yang, A. Jain, J. Liang, J. Han, and F. Lombardi, “Approximate XOR/XNOR-based adders for inexact computing,” in Proc. 13th IEEE Int. Conf. Nanotechnol. (IEEE-NANO), Aug. 2013, pp. 690–693.
[10] H. R. Mahdiani, A. Ahmadi, S. M. Fakhraie, and C. Lucas, “Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57, no. 4, pp. 850–862, Apr. 2010.
[11] N. Zhu, W. L. Goh, G. Wang, and K. S. Yeo, “Enhanced low-power high-speed adder for error-tolerant application,” in Proc. Int. SoC Design Conf., Nov. 2010, pp. 323–327.
[12] S. K. Patel, B. Garg, and S. K. Rai, “An efficient accuracy reconfigurable CLA adder designs using complementary logic,” J. Electron. Test., vol. 36, no. 1, pp. 135–142, Feb. 2020.
[13] A. Dalloo, A. Najafi, and A. Garcia-Ortiz, “Systematic design of an approximate adder: The optimized lower part constant-OR adder,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 8, pp. 1595–1599, Aug. 2018.
[14] F. Frustaci, S. Perri, P. Corsonello, and M. Alioto, “Energy-quality scalable adders based on nonzeroing bit truncation,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 4, pp. 964–968, Apr. 2019.
[15] W. Liu, T. Zhang, E. McLarnon, M. O’Neill, P. Montuschi, and F. Lombardi, “Design and analysis of majority logic-based approximate adders and multipliers,” IEEE Trans. Emerg. Topics Comput., vol. 9, no. 3, pp. 1609–1624, Jul. 2021.
[16] S. Perri, F. Spagnolo, F. Frustaci, and P. Corsonello, “Accuracy improved low-energy multi-bit approximate adders in QCA,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, no. 11, pp. 3456–3460, Nov. 2021.
[17] M. H. Moaiyeri, F. Sabetzadeh, and S. Angizi, “An efficient majority-based compressor for approximate computing in the nano era,” Microsyst. Technol., vol. 24, no. 3, pp. 1589–1601, Mar. 2018.
[18] S. Angizi, H. Jiang, R. F. DeMara, J. Han, and D. Fan, “Majority-based spin-CMOS primitives for approximate computing,” IEEE Trans. Nanotechnol., vol. 17, no. 4, pp. 795–806, Jul. 2018.
[19] M. Taheri, A. Arasteh, S. Mohammadyan, A. Panahi, and K. Navi, “A novel majority based imprecise 4:2 compressor with respect to the current and future VLSI industry,” Microprocessors Microsyst., vol. 73, Mar. 2020, Art. no. 102962.
[20] F. Sabetzadeh, M. H. Moaiyeri, and M. Ahmadinejad, “A majority-based imprecise multiplier for ultra-efficient approximate image multiplication,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 66, no. 11, pp. 4200–4208, Nov. 2019.
[21] F. Salmanpour, M. H. Moaiyeri, and F. Sabetzadeh, “Ultra-compact imprecise 4:2 compressor and multiplier circuits for approximate computing in deep nanoscale,” Circuits, Syst., Signal Process., vol. 40, no. 9, pp. 4633–4650, Sep. 2021.
[22] F. S. Torres, R. Wille, P. Niemann, and R. Drechsler, “An energy-aware model for the logic synthesis of quantum-dot cellular automata,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37, no. 12, pp. 3031–3041, Dec. 2018.
[23] L. Amarú, P.-E. Gaillardon, and G. De Micheli, “Majority-inverter graph: A novel data-structure and algorithms for efficient logic optimization,” in Proc. 51st Annu. Design Autom. Conf. (DAC), 2014, pp. 1–6.
[24] W. Haaswijk, M. Soeken, L. Amarú, P.-E. Gaillardon, and G. De Micheli, “A novel basis for logic rewriting,” in Proc. 22nd Asia South Pacific Design Autom. Conf. (ASP-DAC), Jan. 2017, pp. 151–156.
[25] L. Amarú, P.-E. Gaillardon, and G. De Micheli, “Majority-inverter graph: A new paradigm for logic optimization,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 35, no. 5, pp. 806–819, May 2016.
[26] Z. Chu, M. Soeken, Y. Xia, L. Wang, and G. De Micheli, “Advanced functional decomposition using majority and its applications,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 39, no. 8, pp. 1621–1634, Aug. 2020.
[27] Z. Chu, Z. Li, Y. Xia, L. Wang, and W. Liu, “BCD adder designs based on three-input XOR and majority gates,” IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 68, no. 6, pp. 1942–1946, Jun. 2021.
[28] Z. Chu, H. Tian, Z. Li, Y. Xia, and L. Wang, “A high-performance design of generalized pipeline cellular array,” IEEE Comput. Archit. Lett., vol. 19, no. 1, pp. 47–50, Jan. 2020.
[29] M. Soeken, L. G. Amarú, P.-E. Gaillardon, and G. De Micheli, “Exact synthesis of majority-inverter graphs and its applications,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 36, no. 11, pp. 1842–1855, Nov. 2017.
[30] H. Cho and E. E. Swartzlander, “Adder and multiplier design in quantum-dot cellular automata,” IEEE Trans. Comput., vol. 58, no. 6, pp. 721–727, Jun. 2009.
[31] C.-H. Chang, J. Gu, and M. Zhang, “Ultra low-voltage low-power CMOS 4–2 and 5–2 compressors for fast arithmetic circuits,” IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 51, no. 10, pp. 1985–1997, Oct. 2004.
[32] J. Liang, J. Han, and F. Lombardi, “New metrics for the reliability of Tingting Zhang (Graduate Student Member, IEEE)
approximate and probabilistic adders,” IEEE Trans. Comput., vol. 62, received the B.Sc. and M.Sc. degrees from the
no. 9, pp. 1760–1771, Sep. 2013. College of Electronic and Information Engineering,
[33] C. Labrado, H. Thapliyal, and F. Lombardi, “Design of majority logic Nanjing University of Aeronautics and Astronautics
based approximate arithmetic circuits,” in Proc. IEEE Int. Symp. Circuits (NUAA), Nanjing, China, in 2016 and 2019, respec-
Syst. (ISCAS), May 2017, pp. 1–4. tively. She is currently working toward the Ph.D.
[34] T. Zhang, W. Liu, E. McLarnon, M. O’Neill, and F. Lombardi, “Design degree at the Department of Electrical and Computer
of majority logic (ML) based approximate full adders,” in Proc. IEEE Engineering, University of Alberta, Edmonton, AB,
Int. Symp. Circuits Syst. (ISCAS), May 2018, pp. 1–5. Canada.
Her research interests include approximate computing, Ising computing, combinatorial optimization, and nanoelectronic circuits and systems.

[35] V. Pudi and K. Sridharan, “Low complexity design of ripple carry and Brent–Kung adders in QCA,” IEEE Trans. Nanotechnol., vol. 11, no. 1, pp. 105–119, Jan. 2012.
[36] A. Hore and D. Ziou, “Image quality metrics: PSNR vs. SSIM,” in Proc. 20th Int. Conf. Pattern Recognit., Aug. 2010, pp. 2366–2369.
[37] J. Choi, Z. Wang, S. Venkataramani, P. I-Jen Chuang, V. Srinivasan, and K. Gopalakrishnan, “PACT: Parameterized clipping activation for quantized neural networks,” 2018, arXiv:1805.06085.
[38] P. J. Song and G. De Micheli, “Circuit and architecture trade-offs for high-speed multiplication,” IEEE J. Solid-State Circuits, vol. 26, no. 9, pp. 1184–1198, Sep. 1991.
[39] A. Momeni, J. Han, P. Montuschi, and F. Lombardi, “Design and analysis of approximate compressors for multiplication,” IEEE Trans. Comput., vol. 64, no. 4, pp. 984–994, Apr. 2015.
[40] S.-W. Kim and E. E. Swartzlander, “Multipliers with coplanar crossings for quantum-dot cellular automata,” in Proc. 10th IEEE Int. Conf. Nanotechnol., Aug. 2010, pp. 953–957.
[41] S.-W. Kim and E. E. Swartzlander, “Parallel multipliers for quantum-dot cellular automata,” in Proc. IEEE Nanotechnol. Mater. Devices Conf., Jun. 2009, pp. 68–72.
[42] B. Jacob et al., “Quantization and training of neural networks for efficient integer-arithmetic-only inference,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2018, pp. 2704–2713.
[43] Y. Qian, C. Meng, Y. Zhang, W. Qian, R. Wang, and R. Huang, “Approximate logic synthesis in the loop for designing low-power neural network accelerator,” in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2021, pp. 1–5.
[44] Tiny-DNN. Accessed: Mar. 14, 2022. [Online]. Available: https://github.com/tiny-dnn/tiny-dnn
[45] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998.

Yinshui Xia (Member, IEEE) received the B.S. degree in physics and the M.S. degree in electronic engineering from Zhejiang University, Hangzhou, China, in 1984 and 1991, respectively, and the Ph.D. degree in electronic engineering from Edinburgh Napier University, Edinburgh, U.K., in 2003.
He was a Visiting Scholar with King’s College London, London, U.K., in 1999, and then joined Edinburgh Napier University as a Research Assistant and an Enterprise Fellow from 2000 to 2005. He is currently a Professor with the Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, China. His research interests include low-power digital circuit design, logic synthesis and optimization, and system-on-chip (SoC) design.

Lunyao Wang received the Ph.D. degree in circuits and systems from Zhejiang University, Hangzhou, China, in 2012.
He is currently a Professor with Ningbo University, Ningbo, China. His current research interests include power and area optimization in CMOS circuit design, CMOS/nanowire/molecular hybrid cell assignment, Reed–Muller functions’ synthesis and optimization, and approximate logic synthesis for error-tolerant applications.

Weiqiang Liu (Senior Member, IEEE) received the B.Sc. degree in information engineering from the Nanjing University of Aeronautics and Astronautics (NUAA), Nanjing, China, in 2006, and the Ph.D. degree in electronic engineering from Queen’s University Belfast (QUB), Belfast, U.K., in 2012.
He is currently a Professor and the Vice Dean of the College of Electronic and Information Engineering and the College of Integrated Circuits, NUAA. He has authored two research books and over 190 leading journal and conference papers (over 70 in IEEE and ACM journals, including eight invited articles). His research interests include energy-efficient and secure computing integrated circuits and systems.
Dr. Liu is a member of the IEEE NTC AdCom and the CASCOM/VSA Technical Committee of the IEEE CAS Society. He received the prestigious Excellent Young Scholar Award from the National Natural Science Foundation of China in 2020 and the Young Scientist Award from the Fok Ying Tung Education Foundation, Ministry of Education, China, in 2022. He has been listed in Stanford University’s 2020 list of the top 2% of scientists in the world. He is the VP-Elect for TA of the IEEE Nanotechnology Council (NTC). He is the Program Co-Chair for the IEEE International Symposium on Computer Arithmetic (ARITH) 2020 and the ACM/IEEE International Symposium on Nanoscale Architectures (NANOARCH) 2022, and also a Technical Program Committee Member for a number of IEEE/ACM conferences, including the Design Automation Conference (DAC), Design, Automation, and Test in Europe (DATE), ARITH, and the International Symposium on Circuits and Systems (ISCAS). He is a Tutorial Organizer and Speaker at DAC 2022, DATE 2022, IEEE ISCAS 2021, and the International Conference on Omni-layer Intelligent Systems (COINS) 2021. He serves as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I: REGULAR PAPERS (TCAS-I), IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTING (TETC), and IEEE TRANSACTIONS ON COMPUTERS (TC), a Steering Committee Member of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, and the Guest Editor of PROCEEDINGS OF THE IEEE.

Zhufei Chu (Member, IEEE) received the B.S. degree in electronic engineering from Shandong University, Weihai, China, in 2008, and the M.S. and Ph.D. degrees in communication and information system from Ningbo University, Ningbo, China, in 2011 and 2014, respectively.
He was a Postdoctoral Fellow with the École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, from 2016 to 2017. He is currently an Associate Professor with Ningbo University. His current research interests include many aspects of logic synthesis and its applications.
Dr. Chu served as the Proceedings Chair from 2019 to 2021 and as the Finance Chair in 2022 for the International Workshop on Logic and Synthesis (IWLS), and he is a Technical Program Committee Member for IWLS, the International Conference on VLSI Design (VLSID), the China Semiconductor Technology International Conference (CSTIC), and the China Computer Federation Integrated Circuit Design and Automation Conference (CCFDAC). He actively maintains the logic synthesis framework ALSO (https://github.com/nbulsi/also).

Chuanhe Shang received the B.S. degree in electronic and information engineering from Chaohu University, Hefei, China, in 2019, and the M.S. degree in integrated circuit engineering from Ningbo University, Ningbo, China, in 2022.
His research interests mainly include approximate arithmetic circuits.