2023 Wang

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 20, NO. 1, JANUARY 2024 369
Blockchain-Empowered Distributed Multicamera Multitarget Tracking in Edge Computing
Shuai Wang, Hao Sheng, Member, IEEE, Yang Zhang, Da Yang, Jiahao Shen, and Rongshan Chen
Abstract—The rapid increase in the volume of video data generated at the edge of the Industrial Internet of Things opens up new possibilities for enhancing video services. Multicamera multiobject tracking (MCMT) has long been a fundamental task in video surveillance and traffic control. However, traditional MCMT methods are limited by the communication bottleneck and computation resources of the centralized curator, and suffer from security and privacy issues. In this article, we first design a multicamera multihypothesis tracking (MC-MHT) framework to achieve real-time tracking performance among edge cameras. The complex association of objects is described by multiskip trees, and the tracking task is distributed among the cameras. Then, we integrate a multicamera tracking chain (MCTChain) into MC-MHT to ensure security and trust. The state transitions of targets across cameras are expressed as blockchain transactions, which are validated by an integrated tracking consensus to counter Byzantine behavior. Numerical results derived from real-world scenarios and the CAMPUS dataset show that the proposed method achieves real-time performance (24–36 FPS) and a MOTA of 79.0–82.4, and reduces identity switch errors by about 71% under Byzantine attack.

Index Terms—Distributed multicamera multiobject tracking (MCMT), edge computing, edge intelligence, multicamera tracking chain (MCTChain).

Manuscript received 16 November 2022; revised 6 February 2023; accepted 15 March 2023. Date of publication 28 March 2023; date of current version 11 December 2023. This work was supported in part by the National Key R&D Program of China under Grant 2021YFB2104800, in part by the National Natural Science Foundation of China under Grant 61872025, in part by the Science and Technology Development Fund, Macau SAR under Grant 0001/2018/AFJ, and in part by the Open Fund of the State Key Laboratory of Software Development Environment under Grant SKLSDE-2021ZX-03. Paper no. TII-22-4714. (Shuai Wang and Yang Zhang are co-first authors.) (Corresponding author: Hao Sheng.)

Shuai Wang, Da Yang, Jiahao Shen, and Rongshan Chen are with the State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China, and also with the Beihang Hangzhou Innovation Institute Yuhang, Hangzhou 310023, China (e-mail: [email protected]; [email protected]; jiahaoplus@buaa.edu.cn; [email protected]).

Hao Sheng is with the State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China, also with the Beihang Hangzhou Innovation Institute Yuhang, Hangzhou 310023, China, and also with the Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China (e-mail: [email protected]).

Yang Zhang is with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China (e-mail: [email protected]).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2023.3261890.

Digital Object Identifier 10.1109/TII.2023.3261890

1551-3203 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

The amount of video data generated by cameras in the modern city has brought inestimable value. Multicamera multiobject tracking (MCMT), as a basic problem, has been a hot topic in past years. However, the traditional centralized MCMT paradigm cannot adapt to the current huge scale of video and suffers from a latent single point of failure.

Centralized MCMT needs to collect the raw video from different cameras, which first places strict requirements on hardware: such a paradigm incurs bandwidth congestion, low scalability, and high latency [1]. Second, traditional MCMT systems [2], [3] face serious privacy and security problems. On the one hand, the security and robustness of centralized computation are limited by single-point failure, which is intolerable in real industrial environments. On the other hand, data leakage may take place during data storage, transmission, and sharing, leading to serious issues for data owners and the administrator [4]. Video data contains so much private information that data owners are unwilling to upload raw video, which contradicts centralized computation. These dilemmas are the main obstacles to applying MCMT in real industrial environments.

Driven by the Industrial Internet of Things (IIoT), some researchers [1], [5] have studied distributed technology and edge computing to solve the aforementioned problems. Delegating the workloads of the centralized server to nearer edge computing nodes is becoming the trend [6]; it takes full advantage of distributed camera deployment, and raw video data no longer needs to be uploaded. Some works [7], [8] utilize distributed camera deployment and achieve remarkable success in many fields such as traffic management. However, although distributed technology solves the extensibility of the MCMT system, it places strict requirements on algorithms. Meanwhile, trust among cameras has become another core concern. In this regard, existing works mainly focus on local target association, without considering tolerance of single-point failure or potential attacks. In practice, however, a camera may crash due to the complex communication environment. In some extreme cases, an edge device may suffer malicious attacks and transfer false target information, such as features, to the MCMT system,
which results in recognition failure of targets. Therefore, realizing trustworthy cooperation among multiple parties is a major challenge.

Emerging blockchain technology has become an important way to ensure trustworthy cooperation [9], [10]. Its decentralization, transparency, traceability, and immutability can enhance data integrity, fairness, and authenticity among multiple parties [11]. Therefore, some works adopt blockchain to improve the security of data processing. Wu et al. [12] showed how the convergence of these two paradigms can enable security and scalability in IIoT. Sheng et al. [13] integrated blockchain into cross-camera tracking and designed STCCNet. They focus on the feature representation of targets and directly adopt proof-of-work (PoW). However, the heavy resource requirements for solving mining puzzles in PoW limit the applicability of their method. Among these works, deep integration of blockchain is still scarce, especially in consensus and ledger design.

Considering the above observations, we conclude that two challenging questions remain for MCMT in terms of extensibility, trust and security, and efficiency: 1) building an extendable MCMT framework that can adapt to large-scale camera deployments; and 2) taking advantage of blockchain to coordinate the decentralized tracking process while achieving a tradeoff among privacy, security, and efficiency.

In this article, we first formulate the problem and define the global state of targets across cameras. Then, multicamera multihypothesis tracking (MC-MHT) is proposed to perform tracking tasks in a distributed manner. Further, to ensure trust among cameras, we design the multicamera tracking chain (MCTChain). The state transition of tracks is recorded as a transaction in the blockchain ledger, and all tracking transactions are validated by a deeply integrated consensus mechanism. MCTChain enables robust cooperation among cameras and ensures accurate tracking. Experiments on real-world scenes and the CAMPUS dataset demonstrate the efficiency and extensibility of our framework. Our method can shed light on new designs of blockchain-based video processing.

The main contributions of this article are fourfold.
1) We propose an extendable MC-MHT framework to solve MCMT. Instead of aggregating all raw video data, it distributes tracking tasks to each camera, so that it can be expanded flexibly in an edge computing environment.
2) We build an MCTChain-empowered collaborative architecture to share tracking information and conduct tracking consensus over distributed cameras, reducing the risk of single-point failure or Byzantine behavior.
3) We construct a complete track model in MC-MHT and illustrate the transition of tracks from a novel transaction perspective.
4) We evaluate the effectiveness of our method in a real-world scenario in terms of network flow, energy cost, etc. We also make a quantitative comparison on the CAMPUS dataset. Both quantitative and visualized results demonstrate the efficiency of our method.

The rest of this article is organized as follows. Related work is presented in Section II. In Section III, we describe the MC-MHT framework in detail. In Section IV, we introduce the implementation of MCTChain, including the system model, transactions, and consensus algorithm. We report the evaluation in Section V, and finally, Section VI concludes this article.

II. RELATED WORK

A. Single-Camera Multiobject Tracking (SCMT)

SCMT refers to continuously identifying, locating, and maintaining the identity consistency of multiple targets of interest in a single camera view. In past years, tracking-by-detection has become a popular paradigm owing to its clear workflow: it first obtains detections from the video, then links the detections into complete trajectories.

Following this paradigm, a large number of works have emerged to address the problems behind tracking. For example, by introducing deep features, Wojke et al. [14] designed DeepSORT to strengthen the ability to maintain object identities. Building on their work, many researchers have proposed similar DeepSORT-based methods [15], [16]. However, such methods perform unsatisfactorily in crowded scenes due to heavy occlusion of targets. To build a robust association mechanism, Kim et al. [17] revisited multihypothesis tracking (MHT), in which each target is described as a track tree. In short, SCMT pays more attention to data association, since it only needs to consider the video from one camera.

B. Multicamera Multiobject Tracking

Compared to SCMT, MCMT focuses on maintaining the unique identities of different targets across several cameras. It not only requires accurate data association within each camera, but also attaches great importance to the fusion of trajectories. In addition, extensibility is a core requirement for practical applications of MCMT. In recent years, we have seen increasing efforts to solve MCMT using two types of approaches: centralized and distributed methods.

The centralized paradigm often takes a two-step strategy: 1) generating tracklets of all targets for each camera; and 2) matching tracklets that belong to the same target across all cameras [18]. The first step is usually implemented by SCMT, given the great progress of SCMT in computer vision and pattern recognition. As discussed previously, methods combining both appearance and motion information have recently become mainstream for multiobject tracking. For the second step, various reidentification and multiview fusion methods have been proposed to match tracks among different cameras.

With the development of edge computing, the distributed paradigm has attracted much attention for its better extensibility. Distributed approaches operate without fusion centers, thus improving scalability and potentially reducing communication bottlenecks and hardware requirements [19]. Schwager et al. [20] proposed a decentralized deployment strategy for robotic cameras. Yang et al. [21] built a real-time distributed MCMT system by introducing target management and DMMA. However, such methods focus on extensibility but ignore latent security issues such as connection failure.
C. Blockchain Enabled Video Processing

Recent years have witnessed growing demand for video processing, especially in surveillance and data-sharing services [22], [23]. Both surveillance and data sharing must protect the data from leakage and malicious attacks. As a classic decentralized technology, blockchain has been widely applied in various fields beyond finance [24]. Owing to its inherent immutability and traceability, many works attempt to integrate blockchain into video processing systems.

Several startups are employing blockchain for video integrity validation. For instance, Sarala et al. [25] proposed a novel video integrity method combined with a blockchain-based message authentication code. Liu et al. [26] built a decentralized framework with flexible monetization mechanisms for video streaming services. Sheng et al. [13] used blockchain to synchronize target information and adopted PoW to achieve consensus. Nevertheless, such methods are limited in applicability or extensibility. It is valuable to explore how to achieve seamless integration of blockchain and video processing.

III. MULTICAMERA MULTIPLE HYPOTHESIS TRACKING FRAMEWORK

A. Problem Formulation

In our method, raw video data is no longer uploaded to the server, and most of the computation is conducted locally. In this section, we briefly present the problem formulation in multicamera scenarios.

Given a group of camera nodes $\mathcal{N} = \{n_1, \ldots, n_C\}$, the video data from a camera $n_i$ is denoted by $V_i = \{F_1, \ldots, F_L\}$. Here, $C$ is the number of cameras, $F_i$ represents the $i$th frame in the video, and $L$ is the length of the video. Following the tracking-by-detection paradigm, we use well-known object detectors, such as YOLO [27], to obtain detections. In general, a detection consists of a bounding box and the corresponding image patch, defined as follows:

$$D = [x, y, w, h, p] \tag{1}$$

where $x, y$ are the coordinates of the top-left point of the bounding box, $w, h$ are its width and height, and $p$ is the corresponding image patch.

In a given frame, we assume that there are $M$ existing objects and $N$ detections from all cameras. Let $\mathcal{D} = \mathcal{D}_1 \cup \cdots \cup \mathcal{D}_C = \{D_1, \ldots, D_N\}$ denote the global detection set and $\mathcal{T} = \mathcal{T}_1 \cup \cdots \cup \mathcal{T}_C = \{T_1, \ldots, T_M\}$ the object set, where $\mathcal{D}_C$ and $\mathcal{T}_C$ are the detection set and object set of the $C$th camera, respectively. Multiobject tracking is then defined as the assignment problem

$$\mathcal{D} \longrightarrow \mathcal{T} \tag{2}$$

where $\longrightarrow$ represents the projection from $\mathcal{D}$ to $\mathcal{T}$.

In this assignment, each detection should be associated with the correct object. Apart from this, $\mathcal{T}$ must also be updated after each assignment: objects that have left the scene are removed from the set, while newly entered ones are appended.

B. Global State and Assumption of Track

In a real-world MCMT system, a target may pass through the areas of different cameras. Therefore, it is necessary to model the state transitions of the target to describe such changes.

Fig. 1. Basic state of tracks in MC-MHT.

As shown in Fig. 1, there are five basic states of a target in our framework. Following the classical tracking-by-detection paradigm, unmatched detections are initialized as tracks in the Init state. For targets that only move within one camera, the lifecycle is Init → Tracked → Left → Deleted, a closed loop within that camera. Once an Init track is associated successfully in future frames, it is transferred to Tracked. When the target leaves the camera area, it cannot obtain any new association, so it is updated to Left and finally deleted.

However, some targets may enter the camera area of another node. For this case, we define the InChain state. Assuming that a target departs from camera $n_i$ and enters node $n_j$, its typical lifecycle is Init → Tracked → Left → InChain → Left → Deleted. When it has left node $n_i$ for a certain time, the system regards it as Left. It then comes into the area of $n_j$. Once it is reidentified by $n_j$, its state is transferred to InChain, which indicates that this target moves across cameras. When it leaves $n_j$, the state transition is similar to before.

C. Distributed Multicamera Multiple Hypothesis Tracking

As mentioned before, building the association between detections and tracks is a core process of MCMT. Inspired by previous progress [17], [28] in multiple hypothesis tracking, we propose MC-MHT to solve MCMT. The traditional centralized paradigm aggregates all data on the server to perform association. In MC-MHT, by contrast, most of the MCMT computation is conducted locally, while the merging of tracks is solved by cooperation.

As shown in Algorithm 1, at the beginning of the $t$th frame the detector extracts detections from the local frame image. Then, these detections are linked with the objects in the object set. Specifically, in order to judge whether new tracks
come from other peers, each node fetches the remote tracks from the blockchain (line 2). Then, for each pair of a local Init track and a fetched track, the node conducts a ReID task to recognize whether they belong to the same identity. Here, we use feature-level ReID (in short, measuring the similarity between two object features). Matched pairs are packed as mergence proposals, which are broadcast to the peer nodes. If a proposal is validated under tracking consensus (discussed in Section IV-D), the id of the Init track is modified to the id of the fetched track, as shown in Algorithm 1 line 8. Here, id is the numeric label of the object identity. The state of the merged track is transferred to InChain. Afterward, the detections are associated with tracks.

Algorithm 1: Distributed Multicamera Multiple Hypothesis Tracking.

As shown in Fig. 2, MC-MHT creates a track tree for each target. In each frame, the detections are linked with the leaves of existing trees. The branches, called "hypotheses," represent the potential associations of a target. Considering that detection failure is a common problem, we define two types of branches: normal and skip. In Fig. 2, branches ($b_1, b_2, b_3, b_4, b_5$) are normal, since they are built in adjacent frames. But, as the red dotted circles show, sometimes an object cannot be matched with any detection because the detection is missing. To handle this, we build a skip branch ($b_6$), which links the detection with leaves in an earlier frame by skipping the previous frame.

Fig. 2. Illustration of MC-MHT.

Considering the limited computation resources of edge devices, we leverage motion information to prune the tree, further reducing the number of branches. Based on the location and shape of the bounding box, we compute the motion offset between each pair of detection and object as follows:

$$d^{m}_{i,j} = \sqrt{\frac{(P_i - P_j)(P_i - P_j)^{T}}{3}} \tag{3}$$

where $P = [x, y, w/h]$ is a three-dimensional vector containing the top-left coordinates and the width/height ratio.

Since the velocity of an object is usually stable, the motion offset between two consecutive frames should vary only slightly. Branches with large motion offsets, i.e., $d^{m}_{i,j} > \theta_m$ for a threshold $\theta_m$, are filtered out.

The branches of each tree represent different potential trajectories of the same target. Yet, there are many conflicting branches in this forest. Clearly, an object can be assigned at most one detection, because it cannot appear in two positions simultaneously. Likewise, a detection can be assigned to at most one leaf. Therefore, the optimal solution for this forest is to find nonconflicting branches.

The whole forest can be transformed into a graph $G = \langle V, E \rangle$, where $V$ is the vertex set representing leaves or detections and $E = \{e = (i, j) \mid i \in V, j \in V, (i, j) \text{ is linked}\}$. The weight of each edge is computed by smoothly balancing motion and appearance information. Due to changes in lighting, occlusion, or the pose of the object, detections contain noise, which results in imprecise feature representations. Considering that the detection confidence $c$ reflects the quality of a detection, we combine motion offset and appearance similarity as follows:

$$d^{a}_{i,j} = \frac{f_{D_i} f_{T_j}^{T}}{\lVert f_{D_i} \rVert \cdot \lVert f_{T_j} \rVert} \tag{4}$$

$$d_{i,j} = c \cdot d^{a}_{i,j} + (1 - c) \cdot d^{m}_{i,j} \tag{5}$$

where $d^{a}_{i,j}$ is the appearance similarity between $D_i$ and $T_j$, $D_i$ is the $i$th detection in $\mathcal{D}$, $T_j$ is the $j$th track tree in $\mathcal{T}$, $f$ denotes the appearance feature, and $d_{i,j}$ is the final edge weight.

Thus, the optimization objective of (2) in each frame can be defined as follows:

$$\begin{aligned} \max \quad & \sum_{i \in \mathcal{D}} \sum_{j \in \mathcal{T}} a_{i,j}\, d_{i,j} \\ \text{s.t.} \quad & \sum_{i \in \mathcal{D}} a_{i,j} \le 1, \quad j = 1, 2, 3, \ldots, M \\ & \sum_{j \in \mathcal{T}} a_{i,j} \le 1, \quad i = 1, 2, 3, \ldots, N \end{aligned} \tag{6}$$

where $a_{i,j} \in \{0, 1\}$ indicates whether edge $e_{i,j}$ exists in $G$.

This optimization can be solved by the Hungarian algorithm. The branches in the solution are preserved, while the others are finally pruned. Unmatched detections are regarded as newly entered objects (in the Init state) and initialized as the root of a new track tree. If an Init track is successfully associated for $N_t$ consecutive frames, it is transferred to Tracked. Unmatched objects are marked with a missing label. If a target has been marked missing for $N_l = 5 \times v_r$ consecutive frames, it is regarded as lost (the Left state), where $v_r$ is the frame rate of the video. The node then sends a left transaction $tx_{left}$ (discussed in Section IV-C) to its peers, in which the fetched tracks are updated as Left. Tracks in the Left state are recognized as Deleted if they are still marked as missing for $N_l$ consecutive frames.

Fig. 3. System model and architecture of our MCMT system, which integrates MC-MHT and MCTChain.

Algorithm 2: Leader Election in MCTChain.
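The association step of Section III-C (Eqs. (3)–(6)) can be illustrated with a short Python sketch. It is a minimal, hypothetical illustration rather than the authors' implementation: box coordinates are assumed to be normalized to [0, 1], features are plain lists of floats, θm = 0.3 follows Section V-A, and an exhaustive depth-first search stands in for the Hungarian algorithm named in the text, which keeps the example dependency-free but only scales to a few detections per frame. The function names `motion_offset`, `appearance_similarity`, `edge_weights`, and `associate` are illustrative, and Eq. (5) is reproduced as printed.

```python
import math

THETA_M = 0.3  # motion gate θm, the value recommended in Section V-A


def motion_offset(box_i, box_j):
    """Eq. (3): RMS difference of P = [x, y, w/h] for boxes given as (x, y, w, h)."""
    p_i = (box_i[0], box_i[1], box_i[2] / box_i[3])
    p_j = (box_j[0], box_j[1], box_j[2] / box_j[3])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p_i, p_j)) / 3)


def appearance_similarity(f_d, f_t):
    """Eq. (4): cosine similarity between a detection feature and a track feature."""
    dot = sum(a * b for a, b in zip(f_d, f_t))
    return dot / (math.hypot(*f_d) * math.hypot(*f_t))


def edge_weights(detections, tracks, theta_m=THETA_M):
    """Keep only branches that pass the motion gate and weight them with Eq. (5)."""
    weights = {}
    for i, det in enumerate(detections):
        for j, trk in enumerate(tracks):
            d_m = motion_offset(det["box"], trk["box"])
            if d_m > theta_m:  # branch pruned: no edge (i, j) in G
                continue
            d_a = appearance_similarity(det["feature"], trk["feature"])
            c = det["conf"]  # detection confidence balances the two cues
            weights[(i, j)] = c * d_a + (1 - c) * d_m  # Eq. (5) as printed
    return weights


def associate(num_dets, num_tracks, weights):
    """Eq. (6): pick edges maximizing total weight; each detection and track used at most once.

    Exhaustive depth-first search stands in for the Hungarian algorithm,
    so this is only practical for a handful of detections.
    """
    best_score, best_pairs = 0.0, []

    def search(i, used, pairs, score):
        nonlocal best_score, best_pairs
        if i == num_dets:
            if score > best_score:
                best_score, best_pairs = score, list(pairs)
            return
        search(i + 1, used, pairs, score)  # detection i stays unmatched (new Init track)
        for j in range(num_tracks):
            if j not in used and (i, j) in weights:
                pairs.append((i, j))
                search(i + 1, used | {j}, pairs, score + weights[(i, j)])
                pairs.pop()

    search(0, frozenset(), [], 0.0)
    return best_pairs, best_score


# Toy frame; box coordinates are assumed normalized to [0, 1] so that θm = 0.3 is meaningful.
dets = [
    {"box": (0.10, 0.20, 0.05, 0.10), "feature": [1.0, 0.0], "conf": 0.9},
    {"box": (0.60, 0.50, 0.05, 0.10), "feature": [0.0, 1.0], "conf": 0.8},
]
tracks = [
    {"box": (0.61, 0.50, 0.05, 0.10), "feature": [0.1, 1.0]},
    {"box": (0.11, 0.21, 0.05, 0.10), "feature": [1.0, 0.1]},
]
w = edge_weights(dets, tracks)
pairs, score = associate(len(dets), len(tracks), w)
print(pairs)  # [(0, 1), (1, 0)]
```

In the toy frame, the motion gate removes both cross pairs, so each detection is matched to the nearby track with the similar feature. A larger deployment would replace `associate` with a polynomial-time solver such as SciPy's `linear_sum_assignment` (with `maximize=True`), giving gated pairs a large negative weight and discarding them from the result.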
IV. MULTICAMERA TRACKING CHAIN

A. Design Objectives and System Model

In this section, we present the cooperation mechanism underlying MC-MHT. Fig. 3 depicts the overall architecture of MC-MHT and MCTChain. Edge cameras are deployed to perform trustworthy MCMT over a large area. The architecture of our system is divided into three layers: the tracking, blockchain, and edge network layers. The tracking layer performs MC-MHT, while the blockchain layer is built on the edge network. The edge nodes collaborate to maintain a permissioned blockchain that stores the tracking results in an immutable manner. Within such a system model, we summarize our design goals as follows.
1) Decentralization: MCMT in MCTChain should work in a decentralized network, with the tracking task performed locally on each node.
2) Balance of privacy: MCTChain should take as little information as possible from each node to reach a tradeoff between privacy leakage and tracking efficiency.
3) Byzantine fault tolerance: MCTChain should ensure convergence against at most $f$ ($N = 3f + 1$) Byzantine nodes caused by failures or attacks.

The system model described above can be defined as an undirected, fully connected graph $G = \langle V, E \rangle$. Here, $V$ ($|V| = N$) denotes the node set and $E = \{(i, j) \mid i, j \in V, i \neq j\}$ is the edge set. The entire system should maintain the identities of objects in its surveillance field. We assume that there are at most $f$ possible Byzantine nodes in the system, and that the number of edge nodes satisfies $N = 3f + 1$. We denote the frequently used notations node, transaction, block, blockchain, and chain of block headers by $n$, $tx$, $B$, $BC$, and $BH$, respectively.

B. Initialization and Leader Election of MCTChain

In MCTChain, each node creates a key pair consisting of a public key $pk_i$ and a private key $sk_i$, and generates its identity label $id_i$ from $pk_i$ with SHA-256. Information on the initial nodes is packed into the genesis block $B_0$. After initialization, the system elects a leader, which is in charge of packing transactions. To simplify this election process, we adopt a zero-knowledge proof method. Inspired by [29], we design the leader election algorithm shown in Algorithm 2. All nodes take the same random seed $s$ as input, where $s$ is the hash of the latest block. Each node uses this seed to generate a hash $h_i$ and a proof $p_i$ via $\mathrm{VRF}(sk_i, s)$. Afterward, node $n_i$ broadcasts $h_i$ and $p_i$ to its peers, while continuously receiving hashes and proofs from them. Once a node receives a hash $h_j$ and proof $p_j$ from a peer $n_j$, it verifies them with $\mathrm{VRF}(pk_j, h_j, s, p_j)$ and adds the peer to the leader candidate list if verification passes. Finally, the node with the smallest $h$ is elected as the leader of this round.

C. Tracking Transactions in MCTChain

At each round, node $n_i$ exchanges the necessary tracking information with other nodes in the blockchain. In Section III-B, we discussed the state transitions of targets as they move through the tracking scenario. In this section, we discuss the details of such transitions.

Algorithm 3: Blockchain Consensus.

Fig. 4. Basic structure of a block. Each block contains several transactions, each consisting of a type, an object id, a feature, and a timestamp.
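The track lifecycle of Section III-B and the transactions of Section IV-C can be sketched as a per-track state machine whose ledger entries carry the Fig. 4 fields (type, object id, feature, timestamp). This is a hypothetical Python sketch, not MCTChain's code: `Track`, `Transaction`, `on_frame`, `propose_merge`, `apply_merge`, and the "left"/"merg" type labels are illustrative names; the frame index stands in for the timestamp; and vr = 15 is assumed from the camera setup in Section V-A, with Nt = 5 and Nl = 5 × vr as recommended there. Consensus is omitted, so `apply_merge` only models what a node does once the block holding Tx_merg has been committed. Cases the text leaves open, such as an Init track that keeps missing or a Left track matched again locally, simply keep their current state in this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

VR = 15        # frame rate vr, assumed to match the 15-FPS cameras in Section V-A
N_T = 5        # consecutive matches before Init -> Tracked
N_L = 5 * VR   # consecutive misses before Tracked/InChain -> Left, and again Left -> Deleted


@dataclass
class Transaction:
    """Ledger entry carrying the fields listed in Fig. 4."""
    tx_type: str            # "left" or "merg" (illustrative labels)
    object_id: int
    feature: List[float]
    timestamp: int          # frame index used as a stand-in timestamp


@dataclass
class Track:
    object_id: int
    feature: List[float]
    state: str = "Init"
    hits: int = 0
    misses: int = 0

    def on_frame(self, matched: bool, t: int) -> Optional[Transaction]:
        """Advance the lifecycle by one frame; return a transaction to broadcast, if any."""
        if self.state == "Deleted":
            return None
        if matched:
            self.hits += 1
            self.misses = 0
            if self.state == "Init" and self.hits >= N_T:
                self.state = "Tracked"
            return None
        self.hits = 0
        self.misses += 1
        if self.state in ("Tracked", "InChain") and self.misses >= N_L:
            self.state, self.misses = "Left", 0
            return Transaction("left", self.object_id, self.feature, t)  # Tx_left
        if self.state == "Left" and self.misses >= N_L:
            self.state = "Deleted"
        return None

    def propose_merge(self, remote_id: int, t: int) -> Transaction:
        """Build Tx_merg once local ReID matches this Init track to remote track remote_id."""
        return Transaction("merg", remote_id, self.feature, t)

    def apply_merge(self, tx: Transaction) -> None:
        """Apply a Tx_merg after its block is committed: adopt the remote id, enter InChain."""
        self.object_id = tx.object_id
        self.state = "InChain"
        self.hits = self.misses = 0


# Node n_i: a target is tracked, then disappears for N_L frames.
trk = Track(object_id=7, feature=[0.2, 0.8])
for t in range(N_T):
    trk.on_frame(True, t)
print(trk.state)  # Tracked
txs = [trk.on_frame(False, N_T + k) for k in range(N_L)]
print(txs[-1].tx_type, trk.state)  # left Left

# Node n_j: a new Init track is re-identified as object 7 and merged after consensus.
new = Track(object_id=42, feature=[0.21, 0.79])
tx = new.propose_merge(remote_id=7, t=200)
new.apply_merge(tx)
print(new.object_id, new.state)  # 7 InChain
```

Only left and mergence transactions are emitted, in line with the principle of Section IV-C that camera nodes share information only about multicamera targets; purely local transitions such as Init → Tracked never reach the ledger.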
Considering the tracking scenario above, targets can be divided into two types: in-camera and multicamera. Camera nodes only need to share information about multicamera targets. Following this principle, we define some basic transactions to be recorded in the ledger. As discussed before, once a Tracked target has not been associated with any new detection for $N_l$ consecutive frames, it is transferred to Left. Meanwhile, its feature and id are packed into a left transaction $Tx_{left}$, as shown in Fig. 3.

Naturally, this target may depart from the current node $n_i$ and enter the camera area of a new node $n_j$, as shown in Fig. 3. It is then initially recognized as an Init track by $n_j$. When it is merged with its identity in node $n_i$, node $n_j$ posts a new mergence transaction $Tx_{merg}$, including the new features and the object id, to the blockchain, as shown in Fig. 4. After consensus, the state of this target is updated in the ledger and transferred to InChain. Similarly, when it enters another new camera, its state is updated as InChain → Left → InChain.

Considering that there may be gap areas between nearby cameras, the hyperparameter $N_l$ is adjustable: a larger $N_l$ means a longer tolerance for a missing object. Once a target holds the Left state for more than $N_l$ consecutive frames, we consider that it has departed from the coverage area of the cameras, and it is regarded as Deleted by the whole network.

D. Identity Convergence and Blockchain Consensus

In a practical scene, a camera node may suffer from a single-point failure (a crash or Byzantine behavior). Both crashes and Byzantine behavior prevent multicamera tracks from merging. Thus, we adopt the practical Byzantine fault tolerance protocol [30] as our consensus backbone. As the core of MCTChain, the consensus determines how new blocks are appended in an orderly and safe way. The transactions from one node are validated by its peers to resist single-point failure. The blockchain consensus consists of three main phases: PREPARE, COMMIT, and DECIDE.

PREPARE. During the $k$th round, all nodes run Algorithm 2 to determine the leader of the current round. Each node checks whether it is the leader. If so, it assembles the new block $B_k$ and writes the proposal $\langle \text{PREPARE}, id, B_k, \sigma \rangle$ into a message, where $\sigma$ is the signature of the concatenation of PREPARE and $B_k$. It then broadcasts the message to the network. Otherwise, it becomes a follower and recognizes node $v_l$ as the leader (where $id_{v_l}$ equals $v_l$). It uses Algorithm 4 to validate the transactions when it receives a new block from the leader. For a mergence transaction, the validator computes the ReID result with its local model. Meanwhile, it checks the time difference recorded in the transaction, since track-state updates are written chronologically in the ledger: a track mergence cannot occur before the last update of that track. For an updating transaction, the validator checks whether the time difference satisfies $\delta \le 0$. In addition, a left transaction is validated by comparing the time difference with a predefined left threshold $\delta_l$, which corresponds to the aforementioned $N_l$.

To avoid deadlock, each follower sets a timeout so that it can terminate the current round if $\mathrm{Time}() \ge T_0 + \delta_0$. Here, $\mathrm{Time}()$ returns the current system timestamp, $T_0$ is the termination time of the previous round, and $\delta_0$ is an adjustable threshold.

COMMIT. In this step, once a node receives $2f + 1$ valid messages, it broadcasts a commit message $\langle \text{COMMIT}, id, vote, \sigma \rangle$. The system then enters the DECIDE phase.

DECIDE. In this phase, a node adds the new block to the blockchain only after it receives $2f + 1$ valid commit messages. In this way, the transactions described in Section IV-C are validated by at least $2f + 1$ peers, which makes MCMT more reliable and safe. When adding the new block, the node updates the ids of local tracks according to the transactions in the block. As shown in Algorithm 1 line 7, the id of the local track is modified to that of the remote merged track.

With such a tracking consensus, the entire tracking system is Byzantine-resilient and works in a distributed manner. Compared to the centralized system, MCTChain not only resists single-point failure but also reduces the need to share private raw video data, realizing a tradeoff between accuracy and privacy.

Algorithm 4: Tracking Transaction Validation.

V. EXPERIMENT

A. Implementation Details

Hardware specification and hyperparameter setting. We implement MCTChain in Python and adopt a P2P network for peer communication. To validate the efficiency of MCTChain, we also build a centralized MCMT system for comparison. The centralized system contains one powerful server and several cameras, while MCTChain is composed of edge devices. As shown in Table I, the server is equipped with AMD R9-5900X CPU processors, each with 12 cores and a 3.70 GHz frequency. The NVIDIA Jetson Nano is adopted to build edge cameras; each has an ARM A57 CPU with four processors and four threads, 4 GB of RAM, and a micro Maxwell GPU. The max-power in this table covers both CPU and GPU power. A total of 20 edge cameras are used in our experiments, and both systems use the same video camera resources. The cameras have 1920×1080 resolution at 15 FPS. Based on experimental experience, we recommend $\theta_m = 0.3$, $N_t = 5$, and $N_l = 5 \times v_r$, where $v_r$ is the frame rate.

Dataset and metric. We conduct both qualitative and quantitative experiments on the CAMPUS [31] dataset, which provides videos collected in several scenarios by cameras with different views. According to the attributes of the subsets, we evaluate and

Fig. 5. Network flow and latency comparison of CS and MCTChain when $4 \le N \le 20$: (a) relation between total network flow and the number of cameras; (b) latency of the three stages in MCTChain as the number of cameras increases.

For evaluation metrics marked with ↑, higher is better, and vice versa.

We evaluate the centralized system and MCTChain on the aforementioned camera infrastructure from two aspects. 1) Efficiency: we measure the two systems with respect to total network flow, energy cost, and GPU utilization. 2) Accuracy: we validate identity convergence and tracking accuracy in real-world scenarios and evaluate the tracking results on a well-known dataset with quantitative comparison. For simplicity, we denote the centralized system as "CS" in the following experiments, and "MCTChain" denotes MCTChain combined with MC-MHT.

B. Efficiency Analysis

Network flow. We first compare the network flow of CS and MCTChain in a non-Byzantine environment. For CS, all raw video data can be correctly uploaded to the central server. For MCTChain, all peers correctly pass the necessary messages and are nonmalicious. As shown in Fig. 5(a), the network flow of CS depends only on the number of cameras and increases linearly. For MCTChain, however, the network flow climbs very slowly. On the one hand, video data no longer needs to be uploaded. On the other hand, the messages of MCTChain are very small. Although peers in MCTChain must send messages to each other, the total flow is still less than that of CS. Further, we observe an interesting phenomenon: the flow of MCTChain is almost stable when $N \ge 15$. The latency experiment answers this question. According to the latency results, the latency $L_{BC}$ of blockchain consensus increases with the number of cameras. Here, the network flow ($v$) is measured in KB/s. Thus, the total flow ($v \times L_{BC}$) in fact becomes higher.

Energy cost. We also compare the energy cost of the two types of systems. In each experiment, the hardware of CS is upgraded once it reaches its capacity bottleneck. For CS, all camera video is processed by the central server. For MCTChain, every camera is equipped with an edge device. As shown in
compare our methods on two subsets. The video data containing Fig. 6(a), the number of cameras increases from 4 since it should
15–25 pedestrians is recorded with 30 FPs and a resolution of satisfy N = 3f + 1(f ≥ 1). We observe that the energy cost
1920×1080. We take the widely adopted CLEAR [32] metric to of CS is obviously higher than MCTChain at the beginning.
measure tracking performance, including multiobject tracking Because both the GPU are CPU of the central server are high
accuracy (MOTA↑), multiobject tracking precision (MOTP↑), power even at idling, especially 3090Ti. However, distributed
the mostly tracked (MT↑), the mostly lost (ML↓), and FPs↑. processing greatly decreases the high cost. The maximum power
TABLE I
IMPLEMENTATION SPECIFICATIONS OF HARDWARES
Fig. 6. (a), (b), (c) are the power, memory usage, and FPs of CS and MCTChain when 4 ≤ N ≤ 20, (d) is the IDS error evaluation. (a) Power
usage comparison. (b) Memory usage comparison. (c) FPs comparison. (d) Identity switch error evaluation.
of the edge device used is 10 W, so the total energy cost of MCTChain is far lower than that of CS. When more cameras are added to the network, the energy cost of MCTChain increases linearly, since each new camera needs a new edge device. When N = 9, the energy cost of CS jumps to 839 because another 3090Ti must be added to support nine videos. Finally, the energy cost of CS reaches 1591 when N = 20.

To measure GPU utilization, we also report the GPU memory usage in Fig. 6(b). The GPU memory usage of CS is close to full at every multiple of four cameras, because one 3090Ti can support at most four cameras. If N is not a multiple of four, its memory usage is relatively lower. Based on this observation, we derive the following quantitative relation between memory usage and the number of cameras:

$$
r \approx \frac{k_c \times C}{k_G \times M} \tag{7}
$$

where $r$ is the GPU memory usage ratio, $M$ is the maximum memory of one GPU, $C$ is the memory requirement of one camera, and $k_c$ and $k_G$ are the numbers of cameras and GPUs, respectively. For the 3090Ti, $M \approx 4 \times C$. Since a new GPU is added only when the existing ones are full, $k_G = \lceil k_c / 4 \rceil$; hence $r \approx 1$ when $k_c$ is a multiple of four and is lower otherwise (e.g., $r \approx 5/8$ for $k_c = 5$). In contrast, the memory usage of MCTChain stays close to 100% all the time, because the computing resources of MCTChain are fully used, while those of CS are often idle. These results indicate that GPUs are utilized more fully in MCTChain.

Latency and time efficiency. To better illustrate the latency of each round in MCTChain, we divide a round into three stages: local tracking computation, whose total time overhead is denoted by $T_{LCT}$; tracking transaction ($T_{TX}$); and blockchain consensus ($T_{BC}$). As Fig. 5(b) shows, $T_{LCT}$ and $T_{BC}$ are of the same order of magnitude, whereas $T_{TX}$ is only 0.01 s and is almost negligible. $T_{LCT}$ does not grow with N because the main tracking task is computed locally. In contrast, $T_{BC}$ depends on communication among nodes, so it increases with N. When N = 20, the latency of blockchain consensus is close to 1 s. However, local tracking does not need to wait for the consensus, since the consensus can run asynchronously; it only needs to complete before the target leaves the camera's field of view. In our experience, $T_{BC} < 5$ s is an acceptable overhead for most scenes, and N is recommended to be no more than 50. Meanwhile, Fig. 6(c) shows that the time efficiency of MCTChain is stable from 4 to 20 cameras (about 24 FPS), whereas the time efficiency of CS fluctuates. The reason is that all computation is conducted on the server alone, and its computing speed decreases as GPU utilization increases; hence, the FPS of CS drops when new cameras join unless a new GPU is added. This comparison demonstrates the efficiency of our lightweight MC-MHT.

C. Tracking Performance Evaluation

Tracking performance evaluation. We finally visualize the tracking results in a real-world edge computing environment. In our experiments, we focus on the ability of MCTChain to keep the identities of targets consistent across different cameras. The tracking results are shown in Fig. 7. In the beginning, each target is assigned a unique ID in camera $c_1$, and the targets are tracked correctly from the 200th to the 500th frame. Then three persons enter the area of $c_2$ and are recognized successfully, as the red lines show. Furthermore, they move into $c_3$ while one woman leaves $c_3$ at the same time. They are all recognized successfully and assigned the same identities as in the previous cameras. These results show that MC-MHT can maintain identities in multicamera scenes, even in long-term tracking. In addition, MC-MHT remains robust under severe deformation or occlusion. For example, in the 5700th frame, although target 206 undergoes severe deformation, the tracker still tracks it stably. Similarly, two students are tracked successfully after entering $c_4$. These results demonstrate the accuracy of our method in a real edge computing environment.

Fig. 7. Tracking results under multicamera scenes. For simplicity, only the results of four cameras are shown. The same object is drawn in the same color in all scenes.

TABLE II
TRACKING RESULTS ON THE CAMPUS DATASET. VALUES NOT PUBLISHED BY SOME METHODS ARE MARKED AS “–”

Identity convergence in the presence of Byzantine nodes. To compare robustness and security quantitatively, we count the number of identity switch (IDS) errors of both CS and MCTChain under Byzantine behavior, where an IDS error means that a target is assigned a false identity label. In this group of experiments, N is set to 10 and BR ∈ {0%, 10%, 20%, 30%}
considering f = 33% × N , where Byzantine ratio (BR) is the In Garden1, HTC fails to deal with the various motion pat-
existing Byzantine nodes over N . Specifically, BR = 0% repre- terns of targets. For STP and TRCTA, they can not track these
sents the non-Byzantine case, otherwise, the camera randomly occluded targets, thus obtaining lower MOTA. CFC constructs
sends false tracking information (such as false id). We evaluate a robust interaction mechanism among objects so achieves an
CS simultaneously but never realize Byzantine behavior. The excellent result on MOTA by 80.4. But our method conducts
entire results are shown in Fig. 6(d). We can observe that higher performance on most of the indicators. The MOTA indi-
MCTChain has lower IDS error than CS, since the identities cator of MCTChain is 2 higher than the second one. We attribute
recognition is validated by other peers. The different curve these results to the following reasons. On the one hand, the
shapes of MCTChain and CS reflect their ability to resist IDS. proposed method builds a complete association model by con-
Compared with CS, the curve of MCTChain can quickly stop structing a multiskip track tree for each target. The skip branches
climbing, because the false mergence transaction fails to be well describe the occluded targets during tracking. Thus, we
validated by peers. MCTChain can ensure lower error (reduced obtain the highest MT by 100%. On the other hand, both
by 71%–85%) even in the presence of 10%–30% Byzantine appearance and motion information are sufficiently exploited
nodes. It is clearly shown that MCTChain can still guarantee to conduct the fusion of trajectories. Naturally, MCTChain
a stable identity convergence and keep security against different reaches satisfying results on Garden2 as well. Compared to
levels of Byzantine attacks. others, our method is performed distributedly on each node.
Even though we only use light-edge devices, the time efficiency
of our method is higher than most of the others using powerful
servers.
D. Benchmark Comparison In addition, most of the other methods require extra camera
In order to validate the effectiveness of MC-MHT, we conduct information. For example, TRACTA needs a precalibrated ho-
the comparison on CAMPUS dataset. The entire results are mography transformation matrix, while HCT and STP depends
reported in Table II. Several famous algorithms are compared on 3-D spatial information. But ours only use the captured video
with ours. and has higher practical value than theirs.
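The benchmark discussion above compares methods by the CLEAR MOT scores [32] together with MT and ML. As a reading aid, the following is a minimal Python sketch of how these scores are commonly computed from matching results; it is not the evaluation code used for Table II. The input format (`FrameCounts`) and all function names are hypothetical, MOTP is assumed here to be the mean overlap (e.g., IoU) of matched pairs, and the 80%/20% thresholds for MT/ML follow the usual convention rather than anything stated in this article.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class FrameCounts:
    """Per-frame matching outcome (hypothetical input format, not from the article)."""
    gt: int    # ground-truth objects present in the frame
    fn: int    # misses (false negatives)
    fp: int    # false positives
    idsw: int  # identity switches


def mota(frames: Sequence[FrameCounts]) -> float:
    """MOTA = 1 - sum(FN + FP + IDSW) / sum(GT), accumulated over all frames."""
    total_gt = sum(f.gt for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(f.fn + f.fp + f.idsw for f in frames)
    return 1.0 - errors / total_gt


def motp(match_scores: Sequence[float]) -> float:
    """MOTP as the mean localization score (assumed: IoU) over all matched pairs."""
    if not match_scores:
        raise ValueError("MOTP is undefined without matches")
    return sum(match_scores) / len(match_scores)


def mt_ml(coverage: Sequence[float], hi: float = 0.8, lo: float = 0.2) -> Tuple[float, float]:
    """Fractions of ground-truth trajectories that are mostly tracked / mostly lost.

    coverage[i] is the fraction of trajectory i's lifespan covered by a matched track.
    """
    if not coverage:
        raise ValueError("no ground-truth trajectories")
    n = len(coverage)
    mostly_tracked = sum(1 for c in coverage if c >= hi) / n
    mostly_lost = sum(1 for c in coverage if c < lo) / n
    return mostly_tracked, mostly_lost


if __name__ == "__main__":
    frames = [FrameCounts(gt=10, fn=1, fp=0, idsw=0), FrameCounts(gt=10, fn=0, fp=1, idsw=1)]
    print(round(mota(frames), 3))       # 3 errors over 20 GT objects -> 0.85
    print(mt_ml([1.0, 0.9, 0.5, 0.1]))  # (0.5, 0.25)
```

Because IDS errors enter MOTA directly, the identity-preserving behavior discussed in Section V-C also shows up in this score; an MT of 100% corresponds to every ground-truth trajectory reaching the upper coverage threshold.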
VI. CONCLUSION

MC-MHT equipped with MCTChain is a novel decentralized MCMT system that ensures efficiency while providing strong security and privacy guarantees. In MC-MHT, we design a multiskip strategy to improve tracking accuracy within a single edge camera. Meanwhile, the system leverages blockchain to enhance the credibility between edge cameras. We provide extensive and rigorous experimental analysis of the effectiveness and efficiency of our method. We also evaluate MC-MHT combined with MCTChain in a real-world scene and on the CAMPUS dataset to validate its performance. In future work, we intend to explore trusted cooperation and computing among heterogeneous edge camera devices, to fully exploit edge resources, and to design more meaningful algorithms beyond MCMT.

REFERENCES

[1] M. Zhang, J. Cao, Y. Sahni, Q. Chen, S. Jiang, and L. Yang, “Blockchain-based collaborative edge intelligence for trustworthy and real-time video surveillance,” IEEE Trans. Ind. Inform., vol. 19, no. 2, pp. 1623–1633, Feb. 2023.
[2] H. Yang, J. Wen, X. Wu, L. He, and S. Mumtaz, “An efficient edge artificial intelligence multipedestrian tracking method with rank constraint,” IEEE Trans. Ind. Inform., vol. 15, no. 7, pp. 4178–4188, Jul. 2019.
[3] S. Yang, F. Teich, and M. Baum, “Network flow labeling for extended target tracking PHD filters,” IEEE Trans. Ind. Inform., vol. 15, no. 7, pp. 4164–4171, Jul. 2019.
[4] Y. Lu, X. Huang, Y. Dai, S. Maharjan, and Y. Zhang, “Blockchain and federated learning for privacy-preserved data sharing in industrial IoT,” IEEE Trans. Ind. Inform., vol. 16, no. 6, pp. 4177–4186, Jun. 2020.
[5] C. Gao, Y. Wang, Y. Han, W. Chen, and L. Zhang, “An efficient and intelligent video processing architecture for cloud-edge video streaming,” IEEE Trans. Comput., vol. 72, no. 1, pp. 264–277, Jan. 2023.
[6] A. B. Sada, M. A. Bouras, J. Ma, R. Huang, and H. Ning, “A distributed video analytics architecture based on edge-computing and federated learning,” in Proc. IEEE Int. Conf. Dependable, Auton. Sec. Comput., 2019, pp. 215–220.
[7] M. Alam et al., “Real-time smart parking systems integration in distributed ITS for smart cities,” J. Adv. Transp., vol. 2018, pp. 1–13, 2018.
[8] A. A. Ouallane, A. Bahnasse, A. Bakali, and M. Talea, “Overview of road traffic management solutions based on IoT and AI,” Procedia Comput. Sci., vol. 198, pp. 518–523, 2022.
[9] S. Guo, X. Hu, S. Guo, and X. Qiu, “Blockchain meets edge computing: A distributed and trusted authentication system,” IEEE Trans. Ind. Inform., vol. 16, no. 3, pp. 1972–1983, Mar. 2020.
[10] Z. Ma, X. Wang, J. D. Kumar, K. Haneef, H. Gao, and Z. Wang, “A blockchain-based trusted data management scheme in edge computing,” IEEE Trans. Ind. Inform., vol. 16, no. 3, pp. 2013–2021, Mar. 2020.
[11] H. Xue, D. Chen, N. Zhang, H.-N. Dai, and K. Yu, “Integration of blockchain and edge computing in Internet of Things: A survey,” Future Gener. Comput. Syst., vol. 144, pp. 307–326, Mar. 2023.
[12] Y. Wu, H.-N. Dai, and H. Wang, “Convergence of blockchain and edge computing for secure and scalable IIoT critical infrastructures in industry 4.0,” IEEE Internet Things J., vol. 8, no. 4, pp. 2300–2317, Feb. 2021.
[13] H. Sheng et al., “Near-online tracking with co-occurrence constraints in blockchain-based edge computing,” IEEE Internet Things J., vol. 8, no. 4, pp. 2193–2207, Feb. 2021.
[14] N. Wojke, A. Bewley, and D. Paulus, “Simple online and realtime tracking with a deep association metric,” in Proc. IEEE Int. Conf. Image Process., 2017, pp. 3645–3649.
[15] S. Wang, H. Sheng, D. Yang, Y. Zhang, Y. Wu, and S. Wang, “Extendable multiple nodes recurrent tracking framework with RTU,” IEEE Trans. Image Process., vol. 31, pp. 5257–5271, Jul. 2022.
[16] Y. Zhang, C. Wang, X. Wang, W. Zeng, and W. Liu, “FairMOT: On the fairness of detection and re-identification in multiple object tracking,” Int. J. Comput. Vis., vol. 129, no. 11, pp. 3069–3087, 2021.
[17] C. Kim, F. Li, A. Ciptadi, and J. M. Rehg, “Multiple hypothesis tracking revisited,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 4696–4704.
[18] Y. He, X. Wei, X. Hong, W. Shi, and Y. Gong, “Multi-target multi-camera tracking by tracklet-to-target assignment,” IEEE Trans. Image Process., vol. 29, pp. 5191–5205, Mar. 2020.
[19] N. Anjum and A. Cavallaro, “Trajectory association and fusion across partially overlapping cameras,” in Proc. IEEE Int. Conf. Adv. Video Signal Based Surveill., 2009, pp. 201–206.
[20] M. Schwager, B. J. Julian, M. Angermann, and D. Rus, “Eyes in the sky: Decentralized control for the deployment of robotic camera networks,” Proc. IEEE, vol. 99, no. 9, pp. 1541–1561, Sep. 2011.
[21] S. Yang, F. Ding, P. Li, and S. Hu, “Distributed multi-camera multi-target association for real-time tracking,” Sci. Rep., vol. 12, 2022, Art. no. 11052.
[22] L. Kong, X.-Y. Liu, H. Sheng, P. Zeng, and G. Chen, “Federated tensor mining for secure industrial Internet of Things,” IEEE Trans. Ind. Inform., vol. 16, no. 3, pp. 2144–2153, Mar. 2020.
[23] Y. Tian, T. Li, J. Xiong, J. Ma, and C. Peng, “A blockchain-based machine learning framework for edge services in IIoT,” IEEE Trans. Ind. Inform., vol. 18, no. 3, pp. 1918–1929, Mar. 2022.
[24] L. Feng, Y. Zhao, S. Guo, X. Qiu, W. Li, and P. Yu, “BAFL: A blockchain-based asynchronous federated learning framework,” IEEE Trans. Comput., vol. 71, no. 5, pp. 1092–1103, May 2022.
[25] S. Ghimire, J. Y. Choi, and B. Lee, “Using blockchain for improved video integrity verification,” IEEE Trans. Multimedia, vol. 22, no. 1, pp. 108–121, Jan. 2020.
[26] M. Liu, F. R. Yu, Y. Teng, V. C. Leung, and M. Song, “Distributed resource allocation in blockchain-based video streaming systems with mobile edge computing,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 695–708, Jan. 2019.
[27] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 779–788.
[28] Y. Zhang et al., “Long-term tracking with deep tracklet association,” IEEE Trans. Image Process., vol. 29, pp. 6694–6706, May 2020.
[29] M. Xu, Z. Zou, Y. Cheng, Q. Hu, D. Yu, and X. Cheng, “SPDL: A blockchain-enabled secure and privacy-preserving decentralized learning system,” IEEE Trans. Comput., vol. 72, no. 2, pp. 548–558, Feb. 2023.
[30] M. Castro et al., “Practical Byzantine fault tolerance,” in Proc. 3rd Symp. Operating Syst. Des. Implementation, 1999, pp. 173–186.
[31] Y. Xu, X. Liu, L. Qin, and S.-C. Zhu, “Cross-view people tracking by scene-centered spatio-temporal parsing,” in Proc. 31st AAAI Conf. Artif. Intell., 2017, pp. 4299–4305.
[32] K. Bernardin and R. Stiefelhagen, “Evaluating multiple object tracking performance: The CLEAR MOT metrics,” EURASIP J. Image Video Process., vol. 2008, pp. 1–10, 2008.
[33] Z. Zhang, J. Wu, X. Zhang, and C. Zhang, “Multi-target, multi-camera tracking by hierarchical clustering: Recent progress on DukeMTMC project,” 2017, arXiv:1712.09531.
[34] N. Jiang, S. Bai, Y. Xu, C. Xing, Z. Zhou, and W. Wu, “Online inter-camera trajectory association exploiting person re-identification and camera topology,” in Proc. ACM Int. Conf. Multimedia, 2018, pp. 1457–1465.
[35] Y. Gan, R. Han, L. Yin, W. Feng, and S. Wang, “Self-supervised multi-view multi-human association and tracking,” in Proc. ACM Int. Conf. Multimedia, 2021, pp. 282–290.
[36] C. Li, J. Li, Y. Xie, J. Nie, T. Yang, and Z. Lu, “Calibration-free cross-camera target association using interaction spatiotemporal consistency,” IEEE Trans. Multimedia, to be published, doi: 10.1109/TMM.2022.3205407.

Shuai Wang received the B.S. degree in computer science and technology from the School of Computer Science and Engineering, Beihang University, Beijing, China, in 2019, where he is currently working toward the Ph.D. degree in computer application technology. His research focuses on computer vision, and he is particularly interested in multiple object tracking. Contact him at shuai[email protected].

Hao Sheng (Member, IEEE) received the B.S. degree in computer science and technology and the Ph.D. degree in computer application from the School of Computer Science and Engineering, Beihang University, in 2003 and 2009, respectively. He is currently a Professor and Ph.D. Supervisor with the School of Computer Science and Engineering, Beihang University. His research interests include computer vision, pattern recognition, and machine learning. Contact him at [email protected].

Yang Zhang received the B.S. degree in computer science and technology and the Ph.D. degree in computer application technology from the School of Computer Science and Engineering, Beihang University, Beijing, China, in 2014 and 2020, respectively. He is currently an Associate Professor with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. His research interests include computer vision and machine learning. Contact him at [email protected].

Da Yang received the B.S. degree in computer science and technology from the School of Computer Science and Engineering, Beihang University, Beijing, China, in 2012, where he is currently working toward the Ph.D. degree. Contact him at [email protected].

Jiahao Shen received the B.S. degree in computer science and technology from the China University of Geosciences, Beijing, China, in 2019, and the master's degree in computer science from the Beijing University of Posts and Telecommunications, Beijing, China. He is currently working toward the Ph.D. degree at Beihang University, Beijing, China. His research interests include computer vision, multiple object tracking, and reinforcement learning. Contact him at jiahao[email protected].

Rongshan Chen received the B.S. degree in computer science and technology from Beihang University, Beijing, China, in 2019, where he is working toward the Ph.D. degree in computer application technology. His research interests include 3-D reconstruction, artificial intelligence, and light field disparity estimation. Contact him at rong[email protected].
