2023 Wang
Abstract: The rapid increase in the volume of video data generated at the edge of the Industrial Internet of Things opens up new possibilities for enhancing video services. Multicamera multiobject tracking (MCMT) has always been a fundamental task in video surveillance and traffic control. However, traditional MCMT methods are limited by the communication bottleneck and computation resources of the centralized curator, and suffer from security and privacy issues. In this article, we first design a multicamera multihypothesis tracking (MC-MHT) framework to achieve real-time tracking performance among edge cameras. The complex association of objects is described by multiskip trees, and the tracking task is distributed to each camera. Then, we integrate a multicamera tracking chain into MC-MHT to ensure security and trust. The state transition of targets across cameras is illustrated from the perspective of blockchain transactions. The transactions are validated by an integrated tracking consensus to counter Byzantine behavior. Numerical results derived from real-world scenarios and the CAMPUS dataset show that the proposed method achieves real-time performance (24–36 FPS) and a MOTA of 79.0–82.4, and reduces identity switch errors by about 71% under Byzantine attack.

Index Terms: Distributed multicamera multiobject tracking (MCMT), edge computing, edge intelligence, multicamera tracking chain (MCTChain).

Manuscript received 16 November 2022; revised 6 February 2023; accepted 15 March 2023. Date of publication 28 March 2023; date of current version 11 December 2023. This work was supported in part by the National Key R&D Program of China under Grant 2021YFB2104800, in part by the National Natural Science Foundation of China under Grant 61872025, in part by the Science and Technology Development Fund, Macau SAR under Grant 0001/2018/AFJ, and in part by the Open Fund of the State Key Laboratory of Software Development Environment under Grant SKLSDE-2021ZX-03. Paper no. TII-22-4714. (Shuai Wang and Yang Zhang are co-first authors.) (Corresponding author: Hao Sheng.)

Shuai Wang, Da Yang, Jiahao Shen, and Rongshan Chen are with the State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China, and also with the Beihang Hangzhou Innovation Institute Yuhang, Hangzhou 310023, China (e-mail: [email protected]; [email protected]; jiahaoplus@buaa.edu.cn; [email protected]).

Hao Sheng is with the State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China, also with the Beihang Hangzhou Innovation Institute Yuhang, Hangzhou 310023, China, and also with the Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China (e-mail: [email protected]).

Yang Zhang is with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China (e-mail: [email protected]).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/TII.2023.3261890.

Digital Object Identifier 10.1109/TII.2023.3261890

I. INTRODUCTION

The video data generated by cameras in modern cities has brought inestimable value. Multicamera multiobject tracking (MCMT), as a basic problem, has been a hot topic in past years. However, the traditional centralized MCMT paradigm cannot adapt to the current huge scale of video and suffers from a latent single point of failure.

First, centralized MCMT needs to collect the raw video from different cameras, which places strict requirements on hardware and incurs bandwidth congestion, low scalability, and high latency [1]. Second, traditional MCMT systems [2], [3] face serious privacy and security problems. On the one hand, the security and robustness of centralized computation are limited by single-point failure, which is intolerable in real industrial environments. On the other hand, data leakage may take place during data storage, transmission, and sharing, leading to serious issues for data owners and the administrator [4]. The video data contains so much private information that data owners are unwilling to upload the raw video, which conflicts with centralized computation. These dilemmas are the main obstacles to applying MCMT in real industrial environments.

Driven by the Industrial Internet of Things (IIoT), some researchers [1], [5] have studied distributed technology and edge computing to solve the aforementioned problems. Delegating workloads of the centralized server to nearer edge computing nodes is becoming the trend [6], which takes full advantage of distributed camera deployment. The raw video data no longer needs to be uploaded. Some works [7], [8] utilize distributed camera deployment and achieve remarkable success in many fields, such as traffic management. However, although distributed technology solves the extension of the MCMT system, it places strict requirements on algorithms. Meanwhile, trust among multiple cameras has become another core concern. In this regard, existing works mainly focus on local target association, without considering tolerance of single-point failure or potential attack. But in practice, a camera may crash due to the complex communication environment. In some extreme cases, the edge device may suffer malicious attacks and transfer false target information, such as features, to the MCMT system,
1551-3203 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: INDIAN INSTITUTE OF INFORMATION TECHNOLOGY. Downloaded on January 07,2024 at 08:19:50 UTC from IEEE Xplore. Restrictions apply.
370 IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. 20, NO. 1, JANUARY 2024
which results in failure to recognize targets. Therefore, how to realize trustworthy cooperation among multiple parties is a big challenge.

Emerging blockchain technology has become an important way to ensure trustworthy cooperation [9], [10]. Its decentralization, transparency, traceability, and immutability can enhance data integrity, fairness, and the authenticity of the parties involved [11]. Therefore, some works adopt blockchain to improve the security of data processing. Wu et al. [12] showed how the convergence of these two paradigms can enable security and scalability in IIoT. Sheng et al. [13] integrated blockchain into cross-camera tracking and designed STCCNet. They pay more attention to the feature representation of targets and directly adopt proof-of-work (PoW). However, the heavy resource requirements for solving mining puzzles in PoW limit the applicability of their methods. Among these works, deep integration of blockchain is still scarce, especially in consensus and ledger design.

Considering the abovementioned observations, we conclude that two challenging questions remain for MCMT regarding extension, trust and security, and efficiency: 1) how to build an extendable MCMT framework that can adapt to large-scale cameras, and 2) how to take advantage of blockchain to coordinate the decentralized tracking process while achieving a tradeoff between privacy and security as well as efficiency.

In this article, we first illustrate the problem formulation and define the global state of targets across multiple cameras. Then, multicamera multihypothesis tracking (MC-MHT) is proposed to perform tracking tasks in a distributed manner. Further, in order to ensure trust among cameras, we design the multicamera tracking chain (MCTChain). The state transition of tracks is recorded as a transaction in the blockchain ledger, while all the tracking transactions are validated by a deeply integrated consensus mechanism. MCTChain enables robust cooperation among cameras and ensures accurate tracking. The experiments on real-world scenes and the CAMPUS dataset demonstrate the efficiency and extension of our framework. Our methods can shed light on new designs of blockchain-based video processing.

The main contributions of this article are fourfold.

1) Propose an extendable MC-MHT framework to solve MCMT. Instead of aggregating all raw video data, it distributes tracking tasks to each camera, so that it can be expanded flexibly in an edge computing environment.

2) Build an MCTChain-empowered collaborative architecture to share tracking information and conduct tracking consensus over distributed multiple cameras, reducing the risk of single-point failure or Byzantine behaviors.

3) Construct a complete track model in MC-MHT and illustrate the transition of tracks from a novel transaction perspective.

4) Evaluate the effectiveness of our method in a real-world scenario in terms of network flow, energy cost, etc. We also make a quantitative comparison on the CAMPUS dataset. Both quantitative and visualized results demonstrate the efficiency of our methods.

The rest of this article is organized as follows. Related work is presented in Section II. In Section III, we describe the MC-MHT framework in detail. In Section IV, we introduce the implementation of MCTChain, including the system model, transaction, and consensus algorithm. We report the evaluation in Section V and, finally, Section VI concludes this article.

II. RELATED WORK

A. Single-Camera Multiobject Tracking (SCMT)

SCMT refers to the process of continuously identifying, locating, and maintaining the consistency of multiple targets of interest in a single camera area. In past years, tracking-by-detection has become a popular paradigm for its clear workflow. It first obtains the detections from video, then links detections into complete trajectories.

Following this paradigm, a large number of research works have emerged to solve several problems behind tracking. For example, by introducing deep features, Wojke et al. [14] designed DeepSORT to strengthen the ability to maintain object identities. Based on their work, many researchers have proposed similar DeepSORT-based methods [15], [16]. However, such methods perform unsatisfactorily in crowded scenes due to the heavy occlusion of targets. To build a robust association mechanism, Kim et al. [17] revisited multihypothesis tracking (MHT), in which each target is described as a track tree. In short, SCMT pays more attention to data association, since it only needs to consider the video from one camera.

B. Multicamera Multiobject Tracking

Compared to SCMT, MCMT focuses on holding the unique identities of different targets among several cameras. It not only requires accurate data association in each camera, but also attaches great importance to the fusion of trajectories. In addition, extension is also central to practical applications of MCMT. In recent years, we have seen increasing efforts to solve MCMT using two types of approaches: centralized and distributed methods.

The centralized paradigm often takes a two-step strategy: 1) generating tracklets of all the targets for each camera, and 2) matching tracklets that belong to the same target across all the cameras [18]. The first step is usually implemented by SCMT, because of the great progress of SCMT in computer vision and pattern recognition. As discussed previously, methods combining both appearance and motion information have recently become mainstream for multiobject tracking. For the second step, various reidentification and multiview fusion methods have been proposed to match tracks among different cameras.

With the development of edge computing, the distributed paradigm has attracted much attention for its better extension. Distributed approaches operate with no fusion centers, thus improving the scalability and potentially reducing the communication bottlenecks and hardware requirements [19]. Schwager et al. [20] proposed a decentralized development strategy for robotic cameras. Yang et al. [21] built a real-time distributed MCMT system by introducing target management and DMMA. However, such methods focus on extension but ignore latent security issues like connection failure.
WANG et al.: BLOCKCHAIN-EMPOWERED DISTRIBUTED MULTICAMERA MULTITARGET TRACKING IN EDGE COMPUTING 371
Algorithm 1: Distributed Multicamera Multiple Hypothesis Tracking.

Fig. 2. Illustration of MC-MHT.

come from other peers, each node fetches the remote tracks from the blockchain (line 2). Then, for each pair of local Init tracks and fetched tracks, the node conducts the ReID task to determine whether they belong to the same identity. Here, we use feature-level ReID (in short, measuring the similarity between two object features). The merged pairs are packed as mergence proposals, which are broadcast to the peer nodes. If a proposal is validated under the tracking consensus (discussed in Section IV-D), the id of the Init track is changed to the id of the fetched track, as shown in Algorithm 1, line 8. Here, id is the numeric label of the object identity. The state of the merged track is then transferred to InChain. Afterward, the detections are associated with tracks.

As shown in Fig. 2, MC-MHT creates a track tree for each target. In each frame, the detections are linked with the leaves of existing trees. The branches, called "hypotheses," represent the potential associations of one target. Considering that detection failure is a common problem, we define two types of branches: normal and skip. In Fig. 2, the branches ($b_1, b_2, b_3, b_4, b_5$) are normal, since they are built between adjacent frames. But as the red dotted circles show, sometimes the object cannot be matched with any detection because of a missed detection. In this case, we build a skip branch ($b_6$): the detection is linked with leaves in an earlier frame by skipping the previous frame.

Considering the limited computation resources of edge devices, we leverage motion information to prune the tree, so that the number of branches can be further reduced. According to the location and shape of the bounding box, we compute the motion offset between each pair of detection and object as follows:

$$d^{m}_{i,j} = \sqrt{\frac{(P_i - P_j)(P_i - P_j)^{T}}{3}} \qquad (3)$$

where $P = [x, y, w/h]$ is a three-dimensional vector consisting of the top-left coordinate and the width/height ratio.

Since the velocity of an object is usually stable, the motion offset between two consecutive frames should vary only slightly. Branches with large motion offsets are filtered out when $d^{m}_{i,j} > \theta_m$, where $\theta_m$ is a threshold.

The branches of each tree represent different potential trajectories of the same target. Yet, there exist many conflicting branches in this forest. One object can be assigned no more than one detection, because it is impossible for one object to appear in two positions simultaneously. Meanwhile, one detection should be assigned to no more than one leaf. Therefore, the optimal solution for this forest is to find nonconflicting branches.

The whole forest can be transformed into a graph $G = \langle V, E \rangle$, where $V$ is the vertex set representing leaves or detections, and $E = \{e = (i, j) \mid i \in V, j \in V, (i, j) \text{ is linked}\}$. The weight of each vertex is computed by balancing motion and appearance information smoothly. Due to changes in the lighting, occlusion, or pose of the object, there is some noise in the detection, which results in imprecise feature representation. Considering that the detection confidence $c$ reflects the quality of the detection, we combine motion offset and appearance similarity as follows:

$$d^{a}_{i,j} = \frac{f_{D_i} f_{T_j}^{T}}{\|f_{D_i}\| \cdot \|f_{T_j}\|} \qquad (4)$$

$$d_{i,j} = c \cdot d^{a}_{i,j} + (1 - c) \cdot d^{m}_{i,j} \qquad (5)$$

where $d^{a}_{i,j}$ is the appearance similarity between $D_i$ and $T_j$, $D_i$ is the $i$th detection in $D$, $T_j$ is the $j$th track tree in $T$, $f$ represents the appearance feature, and $d_{i,j}$ is the final vertex weight.

Thus, the optimization target of (2) in each frame can be defined as follows:

$$\text{maximize} \quad \sum_{i \in D} \sum_{j \in T} a_{i,j}\, d_{i,j}$$

$$\text{s.t.} \quad \sum_{i \in D} a_{i,j} \le 1, \quad j = 1, 2, 3, \ldots, M \qquad (6)$$

$$\sum_{j \in T} a_{i,j} \le 1, \quad i = 1, 2, 3, \ldots, N$$

where $a_{i,j} \in \{0, 1\}$ indicates whether there exists an edge $e_{i,j}$ in $G$.

This optimization problem can be solved by the Hungarian algorithm. The branches in the solution are preserved, while the others are finally pruned. The unmatched detections are regarded
Fig. 3. System model and architecture of our MCMT system, which together integrates MC-MHT and MCTChain.
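The merge-and-validate flow for Init tracks (propose a merge, collect peer validation, rename the track, move it to InChain) can be sketched as follows. The approval threshold is an assumption in the style of practical Byzantine fault tolerance, with $f = \lfloor (N-1)/3 \rfloor$ tolerated faulty peers; the tracking consensus itself is defined in Section IV-D and is not reproduced here. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class TrackState(Enum):
    INIT = "Init"
    IN_CHAIN = "InChain"


@dataclass
class Track:
    track_id: int
    state: TrackState = TrackState.INIT


@dataclass(frozen=True)
class MergeProposal:
    local_id: int   # id of the local Init track
    remote_id: int  # id of the matching track fetched from the chain


def max_faulty(n: int) -> int:
    """Largest f with n >= 3f + 1."""
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Assumed approval threshold; equals 2f + 1 when n = 3f + 1."""
    return (n + max_faulty(n)) // 2 + 1


def apply_merge(track: Track, proposal: MergeProposal, votes: list[bool]) -> bool:
    """Rename an Init track and move it to InChain if enough peers approve."""
    if track.state is not TrackState.INIT or track.track_id != proposal.local_id:
        return False
    if sum(votes) < quorum(len(votes)):
        return False
    track.track_id = proposal.remote_id
    track.state = TrackState.IN_CHAIN
    return True


track = Track(track_id=7)
proposal = MergeProposal(local_id=7, remote_id=206)
# Ten peers, three of which reject the proposal.
accepted = apply_merge(track, proposal, votes=[True] * 7 + [False] * 3)
print(accepted, track)
```

With ten peers this rule tolerates three faulty votes, which lines up with the 0%–30% Byzantine ratios used in the experiments.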
the centralized system, MCTChain not only resists single-point failure but also reduces the requirement for private raw video data. This realizes a tradeoff between accuracy and privacy.

V. EXPERIMENT

A. Implementation Details

Hardware specification and hyperparameter setting. We implement MCTChain in Python and adopt a P2P network for peer communications. In order to validate the efficiency of MCTChain, we also build a centralized MCMT system for comparison. The centralized system contains one powerful server and several cameras, while MCTChain is composed of edge devices. As shown in Table I, the server is equipped with AMD R9-5900X CPU processors, each having 12 CPU cores and 3.70 GHz frequency. The NVIDIA Jetson Nano is adopted to build edge cameras. Each of them has an ARM A57 CPU with four processors and four threads, 4 GB RAM, and a micro Maxwell GPU. The max-power in this table contains both CPU and GPU power. A total of 20 edge cameras are used in our experiments. Both systems utilize the same video camera resources. The cameras run at 1920×1080 resolution and 15 FPS. Based on experimental experience, we recommend $\theta_m = 0.3$, $N_t = 5$, and $N_l = 5 \times v_r$, where $v_r$ is the frame rate.

Dataset and metric. We conduct both qualitative and quantitative experiments on the CAMPUS [31] dataset. It provides videos collected in several scenarios by cameras with different views. According to the attributes of the subsets, we evaluate and compare our methods on two subsets. The video data, containing 15–25 pedestrians, is recorded at 30 FPS and a resolution of 1920×1080. We take the widely adopted CLEAR [32] metrics to measure tracking performance, including multiobject tracking accuracy (MOTA↑), multiobject tracking precision (MOTP↑), mostly tracked (MT↑), mostly lost (ML↓), and FPS↑. For metrics marked with ↑, higher is better, and vice versa.

We evaluate the centralized system and MCTChain in the aforementioned camera infrastructure from the following two aspects. 1) Efficiency: we measure the two systems with respect to total network flow, energy cost, and GPU utilization. 2) Accuracy: we validate the identity convergence and tracking accuracy in real-world scenarios and evaluate the tracking result on a well-known dataset with quantitative comparison. For simplicity, we denote the centralized system as "CS" in the following experiments, and "MCTChain" denotes MCTChain combined with MC-MHT.

B. Efficiency Analysis

Network flow. We first compare the network flow of CS and MCTChain in a non-Byzantine environment. For CS, all the raw video data can be correctly uploaded to the central server. For MCTChain, all the peers can correctly pass the necessary messages and are nonmalicious. As shown in Fig. 5(a), the network flow of CS is only influenced by the number of cameras and increases linearly. But for MCTChain, the network flow climbs very slowly. On the one hand, the video data no longer needs to be uploaded. On the other hand, the message size of MCTChain is very small. Although the peers in MCTChain have to send messages to each other, the total flow is still less than that of CS. Further, we observe an interesting phenomenon: the flow of MCTChain is almost stable when $N \ge 15$. The latency experiment explains this. According to the latency results, the latency $L_{BC}$ of blockchain consensus increases along with the number of cameras. Here, the network flow ($v$) is measured in KB/s. Thus, the total flow ($v \times L_{BC}$) in fact becomes higher.

Energy cost. We also compare the energy cost of the two types of systems. In each experiment, the hardware of CS is upgraded once it reaches the capacity bottleneck. For CS, the video of all cameras is processed by the central server. For MCTChain, every camera is equipped with an edge device. As shown in Fig. 6(a), the number of cameras starts from 4, since it should satisfy $N = 3f + 1$ ($f \ge 1$). We observe that the energy cost of CS is clearly higher than that of MCTChain at the beginning. This is because both the GPU and the CPU of the central server draw high power even when idle, especially the 3090Ti. However, distributed processing greatly decreases this high cost. The maximum power
TABLE I
IMPLEMENTATION SPECIFICATIONS OF HARDWARES
Fig. 6. (a)–(c) Power, memory usage, and FPS of CS and MCTChain when 4 ≤ N ≤ 20; (d) IDS error evaluation. (a) Power usage comparison. (b) Memory usage comparison. (c) FPS comparison. (d) Identity switch error evaluation.
of the edge device used is 10 W, so the total energy cost of MCTChain is far lower than that of CS. When we add more cameras to the network, the energy cost of MCTChain increases linearly, since each camera needs a new edge device. When $N = 9$, the energy cost of CS jumps to 839, since we have to add another 3090Ti to support nine videos. Finally, the energy cost of CS increases to 1591 when $N = 20$.

To measure the utilization of the GPU, we also report the GPU usage in Fig. 6(b). The GPU memory usage of CS is close to full at every fourth camera, because one 3090Ti can support four cameras at most. If $N$ is not a multiple of four, the memory usage is relatively lower. Based on this observation, we express the quantitative relation between memory usage and the number of cameras as follows:

$$r \approx \frac{k_c \times C}{k_G \times M} \qquad (7)$$

where $M$ is the maximum memory of one GPU, and $C$ is the memory requirement of one camera. $k_c$ and $k_G$ are the numbers of cameras and GPUs, respectively. For the 3090Ti, $M \approx 4 \times C$. In contrast, the memory usage of MCTChain is close to 100% all the time, because the computing resource of MCTChain is fully used while the resource of CS is often idle. The experimental results indicate that the GPU is utilized more fully in MCTChain.

Latency and time efficiency. To better illustrate the latency of each round in MCTChain, we divide a round into three stages: local tracking computation, whose total time overhead is denoted by $T_{LCT}$, tracking transaction ($T_{TX}$), and blockchain consensus ($T_{BC}$). As Fig. 5(b) shows, $T_{LCT}$ and $T_{BC}$ are of the same order of magnitude, but $T_{TX}$ is 0.01 s and can almost be ignored. $T_{LCT}$ does not grow with $N$ because the main tracking task is computed locally. On the contrary, $T_{BC}$ depends on the communication among nodes, so it increases along with $N$. When $N = 20$, the latency of blockchain consensus is close to 1 s. But local tracking does not need to wait for the consensus, since the consensus can be carried out asynchronously. In fact, the consensus only needs to be completed before the target departs from the camera area. In our experience, $T_{BC} < 5$ s is an acceptable overhead for most scenes, and we recommend that $N$ be no more than 50. Meanwhile, the result in Fig. 6(c) shows that the time efficiency of MCTChain is stable from four cameras to 20 cameras (about 24 FPS). But the time efficiency of CS fluctuates. The reason is that all computing is conducted on the server only. The computing speed decreases as GPU utilization increases, so the FPS of CS drops when new cameras join unless a new GPU is added. This comparison demonstrates the efficiency of our lightweight MC-MHT.

C. Tracking Performance Evaluation

Tracking performance evaluation. We finally visualize the tracking results in a real-world edge computing environment. In our experiments, we focus on the ability of MCTChain to keep the identities of targets across different cameras. The tracking results are shown in Fig. 7. In the beginning, each target is assigned a unique id in camera $c_1$. The targets are tracked correctly from the 200th to the 500th frame. Then three persons enter the area of $c_2$ and are recognized successfully, as the red lines show. Furthermore, they move into $c_3$ while one woman leaves $c_3$. Each of them is successfully recognized and assigned the same identity as in the previous cameras. These results show that MC-MHT can maintain identities in multicamera scenes, even in long-term tracking. In addition, MC-MHT achieves outstanding performance under serious deformation or occlusion. For example, in the 5700th frame, although the tracker faces a serious deformation of the target (206), it still tracks the target stably. Similarly, two students are tracked successfully after entering $c_4$. These results show the accuracy of our methods in a real edge computing environment.

Identity convergence in presence of Byzantine nodes. To make a quantitative comparison of robustness and security, we count the number of identity switch (IDS) errors for both CS and MCTChain under Byzantine behavior. Here, an IDS error means that a target is assigned a false identity label. In this group of experiments, $N$ is set to 10 and BR ∈ {0%, 10%, 20%, 30%}
Fig. 7. Tracking results under multicamera scenes. For simplicity, results from only four cameras are shown in this figure. The same object is drawn with the same color in all scenes.
TABLE II
TRACKING RESULTS ON THE CAMPUS DATASET. VALUES THAT HAVE NOT BEEN PUBLISHED FOR SOME METHODS ARE MARKED AS “–”
considering $f = 33\% \times N$, where the Byzantine ratio (BR) is the number of Byzantine nodes divided by $N$. Specifically, BR = 0% represents the non-Byzantine case; otherwise, a Byzantine camera randomly sends false tracking information (such as a false id). We evaluate CS simultaneously but never realize Byzantine behavior. The full results are shown in Fig. 6(d). We observe that MCTChain has a lower IDS error than CS, since the identity recognition is validated by other peers. The different curve shapes of MCTChain and CS reflect their ability to resist IDS. Compared with CS, the curve of MCTChain quickly stops climbing, because false mergence transactions fail to be validated by peers. MCTChain ensures a lower error (reduced by 71%–85%) even in the presence of 10%–30% Byzantine nodes. This clearly shows that MCTChain can still guarantee stable identity convergence and remain secure against different levels of Byzantine attack.

D. Benchmark Comparison

In order to validate the effectiveness of MC-MHT, we conduct a comparison on the CAMPUS dataset. The full results are reported in Table II. Several well-known algorithms are compared with ours.

In Garden1, HCT fails to deal with the various motion patterns of targets. STP and TRACTA cannot track the occluded targets, thus obtaining lower MOTA. CFC constructs a robust interaction mechanism among objects and so achieves an excellent MOTA of 80.4. But our method achieves higher performance on most of the indicators. The MOTA of MCTChain is 2 points higher than that of the second-best method. We attribute these results to the following reasons. On the one hand, the proposed method builds a complete association model by constructing a multiskip track tree for each target. The skip branches describe the occluded targets well during tracking. Thus, we obtain the highest MT, 100%. On the other hand, both appearance and motion information are fully exploited to conduct the fusion of trajectories. Naturally, MCTChain reaches satisfying results on Garden2 as well. Unlike the others, our method is performed in a distributed manner on each node. Even though we only use lightweight edge devices, the time efficiency of our method is higher than that of most of the others, which use powerful servers.

In addition, most of the other methods require extra camera information. For example, TRACTA needs a precalibrated homography transformation matrix, while HCT and STP depend on 3-D spatial information. Ours uses only the captured video and therefore has higher practical value.
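For readers less familiar with the indicators compared here, the sketch below shows how MOTA, MT, and ML are conventionally computed from aggregate counts. The formula $\text{MOTA} = 1 - (\text{FN} + \text{FP} + \text{IDSW}) / \text{GT}$ and the 80% / 20% coverage cut-offs are the usual CLEAR MOT conventions rather than details stated in this article, and the counts in the example are made up for illustration.

```python
def mota(false_negatives, false_positives, id_switches, ground_truth):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if ground_truth <= 0:
        raise ValueError("ground_truth must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / ground_truth


def mostly_tracked_lost(coverage):
    """Return (MT, ML) as fractions of ground-truth trajectories.

    coverage holds, per ground-truth trajectory, the share of its frames that
    the tracker covered. The 80% / 20% cut-offs are the conventional ones.
    """
    n = len(coverage)
    mt = sum(c >= 0.8 for c in coverage) / n
    ml = sum(c <= 0.2 for c in coverage) / n
    return mt, ml


# Hypothetical counts: 1000 ground-truth boxes, 120 misses,
# 50 false alarms, and 10 identity switches.
score = mota(120, 50, 10, 1000)
mt, ml = mostly_tracked_lost([0.95, 0.85, 0.5, 0.1])
```

Multiplying `score` by 100 gives the 0–100 scale on which MOTA values such as 80.4 are reported. Because identity switches enter the numerator directly, the IDS reduction reported for the Byzantine experiments also feeds into MOTA.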
VI. CONCLUSION

MC-MHT equipped with MCTChain is a novel decentralized MCMT system that ensures efficiency while providing strong security and privacy guarantees. In MC-MHT, we design a multiskip strategy to improve the tracking accuracy within a single edge camera. Meanwhile, the system leverages blockchain to enhance the credibility between edge cameras. We provide an extensive and rigorous experimental analysis of the effectiveness and efficiency of our methods. We also conduct an evaluation on a real-world scene and the CAMPUS dataset to validate the performance of MC-MHT combined with MCTChain. In future work, we intend to explore how to realize trusted cooperation and computing among heterogeneous edge camera devices. We want to fully exploit edge resources and design more meaningful algorithms beyond MCMT.

REFERENCES

[1] M. Zhang, J. Cao, Y. Sahni, Q. Chen, S. Jiang, and L. Yang, “Blockchain-based collaborative edge intelligence for trustworthy and real-time video surveillance,” IEEE Trans. Ind. Inform., vol. 19, no. 2, pp. 1623–1633, Feb. 2023.
[2] H. Yang, J. Wen, X. Wu, L. He, and S. Mumtaz, “An efficient edge artificial intelligence multipedestrian tracking method with rank constraint,” IEEE Trans. Ind. Inform., vol. 15, no. 7, pp. 4178–4188, Jul. 2019.
[3] S. Yang, F. Teich, and M. Baum, “Network flow labeling for extended target tracking PHD filters,” IEEE Trans. Ind. Inform., vol. 15, no. 7, pp. 4164–4171, Jul. 2019.
[4] Y. Lu, X. Huang, Y. Dai, S. Maharjan, and Y. Zhang, “Blockchain and federated learning for privacy-preserved data sharing in industrial IoT,” IEEE Trans. Ind. Inform., vol. 16, no. 6, pp. 4177–4186, Jun. 2020.
[5] C. Gao, Y. Wang, Y. Han, W. Chen, and L. Zhang, “An efficient and intelligent video processing architecture for cloud-edge video streaming,” IEEE Trans. Comput., vol. 72, no. 1, pp. 264–277, Jan. 2023.
[6] A. B. Sada, M. A. Bouras, J. Ma, R. Huang, and H. Ning, “A distributed video analytics architecture based on edge-computing and federated learning,” in Proc. IEEE Int. Conf. Dependable, Auton. Sec. Comput., 2019, pp. 215–220.
[7] M. Alam et al., “Real-time smart parking systems integration in distributed ITS for smart cities,” J. Adv. Transp., vol. 2018, pp. 1–13, 2018.
[8] A. A. Ouallane, A. Bahnasse, A. Bakali, and M. Talea, “Overview of road traffic management solutions based on IoT and AI,” Procedia Comput. Sci., vol. 198, pp. 518–523, 2022.
[9] S. Guo, X. Hu, S. Guo, and X. Qiu, “Blockchain meets edge computing: A distributed and trusted authentication system,” IEEE Trans. Ind. Inform., vol. 16, no. 3, pp. 1972–1983, Mar. 2020.
[10] Z. Ma, X. Wang, J. D. Kumar, K. Haneef, H. Gao, and Z. Wang, “A blockchain-based trusted data management scheme in edge computing,” IEEE Trans. Ind. Inform., vol. 16, no. 3, pp. 2013–2021, Mar. 2020.
[11] H. Xue, D. Chen, N. Zhang, H.-N. Dai, and K. Yu, “Integration of blockchain and edge computing in Internet of Things: A survey,” Future Gener. Comput. Syst., vol. 144, pp. 307–326, Mar. 2023.
[12] Y. Wu, H.-N. Dai, and H. Wang, “Convergence of blockchain and edge computing for secure and scalable IIoT critical infrastructures in industry 4.0,” IEEE Internet Things J., vol. 8, no. 4, pp. 2300–2317, Feb. 2021.
[13] H. Sheng et al., “Near-online tracking with co-occurrence constraints in blockchain-based edge computing,” IEEE Internet Things J., vol. 8, no. 4, pp. 2193–2207, Feb. 2021.
[14] N. Wojke, A. Bewley, and D. Paulus, “Simple online and realtime tracking with a deep association metric,” in Proc. IEEE Int. Conf. Image Process., 2017, pp. 3645–3649.
[15] S. Wang, H. Sheng, D. Yang, Y. Zhang, Y. Wu, and S. Wang, “Extendable multiple nodes recurrent tracking framework with RTU,” IEEE Trans. Image Process., vol. 31, pp. 5257–5271, Jul. 2022.
[16] Y. Zhang, C. Wang, X. Wang, W. Zeng, and W. Liu, “FairMOT: On the fairness of detection and re-identification in multiple object tracking,” Int. J. Comput. Vis., vol. 129, no. 11, pp. 3069–3087, 2021.
[17] C. Kim, F. Li, A. Ciptadi, and J. M. Rehg, “Multiple hypothesis tracking revisited,” in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 4696–4704.
[18] Y. He, X. Wei, X. Hong, W. Shi, and Y. Gong, “Multi-target multi-camera tracking by tracklet-to-target assignment,” IEEE Trans. Image Process., vol. 29, pp. 5191–5205, Mar. 2020.
[19] N. Anjum and A. Cavallaro, “Trajectory association and fusion across partially overlapping cameras,” in Proc. IEEE Int. Conf. Adv. Video Signal Based Surveill., 2009, pp. 201–206.
[20] M. Schwager, B. J. Julian, M. Angermann, and D. Rus, “Eyes in the sky: Decentralized control for the deployment of robotic camera networks,” Proc. IEEE, vol. 99, no. 9, pp. 1541–1561, Sep. 2011.
[21] S. Yang, F. Ding, P. Li, and S. Hu, “Distributed multi-camera multi-target association for real-time tracking,” Sci. Rep., vol. 12, 2022, Art. no. 11052.
[22] L. Kong, X.-Y. Liu, H. Sheng, P. Zeng, and G. Chen, “Federated tensor mining for secure industrial Internet of Things,” IEEE Trans. Ind. Inform., vol. 16, no. 3, pp. 2144–2153, Mar. 2020.
[23] Y. Tian, T. Li, J. Xiong, J. Ma, and C. Peng, “A blockchain-based machine learning framework for edge services in IIoT,” IEEE Trans. Ind. Inform., vol. 18, no. 3, pp. 1918–1929, Mar. 2022.
[24] L. Feng, Y. Zhao, S. Guo, X. Qiu, W. Li, and P. Yu, “BAFL: A blockchain-based asynchronous federated learning framework,” IEEE Trans. Comput., vol. 71, no. 5, pp. 1092–1103, May 2022.
[25] S. Ghimire, J. Y. Choi, and B. Lee, “Using blockchain for improved video integrity verification,” IEEE Trans. Multimedia, vol. 22, no. 1, pp. 108–121, Jan. 2020.
[26] M. Liu, F. R. Yu, Y. Teng, V. C. Leung, and M. Song, “Distributed resource allocation in blockchain-based video streaming systems with mobile edge computing,” IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 695–708, Jan. 2019.
[27] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 779–788.
[28] Y. Zhang et al., “Long-term tracking with deep tracklet association,” IEEE Trans. Image Process., vol. 29, pp. 6694–6706, May 2020.
[29] M. Xu, Z. Zou, Y. Cheng, Q. Hu, D. Yu, and X. Cheng, “SPDL: A blockchain-enabled secure and privacy-preserving decentralized learning system,” IEEE Trans. Comput., vol. 72, no. 2, pp. 548–558, Feb. 2023.
[30] M. Castro et al., “Practical byzantine fault tolerance,” in Proc. 3rd Symp. Operating Syst. Des. Implementation, 1999, pp. 173–186.
[31] Y. Xu, X. Liu, L. Qin, and S.-C. Zhu, “Cross-view people tracking by scene-centered spatio-temporal parsing,” in Proc. 31st AAAI Conf. Artif. Intell., 2017, pp. 4299–4305.
[32] K. Bernardin and R. Stiefelhagen, “Evaluating multiple object tracking performance: The CLEAR MOT metrics,” EURASIP J. Image Video Process., vol. 2008, pp. 1–10, 2008.
[33] Z. Zhang, J. Wu, X. Zhang, and C. Zhang, “Multi-target, multi-camera tracking by hierarchical clustering: Recent progress on dukemtmc project,” 2017, arXiv:1712.09531.
[34] N. Jiang, S. Bai, Y. Xu, C. Xing, Z. Zhou, and W. Wu, “Online inter-camera trajectory association exploiting person re-identification and camera topology,” in Proc. ACM Int. Conf. Multimedia, 2018, pp. 1457–1465.
[35] Y. Gan, R. Han, L. Yin, W. Feng, and S. Wang, “Self-supervised multi-view multi-human association and tracking,” in Proc. ACM Int. Conf. Multimedia, 2021, pp. 282–290.
[36] C. Li, J. Li, Y. Xie, J. Nie, T. Yang, and Z. Lu, “Calibration-free cross-camera target association using interaction spatiotemporal consistency,” IEEE Trans. Multimedia, to be published, doi: 10.1109/TMM.2022.3205407.

Shuai Wang received the B.S. degree in computer science and technology from the School of Computer Science and Engineering, Beihang University, Beijing, China, in 2019, where he is currently working toward the Ph.D. degree in computer application technology.
His research focuses on computer vision, and he is particularly interested in multiple object tracking. Contact him at shuai-[email protected].
WANG et al.: BLOCKCHAIN-EMPOWERED DISTRIBUTED MULTICAMERA MULTITARGET TRACKING IN EDGE COMPUTING 379
Hao Sheng (Member, IEEE) received the B.S. degree in computer science and technology and the Ph.D. degree in computer application from the School of Computer Science and Engineering, Beihang University, in 2003 and 2009, respectively.
He is currently a Professor and Ph.D. Supervisor with the School of Computer Science and Engineering, Beihang University. His research interests include computer vision, pattern recognition, and machine learning. Contact him at [email protected].

Jiahao Shen received the B.S. degree in computer science and technology from the China University of Geosciences, Beijing, China, in 2019, and the master’s degree in computer science from the Beijing University of Posts and Telecommunications, Beijing, China. He is currently working toward the Ph.D. degree at Beihang University, Beijing, China.
His research interests include computer vision, multiple object tracking, and reinforcement learning. Contact him at jiahao-[email protected].