
Compressed Sensing based Detection Schemes for Differential Spatial Modulation in Visible Light Communication Systems

Authors: Zichun Shi, Pu Miao, Peng Chen, Lei Xue, Li-Yang Zheng, Laiyuan Wang, Gaojie Chen

Abstract: Differential spatial modulation (DSM) exploits the time dimension to enable differential modulation, which avoids the challenge of acquiring the heavily entangled channel state information of visible light communication (VLC) systems. However, DSM has a huge search space and high complexity when the number of transmitters is large. In this paper, a novel vector correction (VC)-based orthogonal matching pursuit (OMP) detection algorithm is proposed to reduce the complexity. It exploits the sparsity and relativity of all transmitters and then employs a novel correction criterion that corrects the index vectors of erroneous estimates to improve the demodulation performance. To overcome the local-optimum dilemma in the atom search, an OMP-assisted genetic algorithm is also proposed to further improve the bit error rate (BER) performance of the VLC-DSM system. Simulation results demonstrate that the proposed schemes reduce the computational complexity by at least 62.5% while achieving excellent BER performance compared with the traditional maximum likelihood based receiver.
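As a generic illustration of the OMP step that the VC-OMP detector builds on (not the authors' algorithm; their vector-correction criterion is not specified in the abstract), a minimal pure-Python OMP sketch follows. The dictionary `A` (a list of column atoms), observation `y`, sparsity `k`, and the helper names are hypothetical.

```python
def solve_least_squares(cols, y):
    """Solve min ||sum_j x_j * cols[j] - y||_2 via the normal equations G x = b."""
    n = len(cols)
    # Gram matrix G[i][j] = <cols[i], cols[j]> and right-hand side b[i] = <cols[i], y>
    G = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    b = [sum(a * v for a, v in zip(ci, y)) for ci in cols]
    # Gaussian elimination with partial pivoting on the augmented matrix [G | b]
    M = [row[:] + [bi] for row, bi in zip(G, b)]
    for p in range(n):
        piv = max(range(p, n), key=lambda r: abs(M[r][p]))
        M[p], M[piv] = M[piv], M[p]
        for r in range(p + 1, n):
            f = M[r][p] / M[p][p]
            for c in range(p, n + 1):
                M[r][c] -= f * M[p][c]
    # Back substitution
    x = [0.0] * n
    for p in reversed(range(n)):
        x[p] = (M[p][n] - sum(M[p][c] * x[c] for c in range(p + 1, n))) / M[p][p]
    return x


def omp(A, y, k):
    """Greedy OMP: A is a list of columns (atoms), y the observation, k the sparsity."""
    residual = list(y)
    support, coeffs = [], []
    for _ in range(k):
        # Select the atom most correlated with the current residual
        scores = [abs(sum(a * r for a, r in zip(col, residual))) for col in A]
        for j in support:
            scores[j] = -1.0  # never reselect an atom already in the support
        best = max(range(len(A)), key=lambda j: scores[j])
        support.append(best)
        # Orthogonal step: re-fit all selected atoms jointly by least squares
        coeffs = solve_least_squares([A[j] for j in support], y)
        approx = [sum(c * A[j][i] for c, j in zip(coeffs, support)) for i in range(len(y))]
        residual = [yi - ai for yi, ai in zip(y, approx)]
    return support, coeffs
```

The joint least-squares re-fit is what distinguishes OMP from plain matching pursuit: after each selection the residual is orthogonal to every chosen atom, so no atom is picked twice.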

Submitted 10 September, 2024; originally announced September 2024.

Comments: This paper has been accepted by 2024 IEEE 24th International Conference on Communication Technology (ICCT 2024)

arXiv:2409.06197 [pdf, other]

UdeerLID+: Integrating LiDAR, Image, and Relative Depth with Semi-Supervised

Authors: Tao Ni, Xin Zhan, Tao Luo, Wenbin Liu, Zhan Shi, JunBo Chen

Abstract: Road segmentation is a critical task for autonomous driving systems, requiring accurate and robust methods to classify road surfaces from various environmental data. Our work introduces an innovative approach that integrates LiDAR point cloud data, visual images, and relative depth maps derived from images. The integration of multiple data sources in road segmentation presents both opportunities and challenges. One of the primary challenges is the scarcity of large-scale, accurately labeled datasets needed to train robust deep learning models. To address this, we have developed the UdeerLID+ framework under a semi-supervised learning paradigm. Experimental results on the KITTI datasets validate its superior performance.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.03319 [pdf, other]

Semantic Communication for Efficient Point Cloud Transmission

Authors: Shangzhuo Xie, Qianqian Yang, Yuyi Sun, Tianxiao Han, Zhaohui Yang, Zhiguo Shi

Abstract: As three-dimensional acquisition technologies like LiDAR cameras advance, the need for efficient transmission of 3D point clouds is becoming increasingly important. In this paper, we present a novel semantic communication (SemCom) approach for efficient 3D point cloud transmission. Unlike existing methods that rely on downsampling and feature extraction for compression, our approach uses a parallel structure to separately extract global and local information from point clouds. The system is composed of five key components: a local semantic encoder, a global semantic encoder, a channel encoder, a channel decoder, and a semantic decoder. Our numerical results indicate that this approach surpasses both the traditional Octree compression method and alternative deep learning-based strategies in terms of reconstruction quality. Moreover, our system achieves high-quality point cloud reconstruction under adverse channel conditions, maintaining a reconstruction quality of over 37 dB even with severe channel noise.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2409.02008 [pdf, other]

When Digital Twin Meets 6G: Concepts, Obstacles, and Research Prospects

Authors: Wenshuai Liu, Yaru Fu, Zheng Shi, Hong Wang

Abstract: The convergence of digital twin technology and the emerging 6G network presents both challenges and numerous research opportunities. This article explores the potential synergies between digital twin and 6G, highlighting the key challenges and proposing fundamental principles for their integration. We discuss the unique requirements and capabilities of digital twin in the context of 6G networks, such as sustainable deployment, real-time synchronization, seamless migration, predictive analytics, and closed-loop control. Furthermore, we identify research opportunities for leveraging digital twin and artificial intelligence to enhance various aspects of 6G, including network optimization, resource allocation, security, and intelligent service provisioning. This article aims to stimulate further research and innovation at the intersection of digital twin and 6G, paving the way for transformative applications and services in the future.

Submitted 3 September, 2024; originally announced September 2024.

Comments: 7 pages, 6 figures

arXiv:2409.01048 [pdf, ps, other]

Boundedness of discounted tree sums

Authors: Elie Aïdékon, Yueyun Hu, Zhan Shi

Abstract: Let $(V(u),\, u\in \mathcal{T})$ be a (supercritical) branching random walk and $(\eta_u,\,u\in \mathcal{T})$ be marks on the vertices of the tree, distributed in an i.i.d.\ fashion. Following Aldous and Bandyopadhyay \cite{AB05}, for each infinite ray $\xi$ of the tree, we associate the {\it discounted tree sum} $D(\xi)=\sum_{u\in\xi} e^{-V(u)}\eta_u$, the sum of the $e^{-V(u)}\eta_u$ taken along the ray. The paper deals with the finiteness of $\sup_\xi D(\xi)$. To this end, we study the extreme behaviour of the local time processes of the paths $(V(u),\,u\in \xi)$. This answers a question of Nicolas Curien and partially solves Open Problem 31 of Aldous and Bandyopadhyay \cite{AB05}. We also present several open questions.
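To make the definition of $D(\xi)$ concrete, here is a minimal sketch that computes the largest discounted sum over all root-to-leaf paths of a finite tree. The nested-dict encoding (a `step` displacement and a `mark` $\eta_u$ per vertex) and the convention that the root's own step gives $V(\text{root})$ are assumptions; the paper concerns infinite rays of a supercritical tree, which a finite truncation only approximates.

```python
import math


def max_discounted_sum(node, v_parent=0.0):
    """Largest sum of exp(-V(u)) * eta_u over root-to-leaf paths of a finite tree.

    node: {"step": float, "mark": float, "children": [...]} (hypothetical encoding).
    V(u) is the parent's position plus this vertex's step.
    """
    v = v_parent + node["step"]
    own = math.exp(-v) * node["mark"]  # contribution e^{-V(u)} * eta_u of this vertex
    children = node.get("children", [])
    if not children:
        return own
    # Best continuation among the subtrees (finite analogue of sup over rays)
    return own + max(max_discounted_sum(c, v) for c in children)
```

In the infinite setting the question is whether this supremum stays finite; on a finite tree it always is, so the sketch only illustrates how the discount $e^{-V(u)}$ weights the marks.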

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2409.01035 [pdf, other]

Unleashing the Power of Task-Specific Directions in Parameter Efficient Fine-tuning

Authors: Chongjie Si, Zhiyi Shi, Shifan Zhang, Xiaokang Yang, Hanspeter Pfister, Wei Shen

Abstract: Large language models demonstrate impressive performance on downstream tasks, yet fully fine-tuning all of their parameters requires extensive resources. To mitigate this, Parameter Efficient Fine-Tuning (PEFT) strategies, such as LoRA, have been developed. In this paper, we delve into the concept of task-specific directions, which are critical for transitioning large models from pre-trained states to task-specific enhancements in PEFT. We propose a framework to clearly define these directions and explore their properties and the practical challenges of utilizing them. We then introduce a novel approach, LoRA-Dash, which aims to maximize the impact of task-specific directions during fine-tuning, thereby enhancing model performance on targeted tasks. Extensive experiments demonstrate the effectiveness of LoRA-Dash, and in-depth analyses further reveal its underlying mechanisms. The code is available at https://github.com/Chongjie-Si/Subspace-Tuning.
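For context on the LoRA baseline that LoRA-Dash builds on, a minimal sketch of a LoRA-adapted linear layer: a frozen weight $W$ plus a trainable low-rank update $\frac{\alpha}{r}BA$. This shows plain LoRA only; how LoRA-Dash identifies and amplifies task-specific directions is not described in the abstract and is not modeled. Function names are hypothetical.

```python
def matmul(X, Y):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_weight(W, B, A, alpha):
    """Effective weight W + (alpha / r) * B @ A, with B: d_out x r and A: r x d_in."""
    r = len(A)
    delta = matmul(B, A)  # rank-r update; W itself stays frozen
    scale = alpha / r
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


def lora_forward(W, B, A, alpha, x):
    """Apply the adapted layer to an input vector x (list of d_in floats)."""
    W_eff = lora_weight(W, B, A, alpha)
    return [sum(w * xi for w, xi in zip(row, x)) for row in W_eff]
```

LoRA initializes $B$ to zeros, so the adapted layer starts out identical to the pre-trained one, and only the $r(d_{\text{in}} + d_{\text{out}})$ entries of $A$ and $B$ are trained.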

Submitted 2 September, 2024; originally announced September 2024.

Comments: Revisions ongoing. Code is available at https://github.com/Chongjie-Si/Subspace-Tuning

arXiv:2409.00130 [pdf]

Mirror contrastive loss based sliding window transformer for subject-independent motor imagery based EEG signal recognition

Authors: Jing Luo, Qi Mao, Weiwei Shi, Zhenghao Shi, Xiaofan Wang, Xiaofeng Lu, Xinhong Hei

Abstract: While deep learning models have been extensively used in motor imagery based EEG signal recognition, they often operate as black boxes. Motivated by neurological findings that the mental imagery of left- or right-hand movement induces event-related desynchronization (ERD) in the contralateral sensorimotor area of the brain, we propose a Mirror Contrastive Loss based Sliding Window Transformer (MCL-SWT) to enhance subject-independent motor imagery-based EEG signal recognition. Specifically, our proposed mirror contrastive loss enhances sensitivity to the spatial location of ERD by contrasting the original EEG signals with their mirror counterparts, i.e., mirror EEG signals generated by interchanging the channels of the left and right hemispheres. Moreover, we introduce a temporal sliding window transformer that computes self-attention scores from high-temporal-resolution features, thereby improving model performance with manageable computational complexity. We evaluate the performance of MCL-SWT on subject-independent motor imagery EEG signal recognition tasks, and our experimental results demonstrate that MCL-SWT achieves accuracies of 66.48% and 75.62%, surpassing the state-of-the-art (SOTA) model by 2.82% and 2.17%, respectively. Furthermore, ablation experiments confirm the effectiveness of the proposed mirror contrastive loss. A code demo of MCL-SWT is available at https://github.com/roniusLuo/MCL_SWT.
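The mirroring operation described above can be sketched as a channel permutation that swaps each left-hemisphere electrode with its right-hemisphere counterpart. The 10-20 channel pairs in `MIRROR_PAIRS` and the dict-of-lists layout are assumptions for illustration, not the authors' montage; unpaired (e.g. midline) channels such as Cz map to themselves.

```python
# Hypothetical left/right electrode pairs from the 10-20 system
MIRROR_PAIRS = [("C3", "C4"), ("FC3", "FC4"), ("CP3", "CP4"), ("C5", "C6")]


def mirror_eeg(signals):
    """Return a copy of `signals` (channel name -> samples) with hemispheres swapped."""
    swap = {}
    for left, right in MIRROR_PAIRS:
        swap[left] = right
        swap[right] = left
    mirrored = {}
    for ch in signals:
        # Channel `ch` of the mirrored recording takes the samples of its counterpart
        src = swap.get(ch, ch)  # unpaired channels stay in place
        if src not in signals:
            raise ValueError(f"missing mirror channel {src} for {ch}")
        mirrored[ch] = list(signals[src])
    return mirrored
```

Because ERD is contralateral, mirroring a left-hand trial should produce a pattern resembling a right-hand trial, which is the contrast the loss exploits. Mirroring twice returns the original recording.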

Submitted 29 August, 2024; originally announced September 2024.

Comments: This paper has been accepted by the Fourth International Workshop on Human Brain and Artificial Intelligence, joint workshop of the 33rd International Joint Conference on Artificial Intelligence, Jeju Island, South Korea, from August 3rd to August 9th, 2024

arXiv:2408.16634 [pdf, other]

RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model

Authors: Zhuan Shi, Jing Yan, Xiaoli Tang, Lingjuan Lyu, Boi Faltings

Abstract: The increasing sophistication of text-to-image generative models has led to complex challenges in defining and enforcing copyright infringement criteria and protection. Existing methods, such as watermarking and dataset deduplication, fail to provide comprehensive solutions due to the lack of standardized metrics and the inherent complexity of addressing copyright infringement in diffusion models. To deal with these challenges, we propose a Reinforcement Learning-based Copyright Protection (RLCP) method for text-to-image diffusion models, which minimizes the generation of copyright-infringing content while maintaining the quality of the model-generated dataset. Our approach begins with a novel copyright metric grounded in copyright law and court precedents on infringement. We then use the Denoising Diffusion Policy Optimization (DDPO) framework to guide the model through a multi-step decision-making process, optimizing it with a reward function that incorporates our proposed copyright metric. Additionally, we employ KL divergence as a regularization term to mitigate some failure modes and stabilize RL fine-tuning. Experiments conducted on three mixed datasets of copyright and non-copyright images demonstrate that our approach significantly reduces copyright infringement risk while maintaining image quality.
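As a generic sketch of KL regularization in RL fine-tuning (not RLCP's exact objective, which the abstract does not give), the task reward can be penalized by $\beta\,\mathrm{KL}(p\,\|\,q)$ between the fine-tuned policy and a frozen reference model. Here the distributions are small discrete probability lists and `beta` is a hypothetical coefficient; in DDPO-style training the KL term would instead be computed between per-step denoising distributions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as equal-length probability lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total


def regularized_reward(reward, policy_probs, reference_probs, beta=0.1):
    """Task reward minus a KL penalty that keeps the policy close to the reference model."""
    return reward - beta * kl_divergence(policy_probs, reference_probs)
```

A larger `beta` keeps generations closer to the original model (protecting image quality) at the cost of a weaker push toward the reward.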

Submitted 2 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

Comments: arXiv admin note: text overlap with arXiv:2403.12052 by other authors

arXiv:2408.16243 [pdf, ps, other]

Error analysis of finite element method for nonlocal diffusion model

Authors: Zuoqiang Shi

Abstract: We analyze the error of the finite element method for a nonlocal diffusion model, covering both conforming and nonconforming methods. We also consider meshes with and without shape regularity. For shape-regular meshes, the finite element method for the nonlocal diffusion model is asymptotic preserving and the error is $O(h^k+\delta)$. For shape-irregular meshes, the error becomes $O\left(\frac{h^{k+1}}{\delta}+\delta\right)$.
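A short calculation (ours, not stated in the abstract) shows what the shape-irregular bound implies if the horizon $\delta$ can be coupled to the mesh size $h$: balancing the two terms gives the best rate this bound allows.

```latex
f(\delta) = \frac{h^{k+1}}{\delta} + \delta, \qquad
f'(\delta) = -\frac{h^{k+1}}{\delta^{2}} + 1 = 0
\;\Longrightarrow\; \delta^{*} = h^{(k+1)/2},
\qquad f(\delta^{*}) = 2\,h^{(k+1)/2}.
```

Under that coupling the shape-irregular error is $O(h^{(k+1)/2})$, whereas on shape-regular meshes the $h^k$ term carries no $1/\delta$ amplification.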

Submitted 2 September, 2024; v1 submitted 28 August, 2024; originally announced August 2024.

arXiv:2408.15821 [pdf, other]

Higher-dimensional quantum Oppenheimer-Snyder model

Authors: Zijian Shi, Xiangdong Zhang, Yongge Ma

Abstract: The quantum Oppenheimer-Snyder model for higher-dimensional spacetimes is studied. The higher-dimensional quantum-corrected Schwarzschild black hole is obtained via the junction condition. It turns out that quantum bounces always occur in the collapse, so that the classical gravitational collapse singularities are avoided. Scalar perturbations of the quantum-corrected black holes are also studied. It turns out that the quantum corrections increase the oscillation frequency in lower dimensions and decrease it in higher dimensions. Moreover, the thermodynamic laws of the quantum-corrected black holes imply that, in contrast to the classical situation, the Hawking temperature of the quantum-corrected black hole decreases as the mass decreases. The behaviour of the heat capacity indicates that quantum corrections introduce an extra phase transition of the black holes.

Submitted 28 August, 2024; originally announced August 2024.

Comments: 15 pages, 15 figures

arXiv:2408.15472 [pdf, other]

On the implementation of linear finite element method for nonlocal diffusion model over 2D domain

Authors: Zuoqiang Shi

Abstract: We propose an implementation of the linear finite element method for the nonlocal diffusion problem in 2D space. In the implementation, we reduce the integral from 4D to 2D, which simplifies the computation significantly.

Submitted 27 August, 2024; originally announced August 2024.

arXiv:2408.14786 [pdf]

doi 10.3847/1538-4357/ad5af8

Pulsar Population Synthesis with Magnetorotational Evolution: Constraining the Decay of Magnetic field

Authors: Zhihong Shi, C. -Y. Ng

Abstract: We present a population synthesis model for normal radio pulsars in the Galaxy incorporating the latest developments in the field and the magnetorotational evolution processes. Our model considers spin-down with a force-free magnetosphere and the decay of both the magnetic field strength and its inclination angle. The simulated pulsar population is fit, using the Markov Chain Monte Carlo technique, to a large observational sample that covers the majority of radio surveys. We compare the distributions of four major observables: spin period ($P$), spin-down rate ($\dot{P}$), dispersion measure, and radio flux density, using accurate high-dimensional Kolmogorov-Smirnov statistics. We test two B-field decay scenarios: an exponential model motivated by ohmic dissipation and a power-law model motivated by the Hall effect. The former clearly provides a better fit, and it can successfully reproduce the observed pulsar distributions with a decay timescale of $8.3_{-3.0}^{+3.9}$ Myr. The result suggests significant B-field decay in aged pulsars, with ohmic dissipation possibly being the dominant process.
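The two decay scenarios can be sketched as simple field-evolution laws. The exponential form $B(t)=B_0 e^{-t/\tau}$ is the standard ohmic-decay model, with $\tau$ defaulting to the abstract's best-fit 8.3 Myr; the power-law form $B(t)=B_0(1+t/\tau)^{-\alpha}$ and its parameters are a hypothetical stand-in, since the abstract does not give the exact Hall-effect model.

```python
import math

TAU_OHMIC_MYR = 8.3  # best-fit exponential decay timescale from the abstract (Myr)


def b_exponential(b0, t_myr, tau_myr=TAU_OHMIC_MYR):
    """Ohmic-dissipation-style decay: B(t) = B0 * exp(-t / tau)."""
    return b0 * math.exp(-t_myr / tau_myr)


def b_power_law(b0, t_myr, tau_myr, alpha):
    """Hypothetical power-law decay: B(t) = B0 * (1 + t / tau) ** (-alpha)."""
    return b0 * (1.0 + t_myr / tau_myr) ** (-alpha)
```

With the exponential law, a $10^{12}$ G field falls by a factor of $e$ after one timescale (about 8.3 Myr), while a power law of the same $\tau$ decays more slowly at late times.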

Submitted 3 September, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

Comments: 17 pages, 10 figures. Published in ApJ

arXiv:2408.14475 [pdf, other]

Crowdsense Roadside Parking Spaces with Dynamic Gap Reduction Algorithm

Authors: Wenjun Zheng, Zhan Shi, Qianyu Ou, Ruizhi Liao

Abstract: In the context of smart city development, mobile sensing emerges as a cost-effective alternative to fixed sensing for on-street parking detection. However, its practicality is often challenged by the inherent accuracy limitations arising from detection intervals. This paper introduces a novel Dynamic Gap Reduction Algorithm (DGRA), a crowdsensing-based approach that addresses this problem using parking detection data collected by sensors on moving vehicles. The algorithm's efficacy is validated through real drive tests and simulations. We also present a Driver-Side and Traffic-Based Model (DSTBM), which incorporates drivers' parking decisions and traffic conditions to evaluate DGRA's performance. Results highlight DGRA's significant potential in reducing the mobile sensing accuracy gap, marking a step forward in efficient urban parking management.

Submitted 10 August, 2024; originally announced August 2024.

arXiv:2408.13233 [pdf, ps, other]

Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time

Authors: Yingyu Liang, Zhizhou Sha, Zhenmei Shi, Zhao Song, Yufa Zhou

Abstract: The quadratic computational complexity of the self-attention mechanism in popular transformer architectures poses significant challenges for training and inference, particularly in terms of efficiency and memory requirements. Towards addressing these challenges, this paper introduces a novel fast computation method for gradient calculation in multi-layer transformer models. Our approach enables the computation of gradients for the entire multi-layer transformer model in almost linear time $n^{1+o(1)}$, where $n$ is the input sequence length. This significantly reduces the computational bottleneck associated with the traditional quadratic time complexity. Our theory holds for any loss function and maintains a bounded approximation error across the entire model. Furthermore, our analysis holds when the multi-layer transformer model contains many practical sub-modules, such as residual connections, causal masks, and multi-head attention. By improving the efficiency of gradient computation in large language models, we hope that our work, based on these theoretical results, will facilitate more effective training and deployment of long-context language models.

Submitted 23 August, 2024; originally announced August 2024.

arXiv:2408.12317 [pdf, other]

Adapt CLIP as Aggregation Instructor for Image Dehazing

Authors: Xiaozhe Zhang, Fengying Xie, Haidong Ding, Linpeng Pan, Zhenwei Shi

Abstract: Most dehazing methods suffer from a limited receptive field and do not exploit the rich semantic prior encapsulated in vision-language models, which have proven effective in downstream tasks. In this paper, we introduce CLIPHaze, a pioneering hybrid framework that synergizes the efficient global modeling of Mamba with the prior knowledge and zero-shot capabilities of CLIP to address both issues simultaneously. Specifically, our method employs a parallel state space model and window-based self-attention to obtain global contextual dependency and local fine-grained perception, respectively. To seamlessly aggregate information from both paths, we introduce a CLIP-instructed Aggregation Module (CAM). For non-homogeneous and homogeneous haze, CAM leverages a zero-shot estimated haze density map and a high-quality image embedding without degradation information to explicitly and implicitly determine the optimal neural operation range for each pixel, thereby adaptively fusing two paths with different receptive fields. Extensive experiments on various benchmarks demonstrate that CLIPHaze achieves state-of-the-art (SOTA) performance, particularly in non-homogeneous haze. Code will be made publicly available after acceptance.

Submitted 22 August, 2024; originally announced August 2024.

Comments: 12 pages, 6 figures

arXiv:2408.12151 [pdf, ps, other]

A Tighter Complexity Analysis of SparseGPT

Authors: Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

Abstract: In this work, we improve the analysis of the running time of SparseGPT [Frantar, Alistarh ICML 2023] from $O(d^{3})$ to $O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1,1,a)-a})$ for any $a \in [0, 1]$, where $\omega$ is the exponent of matrix multiplication. In particular, for the current $\omega\approx 2.371$ [Alman, Duan, Williams, Xu, Xu, Zhou 2024], our running time boils down to $O(d^{2.53})$. This running time is due to the analysis of the lazy update behavior in iterative maintenance problems, such as [Deng, Song, Weinstein 2022, Brand, Song, Zhou ICML 2024].

Submitted 22 August, 2024; originally announced August 2024.

arXiv:2408.10854 [pdf, other]

MambaDS: Near-Surface Meteorological Field Downscaling with Topography Constrained Selective State Space Modeling

Authors: Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi

Abstract: In an era of frequent extreme weather and global warming, obtaining precise, fine-grained near-surface weather forecasts is increasingly essential for human activities. Downscaling (DS), a crucial task in meteorological forecasting, enables the reconstruction of high-resolution meteorological states for target regions from global-scale forecast results. Previous downscaling methods, inspired by CNN- and Transformer-based super-resolution models, lacked designs tailored to meteorology and encountered structural limitations. Notably, they failed to efficiently integrate topography, a crucial prior in the downscaling process. In this paper, we address these limitations by introducing the selective state space model to meteorological field downscaling and propose a novel model called MambaDS. This model enhances the use of multivariable correlations and topography information, which are unique challenges in the downscaling process, while retaining the advantages of Mamba in long-range dependency modeling and linear computational complexity. Through extensive experiments in both mainland China and the continental United States (CONUS), we validate that our proposed MambaDS achieves state-of-the-art results in three different types of meteorological field downscaling settings. We will release the code subsequently.

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.09723 [pdf, other]

sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting

Authors: Jiaheng Yin, Zhengxin Shi, Jianshen Zhang, Xiaomin Lin, Yulin Huang, Yongzhi Qi, Wei Qi

Abstract: In recent years, numerous Transformer-based models have been applied to long-term time-series forecasting (LTSF) tasks. However, recent studies with linear models have questioned their effectiveness, demonstrating that simple linear layers can outperform sophisticated Transformer-based models. In this work, we review and categorize existing Transformer-based models into two main types: (1) modifications to the model structure and (2) modifications to the input data. The former offers scalability but falls short in capturing inter-sequential information, while the latter preprocesses time-series data but is challenging to use as a scalable module. We propose $\textbf{sTransformer}$, which introduces the Sequence and Temporal Convolutional Network (STCN) to fully capture both sequential and temporal information. Additionally, we introduce a Sequence-guided Mask Attention mechanism to capture global feature information. Our approach ensures the capture of inter-sequential information while maintaining module scalability. We compare our model with linear models and existing forecasting models on long-term time-series forecasting, achieving new state-of-the-art results. We also conducted experiments on other time-series tasks, achieving strong performance. These results demonstrate that Transformer-based structures remain effective and that our model can serve as a viable baseline for time-series tasks.

Submitted 19 August, 2024; originally announced August 2024.

arXiv:2408.09220 [pdf, other]

Flatten: Video Action Recognition is an Image Classification task

Authors: Junlin Chen, Chengcheng Xu, Yangfan Xu, Jian Yang, Jun Li, Zhiping Shi

Abstract: In recent years, video action recognition, as a fundamental task in the field of video understanding, has been deeply explored by numerous researchers. Most traditional video action recognition methods convert videos into three-dimensional data that encapsulates both spatial and temporal information, then leverage prevalent image understanding models to model and analyze these data. However, these methods have significant drawbacks. First, when applied to video action recognition, image understanding models often need to be adapted in terms of model architecture and preprocessing for these spatiotemporal tasks. Second, high-dimensional data often poses greater challenges and incurs higher time costs than its lower-dimensional counterparts. To bridge the gap between image-understanding and video-understanding tasks while simplifying video comprehension, we introduce a novel video representation architecture, Flatten, which serves as a plug-and-play module that can be seamlessly integrated into any image-understanding network for efficient and effective 3D temporal data modeling. Specifically, by applying specific flattening operations (e.g., row-major transform), 3D spatiotemporal data is transformed into 2D spatial information, and ordinary image understanding models are then used to capture temporal dynamic and spatial semantic information, accomplishing effective and efficient video action recognition.
Extensive experiments on commonly used datasets (Kinetics-400, Something-Something v2, and HMDB-51) and three classical image classification models (Uniformer, SwinV2, and ResNet) demonstrate that embedding Flatten provides significant performance improvements over the original models.

Submitted 17 August, 2024; originally announced August 2024.

Comments: 13 pages, 6 figures
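The abstract names a row-major flattening of 3D spatiotemporal data into a 2D image but does not give its exact layout. The sketch below is one plausible reading, offered as an assumption rather than the authors' operation: T frames of size H x W are stacked vertically into a single (T*H) x W image, visiting frames in time order and rows top to bottom. The function name `flatten_video_row_major` is hypothetical.

```python
from typing import List

Frame = List[List[float]]  # one H x W frame as nested lists


def flatten_video_row_major(video: List[Frame]) -> Frame:
    """Stack T frames (each H x W) vertically into one (T*H) x W image.

    Frames are visited in temporal order and each frame's rows are copied
    top to bottom, i.e. a row-major traversal over (time, row).
    """
    if not video:
        return []
    h, w = len(video[0]), len(video[0][0])
    for frame in video:
        if len(frame) != h or any(len(row) != w for row in frame):
            raise ValueError("all frames must share the same H x W shape")
    # Copy rows so the output does not alias the input frames.
    return [list(row) for frame in video for row in frame]


if __name__ == "__main__":
    clip = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # T=2 frames of 2x2
    print(flatten_video_row_major(clip))  # [[1, 2], [3, 4], [5, 6], [7, 8]]
```

Tiling the frames into a grid instead of a single tall column would be an equally valid flattening, and the abstract does not say which one Flatten uses. Either way, the result is an ordinary 2D input that an unmodified image backbone can consume.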

arXiv:2408.09064 [pdf, other]

MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality

Authors: Zhiyi Shi, Junsik Kim, Wanhua Li, Yicong Li, Hanspeter Pfister

Abstract: Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge is the frequent occurrence of missing modalities, which impairs performance. Additionally, fine-tuning the entire pre-trained model demands substantial computational resources. To address these issues, we introduce Modality-aware Low-Rank Adaptation (MoRA), a computationally efficient method. MoRA projects each input to a low intrinsic dimension but uses different modality-aware up-projections for modality-specific adaptation in cases of missing modalities. In practice, MoRA is integrated into the first block of the model and significantly improves performance when a modality is missing. It requires minimal computational resources, training less than 1.6% of the parameters needed to train the entire model. Experimental results show that MoRA outperforms existing techniques in disease diagnosis, demonstrating superior performance, robustness, and training efficiency.

Submitted 16 August, 2024; originally announced August 2024.

Comments: Accepted by MICCAI 2024
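The abstract describes a shared projection to a low intrinsic dimension followed by modality-specific up-projections. The minimal low-rank-adapter sketch below assumes the standard LoRA form h = W x + B_m (A x): A is shared and B_m is picked by a modality key. The names `modality_aware_lora` and `B_by_modality` are hypothetical, and this is not the authors' implementation.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def modality_aware_lora(x: Vector, W: Matrix, A: Matrix,
                        B_by_modality: Dict[str, Matrix], modality: str) -> Vector:
    """Frozen weight W plus a low-rank update B_m @ (A @ x).

    A (r x d) is a down-projection shared by every modality; B_m (d_out x r)
    is an up-projection selected by the (hypothetical) modality key.
    Only A and the B_m matrices would be trained; W stays frozen.
    """
    base = matvec(W, x)                            # frozen pre-trained path
    low = matvec(A, x)                             # shared projection to rank r
    delta = matvec(B_by_modality[modality], low)   # modality-specific up-projection
    return [b + d for b, d in zip(base, delta)]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 1.0]]                                # rank r = 1
    B = {"image": [[1.0], [0.0]], "text": [[0.0], [2.0]]}
    print(modality_aware_lora([1.0, 2.0], W, A, B, "image"))  # [4.0, 2.0]
    print(modality_aware_lora([1.0, 2.0], W, A, B, "text"))   # [1.0, 8.0]
```

The trainable parameters are r*d for A plus d_out*r per modality. This is why such adapters stay a small fraction of the full model size.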

arXiv:2408.08500 [pdf, other]

CoSEC: A Coaxial Stereo Event Camera Dataset for Autonomous Driving

Authors: Shihan Peng, Hanyu Zhou, Hao Dong, Zhiwei Shi, Haoyue Liu, Yuxing Duan, Yi Chang, Luxin Yan

Abstract: Conventional frame cameras are the mainstream sensors for scene perception in autonomous driving, but they are limited in adverse conditions such as low light. Event cameras with high dynamic range have been used to assist frame cameras in multimodal fusion, which relies heavily on pixel-level spatial alignment between the modalities. Typically, existing multimodal datasets place event and frame cameras in parallel and directly align them spatially via a warping operation. However, this parallel strategy is less effective for multimodal fusion, since the large event-frame baseline produces large disparity, which exacerbates spatial misalignment. We argue that minimizing the baseline can reduce the alignment error between event and frame cameras. In this work, we introduce hybrid coaxial event-frame devices to build the multimodal system, and propose a coaxial stereo event camera (CoSEC) dataset for autonomous driving. For the multimodal system, we first use a microcontroller to achieve time synchronization, and then spatially calibrate the different sensors, performing intra- and inter-calibration of the stereo coaxial devices. For the multimodal dataset, we filter LiDAR point clouds to generate depth and optical flow labels using reference depth, which is further improved by fusing aligned event and frame data in nighttime conditions. With the help of the coaxial device, the proposed dataset can promote all-day pixel-level multimodal fusion. Moreover, we conduct experiments demonstrating that the proposed dataset can improve the performance and generalization of multimodal fusion.

Submitted 15 August, 2024; originally announced August 2024.

Comments: This work has been submitted to the IEEE for possible publication
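The argument that a large event-frame baseline worsens misalignment follows from standard pinhole stereo geometry. Disparity is d = f * B / Z, so warping at a single assumed depth leaves a residual of f * B * |1/Z_true - 1/Z_assumed| pixels. The sketch below assumes an idealized rectified pinhole model and uses illustrative numbers; neither comes from the CoSEC setup.

```python
def disparity_px(focal_px: float, baseline_m: float, depth_m: float) -> float:
    """Pinhole stereo disparity d = f * B / Z, in pixels."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * baseline_m / depth_m


def warp_misalignment_px(focal_px: float, baseline_m: float,
                         true_depth_m: float, assumed_depth_m: float) -> float:
    """Residual pixel offset when a warp assumes the wrong depth plane.

    The error is proportional to the baseline, so a coaxial setup
    (baseline near zero) drives it toward zero regardless of depth.
    """
    return abs(disparity_px(focal_px, baseline_m, true_depth_m)
               - disparity_px(focal_px, baseline_m, assumed_depth_m))


if __name__ == "__main__":
    # Illustrative values: 1000 px focal length, object at 5 m, warp tuned for 10 m.
    print(warp_misalignment_px(1000.0, 0.1, 5.0, 10.0))    # about 10 px (10 cm baseline)
    print(warp_misalignment_px(1000.0, 0.001, 5.0, 10.0))  # about 0.1 px (1 mm baseline)
```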

arXiv:2408.07321 [pdf, other]

LLM-Enhanced Static Analysis for Precise Identification of Vulnerable OSS Versions

Authors: Yiran Cheng, Lwin Khin Shar, Ting Zhang, Shouguo Yang, Chaopeng Dong, David Lo, Shichao Lv, Zhiqiang Shi, Limin Sun

Abstract: Open-source software (OSS) has surged in popularity, owing to its collaborative development model and cost-effectiveness. However, adopting specific software versions in development projects may introduce security risks when those versions contain vulnerabilities. Current methods for identifying vulnerable versions typically analyze and trace the code involved in vulnerability patches using static analysis with predefined rules, and then use syntactic-level code clone detection to identify the vulnerable versions. These methods suffer from imprecision due to (1) the inclusion of vulnerability-irrelevant code in the analysis and (2) the inadequacy of syntactic-level code clone detection. This paper presents Vercation, an approach designed to identify vulnerable versions of OSS written in C/C++. Vercation combines program slicing with a Large Language Model (LLM) to identify vulnerability-relevant code from vulnerability patches. It then backtraces historical commits to gather previous modifications of the identified vulnerability-relevant code. We propose semantic-level code clone detection to compare pre-modification and post-modification code, thereby locating the vulnerability-introducing commit (vic) and enabling identification of the vulnerable versions between the patch commit and the vic. We curate a dataset linking 74 OSS vulnerabilities and 1013 versions to evaluate Vercation. On this dataset, our approach achieves an F1 score of 92.4%, outperforming current state-of-the-art methods. More importantly, Vercation detected 134 incorrect vulnerable OSS versions in NVD reports.

Submitted 14 August, 2024; originally announced August 2024.
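Once the vulnerability-introducing commit (vic) and the patch commit are known, the final step described above reduces to selecting releases that lie between them. The sketch below assumes a single linear commit history ordered from oldest to newest. Real repositories with branches and merges would need commit-reachability checks instead. The function and argument names are hypothetical.

```python
from typing import Dict, List


def vulnerable_versions(history: List[str], releases: Dict[str, str],
                        vic: str, patch: str) -> List[str]:
    """Return release names whose tagged commit lies in [vic, patch).

    history:  commit ids of one linear history, oldest first (an assumption).
    releases: release name -> commit id the release was tagged at.
    A release at the vic contains the flaw; a release at the patch has the fix.
    """
    pos = {commit: i for i, commit in enumerate(history)}
    lo, hi = pos[vic], pos[patch]
    if lo > hi:
        raise ValueError("vic must precede the patch commit")
    hits = [name for name, commit in releases.items() if lo <= pos[commit] < hi]
    return sorted(hits, key=lambda name: pos[releases[name]])


if __name__ == "__main__":
    history = ["c1", "c2", "c3", "c4", "c5"]
    releases = {"v1.0": "c1", "v1.1": "c3", "v1.2": "c4", "v2.0": "c5"}
    print(vulnerable_versions(history, releases, vic="c2", patch="c5"))  # ['v1.1', 'v1.2']
```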

arXiv:2408.06604 [pdf, other]

MV-DETR: Multi-modality indoor object detection by Multi-View DEtecton TRansformers

Authors: Zichao Dong, Yilin Zhang, Xufeng Huang, Hang Ji, Zhan Shi, Xin Zhan, Junbo Chen

Abstract: We introduce MV-DETR, a novel pipeline for effective and efficient transformer-based detection. Given RGBD input, we observe that very strong pretrained weights exist for RGB data, whereas pretraining for depth data is far less effective. First, we argue that geometry and texture cues are both vitally important and can be encoded separately. Second, we find that visual texture features are harder to extract than geometry features in 3D space. Unfortunately, a single RGBD dataset with thousands of samples is not enough to train a discriminative filter for visual texture feature extraction. Finally, we design a lightweight VG module consisting of a visual texture encoder, a geometry encoder, and a VG connector. Compared with previous state-of-the-art works such as V-DETR, gains from the pretrained visual encoder are evident. Extensive experiments on the ScanNetV2 dataset show the effectiveness of our method. Notably, our method achieves 78% AP, setting a new state of the art on the ScanNetV2 benchmark.

Submitted 12 August, 2024; originally announced August 2024.

arXiv:2408.06395 [pdf, ps, other]

Fast John Ellipsoid Computation with Differential Privacy Optimization

Authors: Jiuxiang Gu, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song, Junwei Yu

Abstract: Determining the John ellipsoid (the largest-volume ellipsoid contained within a convex polytope) is a fundamental problem with applications in machine learning, optimization, and data analytics. Recent work has developed fast algorithms for approximating the John ellipsoid using sketching and leverage score sampling techniques. However, these algorithms do not provide privacy guarantees for sensitive input data. In this paper, we present the first differentially private algorithm for fast John ellipsoid computation. Our method integrates noise perturbation with sketching and leverage score sampling to achieve both efficiency and privacy. We prove that (1) our algorithm provides $(ε,δ)$-differential privacy, and the privacy guarantee holds for neighboring datasets that are $ε_0$-close, allowing flexibility in the privacy definition; (2) our algorithm still converges to a $(1+ξ)$-approximation of the optimal John ellipsoid in $O(ξ^{-2}(\log(n/δ_0) + (Lε_0)^{-2}))$ iterations, where $n$ is the number of data points, $L$ is the Lipschitz constant, $δ_0$ is the failure probability, and $ε_0$ is the closeness of neighboring input datasets. Our theoretical analysis demonstrates the algorithm's convergence and privacy properties, providing a robust approach for balancing utility and privacy in John ellipsoid computation and opening avenues for future research in privacy-preserving optimization techniques.

Submitted 11 August, 2024; originally announced August 2024.
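The fast approximation algorithms mentioned above build on a leverage-score fixed-point iteration. For a symmetric polytope {x : |a_i^T x| <= 1}, this iteration updates the weights as w_i <- w_i * a_i^T (A^T W A)^{-1} a_i. The 2D sketch below is non-private and exact, with no sketching or sampling. The optional `noise_scale` only marks where a perturbation could enter. It is uncalibrated, carries no formal privacy guarantee, and is not the paper's mechanism. Known variants average the iterates, whereas this sketch returns the final one. All names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Row = Tuple[float, float]


def _inv2(m: List[List[float]]) -> List[List[float]]:
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("A^T W A is singular; rows must span R^2")
    return [[d / det, -b / det], [-c / det, a / det]]


def _shape(weights: List[float], rows: List[Row]) -> List[List[float]]:
    """Q = A^T W A for 2D rows a_i and diagonal weights w_i."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for wi, (x, y) in zip(weights, rows):
        q[0][0] += wi * x * x
        q[0][1] += wi * x * y
        q[1][0] += wi * y * x
        q[1][1] += wi * y * y
    return q


def john_weights_2d(rows: List[Row], iters: int = 50, noise_scale: float = 0.0,
                    seed: Optional[int] = None):
    """Return (w, Q); the ellipsoid estimate is {x : x^T Q x <= 1}."""
    rng = random.Random(seed)
    w = [2.0 / len(rows)] * len(rows)  # start at d/n with d = 2
    for _ in range(iters):
        inv = _inv2(_shape(w, rows))
        new_w = []
        for wi, (x, y) in zip(w, rows):
            # Leverage-type score a_i^T (A^T W A)^{-1} a_i.
            lev = x * (inv[0][0] * x + inv[0][1] * y) + y * (inv[1][0] * x + inv[1][1] * y)
            val = wi * lev
            if noise_scale > 0:
                val += rng.gauss(0.0, noise_scale)  # toy perturbation, not calibrated DP
            new_w.append(max(val, 1e-12))  # keep weights positive
        w = new_w
    return w, _shape(w, rows)


if __name__ == "__main__":
    w, q = john_weights_2d([(1.0, 0.0), (0.0, 1.0)])  # the square |x|, |y| <= 1
    print(w, q)  # weights stay at 1; Q is the identity (the unit disk)
```

Without noise, the weights sum to exactly d = 2 after every update, because the sum of w_i a_i^T Q^{-1} a_i equals trace(I). This makes a convenient sanity check.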

arXiv:2408.05723 [pdf, other]

Deep Learning with Data Privacy via Residual Perturbation

Authors: Wenqi Tao, Huaming Ling, Zuoqiang Shi, Bao Wang

Abstract: Protecting data privacy in deep learning (DL) is of crucial importance. Several celebrated privacy notions have been established and used for privacy-preserving DL. However, many existing mechanisms achieve privacy at the cost of significant utility degradation and computational overhead. In this paper, we propose a stochastic differential equation-based residual perturbation for privacy-preserving DL, which injects Gaussian noise into each residual mapping of ResNets. Theoretically, we prove that residual perturbation guarantees differential privacy (DP) and reduces the generalization gap of DL. Empirically, we show that residual perturbation is computationally efficient and outperforms the state-of-the-art differentially private stochastic gradient descent (DPSGD) in utility maintenance without sacrificing membership privacy.

Submitted 11 August, 2024; originally announced August 2024.
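The core idea above is that Gaussian noise is injected into each residual mapping of a ResNet. The sketch below assumes the simplest form, x_{l+1} = x_l + F(x_l) + sigma * eps with eps ~ N(0, I). The paper's SDE discretization, its noise scaling, and the calibration of sigma to a privacy budget are not reproduced. The functions `perturbed_residual_block` and `forward` are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def perturbed_residual_block(x: Vector, residual_fn: Callable[[Vector], Vector],
                             sigma: float, rng: random.Random) -> Vector:
    """Return x + F(x) + sigma * eps, with eps ~ N(0, I) drawn per coordinate."""
    fx = residual_fn(x)
    return [xi + fi + sigma * rng.gauss(0.0, 1.0) for xi, fi in zip(x, fx)]


def forward(x: Vector, blocks: Sequence[Callable[[Vector], Vector]],
            sigma: float, seed: Optional[int] = None) -> Vector:
    """Run a stack of noisy residual blocks; sigma = 0 recovers a plain ResNet."""
    rng = random.Random(seed)
    for f in blocks:
        x = perturbed_residual_block(x, f, sigma, rng)
    return x


if __name__ == "__main__":
    double = lambda v: [2.0 * t for t in v]  # stand-in for a learned residual F
    print(forward([1.0, 2.0], [double], sigma=0.0))           # [3.0, 6.0]
    print(forward([1.0, 2.0], [double], sigma=0.5, seed=7))   # noisy, reproducible
```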

arXiv:2408.05707 [pdf, other]

Fast and Scalable Semi-Supervised Learning for Multi-View Subspace Clustering

Authors: Huaming Ling, Chenglong Bao, Jiebo Song, Zuoqiang Shi

Abstract: In this paper, we introduce a Fast and Scalable Semi-supervised Multi-view Subspace Clustering (FSSMSC) method, a novel solution to the high computational complexity commonly found in existing approaches. FSSMSC features linear computational and space complexity relative to the size of the data. The method generates a consensus anchor graph across all views, representing each data point as a sparse linear combination of chosen landmarks. Unlike traditional methods that manage the anchor graph construction and the label propagation process separately, this paper proposes a unified optimization model that facilitates simultaneous learning of both. An effective alternating update algorithm with convergence guarantees is proposed to solve the unified optimization model. Additionally, the method employs the obtained anchor graph and landmarks' low-dimensional representations to deduce low-dimensional representations for raw data. Following this, a straightforward clustering approach is conducted on these low-dimensional representations to achieve the final clustering results. The effectiveness and efficiency of FSSMSC are validated through extensive experiments on multiple benchmark datasets of varying scales.

Submitted 11 August, 2024; originally announced August 2024.

Comments: 40 pages, 7 figures
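An anchor graph represents each data point as a sparse combination of a small set of landmarks. This is what keeps cost linear in the number of points n when the number of landmarks m is fixed. FSSMSC learns its anchor graph jointly with label propagation. The sketch below instead uses a common heuristic construction, shown only to illustrate the data structure: each row keeps the k nearest landmarks, weighted by a Gaussian kernel and normalized to sum to 1. The names `anchor_graph`, `k`, and `bandwidth` are hypothetical.

```python
import math
from typing import List, Sequence


def anchor_graph(points: Sequence[Sequence[float]],
                 landmarks: Sequence[Sequence[float]],
                 k: int = 2, bandwidth: float = 1.0) -> List[List[float]]:
    """Return an n x m matrix Z; row i has k nonzeros that sum to 1."""
    Z = []
    for p in points:
        d2 = [sum((a - b) ** 2 for a, b in zip(p, lm)) for lm in landmarks]
        nearest = sorted(range(len(landmarks)), key=lambda j: d2[j])[:k]
        dmin = d2[nearest[0]]  # shift distances so the closest weight is exp(0) = 1
        raw = {j: math.exp(-(d2[j] - dmin) / (2.0 * bandwidth ** 2)) for j in nearest}
        total = sum(raw.values())
        row = [0.0] * len(landmarks)
        for j, v in raw.items():
            row[j] = v / total
        Z.append(row)
    return Z


if __name__ == "__main__":
    Z = anchor_graph([(0.0, 0.0), (10.0, 10.0)],
                     [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)], k=2)
    print(Z)  # each row: two positive weights, summing to 1
```

The loop touches each (point, landmark) pair once, so the cost is O(n * m log m). This is linear in n for a fixed number of landmarks.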

arXiv:2408.05645 [pdf]

BeyondCT: A deep learning model for predicting pulmonary function from chest CT scans

Authors: Kaiwen Geng, Zhiyi Shi, Xiaoyan Zhao, Alaa Ali, Jing Wang, Joseph Leader, Jiantao Pu

Abstract: Background: Pulmonary function tests (PFTs) and computed tomography (CT) imaging are vital in diagnosing, managing, and monitoring lung diseases. A common issue in practice is the lack of access to recorded pulmonary functions despite available chest CT scans. Purpose: To develop and validate a deep learning algorithm for predicting pulmonary function directly from chest CT scans. Methods: The development cohort came from the Pittsburgh Lung Screening Study (PLuSS) (n=3619). The validation cohort came from the Specialized Centers of Clinically Oriented Research (SCCOR) in COPD (n=662). A deep learning model called BeyondCT, combining a three-dimensional (3D) convolutional neural network (CNN) and Vision Transformer (ViT) architecture, was used to predict forced vital capacity (FVC) and forced expiratory volume in one second (FEV1) from non-contrast inspiratory chest CT scans. A 3D CNN model without ViT was used for comparison. Subject demographics (age, gender, smoking status) were also incorporated into the model. Performance was compared to actual PFTs using mean absolute error (MAE, in liters), percentage error, and R-squared. Results: The 3D CNN model achieved MAEs of 0.395 L and 0.383 L, percentage errors of 13.84% and 18.85%, and R-squared values of 0.665 and 0.679 for FVC and FEV1, respectively. The BeyondCT model without demographics had MAEs of 0.362 L and 0.371 L, percentage errors of 10.89% and 14.96%, and R-squared values of 0.719 and 0.727, respectively. Including demographics improved performance (p<0.05), with MAEs of 0.356 L and 0.353 L, percentage errors of 10.79% and 14.82%, and R-squared values of 0.77 and 0.739 for FVC and FEV1 in the test set. Conclusion: The BeyondCT model showed robust performance in predicting lung function from non-contrast inspiratory chest CT scans.

Submitted 10 August, 2024; originally announced August 2024.

Comments: 5 tables, 7 figures, 22 pages
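The three evaluation metrics reported above can be computed as shown below. The abstract does not define "percentage error" precisely. This sketch assumes it is the mean absolute percentage error relative to the measured value, which may differ from the authors' definition.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the inputs (liters for FVC/FEV1)."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_abs_percentage_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Assumed definition: 100 * mean(|t - p| / |t|)."""
    _check(y_true, y_pred)
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    measured, predicted = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]  # toy values, not study data
    print(mae(measured, predicted))                        # 0.333...
    print(mean_abs_percentage_error(measured, predicted))  # 11.11...
    print(r_squared(measured, predicted))                  # 0.5
```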

arXiv:2408.05419 [pdf, other]

Interface Laplace Learning: Learnable Interface Term Helps Semi-Supervised Learning

Authors: Tangjun Wang, Chenglong Bao, Zuoqiang Shi

Abstract: We introduce a novel framework, called Interface Laplace learning, for graph-based semi-supervised learning. Motivated by the observation that an interface should exist between different classes where the function value is non-smooth, we introduce a Laplace learning model that incorporates an interface term. This model challenges the long-standing assumption that functions are smooth at all unlabeled points. In the proposed approach, we add an interface term to the Laplace learning model at the interface positions. We provide a practical algorithm to approximate the interface positions using k-hop neighborhood indices, and to learn the interface term from labeled data without artificial design. Our method is efficient and effective, and we present extensive experiments demonstrating that Interface Laplace learning achieves better performance than other recent semi-supervised learning approaches at extremely low label rates on the MNIST, FashionMNIST, and CIFAR-10 datasets.

Submitted 9 August, 2024; originally announced August 2024.
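Classical Laplace learning solves (L u)_i = 0 at every unlabeled node, with labeled values held fixed. The model above instead allows a nonzero right-hand side f_i at interface positions. The sketch below solves (L u)_i = f_i with Jacobi iteration for a scalar label. Here f is passed in by hand, whereas the paper learns it and locates interfaces via k-hop neighborhoods. The sign convention and all names are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def laplace_learning(n: int, edges: List[Tuple[int, int, float]],
                     labels: Dict[int, float],
                     interface: Optional[Dict[int, float]] = None,
                     iters: int = 500) -> List[float]:
    """Jacobi iteration for (L u)_i = f_i at unlabeled nodes; labeled nodes fixed.

    With L = D - W, (L u)_i = f_i rearranges to
    u_i = (sum_j w_ij u_j + f_i) / deg_i. f = 0 gives plain Laplace learning.
    """
    f = interface or {}
    nbrs: List[List[Tuple[int, float]]] = [[] for _ in range(n)]
    for i, j, w in edges:  # undirected weighted graph
        nbrs[i].append((j, w))
        nbrs[j].append((i, w))
    u = [labels.get(i, 0.0) for i in range(n)]
    for _ in range(iters):
        new_u = u[:]
        for i in range(n):
            if i in labels or not nbrs[i]:
                continue
            deg = sum(w for _, w in nbrs[i])
            new_u[i] = (sum(w * u[j] for j, w in nbrs[i]) + f.get(i, 0.0)) / deg
        u = new_u
    return u


if __name__ == "__main__":
    path = [(0, 1, 1.0), (1, 2, 1.0)]
    print(laplace_learning(3, path, {0: 0.0, 2: 1.0}))                   # middle -> 0.5
    print(laplace_learning(3, path, {0: 0.0, 2: 1.0}, interface={1: 1.0}))  # middle -> 1.0
```

On the toy path, the interface term lets the middle node jump to the second class's value instead of settling at the smooth midpoint. This is the non-smoothness the abstract argues for.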

arXiv:2408.04499 [pdf, other]

Knowledge-Aided Semantic Communication Leveraging Probabilistic Graphical Modeling

Authors: Haowen Wan, Qianqian Yang, Jiancheng Tang, Zhiguo Shi

Abstract: In this paper, we propose a semantic communication approach based on a probabilistic graphical model (PGM). The proposed approach constructs a PGM from a training dataset, which is then shared as common knowledge between the transmitter and receiver. We evaluate the importance of various semantic features and present a PGM-based compression algorithm designed to eliminate predictable portions of semantic information. Furthermore, we introduce a technique to reconstruct the discarded semantic information at the receiver, generating approximate results based on the PGM. Simulation results indicate a significant improvement in transmission efficiency over existing methods, while maintaining the quality of the transmitted images.

Submitted 8 August, 2024; originally announced August 2024.
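The compress-and-reconstruct idea above works as follows: both ends share a PGM, the transmitter drops what the PGM can predict, and the receiver fills those slots with the PGM's best guess. The sketch below uses a hypothetical chain-structured PGM over discrete semantic features (P(next | previous)) and a confidence threshold. The paper's actual graph structure and feature-importance weighting are not reproduced.

```python
from typing import Dict, List, Optional, Tuple

# cond[prev][nxt] = P(nxt | prev): a hypothetical chain-structured PGM
Cond = Dict[str, Dict[str, float]]


def most_likely(cond: Cond, prev: str) -> Tuple[str, float]:
    """Most probable next feature given the previous one, with its probability."""
    dist = cond[prev]
    best = max(dist, key=dist.get)
    return best, dist[best]


def compress(features: List[str], cond: Cond, threshold: float) -> List[Optional[str]]:
    """Replace features the shared PGM predicts with prob >= threshold by None."""
    sent: List[Optional[str]] = [features[0]]
    prev = features[0]  # mirrors what the receiver will have reconstructed
    for feat in features[1:]:
        guess, p = most_likely(cond, prev)
        if p >= threshold:
            sent.append(None)  # dropped: the receiver will substitute `guess`
            prev = guess
        else:
            sent.append(feat)
            prev = feat
    return sent


def reconstruct(sent: List[Optional[str]], cond: Cond) -> List[str]:
    """Fill dropped slots with the PGM's most likely value."""
    out = [sent[0]]
    for item in sent[1:]:
        out.append(item if item is not None else most_likely(cond, out[-1])[0])
    return out


if __name__ == "__main__":
    cond = {"sky": {"cloud": 0.9, "bird": 0.1}, "cloud": {"rain": 0.6, "sun": 0.4}}
    sent = compress(["sky", "cloud", "sun"], cond, threshold=0.8)
    print(sent)                     # ['sky', None, 'sun']
    print(reconstruct(sent, cond))  # ['sky', 'cloud', 'sun']
```

The transmitter conditions on the receiver's reconstruction rather than on the true previous feature, so both ends stay synchronized. A confidently predicted but wrong feature (for example, "bird" after "sky") is reconstructed as "cloud". This matches the "approximate results" trade-off the abstract mentions.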

arXiv:2408.03815 [pdf, other]

Dissipation Driven Coherent Dynamics Observed in Bose-Einstein Condensates

Authors: Ye Tian, Yajuan Zhao, Yue Wu, Jilai Ye, Shuyao Mei, Zhihao Chi, Tian Tian, Ce Wang, Zhe-Yu Shi, Yu Chen, Jiazhong Hu, Hui Zhai, Wenlan Chen

Abstract: We report the first experimental observation of dissipation-driven coherent quantum many-body oscillation, manifested as the coherent exchange of atoms between the thermal and condensate components in a three-dimensional partially condensed Bose gas. First, we observe that dissipation leads to different atom loss rates for the thermal and condensate components, such that the thermal fraction increases with dissipation time. This dissipation process therefore serves as a tool to uniformly ramp up the system's temperature without introducing extra density excitations. Subsequently, a coherent pair exchange of atoms between the thermal and condensate components occurs, resulting in coherent oscillation of the atom numbers in both components. This oscillation, embedded in the atom loss process, is clearly revealed when we insert a period of dissipation-free evolution into the dynamics, and it manifests as an oscillation of the total atom number at the end. Finally, we present a theoretical calculation supporting this physical mechanism that simultaneously includes dissipation, interaction, finite-temperature, and harmonic-trap effects. Our work introduces highly controllable dissipation as a new tool for controlling quantum many-body dynamics.

Submitted 7 August, 2024; originally announced August 2024.

Comments: 11 pages, 5 figures, 1 table

arXiv:2408.03075 [pdf]

Characterizing the current systems in the Martian ionosphere

Authors: Jiawei Gao, Shibang Li, Anna Mittelholz, Zhaojin Rong, Moa Persson, Zhen Shi, Haoyu Lu, Chi Zhang, Xiaodong Wang, Chuanfei Dong, Lucy Klinger, Jun Cui, Yong Wei, Yongxin Pan

Abstract: When the solar wind interacts with the ionosphere of an unmagnetized planet, it induces currents that form an induced magnetosphere. These currents and their associated magnetic fields play a pivotal role in controlling the movement of charged particles, which is essential for understanding the escape of planetary ions. Unlike the well-documented magnetospheric current systems, the ionospheric current systems on unmagnetized planets remain less understood, which constrains the quantification of electrodynamic energy transfer from stars to these planets. Here, utilizing eight years of data from the Mars Atmosphere and Volatile EvolutioN (MAVEN) mission, we investigate the global distribution of ionospheric currents on Mars. We have identified two distinct current systems in the ionosphere: one aligns with the solar wind electric field yet exhibits hemispheric asymmetry perpendicular to the electric field direction; the other corresponds to the flow pattern of annually-averaged neutral winds. We propose that these two current systems are driven by the solar wind and atmospheric neutral winds, respectively. Our findings reveal that Martian ionospheric dynamics are influenced by the neutral winds from below and the solar wind from above, highlighting the complex and intriguing nature of current systems on unmagnetized planets.

Submitted 6 August, 2024; originally announced August 2024.

Comments: 20 pages, 6 figures

arXiv:2408.02780 [pdf]

LR-Net: A Lightweight and Robust Network for Infrared Small Target Detection

Authors: Chuang Yu, Yunpeng Liu, Jinmiao Zhao, Zelin Shi

Abstract: Constrained by equipment limitations and the lack of intrinsic target features, existing infrared small target detection methods struggle to meet practical comprehensive performance requirements. Therefore, we propose an innovative lightweight and robust network (LR-Net), which abandons complex structures and achieves an effective balance between detection accuracy and resource consumption. Specifically, to ensure light weight and robustness, on the one hand, we construct a lightweight feature extraction attention (LFEA) module, which can fully extract target features and strengthen information interaction across channels. On the other hand, we construct a simple refined feature transfer (RFT) module. Compared with direct cross-layer connections, the RFT module improves the network's refined feature extraction capability with little resource consumption. Meanwhile, to solve the problem of small target loss in high-level feature maps, on the one hand, we propose a low-level feature distribution (LFD) strategy that uses low-level features to supplement the information in high-level features. On the other hand, we introduce an efficient simplified bilinear interpolation attention module (SBAM) that strengthens the guidance constraints that low-level features impose on high-level features and promotes the fusion of the two. In addition, we abandon the traditional resizing method and adopt a new training and inference cropping strategy, which is more robust to datasets with multi-scale samples. Extensive experimental results show that LR-Net achieves state-of-the-art (SOTA) performance. Notably, based on the proposed LR-Net, we achieved 3rd place in the "ICPR 2024 Resource-Limited Infrared Small Target Detection Challenge Track 2: Lightweight Infrared Small Target Detection".

Submitted 5 August, 2024; originally announced August 2024.
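The abstract replaces resizing with a cropping strategy for training and inference but does not specify the crop geometry. As an assumption, the sketch below tiles an image with fixed-size windows at a given stride. It shifts the last window flush with the border so every pixel is covered at native resolution. The names `window_starts` and `crop_boxes` are hypothetical.

```python
from typing import List, Tuple


def window_starts(length: int, patch: int, stride: int) -> List[int]:
    """Start offsets of windows covering [0, length) along one axis."""
    if patch <= 0 or stride <= 0:
        raise ValueError("patch and stride must be positive")
    if patch >= length:
        return [0]  # one window covers the whole axis
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] + patch < length:
        starts.append(length - patch)  # last window flush with the border
    return starts


def crop_boxes(height: int, width: int, patch: int, stride: int) -> List[Tuple[int, int, int, int]]:
    """(top, left, h, w) boxes tiling an image without resizing it."""
    return [(top, left, min(patch, height), min(patch, width))
            for top in window_starts(height, patch, stride)
            for left in window_starts(width, patch, stride)]


if __name__ == "__main__":
    print(window_starts(11, 4, 3))      # [0, 3, 6, 7]
    print(len(crop_boxes(11, 10, 4, 3)))  # 12
```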

arXiv:2408.02773 [pdf]

Refined Infrared Small Target Detection Scheme with Single-Point Supervision

Authors: Jinmiao Zhao, Zelin Shi, Chuang Yu, Yunpeng Liu

Abstract: Recently, infrared small target detection with single-point supervision has attracted extensive attention. However, the detection accuracy of existing methods struggles to meet practical needs. Therefore, we propose an innovative refined infrared small target detection scheme with single-point supervision, which achieves excellent segmentation accuracy and detection rate. Specifically, we introduce the label evolution with single point supervision (LESPS) framework and explore the performance of various strong infrared small target detection networks within this framework. Meanwhile, to improve comprehensive performance, we construct a complete post-processing strategy. On the one hand, to improve segmentation accuracy, we combine test-time augmentation (TTA) and a conditional random field (CRF) for post-processing. On the other hand, to improve the detection rate, we introduce an adjustable sensitivity (AS) strategy for post-processing, which fully considers the advantages of multiple detection results and reasonably adds some low-confidence areas to the fine segmentation image in the form of centroid points. In addition, to further improve performance and explore the characteristics of this task, we find that a multi-stage loss is helpful for fine-grained detection, and that a reasonable sliding-window cropping strategy for test samples performs better on real multi-size samples. Extensive experimental results show that the proposed scheme achieves state-of-the-art (SOTA) performance. Notably, the proposed scheme won third place in the "ICPR 2024 Resource-Limited Infrared Small Target Detection Challenge Track 1: Weakly Supervised Infrared Small Target Detection".

Submitted 5 August, 2024; originally announced August 2024.
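Test-time augmentation, the first post-processing step above, can be illustrated with flips. The prediction on each flipped input is flipped back and averaged with the prediction on the original image. The choice of horizontal and vertical flips is an assumption. The CRF and adjustable-sensitivity steps are not sketched, and `predict` stands in for any segmentation network.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def hflip(img: Image) -> Image:
    """Mirror left-right."""
    return [row[::-1] for row in img]


def vflip(img: Image) -> Image:
    """Mirror top-bottom."""
    return [list(row) for row in img[::-1]]


def tta_predict(img: Image, predict: Callable[[Image], Image],
                flips: Sequence[Callable[[Image], Image]] = (hflip, vflip)) -> Image:
    """Average predict(img) with flip(predict(flip(img))) for each flip.

    Flips are their own inverse, so the same function maps predictions back.
    """
    preds = [predict(img)]
    for flip in flips:
        preds.append(flip(predict(flip(img))))
    k = len(preds)
    h, w = len(preds[0]), len(preds[0][0])
    return [[sum(p[i][j] for p in preds) / k for j in range(w)] for i in range(h)]


if __name__ == "__main__":
    # A deliberately biased "model" that always fires on the leftmost column.
    leftmost = lambda im: [[1.0 if j == 0 else 0.0 for j in range(len(row))] for row in im]
    print(tta_predict([[0.0, 0.0, 0.0]], leftmost))  # [[0.666..., 0.0, 0.333...]]
```

A flip-equivariant model is unchanged by TTA. A biased one, like the toy above, has its orientation-dependent errors diluted, which is why averaging tends to sharpen segmentation.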

arXiv:2408.02095 [pdf, other]

Secure Semantic Communications: From Perspective of Physical Layer Security

Authors: Yongkang Li, Zheng Shi, Han Hu, Yaru Fu, Hong Wang, Hongjiang Lei

Abstract: Semantic communications have been envisioned as a potential technique that goes beyond the Shannon paradigm. Unlike modern communications, which provide bit-level security, eavesdropping on semantic communications poses a significant risk of exposing the intention of the legitimate user. To address this challenge, a novel deep neural network (DNN) enabled secure semantic communication (DeepSSC) system is developed by capitalizing on physical layer security. To balance the tradeoff between security and reliability, a two-phase training method for DNNs is devised. Specifically, Phase I aims at semantic recovery for the legitimate user, while Phase II attempts to minimize the leakage of semantic information to eavesdroppers. The loss functions of DeepSSC in Phases I and II are designed according to Shannon capacity and secure channel capacity, respectively, both approximated with variational inference. Moreover, we define the secure bilingual evaluation understudy (S-BLEU) metric to assess the security of semantic communications. Finally, simulation results demonstrate that DeepSSC achieves a significant boost in semantic security, particularly in the high signal-to-noise ratio regime, at the cost of a minor degradation in reliability.

Submitted 4 August, 2024; originally announced August 2024.
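The two quantities behind the Phase I and Phase II losses have closed forms for a Gaussian wiretap channel. Shannon capacity is log2(1 + SNR), and secrecy capacity is the positive part of the legitimate user's capacity minus the eavesdropper's. DeepSSC approximates these with variational inference on learned representations. The sketch below shows only the textbook AWGN formulas, for intuition about why secrecy improves at high SNR when the legitimate link is stronger.

```python
import math


def db_to_linear(snr_db: float) -> float:
    """Convert an SNR in dB to a linear power ratio."""
    return 10.0 ** (snr_db / 10.0)


def awgn_capacity(snr: float) -> float:
    """Shannon capacity of an AWGN channel, bits per channel use."""
    return math.log2(1.0 + snr)


def secrecy_capacity(snr_bob: float, snr_eve: float) -> float:
    """Gaussian wiretap secrecy capacity: [C_bob - C_eve]^+."""
    return max(0.0, awgn_capacity(snr_bob) - awgn_capacity(snr_eve))


if __name__ == "__main__":
    print(secrecy_capacity(3.0, 1.0))  # 1.0 (2 bits to the user, 1 bit leaked)
    print(secrecy_capacity(1.0, 3.0))  # 0.0 (eavesdropper has the better channel)
    print(secrecy_capacity(db_to_linear(30.0), db_to_linear(10.0)))  # about 6.6 bits
```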

arXiv:2408.01291 [pdf, other]

TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling

Authors: Dong Huo, Zixin Guo, Xinxin Zuo, Zhihao Shi, Juwei Lu, Peng Dai, Songcen Xu, Li Cheng, Yee-Hong Yang

Abstract: Given a 3D mesh, we aim to synthesize 3D textures that correspond to arbitrary textual descriptions. Current methods for generating and assembling textures from sampled views often result in prominent seams or excessive smoothing. To tackle these issues, we present TexGen, a novel multi-view sampling and resampling framework for texture generation that leverages a pre-trained text-to-image diffusion model. For view-consistent sampling, we first maintain a texture map in RGB space that is parameterized by the denoising step and updated after each sampling step of the diffusion model to progressively reduce view discrepancy. An attention-guided multi-view sampling strategy is used to broadcast appearance information across views. To preserve texture details, we develop a noise resampling technique that aids in estimating the noise and generates inputs for subsequent denoising steps, as directed by the text prompt and the current texture map. Through extensive qualitative and quantitative evaluations, we demonstrate that our method produces significantly better texture quality for diverse 3D objects, with a high degree of view consistency and rich appearance details, outperforming current state-of-the-art methods. Furthermore, our texture generation technique can also be applied to texture editing while preserving the original identity. More experimental results are available at https://dong-huo.github.io/TexGen/

Submitted 2 August, 2024; originally announced August 2024.

Comments: European Conference on Computer Vision (ECCV) 2024

arXiv:2408.00234 [pdf, other]

doi 10.1002/pssb.202400240

Superconductive Sodalite-like Clathrate Hydrides MXH$_{12}$ with Critical Temperatures of near 300 K under Pressures

Authors: Yuxiang Fan, Bin Li, Cong Zhu, Jie Cheng, Shengli Liu, Zhixiang Shi

Abstract: We designed and investigated a series of ternary hydride compounds MXH$_{12}$ crystallizing in the cubic $Pm\overline{3}m$ structure as potential rare-earth and alkaline-earth superconductors. First-principles calculations were performed on these prospective superconductors across the pressure range of 50-200 GPa, revealing their electronic band structures, phonon dispersions, electron-phonon interactions, and superconducting properties. Several compounds were identified as dynamically stable, with ScYbH$_{12}$ and LuYbH$_{12}$ remaining stable at 70 GPa, and ScLuH$_{12}$ at 100 GPa. Notably, Eliashberg theory and electron-phonon coupling calculations predict CaLuH$_{12}$ to exhibit a remarkable $T_{c}$ of up to 294 K at 180 GPa. These findings unveil ternary hydrides as a promising class of high-temperature superconductors and provide insights for achieving superconductivity at lower or ambient pressures through material design and exploration.

Submitted 31 July, 2024; originally announced August 2024.

Journal ref: Phys. Status Solidi B 2400240 (2024)

arXiv:2407.21346 [pdf, other]

A network based approach for unbalanced optimal transport on surfaces

Authors: Jiangong Pan, Wei Wan, Yuejin Zhang, Chenlong Bao, Zuoqiang Shi

Abstract: In this paper, we present a neural network approach to address the dynamic unbalanced optimal transport problem on surfaces with point cloud representation. For surfaces represented by point clouds, traditional methods are difficult to apply because mesh generation is difficult. Neural networks are easy to implement even for complicated geometry. Moreover, instead of solving the original dynamic formulation, we consider the Hamiltonian flow approach, i.e. the Karush-Kuhn-Tucker system. Based on this approach, we can exploit the mathematical structure of optimal transport to construct the neural network, and the loss function can be simplified. Extensive numerical experiments are conducted for surfaces with different geometries. We also test the method on noisy point clouds, which demonstrates the stability of this method. This method is also easy to generalize to a diverse range of problems.

Submitted 31 July, 2024; originally announced July 2024.

Comments: 24 pages, 11 figures, 7 tables

MSC Class: 65K10; 68T05; 68T07

arXiv:2407.20518 [pdf, other]

High-Resolution Spatial Transcriptomics from Histology Images using HisToSGE

Authors: Zhiceng Shi, Shuailin Xue, Fangfang Zhu, Wenwen Min

Abstract: Spatial transcriptomics (ST) is a groundbreaking genomic technology that enables spatial localization analysis of gene expression within tissue sections. However, it is significantly limited by high costs and sparse spatial resolution. An alternative, more cost-effective strategy is to use deep learning methods to predict high-density gene expression profiles from histological images. However, existing methods struggle to capture rich image features effectively or rely on low-dimensional positional coordinates, making it difficult to accurately predict high-resolution gene expression profiles. To address these limitations, we developed HisToSGE, a method that employs a Pathology Image Large Model (PILM) to extract rich image features from histological images and utilizes a feature learning module to robustly generate high-resolution gene expression profiles. We evaluated HisToSGE on four ST datasets, comparing its performance with five state-of-the-art baseline methods. The results demonstrate that HisToSGE excels in generating high-resolution gene expression profiles and performing downstream tasks such as spatial domain identification. All code and public datasets used in this paper are available at https://github.com/wenwenmin/HisToSGE and https://zenodo.org/records/12792163.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.20090 [pdf]

Infrared Small Target Detection based on Adjustable Sensitivity Strategy and Multi-Scale Fusion

Authors: Jinmiao Zhao, Zelin Shi, Chuang Yu, Yunpeng Liu

Abstract: Recently, deep learning-based single-frame infrared small target (SIRST) detection technology has made significant progress. However, existing infrared small target detection methods are often optimized for a fixed image resolution, a single wavelength, or a specific imaging system, limiting their breadth and flexibility in practical applications. Therefore, we propose a refined infrared small target detection scheme based on an adjustable sensitivity (AS) strategy and multi-scale fusion. Specifically, a multi-scale model fusion framework based on multi-scale direction-aware network (MSDA-Net) is constructed, which uses input images of multiple scales to train multiple models and fuses them. Multi-scale fusion helps characterize the shape, edge, and texture features of the target from different scales, making the model more accurate and reliable in locating the target. At the same time, we fully consider the characteristics of the infrared small target detection task and construct an edge enhancement difficulty mining (EEDM) loss. The EEDM loss helps alleviate the problem of category imbalance and guides the network to pay more attention to difficult target areas and edge features during training. In addition, we propose an adjustable sensitivity strategy for post-processing. This strategy significantly improves the detection rate of infrared small targets while ensuring segmentation accuracy. Extensive experimental results show that the proposed scheme achieves the best performance. Notably, this scheme won the first prize in the PRCV 2024 wide-area infrared small target detection competition.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.19813 [pdf, other]

Improving Retrieval Augmented Language Model with Self-Reasoning

Authors: Yuan Xia, Jingbo Zhou, Zhenhui Shi, Jun Chen, Haifeng Huang

Abstract: The Retrieval-Augmented Language Model (RALM) has shown remarkable performance on knowledge-intensive tasks by incorporating external knowledge during inference, which mitigates the factual hallucinations inherent in large language models (LLMs). Despite these advancements, challenges persist in the implementation of RALMs, particularly concerning their reliability and traceability. To be specific, irrelevant document retrieval may result in unhelpful response generation or even degrade the performance of LLMs, while the lack of proper citations in generated outputs complicates efforts to verify the trustworthiness of the models. To this end, we propose a novel self-reasoning framework aimed at improving the reliability and traceability of RALMs, whose core idea is to leverage reasoning trajectories generated by the LLM itself. The framework involves constructing self-reasoning trajectories with three processes: a relevance-aware process, an evidence-aware selective process, and a trajectory analysis process. We have evaluated our framework across four public datasets (two short-form QA datasets, one long-form QA dataset, and one fact verification dataset) to demonstrate the superiority of our method, which outperforms existing state-of-the-art models and achieves performance comparable to GPT-4 while using only 2,000 training samples.

Submitted 2 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.19555 [pdf]

Crystal-symmetry-paired spin-valley locking in a layered room-temperature antiferromagnet

Authors: Fayuan Zhang, Xingkai Cheng, Zhouyi Yin, Changchao Liu, Liwei Deng, Yuxi Qiao, Zheng Shi, Shuxuan Zhang, Junhao Lin, Zhengtai Liu, Mao Ye, Yaobo Huang, Xiangyu Meng, Cheng Zhang, Taichi Okuda, Kenya Shimada, Shengtao Cui, Yue Zhao, Guang-Han Cao, Shan Qiao, Junwei Liu, Chaoyu Chen

Abstract: Recent theoretical efforts predicted a type of unconventional antiferromagnet characterized by the crystal symmetry C (rotation or mirror), which connects antiferromagnetic sublattices in real space and simultaneously couples spin and momentum in reciprocal space. This results in a unique C-paired spin-valley locking (SVL) and corresponding novel properties such as piezomagnetism and noncollinear spin current even without spin-orbit coupling. However, the unconventional antiferromagnets reported thus far are not layered materials, limiting their potential in spintronic applications. Additionally, they do not meet the necessary symmetry requirements for nonrelativistic spin current. Here, we report the realization of C-paired SVL in a layered room-temperature antiferromagnetic compound, Rb$_{1-δ}$V$_{2}$Te$_{2}$O. Spin-resolved photoemission measurements directly demonstrate the opposite spin splitting between C-paired valleys. Quasi-particle interference patterns reveal the suppression of inter-valley scattering due to the spin selection rules, a direct consequence of C-paired SVL. All these experimental results are consistent with first-principles calculations. Our observations represent the first realization of layered antiferromagnets with C-paired SVL, combining the advantages of layered materials with possible control through crystal symmetry manipulation. These results hold significant promise and broad implications for advancements in magnetism, electronics, and information technology.

Submitted 2 August, 2024; v1 submitted 28 July, 2024; originally announced July 2024.

Comments: 22 pages, 5 figures

arXiv:2407.18172 [pdf]

Chip-scale sensor for spectroscopic metrology

Authors: Chunhui Yao, Wanlu Zhang, Peng Bao, Jie Ma, Wei Zhuo, Minjia Chen, Zhitian Shi, Jingwen Zhou, Yuxiao Ye, Liang Ming, Ting Yan, Richard Penty, Qixiang Cheng

Abstract: Miniaturized spectrometers hold great promise for in situ, in vitro, and even in vivo sensing applications. However, their size reduction imposes critical performance constraints in meeting the rigorous demands of spectroscopy, including fine resolution, high accuracy, and an ultra-wide observation window. The prevailing view in the community holds that miniaturized spectrometers are most suitable for the coarse identification of signature peaks. In this paper, we present an integrated reconstructive spectrometer that enables near-infrared (NIR) spectroscopic metrology, and demonstrate a fully packaged sensor with auxiliary electronics. Such a sensor operates over a 520 nm bandwidth together with a resolution of less than 8 pm, which translates into a record-breaking bandwidth-to-resolution ratio of over 65,000. The classification of different types of solid substances and the concentration measurement of aqueous and organic solutions are performed, all achieving approximately 100% accuracy. Notably, the detection limit of our sensor matches that of commercial benchtop counterparts, which is as low as 0.1% (i.e. 100 mg/dL) for identifying the concentration of glucose solution.
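As a quick check of the quoted figure of merit, the bandwidth-to-resolution ratio follows directly from the two numbers stated in the abstract, taking the resolution at its 8 pm upper bound:

```latex
\frac{\Delta\lambda_{\text{bandwidth}}}{\delta\lambda_{\text{resolution}}}
  = \frac{520\ \text{nm}}{8\ \text{pm}}
  = \frac{520 \times 10^{-9}\ \text{m}}{8 \times 10^{-12}\ \text{m}}
  = 6.5 \times 10^{4}
```

Because the resolution is stated as less than 8 pm, the ratio is at least 65,000, which is consistent with the "over 65,000" claim.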

Submitted 12 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.17902 [pdf, other]

Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization

Authors: Ruijie Tao, Zhan Shi, Yidi Jiang, Duc-Tuan Truong, Eng-Siong Chng, Massimo Alioto, Haizhou Li

Abstract: The human brain can associate an unknown person's voice and face by leveraging their general relationship, a task referred to as "cross-modal speaker verification". This task poses significant challenges due to the complex relationship between the modalities. In this paper, we propose a "Multi-stage Face-voice Association Learning with Keynote Speaker Diarization" (MFV-KSD) framework. MFV-KSD contains a keynote speaker diarization front-end to effectively address the issue of noisy speech inputs. To balance and enhance intra-modal feature learning and inter-modal correlation understanding, MFV-KSD utilizes a novel three-stage training strategy. Our experimental results demonstrated robust performance, achieving first rank in the 2024 Face-voice Association in Multilingual Environments (FAME) challenge with an overall Equal Error Rate (EER) of 19.9%. Details can be found at https://github.com/TaoRuijie/MFV-KSD.
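The result above is reported as an Equal Error Rate, the operating point at which the false acceptance rate (non-matching face-voice pairs accepted) equals the false rejection rate (matching pairs rejected). The sketch below shows one common way to approximate it by sweeping a decision threshold over verification scores; it is a generic illustration, not the FAME challenge's official scorer, and the function name `equal_error_rate` is hypothetical.

```python
def equal_error_rate(scores, labels):
    """Approximate the EER by sweeping thresholds over the observed scores.

    scores: higher means "same identity"; labels: 1 = target (matching) trial,
    0 = non-target trial. Returns the EER as a fraction in [0, 1].
    """
    targets = [s for s, y in zip(scores, labels) if y == 1]
    nontargets = [s for s, y in zip(scores, labels) if y == 0]
    if not targets or not nontargets:
        raise ValueError("need at least one target and one non-target trial")

    best_gap, eer = float("inf"), None
    # Candidate thresholds: every observed score, plus one above the maximum
    # (i.e. reject everything).
    for thr in sorted(set(scores)) + [max(scores) + 1.0]:
        far = sum(s >= thr for s in nontargets) / len(nontargets)  # false acceptance rate
        frr = sum(s < thr for s in targets) / len(targets)         # false rejection rate
        gap = abs(far - frr)
        if gap < best_gap:
            # With finitely many trials FAR and FRR rarely cross exactly,
            # so report the midpoint at the closest crossing.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

For perfectly separated scores, `equal_error_rate([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])` returns `0.0`. Reporting the midpoint at the closest crossing is a simple design choice; evaluation toolkits often interpolate the ROC curve instead, which can give slightly different values on small trial lists.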

Submitted 25 July, 2024; originally announced July 2024.

arXiv:2407.15720 [pdf, other]

Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability

Authors: Zhuoyan Xu, Zhenmei Shi, Yingyu Liang

Abstract: Large language models (LLMs) have emerged as powerful tools for many AI problems and exhibit remarkable in-context learning (ICL) capabilities. Compositional ability, i.e., solving unseen complex tasks that combine two or more simple tasks, is an essential reasoning ability for Artificial General Intelligence. Despite the tremendous success of LLMs, how they approach composite tasks, especially those not encountered during the pretraining phase, remains an open and largely underexplored question. In this study, we delve into the ICL capabilities of LLMs on composite tasks, with only simple tasks as in-context examples. We develop a test suite of composite tasks including linguistic and logical challenges and perform empirical studies across different LLM families. We observe that models exhibit divergent behaviors: (1) for simpler composite tasks that apply distinct mapping mechanisms to different input segments, the models demonstrate decent compositional ability, and scaling up the model enhances this ability; (2) for more complex composite tasks involving multi-step reasoning, where each step represents one task, models typically underperform, and scaling up generally provides no improvement. We offer theoretical analysis in a simplified setting, explaining that models exhibit compositional capability when the task handles different input parts separately. We believe our work sheds new light on the capabilities of LLMs in solving composite tasks with respect to the nature of the tasks and model scale. Our dataset and code are available at https://github.com/OliverXUZY/LLM_Compose.

Submitted 11 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.15429 [pdf, other]

doi 10.1109/TPAMI.2024.3396809

Learning at a Glance: Towards Interpretable Data-limited Continual Semantic Segmentation via Semantic-Invariance Modelling

Authors: Bo Yuan, Danpei Zhao, Zhenwei Shi

Abstract: Continual semantic segmentation (CSS) based on incremental learning (IL) is a great endeavour in developing human-like segmentation models. However, current CSS approaches encounter challenges in the trade-off between preserving old knowledge and learning new knowledge, as they still need large-scale annotated data for incremental training and lack interpretability. In this paper, we present Learning at a Glance (LAG), an efficient, robust, human-like and interpretable approach for CSS. Specifically, LAG is a simple and model-agnostic architecture, yet it achieves competitive CSS efficiency with limited incremental data. Inspired by human-like recognition patterns, we propose a semantic-invariance modelling approach via semantic feature decoupling that simultaneously reconciles solid knowledge inheritance and new-term learning. Concretely, the proposed decoupling manner includes two ways, i.e., channel-wise decoupling and spatial-level neuron-relevant semantic consistency. Our approach preserves semantic-invariant knowledge as solid prototypes to alleviate catastrophic forgetting, while also constraining sample-specific contents through an asymmetric contrastive learning method to enhance model robustness during IL steps. Experimental results on multiple datasets validate the effectiveness of the proposed method. Furthermore, we introduce a novel CSS protocol that better reflects realistic data-limited CSS settings, and LAG achieves superior performance under multiple data-limited conditions.

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.15317 [pdf, other]

Open-CD: A Comprehensive Toolbox for Change Detection

Authors: Kaiyu Li, Jiawei Jiang, Andrea Codegoni, Chengxi Han, Yupeng Deng, Keyan Chen, Zhuo Zheng, Hao Chen, Zhengxia Zou, Zhenwei Shi, Sheng Fang, Deyu Meng, Zhi Wang, Xiangyong Cao

Abstract: We present Open-CD, a change detection toolbox that contains a rich set of change detection methods as well as related components and modules. The toolbox started from a series of open source general vision task tools, including OpenMMLab Toolkits, PyTorch Image Models, etc. It has gradually evolved into a unified platform that covers many popular change detection methods and contemporary modules. It not only includes training and inference code, but also provides some useful scripts for data analysis. We believe this toolbox is by far the most complete change detection toolbox. In this report, we introduce the various features, supported methods and applications of Open-CD. In addition, we conduct a benchmarking study on different methods and components. We hope that the toolbox and benchmark can serve the growing research community by providing a flexible toolkit to reimplement existing methods and develop new change detectors. Code and models are available at https://github.com/likyoo/open-cd. Pioneeringly, this report also includes brief descriptions of the algorithms supported in Open-CD, mainly contributed by their authors. We sincerely encourage researchers in this field to participate in this project and work together to create a more open community. This toolkit and report will be kept updated.

Submitted 21 July, 2024; originally announced July 2024.

Comments: 9 pages

arXiv:2407.15162 [pdf, other]

Random walk on dynamical percolation in Euclidean lattices: separating critical and supercritical regimes

Authors: Chenlin Gu, Jianping Jiang, Yuval Peres, Zhan Shi, Hao Wu, Fan Yang

Abstract: We study the random walk on dynamical percolation of $\mathbb{Z}^d$ (resp., the two-dimensional triangular lattice $\mathcal{T}$), where each edge (resp., each site) can be either open or closed, refreshing its status at rate $μ\in (0,1/e]$ (after each refresh it is open independently with probability $p$). The random walk moves along open edges in $\mathbb{Z}^d$ (resp., open sites in $\mathcal{T}$) at rate $1$. For the critical regime $p=p_c$, we prove the following two results: on $\mathcal{T}$, the mean squared displacement of the random walk from time $0$ to $t$ is at most $O(tμ^{5/132-ε})$ for any $ε>0$; on $\mathbb{Z}^d$ with $d\geq 11$, the corresponding upper bound for the mean squared displacement is $O(t μ^{1/2}\log(1/μ))$. For the supercritical regime $p>p_c$, we prove that the mean squared displacement on $\mathbb{Z}^d$ is at least $ct$ for some $c=c(d)>0$ that does not depend on $μ$.
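To make the model concrete, here is a minimal Monte Carlo sketch for the $\mathbb{Z}^d$ bond case. It assumes the usual convention that each refresh makes an edge open independently with probability $p$, that the environment starts from its stationary (Bernoulli($p$)) state, and that the walker's clock rings at rate 1, whereupon it picks one of its $2d$ neighbours uniformly and jumps only if the connecting edge is currently open. Edge states are sampled lazily: given an edge's state at time $s$, its state at time $t$ is unchanged with probability $e^{-μ(t-s)}$ (no refresh in between) and is otherwise a fresh Bernoulli($p$) draw, which is exact because refresh times form a Poisson process. The function names are hypothetical; this is an illustration of the dynamics, not the authors' code, and an empirical estimate of the mean squared displacement proves nothing about the bounds above.

```python
import math
import random


def simulate_walk(d, p, mu, t_max, rng):
    """One walk on dynamical bond percolation of Z^d; returns the position at t_max."""
    pos = (0,) * d
    edges = {}  # canonical edge key -> (is_open, time of last observation)
    t = 0.0
    while True:
        t += rng.expovariate(1.0)  # walker's clock rings at rate 1
        if t > t_max:
            return pos
        # Choose one of the 2d neighbours uniformly at random.
        axis = rng.randrange(d)
        nbr = list(pos)
        nbr[axis] += rng.choice((-1, 1))
        nbr = tuple(nbr)
        key = (min(pos, nbr), max(pos, nbr))  # same key from either endpoint
        if key in edges:
            is_open, last = edges[key]
            # At least one refresh since the last look: resample the state.
            if rng.random() >= math.exp(-mu * (t - last)):
                is_open = rng.random() < p
        else:
            is_open = rng.random() < p  # first look: stationary Bernoulli(p)
        edges[key] = (is_open, t)
        if is_open:
            pos = nbr


def mean_squared_displacement(d, p, mu, t_max, n_runs, seed=0):
    """Average of |X_t|^2 over n_runs independent walks (and environments)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        x = simulate_walk(d, p, mu, t_max, rng)
        total += sum(c * c for c in x)
    return total / n_runs
```

For example, `mean_squared_displacement(d=2, p=0.0, mu=0.1, t_max=10.0, n_runs=100)` returns `0.0`, since no edge is ever open; with `p=1.0` the walk reduces to a rate-1 simple random walk, whose mean squared displacement at time $t$ is $t$. Lazy sampling keeps memory proportional to the edges the walker actually probes, which is what makes the infinite lattice tractable here.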

Submitted 1 August, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

Comments: 23 pages, 1 figure; minor revision

MSC Class: 60K35; 60K37

arXiv:2407.15079 [pdf, other]

Speed of random walk on dynamical percolation in nonamenable transitive graphs

Authors: Chenlin Gu, Jianping Jiang, Yuval Peres, Zhan Shi, Hao Wu, Fan Yang

Abstract: Let $G$ be a nonamenable transitive unimodular graph. In dynamical percolation, every edge in $G$ refreshes its status at rate $μ>0$, and following the refresh, each edge is open independently with probability $p$. The random walk traverses $G$ only along open edges, moving at rate $1$. In the critical regime $p=p_c$, we prove that the speed of the random walk is at most $O(\sqrt{μ\log(1/μ)})$, provided that $μ\le e^{-1}$. In the supercritical regime $p>p_c$, we prove that the speed on $G$ is of order 1 (uniformly in $μ$), while in the subcritical regime $p<p_c$, the speed is of order $μ\wedge 1$.

Submitted 21 July, 2024; originally announced July 2024.

Comments: 29 pages, 1 figure

arXiv:2407.14717 [pdf, other]

Differential Privacy of Cross-Attention with Provable Guarantee

Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

Abstract: Cross-attention has become a fundamental module in many important artificial intelligence applications, e.g., retrieval-augmented generation (RAG), system prompts, guided stable diffusion, and so on. Ensuring cross-attention privacy is crucial and urgently needed because its key and value matrices may contain sensitive information about companies and their users, many of which profit solely from their system prompts or RAG data. In this work, we design a novel differential privacy (DP) data structure to address the privacy of cross-attention with a theoretical guarantee. In detail, let $n$ be the input token length of system prompt/RAG data, $d$ be the feature dimension, $0 < α\le 1$ be the relative error parameter, $R$ be the maximum value of the query and key matrices, $R_w$ be the maximum value of the value matrix, and $r,s,ε_s$ be parameters of polynomial kernel methods. Then, our data structure requires $\widetilde{O}(ndr^2)$ memory consumption with $\widetilde{O}(nr^2)$ initialization time complexity and $\widetilde{O}(α^{-1} r^2)$ query time complexity for a single token query. In addition, our data structure can guarantee that the user query is $(ε, δ)$-DP with $\widetilde{O}(n^{-1} ε^{-1} α^{-1/2} R^{2s} R_w r^2)$ additive error and $n^{-1} (α+ ε_s)$ relative error between our output and the true answer. Furthermore, our result is robust to adaptive queries in which users can intentionally attack the cross-attention system. To our knowledge, this is the first work to provide DP for cross-attention. We believe it can inspire more privacy algorithm design in large generative models (LGMs).

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.14032 [pdf, other]

Semantic-CC: Boosting Remote Sensing Image Change Captioning via Foundational Knowledge and Semantic Guidance

Authors: Yongshuo Zhu, Lu Li, Keyan Chen, Chenyang Liu, Fugen Zhou, Zhenwei Shi

Abstract: Remote sensing image change captioning (RSICC) aims to articulate the changes in objects of interest within bi-temporal remote sensing images using natural language. Given the limitations of current RSICC methods in expressing general features across multi-temporal and spatial scenarios, and their deficiency in providing granular, robust, and precise change descriptions, we introduce a novel change captioning (CC) method based on the foundational knowledge and semantic guidance, which we term Semantic-CC. Semantic-CC alleviates the dependency of high-generalization algorithms on extensive annotations by harnessing the latent knowledge of foundation models, and it generates more comprehensive and accurate change descriptions guided by pixel-level semantics from change detection (CD). Specifically, we propose a bi-temporal SAM-based encoder for dual-image feature extraction; a multi-task semantic aggregation neck for facilitating information interaction between heterogeneous tasks; a straightforward multi-scale change detection decoder to provide pixel-level semantic guidance; and a change caption decoder based on the large language model (LLM) to generate change description sentences. Moreover, to ensure the stability of the joint training of CD and CC, we propose a three-stage training strategy that supervises different tasks at various stages. We validate the proposed method on the LEVIR-CC and LEVIR-CD datasets. The experimental results corroborate the complementarity of CD and CC, demonstrating that Semantic-CC can generate more accurate change descriptions and achieve optimal performance across both tasks.

Submitted 19 July, 2024; originally announced July 2024.
