
MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection

Authors: Yaning Zhang, Tianyi Wang, Zitong Yu, Zan Gao, Linlin Shen, Shengyong Chen

Abstract: The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia, highlighting the urgent need for robust and generalizable face forgery detection (FFD) techniques. Although existing approaches mainly capture face forgery patterns using image modality, other modalities like fine-grained noises and texts are not fully explored, which limits the generalization capability of the model. In addition, most FFD methods tend to identify facial images generated by GAN, but struggle to detect unseen diffusion-synthesized ones. To address the limitations, we aim to leverage the cutting-edge foundation model, contrastive language-image pre-training (CLIP), to achieve generalizable diffusion face forgery detection (DFFD). In this paper, we propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities via language-guided face forgery representation learning, to facilitate the advancement of DFFD. Specifically, we devise a fine-grained language encoder (FLE) that extracts fine global language features from hierarchical text prompts. We design a multi-modal vision encoder (MVE) to capture global image forgery embeddings as well as fine-grained noise forgery patterns extracted from the richest patch, and integrate them to mine general visual forgery traces. Moreover, we build an innovative plug-and-play sample pair attention (SPA) method to emphasize relevant negative pairs and suppress irrelevant ones, allowing cross-modality sample pairs to conduct more flexible alignment. Extensive experiments and visualizations show that our model outperforms the state of the art in different settings like cross-generator, cross-forgery, and cross-dataset evaluations.

Submitted 15 September, 2024; originally announced September 2024.

arXiv:2409.09628 [pdf, other]

Can Large Language Models Grasp Event Signals? Exploring Pure Zero-Shot Event-based Recognition

Authors: Zongyou Yu, Qiang Qu, Xiaoming Chen, Chen Wang

Abstract: Recent advancements in event-based zero-shot object recognition have demonstrated promising results. However, these methods heavily depend on extensive training and are inherently constrained by the characteristics of CLIP. To the best of our knowledge, this research is the first study to explore the understanding capabilities of large language models (LLMs) for event-based visual content. We demonstrate that LLMs can achieve event-based object recognition without additional training or fine-tuning in conjunction with CLIP, effectively enabling pure zero-shot event-based recognition. Particularly, we evaluate the ability of GPT-4o / 4turbo and two other open-source LLMs to directly recognize event-based visual content. Extensive experiments are conducted across three benchmark datasets, systematically assessing the recognition accuracy of these models. The results show that LLMs, especially when enhanced with well-designed prompts, significantly improve event-based zero-shot recognition performance. Notably, GPT-4o outperforms the compared models and exceeds the recognition accuracy of state-of-the-art event-based zero-shot methods on N-ImageNet by five orders of magnitude. The implementation of this paper is available at https://github.com/ChrisYu-Zz/Pure-event-based-recognition-based-LLM.

Submitted 15 September, 2024; originally announced September 2024.

arXiv:2409.08681 [pdf, other]

SLIM: Scalable and Lightweight LiDAR Mapping in Urban Environments

Authors: Zehuan Yu, Zhijian Qiao, Wenyi Liu, Huan Yin, Shaojie Shen

Abstract: LiDAR point cloud maps are extensively utilized on roads for robot navigation due to their high consistency. However, dense point clouds face challenges of high memory consumption and reduced maintainability for long-term operations. In this study, we introduce SLIM, a scalable and lightweight mapping system for long-term LiDAR mapping in urban environments. The system begins by parameterizing structural point clouds into lines and planes. These lightweight and structural representations meet the requirements of map merging, pose graph optimization, and bundle adjustment, ensuring incremental management and local consistency. For long-term operations, a map-centric nonlinear factor recovery method is designed to sparsify poses while preserving mapping accuracy. We validate the SLIM system with multi-session real-world LiDAR data from classical LiDAR mapping datasets, including KITTI, NCLT, and HeLiPR. The experiments demonstrate its capabilities in mapping accuracy, lightweightness, and scalability. Map re-use is also verified through map-based robot localization. Ultimately, with multi-session LiDAR data, the SLIM system provides a globally consistent map with low memory consumption (130 KB/km). We have made our code open-source to benefit the community.

Submitted 13 September, 2024; originally announced September 2024.

Comments: 20 pages, 16 figures

arXiv:2409.08572 [pdf, other]

DiffFAS: Face Anti-Spoofing via Generative Diffusion Models

Authors: Xinxu Ge, Xin Liu, Zitong Yu, Jingang Shi, Chun Qi, Jie Li, Heikki Kälviäinen

Abstract: Face anti-spoofing (FAS) plays a vital role in protecting face recognition (FR) systems from presentation attacks. Nowadays, FAS systems face the challenge of domain shift, which impacts the generalization performance of existing FAS methods. In this paper, we rethink the inherent nature of domain shift and deconstruct it into two factors: image style and image quality. Quality influences the purity of the presentation of spoof information, while style affects the manner in which spoof information is presented. Based on our analysis, we propose the DiffFAS framework, which quantifies quality as prior information input into the network to counter image quality shift, and performs diffusion-based high-fidelity cross-domain and cross-attack-type generation to counter image style shift. DiffFAS transforms easily collectible live faces into high-fidelity attack faces with precise labels while maintaining consistency between live and spoof face identities, which can also alleviate the scarcity of labeled data for the novel attack types faced by today's FAS systems. We demonstrate the effectiveness of our framework on challenging cross-domain and cross-attack FAS datasets, achieving state-of-the-art performance. Available at https://github.com/murphytju/DiffFAS.

Submitted 13 September, 2024; originally announced September 2024.

Comments: ECCV 24

arXiv:2409.07800 [pdf, ps, other]

Large deviation inequalities for the nonlinear unbalanced urn model

Authors: Jianan Shi, Zhenhong Yu, Yu Miao

Abstract: In the present paper, we consider the two-color nonlinear unbalanced urn model, under a drawing rule reinforced by an $\mathbb{R}^+$-valued concave function and an unbalanced replacement matrix. The large deviation inequalities for the nonlinear unbalanced urn model are established by using the stochastic approximation theory. As an auxiliary theory, we give a specific large deviation inequality for a general stochastic approximation algorithm.

Submitted 12 September, 2024; originally announced September 2024.

MSC Class: 60F10; 62L20
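
The urn dynamics described in the abstract above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' setup: the drawing probability is taken to be f(W) / (f(W) + f(B)) for white and black counts W and B, the concave function f is chosen as the square root, and the replacement matrix values, the row convention (row 0 is added after a white draw, row 1 after a black draw), and the name `simulate_urn` are all hypothetical. The rows sum to 3 and 4, which is what makes this example matrix unbalanced.

```python
import math
import random


def simulate_urn(steps, white=1.0, black=1.0, f=math.sqrt,
                 replacement=((2.0, 1.0), (1.0, 3.0)), seed=0):
    """Simulate a two-color urn with a nonlinear (reinforced) drawing rule.

    Assumed rule: a white ball is drawn with probability
    f(white) / (f(white) + f(black)); the matching row of the
    replacement matrix is then added to the (white, black) counts.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    for _ in range(steps):
        p_white = f(white) / (f(white) + f(black))
        row = replacement[0] if rng.random() < p_white else replacement[1]
        white += row[0]
        black += row[1]
    return white, black


w, b = simulate_urn(1000)
print("white fraction after 1000 draws:", w / (w + b))
```

Because the row sums differ, the total number of balls after n draws is itself random, which is the feature that separates the unbalanced case from the balanced one.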

arXiv:2409.07761 [pdf]

CTLESS: A scatter-window projection and deep learning-based transmission-less attenuation compensation method for myocardial perfusion SPECT

Authors: Zitong Yu, Md Ashequr Rahman, Craig K. Abbey, Richard Laforest, Nancy A. Obuchowski, Barry A. Siegel, Abhinav K. Jha

Abstract: Attenuation compensation (AC), while being beneficial for visual-interpretation tasks in myocardial perfusion imaging (MPI) by SPECT, typically requires the availability of a separate X-ray CT component, leading to additional radiation dose, higher costs, and potentially inaccurate diagnosis due to SPECT/CT misalignment. To address these issues, we developed a method for cardiac SPECT AC using deep learning and emission scatter-window photons without a separate transmission scan (CTLESS). In this method, an estimated attenuation map reconstructed from scatter-energy window projections is segmented into different regions using a multi-channel input multi-decoder network trained on CT scans. Pre-defined attenuation coefficients are assigned to these regions, yielding the attenuation map used for AC. We objectively evaluated this method in a retrospective study with anonymized clinical SPECT/CT stress MPI images on the clinical task of detecting defects with an anthropomorphic model observer. CTLESS yielded statistically non-inferior performance compared to a CT-based AC (CTAC) method and significantly outperformed a non-AC (NAC) method on this clinical task. Similar results were observed in stratified analyses with different sexes, defect extents and severities. The method was observed to generalize across two SPECT scanners, each with a different camera. In addition, CTLESS yielded similar performance to CTAC and outperformed the NAC method on the metrics of root mean squared error and structural similarity index measure. Moreover, as we reduced the training dataset size, CTLESS yielded relatively stable AUC values and generally outperformed another DL-based AC method that directly estimated the attenuation coefficient within each voxel. These results demonstrate the capability of the CTLESS method for transmission-less AC in SPECT and motivate further clinical evaluation.

Submitted 12 September, 2024; originally announced September 2024.

arXiv:2409.06882 [pdf, other]

Azimuthal modulations and extraction of generalized parton distributions

Authors: Jian-Wei Qiu, Nobuo Sato, Zhite Yu

Abstract: Azimuthal modulations are crucial for the phenomenological extraction and separation of various generalized parton distributions. We provide a new choice of frame and corresponding formalism to describe the azimuthal distributions, based on the separation of physics occurring at different momentum scales. We demonstrate that this new description is not only well-suited for experimental analysis, but also advantageous in separating contributions from different subprocesses to the same physical cross section.

Submitted 10 September, 2024; originally announced September 2024.

Comments: 5 pages, 3 figures, plus a 3-page supplemental material

Report number: JLAB-THY-24-4181

arXiv:2409.06659 [pdf, other]

Amortized Stabilizer Rényi Entropy of Quantum Dynamics

Authors: Chengkai Zhu, Yu-Ao Chen, Zanqiu Shen, Zhiping Liu, Zhan Yu, Xin Wang

Abstract: Unraveling the secrets of how much nonstabilizerness a quantum dynamic can generate is crucial for harnessing the power of magic states, the essential resources for achieving quantum advantage and realizing fault-tolerant quantum computation. In this work, we introduce the amortized $α$-stabilizer Rényi entropy, a magic monotone for unitary operations that quantifies the nonstabilizerness generation capability of quantum dynamics. Amortization is key in quantifying the magic of quantum dynamics, as we reveal that nonstabilizerness generation can be enhanced by prior nonstabilizerness in input states when considering the $α$-stabilizer Rényi entropy, while this is not the case for robustness of magic or stabilizer extent. We demonstrate the versatility of the amortized $α$-stabilizer Rényi entropy in investigating the nonstabilizerness resources of quantum dynamics of computational and fundamental interest. In particular, we establish improved lower bounds on the $T$-count of quantum Fourier transforms and the quantum evolutions of one-dimensional Heisenberg Hamiltonians, showcasing the power of this tool in studying quantum advantages and the corresponding cost in fault-tolerant quantum computation.

Submitted 10 September, 2024; originally announced September 2024.

Comments: 5 + 7 pages, 2 figures
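
As background for the entry above, the ordinary (state-level) $α$-stabilizer Rényi entropy can be computed by hand for a single qubit. The sketch below assumes the commonly used definition M_α(ψ) = (1 - α)^{-1} log Σ_P Ξ_P^α - log d with Ξ_P = ⟨ψ|P|ψ⟩² / d, summed over the Pauli operators P, with d = 2 for one qubit and natural logarithms. It does not compute the amortized quantity for unitaries that the paper introduces, and the function names are hypothetical.

```python
import cmath
import math

# Single-qubit Pauli matrices as nested lists.
PAULIS = {
    "I": [[1, 0], [0, 1]],
    "X": [[0, 1], [1, 0]],
    "Y": [[0, -1j], [1j, 0]],
    "Z": [[1, 0], [0, -1]],
}


def expectation(state, op):
    """Return the real expectation value <psi|op|psi> for a 2-component state."""
    psi = [complex(a) for a in state]
    op_psi = [sum(op[i][j] * psi[j] for j in range(2)) for i in range(2)]
    return sum(psi[i].conjugate() * op_psi[i] for i in range(2)).real


def stabilizer_renyi_entropy(state, alpha=2.0):
    """Stabilizer Renyi entropy of a normalized single-qubit pure state (alpha != 1)."""
    d = 2
    xi = [expectation(state, p) ** 2 / d for p in PAULIS.values()]
    total = sum(x ** alpha for x in xi if x > 0.0)
    return math.log(total) / (1.0 - alpha) - math.log(d)


zero_state = [1.0, 0.0]  # a stabilizer state
t_state = [1 / math.sqrt(2), cmath.exp(1j * math.pi / 4) / math.sqrt(2)]  # magic state
print(stabilizer_renyi_entropy(zero_state))
print(stabilizer_renyi_entropy(t_state))
```

For α = 2 the stabilizer state |0⟩ should give a value close to 0, and the T-type magic state should give a value close to log(4/3), roughly 0.288.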

arXiv:2409.06209 [pdf, other]

Adaptive Transformer Modelling of Density Function for Nonparametric Survival Analysis

Authors: Xin Zhang, Deval Mehta, Yanan Hu, Chao Zhu, David Darby, Zhen Yu, Daniel Merlo, Melissa Gresle, Anneke Van Der Walt, Helmut Butzkueven, Zongyuan Ge

Abstract: Survival analysis holds a crucial role across diverse disciplines, such as economics, engineering and healthcare. It empowers researchers to analyze both time-invariant and time-varying data, encompassing phenomena like customer churn, material degradation and various medical outcomes. Given the complexity and heterogeneity of such data, recent endeavors have demonstrated successful integration of deep learning methodologies to address limitations in conventional statistical approaches. However, current methods typically involve cluttered probability distribution function (PDF), have lower sensitivity in censoring prediction, only model static datasets, or only rely on recurrent neural networks for dynamic modelling. In this paper, we propose a novel survival regression method capable of producing high-quality unimodal PDFs without any prior distribution assumption, by optimizing novel Margin-Mean-Variance loss and leveraging the flexibility of Transformer to handle both temporal and non-temporal data, coined UniSurv. Extensive experiments on several datasets demonstrate that UniSurv places a significantly higher emphasis on censoring compared to other methods.

Submitted 10 September, 2024; originally announced September 2024.

arXiv:2409.05522 [pdf, other]

Design and Implementation of TAO DAQ System

Authors: Shuihan Zhang, Chao Chen, Xiaolu Ji, Fei Li, Yu Peng, Fabrizio Petrucci, Yinhui Wu, Zezhong Yu, Tingxuan Zeng, Kejun Zhu

Abstract: Purpose: The Taishan Antineutrino Observatory (TAO) is a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO), also known as JUNO-TAO. Located close to one of the reactors of the Taishan Nuclear Power Plant, TAO will measure the antineutrino energy spectrum precisely as a reference spectrum for JUNO. The data acquisition (DAQ) system is designed to acquire data from the TAO readout electronics and process it with software trigger and data compression algorithms. The data storage bandwidth is limited by the onsite network to be less than 100 Mb/s. Methods: The system is designed based on a distributed architecture, with fully decoupled modules to facilitate customized design and implementation. It is divided into two main components: the data flow system and the online software. The online software serves as the foundation, providing the electronics configuration, the process management, the run control, and the information sharing. The data flow system facilitates continuous data acquisition from various electronic boards or trigger systems, assembles and processes raw data, and ultimately stores it on the disk. Results: The core functionality of the system has been designed and developed. The usability of the data flow system interface and the software trigger results have been verified during the pre-installation testing phase. Conclusion: The DAQ system has been deployed for the TAO experiment. It has also successfully been applied to the integration test of the detector and electronics prototypes.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.05086 [pdf, other]

Exploring the Optimal Size of Grid-forming Energy Storage in an Off-grid Renewable P2H System under Multi-timescale Energy Management

Authors: Jie Zhu, Yiwei Qiu, Yangjun Zeng, Yi Zhou, Shi Chen, Tianlei Zang, Buxiang Zhou, Zhipeng Yu, Jin Lin

Abstract: Utility-scale off-grid renewable power-to-hydrogen systems (OReP2HSs) typically include photovoltaic plants, wind turbines, electrolyzers (ELs), and energy storage systems. As an island system, OReP2HS requires at least one component, generally the battery energy storage system (BESS), that operates for grid-forming control to provide frequency and voltage references and regulate them through transient power support and short-term energy balance regulation. While larger BESS capacity increases this ability, it also raises investment costs. This paper proposes a framework of layered multi-timescale energy management system (EMS) and evaluates the most cost-effective size of the grid-forming BESS in the OReP2HS. The proposed EMS covers the timescales ranging from those for power system transient behaviors to intra-day scheduling, coordinating renewable power, BESS, and ELs. Then, an iterative search procedure based on high-fidelity simulation is employed to determine the size of the BESS with minimal levelized cost of hydrogen (LCOH). Simulations over a reference year, based on the data from a planned OReP2HS project in Inner Mongolia, China, show that with the proposed EMS, the base-case optimal LCOH is 33.212 CNY/kg (4.581 USD/kg). The capital expenditure of the BESS accounts for 17.83% of the total, and the optimal BESS size accounts for 13.6% of the rated hourly energy output of power sources. Sensitivity analysis reveals that by reducing the electrolytic load adjustment time step from 90 to 5 s and increasing its ramping limit from 1% to 10% rated power per second, the BESS size decreases by 53.57%, and the LCOH decreases to 25.458 CNY/kg (3.511 USD/kg). Considering the cost of designing and manufacturing utility-scale ELs with fast load regulation capability, a load adjustment time step of 5-10 s and a ramping limit of 4-6% rated power per second are recommended.

Submitted 8 September, 2024; originally announced September 2024.

arXiv:2409.04381 [pdf]

Enhancing Skin Lesion Diagnosis with Ensemble Learning

Authors: Xiaoyi Liu, Zhou Yu, Lianghao Tan, Yafeng Yan, Ge Shi

Abstract: Skin lesions are an increasingly significant medical concern, varying widely in severity from benign to cancerous. Accurate diagnosis is essential for ensuring timely and appropriate treatment. This study examines the implementation of deep learning methods to assist in the diagnosis of skin lesions using the HAM10000 dataset, which contains seven distinct types of lesions. First, we evaluated three pre-trained models: MobileNetV2, ResNet18, and VGG11, achieving accuracies of 0.798, 0.802, and 0.805, respectively. To further enhance classification accuracy, we developed ensemble models employing max voting, average voting, and stacking, resulting in accuracies of 0.803, 0.82, and 0.83. Building on the best-performing ensemble learning model, stacking, we developed our proposed model, SkinNet, which incorporates a customized architecture and fine-tuning, achieving an accuracy of 0.867 and an AUC of 0.96. This substantial improvement over individual models demonstrates the effectiveness of ensemble learning in improving skin lesion classification.

Submitted 6 September, 2024; originally announced September 2024.
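
Two of the ensemble schemes named in the abstract above, max voting and average voting, can be sketched in a few lines. This is a generic illustration rather than the paper's implementation: the probabilities are made-up outputs of three hypothetical models over three classes (the dataset in the abstract has seven), the function names are invented for the example, and stacking is left out because it needs a trained meta-learner.

```python
from collections import Counter


def max_vote(labels):
    """Return the most common hard label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def average_vote(prob_rows):
    """Average per-class probabilities across models and return the argmax class."""
    n_models = len(prob_rows)
    n_classes = len(prob_rows[0])
    mean = [sum(row[c] for row in prob_rows) / n_models for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: mean[c])


# Hypothetical softmax outputs of three models for a single image.
probs = [
    [0.90, 0.10, 0.00],  # model A is confident in class 0
    [0.40, 0.60, 0.00],  # model B leans towards class 1
    [0.45, 0.55, 0.00],  # model C leans towards class 1
]
hard_labels = [max(range(3), key=lambda c: row[c]) for row in probs]

print("max voting:", max_vote(hard_labels))    # -> 1 (two of three models say class 1)
print("average voting:", average_vote(probs))  # -> 0 (mean probability favors class 0)
```

The example is chosen so the two schemes disagree: max voting counts only each model's top label, while average voting lets one confident model outweigh two hesitant ones.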

arXiv:2409.04240 [pdf, other]

Network reconstruction may not mean dynamics prediction

Authors: Zhendong Yu, Haiping Huang

Abstract: With an increasing amount of observations on the dynamics of many complex systems, it is required to reveal the underlying mechanisms behind these complex dynamics, which is fundamentally important in many scientific fields such as climate, financial, ecological, and neural systems. The underlying mechanisms are commonly encoded into network structures, e.g., capturing how constituents interact with each other to produce emergent behavior. Here, we address whether a good network reconstruction suggests a good dynamics prediction. The answer is quite dependent on the nature of the supplied (observed) dynamics sequences measured on the complex system. When the dynamics are not chaotic, network reconstruction implies dynamics prediction. In contrast, even if a network can be well reconstructed from the chaotic time series (chaos means that many unstable dynamics states coexist), the prediction of the future dynamics can become impossible as at some future point the prediction error will be amplified. This is explained by using dynamical mean-field theory on a toy model of random recurrent neural networks.

Submitted 6 September, 2024; originally announced September 2024.

Comments: 27 pages, 9 figures
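
The contrast drawn in the abstract above, between error amplification in chaotic dynamics and its absence in non-chaotic dynamics, can be seen in a much simpler system than the paper's random recurrent networks. The sketch below swaps in the one-dimensional logistic map as a stand-in; the parameter values (r = 4.0 for a chaotic regime, r = 2.5 for a stable fixed point), the initial error of 1e-10, and the function names are assumptions made for illustration only.

```python
def logistic_step(x, r):
    """One step of the logistic map x -> r * x * (1 - x)."""
    return r * x * (1.0 - x)


def error_growth(r, x0=0.2, eps=1e-10, steps=50):
    """Track |x_t - y_t| for two trajectories that start eps apart.

    The second trajectory plays the role of a prediction made by a model
    that knows the dynamics exactly but the initial state only up to eps.
    """
    x, y = x0, x0 + eps
    errors = [abs(x - y)]
    for _ in range(steps):
        x, y = logistic_step(x, r), logistic_step(y, r)
        errors.append(abs(x - y))
    return errors


chaotic = error_growth(r=4.0)  # chaotic regime: the error grows until it saturates
stable = error_growth(r=2.5)   # stable fixed point: the error shrinks
for t in (0, 10, 20, 30, 40, 50):
    print(t, chaotic[t], stable[t])
```

Even with the map known exactly, the chaotic run loses all predictive value within a few dozen steps, while the non-chaotic run forgets the initial error.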

arXiv:2409.03501 [pdf, other]

Towards Data-Centric Face Anti-Spoofing: Improving Cross-domain Generalization via Physics-based Data Synthesis

Authors: Rizhao Cai, Cecelia Soh, Zitong Yu, Haoliang Li, Wenhan Yang, Alex Kot

Abstract: Face Anti-Spoofing (FAS) research is challenged by the cross-domain problem, where there is a domain gap between the training and testing data. While recent FAS works are mainly model-centric, focusing on developing domain generalization algorithms for improving cross-domain performance, data-centric research for face anti-spoofing, improving generalization from data quality and quantity, is largely ignored. Therefore, our work starts with data-centric FAS by conducting a comprehensive investigation from the data perspective for improving cross-domain generalization of FAS models. More specifically, at first, based on physical procedures of capturing and recapturing, we propose task-specific FAS data augmentation (FAS-Aug), which increases data diversity by synthesizing data of artifacts, such as printing noise, color distortion, moiré pattern, etc. Our experiments show that using our FAS augmentation can surpass traditional image augmentation in training FAS models to achieve better cross-domain performance. Nevertheless, we observe that models may rely on the augmented artifacts, which are not environment-invariant, and using FAS-Aug may have a negative effect. As such, we propose Spoofing Attack Risk Equalization (SARE) to prevent models from relying on certain types of artifacts and improve the generalization performance. Last but not least, our proposed FAS-Aug and SARE with recent Vision Transformer backbones can achieve state-of-the-art performance on the FAS cross-domain generalization protocols. The implementation is available at https://github.com/RizhaoCai/FAS_Aug.

Submitted 3 September, 2024; originally announced September 2024.

Comments: Accepted by International Journal of Computer Vision (IJCV) in Sept 2024

arXiv:2409.03368 [pdf, other]

Training-free Conversion of Pretrained ANNs to SNNs for Low-Power and High-Performance Applications

Authors: Tong Bu, Maohua Li, Zhaofei Yu

Abstract: Spiking Neural Networks (SNNs) have emerged as a promising substitute for Artificial Neural Networks (ANNs) due to their advantages of fast inference and low power consumption. However, the lack of efficient training algorithms has hindered their widespread adoption. Existing supervised learning algorithms for SNNs require significantly more memory and time than their ANN counterparts. Even commonly used ANN-SNN conversion methods necessitate re-training of ANNs to enhance conversion efficiency, incurring additional computational costs. To address these challenges, we propose a novel training-free ANN-SNN conversion pipeline. Our approach directly converts pre-trained ANN models into high-performance SNNs without additional training. The conversion pipeline includes a local-learning-based threshold balancing algorithm, which enables efficient calculation of the optimal thresholds and fine-grained adjustment of threshold value by channel-wise scaling. We demonstrate the scalability of our framework across three typical computer vision tasks: image classification, semantic segmentation, and object detection. This showcases its applicability to both classification and regression tasks. Moreover, we have evaluated the energy consumption of the converted SNNs, demonstrating their superior low-power advantage compared to conventional ANNs. Our training-free algorithm outperforms existing methods, highlighting its practical applicability and efficiency. This approach simplifies the deployment of SNNs by leveraging open-source pre-trained ANN models and neuromorphic hardware, enabling fast, low-power inference with negligible performance reduction.

Submitted 5 September, 2024; originally announced September 2024.

arXiv:2409.00803 [pdf]

Broadband light extraction from near-surface NV centers using crystalline-silicon antennas

Authors: Minjeong Kim, Maryam Zahedian, Wenxin Wu, Chengyu Fang, Zhaoning Yu, Raymond A. Wambold, Shenwei Yin, David A. Czaplewski, Jennifer T. Choy, Mikhail A. Kats

Abstract: We use crystalline silicon (Si) antennas to efficiently extract broadband single-photon fluorescence from shallow nitrogen-vacancy (NV) centers in diamond into free space. Our design features relatively easy-to-pattern high-index Si resonators on the diamond surface to boost photon extraction by overcoming total internal reflection and Fresnel reflection at the diamond-air interface, and providing modest Purcell enhancement, without etching or otherwise damaging the diamond surface. In simulations, ~20 times more single photons are collected from a single NV center compared to the case without the antenna; in experiments, we observe an enhancement of ~4 times, limited by spatial alignment between the NV and the antenna. Our approach can be readily applied to other color centers in diamond, and more generally to the extraction of light from quantum emitters in wide-bandgap materials.

Submitted 1 September, 2024; originally announced September 2024.

Comments: Main text + supplementary

arXiv:2409.00694 [pdf, other]

IAFI-FCOS: Intra- and across-layer feature interaction FCOS model for lesion detection of CT images

Authors: Qiu Guan, Mengjie Pan, Feng Chen, Zhiqiang Yang, Zhongwen Yu, Qianwei Zhou, Haigen Hu

Abstract: Effective lesion detection in medical images relies not only on the features of the lesion region but also on the surrounding information. However, most current methods do not fully utilize it. Moreover, the multi-scale feature fusion mechanisms of most traditional detectors are unable to transmit detail information without loss, which makes it hard to detect small lesions and lesions with ambiguous boundaries in early-stage disease. To address these issues, we propose a novel intra- and across-layer feature interaction FCOS model (IAFI-FCOS) with a multi-scale feature fusion mechanism, ICAF-FPN, which is a network structure with an intra-layer context augmentation (ICA) block and an across-layer feature weighting (AFW) block. The traditional FCOS detector is thus optimized by enriching the feature representation from two perspectives. Specifically, the ICA block utilizes dilated attention to augment the context information in order to capture long-range dependencies between the lesion region and its surroundings. The AFW block utilizes a dual-axis attention mechanism and a weighting operation to obtain efficient across-layer interaction features, enhancing the representation of detailed features. Our approach has been extensively evaluated on both a private pancreatic lesion dataset and the public DeepLesion dataset; our model achieves SOTA results on the pancreatic lesion dataset.

Submitted 1 September, 2024; originally announced September 2024.

Comments: 2024 IJCNN

arXiv:2409.00366 [pdf, other]

Mini-Proceedings of the "Fourth International Workshop on the Extension Project for the J-PARC Hadron Experimental Facility (HEF-ex 2024)"

Authors: P. Achenbach, K. Aoki, S. Aoki, C. Curceanu, S. Diehl, T. Doi, M. Endo, M. Fujita, T. Fukuda, H. Garcia-Tecocoatzi, L. S. Geng, T. Gunji, C. Hanhart, M. Harada, T. Harada, S. Hayakawa, B. R. He, E. Hiyama, R. Honda, Y. Ichikawa, M. Isaka, D. Jido, A. Jinno, K. Kamada, Y. Kamiya , et al. (36 additional authors not shown)

Abstract: The mini proceedings of the "Fourth International Workshop on the Extension Project for the J-PARC Hadron Experimental Facility (HEF-ex 2024) [https://kds.kek.jp/event/46965]" held at J-PARC, February 19-21, 2024, are presented. The workshop was devoted to discussing the physics case that connects both the present and the future Hadron Experimental Facility at J-PARC, covering a wide range of topics in flavor, hadron, and nuclear physics related to both experimental and theoretical activities being conducted at the facility.

Submitted 31 August, 2024; originally announced September 2024.

arXiv:2409.00239 [pdf, other]

Quantum algorithms for hypergraph simplex finding

Authors: Zhiying Yu, Shalev Ben-David

Abstract: We study the quantum query algorithms for simplex finding, a generalization of triangle finding to hypergraphs. This problem satisfies a rank-reduction property: a quantum query algorithm for finding simplices in rank-$r$ hypergraphs can be turned into a faster algorithm for finding simplices in rank-$(r-1)$ hypergraphs. We then show that every nested Johnson graph quantum walk (with any constant number of nested levels) can be converted into an adaptive learning graph. Then, we introduce the concept of $α$-symmetric learning graphs, which is a useful framework for designing and analyzing complex quantum search algorithms. Inspired by the work of Le Gall, Nishimura, and Tani (2016) on $3$-simplex finding, we use our new technique to obtain an algorithm for $4$-simplex finding in rank-$4$ hypergraphs with $O(n^{2.46})$ quantum query cost, improving the trivial $O(n^{2.5})$ algorithm.

Submitted 30 August, 2024; originally announced September 2024.

Comments: 31 pages

arXiv:2408.16712 [pdf, ps, other]

Correlators of long strings on AdS$_3\times$S$^3\times$T$^4$

Authors: Zhe-fei Yu, Cheng Peng

Abstract: In this work, we calculate correlators of long strings on AdS$_3\times$S$^3\times$T$^4$ with pure NS-NS flux. We first construct physical vertex operators that correspond to long strings. Due to the GSO projection, they depend on the parity of the spectral flow parameter $w$. For a given $w$, we construct the physical operators that have the lowest space-time weights in both the NS and R sector. Then, we calculate three point correlators for each possible type of parities of spectral flows. We find that the recursion relations of correlators in the bosonic SL$(2,\mathbb{R})$ WZW model can be understood from the equivalence of these superstring correlators with different picture choices. Furthermore, after carefully mapping the vertex operators to appropriate operators in the dual CFT, we find that once the fermionic contributions together with the picture changing effects are correctly taken into account, some mathematical identities of covering maps lead to the matching of the correlators of the two sides. We check this explicitly at the leading order in the conformal perturbation computation and conjecture that this remains correct to all orders.

Submitted 3 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

Comments: 57 pages, 4 tables; v2: typos corrected

arXiv:2408.16455 [pdf, other]

Addressing the Mutual Interference in Uplink ISAC Receivers: A Projection Method

Authors: Zhiyuan Yu, Hong Ren, Cunhua Pan, Gui Zhou, Ruizhe Wang, Mengyu Liu, Jiangzhou Wang

Abstract: Dual function radar and communication (DFRC) is a promising research direction within integrated sensing and communication (ISAC), improving hardware and spectrum efficiency by merging sensing and communication (S&C) functionalities into a shared platform. However, the DFRC receiver (DFRC-R) is tasked with both uplink communication signal detection and simultaneous target-related parameter estimation from the echoes, leading to issues with mutual interference. In this paper, a projection-based scheme is proposed to equivalently transform the joint signal detection and target estimation problem into a joint signal detection process across multiple snapshots. Compared with conventional successive interference cancellation (SIC) schemes, our proposed approach achieves a higher signal-to-noise ratio (SNR), and a higher ergodic rate when the radar signal is non-negligible. Nonetheless, it introduces an ill-conditioned signal detection problem, which is addressed using a non-linear detector. By jointly processing an increased number of snapshots, the proposed scheme can achieve high S&C performance simultaneously.

Submitted 29 August, 2024; originally announced August 2024.

Comments: 5 pages, 3 figures, accepted by IEEE WCL

arXiv:2408.16200 [pdf, other]

PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View

Authors: Zichen Yu, Quanli Liu, Wei Wang, Liyong Zhang, Xiaoguang Zhao

Abstract: Recently, LSS-based multi-view 3D object detection has provided an economical and deployment-friendly solution for autonomous driving. However, all the existing LSS-based methods transform multi-view image features into a Cartesian Bird's-Eye-View (BEV) representation, which does not take into account the non-uniform image information distribution and hardly exploits the view symmetry. In this paper, in order to adapt to the image information distribution and preserve the view symmetry by regular convolution, we propose to employ the polar BEV representation to substitute the Cartesian BEV representation. To achieve this, we elaborately tailor three modules: a polar view transformer to generate the polar BEV representation, a polar temporal fusion module for fusing historical polar BEV features and a polar detection head to predict the polar-parameterized representation of the object. In addition, we design a 2D auxiliary detection head and a spatial attention enhancement module to improve the quality of feature extraction in perspective view and BEV, respectively. Finally, we integrate the above improvements into a novel multi-view 3D object detector, PolarBEVDet. Experiments on nuScenes show that PolarBEVDet achieves superior performance. The code is available at https://github.com/Yzichen/PolarBEVDet.git.