-
MFCLIP: Multi-modal Fine-grained CLIP for Generalizable Diffusion Face Forgery Detection
Authors:
Yaning Zhang,
Tianyi Wang,
Zitong Yu,
Zan Gao,
Linlin Shen,
Shengyong Chen
Abstract:
The rapid development of photo-realistic face generation methods has raised significant concerns in society and academia, highlighting the urgent need for robust and generalizable face forgery detection (FFD) techniques. Although existing approaches mainly capture face forgery patterns using image modality, other modalities like fine-grained noises and texts are not fully explored, which limits the generalization capability of the model. In addition, most FFD methods tend to identify facial images generated by GAN, but struggle to detect unseen diffusion-synthesized ones. To address these limitations, we aim to leverage the cutting-edge foundation model, contrastive language-image pre-training (CLIP), to achieve generalizable diffusion face forgery detection (DFFD). In this paper, we propose a novel multi-modal fine-grained CLIP (MFCLIP) model, which mines comprehensive and fine-grained forgery traces across image-noise modalities via language-guided face forgery representation learning, to facilitate the advancement of DFFD. Specifically, we devise a fine-grained language encoder (FLE) that extracts fine global language features from hierarchical text prompts. We design a multi-modal vision encoder (MVE) to capture global image forgery embeddings as well as fine-grained noise forgery patterns extracted from the richest patch, and integrate them to mine general visual forgery traces. Moreover, we build an innovative plug-and-play sample pair attention (SPA) method to emphasize relevant negative pairs and suppress irrelevant ones, allowing cross-modality sample pairs to conduct more flexible alignment. Extensive experiments and visualizations show that our model outperforms the state of the art in settings such as cross-generator, cross-forgery, and cross-dataset evaluations.
Submitted 15 September, 2024;
originally announced September 2024.
-
Can Large Language Models Grasp Event Signals? Exploring Pure Zero-Shot Event-based Recognition
Authors:
Zongyou Yu,
Qiang Qu,
Xiaoming Chen,
Chen Wang
Abstract:
Recent advancements in event-based zero-shot object recognition have demonstrated promising results. However, these methods heavily depend on extensive training and are inherently constrained by the characteristics of CLIP. To the best of our knowledge, this research is the first study to explore the understanding capabilities of large language models (LLMs) for event-based visual content. We demonstrate that LLMs can achieve event-based object recognition without additional training or fine-tuning in conjunction with CLIP, effectively enabling pure zero-shot event-based recognition. Particularly, we evaluate the ability of GPT-4o / 4turbo and two other open-source LLMs to directly recognize event-based visual content. Extensive experiments are conducted across three benchmark datasets, systematically assessing the recognition accuracy of these models. The results show that LLMs, especially when enhanced with well-designed prompts, significantly improve event-based zero-shot recognition performance. Notably, GPT-4o outperforms the compared models and exceeds the recognition accuracy of state-of-the-art event-based zero-shot methods on N-ImageNet by five orders of magnitude. The implementation of this paper is available at https://github.com/ChrisYu-Zz/Pure-event-based-recognition-based-LLM.
Submitted 15 September, 2024;
originally announced September 2024.
-
SLIM: Scalable and Lightweight LiDAR Mapping in Urban Environments
Authors:
Zehuan Yu,
Zhijian Qiao,
Wenyi Liu,
Huan Yin,
Shaojie Shen
Abstract:
LiDAR point cloud maps are extensively utilized on roads for robot navigation due to their high consistency. However, dense point clouds face challenges of high memory consumption and reduced maintainability for long-term operations. In this study, we introduce SLIM, a scalable and lightweight mapping system for long-term LiDAR mapping in urban environments. The system begins by parameterizing structural point clouds into lines and planes. These lightweight and structural representations meet the requirements of map merging, pose graph optimization, and bundle adjustment, ensuring incremental management and local consistency. For long-term operations, a map-centric nonlinear factor recovery method is designed to sparsify poses while preserving mapping accuracy. We validate the SLIM system with multi-session real-world LiDAR data from classical LiDAR mapping datasets, including KITTI, NCLT, and HeLiPR. The experiments demonstrate its capabilities in mapping accuracy, lightweightness, and scalability. Map re-use is also verified through map-based robot localization. Ultimately, with multi-session LiDAR data, the SLIM system provides a globally consistent map with low memory consumption (130 KB/km). We have made our code open-source to benefit the community.
Submitted 13 September, 2024;
originally announced September 2024.
-
DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
Authors:
Xinxu Ge,
Xin Liu,
Zitong Yu,
Jingang Shi,
Chun Qi,
Jie Li,
Heikki Kälviäinen
Abstract:
Face anti-spoofing (FAS) plays a vital role in protecting face recognition (FR) systems from presentation attacks. Nowadays, FAS systems face the challenge of domain shift, impacting the generalization performance of existing FAS methods. In this paper, we rethink the nature of domain shift and deconstruct it into two factors: image style and image quality. Quality influences the purity of the presentation of spoof information, while style affects the manner in which spoof information is presented. Based on our analysis, we propose the DiffFAS framework, which quantifies quality as prior information input into the network to counter image quality shift, and performs diffusion-based high-fidelity cross-domain and cross-attack-type generation to counter image style shift. DiffFAS transforms easily collectible live faces into high-fidelity attack faces with precise labels while maintaining consistency between live and spoof face identities, which can also alleviate the scarcity of labeled data for novel attack types faced by today's FAS systems. We demonstrate the effectiveness of our framework on challenging cross-domain and cross-attack FAS datasets, achieving state-of-the-art performance. Available at https://github.com/murphytju/DiffFAS.
Submitted 13 September, 2024;
originally announced September 2024.
-
Large deviation inequalities for the nonlinear unbalanced urn model
Authors:
Jianan Shi,
Zhenhong Yu,
Yu Miao
Abstract:
In the present paper, we consider the two-color nonlinear unbalanced urn model, under a drawing rule reinforced by an $\mathbb{R}^+$-valued concave function and an unbalanced replacement matrix. The large deviation inequalities for the nonlinear unbalanced urn model are established by using the stochastic approximation theory. As an auxiliary theory, we give a specific large deviation inequality for a general stochastic approximation algorithm.
Submitted 12 September, 2024;
originally announced September 2024.
-
CTLESS: A scatter-window projection and deep learning-based transmission-less attenuation compensation method for myocardial perfusion SPECT
Authors:
Zitong Yu,
Md Ashequr Rahman,
Craig K. Abbey,
Richard Laforest,
Nancy A. Obuchowski,
Barry A. Siegel,
Abhinav K. Jha
Abstract:
Attenuation compensation (AC), while being beneficial for visual-interpretation tasks in myocardial perfusion imaging (MPI) by SPECT, typically requires the availability of a separate X-ray CT component, leading to additional radiation dose, higher costs, and potentially inaccurate diagnosis due to SPECT/CT misalignment. To address these issues, we developed a method for cardiac SPECT AC using deep learning and emission scatter-window photons without a separate transmission scan (CTLESS). In this method, an estimated attenuation map reconstructed from scatter-energy window projections is segmented into different regions using a multi-channel input multi-decoder network trained on CT scans. Pre-defined attenuation coefficients are assigned to these regions, yielding the attenuation map used for AC. We objectively evaluated this method in a retrospective study with anonymized clinical SPECT/CT stress MPI images on the clinical task of detecting defects with an anthropomorphic model observer. CTLESS yielded statistically non-inferior performance compared to a CT-based AC (CTAC) method and significantly outperformed a non-AC (NAC) method on this clinical task. Similar results were observed in stratified analyses with different sexes, defect extents and severities. The method was observed to generalize across two SPECT scanners, each with a different camera. In addition, CTLESS yielded similar performance as CTAC and outperformed NAC method on the metrics of root mean squared error and structural similarity index measure. Moreover, as we reduced the training dataset size, CTLESS yielded relatively stable AUC values and generally outperformed another DL-based AC method that directly estimated the attenuation coefficient within each voxel. These results demonstrate the capability of the CTLESS method for transmission-less AC in SPECT and motivate further clinical evaluation.
Submitted 12 September, 2024;
originally announced September 2024.
-
Azimuthal modulations and extraction of generalized parton distributions
Authors:
Jian-Wei Qiu,
Nobuo Sato,
Zhite Yu
Abstract:
Azimuthal modulations are crucial for the phenomenological extraction and separation of various generalized parton distributions. We provide a new choice of frame and corresponding formalism to describe the azimuthal distributions, based on the separation of physics occurring at different momentum scales. We demonstrate that this new description is not only well-suited for experimental analysis, but also advantageous in separating contributions from different subprocesses to the same physical cross section.
Submitted 10 September, 2024;
originally announced September 2024.
-
Amortized Stabilizer Rényi Entropy of Quantum Dynamics
Authors:
Chengkai Zhu,
Yu-Ao Chen,
Zanqiu Shen,
Zhiping Liu,
Zhan Yu,
Xin Wang
Abstract:
Unraveling the secrets of how much nonstabilizerness a quantum dynamic can generate is crucial for harnessing the power of magic states, the essential resources for achieving quantum advantage and realizing fault-tolerant quantum computation. In this work, we introduce the amortized $α$-stabilizer Rényi entropy, a magic monotone for unitary operations that quantifies the nonstabilizerness generation capability of quantum dynamics. Amortization is key in quantifying the magic of quantum dynamics, as we reveal that nonstabilizerness generation can be enhanced by prior nonstabilizerness in input states when considering the $α$-stabilizer Rényi entropy, while this is not the case for robustness of magic or stabilizer extent. We demonstrate the versatility of the amortized $α$-stabilizer Rényi entropy in investigating the nonstabilizerness resources of quantum dynamics of computational and fundamental interest. In particular, we establish improved lower bounds on the $T$-count of quantum Fourier transforms and the quantum evolutions of one-dimensional Heisenberg Hamiltonians, showcasing the power of this tool in studying quantum advantages and the corresponding cost in fault-tolerant quantum computation.
Submitted 10 September, 2024;
originally announced September 2024.
-
Adaptive Transformer Modelling of Density Function for Nonparametric Survival Analysis
Authors:
Xin Zhang,
Deval Mehta,
Yanan Hu,
Chao Zhu,
David Darby,
Zhen Yu,
Daniel Merlo,
Melissa Gresle,
Anneke Van Der Walt,
Helmut Butzkueven,
Zongyuan Ge
Abstract:
Survival analysis holds a crucial role across diverse disciplines, such as economics, engineering and healthcare. It empowers researchers to analyze both time-invariant and time-varying data, encompassing phenomena like customer churn, material degradation and various medical outcomes. Given the complexity and heterogeneity of such data, recent endeavors have demonstrated successful integration of deep learning methodologies to address limitations in conventional statistical approaches. However, current methods typically involve cluttered probability distribution function (PDF), have lower sensitivity in censoring prediction, only model static datasets, or only rely on recurrent neural networks for dynamic modelling. In this paper, we propose a novel survival regression method capable of producing high-quality unimodal PDFs without any prior distribution assumption, by optimizing novel Margin-Mean-Variance loss and leveraging the flexibility of Transformer to handle both temporal and non-temporal data, coined UniSurv. Extensive experiments on several datasets demonstrate that UniSurv places a significantly higher emphasis on censoring compared to other methods.
Submitted 10 September, 2024;
originally announced September 2024.
-
Design and Implementation of TAO DAQ System
Authors:
Shuihan Zhang,
Chao Chen,
Xiaolu Ji,
Fei Li,
Yu Peng,
Fabrizio Petrucci,
Yinhui Wu,
Zezhong Yu,
Tingxuan Zeng,
Kejun Zhu
Abstract:
Purpose: The Taishan Antineutrino Observatory (TAO) is a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO), also known as JUNO-TAO. Located close to one of the reactors of the Taishan Nuclear Power Plant, TAO will measure the antineutrino energy spectrum precisely as a reference spectrum for JUNO. The data acquisition (DAQ) system is designed to acquire data from the TAO readout electronics and process it with software trigger and data compression algorithms. The data storage bandwidth is limited by the onsite network to be less than 100 Mb/s.
Methods: The system is designed based on a distributed architecture, with fully decoupled modules to facilitate customized design and implementation. It is divided into two main components: the data flow system and the online software. The online software serves as the foundation, providing the electronics configuration, the process management, the run control, and the information sharing. The data flow system facilitates continuous data acquisition from various electronic boards or trigger systems, assembles and processes raw data, and ultimately stores it on the disk.
Results: The core functionality of the system has been designed and developed. The usability of the data flow system interface and the software trigger results have been verified during the pre-installation testing phase.
Conclusion: The DAQ system has been deployed for the TAO experiment. It has also successfully been applied to the integration test of the detector and electronics prototypes.
Submitted 9 September, 2024;
originally announced September 2024.
-
Exploring the Optimal Size of Grid-forming Energy Storage in an Off-grid Renewable P2H System under Multi-timescale Energy Management
Authors:
Jie Zhu,
Yiwei Qiu,
Yangjun Zeng,
Yi Zhou,
Shi Chen,
Tianlei Zang,
Buxiang Zhou,
Zhipeng Yu,
Jin Lin
Abstract:
Utility-scale off-grid renewable power-to-hydrogen systems (OReP2HSs) typically include photovoltaic plants, wind turbines, electrolyzers (ELs), and energy storage systems. As an island system, OReP2HS requires at least one component, generally the battery energy storage system (BESS), that operates for grid-forming control to provide frequency and voltage references and regulate them through transient power support and short-term energy balance regulation. While larger BESS capacity increases this ability, it also raises investment costs. This paper proposes a framework of layered multi-timescale energy management system (EMS) and evaluates the most cost-effective size of the grid-forming BESS in the OReP2HS. The proposed EMS covers the timescales ranging from those for power system transient behaviors to intra-day scheduling, coordinating renewable power, BESS, and ELs. Then, an iterative search procedure based on high-fidelity simulation is employed to determine the size of the BESS with minimal levelized cost of hydrogen (LCOH). Simulations over a reference year, based on the data from a planned OReP2HS project in Inner Mongolia, China, show that with the proposed EMS, the base-case optimal LCOH is 33.212 CNY/kg (4.581 USD/kg). The capital expenditure of the BESS accounts for 17.83% of the total, and the optimal BESS size accounts for 13.6% of the rated hourly energy output of power sources. Sensitivity analysis reveals that by reducing the electrolytic load adjustment time step from 90 to 5 s and increasing its ramping limit from 1% to 10% rated power per second, the BESS size decreases by 53.57%, and the LCOH decreases to 25.458 CNY/kg (3.511 USD/kg). Considering the cost of designing and manufacturing utility-scale ELs with fast load regulation capability, a load adjustment time step of 5-10 s and a ramping limit of 4-6% rated power per second are recommended.
Submitted 8 September, 2024;
originally announced September 2024.
-
Enhancing Skin Lesion Diagnosis with Ensemble Learning
Authors:
Xiaoyi Liu,
Zhou Yu,
Lianghao Tan,
Yafeng Yan,
Ge Shi
Abstract:
Skin lesions are an increasingly significant medical concern, varying widely in severity from benign to cancerous. Accurate diagnosis is essential for ensuring timely and appropriate treatment. This study examines the implementation of deep learning methods to assist in the diagnosis of skin lesions using the HAM10000 dataset, which contains seven distinct types of lesions. First, we evaluated three pre-trained models: MobileNetV2, ResNet18, and VGG11, achieving accuracies of 0.798, 0.802, and 0.805, respectively. To further enhance classification accuracy, we developed ensemble models employing max voting, average voting, and stacking, resulting in accuracies of 0.803, 0.82, and 0.83. Building on the best-performing ensemble learning model, stacking, we developed our proposed model, SkinNet, which incorporates a customized architecture and fine-tuning, achieving an accuracy of 0.867 and an AUC of 0.96. This substantial improvement over individual models demonstrates the effectiveness of ensemble learning in improving skin lesion classification.
Submitted 6 September, 2024;
originally announced September 2024.
-
Network reconstruction may not mean dynamics prediction
Authors:
Zhendong Yu,
Haiping Huang
Abstract:
With an increasing amount of observations on the dynamics of many complex systems, it is required to reveal the underlying mechanisms behind these complex dynamics, which is fundamentally important in many scientific fields such as climate, financial, ecological, and neural systems. The underlying mechanisms are commonly encoded into network structures, e.g., capturing how constituents interact with each other to produce emergent behavior. Here, we address whether a good network reconstruction suggests a good dynamics prediction. The answer is quite dependent on the nature of the supplied (observed) dynamics sequences measured on the complex system. When the dynamics are not chaotic, network reconstruction implies dynamics prediction. In contrast, even if a network can be well reconstructed from the chaotic time series (chaos means that many unstable dynamics states coexist), the prediction of the future dynamics can become impossible as at some future point the prediction error will be amplified. This is explained by using dynamical mean-field theory on a toy model of random recurrent neural networks.
Submitted 6 September, 2024;
originally announced September 2024.
-
Towards Data-Centric Face Anti-Spoofing: Improving Cross-domain Generalization via Physics-based Data Synthesis
Authors:
Rizhao Cai,
Cecelia Soh,
Zitong Yu,
Haoliang Li,
Wenhan Yang,
Alex Kot
Abstract:
Face Anti-Spoofing (FAS) research is challenged by the cross-domain problem, where there is a domain gap between the training and testing data. While recent FAS works are mainly model-centric, focusing on developing domain generalization algorithms for improving cross-domain performance, data-centric research for face anti-spoofing, improving generalization from data quality and quantity, is largely ignored. Therefore, our work starts with data-centric FAS by conducting a comprehensive investigation from the data perspective for improving cross-domain generalization of FAS models. More specifically, at first, based on physical procedures of capturing and recapturing, we propose task-specific FAS data augmentation (FAS-Aug), which increases data diversity by synthesizing data of artifacts, such as printing noise, color distortion, moiré pattern, etc. Our experiments show that using our FAS augmentation can surpass traditional image augmentation in training FAS models to achieve better cross-domain performance. Nevertheless, we observe that models may rely on the augmented artifacts, which are not environment-invariant, and using FAS-Aug may have a negative effect. As such, we propose Spoofing Attack Risk Equalization (SARE) to prevent models from relying on certain types of artifacts and improve the generalization performance. Last but not least, our proposed FAS-Aug and SARE with recent Vision Transformer backbones can achieve state-of-the-art performance on the FAS cross-domain generalization protocols. The implementation is available at https://github.com/RizhaoCai/FAS_Aug.
Submitted 3 September, 2024;
originally announced September 2024.
-
Training-free Conversion of Pretrained ANNs to SNNs for Low-Power and High-Performance Applications
Authors:
Tong Bu,
Maohua Li,
Zhaofei Yu
Abstract:
Spiking Neural Networks (SNNs) have emerged as a promising substitute for Artificial Neural Networks (ANNs) due to their advantages of fast inference and low power consumption. However, the lack of efficient training algorithms has hindered their widespread adoption. Existing supervised learning algorithms for SNNs require significantly more memory and time than their ANN counterparts. Even commonly used ANN-SNN conversion methods necessitate re-training of ANNs to enhance conversion efficiency, incurring additional computational costs. To address these challenges, we propose a novel training-free ANN-SNN conversion pipeline. Our approach directly converts pre-trained ANN models into high-performance SNNs without additional training. The conversion pipeline includes a local-learning-based threshold balancing algorithm, which enables efficient calculation of the optimal thresholds and fine-grained adjustment of threshold value by channel-wise scaling. We demonstrate the scalability of our framework across three typical computer vision tasks: image classification, semantic segmentation, and object detection. This showcases its applicability to both classification and regression tasks. Moreover, we have evaluated the energy consumption of the converted SNNs, demonstrating their superior low-power advantage compared to conventional ANNs. Our training-free algorithm outperforms existing methods, highlighting its practical applicability and efficiency. This approach simplifies the deployment of SNNs by leveraging open-source pre-trained ANN models and neuromorphic hardware, enabling fast, low-power inference with negligible performance reduction.
Submitted 5 September, 2024;
originally announced September 2024.
-
arXiv:2409.00803
physics.optics
cond-mat.mes-hall
cond-mat.mtrl-sci
physics.app-ph
quant-ph
Broadband light extraction from near-surface NV centers using crystalline-silicon antennas
Authors:
Minjeong Kim,
Maryam Zahedian,
Wenxin Wu,
Chengyu Fang,
Zhaoning Yu,
Raymond A. Wambold,
Shenwei Yin,
David A. Czaplewski,
Jennifer T. Choy,
Mikhail A. Kats
Abstract:
We use crystalline silicon (Si) antennas to efficiently extract broadband single-photon fluorescence from shallow nitrogen-vacancy (NV) centers in diamond into free space. Our design features relatively easy-to-pattern high-index Si resonators on the diamond surface to boost photon extraction by overcoming total internal reflection and Fresnel reflection at the diamond-air interface, and providing modest Purcell enhancement, without etching or otherwise damaging the diamond surface. In simulations, ~20 times more single photons are collected from a single NV center compared to the case without the antenna; in experiments, we observe an enhancement of ~4 times, limited by spatial alignment between the NV and the antenna. Our approach can be readily applied to other color centers in diamond, and more generally to the extraction of light from quantum emitters in wide-bandgap materials.
Submitted 1 September, 2024;
originally announced September 2024.
-
IAFI-FCOS: Intra- and across-layer feature interaction FCOS model for lesion detection of CT images
Authors:
Qiu Guan,
Mengjie Pan,
Feng Chen,
Zhiqiang Yang,
Zhongwen Yu,
Qianwei Zhou,
Haigen Hu
Abstract:
Effective lesion detection in medical images relies not only on the features of the lesion region but also on the surrounding information. However, most current methods do not fully utilize it. Moreover, the multi-scale feature fusion mechanisms of most traditional detectors are unable to transmit detail information without loss, which makes it hard to detect small lesions and lesions with ambiguous boundaries in early-stage disease. To address these issues, we propose a novel intra- and across-layer feature interaction FCOS model (IAFI-FCOS) with a multi-scale feature fusion mechanism, ICAF-FPN, a network structure with an intra-layer context augmentation (ICA) block and an across-layer feature weighting (AFW) block. The traditional FCOS detector is thus optimized by enriching the feature representation from two perspectives. Specifically, the ICA block utilizes dilated attention to augment the context information in order to capture long-range dependencies between the lesion region and its surroundings. The AFW block utilizes a dual-axis attention mechanism and a weighting operation to obtain efficient across-layer interaction features, enhancing the representation of detailed features. Our approach has been extensively evaluated on both a private pancreatic lesion dataset and the public DeepLesion dataset, and our model achieves SOTA results on the pancreatic lesion dataset.
Submitted 1 September, 2024;
originally announced September 2024.
-
Mini-Proceedings of the "Fourth International Workshop on the Extension Project for the J-PARC Hadron Experimental Facility (HEF-ex 2024)"
Authors:
P. Achenbach,
K. Aoki,
S. Aoki,
C. Curceanu,
S. Diehl,
T. Doi,
M. Endo,
M. Fujita,
T. Fukuda,
H. Garcia-Tecocoatzi,
L. S. Geng,
T. Gunji,
C. Hanhart,
M. Harada,
T. Harada,
S. Hayakawa,
B. R. He,
E. Hiyama,
R. Honda,
Y. Ichikawa,
M. Isaka,
D. Jido,
A. Jinno,
K. Kamada,
Y. Kamiya
, et al. (36 additional authors not shown)
Abstract:
The mini proceedings of the "Fourth International Workshop on the Extension Project for the J-PARC Hadron Experimental Facility (HEF-ex 2024) [https://kds.kek.jp/event/46965]" held at J-PARC, February 19-21, 2024, are presented. The workshop was devoted to discussing the physics case that connects both the present and the future Hadron Experimental Facility at J-PARC, covering a wide range of topics in flavor, hadron, and nuclear physics related to both experimental and theoretical activities being conducted at the facility.
Submitted 31 August, 2024;
originally announced September 2024.
-
Quantum algorithms for hypergraph simplex finding
Authors:
Zhiying Yu,
Shalev Ben-David
Abstract:
We study the quantum query algorithms for simplex finding, a generalization of triangle finding to hypergraphs. This problem satisfies a rank-reduction property: a quantum query algorithm for finding simplices in rank-$r$ hypergraphs can be turned into a faster algorithm for finding simplices in rank-$(r-1)$ hypergraphs. We then show that every nested Johnson graph quantum walk (with any constant number of nested levels) can be converted into an adaptive learning graph. Then, we introduce the concept of $α$-symmetric learning graphs, which is a useful framework for designing and analyzing complex quantum search algorithms. Inspired by the work of Le Gall, Nishimura, and Tani (2016) on $3$-simplex finding, we use our new technique to obtain an algorithm for $4$-simplex finding in rank-$4$ hypergraphs with $O(n^{2.46})$ quantum query cost, improving the trivial $O(n^{2.5})$ algorithm.
Submitted 30 August, 2024;
originally announced September 2024.
-
Correlators of long strings on AdS$_3\times$S$^3\times$T$^4$
Authors:
Zhe-fei Yu,
Cheng Peng
Abstract:
In this work, we calculate correlators of long strings on AdS$_3\times$S$^3\times$T$^4$ with pure NS-NS flux. We first construct physical vertex operators that correspond to long strings. Due to the GSO projection, they depend on the parity of the spectral flow parameter $w$. For a given $w$, we construct the physical operators that have the lowest space-time weights in both the NS and R sector. Then, we calculate three point correlators for each possible type of parities of spectral flows. We find that the recursion relations of correlators in the bosonic SL$(2,\mathbb{R})$ WZW model can be understood from the equivalence of these superstring correlators with different picture choices. Furthermore, after carefully mapping the vertex operators to appropriate operators in the dual CFT, we find that once the fermionic contributions together with the picture changing effects are correctly taken into account, some mathematical identities of covering maps lead to the matching of the correlators of the two sides. We check this explicitly at the leading order in the conformal perturbation computation and conjecture that this remains correct to all orders.
Submitted 3 September, 2024; v1 submitted 29 August, 2024;
originally announced August 2024.
-
Addressing the Mutual Interference in Uplink ISAC Receivers: A Projection Method
Authors:
Zhiyuan Yu,
Hong Ren,
Cunhua Pan,
Gui Zhou,
Ruizhe Wang,
Mengyu Liu,
Jiangzhou Wang
Abstract:
Dual function radar and communication (DFRC) is a promising research direction within integrated sensing and communication (ISAC), improving hardware and spectrum efficiency by merging sensing and communication (S&C) functionalities into a shared platform. However, the DFRC receiver (DFRC-R) is tasked with both uplink communication signal detection and simultaneous target-related parameter estimation from the echoes, leading to issues with mutual interference. In this paper, a projection-based scheme is proposed to equivalently transform the joint signal detection and target estimation problem into a joint signal detection process across multiple snapshots. Compared with conventional successive interference cancellation (SIC) schemes, our proposed approach achieves a higher signal-to-noise ratio (SNR) and a higher ergodic rate when the radar signal is non-negligible. Nonetheless, it introduces an ill-conditioned signal detection problem, which is addressed using a non-linear detector. By jointly processing an increased number of snapshots, the proposed scheme can achieve high S&C performance simultaneously.
Submitted 29 August, 2024;
originally announced August 2024.
-
PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View
Authors:
Zichen Yu,
Quanli Liu,
Wei Wang,
Liyong Zhang,
Xiaoguang Zhao
Abstract:
Recently, LSS-based multi-view 3D object detection has provided an economical and deployment-friendly solution for autonomous driving. However, all the existing LSS-based methods transform multi-view image features into a Cartesian Bird's-Eye-View (BEV) representation, which does not take into account the non-uniform image information distribution and hardly exploits the view symmetry. In this paper, in order to adapt to the image information distribution and preserve the view symmetry by regular convolution, we propose to employ the polar BEV representation as a substitute for the Cartesian BEV representation. To achieve this, we carefully tailor three modules: a polar view transformer to generate the polar BEV representation, a polar temporal fusion module for fusing historical polar BEV features, and a polar detection head to predict the polar-parameterized representation of the object. In addition, we design a 2D auxiliary detection head and a spatial attention enhancement module to improve the quality of feature extraction in perspective view and BEV, respectively. Finally, we integrate the above improvements into a novel multi-view 3D object detector, PolarBEVDet. Experiments on nuScenes show that PolarBEVDet achieves superior performance. The code is available at https://github.com/Yzichen/PolarBEVDet.git.
Submitted 28 August, 2024;
originally announced August 2024.
-
The Remarkable X-ray Spectra and Variability of the Ultraluminous Weak-Line Quasar SDSS J1521+5202
Authors:
Shouyi Wang,
W. Niel Brandt,
Bin Luo,
Zhibo Yu,
Fan Zou,
Qingling Ni,
Fabio Vito
Abstract:
We present a focused X-ray and multiwavelength study of the ultraluminous weak-line quasar (WLQ) SDSS J1521+5202, one of the few X-ray weak WLQs that is amenable to basic X-ray spectral and variability investigations. J1521+5202 shows striking X-ray variability during 2006--2023, by up to a factor of $\approx 32$ in 0.5--2 keV flux, and our new 2023 Chandra observation caught it in its brightest X-ray flux state to date. Concurrent infrared/optical observations show only mild variability. The 2023 Chandra spectrum can be acceptably described by a power law with intrinsic X-ray absorption, and it reveals a nominal intrinsic level of X-ray emission relative to its optical/ultraviolet emission. In contrast, an earlier Chandra spectrum from 2013 shows apparent spectral complexity that is not well fit by a variety of models, including ionized-absorption or standard Compton-reflection models. Overall, the observations are consistent with the thick-disk plus outflow model previously advanced for WLQs, where a nominal level of underlying X-ray emission plus variable absorption lead to the remarkable observed X-ray variability. In the case of J1521+5202 it appears likely that the outflow, and not the thick disk itself, lies along our line-of-sight and causes the X-ray absorption.
Submitted 28 August, 2024;
originally announced August 2024.
-
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Authors:
Min Shi,
Fuxiao Liu,
Shihao Wang,
Shijia Liao,
Subhashree Radhakrishnan,
De-An Huang,
Hongxu Yin,
Karan Sapra,
Yaser Yacoob,
Humphrey Shi,
Bryan Catanzaro,
Andrew Tao,
Jan Kautz,
Zhiding Yu,
Guilin Liu
Abstract:
The ability to accurately interpret complex visual information is a crucial topic of multimodal large language models (MLLMs). Recent work indicates that enhanced visual perception significantly reduces hallucinations and improves performance on resolution-sensitive tasks, such as optical character recognition and document analysis. A number of recent MLLMs achieve this goal using a mixture of vision encoders. Despite their success, there is a lack of systematic comparisons and detailed ablation studies addressing critical aspects, such as expert selection and the integration of multiple vision experts. This study provides an extensive exploration of the design space for MLLMs using a mixture of vision encoders and resolutions. Our findings reveal several underlying principles common to various existing strategies, leading to a streamlined yet effective design approach. We discover that simply concatenating visual tokens from a set of complementary vision encoders is as effective as more complex mixing architectures or strategies. We additionally introduce Pre-Alignment to bridge the gap between vision-focused encoders and language tokens, enhancing model coherence. The resulting family of MLLMs, Eagle, surpasses other leading open-source models on major MLLM benchmarks. Models and code: https://github.com/NVlabs/Eagle
Submitted 28 August, 2024;
originally announced August 2024.
-
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation
Authors:
Fangxun Shu,
Yue Liao,
Le Zhuo,
Chenning Xu,
Guanghao Zhang,
Haonan Shi,
Long Chen,
Tao Zhong,
Wanggui He,
Siming Fu,
Haoyuan Li,
Bolin Li,
Zhelun Yu,
Si Liu,
Hongsheng Li,
Hao Jiang
Abstract:
We introduce LLaVA-MoD, a novel framework designed to enable the efficient training of small-scale Multimodal Language Models (s-MLLM) by distilling knowledge from large-scale MLLM (l-MLLM). Our approach tackles two fundamental challenges in MLLM distillation. First, we optimize the network structure of s-MLLM by integrating a sparse Mixture of Experts (MoE) architecture into the language model, striking a balance between computational efficiency and model expressiveness. Second, we propose a progressive knowledge transfer strategy to ensure comprehensive knowledge migration. This strategy begins with mimic distillation, where we minimize the Kullback-Leibler (KL) divergence between output distributions to enable the student model to emulate the teacher network's understanding. Following this, we introduce preference distillation via Direct Preference Optimization (DPO), where the key lies in treating l-MLLM as the reference model. During this phase, the s-MLLM's ability to discriminate between superior and inferior examples is significantly enhanced beyond l-MLLM, leading to a better student that surpasses its teacher, particularly in hallucination benchmarks. Extensive experiments demonstrate that LLaVA-MoD outperforms existing models across various multimodal benchmarks while maintaining a minimal number of activated parameters and low computational costs. Remarkably, LLaVA-MoD, with only 2B activated parameters, surpasses Qwen-VL-Chat-7B by an average of 8.8% across benchmarks, using merely 0.3% of the training data and 23% trainable parameters. These results underscore LLaVA-MoD's ability to effectively distill comprehensive knowledge from its teacher model, paving the way for the development of more efficient MLLMs. The code will be available on: https://github.com/shufangxun/LLaVA-MoD.
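The mimic-distillation step described in this abstract, minimizing the KL divergence between teacher and student output distributions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the temperature parameter, and the choice of direction KL(teacher || student) are assumptions made for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def mimic_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Hypothetical per-token loss: KL(teacher || student).

    The direction of the divergence and the temperature are assumptions;
    the abstract only states that a KL divergence between output
    distributions is minimized.
    """
    p_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    return kl_divergence(p_teacher, q_student)
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two distributions diverge, which is what lets it serve as a training signal for the student.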
Submitted 28 August, 2024;
originally announced August 2024.
-
220 GHz Urban Microcell Channel Measurement and Characterization on a University Campus
Authors:
Yuanbo Li,
Yiqin Wang,
Yejian Lyu,
Ziming Yu,
Chong Han
Abstract:
Owing to its abundant bandwidth resources, the Terahertz (THz) band (0.1-10 THz) is envisioned as a key technology to realize ultra-high-speed communications in 6G and beyond wireless networks. However, propagation analysis and channel characterization are still insufficient for realizing reliable THz communications in urban microcell (UMi) environments. In this paper, channel measurement campaigns are conducted in a UMi scenario at 220 GHz, using a correlation-based time domain channel sounder. 24 positions are measured along a road on the university campus, with distances ranging from 34 m to 410 m. Based on the measurement results, the spatial consistency and the interaction of THz waves with the surrounding environments are analyzed. Moreover, the additional loss due to foliage blockage is calculated and an average value of 16.7 dB is observed. Furthermore, a full portrait of channel characteristics, including path loss, shadow fading, K-factor, delay and angular spreads, as well as cluster parameters, is calculated and analyzed. Specifically, an average K-factor value of 17.5 dB is measured in the line-of-sight (LoS) case, which is nearly two times larger than the extrapolated values from the 3GPP standard, revealing weak multipath effects in the THz band. Additionally, 2.5 clusters on average are observed in the LoS case, around one fifth of what is defined in the 3GPP model, which uncovers the strong sparsity in THz UMi. The results and analysis in this work can offer guidance for system design for future THz UMi networks.
Submitted 28 August, 2024;
originally announced August 2024.
-
Force-Guided Bridge Matching for Full-Atom Time-Coarsened Dynamics of Peptides
Authors:
Ziyang Yu,
Wenbing Huang,
Yang Liu
Abstract:
Molecular Dynamics (MD) simulations are irreplaceable and ubiquitous in fields such as materials science, chemistry, and pharmacology, to name a few. Conventional MD simulations are plagued by numerical instability and long equilibration times, which limit broader applications of MD simulations. Recently, a surge of deep learning approaches have been devised for time-coarsened dynamics, which learn the state transition mechanism over much larger time scales to overcome these limitations. However, only a few methods target the underlying Boltzmann distribution by resampling techniques, where proposals are rarely accepted as new states, resulting in low efficiency. In this work, we propose a force-guided bridge matching model, FBM, a novel framework that is the first to incorporate physical priors into bridge matching for full-atom time-coarsened dynamics. With the guidance of our well-designed intermediate force field, FBM can target the Boltzmann-like distribution by direct inference without extra steps. Experiments on small peptides verify its superiority in terms of comprehensive metrics and demonstrate transferability to unseen peptide systems.
Submitted 3 September, 2024; v1 submitted 27 August, 2024;
originally announced August 2024.
-
LSM-YOLO: A Compact and Effective ROI Detector for Medical Detection
Authors:
Zhongwen Yu,
Qiu Guan,
Jianmin Yang,
Zhiqiang Yang,
Qianwei Zhou,
Yang Chen,
Feng Chen
Abstract:
In existing medical Region of Interest (ROI) detection, no algorithm simultaneously satisfies both real-time performance and accuracy requirements, so the growing demand for automatic detection in medicine is not met. Although the basic YOLO framework ensures real-time detection due to its fast speed, it still faces challenges in maintaining precision concurrently. To alleviate the above problems, we propose a novel model named Lightweight Shunt Matching-YOLO (LSM-YOLO), with Lightweight Adaptive Extraction (LAE) and Multipath Shunt Feature Matching (MSFM). Firstly, by using LAE to refine feature extraction, the model can obtain more contextual information and high-resolution details from multiscale feature maps, thereby extracting detailed features of ROI in medical images while reducing the influence of noise. Secondly, MSFM is utilized to further refine the fusion of high-level semantic features and low-level visual features, enabling better fusion between ROI features and neighboring features, thereby improving the detection rate for better diagnostic assistance. Experimental results demonstrate that LSM-YOLO achieves 48.6% AP on a private dataset of pancreatic tumors, 65.1% AP on the BCCD blood cell detection public dataset, and 73.0% AP on the Br35h brain tumor detection public dataset. Our model achieves state-of-the-art performance with minimal parameter cost on the above three datasets. The source code is available at: https://github.com/VincentYuuuuuu/LSM-YOLO.
Submitted 26 August, 2024;
originally announced August 2024.
-
Advancing Gamma-Ray Burst Identification through Transfer Learning with Convolutional Neural Networks
Authors:
Peng Zhang,
Bing Li,
Ren-zhou Gui,
Shao-lin Xiong,
Yu Wang,
Yan-qiu Zhang,
Chen-wei Wang,
Jia-cong Liu,
Wang-chen Xue,
Chao Zheng,
Zheng-hang Yu,
Wen-long Zhang
Abstract:
Rapid and accurate identification of Gamma-Ray Bursts (GRBs) is crucial for unraveling their origins. However, current burst search algorithms frequently miss low-threshold signals or lack universality for observations. In this study, we propose a novel approach utilizing transfer learning based on a convolutional neural network (CNN) to establish a universal GRB identification method, which is validated successfully using GECAM-B data. By employing data augmentation techniques, we enhance the diversity and quantity of the GRB sample. We develop a 1D CNN model with a multi-scale feature cross fusion module (MSCFM) to extract features from samples and perform classification. The comparative results demonstrate significant performance improvements following pre-training and transferring on a large-scale dataset. Our optimal model achieved an accuracy of 96.41% on the source dataset of GECAM-B, and identified three previously undiscovered GRBs in comparison with manual analysis of GECAM-B observations. The transfer learning and data augmentation methods presented in this work hold promise for applications in multi-satellite exploration scenarios characterized by limited data sets and a scarcity of labeled samples in high-energy astronomy.
Submitted 24 August, 2024;
originally announced August 2024.
-
Application of first- and second-order adjoint methods to glacial isostatic adjustment incorporating rotational feedbacks
Authors:
Ziheng Yu,
David Al-Attar,
Frank Syvret,
Andrew J. Lloyd
Abstract:
This paper revisits and extends the adjoint theory for glacial isostatic adjustment (GIA) of Crawford et al. (2018). Rotational feedbacks are now incorporated, and the application of the second-order adjoint method is described for the first time. The first-order adjoint method provides an efficient means for computing sensitivity kernels for a chosen objective functional, while the second-order adjoint method provides second-derivative information in the form of Hessian kernels. These latter kernels are required by efficient Newton-type optimisation schemes and within methods for quantifying uncertainty for non-linear inverse problems. Most importantly, the entire theory has been reformulated so as to simplify its implementation by others within the GIA community. In particular, the rate-formulation for the GIA forward problem introduced by Crawford et al. (2018) has been replaced with the conventional equations for modelling GIA in laterally heterogeneous earth models. The implementation of the first- and second-order adjoint problems should be relatively easy within both existing and new GIA codes, with only the inclusions of more general force terms being required.
Submitted 24 August, 2024;
originally announced August 2024.
-
Deep Learning for Lung Disease Classification Using Transfer Learning and a Customized CNN Architecture with Attention
Authors:
Xiaoyi Liu,
Zhou Yu,
Lianghao Tan
Abstract:
Many people die from lung-related diseases every year. X-ray imaging is an effective way to test whether a patient has a lung-related disease. This study concentrates on categorizing three distinct types of lung X-rays: those depicting healthy lungs, those showing lung opacities, and those indicative of viral pneumonia. Accurately diagnosing the disease at an early phase is critical. In this paper, five different pre-trained models are tested on the Lung X-ray Image Dataset. SqueezeNet, VGG11, ResNet18, DenseNet, and MobileNetV2 achieved accuracies of 0.64, 0.85, 0.87, 0.88, and 0.885, respectively. MobileNetV2, as the best-performing pre-trained model, is then further analyzed as the base model. Finally, our own model, MobileNet-Lung, based on MobileNetV2 with fine-tuning and an additional layer of attention within the feature layers, was developed to tackle the lung disease classification task and achieved an accuracy of 0.933. This result is a significant improvement over all five pre-trained models.
Submitted 23 August, 2024;
originally announced August 2024.
-
IntelliCare: Improving Healthcare Analysis with Variance-Controlled Patient-Level Knowledge from Large Language Models
Authors:
Zhihao Yu,
Yujie Jin,
Yongxin Xu,
Xu Chu,
Yasha Wang,
Junfeng Zhao
Abstract:
While pioneering deep learning methods have made great strides in analyzing electronic health record (EHR) data, they often struggle to fully capture the semantics of diverse medical codes from limited data. The integration of external knowledge from Large Language Models (LLMs) presents a promising avenue for improving healthcare predictions. However, LLM analyses may exhibit significant variance due to ambiguity problems and inconsistency issues, hindering their effective utilization. To address these challenges, we propose IntelliCare, a novel framework that leverages LLMs to provide high-quality patient-level external knowledge and enhance existing EHR models. Concretely, IntelliCare identifies patient cohorts and employs task-relevant statistical information to augment LLM understanding and generation, effectively mitigating the ambiguity problem. Additionally, it refines LLM-derived knowledge through a hybrid approach, generating multiple analyses and calibrating them using both the EHR model and perplexity measures. Experimental evaluations on three clinical prediction tasks across two large-scale EHR datasets demonstrate that IntelliCare delivers significant performance improvements to existing methods, highlighting its potential in advancing personalized healthcare predictions and decision support systems.
Submitted 23 August, 2024;
originally announced August 2024.
-
CLLMFS: A Contrastive Learning enhanced Large Language Model Framework for Few-Shot Named Entity Recognition
Authors:
Yafeng Zhang,
Zilan Yu,
Yuang Huang,
Jing Tang
Abstract:
Few-shot Named Entity Recognition (NER), the task of identifying named entities with only a limited amount of labeled data, has gained increasing significance in natural language processing. While existing methodologies have shown some effectiveness, such as enriching label semantics through various prompting modes or employing metric learning techniques, their performance exhibits limited robustness across diverse domains due to the lack of rich knowledge in their pre-trained models. To address this issue, we propose CLLMFS, a Contrastive Learning enhanced Large Language Model (LLM) Framework for Few-Shot Named Entity Recognition, achieving promising results with limited training data. Considering the impact of LLM's internal representations on downstream tasks, CLLMFS integrates Low-Rank Adaptation (LoRA) and contrastive learning mechanisms specifically tailored for few-shot NER. By enhancing the model's internal representations, CLLMFS effectively improves both entity boundary awareness ability and entity recognition accuracy. Our method has achieved state-of-the-art performance improvements on F1-score ranging from 2.58% to 97.74% over existing best-performing methods across several recognized benchmarks. Furthermore, through cross-domain NER experiments conducted on multiple datasets, we have further validated the robust generalization capability of our method. Our code will be released in the near future.
Submitted 23 August, 2024;
originally announced August 2024.
-
All-Electrical Layer-Spintronics in Altermagnetic Bilayer
Authors:
Rui Peng,
Jin Yang,
Wee-Liat Ong,
Pin Ho,
Chit Siong Lau,
Zhi-Ming Yu,
Yee Sin Ang
Abstract:
Electrical manipulation of spin-polarized current is highly desirable yet tremendously challenging in developing ultracompact spintronic device technology. Here we propose a scheme to realize the all-electrical manipulation of spin-polarized current in an altermagnetic bilayer. Such a bilayer system can host layer-spin locking, in which one layer hosts a spin-polarized current while the other layer hosts a current with opposite spin polarization. An out-of-plane electric field breaks the layer degeneracy, leading to a gate-tunable spin-polarized current whose polarization can be fully reversed upon flipping the polarity of the electric field. Using first-principles calculations, we show that CrS bilayer with C-type antiferromagnetic exchange interaction exhibits a hidden layer-spin locking mechanism that enables the spin polarization of the transport current to be electrically manipulated via the layer degree of freedom. We demonstrate that sign-reversible spin polarization as high as 87% can be achieved at room temperature. This work presents the pioneering concept of layer-spintronics which synergizes altermagnetism and bilayer stacking to achieve efficient electrical control of spin.
Submitted 22 August, 2024;
originally announced August 2024.
-
Long-Propagating Ghost Phonon Polaritons Enabled by Selective Mode Excitation
Authors:
Manuka P. Suriyage,
Qingyi Zhou,
Hao Qin,
Xueqian Sun,
Zhuoyuan Lu,
Stefan A. Maier,
Zongfu Yu,
Yuerui Lu
Abstract:
The precise control of phonon polaritons (PhPs) is essential for advancements in nanophotonic applications like on-chip optical communication and quantum information processing. Ghost hyperbolic phonon polaritons (g-HPs), which have been recently discovered, feature in-plane hyperbolic dispersion and oblique wavefronts, enabling long-range propagation. Despite their potential, controlling the directionality and selective excitation of g-HPs remains challenging. Our research demonstrates that modifying the shape of the launching micro/nano antenna can achieve this control. Using an asymmetric triangular gold antenna on a calcite crystal surface, we achieve highly directional g-HP excitation by selectively targeting specific polariton modes. Additionally, the mode of g-HPs can be adjusted by changing the excitation wavelength or rotating the antenna. Remarkably, our near-field imaging experiments show g-HP propagation over distances exceeding 35 micrometers, more than twice the length reported in previous studies. This work merges g-HP theory with structural engineering, enhancing the control over g-HPs and paving the way for innovative applications in mid-IR optoelectronics.
Submitted 25 August, 2024; v1 submitted 22 August, 2024;
originally announced August 2024.
-
SAM-SP: Self-Prompting Makes SAM Great Again
Authors:
Chunpeng Zhou,
Kangjie Ning,
Qianqian Shen,
Sheng Zhou,
Zhi Yu,
Haishuai Wang
Abstract:
The recently introduced Segment Anything Model (SAM), a Visual Foundation Model (VFM), has demonstrated impressive capabilities in zero-shot segmentation tasks across diverse natural image datasets. Despite its success, SAM encounters noticeable performance degradation when applied to specific domains, such as medical images. Current efforts to address this issue have involved fine-tuning strategies, intended to bolster the generalizability of the vanilla SAM. However, these approaches still predominantly necessitate the utilization of domain-specific expert-level prompts during the evaluation phase, which severely constrains the model's practicality.
To overcome this limitation, we introduce a novel self-prompting based fine-tuning approach, called SAM-SP, tailored for extending the vanilla SAM model. Specifically, SAM-SP leverages the output from the previous iteration of the model itself as prompts to guide the subsequent iteration of the model. This self-prompting module endeavors to learn how to generate useful prompts autonomously and alleviates the dependence on expert prompts during the evaluation phase, significantly broadening SAM's applicability. Additionally, we integrate a self-distillation module to enhance the self-prompting process further. Extensive experiments across various domain-specific datasets validate the effectiveness of the proposed SAM-SP. Our SAM-SP not only alleviates the reliance on expert prompts but also exhibits superior segmentation performance compared to the state-of-the-art task-specific segmentation approaches, the vanilla SAM, and SAM-based approaches.
Submitted 22 August, 2024;
originally announced August 2024.
-
TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Model
Authors:
Yuhao Wang,
Chao Hao,
Yawen Cui,
Xinqi Su,
Weicheng Xie,
Tao Tan,
Zitong Yu
Abstract:
The vision-language modeling capability of multi-modal large language models has attracted wide attention from the community. However, in the medical domain, radiology report generation using vision-language models still faces significant challenges due to the imbalanced data distribution caused by numerous negated descriptions in radiology reports and issues such as rough alignment between radiology reports and radiography. In this paper, we propose a truthful radiology report generation framework, namely TRRG, based on stage-wise training for cross-modal disease clue injection into large language models. During the pre-training stage, contrastive learning is employed to enhance the ability of the visual encoder to perceive fine-grained disease details. In the fine-tuning stage, our proposed clue injection module significantly enhances the disease-oriented perception capability of the large language model by effectively incorporating the robust zero-shot disease perception. Finally, through the cross-modal clue interaction module, our model effectively achieves the multi-granular interaction of visual embeddings and an arbitrary number of disease clue embeddings. This significantly enhances the report generation capability and clinical effectiveness of multi-modal large language models in the field of radiology report generation. Experimental results demonstrate that our proposed pre-training and fine-tuning framework achieves state-of-the-art performance in radiology report generation on datasets such as IU-Xray and MIMIC-CXR. Further analysis indicates that our proposed method can effectively enhance the model's ability to perceive diseases and improve its clinical effectiveness.
Submitted 22 August, 2024;
originally announced August 2024.
-
LLM-enhanced Scene Graph Learning for Household Rearrangement
Authors:
Wenhao Li,
Zhiyuan Yu,
Qijin She,
Zhinan Yu,
Yuqing Lan,
Chenyang Zhu,
Ruizhen Hu,
Kai Xu
Abstract:
The household rearrangement task involves spotting misplaced objects in a scene and accommodating them in proper places. It depends both on common-sense knowledge on the objective side and human user preference on the subjective side. To achieve this task, we propose to mine object functionality with user preference alignment directly from the scene itself, without relying on human intervention. To do so, we work with scene graph representation and propose LLM-enhanced scene graph learning which transforms the input scene graph into an affordance-enhanced graph (AEG) with information-enhanced nodes and newly discovered edges (relations). In AEG, the nodes corresponding to the receptacle objects are augmented with context-induced affordance which encodes what kind of carriable objects can be placed on them. New edges are discovered with newly discovered non-local relations. With AEG, we perform task planning for scene rearrangement by detecting misplaced carriables and determining a proper placement for each of them. We test our method by implementing a tidying robot in a simulator and performing evaluation on a new benchmark we build. Extensive evaluations demonstrate that our method achieves state-of-the-art performance on misplacement detection and the following rearrangement planning.
Submitted 12 September, 2024; v1 submitted 21 August, 2024;
originally announced August 2024.
-
A Quick, trustworthy spectral detection Q&A system based on the SDAAP Dataset and large language model
Authors:
Jiheng Liang,
Ziru Yu,
Zujie Xie,
Xiangyang Yu
Abstract:
Large Language Models (LLMs) have demonstrated significant success in a range of natural language processing (NLP) tasks within the general domain. The emergence of LLMs has introduced innovative methodologies across diverse fields, including the natural sciences. Researchers aim to implement automated, concurrent processes driven by LLMs to supplant conventional manual, repetitive and labor-intensive work. In the domain of spectral analysis and detection, it is imperative for researchers to autonomously acquire pertinent knowledge across various research objects, which encompasses the spectroscopic techniques and the chemometric methods that are employed in experiments and analysis. Paradoxically, despite the recognition of spectroscopic detection as an effective analytical method, the fundamental process of knowledge retrieval remains both time-intensive and repetitive. In response to this challenge, we first introduced the Spectral Detection and Analysis Based Paper (SDAAP) dataset, which is the first open-source textual knowledge dataset for spectral analysis and detection and contains annotated literature data as well as corresponding knowledge instruction data. Subsequently, we also designed an automated Q&A framework based on the SDAAP dataset, which can retrieve relevant knowledge and generate high-quality responses by extracting entities in the input as retrieval parameters. It is worth noting that, within this framework, the LLM is only used as a tool to provide generalizability, while the RAG technique is used to accurately capture the source of the knowledge. This approach not only improves the quality of the generated responses, but also ensures the traceability of the knowledge. Experimental results show that our framework generates responses with more reliable expertise compared to the baseline.
Submitted 23 August, 2024; v1 submitted 21 August, 2024;
originally announced August 2024.
-
EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning
Authors:
Bohao Xing,
Zitong Yu,
Xin Liu,
Kaishen Yuan,
Qilang Ye,
Weicheng Xie,
Huanjing Yue,
Jingyu Yang,
Heikki Kälviäinen
Abstract:
Facial expression recognition (FER) is an important research topic in emotional artificial intelligence. In recent decades, researchers have made remarkable progress. However, current FER paradigms face challenges in generalization, lack semantic information aligned with natural language, and struggle to process both images and videos within a unified framework, making their application in multimodal emotion understanding and human-computer interaction difficult. Multimodal Large Language Models (MLLMs) have recently achieved success, offering advantages in addressing these issues and potentially overcoming the limitations of current FER paradigms. However, directly applying pre-trained MLLMs to FER still faces several challenges. Our zero-shot evaluations of existing open-source MLLMs on FER indicate a significant performance gap compared to GPT-4V and current supervised state-of-the-art (SOTA) methods. In this paper, we aim to enhance MLLMs' capabilities in understanding facial expressions. We first generate instruction data for five FER datasets with Gemini. We then propose a novel MLLM, named EMO-LLaMA, which incorporates facial priors from a pretrained facial analysis network to enhance human facial information. Specifically, we design a Face Info Mining module to extract both global and local facial information. Additionally, we utilize a handcrafted prompt to introduce age-gender-race attributes, considering the emotional differences across different human groups. Extensive experiments show that EMO-LLaMA achieves SOTA-comparable or competitive results across both static and dynamic FER datasets. The instruction dataset and code are available at https://github.com/xxtars/EMO-LLaMA.
Submitted 21 August, 2024;
originally announced August 2024.
-
DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection
Authors:
Xinqi Su,
Yawen Cui,
Ajian Liu,
Xun Lin,
Yuhao Wang,
Haochen Liang,
Wenhui Li,
Zitong Yu
Abstract:
In the current web environment, fake news spreads rapidly across online social networks, posing serious threats to society. Existing multimodal fake news detection (MFND) methods can be classified into knowledge-based and semantic-based approaches. However, these methods are overly dependent on human expertise and feedback, lacking flexibility. To address this challenge, we propose a Dynamic Analysis and Adaptive Discriminator (DAAD) approach for fake news detection. For knowledge-based methods, we introduce the Monte Carlo Tree Search (MCTS) algorithm to leverage the self-reflective capabilities of large language models (LLMs) for prompt optimization, providing richer, domain-specific details and guidance to the LLMs, while enabling more flexible integration of LLM comments on news content. For semantic-based methods, we define four typical deceit patterns: emotional exaggeration, logical inconsistency, image manipulation, and semantic inconsistency, to reveal the mechanisms behind fake news creation. To detect these patterns, we carefully design four discriminators and expand them in depth and breadth, using the soft-routing mechanism to explore optimal detection models. Experimental results on three real-world datasets demonstrate the superiority of our approach. The code will be available at: https://github.com/SuXinqi/DAAD.
Submitted 20 August, 2024;
originally announced August 2024.
-
TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition
Authors:
Tianwei Lin,
Jiang Liu,
Wenqiao Zhang,
Zhaocheng Li,
Yang Dai,
Haoyuan Li,
Zhelun Yu,
Wanggui He,
Juncheng Li,
Hao Jiang,
Siliang Tang,
Yueting Zhuang
Abstract:
While Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multidimensional task scenarios. To address this issue, one straightforward solution is to introduce task-specific LoRA modules as domain experts, leveraging the modeling of multiple experts' capabilities and thus enhancing the general capability of multi-task learning. Although promising, these additional components often add complexity to the training and inference process, contravening the efficiency that PEFT was designed for. Considering this, we introduce an innovative PEFT method, TeamLoRA, consisting of a collaboration and competition module for experts, and thus achieving the right balance of effectiveness and efficiency: (i) For collaboration, a novel knowledge-sharing and -organizing mechanism is devised to appropriately reduce the scale of matrix operations, thereby boosting the training and inference speed. (ii) For competition, we propose leveraging a game-theoretic interaction mechanism for experts, encouraging experts to transfer their domain-specific knowledge while facing diverse downstream tasks, and thus enhancing the performance. By doing so, TeamLoRA elegantly connects the experts as a "Team" with internal collaboration and competition, enabling a faster and more accurate PEFT paradigm for multi-task learning. To validate the superiority of TeamLoRA, we curate a comprehensive multi-task evaluation (CME) benchmark to thoroughly assess the capability of multi-task learning. Experiments conducted on our CME and other benchmarks indicate the effectiveness and efficiency of TeamLoRA. Our project is available at https://github.com/Lin-Tianwei/TeamLoRA.
Submitted 19 August, 2024;
originally announced August 2024.
-
Orientation independent quantification of macromolecular proton fraction in tissues with suppression of residual dipolar coupling
Authors:
Zijian Gao,
Ziqiang Yu,
Ziqin Zhou,
Jian Hou,
Baiyan Jiang,
Michael Ong,
Weitian Chen
Abstract:
Quantitative magnetization transfer (MT) imaging enables non-invasive characterization of the macromolecular environment of tissues. However, recent work has highlighted that the quantification of MT parameters exhibits orientation dependence in ordered tissue structures, potentially confounding its clinical applications. Notably, in tissues with ordered structures, such as articular cartilage and myelin, the residual dipolar coupling (RDC) effect can arise owing to incomplete averaging of dipolar-dipolar interactions of water protons. In this study, we demonstrated that the confounding effect of RDC on quantitative MT imaging in ordered tissues can be suppressed by using an emerging technique known as macromolecular proton fraction mapping based on spin-lock (MPF-SL). The off-resonance spin-lock pulse in MPF-SL could be designed to generate a strong effective spin-lock field to suppress RDC without violating the specific absorption rate and hardware limitations in clinical scans. Furthermore, removing the water signal in MPF-SL enabled the application of a strong effective spin-lock field without any confounding signal from direct water saturation. Our findings were experimentally validated using human knee specimens and healthy human cartilage. The results demonstrated that MPF-SL exhibits lower sensitivity to tissue orientation compared with R2, R1rho, and saturation-pulse-based MT imaging. Thus, MPF-SL could serve as a valuable orientation-independent technique for quantifying MPF.
Submitted 19 August, 2024;
originally announced August 2024.
-
G$^2$V$^2$former: Graph Guided Video Vision Transformer for Face Anti-Spoofing
Authors:
Jingyi Yang,
Zitong Yu,
Xiuming Ni,
Jia He,
Hui Li
Abstract:
In videos containing spoofed faces, we may uncover the spoofing evidence based on either photometric or dynamic abnormality, or even a combination of both. Prevailing face anti-spoofing (FAS) approaches generally concentrate on the single-frame scenario; however, purely photometric-driven methods overlook the dynamic spoofing clues that may be exposed over time. This may lead FAS systems to reach incorrect judgments, especially in cases that are easily distinguishable in terms of dynamics but challenging to discern in terms of photometrics. To this end, we propose the Graph Guided Video Vision Transformer (G$^2$V$^2$former), which combines faces with facial landmarks for photometric and dynamic feature fusion. We factorize the attention into space and time, and fuse them via a spatiotemporal block. Specifically, we design a novel temporal attention called Kronecker temporal attention, which has a wider receptive field, and is beneficial for capturing dynamic information. Moreover, we leverage the low-semantic motion of facial landmarks to guide the high-semantic change of facial expressions based on the motivation that regions containing landmarks may reveal more dynamic clues. Extensive experiments on nine benchmark datasets demonstrate that our method achieves superior performance under various scenarios. The codes will be released soon.
Submitted 14 August, 2024;
originally announced August 2024.
-
$χ$SPN: Characteristic Interventional Sum-Product Networks for Causal Inference in Hybrid Domains
Authors:
Harsh Poonia,
Moritz Willig,
Zhongjie Yu,
Matej Zečević,
Kristian Kersting,
Devendra Singh Dhami
Abstract:
Causal inference in hybrid domains, characterized by a mixture of discrete and continuous variables, presents a formidable challenge. We take a step towards this direction and propose Characteristic Interventional Sum-Product Network ($χ$SPN) that is capable of estimating interventional distributions in the presence of random variables drawn from mixed distributions. $χ$SPN uses characteristic functions in the leaves of an interventional SPN (iSPN), thereby providing a unified view for discrete and continuous random variables through the Fourier-Stieltjes transform of the probability measures. A neural network is used to estimate the parameters of the learned iSPN using the intervened data. Our experiments on 3 synthetic heterogeneous datasets suggest that $χ$SPN can effectively capture the interventional distributions for both discrete and continuous variables while being expressive and causally adequate. We also show that $χ$SPN generalizes to multiple interventions while being trained only on single-intervention data.
△ Less
Submitted 14 August, 2024;
originally announced August 2024.
-
Dihadron azimuthal asymmetry and light-quark dipole moments at the Electron-Ion Collider
Authors:
Xin-Kai Wen,
Bin Yan,
Zhite Yu,
C. -P. Yuan
Abstract:
We propose a novel method to probe light-quark dipole moments by examining the azimuthal asymmetries between a collinear pair of hadrons in semi-inclusive deep inelastic lepton scattering off an unpolarized proton target at the Electron-Ion Collider. These asymmetries provide a means to observe transversely polarized quarks, which arise exclusively from the interference between the dipole and the Standard Model interactions, thereby depending linearly on the dipole couplings. We demonstrate that this novel approach can enhance current constraints on light-quark dipole operators by an order of magnitude, free from contamination of other new physics effects. Furthermore, it allows for a simultaneous determination of both the real and imaginary parts of the dipole couplings, offering a new avenue for investigating potential $CP$-violating effects at high energies.
Submitted 13 August, 2024;
originally announced August 2024.
-
DiffSG: A Generative Solver for Network Optimization with Diffusion Model
Authors:
Ruihuai Liang,
Bo Yang,
Zhiwen Yu,
Bin Guo,
Xuelin Cao,
Mérouane Debbah,
H. Vincent Poor,
Chau Yuen
Abstract:
Diffusion generative models, famous for their performance in image generation, are popular in various cross-domain applications. However, their use in the communication community has been mostly limited to auxiliary tasks like data modeling and feature extraction. These models hold greater promise for fundamental problems in network optimization compared to traditional machine learning methods. Discriminative deep learning often falls short due to its single-step input-output mapping and lack of global awareness of the solution space, especially given the complexity of network optimization's objective functions. In contrast, diffusion generative models can consider a broader range of solutions and exhibit stronger generalization by learning parameters that describe the distribution of the underlying solution space, with higher probabilities assigned to better solutions. We propose a new framework, Diffusion Model-based Solution Generation (DiffSG), which leverages the intrinsic distribution learning capabilities of diffusion generative models to learn high-quality solution distributions based on given inputs. The optimal solution within this distribution is highly probable, allowing it to be effectively reached through repeated sampling. We validate the performance of DiffSG on several typical network optimization problems, including mixed-integer non-linear programming, convex optimization, and hierarchical non-convex optimization. Our results show that DiffSG outperforms existing baselines. In summary, we demonstrate the potential of diffusion generative models in tackling complex network optimization problems and outline a promising path for their broader application in the communication community.
Submitted 13 August, 2024;
originally announced August 2024.
-
$X(4630)$ and $Y(4626)$ production in the $B^+$ and $B_s^0$ decays
Authors:
Zhuo Yu,
Qi Wu,
Dian-Yong Chen
Abstract:
In the present work, we investigate the production of $X(4630)$ and $Y(4626)$ in $B^+$ and $B_s^0$ decays, where $X(4630)$ and $Y(4626)$ are considered as the $C$-parity pigeon pair in the $D_{s}^{\ast+} D_{s1}(2536)^-$ molecular frame. The branching fractions of $B^+ \to K^+ X(4630)/Y(4626)$ and $B_s^0 \to η X(4630)/Y(4626)$ have been evaluated using an effective Lagrangian approach; they are of the order of $10^{-5}$, and the ratios of these branching fractions are almost independent of the model parameter. Based on the present estimations, we propose to search for $Y(4626)$ in the process $B^+ \to K^+ J/ψη^{(\prime)}$, which should be accessible by the LHCb and Belle II Collaborations.
Submitted 11 August, 2024;
originally announced August 2024.
-
DiPGrasp: Parallel Local Searching for Efficient Differentiable Grasp Planning
Authors:
Wenqiang Xu,
Jieyi Zhang,
Tutian Tang,
Zhenjun Yu,
Yutong Li,
Cewu Lu
Abstract:
Grasp planning is an important task for robotic manipulation. Though it is a richly studied area, a standalone, fast, and differentiable grasp planner that can work with robot grippers of different DOFs has not been reported. In this work, we present DiPGrasp, a grasp planner that satisfies all these goals. DiPGrasp takes a force-closure geometric surface matching grasp quality metric. It adopts a gradient-based optimization scheme on the metric, which also considers parallel sampling and collision handling. This not only drastically accelerates the grasp search process over the object surface but also makes it differentiable. We apply DiPGrasp to three applications, namely grasp dataset construction, mask-conditioned planning, and pose refinement. For dataset generation, as a standalone planner, DiPGrasp has clear advantages in speed and quality compared with several classic planners. For mask-conditioned planning, it can turn a 3D perception model into a 3D grasp detection model instantly. As a pose refiner, it can optimize the coarse grasp prediction from the neural network, as well as the neural network parameters. Finally, we conduct real-world experiments with the Barrett hand and Schunk SVH 5-finger hand. Video and supplementary materials can be viewed on our website: https://dipgrasp.robotflow.ai.
Submitted 8 August, 2024;
originally announced August 2024.
-
Regression analysis of elliptically symmetric direction data
Authors:
Zehao Yu,
Xianzheng Huang
Abstract:
A comprehensive toolkit is developed for regression analysis of directional data based on a flexible class of angular Gaussian distributions. Informative testing procedures for isotropy and covariate effects on the directional response are proposed. Moreover, a prediction region that achieves the smallest volume in a class of ellipsoidal prediction regions of the same coverage probability is constructed. The efficacy of these inference procedures is demonstrated in simulation experiments. Finally, this new toolkit is used to analyze directional data originating from a hydrology study and a bioinformatics application.
Submitted 6 August, 2024;
originally announced August 2024.