Search | arXiv e-print repository

arXiv:2409.06946 [pdf, other]

Refracting Reconfigurable Intelligent Surface Assisted URLLC for Millimeter Wave High-Speed Train Communication Coverage Enhancement

Authors: Changzhu Liu, Ruisi He, Yong Niu, Shiwen Mao, Bo Ai, Ruifeng Chen

Abstract: High-speed train (HST) has garnered significant attention from both academia and industry due to the rapid development of railways worldwide. Millimeter wave (mmWave) communication, known for its large bandwidth, is an effective way to address performance bottlenecks in cellular network based HST wireless communication systems. However, mmWave signals suffer from significant path loss when traversing the carriage, posing substantial challenges to cellular networks. To address this issue, reconfigurable intelligent surfaces (RIS) have gained considerable interest for their ability to enhance cell coverage by reflecting signals toward the receiver. Ensuring communication reliability, a core performance indicator of ultra-reliable and low-latency communications (URLLC) in fifth-generation systems, is crucial for providing steady and reliable data transmissions along railways, particularly for delivering safety and control messages and monitoring HST signaling information. In this paper, we investigate a refracting RIS-assisted multi-user multiple-input single-output URLLC system in mmWave HST communications. We propose a sum rate maximization problem, subject to a base station beamforming constraint, as well as refracting RIS discrete phase shift and reliability constraints. To solve this optimization problem, we design a joint optimization algorithm based on the alternating optimization method. This involves decoupling the original optimization problem into an active beamforming design and packet error probability optimization subproblem and a discrete phase shift design subproblem. These subproblems are addressed using the Lagrangian dual method and the local search method, respectively. Simulation results demonstrate the fast convergence of the proposed algorithm and highlight the benefits of refracting RIS adoption for sum rate improvement in mmWave HST networks.

Submitted 10 September, 2024; originally announced September 2024.

Comments: 11 figures, accepted by IEEE Transactions on Vehicular Technology

arXiv:2409.06747 [pdf, other]

Wave effect of gravitational waves intersected with a microlens field II: an adaptive hierarchical tree algorithm and population study

Authors: Xikai Shan, Guoliang Li, Xuechun Chen, Wen Zhao, Bin Hu, Shude Mao

Abstract: The gravitational lensing wave effect generated by a microlensing field embedded in a lens galaxy is an inevitable phenomenon in strongly lensed gravitational waves (SLGWs). This effect presents both challenges and opportunities for the detection and application of SLGWs. However, investigating this wave effect requires computing a complete diffraction integral over each microlens in the field. This is extremely time-consuming due to the large number of microlenses. Therefore, simply adding all the microlenses is impractical. Additionally, the complexity of the time delay surface makes the lens plane resolution a crucial factor in controlling numerical errors. In this paper, we propose a trapezoid approximation-based adaptive hierarchical tree algorithm to meet the challenges of calculation speed and precision. We find that this algorithm accelerates the calculation by four orders of magnitude compared to the simple adding method and is one order of magnitude faster than the fixed hierarchical tree algorithm proposed for electromagnetic microlensing. More importantly, our algorithm ensures controllable numerical errors, increasing confidence in the results. Together with our previous work, this paper addresses all numerical issues, including integral convergence, precision, and computational time. Finally, we conducted a population study on the microlensing wave effect of SLGWs using this algorithm and found that the microlensing wave effect cannot be ignored, especially for Type II SLGWs due to their intrinsic geometric structures and their typical intersection with a denser microlensing field. Statistically, more than 33% (11%) of SLGWs have a mismatch larger than 1% (3%) compared to the unlensed waveform. Additionally, we found that the mismatch between signal pairs in a doubly imaged GW is generally larger than 10^{-3}, and 61% (25%) of signal pairs have a mismatch larger than 1% (3%).

Submitted 10 September, 2024; originally announced September 2024.

Comments: 19 pages, 11 figures, minor revision before publication

arXiv:2409.06141 [pdf, other]

Einstein-Klein-Gordon system via Cauchy-characteristic evolution: Computation of memory and ringdown tail

Authors: Sizheng Ma, Kyle C. Nelli, Jordan Moxon, Mark A. Scheel, Nils Deppe, Lawrence E. Kidder, William Throwe, Nils L. Vu

Abstract: Cauchy-characteristic evolution (CCE) is a powerful method for accurately extracting gravitational waves at future null infinity. In this work, we extend the previously implemented CCE system within the numerical relativity code SpECTRE by incorporating a scalar field. This allows the system to capture features of beyond-general-relativity theories. We derive scalar contributions to the equations of motion, Weyl scalar computations, Bianchi identities, and balance laws at future null infinity. Our algorithm, tested across various scenarios, accurately reveals memory effects induced by both scalar and tensor fields and captures Price's power-law tail ($u^{-l-2}$) in scalar fields at future null infinity, in contrast to the $t^{-2l-3}$ tail at future timelike infinity.

Submitted 9 September, 2024; originally announced September 2024.

arXiv:2409.05229 [pdf, other]

Two Channels of Metal-Rich Compact Stellar System Formation: Starbursts Under High Ram Pressure vs. Tidal Stripping

Authors: Yuan Bian, Min Du, Victor P. Debattista, Dylan Nelson, Mark A. Norris, Luis C. Ho, Shuai Lu, Renyue Cen, Shuo Ma, Chong Ge, Taotao Fang, Hui Li

Abstract: Most galaxies follow well-defined scaling relations of metallicity and stellar mass; however, some outliers at the low mass end of the observed galaxy population exhibit unusually high metallicity for their mass. Understanding how these objects get to be so metal-rich is vital for understanding the role of feedback in galaxy formation. Using the TNG50 simulation, we explore the origins of this phenomenon. We identify 227 metal-rich, Compact Stellar Systems (CSSs) that deviate significantly from this scaling relation. These CSSs are satellites located in the vicinity of massive host galaxies, with stellar masses ranging from $10^{8} M_{\odot}$ to $10^{10} M_{\odot}$ (including six systems that are close analogs of the M31-M32 system). Contrary to the previously assumed scenario that such objects are predominantly products of tidal stripping, our results suggest a more prevalent role for ram pressure in their formation. Indeed, 76% (173) of these CSSs are formed through a burst of star formation occurring around the time of the first pericentric passage, typically at redshifts $z\lesssim1$, aided by strong ram pressure and tidal forces. The high ram pressure, resulting from the CSSs' rapid motion near the halo center, facilitates metal enrichment, producing high-metallicity CSSs by confining the metal-rich gas from bursty star formation, which leads to distinct stellar populations characterized by enhanced metallicity as well as high $\alpha$-abundance. Only the remaining 24% (54) of metal-rich CSSs are generated through the tidal stripping of massive progenitors. Our results further indicate that M32 is more likely to have formed through intense star formation events rather than through gradual, tidal stripping, thereby providing crucial insights into the nature of low mass, compact galaxy formation.

Submitted 8 September, 2024; originally announced September 2024.

Comments: 28 pages, 13 figures. Submitted

arXiv:2409.04315 [pdf, ps, other]

Siegel operators for holomorphic differential forms

Authors: Shouhei Ma

Abstract: We give a geometric interpretation of the Siegel operators for holomorphic differential forms on Siegel modular varieties. This involves extension of the differential forms over a toroidal compactification, and we show that the Siegel operator essentially describes the restriction and descent to the boundary Kuga variety via holomorphic Leray filtration. As a consequence, we obtain equivalence of various notions of "vanishing at boundary" for holomorphic forms. We also study the case of orthogonal modular varieties.

Submitted 6 September, 2024; originally announced September 2024.

MSC Class: 11F46; 11F55; 11F75

arXiv:2409.02260 [pdf, other]

Penalty Adversarial Network (PAN): A neural network-based method to solve PDE-constrained optimal control problems

Authors: Shilin Ma, Yukun Yue

Abstract: In this work, we introduce a novel strategy for tackling constrained optimization problems through a modified penalty method. Conventional penalty methods convert constrained problems into unconstrained ones by incorporating constraints into the loss function via a penalty term. However, selecting an optimal penalty parameter remains challenging; an improper choice, whether excessively high or low, can significantly impede the discovery of the true solution. This challenge is particularly evident when training neural networks for constrained optimization, where tuning parameters can become an extensive and laborious task. To overcome these issues, we propose an adversarial approach that redefines the conventional penalty method by simultaneously considering two competing penalty problems--a technique we term the penalty adversarial problem. Within linear settings, our method not only ensures the fulfillment of constraints but also guarantees solvability, leading to more precise solutions compared to traditional approaches. We further reveal that our method effectively performs an automatic adjustment of penalty parameters by leveraging the relationship between the objective and loss functions, thereby obviating the need for manual parameter tuning. Additionally, we extend this adversarial framework to develop a neural network-based solution for optimal control problems governed by linear or nonlinear partial differential equations. We demonstrate the efficacy of this innovative approach through a series of numerical examples.

Submitted 3 September, 2024; originally announced September 2024.

arXiv:2409.02157 [pdf, other]

An Earth-Mass Planet and a Brown Dwarf in Orbit Around a White Dwarf

Authors: Keming Zhang, Weicheng Zang, Kareem El-Badry, Jessica R. Lu, Joshua S. Bloom, Eric Agol, B. Scott Gaudi, Quinn Konopacky, Natalie LeBaron, Shude Mao, Sean Terry

Abstract: Terrestrial planets born beyond 1-3 AU have been theorized to avoid being engulfed during the red-giant phases of their host stars. Nevertheless, only a few gas-giant planets have been observed around white dwarfs (WDs) -- the end product left behind by a red giant. Here we report on evidence that the lens system that produced the microlensing event KMT-2020-BLG-0414 is composed of a WD orbited by an Earth-mass planet and a brown dwarf (BD) companion, as shown by the non-detection of the lens flux using Keck Adaptive Optics (AO). From microlensing orbital motion constraints, we determine the planet to be a $1.9\pm0.2$ Earth-mass ($M_\oplus$) planet at a physical separation of $2.1\pm0.2$ au from the WD during the event. By considering the system evolutionary history, we determine the BD companion to have a projected separation of 22 au from the WD, and reject an alternative model that places the BD at 0.2 au. Given planetary orbital expansion during the final evolutionary stages of the host star, this Earth-mass planet may have existed in an initial orbit close to 1 au, thereby offering a glimpse into the possible survival of planet Earth in the distant future.

Submitted 3 September, 2024; originally announced September 2024.

Comments: Accepted. 25 pages, 7 figures, 4 tables

arXiv:2409.00956 [pdf]

Physics-Informed Neural Network Based Digital Image Correlation Method

Authors: Boda Li, Shichao Zhou, Qinwei Ma, Shaopeng Ma

Abstract: Digital Image Correlation (DIC) is a key technique in experimental mechanics for full-field deformation measurement, traditionally relying on subset matching to determine displacement fields. However, selecting optimal parameters like shape functions and subset size can be challenging in non-uniform deformation scenarios. Recent deep learning-based DIC approaches, both supervised and unsupervised, use neural networks to map speckle images to deformation fields, offering precise measurements without manual tuning. However, these methods require complex network architectures to extract speckle image features, which does not guarantee solution accuracy. This paper introduces PINN-DIC, a novel DIC method based on Physics-Informed Neural Networks (PINNs). Unlike traditional approaches, PINN-DIC uses a simple fully connected neural network that takes the coordinate domain as input and outputs the displacement field. By integrating the DIC governing equation into the loss function, PINN-DIC directly extracts the displacement field from reference and deformed speckle images through iterative optimization. Evaluations on simulated and real experiments demonstrate that PINN-DIC maintains the accuracy of deep learning-based DIC in non-uniform fields while offering three distinct advantages: 1) enhanced precision with a simpler network by directly fitting the displacement field from coordinates, 2) effective handling of irregular boundary displacement fields with minimal parameter adjustments, and 3) easy integration with other neural network-based mechanical analysis methods for comprehensive DIC result analysis.

Submitted 2 September, 2024; originally announced September 2024.

arXiv:2408.15245 [pdf, other]

An Edge AI System Based on FPGA Platform for Railway Fault Detection

Authors: Jiale Li, Yulin Fu, Dongwei Yan, Sean Longyu Ma, Chiu-Wing Sham

Abstract: As the demands for railway transportation safety increase, traditional methods of rail track inspection no longer meet the needs of modern railway systems. To address the issues of automation and efficiency in rail fault detection, this study introduces a railway inspection system based on Field Programmable Gate Array (FPGA). This edge AI system collects track images via cameras and uses Convolutional Neural Networks (CNN) to perform real-time detection of track defects and automatically reports fault information. The innovation of this system lies in its high level of automation and detection efficiency. The neural network approach employed by this system achieves a detection accuracy of 88.9%, significantly enhancing the reliability and efficiency of detection. Experimental results demonstrate that this FPGA-based system is 1.39× and 4.67× better in energy efficiency than peer implementations on the GPU and CPU platforms, respectively.

Submitted 8 August, 2024; originally announced August 2024.

Comments: Accepted at the 2024 IEEE 13th Global Conference on Consumer Electronics (GCCE 2024)

arXiv:2408.14478 [pdf, other]

Uncertainty Quantification in Alzheimer's Disease Progression Modeling

Authors: Wael Mobeirek, Shirley Mao

Abstract: With the increasing number of patients diagnosed with Alzheimer's Disease, prognosis models have the potential to aid in early disease detection. However, current approaches raise dependability concerns as they do not account for uncertainty. In this work, we compare the performance of Monte Carlo Dropout, Variational Inference, Markov Chain Monte Carlo, and Ensemble Learning trained on 512 patients to predict 4-year cognitive score trajectories with confidence bounds. We show that MC Dropout and MCMC are able to produce well-calibrated and accurate predictions under noisy training data.

Submitted 13 August, 2024; originally announced August 2024.

Comments: This work was done as part of degree requirements for the authors in 2021-2022

arXiv:2408.13960 [pdf, other]

Time Series Analysis for Education: Methods, Applications, and Future Directions

Authors: Shengzhong Mao, Chaoli Zhang, Yichi Song, Jindong Wang, Xiao-Jun Zeng, Zenglin Xu, Qingsong Wen

Abstract: Recent advancements in the collection and analysis of sequential educational data have brought time series analysis to a pivotal position in educational research, highlighting its essential role in facilitating data-driven decision-making. However, there is a lack of comprehensive summaries that consolidate these advancements. To the best of our knowledge, this paper is the first to provide a comprehensive review of time series analysis techniques specifically within the educational context. We begin by exploring the landscape of educational data analytics, categorizing various data sources and types relevant to education. We then review four prominent time series methods (forecasting, classification, clustering, and anomaly detection), illustrating their specific application points in educational settings. Subsequently, we present a range of educational scenarios and applications, focusing on how these methods are employed to address diverse educational tasks, which highlights the practical integration of multiple time series methods to solve complex educational problems. Finally, we conclude with a discussion on future directions, including personalized learning analytics, multimodal data fusion, and the role of large language models (LLMs) in educational time series. The contributions of this paper include a detailed taxonomy of educational data, a synthesis of time series techniques with specific educational applications, and a forward-looking perspective on emerging trends and future research opportunities in educational analysis. The related papers and resources are available and regularly updated at the project page.

Submitted 27 August, 2024; v1 submitted 25 August, 2024; originally announced August 2024.

Comments: 24 pages, 3 figures, 6 tables, project page: see https://github.com/ai-for-edu/time-series-analysis-for-education

arXiv:2408.13759 [pdf, other]

MASQ: Multi-Agent Reinforcement Learning for Single Quadruped Robot Locomotion

Authors: Qi Liu, Jingxiang Guo, Sixu Lin, Shuaikang Ma, Jinxuan Zhu, Yanjie Li

Abstract: This paper proposes a novel method to improve locomotion learning for a single quadruped robot using multi-agent deep reinforcement learning (MARL). Many existing methods use single-agent reinforcement learning for an individual robot or MARL for the cooperative task in multi-robot systems. Unlike existing methods, this paper proposes using MARL for the locomotion learning of a single quadruped robot. We develop a learning structure called Multi-Agent Reinforcement Learning for Single Quadruped Robot Locomotion (MASQ), considering each leg as an agent to explore the action space of the quadruped robot, sharing a global critic, and learning collaboratively. Experimental results indicate that MASQ not only speeds up learning convergence but also enhances robustness in real-world settings, suggesting that applying MASQ to single robots such as quadrupeds could surpass traditional single-robot reinforcement learning approaches. Our study provides insightful guidance on integrating MARL with single-robot locomotion learning.

Submitted 25 August, 2024; originally announced August 2024.

arXiv:2408.11398 [pdf, other]

Generative AI based Secure Wireless Sensing for ISAC Networks

Authors: Jiacheng Wang, Hongyang Du, Yinqiu Liu, Geng Sun, Dusit Niyato, Shiwen Mao, Dong In Kim, Xuemin Shen

Abstract: Integrated sensing and communications (ISAC) is expected to be a key technology for 6G, and channel state information (CSI) based sensing is a key component of ISAC. However, current research on ISAC focuses mainly on improving sensing performance, overlooking security issues, particularly the unauthorized sensing of users. In this paper, we propose a secure sensing system (DFSS) based on two distinct diffusion models. Specifically, we first propose a discrete conditional diffusion model to generate graphs with nodes and edges, guiding the ISAC system to appropriately activate wireless links and nodes, which ensures the sensing performance while minimizing the operation cost. Using the activated links and nodes, DFSS then employs the continuous conditional diffusion model to generate safeguarding signals, which are next modulated onto the pilot at the transmitter to mask fluctuations caused by user activities. As such, only ISAC devices authorized with the safeguarding signals can extract the true CSI for sensing, while unauthorized devices are unable to achieve the same sensing. Experimental results demonstrate that DFSS can reduce the activity recognition accuracy of the unauthorized devices by approximately 70%, effectively shielding the user from unauthorized surveillance.

Submitted 21 August, 2024; originally announced August 2024.

arXiv:2408.11313 [pdf, other]

Unlocking Adversarial Suffix Optimization Without Affirmative Phrases: Efficient Black-box Jailbreaking via LLM as Optimizer

Authors: Weipeng Jiang, Zhenting Wang, Juan Zhai, Shiqing Ma, Zhengyu Zhao, Chao Shen

Abstract: Despite prior safety alignment efforts, mainstream LLMs can still generate harmful and unethical content when subjected to jailbreaking attacks. Existing jailbreaking methods fall into two main categories: template-based and optimization-based methods. The former requires significant manual effort and domain knowledge, while the latter, exemplified by Greedy Coordinate Gradient (GCG), which seeks to maximize the likelihood of harmful LLM outputs through token-level optimization, also encounters several limitations: requiring white-box access, necessitating pre-constructed affirmative phrases, and suffering from low efficiency. In this paper, we present ECLIPSE, a novel and efficient black-box jailbreaking method utilizing optimizable suffixes. Drawing inspiration from LLMs' powerful generation and optimization capabilities, we employ task prompts to translate jailbreaking goals into natural language instructions. This guides the LLM to generate adversarial suffixes for malicious queries. In particular, a harmfulness scorer provides continuous feedback, enabling LLM self-reflection and iterative optimization to autonomously and efficiently produce effective suffixes. Experimental results demonstrate that ECLIPSE achieves an average attack success rate (ASR) of 0.92 across three open-source LLMs and GPT-3.5-Turbo, significantly surpassing GCG by a factor of 2.4. Moreover, ECLIPSE is on par with template-based methods in ASR while offering superior attack efficiency, reducing the average attack overhead by 83%.

Submitted 20 August, 2024; originally announced August 2024.

arXiv:2408.08977 [pdf, other]

FedFQ: Federated Learning with Fine-Grained Quantization

Authors: Haowei Li, Weiying Xie, Hangyu Ye, Jitao Ma, Shuran Ma, Yunsong Li

Abstract: Federated learning (FL) is a decentralized approach, enabling multiple participants to collaboratively train a model while ensuring the protection of data privacy. The transmission of updates from numerous edge clusters to the server creates a significant communication bottleneck in FL. Quantization is an effective compression technology, showcasing immense potential in addressing this bottleneck problem. The Non-IID nature of FL renders it sensitive to quantization. Existing quantized FL frameworks inadequately balance high compression ratios and superior convergence performance by roughly employing a uniform quantization bit-width on the client-side. In this work, we propose a communication-efficient FL algorithm with a fine-grained adaptive quantization strategy (FedFQ). FedFQ addresses the trade-off between achieving high communication compression ratios and maintaining superior convergence performance by introducing parameter-level quantization. Specifically, we have designed a Constraint-Guided Simulated Annealing algorithm to determine specific quantization schemes. We derive the convergence of FedFQ, demonstrating its superior convergence performance compared to existing quantized FL algorithms. We conducted extensive experiments on multiple benchmarks and demonstrated that, while maintaining lossless performance, FedFQ achieves a compression ratio of 27 times to 63 times compared to the baseline experiment.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.08862 [pdf, other]

Visual Agents as Fast and Slow Thinkers

Authors: Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Ying Nian Wu, Yongfeng Zhang, Dongfang Liu

Abstract: Achieving human-level intelligence requires refining cognitive distinctions between System 1 and System 2 thinking. While contemporary AI, driven by large language models, demonstrates human-like traits, it falls short of genuine cognition. Transitioning from structured benchmarks to real-world scenarios presents challenges for visual agents, often leading to inaccurate and overly confident responses. To address the challenge, we introduce FaST, which incorporates the Fast and Slow Thinking mechanism into visual agents. FaST employs a switch adapter to dynamically select between System 1/2 modes, tailoring the problem-solving approach to different task complexity. It tackles uncertain and unseen objects by adjusting model confidence and integrating new contextual data. With this novel design, we advocate a flexible system, hierarchical reasoning capabilities, and a transparent decision-making pipeline, all of which contribute to its ability to emulate human-like cognitive processes in visual intelligence. Empirical results demonstrate that FaST outperforms various well-known baselines, achieving 80.8% accuracy over VQA^{v2} for visual question answering and 48.7% GIoU score over ReasonSeg for reasoning segmentation. Extensive testing validates the efficacy and robustness of FaST's core components, showcasing its potential to advance the development of cognitive visual agents in AI systems. The code is available at https://github.com/GuangyanS/Sys2-LLaVA.

Submitted 6 September, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.08833 [pdf, other]

Intra-symbol Differential Amplitude Shift Keying-aided Blind Detector for Ambient Backscatter Communication Systems

Authors: Shuaijun Ma, Peng Wei, Sa Xiao, Jianquan Wang, Wanbin Tang, Wei Xiang

Abstract: Ambient backscatter communications (AmBC) are a promising technology for addressing the energy consumption challenge in wireless communications through the reflection or absorption of surrounding radio frequency (RF) signals. However, it grapples with the intricacies of ambient RF signal and the round-trip path loss. For traditional detectors, the incorporation of pilot sequences results in a reduction in spectral efficiency. Furthermore, traditional energy-based detectors are inherently susceptible to a notable error floor issue, attributed to the co-channel direct link interference (DLI). Consequently, this paper proposes a blind symbol detector without the prior knowledge of the channel state information, signal variance, and noise variance. By leveraging the intra-symbol differential amplitude shift keying (IDASK) scheme, this detector effectively redirects the majority of the DLI energy towards the largest eigenvalue of the received sample covariance matrix, thereby utilizing the second largest eigenvalue for efficient symbol detection. In addition, this paper conducts theoretical performance analyses of the proposed detector in terms of the false alarm probability, missed detection probability, and the bit-error rate (BER) lower bound. Simulation results demonstrate that the proposed blind detector exhibits a significant enhancement in symbol detection performance compared to its traditional counterparts.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.08765 [pdf, other]

Rethinking Generative Semantic Communication for Multi-User Systems with Multi-Modal LLM

Authors: Wanting Yang, Zehui Xiong, Shiwen Mao, Tony Q. S. Quek, Ping Zhang, Merouane Debbah, Rahim Tafazolli

Abstract: The surge in connected devices in 6G with typical massive access scenarios, such as smart agriculture, and smart cities, poses significant challenges to unsustainable traditional communication with limited radio resources and already high system complexity. Fortunately, the booming artificial intelligence technology and the growing computational power of devices offer a promising 6G enabler: semantic communication (SemCom). However, existing deep learning-based SemCom paradigms struggle to extend to multi-user scenarios due to their rigid end-to-end training approach. Consequently, to truly empower 6G networks with this critical technology, this article rethinks generative SemCom for multi-user system with multi-modal large language model (MLLM), and propose a novel framework called "M2GSC". In this framework, the MLLM, which serves as shared knowledge base (SKB), plays three critical roles for complex tasks, spawning a series of benefits such as semantic encoding standardization and semantic decoding personalization. Meanwhile, to enhance the performance of M2GSC framework and to advance its implementation in 6G, we highlight three research directions on M2GSC framework, namely, upgrading SKB to closed loop agent, adaptive semantic encoding offloading, and streamlined semantic decoding offloading. Finally, a case study is conducted to demonstrate the preliminary validation on the effectiveness of the M2GSC framework in terms of streamlined decoding offloading.

Submitted 16 August, 2024; originally announced August 2024.

arXiv:2408.06648 [pdf, other]

A Miniature Vision-Based Localization System for Indoor Blimps

Authors: Shicong Ma

Abstract: With increasing attention paid to blimp research, I hope to build an indoor blimp to interact with humans. To begin with, I propose developing a visual localization system to enable blimps to localize themselves in an indoor environment autonomously. This system initially reconstructs an indoor environment by employing Structure from Motion with Superpoint visual features. Next, with the previously built sparse point cloud map, the system generates camera poses by continuously employing pose estimation on matched visual features observed from the map. In this project, the blimp only serves as a reference mobile platform that constrains the weight of the perception system. The perception system contains one monocular camera and a WiFi adaptor to capture and transmit visual data to a ground PC station where the algorithms will be executed. The success of this project will transform remote-controlled indoor blimps into autonomous indoor blimps, which can be utilized for applications such as surveillance, advertisement, and indoor mapping.

Submitted 13 August, 2024; originally announced August 2024.

arXiv:2408.04825 [pdf, other]

doi 10.1109/MNET.2024.3421517

Towards Effective and Interpretable Semantic Communications

Authors: Youlong Wu, Yuanmin Shi, Shuai Ma, Chunxiao Jiang, Wei Zhang, Khaled B. Letaief

Abstract: With the exponential surge in traffic data and the pressing need for ultra-low latency in emerging intelligence applications, it is envisioned that 6G networks will demand disruptive communication technologies to foster ubiquitous intelligence and succinctness within the human society. Semantic communication, a novel paradigm, holds the promise of significantly curtailing communication overhead and latency by transmitting only task-relevant information. Despite numerous efforts in both theoretical frameworks and practical implementations of semantic communications, a substantial theory-practice gap complicates the theoretical analysis and interpretation, particularly when employing black-box machine learning techniques. This article initially delves into information-theoretic metrics such as semantic entropy, semantic distortions, and semantic communication rate to characterize the information flow in semantic communications. Subsequently, it provides a guideline for implementing semantic communications to ensure both theoretical interpretability and communication effectiveness.

Submitted 8 August, 2024; originally announced August 2024.

Comments: This paper has been accepted by IEEE Network Magazine

arXiv:2408.04682 [pdf, other]

ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities

Authors: Jiarui Lu, Thomas Holleis, Yizhe Zhang, Bernhard Aumayer, Feng Nan, Felix Bai, Shuang Ma, Shen Ma, Mengyu Li, Guoli Yin, Zirui Wang, Ruoming Pang

Abstract: Recent large language models (LLMs) advancements sparked a growing research interest in tool assisted LLMs solving real-world challenges, which calls for comprehensive evaluation of tool-use capabilities. While previous works focused on either evaluating over stateless web services (RESTful API), based on a single turn user prompt, or an off-policy dialog trajectory, ToolSandbox includes stateful tool execution, implicit state dependencies between tools, a built-in user simulator supporting on-policy conversational evaluation and a dynamic evaluation strategy for intermediate and final milestones over an arbitrary trajectory. We show that open source and proprietary models have a significant performance gap, and complex tasks like State Dependency, Canonicalization and Insufficient Information defined in ToolSandbox are challenging even the most capable SOTA LLMs, providing brand-new insights into tool-use LLM capabilities. ToolSandbox evaluation framework is released at https://github.com/apple/ToolSandbox

Submitted 8 August, 2024; originally announced August 2024.

arXiv:2408.02669 [pdf, other]

doi 10.1093/mnras/stae1939

SDSS-IV MaNGA: Stellar rotational support in disk galaxies vs. central surface density and stellar population age

Authors: Xiaohan Wang, Yifei Luo, S. M. Faber, David C. Koo, Shude Mao, Kyle B. Westfall, Shengdong Lu, Weichen Wang, Kevin Bundy, N. Boardman, Vladimir Avila-Reese, José G. Fernández-Trincado, Richard R. Lane

Abstract: We investigate how the stellar rotational support changes as a function of spatially resolved stellar population age ($\rm D_n4000$) and relative central stellar surface density ($ΔΣ_1$) for MaNGA isolated/central disk galaxies. We find that the galaxy rotational support $λ_{R_\mathrm{e}}$ varies smoothly as a function of $ΔΣ_1$ and $\rm D_n4000$. $\rm D_n4000$ vs. $ΔΣ_1$ follows a "J-shape", with $λ_{R_\mathrm{e}}$ contributing to the scatters. In this "J-shaped" pattern rotational support increases with central $\rm D_n4000$ when $ΔΣ_1$ is low but decreases with $ΔΣ_1$ when $ΔΣ_1$ is high. Restricting attention to low-$ΔΣ_1$ (i.e, large-radius) galaxies, we suggest that the trend of increasing rotational support with $\rm D_n4000$ for these objects is produced by a mix of two different processes, a primary trend characterized by growth in $λ_{R_\mathrm{e}}$ along with mass through gas accretion, on top of which disturbance episodes are overlaid, which reduce rotational support and trigger increased star formation. An additional finding is that star forming galaxies with low $ΔΣ_1$ have relatively larger radii than galaxies with higher $ΔΣ_1$ at fixed stellar mass. Assuming that these relative radii rankings are preserved while galaxies are star forming then implies clear evolutionary paths in central $\rm D_n4000$ vs. $ΔΣ_1$. The paper closes with comments on the implications that these paths have for the evolution of pseudo-bulges vs. classical-bulges.
The utility of using $\rm D_n4000$-$ΔΣ_1$ to study $λ_{R_\mathrm{e}}$ reinforces the notion that galaxy kinematics correlate both with structure and with stellar-population state, and indicates the importance of a multi-dimensional description for understanding bulge and galaxy evolution.

Submitted 5 August, 2024; originally announced August 2024.

Comments: 24 pages, 22 figures (including Appendix), accepted for publication in MNRAS

arXiv:2408.02103 [pdf, other]

Effective Demonstration Annotation for In-Context Learning via Language Model-Based Determinantal Point Process

Authors: Peng Wang, Xiaobin Wang, Chao Lou, Shengyu Mao, Pengjun Xie, Yong Jiang

Abstract: In-context learning (ICL) is a few-shot learning paradigm that involves learning mappings through input-output pairs and appropriately applying them to new instances. Despite the remarkable ICL capabilities demonstrated by Large Language Models (LLMs), existing works are highly dependent on large-scale labeled support sets, not always feasible in practical scenarios. To refine this approach, we focus primarily on an innovative selective annotation mechanism, which precedes the standard demonstration retrieval. We introduce the Language Model-based Determinant Point Process (LM-DPP) that simultaneously considers the uncertainty and diversity of unlabeled instances for optimal selection. Consequently, this yields a subset for annotation that strikes a trade-off between the two factors. We apply LM-DPP to various language models, including GPT-J, LlaMA, and GPT-3. Experimental results on 9 NLU and 2 Generation datasets demonstrate that LM-DPP can effectively select canonical examples. Further analysis reveals that LLMs benefit most significantly from subsets that are both low uncertainty and high diversity.

Submitted 4 August, 2024; originally announced August 2024.

arXiv:2408.01946 [pdf, other]

Masked Angle-Aware Autoencoder for Remote Sensing Images

Authors: Zhihao Li, Biao Hou, Siteng Ma, Zitong Wu, Xianpeng Guo, Bo Ren, Licheng Jiao

Abstract: To overcome the inherent domain gap between remote sensing (RS) images and natural images, some self-supervised representation learning methods have made promising progress. However, they have overlooked the diverse angles present in RS objects. This paper proposes the Masked Angle-Aware Autoencoder (MA3E) to perceive and learn angles during pre-training. We design a \textit{scaling center crop} operation to create the rotated crop with random orientation on each original image, introducing the explicit angle variation. MA3E inputs this composite image while reconstruct the original image, aiming to effectively learn rotation-invariant representations by restoring the angle variation introduced on the rotated crop. To avoid biases caused by directly reconstructing the rotated crop, we propose an Optimal Transport (OT) loss that automatically assigns similar original image patches to each rotated crop patch for reconstruction. MA3E demonstrates more competitive performance than existing pre-training methods on seven different RS image datasets in three downstream tasks.

Submitted 4 August, 2024; originally announced August 2024.

Comments: This paper has been accepted by ECCV 2024

arXiv:2408.01937 [pdf]

Inflight Performance and Calibrations of the Lyman-alpha Solar Telescope on board the Advanced Space-based Solar Observatory

Authors: Bo Chen, Li Feng, Guang Zhang, Hui Li, Lingping He, Kefei Song, Quanfeng Guo, Ying Li, Yu Huang, Jingwei Li, Jie Zhao, Jianchao Xue, Gen Li, Guanglu Shi, Dechao Song, Lei Lu, Beili Ying, Haifeng Wang, Shuang Dai, Xiaodong Wang, Shilei Mao, Peng Wang, Kun Wu, Shuai Ren, Liang Sun, et al. (18 additional authors not shown)

Abstract: The Lyman-alpha Solar Telescope (LST) on board the Advanced Space-based Solar Observatory (ASO-S) is the first payload to image the full solar disk and the solar corona in both white-light (WL) and ultraviolet (UV) H I Lya, extending up to 2.5 solar radii (Rs). Since the launch of the ASO-S on 9 October 2022, LST has captured various significant solar activities including flares, prominences, coronal mass ejections (CMEs). LST covers different passbands of 121.6 nm, 360 nm and 700 nm. The Lya Solar Disk Imager (SDI) has a field of view (FOV) of 38.4 arcmin and a spatial resolution of around 9.5 arcsec, while the White-Light Solar Telescope (WST) has a FOV of 38.43 arcmin and a spatial resolution of around 3.0 arcsec. The FOV of the Lya Solar Corona Imager (SCI) reaches 81.1 arcmin and its spatial resolution is 4.3 arcsec. The stray-light level in the 700 nm waveband is about 7.8e-6 MSB (mean solar brightness) at 1.1 Rs and 7.6e-7 MSB at 2.5 Rs, and in the Lya waveband it is around 4.3e-3 MSB at 1.1 Rs and 4.1e-4 MSB at 2.5 Rs. This article will detail the results from on-orbit tests and calibrations.

Submitted 4 August, 2024; originally announced August 2024.

Comments: Solar Physics (ASO-S mission topical collection), accepted

arXiv:2408.01173 [pdf, other]

Sustainable Diffusion-based Incentive Mechanism for Generative AI-driven Digital Twins in Industrial Cyber-Physical Systems

Authors: Jinbo Wen, Jiawen Kang, Dusit Niyato, Yang Zhang, Shiwen Mao

Abstract: Industrial Cyber-Physical Systems (ICPSs) are an integral component of modern manufacturing and industries. By digitizing data throughout the product life cycle, Digital Twins (DTs) in ICPSs enable a shift from current industrial infrastructures to intelligent and adaptive infrastructures. Thanks to data process capability, Generative Artificial Intelligence (GAI) can drive the construction and update of DTs to improve predictive accuracy and prepare for diverse smart manufacturing. However, mechanisms that leverage sensing Industrial Internet of Things (IIoT) devices to share data for the construction of DTs are susceptible to adverse selection problems. In this paper, we first develop a GAI-driven DT architecture for ICPSs. To address the adverse selection problem caused by information asymmetry, we propose a contract theory model and develop the sustainable diffusion-based soft actor-critic algorithm to identify the optimal feasible contract. Specifically, we leverage the dynamic structured pruning technique to reduce parameter numbers of actor networks, allowing sustainability and efficient implementation of the proposed algorithm. Finally, numerical results demonstrate the effectiveness of the proposed scheme.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2408.01090 [pdf, other]

General-purpose Dataflow Model with Neuromorphic Primitives

Authors: Weihao Zhang, Yu Du, Hongyi Li, Songchen Ma, Rong Zhao

Abstract: Neuromorphic computing exhibits great potential to provide high-performance benefits in various applications beyond neural networks. However, a general-purpose program execution model that aligns with the features of neuromorphic computing is required to bridge the gap between program versatility and neuromorphic hardware efficiency. The dataflow model offers a potential solution, but it faces high graph complexity and incompatibility with neuromorphic hardware when dealing with control flow programs, which decreases the programmability and performance. Here, we present a dataflow model tailored for neuromorphic hardware, called neuromorphic dataflow, which provides a compact, concise, and neuromorphic-compatible program representation for control logic. The neuromorphic dataflow introduces "when" and "where" primitives, which restructure the view of control. The neuromorphic dataflow embeds these primitives in the dataflow schema with the plasticity inherited from the spiking algorithms. Our method enables the deployment of general-purpose programs on neuromorphic hardware with both programmability and plasticity, while fully utilizing the hardware's potential.

Submitted 2 August, 2024; originally announced August 2024.

arXiv:2407.21075 [pdf, other]

Apple Intelligence Foundation Language Models

Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the models are optimized for inference, and the evaluation results. We highlight our focus on Responsible AI and how the principles are applied throughout the model development.

Submitted 29 July, 2024; originally announced July 2024.

arXiv:2407.19174 [pdf, other]

Reducing Spurious Correlation for Federated Domain Generalization

Authors: Shuran Ma, Weiying Xie, Daixun Li, Haowei Li, Yunsong Li

Abstract: The rapid development of multimedia has provided a large amount of data with different distributions for visual tasks, forming different domains. Federated Learning (FL) can efficiently use this diverse data distributed on different client media in a decentralized manner through model sharing. However, in open-world scenarios, there is a challenge: global models may struggle to predict well on entirely new domain data captured by certain media, which were not encountered during training. Existing methods still rely on strong statistical correlations between samples and labels to address this issue, which can be misleading, as some features may establish spurious short-cut correlations with the predictions. To comprehensively address this challenge, we introduce FedCD (Cross-Domain Invariant Federated Learning), an overall optimization framework at both the local and global levels. We introduce the Spurious Correlation Intervener (SCI), which employs invariance theory to locally generate interventers for features in a self-supervised manner to reduce the model's susceptibility to spurious correlated features. Our approach requires no sharing of data or features, only the gradients related to the model. Additionally, we develop the simple yet effective Risk Extrapolation Aggregation strategy (REA), determining aggregation coefficients through mathematical optimization to facilitate global causal invariant predictions. Extensive experiments and ablation studies highlight the effectiveness of our approach.
In both classification and object detection generalization tasks, our method outperforms the baselines by an average of at least 1.45% in Acc, 4.8% and 1.27% in mAP50.

Submitted 27 July, 2024; originally announced July 2024.

Comments: 10 pages, 4 figures

arXiv:2407.18961 [pdf, other]

MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains

Authors: Guoli Yin, Haoping Bai, Shuang Ma, Feng Nan, Yanchao Sun, Zhaoyang Xu, Shen Ma, Jiarui Lu, Xiang Kong, Aonan Zhang, Dian Ang Yap, Yizhe Zhang, Karsten Ahnert, Vik Kamath, Mathias Berglund, Dominic Walsh, Tobias Gindele, Juergen Wiest, Zhengfeng Lai, Xiaoming Wang, Jiulong Shan, Meng Cao, Ruoming Pang, Zirui Wang

Abstract: Recent advances in large language models (LLMs) have increased the demand for comprehensive benchmarks to evaluate their capabilities as human-like agents. Existing benchmarks, while useful, often focus on specific application scenarios, emphasizing task completion but failing to dissect the underlying skills that drive these outcomes. This lack of granularity makes it difficult to deeply discern where failures stem from. Additionally, setting up these environments requires considerable effort, and issues of unreliability and reproducibility sometimes arise, especially in interactive tasks. To address these limitations, we introduce the Massive Multitask Agent Understanding (MMAU) benchmark, featuring comprehensive offline tasks that eliminate the need for complex environment setups. It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics, and covers five essential capabilities: Understanding, Reasoning, Planning, Problem-solving, and Self-correction. With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents. By testing 18 representative models on MMAU, we provide deep and insightful analyses. Ultimately, MMAU not only sheds light on the capabilities and limitations of LLM agents but also enhances the interpretability of their performance.
Datasets and evaluation scripts of MMAU are released at https://github.com/apple/axlearn/tree/main/docs/research/mmau.

Submitted 15 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

arXiv:2407.18171 [pdf, other]

Chemically reactive and aging macromolecular mixtures II: Phase separation and coarsening

Authors: Ruoyao Zhang, Sheng Mao, Mikko P. Haataja

Abstract: In a companion paper, we put forth a thermodynamic model for complex formation via a chemical reaction involving multiple macromolecular species, which may subsequently undergo liquid-liquid phase separation and a further transition into a gel-like state. In the present work, we formulate a thermodynamically consistent kinetic framework to study the interplay between phase separation, chemical reaction and aging in spatially inhomogeneous macromolecular mixtures. A numerical algorithm is also proposed to simulate domain growth from collisions of liquid and gel domains via passive Brownian motion in both two and three spatial dimensions. Our results show that the coarsening behavior is significantly influenced by the degree of gelation and Brownian motion. The presence of a gel phase inside condensates strongly limits the diffusive transport processes, and Brownian motion coalescence controls the coarsening process in systems with high area/volume fractions of gel-like condensates, leading to formation of interconnected domains with atypical domain growth rates controlled by size-dependent translational and rotational diffusivities.

Submitted 25 July, 2024; originally announced July 2024.

Comments: 14 pages, 9 figures

arXiv:2407.16748 [pdf, other]

The dispersion measure and rotation measure from fast radio burst host galaxies based on the IllustrisTNG50 simulation

Authors: Timea Orsolya Kovacs, Sui Ann Mao, Aritra Basu, Yik Ki Ma, Laura G. Spitler, Charles R. H. Walker

Abstract: Fast radio bursts (FRB) will become important cosmological tools, as the number of observed FRBs is increasing rapidly with more surveys being carried out. A large sample of FRBs with dispersion measures (DM) and rotation measures (RM) can be used to study the intergalactic magnetic field. However, the observed DM and RM of FRBs have multiple contributors which must be quantified to obtain the int… ▽ More Fast radio bursts (FRB) will become important cosmological tools, as the number of observed FRBs is increasing rapidly with more surveys being carried out. A large sample of FRBs with dispersion measures (DM) and rotation measures (RM) can be used to study the intergalactic magnetic field. However, the observed DM and RM of FRBs have multiple contributors which must be quantified to obtain the intergalactic medium's (IGM) DM and RM. In this paper, we estimate one such contribution to DM and RM: that of FRB host galaxies. We show how it changes with redshift, galaxy type, and the stellar mass of the galaxies, inclination, and FRB's projected offset. Using the IllustrisTNG50 simulations, we selected 16500 galaxies at redshifts of 0<=z<=2, with stellar masses in the range 9<=log(M*/Msun)<=12. In each galaxy, we calculate the DM and RM contributions of 1000 sightlines, and construct DM and RM probability density functions. We find that the rest frame DM distributions of all galaxies at a given redshift can be fitted by a lognormal function, and the rest frame RM distribution is symmetric around 0 rad m$^{-2}$, and can be fitted by the combination of a Lorentzian and two Gaussian functions. The parameters of these functions change for different subsets of galaxies with different redshift, stellar mass, inclination, and FRB offset. 
These changes are due to an increasing $n_e$ with redshift, SFR, and stellar mass, and we find a more ordered B field at lower z compared to higher z, suggested by more galaxies with B field reversals and B fields dominated by random B field at higher z. We estimate the FRB host DM and RM contributions, which can be used in the future to isolate the IGM's contribution from the observed DM and RM of FRBs. We predict that to constrain an $σ_{\rm RM,IGM}$ of 2 rad m$^{-2}$ to 95% confidence level we need to observe 95000 FRBs at z=0.5, but only 9500 FRBs at z=2.

Submitted 23 July, 2024; originally announced July 2024.

Comments: 24 pages, 15 figures Accepted for publication in A&A

arXiv:2407.15498 [pdf, other]

Refining Corpora from a Model Calibration Perspective for Chinese Spelling Correction

Authors: Dingyao Yu, Yang An, Wei Ye, Xiongfeng Xiao, Shaoguang Mao, Tao Ge, Shikun Zhang

Abstract: Chinese Spelling Correction (CSC) commonly lacks large-scale high-quality corpora, due to the labor-intensive labeling of spelling errors in real-life human writing or typing scenarios. Two data augmentation methods are widely adopted: (1) \textit{Random Replacement} with the guidance of confusion sets and (2) \textit{OCR/ASR-based Generation} that simulates character misusing. However, both methods inevitably introduce noisy data (e.g., false spelling errors), potentially leading to over-correction. By carefully analyzing the two types of corpora, we find that though the latter achieves more robust generalization performance, the former yields better-calibrated CSC models. We then provide a theoretical analysis of this empirical observation, based on which a corpus refining strategy is proposed. Specifically, OCR/ASR-based data samples are fed into a well-calibrated CSC model trained on random replacement-based corpora and then filtered based on prediction confidence. By learning a simple BERT-based model on the refined OCR/ASR-based corpus, we set up impressive state-of-the-art performance on three widely-used benchmarks, while significantly alleviating over-correction (e.g., lowering false positive predictions).

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.15395 [pdf, other]

FAST-GSC: Fast and Adaptive Semantic Transmission for Generative Semantic Communication

Authors: Yiru Wang, Wanting Yang, Zehui Xiong, Yuping Zhao, Shiwen Mao, Tony Q. S. Quek, H. Vincent Poor

Abstract: The rapidly evolving field of generative artificial intelligence technology has introduced innovative approaches for developing semantic communication (SemCom) frameworks, leading to the emergence of a new paradigm-generative SemCom (GSC). However, the complex processes involved in semantic extraction and generative inference may result in considerable latency in resource-constrained scenarios. To tackle these issues, we introduce a new GSC framework that involves fast and adaptive semantic transmission (FAST-GSC). This framework incorporates one innovative communication mechanism and two enhancement strategies at the transmitter and receiver, respectively. Aiming to reduce task latency, our communication mechanism enables fast semantic transmission by parallelizing the processes of semantic extraction at the transmitter and inference at the receiver. Preliminary evaluations indicate that while this mechanism effectively reduces task latency, it could potentially compromise task performance. To address this issue, we propose two additional methods for enhancement. First, at the transmitter, we employ reinforcement learning to discern the intrinsic temporal dependencies among the semantic units and design their extraction and transmission sequence accordingly. Second, at the receiver, we design a semantic difference calculation module and propose a sequential conditional denoising approach to alleviate the stringent immediacy requirement for the reception of semantic features.
Extensive experiments demonstrate that our proposed architecture achieves a performance score comparable to the conventional GSC architecture while realizing a 52% reduction in residual task latency that extends beyond the fixed inference duration.

Submitted 22 July, 2024; originally announced July 2024.

arXiv:2407.14814 [pdf, other]

FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting

Authors: Shusen Ma, Yu Kang, Peng Bai, Yun-Bo Zhao

Abstract: In multivariate time-series forecasting (MTSF), extracting the temporal correlations of the input sequences is crucial. While popular Transformer-based predictive models can perform well, their quadratic computational complexity results in inefficiency and high overhead. The recently emerged Mamba, a selective state space model, has shown promising results in many fields due to its strong temporal feature extraction capabilities and linear computational complexity. However, due to the unilateral nature of Mamba, channel-independent predictive models based on Mamba cannot attend to the relationships among all variables in the manner of Transformer-based models. To address this issue, we combine fast-attention with Mamba to introduce a novel framework named FMamba for MTSF. Technically, we first extract the temporal features of the input variables through an embedding layer, then compute the dependencies among input variables via the fast-attention module. Subsequently, we use Mamba to selectively deal with the input features and further extract the temporal dependencies of the variables through the multi-layer perceptron block (MLP-block). Finally, FMamba obtains the predictive results through the projector, a linear layer. Experimental results on eight public datasets demonstrate that FMamba can achieve state-of-the-art performance while maintaining low computational overhead.

Submitted 20 July, 2024; originally announced July 2024.

arXiv:2407.14192 [pdf, other]

LeKUBE: A Legal Knowledge Update BEnchmark

Authors: Changyue Wang, Weihang Su, Hu Yiran, Qingyao Ai, Yueyue Wu, Cheng Luo, Yiqun Liu, Min Zhang, Shaoping Ma

Abstract: Recent advances in Large Language Models (LLMs) have significantly shaped the applications of AI in multiple fields, including the studies of legal intelligence. Trained on extensive legal texts, including statutes and legal documents, the legal LLMs can capture important legal knowledge/concepts effectively and provide important support for downstream legal applications such as legal consultancy. Yet, the dynamic nature of legal statutes and interpretations also poses new challenges to the use of LLMs in legal applications. Particularly, how to update the legal knowledge of LLMs effectively and efficiently has become an important research problem in practice. Existing benchmarks for evaluating knowledge update methods are mostly designed for the open domain and cannot address the specific challenges of the legal domain, such as the nuanced application of new legal knowledge, the complexity and lengthiness of legal regulations, and the intricate nature of legal reasoning. To address this gap, we introduce the Legal Knowledge Update BEnchmark, i.e. LeKUBE, which evaluates knowledge update methods for legal LLMs across five dimensions. Specifically, we categorize the needs of knowledge updates in the legal domain with the help of legal professionals, and then hire annotators from law schools to create synthetic updates to the Chinese Criminal and Civil Code as well as sets of questions of which the answers would change after the updates.
Through a comprehensive evaluation of state-of-the-art knowledge update methods, we reveal a notable gap between existing knowledge update methods and the unique needs of the legal domain, emphasizing the need for further research and development of knowledge update mechanisms tailored for legal LLMs.

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.14009 [pdf, other]

Scale Disparity of Instances in Interactive Point Cloud Segmentation

Authors: Chenrui Han, Xuan Yu, Yuxuan Xie, Yili Liu, Sitong Mao, Shunbo Zhou, Rong Xiong, Yue Wang

Abstract: Interactive point cloud segmentation has become a pivotal task for understanding 3D scenes, enabling users to guide segmentation models with simple interactions such as clicks, therefore significantly reducing the effort required to tailor models to diverse scenarios and new categories. However, in the realm of interactive segmentation, the meaning of instance diverges from that in instance segmentation, because users might desire to segment instances of both thing and stuff categories that vary greatly in scale. Existing methods have focused on thing categories, neglecting the segmentation of stuff categories and the difficulties arising from scale disparity. To bridge this gap, we propose ClickFormer, an innovative interactive point cloud segmentation model that accurately segments instances of both thing and stuff categories. We propose a query augmentation module to augment click queries by a global query sampling strategy, thus maintaining consistent performance across different instance scales. Additionally, we employ global attention in the query-voxel transformer to mitigate the risk of generating false positives, along with several other network structure improvements to further enhance the model's segmentation performance. Experiments demonstrate that ClickFormer outperforms existing interactive point cloud segmentation methods across both indoor and outdoor datasets, providing more accurate segmentation results with fewer user clicks in an open-world setting.

Submitted 18 July, 2024; originally announced July 2024.

Comments: Accepted by 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems

arXiv:2407.13801 [pdf, other]

Application of a spectral scheme to simulate horizontally slowly varying three-dimensional ocean acoustic propagation

Authors: Houwang Tu, Yongxian Wang, Xiaolan Zhou, Guojun Xu, Dongbao Gao, Shuqing Ma

Abstract: Three-dimensional numerical models for underwater sound propagation are popular in computational ocean acoustics. For horizontally slowly varying waveguide environments, an adiabatic mode-parabolic equation hybrid theory can be used for simulation. This theory employs adiabatic modes in the vertical direction, simplifying the solution of the sound pressure to the solution of horizontal refractive index of vertical modes. The refractive equations in the horizontal direction are further solved by a ``split-step" wide-angle parabolic equation model, following the approach of the ``vertical modes and horizontal parabolic equation". Existing three-dimensional sound propagation models mostly use finite difference methods for discretization, but in recent years, the academic community has proposed new types of sound propagation models based on spectral methods. Spectral methods are numerical discretization methods based on orthogonal polynomial approximation and weighted residual principles. They offer advantages such as high computational accuracy and fast convergence. In this study, a three-dimensional adiabatic mode-parabolic equation hybrid model discretized using spectral methods is proposed. In the vertical direction, the modal functions are solved using the Chebyshev spectral method. The medium layering is handled using a domain decomposition strategy, and the leaky modes under semi-infinite boundary conditions are addressed using an eigenvalue transformation technique.
In the horizontal direction, the perfectly matched layer technique is utilized to handle unbounded computational domains, and the perfectly matched layer and computational domain are segmented into multiple layers. Numerical simulations show that the Chebyshev spectral method achieves reliable results in the application of the adiabatic mode-parabolic equation hybrid model.

Submitted 17 July, 2024; originally announced July 2024.

Comments: 34 pages, 16 figures

arXiv:2407.13228 [pdf]

Evaluating Large Language Models for Anxiety and Depression Classification using Counseling and Psychotherapy Transcripts

Authors: Junwei Sun, Siqi Ma, Yiran Fan, Peter Washington

Abstract: We aim to evaluate the efficacy of traditional machine learning and large language models (LLMs) in classifying anxiety and depression from long conversational transcripts. We fine-tune both established transformer models (BERT, RoBERTa, Longformer) and more recent large models (Mistral-7B), trained a Support Vector Machine with feature engineering, and assessed GPT models through prompting. We observe that state-of-the-art models fail to enhance classification outcomes compared to traditional machine learning methods.

Submitted 18 July, 2024; originally announced July 2024.

arXiv:2407.12867 [pdf, other]

Swift-BAT GUANO follow-up of gravitational-wave triggers in the third LIGO-Virgo-KAGRA observing run

Authors: Gayathri Raman, Samuele Ronchini, James Delaunay, Aaron Tohuvavohu, Jamie A. Kennea, Tyler Parsotan, Elena Ambrosi, Maria Grazia Bernardini, Sergio Campana, Giancarlo Cusumano, Antonino D'Ai, Paolo D'Avanzo, Valerio D'Elia, Massimiliano De Pasquale, Simone Dichiara, Phil Evans, Dieter Hartmann, Paul Kuin, Andrea Melandri, Paul O'Brien, Julian P. Osborne, Kim Page, David M. Palmer, Boris Sbarufatti, Gianpiero Tagliaferri, et al. (1797 additional authors not shown)

Abstract: We present results from a search for X-ray/gamma-ray counterparts of gravitational-wave (GW) candidates from the third observing run (O3) of the LIGO-Virgo-KAGRA (LVK) network using the Swift Burst Alert Telescope (Swift-BAT). The search includes 636 GW candidates received in low latency, 86 of which have been confirmed by the offline analysis and included in the third cumulative Gravitational-Wave Transient Catalogs (GWTC-3). Targeted searches were carried out on the entire GW sample using the maximum--likelihood NITRATES pipeline on the BAT data made available via the GUANO infrastructure. We do not detect any significant electromagnetic emission that is temporally and spatially coincident with any of the GW candidates. We report flux upper limits in the 15-350 keV band as a function of sky position for all the catalog candidates. For GW candidates where the Swift-BAT false alarm rate is less than 10$^{-3}$ Hz, we compute the GW--BAT joint false alarm rate. Finally, the derived Swift-BAT upper limits are used to infer constraints on the putative electromagnetic emission associated with binary black hole mergers.

Submitted 13 July, 2024; originally announced July 2024.

Comments: 50 pages, 10 figures, 4 tables

arXiv:2407.12070 [pdf, other]

Co-Designing Binarized Transformer and Hardware Accelerator for Efficient End-to-End Edge Deployment

Authors: Yuhao Ji, Chao Fang, Shaobo Ma, Haikuo Shao, Zhongfeng Wang

Abstract: Transformer models have revolutionized AI tasks, but their large size hinders real-world deployment on resource-constrained and latency-critical edge devices. While binarized Transformers offer a promising solution by significantly reducing model size, existing approaches suffer from algorithm-hardware mismatches with limited co-design exploration, leading to suboptimal performance on edge devices. Hence, we propose a co-design method for efficient end-to-end edge deployment of Transformers from three aspects: algorithm, hardware, and joint optimization. First, we propose BMT, a novel hardware-friendly binarized Transformer with optimized quantization methods and components, and we further enhance its model accuracy by leveraging the weighted ternary weight splitting training technique. Second, we develop a streaming processor mixed binarized Transformer accelerator, namely BAT, which is equipped with specialized units and scheduling pipelines for efficient inference of binarized Transformers. Finally, we co-optimize the algorithm and hardware through a design space exploration approach to achieve a global trade-off between accuracy, latency, and robustness for real-world deployments. Experimental results show our co-design achieves up to 2.14-49.37x throughput gains and 3.72-88.53x better energy efficiency over state-of-the-art Transformer accelerators, enabling efficient end-to-end edge deployment.

Submitted 16 July, 2024; originally announced July 2024.

Comments: This paper is accepted by ICCAD 2024

arXiv:2407.11449 [pdf, other]

Controllable Contextualized Image Captioning: Directing the Visual Narrative through User-Defined Highlights

Authors: Shunqi Mao, Chaoyi Zhang, Hang Su, Hwanjun Song, Igor Shalyminov, Weidong Cai

Abstract: Contextualized Image Captioning (CIC) evolves traditional image captioning into a more complex domain, necessitating the ability for multimodal reasoning. It aims to generate image captions given specific contextual information. This paper further introduces a novel domain of Controllable Contextualized Image Captioning (Ctrl-CIC). Unlike CIC, which solely relies on broad context, Ctrl-CIC accentuates a user-defined highlight, compelling the model to tailor captions that resonate with the highlighted aspects of the context. We present two approaches, Prompting-based Controller (P-Ctrl) and Recalibration-based Controller (R-Ctrl), to generate focused captions. P-Ctrl conditions the model generation on highlight by prepending captions with highlight-driven prefixes, whereas R-Ctrl tunes the model to selectively recalibrate the encoder embeddings for highlighted tokens. Additionally, we design a GPT-4V empowered evaluator to assess the quality of the controlled captions alongside standard assessment methods. Extensive experimental results demonstrate the efficient and effective controllability of our method, charting a new direction in achieving user-adaptive image captioning. Code is available at https://github.com/ShunqiM/Ctrl-CIC.

Submitted 16 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.11372 [pdf, other]

UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening

Authors: Siyuan Cheng, Guangyu Shen, Kaiyuan Zhang, Guanhong Tao, Shengwei An, Hanxi Guo, Shiqing Ma, Xiangyu Zhang

Abstract: Deep neural networks (DNNs) have demonstrated effectiveness in various fields. However, DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called trigger, into the input to cause misclassification to an attack-chosen target label. While existing works have proposed various methods to mitigate backdoor effects in poisoned models, they tend to be less effective against recent advanced attacks. In this paper, we introduce a novel post-training defense technique UNIT that can effectively eliminate backdoor effects for a variety of attacks. In specific, UNIT approximates a unique and tight activation distribution for each neuron in the model. It then proactively dispels substantially large activation values that exceed the approximated boundaries. Our experimental results demonstrate that UNIT outperforms 7 popular defense methods against 14 existing backdoor attacks, including 2 advanced attacks, using only 5\% of clean training data. UNIT is also cost efficient. The code is accessible at https://github.com/Megum1/UNIT.

Submitted 16 July, 2024; originally announced July 2024.

Comments: The 18th European Conference on Computer Vision ECCV 2024

arXiv:2407.11282 [pdf, other]

Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models

Authors: Qingcheng Zeng, Mingyu Jin, Qinkai Yu, Zhenting Wang, Wenyue Hua, Zihao Zhou, Guangyan Sun, Yanda Meng, Shiqing Ma, Qifan Wang, Felix Juefei-Xu, Kaize Ding, Fan Yang, Ruixiang Tang, Yongfeng Zhang

Abstract: Large Language Models (LLMs) are employed across various high-stakes domains, where the reliability of their outputs is crucial. One commonly used method to assess the reliability of LLMs' responses is uncertainty estimation, which gauges the likelihood of their answers being correct. While many studies focus on improving the accuracy of uncertainty estimations for LLMs, our research investigates the fragility of uncertainty estimation and explores potential attacks. We demonstrate that an attacker can embed a backdoor in LLMs, which, when activated by a specific trigger in the input, manipulates the model's uncertainty without affecting the final output. Specifically, the proposed backdoor attack method can alter an LLM's output probability distribution, causing the probability distribution to converge towards an attacker-predefined distribution while ensuring that the top-1 prediction remains unchanged. Our experimental results demonstrate that this attack effectively undermines the model's self-evaluation reliability in multiple-choice questions. For instance, we achieved a 100 attack success rate (ASR) across three different triggering strategies in four models. Further, we investigate whether this manipulation generalizes across different prompts and domains. This work highlights a significant threat to the reliability of LLMs and underscores the need for future defenses against such attacks. The code is available at https://github.com/qcznlp/uncertainty_attack.

Submitted 19 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10969 [pdf, other]

Q-Sparse: All Large Language Models can be Fully Sparsely-Activated

Authors: Hongyu Wang, Shuming Ma, Ruiping Wang, Furu Wei

Abstract: We introduce, Q-Sparse, a simple yet effective approach to training sparsely-activated large language models (LLMs). Q-Sparse enables full sparsity of activations in LLMs which can bring significant efficiency gains in inference. This is achieved by applying top-K sparsification to the activations and the straight-through-estimator to the training. We also introduce Block Q-Sparse for batch training and inference. The key results from this work are, (1) Q-Sparse can achieve results comparable to those of baseline LLMs while being much more efficient at inference time; (2) We present an inference-optimal scaling law for sparsely-activated LLMs; (3) Q-Sparse is effective in different settings, including training-from-scratch, continue-training of off-the-shelf LLMs, and finetuning; (4) Q-Sparse works for both full-precision and 1-bit LLMs (e.g., BitNet b1.58). Particularly, the synergy of BitNet b1.58 and Q-Sparse (can be equipped with MoE) provides the cornerstone and a clear path to revolutionize the efficiency, including cost and energy consumption, of future LLMs.

Submitted 24 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

Comments: Work in progress

arXiv:2407.10805 [pdf, other]

Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval

Authors: Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Jian Guo

Abstract: Retrieval-augmented generation (RAG) has significantly advanced large language models (LLMs) by enabling dynamic information retrieval to mitigate knowledge gaps and hallucinations in generated content. However, these systems often falter with complex reasoning and consistency across diverse queries. In this work, we present Think-on-Graph 2.0, an enhanced RAG framework that aligns questions with the knowledge graph and uses it as a navigational tool, which deepens and refines the RAG paradigm for information collection and integration. The KG-guided navigation fosters deep and long-range associations to uphold logical consistency and optimize the scope of retrieval for precision and interoperability. In conjunction, factual consistency can be better ensured through semantic similarity guided by precise directives. ToG${2.0}$ not only improves the accuracy and reliability of LLMs' responses but also demonstrates the potential of hybrid structured knowledge systems to significantly advance LLM reasoning, aligning it closer to human-like performance. We conducted extensive experiments on four public datasets to demonstrate the advantages of our method compared to the baseline.

Submitted 6 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.10131 [pdf, other]

WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models

Authors: Xinjian Wu, Ruisong Zhang, Jie Qin, Shijie Ma, Cheng-Lin Liu

Abstract: Segmenting and recognizing diverse object parts is crucial in computer vision and robotics. Despite significant progress in object segmentation, part-level segmentation remains underexplored due to complex boundaries and scarce annotated data. To address this, we propose a novel Weakly-supervised Part Segmentation (WPS) setting and an approach called WPS-SAM, built on the large-scale pre-trained vision foundation model, Segment Anything Model (SAM). WPS-SAM is an end-to-end framework designed to extract prompt tokens directly from images and perform pixel-level segmentation of part regions. During its training phase, it only uses weakly supervised labels in the form of bounding boxes or points. Extensive experiments demonstrate that, through exploiting the rich knowledge embedded in pre-trained foundation models, WPS-SAM outperforms other segmentation models trained with pixel-level strong annotations. Specifically, WPS-SAM achieves 68.93% mIOU and 79.53% mACC on the PartImageNet dataset, surpassing state-of-the-art fully supervised methods by approximately 4% in terms of mIOU.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.10046 [pdf, other]

Non-Hermitian dynamics of Cooper pair splitter

Authors: E. S. Ma, Z. Song

Abstract: We propose a non-Hermitian model for Cooper pair splitters, in which the process of electron tunneling into electrodes is characterized by non-Hermitian terms. We find that across a broad range of parameters, the energy levels consistently remain real, and coalescing states are always present. The Coulomb repulsion between electrons in a quantum dot affects the order of the coalescing states. This gives rise to two distinct dynamic behaviors: (i) when the initial state is an empty state, the final state supports a nonzero electron-escaping rate; (ii) the electron-escaping rate is zero for a single-electron initial state. In the former case, our exact solutions reveal that the average electron-escaping rate vanishes along a set of hyperbolic curves in the plane of the chemical potentials of the two quantum dots. The stability of the results in the presence of disordered perturbation is also investigated. Our findings pave the way for investigating Cooper pair splitters within the framework of non-Hermitian quantum mechanics.

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.08919 [pdf, other]

Redefinition of Digital Twin and its Situation Awareness Framework Designing Towards Fourth Paradigm for Energy Internet of Things

Authors: Xing He, Yuezhong Tang, Shuyan Ma, Qian Ai, Fei Tao, Robert Qiu

Abstract: Traditional knowledge-based situation awareness (SA) modes struggle to adapt to the escalating complexity of today's Energy Internet of Things (EIoT), necessitating a pivotal paradigm shift. In response, this work introduces a pioneering data-driven SA framework, termed digital twin-based situation awareness (DT-SA), aiming to bridge existing gaps between data and demands, and further to enhance SA capabilities within the complex EIoT landscape. First, we redefine the concept of digital twin (DT) within the EIoT context, aligning it with data-intensive scientific discovery paradigm (the Fourth Paradigm) so as to waken EIoT's sleeping data; this contextual redefinition lays the cornerstone of our DT-SA framework for EIoT. Then, the framework is comprehensively explored through its four fundamental steps: digitalization, simulation, informatization, and intellectualization. These steps initiate a virtual ecosystem conducive to a continuously self-adaptive, self-learning, and self-evolving big model (BM), further contributing to the evolution and effectiveness of DT-SA in engineering. Our framework is characterized by the incorporation of system theory and Fourth Paradigm as guiding ideologies, DT as data engine, and BM as intelligence engine. This unique combination forms the backbone of our approach.
This work extends beyond engineering, stepping into the domain of data science -- DT-SA not only enhances management practices for EIoT users/operators, but also propels advancements in pattern analysis and machine intelligence (PAMI) within the intricate fabric of a complex system. Numerous real-world cases validate our DT-SA framework.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 16 pages, 15 figures. Accepted by IEEE Transactions on Systems, Man and Cybernetics: Systems

arXiv:2407.08424 [pdf, other]

Semantic Feature Division Multiple Access for Multi-user Digital Interference Networks

Authors: Shuai Ma, Chuanhui Zhang, Bin Shen, Youlong Wu, Hang Li, Shiyin Li, Guangming Shi, Naofal Al-Dhahir

Abstract: With the ever-increasing user density and quality of service (QoS) demand, 5G networks with limited spectrum resources are facing massive access challenges. To address these challenges, in this paper, we propose a novel discrete semantic feature division multiple access (SFDMA) paradigm for multi-user digital interference networks. Specifically, by utilizing deep learning technology, SFDMA extracts multi-user semantic information into discrete representations in distinguishable semantic subspaces, which enables multiple users to transmit simultaneously over the same time-frequency resources. Furthermore, based on a robust information bottleneck, we design a SFDMA based multi-user digital semantic interference network for inference tasks, which can achieve approximate orthogonal transmission. Moreover, we propose a SFDMA based multi-user digital semantic interference network for image reconstruction tasks, where the discrete outputs of the semantic encoders of the users are approximately orthogonal, which significantly reduces multi-user interference. Furthermore, we propose an Alpha-Beta-Gamma (ABG) formula for semantic communications, which is the first theoretical relationship between inference accuracy and transmission power. Then, we derive adaptive power control methods with closed-form expressions for inference tasks. Extensive simulations verify the effectiveness and superiority of the proposed SFDMA.

Submitted 11 July, 2024; originally announced July 2024.

Showing 1–50 of 2,063 results for author: Ma, S