Showing 1–50 of 1,348 results for author: Xue, W

Searching in archive cs.
  1. arXiv:2409.05880  [pdf, ps, other]

    cs.HC

    Bridging Research and Practice Through Conversation: Reflecting on Our Experience

    Authors: Mayra Russo, Mackenzie Jorgensen, Kristen M. Scott, Wendy Xu, Di H. Nguyen, Jessie Finocchiaro, Matthew Olckers

    Abstract: While some research fields have a long history of collaborating with domain experts outside academia, many quantitative researchers do not have natural avenues to meet experts in areas where the research is later deployed. We explain how conversations -- interviews without a specific research objective -- can bridge research and practice. Using collaborative autoethnography, we reflect on our expe…

    Submitted 25 August, 2024; originally announced September 2024.

    Comments: To be published in the fourth ACM Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO'24)

  2. arXiv:2409.04945  [pdf, other]

    cs.CV eess.SP

    Fast Deep Predictive Coding Networks for Videos Feature Extraction without Labels

    Authors: Wenqian Xue, Chi Ding, Jose Principe

    Abstract: Brain-inspired deep predictive coding networks (DPCNs) effectively model and capture video features through a bi-directional information flow, even without labels. They are based on an overcomplete description of video scenes, and one of the bottlenecks has been the lack of effective sparsification techniques to find discriminative and robust dictionaries. FISTA has been the best alternative. This…

    Submitted 7 September, 2024; originally announced September 2024.

  3. arXiv:2409.04267  [pdf, other]

    cs.AI cs.CL

    An overview of domain-specific foundation model: key technologies, applications and challenges

    Authors: Haolong Chen, Hanzhi Chen, Zijian Zhao, Kaifeng Han, Guangxu Zhu, Yichen Zhao, Ying Du, Wei Xu, Qingjiang Shi

    Abstract: The impressive performance of ChatGPT and other foundation-model-based products in human language understanding has prompted both academia and industry to explore how these models can be tailored for specific industries and application scenarios. This process, known as the customization of domain-specific foundation models, addresses the limitations of general-purpose models, which may not fully c…

    Submitted 6 September, 2024; originally announced September 2024.

  4. arXiv:2409.03810  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data

    Authors: Yejie Wang, Keqing He, Dayuan Fu, Zhuoma Gongque, Heyang Xu, Yanxu Chen, Zhexu Wang, Yujia Fu, Guanting Dong, Muxi Diao, Jingang Wang, Mengdi Zhang, Xunliang Cai, Weiran Xu

    Abstract: Recently, there has been a growing interest in studying how to construct better code instruction tuning data. However, we observe Code models trained with these datasets exhibit high performance on HumanEval but perform worse on other benchmarks such as LiveCodeBench. Upon further investigation, we find that many datasets suffer from severe data leakage. After cleaning up most of the leaked data,…

    Submitted 5 September, 2024; originally announced September 2024.

    Comments: Work in progress

  5. arXiv:2409.03457  [pdf, other]

    cs.RO

    FLAF: Focal Line and Feature-constrained Active View Planning for Visual Teach and Repeat

    Authors: Changfei Fu, Weinan Chen, Wenjun Xu, Hong Zhang

    Abstract: This paper presents FLAF, a focal line and feature-constrained active view planning method for tracking failure avoidance in feature-based visual navigation of mobile robots. Our FLAF-based visual navigation is built upon a feature-based visual teach and repeat (VT&R) framework, which supports many robotic applications by teaching a robot to navigate on various paths that cover a significant port…

    Submitted 8 September, 2024; v1 submitted 5 September, 2024; originally announced September 2024.

  6. arXiv:2409.02919  [pdf, other]

    cs.CV

    HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts

    Authors: Xinyu Liu, Yingqing He, Lanqing Guo, Xiang Li, Bu Jin, Peng Li, Yan Li, Chi-Min Chan, Qifeng Chen, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo

    Abstract: The potential for higher-resolution image generation using pretrained diffusion models is immense, yet these models often struggle with issues of object repetition and structural artifacts, especially when scaling to 4K resolution and higher. We find that the problem arises because a single prompt for the generation of multiple scales provides insufficient efficacy. In response, we propos…

    Submitted 9 September, 2024; v1 submitted 4 September, 2024; originally announced September 2024.

    Comments: https://liuxinyv.github.io/HiPrompt/

  7. arXiv:2409.02648  [pdf, other]

    cond-mat.mtrl-sci cs.CV

    Creating a Microstructure Latent Space with Rich Material Information for Multiphase Alloy Design

    Authors: Xudong Ma, Yuqi Zhang, Chenchong Wang, Ming Wang, Mingxin Huang, Wei Xu

    Abstract: The intricate microstructure serves as the cornerstone for the composition/processing-structure-property (CPSP) connection in multiphase alloys. Traditional alloy design methods often overlook microstructural details, which diminishes the reliability and effectiveness of the outcomes. This study introduces an improved alloy design algorithm that integrates authentic microstructural information to…

    Submitted 4 September, 2024; originally announced September 2024.

  8. arXiv:2409.02074  [pdf, other]

    cs.CR cs.HC cs.LG cs.SE

    RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer

    Authors: Jiangyi Deng, Xinfeng Li, Yanjiao Chen, Yijie Bai, Haiqin Weng, Yan Liu, Tao Wei, Wenyuan Xu

    Abstract: Malicious shell commands are linchpins to many cyber-attacks, but may not be easy to understand by security analysts due to complicated and often disguised code structures. Advances in large language models (LLMs) have unlocked the possibility of generating understandable explanations for shell commands. However, existing general-purpose LLMs suffer from a lack of expert knowledge and a tendency t…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: Accepted by NDSS Symposium 2025. Please cite this paper as "Jiangyi Deng, Xinfeng Li, Yanjiao Chen, Yijie Bai, Haiqin Weng, Yan Liu, Tao Wei, Wenyuan Xu. RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer. In the 32nd Annual Network and Distributed System Security Symposium (NDSS 2025)."

  9. arXiv:2409.00992  [pdf, other]

    cs.RO

    MFCalib: Single-shot and Automatic Extrinsic Calibration for LiDAR and Camera in Targetless Environments Based on Multi-Feature Edge

    Authors: Tianyong Ye, Wei Xu, Chunran Zheng, Yukang Cui

    Abstract: This paper presents MFCalib, an innovative extrinsic calibration technique for LiDAR and RGB camera that operates automatically in targetless environments with a single data capture. At the heart of this method is using a rich set of edge information, significantly enhancing calibration accuracy and robustness. Specifically, we extract both depth-continuous and depth-discontinuous edges, along wit…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 8 pages, 10 figures, accepted by IROS2024

  10. arXiv:2409.00086  [pdf, other]

    cs.NI cs.AR cs.HC cs.LG eess.SY

    Towards Battery-Free Wireless Sensing via Radio-Frequency Energy Harvesting

    Authors: Tao Ni, Zehua Sun, Mingda Han, Guohao Lan, Yaxiong Xie, Zhenjiang Li, Tao Gu, Weitao Xu

    Abstract: Diverse Wi-Fi-based wireless applications have been proposed, ranging from daily activity recognition to vital sign monitoring. Despite their remarkable sensing accuracy, the high energy consumption and the requirement for customized hardware modification hinder the wide deployment of the existing sensing solutions. In this paper, we propose REHSense, an energy-efficient wireless sensing solution…

    Submitted 25 August, 2024; originally announced September 2024.

  11. arXiv:2409.00036  [pdf, other]

    cs.IT cs.LG cs.MA eess.SY

    GNN-Empowered Effective Partial Observation MARL Method for AoI Management in Multi-UAV Network

    Authors: Yuhao Pan, Xiucheng Wang, Zhiyao Xu, Nan Cheng, Wenchao Xu, Jun-jie Zhang

    Abstract: Unmanned Aerial Vehicles (UAVs), due to their low cost and high flexibility, have been widely used in various scenarios to enhance network performance. However, the optimization of UAV trajectories in unknown areas or areas without sufficient prior information, still faces challenges related to poor planning performance and low distributed execution. These challenges arise when UAVs rely solely on…

    Submitted 17 August, 2024; originally announced September 2024.

  12. arXiv:2408.17175  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

    Authors: Zhen Ye, Peiwen Sun, Jiahe Lei, Hongzhan Lin, Xu Tan, Zheqi Dai, Qiuqiang Kong, Jianyi Chen, Jiahao Pan, Qifeng Liu, Yike Guo, Wei Xue

    Abstract: Recent advancements in audio generation have been significantly propelled by the capabilities of Large Language Models (LLMs). The existing research on audio LLM has primarily focused on enhancing the architecture and scale of audio language models, as well as leveraging larger datasets, and generally, acoustic codecs, such as EnCodec, are used for audio tokenization. However, these codecs were or…

    Submitted 30 August, 2024; originally announced August 2024.

  13. arXiv:2408.15488  [pdf, other]

    cs.CL

    Legilimens: Practical and Unified Content Moderation for Large Language Model Services

    Authors: Jialin Wu, Jiangyi Deng, Shengyuan Pang, Yanjiao Chen, Jiayang Xu, Xinfeng Li, Wenyuan Xu

    Abstract: Given the societal impact of unsafe content generated by large language models (LLMs), ensuring that LLM services comply with safety standards is a crucial concern for LLM service providers. Common content moderation methods are limited by an effectiveness-and-efficiency dilemma, where simple models are fragile while sophisticated models consume excessive computational resources. In this paper, we…

    Submitted 5 September, 2024; v1 submitted 27 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Conference on Computer and Communications Security (CCS) 2024

  14. arXiv:2408.14972  [pdf, other]

    cs.CL

    AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems

    Authors: Chi-Min Chan, Jianxuan Yu, Weize Chen, Chunyang Jiang, Xinyu Liu, Weijie Shi, Zhiyuan Liu, Wei Xue, Yike Guo

    Abstract: The rapid advancement of large language models (LLMs) has led to the rise of LLM-based agents. Recent research shows that multi-agent systems (MAS), where each agent plays a specific role, can outperform individual LLMs. However, configuring an MAS for a task remains challenging, with performance only observable post-execution. Inspired by scaling laws in LLM development, we investigate whether MA…

    Submitted 27 August, 2024; originally announced August 2024.

  15. arXiv:2408.14035  [pdf, other]

    cs.RO cs.CV

    FAST-LIVO2: Fast, Direct LiDAR-Inertial-Visual Odometry

    Authors: Chunran Zheng, Wei Xu, Zuhao Zou, Tong Hua, Chongjian Yuan, Dongjiao He, Bingyang Zhou, Zheng Liu, Jiarong Lin, Fangcheng Zhu, Yunfan Ren, Rong Wang, Fanle Meng, Fu Zhang

    Abstract: This paper proposes FAST-LIVO2: a fast, direct LiDAR-inertial-visual odometry framework to achieve accurate and robust state estimation in SLAM tasks and provide great potential in real-time, onboard robotic applications. FAST-LIVO2 fuses the IMU, LiDAR and image measurements efficiently through an ESIKF. To address the dimension mismatch between the heterogeneous LiDAR and image measurements, we…

    Submitted 28 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

    Comments: 30 pages, 31 figures, due to the limitation that 'The abstract field cannot exceed 1,920 characters', the abstract presented here is shorter than the one in the PDF file

  16. arXiv:2408.13849  [pdf]

    cs.CR

    Sample-Independent Federated Learning Backdoor Attack

    Authors: Weida Xu, Yang Xu, Sicong Zhang

    Abstract: In federated learning, backdoor attacks embed triggers in the adversarial client's data to inject a backdoor into the model. To evade detection through sample analysis, non-sample-modifying backdoor attack methods based on dropout have been developed. However, these methods struggle to covertly utilize dropout in evaluation mode, thus hindering their deployment in real-world scenarios. To address…

    Submitted 25 August, 2024; originally announced August 2024.

  17. arXiv:2408.13773  [pdf, other]

    cs.CR cs.AI

    SAB:A Stealing and Robust Backdoor Attack based on Steganographic Algorithm against Federated Learning

    Authors: Weida Xu, Yang Xu, Sicong Zhang

    Abstract: Federated learning, an innovative network architecture designed to safeguard user privacy, is gaining widespread adoption in the realm of technology. However, given the existence of backdoor attacks in federated learning, exploring the security of federated learning is significant. Nevertheless, the backdoors investigated in current federated learning research can be readily detected by human ins…

    Submitted 25 August, 2024; originally announced August 2024.

  18. arXiv:2408.12616  [pdf, other]

    cs.CV cs.AI

    Semantic Communication based on Large Language Model for Underwater Image Transmission

    Authors: Weilong Chen, Wenxuan Xu, Haoran Chen, Xinran Zhang, Zhijin Qin, Yanru Zhang, Zhu Han

    Abstract: Underwater communication is essential for environmental monitoring, marine biology research, and underwater exploration. Traditional underwater communication faces limitations like low bandwidth, high latency, and susceptibility to noise, while semantic communication (SC) offers a promising solution by focusing on the exchange of semantics rather than symbols or bits. However, SC encounters challe…

    Submitted 25 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  19. arXiv:2408.12162  [pdf, ps, other]

    cs.IT eess.SP

    Empowering Over-the-Air Personalized Federated Learning via RIS

    Authors: Wei Shi, Jiacheng Yao, Jindan Xu, Wei Xu, Lexi Xu, Chunming Zhao

    Abstract: Over-the-air computation (AirComp) integrates analog communication with task-oriented computation, serving as a key enabling technique for communication-efficient federated learning (FL) over wireless networks. However, AirComp-enabled FL (AirFL) with a single global consensus model fails to address the data heterogeneity in real-life FL scenarios with non-independent and identically distributed l…

    Submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by SCIENCE CHINA Information Sciences

  20. arXiv:2408.11446  [pdf, other]

    cs.ET

    Green Probabilistic Semantic Communication over Wireless Networks

    Authors: Ruopeng Xu, Zhaohui Yang, Yijie Mao, Chongwen Huang, Qianqian Yang, Lexi Xu, Wei Xu, Zhaoyang Zhang

    Abstract: In this paper, we propose a multi-user green semantic communication system facilitated by a probabilistic knowledge graph (PKG). By integrating probability into the knowledge graph, we enable probabilistic semantic communication (PSC) and represent semantic information accordingly. On this basis, a semantic compression model designed for multi-user downlink task-oriented communication is introduce…

    Submitted 21 August, 2024; originally announced August 2024.

  21. arXiv:2408.11381  [pdf, other]

    cs.CL

    RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation

    Authors: Xuanwang Zhang, Yunze Song, Yidong Wang, Shuyun Tang, Xinfeng Li, Zhengran Zeng, Zhen Wu, Wei Ye, Wenyuan Xu, Yue Zhang, Xinyu Dai, Shikun Zhang, Qingsong Wen

    Abstract: Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention. However, even the most advanced LLMs face challenges such as hallucinations and real-time updating of their knowledge. Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval Augmented Generation (RAG). However, two key issu…

    Submitted 9 September, 2024; v1 submitted 21 August, 2024; originally announced August 2024.

    Comments: 6 pages, 3 figures

  22. arXiv:2408.10899  [pdf, other]

    cs.RO

    All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents

    Authors: Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang

    Abstract: Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by of…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project website: https://imaei.github.io/project_pages/ario/

  23. arXiv:2408.10641  [pdf, other]

    cs.CV cs.AI

    A Review of Human-Object Interaction Detection

    Authors: Yuxiao Wang, Qiwei Xiong, Yu Lei, Weiying Xue, Qi Liu, Zhenao Wei

    Abstract: Human-object interaction (HOI) detection plays a key role in high-level visual understanding, facilitating a deep comprehension of human activities. Specifically, HOI detection aims to locate the humans and objects involved in interactions within images or videos and classify the specific interactions between them. The success of this task is influenced by several key factors, including the accura…

    Submitted 20 August, 2024; originally announced August 2024.

  24. arXiv:2408.10562  [pdf, other]

    cs.RO cs.CV

    Kalib: Markerless Hand-Eye Calibration with Keypoint Tracking

    Authors: Tutian Tang, Minghao Liu, Wenqiang Xu, Cewu Lu

    Abstract: Hand-eye calibration involves estimating the transformation between the camera and the robot. Traditional methods rely on fiducial markers, involving much manual labor and careful setup. Recent advancements in deep learning offer markerless techniques, but they present challenges, including the need for retraining networks for each robot, the requirement of accurate mesh models for data generation…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: The code and supplementary materials are available at https://sites.google.com/view/hand-eye-kalib

  25. arXiv:2408.10280  [pdf, other]

    cs.LG

    NoRA: Nested Low-Rank Adaptation for Efficient Fine-Tuning Large Models

    Authors: Cheng Lin, Lujun Li, Dezhi Li, Jie Zou, Wei Xue, Yike Guo

    Abstract: In this paper, we introduce Nested Low-Rank Adaptation (NoRA), a novel approach to parameter-efficient fine-tuning that extends the capabilities of Low-Rank Adaptation (LoRA) techniques. Vanilla LoRA overlooks pre-trained weight inheritance and still requires fine-tuning numerous parameters. To address these issues, our NoRA adopts a dual-layer nested structure with Singular Value Decomposition…

    Submitted 27 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: Work in progress, revisions ongoing

  26. arXiv:2408.10088  [pdf, other]

    cs.SI

    Recent Surge in Public Interest in Transportation: Sentiment Analysis of Baidu Apollo Go Using Weibo Data

    Authors: Shiqi Wang, Zhouye Zhao, Yuhang Xie, Mingchuan Ma, Zirui Chen, Zeyu Wang, Bohao Su, Wenrui Xu, Tianyi Li

    Abstract: Urban mobility and transportation systems have been profoundly transformed by the advancement of autonomous vehicle technologies. Baidu Apollo Go, a pioneer robotaxi service from the Chinese tech giant Baidu, has recently been widely deployed in major cities like Beijing and Wuhan, sparking increased conversation and offering a glimpse into the future of urban mobility. This study investigates p…

    Submitted 19 August, 2024; originally announced August 2024.

    ACM Class: J.4

  27. arXiv:2408.09849  [pdf, other]

    cs.CL cs.AI

    Importance Weighting Can Help Large Language Models Self-Improve

    Authors: Chunyang Jiang, Chi-min Chan, Wei Xue, Qifeng Liu, Yike Guo

    Abstract: Large language models (LLMs) have shown remarkable capability in numerous tasks and applications. However, fine-tuning LLMs using high-quality datasets under external supervision remains prohibitively expensive. In response, LLM self-improvement approaches have been vibrantly developed recently. The typical paradigm of LLM self-improvement involves training LLM on self-generated data, part of whic…

    Submitted 19 August, 2024; originally announced August 2024.

  28. arXiv:2408.09403  [pdf, other]

    cs.AI cs.CV

    Obtaining Optimal Spiking Neural Network in Sequence Learning via CRNN-SNN Conversion

    Authors: Jiahao Su, Kang You, Zekai Xu, Weizhi Xu, Zhezhi He

    Abstract: Spiking neural networks (SNNs) are becoming a promising alternative to conventional artificial neural networks (ANNs) due to their rich neural dynamics and the implementation of energy-efficient neuromorphic chips. However, the non-differential binary communication mechanism makes SNN hard to converge to an ANN-level accuracy. When SNN encounters sequence learning, the situation becomes worse due…

    Submitted 25 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by 33rd International Conference on Artificial Neural Networks

  29. arXiv:2408.08713  [pdf, other]

    cs.LG cs.AI cs.IR

    Beyond KAN: Introducing KarSein for Adaptive High-Order Feature Interaction Modeling in CTR Prediction

    Authors: Yunxiao Shi, Wujiang Xu, Mingyu Jin, Haimin Zhang, Qiang Wu, Yongfeng Zhang, Min Xu

    Abstract: Modeling feature interactions is crucial for click-through rate (CTR) prediction, particularly when it comes to high-order explicit interactions. Traditional methods struggle with this task because they often predefine a maximum interaction order, which relies heavily on prior knowledge and can limit the model's effectiveness. Additionally, modeling high-order interactions typically leads to incre…

    Submitted 25 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: KarSein for CTR

  30. arXiv:2408.07470  [pdf, other]

    cs.HC

    Enhancement of Co-located Shared VR Experiences: Representing Non-HMD Observers on Both HMD and 2D Screen

    Authors: Zixuan Guo, Wenge Xu, Hongyu Wang, Tingjie Wan, Nilufar Baghaei, Cheng-Hung Lo, Hai-Ning Liang

    Abstract: Virtual reality (VR) not only allows head-mounted display (HMD) users to immerse themselves in virtual worlds but also to share them with others. When designed correctly, this shared experience can be enjoyable. However, in typical scenarios, HMD users are isolated by their devices, and non-HMD observers lack connection with the virtual world. To address this, our research investigates visually re…

    Submitted 14 August, 2024; originally announced August 2024.

  31. arXiv:2408.07468  [pdf, other]

    cs.HC

    Exploring the Impact of Passthrough on VR Exergaming in Public Environments: A Field Study

    Authors: Zixuan Guo, Hanxiao Deng, Hongyu Wang, Angel J. Y. Tan, Wenge Xu, Hai-Ning Liang

    Abstract: Sedentary behavior is becoming increasingly prevalent in daily work and study environments. VR exergaming has emerged as a promising solution in these places of work and study. However, private spaces in these environments are not easy, and engaging in VR exergaming in public settings presents its own set of challenges (e.g., safety, social acceptance, isolation, and privacy protection). The recen…

    Submitted 14 August, 2024; originally announced August 2024.

  32. arXiv:2408.07184  [pdf, other]

    cs.SD cs.AI

    A New Dataset, Notation Software, and Representation for Computational Schenkerian Analysis

    Authors: Stephen Ni-Hahn, Weihan Xu, Jerry Yin, Rico Zhu, Simon Mak, Yue Jiang, Cynthia Rudin

    Abstract: Schenkerian Analysis (SchA) is a uniquely expressive method of music analysis, combining elements of melody, harmony, counterpoint, and form to describe the hierarchical structure supporting a work of music. However, despite its powerful analytical utility and potential to improve music understanding and generation, SchA has rarely been utilized by the computer music community. This is in large pa…

    Submitted 13 August, 2024; originally announced August 2024.

  33. arXiv:2408.06266  [pdf, other]

    cs.LG cs.AI cs.CL

    Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment

    Authors: Karel D'Oosterlinck, Winnie Xu, Chris Develder, Thomas Demeester, Amanpreet Singh, Christopher Potts, Douwe Kiela, Shikib Mehri

    Abstract: Large Language Models (LLMs) are often aligned using contrastive alignment objectives and preference pair datasets. The interaction between model, paired data, and objective makes alignment a complicated procedure, sometimes producing subpar results. We study this and find that (i) preference data gives a better learning signal when the underlying responses are contrastive, and (ii) alignment obje…

    Submitted 3 September, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  34. arXiv:2408.06082  [pdf, ps, other]

    cs.SE

    AutoCheck: Automatically Identifying Variables for Checkpointing by Data Dependency Analysis

    Authors: Xiang Fu, Weiping Zhang, Xin Huang, Shiman Meng, Wubiao Xu, Luanzheng Guo, Kento Sato

    Abstract: Checkpoint/Restart (C/R) has been widely deployed in numerous HPC systems, Clouds, and industrial data centers, which are typically operated by system engineers. Nevertheless, there is no existing approach that helps system engineers without domain expertise, and domain scientists without system fault tolerance knowledge identify those critical variables accounted for correct application execution…

    Submitted 15 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: 11 pages, 7 figures, 4 tables

  35. arXiv:2408.04738  [pdf, other]

    cs.RO

    DiPGrasp: Parallel Local Searching for Efficient Differentiable Grasp Planning

    Authors: Wenqiang Xu, Jieyi Zhang, Tutian Tang, Zhenjun Yu, Yutong Li, Cewu Lu

    Abstract: Grasp planning is an important task for robotic manipulation. Though it is a richly studied area, a standalone, fast, and differentiable grasp planner that can work with robot grippers of different DOFs has not been reported. In this work, we present DiPGrasp, a grasp planner that satisfies all these goals. DiPGrasp takes a force-closure geometric surface matching grasp quality metric. It adopts a…

    Submitted 8 August, 2024; originally announced August 2024.

  36. arXiv:2408.04267  [pdf, other]

    cs.SD eess.AS

    Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation in Speech Enhancement

    Authors: Runduo Han, Weiming Xu, Zihan Zhang, Mingshuai Liu, Lei Xie

    Abstract: The deep complex convolution recurrent network (DCCRN) achieves excellent speech enhancement performance by utilizing the audio spectrum's complex features. However, it has a large number of model parameters. We propose a smaller model, Distil-DCCRN, which has only 30% of the parameters compared to the DCCRN. To ensure that the performance of Distil-DCCRN matches that of the DCCRN, we employ the k…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Signal Processing Letters

  37. arXiv:2408.03215  [pdf, other]

    cs.LG cs.DC

    FedBAT: Communication-Efficient Federated Learning via Learnable Binarization

    Authors: Shiwei Li, Wenchao Xu, Haozhao Wang, Xing Tang, Yining Qi, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed machine learning paradigm that can effectively exploit large-scale data without exposing users' privacy. However, it may incur significant communication overhead, thereby potentially impairing the training efficiency. To address this challenge, numerous studies suggest binarizing the model updates. Nonetheless, traditional methods usually binarize mode…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by ICML 2024

  38. arXiv:2408.02710  [pdf, other]

    cs.LG cs.CV

    RCDM: Enabling Robustness for Conditional Diffusion Model

    Authors: Weifeng Xu, Xiang Zhu, Xiaoyong Li

    Abstract: The conditional diffusion model (CDM) enhances the standard diffusion model by providing more control, improving the quality and relevance of the outputs, and making the model adaptable to a wider range of complex tasks. However, inaccurate conditional inputs in the inverse process of CDM can easily lead to generating fixed errors in the neural network, which diminishes the adaptability of a well-…

    Submitted 5 August, 2024; originally announced August 2024.

  39. arXiv:2408.02632  [pdf, other]

    cs.CL cs.AI

    SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models

    Authors: Muxi Diao, Rumei Li, Shiyang Liu, Guogang Liao, Jingang Wang, Xunliang Cai, Weiran Xu

    Abstract: As large language models (LLMs) continue to advance in capability and influence, ensuring their security and preventing harmful outputs has become crucial. A promising approach to address these concerns involves training models to automatically generate adversarial prompts for red teaming. However, the evolving subtlety of vulnerabilities in LLMs challenges the effectiveness of current adversarial…

    Submitted 5 August, 2024; originally announced August 2024.

  40. arXiv:2408.02487  [pdf, other]

    cs.SE cs.AI cs.LG

    A First Look at License Compliance Capability of LLMs in Code Generation

    Authors: Weiwei Xu, Kai Gao, Hao He, Minghui Zhou

    Abstract: Recent advances in Large Language Models (LLMs) have revolutionized code generation, leading to widespread adoption of AI coding tools by developers. However, LLMs can generate license-protected code without providing the necessary license information, leading to potential intellectual property violations during software production. This paper addresses the critical, yet underexplored, issue of li…

    Submitted 5 August, 2024; originally announced August 2024.

  41. arXiv:2408.02032  [pdf, other]

    cs.CV cs.AI

    Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models

    Authors: Fushuo Huo, Wenchao Xu, Zhong Zhang, Haozhao Wang, Zhicheng Chen, Peilin Zhao

    Abstract: While Large Vision-Language Models (LVLMs) have rapidly advanced in recent years, the prevalent issue known as the 'hallucination' problem has emerged as a significant bottleneck, hindering their real-world deployments. Existing methods mitigate this issue mainly from two perspectives: One approach leverages extra knowledge like robust instruction tuning LVLMs with curated datasets or employing au…

    Submitted 4 August, 2024; originally announced August 2024.

  42. arXiv:2408.01803  [pdf, other]

    cs.LG cs.CL

    STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs

    Authors: Peijie Dong, Lujun Li, Dayou Du, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Wenhan Luo, Qifeng Liu, Yike Guo, Xiaowen Chu

    Abstract: In this paper, we present STBLLM, the first structural binarization framework for compressing Large Language Models (LLMs) to less than 1-bit precision. LLMs have achieved remarkable performance, but their heavy memory requirements have hindered widespread adoption, particularly on resource-constrained devices. Binarization, which quantizes weights to a mere 1 bit, achieves a milestone in increas…

    Submitted 3 August, 2024; originally announced August 2024.

  43. arXiv:2408.01791  [pdf]

    cs.NI

    Implementing NAT Hole Punching with QUIC

    Authors: Jinyu Liang, Wei Xu, Taotao Wang, Qing Yang, Shengli Zhang

    Abstract: The widespread adoption of Network Address Translation (NAT) technology has led to a significant number of network end nodes being located in private networks behind NAT devices, impeding direct communication between these nodes. To solve this problem, a technique known as "hole punching" has been devised for NAT traversal to facilitate peer-to-peer communication among end nodes located in distinc…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted for oral presentation at the VTC2024-Fall Conference

  44. arXiv:2408.01419  [pdf, other]

    cs.CL

    DebateQA: Evaluating Question Answering on Debatable Knowledge

    Authors: Rongwu Xu, Xuan Qi, Zehan Qi, Wei Xu, Zhijiang Guo

    Abstract: The rise of large language models (LLMs) has enabled us to seek answers to inherently debatable questions on LLM chatbots, necessitating a reliable way to evaluate their ability. However, traditional QA benchmarks, which assume fixed answers, are inadequate for this purpose. To address this, we introduce DebateQA, a dataset of 2,941 debatable questions, each accompanied by multiple human-annotated partial…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Dataset and scripts for evaluation are available at https://github.com/pillowsofwind/DebateQA

  45. arXiv:2408.01271  [pdf, other]

    cs.CE

    HRFT: Mining High-Frequency Risk Factor Collections End-to-End via Transformer

    Authors: Wenyan Xu, Rundong Wang, Chen Li, Yonghong Hu, Zhonghua Lu

    Abstract: In quantitative trading, it is common to find patterns in short-term volatile trends of the market. These patterns are known as High Frequency (HF) risk factors, serving as key indicators of future stock price volatility. Traditionally, these risk factors were generated by financial models relying heavily on manually added domain-specific knowledge rather than extensive market data. Inspired by sy…

    Submitted 5 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: Preprint. Under review

  46. arXiv:2408.00913  [pdf, other]

    cs.NI cs.ET

    Design and Implementation of ARA Wireless Living Lab for Rural Broadband and Applications

    Authors: Taimoor Ul Islam, Joshua Ofori Boateng, Md Nadim, Guoying Zu, Mukaram Shahid, Xun Li, Tianyi Zhang, Salil Reddy, Wei Xu, Ataberk Atalar, Vincent Lee, Yung-Fu Chen, Evan Gosling, Elisabeth Permatasari, Christ Somiah, Zhibo Meng, Sarath Babu, Mohammed Soliman, Ali Hussain, Daji Qiao, Mai Zheng, Ozdal Boyraz, Yong Guan, Anish Arora, Mohamed Selim, et al. (6 additional authors not shown)

    Abstract: To address the rural broadband challenge and to leverage the unique opportunities that rural regions provide for piloting advanced wireless applications, we design and implement the ARA wireless living lab for research and innovation in rural wireless systems and their applications in precision agriculture, community services, and so on. ARA focuses on the unique community, application, and econom…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 17 pages, 18 figures

  47. arXiv:2407.21531  [pdf, other]

    cs.SD cs.CL cs.MM eess.AS

    Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

    Authors: Ziya Zhou, Yuhang Wu, Zhiyue Wu, Xinyue Zhang, Ruibin Yuan, Yinghao Ma, Lu Wang, Emmanouil Benetos, Wei Xue, Yike Guo

    Abstract: Symbolic music, akin to language, can be encoded in discrete symbols. Recent research has extended the application of large language models (LLMs) such as GPT-4 and Llama2 to the symbolic music domain, including understanding and generation. Yet scant research explores the details of how these LLMs perform on advanced music understanding and conditioned generation, especially from the multi-step re…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ISMIR2024

  48. arXiv:2407.20962  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

    Authors: Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo

    Abstract: Massive multi-modality datasets play a significant role in facilitating the success of large video-language models. However, current video-language datasets primarily provide text descriptions for visual frames, considering audio to be weakly related information. They usually overlook exploring the potential of inherent audio-visual correlation, leading to monotonous annotation within each modalit…

    Submitted 6 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 15 Pages. Dataset report

  49. arXiv:2407.19765  [pdf, other]

    cs.AI

    Map2Traj: Street Map Piloted Zero-shot Trajectory Generation with Diffusion Model

    Authors: Zhenyu Tao, Wei Xu, Xiaohu You

    Abstract: User mobility modeling plays a crucial role in the analysis and optimization of contemporary wireless networks. Typical stochastic mobility models, e.g., the random waypoint model and the Gauss-Markov model, can hardly capture the distribution characteristics of users within real-world areas. State-of-the-art trace-based mobility models and existing learning-based trajectory generation methods, however, are…

    Submitted 29 July, 2024; originally announced July 2024.

  50. arXiv:2407.19672  [pdf, other]

    cs.CL

    SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

    Authors: Wenxuan Zhang, Hou Pong Chan, Yiran Zhao, Mahani Aljunied, Jianyu Wang, Chaoqun Liu, Yue Deng, Zhiqiang Hu, Weiwen Xu, Yew Ken Chia, Xin Li, Lidong Bing

    Abstract: Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by it…

    Submitted 28 July, 2024; originally announced July 2024.