Showing 1–50 of 523 results for author: Xia, C

Searching in archive cs.
  1. arXiv:2409.11295  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    EIA: Environmental Injection Attack on Generalist Web Agents for Privacy Leakage

    Authors: Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, Huan Sun

    Abstract: Generalist web agents have evolved rapidly and demonstrated remarkable potential. However, there are unprecedented safety risks associated with them, which are nearly unexplored so far. In this work, we aim to narrow this gap by conducting the first study on the privacy risks of generalist web agents in adversarial environments. First, we present a threat model that discusses the adversarial…

    Submitted 17 September, 2024; originally announced September 2024.

    Comments: 24 pages

  2. arXiv:2409.08615  [pdf, other]

    cs.GR

    DrawingSpinUp: 3D Animation from Single Character Drawings

    Authors: Jie Zhou, Chufeng Xiao, Miu-Ling Lam, Hongbo Fu

    Abstract: Animating various character drawings is an engaging visual content creation task. Given a single character drawing, existing animation methods are limited to flat 2D motions and thus lack 3D effects. An alternative solution is to reconstruct a 3D model from a character drawing as a proxy and then retarget 3D motion data onto it. However, the existing image-to-3D methods could not work well for ama…

    Submitted 13 September, 2024; originally announced September 2024.

    Comments: 10 pages, 15 figures

  3. arXiv:2409.08475  [pdf, other]

    cs.CV

    RT-DETRv3: Real-time End-to-End Object Detection with Hierarchical Dense Positive Supervision

    Authors: Shuo Wang, Chunlong Xia, Feng Lv, Yifeng Shi

    Abstract: RT-DETR is the first real-time end-to-end transformer-based object detector. Its efficiency comes from the framework design and the Hungarian matching. However, compared to dense supervision detectors like the YOLO series, the Hungarian matching provides much sparser supervision, leading to insufficient model training and making optimal results difficult to achieve. To address these issues, we proposed a…

    Submitted 12 September, 2024; originally announced September 2024.

  4. arXiv:2409.07055  [pdf, other]

    cs.CL cs.AI cs.CY

    Legal Fact Prediction: Task Definition and Dataset Construction

    Authors: Junkai Liu, Yujie Tong, Hui Huang, Shuyuan Zheng, Muyun Yang, Peicheng Wu, Makoto Onizuka, Chuan Xiao

    Abstract: Legal facts refer to the facts that can be proven by acknowledged evidence in a trial. They form the basis for the determination of court judgments. This paper introduces a novel NLP task: legal fact prediction, which aims to predict the legal fact based on a list of evidence. The predicted facts can instruct the parties and their lawyers involved in a trial to strengthen their submissions and opt…

    Submitted 11 September, 2024; originally announced September 2024.

  5. arXiv:2409.07013  [pdf]

    cs.RO

    Enabling Shared-Control for A Riding Ballbot System

    Authors: Yu Chen, Mahshid Mansouri, Chenzhang Xiao, Ze Wang, Elizabeth T. Hsiao-Wecksler, William R. Norris

    Abstract: This study introduces a shared-control approach for collision avoidance in a self-balancing riding ballbot, called PURE, marked by its dynamic stability, omnidirectional movement, and hands-free interface. Integrated with a sensor array and a novel Passive Artificial Potential Field (PAPF) method, PURE provides intuitive navigation with deceleration assistance and haptic/audio feedback, effectivel…

    Submitted 11 September, 2024; originally announced September 2024.

    Comments: 7 pages and 7 figures, IEEE ICRA format

    ACM Class: I.2.9

  6. arXiv:2409.06948  [pdf, other]

    cs.RO eess.SY

    Equivariant Filter for Tightly Coupled LiDAR-Inertial Odometry

    Authors: Anbo Tao, Yarong Luo, Chunxi Xia, Chi Guo, Xingxing Li

    Abstract: Pose estimation is a crucial problem in simultaneous localization and mapping (SLAM). However, developing a robust and consistent state estimator remains a significant challenge, as the traditional extended Kalman filter (EKF) struggles to handle the model nonlinearity, especially for inertial measurement unit (IMU) and light detection and ranging (LiDAR). To provide a consistent and efficient sol…

    Submitted 10 September, 2024; originally announced September 2024.

  7. arXiv:2409.03512  [pdf, other]

    cs.CY cs.CL

    From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents

    Authors: Jifan Yu, Zheyuan Zhang, Daniel Zhang-li, Shangqing Tu, Zhanxin Hao, Rui Miao Li, Haoxuan Li, Yuanchun Wang, Hanming Li, Linlu Gong, Jie Cao, Jiayin Lin, Jinchang Zhou, Fei Qin, Haohua Wang, Jianxiao Jiang, Lijun Deng, Yisi Zhan, Chaojun Xiao, Xusheng Dai, Xuan Yan, Nianyi Lin, Nan Zhang, Ruixin Ni, Yang Dang, et al. (8 additional authors not shown)

    Abstract: Since the first instances of online education, where courses were uploaded to accessible and shared online platforms, this form of scaling the dissemination of human knowledge to reach a broader audience has sparked extensive discussion and widespread adoption. Recognizing that personalized learning still holds significant potential for improvement, new AI technologies have been continuously integ…

    Submitted 5 September, 2024; originally announced September 2024.

  8. arXiv:2409.02877  [pdf, other]

    cs.AI cs.CL cs.LG

    Configurable Foundation Models: Building LLMs from a Modular Perspective

    Authors: Chaojun Xiao, Zhengyan Zhang, Chenyang Song, Dazhi Jiang, Feng Yao, Xu Han, Xiaozhi Wang, Shuo Wang, Yufei Huang, Guanyu Lin, Yingfa Chen, Weilin Zhao, Yuge Tu, Zexuan Zhong, Ao Zhang, Chenglei Si, Khai Hao Moo, Chenyang Zhao, Huimin Chen, Yankai Lin, Zhiyuan Liu, Jingbo Shang, Maosong Sun

    Abstract: Advancements in LLMs have recently unveiled challenges tied to computational efficiency and continual scalability due to their requirements of huge parameters, making the applications and evolution of these models on devices with limited computation resources and scenarios requiring various abilities increasingly cumbersome. Inspired by modularity within the human brain, there is a growing tendenc…

    Submitted 4 September, 2024; originally announced September 2024.

  9. arXiv:2409.02382  [pdf, other]

    cs.CV

    GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving

    Authors: Huasong Han, Kaixuan Zhou, Xiaoxiao Long, Yusen Wang, Chunxia Xiao

    Abstract: We propose GGS, a Generalizable Gaussian Splatting method for Autonomous Driving which can achieve realistic rendering under large viewpoint changes. Previous generalizable 3D Gaussian splatting methods are limited to rendering novel views that are very close to the original pair of images, which cannot handle large differences in viewpoint. Especially in autonomous driving scenarios, images are t…

    Submitted 3 September, 2024; originally announced September 2024.

  10. arXiv:2409.00988  [pdf, other]

    cs.CV

    Self-Supervised Multi-Scale Network for Blind Image Deblurring via Alternating Optimization

    Authors: Lening Guo, Jing Yu, Ning Zhang, Chuangbai Xiao

    Abstract: Blind image deblurring is a challenging low-level vision task that involves estimating the unblurred image when the blur kernel is unknown. In this paper, we present a self-supervised multi-scale blind image deblurring method to jointly estimate the latent image and the blur kernel via alternating optimization. In the image estimation step, we construct a multi-scale generator network with multipl…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 21 pages, 17 figures, 94 references

  11. arXiv:2408.17223  [pdf, other]

    cs.CV

    OG-Mapping: Octree-based Structured 3D Gaussians for Online Dense Mapping

    Authors: Meng Wang, Junyi Wang, Changqun Xia, Chen Wang, Yue Qi

    Abstract: 3D Gaussian splatting (3DGS) has recently demonstrated promising advancements in RGB-D online dense mapping. Nevertheless, existing methods excessively rely on per-pixel depth cues to perform map densification, which leads to significant redundancy and increased sensitivity to depth noise. Additionally, explicitly storing 3D Gaussian parameters of room-scale scenes poses a significant storage chall…

    Submitted 30 August, 2024; originally announced August 2024.

  12. arXiv:2408.12590  [pdf, other]

    cs.CV cs.AI

    xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations

    Authors: Can Qin, Congying Xia, Krithika Ramakrishnan, Michael Ryoo, Lifu Tu, Yihao Feng, Manli Shu, Honglu Zhou, Anas Awadalla, Jun Wang, Senthil Purushwalkam, Le Xue, Yingbo Zhou, Huan Wang, Silvio Savarese, Juan Carlos Niebles, Zeyuan Chen, Ran Xu, Caiming Xiong

    Abstract: We present xGen-VideoSyn-1, a text-to-video (T2V) generation model capable of producing realistic scenes from textual descriptions. Building on recent advancements, such as OpenAI's Sora, we explore the latent diffusion model (LDM) architecture and introduce a video variational autoencoder (VidVAE). VidVAE compresses video data both spatially and temporally, significantly reducing the length of vi…

    Submitted 31 August, 2024; v1 submitted 22 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV24 AI4VA

  13. arXiv:2408.11293  [pdf, other]

    cs.RO cs.LG

    ViIK: Flow-based Vision Inverse Kinematics Solver with Fusing Collision Checking

    Authors: Qinglong Meng, Chongkun Xia, Xueqian Wang

    Abstract: Inverse Kinematics (IK) is the problem of finding the robot's configurations that satisfy the target pose of the end effector. In motion planning, diverse configurations are required in case a feasible trajectory is not found. Meanwhile, collision checking (CC), e.g. Oriented bounding box (OBB), Discrete Oriented Polytope (DOP), and Quickhull, needs to be done for each configuration provided b…

    Submitted 28 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  14. arXiv:2408.09178  [pdf, other]

    cs.CV

    MambaTrack: A Simple Baseline for Multiple Object Tracking with State Space Model

    Authors: Changcheng Xiao, Qiong Cao, Zhigang Luo, Long Lan

    Abstract: Tracking by detection has been the prevailing paradigm in the field of Multi-object Tracking (MOT). These methods typically rely on the Kalman Filter to estimate the future locations of objects, assuming linear object motion. However, they fall short when tracking objects exhibiting nonlinear and diverse motion in scenarios like dancing and sports. In addition, there has been limited focus on util…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Multimedia 2024

  15. arXiv:2408.06854  [pdf, other]

    cs.CL

    LoRA$^2$: Multi-Scale Low-Rank Approximations for Fine-Tuning Large Language Models

    Authors: Jia-Chen Zhang, Yu-Jie Xiong, He-Xi Qiu, Dong-Hai Zhu, Chun-Ming Xia

    Abstract: Fine-tuning large language models (LLMs) with high parameter efficiency for downstream tasks has become a new paradigm. Low-Rank Adaptation (LoRA) significantly reduces the number of trainable parameters for fine-tuning. Although it has demonstrated commendable performance, updating parameters within a single scale may not be the optimal choice for complex downstream tasks. In this paper, we extend…

    Submitted 13 August, 2024; originally announced August 2024.

  16. Diffusion Model-based Contrastive Learning for Human Activity Recognition

    Authors: Chunjing Xiao, Yanhui Han, Wei Yang, Yane Hou, Fangzhan Shi, Kevin Chetty

    Abstract: WiFi Channel State Information (CSI)-based activity recognition has sparked numerous studies due to its widespread availability and privacy protection. However, when applied in practical applications, general CSI-based recognition models may face challenges related to the limited generalization capability, since individuals with different behavior habits will cause various fluctuations in CSI data…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted by IEEE Internet of Things Journal

  17. arXiv:2408.03005  [pdf, other]

    cs.DB

    Automatic String Data Validation with Pattern Discovery

    Authors: Xinwei Lin, Jing Zhao, Peng Di, Chuan Xiao, Rui Mao, Yan Ji, Makoto Onizuka, Zishuo Ding, Weiyi Shang, Jianbin Qin

    Abstract: In enterprise data pipelines, data insertions occur periodically and may impact downstream services if data quality issues are not addressed. Typically, such problems can be investigated and fixed by on-call engineers, but locating the cause of such problems and fixing errors are often time-consuming. Therefore, automatic data validation is a better solution to defend the system and downstream ser…

    Submitted 6 August, 2024; originally announced August 2024.

  18. arXiv:2408.02065  [pdf, other]

    cs.LG stat.ML

    A Multi-class Ride-hailing Service Subsidy System Utilizing Deep Causal Networks

    Authors: Zhe Yu, Chi Xia, Shaosheng Cao, Lin Zhou

    Abstract: In the ride-hailing industry, subsidies are predominantly employed to incentivize consumers to place more orders, thereby fostering market growth. Causal inference techniques are employed to estimate the consumer elasticity with different subsidy levels. However, the presence of confounding effects poses challenges in achieving an unbiased estimate of the uplift effect. We introduce a consumer sub…

    Submitted 4 August, 2024; originally announced August 2024.

  19. arXiv:2408.01690  [pdf, other]

    cs.CV cs.AI cs.MM

    IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection

    Authors: Hong Guan, Yancheng Wang, Lulu Xie, Soham Nag, Rajeev Goel, Niranjan Erappa Narayana Swamy, Yingzhen Yang, Chaowei Xiao, Jonathan Prisby, Ross Maciejewski, Jia Zou

    Abstract: Effective fraud detection and analysis of government-issued identity documents, such as passports, driver's licenses, and identity cards, are essential in thwarting identity theft and bolstering security on online platforms. The training of accurate fraud detection and analysis tools depends on the availability of extensive identity document datasets. However, current publicly available benchmark…

    Submitted 3 September, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: 40 pages

  20. arXiv:2408.01137  [pdf, other]

    cs.CV

    PGNeXt: High-Resolution Salient Object Detection via Pyramid Grafting Network

    Authors: Changqun Xia, Chenxi Xie, Zhentao He, Tianshu Yu, Jia Li

    Abstract: We present an advanced study on more challenging high-resolution salient object detection (HRSOD) from both dataset and network framework perspectives. To compensate for the lack of HRSOD datasets, we thoughtfully collect a large-scale high resolution salient object detection dataset, called UHRSD, containing 5,920 images from real-world complex scenarios at 4K-8K resolutions. All the images are fi…

    Submitted 2 August, 2024; originally announced August 2024.

  21. arXiv:2407.20224  [pdf, other]

    cs.CL

    Can Editing LLMs Inject Harm?

    Authors: Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu

    Abstract: Knowledge editing has been increasingly adopted to correct the false or outdated knowledge in Large Language Models (LLMs). Meanwhile, one critical but under-explored question is: can knowledge editing be used to inject harm into LLMs? In this paper, we propose to reformulate knowledge editing as a new type of safety threat for LLMs, namely Editing Attack, and conduct a systematic investigation wi…

    Submitted 16 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally. 9 pages for main paper, 36 pages including appendix. The code, results, dataset for this paper and more resources are on the project website: https://llm-editing.github.io

  22. Motif-Consistent Counterfactuals with Adversarial Refinement for Graph-Level Anomaly Detection

    Authors: Chunjing Xiao, Shikang Pang, Wenxin Tai, Yanlong Huang, Goce Trajcevski, Fan Zhou

    Abstract: Graph-level anomaly detection is significant in diverse domains. To improve detection performance, counterfactual graphs have been exploited to benefit the generalization capacity by learning causal relations. Most existing studies directly introduce perturbations (e.g., flipping edges) to generate counterfactual graphs, which are prone to alter the semantics of generated examples and make them of…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024

  23. arXiv:2407.13164  [pdf, other]

    cs.CL cs.AI

    Translate-and-Revise: Boosting Large Language Models for Constrained Translation

    Authors: Pengcheng Huang, Yongyu Mu, Yuzhang Wu, Bei Li, Chunyang Xiao, Tong Xiao, Jingbo Zhu

    Abstract: Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations. In this paper, we leverage the capabilities of large language models (LLMs) for constrained translation, given that LLMs can easily adapt to this task by taking translation instructions and constraints as prom…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 16 pages

  24. arXiv:2407.12784  [pdf, other]

    cs.LG cs.CR cs.IR

    AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases

    Authors: Zhaorun Chen, Zhen Xiang, Chaowei Xiao, Dawn Song, Bo Li

    Abstract: LLM agents have demonstrated remarkable performance across various applications, primarily due to their advanced capabilities in reasoning, utilizing external knowledge and tools, calling APIs, and executing actions to interact with environments. Current agents typically utilize a memory module or a retrieval-augmented generation (RAG) mechanism, retrieving past knowledge and instances with simila…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 22 pages, 13 figures, 7 tables

  25. arXiv:2407.05563  [pdf, other]

    cs.CL

    LLMBox: A Comprehensive Library for Large Language Models

    Authors: Tianyi Tang, Yiwen Hu, Bingqian Li, Wenyang Luo, Zijing Qin, Haoxiang Sun, Jiapeng Wang, Shiyi Xu, Xiaoxue Cheng, Geyang Guo, Han Peng, Bowen Zheng, Yiru Tang, Yingqian Min, Yushuo Chen, Jie Chen, Yuanqian Zhao, Luran Ding, Yuhao Wang, Zican Dong, Chunxuan Xia, Junyi Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen

    Abstract: To facilitate the research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs. This library features three main merits: (1) a unified data interface that supports the flexible implementation of various training strategies, (2) a comprehensive evaluation that covers extensive tasks, datasets,…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 Demo

  26. arXiv:2407.05276  [pdf, other]

    cs.DC

    BFLN: A Blockchain-based Federated Learning Model for Non-IID Data

    Authors: Yang Li, Chunhe Xia, Dongchi Huang, Xiaojian Li, Tianbo Wang

    Abstract: As the application of federated learning becomes increasingly widespread, the issue of imbalanced training data distribution has emerged as a significant challenge. Federated learning utilizes local data stored on different training clients for model training, rather than centralizing data on a server, thereby greatly enhancing the privacy and security of training data. However, the distribution o…

    Submitted 10 July, 2024; v1 submitted 7 July, 2024; originally announced July 2024.

  27. arXiv:2407.04451  [pdf, other]

    cs.LG cs.AI

    Hindsight Preference Learning for Offline Preference-based Reinforcement Learning

    Authors: Chen-Xiao Gao, Shengjun Fang, Chenjun Xiao, Yang Yu, Zongzhang Zhang

    Abstract: Offline preference-based reinforcement learning (RL), which focuses on optimizing policies using human preferences between pairs of trajectory segments selected from an offline dataset, has emerged as a practical avenue for RL applications. Existing works rely on extracting step-wise reward signals from trajectory-wise preference annotations, assuming that preferences correlate with the cumulative…

    Submitted 5 July, 2024; originally announced July 2024.

  28. arXiv:2407.02143  [pdf, other]

    cs.LG cs.SI

    Counterfactual Data Augmentation with Denoising Diffusion for Graph Anomaly Detection

    Authors: Chunjing Xiao, Shikang Pang, Xovee Xu, Xuan Li, Goce Trajcevski, Fan Zhou

    Abstract: A critical aspect of Graph Neural Networks (GNNs) is to enhance the node representations by aggregating node neighborhood information. However, when detecting anomalies, the representations of abnormal nodes are prone to be averaged by normal neighbors, making the learned anomaly representations less distinguishable. To tackle this issue, we propose CAGAD -- an unsupervised Counterfactual data Aug…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE Transactions on Computational Social Systems (TCSS). DOI: https://doi.org/10.1109/TCSS.2024.3403503

  29. arXiv:2407.01489  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    Agentless: Demystifying LLM-based Software Engineering Agents

    Authors: Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, Lingming Zhang

    Abstract: Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks. These agents are equipped with the ability to use tools, run c…

    Submitted 1 July, 2024; originally announced July 2024.

  30. arXiv:2407.00631  [pdf, other]

    cs.LG cs.AI

    TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets

    Authors: Jintai Chen, Yaojun Hu, Yue Wang, Yingzhou Lu, Xu Cao, Miao Lin, Hongxia Xu, Jian Wu, Cao Xiao, Jimeng Sun, Lucas Glass, Kexin Huang, Marinka Zitnik, Tianfan Fu

    Abstract: Clinical trials are pivotal for developing new medical treatments, yet they typically pose some risks such as patient mortality, adverse events, and enrollment failure that waste immense efforts spanning over a decade. Applying artificial intelligence (AI) to forecast or simulate key events in clinical trials holds great potential for providing insights to guide trial designs. However, complex dat…

    Submitted 3 September, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

  31. arXiv:2407.00623  [pdf, other]

    cs.CV

    Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness

    Authors: Yiquan Li, Zhongzhu Chen, Kun Jin, Jiongxiao Wang, Bo Li, Chaowei Xiao

    Abstract: Diffusion Purification, purifying noised images with diffusion models, has been widely used for enhancing certified robustness via randomized smoothing. However, existing frameworks often grapple with the balance between efficiency and effectiveness. While the Denoising Diffusion Probabilistic Model (DDPM) offers an efficient single-step purification, it falls short in ensuring purified images res…

    Submitted 30 June, 2024; originally announced July 2024.

  32. arXiv:2406.20038  [pdf, other]

    cs.CL

    BioMNER: A Dataset for Biomedical Method Entity Recognition

    Authors: Chen Tang, Bohao Yang, Kun Zhao, Bo Lv, Chenghao Xiao, Frank Guerin, Chenghua Lin

    Abstract: Named entity recognition (NER) stands as a fundamental and pivotal task within the realm of Natural Language Processing. Particularly within the domain of Biomedical Method NER, this task presents notable challenges, stemming from the continual influx of domain-specific terminologies in scholarly literature. Current research in Biomedical Method (BioMethod) NER suffers from a scarcity of resources…

    Submitted 28 June, 2024; originally announced June 2024.

  33. arXiv:2406.18966  [pdf, other]

    cs.CL

    UniGen: A Unified Framework for Textual Dataset Generation Using Large Language Models

    Authors: Siyuan Wu, Yue Huang, Chujie Gao, Dongping Chen, Qihui Zhang, Yao Wan, Tianyi Zhou, Xiangliang Zhang, Jianfeng Gao, Chaowei Xiao, Lichao Sun

    Abstract: Large Language Models (LLMs) such as GPT-4 and Llama3 have significantly impacted various fields by enabling high-quality synthetic data generation and reducing dependence on expensive human-generated datasets. Despite this, challenges remain in the areas of generalization, controllability, diversity, and truthfulness within the existing generative frameworks. To address these challenges, this pap…

    Submitted 22 August, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

  34. arXiv:2406.18099  [pdf, other]

    cs.DB

    CompassDB: Pioneering High-Performance Key-Value Store with Perfect Hash

    Authors: Jin Jiang, Dongsheng He, Yu Hu, Dong Liu, Chenfan Xiao, Hongxiao Bi, Yusong Zhang, Chaoqu Jiang, Zhijun Fu

    Abstract: Modern mainstream persistent key-value storage engines utilize Log-Structured Merge tree (LSM-tree) based designs, optimizing read/write performance by leveraging sequential disk I/O. However, the advent of SSDs, with their significant improvements in bandwidth and IOPS, shifts the bottleneck from I/O to CPU. The high compaction cost and large read/write amplification associated with LSM trees hav…

    Submitted 26 June, 2024; originally announced June 2024.

  35. arXiv:2406.17962  [pdf, other]

    cs.CL

    Crafting Customisable Characters with LLMs: Introducing SimsChat, a Persona-Driven Role-Playing Agent Framework

    Authors: Bohao Yang, Dong Liu, Chen Tang, Chenghao Xiao, Kun Zhao, Chao Li, Lin Yuan, Guang Yang, Lanxiao Huang, Chenghua Lin

    Abstract: Large Language Models (LLMs) demonstrate a remarkable ability to comprehend human instructions and generate high-quality text. This capability allows LLMs to function as agents that can emulate human beings at a more sophisticated level, beyond the mere replication of basic human behaviours. However, there has been little exploration of leveraging LLMs to craft characters from diverse aspects. In thi…

    Submitted 16 August, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  36. arXiv:2406.17911  [pdf, other]

    cs.CL

    X-ray Made Simple: Radiology Report Generation and Evaluation with Layman's Terms

    Authors: Kun Zhao, Chenghao Xiao, Chen Tang, Bohao Yang, Kai Ye, Noura Al Moubayed, Liang Zhan, Chenghua Lin

    Abstract: Radiology Report Generation (RRG) has achieved significant progress with the advancements of multimodal generative models. However, the evaluation in the domain suffers from a lack of fair and robust metrics. We reveal that high performance on RRG with existing lexical-based metrics (e.g. BLEU) might be more of a mirage - a model can get a high BLEU only by learning the template of reports. This…

    Submitted 30 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  37. arXiv:2406.16253  [pdf, other]

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  38. arXiv:2406.16121  [pdf, other]

    cs.LG cs.AI

    Diffusion Spectral Representation for Reinforcement Learning

    Authors: Dmitry Shribak, Chen-Xiao Gao, Yitong Li, Chenjun Xiao, Bo Dai

    Abstract: Diffusion-based models have achieved notable empirical successes in reinforcement learning (RL) due to their expressiveness in modeling complex distributions. Despite existing methods being promising, the key challenge of extending existing methods for broader real-world applications lies in the computational cost at inference time, i.e., sampling from a diffusion model is considerably slow as it…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: Under review

  39. arXiv:2406.14482  [pdf, other]

    cs.CV

    Visible-Thermal Tiny Object Detection: A Benchmark Dataset and Baselines

    Authors: Xinyi Ying, Chao Xiao, Ruojing Li, Xu He, Boyang Li, Zhaoxu Li, Yingqian Wang, Mingyuan Hu, Qingyu Xu, Zaiping Lin, Miao Li, Shilin Zhou, Wei An, Weidong Sheng, Li Liu

    Abstract: Small object detection (SOD) has been a longstanding yet challenging task for decades, with numerous datasets and algorithms being developed. However, they mainly focus on either visible or thermal modality, while visible-thermal (RGBT) bimodality is rarely explored. Although some RGBT datasets have been developed recently, the insufficient quantity, limited category, misaligned images and large t…

    Submitted 20 June, 2024; originally announced June 2024.

  40. arXiv:2406.13942  [pdf, other]

    cs.LG

    Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models

    Authors: Yuan Zhong, Xiaochen Wang, Jiaqi Wang, Xiaokun Zhang, Yaqing Wang, Mengdi Huai, Cao Xiao, Fenglong Ma

    Abstract: Synthesizing electronic health records (EHR) data has become a preferred strategy to address data scarcity, improve data quality, and model fairness in healthcare. However, existing approaches for EHR data generation predominantly rely on state-of-the-art generative techniques like generative adversarial networks, variational autoencoders, and language models. These methods typically replicate inp…

    Submitted 19 June, 2024; originally announced June 2024.

  41. arXiv:2406.09411  [pdf, other]

    cs.CV cs.AI cs.CL

    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a…

    Submitted 1 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: typos corrected, references added, Project Page: https://muirbench.github.io/

  42. arXiv:2406.01960  [pdf, other]

    cs.LG cs.AI

    Certifiably Byzantine-Robust Federated Conformal Prediction

    Authors: Mintong Kang, Zhen Lin, Jimeng Sun, Cao Xiao, Bo Li

    Abstract: Conformal prediction has shown impressive capacity in constructing statistically rigorous prediction sets for machine learning models with exchangeable data samples. The siloed datasets, coupled with the escalating privacy concerns related to local data sharing, have inspired recent innovations extending conformal prediction into federated environments with distributed data samples. However, this…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to ICML 2024

  43. arXiv:2405.21043  [pdf, other]

    cs.LG cs.AI

    Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

    Authors: Fengdi Che, Chenjun Xiao, Jincheng Mei, Bo Dai, Ramki Gummadi, Oscar A Ramirez, Christopher K Harris, A. Rupam Mahmood, Dale Schuurmans

    Abstract: We prove that the combination of a target network and over-parameterized linear function approximation establishes a weaker convergence condition for bootstrapped value estimation in certain cases, even with off-policy data. Our condition is naturally satisfied for expected updates over the entire state-action space or learning with a batch of complete trajectories from episodic Markov decision pr…

    Submitted 31 May, 2024; originally announced May 2024.

    Journal ref: Proceedings of the 41st International Conference on Machine Learning, 2024

  44. arXiv:2405.19524  [pdf, other]

    cs.CR cs.AI

    AI Risk Management Should Incorporate Both Safety and Security

    Authors: Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi, Boyi Wei, Tinghao Xie, Danqi Chen, Pin-Yu Chen, Jeffrey Ding, Ruoxi Jia, Jiaqi Ma, Arvind Narayanan, Weijie J Su, Mengdi Wang, Chaowei Xiao, Bo Li, Dawn Song, Peter Henderson, Prateek Mittal

    Abstract: The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this pape…

    Submitted 29 May, 2024; originally announced May 2024.

  45. arXiv:2405.17450  [pdf, other]

    cs.CV cs.LG

    The Power of Next-Frame Prediction for Learning Physical Laws

    Authors: Thomas Winterbottom, G. Thomas Hudson, Daniel Kluvanec, Dean Slack, Jamie Sterling, Junjie Shentu, Chenghao Xiao, Zheming Zhou, Noura Al Moubayed

    Abstract: Next-frame prediction is a useful and powerful method for modelling and understanding the dynamics of video data. Inspired by the empirical success of causal language modelling and next-token prediction in language modelling, we explore the extent to which next-frame prediction serves as a strong foundational learning strategy (analogous to language modelling) for inducing an understanding of the…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 7 Figures, 12 Pages, 1 Table

    MSC Class: 68T45 ACM Class: I.2.6; I.2.10

  46. arXiv:2405.16412  [pdf, other]

    cs.CL cs.LG

    KG-FIT: Knowledge Graph Fine-Tuning Upon Open-World Knowledge

    Authors: Pengcheng Jiang, Lang Cao, Cao Xiao, Parminder Bhatia, Jimeng Sun, Jiawei Han

    Abstract: Knowledge Graph Embedding (KGE) techniques are crucial in learning compact representations of entities and relations within a knowledge graph, facilitating efficient reasoning and knowledge discovery. While existing methods typically focus either on training KGE models solely based on graph structure or fine-tuning pre-trained language models with classification data in KG, KG-FIT leverages LLM-gu…

    Submitted 4 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  47. arXiv:2405.15973  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement

    Authors: Xiyao Wang, Jiuhai Chen, Zhaoyang Wang, Yuhang Zhou, Yiyang Zhou, Huaxiu Yao, Tianyi Zhou, Tom Goldstein, Parminder Bhatia, Furong Huang, Cao Xiao

    Abstract: Large vision-language models (LVLMs) have achieved impressive results in various visual question-answering and reasoning tasks through vision instruction tuning on specific datasets. However, there is still significant room for improvement in the alignment between visual and language modalities. Previous methods to enhance this alignment typically require external models or data, heavily depending…

    Submitted 7 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: 15 pages, 8 figures

  48. arXiv:2405.11883  [pdf, other]

    cs.IT eess.SP

    Asynchronous MIMO-OFDM Massive Unsourced Random Access with Codeword Collisions

    Authors: Tianya Li, Yongpeng Wu, Junyuan Gao, Wenjun Zhang, Xiang-Gen Xia, Derrick Wing Kwan Ng, Chengshan Xiao

    Abstract: This paper investigates asynchronous MIMO massive unsourced random access in an orthogonal frequency division multiplexing (OFDM) system over frequency-selective fading channels, with the presence of both timing and carrier frequency offsets (TO and CFO) and non-negligible codeword collisions. The proposed coding framework segregates the data into two components, namely, preamble and coding parts,…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 13 pages, 12 figures, submitted to the IEEE for possible publication

  49. arXiv:2405.10529  [pdf, other]

    cs.CV cs.AI

    Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors

    Authors: Jiachen Sun, Changsheng Wang, Jiongxiao Wang, Yiwei Zhang, Chaowei Xiao

    Abstract: Large language models have become increasingly prominent, also signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and intera…

    Submitted 24 August, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

    Comments: 15 pages

    ACM Class: I.2.7; I.4

  50. arXiv:2405.07089  [pdf, other]

    cs.HC

    SonifyAR: Context-Aware Sound Generation in Augmented Reality

    Authors: Xia Su, Jon E. Froehlich, Eunyee Koh, Chang Xiao

    Abstract: Sound plays a crucial role in enhancing user experience and immersiveness in Augmented Reality (AR). However, current platforms lack support for AR sound authoring due to limited interaction types, challenges in collecting and specifying context information, and difficulty in acquiring matching sound assets. We present SonifyAR, an LLM-based AR sound authoring system that generates context-aware s…

    Submitted 11 August, 2024; v1 submitted 11 May, 2024; originally announced May 2024.

    Comments: To appear in UIST 2024