Showing 1–50 of 247 results for author: Cao, S

  1. arXiv:2408.09491  [pdf, other]

    cs.SD eess.AS

    A Transcription Prompt-based Efficient Audio Large Language Model for Robust Speech Recognition

    Authors: Yangze Li, Xiong Wang, Songjun Cao, Yike Zhang, Long Ma, Lei Xie

    Abstract: Audio-LLM introduces audio modality into a large language model (LLM) to enable a powerful LLM to recognize, understand, and generate audio. However, during speech recognition in noisy environments, we observed the presence of illusions and repetition issues in audio-LLM, leading to substitution and insertion errors. This paper proposes a transcription prompt-based audio-LLM by introducing an ASR…

    Submitted 18 August, 2024; originally announced August 2024.

  2. arXiv:2408.09202  [pdf, other]

    cs.LG

    NDDEs: A Deep Neural Network Framework for Solving Forward and Inverse Problems in Delay Differential Equations

    Authors: Housen Wang, Yuxing Chen, Sirong Cao, Xiaoli Wang, Qiang Liu

    Abstract: This article proposes a solution framework for delay differential equations (DDEs) based on deep neural networks (DNNs) - the neural delay differential equations (NDDEs), aimed at solving the forward and inverse problems of delay differential equations. This framework embeds the delay differential equations into the neural networks to accommodate the diverse requirements of DDEs in terms of initia…

    Submitted 17 August, 2024; originally announced August 2024.

  3. arXiv:2408.09055  [pdf, other]

    cs.DC quant-ph

    Atlas: Hierarchical Partitioning for Quantum Circuit Simulation on GPUs (Extended Version)

    Authors: Mingkuan Xu, Shiyi Cao, Xupeng Miao, Umut A. Acar, Zhihao Jia

    Abstract: This paper presents techniques for theoretically and practically efficient and scalable Schrödinger-style quantum circuit simulation. Our approach partitions a quantum circuit into a hierarchy of subcircuits and simulates the subcircuits on multi-node GPUs, exploiting available data parallelism while minimizing communication costs. To minimize communication costs, we formulate an Integer Linear Pr…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 20 pages, 37 figures, extended version of the paper presented in SC24

  4. arXiv:2408.06840  [pdf, other]

    cs.CV

    Dynamic and Compressive Adaptation of Transformers From Images to Videos

    Authors: Guozhen Zhang, Jingyu Liu, Shengming Cao, Xiaotong Zhao, Kevin Zhao, Kai Ma, Limin Wang

    Abstract: Recently, the remarkable success of pre-trained Vision Transformers (ViTs) from image-text matching has sparked an interest in image-to-video adaptation. However, most current approaches retain the full forward pass for each frame, leading to a high computation overhead for processing entire videos. In this paper, we present InTI, a novel approach for compressive image-to-video adaptation using dy…

    Submitted 13 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  5. arXiv:2408.06003  [pdf, other]

    cs.AR cs.LG

    LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration

    Authors: Zhiwen Mo, Lei Wang, Jianyu Wei, Zhichen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, Mao Yang

    Abstract: As large language model (LLM) inference demands ever-greater resources, there is a rapid growing trend of using low-bit weights to shrink memory usage and boost inference efficiency. However, these low-bit LLMs introduce the need for mixed-precision matrix multiplication (mpGEMM), which is a crucial yet under-explored operation that involves multiplying lower-precision weights with higher-precisio…

    Submitted 12 August, 2024; originally announced August 2024.

  6. arXiv:2408.02693  [pdf, other]

    physics.comp-ph cs.AI

    Diff-PIC: Revolutionizing Particle-In-Cell Simulation for Advancing Nuclear Fusion with Diffusion Models

    Authors: Chuan Liu, Chunshu Wu, Shihui Cao, Mingkai Chen, James Chenhao Liang, Ang Li, Michael Huang, Chuang Ren, Dongfang Liu, Ying Nian Wu, Tong Geng

    Abstract: Sustainable energy is a crucial global challenge, and recent breakthroughs in nuclear fusion ignition underscore the potential of harnessing energy extracted from nuclear fusion in everyday life, thereby drawing significant attention to fusion ignition research, especially Laser-Plasma Interaction (LPI). Unfortunately, the complexity of LPI at ignition scale renders theory-based analysis nearly im…

    Submitted 19 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

  7. arXiv:2408.02065  [pdf, other]

    cs.LG stat.ML

    A Multi-class Ride-hailing Service Subsidy System Utilizing Deep Causal Networks

    Authors: Zhe Yu, Chi Xia, Shaosheng Cao, Lin Zhou

    Abstract: In the ride-hailing industry, subsidies are predominantly employed to incentivize consumers to place more orders, thereby fostering market growth. Causal inference techniques are employed to estimate the consumer elasticity with different subsidy levels. However, the presence of confounding effects poses challenges in achieving an unbiased estimate of the uplift effect. We introduce a consumer sub…

    Submitted 4 August, 2024; originally announced August 2024.

  8. arXiv:2408.01841  [pdf, other]

    cs.RO

    BEVPlace++: Fast, Robust, and Lightweight LiDAR Global Localization for Unmanned Ground Vehicles

    Authors: Lun Luo, Si-Yuan Cao, Xiaorui Li, Jintao Xu, Rui Ai, Zhu Yu, Xieyuanli Chen

    Abstract: This article introduces BEVPlace++, a novel, fast, and robust LiDAR global localization method for unmanned ground vehicles. It uses lightweight convolutional neural networks (CNNs) on Bird's Eye View (BEV) image-like representations of LiDAR data to achieve accurate global localization through place recognition followed by 3-DoF pose estimation. Our detailed analyses reveal an interesting fact th…

    Submitted 9 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: Under review

  9. arXiv:2407.18949  [pdf, other]

    cs.CV cs.AI

    Predicting Winning Captions for Weekly New Yorker Comics

    Authors: Stanley Cao, Sonny Young

    Abstract: Image captioning using Vision Transformers (ViTs) represents a pivotal convergence of computer vision and natural language processing, offering the potential to enhance user experiences, improve accessibility, and provide textual representations of visual data. This paper explores the application of image captioning techniques to New Yorker cartoons, aiming to generate captions that emulate the wi…

    Submitted 11 July, 2024; originally announced July 2024.

  10. arXiv:2407.17428  [pdf, other]

    cs.CV cs.AI

    Vision Language Model-Empowered Contract Theory for AIGC Task Allocation in Teleoperation

    Authors: Zijun Zhan, Yaxian Dong, Yuqing Hu, Shuai Li, Shaohua Cao, Zhu Han

    Abstract: Integrating low-light image enhancement techniques, in which diffusion-based AI-generated content (AIGC) models are promising, is necessary to enhance nighttime teleoperation. Remarkably, the AIGC model is computation-intensive, thus necessitating the allocation of AIGC tasks to edge servers with ample computational resources. Given the distinct cost of the AIGC model trained with varying-sized da…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 11 pages, 10 figures

  11. arXiv:2407.16909  [pdf, other]

    cs.CY cs.HC cs.RO

    Using Helium Balloon Flying Drones for Introductory CS Education

    Authors: Stanley Cao, Christopher Gregg

    Abstract: In the rapidly evolving field of computer science education, novel approaches to teaching fundamental concepts are crucial for engaging a diverse student body. Given the growing demand for a computing-skilled workforce, it is essential to adapt educational methods to capture the interest of a broader audience than what current computing education typically targets. Engaging educational experiences…

    Submitted 25 June, 2024; originally announced July 2024.

  12. arXiv:2407.12273  [pdf, other]

    cs.CV

    GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity

    Authors: Shuo Cao, Yihao Liu, Wenlong Zhang, Yu Qiao, Chao Dong

    Abstract: Traditional single-task image restoration methods excel in handling specific degradation types but struggle with multiple degradations. To address this limitation, we propose Grouped Restoration with Image Degradation Similarity (GRIDS), a novel approach that harmonizes the competing objectives inherent in multiple-degradation restoration. We first introduce a quantitative method for assessing rel…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  13. arXiv:2407.11998  [pdf, other]

    cs.HC

    Custom Cloth Creation and Virtual Try-on for Everyone

    Authors: Pei Chen, Heng Wang, Sainan Sun, Zhiyuan Chen, Zhenkun Liu, Shuhua Cao, Li Yang, Minghui Yang

    Abstract: This demo showcases a simple tool that utilizes AIGC technology, enabling both professional designers and regular users to easily customize clothing for their digital avatars. Customization options include changing clothing colors, textures, logos, and patterns. Compared with traditional 3D modeling processes, our approach significantly enhances efficiency and interactivity and reduces production…

    Submitted 13 June, 2024; originally announced July 2024.

  14. arXiv:2407.11008  [pdf, other]

    cs.CL cs.AI cs.CV

    Figuring out Figures: Using Textual References to Caption Scientific Figures

    Authors: Stanley Cao, Kevin Liu

    Abstract: Figures are essential channels for densely communicating complex ideas in scientific papers. Previous work in automatically generating figure captions has been largely unsuccessful and has defaulted to using single-layer LSTMs, which no longer achieve state-of-the-art performance. In our work, we use the SciCap datasets curated by Hsu et al. and use a variant of a CLIP+GPT-2 encoder-decoder model…

    Submitted 25 June, 2024; originally announced July 2024.

  15. arXiv:2407.08377  [pdf, other]

    cs.CV

    Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework

    Authors: Shengqi Xu, Run Sun, Yi Chang, Shuning Cao, Xueyao Xiao, Luxin Yan

    Abstract: Long-range imaging inevitably suffers from atmospheric turbulence with severe geometric distortions due to random refraction of light. The further the distance, the more severe the disturbance. Despite existing research has achieved great progress in tackling short-range turbulence, there is less attention paid to long-range turbulence with significant distortions. To address this dilemma and adva…

    Submitted 17 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: This paper is accepted by ECCV 2024

  16. arXiv:2407.08148  [pdf, other]

    cs.CV

    SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

    Authors: Runmin Zhang, Jun Ma, Si-Yuan Cao, Lun Luo, Beinan Yu, Shu-Jie Chen, Junwei Li, Hui-Liang Shen

    Abstract: We propose a novel unsupervised cross-modal homography estimation framework based on intra-modal Self-supervised learning, Correlation, and consistent feature map Projection, namely SCPNet. The concept of intra-modal self-supervised learning is first presented to facilitate the unsupervised cross-modal homography estimation. The correlation-based homography estimation network and the consistent fe…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  17. arXiv:2407.08049  [pdf, other]

    cs.CV eess.SP eess.SY

    Deep Learning-Based Robust Multi-Object Tracking via Fusion of mmWave Radar and Camera Sensors

    Authors: Lei Cheng, Arindam Sengupta, Siyang Cao

    Abstract: Autonomous driving holds great promise in addressing traffic safety concerns by leveraging artificial intelligence and sensor technology. Multi-Object Tracking plays a critical role in ensuring safer and more efficient navigation through complex traffic scenarios. This paper presents a novel deep learning-based method that integrates radar and camera data to enhance the accuracy and robustness of…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Published in IEEE Transactions on Intelligent Transportation Systems

  18. arXiv:2407.00088  [pdf, other]

    cs.DC cs.AI

    T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge

    Authors: Jianyu Wei, Shijie Cao, Ting Cao, Lingxiao Ma, Lei Wang, Yanyong Zhang, Mao Yang

    Abstract: The deployment of Large Language Models (LLMs) on edge devices is increasingly important to enhance on-device intelligence. Weight quantization is crucial for reducing the memory footprint of LLMs on devices. However, low-bit LLMs necessitate mixed precision matrix multiplication (mpGEMM) of low precision weights and high precision activations during inference. Existing systems, lacking native sup…

    Submitted 25 June, 2024; originally announced July 2024.

  19. arXiv:2406.19227  [pdf, other]

    cs.CL

    Aligning Teacher with Student Preferences for Tailored Training Data Generation

    Authors: Yantao Liu, Zhao Zhang, Zijun Yao, Shulin Cao, Lei Hou, Juanzi Li

    Abstract: Large Language Models (LLMs) have shown significant promise as copilots in various tasks. Local deployment of LLMs on edge devices is necessary when handling privacy-sensitive data or latency-sensitive tasks. The computational constraints of such devices make direct deployment of powerful large-scale LLMs impractical, necessitating the Knowledge Distillation from large-scale models to lightweight…

    Submitted 27 June, 2024; originally announced June 2024.

  20. arXiv:2406.19215  [pdf, other]

    cs.CL

    SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation

    Authors: Zijun Yao, Weijian Qi, Liangming Pan, Shulin Cao, Linmei Hu, Weichuan Liu, Lei Hou, Juanzi Li

    Abstract: This paper introduces Self-aware Knowledge Retrieval (SeaKR), a novel adaptive RAG model that extracts self-aware uncertainty of LLMs from their internal states. SeaKR activates retrieval when the LLMs present high self-aware uncertainty for generation. To effectively integrate retrieved knowledge snippets, SeaKR re-ranks them based on LLM's self-aware uncertainty to preserve the snippet that redu…

    Submitted 27 June, 2024; originally announced June 2024.

  21. arXiv:2406.17145  [pdf, other]

    cs.DC cs.AI cs.LG

    GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism

    Authors: Byungsoo Jeon, Mengdi Wu, Shiyi Cao, Sunghyun Kim, Sunghyun Park, Neeraj Aggarwal, Colin Unger, Daiyaan Arfeen, Peiyuan Liao, Xupeng Miao, Mohammad Alizadeh, Gregory R. Ganger, Tianqi Chen, Zhihao Jia

    Abstract: Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in existing DNN systems to support large-scale DNN training by partitioning a DNN into multiple stages, which concurrently perform DNN training for different micro-batches in a pipeline fashion. However, existing pipeline-parallel approaches only c…

    Submitted 24 June, 2024; originally announced June 2024.

  22. arXiv:2406.16439  [pdf, other]

    cs.CV

    Exploring Test-Time Adaptation for Object Detection in Continually Changing Environments

    Authors: Shilei Cao, Yan Liu, Juepeng Zheng, Weijia Li, Runmin Dong, Haohuan Fu

    Abstract: Real-world application models are commonly deployed in dynamic environments, where the target domain distribution undergoes temporal changes. Continual Test-Time Adaptation (CTTA) has recently emerged as a promising technique to gradually adapt a source-trained model to continually changing target domains. Despite recent advancements in addressing CTTA, two critical issues remain: 1) Fixed thresho…

    Submitted 18 August, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

  23. arXiv:2406.12793  [pdf, other]

    cs.CL

    ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

    Authors: Team GLM: Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Dan Zhang, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Jingyu Sun, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, et al. (34 additional authors not shown)

    Abstract: We introduce ChatGLM, an evolving family of large language models that we have been developing over time. This report primarily focuses on the GLM-4 language series, which includes GLM-4, GLM-4-Air, and GLM-4-9B. They represent our most capable models that are trained with all the insights and lessons gained from the preceding three generations of ChatGLM. To date, the GLM-4 models are pre-trained…

    Submitted 29 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  24. arXiv:2406.10467  [pdf, other]

    cs.DS

    Scheduling two types of jobs with minimum makespan

    Authors: Song Cao, Kai Jin

    Abstract: We consider scheduling two types of jobs (A-job and B-job) to $p$ machines and minimizing their makespan. A group of same type of jobs processed consecutively by a machine is called a batch. For machine $v$, processing $x$ A-jobs in a batch takes $k^A_vx^2$ time units for a given speed $k^A_v$, and processing $x$ B-jobs in a batch takes $k^B_vx^2$ time units for a given speed $k^B_v$. We give an…

    Submitted 14 June, 2024; originally announced June 2024.

  25. arXiv:2406.08079  [pdf, other]

    cs.CV

    A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder

    Authors: Lixian Zhang, Yi Zhao, Runmin Dong, Jinxiao Zhang, Shuai Yuan, Shilei Cao, Mengxuan Chen, Juepeng Zheng, Weijia Li, Wei Liu, Wayne Zhang, Litong Feng, Haohuan Fu

    Abstract: Vast amounts of remote sensing (RS) data provide Earth observations across multiple dimensions, encompassing critical spatial, temporal, and spectral information which is essential for addressing global-scale challenges such as land use monitoring, disaster prevention, and environmental change mitigation. Despite various pre-training methods tailored to the characteristics of RS data, a key limita…

    Submitted 16 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  26. arXiv:2406.06125  [pdf, other]

    cs.CL

    Verifiable Generation with Subsentence-Level Fine-Grained Citations

    Authors: Shuyang Cao, Lu Wang

    Abstract: Verifiable generation requires large language models (LLMs) to cite source documents supporting their outputs, thereby improve output transparency and trustworthiness. Yet, previous work mainly targets the generation of sentence-level citations, lacking specificity about which parts of a sentence are backed by the cited sources. This work studies verifiable generation with subsentence-level fine-g…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: NAACL 2024 Findings

  27. arXiv:2406.04271  [pdf, other]

    cs.CL

    Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

    Authors: Ling Yang, Zhaochen Yu, Tianjun Zhang, Shiyi Cao, Minkai Xu, Wentao Zhang, Joseph E. Gonzalez, Bin Cui

    Abstract: We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs). Specifically, we propose meta-buffer to store a series of informative high-level thoughts, namely thought-template, distilled from the problem-solving processes across various tasks. Then for each problem, we retrieve a…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project: https://github.com/YangLing0818/buffer-of-thought-llm

  28. arXiv:2405.17211  [pdf, other]

    cs.LG math.NA physics.flu-dyn

    Spectral-Refiner: Fine-Tuning of Accurate Spatiotemporal Neural Operator for Turbulent Flows

    Authors: Shuhao Cao, Francesco Brarda, Ruipeng Li, Yuanzhe Xi

    Abstract: Recent advancements in operator-type neural networks have shown promising results in approximating the solutions of spatiotemporal Partial Differential Equations (PDEs). However, these neural networks often entail considerable training expenses, and may not always achieve the desired accuracy required in many scientific and engineering disciplines. In this paper, we propose a new Spatiotemporal Fo…

    Submitted 27 May, 2024; originally announced May 2024.

    MSC Class: 65M70 (Primary); 35Q30; 76M22; 65M50; 68T07 (Secondary)

  29. arXiv:2405.16038  [pdf, other]

    cs.CV

    Rethinking Early-Fusion Strategies for Improved Multispectral Object Detection

    Authors: Xue Zhang, Si-Yuan Cao, Fang Wang, Runmin Zhang, Zhe Wu, Xiaohan Zhang, Xiaokai Bai, Hui-Liang Shen

    Abstract: Most recent multispectral object detectors employ a two-branch structure to extract features from RGB and thermal images. While the two-branch structure achieves better performance than a single-branch structure, it overlooks inference efficiency. This conflict is increasingly aggressive, as recent works solely pursue higher performance rather than both performance and efficiency. In this paper, w…

    Submitted 24 May, 2024; originally announced May 2024.

  30. arXiv:2405.13675  [pdf, other]

    cs.CV

    Context and Geometry Aware Voxel Transformer for Semantic Scene Completion

    Authors: Zhu Yu, Runming Zhang, Jiacheng Ying, Junchen Yu, Xiaohai Hu, Lun Luo, Siyuan Cao, Huiliang Shen

    Abstract: Vision-based Semantic Scene Completion (SSC) has gained much attention due to its widespread applications in various 3D perception tasks. Existing sparse-to-dense approaches typically employ shared context-independent queries across various input images, which fails to capture distinctions among them as the focal regions of different inputs vary and may result in undirected feature aggregation of…

    Submitted 22 May, 2024; originally announced May 2024.

  31. arXiv:2405.11414  [pdf, other]

    cs.CY cs.CE cs.MA physics.soc-ph

    High-Resolution Agent-Based Modeling of Campus Population Behaviors for Pandemic Response Planning

    Authors: Hiroki Sayama, Shun Cao

    Abstract: This paper reports a case study of an application of high-resolution agent-based modeling and simulation to pandemic response planning on a university campus. In the summer of 2020, we were tasked with a COVID-19 pandemic response project to create a detailed behavioral simulation model of the entire campus population at Binghamton University. We conceptualized this problem as an agent migration p…

    Submitted 18 May, 2024; originally announced May 2024.

    Comments: 14 pages, 6 figures; submitted to PPAM 2024 (under review)

  32. arXiv:2405.03239  [pdf, other]

    cs.LG cs.AI

    Deep Learning for Detecting and Early Predicting Chronic Obstructive Pulmonary Disease from Spirogram Time Series: A UK Biobank Study

    Authors: Shuhao Mei, Yuxi Zhou, Jiahao Xu, Yuxuan Wan, Shan Cao, Qinghao Zhao, Shijia Geng, Junqing Xie, Shenda Hong

    Abstract: Chronic Obstructive Pulmonary Disease (COPD) is a chronic inflammatory lung condition that causes airflow obstruction. The existing methods can only detect patients who already have COPD based on obvious features shown in the spirogram (In this article, the spirogram specifically involves measuring Volume-Flow curve time series). Early prediction of COPD risk is vital for monitoring COPD disease p…

    Submitted 6 May, 2024; originally announced May 2024.

  33. arXiv:2405.00579  [pdf, other]

    cs.GT

    LEAP: Optimization Hierarchical Federated Learning on Non-IID Data with Coalition Formation Game

    Authors: Jianfeng Lu, Yue Chen, Shuqin Cao, Longbiao Chen, Wei Wang, Yun Xin

    Abstract: Although Hierarchical Federated Learning (HFL) utilizes edge servers (ESs) to alleviate communication burdens, its model performance will be degraded by non-IID data and limited communication resources. Current works often assume that data is uniformly distributed, which however contradicts the heterogeneity of IoT. Solutions of additional model training to check the data distribution inevitably i…

    Submitted 1 May, 2024; originally announced May 2024.

  34. arXiv:2404.18084  [pdf, other]

    cs.NI

    Age-minimal Multicast by Graph Attention Reinforcement Learning

    Authors: Yanning Zhang, Guocheng Liao, Shengbin Cao, Ning Yang, Meng Zhang

    Abstract: Age of Information (AoI) is an emerging metric used to assess the timeliness of information, gaining research interest in real-time multicast applications such as video streaming and metaverse platforms. In this paper, we consider a dynamic multicast network with energy constraints, where our objective is to minimize the expected time-average AoI through energy-constrained multicast routing and sc…

    Submitted 31 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  35. arXiv:2404.12386  [pdf, other]

    cs.CV cs.LG

    SOHES: Self-supervised Open-world Hierarchical Entity Segmentation

    Authors: Shengcao Cao, Jiuxiang Gu, Jason Kuen, Hao Tan, Ruiyi Zhang, Handong Zhao, Ani Nenkova, Liang-Yan Gui, Tong Sun, Yu-Xiong Wang

    Abstract: Open-world entity segmentation, as an emerging computer vision task, aims at segmenting entities in images without being restricted by pre-defined classes, offering impressive generalization capabilities on unseen images and concepts. Despite its promise, existing entity segmentation methods like Segment Anything Model (SAM) rely heavily on costly expert annotators. This work presents Self-supervi…

    Submitted 18 April, 2024; originally announced April 2024.

    Comments: ICLR 2024

  36. arXiv:2404.10160  [pdf, other]

    cs.AI

    Reinforcement Learning from Multi-role Debates as Feedback for Bias Mitigation in LLMs

    Authors: Ruoxi Cheng, Haoxuan Ma, Shuirong Cao, Jiaqi Li, Aihua Pei, Zhiqiang Wang, Pengliang Ji, Haoyu Wang, Jiaqi Huo

    Abstract: Bias in LLMs can harm user experience and societal outcomes. However, current bias mitigation methods often require intensive human feedback, lack transferability to other topics or yield overconfident and random outputs. We find that involving LLMs in role-playing scenario boosts their ability to recognize and mitigate biases. Based on this, we propose Reinforcement Learning from Multi-role Debat…

    Submitted 16 August, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: The first three authors contributed equally to this work

  37. arXiv:2404.09408  [pdf, other]

    cs.NI

    A Distributed Scalable Cross-chain State Channel Scheme Based on Recursive State Synchronization

    Authors: Xinyu Liang, Ruiying Du, Jing Chen, Yu Zhang, Meng Jia, Shuangxi Cao, Yufeng Wei, Shixiong Yao

    Abstract: As cross-chain technology continues to advance, the scale of cross-chain transactions is experiencing significant expansion. To improve scalability, researchers have turned to the study of cross-chain state channels. However, most of the existing schemes rely on trusted parties to support channel operations. To address this issue, we present Interpipe: a distributed cross-chain state channel schem…

    Submitted 14 April, 2024; originally announced April 2024.

  38. arXiv:2404.07671  [pdf]

    cs.CV

    Deep learning-driven pulmonary arteries and veins segmentation reveals demography-associated pulmonary vasculature anatomy

    Authors: Yuetan Chu, Gongning Luo, Longxi Zhou, Shaodong Cao, Guolin Ma, Xianglin Meng, Juexiao Zhou, Changchun Yang, Dexuan Xie, Ricardo Henao, Xigang Xiao, Lianming Wu, Zhaowen Qiu, Xin Gao

    Abstract: Pulmonary artery-vein segmentation is crucial for diagnosing pulmonary diseases and surgical planning, and is traditionally achieved by Computed Tomography Pulmonary Angiography (CTPA). However, concerns regarding adverse health effects from contrast agents used in CTPA have constrained its clinical utility. In contrast, identifying arteries and veins using non-contrast CT, a conventional and low-…

    Submitted 11 April, 2024; originally announced April 2024.

  39. arXiv:2404.05825  [pdf, other]

    cs.IR cs.AI

    LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding

    Authors: Mingrui Wu, Sheng Cao

    Abstract: Recently embedding-based retrieval or dense retrieval have shown state of the art results, compared with traditional sparse or bag-of-words based approaches. This paper introduces a model-agnostic doc-level embedding framework through large language model (LLM) augmentation. In addition, it also improves some important components in the retrieval model training process, such as negative sampling,…

    Submitted 8 April, 2024; originally announced April 2024.

  40. arXiv:2404.03577  [pdf, other]

    cs.CL

    Untangle the KNOT: Interweaving Conflicting Knowledge and Reasoning Skills in Large Language Models

    Authors: Yantao Liu, Zijun Yao, Xin Lv, Yuchen Fan, Shulin Cao, Jifan Yu, Lei Hou, Juanzi Li

    Abstract: Providing knowledge documents for large language models (LLMs) has emerged as a promising solution to update the static knowledge inherent in their parameters. However, knowledge in the document may conflict with the memory of LLMs due to outdated or incorrect knowledge in the LLMs' parameters. This leads to the necessity of examining the capability of LLMs to assimilate supplemental external know…

    Submitted 4 April, 2024; originally announced April 2024.

    Comments: Accepted by LREC-COLING 2024 as long paper

  41. arXiv:2404.02525  [pdf, other]

    cs.SE

    Large Language Model for Vulnerability Detection and Repair: Literature Review and the Road Ahead

    Authors: Xin Zhou, Sicong Cao, Xiaobing Sun, David Lo

    Abstract: The significant advancements in Large Language Models (LLMs) have resulted in their widespread adoption across various tasks within Software Engineering (SE), including vulnerability detection and repair. Numerous recent studies have investigated the application of LLMs to enhance vulnerability detection and repair tasks. Despite the increasing research interest, there is currently no existing sur…

    Submitted 6 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 11 pages

  42. arXiv:2404.00349  [pdf, other]

    cs.CV

    SGDFormer: One-stage Transformer-based Architecture for Cross-Spectral Stereo Image Guided Denoising

    Authors: Runmin Zhang, Zhu Yu, Zehua Sheng, Jiacheng Ying, Si-Yuan Cao, Shu-Jie Chen, Bailin Yang, Junwei Li, Hui-Liang Shen

    Abstract: Cross-spectral image guided denoising has shown its great potential in recovering clean images with rich details, such as using the near-infrared image to guide the denoising process of the visible one. To obtain such image pairs, a feasible and economical way is to employ a stereo system, which is widely used on mobile devices. Current works attempt to generate an aligned guidance image to handle…

    Submitted 30 March, 2024; originally announced April 2024.

  43. arXiv:2403.17708  [pdf, other]

    cs.CV cs.HC cs.MM

    Panonut360: A Head and Eye Tracking Dataset for Panoramic Video

    Authors: Yutong Xu, Junhao Du, Jiahe Wang, Yuwei Ning, Sihan Zhou, Yang Cao

    Abstract: With the rapid development and widespread application of VR/AR technology, maximizing the quality of immersive panoramic video services that match users' personal preferences and habits has become a long-standing challenge. Understanding the saliency region where users focus, based on data collected with HMDs, can promote multimedia encoding, transmission, and quality assessment. At the same time,…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: 7 pages, ACM MMSys'24 accepted

  44. arXiv:2403.14135  [pdf, other]

    eess.IV cs.CV

    Powerful Lossy Compression for Noisy Images

    Authors: Shilv Cai, Xiaoguo Liang, Shuning Cao, Luxin Yan, Sheng Zhong, Liqun Chen, Xu Zou

    Abstract: Image compression and denoising represent fundamental challenges in image processing with many real-world applications. To address practical demands, current solutions can be categorized into two main strategies: 1) sequential method; and 2) joint method. However, sequential methods have the disadvantage of error accumulation as there is information loss between multiple individual models. Recentl…

    Submitted 26 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by ICME 2024

  45. arXiv:2403.07022  [pdf, other]

    cs.LG cs.AI

    A Unified Model for Spatio-Temporal Prediction Queries with Arbitrary Modifiable Areal Units

    Authors: Liyue Chen, Jiangyi Fang, Tengfei Liu, Shaosheng Cao, Leye Wang

    Abstract: Spatio-Temporal (ST) prediction is crucial for making informed decisions in urban location-based applications like ride-sharing. However, existing ST models often require region partition as a prerequisite, resulting in two main pitfalls. Firstly, location-based services necessitate ad-hoc regions for various purposes, requiring multiple ST models with varying scales and zones, which can be costly…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted by ICDE 2024

  46. arXiv:2403.05821  [pdf, other]

    cs.LG cs.DB

    Optimizing LLM Queries in Relational Workloads

    Authors: Shu Liu, Asim Biswal, Audrey Cheng, Xiangxi Mo, Shiyi Cao, Joseph E. Gonzalez, Ion Stoica, Matei Zaharia

    Abstract: Analytical database providers (e.g., Redshift, Databricks, BigQuery) have rapidly added support for invoking Large Language Models (LLMs) through native user-defined functions (UDFs) to help users perform natural language tasks, such as classification, entity extraction, and translation, inside analytical workloads. For instance, an analyst might want to extract customer sentiments on millions of…

    Submitted 9 March, 2024; originally announced March 2024.

  47. arXiv:2403.04437  [pdf, other]

    cs.CV

    StableDrag: Stable Dragging for Point-based Image Editing

    Authors: Yutao Cui, Xiaotong Zhao, Guozhen Zhang, Shengming Cao, Kai Ma, Limin Wang

    Abstract: Point-based image editing has attracted remarkable attention since the emergence of DragGAN. Recently, DragDiffusion further pushes forward the generative quality via adapting this dragging technique to diffusion models. Despite this great success, this dragging scheme exhibits two major drawbacks, namely inaccurate point tracking and incomplete motion supervision, which may result in unsatisfact…

    Submitted 7 March, 2024; originally announced March 2024.

  48. arXiv:2402.18490  [pdf, other]

    cs.CV

    TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding

    Authors: Zhihao Zhang, Shengcao Cao, Yu-Xiong Wang

    Abstract: The limited scale of current 3D shape datasets hinders the advancements in 3D shape understanding, and motivates multi-modal learning approaches which transfer learned knowledge from data-abundant 2D image and language modalities to 3D shapes. However, even though the image and language representations have been aligned by cross-modal models like CLIP, we find that the image modality fails to cont…

    Submitted 1 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR 2024

  49. arXiv:2402.18070  [pdf, other]

    cs.AR eess.SP

    A Hierarchical Dataflow-Driven Heterogeneous Architecture for Wireless Baseband Processing

    Authors: Limin Jiang, Yi Shi, Haiqin Hu, Qingyu Deng, Siyi Xu, Yintao Liu, Feng Yuan, Si Wang, Yihao Shen, Fangfang Ye, Shan Cao, Zhiyuan Jiang

    Abstract: Wireless baseband processing (WBP) is a key element of wireless communications, with a series of signal processing modules to improve data throughput and counter channel fading. Conventional hardware solutions, such as digital signal processors (DSPs) and more recently, graphic processing units (GPUs), provide various degrees of parallelism, yet they both fail to take into account the cyclical and…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 7 pages, 7 figures, conference

  50. arXiv:2402.10631  [pdf, other]

    cs.CL

    BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation

    Authors: Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu

    Abstract: The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges. Weight quantization has emerged as a widely embraced solution to reduce memory and computational demands. This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD)…

    Submitted 16 February, 2024; originally announced February 2024.