
Showing 1–50 of 1,966 results for author: Zhang, R

Searching in archive cs.
  1. arXiv:2408.09675  [pdf, other]

    cs.AI cs.MA cs.RO

    Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey

    Authors: Ruiqi Zhang, Jing Hou, Florian Walter, Shangding Gu, Jiayi Guan, Florian Röhrbein, Yali Du, Panpan Cai, Guang Chen, Alois Knoll

    Abstract: Reinforcement Learning (RL) is a potent tool for sequential decision-making and has achieved performance surpassing human capabilities across many challenging real-world tasks. As the extension of RL in the multi-agent system domain, multi-agent RL (MARL) not only need to learn the control policy but also requires consideration regarding interactions with all other agents in the environment, mutua…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 23 pages, 6 figures and 2 tables. Submitted to IEEE Journal

  2. arXiv:2408.09485  [pdf, other]

    cs.CL

    Activated Parameter Locating via Causal Intervention for Model Merging

    Authors: Fanshuang Kong, Richong Zhang, Ziqiao Wang

    Abstract: Model merging combines multiple homologous models into one model, achieving convincing generalization without the necessity of additional training. A key challenge in this problem is resolving parameter redundancies and conflicts across multiple models. Existing models have demonstrated that dropping a portion of delta parameters can alleviate conflicts while maintaining performance. However, thes…

    Submitted 18 August, 2024; originally announced August 2024.

  3. arXiv:2408.09199  [pdf, other]

    cs.IR

    TC-RAG:Turing-Complete RAG's Case study on Medical LLM Systems

    Authors: Xinke Jiang, Yue Fang, Rihong Qiu, Haoyu Zhang, Yongxin Xu, Hao Chen, Wentao Zhang, Ruizhe Zhang, Yuchen Fang, Xu Chu, Junfeng Zhao, Yasha Wang

    Abstract: In the pursuit of enhancing domain-specific Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) emerges as a promising solution to mitigate issues such as hallucinations, outdated knowledge, and limited expertise in highly specialized queries. However, existing approaches to RAG fall short by neglecting system state variables, which are crucial for ensuring adaptive control, retriev…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: version 1.0

  4. arXiv:2408.08981  [pdf, other]

    cs.IR cs.CL

    From Lazy to Prolific: Tackling Missing Labels in Open Vocabulary Extreme Classification by Positive-Unlabeled Sequence Learning

    Authors: Haoran Ranran Zhang, Bensu Uçar, Soumik Dey, Hansi Wu, Binbin Li, Rui Zhang

    Abstract: Open-vocabulary Extreme Multi-label Classification (OXMC) extends traditional XMC by allowing prediction beyond an extremely large, predefined label set (typically $10^3$ to $10^{12}$ labels), addressing the dynamic nature of real-world labeling tasks. However, self-selection bias in data annotation leads to significant missing labels in both training and test data, particularly for less popular i…

    Submitted 16 August, 2024; originally announced August 2024.

  5. arXiv:2408.08588  [pdf, other]

    cs.IT eess.SP

    Movable Antenna for Wireless Communications:Prototyping and Experimental Results

    Authors: Zhenjun Dong, Zhiwen Zhou, Zhiqiang Xiao, Chaoyue Zhang, Xinrui Li, Hongqi Min, Yong Zeng, Shi Jin, Rui Zhang

    Abstract: Movable antenna (MA), which can flexibly change the position of antenna in three-dimensional (3D) continuous space, is an emerging technology for achieving full spatial performance gains. In this paper, a prototype of MA communication system with ultra-accurate movement control is presented to verify the performance gain of MA in practical environments. The prototype utilizes the feedback control…

    Submitted 16 August, 2024; originally announced August 2024.

  6. arXiv:2408.08506  [pdf, other]

    cs.CL cs.AI

    Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding

    Authors: Huang Lei, Jiaming Guo, Guanhua He, Xishan Zhang, Rui Zhang, Shaohui Peng, Shaoli Liu, Tianshi Chen

    Abstract: Generating long-term texts such as novels using artificial intelligence has always been a challenge. A common approach is to use large language models (LLMs) to construct a hierarchical framework that first plans and then writes. Despite the fact that the generated novels reach a sufficient length, they exhibit poor logical coherence and appeal in their plots and deficiencies in character and even…

    Submitted 15 August, 2024; originally announced August 2024.

  7. arXiv:2408.08332  [pdf, other]

    cs.CV cs.LG

    TurboEdit: Instant text-based image editing

    Authors: Zongze Wu, Nicholas Kolkin, Jonathan Brandt, Richard Zhang, Eli Shechtman

    Abstract: We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models. We introduce an encoder based iterative inversion technique. The inversion network is conditioned on the input image and the reconstructed image from the previous step, allowing for correction of the next reconstruction towards the input image. We demonstrate that disent…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2024. Project page: https://betterze.github.io/TurboEdit/

  8. arXiv:2408.06666  [pdf, ps, other]

    cs.RO eess.SY

    Design of a Double-joint Robotic Fish Using a Composite Linkage

    Authors: Ruijia Zhang, Wenke Zhou, Min Li, Miao Li

    Abstract: Robotic fish is one of the most promising directions of the new generation of underwater vehicles. Traditional biomimetic fish often mimic fish joints using tandem components like servos, which leads to increased volume, weight and control complexity. In this paper, a new double-joint robotic fish using a composite linkage was designed, where the propulsion mechanism transforms the single-degree-o…

    Submitted 13 August, 2024; originally announced August 2024.

  9. arXiv:2408.06574  [pdf, other]

    cs.CL

    SparkRA: A Retrieval-Augmented Knowledge Service System Based on Spark Large Language Model

    Authors: Dayong Wu, Jiaqi Li, Baoxin Wang, Honghong Zhao, Siyuan Xue, Yanjie Yang, Zhijun Chang, Rui Zhang, Li Qian, Bo Wang, Shijin Wang, Zhixiong Zhang, Guoping Hu

    Abstract: Large language models (LLMs) have shown remarkable achievements across various language tasks. To enhance the performance of LLMs in scientific literature services, we developed the scientific literature LLM (SciLit-LLM) through pre-training and supervised fine-tuning on scientific literature, building upon the iFLYTEK Spark LLM. Furthermore, we present a knowledge service system Spark Research Ass…

    Submitted 12 August, 2024; originally announced August 2024.

  10. arXiv:2408.05677  [pdf, other]

    math.NA cs.LG

    Tensor Decomposition Meets RKHS: Efficient Algorithms for Smooth and Misaligned Data

    Authors: Brett W. Larsen, Tamara G. Kolda, Anru R. Zhang, Alex H. Williams

    Abstract: The canonical polyadic (CP) tensor decomposition decomposes a multidimensional data array into a sum of outer products of finite-dimensional vectors. Instead, we can replace some or all of the vectors with continuous functions (infinite-dimensional vectors) from a reproducing kernel Hilbert space (RKHS). We refer to tensors with some infinite-dimensional modes as quasitensors, and the approach of…

    Submitted 10 August, 2024; originally announced August 2024.

  11. arXiv:2408.03326  [pdf, other]

    cs.CV cs.AI cs.CL

    LLaVA-OneVision: Easy Visual Task Transfer

    Authors: Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li

    Abstract: We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-i…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Project Homepage: https://llava-vl.github.io/blog/2024-08-05-llava-onevision/

  12. arXiv:2408.03297  [pdf, other]

    cs.CL cs.AI

    KnowPO: Knowledge-aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models

    Authors: Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao, Runchuan Zhu, Xinke Jiang, Xu Chu, Junfeng Zhao, Yasha Wang

    Abstract: By integrating external knowledge, Retrieval-Augmented Generation (RAG) has become an effective strategy for mitigating the hallucination problems that large language models (LLMs) encounter when dealing with knowledge-intensive tasks. However, in the process of integrating external non-parametric supporting evidence with internal parametric knowledge, inevitable knowledge conflicts may arise, lea…

    Submitted 19 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  13. arXiv:2408.02561  [pdf, other]

    cs.CV

    HQOD: Harmonious Quantization for Object Detection

    Authors: Long Huang, Zhiwei Dong, Song-Lu Chen, Ruiyao Zhang, Shutong Ti, Feng Chen, Xu-Cheng Yin

    Abstract: Task inharmony problem commonly occurs in modern object detectors, leading to inconsistent qualities between classification and regression tasks. The predicted boxes with high classification scores but poor localization positions or low classification scores but accurate localization positions will worsen the performance of detectors after Non-Maximum Suppression. Furthermore, when object detector…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 2024 IEEE International Conference on Multimedia and Expo (ICME), July 15 - July 19, 2024, Niagara Falls, Ontario, Canada

  14. Embedding Compression in Recommender Systems: A Survey

    Authors: Shiwei Li, Huifeng Guo, Xing Tang, Ruiming Tang, Lu Hou, Ruixuan Li, Rui Zhang

    Abstract: To alleviate the problem of information explosion, recommender systems are widely deployed to provide personalized information filtering services. Usually, embedding tables are employed in recommender systems to transform high-dimensional sparse one-hot vectors into dense real-valued embeddings. However, the embedding tables are huge and account for most of the parameters in industrial-scale recom…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM Computing Surveys

    Journal ref: ACM Comput. Surv. 56, 5, Article 130 (January 2024)

  15. arXiv:2408.01510  [pdf, other]

    cs.RO cs.LG

    Adaptive Planning with Generative Models under Uncertainty

    Authors: Pascal Jutras-Dubé, Ruqi Zhang, Aniket Bera

    Abstract: Planning with generative models has emerged as an effective decision-making paradigm across a wide range of domains, including reinforcement learning and autonomous navigation. While continuous replanning at each timestep might seem intuitive because it allows decisions to be made based on the most recent environmental observations, it results in substantial computational challenges, primarily due…

    Submitted 2 August, 2024; originally announced August 2024.

  16. arXiv:2408.01072  [pdf, other]

    cs.AI

    A Survey on Self-play Methods in Reinforcement Learning

    Authors: Ruize Zhang, Zelai Xu, Chengdong Ma, Chao Yu, Wei-Wei Tu, Shiyu Huang, Deheng Ye, Wenbo Ding, Yaodong Yang, Yu Wang

    Abstract: Self-play, characterized by agents' interactions with copies or past versions of itself, has recently gained prominence in reinforcement learning. This paper first clarifies the preliminaries of self-play, including the multi-agent reinforcement learning framework and basic game theory concepts. Then it provides a unified framework and classifies existing self-play algorithms within this framework…

    Submitted 2 August, 2024; originally announced August 2024.

  17. arXiv:2408.00859  [pdf, other]

    cs.AI cs.IR

    LICM: Effective and Efficient Long Interest Chain Modeling for News Recommendation

    Authors: Zhen Yang, Wenhui Wang, Tao Qi, Peng Zhang, Tianyun Zhang, Ru Zhang, Jianyi Liu, Yongfeng Huang

    Abstract: Accurately recommending personalized candidate news articles to users has always been the core challenge of news recommendation system. News recommendations often require modeling of user interests to match candidate news. Recent efforts have primarily focused on extract local subgraph information, the lack of a comprehensive global news graph extraction has hindered the ability to utilize global…

    Submitted 1 August, 2024; originally announced August 2024.

  18. arXiv:2408.00619  [pdf, other]

    cs.CV

    Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection

    Authors: Ruiyang Zhang, Hu Zhang, Hang Yu, Zhedong Zheng

    Abstract: Unsupervised 3D object detection aims to identify objects of interest from unlabeled raw data, such as LiDAR points. Recent approaches usually adopt pseudo 3D bounding boxes (3D bboxes) from clustering algorithm to initialize the model training, and then iteratively updating both pseudo labels and the trained model. However, pseudo bboxes inevitably contain noises, and such inaccurate annotation a…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Preprint, 14 pages, 4 figures, 4 tables

  19. Decomposed Prompting to Answer Questions on a Course Discussion Board

    Authors: Brandon Jaipersaud, Paul Zhang, Jimmy Ba, Andrew Petersen, Lisa Zhang, Michael R. Zhang

    Abstract: We propose and evaluate a question-answering system that uses decomposed prompting to classify and answer student questions on a course discussion board. Our system uses a large language model (LLM) to classify questions into one of four types: conceptual, homework, logistics, and not answerable. This enables us to employ a different strategy for answering questions that fall under different types…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 6 pages. Published at International Conference on Artificial Intelligence in Education 2023. Code repository: https://github.com/brandonjaipersaud/piazza-qabot-gpt

    Journal ref: In: Artificial Intelligence in Education. AIED 2023. Communications in Computer and Information Science, vol 1831. Springer, Cham

  20. arXiv:2407.20962  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

    Authors: Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo

    Abstract: Massive multi-modality datasets play a significant role in facilitating the success of large video-language models. However, current video-language datasets primarily provide text descriptions for visual frames, considering audio to be weakly related information. They usually overlook exploring the potential of inherent audio-visual correlation, leading to monotonous annotation within each modalit…

    Submitted 6 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 15 Pages. Dataset report

  21. arXiv:2407.20544  [pdf, other]

    cs.CR cs.AR

    Automated Physical Design Watermarking Leveraging Graph Neural Networks

    Authors: Ruisi Zhang, Rachel Selina Rajarathnam, David Z. Pan, Farinaz Koushanfar

    Abstract: This paper presents AutoMarks, an automated and transferable watermarking framework that leverages graph neural networks to reduce the watermark search overheads during the placement stage. AutoMarks's novel automated watermark search is accomplished by (i) constructing novel graph and node features with physical, semantic, and design constraint-aware representation; (ii) designing a data-efficien…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted to MLCAD24, code: https://github.com/ruisizhang123/PD_WM_GNN

  22. arXiv:2407.20223  [pdf, other]

    cs.CV cs.RO

    Correspondence-Free SE(3) Point Cloud Registration in RKHS via Unsupervised Equivariant Learning

    Authors: Ray Zhang, Zheming Zhou, Min Sun, Omid Ghasemalizadeh, Cheng-Hao Kuo, Ryan Eustice, Maani Ghaffari, Arnie Sen

    Abstract: This paper introduces a robust unsupervised SE(3) point cloud registration method that operates without requiring point correspondences. The method frames point clouds as functions in a reproducing kernel Hilbert space (RKHS), leveraging SE(3)-equivariant features for direct feature space registration. A novel RKHS distance metric is proposed, offering reliable performance amidst noise, outliers,…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 10 pages, to be published in ECCV 2024

  23. arXiv:2407.20042  [pdf, other]

    cs.SE

    When to Stop? Towards Efficient Code Generation in LLMs with Excess Token Prevention

    Authors: Lianghong Guo, Yanlin Wang, Ensheng Shi, Wanjun Zhong, Hongyu Zhang, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng

    Abstract: Code generation aims to automatically generate code snippets that meet given natural language requirements and plays an important role in software development. Although Code LLMs have shown excellent performance in this domain, their long generation time poses a signification limitation in practice use. In this paper, we first conduct an in-depth preliminary study with different Code LLMs on code…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: To appear at ISSTA 2024

  24. arXiv:2407.19487  [pdf, other]

    cs.SE

    RLCoder: Reinforcement Learning for Repository-Level Code Completion

    Authors: Yanlin Wang, Yanli Wang, Daya Guo, Jiachi Chen, Ruikai Zhang, Yuchi Ma, Zibin Zheng

    Abstract: Repository-level code completion aims to generate code for unfinished code snippets within the context of a specified repository. Existing approaches mainly rely on retrieval-augmented generation strategies due to limitations in input sequence length. However, traditional lexical-based retrieval methods like BM25 struggle to capture code semantics, while model-based retrieval methods face challeng…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: To appear at ICSE 2025

    Journal ref: 47th International Conference on Software Engineering (ICSE 2025)

  25. arXiv:2407.19239  [pdf, other]

    cs.IR

    MaTrRec: Uniting Mamba and Transformer for Sequential Recommendation

    Authors: Shun Zhang, Runsen Zhang, Zhirong Yang

    Abstract: Sequential recommendation systems aim to provide personalized recommendations by analyzing dynamic preferences and dependencies within user behavior sequences. Recently, Transformer models can effectively capture user preferences. However, their quadratic computational complexity limits recommendation performance on long interaction sequence data. Inspired by the State Space Model (SSM)representat…

    Submitted 27 July, 2024; originally announced July 2024.

  26. arXiv:2407.19185  [pdf, other]

    cs.CV cs.AI

    LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models

    Authors: Ruiyi Zhang, Yufan Zhou, Jian Chen, Jiuxiang Gu, Changyou Chen, Tong Sun

    Abstract: Large multimodal language models have demonstrated impressive capabilities in understanding and manipulating images. However, many of these models struggle with comprehending intensive textual contents embedded within the images, primarily due to the limited text recognition and layout understanding ability. To understand the sources of these limitations, we perform an exploratory analysis showing…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: NeurIPS 2024 Under Review

  27. arXiv:2407.18595  [pdf, other]

    cs.CV

    LinguaLinker: Audio-Driven Portraits Animation with Implicit Facial Control Enhancement

    Authors: Rui Zhang, Yixiao Fang, Zhengnan Lu, Pei Cheng, Zebiao Huang, Bin Fu

    Abstract: This study delves into the intricacies of synchronizing facial dynamics with multilingual audio inputs, focusing on the creation of visually compelling, time-synchronized animations through diffusion-based techniques. Diverging from traditional parametric models for facial animation, our approach, termed LinguaLinker, adopts a holistic diffusion-based framework that integrates audio-driven visual…

    Submitted 26 July, 2024; originally announced July 2024.

  28. arXiv:2407.17956  [pdf, other]

    cs.CV

    SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images

    Authors: Wenxi Li, Ruxin Zhang, Haozhe Lin, Yuchen Guo, Chao Ma, Xiaokang Yang

    Abstract: The advancement of deep learning in object detection has predominantly focused on megapixel images, leaving a critical gap in the efficient processing of gigapixel images. These super high-resolution images present unique challenges due to their immense size and computational demands. To address this, we introduce 'SaccadeDet', an innovative architecture for gigapixel-level object detection, inspi…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: This paper is accepted to ECML-PKDD 2024

    Journal ref: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2024

  29. arXiv:2407.17738  [pdf, other]

    cs.CV

    Enhancing Fine-grained Object Detection in Aerial Images via Orthogonal Mapping

    Authors: Haoran Zhu, Yifan Zhou, Chang Xu, Ruixiang Zhang, Wen Yang

    Abstract: Fine-Grained Object Detection (FGOD) is a critical task in high-resolution aerial image analysis. This letter introduces Orthogonal Mapping (OM), a simple yet effective method aimed at addressing the challenge of semantic confusion inherent in FGOD. OM introduces orthogonal constraints in the feature space by decoupling features from the last layer of the classification branch with a class-wise or…

    Submitted 24 July, 2024; originally announced July 2024.

  30. arXiv:2407.17303  [pdf]

    cs.LG

    MoveLight: Enhancing Traffic Signal Control through Movement-Centric Deep Reinforcement Learning

    Authors: Junqi Shao, Chenhao Zheng, Yuxuan Chen, Yucheng Huang, Rui Zhang

    Abstract: This paper introduces MoveLight, a novel traffic signal control system that enhances urban traffic management through movement-centric deep reinforcement learning. By leveraging detailed real-time data and advanced machine learning techniques, MoveLight overcomes the limitations of traditional traffic signal control methods. It employs a lane-level control approach using the FRAP algorithm to achi…

    Submitted 24 July, 2024; originally announced July 2024.

  31. arXiv:2407.16273  [pdf, other]

    cs.CR

    Backdoor Attacks against Hybrid Classical-Quantum Neural Networks

    Authors: Ji Guo, Wenbo Jiang, Rui Zhang, Wenshu Fan, Jiachen Li, Guoming Lu

    Abstract: Hybrid Quantum Neural Networks (HQNNs) represent a promising advancement in Quantum Machine Learning (QML), yet their security has been rarely explored. In this paper, we present the first systematic study of backdoor attacks on HQNNs. We begin by proposing an attack framework and providing a theoretical analysis of the generalization bounds and minimum perturbation requirements for backdoor attac…

    Submitted 23 July, 2024; originally announced July 2024.

  32. arXiv:2407.16182  [pdf, other]

    cs.CV

    No Re-Train, More Gain: Upgrading Backbones with Diffusion Model for Few-Shot Segmentation

    Authors: Shuai Chen, Fanman Meng, Chenhao Wu, Haoran Wei, Runtong Zhang, Qingbo Wu, Linfeng Xu, Hongliang Li

    Abstract: Few-Shot Segmentation (FSS) aims to segment novel classes using only a few annotated images. Despite considerable process under pixel-wise support annotation, current FSS methods still face three issues: the inflexibility of backbone upgrade without re-training, the inability to uniformly handle various types of annotations (e.g., scribble, bounding box, mask and text), and the difficulty in accom…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 7 figures

  33. arXiv:2407.16168  [pdf, other]

    cs.CL

    Progressively Modality Freezing for Multi-Modal Entity Alignment

    Authors: Yani Huang, Xuefeng Zhang, Richong Zhang, Junfan Chen, Jaein Kim

    Abstract: Multi-Modal Entity Alignment aims to discover identical entities across heterogeneous knowledge graphs. While recent studies have delved into fusion paradigms to represent entities holistically, the elimination of features irrelevant to alignment and modal inconsistencies is overlooked, which are caused by inherent differences in multi-modal features. To address these challenges, we propose a nove…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 13 pages, 8 figures, Accepted by ACL2024

  34. arXiv:2407.16073  [pdf, other]

    cs.CL

    KaPQA: Knowledge-Augmented Product Question-Answering

    Authors: Swetha Eppalapally, Daksh Dangi, Chaithra Bhat, Ankita Gupta, Ruiyi Zhang, Shubham Agarwal, Karishma Bagga, Seunghyun Yoon, Nedim Lipka, Ryan A. Rossi, Franck Dernoncourt

    Abstract: Question-answering for domain-specific applications has recently attracted much interest due to the latest advancements in large language models (LLMs). However, accurately assessing the performance of these applications remains a challenge, mainly due to the lack of suitable benchmarks that effectively simulate real-world scenarios. To address this challenge, we introduce two product question-ans…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted at the ACL 2024 Workshop on Knowledge Augmented Methods for NLP

  35. arXiv:2407.15835  [pdf, other]

    cs.CL cs.AI cs.SD eess.AS

    dMel: Speech Tokenization made Simple

    Authors: He Bai, Tatiana Likhomanenko, Ruixiang Zhang, Zijin Gu, Zakaria Aldeneh, Navdeep Jaitly

    Abstract: Large language models have revolutionized natural language processing by leveraging self-supervised pretraining on vast textual data. Inspired by this success, researchers have investigated complicated speech tokenization methods to discretize continuous speech signals so that language modeling techniques can be applied to speech data. However, existing approaches either model semantic tokens, pot…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: under review

  36. arXiv:2407.15816  [pdf]

    cs.CV

    Efficient and generalizable prediction of molecular alterations in multiple cancer cohorts using H&E whole slide images

    Authors: Kshitij Ingale, Sun Hae Hong, Qiyuan Hu, Renyu Zhang, Bo Osinski, Mina Khoshdeli, Josh Och, Kunal Nagpal, Martin C. Stumpe, Rohan P. Joshi

    Abstract: Molecular testing of tumor samples for targetable biomarkers is restricted by a lack of standardization, turnaround-time, cost, and tissue availability across cancer types. Additionally, targetable alterations of low prevalence may not be tested in routine workflows. Algorithms that predict DNA alterations from routinely generated hematoxylin and eosin (H&E)-stained images could prioritize samples…

    Submitted 22 July, 2024; originally announced July 2024.

  37. arXiv:2407.15309  [pdf, other]

    cs.DC cs.LG

    vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving

    Authors: Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, Junping Zhao, Ke Zhang, Minyi Guo, Jingwen Leng

    Abstract: Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests. This surge in demand poses significant challenges in optimizing throughput and latency while keeping costs manageable. The Key-Value (KV) cache, a standard method for retaining previous computations, makes LLM inference highly bounded by memory. While batching strategies can enhance performa…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 16 pages, 12 figures

  38. arXiv:2407.14439  [pdf, other]

    cs.CV

    Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding

    Authors: Renshan Zhang, Yibo Lyu, Rui Shao, Gongwei Chen, Weili Guan, Liqiang Nie

    Abstract: Cropping high-resolution document images into multiple sub-images is the most widely used approach for current Multimodal Large Language Models (MLLMs) to do document understanding. Most of current document understanding methods preserve all tokens within sub-images and treat them equally. This neglects their different informativeness and leads to a significant increase in the number of image toke…

    Submitted 19 July, 2024; originally announced July 2024.

  39. arXiv:2407.14146  [pdf, other]

    cs.MM

    Fine-grained Knowledge Graph-driven Video-Language Learning for Action Recognition

    Authors: Rui Zhang, Yafen Lu, Pengli Ji, Junxiao Xue, Xiaoran Yan

    Abstract: Recent work has explored video action recognition as a video-text matching problem and several effective methods have been proposed based on large-scale pre-trained vision-language models. However, these approaches primarily operate at a coarse-grained level without the detailed and semantic understanding of action concepts by exploiting fine-grained semantic connections between actions and body m…

    Submitted 19 July, 2024; originally announced July 2024.

  40. arXiv:2407.13598  [pdf, other]

    cs.HC

    KNOWNET: Guided Health Information Seeking from LLMs via Knowledge Graph Integration

    Authors: Youfu Yan, Yu Hou, Yongkang Xiao, Rui Zhang, Qianwen Wang

    Abstract: The increasing reliance on Large Language Models (LLMs) for health information seeking can pose severe risks due to the potential for misinformation and the complexity of these topics. This paper introduces KNOWNET a visualization system that integrates LLMs with Knowledge Graphs (KG) to provide enhanced accuracy and structured exploration. Specifically, for enhanced accuracy, KNOWNET extracts tri…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 9 pages, 9 figures, accepted by IEEE VIS 2024

  41. arXiv:2407.13306  [pdf, ps, other]

    cs.IT eess.SP

    Group Movable Antenna With Flexible Sparsity: Joint Array Position and Sparsity Optimization

    Authors: Haiquan Lu, Yong Zeng, Shi Jin, Rui Zhang

    Abstract: Movable antenna (MA) is a promising technology to exploit the spatial variation of wireless channel for performance enhancement, by dynamically varying the antenna position within a certain region. However, for multi-antenna communication systems, moving each antenna independently not only requires prohibitive complexity to find the optimal antenna positions, but also incurs sophisticated movement…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 5 pages, 5 figures

  42. arXiv:2407.13157  [pdf, other]

    cs.CV cs.AI

    Learning Camouflaged Object Detection from Noisy Pseudo Label

    Authors: Jin Zhang, Ruiheng Zhang, Yanjiao Shi, Zhe Cao, Nian Liu, Fahad Shahbaz Khan

    Abstract: Existing Camouflaged Object Detection (COD) methods rely heavily on large-scale pixel-annotated training sets, which are both time-consuming and labor-intensive. Although weakly supervised methods offer higher annotation efficiency, their performance is far behind due to the unclear visual demarcations between foreground and background in camouflaged images. In this paper, we explore the potential…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  43. arXiv:2407.12851  [pdf]

    cs.CL

    ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data

    Authors: Zixin Shu, Rui Hua, Dengying Yan, Chenxia Lu, Ning Xu, Jun Li, Hui Zhu, Jia Zhang, Dan Zhao, Chenyang Hui, Junqiu Ye, Chu Liao, Qi Hao, Wen Ye, Cheng Luo, Xinyan Wang, Chuang Cheng, Xiaodong Li, Baoyan Liu, Xiaji Zhou, Runshun Zhang, Min Xu, Xuezhong Zhou

    Abstract: Symptom phenotypes are one of the key types of manifestations for diagnosis and treatment of various disease conditions. However, the diversity of symptom terminologies is one of the major obstacles hindering the analysis and knowledge sharing of various types of symptom-related medical data, particularly in the field of Traditional Chinese Medicine (TCM). Objective: This study aimed to construct…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 39 pages, 6 figures, 6 tables

  44. arXiv:2407.12435  [pdf, other]

    cs.CV

    F-HOI: Toward Fine-grained Semantic-Aligned 3D Human-Object Interactions

    Authors: Jie Yang, Xuesong Niu, Nan Jiang, Ruimao Zhang, Siyuan Huang

    Abstract: Existing 3D human object interaction (HOI) datasets and models simply align global descriptions with the long HOI sequence, while lacking a detailed understanding of intermediate states and the transitions between states. In this paper, we argue that fine-grained semantic alignment, which utilizes state-level descriptions, offers a promising paradigm for learning semantically rich HOI representati…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV24

  45. arXiv:2407.12423  [pdf, other]

    cs.HC cs.AI

    StuGPTViz: A Visual Analytics Approach to Understand Student-ChatGPT Interactions

    Authors: Zixin Chen, Jiachen Wang, Meng Xia, Kento Shigyo, Dingdong Liu, Rong Zhang, Huamin Qu

    Abstract: The integration of Large Language Models (LLMs), especially ChatGPT, into education is poised to revolutionize students' learning experiences by introducing innovative conversational learning methodologies. To empower students to fully leverage the capabilities of ChatGPT in educational scenarios, understanding students' interaction patterns with ChatGPT is crucial for instructors. However, this e…

    Submitted 21 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: 11 pages. To be published at IEEE Visualization 2024

  46. arXiv:2407.12064  [pdf, other]

    eess.IV cs.CL cs.CV cs.LG cs.MM

    LiteGPT: Large Vision-Language Model for Joint Chest X-ray Localization and Classification Task

    Authors: Khai Le-Duc, Ryan Zhang, Ngoc Son Nguyen, Tan-Hanh Pham, Anh Dao, Ba Hung Ngo, Anh Totti Nguyen, Truong-Son Hy

    Abstract: Vision-language models have been extensively explored across a wide range of tasks, achieving satisfactory performance; however, their application in medical imaging remains underexplored. In this work, we propose a unified framework - LiteGPT - for medical imaging. We leverage multiple pre-trained visual encoders to enrich information and enhance the performance of vision-language models. To…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Preprint, 19 pages

  47. arXiv:2407.12021  [pdf, other]

    cs.CL cs.AI

    Adaptive Draft-Verification for Efficient Large Language Model Decoding

    Authors: Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu

    Abstract: Large language model (LLM) decoding involves generating a sequence of tokens based on a given context, where each token is predicted one at a time using the model's learned probabilities. The typical autoregressive decoding method requires a separate forward pass through the model for each token generated, which is computationally inefficient and poses challenges for deploying LLMs in latency-sens…

    Submitted 19 August, 2024; v1 submitted 27 June, 2024; originally announced July 2024.

    Comments: Under review at NeurIPS 2024

  48. arXiv:2407.11504  [pdf, other]

    cs.IR

    Bootstrapped Pre-training with Dynamic Identifier Prediction for Generative Retrieval

    Authors: Yubao Tang, Ruqing Zhang, Jiafeng Guo, Maarten de Rijke, Yixing Fan, Xueqi Cheng

    Abstract: Generative retrieval uses differentiable search indexes to directly generate relevant document identifiers in response to a query. Recent studies have highlighted the potential of a strong generative retrieval model, trained with carefully crafted pre-training tasks, to enhance downstream retrieval tasks via fine-tuning. However, the full power of pre-training for generative retrieval remains unde…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL Findings 2024

  49. arXiv:2407.11218  [pdf]

    cs.HC cs.RO

    Walk along: An Experiment on Controlling the Mobile Robot 'Spot' with Voice and Gestures

    Authors: Renchi Zhang, Jesse van der Linden, Dimitra Dodou, Harleigh Seyffert, Yke Bauke Eisma, Joost C. F. de Winter

    Abstract: Robots are becoming increasingly intelligent and can autonomously perform tasks such as navigating between locations. However, human oversight remains crucial. This study compared two hands-free methods for directing mobile robots: voice control and gesture control. These methods were tested with the human stationary and walking freely. We hypothesized that walking with the robot would lead to hig…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  50. arXiv:2407.11017  [pdf, other]

    cs.CL cs.AI cs.LG

    Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation

    Authors: Jihyun Janice Ahn, Ryo Kamoi, Lu Cheng, Rui Zhang, Wenpeng Yin

    Abstract: Mainstream LLM research has primarily focused on enhancing their generative capabilities. However, even the most advanced LLMs experience uncertainty in their outputs, often producing varied results on different runs or when faced with minor changes in input, despite no substantial change in content. Given multiple responses from the same LLM to the same input, we advocate leveraging the LLMs' dis…

    Submitted 26 June, 2024; originally announced July 2024.

    Comments: 4 pages, 3 tables