
Showing 1–50 of 9,894 results for author: Wang, Y

Searching in archive cs.
  1. arXiv:2408.11787  [pdf, other]

    eess.IV cs.CV

    NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation

    Authors: Zhenye Lou, Qing Xu, Zekun Jiang, Xiangjian He, Zhen Chen, Yi Wang, Chenxin Li, Maggie M. He, Wenting Duan

    Abstract: Domain-generalized nuclei segmentation refers to the generalizability of models to unseen domains based on knowledge learned from source domains and is challenged by various image conditions, cell types, and stain strategies. Recently, the Segment Anything Model (SAM) has made great success in universal image segmentation by interactive prompt modes (e.g., point and box). Despite its strengths, th…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Under Review

  2. arXiv:2408.11609  [pdf, other]

    cs.CL cs.AI

    Xinyu: An Efficient LLM-based System for Commentary Generation

    Authors: Yiquan Wu, Bo Tang, Chenyang Xi, Yu Yu, Pengyu Wang, Yifei Liu, Kun Kuang, Haiying Deng, Zhiyu Li, Feiyu Xiong, Jie Hu, Peng Cheng, Zhonghao Wang, Yi Wang, Yi Luo, Mingchuan Yang

    Abstract: Commentary provides readers with a deep understanding of events by presenting diverse arguments and evidence. However, creating commentary is a time-consuming task, even for skilled commentators. Large language models (LLMs) have simplified the process of natural language generation, but their direct application in commentary creation still faces challenges due to unique task requirements. These r…

    Submitted 21 August, 2024; originally announced August 2024.

    ACM Class: I.2.7

  3. arXiv:2408.11537  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    A Survey of Embodied Learning for Object-Centric Robotic Manipulation

    Authors: Ying Zheng, Lei Yao, Yuejiao Su, Yi Zhang, Yi Wang, Sicheng Zhao, Yiyi Zhang, Lap-Pui Chau

    Abstract: Embodied learning for object-centric robotic manipulation is a rapidly developing and challenging area in embodied AI. It is crucial for advancing next-generation intelligent robots and has garnered significant interest recently. Unlike data-driven machine learning methods, embodied learning focuses on robot learning through physical interaction with the environment and perceptual feedback, making…

    Submitted 21 August, 2024; originally announced August 2024.

  4. arXiv:2408.11464  [pdf, other]

    cs.CV

    MambaOcc: Visual State Space Model for BEV-based Occupancy Prediction with Local Adaptive Reordering

    Authors: Yonglin Tian, Songlin Bai, Zhiyao Luo, Yutong Wang, Yisheng Lv, Fei-Yue Wang

    Abstract: Occupancy prediction has attracted intensive attention and shown great superiority in the development of autonomous driving systems. The fine-grained environmental representation brought by occupancy prediction in terms of both geometry and semantic information has facilitated the general perception and safe planning under open scenarios. However, it also brings high computation costs and heavy pa…

    Submitted 21 August, 2024; originally announced August 2024.

  5. arXiv:2408.11451  [pdf, other]

    cs.AI

    Bidirectional Gated Mamba for Sequential Recommendation

    Authors: Ziwei Liu, Qidong Liu, Yejing Wang, Wanyu Wang, Pengyue Jia, Maolin Wang, Zitao Liu, Yi Chang, Xiangyu Zhao

    Abstract: In various domains, Sequential Recommender Systems (SRS) have become essential due to their superior capability to discern intricate user preferences. Typically, SRS utilize transformer-based architectures to forecast the subsequent item within a sequence. Nevertheless, the quadratic computational complexity inherent in these models often leads to inefficiencies, hindering the achievement of real-…

    Submitted 21 August, 2024; originally announced August 2024.

  6. arXiv:2408.11393  [pdf, other]

    cs.CL cs.LG

    First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models

    Authors: Chi Ma, Mincong Huang, Ying Zhang, Chao Wang, Yujie Wang, Lei Yu, Chuan Liu, Wei Lin

    Abstract: Dynamic activation (DA) techniques, such as DejaVu and MoEfication, have demonstrated their potential to significantly enhance the inference efficiency of large language models (LLMs). However, these techniques often rely on ReLU activation functions or require additional parameters and training to maintain performance. This paper introduces a training-free Threshold-based Dynamic Activation (TDA)…

    Submitted 21 August, 2024; originally announced August 2024.

  7. arXiv:2408.11381  [pdf, other]

    cs.CL

    RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation

    Authors: Xuanwang Zhang, Yunze Song, Yidong Wang, Shuyun Tang, Xinfeng Li, Zhengran Zeng, Zhen Wu, Wei Ye, Wenyuan Xu, Yue Zhang, Xinyu Dai, Shikun Zhang, Qingsong Wen

    Abstract: Large Language Models (LLMs) demonstrate human-level capabilities in dialogue, reasoning, and knowledge retention. However, even the most advanced LLMs face challenges such as hallucinations and real-time updating of their knowledge. Current research addresses this bottleneck by equipping LLMs with external knowledge, a technique known as Retrieval Augmented Generation (RAG). However, two key issu…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: 6 pages, 3 figures

  8. arXiv:2408.11357  [pdf, other]

    cs.CV

    HumanCoser: Layered 3D Human Generation via Semantic-Aware Diffusion Model

    Authors: Yi Wang, Jian Ma, Ruizhi Shao, Qiao Feng, Yu-kun Lai, Kun Li

    Abstract: This paper aims to generate physically-layered 3D humans from text prompts. Existing methods either generate 3D clothed humans as a whole or support only tight and simple clothing generation, which limits their applications to virtual try-on and part-level editing. To achieve physically-layered 3D human generation with reusable and complex clothing, we propose a novel layer-wise dressed human repr…

    Submitted 21 August, 2024; originally announced August 2024.

  9. arXiv:2408.11311  [pdf, other]

    cs.AR quant-ph

    HiMA: Hierarchical Quantum Microarchitecture for Qubit-Scaling and Quantum Process-Level Parallelism

    Authors: Qi Zhou, Zi-Hao Mei, Han-Qing Shi, Liang-Liang Guo, Xiao-Yan Yang, Yun-Jie Wang, Xiao-Fan Xu, Cheng Xue, Wei-Cheng Kong, Jun-Chao Wang, Yu-Chun Wu, Zhao-Yun Chen, Guo-Ping Guo

    Abstract: Quantum computing holds immense potential for addressing a myriad of intricate challenges, which is significantly amplified when scaled to thousands of qubits. However, a major challenge lies in developing an efficient and scalable quantum control system. To address this, we propose a novel Hierarchical MicroArchitecture (HiMA) designed to facilitate qubit scaling and exploit quantum process-level…

    Submitted 20 August, 2024; originally announced August 2024.

  10. arXiv:2408.11145  [pdf, other]

    cs.LG

    Total Uncertainty Quantification in Inverse PDE Solutions Obtained with Reduced-Order Deep Learning Surrogate Models

    Authors: Yuanzhe Wang, Alexandre M. Tartakovsky

    Abstract: We propose an approximate Bayesian method for quantifying the total uncertainty in inverse PDE solutions obtained with machine learning surrogate models, including operator learning models. The proposed method accounts for uncertainty in the observations and PDE and surrogate models. First, we use the surrogate model to formulate a minimization problem in the reduced space for the maximum a poster…

    Submitted 20 August, 2024; originally announced August 2024.

  11. arXiv:2408.11001  [pdf, other]

    cs.CV

    MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning

    Authors: Haoning Wu, Shaocheng Shen, Qiang Hu, Xiaoyun Zhang, Ya Zhang, Yanfeng Wang

    Abstract: Diffusion models have emerged as frontrunners in text-to-image generation for their impressive capabilities. Nonetheless, their fixed image resolution during training often leads to challenges in high-resolution image generation, such as semantic inaccuracies and object replication. This paper introduces MegaFusion, a novel approach that extends existing diffusion-based text-to-image generation mo…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Technical Report. Project Page: https://haoningwu3639.github.io/MegaFusion/

  12. arXiv:2408.10908  [pdf, other]

    cs.RO cs.HC

    Enhancing End-to-End Autonomous Driving Systems Through Synchronized Human Behavior Data

    Authors: Yiqun Duan, Zhuoli Zhuang, Jinzhao Zhou, Yu-Cheng Chang, Yu-Kai Wang, Chin-Teng Lin

    Abstract: This paper presents a pioneering exploration into the integration of fine-grained human supervision within the autonomous driving domain to enhance system performance. The current advances in End-to-End autonomous driving normally are data-driven and rely on given expert trials. However, this reliance limits the systems' generalizability and their ability to earn human trust. Addressing this gap,…

    Submitted 20 August, 2024; originally announced August 2024.

  13. arXiv:2408.10883  [pdf, other]

    cs.AI cs.CV

    DAAD: Dynamic Analysis and Adaptive Discriminator for Fake News Detection

    Authors: Xinqi Su, Yawen Cui, Ajian Liu, Xun Lin, Yuhao Wang, Haochen Liang, Wenhui Li, Zitong Yu

    Abstract: In the current web environment, fake news spreads rapidly across online social networks, posing serious threats to society. Existing multimodal fake news detection (MFND) methods can be classified into knowledge-based and semantic-based approaches. However, these methods are overly dependent on human expertise and feedback, lacking flexibility. To address this challenge, we propose a Dynamic Analysis…

    Submitted 20 August, 2024; originally announced August 2024.

  14. arXiv:2408.10841  [pdf, other]

    cs.AI cs.CL

    DELIA: Diversity-Enhanced Learning for Instruction Adaptation in Large Language Models

    Authors: Yuanhao Zeng, Fei Ren, Xinpeng Zhou, Yihang Wang, Yingxia Shao

    Abstract: Although instruction tuning is widely used to adjust behavior in Large Language Models (LLMs), extensive empirical evidence and research indicate that it is primarily a process where the model fits to specific task formats, rather than acquiring new knowledge or capabilities. We propose that this limitation stems from biased features learned during instruction tuning, which differ from ideal task…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  15. arXiv:2408.10652  [pdf, other]

    cs.CV cs.AI

    Vocabulary-Free 3D Instance Segmentation with Vision and Language Assistant

    Authors: Guofeng Mei, Luigi Riz, Yiming Wang, Fabio Poiesi

    Abstract: Most recent 3D instance segmentation methods are open vocabulary, offering a greater flexibility than closed-vocabulary methods. Yet, they are limited to reasoning within a specific set of concepts, i.e., the vocabulary, prompted by the user at test time. In essence, these models cannot reason in an open-ended fashion, i.e., answering "List the objects in the scene." We introduce the first method…

    Submitted 20 August, 2024; originally announced August 2024.

  16. arXiv:2408.10641  [pdf, other]

    cs.CV cs.AI

    A Review of Human-Object Interaction Detection

    Authors: Yuxiao Wang, Qiwei Xiong, Yu Lei, Weiying Xue, Qi Liu, Zhenao Wei

    Abstract: Human-object interaction (HOI) detection plays a key role in high-level visual understanding, facilitating a deep comprehension of human activities. Specifically, HOI detection aims to locate the humans and objects involved in interactions within images or videos and classify the specific interactions between them. The success of this task is influenced by several key factors, including the accura…

    Submitted 20 August, 2024; originally announced August 2024.

  17. arXiv:2408.10605  [pdf, other]

    cs.CV cs.AI

    MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

    Authors: Yanbo Ding, Shaobin Zhuang, Kunchang Li, Zhengrong Yue, Yu Qiao, Yali Wang

    Abstract: Despite recent advancements in text-to-image generation, most existing methods struggle to create images with multiple objects and complex spatial relationships in the 3D world. To tackle this limitation, we introduce a generic AI system, namely MUSES, for 3D-controllable image generation from user queries. Specifically, our MUSES addresses this challenging task by developing a progressive workflow wi…

    Submitted 21 August, 2024; v1 submitted 20 August, 2024; originally announced August 2024.

  18. arXiv:2408.10497  [pdf, other]

    cs.CL cs.AI

    QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention

    Authors: Yihang Wang, Xu Huang, Bowen Tian, Yixing Fan, Jiafeng Guo

    Abstract: Generative LLMs have achieved significant success in various industrial tasks and can effectively adapt to vertical domains and downstream tasks through ICL. However, with tasks becoming increasingly complex, the context length required by ICL is also getting longer, and two significant issues arise: (i) The excessively long context leads to high costs and inference delays. (ii) A substantial amoun…

    Submitted 19 August, 2024; originally announced August 2024.

  19. arXiv:2408.10488  [pdf, other]

    cs.CV cs.AI cs.CL cs.NE

    Event Stream based Sign Language Translation: A High-Definition Benchmark Dataset and A New Algorithm

    Authors: Xiao Wang, Yao Rong, Fuling Wang, Jianing Li, Lin Zhu, Bo Jiang, Yaowei Wang

    Abstract: Sign Language Translation (SLT) is a core task in the field of AI-assisted disability. Unlike traditional SLT based on visible light videos, which is easily affected by factors such as lighting, rapid hand movements, and privacy breaches, this paper proposes the use of high-definition Event streams for SLT, effectively mitigating the aforementioned issues. This is primarily because Event streams h…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: First Large-scale and High-Definition Benchmark Dataset for Event-based Sign Language Translation

  20. arXiv:2408.10486  [pdf, ps, other]

    cs.SE

    Revisiting Evolutionary Program Repair via Code Language Model

    Authors: Yunan Wang, Tingyu Guo, Zilong Huang, Yuan Yuan

    Abstract: Software defects are an inherent part of software development and maintenance. To address these defects, Automated Program Repair (APR) has been developed to fix bugs automatically. With the advent of Large Language Models, Code Language Models (CLMs) trained on code corpora excel in code generation, making them suitable for APR applications. Despite this progress, a significant limitation remain…

    Submitted 19 August, 2024; originally announced August 2024.

  21. AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference

    Authors: Shuzhang Zhong, Ling Liang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li

    Abstract: Mixture-of-Experts (MoE) models are designed to enhance the efficiency of large language models (LLMs) without proportionally increasing the computational demands. However, their deployment on edge devices still faces significant challenges due to high on-demand loading overheads from managing sparsely activated experts. This paper introduces AdapMoE, an algorithm-system co-design framework for ef…

    Submitted 18 August, 2024; originally announced August 2024.

  22. arXiv:2408.10205  [pdf, other]

    cs.LG cs.AI physics.comp-ph physics.data-an

    KAN 2.0: Kolmogorov-Arnold Networks Meet Science

    Authors: Ziming Liu, Pingchuan Ma, Yixuan Wang, Wojciech Matusik, Max Tegmark

    Abstract: A major challenge of AI + Science lies in their inherent incompatibility: today's AI is primarily based on connectionism, while science depends on symbolism. To bridge the two worlds, we propose a framework to seamlessly synergize Kolmogorov-Arnold Networks (KANs) and science. The framework highlights KANs' usage for three aspects of scientific discovery: identifying relevant features, revealing m…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 27 pages, 14 figures

  23. arXiv:2408.10202  [pdf, other]

    cs.CV

    SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP

    Authors: Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, Ryo Hachiuma

    Abstract: Large-scale vision-language models, such as CLIP, are known to contain harmful societal bias regarding protected attributes (e.g., gender and age). In this paper, we aim to address the problems of societal bias in CLIP. Although previous studies have proposed to debias societal bias through adversarial learning or test-time projecting, our comprehensive study of these works identifies two critical…

    Submitted 19 August, 2024; originally announced August 2024.

  24. arXiv:2408.10178  [pdf, other]

    cs.CV cs.AI

    NeuRodin: A Two-stage Framework for High-Fidelity Neural Surface Reconstruction

    Authors: Yifan Wang, Di Huang, Weicai Ye, Guofeng Zhang, Wanli Ouyang, Tong He

    Abstract: Signed Distance Function (SDF)-based volume rendering has demonstrated significant capabilities in surface reconstruction. Although promising, SDF-based methods often fail to capture detailed geometric structures, resulting in visible defects. By comparing SDF-based volume rendering to density-based volume rendering, we identify two main factors within the SDF-based approach that degrade surface q…

    Submitted 19 August, 2024; originally announced August 2024.

  25. arXiv:2408.09984  [pdf, other]

    cs.CV

    Boosting Open-Domain Continual Learning via Leveraging Intra-domain Category-aware Prototype

    Authors: Yadong Lu, Shitian Zhao, Boxiang Yun, Dongsheng Jiang, Yin Li, Qingli Li, Yan Wang

    Abstract: Despite recent progress in enhancing the efficacy of Open-Domain Continual Learning (ODCL) in Vision-Language Models (VLM), failing to (1) correctly identify the Task-ID of a test image and (2) use only the category set corresponding to the Task-ID, while preserving the knowledge related to each domain, cannot address the two primary challenges of ODCL: forgetting old knowledge and maintaining zer…

    Submitted 19 August, 2024; originally announced August 2024.

  26. arXiv:2408.09786  [pdf, other]

    cs.CV

    Cross-composition Feature Disentanglement for Compositional Zero-shot Learning

    Authors: Yuxia Geng, Runkai Zhu, Jiaoyan Chen, Jintai Chen, Zhuo Chen, Xiang Chen, Can Xu, Yuxiang Wang, Xiaoliang Xu

    Abstract: Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL). However, due to the feature divergence of an attribute (resp. object) when combined with different objects (resp. attributes), it is challenging to learn disentangled primitive features that are general across different compositions. To this end,…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: work in progress

  27. arXiv:2408.09554  [pdf, other]

    q-bio.QM cs.CV eess.IV

    Screen Them All: High-Throughput Pan-Cancer Genetic and Phenotypic Biomarker Screening from H&E Whole Slide Images

    Authors: Yi Kan Wang, Ludmila Tydlitatova, Jeremy D. Kunz, Gerard Oakley, Ran A. Godrich, Matthew C. H. Lee, Chad Vanderbilt, Razik Yousfi, Thomas Fuchs, David S. Klimstra, Siqi Liu

    Abstract: Many molecular alterations serve as clinically prognostic or therapy-predictive biomarkers, typically detected using single or multi-gene molecular assays. However, these assays are expensive, tissue destructive and often take weeks to complete. Using AI on routine H&E WSIs offers a fast and economical approach to screen for multiple molecular biomarkers. We present a high-throughput AI-based syst…

    Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

  28. arXiv:2408.09459  [pdf, other]

    cs.CL cs.IR

    WPN: An Unlearning Method Based on N-pair Contrastive Learning in Language Models

    Authors: Guitao Chen, Yunshen Wang, Hongye Sun, Guang Chen

    Abstract: Generative language models (LMs) offer numerous advantages but may produce inappropriate or harmful outputs due to the harmful knowledge acquired during pre-training. This knowledge often manifests as undesirable correspondences, such as "harmful prompts" leading to "harmful outputs," which our research aims to mitigate through unlearning techniques. However, existing unlearning methods based on gr…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: ECAI 2024

  29. arXiv:2408.09458  [pdf, other]

    cs.CV

    G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors

    Authors: Haoxin Yang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Jing Qin, Yi Wang, Pheng-Ann Heng, Shengfeng He

    Abstract: Reversible face anonymization, unlike traditional face pixelization, seeks to replace sensitive identity information in facial images with synthesized alternatives, preserving privacy without sacrificing image clarity. Traditional methods, such as encoder-decoder networks, often result in significant loss of facial details due to their limited learning capacity. Additionally, relying on latent man…

    Submitted 18 August, 2024; originally announced August 2024.

  30. arXiv:2408.09278  [pdf, other]

    eess.IV cs.CV

    Cross-Species Data Integration for Enhanced Layer Segmentation in Kidney Pathology

    Authors: Junchao Zhu, Mengmeng Yin, Ruining Deng, Yitian Long, Yu Wang, Yaohong Wang, Shilin Zhao, Haichun Yang, Yuankai Huo

    Abstract: Accurate delineation of the boundaries between the renal cortex and medulla is crucial for subsequent functional structural analysis and disease diagnosis. Training high-quality deep-learning models for layer segmentation relies on the availability of large amounts of annotated data. However, due to the patient's privacy of medical data and scarce clinical cases, constructing pathological datasets…

    Submitted 17 August, 2024; originally announced August 2024.

  31. arXiv:2408.09199  [pdf, other]

    cs.IR

    TC-RAG: Turing-Complete RAG's Case study on Medical LLM Systems

    Authors: Xinke Jiang, Yue Fang, Rihong Qiu, Haoyu Zhang, Yongxin Xu, Hao Chen, Wentao Zhang, Ruizhe Zhang, Yuchen Fang, Xu Chu, Junfeng Zhao, Yasha Wang

    Abstract: In the pursuit of enhancing domain-specific Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) emerges as a promising solution to mitigate issues such as hallucinations, outdated knowledge, and limited expertise in highly specialized queries. However, existing approaches to RAG fall short by neglecting system state variables, which are crucial for ensuring adaptive control, retriev…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: version 1.0

  32. arXiv:2408.09191  [pdf, other]

    cs.CV

    GSLAMOT: A Tracklet and Query Graph-based Simultaneous Locating, Mapping, and Multiple Object Tracking System

    Authors: Shuo Wang, Yongcai Wang, Zhimin Xu, Yongyu Guo, Wanting Li, Zhe Huang, Xuewei Bai, Deying Li

    Abstract: For interacting with mobile objects in unfamiliar environments, simultaneously locating, mapping, and tracking the 3D poses of multiple objects are crucially required. This paper proposes a Tracklet Graph and Query Graph-based framework, i.e., GSLAMOT, to address this challenge. GSLAMOT utilizes camera and LiDAR multimodal information as inputs and divides the representation of the dynamic scene i…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 11 pages, 9 figures, ACM MM 2024

  33. arXiv:2408.09181  [pdf, other]

    cs.CV cs.CR cs.LG

    PADetBench: Towards Benchmarking Physical Attacks against Object Detection

    Authors: Jiawei Lian, Jianhong Pan, Lefan Wang, Yi Wang, Lap-Pui Chau, Shaohui Mei

    Abstract: Physical attacks against object detection have gained increasing attention due to their significant practical implications. However, conducting physical experiments is extremely time-consuming and labor-intensive. Moreover, physical dynamics and cross-domain transformation are challenging to strictly regulate in the real world, leading to unaligned evaluation and comparison, severely hindering the…

    Submitted 17 August, 2024; originally announced August 2024.

  34. HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction

    Authors: Xiao Zhao, Bo Chen, Mingyang Sun, Dingkang Yang, Youxing Wang, Xukun Zhang, Mingcheng Li, Dongliang Kou, Xiaoyi Wei, Lihua Zhang

    Abstract: Vision-based 3D semantic scene completion (SSC) describes autonomous driving scenes through 3D volume representations. However, the occlusion of invisible voxels by scene surfaces poses challenges to current SSC methods in hallucinating refined 3D geometry. This paper proposes HybridOcc, a hybrid 3D volume query proposal method generated by Transformer framework and NeRF representation and refined…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted to IEEE RAL

  35. arXiv:2408.08696  [pdf, other]

    cs.CL cs.LG

    Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling

    Authors: Xianzhen Luo, Yixuan Wang, Qingfu Zhu, Zhiming Zhang, Xuanyu Zhang, Qing Yang, Dongliang Xu, Wanxiang Che

    Abstract: The rapid growth in the parameters of large language models (LLMs) has made inference latency a fundamental bottleneck, limiting broader application of LLMs. Speculative decoding represents a lossless approach to accelerate inference through a guess-and-verify paradigm, leveraging the parallel capabilities of modern hardware. Some speculative decoding methods rely on additional structures to guess…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: under review

  36. arXiv:2408.08693  [pdf, other]

    cs.CL

    Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm

    Authors: Hongcheng Liu, Yusheng Liao, Siqv Ou, Yuhao Wang, Heyang Liu, Yanfeng Wang, Yu Wang

    Abstract: The application of the Multi-modal Large Language Models (MLLMs) in medical clinical scenarios remains underexplored. Previous benchmarks only focus on the capacity of the MLLMs in medical visual question-answering (VQA) or report generation and fail to assess the performance of the MLLMs on complex clinical multi-modal tasks. In this paper, we propose a novel Medical Personalized Multi-modal Cons…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 26 pages, 5 figures

  37. arXiv:2408.08669  [pdf, other]

    cs.SD eess.AS

    HSDreport: Heart Sound Diagnosis with Echocardiography Reports

    Authors: Zihan Zhao, Pingjie Wang, Liudan Zhao, Yuchen Yang, Ya Zhang, Kun Sun, Xin Sun, Xin Zhou, Yu Wang, Yanfeng Wang

    Abstract: Heart sound auscultation holds significant importance in the diagnosis of congenital heart disease. However, existing methods for Heart Sound Diagnosis (HSD) tasks are predominantly limited to a few fixed categories, framing the HSD task as a rigid classification problem that does not fully align with medical practice and offers only limited information to physicians. Besides, such methods do not…

    Submitted 16 August, 2024; originally announced August 2024.

  38. arXiv:2408.08665  [pdf, other]

    cs.CV

    QMambaBSR: Burst Image Super-Resolution with Query State Space Model

    Authors: Xin Di, Long Peng, Peizhe Xia, Wenbo Li, Renjing Pei, Yang Cao, Yang Wang, Zheng-Jun Zha

    Abstract: Burst super-resolution aims to reconstruct high-resolution images with higher quality and richer details by fusing the sub-pixel information from multiple burst low-resolution frames. In BurstSR, the key challenge lies in extracting the base frame's content complementary sub-pixel details while simultaneously suppressing high-frequency noise disturbance. Existing methods attempt to extract sub-pix…

    Submitted 16 August, 2024; originally announced August 2024.

  39. arXiv:2408.08592  [pdf, other]

    cs.RO

    Case Study: Runtime Safety Verification of Neural Network Controlled System

    Authors: Frank Yang, Sinong Simon Zhan, Yixuan Wang, Chao Huang, Qi Zhu

    Abstract: Neural networks are increasingly used in safety-critical applications such as robotics and autonomous vehicles. However, the deployment of neural-network-controlled systems (NNCSs) raises significant safety concerns. Many recent advances overlook critical aspects of verifying control and ensuring safety in real-time scenarios. This paper presents a case study on using POLAR-Express, a state-of-the…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 15 pages, 5 figures, submitted to Runtime Verification 2024

  40. arXiv:2408.08447  [pdf, other]

    cs.CV cs.AI

    SpectralEarth: Training Hyperspectral Foundation Models at Scale

    Authors: Nassim Ait Ali Braham, Conrad M Albrecht, Julien Mairal, Jocelyn Chanussot, Yi Wang, Xiao Xiang Zhu

    Abstract: Foundation models have triggered a paradigm shift in computer vision and are increasingly being adopted in remote sensing, particularly for multispectral imagery. Yet, their potential in hyperspectral imaging (HSI) remains untapped due to the absence of comprehensive and globally representative hyperspectral datasets. To close this gap, we introduce SpectralEarth, a large-scale multi-temporal data…

    Submitted 15 August, 2024; originally announced August 2024.

  41. arXiv:2408.08412  [pdf, other]

    cs.CV

    Penny-Wise and Pound-Foolish in Deepfake Detection

    Authors: Yabin Wang, Zhiwu Huang, Su Zhou, Adam Prugel-Bennett, Xiaopeng Hong

    Abstract: The diffusion of deepfake technologies has sparked serious concerns about its potential misuse across various domains, prompting the urgent need for robust detection methods. Despite advancement, many current approaches prioritize short-term gains at the expense of long-term effectiveness. This paper critiques the overly specialized approach of fine-tuning pre-trained models solely with a penny-wise o…

    Submitted 15 August, 2024; originally announced August 2024.

  42. arXiv:2408.07967  [pdf, other]

    cs.CV

    FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

    Authors: Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Zhilin Pei, Hengjie Li, Xingcheng Zhang, Bo Dai

    Abstract: This work introduces FlashGS, an open-source CUDA Python library, designed to facilitate the efficient differentiable rasterization of 3D Gaussian Splatting through algorithmic and kernel-level optimizations. FlashGS is developed based on the observations from a comprehensive analysis of the rendering process to enhance computational efficiency and bring the technique to wide adoption. The paper i…

    Submitted 19 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  43. GOReloc: Graph-based Object-Level Relocalization for Visual SLAM

    Authors: Yutong Wang, Chaoyang Jiang, Xieyuanli Chen

    Abstract: This article introduces a novel method for object-level relocalization of robotic systems. It determines the pose of a camera sensor by robustly associating the object detections in the current frame with 3D objects in a lightweight object-level map. Object graphs, considering semantic uncertainties, are constructed for both the incoming camera frame and the pre-built map. Objects are represented…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 8 pages, accepted by IEEE RAL

    Journal ref: IEEE Robotics and Automation Letters 2024

  44. arXiv:2408.07884  [pdf, other]

    cs.CL

    Instruct Large Language Models to Generate Scientific Literature Survey Step by Step

    Authors: Yuxuan Lai, Yupeng Wu, Yidan Wang, Wenpeng Hu, Chen Zheng

    Abstract: Automatically generating scientific literature surveys is a valuable task that can significantly enhance research efficiency. However, the diverse and complex nature of information within a literature survey poses substantial challenges for generative models. In this paper, we design a series of prompts to systematically leverage large language models (LLMs), enabling the creation of com…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: NLPCC 2024

  45. arXiv:2408.07819  [pdf, other]

    cs.MM cs.CV cs.LG

    Regularized Contrastive Partial Multi-view Outlier Detection

    Authors: Yijia Wang, Qianqian Xu, Yangbangyan Jiang, Siran Dai, Qingming Huang

    Abstract: In recent years, multi-view outlier detection (MVOD) methods have advanced significantly, aiming to identify outliers within multi-view datasets. A key point is to better detect class outliers and class-attribute outliers, which only exist in multi-view data. However, existing methods either are not able to reduce the impact of outliers when learning view-consistent information, or struggle in case…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Proceedings of the 32nd ACM International Conference on Multimedia

  46. arXiv:2408.07605  [pdf, other]

    cs.CV

    Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving

    Authors: Yuqing Wen, Yucheng Zhao, Yingfei Liu, Binyuan Huang, Fan Jia, Yanhui Wang, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

    Abstract: The field of autonomous driving increasingly demands high-quality annotated video training data. In this paper, we propose Panacea+, a powerful and universally applicable framework for generating video data in driving scenes. Built upon the foundation of our previous work, Panacea, Panacea+ adopts a multi-view appearance noise prior mechanism and a super-resolution module for enhanced consistency…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Project page: https://panacea-ad.github.io/. arXiv admin note: text overlap with arXiv:2311.16813

  47. arXiv:2408.07471  [pdf, other]

    cs.CL

    Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization

    Authors: Yuxin Jiang, Bo Huang, Yufei Wang, Xingshan Zeng, Liangyou Li, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Wei Wang

    Abstract: Direct preference optimization (DPO), a widely adopted offline preference optimization algorithm, aims to align large language models (LLMs) with human-desired behaviors using pairwise preference data. However, the winning response and the losing response within pairwise data are generated in isolation, leading to weak correlations between them as well as suboptimal alignment performance. To address…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 18 pages, 8 figures, 8 tables, work in progress

  48. arXiv:2408.07433  [pdf, other]

    cs.CV cs.AI

    MagicFace: Training-free Universal-Style Human Image Customized Synthesis

    Authors: Yibin Wang, Weizhong Zhang, Cheng Jin

    Abstract: Current state-of-the-art methods for human image customized synthesis typically require tedious training on large-scale datasets. In such cases, they are prone to overfitting and struggle to personalize individuals of unseen styles. Moreover, these methods extensively focus on single-concept human image synthesis and lack the flexibility needed for customizing individuals with multiple given conce…

    Submitted 19 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: project page: https://codegoat24.github.io/MagicFace

  49. arXiv:2408.07083  [pdf, other]

    cs.LG cs.AI

    Masked EEG Modeling for Driving Intention Prediction

    Authors: Jinzhao Zhou, Justin Sia, Yiqun Duan, Yu-Cheng Chang, Yu-Kai Wang, Chin-Teng Lin

    Abstract: Driving under drowsy conditions significantly escalates the risk of vehicular accidents. Although recent efforts have focused on using electroencephalography to detect drowsiness, helping prevent accidents caused by driving in such states, seamless human-machine interaction in driving scenarios requires a more versatile EEG-based system. This system should be capable of understanding a driver's in…

    Submitted 7 August, 2024; originally announced August 2024.

  50. arXiv:2408.07018  [pdf, other]

    cs.CV

    Efficient Human-Object-Interaction (EHOI) Detection via Interaction Label Coding and Conditional Decision

    Authors: Tsung-Shan Yang, Yun-Cheng Wang, Chengwei Wei, Suya You, C.-C. Jay Kuo

    Abstract: Human-Object Interaction (HOI) detection is a fundamental task in image understanding. While deep-learning-based HOI methods provide high performance in terms of mean Average Precision (mAP), they are computationally expensive and opaque in training and inference processes. An Efficient HOI (EHOI) detector is proposed in this work to strike a good balance between detection performance, inference c…

    Submitted 13 August, 2024; originally announced August 2024.