Skip to main content

Showing 1–50 of 779 results for author: Yuan, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.10495  [pdf, other

    cs.SE cs.AI

    How Well Do Large Language Models Serve as End-to-End Secure Code Producers?

    Authors: Jianian Gong, Nachuan Duan, Ziheng Tao, Zhaohui Gong, Yuan Yuan, Minlie Huang

    Abstract: The rapid advancement of large language models (LLMs) such as GPT-4 has revolutionized the landscape of software engineering, positioning these models at the core of modern development practices. As we anticipate these models to evolve into the primary and trustworthy tools used in software development, ensuring the security of the code they produce becomes paramount. How well can LLMs serve as en… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    ACM Class: D.2

  2. arXiv:2408.10486  [pdf, ps, other

    cs.SE

    Revisiting Evolutionary Program Repair via Code Language Model

    Authors: Yunan Wang, Tingyu Guo, Zilong Huang, Yuan Yuan

    Abstract: Software defects are an inherent part of software development and maintenance. To address these defects, Automated Program Repair (APR) has been developed to fix bugs automatically. With the advent of Large Language Models, Code Language Models (CLMs) trained on code corpora excels in code generation, making them suitable for APR applications. Despite this progress, a significant limitation remain… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  3. arXiv:2408.10275  [pdf

    cs.LG cs.AI

    FedKBP: Federated dose prediction framework for knowledge-based planning in radiation therapy

    Authors: Jingyun Chen, Martin King, Yading Yuan

    Abstract: Dose prediction plays a key role in knowledge-based planning (KBP) by automatically generating patient-specific dose distribution. Recent advances in deep learning-based dose prediction methods necessitates collaboration among data contributors for improved performance. Federated learning (FL) has emerged as a solution, enabling medical centers to jointly train deep-learning models without comprom… ▽ More

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Under review by SPIE Medical Imaging 2025 Conference

  4. arXiv:2408.10130  [pdf

    cs.CL cs.AI

    Rhyme-aware Chinese lyric generator based on GPT

    Authors: Yixiao Yuan, Yangchen Huang, Yu Ma, Xinjin Li, Zhenglin Li, Yiming Shi, Huapeng Zhou

    Abstract: Neural language representation models such as GPT, pre-trained on large-scale corpora, can effectively capture rich semantic patterns from plain text and be fine-tuned to consistently improve natural language generation performance. However, existing pre-trained language models used to generate lyrics rarely consider rhyme information, which is crucial in lyrics. Using a pre-trained model directly… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  5. arXiv:2408.08788  [pdf

    cs.LG

    Neighbor Overlay-Induced Graph Attention Network

    Authors: Tiqiao Wei, Ye Yuan

    Abstract: Graph neural networks (GNNs) have garnered significant attention due to their ability to represent graph data. Among various GNN variants, graph attention network (GAT) stands out since it is able to dynamically learn the importance of different nodes. However, present GATs heavily rely on the smoothed node features to obtain the attention coefficients rather than graph structural information, whi… ▽ More

    Submitted 16 August, 2024; originally announced August 2024.

  6. arXiv:2408.08407  [pdf, other

    physics.optics cs.ET

    Photonic KAN: a Kolmogorov-Arnold network inspired efficient photonic neuromorphic architecture

    Authors: Yiwei Peng, Sean Hooten, Xinling Yu, Thomas Van Vaerenbergh, Yuan Yuan, Xian Xiao, Bassem Tossoun, Stanley Cheung, Marco Fiorentino, Raymond Beausoleil

    Abstract: Kolmogorov-Arnold Networks (KAN) models were recently proposed and claimed to provide improved parameter scaling and interpretability compared to conventional multilayer perceptron (MLP) models. Inspired by the KAN architecture, we propose the Photonic KAN -- an integrated all-optical neuromorphic platform leveraging highly parametric optical nonlinear transfer functions along KAN edges. In this w… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 11 pages, 7 figures, 1 table

  7. arXiv:2408.07941  [pdf, other

    stat.ML cs.LG

    Robust Offline Active Learning on Graphs

    Authors: Yuanchen Wu, Yubai Yuan

    Abstract: We consider the problem of active learning on graphs, which has crucial applications in many real-world networks where labeling node responses is expensive. In this paper, we propose an offline active learning method that selects nodes to query by explicitly incorporating information from both the network structure and node covariates. Building on graph signal recovery theories and the random spec… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

  8. arXiv:2408.07891  [pdf, other

    cs.CV cs.AI cs.LG

    Quantum-inspired Interpretable Deep Learning Architecture for Text Sentiment Analysis

    Authors: Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Yuan Yuan

    Abstract: Text has become the predominant form of communication on social media, embedding a wealth of emotional nuances. Consequently, the extraction of emotional information from text is of paramount importance. Despite previous research making some progress, existing text sentiment analysis models still face challenges in integrating diverse semantic information and lack interpretability. To address thes… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

  9. arXiv:2408.06629  [pdf, other

    cs.CV

    Fast Information Streaming Handler (FisH): A Unified Seismic Neural Network for Single Station Real-Time Earthquake Early Warning

    Authors: Tianning Zhang, Feng Liu, Yuming Yuan, Rui Su, Wanli Ouyang, Lei Bai

    Abstract: Existing EEW approaches often treat phase picking, location estimation, and magnitude estimation as separate tasks, lacking a unified framework. Additionally, most deep learning models in seismology rely on full three-component waveforms and are not suitable for real-time streaming data. To address these limitations, we propose a novel unified seismic neural network called Fast Information Streami… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  10. arXiv:2408.06567  [pdf, other

    cs.CL cs.AI

    AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

    Authors: Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, Chengwei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu , et al. (2 additional authors not shown)

    Abstract: In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch will cost a lot of computation resources while scaling up from a smaller model is a more efficient approach and has thus attracted significant attention… ▽ More

    Submitted 12 August, 2024; originally announced August 2024.

  11. arXiv:2408.05710  [pdf, other

    cs.CV

    Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators

    Authors: Yifan Pu, Zhuofan Xia, Jiayi Guo, Dongchen Han, Qixiu Li, Duo Li, Yuhui Yuan, Ji Li, Yizeng Han, Shiji Song, Gao Huang, Xiu Li

    Abstract: This paper identifies significant redundancy in the query-key interactions within self-attention mechanisms of diffusion transformer models, particularly during the early stages of denoising diffusion steps. In response to this observation, we present a novel diffusion transformer framework incorporating an additional set of mediator tokens to engage with queries and keys separately. By modulating… ▽ More

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  12. arXiv:2408.05141  [pdf, other

    cs.CL cs.IR

    A Hybrid RAG System with Comprehensive Enhancement on Complex Reasoning

    Authors: Ye Yuan, Chengwu Liu, Jingyang Yuan, Gongbo Sun, Siqi Li, Ming Zhang

    Abstract: Retrieval-augmented generation (RAG) is a framework enabling large language models (LLMs) to enhance their accuracy and reduce hallucinations by integrating external knowledge bases. In this paper, we introduce a hybrid RAG system enhanced through a comprehensive suite of optimizations that significantly improve retrieval quality, augment reasoning capabilities, and refine numerical computation ab… ▽ More

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Technical report for 3rd prize in Task 1 of Meta CRAG KDD Cup 2024

  13. arXiv:2408.03608  [pdf, other

    cs.LG cs.CV stat.ME

    InPer: Whole-Process Domain Generalization via Causal Intervention and Perturbation

    Authors: Luyao Tang, Yuxuan Yuan, Chaoqi Chen, Xinghao Ding, Yue Huang

    Abstract: Despite the considerable advancements achieved by deep neural networks, their performance tends to degenerate when the test environment diverges from the training ones. Domain generalization (DG) solves this issue by learning representations independent of domain-related information, thus facilitating extrapolation to unseen environments. Existing approaches typically focus on formulating tailored… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted by BMVC2024

  14. arXiv:2408.02987  [pdf

    cs.LG

    A Differential Smoothness-based Compact-Dynamic Graph Convolutional Network for Spatiotemporal Signal Recovery

    Authors: Pengcheng Gao, Zicheng Gao, Ye Yuan

    Abstract: High quality spatiotemporal signal is vitally important for real application scenarios like energy management, traffic planning and cyber security. Due to the uncontrollable factors like abrupt sensors breakdown or communication fault, the spatiotemporal signal collected by sensors is always incomplete. A dynamic graph convolutional network (DGCN) is effective for processing spatiotemporal signal… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

  15. arXiv:2408.02936  [pdf, other

    cs.LG

    Achieving More with Less: A Tensor-Optimization-Powered Ensemble Method

    Authors: Jinghui Yuan, Weijin Jiang, Zhe Cao, Fangyuan Xie, Rong Wang, Feiping Nie, Yuan Yuan

    Abstract: Ensemble learning is a method that leverages weak learners to produce a strong learner. However, obtaining a large number of base learners requires substantial time and computational resources. Therefore, it is meaningful to study how to achieve the performance typically obtained with many base learners using only a few. We argue that to achieve this, it is essential to enhance both classification… ▽ More

    Submitted 12 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  16. arXiv:2408.02932  [pdf, other

    cs.LG cs.AI

    Doubly Stochastic Adaptive Neighbors Clustering via the Marcus Mapping

    Authors: Jinghui Yuan, Chusheng Zeng, Fangyuan Xie, Zhe Cao, Mulin Chen, Rong Wang, Feiping Nie, Yuan Yuan

    Abstract: Clustering is a fundamental task in machine learning and data science, and similarity graph-based clustering is an important approach within this domain. Doubly stochastic symmetric similarity graphs provide numerous benefits for clustering problems and downstream tasks, yet learning such graphs remains a significant challenge. Marcus theorem states that a strictly positive symmetric matrix can be… ▽ More

    Submitted 12 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  17. arXiv:2408.00989  [pdf, other

    cs.AI

    On the Resilience of Multi-Agent Systems with Malicious Agents

    Authors: Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Maarten Sap, Michael R. Lyu

    Abstract: Multi-agent systems, powered by large language models, have shown great abilities across various tasks due to the collaboration of expert agents, each focusing on a specific domain. However, when agents are deployed separately, there is a risk that malicious users may introduce malicious agents who generate incorrect or irrelevant results that are too stealthy to be identified by other non-special… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 10 pages

  18. arXiv:2407.21376  [pdf

    cs.AI

    An Extended Kalman Filter Integrated Latent Feature Model on Dynamic Weighted Directed Graphs

    Authors: Hongxun Zhou, Xiangyu Chen, Ye Yuan

    Abstract: A dynamic weighted directed graph (DWDG) is commonly encountered in various application scenarios. It involves extensive dynamic interactions among numerous nodes. Most existing approaches explore the intricate temporal patterns hidden in a DWDG from the purely data-driven perspective, which suffers from accuracy loss when a DWDG exhibits strong fluctuations over time. To address this issue, this… ▽ More

    Submitted 31 July, 2024; originally announced July 2024.

  19. arXiv:2407.20119  [pdf, ps, other

    cs.LG cs.AI

    Adaptive Self-supervised Robust Clustering for Unstructured Data with Unknown Cluster Number

    Authors: Chen-Lu Ding, Jiancan Wu, Wei Lin, Shiyang Shen, Xiang Wang, Yancheng Yuan

    Abstract: We introduce a novel self-supervised deep clustering approach tailored for unstructured data without requiring prior knowledge of the number of clusters, termed Adaptive Self-supervised Robust Clustering (ASRC). In particular, ASRC adaptively learns the graph structure and edge weights to capture both local and global structural information. The obtained graph enables us to learn clustering-friend… ▽ More

    Submitted 30 July, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

  20. arXiv:2407.20060  [pdf, other

    cs.LG cs.AI cs.DB

    RelBench: A Benchmark for Deep Learning on Relational Databases

    Authors: Joshua Robinson, Rishabh Ranjan, Weihua Hu, Kexin Huang, Jiaqi Han, Alejandro Dobles, Matthias Fey, Jan E. Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, Jure Leskovec

    Abstract: We present RelBench, a public benchmark for solving predictive tasks over relational databases with graph neural networks. RelBench provides databases and tasks spanning diverse domains and scales, and is intended to be a foundational infrastructure for future research. We use RelBench to conduct the first comprehensive study of Relational Deep Learning (RDL) (Fey et al., 2024), which combines gra… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  21. arXiv:2407.18902  [pdf, other

    cs.RO cs.AI cs.LG

    Lessons from Learning to Spin "Pens"

    Authors: Jun Wang, Ying Yuan, Haichuan Che, Haozhi Qi, Yi Ma, Jitendra Malik, Xiaolong Wang

    Abstract: In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools such as hammers and screwdrivers are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Website: https://penspin.github.io/

  22. arXiv:2407.17535  [pdf, other

    cs.AI cs.LG cs.SE

    LAMBDA: A Large Model Based Data Agent

    Authors: Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, Jian Huang

    Abstract: We introduce ``LAMBDA," a novel open-source, code-free multi-agent data analysis system that that harnesses the power of large models. LAMBDA is designed to address data analysis challenges in complex data-driven applications through the use of innovatively designed data agents that operate iteratively and generatively using natural language. At the core of LAMBDA are two key agent roles: the prog… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 30 pages, 21 figures and 5 tables

    MSC Class: 62-04; 62-08; 68T01; 68T09

  23. arXiv:2407.16847  [pdf, other

    cs.PL cs.LG

    SPLAT: A framework for optimised GPU code-generation for SParse reguLar ATtention

    Authors: Ahan Gupta, Yueming Yuan, Devansh Jain, Yuhao Ge, David Aponte, Yanqi Zhou, Charith Mendis

    Abstract: Multi-head-self-attention (MHSA) mechanisms achieve state-of-the-art (SOTA) performance across natural language processing and vision tasks. However, their quadratic dependence on sequence lengths has bottlenecked inference speeds. To circumvent this bottleneck, researchers have proposed various sparse-MHSA models, where a subset of full attention is computed. Despite their promise, current sparse… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 31 pages, 16 figures

  24. arXiv:2407.14983  [pdf, other

    physics.med-ph cs.CV eess.IV physics.comp-ph

    Deep Learning CT Image Restoration using System Blur and Noise Models

    Authors: Yijie Yuan, Grace J. Gang, J. Webster Stayman

    Abstract: The restoration of images affected by blur and noise has been widely studied and has broad potential for applications including in medical imaging modalities like computed tomography (CT). Although the blur and noise in CT images can be attributed to a variety of system factors, these image properties can often be modeled and predicted accurately and used in classical restoration approaches for de… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

  25. arXiv:2407.14381  [pdf, other

    cs.LG

    Improving GBDT Performance on Imbalanced Datasets: An Empirical Study of Class-Balanced Loss Functions

    Authors: Jiaqi Luo, Yuan Yuan, Shixin Xu

    Abstract: Class imbalance remains a significant challenge in machine learning, particularly for tabular data classification tasks. While Gradient Boosting Decision Trees (GBDT) models have proven highly effective for such tasks, their performance can be compromised when dealing with imbalanced datasets. This paper presents the first comprehensive study on adapting class-balanced loss functions to three GBDT… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  26. arXiv:2407.14102  [pdf, other

    cs.RO

    MSSP : A Versatile Multi-Scenario Adaptable Intelligent Robot Simulation Platform Based on LIDAR-Inertial Fusion

    Authors: Qiyan Li, Chang Wu, Yifei Yuan, Yuan You

    Abstract: This letter presents a multi-scenario adaptable intelligent robot simulation platform based on LIDAR-inertial fusion, with three main features: (1 The platform includes an versatile robot model that can be freely controlled through manual control or autonomous tracking. This model is equipped with various types of LIDAR and Inertial Measurement Unit (IMU), providing ground truth information with a… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  27. arXiv:2407.13193  [pdf, other

    cs.CL

    Retrieval-Augmented Generation for Natural Language Processing: A Survey

    Authors: Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

    Abstract: Large language models (LLMs) have demonstrated great success in various fields, benefiting from their huge amount of parameters that store knowledge. However, LLMs still suffer from several key issues, such as hallucination problems, knowledge update issues, and lacking domain-specific expertise. The appearance of retrieval-augmented generation (RAG), which leverages an external knowledge database… ▽ More

    Submitted 18 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  28. arXiv:2407.13179  [pdf, other

    eess.IV cs.CV

    Learned HDR Image Compression for Perceptually Optimal Storage and Display

    Authors: Peibei Cao, Haoyu Chen, Jingzhe Ma, Yu-Chieh Yuan, Zhiyong Xie, Xin Xie, Haiqing Bai, Kede Ma

    Abstract: High dynamic range (HDR) capture and display have seen significant growth in popularity driven by the advancements in technology and increasing consumer demand for superior image quality. As a result, HDR image compression is crucial to fully realize the benefits of HDR imaging without suffering from large file sizes and inefficient data handling. Conventionally, this is achieved by introducing a… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  29. arXiv:2407.12684  [pdf, other

    cs.CV

    4Dynamic: Text-to-4D Generation with Hybrid Priors

    Authors: Yu-Jie Yuan, Leif Kobbelt, Jiwen Liu, Yuan Zhang, Pengfei Wan, Yu-Kun Lai, Lin Gao

    Abstract: Due to the fascinating generative performance of text-to-image diffusion models, growing text-to-3D generation works explore distilling the 2D generative priors into 3D, using the score distillation sampling (SDS) loss, to bypass the data scarcity problem. The existing text-to-3D methods have achieved promising results in realism and 3D consistency, but text-to-4D generation still faces challenges… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  30. arXiv:2407.11745  [pdf, other

    eess.AS cs.AI cs.SD

    Universal Sound Separation with Self-Supervised Audio Masked Autoencoder

    Authors: Junqi Zhao, Xubo Liu, Jinzheng Zhao, Yi Yuan, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

    Abstract: Universal sound separation (USS) is a task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an emerging deep learning approach that leverages unlabeled data to obtain task-agnostic representations, which can benefit many downstream tasks. In this paper, we… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

  31. arXiv:2407.09918  [pdf, other

    eess.IV cs.CV

    DiffRect: Latent Diffusion Label Rectification for Semi-supervised Medical Image Segmentation

    Authors: Xinyu Liu, Wuyang Li, Yixuan Yuan

    Abstract: Semi-supervised medical image segmentation aims to leverage limited annotated data and rich unlabeled data to perform accurate segmentation. However, existing semi-supervised methods are highly dependent on the quality of self-generated pseudo labels, which are prone to incorrect supervision and confirmation bias. Meanwhile, they are insufficient in capturing the label distributions in latent spac… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  32. arXiv:2407.09826  [pdf, other

    cs.CV

    3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance

    Authors: Xiaoxu Xu, Yitian Yuan, Jinlong Li, Qiudan Zhang, Zequn Jie, Lin Ma, Hao Tang, Nicu Sebe, Xu Wang

    Abstract: In this paper, we propose 3DSS-VLG, a weakly supervised approach for 3D Semantic Segmentation with 2D Vision-Language Guidance, an alternative approach that a 3D model predicts dense-embedding for each point which is co-embedded with both the aligned image and text spaces from the 2D vision-language model. Specifically, our method exploits the superior generalization ability of the 2D vision-langu… ▽ More

    Submitted 13 July, 2024; originally announced July 2024.

  33. arXiv:2407.09760  [pdf, other

    cs.CV cs.AI

    ICCV23 Visual-Dialog Emotion Explanation Challenge: SEU_309 Team Technical Report

    Authors: Yixiao Yuan, Yingzhe Peng

    Abstract: The Visual-Dialog Based Emotion Explanation Generation Challenge focuses on generating emotion explanations through visual-dialog interactions in art discussions. Our approach combines state-of-the-art multi-modal models, including Language Model (LM) and Large Vision Language Model (LVLM), to achieve superior performance. By leveraging these models, we outperform existing benchmarks, securing the… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  34. arXiv:2407.09121  [pdf, other

    cs.CL cs.AI

    Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

    Authors: Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu

    Abstract: This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs) by identifying and tackling a refusal position bias within safety tuning data, which compromises the models' ability to appropriately refuse generating unsafe content. We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance to harmful prompts at a… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  35. arXiv:2407.07720  [pdf, other

    eess.IV cs.CV

    Exploiting Scale-Variant Attention for Segmenting Small Medical Objects

    Authors: Wei Dai, Rui Liu, Zixuan Wu, Tianyi Wu, Min Wang, Junxian Zhou, Yixuan Yuan, Jun Liu

    Abstract: Early detection and accurate diagnosis can predict the risk of malignant disease transformation, thereby increasing the probability of effective treatment. Identifying mild syndrome with small pathological regions serves as an ominous warning and is fundamental in the early diagnosis of diseases. While deep learning algorithms, particularly convolutional neural networks (CNNs), have shown promise… ▽ More

    Submitted 5 August, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 14 pages, 9 figures, under review

  36. arXiv:2407.06881  [pdf, other

    cs.DS

    Efficient Stochastic Routing in Path-Centric Uncertain Road Networks -- Extended Version

    Authors: Chenjuan Guo, Ronghui Xu, Bin Yang, Ye Yuan, Tung Kieu, Yan Zhao, Christian S. Jensen

    Abstract: The availability of massive vehicle trajectory data enables the modeling of road-network constrained movement as travel-cost distributions rather than just single-valued costs, thereby capturing the inherent uncertainty of movement and enabling improved routing quality. Thus, stochastic routing has been studied extensively in the edge-centric model, where such costs are assigned to the edges in a… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  37. arXiv:2407.06048  [pdf, other

    cs.CL cs.CV

    Vision-Braille: An End-to-End Tool for Chinese Braille Image-to-Text Translation

    Authors: Alan Wu, Ye Yuan, Ming Zhang

    Abstract: Visually impaired people are a large group who can only use braille for reading and writing. However, the lack of special educational resources is the bottleneck for educating them. Educational equity is a reflection of the level of social civilization, cultural equality, and individual dignity. Facilitating and improving lifelong learning channels for the visually impaired is of great significanc… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: This paper is submitted to NeurIPS 2024 High School Project Track

  38. arXiv:2407.05540  [pdf, other

    cs.CV

    GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation

    Authors: Chenxin Li, Xinyu Liu, Cheng Wang, Yifan Liu, Weihao Yu, Jing Shao, Yixuan Yuan

    Abstract: Recent advances in learning multi-modal representation have witnessed the success in biomedical domains. While established techniques enable handling multi-modal information, the challenges are posed when extended to various clinical modalities and practical modalitymissing setting due to the inherent modality gaps. To tackle these, we propose an innovative Modality-prompted Heterogeneous Graph fo… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  39. arXiv:2407.04416  [pdf, other

    cs.SD cs.MM eess.AS

    Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions

    Authors: Yi Yuan, Dongya Jia, Xiaobin Zhuang, Yuanzhe Chen, Zhengxi Liu, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xubo Liu, Xiyuan Kang, Mark D. Plumbley, Wenwu Wang

    Abstract: Generative models have shown significant achievements in audio generation tasks. However, existing models struggle with complex and detailed prompts, leading to potential performance degradation. We hypothesize that this problem stems from the simplicity and scarcity of the training data. This work aims to create a large-scale audio dataset with rich captions for improving audio generation models.… ▽ More

    Submitted 14 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 5 pages with 1 appendix

  40. arXiv:2407.04121  [pdf, other

    cs.CL cs.AI

    Hallucination Detection: Robustly Discerning Reliable Answers in Large Language Models

    Authors: Yuyan Chen, Qiang Fu, Yichen Yuan, Zhihao Wen, Ge Fan, Dayiheng Liu, Dongmei Zhang, Zhixu Li, Yanghua Xiao

    Abstract: Large Language Models (LLMs) have gained widespread adoption in various natural language processing tasks, including question answering and dialogue systems. However, a major drawback of LLMs is the issue of hallucination, where they generate unfaithful or inconsistent content that deviates from the input source, leading to severe consequences. In this paper, we propose a robust discriminator name… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted to CIKM 2023 (Long Paper)

  41. arXiv:2407.03825  [pdf, other

    cs.CV cs.RO

    StreamLTS: Query-based Temporal-Spatial LiDAR Fusion for Cooperative Object Detection

    Authors: Yunshuang Yuan, Monika Sester

    Abstract: Cooperative perception via communication among intelligent traffic agents has great potential to improve the safety of autonomous driving. However, limited communication bandwidth, localization errors and asynchronized capturing time of sensor data, all introduce difficulties to the data fusion of different agents. To some extend, previous works have attempted to reduce the shared data size, mitig… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  42. arXiv:2407.03640  [pdf, other

    cs.LG cs.CL cs.CV

    Generative Technology for Human Emotion Recognition: A Scope Review

    Authors: Fei Ma, Yucheng Yuan, Yifan Xie, Hongwei Ren, Ivan Liu, Ying He, Fuji Ren, Fei Richard Yu, Shiguang Ni

    Abstract: Affective computing stands at the forefront of artificial intelligence (AI), seeking to imbue machines with the ability to comprehend and respond to human emotions. Central to this field is emotion recognition, which endeavors to identify and interpret human emotional states from different modalities, such as speech, facial images, text, and physiological signals. In recent years, important progre… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Under Review

  43. arXiv:2407.03133  [pdf, other

    cs.CY cs.AI cs.LG stat.ML

    Quantifying the Cross-sectoral Intersecting Discrepancies within Multiple Groups Using Latent Class Analysis Towards Fairness

    Authors: Yingfang Yuan, Kefan Chen, Mehdi Rizvi, Lynne Baillie, Wei Pang

    Abstract: The growing interest in fair AI development is evident. The ''Leave No One Behind'' initiative urges us to address multiple and intersecting forms of inequality in accessing services, resources, and opportunities, emphasising the significance of fairness in AI. This is particularly relevant as an increasing number of AI tools are applied to decision-making processes, such as resource allocation an… ▽ More

    Submitted 11 July, 2024; v1 submitted 24 May, 2024; originally announced July 2024.

  44. arXiv:2407.02392  [pdf, other

    cs.CV

    TokenPacker: Efficient Visual Projector for Multimodal LLM

    Authors: Wentong Li, Yuqian Yuan, Jian Liu, Dongqi Tang, Song Wang, Jianke Zhu, Lei Zhang

    Abstract: The visual projector serves as an essential bridge between the visual encoder and the Large Language Model (LLM) in a Multimodal LLM (MLLM). Typically, MLLMs adopt a simple MLP to preserve all visual contexts via one-to-one transformation. However, the visual tokens are redundant and can be considerably increased when dealing with high-resolution images, impairing the efficiency of MLLMs significa… ▽ More

    Submitted 22 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 16 pages, Codes:https://github.com/CircleRadon/TokenPacker

  45. arXiv:2407.01301  [pdf, other

    cs.CV

    GaussianStego: A Generalizable Stenography Pipeline for Generative 3D Gaussians Splatting

    Authors: Chenxin Li, Hengyu Liu, Zhiwen Fan, Wuyang Li, Yifan Liu, Panwang Pan, Yixuan Yuan

    Abstract: Recent advancements in large generative models and real-time neural rendering using point-based techniques pave the way for a future of widespread visual data distribution through sharing synthesized 3D assets. However, while standardized methods for embedding proprietary or copyright information, either overtly or subtly, exist for conventional visual content such as images and videos, this issue… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Project website: https://gaussian-stego.github.io/

  46. arXiv:2407.01029  [pdf, other

    cs.CV

    EndoSparse: Real-Time Sparse View Synthesis of Endoscopic Scenes using Gaussian Splatting

    Authors: Chenxin Li, Brandon Y. Feng, Yifan Liu, Hengyu Liu, Cheng Wang, Weihao Yu, Yixuan Yuan

    Abstract: 3D reconstruction of biological tissues from a collection of endoscopic images is a key to unlock various important downstream surgical applications with 3D capabilities. Existing methods employ various advanced neural rendering techniques for photorealistic view synthesis, but they often struggle to recover accurate 3D representations when only sparse observations are available, which is usually… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accpeted by MICCAI2024

  47. arXiv:2407.00468  [pdf, other

    cs.CV cs.AI cs.CL

    MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation

    Authors: Jinsheng Huang, Liang Chen, Taian Guo, Fu Zeng, Yusheng Zhao, Bohan Wu, Ye Yuan, Haozhe Zhao, Zhihui Guo, Yichi Zhang, Jingyang Yuan, Wei Ju, Luchen Liu, Tianyu Liu, Baobao Chang, Ming Zhang

    Abstract: Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for such evaluations suffer from systematic biases. Remarkably, Large Language Models (LLMs) without any visual perception capabilities achieve non-trivial p… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 21 pages, code released at https://github.com/chenllliang/MMEvalPro, Homepage at https://mmevalpro.github.io/

  48. arXiv:2407.00187  [pdf, other

    cs.RO cs.CV cs.GR

    SMPLOlympics: Sports Environments for Physically Simulated Humanoids

    Authors: Zhengyi Luo, Jiashun Wang, Kangni Liu, Haotian Zhang, Chen Tessler, Jingbo Wang, Ye Yuan, Jinkun Cao, Zihui Lin, Fengyi Wang, Jessica Hodgins, Kris Kitani

    Abstract: We present SMPLOlympics, a collection of physically simulated environments that allow humanoids to compete in a variety of Olympic sports. Sports simulation offers a rich and standardized testing ground for evaluating and improving the capabilities of learning algorithms due to the diversity and physically demanding nature of athletic activities. As humans have been competing in these sports for m… ▽ More

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Project page: https://smplolympics.github.io/SMPLOlympics

  49. arXiv:2407.00099  [pdf, other

    q-bio.NC cs.LG stat.AP

    Optimal Transport for Latent Integration with An Application to Heterogeneous Neuronal Activity Data

    Authors: Yubai Yuan, Babak Shahbaba, Norbert Fortin, Keiland Cooper, Qing Nie, Annie Qu

    Abstract: Detecting dynamic patterns of task-specific responses shared across heterogeneous datasets is an essential and challenging problem in many scientific applications in medical science and neuroscience. In our motivating example of rodent electrophysiological data, identifying the dynamical patterns in neuronal activity associated with ongoing cognitive demands and behavior is key to uncovering the n… ▽ More

    Submitted 27 June, 2024; originally announced July 2024.

  50. arXiv:2406.18310  [pdf, other

    cs.CV cs.LG eess.IV

    Spatial-temporal Hierarchical Reinforcement Learning for Interpretable Pathology Image Super-Resolution

    Authors: Wenting Chen, Jie Liu, Tommy W. S. Chow, Yixuan Yuan

    Abstract: Pathology image are essential for accurately interpreting lesion cells in cytopathology screening, but acquiring high-resolution digital slides requires specialized equipment and long scanning times. Though super-resolution (SR) techniques can alleviate this problem, existing deep learning models recover pathology image in a black-box manner, which can lead to untruthful biological details and mis… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to IEEE TRANSACTIONS ON MEDICAL IMAGING (TMI)