Showing 1–50 of 380 results for author: Qi, Y

  1. arXiv:2408.10608  [pdf, other]

    cs.CL cs.AI

    Promoting Equality in Large Language Models: Identifying and Mitigating the Implicit Bias based on Bayesian Theory

    Authors: Yongxin Deng, Xihe Qiu, Xiaoyu Tan, Jing Pan, Chen Jue, Zhijun Fang, Yinghui Xu, Wei Chu, Yuan Qi

    Abstract: Large language models (LLMs) are trained on extensive text corpora, which inevitably include biased information. Although techniques such as Affective Alignment can mitigate some negative impacts of these biases, existing prompt-based attack methods can still extract these biases from the model's weights. Moreover, these biases frequently appear subtly when LLMs are prompted to perform identical t…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.09723  [pdf, other]

    cs.LG

    sTransformer: A Modular Approach for Extracting Inter-Sequential and Temporal Information for Time-Series Forecasting

    Authors: Jiaheng Yin, Zhengxin Shi, Jianshen Zhang, Xiaomin Lin, Yulin Huang, Yongzhi Qi, Wei Qi

    Abstract: In recent years, numerous Transformer-based models have been applied to long-term time-series forecasting (LTSF) tasks. However, recent studies with linear models have questioned their effectiveness, demonstrating that simple linear layers can outperform sophisticated Transformer-based models. In this work, we review and categorize existing Transformer-based models into two main types: (1) modific…

    Submitted 19 August, 2024; originally announced August 2024.

  3. arXiv:2408.07009  [pdf, other]

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google, Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh, et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  4. arXiv:2408.05786  [pdf, other]

    cs.CL

    HiLight: A Hierarchy-aware Light Global Model with Hierarchical Local ConTrastive Learning

    Authors: Zhijian Chen, Zhonghua Li, Jianxin Yang, Ye Qi

    Abstract: Hierarchical text classification (HTC) is a special sub-task of multi-label classification (MLC) whose taxonomy is constructed as a tree and each sample is assigned at least one path in the tree. The latest HTC models contain three modules: a text encoder, a structure encoder and a multi-label classification head. Specifically, the structure encoder is designed to encode the hierarchy of taxonomy. H…

    Submitted 11 August, 2024; originally announced August 2024.

  5. arXiv:2408.05681  [pdf, ps, other]

    cs.LG cs.AI

    SRTFD: Scalable Real-Time Fault Diagnosis through Online Continual Learning

    Authors: Dandan Zhao, Karthick Sharma, Hongpeng Yin, Yuxin Qi, Shuhao Zhang

    Abstract: Fault diagnosis (FD) is essential for maintaining operational safety and minimizing economic losses by detecting system abnormalities. Recently, deep learning (DL)-driven FD methods have gained prominence, offering significant improvements in precision and adaptability through the utilization of extensive datasets and advanced DL models. Modern industrial environments, however, demand FD methods t…

    Submitted 10 August, 2024; originally announced August 2024.

  6. arXiv:2408.05586  [pdf, other]

    cs.LG cs.IR

    Meta Clustering of Neural Bandits

    Authors: Yikun Ban, Yunzhe Qi, Tianxin Wei, Lihui Liu, Jingrui He

    Abstract: The contextual bandit has been identified as a powerful framework to formulate the recommendation process as a sequential decision-making process, where each item is regarded as an arm and the objective is to minimize the regret of $T$ rounds. In this paper, we study a new problem, Clustering of Neural Bandits, by extending previous work to the arbitrary reward function, to strike a balance betwee…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: KDD 2024

  7. arXiv:2408.05472  [pdf, other]

    cs.LG physics.ao-ph

    FuXi Weather: An end-to-end machine learning weather data assimilation and forecasting system

    Authors: Xiuyu Sun, Xiaohui Zhong, Xiaoze Xu, Yuanqing Huang, Hao Li, Jie Feng, Wei Han, Libo Wu, Yuan Qi

    Abstract: Operational numerical weather prediction systems consist of three fundamental components: the global observing system for data collection, data assimilation for generating initial conditions, and the forecasting model to predict future weather conditions. While NWP have undergone a quiet revolution, with forecast skills progressively improving over the past few decades, their advancement has slowe…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 34 pages, 4 figures

  8. arXiv:2408.04187  [pdf, other]

    cs.CV

    Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation

    Authors: Junde Wu, Jiayuan Zhu, Yunli Qi

    Abstract: We introduce a novel graph-based Retrieval-Augmented Generation (RAG) framework specifically designed for the medical domain, called MedGraphRAG, aimed at enhancing Large Language Model (LLM) capabilities and generating evidence-based results, thereby improving safety and reliability when handling private medical data. Our comprehensive pipeline begins with a hybrid static-semantic approa…

    Submitted 7 August, 2024; originally announced August 2024.

  9. arXiv:2408.03215  [pdf, other]

    cs.LG cs.DC

    FedBAT: Communication-Efficient Federated Learning via Learnable Binarization

    Authors: Shiwei Li, Wenchao Xu, Haozhao Wang, Xing Tang, Yining Qi, Shijie Xu, Weihong Luo, Yuhua Li, Xiuqiang He, Ruixuan Li

    Abstract: Federated learning is a promising distributed machine learning paradigm that can effectively exploit large-scale data without exposing users' privacy. However, it may incur significant communication overhead, thereby potentially impairing the training efficiency. To address this challenge, numerous studies suggest binarizing the model updates. Nonetheless, traditional methods usually binarize mode…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted by ICML 2024

  10. arXiv:2408.01840  [pdf, other]

    cs.CV

    E$^3$NeRF: Efficient Event-Enhanced Neural Radiance Fields from Blurry Images

    Authors: Yunshan Qi, Jia Li, Yifan Zhao, Yu Zhang, Lin Zhu

    Abstract: Neural Radiance Fields (NeRF) achieve impressive rendering performance by learning volumetric 3D representation from several images of different views. However, it is difficult to reconstruct a sharp NeRF from blurry input as it often occurs in the wild. To solve this problem, we propose a novel Efficient Event-Enhanced NeRF (E$^3$NeRF) by utilizing the combination of RGB images and event streams.…

    Submitted 3 August, 2024; originally announced August 2024.

  11. arXiv:2408.01696  [pdf, other]

    cs.SD cs.AI eess.AS

    Generating High-quality Symbolic Music Using Fine-grained Discriminators

    Authors: Zhedong Zhang, Liang Li, Jiehua Zhang, Zhenghui Hu, Hongkui Wang, Chenggang Yan, Jian Yang, Yuankai Qi

    Abstract: Existing symbolic music generation methods usually utilize a discriminator to improve the quality of generated music via global perception of music. However, considering the complexity of information in music, such as rhythm and melody, a single discriminator cannot fully reflect the differences in these two primary dimensions of music. In this work, we propose to decouple the melody and rhythm from…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: Accepted by ICPR2024

  12. arXiv:2408.00874  [pdf, other]

    cs.CV

    Medical SAM 2: Segment medical images as video via Segment Anything Model 2

    Authors: Jiayuan Zhu, Yunli Qi, Junde Wu

    Abstract: In this paper, we introduce Medical SAM 2 (MedSAM-2), an advanced segmentation model that utilizes the SAM 2 framework to address both 2D and 3D medical image segmentation tasks. By adopting the philosophy of taking medical images as videos, MedSAM-2 not only applies to 3D medical images but also unlocks new One-prompt Segmentation capability. That allows users to provide a prompt for just one or…

    Submitted 1 August, 2024; originally announced August 2024.

  13. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  14. arXiv:2407.21721  [pdf, other]

    cs.MM cs.AI

    Open-Vocabulary Audio-Visual Semantic Segmentation

    Authors: Ruohao Guo, Liao Qu, Dantong Niu, Yanyu Qi, Wenzhen Yue, Ji Shi, Bowei Xing, Xianghua Ying

    Abstract: Audio-visual semantic segmentation (AVSS) aims to segment and classify sounding objects in videos with acoustic cues. However, most approaches operate on the close-set assumption and only identify pre-defined categories from training data, lacking the generalization ability to detect novel categories in practical applications. In this paper, we introduce a new task: open-vocabulary audio-visual se…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024 (Oral)

  15. arXiv:2407.21439  [pdf, other]

    cs.AI cs.CL cs.LG

    MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training

    Authors: Zhanpeng Chen, Chengjin Xu, Yiyan Qi, Jian Guo

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in processing and generating content across multiple data modalities, including text, images, audio, and video. However, a significant drawback of MLLMs is their reliance on static training data, leading to outdated information and limited contextual awareness. This static nature hampers their ability to provide acc…

    Submitted 31 July, 2024; originally announced July 2024.

  16. arXiv:2407.19306  [pdf, other]

    cs.CV

    Symmetrical Joint Learning Support-query Prototypes for Few-shot Segmentation

    Authors: Qun Li, Baoquan Sun, Fu Xiao, Yonggang Qi, Bir Bhanu

    Abstract: We propose Sym-Net, a novel framework for Few-Shot Segmentation (FSS) that addresses the critical issue of intra-class variation by jointly learning both query and support prototypes in a symmetrical manner. Unlike previous methods that generate query prototypes solely by matching query features to support prototypes, which is a form of bias learning towards the few-shot support samples, Sym-Net l…

    Submitted 27 July, 2024; originally announced July 2024.

  17. arXiv:2407.15451  [pdf, other]

    cs.CV

    Domain-Adaptive 2D Human Pose Estimation via Dual Teachers in Extremely Low-Light Conditions

    Authors: Yihao Ai, Yifei Qi, Bo Wang, Yu Cheng, Xinchao Wang, Robby T. Tan

    Abstract: Existing 2D human pose estimation research predominantly concentrates on well-lit scenarios, with limited exploration of poor lighting conditions, which are a prevalent aspect of daily life. Recent studies on low-light pose estimation require the use of paired well-lit and low-light images with ground truths for training, which are impractical due to the inherent challenges associated with annotat…

    Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 18 pages, 3 figures. Accepted by ECCV24

  18. arXiv:2407.15352  [pdf, other]

    cs.CL

    MAVEN-Fact: A Large-scale Event Factuality Detection Dataset

    Authors: Chunyang Li, Hao Peng, Xiaozhi Wang, Yunjia Qi, Lei Hou, Bin Xu, Juanzi Li

    Abstract: The Event Factuality Detection (EFD) task determines the factuality of textual events, i.e., classifying whether an event is a fact, possibility, or impossibility, which is essential for faithfully understanding and utilizing event knowledge. However, due to the lack of high-quality large-scale data, event factuality detection is under-explored in event understanding research, which limits the develop…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Under review

  19. arXiv:2407.14562  [pdf, other]

    cs.AI cs.CL

    Thought-Like-Pro: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought

    Authors: Xiaoyu Tan, Yongxin Deng, Xihe Qiu, Weidi Xu, Chao Qu, Wei Chu, Yinghui Xu, Yuan Qi

    Abstract: Large language models (LLMs) have shown exceptional performance as general-purpose assistants, excelling across a variety of reasoning tasks. This achievement represents a significant step toward achieving artificial general intelligence (AGI). Despite these advancements, the effectiveness of LLMs often hinges on the specific prompting strategies employed, and there remains a lack of a robust fram…

    Submitted 10 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    ACM Class: I.2.7

  20. arXiv:2407.12999  [pdf, other]

    cs.CY cs.AI cs.CR

    Securing the Future of GenAI: Policy and Technology

    Authors: Mihai Christodorescu, Ryan Craven, Soheil Feizi, Neil Gong, Mia Hoffmann, Somesh Jha, Zhengyuan Jiang, Mehrdad Saberi Kamarposhti, John Mitchell, Jessica Newman, Emelia Probasco, Yanjun Qi, Khawaja Shams, Matthew Turek

    Abstract: The rise of Generative AI (GenAI) brings about transformative potential across sectors, but its dual-use nature also amplifies risks. Governments globally are grappling with the challenge of regulating GenAI, balancing innovation against safety. China, the United States (US), and the European Union (EU) are at the forefront with initiatives like the Management of Algorithmic Recommendations, the E…

    Submitted 21 May, 2024; originally announced July 2024.

  21. arXiv:2407.12532  [pdf, other]

    cs.CL cs.AI

    Towards Collaborative Intelligence: Propagating Intentions and Reasoning for Multi-Agent Coordination with Large Language Models

    Authors: Xihe Qiu, Haoyu Wang, Xiaoyu Tan, Chao Qu, Yujie Xiong, Yuan Cheng, Yinghui Xu, Wei Chu, Yuan Qi

    Abstract: Effective collaboration in multi-agent systems requires communicating goals and intentions between agents. Current agent frameworks often suffer from dependencies on single-agent execution and lack robust inter-module communication, frequently leading to suboptimal multi-agent reinforcement learning (MARL) policies and inadequate task coordination. To address these challenges, we present a framewo…

    Submitted 17 July, 2024; originally announced July 2024.

  22. arXiv:2407.12522  [pdf, other]

    cs.CL cs.AI

    Struct-X: Enhancing Large Language Models Reasoning with Structured Data

    Authors: Xiaoyu Tan, Haoyu Wang, Xihe Qiu, Yuan Cheng, Yinghui Xu, Wei Chu, Yuan Qi

    Abstract: Structured data, rich in logical and relational information, has the potential to enhance the reasoning abilities of large language models (LLMs). Still, its integration poses a challenge due to the risk of overwhelming LLMs with excessive tokens and irrelevant context information. To address this, we propose Struct-X, a novel framework that operates through five key phases: "read-model-fill-refl…

    Submitted 17 July, 2024; originally announced July 2024.

  23. arXiv:2407.12217  [pdf, other]

    cs.CV

    AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs

    Authors: Yunling Zheng, Zeyi Xu, Fanghui Xue, Biao Yang, Jiancheng Lyu, Shuai Zhang, Yingyong Qi, Jack Xin

    Abstract: We propose and demonstrate an alternating Fourier and image domain filtering approach for feature extraction as an efficient alternative to build a vision backbone without using the computationally intensive attention. The performance among the lightweight models reaches the state-of-the-art level on ImageNet-1K classification, and improves downstream tasks on object detection and segmentation con…

    Submitted 16 July, 2024; originally announced July 2024.

  24. arXiv:2407.11700  [pdf, other]

    cs.CV eess.IV

    Rate-Distortion-Cognition Controllable Versatile Neural Image Compression

    Authors: Jinming Liu, Ruoyu Feng, Yunpeng Qi, Qiuyu Chen, Zhibo Chen, Wenjun Zeng, Xin Jin

    Abstract: Recently, the field of Image Coding for Machines (ICM) has garnered heightened interest and significant advances thanks to the rapid progress of learning-based techniques for image compression and analysis. Previous studies often require training separate codecs to support various bitrate levels, machine tasks, and networks, thus lacking both flexibility and practicality. To address these challeng…

    Submitted 17 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  25. arXiv:2407.11298  [pdf, other]

    cs.RO

    ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter

    Authors: Yaoyao Qian, Xupeng Zhu, Ondrej Biza, Shuo Jiang, Linfeng Zhao, Haojie Huang, Yu Qi, Robert Platt

    Abstract: Robotic grasping in cluttered environments remains a significant challenge due to occlusions and complex object arrangements. We have developed ThinkGrasp, a plug-and-play vision-language grasping system that makes use of GPT-4o's advanced contextual reasoning for heavy clutter environment grasping strategies. ThinkGrasp can effectively identify and generate grasp poses for target objects, even wh…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Project Website: https://h-freax.github.io/thinkgrasp_page/

  26. arXiv:2407.08500  [pdf, other]

    cs.LG cs.AI

    Latent Conditional Diffusion-based Data Augmentation for Continuous-Time Dynamic Graph Model

    Authors: Yuxing Tian, Yiyan Qi, Aiwen Jiang, Qi Huang, Jian Guo

    Abstract: Continuous-Time Dynamic Graph (CTDG) precisely models evolving real-world relationships, drawing heightened interest in dynamic graph learning across academia and industry. However, existing CTDG models encounter challenges stemming from noise and limited historical data. Graph Data Augmentation (GDA) emerges as a critical solution, yet current approaches primarily focus on static graphs and strug…

    Submitted 20 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD 2024

  27. arXiv:2407.08167  [pdf, other]

    eess.IV cs.CV

    DSCENet: Dynamic Screening and Clinical-Enhanced Multimodal Fusion for MPNs Subtype Classification

    Authors: Yuan Zhang, Yaolei Qi, Xiaoming Qi, Yongyue Wei, Guanyu Yang

    Abstract: The precise subtype classification of myeloproliferative neoplasms (MPNs) based on multimodal information, which assists clinicians in diagnosis and long-term treatment plans, is of great clinical significance. However, it remains a highly challenging task due to the lack of diagnostic representativeness for local patches and the absence of diagnostic-relevant features from a single modality. In th…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI2024

  28. arXiv:2407.05915  [pdf, other]

    cs.NI cs.AI

    Harnessing Federated Generative Learning for Green and Sustainable Internet of Things

    Authors: Yuanhang Qi, M. Shamim Hossain

    Abstract: The rapid proliferation of devices in the Internet of Things (IoT) has ushered in a transformative era of data-driven connectivity across various domains. However, this exponential growth has raised pressing concerns about environmental sustainability and data privacy. In response to these challenges, this paper introduces One-shot Federated Learning (OSFL), an innovative paradigm that harmonizes…

    Submitted 30 April, 2024; originally announced July 2024.

    Comments: This paper is a correction of the published version, in which we corrected the grammatical errors between contexts and highlighted the relationship with "Federated generative learning with foundation models"

  29. arXiv:2407.05005  [pdf, other]

    cs.LG cs.DC

    Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching

    Authors: Yichen Li, Wenchao Xu, Haozhao Wang, Ruixuan Li, Yining Qi, Jingcai Guo

    Abstract: This paper focuses on Federated Domain-Incremental Learning (FDIL) where each client continues to learn incremental tasks where their domain shifts from each other. We propose a novel adaptive knowledge matching-based personalized FDIL approach (pFedDIL) which allows each client to alternatively utilize appropriate incremental task learning strategy on the correlation with the knowledge from previ…

    Submitted 18 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

  30. arXiv:2407.04020  [pdf, other]

    cs.CL

    LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking

    Authors: Amy Xin, Yunjia Qi, Zijun Yao, Fangwei Zhu, Kaisheng Zeng, Xu Bin, Lei Hou, Juanzi Li

    Abstract: Entity Linking (EL) models are well-trained at mapping mentions to their corresponding entities according to a given context. However, EL models struggle to disambiguate long-tail entities due to their limited training data. Meanwhile, large language models (LLMs) are more robust at interpreting uncommon mentions. Yet, due to a lack of specialized training, LLMs suffer at generating correct entity…

    Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  31. arXiv:2407.03900  [pdf, other]

    cs.CV

    Oracle Bone Inscriptions Multi-modal Dataset

    Authors: Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

    Abstract: Oracle bone inscriptions (OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography. However, the task of deciphering OBI, in the current climate of the scholarship, can prove extremely challenging. Out of the 4,500 oracle bone characters excavated, only a third have been successfully identified. Therefore, leveraging…

    Submitted 4 July, 2024; originally announced July 2024.

  32. arXiv:2407.02073  [pdf, other]

    cs.LG

    Contribution Evaluation of Heterogeneous Participants in Federated Learning via Prototypical Representations

    Authors: Qi Guo, Minghao Yao, Zhen Tian, Saiyu Qi, Yong Qi, Yun Lin, Jin Song Dong

    Abstract: Contribution evaluation in federated learning (FL) has become a pivotal research area due to its applicability across various domains, such as detecting low-quality datasets, enhancing model robustness, and designing incentive mechanisms. Existing contribution evaluation methods, which primarily rely on data volume, model similarity, and auxiliary test datasets, have shown success in diverse scena…

    Submitted 2 July, 2024; originally announced July 2024.

  33. arXiv:2407.00921  [pdf, other]

    cs.CV

    PointViG: A Lightweight GNN-based Model for Efficient Point Cloud Analysis

    Authors: Qiang Zheng, Yafei Qi, Chen Wang, Chao Zhang, Jian Sun

    Abstract: In the domain of point cloud analysis, despite the significant capabilities of Graph Neural Networks (GNNs) in managing complex 3D datasets, existing approaches encounter challenges like high computational costs and scalability issues with extensive scenarios. These limitations restrict the practical deployment of GNNs, notably in resource-constrained environments. To address these issues, this st…

    Submitted 30 June, 2024; originally announced July 2024.

  34. arXiv:2407.00365  [pdf, other]

    cs.CL

    Financial Knowledge Large Language Model

    Authors: Cehao Yang, Chengjin Xu, Yiyan Qi

    Abstract: Artificial intelligence is making significant strides in the finance industry, revolutionizing how data is processed and interpreted. Among these technologies, large language models (LLMs) have demonstrated substantial potential to transform financial services by automating complex tasks, enhancing customer service, and providing detailed financial analysis. Firstly, we introduce IDEA-FinBench, an…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: 66 pages

  35. arXiv:2406.19143  [pdf, other]

    cs.DB cs.DS

    QSketch: An Efficient Sketch for Weighted Cardinality Estimation in Streams

    Authors: Yiyan Qi, Rundong Li, Pinghui Wang, Yufang Sun, Rui Xing

    Abstract: Estimating cardinality, i.e., the number of distinct elements, of a data stream is a fundamental problem in areas like databases, computer networks, and information retrieval. This study delves into a broader scenario where each element carries a positive weight. Unlike traditional cardinality estimation, limited research exists on weighted cardinality, with current methods requiring substantial m…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 12 pages, 10 figures, accepted by KDD 2024

  36. arXiv:2406.18967  [pdf, other]

    cs.CV

    Structural Attention: Rethinking Transformer for Unpaired Medical Image Synthesis

    Authors: Vu Minh Hieu Phan, Yutong Xie, Bowen Zhang, Yuankai Qi, Zhibin Liao, Antonios Perperidis, Son Lam Phung, Johan W. Verjans, Minh-Son To

    Abstract: Unpaired medical image synthesis aims to provide complementary information for accurate clinical diagnostics, and address challenges in obtaining aligned multi-modal medical scans. Transformer-based models excel in imaging translation tasks thanks to their ability to capture long-range dependencies. Although effective in supervised training settings, their performance falters in unpaired image…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: MICCAI2024 - Early Accept Top 11%

  37. arXiv:2406.18327  [pdf, other]

    eess.IV cs.CV cs.LG

    Multi-modal Evidential Fusion Network for Trusted PET/CT Tumor Segmentation

    Authors: Yuxuan Qi, Li Lin, Jiajun Wang, Jingya Zhang, Bin Zhang

    Abstract: Accurate segmentation of tumors in PET/CT images is important in computer-aided diagnosis and treatment of cancer. The key issue of such a segmentation problem lies in the effective integration of complementary information from PET and CT images. However, the quality of PET and CT images varies widely in clinical settings, which leads to uncertainty in the modality information extracted by network…

    Submitted 26 June, 2024; originally announced June 2024.

  38. arXiv:2406.15658  [pdf, other]

    cs.CV cs.AI

    TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning

    Authors: Nemin Wu, Qian Cao, Zhangyu Wang, Zeping Liu, Yanlin Qi, Jielu Zhang, Joshua Ni, Xiaobai Yao, Hongxu Ma, Lan Mu, Stefano Ermon, Tanuja Ganu, Akshay Nambi, Ni Lao, Gengchen Mai

    Abstract: Spatial representation learning (SRL) aims at learning general-purpose neural network representations from various types of spatial data (e.g., points, polylines, polygons, networks, images, etc.) in their native formats. Learning good spatial representations is a fundamental problem for various downstream applications such as species distribution modeling, weather forecasting, trajectory generati…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 9 pages, 2 figures. Submitted to NeurIPS 2024 Datasets and Benchmarks Track. Under review

  39. Deblurring Neural Radiance Fields with Event-driven Bundle Adjustment

    Authors: Yunshan Qi, Lin Zhu, Yifan Zhao, Nan Bao, Jia Li

    Abstract: Neural Radiance Fields (NeRF) achieves impressive 3D representation learning and novel view synthesis results with high-quality multi-view images as input. However, motion blur in images often occurs in low-light and high-speed motion scenes, which significantly degrades the reconstruction quality of NeRF. Previous deblurring NeRF methods struggle to estimate pose and lighting changes during the e…

    Submitted 1 August, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Accepted by 32nd ACM International Conference on Multimedia (MM 2024)

  40. arXiv:2406.11160  [pdf, other]

    cs.AI

    Context Graph

    Authors: Chengjin Xu, Muzhi Li, Cehao Yang, Xuhui Jiang, Lumingyuan Tang, Yiyan Qi, Jian Guo

    Abstract: Knowledge Graphs (KGs) are foundational structures in many AI applications, representing entities and their interrelations through triples. However, triple-based KGs lack the contextual information of relational knowledge, like temporal dynamics and provenance details, which are crucial for comprehensive knowledge representation and effective reasoning. Instead, Context Graphs (CGs) expan…

    Submitted 27 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

  41. arXiv:2406.02622  [pdf, other]

    cs.CR cs.AI

    Safeguarding Large Language Models: A Survey

    Authors: Yi Dong, Ronghui Mu, Yanghao Zhang, Siqi Sun, Tianle Zhang, Changshun Wu, Gaojie Jin, Yi Qi, Jinwei Hu, Jie Meng, Saddek Bensalem, Xiaowei Huang

    Abstract: In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review on the current status of this critical mechanism. It discusses its major challenges and how it can be enhanced int…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: under review. arXiv admin note: text overlap with arXiv:2402.01822

  42. arXiv:2406.02407  [pdf, other]

    cs.CV

    WE-GS: An In-the-wild Efficient 3D Gaussian Representation for Unconstrained Photo Collections

    Authors: Yuze Wang, Junyi Wang, Yue Qi

    Abstract: Novel View Synthesis (NVS) from unconstrained photo collections is challenging in computer graphics. Recently, 3D Gaussian Splatting (3DGS) has shown promise for photorealistic and real-time NVS of static scenes. Building on 3DGS, we propose an efficient point-based differentiable rendering framework for scene reconstruction from photo collections. Our key innovation is a residual-based spherical…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Our project page is available at https://yuzewang1998.github.io/we-gs.github.io/

  43. arXiv:2406.01256  [pdf, other]

    cs.CV cs.AI

    Augmented Commonsense Knowledge for Remote Object Grounding

    Authors: Bahram Mohammadi, Yicong Hong, Yuankai Qi, Qi Wu, Shirui Pan, Javen Qinfeng Shi

    Abstract: The vision-and-language navigation (VLN) task necessitates an agent to perceive the surroundings, follow natural language instructions, and act in photo-realistic unseen environments. Most of the existing methods employ the entire image or object features to represent navigable viewpoints. However, these representations are insufficient for proper action prediction, especially for the REVERIE task…

    Submitted 3 June, 2024; originally announced June 2024.

  44. arXiv:2406.00993  [pdf]

    eess.SP cs.HC q-bio.OT

    Detection of Acetone as a Gas Biomarker for Diabetes Based on Gas Sensor Technology

    Authors: Jiaming Wei, Tong Liu, Jipeng Huang, Xiaowei Li, Yurui Qi, Gangyin Luo

    Abstract: With the continuous development and improvement of medical services, there is a growing demand for improving diabetes diagnosis. Exhaled breath analysis, characterized by its speed, convenience, and non-invasive nature, is leading the trend in diagnostic development. Studies have shown that the acetone levels in the breath of diabetes patients are higher than normal, making acetone a basis for dia…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: 9 pages, 14 figures

  45. arXiv:2405.20652  [pdf, other]

    cs.LG

    Sign is Not a Remedy: Multiset-to-Multiset Message Passing for Learning on Heterophilic Graphs

    Authors: Langzhang Liang, Sunwoo Kim, Kijung Shin, Zenglin Xu, Shirui Pan, Yuan Qi

    Abstract: Graph Neural Networks (GNNs) have gained significant attention as a powerful modeling and inference method, especially for homophilic graph-structured data. To empower GNNs in heterophilic graphs, where adjacent nodes exhibit dissimilar labels or features, Signed Message Passing (SMP) has been widely adopted. However, there is a lack of theoretical and empirical analysis regarding the limitations…

    Submitted 31 May, 2024; originally announced May 2024.

    Comments: Published as a conference paper at ICML 2024

  46. arXiv:2405.17846  [pdf, other]

    cs.RO cs.AI

    Safety Control of Service Robots with LLMs and Embodied Knowledge Graphs

    Authors: Yong Qi, Gabriel Kyebambo, Siyuan Xie, Wei Shen, Shenghui Wang, Bitao Xie, Bin He, Zhipeng Wang, Shuo Jiang

    Abstract: Safety limitations in service robotics across various industries have raised significant concerns about the need for robust mechanisms ensuring that robots adhere to safe practices, thereby preventing actions that might harm humans or cause property damage. Despite advances, including the integration of Knowledge Graphs (KGs) with Large Language Models (LLMs), challenges in ensuring consistent saf…

    Submitted 28 May, 2024; originally announced May 2024.

  47. arXiv:2405.14808  [pdf, other]

    cs.CL cs.AI cs.CY cs.HC cs.LG

    Implicit Personalization in Language Models: A Systematic Study

    Authors: Zhijing Jin, Nils Heil, Jiarui Liu, Shehzaad Dhuliawala, Yahang Qi, Bernhard Schölkopf, Rada Mihalcea, Mrinmaya Sachan

    Abstract: Implicit Personalization (IP) is a phenomenon of language models inferring a user's background from the implicit cues in the input prompts and tailoring the response based on this inference. While previous work has touched upon various instances of this problem, there lacks a unified framework to study this behavior. This work systematically studies IP through a rigorous mathematical formulation,…

    Submitted 23 May, 2024; originally announced May 2024.

  48. arXiv:2405.10746  [pdf, other]

    cs.CY cs.AI

    Causality in the Can: Diet Coke's Impact on Fatness

    Authors: Yicheng Qi, Ang Li

    Abstract: Artificially sweetened beverages like Diet Coke are often considered better alternatives to sugary drinks, but the debate over their impact on health, particularly in relation to obesity, continues. Previous research has predominantly used association-based methods with observational or Randomized Controlled Trial (RCT) data, which may not accurately capture the causal relationship between Diet Co…

    Submitted 18 August, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  49. arXiv:2405.07046  [pdf, other]

    cs.CV

    Retrieval Enhanced Zero-Shot Video Captioning

    Authors: Yunchuan Ma, Laiyun Qing, Guorong Li, Yuankai Qi, Quan Z. Sheng, Qingming Huang

    Abstract: Despite the significant progress of fully-supervised video captioning, zero-shot methods remain much less explored. In this paper, we propose to take advantage of existing pre-trained large-scale vision and language models to directly generate captions with test time adaptation. Specifically, we bridge video and text using three key models: a general video understanding model XCLIP, a general imag…

    Submitted 11 May, 2024; originally announced May 2024.

  50. arXiv:2405.05008  [pdf, other]

    cs.CL

    ADELIE: Aligning Large Language Models on Information Extraction

    Authors: Yunjia Qi, Hao Peng, Xiaozhi Wang, Bin Xu, Lei Hou, Juanzi Li

    Abstract: Large language models (LLMs) usually fall short on information extraction (IE) tasks and struggle to follow the complex instructions of IE tasks. This primarily arises from LLMs not being aligned with humans, as mainstream alignment datasets typically do not include IE data. In this paper, we introduce ADELIE (Aligning large language moDELs on Information Extraction), an aligned LLM that effective…

    Submitted 8 May, 2024; originally announced May 2024.