Showing 1–50 of 225 results for author: Bai, L

  1. arXiv:2408.10854  [pdf, other]

    physics.ao-ph cs.AI cs.CV

    MambaDS: Near-Surface Meteorological Field Downscaling with Topography Constrained Selective State Space Modeling

    Authors: Zili Liu, Hao Chen, Lei Bai, Wenyuan Li, Wanli Ouyang, Zhengxia Zou, Zhenwei Shi

    Abstract: In an era of frequent extreme weather and global warming, obtaining precise, fine-grained near-surface weather forecasts is increasingly essential for human activities. Downscaling (DS), a crucial task in meteorological forecasting, enables the reconstruction of high-resolution meteorological states for target regions from global-scale forecast results. Previous downscaling methods, inspired by CN…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.06629  [pdf, other]

    cs.CV

    Fast Information Streaming Handler (FisH): A Unified Seismic Neural Network for Single Station Real-Time Earthquake Early Warning

    Authors: Tianning Zhang, Feng Liu, Yuming Yuan, Rui Su, Wanli Ouyang, Lei Bai

    Abstract: Existing EEW approaches often treat phase picking, location estimation, and magnitude estimation as separate tasks, lacking a unified framework. Additionally, most deep learning models in seismology rely on full three-component waveforms and are not suitable for real-time streaming data. To address these limitations, we propose a novel unified seismic neural network called Fast Information Streami…

    Submitted 13 August, 2024; originally announced August 2024.

  3. arXiv:2408.04958  [pdf, other]

    cs.CV cs.RO

    Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery

    Authors: Long Bai, Guankun Wang, Mobarakol Islam, Lalithkumar Seenivasan, An Wang, Hongliang Ren

    Abstract: Medical visual question answering (VQA) bridges the gap between visual information and clinical decision-making, enabling doctors to extract understanding from clinical images and videos. In particular, surgical VQA can enhance the interpretation of surgical data, aiding in accurate diagnoses, effective education, and clinical interventions. However, the inability of VQA models to visually indicat…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by Information Fusion. Code and data availability: https://github.com/longbai1006/Surgical-VQLAPlus

  4. arXiv:2408.04593  [pdf, other]

    cs.CV cs.RO eess.IV

    SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation

    Authors: Jieming Yu, An Wang, Wenzhen Dong, Mengya Xu, Mobarakol Islam, Jie Wang, Long Bai, Hongliang Ren

    Abstract: The recent Segment Anything Model (SAM) 2 has demonstrated remarkable foundational competence in semantic segmentation, with its memory mechanism and mask decoder further addressing challenges in video tracking and object occlusion, thereby achieving superior results in interactive segmentation for both images and videos. Building upon our previous empirical studies, we further explore the zero-sh…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Empirical study. Previous work "SAM Meets Robotic Surgery" is accessible at: arXiv:2308.07156

  5. arXiv:2408.04426  [pdf, other]

    cs.CV cs.RO

    A Review of 3D Reconstruction Techniques for Deformable Tissues in Robotic Surgery

    Authors: Mengya Xu, Ziqi Guo, An Wang, Long Bai, Hongliang Ren

    Abstract: As a crucial and intricate task in robotic minimally invasive surgery, reconstructing surgical scenes using stereo or monocular endoscopic video holds immense potential for clinical applications. NeRF-based techniques have recently garnered attention for the ability to reconstruct scenes implicitly. On the other hand, Gaussian splatting-based 3D-GS represents scenes explicitly using 3D Gaussians a…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: To appear in MICCAI 2024 EARTH Workshop. Code availability: https://github.com/Epsilon404/surgicalnerf

  6. arXiv:2408.03877  [pdf, other]

    cs.LG cs.AI

    Knowledge Probing for Graph Representation Learning

    Authors: Mingyu Zhao, Xingyu Huang, Ziyu Lyu, Yanlin Wang, Lixin Cui, Lu Bai

    Abstract: Graph learning methods have been extensively applied in diverse application areas. However, what kind of inherent graph properties e.g. graph proximity, graph structural information has been encoded into graph representation learning for downstream tasks is still under-explored. In this paper, we propose a novel graph probing framework (GraphProbe) to investigate and interpret whether the family o…

    Submitted 7 August, 2024; originally announced August 2024.

  7. arXiv:2407.20213  [pdf, other]

    cs.RO cs.CV

    Registering Neural 4D Gaussians for Endoscopic Surgery

    Authors: Yiming Huang, Beilei Cui, Ikemura Kei, Jiekai Zhang, Long Bai, Hongliang Ren

    Abstract: The recent advance in neural rendering has enabled the ability to reconstruct high-quality 4D scenes using neural networks. Although 4D neural reconstruction is popular, registration for such representations remains a challenging task, especially for dynamic scene registration in surgical planning and simulation. In this paper, we propose a novel strategy for dynamic surgical neural scene registra…

    Submitted 29 July, 2024; originally announced July 2024.

  8. arXiv:2407.19435  [pdf, other]

    cs.CV cs.AI cs.CL cs.HC cs.RO

    ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding

    Authors: Zhen Chen, Zongming Zhang, Wenwu Guo, Xingjian Luo, Long Bai, Jinlin Wu, Hongliang Ren, Hongbin Liu

    Abstract: Surgical instrument segmentation is crucial in surgical scene understanding, thereby facilitating surgical safety. Existing algorithms directly detected all instruments of pre-defined categories in the input image, lacking the capability to segment specific instruments according to the surgeon's intention. During different stages of surgery, surgeons exhibit varying preferences and focus toward di…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: This work is accepted by IROS 2024 (Oral)

  9. arXiv:2407.14041  [pdf, other]

    cs.CV

    Not All Noises Are Created Equally: Diffusion Noise Selection and Optimization

    Authors: Zipeng Qi, Lichen Bai, Haoyi Xiong, Zeke Xie

    Abstract: Diffusion models that can generate high-quality data from randomly sampled Gaussian noises have become the mainstream generative method in both academia and industry. Are randomly sampled Gaussian noises equally good for diffusion models? While a large body of works tried to understand and improve diffusion models, previous works overlooked the possibility to select or optimize the sampled noise t…

    Submitted 27 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  10. arXiv:2407.12592  [pdf, other]

    cs.CV

    VegeDiff: Latent Diffusion Model for Geospatial Vegetation Forecasting

    Authors: Sijie Zhao, Hao Chen, Xueliang Zhang, Pengfeng Xiao, Lei Bai, Wanli Ouyang

    Abstract: In the context of global climate change and frequent extreme weather events, forecasting future geospatial vegetation states under these conditions is of significant importance. The vegetation change process is influenced by the complex interplay between dynamic meteorological variables and static environmental variables, leading to high levels of uncertainty. Existing deterministic methods are in…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 15 pages, 8 figures

  11. arXiv:2407.10047  [pdf, other]

    cs.CV

    HSFusion: A high-level vision task-driven infrared and visible image fusion network via semantic and geometric domain transformation

    Authors: Chengjie Jiang, Xiaowen Liu, Bowen Zheng, Lu Bai, Jing Li

    Abstract: Infrared and visible image fusion has been developed from vision perception oriented fusion methods to strategies which both consider the vision perception and high-level vision task. However, the existing task-driven methods fail to address the domain gap between semantic and geometric representation. To overcome these issues, we propose a high-level vision task-driven infrared and visible image…

    Submitted 13 July, 2024; originally announced July 2024.

  12. arXiv:2407.08418  [pdf, other]

    cs.LG cs.CV

    PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines

    Authors: ZiDong Wang, Zeyu Lu, Di Huang, Tong He, Xihui Liu, Wanli Ouyang, Lei Bai

    Abstract: In this paper, we introduce PredBench, a benchmark tailored for the holistic evaluation of spatio-temporal prediction networks. Despite significant progress in this field, there remains a lack of a standardized framework for a detailed and comparative analysis of various prediction network architectures. PredBench addresses this gap by conducting large-scale experiments, upholding standardized and…

    Submitted 11 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  13. arXiv:2407.06317  [pdf, other]

    cs.AI cs.CV cs.RO

    Enhanced Safety in Autonomous Driving: Integrating Latent State Diffusion Model for End-to-End Navigation

    Authors: Detian Chu, Linyuan Bai, Jianuo Huang, Zhenlong Fang, Peng Zhang, Wei Kang, Haifeng Lin

    Abstract: With the advancement of autonomous driving, ensuring safety during motion planning and navigation is becoming more and more important. However, most end-to-end planning methods suffer from a lack of safety. This research addresses the safety issue in the control optimization problem of autonomous driving, formulated as Constrained Markov Decision Processes (CMDPs). We propose a novel, model-based…

    Submitted 17 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  14. arXiv:2407.02816  [pdf, other]

    cs.IT eess.SP math.ST

    Large and Small Deviations for Statistical Sequence Matching

    Authors: Lin Zhou, Qianyun Wang, Jingjing Wang, Lin Bai, Alfred O. Hero

    Abstract: We revisit the problem of statistical sequence matching between two databases of sequences initiated by Unnikrishnan (TIT 2015) and derive theoretical performance guarantees for the generalized likelihood ratio test (GLRT). We first consider the case where the number of matched pairs of sequences between the databases is known. In this case, the task is to accurately find the matched pairs of sequ…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Extended version of ISIT paper

  15. arXiv:2406.14399  [pdf, other]

    cs.LG cs.CV physics.ao-ph stat.ML

    WEATHER-5K: A Large-scale Global Station Weather Dataset Towards Comprehensive Time-series Forecasting Benchmark

    Authors: Tao Han, Song Guo, Zhenghao Chen, Wanghan Xu, Lei Bai

    Abstract: Global Station Weather Forecasting (GSWF) is crucial for various sectors, including aviation, agriculture, energy, and disaster preparedness. Recent advancements in deep learning have significantly improved the accuracy of weather predictions by optimizing models based on public meteorological data. However, existing public datasets for GSWF optimization and benchmarking still suffer from signific…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 26 pages, 13 figures

  16. arXiv:2406.14191  [pdf, other]

    cs.CL cs.AI cs.LG

    Temporal Knowledge Graph Question Answering: A Survey

    Authors: Miao Su, Zixuan Li, Zhuo Chen, Long Bai, Xiaolong Jin, Jiafeng Guo

    Abstract: Knowledge Base Question Answering (KBQA) has been a long-standing field to answer questions based on knowledge bases. Recently, the evolving dynamics of knowledge have attracted a growing interest in Temporal Knowledge Graph Question Answering (TKGQA), an emerging task to answer temporal questions. However, this field grapples with ambiguities in defining temporal questions and lacks a systematic…

    Submitted 5 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 8 pages, 3 figures

  17. arXiv:2406.13705  [pdf, other]

    eess.IV cs.AI cs.CV

    EndoUIC: Promptable Diffusion Transformer for Unified Illumination Correction in Capsule Endoscopy

    Authors: Long Bai, Tong Chen, Qiaozhi Tan, Wan Jun Nah, Yanheng Li, Zhicheng He, Sishen Yuan, Zhen Chen, Jinlin Wu, Mobarakol Islam, Zhen Li, Hongbin Liu, Hongliang Ren

    Abstract: Wireless Capsule Endoscopy (WCE) is highly valued for its non-invasive and painless approach, though its effectiveness is compromised by uneven illumination from hardware constraints and complex internal dynamics, leading to overexposed or underexposed images. While researchers have discussed the challenges of low-light enhancement in WCE, the issue of correcting for different exposure levels rema…

    Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: To appear in MICCAI 2024. Code and dataset availability: https://github.com/longbai1006/EndoUIC

  18. arXiv:2406.12754  [pdf, other]

    cs.CL cs.AI

    Chumor 1.0: A Truly Funny and Challenging Chinese Humor Understanding Dataset from Ruo Zhi Ba

    Authors: Ruiqi He, Yushu He, Longju Bai, Jiarui Liu, Zhenjie Sun, Zenghao Tang, He Wang, Hanchen Xia, Naihao Deng

    Abstract: Existing humor datasets and evaluations predominantly focus on English, lacking resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct Chumor, a dataset sourced from Ruo Zhi Ba (RZB), a Chinese Reddit-like platform dedicated to sharing intellectually challenging and culturally specific jokes. We annotate explanations for each joke and evalua…

    Submitted 18 June, 2024; originally announced June 2024.

  19. arXiv:2406.10508  [pdf, other]

    cs.CV

    Learning to Adapt Foundation Model DINOv2 for Capsule Endoscopy Diagnosis

    Authors: Bowen Zhang, Ying Chen, Long Bai, Yan Zhao, Yuxiang Sun, Yixuan Yuan, Jianhua Zhang, Hongliang Ren

    Abstract: Foundation models have become prominent in computer vision, achieving notable success in various tasks. However, their effectiveness largely depends on pre-training with extensive datasets. Applying foundation models directly to small datasets of capsule endoscopy images from scratch is challenging. Pre-training on broad, general vision datasets is crucial for successfully fine-tuning our model fo…

    Submitted 30 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: To appear in ICBIR 2024

  20. arXiv:2406.01645  [pdf, other]

    cs.LG cs.AI

    FNP: Fourier Neural Processes for Arbitrary-Resolution Data Assimilation

    Authors: Kun Chen, Tao Chen, Peng Ye, Hao Chen, Kang Chen, Tao Han, Wanli Ouyang, Lei Bai

    Abstract: Data assimilation is a vital component in modern global medium-range weather forecasting systems to obtain the best estimation of the atmospheric state by combining the short-term forecast and observations. Recently, AI-based data assimilation approaches have attracted increasing attention for their significant advantages over traditional techniques in terms of computational consumption. However,…

    Submitted 3 June, 2024; originally announced June 2024.

  21. arXiv:2405.17790  [pdf, other]

    cs.CV

    Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

    Authors: Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang

    Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2306.07520

  22. arXiv:2405.15412  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    ORCA: A Global Ocean Emulator for Multi-year to Decadal Predictions

    Authors: Zijie Guo, Pumeng Lyu, Fenghua Ling, Jing-Jia Luo, Niklas Boers, Wanli Ouyang, Lei Bai

    Abstract: Ocean dynamics plays a crucial role in driving global weather and climate patterns. Accurate and efficient modeling of ocean dynamics is essential for improved understanding of complex ocean circulation and processes, for predicting climate variations and their associated teleconnections, and for addressing the challenges of climate change. While great efforts have been made to improve numerical O…

    Submitted 24 May, 2024; originally announced May 2024.

  23. arXiv:2405.15151  [pdf, other]

    cs.CV cs.GR cs.RO

    NeB-SLAM: Neural Blocks-based Salable RGB-D SLAM for Unknown Scenes

    Authors: Lizhi Bai, Chunqi Tian, Jun Yang, Siyu Zhang, Weijian Liang

    Abstract: Neural implicit representations have recently demonstrated considerable potential in the field of visual simultaneous localization and mapping (SLAM). This is due to their inherent advantages, including low storage overhead and representation continuity. However, these methods necessitate the size of the scene as input, which is impractical for unknown scenes. Consequently, we propose NeB-SLAM, a…

    Submitted 23 May, 2024; originally announced May 2024.

  24. arXiv:2405.14742  [pdf, other]

    cs.LG cs.AI

    HC-GAE: The Hierarchical Cluster-based Graph Auto-Encoder for Graph Representation Learning

    Authors: Zhuo Xu, Lu Bai, Lixin Cui, Ming Li, Yue Wang, Edwin R. Hancock

    Abstract: Graph Auto-Encoders (GAEs) are powerful tools for graph representation learning. In this paper, we develop a novel Hierarchical Cluster-based GAE (HC-GAE), that can learn effective structural characteristics for graph data analysis. To this end, during the encoding process, we commence by utilizing the hard node assignment to decompose a sample graph into a family of separated subgraphs. We compre…

    Submitted 23 May, 2024; originally announced May 2024.

  25. arXiv:2405.13796  [pdf, other]

    cs.LG cs.AI

    Generalizing Weather Forecast to Fine-grained Temporal Scales via Physics-AI Hybrid Modeling

    Authors: Wanghan Xu, Fenghua Ling, Wenlong Zhang, Tao Han, Hao Chen, Wanli Ouyang, Lei Bai

    Abstract: Data-driven artificial intelligence (AI) models have made significant advancements in weather forecasting, particularly in medium-range and nowcasting. However, most data-driven weather forecasting models are black-box systems that focus on learning data mapping rather than fine-grained physical evolution in the time dimension. Consequently, the limitations in the temporal scale of datasets preven…

    Submitted 29 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  26. arXiv:2405.13711  [pdf, other]

    cs.LG cs.AI math.DS physics.ao-ph

    VAE-Var: Variational-Autoencoder-Enhanced Variational Assimilation

    Authors: Yi Xiao, Qilong Jia, Wei Xue, Lei Bai

    Abstract: Data assimilation refers to a set of algorithms designed to compute the optimal estimate of a system's state by refining the prior prediction (known as background states) using observed data. Variational assimilation methods rely on the maximum likelihood approach to formulate a variational cost, with the optimal state estimate derived by minimizing this cost. Although traditional variational meth…

    Submitted 22 May, 2024; originally announced May 2024.

  27. arXiv:2405.10948  [pdf, other]

    cs.CV cs.AI cs.RO eess.IV

    Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery

    Authors: Guankun Wang, Long Bai, Wan Jun Nah, Jie Wang, Zhaoxi Zhang, Zhen Chen, Jinlin Wu, Mobarakol Islam, Hongbin Liu, Hongliang Ren

    Abstract: Recent advancements in Surgical Visual Question Answering (Surgical-VQA) and related region grounding have shown great promise for robotic and medical applications, addressing the critical need for automated methods in personalized surgical mentorship. However, existing models primarily provide simple structured answers and struggle with complex scenarios due to their limited capability in recogni…

    Submitted 22 March, 2024; originally announced May 2024.

  28. arXiv:2405.10550  [pdf, other]

    eess.IV cs.CV

    LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion

    Authors: Tong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping Zhou

    Abstract: Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It ad…

    Submitted 17 May, 2024; originally announced May 2024.

  29. arXiv:2405.10218  [pdf, other]

    cs.LG cs.AI

    ENADPool: The Edge-Node Attention-based Differentiable Pooling for Graph Neural Networks

    Authors: Zhehan Zhao, Lu Bai, Lixin Cui, Ming Li, Yue Wang, Lixiang Xu, Edwin R. Hancock

    Abstract: Graph Neural Networks (GNNs) are powerful tools for graph classification. One important operation for GNNs is the downsampling or pooling that can learn effective embeddings from the node representations. In this paper, we propose a new hierarchical pooling operation, namely the Edge-Node Attention-based Differentiable Pooling (ENADPool), for GNNs to learn effective graph representations. Unlike t…

    Submitted 16 May, 2024; originally announced May 2024.

  30. arXiv:2405.08672  [pdf, other]

    eess.IV cs.CV

    EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera

    Authors: Beilei Cui, Mobarakol Islam, Long Bai, An Wang, Hongliang Ren

    Abstract: Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient adapt…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: early accepted by MICCAI 2024

  31. arXiv:2405.03376  [pdf, other]

    cs.LG cs.CV

    CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer

    Authors: Tao Han, Zhenghao Chen, Song Guo, Wanghan Xu, Lei Bai

    Abstract: The advent of data-driven weather forecasting models, which learn from hundreds of terabytes (TB) of reanalysis data, has significantly advanced forecasting capabilities. However, the substantial costs associated with data storage and transmission present a major challenge for data providers and users, affecting resource-constrained researchers and limiting their accessibility to participate in AI…

    Submitted 7 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Main text and supplementary, 22 pages, 13 figures

  32. arXiv:2405.00216  [pdf, other]

    cs.CL cs.AI cs.LG

    Graphical Reasoning: LLM-based Semi-Open Relation Extraction

    Authors: Yicheng Tao, Yiqun Wang, Longju Bai

    Abstract: This paper presents a comprehensive exploration of relation extraction utilizing advanced language models, specifically Chain of Thought (CoT) and Graphical Reasoning (GRE) techniques. We demonstrate how leveraging in-context learning with GPT-3.5 can significantly enhance the extraction process, particularly through detailed example-based reasoning. Additionally, we introduce a novel graphical re…

    Submitted 30 April, 2024; originally announced May 2024.

  33. arXiv:2404.18343  [pdf, other]

    cs.MM cs.CV

    G-Refine: A General Quality Refiner for Text-to-Image Generation

    Authors: Chunyi Li, Haoning Wu, Hongkun Hao, Zicheng Zhang, Tengchaun Kou, Chaofeng Chen, Lei Bai, Xiaohong Liu, Weisi Lin, Guangtao Zhai

    Abstract: With the evolution of Text-to-Image (T2I) models, the quality defects of AI-Generated Images (AIGIs) pose a significant barrier to their widespread adoption. In terms of both perception and alignment, existing models cannot always guarantee high-quality results. To mitigate this limitation, we introduce G-Refine, a general image quality refiner designed to enhance low-quality images without compro…

    Submitted 28 April, 2024; originally announced April 2024.

  34. arXiv:2404.02668  [pdf, other]

    cs.CV

    RS-Mamba for Large Remote Sensing Image Dense Prediction

    Authors: Sijie Zhao, Hao Chen, Xueliang Zhang, Pengfeng Xiao, Lei Bai, Wanli Ouyang

    Abstract: Context modeling is critical for remote sensing image dense prediction tasks. Nowadays, the growing size of very-high-resolution (VHR) remote sensing images poses challenges in effectively modeling context. While transformer-based models possess global modeling capabilities, they encounter computational challenges when applied to large VHR images due to their quadratic complexity. The conventional…

    Submitted 10 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: 15 pages, 8 figures

  35. arXiv:2404.01767  [pdf, other]

    cs.CL

    Class-Incremental Few-Shot Event Detection

    Authors: Kailin Zhao, Xiaolong Jin, Long Bai, Jiafeng Guo, Xueqi Cheng

    Abstract: Event detection is one of the fundamental tasks in information extraction and knowledge graph. However, a realistic event detection system often needs to deal with new event classes constantly. These new classes usually have only a few labeled instances as it is time-consuming and labor-intensive to annotate a large number of unlabeled instances. Therefore, this paper proposes a new task, called c…

    Submitted 2 April, 2024; originally announced April 2024.

  36. arXiv:2404.01695  [pdf, other]

    cs.LG

    Selective Temporal Knowledge Graph Reasoning

    Authors: Zhongni Hou, Xiaolong Jin, Zixuan Li, Long Bai, Jiafeng Guo, Xueqi Cheng

    Abstract: Temporal Knowledge Graph (TKG), which characterizes temporally evolving facts in the form of (subject, relation, object, timestamp), has attracted much attention recently. TKG reasoning aims to predict future facts based on given historical ones. However, existing TKG reasoning models are unable to abstain from predictions they are uncertain, which will inevitably bring risks in real-world applica…

    Submitted 2 April, 2024; originally announced April 2024.

  37. arXiv:2403.16407  [pdf, other]

    cs.CV

    A Survey on Long Video Generation: Challenges, Methods, and Prospects

    Authors: Chengxuan Li, Di Huang, Zeyu Lu, Yang Xiao, Qingqi Pei, Lei Bai

    Abstract: Video generation is a rapidly advancing research area, garnering significant attention due to its broad range of applications. One critical aspect of this field is the generation of long-duration videos, which presents unique challenges and opportunities. This paper presents the first survey of recent advancements in long video generation and summarises them into two key paradigms: divide and conq…

    Submitted 24 March, 2024; originally announced March 2024.

  38. arXiv:2403.16227  [pdf, other]

    cs.CV

    Dual-modal Prior Semantic Guided Infrared and Visible Image Fusion for Intelligent Transportation System

    Authors: Jing Li, Lu Bai, Bin Yang, Chang Li, Lingfei Ma, Lixin Cui, Edwin R. Hancock

    Abstract: Infrared and visible image fusion (IVF) plays an important role in intelligent transportation system (ITS). The early works predominantly focus on boosting the visual appeal of the fused result, and only several recent approaches have tried to combine the high-level vision task with IVF. However, they prioritize the design of cascaded structure to seek unified suitable features and fit different t…

    Submitted 24 March, 2024; originally announced March 2024.

  39. arXiv:2403.16162  [pdf, other]

    cs.AI

    Multi-Task Learning with Multi-Task Optimization

    Authors: Lu Bai, Abhishek Gupta, Yew-Soon Ong

    Abstract: Multi-task learning solves multiple correlated tasks. However, conflicts may exist between them. In such circumstances, a single solution can rarely optimize all the tasks, leading to performance trade-offs. To arrive at a set of optimized yet well-distributed models that collectively embody different trade-offs in one algorithmic pass, this paper proposes to view Pareto multi-task learning throug…

    Submitted 24 March, 2024; originally announced March 2024.

  40. arXiv:2403.16133  [pdf, other]

    cs.AI cs.LG

    SSHPool: The Separated Subgraph-based Hierarchical Pooling

    Authors: Zhuo Xu, Lixin Cui, Ming Li, Yue Wang, Ziyu Lyu, Hangyuan Du, Lu Bai, Philip S. Yu, Edwin R. Hancock

    Abstract: In this paper, we develop a novel local graph pooling method, namely the Separated Subgraph-based Hierarchical Pooling (SSHPool), for graph classification. We commence by assigning the nodes of a sample graph into different clusters, resulting in a family of separated subgraphs. We individually employ the local graph convolution units as the local structure to further compress each subgraph into a…

    Submitted 13 August, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  41. arXiv:2403.16130  [pdf, other]

    cs.LG cs.AI

    AKBR: Learning Adaptive Kernel-based Representations for Graph Classification

    Authors: Feifei Qian, Lixin Cui, Ming Li, Yue Wang, Hangyuan Du, Lixiang Xu, Lu Bai, Philip S. Yu, Edwin R. Hancock

    Abstract: In this paper, we propose a new model to learn Adaptive Kernel-based Representations (AKBR) for graph classification. Unlike state-of-the-art R-convolution graph kernels that are defined by merely counting any pair of isomorphic substructures between graphs and cannot provide an end-to-end learning mechanism for the classifier, the proposed AKBR approach aims to define an end-to-end representation…

    Submitted 13 August, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  42. arXiv:2403.11817  [pdf, other]

    cs.CV

    HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation

    Authors: Sha Zhang, Jiajun Deng, Lei Bai, Houqiang Li, Wanli Ouyang, Yanyong Zhang

    Abstract: We present a hybrid-view-based knowledge distillation framework, termed HVDistill, to guide the feature learning of a point cloud neural network with a pre-trained image network in an unsupervised manner. By exploiting the geometric relationship between RGB cameras and LiDAR sensors, the correspondence between the two modalities based on both image-plane view and bird-eye view can be established,…

    Submitted 18 March, 2024; originally announced March 2024.

  43. arXiv:2403.11035  [pdf]

    physics.optics cs.CV cs.NE physics.app-ph

    Multiplane Quantitative Phase Imaging Using a Wavelength-Multiplexed Diffractive Optical Processor

    Authors: Che-Yung Shen, Jingxi Li, Tianyi Gan, Yuhang Li, Langxing Bai, Mona Jarrahi, Aydogan Ozcan

    Abstract: Quantitative phase imaging (QPI) is a label-free technique that provides optical path length information for transparent specimens, finding utility in biology, materials science, and engineering. Here, we present quantitative phase imaging of a 3D stack of phase-only objects using a wavelength-multiplexed diffractive optical processor. Utilizing multiple spatially engineered diffractive layers tra…

    Submitted 16 March, 2024; originally announced March 2024.

    Comments: 27 Pages, 9 Figures

    Journal ref: Advanced Photonics (2024)

  44. arXiv:2403.07969  [pdf, other]

    cs.LG cs.AI

    KnowCoder: Coding Structured Knowledge into LLMs for Universal Information Extraction

    Authors: Zixuan Li, Yutao Zeng, Yuxin Zuo, Weicheng Ren, Wenxuan Liu, Miao Su, Yucan Guo, Yantao Liu, Xiang Li, Zhilei Hu, Long Bai, Wei Li, Yidan Liu, Pan Yang, Xiaolong Jin, Jiafeng Guo, Xueqi Cheng

    Abstract: In this paper, we propose KnowCoder, a Large Language Model (LLM) to conduct Universal Information Extraction (UIE) via code generation. KnowCoder aims to develop a kind of unified schema representation that LLMs can easily understand and an effective learning framework that encourages LLMs to follow schemas and extract structured knowledge accurately. To achieve these, KnowCoder introduces a code…

    Submitted 13 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

  45. arXiv:2403.07687  [pdf, other]

    cs.CV cs.AI cs.CL

    Annotations on a Budget: Leveraging Geo-Data Similarity to Balance Model Performance and Annotation Cost

    Authors: Oana Ignat, Longju Bai, Joan Nwatu, Rada Mihalcea

    Abstract: Current foundation models have shown impressive performance across various tasks. However, several studies have revealed that these models are not effective for everyone due to the imbalanced geographical and economic representation of the data used in the training process. Most of this data comes from Western countries, leading to poor results for underrepresented countries. To address this issue…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: accepted at COLING 2024

  46. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  47. arXiv:2402.13270  [pdf, other]

    physics.ao-ph cs.AI cs.LG physics.data-an

    Global Tropical Cyclone Intensity Forecasting with Multi-modal Multi-scale Causal Autoregressive Model

    Authors: Xinyu Wang, Kang Chen, Lei Liu, Tao Han, Bin Li, Lei Bai

    Abstract: Accurate forecasting of Tropical cyclone (TC) intensity is crucial for formulating disaster risk reduction strategies. Current methods predominantly rely on limited spatiotemporal information from ERA5 data and neglect the causal relationships between these physical variables, failing to fully capture the spatial and temporal patterns required for intensity forecasting. To address this issue, we p…

    Submitted 16 February, 2024; originally announced February 2024.

  48. arXiv:2402.12376  [pdf, other]

    cs.CV

    FiT: Flexible Vision Transformer for Diffusion Model

    Authors: Zeyu Lu, Zidong Wang, Di Huang, Chengyue Wu, Xihui Liu, Wanli Ouyang, Lei Bai

    Abstract: Nature is infinitely resolution-free. In the context of this reality, existing diffusion models, such as Diffusion Transformers, often face challenges when processing image resolutions outside of their trained domain. To overcome this limitation, we present the Flexible Vision Transformer (FiT), a transformer architecture specifically designed for generating images with unrestricted resolutions an…

    Submitted 19 February, 2024; originally announced February 2024.

  49. arXiv:2402.11476  [pdf, other]

    cs.CV

    EndoOOD: Uncertainty-aware Out-of-distribution Detection in Capsule Endoscopy Diagnosis

    Authors: Qiaozhi Tan, Long Bai, Guankun Wang, Mobarakol Islam, Hongliang Ren

    Abstract: Wireless capsule endoscopy (WCE) is a non-invasive diagnostic procedure that enables visualization of the gastrointestinal (GI) tract. Deep learning-based methods have shown effectiveness in disease screening using WCE data, alleviating the burden on healthcare professionals. However, existing capsule endoscopy classification methods mostly rely on pre-defined categories, making it challenging to…

    Submitted 18 February, 2024; originally announced February 2024.

    Comments: To appear in IEEE ISBI 2024

  50. arXiv:2402.06985  [pdf, other]

    cs.CV cs.AI cs.RO

    OSSAR: Towards Open-Set Surgical Activity Recognition in Robot-assisted Surgery

    Authors: Long Bai, Guankun Wang, Jie Wang, Xiaoxiao Yang, Huxin Gao, Xin Liang, An Wang, Mobarakol Islam, Hongliang Ren

    Abstract: In the realm of automated robotic surgery and computer-assisted interventions, understanding robotic surgical activities stands paramount. Existing algorithms dedicated to surgical activity recognition predominantly cater to pre-defined closed-set paradigms, ignoring the challenges of real-world open-set scenarios. Such algorithms often falter in the presence of test samples originating from class…

    Submitted 10 February, 2024; originally announced February 2024.

    Comments: To appear in IEEE ICRA 2024