Showing 1–50 of 822 results for author: Jiang, H

Searching in archive cs.
  1. arXiv:2408.10458  [pdf, other]

    cs.LG

    Transfer Operator Learning with Fusion Frame

    Authors: Haoyang Jiang, Yongzhi Qu

    Abstract: The challenge of applying learned knowledge from one domain to solve problems in another related but distinct domain, known as transfer learning, is fundamental in operator learning models that solve Partial Differential Equations (PDEs). These current models often struggle with generalization across different tasks and datasets, limiting their applicability in diverse scientific and engineering d…

    Submitted 19 August, 2024; originally announced August 2024.

  2. arXiv:2408.10161  [pdf, other]

    cs.CV cs.AI cs.RO

    NeuFlow v2: High-Efficiency Optical Flow Estimation on Edge Devices

    Authors: Zhiyong Zhang, Aniket Gupta, Huaizu Jiang, Hanumant Singh

    Abstract: Real-time high-accuracy optical flow estimation is crucial for various real-world applications. While recent learning-based optical flow methods have achieved high accuracy, they often come with significant computational costs. In this paper, we propose a highly efficient optical flow method that balances high accuracy with reduced computational demands. Building upon NeuFlow v1, we introduce new…

    Submitted 19 August, 2024; originally announced August 2024.

  3. arXiv:2408.09856  [pdf, other]

    cs.CL cs.AI

    TeamLoRA: Boosting Low-Rank Adaptation with Expert Collaboration and Competition

    Authors: Tianwei Lin, Jiang Liu, Wenqiao Zhang, Zhaocheng Li, Yang Dai, Haoyuan Li, Zhelun Yu, Wanggui He, Juncheng Li, Hao Jiang, Siliang Tang, Yueting Zhuang

    Abstract: While Parameter-Efficient Fine-Tuning (PEFT) methods like LoRA have effectively addressed GPU memory constraints during fine-tuning, their performance often falls short, especially in multidimensional task scenarios. To address this issue, one straightforward solution is to introduce task-specific LoRA modules as domain experts, leveraging the modeling of multiple experts' capabilities and thus en…

    Submitted 19 August, 2024; originally announced August 2024.

  4. arXiv:2408.09851  [pdf, other]

    cs.NI eess.SY

    ISAC-Fi: Enabling Full-fledged Monostatic Sensing over Wi-Fi Communication

    Authors: Zhe Chen, Chao Hu, Tianyue Zheng, Hangcheng Cao, Yanbing Yang, Yen Chu, Hongbo Jiang, Jun Luo

    Abstract: Whereas Wi-Fi communications have been exploited for sensing purposes for over a decade, the bistatic or multistatic nature of Wi-Fi still poses multiple challenges, hampering real-life deployment of integrated sensing and communication (ISAC) within the Wi-Fi framework. In this paper, we aim to re-design Wi-Fi so that monostatic sensing (mimicking radar) can be achieved over the multistatic communicati…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 14 pages, 22 figures

  5. arXiv:2408.09465  [pdf, other]

    cs.CV cs.AI

    MedMAP: Promoting Incomplete Multi-modal Brain Tumor Segmentation with Alignment

    Authors: Tianyi Liu, Zhaorui Tan, Muyin Chen, Xi Yang, Haochuan Jiang, Kaizhu Huang

    Abstract: Brain tumor segmentation is often based on multiple magnetic resonance imaging (MRI) modalities. However, in clinical practice, certain modalities of MRI may be missing, which presents a more difficult scenario. To cope with this challenge, Knowledge Distillation, Domain Adaption, and Shared Latent Space have emerged as commonly promising strategies. However, recent efforts typically overlook the modality ga…

    Submitted 18 August, 2024; originally announced August 2024.

  6. arXiv:2408.09441  [pdf, other]

    cs.CV

    CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

    Authors: Kaicheng Yang, Tiancheng Gu, Xiang An, Haiqiang Jiang, Xiangzi Dai, Ziyong Feng, Weidong Cai, Jiankang Deng

    Abstract: Contrastive Language-Image Pre-training (CLIP) has achieved excellent performance over a wide range of tasks. However, the effectiveness of CLIP heavily relies on a substantial corpus of pre-training data, resulting in notable consumption of computational resources. Although knowledge distillation has been widely applied in single modality models, how to efficiently expand knowledge distillation t…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 11 pages, 8 figures

  7. arXiv:2408.09123  [pdf, other]

    cs.LG math.AT

    Dynamic Neural Dowker Network: Approximating Persistent Homology in Dynamic Directed Graphs

    Authors: Hao Li, Hao Jiang, Jiajun Fan, Dongsheng Ye, Liang Du

    Abstract: Persistent homology, a fundamental technique within Topological Data Analysis (TDA), captures structural and shape characteristics of graphs, yet encounters computational difficulties when applied to dynamic directed graphs. This paper introduces the Dynamic Neural Dowker Network (DNDN), a novel framework specifically designed to approximate the results of dynamic Dowker filtration, aiming to capt…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: KDD 2024

  8. arXiv:2408.08305  [pdf, other]

    cs.CV

    Towards Flexible Visual Relationship Segmentation

    Authors: Fangrui Zhu, Jianwei Yang, Huaizu Jiang

    Abstract: Visual relationship understanding has been studied separately in human-object interaction (HOI) detection, scene graph generation (SGG), and referring relationships (RR) tasks. Given the complexity and interconnectedness of these tasks, it is crucial to have a flexible framework that can effectively address these tasks in a cohesive manner. In this work, we propose FleVRS, a single model that seamles…

    Submitted 15 August, 2024; originally announced August 2024.

  9. arXiv:2408.07444  [pdf, other]

    eess.IV cs.CV

    Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

    Authors: Senmao Wang, Haifan Gong, Runmeng Cui, Boyao Wan, Yicheng Liu, Zhonglin Hu, Haiqing Yang, Jingyang Zhou, Bo Pan, Lin Lin, Haiyue Jiang

    Abstract: Costal cartilage segmentation is crucial to various medical applications, necessitating precise and reliable techniques due to its complex anatomy and the importance of accurate diagnosis and surgical planning. We propose a novel deep learning-based approach called topology-guided deformable Mamba (TGDM) for costal cartilage segmentation. The TGDM is tailored to capture the intricate long-range co…

    Submitted 14 August, 2024; originally announced August 2024.

  10. arXiv:2408.07100  [pdf, other]

    cs.LG cs.AI

    Pattern-Matching Dynamic Memory Network for Dual-Mode Traffic Prediction

    Authors: Wenchao Weng, Mei Wu, Hanyu Jiang, Wanzeng Kong, Xiangjie Kong, Feng Xia

    Abstract: In recent years, deep learning has increasingly gained attention in the field of traffic prediction. Existing traffic prediction models often rely on GCNs or attention mechanisms with O(N^2) complexity to dynamically extract traffic node features, which lack efficiency and are not lightweight. Additionally, these models typically only utilize historical data for prediction, without considering the…

    Submitted 12 August, 2024; originally announced August 2024.

  11. arXiv:2408.06475  [pdf, other]

    cs.DS cs.CG cs.DM math.NA

    Quasi-Monte Carlo Beyond Hardy-Krause

    Authors: Nikhil Bansal, Haotian Jiang

    Abstract: The classical approaches to numerically integrating a function $f$ are Monte Carlo (MC) and quasi-Monte Carlo (QMC) methods. MC methods use random samples to evaluate $f$ and have error $O(\sigma(f)/\sqrt{n})$, where $\sigma(f)$ is the standard deviation of $f$. QMC methods are based on evaluating $f$ at explicit point sets with low discrepancy, and as given by the classical Koksma-Hlawka inequality, they h…

    Submitted 12 August, 2024; originally announced August 2024.

  12. arXiv:2408.04600  [pdf, other]

    cs.CV

    Improving Network Interpretability via Explanation Consistency Evaluation

    Authors: Hefeng Wu, Hao Jiang, Keze Wang, Ziyi Tang, Xianghuan He, Liang Lin

    Abstract: While deep neural networks have achieved remarkable performance, they tend to lack transparency in prediction. The pursuit of greater interpretability in neural networks often results in a degradation of their original performance. Some works strive to improve both interpretability and performance, but they primarily depend on meticulously imposed conditions. In this paper, we propose a simple yet…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: To appear in IEEE Transactions on Multimedia

  13. arXiv:2408.03704  [pdf, ps, other]

    cs.CR

    BioDeepHash: Mapping Biometrics into a Stable Code

    Authors: Baogang Song, Dongdong Zhao, Jiang Yan, Huanhuan Li, Hao Jiang

    Abstract: With the wide application of biometrics, more and more attention has been paid to the security of biometric templates. However, most existing biometric template protection (BTP) methods have some security problems, e.g. the problem that protected templates leak part of the original biometric data (exists in Cancelable Biometrics (CB)), the use of error-correcting codes (ECC) leads to decodable a…

    Submitted 7 August, 2024; originally announced August 2024.

  14. arXiv:2408.02013  [pdf, other]

    cs.DC

    Blockchain-Enabled Dynamic Spectrum Sharing for Satellite and Terrestrial Communication Networks

    Authors: Zixin Wang, Mingrui Cao, Hao Jiang, Bin Cao, Shuo Wang, Chen Sun, Mugen Peng

    Abstract: Dynamic spectrum sharing (DSS) between satellite and terrestrial networks has increasingly engaged the academic and industrial sectors. Nevertheless, facilitating secure, efficient and scalable sharing continues to pose a pivotal challenge. Emerging as a promising technology to bridge the trust gap among multiple participants, blockchain has been envisioned to enable DSS in a decentralized manner.…

    Submitted 4 August, 2024; originally announced August 2024.

  15. arXiv:2408.01942  [pdf, other]

    cs.AI cs.CV

    Visual Grounding for Object-Level Generalization in Reinforcement Learning

    Authors: Haobin Jiang, Zongqing Lu

    Abstract: Generalization is a pivotal challenge for agents following natural language instructions. To approach this goal, we leverage a vision-language model (VLM) for visual grounding and transfer its vision-language knowledge into reinforcement learning (RL) for object-centric tasks, which makes the agent capable of zero-shot generalization to unseen objects and instructions. By visual grounding, we obta…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 35 pages, 14 figures, 17 tables

  16. arXiv:2408.01319  [pdf, other]

    cs.AI

    A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks

    Authors: Jiaqi Wang, Hanqi Jiang, Yiheng Liu, Chong Ma, Xu Zhang, Yi Pan, Mengyuan Liu, Peiran Gu, Sichen Xia, Wenjun Li, Yutong Zhang, Zihao Wu, Zhengliang Liu, Tianyang Zhong, Bao Ge, Tuo Zhang, Ning Qiang, Xintao Hu, Xi Jiang, Xin Zhang, Wei Zhang, Dinggang Shen, Tianming Liu, Shu Zhang

    Abstract: In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data types-including text, images, videos, audio, and physiological sequences-MLLMs address the complexities of real-world applications far beyond the capabilities of…

    Submitted 2 August, 2024; originally announced August 2024.

  17. arXiv:2408.00114  [pdf, other]

    cs.AI

    Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs

    Authors: Kewei Cheng, Jingfeng Yang, Haoming Jiang, Zhengyang Wang, Binxuan Huang, Ruirui Li, Shiyang Li, Zheng Li, Yifan Gao, Xian Li, Bing Yin, Yizhou Sun

    Abstract: Reasoning encompasses two typical types: deductive reasoning and inductive reasoning. Despite extensive research into the reasoning capabilities of Large Language Models (LLMs), most studies have failed to rigorously differentiate between inductive and deductive reasoning, leading to a blending of the two. This raises an essential question: In LLM reasoning, which poses a greater challenge - deduc…

    Submitted 6 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  18. arXiv:2407.21316  [pdf, other]

    cs.CR cs.LG

    Diff-Cleanse: Identifying and Mitigating Backdoor Attacks in Diffusion Models

    Authors: Jiang Hao, Xiao Jin, Hu Xiaoguang, Chen Tianyou

    Abstract: Diffusion models (DM) represent one of the most advanced generative models today, yet recent studies suggest that DMs are vulnerable to backdoor attacks. Backdoor attacks establish hidden associations between particular input patterns and model behaviors, compromising model integrity by triggering undesirable actions with manipulated input data. This vulnerability poses substantial risks, includin…

    Submitted 30 July, 2024; originally announced July 2024.

  19. arXiv:2407.21308  [pdf, other]

    cs.CV

    Enhanced Self-Checkout System for Retail Based on Improved YOLOv10

    Authors: Lianghao Tan, Shubing Liu, Jing Gao, Xiaoyi Liu, Linyue Chu, Huangqi Jiang

    Abstract: With the rapid advancement of deep learning technologies, computer vision has shown immense potential in retail automation. This paper presents a novel self-checkout system for retail based on an improved YOLOv10 network, aimed at enhancing checkout efficiency and reducing labor costs. We propose targeted optimizations to the YOLOv10 model, by incorporating the detection head structure from YOLOv8…

    Submitted 15 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  20. Enhancing CTR Prediction through Sequential Recommendation Pre-training: Introducing the SRP4CTR Framework

    Authors: Ruidong Han, Qianzhong Li, He Jiang, Rui Li, Yurou Zhao, Xiang Li, Wei Lin

    Abstract: Understanding user interests is crucial for Click-Through Rate (CTR) prediction tasks. In sequential recommendation, pre-training from user historical behaviors through self-supervised learning can better comprehend user dynamic preferences, presenting the potential for direct integration with CTR tasks. Previous methods have integrated pre-trained models into downstream tasks with the sole purpos…

    Submitted 28 July, 2024; originally announced July 2024.

  21. arXiv:2407.19471  [pdf, other]

    cs.CV

    On the Evaluation Consistency of Attribution-based Explanations

    Authors: Jiarui Duan, Haoling Li, Haofei Zhang, Hao Jiang, Mengqi Xue, Li Sun, Mingli Song, Jie Song

    Abstract: Attribution-based explanations are garnering increasing attention recently and have emerged as the predominant approach towards eXplanable Artificial Intelligence (XAI). However, the absence of consistent configurations and systematic investigations in prior literature impedes comprehensive evaluations of existing methodologies. In this work, we introduce Meta-Rank, an open platform for…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted as a conference paper by ECCV 2024

  22. arXiv:2407.17470  [pdf, other]

    cs.CV

    SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency

    Authors: Yiming Xie, Chun-Han Yao, Vikram Voleti, Huaizu Jiang, Varun Jampani

    Abstract: We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. Unlike previous methods that rely on separately trained generative models for video generation and novel view synthesis, we design a unified diffusion model to generate novel view videos of dynamic 3D objects. Specifically, given a monocular reference video, SV…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Project page: https://sv4d.github.io/

  23. arXiv:2407.16944  [pdf, ps, other]

    cs.LG

    Adaptive Gradient Regularization: A Faster and Generalizable Optimization Technique for Deep Neural Networks

    Authors: Huixiu Jiang, Ling Yang, Yu Bao, Rutong Si, Sikun Yang

    Abstract: Stochastic optimization plays a crucial role in the advancement of deep learning technologies. Over the decades, significant effort has been dedicated to improving the training efficiency and robustness of deep neural networks, via various strategies including gradient normalization (GN) and gradient centralization (GC). Nevertheless, to the best of our knowledge, no one has considered to capture…

    Submitted 19 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: 12 pages, 13 figures

  24. arXiv:2407.15071  [pdf, other]

    cs.DB cs.CL

    Relational Database Augmented Large Language Model

    Authors: Zongyue Qin, Chen Luo, Zhengyang Wang, Haoming Jiang, Yizhou Sun

    Abstract: Large language models (LLMs) excel in many natural language processing (NLP) tasks. However, since LLMs can only incorporate new knowledge through training or supervised fine-tuning processes, they are unsuitable for applications that demand precise, up-to-date, and private information not available in the training corpora. This precise, up-to-date, and private information is typically stored in r…

    Submitted 21 July, 2024; originally announced July 2024.

  25. arXiv:2407.14770  [pdf, other]

    cs.HC

    SLInterpreter: An Exploratory and Iterative Human-AI Collaborative System for GNN-based Synthetic Lethal Prediction

    Authors: Haoran Jiang, Shaohan Shi, Shuhao Zhang, Jie Zheng, Quan Li

    Abstract: Synthetic Lethal (SL) relationships, though rare among the vast array of gene combinations, hold substantial promise for targeted cancer therapy. Despite advancements in AI model accuracy, there is still a significant need among domain experts for interpretive paths and mechanism explorations that align better with domain-specific knowledge, particularly due to the high costs of experimentation. T…

    Submitted 20 July, 2024; originally announced July 2024.

  26. arXiv:2407.13338  [pdf, other]

    cs.CV

    Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM

    Authors: Baicheng Li, Zike Yan, Dong Wu, Hanqing Jiang, Hongbin Zha

    Abstract: Simultaneous localization and mapping (SLAM) with implicit neural representations has received extensive attention due to the expressive representation power and the innovative paradigm of continual learning. However, deploying such a system within a dynamic environment has not been well-studied. Such challenges are intractable even for conventional algorithms since observations from different vie…

    Submitted 18 July, 2024; originally announced July 2024.

  27. arXiv:2407.12783  [pdf, other]

    cs.CV cs.GR

    SMooDi: Stylized Motion Diffusion Model

    Authors: Lei Zhong, Yiming Xie, Varun Jampani, Deqing Sun, Huaizu Jiang

    Abstract: We introduce a novel Stylized Motion Diffusion model, dubbed SMooDi, to generate stylized motion driven by content texts and style motion sequences. Unlike existing methods that either generate motion of various content or transfer style from one sequence to another, SMooDi can rapidly generate motion across a broad range of content and diverse styles. To this end, we tailor a pre-trained text-to-…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project page: https://neu-vi.github.io/SMooDi/

  28. arXiv:2407.12613  [pdf, other]

    cs.HC cs.CL

    AudienceView: AI-Assisted Interpretation of Audience Feedback in Journalism

    Authors: William Brannon, Doug Beeferman, Hang Jiang, Andrew Heyward, Deb Roy

    Abstract: Understanding and making use of audience feedback is important but difficult for journalists, who now face an impractically large volume of audience comments online. We introduce AudienceView, an online tool to help journalists categorize and interpret this feedback by leveraging large language models (LLMs). AudienceView identifies themes and topics, connects them back to specific comments, provi…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted at CSCW Demo 2024. 5 pages, 2 figures

  29. arXiv:2407.12037  [pdf, other]

    cs.AR cs.SE

    A Novel HDL Code Generator for Effectively Testing FPGA Logic Synthesis Compilers

    Authors: Zhihao Xu, Shikai Guo, Guilin Zhao, Peiyu Zou, Xiaochen Li, He Jiang

    Abstract: Field Programmable Gate Array (FPGA) logic synthesis compilers (e.g., Vivado, Iverilog, Yosys, and Quartus) are widely applied in Electronic Design Automation (EDA), such as the development of FPGA programs. However, defects (i.e., incorrect synthesis) in logic synthesis compilers may lead to unexpected behaviors in target applications, posing security risks. Therefore, it is crucial to thoroughly…

    Submitted 1 July, 2024; originally announced July 2024.

  30. arXiv:2407.11100  [pdf, other]

    cs.CR cs.AI cs.CL

    Building Intelligence Identification System via Large Language Model Watermarking: A Survey and Beyond

    Authors: Xuhong Wang, Haoyu Jiang, Yi Yu, Jingru Yu, Yilun Lin, Ping Yi, Yingchun Wang, Yu Qiao, Li Li, Fei-Yue Wang

    Abstract: Large Language Models (LLMs) are increasingly integrated into diverse industries, posing substantial security risks due to unauthorized replication and misuse. To mitigate these concerns, robust identification mechanisms are widely acknowledged as an effective strategy. Identification systems for LLMs now rely heavily on watermarking technology to manage and protect intellectual property and ensur…

    Submitted 24 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 59 pages, 7 figures

  31. arXiv:2407.10976  [pdf, other]

    cs.NI cs.LG eess.SP stat.AP

    Learning Cellular Network Connection Quality with Conformal

    Authors: Hanyang Jiang, Elizabeth Belding, Ellen Zegure, Yao Xie

    Abstract: In this paper, we address the problem of uncertainty quantification for cellular network speed. It is a well-known fact that the actual internet speed experienced by a mobile phone can fluctuate significantly, even when remaining in a single location. This high degree of variability underscores that mere point estimation of network speed is insufficient. Rather, it is advantageous to establish a p…

    Submitted 4 June, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2311.05641

  32. arXiv:2407.10811  [pdf, other]

    cs.MA cs.AI cs.LG

    GuideLight: "Industrial Solution" Guidance for More Practical Traffic Signal Control Agents

    Authors: Haoyuan Jiang, Xuantang Xiong, Ziyue Li, Hangyu Mao, Guanghu Sui, Jingqing Ruan, Yuheng Cheng, Hua Wei, Wolfgang Ketter, Rui Zhao

    Abstract: Currently, traffic signal control (TSC) methods based on reinforcement learning (RL) have proven superior to traditional methods. However, most RL methods face difficulties when applied in the real world due to three factors: input, output, and the cycle-flow relation. The industry's observable input is much more limited than simulation-based RL methods. For real-world solutions, only flow can be…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Under Review of IEEE Transactions on Intelligent Transportation Systems

  33. arXiv:2407.09661  [pdf, other]

    cs.HC cs.CL

    Bridging Dictionary: AI-Generated Dictionary of Partisan Language Use

    Authors: Hang Jiang, Doug Beeferman, William Brannon, Andrew Heyward, Deb Roy

    Abstract: Words often carry different meanings for people from diverse backgrounds. Today's era of social polarization demands that we choose words carefully to prevent miscommunication, especially in political communication and journalism. To address this issue, we introduce the Bridging Dictionary, an interactive tool designed to illuminate how words are perceived by people with different political views.…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted to CSCW Demo 2024

  34. arXiv:2407.08990  [pdf, other]

    cs.AR cs.AI cs.ET cs.NE

    Dynamic neural network with memristive CIM and CAM for 2D and 3D vision

    Authors: Yue Zhang, Woyu Zhang, Shaocong Wang, Ning Lin, Yifei Yu, Yangu He, Bo Wang, Hao Jiang, Peng Lin, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

    Abstract: The brain is dynamic, associative and efficient. It reconfigures by associating the inputs with past experiences, with fused memory and processing. In contrast, AI models are static, unable to associate inputs with past experiences, and run on digital computers with physically separated memory and processing. We propose a hardware-software co-design, a semantic memory-based dynamic neural network…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: In press

  35. arXiv:2407.08939  [pdf, other]

    cs.CV

    LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models

    Authors: Hai Jiang, Ao Luo, Xiaohong Liu, Songchen Han, Shuaicheng Liu

    Abstract: In this paper, we propose a diffusion-based unsupervised framework that incorporates physically explainable Retinex theory with diffusion models for low-light image enhancement, named LightenDiffusion. Specifically, we present a content-transfer decomposition network that performs Retinex decomposition within the latent space instead of image space as in previous approaches, enabling the encoded f…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  36. arXiv:2407.08848  [pdf, other]

    cs.RO

    GCS*: Forward Heuristic Search on Implicit Graphs of Convex Sets

    Authors: Shao Yuan Chew Chia, Rebecca H. Jiang, Bernhard Paus Graesdal, Leslie Pack Kaelbling, Russ Tedrake

    Abstract: We consider large-scale, implicit-search-based solutions to the Shortest Path Problems on Graphs of Convex Sets (GCS). We propose GCS*, a forward heuristic search algorithm that generalizes A* search to the GCS setting, where a continuous-valued decision is made at each graph vertex, and constraints across graph edges couple these decisions, influencing costs and feasibility. Such mixed discrete-c…

    Submitted 11 July, 2024; originally announced July 2024.

  37. arXiv:2407.07614  [pdf, other]

    cs.CV

    MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

    Authors: Wanggui He, Siming Fu, Mushui Liu, Xierui Wang, Wenyi Xiao, Fangxun Shu, Yi Wang, Lei Zhang, Zhelun Yu, Haoyuan Li, Ziwei Huang, LeiLei Gan, Hao Jiang

    Abstract: Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis. In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). This innovative component integrates pre-trained LLMs by in…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 14 pages, 9 figures

  38. arXiv:2407.07457  [pdf, other]

    cs.LG cs.CL

    GLBench: A Comprehensive Benchmark for Graph with Large Language Models

    Authors: Yuhan Li, Peisong Wang, Xiao Zhu, Aochuan Chen, Haiyun Jiang, Deng Cai, Victor Wai Kin Chan, Jia Li

    Abstract: The emergence of large language models (LLMs) has revolutionized the way we interact with graphs, leading to a new paradigm called GraphLLM. Despite the rapid development of GraphLLM methods in recent years, the progress and understanding of this field remain unclear due to the lack of a benchmark with consistent experimental protocols. To bridge this gap, we introduce GLBench, the first comprehen…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.10280 by other authors

  39. arXiv:2407.05758  [pdf, other]

    eess.IV cs.AI cs.CV

    Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

    Authors: Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zhengliang Liu, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang

    Abstract: Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecti…

    Submitted 8 July, 2024; originally announced July 2024.

  40. arXiv:2407.04041  [pdf, other]

    cs.CV

    Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation

    Authors: Laiyan Ding, Hualie Jiang, Jie Li, Yongquan Chen, Rui Huang

    Abstract: Depth estimation is a cornerstone for autonomous driving, yet acquiring per-pixel depth ground truth for supervised learning is challenging. Self-Supervised Surround Depth Estimation (SSSDE) from consecutive images offers an economical alternative. While previous SSSDE methods have proposed different mechanisms to fuse information across images, few of them explicitly consider the cross-view const…

    Submitted 4 July, 2024; originally announced July 2024.

  41. arXiv:2407.02490  [pdf, other]

    cs.CL cs.LG

    MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention

    Authors: Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H. Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, Lili Qiu

    Abstract: The computational challenges of Large Language Model (LLM) inference remain a significant barrier to their widespread deployment, especially as prompt lengths continue to increase. Due to the quadratic complexity of the attention computation, it takes 30 minutes for an 8B LLM to process a prompt of 1M tokens (i.e., the pre-filling stage) on a single A100 GPU. Existing methods for speeding up prefi…

    Submitted 2 July, 2024; originally announced July 2024.

  42. arXiv:2407.01303  [pdf, other]

    cs.RO

    RoDyn-SLAM: Robust Dynamic Dense RGB-D SLAM with Neural Radiance Fields

    Authors: Haochen Jiang, Yueming Xu, Kejie Li, Jianfeng Feng, Li Zhang

    Abstract: Leveraging neural implicit representation to conduct dense RGB-D SLAM has been studied in recent years. However, this approach relies on a static environment assumption and does not work robustly within a dynamic environment due to the inconsistent observation of geometry and photometry. To address the challenges presented in dynamic environments, we propose a novel dynamic SLAM framework with neu…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: IEEE RAL 2024

  43. arXiv:2406.20077  [pdf, other]

    cs.CV

    HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Model

    Authors: Hieu T. Nguyen, Yiwen Chen, Vikram Voleti, Varun Jampani, Huaizu Jiang

    Abstract: We introduce HouseCrafter, a novel approach that can lift a floorplan into a complete large 3D indoor scene (e.g., a house). Our key insight is to adapt a 2D diffusion model, which is trained on web-scale images, to generate consistent multi-view color (RGB) and depth (D) images across different locations of the scene. Specifically, the RGB-D images are generated autoregressively in a batch-wise m…

    Submitted 28 June, 2024; originally announced June 2024.

  44. arXiv:2406.19756  [pdf, other]

    cs.CV cs.AI

    Structure-aware World Model for Probe Guidance via Large-scale Self-supervised Pre-train

    Authors: Haojun Jiang, Meng Li, Zhenguo Sun, Ning Jia, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

    Abstract: The complex structure of the heart leads to significant challenges in echocardiography, especially in acquisition cardiac ultrasound images. Successful echocardiography requires a thorough understanding of the structures on the two-dimensional plane and the spatial relationships between planes in three-dimensional space. In this paper, we innovatively propose a large-scale self-supervised pre-trai…

    Submitted 19 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI 2024 ASMUS Workshop

  45. arXiv:2406.15568  [pdf, other]

    cs.LG

    Robust Reinforcement Learning from Corrupted Human Feedback

    Authors: Alexander Bukharin, Ilgee Hong, Haoming Jiang, Zichong Li, Qingru Zhang, Zixuan Zhang, Tuo Zhao

    Abstract: Reinforcement learning from human feedback (RLHF) provides a principled framework for aligning AI systems with human preference data. For various reasons, e.g., personal bias, context ambiguity, lack of training, etc, human annotators may give incorrect or inconsistent preference labels. To tackle this challenge, we propose a robust RLHF approach -- $R^3M$, which models the potentially corrupted p…

    Submitted 9 July, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

    Comments: 22 pages, 7 figures

  46. arXiv:2406.14230  [pdf, other]

    cs.CL cs.AI cs.CY

    Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing

    Authors: Han Jiang, Xiaoyuan Yi, Zhihua Wei, Shu Wang, Xing Xie

    Abstract: Warning: this paper contains model outputs exhibiting unethical information. Large Language Models (LLMs) have achieved significant breakthroughs, but their generated unethical content poses potential risks. Measuring value alignment of LLMs becomes crucial for their regulation and responsible deployment. Numerous datasets have been constructed to assess social bias, toxicity, and ethics in LLMs,…

    Submitted 11 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress

  47. arXiv:2406.13897  [pdf, other]

    cs.CV

    CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets

    Authors: Longwen Zhang, Ziyu Wang, Qixuan Zhang, Qiwei Qiu, Anqi Pang, Haoran Jiang, Wei Yang, Lan Xu, Jingyi Yu

    Abstract: In the realm of digital creativity, our potential to craft intricate 3D worlds from imagination is often hampered by the limitations of existing digital tools, which demand extensive expertise and efforts. To narrow this disparity, we introduce CLAY, a 3D geometry and material generator designed to effortlessly transform human imagination into intricate 3D digital structures. CLAY supports classic…

    Submitted 30 May, 2024; originally announced June 2024.

    Comments: Project page: https://sites.google.com/view/clay-3dlm Video: https://youtu.be/YcKFp4U2Voo

  48. arXiv:2406.13165  [pdf, other]

    eess.IV cs.AI cs.CV cs.RO

    Cardiac Copilot: Automatic Probe Guidance for Echocardiography with World Model

    Authors: Haojun Jiang, Zhenguo Sun, Ning Jia, Meng Li, Yu Sun, Shaqi Luo, Shiji Song, Gao Huang

    Abstract: Echocardiography is the only technique capable of real-time imaging of the heart and is vital for diagnosing the majority of cardiac diseases. However, there is a severe shortage of experienced cardiac sonographers, due to the heart's complex structure and significant operational challenges. To mitigate this situation, we present a Cardiac Copilot system capable of providing real-time probe moveme…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Early Accepted by MICCAI 2024

  49. arXiv:2406.13050  [pdf, other]

    cs.CL

    Think-then-Act: A Dual-Angle Evaluated Retrieval-Augmented Generation

    Authors: Yige Shen, Hao Jiang, Hua Qu, Jihong Zhao

    Abstract: Despite their impressive capabilities, large language models (LLMs) often face challenges such as temporal misalignment and generating hallucinatory content. Enhancing LLMs with retrieval mechanisms to fetch relevant information from external sources offers a promising solution. Inspired by the proverb "Think twice before you act," we propose a dual-angle evaluated retrieval-augmented generation f…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 12 pages, 8 figures

  50. arXiv:2406.12303  [pdf, other]

    cs.CV

    Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment

    Authors: Yiheng Li, Heyang Jiang, Akio Kodaira, Masayoshi Tomizuka, Kurt Keutzer, Chenfeng Xu

    Abstract: In this paper, we point out suboptimal noise-data mapping leads to slow training of diffusion models. During diffusion training, current methods diffuse each image across the entire noise space, resulting in a mixture of all images at every point in the noise layer. We emphasize that this random mixture of noise-data mapping complicates the optimization of the denoising function in diffusion model…

    Submitted 18 June, 2024; originally announced June 2024.