Showing 1–50 of 685 results for author: Du, Y

  1. arXiv:2408.09675  [pdf, other]

    cs.AI cs.MA cs.RO

    Multi-Agent Reinforcement Learning for Autonomous Driving: A Survey

    Authors: Ruiqi Zhang, Jing Hou, Florian Walter, Shangding Gu, Jiayi Guan, Florian Röhrbein, Yali Du, Panpan Cai, Guang Chen, Alois Knoll

    Abstract: Reinforcement Learning (RL) is a potent tool for sequential decision-making and has achieved performance surpassing human capabilities across many challenging real-world tasks. As the extension of RL in the multi-agent system domain, multi-agent RL (MARL) not only needs to learn the control policy but also requires consideration regarding interactions with all other agents in the environment, mutua…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 23 pages, 6 figures and 2 tables. Submitted to IEEE Journal

  2. arXiv:2408.08230  [pdf, other]

    cs.AI cs.LG

    Explaining an Agent's Future Beliefs through Temporally Decomposing Future Reward Estimators

    Authors: Mark Towers, Yali Du, Christopher Freeman, Timothy J. Norman

    Abstract: Future reward estimation is a core component of reinforcement learning agents; i.e., Q-value and state-value functions, predicting an agent's sum of future rewards. Their scalar output, however, obfuscates when or what individual future rewards an agent may expect to receive. We address this by modifying an agent's future reward estimator to predict their next N expected rewards, referred to as Te…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 7 pages + 3 pages of supplementary material. Published at ECAI 2024

    Journal ref: ECAI 2024

  3. arXiv:2408.07009  [pdf, other]

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google: Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh, et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  4. arXiv:2408.06185  [pdf, other]

    eess.SY cs.CY cs.GT cs.NI

    Hi-SAM: A high-scalable authentication model for satellite-ground Zero-Trust system using mean field game

    Authors: Xuesong Wu, Tianshuai Zheng, Runfang Wu, Jie Ren, Junyan Guo, Ye Du

    Abstract: As more and more Internet of Things (IoT) devices are connected to satellite networks, the Zero-Trust Architecture brings dynamic security to the satellite-ground system, while frequent authentication creates challenges for system availability. To make the system accommodate more IoT devices, this paper proposes a high-scalable authentication model (Hi-SAM). Hi-SAM introduces the Proof-of-Work id…

    Submitted 12 August, 2024; originally announced August 2024.

  5. arXiv:2408.05706  [pdf, other]

    cs.CV

    Decoder Pre-Training with only Text for Scene Text Recognition

    Authors: Shuai Zhao, Yongkun Du, Zhineng Chen, Yu-Gang Jiang

    Abstract: Scene text recognition (STR) pre-training methods have achieved remarkable progress, primarily relying on synthetic datasets. However, the domain gap between synthetic and real images poses a challenge in acquiring feature representations that align well with images on real scenes, thereby limiting the performance of these methods. We note that vision-language models like CLIP, pre-trained on exte…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024

  6. arXiv:2408.05285  [pdf, other]

    cs.LG cs.AI

    Semi-Supervised One-Shot Imitation Learning

    Authors: Philipp Wu, Kourosh Hakhamaneshi, Yuqing Du, Igor Mordatch, Aravind Rajeswaran, Pieter Abbeel

    Abstract: One-shot Imitation Learning (OSIL) aims to imbue AI agents with the ability to learn a new task from a single demonstration. To supervise the learning, OSIL typically requires a prohibitively large number of paired expert demonstrations -- i.e. trajectories corresponding to different variations of the same semantic task. To overcome this limitation, we introduce the semi-supervised OSIL problem se…

    Submitted 9 August, 2024; originally announced August 2024.

    Journal ref: Reinforcement Learning Journal 1 (2024)

  7. arXiv:2408.04380  [pdf, other]

    cs.RO cs.LG

    Deep Generative Models in Robotics: A Survey on Learning from Multimodal Demonstrations

    Authors: Julen Urain, Ajay Mandlekar, Yilun Du, Mahi Shafiullah, Danfei Xu, Katerina Fragkiadaki, Georgia Chalvatzaki, Jan Peters

    Abstract: Learning from Demonstrations, the field that proposes to learn robot behavior models from data, is gaining popularity with the emergence of deep generative models. Although the problem has been studied for years under names such as Imitation Learning, Behavioral Cloning, or Inverse Reinforcement Learning, classical methods have relied on models that don't capture complex data distributions well or…

    Submitted 20 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 20 pages, 11 figures, submitted to TRO

  8. arXiv:2408.04171  [pdf, other]

    cs.CV

    Rotation center identification based on geometric relationships for rotary motion deblurring

    Authors: Jinhui Qin, Yong Ma, Jun Huang, Fan Fan, You Du

    Abstract: Non-blind rotary motion deblurring (RMD) aims to recover the latent clear image from a rotary motion blurred (RMB) image. The rotation center is a crucial input parameter in non-blind RMD methods. Existing methods directly estimate the rotation center from the RMB image. However, they always suffer from significant errors, and the performance of RMD is limited. For the assembled imaging systems, the pos…

    Submitted 7 August, 2024; originally announced August 2024.

  9. arXiv:2408.03574  [pdf, other]

    cs.CV cs.CL cs.LG

    Teach CLIP to Develop a Number Sense for Ordinal Regression

    Authors: Yao Du, Qiang Zhai, Weihang Dai, Xiaomeng Li

    Abstract: Ordinal regression is a fundamental problem within the field of computer vision, with customised well-trained models on specific tasks. While pre-trained vision-language models (VLMs) have exhibited impressive performance on various vision tasks, their potential for ordinal regression has received less exploration. In this study, we first investigate CLIP's potential for ordinal regression, from w…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024

  10. arXiv:2408.02039  [pdf, other]

    cs.CV

    Pixel-Level Domain Adaptation: A New Perspective for Enhancing Weakly Supervised Semantic Segmentation

    Authors: Ye Du, Zehua Fu, Qingjie Liu

    Abstract: Recent attention has been devoted to the pursuit of learning semantic segmentation models exclusively from image tags, a paradigm known as image-level Weakly Supervised Semantic Segmentation (WSSS). Existing attempts adopt the Class Activation Maps (CAMs) as priors to mine object regions yet observe the imbalanced activation issue, where only the most discriminative object parts are located. In th…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 15 pages, 9 figures

  11. arXiv:2408.01090  [pdf, other]

    cs.CL cs.AR cs.NE

    General-purpose Dataflow Model with Neuromorphic Primitives

    Authors: Weihao Zhang, Yu Du, Hongyi Li, Songchen Ma, Rong Zhao

    Abstract: Neuromorphic computing exhibits great potential to provide high-performance benefits in various applications beyond neural networks. However, a general-purpose program execution model that aligns with the features of neuromorphic computing is required to bridge the gap between program versatility and neuromorphic hardware efficiency. The dataflow model offers a potential solution, but it faces hig…

    Submitted 2 August, 2024; originally announced August 2024.

  12. arXiv:2407.21011  [pdf, other]

    cs.CV cs.AI cs.LG

    CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning

    Authors: Yuexi Du, Brian Chang, Nicha C. Dvornek

    Abstract: Recent advancements in Contrastive Language-Image Pre-training (CLIP) have demonstrated notable success in self-supervised representation learning across various tasks. However, the existing CLIP-like approaches often demand extensive GPU resources and prolonged training times due to the considerable size of the model and dataset, making them poor for medical applications, in which large datasets…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by MICCAI 2024

  13. arXiv:2407.15111  [pdf, other]

    cs.CV

    D$^4$-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On

    Authors: Zhaotong Yang, Zicheng Jiang, Xinzhe Li, Huiyu Zhou, Junyu Dong, Huaidong Zhang, Yong Du

    Abstract: In this paper, we introduce D$^4$-VTON, an innovative solution for image-based virtual try-on. We address challenges from previous studies, such as semantic inconsistencies before and after garment warping, and reliance on static, annotation-driven clothing parsers. Additionally, we tackle the complexities in diffusion-based VTON models when handling simultaneous tasks like inpainting and denoisin…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  14. arXiv:2407.13622  [pdf, other]

    cs.LG cs.AI

    Misspecified $Q$-Learning with Sparse Linear Function Approximation: Tight Bounds on Approximation Error

    Authors: Ally Yalei Du, Lin F. Yang, Ruosong Wang

    Abstract: The recent work by Dong & Yang (2023) showed for misspecified sparse linear bandits, one can obtain an $O\left(ε\right)$-optimal policy using a polynomial number of samples when the sparsity is a constant, where $ε$ is the misspecification error. This result is in sharp contrast to misspecified linear bandits without sparsity, which require an exponential number of samples to get the same guarante…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 21 pages

  15. arXiv:2407.13168  [pdf, other]

    cs.AI cs.CL

    SciCode: A Research Coding Benchmark Curated by Scientists

    Authors: Minyang Tian, Luyu Gao, Shizhuo Dylan Zhang, Xinan Chen, Cunwei Fan, Xuefei Guo, Roland Haas, Pan Ji, Kittithat Krongchon, Yao Li, Shengyan Liu, Di Luo, Yutao Ma, Hao Tong, Kha Trinh, Chenyu Tian, Zihan Wang, Bohao Wu, Yanyu Xiong, Shengzhu Yin, Minhui Zhu, Kilian Lieret, Yanxin Lu, Genglin Liu, Yufeng Du, et al. (5 additional authors not shown)

    Abstract: Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generate code for solving real scientific research problems. Incorporating input from scientists and AI researchers in 16 diverse natural science sub-fields,…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 25 pages, 9 figures, 7 tables

  16. arXiv:2407.12505  [pdf, other]

    cs.LG cs.AI cs.RO

    Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments

    Authors: Runfa Chen, Ling Wang, Yu Du, Tianrui Xue, Fuchun Sun, Jianwei Zhang, Wenbing Huang

    Abstract: Learning policies for multi-entity systems in 3D environments is far more complicated than in single-entity scenarios, due to the exponential expansion of the global state space as the number of entities increases. One potential solution for alleviating the exponential complexity is dividing the global space into independent local views that are invariant to transformations including translations a…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  17. arXiv:2407.12317  [pdf, other]

    cs.CV

    Out of Length Text Recognition with Sub-String Matching

    Authors: Yongkun Du, Zhineng Chen, Caiyan Jia, Xieping Gao, Yu-Gang Jiang

    Abstract: Scene Text Recognition (STR) methods have demonstrated robust performance in word-level text recognition. However, in real applications the text image is sometimes long because multiple horizontal words are detected together. This creates the need to build long text recognition models from readily available short (i.e., word-level) text datasets, which has been less studied previously. In this paper,…

    Submitted 13 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: Preprint, 16 pages

  18. arXiv:2407.11333  [pdf, other]

    cs.RO cs.SD eess.AS

    Disentangled Acoustic Fields For Multimodal Physical Scene Understanding

    Authors: Jie Yin, Andrew Luo, Yilun Du, Anoop Cherian, Tim K. Marks, Jonathan Le Roux, Chuang Gan

    Abstract: We study the problem of multimodal physical scene understanding, where an embodied agent needs to find fallen objects by inferring object properties, direction, and distance of an impact sound source. Previous works adopt feed-forward neural networks to directly regress the variables from sound, leading to poor generalization and domain adaptation issues. In this paper, we illustrate that learning…

    Submitted 15 July, 2024; originally announced July 2024.

  19. arXiv:2407.06494  [pdf, other]

    cs.LG cs.AI

    A Generative Approach to Control Complex Physical Systems

    Authors: Long Wei, Peiyan Hu, Ruiqi Feng, Haodong Feng, Yixuan Du, Tao Zhang, Rui Wang, Yue Wang, Zhi-Ming Ma, Tailin Wu

    Abstract: Controlling the evolution of complex physical systems is a fundamental task across science and engineering. Classical techniques suffer from limited applicability or huge computational costs. On the other hand, recent deep learning and reinforcement learning-based approaches often struggle to optimize long-term control sequences under the constraints of system dynamics. In this work, we introduce…

    Submitted 8 July, 2024; originally announced July 2024.

  20. arXiv:2407.06169  [pdf, other]

    cs.RO cs.CV cs.LG

    Potential Based Diffusion Motion Planning

    Authors: Yunhao Luo, Chen Sun, Joshua B. Tenenbaum, Yilun Du

    Abstract: Effective motion planning in high dimensional spaces is a long-standing open problem in robotics. One class of traditional motion planning algorithms corresponds to potential-based motion planning. An advantage of potential based motion planning is composability -- different motion constraints can be easily combined by adding corresponding potentials. However, constructing motion paths from potent…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: ICML 2024. Project page and code at https://energy-based-model.github.io/potential-motion-plan/

  21. arXiv:2407.04842  [pdf, other]

    cs.CV cs.CL cs.LG

    MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?

    Authors: Zhaorun Chen, Yichao Du, Zichen Wen, Yiyang Zhou, Chenhang Cui, Zhenzhen Weng, Haoqin Tu, Chaoqi Wang, Zhengwei Tong, Qinglan Huang, Canyu Chen, Qinghao Ye, Zhihong Zhu, Yuqing Zhang, Jiawei Zhou, Zhuokai Zhao, Rafael Rafailov, Chelsea Finn, Huaxiu Yao

    Abstract: While text-to-image models like DALLE-3 and Stable Diffusion are rapidly proliferating, they often encounter challenges such as hallucination, bias, and the production of unsafe, low-quality output. To effectively address these issues, it is crucial to align these models with desired behaviors based on feedback from a multimodal judge. Despite their significance, current multimodal judges frequent…

    Submitted 5 July, 2024; originally announced July 2024.

    Comments: 42 pages, 13 figures, 33 tables

  22. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  23. arXiv:2407.03719  [pdf, other]

    cs.CV

    Relative Difficulty Distillation for Semantic Segmentation

    Authors: Dong Liang, Yue Sun, Yun Du, Songcan Chen, Sheng-Jun Huang

    Abstract: Current knowledge distillation (KD) methods primarily focus on transferring various structured knowledge and designing corresponding optimization goals to encourage the student network to imitate the output of the teacher network. However, introducing too many additional optimization objectives may lead to unstable training, such as gradient conflicts. Moreover, these methods ignored the guideline…

    Submitted 4 July, 2024; originally announced July 2024.

  24. arXiv:2407.03442  [pdf, other]

    cs.CV

    Fisher-aware Quantization for DETR Detectors with Critical-category Objectives

    Authors: Huanrui Yang, Yafeng Huang, Zhen Dong, Denis A Gudovskiy, Tomoyuki Okuno, Yohei Nakata, Yuan Du, Kurt Keutzer, Shanghang Zhang

    Abstract: The impact of quantization on the overall performance of deep learning models is a well-studied problem. However, understanding and mitigating its effects on a more fine-grained level is still lacking, especially for harder tasks such as object detection with both classification and regression objectives. This work defines the performance for a subset of task-critical categories, i.e. the critical…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Poster presentation at the 2nd Workshop on Advancing Neural Network Training: Computational Efficiency, Scalability, and Resource Optimization (WANT@ICML 2024)

  25. arXiv:2407.02913  [pdf, other]

    cs.LG cs.AI eess.IV eess.SP math.NA

    SFC: Achieve Accurate Fast Convolution under Low-precision Arithmetic

    Authors: Liulu He, Yufei Zhao, Rui Gao, Yuan Du, Li Du

    Abstract: Fast convolution algorithms, including Winograd and FFT, can efficiently accelerate convolution operations in deep models. However, these algorithms depend on high-precision arithmetic to maintain inference accuracy, which conflicts with the model quantization. To resolve this conflict and further improve the efficiency of quantized convolution, we propose SFC, a new algebra transform for fast co…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  26. arXiv:2407.01392  [pdf, other]

    cs.LG cs.CV cs.RO

    Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

    Authors: Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

    Abstract: This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens without fully diffusing past ones. Our approach is shown to combine the strengths of…

    Submitted 4 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Project website: https://boyuan.space/diffusion-forcing Code: https://github.com/buoyancy99/diffusion-forcing

  27. arXiv:2406.19298  [pdf, other]

    cs.CV cs.LG

    Compositional Image Decomposition with Diffusion Models

    Authors: Jocelin Su, Nan Liu, Yanbo Wang, Joshua B. Tenenbaum, Yilun Du

    Abstract: Given an image of a natural scene, we are able to quickly decompose it into a set of components such as objects, lighting, shadows, and foreground. We can then envision a scene where we combine certain components with those from other images, for instance a set of objects from our bedroom and animals from a zoo under the lighting conditions of a forest, even if we have never encountered such a sce…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: ICML 2024, Webpage: https://energy-based-model.github.io/decomp-diffusion

  28. arXiv:2406.18020  [pdf, other]

    cs.LG cs.AI physics.chem-ph

    MolFusion: Multimodal Fusion Learning for Molecular Representations via Multi-granularity Views

    Authors: Muzhen Cai, Sendong Zhao, Haochun Wang, Yanrui Du, Zewen Qiang, Bing Qin, Ting Liu

    Abstract: Artificial Intelligence predicts drug properties by encoding drug molecules, aiding in the rapid screening of candidates. Different molecular representations, such as SMILES and molecule graphs, contain complementary information for molecular encoding. Thus exploiting complementary information from different molecular representations is one of the research priorities in molecular encoding. Most ex…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 8 pages, 5 figures

  29. arXiv:2406.16976  [pdf, other]

    cs.NE cs.AI cs.LG physics.chem-ph

    Efficient Evolutionary Search Over Chemical Space with Large Language Models

    Authors: Haorui Wang, Marta Skreta, Cher-Tian Ser, Wenhao Gao, Lingkai Kong, Felix Strieth-Kalthoff, Chenru Duan, Yuchen Zhuang, Yue Yu, Yanqiao Zhu, Yuanqi Du, Alán Aspuru-Guzik, Kirill Neklyudov, Chao Zhang

    Abstract: Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations…

    Submitted 2 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  30. arXiv:2406.16754  [pdf, other]

    cs.LG cs.CV eess.IV

    The MRI Scanner as a Diagnostic: Image-less Active Sampling

    Authors: Yuning Du, Rohan Dharmakumar, Sotirios A. Tsaftaris

    Abstract: Despite the high diagnostic accuracy of Magnetic Resonance Imaging (MRI), using MRI as a Point-of-Care (POC) disease identification tool poses significant accessibility challenges due to the use of high magnetic field strength and lengthy acquisition times. We ask a simple question: Can we dynamically optimise acquired samples, at the patient level, according to an (automated) downstream decision…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted in MICCAI 2024

  31. arXiv:2406.16087  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    Imperative Learning: A Self-supervised Neural-Symbolic Learning Framework for Robot Autonomy

    Authors: Chen Wang, Kaiyi Ji, Junyi Geng, Zhongqiang Ren, Taimeng Fu, Fan Yang, Yifan Guo, Haonan He, Xiangyu Chen, Zitong Zhan, Qiwei Du, Shaoshu Su, Bowen Li, Yuheng Qiu, Yi Du, Qihang Li, Yifan Yang, Xiao Lin, Zhipeng Zhao

    Abstract: Data-driven methods such as reinforcement and imitation learning have achieved remarkable success in robot autonomy. However, their data-centric nature still hinders them from generalizing well to ever-changing environments. Moreover, collecting large datasets for robotic tasks is often impractical and expensive. To overcome these challenges, we introduce a new self-supervised neural-symbolic (NeS…

    Submitted 6 August, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  32. arXiv:2406.16030  [pdf, other]

    cs.CL cs.AI

    Zero-Shot Cross-Lingual NER Using Phonemic Representations for Low-Resource Languages

    Authors: Jimin Sohn, Haeji Jung, Alex Cheng, Jooeon Kang, Yilin Du, David R. Mortensen

    Abstract: Existing zero-shot cross-lingual NER approaches require substantial prior knowledge of the target language, which is impractical for low-resource languages. In this paper, we propose a novel approach to NER using phonemic representation based on the International Phonetic Alphabet (IPA) to bridge the gap between representations of different languages. Our experiments show that our method significa…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, 5 tables

  33. arXiv:2406.14129  [pdf, other]

    cs.CV cs.CL cs.MM

    Towards Event-oriented Long Video Understanding

    Authors: Yifan Du, Kun Zhou, Yuqi Huo, Yifan Li, Wayne Xin Zhao, Haoyu Lu, Zijia Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen

    Abstract: With the rapid development of video Multimodal Large Language Models (MLLMs), numerous benchmarks have been proposed to assess their video understanding capability. However, due to the lack of rich events in the videos, these datasets may suffer from the short-cut bias that the answers can be deduced from a few frames, without the need to watch the entire video. To address this issue, we introduce…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Work in progress

  34. arXiv:2406.13948  [pdf, other]

    cs.AI cs.CL cs.LG

    CityGPT: Empowering Urban Spatial Cognition of Large Language Models

    Authors: Jie Feng, Yuwei Du, Tianhui Liu, Siqi Guo, Yuming Lin, Yong Li

    Abstract: Large language models (LLMs) with powerful language generation and reasoning capabilities have already achieved success in many domains, e.g., math and code generation. However, due to the lack of physical-world corpora and knowledge during training, they usually fail to solve many real-life tasks in the urban space. In this paper, we propose CityGPT, a systematic framework for enhancing the ca…

    Submitted 19 June, 2024; originally announced June 2024.

  35. arXiv:2406.13945  [pdf, other]

    cs.AI cs.CL cs.LG

    CityBench: Evaluating the Capabilities of Large Language Model as World Model

    Authors: Jie Feng, Jun Zhang, Junbo Yan, Xin Zhang, Tianjian Ouyang, Tianhui Liu, Yuwei Du, Siqi Guo, Yong Li

    Abstract: Large language models (LLMs) with powerful generalization ability have been widely used in many domains. A systematic and reliable evaluation of LLMs is a crucial step in their development and applications, especially for specific professional fields. In the urban domain, there have been some early explorations about the usability of LLMs, but a systematic and scalable evaluation benchmark is still…

    Submitted 19 June, 2024; originally announced June 2024.

  36. arXiv:2406.13271  [pdf, other]

    cs.CV

    Hierarchical IoU Tracking based on Interval

    Authors: Yunhao Du, Zhicheng Zhao, Fei Su

    Abstract: Multi-Object Tracking (MOT) aims to detect and associate all targets of given classes across frames. Current dominant solutions, e.g. ByteTrack and StrongSORT++, follow the hybrid pipeline, which first accomplishes most of the associations in an online manner, and then refines the results using offline tricks such as interpolation and global link. While this paradigm offers flexibility in application…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 7 pages, 3 figures

  37. arXiv:2406.12195  [pdf, other]

    quant-ph cs.LG

    Quantum Compiling with Reinforcement Learning on a Superconducting Processor

    Authors: Z. T. Wang, Qiuhao Chen, Yuxuan Du, Z. H. Yang, Xiaoxia Cai, Kaixuan Huang, Jingning Zhang, Kai Xu, Jun Du, Yinan Li, Yuling Jiao, Xingyao Wu, Wu Liu, Xiliang Lu, Huikai Xu, Yirong Jin, Ruixia Wang, Haifeng Yu, S. P. Zhao

    Abstract: To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcemen…

    Submitted 17 June, 2024; originally announced June 2024.

  38. arXiv:2406.11776  [pdf, other]

    cs.CL

    Improving Multi-Agent Debate with Sparse Communication Topology

    Authors: Yunxuan Li, Yibing Du, Jiageng Zhang, Le Hou, Peter Grabowski, Yeqing Li, Eugene Ie

    Abstract: Multi-agent debate has proven effective in improving large language model quality for reasoning and factuality tasks. While various role-playing strategies in multi-agent debates have been explored, in terms of the communication among agents, existing approaches adopt a brute force algorithm -- each agent can communicate with all other agents. In this paper, we systematically investigate the effe…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 13 pages, 9 figures

  39. arXiv:2406.11546  [pdf, other]

    eess.AS cs.CL cs.SD

    GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

    Authors: Yifan Yang, Zheshu Song, Jianheng Zhuo, Mingyu Cui, Jinpeng Li, Bo Yang, Yexing Du, Ziyang Ma, Xunying Liu, Ziyuan Wang, Ke Li, Shuai Fan, Kai Yu, Wei-Qiang Zhang, Guoguo Chen, Xie Chen

    Abstract: The evolution of speech technology has been spurred by the rapid increase in dataset sizes. Traditional speech models generally depend on a large amount of labeled training data, which is scarce for low-resource languages. This paper presents GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus. It is designed for low-resource languages and does not rely on paired spee…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Under review

  40. arXiv:2406.11179  [pdf, other]

    cs.LG cs.AI

    Learning Iterative Reasoning through Energy Diffusion

    Authors: Yilun Du, Jiayuan Mao, Joshua B. Tenenbaum

    Abstract: We introduce iterative reasoning through energy diffusion (IRED), a novel framework for learning to reason for a variety of tasks by formulating reasoning and decision-making problems with energy-based optimization. IRED learns energy functions to represent the constraints between input conditions and desired outputs. After training, IRED adapts the number of optimization steps during inference ba…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: ICML 2024, website: https://energy-based-model.github.io/ired/

  41. arXiv:2406.09367  [pdf, other]

    cs.CV

    Needle In A Video Haystack: A Scalable Synthetic Framework for Benchmarking Video MLLMs

    Authors: Zijia Zhao, Haoyu Lu, Yuqi Huo, Yifan Du, Tongtian Yue, Longteng Guo, Bingning Wang, Weipeng Chen, Jing Liu

    Abstract: Video understanding is a crucial next step for multimodal large language models (MLLMs). To probe specific aspects of video understanding ability, existing video benchmarks typically require careful video selection based on the target capability, along with laborious annotation of query-response pairs to match the specific video content. This process is both challenging and resource-intensive. In…

    Submitted 13 June, 2024; originally announced June 2024.

  42. arXiv:2406.07098  [pdf, other

    cs.IR cs.AI cs.DB

    Guiding Catalogue Enrichment with User Queries

    Authors: Yupei Du, Jacek Golebiowski, Philipp Schmidt, Ziawasch Abedjan

    Abstract: Techniques for knowledge graph (KGs) enrichment have been increasingly crucial for commercial applications that rely on evolving product catalogues. However, because of the huge search space of potential enrichment, predictions from KG completion (KGC) methods suffer from low precision, making them unreliable for real-world catalogues. Moreover, candidate facts for enrichment have varied relevance…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: ECML PKDD 2024

  43. arXiv:2406.07006  [pdf, other

    cs.CV

    MIPI 2024 Challenge on Few-shot RAW Image Denoising: Methods and Results

    Authors: Xin Jin, Chunle Guo, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Yuekun Dai, Peiqing Yang, Chen Change Loy, Ruoqi Li, Chang Liu, Ziyi Wang, Yao Du, Jingjing Yang, Long Bao, Heng Sun, Xiangyu Kong, Xiaoxia Xing, Jinlong Wu, Yuanyang Xue, Hyunhee Park, Sejun Song, Changho Kim, Jingfan Tan , et al. (17 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Few-shot RAW Image Denoising Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  44. arXiv:2406.05954  [pdf, other

    cs.AI cs.LG eess.SY

    Aligning Large Language Models with Representation Editing: A Control Perspective

    Authors: Lingkai Kong, Haorui Wang, Wenhao Mu, Yuanqi Du, Yuchen Zhuang, Yifei Zhou, Yue Song, Rongzhi Zhang, Kai Wang, Chao Zhang

    Abstract: Aligning large language models (LLMs) with human objectives is crucial for real-world applications. However, fine-tuning LLMs for alignment often suffers from unstable training and requires substantial computing resources. Test-time alignment techniques, such as prompting and guided decoding, do not modify the underlying model, and their performance remains dependent on the original model's capabi…

    Submitted 11 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: fix typos

  45. arXiv:2406.05343  [pdf, other

    cs.AI cs.CL

    M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark

    Authors: Wei Song, Yadong Li, Jianhua Xu, Guowei Wu, Lingfeng Ming, Kexin Yi, Weihua Luo, Houyi Li, Yi Du, Fangda Guo, Kaicheng Yu

    Abstract: As recent multi-modality large language models (MLLMs) have shown formidable proficiency on various complex tasks, there has been increasing attention on debating whether these models could eventually mirror human intelligence. However, existing benchmarks mainly focus on evaluating solely on task performance, such as the accuracy of identifying the attribute of an object. Combining well-developed…

    Submitted 14 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  46. arXiv:2406.04845  [pdf, other

    cs.CL cs.AI cs.DC cs.LG cs.MA

    FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models

    Authors: Rui Ye, Rui Ge, Xinyu Zhu, Jingyi Chai, Yaxin Du, Yang Liu, Yanfeng Wang, Siheng Chen

    Abstract: Federated learning has enabled multiple parties to collaboratively train large language models without directly sharing their data (FedLLM). Following this training paradigm, the community has put massive efforts from diverse aspects including framework, performance, and privacy. However, an unpleasant fact is that there are currently no realistic datasets and benchmarks for FedLLM and previous wo…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 22 pages

  47. arXiv:2406.00497  [pdf, ps, other

    cs.SD cs.AI cs.CL eess.AS

    Recent Advances in End-to-End Simultaneous Speech Translation

    Authors: Xiaoqian Liu, Guoqiang Hu, Yangfan Du, Erfeng He, Yingfeng Luo, Chen Xu, Tong Xiao, Jingbo Zhu

    Abstract: Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles.…

    Submitted 20 August, 2024; v1 submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted by IJCAI 2024

  48. arXiv:2405.20018  [pdf, other

    cs.MA cs.CL cs.LG

    Safe Multi-agent Reinforcement Learning with Natural Language Constraints

    Authors: Ziyan Wang, Meng Fang, Tristan Tomilin, Fei Fang, Yali Du

    Abstract: The role of natural language constraints in Safe Multi-agent Reinforcement Learning (MARL) is crucial, yet often overlooked. While Safe MARL has vast potential, especially in fields like robotics and autonomous vehicles, its full potential is limited by the need to define constraints in pre-designed mathematical terms, which requires extensive domain expertise and reinforcement learning knowledge,…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 23 pages, 6 figures

  49. arXiv:2405.19946  [pdf, other

    cs.AI

    Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

    Authors: Xuanfa Jin, Ziyan Wang, Yali Du, Meng Fang, Haifeng Zhang, Jun Wang

    Abstract: Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite the advancements in large language models (LLMs), recent agents built with these often neglect the control over discussion tactics, which are essential in communication scenarios and games. As a variant of the famous communication game Werewolf, One Night Ultimate Were…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 27 pages, 5 figures

  50. arXiv:2405.19667  [pdf, other

    cs.LG cs.AI

    Reconciling Model Multiplicity for Downstream Decision Making

    Authors: Ally Yalei Du, Dung Daniel Ngo, Zhiwei Steven Wu

    Abstract: We consider the problem of model multiplicity in downstream decision-making, a setting where two predictive models of equivalent accuracy cannot agree on the best-response action for a downstream loss function. We show that even when the two predictive models approximately agree on their individual predictions almost everywhere, it is still possible for their induced best-response actions to diffe…

    Submitted 29 May, 2024; originally announced May 2024.

    Comments: 16 pages main body, 6 figures