Showing 1–50 of 284 results for author: Wu, P

  1. arXiv:2408.05905  [pdf, other]

    cs.CV cs.AI

    Weakly Supervised Video Anomaly Detection and Localization with Spatio-Temporal Prompts

    Authors: Peng Wu, Xuerong Zhou, Guansong Pang, Zhiwei Yang, Qingsen Yan, Peng Wang, Yanning Zhang

    Abstract: The current weakly supervised video anomaly detection (WSVAD) task aims to achieve frame-level anomalous event detection with only coarse video-level annotations available. Existing works typically involve extracting global features from full-resolution video frames and training frame-level classifiers to detect anomalies in the temporal dimension. However, most anomalous events tend to occur in local…

    Submitted 13 August, 2024; v1 submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted by ACMMM2024

  2. arXiv:2408.05746  [pdf, ps, other]

    cs.IT eess.SP

    Movable Antenna Enhanced AF Relaying: Two-Stage Antenna Position Optimization

    Authors: Nianzu Li, Weidong Mei, Boyu Ning, Peiran Wu

    Abstract: The movable antenna (MA) technology has attracted increasing attention in wireless communications due to its capability for flexibly adjusting the positions of multiple antennas in a local region to reconfigure channel conditions. In this paper, we investigate its application in an amplify-and-forward (AF) relay system, where a multi-MA AF relay is deployed to assist in the wireless communications…

    Submitted 11 August, 2024; originally announced August 2024.

  3. arXiv:2408.05545  [pdf, other]

    cs.CL cs.AI

    Multi-layer Sequence Labeling-based Joint Biomedical Event Extraction

    Authors: Gongchi Chen, Pengchao Wu, Jinghang Gu, Longhua Qian, Guodong Zhou

    Abstract: In recent years, biomedical event extraction has been dominated by complicated pipeline and joint methods, which need to be simplified. In addition, existing work has not effectively utilized trigger word information explicitly. Hence, we propose MLSL, a method based on multi-layer sequence labeling for joint biomedical event extraction. MLSL does not introduce prior knowledge and complex structur…

    Submitted 14 August, 2024; v1 submitted 10 August, 2024; originally announced August 2024.

    Comments: 13 pages, 3 figures, accepted by NLPCC2024

  4. arXiv:2408.05285  [pdf, other]

    cs.LG cs.AI

    Semi-Supervised One-Shot Imitation Learning

    Authors: Philipp Wu, Kourosh Hakhamaneshi, Yuqing Du, Igor Mordatch, Aravind Rajeswaran, Pieter Abbeel

    Abstract: One-shot Imitation Learning (OSIL) aims to imbue AI agents with the ability to learn a new task from a single demonstration. To supervise the learning, OSIL typically requires a prohibitively large number of paired expert demonstrations -- i.e. trajectories corresponding to different variations of the same semantic task. To overcome this limitation, we introduce the semi-supervised OSIL problem se…

    Submitted 9 August, 2024; originally announced August 2024.

    Journal ref: Reinforcement Learning Journal 1 (2024)

  5. arXiv:2408.02934  [pdf, other]

    cs.IT eess.SP

    Learned Trimmed-Ridge Regression for Channel Estimation in Millimeter-Wave Massive MIMO

    Authors: Pengxia Wu, Julian Cheng, Yonina C. Eldar, John M. Cioffi

    Abstract: Channel estimation poses significant challenges in millimeter-wave massive multiple-input multiple-output systems, especially when the base station has fewer radio-frequency chains than antennas. To address this challenge, one promising solution exploits the beamspace channel sparsity to reconstruct full-dimensional channels from incomplete measurements. This paper presents a model-based deep lear…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE Transactions on Communications

  6. arXiv:2408.01310  [pdf, other]

    cs.CR

    PsybORG+: Modeling and Simulation for Detecting Cognitive Biases in Advanced Persistent Threats

    Authors: Shuo Huang, Fred Jones, Nikolos Gurney, David Pynadath, Kunal Srivastava, Stoney Trent, Peggy Wu, Quanyan Zhu

    Abstract: Advanced Persistent Threats (APTs) bring significant challenges to cybersecurity due to their sophisticated and stealthy nature. Traditional cybersecurity measures fail to defend against APTs. Cognitive vulnerabilities can significantly influence attackers' decision-making processes, which presents an opportunity for defenders to exploit. This work introduces PsybORG+, a multi-agent cybersecuri…

    Submitted 13 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

  7. arXiv:2407.18627  [pdf, ps, other]

    cs.LG eess.SP

    Multi-Agent Deep Reinforcement Learning for Energy Efficient Multi-Hop STAR-RIS-Assisted Transmissions

    Authors: Pei-Hsiang Liao, Li-Hsiang Shen, Po-Chen Wu, Kai-Ten Feng

    Abstract: Simultaneously transmitting and reflecting reconfigurable intelligent surface (STAR-RIS) provides a promising way to expand coverage in wireless communications. However, the limitations of a single STAR-RIS inspire us to integrate the concept of multi-hop transmissions, as focused on RIS in existing research. Therefore, we propose the novel architecture of multi-hop STAR-RISs to achieve a wider range of…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted by Proc. IEEE VTC-fall

  8. arXiv:2407.17691  [pdf, other]

    cs.NI eess.SY

    System-Level Simulation Framework for NB-IoT: Key Features and Performance Evaluation

    Authors: Shutao Zhang, Wenkun Wen, Peiran Wu, Hongqing Huang, Liya Zhu, Yijia Guo, Tingting Yang, Minghua Xia

    Abstract: Narrowband Internet of Things (NB-IoT) is a technology specifically designated by the 3rd Generation Partnership Project (3GPP) to meet the explosive demand for massive machine-type communications (mMTC), and it is evolving to RedCap. Industrial companies have increasingly adopted NB-IoT as the solution for mMTC due to its lightweight design and comprehensive technical specifications released by 3…

    Submitted 13 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  9. arXiv:2407.16207  [pdf, other]

    cs.CL

    Graph-Structured Speculative Decoding

    Authors: Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

    Abstract: Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models (LLMs) by employing a small language model to draft a hypothesis sequence, which is then validated by the LLM. The effectiveness of this approach heavily relies on the balance between performance and efficiency of the draft model. In our research, we focus on enhancing the proportion of d…

    Submitted 23 July, 2024; originally announced July 2024.

  10. arXiv:2407.14335  [pdf, other]

    econ.GN cs.CE cs.CR q-fin.CP stat.CO

    Quantifying the Blockchain Trilemma: A Comparative Analysis of Algorand, Ethereum 2.0, and Beyond

    Authors: Yihang Fu, Mingwei Jing, Jiaolun Zhou, Peilin Wu, Ye Wang, Luyao Zhang, Chuang Hu

    Abstract: Blockchain technology is essential for the digital economy and metaverse, supporting applications from decentralized finance to virtual assets. However, its potential is constrained by the "Blockchain Trilemma," which necessitates balancing decentralization, security, and scalability. This study evaluates and compares two leading proof-of-stake (PoS) systems, Algorand and Ethereum 2.0, against the…

    Submitted 19 July, 2024; originally announced July 2024.

  11. arXiv:2407.12022   

    cs.CL cs.AI

    ITERTL: An Iterative Framework for Fine-tuning LLMs for RTL Code Generation

    Authors: Peiyang Wu, Nan Guo, Xiao Xiao, Wenming Li, Xiaochun Ye, Dongrui Fan

    Abstract: Recently, large language models (LLMs) have demonstrated excellent performance in understanding human instructions and generating code, which has inspired researchers to explore the feasibility of generating RTL code with LLMs. However, the existing approaches to fine-tune LLMs on RTL codes typically are conducted on fixed datasets, which do not fully stimulate the capability of LLMs and require l…

    Submitted 23 July, 2024; v1 submitted 27 June, 2024; originally announced July 2024.

    Comments: There are some mistakes in the Experimental Setup in Section 4.1

  12. arXiv:2407.10574  [pdf]

    cs.CV

    Stacking-Enhanced Bagging Ensemble Learning for Breast Cancer Classification with CNN

    Authors: Peihceng Wu, Runze Ma, Teoh Teik Toe

    Abstract: This paper proposes a CNN classification network based on Bagging and stacking ensemble learning methods for breast cancer classification. The model was trained and tested on the public dataset of DDSM. The model is capable of fast and accurate classification of input images. According to our research results, for binary classification (presence or absence of breast cancer), the accuracy reached 9…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Published in: 2023 3rd International Conference on Electronic Engineering (ICEEM)

  13. arXiv:2407.09550  [pdf]

    cs.CV cs.AI cs.LG

    CAPM: Fast and Robust Verification on Maxpool-based CNN via Dual Network

    Authors: Jia-Hau Bai, Chi-Ting Liu, Yu Wang, Fu-Chieh Chang, Pei-Yuan Wu

    Abstract: This study uses CAPM (Convex Adversarial Polytope for Maxpool-based CNN) to improve the verified bound for general purpose maxpool-based convolutional neural networks (CNNs) under bounded norm adversarial perturbations. The maxpool function is decomposed as a series of ReLU functions to extend the convex relaxation technique to maxpool functions, by which the verified bound can be efficiently comp…

    Submitted 27 June, 2024; originally announced July 2024.

  14. arXiv:2407.09032  [pdf, other]

    math.NA cs.LG

    DRM Revisited: A Complete Error Analysis

    Authors: Yuling Jiao, Ruoxuan Li, Peiying Wu, Jerry Zhijian Yang, Pingwen Zhang

    Abstract: In this work, we address a foundational question in the theoretical analysis of the Deep Ritz Method (DRM) under the over-parameterization regime: Given a target precision level, how can one determine the appropriate number of training samples, the key architectural parameters of the neural networks, the step size for the projected gradient descent optimization procedure, and the requisite number o…

    Submitted 12 July, 2024; originally announced July 2024.

  15. arXiv:2407.07427  [pdf, other]

    cs.CV

    Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation

    Authors: Hao Fang, Peng Wu, Yawei Li, Xinxin Zhang, Xiankai Lu

    Abstract: Open-Vocabulary Video Instance Segmentation (VIS) is attracting increasing attention due to its ability to segment and track arbitrary objects. However, the recent Open-Vocabulary VIS attempts obtained unsatisfactory results, especially in terms of generalization ability of novel categories. We discover that the domain gap between the VLM features (e.g., CLIP) and the instance queries and the unde…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  16. arXiv:2407.07094  [pdf, other]

    cs.CL cs.AI

    AnyTaskTune: Advanced Domain-Specific Solutions through Task-Fine-Tuning

    Authors: Jiaxi Cui, Wentao Zhang, Jing Tang, Xudong Tong, Zhenwei Zhang, Amie, Jing Wen, Rongsheng Wang, Pengfei Wu

    Abstract: The pervasive deployment of Large Language Models (LLMs) in various sectors often neglects the nuanced requirements of individuals and small organizations, who benefit more from models precisely tailored to their specific business contexts rather than those with broadly superior general capabilities. This work introduces AnyTaskTune, a novel fine-tuning methodology coined as Task-Fi…

    Submitted 9 July, 2024; originally announced July 2024.

  17. arXiv:2407.03900  [pdf, other]

    cs.CV

    Oracle Bone Inscriptions Multi-modal Dataset

    Authors: Bang Li, Donghao Luo, Yujie Liang, Jing Yang, Zengmao Ding, Xu Peng, Boyuan Jiang, Shengwei Han, Dan Sui, Peichao Qin, Pian Wu, Chaoyang Wang, Yun Qi, Taisong Jin, Chengjie Wang, Xiaoming Huang, Zhan Shu, Rongrong Ji, Yongge Liu, Yunsheng Wu

    Abstract: Oracle bone inscriptions (OBI) is the earliest developed writing system in China, bearing invaluable written exemplifications of early Shang history and paleography. However, the task of deciphering OBI, in the current climate of the scholarship, can prove extremely challenging. Out of the 4,500 oracle bone characters excavated, only a third have been successfully identified. Therefore, leveraging…

    Submitted 4 July, 2024; originally announced July 2024.

  18. arXiv:2407.03314  [pdf, other]

    cs.CV cs.CL cs.DB

    BACON: Supercharge Your VLM with Bag-of-Concept Graph to Mitigate Hallucinations

    Authors: Zhantao Yang, Ruili Feng, Keyu Yan, Huangji Wang, Zhicai Wang, Shangwen Zhu, Han Zhang, Jie Xiao, Pingyu Wu, Kai Zhu, Jixuan Chen, Chen-Wei Xie, Chaojie Mao, Yue Yang, Hongyang Zhang, Yu Liu, Fan Cheng

    Abstract: This paper presents Bag-of-Concept Graph (BACON) to gift models with limited linguistic abilities to taste the privilege of Vision Language Models (VLMs) and boost downstream tasks such as detection, visual question answering (VQA), and image generation. Since the visual scenes in physical worlds are structured with complex relations between objects, BACON breaks down annotations into basic minimu…

    Submitted 3 July, 2024; originally announced July 2024.

  19. arXiv:2407.00632  [pdf, other]

    cs.RO cs.CL cs.CV cs.MA

    CAMON: Cooperative Agents for Multi-Object Navigation with LLM-based Conversations

    Authors: Pengying Wu, Yao Mu, Kangjie Zhou, Ji Ma, Junting Chen, Chang Liu

    Abstract: Visual navigation tasks are critical for household service robots. As these tasks become increasingly complex, effective communication and collaboration among multiple robots become imperative to ensure successful completion. In recent years, large language models (LLMs) have exhibited remarkable comprehension and planning abilities in the context of embodied agents. However, their application in…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: Accepted to the RSS 2024 Workshop: GROUND

  20. arXiv:2406.16860  [pdf, other]

    cs.CV

    Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

    Authors: Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie

    Abstract: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and disconnected from visual representation learning research. This gap hinders accurate sensory grounding in real-world scenarios. Our study uses LLMs and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Website at https://cambrian-mllm.github.io

  21. arXiv:2406.15754  [pdf, other]

    cs.CV cs.CL cs.LG cs.SD eess.AS

    Multimodal Segmentation for Vocal Tract Modeling

    Authors: Rishi Jain, Bohan Yu, Peter Wu, Tejas Prabhune, Gopala Anumanchipalli

    Abstract: Accurate modeling of the vocal tract is necessary to construct articulatory representations for interpretable speech processing and linguistics. However, vocal tract modeling is challenging because many internal articulators are occluded from external motion capture technologies. Real-time magnetic resonance imaging (RT-MRI) allows measuring precise movements of internal articulators during speech…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Interspeech 2024

  22. arXiv:2406.12998  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    Articulatory Encodec: Vocal Tract Kinematics as a Codec for Speech

    Authors: Cheol Jun Cho, Peter Wu, Tejas S. Prabhune, Dhruv Agarwal, Gopala K. Anumanchipalli

    Abstract: Vocal tract articulation is a natural, grounded control space of speech production. The spatiotemporal coordination of articulators combined with the vocal source shapes intelligible speech sounds to enable effective spoken communication. Based on this physiological grounding of speech, we propose a new framework of neural encoding-decoding of speech -- articulatory encodec. The articulatory encod…

    Submitted 18 June, 2024; originally announced June 2024.

  23. arXiv:2406.11739  [pdf, other]

    cs.CV

    V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results

    Authors: Jiaqi Wang, Yuhang Zang, Pan Zhang, Tao Chu, Yuhang Cao, Zeyi Sun, Ziyu Liu, Xiaoyi Dong, Tong Wu, Dahua Lin, Zeming Chen, Zhi Wang, Lingchen Meng, Wenhao Yao, Jianwei Yang, Sihong Wu, Zhineng Chen, Zuxuan Wu, Yu-Gang Jiang, Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, et al. (9 additional authors not shown)

    Abstract: Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories, and potential encounters with previously unknown or unseen objects. The challenges necessitate the development of public benchmarks and challenges to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3…

    Submitted 17 June, 2024; originally announced June 2024.

  24. arXiv:2406.10870  [pdf, other]

    cs.CL

    COOL: Comprehensive Knowledge Enhanced Prompt Learning for Domain Adaptive Few-shot Fake News Detection

    Authors: Yi Ouyang, Peng Wu, Li Pan

    Abstract: Most Fake News Detection (FND) methods struggle with data scarcity in emerging news domains. Recently, prompt learning based on Pre-trained Language Models (PLM) has emerged as a promising approach in domain adaptive few-shot learning, since it greatly reduces the need for labeled data by bridging the gap between pre-training and downstream task. Furthermore, external knowledge is also helpf…

    Submitted 16 June, 2024; originally announced June 2024.

  25. arXiv:2406.09201  [pdf, other]

    cs.CV

    Enhanced Object Detection: A Study on Vast Vocabulary Object Detection Track for V3Det Challenge 2024

    Authors: Peixi Wu, Bosong Chai, Xuan Nie, Longquan Yan, Zeyu Wang, Qifan Zhou, Boning Wang, Yansong Peng, Hebei Li

    Abstract: In this technical report, we present our findings from the research conducted on the Vast Vocabulary Visual Detection (V3Det) dataset for the Supervised Vast Vocabulary Visual Detection task. How to deal with complex categories and detection boxes has become a difficulty in this track. The original supervised detector is not suitable for this task. We have designed a series of improvements, including…

    Submitted 21 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Journal ref: Second Place in CVPR 2024 Vast Vocabulary Visual Detection Challenge

  26. arXiv:2405.16225  [pdf, ps, other]

    cs.LG cs.AI

    Local Causal Structure Learning in the Presence of Latent Variables

    Authors: Feng Xie, Zheng Li, Peng Wu, Yan Zeng, Chunchen Liu, Zhi Geng

    Abstract: Discovering causal relationships from observational data, particularly in the presence of latent variables, poses a challenging problem. While current local structure learning methods have proven effective and efficient when the focus lies solely on the local relationships of a target variable, they operate under the assumption of causal sufficiency. This assumption implies that all the common cau…

    Submitted 6 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

  27. arXiv:2405.15189  [pdf, other]

    cs.SE cs.CL

    SOAP: Enhancing Efficiency of Generated Code via Self-Optimization

    Authors: Dong Huang, Jianbo Dai, Han Weng, Puzhen Wu, Yuhao Qing, Jie M. Zhang, Heming Cui, Zhijiang Guo

    Abstract: Large language models (LLMs) have shown remarkable progress in code generation, but their generated code often suffers from inefficiency, resulting in longer execution times and higher memory consumption. To address this issue, we propose Self Optimization based on OverheAd Profile (SOAP), a self-optimization framework that utilizes execution overhead profiles to improve the efficiency of LLM-gene…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 31 pages, 18 figures, and 8 tables

  28. Treatment Effect Estimation for User Interest Exploration on Recommender Systems

    Authors: Jiaju Chen, Wenjie Wang, Chongming Gao, Peng Wu, Jianxiong Wei, Qingsong Hua

    Abstract: Recommender systems learn personalized user preferences from user feedback like clicks. However, user feedback is usually biased towards partially observed interests, leaving many users' hidden interests unexplored. Existing approaches typically mitigate the bias, increase recommendation diversity, or use bandit algorithms to balance exploration-exploitation trade-offs. Nevertheless, they fail to…

    Submitted 14 May, 2024; originally announced May 2024.

    Comments: Accepted to SIGIR 2024

  29. arXiv:2405.04798  [pdf, other]

    cs.RO cs.AI

    From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control

    Authors: Yide Shentu, Philipp Wu, Aravind Rajeswaran, Pieter Abbeel

    Abstract: Hierarchical control for robotics has long been plagued by the need to have a well defined interface layer to communicate between high-level task planners and low-level policies. With the advent of LLMs, language has been emerging as a prospective interface layer. However, this has several limitations. Not all tasks can be decomposed into steps that are easily expressible in natural language (e.g.…

    Submitted 8 July, 2024; v1 submitted 8 May, 2024; originally announced May 2024.

  30. arXiv:2405.03329  [pdf, other]

    cs.LG stat.ML

    Policy Learning for Balancing Short-Term and Long-Term Rewards

    Authors: Peng Wu, Ziyu Shen, Feng Xie, Zhongyao Wang, Chunchen Liu, Yan Zeng

    Abstract: Empirical researchers and decision-makers spanning various domains frequently seek profound insights into the long-term impacts of interventions. While the significance of long-term outcomes is undeniable, an overemphasis on them may inadvertently overshadow short-term gains. Motivated by this, this paper formalizes a new framework for learning the optimal policy that effectively balances both lon…

    Submitted 6 May, 2024; originally announced May 2024.

  31. arXiv:2404.19620  [pdf, other]

    cs.LG cs.IR stat.ML

    Be Aware of the Neighborhood Effect: Modeling Selection Bias under Interference

    Authors: Haoxuan Li, Chunyuan Zheng, Sihao Ding, Peng Wu, Zhi Geng, Fuli Feng, Xiangnan He

    Abstract: Selection bias in recommender systems arises from the recommendation process of system filtering and the interactive process of user selection. Many previous studies have focused on addressing selection bias to achieve unbiased learning of the prediction model, but ignore the fact that potential outcomes for a given user-item pair may vary with the treatments assigned to other user-item pairs, name…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: ICLR 24

  32. arXiv:2404.19596  [pdf, other]

    cs.IR cs.LG

    Debiased Collaborative Filtering with Kernel-Based Causal Balancing

    Authors: Haoxuan Li, Chunyuan Zheng, Yanghao Xiao, Peng Wu, Zhi Geng, Xu Chen, Peng Cui

    Abstract: Debiased collaborative filtering aims to learn an unbiased prediction model by removing different biases in observational datasets. To solve this problem, one of the simple and effective methods is based on the propensity score, which adjusts the observational sample distribution to the target one by reweighting observed instances. Ideally, propensity scores should be learned with causal balancing…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: ICLR 24 Spotlight

  33. arXiv:2404.14248  [pdf, other]

    cs.CV

    NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results

    Authors: Xiaoning Liu, Zongwei Wu, Ao Li, Florin-Alexandru Vasluianu, Yulun Zhang, Shuhang Gu, Le Zhang, Ce Zhu, Radu Timofte, Zhi Jin, Hongjun Wu, Chenxi Wang, Haitao Ling, Yuanhao Cai, Hao Bian, Yuxin Zheng, Jing Lin, Alan Yuille, Ben Shao, Jin Guo, Tianli Liu, Mohao Wu, Yixu Feng, Shuo Hou, Haotian Lin, et al. (87 additional authors not shown)

    Abstract: This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results when dealing with a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlig…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: NTIRE 2024 Challenge Report

  34. arXiv:2404.14132  [pdf, other]

    cs.CV eess.IV

    CRNet: A Detail-Preserving Network for Unified Image Restoration and Enhancement Task

    Authors: Kangzhen Yang, Tao Hu, Kexin Dai, Genggeng Chen, Yu Cao, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, images captured often suffer from blurring, noise, and other forms of image degradation, and due to sensor limitations, people usually can only obtain low dynamic range images. To achieve high-quality images, researchers have attempted various image restoration and enhancement operations on photographs, including denoising, deblurring, and high dynamic range imaging. Howev…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR2024 Workshop, Code: https://github.com/CalvinYang0/CRNet

  35. arXiv:2404.13537  [pdf, other]

    eess.IV cs.CV

    Bracketing Image Restoration and Enhancement with High-Low Frequency Decomposition

    Authors: Genggeng Chen, Kexin Dai, Kangzhen Yang, Tao Hu, Xiangyu Chen, Yongqing Yang, Wei Dong, Peng Wu, Yanning Zhang, Qingsen Yan

    Abstract: In real-world scenarios, due to a series of image degradations, obtaining high-quality, clear content photos is challenging. While significant progress has been made in synthesizing high-quality images, previous methods for image restoration and enhancement often overlooked the characteristics of different degradations. They applied the same structure to address various types of degradation, resul…

    Submitted 24 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: This paper is accepted by CVPR 2024 Workshop, code: https://github.com/chengeng0613/HLNet

  36. arXiv:2404.12022  [pdf, other]

    cs.CL

    Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration

    Authors: Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao

    Abstract: Large language models (LLMs) have recently shown remarkable performance across a wide range of tasks. However, the substantial number of parameters in LLMs contributes to significant latency during model inference. This is particularly evident when utilizing autoregressive decoding methods, which generate one token in a single forward process, thereby not fully capitalizing on the parallel computi…

    Submitted 18 April, 2024; originally announced April 2024.

  37. arXiv:2404.10211  [pdf, other]

    cs.LG cs.AI

    Anomaly Correction of Business Processes Using Transformer Autoencoder

    Authors: Ziyou Gong, Xianwen Fang, Ping Wu

    Abstract: An event log records all events that occur during the execution of business processes, so detecting and correcting anomalies in event logs can provide a reliable guarantee for subsequent process analysis. Previous works mainly include next event prediction based methods and autoencoder-based methods. These methods cannot accurately and efficiently detect anomalies and correct anomalies at the same t…

    Submitted 15 April, 2024; originally announced April 2024.

  38. arXiv:2404.08531  [pdf, other]

    cs.CV

    Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection

    Authors: Zhiwei Yang, Jing Liu, Peng Wu

    Abstract: Weakly supervised video anomaly detection (WSVAD) is a challenging task. Generating fine-grained pseudo-labels based on weak-label and then self-training a classifier is currently a promising solution. However, since the existing methods use only RGB visual modality and the utilization of category text information is neglected, thus limiting the generation of more accurate pseudo-labels and affect…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR2024

  39. arXiv:2404.01650  [pdf, other]

    cs.LG

    Test-Time Model Adaptation with Only Forward Passes

    Authors: Shuaicheng Niu, Chunyan Miao, Guohao Chen, Pengcheng Wu, Peilin Zhao

    Abstract: Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts. However, in real-world scenarios, models are usually deployed on resource-limited devices, e.g., FPGAs, and are often quantized and hard-coded with non-modifiable parameters for acceleration. In light of this, existing methods are often infeasible since they heavil…

    Submitted 29 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 18 pages, 4 figures, 17 tables, accepted by International Conference on Machine Learning

  40. arXiv:2404.01356  [pdf, other]

    cs.LG cs.AI cs.CY

    The Double-Edged Sword of Input Perturbations to Robust Accurate Fairness

    Authors: Xuran Li, Peng Wu, Yanting Chen, Xingjun Ma, Zhen Zhang, Kaixiang Dong

    Abstract: Deep neural networks (DNNs) are known to be sensitive to adversarial input perturbations, leading to a reduction in either prediction accuracy or individual fairness. To jointly characterize the susceptibility of prediction accuracy and individual fairness to adversarial perturbations, we introduce a novel robustness definition termed robust accurate fairness. Informally, robust accurate fairness…

    Submitted 1 April, 2024; originally announced April 2024.

  41. arXiv:2403.12386  [pdf]

    cs.CL cs.AI

    Pipelined Biomedical Event Extraction Rivaling Joint Learning

    Authors: Pengchao Wu, Xuefeng Li, Jinghang Gu, Longhua Qian, Guodong Zhou

    Abstract: Biomedical event extraction is an information extraction task to obtain events from biomedical text, whose targets include the type, the trigger, and the respective arguments involved in an event. Traditional biomedical event extraction usually adopts a pipelined approach, which contains trigger identification, argument role recognition, and finally event construction either using specific rules o…

    Submitted 18 March, 2024; originally announced March 2024.

  42. arXiv:2403.10039  [pdf, other]

    cs.CV cs.AI

    Rethinking Low-quality Optical Flow in Unsupervised Surgical Instrument Segmentation

    Authors: Peiran Wu, Yang Liu, Jiayu Huo, Gongyu Zhang, Christos Bergeles, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin

    Abstract: Video-based surgical instrument segmentation plays an important role in robot-assisted surgeries. Unlike supervised settings, unsupervised segmentation relies heavily on motion cues, which are challenging to discern due to the typically lower quality of optical flow in surgical footage compared to natural scenes. This presents a considerable burden for the advancement of unsupervised segmentation…

    Submitted 15 March, 2024; originally announced March 2024.

  43. arXiv:2403.09630  [pdf, other]

    cs.CV

    GenAD: Generalized Predictive Model for Autonomous Driving

    Authors: Jiazhi Yang, Shenyuan Gao, Yihang Qiu, Li Chen, Tianyu Li, Bo Dai, Kashyap Chitta, Penghao Wu, Jia Zeng, Ping Luo, Jun Zhang, Andreas Geiger, Yu Qiao, Hongyang Li

    Abstract: In this paper, we introduce the first large-scale video prediction model in the autonomous driving discipline. To eliminate the restriction of high-cost data collection and empower the generalization ability of our model, we acquire massive data from the web and pair it with diverse and high-quality text descriptions. The resultant dataset accumulates over 2000 hours of driving videos, spanning ar…

    Submitted 8 August, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: CVPR 2024 Highlight Paper. Dataset: https://github.com/OpenDriveLab/DriveAGI

  44. arXiv:2403.01674  [pdf, other]

    cs.RO

    ASPIRe: An Informative Trajectory Planner with Mutual Information Approximation for Target Search and Tracking

    Authors: Kangjie Zhou, Pengying Wu, Yao Su, Han Gao, Ji Ma, Hangxin Liu, Chang Liu

    Abstract: This paper proposes an informative trajectory planning approach, namely, adaptive particle filter tree with sigma point-based mutual information reward approximation (ASPIRe), for mobile target search and tracking (SAT) in cluttered environments with limited sensing field of view. We develop a novel sigma point-based approximation to accurately estimate mutual information (MI) for general…

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: accepted to ICRA 2024

  45. arXiv:2402.19007  [pdf, other]

    cs.CV cs.RO

    DOZE: A Dataset for Open-Vocabulary Zero-Shot Object Navigation in Dynamic Environments

    Authors: Ji Ma, Hongming Dai, Yao Mu, Pengying Wu, Hao Wang, Xiaowei Chi, Yang Fei, Shanghang Zhang, Chang Liu

    Abstract: Zero-Shot Object Navigation (ZSON) requires agents to autonomously locate and approach unseen objects in unfamiliar environments and has emerged as a particularly challenging task within the domain of Embodied AI. Existing datasets for developing ZSON algorithms lack consideration of dynamic obstacles, object attribute diversity, and scene texts, thus exhibiting noticeable discrepancies from real-…

    Submitted 8 July, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: This version of the paper has been accepted for publication in IEEE Robotics and Automation Letters (RA-L)

  46. arXiv:2402.18790  [pdf, ps, other]

    quant-ph cs.CC

    The Power of Unentangled Quantum Proofs with Non-negative Amplitudes

    Authors: Fernando Granha Jeronimo, Pei Wu

    Abstract: Quantum entanglement is a fundamental property of quantum mechanics and plays a crucial role in quantum computation and information. We study entanglement via the lens of computational complexity by considering quantum generalizations of the class NP with multiple unentangled quantum proofs, the so-called QMA(2) and its variants. The complexity of QMA(2) is a longstanding open problem, and only th…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 64 pages

  47. arXiv:2402.16050  [pdf, other]

    cs.CV cs.CL

    LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding

    Authors: Yuxuan Wang, Yueqian Wang, Pengfei Wu, Jianxin Liang, Dongyan Zhao, Zilong Zheng

    Abstract: Despite progress in video-language modeling, the computational challenge of interpreting long-form videos in response to task-specific linguistic queries persists, largely due to the complexity of high-dimensional video data and the misalignment between language and visual cues over space and time. To tackle this issue, we introduce a novel approach called Language-guided Spatial-Temporal Prompt L…

    Submitted 25 February, 2024; originally announced February 2024.

  48. arXiv:2402.15282  [pdf, ps, other]

    quant-ph cs.CC

    Dimension Independent Disentanglers from Unentanglement and Applications

    Authors: Fernando G. Jeronimo, Pei Wu

    Abstract: Quantum entanglement is a key enabling ingredient in diverse applications. However, the presence of unwanted adversarial entanglement also poses challenges in many applications. In this paper, we explore methods to "break" quantum entanglement. Specifically, we construct a dimension-independent k-partite disentangler (like) channel from bipartite unentangled input. We show: For every…

    Submitted 23 February, 2024; originally announced February 2024.

    Comments: 28 pages

  49. arXiv:2402.13296  [pdf, other]

    cs.NE

    Evolutionary Reinforcement Learning: A Systematic Review and Future Directions

    Authors: Yuanguo Lin, Fan Lin, Guorong Cai, Hong Chen, Lixin Zou, Pengcheng Wu

    Abstract: In response to the limitations of reinforcement learning and evolutionary algorithms (EAs) in complex problem-solving, Evolutionary Reinforcement Learning (EvoRL) has emerged as a synergistic solution. EvoRL integrates EAs and reinforcement learning, presenting a promising avenue for training intelligent agents. This systematic review firstly navigates through the technological background of EvoRL…

    Submitted 19 February, 2024; originally announced February 2024.

    Comments: 18 pages, 2 figures

  50. arXiv:2402.07703  [pdf, other]

    cs.LG cs.AI

    Online Sequential Decision-Making with Unknown Delays

    Authors: Ping Wu, Heyan Huang, Zhengyang Liu

    Abstract: In the field of online sequential decision-making, we address the problem with delays utilizing the framework of online convex optimization (OCO), where the feedback of a decision can arrive with an unknown delay. Unlike previous research that is limited to Euclidean norm and gradient information, we propose three families of delayed algorithms based on approximate solutions to handle different ty…

    Submitted 23 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.