Skip to main content

Showing 1–50 of 350 results for author: Wu, G

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.10527  [pdf, other

    cs.CV cs.AI

    EdgeNAT: Transformer for Efficient Edge Detection

    Authors: Jinghuai Jie, Yan Guo, Guixing Wu, Junmin Wu, Baojian Hua

    Abstract: Transformers, renowned for their powerful feature extraction capabilities, have played an increasingly prominent role in various vision tasks. Especially, recent advancements present transformer with hierarchical structures such as Dilated Neighborhood Attention Transformer (DiNAT), demonstrating outstanding ability to efficiently capture both global and local features. However, transformers' appl… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.10411  [pdf, other

    cs.CL

    Resolving Lexical Bias in Edit Scoping with Projector Editor Networks

    Authors: Hammad Rizwan, Domenic Rosati, Ga Wu, Hassan Sajjad

    Abstract: Weight-preserving model editing techniques heavily rely on the scoping mechanism that decides when to apply an edit to the base model. These scoping mechanisms utilize distance functions in the representation space to ascertain the scope of the edit. In this work, we show that distance-based scoping functions grapple with lexical biases leading to issues such as misfires with irrelevant prompts th… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  3. arXiv:2408.07930  [pdf, other

    cs.CL cs.AI

    MAG-SQL: Multi-Agent Generative Approach with Soft Schema Linking and Iterative Sub-SQL Refinement for Text-to-SQL

    Authors: Wenxuan Xie, Gaochen Wu, Bowen Zhou

    Abstract: Recent In-Context Learning based methods have achieved remarkable success in Text-to-SQL task. However, there is still a large gap between the performance of these models and human performance on datasets with complex database schema and difficult questions, such as BIRD. Besides, existing work has neglected to supervise intermediate steps when solving questions iteratively with question decomposi… ▽ More

    Submitted 15 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: 22 pages, 14 figures

  4. arXiv:2408.05775  [pdf, other

    cs.CV

    Efficient Test-Time Prompt Tuning for Vision-Language Models

    Authors: Yuhan Zhu, Guozhen Zhang, Chen Xu, Haocheng Shen, Xiaoxin Chen, Gangshan Wu, Limin Wang

    Abstract: Vision-language models have showcased impressive zero-shot classification capabilities when equipped with suitable text prompts. Previous studies have shown the effectiveness of test-time prompt tuning; however, these methods typically require per-image prompt adaptation during inference, which incurs high computational budgets and limits scalability and practical deployment. To overcome this issu… ▽ More

    Submitted 11 August, 2024; originally announced August 2024.

  5. arXiv:2408.01950  [pdf, other

    cs.SD cs.CL eess.AS

    Why Perturbing Symbolic Music is Necessary: Fitting the Distribution of Never-used Notes through a Joint Probabilistic Diffusion Model

    Authors: Shipei Liu, Xiaoya Fan, Guowei Wu

    Abstract: Existing music generation models are mostly language-based, neglecting the frequency continuity property of notes, resulting in inadequate fitting of rare or never-used notes and thus reducing the diversity of generated samples. We argue that the distribution of notes can be modeled by translational invariance and periodicity, especially using diffusion models to generalize notes by injecting freq… ▽ More

    Submitted 4 August, 2024; originally announced August 2024.

  6. arXiv:2407.21045  [pdf

    cs.CL cs.AI

    Unlocking the Potential: Benchmarking Large Language Models in Water Engineering and Research

    Authors: Boyan Xu, Liang Wen, Zihao Li, Yuxing Yang, Guanlan Wu, Xiongpeng Tang, Yu Li, Zihao Wu, Qingxian Su, Xueqing Shi, Yue Yang, Rui Tong, How Yong Ng

    Abstract: Recent advancements in Large Language Models (LLMs) have sparked interest in their potential applications across various fields. This paper embarked on a pivotal inquiry: Can existing LLMs effectively serve as "water expert models" for water engineering and research tasks? This study was the first to evaluate LLMs' contributions across various water engineering and research tasks by establishing a… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

  7. arXiv:2407.20026  [pdf, other

    cs.MS math.OC

    JAX-SSO: Differentiable Finite Element Analysis Solver for Structural Optimization and Seamless Integration with Neural Networks

    Authors: Gaoyuan Wu

    Abstract: Differentiable numerical simulations of physical systems have gained rising attention in the past few years with the development of automatic differentiation tools. This paper presents JAX-SSO, a differentiable finite element analysis solver built with JAX, Google's high-performance computing library, to assist efficient structural design in the built environment. With the adjoint method and autom… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  8. arXiv:2407.19224  [pdf, other

    cs.SD cs.MM eess.AS

    RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues

    Authors: Tianrui Pan, Jie Liu, Bohan Wang, Jie Tang, Gangshan Wu

    Abstract: While existing Audio-Visual Speech Separation (AVSS) methods primarily concentrate on the audio-visual fusion strategy for two-speaker separation, they demonstrate a severe performance drop in the multi-speaker separation scenarios. Typically, AVSS methods employ guiding videos to sequentially isolate individual speakers from the given audio mixture, resulting in notable missing and noisy parts ac… ▽ More

    Submitted 29 July, 2024; v1 submitted 27 July, 2024; originally announced July 2024.

    Comments: Accepted by MM 2024

  9. arXiv:2407.14105  [pdf, other

    cs.FL cs.LO

    Quasi-Isometric Reductions Between Infinite Strings

    Authors: Karen Frilya Celine, Ziyuan Gao, Sanjay Jain, Ryan Lou, Frank Stephan, Guohua Wu

    Abstract: This paper studies the recursion-theoretic aspects of large-scale geometries of infinite strings, a subject initiated by Khoussainov and Takisaka (2017). We investigate several notions of quasi-isometric reductions between recursive infinite strings and prove various results on the equivalence classes of such reductions. The main result is the construction of two infinite recursive strings $α$ and… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  10. arXiv:2407.12260  [pdf, other

    cs.HC

    HuBar: A Visual Analytics Tool to Explore Human Behaviour based on fNIRS in AR guidance systems

    Authors: Sonia Castelo, Joao Rulff, Parikshit Solunke, Erin McGowan, Guande Wu, Iran Roman, Roque Lopez, Bea Steers, Qi Sun, Juan Bello, Bradley Feest, Michael Middleton, Ryan Mckendrick, Claudio Silva

    Abstract: The concept of an intelligent augmented reality (AR) assistant has significant, wide-ranging applications, with potential uses in medicine, military, and mechanics domains. Such an assistant must be able to perceive the environment and actions, reason about the environment state in relation to a given task, and seamlessly interact with the task performer. These interactions typically involve an AR… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 figures. This is the author's version of the article that has been accepted for publication in IEEE Transactions on Visualization and Computer Graphics (TVCG)

  11. arXiv:2407.10756  [pdf, other

    cs.CV

    GTPT: Group-based Token Pruning Transformer for Efficient Human Pose Estimation

    Authors: Haonan Wang, Jie Liu, Jie Tang, Gangshan Wu, Bo Xu, Yanbing Chou, Yong Wang

    Abstract: In recent years, 2D human pose estimation has made significant progress on public benchmarks. However, many of these approaches face challenges of less applicability in the industrial community due to the large number of parametric quantities and computational overhead. Efficient human pose estimation remains a hurdle, especially for whole-body pose estimation with numerous keypoints. While most c… ▽ More

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 accepted

  12. arXiv:2407.07332  [pdf, ps, other

    cs.IT

    Several new classes of optimal ternary cyclic codes with two or three zeros

    Authors: Gaofei Wu, Zhuohui You, Zhengbang Zha, Yuqing Zhang

    Abstract: Cyclic codes are a subclass of linear codes and have wide applications in data storage systems, communication systems and consumer electronics due to their efficient encoding and decoding algorithms. Let $α$ be a generator of $\mathbb{F}_{3^m}^*$, where $m$ is a positive integer. Denote by $\mathcal{C}_{(i_1,i_2,\cdots, i_t)}$ the cyclic code with generator polynomial… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 16 pages

  13. arXiv:2407.06516  [pdf, other

    cs.CV

    VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving

    Authors: Yibo Liu, Zheyuan Yang, Guile Wu, Yuan Ren, Kejian Lin, Bingbing Liu, Yang Liu, Jinjun Shan

    Abstract: Generating 3D vehicle assets from in-the-wild observations is crucial to autonomous driving. Existing image-to-3D methods cannot well address this problem because they learn generation merely from image RGB information without a deeper understanding of in-the-wild vehicles (such as car models, manufacturers, etc.). This leads to their poor zero-shot prediction capability to handle real-world obser… ▽ More

    Submitted 10 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  14. arXiv:2407.04969  [pdf, other

    cs.CL

    EVA-Score: Evaluation of Long-form Summarization on Informativeness through Extraction and Validation

    Authors: Yuchen Fan, Xin Zhong, Chengsi Wang, Gaoche Wu, Bowen Zhou

    Abstract: Summarization is a fundamental task in natural language processing (NLP) and since large language models (LLMs), such as GPT-4 and Claude, come out, increasing attention has been paid to long-form summarization whose input sequences are much longer, indicating more information contained. The current evaluation metrics either use similarity-based metrics like ROUGE and BERTScore which rely on sim… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 16 pages, 3 figures, submitted to EMNLP

  15. arXiv:2407.04603  [pdf, other

    cs.CV

    AWT: Transferring Vision-Language Models via Augmentation, Weighting, and Transportation

    Authors: Yuhan Zhu, Yuyang Ji, Zhiyu Zhao, Gangshan Wu, Limin Wang

    Abstract: Pre-trained vision-language models (VLMs) have shown impressive results in various visual classification tasks. However, we often fail to fully unleash their potential when adapting them for new concept understanding due to limited information on new classes. To address this limitation, we introduce a novel adaptation framework, AWT (Augment, Weight, then Transport). AWT comprises three key compon… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  16. arXiv:2407.04504  [pdf, other

    cs.CV

    Segment Any 4D Gaussians

    Authors: Shengxiang Ji, Guanjun Wu, Jiemin Fang, Jiazhong Cen, Taoran Yi, Wenyu Liu, Qi Tian, Xinggang Wang

    Abstract: Modeling, understanding, and reconstructing the real world are crucial in XR/VR. Recently, 3D Gaussian Splatting (3D-GS) methods have shown remarkable success in modeling and understanding 3D scenes. Similarly, various 4D representations have demonstrated the ability to capture the dynamics of the 4D world. However, there is a dearth of research focusing on segmentation within 4D representations.… ▽ More

    Submitted 12 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

    Comments: 22 pages

  17. arXiv:2407.03277  [pdf, other

    cs.CL

    Evaluating Automatic Metrics with Incremental Machine Translation Systems

    Authors: Guojun Wu, Shay B. Cohen, Rico Sennrich

    Abstract: We introduce a dataset comprising commercial machine translations, gathered weekly over six years across 12 translation directions. Since human A/B testing is commonly used, we assume commercial systems improve over time, which enables us to evaluate machine translation (MT) metrics based on their preference for more recent translations. Our study confirms several previous findings in MT metrics r… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

  18. arXiv:2406.18462  [pdf, other

    cs.CV cs.GR

    GaussianDreamerPro: Text to Manipulable 3D Gaussians with Highly Enhanced Quality

    Authors: Taoran Yi, Jiemin Fang, Zanwei Zhou, Junjie Wang, Guanjun Wu, Lingxi Xie, Xiaopeng Zhang, Wenyu Liu, Xinggang Wang, Qi Tian

    Abstract: Recently, 3D Gaussian splatting (3D-GS) has achieved great success in reconstructing and rendering real-world scenes. To transfer the high rendering quality to generation tasks, a series of research works attempt to generate 3D-Gaussian assets from text. However, the generated assets have not achieved the same quality as those in reconstruction tasks. We observe that Gaussians tend to grow without… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Project page: https://taoranyi.com/gaussiandreamerpro/

  19. arXiv:2406.10857  [pdf, other

    cs.SE

    An LLM-enhanced Multi-objective Evolutionary Search for Autonomous Driving Test Scenario Generation

    Authors: Haoxiang Tian, Xingshuo Han, Guoquan Wu, Yuan Zhou, Shuo Li, Jun Wei, Dan Ye, Wei Wang, Tianwei Zhang

    Abstract: The safety of Autonomous Driving Systems (ADSs) is significantly important for the implementation of autonomous vehicles (AVs). Therefore, ADSs must be evaluated thoroughly before their release and deployment to the public. How to generate diverse safety-critical test scenarios is a key task for ADS testing. This paper proposes LEADE, an LLM-enhanced scenario generation approach for ADS testing, w… ▽ More

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages

  20. arXiv:2406.09016  [pdf, other

    cs.CV

    Cross-Modal Learning for Anomaly Detection in Fused Magnesium Smelting Process: Methodology and Benchmark

    Authors: Gaochang Wu, Yapeng Zhang, Lan Deng, Jingxin Zhang, Tianyou Chai

    Abstract: Fused Magnesium Furnace (FMF) is a crucial industrial equipment in the production of magnesia, and anomaly detection plays a pivotal role in ensuring its efficient, stable, and secure operation. Existing anomaly detection methods primarily focus on analyzing dominant anomalies using the process variables (such as arc current) or constructing neural networks based on abnormal visual features, while… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 14 pages, 6 figures, 5 tables. Submitted to IEEE

  21. arXiv:2406.06252  [pdf, other

    eess.SP cs.CR

    Random Time-hopping Secure Ranging Strategy Against Distance-Reduction Attacks in UWB

    Authors: Wenlong Gou, Chuanhang Yu, Gang Wu

    Abstract: In order to mitigate the distance reduction attack in Ultra-Wide Band (UWB) ranging, this paper proposes a secure ranging scheme based on a random time-hopping mechanism without redundant signaling overhead. Additionally, a secure ranging strategy is designed for backward compatibility with existing standards such as IEEE 802.15.4a/z, combined with an attack detection scheme. The effectiveness and… ▽ More

    Submitted 10 June, 2024; originally announced June 2024.

    ACM Class: H.1.1

  22. arXiv:2406.05343  [pdf, other

    cs.AI cs.CL

    M3GIA: A Cognition Inspired Multilingual and Multimodal General Intelligence Ability Benchmark

    Authors: Wei Song, Yadong Li, Jianhua Xu, Guowei Wu, Lingfeng Ming, Kexin Yi, Weihua Luo, Houyi Li, Yi Du, Fangda Guo, Kaicheng Yu

    Abstract: As recent multi-modality large language models (MLLMs) have shown formidable proficiency on various complex tasks, there has been increasing attention on debating whether these models could eventually mirror human intelligence. However, existing benchmarks mainly focus on evaluating solely on task performance, such as the accuracy of identifying the attribute of an object. Combining well-developed… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

  23. arXiv:2406.03843  [pdf, other

    cs.HC cs.AI

    POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models

    Authors: Jianben He, Xingbo Wang, Shiyi Liu, Guande Wu, Claudio Silva, Huamin Qu

    Abstract: Large language models (LLMs) have exhibited impressive abilities for multimodal content comprehension and reasoning with proper prompting in zero- or few-shot settings. Despite the proliferation of interactive systems developed to support prompt engineering for LLMs across various tasks, most have primarily focused on textual or visual inputs, thus neglecting the complex interplay between modaliti… ▽ More

    Submitted 14 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: 11 pages, 5 figures

    MSC Class: 68 ACM Class: H.5; I.2.1

  24. arXiv:2406.00833  [pdf, other

    cs.CY cs.AI

    Harvard Undergraduate Survey on Generative AI

    Authors: Shikoh Hirabayashi, Rishab Jain, Nikola Jurković, Gabriel Wu

    Abstract: How has generative AI impacted the experiences of college students? We study the influence of AI on the study habits, class choices, and career prospects of Harvard undergraduates (n=326), finding that almost 90% of students use generative AI. For roughly 25% of these students, AI has begun to substitute for attending office hours and completing required readings. Half of students are concerned th… ▽ More

    Submitted 7 August, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  25. arXiv:2405.18255  [pdf, other

    cs.CR cs.SI eess.SP

    Channel Reciprocity Based Attack Detection for Securing UWB Ranging by Autoencoder

    Authors: Wenlong Gou, Chuanhang Yu, Juntao Ma, Gang Wu, Vladimir Mordachev

    Abstract: A variety of ranging threats represented by Ghost Peak attack have raised concerns regarding the security performance of Ultra-Wide Band (UWB) systems with the finalization of the IEEE 802.15.4z standard. Based on channel reciprocity, this paper proposes a low complexity attack detection scheme that compares Channel Impulse Response (CIR) features of both ranging sides utilizing an autoencoder wit… ▽ More

    Submitted 10 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    ACM Class: H.1.1

  26. arXiv:2405.16845  [pdf, other

    cs.LG cs.CL stat.ML

    On Mesa-Optimization in Autoregressively Trained Transformers: Emergence and Capability

    Authors: Chenyu Zheng, Wei Huang, Rongzhen Wang, Guoqiang Wu, Jun Zhu, Chongxuan Li

    Abstract: Autoregressively trained transformers have brought a profound revolution to the world, especially with their in-context learning (ICL) ability to address downstream tasks. Recently, several studies suggest that transformers learn a mesa-optimizer during autoregressive (AR) pretraining to implement ICL. Namely, the forward pass of the trained transformer is equivalent to optimizing an inner objecti… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 37pages

  27. arXiv:2405.12155  [pdf, other

    cs.IT

    Embracing Radiance Field Rendering in 6G: Over-the-Air Training and Inference with 3D Contents

    Authors: Guanlin Wu, Zhonghao Lyu, Juyong Zhang, Jie Xu

    Abstract: The efficient representation, transmission, and reconstruction of three-dimensional (3D) contents are becoming increasingly important for sixth-generation (6G) networks that aim to merge virtual and physical worlds for offering immersive communication experiences. Neural radiance field (NeRF) and 3D Gaussian splatting (3D-GS) have recently emerged as two promising 3D representation techniques base… ▽ More

    Submitted 18 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: 16 pages,7 figures

  28. arXiv:2405.10832  [pdf, other

    cs.CV

    Open-Vocabulary Spatio-Temporal Action Detection

    Authors: Tao Wu, Shuqiu Ge, Jie Qin, Gangshan Wu, Limin Wang

    Abstract: Spatio-temporal action detection (STAD) is an important fine-grained video understanding task. Current methods require box and label supervision for all action classes in advance. However, in real-world applications, it is very likely to come across new action classes not seen in training because the action category space is large and hard to enumerate. Also, the cost of data annotation and model… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  29. arXiv:2405.10691  [pdf, other

    eess.IV cs.CV

    LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image Completion

    Authors: Zihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han Zhang

    Abstract: The infant brain undergoes rapid development in the first few years after birth.Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants brain development with higher accuracy, statistical power and flexibility.However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets wit… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  30. arXiv:2405.08217  [pdf, other

    cs.LG q-bio.GN q-bio.QM stat.ML

    Data Valuation with Gradient Similarity

    Authors: Nathaniel J. Evans, Gordon B. Mills, Guanming Wu, Xubo Song, Shannon McWeeney

    Abstract: High-quality data is crucial for accurate machine learning and actionable analytics, however, mislabeled or noisy data is a common problem in many domains. Distinguishing low- from high-quality data can be challenging, often requiring expert knowledge and considerable manual intervention. Data Valuation algorithms are a class of methods that seek to quantify the value of each sample in a dataset b… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  31. arXiv:2405.03873  [pdf, other

    cs.AI cs.HC

    Investigating Personalized Driving Behaviors in Dilemma Zones: Analysis and Prediction of Stop-or-Go Decisions

    Authors: Ziye Qin, Siyan Li, Guoyuan Wu, Matthew J. Barth, Amr Abdelraouf, Rohit Gupta, Kyungtae Han

    Abstract: Dilemma zones at signalized intersections present a commonly occurring but unsolved challenge for both drivers and traffic operators. Onsets of the yellow lights prompt varied responses from different drivers: some may brake abruptly, compromising the ride comfort, while others may accelerate, increasing the risk of red-light violations and potential safety hazards. Such diversity in drivers' stop… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

  32. arXiv:2405.03103  [pdf, other

    cs.LG cs.CV

    Learning from Students: Applying t-Distributions to Explore Accurate and Efficient Formats for LLMs

    Authors: Jordan Dotzel, Yuzong Chen, Bahaa Kotb, Sushma Prasad, Gang Wu, Sheng Li, Mohamed S. Abdelfattah, Zhiru Zhang

    Abstract: The increasing size of large language models (LLMs) traditionally requires low-precision integer formats to meet strict latency and power demands. Yet recently, alternative formats such as Normal Float (NF4) have increased model accuracy at the cost of increased chip area. In this work, we first conduct a large-scale analysis of LLM weights and activations across 30 networks and conclude that most… ▽ More

    Submitted 10 June, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

    Comments: Accepted to ICML 2024

  33. arXiv:2404.16831  [pdf, other

    cs.CV

    The Third Monocular Depth Estimation Challenge

    Authors: Jaime Spencer, Fabio Tosi, Matteo Poggi, Ripudaman Singh Arora, Chris Russell, Simon Hadfield, Richard Bowden, GuangYuan Zhou, ZhengXin Li, Qiang Rao, YiPing Bao, Xiao Liu, Dohyeong Kim, Jinseong Kim, Myunghyun Kim, Mykola Lavreniuk, Rui Li, Qing Mao, Jiang Wu, Yu Zhu, Jinqiu Sun, Yanning Zhang, Suraj Patni, Aradhye Agarwal, Chetan Arora , et al. (16 additional authors not shown)

    Abstract: This paper discusses the results of the third edition of the Monocular Depth Estimation Challenge (MDEC). The challenge focuses on zero-shot generalization to the challenging SYNS-Patches dataset, featuring complex scenes in natural and indoor settings. As with the previous edition, methods can use any form of supervision, i.e. supervised or self-supervised. The challenge received a total of 19 su… ▽ More

    Submitted 27 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: To appear in CVPRW2024

  34. arXiv:2404.11214  [pdf, other

    cs.CV cs.AI

    Feature Corrective Transfer Learning: End-to-End Solutions to Object Detection in Non-Ideal Visual Conditions

    Authors: Chuheng Wei, Guoyuan Wu, Matthew J. Barth

    Abstract: A significant challenge in the field of object detection lies in the system's performance under non-ideal imaging conditions, such as rain, fog, low illumination, or raw Bayer images that lack ISP processing. Our study introduces "Feature Corrective Transfer Learning", a novel approach that leverages transfer learning and a bespoke loss function to facilitate the end-to-end detection of objects in… ▽ More

    Submitted 19 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 2024 CVPR UG2+ Workshop

  35. arXiv:2404.11181  [pdf, other

    cs.LG cs.AI cs.RO

    KI-GAN: Knowledge-Informed Generative Adversarial Networks for Enhanced Multi-Vehicle Trajectory Forecasting at Signalized Intersections

    Authors: Chuheng Wei, Guoyuan Wu, Matthew J. Barth, Amr Abdelraouf, Rohit Gupta, Kyungtae Han

    Abstract: Reliable prediction of vehicle trajectories at signalized intersections is crucial to urban traffic management and autonomous driving systems. However, it presents unique challenges, due to the complex roadway layout at intersections, involvement of traffic signal controls, and interactions among different types of road users. To address these issues, we present in this paper a novel model called… ▽ More

    Submitted 19 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 2024 CVPR AICity Workshop

  36. arXiv:2404.09842  [pdf, other

    cs.CV

    STMixer: A One-Stage Sparse Action Detector

    Authors: Tao Wu, Mengqi Cao, Ziteng Gao, Gangshan Wu, Limin Wang

    Abstract: Traditional video action detectors typically adopt the two-stage pipeline, where a person detector is first employed to generate actor boxes and then 3D RoIAlign is used to extract actor-specific features for classification. This detection paradigm requires multi-stage training and inference, and the feature sampling is constrained inside the box, failing to effectively leverage richer context inf… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Extended version of the paper arXiv:2303.15879 presented at CVPR 2023. Accepted by TPAMI 2024

  37. arXiv:2404.06692  [pdf, other

    cs.CV

    Perception-Oriented Video Frame Interpolation via Asymmetric Blending

    Authors: Guangyang Wu, Xin Tao, Changlin Li, Wenyi Wang, Xiaohong Liu, Qingqing Zheng

    Abstract: Previous methods for Video Frame Interpolation (VFI) have encountered challenges, notably the manifestation of blur and ghosting effects. These issues can be traced back to two pivotal factors: unavoidable motion errors and misalignment in supervision. In practice, motion estimates often prove to be error-prone, resulting in misaligned features. Furthermore, the reconstruction loss tends to bring… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  38. arXiv:2404.04565  [pdf, other

    cs.CV

    SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos

    Authors: Tao Wu, Runyu He, Gangshan Wu, Limin Wang

    Abstract: Video-based visual relation detection tasks, such as video scene graph generation, play important roles in fine-grained video understanding. However, current video visual relation detection datasets have two main limitations that hinder the progress of research in this area. First, they do not explore complex human-human interactions in multi-person scenarios. Second, the relation types of existin… ▽ More

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  39. arXiv:2404.00653  [pdf, other

    cs.CV

    Dual DETRs for Multi-Label Temporal Action Detection

    Authors: Yuhan Zhu, Guozhen Zhang, Jing Tan, Gangshan Wu, Limin Wang

    Abstract: Temporal Action Detection (TAD) aims to identify the action boundaries and the corresponding category within untrimmed videos. Inspired by the success of DETR in object detection, several methods have adapted the query-based framework to the TAD task. However, these approaches primarily followed DETR to predict actions at the instance level (i.e., identify each action by its center point), leading… ▽ More

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  40. arXiv:2404.00260  [pdf, other

    cs.CV eess.IV

    Exploiting Self-Supervised Constraints in Image Super-Resolution

    Authors: Gang Wu, Junjun Jiang, Kui Jiang, Xianming Liu

    Abstract: Recent advances in self-supervised learning, predominantly studied in high-level visual tasks, have been explored in low-level image processing. This paper introduces a novel self-supervised constraint for single image super-resolution, termed SSC-SR. SSC-SR uniquely addresses the divergence in image complexity by employing a dual asymmetric paradigm and a target model updated via exponential movi… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: ICME 2024

  41. arXiv:2404.00246  [pdf, other

    cs.CL cs.AI cs.HC

    Your Co-Workers Matter: Evaluating Collaborative Capabilities of Language Models in Blocks World

    Authors: Guande Wu, Chen Zhao, Claudio Silva, He He

    Abstract: Language agents that interact with the world on their own have great potential for automating digital tasks. While large language model (LLM) agents have made progress in understanding and executing tasks such as textual games and webpage control, many real-world tasks also require collaboration with humans or other LLMs in equal roles, which involves intent understanding, task coordination, and c… ▽ More

    Submitted 30 March, 2024; originally announced April 2024.

  42. arXiv:2403.19586  [pdf, other

    cs.CV cs.GR

    TOGS: Gaussian Splatting with Temporal Opacity Offset for Real-Time 4D DSA Rendering

    Authors: Shuai Zhang, Huangxuan Zhao, Zhenghong Zhou, Guanjun Wu, Chuansheng Zheng, Xinggang Wang, Wenyu Liu

    Abstract: Four-dimensional Digital Subtraction Angiography (4D DSA) is a medical imaging technique that provides a series of 2D images captured at different stages and angles during the process of contrast agent filling blood vessels. It plays a significant role in the diagnosis of cerebrovascular diseases. Improving the rendering quality and speed under sparse sampling is important for observing the status… ▽ More

    Submitted 28 March, 2024; originally announced March 2024.

  43. arXiv:2403.15576  [pdf, other

    cs.LG cs.CV

    Data-centric Prediction Explanation via Kernelized Stein Discrepancy

    Authors: Mahtab Sarvmaili, Hassan Sajjad, Ga Wu

    Abstract: Existing example-based prediction explanation methods often bridge test and training data points through the model's parameters or latent representations. While these methods offer clues to the causes of model predictions, they often exhibit innate shortcomings, such as incurring significant computational overhead or producing coarse-grained explanations. This paper presents a Highly-precise and D… ▽ More

    Submitted 22 March, 2024; originally announced March 2024.

  44. arXiv:2403.06452  [pdf, other

    cs.CV

    Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation

    Authors: Guangyang Wu, Xiaohong Liu, Jun Jia, Xuehao Cui, Guangtao Zhai

    Abstract: In the digital era, QR codes serve as a linchpin connecting virtual and physical realms. Their pervasive integration across various applications highlights the demand for aesthetically pleasing codes without compromised scannability. However, prevailing methods grapple with the intrinsic challenge of balancing customization and scannability. Notably, stable-diffusion models have ushered in an epoc… ▽ More

    Submitted 12 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR 2024

  45. arXiv:2403.05304  [pdf

    cs.RO

    Spatiotemporal Predictive Pre-training for Robotic Motor Control

    Authors: Jiange Yang, Bei Liu, Jianlong Fu, Bocheng Pan, Gangshan Wu, Limin Wang

    Abstract: Robotic motor control necessitates the ability to predict the dynamics of environments and interaction objects. However, advanced self-supervised pre-trained visual representations (PVRs) in robotic motor control, leveraging large-scale egocentric videos, often focus solely on learning the static content features of sampled image frames. This neglects the crucial temporal motion clues in human vid… ▽ More

    Submitted 27 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 21 pages, 9 figures, 8 tables

  46. arXiv:2403.03425  [pdf, other

    cs.LG physics.chem-ph q-bio.BM

    Sculpting Molecules in 3D: A Flexible Substructure Aware Framework for Text-Oriented Molecular Optimization

    Authors: Kaiwei Zhang, Yange Lin, Guangcheng Wu, Yuxiang Ren, Xuecang Zhang, Bo wang, Xiaoyu Zhang, Weitao Du

    Abstract: The integration of deep learning, particularly AI-Generated Content, with high-quality data derived from ab initio calculations has emerged as a promising avenue for transforming the landscape of scientific research. However, the challenge of designing molecular drugs or materials that incorporate multi-modality prior knowledge remains a critical and complex undertaking. Specifically, achieving a… ▽ More

    Submitted 5 March, 2024; originally announced March 2024.

  47. arXiv:2403.01501  [pdf, other

    cs.LG cs.CR

    Applying Self-supervised Learning to Network Intrusion Detection for Network Flows with Graph Neural Network

    Authors: Renjie Xu, Guangwei Wu, Weiping Wang, Xing Gao, An He, Zhengpeng Zhang

    Abstract: Graph Neural Networks (GNNs) have garnered intensive attention for Network Intrusion Detection System (NIDS) due to their suitability for representing the network traffic flows. However, most present GNN-based methods for NIDS are supervised or semi-supervised. Network flows need to be manually annotated as supervisory labels, a process that is time-consuming or even impossible, making NIDS diffic… ▽ More

    Submitted 3 March, 2024; originally announced March 2024.

    Comments: 15pages,8figures

  48. arXiv:2403.00170  [pdf, other

    cs.SE cs.PL

    AlloyASG: Alloy Predicate Code Representation as a Compact Structurally Balanced Graph

    Authors: Guanxuan Wu, Allison Sullivan

    Abstract: Writing declarative models has numerous benefits, ranging from automated reasoning and correction of design-level properties before systems are built to automated testing and debugging of their implementations after they are built. Unfortunately, the model itself needs to be correct to gain these benefits. Alloy is a commonly used modeling language that has several existing efforts to repair fault… ▽ More

    Submitted 4 May, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

    Comments: 12 pages

  49. ARTiST: Automated Text Simplification for Task Guidance in Augmented Reality

    Authors: Guande Wu, Jing Qian, Sonia Castelo, Shaoyu Chen, Joao Rulff, Claudio Silva

    Abstract: Text presented in augmented reality provides in-situ, real-time information for users. However, this content can be challenging to apprehend quickly when engaging in cognitively demanding AR tasks, especially when it is presented on a head-mounted display. We propose ARTiST, an automatic text simplification system that uses a few-shot prompt and GPT-3 models to specifically optimize the text lengt… ▽ More

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: Conditionally accepted by CHI '24

    ACM Class: H.1.2; I.2.7

  50. arXiv:2402.10834  [pdf, other

    stat.AP cs.CY

    Agent-based Simulation Evaluation of CBD Tolling: A Case Study from New York City

    Authors: Qingnan Liang, Ruili Yao, Ruixuan Zhang, Zhibin Chen, Guoyuan Wu

    Abstract: Congestion tollings have been widely developed and adopted as an effective tool to mitigate urban traffic congestion and enhance transportation system sustainability. Nevertheless, these tolling schemes are often tailored on a city-by-city or even area-by-area basis, and the cost of conducting field experiments often makes the design and evaluation process challenging. In this work, we leverage MA… ▽ More

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Accepted by 2024 IEEE Forum on Integrated and Sustainable Transportation Systems