Showing 1–50 of 418 results for author: Wu, K

  1. arXiv:2408.10013  [pdf, other]

    cs.DC cs.NE

    TBA: Faster Large Language Model Training Using SSD-Based Activation Offloading

    Authors: Kun Wu, Jeongmin Brian Park, Xiaofan Zhang, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, Wen-mei Hwu

    Abstract: The growth rate of the GPU memory capacity has not been able to keep up with that of the size of large language models (LLMs), hindering the model training process. In particular, activations -- the intermediate tensors produced during forward propagation and reused in backward propagation -- dominate the GPU memory use. To address this challenge, we propose TBA to efficiently offload activations… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

  2. arXiv:2408.09865  [pdf, other]

    cs.LG cs.CL cs.IR

    MAPLE: Enhancing Review Generation with Multi-Aspect Prompt LEarning in Explainable Recommendation

    Authors: Ching-Wen Yang, Che Wei Chen, Kun-da Wu, Hao Xu, Jui-Feng Yao, Hung-Yu Kao

    Abstract: Explainable Recommendation task is designed to receive a pair of user and item and output explanations to justify why an item is recommended to a user. Many models treat review-generation as a proxy of explainable recommendation. Although they are able to generate fluent and grammatical sentences, they suffer from generality and hallucination issues. We propose a personalized, aspect-controlled mo… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 8 main pages, 10 pages for appendix. Under review

  3. arXiv:2408.09532  [pdf, other]

    stat.ML cs.LG stat.ME

    Deep Limit Model-free Prediction in Regression

    Authors: Kejin Wu, Dimitris N. Politis

    Abstract: In this paper, we provide a novel Model-free approach based on Deep Neural Network (DNN) to accomplish point prediction and prediction interval under a general regression setting. Usually, people rely on parametric or non-parametric models to bridge dependent and independent variables (Y and X). However, this classical method relies heavily on the correct model specification. Even for the non-para… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  4. arXiv:2408.09439  [pdf, other]

    cs.IR cs.AI

    Towards Boosting LLMs-driven Relevance Modeling with Progressive Retrieved Behavior-augmented Prompting

    Authors: Zeyuan Chen, Haiyan Wu, Kaixin Wu, Wei Chen, Mingjie Zhong, Jia Xu, Zhongyi Liu, Wei Zhang

    Abstract: Relevance modeling is a critical component for enhancing user experience in search engines, with the primary objective of identifying items that align with users' queries. Traditional models only rely on the semantic congruence between queries and items to ascertain relevance. However, this approach represents merely one aspect of the relevance judgement, and is insufficient in isolation. Even pow… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  5. arXiv:2408.09357  [pdf, other]

    cs.GR cs.AI cs.SD eess.AS

    Meta-Learning Empowered Meta-Face: Personalized Speaking Style Adaptation for Audio-Driven 3D Talking Face Animation

    Authors: Xukun Zhou, Fengxin Li, Ziqiao Peng, Kejian Wu, Jun He, Biao Qin, Zhaoxin Fan, Hongyan Liu

    Abstract: Audio-driven 3D face animation is increasingly vital in live streaming and augmented reality applications. While remarkable progress has been observed, most existing approaches are designed for specific individuals with predefined speaking styles, thus neglecting the adaptability to varied speaking styles. To address this limitation, this paper introduces MetaFace, a novel methodology meticulously… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  6. arXiv:2408.09251  [pdf, other]

    cs.RO cs.AI cs.LG

    V2X-VLM: End-to-End V2X Cooperative Autonomous Driving Through Large Vision-Language Models

    Authors: Junwei You, Haotian Shi, Zhuoyu Jiang, Zilin Huang, Rui Gan, Keshu Wu, Xi Cheng, Xiaopeng Li, Bin Ran

    Abstract: Advancements in autonomous driving have increasingly focused on end-to-end (E2E) systems that manage the full spectrum of driving tasks, from environmental perception to vehicle navigation and control. This paper introduces V2X-VLM, an innovative E2E vehicle-infrastructure cooperative autonomous driving (VICAD) framework with large vision-language models (VLMs). V2X-VLM is designed to enhance situ… ▽ More

    Submitted 17 August, 2024; originally announced August 2024.

  7. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical… ▽ More

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  8. arXiv:2407.21118  [pdf, other]

    cs.AI cs.LG

    Palu: Compressing KV-Cache with Low-Rank Projection

    Authors: Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin, Chong-Yan Chen, Yu-Fang Hu, Pei-Shuo Wang, Ning-Chi Huang, Luis Ceze, Kai-Chiang Wu

    Abstract: KV-Cache compression methods generally sample a KV-Cache of effectual tokens or quantize it into lower bits. However, these methods cannot exploit the redundancy of the hidden dimension of KV tensors. This paper investigates a unique hidden dimension approach called Palu, a novel KV-Cache compression framework that utilizes low-rank projection. Palu decomposes the linear layers into low-rank matri… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

  9. arXiv:2407.20099  [pdf, other]

    cs.CV

    RSC-SNN: Exploring the Trade-off Between Adversarial Robustness and Accuracy in Spiking Neural Networks via Randomized Smoothing Coding

    Authors: Keming Wu, Man Yao, Yuhong Chou, Xuerui Qiu, Rui Yang, Bo Xu, Guoqi Li

    Abstract: Spiking Neural Networks (SNNs) have received widespread attention due to their unique neuronal dynamics and low-power nature. Previous research empirically shows that SNNs with Poisson coding are more robust than Artificial Neural Networks (ANNs) on small-scale datasets. However, it is still unclear in theory how the adversarial robustness of SNNs is derived, and whether SNNs can still maintain it… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  10. arXiv:2407.19259  [pdf, other]

    cs.CV cs.AI

    Fine-Grained Scene Graph Generation via Sample-Level Bias Prediction

    Authors: Yansheng Li, Tingzhu Wang, Kang Wu, Linlin Wang, Xin Guo, Wenbin Wang

    Abstract: Scene Graph Generation (SGG) aims to explore the relationships between objects in images and obtain scene summary graphs, thereby better serving downstream tasks. However, the long-tailed problem has adversely affected the scene graph's quality. The predictions are dominated by coarse-grained relationships, lacking more informative fine-grained ones. The union region of one object pair (i.e., one… ▽ More

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: 24 pages, 10 figures, ECCV2024

  11. arXiv:2407.16900  [pdf, other]

    cs.LG cs.AI cs.CY

    Regulating AI Adaptation: An Analysis of AI Medical Device Updates

    Authors: Kevin Wu, Eric Wu, Kit Rodolfa, Daniel E. Ho, James Zou

    Abstract: While the pace of development of AI has rapidly progressed in recent years, the implementation of safe and effective regulatory frameworks has lagged behind. In particular, the adaptive nature of AI models presents unique challenges to regulators as updating a model can improve its performance but also introduce safety risks. In the US, the Food and Drug Administration (FDA) has been a forerunner… ▽ More

    Submitted 22 June, 2024; originally announced July 2024.

    Journal ref: CHIL 2024

  12. arXiv:2407.15264  [pdf, other]

    cs.DC cs.LG

    LSM-GNN: Large-scale Storage-based Multi-GPU GNN Training by Optimizing Data Transfer Scheme

    Authors: Jeongmin Brian Park, Kun Wu, Vikram Sharma Mailthody, Zaid Quresh, Scott Mahlke, Wen-mei Hwu

    Abstract: Graph Neural Networks (GNNs) are widely used today in recommendation systems, fraud detection, and node/link classification tasks. Real world GNNs continue to scale in size and require a large memory footprint for storing graphs and embeddings that often exceed the memory capacities of the target GPUs used for training. To address limited memory capacities, traditional GNN training approaches use… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

  13. arXiv:2407.13284  [pdf, other]

    cs.IR

    Semantic-aware Representation Learning for Homography Estimation

    Authors: Yuhan Liu, Qianxin Huang, Siqi Hui, Jingwen Fu, Sanping Zhou, Kangyi Wu, Pengna Li, Jinjun Wang

    Abstract: Homography estimation is the task of determining the transformation from an image pair. Our approach focuses on employing detector-free feature matching methods to address this issue. Previous work has underscored the importance of incorporating semantic information, however there still lacks an efficient way to utilize semantic information. Previous methods suffer from treating the semantics as a… ▽ More

    Submitted 5 August, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  14. arXiv:2407.10449  [pdf, other]

    cs.LG stat.ML

    A Fast, Robust Elliptical Slice Sampling Implementation for Linearly Truncated Multivariate Normal Distributions

    Authors: Kaiwen Wu, Jacob R. Gardner

    Abstract: Elliptical slice sampling, when adapted to linearly truncated multivariate normal distributions, is a rejection-free Markov chain Monte Carlo method. At its core, it requires analytically constructing an ellipse-polytope intersection. The main novelty of this paper is an algorithm that computes this intersection in $\mathcal{O}(m \log m)$ time, where $m$ is the number of linear inequality constrai… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 13 pages

  15. Exploring Knowledge Transfer in Evolutionary Many-task Optimization: A Complex Network Perspective

    Authors: Yudong Yang, Kai Wu, Xiangyi Teng, Handing Wang, He Yu, Jing Liu

    Abstract: The field of evolutionary many-task optimization (EMaTO) is increasingly recognized for its ability to streamline the resolution of optimization challenges with repetitive characteristics, thereby conserving computational resources. This paper tackles the challenge of crafting efficient knowledge transfer mechanisms within EMaTO, a task complicated by the computational demands of individual task e… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 9 pages, accepted by GECCO 2024 poster

  16. arXiv:2407.08153  [pdf, other]

    cs.CV

    Lifelong Histopathology Whole Slide Image Retrieval via Distance Consistency Rehearsal

    Authors: Xinyu Zhu, Zhiguo Jiang, Kun Wu, Jun Shi, Yushan Zheng

    Abstract: Content-based histopathological image retrieval (CBHIR) has gained attention in recent years, offering the capability to return histopathology images that are content-wise similar to the query one from an established database. However, in clinical practice, the continuously expanding size of WSI databases limits the practical application of the current CBHIR methods. In this paper, we propose a Li… ▽ More

    Submitted 12 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted for MICCAI 2024

  17. arXiv:2407.07504  [pdf, other]

    cs.CV

    Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder

    Authors: Kun Wu, Zhiguo Jiang, Kunming Tang, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng

    Abstract: Large-scale pre-training models have promoted the development of histopathology image analysis. However, existing self-supervised methods for histopathology images focus on learning patch features, while there is still a lack of available pre-training models for WSI-level feature learning. In this paper, we propose a novel self-supervised learning framework for pan-cancer WSI-level representation… ▽ More

    Submitted 15 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  18. arXiv:2407.05736  [pdf, other]

    cs.AI cs.CV

    TransMA: an explainable multi-modal deep learning model for predicting properties of ionizable lipid nanoparticles in mRNA delivery

    Authors: Kun Wu, Zixu Wang, Xiulong Yang, Yangyang Chen, Zhenqi Han, Jialu Zhang, Lizhuang Liu

    Abstract: As the primary mRNA delivery vehicles, ionizable lipid nanoparticles (LNPs) exhibit excellent safety, high transfection efficiency, and strong immune response induction. However, the screening process for LNPs is time-consuming and costly. To expedite the identification of high-transfection-efficiency mRNA drug delivery systems, we propose an explainable LNPs transfection efficiency prediction mod… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 14 pages, 9 figures

  19. arXiv:2407.04203  [pdf, other]

    cs.CV

    HCS-TNAS: Hybrid Constraint-driven Semi-supervised Transformer-NAS for Ultrasound Image Segmentation

    Authors: Renqi Chen, Xinzhe Zheng, Haoyang Su, Kehan Wu

    Abstract: Precise ultrasound segmentation is vital for clinicians to provide comprehensive diagnoses. However, developing a model that accurately segments ultrasound images is challenging due to the images' low quality and the scarcity of extensive labeled data. This results in two main solutions: (1) optimizing multi-scale feature representations, and (2) increasing resistance to data dependency. The first… ▽ More

    Submitted 16 August, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  20. arXiv:2407.02431  [pdf, other]

    cs.LG cs.CR

    On the Robustness of Graph Reduction Against GNN Backdoor

    Authors: Yuxuan Zhu, Michael Mandulak, Kerui Wu, George Slota, Yuseok Jeon, Ka-Ho Chow, Lei Yu

    Abstract: Graph Neural Networks (GNNs) are gaining popularity across various domains due to their effectiveness in learning graph-structured data. Nevertheless, they have been shown to be susceptible to backdoor poisoning attacks, which pose serious threats to real-world applications. Meanwhile, graph reduction techniques, including coarsening and sparsification, which have long been employed to improve the… ▽ More

    Submitted 8 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  21. arXiv:2407.01846  [pdf, other]

    cs.CV

    Investigating the Segment Anything Foundation Model for Mapping Smallholder Agriculture Field Boundaries Without Training Labels

    Authors: Pratyush Tripathy, Kathy Baylis, Kyle Wu, Jyles Watson, Ruizhe Jiang

    Abstract: Accurate mapping of agricultural field boundaries is crucial for enhancing outcomes like precision agriculture, crop monitoring, and yield estimation. However, extracting these boundaries from satellite images is challenging, especially for smallholder farms and data-scarce environments. This study explores the Segment Anything Model (SAM) to delineate agricultural field boundaries in Bihar, India… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 11 pages, 6 main figures, 7 supplementary figures

  22. arXiv:2406.13988  [pdf, other]

    cs.CV

    LGmap: Local-to-Global Mapping Network for Online Long-Range Vectorized HD Map Construction

    Authors: Kuang Wu, Sulei Nian, Can Shen, Chuan Yang, Zhanbin Li

    Abstract: This report introduces the first-place winning solution for the Autonomous Grand Challenge 2024 - Mapless Driving. In this report, we introduce a novel online mapping pipeline, LGmap, which is adept at long-range temporal modeling. Firstly, we propose symmetric view transformation (SVT), a hybrid view transformation module. Our approach overcomes the limitations of forward sparse feature representation an… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

  23. arXiv:2406.13743  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.MM

    GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation

    Authors: Baiqi Li, Zhiqiu Lin, Deepak Pathak, Jiayao Li, Yixin Fei, Kewen Wu, Tiffany Ling, Xide Xia, Pengchuan Zhang, Graham Neubig, Deva Ramanan

    Abstract: While text-to-visual models now produce photo-realistic images and videos, they struggle with compositional text prompts involving attributes, relationships, and higher-order reasoning such as logic and comparison. In this work, we conduct an extensive human study on GenAI-Bench to evaluate the performance of leading image and video generation models in various aspects of compositional text-to-vis… ▽ More

    Submitted 21 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: We open-source our dataset, model, and code at: https://linzhiqiu.github.io/papers/genai_bench ; Project page: https://linzhiqiu.github.io/papers/genai_bench ; GenAI-Bench was first introduced in arXiv:2404.01291. This article extends it with an additional GenAI-Rank benchmark.

  24. arXiv:2406.11941  [pdf, other]

    cs.LG cs.AI cs.RO

    Crossfusor: A Cross-Attention Transformer Enhanced Conditional Diffusion Model for Car-Following Trajectory Prediction

    Authors: Junwei You, Haotian Shi, Keshu Wu, Keke Long, Sicheng Fu, Sikai Chen, Bin Ran

    Abstract: Vehicle trajectory prediction is crucial for advancing autonomous driving and advanced driver assistance systems (ADAS), enhancing road safety and traffic efficiency. While traditional methods have laid foundational work, modern deep learning techniques, particularly transformer-based models and generative approaches, have significantly improved prediction accuracy by capturing complex and non-lin… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

  25. arXiv:2406.11643  [pdf, other]

    cs.CV

    AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection

    Authors: Lingjie Kong, Kai Wu, Xiaobin Hu, Wenhui Han, Jinlong Peng, Chengming Xu, Donghao Luo, Jiangning Zhang, Chengjie Wang, Yanwei Fu

    Abstract: Text-to-image based object customization, aiming to generate images with the same identity (ID) as objects of interest in accordance with text prompts and reference images, has made significant progress. However, recent customizing research is dominated by specialized tasks, such as human customization or virtual try-on, leaving a gap in general object customization. To this end, we introduce AnyM… ▽ More

    Submitted 5 July, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  26. arXiv:2406.06496  [pdf, other]

    cs.LG cs.CL cs.CV

    Direct Preference Optimization for Suppressing Hallucinated Prior Exams in Radiology Report Generation

    Authors: Oishi Banerjee, Hong-Yu Zhou, Subathra Adithan, Stephen Kwak, Kay Wu, Pranav Rajpurkar

    Abstract: Recent advances in generative vision-language models (VLMs) have exciting potential implications for AI in radiology, yet VLMs are also known to produce hallucinations, nonsensical text, and other unwanted behaviors that can waste clinicians' time and cause patient harm. Drawing on recent work on direct preference optimization (DPO), we propose a simple method for modifying the behavior of pretrai… ▽ More

    Submitted 14 June, 2024; v1 submitted 10 June, 2024; originally announced June 2024.

    Comments: Added acknowledgements

  27. arXiv:2406.01870  [pdf, other]

    cs.LG stat.ML

    Understanding Stochastic Natural Gradient Variational Inference

    Authors: Kaiwen Wu, Jacob R. Gardner

    Abstract: Stochastic natural gradient variational inference (NGVI) is a popular posterior inference method with applications in various probabilistic models. Despite its wide usage, little is known about the non-asymptotic convergence rate in the \emph{stochastic} setting. We aim to lessen this gap and provide a better understanding. For conjugate likelihoods, we prove the first $\mathcal{O}(\frac{1}{T})$ n… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  28. arXiv:2406.01316  [pdf, other]

    cs.CV cs.AI

    Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs

    Authors: Vitor Fortes Rey, Lala Shakti Swarup Ray, Xia Qingxin, Kaishun Wu, Paul Lukowicz

    Abstract: Due to the scarcity of labeled sensor data in HAR, prior research has turned to video data to synthesize Inertial Measurement Units (IMU) data, capitalizing on its rich activity annotations. However, generating IMU data from videos presents challenges for HAR in real-world settings, attributed to the poor quality of synthetic IMU data and its limited efficacy in subtle, fine-grained motions. In th… ▽ More

    Submitted 27 July, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: ISWC 2024

  29. arXiv:2405.20613  [pdf, other]

    cs.CL

    FineRadScore: A Radiology Report Line-by-Line Evaluation Technique Generating Corrections with Severity Scores

    Authors: Alyssa Huang, Oishi Banerjee, Kay Wu, Eduardo Pontes Reis, Pranav Rajpurkar

    Abstract: The current gold standard for evaluating generated chest x-ray (CXR) reports is through radiologist annotations. However, this process can be extremely time-consuming and costly, especially when evaluating large numbers of reports. In this work, we present FineRadScore, a Large Language Model (LLM)-based automated evaluation metric for generated CXR reports. Given a candidate report and a ground-t… ▽ More

    Submitted 12 August, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

  30. arXiv:2405.20343  [pdf, other]

    cs.CV cs.GR cs.LG

    Unique3D: High-Quality and Efficient 3D Mesh Generation from a Single Image

    Authors: Kailu Wu, Fangfu Liu, Zhihan Cai, Runjie Yan, Hanyang Wang, Yating Hu, Yueqi Duan, Kaisheng Ma

    Abstract: In this work, we introduce Unique3D, a novel image-to-3D framework for efficiently generating high-quality 3D meshes from single-view images, featuring state-of-the-art generation fidelity and strong generalizability. Previous methods based on Score Distillation Sampling (SDS) can produce diversified 3D results by distilling 3D knowledge from large 2D diffusion models, but they usually suffer from… ▽ More

    Submitted 13 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Project page: https://wukailu.github.io/Unique3D

    ACM Class: I.2.10

  31. arXiv:2405.20281  [pdf, other]

    cs.CR quant-ph

    Tight Characterizations for Preprocessing against Cryptographic Salting

    Authors: Fangqi Dong, Qipeng Liu, Kewen Wu

    Abstract: Cryptography often considers the strongest yet plausible attacks in the real world. Preprocessing (a.k.a. non-uniform attack) plays an important role in both theory and practice: an efficient online attacker can take advantage of advice prepared by a time-consuming preprocessing stage. Salting is a heuristic strategy to counter preprocessing attacks by feeding a small amount of randomness to the… ▽ More

    Submitted 30 May, 2024; originally announced May 2024.

  32. arXiv:2405.20081  [pdf, other]

    cs.CV cs.AI

    NoiseBoost: Alleviating Hallucination with Noise Perturbation for Multimodal Large Language Models

    Authors: Kai Wu, Boyuan Jiang, Zhengkai Jiang, Qingdong He, Donghao Luo, Shengzhi Wang, Qingwen Liu, Chengjie Wang

    Abstract: Multimodal large language models (MLLMs) contribute a powerful mechanism to understanding visual information building on large language models. However, MLLMs are notorious for suffering from hallucinations, especially when generating lengthy, detailed descriptions for images. Our analysis reveals that hallucinations stem from the inherent summarization mechanism of large language models, leading… ▽ More

    Submitted 31 May, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: 14 pages, 5 figures with supplementary material

  33. arXiv:2405.17372  [pdf, other]

    cs.AI cs.LG cs.RO

    BehaviorGPT: Smart Agent Simulation for Autonomous Driving with Next-Patch Prediction

    Authors: Zikang Zhou, Haibo Hu, Xinhong Chen, Jianping Wang, Nan Guan, Kui Wu, Yung-Hui Li, Yu-Kai Huang, Chun Jason Xue

    Abstract: Simulating realistic interactions among traffic agents is crucial for efficiently validating the safety of autonomous driving systems. Existing leading simulators primarily use an encoder-decoder structure to encode the historical trajectories for future simulation. However, such a paradigm complicates the model architecture, and the manual separation of history and future trajectories leads to lo… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

  34. arXiv:2405.17336  [pdf, other]

    cs.CL

    XFormParser: A Simple and Effective Multimodal Multilingual Semi-structured Form Parser

    Authors: Xianfu Cheng, Hang Zhang, Jian Yang, Xiang Li, Weixiao Zhou, Kui Wu, Fei Liu, Wei Zhang, Tao Sun, Tongliang Li, Zhoujun Li

    Abstract: In the domain of document AI, semi-structured form parsing plays a crucial role. This task leverages techniques from key information extraction (KIE), dealing with inputs that range from plain text to intricate modal data comprising images and structural layouts. The advent of pre-trained multimodal models has driven the extraction of key information from form documents in different formats such a… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 10 pages, 3 figures, 6 tables

  35. arXiv:2405.14133  [pdf, other]

    cs.LG cs.AI cs.SC

    Automated Loss function Search for Class-imbalanced Node Classification

    Authors: Xinyu Guo, Kai Wu, Xiaoyu Zhang, Jing Liu

    Abstract: Class-imbalanced node classification tasks are prevalent in real-world scenarios. Due to the uneven distribution of nodes across different classes, learning high-quality node representations remains a challenging endeavor. The engineering of loss functions has shown promising potential in addressing this issue. It involves the meticulous design of loss functions, utilizing information about the qu… ▽ More

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  36. arXiv:2405.12484  [pdf, other]

    cs.GR

    Meta-Homogenization for Knitwear Simulation

    Authors: Chun Yuan, Kui Wu, Haoyang Shi, Lei Lan, Yuxing Qiu, Cem Yuksel, Huamin Wang, Chenfanfu Jiang, Yin Yang

    Abstract: This paper presents meta-homogenization, a spatially varying homogenization scheme for knitwear simulation. We are motivated by the observation that macro-scale fabric dynamics is strongly correlated with its underlying knitting patterns. Therefore, homogenization towards a single material is less effective when the knitting is complex and non-repetitive. Our method tackles this challenge by homog… ▽ More

    Submitted 23 May, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

  37. arXiv:2405.10988  [pdf, other]

    cs.LG cs.AI

    Flow Score Distillation for Diverse Text-to-3D Generation

    Authors: Runjie Yan, Kailu Wu, Kaisheng Ma

    Abstract: Recent advancements in Text-to-3D generation have yielded remarkable progress, particularly through methods that rely on Score Distillation Sampling (SDS). While SDS exhibits the capability to create impressive 3D assets, it is hindered by its inherent maximum-likelihood-seeking essence, resulting in limited diversity in generation outcomes. In this paper, we discover that the Denoise Diffusion Im… ▽ More

    Submitted 28 July, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: Consistent Flow Distillation is an improved version of this paper

  38. arXiv:2405.10739  [pdf, other]

    cs.CV cs.AI

    Efficient Multimodal Large Language Models: A Survey

    Authors: Yizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang Ma

    Abstract: In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry. Thus, studying efficient and lightweight MLLMs has enormous potential, e… ▽ More

    Submitted 9 August, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  39. arXiv:2405.10565  [pdf, other]

    cs.GR

    Real-time Level-of-Detail Strand-based Hair Rendering

    Authors: Tao Huang, Yang Zhou, Daqi Lin, Junqiu Zhu, Ling-Qi Yan, Kui Wu

    Abstract: Strand-based hair rendering has become increasingly popular in production for its realistic appearance. However, the prevailing level-of-detail solution employing hair cards for distant hair models introduces a significant discontinuity in dynamics and appearance during the transition from strands to cards. We introduce an innovative real-time framework for strand-based hair rendering that ensures… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: 12 pages, 10 figures, 1 performance plot

    ACM Class: I.3.5; I.3.3

  40. arXiv:2405.09285  [pdf, other]

    cs.LG math.NA

    Positional Knowledge is All You Need: Position-induced Transformer (PiT) for Operator Learning

    Authors: Junfeng Chen, Kailiang Wu

    Abstract: Operator learning for Partial Differential Equations (PDEs) is rapidly emerging as a promising approach for surrogate modeling of intricate systems. Transformers with the self-attention mechanism–a powerful tool originally designed for natural language processing–have recently been adapted for operator learning. However, they confront challenges, including high comp… ▽ More

    Submitted 15 May, 2024; originally announced May 2024.

    Journal ref: Poster in International Conference on Machine Learning (ICML) 2024

  41. arXiv:2405.08276  [pdf, other]

    stat.ML cs.LG stat.CO

    Scalable Subsampling Inference for Deep Neural Networks

    Authors: Kejin Wu, Dimitris N. Politis

    Abstract: Deep neural networks (DNNs) have received increasing attention in machine learning applications in the last several years. Recently, a non-asymptotic error bound has been developed to measure the performance of the fully connected DNN estimator with ReLU activation functions for estimating regression models. The paper at hand gives a small improvement on the current error bound based on the latest r… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

  42. arXiv:2405.06201  [pdf, other]

    cs.CV

    PhysMLE: Generalizable and Priors-Inclusive Multi-task Remote Physiological Measurement

    Authors: Jiyao Wang, Hao Lu, Ange Wang, Xiao Yang, Yingcong Chen, Dengbo He, Kaishun Wu

    Abstract: Remote photoplethysmography (rPPG) has been widely applied to measure heart rate from face videos. To increase the generalizability of the algorithms, domain generalization (DG) has attracted increasing attention in rPPG. However, when rPPG is extended to simultaneously measure more vital signs (e.g., respiration and blood oxygen saturation), achieving generalizability brings new challenges. Although…

    Submitted 9 May, 2024; originally announced May 2024.

  43. arXiv:2405.05663  [pdf, other]

    cs.CV

    RPBG: Towards Robust Neural Point-based Graphics in the Wild

    Authors: Qingtian Zhu, Zizhuang Wei, Zhongtian Zheng, Yifan Zhan, Zhuyu Yao, Jiawang Zhang, Kejian Wu, Yinqiang Zheng

    Abstract: Point-based representations have recently gained popularity in novel view synthesis, for their unique advantages, e.g., intuitive geometric representation, simple manipulation, and faster convergence. However, based on our observation, these point-based neural re-rendering methods are only expected to perform well under ideal conditions and suffer from noisy, patchy points and unbounded scenes, wh…

    Submitted 10 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

    Comments: ECCV 2024

  44. arXiv:2405.04324  [pdf, other]

    cs.AI cs.CL cs.SE

    Granite Code Models: A Family of Open Foundation Models for Code Intelligence

    Authors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang, Yikang Shen, Aditya Prasad, Adriana Meza Soria, Michele Merler, Parameswaran Selvam, Saptha Surendran, Shivdeep Singh, Manish Sethi, Xuan-Hong Dang, Pengyuan Li, Kun-Lung Wu, Syed Zawad, Andrew Coleman, Matthew White, Mark Lewis, Raju Pavuluri, Yan Koyfman, Boris Lublinsky, Maximilien de Bayser, Ibrahim Abdelaziz, Kinjal Basu, Mayank Agarwal, et al. (21 additional authors not shown)

    Abstract: Large Language Models (LLMs) trained on code are revolutionizing the software development process. Increasingly, code LLMs are being integrated into software development environments to improve the productivity of human programmers, and LLM-based agents are beginning to show promise for handling complex tasks autonomously. Realizing the full potential of code LLMs requires a wide range of capabili…

    Submitted 7 May, 2024; originally announced May 2024.

    Comments: Corresponding Authors: Rameswar Panda, Ruchir Puri; Equal Contributors: Mayank Mishra, Matt Stallone, Gaoyuan Zhang

  45. arXiv:2405.03728  [pdf, other]

    cs.NE cs.AI

    GLHF: General Learned Evolutionary Algorithm Via Hyper Functions

    Authors: Xiaobin Li, Kai Wu, Yujian Betterest Li, Xiaoyu Zhang, Handing Wang, Jing Liu

    Abstract: Pretrained Optimization Models (POMs) leverage knowledge gained from optimizing various tasks, providing efficient solutions for new optimization challenges through direct usage or fine-tuning. Despite the inefficiencies and limited generalization abilities observed in current POMs, our proposed model, the general pre-trained optimization model (GPOM), addresses these shortcomings. GPOM constructs…

    Submitted 6 May, 2024; originally announced May 2024.

  46. arXiv:2405.00566  [pdf, other]

    cs.CE cs.CL q-fin.GN

    NumLLM: Numeric-Sensitive Large Language Model for Chinese Finance

    Authors: Huan-Yi Su, Ke Wu, Yu-Hao Huang, Wu-Jun Li

    Abstract: Recently, many works have proposed various financial large language models (FinLLMs) by pre-training from scratch or fine-tuning open-sourced LLMs on financial corpora. However, existing FinLLMs exhibit unsatisfactory performance in understanding financial text when numeric variables are involved in questions. In this paper, we propose a novel LLM, called numeric-sensitive large language model (Nu…

    Submitted 1 May, 2024; originally announced May 2024.

  47. arXiv:2404.18891  [pdf, other]

    cs.CV cs.AI cs.LG

    IPixMatch: Boost Semi-supervised Semantic Segmentation with Inter-Pixel Relation

    Authors: Kebin Wu, Wenbin Li, Xiaofei Xiao

    Abstract: The scarcity of labeled data in real-world scenarios is a critical bottleneck of deep learning's effectiveness. Semi-supervised semantic segmentation has been a typical solution to achieve a desirable tradeoff between annotation cost and segmentation performance. However, previous approaches, whether based on consistency regularization or self-training, tend to neglect the contextual knowledge emb…

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 7 pages, 2 figures

  48. arXiv:2404.18327  [pdf, other]

    cs.CV

    MultiMAE-DER: Multimodal Masked Autoencoder for Dynamic Emotion Recognition

    Authors: Peihao Xiang, Chaohao Lin, Kaida Wu, Ou Bai

    Abstract: This paper presents a novel approach to processing multimodal data for dynamic emotion recognition, named as the Multimodal Masked Autoencoder for Dynamic Emotion Recognition (MultiMAE-DER). The MultiMAE-DER leverages the closely correlated representation information within spatiotemporal sequences across visual and audio modalities. By utilizing a pre-trained masked autoencoder model, the MultiMA…

    Submitted 16 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: Camera-ready Version, Accepted by ICPRS 2024

  49. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  50. arXiv:2404.13611  [pdf, other]

    cs.CV cs.CL

    Video sentence grounding with temporally global textual knowledge

    Authors: Cai Chen, Runzhong Zhang, Jianjun Gao, Kejun Wu, Kim-Hui Yap, Yi Wang

    Abstract: Temporal sentence grounding involves the retrieval of a video moment with a natural language query. Many existing works directly incorporate the given video and temporally localized query for temporal grounding, overlooking the inherent domain gap between different modalities. In this paper, we utilize pseudo-query features containing extensive temporally global textual knowledge sourced from the…

    Submitted 1 June, 2024; v1 submitted 21 April, 2024; originally announced April 2024.