
Showing 1–50 of 3,996 results for author: Yang, Y

Searching in archive cs.
  1. arXiv:2408.11491  [pdf, other]

    cs.AI

    Nothing in Excess: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering

    Authors: Zouying Cao, Yifei Yang, Hai Zhao

    Abstract: Safety alignment is indispensable for large language models (LLMs) to defend against threats from malicious instructions. However, recent research reveals that safety-aligned LLMs are prone to rejecting benign queries due to the exaggerated safety issue, limiting their helpfulness. In this paper, we propose a Safety-Conscious Activation Steering (SCANS) method to mitigate the exaggerated safety concerns in aligned L…

    Submitted 21 August, 2024; originally announced August 2024.

  2. arXiv:2408.11405  [pdf, other]

    cs.SD eess.AS

    DDSP Guitar Amp: Interpretable Guitar Amplifier Modeling

    Authors: Yen-Tung Yeh, Yu-Hua Chen, Yuan-Chiao Cheng, Jui-Te Wu, Jun-Jie Fu, Yi-Fan Yeh, Yi-Hsuan Yang

    Abstract: Neural network models for guitar amplifier emulation, while being effective, often demand high computational cost and lack interpretability. Drawing ideas from physical amplifier design, this paper aims to address these issues with a new differentiable digital signal processing (DDSP)-based model, called "DDSP guitar amp," that models the four components of a guitar amp (i.e., preamp, tone stack…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Preprint paper

  3. arXiv:2408.10588  [pdf, other]

    cs.CV cs.GR

    DEGAS: Detailed Expressions on Full-Body Gaussian Avatars

    Authors: Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang

    Abstract: Although neural rendering has made significant advancements in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method lea…

    Submitted 20 August, 2024; originally announced August 2024.

  4. arXiv:2408.10285  [pdf, other]

    cs.LG cs.AI cs.CE

    BatGPT-Chem: A Foundation Large Model For Retrosynthesis Prediction

    Authors: Yifei Yang, Runhan Shi, Zuchao Li, Shu Jiang, Bao-Liang Lu, Yang Yang, Hai Zhao

    Abstract: Retrosynthesis analysis is pivotal yet challenging in drug discovery and organic chemistry. Despite the proliferation of computational tools over the past decade, AI-based systems often fall short in generalizing across diverse reaction types and exploring alternative synthetic pathways. This paper presents BatGPT-Chem, a large language model with 15 billion parameters, tailored for enhanced retro…

    Submitted 19 August, 2024; originally announced August 2024.

  5. arXiv:2408.09943  [pdf]

    cs.CR

    Calibrating Noise for Group Privacy in Subsampled Mechanisms

    Authors: Yangfan Jiang, Xinjian Luo, Yin Yang, Xiaokui Xiao

    Abstract: Given a group size m and a sensitive dataset D, group privacy (GP) releases information about D with the guarantee that the adversary cannot infer with high confidence whether the underlying data is D or a neighboring dataset D' that differs from D by m records. GP generalizes the well-established notion of differential privacy (DP) for protecting individuals' privacy; in particular, when m=1, GP…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: accepted for publication in Proceedings of VLDB Endowment (PVLDB) 2025

  6. arXiv:2408.09851  [pdf, other]

    cs.NI eess.SY

    ISAC-Fi: Enabling Full-fledged Monostatic Sensing over Wi-Fi Communication

    Authors: Zhe Chen, Chao Hu, Tianyue Zheng, Hangcheng Cao, Yanbing Yang, Yen Chu, Hongbo Jiang, Jun Luo

    Abstract: Whereas Wi-Fi communications have been exploited for sensing purposes for over a decade, the bistatic or multistatic nature of Wi-Fi still poses multiple challenges, hampering real-life deployment of integrated sensing and communication (ISAC) within the Wi-Fi framework. In this paper, we aim to re-design Wi-Fi so that monostatic sensing (mimicking radar) can be achieved over the multistatic communicati…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 14 pages, 22 figures

  7. arXiv:2408.09768  [pdf, other]

    cs.AI

    MalLight: Influence-Aware Coordinated Traffic Signal Control for Traffic Signal Malfunctions

    Authors: Qinchen Yang, Zejun Xie, Hua Wei, Desheng Zhang, Yu Yang

    Abstract: Urban traffic is subject to disruptions that cause extended waiting time and safety issues at signalized intersections. While numerous studies have addressed the issue of intelligent traffic systems in the context of various disturbances, traffic signal malfunction, a common real-world occurrence with significant repercussions, has received comparatively limited attention. The primary objective of…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: Paper accepted to CIKM24 Full Research track

  8. arXiv:2408.09706  [pdf, other]

    cs.CV

    MePT: Multi-Representation Guided Prompt Tuning for Vision-Language Model

    Authors: Xinyang Wang, Yi Yang, Minfeng Zhu, Kecheng Zheng, Shi Liu, Wei Chen

    Abstract: Recent advancements in pre-trained Vision-Language Models (VLMs) have highlighted the significant potential of prompt tuning for adapting these models to a wide range of downstream tasks. However, existing prompt tuning methods typically map an image to a single representation, limiting the model's ability to capture the diverse ways an image can be described. To address this limitation, we invest…

    Submitted 19 August, 2024; originally announced August 2024.

  9. arXiv:2408.09588  [pdf, other]

    cs.AI

    SynTraC: A Synthetic Dataset for Traffic Signal Control from Traffic Monitoring Cameras

    Authors: Tiejin Chen, Prithvi Shirke, Bharatesh Chakravarthi, Arpitsinh Vaghela, Longchao Da, Duo Lu, Yezhou Yang, Hua Wei

    Abstract: This paper introduces SynTraC, the first public image-based traffic signal control dataset, aimed at bridging the gap between simulated environments and real-world traffic management challenges. Unlike traditional datasets for traffic signal control which aim to provide simplified feature vectors like vehicle counts from traffic simulators, SynTraC provides real-style images from the CARLA simulat…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted to IEEE ITSC2024

  10. arXiv:2408.09172  [pdf, other]

    cs.AI cs.CL

    Unc-TTP: A Method for Classifying LLM Uncertainty to Improve In-Context Example Selection

    Authors: Hsiu-Yuan Huang, Zichen Wu, Yutong Yang, Junzhao Zhang, Yunfang Wu

    Abstract: Nowadays, Large Language Models (LLMs) have demonstrated exceptional performance across various downstream tasks. However, it is challenging for users to discern whether the responses are generated with certainty or are fabricated to meet user expectations. Estimating the uncertainty of LLMs is particularly challenging due to their vast scale and the lack of white-box access. In this work, we prop…

    Submitted 20 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

    Comments: 9 pages, long paper

  11. arXiv:2408.08978  [pdf, other]

    cs.CL

    See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses

    Authors: Yulong Chen, Yang Liu, Jianhao Yan, Xuefeng Bai, Ming Zhong, Yinghao Yang, Ziyi Yang, Chenguang Zhu, Yue Zhang

    Abstract: The impressive performance of Large Language Models (LLMs) has consistently surpassed numerous human-designed benchmarks, presenting new challenges in assessing the shortcomings of LLMs. Designing tasks and finding LLMs' limitations are becoming increasingly important. In this paper, we investigate the question of whether an LLM can discover its own limitations from the errors it makes. To this en…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: COLM 2024

  12. arXiv:2408.08669  [pdf, other]

    cs.SD eess.AS

    HSDreport: Heart Sound Diagnosis with Echocardiography Reports

    Authors: Zihan Zhao, Pingjie Wang, Liudan Zhao, Yuchen Yang, Ya Zhang, Kun Sun, Xin Sun, Xin Zhou, Yu Wang, Yanfeng Wang

    Abstract: Heart sound auscultation holds significant importance in the diagnosis of congenital heart disease. However, existing methods for Heart Sound Diagnosis (HSD) tasks are predominantly limited to a few fixed categories, framing the HSD task as a rigid classification problem that does not fully align with medical practice and offers only limited information to physicians. Besides, such methods do not…

    Submitted 16 August, 2024; originally announced August 2024.

  13. arXiv:2408.08047  [pdf, other]

    cs.LG cs.IR

    An Efficient Continuous Control Perspective for Reinforcement-Learning-based Sequential Recommendation

    Authors: Jun Wang, Likang Wu, Qi Liu, Yu Yang

    Abstract: Sequential recommendation, where user preference is dynamically inferred from sequential historical behaviors, is a critical task in recommender systems (RSs). To further optimize long-term user engagement, offline reinforcement-learning-based RSs have become a mainstream technique as they provide an additional advantage in avoiding global explorations that may harm online users' experiences. Howe…

    Submitted 15 August, 2024; originally announced August 2024.

  14. arXiv:2408.07986  [pdf, other]

    cs.LG

    Experimental evaluation of offline reinforcement learning for HVAC control in buildings

    Authors: Jun Wang, Linyan Li, Qi Liu, Yu Yang

    Abstract: Reinforcement learning (RL) techniques have been increasingly investigated for dynamic HVAC control in buildings. However, most studies focus on exploring solutions in online or off-policy scenarios without discussing in detail the implementation feasibility or effectiveness of dealing with purely offline datasets or trajectories. The lack of these works limits the real-world deployment of RL-base…

    Submitted 15 August, 2024; originally announced August 2024.

  15. AIE: Auction Information Enhanced Framework for CTR Prediction in Online Advertising

    Authors: Yang Yang, Bo Chen, Chenxu Zhu, Menghui Zhu, Xinyi Dai, Huifeng Guo, Muyu Zhang, Zhenhua Dong, Ruiming Tang

    Abstract: Click-Through Rate (CTR) prediction is a fundamental technique for online advertising recommendation and the complex online competitive auction process also brings many difficulties to CTR optimization. Recent studies have shown that introducing posterior auction information contributes to the performance of CTR prediction. However, existing work doesn't fully capitalize on the benefits of auction…

    Submitted 14 August, 2024; originally announced August 2024.

  16. arXiv:2408.07888  [pdf, other]

    cs.CL

    Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering

    Authors: Yushi Yang, Andrew M. Bean, Robert McCraith, Adam Mahdi

    Abstract: Training Large Language Models (LLMs) incurs substantial data-related costs, motivating the development of data-efficient training methods through optimised data ordering and selection. Human-inspired learning strategies, such as curriculum learning, offer possibilities for efficient training by organising data according to common human learning practices. Despite evidence that fine-tuning with cu…

    Submitted 14 August, 2024; originally announced August 2024.

  17. arXiv:2408.07556  [pdf, other]

    cs.LG

    PolyCL: Contrastive Learning for Polymer Representation Learning via Explicit and Implicit Augmentations

    Authors: Jiajun Zhou, Yijie Yang, Austin M. Mroz, Kim E. Jelfs

    Abstract: Polymers play a crucial role in a wide array of applications due to their diverse and tunable properties. Establishing the relationship between polymer representations and their properties is crucial to the computational design and screening of potential polymers via machine learning. The quality of the representation significantly influences the effectiveness of these computational methods. Here,…

    Submitted 14 August, 2024; originally announced August 2024.

  18. arXiv:2408.07430  [pdf, other]

    cs.CV

    UAHOI: Uncertainty-aware Robust Interaction Learning for HOI Detection

    Authors: Mu Chen, Minghan Chen, Yi Yang

    Abstract: This paper focuses on Human-Object Interaction (HOI) detection, addressing the challenge of identifying and understanding the interactions between humans and objects within a given image or video frame. Spearheaded by Detection Transformer (DETR), recent developments have led to significant improvements by replacing traditional region proposals with a set of learnable queries. However, despite the power…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted by CVIU

  19. arXiv:2408.07246  [pdf, other]

    cs.LG cs.CV

    ChemVLM: Exploring the Power of Multimodal Large Language Models in Chemistry Area

    Authors: Junxian Li, Di Zhang, Xunzhi Wang, Zeying Hao, Jingdi Lei, Qian Tan, Cai Zhou, Wei Liu, Yaotian Yang, Xinrui Xiong, Weiyun Wang, Zhe Chen, Wenhai Wang, Wei Li, Shufei Zhang, Mao Su, Wanli Ouyang, Yuqiang Li, Dongzhan Zhou

    Abstract: Large Language Models (LLMs) have achieved remarkable success and have been applied across various scientific fields, including chemistry. However, many chemical tasks require the processing of visual information, which cannot be successfully handled by existing chemical LLMs. This brings a growing need for models capable of integrating multimodal information in the chemical domain. In this paper,…

    Submitted 16 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 11 pages, updated version

  20. arXiv:2408.07084  [pdf]

    cs.LG cs.AI

    Dynamic Hypergraph-Enhanced Prediction of Sequential Medical Visits

    Authors: Wangying Yang, Zitao Zheng, Shi Bo, Zhizhong Wu, Bo Zhang, Yuanfang Yang

    Abstract: This study introduces a pioneering Dynamic Hypergraph Networks (DHCE) model designed to predict future medical diagnoses from electronic health records with enhanced accuracy. The DHCE model innovates by identifying and differentiating acute and chronic diseases within a patient's visit history, constructing dynamic hypergraphs that capture the complex, high-order interactions between diseases. It…

    Submitted 19 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  21. arXiv:2408.06816  [pdf, other]

    cs.AI cs.CL

    MAQA: Evaluating Uncertainty Quantification in LLMs Regarding Data Uncertainty

    Authors: Yongjin Yang, Haneul Yoo, Hwaran Lee

    Abstract: Although large language models (LLMs) are capable of performing various tasks, they still suffer from producing plausible but incorrect responses. To improve the reliability of LLMs, recent research has focused on uncertainty quantification to predict whether a response is correct or not. However, most uncertainty quantification methods have been evaluated on questions requiring a single clear ans…

    Submitted 13 August, 2024; originally announced August 2024.

  22. arXiv:2408.06740  [pdf, other]

    cs.CV cs.AI

    DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion

    Authors: Yujia Wu, Yiming Shi, Jiwei Wei, Chengwei Sun, Yuyang Zhou, Yang Yang, Heng Tao Shen

    Abstract: Personalized text-to-image generation has gained significant attention for its capability to generate high-fidelity portraits of specific identities conditioned on user-defined prompts. Existing methods typically involve test-time fine-tuning or instead incorporating an additional pre-trained branch. However, these approaches struggle to simultaneously address the demands of efficiency, identity f…

    Submitted 18 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 9 pages, 8 figures

  23. arXiv:2408.06574  [pdf, other]

    cs.CL

    SparkRA: A Retrieval-Augmented Knowledge Service System Based on Spark Large Language Model

    Authors: Dayong Wu, Jiaqi Li, Baoxin Wang, Honghong Zhao, Siyuan Xue, Yanjie Yang, Zhijun Chang, Rui Zhang, Li Qian, Bo Wang, Shijin Wang, Zhixiong Zhang, Guoping Hu

    Abstract: Large language models (LLMs) have shown remarkable achievements across various language tasks. To enhance the performance of LLMs in scientific literature services, we developed the scientific literature LLM (SciLit-LLM) through pre-training and supervised fine-tuning on scientific literature, building upon the iFLYTEK Spark LLM. Furthermore, we present a knowledge service system Spark Research Ass…

    Submitted 12 August, 2024; originally announced August 2024.

  24. arXiv:2408.06327  [pdf, other]

    cs.AI cs.CL cs.CV

    VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

    Authors: Xiao Liu, Tianjie Zhang, Yu Gu, Iat Long Iong, Yifan Xu, Xixuan Song, Shudan Zhang, Hanyu Lai, Xinyi Liu, Hanlin Zhao, Jiadai Sun, Xinyue Yang, Yu Yang, Zehan Qi, Shuntian Yao, Xueqiao Sun, Siyi Cheng, Qinkai Zheng, Hao Yu, Hanchen Zhang, Wenyi Hong, Ming Ding, Lihang Pan, Xiaotao Gu, Aohan Zeng, et al. (5 additional authors not shown)

    Abstract: Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents. These agents are postulated to excel across a myriad of tasks, potentially approaching general artificial intelligence. However, existing benchmarks fail to sufficiently challenge or showcase the full potential of LMM…

    Submitted 12 August, 2024; originally announced August 2024.

  25. arXiv:2408.06072  [pdf, other]

    cs.CV

    CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

    Authors: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang

    Abstract: We introduce CogVideoX, a large-scale diffusion transformer model designed for generating videos based on text prompts. To efficiently model video data, we propose to leverage a 3D Variational Autoencoder (VAE) to compress videos along both spatial and temporal dimensions. To improve the text-video alignment, we propose an expert transformer with the expert adaptive LayerNorm to facilitate the deep…

    Submitted 12 August, 2024; originally announced August 2024.

  26. arXiv:2408.06053  [pdf, other]

    cs.SD eess.AS

    PyNeuralFx: A Python Package for Neural Audio Effect Modeling

    Authors: Yen-Tung Yeh, Wen-Yi Hsiao, Yi-Hsuan Yang

    Abstract: We present PyNeuralFx, an open-source Python toolkit designed for research on neural audio effect modeling. The toolkit provides an intuitive framework and offers a comprehensive suite of features, including standardized implementation of well-established model architectures, loss functions, and easy-to-use visualization tools. As such, it helps promote reproducibility for research on neural audio…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: toolkit paper

  27. arXiv:2408.05938  [pdf, other]

    cs.CV

    Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation

    Authors: Utkarsh Nath, Rajeev Goel, Eun Som Jeon, Changhoon Kim, Kyle Min, Yezhou Yang, Yingzhen Yang, Pavan Turaga

    Abstract: To address the data scarcity associated with 3D assets, 2D-lifting techniques such as Score Distillation Sampling (SDS) have become a widely adopted practice in text-to-3D generation pipelines. However, the diffusion models used in these techniques are prone to viewpoint bias and thus lead to geometric inconsistencies such as the Janus problem. To counter this, we introduce MT3D, a text-to-3D gene…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 9 pages, 8 figures

  28. arXiv:2408.05738  [pdf, other]

    cs.CL

    Language-Informed Beam Search Decoding for Multilingual Machine Translation

    Authors: Yilin Yang, Stefan Lee, Prasad Tadepalli

    Abstract: Beam search decoding is the de-facto method for decoding auto-regressive Neural Machine Translation (NMT) models, including multilingual NMT where the target language is specified as an input. However, decoding multilingual NMT models commonly produces "off-target" translations -- yielding translation outputs not in the intended language. In this paper, we first conduct an error analysis of off-…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: ACL 2024 Findings

  29. arXiv:2408.05564  [pdf, other]

    cs.NE cs.CE

    Meta-heuristic Optimizer Inspired by the Philosophy of Yi Jing

    Authors: Yisheng Yang, Sim Kuan Goh, Qing Cai, Shen Yuong Wong, Ho-Kin Tang

    Abstract: Drawing inspiration from the philosophy of Yi Jing, the Yin-Yang pair optimization (YYPO) algorithm has been shown to achieve competitive performance in single objective optimizations, in addition to the advantage of low time complexity when compared to other population-based meta-heuristics. Building upon a reversal concept in Yi Jing, we propose the novel Yi optimization (YI) algorithm. Specific…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. arXiv admin note: substantial text overlap with arXiv:2104.08564

  30. arXiv:2408.05556  [pdf, other]

    cs.CV cs.AI cs.LG cs.NE

    Evolutionary Neural Architecture Search for 3D Point Cloud Analysis

    Authors: Yisheng Yang, Guodong Du, Chean Khim Toa, Ho-Kin Tang, Sim Kuan Goh

    Abstract: Neural architecture search (NAS) automates neural network design by using optimization algorithms to navigate architecture spaces, reducing the burden of manual architecture design. While NAS has achieved success, applying it to emerging domains, such as analyzing unstructured 3D point clouds, remains underexplored due to the data lying in non-Euclidean spaces, unlike images. This paper presents S…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication

  31. arXiv:2408.05541  [pdf, other]

    cs.CL

    P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for Optimizing LLM Training

    Authors: Yingxuan Yang, Huayi Wang, Muning Wen, Weinan Zhang

    Abstract: In the rapidly evolving field of Large Language Models (LLMs), selecting high-quality data for fine-tuning is essential. This paper focuses on task-specific data pruning and selection to enhance fine-tuning. We introduce an innovative framework, termed P3, which improves LLM performance through a dynamic, adaptive training strategy. Specifically, P3 comprises the following components: (1) Policy-d…

    Submitted 10 August, 2024; originally announced August 2024.

  32. arXiv:2408.05477  [pdf, other]

    cs.CV

    Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE

    Authors: Yiying Yang, Fukun Yin, Jiayuan Fan, Xin Chen, Wanzhang Li, Gang Yu

    Abstract: As Artificial Intelligence Generated Content (AIGC) advances, a variety of methods have been developed to generate text, images, videos, and 3D objects from single or multimodal inputs, contributing efforts to emulate human-like cognitive content creation. However, generating realistic large-scale scenes from a single input presents a challenge due to the complexities involved in ensuring consiste…

    Submitted 20 August, 2024; v1 submitted 10 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2305.11588 by other authors

  33. arXiv:2408.05105  [pdf, other]

    cs.HC cs.GR

    Evaluating Layout Dimensionalities in PC+VR Asymmetric Collaborative Decision Making

    Authors: Daniel Enriquez, Wai Tong, Chris North, Huamin Qu, Yalong Yang

    Abstract: With the commercialization of virtual/augmented reality (VR/AR) devices, there is an increasing interest in combining immersive and non-immersive devices (e.g., desktop computers) for asymmetric collaborations. While such asymmetric settings have been examined in social platforms, significant questions around layout dimensionality in data-driven decision-making remain underexplored. A crucial inqu…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: To be presented at ACM ISS 2024

  34. arXiv:2408.04856  [pdf, other]

    cs.OS

    Wasm-bpf: Streamlining eBPF Deployment in Cloud Environments with WebAssembly

    Authors: Yusheng Zheng, Tong Yu, Yiwei Yang, Andrew Quinn

    Abstract: The extended Berkeley Packet Filter (eBPF) is extensively utilized for observability and performance analysis in cloud-native environments. However, deploying eBPF programs across a heterogeneous cloud environment presents challenges, including compatibility issues across different kernel versions, operating systems, runtimes, and architectures. Traditional deployment methods, such as standalone c…

    Submitted 9 August, 2024; originally announced August 2024.

  35. arXiv:2408.04829  [pdf, other]

    cs.SD eess.AS

    Hyper Recurrent Neural Network: Condition Mechanisms for Black-box Audio Effect Modeling

    Authors: Yen-Tung Yeh, Wen-Yi Hsiao, Yi-Hsuan Yang

    Abstract: Recurrent neural networks (RNNs) have demonstrated impressive results for virtual analog modeling of audio effects. These networks process time-domain audio signals using a series of matrix multiplication and nonlinear activation functions to emulate the behavior of the target device accurately. To additionally model the effect of the knobs for an RNN-based model, existing approaches integrate con…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted to DAFx24

  36. arXiv:2408.04665  [pdf, other]

    cs.CL cs.AI

    LLM-based MOFs Synthesis Condition Extraction using Few-Shot Demonstrations

    Authors: Lei Shi, Zhimeng Liu, Yi Yang, Weize Wu, Yuyang Zhang, Hongbo Zhang, Jing Lin, Siyu Wu, Zihan Chen, Ruiming Li, Nan Wang, Zipeng Liu, Huobin Tan, Hongyi Gao, Yue Zhang, Ge Wang

    Abstract: The extraction of Metal-Organic Frameworks (MOFs) synthesis conditions from literature text has been challenging but crucial for the logical design of new MOFs with desirable functionality. The recent advent of large language models (LLMs) provides a disruptively new solution to this long-standing problem, and the latest research has reported over 90% F1 in extracting correct conditions from MOFs lite…

    Submitted 6 August, 2024; originally announced August 2024.

  37. arXiv:2408.04388  [pdf, other]

    cs.MM cs.AI cs.IR

    MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models

    Authors: Haoxuan Li, Zhengmao Yang, Yunshan Ma, Yi Bin, Yang Yang, Tat-Seng Chua

    Abstract: We study an emerging and intriguing problem of multimodal temporal event forecasting with large language models. Compared to using text or graph modalities, the investigation of utilizing images for temporal event forecasting has not been fully explored, especially in the era of large language models (LLMs). To bridge this gap, we are particularly interested in two key questions: 1) why images…

    Submitted 8 August, 2024; originally announced August 2024.

    ACM Class: H.3.3

  38. arXiv:2408.04057  [pdf, other]

    cs.LG cs.AI

    PowerPM: Foundation Model for Power Systems

    Authors: Shihao Tu, Yupeng Zhang, Jing Zhang, Yang Yang

    Abstract: The emergence of abundant electricity time series (ETS) data provides ample opportunities for various applications in power systems, including demand-side management, grid stability, and consumer behavior analysis. Deep learning models have advanced ETS modeling by effectively capturing sequence dependence. Nevertheless, learning a generic representation of ETS data for various applications re…

    Submitted 21 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: 23 pages, 5 figures, 8 tables

  39. arXiv:2408.03124  [pdf, other]

    eess.SY cs.LG

    Closed-loop Diffusion Control of Complex Physical Systems

    Authors: Long Wei, Haodong Feng, Peiyan Hu, Tao Zhang, Yuchen Yang, Xiang Zheng, Ruiqi Feng, Dixia Fan, Tailin Wu

    Abstract: The control problems of complex physical systems have wide applications in science and engineering. Several previous works have demonstrated that generative control methods based on diffusion models have significant advantages for solving these problems. However, existing generative control methods face challenges in handling closed-loop control, which is an inherent constraint for effective contr…

    Submitted 31 July, 2024; originally announced August 2024.

  40. arXiv:2408.03097  [pdf, other]

    cs.CV

    Prototype Learning for Micro-gesture Classification

    Authors: Guoliang Chen, Fei Wang, Kun Li, Zhiliang Wu, Hehe Fan, Yi Yang, Meng Wang, Dan Guo

    Abstract: In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the track of Micro-gesture Classification in the MiGA challenge at IJCAI 2024. The micro-gesture classification task involves recognizing the category of a given video clip, which focuses on more fine-grained and subtle body movements compared to typical action recognition tasks. Given the inherent comple…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 1st Place in Micro-gesture Classification in MiGA at IJCAI-2024

  41. arXiv:2408.02240  [pdf, other]

    cs.HC

    CompositingVis: Exploring Interactions for Creating Composite Visualizations in Immersive Environments

    Authors: Qian Zhu, Tao Lu, Shunan Guo, Xiaojuan Ma, Yalong Yang

    Abstract: Composite visualization represents a widely embraced design that combines multiple visual representations to create an integrated view. However, the traditional approach of creating composite visualizations in immersive environments typically occurs asynchronously outside of the immersive space and is carried out by experienced experts. In this work, we aim to empower users to participate in the c…

    Submitted 7 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

    Comments: 11 pages

    Journal ref: IEEE VIS 2024

  42. arXiv:2408.02231  [pdf, other]

    cs.CV

    REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models

    Authors: Agneet Chatterjee, Yiran Luo, Tejas Gokhale, Yezhou Yang, Chitta Baral

    Abstract: Text-to-Image (T2I) and multimodal large language models (MLLMs) have been adopted in solutions for several computer vision and multimodal learning tasks. However, it has been found that such vision-language models lack the ability to correctly reason over spatial relationships. To tackle this shortcoming, we develop the REVISION framework which improves spatial fidelity in vision-language models.…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024. Project Page: https://agneetchatterjee.com/revision/

  43. arXiv:2408.02208  [pdf, other]

    eess.SY cs.LG physics.soc-ph

    Multi-level Traffic-Responsive Tilt Camera Surveillance through Predictive Correlated Online Learning

    Authors: Tao Li, Zilin Bian, Haozhe Lei, Fan Zuo, Ya-Ting Yang, Quanyan Zhu, Zhenning Li, Kaan Ozbay

    Abstract: In urban traffic management, the primary challenge of dynamically and efficiently monitoring traffic conditions is compounded by the insufficient utilization of thousands of surveillance cameras along the intelligent transportation system. This paper introduces the multi-level Traffic-responsive Tilt Camera surveillance system (TTC-X), a novel framework designed for dynamic and efficient monitorin…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted to Transportation Research Part C special issue: Modelling, Learning, and Control of Conventional, Cooperative and Automated Motorway and Urban Traffic Systems

  44. arXiv:2408.02085  [pdf, other]

    cs.CV cs.AI cs.CL eess.SP

    Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models

    Authors: Yulei Qin, Yuncheng Yang, Pengcheng Guo, Gang Li, Hang Shao, Yuchen Shi, Zihan Xu, Yun Gu, Ke Li, Xing Sun

    Abstract: Instruction tuning plays a critical role in aligning large language models (LLMs) with human preference. Despite the vast amount of open instruction datasets, naively training an LLM on all existing instructions may be neither optimal nor practical. To pinpoint the most beneficial datapoints, data assessment and selection methods have been proposed in the fields of natural language processing (NLP) and…

    Submitted 7 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: review, survey, 28 pages, 2 figures, 4 tables

  45. arXiv:2408.01690  [pdf, other]

    cs.CV cs.AI cs.MM

    IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection

    Authors: Hong Guan, Yancheng Wang, Lulu Xie, Soham Nag, Rajeev Goel, Niranjan Erappa Narayana Swamy, Yingzhen Yang, Chaowei Xiao, Jonathan Prisby, Ross Maciejewski, Jia Zou

    Abstract: Effective fraud detection and analysis of government-issued identity documents, such as passports, driver's licenses, and identity cards, are essential in thwarting identity theft and bolstering security on online platforms. The training of accurate fraud detection and analysis tools depends on the availability of extensive identity document datasets. However, current publicly available benchmark…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 40 pages

  46. arXiv:2408.01688  [pdf, other]

    cs.CV

    SiamMo: Siamese Motion-Centric 3D Object Tracking

    Authors: Yuxiang Yang, Yingqi Deng, Jing Zhang, Hongjie Gu, Zhekang Don

    Abstract: Current 3D single object tracking methods primarily rely on the Siamese matching-based paradigm, which struggles with textureless and incomplete LiDAR point clouds. Conversely, the motion-centric paradigm avoids appearance matching, thus overcoming these issues. However, its complex multi-stage pipeline and the limited temporal modeling capability of a single-stream architecture constrain its pote…

    Submitted 3 August, 2024; originally announced August 2024.

  47. Degrade to Function: Towards Eco-friendly Morphing Devices that Function Through Programmed Sequential Degradation

    Authors: Qiuyu Lu, Semina Yi, Mentian Gan, Jihong Huang, Xiao Zhang, Yue Yang, Chenyi Shen, Lining Yao

    Abstract: While it seems counterintuitive to think of degradation within an operating device as beneficial, one may argue that when rationally designed, the controlled breakdown of materials can be harnessed for specific functions. To apply this principle to the design of morphing devices, we introduce the concept of Degrade to Function (DtF). This concept aims to create eco-friendly and self-contained morp…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 24 pages, 24 figures, The 37th Annual ACM Symposium on User Interface Software and Technology (UIST 24)

  48. arXiv:2408.01551  [pdf, other]

    cs.SD eess.AS

    PiCoGen2: Piano cover generation with transfer learning approach and weakly aligned data

    Authors: Chih-Pin Tan, Hsin Ai, Yi-Hsin Chang, Shuen-Huei Guan, Yi-Hsuan Yang

    Abstract: Piano cover generation aims to create a piano cover from a pop song. Existing approaches mainly employ supervised learning and the training demands strongly-aligned and paired song-to-piano data, which is built by remapping piano notes to song audio. This would, however, result in the loss of piano information and accordingly cause inconsistencies between the original and remapped piano versions.…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted at the 25th International Society for Music Information Retrieval Conference (ISMIR), 2024

  49. arXiv:2408.01402  [pdf, other]

    cs.LG cs.AI cs.CL

    Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer

    Authors: Yu Yang, Pan Xu

    Abstract: Decision Transformer (DT) has emerged as a promising class of algorithms in offline reinforcement learning (RL) tasks, leveraging pre-collected datasets and Transformer's capability to model long sequences. Recent works have demonstrated that using parts of trajectories from training tasks as prompts in DT enhances its performance on unseen tasks, giving rise to Prompt-DT methods. However, collect…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 2 figures, 8 tables. Accepted by the Training Agents with Foundation Models Workshop at RLC 2024

  50. arXiv:2408.01332  [pdf, other]

    cs.LG

    HMDN: Hierarchical Multi-Distribution Network for Click-Through Rate Prediction

    Authors: Xingyu Lou, Yu Yang, Kuiyao Dong, Heyuan Huang, Wenyi Yu, Ping Wang, Xiu Li, Jun Wang

    Abstract: As the recommendation service needs to address increasingly diverse distributions, such as multi-population, multi-scenario, multi-target, and multi-interest, more and more recent works have focused on multi-distribution modeling and achieved great progress. However, most of them only consider modeling in a single multi-distribution manner, ignoring that mixed multi-distributions often coexist and…

    Submitted 2 August, 2024; originally announced August 2024.