Showing 1–50 of 6,229 results for author: Wang, Z

Searching in archive cs.
  1. arXiv:2408.10901  [pdf, other]

    cs.CV cs.AI cs.LG

    A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse

    Authors: Zhongliang Guo, Lei Fang, Jingyu Lin, Yifei Qian, Shuai Zhao, Zeyu Wang, Junhao Dong, Cunjian Chen, Ognjen Arandjelović, Chun Pong Lau

    Abstract: Recent advancements in generative AI, particularly Latent Diffusion Models (LDMs), have revolutionized image synthesis and manipulation. However, these generative techniques raises concerns about data misappropriation and intellectual property infringement. Adversarial attacks on machine learning models have been extensively studied, and a well-established body of research has extended these techn…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 21 pages, 7 figures, 10 tables

  2. arXiv:2408.10899  [pdf, other]

    cs.RO

    All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents

    Authors: Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang

    Abstract: Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by of…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project website: https://imaei.github.io/project_pages/ario/

  3. arXiv:2408.10895  [pdf, ps, other]

    cs.AI

    Analytical and Empirical Study of Herding Effects in Recommendation Systems

    Authors: Hong Xie, Mingze Zhong, Defu Lian, Zhen Wang, Enhong Chen

    Abstract: Online rating systems are often used in numerous web or mobile applications, e.g., Amazon and TripAdvisor, to assess the ground-truth quality of products. Due to herding effects, the aggregation of historical ratings (or historical collective opinion) can significantly influence subsequent ratings, leading to misleading and erroneous assessments. We study how to manage product ratings via rating a…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 29 pages

  4. arXiv:2408.10853  [pdf, other]

    cs.SD cs.AI eess.AS

    Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

    Authors: Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye

    Abstract: Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based a…

    Submitted 20 August, 2024; originally announced August 2024.

  5. arXiv:2408.10852  [pdf, other]

    cs.SD eess.AS

    EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

    Authors: Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li

    Abstract: In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into t…

    Submitted 20 August, 2024; originally announced August 2024.

  6. arXiv:2408.10849  [pdf, other]

    cs.SD eess.AS

    A Noval Feature via Color Quantisation for Fake Audio Detection

    Authors: Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li

    Abstract: In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features, such as wav2vec2.0 and Masked Auto Encoder. These methods have proven that using real audio for reconstruction pre-training can better help the model…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted by ISCSLP2024

  7. arXiv:2408.10588  [pdf, other]

    cs.CV cs.GR

    DEGAS: Detailed Expressions on Full-Body Gaussian Avatars

    Authors: Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang

    Abstract: Although neural rendering has made significant advancements in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method lea…

    Submitted 20 August, 2024; originally announced August 2024.

  8. arXiv:2408.10568  [pdf, other]

    cs.RO

    Constrained Behavior Cloning for Robotic Learning

    Authors: Wensheng Liang, Jun Xie, Zhicheng Wang, Jianwei Tan, Xiaoguang Ma

    Abstract: Behavior cloning (BC) is a popular supervised imitation learning method in the societies of robotics, autonomous driving, etc., wherein complex skills can be learned by direct imitation from expert demonstrations. Despite its rapid development, it is still affected by limited field of view where accumulation of sensors and joint noise bring compounding errors. In this paper, we introduced geometri…

    Submitted 20 August, 2024; originally announced August 2024.

  9. arXiv:2408.10228  [pdf, other]

    eess.SP cs.LG

    ECG Unveiled: Analysis of Client Re-identification Risks in Real-World ECG Datasets

    Authors: Ziyu Wang, Anil Kanduri, Seyed Amir Hossein Aqajari, Salar Jafarlou, Sanaz R. Mousavi, Pasi Liljeberg, Shaista Malik, Amir M. Rahmani

    Abstract: While ECG data is crucial for diagnosing and monitoring heart conditions, it also contains unique biometric information that poses significant privacy risks. Existing ECG re-identification studies rely on exhaustive analysis of numerous deep learning features, confining to ad-hoc explainability towards clinicians decision making. In this work, we delve into explainability of ECG re-identification…

    Submitted 2 August, 2024; originally announced August 2024.

  10. arXiv:2408.10198  [pdf, other]

    cs.CV cs.GR

    MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

    Authors: Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su

    Abstract: Open-world 3D reconstruction models have recently garnered significant attention. However, without sufficient 3D inductive bias, existing methods typically entail expensive training costs and struggle to extract high-quality 3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. S…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 20 pages, 9 figures

  11. arXiv:2408.10134  [pdf, other]

    cs.CV cs.MM eess.IV

    Perceptual Depth Quality Assessment of Stereoscopic Omnidirectional Images

    Authors: Wei Zhou, Zhou Wang

    Abstract: Depth perception plays an essential role in the viewer experience for immersive virtual reality (VR) visual environments. However, previous research investigations in the depth quality of 3D/stereoscopic images are rather limited, and in particular, are largely lacking for 3D viewing of 360-degree omnidirectional content. In this work, we make one of the first attempts to develop an objective qual…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE TCSVT

  12. arXiv:2408.10088  [pdf, other]

    cs.SI

    Recent Surge in Public Interest in Transportation: Sentiment Analysis of Baidu Apollo Go Using Weibo Data

    Authors: Shiqi Wang, Zhouye Zhao, Yuhang Xie, Mingchuan Ma, Zirui Chen, Zeyu Wang, Bohao Su, Wenrui Xu, Tianyi Li

    Abstract: Urban mobility and transportation systems have been profoundly transformed by the advancement of autonomous vehicle technologies. Baidu Apollo Go, a pioneer robotaxi service from the Chinese tech giant Baidu, has recently been widely deployed in major cities like Beijing and Wuhan, sparking increased conversation and offering a glimpse into the future of urban mobility. This study investigates p…

    Submitted 19 August, 2024; originally announced August 2024.

    ACM Class: J.4

  13. arXiv:2408.10006  [pdf, other]

    cs.LG

    Unlocking the Power of LSTM for Long Term Time Series Forecasting

    Authors: Yaxuan Kong, Zepu Wang, Yuqi Nie, Tian Zhou, Stefan Zohren, Yuxuan Liang, Peng Sun, Qingsong Wen

    Abstract: Traditional recurrent neural network architectures, such as long short-term memory neural networks (LSTM), have historically held a prominent role in time series forecasting (TSF) tasks. While the recently introduced sLSTM for Natural Language Processing (NLP) introduces exponential gating and memory mixing that are beneficial for long term sequential learning, its potential short memory issue is…

    Submitted 19 August, 2024; originally announced August 2024.

  14. arXiv:2408.09702  [pdf, other]

    cs.CV cs.AI cs.GR

    Photorealistic Object Insertion with Diffusion-Guided Inverse Rendering

    Authors: Ruofan Liang, Zan Gojcic, Merlin Nimier-David, David Acuna, Nandita Vijaykumar, Sanja Fidler, Zian Wang

    Abstract: The correct insertion of virtual objects in images of real-world scenes requires a deep understanding of the scene's lighting, geometry and materials, as well as the image formation process. While recent large-scale diffusion models have shown strong generative and inpainting capabilities, we find that current models do not sufficiently "understand" the scene shown in a single picture to generate…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: ECCV 2024, Project page: https://research.nvidia.com/labs/toronto-ai/DiPIR/

  15. arXiv:2408.09485  [pdf, other]

    cs.CL

    Activated Parameter Locating via Causal Intervention for Model Merging

    Authors: Fanshuang Kong, Richong Zhang, Ziqiao Wang

    Abstract: Model merging combines multiple homologous models into one model, achieving convincing generalization without the necessity of additional training. A key challenge in this problem is resolving parameter redundancies and conflicts across multiple models. Existing models have demonstrated that dropping a portion of delta parameters can alleviate conflicts while maintaining performance. However, thes…

    Submitted 18 August, 2024; originally announced August 2024.

  16. arXiv:2408.09189  [pdf, other]

    cs.LG cs.AI

    SA-GDA: Spectral Augmentation for Graph Domain Adaptation

    Authors: Jinhui Pang, Zixuan Wang, Jiliang Tang, Mingyan Xiao, Nan Yin

    Abstract: Graph neural networks (GNNs) have achieved impressive impressions for graph-related tasks. However, most GNNs are primarily studied under the cases of signal domain with supervised training, which requires abundant task-specific labels and is difficult to transfer to other domains. There are few works focused on domain adaptation for graph node classification. They mainly focused on aligning the f…

    Submitted 17 August, 2024; originally announced August 2024.

  17. arXiv:2408.08994  [pdf, ps, other]

    cs.LG

    Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds

    Authors: Zhiyong Wang, Dongruo Zhou, John C. S. Lui, Wen Sun

    Abstract: Learning a transition model via Maximum Likelihood Estimation (MLE) followed by planning inside the learned model is perhaps the most standard and simplest Model-based Reinforcement Learning (RL) framework. In this work, we show that such a simple Model-based RL scheme, when equipped with optimistic and pessimistic planning procedures, achieves strong regret and sample complexity bounds in online…

    Submitted 16 August, 2024; originally announced August 2024.

  18. arXiv:2408.08989  [pdf, other]

    cs.AI cs.CV

    Ask, Attend, Attack: A Effective Decision-Based Black-Box Targeted Attack for Image-to-Text Models

    Authors: Qingyuan Zeng, Zhenzhong Wang, Yiu-ming Cheung, Min Jiang

    Abstract: While image-to-text models have demonstrated significant advancements in various vision-language tasks, they remain susceptible to adversarial attacks. Existing white-box attacks on image-to-text models require access to the architecture, gradients, and parameters of the target model, resulting in low practicality. Although the recently proposed gray-box attacks have improved practicality, they su…

    Submitted 16 August, 2024; originally announced August 2024.

  19. arXiv:2408.08909  [pdf]

    cs.CR cs.AI cs.DC

    An Adaptive Differential Privacy Method Based on Federated Learning

    Authors: Zhiqiang Wang, Xinyue Yu, Qianli Huang, Yongguang Gong

    Abstract: Differential privacy is one of the methods to solve the problem of privacy protection in federated learning. Setting the same privacy budget for each round will result in reduced accuracy in training. The existing methods of the adjustment of privacy budget consider fewer influencing factors and tend to ignore the boundaries, resulting in unreasonable privacy budgets. Therefore, we proposed an ada…

    Submitted 13 August, 2024; originally announced August 2024.

  20. arXiv:2408.08862  [pdf, other]

    cs.LG

    Visual Agents as Fast and Slow Thinkers

    Authors: Guangyan Sun, Mingyu Jin, Zhenting Wang, Cheng-Long Wang, Siqi Ma, Qifan Wang, Ying Nian Wu, Yongfeng Zhang, Dongfang Liu

    Abstract: Achieving human-level intelligence requires refining cognitive distinctions between System 1 and System 2 thinking. While contemporary AI, driven by large language models, demonstrates human-like traits, it falls short of genuine cognition. Transitioning from structured benchmarks to real-world scenarios presents challenges for visual agents, often leading to inaccurate and overly confident respon…

    Submitted 16 August, 2024; originally announced August 2024.

  21. arXiv:2408.08780  [pdf, other]

    cs.CL

    Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions

    Authors: Chenming Tang, Zhixiang Wang, Yunfang Wu

    Abstract: With the help of in-context learning (ICL), large language models (LLMs) have achieved impressive performance across various tasks. However, the function of descriptive instructions during ICL remains under-explored. In this work, we propose an ensemble prompt framework to describe the selection criteria of multiple in-context examples, and preliminary experiments on machine translation (MT) acros…

    Submitted 20 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: 10 pages, 6 figures, 3 tables

  22. arXiv:2408.08708  [pdf, other]

    cs.CV

    Decoupling Feature Representations of Ego and Other Modalities for Incomplete Multi-modal Brain Tumor Segmentation

    Authors: Kaixiang Yang, Wenqi Shan, Xudong Li, Xuan Wang, Xikai Yang, Xi Wang, Pheng-Ann Heng, Qiang Li, Zhiwei Wang

    Abstract: Multi-modal brain tumor segmentation typically involves four magnetic resonance imaging (MRI) modalities, while incomplete modalities significantly degrade performance. Existing solutions employ explicit or implicit modality adaptation, aligning features across modalities or learning a fused feature robust to modality incompleteness. They share a common goal of encouraging each modality to express…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures

  23. arXiv:2408.08524  [pdf, other]

    cs.CV cs.AI

    GS-ID: Illumination Decomposition on Gaussian Splatting via Diffusion Prior and Parametric Light Source Optimization

    Authors: Kang Du, Zhihao Liang, Zeyu Wang

    Abstract: We present GS-ID, a novel framework for illumination decomposition on Gaussian Splatting, achieving photorealistic novel view synthesis and intuitive light editing. Illumination decomposition is an ill-posed problem facing three main challenges: 1) priors for geometry and material are often lacking; 2) complex illumination conditions involve multiple unknown light sources; and 3) calculating surfa…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 15 pages, 13 figures

  24. arXiv:2408.08515  [pdf, other]

    cs.SE

    Selecting Initial Seeds for Better JVM Fuzzing

    Authors: Tianchang Gao, Junjie Chen, Dong Wang, Yile Guo, Yingquan Zhao, Zan Wang

    Abstract: Literature in traditional program fuzzing has confirmed that effectiveness is largely impacted by redundancy among initial seeds, thereby proposing a series of seed selection methods. JVM fuzzing, compared to traditional ones, presents unique characteristics, including large-scale and intricate code, and programs with both syntactic and semantic features. However, it remains unclear whether the ex…

    Submitted 16 August, 2024; originally announced August 2024.

  25. arXiv:2408.08147  [pdf, other]

    cs.DC cs.CL cs.LG

    P/D-Serve: Serving Disaggregated Large Language Model at Scale

    Authors: Yibo Jin, Tao Wang, Huimin Lin, Mingyang Song, Peiyang Li, Yipeng Ma, Yicheng Shan, Zhengfan Yuan, Cailong Li, Yajing Sun, Tiandeng Wu, Xing Chu, Ruizhi Huan, Li Ma, Xiao You, Wenting Zhou, Yunpeng Ye, Wen Liu, Xiangkun Xu, Yongsheng Zhang, Tiantian Dong, Jiawei Zhu, Zhe Wang, Xijian Ju, Jianxun Song, et al. (5 additional authors not shown)

    Abstract: Serving disaggregated large language models (LLMs) over tens of thousands of xPU devices (GPUs or NPUs) with reliable performance faces multiple challenges. 1) Ignoring the diversity (various prefixes and tidal requests), treating all the prompts in a mixed pool is inadequate. To facilitate the similarity per scenario and minimize the inner mismatch on P/D (prefill and decoding) processing, fine-g…

    Submitted 15 August, 2024; originally announced August 2024.

  26. arXiv:2408.08078  [pdf, other]

    cs.CV cs.AI

    Treat Stillness with Movement: Remote Sensing Change Detection via Coarse-grained Temporal Foregrounds Mining

    Authors: Xixi Wang, Zitian Wang, Jingtao Jiang, Lan Chen, Xiao Wang, Bo Jiang

    Abstract: Current works focus on addressing the remote sensing change detection task using bi-temporal images. Although good performance can be achieved, however, seldom of they consider the motion cues which may also be vital. In this work, we revisit the widely adopted bi-temporal images-based framework and propose a novel Coarse-grained Temporal Mining Augmented (CTMA) framework. To be specific, given th…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: In Peer Review

  27. arXiv:2408.08067  [pdf, other]

    cs.CL cs.AI

    RAGChecker: A Fine-grained Framework for Diagnosing Retrieval-Augmented Generation

    Authors: Dongyu Ru, Lin Qiu, Xiangkun Hu, Tianhang Zhang, Peng Shi, Shuaichen Chang, Cheng Jiayang, Cunxiang Wang, Shichao Sun, Huanyu Li, Zizhao Zhang, Binjie Wang, Jiarong Jiang, Tong He, Zhiguo Wang, Pengfei Liu, Yue Zhang, Zheng Zhang

    Abstract: Despite Retrieval-Augmented Generation (RAG) showing promising capability in leveraging external knowledge, a comprehensive evaluation of RAG systems is still challenging due to the modular nature of RAG, evaluation of long-form responses and reliability of measurements. In this paper, we propose a fine-grained evaluation framework, RAGChecker, that incorporates a suite of diagnostic metrics for b…

    Submitted 16 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

    Comments: Under Review. Github Repo: https://github.com/amazon-science/RAGChecker

  28. arXiv:2408.08050  [pdf, other]

    cs.CV

    CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection

    Authors: Xunfa Lai, Zhiyu Yang, Jie Hu, Shengchuan Zhang, Liujuan Cao, Guannan Jiang, Zhiyu Wang, Songan Zhang, Rongrong Ji

    Abstract: Existing camouflaged object detection (COD) methods depend heavily on large-scale pixel-level annotations. However, acquiring such annotations is laborious due to the inherent camouflage characteristics of the objects. Semi-supervised learning offers a promising solution to this challenge. Yet, its application in COD is hindered by significant pseudo-label noise, both pixel-level and instance-level. W…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024

  29. arXiv:2408.08044  [pdf, other]

    cs.CE

    Crystalline Material Discovery in the Era of Artificial Intelligence

    Authors: Zhenzhong Wang, Haowei Hua, Wanyu Lin, Ming Yang, Kay Chen Tan

    Abstract: Crystalline materials, with their symmetrical and periodic structures, possess a diverse array of properties and have been widely used in various fields, e.g., sustainable development. To discover crystalline materials, traditional experimental and computational approaches are often time-consuming and expensive. In these years, thanks to the explosive amount of crystalline materials data, great in…

    Submitted 15 August, 2024; originally announced August 2024.

  30. arXiv:2408.08034  [pdf, other]

    cs.NI

    Centralized Network Utility Maximization with Accelerated Gradient Method

    Authors: Ying Tian, Zhiliang Wang, Xia Yin, Xingang Shi, Jiahai Yang, Han Zhang

    Abstract: Network utility maximization (NUM) is a well-studied problem for network traffic management and resource allocation. Because of the inherent decentralization and complexity of networks, most researches develop decentralized NUM algorithms. In recent years, the Software Defined Networking (SDN) architecture has been widely used, especially in cloud networks and inter-datacenter networks managed by…

    Submitted 15 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

    Journal ref: 2022 IEEE 30th International Conference on Network Protocols (ICNP), pp. 1-11

  31. Physically Aware Synthesis Revisited: Guiding Technology Mapping with Primitive Logic Gate Placement

    Authors: Hongyang Pan, Cunqing Lan, Yiting Liu, Zhiang Wang, Li Shang, Xuan Zeng, Fan Yang, Keren Zhu

    Abstract: A typical VLSI design flow is divided into separated front-end logic synthesis and back-end physical design (PD) stages, which often require costly iterations between these stages to achieve design closure. Existing approaches face significant challenges, notably in utilizing feedback from physical metrics to better adapt and refine synthesis operations, and in establishing a unified and comprehen…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 9 pages, 8 figures, 2 tables

    Journal ref: 2024 International Conference on Computer-Aided Design, New Jersey, NY, USA, Oct 2024

  32. arXiv:2408.07867  [pdf, other]

    cs.CV

    Continuous Perception Benchmark

    Authors: Zeyu Wang, Zhenzhen Weng, Serena Yeung-Levy

    Abstract: Humans continuously perceive and process visual signals. However, current video models typically either sample key frames sparsely or divide videos into chunks and densely sample within each chunk. This approach stems from the fact that most existing video benchmarks can be addressed by analyzing key frames or aggregating information from separate chunks. We anticipate that the next generation of…

    Submitted 14 August, 2024; originally announced August 2024.

  33. arXiv:2408.07728  [pdf, other]

    cs.CR

    Moderator: Moderating Text-to-Image Diffusion Models through Fine-grained Context-based Policies

    Authors: Peiran Wang, Qiyu Li, Longxuan Yu, Ziyao Wang, Ang Li, Haojian Jin

    Abstract: We present Moderator, a policy-based model management system that allows administrators to specify fine-grained content moderation policies and modify the weights of a text-to-image (TTI) model to make it significantly more challenging for users to produce images that violate the policies. In contrast to existing general-purpose model editing techniques, which unlearn concepts without considering…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM CCS 2024

  34. arXiv:2408.07385  [pdf, other]

    cs.IT eess.SP

    Iterative Equalization of CPM With Unitary Approximate Message Passing

    Authors: Zilong Liu, Yi Song, Qinghua Guo, Peng Sun, Kexian Gong, Zhongyong Wang

    Abstract: Continuous phase modulation (CPM) has extensive applications in wireless communications due to its high spectral and power efficiency. However, its nonlinear characteristics pose significant challenges for detection in frequency selective fading channels. This paper proposes an iterative receiver tailored for the detection of CPM signals over frequency selective fading channels. This design levera…

    Submitted 14 August, 2024; originally announced August 2024.

  35. arXiv:2408.06922  [pdf, other]

    cs.SD cs.AI eess.AS

    Temporal Variability and Multi-Viewed Self-Supervised Representations to Tackle the ASVspoof5 Deepfake Challenge

    Authors: Yuankun Xie, Xiaopeng Wang, Zhiyong Wang, Ruibo Fu, Zhengqi Wen, Haonan Cheng, Long Ye

    Abstract: ASVspoof5, the fifth edition of the ASVspoof series, is one of the largest global audio security challenges. It aims to advance the development of countermeasure (CM) to discriminate bonafide and spoofed speech utterances. In this paper, we focus on addressing the problem of open-domain audio deepfake detection, which corresponds directly to the ASVspoof5 Track1 open condition. At first, we compre…

    Submitted 13 August, 2024; originally announced August 2024.

  36. arXiv:2408.06904  [pdf, other]

    cs.CL

    Re-TASK: Revisiting LLM Tasks from Capability, Skill, and Knowledge Perspectives

    Authors: Zhihu Wang, Shiwan Zhao, Yu Wang, Heyuan Huang, Jiaxin Shi, Sitao Xie, Zhixing Wang, Yubo Zhang, Hongyan Li, Junchi Yan

    Abstract: As large language models (LLMs) continue to scale, their enhanced performance often proves insufficient for solving domain-specific tasks. Systematically analyzing their failures and effectively enhancing their performance remain significant challenges. This paper introduces the Re-TASK framework, a novel theoretical model that Revisits LLM Tasks from cApability, Skill, Knowledge perspectives, gui…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Work in Progress

  37. arXiv:2408.06793  [pdf, other]

    cs.CL

    Layerwise Recurrent Router for Mixture-of-Experts

    Authors: Zihan Qiu, Zeyu Huang, Shuang Cheng, Yizhi Zhou, Zili Wang, Ivan Titov, Jie Fu

    Abstract: The scaling of large language models (LLMs) has revolutionized their capabilities in various tasks, yet this growth must be matched with efficient computational strategies. The Mixture-of-Experts (MoE) architecture stands out for its ability to scale model size without significantly increasing training costs. Despite their advantages, current MoE models often display parameter inefficiency. For in…

    Submitted 13 August, 2024; originally announced August 2024.

  38. arXiv:2408.06779  [pdf, other]

    cs.CV

    ED$^4$: Explicit Data-level Debiasing for Deepfake Detection

    Authors: Jikang Cheng, Ying Zhang, Qin Zou, Zhiyuan Yan, Chao Liang, Zhongyuan Wang, Chen Li

    Abstract: Learning intrinsic bias from limited data has been considered the main reason for the failure of deepfake detection with generalizability. Apart from the discovered content and specific-forgery bias, we reveal a novel spatial bias, where detectors inertly anticipate observing structural forgery clues appearing at the image center, also can lead to the poor generalization of existing methods. We pr…

    Submitted 13 August, 2024; originally announced August 2024.

  39. arXiv:2408.06717  [pdf, other]

    cs.LG cs.AI

    Computation-friendly Graph Neural Network Design by Accumulating Knowledge on Large Language Models

    Authors: Jialiang Wang, Shimin Di, Hanmo Liu, Zhili Wang, Jiachuan Wang, Lei Chen, Xiaofang Zhou

    Abstract: Graph Neural Networks (GNNs), like other neural networks, have shown remarkable success but are hampered by the complexity of their architecture designs, which heavily depend on specific data and tasks. Traditionally, designing proper architectures involves trial and error, which requires intensive manual effort to optimize various components. To reduce human workload, researchers try to develop a…

    Submitted 13 August, 2024; originally announced August 2024.

  40. arXiv:2408.06635  [pdf, other]

    cs.CV

    IDRetracor: Towards Visual Forensics Against Malicious Face Swapping

    Authors: Jikang Cheng, Jiaxin Ai, Zhen Han, Chao Liang, Qin Zou, Zhongyuan Wang, Qian Wang

    Abstract: The face swapping technique based on deepfake methods poses significant social risks to personal identity security. While numerous deepfake detection methods have been proposed as countermeasures against malicious face swapping, they can only output binary labels (Fake/Real) for distinguishing fake content without reliable and traceable evidence. To achieve visual forensics and target face attribu…

    Submitted 13 August, 2024; originally announced August 2024.

  41. arXiv:2408.06625  [pdf, other]

    cs.CV

    DePatch: Towards Robust Adversarial Patch for Evading Person Detectors in the Real World

    Authors: Jikang Cheng, Ying Zhang, Zhongyuan Wang, Zou Qin, Chen Li

    Abstract: Recent years have seen an increasing interest in physical adversarial attacks, which aim to craft deployable patterns for deceiving deep neural networks, especially for person detectors. However, the adversarial patterns of existing patch-based attacks heavily suffer from the self-coupling issue, where a degradation, caused by physical transformations, in any small patch segment can result in a co…

    Submitted 13 August, 2024; originally announced August 2024.

  42. arXiv:2408.06614  [pdf, other]

    cs.CV cs.MM

    ViMo: Generating Motions from Casual Videos

    Authors: Liangdong Qiu, Chengxing Yu, Yanran Li, Zhao Wang, Haibin Huang, Chongyang Ma, Di Zhang, Pengfei Wan, Xiaoguang Han

    Abstract: Although humans have the innate ability to imagine multiple possible actions from videos, it remains an extraordinary challenge for computers due to the intricate camera movements and montages. Most existing motion generation methods predominantly rely on manually collected motion datasets, usually tediously sourced from motion capture (Mocap) systems or Multi-View cameras, unavoidably resulting i…

    Submitted 12 August, 2024; originally announced August 2024.

    MSC Class: 68Txx

  43. arXiv:2408.06603  [pdf, other

    cs.AI

    Simple but Effective Compound Geometric Operations for Temporal Knowledge Graph Completion

    Authors: Rui Ying, Mengting Hu, Jianfeng Wu, Yalan Xie, Xiaoyi Liu, Zhunheng Wang, Ming Jiang, Hang Gao, Linlin Zhang, Renhong Cheng

    Abstract: Temporal knowledge graph completion aims to infer the missing facts in temporal knowledge graphs. Current approaches usually embed factual knowledge into continuous vector space and apply geometric operations to learn potential patterns in temporal knowledge graphs. However, these methods only adopt a single operation, which may have limitations in capturing the complex temporal dynamics present i…

    Submitted 12 August, 2024; originally announced August 2024.

  44. arXiv:2408.06321  [pdf, other

    cs.RO cs.CV

    EqNIO: Subequivariant Neural Inertial Odometry

    Authors: Royina Karegoudra Jayanth, Yinshuang Xu, Ziyun Wang, Evangelos Chatzipantazis, Daniel Gehrig, Kostas Daniilidis

    Abstract: Neural networks are seeing rapid adoption in purely inertial odometry, where accelerometer and gyroscope measurements from commodity inertial measurement units (IMU) are used to regress displacements and associated uncertainties. They can learn informative displacement priors, which can be directly fused with the raw data with off-the-shelf non-linear filters. Nevertheless, these networks do not c…

    Submitted 18 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: 27 pages

  45. arXiv:2408.06029  [pdf, other

    cs.LG

    Graph Clustering with Cross-View Feature Propagation

    Authors: Zhixuan Duan, Zuo Wang, Fanghui Bi

    Abstract: Graph clustering is a fundamental and challenging learning task, which is conventionally approached by grouping similar vertices based on edge structure and feature similarity. In contrast to previous methods, in this paper, we investigate how multi-view feature propagation can influence cluster discovery in graph data. To this end, we present Graph Clustering With Cross-View Feature Propagation (GC…

    Submitted 12 August, 2024; originally announced August 2024.

  46. arXiv:2408.06018  [pdf, other

    cs.GR cs.AI cs.CV cs.LG

    Uncertainty-Informed Volume Visualization using Implicit Neural Representation

    Authors: Shanu Saklani, Chitwan Goel, Shrey Bansal, Zhe Wang, Soumya Dutta, Tushar M. Athawale, David Pugmire, Christopher R. Johnson

    Abstract: The increasing adoption of Deep Neural Networks (DNNs) has led to their application in many challenging scientific visualization tasks. While advanced DNNs offer impressive generalization capabilities, understanding factors such as model prediction quality, robustness, and uncertainty is crucial. These insights can enable domain scientists to make informed decisions about their data. However, DNNs…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: To appear in IEEE Workshop on Uncertainty Visualization in conjunction with IEEE VIS 2024, Florida, USA

  47. arXiv:2408.05945  [pdf, other

    cs.CV

    MV2DFusion: Leveraging Modality-Specific Object Semantics for Multi-Modal 3D Detection

    Authors: Zitian Wang, Zehao Huang, Yulu Gao, Naiyan Wang, Si Liu

    Abstract: The rise of autonomous vehicles has significantly increased the demand for robust 3D object detection systems. While cameras and LiDAR sensors each offer unique advantages--cameras provide rich texture information and LiDAR offers precise 3D spatial data--relying on a single modality often leads to performance limitations. This paper introduces MV2DFusion, a multi-modal detection framework that in…

    Submitted 12 August, 2024; originally announced August 2024.

  48. arXiv:2408.05808  [pdf, other

    cs.RO cs.MA

    Fast and Communication-Efficient Multi-UAV Exploration Via Voronoi Partition on Dynamic Topological Graph

    Authors: Qianli Dong, Haobo Xi, Shiyong Zhang, Qingchen Bi, Tianyi Li, Ziyu Wang, Xuebo Zhang

    Abstract: Efficient data transmission and reasonable task allocation are important to improve multi-robot exploration efficiency. However, most communication data types typically contain redundant information and thus require massive communication volume. Moreover, exploration-oriented task allocation is far from trivial and becomes even more challenging for resource-limited unmanned aerial vehicles (UAVs).…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 8 pages, 8 figures, accepted by IEEE IROS2024, code see https://github.com/NKU-MobFly-Robotics/GVP-MREP

  49. arXiv:2408.05798  [pdf, other

    q-bio.NC cs.AI cs.LG cs.NE

    Time Makes Space: Emergence of Place Fields in Networks Encoding Temporally Continuous Sensory Experiences

    Authors: Zhaoze Wang, Ronald W. Di Tullio, Spencer Rooke, Vijay Balasubramanian

    Abstract: The vertebrate hippocampus is believed to use recurrent connectivity in area CA3 to support episodic memory recall from partial cues. This brain area also contains place cells, whose location-selective firing fields implement maps supporting spatial memory. Here we show that place cells emerge in networks trained to remember temporally continuous sensory episodes. We model CA3 as a recurrent autoe…

    Submitted 11 August, 2024; originally announced August 2024.

  50. arXiv:2408.05752  [pdf, other

    cs.CV

    RTF-Q: Unsupervised domain adaptation based retraining-free quantization network

    Authors: Nanyang Du, Chen Tang, Yuan Meng, Zhi Wang

    Abstract: Performing unsupervised domain adaptation on resource-constrained edge devices is a significant task. Although existing research allows edge devices to use subnets with different computational budgets for inference, they often require expensive pre-training and do not consider the issues of parameter precision redundancy in the model, which is not conducive to the deployment of the model on edge d…

    Submitted 11 August, 2024; originally announced August 2024.