Showing 1–50 of 7,172 results for author: Liu, Y

  1. arXiv:2408.10853  [pdf, other]

    cs.SD cs.AI eess.AS

    Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

    Authors: Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye

    Abstract: Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based a…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.10852  [pdf, other]

    cs.SD eess.AS

    EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

    Authors: Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li

    Abstract: In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into t…

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.10849  [pdf, other]

    cs.SD eess.AS

    A Noval Feature via Color Quantisation for Fake Audio Detection

    Authors: Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li

    Abstract: In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features, such as wav2vec2.0 and Masked Auto Encoder. These methods have proven that using real audio for reconstruction pre-training can better help the model…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted by ISCSLP2024

  4. arXiv:2408.10848  [pdf, other]

    cs.CV

    Perception-guided Jailbreak against Text-to-Image Models

    Authors: Yihao Huang, Le Liang, Tianlin Li, Xiaojun Jia, Run Wang, Weikai Miao, Geguang Pu, Yang Liu

    Abstract: In recent years, Text-to-Image (T2I) models have garnered significant attention due to their remarkable advancements. However, security concerns have emerged due to their potential to generate inappropriate or Not-Safe-For-Work (NSFW) images. In this paper, inspired by the observation that texts with different semantics can lead to similar human perceptions, we propose an LLM-driven perception-gui…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 8 pages

  5. arXiv:2408.10706  [pdf, ps, other]

    cs.IT eess.SP

    Performance Analysis of Physical Layer Security: From Far-Field to Near-Field

    Authors: Boqun Zhao, Chongjun Ouyang, Xingqi Zhang, Yuanwei Liu

    Abstract: The secrecy performance in both near-field and far-field communications is analyzed using two fundamental metrics: the secrecy capacity under a power constraint and the minimum power requirement to achieve a specified secrecy rate target. 1) For the secrecy capacity, a closed-form expression is derived under a discrete-time memoryless setup. This expression is further analyzed under several far-fi…

    Submitted 20 August, 2024; originally announced August 2024.

  6. arXiv:2408.10657  [pdf, other]

    cs.CR cs.AI

    ETGuard: Malicious Encrypted Traffic Detection in Blockchain-based Power Grid Systems

    Authors: Peng Zhou, Yongdong Liu, Lixun Ma, Weiye Zhang, Haohan Tan, Zhenguang Liu, Butian Huang

    Abstract: The escalating prevalence of encryption protocols has led to a concomitant surge in the number of malicious attacks that hide in encrypted traffic. Power grid systems, as fundamental infrastructure, are becoming prime targets for such attacks. Conventional methods for detecting malicious encrypted packets typically use a static pre-trained model. We observe that these methods are not well-suited f…

    Submitted 20 August, 2024; originally announced August 2024.

  7. arXiv:2408.10647  [pdf, other]

    cs.LG cs.AI cs.CR

    Privacy-preserving Universal Adversarial Defense for Black-box Models

    Authors: Qiao Li, Cong Wu, Jing Chen, Zijun Zhang, Kun He, Ruiying Du, Xinxin Wang, Qingchuang Zhao, Yang Liu

    Abstract: Deep neural networks (DNNs) are increasingly used in critical applications such as identity authentication and autonomous driving, where robustness against adversarial attacks is crucial. These attacks can exploit minor perturbations to cause significant prediction errors, making it essential to enhance the resilience of DNNs. Traditional defense methods often rely on access to detailed model info…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 12 pages, 9 figures

    MSC Class: I.2.10

  8. arXiv:2408.10645  [pdf, other]

    cs.IR cs.LG

    CoRA: Collaborative Information Perception by Large Language Model's Weights for Recommendation

    Authors: Yuting Liu, Jinghao Zhang, Yizhou Dang, Yuliang Liang, Qiang Liu, Guibing Guo, Jianzhe Zhao, Xingwei Wang

    Abstract: Involving collaborative information in Large Language Models (LLMs) is a promising technique for adapting LLMs for recommendation. Existing methods achieve this by concatenating collaborative features with text tokens into a unified sequence input and then fine-tuning to align these features with LLM's input space. Although effective, in this work, we identify two limitations when adapting LLMs to…

    Submitted 20 August, 2024; originally announced August 2024.

  9. arXiv:2408.10599  [pdf, other]

    hep-ex cs.CV

    Vision Calorimeter for Anti-neutron Reconstruction: A Baseline

    Authors: Hongtian Yu, Yangu Li, Mingrui Wu, Letian Shen, Yue Liu, Yunxuan Song, Qixiang Ye, Xiaorui Lyu, Yajun Mao, Yangheng Zheng, Yunfan Liu

    Abstract: In high-energy physics, anti-neutrons ($\bar{n}$) are fundamental particles that frequently appear as final-state particles, and the reconstruction of their kinematic properties provides an important probe for understanding the governing principles. However, this confronts significant challenges instrumentally with the electromagnetic calorimeter (EMC), a typical experimental sensor but recovering…

    Submitted 20 August, 2024; originally announced August 2024.

  10. arXiv:2408.10479  [pdf, other]

    cs.LG cs.AI

    An End-to-End Reinforcement Learning Based Approach for Micro-View Order-Dispatching in Ride-Hailing

    Authors: Xinlang Yue, Yiran Liu, Fangzhou Shi, Sihong Luo, Chen Zhong, Min Lu, Zhe Xu

    Abstract: Assigning orders to drivers under localized spatiotemporal context (micro-view order-dispatching) is a major task in Didi, as it influences ride-hailing service experience. Existing industrial solutions mainly follow a two-stage pattern that incorporates heuristic or learning-based algorithms with naive combinatorial methods, tackling the uncertainty of both sides' behaviors, including emerging tim…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures

  11. arXiv:2408.10229  [pdf]

    cs.CY cs.IR

    AI Transparency in Academic Search Systems: An Initial Exploration

    Authors: Yifan Liu, Peter Sullivan, Luanne Sinnamon

    Abstract: As AI-enhanced academic search systems become increasingly popular among researchers, investigating their AI transparency is crucial to ensure trust in the search outcomes, as well as the reliability and integrity of scholarly work. This study employs a qualitative content analysis approach to examine the websites of a sample of 10 AI-enhanced academic search systems identified through university…

    Submitted 2 August, 2024; originally announced August 2024.

  12. arXiv:2408.10207  [pdf, other]

    cs.CV

    A Comprehensive Survey on Diffusion Models and Their Applications

    Authors: Md Manjurul Ahsan, Shivakumar Raman, Yingtao Liu, Zahed Siddique

    Abstract: Diffusion Models are probabilistic models that create realistic samples by simulating the diffusion process, gradually adding and removing noise from data. These models have gained popularity in domains such as image processing, speech synthesis, and natural language processing due to their ability to produce high-quality samples. As Diffusion Models are being adopted in various domains, existing…

    Submitted 1 July, 2024; originally announced August 2024.

  13. arXiv:2408.10195  [pdf, other]

    cs.CV cs.AI cs.GR

    SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

    Authors: Chao Xu, Ang Li, Linghao Chen, Yulin Liu, Ruoxi Shi, Hao Su, Minghua Liu

    Abstract: Open-world 3D generation has recently attracted considerable attention. While many single-image-to-3D methods have yielded visually appealing outcomes, they often lack sufficient controllability and tend to produce hallucinated regions that may not align with users' expectations. In this paper, we explore an important scenario in which the input consists of one or a few unposed 2D images of a sing…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  14. arXiv:2408.10116  [pdf, other]

    cs.SE

    Vulseye: Detect Smart Contract Vulnerabilities via Stateful Directed Graybox Fuzzing

    Authors: Ruichao Liang, Jing Chen, Cong Wu, Kun He, Yueming Wu, Ruochen Cao, Ruiying Du, Yang Liu, Ziming Zhao

    Abstract: Smart contracts, the cornerstone of decentralized applications, have become increasingly prominent in revolutionizing the digital landscape. However, vulnerabilities in smart contracts pose great risks to user assets and undermine overall trust in decentralized systems. But current smart contract fuzzers fall short of expectations in testing efficiency for two primary reasons. Firstly, smart contr…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Submitted to TIFS

  15. arXiv:2408.10067  [pdf, other]

    eess.IV cs.CV

    Towards a Benchmark for Colorectal Cancer Segmentation in Endorectal Ultrasound Videos: Dataset and Model Development

    Authors: Yuncheng Jiang, Yiwen Hu, Zixun Zhang, Jun Wei, Chun-Mei Feng, Xuemei Tang, Xiang Wan, Yong Liu, Shuguang Cui, Zhen Li

    Abstract: Endorectal ultrasound (ERUS) is an important imaging modality that provides high reliability for diagnosing the depth and boundary of invasion in colorectal cancer. However, the lack of a large-scale ERUS dataset with high-quality annotations hinders the development of automatic ultrasound diagnostics. In this paper, we collected and annotated the first benchmark dataset that covers diverse ERUS s…

    Submitted 19 August, 2024; originally announced August 2024.

  16. arXiv:2408.10007  [pdf, other]

    cs.CV

    P3P: Pseudo-3D Pre-training for Scaling 3D Masked Autoencoders

    Authors: Xuechao Chen, Ying Chen, Jialin Li, Qiang Nie, Yong Liu, Qixing Huang, Yang Li

    Abstract: 3D pre-training is crucial to 3D perception tasks. However, limited by the difficulties in collecting clean 3D data, 3D pre-training consistently faced data scaling challenges. Inspired by semi-supervised learning leveraging limited labeled data and a large amount of unlabeled data, in this work, we propose a novel self-supervised pre-training framework utilizing the real 3D data and the pseudo-3D…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Under review. Pre-print

  17. arXiv:2408.09624  [pdf, ps, other]

    cs.AI cs.LG math.NA

    Attention is a smoothed cubic spline

    Authors: Zehua Lai, Lek-Heng Lim, Yucong Liu

    Abstract: We highlight a perhaps important but hitherto unobserved insight: The attention module in a transformer is a smoothed cubic spline. Viewed in this manner, this mysterious but critical component of a transformer becomes a natural development of an old notion deeply entrenched in classical approximation theory. More precisely, we show that with ReLU-activation, attention, masked attention, encoder-d…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 20 pages, 2 figures

    MSC Class: 26B40; 41A15; 65D07; 68T01; 14P10; 13J30

  18. arXiv:2408.09615  [pdf, other]

    cs.CV

    The First Competition on Resource-Limited Infrared Small Target Detection Challenge: Methods and Results

    Authors: Boyang Li, Xinyi Ying, Ruojing Li, Yongxian Liu, Yangsi Shi, Miao Li

    Abstract: In this paper, we briefly summarize the first competition on resource-limited infrared small target detection (namely, LimitIRSTD). This competition has two tracks, including weakly-supervised infrared small target detection (Track 1) and lightweight infrared small target detection (Track 2). 46 and 60 teams successfully registered and took part in Track 1 and Track 2, respectively. The top-perfo…

    Submitted 18 August, 2024; originally announced August 2024.

  19. arXiv:2408.09474  [pdf, other]

    cs.CR cs.CL cs.CV

    Image-Based Geolocation Using Large Vision-Language Models

    Authors: Yi Liu, Junchen Ding, Gelei Deng, Yuekang Li, Tianwei Zhang, Weisong Sun, Yaowen Zheng, Jingquan Ge, Yang Liu

    Abstract: Geolocation is now a vital aspect of modern life, offering numerous benefits but also presenting serious privacy concerns. The advent of large vision-language models (LVLMs) with advanced image-processing capabilities introduces new risks, as these models can inadvertently reveal sensitive geolocation information. This paper presents the first in-depth study analyzing the challenges posed by tradi…

    Submitted 18 August, 2024; originally announced August 2024.

  20. arXiv:2408.09397  [pdf, other]

    cs.CV

    Combo: Co-speech holistic 3D human motion generation and efficient customizable adaptation in harmony

    Authors: Chao Xu, Mingze Sun, Zhi-Qi Cheng, Fei Wang, Yang Liu, Baigui Sun, Ruqi Huang, Alexander Hauptmann

    Abstract: In this paper, we propose a novel framework, Combo, for harmonious co-speech holistic 3D human motion generation and efficient customizable adaption. In particular, we identify one fundamental challenge as the multiple-input-multiple-output (MIMO) nature of the generative model of interest. More concretely, on the input end, the model typically consumes both speech signals and character guida…

    Submitted 18 August, 2024; originally announced August 2024.

  21. arXiv:2408.09380   

    cs.AI cs.IR

    ELASTIC: Efficient Linear Attention for Sequential Interest Compression

    Authors: Jiaxin Deng, Shiyao Wang, Song Lu, Yinfeng Li, Xinchen Luo, Yuanjun Liu, Peixing Xu, Guorui Zhou

    Abstract: State-of-the-art sequential recommendation models heavily rely on transformer's attention mechanism. However, the quadratic computational and memory complexities of self attention have limited its scalability for modeling users' long range behaviour sequences. To address this problem, we propose ELASTIC, an Efficient Linear Attention for SequenTial Interest Compression, requiring only linear time…

    Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: We hereby withdraw this paper from arXiv due to incomplete experiments. Upon further review, we have determined that additional experimental work is necessary to fully validate our findings and conclusions

  22. arXiv:2408.09352  [pdf, ps, other]

    cs.CC cs.DM

    Parallel Repetition for $3$-Player XOR Games

    Authors: Amey Bhangale, Mark Braverman, Subhash Khot, Yang P. Liu, Dor Minzer

    Abstract: In a $3$-$\mathsf{XOR}$ game $\mathcal{G}$, the verifier samples a challenge $(x,y,z)\sim\mu$ where $\mu$ is a probability distribution over $\Sigma\times\Gamma\times\Phi$, and a map $t\colon \Sigma\times\Gamma\times\Phi\to\mathcal{A}$ for a finite Abelian group $\mathcal{A}$ defining a constraint. The verifier sends the questions $x$, $y$ and $z$ to the players Alice, Bob and Charlie respectively, receives answers $f(x)$,…

    Submitted 18 August, 2024; originally announced August 2024.

  23. arXiv:2408.09326  [pdf, other]

    cs.CL cs.AI cs.SE

    Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks

    Authors: Kexin Chen, Yi Liu, Dongxia Wang, Jiaying Chen, Wenhai Wang

    Abstract: Large Language Models (LLMs) have increasingly become pivotal in content generation with notable societal impact. These models hold the potential to generate content that could be deemed harmful. Efforts to mitigate this risk include implementing safeguards to ensure LLMs adhere to social ethics. However, despite such measures, the phenomenon of "jailbreaking" -- where carefully crafted prompts elic…

    Submitted 17 August, 2024; originally announced August 2024.

  24. arXiv:2408.09186  [pdf, other]

    cs.HC cs.AI

    EEG-SCMM: Soft Contrastive Masked Modeling for Cross-Corpus EEG-Based Emotion Recognition

    Authors: Qile Liu, Weishan Ye, Yulu Liu, Zhen Liang

    Abstract: Emotion recognition using electroencephalography (EEG) signals has garnered widespread attention in recent years. However, existing studies have struggled to develop a sufficiently generalized model suitable for different datasets without re-training (cross-corpus). This difficulty arises because distribution differences across datasets far exceed the intra-dataset variability. To solve this probl…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 16 pages, 8 figures, 15 tables, submitted to AAAI 2025

  25. arXiv:2408.09130  [pdf, other]

    cs.CV

    Gaussian in the Dark: Real-Time View Synthesis From Inconsistent Dark Images Using Gaussian Splatting

    Authors: Sheng Ye, Zhen-Hui Dong, Yubin Hu, Yu-Hui Wen, Yong-Jin Liu

    Abstract: 3D Gaussian Splatting has recently emerged as a powerful representation that can synthesize remarkable novel views using consistent multi-view images as input. However, we notice that images captured in dark environments where the scenes are not fully illuminated can exhibit considerable brightness variations and multi-view inconsistency, which poses great challenges to 3D Gaussian Splatting and s…

    Submitted 20 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

    Comments: accepted by PG 2024

  26. arXiv:2408.09115  [pdf, other]

    cs.CV

    GoodSAM++: Bridging Domain and Capacity Gaps via Segment Anything Model for Panoramic Semantic Segmentation

    Authors: Weiming Zhang, Yexin Liu, Xu Zheng, Lin Wang

    Abstract: This paper presents GoodSAM++, a novel framework utilizing the powerful zero-shot instance segmentation capability of SAM (i.e., teacher) to learn a compact panoramic semantic segmentation model, i.e., student, without requiring any labeled data. GoodSAM++ addresses two critical challenges: 1) SAM's inability to provide semantic labels and inherent distortion problems of panoramic images; 2) the s…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 15 pages, under review. arXiv admin note: substantial text overlap with arXiv:2403.16370

  27. arXiv:2408.09110  [pdf, other]

    cs.CV

    Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community

    Authors: Jiancheng Pan, Yanxing Liu, Yuqian Fu, Muyuan Ma, Jiaohao Li, Danda Pani Paudel, Luc Van Gool, Xiaomeng Huang

    Abstract: Object detection, particularly open-vocabulary object detection, plays a crucial role in Earth sciences, such as environmental monitoring, natural disaster assessment, and land-use planning. However, existing open-vocabulary detectors, primarily trained on natural-world images, struggle to generalize to remote sensing images due to a significant data domain gap. Thus, this paper aims to advance th…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  28. arXiv:2408.08978  [pdf, other]

    cs.CL

    See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses

    Authors: Yulong Chen, Yang Liu, Jianhao Yan, Xuefeng Bai, Ming Zhong, Yinghao Yang, Ziyi Yang, Chenguang Zhu, Yue Zhang

    Abstract: The impressive performance of Large Language Models (LLMs) has consistently surpassed numerous human-designed benchmarks, presenting new challenges in assessing the shortcomings of LLMs. Designing tasks and finding LLMs' limitations are becoming increasingly important. In this paper, we investigate the question of whether an LLM can discover its own limitations from the errors it makes. To this en…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: COLM 2024

  29. arXiv:2408.08921  [pdf, other]

    cs.AI cs.CL cs.IR

    Graph Retrieval-Augmented Generation: A Survey

    Authors: Boci Peng, Yun Zhu, Yongchao Liu, Xiaohe Bo, Haizhou Shi, Chuntao Hong, Yan Zhang, Siliang Tang

    Abstract: Recently, Retrieval-Augmented Generation (RAG) has achieved remarkable success in addressing the challenges of Large Language Models (LLMs) without necessitating retraining. By referencing an external knowledge base, RAG refines LLM outputs, effectively mitigating issues such as ``hallucination'', lack of domain-specific knowledge, and outdated information. However, the complex structure of relati…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Ongoing work

  30. arXiv:2408.08815  [pdf, other]

    cs.LG

    An Empirical Examination of Balancing Strategy for Counterfactual Estimation on Time Series

    Authors: Qiang Huang, Chuizheng Meng, Defu Cao, Biwei Huang, Yi Chang, Yan Liu

    Abstract: Counterfactual estimation from observations represents a critical endeavor in numerous application fields, such as healthcare and finance, with the primary challenge being the mitigation of treatment bias. The balancing strategy aimed at reducing covariate disparities between different treatment groups serves as a universal solution. However, when it comes to the time series data, the effectivenes…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: ICML 2024 Camera Ready Version. 20 Pages, 12 Figures, 10 Tables

  31. arXiv:2408.08813  [pdf, other]

    cs.CV

    Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models

    Authors: Lin Zhao, Xiao Chen, Eric Z. Chen, Yikang Liu, Terrence Chen, Shanhui Sun

    Abstract: Medical image segmentation is crucial for clinical decision-making, but the scarcity of annotated data presents significant challenges. Few-shot segmentation (FSS) methods show promise but often require retraining on the target domain and struggle to generalize across different modalities. Similarly, adapting foundation models like the Segment Anything Model (SAM) for medical imaging has limitatio…

    Submitted 16 August, 2024; originally announced August 2024.

  32. arXiv:2408.08724  [pdf, other]

    cs.CL

    ChatZero:Zero-shot Cross-Lingual Dialogue Generation via Pseudo-Target Language

    Authors: Yongkang Liu, Feng Shi, Daling Wang, Yifei Zhang, Hinrich Schütze

    Abstract: Although large language models (LLMs) show amazing capabilities, among various exciting applications discovered for LLMs fall short in other low-resource languages. Besides, most existing methods depend on large-scale dialogue corpora and thus building systems for dialogue generation in a zero-shot scenario remains a considerable challenge. To address this challenge, we propose a novel end-to-end z…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: ECAI2024

    Journal ref: ECAI2024

  33. arXiv:2408.08601  [pdf, other]

    cs.CV

    Learning A Low-Level Vision Generalist via Visual Task Prompt

    Authors: Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, Chao Dong

    Abstract: Building a unified model for general low-level vision tasks holds significant research and practical value. Current methods encounter several critical issues. Multi-task restoration approaches can address multiple degradation-to-clean restoration tasks, while their applicability to tasks with different target domains (e.g., image stylization) is limited. Methods like PromptGIP can handle multiple…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted to ACMMM24

  34. arXiv:2408.08536  [pdf, other]

    cs.SE cs.LG

    Blockchain-Enabled Accountability in Data Supply Chain: A Data Bill of Materials Approach

    Authors: Yue Liu, Dawen Zhang, Boming Xia, Julia Anticev, Tunde Adebayo, Zhenchang Xing, Moses Machao

    Abstract: In the era of advanced artificial intelligence, highlighted by large-scale generative models like GPT-4, ensuring the traceability, verifiability, and reproducibility of datasets throughout their lifecycle is paramount for research institutions and technology companies. These organisations increasingly rely on vast corpora to train and fine-tune advanced AI models, resulting in intricate data supp…

    Submitted 16 August, 2024; originally announced August 2024.

  35. arXiv:2408.08518  [pdf, other]

    cs.CV

    Visual-Friendly Concept Protection via Selective Adversarial Perturbations

    Authors: Xiaoyue Mi, Fan Tang, Juan Cao, Peng Li, Yang Liu

    Abstract: Personalized concept generation by tuning diffusion models with a few images raises potential legal and ethical concerns regarding privacy and intellectual property rights. Researchers attempt to prevent malicious personalization using adversarial perturbations. However, previous efforts have mainly focused on the effectiveness of protection while neglecting the visibility of perturbations. They u…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Under Review

  36. arXiv:2408.08322  [pdf, other]

    eess.SP cs.IT

    Movable-Antenna Position Optimization for Physical-Layer Security via Discrete Sampling

    Authors: Weidong Mei, Xin Wei, Yijie Liu, Boyu Ning, Zhi Chen

    Abstract: Fluid antennas (FAs) and mobile antennas (MAs) are innovative technologies in wireless communications that are able to proactively improve channel conditions by dynamically adjusting the transmit/receive antenna positions within a given spatial region. In this paper, we investigate an MA-enhanced multiple-input single-output (MISO) secure communication system, aiming to maximize the secrecy rate b…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: This paper is accepted by IEEE Globecom 2024. arXiv admin note: substantial text overlap with arXiv:2403.16886

  37. arXiv:2408.08134  [pdf, other]

    cs.CV

    CorrAdaptor: Adaptive Local Context Learning for Correspondence Pruning

    Authors: Wei Zhu, Yicheng Liu, Yuping He, Tangfei Liao, Kang Zheng, Xiaoqiu Xu, Tao Wang, Tong Lu

    Abstract: In the fields of computer vision and robotics, accurate pixel-level correspondences are essential for enabling advanced tasks such as structure-from-motion and simultaneous localization and mapping. Recent correspondence pruning methods usually focus on learning local consistency through k-nearest neighbors, which makes it difficult to capture robust context for each correspondence. We propose Cor…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 8 pages, 4 figures, accepted by ECAI

  38. arXiv:2408.07890  [pdf, other]

    stat.ML cs.LG

    Local Causal Discovery with Background Knowledge

    Authors: Qingyuan Zheng, Yue Liu, Yangbo He

    Abstract: Causality plays a pivotal role in various fields of study. Based on the framework of causal graphical models, previous works have proposed identifying whether a variable is a cause or non-cause of a target in every Markov equivalent graph solely by learning a local structure. However, the presence of prior knowledge, often represented as a partially known causal graph, is common in many causal mod…

    Submitted 14 August, 2024; originally announced August 2024.

  39. arXiv:2408.07889  [pdf, other]

    cs.CV

    MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking

    Authors: Simiao Lai, Chang Liu, Jiawen Zhu, Ben Kang, Yang Liu, Dong Wang, Huchuan Lu

    Abstract: Existing RGB-T tracking algorithms have made remarkable progress by leveraging the global interaction capability and extensive pre-trained models of the Transformer architecture. Nonetheless, these methods mainly adopt image-pair appearance matching and face challenges of the intrinsic high quadratic complexity of the attention mechanism, resulting in constrained exploitation of temporal informatio…

    Submitted 14 August, 2024; originally announced August 2024.

  40. Physically Aware Synthesis Revisited: Guiding Technology Mapping with Primitive Logic Gate Placement

    Authors: Hongyang Pan, Cunqing Lan, Yiting Liu, Zhiang Wang, Li Shang, Xuan Zeng, Fan Yang, Keren Zhu

    Abstract: A typical VLSI design flow is divided into separated front-end logic synthesis and back-end physical design (PD) stages, which often require costly iterations between these stages to achieve design closure. Existing approaches face significant challenges, notably in utilizing feedback from physical metrics to better adapt and refine synthesis operations, and in establishing a unified and comprehen…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 9 pages, 8 figures, 2 tables

    Journal ref: 2024 International Conference on Computer-Aided Design, New Jersey, NY, USA, Oct 2024

  41. arXiv:2408.07663  [pdf, other]

    cs.CL cs.AI

    Alignment-Enhanced Decoding:Defending via Token-Level Adaptive Refining of Probability Distributions

    Authors: Quan Liu, Zhenhong Zhou, Longzhu He, Yi Liu, Wei Zhang, Sen Su

    Abstract: Large language models are susceptible to jailbreak attacks, which can result in the generation of harmful content. While prior defenses mitigate these risks by perturbing or inspecting inputs, they ignore competing objectives, the underlying cause of alignment failures. In this paper, we propose Alignment-Enhanced Decoding (AED), a novel defense that employs adaptive decoding to address the root c…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 15 pages, 5 figures

  42. arXiv:2408.07654  [pdf, other]

    cs.LG

    Graph Triple Attention Network: A Decoupled Perspective

    Authors: Xiaotang Wang, Yun Zhu, Haizhou Shi, Yongchao Liu, Chuntao Hong

    Abstract: Graph Transformers (GTs) have recently achieved significant success in the graph domain by effectively capturing both long-range dependencies and graph inductive biases. However, these methods face two primary challenges: (1) multi-view chaos, which results from coupling multi-view information (positional, structural, attribute), thereby impeding flexible usage and the interpretability of the prop…

    Submitted 14 August, 2024; originally announced August 2024.

  43. arXiv:2408.07611  [pdf, other]

    cs.CL cs.IR

    WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs

    Authors: Weijian Xie, Xuefeng Liang, Yuhui Liu, Kaihua Ni, Hong Cheng, Zetian Hu

    Abstract: Large Language Models (LLMs) have greatly contributed to the development of adaptive intelligent agents and are positioned as an important way to achieve Artificial General Intelligence (AGI). However, LLMs are prone to produce factually incorrect information and often produce "phantom" content that undermines their reliability, which poses a serious challenge for their deployment in real-world sc…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 8 pages, 2 figures, technical report for 3rd place in Task 3 of Meta KDD Cup 2024 CRAG Challenge

  44. arXiv:2408.07605  [pdf, other]

    cs.CV

    Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving

    Authors: Yuqing Wen, Yucheng Zhao, Yingfei Liu, Binyuan Huang, Fan Jia, Yanhui Wang, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

    Abstract: The field of autonomous driving increasingly demands high-quality annotated video training data. In this paper, we propose Panacea+, a powerful and universally applicable framework for generating video data in driving scenes. Built upon the foundation of our previous work, Panacea, Panacea+ adopts a multi-view appearance noise prior mechanism and a super-resolution module for enhanced consistency…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Project page: https://panacea-ad.github.io/. arXiv admin note: text overlap with arXiv:2311.16813

  45. arXiv:2408.07540  [pdf, other]

    cs.CV cs.MM

    3D Gaussian Editing with A Single Image

    Authors: Guan Luo, Tian-Xing Xu, Ying-Tian Liu, Xiao-Xiong Fan, Fang-Lue Zhang, Song-Hai Zhang

    Abstract: The modeling and manipulation of 3D scenes captured from the real world are pivotal in various applications, attracting growing research interest. While previous works on editing have achieved interesting results through manipulating 3D meshes, they often require accurately reconstructed meshes to perform editing, which limits their application in 3D content generation. To address this gap, we int…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 10 pages, 12 figures

  46. arXiv:2408.07444  [pdf, other]

    eess.IV cs.CV

    Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

    Authors: Senmao Wang, Haifan Gong, Runmeng Cui, Boyao Wan, Yicheng Liu, Zhonglin Hu, Haiqing Yang, Jingyang Zhou, Bo Pan, Lin Lin, Haiyue Jiang

    Abstract: Costal cartilage segmentation is crucial to various medical applications, necessitating precise and reliable techniques due to its complex anatomy and the importance of accurate diagnosis and surgical planning. We propose a novel deep learning-based approach called topology-guided deformable Mamba (TGDM) for costal cartilage segmentation. The TGDM is tailored to capture the intricate long-range co…

    Submitted 14 August, 2024; originally announced August 2024.

  47. arXiv:2408.07367  [pdf, other]

    cs.RO

    Risk Occupancy: A New and Efficient Paradigm through Vehicle-Road-Cloud Collaboration

    Authors: Jiaxing Chen, Wei Zhong, Bolin Gao, Yifei Liu, Hengduo Zou, Jiaxi Liu, Yanbo Lu, Jin Huang, Zhihua Zhong

    Abstract: This study introduces the 4D Risk Occupancy within a vehicle-road-cloud architecture, integrating the road surface spatial, risk, and temporal dimensions, and endowing the algorithm with beyond-line-of-sight, all-angles, and efficient abilities. The algorithm simplifies risk modeling by focusing on directly observable information and key factors, drawing on the concept of Occupancy Grid Maps (OGM)…

    Submitted 17 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures

  48. arXiv:2408.07291  [pdf, other]

    cs.CR

    Evaluating Large Language Model based Personal Information Extraction and Countermeasures

    Authors: Yupei Liu, Yuqi Jia, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: Automatically extracting personal information--such as name, phone number, and email address--from publicly available profiles at a large scale is a stepping stone to many other security attacks including spear phishing. Traditional methods--such as regular expression, keyword search, and entity detection--achieve limited success at such personal information extraction. In this work, we perform a syste…

    Submitted 14 August, 2024; originally announced August 2024.

  49. arXiv:2408.07266  [pdf, other]

    cs.CV cs.RO

    Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling

    Authors: Ruofeng Wei, Bin Li, Kai Chen, Yiyao Ma, Yunhui Liu, Qi Dou

    Abstract: Scale-aware monocular depth estimation poses a significant challenge in computer-aided endoscopic navigation. However, existing depth estimation methods that do not consider the geometric priors struggle to learn the absolute scale from training with monocular endoscopic sequences. Additionally, conventional methods face difficulties in accurately estimating details on tissue and instruments bound…

    Submitted 13 August, 2024; originally announced August 2024.

  50. arXiv:2408.07009  [pdf, other]

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google: Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh, et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.