Showing 1–50 of 1,457 results for author: Li, G

Searching in archive cs.

  1. arXiv:2408.10853  [pdf, other]

    cs.SD cs.AI eess.AS

    Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

    Authors: Yuankun Xie, Chenxu Xiong, Xiaopeng Wang, Zhiyong Wang, Yi Lu, Xin Qi, Ruibo Fu, Yukun Liu, Zhengqi Wen, Jianhua Tao, Guanjun Li, Long Ye

    Abstract: Currently, Audio Language Models (ALMs) are rapidly advancing due to the developments in large language models and audio neural codecs. These ALMs have significantly lowered the barrier to creating deepfake audio, generating highly realistic and diverse types of deepfake audio, which pose severe threats to society. Consequently, effective audio deepfake detection technologies to detect ALM-based a…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.10852  [pdf, other]

    cs.SD eess.AS

    EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

    Authors: Xin Qi, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Shuchen Shi, Yi Lu, Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Yukun Liu, Guanjun Li, Xuefei Liu, Yongwei Li

    Abstract: In the current era of Artificial Intelligence Generated Content (AIGC), a Low-Rank Adaptation (LoRA) method has emerged. It uses a plugin-based approach to learn new knowledge with lower parameter quantities and computational costs, and it can be plugged in and out based on the specific sub-tasks, offering high flexibility. However, the current application schemes primarily incorporate LoRA into t…

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.10849  [pdf, other]

    cs.SD eess.AS

    A Noval Feature via Color Quantisation for Fake Audio Detection

    Authors: Zhiyong Wang, Xiaopeng Wang, Yuankun Xie, Ruibo Fu, Zhengqi Wen, Jianhua Tao, Yukun Liu, Guanjun Li, Xin Qi, Yi Lu, Xuefei Liu, Yongwei Li

    Abstract: In the field of deepfake detection, previous studies focus on using reconstruction or mask and prediction methods to train pre-trained models, which are then transferred to fake audio detection training where the encoder is used to extract features, such as wav2vec2.0 and Masked Auto Encoder. These methods have proven that using real audio for reconstruction pre-training can better help the model…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: accepted by ISCSLP2024

  4. arXiv:2408.10575  [pdf, other]

    cs.CV

    MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval

    Authors: Haoran Tang, Meng Cao, Jinfa Huang, Ruyang Liu, Peng Jin, Ge Li, Xiaodan Liang

    Abstract: Text-Video Retrieval (TVR) aims to align and associate relevant video content with corresponding natural language queries. Most existing TVR methods are based on large-scale pre-trained vision-language models (e.g., CLIP). However, due to the inherent plain structure of CLIP, few TVR methods explore the multi-scale representations which offer richer contextual information for a more thorough under…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 8 pages

  5. arXiv:2408.10473  [pdf, other]

    cs.CL cs.LG

    Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism

    Authors: Guanchen Li, Xiandong Zhao, Lian Liu, Zeping Li, Dong Li, Lu Tian, Jie He, Ashish Sirasao, Emad Barsoum

    Abstract: Pre-trained language models (PLMs) are engineered to be robust in contextual understanding and exhibit outstanding performance in various natural language processing tasks. However, their considerable size incurs significant computational and storage costs. Modern pruning strategies employ one-shot techniques to compress PLMs without the need for retraining on task-specific or otherwise general da…

    Submitted 19 August, 2024; originally announced August 2024.

  6. arXiv:2408.10123  [pdf, other]

    cs.RO cs.CV

    Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

    Authors: Gen Li, Nikolaos Tsagkas, Jifei Song, Ruaridh Mon-Williams, Sethu Vijayakumar, Kun Shao, Laura Sevilla-Lara

    Abstract: Affordance, defined as the potential actions that an object offers, is crucial for robotic manipulation tasks. A deep understanding of affordance can lead to more intelligent AI systems. For example, such knowledge directs an agent to grasp a knife by the handle for cutting and by the blade when passing it to someone. In this paper, we present a streamlined affordance learning system that encompas…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Project page: https://reagan1311.github.io/affgrasp

  7. arXiv:2408.10115  [pdf, other]

    cs.CL

    GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization

    Authors: Ran Liu, Ming Liu, Min Yu, Jianguo Jiang, Gang Li, Dan Zhang, Jingyuan Li, Xiang Meng, Weiqing Huang

    Abstract: Pre-trained language models are increasingly being used in multi-document summarization tasks. However, these models need large-scale corpora for pre-training and are domain-dependent. Other non-neural unsupervised summarization approaches mostly rely on key sentence extraction, which can lead to information loss. To address these challenges, we propose a lightweight yet effective unsupervised app…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 19 pages, 7 figures. Accepted by ECAI 2024

  8. arXiv:2408.08870  [pdf, other]

    cs.CV

    SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation

    Authors: Xinyu Xiong, Zihuang Wu, Shuangyi Tan, Wenxue Li, Feilong Tang, Ying Chen, Siying Li, Jie Ma, Guanbin Li

    Abstract: Image segmentation plays an important role in vision understanding. Recently, the emerging vision foundation models continuously achieved superior performance on various tasks. Following such success, in this paper, we prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models. We propose a simple but effective framework, termed SAM2-UNet, for versatile…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Technical Report

  9. arXiv:2408.08707  [pdf, other]

    cs.LG cs.AI

    Beam Prediction based on Large Language Models

    Authors: Yucheng Sheng, Kai Huang, Le Liang, Peng Liu, Shi Jin, Geoffrey Ye Li

    Abstract: Millimeter-wave (mmWave) communication is promising for next-generation wireless networks but suffers from significant path loss, requiring extensive antenna arrays and frequent beam training. Traditional deep learning models, such as long short-term memory (LSTM), enhance beam tracking accuracy but are limited by poor robustness and generalization. In this letter, we use large language models…

    Submitted 16 August, 2024; originally announced August 2024.

  10. arXiv:2408.08610  [pdf, other]

    cs.CV cs.AI cs.LG

    Generative Dataset Distillation Based on Diffusion Model

    Authors: Duo Su, Junjie Hou, Guang Li, Ren Togo, Rui Song, Takahiro Ogawa, Miki Haseyama

    Abstract: This paper presents our method for the generative track of The First Dataset Distillation Challenge at ECCV 2024. Since the diffusion model has become the mainstay of generative models because of its high-quality generative effects, we focus on distillation methods based on the diffusion model. Considering that the track can only generate a fixed number of images in 10 minutes using a generative m…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: The Third Place Winner in Generative Track of the ECCV 2024 DD Challenge

  11. arXiv:2408.07397  [pdf, other]

    cs.MA

    Bridging Training and Execution via Dynamic Directed Graph-Based Communication in Cooperative Multi-Agent Systems

    Authors: Zhuohui Zhang, Bin He, Bin Cheng, Gang Li

    Abstract: Multi-agent systems must learn to communicate and understand interactions between agents to achieve cooperative goals in partially observed tasks. However, existing approaches lack a dynamic directed communication mechanism and rely on global states, thus diminishing the role of communication in centralized training. Thus, we propose the transformer-based graph coarsening network (TGCNet), a novel…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 9 pages, 7 figures

  12. arXiv:2408.06969  [pdf, ps, other]

    cs.NI cs.LG

    IRS-Assisted Lossy Communications Under Correlated Rayleigh Fading: Outage Probability Analysis and Optimization

    Authors: Guanchang Li, Wensheng Lin, Lixin Li, Yixuan He, Fucheng Yang, Zhu Han

    Abstract: This paper focuses on an intelligent reflecting surface (IRS)-assisted lossy communication system with correlated Rayleigh fading. We analyze the correlated channel model and derive the outage probability of the system. Then, we design a deep reinforcement learning (DRL) method to optimize the phase shift of IRS, in order to maximize the received signal power. Moreover, this paper presents results of…

    Submitted 13 August, 2024; originally announced August 2024.

  13. arXiv:2408.06601  [pdf, other]

    cs.HC cs.GR

    HiRegEx: Interactive Visual Query and Exploration of Multivariate Hierarchical Data

    Authors: Guozheng Li, Haotian Mi, Chi Harold Liu, Takayuki Itoh, Guoren Wang

    Abstract: When using exploratory visual analysis to examine multivariate hierarchical data, users often need to query data to narrow down the scope of analysis. However, formulating effective query expressions remains a challenge for multivariate hierarchical data, particularly when datasets become very large. To address this issue, we develop a declarative grammar, HiRegEx (Hierarchical data Regular Expres…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 11 pages, 8 figures, accepted at IEEE VIS 2024

    MSC Class: 65D18 ACM Class: I.3.6

  14. arXiv:2408.05416  [pdf, other]

    cs.CV cs.AI cs.MM

    High-fidelity and Lip-synced Talking Face Synthesis via Landmark-based Diffusion Model

    Authors: Weizhi Zhong, Junfan Lin, Peixin Chen, Liang Lin, Guanbin Li

    Abstract: Audio-driven talking face video generation has attracted increasing attention due to its huge industrial potential. Some previous methods focus on learning a direct mapping from audio to visual content. Despite progress, they often struggle with the ambiguity of the mapping process, leading to flawed results. An alternative strategy involves facial structural representations (e.g., facial landmark…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: submitted to IEEE Transactions on Image Processing (TIP)

  15. arXiv:2408.05412  [pdf, other]

    cs.CV cs.AI cs.MM

    Style-Preserving Lip Sync via Audio-Aware Style Reference

    Authors: Weizhi Zhong, Jichang Li, Yinqi Cai, Liang Lin, Guanbin Li

    Abstract: Audio-driven lip sync has recently drawn significant attention due to its widespread application in the multimedia domain. Individuals exhibit distinct lip shapes when speaking the same utterance, attributed to the unique speaking styles of individuals, posing a notable challenge for audio-driven lip sync. Earlier methods for such task often bypassed the modeling of personalized speaking styles, r…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: submitted to IEEE Transactions on Image Processing (TIP)

  16. arXiv:2408.05109  [pdf, other]

    cs.DB

    A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?

    Authors: Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuyu Luo, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang

    Abstract: Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of NL2SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of NL2SQL techniques powered by LLMs, covering its e…

    Submitted 9 August, 2024; originally announced August 2024.

  17. arXiv:2408.03429  [pdf, other]

    quant-ph cs.ET

    MarQSim: Reconciling Determinism and Randomness in Compiler Optimization for Quantum Simulation

    Authors: Xiuqi Cao, Junyu Zhou, Yuhao Liu, Yunong Shi, Gushu Li

    Abstract: Quantum simulation, fundamental in quantum algorithm design, extends far beyond its foundational roots, powering diverse quantum computing applications. However, optimizing the compilation of quantum Hamiltonian simulation poses significant challenges. Existing approaches fall short in reconciling deterministic and randomized compilation, lack appropriate intermediate representations, and struggle…

    Submitted 6 August, 2024; originally announced August 2024.

  18. arXiv:2408.02408  [pdf, other]

    cs.CV cs.AI

    Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models

    Authors: Tongtong Feng, Qing Li, Xin Wang, Mingzi Wang, Guangyao Li, Wenwu Zhu

    Abstract: Cross-view geo-localization in GNSS-denied environments aims to determine an unknown location by matching drone-view images with the correct geo-tagged satellite-view images from a large gallery. Recent research shows that learning discriminative image representations under specific weather conditions can significantly enhance performance. However, the frequent occurrence of unseen extreme weather…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM24 workshop

  19. arXiv:2408.02320  [pdf, ps, other]

    cs.LG eess.SP math.NA math.ST stat.ML

    A Sharp Convergence Theory for The Probability Flow ODEs of Diffusion Models

    Authors: Gen Li, Yuting Wei, Yuejie Chi, Yuxin Chen

    Abstract: Diffusion models, which convert noise into new data instances by learning to reverse a diffusion process, have become a cornerstone in contemporary generative modeling. In this work, we develop non-asymptotic convergence theory for a popular diffusion-based sampler (i.e., the probability flow ODE sampler) in discrete time, assuming access to $\ell_2$-accurate estimates of the (Stein) score functio…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: This manuscript presents improved theory for probability flow ODEs compared to its earlier version arXiv:2306.09251

  20. arXiv:2408.02085  [pdf, other]

    cs.CV cs.AI cs.CL eess.SP

    Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models

    Authors: Yulei Qin, Yuncheng Yang, Pengcheng Guo, Gang Li, Hang Shao, Yuchen Shi, Zihan Xu, Yun Gu, Ke Li, Xing Sun

    Abstract: Instruction tuning plays a critical role in aligning large language models (LLMs) with human preference. Despite the vast amount of open instruction datasets, naively training an LLM on all existing instructions may not be optimal and practical. To pinpoint the most beneficial datapoints, data assessment and selection methods have been proposed in the fields of natural language processing (NLP) and…

    Submitted 7 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: review, survey, 28 pages, 2 figures, 4 tables

  21. arXiv:2408.01929  [pdf, other]

    eess.IV cs.CV

    Advancing H&E-to-IHC Stain Translation in Breast Cancer: A Multi-Magnification and Attention-Based Approach

    Authors: Linhao Qu, Chengsheng Zhang, Guihui Li, Haiyong Zheng, Chen Peng, Wei He

    Abstract: Breast cancer presents a significant healthcare challenge globally, demanding precise diagnostics and effective treatment strategies, where histopathological examination of Hematoxylin and Eosin (H&E) stained tissue sections plays a central role. Despite its importance, evaluating specific biomarkers like Human Epidermal Growth Factor Receptor 2 (HER2) for personalized treatment remains constraine…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE CIS-RAM 2024 Invited Session Oral

  22. arXiv:2408.00788  [pdf, other]

    cs.NE cs.LG

    SpikeVoice: High-Quality Text-to-Speech Via Efficient Spiking Neural Network

    Authors: Kexin Wang, Jiahong Zhang, Yong Ren, Man Yao, Di Shang, Bo Xu, Guoqi Li

    Abstract: Brain-inspired Spiking Neural Network (SNN) has demonstrated its effectiveness and efficiency in vision, natural language, and speech understanding tasks, indicating their capacity to "see", "listen", and "read". In this paper, we design SpikeVoice, which performs high-quality Text-To-Speech (TTS) via SNN, to explore the potential of SNN to "speak". A major obstacle to using SNN for such…

    Submitted 17 July, 2024; originally announced August 2024.

    Comments: 9 pages

  23. arXiv:2408.00465  [pdf, ps, other]

    cs.DS cs.LG math.OC

    Infrequent Resolving Algorithm for Online Linear Programming

    Authors: Guokai Li, Zizhuo Wang, Jingwei Zhang

    Abstract: Online linear programming (OLP) has gained significant attention from both researchers and practitioners due to its extensive applications, such as online auction, network revenue management and advertising. Existing OLP algorithms fall into two categories: LP-based algorithms and LP-free algorithms. The former one typically guarantees better performance, even offering a constant regret, but requi…

    Submitted 1 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: 35 pages, 7 figures

  24. arXiv:2407.21465  [pdf, other]

    cs.CV

    MarvelOVD: Marrying Object Recognition and Vision-Language Models for Robust Open-Vocabulary Object Detection

    Authors: Kuo Wang, Lechao Cheng, Weikai Chen, Pingping Zhang, Liang Lin, Fan Zhou, Guanbin Li

    Abstract: Learning from pseudo-labels that are generated with VLMs (Vision Language Models) has been shown as a promising solution to assist open vocabulary detection (OVD) in recent studies. However, due to the domain gap between VLM and vision-detection tasks, pseudo-labels produced by the VLMs are prone to be noisy, while the training design of the detector further amplifies the bias. In this work, we invest…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Codes are available at https://github.com/wkfdb/MarvelOVD

  25. arXiv:2407.21282  [pdf, ps, other]

    cs.LG cs.HC

    FedBChain: A Blockchain-enabled Federated Learning Framework for Improving DeepConvLSTM with Comparative Strategy Insights

    Authors: Gaoxuan Li, Chern Hong Lim, Qiyao Ma, Xinyu Tang, Hwa Hui Tew, Fan Ding, Xuewen Luo

    Abstract: Recent research in the field of Human Activity Recognition has shown that an improvement in prediction performance can be achieved by reducing the number of LSTM layers. However, this kind of enhancement is only significant on monolithic architectures, and when it runs on large-scale distributed training, data security and privacy issues will be reconsidered, and its prediction performance is unkn…

    Submitted 7 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  26. arXiv:2407.20853  [pdf, other]

    cs.CV

    NIS-SLAM: Neural Implicit Semantic RGB-D SLAM for 3D Consistent Scene Understanding

    Authors: Hongjia Zhai, Gan Huang, Qirui Hu, Guanglin Li, Hujun Bao, Guofeng Zhang

    Abstract: In recent years, the paradigm of neural implicit representations has gained substantial attention in the field of Simultaneous Localization and Mapping (SLAM). However, a notable gap exists in the existing approaches when it comes to scene understanding. In this paper, we introduce NIS-SLAM, an efficient neural implicit semantic RGB-D SLAM system, that leverages a pre-trained 2D segmentation netwo…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by TVCG (ISMAR 2024 Journal Track)

  27. arXiv:2407.20708  [pdf, other]

    cs.AI

    Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection

    Authors: Xinhao Luo, Man Yao, Yuhong Chou, Bo Xu, Guoqi Li

    Abstract: Brain-inspired Spiking Neural Networks (SNNs) have bio-plausibility and low-power advantages over Artificial Neural Networks (ANNs). Applications of SNNs are currently limited to simple classification tasks because of their poor performance. In this work, we focus on bridging the performance gap between ANNs and SNNs on object detection. Our design revolves around network architecture and spiking…

    Submitted 5 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024; 19 pages, 4 figures

  28. arXiv:2407.20693  [pdf, other]

    cs.CV cs.AI cs.MM

    Boosting Audio Visual Question Answering via Key Semantic-Aware Cues

    Authors: Guangyao Li, Henghui Du, Di Hu

    Abstract: The Audio Visual Question Answering (AVQA) task aims to answer questions related to various visual objects, sounds, and their interactions in videos. Such naturally multimodal videos contain rich and complex dynamic audio-visual components, with only a portion of them closely related to the given questions. Hence, effectively perceiving audio-visual cues relevant to the given questions is crucial…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  29. arXiv:2407.20679  [pdf, other]

    cs.CE

    Online Prediction-Assisted Safe Reinforcement Learning for Electric Vehicle Charging Station Recommendation in Dynamically Coupled Transportation-Power Systems

    Authors: Qionghua Liao, Guilong Li, Jiajie Yu, Ziyuan Gu, Wei Ma

    Abstract: With the proliferation of electric vehicles (EVs), the transportation network and power grid become increasingly interdependent and coupled via charging stations. The concomitant growth in charging demand has posed challenges for both networks, highlighting the importance of charging coordination. Existing literature largely overlooks the interactions between power grid security and traffic effici…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 33 pages, 31 figures

  30. arXiv:2407.20508  [pdf, other]

    cs.AI cs.LG cs.NE

    Unveiling the Potential of Spiking Dynamics in Graph Representation Learning through Spatial-Temporal Normalization and Coding Strategies

    Authors: Mingkun Xu, Huifeng Yin, Yujie Wu, Guoqi Li, Faqiang Liu, Jing Pei, Shuai Zhong, Lei Deng

    Abstract: In recent years, spiking neural networks (SNNs) have attracted substantial interest due to their potential to replicate the energy-efficient and event-driven processing of biological neurons. Despite this, the application of SNNs in graph representation learning, particularly for non-Euclidean data, remains underexplored, and the influence of spiking dynamics on graph learning is not yet fully und…

    Submitted 29 July, 2024; originally announced July 2024.

  31. arXiv:2407.20099  [pdf, other]

    cs.CV

    RSC-SNN: Exploring the Trade-off Between Adversarial Robustness and Accuracy in Spiking Neural Networks via Randomized Smoothing Coding

    Authors: Keming Wu, Man Yao, Yuhong Chou, Xuerui Qiu, Rui Yang, Bo Xu, Guoqi Li

    Abstract: Spiking Neural Networks (SNNs) have received widespread attention due to their unique neuronal dynamics and low-power nature. Previous research empirically shows that SNNs with Poisson coding are more robust than Artificial Neural Networks (ANNs) on small-scale datasets. However, it is still unclear in theory how the adversarial robustness of SNNs is derived, and whether SNNs can still maintain it…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  32. arXiv:2407.18932  [pdf]

    cs.CY cs.AI

    Be More Real: Travel Diary Generation Using LLM Agents and Individual Profiles

    Authors: Xuchuan Li, Fei Huang, Jianrong Lv, Zhixiong Xiao, Guolong Li, Yang Yue

    Abstract: Human mobility is inextricably linked to social issues such as traffic congestion, energy consumption, and public health; however, privacy concerns restrict access to mobility data. Recently, research has utilized Large Language Models (LLMs) for human mobility generation, in which the challenge is how LLMs can understand individuals' mobility behavioral differences to generate realistic trajecto…

    Submitted 5 August, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  33. arXiv:2407.18877  [pdf, other]

    cs.SE

    Code Structure-Aware through Line-level Semantic Learning for Code Vulnerability Detection

    Authors: Ziliang Wang, Ge Li, Jia Li, Yihong Dong, Yingfei Xiong, Zhi Jin

    Abstract: Different from the flow semantics of natural languages, programming languages are inherently rigid in structure and grammar. Existing fine-tuning methodologies for code vulnerability detection generally treat code as long text sequences, stripping away structural elements such as newlines ('/n') and whitespace. However, this approach inadvertently results in the loss of crucial structural informat…

    Submitted 26 July, 2024; originally announced July 2024.

  34. arXiv:2407.18625  [pdf, other]

    cs.ET cs.AI cs.NE

    Topology Optimization of Random Memristors for Input-Aware Dynamic SNN

    Authors: Bo Wang, Shaocong Wang, Ning Lin, Yi Li, Yifei Yu, Yue Zhang, Jichang Yang, Xiaoshan Wu, Yangu He, Songqi Wang, Rui Chen, Guoqi Li, Xiaojuan Qi, Zhongrui Wang, Dashan Shang

    Abstract: There is unprecedented development in machine learning, exemplified by recent large language models and world simulators, which are artificial neural networks running on digital computers. However, they still cannot parallel human brains in terms of energy efficiency and the streamlined adaptability to inputs of different difficulties, due to differences in signal representation, optimization, run…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 15 pages, 5 figures

  35. arXiv:2407.16508  [pdf, other]

    cs.CV

    ToDER: Towards Colonoscopy Depth Estimation and Reconstruction with Geometry Constraint Adaptation

    Authors: Zhenhua Wu, Yanlin Jin, Liangdong Qiu, Xiaoguang Han, Xiang Wan, Guanbin Li

    Abstract: Visualizing colonoscopy is crucial for medical auxiliary diagnosis to prevent undetected polyps in areas that are not fully observed. Traditional feature-based and depth-based reconstruction approaches usually end up with undesirable results due to incorrect point matching or imprecise depth estimation in realistic colonoscopy videos. Modern deep-based methods often require a sufficient number of…

    Submitted 23 July, 2024; originally announced July 2024.

  36. arXiv:2407.14100  [pdf, other]

    cs.GR cs.AI cs.LG

    ParamsDrag: Interactive Parameter Space Exploration via Image-Space Dragging

    Authors: Guan Li, Yang Liu, Guihua Shan, Shiyu Cheng, Weiqun Cao, Junpeng Wang, Ko-Chih Wang

    Abstract: Numerical simulation serves as a cornerstone in scientific modeling, yet the process of fine-tuning simulation parameters poses significant challenges. Conventionally, parameter adjustment relies on extensive numerical simulations, data analysis, and expert insights, resulting in substantial computational costs and low efficiency. The emergence of deep learning in recent years has provided promisi…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: To be published in Proc. IEEE VIS 2024

  37. arXiv:2407.13978  [pdf, other]

    cs.LG

    Double Gradient Reversal Network for Single-Source Domain Generalization in Multi-mode Fault Diagnosis

    Authors: Guangqiang Li, M. Amine Atoui, Xiangshun Li

    Abstract: Domain generalization achieves fault diagnosis on unseen modes. In process industrial systems, fault samples are limited, and only single-mode fault data can be obtained. Extracting domain-invariant fault features from single-mode data for unseen mode fault diagnosis poses challenges. Existing methods utilize a generator module to simulate samples of unseen modes. However, multi-mode samples conta…

    Submitted 18 July, 2024; originally announced July 2024.

  38. arXiv:2407.13782  [pdf, other]

    eess.AS cs.AI cs.SD

    Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition

    Authors: Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu

    Abstract: Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcity and mismatch. To this end, this paper explores a series of approaches to integrate domain fine-tuned SSL pre-trained models and their features into…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: IEEE/ACM Transactions on Audio, Speech, and Language Processing

  39. arXiv:2407.12565  [pdf, other]

    cs.AR

    SigDLA: A Deep Learning Accelerator Extension for Signal Processing

    Authors: Fangfa Fu, Wenyu Zhang, Zesong Jiang, Zhiyu Zhu, Guoyu Li, Bing Yang, Cheng Liu, Liyi Xiao, Jinxiang Wang, Huawei Li, Xiaowei Li

    Abstract: Deep learning and signal processing are closely correlated in many IoT scenarios such as anomaly detection to empower intelligence of things. Many IoT processors utilize digital signal processors (DSPs) for signal processing and build deep learning frameworks on this basis. While deep learning is usually much more computing-intensive than signal processing, the computing efficiency of deep learnin…

    Submitted 17 July, 2024; originally announced July 2024.

  40. arXiv:2407.12258  [pdf, other]

    cs.CV

    Facial Affect Recognition based on Multi Architecture Encoder and Feature Fusion for the ABAW7 Challenge

    Authors: Kang Shen, Xuxiong Liu, Boyan Wang, Jun Yao, Xin Liu, Yujie Guan, Yu Wang, Gengchen Li, Xiao Sun

    Abstract: In this paper, we present our approach to addressing the challenges of the 7th ABAW competition. The competition comprises three sub-challenges: Valence Arousal (VA) estimation, Expression (Expr) classification, and Action Unit (AU) detection. To tackle these challenges, we employ state-of-the-art models to extract powerful visual features. Subsequently, a Transformer Encoder is utilized to integr…

    Submitted 26 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

  41. arXiv:2407.12038  [pdf, ps, other]

    eess.AS cs.AI

    ICAGC 2024: Inspirational and Convincing Audio Generation Challenge 2024

    Authors: Ruibo Fu, Rui Liu, Chunyu Qiang, Yingming Gao, Yi Lu, Shuchen Shi, Tao Wang, Ya Li, Zhengqi Wen, Chen Zhang, Hui Bu, Yukun Liu, Xin Qi, Guanjun Li

    Abstract: The Inspirational and Convincing Audio Generation Challenge 2024 (ICAGC 2024) is part of the ISCSLP 2024 Competitions and Challenges track. While current text-to-speech (TTS) technology can generate high-quality audio, its ability to convey complex emotions and controlled detail content remains limited. This constraint leads to a discrepancy between the generated audio and human subjective percept…

    Submitted 31 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: ISCSLP 2024 Challenge description and results

  42. arXiv:2407.11486  [pdf, other]

    cs.CV

    An efficient framework based on large foundation model for cervical cytopathology whole slide image screening

    Authors: Jialong Huang, Gaojie Li, Shichao Kan, Jianfeng Liu, Yixiong Liang

    Abstract: Current cervical cytopathology whole slide image (WSI) screening primarily relies on detection-based approaches, which are limited in performance due to the expense and time-consuming annotation process. Multiple Instance Learning (MIL), a weakly supervised approach that relies solely on bag-level labels, can effectively alleviate these challenges. Nonetheless, MIL commonly employs frozen pretrain…

    Submitted 16 July, 2024; originally announced July 2024.

  43. arXiv:2407.11405  [pdf, other]

    cs.CR cs.CV

    Cover-separable Fixed Neural Network Steganography via Deep Generative Models

    Authors: Guobiao Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang

    Abstract: Image steganography is the process of hiding secret data in a cover image by subtle perturbation. Recent studies show that it is feasible to use a fixed neural network for data embedding and extraction. Such Fixed Neural Network Steganography (FNNS) demonstrates favorable performance without the need for training networks, making it more practical for real-world applications. However, the stego-im…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted at ACMMM 2024

  44. arXiv:2407.10957  [pdf, other]

    cs.CV cs.AI

    Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

    Authors: Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu

    Abstract: Traditional reference segmentation tasks have predominantly focused on silent visual scenes, neglecting the integral role of multimodal perception and interaction in human experiences. In this work, we introduce a novel task called Reference Audio-Visual Segmentation (Ref-AVS), which seeks to segment objects within the visual domain based on expressions containing multimodal cues. Such expressions…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  45. arXiv:2407.10625  [pdf, other]

    cs.CV

    WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models

    Authors: Zijian He, Peixin Chen, Guangrun Wang, Guanbin Li, Philip H. S. Torr, Liang Lin

    Abstract: Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos. Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions, limiting their effectiveness in video try-on applications. Moreover, video-based models require extensive, high-quality data and…

    Submitted 15 July, 2024; originally announced July 2024.

  46. arXiv:2407.08959  [pdf, other]

    cs.CL

    Domain-Hierarchy Adaptation via Chain of Iterative Reasoning for Few-shot Hierarchical Text Classification

    Authors: Ke Ji, Peng Wang, Wenjun Ke, Guozheng Li, Jiajun Liu, Jingsheng Gao, Ziyu Shang

    Abstract: Recently, various pre-trained language models (PLMs) have been proposed to prove their impressive performances on a wide range of few-shot tasks. However, limited by the unstructured prior knowledge in PLMs, it is difficult to maintain consistent performance on complex structured scenarios, such as hierarchical text classification (HTC), especially when the downstream data is extremely scarce. The…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 9 pages, 2 figures, Accepted by IJCAI2024

  47. arXiv:2407.08850  [pdf, other]

    cs.HC cs.AI

    UICrit: Enhancing Automated Design Evaluation with a UICritique Dataset

    Authors: Peitong Duan, Chin-yi Chen, Gang Li, Bjoern Hartmann, Yang Li

    Abstract: Automated UI evaluation can be beneficial for the design process; for example, to compare different UI designs, or conduct automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizability to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that aut…

    Submitted 13 August, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted to ACM UIST 2024

  48. arXiv:2407.08200  [pdf, other]

    cs.CV

    Deep Understanding of Soccer Match Videos

    Authors: Shikun Xu, Yandong Zhu, Gen Li, Changhu Wang

    Abstract: Soccer is one of the most popular sports worldwide, with live broadcasts frequently available for major matches. However, extracting detailed, frame-by-frame information on player actions from these videos remains a challenge. Utilizing state-of-the-art computer vision technologies, our system can detect key objects such as soccer balls, players and referees. It also tracks the movements of players…

    Submitted 11 July, 2024; originally announced July 2024.

  49. arXiv:2407.08093  [pdf, other]

    eess.IV cs.AI cs.CV eess.SP

    MemWarp: Discontinuity-Preserving Cardiac Registration with Memorized Anatomical Filters

    Authors: Hang Zhang, Xiang Chen, Renjiu Hu, Dongdong Liu, Gaolei Li, Rongguang Wang

    Abstract: Many existing learning-based deformable image registration methods impose constraints on deformation fields to ensure they are globally smooth and continuous. However, this assumption does not hold in cardiac image registration, where different anatomical regions exhibit asymmetric motions during respiration and movements due to sliding organs within the chest. Consequently, such global constraint…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 11 pages, 2 figures, 2 tables

  50. arXiv:2407.07020  [pdf, other]

    cs.AI cs.RO

    Less is More: Efficient Brain-Inspired Learning for Autonomous Driving Trajectory Prediction

    Authors: Haicheng Liao, Yongkang Li, Zhenning Li, Chengyue Wang, Chunlin Tian, Yuming Huang, Zilin Bian, Kaiqun Zhu, Guofa Li, Ziyuan Pu, Jia Hu, Zhiyong Cui, Chengzhong Xu

    Abstract: Accurately and safely predicting the trajectories of surrounding vehicles is essential for fully realizing autonomous driving (AD). This paper presents the Human-Like Trajectory Prediction model (HLTP++), which emulates human cognitive processes to improve trajectory prediction in AD. HLTP++ incorporates a novel teacher-student knowledge distillation framework. The "teacher" model equipped with an…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.19251