Showing 1–50 of 885 results for author: Yang, W

Searching in archive cs.
  1. arXiv:2408.10556  [pdf, other]

    cs.AI cs.LG

    Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks

    Authors: Yun Qu, Boyuan Wang, Jianzhun Shao, Yuhang Jiang, Chen Chen, Zhenbin Ye, Lin Liu, Junfeng Yang, Lin Lai, Hongyang Qin, Minwen Deng, Juchao Zhuo, Deheng Ye, Qiang Fu, Wei Yang, Guang Yang, Lanxiao Huang, Xiangyang Ji

    Abstract: The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre-collected offline datasets that represent real-world complexities and practical applications. However, existing datasets often fall short in their simplicity and lack of realism. To address this gap, we propose Hokoff, a comprehens…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.10531  [pdf, other]

    cs.RO

    Leveraging Temporal Contexts to Enhance Vehicle-Infrastructure Cooperative Perception

    Authors: Jiaru Zhong, Haibao Yu, Tianyi Zhu, Jiahui Xu, Wenxian Yang, Zaiqing Nie, Chao Sun

    Abstract: Infrastructure sensors installed at elevated positions offer a broader perception range and encounter fewer occlusions. Integrating both infrastructure and ego-vehicle data through V2X communication, known as vehicle-infrastructure cooperation, has shown considerable advantages in enhancing perception capabilities and addressing corner cases encountered in single-vehicle autonomous driving. Howeve…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE ITSC 2024

  3. arXiv:2408.10072  [pdf, other]

    cs.CV cs.AI

    FFAA: Multimodal Large Language Model based Explainable Open-World Face Forgery Analysis Assistant

    Authors: Zhengchao Huang, Bin Xia, Zicheng Lin, Zhun Mou, Wenming Yang

    Abstract: The rapid advancement of deepfake technologies has sparked widespread public concern, particularly as face forgery poses a serious threat to public information security. However, the unknown and diverse forgery techniques, varied facial features and complex environmental factors pose significant challenges for face forgery analysis. Existing datasets lack descriptions of these aspects, making it d…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 17 pages, 18 figures; project page: https://ffaa-vl.github.io

  4. arXiv:2408.09612  [pdf, other]

    cs.RO

    ContactSDF: Signed Distance Functions as Multi-Contact Models for Dexterous Manipulation

    Authors: Wen Yang, Wanxin Jin

    Abstract: In this paper, we propose ContactSDF, a method that uses signed distance functions (SDFs) to approximate multi-contact models, including both collision detection and time-stepping routines. ContactSDF first establishes an SDF using the supporting plane representation of an object for collision detection, and then uses the generated contact dual cones to build a second SDF for time stepping predicti…

    Submitted 18 August, 2024; originally announced August 2024.

  5. arXiv:2408.08765  [pdf, other]

    cs.NI

    Rethinking Generative Semantic Communication for Multi-User Systems with Multi-Modal LLM

    Authors: Wanting Yang, Zehui Xiong, Shiwen Mao, Tony Q. S. Quek, Ping Zhang, Merouane Debbah, Rahim Tafazolli

    Abstract: The surge in connected devices in 6G with typical massive access scenarios, such as smart agriculture, and smart cities, poses significant challenges to unsustainable traditional communication with limited radio resources and already high system complexity. Fortunately, the booming artificial intelligence technology and the growing computational power of devices offer a promising 6G enabler: seman…

    Submitted 16 August, 2024; originally announced August 2024.

  6. arXiv:2408.08143  [pdf, other]

    cs.CR cs.CV

    Unlearnable Examples Detection via Iterative Filtering

    Authors: Yi Yu, Qichen Zheng, Siyuan Yang, Wenhan Yang, Jun Liu, Shijian Lu, Yap-Peng Tan, Kwok-Yan Lam, Alex Kot

    Abstract: Deep neural networks are proven to be vulnerable to data poisoning attacks. Recently, a specific type of data poisoning attack known as availability attacks has led to the failure of data utilization for model learning by adding imperceptible perturbations to images. Consequently, it is quite beneficial and challenging to detect poisoned samples, also known as Unlearnable Examples (UEs), from a mi…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Accepted by ICANN 2024

  7. arXiv:2408.07107  [pdf, other]

    cs.LG

    Maximizing V-information for Pre-training Superior Foundation Models

    Authors: Wenxuan Yang, Weimin Tan, Hanyu Zhang, Bo Yan

    Abstract: Pre-training foundation models on large-scale datasets demonstrates exceptional performance. However, recent research questions this traditional notion, exploring whether an increase in pre-training data always leads to enhanced model performance. To address this issue, data-effective learning approaches have been introduced. However, current methods in this area lack a clear standard for sample s…

    Submitted 16 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  8. arXiv:2408.07084  [pdf]

    cs.LG cs.AI

    Dynamic Hypergraph-Enhanced Prediction of Sequential Medical Visits

    Authors: Wangying Yang, Zitao Zheng, Shi Bo, Zhizhong Wu, Bo Zhang, Yuanfang Yang

    Abstract: This study introduces a pioneering Dynamic Hypergraph Networks (DHCE) model designed to predict future medical diagnoses from electronic health records with enhanced accuracy. The DHCE model innovates by identifying and differentiating acute and chronic diseases within a patient's visit history, constructing dynamic hypergraphs that capture the complex, high-order interactions between diseases. It…

    Submitted 19 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  9. Diffusion Model-based Contrastive Learning for Human Activity Recognition

    Authors: Chunjing Xiao, Yanhui Han, Wei Yang, Yane Hou, Fangzhan Shi, Kevin Chetty

    Abstract: WiFi Channel State Information (CSI)-based activity recognition has sparked numerous studies due to its widespread availability and privacy protection. However, when applied in practical applications, general CSI-based recognition models may face challenges related to the limited generalization capability, since individuals with different behavior habits will cause various fluctuations in CSI data…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted by IEEE Internet of Things Journal

  10. arXiv:2408.05006  [pdf, other]

    cs.SE cs.AI

    Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement

    Authors: Weiqing Yang, Hanbin Wang, Zhenghao Liu, Xinze Li, Yukun Yan, Shuo Wang, Yu Gu, Minghe Yu, Zhiyuan Liu, Ge Yu

    Abstract: Debugging is a vital aspect of software development, yet the debugging capabilities of Large Language Models (LLMs) remain largely unexplored. This paper first introduces DEBUGEVAL, a comprehensive benchmark designed to evaluate the debugging capabilities of LLMs. DEBUGEVAL collects data from existing high-quality datasets and designs four different tasks to evaluate the debugging effectiveness, i…

    Submitted 9 August, 2024; originally announced August 2024.

  11. arXiv:2408.04927  [pdf, other]

    cs.NI eess.SP

    Large Models for Aerial Edges: An Edge-Cloud Model Evolution and Communication Paradigm

    Authors: Shuhang Zhang, Qingyu Liu, Ke Chen, Boya Di, Hongliang Zhang, Wenhan Yang, Dusit Niyato, Zhu Han, H. Vincent Poor

    Abstract: The future sixth-generation (6G) of wireless networks is expected to surpass its predecessors by offering ubiquitous coverage through integrated air-ground facility deployments in both communication and computing domains. In this network, aerial facilities, such as unmanned aerial vehicles (UAVs), conduct artificial intelligence (AI) computations based on multi-modal data to support diverse applic…

    Submitted 9 August, 2024; originally announced August 2024.

  12. arXiv:2408.02922  [pdf, other]

    cs.CV

    Pose Magic: Efficient and Temporally Consistent Human Pose Estimation with a Hybrid Mamba-GCN Network

    Authors: Xinyi Zhang, Qiqi Bao, Qinpeng Cui, Wenming Yang, Qingmin Liao

    Abstract: Current state-of-the-art (SOTA) methods in 3D Human Pose Estimation (HPE) are primarily based on Transformers. However, existing Transformer-based 3D HPE backbones often encounter a trade-off between accuracy and computational efficiency. To resolve the above dilemma, in this work, we leverage recent advances in state space models and utilize Mamba for high-quality and efficient long-range modelin…

    Submitted 7 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  13. arXiv:2408.01651  [pdf, other]

    cs.MM cs.AI cs.HC

    Music2P: A Multi-Modal AI-Driven Tool for Simplifying Album Cover Design

    Authors: Joong Ho Choi, Geonyeong Choi, Ji-Eun Han, Wonjin Yang, Zhi-Qi Cheng

    Abstract: In today's music industry, album cover design is as crucial as the music itself, reflecting the artist's vision and brand. However, many AI-driven album cover services require subscriptions or technical expertise, limiting accessibility. To address these challenges, we developed Music2P, an open-source, multi-modal AI-driven tool that streamlines album cover creation, making it efficient, accessib…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted at CIKM 2024 Demo Paper track. Project available at https://github.com/JC-78/Music2P

    ACM Class: H.5.1; H.5.5

  14. arXiv:2408.01276  [pdf, other]

    cs.CV

    Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement

    Authors: Wenbin Zou, Hongxia Gao, Weipeng Yang, Tongtong Liu

    Abstract: Ultra-high-definition (UHD) technology has attracted widespread attention due to its exceptional visual quality, but it also poses new challenges for low-light image enhancement (LLIE) techniques. UHD images inherently possess high computational complexity, leading existing UHD LLIE methods to employ high-magnification downsampling to reduce computational costs, which in turn results in informatio…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 10 pages, 8 figures, ACMMM2024 accepted

  15. arXiv:2408.00131  [pdf, other]

    stat.ML cs.AI cs.LG q-fin.RM

    Distributionally Robust Optimization as a Scalable Framework to Characterize Extreme Value Distributions

    Authors: Patrick Kuiper, Ali Hasan, Wenhao Yang, Yuting Ng, Hoda Bidkhori, Jose Blanchet, Vahid Tarokh

    Abstract: The goal of this paper is to develop distributionally robust optimization (DRO) estimators, specifically for multidimensional Extreme Value Theory (EVT) statistics. EVT supports using semi-parametric models called max-stable distributions built from spatial Poisson point processes. While powerful, these models are only asymptotically valid for large samples. However, since extreme data is by defin…

    Submitted 31 July, 2024; originally announced August 2024.

  16. arXiv:2407.21092  [pdf, ps, other]

    cs.CL cond-mat.stat-mech hep-th math.DG

    Entropy, Thermodynamics and the Geometrization of the Language Model

    Authors: Wenzhe Yang

    Abstract: In this paper, we discuss how pure mathematics and theoretical physics can be applied to the study of language models. Using set theory and analysis, we formulate mathematically rigorous definitions of language models, and introduce the concept of the moduli space of distributions for a language model. We formulate a generalized distributional hypothesis using functional analysis and topology. We…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 18 pages

    MSC Class: 68T01 ACM Class: I.2.7

  17. arXiv:2407.20730  [pdf, other]

    cs.CV

    Autogenic Language Embedding for Coherent Point Tracking

    Authors: Zikai Song, Ying Tang, Run Luo, Lintao Ma, Junqing Yu, Yi-Ping Phoebe Chen, Wei Yang

    Abstract: Point tracking is a challenging task in computer vision, aiming to establish point-wise correspondence across long video sequences. Recent advancements have primarily focused on temporal modeling techniques to improve local feature similarity, often overlooking the valuable semantic consistency inherent in tracked points. In this paper, we introduce a novel approach leveraging language embeddings…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: accepted by ACM MM 2024

  18. arXiv:2407.20585  [pdf, other]

    cs.NI eess.SP

    A UAV-Enabled Time-Sensitive Data Collection Scheme for Grassland Monitoring Edge Networks

    Authors: Dongbin Jiao, Zihao Wang, Wen Fan, Weibo Yang, Peng Yang, Zhanhuan Shang, Shi Yan

    Abstract: Grassland monitoring is essential for the sustainable development of grassland resources. Traditional Internet of Things (IoT) devices generate critical ecological data, making data loss unacceptable, but the harsh environment complicates data collection. Unmanned Aerial Vehicle (UAV) and mobile edge computing (MEC) offer efficient data collection solutions, enhancing performance on resource-limit…

    Submitted 10 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  19. arXiv:2407.19639  [pdf, other]

    cs.CR

    Segmented Private Data Aggregation in the Multi-message Shuffle Model

    Authors: Shaowei Wang, Ruilin Yang, Sufen Zeng, Kaiqi Yu, Rundong Mei, Shaozheng Huang, Wei Yang

    Abstract: The shuffle model of differential privacy (DP) offers compelling privacy-utility trade-offs in decentralized settings (e.g., internet of things, mobile edge networks). Particularly, the multi-message shuffle model, where each user may contribute multiple messages, has shown that accuracy can approach that of the central model of DP. However, existing studies typically assume a uniform privacy prot…

    Submitted 28 July, 2024; originally announced July 2024.

  20. arXiv:2407.19580  [pdf, other]

    cs.LG cs.AI cs.CL

    Memory-efficient Training of LLMs with Larger Mini-batches

    Authors: Dang Nguyen, Wenhan Yang, Rathul Anand, Yu Yang, Baharan Mirzasoleiman

    Abstract: Training with larger mini-batches improves the performance and convergence rate of training machine learning models. However, training with large mini-batches becomes prohibitive for Large Language Models (LLMs) with billions of parameters, due to the large GPU memory requirement. To address this problem, we propose finding small mini-batches that simulate the dynamics of training with larger mini…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 15 pages, 2 figures, 4 tables

  21. arXiv:2407.18046  [pdf, other]

    cs.CV cs.AI

    GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

    Authors: Jintong Hu, Bin Xia, Bin Chen, Wenming Yang, Lei Zhang

    Abstract: Implicit neural representations (INRs) have significantly advanced the field of arbitrary-scale super-resolution (ASSR) of images. Most existing INR-based ASSR networks first extract features from the given low-resolution image using an encoder, and then render the super-resolved result via a multi-layer perceptron decoder. Although these approaches have shown promising results, their performance…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 13 pages, 12 figures

  22. arXiv:2407.17738  [pdf, other]

    cs.CV

    Enhancing Fine-grained Object Detection in Aerial Images via Orthogonal Mapping

    Authors: Haoran Zhu, Yifan Zhou, Chang Xu, Ruixiang Zhang, Wen Yang

    Abstract: Fine-Grained Object Detection (FGOD) is a critical task in high-resolution aerial image analysis. This letter introduces Orthogonal Mapping (OM), a simple yet effective method aimed at addressing the challenge of semantic confusion inherent in FGOD. OM introduces orthogonal constraints in the feature space by decoupling features from the last layer of the classification branch with a class-wise or…

    Submitted 24 July, 2024; originally announced July 2024.

  23. arXiv:2407.17723  [pdf, other]

    cs.LG

    Your Graph Recommender is Provably a Single-view Graph Contrastive Learning

    Authors: Wenjie Yang, Shengzhong Zhang, Jiaxing Guo, Zengfeng Huang

    Abstract: Graph recommender (GR) is a type of graph neural network (GNNs) encoder that is customized for extracting information from the user-item interaction graph. Due to its strong performance on the recommendation task, GR has gained significant attention recently. Graph contrastive learning (GCL) is also a popular research direction that aims to learn, often unsupervised, GNNs with certain contrastive…

    Submitted 24 July, 2024; originally announced July 2024.

  24. arXiv:2407.17229  [pdf, other]

    cs.CV

    LPGen: Enhancing High-Fidelity Landscape Painting Generation through Diffusion Model

    Authors: Wanggong Yang, Xiaona Wang, Yingrui Qiu, Yifei Zhao

    Abstract: Generating landscape paintings expands the possibilities of artistic creativity and imagination. Traditional landscape painting methods involve using ink or colored ink on rice paper, which requires substantial time and effort. These methods are susceptible to errors and inconsistencies and lack precise control over lines and colors. This paper presents LPGen, a high-fidelity, controllable model f…

    Submitted 12 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  25. arXiv:2407.14742  [pdf, other]

    cs.HC

    Dynamic Color Assignment for Hierarchical Data

    Authors: Jiashu Chen, Weikai Yang, Zelin Jia, Lanxi Xiao, Shixia Liu

    Abstract: Assigning discriminable and harmonic colors to samples according to their class labels and spatial distribution can generate attractive visualizations and facilitate data exploration. However, as the number of classes increases, it is challenging to generate a high-quality color assignment result that accommodates all classes simultaneously. A practical solution is to organize classes into a hiera…

    Submitted 9 August, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  26. arXiv:2407.13237  [pdf, other]

    cs.AI

    LLM-Empowered State Representation for Reinforcement Learning

    Authors: Boyuan Wang, Yun Qu, Yuhang Jiang, Jianzhun Shao, Chang Liu, Wenming Yang, Xiangyang Ji

    Abstract: Conventional state representations in reinforcement learning often omit critical task-related details, presenting a significant challenge for value networks in establishing accurate mappings from states to task rewards. Traditional methods typically depend on extensive sample learning to enrich state representations with task-specific information, which leads to low sample efficiency and high time…

    Submitted 18 July, 2024; originally announced July 2024.

  27. arXiv:2407.12470  [pdf, other]

    cs.CL

    Continual Learning for Temporal-Sensitive Question Answering

    Authors: Wanqi Yang, Yunqiu Xu, Yanda Li, Kunze Wang, Binbin Huang, Ling Chen

    Abstract: In this study, we explore an emerging research area of Continual Learning for Temporal Sensitive Question Answering (CLTSQA). Previous research has primarily focused on Temporal Sensitive Question Answering (TSQA), often overlooking the unpredictable nature of future events. In real-world applications, it's crucial for models to continually acquire knowledge over time, rather than relying on a sta…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted by IJCNN 2024

  28. arXiv:2407.10153  [pdf, other]

    cs.CL cs.AI

    Look Within, Why LLMs Hallucinate: A Causal Perspective

    Authors: He Li, Haoang Chi, Mingyu Liu, Wenjing Yang

    Abstract: The emergence of large language models (LLMs) is a milestone in generative artificial intelligence, achieving significant success in text comprehension and generation tasks. Despite the tremendous success of LLMs in many downstream tasks, they suffer from severe hallucination problems, posing significant challenges to the practical applications of LLMs. Most of the works about LLMs' hallucinations…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 15 pages, 7 figures

  29. arXiv:2407.09562  [pdf, other]

    cs.CV eess.IV

    Edge AI-Enabled Chicken Health Detection Based on Enhanced FCOS-Lite and Knowledge Distillation

    Authors: Qiang Tong, Jinrui Wang, Wenshuang Yang, Songtao Wu, Wenqi Zhang, Chen Sun, Kuanhong Xu

    Abstract: The utilization of AIoT technology has become a crucial trend in modern poultry management, offering the potential to optimize farming operations and reduce human workloads. This paper presents a real-time and compact edge-AI enabled detector designed to identify chickens and their health statuses using frames captured by a lightweight and intelligent camera equipped with an edge-AI enabled CMOS…

    Submitted 3 July, 2024; originally announced July 2024.

  30. arXiv:2407.08865  [pdf, other]

    cs.CV

    Single-Image Shadow Removal Using Deep Learning: A Comprehensive Survey

    Authors: Lanqing Guo, Chong Wang, Yufei Wang, Siyu Huang, Wenhan Yang, Alex C. Kot, Bihan Wen

    Abstract: Shadow removal aims at restoring the image content within shadow regions, pursuing a uniform distribution of illumination that is consistent between shadow and non-shadow regions. Compared to other image restoration tasks, there are two unique challenges in shadow removal: 1) The patterns of shadows are arbitrary, varied, and often have highly complex trace structures, making "trace-less" ima…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: url: https://github.com/GuoLanqing/Awesome-Shadow-Removal

  31. arXiv:2407.08176  [pdf, other]

    cs.SE cs.AI cs.LG

    Foundation Model Engineering: Engineering Foundation Models Just as Engineering Software

    Authors: Dezhi Ran, Mengzhou Wu, Wei Yang, Tao Xie

    Abstract: By treating data and models as the source code, Foundation Models (FMs) become a new type of software. Mirroring the concept of software crisis, the increasing complexity of FMs is making an FM crisis a tangible concern in the coming decade, appealing for new theories and methodologies from the field of software engineering. In this paper, we outline our vision of introducing Foundation Model (FM) engin…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by 2030 Software Engineering Workshop, co-located with FSE24; Invited to ACM TOSEM 2030 Roadmap for Software Engineering

  32. arXiv:2407.07660  [pdf, ps, other]

    cs.CV cs.AI

    Boosting Medical Image Synthesis via Registration-guided Consistency and Disentanglement Learning

    Authors: Chuanpu Li, Zeli Chen, Yiwen Zhang, Liming Zhong, Wei Yang

    Abstract: Medical image synthesis remains challenging due to misalignment noise during training. Existing methods have attempted to address this challenge by incorporating a registration-guided module. However, these methods tend to overlook the task-specific constraints on the synthetic and registration modules, which may cause the synthetic module to still generate spatially aligned images with misaligned…

    Submitted 10 July, 2024; originally announced July 2024.

  33. arXiv:2407.06654  [pdf, other]

    cs.CL cs.AI

    SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training

    Authors: Nan He, Weichen Xiong, Hanwen Liu, Yi Liao, Lei Ding, Kai Zhang, Guohua Tang, Xiao Han, Wei Yang

    Abstract: The effectiveness of large language models (LLMs) is often hindered by duplicated data in their extensive pre-training datasets. Current approaches primarily focus on detecting and removing duplicates, which risks the loss of valuable information and neglects the varying degrees of duplication. To address this, we propose a soft deduplication method that maintains dataset integrity while selective…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 12 pages, 7 figures

  34. arXiv:2407.05820  [pdf, other]

    cs.RO

    Co-RaL: Complementary Radar-Leg Odometry with 4-DoF Optimization and Rolling Contact

    Authors: Sangwoo Jung, Wooseong Yang, Ayoung Kim

    Abstract: Robust and accurate localization in challenging environments is becoming crucial for SLAM. In this paper, we propose a unique sensor configuration for precise and robust odometry by integrating chip radar and a legged robot. Specifically, we introduce a tightly coupled radar-leg odometry algorithm for complementary drift correction. Adopting the 4-DoF optimization and decoupled RANSAC to mmWave ch…

    Submitted 10 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: IROS 2024 accepted, 8 pages, 7 figures, 4 Tables

  35. arXiv:2407.04066  [pdf, ps, other]

    cs.CV

    EMPL: A novel Efficient Meta Prompt Learning Framework for Few-shot Unsupervised Domain Adaptation

    Authors: Wanqi Yang, Haoran Wang, Lei Wang, Ge Song, Yang Gao

    Abstract: Few-shot unsupervised domain adaptation (FS-UDA) utilizes few-shot labeled source domain data to realize effective classification in unlabeled target domain. However, current FS-UDA methods still suffer from two issues: 1) the data from different domains cannot be effectively aligned by few-shot labeled data due to the large domain gaps, 2) it is unstable and time-consuming to generalize to n…

    Submitted 4 July, 2024; originally announced July 2024.

  36. arXiv:2407.02547  [pdf, other]

    cs.AI cs.LG

    Domain Generalizable Knowledge Tracing via Concept Aggregation and Relation-Based Attention

    Authors: Yuquan Xie, Wanqi Yang, Jinyu Wei, Ming Yang, Yang Gao

    Abstract: Knowledge Tracing (KT) is a critical task in online education systems, aiming to monitor students' knowledge states throughout a learning period. Common KT approaches involve predicting the probability of a student correctly answering the next question based on their exercise history. However, these methods often suffer from performance degradation when faced with the scarcity of student interacti…

    Submitted 2 July, 2024; originally announced July 2024.

  37. arXiv:2407.02482  [pdf, other]

    cs.CV

    Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models

    Authors: Fei Shen, Hu Ye, Sibo Liu, Jun Zhang, Cong Wang, Xiao Han, Wei Yang

    Abstract: Recent research showcases the considerable potential of conditional diffusion models for generating consistent stories. However, current methods, which predominantly generate stories in an autoregressive and excessively caption-dependent manner, often underrate the contextual consistency and relevance of frames during sequential generation. To address this, we propose a novel Rich-contextual Condi…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  38. arXiv:2407.02123  [pdf, other]

    cs.CV

    Hybrid Feature Collaborative Reconstruction Network for Few-Shot Fine-Grained Image Classification

    Authors: Shulei Qiu, Wanqi Yang, Ming Yang

    Abstract: Our research focuses on few-shot fine-grained image classification, which faces two major challenges: appearance similarity of fine-grained objects and limited number of samples. To preserve the appearance details of images, traditional feature reconstruction networks usually enhance the representation ability of key features by spatial feature reconstruction and minimizing the reconstruction erro…

    Submitted 2 July, 2024; originally announced July 2024.

  39. arXiv:2407.01247  [pdf, ps, other]

    cs.CV

    Multi-level Reliable Guidance for Unpaired Multi-view Clustering

    Authors: Like Xin, Wanqi Yang, Lei Wang, Ming Yang

    Abstract: In this paper, we address the challenging problem of unpaired multi-view clustering (UMC), aiming to perform effective joint clustering using unpaired observed samples across multiple views. Commonly, traditional incomplete multi-view clustering (IMC) methods often depend on paired samples to capture complementary information between views. However, the strategy becomes impractical in UMC due to t…

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  40. arXiv:2407.01017  [pdf, other]

    cs.CV

    Coding for Intelligence from the Perspective of Category

    Authors: Wenhan Yang, Zixuan Hu, Lilang Lin, Jiaying Liu, Ling-Yu Duan

    Abstract: Coding, which targets compressing and reconstructing data, and intelligence, often regarded at an abstract computational level as being centered around model learning and prediction, interweave recently to give birth to a series of significant progress. The recent trends demonstrate the potential homogeneity of these two fields, especially when deep-learning models aid these two categories for bet…

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  41. arXiv:2406.19776  [pdf, other]

    cs.MM cs.IR

    MDF: A Dynamic Fusion Model for Multi-modal Fake News Detection

    Authors: Hongzhen Lv, Wenzhong Yang, Fuyuan Wei, Jiaren Peng, Haokun Geng

    Abstract: Fake news detection has received increasing attention from researchers in recent years, especially multi-modal fake news detection containing both text and images. However, many previous works have fed two modal features, text and image, into a binary classifier after a simple concatenation or attention mechanism, in which the features contain a large amount of noise inherent in the data, which in…

    Submitted 28 June, 2024; originally announced June 2024.

  42. arXiv:2406.19101  [pdf, other]

    cs.CV

    DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming

    Authors: Jiaxin Zhang, Wentao Yang, Songxuan Lai, Zecheng Xie, Lianwen Jin

    Abstract: Current multimodal large language models (MLLMs) face significant challenges in visual document understanding (VDU) tasks due to the high resolution, dense text, and complex layouts typical of document images. These characteristics demand a high level of detail perception ability from MLLMs. While increasing input resolution improves detail perception, it also leads to longer sequences of visual t…

    Submitted 27 June, 2024; originally announced June 2024.

  43. arXiv:2406.17442  [pdf, other]

    cs.CV

    Mamba24/8D: Enhancing Global Interaction in Point Clouds via State Space Model

    Authors: Zhuoyuan Li, Yubo Ai, Jiahao Lu, ChuXin Wang, Jiacheng Deng, Hanzhi Chang, Yanzhe Liang, Wenfei Yang, Shifeng Zhang, Tianzhu Zhang

    Abstract: Transformers have demonstrated impressive results for 3D point cloud semantic segmentation. However, the quadratic complexity of transformer makes computation cost high, limiting the number of points that can be processed simultaneously and impeding the modeling of long-range dependencies. Drawing inspiration from the great potential of recent state space models (SSM) for long sequence modeling, w…

    Submitted 25 June, 2024; originally announced June 2024.

  44. arXiv:2406.16707  [pdf, other]

    cs.LG cs.AI

    Probabilistic Subgoal Representations for Hierarchical Reinforcement learning

    Authors: Vivienne Huiling Wang, Tinghuai Wang, Wenyan Yang, Joni-Kristian Kämäräinen, Joni Pajarinen

    Abstract: In goal-conditioned hierarchical reinforcement learning (HRL), a high-level policy specifies a subgoal for the low-level policy to reach. Effective HRL hinges on a suitable subgoal representation function, abstracting state space into latent subgoal space and inducing varied low-level behaviors. Existing methods adopt a subgoal representation that provides a deterministic mapping from state space…

    Submitted 24 June, 2024; originally announced June 2024.

  45. arXiv:2406.16434  [pdf, other]

    cs.CV

    Multi-threshold Deep Metric Learning for Facial Expression Recognition

    Authors: Wenwu Yang, Jinyi Yu, Tuo Chen, Zhenguang Liu, Xun Wang, Jianbing Shen

    Abstract: Effective expression feature representations generated by triplet-based deep metric learning are highly advantageous for facial expression recognition (FER). The performance of triplet-based deep metric learning is contingent upon identifying the best threshold for triplet loss. Threshold validation, however, is challenging, as the ideal threshold changes among datasets and even across…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted by Pattern Recognition

  46. arXiv:2406.15846  [pdf, other]

    cs.CL eess.AS

    Revisiting Interpolation Augmentation for Speech-to-Text Generation

    Authors: Chen Xu, Jie Wang, Xiaoqian Liu, Qianqian Dong, Chunliang Zhang, Tong Xiao, Jingbo Zhu, Dapeng Man, Wu Yang

    Abstract: Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Findings

  47. arXiv:2406.15459  [pdf, other]

    cs.GT cs.CE cs.LG

    Large-Scale Contextual Market Equilibrium Computation through Deep Learning

    Authors: Yunxuan Ma, Yide Bian, Hao Xu, Weitao Yang, Jingshu Zhao, Zhijian Duan, Feng Wang, Xiaotie Deng

    Abstract: Market equilibrium is one of the most fundamental solution concepts in economics and social optimization analysis. Existing works on market equilibrium computation primarily focus on settings with a relatively small number of buyers. Motivated by this, our paper investigates the computation of market equilibrium in scenarios with a large-scale buyer population, where buyers and goods are represent…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: 22 pages

  48. arXiv:2406.14422  [pdf, other]

    cs.CV cs.AI

    FutureNet-LOF: Joint Trajectory Prediction and Lane Occupancy Field Prediction with Future Context Encoding

    Authors: Mingkun Wang, Xiaoguang Ren, Ruochun Jin, Minglong Li, Xiaochuan Zhang, Changqian Yu, Mingxu Wang, Wenjing Yang

    Abstract: Most prior motion prediction endeavors in autonomous driving have inadequately encoded future scenarios, leading to predictions that may fail to accurately capture the diverse movements of agents (e.g., vehicles or pedestrians). To address this, we propose FutureNet, which explicitly integrates initially predicted trajectories into the future scenario and further encodes these future contexts to e…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 10 pages

  49. arXiv:2406.14080  [pdf, other]

    cs.CV cs.GR

    CMTNet: Convolutional Meets Transformer Network for Hyperspectral Images Classification

    Authors: Faxu Guo, Quan Feng, Sen Yang, Wanxia Yang

    Abstract: Hyperspectral remote sensing (HIS) enables the detailed capture of spectral information from the Earth's surface, facilitating precise classification and identification of surface crops due to its superior spectral diagnostic capabilities. However, current convolutional neural networks (CNNs) focus on local features in hyperspectral data, leading to suboptimal performance when classifying intricat…

    Submitted 20 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 15 pages, 11 figures

    ACM Class: I.4.6

  50. arXiv:2406.14052  [pdf, other]

    eess.IV cs.CV

    Perspective+ Unet: Enhancing Segmentation with Bi-Path Fusion and Efficient Non-Local Attention for Superior Receptive Fields

    Authors: Jintong Hu, Siyan Chen, Zhiyi Pan, Sen Zeng, Wenming Yang

    Abstract: Precise segmentation of medical images is fundamental for extracting critical clinical information, which plays a pivotal role in enhancing the accuracy of diagnoses, formulating effective treatment plans, and improving patient outcomes. Although Convolutional Neural Networks (CNNs) and non-local attention methods have achieved notable success in medical image segmentation, they either struggle to…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 13 pages, 5 figures