Showing 1–50 of 3,716 results for author: Wang, L

  1. arXiv:2408.11029  [pdf, other]

    cs.CL

    Scaling Law with Learning Rate Annealing

    Authors: Howe Tissue, Venus Wang, Lu Wang

    Abstract: We find that the cross-entropy loss curves of neural language models empirically adhere to a scaling law with learning rate (LR) annealing over training steps ($s$): $$L(s) = L_0 + A\cdot S_1^{-\alpha} - C\cdot S_2$$ where $S_1$ is the forward area and $S_2$ is the learning rate annealing area. This formulation takes into account two factors: (1) The forward scaling defined as typical scaling law, and (2) the…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 25 pages, 23 figures

  2. arXiv:2408.10145  [pdf, other]

    cs.CV

    Multi-Scale Representation Learning for Image Restoration with State-Space Model

    Authors: Yuhong He, Long Peng, Qiaosi Yi, Chen Wu, Lu Wang

    Abstract: Image restoration endeavors to reconstruct a high-quality, detail-rich image from a degraded counterpart, which is a pivotal process in photography and various computer vision systems. In real-world scenarios, different types of degradation can cause the loss of image details at various scales and degrade image contrast. Existing methods predominantly rely on CNN and Transformer to capture multi-s…

    Submitted 19 August, 2024; originally announced August 2024.

  3. arXiv:2408.10120  [pdf, other]

    cs.AI

    Geometry Informed Tokenization of Molecules for Language Model Generation

    Authors: Xiner Li, Limei Wang, Youzhi Luo, Carl Edwards, Shurui Gui, Yuchao Lin, Heng Ji, Shuiwang Ji

    Abstract: We consider molecule generation in 3D space using language models (LMs), which requires discrete tokenization of 3D molecular geometries. Although tokenization of molecular graphs exists, that for 3D geometries is largely unexplored. Here, we attempt to bridge this gap by proposing the Geo2Seq, which converts molecular geometries into $SE(3)$-invariant 1D discrete sequences. Geo2Seq consists of ca…

    Submitted 19 August, 2024; originally announced August 2024.

  4. arXiv:2408.10069  [pdf, other]

    cs.CV

    LNQ 2023 challenge: Benchmark of weakly-supervised techniques for mediastinal lymph node quantification

    Authors: Reuben Dorent, Roya Khajavi, Tagwa Idris, Erik Ziegler, Bhanusupriya Somarouthu, Heather Jacene, Ann LaCasce, Jonathan Deissler, Jan Ehrhardt, Sofija Engelson, Stefan M. Fischer, Yun Gu, Heinz Handels, Satoshi Kasai, Satoshi Kondo, Klaus Maier-Hein, Julia A. Schnabel, Guotai Wang, Litingyu Wang, Tassilo Wald, Guang-Zhong Yang, Hanxiao Zhang, Minghui Zhang, Steve Pieper, Gordon Harris, et al. (2 additional authors not shown)

    Abstract: Accurate assessment of lymph node size in 3D CT scans is crucial for cancer staging, therapeutic management, and monitoring treatment response. Existing state-of-the-art segmentation frameworks in medical imaging often rely on fully annotated datasets. However, for lymph node segmentation, these datasets are typically small due to the extensive time and expertise required to annotate the numerous…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Submitted to MELBA

  5. arXiv:2408.09787  [pdf, other]

    cs.CL cs.CV cs.MM

    Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation

    Authors: Yunxin Li, Haoyuan Shi, Baotian Hu, Longyue Wang, Jiashun Zhu, Jinyi Xu, Zhen Zhao, Min Zhang

    Abstract: Traditional animation generation methods depend on training generative models with human-labelled data, entailing a sophisticated multi-stage pipeline that demands substantial human effort and incurs high training costs. Due to limited prompting plans, these methods typically produce brief, information-poor, and context-incoherent animations. To overcome these limitations and automate the animatio…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted by SIGGRAPH Asia 2024, Project and Codes: https://github.com/HITsz-TMG/Anim-Director

  6. arXiv:2408.09496  [pdf, other]

    cs.CV

    StyleBrush: Style Extraction and Transfer from a Single Image

    Authors: Wancheng Feng, Wanquan Feng, Dawei Huang, Jiaming Pei, Guangliang Cheng, Lukun Wang

    Abstract: Stylization for visual content aims to add specific style patterns at the pixel level while preserving the original structural features. Compared with using predefined styles, stylization guided by reference style images is more challenging, where the main difficulty is to effectively separate style from structural elements. In this paper, we propose StyleBrush, a method that accurately captures s…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 9 pages, 6 figures, Under Review

  7. Weakly Supervised Lymph Nodes Segmentation Based on Partial Instance Annotations with Pre-trained Dual-branch Network and Pseudo Label Learning

    Authors: Litingyu Wang, Yijie Qu, Xiangde Luo, Wenjun Liao, Shichuan Zhang, Guotai Wang

    Abstract: Assessing the presence of potentially malignant lymph nodes aids in estimating cancer progression, and identifying surrounding benign lymph nodes can assist in determining potential metastatic pathways for cancer. For quantitative analysis, automatic segmentation of lymph nodes is crucial. However, due to the labor-intensive and time-consuming manual annotation process required for a large number…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2024:013

    Journal ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)

  8. arXiv:2408.09336  [pdf, other]

    cs.CV

    Elite360M: Efficient 360 Multi-task Learning via Bi-projection Fusion and Cross-task Collaboration

    Authors: Hao Ai, Lin Wang

    Abstract: 360 cameras capture the entire surrounding environment with a large FoV, exhibiting comprehensive visual information to directly infer the 3D structures, e.g., depth and surface normal, and semantic information simultaneously. Existing works predominantly specialize in a single task, leaving multi-task learning of 3D geometry and semantics largely unexplored. Achieving such an objective is, howeve…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 15 pages main paper

  9. arXiv:2408.09198  [pdf, other]

    cs.RO

    Learning Based Toolpath Planner on Diverse Graphs for 3D Printing

    Authors: Yuming Huang, Yuhu Guo, Renbo Su, Xingjian Han, Junhao Ding, Tianyu Zhang, Tao Liu, Weiming Wang, Guoxin Fang, Xu Song, Emily Whiting, Charlie C. L. Wang

    Abstract: This paper presents a learning based planner for computing optimized 3D printing toolpaths on prescribed graphs, the challenges of which include the varying graph structures on different models and the large scale of nodes & edges on a graph. We adopt an on-the-fly strategy to tackle these challenges, formulating the planner as a Deep Q-Network (DQN) based optimizer to decide the next `best' node…

    Submitted 17 August, 2024; originally announced August 2024.

  10. arXiv:2408.09181  [pdf, other]

    cs.CV cs.CR cs.LG

    PADetBench: Towards Benchmarking Physical Attacks against Object Detection

    Authors: Jiawei Lian, Jianhong Pan, Lefan Wang, Yi Wang, Lap-Pui Chau, Shaohui Mei

    Abstract: Physical attacks against object detection have gained increasing attention due to their significant practical implications. However, conducting physical experiments is extremely time-consuming and labor-intensive. Moreover, physical dynamics and cross-domain transformation are challenging to strictly regulate in the real world, leading to unaligned evaluation and comparison, severely hindering the…

    Submitted 17 August, 2024; originally announced August 2024.

  11. arXiv:2408.09115  [pdf, other]

    cs.CV

    GoodSAM++: Bridging Domain and Capacity Gaps via Segment Anything Model for Panoramic Semantic Segmentation

    Authors: Weiming Zhang, Yexin Liu, Xu Zheng, Lin Wang

    Abstract: This paper presents GoodSAM++, a novel framework utilizing the powerful zero-shot instance segmentation capability of SAM (i.e., teacher) to learn a compact panoramic semantic segmentation model, i.e., student, without requiring any labeled data. GoodSAM++ addresses two critical challenges: 1) SAM's inability to provide semantic labels and inherent distortion problems of panoramic images; 2) the s…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 15 pages, under review. arXiv admin note: substantial text overlap with arXiv:2403.16370

  12. arXiv:2408.08295  [pdf, other]

    cs.CV cs.AI cs.LG

    SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training

    Authors: Gengwei Zhang, Liyuan Wang, Guoliang Kang, Ling Chen, Yunchao Wei

    Abstract: In recent years, continual learning with pre-training (CLPT) has received widespread interest, instead of its traditional focus of training from scratch. The use of strong pre-trained models (PTMs) can greatly facilitate knowledge transfer and alleviate catastrophic forgetting, but also suffers from progressive overfitting of pre-trained knowledge into specific downstream tasks. A majority of curr…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: This paper is an extension of our ICCV 23 paper (arXiv:2303.05118)

  13. arXiv:2408.07869  [pdf, other]

    cs.LG

    A Systematic Evaluation of Generated Time Series and Their Effects in Self-Supervised Pretraining

    Authors: Audrey Der, Chin-Chia Michael Yeh, Xin Dai, Huiyuan Chen, Yan Zheng, Yujie Fan, Zhongfang Zhuang, Vivian Lai, Junpeng Wang, Liang Wang, Wei Zhang, Eamonn Keogh

    Abstract: Self-supervised Pretrained Models (PTMs) have demonstrated remarkable performance in computer vision and natural language processing tasks. These successes have prompted researchers to design PTMs for time series data. In our experiments, most self-supervised time series PTMs were surpassed by simple supervised models. We hypothesize this undesired phenomenon may be caused by data scarcity. In res…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: To appear in CIKM 2024 as a short paper; the version here is the self-contained version that includes the non-mandatory supplementary material available on the paper's companion website

  14. arXiv:2408.07530  [pdf, other]

    cs.CV

    Towards Real-time Video Compressive Sensing on Mobile Devices

    Authors: Miao Cao, Lishun Wang, Huan Wang, Guoqing Wang, Xin Yuan

    Abstract: Video Snapshot Compressive Imaging (SCI) uses a low-speed 2D camera to capture high-speed scenes as snapshot compressed measurements, followed by a reconstruction algorithm to retrieve the high-speed video frames. The fast evolving mobile devices and existing high-performance video SCI reconstruction algorithms motivate us to develop mobile reconstruction methods for real-world applications. Yet,…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 9 pages, Accepted by ACM MM 2024

  15. arXiv:2408.07410  [pdf, other]

    cs.CL

    Aquila2 Technical Report

    Authors: Bo-Wen Zhang, Liangdong Wang, Jijie Li, Shuhao Gu, Xinya Wu, Zhengduo Zhang, Boyan Gao, Yulong Ao, Guang Liu

    Abstract: This paper introduces the Aquila2 series, which comprises a wide range of bilingual models with parameter sizes of 7, 34, and 70 billion. These models are trained based on an innovative framework named HeuriMentor (HM), which offers real-time insights into model convergence and enhances the training process and data management. The HM System, comprising the Adaptive Training Engine (ATE), Training…

    Submitted 14 August, 2024; originally announced August 2024.

  16. arXiv:2408.07114  [pdf, other]

    eess.IV cs.LG

    Investigation of unsupervised and supervised hyperspectral anomaly detection

    Authors: Mazharul Hossain, Aaron Robinson, Lan Wang, Chrysanthe Preza

    Abstract: Hyperspectral sensing is a valuable tool for detecting anomalies and distinguishing between materials in a scene. Hyperspectral anomaly detection (HS-AD) helps characterize the captured scenes and separates them into anomaly and background classes. It is vital in agriculture, environment, and military applications such as RSTA (reconnaissance, surveillance, and target acquisition) missions. We pre…

    Submitted 13 August, 2024; originally announced August 2024.

  17. arXiv:2408.07104  [pdf, other]

    cs.LG

    Model Based and Physics Informed Deep Learning Neural Network Structures

    Authors: Ali Mohammad-Djafari, Ning Chu, Li Wang, Caifang Cai, Liang Yu

    Abstract: Neural Networks (NNs) have been used in many areas with great success. When an NN's structure (Model) is given, during the training steps, the parameters of the model are determined using an appropriate criterion and an optimization algorithm (Training). Then, the trained model can be used for the prediction or inference step (Testing). As there are also many hyperparameters, related to the optimizat…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: key words: Deep Neural Network, Inverse problems; Bayesian inference; Model based DNN structure, MaxEnt2024 conference, Gent University, Gent, Belgium, July 1-5, 2024

  18. arXiv:2408.07009  [pdf, other]

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google: Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh, et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  19. arXiv:2408.06911  [pdf, other]

    eess.AS cs.AI

    Heterogeneous Space Fusion and Dual-Dimension Attention: A New Paradigm for Speech Enhancement

    Authors: Tao Zheng, Liejun Wang, Yinfeng Yu

    Abstract: Self-supervised learning has demonstrated impressive performance in speech tasks, yet there remains ample opportunity for advancement in the realm of speech enhancement research. In addressing speech tasks, confining the attention mechanism solely to the temporal dimension poses limitations in effectively focusing on critical speech features. Considering the aforementioned issues, our study introd…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted for publication by IEEE International Conference on Systems, Man, and Cybernetics 2024

  20. arXiv:2408.06906  [pdf, other]

    eess.AS cs.AI

    VNet: A GAN-based Multi-Tier Discriminator Network for Speech Synthesis Vocoders

    Authors: Yubing Cao, Yongming Li, Liejun Wang, Yinfeng Yu

    Abstract: Since the introduction of Generative Adversarial Networks (GANs) in speech synthesis, remarkable achievements have been attained. In a thorough exploration of vocoders, it has been discovered that audio waveforms can be generated at speeds exceeding real-time while maintaining high fidelity, achieved through the utilization of GAN-based models. Typically, the inputs to the vocoder consist of band-…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted for publication by IEEE International Conference on Systems, Man, and Cybernetics 2024

  21. arXiv:2408.06851  [pdf, other]

    eess.AS cs.AI

    BSS-CFFMA: Cross-Domain Feature Fusion and Multi-Attention Speech Enhancement Network based on Self-Supervised Embedding

    Authors: Alimjan Mattursun, Liejun Wang, Yinfeng Yu

    Abstract: Speech self-supervised learning (SSL) has achieved state-of-the-art (SOTA) performance in multiple downstream tasks. However, its application in speech enhancement (SE) tasks remains immature, offering opportunities for improvement. In this study, we introduce a novel cross-domain feature fusion and multi-attention speech enhancement network, termed BSS-CFFMA, which leverages self-super…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted for publication by IEEE International Conference on Systems, Man, and Cybernetics 2024

  22. arXiv:2408.06840  [pdf, other]

    cs.CV

    Dynamic and Compressive Adaptation of Transformers From Images to Videos

    Authors: Guozhen Zhang, Jingyu Liu, Shengming Cao, Xiaotong Zhao, Kevin Zhao, Kai Ma, Limin Wang

    Abstract: Recently, the remarkable success of pre-trained Vision Transformers (ViTs) from image-text matching has sparked an interest in image-to-video adaptation. However, most current approaches retain the full forward pass for each frame, leading to a high computation overhead for processing entire videos. In this paper, we present InTI, a novel approach for compressive image-to-video adaptation using dy…

    Submitted 13 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  23. arXiv:2408.06653  [pdf, other]

    cs.IR cs.AI

    Hierarchical Structured Neural Network for Retrieval

    Authors: Kaushik Rangadurai, Siyang Yuan, Minhui Huang, Yiqun Liu, Golnaz Ghasemiesfeh, Yunchen Pu, Xinfeng Xie, Xingfeng He, Fangzhou Xu, Andrew Cui, Vidhoon Viswanathan, Yan Dong, Liang Xiong, Lin Yang, Liang Wang, Jiyan Yang, Chonglin Sun

    Abstract: Embedding Based Retrieval (EBR) is a crucial component of the retrieval stage in (Ads) Recommendation System that utilizes Two Tower or Siamese Networks to learn embeddings for both users and items (ads). It then employs an Approximate Nearest Neighbor Search (ANN) to efficiently retrieve the most relevant ads for a specific user. Despite the recent rise to popularity in the industry, they have a…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 9 pages

  24. arXiv:2408.06567  [pdf, other]

    cs.CL cs.AI

    AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

    Authors: Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, Chengwei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, et al. (2 additional authors not shown)

    Abstract: In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch will cost a lot of computation resources while scaling up from a smaller model is a more efficient approach and has thus attracted significant attention…

    Submitted 12 August, 2024; originally announced August 2024.

  25. arXiv:2408.06549  [pdf, other]

    cs.LG cs.DC

    Prioritizing Modalities: Flexible Importance Scheduling in Federated Multimodal Learning

    Authors: Jieming Bian, Lei Wang, Jie Xu

    Abstract: Federated Learning (FL) is a distributed machine learning approach that enables devices to collaboratively train models without sharing their local data, ensuring user privacy and scalability. However, applying FL to real-world data presents challenges, particularly as most existing FL research focuses on unimodal data. Multimodal Federated Learning (MFL) has emerged to address these challenges, l…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Submitted to IEEE TMC, under review

  26. arXiv:2408.06360  [pdf, other]

    cs.IR cs.CV

    Modality-Balanced Learning for Multimedia Recommendation

    Authors: Jinghao Zhang, Guofan Liu, Qiang Liu, Shu Wu, Liang Wang

    Abstract: Many recommender models have been proposed to investigate how to incorporate multimodal content information into traditional collaborative filtering framework effectively. The use of multimodal information is expected to provide more comprehensive information and lead to superior performance. However, the integration of multiple modalities often encounters the modal imbalance problem: since the in…

    Submitted 26 July, 2024; originally announced August 2024.

    Comments: ACM Multimedia 2024 (Oral)

  27. arXiv:2408.06003  [pdf, other]

    cs.AR cs.LG

    LUT Tensor Core: Lookup Table Enables Efficient Low-Bit LLM Inference Acceleration

    Authors: Zhiwen Mo, Lei Wang, Jianyu Wei, Zhichen Zeng, Shijie Cao, Lingxiao Ma, Naifeng Jing, Ting Cao, Jilong Xue, Fan Yang, Mao Yang

    Abstract: As large language model (LLM) inference demands ever-greater resources, there is a rapidly growing trend of using low-bit weights to shrink memory usage and boost inference efficiency. However, these low-bit LLMs introduce the need for mixed-precision matrix multiplication (mpGEMM), which is a crucial yet under-explored operation that involves multiplying lower-precision weights with higher-precisio…

    Submitted 12 August, 2024; originally announced August 2024.

  28. arXiv:2408.05775  [pdf, other]

    cs.CV

    Efficient Test-Time Prompt Tuning for Vision-Language Models

    Authors: Yuhan Zhu, Guozhen Zhang, Chen Xu, Haocheng Shen, Xiaoxin Chen, Gangshan Wu, Limin Wang

    Abstract: Vision-language models have showcased impressive zero-shot classification capabilities when equipped with suitable text prompts. Previous studies have shown the effectiveness of test-time prompt tuning; however, these methods typically require per-image prompt adaptation during inference, which incurs high computational budgets and limits scalability and practical deployment. To overcome this issu…

    Submitted 11 August, 2024; originally announced August 2024.

  29. arXiv:2408.05758  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

    Authors: Chunyu Qiang, Wang Geng, Yi Zhao, Ruibo Fu, Tao Wang, Cheng Gong, Tianrui Wang, Qiuyu Liu, Jiangyan Yi, Zhengqi Wen, Chen Zhang, Hao Che, Longbiao Wang, Jianwu Dang, Jianhua Tao

    Abstract: Deep learning has brought significant improvements to the field of cross-modal representation learning. For tasks such as text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), a cross-modal fine-grained (frame-level) sequence representation is desired, emphasizing the semantic content of the text modality while de-emphasizing the paralinguistic information of the spe…

    Submitted 11 August, 2024; originally announced August 2024.

  30. arXiv:2408.05678  [pdf, other]

    cs.DC cs.AI cs.LG

    Efficient Federated Learning Using Dynamic Update and Adaptive Pruning with Momentum on Shared Server Data

    Authors: Ji Liu, Juncheng Jia, Hong Zhang, Yuhui Yun, Leye Wang, Yang Zhou, Huaiyu Dai, Dejing Dou

    Abstract: Despite achieving remarkable performance, Federated Learning (FL) encounters two important problems, i.e., low training efficiency and limited computational resources. In this paper, we propose a new FL framework, i.e., FedDUMAP, with three original contributions, to leverage the shared insensitive data on the server in addition to the distributed data in edge devices so as to efficiently train a…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 27 pages, to appear in TIST

  31. arXiv:2408.05455  [pdf, other]

    cs.CV cs.NI

    Multimodal generative semantic communication based on latent diffusion model

    Authors: Weiqi Fu, Lianming Xu, Xin Wu, Haoyang Wei, Li Wang

    Abstract: In emergencies, the ability to quickly and accurately gather environmental data and command information, and to make timely decisions, is particularly critical. Traditional semantic communication frameworks, primarily based on a single modality, are susceptible to complex environments and lighting conditions, thereby limiting decision accuracy. To this end, this paper introduces a multimodal gener…

    Submitted 10 August, 2024; originally announced August 2024.

  32. arXiv:2408.04400  [pdf, other]

    cs.LG cs.AI

    DIVE: Subgraph Disagreement for Graph Out-of-Distribution Generalization

    Authors: Xin Sun, Liang Wang, Qiang Liu, Shu Wu, Zilei Wang, Liang Wang

    Abstract: This paper addresses the challenge of out-of-distribution (OOD) generalization in graph machine learning, a field rapidly advancing yet grappling with the discrepancy between source and target data distributions. Traditional graph learning algorithms, based on the assumption of uniform distribution between training and test data, falter in real-world scenarios where this assumption fails, resultin…

    Submitted 8 August, 2024; originally announced August 2024.

  33. arXiv:2408.04203  [pdf, other]

    cs.AI

    MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents

    Authors: Yanqi Dai, Huanran Hu, Lei Wang, Shengjie Jin, Xu Chen, Zhiwu Lu

    Abstract: Recently, Role-Playing Agents (RPAs) have garnered increasing attention for their potential to deliver emotional value and facilitate sociological research. However, existing studies are primarily confined to the textual modality, unable to simulate humans' multimodal perceptual capabilities. To bridge this gap, we introduce the concept of Multimodal Role-Playing Agents (MRPAs), and propose a comp…

    Submitted 7 August, 2024; originally announced August 2024.

  34. arXiv:2408.03651  [pdf, other]

    eess.IV cs.CV

    SAM2-PATH: A better segment anything model for semantic segmentation in digital pathology

    Authors: Mingya Zhang, Liang Wang, Limei Gu, Zhao Li, Yaohui Wang, Tingshen Ling, Xianping Tao

    Abstract: The semantic segmentation task in pathology plays an indispensable role in assisting physicians in determining the condition of tissue lesions. Foundation models, such as the SAM (Segment Anything Model) and SAM2, exhibit exceptional performance in instance segmentation within everyday natural scenes. SAM-PATH has also achieved impressive results in semantic segmentation within the field of pathol…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 6 pages, 3 figures

  35. arXiv:2408.03616  [pdf, other]

    eess.IV cs.CV

    Distillation Learning Guided by Image Reconstruction for One-Shot Medical Image Segmentation

    Authors: Feng Zhou, Yanjie Zhou, Longjie Wang, Yun Peng, David E. Carlson, Liyun Tu

    Abstract: Traditional one-shot medical image segmentation (MIS) methods use registration networks to propagate labels from a reference atlas or rely on comprehensive sampling strategies to generate synthetic labeled data for training. However, these methods often struggle with registration errors and low-quality synthetic images, leading to poor performance and generalization. To overcome this, we introduce…

    Submitted 7 August, 2024; originally announced August 2024.

  36. arXiv:2408.03482  [pdf, other]

    cs.CR

    Beyond App Markets: Demystifying Underground Mobile App Distribution Via Telegram

    Authors: Yanhui Guo, Dong Wang, Liu Wang, Yongsheng Fang, Chao Wang, Minghui Yang, Tianming Liu, Haoyu Wang

    Abstract: The thriving mobile app ecosystem encompasses a wide range of functionalities. However, within this ecosystem, a subset of apps provides illicit services such as gambling and pornography to pursue economic gains, collectively referred to as "underground economy apps". While previous studies have examined these apps' characteristics and identification methods, investigations into their distribution…

    Submitted 6 August, 2024; originally announced August 2024.

  37. arXiv:2408.03446  [pdf, other]

    cs.NI eess.SP

    Optimizing NOMA Transmissions to Advance Federated Learning in Vehicular Networks

    Authors: Ziru Chen, Zhou Ni, Peiyuan Guan, Lu Wang, Lin X. Cai, Morteza Hashemi, Zongzhi Li

    Abstract: Diverse critical data, such as location information and driving patterns, can be collected by IoT devices in vehicular networks to improve driving experiences and road safety. However, drivers are often reluctant to share their data due to privacy concerns. The Federated Vehicular Network (FVN) is a promising technology that tackles these concerns by transmitting model parameters instead of raw da…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: The paper is accepted by IEEE Globecom 2024

  38. arXiv:2408.03246  [pdf, other]

    cs.CL

    Making Long-Context Language Models Better Multi-Hop Reasoners

    Authors: Yanyang Li, Shuo Liang, Michael R. Lyu, Liwei Wang

    Abstract: Recent advancements in long-context modeling have enhanced language models (LMs) for complex tasks across multiple NLP applications. Despite this progress, we find that these models struggle with multi-hop reasoning and exhibit decreased performance in the presence of noisy contexts. In this paper, we introduce Reasoning with Attributions, a novel approach that prompts LMs to supply attributions f…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: ACL 2024 Main Conference Camera Ready; Dataset, model, and code are available at https://github.com/LaVi-Lab/LongContextReasoner

  39. arXiv:2408.02964  [pdf, other]

    cs.CL

    Accuracy and Consistency of LLMs in the Registered Dietitian Exam: The Impact of Prompt Engineering and Knowledge Retrieval

    Authors: Iman Azimi, Mohan Qi, Li Wang, Amir M. Rahmani, Youlin Li

    Abstract: Large language models (LLMs) are fundamentally transforming human-facing applications in the health and well-being domains: boosting patient engagement, accelerating clinical decision-making, and facilitating medical education. Although state-of-the-art LLMs have shown superior performance in several conversational applications, evaluations within nutrition and diet applications are still insuffic…

    Submitted 7 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  40. arXiv:2408.02796  [pdf, other]

    cs.CV

    Gaussian Mixture based Evidential Learning for Stereo Matching

    Authors: Weide Liu, Xingxing Wang, Lu Wang, Jun Cheng, Fayao Liu, Xulei Yang

    Abstract: In this paper, we introduce a novel Gaussian mixture based evidential learning solution for robust stereo matching. Diverging from previous evidential deep learning approaches that rely on a single Gaussian distribution, our framework posits that individual image data adheres to a mixture-of-Gaussian distribution in stereo matching. This assumption yields more precise pixel-level predictions and m…

    Submitted 5 August, 2024; originally announced August 2024.

  41. arXiv:2408.02704  [pdf]

    cs.LG cs.AI

    Spatial-temporal Graph Convolutional Networks with Diversified Transformation for Dynamic Graph Representation Learning

    Authors: Ling Wang, Yixiang Huang, Hao Wu

    Abstract: Dynamic graphs (DG) are often used to describe evolving interactions between nodes in real-world applications. Temporal patterns are a natural feature of DGs and are also key to representation learning. However, existing dynamic GCN models are mostly composed of static GCNs and sequence modules, which results in the separation of spatiotemporal information and cannot effectively capture complex te…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 8 pages, 1 figure

  42. arXiv:2408.02695  [pdf, other]

    cs.LG cs.AI

    Distribution-Level Memory Recall for Continual Learning: Preserving Knowledge and Avoiding Confusion

    Authors: Shaoxu Cheng, Kanglei Geng, Chiyuan He, Zihuan Qiu, Linfeng Xu, Heqian Qiu, Lanxiao Wang, Qingbo Wu, Fanman Meng, Hongliang Li

    Abstract: Continual Learning (CL) aims to enable Deep Neural Networks (DNNs) to learn new data without forgetting previously learned knowledge. The key to achieving this goal is to avoid confusion at the feature level, i.e., avoiding confusion within old tasks and between new and old tasks. Previous prototype-based CL methods generate pseudo features for old knowledge replay by adding Gaussian noise to the…

    Submitted 4 August, 2024; originally announced August 2024.

  43. arXiv:2408.01715  [pdf, other]

    cs.CR cs.AI

    Joint Universal Adversarial Perturbations with Interpretations

    Authors: Liang-bo Ning, Zeyu Dai, Wenqi Fan, Jingran Su, Chao Pan, Luning Wang, Qing Li

    Abstract: Deep neural networks (DNNs) have significantly boosted the performance of many challenging tasks. Despite the great development, DNNs have also exposed their vulnerability. Recent studies have shown that adversaries can manipulate the predictions of DNNs by adding a universal adversarial perturbation (UAP) to benign samples. On the other hand, increasing efforts have been made to help users unders…

    Submitted 3 August, 2024; originally announced August 2024.

  44. arXiv:2408.01269  [pdf, other]

    cs.CV

    A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness

    Authors: Lutao Jiang, Hangyu Li, Lin Wang

    Abstract: Text-to-3D content creation has recently received much attention, especially with the prevalence of 3D Gaussians Splatting. In general, GS-based methods comprise two key stages: initialization and rendering optimization. To achieve initialization, existing works directly apply random sphere initialization or 3D diffusion models, e.g., Point-E, to derive the initial shapes. However, such strategies…

    Submitted 2 August, 2024; originally announced August 2024.

    Journal ref: ACM MM 2024

  45. arXiv:2408.00765  [pdf, other]

    cs.CV cs.AI cs.CL

    MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities

    Authors: Weihao Yu, Zhengyuan Yang, Linfeng Ren, Linjie Li, Jianfeng Wang, Kevin Lin, Chung-Ching Lin, Zicheng Liu, Lijuan Wang, Xinchao Wang

    Abstract: MM-Vet, with open-ended vision-language questions targeting at evaluating integrated capabilities, has become one of the most popular benchmarks for large multimodal model evaluation. MM-Vet assesses six core vision-language (VL) capabilities: recognition, knowledge, spatial awareness, language generation, OCR, and math. However, its question format is restricted to single image-text pairs, lackin…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Extension of MM-Vet: arXiv:2308.02490

  46. arXiv:2408.00376  [pdf, other]

    cs.LG cs.AI

    On the Limitations and Prospects of Machine Unlearning for Generative AI

    Authors: Shiji Zhou, Lianzhe Wang, Jiangnan Ye, Yongliang Wu, Heng Chang

    Abstract: Generative AI (GenAI), which aims to synthesize realistic and diverse data samples from latent variables or other data modalities, has achieved remarkable results in various domains, such as natural language, images, audio, and graphs. However, they also pose challenges and risks to data privacy, security, and ethics. Machine unlearning is the process of removing or weakening the influence of spec…

    Submitted 1 August, 2024; originally announced August 2024.

  47. RoCo: Robust Collaborative Perception By Iterative Object Matching and Pose Adjustment

    Authors: Zhe Huang, Shuo Wang, Yongcai Wang, Wanting Li, Deying Li, Lei Wang

    Abstract: Collaborative autonomous driving with multiple vehicles usually requires the data fusion from multiple modalities. To ensure effective fusion, the data from each individual modality shall maintain a reasonably high quality. However, in collaborative perception, the quality of object detection based on a modality is highly sensitive to the relative pose errors among the agents. It leads to feature…

    Submitted 31 July, 2024; originally announced August 2024.

    Comments: ACM MM2024

    Journal ref: Proceedings of the 32nd ACM International Conference on Multimedia (MM '24), October 28-November 1, 2024, Melbourne, VIC, Australia

  48. arXiv:2408.00038  [pdf, other]

    cs.IR

    MIMNet: Multi-Interest Meta Network with Multi-Granularity Target-Guided Attention for Cross-domain Recommendation

    Authors: Xiaofei Zhu, Yabo Yin, Li Wang

    Abstract: Cross-domain recommendation (CDR) plays a critical role in alleviating the sparsity and cold-start problem and substantially boosting the performance of recommender systems. Existing CDR methods prefer to either learn a common preference bridge shared by all users or a personalized preference bridge tailored for each user to transfer user preference from the source domain to the target domain. Alt…

    Submitted 31 July, 2024; originally announced August 2024.

  49. arXiv:2407.21735  [pdf, other]

    cs.CV

    Unifying Event-based Flow, Stereo and Depth Estimation via Feature Similarity Matching

    Authors: Pengjie Zhang, Lin Zhu, Lizhi Wang, Hua Huang

    Abstract: As an emerging vision sensor, the event camera has gained popularity in various vision tasks such as optical flow estimation, stereo matching, and depth estimation due to its high-speed, sparse, and asynchronous event streams. Unlike traditional approaches that use specialized architectures for each specific task, we propose a unified framework, EventMatch, that reformulates these tasks as an even…

    Submitted 31 July, 2024; originally announced July 2024.

  50. arXiv:2407.21531  [pdf, other]

    cs.SD cs.CL cs.MM eess.AS

    Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

    Authors: Ziya Zhou, Yuhang Wu, Zhiyue Wu, Xinyue Zhang, Ruibin Yuan, Yinghao Ma, Lu Wang, Emmanouil Benetos, Wei Xue, Yike Guo

    Abstract: Symbolic Music, akin to language, can be encoded in discrete symbols. Recent research has extended the application of large language models (LLMs) such as GPT-4 and Llama2 to the symbolic music domain including understanding and generation. Yet scant research explores the details of how these LLMs perform on advanced music understanding and conditioned generation, especially from the multi-step re…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ISMIR2024