
Showing 1–50 of 3,189 results for author: Zhang, C

Searching in archive cs.
  1. arXiv:2408.09269  [pdf, other]

    cs.SD cs.LG eess.AS

    Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs

    Authors: Anshuman Sinha, Camille Migozzi, Aubin Rey, Chao Zhang

    Abstract: Research on multi-modal contrastive learning strategies for audio and text has rapidly gained interest. Contrastively trained Audio-Language Models (ALMs), such as CLAP, which establish a unified representation across audio and language modalities, have improved performance on various downstream tasks by providing good text-aligned audio encoders and vice versa. These improvements are evident in…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 31 pages, 11 figures

  2. arXiv:2408.08931  [pdf, other]

    cs.IR cs.AI cs.LG

    Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach

    Authors: Zhiwei Li, Guodong Long, Tianyi Zhou, Jing Jiang, Chengqi Zhang

    Abstract: Federated Collaborative Filtering (FedCF) is an emerging field focused on developing a new recommendation framework that preserves privacy in a federated setting. Existing FedCF methods typically combine distributed Collaborative Filtering (CF) algorithms with privacy-preserving mechanisms, and then preserve personalized information in a user embedding vector. However, the user embedding is usu…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 10 pages, 3 figures, 4 tables, conference

  3. arXiv:2408.08882  [pdf, other]

    cs.DC

    A 1024 RV-Cores Shared-L1 Cluster with High Bandwidth Memory Link for Low-Latency 6G-SDR

    Authors: Yichao Zhang, Marco Bertuletti, Chi Zhang, Samuel Riedel, Alessandro Vanelli-Coralli, Luca Benini

    Abstract: We introduce an open-source architecture for next-generation Radio-Access Network baseband processing: 1024 latency-tolerant 32-bit RISC-V cores share 4 MiB of L1 memory via an ultra-low-latency interconnect (7-11 cycles), and a modular Direct Memory Access engine provides an efficient link to a high-bandwidth memory, such as HBM2E (98% peak bandwidth at 910 GBps). The system achieves leading-edge ener…

    Submitted 4 August, 2024; originally announced August 2024.

  4. arXiv:2408.08588  [pdf, other]

    cs.IT eess.SP

    Movable Antenna for Wireless Communications: Prototyping and Experimental Results

    Authors: Zhenjun Dong, Zhiwen Zhou, Zhiqiang Xiao, Chaoyue Zhang, Xinrui Li, Hongqi Min, Yong Zeng, Shi Jin, Rui Zhang

    Abstract: Movable antenna (MA), which can flexibly change the position of an antenna in three-dimensional (3D) continuous space, is an emerging technology for achieving full spatial performance gains. In this paper, a prototype MA communication system with ultra-accurate movement control is presented to verify the performance gain of MA in practical environments. The prototype utilizes the feedback control…

    Submitted 16 August, 2024; originally announced August 2024.

  5. arXiv:2408.08315  [pdf, other]

    cs.CV cs.AI

    Segment Anything for Videos: A Systematic Survey

    Authors: Chunhui Zhang, Yawen Cui, Weilin Lin, Guanjie Huang, Yan Rong, Li Liu, Shiguang Shan

    Abstract: The recent wave of foundation models has witnessed tremendous success in computer vision (CV) and beyond, with the segment anything model (SAM) having sparked a passion for exploring task-agnostic visual foundation models. Empowered by its remarkable zero-shot generalization, SAM is currently challenging numerous traditional paradigms in CV, delivering extraordinary performance not only in various…

    Submitted 30 July, 2024; originally announced August 2024.

    Comments: https://github.com/983632847/SAM-for-Videos

  6. arXiv:2408.08209  [pdf, other]

    cs.IR

    Modeling Domain and Feedback Transitions for Cross-Domain Sequential Recommendation

    Authors: Changshuo Zhang, Teng Shi, Xiao Zhang, Qi Liu, Ruobing Xie, Jun Xu, Ji-Rong Wen

    Abstract: Nowadays, many recommender systems encompass various domains to cater to users' diverse needs, leading to user behaviors transitioning across different domains. In fact, user behaviors across different domains reveal changes in preference toward recommended items. For instance, a shift from negative feedback to positive feedback indicates improved user satisfaction. However, existing cross-domain…

    Submitted 15 August, 2024; originally announced August 2024.

  7. arXiv:2408.08192  [pdf, other]

    cs.LG cs.GT cs.MA math.OC

    Stochastic Semi-Gradient Descent for Learning Mean Field Games with Population-Aware Function Approximation

    Authors: Chenyu Zhang, Xu Chen, Xuan Di

    Abstract: Mean field games (MFGs) model the interactions within a large-population multi-agent system using the population distribution. Traditional learning methods for MFGs are based on fixed-point iteration (FPI), which calculates best responses and the induced population distribution separately and sequentially. However, FPI-type methods suffer from inefficiency and instability, due to oscillations caused b…

    Submitted 15 August, 2024; originally announced August 2024.

  8. arXiv:2408.08105  [pdf, other]

    cs.CV cs.AI

    Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images

    Authors: Zhiyuan Li, Heng Wang, Dongnan Liu, Chaoyi Zhang, Ao Ma, Jieting Long, Weidong Cai

    Abstract: Large Language Models (LLMs) have showcased exceptional ability in causal reasoning from textual information. However, will these causalities remain straightforward for Vision Large Language Models (VLLMs) when only visual hints are provided? Motivated by this, we propose a novel Multimodal Causal Reasoning benchmark, namely MuCR, to challenge VLLMs to infer semantic cause-and-effect relationship…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 20 pages

  9. arXiv:2408.07733  [pdf, other]

    cs.LG cs.CR

    Enhancing Adversarial Attacks via Parameter Adaptive Adversarial Attack

    Authors: Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Chenyu Zhang, Jiahao Huang, Jianlong Zhou, Fang Chen

    Abstract: In recent times, the swift evolution of adversarial attacks has captured widespread attention, particularly concerning their transferability and other performance attributes. These techniques are primarily executed at the sample level, frequently overlooking the intrinsic parameters of models. Such neglect suggests that the perturbations introduced in adversarial samples might have the potential f…

    Submitted 14 August, 2024; originally announced August 2024.

  10. arXiv:2408.07605  [pdf, other]

    cs.CV

    Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving

    Authors: Yuqing Wen, Yucheng Zhao, Yingfei Liu, Binyuan Huang, Fan Jia, Yanhui Wang, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

    Abstract: The field of autonomous driving increasingly demands high-quality annotated video training data. In this paper, we propose Panacea+, a powerful and universally applicable framework for generating video data in driving scenes. Built upon the foundation of our previous work, Panacea, Panacea+ adopts a multi-view appearance noise prior mechanism and a super-resolution module for enhanced consistency…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Project page: https://panacea-ad.github.io/. arXiv admin note: text overlap with arXiv:2311.16813

  11. arXiv:2408.07401  [pdf, other]

    cs.CL cs.AI cs.DB

    DataVisT5: A Pre-trained Language Model for Jointly Understanding Text and Data Visualization

    Authors: Zhuoyue Wan, Yuanfeng Song, Shuaimin Li, Chen Jason Zhang, Raymond Chi-Wing Wong

    Abstract: Data visualization (DV) is a fundamental tool for efficiently conveying the insights behind big data, and it has been widely adopted in today's data-driven world. Task automation in DV, such as converting natural language queries to visualizations (i.e., text-to-vis), generating explanations from visualizations (i.e., vis-to-text), answering DV-related questions in…

    Submitted 14 August, 2024; originally announced August 2024.

  12. arXiv:2408.06901  [pdf, other]

    cs.CV

    Divide and Conquer: Improving Multi-Camera 3D Perception with 2D Semantic-Depth Priors and Input-Dependent Queries

    Authors: Qi Song, Qingyong Hu, Chi Zhang, Yongquan Chen, Rui Huang

    Abstract: 3D perception tasks, such as 3D object detection and Bird's-Eye-View (BEV) segmentation using multi-camera images, have drawn significant attention recently. Despite the fact that accurately estimating both semantic and 3D scene layouts is crucial for this task, existing techniques often neglect the synergistic effects of semantic and depth cues, leading to the occurrence of classification and po…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted by TIP 2024

  13. arXiv:2408.06385  [pdf, other]

    cs.SE cs.AI cs.CL

    ViC: Virtual Compiler Is All You Need For Assembly Code Search

    Authors: Zeyu Gao, Hao Wang, Yuanda Wang, Chao Zhang

    Abstract: Assembly code search is vital for reducing the burden on reverse engineers, allowing them to quickly identify specific functions using natural language within vast binary programs. Despite its significance, this critical task is impeded by the complexities involved in building high-quality datasets. This paper explores training a Large Language Model (LLM) to emulate a general compiler. By leverag…

    Submitted 10 August, 2024; originally announced August 2024.

  14. arXiv:2408.06294  [pdf, other]

    cs.HC

    AniBalloons: Animated Chat Balloons as Affective Augmentation for Social Messaging and Chatbot Interaction

    Authors: Pengcheng An, Chaoyu Zhang, Haichen Gao, Ziqi Zhou, Yage Xiao, Jian Zhao

    Abstract: Despite being prominent and ubiquitous, message-based interaction is limited in nonverbally conveying emotions. Besides emoticons or stickers, messaging users continue seeking richer options for affective communication. Recent research explored using chat balloons' shape and color to communicate emotional states. However, little work explored whether and how chat-balloon animations could be design…

    Submitted 14 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: Under second-round review, after minor revision, at the International Journal of Human-Computer Studies

  15. arXiv:2408.05758  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

    Authors: Chunyu Qiang, Wang Geng, Yi Zhao, Ruibo Fu, Tao Wang, Cheng Gong, Tianrui Wang, Qiuyu Liu, Jiangyan Yi, Zhengqi Wen, Chen Zhang, Hao Che, Longbiao Wang, Jianwu Dang, Jianhua Tao

    Abstract: Deep learning has brought significant improvements to the field of cross-modal representation learning. For tasks such as text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), a cross-modal fine-grained (frame-level) sequence representation is desired, emphasizing the semantic content of the text modality while de-emphasizing the paralinguistic information of the spe…

    Submitted 11 August, 2024; originally announced August 2024.

  16. arXiv:2408.05705  [pdf, other]

    eess.IV cs.AI cs.CV

    TC-KANRecon: High-Quality and Accelerated MRI Reconstruction via Adaptive KAN Mechanisms and Intelligent Feature Scaling

    Authors: Ruiquan Ge, Xiao Yu, Yifei Chen, Fan Jia, Shenghao Zhu, Guanyu Zhou, Yiyu Huang, Chenyan Zhang, Dong Zeng, Changmiao Wang, Qiegen Liu, Shanzhou Niu

    Abstract: Magnetic Resonance Imaging (MRI) has become essential in clinical diagnosis due to its high resolution and multiple contrast mechanisms. However, the relatively long acquisition time limits its broader application. To address this issue, this study presents an innovative conditional guided diffusion model, named TC-KANRecon, which incorporates the Multi-Free U-KAN (MF-UKAN) module and a dynamic…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 10 pages, 3 figures

  17. arXiv:2408.05584  [pdf]

    cs.LG stat.ME

    Dynamical causality under invisible confounders

    Authors: Jinling Yan, Shao-Wu Zhang, Chihao Zhang, Weitian Huang, Jifan Shi, Luonan Chen

    Abstract: Causal inference is prone to spurious causal interactions, due to substantial confounding in a complex system. While many existing statistical or dynamical methods attempt to address misidentification challenges, there remains a notable lack of effective methods to infer causality, in particular in the presence of invisible/unobservable confounders. As a result,…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 23 pages, 5 figures

  18. arXiv:2408.05508  [pdf, other]

    cs.CV

    PointMT: Efficient Point Cloud Analysis with Hybrid MLP-Transformer Architecture

    Authors: Qiang Zheng, Chao Zhang, Jian Sun

    Abstract: In recent years, point cloud analysis methods based on the Transformer architecture have made significant progress, particularly in the context of multimedia applications such as 3D modeling, virtual reality, and autonomous systems. However, the high computational resource demands of the Transformer architecture hinder its scalability, real-time processing capabilities, and deployment on mobile de…

    Submitted 10 August, 2024; originally announced August 2024.

  19. arXiv:2408.05233  [pdf, other]

    cs.AI

    Large Language Model based Agent Framework for Electric Vehicle Charging Behavior Simulation

    Authors: Junkang Feng, Chenggang Cui, Chuanlin Zhang, Zizhu Fan

    Abstract: This paper introduces a new LLM-based agent framework for simulating electric vehicle (EV) charging behavior, integrating user preferences, psychological characteristics, and environmental factors to optimize the charging process. The framework comprises several modules, enabling sophisticated, adaptive simulations. Dynamic decision making is supported by continuous reflection and memory updates,…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 7 pages, 3 figures

  20. arXiv:2408.05058  [pdf, other]

    stat.ML cs.LG

    Variational Bayesian Phylogenetic Inference with Semi-implicit Branch Length Distributions

    Authors: Tianyu Xie, Frederick A. Matsen IV, Marc A. Suchard, Cheng Zhang

    Abstract: Reconstructing the evolutionary history relating a collection of molecular sequences is the main subject of modern Bayesian phylogenetic inference. However, the commonly used Markov chain Monte Carlo methods can be inefficient due to the complicated space of phylogenetic trees, especially when the number of sequences is large. An alternative approach is variational Bayesian phylogenetic inference…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 26 pages, 7 figures

  21. arXiv:2408.04967  [pdf, other]

    eess.AS cs.SD

    ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild

    Authors: Jiangyan Yi, Chu Yuan Zhang, Jianhua Tao, Chenglong Wang, Xinrui Yan, Yong Ren, Hao Gu, Junzuo Zhou

    Abstract: The growing prominence of the field of audio deepfake detection is driven by its wide range of applications, notably in protecting the public from potential fraud and other malicious activities, prompting the need for greater attention and research in this area. The ADD 2023 challenge goes beyond binary real/fake classification by emulating real-world scenarios, such as the identification of manip…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  22. arXiv:2408.04889  [pdf, other]

    cs.MM

    Deep joint source-channel coding for wireless point cloud transmission

    Authors: Cixiao Zhang, Mufan Liu, Wenjie Huang, Yin Xu, Yiling Xu, Dazhi He

    Abstract: The growing demand for high-quality point cloud transmission over wireless networks presents significant challenges, primarily due to the large data sizes and the need for efficient encoding techniques. In response to these challenges, we introduce a novel system named Deep Point Cloud Semantic Transmission (PCST), designed for end-to-end wireless point cloud transmission. Our approach employs a p…

    Submitted 9 August, 2024; originally announced August 2024.

  23. arXiv:2408.04708  [pdf, other]

    cs.SD cs.AI eess.AS

    MulliVC: Multi-lingual Voice Conversion With Cycle Consistency

    Authors: Jiawei Huang, Chen Zhang, Yi Ren, Ziyue Jiang, Zhenhui Ye, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao

    Abstract: Voice conversion aims to modify the source speaker's voice to resemble the target speaker while preserving the original speech content. Despite notable advancements in voice conversion these days, multi-lingual voice conversion (including both monolingual and cross-lingual scenarios) has yet to be extensively studied. It faces two main challenges: 1) the considerable variability in prosody and art…

    Submitted 8 August, 2024; originally announced August 2024.

  24. arXiv:2408.04595  [pdf, other]

    stat.ML cs.AI cs.LG eess.SY math.ST

    Inference with the Upper Confidence Bound Algorithm

    Authors: Koulik Khamaru, Cun-Hui Zhang

    Abstract: In this paper, we discuss the asymptotic behavior of the Upper Confidence Bound (UCB) algorithm in the context of multi-armed bandit problems and discuss its implications for downstream inferential tasks. While inferential tasks become challenging when data is collected in a sequential manner, we argue that this problem can be alleviated when the sequential algorithm at hand satisfies certain stabili…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 17 pages, 1 figure

  25. arXiv:2408.04344  [pdf, other]

    cs.SE

    Semantic-Enhanced Indirect Call Analysis with Large Language Models

    Authors: Baijun Cheng, Cen Zhang, Kailong Wang, Ling Shi, Yang Liu, Haoyu Wang, Yao Guo, Xiangqun Chen

    Abstract: In contemporary software development, the widespread use of indirect calls to achieve dynamic features poses challenges in constructing precise control flow graphs (CFGs), which further impacts the performance of downstream static analysis tasks. To tackle this issue, various types of indirect call analyzers have been proposed. However, they do not fully leverage the semantic information of the pr…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by ASE'24

  26. arXiv:2408.04334  [pdf, other]

    cs.AR

    A Node-Based Polar List Decoder with Frame Interleaving and Ensemble Decoding Support

    Authors: Yuqing Ren, Leyu Zhang, Ludovic Damien Blanc, Yifei Shen, Xinwei Li, Alexios Balatsoukas-Stimming, Chuan Zhang, Andreas Burg

    Abstract: Node-based successive cancellation list (SCL) decoding has received considerable attention in wireless communications for its significant reduction in decoding latency, particularly with 5G New Radio (NR) polar codes. However, the existing node-based SCL decoders are constrained by sequential processing, leading to complicated and data-dependent computational units that introduce unavoidable stall…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 13 pages, 16 figures, accepted by IEEE Transactions on Circuits and Systems I: Regular Papers

  27. arXiv:2408.04197  [pdf, other]

    cs.IR cs.AI cs.DB

    Pairwise Judgment Formulation for Semantic Embedding Model in Web Search

    Authors: Mengze Hong, Chen Jason Zhang

    Abstract: Semantic Embedding Model (SEM), a neural network-based Siamese architecture, is gaining momentum in information retrieval and natural language processing. In order to train SEM in a supervised fashion for Web search, the search engine query log is typically utilized to automatically formulate pairwise judgments as training data. Despite the growing application of semantic embeddings in the search…

    Submitted 7 August, 2024; originally announced August 2024.

  28. arXiv:2408.03979  [pdf, ps, other]

    cs.SD eess.AS

    Speaker Adaptation for Quantised End-to-End ASR Models

    Authors: Qiuming Zhao, Guangzhi Sun, Chao Zhang, Mingxing Xu, Thomas Fang Zheng

    Abstract: End-to-end models have shown superior performance for automatic speech recognition (ASR). However, such models are often very large in size and thus challenging to deploy on resource-constrained edge devices. While quantisation can reduce model sizes, it can lead to increased word error rates (WERs). Although improved quantisation methods were proposed to address the issue of performance degradati…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: submitted to ASRU 2023 Workshop

  29. arXiv:2408.03943  [pdf, other]

    cs.HC cs.AI cs.LG

    Building Machines that Learn and Think with People

    Authors: Katherine M. Collins, Ilia Sucholutsky, Umang Bhatt, Kartik Chandra, Lionel Wong, Mina Lee, Cedegao E. Zhang, Tan Zhi-Xuan, Mark Ho, Vikash Mansinghka, Adrian Weller, Joshua B. Tenenbaum, Thomas L. Griffiths

    Abstract: What do we want from machine intelligence? We envision machines that are not just tools for thought, but partners in thought: reasonable, insightful, knowledgeable, reliable, and trustworthy systems that think with us. Current artificial intelligence (AI) systems satisfy some of these criteria, some of the time. In this Perspective, we show how the science of collaborative cognition can be put to…

    Submitted 21 July, 2024; originally announced August 2024.

  30. arXiv:2408.02919  [pdf, other]

    cs.CL

    Data Checklist: On Unit-Testing Datasets with Usable Information

    Authors: Heidi C. Zhang, Shabnam Behzad, Kawin Ethayarajh, Dan Jurafsky

    Abstract: Model checklists (Ribeiro et al., 2020) have emerged as a useful tool for understanding the behavior of LLMs, analogous to unit-testing in software engineering. However, despite datasets being a key determinant of model behavior, evaluating datasets, e.g., for the existence of annotation artifacts, is largely done ad hoc, once a problem in model behavior has already been found downstream. In this…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 17 pages, 4 figures. COLM 2024

  31. arXiv:2408.02555  [pdf, other]

    cs.CV cs.AI cs.GR

    MeshAnything V2: Artist-Created Mesh Generation With Adjacent Mesh Tokenization

    Authors: Yiwen Chen, Yikai Wang, Yihao Luo, Zhengyi Wang, Zilong Chen, Jun Zhu, Chi Zhang, Guosheng Lin

    Abstract: We introduce MeshAnything V2, an autoregressive transformer that generates Artist-Created Meshes (AM) aligned to given shapes. It can be integrated with various 3D asset production pipelines to achieve high-quality, highly controllable AM generation. MeshAnything V2 surpasses previous methods in both efficiency and performance using models of the same size. These improvements are due to our newly…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Project Page: https://buaacyw.github.io/meshanything-v2/ GitHub: https://github.com/buaacyw/MeshAnythingV2

  32. arXiv:2408.02384  [pdf, other]

    cs.LG cs.GT

    Strategic Federated Learning: Application to Smart Meter Data Clustering

    Authors: Hassan Mohamad, Chao Zhang, Samson Lasaulce, Vineeth S Varma, Mérouane Debbah, Mounir Ghogho

    Abstract: Federated learning (FL) involves several clients that share with a fusion center (FC) the model each client has trained on its own data. Conventional FL, which can be interpreted as an estimation or distortion-based approach, ignores the final use of model information (MI) by the FC and the other clients. In this paper, we introduce a novel FL framework in which the FC uses an aggregate version…

    Submitted 5 August, 2024; originally announced August 2024.

  33. arXiv:2408.01945  [pdf, other]

    cs.CV cs.RO

    Generalized Maximum Likelihood Estimation for Perspective-n-Point Problem

    Authors: Tian Zhan, Chunfeng Xu, Cheng Zhang, Ke Zhu

    Abstract: The Perspective-n-Point (PnP) problem has been widely studied in the literature and applied in various vision-based pose estimation scenarios. However, existing methods ignore the anisotropic uncertainty of observations, as demonstrated in several real-world datasets in this paper. This oversight may lead to suboptimal and inaccurate estimation, particularly in the presence of noisy observations. T…

    Submitted 4 August, 2024; originally announced August 2024.

  34. arXiv:2408.01929  [pdf, other]

    eess.IV cs.CV

    Advancing H&E-to-IHC Stain Translation in Breast Cancer: A Multi-Magnification and Attention-Based Approach

    Authors: Linhao Qu, Chengsheng Zhang, Guihui Li, Haiyong Zheng, Chen Peng, Wei He

    Abstract: Breast cancer presents a significant healthcare challenge globally, demanding precise diagnostics and effective treatment strategies, where histopathological examination of Hematoxylin and Eosin (H&E) stained tissue sections plays a central role. Despite its importance, evaluating specific biomarkers like Human Epidermal Growth Factor Receptor 2 (HER2) for personalized treatment remains constraine…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE CIS-RAM 2024 Invited Session Oral

  35. arXiv:2408.01784  [pdf, other]

    cs.IR

    Graph Stochastic Neural Process for Inductive Few-shot Knowledge Graph Completion

    Authors: Zicheng Zhao, Linhao Luo, Shirui Pan, Chengqi Zhang, Chen Gong

    Abstract: Knowledge graphs (KGs) store a vast number of facts as relationships between entities. Due to the long-tailed distribution of relations and the incompleteness of KGs, there is growing interest in few-shot knowledge graph completion (FKGC). Existing FKGC methods often assume the existence of all entities in KGs, which may not be practical since new relations and entities can emerge over time. Therefore, we…

    Submitted 3 August, 2024; originally announced August 2024.

  36. arXiv:2408.01038  [pdf, other]

    cs.CL

    UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents

    Authors: Yi Tu, Chong Zhang, Ya Guo, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang

    Abstract: The recognition of named entities in visually-rich documents (VrD-NER) plays a critical role in various real-world scenarios and applications. However, the research in VrD-NER faces three major challenges: complex document layouts, incorrect reading orders, and unsuitable task formulations. To address these challenges, we propose a query-aware entity extraction head, namely UNER, to collaborate wi…

    Submitted 11 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: accepted by ACM Multimedia 2024

  37. arXiv:2408.00804  [pdf, other]

    cs.AR cs.AI cs.LG

    ChipExpert: The Open-Source Integrated-Circuit-Design-Specific Large Language Model

    Authors: Ning Xu, Zhaoyang Zhang, Lei Qi, Wensuo Wang, Chao Zhang, Zihao Ren, Huaiyuan Zhang, Xin Cheng, Yanqi Zhang, Zhichao Liu, Qingwen Wei, Shiyang Wu, Lanlan Yang, Qianfeng Lu, Yiqun Ma, Mengyao Zhao, Junbo Liu, Yufan Song, Xin Geng, Jun Yang

    Abstract: The field of integrated circuit (IC) design is highly specialized, presenting significant barriers to entry and research and development challenges. Although large language models (LLMs) have achieved remarkable success in various domains, existing LLMs often fail to meet the specific needs of students, engineers, and researchers. Consequently, the potential of LLMs in the IC design domain remains…

    Submitted 26 July, 2024; originally announced August 2024.

  38. arXiv:2408.00741  [pdf, other]

    cs.AI cs.AR cs.DC

    DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency

    Authors: Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Josep Torrellas, Esha Choukse

    Abstract: The rapid evolution and widespread adoption of generative large language models (LLMs) have made them a pivotal workload in various applications. Today, LLM inference clusters receive a large number of queries with strict Service Level Objectives (SLOs). To achieve the desired performance, these models execute on power-hungry GPUs, causing the inference clusters to consume a large amount of energy an…

    Submitted 1 August, 2024; originally announced August 2024.

  39. arXiv:2408.00483  [pdf, other]

    cs.LG cs.AI cs.CV cs.MM

    A Systematic Review on Long-Tailed Learning

    Authors: Chongsheng Zhang, George Almpanidis, Gaojuan Fan, Binquan Deng, Yanbo Zhang, Ji Liu, Aouaidjia Kamel, Paolo Soda, João Gama

    Abstract: Long-tailed data is a special type of multi-class imbalanced data with a very large number of minority/tail classes that have a very significant combined influence. Long-tailed learning aims to build high-performance models on datasets with long-tailed distributions, which can identify all the classes with high accuracy, in particular the minority/tail classes. It is a cutting-edge research direct…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Currently under revision at IEEE TNNLS. [This is the long/full-length version of our long-tailed learning survey paper]

  40. arXiv:2408.00413  [pdf, other]

    cs.IT eess.SP

    Joint Antenna Position and Beamforming Optimization with Self-Interference Mitigation in MA-ISAC System

    Authors: Size Peng, Cixiao Zhang, Yin Xu, Qingqing Wu, Xiaowu Ou, Dazhi He

    Abstract: Movable antennas (MAs) have demonstrated significant potential in enhancing the performance of integrated sensing and communication (ISAC) systems. However, their application to integrated and cost-effective full-duplex (FD) monostatic systems remains underexplored. To address this research gap, we develop an MA-ISAC model within a monostatic framework, where the self-interference channel is mod…

    Submitted 9 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

  41. arXiv:2407.21256  [pdf, other]

    cs.CV

    Leveraging Adaptive Implicit Representation Mapping for Ultra High-Resolution Image Segmentation

    Authors: Ziyu Zhao, Xiaoguang Li, Pingping Cai, Canyu Zhang, Song Wang

    Abstract: Implicit representation mapping (IRM) can translate image features to any continuous resolution, showcasing its potent capability for ultra-high-resolution image segmentation refinement. Current IRM-based methods for refining ultra-high-resolution image segmentation often rely on CNN-based encoders to extract image features and apply a Shared Implicit Representation Mapping Function (SIRMF) to con…

    Submitted 30 July, 2024; originally announced July 2024.

  42. arXiv:2407.21048  [pdf, other]

    cs.CL cs.AI

    APTNESS: Incorporating Appraisal Theory and Emotion Support Strategies for Empathetic Response Generation

    Authors: Yuxuan Hu, Minghuan Tan, Chenwei Zhang, Zixuan Li, Xiaodan Liang, Min Yang, Chengming Li, Xiping Hu

    Abstract: Empathetic response generation is designed to comprehend the emotions of others and select the most appropriate strategies to assist them in resolving emotional challenges. Empathy can be categorized into cognitive empathy and affective empathy. The former pertains to the ability to understand and discern the emotional issues and situations of others, while the latter involves the capacity to prov…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted to CIKM 2024

  43. arXiv:2407.20121  [pdf, other]

    cs.IR cs.AI

    EXIT: An EXplicit Interest Transfer Framework for Cross-Domain Recommendation

    Authors: Lei Huang, Weitao Li, Chenrui Zhang, Jinpeng Wang, Xianchun Yi, Sheng Chen

    Abstract: Cross-domain recommendation has attracted substantial interest in industrial apps such as Meituan, which serves multiple business domains via knowledge transfer and meets the diverse interests of users. However, existing methods typically follow an implicit modeling paradigm that blends the knowledge from both the source and target domains, and design intricate network structures to share learned…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted at CIKM 2024

  44. arXiv:2407.19984  [pdf, other]

    cs.CL

    Confidence Estimation for Automatic Detection of Depression and Alzheimer's Disease Based on Clinical Interviews

    Authors: Wen Wu, Chao Zhang, Philip C. Woodland

    Abstract: Speech-based automatic detection of Alzheimer's disease (AD) and depression has attracted increased attention. Confidence estimation is crucial for a trustworthy automatic diagnostic system, which informs the clinician about the confidence of model predictions and helps reduce the risk of misdiagnosis. This paper investigates confidence estimation for automatic detection of AD and depression based…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by Interspeech 2024

  45. arXiv:2407.19507  [pdf, other]

    cs.CV cs.AI

    WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting

    Authors: Jingjing Wu, Zhengyao Fang, Pengyuan Lyu, Chengquan Zhang, Fanglin Chen, Guangming Lu, Wenjie Pei

    Abstract: Transcription-only Supervised Text Spotting aims to learn text spotters relying only on transcriptions but no text boundaries for supervision, thus eliminating expensive boundary annotation. The crux of this task lies in locating each transcription in scene text images without location annotations. In this work, we formulate this challenging problem as a Weakly Supervised Cross-modality Contrastiv…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  46. arXiv:2407.18957  [pdf, other]

    q-fin.TR cs.AI cs.MA

    When AI Meets Finance (StockAgent): Large Language Model-based Stock Trading in Simulated Real-world Environments

    Authors: Chong Zhang, Xinyi Liu, Mingyu Jin, Zhongmou Zhang, Lingyao Li, Zhenting Wang, Wenyue Hua, Dong Shu, Suiyuan Zhu, Xiaobo Jin, Sujian Li, Mengnan Du, Yongfeng Zhang

    Abstract: Can AI Agents simulate real-world trading environments to investigate the impact of external factors on stock trading activities (e.g., macroeconomics, policy changes, company fundamentals, and global events)? These factors, which frequently influence trading behaviors, are critical elements in the quest for maximizing investors' profits. Our work attempts to solve this problem through large langu…

    Submitted 1 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 33 pages, 10 figures

  47. arXiv:2407.17889  [pdf]

    cs.NE

    An Error Discovery and Correction for the Family of V-Shaped BPSO Algorithms

    Authors: Qing Zhao, Chengkui Zhang, Hao Li, Ting Ke

    Abstract: The BPSO algorithm is a swarm intelligence optimization algorithm characterized by good optimization performance, high efficiency, and ease of implementation. In recent years, it has been used to optimize a variety of machine learning and deep learning models, such as CNNs, LSTMs, and SVMs. However, it easily falls into local optima due to its lack of exploitation ability. It is found that in the arti…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 25 pages, 11 figures

  48. arXiv:2407.17792  [pdf, other]

    cs.CV

    Harnessing Temporal Causality for Advanced Temporal Action Detection

    Authors: Shuming Liu, Lin Sui, Chen-Lin Zhang, Fangzhou Mu, Chen Zhao, Bernard Ghanem

    Abstract: As a fundamental task in long-form video understanding, temporal action detection (TAD) aims to capture inherent temporal relations in untrimmed videos and identify candidate actions with precise boundaries. Over the years, various networks, including convolutions, graphs, and transformers, have been explored for effective temporal modeling for TAD. However, these modules typically treat past and…

    Submitted 25 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: 1st in Moment Queries track at the Ego4D Challenge 2024; 1st in Action Recognition, Action Detection, and Audio-Based Interaction Detection tracks at the EPIC-Kitchens Challenge 2024

  49. arXiv:2407.17674  [pdf, other]

    cs.LG q-bio.BM

    Synthetic High-resolution Cryo-EM Density Maps with Generative Adversarial Networks

    Authors: Chenwei Zhang, Anne Condon, Khanh Dao Duc

    Abstract: Generating synthetic cryogenic electron microscopy (cryo-EM) 3D density maps from molecular structures has potentially important applications in structural biology. Yet existing simulation-based methods cannot mimic all the complex features present in experimental maps, such as secondary structure elements. As an alternative, we propose struc2mapGAN, a novel data-driven method that employs a generat…

    Submitted 24 July, 2024; originally announced July 2024.

  50. TWIN V2: Scaling Ultra-Long User Behavior Sequence Modeling for Enhanced CTR Prediction at Kuaishou

    Authors: Zihua Si, Lin Guan, ZhongXiang Sun, Xiaoxue Zang, Jing Lu, Yiqun Hui, Xingchao Cao, Zeyu Yang, Yichen Zheng, Dewei Leng, Kai Zheng, Chenbin Zhang, Yanan Niu, Yang Song, Kun Gai

    Abstract: The significance of modeling long-term user interests for CTR prediction tasks in large-scale recommendation systems is progressively gaining attention among researchers and practitioners. Existing work, such as SIM and TWIN, typically employs a two-stage approach to model long-term user behavior sequences for efficiency concerns. The first stage rapidly retrieves a subset of sequences related to…

    Submitted 16 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by CIKM 2024