Showing 1–50 of 220 results for author: Yan, B

Searching in archive cs.
  1. arXiv:2408.07452  [pdf, other]

    cs.CL cs.AI

    CMU's IWSLT 2024 Simultaneous Speech Translation System

    Authors: Xi Xu, Siqi Ouyang, Brian Yan, Patrick Fernandes, William Chen, Lei Li, Graham Neubig, Shinji Watanabe

    Abstract: This paper describes CMU's submission to the IWSLT 2024 Simultaneous Speech Translation (SST) task for translating English speech to German text in a streaming manner. Our end-to-end speech-to-text (ST) system integrates the WavLM speech encoder, a modality adapter, and the Llama2-7B-Base model as the decoder. We employ a two-stage training approach: initially, we align the representations of spee…

    Submitted 14 August, 2024; originally announced August 2024.

  2. arXiv:2408.07107  [pdf, other]

    cs.LG

    Maximizing V-information for Pre-training Superior Foundation Models

    Authors: Wenxuan Yang, Weimin Tan, Hanyu Zhang, Bo Yan

    Abstract: Pre-training foundation models on large-scale datasets demonstrates exceptional performance. However, recent research questions this traditional notion, exploring whether an increase in pre-training data always leads to enhanced model performance. To address this issue, data-effective learning approaches have been introduced. However, current methods in this area lack a clear standard for sample s…

    Submitted 16 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

  3. arXiv:2408.02928  [pdf, other]

    cs.DB

    PGB: Benchmarking Differentially Private Synthetic Graph Generation Algorithms

    Authors: Shang Liu, Hao Du, Yang Cao, Bo Yan, Jinfei Liu, Masatoshi Yoshikawa

    Abstract: Differentially private graph analysis is a powerful tool for deriving insights from diverse graph data while protecting individual information. Designing private analytic algorithms for different graph queries often requires starting from scratch. In contrast, differentially private synthetic graph generation offers a general paradigm that supports one-time generation for multiple queries. Althoug…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 12 pages

  4. arXiv:2407.20494  [pdf, other]

    cs.NI

    MLOPS in a multicloud environment: Typical Network Topology

    Authors: Boyang Yan

    Abstract: As artificial intelligence, machine learning, and data science continue to drive the data-centric economy, the challenges of implementing machine learning on a single machine due to extensive data and computational needs have led to the adoption of cloud computing solutions. This research paper explores the design and implementation of a secure, cloud-native machine learning operations (MLOPS) pip…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 56 pages

  5. arXiv:2407.18772  [pdf, other]

    cs.LG cs.CY cs.SI

    Learning production functions for supply chains with graph neural networks

    Authors: Serina Chang, Zhiyin Lin, Benjamin Yan, Swapnil Bembde, Qi Xiu, Chi Heem Wong, Yu Qin, Frank Kloster, Alex Luo, Raj Palleti, Jure Leskovec

    Abstract: The global economy relies on the flow of goods over supply chain networks, with nodes as firms and edges as transactions between firms. While we may observe these external transactions, they are governed by unseen production functions, which determine how firms internally transform the input products they receive into output products that they sell. In this setting, it can be extremely valuable to…

    Submitted 26 July, 2024; originally announced July 2024.

  6. arXiv:2407.15167  [pdf]

    cs.CV

    The VEP Booster: A Closed-Loop AI System for Visual EEG Biomarker Auto-generation

    Authors: Junwen Luo, Chengyong Jiang, Qingyuan Chen, Dongqi Han, Yansen Wang, Biao Yan, Dongsheng Li, Jiayi Zhang

    Abstract: Effective visual brain-machine interfaces (BMI) is based on reliable and stable EEG biomarkers. However, traditional adaptive filter-based approaches may suffer from individual variations in EEG signals, while deep neural network-based approaches may be hindered by the non-stationarity of EEG signals caused by biomarker attenuation and background oscillations. To address these challenges, we propo…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 19 pages, 6 figures

  7. arXiv:2407.13768  [pdf, other]

    cs.CV cs.AI

    Addressing Imbalance for Class Incremental Learning in Medical Image Classification

    Authors: Xuze Hao, Wenqian Ni, Xuhao Jiang, Weimin Tan, Bo Yan

    Abstract: Deep convolutional neural networks have made significant breakthroughs in medical image classification, under the assumption that training samples from all classes are simultaneously available. However, in real-world medical scenarios, there's a common need to continuously learn about new diseases, leading to the emerging field of class incremental learning (CIL) in the medical domain. Typically,…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  8. arXiv:2407.02301  [pdf, other]

    cs.CL

    CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models

    Authors: Ying Nie, Binwei Yan, Tianyu Guo, Hao Liu, Haoyu Wang, Wei He, Binfan Zheng, Weihao Wang, Qiang Li, Weijian Sun, Yunhe Wang, Dacheng Tao

    Abstract: Large language models (LLMs) have achieved remarkable performance on various NLP tasks, yet their potential in more challenging and domain-specific task, such as finance, has not been fully explored. In this paper, we present CFinBench: a meticulously crafted, the most comprehensive evaluation benchmark to date, for assessing the financial knowledge of LLMs under Chinese context. In practice, to b…

    Submitted 2 July, 2024; originally announced July 2024.

  9. arXiv:2407.01885  [pdf, other]

    cs.CL cs.AI

    Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application

    Authors: Chuanpeng Yang, Wang Lu, Yao Zhu, Yidong Wang, Qian Chen, Chenlong Gao, Bingjie Yan, Yiqiang Chen

    Abstract: Large Language Models (LLMs) have showcased exceptional capabilities in various domains, attracting significant interest from both academia and industry. Despite their impressive performance, the substantial size and computational demands of LLMs pose considerable challenges for practical deployment, particularly in environments with limited resources. The endeavor to compress language models whil…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 28 pages

  10. arXiv:2406.18849  [pdf, other]

    cs.CV

    Dysca: A Dynamic and Scalable Benchmark for Evaluating Perception Ability of LVLMs

    Authors: Jie Zhang, Zhongqi Wang, Mengqi Lei, Zheng Yuan, Bei Yan, Shiguang Shan, Xilin Chen

    Abstract: Currently many benchmarks have been proposed to evaluate the perception ability of the Large Vision-Language Models (LVLMs). However, most benchmarks conduct questions by selecting images from existing datasets, resulting in the potential data leakage. Besides, these benchmarks merely focus on evaluating LVLMs on the realistic style images and clean scenarios, leaving the multi-stylized images and…

    Submitted 25 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  11. arXiv:2406.17115  [pdf, other]

    cs.CV cs.AI

    Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models

    Authors: Bei Yan, Jie Zhang, Zheng Yuan, Shiguang Shan, Xilin Chen

    Abstract: Despite the rapid progress and outstanding performance of Large Vision-Language Models (LVLMs) in recent years, LVLMs have been plagued by the issue of hallucination, i.e., LVLMs tend to generate responses that are inconsistent with the corresponding visual inputs. To evaluate the degree of hallucination in LVLMs, previous works have proposed a series of benchmarks featuring different types of tas…

    Submitted 24 June, 2024; originally announced June 2024.

  12. arXiv:2406.02950  [pdf, other]

    eess.AS cs.CL cs.SD

    4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders

    Authors: Yui Sudo, Muhammad Shakeel, Yosuke Fukumoto, Brian Yan, Jiatong Shi, Yifan Peng, Shinji Watanabe

    Abstract: End-to-end automatic speech recognition (E2E-ASR) can be classified into several network architectures, such as connectionist temporal classification (CTC), recurrent neural network transducer (RNN-T), attention-based encoder-decoder, and mask-predict models. Each network architecture has advantages and disadvantages, leading practitioners to switch between these different models depending on appl…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: submitted to IEEE/ACM Transactions on Audio Speech and Language Processing

  13. arXiv:2406.02859   

    eess.AS cs.SD

    ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization

    Authors: Bi-Cheng Yan, Wei-Cheng Chao, Jiun-Ting Li, Yi-Cheng Wang, Hsin-Wei Wang, Meng-Shin Lin, Berlin Chen

    Abstract: Automatic pronunciation assessment (APA) manages to evaluate the pronunciation proficiency of a second language (L2) learner in a target language. Existing efforts typically draw on regression models for proficiency score prediction, where the models are trained to estimate target values without explicitly accounting for phoneme-awareness in the feature space. In this paper, we propose a contrasti…

    Submitted 8 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: This paper has been withdrawn because the authors aim to achieve better organization in writing and more detailed experimental analysis

  14. arXiv:2406.01224  [pdf, other]

    cs.CL

    Demonstration Augmentation for Zero-shot In-context Learning

    Authors: Yi Su, Yunpeng Tai, Yixin Ji, Juntao Li, Bowen Yan, Min Zhang

    Abstract: Large Language Models (LLMs) have demonstrated an impressive capability known as In-context Learning (ICL), which enables them to acquire knowledge from textual demonstrations without the need for parameter updates. However, many studies have highlighted that the model's performance is sensitive to the choice of demonstrations, presenting a significant challenge for practical applications where we…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Findings

  15. arXiv:2405.16883  [pdf, other]

    cs.LG cs.AI cs.MS cs.PL

    Scorch: A Library for Sparse Deep Learning

    Authors: Bobby Yan, Alexander J. Root, Trevor Gale, David Broman, Fredrik Kjolstad

    Abstract: The rapid growth in the size of deep learning models strains the capabilities of traditional dense computation paradigms. Leveraging sparse computation has become increasingly popular for training and deploying large-scale models, but existing deep learning frameworks lack extensive support for sparse operations. To bridge this gap, we introduce Scorch, a library that seamlessly integrates efficie…

    Submitted 20 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 25 pages, 8 figures

  16. arXiv:2405.11408  [pdf, other]

    cs.NI

    Workload Prediction in P4 Programmable Switches

    Authors: Boyang Yan

    Abstract: The rapid expansion of cloud services and their unpredictable workload demands present significant challenges in resource management. Traditional resource management approaches, primarily based on static rules and thresholds, often fail to ensure cost-effectiveness and optimal resource utilization. This research introduces a predictive model designed to forecast traffic demand, aiming to shift fro…

    Submitted 29 July, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: 15 pages

    ACM Class: C.2.3

  17. arXiv:2405.05745  [pdf, other]

    cs.CV

    Efficient Pretraining Model based on Multi-Scale Local Visual Field Feature Reconstruction for PCB CT Image Element Segmentation

    Authors: Chen Chen, Kai Qiao, Jie Yang, Jian Chen, Bin Yan

    Abstract: Element segmentation is a key step in nondestructive testing of Printed Circuit Boards (PCB) based on Computed Tomography (CT) technology. In recent years, the rapid development of self-supervised pretraining technology can obtain general image features without labeled samples, and then use a small amount of labeled samples to solve downstream tasks, which has a good potential in PCB element segme…

    Submitted 9 May, 2024; originally announced May 2024.

  18. arXiv:2405.03911  [pdf, other]

    cs.LG cs.AI cs.CR cs.DC

    Federated Graph Condensation with Information Bottleneck Principles

    Authors: Bo Yan

    Abstract: Graph condensation, which reduces the size of a large-scale graph by synthesizing a small-scale condensed graph as its substitution, has immediately benefited various graph learning tasks. However, existing graph condensation methods rely on centralized data storage, which is unfeasible for real-world decentralized data distribution, and overlook data holders' privacy-preserving requirements. To b…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 13 pages

  19. arXiv:2405.03419  [pdf, other]

    cs.NE cs.LG

    Automated Metaheuristic Algorithm Design with Autoregressive Learning

    Authors: Qi Zhao, Tengfei Liu, Bai Yan, Qiqi Duan, Jian Yang, Yuhui Shi

    Abstract: Automated design of metaheuristic algorithms offers an attractive avenue to reduce human effort and gain enhanced performance beyond human intuition. Current automated methods design algorithms within a fixed structure and operate from scratch. This poses a clear gap towards fully discovering potentials over the metaheuristic family and fertilizing from prior design experience. To bridge the gap,…

    Submitted 6 May, 2024; originally announced May 2024.

  20. arXiv:2405.00452  [pdf, other]

    cs.CV

    Predictive Accuracy-Based Active Learning for Medical Image Segmentation

    Authors: Jun Shi, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Hong An, Xudong Xue, Bing Yan

    Abstract: Active learning is considered a viable solution to alleviate the contradiction between the high dependency of deep learning-based segmentation methods on annotated data and the expensive pixel-level annotation cost of medical images. However, most existing methods suffer from unreliable uncertainty assessment and the struggle to balance diversity and informativeness, leading to poor performance in…

    Submitted 29 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

    Comments: 9 pages, 4 figures

  21. arXiv:2404.09497  [pdf, other]

    cs.AR

    Towards Efficient SRAM-PIM Architecture Design by Exploiting Unstructured Bit-Level Sparsity

    Authors: Cenlin Duan, Jianlei Yang, Yiou Wang, Yikun Wang, Yingjie Qi, Xiaolin He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weisheng Zhao

    Abstract: Bit-level sparsity in neural network models harbors immense untapped potential. Eliminating redundant calculations of randomly distributed zero-bits significantly boosts computational efficiency. Yet, traditional digital SRAM-PIM architecture, limited by rigid crossbar architecture, struggles to effectively exploit this unstructured sparsity. To address this challenge, we propose Dyadic Block PIM…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by DAC'24

  22. arXiv:2404.09149  [pdf, other]

    eess.SY cs.NE math.NA

    Heuristic Solution to Joint Deployment and Beamforming Design for STAR-RIS Aided Networks

    Authors: Bai Yan, Qi Zhao, Jin Zhang, J. Andrew Zhang

    Abstract: This paper tackles the deployment challenges of Simultaneous Transmitting and Reflecting Reconfigurable Intelligent Surface (STAR-RIS) in communication systems. Unlike existing works that use fixed deployment setups or solely optimize the location, this paper emphasizes the joint optimization of the location and orientation of STAR-RIS. This enables searching across all user grouping possibilities…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: 30 pages

  23. arXiv:2404.02538  [pdf, other]

    stat.ML cs.LG

    Convergence Analysis of Flow Matching in Latent Space with Transformers

    Authors: Yuling Jiao, Yanming Lai, Yang Wang, Bokai Yan

    Abstract: We present theoretical convergence guarantees for ODE-based generative models, specifically flow matching. We use a pre-trained autoencoder network to map high-dimensional original inputs to a low-dimensional latent space, where a transformer network is trained to predict the velocity field of the transformation from a standard normal distribution to the target latent distribution. Our error analy…

    Submitted 28 April, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

  24. arXiv:2403.17645  [pdf]

    cs.CL

    DANCER: Entity Description Augmented Named Entity Corrector for Automatic Speech Recognition

    Authors: Yi-Cheng Wang, Hsin-Wei Wang, Bi-Cheng Yan, Chi-Han Lin, Berlin Chen

    Abstract: End-to-end automatic speech recognition (E2E ASR) systems often suffer from mistranscription of domain-specific phrases, such as named entities, sometimes leading to catastrophic failures in downstream tasks. A family of fast and lightweight named entity correction (NEC) models for ASR have recently been proposed, which normally build on phonetic-level edit distance algorithms and have shown impre…

    Submitted 11 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted by LREC-COLING 2024

  25. arXiv:2403.12695  [pdf, other]

    eess.IV cs.CV cs.LG

    Federated Semi-supervised Learning for Medical Image Segmentation with intra-client and inter-client Consistency

    Authors: Yubin Zheng, Peng Tang, Tianjie Ju, Weidong Qiu, Bo Yan

    Abstract: Medical image segmentation plays a vital role in clinic disease diagnosis and medical image analysis. However, labeling medical images for segmentation task is tough due to the indispensable domain expertise of radiologists. Furthermore, considering the privacy and sensitivity of medical images, it is impractical to build a centralized segmentation dataset from different medical institutions. Fede…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Work in progress

  26. Efficient size-prescribed $k$-core search

    Authors: Yiping Liu, Bo Yan, Bo Zhao, Hongyi Su, Yang Chen, Michael Witbrock

    Abstract: $k$-core is a subgraph where every node has at least $k$ neighbors within the subgraph. The $k$-core subgraphs has been employed in large platforms like Network Repository to comprehend the underlying structures and dynamics of the network. Existing studies have primarily focused on finding $k…

    Submitted 14 March, 2024; originally announced March 2024.

  27. arXiv:2403.05156  [pdf, other]

    cs.CR

    On Protecting the Data Privacy of Large Language Models (LLMs): A Survey

    Authors: Biwei Yan, Kun Li, Minghui Xu, Yueyan Dong, Yue Zhang, Zhaochun Ren, Xiuzhen Cheng

    Abstract: Large language models (LLMs) are complex artificial intelligence systems capable of understanding, generating and translating human language. They learn language patterns by analyzing large amounts of text data, allowing them to perform writing, conversation, summarizing and other language tasks. When LLMs process and generate large amounts of data, there is a risk of leaking sensitive information…

    Submitted 14 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

    Comments: 18 pages, 4 figures

  28. arXiv:2402.16602  [pdf, other]

    cs.CL

    Rethinking Negative Instances for Generative Named Entity Recognition

    Authors: Yuyang Ding, Juntao Li, Pinzheng Wang, Zecheng Tang, Bowen Yan, Min Zhang

    Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities for generalizing in unseen tasks. In the Named Entity Recognition (NER) task, recent advancements have seen the remarkable improvement of LLMs in a broad range of entity domains via instruction tuning, by adopting entity-centric schema. In this work, we explore the potential enhancement of the existing methods by incorporating…

    Submitted 18 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: ACL 2024 Findings

  29. arXiv:2402.04557  [pdf]

    physics.chem-ph cs.LG

    An Artificial Intelligence (AI) workflow for catalyst design and optimization

    Authors: Nung Siong Lai, Yi Shen Tew, Xialin Zhong, Jun Yin, Jiali Li, Binhang Yan, Xiaonan Wang

    Abstract: In the pursuit of novel catalyst development to address pressing environmental concerns and energy demand, conventional design and optimization methods often fall short due to the complexity and vastness of the catalyst parameter space. The advent of Machine Learning (ML) has ushered in a new era in the field of catalyst optimization, offering potential solutions to the shortcomings of traditional…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 31 pages, 7 figures

    Journal ref: Ind. Eng. Chem. Res. 2023, 62, 43, 17835-17848

  30. STAR: An Efficient Softmax Engine for Attention Model with RRAM Crossbar

    Authors: Yifeng Zhai, Bing Li, Bonan Yan, Jing Wang

    Abstract: RRAM crossbars have been studied to construct in-memory accelerators for neural network applications due to their in-situ computing capability. However, prior RRAM-based accelerators show efficiency degradation when executing the popular attention models. We observed that the frequent softmax operations arise as the efficiency bottleneck and also are insensitive to computing precision. Thus, we pr…

    Submitted 30 January, 2024; originally announced January 2024.

    Journal ref: 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)

  31. arXiv:2401.17542  [pdf, other]

    cs.LG cs.AI cs.CV

    A Medical Data-Effective Learning Benchmark for Highly Efficient Pre-training of Foundation Models

    Authors: Wenxuan Yang, Weimin Tan, Yuqi Sun, Bo Yan

    Abstract: Foundation models, pre-trained on massive datasets, have achieved unprecedented generalizability. However, is it truly necessary to involve such vast amounts of data in pre-training, consuming extensive computational resources? This paper introduces data-effective learning, aiming to use data in the most impactful way to pre-train foundation models. This involves strategies that focus on data qual…

    Submitted 16 August, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

  32. arXiv:2401.16658  [pdf, ps, other]

    cs.CL eess.AS

    OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

    Authors: Yifan Peng, Jinchuan Tian, William Chen, Siddhant Arora, Brian Yan, Yui Sudo, Muhammad Shakeel, Kwanghee Choi, Jiatong Shi, Xuankai Chang, Jee-weon Jung, Shinji Watanabe

    Abstract: Recent studies have highlighted the importance of fully open foundation models. The Open Whisper-style Speech Model (OWSM) is an initial step towards reproducing OpenAI Whisper using public data and open-source toolkits. However, previous versions of OWSM (v1 to v3) are still based on standard Transformer, which might lead to inferior performance compared to state-of-the-art speech encoder archite…

    Submitted 16 June, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted at INTERSPEECH 2024. Webpage: https://www.wavlab.org/activities/2024/owsm/

  33. arXiv:2401.13499  [pdf, other]

    cs.CV

    LDCA: Local Descriptors with Contextual Augmentation for Few-Shot Learning

    Authors: Maofa Wang, Bingchen Yan

    Abstract: Few-shot image classification has emerged as a key challenge in the field of computer vision, highlighting the capability to rapidly adapt to new tasks with minimal labeled data. Existing methods predominantly rely on image-level features or local descriptors, often overlooking the holistic context surrounding these descriptors. In this work, we introduce a novel approach termed "Local Descriptor…

    Submitted 24 January, 2024; originally announced January 2024.

  34. arXiv:2401.11459  [pdf, other]

    cs.AR cs.AI cs.LG

    AttentionLego: An Open-Source Building Block For Spatially-Scalable Large Language Model Accelerator With Processing-In-Memory Technology

    Authors: Rongqing Cong, Wenyang He, Mingxuan Li, Bangning Luo, Zebin Yang, Yuchao Yang, Ru Huang, Bonan Yan

    Abstract: Large language models (LLMs) with Transformer architectures have become phenomenal in natural language processing, multimodal generative artificial intelligence, and agent-oriented artificial intelligence. The self-attention module is the most dominating sub-structure inside Transformer-based LLMs. Computation using general-purpose graphics processing units (GPUs) inflicts reckless demand for I/O…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: for associated source codes, see https://bonany.cc/attentionleg

  35. FedRFQ: Prototype-Based Federated Learning with Reduced Redundancy, Minimal Failure, and Enhanced Quality

    Authors: Biwei Yan, Hongliang Zhang, Minghui Xu, Dongxiao Yu, Xiuzhen Cheng

    Abstract: Federated learning is a powerful technique that enables collaborative learning among different clients. Prototype-based federated learning is a specific approach that improves the performance of local models under non-IID (non-Independently and Identically Distributed) settings by integrating class prototypes. However, prototype-based federated learning faces several challenges, such as prototype…

    Submitted 15 January, 2024; originally announced January 2024.

  36. arXiv:2401.03851  [pdf, other]

    cs.CV q-bio.NC

    Aligned with LLM: a new multi-modal training paradigm for encoding fMRI activity in visual cortex

    Authors: Shuxiao Ma, Linyuan Wang, Senbao Hou, Bin Yan

    Abstract: Recently, there has been a surge in the popularity of pre trained large language models (LLMs) (such as GPT-4), sweeping across the entire Natural Language Processing (NLP) and Computer Vision (CV) communities. These LLMs have demonstrated advanced multi-modal understanding capabilities and showcased strong performance across various benchmarks. The LLM has started to embody traits of artificial g…

    Submitted 8 January, 2024; originally announced January 2024.

  37. arXiv:2312.15715  [pdf, other]

    cs.CV

    UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces

    Authors: Jiannan Wu, Yi Jiang, Bin Yan, Huchuan Lu, Zehuan Yuan, Ping Luo

    Abstract: The reference-based object segmentation tasks, namely referring image segmentation (RIS), few-shot image segmentation (FSS), referring video object segmentation (RVOS), and video object segmentation (VOS), aim to segment a specific object by utilizing either language or annotated masks as references. Despite significant progress in each respective field, current methods are task-specifically desig…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: Extended version of ICCV2023 UniRef. 20 pages

  38. arXiv:2312.10890  [pdf, other]

    cs.CV cs.GR

    Low-latency Space-time Supersampling for Real-time Rendering

    Authors: Ruian He, Shili Zhou, Yuqi Sun, Ri Cheng, Weimin Tan, Bo Yan

    Abstract: With the rise of real-time rendering and the evolution of display devices, there is a growing demand for post-processing methods that offer high-resolution content in a high frame rate. Existing techniques often suffer from quality and latency issues due to the disjointed treatment of frame supersampling and extrapolation. In this paper, we recognize the shared context and mechanisms between frame…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: Accepted to AAAI 2024

  39. FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge

    Authors: Jiahe Lan, Jie Wang, Baochen Yan, Zheng Yan, Elisa Bertino

    Abstract: Speech recognition systems driven by DNNs have revolutionized human-computer interaction through voice interfaces, which significantly facilitate our daily lives. However, the growing popularity of these systems also raises special concerns on their security, particularly regarding backdoor attacks. A backdoor attack inserts one or more hidden backdoors into a DNN model during its training process…

    Submitted 5 July, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

    Comments: To appear at IEEE Symposium on Security & Privacy (Oakland) 2024

  40. arXiv:2312.07180  [pdf, other]

    cs.CV

    Context-Aware Iteration Policy Network for Efficient Optical Flow Estimation

    Authors: Ri Cheng, Ruian He, Xuhao Jiang, Shili Zhou, Weimin Tan, Bo Yan

    Abstract: Existing recurrent optical flow estimation networks are computationally expensive since they use a fixed large number of iterations to update the flow field for each sample. An efficient network should skip iterations when the flow improvement is limited. In this paper, we develop a Context-Aware Iteration Policy Network for efficient optical flow estimation, which determines the optimal number of…

    Submitted 5 January, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: 2024, Association for the Advancement of Artificial Intelligence

  41. arXiv:2312.04747  [pdf, other]

    cs.NI cs.SE stat.ME

    MetaDetect: Metamorphic Testing Based Anomaly Detection for Multi-UAV Wireless Networks

    Authors: Boyang Yan

    Abstract: The reliability of wireless Ad Hoc Networks (WANET) communication is much lower than wired networks. WANET will be impacted by node overload, routing protocol, weather, obstacle blockage, and many other factors, all those anomalies cannot be avoided. Accurate prediction of the network entirely stopping in advance is essential after people could do networking re-routing or changing to different ban…

    Submitted 7 December, 2023; originally announced December 2023.

    Comments: 9 pages, 7 figures

    MSC Class: 68-06

  42. arXiv:2311.10865  [pdf]

    cs.CV

    Zero-Shot Digital Rock Image Segmentation with a Fine-Tuned Segment Anything Model

    Authors: Zhaoyang Ma, Xupeng He, Shuyu Sun, Bicheng Yan, Hyung Kwak, Jun Gao

    Abstract: Accurate image segmentation is crucial in reservoir modelling and material characterization, enhancing oil and gas extraction efficiency through detailed reservoir models. This precision offers insights into rock properties, advancing digital rock physics understanding. However, creating pixel-level annotations for complex CT and SEM rock images is challenging due to their size and low contrast, l…

    Submitted 17 November, 2023; originally announced November 2023.

  43. arXiv:2311.09656  [pdf, other]

    cs.CL cs.AI

    Structured Chemistry Reasoning with Large Language Models

    Authors: Siru Ouyang, Zhuosheng Zhang, Bing Yan, Xuan Liu, Yejin Choi, Jiawei Han, Lianhui Qin

    Abstract: Large Language Models (LLMs) excel in diverse areas, yet struggle with complex scientific reasoning, especially in the field of chemistry. Different from the simple chemistry tasks (e.g., molecule classification) addressed in previous studies, complex chemistry problems require not only vast knowledge and precise calculation, but also compositional reasoning about rich dynamic interactions of diff…

    Submitted 9 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Work in progress

  44. arXiv:2311.06079  [pdf]

    cs.CV eess.IV

    Enhancing Rock Image Segmentation in Digital Rock Physics: A Fusion of Generative AI and State-of-the-Art Neural Networks

    Authors: Zhaoyang Ma, Xupeng He, Hyung Kwak, Jun Gao, Shuyu Sun, Bicheng Yan

    Abstract: In digital rock physics, analysing microstructures from CT and SEM scans is crucial for estimating properties like porosity and pore connectivity. Traditional segmentation methods like thresholding and CNNs often fall short in accurately detailing rock microstructures and are prone to noise. U-Net improved segmentation accuracy but required many expert-annotated samples, a laborious and error-pron…

    Submitted 10 November, 2023; originally announced November 2023.

  45. arXiv:2310.20424  [pdf, other]

    cs.AR cs.LG

    DDC-PIM: Efficient Algorithm/Architecture Co-design for Doubling Data Capacity of SRAM-based Processing-In-Memory

    Authors: Cenlin Duan, Jianlei Yang, Xiaolin He, Yingjie Qi, Yikun Wang, Yiou Wang, Ziyan He, Bonan Yan, Xueyan Wang, Xiaotao Jia, Weitao Pan, Weisheng Zhao

    Abstract: Processing-in-memory (PIM), as a novel computing paradigm, provides significant performance benefits from the aspect of effective data movement reduction. SRAM-based PIM has been demonstrated as one of the most promising candidates due to its endurance and compatibility. However, the integration density of SRAM-based PIM is much lower than other non-volatile memory-based ones, due to its inherent…

    Submitted 31 October, 2023; originally announced October 2023.

    Comments: 14 pages, to be published in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)

  46. arXiv:2310.17811  [pdf, other]

    cs.AI cs.CL

    Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting

    Authors: Benjamin Yan, Ruochen Liu, David E. Kuo, Subathra Adithan, Eduardo Pontes Reis, Stephen Kwak, Vasantha Kumar Venugopal, Chloe P. O'Connell, Agustina Saenz, Pranav Rajpurkar, Michael Moor

    Abstract: Automatically generated reports from medical images promise to improve the workflow of radiologists. Existing methods consider an image-to-report modeling task by directly generating a fully-fledged report from an image. However, this conflates the content of the report (e.g., findings and their attributes) with its style (e.g., format and choice of words), which can lead to clinically inaccurate…

    Submitted 31 October, 2023; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to Findings of EMNLP 2023

  47. arXiv:2310.16298  [pdf, other]

    cs.DC

    Stencil Matrixization

    Authors: Wenxuan Zhao, Liang Yuan, Baicheng Yan, Penghao Ma, Yunquan Zhang, Long Wang, Zhe Wang

    Abstract: Current architectures are now equipped with matrix computation units designed to enhance AI and high-performance computing applications. Within these architectures, two fundamental instruction types are matrix multiplication and vector outer product, with the latter being lighter due to its vector inputs. This characteristic not only allows for the development of flexible algorithms beyond dense l…

    Submitted 1 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

  48. arXiv:2310.11730  [pdf, other]

    cs.LG cs.AI cs.CR cs.DC

    Federated Heterogeneous Graph Neural Network for Privacy-preserving Recommendation

    Authors: Bo Yan, Yang Cao, Haoyu Wang, Wenchuan Yang, Junping Du, Chuan Shi

    Abstract: The heterogeneous information network (HIN), which contains rich semantics depicted by meta-paths, has emerged as a potent tool for mitigating data sparsity in recommender systems. Existing HIN-based recommender systems operate under the assumption of centralized storage and model training. However, real-world data is often distributed due to privacy concerns, leading to the semantic broken issue…

    Submitted 28 February, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Comments: Accepted by WWW 2024

  49. arXiv:2310.01839  [pdf]

    eess.AS cs.CL cs.SD

    Preserving Phonemic Distinctions for Ordinal Regression: A Novel Loss Function for Automatic Pronunciation Assessment

    Authors: Bi-Cheng Yan, Hsin-Wei Wang, Yi-Cheng Wang, Jiun-Ting Li, Chi-Han Lin, Berlin Chen

    Abstract: Automatic pronunciation assessment (APA) manages to quantify the pronunciation proficiency of a second language (L2) learner in a language. Prevailing approaches to APA normally leverage neural models trained with a regression loss function, such as the mean-squared error (MSE) loss, for proficiency level prediction. Despite most regression models can effectively capture the ordinality of proficie…

    Submitted 4 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU 2023

  50. arXiv:2309.15826  [pdf, other]

    cs.CL cs.SD eess.AS

    Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

    Authors: Brian Yan, Xuankai Chang, Antonios Anastasopoulos, Yuya Fujita, Shinji Watanabe

    Abstract: Recent works in end-to-end speech-to-text translation (ST) have proposed multi-tasking methods with soft parameter sharing which leverage machine translation (MT) data via secondary encoders that map text inputs to an eventual cross-modal representation. In this work, we instead propose a ST/MT multi-tasking framework with hard parameter sharing in which all model parameters are shared cross-modal…

    Submitted 27 September, 2023; originally announced September 2023.