
Showing 1–50 of 821 results for author: Huang, W

Searching in archive cs.
  1. arXiv:2408.10470  [pdf, other]

    cs.RO

    Inverse Design of Snap-Actuated Jumping Robots Powered by Mechanics-Aided Machine Learning

    Authors: Dezhong Tong, Zhuonan Hao, Mingchao Liu, Weicheng Huang

    Abstract: Exploring the design and control strategies of soft robots through simulation is highly attractive due to its cost-effectiveness. Although many existing models (e.g., finite element analysis) are effective for simulating soft robotic dynamics, there remains a need for a general and efficient numerical simulation approach in the soft robotics community. In this paper, we develop a discrete differen…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 8 pages, 6 figures

  2. arXiv:2408.10115  [pdf, other]

    cs.CL

    GLIMMER: Incorporating Graph and Lexical Features in Unsupervised Multi-Document Summarization

    Authors: Ran Liu, Ming Liu, Min Yu, Jianguo Jiang, Gang Li, Dan Zhang, Jingyuan Li, Xiang Meng, Weiqing Huang

    Abstract: Pre-trained language models are increasingly being used in multi-document summarization tasks. However, these models need large-scale corpora for pre-training and are domain-dependent. Other non-neural unsupervised summarization approaches mostly rely on key sentence extraction, which can lead to information loss. To address these challenges, we propose a lightweight yet effective unsupervised app…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 19 pages, 7 figures. Accepted by ECAI 2024

  3. arXiv:2408.09839  [pdf, other]

    cs.CV cs.AI

    Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving

    Authors: Jun Yan, Pengyu Wang, Danni Wang, Weiquan Huang, Daniel Watzenig, Huilin Yin

    Abstract: Semantic segmentation is a significant perception task in autonomous driving. It suffers from the risks of adversarial examples. In the past few years, deep learning has gradually transitioned from convolutional neural network (CNN) models with a relatively small number of parameters to foundation models with a huge number of parameters. The segment-anything model (SAM) is a generalized image segm…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted to IAVVC 2024

  4. arXiv:2408.08108  [pdf, other]

    cs.CV

    Unsupervised Part Discovery via Dual Representation Alignment

    Authors: Jiahao Xia, Wenjian Huang, Min Xu, Jianguo Zhang, Haimin Zhang, Ziyu Sheng, Dong Xu

    Abstract: Object parts serve as crucial intermediate representations in various downstream tasks, but part-level representation learning still has not received as much attention as other vision tasks. Previous research has established that Vision Transformer can learn instance-level attention without labels, extracting high-quality instance-level representations for boosting downstream tasks. In this paper,…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Accepted by TPAMI-2024

  5. arXiv:2408.08072  [pdf, other]

    cs.CL

    I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm

    Authors: Yiming Liang, Ge Zhang, Xingwei Qu, Tianyu Zheng, Jiawei Guo, Xinrun Du, Zhenzhu Yang, Jiaheng Liu, Chenghua Lin, Lei Ma, Wenhao Huang, Jiajun Zhang

    Abstract: Large Language Models (LLMs) have achieved significant advancements; however, the common learning paradigm treats LLMs as passive information repositories, neglecting their potential for active learning and alignment. Some approaches train LLMs using their own generated synthetic data, exploring the possibility of active alignment. However, there is still a huge gap between these one-time alignmen…

    Submitted 15 August, 2024; originally announced August 2024.

  6. arXiv:2408.06761  [pdf, other]

    cs.CV cs.AI

    Cross-View Geolocalization and Disaster Mapping with Street-View and VHR Satellite Imagery: A Case Study of Hurricane IAN

    Authors: Hao Li, Fabian Deuser, Wenping Yina, Xuanshu Luo, Paul Walther, Gengchen Mai, Wei Huang, Martin Werner

    Abstract: Natural disasters play a key role in shaping human-urban infrastructure interactions. Effective and efficient response to natural disasters is essential for building resilience and a sustainable urban environment. Two types of information are usually the most necessary and difficult to gather in disaster response. The first information is about disaster damage perception, which shows how badly peop…

    Submitted 13 August, 2024; originally announced August 2024.

  7. arXiv:2408.05699  [pdf, other]

    cs.CV

    MacFormer: Semantic Segmentation with Fine Object Boundaries

    Authors: Guoan Xu, Wenfeng Huang, Tao Wu, Ligeng Chen, Wenjing Jia, Guangwei Gao, Xiatian Zhu, Stuart Perry

    Abstract: Semantic segmentation involves assigning a specific category to each pixel in an image. While Vision Transformer-based models have made significant progress, current semantic segmentation methods often struggle with precise predictions in localized areas like object boundaries. To tackle this challenge, we introduce a new semantic segmentation architecture, "MacFormer", which features two key co…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 13 pages, 7 figures, submitted to TIP

  8. arXiv:2408.05584  [pdf]

    cs.LG stat.ME

    Dynamical causality under invisible confounders

    Authors: Jinling Yan, Shao-Wu Zhang, Chihao Zhang, Weitian Huang, Jifan Shi, Luonan Chen

    Abstract: Causality inference is prone to spurious causal interactions, due to the substantial confounders in a complex system. While many existing methods based on the statistical methods or dynamical methods attempt to address misidentification challenges, there remains a notable lack of effective methods to infer causality, in particular in the presence of invisible/unobservable confounders. As a result,…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 23 pages, 5 figures

  9. arXiv:2408.04889  [pdf, other]

    cs.MM

    Deep joint source-channel coding for wireless point cloud transmission

    Authors: Cixiao Zhang, Mufan Liu, Wenjie Huang, Yin Xu, Yiling Xu, Dazhi He

    Abstract: The growing demand for high-quality point cloud transmission over wireless networks presents significant challenges, primarily due to the large data sizes and the need for efficient encoding techniques. In response to these challenges, we introduce a novel system named Deep Point Cloud Semantic Transmission (PCST), designed for end-to-end wireless point cloud transmission. Our approach employs a p…

    Submitted 9 August, 2024; originally announced August 2024.

  10. arXiv:2408.02501  [pdf, ps, other]

    cs.IT eess.SP

    Fair Resource Allocation For Hierarchical Federated Edge Learning in Space-Air-Ground Integrated Networks via Deep Reinforcement Learning with Hybrid Control

    Authors: Chong Huang, Gaojie Chen, Pei Xiao, Jonathon A. Chambers, Wei Huang

    Abstract: The space-air-ground integrated network (SAGIN) has become a crucial research direction in future wireless communications due to its ubiquitous coverage, rapid and flexible deployment, and multi-layer cooperation capabilities. However, integrating hierarchical federated learning (HFL) with edge computing and SAGINs remains a complex open issue to be resolved. This paper proposes a novel framework…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted for publication in IEEE Journal on Selected Areas in Communications

  11. arXiv:2408.02283  [pdf, other]

    cs.GT

    Enhanced Equilibria-Solving via Private Information Pre-Branch Structure in Adversarial Team Games

    Authors: Chen Qiu, Haobo Fu, Kai Li, Weixin Huang, Jiajia Zhang, Xuan Wang

    Abstract: In ex ante coordinated adversarial team games (ATGs), a team competes against an adversary, and the team members are only allowed to coordinate their strategies before the game starts. The team-maxmin equilibrium with correlation (TMECor) is a suitable solution concept for ATGs. One class of TMECor-solving methods transforms the problem into solving NE in two-player zero-sum games, leveraging well…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 13 pages, 4 figures

  12. arXiv:2408.01429  [pdf, ps, other]

    cs.NI cs.AI cs.LG

    An Agile Adaptation Method for Multi-mode Vehicle Communication Networks

    Authors: Shiwen He, Kanghong Chen, Shiyue Huang, Wei Huang, Zhenyu An

    Abstract: This paper focuses on discovering the impact of communication mode allocation on communication efficiency in the vehicle communication networks. To be specific, Markov decision process and reinforcement learning are applied to establish an agile adaptation mechanism for multi-mode communication devices according to the driving scenarios and business requirements. Then, Q-learning is used to train…

    Submitted 18 July, 2024; originally announced August 2024.

  13. arXiv:2407.20584  [pdf, other]

    cs.CL cs.AI

    Pruning Large Language Models with Semi-Structural Adaptive Sparse Training

    Authors: Weiyu Huang, Guohao Jian, Yuezhou Hu, Jun Zhu, Jianfei Chen

    Abstract: Transformer-based Large Language Models (LLMs) have demonstrated remarkable success across various challenging tasks. However, the deployment of LLMs is hindered by their substantial parameter count and memory consumption. Recently, numerous studies have attempted to compress LLMs by pruning them using training-free methods. However, these pruned models often experience significant performance deg…

    Submitted 30 July, 2024; originally announced July 2024.

  14. arXiv:2407.19832  [pdf, other]

    cs.CV cs.AI cs.CL

    ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2

    Authors: Wenjun Huang, Jianguo Hu

    Abstract: Multimodal Large Language Models (MLLMs) have attracted much attention for their multifunctionality. However, traditional Transformer architectures incur significant overhead due to their secondary computational complexity. To address this issue, we introduce ML-Mamba, a multimodal language model, which utilizes the latest and efficient Mamba-2 model for inference. Mamba-2 is known for its linear…

    Submitted 14 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2403.13600, arXiv:2406.07537 by other authors

  15. arXiv:2407.18743  [pdf, other]

    cs.CL

    Towards Effective and Efficient Continual Pre-training of Large Language Models

    Authors: Jie Chen, Zhipeng Chen, Jiapeng Wang, Kun Zhou, Yutao Zhu, Jinhao Jiang, Yingqian Min, Wayne Xin Zhao, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu, Xu Chen, Rui Yan, Zhewei Wei, Di Hu, Wenbing Huang, Ji-Rong Wen

    Abstract: Continual pre-training (CPT) has been an important approach for adapting language models to specific domains or tasks. To make the CPT approach more traceable, this paper presents a technical report for continually pre-training Llama-3 (8B), which significantly enhances the Chinese language ability and scientific reasoning ability of the backbone model. To enhance the new abilities while retaining…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 16 pages, 10 figures, 16 tables

    MSC Class: 68T50 ACM Class: I.2.7

  16. arXiv:2407.17379  [pdf, other]

    cs.CV cs.CL

    MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models

    Authors: Siwei Wu, Kang Zhu, Yu Bai, Yiming Liang, Yizhi Li, Haoning Wu, J. H. Liu, Ruibo Liu, Xingwei Qu, Xuxin Cheng, Ge Zhang, Wenhao Huang, Chenghua Lin

    Abstract: Given the remarkable success that large visual language models (LVLMs) have achieved in image perception tasks, the endeavor to make LVLMs perceive the world like humans is drawing increasing attention. Current multi-modal benchmarks primarily focus on facts or specific topic-related knowledge contained within individual images. However, they often overlook the associative relations between multip…

    Submitted 5 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: VLMs, Multi-Image Association

  17. arXiv:2407.16850  [pdf, other]

    cs.SI cs.DS cs.IR

    Covering a Graph with Dense Subgraph Families, via Triangle-Rich Sets

    Authors: Sabyasachi Basu, Daniel Paul-Pena, Kun Qian, C. Seshadhri, Edward W Huang, Karthik Subbian

    Abstract: Graphs are a fundamental data structure used to represent relationships in domains as diverse as the social sciences, bioinformatics, cybersecurity, the Internet, and more. One of the central observations in network science is that real-world graphs are globally sparse, yet contain numerous "pockets" of high edge density. A fundamental task in graph mining is to discover these dense subgraphs. Mo…

    Submitted 23 July, 2024; originally announced July 2024.

  18. SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition

    Authors: Wenbo Huang, Jinghui Zhang, Xuwei Qian, Zhen Wu, Meng Wang, Lei Zhang

    Abstract: High frame-rate (HFR) videos of action recognition improve fine-grained expression while reducing the spatio-temporal relation and motion information density. Thus, large amounts of video samples are continuously required for traditional data-driven training. However, samples are not always sufficient in real-world scenarios, promoting few-shot action recognition (FSAR) research. We observe that m…

    Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  19. arXiv:2407.15883  [pdf, other]

    cs.CV

    A Novel Method to Improve Quality Surface Coverage in Multi-View Capture

    Authors: Wei-Lun Huang, Davood Tashayyod, Amir Gandjbakhche, Michael Kazhdan, Mehran Armand

    Abstract: The depth of field of a camera is a limiting factor for applications that require taking images at a short subject-to-camera distance or using a large focal length, such as total body photography, archaeology, and other close-range photogrammetry applications. Furthermore, in multi-view capture, where the target is larger than the camera's field of view, an efficient way to optimize surface covera…

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: submitted version 1

  20. SNNGX: Securing Spiking Neural Networks with Genetic XOR Encryption on RRAM-based Neuromorphic Accelerator

    Authors: Kwunhang Wong, Songqi Wang, Wei Huang, Xinyuan Zhang, Yangu He, Karl M. H. Lai, Yuzhong Jiao, Ning Lin, Xiaojuan Qi, Xiaoming Chen, Zhongrui Wang

    Abstract: Biologically plausible Spiking Neural Networks (SNNs), characterized by spike sparsity, are gaining tremendous attention in intelligent edge devices and critical bio-medical applications as compared to artificial neural networks (ANNs). However, there is a considerable risk from malicious attempts to extract white-box information (i.e., weights) from SNNs, as attackers could exploit well-traine…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: International Conference on Computer-Aided Design 2024

  21. arXiv:2407.14996  [pdf, other]

    cs.LG

    All Against Some: Efficient Integration of Large Language Models for Message Passing in Graph Neural Networks

    Authors: Ajay Jaiswal, Nurendra Choudhary, Ravinarayana Adkathimar, Muthu P. Alagappan, Gaurush Hiranandani, Ying Ding, Zhangyang Wang, Edward W Huang, Karthik Subbian

    Abstract: Graph Neural Networks (GNNs) have attracted immense attention in the past decade due to their numerous real-world applications built around graph-structured data. On the other hand, Large Language Models (LLMs) with extensive pretrained knowledge and powerful semantic comprehension abilities have recently shown a remarkable ability to benefit applications using vision and text data. In this paper,…

    Submitted 20 July, 2024; originally announced July 2024.

  22. arXiv:2407.13217  [pdf, other]

    cs.CV

    LIDIA: Precise Liver Tumor Diagnosis on Multi-Phase Contrast-Enhanced CT via Iterative Fusion and Asymmetric Contrastive Learning

    Authors: Wei Huang, Wei Liu, Xiaoming Zhang, Xiaoli Yin, Xu Han, Chunli Li, Yuan Gao, Yu Shi, Le Lu, Ling Zhang, Lei Zhang, Ke Yan

    Abstract: The early detection and precise diagnosis of liver tumors are tasks of critical clinical value, yet they pose significant challenges due to the high heterogeneity and variability of liver tumors. In this work, a precise LIver tumor DIAgnosis network on multi-phase contrast-enhanced CT, named LIDIA, is proposed for real-world scenarios. To fully utilize all available phases in contrast-enhanced CT, L…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to MICCAI 2024

  23. arXiv:2407.12899  [pdf, other]

    cs.CV cs.AI cs.MM

    DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion

    Authors: Huiguo He, Huan Yang, Zixi Tuo, Yuan Zhou, Qiuyue Wang, Yuhang Zhang, Zeyu Liu, Wenhao Huang, Hongyang Chao, Jian Yin

    Abstract: Story visualization aims to create visually compelling images or videos corresponding to textual narratives. Despite recent advances in diffusion models yielding promising results, existing methods still struggle to create a coherent sequence of subject-consistent frames based solely on a story. To this end, we propose DreamStory, an automatic open-domain story visualization framework by leveragin…

    Submitted 17 July, 2024; originally announced July 2024.

  24. arXiv:2407.12505  [pdf, other]

    cs.LG cs.AI cs.RO

    Subequivariant Reinforcement Learning in 3D Multi-Entity Physical Environments

    Authors: Runfa Chen, Ling Wang, Yu Du, Tianrui Xue, Fuchun Sun, Jianwei Zhang, Wenbing Huang

    Abstract: Learning policies for multi-entity systems in 3D environments is far more complicated than in single-entity scenarios, due to the exponential expansion of the global state space as the number of entities increases. One potential solution for alleviating the exponential complexity is dividing the global space into independent local views that are invariant to transformations including translations a…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ICML 2024

  25. arXiv:2407.12487  [pdf, other]

    cs.HC

    Application of Prompt Learning Models in Identifying the Collaborative Problem Solving Skills in an Online Task

    Authors: Mengxiao Zhu, Xin Wang, Xiantao Wang, Zihang Chen, Wei Huang

    Abstract: Collaborative problem solving (CPS) competence is considered one of the essential 21st-century skills. To facilitate the assessment and learning of CPS competence, researchers have proposed a series of frameworks to conceptualize CPS and explored ways to make sense of the complex processes involved in collaborative problem solving. However, encoding explicit behaviors into subskills within the fra…

    Submitted 17 July, 2024; originally announced July 2024.

  26. arXiv:2407.11717  [pdf, other]

    cs.CV

    Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models

    Authors: Chen Ju, Haicheng Wang, Haozhe Cheng, Xu Chen, Zhonghua Zhai, Weilin Huang, Jinsong Lan, Shuai Xiao, Bo Zheng

    Abstract: Vision-Language Large Models (VLMs) have recently become a primary backbone of AI, due to their impressive performance. However, their expensive computation costs, i.e., throughput and delay, impede their potential in real-world scenarios. To achieve acceleration for VLMs, most existing methods focus on the model perspective: pruning, distillation, quantization, but completely overlook the data-perspective…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. The first two authors share the same contribution. arXiv admin note: substantial text overlap with arXiv:2312.07408

  27. arXiv:2407.10943  [pdf, other]

    cs.RO cs.CV

    GRUtopia: Dream General Robots in a City at Scale

    Authors: Hanqing Wang, Jiahe Chen, Wensi Huang, Qingwei Ben, Tai Wang, Boyu Mi, Tao Huang, Siheng Zhao, Yilun Chen, Sizhe Yang, Peizhou Cao, Wenye Yu, Zichao Ye, Jialun Li, Junfeng Long, Zirui Wang, Huiling Wang, Ying Zhao, Zhongying Tu, Yu Qiao, Dahua Lin, Jiangmiao Pang

    Abstract: Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models. This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots. It features several advancements:…

    Submitted 15 July, 2024; originally announced July 2024.

  28. arXiv:2407.09904  [pdf, other]

    cs.LG

    Learning a Mini-batch Graph Transformer via Two-stage Interaction Augmentation

    Authors: Wenda Li, Kaixuan Chen, Shunyu Liu, Tongya Zheng, Wenjie Huang, Mingli Song

    Abstract: Mini-batch Graph Transformer (MGT), as an emerging graph learning model, has demonstrated significant advantages in semi-supervised node prediction tasks with improved computational efficiency and enhanced model robustness. However, existing methods for processing local information either rely on sampling or simple aggregation, which respectively result in the loss and squashing of critical neighb…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures, Accepted by ECAI 2024

  29. arXiv:2407.09886  [pdf, other]

    eess.AS cs.CL cs.SD

    Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation

    Authors: Chun-Yi Kuan, Chih-Kai Yang, Wei-Ping Huang, Ke-Han Lu, Hung-yi Lee

    Abstract: In this work, we introduce Speech-Copilot, a modular framework for instruction-oriented speech-processing tasks that minimizes human effort in toolset construction. Unlike end-to-end methods using large audio-language models, Speech-Copilot builds speech processing-specific toolsets by analyzing pre-collected task instructions and breaking tasks into manageable sub-tasks. It features a flexible ag…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 8 pages, 2 figures

  30. arXiv:2407.09241  [pdf, other]

    cs.CL

    The Sociolinguistic Foundations of Language Modeling

    Authors: Jack Grieve, Sara Bartl, Matteo Fuoli, Jason Grafmiller, Weihang Huang, Alejandro Jawerbaum, Akira Murakami, Marcus Perlman, Dana Roemling, Bodo Winter

    Abstract: In this paper, we introduce a sociolinguistic perspective on language modeling. We claim that large language models are inherently models of varieties of language, and we consider how this insight can inform the development and deployment of large language models. We begin by presenting a technical definition of the concept of a variety of language as developed in sociolinguistics. We then discuss…

    Submitted 12 July, 2024; originally announced July 2024.

  31. arXiv:2407.08554  [pdf, other]

    cs.AI cs.HC

    Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models

    Authors: Wanling Gao, Yunyou Huang, Dandan Cui, Zhuoming Yu, Wenjing Liu, Xiaoshuang Liang, Jiahui Zhao, Jiyue Xie, Hao Li, Li Ma, Ning Ye, Yumiao Kang, Dingfeng Luo, Peng Pan, Wei Huang, Zhongmou Liu, Jizhong Hu, Gangyuan Zhao, Chongrong Jiang, Fan Huang, Tianyi Wei, Suqin Tang, Bingjie Xia, Zhifei Zhang, Jianfeng Zhan

    Abstract: A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of cl…

    Submitted 28 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: 24 pages

  32. arXiv:2407.07304  [pdf, other]

    cs.AI

    Inference Performance Optimization for Large Language Models on CPUs

    Authors: Pujiang He, Shan Zhou, Wenhuan Huang, Changqing Li, Duyi Wang, Bin Guo, Chen Meng, Sheng Gui, Weifei Yu, Yi Xie

    Abstract: Large language models (LLMs) have shown exceptional performance and vast potential across diverse tasks. However, the deployment of LLMs with high performance in low-resource environments has garnered significant attention in the industry. When GPU hardware resources are limited, we can explore alternative options on CPUs. To mitigate the financial burden and alleviate constraints imposed by hardw…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 5 pages, 6 figures, ICML 2024 on Foundation Models in the Wild

  33. arXiv:2407.07014  [pdf, other]

    cs.NE

    An Attempt to Devise a Pairwise Ising-Type Maximum Entropy Model Integrated Cost Function for Optimizing SNN Deployment

    Authors: Wanhong Huang

    Abstract: The deployment process of a spiking neural network (SNN) often involves partitioning the neural network and mapping these partitions onto processing units within the neuromorphic hardware. Finding optimal deployment schemes is an NP-hard problem. Optimizing these schemes presents challenges, particularly in devising computationally effective cost functions optimization objectives such as communicati…

    Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  34. arXiv:2407.06360  [pdf, other]

    cs.SE

    CodeCSE: A Simple Multilingual Model for Code and Comment Sentence Embeddings

    Authors: Anthony Varkey, Siyuan Jiang, Weijing Huang

    Abstract: Pretrained language models for code token embeddings are used in code search, code clone detection, and other code-related tasks. Similarly, code function embeddings are useful in such tasks. However, there are no out-of-box models for function embeddings in the current literature. So, this paper proposes CodeCSE, a contrastive learning model that learns embeddings for functions and their descript…

    Submitted 8 July, 2024; originally announced July 2024.

  35. arXiv:2407.00029  [pdf, other]

    cs.DC

    Distributed Inference Performance Optimization for LLMs on CPUs

    Authors: Pujiang He, Shan Zhou, Changqing Li, Wenhuan Huang, Weifei Yu, Duyi Wang, Chen Meng, Sheng Gui

    Abstract: Large language models (LLMs) hold tremendous potential for addressing numerous real-world challenges, yet they typically demand significant computational resources and memory. Deploying LLMs onto a resource-limited hardware device with restricted memory capacity presents considerable challenges. Distributed computing emerges as a prevalent strategy to mitigate single-node memory constraints and ex…

    Submitted 16 May, 2024; originally announced July 2024.

    Comments: 4 pages, 3 figures, Practical ML for Low Resource Settings Workshop @ ICLR 2024

  36. arXiv:2406.19853  [pdf, other]

    cs.CL cs.AI

    YuLan: An Open-source Large Language Model

    Authors: Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, et al. (13 additional authors not shown)

    Abstract: Large language models (LLMs) have become the foundation of many applications, leveraging their extensive capabilities in processing and understanding natural language. While many open-source LLMs have been released with technical reports, the lack of training details hinders further research and development. This paper presents the development of YuLan, a series of open-source LLMs with 12 billi…

    Submitted 28 June, 2024; originally announced June 2024.

  37. arXiv:2406.18937  [pdf, other]

    cs.LG cs.AI

    Federated Graph Semantic and Structural Learning

    Authors: Wenke Huang, Guancheng Wan, Mang Ye, Bo Du

    Abstract: Federated graph learning collaboratively learns a global graph neural network with distributed graphs, where the non-independent and identically distributed property is one of the major challenges. Most related works focus on traditional distributed tasks like images and voices and are incapable of handling graph structures. This paper first reveals that local client distortion is brought by both node-level sem…

    Submitted 29 June, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Journal ref: International Joint Conference on Artificial Intelligence (IJCAI), 2023

  38. arXiv:2406.17588  [pdf, other]

    cs.CL

    LongIns: A Challenging Long-context Instruction-based Exam for LLMs

    Authors: Shawn Gavin, Tuney Zheng, Jiaheng Liu, Quehry Que, Noah Wang, Jian Yang, Chenchen Zhang, Wenhao Huang, Wenhu Chen, Ge Zhang

    Abstract: The long-context capabilities of large language models (LLMs) have been a hot topic in recent years. To evaluate the performance of LLMs in different scenarios, various assessment benchmarks have emerged. However, as most of these benchmarks focus on identifying key information to answer questions, which mainly requires the retrieval ability of LLMs, these benchmarks can partially represent the re…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  39. arXiv:2406.16928  [pdf, other]

    eess.SP cs.LG

    A Multi-Resolution Mutual Learning Network for Multi-Label ECG Classification

    Authors: Wei Huang, Ning Wang, Panpan Feng, Haiyan Wang, Zongmin Wang, Bing Zhou

    Abstract: Electrocardiograms (ECG), which record the electrophysiological activity of the heart, have become a crucial tool for diagnosing these diseases. In recent years, the application of deep learning techniques has significantly improved the performance of ECG signal classification. Multi-resolution feature analysis, which captures and processes information at different time scales, can extract subtle…

    Submitted 12 June, 2024; originally announced June 2024.

  40. arXiv:2406.16150  [pdf, other]

    eess.IV cs.CV

    Intensity Confusion Matters: An Intensity-Distance Guided Loss for Bronchus Segmentation

    Authors: Haifan Gong, Wenhao Huang, Huan Zhang, Yu Wang, Xiang Wan, Hong Shen, Guanbin Li, Haofeng Li

    Abstract: Automatic segmentation of the bronchial tree from CT imaging is important, as it provides structural information for disease diagnosis. Despite the merits of previous automatic bronchus segmentation methods, they have paid less attention to the issue we term as Intensity Confusion, wherein the intensity values of certain background voxels approach those of the foreground voxels within br…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: IEEE International Conference on Multimedia & Expo (ICME) 2024

  41. arXiv:2406.15501  [pdf]

    cs.CR

    Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory

    Authors: Yang Li, Yujie Luo, Yichen Zhang, Ao Sun, Wei Huang, Shuai Zhang, Tao Zhang, Chuang Zhou, Li Ma, Jie Yang, Mei Wu, Heng Wang, Yan Pan, Yun Shao, Xing Chen, Ziyang Chen, Song Yu, Hong Guo, Bingjie Xu

    Abstract: Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), seriously degrade the performance of time synchronization systems. A multiple-path scheme is regarded as an effective security countermeasure to reduce the influence of TDA. However, an effective secure combination algorithm is still…

    Submitted 19 June, 2024; originally announced June 2024.

  42. arXiv:2406.15166  [pdf, other]

    cond-mat.soft cs.GR

    Inverse Design of Planar Clamped-Free Elastic Rods from Noisy Data

    Authors: Dezhong Tong, Zhuonan Hao, Weicheng Huang

    Abstract: Slender structures, such as rods, often exhibit large nonlinear geometrical deformations even under moderate external forces (e.g., gravity). This characteristic results in a rich variety of morphological changes, making them appealing for engineering design and applications, such as soft robots, submarine cables, decorative knots, and more. Prior studies have demonstrated that the natural shape o…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: 21 pages, 9 figures

  43. arXiv:2406.14903  [pdf, other]

    cs.AI

    GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

    Authors: Leyan Wang, Yonggang Jin, Tianhao Shen, Tianyu Zheng, Xinrun Du, Chenchen Zhang, Wenhao Huang, Jiaheng Liu, Shi Wang, Ge Zhang, Liuyu Xiang, Zhaofeng He

    Abstract: As large language models (LLMs) continue to develop and gain widespread application, the ability of LLMs to exhibit empathy towards diverse group identities and understand their perspectives is increasingly recognized as critical. Most existing benchmarks for empathy evaluation of LLMs focus primarily on universal human emotions, such as sadness and pain, often overlooking the context of individua…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  44. arXiv:2406.13923  [pdf, other]

    cs.AI cs.CL cs.CV cs.MM

    PIN: A Knowledge-Intensive Dataset for Paired and Interleaved Multimodal Documents

    Authors: Junjie Wang, Yin Zhang, Yatai Ji, Yuxiang Zhang, Chunyang Jiang, Yubo Wang, Kang Zhu, Zekun Wang, Tiezhen Wang, Wenhao Huang, Jie Fu, Bei Chen, Qunshu Lin, Minghao Liu, Ge Zhang, Wenhu Chen

    Abstract: Recent advancements in Large Multimodal Models (LMMs) have leveraged extensive multimodal datasets to enhance capabilities in complex knowledge-driven tasks. However, persistent challenges in perceptual and reasoning errors limit their efficacy, particularly in interpreting intricate visual data and deducing multimodal relationships. Addressing these issues, we introduce a novel dataset format, PI…

    Submitted 19 June, 2024; originally announced June 2024.

  45. arXiv:2406.13698  [pdf, other]

    cs.CL

    MMTE: Corpus and Metrics for Evaluating Machine Translation Quality of Metaphorical Language

    Authors: Shun Wang, Ge Zhang, Han Wu, Tyler Loakman, Wenhao Huang, Chenghua Lin

    Abstract: Machine Translation (MT) has developed rapidly since the release of Large Language Models, and current MT evaluation is performed through comparison with reference human translations or by predicting quality scores from human-labeled data. However, these mainstream evaluation methods mainly focus on fluency and factual reliability, whilst paying little attention to figurative quality. In this paper…

    Submitted 19 June, 2024; originally announced June 2024.

  46. arXiv:2406.12641  [pdf, other]

    cs.CL

    DetectBench: Can Large Language Model Detect and Piece Together Implicit Evidence?

    Authors: Zhouhong Gu, Lin Zhang, Xiaoxuan Zhu, Jiangjie Chen, Wenhao Huang, Yikai Zhang, Shusen Wang, Zheyu Ye, Yan Gao, Hongwei Feng, Yanghua Xiao

    Abstract: Detecting evidence within the context is a key step in reasoning tasks. Evaluating and enhancing the capabilities of LLMs in evidence detection will strengthen context-based reasoning performance. This paper proposes a benchmark called DetectBench for verifying the ability to detect and piece together implicit evidence within a long context. DetectBench contains 3,928 multiple-choice…

    Submitted 18 June, 2024; originally announced June 2024.

  47. arXiv:2406.12539  [pdf, other]

    cs.LG cs.AI

    The Heterophilic Snowflake Hypothesis: Training and Empowering GNNs for Heterophilic Graphs

    Authors: Kun Wang, Guibin Zhang, Xinnan Zhang, Junfeng Fang, Xun Wu, Guohao Li, Shirui Pan, Wei Huang, Yuxuan Liang

    Abstract: Graph Neural Networks (GNNs) have become pivotal tools for a range of graph-based learning tasks. Notably, most current GNN architectures operate under the assumption of homophily, whether explicitly or implicitly. While this underlying assumption is frequently adopted, it is not universally applicable, which can result in potential shortcomings in learning effectiveness. In this paper, fo…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  48. arXiv:2406.11775  [pdf, other]

    cs.CV cs.AI

    Task Me Anything

    Authors: Jieyu Zhang, Weikai Huang, Zixian Ma, Oscar Michel, Dong He, Tanmay Gupta, Wei-Chiu Ma, Ali Farhadi, Aniruddha Kembhavi, Ranjay Krishna

    Abstract: Benchmarks for large multimodal language models (MLMs) now serve to simultaneously assess the general capabilities of models instead of evaluating for a specific capability. As a result, when a developer wants to identify which models to use for their application, they are overwhelmed by the number of benchmarks and remain uncertain about which benchmark's results are most reflective of their spec…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: website: https://www.task-me-anything.org

  49. arXiv:2406.11455  [pdf, other]

    cs.CL cs.AI

    Adaptive Reinforcement Learning Planning: Harnessing Large Language Models for Complex Information Extraction

    Authors: Zepeng Ding, Ruiyang Ke, Wenhao Huang, Guochao Jiang, Yanda Li, Deqing Yang, Yanghua Xiao, Jiaqing Liang

    Abstract: Existing research on large language models (LLMs) shows that they can solve information extraction tasks through multi-step planning. However, their extraction behavior on complex sentences and tasks is unstable, giving rise to issues such as false positives and missing elements. We observe that decomposing complex extraction tasks and extracting them step by step can effectively improve LLMs' performan…

    Submitted 17 June, 2024; originally announced June 2024.

  50. arXiv:2406.11064  [pdf, other]

    eess.AS cs.SD

    Continual Test-time Adaptation for End-to-end Speech Recognition on Noisy Speech

    Authors: Guan-Ting Lin, Wei-Ping Huang, Hung-yi Lee

    Abstract: Deep learning-based end-to-end automatic speech recognition (ASR) has made significant strides but still struggles with performance on out-of-domain (OOD) samples due to domain shifts in real-world scenarios. Test-Time Adaptation (TTA) methods address this issue by adapting models using test samples at inference time. However, current ASR TTA methods have largely focused on non-continual TTA, whic…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 13 pages