
Showing 1–50 of 574 results for author: Hu, W

  1. arXiv:2408.09526  [pdf, other]

    cs.LG

    Fine-grained air quality inference based on low-quality sensing data using self-supervised learning

    Authors: Meng Xu, Ke Han, Weijian Hu, Wen Ji

    Abstract: Fine-grained air quality (AQ) mapping is made possible by the proliferation of cheap AQ micro-stations (MSs). However, their measurements are often inaccurate and sensitive to local disturbances, in contrast to standardized stations (SSs) that provide accurate readings but fall short in number. To simultaneously address the issues of low data quality (MSs) and high label sparsity (SSs), a multi-ta…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 17 pages

  2. arXiv:2408.09106  [pdf, other]

    q-bio.BM cs.AI

    Fragment-Masked Molecular Optimization

    Authors: Kun Li, Xiantao Cai, Jia Wu, Bo Du, Wenbin Hu

    Abstract: Molecular optimization is a crucial aspect of drug discovery, aimed at refining molecular structures to enhance drug efficacy and minimize side effects, ultimately accelerating the overall drug development process. Many target-based molecular optimization methods have been proposed, significantly advancing drug discovery. These methods primarily rely on understanding the specific drug target structures…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 11 pages, 5 figures, 2 tables

  3. arXiv:2408.07884  [pdf, other]

    cs.CL

    Instruct Large Language Models to Generate Scientific Literature Survey Step by Step

    Authors: Yuxuan Lai, Yupeng Wu, Yidan Wang, Wenpeng Hu, Chen Zheng

    Abstract: Automatically generating scientific literature surveys is a valuable task that can significantly enhance research efficiency. However, the diverse and complex nature of information within a literature survey poses substantial challenges for generative models. In this paper, we design a series of prompts to systematically leverage large language models (LLMs), enabling the creation of com…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: NLPCC 2024

  4. Towards Enhanced Context Awareness with Vision-based Multimodal Interfaces

    Authors: Yongquan Hu, Wen Hu, Aaron Quigley

    Abstract: Vision-based Interfaces (VIs) are pivotal in advancing Human-Computer Interaction (HCI), particularly in enhancing context awareness. However, there are significant opportunities for these interfaces due to rapid advancements in multimodal Artificial Intelligence (AI), which promise a future of tight coupling between humans and intelligent systems. AI-driven VIs, when integrated with other modalit…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 3 pages, MOBILEHCI Adjunct '24 26th International Conference on Mobile Human-Computer Interaction, September 30-October 3, 2024, Melbourne, VIC, Australia

  5. Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health

    Authors: Yongquan Hu, Shuning Zhang, Ting Dang, Hong Jia, Flora D. Salim, Wen Hu, Aaron J. Quigley

    Abstract: Integrating physiological signals such as electroencephalogram (EEG) with other data such as interview audio may offer valuable multimodal insights into psychological states or neurological disorders. Recent advancements with Large Language Models (LLMs) position them as prospective "health agents" for mental health assessment. However, current research predominantly focuses on single data modal…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 6 pages; UbiComp Companion '24, Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing, October 5-9, 2024, Melbourne, VIC, Australia

  6. MultiSurf-GPT: Facilitating Context-Aware Reasoning with Large-Scale Language Models for Multimodal Surface Sensing

    Authors: Yongquan Hu, Black Sun, Pengcheng An, Zhuying Li, Wen Hu, Aaron J. Quigley

    Abstract: Surface sensing is widely employed in health diagnostics, manufacturing and safety monitoring. Advances in mobile sensing afford this potential for context awareness in mobile computing, typically with a single sensing modality. Emerging multimodal large-scale language models offer new opportunities. We propose MultiSurf-GPT, which utilizes the advanced capabilities of GPT-4o to process and inter…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 6 pages. MOBILEHCI Adjunct '24, 26th International Conference on Mobile Human-Computer Interaction, September 30-October 3, 2024, Melbourne, VIC, Australia

  7. arXiv:2408.07091  [pdf, other]

    cs.LG cs.AI cs.CL

    Node Level Graph Autoencoder: Unified Pretraining for Textual Graph Learning

    Authors: Wenbin Hu, Huihao Jing, Qi Hu, Haoran Li, Yangqiu Song

    Abstract: Textual graphs are ubiquitous in real-world applications, featuring rich text information with complex relationships, which enables advanced research across various fields. Textual graph representation learning aims to generate low-dimensional feature embeddings from textual graphs that can improve the performance of downstream tasks. A high-quality feature embedding should effectively capture bot…

    Submitted 9 August, 2024; originally announced August 2024.

  8. arXiv:2408.05440  [pdf]

    cs.CV eess.IV

    Content-decoupled Contrastive Learning-based Implicit Degradation Modeling for Blind Image Super-Resolution

    Authors: Jiang Yuan, Ji Ma, Bo Wang, Weiming Hu

    Abstract: Implicit degradation modeling-based blind super-resolution (SR) has attracted increasing attention in the community due to its excellent generalization to complex degradation scenarios and wide application range. How to extract more discriminative degradation representations and fully adapt them to specific image features is the key to this task. In this paper, we propose a new Content-decoup…

    Submitted 10 August, 2024; originally announced August 2024.

  9. arXiv:2408.04831  [pdf, other]

    cs.CV cs.AI

    Self-augmented Gaussian Splatting with Structure-aware Masks for Sparse-view 3D Reconstruction

    Authors: Lingbei Meng, Bi'an Du, Wei Hu

    Abstract: Sparse-view 3D reconstruction stands as a formidable challenge in computer vision, aiming to build complete three-dimensional models from a limited array of viewing perspectives. This task confronts several difficulties: 1) the limited number of input images that lack consistent information; 2) dependence on the quality of input images; and 3) the substantial size of model parameters. To address t…

    Submitted 14 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  10. arXiv:2408.04734  [pdf, other]

    cs.HC cs.MA hep-ex

    A Multi-Scale Cognitive Interaction Model of Instrument Operations at the Linac Coherent Light Source

    Authors: Jonathan Segal, Wan-Lin Hu, Paul Fuoss, Frank E. Ritter, Jeff Shrager

    Abstract: We describe a novel multi-agent, multi-scale computational cognitive interaction model of instrument operations at the Linac Coherent Light Source (LCLS). A leading scientific user facility, LCLS is the world's first hard x-ray free electron laser, operated by the SLAC National Accelerator Laboratory for the U.S. Department of Energy. As the world's first x-ray free electron laser, LCLS is in high…

    Submitted 14 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: Supplemental videos: https://www.youtube.com/playlist?list=PLI13S4Z1cbXggy98pDXjqnVnnoekohF2f

  11. arXiv:2408.04323  [pdf, other]

    cs.DC

    Towards SLO-Optimized LLM Serving via Automatic Inference Engine Tuning

    Authors: Ke Cheng, Zhi Wang, Wen Hu, Tiannuo Yang, Jianguo Li, Sheng Zhang

    Abstract: A service-level objective (SLO) is a target performance metric of service that cloud vendors aim to ensure. Delivering optimized SLOs can enhance user satisfaction and improve the competitiveness of cloud vendors. As large language models (LLMs) are gaining increasing popularity across various fields, it is of great significance to optimize SLOs for LLM inference services. In this paper, we observ…

    Submitted 8 August, 2024; originally announced August 2024.

  12. arXiv:2408.01694  [pdf, other]

    cs.CV

    Bayesian Active Learning for Semantic Segmentation

    Authors: Sima Didari, Wenjun Hu, Jae Oh Woo, Heng Hao, Hankyu Moon, Seungjai Min

    Abstract: Fully supervised training of semantic segmentation models is costly and challenging because each pixel within an image needs to be labeled. Therefore, sparse pixel-level annotation methods have been introduced to train models with a subset of pixels within each image. We introduce a Bayesian active learning framework based on sparse pixel-level annotation that utilizes a pixel-level Bayesian u…

    Submitted 3 August, 2024; originally announced August 2024.

  13. arXiv:2408.00462  [pdf, other]

    cs.AR cs.LG

    Designing Efficient LLM Accelerators for Edge Devices

    Authors: Jude Haris, Rappy Saha, Wenhao Hu, José Cano

    Abstract: The increase in open-source availability of Large Language Models (LLMs) has enabled users to deploy them on more and more resource-constrained edge devices to reduce reliance on network connections and provide more privacy. However, the high computation and memory demands of LLMs make their execution on resource-constrained edge devices challenging and inefficient. To address this issue, designin…

    Submitted 1 August, 2024; originally announced August 2024.

  14. arXiv:2407.20060  [pdf, other]

    cs.LG cs.AI cs.DB

    RelBench: A Benchmark for Deep Learning on Relational Databases

    Authors: Joshua Robinson, Rishabh Ranjan, Weihua Hu, Kexin Huang, Jiaqi Han, Alejandro Dobles, Matthias Fey, Jan E. Lenssen, Yiwen Yuan, Zecheng Zhang, Xinwei He, Jure Leskovec

    Abstract: We present RelBench, a public benchmark for solving predictive tasks over relational databases with graph neural networks. RelBench provides databases and tasks spanning diverse domains and scales, and is intended to be a foundational infrastructure for future research. We use RelBench to conduct the first comprehensive study of Relational Deep Learning (RDL) (Fey et al., 2024), which combines gra…

    Submitted 29 July, 2024; originally announced July 2024.

  15. arXiv:2407.18903  [pdf, other]

    cs.CE

    Using high-fidelity discrete element simulation to calibrate an expeditious terramechanics model in a multibody dynamics framework

    Authors: Yuemin Zhang, Junpeng Dai, Wei Hu, Dan Negrut

    Abstract: The wheel-soil interaction has a great impact on the dynamics of off-road vehicles in terramechanics applications. The Soil Contact Model (SCM), which anchors an empirical method to characterize the frictional contact between a wheel and soil, has been widely used in off-road vehicle dynamics simulations because it quickly produces adequate results for many terramechanics applications. The SCM appro…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: version has Appendix

    MSC Class: 70-10

  16. arXiv:2407.16127  [pdf, other]

    cs.CL cs.AI

    Finetuning Generative Large Language Models with Discrimination Instructions for Knowledge Graph Completion

    Authors: Yang Liu, Xiaobin Tian, Zequn Sun, Wei Hu

    Abstract: Traditional knowledge graph (KG) completion models learn embeddings to predict missing facts. Recent works attempt to complete KGs in a text-generation manner with large language models (LLMs). However, they need to ground the output of LLMs to KG entities, which inevitably brings errors. In this paper, we present a finetuning framework, DIFT, aiming to unleash the KG completion ability of LLMs an…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted in the 23rd International Semantic Web Conference (ISWC 2024)

  17. MicroCam: Leveraging Smartphone Microscope Camera for Context-Aware Contact Surface Sensing

    Authors: Yongquan Hu, Hui-Shyong Yeo, Mingyue Yuan, Haoran Fan, Don Samitha Elvitigala, Wen Hu, Aaron Quigley

    Abstract: The primary focus of this research is the discreet and subtle everyday contact interactions between mobile phones and their surrounding surfaces. Such interactions are anticipated to facilitate mobile context awareness, encompassing aspects such as dispensing medication updates, intelligently switching modes (e.g., silent mode), or initiating commands (e.g., deactivating an alarm). We introduce Mi…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 28 pages

    Journal ref: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7.3 (2023): 1-28

  18. arXiv:2407.15309  [pdf, other]

    cs.DC cs.LG

    vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving

    Authors: Jiale Xu, Rui Zhang, Cong Guo, Weiming Hu, Zihan Liu, Feiyang Wu, Yu Feng, Shixuan Sun, Changxu Shao, Yuhong Guo, Junping Zhao, Ke Zhang, Minyi Guo, Jingwen Leng

    Abstract: Large Language Models (LLMs) are widely used across various domains, processing millions of daily requests. This surge in demand poses significant challenges in optimizing throughput and latency while keeping costs manageable. The Key-Value (KV) cache, a standard method for retaining previous computations, makes LLM inference highly bounded by memory. While batching strategies can enhance performa…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 16 pages, 12 figures

  19. arXiv:2407.15272  [pdf, other]

    cs.CV

    MIBench: Evaluating Multimodal Large Language Models over Multiple Images

    Authors: Haowei Liu, Xi Zhang, Haiyang Xu, Yaya Shi, Chaoya Jiang, Ming Yan, Ji Zhang, Fei Huang, Chunfeng Yuan, Bing Li, Weiming Hu

    Abstract: Built on the power of LLMs, numerous multimodal large language models (MLLMs) have recently achieved remarkable performance on various vision-language tasks across multiple benchmarks. However, most existing MLLMs and benchmarks primarily focus on single-image input scenarios, leaving the performance of MLLMs when handling realistic multiple images underexplored. Although a few benchmarks c…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: 10 pages, 4 figures

  20. arXiv:2407.15077  [pdf, other]

    cs.MA

    B2MAPO: A Batch-by-Batch Multi-Agent Policy Optimization to Balance Performance and Efficiency

    Authors: Wenjing Zhang, Wei Zhang, Wenqing Hu, Yifan Wang

    Abstract: Most multi-agent reinforcement learning approaches adopt two types of policy optimization methods that update policies either simultaneously or sequentially. Simultaneously updating the policies of all agents introduces a non-stationarity problem. Although sequentially updating policies agent-by-agent in an appropriate order improves policy performance, it is prone to low efficiency due to sequential exec…

    Submitted 29 July, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

  21. arXiv:2407.14086  [pdf, other]

    cs.CV

    Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking

    Authors: Yunfei Zhang, Chao Liang, Jin Gao, Zhipeng Zhang, Weiming Hu, Stephen Maybank, Xue Zhou, Liang Li

    Abstract: Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) tasks by incorporating the extraction of appearance features as auxiliary tasks through embedding Re-Identification task (ReID) into the detector, achieving a balance between inference speed and tracking performance. However, solving the competition between the detector and the featu…

    Submitted 6 August, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

    Comments: A submission to IJCV

  22. arXiv:2407.13912  [pdf, other]

    cs.RO eess.SY

    Optimization-Based Outlier Accommodation for Tightly Coupled RTK-Aided Inertial Navigation Systems in Urban Environments

    Authors: Wang Hu, Yingjie Hu, Mike Stas, Jay A. Farrell

    Abstract: Global Navigation Satellite Systems (GNSS) aided Inertial Navigation System (INS) is a fundamental approach for attaining continuously available absolute vehicle position and full state estimates at high bandwidth. For transportation applications, stated accuracy specifications must be achieved, unless the navigation system can detect when it is violated. In urban environments, GNSS measurements a…

    Submitted 27 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: 8 pages, 2 figures. Accepted by the 27th IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC 2024)

  23. arXiv:2407.12568  [pdf, other]

    cs.CV

    LTRL: Boosting Long-tail Recognition via Reflective Learning

    Authors: Qihao Zhao, Yalun Dai, Shen Lin, Wei Hu, Fan Zhang, Jun Liu

    Abstract: In real-world scenarios, knowledge distributions exhibit a long tail. Humans manage to master knowledge uniformly across imbalanced distributions, a feat attributed to their diligent practices of reviewing, summarizing, and correcting errors. Motivated by this learning process, we propose a novel learning paradigm, called reflecting learning, in handling long-tail recognition. Our method integ…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV2024

  24. arXiv:2407.11398  [pdf, other]

    cs.CV

    Animate3D: Animating Any 3D Model with Multi-view Video Diffusion

    Authors: Yanqin Jiang, Chaohui Yu, Chenjie Cao, Fan Wang, Weiming Hu, Jin Gao

    Abstract: Recent advances in 4D generation mainly focus on generating 4D content by distilling pre-trained text or single-view image-conditioned models. It is inconvenient for them to take advantage of various off-the-shelf 3D assets with multi-view attributes, and their results suffer from spatiotemporal inconsistency owing to the inherent ambiguity in the supervision signals. In this work, we present Anim…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Project Page: https://animate3d.github.io/

  25. arXiv:2407.10990  [pdf]

    cs.CL cs.AI

    MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models

    Authors: Mianxin Liu, Jinru Ding, Jie Xu, Weiguo Hu, Xiaoyang Li, Lifeng Zhu, Zhian Bai, Xiaoming Shi, Benyou Wang, Haitao Song, Pengfei Liu, Xiaofan Zhang, Shanshan Wang, Kang Li, Haofen Wang, Tong Ruan, Xuanjing Huang, Xin Sun, Shaoting Zhang

    Abstract: Ensuring the general efficacy and goodness for human beings from medical large language models (LLM) before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLM, especially in the Chinese context, remains to be established. In this work, we introduce "MedBench", a comprehensive, standardized, and reliable benchmarking system for Chinese med…

    Submitted 23 June, 2024; originally announced July 2024.

    Comments: 25 pages, 4 figures

  26. arXiv:2407.10956  [pdf, other]

    cs.AI cs.CL

    Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

    Authors: Ruisheng Cao, Fangyu Lei, Haoyuan Wu, Jixuan Chen, Yeqiao Fu, Hongcheng Gao, Xinzhuang Xiong, Hanchong Zhang, Yuchen Mao, Wenjing Hu, Tianbao Xie, Hongshen Xu, Danyang Zhang, Sida Wang, Ruoxi Sun, Pengcheng Yin, Caiming Xiong, Ansong Ni, Qian Liu, Victor Zhong, Lu Chen, Kai Yu, Tao Yu

    Abstract: Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivit…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 34 pages, 14 figures, 10 tables

  27. arXiv:2407.10430  [pdf, other]

    cs.CL cs.AI

    Expanding the Scope: Inductive Knowledge Graph Reasoning with Multi-Starting Progressive Propagation

    Authors: Zhoutian Shao, Yuanning Cui, Wei Hu

    Abstract: Knowledge graphs (KGs) are widely acknowledged as incomplete, and new entities are constantly emerging in the real world. Inductive KG reasoning aims to predict missing facts for these new entities. Among existing models, graph neural networks (GNNs) based ones have shown promising performance for this task. However, they are still challenged by inefficient message propagation due to the distance…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted in the 23rd International Semantic Web Conference (ISWC 2024)

  28. arXiv:2407.07479  [pdf, other]

    cs.CV

    How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval?

    Authors: Yuxin Chen, Zongyang Ma, Ziqi Zhang, Zhongang Qi, Chunfeng Yuan, Bing Li, Junfu Pu, Ying Shan, Xiaojuan Qi, Weiming Hu

    Abstract: Dominant dual-encoder models enable efficient image-text retrieval but suffer from limited accuracy while the cross-encoder models offer higher accuracy at the expense of efficiency. Distilling cross-modality matching knowledge from cross-encoder to dual-encoder provides a natural approach to harness their strengths. Thus we investigate the following valuable question: how to make cross-encoder a…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by CVPR 2024

  29. arXiv:2407.07478  [pdf, other]

    cs.CV

    EA-VTR: Event-Aware Video-Text Retrieval

    Authors: Zongyang Ma, Ziqi Zhang, Yuxin Chen, Zhongang Qi, Chunfeng Yuan, Bing Li, Yingmin Luo, Xu Li, Xiaojuan Qi, Ying Shan, Weiming Hu

    Abstract: Understanding the content of events occurring in the video and their inherent temporal logic is crucial for video-text retrieval. However, web-crawled pre-training datasets often lack sufficient event information, and the widely adopted video-level cross-modal contrastive learning also struggles to capture detailed and complex video-text event alignment. To address these challenges, we make improv…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  30. arXiv:2407.07403  [pdf, other]

    cs.CV

    A Survey of Attacks on Large Vision-Language Models: Resources, Advances, and Future Trends

    Authors: Daizong Liu, Mingyu Yang, Xiaoye Qu, Pan Zhou, Yu Cheng, Wei Hu

    Abstract: With the significant development of large models in recent years, Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities across a wide range of multimodal understanding and reasoning tasks. Compared to traditional Large Language Models (LLMs), LVLMs present great potential and challenges due to their closer proximity to the multi-resource real-world applications and the compl…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  31. arXiv:2407.04675  [pdf, other]

    eess.AS cs.SD

    Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition

    Authors: Ye Bai, Jingping Chen, Jitong Chen, Wei Chen, Zhuo Chen, Chuang Ding, Linhao Dong, Qianqian Dong, Yujiao Du, Kepan Gao, Lu Gao, Yi Guo, Minglun Han, Ting Han, Wenchao Hu, Xinying Hu, Yuxiang Hu, Deyu Hua, Lu Huang, Mingkun Huang, Youjia Huang, Jishuo Jin, Fanliu Kong, Zongwei Lan, Tianyu Li, et al. (30 additional authors not shown)

    Abstract: Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data matching scenarios and are gradually approaching a bottleneck. In this wor…

    Submitted 10 July, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  32. arXiv:2407.03884  [pdf, other]

    cs.CL cs.AI

    Planning with Large Language Models for Conversational Agents

    Authors: Zhigen Li, Jianxiang Peng, Yanmeng Wang, Tianhao Shen, Minghui Zhang, Linxi Su, Shang Wu, Yihang Wu, Yuqian Wang, Ye Wang, Wei Hu, Jianfeng Li, Shaojun Wang, Jing Xiao, Deyi Xiong

    Abstract: Controllability and proactivity are crucial properties of autonomous conversational agents (CAs). Controllability requires the CAs to follow the standard operating procedures (SOPs), such as verifying identity before activating credit cards. Proactivity requires the CAs to guide the conversation towards the goal during user uncooperation, such as persuasive dialogue. Existing research cannot be un…

    Submitted 4 July, 2024; originally announced July 2024.

  33. arXiv:2406.19240  [pdf, other]

    cs.SE

    Data Preparation for Deep Learning based Code Smell Detection: A Systematic Literature Review

    Authors: Fengji Zhang, Zexian Zhang, Jacky Wai Keung, Xiangru Tang, Zhen Yang, Xiao Yu, Wenhua Hu

    Abstract: Code Smell Detection (CSD) plays a crucial role in improving software quality and maintainability. Deep Learning (DL) techniques have emerged as a promising approach for CSD due to their superior performance. However, the effectiveness of DL-based CSD methods heavily relies on the quality of the training data. Despite its importance, little attention has been paid to analyzing the data prepara…

    Submitted 27 June, 2024; originally announced June 2024.

  34. arXiv:2406.18078  [pdf, other]

    cs.CL cs.AI

    Self-Training with Pseudo-Label Scorer for Aspect Sentiment Quad Prediction

    Authors: Yice Zhang, Jie Zeng, Weiming Hu, Ziyi Wang, Shiwei Chen, Ruifeng Xu

    Abstract: Aspect Sentiment Quad Prediction (ASQP) aims to predict all quads (aspect term, aspect category, opinion term, sentiment polarity) for a given review, which is the most representative and challenging task in aspect-based sentiment analysis. A key challenge in the ASQP task is the scarcity of labeled data, which limits the performance of existing methods. To tackle this issue, we propose a self-tra…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to ACL 2024 Main Conference

  35. arXiv:2406.14473  [pdf, other]

    cs.LG cs.CL

    Data-Centric AI in the Age of Large Language Models

    Authors: Xinyi Xu, Zhaoxuan Wu, Rui Qiao, Arun Verma, Yao Shu, Jingtan Wang, Xinyuan Niu, Zhenfeng He, Jiangwei Chen, Zijian Zhou, Gregory Kang Ruey Lau, Hieu Dao, Lucas Agussurja, Rachael Hwee Ling Sim, Xiaoqiang Lin, Wenyang Hu, Zhongxiang Dai, Pang Wei Koh, Bryan Kian Hsiang Low

    Abstract: This position paper proposes a data-centric viewpoint of AI research, focusing on large language models (LLMs). We start by making the key observation that data is instrumental in the developmental (e.g., pretraining and fine-tuning) and inferential stages (e.g., in-context learning) of LLMs, and yet it receives disproportionally low attention from the research community. We identify four specific…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Preprint

  36. arXiv:2406.13511  [pdf, other]

    cs.DC

    Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving

    Authors: Ke Cheng, Wen Hu, Zhi Wang, Hongen Peng, Jianguo Li, Sheng Zhang

    Abstract: Large language models (LLMs) iteratively generate text token by token, with memory usage increasing with the length of generated token sequences. The unpredictability of generation lengths makes it difficult to estimate the time and memory needed to process requests, posing a challenge for effective request scheduling. Conventional sequence-level scheduling (SLS) serves requests in a first-come fi…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages, 22 figures

  37. PetalView: Fine-grained Location and Orientation Extraction of Street-view Images via Cross-view Local Search with Supplementary Materials

    Authors: Wenmiao Hu, Yichen Zhang, Yuxuan Liang, Xianjing Han, Yifang Yin, Hannes Kruppa, See-Kiong Ng, Roger Zimmermann

    Abstract: Satellite-based street-view information extraction by cross-view matching refers to a task that extracts the location and orientation information of a given street-view image query by using one or multiple geo-referenced satellite images. Recent work has initiated a new research direction to find accurate information within a local area covered by one satellite image centered at a location prior (…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted by ACM Multimedia 2023. This version contains additional supplementary materials

    Journal ref: Proceedings of the 31st ACM International Conference on Multimedia (2023) 56-66

  38. arXiv:2406.12881  [pdf, other]

    physics.acc-ph cs.CL

    Towards Unlocking Insights from Logbooks Using AI

    Authors: Antonin Sulc, Alex Bien, Annika Eichler, Daniel Ratner, Florian Rehm, Frank Mayet, Gregor Hartmann, Hayden Hoschouer, Henrik Tuennermann, Jan Kaiser, Jason St. John, Jennefer Maldonado, Kyle Hazelwood, Raimund Kammering, Thorsten Hellert, Tim Wilksen, Verena Kain, Wan-Lin Hu

    Abstract: Electronic logbooks contain valuable information about activities and events concerning their associated particle accelerator facilities. However, the highly technical nature of logbook entries can hinder their usability and automation. As natural language processing (NLP) continues advancing, it offers opportunities to address various challenges that logbooks present. This work explores jointly t…

    Submitted 25 May, 2024; originally announced June 2024.

    Comments: 5 pages, 1 figure, 15th International Particle Accelerator Conference

  39. arXiv:2406.10765  [pdf, other]

    cs.DC

    PWDFT-SW: Extending the Limit of Plane-Wave DFT Calculations to 16K Atoms on the New Sunway Supercomputer

    Authors: Qingcai Jiang, Zhenwei Cao, Junshi Chen, Xinming Qin, Wei Hu, Hong An, Jinlong Yang

    Abstract: First-principles density functional theory (DFT) with plane wave (PW) basis set is the most widely used method in quantum mechanical material simulations due to its advantages in accuracy and universality. However, a perceived drawback of PW-based DFT calculations is their substantial computational cost and memory usage, which currently limits their ability to simulate large-scale complex systems…

    Submitted 15 June, 2024; originally announced June 2024.

  40. arXiv:2406.08835  [pdf, other]

    cs.SD eess.AS

    EffectiveASR: A Single-Step Non-Autoregressive Mandarin Speech Recognition Architecture with High Accuracy and Inference Speed

    Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, Jing Xiao

    Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. In this paper, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EffectiveASR. It uses an Index Mappin…

    Submitted 19 August, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: Submitted to ICASSP 2025

  41. arXiv:2406.05785  [pdf, other]

    cs.CV

    A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions

    Authors: Daizong Liu, Yang Liu, Wencan Huang, Wei Hu

    Abstract: Text-guided 3D visual grounding (T-3DVG), which aims to locate a specific object that semantically corresponds to a language query from a complicated 3D scene, has drawn increasing attention in the 3D research community over the past few years. Compared to 2D visual grounding, this task presents great potential and challenges due to its closer proximity to the real world and the complexity of data…

    Submitted 21 July, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  42. arXiv:2406.04983  [pdf, other]

    cs.CV

    CityCraft: A Real Crafter for 3D City Generation

    Authors: Jie Deng, Wenhao Chai, Junsheng Huang, Zhonghan Zhao, Qixuan Huang, Mingyan Gao, Jianshu Guo, Shengyu Hao, Wenhao Hu, Jenq-Neng Hwang, Xi Li, Gaoang Wang

    Abstract: City scene generation has gained significant attention in autonomous driving, smart city development, and traffic simulation. It helps enhance infrastructure planning and monitoring solutions. Existing methods have employed a two-stage process involving city layout generation, typically using Variational Autoencoders (VAEs), Generative Adversarial Networks (GANs), or Transformers, followed by neur…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 20 pages, 9 figures

  43. arXiv:2406.04785  [pdf, other]

    cs.DC

    Enabling Efficient Batch Serving for LMaaS via Generation Length Prediction

    Authors: Ke Cheng, Wen Hu, Zhi Wang, Peng Du, Jianguo Li, Sheng Zhang

    Abstract: Nowadays, large language models (LLMs) are published as a service and can be accessed by various applications via APIs, also known as language-model-as-a-service (LMaaS). Without knowing the generation length of requests, existing serving systems serve requests in a first-come, first-served (FCFS) manner with a fixed batch size, which leads to two problems that affect batch serving efficiency. Fir…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: 12 pages, 14 figures

  44. arXiv:2406.00403  [pdf, other]

    cs.LG cs.AI

    Dual-perspective Cross Contrastive Learning in Graph Transformers

    Authors: Zelin Yao, Chuang Liu, Xueqi Ma, Mukun Chen, Jia Wu, Xiantao Cai, Bo Du, Wenbin Hu

    Abstract: Graph contrastive learning (GCL) is a popular method for learning graph representations by maximizing the consistency of features across augmented views. Traditional GCL methods utilize single-perspective (i.e., data- or model-perspective) augmentation to generate positive samples, restraining the diversity of positive samples. In addition, these positive samples may be unreliable due to uncontrollabl…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: 12 pages, 5 figures, submitted to IEEE TKDE

  45. arXiv:2405.20279  [pdf, other]

    cs.CV cs.AI eess.IV

    CV-VAE: A Compatible Video VAE for Latent Generative Video Models

    Authors: Sijie Zhao, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Muyao Niu, Xiaoyu Li, Wenbo Hu, Ying Shan

    Abstract: Spatio-temporal compression of videos, utilizing networks such as Variational Autoencoders (VAE), plays a crucial role in OpenAI's SORA and numerous other video generative models. For instance, many LLM-like video models learn the distribution of discrete tokens derived from 3D VAEs within the VQVAE framework, while most diffusion-based video models capture the distribution of continuous latent ex…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Project Page: https://ailab-cvc.github.io/cvvae/index.html

  46. arXiv:2405.19958  [pdf, other]

    cs.CL cs.AI

    Multi-Aspect Controllable Text Generation with Disentangled Counterfactual Augmentation

    Authors: Yi Liu, Xiangyu Liu, Xiangrong Zhu, Wei Hu

    Abstract: Multi-aspect controllable text generation aims to control the generated texts in attributes from multiple aspects (e.g., "positive" from sentiment and "sport" from topic). For ease of obtaining training samples, existing works neglect attribute correlations formed by the intertwining of different attributes. Particularly, the stereotype formed by imbalanced attribute correlations significantly aff…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted in the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  47. arXiv:2405.19782  [pdf, other]

    cs.SE cs.CL

    Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion

    Authors: Wei Cheng, Yuhan Wu, Wei Hu

    Abstract: Recent years have witnessed the deployment of code language models (LMs) in various code intelligence tasks such as code completion. Yet, it is challenging for pre-trained LMs to generate correct completions in private repositories. Previous studies retrieve cross-file context based on import relations or text similarity, which is insufficiently relevant to completion targets. In this paper, we pr…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted in the 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024)

  48. arXiv:2405.19315  [pdf, other]

    cs.CV cs.CL cs.LG

    Matryoshka Query Transformer for Large Vision-Language Models

    Authors: Wenbo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang

    Abstract: Large Vision-Language Models (LVLMs) typically encode an image into a fixed number of visual tokens (e.g., 576) and process these tokens with a language model. Despite their strong performance, LVLMs face challenges in adapting to varying computational constraints. This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resource…

    Submitted 6 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: Preprint. Our code and model are publicly available at https://github.com/gordonhu608/MQT-LLaVA

  49. arXiv:2405.18435  [pdf, other]

    eess.IV cs.CV

    QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge

    Authors: Hongwei Bran Li, Fernando Navarro, Ivan Ezhov, Amirhossein Bayat, Dhritiman Das, Florian Kofler, Suprosanna Shit, Diana Waldmannstetter, Johannes C. Paetzold, Xiaobin Hu, Benedikt Wiestler, Lucas Zimmer, Tamaz Amiranashvili, Chinmay Prabhakar, Christoph Berger, Jonas Weidner, Michelle Alonso-Basant, Arif Rashid, Ujjwal Baid, Wesam Adel, Deniz Ali, Bhakti Baheti, Yingbin Bai, Ishaan Bhatt, Sabri Can Cetindag, et al. (55 additional authors not shown)

    Abstract: Uncertainty in medical image segmentation tasks, especially inter-rater variability, arising from differences in interpretations and annotations by various experts, presents a significant challenge in achieving consistent and reliable image segmentation. This variability not only reflects the inherent complexity and subjective nature of medical image interpretation but also directly impacts the de…

    Submitted 24 June, 2024; v1 submitted 19 March, 2024; originally announced May 2024.

    Comments: initial technical report

  50. arXiv:2405.17875  [pdf, other]

    math.OC cs.LG

    BO4IO: A Bayesian optimization approach to inverse optimization with uncertainty quantification

    Authors: Yen-An Lu, Wei-Shou Hu, Joel A. Paulson, Qi Zhang

    Abstract: This work addresses data-driven inverse optimization (IO), where the goal is to estimate unknown parameters in an optimization model from observed decisions that can be assumed to be optimal or near-optimal solutions to the optimization problem. The IO problem is commonly formulated as a large-scale bilevel program that is notoriously difficult to solve. Deviating from traditional exact solution m…

    Submitted 28 May, 2024; originally announced May 2024.