
Showing 1–50 of 1,510 results for author: Xu, C

Searching in archive cs.
  1. arXiv:2408.10826 [pdf, other]

    cs.DC

    NeuLite: Memory-Efficient Federated Learning via Elastic Progressive Training

    Authors: Yebo Wu, Li Li, Chunlin Tian, Dubing Chen, Chengzhong Xu

    Abstract: Federated Learning (FL) emerges as a new learning paradigm that enables multiple devices to collaboratively train a shared model while preserving data privacy. However, intensive memory footprint during the training process severely bottlenecks the deployment of FL on resource-constrained devices in real-world cases. In this paper, we propose NeuLite, a framework that breaks the memory wall throug…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.10681 [pdf, other]

    cs.CL cs.LG

    HMoE: Heterogeneous Mixture of Experts for Language Modeling

    Authors: An Wang, Xingwu Sun, Ruobing Xie, Shuaipeng Li, Jiaqi Zhu, Zhen Yang, Pinxue Zhao, J. N. Han, Zhanhui Kang, Di Wang, Naoaki Okazaki, Cheng-zhong Xu

    Abstract: Mixture of Experts (MoE) offers remarkable performance and computational efficiency by selectively activating subsets of model parameters. Traditionally, MoE models use homogeneous experts, each with identical capacity. However, varying complexity in input data necessitates experts with diverse capabilities, while homogeneous MoE hinders effective expert specialization and efficient parameter util…

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.10198 [pdf, other]

    cs.CV cs.GR

    MeshFormer: High-Quality Mesh Generation with 3D-Guided Reconstruction Model

    Authors: Minghua Liu, Chong Zeng, Xinyue Wei, Ruoxi Shi, Linghao Chen, Chao Xu, Mengqi Zhang, Zhaoning Wang, Xiaoshuai Zhang, Isabella Liu, Hongzhi Wu, Hao Su

    Abstract: Open-world 3D reconstruction models have recently garnered significant attention. However, without sufficient 3D inductive bias, existing methods typically entail expensive training costs and struggle to extract high-quality 3D meshes. In this work, we introduce MeshFormer, a sparse-view reconstruction model that explicitly leverages 3D native structure, input guidance, and training supervision. S…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 20 pages, 9 figures

  4. arXiv:2408.10195 [pdf, other]

    cs.CV cs.AI cs.GR

    SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views

    Authors: Chao Xu, Ang Li, Linghao Chen, Yulin Liu, Ruoxi Shi, Hao Su, Minghua Liu

    Abstract: Open-world 3D generation has recently attracted considerable attention. While many single-image-to-3D methods have yielded visually appealing outcomes, they often lack sufficient controllability and tend to produce hallucinated regions that may not align with users' expectations. In this paper, we explore an important scenario in which the input consists of one or a few unposed 2D images of a sing…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  5. arXiv:2408.09786 [pdf, other]

    cs.CV

    Cross-composition Feature Disentanglement for Compositional Zero-shot Learning

    Authors: Yuxia Geng, Runkai Zhu, Jiaoyan Chen, Jintai Chen, Zhuo Chen, Xiang Chen, Can Xu, Yuxiang Wang, Xiaoliang Xu

    Abstract: Disentanglement of visual features of primitives (i.e., attributes and objects) has shown exceptional results in Compositional Zero-shot Learning (CZSL). However, due to the feature divergence of an attribute (resp. object) when combined with different objects (resp. attributes), it is challenging to learn disentangled primitive features that are general across different compositions. To this end,…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: work in progress

  6. arXiv:2408.09476 [pdf, other]

    cs.CV cs.LG

    Advances in Multiple Instance Learning for Whole Slide Image Analysis: Techniques, Challenges, and Future Directions

    Authors: Jun Wang, Yu Mao, Nan Guan, Chun Jason Xue

    Abstract: Whole slide images (WSIs) are gigapixel-scale digital images of H&E-stained tissue samples widely used in pathology. The substantial size and complexity of WSIs pose unique analytical challenges. Multiple Instance Learning (MIL) has emerged as a powerful approach for addressing these challenges, particularly in cancer classification and detection. This survey provides a comprehensive overview of…

    Submitted 18 August, 2024; originally announced August 2024.

  7. arXiv:2408.09468 [pdf, other]

    cs.RO

    Towards Safe and Robust Autonomous Vehicle Platooning: A Self-Organizing Cooperative Control Framework

    Authors: Chengkai Xu, Zihao Deng, Jiaqi Liu, Chao Huang, Peng Hang

    Abstract: In the emerging hybrid traffic flow environment, which includes both human-driven vehicles (HDVs) and autonomous vehicles (AVs), ensuring safe and robust decision-making and control is crucial for the effective operation of autonomous vehicle platooning. Current systems for cooperative adaptive cruise control and lane changing are inadequate in responding to real-world emergency situations, limiti…

    Submitted 18 August, 2024; originally announced August 2024.

  8. arXiv:2408.09458 [pdf, other]

    cs.CV

    G2Face: High-Fidelity Reversible Face Anonymization via Generative and Geometric Priors

    Authors: Haoxin Yang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Jing Qin, Yi Wang, Pheng-Ann Heng, Shengfeng He

    Abstract: Reversible face anonymization, unlike traditional face pixelization, seeks to replace sensitive identity information in facial images with synthesized alternatives, preserving privacy without sacrificing image clarity. Traditional methods, such as encoder-decoder networks, often result in significant loss of facial details due to their limited learning capacity. Additionally, relying on latent man…

    Submitted 18 August, 2024; originally announced August 2024.

  9. arXiv:2408.09397 [pdf, other]

    cs.CV

    Combo: Co-speech holistic 3D human motion generation and efficient customizable adaptation in harmony

    Authors: Chao Xu, Mingze Sun, Zhi-Qi Cheng, Fei Wang, Yang Liu, Baigui Sun, Ruqi Huang, Alexander Hauptmann

    Abstract: In this paper, we propose a novel framework, Combo, for harmonious co-speech holistic 3D human motion generation and efficient customizable adaptation. In particular, we identify one fundamental challenge as the multiple-input-multiple-output (MIMO) nature of the generative model of interest. More concretely, on the input end, the model typically consumes both speech signals and character guida…

    Submitted 18 August, 2024; originally announced August 2024.

  10. arXiv:2408.09220 [pdf, other]

    cs.CV cs.AI

    Flatten: Video Action Recognition is an Image Classification task

    Authors: Junlin Chen, Chengcheng Xu, Yangfan Xu, Jian Yang, Jun Li, Zhiping Shi

    Abstract: In recent years, video action recognition, as a fundamental task in the field of video understanding, has been deeply explored by numerous researchers. Most traditional video action recognition methods typically involve converting videos into three-dimensional data that encapsulates both spatial and temporal information, subsequently leveraging prevalent image understanding models to model and anal…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 13 pages, 6 figures

  11. arXiv:2408.08502 [pdf, other]

    cs.CV

    Efficient Image-to-Image Diffusion Classifier for Adversarial Robustness

    Authors: Hefei Mei, Minjing Dong, Chang Xu

    Abstract: Diffusion models (DMs) have demonstrated great potential in the field of adversarial robustness, where DM-based defense methods can achieve superior defense capability without adversarial training. However, they all require huge computational costs due to the usage of large-scale pre-trained DMs, making it difficult to conduct full evaluation under strong attacks and compare with traditional CNN-b…

    Submitted 15 August, 2024; originally announced August 2024.

  12. arXiv:2408.07673 [pdf]

    cs.LG cs.AI cs.NE q-bio.QM

    Deep Learning: a Heuristic Three-stage Mechanism for Grid Searches to Optimize the Future Risk Prediction of Breast Cancer Metastasis Using EHR-based Clinical Data

    Authors: Xia Jiang, Yijun Zhou, Chuhan Xu, Adam Brufsky, Alan Wells

    Abstract: A grid search, at the cost of training and testing a large number of models, is an effective way to optimize the prediction performance of deep learning models. A challenging task concerning grid search is the time management. Without a good time management scheme, a grid search can easily be set off as a mission that will not finish in our lifetime. In this study, we introduce a heuristic three-s…

    Submitted 15 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  13. arXiv:2408.05775 [pdf, other]

    cs.CV

    Efficient Test-Time Prompt Tuning for Vision-Language Models

    Authors: Yuhan Zhu, Guozhen Zhang, Chen Xu, Haocheng Shen, Xiaoxin Chen, Gangshan Wu, Limin Wang

    Abstract: Vision-language models have showcased impressive zero-shot classification capabilities when equipped with suitable text prompts. Previous studies have shown the effectiveness of test-time prompt tuning; however, these methods typically require per-image prompt adaptation during inference, which incurs high computational budgets and limits scalability and practical deployment. To overcome this issu…

    Submitted 11 August, 2024; originally announced August 2024.

  14. arXiv:2408.04294 [pdf, other]

    cs.CV cs.LG

    Dual-branch PolSAR Image Classification Based on GraphMAE and Local Feature Extraction

    Authors: Yuchen Wang, Ziyi Guo, Haixia Bi, Danfeng Hong, Chen Xu

    Abstract: The annotation of polarimetric synthetic aperture radar (PolSAR) images is a labor-intensive and time-consuming process. Therefore, classifying PolSAR images with limited labels is a challenging task in the remote sensing domain. In recent years, self-supervised learning approaches have proven effective in PolSAR image classification with sparse labels. However, we observe a lack of research on genera…

    Submitted 8 August, 2024; originally announced August 2024.

  15. arXiv:2408.01945 [pdf, other]

    cs.CV cs.RO

    Generalized Maximum Likelihood Estimation for Perspective-n-Point Problem

    Authors: Tian Zhan, Chunfeng Xu, Cheng Zhang, Ke Zhu

    Abstract: The Perspective-n-Point (PnP) problem has been widely studied in the literature and applied in various vision-based pose estimation scenarios. However, existing methods ignore the anisotropy uncertainty of observations, as demonstrated in several real-world datasets in this paper. This oversight may lead to suboptimal and inaccurate estimation, particularly in the presence of noisy observations. T…

    Submitted 4 August, 2024; originally announced August 2024.

  16. arXiv:2408.01835 [pdf, other]

    cs.CV

    TS-SAM: Fine-Tuning Segment-Anything Model for Downstream Tasks

    Authors: Yang Yu, Chen Xu, Kai Wang

    Abstract: Adapter-based fine-tuning has been studied for improving the performance of SAM on downstream tasks. However, there is still a significant performance gap between fine-tuned SAMs and domain-specific models. To reduce the gap, we propose Two-Stream SAM (TS-SAM). On the one hand, inspired by the side network in Parameter-Efficient Fine-Tuning (PEFT), we designed a lightweight Convolutional Side Adap…

    Submitted 3 August, 2024; originally announced August 2024.

  17. arXiv:2408.01649 [pdf, other]

    cs.RO

    LF-3PM: a LiDAR-based Framework for Perception-aware Planning with Perturbation-induced Metric

    Authors: Kaixin Chai, Long Xu, Qianhao Wang, Chao Xu, Peng Yin, Fei Gao

    Abstract: Just as humans can become disoriented in featureless deserts or thick fogs, not all environments are conducive to the Localization Accuracy and Stability (LAS) of autonomous robots. This paper introduces an efficient framework designed to enhance LiDAR-based LAS through strategic trajectory generation, known as Perception-aware Planning. Unlike vision-based frameworks, the LiDAR-based requires dif…

    Submitted 2 August, 2024; originally announced August 2024.

  18. arXiv:2408.01430 [pdf, other]

    cs.CV cs.AI

    SUSTechGAN: Image Generation for Object Recognition in Adverse Conditions of Autonomous Driving

    Authors: Gongjin Lan, Yang Peng, Qi Hao, Chengzhong Xu

    Abstract: Autonomous driving significantly benefits from data-driven deep neural networks. However, the data in autonomous driving typically fits the long-tailed distribution, in which the critical driving data in adverse conditions is hard to collect. Although generative adversarial networks (GANs) have been applied to augment data for autonomous driving, generating driving images in adverse conditions is…

    Submitted 18 July, 2024; originally announced August 2024.

    Comments: 10 pages, 9 figures

  19. arXiv:2408.01076 [pdf, other]

    cs.CV

    Exploiting the Semantic Knowledge of Pre-trained Text-Encoders for Continual Learning

    Authors: Lu Yu, Zhe Tao, Hantao Yao, Joost Van de Weijer, Changsheng Xu

    Abstract: Deep neural networks (DNNs) excel on fixed datasets but struggle with incremental and shifting data in real-world scenarios. Continual learning addresses this challenge by allowing models to learn from new data while retaining previously learned knowledge. Existing methods mainly rely on visual features, often neglecting the rich semantic information encoded in text. The semantic knowledge availab…

    Submitted 2 August, 2024; originally announced August 2024.

  20. arXiv:2408.00764 [pdf, other]

    cs.CL cs.AI cs.LG

    AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation

    Authors: Mengkang Hu, Pu Zhao, Can Xu, Qingfeng Sun, Jianguang Lou, Qingwei Lin, Ping Luo, Saravan Rajmohan, Dongmei Zhang

    Abstract: Large Language Model (LLM) based agents have garnered significant attention and are becoming increasingly popular. Furthermore, planning ability is a crucial component of an LLM-based agent, involving interaction with the environment and executing actions to complete a planning task, which generally entails achieving a desired goal from an initial state. This paper investigates enhancing the plann…

    Submitted 1 August, 2024; originally announced August 2024.

  21. arXiv:2407.21439 [pdf, other]

    cs.AI cs.CL cs.LG

    MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training

    Authors: Zhanpeng Chen, Chengjin Xu, Yiyan Qi, Jian Guo

    Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in processing and generating content across multiple data modalities, including text, images, audio, and video. However, a significant drawback of MLLMs is their reliance on static training data, leading to outdated information and limited contextual awareness. This static nature hampers their ability to provide acc…

    Submitted 31 July, 2024; originally announced July 2024.

  22. arXiv:2407.20523 [pdf, other]

    cs.IT cs.MM

    Wireless Multi-User Interactive Virtual Reality in Metaverse with Edge-Device Collaborative Computing

    Authors: Caolu Xu, Zhiyong Chen, Meixia Tao, Wenjun Zhang

    Abstract: The immersive nature of the metaverse presents significant challenges for wireless multi-user interactive virtual reality (VR), such as ultra-low latency, high throughput and intensive computing, which place substantial demands on the wireless bandwidth and rendering resources of mobile edge computing (MEC). In this paper, we propose a wireless multi-user interactive VR with edge-device collaborat…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: submitted to IEEE journal

  23. arXiv:2407.19078 [pdf, other]

    cs.LG stat.ML

    Practical Marketplace Optimization at Uber Using Causally-Informed Machine Learning

    Authors: Bobby Chen, Siyu Chen, Jason Dowlatabadi, Yu Xuan Hong, Vinayak Iyer, Uday Mantripragada, Rishabh Narang, Apoorv Pandey, Zijun Qin, Abrar Sheikh, Hongtao Sun, Jiaqi Sun, Matthew Walker, Kaichen Wei, Chen Xu, Jingnan Yang, Allen T. Zhang, Guoqing Zhang

    Abstract: Budget allocation of marketplace levers, such as incentives for drivers and promotions for riders, has long been a technical and business challenge at Uber; understanding lever budget changes' impact and estimating cost efficiency to achieve predefined budgets is crucial, with the goal of optimal allocations that maximize business value; we introduce an end-to-end machine learning and optimization…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: To be published in the 2nd Workshop on Causal Inference and Machine Learning in Practice, KDD 2024, August 25 to 29, 2024, Barcelona, Spain, 10 pages

    MSC Class: 62J99

  24. arXiv:2407.19014 [pdf, other]

    cs.CV

    Sparse Refinement for Efficient High-Resolution Semantic Segmentation

    Authors: Zhijian Liu, Zhuoyang Zhang, Samir Khaki, Shang Yang, Haotian Tang, Chenfeng Xu, Kurt Keutzer, Song Han

    Abstract: Semantic segmentation empowers numerous real-world applications, such as autonomous driving and augmented/mixed reality. These applications often operate on high-resolution images (e.g., 8 megapixels) to capture the fine details. However, this comes at the cost of considerable computational complexity, hindering the deployment in latency-sensitive scenarios. In this paper, we introduce SparseRefin…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. The first two authors contributed equally to this work. Project page: https://sparserefine.mit.edu

  25. arXiv:2407.18559 [pdf, other]

    cs.CV

    VSSD: Vision Mamba with Non-Causal State Space Duality

    Authors: Yuheng Shi, Minjing Dong, Mingjia Li, Chang Xu

    Abstract: Vision transformers have significantly advanced the field of computer vision, offering robust modeling capabilities and global receptive field. However, their high computational demands limit their applicability in processing long sequences. To tackle this issue, State Space Models (SSMs) have gained prominence in vision tasks as they offer linear computational complexity. Recently, State Space Du…

    Submitted 4 August, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: 16 pages, 5 figures, 7 tables

  26. arXiv:2407.18418 [pdf, other]

    cs.CL

    Know Your Limits: A Survey of Abstention in Large Language Models

    Authors: Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, Lucy Lu Wang

    Abstract: Abstention, the refusal of large language models (LLMs) to provide an answer, is increasingly recognized for its potential to mitigate hallucinations and enhance safety in LLM systems. In this survey, we introduce a framework to examine abstention from three perspectives: the query, the model, and human values. We organize the literature on abstention methods, benchmarks, and evaluation metrics us…

    Submitted 8 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: preprint

  27. arXiv:2407.17757 [pdf, other]

    cs.CV cs.RO

    CRASH: Crash Recognition and Anticipation System Harnessing with Context-Aware and Temporal Focus Attentions

    Authors: Haicheng Liao, Haoyu Sun, Huanming Shen, Chengyue Wang, Kahou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li

    Abstract: Accurately and promptly predicting accidents among surrounding traffic agents from camera footage is crucial for the safety of autonomous vehicles (AVs). This task presents substantial challenges stemming from the unpredictable nature of traffic accidents, their long-tail distribution, the intricacies of traffic scene dynamics, and the inherently constrained field of vision of onboard cameras. To…

    Submitted 25 July, 2024; originally announced July 2024.

  28. arXiv:2407.17738 [pdf, other]

    cs.CV

    Enhancing Fine-grained Object Detection in Aerial Images via Orthogonal Mapping

    Authors: Haoran Zhu, Yifan Zhou, Chang Xu, Ruixiang Zhang, Wen Yang

    Abstract: Fine-Grained Object Detection (FGOD) is a critical task in high-resolution aerial image analysis. This letter introduces Orthogonal Mapping (OM), a simple yet effective method aimed at addressing the challenge of semantic confusion inherent in FGOD. OM introduces orthogonal constraints in the feature space by decoupling features from the last layer of the classification branch with a class-wise or…

    Submitted 24 July, 2024; originally announced July 2024.

  29. arXiv:2407.17730 [pdf, other]

    cs.CL

    Are Large Language Models Possible to Conduct Cognitive Behavioral Therapy?

    Authors: Hao Shen, Zihan Li, Minqiang Yang, Minghui Ni, Yongfeng Tao, Zhengyang Yu, Weihao Zheng, Chen Xu, Bin Hu

    Abstract: In contemporary society, the issue of psychological health has become increasingly prominent, characterized by the diversification, complexity, and universality of mental disorders. Cognitive Behavioral Therapy (CBT), currently the most influential and clinically effective psychological treatment method with no side effects, has limited coverage and poor quality in most countries. In recent years,…

    Submitted 24 July, 2024; originally announced July 2024.

  30. arXiv:2407.17078 [pdf, other]

    cs.RO

    Active Loop Closure for OSM-guided Robotic Mapping in Large-Scale Urban Environments

    Authors: Wei Gao, Zezhou Sun, Mingle Zhao, Cheng-Zhong Xu, Hui Kong

    Abstract: The autonomous mapping of large-scale urban scenes presents significant challenges for autonomous robots. To mitigate the challenges, global planning, such as utilizing prior GPS trajectories from OpenStreetMap (OSM), is often used to guide the autonomous navigation of robots for mapping. However, due to factors like complex terrain, unexpected body movement, and sensor noise, the uncertainty of t…

    Submitted 24 July, 2024; originally announced July 2024.

  31. arXiv:2407.16277 [pdf, other]

    cs.CV cs.HC

    When, Where, and What? A Novel Benchmark for Accident Anticipation and Localization with Large Language Models

    Authors: Haicheng Liao, Yongkang Li, Chengyue Wang, Yanchen Guan, KaHou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li

    Abstract: As autonomous driving systems increasingly become part of daily transportation, the ability to accurately anticipate and mitigate potential traffic accidents is paramount. Traditional accident anticipation models primarily utilizing dashcam videos are adept at predicting when an accident may occur but fall short in localizing the incident and identifying involved entities. Addressing this gap, thi…

    Submitted 26 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  32. arXiv:2407.14744 [pdf, other]

    cs.CV

    A Comprehensive Review of Few-shot Action Recognition

    Authors: Yuyang Wanyan, Xiaoshan Yang, Weiming Dong, Changsheng Xu

    Abstract: Few-shot action recognition aims to address the high cost and impracticality of manually labeling complex and variable video data in action recognition. It requires accurately classifying human actions in videos using only a few labeled examples per class. Compared to few-shot learning in image scenarios, few-shot action recognition is more challenging due to the intrinsic complexity of video data…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 22 pages

  33. arXiv:2407.14212 [pdf, other]

    cs.SD cs.CL eess.AS

    Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2

    Authors: Chun Xu, En-Wei Sun

    Abstract: An increasing number of Chinese people are troubled by different degrees of visual impairment, which has made the modal conversion between a single image or video frame in the visual field and the audio expressing the same information a research hotspot. Deep learning technologies such as OCR+Vocoder and Im2Wav enable English audio synthesis or image-to-sound matching in a self-supervised manner.…

    Submitted 19 July, 2024; originally announced July 2024.

  34. arXiv:2407.13773 [pdf, other]

    cs.DL cs.AI

    OpenDataLab: Empowering General Artificial Intelligence with Open Datasets

    Authors: Conghui He, Wei Li, Zhenjiang Jin, Chao Xu, Bin Wang, Dahua Lin

    Abstract: The advancement of artificial intelligence (AI) hinges on the quality and accessibility of data, yet the current fragmentation and variability of data sources hinder efficient data utilization. The dispersion of data sources and diversity of data formats often lead to inefficiencies in data retrieval and processing, significantly impeding the progress of AI research and applications. To address th…

    Submitted 4 June, 2024; originally announced July 2024.

  35. arXiv:2407.13609 [pdf, other]

    cs.CV cs.AI

    Training-free Composite Scene Generation for Layout-to-Image Synthesis

    Authors: Jiaqi Liu, Tao Huang, Chang Xu

    Abstract: Recent breakthroughs in text-to-image diffusion models have significantly advanced the generation of high-fidelity, photo-realistic images from textual descriptions. Yet, these models often struggle with interpreting spatial arrangements from text, hindering their ability to produce images with precise spatial configurations. To bridge this gap, layout-to-image generation has emerged as a promisin…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  36. arXiv:2407.13292 [pdf, other]

    cs.SD cs.CL eess.AS

    Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training

    Authors: Lukuan Dong, Donghong Qin, Fengbo Bai, Fanhua Song, Yan Liu, Chen Xu, Zhijian Ou

    Abstract: The mainstream automatic speech recognition (ASR) technology usually requires hundreds to thousands of hours of annotated speech data. Three approaches to low-resourced ASR are phoneme or subword based supervised pre-training, and self-supervised pre-training over multilingual data. The Iu Mien language is the main ethnic language of the Yao ethnic group in China and is low-resourced in the sense…

    Submitted 18 July, 2024; originally announced July 2024.

  37. arXiv:2407.13193 [pdf, other]

    cs.CL

    Retrieval-Augmented Generation for Natural Language Processing: A Survey

    Authors: Shangyu Wu, Ying Xiong, Yufei Cui, Haolun Wu, Can Chen, Ye Yuan, Lianming Huang, Xue Liu, Tei-Wei Kuo, Nan Guan, Chun Jason Xue

    Abstract: Large language models (LLMs) have demonstrated great success in various fields, benefiting from their huge amount of parameters that store knowledge. However, LLMs still suffer from several key issues, such as hallucination problems, knowledge update issues, and lacking domain-specific expertise. The appearance of retrieval-augmented generation (RAG), which leverages an external knowledge database…

    Submitted 18 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  38. arXiv:2407.13083 [pdf, other]

    cs.SD cs.CV eess.AS

    Modeling and Driving Human Body Soundfields through Acoustic Primitives

    Authors: Chao Huang, Dejan Markovic, Chenliang Xu, Alexander Richard

    Abstract: While rendering and animation of photorealistic 3D human body models have matured and reached an impressive quality over the past years, modeling the spatial audio associated with such full body models has been largely ignored so far. In this work, we present a framework that allows for high-quality spatial audio generation, capable of rendering the full 3D soundfield generated by a human body, in…

    Submitted 20 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project Page: https://wikichao.github.io/Acoustic-Primitives/

  39. arXiv:2407.12371 [pdf, other]

    cs.CV cs.AI

    HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects

    Authors: Xintao Lv, Liang Xu, Yichao Yan, Xin Jin, Congsheng Xu, Shuwen Wu, Yifan Liu, Lincheng Li, Mengxiao Bi, Wenjun Zeng, Xiaokang Yang

    Abstract: Generating human-object interactions (HOIs) is critical with the tremendous advances of digital avatars. Existing datasets are typically limited to humans interacting with a single object while neglecting the ubiquitous manipulation of multiple objects. Thus, we propose HIMO, a large-scale MoCap dataset of full-body human interacting with multiple objects, containing 3.3K 4D HOI sequences and 4.08…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Project page: https://lvxintao.github.io/himo, accepted by ECCV 2024

  40. arXiv:2407.12306 [pdf, other]

    cs.CV

    Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections

    Authors: Congrong Xu, Justin Kerr, Angjoo Kanazawa

    Abstract: Novel view synthesis from unconstrained in-the-wild image collections remains a significant yet challenging task due to photometric variations and transient occluders that complicate accurate scene reconstruction. Previous methods have approached these issues by integrating per-image appearance feature embeddings in Neural Radiance Fields (NeRFs). Although 3D Gaussian Splatting (3DGS) offers fast…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 9 pages

  41. arXiv:2407.12189 [pdf, other]

    cs.RO

    Wheeled Humanoid Bilateral Teleoperation with Position-Force Control Modes for Dynamic Loco-Manipulation

    Authors: Amartya Purushottam, Jack Yan, Christopher Xu, Youngwoo Sim, Joao Ramos

    Abstract: Remote-controlled humanoid robots can revolutionize manufacturing, construction, and healthcare industries by performing complex or dangerous manual tasks traditionally done by humans. We refer to these behaviors as Dynamic Loco-Manipulation (DLM). To successfully complete these tasks, humans control the position of their bodies and contact forces at their hands. To enable similar whole-body contr…

    Submitted 16 July, 2024; originally announced July 2024.

  42. arXiv:2407.11419 [pdf, other]

    cs.CV

    TeethDreamer: 3D Teeth Reconstruction from Five Intra-oral Photographs

    Authors: Chenfan Xu, Zhentao Liu, Yuan Liu, Yulong Dou, Jiamin Wu, Jiepeng Wang, Minjiao Wang, Dinggang Shen, Zhiming Cui

    Abstract: Orthodontic treatment usually requires regular face-to-face examinations to monitor dental conditions of the patients. When in-person diagnosis is not feasible, an alternative is to utilize five intra-oral photographs for remote dental monitoring. However, this approach lacks 3D information, and how to reconstruct 3D dental models from such sparse-view photographs is a challenging problem. In this study,…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: MICCAI 2024

  43. arXiv:2407.10805 [pdf, other]

    cs.CL cs.AI

    Think-on-Graph 2.0: Deep and Interpretable Large Language Model Reasoning with Knowledge Graph-guided Retrieval

    Authors: Shengjie Ma, Chengjin Xu, Xuhui Jiang, Muzhi Li, Huaren Qu, Jian Guo

    Abstract: Retrieval-augmented generation (RAG) has significantly advanced large language models (LLMs) by enabling dynamic information retrieval to mitigate knowledge gaps and hallucinations in generated content. However, these systems often falter with complex reasoning and consistency across diverse queries. In this work, we present Think-on-Graph 2.0, an enhanced RAG framework that aligns questions with…

    Submitted 6 August, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  44. arXiv:2407.10627  [pdf, other]

    cs.CL cs.AI cs.LG

    Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena

    Authors: Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Qingwei Lin, Jianguang Lou, Shifeng Chen, Yansong Tang, Weizhu Chen

    Abstract: Assessing the effectiveness of large language models (LLMs) presents substantial challenges. The method of conducting human-annotated battles in an online Chatbot Arena is a highly effective evaluative technique. However, this approach is limited by the costs and time required for human annotation. In this paper, we introduce Arena Learning, an innovative offline strategy designed to simulate thes…

    Submitted 15 July, 2024; originally announced July 2024.

  45. arXiv:2407.10204  [pdf, other]

    cs.LG

    Improving Graph Out-of-distribution Generalization on Real-world Data

    Authors: Can Xu, Yao Cheng, Jianxiang Yu, Haosen Wang, Jingsong Lv, Xiang Li

    Abstract: Existing methods for graph out-of-distribution (OOD) generalization primarily rely on empirical studies on synthetic datasets. Such approaches tend to overemphasize the causal relationships between invariant sub-graphs and labels, thereby neglecting the non-negligible role of environment in real-world scenarios. In contrast to previous studies that impose rigid independence assumptions on environm…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 21 pages, 5 figures

  46. arXiv:2407.10173  [pdf, other]

    cs.DC

    StatuScale: Status-aware and Elastic Scaling Strategy for Microservice Applications

    Authors: Linfeng Wen, Minxian Xu, Sukhpal Singh Gill, Muhammad Hafizhuddin Hilman, Satish Narayana Srirama, Kejiang Ye, Chengzhong Xu

    Abstract: Microservice architecture has transformed traditional monolithic applications into lightweight components. Scaling these lightweight microservices is more efficient than scaling servers. However, scaling microservices still faces challenges resulting from unexpected spikes or bursts of requests, which are difficult to detect and can degrade performance instantaneously. To address this chall…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 26 pages

    Journal ref: ACM Transactions on Autonomous and Adaptive Systems, 2024

  47. arXiv:2407.10169  [pdf, other]

    cs.DC

    DRPC: Distributed Reinforcement Learning Approach for Scalable Resource Provisioning in Container-based Clusters

    Authors: Haoyu Bai, Minxian Xu, Kejiang Ye, Rajkumar Buyya, Chengzhong Xu

    Abstract: Microservices have transformed monolithic applications into lightweight, self-contained, and isolated application components, establishing themselves as a dominant paradigm for application development and deployment in public clouds such as Google and Alibaba. Autoscaling emerges as an efficient strategy for managing resources allocated to microservices' replicas. However, the dynamic and intricat…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 12 pages

    Journal ref: IEEE Transactions on Services Computing, 2024

  48. arXiv:2407.10101  [pdf, other]

    cs.RO

    WING: Wheel-Inertial Neural Odometry with Ground Manifold Constraints

    Authors: Chenxing Jiang, Kunyi Zhang, Sheng Yang, Shaojie Shen, Chao Xu, Fei Gao

    Abstract: In this paper, we propose an interoceptive-only odometry system for ground robots with neural network processing and soft constraints based on the assumption of a globally continuous ground manifold. Exteroceptive sensors such as cameras, GPS and LiDAR may encounter difficulties in scenarios with poor illumination, indoor environments, dusty areas and straight tunnels. Therefore, improving the pos…

    Submitted 23 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

  49. arXiv:2407.09590  [pdf, other]

    cs.CL cs.LG

    Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts

    Authors: Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, Jianfeng Gao

    Abstract: By increasing model parameters but activating them sparsely when performing a task, the use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost. However, the memory consumption due to the growing number of experts presents a challenge to the deployment of these models in many real-world settings. Our…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 13 pages, 6 figures

  50. arXiv:2407.08526  [pdf, other]

    cs.CV

    BLOS-BEV: Navigation Map Enhanced Lane Segmentation Network, Beyond Line of Sight

    Authors: Hang Wu, Zhenghao Zhang, Siyuan Lin, Tong Qin, Jin Pan, Qiang Zhao, Chunjing Xu, Ming Yang

    Abstract: Bird's-eye-view (BEV) representation is crucial for the perception function in autonomous driving tasks. It is difficult to balance the accuracy, efficiency, and range of BEV representation. Existing works are restricted to a limited perception range within 50 meters. Extending the BEV representation range can greatly benefit downstream tasks such as topology reasoning, scene understanding, and…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: IEEE IV 2024