
Showing 1–50 of 1,540 results for author: Huang, X

Searching in archive cs.
  1. arXiv:2408.10497  [pdf, other]

    cs.CL cs.AI

    QUITO-X: An Information Bottleneck-based Compression Algorithm with Cross-Attention

    Authors: Yihang Wang, Xu Huang, Bowen Tian, Yixing Fan, Jiafeng Guo

    Abstract: Generative LLMs have achieved significant success in various industrial tasks and can effectively adapt to vertical domains and downstream tasks through ICL. However, with tasks becoming increasingly complex, the context length required by ICL is also getting longer, and two significant issues arise: (i) The excessively long context leads to high costs and inference delays. (ii) A substantial amoun…

    Submitted 19 August, 2024; originally announced August 2024.

  2. arXiv:2408.10404  [pdf, other]

    cs.CV eess.IV eess.SP

    Parallel Processing of Point Cloud Ground Segmentation for Mechanical and Solid-State LiDARs

    Authors: Xiao Zhang, Zhanhong Huang, Garcia Gonzalez Antony, Witek Jachimczyk, Xinming Huang

    Abstract: In this study, we introduce a novel parallel processing framework for real-time point cloud ground segmentation on FPGA platforms, aimed at adapting LiDAR algorithms to the evolving landscape from mechanical to solid-state LiDAR (SSL) technologies. Focusing on the ground segmentation task, we explore parallel processing techniques on existing approaches and adapt them to real-world SSL data handli…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 5 pages

  3. arXiv:2408.09110  [pdf, other]

    cs.CV

    Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community

    Authors: Jiancheng Pan, Yanxing Liu, Yuqian Fu, Muyuan Ma, Jiaohao Li, Danda Pani Paudel, Luc Van Gool, Xiaomeng Huang

    Abstract: Object detection, particularly open-vocabulary object detection, plays a crucial role in Earth sciences, such as environmental monitoring, natural disaster assessment, and land-use planning. However, existing open-vocabulary detectors, primarily trained on natural-world images, struggle to generalize to remote sensing images due to a significant data domain gap. Thus, this paper aims to advance th…

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  4. arXiv:2408.08959  [pdf, other]

    cs.AI cs.CL

    Adaptive Guardrails For Large Language Models via Trust Modeling and In-Context Learning

    Authors: Jinwei Hu, Yi Dong, Xiaowei Huang

    Abstract: Guardrails have become an integral part of large language models (LLMs), moderating harmful or toxic responses in order to maintain LLMs' alignment with human expectations. However, existing guardrail methods do not consider the different needs and access rights of individual users, and apply the same rules to all users. This study introduces an adaptive guardrail mechanism, supported by trus…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Under Review

  5. arXiv:2408.08711  [pdf, ps, other]

    cs.GT

    Weighted Envy-free Allocation with Subsidy

    Authors: Haris Aziz, Xin Huang, Kei Kimura, Indrajit Saha, Zhaohong Sun, Mashbat Suzuki, Makoto Yokoo

    Abstract: We consider the problem of fair allocation with subsidy when agents have weighted entitlements. After highlighting several important differences from the unweighted cases, we present several results concerning weighted envy-freeability including general characterizations, algorithms for achieving and testing weighted envy-freeability, lower and upper bounds for worst case subsidy for non-wasteful…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 20 pages, 1 Table

  6. arXiv:2408.08640  [pdf, other]

    cs.CL

    Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning

    Authors: Wenwen Zhuang, Xin Huang, Xiantao Zhang, Jin Zeng

    Abstract: Multimodal Large Language Models (MLLMs) excel in solving text-based mathematical problems, but they struggle with mathematical diagrams since they are primarily trained on natural scene images. For humans, visual aids generally enhance problem-solving, but MLLMs perform worse as information shifts from textual to visual modality. This decline is mainly due to their shortcomings in aligning images…

    Submitted 16 August, 2024; originally announced August 2024.

  7. arXiv:2408.08341  [pdf, other]

    q-bio.QM cs.AI cs.LG

    Exploring Latent Space for Generating Peptide Analogs Using Protein Language Models

    Authors: Po-Yu Liang, Xueting Huang, Tibo Duran, Andrew J. Wiemer, Jun Bai

    Abstract: Generating peptides with desired properties is crucial for drug discovery and biotechnology. Traditional sequence-based and structure-based methods often require extensive datasets, which limits their effectiveness. In this study, we proposed a novel method that utilized autoencoder-shaped models to explore the protein embedding space, and generate novel peptide analogs by leveraging protein langu…

    Submitted 15 August, 2024; originally announced August 2024.

  8. arXiv:2408.07543  [pdf, other]

    cs.CV cs.CL

    MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark

    Authors: Minxuan Zhou, Hao Liang, Tianpeng Li, Zhiyu Wu, Mingan Lin, Linzhuang Sun, Yaqi Zhou, Yan Zhang, Xiaoqin Huang, Yicong Chen, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: With the development of Multimodal Large Language Models (MLLMs), the evaluation of multimodal models in the context of mathematical problems has become a valuable research field. Multimodal visual-textual mathematical reasoning serves as a critical indicator for evaluating the comprehension and complex multi-step quantitative reasoning abilities of MLLMs. However, previous multimodal math benchma…

    Submitted 15 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  9. arXiv:2408.07481  [pdf, other]

    cs.CV

    DeCo: Decoupled Human-Centered Diffusion Video Editing with Motion Consistency

    Authors: Xiaojing Zhong, Xinyi Huang, Xiaofeng Yang, Guosheng Lin, Qingyao Wu

    Abstract: Diffusion models usher in a new era of video editing, flexibly manipulating video contents with text prompts. Despite the widespread application demand in editing human-centered videos, these models face significant challenges in handling complex objects like humans. In this paper, we introduce DeCo, a novel video editing framework specifically designed to treat humans and the background as separ…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: European Conference on Computer Vision

  10. arXiv:2408.06604  [pdf, other]

    cs.CV

    MV-DETR: Multi-modality indoor object detection by Multi-View DEtecton TRansformers

    Authors: Zichao Dong, Yilin Zhang, Xufeng Huang, Hang Ji, Zhan Shi, Xin Zhan, Junbo Chen

    Abstract: We introduce MV-DETR, a novel transformer-based detection pipeline that is both effective and efficient. Given input RGBD data, we notice that very strong pretrained weights exist for RGB data, whereas work on depth-related data is less effective. First and foremost, we argue that geometry and texture cues are both of vital importance and could be encoded separately. Secondly, we find tha…

    Submitted 12 August, 2024; originally announced August 2024.

  11. arXiv:2408.06567  [pdf, other]

    cs.CL cs.AI

    AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

    Authors: Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, Chengwei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, et al. (2 additional authors not shown)

    Abstract: In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch requires substantial computational resources, whereas scaling up from a smaller model is a more efficient approach and has thus attracted significant attention…

    Submitted 12 August, 2024; originally announced August 2024.

  12. arXiv:2408.06082  [pdf, ps, other]

    cs.SE

    AutoCheck: Automatically Identifying Variables for Checkpointing by Data Dependency Analysis

    Authors: Xiang Fu, Weiping Zhang, Xin Huang, Shiman Meng, Wubiao Xu, Luanzheng Guo, Kento Sato

    Abstract: Checkpoint/Restart (C/R) has been widely deployed in numerous HPC systems, Clouds, and industrial data centers, which are typically operated by system engineers. Nevertheless, there is no existing approach that helps system engineers without domain expertise, and domain scientists without system fault tolerance knowledge, identify the critical variables that account for correct application execution…

    Submitted 15 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: 11 pages, 7 figures, 4 tables

  13. arXiv:2408.06030  [pdf, other]

    cs.RO

    Developing Smart MAVs for Autonomous Inspection in GPS-denied Constructions

    Authors: Paoqiang Pan, Kewei Hu, Xiao Huang, Wei Ying, Xiaoxuan Xie, Yue Ma, Naizhong Zhang, Hanwen Kang

    Abstract: Smart Micro Aerial Vehicles (MAVs) have transformed infrastructure inspection by enabling efficient, high-resolution monitoring at various stages of construction, including hard-to-reach areas. Traditional manual operation of drones in GPS-denied environments, such as industrial facilities and infrastructure, is labour-intensive, tedious and prone to error. This study presents an innovative framew…

    Submitted 12 August, 2024; originally announced August 2024.

  14. arXiv:2408.05575  [pdf, other]

    cs.AI cs.GT

    In-Context Exploiter for Extensive-Form Games

    Authors: Shuxin Li, Chang Yang, Youzhi Zhang, Pengdeng Li, Xinrun Wang, Xiao Huang, Hau Chan, Bo An

    Abstract: Nash equilibrium (NE) is a widely adopted solution concept in game theory due to its stability property. However, we observe that the NE strategy might not always yield the best results, especially against opponents who do not adhere to NE strategies. Based on this observation, we pose a new game-solving question: Can we learn a model that can exploit any, even NE, opponent to maximize their own u…

    Submitted 10 August, 2024; originally announced August 2024.

  15. arXiv:2408.05456  [pdf, other]

    cs.CL

    Path-LLM: A Shortest-Path-based LLM Learning for Unified Graph Representation

    Authors: Wenbo Shang, Xuliang Zhu, Xin Huang

    Abstract: Unified graph representation learning aims to produce node embeddings, which can be applied to multiple downstream applications. However, existing studies based on graph neural networks and language models either require extensive training for specific downstream predictions or have shallow semantic features. In this work, we propose a novel Path-LLM model to learn…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 12 pages, 8 figures

  16. arXiv:2408.05452  [pdf, other]

    cs.CV cs.RO

    EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency

    Authors: Junjie Jiang, Hao Zhuang, Xinjie Huang, Delei Kong, Zheng Fang

    Abstract: Event cameras have the potential to revolutionize the field of robot vision, particularly in areas like stereo disparity estimation, owing to their high temporal resolution and high dynamic range. Many studies use deep learning for event camera stereo disparity estimation. However, these methods fail to fully exploit the temporal information in the event stream to acquire clear event representatio…

    Submitted 10 August, 2024; originally announced August 2024.

  17. arXiv:2408.03877  [pdf, other]

    cs.LG cs.AI

    Knowledge Probing for Graph Representation Learning

    Authors: Mingyu Zhao, Xingyu Huang, Ziyu Lyu, Yanlin Wang, Lixin Cui, Lu Bai

    Abstract: Graph learning methods have been extensively applied in diverse application areas. However, which inherent graph properties (e.g., graph proximity and graph structural information) are encoded into graph representation learning for downstream tasks is still under-explored. In this paper, we propose a novel graph probing framework (GraphProbe) to investigate and interpret whether the family o…

    Submitted 7 August, 2024; originally announced August 2024.

  18. arXiv:2408.03677  [pdf, other]

    cs.CV

    L4DR: LiDAR-4DRadar Fusion for Weather-Robust 3D Object Detection

    Authors: Xun Huang, Ziyu Xu, Hai Wu, Jinlong Wang, Qiming Xia, Yan Xia, Jonathan Li, Kyle Gao, Chenglu Wen, Cheng Wang

    Abstract: LiDAR-based vision systems are integral for 3D object detection, which is crucial for autonomous navigation. However, they suffer from performance degradation in adverse weather conditions due to the quality deterioration of LiDAR point clouds. Fusing LiDAR with the weather-robust 4D radar sensor is expected to solve this problem. However, the fusion of LiDAR and 4D radar is challenging because th…

    Submitted 9 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  19. arXiv:2408.02914  [pdf, other]

    cs.HC

    VirtualNexus: Enhancing 360-Degree Video AR/VR Collaboration with Environment Cutouts and Virtual Replicas

    Authors: Xincheng Huang, Michael Yin, Ziyi Xia, Robert Xiao

    Abstract: Asymmetric AR/VR collaboration systems bring a remote VR user to a local AR user's physical environment, allowing them to communicate and work within a shared virtual/physical space. Such systems often display the remote environment through 3D reconstructions or 360-degree videos. While 360-degree cameras stream an environment in higher quality, they lack spatial information, making them less inte…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 12 pages, 10 figures, to be published in The 37th Annual ACM Symposium on User Interface Software and Technology (UIST'24)

  20. arXiv:2408.02404  [pdf, other]

    cs.IR

    Feedback Reciprocal Graph Collaborative Filtering

    Authors: Weijun Chen, Yuanchen Bei, Qijie Shen, Hao Chen, Xiao Huang, Feiran Huang

    Abstract: Collaborative filtering on user-item interaction graphs has achieved success in industrial recommendation. However, recommending items that truly fascinate users poses a seesaw dilemma for collaborative filtering models learned from the interaction graph. On the one hand, not all items that users interact with are equally appealing. Some items are genuinely fascinating to users, while others are…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 9 pages, accepted by CIKM 2024

  21. arXiv:2408.01471  [pdf, other]

    cs.CV cs.RO

    Enhancing Online Road Network Perception and Reasoning with Standard Definition Maps

    Authors: Hengyuan Zhang, David Paz, Yuliang Guo, Arun Das, Xinyu Huang, Karsten Haug, Henrik I. Christensen, Liu Ren

    Abstract: Autonomous driving for urban and highway driving applications often requires High Definition (HD) maps to generate a navigation plan. Nevertheless, various challenges arise when generating and maintaining HD maps at scale. While recent online mapping methods have started to emerge, their performance, especially at longer ranges, is limited by heavy occlusion in dynamic environments. With these cons…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024)

  22. arXiv:2408.00662  [pdf, other]

    cs.CL cs.LG

    Aligning Multiple Knowledge Graphs in a Single Pass

    Authors: Yaming Yang, Zhe Wang, Ziyu Guan, Wei Zhao, Weigang Lu, Xinyan Huang

    Abstract: Entity alignment (EA) aims to identify equivalent entities across different knowledge graphs (KGs), which can help fuse these KGs into a more comprehensive one. Previous EA methods mainly focus on aligning a pair of KGs, and to the best of our knowledge, no existing EA method considers aligning multiple (more than two) KGs. To fill this research gap, in this work, we study a novel problem of alignin…

    Submitted 1 August, 2024; originally announced August 2024.

  23. arXiv:2407.21781  [pdf, other]

    cs.RO

    Berkeley Humanoid: A Research Platform for Learning-based Control

    Authors: Qiayuan Liao, Bike Zhang, Xuanyu Huang, Xiaoyu Huang, Zhongyu Li, Koushil Sreenath

    Abstract: We introduce Berkeley Humanoid, a reliable and low-cost mid-scale humanoid research platform for learning-based control. Our lightweight, in-house-built robot is designed specifically for learning algorithms with low simulation complexity, anthropomorphic motion, and high reliability against falls. The robot's narrow sim-to-real gap enables agile and robust locomotion across various terrains in ou…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 12 pages, 9 figures

  24. arXiv:2407.21693  [pdf, other]

    cs.AI

    TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities

    Authors: Ming Zhang, Caishuang Huang, Yilong Wu, Shichun Liu, Huiyuan Zheng, Yurui Dong, Yujiong Shen, Shihan Dou, Jun Zhao, Junjie Ye, Qi Zhang, Tao Gui, Xuanjing Huang

    Abstract: Task-oriented dialogue (TOD) systems aim to efficiently handle task-oriented conversations, including information collection. How to utilize TOD accurately, efficiently and effectively for information collection has always been a critical and challenging task. Recent studies have demonstrated that Large Language Models (LLMs) excel in dialogue, instruction generation, and reasoning, and can signif…

    Submitted 7 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  25. arXiv:2407.21669  [pdf, other]

    cs.CL cs.LG

    Synth-Empathy: Towards High-Quality Synthetic Empathy Data

    Authors: Hao Liang, Linzhuang Sun, Jingxuan Wei, Xijie Huang, Linkun Sun, Bihui Yu, Conghui He, Wentao Zhang

    Abstract: In recent years, with the rapid advancements in large language models (LLMs), achieving excellent empathetic response capabilities has become a crucial prerequisite. Consequently, managing and understanding empathetic datasets has gained increasing significance. However, empathetic data are typically human-labeled, leading to insufficient datasets and wasted human labor. In this work, we present…

    Submitted 10 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2407.01937

  26. arXiv:2407.20756  [pdf, other]

    cs.CV cs.CL

    SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models

    Authors: Zheng Liu, Hao Liang, Xijie Huang, Wentao Xiong, Qinhan Yu, Linzhuang Sun, Chong Chen, Conghui He, Bin Cui, Wentao Zhang

    Abstract: Recently, with the rise of web images, managing and understanding large-scale image datasets has become increasingly important. Vision Large Language Models (VLLMs) have recently emerged due to their robust vision-understanding capabilities. However, training these models requires vast amounts of data, posing challenges to efficiency, effectiveness, data quality, and privacy. In this paper, we int…

    Submitted 10 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  27. arXiv:2407.20157  [pdf, other]

    cs.AI

    rLLM: Relational Table Learning with LLMs

    Authors: Weichen Li, Xiaotong Huang, Jianwu Zheng, Zheng Wang, Chaokun Wang, Li Pan, Jianhua Li

    Abstract: We introduce rLLM (relationLLM), a PyTorch library designed for Relational Table Learning (RTL) with Large Language Models (LLMs). The core idea is to decompose state-of-the-art Graph Neural Networks, LLMs, and Table Neural Networks into standardized modules, to enable the fast construction of novel RTL-type models in a simple "combine, align, and co-train" manner. To illustrate the usage of rLLM,…

    Submitted 29 July, 2024; originally announced July 2024.

  28. arXiv:2407.20057  [pdf]

    physics.ao-ph cs.LG stat.AP

    Reconstructing Global Daily CO2 Emissions via Machine Learning

    Authors: Tao Li, Lixing Wang, Zihan Qiu, Philippe Ciais, Taochun Sun, Matthew W. Jones, Robbie M. Andrew, Glen P. Peters, Piyu ke, Xiaoting Huang, Robert B. Jackson, Zhu Liu

    Abstract: High temporal resolution CO2 emission data are crucial for understanding the drivers of emission changes; however, current emission datasets are only available on a yearly basis. Here, we extended a global daily CO2 emissions dataset backwards in time to 1970 using a machine learning algorithm, which was trained to predict historical daily emissions on national scales based on relationships between da…

    Submitted 29 July, 2024; originally announced July 2024.

  29. arXiv:2407.19412  [pdf, other]

    cs.AI

    Identity-Driven Hierarchical Role-Playing Agents

    Authors: Libo Sun, Siyuan Wang, Xuanjing Huang, Zhongyu Wei

    Abstract: Utilizing large language models (LLMs) to achieve role-playing has gained great attention recently. The primary implementation methods include leveraging refined prompts and fine-tuning on role-specific datasets. However, these methods suffer from insufficient precision and limited flexibility respectively. To achieve a balance between flexibility and precision, we construct a Hierarchical Identit…

    Submitted 28 July, 2024; originally announced July 2024.

  30. arXiv:2407.18426  [pdf, other]

    physics.geo-ph cs.LG

    Diffusion-based subsurface multiphysics monitoring and forecasting

    Authors: Xinquan Huang, Fu Wang, Tariq Alkhalifah

    Abstract: Carbon capture and storage (CCS) plays a crucial role in mitigating greenhouse gas emissions, particularly from industrial outputs. Using seismic monitoring can aid in an accurate and robust monitoring system to ensure the effectiveness of CCS and mitigate associated risks. However, conventional seismic wave equation-based approaches are computationally demanding, which hinders real-time applicati…

    Submitted 4 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

  31. arXiv:2407.17802  [pdf, other]

    cs.IR

    Sample Enrichment via Temporary Operations on Subsequences for Sequential Recommendation

    Authors: Shu Chen, Jinwei Luo, Weike Pan, Jiangxing Yu, Xin Huang, Zhong Ming

    Abstract: Sequential recommendation leverages interaction sequences to predict forthcoming user behaviors, crucial for crafting personalized recommendations. However, the true preferences of a user are inherently complex and high-dimensional, while the observed data is merely a simplified and low-dimensional projection of the rich preferences, which often leads to prevalent issues like data sparsity and ina…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 12 pages, 6 figures

  32. arXiv:2407.17638  [pdf]

    cs.CL

    Time Matters: Examine Temporal Effects on Biomedical Language Models

    Authors: Weisi Liu, Zhe He, Xiaolei Huang

    Abstract: Time roots in applying language models for biomedical applications: models are trained on historical data and will be deployed for new or future data, which may vary from training data. While a growing number of biomedical tasks employ state-of-the-art language models, very few studies have examined temporal effects on biomedical models when data usually shifts across development and deplo…

    Submitted 11 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted to AMIA 2024 Annual Symposium

  33. arXiv:2407.15525  [pdf, other]

    cs.LG stat.ML

    Multiple importance sampling for stochastic gradient estimation

    Authors: Corentin Salaün, Xingchang Huang, Iliyan Georgiev, Niloy J. Mitra, Gurprit Singh

    Abstract: We introduce a theoretical and practical framework for efficient importance sampling of mini-batch samples for gradient estimation from single and multiple probability distributions. To handle noisy gradients, our framework dynamically evolves the importance distribution during training by utilizing a self-adaptive metric. Our framework combines multiple, diverse sampling distributions, each tailo…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 13 pages, 11 figures

  34. arXiv:2407.15424  [pdf, other]

    cs.CV

    Bidirectional skip-frame prediction for video anomaly detection with intra-domain disparity-driven attention

    Authors: Jiahao Lyu, Minghua Zhao, Jing Hu, Runtao Xi, Xuewen Huang, Shuangli Du, Cheng Shi, Tian Ma

    Abstract: With the widespread deployment of video surveillance devices and the demand for intelligent system development, video anomaly detection (VAD) has become an important part of constructing intelligent surveillance systems. Expanding the discriminative boundary between normal and abnormal events to enhance performance is the common goal and challenge of VAD. To address this problem, we propose a Bidi…

    Submitted 23 July, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures, 4 tables

  35. arXiv:2407.14829  [pdf, other]

    cs.CL

    Overview of AI-Debater 2023: The Challenges of Argument Generation Tasks

    Authors: Jiayu Lin, Guanrong Chen, Bojun Jin, Chenyang Li, Shutong Jia, Wancong Lin, Yang Sun, Yuhang He, Caihua Yang, Jianzhu Bao, Jipeng Wu, Wen Su, Jinglu Chen, Xinyi Li, Tianyu Chen, Mingjie Han, Shuaiwen Du, Zijian Wang, Jiyin Li, Fuzhong Suo, Hao Wang, Nuanchen Lin, Xuanjing Huang, Changjian Jiang, RuiFeng Xu, et al. (4 additional authors not shown)

    Abstract: In this paper we present the results of the AI-Debater 2023 Challenge held by the Chinese Conference on Affect Computing (CCAC 2023), and introduce the related datasets. We organize two tracks to handle the argumentative generation tasks in different scenarios, namely, Counter-Argument Generation (Track 1) and Claim-based Argument Generation (Track 2). Each track is equipped with its distinct data…

    Submitted 24 July, 2024; v1 submitted 20 July, 2024; originally announced July 2024.

  36. arXiv:2407.14098  [pdf, other]

    cs.DB

    Top-k Representative Search for Comparative Tree Summarization

    Authors: Yuqi Chen, Xin Huang, Bilian Chen

    Abstract: Data summarization aims at utilizing a small-scale summary to represent massive datasets as a whole, which is useful for visualization and information snippet generation. However, most existing studies of hierarchical summarization only work on one single tree by selecting k representative nodes, which neglects an important problem of comparative summarization on two trees. In this paper,…

    Submitted 19 July, 2024; originally announced July 2024.

  37. arXiv:2407.13390  [pdf, other]

    cs.CV

    GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields

    Authors: Xiufeng Huang, Ka Chun Cheung, Simon See, Renjie Wan

    Abstract: Remarkable advancements in the recolorization of Neural Radiance Fields (NeRF) have simplified the process of modifying NeRF's color attributes. Yet, with the potential of NeRF to serve as shareable digital assets, there's a concern that malicious users might alter the color of NeRF models and falsely claim the recolorized version as their own. To safeguard against such breaches of ownership, enab…

    Submitted 18 July, 2024; originally announced July 2024.

  38. arXiv:2407.12504  [pdf, other]

    cs.CL

    Case2Code: Learning Inductive Reasoning with Synthetic Data

    Authors: Yunfan Shao, Linyang Li, Yichuan Ma, Peiji Li, Demin Song, Qinyuan Cheng, Shimin Li, Xiaonan Li, Pengyu Wang, Qipeng Guo, Hang Yan, Xipeng Qiu, Xuanjing Huang, Dahua Lin

    Abstract: Complex reasoning is an impressive ability shown by large language models (LLMs). Most LLMs are skilled in deductive reasoning, such as chain-of-thought prompting or iterative tool use to solve challenging tasks step-by-step. In this paper, we focus on evaluating and teaching LLMs to conduct inductive reasoning, that is, LLMs are supposed to infer underlying rules by observing examples o…

    Submitted 17 July, 2024; originally announced July 2024.

  39. arXiv:2407.11638  [pdf, other]

    cs.CL cs.IR

    A Comprehensive Evaluation of Large Language Models on Temporal Event Forecasting

    Authors: He Chang, Chenchen Ye, Zhulin Tao, Jie Wu, Zhengmao Yang, Yunshan Ma, Xianglin Huang, Tat-Seng Chua

    Abstract: Recently, Large Language Models (LLMs) have demonstrated great potential in various data mining tasks, such as knowledge question answering, mathematical reasoning, and commonsense reasoning. However, the reasoning capability of LLMs on temporal event forecasting has been under-explored. To systematically investigate their abilities in temporal event forecasting, we conduct a comprehensive evaluat…

    Submitted 16 July, 2024; originally announced July 2024.

  40. arXiv:2407.10990  [pdf]

    cs.CL cs.AI

    MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models

    Authors: Mianxin Liu, Jinru Ding, Jie Xu, Weiguo Hu, Xiaoyang Li, Lifeng Zhu, Zhian Bai, Xiaoming Shi, Benyou Wang, Haitao Song, Pengfei Liu, Xiaofan Zhang, Shanshan Wang, Kang Li, Haofen Wang, Tong Ruan, Xuanjing Huang, Xin Sun, Shaoting Zhang

    Abstract: Ensuring the general efficacy and benefit to humans of medical large language models (LLMs) before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLMs, especially in the Chinese context, remains to be established. In this work, we introduce "MedBench", a comprehensive, standardized, and reliable benchmarking system for Chinese med…

    Submitted 23 June, 2024; originally announced July 2024.

    Comments: 25 pages, 4 figures

  41. arXiv:2407.10980  [pdf, ps, other]

    cs.NI

    Learning-based Big Data Sharing Incentive in Mobile AIGC Networks

    Authors: Jinbo Wen, Yang Zhang, Yulin Chen, Weifeng Zhong, Xumin Huang, Lei Liu, Dusit Niyato

    Abstract: Rapid advancements in wireless communication have led to a dramatic upsurge in data volumes within mobile edge networks. These substantial data volumes offer opportunities for training Artificial Intelligence-Generated Content (AIGC) models to possess strong prediction and decision-making capabilities. AIGC represents an innovative approach that utilizes sophisticated generative AI algorithms to a…

    Submitted 31 July, 2024; v1 submitted 10 June, 2024; originally announced July 2024.

  42. arXiv:2407.10649  [pdf, other]

    cs.CV

    APC: Adaptive Patch Contrast for Weakly Supervised Semantic Segmentation

    Authors: Wangyu Wu, Tianhong Dai, Zhenhong Chen, Xiaowei Huang, Fei Ma, Jimin Xiao

    Abstract: Weakly Supervised Semantic Segmentation (WSSS) using only image-level labels has gained significant attention due to its cost-effectiveness. The typical framework involves using image-level labels as training data to generate pixel-level pseudo-labels with refinements. Recently, methods based on Vision Transformers (ViT) have demonstrated superior capabilities in generating reliable pseudo-labels,…

    Submitted 15 July, 2024; originally announced July 2024.

  43. arXiv:2407.10068  [pdf, other]

    cs.CL

    Multi-Granularity Semantic Revision for Large Language Model Distillation

    Authors: Xiaoyu Liu, Yun Zhang, Wei Li, Simiao Li, Xudong Huang, Hanting Chen, Yehui Tang, Jie Hu, Zhiwei Xiong, Yunhe Wang

    Abstract: Knowledge distillation plays a key role in compressing the Large Language Models (LLMs), which boosts a small-size student model under large teacher models' guidance. However, existing LLM distillation methods overly rely on student-generated outputs, which may introduce generation errors and misguide the distillation process. Moreover, the distillation loss functions introduced in previous art st…

    Submitted 13 July, 2024; originally announced July 2024.

  44. arXiv:2407.09893  [pdf, other]

    cs.CL

    Synergistic Multi-Agent Framework with Trajectory Learning for Knowledge-Intensive Tasks

    Authors: Shengbin Yue, Siyuan Wang, Wei Chen, Xuanjing Huang, Zhongyu Wei

    Abstract: Recent advancements in Large Language Models (LLMs) have led to significant breakthroughs in various natural language processing tasks. However, generating factually consistent responses in knowledge-intensive scenarios remains a challenge due to issues such as hallucination, difficulty in acquiring long-tailed knowledge, and limited memory expansion. This paper introduces SMART, a novel multi-age…

    Submitted 13 July, 2024; originally announced July 2024.

  45. arXiv:2407.09787  [pdf, other]

    cs.CV

    Semi-supervised 3D Object Detection with PatchTeacher and PillarMix

    Authors: Xiaopei Wu, Liang Peng, Liang Xie, Yuenan Hou, Binbin Lin, Xiaoshui Huang, Haifeng Liu, Deng Cai, Wanli Ouyang

    Abstract: Semi-supervised learning aims to leverage numerous unlabeled data to improve the model performance. Current semi-supervised 3D object detection methods typically use a teacher to generate pseudo labels for a student, and the quality of the pseudo labels is essential for the final performance. In this paper, we propose PatchTeacher, which focuses on partial scene 3D object detection to provide high…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by AAAI 2024

  46. arXiv:2407.09751  [pdf, other]

    cs.CV

    TASeg: Temporal Aggregation Network for LiDAR Semantic Segmentation

    Authors: Xiaopei Wu, Yuenan Hou, Xiaoshui Huang, Binbin Lin, Tong He, Xinge Zhu, Yuexin Ma, Boxi Wu, Haifeng Liu, Deng Cai, Wanli Ouyang

    Abstract: Training deep models for LiDAR semantic segmentation is challenging due to the inherent sparsity of point clouds. Utilizing temporal data is a natural remedy against the sparsity problem as it makes the input signal denser. However, previous multi-frame fusion algorithms fall short in utilizing sufficient temporal information due to the memory constraint, and they also ignore the informative tempo…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by CVPR 2024

  47. arXiv:2407.08733  [pdf, other]

    cs.CL

    Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist

    Authors: Zihao Zhou, Shudong Liu, Maizhen Ning, Wei Liu, Jindong Wang, Derek F. Wong, Xiaowei Huang, Qiufeng Wang, Kaizhu Huang

    Abstract: Exceptional mathematical reasoning ability is one of the key features that demonstrate the power of large language models (LLMs). How to comprehensively define and evaluate the mathematical abilities of LLMs, and even reflect the user experience in real-world scenarios, has emerged as a critical issue. Current benchmarks predominantly concentrate on problem-solving capabilities, which presents a s…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 35 pages, 10 figures, preprint

  48. arXiv:2407.08044  [pdf, other]

    cs.CL cs.AI cs.LG

    RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization

    Authors: Xijie Huang, Zechun Liu, Shih-Yang Liu, Kwang-Ting Cheng

    Abstract: Low-Rank Adaptation (LoRA), as a representative Parameter-Efficient Fine-Tuning (PEFT) method, significantly enhances the training efficiency by updating only a small portion of the weights in Large Language Models (LLMs). Recently, weight-only quantization techniques have also been applied to LoRA methods to reduce the memory footprint of fine-tuning. However, applying weight-activation quantizati…

    Submitted 10 July, 2024; originally announced July 2024.

  49. arXiv:2407.06584  [pdf, other]

    cs.RO

    HiLMa-Res: A General Hierarchical Framework via Residual RL for Combining Quadrupedal Locomotion and Manipulation

    Authors: Xiaoyu Huang, Qiayuan Liao, Yiming Ni, Zhongyu Li, Laura Smith, Sergey Levine, Xue Bin Peng, Koushil Sreenath

    Abstract: This work presents HiLMa-Res, a hierarchical framework leveraging reinforcement learning to tackle manipulation tasks while performing continuous locomotion using quadrupedal robots. Unlike most previous efforts that focus on solving a specific task, HiLMa-Res is designed to be general for various loco-manipulation tasks that require quadrupedal robots to maintain sustained mobility. The novel des…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: IROS 2024

  50. arXiv:2407.06187  [pdf, other]

    cs.CV cs.GR

    JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation

    Authors: Yu Zeng, Vishal M. Patel, Haochen Wang, Xun Huang, Ting-Chun Wang, Ming-Yu Liu, Yogesh Balaji

    Abstract: Personalized text-to-image generation models enable users to create images that depict their individual possessions in diverse scenes, finding applications in various domains. To achieve the personalization capability, existing methods rely on finetuning a text-to-image foundation model on a user's custom dataset, which can be non-trivial for general users, resource-intensive, and time-consuming.…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: CVPR 24