Showing 1–50 of 259 results for author: Xu, N

Searching in archive cs.
  1. arXiv:2408.09422

    cs.CL cs.AI

    Distinguish Confusion in Legal Judgment Prediction via Revised Relation Knowledge

    Authors: Nuo Xu, Pinghui Wang, Junzhou Zhao, Feiyang Sun, Lin Lan, Jing Tao, Li Pan, Xiaohong Guan

    Abstract: Legal Judgment Prediction (LJP) aims to automatically predict a law case's judgment results based on the text description of its facts. In practice, the confusing law articles (or charges) problem frequently occurs, reflecting that the law cases applicable to similar articles (or charges) tend to be misjudged. Although some recent works based on prior knowledge solve this issue well, they ignore t…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM TOIS

  2. arXiv:2408.02599

    cs.CL cs.AI

    Progressively Selective Label Enhancement for Language Model Alignment

    Authors: Biao Liu, Ning Xu, Xin Geng

    Abstract: Large Language Models have demonstrated impressive capabilities in various language tasks but may produce content that misaligns with human expectations, raising ethical and legal concerns. Therefore, it is important to explore the limitations and implement restrictions on the models to ensure safety and compliance, with Reinforcement Learning from Human Feedback (RLHF) being the primary method. D…

    Submitted 5 August, 2024; originally announced August 2024.

  3. arXiv:2408.00804

    cs.AR cs.AI cs.LG

    ChipExpert: The Open-Source Integrated-Circuit-Design-Specific Large Language Model

    Authors: Ning Xu, Zhaoyang Zhang, Lei Qi, Wensuo Wang, Chao Zhang, Zihao Ren, Huaiyuan Zhang, Xin Cheng, Yanqi Zhang, Zhichao Liu, Qingwen Wei, Shiyang Wu, Lanlan Yang, Qianfeng Lu, Yiqun Ma, Mengyao Zhao, Junbo Liu, Yufan Song, Xin Geng, Jun Yang

    Abstract: The field of integrated circuit (IC) design is highly specialized, presenting significant barriers to entry and research and development challenges. Although large language models (LLMs) have achieved remarkable success in various domains, existing LLMs often fail to meet the specific needs of students, engineers, and researchers. Consequently, the potential of LLMs in the IC design domain remains…

    Submitted 26 July, 2024; originally announced August 2024.

  4. arXiv:2407.16412

    cs.RO

    Cross Anything: General Quadruped Robot Navigation through Complex Terrains

    Authors: Shaoting Zhu, Derun Li, Yong Liu, Ningyi Xu, Hang Zhao

    Abstract: The application of vision-language models (VLMs) has achieved impressive success in various robotics tasks, but there are few explorations for foundation models used in quadruped robot navigation. We introduce Cross Anything System (CAS), an innovative system composed of a high-level reasoning module and a low-level control policy, enabling the robot to navigate across complex 3D terrains and reac…

    Submitted 23 July, 2024; originally announced July 2024.

  5. arXiv:2407.12851

    cs.CL

    ISPO: An Integrated Ontology of Symptom Phenotypes for Semantic Integration of Traditional Chinese Medical Data

    Authors: Zixin Shu, Rui Hua, Dengying Yan, Chenxia Lu, Ning Xu, Jun Li, Hui Zhu, Jia Zhang, Dan Zhao, Chenyang Hui, Junqiu Ye, Chu Liao, Qi Hao, Wen Ye, Cheng Luo, Xinyan Wang, Chuang Cheng, Xiaodong Li, Baoyan Liu, Xiaji Zhou, Runshun Zhang, Min Xu, Xuezhong Zhou

    Abstract: Symptom phenotypes are one of the key types of manifestations for diagnosis and treatment of various disease conditions. However, the diversity of symptom terminologies is one of the major obstacles hindering the analysis and knowledge sharing of various types of symptom-related medical data particularly in the fields of Traditional Chinese Medicine (TCM). Objective: This study aimed to construct…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 39 pages, 6 figures, 6 tables

  6. arXiv:2407.10599

    cs.IR cs.DB

    General algorithm of assigning raster features to vector maps at any resolution or scale

    Authors: Nan Xu, Mark Stevenson, Kerry A. Nice, Sachith Seneviratne

    Abstract: The fusion of multi-source data is essential for a comprehensive analysis of geographic applications. Due to distinct data structures, the fusion process tends to encounter technical difficulties in terms of preservation of the intactness of each source data. Furthermore, a lack of generalized methods is a problem when the method is expected to be applicable in multiple resolutions, sizes, or scal…

    Submitted 15 July, 2024; originally announced July 2024.

  7. arXiv:2407.08061

    cs.CV

    Geospecific View Generation -- Geometry-Context Aware High-resolution Ground View Inference from Satellite Views

    Authors: Ningli Xu, Rongjun Qin

    Abstract: Predicting realistic ground views from satellite imagery in urban scenes is a challenging task due to the significant view gaps between satellite and ground-view images. We propose a novel pipeline to tackle this challenge, by generating geospecific views that maximally respect the weak geometry and texture from multi-view satellite images. Different from existing approaches that hallucinate images…

    Submitted 29 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: 11 figures

  8. arXiv:2407.07372

    eess.IV cs.CV

    Trustworthy Contrast-enhanced Brain MRI Synthesis

    Authors: Jiyao Liu, Yuxin Li, Shangqi Gao, Yuncheng Zhou, Xin Gao, Ningsheng Xu, Xiao-Yong Zhang, Xiahai Zhuang

    Abstract: Contrast-enhanced brain MRI (CE-MRI) is a valuable diagnostic technique but may pose health risks and incur high costs. To create safer alternatives, multi-modality medical image translation aims to synthesize CE-MRI images from other available modalities. Although existing methods can generate promising predictions, they still face two challenges, i.e., exhibiting over-confidence and lacking inte…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

  9. arXiv:2407.00902

    cs.CV cs.AI cs.CL cs.LG

    From Introspection to Best Practices: Principled Analysis of Demonstrations in Multimodal In-Context Learning

    Authors: Nan Xu, Fei Wang, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: Motivated by in-context learning (ICL) capabilities of Large Language models (LLMs), multimodal LLMs with additional visual modality are also exhibited with similar ICL abilities when multiple image-text pairs are provided as demonstrations. However, relatively less work has been done to investigate the principles behind how and why multimodal ICL works. We conduct a systematic and principled eval…

    Submitted 30 June, 2024; originally announced July 2024.

  10. arXiv:2406.17419

    cs.CL cs.AI

    Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

    Authors: Minzheng Wang, Longze Chen, Cheng Fu, Shengyi Liao, Xinghua Zhang, Bingli Wu, Haiyang Yu, Nan Xu, Lei Zhang, Run Luo, Yunshui Li, Min Yang, Fei Huang, Yongbin Li

    Abstract: Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows. Meanwhile, benchmarks for evaluating long-context LLMs are gradually catching up. However, existing benchmarks employ irrelevant noise texts to artificially extend the length of test cases, diverging from the real-world scenarios of long-contex…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: We release our code and data publicly at https://github.com/MozerWang/Loong

  11. arXiv:2406.11839

    cs.CV cs.AI cs.CL cs.LG

    mDPO: Conditional Preference Optimization for Multimodal Large Language Models

    Authors: Fei Wang, Wenxuan Zhou, James Y. Huang, Nan Xu, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: Direct preference optimization (DPO) has shown to be an effective method for large language model (LLM) alignment. Recent works have attempted to apply DPO to multimodal scenarios but have found it challenging to achieve consistent improvement. Through a comparative experiment, we identify the unconditional preference problem in multimodal preference optimization, where the model overlooks the ima…

    Submitted 17 June, 2024; originally announced June 2024.

  12. arXiv:2406.09897

    cs.CL

    3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding

    Authors: Xindian Ma, Wenyuan Liu, Peng Zhang, Nan Xu

    Abstract: Inspired by the Bloch Sphere representation, we propose a novel rotary position encoding on a three-dimensional sphere, named 3D Rotary Position Encoding (3D-RPE). 3D-RPE is an advanced version of the widely used 2D Rotary Position Encoding (RoPE), with two major advantages for modeling long contexts: controllable long-term decay and improved position resolution. For controllable long-term decay,…

    Submitted 14 June, 2024; originally announced June 2024.

  13. arXiv:2406.09411

    cs.CV cs.AI cs.CL

    MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding

    Authors: Fei Wang, Xingyu Fu, James Y. Huang, Zekun Li, Qin Liu, Xiaogeng Liu, Mingyu Derek Ma, Nan Xu, Wenxuan Zhou, Kai Zhang, Tianyi Lorena Yan, Wenjie Jacky Mo, Hsiang-Hui Liu, Pan Lu, Chunyuan Li, Chaowei Xiao, Kai-Wei Chang, Dan Roth, Sheng Zhang, Hoifung Poon, Muhao Chen

    Abstract: We introduce MuirBench, a comprehensive benchmark that focuses on robust multi-image understanding capabilities of multimodal LLMs. MuirBench consists of 12 diverse multi-image tasks (e.g., scene understanding, ordering) that involve 10 categories of multi-image relations (e.g., multiview, temporal relations). Comprising 11,264 images and 2,600 multiple-choice questions, MuirBench is created in a…

    Submitted 1 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: typos corrected, references added, Project Page: https://muirbench.github.io/

  14. arXiv:2406.01961

    cs.RO cs.CV

    Exploring Real World Map Change Generalization of Prior-Informed HD Map Prediction Models

    Authors: Samuel M. Bateman, Ning Xu, H. Charles Zhao, Yael Ben Shalom, Vince Gong, Greg Long, Will Maddern

    Abstract: Building and maintaining High-Definition (HD) maps represents a large barrier to autonomous vehicle deployment. This, along with advances in modern online map detection models, has sparked renewed interest in the online mapping problem. However, effectively predicting online maps at a high enough quality to enable safe, driverless deployments remains a significant challenge. Recent work on these m…

    Submitted 5 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted to CVPR 2024, Workshop on Autonomous Driving

  15. arXiv:2405.20786

    cs.CV cs.HC

    Stratified Avatar Generation from Sparse Observations

    Authors: Han Feng, Wenchao Ma, Quankai Gao, Xianwei Zheng, Nan Xue, Huijuan Xu

    Abstract: Estimating 3D full-body avatars from AR/VR devices is essential for creating immersive experiences in AR/VR applications. This task is challenging due to the limited input from Head Mounted Devices, which capture only sparse observations from the head and hands. Predicting the full-body avatars, particularly the lower body, from these sparse observations presents significant difficulties. In this…

    Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR 2024 (Oral)

  16. arXiv:2405.20310

    cs.CV

    A Pixel Is Worth More Than One 3D Gaussians in Single-View 3D Reconstruction

    Authors: Jianghao Shen, Nan Xue, Tianfu Wu

    Abstract: Learning 3D scene representation from a single-view image is a long-standing fundamental problem in computer vision, with the inherent ambiguity in predicting contents unseen from the input view. Built on the recently proposed 3D Gaussian Splatting (3DGS), the Splatter Image method has made promising progress on fast single-image novel view synthesis via learning a single 3D Gaussian for each pixe…

    Submitted 3 June, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: preprint, under review

  17. Large-scale DSM registration via motion averaging

    Authors: Ningli Xu, Rongjun Qin

    Abstract: Generating wide-area digital surface models (DSMs) requires registering a large number of individual, and partially overlapped DSMs. This presents a challenging problem for a typical registration algorithm, since when a large number of observations from these multiple DSMs are considered, it may easily cause memory overflow. Sequential registration algorithms, although can significantly reduce the…

    Submitted 2 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 9 Figures

    Journal ref: ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. X-1-2024

  18. arXiv:2405.12200

    cs.CV

    Multi-View Attentive Contextualization for Multi-View 3D Object Detection

    Authors: Xianpeng Liu, Ce Zheng, Ming Qian, Nan Xue, Chen Chen, Zhebin Zhang, Chen Li, Tianfu Wu

    Abstract: We present Multi-View Attentive Contextualization (MvACon), a simple yet effective method for improving 2D-to-3D feature lifting in query-based multi-view 3D (MV3D) object detection. Despite remarkable progress witnessed in the field of query-based MV3D object detection, prior art often suffers from either the lack of exploiting high-resolution 2D features in dense attention-based lifting, due to…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted by CVPR2024

  19. arXiv:2405.03971

    cs.CV cs.MA

    Unified End-to-End V2X Cooperative Autonomous Driving

    Authors: Zhiwei Li, Bozhen Zhang, Lei Yang, Tianyu Shen, Nuo Xu, Ruosen Hao, Weiting Li, Tao Yan, Huaping Liu

    Abstract: V2X cooperation, through the integration of sensor data from both vehicles and infrastructure, is considered a pivotal approach to advancing autonomous driving technology. Current research primarily focuses on enhancing perception accuracy, often overlooking the systematic improvement of accident prediction accuracy through end-to-end learning, leading to insufficient attention to the safety issue…

    Submitted 6 May, 2024; originally announced May 2024.

  20. arXiv:2405.03131

    cs.IT cs.AI cs.LG

    WDMoE: Wireless Distributed Large Language Models with Mixture of Experts

    Authors: Nan Xue, Yaping Sun, Zhiyong Chen, Meixia Tao, Xiaodong Xu, Liang Qian, Shuguang Cui, Ping Zhang

    Abstract: Large Language Models (LLMs) have achieved significant success in various natural language processing tasks, but how wireless communications can support LLMs has not been extensively studied. In this paper, we propose a wireless distributed LLMs paradigm based on Mixture of Experts (MoE), named WDMoE, deploying LLMs collaboratively across edge servers of base station (BS) and mobile devices in the…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: submitted to IEEE conference

  21. arXiv:2404.18112

    cs.CV cs.RO

    Garbage Segmentation and Attribute Analysis by Robotic Dogs

    Authors: Nuo Xu, Jianfeng Liao, Qiwei Meng, Wei Song

    Abstract: Efficient waste management and recycling heavily rely on garbage exploration and identification. In this study, we propose GSA2Seg (Garbage Segmentation and Attribute Analysis), a novel visual approach that utilizes quadruped robotic dogs as autonomous agents to address waste management and recycling challenges in diverse indoor and outdoor environments. Equipped with advanced visual perception sy…

    Submitted 28 April, 2024; originally announced April 2024.

  22. arXiv:2404.17569

    cs.CV

    MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

    Authors: Shangzhan Zhang, Sida Peng, Tao Xu, Yuanbo Yang, Tianrun Chen, Nan Xue, Yujun Shen, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

    Abstract: This paper aims to generate materials for 3D meshes from text descriptions. Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs as the appearance representation, which supports high-quality rendering and provides substantial flexibility in editing. Instead of relying on extensive paired data, i.e., 3D meshes with material graphs and…

    Submitted 25 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: SIGGRAPH 2024. Project page: https://zju3dv.github.io/MaPa

  23. arXiv:2404.11613

    cs.CV

    InFusion: Inpainting 3D Gaussians via Learning Depth Completion from Diffusion Prior

    Authors: Zhiheng Liu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Jie Xiao, Kai Zhu, Nan Xue, Yu Liu, Yujun Shen, Yang Cao

    Abstract: 3D Gaussians have recently emerged as an efficient representation for novel view synthesis. This work studies its editability with a particular focus on the inpainting task, which aims to supplement an incomplete set of 3D Gaussians with additional points for visually harmonious rendering. Compared to 2D inpainting, the crux of inpainting 3D Gaussians is to figure out the rendering-relevant proper…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Project page: https://johanan528.github.io/Infusion

  24. arXiv:2404.04319

    cs.CV

    SpatialTracker: Tracking Any 2D Pixels in 3D Space

    Authors: Yuxi Xiao, Qianqian Wang, Shangzhan Zhang, Nan Xue, Sida Peng, Yujun Shen, Xiaowei Zhou

    Abstract: Recovering dense and long-range pixel motion in videos is a challenging problem. Part of the difficulty arises from the 3D-to-2D projection process, leading to occlusions and discontinuities in the 2D motion domain. While 2D motion can be intricate, we posit that the underlying 3D motion can often be simple and low-dimensional. In this work, we propose to estimate point trajectories in 3D space to…

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted to CVPR 2024 (selected as highlight paper). Project page: https://henry123-boy.github.io/SpaTracker/

  25. arXiv:2404.01548

    cs.CV cs.AI

    mChartQA: A universal benchmark for multimodal Chart Question Answer based on Vision-Language Alignment and Reasoning

    Authors: Jingxuan Wei, Nan Xu, Guiyong Chang, Yin Luo, BiHui Yu, Ruifeng Guo

    Abstract: In the fields of computer vision and natural language processing, multimodal chart question-answering, especially involving color, structure, and textless charts, poses significant challenges. Traditional methods, which typically involve either direct multimodal processing or a table-to-text conversion followed by language model analysis, have limitations in effectively handling these complex scen…

    Submitted 1 April, 2024; originally announced April 2024.

  26. arXiv:2403.16038

    cs.CL

    Monotonic Paraphrasing Improves Generalization of Language Model Prompting

    Authors: Qin Liu, Fei Wang, Nan Xu, Tianyi Yan, Tao Meng, Muhao Chen

    Abstract: Performance of large language models (LLMs) may vary with different prompts or instructions of even the same task. One commonly recognized factor for this phenomenon is the model's familiarity with the given prompt or instruction, which is typically estimated by its perplexity. However, finding the prompt with the lowest perplexity is challenging, given the enormous space of possible prompting phr…

    Submitted 18 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: Under review at ARR 2024 April

  27. arXiv:2403.11453

    cs.GR cs.CV

    Bridging 3D Gaussian and Mesh for Freeview Video Rendering

    Authors: Yuting Xiao, Xuan Wang, Jiafei Li, Hongrui Cai, Yanbo Fan, Nan Xue, Minghui Yang, Yujun Shen, Shenghua Gao

    Abstract: This is only a preview version of GauMesh. Recently, primitive-based rendering has been proven to achieve convincing results in solving the problem of modeling and rendering the 3D dynamic scene from 2D images. Despite this, in the context of novel view synthesis, each type of primitive has its inherent defects in terms of representation ability. It is difficult to exploit the mesh to depict the f…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: 7 pages

  28. arXiv:2403.06520

    cs.CL cs.AI

    How to Understand Named Entities: Using Common Sense for News Captioning

    Authors: Ning Xu, Yanhui Wang, Tingting Zhang, Hongshuo Tian, Mohan Kankanhalli, An-An Liu

    Abstract: News captioning aims to describe an image with its news article body as input. It greatly relies on a set of detected named entities, including real-world people, organizations, and places. This paper exploits commonsense knowledge to understand named entities for news captioning. By "understand", we mean correlating the news content with common sense in the wild, which helps an agent to 1) dist…

    Submitted 11 March, 2024; originally announced March 2024.

  29. arXiv:2403.05101

    cs.CL cs.AI

    Rule-driven News Captioning

    Authors: Ning Xu, Tingting Zhang, Hongshuo Tian, An-An Liu

    Abstract: News captioning task aims to generate sentences by describing named entities or concrete events for an image with its news article. Existing methods have achieved remarkable results by relying on the large-scale pre-trained models, which primarily focus on the correlations between the input news content and the output predictions. However, the news captioning requires adhering to some fundamental…

    Submitted 14 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  30. arXiv:2403.03736

    cs.CV cs.LG eess.IV

    Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer

    Authors: Naifu Xue, Qi Mao, Zijian Wang, Yuan Zhang, Siwei Ma

    Abstract: Recent progress in generative compression technology has significantly improved the perceptual quality of compressed data. However, these advancements primarily focus on producing high-frequency details, often overlooking the ability of generative models to capture the prior distribution of image content, thus impeding further bitrate reduction in extreme compression scenarios (<0.05 bpp). Motivat…

    Submitted 6 March, 2024; originally announced March 2024.

  31. arXiv:2403.02713

    cs.CL cs.CV cs.HC cs.LG

    Android in the Zoo: Chain-of-Action-Thought for GUI Agents

    Authors: Jiwen Zhang, Jihao Wu, Yihua Teng, Minghui Liao, Nuo Xu, Xiao Xiao, Zhongyu Wei, Duyu Tang

    Abstract: Large language model (LLM) leads to a surge of autonomous GUI agents for smartphone, which completes a task triggered by natural language through predicting a sequence of actions of API. Even though the task highly relies on past actions and visual observations, existing studies typically consider little semantic information carried out by intermediate screenshots and screen operations. To address…

    Submitted 12 July, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: Dataset can be found at https://github.com/IMNearth/CoAT

  32. arXiv:2402.18892

    cs.CV cs.RO

    Aligning Knowledge Graph with Visual Perception for Object-goal Navigation

    Authors: Nuo Xu, Wen Wang, Rong Yang, Mengjie Qin, Zheyuan Lin, Wei Song, Chunlong Zhang, Jason Gu, Chao Li

    Abstract: Object-goal navigation is a challenging task that requires guiding an agent to specific objects based on first-person visual observations. The ability of agent to comprehend its surroundings plays a crucial role in achieving successful object finding. However, existing knowledge-graph-based navigators often rely on discrete categorical one-hot vectors and vote counting strategy to construct graph…

    Submitted 25 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted to ICRA 2024

  33. arXiv:2402.17430

    cs.CV

    Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction

    Authors: Zihao Liu, Xiaoyu Zhang, Guangwei Liu, Ji Zhao, Ningyi Xu

    Abstract: In autonomous driving, the high-definition (HD) map plays a crucial role in localization and planning. Recently, several methods have facilitated end-to-end online map construction in DETR-like frameworks. However, little attention has been paid to the potential capabilities of exploring the query mechanism for map elements. This paper introduces MapQR, an end-to-end method with an emphasis on enh…

    Submitted 23 July, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2024. Code can be found at https://github.com/HXMap/MapQR

  34. arXiv:2402.14568

    cs.CL

    LLM-DA: Data Augmentation via Large Language Models for Few-Shot Named Entity Recognition

    Authors: Junjie Ye, Nuo Xu, Yikun Wang, Jie Zhou, Qi Zhang, Tao Gui, Xuanjing Huang

    Abstract: Despite the impressive capabilities of large language models (LLMs), their performance on information extraction tasks is still not entirely satisfactory. However, their remarkable rewriting capabilities and extensive world knowledge offer valuable insights to improve these tasks. In this paper, we propose LLM-DA, a novel data augmentation technique based on LLMs for the few-shot NER task. To ov…

    Submitted 22 February, 2024; originally announced February 2024.

  35. arXiv:2402.14404

    cs.CL cs.AI

    On the Tip of the Tongue: Analyzing Conceptual Representation in Large Language Models with Reverse-Dictionary Probe

    Authors: Ningyu Xu, Qi Zhang, Menghan Zhang, Peng Qian, Xuanjing Huang

    Abstract: Probing and enhancing large language models' reasoning capacity remains a crucial open question. Here we re-purpose the reverse dictionary task as a case study to probe LLMs' capacity for conceptual inference. We use in-context learning to guide the models to generate the term for an object concept implied in a linguistic description. Models robustly achieve high accuracy in this task, and their r…

    Submitted 26 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 21 pages, 13 figures

  36. arXiv:2402.11525

    cs.CL cs.LG

    Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution

    Authors: Nuo Xu, Jun Zhao, Can Zu, Sixian Li, Lu Chen, Zhihao Zhang, Rui Zheng, Shihan Dou, Wenjuan Qin, Tao Gui, Qi Zhang, Xuanjing Huang

    Abstract: Faithfulness, expressiveness, and elegance is the constant pursuit in machine translation. However, traditional metrics like BLEU do not strictly align with human preference of translation quality. In this paper, we explore leveraging reinforcement learning with human feedback (RLHF) to improve translation quality. It is non-trivial to collect a large high-quality dataset of huma…

    Submitted 27 February, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

  37. arXiv:2402.10631

    cs.CL

    BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation

    Authors: Dayou Du, Yijia Zhang, Shijie Cao, Jiaqi Guo, Ting Cao, Xiaowen Chu, Ningyi Xu

    Abstract: The upscaling of Large Language Models (LLMs) has yielded impressive advances in natural language processing, yet it also poses significant deployment challenges. Weight quantization has emerged as a widely embraced solution to reduce memory and computational demands. This paper introduces BitDistiller, a framework that synergizes Quantization-Aware Training (QAT) with Knowledge Distillation (KD)…

    Submitted 16 February, 2024; originally announced February 2024.

  38. arXiv:2401.06080

    cs.AI

    Secrets of RLHF in Large Language Models Part II: Reward Modeling

    Authors: Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, Songyang Gao, Nuo Xu, Yuhao Zhou, Xiaoran Fan, Zhiheng Xi, Jun Zhao, Xiao Wang, Tao Ji, Hang Yan, Lixing Shen, Zhan Chen, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, et al. (2 additional authors not shown)

    Abstract: Reinforcement Learning from Human Feedback (RLHF) has become a crucial technology for aligning language models with human values and intentions, enabling models to produce more helpful and harmless responses. Reward models are trained as proxies for human preferences to drive reinforcement learning optimization. While reward models are often considered central to achieving high performance, they f…

    Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

  39. arXiv:2312.15548

    cs.CL cs.AI

    YAYI-UIE: A Chat-Enhanced Instruction Tuning Framework for Universal Information Extraction

    Authors: Xinglin Xiao, Yijie Wang, Nan Xu, Yuqi Wang, Hanxuan Yang, Minzheng Wang, Yin Luo, Lei Wang, Wenji Mao, Daniel Zeng

    Abstract: The difficulty of the information extraction task lies in dealing with the task-specific label schemas and heterogeneous data structures. Recent work has proposed methods based on large language models to uniformly model different information extraction tasks. However, these existing methods are deficient in their information extraction capabilities for Chinese languages other than English. In thi…

    Submitted 2 April, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

  40. arXiv:2312.14862

    cs.CL cs.AI

    YAYI 2: Multilingual Open-Source Large Language Models

    Authors: Yin Luo, Qingchao Kong, Nan Xu, Jia Cao, Bao Hao, Baoyu Qu, Bo Chen, Chao Zhu, Chenyang Zhao, Donglei Zhang, Fan Feng, Feifei Zhao, Hailong Sun, Hanxuan Yang, Haojun Pan, Hongyu Liu, Jianbin Guo, Jiangtao Du, Jingyi Wang, Junfeng Li, Lei Sun, Liduo Liu, Lifeng Dong, Lili Liu, Lin Wang, et al. (28 additional authors not shown)

    Abstract: As the latest advancements in natural language processing, large language models (LLMs) have achieved human-level language understanding and generation abilities in many real-world tasks, and even have been regarded as a potential path to the artificial general intelligence. To better facilitate research on LLMs, many open-source LLMs, such as Llama 2 and Falcon, have recently been proposed and ga…

    Submitted 22 December, 2023; originally announced December 2023.

  41. arXiv:2312.11112

    cs.CV

    ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding

    Authors: Lunhao Duan, Shanshan Zhao, Nan Xue, Mingming Gong, Gui-Song Xia, Dacheng Tao

    Abstract: Transformers have been recently explored for 3D point cloud understanding with impressive progress achieved. A large number of points, over 0.1 million, make the global self-attention infeasible for point cloud data. Thus, most methods propose to apply the transformer in a local region, e.g., spherical or cubic window. However, it still contains a large number of Query-Key pairs, which requires hi…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023. Code: https://github.com/LHDuan/ConDaFormer

  42. arXiv:2312.08702  [pdf, other]

    cs.AI

    Rational Sensibility: LLM Enhanced Empathetic Response Generation Guided by Self-presentation Theory

    Authors: Linzhuang Sun, Nan Xu, Jingxuan Wei, Bihui Yu, Liping Bu, Yin Luo

    Abstract: Having the ability to empathize is crucial for accurately representing human behavior during conversations. Although numerous studies aim to improve the cognitive capability of models by incorporating external knowledge, there has been limited attention to the sensible and rational expression of the conversation itself, which are crucial components of cognitive empathy. Guided by self-presenta…

    Submitted 1 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  43. arXiv:2311.17516  [pdf, other]

    cs.CR cs.CV

    MMA-Diffusion: MultiModal Attack on Diffusion Models

    Authors: Yijun Yang, Ruiyuan Gao, Xiaosen Wang, Tsung-Yi Ho, Nan Xu, Qiang Xu

    Abstract: In recent years, Text-to-Image (T2I) models have seen remarkable advancements, gaining widespread adoption. However, this progress has inadvertently opened avenues for potential misuse, particularly in generating inappropriate or Not-Safe-For-Work (NSFW) content. Our work introduces MMA-Diffusion, a framework that presents a significant and realistic threat to the security of T2I models by effecti…

    Submitted 30 March, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: CVPR 2024. Our codes and benchmarks are available at https://github.com/cure-lab/MMA-Diffusion

  44. arXiv:2311.16618  [pdf, other]

    cs.CV

    Cross-level Attention with Overlapped Windows for Camouflaged Object Detection

    Authors: Jiepan Li, Fangxiao Lu, Nan Xue, Zhuohong Li, Hongyan Zhang, Wei He

    Abstract: Camouflaged objects adaptively fit their color and texture with the environment, which makes them indistinguishable from the surroundings. Current methods revealed that high-level semantic features can highlight the differences between camouflaged objects and the backgrounds. Consequently, they integrate high-level semantic features with low-level detailed features for accurate camouflaged object…

    Submitted 28 November, 2023; originally announced November 2023.

  45. arXiv:2311.12307  [pdf, other]

    cs.AI

    Causality is all you need

    Authors: Ning Xu, Yifei Gao, Hongshuo Tian, Yongdong Zhang, An-An Liu

    Abstract: In the fundamental statistics course, students are taught to remember the well-known saying: "Correlation is not Causation". Till now, statistics (i.e., correlation) have developed various successful frameworks, such as Transformer and Pre-training large-scale models, which have stacked multiple parallel self-attention blocks to imitate a wide range of tasks. However, in the causation community, h…

    Submitted 20 November, 2023; originally announced November 2023.

  46. arXiv:2311.09827  [pdf, other]

    cs.CL

    Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking

    Authors: Nan Xu, Fei Wang, Ben Zhou, Bang Zheng Li, Chaowei Xiao, Muhao Chen

    Abstract: While large language models (LLMs) have demonstrated increasing power, they have also given rise to a wide range of harmful behaviors. As representatives, jailbreak attacks can provoke harmful or unethical responses from LLMs, even after safety alignment. In this paper, we investigate a novel category of jailbreak attacks specifically designed to target the cognitive structure and processes of LLM…

    Submitted 29 February, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

  47. arXiv:2311.01792  [pdf, other]

    cs.CL cs.AI

    AFPQ: Asymmetric Floating Point Quantization for LLMs

    Authors: Yijia Zhang, Sicheng Zhang, Shijie Cao, Dayou Du, Jianyu Wei, Ting Cao, Ningyi Xu

    Abstract: Large language models (LLMs) show great performance in various tasks, but face deployment challenges from limited memory capacity and bandwidth. Low-bit weight quantization can save memory and accelerate inference. Although floating-point (FP) formats show good performance in LLM quantization, they tend to perform poorly with small group sizes or sub-4 bits. We find the reason is that the absence…

    Submitted 3 November, 2023; originally announced November 2023.

  48. arXiv:2310.19620  [pdf, other]

    cs.RO cs.AI cs.CV

    Large Trajectory Models are Scalable Motion Predictors and Planners

    Authors: Qiao Sun, Shiduo Zhang, Danjiao Ma, Jingzhe Shi, Derun Li, Simian Luo, Yu Wang, Ningyi Xu, Guangzhi Cao, Hang Zhao

    Abstract: Motion prediction and planning are vital tasks in autonomous driving, and recent efforts have shifted to machine learning-based approaches. The challenges include understanding diverse road topologies, reasoning traffic dynamics over a long time horizon, interpreting heterogeneous behaviors, and generating policies in a large continuous state space. Inspired by the success of large language models…

    Submitted 28 February, 2024; v1 submitted 30 October, 2023; originally announced October 2023.

  49. arXiv:2310.18619  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Dense Retrieval as Indirect Supervision for Large-space Decision Making

    Authors: Nan Xu, Fei Wang, Mingtao Dong, Muhao Chen

    Abstract: Many discriminative natural language understanding (NLU) tasks have large label spaces. Learning such a process of large-space decision making is particularly challenging due to the lack of training instances per label and the difficulty of selection among many fine-grained labels. Inspired by dense retrieval methods for passage finding in open-domain QA, we propose a reformulation of large-space…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 (Findings)

  50. arXiv:2310.17910  [pdf, other]

    cs.CV

    DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF

    Authors: Chaowei Liu, Jichun Li, Yihua Teng, Chaoqun Wang, Nuo Xu, Jihao Wu, Dandan Tu

    Abstract: For capturing colored document images, e.g. posters and magazines, it is common that multiple degradations such as shadows, wrinkles, etc., are simultaneously introduced due to external factors. Restoring multi-degraded colored document images is a great challenge, yet overlooked, as most existing algorithms focus on enhancing color-ignored document images via binarization. Thus, we propose DocSto…

    Submitted 27 October, 2023; originally announced October 2023.