
Showing 1–50 of 876 results for author: Lin, H

Searching in archive cs.
  1. arXiv:2408.10718  [pdf, other]

    cs.SE cs.CL

    CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?

    Authors: Yuwei Zhao, Ziyang Luo, Yuchen Tian, Hongzhan Lin, Weixiang Yan, Annan Li, Jing Ma

    Abstract: Recent advancements in large language models (LLMs) have showcased impressive code generation capabilities, primarily evaluated through language-to-code benchmarks. However, these benchmarks may not fully capture a model's code understanding abilities. We introduce CodeJudge-Eval (CJ-Eval), a novel benchmark designed to assess LLMs' code understanding abilities from the perspective of code judging…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Work in progress

  2. arXiv:2408.10663  [pdf, other]

    cs.CL

    REInstruct: Building Instruction Data from Unlabeled Corpus

    Authors: Shu Chen, Xinyan Guan, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun

    Abstract: Manually annotating instruction data for large language models is difficult, costly, and hard to scale. Meanwhile, current automatic annotation methods typically rely on distilling synthetic data from proprietary LLMs, which not only limits the upper bound of the quality of the instruction data but also raises potential copyright issues. In this paper, we propose REInstruct, a simple and scalable…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL2024 Findings

  3. arXiv:2408.10159  [pdf, other]

    cs.IR cs.AI

    Customizing Language Models with Instance-wise LoRA for Sequential Recommendation

    Authors: Xiaoyu Kong, Jiancan Wu, An Zhang, Leheng Sheng, Hui Lin, Xiang Wang, Xiangnan He

    Abstract: Sequential recommendation systems predict a user's next item of interest by analyzing past interactions, aligning recommendations with individual preferences. Leveraging the strengths of Large Language Models (LLMs) in knowledge comprehension and reasoning, recent approaches have applied LLMs to sequential recommendation through language generation paradigms. These methods convert user behavior se…

    Submitted 19 August, 2024; originally announced August 2024.

  4. arXiv:2408.08551  [pdf, other]

    cs.CL

    Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection

    Authors: Haohao Zhu, Xiaokun Zhang, Junyu Lu, Liang Yang, Hongfei Lin

    Abstract: Textual personality detection aims to identify personality traits by analyzing user-generated content. To achieve this effectively, it is essential to thoroughly examine user-generated content from various perspectives. However, previous studies have struggled with automatically extracting and effectively integrating information from multiple perspectives, thereby limiting their performance on per…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted by NLPCC 2024

  5. arXiv:2408.08538  [pdf, other]

    cs.IR

    Don't Click the Bait: Title Debiasing News Recommendation via Cross-Field Contrastive Learning

    Authors: Yijie Shu, Xiaokun Zhang, Youlin Wu, Bo Xu, Liang Yang, Hongfei Lin

    Abstract: News recommendation emerges as a primary means for users to access content of interest from the vast amount of news. The title clickbait extensively exists in news domain and increases the difficulty for news recommendation to offer satisfactory services for users. Fortunately, we find that news abstract, as a critical field of news, aligns cohesively with the news authenticity. To this end, we pr…

    Submitted 16 August, 2024; originally announced August 2024.

  6. arXiv:2408.08533  [pdf, ps, other]

    stat.ML cs.LG

    Unsupervised Transfer Learning via Adversarial Contrastive Training

    Authors: Chenguang Duan, Yuling Jiao, Huazhen Lin, Wensen Ma, Jerry Zhijian Yang

    Abstract: Learning a data representation for downstream supervised learning tasks under unlabeled scenario is both critical and challenging. In this paper, we propose a novel unsupervised transfer learning approach using adversarial contrastive training (ACT). Our experimental results demonstrate outstanding classification accuracy with both fine-tuned linear probe and K-NN protocol across various datasets,…

    Submitted 16 August, 2024; originally announced August 2024.

  7. arXiv:2408.08494  [pdf, ps, other]

    cs.DS cs.LG

    Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms

    Authors: Yi Li, Honghao Lin, David P. Woodruff

    Abstract: We study the problem of residual error estimation for matrix and vector norms using a linear sketch. Such estimates can be used, for example, to quickly assess how useful a more expensive low-rank approximation computation will be. The matrix case concerns the Frobenius norm and the task is to approximate the $k$-residual $\|A - A_k\|_F$ of the input matrix $A$ within a $(1+ε)$-factor, where…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Published as a conference paper at ICLR 2024

  8. arXiv:2408.08147  [pdf, other]

    cs.DC cs.CL cs.LG

    P/D-Serve: Serving Disaggregated Large Language Model at Scale

    Authors: Yibo Jin, Tao Wang, Huimin Lin, Mingyang Song, Peiyang Li, Yipeng Ma, Yicheng Shan, Zhengfan Yuan, Cailong Li, Yajing Sun, Tiandeng Wu, Xing Chu, Ruizhi Huan, Li Ma, Xiao You, Wenting Zhou, Yunpeng Ye, Wen Liu, Xiangkun Xu, Yongsheng Zhang, Tiantian Dong, Jiawei Zhu, Zhe Wang, Xijian Ju, Jianxun Song, et al. (5 additional authors not shown)

    Abstract: Serving disaggregated large language models (LLMs) over tens of thousands of xPU devices (GPUs or NPUs) with reliable performance faces multiple challenges. 1) Ignoring the diversity (various prefixes and tidal requests), treating all the prompts in a mixed pool is inadequate. To facilitate the similarity per scenario and minimize the inner mismatch on P/D (prefill and decoding) processing, fine-g…

    Submitted 15 August, 2024; originally announced August 2024.

  9. arXiv:2408.08092  [pdf, other]

    cs.CV cs.AI

    OC3D: Weakly Supervised Outdoor 3D Object Detection with Only Coarse Click Annotation

    Authors: Qiming Xia, Hongwei Lin, Wei Ye, Hai Wu, Yadan Luo, Shijia Zhao, Xin Li, Chenglu Wen

    Abstract: LiDAR-based outdoor 3D object detection has received widespread attention. However, training 3D detectors from the LiDAR point cloud typically relies on expensive bounding box annotations. This paper presents OC3D, an innovative weakly supervised method requiring only coarse clicks on the bird's eye view of the 3D point cloud. A key challenge here is the absence of complete geometric descriptions…

    Submitted 15 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  10. arXiv:2408.07975  [pdf, other]

    cs.RO cs.CL cs.CV

    Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models

    Authors: Tianyu Wang, Haitao Lin, Junqiu Yu, Yanwei Fu

    Abstract: This paper investigates the task of the open-ended interactive robotic manipulation on table-top scenarios. While recent Large Language Models (LLMs) enhance robots' comprehension of user instructions, their lack of visual grounding constrains their ability to physically interact with the environment. This is because the robot needs to locate the target object for manipulation within the physical…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: Accepted by IROS 2024. 8 pages, 5 figures. See https://star-uu-wang.github.io/Polaris/

  11. arXiv:2408.05899  [pdf, other]

    quant-ph cs.AI cs.LG

    Quantum Gradient Class Activation Map for Model Interpretability

    Authors: Hsin-Yi Lin, Huan-Hsin Tseng, Samuel Yen-Chi Chen, Shinjae Yoo

    Abstract: Quantum machine learning (QML) has recently made significant advancements in various topics. Despite the successes, the safety and interpretability of QML applications have not been thoroughly investigated. This work proposes using Variational Quantum Circuits (VQCs) for activation mapping to enhance model transparency, introducing the Quantum Gradient Class Activation Map (QGrad-CAM). This hybrid…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: Submitted to IEEE SiPS 2024

  12. arXiv:2408.05211  [pdf, other]

    cs.CV cs.AI cs.CL

    VITA: Towards Open-Source Interactive Omni Multimodal LLM

    Authors: Chaoyou Fu, Haojia Lin, Zuwei Long, Yunhang Shen, Meng Zhao, Yifan Zhang, Xiong Wang, Di Yin, Long Ma, Xiawu Zheng, Ran He, Rongrong Ji, Yunsheng Wu, Caifeng Shan, Xing Sun

    Abstract: The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimodal Large Language Model (MLLM) adept at simultaneous processing and analysis of Video, Image, Text, and Audio modalities, and meanwhile has an advance…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Project Page: https://vita-home.github.io

  13. arXiv:2408.03706  [pdf, other]

    cs.CL cs.AI cs.LG

    Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction

    Authors: Benjamin Matthias Ruppik, Michael Heck, Carel van Niekerk, Renato Vukovic, Hsien-chin Lin, Shutong Feng, Marcus Zibrowius, Milica Gašić

    Abstract: A common approach for sequence tagging tasks based on contextual word representations is to train a machine learning classifier directly on these embedding vectors. This approach has two shortcomings. First, such methods consider single input sequences in isolation and are unable to put an individual embedding vector in relation to vectors outside the current local context of use. Second, the high…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted as a long paper to SIGDIAL 2024. 9 pages, 2 figures, 3 tables

  14. arXiv:2408.03505  [pdf, other]

    cs.CL cs.AI cs.DC

    Optimus: Accelerating Large-Scale Multi-Modal LLM Training by Bubble Exploitation

    Authors: Weiqi Feng, Yangrui Chen, Shaoyu Wang, Yanghua Peng, Haibin Lin, Minlan Yu

    Abstract: Multimodal large language models (MLLMs) have extended the success of large language models (LLMs) to multiple data types, such as image, text and audio, achieving significant performance in various domains, including multimodal translation, visual question answering and content generation. Nonetheless, existing systems are inefficient to train MLLMs due to substantial GPU bubbles caused by the he…

    Submitted 6 August, 2024; originally announced August 2024.

  15. arXiv:2408.03281  [pdf, other]

    cs.CL cs.AI cs.LG

    StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation

    Authors: Boxi Cao, Mengjie Ren, Hongyu Lin, Xianpei Han, Feng Zhang, Junfeng Zhan, Le Sun

    Abstract: Evaluation is the baton for the development of large language models. Current evaluations typically employ a single-item assessment paradigm for each atomic test objective, which struggles to discern whether a model genuinely possesses the required capabilities or merely memorizes/guesses the answers to specific questions. To this end, we propose a novel evaluation framework referred to as StructE…

    Submitted 6 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: ACL 2024; Benchmark at https://github.com/c-box/StructEval; Leaderboard at https://huggingface.co/spaces/Bowieee/StructEval_leaderboard

  16. arXiv:2408.03238  [pdf, other]

    cs.RO cs.CV

    LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion

    Authors: Jinyu Zhang, Yongchong Gu, Jianxiong Gao, Haitao Lin, Qiang Sun, Xinwei Sun, Xiangyang Xue, Yanwei Fu

    Abstract: This paper addresses the challenge of perceiving complete object shapes through visual perception. While prior studies have demonstrated encouraging outcomes in segmenting the visible parts of objects within a scene, amodal segmentation, in particular, has the potential to allow robots to infer the occluded parts of objects. To this end, this paper introduces a new framework that explores amodal s…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: accepted by IROS2024

  17. arXiv:2408.02976  [pdf, ps, other]

    cs.CL cs.AI

    Empathy Level Alignment via Reinforcement Learning for Empathetic Response Generation

    Authors: Hui Ma, Bo Zhang, Bo Xu, Jian Wang, Hongfei Lin, Xiao Sun

    Abstract: Empathetic response generation, aiming at understanding the user's situation and feelings and respond empathically, is crucial in building human-like dialogue systems. Previous methods mainly focus on using maximum likelihood estimation as the optimization objective for training response generation models, without taking into account the empathy level alignment between generated responses and targ…

    Submitted 6 August, 2024; originally announced August 2024.

  18. arXiv:2408.02417  [pdf, other]

    cs.CL

    Infusing Emotions into Task-oriented Dialogue Systems: Understanding, Management, and Generation

    Authors: Shutong Feng, Hsien-chin Lin, Christian Geishauser, Nurul Lubis, Carel van Niekerk, Michael Heck, Benjamin Ruppik, Renato Vukovic, Milica Gašić

    Abstract: Emotions are indispensable in human communication, but are often overlooked in task-oriented dialogue (ToD) modelling, where the task success is the primary focus. While existing works have explored user emotions or similar concepts in some ToD tasks, none has so far included emotion modelling into a fully-fledged ToD system nor conducted interaction with human or simulated users. In this work, we…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted by SIGDIAL 2024

  19. arXiv:2408.02361  [pdf, other]

    cs.CL cs.AI cs.LG

    Dialogue Ontology Relation Extraction via Constrained Chain-of-Thought Decoding

    Authors: Renato Vukovic, David Arps, Carel van Niekerk, Benjamin Matthias Ruppik, Hsien-Chin Lin, Michael Heck, Milica Gašić

    Abstract: State-of-the-art task-oriented dialogue systems typically rely on task-specific ontologies for fulfilling user queries. The majority of task-oriented dialogue data, such as customer service recordings, comes without ontology and annotation. Such ontologies are normally built manually, limiting the application of specialised systems. Dialogue ontology construction is an approach for automating that…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted to appear at SIGDIAL 2024. 9 pages, 4 figures

  20. arXiv:2407.20207  [pdf, other]

    cs.CL cs.AI cs.IR

    QAEA-DR: A Unified Text Augmentation Framework for Dense Retrieval

    Authors: Hongming Tan, Shaoxiong Zhan, Hai Lin, Hai-Tao Zheng, Wai Kin, Chan

    Abstract: In dense retrieval, embedding long texts into dense vectors can result in information loss, leading to inaccurate query-text matching. Additionally, low-quality texts with excessive noise or sparse key information are unlikely to align well with relevant queries. Recent studies mainly focus on improving the sentence embedding model or retrieval process. In this work, we introduce a novel text augm…

    Submitted 29 July, 2024; originally announced July 2024.

  21. arXiv:2407.20174  [pdf, other

    cs.CV cs.AI

    Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning

    Authors: Xingchen Zeng, Haichuan Lin, Yilin Ye, Wei Zeng

    Abstract: Emerging multimodal large language models (MLLMs) exhibit great potential for chart question answering (CQA). Recent efforts primarily focus on scaling up training datasets (i.e., charts, data tables, and question-answer (QA) pairs) through data collection and synthesis. However, our empirical study on existing MLLMs and CQA datasets reveals notable gaps. First, current data collection and synthes…

    Submitted 11 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: 11 pages, 7 figures

  22. arXiv:2407.20143  [pdf, other]

    cs.AI

    ByteCheckpoint: A Unified Checkpointing System for LLM Development

    Authors: Borui Wan, Mingji Han, Yiyao Sheng, Zhichao Lai, Mofan Zhang, Junda Zhang, Yanghua Peng, Haibin Lin, Xin Liu, Chuan Wu

    Abstract: The development of real-world Large Language Models (LLMs) necessitates checkpointing of training states in persistent storage to mitigate potential software and hardware failures, as well as to facilitate checkpoint transferring within the training pipeline and across various tasks. Due to the immense size of LLMs, saving and loading checkpoints often incur intolerable minute-level stalls, signif…

    Submitted 29 July, 2024; originally announced July 2024.

  23. arXiv:2407.19669  [pdf, other]

    cs.CL cs.IR

    mGTE: Generalized Long-Context Text Representation and Reranking Models for Multilingual Text Retrieval

    Authors: Xin Zhang, Yanzhao Zhang, Dingkun Long, Wen Xie, Ziqi Dai, Jialong Tang, Huan Lin, Baosong Yang, Pengjun Xie, Fei Huang, Meishan Zhang, Wenjie Li, Min Zhang

    Abstract: We present systematic efforts in building long-context multilingual text representation model (TRM) and reranker from scratch for text retrieval. We first introduce a text encoder (base size) enhanced with RoPE and unpadding, pre-trained in a native 8192-token context (longer than 512 of previous multilingual encoders). Then we construct a hybrid TRM and a cross-encoder reranker by contrastive lea…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 20 pages, 5 figures

  24. arXiv:2407.17956  [pdf, other]

    cs.CV

    SaccadeDet: A Novel Dual-Stage Architecture for Rapid and Accurate Detection in Gigapixel Images

    Authors: Wenxi Li, Ruxin Zhang, Haozhe Lin, Yuchen Guo, Chao Ma, Xiaokang Yang

    Abstract: The advancement of deep learning in object detection has predominantly focused on megapixel images, leaving a critical gap in the efficient processing of gigapixel images. These super high-resolution images present unique challenges due to their immense size and computational demands. To address this, we introduce 'SaccadeDet', an innovative architecture for gigapixel-level object detection, inspi…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: This paper is accepted to ECML-PKDD 2024

    Journal ref: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2024

  25. arXiv:2407.15362  [pdf, other]

    cs.CV cs.AI

    A Multimodal Knowledge-enhanced Whole-slide Pathology Foundation Model

    Authors: Yingxue Xu, Yihui Wang, Fengtao Zhou, Jiabo Ma, Shu Yang, Huangjing Lin, Xin Wang, Jiguang Wang, Li Liang, Anjia Han, Ronald Cheong Kin Chan, Hao Chen

    Abstract: Remarkable strides in computational pathology have been made in the task-agnostic foundation model that advances the performance of a wide array of downstream clinical tasks. Despite the promising performance, there are still several challenges. First, prior works have resorted to either vision-only or vision-captions data, disregarding invaluable pathology reports and gene expression profiles whi…

    Submitted 5 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 45 pages, 9 figures

  26. arXiv:2407.15346  [pdf, other]

    cs.CV cs.CL cs.MM

    Knowledge Acquisition Disentanglement for Knowledge-based Visual Question Answering with Large Language Models

    Authors: Wenbin An, Feng Tian, Jiahao Nie, Wenkai Shi, Haonan Lin, Yan Chen, QianYing Wang, Yaqiang Wu, Guang Dai, Ping Chen

    Abstract: Knowledge-based Visual Question Answering (KVQA) requires both image and world knowledge to answer questions. Current methods first retrieve knowledge from the image and external knowledge base with the original complex question, then generate answers with Large Language Models (LLMs). However, since the original question contains complex elements that require knowledge from different sources, acq…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Pre-print

  27. arXiv:2407.14768  [pdf, other]

    cs.LG cs.AI

    Teach Harder, Learn Poorer: Rethinking Hard Sample Distillation for GNN-to-MLP Knowledge Distillation

    Authors: Lirong Wu, Yunfan Liu, Haitao Lin, Yufei Huang, Stan Z. Li

    Abstract: To bridge the gaps between powerful Graph Neural Networks (GNNs) and lightweight Multi-Layer Perceptron (MLPs), GNN-to-MLP Knowledge Distillation (KD) proposes to distill knowledge from a well-trained teacher GNN into a student MLP. In this paper, we revisit the knowledge samples (nodes) in teacher GNNs from the perspective of hardness, and identify that hard sample distillation may be a major per…

    Submitted 20 July, 2024; originally announced July 2024.

  28. arXiv:2407.14653  [pdf, other]

    cs.LG

    OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning

    Authors: Yihang Yao, Zhepeng Cen, Wenhao Ding, Haohong Lin, Shiqi Liu, Tingnan Zhang, Wenhao Yu, Ding Zhao

    Abstract: Offline safe reinforcement learning (RL) aims to train a policy that satisfies constraints using a pre-collected dataset. Most current methods struggle with the mismatch between imperfect demonstrations and the desired safe and rewarding performance. In this paper, we introduce OASIS (cOnditionAl diStributIon Shaping), a new paradigm in offline safe RL designed to overcome these critical limitatio…

    Submitted 19 July, 2024; originally announced July 2024.

  29. arXiv:2407.14474  [pdf, other]

    cs.CV

    Contrastive Learning with Counterfactual Explanations for Radiology Report Generation

    Authors: Mingjie Li, Haokun Lin, Liang Qiu, Xiaodan Liang, Ling Chen, Abdulmotaleb Elsaddik, Xiaojun Chang

    Abstract: Due to the common content of anatomy, radiology images with their corresponding reports exhibit high similarity. Such inherent data bias can predispose automatic report generation models to learn entangled and spurious representations resulting in misdiagnostic reports. To tackle these, we propose a novel CounterFactual Explanations-based framework (CoFE) for radiology r…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  30. arXiv:2407.12279  [pdf, other]

    cs.LG cs.CV

    ER-FSL: Experience Replay with Feature Subspace Learning for Online Continual Learning

    Authors: Huiwei Lin

    Abstract: Online continual learning (OCL) involves deep neural networks retaining knowledge from old data while adapting to new data, which is accessible only once. A critical challenge in OCL is catastrophic forgetting, reflected in reduced model performance on old data. Existing replay-based methods mitigate forgetting by replaying buffered samples from old data and learning current samples of new data. I…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 10 pages, 6 figures

  31. arXiv:2407.11470  [pdf, other]

    cs.SE cs.AI cs.CL

    Beyond Correctness: Benchmarking Multi-dimensional Code Generation for Large Language Models

    Authors: Jiasheng Zheng, Boxi Cao, Zhengzhao Ma, Ruotong Pan, Hongyu Lin, Yaojie Lu, Xianpei Han, Le Sun

    Abstract: In recent years, researchers have proposed numerous benchmarks to evaluate the impressive coding capabilities of large language models (LLMs). However, existing benchmarks primarily focus on assessing the correctness of code generated by LLMs, while neglecting other critical dimensions that also significantly impact code quality. Therefore, this paper proposes the RACE benchmark, which comprehensi…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: We release benchmark at https://github.com/jszheng21/RACE and leaderboard at https://huggingface.co/spaces/jszheng/RACE_leaderboard

  32. arXiv:2407.10967  [pdf, other]

    cs.LG cs.AI

    BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning

    Authors: Haohong Lin, Wenhao Ding, Jian Chen, Laixi Shi, Jiacheng Zhu, Bo Li, Ding Zhao

    Abstract: Offline model-based reinforcement learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies, especially in scenarios where exploration is costly or infeasible. Nevertheless, its performance often suffers from the objective mismatch between model and policy learning, resulting in inferior performance despite accurate model predictions. This paper firs…

    Submitted 15 July, 2024; originally announced July 2024.

  33. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  34. arXiv:2407.10040  [pdf, other]

    cs.AI

    Lean-STaR: Learning to Interleave Thinking and Proving

    Authors: Haohan Lin, Zhiqing Sun, Yiming Yang, Sean Welleck

    Abstract: Traditional language model-based theorem proving assumes that by training on a sufficient amount of formal proof data, a model will learn to prove theorems. Our key observation is that a wealth of informal information that is not present in formal proofs can be useful for learning to prove theorems. For instance, humans think through steps of a proof, but this thought process is not visible in the…

    Submitted 8 August, 2024; v1 submitted 13 July, 2024; originally announced July 2024.

  35. arXiv:2407.06317  [pdf, other]

    cs.AI cs.CV cs.RO

    Enhanced Safety in Autonomous Driving: Integrating Latent State Diffusion Model for End-to-End Navigation

    Authors: Detian Chu, Linyuan Bai, Jianuo Huang, Zhenlong Fang, Peng Zhang, Wei Kang, Haifeng Lin

    Abstract: With the advancement of autonomous driving, ensuring safety during motion planning and navigation is becoming more and more important. However, most end-to-end planning methods suffer from a lack of safety. This research addresses the safety issue in the control optimization problem of autonomous driving, formulated as Constrained Markov Decision Processes (CMDPs). We propose a novel, model-based…

    Submitted 17 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  36. arXiv:2407.06001  [pdf, other]

    cs.CV cs.MM

    Pseudo-triplet Guided Few-shot Composed Image Retrieval

    Authors: Bohan Hou, Haoqiang Lin, Haokun Wen, Meng Liu, Xuemeng Song

    Abstract: Composed Image Retrieval (CIR) is a challenging task that aims to retrieve the target image based on a multimodal query, i.e., a reference image and its corresponding modification text. While previous supervised or zero-shot learning paradigms all fail to strike a good trade-off between time-consuming annotation cost and retrieval performance, recent researchers introduced the task of few-shot CIR…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 15 pages, 5 figures

  37. arXiv:2407.05594  [pdf, other]

    cs.CV

    SLIM: Spuriousness Mitigation with Minimal Human Annotations

    Authors: Xiwei Xuan, Ziquan Deng, Hsuan-Tien Lin, Kwan-Liu Ma

    Abstract: Recent studies highlight that deep learning models often learn spurious features mistakenly linked to labels, compromising their reliability in real-world scenarios where such correlations do not hold. Despite the increasing research effort, existing solutions often face two main challenges: they either demand substantial annotations of spurious attributes, or they yield less competitive outcomes…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: This paper is accepted by ECCV 2024

  38. arXiv:2407.03917  [pdf, other]

    cs.CV

    Timestep-Aware Correction for Quantized Diffusion Models

    Authors: Yuzhe Yao, Feng Tian, Jun Chen, Haonan Lin, Guang Dai, Yong Liu, Jingdong Wang

    Abstract: Diffusion models have marked a significant breakthrough in the synthesis of semantically coherent images. However, their extensive noise estimation networks and the iterative generation process limit their wider application, particularly on resource-constrained platforms like mobile devices. Existing post-training quantization (PTQ) methods have managed to compress diffusion models to low precisio…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  39. arXiv:2407.02327  [pdf, other]

    cs.LG cs.DC

    QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices

    Authors: Juntao Zhao, Borui Wan, Yanghua Peng, Haibin Lin, Yibo Zhu, Chuan Wu

    Abstract: A number of production deep learning clusters have attempted to explore inference hardware for DNN training, at the off-peak serving hours with many inference GPUs idling. Conducting DNN training with a combination of heterogeneous training and inference GPUs, known as hybrid device training, presents considerable challenges due to disparities in compute capability and significant differences in m…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: IPDPS 24

  40. arXiv:2407.00614  [pdf, other]

    cs.RO cs.CV eess.IV

    Learning Granularity-Aware Affordances from Human-Object Interaction for Tool-Based Functional Grasping in Dexterous Robotics

    Authors: Fan Yang, Wenrui Chen, Kailun Yang, Haoran Lin, DongSheng Luo, Conghui Tang, Zhiyong Li, Yaonan Wang

    Abstract: To enable robots to use tools, the initial step is teaching robots to employ dexterous gestures for touching specific areas precisely where tasks are performed. Affordance features of objects serve as a bridge in the functional interaction between agents and objects. However, leveraging these affordance cues to help robots achieve functional tool grasping remains unresolved. To address this, we pr…

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: The source code and the established dataset will be made publicly available at https://github.com/yangfan293/GAAF-DEX

  41. arXiv:2407.00114  [pdf, other]

    cs.LG cs.AI cs.CL

    OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

    Authors: Zihao Wang, Shaofei Cai, Zhancun Mu, Haowei Lin, Ceyao Zhang, Xuejie Liu, Qing Li, Anji Liu, Xiaojian Ma, Yitao Liang

    Abstract: We present OmniJARVIS, a novel Vision-Language-Action (VLA) model for instruction-following agents in open-world Minecraft. Compared to prior works that either emit textual goals to separate controllers or produce the control command directly, OmniJARVIS seeks a different path to ensure both strong reasoning and efficient decision-making capabilities via unified tokenization of multimod…

    Submitted 27 June, 2024; originally announced July 2024.

  42. arXiv:2406.20098

    cs.CV cs.AI cs.CL

    Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs

    Authors: Sukmin Yun, Haokun Lin, Rusiru Thushara, Mohammad Qazim Bhat, Yongxin Wang, Zutao Jiang, Mingkai Deng, Jinhong Wang, Tianhua Tao, Junbo Li, Haonan Li, Preslav Nakov, Timothy Baldwin, Zhengzhong Liu, Eric P. Xing, Xiaodan Liang, Zhiqiang Shen

    Abstract: Multimodal large language models (MLLMs) have shown impressive success across modalities such as image, video, and audio in a variety of understanding and generation tasks. However, current MLLMs are surprisingly poor at understanding webpage screenshots and generating their corresponding HTML code. To address this problem, we propose Web2Code, a benchmark consisting of a new large-scale webpage-t…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Website at https://mbzuai-llm.github.io/webpage2code/

  43. arXiv:2406.19598

    cs.CL

    Mixture of In-Context Experts Enhance LLMs' Long Context Awareness

    Authors: Hongzhan Lin, Ang Lv, Yuhan Chen, Chen Zhu, Yang Song, Hengshu Zhu, Rui Yan

    Abstract: Many studies have revealed that large language models (LLMs) exhibit uneven awareness of different contextual positions. Their limited context awareness can lead to overlooking critical information and subsequent task failures. While several approaches have been proposed to enhance LLMs' context awareness, achieving both effectiveness and efficiency remains challenging. In this paper, for LLMs utili…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 14 pages, 5 figures

  44. arXiv:2406.19392

    cs.CV

    ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos

    Authors: Jr-Jen Chen, Yu-Chien Liao, Hsi-Che Lin, Yu-Chu Yu, Yen-Chun Chen, Yu-Chiang Frank Wang

    Abstract: We introduce ReXTime, a benchmark designed to rigorously test AI models' ability to perform temporal reasoning within video events. Specifically, ReXTime focuses on reasoning across time, i.e. human-like understanding when the question and its corresponding answer occur in different video segments. This form of reasoning, requiring advanced understanding of cause-and-effect relationships across vi…

    Submitted 2 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: Project page: https://rextime.github.io/

  45. arXiv:2406.18591

    cs.CV cs.AI cs.LG

    Composition Vision-Language Understanding via Segment and Depth Anything Model

    Authors: Mingxiao Huo, Pengliang Ji, Haotian Lin, Junchen Liu, Yixiao Wang, Yijun Chen

    Abstract: We introduce a pioneering unified library that leverages the Depth Anything and Segment Anything models to augment zero-shot understanding in vision-language models. This library synergizes the capabilities of the Depth Anything Model (DAM), Segment Anything Model (SAM), and GPT-4V, enhancing multimodal tasks such as visual question answering (VQA) and compositional reasoning. Through t…

    Submitted 7 June, 2024; originally announced June 2024.

  46. arXiv:2406.15881

    cs.LG cs.AI

    Fast Tree-Field Integrators: From Low Displacement Rank to Topological Transformers

    Authors: Krzysztof Choromanski, Arijit Sehanobish, Somnath Basu Roy Chowdhury, Han Lin, Avinava Dubey, Tamas Sarlos, Snigdha Chaturvedi

    Abstract: We present a new class of fast polylog-linear algorithms based on the theory of structured matrices (in particular low displacement rank) for integrating tensor fields defined on weighted trees. Several applications of the resulting fast tree-field integrators (FTFIs) are presented, including (a) approximation of graph metrics with tree metrics, (b) graph classification, (c) modeling on meshes, an…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: Preprint. Comments welcome

  47. arXiv:2406.13231

    cs.DS

    Tight Lower Bounds for Directed Cut Sparsification and Distributed Min-Cut

    Authors: Yu Cheng, Max Li, Honghao Lin, Zi-Yi Tai, David P. Woodruff, Jason Zhang

    Abstract: In this paper, we consider two fundamental cut approximation problems on large graphs. We prove new lower bounds for both problems that are optimal up to logarithmic factors. The first problem is to approximate cuts in balanced directed graphs. In this problem, the goal is to build a data structure that $(1 \pm \varepsilon)$-approximates cut values in graphs with $n$ vertices. For arbitrary directed graph…

    Submitted 19 June, 2024; originally announced June 2024.

  48. arXiv:2406.12718

    cs.CV cs.AI cs.CL

    AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

    Authors: Wenbin An, Feng Tian, Sicong Leng, Jiahao Nie, Haonan Lin, QianYing Wang, Guang Dai, Ping Chen, Shijian Lu

    Abstract: Despite their great success across various multimodal tasks, Large Vision-Language Models (LVLMs) suffer from a prevalent problem of object hallucination, where the generated textual responses are inconsistent with the ground-truth objects in the given image. This paper investigates various LVLMs and pinpoints attention deficiency toward discriminative local image features as one root cause of objec…

    Submitted 21 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  49. arXiv:2406.12386

    cs.CL

    IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models

    Authors: Qiyao Wang, Jianguo Huang, Shule Lu, Yuan Lin, Kan Xu, Liang Yang, Hongfei Lin

    Abstract: Large Language Models (LLMs) are developing rapidly in vertical domains, including intellectual property (IP), yet no specific evaluation benchmark exists for assessing their understanding, application, and reasoning abilities in this domain. To fill this gap, we introduce IPEval, the first evaluation benchmark tailored for IP agency and consulting tasks. IPEval comprises 2657 multiple-choice questions across four m…

    Submitted 18 June, 2024; originally announced June 2024.

  50. arXiv:2406.12221

    cs.CL

    On-Policy Fine-grained Knowledge Feedback for Hallucination Mitigation

    Authors: Xueru Wen, Xinyu Lu, Xinyan Guan, Yaojie Lu, Hongyu Lin, Ben He, Xianpei Han, Le Sun

    Abstract: Hallucination occurs when large language models (LLMs) exhibit behavior that deviates from the boundaries of their knowledge during the response generation process. Previous learning-based methods focus on detecting knowledge boundaries and fine-tuning models with instance-level feedback, but they suffer from inaccurate signals due to off-policy data sampling and coarse-grained feedback. In this pa…

    Submitted 17 June, 2024; originally announced June 2024.