
Showing 1–50 of 206 results for author: Tian, H

Searching in archive cs.
  1. arXiv:2408.05090  [pdf, other]

    cs.CV cs.MM

    Loc4Plan: Locating Before Planning for Outdoor Vision and Language Navigation

    Authors: Huilin Tian, Jingke Meng, Wei-Shi Zheng, Yuan-Ming Li, Junkai Yan, Yunong Zhang

    Abstract: Vision and Language Navigation (VLN) is a challenging task that requires agents to understand instructions and navigate to the destination in a visual environment. One of the key challenges in outdoor VLN is keeping track of which part of the instruction was completed. To alleviate this problem, previous works mainly focus on grounding the natural language to the visual input, but neglecting the cr…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2203.13838 by other authors

  2. arXiv:2408.02718  [pdf, other]

    cs.CV

    MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

    Authors: Fanqing Meng, Jin Wang, Chuanhao Li, Quanfeng Lu, Hao Tian, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao

    Abstract: The capability to process multiple images is crucial for Large Vision-Language Models (LVLMs) to develop a more thorough and nuanced understanding of a scene. Recent multi-image LVLMs have begun to address this need. However, their evaluation has not kept pace with their development. To fill this gap, we introduce the Multimodal Multi-image Understanding (MMIU) benchmark, a comprehensive evaluatio…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Project Page: https://mmiu-bench.github.io/

  3. arXiv:2407.21570  [pdf]

    cs.RO

    Vision and Contact based Optimal Control for Autonomous Trocar Docking

    Authors: Christopher E. Mower, Martin Huber, Huanyu Tian, Ayoob Davoodi, Emmanuel Vander Poorten, Tom Vercauteren, Christos Bergeles

    Abstract: Future operating theatres will be equipped with robots to perform various surgical tasks including, for example, endoscope control. Human-in-the-loop supervisory control architectures where the surgeon selects from several autonomous sequences are already being successfully applied in preclinical tests. Inserting an endoscope into a trocar or introducer is a key step for every keyhole surgical proc…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Presented at the 12th Conference on New Technologies for Computer and Robot Assisted Surgery

  4. arXiv:2407.15838  [pdf, other]

    cs.CV

    MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity

    Authors: Yangzhou Liu, Yue Cao, Zhangwei Gao, Weiyun Wang, Zhe Chen, Wenhai Wang, Hao Tian, Lewei Lu, Xizhou Zhu, Tong Lu, Yu Qiao, Jifeng Dai

    Abstract: Vision-language supervised fine-tuning is effective in enhancing the performance of Vision Large Language Models (VLLMs). However, existing visual instruction tuning datasets have the following limitations: (1) Instruction annotation quality: despite existing VLLMs exhibiting strong performance, instructions generated by those advanced VLLMs may still suffer from inaccuracies, s…

    Submitted 7 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: 18 pages, 8 figures, technical report

  5. arXiv:2407.10471  [pdf, other]

    cs.CR cs.AI cs.SD eess.AS

    GROOT: Generating Robust Watermark for Diffusion-Model-Based Audio Synthesis

    Authors: Weizhi Liu, Yue Li, Dongdong Lin, Hui Tian, Haizhou Li

    Abstract: Amid the burgeoning development of generative models like diffusion models, the task of differentiating synthesized audio from its natural counterpart grows more daunting. Deepfake detection offers a viable solution to combat this challenge. Yet, this defensive measure unintentionally fuels the continued refinement of generative models. Watermarking emerges as a proactive and sustainable tactic, p…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  6. arXiv:2407.00225  [pdf, other]

    cs.SE

    Large-scale, Independent and Comprehensive study of the power of LLMs for test case generation

    Authors: Wendkûuni C. Ouédraogo, Kader Kaboré, Haoye Tian, Yewei Song, Anil Koyuncu, Jacques Klein, David Lo, Tegawendé F. Bissyandé

    Abstract: Unit testing, crucial for identifying bugs in code modules like classes and methods, is often neglected by developers due to time constraints. Automated test generation techniques have emerged to address this, but often lack readability and require developer intervention. Large Language Models (LLMs), like GPT and Mistral, show promise in software engineering, including in test generation. However…

    Submitted 28 June, 2024; originally announced July 2024.

  7. arXiv:2406.13972  [pdf, other]

    cs.SE

    CREF: An LLM-based Conversational Software Repair Framework for Programming Tutors

    Authors: Boyang Yang, Haoye Tian, Weiguo Pian, Haoran Yu, Haitao Wang, Jacques Klein, Tegawendé F. Bissyandé, Shunfu Jin

    Abstract: Program repair techniques offer cost-saving benefits for debugging within software development and programming education scenarios. With the proven effectiveness of Large Language Models (LLMs) in code-related tasks, researchers have explored their potential for program repair. However, it is crucial to recognize that existing repair benchmarks may have influenced LLM training data, potentially ca…

    Submitted 8 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  8. arXiv:2406.13558   

    cs.AI

    Enhancing Travel Choice Modeling with Large Language Models: A Prompt-Learning Approach

    Authors: Xuehao Zhai, Hanlin Tian, Lintong Li, Tianyu Zhao

    Abstract: Travel choice analysis is crucial for understanding individual travel behavior to develop appropriate transport policies and recommendation systems in Intelligent Transportation Systems (ITS). Despite extensive research, this domain faces two critical challenges: a) modeling with limited survey data, and b) simultaneously achieving high model explainability and accuracy. In this paper, we introduc…

    Submitted 22 June, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

    Comments: We currently do not have a replacement version available. We request withdrawal due to a significant methodological error affecting the paper's validity, specifically a miscalculation in data preprocessing. We are working on corrections, but this will take time. We believe an interim withdrawal is necessary to prevent the dissemination of incorrect information.

  9. arXiv:2406.10857  [pdf, other]

    cs.SE

    An LLM-enhanced Multi-objective Evolutionary Search for Autonomous Driving Test Scenario Generation

    Authors: Haoxiang Tian, Xingshuo Han, Guoquan Wu, Yuan Zhou, Shuo Li, Jun Wei, Dan Ye, Wei Wang, Tianwei Zhang

    Abstract: The safety of Autonomous Driving Systems (ADSs) is significantly important for the implementation of autonomous vehicles (AVs). Therefore, ADSs must be evaluated thoroughly before their release and deployment to the public. How to generate diverse safety-critical test scenarios is a key task for ADS testing. This paper proposes LEADE, an LLM-enhanced scenario generation approach for ADS testing, w…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages

  10. arXiv:2406.08418  [pdf, other]

    cs.CV cs.AI

    OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

    Authors: Qingyun Li, Zhe Chen, Weiyun Wang, Wenhai Wang, Shenglong Ye, Zhenjiang Jin, Guanzhou Chen, Yinan He, Zhangwei Gao, Erfei Cui, Jiashuo Yu, Hao Tian, Jiasheng Zhou, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, Licheng Wen, Xiangchao Yan, Zhenxiang Li, Pei Chu, Yi Wang, et al. (15 additional authors not shown)

    Abstract: Image-text interleaved data, consisting of multiple images and texts arranged in a natural document format, aligns with the presentation paradigm of internet data and closely resembles human reading habits. Recent studies have shown that such data aids multimodal in-context learning and maintains the capabilities of large language models during multimodal fine-tuning. However, the limited scale an…

    Submitted 12 July, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

  11. arXiv:2406.05892  [pdf, other]

    cs.CR cs.LG cs.SE

    Security Vulnerability Detection with Multitask Self-Instructed Fine-Tuning of Large Language Models

    Authors: Aidan Z. H. Yang, Haoye Tian, He Ye, Ruben Martins, Claire Le Goues

    Abstract: Software security vulnerabilities allow attackers to perform malicious activities to disrupt software operations. Recent Transformer-based language models have significantly advanced vulnerability detection, surpassing the capabilities of static analysis based deep learning models. However, language models trained solely on code tokens do not capture either the explanation of vulnerability type or…

    Submitted 9 June, 2024; originally announced June 2024.

  12. arXiv:2405.18786  [pdf, other]

    cs.LG cs.CV

    MOKD: Cross-domain Finetuning for Few-shot Classification via Maximizing Optimized Kernel Dependence

    Authors: Hongduan Tian, Feng Liu, Tongliang Liu, Bo Du, Yiu-ming Cheung, Bo Han

    Abstract: In cross-domain few-shot classification, nearest centroid classifier (NCC) aims to learn representations to construct a metric space where few-shot classification can be performed by measuring the similarities between samples and the prototype of each class. An intuition behind NCC is that each sample is pulled closer to the class centroid it belongs to while pushed away from those of other…

    Submitted 29 May, 2024; originally announced May 2024.

  13. arXiv:2405.07411  [pdf, other]

    cs.CV cs.AI

    MoVL: Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks

    Authors: Haijiang Tian, Jingkun Yue, Xiaohong Liu, Guoxing Yang, Zeyu Jiang, Guangyu Wang

    Abstract: Medical images are often more difficult to acquire than natural images due to the specialized equipment and technology required, which leads to fewer medical image datasets. It is therefore hard to train a strong pretrained medical vision model. How to make the best of natural pretrained vision models and adapt them to the medical domain remains an open question. For image classification, a popular method is linear probe (LP). How…

    Submitted 12 May, 2024; originally announced May 2024.

  14. arXiv:2405.05817  [pdf, other]

    cs.RO

    Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion

    Authors: Huanyu Tian, Martin Huber, Christopher E. Mower, Zhe Han, Changsheng Li, Xingguang Duan, Christos Bergeles

    Abstract: In this study, we introduce a novel shared-control system for key-hole docking operations, combining a commercial camera with occlusion-robust pose estimation and a hand-eye information fusion technique. This system is used to enhance docking precision and force-compliance safety. To train a hand-eye information fusion network model, we generated a self-supervised dataset using this docking system…

    Submitted 9 May, 2024; originally announced May 2024.

  15. arXiv:2405.05545  [pdf, other]

    cs.LG stat.ML

    Deep Hierarchical Graph Alignment Kernels

    Authors: Shuhao Tang, Hao Tian, Xiaofeng Cao, Wei Ye

    Abstract: Typical R-convolution graph kernels invoke the kernel functions that decompose graphs into non-isomorphic substructures and compare them. However, overlooking implicit similarities and topological position information between those substructures limits their performances. In this paper, we introduce Deep Hierarchical Graph Alignment Kernels (DHGAK) to resolve this problem. Specifically, the relati…

    Submitted 9 May, 2024; originally announced May 2024.

  16. arXiv:2405.00482  [pdf, other]

    cs.CR cs.LG

    PackVFL: Efficient HE Packing for Vertical Federated Learning

    Authors: Liu Yang, Shuowei Cai, Di Chai, Junxue Zhang, Han Tian, Yilun Jin, Kun Guo, Kai Chen, Qiang Yang

    Abstract: As an essential tool of secure distributed machine learning, vertical federated learning (VFL) based on homomorphic encryption (HE) suffers from severe efficiency problems due to data inflation and time-consuming operations. To this end, we propose PackVFL, an efficient VFL framework based on packed HE (PackedHE), to accelerate the existing HE-based VFL algorithms. PackVFL packs multiple cleartex…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 12 pages excluding references

  17. arXiv:2404.16821  [pdf, other]

    cs.CV

    How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites

    Authors: Zhe Chen, Weiyun Wang, Hao Tian, Shenglong Ye, Zhangwei Gao, Erfei Cui, Wenwen Tong, Kongzhi Hu, Jiapeng Luo, Zheng Ma, Ji Ma, Jiaqi Wang, Xiaoyi Dong, Hang Yan, Hewei Guo, Conghui He, Botian Shi, Zhenjiang Jin, Chao Xu, Bin Wang, Xingjian Wei, Wei Li, Wenjian Zhang, Bo Zhang, Pinlong Cai, et al. (10 additional authors not shown)

    Abstract: In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model -- InternViT-6B, boosting its visual…

    Submitted 29 April, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

    Comments: Technical report

  18. arXiv:2404.15199  [pdf, other]

    cs.LG

    Reinforcement Learning with Adaptive Control Regularization for Safe Control of Critical Systems

    Authors: Haozhe Tian, Homayoun Hamedmoghadam, Robert Shorten, Pietro Ferraro

    Abstract: Reinforcement Learning (RL) is a powerful method for controlling dynamic systems, but its learning mechanism can lead to unpredictable actions that undermine the safety of critical systems. Here, we propose RL with Adaptive Control Regularization (RL-ACR), an algorithm that enables safe RL exploration by combining the RL policy with a policy regularizer that hard-codes safety constraints. We perfo…

    Submitted 23 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  19. arXiv:2404.12636  [pdf, other]

    cs.SE

    Multi-Objective Fine-Tuning for Enhanced Program Repair with LLMs

    Authors: Boyang Yang, Haoye Tian, Jiadong Ren, Hongyu Zhang, Jacques Klein, Tegawendé F. Bissyandé, Claire Le Goues, Shunfu Jin

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities on a broad spectrum of downstream tasks. Within the realm of software engineering, specialized tasks on code, such as program repair, present unique challenges, necessitating fine-tuning to unlock state-of-the-art performance. Fine-tuning approaches proposed in the literature for LLMs on program repair tasks are however general…

    Submitted 22 April, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  20. arXiv:2404.08570  [pdf, other]

    cs.RO cs.AI cs.LG

    Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation

    Authors: Hanlin Tian, Kethan Reddy, Yuxiang Feng, Mohammed Quddus, Yiannis Demiris, Panagiotis Angeloudis

    Abstract: This paper introduces CRITICAL, a novel closed-loop framework for autonomous vehicle (AV) training and testing. CRITICAL stands out for its ability to generate diverse scenarios, focusing on critical driving situations that target specific learning and performance gaps identified in the Reinforcement Learning (RL) agent. The framework achieves this by integrating real-world traffic dynamics, drivi…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 7 pages, 5 figures

  21. arXiv:2404.05258  [pdf, other]

    cs.CV

    Unsupervised Band Selection Using Fused HSI and LiDAR Attention Integrating With Autoencoder

    Authors: Judy X Yang, Jun Zhou, Jing Wang, Hui Tian, Alan Wee Chung Liew

    Abstract: Band selection in hyperspectral imaging (HSI) is critical for optimising data processing and enhancing analytical accuracy. Traditional approaches have predominantly concentrated on analysing spectral and pixel characteristics within individual bands independently. These approaches overlook the potential benefits of integrating multiple data sources, such as Light Detection and Ranging (LiDAR), an…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: 13 pages, 13 figures, 6 tables

    MSC Class: F.2.2; I.2.7

  22. arXiv:2404.03883  [pdf, other]

    eess.IV cs.CV

    LiDAR-Guided Cross-Attention Fusion for Hyperspectral Band Selection and Image Classification

    Authors: Judy X Yang, Jun Zhou, Jing Wang, Hui Tian, Alan Wee-Chung Liew

    Abstract: The fusion of hyperspectral and LiDAR data has been an active research topic. Existing fusion methods have ignored the high-dimensionality and redundancy challenges in hyperspectral images, even though band selection methods have been intensively studied for hyperspectral image (HSI) processing. This paper addresses this significant gap by introducing a cross-attention mechanism from the transfor…

    Submitted 15 April, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: 15 pages, 13 figures

    MSC Class: F.2.2; I.2.7

    Journal ref: IEEE - TGRS-2024-00264.R1 Final Files Received

  23. arXiv:2404.01780  [pdf, other]

    astro-ph.IM astro-ph.GA cs.CV

    CSST Strong Lensing Preparation: a Framework for Detecting Strong Lenses in the Multi-color Imaging Survey by the China Survey Space Telescope (CSST)

    Authors: Xu Li, Ruiqi Sun, Jiameng Lv, Peng Jia, Nan Li, Chengliang Wei, Zou Hu, Xinzhong Er, Yun Chen, Zhang Ban, Yuedong Fang, Qi Guo, Dezi Liu, Guoliang Li, Lin Lin, Ming Li, Ran Li, Xiaobo Li, Yu Luo, Xianmin Meng, Jundan Nie, Zhaoxiang Qi, Yisheng Qiu, Li Shao, Hao Tian, et al. (7 additional authors not shown)

    Abstract: Strong gravitational lensing is a powerful tool for investigating dark matter and dark energy properties. With the advent of large-scale sky surveys, we can discover strong lensing systems on an unprecedented scale, which requires efficient tools to extract them from billions of astronomical objects. The existing mainstream lens-finding tools are based on machine learning algorithms and applied to…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: The paper is accepted by the AJ. The complete code could be downloaded with DOI of: 10.12149/101393. Comments are welcome

  24. arXiv:2404.00272  [pdf, other]

    cs.CV

    HSIMamba: Hyperpsectral Imaging Efficient Feature Learning with Bidirectional State Space for Classification

    Authors: Judy X Yang, Jun Zhou, Jing Wang, Hui Tian, Alan Wee Chung Liew

    Abstract: Classifying hyperspectral images is a difficult task in remote sensing, due to their complex high-dimensional data. To address this challenge, we propose HSIMamba, a novel framework that uses bidirectional reversed convolutional neural network pathways to extract spectral features more efficiently. Additionally, it incorporates a specialized block for spatial analysis. Our approach combines the op…

    Submitted 30 March, 2024; originally announced April 2024.

    Comments: 11 pages, 2 figures, 8 tables

    ACM Class: F.2.2, I.2.7

  25. arXiv:2403.14085  [pdf, other]

    cs.CV

    Surface Reconstruction from Point Clouds via Grid-based Intersection Prediction

    Authors: Hui Tian, Kai Xu

    Abstract: Surface reconstruction from point clouds is a crucial task in the fields of computer vision and computer graphics. SDF-based methods excel at reconstructing smooth meshes with minimal error and artefacts but struggle with representing open surfaces. On the other hand, UDF-based methods can effectively represent open surfaces but often introduce noise, leading to artefacts in the mesh. In this work…

    Submitted 8 April, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  26. arXiv:2403.08896  [pdf, other]

    cs.LG cs.DC

    One-Shot Averaging for Distributed TD($λ$) Under Markov Sampling

    Authors: Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky

    Abstract: We consider a distributed setup for reinforcement learning, where each agent has a copy of the same Markov Decision Process but transitions are sampled from the corresponding Markov chain independently by each agent. We show that in this setting, we can achieve a linear speedup for TD($λ$), a family of popular methods for policy evaluation, in the sense that $N$ agents can evaluate a policy $N$ ti…

    Submitted 31 May, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  27. arXiv:2403.06838  [pdf, other]

    cs.SE cs.CR

    ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts

    Authors: Lyuye Zhang, Kaixuan Li, Kairan Sun, Daoyuan Wu, Ye Liu, Haoye Tian, Yang Liu

    Abstract: Smart contracts are susceptible to various security issues, among which access control (AC) vulnerabilities are particularly critical. While existing research has proposed multiple detection tools, the automatic and appropriate repair of AC vulnerabilities in smart contracts remains a challenge. Unlike commonly supported vulnerability types by existing repair tools, such as reentrancy, which are u…

    Submitted 18 March, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: This is a technical report from Nanyang Technological University

  28. arXiv:2403.06520  [pdf, other]

    cs.CL cs.AI

    How to Understand Named Entities: Using Common Sense for News Captioning

    Authors: Ning Xu, Yanhui Wang, Tingting Zhang, Hongshuo Tian, Mohan Kankanhalli, An-An Liu

    Abstract: News captioning aims to describe an image with its news article body as input. It greatly relies on a set of detected named entities, including real-world people, organizations, and places. This paper exploits commonsense knowledge to understand named entities for news captioning. By "understand", we mean correlating the news content with common sense in the wild, which helps an agent to 1) dist…

    Submitted 11 March, 2024; originally announced March 2024.

  29. arXiv:2403.05101  [pdf, other]

    cs.CL cs.AI

    Rule-driven News Captioning

    Authors: Ning Xu, Tingting Zhang, Hongshuo Tian, An-An Liu

    Abstract: The news captioning task aims to generate sentences by describing named entities or concrete events for an image with its news article. Existing methods have achieved remarkable results by relying on large-scale pre-trained models, which primarily focus on the correlations between the input news content and the output predictions. However, news captioning requires adhering to some fundamental…

    Submitted 14 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  30. arXiv:2403.01798  [pdf, other]

    cs.NI cs.LG

    Towards Fair and Efficient Learning-based Congestion Control

    Authors: Xudong Liao, Han Tian, Chaoliang Zeng, Xinchen Wan, Kai Chen

    Abstract: Recent years have witnessed a plethora of learning-based solutions for congestion control (CC) that demonstrate better performance over traditional TCP schemes. However, they fail to provide consistently good convergence properties, including fairness, fast convergence and stability, due to the mismatch between their objective functions and these properties. Despite being intuiti…

    Submitted 4 March, 2024; originally announced March 2024.

  31. arXiv:2402.19414  [pdf, ps, other]

    cs.SI cs.DS

    Higher-Order Networks Representation and Learning: A Survey

    Authors: Hao Tian, Reza Zafarani

    Abstract: Network data has become widespread, larger, and more complex over the years. Traditional network data is dyadic, capturing the relations among pairs of entities. With the need to model interactions among more than two entities, significant research has focused on higher-order networks and ways to represent, analyze, and learn from them. There are two main directions to studying higher-order networ…

    Submitted 9 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: 25 pages

    MSC Class: 68Q06 ACM Class: A.1; I.5.1

  32. arXiv:2402.15321  [pdf, other]

    cs.CV cs.AI cs.LG

    OpenSUN3D: 1st Workshop Challenge on Open-Vocabulary 3D Scene Understanding

    Authors: Francis Engelmann, Ayca Takmaz, Jonas Schult, Elisabetta Fedele, Johanna Wald, Songyou Peng, Xi Wang, Or Litany, Siyu Tang, Federico Tombari, Marc Pollefeys, Leonidas Guibas, Hongbo Tian, Chunjie Wang, Xiaosheng Yan, Bingwen Wang, Xuanyang Zhang, Xiao Liu, Phuc Nguyen, Khoi Nguyen, Anh Tran, Cuong Pham, Zhening Huang, Xiaoyang Wu, Xi Chen, et al. (3 additional authors not shown)

    Abstract: This report provides an overview of the challenge hosted at the OpenSUN3D Workshop on Open-Vocabulary 3D Scene Understanding held in conjunction with ICCV 2023. The goal of this workshop series is to provide a platform for exploration and discussion of open-vocabulary 3D scene understanding tasks, including but not limited to segmentation, detection and mapping. We provide an overview of the chall…

    Submitted 17 March, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: Our OpenSUN3D workshop website for ICCV 2023: https://opensun3d.github.io/index_iccv23.html

  33. arXiv:2402.02172  [pdf, other]

    cs.SE

    CodeAgent: Collaborative Agents for Software Engineering

    Authors: Daniel Tang, Kisub Kim, Yewei Song, Cedric Lothritz, Bei Li, Saad Ezzini, Haoye Tian, Jacques Klein, Tegawende F. Bissyande

    Abstract: Code review, which aims at ensuring the overall quality and reliability of software, is a cornerstone of software development. Unfortunately, while crucial, code review is a labor-intensive process that the research community is looking to automate. Existing automated methods rely on single input-output generative models and thus generally struggle to emulate the collaborative nature of code revie…

    Submitted 28 June, 2024; v1 submitted 3 February, 2024; originally announced February 2024.

  34. arXiv:2401.16566  [pdf, other]

    cs.RO

    Excitation Trajectory Optimization for Dynamic Parameter Identification Using Virtual Constraints in Hands-on Robotic System

    Authors: Huanyu Tian, Martin Huber, Christopher E. Mower, Zhe Han, Changsheng Li, Xingguang Duan, Christos Bergeles

    Abstract: This paper proposes a novel, more computationally efficient method for optimizing robot excitation trajectories for dynamic parameter identification, emphasizing self-collision avoidance. This addresses the system identification challenges for getting high-quality training data associated with co-manipulated robotic arms that can be equipped with a variety of tools, a common scenario in industrial…

    Submitted 29 January, 2024; originally announced January 2024.

  35. arXiv:2401.07870  [pdf, other]

    cs.CL cs.AI cs.SE

    JumpCoder: Go Beyond Autoregressive Coder via Online Modification

    Authors: Mouxiang Chen, Hao Tian, Zhongxin Liu, Xiaoxue Ren, Jianling Sun

    Abstract: While existing code large language models (code LLMs) exhibit impressive capabilities in code generation, their autoregressive sequential generation inherently lacks reversibility. This limitation hinders them from timely correcting previous missing statements during coding as humans do, often leading to error propagation and suboptimal performance. We introduce JumpCoder, a novel model-agnostic f…

    Submitted 5 June, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

    Comments: ACL 2024 (main)

  36. arXiv:2312.15186  [pdf, other]

    cs.DC cs.AI cs.LG

    Efficient Asynchronous Federated Learning with Sparsification and Quantization

    Authors: Juncheng Jia, Ji Liu, Chendi Zhou, Hao Tian, Mianxiong Dong, Dejing Dou

    Abstract: While data is distributed in multiple edge devices, Federated Learning (FL) is attracting more and more attention to collaboratively train a machine learning model without transferring raw data. FL generally exploits a parameter server and a large number of edge devices during the whole process of the model training, while several devices are selected in each round. However, straggler devices may…

    Submitted 6 January, 2024; v1 submitted 23 December, 2023; originally announced December 2023.

    Comments: To appear in Concurrency and Computation: Practice and Experience (CCPE), 21 pages

  37. arXiv:2312.09245  [pdf, other]

    cs.CV

    DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving

    Authors: Wenhai Wang, Jiangwei Xie, ChuanYang Hu, Haoming Zou, Jianan Fan, Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, Hao Tian, Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai

    Abstract: Large language models (LLMs) have opened up new possibilities for intelligent agents, endowing them with human-like thinking and cognitive abilities. In this work, we delve into the potential of large language models (LLMs) in autonomous driving (AD). We introduce DriveMLM, an LLM-based AD framework that can perform closed-loop autonomous driving in realistic simulators. To this end, (1) we bridge…

    Submitted 25 December, 2023; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Technical Report

  38. arXiv:2312.09086  [pdf, other]

    cs.LG cs.NE

    COMBHelper: A Neural Approach to Reduce Search Space for Graph Combinatorial Problems

    Authors: Hao Tian, Sourav Medya, Wei Ye

    Abstract: Combinatorial Optimization (CO) problems over graphs appear routinely in many applications such as in optimizing traffic, viral marketing in social networks, and matching for job allocation. Due to their combinatorial nature, these problems are often NP-hard. Existing approximation algorithms and heuristics rely on the search space to find the solutions and become time-consuming when this space is…

    Submitted 1 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  39. arXiv:2312.05397  [pdf, other]

    cs.LG

    On the Performance of Temporal Difference Learning With Neural Networks

    Authors: Haoxing Tian, Ioannis Ch. Paschalidis, Alex Olshevsky

    Abstract: Neural Temporal Difference (TD) Learning is an approximate temporal difference method for policy evaluation that uses a neural network for function approximation. Analysis of Neural TD Learning has proven to be challenging. In this paper we provide a convergence analysis of Neural TD Learning with a projection onto $B(θ_0, ω)$, a ball of fixed radius $ω$ around the initial point $θ_0$. We show an…

    Submitted 8 December, 2023; originally announced December 2023.

  40. arXiv:2312.02521  [pdf, other]

    cs.CV cs.AI

    Retrieving Conditions from Reference Images for Diffusion Models

    Authors: Haoran Tang, Xin Zhou, Jieren Deng, Zhihong Pan, Hao Tian, Pratik Chaudhari

    Abstract: Newly developed diffusion-based techniques have showcased phenomenal abilities in producing a wide range of high-quality images, sparking considerable interest in various applications. A prevalent scenario is to generate new images based on a subject from reference images. This subject could be face identity for styled avatars, body and clothing for virtual try-on and so on. Satisfying this requir…

    Submitted 15 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  41. arXiv:2312.01241  [pdf, other]

    cs.CR cs.AI

    Just-in-Time Security Patch Detection -- LLM At the Rescue for Data Augmentation

    Authors: Xunzhu Tang, Zhenghan Chen, Kisub Kim, Haoye Tian, Saad Ezzini, Jacques Klein

    Abstract: In the face of growing vulnerabilities found in open-source software, the need to identify discreet security patches has become paramount. The lack of consistency in how software providers handle maintenance often leads to the release of security patches without comprehensive advisories, leaving users vulnerable to unaddressed security risks. To address this pressing issue, we introduce a novel…

    Submitted 12 December, 2023; v1 submitted 2 December, 2023; originally announced December 2023.

  42. arXiv:2311.18835  [pdf, other]

    cs.CV

    InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation

    Authors: Rongyao Fang, Shilin Yan, Zhaoyang Huang, Jingqiu Zhou, Hao Tian, Jifeng Dai, Hongsheng Li

    Abstract: Empowering models to dynamically accomplish tasks specified through natural language instructions represents a promising path toward more capable and general artificial intelligence. In this work, we introduce InstructSeq, an instruction-conditioned multi-modal modeling framework that unifies diverse vision tasks through flexible natural language control and handling of both visual and textual dat…

    Submitted 30 November, 2023; originally announced November 2023.

    Comments: 10 pages

  43. arXiv:2311.18405  [pdf, other]

    cs.CV

    CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

    Authors: Jianhao Zeng, Dan Song, Weizhi Nie, Hongshuo Tian, Tongtong Wang, Anan Liu

    Abstract: Generative Adversarial Networks (GANs) dominate the research field in image-based virtual try-on, but have not resolved problems such as unnatural deformation of garments and the blurry generation quality. While the generative quality of diffusion models is impressive, achieving controllability poses a significant challenge when applying it to virtual try-on and multiple denoising iterations limit…

    Submitted 25 April, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

  44. arXiv:2311.12307  [pdf, other]

    cs.AI

    Causality is all you need

    Authors: Ning Xu, Yifei Gao, Hongshuo Tian, Yongdong Zhang, An-An Liu

    Abstract: In the fundamental statistics course, students are taught to remember the well-known saying: "Correlation is not Causation". Till now, statistics (i.e., correlation) have developed various successful frameworks, such as Transformer and Pre-training large-scale models, which have stacked multiple parallel self-attention blocks to imitate a wide range of tasks. However, in the causation community, h…

    Submitted 20 November, 2023; originally announced November 2023.

  45. arXiv:2311.03865  [pdf, other]

    cs.LG cs.AI cs.CR

    When Fairness Meets Privacy: Exploring Privacy Threats in Fair Binary Classifiers through Membership Inference Attacks

    Authors: Huan Tian, Guangsheng Zhang, Bo Liu, Tianqing Zhu, Ming Ding, Wanlei Zhou

    Abstract: Previous studies have developed fairness methods for biased models that exhibit discriminatory behaviors towards specific subgroups. While these models have shown promise in achieving fair predictions, recent research has identified their potential vulnerability to score-based membership inference attacks (MIAs). In these attacks, adversaries can infer whether a particular data sample was used dur…

    Submitted 12 January, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: Under review

  46. arXiv:2310.14560  [pdf, other]

    cs.CV

    Polyhedral Surface: Self-supervised Point Cloud Reconstruction Based on Polyhedral Surface

    Authors: Hui Tian, Kai Xu

    Abstract: Point cloud reconstruction from raw point cloud has been an important topic in computer graphics for decades, especially due to its high demand in modeling and rendering applications. An important way to solve this problem is establishing a local geometry to fit the local curve. However, previous methods build either a local plane or polynomial curve. Local plane brings the loss of sharp feature a…

    Submitted 23 October, 2023; originally announced October 2023.

  47. arXiv:2310.12753   

    cs.SE

    Patch-CLIP: A Patch-Text Pre-Trained Model

    Authors: Xunzhu Tang, Zhenghan Chen, Saad Ezzini, Haoye Tian, Jacques Klein, Tegawende F. Bissyande

    Abstract: In recent years, patch representation learning has emerged as a necessary research direction for exploiting the capabilities of machine learning in software generation. These representations have driven significant performance enhancements across a variety of tasks involving code changes. While the progress is undeniable, a common limitation among existing models is their specialization: they pred…

    Submitted 30 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: The paper is incomplete, causing much confusion for the community

  48. arXiv:2310.02559  [pdf, ps, other]

    cs.IT cs.LG eess.SP

    Semi-Federated Learning: Convergence Analysis and Optimization of A Hybrid Learning Framework

    Authors: Jingheng Zheng, Wanli Ni, Hui Tian, Deniz Gunduz, Tony Q. S. Quek, Zhu Han

    Abstract: Under the organization of the base station (BS), wireless federated learning (FL) enables collaborative model training among multiple devices. However, the BS is merely responsible for aggregating local updates during the training process, which incurs a waste of the computational resource at the BS. To tackle this issue, we propose a semi-federated learning (SemiFL) paradigm to leverage the compu…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by IEEE Transactions on Wireless Communications

  49. Convergence Analysis and Latency Minimization for Semi-Federated Learning in Massive IoT Networks

    Authors: Jianyang Ren, Wanli Ni, Hui Tian, Gaofeng Nie

    Abstract: As the number of sensors becomes massive in Internet of Things (IoT) networks, the amount of data is humongous. To process data in real-time while protecting user privacy, federated learning (FL) has been regarded as an enabling technique to push edge intelligence into IoT networks with massive devices. However, FL latency increases dramatically due to the increase of the number of parameters in d…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: This paper has been accepted by IEEE Transactions on Green Communications and Networking

  50. arXiv:2310.01045  [pdf, other]

    cs.CL

    Tool-Augmented Reward Modeling

    Authors: Lei Li, Yekun Chai, Shuohuan Wang, Yu Sun, Hao Tian, Ningyu Zhang, Hua Wu

    Abstract: Reward modeling (a.k.a., preference modeling) is instrumental for aligning large language models with human preferences, particularly within the context of reinforcement learning from human feedback (RLHF). While conventional reward models (RMs) have exhibited remarkable scalability, they oft struggle with fundamental functionality such as arithmetic computation, code execution, and factual lookup…

    Submitted 11 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 Spotlight