
Showing 1–50 of 2,265 results for author: Li, B

Searching in archive cs.
  1. arXiv:2408.10739

    cs.CV

    TrackNeRF: Bundle Adjusting NeRF from Sparse and Noisy Views via Feature Tracks

    Authors: Jinjie Mai, Wenxuan Zhu, Sara Rojas, Jesus Zarzar, Abdullah Hamdi, Guocheng Qian, Bing Li, Silvio Giancola, Bernard Ghanem

    Abstract: Neural radiance fields (NeRFs) generally require many images with accurate poses for accurate novel view synthesis, which does not reflect realistic setups where views can be sparse and poses can be noisy. Previous solutions for learning NeRFs with sparse views and noisy poses only consider local geometry consistency with pairs of views. Closely following bundle adjustment in Structure-fr…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: ECCV 2024 (supplemental pages included)

  2. arXiv:2408.09933

    cs.SD cs.AI eess.AS

    SZU-AFS Antispoofing System for the ASVspoof 5 Challenge

    Authors: Yuxiong Xu, Jiafeng Zhong, Sengui Zheng, Zefeng Liu, Bin Li

    Abstract: This paper presents the SZU-AFS anti-spoofing system, designed for Track 1 of the ASVspoof 5 Challenge under open conditions. The system is built with four stages: selecting a baseline model, exploring effective data augmentation (DA) methods for fine-tuning, applying a co-enhancement strategy based on gradient norm aware minimization (GAM) for secondary fine-tuning, and fusing logits scores from…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 8 pages, 2 figures, ASVspoof 5 Workshop (Interspeech 2024 Satellite)

  3. arXiv:2408.09615

    cs.CV

    The First Competition on Resource-Limited Infrared Small Target Detection Challenge: Methods and Results

    Authors: Boyang Li, Xinyi Ying, Ruojing Li, Yongxian Liu, Yangsi Shi, Miao Li

    Abstract: In this paper, we briefly summarize the first competition on resource-limited infrared small target detection (namely, LimitIRSTD). This competition has two tracks, including weakly-supervised infrared small target detection (Track 1) and lightweight infrared small target detection (Track 2). 46 and 60 teams successfully registered and took part in Track 1 and Track 2, respectively. The top-perfo…

    Submitted 18 August, 2024; originally announced August 2024.

  4. arXiv:2408.09481

    cs.CL cs.AI

    PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis

    Authors: Meng Luo, Hao Fei, Bobo Li, Shengqiong Wu, Qian Liu, Soujanya Poria, Erik Cambria, Mong-Li Lee, Wynne Hsu

    Abstract: While existing Aspect-based Sentiment Analysis (ABSA) has received extensive effort and advancement, there are still gaps in defining a more holistic research target seamlessly integrating multimodality, conversation context, fine-granularity, and also covering the changing sentiment dynamics as well as cognitive causal rationales. This paper bridges the gaps by introducing a multimodal conversati…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM 2024 (Oral)

  5. arXiv:2408.09462

    cs.MM

    SpeechEE: A Novel Benchmark for Speech Event Extraction

    Authors: Bin Wang, Meishan Zhang, Hao Fei, Yu Zhao, Bobo Li, Shengqiong Wu, Wei Ji, Min Zhang

    Abstract: Event extraction (EE) is a critical direction in the field of information extraction, laying an important foundation for the construction of structured knowledge bases. EE from text has received ample research and attention for years, yet there can be numerous real-world applications that require direct information acquisition from speech signals, online meeting minutes, interview summaries, press…

    Submitted 18 August, 2024; originally announced August 2024.

  6. arXiv:2408.09040

    cs.NI eess.SY

    GLANCE: Graph-based Learnable Digital Twin for Communication Networks

    Authors: Boning Li, Gunjan Verma, Timofey Efimov, Abhishek Kumar, Santiago Segarra

    Abstract: As digital twins (DTs) to physical communication systems, network simulators can aid the design and deployment of communication networks. However, time-consuming simulations must be run for every new set of network configurations. Learnable digital twins (LDTs), in contrast, can be trained offline to emulate simulation outcomes and serve as a more efficient alternative to simulation-based DTs at r…

    Submitted 16 August, 2024; originally announced August 2024.

  7. arXiv:2408.08981

    cs.IR cs.CL

    From Lazy to Prolific: Tackling Missing Labels in Open Vocabulary Extreme Classification by Positive-Unlabeled Sequence Learning

    Authors: Haoran Ranran Zhang, Bensu Uçar, Soumik Dey, Hansi Wu, Binbin Li, Rui Zhang

    Abstract: Open-vocabulary Extreme Multi-label Classification (OXMC) extends traditional XMC by allowing prediction beyond an extremely large, predefined label set (typically $10^3$ to $10^{12}$ labels), addressing the dynamic nature of real-world labeling tasks. However, self-selection bias in data annotation leads to significant missing labels in both training and test data, particularly for less popular i…

    Submitted 16 August, 2024; originally announced August 2024.

  8. arXiv:2408.08808

    cs.LG cs.AI

    Constructing Domain-Specific Evaluation Sets for LLM-as-a-judge

    Authors: Ravi Raju, Swayambhoo Jain, Bo Li, Jonathan Li, Urmish Thakker

    Abstract: Large Language Models (LLMs) have revolutionized the landscape of machine learning, yet current benchmarks often fall short in capturing the diverse behavior of these models in real-world applications. A benchmark's usefulness is determined by its ability to clearly differentiate between models of varying capabilities (separability) and closely align with human preferences. Existing frameworks lik…

    Submitted 19 August, 2024; v1 submitted 16 August, 2024; originally announced August 2024.

    Comments: 14 pages, 8 figures, Under review

  9. arXiv:2408.08345

    cs.CV

    5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks

    Authors: Dongshuo Yin, Leiyi Hu, Bin Li, Youqun Zhang, Xue Yang

    Abstract: Pre-training & fine-tuning can enhance the transferring efficiency and performance in visual tasks. Recent delta-tuning methods provide more options for visual classification tasks. Despite their success, existing visual delta-tuning art fails to exceed the upper limit of full fine-tuning on challenging tasks like object detection and segmentation. To find a competitive alternative to full fine-tu…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2311.15010

  10. arXiv:2408.08071

    cs.LG cs.NE

    Universality of Real Minimal Complexity Reservoir

    Authors: Robert Simon Fong, Boyu Li, Peter Tiňo

    Abstract: Reservoir Computing (RC) models, a subclass of recurrent neural networks, are distinguished by their fixed, non-trainable input layer and dynamically coupled reservoir, with only the static readout layer being trained. This design circumvents the issues associated with backpropagating error signals through time, thereby enhancing both stability and training efficiency. RC models have been successf…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 19 pages, 5 figures

  11. arXiv:2408.07891

    cs.CV cs.AI cs.LG

    Quantum-inspired Interpretable Deep Learning Architecture for Text Sentiment Analysis

    Authors: Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Yuan Yuan

    Abstract: Text has become the predominant form of communication on social media, embedding a wealth of emotional nuances. Consequently, the extraction of emotional information from text is of paramount importance. Despite previous research making some progress, existing text sentiment analysis models still face challenges in integrating diverse semantic information and lack interpretability. To address thes…

    Submitted 14 August, 2024; originally announced August 2024.

  12. arXiv:2408.07317

    cs.HC

    Connecting Dreams with Visual Brainstorming Instruction

    Authors: Yasheng Sun, Bohan Li, Mingchen Zhuge, Deng-Ping Fan, Salman Khan, Fahad Shahbaz Khan, Hideki Koike

    Abstract: Recent breakthroughs in understanding the human brain have revealed its impressive ability to efficiently process and interpret human thoughts, opening up possibilities for intervening in brain signals. In this paper, we aim to develop a straightforward framework that uses other modalities, such as natural language, to translate the original dreamland. We present DreamConnect, employing a dual-str…

    Submitted 14 August, 2024; originally announced August 2024.

  13. arXiv:2408.07266

    cs.CV cs.RO

    Enhanced Scale-aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling

    Authors: Ruofeng Wei, Bin Li, Kai Chen, Yiyao Ma, Yunhui Liu, Qi Dou

    Abstract: Scale-aware monocular depth estimation poses a significant challenge in computer-aided endoscopic navigation. However, existing depth estimation methods that do not consider the geometric priors struggle to learn the absolute scale from training with monocular endoscopic sequences. Additionally, conventional methods face difficulties in accurately estimating details on tissue and instruments bound…

    Submitted 13 August, 2024; originally announced August 2024.

  14. arXiv:2408.06158

    cs.CV

    OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning

    Authors: Mushui Liu, Bozheng Li, Yunlong Yu

    Abstract: Recent Vision-Language Models (VLMs), e.g., CLIP, have made great progress in video recognition. Despite the improvement brought by the strong visual backbone in extracting spatial features, CLIP still falls short in capturing and integrating spatial-temporal features, which are essential for video recognition. In this paper, we propose OmniCLIP, a framework that adapts CLIP for video recognit…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: ECAI-2024

  15. arXiv:2408.05802

    cs.CV

    Egocentric Vision Language Planning

    Authors: Zhirui Fang, Ming Yang, Weishuai Zeng, Boyu Li, Junpeng Yue, Ziluo Ding, Xiu Li, Zongqing Lu

    Abstract: We explore leveraging large multi-modal models (LMMs) and text2image models to build a more general embodied agent. LMMs excel in planning long-horizon tasks over symbolic abstractions but struggle with grounding in the physical world, often failing to accurately identify object positions in images. A bridge is needed to connect LMMs to the physical world. The paper proposes a novel approach, egoc…

    Submitted 11 August, 2024; originally announced August 2024.

  16. arXiv:2408.05435

    quant-ph cs.LG

    SuperEncoder: Towards Universal Neural Approximate Quantum State Preparation

    Authors: Yilun Zhao, Bingmeng Wang, Wenle Jiang, Xiwei Pan, Bing Li, Yinhe Han, Ying Wang

    Abstract: Numerous quantum algorithms operate under the assumption that classical data has already been converted into quantum states, a process termed Quantum State Preparation (QSP). However, achieving precise QSP requires a circuit depth that scales exponentially with the number of qubits, making it a substantial obstacle in harnessing quantum advantage. Recent research suggests using a Parameterized Qua…

    Submitted 10 August, 2024; originally announced August 2024.

  17. arXiv:2408.05109

    cs.DB

    A Survey of NL2SQL with Large Language Models: Where are we, and where are we going?

    Authors: Xinyu Liu, Shuyu Shen, Boyan Li, Peixian Ma, Runzhi Jiang, Yuyu Luo, Yuxin Zhang, Ju Fan, Guoliang Li, Nan Tang

    Abstract: Translating users' natural language queries (NL) into SQL queries (i.e., NL2SQL) can significantly reduce barriers to accessing relational databases and support various commercial applications. The performance of NL2SQL has been greatly enhanced with the emergence of Large Language Models (LLMs). In this survey, we provide a comprehensive review of NL2SQL techniques powered by LLMs, covering its e…

    Submitted 9 August, 2024; originally announced August 2024.

  18. arXiv:2408.04812

    cs.ET cs.AI

    A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN

    Authors: Bojing Li, Duo Zhong, Xiang Chen, Chenchen Liu

    Abstract: Modern Artificial Intelligence (AI) applications are increasingly utilizing multi-tenant deep neural networks (DNNs), which lead to a significant rise in computing complexity and the need for computing parallelism. ReRAM-based processing-in-memory (PIM) computing, with its high density and low power consumption characteristics, holds promising potential for supporting the deployment of multi-tenan…

    Submitted 8 August, 2024; originally announced August 2024.

  19. arXiv:2408.04181

    cs.CR cs.AI

    EdgeShield: A Universal and Efficient Edge Computing Framework for Robust AI

    Authors: Duo Zhong, Bojing Li, Xiang Chen, Chenchen Liu

    Abstract: The increasing prevalence of adversarial attacks on Artificial Intelligence (AI) systems has created a need for innovative security measures. However, the current methods of defending against these attacks often come with a high computing cost and require back-end processing, making real-time defense challenging. Fortunately, there have been remarkable advancements in edge-computing, which make it…

    Submitted 7 August, 2024; originally announced August 2024.

  20. arXiv:2408.03544

    cs.CL cs.AI

    Unlocking the Non-Native Language Context Limitation: Native Language Prompting Facilitates Knowledge Elicitation

    Authors: Baixuan Li, Yunlong Fan, Zhiqiang Gao

    Abstract: Multilingual large language models (MLLMs) struggle to answer questions posed in non-dominant languages, even though they have acquired the relevant knowledge from their dominant language corpus. In contrast, human multilinguals can overcome such non-native language context limitations through Positive Native Language Transfer (PNLT). Inspired by the process of PNLT, we analogize the dominant lang…

    Submitted 16 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  21. arXiv:2408.03326

    cs.CV cs.AI cs.CL

    LLaVA-OneVision: Easy Visual Task Transfer

    Authors: Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li

    Abstract: We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-i…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Project Homepage: https://llava-vl.github.io/blog/2024-08-05-llava-onevision/

  22. arXiv:2408.02479

    cs.SE cs.AI cs.CL

    From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future

    Authors: Haolin Jin, Linghan Huang, Haipeng Cai, Jun Yan, Bo Li, Huaming Chen

    Abstract: With the rise of large language models (LLMs), researchers are increasingly exploring their applications in various vertical domains, such as software engineering. LLMs have achieved remarkable success in areas including code generation and vulnerability detection. However, they also exhibit numerous limitations and shortcomings. LLM-based agents, a novel technology with the potential for Artifi…

    Submitted 5 August, 2024; originally announced August 2024.

  23. arXiv:2408.02206

    cs.RO

    Large-scale Deployment of Vision-based Tactile Sensors on Multi-fingered Grippers

    Authors: Meng Wang, Wanlin Li, Hao Liang, Boren Li, Kaspar Althoefer, Yao Su, Hangxin Liu

    Abstract: Vision-based Tactile Sensors (VBTSs) show significant promise in that they can leverage image measurements to provide high-spatial-resolution human-like performance. However, current VBTS designs, typically confined to the fingertips of robotic grippers, prove somewhat inadequate, as many grasping and manipulation tasks require multiple contact points with the object. With an end goal of enabling…

    Submitted 4 August, 2024; originally announced August 2024.

    Journal ref: IROS 2024

  24. arXiv:2408.01827

    cs.CV cs.AI

    ST-SACLF: Style Transfer Informed Self-Attention Classifier for Bias-Aware Painting Classification

    Authors: Mridula Vijendran, Frederick W. B. Li, Jingjing Deng, Hubert P. H. Shum

    Abstract: Painting classification plays a vital role in organizing, finding, and suggesting artwork for digital and classic art galleries. Existing methods struggle with adapting knowledge from the real world to artistic images during training, leading to poor performance when dealing with different datasets. Our innovation lies in addressing these challenges through a two-step process. First, we generate m…

    Submitted 3 August, 2024; originally announced August 2024.

  25. arXiv:2408.01766

    cs.CV

    MultiFuser: Multimodal Fusion Transformer for Enhanced Driver Action Recognition

    Authors: Ruoyu Wang, Wenqian Wang, Jianjun Gao, Dan Lin, Kim-Hui Yap, Bingbing Li

    Abstract: Driver action recognition, aiming to accurately identify drivers' behaviours, is crucial for enhancing driver-vehicle interactions and ensuring driving safety. Unlike general action recognition, drivers' environments are often challenging, being gloomy and dark, and with the development of sensors, various cameras such as IR and depth cameras have emerged for analyzing drivers' behaviors. Therefor…

    Submitted 17 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

  26. arXiv:2408.01680

    cs.IT

    Service Placement and Trajectory Design for Heterogeneous Tasks in Multi-UAV Cooperative Computing Networks

    Authors: Bin Li, Rongrong Yang, Lei Liu, Celimuge Wu

    Abstract: In this paper, we consider deploying multiple Unmanned Aerial Vehicles (UAVs) to enhance the computation service of Mobile Edge Computing (MEC) through collaborative computation among UAVs. In particular, the tasks of different types and service requirements in MEC network are offloaded from one UAV to another. To pursue the goal of low-carbon edge computing, we study the problem of minimizing sys…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 11 pages, 10 figures

  27. arXiv:2408.01343

    cs.CV cs.AI cs.LG

    StitchFusion: Weaving Any Visual Modalities to Enhance Multimodal Semantic Segmentation

    Authors: Bingyu Li, Da Zhang, Zhiyuan Zhao, Junyu Gao, Xuelong Li

    Abstract: Multimodal semantic segmentation shows significant potential for enhancing segmentation accuracy in complex scenes. However, current methods often incorporate specialized feature fusion modules tailored to specific modalities, thereby restricting input flexibility and increasing the number of training parameters. To address these challenges, we propose StitchFusion, a straightforward yet effective…

    Submitted 2 August, 2024; originally announced August 2024.

  28. arXiv:2408.00761

    cs.LG cs.AI cs.CL

    Tamper-Resistant Safeguards for Open-Weight LLMs

    Authors: Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Andy Zhou, Alice Gatti, Tarun Suresh, Maxwell Lin, Justin Wang, Rowan Wang, Ron Arel, Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika

    Abstract: Rapid advances in the capabilities of large language models (LLMs) have raised widespread concerns regarding their potential for malicious use. Open-weight LLMs present unique challenges, as existing safeguards lack robustness to tampering attacks that modify model weights. For example, recent works have demonstrated that refusal and unlearning safeguards can be trivially removed with a few steps…

    Submitted 8 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: Website: https://www.tamper-resistant-safeguards.com

  29. arXiv:2408.00620

    cs.CV cs.CL

    Are Bigger Encoders Always Better in Vision Large Models?

    Authors: Bozhou Li, Hao Liang, Zimo Meng, Wentao Zhang

    Abstract: In recent years, multimodal large language models (MLLMs) have shown strong potential in real-world applications. They are developing rapidly due to their remarkable ability to comprehend multimodal information and their inherent powerful cognitive and reasoning capabilities. Among MLLMs, vision language models (VLM) stand out for their ability to understand vision information. However, the scalin…

    Submitted 1 August, 2024; originally announced August 2024.

  30. arXiv:2407.21611

    cs.SD cs.AI eess.AS

    Enhancing Partially Spoofed Audio Localization with Boundary-aware Attention Mechanism

    Authors: Jiafeng Zhong, Bin Li, Jiangyan Yi

    Abstract: The task of partially spoofed audio localization aims to accurately determine audio authenticity at a frame level. Although some works have achieved encouraging results, utilizing boundary information within a single model remains an unexplored research topic. In this work, we propose a novel method called Boundary-aware Attention Mechanism (BAM). Specifically, it consists of two core modules: Bou…

    Submitted 19 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by Interspeech 2024

  31. arXiv:2407.21596

    cs.CV

    Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2

    Authors: Lv Tang, Bo Li

    Abstract: The Segment Anything Model (SAM), introduced by Meta AI Research as a generic object segmentation model, quickly garnered widespread attention and significantly influenced the academic community. To extend its application to video, Meta further develops Segment Anything Model 2 (SAM2), a unified model capable of both video and image segmentation. SAM2 shows notable improvements over its predecesso…

    Submitted 31 July, 2024; originally announced July 2024.

  32. arXiv:2407.21325

    cs.AR

    EdgeLLM: A Highly Efficient CPU-FPGA Heterogeneous Edge Accelerator for Large Language Models

    Authors: Mingqiang Huang, Ao Shen, Kai Li, Haoxiang Peng, Boyu Li, Hao Yu

    Abstract: The rapid advancements in artificial intelligence (AI), particularly the Large Language Models (LLMs), have profoundly affected our daily work and communication forms. However, the colossal scale of LLM presents significant operational challenges, particularly when attempting to deploy them on resource-constrained edge devices such as smartphones, robots, and embedded systems. In this work, we pro…

    Submitted 31 July, 2024; originally announced July 2024.

  33. Matting by Generation

    Authors: Zhixiang Wang, Baiang Li, Jian Wang, Yu-Lun Liu, Jinwei Gu, Yung-Yu Chuang, Shin'ichi Satoh

    Abstract: This paper introduces an innovative approach for image matting that redefines the traditional regression-based task as a generative modeling challenge. Our method harnesses the capabilities of latent diffusion models, enriched with extensive pre-trained knowledge, to regularize the matting process. We present novel architectural innovations that empower our model to produce mattes with superior re…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: SIGGRAPH'24, Project page: https://lightchaserx.github.io/matting-by-generation/

  34. arXiv:2407.20761

    cs.AI

    OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance

    Authors: Yongqiang Yao, Jingru Tan, Jiahao Hu, Feizhao Zhang, Xin Jin, Bo Li, Ruihao Gong, Pengfei Liu

    Abstract: Recently, vision-language instruct-tuning models have made significant progress due to their more comprehensive understanding of the world. In this work, we discovered that large-scale 3D parallel training on those models leads to an imbalanced computation load across different devices. The vision and language parts are inherently heterogeneous: their data distribution and model architecture diffe…

    Submitted 30 July, 2024; originally announced July 2024.

  35. arXiv:2407.20640

    cs.LG

    Improved Bounds for Pure Private Agnostic Learning: Item-Level and User-Level Privacy

    Authors: Bo Li, Wei Wang, Peng Ye

    Abstract: Machine Learning has made remarkable progress in a wide range of fields. In many scenarios, learning is performed on datasets involving sensitive information, in which privacy protection is essential for learning algorithms. In this work, we study pure private learning in the agnostic model -- a framework reflecting the learning process in practice. We examine the number of users required under it…

    Submitted 30 July, 2024; originally announced July 2024.

  36. arXiv:2407.20462

    cs.IR cs.LG

    Graphite: A Graph-based Extreme Multi-Label Short Text Classifier for Keyphrase Recommendation

    Authors: Ashirbad Mishra, Soumik Dey, Jinyu Zhao, Marshall Wu, Binbin Li, Kamesh Madduri

    Abstract: Keyphrase Recommendation has been a pivotal problem in advertising and e-commerce where advertisers/sellers are recommended keyphrases (search queries) to bid on to increase their sales. It is a challenging task due to the plethora of items shown on online platforms and various possible queries that users search while showing varying interest in the displayed items. Moreover, query/keyphrase recom…

    Submitted 29 July, 2024; originally announced July 2024.

  37. arXiv:2407.19953

    cs.CV

    FedDEO: Description-Enhanced One-Shot Federated Learning with Diffusion Models

    Authors: Mingzhao Yang, Shangchao Su, Bin Li, Xiangyang Xue

    Abstract: In recent years, the attention towards One-Shot Federated Learning (OSFL) has been driven by its capacity to minimize communication. With the development of the diffusion model (DM), several methods employ the DM for OSFL, utilizing model parameters, image features, or textual prompts as mediums to transfer the local client knowledge to the server. However, these mediums often require public datas…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by MM 24

  38. arXiv:2407.19711

    cs.SE

    TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data

    Authors: Shuaiyu Xie, Jian Wang, Hanbin He, Zhihao Wang, Yuqi Zhao, Neng Zhang, Bing Li

    Abstract: Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale. With the rapid growth of observability techniques, various methods have been proposed to achieve failure diagnosis, including root cause localization and failure type identification, by leveraging diverse monitoring data such as logs, metrics, or traces. However, traditional fai…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 30 pages

  39. arXiv:2407.19279

    cs.RO

    Grasping Force Control and Adaptation for a Cable-Driven Robotic Hand

    Authors: Eric Mountain, Ean Weise, Sibo Tian, Beiwen Li, Xiao Liang, Minghui Zheng

    Abstract: This paper introduces a unique force control and adaptation algorithm for a lightweight and low-complexity five-fingered robotic hand, namely an Integrated-Finger Robotic Hand (IFRH). The force control and adaptation algorithm is intuitive to design, easy to implement, and improves the grasping functionality through feedforward adaptation automatically. Specifically, we have extended Youla-paramet…

    Submitted 27 July, 2024; originally announced July 2024.

  40. arXiv:2407.18908

    cs.LG cs.CL cs.CV

    Wolf: Captioning Everything with a World Summarization Framework

    Authors: Boyi Li, Ligeng Zhu, Ran Tian, Shuhan Tan, Yuxiao Chen, Yao Lu, Yin Cui, Sushant Veer, Max Ehrlich, Jonah Philion, Xinshuo Weng, Fuzhao Xue, Andrew Tao, Ming-Yu Liu, Sanja Fidler, Boris Ivanovic, Trevor Darrell, Jitendra Malik, Song Han, Marco Pavone

    Abstract: We propose Wolf, a WOrLd summarization Framework for accurate video captioning. Wolf is an automated captioning framework that adopts a mixture-of-experts approach, leveraging complementary strengths of Vision Language Models (VLMs). By utilizing both image and video models, our framework captures different levels of information and summarizes them efficiently. Our approach can be applied to enhan…

    Submitted 26 July, 2024; originally announced July 2024.

  41. arXiv:2407.18326

    cs.AR cs.AI

    Classification-Based Automatic HDL Code Generation Using LLMs

    Authors: Wenhao Sun, Bing Li, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Ulf Schlichtmann

    Abstract: While large language models (LLMs) have demonstrated the ability to generate hardware description language (HDL) code for digital circuits, they still suffer from the hallucination problem, which leads to the generation of incorrect HDL code or misunderstanding of specifications. In this work, we introduce a human-expert-inspired method to mitigate the hallucination of LLMs and improve the perform…

    Submitted 4 July, 2024; originally announced July 2024.

  42. arXiv:2407.17744

    cs.CV

    Balancing Complementarity and Consistency via Delayed Activation in Incomplete Multi-view Clustering

    Authors: Bo Li

    Abstract: This paper studies one challenging issue in incomplete multi-view clustering, where valuable complementary information from other views is always ignored. To be specific, we propose a framework that effectively balances Complementarity and Consistency information in Incomplete Multi-view Clustering (CoCo-IMC). Specifically, we design a dual network of delayed activation, which achieves a balance of…

    Submitted 24 July, 2024; originally announced July 2024.

  43. arXiv:2407.17476

    cs.CY cs.AI

    ORCDF: An Oversmoothing-Resistant Cognitive Diagnosis Framework for Student Learning in Online Education Systems

    Authors: Hong Qian, Shuo Liu, Mingjia Li, Bingdong Li, Zhi Liu, Aimin Zhou

    Abstract: Cognitive diagnosis models (CDMs) are designed to learn students' mastery levels using their response logs. CDMs play a fundamental role in online education systems since they significantly influence downstream applications such as teachers' guidance and computerized adaptive testing. Despite the success achieved by existing CDMs, we find that they suffer from a thorny issue that the learned stude…

    Submitted 28 June, 2024; originally announced July 2024.

    Journal ref: KDD 2024

  44. arXiv:2407.17436

    cs.CY cs.AI

    AIR-Bench 2024: A Safety Benchmark Based on Risk Categories from Regulations and Policies

    Authors: Yi Zeng, Yu Yang, Andy Zhou, Jeffrey Ziwei Tan, Yuheng Tu, Yifan Mai, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li

    Abstract: Foundation models (FMs) provide societal benefits but also amplify risks. Governments, companies, and researchers have proposed regulatory frameworks, acceptable use policies, and safety benchmarks in response. However, existing public benchmarks often define safety categories based on previous literature, intuitions, or common sense, leading to disjointed sets of categories for risks specified in…

    Submitted 5 August, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  45. arXiv:2407.17164  [pdf, other]

    cs.LG cs.AI

    Robust Deep Hawkes Process under Label Noise of Both Event and Occurrence

    Authors: Xiaoyu Tan, Bin Li, Xihe Qiu, Jingjing Huang, Yinghui Xu, Wei Chu

    Abstract: Integrating deep neural networks with the Hawkes process has significantly improved predictive capabilities in finance, health informatics, and information technology. Nevertheless, these models often face challenges in real-world settings, particularly due to substantial label noise. This issue is of significant concern in the medical field, where label noise can arise from delayed updates in ele…

    Submitted 29 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: ECAI 2024

  46. arXiv:2407.17060  [pdf, other]

    cs.CV cs.AI cs.CL eess.IV

    High Efficiency Image Compression for Large Visual-Language Models

    Authors: Binzhe Li, Shurun Wang, Shiqi Wang, Yan Ye

    Abstract: In recent years, large visual language models (LVLMs) have shown impressive performance and promising generalization capability in multi-modal tasks, thus replacing humans as receivers of visual information in various application scenarios. In this paper, we pioneer to propose a variable bitrate image compression framework consisting of a pre-editing module and an end-to-end codec to achieve promi…

    Submitted 24 July, 2024; originally announced July 2024.

  47. arXiv:2407.17020  [pdf, other]

    cs.CV

    EAFormer: Scene Text Segmentation with Edge-Aware Transformers

    Authors: Haiyang Yu, Teng Fu, Bin Li, Xiangyang Xue

    Abstract: Scene text segmentation aims at cropping texts from scene images, which is usually used to help generative models edit or remove texts. The existing text segmentation methods tend to involve various text-related supervisions for better performance. However, most of them ignore the importance of text edges, which are significant for downstream applications. In this paper, we propose Edge-Aware Tran…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  48. arXiv:2407.16741  [pdf, other]

    cs.SE cs.AI cs.CL

    OpenDevin: An Open Platform for AI Software Developers as Generalist Agents

    Authors: Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig

    Abstract: Software is one of the most powerful tools that we humans have at our disposal; it allows a skilled programmer to interact with the world in complex and profound ways. At the same time, thanks to improvements in large language models (LLMs), there has also been a rapid development in AI agents that interact with and affect change in their surrounding environments. In this paper, we introduce OpenD…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Code: https://github.com/OpenDevin/OpenDevin

  49. arXiv:2407.16308  [pdf, other]

    cs.CV eess.IV

    SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging

    Authors: Lingtong Kong, Bo Li, Yike Xiong, Hao Zhang, Hong Gu, Jinwei Chen

    Abstract: Multi-exposure High Dynamic Range (HDR) imaging is a challenging task when facing truncated texture and complex motion. Existing deep learning-based methods have achieved great success by either following the alignment and fusion pipeline or utilizing attention mechanism. However, the large computation cost and inference delay hinder them from deploying on resource limited devices. In this paper,…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  50. arXiv:2407.16291  [pdf, other]

    cs.CV cs.RO

    TAPTRv2: Attention-based Position Update Improves Tracking Any Point

    Authors: Hongyang Li, Hao Zhang, Shilong Liu, Zhaoyang Zeng, Feng Li, Tianhe Ren, Bohan Li, Lei Zhang

    Abstract: In this paper, we present TAPTRv2, a Transformer-based approach built upon TAPTR for solving the Tracking Any Point (TAP) task. TAPTR borrows designs from DEtection TRansformer (DETR) and formulates each tracking point as a point query, making it possible to leverage well-studied operations in DETR-like algorithms. TAPTRv2 improves TAPTR by addressing a critical issue regarding its reliance on cos…

    Submitted 23 July, 2024; originally announced July 2024.