Showing 1–50 of 454 results for author: Yu, F

Searching in archive cs.
  1. arXiv:2408.10774  [pdf, other]

    cs.AI cs.CL

    Flexora: Flexible Low Rank Adaptation for Large Language Models

    Authors: Chenxing Wei, Yao Shu, Ying Tiffany He, Fei Richard Yu

    Abstract: Large Language Models (LLMs) are driving advancements in artificial intelligence by increasing the scale of model parameters, which has significantly enhanced generalization ability and unlocked new capabilities in practice. However, their performance in specific downstream tasks is usually hindered by their knowledge boundaries on these tasks. Thus, fine-tuning techniques, especially the widely u…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 29 pages, 13 figures

  2. arXiv:2408.10642  [pdf, other]

    cs.AI cs.CL

    Minor SFT loss for LLM fine-tune to increase performance and reduce model deviation

    Authors: Shiming Xie, Hong Chen, Fred Yu, Zeye Sun, Xiuyu Wu

    Abstract: Instruct LLM provide a paradigm used in large scale language model to align LLM to human preference. The paradigm contains supervised fine tuning and reinforce learning from human feedback. This paradigm is also used in downstream scenarios to adapt LLM to specific corpora and applications. Comparing to SFT, there are many efforts focused on RLHF and several algorithms being proposed, such as PPO,…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures

  3. arXiv:2408.09834  [pdf, other]

    cs.AI

    Minor DPO reject penalty to increase training robustness

    Authors: Shiming Xie, Hong Chen, Fred Yu, Zeye Sun, Xiuyu Wu, Yingfan Hu

    Abstract: Learning from human preference is a paradigm used in large-scale language model (LLM) fine-tuning step to better align pretrained LLM to human preference for downstream task. In the past it uses reinforcement learning from human feedback (RLHF) algorithm to optimize the LLM policy to align with these preferences and not to draft too far from the original model. Recently, Direct Preference Optimiza…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 8 pages, 19 figures

  4. arXiv:2408.09765  [pdf, other]

    cs.LG cs.HC

    Baby Bear: Seeking a Just Right Rating Scale for Scalar Annotations

    Authors: Xu Han, Felix Yu, Joao Sedoc, Benjamin Van Durme

    Abstract: Our goal is a mechanism for efficiently assigning scalar ratings to each of a large set of elements. For example, "what percent positive or negative is this product review?" When sample sizes are small, prior work has advocated for methods such as Best Worst Scaling (BWS) as being more robust than direct ordinal annotation ("Likert scales"). Here we first introduce IBWS, which iteratively collects…

    Submitted 19 August, 2024; originally announced August 2024.

  5. arXiv:2408.08474  [pdf, other]

    hep-ex astro-ph.IM cs.LG

    Enhancing Events in Neutrino Telescopes through Deep Learning-Driven Super-Resolution

    Authors: Felix J. Yu, Nicholas Kamp, Carlos A. Argüelles

    Abstract: Recent discoveries by neutrino telescopes, such as the IceCube Neutrino Observatory, relied extensively on machine learning (ML) tools to infer physical quantities from the raw photon hits detected. Neutrino telescope reconstruction algorithms are limited by the sparse sampling of photons by the optical modules due to the relatively large spacing ($10$-$100\,{\rm m}$) between them. In this letter, w…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 5+1 pages, 4+1 figures

  6. arXiv:2408.07295  [pdf, other]

    cs.RO cs.AI

    Learning Multi-Modal Whole-Body Control for Real-World Humanoid Robots

    Authors: Pranay Dugar, Aayam Shrestha, Fangzhou Yu, Bart van Marum, Alan Fern

    Abstract: We introduce the Masked Humanoid Controller (MHC) for whole-body tracking of target trajectories over arbitrary subsets of humanoid state variables. This enables the realization of whole-body motions from diverse sources such as video, motion capture, and VR, while ensuring balance and robustness against disturbances. The MHC is trained in simulation using a carefully designed curriculum that imit…

    Submitted 30 July, 2024; originally announced August 2024.

    Comments: Website: https://masked-humanoid.github.io/mhc/

  7. arXiv:2408.04590  [pdf, other]

    cs.LG

    Learn To Learn More Precisely

    Authors: Runxi Cheng, Yongxian Wei, Xianglong He, Wanyun Zhu, Songsong Huang, Fei Richard Yu, Fei Ma, Chun Yuan

    Abstract: Meta-learning has been extensively applied in the domains of few-shot learning and fast adaptation, achieving remarkable performance. While Meta-learning methods like Model-Agnostic Meta-Learning (MAML) and its variants provide a good set of initial parameters for the model, the model still tends to learn shortcut features, which leads to poor generalization. In this paper, we propose the formal c…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 10 pages, 4 figures, meta learning

  8. arXiv:2407.21143  [pdf, ps, other]

    cs.GT

    Diffusion Mechanism Design in Tree-Structured Social Network

    Authors: Feiyang Yu

    Abstract: We design a fixed-price auction mechanism for a seller to sell multiple items in a tree-structured market. The buyers have independently drawn valuation from a uniform distribution, and the seller would like to incentivize buyers to invite more people to the auction. We prove that our mechanism is individual rational, and incentivize compatible with regard to the buyers' action. Furthermore, we sh…

    Submitted 30 July, 2024; originally announced July 2024.

  9. arXiv:2407.19789  [pdf, other]

    cs.CV

    Interpreting Low-level Vision Models with Causal Effect Maps

    Authors: Jinfan Hu, Jinjin Gu, Shiyao Yu, Fanghua Yu, Zheyuan Li, Zhiyuan You, Chaochao Lu, Chao Dong

    Abstract: Deep neural networks have significantly improved the performance of low-level vision tasks but also increased the difficulty of interpretability. A deep understanding of deep models is beneficial for both network design and practical reliability. To take up this challenge, we introduce causality theory to interpret low-level vision models and propose a model-/task-agnostic method called Causal Eff…

    Submitted 29 July, 2024; originally announced July 2024.

  10. arXiv:2407.18569  [pdf, other]

    cs.RO cs.AI cs.LG

    PP-TIL: Personalized Planning for Autonomous Driving with Instance-based Transfer Imitation Learning

    Authors: Fangze Lin, Ying He, Fei Yu

    Abstract: Personalized motion planning holds significant importance within urban automated driving, catering to the unique requirements of individual users. Nevertheless, prior endeavors have frequently encountered difficulties in simultaneously addressing two crucial aspects: personalized planning within intricate urban settings and enhancing planning performance through data utilization. The challenge ari…

    Submitted 4 August, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: IROS 2024 Accepted

  11. arXiv:2407.16984  [pdf, other]

    cs.LG cs.IR q-bio.GN

    scGHSOM: Hierarchical clustering and visualization of single-cell and CRISPR data using growing hierarchical SOM

    Authors: Shang-Jung Wen, Jia-Ming Chang, Fang Yu

    Abstract: High-dimensional single-cell data poses significant challenges in identifying underlying biological patterns due to the complexity and heterogeneity of cellular states. We propose a comprehensive gene-cell dependency visualization via unsupervised clustering, Growing Hierarchical Self-Organizing Map (GHSOM), specifically designed for analyzing high-dimensional single-cell data like single-cell seq…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Abstract presentation at BIOKDD@ACM KDD 2024

  12. arXiv:2407.06985  [pdf, other]

    cs.AI

    PEER: Expertizing Domain-Specific Tasks with a Multi-Agent Framework and Tuning Methods

    Authors: Yiying Wang, Xiaojing Li, Binzhu Wang, Yueyang Zhou, Han Ji, Hong Chen, Jinshi Zhang, Fei Yu, Zewei Zhao, Song Jin, Renji Gong, Wanqing Xu

    Abstract: In domain-specific applications, GPT-4, augmented with precise prompts or Retrieval-Augmented Generation (RAG), shows notable potential but faces the critical tri-lemma of performance, cost, and data privacy. High performance requires sophisticated processing techniques, yet managing multiple agents within a complex workflow often proves costly and challenging. To address this, we introduce the PE…

    Submitted 9 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

  13. arXiv:2407.06305  [pdf, other]

    cs.CV cs.GR

    SweepNet: Unsupervised Learning Shape Abstraction via Neural Sweepers

    Authors: Mingrui Zhao, Yizhi Wang, Fenggen Yu, Changqing Zou, Ali Mahdavi-Amiri

    Abstract: Shape abstraction is an important task for simplifying complex geometric structures while retaining essential features. Sweep surfaces, commonly found in human-made objects, aid in this process by effectively capturing and representing object geometry, thereby facilitating abstraction. In this paper, we introduce SweepNet, a novel approach to shape abstraction through sweep surfaces. We propose…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 14 pages, 20 figures, ECCV 2024

  14. arXiv:2407.05878  [pdf, other]

    cs.CV

    HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution

    Authors: Xiang Zhang, Yulun Zhang, Fisher Yu

    Abstract: Transformers have exhibited promising performance in computer vision tasks including image super-resolution (SR). However, popular transformer-based SR methods often employ window self-attention with quadratic computational complexity to window sizes, resulting in fixed small windows with limited receptive fields. In this paper, we present a general strategy to convert transformer-based SR network…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  15. arXiv:2407.04998  [pdf, other]

    cs.CV cs.CL cs.LG

    The Solution for the 5th GCAIAC Zero-shot Referring Expression Comprehension Challenge

    Authors: Longfei Huang, Feng Yu, Zhihao Guan, Zhonghua Wan, Yang Yang

    Abstract: This report presents a solution for the zero-shot referring expression comprehension task. Visual-language multimodal base models (such as CLIP, SAM) have gained significant attention in recent years as a cornerstone of mainstream research. One of the key applications of multimodal base models lies in their ability to generalize to zero-shot downstream tasks. Unlike traditional referring expressio…

    Submitted 6 July, 2024; originally announced July 2024.

  16. arXiv:2407.03640  [pdf, other]

    cs.LG cs.CL cs.CV

    Generative Technology for Human Emotion Recognition: A Scope Review

    Authors: Fei Ma, Yucheng Yuan, Yifan Xie, Hongwei Ren, Ivan Liu, Ying He, Fuji Ren, Fei Richard Yu, Shiguang Ni

    Abstract: Affective computing stands at the forefront of artificial intelligence (AI), seeking to imbue machines with the ability to comprehend and respond to human emotions. Central to this field is emotion recognition, which endeavors to identify and interpret human emotional states from different modalities, such as speech, facial images, text, and physiological signals. In recent years, important progre…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Under Review

  17. arXiv:2407.02277  [pdf, other]

    cs.SD eess.AS

    MelodyT5: A Unified Score-to-Score Transformer for Symbolic Music Processing

    Authors: Shangda Wu, Yashan Wang, Xiaobing Li, Feng Yu, Maosong Sun

    Abstract: In the domain of symbolic music research, the progress of developing scalable systems has been notably hindered by the scarcity of available training data and the demand for models tailored to specific tasks. To address these issues, we propose MelodyT5, a novel unified framework that leverages an encoder-decoder architecture tailored for symbolic music processing in ABC notation. This framework c…

    Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: 9 pages, 2 figures, 3 tables, accepted by ISMIR 2024

  18. arXiv:2407.01796  [pdf, other]

    cs.CL

    Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation

    Authors: Sirui Xia, Xintao Wang, Jiaqing Liang, Yifei Zhang, Weikang Zhou, Jiaji Deng, Fei Yu, Yanghua Xiao

    Abstract: Retrieval-Augmented Generation (RAG) has been widely adopted to enhance Large Language Models (LLMs) in knowledge-intensive tasks. Recently, Attributed Text Generation (ATG) has attracted growing attention, which provides citations to support the model's responses in RAG, so as to enhance the credibility of LLM-generated content and facilitate verification. Prior methods mainly adopt coarse-graine…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 15 pages, 2 figures

  19. arXiv:2407.01081  [pdf, other]

    cs.CV cs.CL

    CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation

    Authors: Yuxuan Wang, Yijun Liu, Fei Yu, Chen Huang, Kexin Li, Zhiguo Wan, Wanxiang Che

    Abstract: Despite the rapid development of Chinese vision-language models (VLMs), most existing Chinese vision-language (VL) datasets are constructed on Western-centric images from existing English VL datasets. The cultural bias in the images makes these datasets unsuitable for evaluating VLMs in Chinese culture. To remedy this issue, we present a new Chinese Vision-Language Understanding Evaluation (CVLUE…

    Submitted 1 July, 2024; originally announced July 2024.

  20. arXiv:2407.00141  [pdf, other]

    cs.LG cs.AI

    Towards Secure and Efficient Data Scheduling for Vehicular Social Networks

    Authors: Youhua Xia, Tiehua Zhang, Jiong Jin, Ying He, Fei Yu

    Abstract: Efficient data transmission scheduling within vehicular environments poses a significant challenge due to the high mobility of such networks. Contemporary research predominantly centers on crafting cooperative scheduling algorithms tailored for vehicular networks. Notwithstanding, the intricacies of orchestrating scheduling in vehicular social networks both effectively and efficiently remain formi…

    Submitted 28 June, 2024; originally announced July 2024.

  21. arXiv:2406.19781  [pdf, other]

    cs.RO

    LCSim: A Large-Scale Controllable Traffic Simulator

    Authors: Yuheng Zhang, Tianjian Ouyang, Fudan Yu, Cong Ma, Lei Qiao, Wei Wu, Jian Yuan, Yong Li

    Abstract: With the rapid development of urban transportation and the continuous advancement in autonomous vehicles, the demand for safely and efficiently testing autonomous driving and traffic optimization algorithms arises, which needs accurate modeling of large-scale urban traffic scenarios. Existing traffic simulation systems encounter two significant limitations. Firstly, they often rely on open-source…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Submitted to the 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and Benchmarks

  22. arXiv:2406.17968  [pdf, other]

    cs.IR cs.AI cs.LG stat.ML

    Efficient Document Ranking with Learnable Late Interactions

    Authors: Ziwei Ji, Himanshu Jain, Andreas Veit, Sashank J. Reddi, Sadeep Jayasumana, Ankit Singh Rawat, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

    Abstract: Cross-Encoder (CE) and Dual-Encoder (DE) models are two fundamental approaches for query-document relevance in information retrieval. To predict relevance, CE models use joint query-document embeddings, while DE models maintain factorized query and document embeddings; usually, the former has higher quality while the latter benefits from lower latency. Recently, late-interaction models have been p…

    Submitted 25 June, 2024; originally announced June 2024.

  23. arXiv:2406.17224  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.SC

    Large Language Models are Interpretable Learners

    Authors: Ruochen Wang, Si Si, Felix Yu, Dorothea Wiesmann, Cho-Jui Hsieh, Inderjit Dhillon

    Abstract: The trade-off between expressiveness and interpretability remains a core challenge when building human-centric predictive models for classification and decision-making. While symbolic rules offer interpretability, they often lack expressiveness, whereas neural networks excel in performance but are known for being black boxes. In this paper, we show a combination of Large Language Models (LLMs) and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Preliminary Version, Code at [this url](https://github.com/ruocwang/llm-symbolic-program)

    MSC Class: 68T05

  24. arXiv:2406.13362  [pdf, other]

    cs.CV cs.CL cs.LG

    VisualRWKV: Exploring Recurrent Neural Networks for Visual Language Models

    Authors: Haowen Hou, Peigen Zeng, Fei Ma, Fei Richard Yu

    Abstract: Visual Language Models (VLMs) have rapidly progressed with the recent success of large language models. However, there have been few attempts to incorporate efficient linear Recurrent Neural Networks (RNNs) architectures into VLMs. In this study, we introduce VisualRWKV, the first application of a linear RNN model to multimodal learning tasks, leveraging the pre-trained RWKV language model. We pro…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 18 pages, 14 tables, 6 figures

  25. arXiv:2406.11389  [pdf, other]

    cs.LG

    SEFraud: Graph-based Self-Explainable Fraud Detection via Interpretative Mask Learning

    Authors: Kaidi Li, Tianmeng Yang, Min Zhou, Jiahao Meng, Shendi Wang, Yihui Wu, Boshuai Tan, Hu Song, Lujia Pan, Fan Yu, Zhenli Sheng, Yunhai Tong

    Abstract: Graph-based fraud detection has widespread application in modern industry scenarios, such as spam review and malicious account detection. While considerable efforts have been devoted to designing adequate fraud detectors, the interpretability of their results has often been overlooked. Previous works have attempted to generate explanations for specific instances using post-hoc explaining methods s…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024

  26. arXiv:2406.06799  [pdf, other]

    cs.DC cs.CL

    LLM-dCache: Improving Tool-Augmented LLMs with GPT-Driven Localized Data Caching

    Authors: Simranjit Singh, Michael Fore, Andreas Karatzas, Chaehong Lee, Yanan Jian, Longfei Shangguan, Fuxun Yu, Iraklis Anagnostopoulos, Dimitrios Stamoulis

    Abstract: As Large Language Models (LLMs) broaden their capabilities to manage thousands of API calls, they are confronted with complex data operations across vast datasets with significant overhead to the underlying system. In this work, we introduce LLM-dCache to optimize data accesses by treating cache operations as callable API functions exposed to the tool-augmented agent. We grant LLMs the autonomy to…

    Submitted 10 June, 2024; originally announced June 2024.

  27. arXiv:2406.05839  [pdf, other]

    eess.AS cs.AI

    MaLa-ASR: Multimedia-Assisted LLM-Based ASR

    Authors: Guanrou Yang, Ziyang Ma, Fan Yu, Zhifu Gao, Shiliang Zhang, Xie Chen

    Abstract: As more and more information-rich data like video become available, utilizing multi-modal auxiliary information to enhance audio tasks has sparked widespread research interest. The recent surge in research on LLM-based audio models provides fresh perspectives for tackling audio tasks. Given that LLM can flexibly ingest multiple inputs, we propose MaLa-ASR, an LLM-based ASR model that can integrate…

    Submitted 13 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  28. arXiv:2406.05673  [pdf, other]

    cs.AI cs.CL

    Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking

    Authors: Fangxu Yu, Lai Jiang, Haoqiang Kang, Shibo Hao, Lianhui Qin

    Abstract: Divergent thinking, the cognitive process of generating diverse solutions, is a hallmark of human creativity and problem-solving. For machines, sampling diverse solution trajectories in complex reasoning problems is crucial for robust outcomes, data augmentation, and enhanced model generalization. Large language models (LLMs) often struggle with generating high-quality, diverse reasoning. While su…

    Submitted 24 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  29. arXiv:2406.04221  [pdf, other]

    cs.CV

    Matching Anything by Segmenting Anything

    Authors: Siyuan Li, Lei Ke, Martin Danelljan, Luigi Piccinelli, Mattia Segu, Luc Van Gool, Fisher Yu

    Abstract: The robust association of the same objects across video frames in complex scenes is crucial for many applications, especially Multiple Object Tracking (MOT). Current methods predominantly rely on labeled domain-specific video datasets, which limits the cross-domain generalization of learned similarity embeddings. We propose MASA, a novel method for robust instance association learning, capable of…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 Highlight. code at: https://github.com/siyuanliii/masa

  30. arXiv:2406.02495  [pdf, other]

    cs.CV

    GenS: Generalizable Neural Surface Reconstruction from Multi-View Images

    Authors: Rui Peng, Xiaodong Gu, Luyang Tang, Shihe Shen, Fanqi Yu, Ronggang Wang

    Abstract: Combining the signed distance function (SDF) and differentiable volume rendering has emerged as a powerful paradigm for surface reconstruction from multi-view images without 3D supervision. However, current methods are impeded by requiring long-time per-scene optimizations and cannot generalize to new scenes. In this paper, we present GenS, an end-to-end generalizable neural surface reconstruction…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2023 Accepted

  31. arXiv:2406.01219  [pdf, other]

    cs.CR cs.SE

    Constraint-based Adversarial Example Synthesis

    Authors: Fang Yu, Ya-Yu Chi, Yu-Fang Chen

    Abstract: In the era of rapid advancements in artificial intelligence (AI), neural network models have achieved notable breakthroughs. However, concerns arise regarding their vulnerability to adversarial attacks. This study focuses on enhancing Concolic Testing, a specialized technique for testing Python programs implementing neural networks. The extended tool, PyCT, now accommodates a broader range of neur…

    Submitted 3 June, 2024; originally announced June 2024.

  32. arXiv:2405.13972  [pdf, other]

    cs.LG

    Infinite-Dimensional Feature Interaction

    Authors: Chenhui Xu, Fuxun Yu, Maoliang Li, Zihao Zheng, Zirui Xu, Jinjun Xiong, Xiang Chen

    Abstract: The past neural network design has largely focused on feature representation space dimension and its capacity scaling (e.g., width, depth), but overlooked the feature interaction space scaling. Recent advancements have shown shifted focus towards element-wise multiplication to facilitate higher-dimensional feature interaction space for better information transformation. Despite this progress, mu…

    Submitted 9 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  33. Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture

    Authors: Yitong Jin, Zhiping Qiu, Yi Shi, Shuangpeng Sun, Chongwu Wang, Donghao Pan, Jiachen Zhao, Zhenghao Liang, Yuan Wang, Xiaobing Li, Feng Yu, Tao Yu, Qionghai Dai

    Abstract: In this paper, we touch on the problem of markerless multi-modal human motion capture especially for string performance capture which involves inherently subtle hand-string contacts and intricate movements. To fulfill this goal, we first collect a dataset, named String Performance Dataset (SPD), featuring cello and violin performances. The dataset includes videos captured from up to 23 different v…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: SIGGRAPH 2024

  34. arXiv:2405.03192  [pdf, other]

    cs.LG cs.AI

    QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation

    Authors: Chenhui Xu, Xinyao Wang, Fuxun Yu, Jinjun Xiong, Xiang Chen

    Abstract: Machine learning is evolving towards high-order models that necessitate pre-training on extensive datasets, a process associated with significant overheads. Traditional models, despite having pre-trained weights, are becoming obsolete due to architectural differences that obstruct the effective transfer and initialization of these weights. To address these challenges, we introduce a novel framewor…

    Submitted 8 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

  35. arXiv:2404.18532  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    MileBench: Benchmarking MLLMs in Long Context

    Authors: Dingjie Song, Shunian Chen, Guiming Hardy Chen, Fei Yu, Xiang Wan, Benyou Wang

    Abstract: Despite the advancements and impressive performance of Multimodal Large Language Models (MLLMs) on benchmarks, their effectiveness in real-world, long-context, and multi-image tasks is unclear due to the benchmarks' limited scope. Existing benchmarks often focus on single-image and short-text samples, and when assessing multi-image tasks, they either limit the image count or focus on specific task…

    Submitted 15 May, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

    Comments: 31 pages, 13 figures, 14 tables; We add results of GPT-4o in this version

  36. arXiv:2404.12611  [pdf, other]

    cs.CV

    Rethinking Clothes Changing Person ReID: Conflicts, Synthesis, and Optimization

    Authors: Junjie Li, Guanshuo Wang, Fufu Yu, Yichao Yan, Qiong Jia, Shouhong Ding, Xingdong Sheng, Yunhui Liu, Xiaokang Yang

    Abstract: Clothes-changing person re-identification (CC-ReID) aims to retrieve images of the same person wearing different outfits. Mainstream researches focus on designing advanced model structures and strategies to capture identity information independent of clothing. However, the same-clothes discrimination as the standard ReID learning objective in CC-ReID is persistently ignored in previous researches.…

    Submitted 18 April, 2024; originally announced April 2024.

  37. arXiv:2404.11590  [pdf, other]

    cs.CV

    A Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion

    Authors: Feng Yu, Teng Zhang, Gilad Lerman

    Abstract: We present the subspace-constrained Tyler's estimator (STE) designed for recovering a low-dimensional subspace within a dataset that may be highly corrupted with outliers. STE is a fusion of the Tyler's M-estimator (TME) and a variant of the fast median subspace. Our theoretical analysis suggests that, under a common inlier-outlier model, STE can effectively recover the underlying subspace, even w…

    Submitted 7 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

    Comments: 23 pages, accepted by CVPR 24

  38. arXiv:2404.10004  [pdf]

    cs.LG physics.soc-ph stat.AP

    A Strategy Transfer and Decision Support Approach for Epidemic Control in Experience Shortage Scenarios

    Authors: X. Xiao, P. Chen, X. Cao, K. Liu, L. Deng, D. Zhao, Z. Chen, Q. Deng, F. Yu, H. Zhang

    Abstract: Epidemic outbreaks can cause critical health concerns and severe global economic crises. For countries or regions with new infectious disease outbreaks, it is essential to generate preventive strategies by learning lessons from others with similar risk profiles. A Strategy Transfer and Decision Support Approach (STDSA) is proposed based on the profile similarity evaluation. There are four steps in…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 20 pages, 9 figures

  39. arXiv:2404.08406  [pdf, other]

    cs.CV

    MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion

    Authors: Zhe Li, Haiwei Pan, Kejia Zhang, Yuhua Wang, Fengming Yu

    Abstract: Multi-modality image fusion (MMIF) aims to integrate complementary information from different modalities into a single fused image to represent the imaging scene and facilitate downstream visual tasks comprehensively. In recent years, significant progress has been made in MMIF tasks due to advances in deep neural networks. However, existing methods cannot effectively and efficiently extract modali…

    Submitted 12 April, 2024; originally announced April 2024.

  40. arXiv:2404.00875  [pdf, other]

    cs.CV

    DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly

    Authors: Fenggen Yu, Yiming Qian, Xu Zhang, Francisca Gil-Ureta, Brian Jackson, Eric Bennett, Hao Zhang

    Abstract: We present a differentiable rendering framework to learn structured 3D abstractions in the form of primitive assemblies from sparse RGB images capturing a 3D object. By leveraging differentiable volume rendering, our method does not require 3D supervision. Architecturally, our network follows the general pipeline of an image-conditioned neural radiance field (NeRF) exemplified by pixelNeRF for col…

    Submitted 6 August, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

    Comments: 14 pages, accepted to ECCV 2024

  41. arXiv:2403.20289  [pdf, other]

    cs.CL cs.SD eess.AS

    Emotion-Anchored Contrastive Learning Framework for Emotion Recognition in Conversation

    Authors: Fangxu Yu, Junjie Guo, Zhen Wu, Xinyu Dai

    Abstract: Emotion Recognition in Conversation (ERC) involves detecting the underlying emotion behind each utterance within a conversation. Effectively generating representations for utterances remains a significant challenge in this task. Recent works propose various models to address this issue, but they still struggle with differentiating similar emotions such as excitement and happiness. To alleviate thi…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted by Findings of NAACL 2024

  42. arXiv:2403.18913  [pdf, other]

    cs.CV

    UniDepth: Universal Monocular Metric Depth Estimation

    Authors: Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis, Mattia Segu, Siyuan Li, Luc Van Gool, Fisher Yu

    Abstract: Accurate monocular metric depth estimation (MMDE) is crucial to solving downstream tasks in 3D perception and modeling. However, the remarkable accuracy of recent MMDE methods is confined to their training domains. These methods fail to generalize to unseen domains even in the presence of moderate domain gaps, which hinders their practical applicability. We propose a new model, UniDepth, capable o…

    Submitted 27 March, 2024; originally announced March 2024.

  43. arXiv:2403.16398  [pdf, other]

    cs.LG cs.AI

    Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data

    Authors: Xinting Liao, Weiming Liu, Chaochao Chen, Pengyang Zhou, Fengyuan Yu, Huabin Zhu, Binhui Yao, Tao Wang, Xiaolin Zheng, Yanchao Tan

    Abstract: Federated learning achieves effective performance in modeling decentralized data. In practice, client data are not well-labeled, which makes it potential for federated unsupervised learning (FUSL) with non-IID data. However, the performance of existing FUSL methods suffers from insufficient representations, i.e., (1) representation collapse entanglement among local and global models, and (2) incon…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  44. arXiv:2403.16260  [pdf, other

    cs.LG cs.AI cs.CV stat.ML

    Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble

    Authors: Chenhui Xu, Fuxun Yu, Zirui Xu, Nathan Inkawhich, Xiang Chen

    Abstract: Recent research underscores the pivotal role of the Out-of-Distribution (OOD) feature representation field scale in determining the efficacy of models in OOD detection. Consequently, the adoption of model ensembles has emerged as a prominent strategy to augment this feature representation field, capitalizing on anticipated model diversity. However, our introduction of novel qualitative and quant…

    Submitted 15 August, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: ICML 2024

  45. arXiv:2403.14300  [pdf, other

    cs.RO cs.AI

    DexDribbler: Learning Dexterous Soccer Manipulation via Dynamic Supervision

    Authors: Yutong Hu, Kehan Wen, Fisher Yu

    Abstract: Learning dexterous locomotion policy for legged robots is becoming increasingly popular due to its ability to handle diverse terrains and resemble intelligent behaviors. However, joint manipulation of moving objects and locomotion with legs, such as playing soccer, receives scant attention in the learning community, although it is natural for humans and smart animals. A key challenge to solve this…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: 8 pages, 7 figures, submitted to IROS 2024

  46. Java JIT Testing with Template Extraction

    Authors: Zhiqiang Zang, Fu-Yao Yu, Aditya Thimmaiah, August Shi, Milos Gligoric

    Abstract: We present LeJit, a template-based framework for testing Java just-in-time (JIT) compilers. Like recent template-based frameworks, LeJit executes a template -- a program with holes to be filled -- to generate concrete programs given as inputs to Java JIT compilers. LeJit automatically generates template programs from existing Java code by converting expressions to holes, as well as generating nece…

    Submitted 7 July, 2024; v1 submitted 17 March, 2024; originally announced March 2024.

    Comments: 23 pages, 6 figures, 8 tables, accepted in FSE 2024 (Research Papers track)

  47. arXiv:2403.09671  [pdf, other

    cs.DC cs.AI

    CoRaiS: Lightweight Real-Time Scheduler for Multi-Edge Cooperative Computing

    Authors: Yujiao Hu, Qingmin Jia, Jinchao Chen, Yuan Yao, Yan Pan, Renchao Xie, F. Richard Yu

    Abstract: Multi-edge cooperative computing that combines constrained resources of multiple edges into a powerful resource pool has the potential to deliver great benefits, such as tremendous computing power, improved response time, and more diversified services. However, the mass heterogeneous resources composition and lack of scheduling strategies make the modeling and cooperating of multi-edge computing sys…

    Submitted 20 May, 2024; v1 submitted 4 February, 2024; originally announced March 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  48. arXiv:2403.04182  [pdf, other

    cs.CL cs.AI

    Metric-aware LLM inference for regression and scoring

    Authors: Michal Lukasik, Harikrishna Narasimhan, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

    Abstract: Large language models (LLMs) have demonstrated strong results on a range of NLP tasks. Typically, outputs are obtained via autoregressive sampling from the LLM's underlying distribution. Building on prior work on Minimum Bayes Risk Decoding, we show that this inference strategy can be suboptimal for a range of regression and scoring tasks, and associated evaluation metrics. As a remedy, we propose…

    Submitted 4 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: 15 pages

  49. arXiv:2402.16870  [pdf, other

    cs.NI

    Pioneering Deterministic Scheduling and Network Structure Optimization for Time-Critical Computing Tasks in Industrial IoT

    Authors: Yujiao Hu, Yining Zhu, Huayu Zhang, Yan Pan, Qingmin Jia, Renchao Xie, Gang Yang, F. Richard Yu

    Abstract: The Industrial Internet of Things (IIoT) has become a critical technology to accelerate the process of digital and intelligent transformation of industries. As the cooperative relationship between smart devices in IIoT becomes more complex, getting deterministic responses of IIoT periodic time-critical computing tasks becomes a crucial and nontrivial problem. However, few current works in cloud/ed…

    Submitted 23 January, 2024; originally announced February 2024.

    Comments: Under Review

  50. arXiv:2402.12886  [pdf, other

    cs.GR

    Real-time High-resolution View Synthesis of Complex Scenes with Explicit 3D Visibility Reasoning

    Authors: Tiansong Zhou, Yebin Liu, Xuangeng Chu, Chengkun Cao, Changyin Zhou, Fei Yu, Yu Li

    Abstract: Rendering photo-realistic novel-view images of complex scenes has been a long-standing challenge in computer graphics. In recent years, great research progress has been made on enhancing rendering quality and accelerating rendering speed in the realm of view synthesis. However, when rendering complex dynamic scenes with sparse views, the rendering quality remains limited due to occlusion problems.…

    Submitted 20 February, 2024; originally announced February 2024.