Skip to main content

Showing 1–50 of 1,136 results for author: Shen, Y

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.10943  [pdf, other

    cs.CL

    SysBench: Can Large Language Models Follow System Messages?

    Authors: Yanzhao Qin, Tao Zhang, Tao Zhang, Yanjun Shen, Wenjing Luo, Haoze Sun, Yan Zhang, Yujing Qiao, Weipeng Chen, Zenan Zhou, Wentao Zhang, Bin Cui

    Abstract: Large Language Models (LLMs) have become instrumental across various applications, with the customization of these models to specific scenarios becoming increasingly critical. System message, a fundamental component of LLMs, is consist of carefully crafted instructions that guide the behavior of model to meet intended goals. Despite the recognized potential of system messages to optimize AI-driven… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.09698  [pdf, other

    cs.IR cs.AI

    Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation

    Authors: Yuyang Ye, Zhi Zheng, Yishan Shen, Tianshu Wang, Hengruo Zhang, Peijun Zhu, Runlong Yu, Kai Zhang, Hui Xiong

    Abstract: Recent advances in Large Language Models (LLMs) have demonstrated significant potential in the field of Recommendation Systems (RSs). Most existing studies have focused on converting user behavior logs into textual prompts and leveraging techniques such as prompt tuning to enable LLMs for recommendation tasks. Meanwhile, research interest has recently grown in multimodal recommendation systems tha… ▽ More

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  3. arXiv:2408.09632  [pdf, other

    cs.LG cs.CL stat.ML

    MoDeGPT: Modular Decomposition for Large Language Model Compression

    Authors: Chi-Heng Lin, Shangqian Gao, James Seale Smith, Abhishek Patel, Shikhar Tuli, Yilin Shen, Hongxia Jin, Yen-Chang Hsu

    Abstract: Large Language Models (LLMs) have reshaped the landscape of artificial intelligence by demonstrating exceptional performance across various tasks. However, substantial computational requirements make their deployment challenging on devices with limited resources. Recently, compression methods using low-rank matrix techniques have shown promise, yet these often lead to degraded accuracy or introduc… ▽ More

    Submitted 20 August, 2024; v1 submitted 18 August, 2024; originally announced August 2024.

    Comments: 31 pages, 9 figures

    MSC Class: 15A23 (Primary) ACM Class: I.2.7

  4. arXiv:2408.09529  [pdf, other

    cs.CL cs.AI

    Revisiting the Graph Reasoning Ability of Large Language Models: Case Studies in Translation, Connectivity and Shortest Path

    Authors: Xinnan Dai, Qihao Wen, Yifei Shen, Hongzhi Wen, Dongsheng Li, Jiliang Tang, Caihua Shan

    Abstract: Large Language Models (LLMs) have achieved great success in various reasoning tasks. In this work, we focus on the graph reasoning ability of LLMs. Although theoretical studies proved that LLMs are capable of handling graph reasoning tasks, empirical evaluations reveal numerous failures. To deepen our understanding on this discrepancy, we revisit the ability of LLMs on three fundamental graph task… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  5. arXiv:2408.09431  [pdf, other

    cs.CV

    Adversarial Attacked Teacher for Unsupervised Domain Adaptive Object Detection

    Authors: Kaiwen Wang, Yinzhe Shen, Martin Lauer

    Abstract: Object detectors encounter challenges in handling domain shifts. Cutting-edge domain adaptive object detection methods use the teacher-student framework and domain adversarial learning to generate domain-invariant pseudo-labels for self-training. However, the pseudo-labels generated by the teacher model tend to be biased towards the majority class and often mistakenly include overconfident false p… ▽ More

    Submitted 18 August, 2024; originally announced August 2024.

  6. arXiv:2408.06631  [pdf, other

    cs.CL

    IFShip: A Large Vision-Language Model for Interpretable Fine-grained Ship Classification via Domain Knowledge-Enhanced Instruction Tuning

    Authors: Mingning Guo, Mengwei Wu, Yuxiang Shen, Haifeng Li, Chao Tao

    Abstract: End-to-end interpretation is currently the prevailing paradigm for remote sensing fine-grained ship classification (RS-FGSC) task. However, its inference process is uninterpretable, leading to criticism as a black box model. To address this issue, we propose a large vision-language model (LVLM) named IFShip for interpretable fine-grained ship classification. Unlike traditional methods, IFShip exce… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  7. arXiv:2408.06277  [pdf, other

    stat.ML cs.LG stat.ME

    Multi-marginal Schrödinger Bridges with Iterative Reference Refinement

    Authors: Yunyi Shen, Renato Berlinghieri, Tamara Broderick

    Abstract: Practitioners frequently aim to infer an unobserved population trajectory using sample snapshots at multiple time points. For instance, in single-cell sequencing, scientists would like to learn how gene expression evolves over time. But sequencing any cell destroys that cell. So we cannot access any cell's full trajectory, but we can access snapshot samples from many cells. Stochastic differential… ▽ More

    Submitted 16 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: Updated to fix title error

  8. arXiv:2408.05788  [pdf, other

    cs.LG cs.AI stat.ML

    Continual Learning of Nonlinear Independent Representations

    Authors: Boyang Sun, Ignavier Ng, Guangyi Chen, Yifan Shen, Qirong Ho, Kun Zhang

    Abstract: Identifying the causal relations between interested variables plays a pivotal role in representation learning as it provides deep insights into the dataset. Identifiability, as the central theme of this approach, normally hinges on leveraging data from multiple distributions (intervention, distribution shift, time series, etc.). Despite the exciting development in this field, a practical but often… ▽ More

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 9 pages, 5 Figures

  9. arXiv:2408.05211  [pdf, other

    cs.CV cs.AI cs.CL

    VITA: Towards Open-Source Interactive Omni Multimodal LLM

    Authors: Chaoyou Fu, Haojia Lin, Zuwei Long, Yunhang Shen, Meng Zhao, Yifan Zhang, Xiong Wang, Di Yin, Long Ma, Xiawu Zheng, Ran He, Rongrong Ji, Yunsheng Wu, Caifeng Shan, Xing Sun

    Abstract: The remarkable multimodal capabilities and interactive experience of GPT-4o underscore their necessity in practical applications, yet open-source models rarely excel in both areas. In this paper, we introduce VITA, the first-ever open-source Multimodal Large Language Model (MLLM) adept at simultaneous processing and analysis of Video, Image, Text, and Audio modalities, and meanwhile has an advance… ▽ More

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Project Page: https://vita-home.github.io

  10. arXiv:2408.04939  [pdf, other

    cs.CR cs.SE

    Demystifying and Detecting Cryptographic Defects in Ethereum Smart Contracts

    Authors: Jiashuo Zhang, Yiming Shen, Jiachi Chen, Jianzhong Su, Yanlin Wang, Ting Chen, Jianbo Gao, Zhong Chen

    Abstract: Ethereum has officially provided a set of system-level cryptographic APIs to enhance smart contracts with cryptographic capabilities. These APIs have been utilized in over 10% of Ethereum transactions, motivating developers to implement various on-chain cryptographic tasks, such as digital signatures. However, since developers may not always be cryptographic experts, their ad-hoc and potentially d… ▽ More

    Submitted 9 August, 2024; originally announced August 2024.

  11. arXiv:2408.04594  [pdf, other

    cs.CV cs.AI

    Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

    Authors: Qirui Jiao, Daoyuan Chen, Yilun Huang, Yaliang Li, Ying Shen

    Abstract: High-performance Multimodal Large Language Models (MLLMs) rely heavily on data quality. This study introduces a novel dataset named Img-Diff, designed to enhance fine-grained image recognition in MLLMs by leveraging insights from contrastive learning and image difference captioning. By analyzing object differences between similar images, we challenge models to identify both matching and distinct c… ▽ More

    Submitted 9 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 14 pages, 9 figures, 7 tables

  12. arXiv:2408.04334  [pdf, other

    cs.AR

    A Node-Based Polar List Decoder with Frame Interleaving and Ensemble Decoding Support

    Authors: Yuqing Ren, Leyu Zhang, Ludovic Damien Blanc, Yifei Shen, Xinwei Li, Alexios Balatsoukas-Stimming, Chuan Zhang, Andreas Burg

    Abstract: Node-based successive cancellation list (SCL) decoding has received considerable attention in wireless communications for its significant reduction in decoding latency, particularly with 5G New Radio (NR) polar codes. However, the existing node-based SCL decoders are constrained by sequential processing, leading to complicated and data-dependent computational units that introduce unavoidable stall… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 13 pages, 16 figures, accepted by IEEE Transactions on Circuits and Systems I: Regular Papers

  13. arXiv:2408.03013  [pdf, other

    cs.DB cs.AI cs.LG

    NeurDB: On the Design and Implementation of an AI-powered Autonomous Database

    Authors: Zhanhao Zhao, Shaofeng Cai, Haotian Gao, Hexiang Pan, Siqi Xiang, Naili Xing, Gang Chen, Beng Chin Ooi, Yanyan Shen, Yuncheng Wu, Meihui Zhang

    Abstract: Databases are increasingly embracing AI to provide autonomous system optimization and intelligent in-database analytics, aiming to relieve end-user burdens across various industry sectors. Nonetheless, most existing approaches fail to account for the dynamic nature of databases, which renders them ineffective for real-world applications characterized by evolving data and workloads. This paper intr… ▽ More

    Submitted 6 August, 2024; originally announced August 2024.

  14. arXiv:2408.02047  [pdf, other

    eess.SY cs.AI

    Latency-Aware Resource Allocation for Mobile Edge Generation and Computing via Deep Reinforcement Learning

    Authors: Yinyu Wu, Xuhui Zhang, Jinke Ren, Huijun Xing, Yanyan Shen, Shuguang Cui

    Abstract: Recently, the integration of mobile edge computing (MEC) and generative artificial intelligence (GAI) technology has given rise to a new area called mobile edge generation and computing (MEGC), which offers mobile users heterogeneous services such as task computing and content generation. In this letter, we investigate the joint communication, computation, and the AIGC resource allocation problem… ▽ More

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: 5 pages, 5 figures, submitted to IEEE

  15. arXiv:2408.01723  [pdf, other

    cs.CV cs.IR

    A Novel Evaluation Framework for Image2Text Generation

    Authors: Jia-Hong Huang, Hongyi Zhu, Yixian Shen, Stevan Rudinac, Alessio M. Pacces, Evangelos Kanoulas

    Abstract: Evaluating the quality of automatically generated image descriptions is challenging, requiring metrics that capture various aspects such as grammaticality, coverage, correctness, and truthfulness. While human evaluation offers valuable insights, its cost and time-consuming nature pose limitations. Existing automated metrics like BLEU, ROUGE, METEOR, and CIDEr aim to bridge this gap but often show… ▽ More

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted for presentation at the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, specifically in the Large Language Model for Evaluation in IR (LLM4Eval) Workshop in 2024

  16. arXiv:2408.01122  [pdf, other

    cs.CL

    CFBench: A Comprehensive Constraints-Following Benchmark for LLMs

    Authors: Tao Zhang, Yanjun Shen, Wenjing Luo, Yan Zhang, Hao Liang, Tao Zhang, Fan Yang, Mingan Lin, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: The adeptness of Large Language Models (LLMs) in comprehending and following natural language instructions is critical for their deployment in sophisticated real-world applications. Existing evaluations mainly focus on fragmented constraints or narrow scenarios, but they overlook the comprehensiveness and authenticity of constraints from the user's perspective. To bridge this gap, we propose CFBen… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 15 pages, 10 figures

  17. arXiv:2408.00665  [pdf, other

    cs.LG

    AutoM3L: An Automated Multimodal Machine Learning Framework with Large Language Models

    Authors: Daqin Luo, Chengjian Feng, Yuxuan Nong, Yiqing Shen

    Abstract: Automated Machine Learning (AutoML) offers a promising approach to streamline the training of machine learning models. However, existing AutoML frameworks are often limited to unimodal scenarios and require extensive manual configuration. Recent advancements in Large Language Models (LLMs) have showcased their exceptional abilities in reasoning, interaction, and code generation, presenting an oppo… ▽ More

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accpeted by ACMMM2024

  18. arXiv:2408.00203  [pdf, other

    cs.CV cs.AI cs.CL cs.LG

    OmniParser for Pure Vision Based GUI Agent

    Authors: Yadong Lu, Jianwei Yang, Yelong Shen, Ahmed Awadallah

    Abstract: The recent success of large vision language models shows great potential in driving the agent system operating on user interfaces. However, we argue that the power multimodal models like GPT-4V as a general agent on multiple operating systems across different applications is largely underestimated due to the lack of a robust screen parsing technique capable of: 1) reliably identifying interactable… ▽ More

    Submitted 31 July, 2024; originally announced August 2024.

  19. arXiv:2407.21693  [pdf, other

    cs.AI

    TransferTOD: A Generalizable Chinese Multi-Domain Task-Oriented Dialogue System with Transfer Capabilities

    Authors: Ming Zhang, Caishuang Huang, Yilong Wu, Shichun Liu, Huiyuan Zheng, Yurui Dong, Yujiong Shen, Shihan Dou, Jun Zhao, Junjie Ye, Qi Zhang, Tao Gui, Xuanjing Huang

    Abstract: Task-oriented dialogue (TOD) systems aim to efficiently handle task-oriented conversations, including information collection. How to utilize TOD accurately, efficiently and effectively for information collection has always been a critical and challenging task. Recent studies have demonstrated that Large Language Models (LLMs) excel in dialogue, instruction generation, and reasoning, and can signif… ▽ More

    Submitted 7 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  20. arXiv:2407.21581  [pdf, other

    cs.CV

    InScope: A New Real-world 3D Infrastructure-side Collaborative Perception Dataset for Open Traffic Scenarios

    Authors: Xiaofei Zhang, Yining Li, Jinping Wang, Xiangyi Qin, Ying Shen, Zhengping Fan, Xiaojun Tan

    Abstract: Perception systems of autonomous vehicles are susceptible to occlusion, especially when examined from a vehicle-centric perspective. Such occlusion can lead to overlooked object detections, e.g., larger vehicles such as trucks or buses may create blind spots where cyclists or pedestrians could be obscured, accentuating the safety concerns associated with such perception system limitations. To miti… ▽ More

    Submitted 31 July, 2024; originally announced July 2024.

  21. arXiv:2407.20859  [pdf, other

    cs.CR cs.LG

    Breaking Agents: Compromising Autonomous LLM Agents Through Malfunction Amplification

    Authors: Boyang Zhang, Yicong Tan, Yun Shen, Ahmed Salem, Michael Backes, Savvas Zannettou, Yang Zhang

    Abstract: Recently, autonomous agents built on large language models (LLMs) have experienced significant development and are being deployed in real-world applications. These agents can extend the base LLM's capabilities in multiple ways. For example, a well-built agent using GPT-3.5-Turbo as its core can outperform the more advanced GPT-4 model by leveraging external components. More importantly, the usage… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

  22. arXiv:2407.20502  [pdf, other

    cs.CV

    Restoring Real-World Degraded Events Improves Deblurring Quality

    Authors: Yeqing Shen, Shang Li, Kun Song

    Abstract: Due to its high speed and low latency, DVS is frequently employed in motion deblurring. Ideally, high-quality events would adeptly capture intricate motion information. However, real-world events are generally degraded, thereby introducing significant artifacts into the deblurred results. In response to this challenge, we model the degradation of events and propose RDNet to improve the quality of… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  23. arXiv:2407.20228  [pdf, other

    cs.CV

    FlexAttention for Efficient High-Resolution Vision-Language Models

    Authors: Junyan Li, Delin Chen, Tianle Cai, Peihao Chen, Yining Hong, Zhenfang Chen, Yikang Shen, Chuang Gan

    Abstract: Current high-resolution vision-language models encode images as high-resolution image tokens and exhaustively take all these tokens to compute attention, which significantly increases the computational cost. To address this problem, we propose FlexAttention, a flexible attention mechanism for efficient high-resolution vision-language models. Specifically, a high-resolution image is encoded both as… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  24. arXiv:2407.19041  [pdf, other

    cs.AI cs.CL

    Optimizing Numerical Estimation and Operational Efficiency in the Legal Domain through Large Language Models

    Authors: Jia-Hong Huang, Chao-Chun Yang, Yixian Shen, Alessio M. Pacces, Evangelos Kanoulas

    Abstract: The legal landscape encompasses a wide array of lawsuit types, presenting lawyers with challenges in delivering timely and accurate information to clients, particularly concerning critical aspects like potential imprisonment duration or financial repercussions. Compounded by the scarcity of legal experts, there's an urgent need to enhance the efficiency of traditional legal workflows. Recent advan… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: The paper has been accepted by the 33rd ACM International Conference on Information and Knowledge Management (CIKM) in 2024

  25. arXiv:2407.19039  [pdf, other

    cs.LG cs.AI physics.chem-ph q-bio.BM

    GraphBPE: Molecular Graphs Meet Byte-Pair Encoding

    Authors: Yuchen Shen, Barnabás Póczos

    Abstract: With the increasing attention to molecular machine learning, various innovations have been made in designing better models or proposing more comprehensive benchmarks. However, less is studied on the data preprocessing schedule for molecular graphs, where a different view of the molecular graph could potentially boost the model's performance. Inspired by the Byte-Pair Encoding (BPE) algorithm, a su… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: accepted by ICML 2024 AI for Science Workshop

  26. arXiv:2407.16449  [pdf, ps, other

    cs.IT math.CO

    Constrained coding upper bounds via Goulden-Jackson cluster theorem

    Authors: Yuanting Shen, Chong Shangguan, Zhicong Lin, Gennian Ge

    Abstract: Motivated by applications in DNA-based data storage, constrained codes have attracted a considerable amount of attention from both academia and industry. We study the maximum cardinality of constrained codes for which the constraints can be characterized by a set of forbidden substrings, where by a substring we mean some consecutive coordinates in a string. For finite-type constrained codes (for… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 21 pages

  27. arXiv:2407.14799  [pdf, other

    cs.CV cs.CY

    FairViT: Fair Vision Transformer via Adaptive Masking

    Authors: Bowei Tian, Ruijie Du, Yanning Shen

    Abstract: Vision Transformer (ViT) has achieved excellent performance and demonstrated its promising potential in various computer vision tasks. The wide deployment of ViT in real-world tasks requires a thorough understanding of the societal impact of the model. However, most ViT-based works do not take fairness into account and it is unclear whether directly applying CNN-oriented debiased algorithm to ViT… ▽ More

    Submitted 20 July, 2024; originally announced July 2024.

    Comments: 20 pages, The European Conference on Computer Vision (ECCV 2024)

  28. arXiv:2407.13739  [pdf, other

    cs.AI cs.CL cs.SE

    Scaling Granite Code Models to 128K Context

    Authors: Matt Stallone, Vaibhav Saxena, Leonid Karlinsky, Bridget McGinn, Tim Bula, Mayank Mishra, Adriana Meza Soria, Gaoyuan Zhang, Aditya Prasad, Yikang Shen, Saptha Surendran, Shanmukha Guttula, Hima Patel, Parameswaran Selvam, Xuan-Hong Dang, Yan Koyfman, Atin Sood, Rogerio Feris, Nirmit Desai, David D. Cox, Ruchir Puri, Rameswar Panda

    Abstract: This paper introduces long-context Granite code models that support effective context windows of up to 128K tokens. Our solution for scaling context length of Granite 3B/8B code models from 2K/4K to 128K consists of a light-weight continual pretraining by gradually increasing its RoPE base frequency with repository-level file packing and length-upsampled long-context data. Additionally, we also re… ▽ More

    Submitted 18 July, 2024; originally announced July 2024.

  29. arXiv:2407.12358  [pdf, other

    cs.CV cs.CL

    ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

    Authors: Yufan Shen, Chuwei Luo, Zhaoqing Zhu, Yang Chen, Qi Zheng, Zhi Yu, Jiajun Bu, Cong Yao

    Abstract: Recently, large language models (LLMs) and multimodal large language models (MLLMs) have demonstrated promising results on document visual question answering (VQA) task, particularly after training on document instruction datasets. An effective evaluation method for document instruction data is crucial in constructing instruction data with high efficacy, which, in turn, facilitates the training of… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

  30. arXiv:2407.10702  [pdf, ps, other

    cs.LG

    Geometric Analysis of Unconstrained Feature Models with $d=K$

    Authors: Yi Shen, Shao Gu

    Abstract: Recently, interesting empirical phenomena known as Neural Collapse have been observed during the final phase of training deep neural networks for classification tasks. We examine this issue when the feature dimension d is equal to the number of classes K. We demonstrate that two popular unconstrained feature models are strict saddle functions, with every critical point being either a global minimu… ▽ More

    Submitted 22 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

  31. arXiv:2407.08529  [pdf, other

    cs.CR

    Enhancing Privacy of Spatiotemporal Federated Learning against Gradient Inversion Attacks

    Authors: Lele Zheng, Yang Cao, Renhe Jiang, Kenjiro Taura, Yulong Shen, Sheng Li, Masatoshi Yoshikawa

    Abstract: Spatiotemporal federated learning has recently raised intensive studies due to its ability to train valuable models with only shared gradients in various location-based services. On the other hand, recent studies have shown that shared gradients may be subject to gradient inversion attacks (GIA) on images or texts. However, so far there has not been any systematic study of the gradient inversion a… ▽ More

    Submitted 15 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by DASFAA 2024, 16 pages

  32. arXiv:2407.08125  [pdf, ps, other

    cs.LG

    Real-Time Summarization of Twitter

    Authors: Yixin Jin, Meiqi Wang, Meng Li, Wenjing Zhou, Yi Shen, Hao Liu

    Abstract: In this paper, we describe our approaches to TREC Real-Time Summarization of Twitter. We focus on real time push notification scenario, which requires a system monitors the stream of sampled tweets and returns the tweets relevant and novel to given interest profiles. Dirichlet score with and with very little smoothing (baseline) are employed to classify whether a tweet is relevant to a given inter… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: This paper was accepted to International Conference on Artificial Intelligence and Electromechanical Automation 2024

  33. arXiv:2407.08064  [pdf, other

    cs.LG cs.AI

    TinyGraph: Joint Feature and Node Condensation for Graph Neural Networks

    Authors: Yezi Liu, Yanning Shen

    Abstract: Training graph neural networks (GNNs) on large-scale graphs can be challenging due to the high computational expense caused by the massive number of nodes and high-dimensional nodal features. Existing graph condensation studies tackle this problem only by reducing the number of nodes in the graph. However, the resulting condensed graph data can still be cumbersome. Specifically, although the nodes… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  34. arXiv:2407.07053  [pdf, other

    cs.CV

    Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model

    Authors: Wenqi Zhang, Zhenglin Cheng, Yuanyu He, Mengna Wang, Yongliang Shen, Zeqi Tan, Guiyang Hou, Mingqian He, Yanna Ma, Weiming Lu, Yueting Zhuang

    Abstract: Although most current large multimodal models (LMMs) can already understand photos of natural scenes and portraits, their understanding of abstract images, e.g., charts, maps, or layouts, and visual reasoning capabilities remains quite rudimentary. They often struggle with simple daily tasks, such as reading time from a clock, understanding a flowchart, or planning a route using a road map. In lig… ▽ More

    Submitted 8 August, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

    Comments: code: https://github.com/zwq2018/Multi-modal-Self-instruct dataset: https://huggingface.co/datasets/zwq2018/Multi-modal-Self-instruct Leaderboard: https://multi-modal-self-instruct.github.io/

  35. arXiv:2407.06324  [pdf, other

    cs.LG cs.CL cs.NE

    B'MOJO: Hybrid State Space Realizations of Foundation Models with Eidetic and Fading Memory

    Authors: Luca Zancato, Arjun Seshadri, Yonatan Dukler, Aditya Golatkar, Yantao Shen, Benjamin Bowman, Matthew Trager, Alessandro Achille, Stefano Soatto

    Abstract: We describe a family of architectures to support transductive inference by allowing memory to grow to a finite but a-priori unknown bound while making efficient use of finite resources for inference. Current architectures use such resources to represent data either eidetically over a finite span ("context" in Transformers), or fading over an infinite span (in State Space Models, or SSMs). Recent h… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

  36. arXiv:2407.06227  [pdf, ps, other

    eess.SY cs.AI

    Communication and Control Co-Design in 6G: Sequential Decision-Making with LLMs

    Authors: Xianfu Chen, Celimuge Wu, Yi Shen, Yusheng Ji, Tsutomu Yoshinaga, Qiang Ni, Charilaos C. Zarakovitis, Honggang Zhang

    Abstract: This article investigates a control system within the context of six-generation wireless networks. The control performance optimization confronts the technical challenges that arise from the intricate interactions between communication and control sub-systems, asking for a co-design. Accounting for the system dynamics, we formulate the sequential co-design decision-makings of communication and con… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  37. arXiv:2407.06027  [pdf, other

    cs.CL

    PAS: Data-Efficient Plug-and-Play Prompt Augmentation System

    Authors: Miao Zheng, Hao Liang, Fan Yang, Haoze Sun, Tianpeng Li, Lingchu Xiong, Yan Zhang, Youzhen Wu, Kun Li, Yanjun Shen, Mingan Lin, Tao Zhang, Guosheng Dong, Yujing Qiao, Kun Fang, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: In recent years, the rise of Large Language Models (LLMs) has spurred a growing demand for plug-and-play AI systems. Among the various AI techniques, prompt engineering stands out as particularly significant. However, users often face challenges in writing prompts due to the steep learning curve and significant time investment, and existing automatic prompt engineering (APE) models can be difficul… ▽ More

    Submitted 7 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  38. arXiv:2407.05467  [pdf, other

    cs.DC cs.AI

    The infrastructure powering IBM's Gen AI model development

    Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba , et al. (121 additional authors not shown)

    Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam,Brian Belgodere, Milton Bonilla

  39. arXiv:2407.05415  [pdf, other

    cs.CV

    DIVESPOT: Depth Integrated Volume Estimation of Pile of Things Based on Point Cloud

    Authors: Yiran Ling, Rongqiang Zhao, Yixuan Shen, Dongbo Li, Jing Jin, Jie Liu

    Abstract: Non-contact volume estimation of pile-type objects has considerable potential in industrial scenarios, including grain, coal, mining, and stone materials. However, using existing method for these scenarios is challenged by unstable measurement poses, significant light interference, the difficulty of training data collection, and the computational burden brought by large piles. To address the above… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  40. arXiv:2407.03745  [pdf, other

    cs.CR

    SRAS: Self-governed Remote Attestation Scheme for Multi-party Collaboration

    Authors: Linan Tian, Yunke Shen, Zhiqiang Li

    Abstract: Trusted Execution Environments (TEEs), such as Intel Software Guard Extensions (SGX), ensure the confidentiality and integrity of user applications when using cloud computing resources. However, in the multi-party cloud computing scenario, how to select a Relying Party to verify the TEE of each party and avoid leaking sensitive data to each other remains an open question. In this paper, we propose… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  41. arXiv:2407.03663  [pdf, other

    cs.CV

    Limited-View Photoacoustic Imaging Reconstruction Via High-quality Self-supervised Neural Representation

    Authors: Youshen xiao, Yuting Shen, Bowei Yao, Xiran Cai, Yuyao Zhang, Fei Gao

    Abstract: In practical applications within the human body, it is often challenging to fully encompass the target tissue or organ, necessitating the use of limited-view arrays, which can lead to the loss of crucial information. Addressing the reconstruction of photoacoustic sensor signals in limited-view detection spaces has become a focal point of current research. In this study, we introduce a self-supervi… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  42. arXiv:2407.03604  [pdf, other

    cs.CL cs.CV

    Lateralization LoRA: Interleaved Instruction Tuning with Modality-Specialized Adaptations

    Authors: Zhiyang Xu, Minqian Liu, Ying Shen, Joy Rimchala, Jiaxin Zhang, Qifan Wang, Yu Cheng, Lifu Huang

    Abstract: Recent advancements in Vision-Language Models (VLMs) have led to the development of Vision-Language Generalists (VLGs) capable of understanding and generating interleaved images and text. Despite these advances, VLGs still struggle to follow user instructions for interleaved text and image generation. To address this issue, we introduce LeafInstruct, the first open-sourced interleaved instruction… ▽ More

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 8 Pages, visual instruction tuning, parameter-efficient tuning

  43. arXiv:2407.02119  [pdf, other

    cs.LG cs.AI cs.CL

    Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning

    Authors: Yifang Chen, Shuohang Wang, Ziyi Yang, Hiteshi Sharma, Nikos Karampatziakis, Donghan Yu, Kevin Jamieson, Simon Shaolei Du, Yelong Shen

    Abstract: Reinforcement learning with human feedback (RLHF), as a widely adopted approach in current large language model pipelines, is \textit{bottlenecked by the size of human preference data}. While traditional methods rely on offline preference dataset constructions, recent approaches have shifted towards online settings, where a learner uses a small amount of labeled seed data and a large pool of unlab… ▽ More

    Submitted 9 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

  44. arXiv:2407.01546  [pdf, other

    cs.NE cs.LG math.OC

    Machine Learning-Enhanced Ant Colony Optimization for Column Generation

    Authors: Hongjie Xu, Yunzhuang Shen, Yuan Sun, Xiaodong Li

    Abstract: Column generation (CG) is a powerful technique for solving optimization problems that involve a large number of variables or columns. This technique begins by solving a smaller problem with a subset of columns and gradually generates additional columns as needed. However, the generation of columns often requires solving difficult subproblems repeatedly, which can be a bottleneck for CG. To address… ▽ More

    Submitted 22 April, 2024; originally announced July 2024.

    Comments: 9 pages including reference

  45. arXiv:2407.01491  [pdf, other

    cs.CL cs.CV

    Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning

    Authors: Siwei Li, Yifan Yang, Yifei Shen, Fangyun Wei, Zongqing Lu, Lili Qiu, Yuqing Yang

    Abstract: Efficient fine-tuning plays a fundamental role in modern large models, with low-rank adaptation emerging as a particularly promising approach. However, the existing variants of LoRA are hampered by limited expressiveness, a tendency to overfit, and sensitivity to hyperparameter settings. This paper presents LoRA Slow Cascade Learning (LoRASC), an innovative technique designed to enhance LoRA's exp… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  46. arXiv:2407.01455  [pdf, other

    cs.CL

    TimeToM: Temporal Space is the Key to Unlocking the Door of Large Language Models' Theory-of-Mind

    Authors: Guiyang Hou, Wenqi Zhang, Yongliang Shen, Linjuan Wu, Weiming Lu

    Abstract: Theory of Mind (ToM)-the cognitive ability to reason about mental states of ourselves and others, is the foundation of social interaction. Although ToM comes naturally to humans, it poses a significant challenge to even the most advanced Large Language Models (LLMs). Due to the complex logical chains in ToM reasoning, especially in higher-order ToM questions, simply utilizing reasoning methods lik… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 16 pages, 6 figures, ACL 2024(findings)

  47. arXiv:2407.01111  [pdf, other

    cs.LG cs.AI stat.ML

    Proximity Matters: Local Proximity Preserved Balancing for Treatment Effect Estimation

    Authors: Hao Wang, Zhichao Chen, Yuan Shen, Jiajun Fan, Zhaoran Liu, Degui Yang, Xinggao Liu, Haoxuan Li

    Abstract: Heterogeneous treatment effect (HTE) estimation from observational data poses significant challenges due to treatment selection bias. Existing methods address this bias by minimizing distribution discrepancies between treatment groups in latent space, focusing on global alignment. However, the fruitful aspect of local proximity, where similar units exhibit similar outcomes, is often overlooked. In… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Code is available at https://anonymous.4open.science/status/ncr-B697

  48. arXiv:2407.00934  [pdf, other

    cs.CL

    CLEME2.0: Towards More Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

    Authors: Jingheng Ye, Zishan Xu, Yinghui Li, Xuxin Cheng, Linlin Song, Qingyu Zhou, Hai-Tao Zheng, Ying Shen, Xin Su

    Abstract: The paper focuses on improving the interpretability of Grammatical Error Correction (GEC) metrics, which receives little attention in previous studies. To bridge the gap, we propose CLEME2.0, a reference-based evaluation strategy that can describe four elementary dimensions of GEC systems, namely hit-correction, error-correction, under-correction, and over-correction. They collectively contribute… ▽ More

    Submitted 30 June, 2024; originally announced July 2024.

    Comments: 16 pages, 8 tables, 2 figures. Under review

  49. arXiv:2407.00390  [pdf, other

    cs.CL

    Advancing Process Verification for Large Language Models via Tree-Based Preference Learning

    Authors: Mingqian He, Yongliang Shen, Wenqi Zhang, Zeqi Tan, Weiming Lu

    Abstract: Large Language Models (LLMs) have demonstrated remarkable potential in handling complex reasoning tasks by generating step-by-step rationales.Some methods have proven effective in boosting accuracy by introducing extra verifiers to assess these paths. However, existing verifiers, typically trained on binary-labeled reasoning paths, fail to fully utilize the relative merits of intermediate steps, t… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  50. arXiv:2406.19394  [pdf, other

    cs.CV

    HUWSOD: Holistic Self-training for Unified Weakly Supervised Object Detection

    Authors: Liujuan Cao, Jianghang Lin, Zebo Hong, Yunhang Shen, Shaohui Lin, Chao Chen, Rongrong Ji

    Abstract: Most WSOD methods rely on traditional object proposals to generate candidate regions and are confronted with unstable training, which easily gets stuck in a poor local optimum. In this paper, we introduce a unified, high-capacity weakly supervised object detection (WSOD) network called HUWSOD, which utilizes a comprehensive self-training framework without needing external modules or additional sup… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.