Showing 1–50 of 182 results for author: Lyu, Y

Searching in archive cs.
  1. arXiv:2408.01739  [pdf, other]

    cs.CV cs.AI

    LAM3D: Leveraging Attention for Monocular 3D Object Detection

    Authors: Diana-Alexandra Sas, Leandro Di Bella, Yangxintong Lyu, Florin Oniga, Adrian Munteanu

    Abstract: Since the introduction of the self-attention mechanism and the adoption of the Transformer architecture for Computer Vision tasks, the Vision Transformer-based architectures gained a lot of popularity in the field, being used for tasks such as image classification, object detection and image segmentation. However, efficiently leveraging the attention mechanism in vision transformers for the Monocu…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 6 pages. Accepted to MMSP 2024

  2. arXiv:2407.14439  [pdf, other]

    cs.CV

    Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding

    Authors: Renshan Zhang, Yibo Lyu, Rui Shao, Gongwei Chen, Weili Guan, Liqiang Nie

    Abstract: Cropping high-resolution document images into multiple sub-images is the most widely used approach for current Multimodal Large Language Models (MLLMs) to do document understanding. Most current document understanding methods preserve all tokens within sub-images and treat them equally. This neglects their different informativeness and leads to a significant increase in the number of image toke…

    Submitted 19 July, 2024; originally announced July 2024.

  3. arXiv:2407.11351  [pdf, other]

    cs.CV

    Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities

    Authors: Xu Zheng, Yuanhuiyi Lyu, Lin Wang

    Abstract: Image modality is not perfect as it often fails in certain conditions, e.g., night and fast motion. This significantly limits the robustness and versatility of existing multi-modal (i.e., Image+X) semantic segmentation methods when confronting modality absence or failure, as often occurs in real-world applications. Inspired by the open-world learning capability of multi-modal vision-language mod…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  4. arXiv:2407.11344  [pdf, other]

    cs.CV

    Centering the Value of Every Modality: Towards Efficient and Resilient Modality-agnostic Semantic Segmentation

    Authors: Xu Zheng, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang

    Abstract: Fusing an arbitrary number of modalities is vital for achieving robust multi-modal fusion of semantic segmentation yet remains less explored to date. Recent endeavors regard RGB modality as the center and the others as the auxiliary, yielding an asymmetric architecture with two branches. However, the RGB modality may struggle in certain circumstances, e.g., nighttime, while others, e.g., event dat…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  5. arXiv:2407.01884  [pdf, other]

    cs.CV cs.HC

    EIT-1M: One Million EEG-Image-Text Pairs for Human Visual-textual Recognition and More

    Authors: Xu Zheng, Ling Wang, Kanghao Chen, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang

    Abstract: Recently, electroencephalography (EEG) signals have been actively incorporated to decode brain activity to visual or textual stimuli and achieve object recognition in multi-modal AI. Accordingly, endeavors have been focused on building EEG-based datasets from visual or textual single-modal stimuli. However, these datasets offer limited EEG epochs per category, and the complex semantics of stimuli…

    Submitted 1 July, 2024; originally announced July 2024.

  6. arXiv:2406.19528  [pdf, other]

    cs.HC cs.AI cs.CY

    Harnessing LLMs for Automated Video Content Analysis: An Exploratory Workflow of Short Videos on Depression

    Authors: Jiaying Lizzy Liu, Yunlong Wang, Yao Lyu, Yiheng Su, Shuo Niu, Xuhai Orson Xu, Yan Zhang

    Abstract: Despite the growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have primarily focused on text-based content. In the present work, we explored the potential of LLMs in assisting video content analysis by conducting a case study that followed a new workflow of LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt eng…

    Submitted 29 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

    Comments: 7 pages, 2 figures, accepted by CSCW 24

  7. arXiv:2406.14979  [pdf, other]

    cs.CL

    Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation

    Authors: Yuanjie Lyu, Zihan Niu, Zheyong Xie, Chao Zhang, Tong Xu, Yang Wang, Enhong Chen

    Abstract: Despite the significant progress of large language models (LLMs) in various tasks, they often produce factual errors due to their limited internal knowledge. Retrieval-Augmented Generation (RAG), which enhances LLMs with external knowledge sources, offers a promising solution. However, these methods can be misled by irrelevant paragraphs in retrieved documents. Due to the inherent uncertainty in L…

    Submitted 21 June, 2024; originally announced June 2024.

  8. arXiv:2406.12802  [pdf, other]

    cs.RO

    Decentralized Multi-Robot Line-of-Sight Connectivity Maintenance under Uncertainty

    Authors: Yupeng Yang, Yiwei Lyu, Yanze Zhang, Sha Yi, Wenhao Luo

    Abstract: In this paper, we propose a novel decentralized control method to maintain Line-of-Sight connectivity for multi-robot networks in the presence of Gaussian-distributed localization uncertainty. In contrast to most existing work that assumes perfect positional information about robots or enforces overly restrictive rigid formation against uncertainty, our method enables robots to preserve Line-of-Si…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by RSS 2024

  9. arXiv:2406.04872  [pdf, other]

    cs.LG

    Diversified Batch Selection for Training Acceleration

    Authors: Feng Hong, Yueming Lyu, Jiangchao Yao, Ya Zhang, Ivor W. Tsang, Yanfeng Wang

    Abstract: The remarkable success of modern machine learning models on large datasets often demands extensive training time and resource consumption. To save cost, a prevalent research line, known as online batch selection, explores selecting informative subsets during the training process. Although recent efforts achieve advancements by measuring the impact of each sample on generalization, their reliance o…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  10. arXiv:2406.00812  [pdf, other]

    stat.ML cs.LG

    Covariance-Adaptive Sequential Black-box Optimization for Diffusion Targeted Generation

    Authors: Yueming Lyu, Kim Yong Tan, Yew Soon Ong, Ivor W. Tsang

    Abstract: Diffusion models have demonstrated great potential in generating high-quality content for images, natural language, protein domains, etc. However, how to perform user-preferred targeted generation via diffusion models with only black-box target scores of users remains challenging. To address this issue, we first formulate the fine-tuning of the targeted reverse-time stochastic differential equatio…

    Submitted 8 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

  11. arXiv:2405.16108  [pdf, other]

    cs.CV

    OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All

    Authors: Yuanhuiyi Lyu, Xu Zheng, Dahun Kim, Lin Wang

    Abstract: Research on multi-modal learning dominantly aligns the modalities in a unified space at training, and only a single one is taken for prediction at inference. However, for a real machine, e.g., a robot, sensors could be added or removed at any time. Thus, it is crucial to enable the machine to tackle the mismatch and unequal-scale problems of modality combinations between training and inference. In…

    Submitted 25 May, 2024; originally announced May 2024.

  12. arXiv:2405.13390  [pdf, ps, other]

    cs.LG math.NA q-fin.MF

    Convergence analysis of kernel learning FBSDE filter

    Authors: Yunzheng Lyu, Feng Bao

    Abstract: The kernel learning forward-backward SDE filter is an iterative and adaptive meshfree approach to solving the nonlinear filtering problem. It builds on the forward-backward SDE for the Fokker-Planck equation, which defines the evolving density of the state variable, and employs KDE to approximate the density. This algorithm has shown superior performance to the mainstream particle filter method, in both converge…

    Submitted 28 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  13. Hierarchical Learned Risk-Aware Planning Framework for Human Driving Modeling

    Authors: Nathan Ludlow, Yiwei Lyu, John Dolan

    Abstract: This paper presents a novel approach to modeling human driving behavior, designed for use in evaluating autonomous vehicle control systems in simulation environments. Our methodology leverages a hierarchical forward-looking, risk-aware estimation framework with learned parameters to generate human-like driving trajectories, accommodating multiple driver levels determined by model parameters. Thi…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 7 pages, 5 figures, accepted to the 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

  14. arXiv:2404.16558  [pdf, other]

    cs.CV cs.AI cs.RO

    DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation

    Authors: Leandro Di Bella, Yangxintong Lyu, Adrian Munteanu

    Abstract: This paper presents DeepKalPose, a novel approach for enhancing temporal consistency in monocular vehicle pose estimation applied on video through a deep-learning-based Kalman Filter. By integrating a Bi-directional Kalman filter strategy utilizing forward and backward time-series processing, combined with a learnable motion model to represent complex motion patterns, our method significantly impr…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: 4 pages, 3 figures, published in IET Electronics Letters

    Journal ref: Electronics Letters (ISSN: 00135194), year: 2024, volume: 60, number: 8, start page: ?

  15. arXiv:2404.15576  [pdf, ps, other]

    cs.HC

    Designing AI-Enabled Games to Support Social-Emotional Learning for Children with Autism Spectrum Disorders

    Authors: Yue Lyu, Pengcheng An, Huan Zhang, Keiko Katsuragawa, Jian Zhao

    Abstract: Children with autism spectrum disorder (ASD) experience challenges in grasping social-emotional cues, which can result in difficulties in recognizing emotions and understanding and responding to social interactions. Social-emotional intervention is an effective method to improve emotional understanding and facial expression recognition among individuals with ASD. Existing work emphasizes the impor…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 2 pages, 1 table, peer-reviewed and presented at the "CHI 2024 Workshop on Child-centred AI Design, May 11, 2024, Honolulu, HI, USA"

  16. arXiv:2404.14305  [pdf, other]

    cs.HC

    "I Upload...All Types of Different Things to Say, the World of Blindness Is More Than What They Think It Is": A Study of Blind TikTokers' Identity Work from a Flourishing Perspective

    Authors: Yao Lyu, Jie Cai, Bryan Dosono, Davis Yadav, John M. Carroll

    Abstract: Identity work in Human-Computer Interaction (HCI) has focused on the marginalized group to explore designs to support their asset (what they have). However, little has been explored specifically on the identity work of people with disabilities, specifically, visual impairments. In this study, we interviewed 45 BlindTokers (blind users on TikTok) from various backgrounds to understand their identit…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: ACM CSCW

  17. arXiv:2404.12400  [pdf, other]

    cs.LG

    Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning

    Authors: Ming Cheng, Ziyi Zhou, Bowen Zhang, Ziyu Wang, Jiaqi Gan, Ziang Ren, Weiqi Feng, Yi Lyu, Hefan Zhang, Xingjian Diao

    Abstract: In the landscape of spatio-temporal data analytics, effective trajectory representation learning is paramount. To bridge the gap of learning accurate representations with efficient and flexible mechanisms, we introduce Efflex, a comprehensive pipeline for transformative graph modeling and representation learning of the large-volume spatio-temporal trajectories. Efflex pioneers the incorporation of…

    Submitted 15 April, 2024; originally announced April 2024.

  18. arXiv:2404.11595  [pdf, other]

    cs.SE

    A Deep Dive into Large Language Models for Automated Bug Localization and Repair

    Authors: Soneya Binta Hossain, Nan Jiang, Qiang Zhou, Xiaopeng Li, Wen-Hao Chiang, Yingjun Lyu, Hoan Nguyen, Omer Tripp

    Abstract: Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR). In this study, we take a deep dive into automated bug fixing utilizing LLMs. In contrast to many deep learning-based APR methods that assume known bug locations, rely on line-level localization tools, or address bug prediction and fixing in one step, our…

    Submitted 10 May, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  19. arXiv:2404.09425  [pdf, other]

    eess.IV cs.CV

    Super-resolution of biomedical volumes with 2D supervision

    Authors: Cheng Jiang, Alexander Gedeon, Yiwei Lyu, Eric Landgraf, Yufeng Zhang, Xinhai Hou, Akhil Kondepudi, Asadur Chowdury, Honglak Lee, Todd Hollon

    Abstract: Volumetric biomedical microscopy has the potential to increase the diagnostic information extracted from clinical tissue specimens and improve the diagnostic accuracy of both human pathologists and computational pathology models. Unfortunately, barriers to integrating 3-dimensional (3D) volumetric microscopy into clinical medicine include long imaging times, poor depth / z-axis resolution, and an…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: CVPR Workshop on Computer Vision for Microscopy Image Analysis 2024

  20. arXiv:2404.08021  [pdf, other]

    cs.LG cs.AI cs.RO

    VeTraSS: Vehicle Trajectory Similarity Search Through Graph Modeling and Representation Learning

    Authors: Ming Cheng, Bowen Zhang, Ziyu Wang, Ziyi Zhou, Weiqi Feng, Yi Lyu, Xingjian Diao

    Abstract: Trajectory similarity search plays an essential role in autonomous driving, as it enables vehicles to analyze the information and characteristics of different trajectories to make informed decisions and navigate safely in dynamic environments. Existing work on the trajectory similarity search task primarily utilizes sequence-processing algorithms or Recurrent Neural Networks (RNNs), which suffer f…

    Submitted 11 April, 2024; originally announced April 2024.

  21. arXiv:2404.01789  [pdf]

    cs.SE

    A Feature Dataset of Microservices-based Systems

    Authors: Weipan Yang, Yongchao Xing, Yiming Lyu, Zhihao Liang, Zhiying Tu

    Abstract: Microservice architecture has become a dominant architectural style in the service-oriented software industry. Poor practices in the design and development of microservices are called microservice bad smells. In microservice bad smells research, the detection of these bad smells relies on feature data from microservices. However, there is a lack of an appropriate open-source microservice feature d…

    Submitted 2 April, 2024; originally announced April 2024.

  22. arXiv:2403.14102  [pdf, other]

    cs.AI cs.LG

    DouRN: Improving DouZero by Residual Neural Networks

    Authors: Yiquan Chen, Yingchao Lyu, Di Zhang

    Abstract: Deep reinforcement learning has made significant progress in games with imperfect information, but its performance in the card game Doudizhu (Chinese Poker/Fight the Landlord) remains unsatisfactory. Doudizhu is different from conventional games as it involves three players and combines elements of cooperation and confrontation, resulting in a large state and action space. In 2021, a Doudizhu prog…

    Submitted 20 March, 2024; originally announced March 2024.

    Journal ref: CyberC 2023: 96-99

  23. arXiv:2403.13680  [pdf, other]

    eess.IV cs.CV

    Step-Calibrated Diffusion for Biomedical Optical Image Restoration

    Authors: Yiwei Lyu, Sung Jik Cha, Cheng Jiang, Asadur Chowdury, Xinhai Hou, Edward Harake, Akhil Kondepudi, Christian Freudiger, Honglak Lee, Todd C. Hollon

    Abstract: High-quality, high-resolution medical imaging is essential for clinical care. Raman-based biomedical optical imaging uses non-ionizing infrared radiation to evaluate human tissues in real time and is used for early cancer detection, brain tumor diagnosis, and intraoperative tissue analysis. Unfortunately, optical imaging is vulnerable to image degradation due to laser scattering and absorption, wh…

    Submitted 16 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  24. arXiv:2403.12847  [pdf, other]

    cs.LG

    Policy Bifurcation in Safe Reinforcement Learning

    Authors: Wenjun Zou, Yao Lyu, Jie Li, Yujie Yang, Shengbo Eben Li, Jingliang Duan, Xianyuan Zhan, Jingjing Liu, Yaqin Zhang, Keqiang Li

    Abstract: Safe reinforcement learning (RL) offers advanced solutions to constrained optimal control problems. Existing studies in safe RL implicitly assume continuity in policy functions, where policies map states to actions in a smooth, uninterrupted manner; however, our research finds that in some scenarios, the feasible policy should be discontinuous or multi-valued, interpolating between discontinuous l…

    Submitted 28 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  25. arXiv:2403.12534  [pdf, other]

    cs.CV

    ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More

    Authors: Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu, Lin Wang

    Abstract: Event cameras have recently been shown beneficial for practical vision tasks, such as action recognition, thanks to their high temporal resolution, power efficiency, and reduced privacy concerns. However, current research is hindered by 1) the difficulty in processing events because of their prolonged duration and dynamic actions with complex and ambiguous semantics and 2) the redundant action dep…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  26. arXiv:2403.12532  [pdf, other]

    cs.CV

    UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All

    Authors: Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou, Lin Wang

    Abstract: We present UniBind, a flexible and efficient approach that learns a unified representation space for seven diverse modalities -- images, text, audio, point cloud, thermal, video, and event data. Existing works, e.g., ImageBind, treat the image as the central modality and build an image-centered representation space; however, the space may be sub-optimal as it leads to an unbalanced representation s…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR2024

  27. A Preliminary Exploration of YouTubers' Use of Generative-AI in Content Creation

    Authors: Yao Lyu, He Zhang, Shuo Niu, Jie Cai

    Abstract: Content creators increasingly utilize generative artificial intelligence (Gen-AI) on platforms such as YouTube, TikTok, Instagram, and various blogging sites to produce imaginative images, AI-generated videos, and articles using Large Language Models (LLMs). Despite its growing popularity, there remains an underexplored area concerning the specific domains where AI-generated content is being appli…

    Submitted 9 March, 2024; originally announced March 2024.

    Comments: Accepted at CHI LBW 2024

  28. arXiv:2402.14590  [pdf, other]

    cs.IR cs.CL cs.LG

    Scaling Up LLM Reviews for Google Ads Content Moderation

    Authors: Wei Qiao, Tushar Dogra, Otilia Stretcu, Yu-Han Lyu, Tiantian Fang, Dongjin Kwon, Chun-Ta Lu, Enming Luo, Yuan Wang, Chih-Chun Chia, Ariel Fuxman, Fangzhou Wang, Ranjay Krishna, Mehmet Tek

    Abstract: Large language models (LLMs) are powerful tools for content moderation, but their inference costs and latency make them prohibitive for casual use on large datasets, such as the Google Ads repository. This study proposes a method for scaling up LLM reviews for content moderation in Google Ads. First, we use heuristics to select candidates via filtering and duplicate removal, and create clusters of…

    Submitted 7 February, 2024; originally announced February 2024.

  29. arXiv:2402.11176  [pdf, other]

    cs.CL cs.AI

    KnowTuning: Knowledge-aware Fine-tuning for Large Language Models

    Authors: Yougang Lyu, Lingyong Yan, Shuaiqiang Wang, Haibo Shi, Dawei Yin, Pengjie Ren, Zhumin Chen, Maarten de Rijke, Zhaochun Ren

    Abstract: Despite their success at many natural language processing (NLP) tasks, large language models still struggle to effectively leverage knowledge for knowledge-intensive tasks, manifesting limitations such as generating incomplete, non-factual, or illogical answers. These limitations stem from inadequate knowledge awareness of LLMs during vanilla fine-tuning. To address these problems, we propose a kn…

    Submitted 17 April, 2024; v1 submitted 16 February, 2024; originally announced February 2024.

  30. arXiv:2402.06188  [pdf, other]

    cs.CV cs.AI cs.LG

    A self-supervised framework for learning whole slide representations

    Authors: Xinhai Hou, Cheng Jiang, Akhil Kondepudi, Yiwei Lyu, Asadur Chowdury, Honglak Lee, Todd C. Hollon

    Abstract: Whole slide imaging is fundamental to biomedical microscopy and computational pathology. Previously, learning representations for gigapixel-sized whole slide images (WSIs) has relied on multiple instance learning with weak labels, which do not annotate the diverse morphologic features and spatial heterogeneity of WSIs. A high-quality self-supervised learning method for WSIs would provide transfera…

    Submitted 23 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: 26 pages, 11 figures

  31. arXiv:2401.17664  [pdf, other]

    cs.CV cs.GR

    Image Anything: Towards Reasoning-coherent and Training-free Multi-modal Image Generation

    Authors: Yuanhuiyi Lyu, Xu Zheng, Lin Wang

    Abstract: The multifaceted nature of human perception and comprehension indicates that, when we think, our body can naturally take any combination of senses, a.k.a., modalities and form a beautiful picture in our brain. For example, when we see a cattery and simultaneously perceive the cat's purring sound, our brain can construct a picture of a cat in the cattery. Intuitively, generative AI models should ho…

    Submitted 31 January, 2024; originally announced January 2024.

  32. arXiv:2401.17043  [pdf, other]

    cs.CL

    CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

    Authors: Yuanjie Lyu, Zhiyu Li, Simin Niu, Feiyu Xiong, Bo Tang, Wenjin Wang, Hao Wu, Huanyong Liu, Tong Xu, Enhong Chen

    Abstract: Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This method addresses common LLM limitations, including outdated information and the tendency to produce inaccurate "hallucinated" content. However, the evaluation of RAG systems is challenging, as existing benchmarks are limited in scope a…

    Submitted 15 July, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

    Comments: 40 Pages

  33. "I Got Flagged for Supposed Bullying, Even Though It Was in Response to Someone Harassing Me About My Disability.": A Study of Blind TikTokers' Content Moderation Experiences

    Authors: Yao Lyu, Jie Cai, Anisa Callis, Kelley Cotter, John M. Carroll

    Abstract: The Human-Computer Interaction (HCI) community has consistently focused on the experiences of users moderated by social media platforms. Recently, scholars have noticed that moderation practices could perpetuate biases, resulting in the marginalization of user groups undergoing moderation. However, most studies have primarily addressed marginalization related to issues such as racism or sexism, wi…

    Submitted 21 January, 2024; originally announced January 2024.

    Comments: 24 pages, 1 figure, accepted by CHI'24

  34. arXiv:2401.09455  [pdf, other]

    cs.NI cs.AI cs.LG eess.SY

    Dynamic Routing for Integrated Satellite-Terrestrial Networks: A Constrained Multi-Agent Reinforcement Learning Approach

    Authors: Yifeng Lyu, Han Hu, Rongfei Fan, Zhi Liu, Jianping An, Shiwen Mao

    Abstract: The integrated satellite-terrestrial network (ISTN) system has experienced significant growth, offering seamless communication services in remote areas with limited terrestrial infrastructure. However, designing a routing scheme for ISTN is exceedingly difficult, primarily due to the heightened complexity resulting from the inclusion of additional ground stations, along with the requirement to sat…

    Submitted 22 December, 2023; originally announced January 2024.

  35. arXiv:2401.04429  [pdf, other]

    cs.AI cs.MA

    i-Rebalance: Personalized Vehicle Repositioning for Supply Demand Balance

    Authors: Haoyang Chen, Peiyan Sun, Qiyuan Song, Wanyuan Wang, Weiwei Wu, Wencan Zhang, Guanyu Gao, Yan Lyu

    Abstract: Ride-hailing platforms have been facing the challenge of balancing demand and supply. Existing vehicle reposition techniques often treat drivers as homogeneous agents and relocate them deterministically, assuming compliance with the reposition. In this paper, we consider a more realistic and driver-centric scenario where drivers have unique cruising preferences and can decide whether to take the r…

    Submitted 2 April, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

  36. arXiv:2312.17677  [pdf, other]

    cs.CR cs.SE

    Prompt Fuzzing for Fuzz Driver Generation

    Authors: Yunlong Lyu, Yuxuan Xie, Peng Chen, Hao Chen

    Abstract: Crafting high-quality fuzz drivers not only is time-consuming but also requires a deep understanding of the library. However, the state-of-the-art automatic fuzz driver generation techniques fall short of expectations. While fuzz drivers derived from consumer code can reach deep states, they have limited coverage. Conversely, interpretative fuzzing can explore most API calls but requires numerous…

    Submitted 29 May, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: To appear in the ACM CCS 2024

  37. arXiv:2312.10934  [pdf, other]

    cs.SE

    APIDocBooster: An Extract-Then-Abstract Framework Leveraging Large Language Models for Augmenting API Documentation

    Authors: Chengran Yang, Jiakun Liu, Bowen Xu, Christoph Treude, Yunbo Lyu, Junda He, Ming Li, David Lo

    Abstract: API documentation is often the most trusted resource for programming. Many approaches have been proposed to augment API documentation by summarizing complementary information from external resources such as Stack Overflow. Existing extractive-based summarization approaches excel in producing faithful summaries that accurately represent the source content without input length restrictions. Neverthe…

    Submitted 10 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  38. arXiv:2312.05762  [pdf, other]

    cs.CL

    Multi-Defendant Legal Judgment Prediction via Hierarchical Reasoning

    Authors: Yougang Lyu, Jitai Hao, Zihan Wang, Kai Zhao, Shen Gao, Pengjie Ren, Zhumin Chen, Fang Wang, Zhaochun Ren

    Abstract: Multiple defendants in a criminal fact description generally exhibit complex interactions, and cannot be well handled by existing Legal Judgment Prediction (LJP) methods which focus on predicting judgment results (e.g., law articles, charges, and terms of penalty) for single-defendant cases. To address this problem, we propose the task of multi-defendant LJP, which aims to automatically predict th…

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: EMNLP2023 Findings

  39. arXiv:2312.05256  [pdf, other]

    eess.IV cs.AI

    Holistic Evaluation of GPT-4V for Biomedical Imaging

    Authors: Zhengliang Liu, Hanqi Jiang, Tianyang Zhong, Zihao Wu, Chong Ma, Yiwei Li, Xiaowei Yu, Yutong Zhang, Yi Pan, Peng Shu, Yanjun Lyu, Lu Zhang, Junjie Yao, Peixin Dong, Chao Cao, Zhenxiang Xiao, Jiaqi Wang, Huan Zhao, Shaochen Xu, Yaonai Wei, Jingyuan Chen, Haixing Dai, Peilong Wang, Hao He, Zewei Wang, et al. (25 additional authors not shown)

    Abstract: In this paper, we present a large-scale evaluation probing GPT-4V's capabilities and limitations for biomedical image analysis. GPT-4V represents a breakthrough in artificial general intelligence (AGI) for computer vision, with applications in the biomedical domain. We assess GPT-4V's performance across 16 medical imaging categories, including radiology, oncology, ophthalmology, pathology, and mor…

    Submitted 10 November, 2023; originally announced December 2023.

  40. arXiv:2312.04810  [pdf, other]

    cs.CV

    RS-Corrector: Correcting the Racial Stereotypes in Latent Diffusion Models

    Authors: Yue Jiang, Yueming Lyu, Tianxiang Ma, Bo Peng, Jing Dong

    Abstract: Recent text-conditioned image generation models have demonstrated an exceptional capacity to produce diverse and creative imagery with high visual quality. However, when pre-trained on billion-sized datasets randomly collected from the Internet, where potential biased human preferences exist, these models tend to produce images with common and recurring stereotypes, particularly for certain racial…

    Submitted 20 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

    Comments: 16 pages, 15 figures, conference

  41. arXiv:2312.04668  [pdf, other]

    cs.CL cs.AI cs.LG

    TOD-Flow: Modeling the Structure of Task-Oriented Dialogues

    Authors: Sungryull Sohn, Yiwei Lyu, Anthony Liu, Lajanugen Logeswaran, Dong-Ki Kim, Dongsub Shim, Honglak Lee

    Abstract: Task-Oriented Dialogue (TOD) systems have become crucial components in interactive artificial intelligence applications. While recent advances have capitalized on pre-trained language models (PLMs), they exhibit limitations regarding transparency and controllability. To address these challenges, we propose a novel approach focusing on inferring the TOD-Flow graph from dialogue data annotated with…

    Submitted 7 December, 2023; originally announced December 2023.

  42. arXiv:2311.09601  [pdf, other]

    cs.AI

    Code Models are Zero-shot Precondition Reasoners

    Authors: Lajanugen Logeswaran, Sungryull Sohn, Yiwei Lyu, Anthony Zhe Liu, Dong-Ki Kim, Dongsub Shim, Moontae Lee, Honglak Lee

    Abstract: One of the fundamental skills required for an agent acting in an environment to complete tasks is the ability to understand what actions are plausible at any given point. This work explores a novel use of code representations to reason about action preconditions for sequential decision making tasks. Code representations offer the flexibility to model procedural activities and associated constraint…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: NeurIPS Foundation Models for Decision Making Workshop 2023

  43. arXiv:2311.03213  [pdf, other]

    cs.SE

    On the Model Update Strategies for Supervised Learning in AIOps Solutions

    Authors: Yingzhe Lyu, Heng Li, Zhen Ming Jiang, Ahmed E. Hassan

    Abstract: AIOps (Artificial Intelligence for IT Operations) solutions leverage the massive data produced during the operation of large-scale systems and machine learning models to assist software engineers in their system operations. As operation data produced in the field are constantly evolving due to factors such as the changing operational environment and user base, the models in AIOps solutions need to…

    Submitted 11 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  44. arXiv:2310.08855  [pdf, other]

    cs.LG

    Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation

    Authors: Yilin Lyu, Liyuan Wang, Xingxing Zhang, Zicheng Sun, Hang Su, Jun Zhu, Liping Jing

    Abstract: Continual learning entails learning a sequence of tasks and balancing their knowledge appropriately. With limited access to old training samples, much of the current work in deep neural networks has focused on overcoming catastrophic forgetting of old tasks in gradient-based optimization. However, the normalization layers provide an exception, as they are updated interdependently by the gradient a…

    Submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  45. arXiv:2310.08785  [pdf, other]

    cs.CV cs.AI

    DeltaSpace: A Semantic-aligned Feature Space for Flexible Text-guided Image Editing

    Authors: Yueming Lyu, Kang Zhao, Bo Peng, Yue Jiang, Yingya Zhang, Jing Dong

    Abstract: Text-guided image editing faces significant challenges to training and inference flexibility. Much literature collects large amounts of annotated image-text pairs to train text-conditioned generative models from scratch, which is expensive and not efficient. After that, some approaches that leverage pre-trained vision-language models are put forward to avoid data collection, but they are also limi…

    Submitted 12 October, 2023; originally announced October 2023.

    Comments: 17 pages. arXiv admin note: text overlap with arXiv:2303.06285

  46. "Because Some Sighted People, They Don't Know What the Heck You're Talking About:" A Study of Blind TikTokers' Infrastructuring Work to Build Independence

    Authors: Yao Lyu, John M. Carroll

    Abstract: There has been extensive research on the experiences of individuals with visual impairments on text- and image-based social media platforms, such as Facebook and Twitter. However, little is known about the experiences of visually impaired users on short-video platforms like TikTok. To bridge this gap, we conducted an interview study with 30 BlindTokers (the nickname of blind TikTokers). Our study…

    Submitted 11 December, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted at CSCW'24, 29 pages, 2 figures, and 2 tables

  47. arXiv:2310.04162  [pdf, other]

    cs.RO

    Light-LOAM: A Lightweight LiDAR Odometry and Mapping based on Graph-Matching

    Authors: Shiquan Yi, Yang Lyu, Lin Hua, Quan Pan, Chunhui Zhao

    Abstract: Simultaneous Localization and Mapping (SLAM) plays an important role in robot autonomy. Reliability and efficiency are the two most valued features for applying SLAM in robot applications. In this paper, we consider achieving a reliable LiDAR-based SLAM function in computation-limited platforms, such as quadrotor UAVs based on graph-based point cloud association. First, contrary to most works sele…

    Submitted 6 October, 2023; originally announced October 2023.

  48. arXiv:2309.16071  [pdf, other]

    cs.SI

    Influence Pathway Discovery on Social Media

    Authors: Xinyi Liu, Ruijie Wang, Dachun Sun, Jinning Li, Christina Youn, You Lyu, Jianyuan Zhan, Dayou Wu, Xinhe Xu, Mingjun Liu, Xinshuo Lei, Zhihao Xu, Yutong Zhang, Zehao Li, Qikai Yang, Tarek Abdelzaher

    Abstract: This paper addresses influence pathway discovery, a key emerging problem in today's online media. We propose a discovery algorithm that leverages recently published work on unsupervised interpretable ideological embedding, a mapping of ideological beliefs (done in a self-supervised fashion) into interpretable low-dimensional spaces. Computing the ideological embedding at scale allows one to analyz…

    Submitted 27 September, 2023; originally announced September 2023.

    Comments: This paper is accepted by IEEE CIC as an invited vision paper

  49. arXiv:2309.10771  [pdf, other]

    cs.HC

    Redefining Qualitative Analysis in the AI Era: Utilizing ChatGPT for Efficient Thematic Analysis

    Authors: He Zhang, Chuhao Wu, Jingyi Xie, Yao Lyu, Jie Cai, John M. Carroll

    Abstract: AI tools, particularly large-scale language model (LLM) based applications such as ChatGPT, have the potential to simplify qualitative research. Through semi-structured interviews with seventeen participants, we identified challenges and concerns in integrating ChatGPT into the qualitative analysis process. Collaborating with thirteen qualitative researchers, we developed a framework for designing…

    Submitted 27 May, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  50. arXiv:2309.09297  [pdf, other]

    cs.CV cs.RO

    Chasing Day and Night: Towards Robust and Efficient All-Day Object Detection Guided by an Event Camera

    Authors: Jiahang Cao, Xu Zheng, Yuanhuiyi Lyu, Jiaxu Wang, Renjing Xu, Lin Wang

    Abstract: The ability to detect objects in all lighting (i.e., normal-, over-, and under-exposed) conditions is crucial for real-world applications, such as self-driving. Traditional RGB-based detectors often fail under such varying lighting conditions. Therefore, recent works utilize novel event cameras to supplement or guide the RGB modality; however, these methods typically adopt asymmetric network structu…

    Submitted 18 March, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: Accepted by ICRA 2024