
Showing 1–50 of 1,679 results for author: Liu, T

Searching in archive cs.
  1. arXiv:2408.11535  [pdf, other]

    cs.CV

    SAM-REF: Rethinking Image-Prompt Synergy for Refinement in Segment Anything

    Authors: Chongkai Yu, Anqi Li, Xiaochao Qu, Luoqi Liu, Ting Liu

    Abstract: The advent of the Segment Anything Model (SAM) marks a significant milestone for interactive segmentation using generalist models. As a late fusion model, SAM extracts image embeddings once and merges them with prompts in later interactions. This strategy limits the model's ability to extract detailed information from the prompted target zone. Current specialist models utilize the early fusion stra…

    Submitted 21 August, 2024; originally announced August 2024.

  2. arXiv:2408.11431  [pdf, other]

    cs.CL cs.AI

    Diagnosing and Remedying Knowledge Deficiencies in LLMs via Label-free Curricular Meaningful Learning

    Authors: Kai Xiong, Xiao Ding, Li Du, Jiahao Ying, Ting Liu, Bing Qin, Yixin Cao

    Abstract: Large Language Models (LLMs) are versatile and demonstrate impressive generalization ability by mining and learning information from extensive unlabeled text. However, they still exhibit reasoning mistakes, often stemming from knowledge deficiencies, which can affect their trustworthiness and reliability. Although users can provide diverse and comprehensive queries, obtaining sufficient and effect…

    Submitted 21 August, 2024; originally announced August 2024.

    Comments: Under Review

  3. arXiv:2408.10627  [pdf, other]

    cs.CV

    Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended?

    Authors: Chen Liang, Qiang Guo, Xiaochao Qu, Luoqi Liu, Ting Liu

    Abstract: Video segmentation aims at partitioning video sequences into meaningful segments based on objects or regions of interest within frames. Current video segmentation models are often derived from image segmentation techniques, which struggle to cope with small-scale or class-imbalanced video datasets. This leads to inconsistent segmentation results across frames. To address these issues, we propose a…

    Submitted 20 August, 2024; originally announced August 2024.

  4. arXiv:2408.10623  [pdf, other]

    cs.CV

    TextMastero: Mastering High-Quality Scene Text Editing in Diverse Languages and Styles

    Authors: Tong Wang, Xiaochao Qu, Ting Liu

    Abstract: Scene text editing aims to modify texts on images while maintaining the style of newly generated text similar to the original. Given an image, a target area, and target text, the task produces an output image with the target text in the selected area, replacing the original. This task has been studied extensively, with initial success using Generative Adversarial Networks (GANs) to balance text fi…

    Submitted 20 August, 2024; originally announced August 2024.

  5. arXiv:2408.09916  [pdf, other]

    cs.CV cs.CL

    Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit

    Authors: Qizhou Chen, Taolin Zhang, Chengyu Wang, Xiaofeng He, Dakan Wang, Tingting Liu

    Abstract: Model editing aims to correct outdated or erroneous knowledge in large models without costly retraining. Recent research discovered that the mid-layer representation of the subject's final token in a prompt has a strong influence on factual predictions, and developed Large Language Model (LLM) editing techniques based on this observation. However, for Vision-LLMs (VLLMs), how visual representation…

    Submitted 19 August, 2024; originally announced August 2024.

  6. arXiv:2408.09819  [pdf, other]

    cs.CL cs.AI

    CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models

    Authors: Linhao Yu, Yongqi Leng, Yufei Huang, Shang Wu, Haixin Liu, Xinmeng Ji, Jiahui Zhao, Jinwang Song, Tingting Cui, Xiaoqing Cheng, Tao Liu, Deyi Xiong

    Abstract: How would a large language model (LLM) respond in an ethically relevant context? In this paper, we curate a large benchmark CMoralEval for morality evaluation of Chinese LLMs. The data sources of CMoralEval are two-fold: 1) a Chinese TV program discussing Chinese moral norms with stories from society and 2) a collection of Chinese moral anomies from various newspapers and academic papers on mora…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL 2024 (Findings)

  7. arXiv:2408.09465  [pdf, other]

    cs.CV cs.AI

    MedMAP: Promoting Incomplete Multi-modal Brain Tumor Segmentation with Alignment

    Authors: Tianyi Liu, Zhaorui Tan, Muyin Chen, Xi Yang, Haochuan Jiang, Kaizhu Huang

    Abstract: Brain tumor segmentation is often based on multiple magnetic resonance imaging (MRI) modalities. However, in clinical practice, certain modalities of MRI may be missing, which presents a more difficult scenario. To cope with this challenge, Knowledge Distillation, Domain Adaptation, and Shared Latent Space have emerged as commonly promising strategies. However, recent efforts typically overlook the modality ga…

    Submitted 18 August, 2024; originally announced August 2024.

  8. arXiv:2408.09198  [pdf, other]

    cs.RO

    Learning Based Toolpath Planner on Diverse Graphs for 3D Printing

    Authors: Yuming Huang, Yuhu Guo, Renbo Su, Xingjian Han, Junhao Ding, Tianyu Zhang, Tao Liu, Weiming Wang, Guoxin Fang, Xu Song, Emily Whiting, Charlie C. L. Wang

    Abstract: This paper presents a learning based planner for computing optimized 3D printing toolpaths on prescribed graphs, the challenges of which include the varying graph structures on different models and the large scale of nodes & edges on a graph. We adopt an on-the-fly strategy to tackle these challenges, formulating the planner as a Deep Q-Network (DQN) based optimizer to decide the next 'best' node…

    Submitted 17 August, 2024; originally announced August 2024.

  9. arXiv:2408.08323  [pdf, other]

    cs.HC

    Exploring Urban Comfort through Novel Wearables and Environmental Surveys

    Authors: Patrick Chwalek, Sailin Zhong, Nathan Perry, Tianqi Liu, Clayton Miller, Hamed Seiied Alavi, Denis Lalanne, Joseph A. Paradiso

    Abstract: This study presents a comprehensive dataset capturing indoor environmental parameters, physiological responses, and subjective perceptions across three global cities. Utilizing wearable sensors, including smart eyeglasses, and a modified Cozie app, environmental and physiological data were collected, along with pre-screening, onboarding, and recurring surveys. Peripheral cues facilitated participa…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Submitted to Nature Scientific Data

  10. arXiv:2408.07967  [pdf, other]

    cs.CV

    FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering

    Authors: Guofeng Feng, Siyan Chen, Rong Fu, Zimu Liao, Yi Wang, Tao Liu, Zhilin Pei, Hengjie Li, Xingcheng Zhang, Bo Dai

    Abstract: This work introduces FlashGS, an open-source CUDA Python library, designed to facilitate the efficient differentiable rasterization of 3D Gaussian Splatting through algorithmic and kernel-level optimizations. FlashGS is developed based on the observations from a comprehensive analysis of the rendering process to enhance computational efficiency and bring the technique to wide adoption. The paper i…

    Submitted 19 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  11. Learning Rule-Induced Subgraph Representations for Inductive Relation Prediction

    Authors: Tianyu Liu, Qitan Lv, Jie Wang, Shuling Yang, Hanzhu Chen

    Abstract: Inductive relation prediction (IRP) -- where entities can be different during training and inference -- has shown great power for completing evolving knowledge graphs. Existing works mainly focus on using graph neural networks (GNNs) to learn the representation of the subgraph induced from the target link, which can be seen as an implicit rule-mining process to measure the plausibility of the targ…

    Submitted 20 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Journal ref: Advances in Neural Information Processing Systems 36 (2024)

  12. arXiv:2408.06072  [pdf, other]

    cs.CV

    CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

    Authors: Zhuoyi Yang, Jiayan Teng, Wendi Zheng, Ming Ding, Shiyu Huang, Jiazheng Xu, Yuanming Yang, Wenyi Hong, Xiaohan Zhang, Guanyu Feng, Da Yin, Xiaotao Gu, Yuxuan Zhang, Weihan Wang, Yean Cheng, Ting Liu, Bin Xu, Yuxiao Dong, Jie Tang

    Abstract: We introduce CogVideoX, a large-scale diffusion transformer model designed for generating videos based on text prompts. To efficiently model video data, we propose to leverage a 3D Variational Autoencoder (VAE) to compress videos along both spatial and temporal dimensions. To improve the text-video alignment, we propose an expert transformer with the expert adaptive LayerNorm to facilitate the deep…

    Submitted 12 August, 2024; originally announced August 2024.

  13. arXiv:2408.04835  [pdf, other]

    cs.NI

    Next-Generation Wi-Fi Networks with Generative AI: Design and Insights

    Authors: Jingyu Wang, Xuming Fang, Dusit Niyato, Tie Liu

    Abstract: Generative artificial intelligence (GAI), known for its powerful capabilities in image and text processing, also holds significant promise for the design and performance enhancement of future wireless networks. In this article, we explore the transformative potential of GAI in next-generation Wi-Fi networks, exploiting its advanced capabilities to address key challenges and improve overall network…

    Submitted 8 August, 2024; originally announced August 2024.

  14. arXiv:2408.04583  [pdf, other]

    cs.LG cs.AI

    Unveiling the Power of Sparse Neural Networks for Feature Selection

    Authors: Zahra Atashgahi, Tennison Liu, Mykola Pechenizkiy, Raymond Veldhuis, Decebal Constantin Mocanu, Mihaela van der Schaar

    Abstract: Sparse Neural Networks (SNNs) have emerged as powerful tools for efficient feature selection. Leveraging the dynamic sparse training (DST) algorithms within SNNs has demonstrated promising feature selection capabilities while drastically reducing computational overheads. Despite these advancements, several critical aspects remain insufficiently explored for feature selection. Questions persist reg…

    Submitted 8 August, 2024; originally announced August 2024.

  15. arXiv:2408.04131  [pdf, other]

    cs.LG

    Heterogeneous Graph Sequence Neural Networks for Dynamic Traffic Assignment

    Authors: Tong Liu, Hadi Meidani

    Abstract: Traffic assignment and traffic flow prediction provide critical insights for urban planning, traffic management, and the development of intelligent transportation systems. An efficient model for calculating traffic flows over the entire transportation network could provide a more detailed and realistic understanding of traffic dynamics. However, existing traffic prediction approaches, such as thos…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 9 pages, 5 figures

  16. arXiv:2408.04034  [pdf, other]

    cs.CV

    Task-oriented Sequential Grounding in 3D Scenes

    Authors: Zhuofan Zhang, Ziyu Zhu, Pengxiang Li, Tengyu Liu, Xiaojian Ma, Yixin Chen, Baoxiong Jia, Siyuan Huang, Qing Li

    Abstract: Grounding natural language in physical 3D environments is essential for the advancement of embodied artificial intelligence. Current datasets and models for 3D visual grounding predominantly focus on identifying and localizing objects from static, object-centric descriptions. These approaches do not adequately address the dynamic and sequential nature of task-oriented grounding necessary for pract…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: website: https://sg-3d.github.io/

  17. AdapMTL: Adaptive Pruning Framework for Multitask Learning Model

    Authors: Mingcan Xiang, Steven Jiaxun Tang, Qizheng Yang, Hui Guan, Tongping Liu

    Abstract: In the domain of multimedia and multimodal processing, the efficient handling of diverse data streams such as images, video, and sensor data is paramount. Model compression and multitask learning (MTL) are crucial in this field, offering the potential to address the resource-intensive demands of processing and interpreting multiple forms of media simultaneously. However, effectively compressing a…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures, Published at ACM Multimedia (ACM MM) 2024

  18. arXiv:2408.03675  [pdf, other]

    cs.CL

    NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time

    Authors: Yilong Chen, Guoxia Wang, Junyuan Shang, Shiyao Cui, Zhenyu Zhang, Tingwen Liu, Shuohuan Wang, Yu Sun, Dianhai Yu, Hua Wu

    Abstract: Large Language Models (LLMs) have ignited an innovative surge of AI applications, marking a new era of exciting possibilities equipped with extended context windows. However, hosting these models is cost-prohibitive mainly due to the extensive memory consumption of KV Cache involving long-context modeling. Despite several works proposing to evict unnecessary tokens from the KV Cache, most of them…

    Submitted 7 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL 2024 (main conference, long paper)

  19. arXiv:2408.03482  [pdf, other]

    cs.CR

    Beyond App Markets: Demystifying Underground Mobile App Distribution Via Telegram

    Authors: Yanhui Guo, Dong Wang, Liu Wang, Yongsheng Fang, Chao Wang, Minghui Yang, Tianming Liu, Haoyu Wang

    Abstract: The thriving mobile app ecosystem encompasses a wide range of functionalities. However, within this ecosystem, a subset of apps provides illicit services such as gambling and pornography to pursue economic gains, collectively referred to as "underground economy apps". While previous studies have examined these apps' characteristics and identification methods, investigations into their distribution…

    Submitted 6 August, 2024; originally announced August 2024.

  20. arXiv:2408.03359  [pdf, other]

    cs.LG cs.AI cs.CL

    LAMPO: Large Language Models as Preference Machines for Few-shot Ordinal Classification

    Authors: Zhen Qin, Junru Wu, Jiaming Shen, Tianqi Liu, Xuanhui Wang

    Abstract: We introduce LAMPO, a novel paradigm that leverages Large Language Models (LLMs) for solving few-shot multi-class ordinal classification tasks. Unlike conventional methods, which concatenate all demonstration examples with the test instance and prompt LLMs to produce the pointwise prediction, our framework uses the LLM as a preference machine that makes a relative comparative decision between the…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: COLM 2024

  21. arXiv:2408.03286  [pdf, other]

    cs.CV

    Biomedical SAM 2: Segment Anything in Biomedical Images and Videos

    Authors: Zhiling Yan, Weixiang Sun, Rong Zhou, Zhengqing Yuan, Kai Zhang, Yiwei Li, Tianming Liu, Quanzheng Li, Xiang Li, Lifang He, Lichao Sun

    Abstract: Medical image segmentation and video object segmentation are essential for diagnosing and analyzing diseases by identifying and measuring biological structures. Recent advances in the natural domain have been driven by foundation models like the Segment Anything Model 2 (SAM-2). To explore the performance of SAM-2 in biomedical applications, we designed three evaluation pipelines for single-frame 2D i…

    Submitted 17 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  22. arXiv:2408.03079  [pdf, other]

    cs.CL cs.AI

    Enhancing Complex Causality Extraction via Improved Subtask Interaction and Knowledge Fusion

    Authors: Jinglong Gao, Chen Lu, Xiao Ding, Zhongyang Li, Ting Liu, Bing Qin

    Abstract: Event Causality Extraction (ECE) aims at extracting causal event pairs from texts. Despite ChatGPT's recent success, fine-tuning small models remains the best approach for the ECE task. However, existing fine-tuning based ECE methods cannot address all three key challenges in ECE simultaneously: 1) Complex Causality Extraction, where multiple causal-effect pairs occur within a single sentence; 2)…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: NLPCC 2024 Oral

  23. arXiv:2408.01607  [pdf]

    cs.CV cs.LG

    Deep Learning Meets OBIA: Tasks, Challenges, Strategies, and Perspectives

    Authors: Lei Ma, Ziyun Yan, Mengmeng Li, Tao Liu, Liqin Tan, Xuan Wang, Weiqiang He, Ruikun Wang, Guangjun He, Heng Lu, Thomas Blaschke

    Abstract: Deep learning has gained significant attention in remote sensing, especially in pixel- or patch-level applications. Despite initial attempts to integrate deep learning into object-based image analysis (OBIA), its full potential remains largely unexplored. In this article, as OBIA usage becomes more widespread, we conducted a comprehensive review and expansion of its task subdomains, with or withou…

    Submitted 2 August, 2024; originally announced August 2024.

  24. arXiv:2408.01370  [pdf, other]

    cs.CV cs.RO

    EVIT: Event-based Visual-Inertial Tracking in Semi-Dense Maps Using Windowed Nonlinear Optimization

    Authors: Runze Yuan, Tao Liu, Zijia Dai, Yi-Fan Zuo, Laurent Kneip

    Abstract: Event cameras are an interesting visual exteroceptive sensor that reacts to brightness changes rather than integrating absolute image intensities. Owing to this design, the sensor exhibits strong performance in situations of challenging dynamics and illumination conditions. While event-based simultaneous tracking and mapping remains a challenging problem, a number of recent works have pointed out…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 8 pages, 5 figures, 3 tables, International Conference on Intelligent Robots and Systems 2024

  25. arXiv:2408.01319  [pdf, other]

    cs.AI

    A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks

    Authors: Jiaqi Wang, Hanqi Jiang, Yiheng Liu, Chong Ma, Xu Zhang, Yi Pan, Mengyuan Liu, Peiran Gu, Sichen Xia, Wenjun Li, Yutong Zhang, Zihao Wu, Zhengliang Liu, Tianyang Zhong, Bao Ge, Tuo Zhang, Ning Qiang, Xintao Hu, Xi Jiang, Xin Zhang, Wei Zhang, Dinggang Shen, Tianming Liu, Shu Zhang

    Abstract: In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data types, including text, images, videos, audio, and physiological sequences, MLLMs address the complexities of real-world applications far beyond the capabilities of…

    Submitted 2 August, 2024; originally announced August 2024.

  26. arXiv:2408.01276  [pdf, other]

    cs.CV

    Wave-Mamba: Wavelet State Space Model for Ultra-High-Definition Low-Light Image Enhancement

    Authors: Wenbin Zou, Hongxia Gao, Weipeng Yang, Tongtong Liu

    Abstract: Ultra-high-definition (UHD) technology has attracted widespread attention due to its exceptional visual quality, but it also poses new challenges for low-light image enhancement (LLIE) techniques. UHD images inherently possess high computational complexity, leading existing UHD LLIE methods to employ high-magnification downsampling to reduce computational costs, which in turn results in informatio…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 10 pages, 8 figures, ACMMM2024 accepted

  27. arXiv:2408.00278  [pdf, other]

    cs.LG cs.AI cs.NE

    High Performance Im2win and Direct Convolutions using Three Tensor Layouts on SIMD Architectures

    Authors: Xiang Fu, Xinpeng Zhang, Jixiang Ma, Peng Zhao, Shuai Lu, Xu T. Liu

    Abstract: Convolution is the core component within deep neural networks and it is computationally intensive and time consuming. Tensor data layouts significantly impact convolution operations in terms of memory access and computational efficiency. Yet, there is still a lack of comprehensive performance characterization on data layouts on SIMD architectures concerning convolution methods. This paper proposes…

    Submitted 1 August, 2024; originally announced August 2024.

  28. arXiv:2407.21057  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-group Uncertainty Quantification for Long-form Text Generation

    Authors: Terrance Liu, Zhiwei Steven Wu

    Abstract: While large language models are rapidly moving towards consumer-facing applications, they are often still prone to factual errors and hallucinations. In order to reduce the potential harms that may come from these errors, it is important for users to know to what extent they can trust an LLM when it makes a factual claim. To this end, we study the problem of uncertainty quantification of factual c…

    Submitted 24 July, 2024; originally announced July 2024.

  29. arXiv:2407.20522  [pdf, other]

    cs.HC

    Evaluating Fairness in Black-box Algorithmic Markets: A Case Study of Ride Sharing in Chicago

    Authors: Yuhan Liu, Yuhan Zheng, Siyuan Zhang, Lydia T. Liu

    Abstract: This study examines fairness within the rideshare industry, focusing on both drivers' wages and riders' trip fares. Through quantitative analysis, we found that drivers' hourly wages are significantly influenced by factors such as race/ethnicity, health insurance status, tenure to the platform, and working hours. Despite platforms' policies not intentionally embedding biases, disparities persist b…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Accepted to the Humans, Algorithmic Decision-Making and Society: Modeling Interactions and Impact, co-located with the International Conference on Machine Learning, Vienna, Austria

  30. arXiv:2407.19625  [pdf, other]

    cs.CL cs.MM

    LoginMEA: Local-to-Global Interaction Network for Multi-modal Entity Alignment

    Authors: Taoyu Su, Xinghua Zhang, Jiawei Sheng, Zhenyu Zhang, Tingwen Liu

    Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities between two multi-modal knowledge graphs (MMKGs), whose entities can be associated with relational triples and related images. Most previous studies treat the graph structure as a special modality, and fuse different modality information with separate uni-modal encoders, neglecting valuable relational associations in modaliti…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted by ECAI 2024

  31. arXiv:2407.19389  [pdf, other]

    cs.DC cs.LG math.OC

    FIARSE: Model-Heterogeneous Federated Learning via Importance-Aware Submodel Extraction

    Authors: Feijie Wu, Xingchen Wang, Yaqing Wang, Tianci Liu, Lu Su, Jing Gao

    Abstract: In federated learning (FL), accommodating clients' varied computational capacities poses a challenge, often limiting the participation of those with constrained resources in global model training. To address this issue, the concept of model heterogeneity through submodel extraction has emerged, offering a tailored solution that aligns the model's complexity with each client's computational capacit…

    Submitted 28 July, 2024; originally announced July 2024.

  32. arXiv:2407.19302  [pdf, other]

    cs.CL cs.MM

    IBMEA: Exploring Variational Information Bottleneck for Multi-modal Entity Alignment

    Authors: Taoyu Su, Jiawei Sheng, Shicheng Wang, Xinghua Zhang, Hongbo Xu, Tingwen Liu

    Abstract: Multi-modal entity alignment (MMEA) aims to identify equivalent entities between multi-modal knowledge graphs (MMKGs), where the entities can be associated with related images. Most existing studies integrate multi-modal information heavily relying on the automatically-learned fusion module, rarely suppressing the redundant information for MMEA explicitly. To this end, we explore variational infor…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  33. arXiv:2407.18523  [pdf, other]

    cs.LG

    DTFormer: A Transformer-Based Method for Discrete-Time Dynamic Graph Representation Learning

    Authors: Xi Chen, Yun Xiong, Siwei Zhang, Jiawei Zhang, Yao Zhang, Shiyang Zhou, Xixi Wu, Mingyang Zhang, Tengfei Liu, Weiqiang Wang

    Abstract: Discrete-Time Dynamic Graphs (DTDGs), which are prevalent in real-world implementations and notable for their ease of data acquisition, have garnered considerable attention from both academic researchers and industry practitioners. The representation learning of DTDGs has been extensively applied to model the dynamics of temporally changing entities and their evolving connections. Currently, DTDG…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

  34. arXiv:2407.18064  [pdf, other]

    cs.HC

    ComPeer: A Generative Conversational Agent for Proactive Peer Support

    Authors: Tianjian Liu, Hongzheng Zhao, Yuheng Liu, Xingbo Wang, Zhenhui Peng

    Abstract: Conversational Agents (CAs) acting as peer supporters have been widely studied and shown to be beneficial for people's mental health. However, previous peer support CAs either are user-initiated or follow predefined rules to initiate the conversations, which may discourage users from engaging and building relationships with the CAs for long-term benefits. In this paper, we develop ComPeer, a generative…

    Submitted 5 August, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: To appear at the 2024 ACM Symposium on User Interface Software and Technology (UIST); 22 pages (7 figures, 7 tables)

  35. arXiv:2407.16008  [pdf, other]

    cs.CL

    Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation

    Authors: Jiaming Shen, Ran Xu, Yennie Jun, Zhen Qin, Tianqi Liu, Carl Yang, Yi Liang, Simon Baumgartner, Michael Bendersky

    Abstract: Reward models (RMs) are crucial for aligning large language models (LLMs) with human preferences. They are trained using preference datasets where each example consists of one input prompt, two responses, and a preference label. As curating a high-quality human labeled preference dataset is both time-consuming and expensive, people often rely on existing powerful LLMs for preference label generati…

    Submitted 22 July, 2024; originally announced July 2024.

  36. arXiv:2407.15975  [pdf, other]

    cs.CL

    Multilingual Fine-Grained News Headline Hallucination Detection

    Authors: Jiaming Shen, Tianqi Liu, Jialu Liu, Zhen Qin, Jay Pavagadhi, Simon Baumgartner, Michael Bendersky

    Abstract: The popularity of automated news headline generation has surged with advancements in pre-trained language models. However, these models often suffer from the "hallucination" problem, where the generated headline is not fully supported by its source article. Efforts to address this issue have predominantly focused on English, using over-simplistic classification schemes that overlook nuanced hall…

    Submitted 22 July, 2024; originally announced July 2024.

  37. arXiv:2407.15791  [pdf, other]

    cs.CV

    RADA: Robust and Accurate Feature Learning with Domain Adaptation

    Authors: Jingtai He, Gehao Zhang, Tingting Liu, Songlin Du

    Abstract: Recent advancements in keypoint detection and descriptor extraction have shown impressive performance in local feature learning tasks. However, existing methods generally exhibit suboptimal performance under extreme conditions such as significant appearance changes and domain shifts. In this study, we introduce a multi-level feature aggregation network that incorporates two pivotal components to f…

    Submitted 22 July, 2024; originally announced July 2024.

  38. arXiv:2407.15683  [pdf, other]

    cs.CV

    Enhancing Transferability of Targeted Adversarial Examples: A Self-Universal Perspective

    Authors: Bowen Peng, Li Liu, Tianpeng Liu, Zhen Liu, Yongxiang Liu

    Abstract: Transfer-based targeted adversarial attacks against black-box deep neural networks (DNNs) have been proven to be significantly more challenging than untargeted ones. The impressive transferability of current SOTA, the generative methods, comes at the cost of requiring massive amounts of additional data and time-consuming training for each targeted label. This results in limited efficiency and flex…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 8 pages and 9 figures

  39. arXiv:2407.13642  [pdf, other]

    cs.CV

    Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models

    Authors: Xiaoyu Zhu, Hao Zhou, Pengfei Xing, Long Zhao, Hao Xu, Junwei Liang, Alexander Hauptmann, Ting Liu, Andrew Gallagher

    Abstract: In this paper, we investigate the use of diffusion models which are pre-trained on large-scale image-caption pairs for open-vocabulary 3D semantic understanding. We propose a novel method, namely Diff2Scene, which leverages frozen representations from text-image generative models, along with salient-aware and geometric-aware masks, for open-vocabulary 3D semantic segmentation and visual grounding…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  40. arXiv:2407.12448  [pdf, other]

    cs.LG

    Energy-Guided Diffusion Sampling for Offline-to-Online Reinforcement Learning

    Authors: Xu-Hui Liu, Tian-Shuo Liu, Shengyi Jiang, Ruifeng Chen, Zhilong Zhang, Xinwei Chen, Yang Yu

    Abstract: Combining offline and online reinforcement learning (RL) techniques is indeed crucial for achieving efficient and safe learning where data acquisition is expensive. Existing methods replay offline data directly in the online phase, resulting in a significant challenge of data distribution shift and subsequently causing inefficiency in online fine-tuning. To address this issue, we introduce an inno…

    Submitted 17 July, 2024; originally announced July 2024.

  41. arXiv:2407.11481  [pdf, other]

    cs.LG cs.AI eess.SP

    Multi-Channel Masked Autoencoder and Comprehensive Evaluations for Reconstructing 12-Lead ECG from Arbitrary Single-Lead ECG

    Authors: Jiarong Chen, Wanqing Wu, Tong Liu, Shenda Hong

    Abstract: In the context of cardiovascular diseases (CVD) that exhibit an elevated prevalence and mortality, the electrocardiogram (ECG) is a popular and standard diagnostic tool for doctors, commonly utilizing a 12-lead configuration in clinical practice. However, the 10 electrodes placed on the surface would cause a lot of inconvenience and discomfort, while the rapidly advancing wearable devices adopt th…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by KDD-AIDSH 2024

  42. arXiv:2407.10671  [pdf, other]

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, et al. (37 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 17 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  43. arXiv:2407.10181  [pdf, other]

    cs.CV

    Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures

    Authors: Jiaqi He, Zhihua Wang, Leon Wang, Tsein-I Liu, Yuming Fang, Qilin Sun, Kede Ma

    Abstract: Contemporary color difference (CD) measures for photographic images typically operate by comparing co-located pixels, patches in a "perceptually uniform" color space, or features in a learned latent space. Consequently, these measures inadequately capture the human color perception of misaligned image pairs, which are prevalent in digital photography (e.g., the same scene captured by different s…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  44. arXiv:2407.10132  [pdf, other]

    cs.LG stat.ME

    Optimal Kernel Choice for Score Function-based Causal Discovery

    Authors: Wenjie Wang, Biwei Huang, Feng Liu, Xinge You, Tongliang Liu, Kun Zhang, Mingming Gong

    Abstract: Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appr…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: Accepted by ICML 2024

  45. arXiv:2407.10105  [pdf, other]

    cs.CV cs.AI

    Hierarchical Multi-modal Transformer for Cross-modal Long Document Classification

    Authors: Tengfei Liu, Yongli Hu, Junbin Gao, Yanfeng Sun, Baocai Yin

    Abstract: Long Document Classification (LDC) has gained significant attention recently. However, multi-modal data in long documents such as texts and images are not being effectively utilized. Prior studies in this area have attempted to integrate texts and images in document-related tasks, but they have only focused on short text sequences and images of pages. How to classify long documents with hierarchic…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: IEEE Transactions on Multimedia

  46. arXiv:2407.09509  [pdf, other]

    q-bio.NC cs.HC

    Brain Dialogue Interface (BDI): A User-Friendly fMRI Model for Interactive Brain Decoding

    Authors: Heng Huang, Lin Zhao, Zihao Wu, Xiaowei Yu, Jing Zhang, Xintao Hu, Dajiang Zhu, Tianming Liu

    Abstract: Brain decoding techniques are essential for understanding the neurocognitive system. Although numerous methods have been introduced in this field, accurately aligning complex external stimuli with brain activities remains a formidable challenge. To alleviate alignment difficulties, many studies have simplified their models by employing single-task paradigms and establishing direct links between br…

    Submitted 17 June, 2024; originally announced July 2024.

  47. arXiv:2407.08937  [pdf, other]

    cs.CL cs.AI

    Self-Evolving GPT: A Lifelong Autonomous Experiential Learner

    Authors: Jinglong Gao, Xiao Ding, Yiming Cui, Jianbai Zhao, Hepeng Wang, Ting Liu, Bing Qin

    Abstract: To improve the performance of large language models (LLMs), researchers have explored providing LLMs with textual task-solving experience via prompts. However, they rely on manual efforts to acquire and apply such experience for each task, which is not feasible for the growing demand for LLMs and the variety of user questions. To address this issue, we design a lifelong autonomous experiential lea…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ACL 2024 MAIN

  48. arXiv:2407.05758  [pdf, other]

    eess.IV cs.AI cs.CV

    Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports

    Authors: Yutong Zhang, Yi Pan, Tianyang Zhong, Peixin Dong, Kangni Xie, Yuxiao Liu, Hanqi Jiang, Zhengliang Liu, Shijie Zhao, Tuo Zhang, Xi Jiang, Dinggang Shen, Tianming Liu, Xin Zhang

    Abstract: Medical images and radiology reports are crucial for diagnosing medical conditions, highlighting the importance of quantitative analysis for clinical decision-making. However, the diversity and cross-source heterogeneity of these data challenge the generalizability of current data-mining methods. Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecti…

    Submitted 8 July, 2024; originally announced July 2024.

  49. arXiv:2407.05573  [pdf]

    cs.CV

    Spatio-Temporal Encoding and Decoding-Based Method for Future Human Activity Skeleton Synthesis

    Authors: Tingyu Liu, Jun Huang, Chenyi Weng

    Abstract: Inferring future activity information based on observed activity data is a crucial step to improve the accuracy of early activity prediction. Traditional methods based on generative adversarial networks (GAN) or joint learning frameworks can achieve good prediction accuracy under low observation ratios, but they usually have high computational costs. In view of this, this paper proposes a spatio-te…

    Submitted 7 July, 2024; originally announced July 2024.

  50. arXiv:2407.04973  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts

    Authors: Yijia Xiao, Edward Sun, Tianyu Liu, Wei Wang

    Abstract: We propose LogicVista, an evaluation benchmark that assesses the integrated logical reasoning capabilities of multimodal large language models (MLLMs) in visual contexts. Recent advancements in MLLMs have demonstrated various fascinating abilities, from crafting poetry based on an image to performing mathematical reasoning. However, there is still a lack of systematic evaluation of MLLMs' proficie…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: LogicVista benchmarks the logical reasoning of multimodal large language models in visual tasks