Showing 1–50 of 276 results for author: Tian, Z

  1. arXiv:2408.08242  [pdf, ps, other]

    cs.RO cs.AI cs.LG eess.SY

    A Conflicts-free, Speed-lossless KAN-based Reinforcement Learning Decision System for Interactive Driving in Roundabouts

    Authors: Zhihao Lin, Zhen Tian, Qi Zhang, Ziyang Ye, Hanyang Zhuang, Jianglin Lan

    Abstract: Safety and efficiency are crucial for autonomous driving in roundabouts, especially in the context of mixed traffic where autonomous vehicles (AVs) and human-driven vehicles coexist. This paper introduces a learning-based algorithm tailored to foster safe and efficient driving behaviors across varying levels of traffic flows in roundabouts. The proposed algorithm employs a deep Q-learning network…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 15 pages, 12 figures, submitted to an IEEE journal

  2. arXiv:2408.01760  [pdf, other]

    cs.SE

    Large Language Models for Equivalent Mutant Detection: How Far Are We?

    Authors: Zhao Tian, Honglin Shu, Dong Wang, Xuejie Cao, Yasutaka Kamei, Junjie Chen

    Abstract: Mutation testing is vital for ensuring software quality. However, the presence of equivalent mutants is known to introduce redundant cost and bias issues, hindering the effectiveness of mutation testing in practical use. Although numerous equivalent mutant detection (EMD) techniques have been proposed, they exhibit limitations due to the scarcity of training data and challenges in generalizing to…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by ISSTA'2024

  3. arXiv:2407.21043  [pdf, other]

    cs.CL cs.AI cs.LG

    CP-Prompt: Composition-Based Cross-modal Prompting for Domain-Incremental Continual Learning

    Authors: Yu Feng, Zhen Tian, Yifan Zhu, Zongfu Han, Haoran Luo, Guangwei Zhang, Meina Song

    Abstract: The key challenge of cross-modal domain-incremental learning (DIL) is to enable the learning model to continuously learn from novel data with different feature distributions under the same task without forgetting old ones. However, existing top-performing methods still cause high forgetting rates, by lacking intra-domain knowledge extraction and inter-domain common prompting strategy. In this pape…

    Submitted 2 August, 2024; v1 submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  4. arXiv:2407.20962  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

    Authors: Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo

    Abstract: Massive multi-modality datasets play a significant role in facilitating the success of large video-language models. However, current video-language datasets primarily provide text descriptions for visual frames, considering audio to be weakly related information. They usually overlook exploring the potential of inherent audio-visual correlation, leading to monotonous annotation within each modalit…

    Submitted 6 August, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

    Comments: 15 Pages. Dataset report

  5. arXiv:2407.08268  [pdf, other]

    cs.CV

    Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation

    Authors: Tong Shao, Zhuotao Tian, Hang Zhao, Jingyong Su

    Abstract: CLIP, as a vision-language model, has significantly advanced Open-Vocabulary Semantic Segmentation (OVSS) with its zero-shot capabilities. Despite its success, its application to OVSS faces challenges due to its initial image-level alignment training, which affects its performance in tasks requiring detailed local context. Our study delves into the impact of CLIP's [CLS] token on patch feature cor…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV24 accepted

  6. arXiv:2407.05342  [pdf, other]

    cs.CV

    Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models

    Authors: Longxiang Tang, Zhuotao Tian, Kai Li, Chunming He, Hantao Zhou, Hengshuang Zhao, Xiu Li, Jiaya Jia

    Abstract: This study addresses the Domain-Class Incremental Learning problem, a realistic but challenging continual learning scenario where both the domain distribution and target classes vary across tasks. To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability. However, this incurs a new problem: the knowledge encoded in the pre-trained VLM…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  7. arXiv:2407.02073  [pdf, other]

    cs.LG

    Contribution Evaluation of Heterogeneous Participants in Federated Learning via Prototypical Representations

    Authors: Qi Guo, Minghao Yao, Zhen Tian, Saiyu Qi, Yong Qi, Yun Lin, Jin Song Dong

    Abstract: Contribution evaluation in federated learning (FL) has become a pivotal research area due to its applicability across various domains, such as detecting low-quality datasets, enhancing model robustness, and designing incentive mechanisms. Existing contribution evaluation methods, which primarily rely on data volume, model similarity, and auxiliary test datasets, have shown success in diverse scena…

    Submitted 2 July, 2024; originally announced July 2024.

  8. arXiv:2406.18629  [pdf, other]

    cs.LG cs.AI cs.CL

    Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs

    Authors: Xin Lai, Zhuotao Tian, Yukang Chen, Senqiao Yang, Xiangru Peng, Jiaya Jia

    Abstract: Mathematical reasoning presents a significant challenge for Large Language Models (LLMs) due to the extensive and precise chain of reasoning required for accuracy. Ensuring the correctness of each reasoning step is critical. To address this, we aim to enhance the robustness and factuality of LLMs by learning from human feedback. However, Direct Preference Optimization (DPO) has shown limited benef…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Code, data, and models are available at https://github.com/dvlab-research/Step-DPO

  9. arXiv:2406.18301  [pdf, other]

    eess.AS cs.CL cs.SD

    MSR-86K: An Evolving, Multilingual Corpus with 86,300 Hours of Transcribed Audio for Speech Recognition Research

    Authors: Song Li, Yongbin You, Xuezhi Wang, Zhengkun Tian, Ke Ding, Guanglu Wan

    Abstract: Recently, multilingual artificial intelligence assistants, exemplified by ChatGPT, have gained immense popularity. As a crucial gateway to human-computer interaction, multilingual automatic speech recognition (ASR) has also garnered significant attention, as evidenced by systems like Whisper. However, the proprietary nature of the training data has impeded researchers' efforts to study multilingua…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by InterSpeech 2024

  10. arXiv:2406.17795  [pdf, other]

    cs.CV cs.GR

    RACon: Retrieval-Augmented Simulated Character Locomotion Control

    Authors: Yuxuan Mu, Shihao Zou, Kangning Yin, Zheng Tian, Li Cheng, Weinan Zhang, Jun Wang

    Abstract: In computer animation, driving a simulated character with lifelike motion is challenging. Current generative models, though able to generalize to diverse motions, often pose challenges to the responsiveness of end-user control. To address these issues, we introduce RACon: Retrieval-Augmented Simulated Character Locomotion Control. Our end-to-end hierarchical reinforcement learning method utilizes…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted in ICME2024 for oral presentation

  11. arXiv:2406.17147  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Quantifying Heterogeneous Ecosystem Services With Multi-Label Soft Classification

    Authors: Zhihui Tian, John Upchurch, G. Austin Simon, José Dubeux, Alina Zare, Chang Zhao, Joel B. Harley

    Abstract: Understanding and quantifying ecosystem services are crucial for sustainable environmental management, conservation efforts, and policy-making. The advancement of remote sensing technology and machine learning techniques has greatly facilitated this process. Yet, ground truth labels, such as biodiversity, are very difficult and expensive to measure. In addition, more easily obtainable proxy labels…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  12. arXiv:2406.04321  [pdf, other]

    cs.CV cs.LG cs.MM cs.SD

    VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

    Authors: Zeyue Tian, Zhaoyang Liu, Ruibin Yuan, Jiahao Pan, Xiaoqiang Huang, Qifeng Liu, Xu Tan, Qifeng Chen, Wei Xue, Yike Guo

    Abstract: In this work, we systematically study music generation conditioned solely on the video. First, we present a large-scale dataset comprising 190K video-music pairs, including various genres such as movie trailers, advertisements, and documentaries. Furthermore, we propose VidMuse, a simple framework for generating music aligned with video inputs. VidMuse stands out by producing high-fidelity music t…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: The code and datasets will be available at https://github.com/ZeyueT/VidMuse/

  13. arXiv:2405.19334  [pdf, other]

    cs.AI cs.CL cs.CV cs.MM cs.SD

    LLMs Meet Multimodal Generation and Editing: A Survey

    Authors: Yingqing He, Zhaoyang Liu, Jingye Chen, Zeyue Tian, Hongyu Liu, Xiaowei Chi, Runtao Liu, Ruibin Yuan, Yazhou Xing, Wenhai Wang, Jifeng Dai, Yong Zhang, Wei Xue, Qifeng Liu, Yike Guo, Qifeng Chen

    Abstract: With the recent advancement in large language models (LLMs), there is a growing interest in combining LLMs with multimodal learning. Previous surveys of multimodal large language models (MLLMs) mainly focus on multimodal understanding. This survey elaborates on multimodal generation and editing across various domains, comprising image, video, 3D, and audio. Specifically, we summarize the notable a…

    Submitted 9 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

    Comments: 52 Pages with 16 Figures, 12 Tables, and 545 References. GitHub Repository at: https://github.com/YingqingHe/Awesome-LLMs-meet-Multimodal-Generation

  14. arXiv:2405.18009  [pdf, other]

    cs.CL cs.LG

    Exploring Context Window of Large Language Models via Decomposed Positional Vectors

    Authors: Zican Dong, Junyi Li, Xin Men, Wayne Xin Zhao, Bingbing Wang, Zhen Tian, Weipeng Chen, Ji-Rong Wen

    Abstract: Transformer-based large language models (LLMs) typically have a limited context window, resulting in significant performance degradation when processing text beyond the length of the context window. Extensive studies have been proposed to extend the context window and achieve length extrapolation of LLMs, but there is still a lack of in-depth interpretation of these approaches. In this study, we e…

    Submitted 28 May, 2024; originally announced May 2024.

  15. arXiv:2405.16965  [pdf, ps, other]

    cs.IT eess.SP

    Timeliness of Status Update System: The Effect of Parallel Transmission Using Heterogeneous Updating Devices

    Authors: Zhengchuan Chen, Kang Lang, Nikolaos Pappas, Howard H. Yang, Min Wang, Zhong Tian, Tony Q. S. Quek

    Abstract: Timely status updating is the premise of emerging interaction-based applications in the Internet of Things (IoT). Using redundant devices to update the status of interest is a promising method to improve the timeliness of information. However, parallel status updating leads to out-of-order arrivals at the monitor, significantly challenging timeliness analysis. This work studies the Age of Informat…

    Submitted 27 May, 2024; originally announced May 2024.

  16. arXiv:2405.15414  [pdf, other]

    cs.AI

    Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification

    Authors: Yuxuan Guo, Shaohui Peng, Jiaming Guo, Di Huang, Xishan Zhang, Rui Zhang, Yifan Hao, Ling Li, Zikang Tian, Mingju Gao, Yutai Li, Yiming Gan, Shuai Liang, Zihao Zhang, Zidong Du, Qi Guo, Xing Hu, Yunji Chen

    Abstract: Building open agents has always been the ultimate goal in AI research, and creative agents are the more enticing. Existing LLM agents excel at long-horizon tasks with well-defined goals (e.g., 'mine diamonds' in Minecraft). However, they encounter difficulties on creative tasks with open goals and abstract criteria due to the inability to bridge the gap between them, thus lacking feedback for self…

    Submitted 24 May, 2024; originally announced May 2024.

  17. arXiv:2405.14383  [pdf, other]

    cs.CL cs.AI

    Perception of Knowledge Boundary for Large Language Models through Semi-open-ended Question Answering

    Authors: Zhihua Wen, Zhiliang Tian, Zexin Jian, Zhen Huang, Pei Ke, Yifu Gao, Minlie Huang, Dongsheng Li

    Abstract: Large Language Models (LLMs) are widely used for knowledge-seeking yet suffer from hallucinations. The knowledge boundary (KB) of an LLM limits its factual understanding, beyond which it may begin to hallucinate. Investigating the perception of LLMs' KB is crucial for detecting hallucinations and LLMs' reliable generation. Current studies perceive LLMs' KB on questions with a concrete answer (clos…

    Submitted 23 May, 2024; originally announced May 2024.

  18. arXiv:2405.12571  [pdf, other]

    cs.RO

    iHERO: Interactive Human-oriented Exploration and Supervision Under Scarce Communication

    Authors: Zhuoli Tian, Yuyang Zhang, Jinsheng Wei, Meng Guo

    Abstract: Exploration of unknown scenes before human entry is essential for safety and efficiency in numerous scenarios, e.g., subterranean exploration, reconnaissance, search and rescue missions. Fleets of autonomous robots are particularly suitable for this task, via concurrent exploration, multi-sensory perception and autonomous navigation. Communication however among the robots can be severely restricte…

    Submitted 7 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: Accepted at RSS 2024

  19. arXiv:2405.07406  [pdf, other]

    cs.CR cs.AI

    Machine Unlearning: A Comprehensive Survey

    Authors: Weiqi Wang, Zhiyi Tian, Chenhan Zhang, Shui Yu

    Abstract: As the right to be forgotten has been legislated worldwide, many studies attempt to design unlearning mechanisms to protect users' privacy when they want to leave machine learning service platforms. Specifically, machine unlearning is to make a trained model to remove the contribution of an erased subset of the training dataset. This survey aims to systematically classify a wide range of machine u…

    Submitted 24 July, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

  20. arXiv:2405.01221  [pdf, other]

    cs.NI

    A Survey on Semantic Communication Networks: Architecture, Security, and Privacy

    Authors: Shaolong Guo, Yuntao Wang, Ning Zhang, Zhou Su, Tom H. Luan, Zhiyi Tian, Xuemin Shen

    Abstract: Semantic communication, emerging as a breakthrough beyond the classical Shannon paradigm, aims to convey the essential meaning of source data rather than merely focusing on precise yet content-agnostic bit transmission. By interconnecting diverse intelligent agents (e.g., autonomous vehicles and VR devices) via semantic communications, the semantic communication networks (SemComNet) supports seman…

    Submitted 2 May, 2024; originally announced May 2024.

  21. arXiv:2404.18081  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ComposerX: Multi-Agent Symbolic Music Composition with LLMs

    Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

    Abstract: Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and C…

    Submitted 30 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  22. arXiv:2404.15909  [pdf, other]

    cs.CV

    Learning Long-form Video Prior via Generative Pre-Training

    Authors: Jinheng Xie, Jiajun Feng, Zhaoxu Tian, Kevin Qinghong Lin, Yawen Huang, Xi Xia, Nanxu Gong, Xu Zuo, Jiaqi Yang, Yefeng Zheng, Mike Zheng Shou

    Abstract: Concepts involved in long-form videos such as people, objects, and their interactions, can be viewed as following an implicit prior. They are notably complex and continue to pose challenges to be comprehensively learned. In recent years, generative pre-training (GPT) has exhibited versatile capacities in modeling any kind of text content even visual locations. Can this manner work for learning lon…

    Submitted 24 April, 2024; originally announced April 2024.

  23. arXiv:2404.13579  [pdf, other]

    cs.CV cs.AI

    LTOS: Layout-controllable Text-Object Synthesis via Adaptive Cross-attention Fusions

    Authors: Xiaoran Zhao, Tianhao Wu, Yu Lai, Zhiliang Tian, Zhen Huang, Yahui Liu, Zejiang He, Dongsheng Li

    Abstract: Controllable text-to-image generation synthesizes visual text and objects in images with certain conditions, which are frequently applied to emoji and poster generation. Visual text rendering and layout-to-image generation tasks have been popular in controllable text-to-image generation. However, each of these tasks typically focuses on single modality generation or rendering, leaving yet-to-be-br…

    Submitted 21 April, 2024; originally announced April 2024.

  24. arXiv:2404.12135  [pdf, other]

    cs.MA cs.CR cs.DC

    mABC: multi-Agent Blockchain-Inspired Collaboration for root cause analysis in micro-services architecture

    Authors: Wei Zhang, Hongcheng Guo, Jian Yang, Yi Zhang, Chaoran Yan, Zhoujin Tian, Hangyuan Ji, Zhoujun Li, Tongliang Li, Tieqiao Zheng, Chao Chen, Yi Liang, Xu Shi, Liangfan Zheng, Bo Zhang

    Abstract: The escalating complexity of micro-services architecture in cloud-native technologies poses significant challenges for maintaining system stability and efficiency. To conduct root cause analysis (RCA) and resolution of alert events, we propose a pioneering framework, multi-Agent Blockchain-inspired Collaboration for root cause analysis in micro-services architecture (mABC), to revolutionize the AI…

    Submitted 3 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  25. arXiv:2404.11003  [pdf, other]

    cs.CV

    InfoMatch: Entropy Neural Estimation for Semi-Supervised Image Classification

    Authors: Qi Han, Zhibo Tian, Chengwei Xia, Kun Zhan

    Abstract: Semi-supervised image classification, leveraging pseudo supervision and consistency regularization, has demonstrated remarkable success. However, the ongoing challenge lies in fully exploiting the potential of unlabeled data. To address this, we employ information entropy neural estimation to utilize the potential of unlabeled samples. Inspired by contrastive learning, the entropy is estimated by…

    Submitted 12 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: IJCAI 2024

  26. arXiv:2404.07470  [pdf, other]

    cs.CL

    Scalable Language Model with Generalized Continual Learning

    Authors: Bohao Peng, Zhuotao Tian, Shu Liu, Mingchang Yang, Jiaya Jia

    Abstract: Continual learning has gained increasing importance as it facilitates the acquisition and refinement of scalable knowledge and skills in language models. However, existing methods typically encounter strict limitations and challenges in real-world scenarios, such as reliance on experience replay, optimization constraints, and inference task-ID. In this study, we introduce the Scalable Language Mod…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: The Twelfth International Conference on Learning Representations

  27. arXiv:2404.07155  [pdf, other]

    cs.CV

    Unified Language-driven Zero-shot Domain Adaptation

    Authors: Senqiao Yang, Zhuotao Tian, Li Jiang, Jiaya Jia

    Abstract: This paper introduces Unified Language-driven Zero-shot Domain Adaptation (ULDA), a novel task setting that enables a single model to adapt to diverse target domains without explicit domain-ID knowledge. We identify the constraints in the existing language-driven zero-shot domain adaptation task, particularly the requirement for domain IDs and domain-specific models, which may restrict flexibility…

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  28. arXiv:2404.05569  [pdf, other]

    cs.AI cs.CL cs.MA

    360°REA: Towards A Reusable Experience Accumulation with 360° Assessment for Multi-Agent System

    Authors: Shen Gao, Hao Li, Chengrui Huang, Quan Tu, Zhiliang Tian, Minlie Huang, Shuo Shang

    Abstract: Large language model agents have demonstrated remarkable advancements across various complex tasks. Recent works focus on optimizing the agent team or employing self-reflection to iteratively solve complex tasks. Since these agents are all based on the same LLM, only conducting self-evaluation or removing underperforming agents does not substantively enhance the capability of the agents. We argue…

    Submitted 26 June, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

  29. arXiv:2404.02532  [pdf, other]

    cs.AI cs.CL

    Learn to Disguise: Avoid Refusal Responses in LLM's Defense via a Multi-agent Attacker-Disguiser Game

    Authors: Qianqiao Xu, Zhiliang Tian, Hongyan Wu, Zhen Huang, Yiping Song, Feng Liu, Dongsheng Li

    Abstract: With the enhanced performance of large models on natural language processing tasks, potential moral and ethical issues of large models arise. There exist malicious attackers who induce large models to jailbreak and generate information containing illegal, privacy-invasive information through techniques such as prompt engineering. As a result, large models counter malicious attackers' attacks using…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 13 pages, 2 figures

  30. arXiv:2403.20188  [pdf, other]

    cs.NI cs.AI cs.LG

    Distributed Swarm Learning for Edge Internet of Things

    Authors: Yue Wang, Zhi Tian, FXin Fan, Zhipeng Cai, Cameron Nowzari, Kai Zeng

    Abstract: The rapid growth of Internet of Things (IoT) has led to the widespread deployment of smart IoT devices at wireless edge for collaborative machine learning tasks, ushering in a new era of edge learning. With a huge number of hardware-constrained IoT devices operating in resource-limited wireless networks, edge learning encounters substantial challenges, including communication and computation bottl…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2210.16705

  31. arXiv:2403.20022  [pdf, other]

    cs.CV

    Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity

    Authors: Ruijie Quan, Wenguan Wang, Zhibo Tian, Fan Ma, Yi Yang

    Abstract: Reconstructing the viewed images from human brain activity bridges human and computer vision through the Brain-Computer Interface. The inherent variability in brain function between individuals leads existing literature to focus on acquiring separate models for each individual using their respective brain signal data, ignoring commonalities between these data. In this article, we devise Psychometr…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  32. arXiv:2403.17729  [pdf, other]

    cs.IR cs.LG

    EulerFormer: Sequential User Behavior Modeling with Complex Vector Attention

    Authors: Zhen Tian, Wayne Xin Zhao, Changwang Zhang, Xin Zhao, Zhongrui Ma, Ji-Rong Wen

    Abstract: To capture user preference, transformer models have been widely applied to model sequential user behavior data. The core of transformer architecture lies in the self-attention mechanism, which computes the pairwise attention scores in a sequence. Due to the permutation-equivariant nature, positional encoding is used to enhance the attention between token representations. In this setting, the pairw…

    Submitted 4 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted for publication in SIGIR'24

  33. arXiv:2403.15520  [pdf, other]

    cs.LG cs.IR

    GTC: GNN-Transformer Co-contrastive Learning for Self-supervised Heterogeneous Graph Representation

    Authors: Yundong Sun, Dongjie Zhu, Yansong Wang, Zhaoshuo Tian

    Abstract: Graph Neural Networks (GNNs) have emerged as the most powerful weapon for various graph tasks due to the message-passing mechanism's great local information aggregation ability. However, over-smoothing has always hindered GNNs from going deeper and capturing multi-hop neighbors. Unlike GNNs, Transformers can model global information and multi-hop interactions via multi-head self-attention and a pr…

    Submitted 22 March, 2024; originally announced March 2024.

  34. arXiv:2403.15480  [pdf, other]

    cs.NE cs.LG

    SpikeGraphormer: A High-Performance Graph Transformer with Spiking Graph Attention

    Authors: Yundong Sun, Dongjie Zhu, Yansong Wang, Zhaoshuo Tian, Ning Cao, Gregory O'Hared

    Abstract: Recently, Graph Transformers have emerged as a promising solution to alleviate the inherent limitations of Graph Neural Networks (GNNs) and enhance graph representation performance. Unfortunately, Graph Transformers are computationally expensive due to the quadratic complexity inherent in self-attention when applied over large-scale graphs, especially for node tasks. In contrast, spiking neural ne…

    Submitted 20 March, 2024; originally announced March 2024.

  35. arXiv:2403.14418  [pdf, other]

    cs.CV

    OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation

    Authors: Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, Jiaya Jia

    Abstract: The booming of 3D recognition in the 2020s began with the introduction of point cloud transformers. They quickly overwhelmed sparse CNNs and became state-of-the-art models, especially in 3D semantic segmentation. However, sparse CNNs are still valuable networks, due to their efficiency treasure, and ease of application. In this work, we reexamine the design distinctions and test the limits of what…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  36. arXiv:2403.14366  [pdf, other]

    cs.CV

    SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field

    Authors: Lizhe Liu, Bohua Wang, Hongwei Xie, Daqi Liu, Li Liu, Zhiqiang Tian, Kuiyuan Yang, Bing Wang

    Abstract: Vision-centric 3D environment understanding is both vital and challenging for autonomous driving systems. Recently, object-free methods have attracted considerable attention. Such methods perceive the world by predicting the semantics of discrete voxel grids but fail to construct continuous and accurate obstacle surfaces. To this end, in this paper, we propose SurroundSDF to implicitly predict the…

    Submitted 21 March, 2024; originally announced March 2024.

  37. arXiv:2403.13408  [pdf, other]

    cs.CV cs.AI

    S2DM: Sector-Shaped Diffusion Models for Video Generation

    Authors: Haoran Lang, Yuxuan Ge, Zheng Tian

    Abstract: Diffusion models have achieved great success in image generation. However, when leveraging this idea for video generation, we face significant challenges in maintaining the consistency and continuity across video frames. This is mainly caused by the lack of an effective framework to align frames of videos with desired temporal features while preserving consistent semantic and stochastic features.…

    Submitted 22 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

    Comments: 17 pages, 6 figures

  38. arXiv:2403.10188  [pdf, other]

    cs.CR cs.AR

    Taiyi: A high-performance CKKS accelerator for Practical Fully Homomorphic Encryption

    Authors: Shengyu Fan, Xianglong Deng, Zhuoyu Tian, Zhicheng Hu, Liang Chang, Rui Hou, Dan Meng, Mingzhe Zhang

    Abstract: Fully Homomorphic Encryption (FHE), a novel cryptographic theory enabling computation directly on ciphertext data, offers significant security benefits but is hampered by substantial performance overhead. In recent years, a series of accelerator designs have significantly enhanced the performance of FHE applications, bringing them closer to real-world applicability. However, these accelerators fac…

    Submitted 15 March, 2024; originally announced March 2024.

    Comments: 14 pages, 15 figures

  39. arXiv:2403.09639  [pdf, other]

    cs.CV

    GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

    Authors: Chengyao Wang, Li Jiang, Xiaoyang Wu, Zhuotao Tian, Bohao Peng, Hengshuang Zhao, Jiaya Jia

    Abstract: Self-supervised 3D representation learning aims to learn effective representations from large-scale unlabeled point clouds. Most existing approaches adopt point discrimination as the pretext task, which assigns matched points in two distinct views as positive pairs and unmatched points as negative pairs. However, this approach often results in semantically identical points having dissimilar repres…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  40. arXiv:2403.06563  [pdf, other]

    cs.LG cs.CL

    Unraveling the Mystery of Scaling Laws: Part I

    Authors: Hui Su, Zhi Tian, Xiaoyu Shen, Xunliang Cai

    Abstract: Scaling law principles indicate a power-law correlation between loss and variables such as model size, dataset size, and computational resources utilized during training. These principles play a vital role in optimizing various aspects of model pre-training, ultimately contributing to the success of large language models such as GPT-4, Llama and Gemini. However, the original scaling law paper by O…

    Submitted 5 April, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

  41. arXiv:2403.03920  [pdf, other]

    cs.AI cs.CL cs.HC

    Enhancing Instructional Quality: Leveraging Computer-Assisted Textual Analysis to Generate In-Depth Insights from Educational Artifacts

    Authors: Zewei Tian, Min Sun, Alex Liu, Shawon Sarkar, Jing Liu

    Abstract: This paper explores the transformative potential of computer-assisted textual analysis in enhancing instructional quality through in-depth insights from educational artifacts. We integrate Richard Elmore's Instructional Core Framework to examine how artificial intelligence (AI) and machine learning (ML) methods, particularly natural language processing (NLP), can analyze educational content, teach…

    Submitted 6 March, 2024; originally announced March 2024.

  42. arXiv:2403.00691  [pdf, other]

    cs.CV cs.AI

    Tri-Modal Motion Retrieval by Learning a Joint Embedding Space

    Authors: Kangning Yin, Shihao Zou, Yuxuan Ge, Zheng Tian

    Abstract: Information retrieval is an ever-evolving and crucial research domain. The substantial demand for high-quality human motion data, especially in online acquisition, has led to a surge in human motion research works. Prior works have mainly concentrated on dual-modality learning, such as text and motion tasks, but three-modality learning has been rarely explored. Intuitively, an extra introduced modal…

    Submitted 1 March, 2024; originally announced March 2024.

  43. arXiv:2402.18166  [pdf, other]

    cs.IR

    Sequence-level Semantic Representation Fusion for Recommender Systems

    Authors: Lanling Xu, Zhen Tian, Bingqian Li, Junjie Zhang, Jinpeng Wang, Mingchen Cai, Wayne Xin Zhao

    Abstract: With the rapid development of recommender systems, there is increasing side information that can be employed to improve the recommendation performance. Specifically, we focus on the utilization of the associated textual data of items (e.g., product title) and study how text features can be effectively fused with ID features in sequential recommendation. However, there exists distinct data charact…

    Submitted 28 February, 2024; originally announced February 2024.

    Comments: 8 pages, 5 figures

  44. arXiv:2402.17933  [pdf, other]

    cs.RO

    ICAT: An Indoor Connected and Autonomous Testbed for Vehicle Computing

    Authors: Zhaofeng Tian, William He, Boyang Tian, Ren Zhong, Erfan Foorginejad, Weisong Shi

    Abstract: Indoor autonomous driving testbeds have emerged to complement expensive outdoor testbeds and virtual simulations, offering scalable and cost-effective solutions for research in navigation, traffic optimization, and swarm intelligence. However, they often lack the robust sensing and computing infrastructure for advanced research. Addressing these limitations, we introduce the Indoor Connected Auton…

    Submitted 5 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  45. arXiv:2402.17723  [pdf, other]

    cs.CV cs.MM cs.SD eess.AS

    Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

    Authors: Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen

    Abstract: Video and audio content creation serves as the core technique for the movie industry and professional users. Recently, existing diffusion-based methods tackle video and audio generation separately, which hinders the technique transfer from academia to industry. In this work, we aim at filling the gap, with a carefully designed optimization-based framework for cross-visual-audio and joint-visual-au…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024. Project website: https://yzxing87.github.io/Seeing-and-Hearing/

  46. arXiv:2402.17593  [pdf, other]

    cs.RO

    Autonomous Shuttle Operation for Vulnerable Populations: Lessons and Experiences

    Authors: Ren Zhong, Zhaofeng Tian, Jinghui Liao, Weisong Shi

    Abstract: The increasing shortage of drivers poses a significant threat to vulnerable populations, particularly seniors and disabled individuals who heavily depend on public transportation for accessing healthcare services and social events. Autonomous Vehicles (AVs) emerge as a promising alternative, offering potential improvements in accessibility and independence for these groups. However, current design…

    Submitted 28 February, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

  47. arXiv:2402.17550  [pdf, other]

    cs.NI cs.AI eess.SP

    Emergency Caching: Coded Caching-based Reliable Map Transmission in Emergency Networks

    Authors: Zeyu Tian, Lianming Xu, Liang Li, Li Wang, Aiguo Fei

    Abstract: Many rescue missions demand effective perception and real-time decision making, which highly rely on effective data collection and processing. In this study, we propose a three-layer architecture of emergency caching networks focusing on data collection and reliable transmission, by leveraging efficient perception and edge caching technologies. Based on this architecture, we propose a disaster map…

    Submitted 27 February, 2024; originally announced February 2024.

  48. arXiv:2402.16515  [pdf, other]

    cs.CL cs.CR

    LLM-based Privacy Data Augmentation Guided by Knowledge Distillation with a Distribution Tutor for Medical Text Classification

    Authors: Yiping Song, Juhua Zhang, Zhiliang Tian, Yuxin Yang, Minlie Huang, Dongsheng Li

    Abstract: As sufficient data are not always publicly accessible for model training, researchers exploit limited data with advanced learning algorithms or expand the dataset via data augmentation (DA). Conducting DA in private domain requires private protection approaches (i.e., anonymization and perturbation), but those methods cannot provide protection guarantees. Differential privacy (DP) learning method…

    Submitted 26 February, 2024; originally announced February 2024.

  49. arXiv:2402.16255  [pdf, other]

    cs.LG cs.AI

    Watch Your Head: Assembling Projection Heads to Save the Reliability of Federated Models

    Authors: Jinqian Chen, Jihua Zhu, Qinghai Zheng, Zhongyu Li, Zhiqiang Tian

    Abstract: Federated learning encounters substantial challenges with heterogeneous data, leading to performance degradation and convergence issues. While considerable progress has been achieved in mitigating such an impact, the reliability aspect of federated models has been largely disregarded. In this study, we conduct extensive experiments to investigate the reliability of both generic and personalized fe…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: Accepted in AAAI-24

  50. arXiv:2402.16153  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, et al. (10 additional authors not shown)

    Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/