Skip to main content

Showing 1–50 of 1,269 results for author: Chen, B

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.11049  [pdf, other

    cs.CL

    MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding

    Authors: Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Beidi Chen

    Abstract: Large Language Models (LLMs) have become more prevalent in long-context applications such as interactive chatbots, document analysis, and agent workflows, but it is challenging to serve long-context requests with low latency and high throughput. Speculative decoding (SD) is a widely used technique to reduce latency without sacrificing performance but the conventional wisdom suggests that its effic… ▽ More

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.10520  [pdf, other

    cs.IR

    Efficient and Deployable Knowledge Infusion for Open-World Recommendations via Large Language Models

    Authors: Yunjia Xi, Weiwen Liu, Jianghao Lin, Muyan Weng, Xiaoling Cai, Hong Zhu, Jieming Zhu, Bo Chen, Ruiming Tang, Yong Yu, Weinan Zhang

    Abstract: Recommender systems (RSs) play a pervasive role in today's online services, yet their closed-loop nature constrains their access to open-world knowledge. Recently, large language models (LLMs) have shown promise in bridging this gap. However, previous attempts to directly implement LLMs as recommenders fall short in meeting the requirements of industrial RSs, particularly in terms of online infere… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2306.10933

  3. arXiv:2408.09920  [pdf, other

    cs.CV cs.MM eess.IV

    Sliced Maximal Information Coefficient: A Training-Free Approach for Image Quality Assessment Enhancement

    Authors: Kang Xiao, Xu Wang, Yulin He, Baoliang Chen, Xuelin Shen

    Abstract: Full-reference image quality assessment (FR-IQA) models generally operate by measuring the visual differences between a degraded image and its reference. However, existing FR-IQA models including both the classical ones (eg, PSNR and SSIM) and deep-learning based measures (eg, LPIPS and DISTS) still exhibit limitations in capturing the full perception characteristics of the human visual system (HV… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 6 pages, 5 figures, accepted by ICME2024

  4. HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction

    Authors: Xiao Zhao, Bo Chen, Mingyang Sun, Dingkang Yang, Youxing Wang, Xukun Zhang, Mingcheng Li, Dongliang Kou, Xiaoyi Wei, Lihua Zhang

    Abstract: Vision-based 3D semantic scene completion (SSC) describes autonomous driving scenes through 3D volume representations. However, the occlusion of invisible voxels by scene surfaces poses challenges to current SSC methods in hallucinating refined 3D geometry. This paper proposes HybridOcc, a hybrid 3D volume query proposal method generated by Transformer framework and NeRF representation and refined… ▽ More

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: Accepted to IEEE RAL

  5. AIE: Auction Information Enhanced Framework for CTR Prediction in Online Advertising

    Authors: Yang Yang, Bo Chen, Chenxu Zhu, Menghui Zhu, Xinyi Dai, Huifeng Guo, Muyu Zhang, Zhenhua Dong, Ruiming Tang

    Abstract: Click-Through Rate (CTR) prediction is a fundamental technique for online advertising recommendation and the complex online competitive auction process also brings many difficulties to CTR optimization. Recent studies have shown that introducing posterior auction information contributes to the performance of CTR prediction. However, existing work doesn't fully capitalize on the benefits of auction… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

  6. arXiv:2408.07331  [pdf, other

    cs.LG

    RSEA-MVGNN: Multi-View Graph Neural Network with Reliable Structural Enhancement and Aggregation

    Authors: Junyu Chen, Long Shi, Badong Chen

    Abstract: Graph Neural Networks (GNNs) have exhibited remarkable efficacy in learning from multi-view graph data. In the framework of multi-view graph neural networks, a critical challenge lies in effectively combining diverse views, where each view has distinct graph structure features (GSFs). Existing approaches to this challenge primarily focus on two aspects: 1) prioritizing the most important GSFs, 2)… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

  7. arXiv:2408.05827  [pdf, other

    cs.IT

    Divergence Maximizing Linear Projection for Supervised Dimension Reduction

    Authors: Biao Chen, Joshua Kortje

    Abstract: This paper proposes two linear projection methods for supervised dimension reduction using only the first and second-order statistics. The methods, each catering to a different parameter regime, are derived under the general Gaussian model by maximizing the Kullback-Leibler divergence between the two classes in the projected sample for a binary classification problem. They subsume existing linear… ▽ More

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 17 pages, 5 figures, 3 tables

  8. arXiv:2408.05676  [pdf, other

    cs.IR

    A Decoding Acceleration Framework for Industrial Deployable LLM-based Recommender Systems

    Authors: Yunjia Xi, Hangyu Wang, Bo Chen, Jianghao Lin, Menghui Zhu, Weiwen Liu, Ruiming Tang, Weinan Zhang, Yong Yu

    Abstract: Recently, increasing attention has been paid to LLM-based recommender systems, but their deployment is still under exploration in the industry. Most deployments utilize LLMs as feature enhancers, generating augmentation knowledge in the offline stage. However, in recommendation scenarios, involving numerous users and items, even offline generation with LLMs consumes considerable time and resources… ▽ More

    Submitted 10 August, 2024; originally announced August 2024.

  9. arXiv:2408.05019  [pdf, other

    cs.CV

    Instruction Tuning-free Visual Token Complement for Multimodal LLMs

    Authors: Dongsheng Wang, Jiequan Cui, Miaoge Li, Wang Lin, Bo Chen, Hanwang Zhang

    Abstract: As the open community of large language models (LLMs) matures, multimodal LLMs (MLLMs) have promised an elegant bridge between vision and language. However, current research is inherently constrained by challenges such as the need for high-quality instruction pairs and the loss of visual information in image-to-text training objectives. To this end, we propose a Visual Token Complement framework (… ▽ More

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV2024 (20pages)

  10. arXiv:2408.03533  [pdf, other

    cs.IR cs.AI

    Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation

    Authors: Jiachen Zhu, Jianghao Lin, Xinyi Dai, Bo Chen, Rong Shan, Jieming Zhu, Ruiming Tang, Yong Yu, Weinan Zhang

    Abstract: We primarily focus on the field of large language models (LLMs) for recommendation, which has been actively explored recently and poses a significant challenge in effectively enhancing recommender systems with logical reasoning abilities and open-world knowledge. Current mainstream efforts mainly center around injecting personalized information from recommendation models into LLMs by customizing i… ▽ More

    Submitted 11 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  11. arXiv:2408.03388  [pdf, other

    cs.LG cs.AI cs.CV

    A Non-negative VAE:the Generalized Gamma Belief Network

    Authors: Zhibin Duan, Tiansheng Wen, Muyao Wang, Bo Chen, Mingyuan Zhou

    Abstract: The gamma belief network (GBN), often regarded as a deep topic model, has demonstrated its potential for uncovering multi-layer interpretable latent representations in text data. Its notable capability to acquire interpretable latent factors is partially attributed to sparse and non-negative gamma-distributed latent variables. However, the existing GBN and its variations are constrained by the lin… ▽ More

    Submitted 15 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  12. arXiv:2408.03160  [pdf, other

    cs.CV cs.AI

    User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance

    Authors: Mrinal Verghese, Brian Chen, Hamid Eghbalzadeh, Tushar Nagarajan, Ruta Desai

    Abstract: Our research investigates the capability of modern multimodal reasoning models, powered by Large Language Models (LLMs), to facilitate vision-powered assistants for multi-step daily activities. Such assistants must be able to 1) encode relevant visual history from the assistant's sensors, e.g., camera, 2) forecast future actions for accomplishing the activity, and 3) replan based on the user in th… ▽ More

    Submitted 11 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: 9 pages, 4 figures

  13. arXiv:2408.01775  [pdf, other

    cs.HC

    3DStoryline: Immersive Visual Storytelling

    Authors: Haonan Yao, Lixiang Zhao, Boyuan Chen, Kaiwen Li, Hai-Ning Liang, Lingyun Yu

    Abstract: Storyline visualization has emerged as an innovative method for illustrating the development and changes in stories across various domains. Traditional approaches typically represent stories with one line per character, progressing from left to right. While effective for simpler narratives, this method faces significant challenges when dealing with complex stories involving multiple characters, as… ▽ More

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 9 pages

  14. arXiv:2408.01548  [pdf, other

    cs.CV

    Trainable Pointwise Decoder Module for Point Cloud Segmentation

    Authors: Bike Chen, Chen Gong, Antti Tikanmäki, Juha Röning

    Abstract: Point cloud segmentation (PCS) aims to make per-point predictions and enables robots and autonomous driving cars to understand the environment. The range image is a dense representation of a large-scale outdoor point cloud, and segmentation models built upon the image commonly execute efficiently. However, the projection of the point cloud onto the range image inevitably leads to dropping points b… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: No comments

  15. arXiv:2408.01027  [pdf, ps, other

    cs.GT

    Randomized Strategyproof Mechanisms with Best of Both Worlds Fairness and Efficiency

    Authors: Ankang Sun, Bo Chen

    Abstract: We study the problem of mechanism design for allocating a set of indivisible items among agents with private preferences on items. We are interested in such a mechanism that is strategyproof (where agents' best strategy is to report their true preferences) and is expected to ensure fairness and efficiency to a certain degree. We first present an impossibility result that a deterministic mechanism… ▽ More

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 27 pages

    ACM Class: F.2.2

  16. arXiv:2408.00170  [pdf, other

    cs.HC cs.AI cs.LG

    CREW: Facilitating Human-AI Teaming Research

    Authors: Lingyu Zhang, Zhengran Ji, Boyuan Chen

    Abstract: With the increasing deployment of artificial intelligence (AI) technologies, the potential of humans working with AI agents has been growing at a great speed. Human-AI teaming is an important paradigm for studying various aspects when humans and AI agents work together. The unique aspect of Human-AI teaming research is the need to jointly study humans and AI agents, demanding multidisciplinary res… ▽ More

    Submitted 31 July, 2024; originally announced August 2024.

    Comments: Our project website is at: http://generalroboticslab.com/CREW

  17. arXiv:2407.21740  [pdf, other

    cs.LG cs.AI cs.CV

    Contrastive Factor Analysis

    Authors: Zhibin Duan, Tiansheng Wen, Yifei Wang, Chen Zhu, Bo Chen, Mingyuan Zhou

    Abstract: Factor analysis, often regarded as a Bayesian variant of matrix factorization, offers superior capabilities in capturing uncertainty, modeling complex dependencies, and ensuring robustness. As the deep learning era arrives, factor analysis is receiving less and less attention due to their limited expressive ability. On the contrary, contrastive learning has emerged as a potent technique with demon… ▽ More

    Submitted 31 July, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  18. arXiv:2407.20455  [pdf, other

    cs.CV

    Learning Feature-Preserving Portrait Editing from Generated Pairs

    Authors: Bowei Chen, Tiancheng Zhi, Peihao Zhu, Shen Sang, Jing Liu, Linjie Luo

    Abstract: Portrait editing is challenging for existing techniques due to difficulties in preserving subject features like identity. In this paper, we propose a training-based method leveraging auto-generated paired data to learn desired editing while ensuring the preservation of unchanged subject features. Specifically, we design a data generation process to create reasonably good training pairs for desired… ▽ More

    Submitted 29 July, 2024; originally announced July 2024.

  19. arXiv:2407.20256  [pdf

    cs.DB cs.AI cs.LG

    Making LLMs Work for Enterprise Data Tasks

    Authors: Çağatay Demiralp, Fabian Wenz, Peter Baile Chen, Moe Kayali, Nesime Tatbul, Michael Stonebraker

    Abstract: Large language models (LLMs) know little about enterprise database tables in the private data ecosystem, which substantially differ from web text in structure and content. As LLMs' performance is tied to their training data, a crucial question is how useful they can be in improving enterprise database management and analysis tasks. To address this, we contribute experimental results on LLMs' perfo… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Poster at North East Database Day 2024

  20. arXiv:2407.19078  [pdf, other

    cs.LG stat.ML

    Practical Marketplace Optimization at Uber Using Causally-Informed Machine Learning

    Authors: Bobby Chen, Siyu Chen, Jason Dowlatabadi, Yu Xuan Hong, Vinayak Iyer, Uday Mantripragada, Rishabh Narang, Apoorv Pandey, Zijun Qin, Abrar Sheikh, Hongtao Sun, Jiaqi Sun, Matthew Walker, Kaichen Wei, Chen Xu, Jingnan Yang, Allen T. Zhang, Guoqing Zhang

    Abstract: Budget allocation of marketplace levers, such as incentives for drivers and promotions for riders, has long been a technical and business challenge at Uber; understanding lever budget changes' impact and estimating cost efficiency to achieve predefined budgets is crucial, with the goal of optimal allocations that maximize business value; we introduce an end-to-end machine learning and optimization… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: To be published in the 2nd Workshop on Causal Inference and Machine Learning in Practice, KDD 2024, August 25 to 29, 2024, Barcelona, Spain, 10 pages

    MSC Class: 62J99

  21. HICEScore: A Hierarchical Metric for Image Captioning Evaluation

    Authors: Zequn Zeng, Jianqiao Sun, Hao Zhang, Tiansheng Wen, Yudi Su, Yan Xie, Zhengjue Wang, Bo Chen

    Abstract: Image captioning evaluation metrics can be divided into two categories, reference-based metrics and reference-free metrics. However, reference-based approaches may struggle to evaluate descriptive captions with abundant visual details produced by advanced multimodal large language models, due to their heavy reliance on limited human-annotated references. In contrast, previous reference-free metric… ▽ More

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM2024

  22. arXiv:2407.18046  [pdf, other

    cs.CV cs.AI

    GaussianSR: High Fidelity 2D Gaussian Splatting for Arbitrary-Scale Image Super-Resolution

    Authors: Jintong Hu, Bin Xia, Bin Chen, Wenming Yang, Lei Zhang

    Abstract: Implicit neural representations (INRs) have significantly advanced the field of arbitrary-scale super-resolution (ASSR) of images. Most existing INR-based ASSR networks first extract features from the given low-resolution image using an encoder, and then render the super-resolved result via a multi-layer perceptron decoder. Although these approaches have shown promising results, their performance… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: 13 pages, 12 figures

  23. arXiv:2407.16564  [pdf, other

    cs.SD cs.AI eess.AS

    Audio Prompt Adapter: Unleashing Music Editing Abilities for Text-to-Music with Lightweight Finetuning

    Authors: Fang-Duo Tsai, Shih-Lun Wu, Haven Kim, Bo-Yu Chen, Hao-Chung Cheng, Yi-Hsuan Yang

    Abstract: Text-to-music models allow users to generate nearly realistic musical audio with textual commands. However, editing music audios remains challenging due to the conflicting desiderata of performing fine-grained alterations on the audio while maintaining a simple user interface. To address this challenge, we propose Audio Prompt Adapter (or AP-Adapter), a lightweight addition to pretrained text-to-m… ▽ More

    Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by the 25th International Society for Music Information Retrieval (ISMIR)

  24. arXiv:2407.15892  [pdf, other

    cs.LG

    MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training

    Authors: Cheng Luo, Jiawei Zhao, Zhuoming Chen, Beidi Chen, Anima Anandkumar

    Abstract: We introduce Mini-Sequence Transformer (MsT), a simple and effective methodology for highly efficient and accurate LLM training with extremely long sequences. MsT partitions input sequences and iteratively processes mini-sequences to reduce intermediate memory usage. Integrated with activation recomputation, it enables significant memory savings in both forward and backward passes. In experiments… ▽ More

    Submitted 21 July, 2024; originally announced July 2024.

  25. arXiv:2407.15754  [pdf, other

    cs.CV cs.CL cs.LG

    LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding

    Authors: Haoning Wu, Dongxu Li, Bei Chen, Junnan Li

    Abstract: Large multimodal models (LMMs) are processing increasingly longer and richer inputs. Albeit the progress, few public benchmark is available to measure such development. To mitigate this gap, we introduce LongVideoBench, a question-answering benchmark that features video-language interleaved inputs up to an hour long. Our benchmark includes 3,763 varying-length web-collected videos with their subti… ▽ More

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 29 pages

  26. arXiv:2407.14098  [pdf, other

    cs.DB

    Top-k Representative Search for Comparative Tree Summarization

    Authors: Yuqi Chen, Xin Huang, Bilian Chen

    Abstract: Data summarization aims at utilizing a small-scale summary to represent massive datasets as a whole, which is useful for visualization and information sipped generation. However, most existing studies of hierarchical summarization only work on \emph{one single tree} by selecting $k$ representative nodes, which neglects an important problem of comparative summarization on two trees. In this paper,… ▽ More

    Submitted 19 July, 2024; originally announced July 2024.

  27. arXiv:2407.13863  [pdf, other

    cs.CV

    A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks

    Authors: Yixiang Qiu, Hao Fang, Hongyao Yu, Bin Chen, MeiKang Qiu, Shu-Tao Xia

    Abstract: Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training data from released models by utilizing output information, raising extensive concerns about the security of Deep Neural Networks (DNNs). Recent advances in generative adversarial networks (GANs) have contributed significantly to the improved performance of MI attacks due to their powerful ability to generate realistic image… ▽ More

    Submitted 27 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  28. arXiv:2407.11699  [pdf, other

    cs.CV

    Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

    Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen, Xuguang Lan

    Abstract: This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer). We investigate the slow convergence problem in transformers from a new perspective, suggesting that it arises from the self-attention that introduces no structural bias over inputs. To address this issue, we explore incorporating position relation prior as attention bias to augment o… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  29. arXiv:2407.10179  [pdf, other

    cs.CV

    CLIP-Guided Networks for Transferable Targeted Attacks

    Authors: Hao Fang, Jiawei Kong, Bin Chen, Tao Dai, Hao Wu, Shu-Tao Xia

    Abstract: Transferable targeted adversarial attacks aim to mislead models into outputting adversary-specified predictions in black-box scenarios. Recent studies have introduced \textit{single-target} generative attacks that train a generator for each target class to generate highly transferable perturbations, resulting in substantial computational overhead when handling multiple classes. \textit{Multi-targe… ▽ More

    Submitted 22 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  30. arXiv:2407.10081  [pdf, other

    cs.IR

    All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era

    Authors: Bo Chen, Xinyi Dai, Huifeng Guo, Wei Guo, Weiwen Liu, Yong Liu, Jiarui Qin, Ruiming Tang, Yichao Wang, Chuhan Wu, Yaxiong Wu, Hao Zhang

    Abstract: Recommender systems (RS) are vital for managing information overload and delivering personalized content, responding to users' diverse information needs. The emergence of large language models (LLMs) offers a new horizon for redefining recommender systems with vast general knowledge and reasoning capabilities. Standing across this LLM era, we aim to integrate recommender systems into a broader pic… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

  31. arXiv:2407.10074  [pdf, ps, other

    cs.IT

    Optimal linear codes with few weights from simplicial complexes

    Authors: Bing Chen, Yunge Xu, Zhao Hu, Nian Li, Xiangyong Zeng

    Abstract: Recently, constructions of optimal linear codes from simplicial complexes have attracted much attention and some related nice works were presented. Let $q$ be a prime power. In this paper, by using the simplicial complexes of ${\mathbb F}_{q}^m$ with one single maximal element, we construct four families of linear codes over the ring ${\mathbb F}_{q}+u{\mathbb F}_{q}$ ($u^2=0$), which generalizes… ▽ More

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 18 pages

  32. arXiv:2407.08722  [pdf, other

    cs.RO cs.CV cs.LG

    Unifying 3D Representation and Control of Diverse Robots with a Single Camera

    Authors: Sizhe Lester Li, Annan Zhang, Boyuan Chen, Hanna Matusik, Chao Liu, Daniela Rus, Vincent Sitzmann

    Abstract: Mirroring the complex structures and diverse functions of natural organisms is a long-standing challenge in robotics. Modern fabrication techniques have dramatically expanded feasible hardware, yet deploying these systems requires control software to translate desired motions into actuator commands. While conventional robots can easily be modeled as rigid links connected via joints, it remains an… ▽ More

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Project Page: https://sizhe-li.github.io/publication/neural_jacobian_field

  33. arXiv:2407.08019  [pdf, other

    cs.CV

    Coherent and Multi-modality Image Inpainting via Latent Space Optimization

    Authors: Lingzhi Pan, Tong Zhang, Bingyuan Chen, Qi Zhou, Wei Ke, Sabine Süsstrunk, Mathieu Salzmann

    Abstract: With the advancements in denoising diffusion probabilistic models (DDPMs), image inpainting has significantly evolved from merely filling information based on nearby regions to generating content conditioned on various prompts such as text, exemplar images, and sketches. However, existing methods, such as model fine-tuning and simple concatenation of latent vectors, often result in generation fail… ▽ More

    Submitted 10 July, 2024; originally announced July 2024.

  34. arXiv:2407.05425  [pdf, other

    cs.RO

    ClutterGen: A Cluttered Scene Generator for Robot Learning

    Authors: Yinsen Jia, Boyuan Chen

    Abstract: We introduce ClutterGen, a physically compliant simulation scene generator capable of producing highly diverse, cluttered, and stable scenes for robot learning. Generating such scenes is challenging as each object must adhere to physical laws like gravity and collision. As the number of objects increases, finding valid poses becomes more difficult, necessitating significant human engineering effor… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  35. arXiv:2407.05125  [pdf, other

    cs.DC cs.LG

    A Joint Approach to Local Updating and Gradient Compression for Efficient Asynchronous Federated Learning

    Authors: Jiajun Song, Jiajun Luo, Rongwei Lu, Shuzhao Xie, Bin Chen, Zhi Wang

    Abstract: Asynchronous Federated Learning (AFL) confronts inherent challenges arising from the heterogeneity of devices (e.g., their computation capacities) and low-bandwidth environments, both potentially causing stale model updates (e.g., local gradients) for global aggregation. Traditional approaches mitigating the staleness of updates typically focus on either adjusting the local updating or gradient co… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  36. arXiv:2407.04960  [pdf, other

    cs.IR

    MemoCRS: Memory-enhanced Sequential Conversational Recommender Systems with Large Language Models

    Authors: Yunjia Xi, Weiwen Liu, Jianghao Lin, Bo Chen, Ruiming Tang, Weinan Zhang, Yong Yu

    Abstract: Conversational recommender systems (CRSs) aim to capture user preferences and provide personalized recommendations through multi-round natural language dialogues. However, most existing CRS models mainly focus on dialogue comprehension and preferences mining from the current dialogue session, overlooking user preferences in historical dialogue sessions. The preferences embedded in the user's histo… ▽ More

    Submitted 6 July, 2024; originally announced July 2024.

  37. arXiv:2407.04147  [pdf, other

    cs.SE

    ALPINE: An adaptive language-agnostic pruning method for language models for code

    Authors: Mootez Saad, José Antonio Hernández López, Boqi Chen, Dániel Varró, Tushar Sharma

    Abstract: Language models of code have demonstrated state-of-the-art performance across various software engineering and source code analysis tasks. However, their demanding computational resource requirements and consequential environmental footprint remain as significant challenges. This work introduces ALPINE, an adaptive programming language-agnostic pruning technique designed to substantially reduce th… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  38. arXiv:2407.03963  [pdf, other

    cs.CL cs.AI

    LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs

    Authors: LLM-jp, :, Akiko Aizawa, Eiji Aramaki, Bowen Chen, Fei Cheng, Hiroyuki Deguchi, Rintaro Enomoto, Kazuki Fujii, Kensuke Fukumoto, Takuya Fukushima, Namgi Han, Yuto Harada, Chikara Hashimoto, Tatsuya Hiraoka, Shohei Hisada, Sosuke Hosokawa, Lu Jie, Keisuke Kamata, Teruhito Kanazawa, Hiroki Kanezashi, Hiroshi Kataoka, Satoru Katsumata, Daisuke Kawahara, Seiya Kawano , et al. (57 additional authors not shown)

    Abstract: This paper introduces LLM-jp, a cross-organizational project for the research and development of Japanese large language models (LLMs). LLM-jp aims to develop open-source and strong Japanese LLMs, and as of this writing, more than 1,500 participants from academia and industry are working together for this purpose. This paper presents the background of the establishment of LLM-jp, summaries of its… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

  39. arXiv:2407.01955  [pdf, other

    cs.CL

    S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models

    Authors: Parsa Kavehzadeh, Mohammadreza Pourreza, Mojtaba Valipour, Tinashu Zhu, Haoli Bai, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh

    Abstract: Deployment of autoregressive large language models (LLMs) is costly, and as these models increase in size, the associated costs will become even more considerable. Consequently, different methods have been proposed to accelerate the token generation process and reduce costs. Speculative decoding (SD) is among the most promising approaches to speed up the LLM decoding process by verifying multiple… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  40. arXiv:2407.01392  [pdf, other

    cs.LG cs.CV cs.RO

    Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion

    Authors: Boyuan Chen, Diego Marti Monso, Yilun Du, Max Simchowitz, Russ Tedrake, Vincent Sitzmann

    Abstract: This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is trained to denoise a set of tokens with independent per-token noise levels. We apply Diffusion Forcing to sequence generative modeling by training a causal next-token prediction model to generate one or several future tokens without fully diffusing past ones. Our approach is shown to combine the strengths of… ▽ More

    Submitted 4 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

    Comments: Project website: https://boyuan.space/diffusion-forcing Code: https://github.com/buoyancy99/diffusion-forcing

  41. arXiv:2407.00467  [pdf, other

    cs.LG cs.DC eess.IV

    VcLLM: Video Codecs are Secretly Tensor Codecs

    Authors: Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills

    Abstract: As the parameter size of large language models (LLMs) continues to expand, the need for a large memory footprint and high communication bandwidth have become significant bottlenecks for the training and inference of LLMs. To mitigate these bottlenecks, various tensor compression techniques have been proposed to reduce the data size, thereby alleviating memory requirements and communication pressur… ▽ More

    Submitted 29 June, 2024; originally announced July 2024.

  42. arXiv:2406.19995  [pdf, other

    cs.CL cs.AI cs.LG

    Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model

    Authors: Habib Hajimolahoseini, Mohammad Hassanpour, Foozhan Ataiefard, Boxing Chen, Yang Liu

    Abstract: This paper introduces a novel method of Progressive Low Rank Decomposition (PLRD) tailored for the compression of large language models. Our approach leverages a pre-trained model, which is then incrementally decompressed to smaller sizes using progressively lower ranks. This method allows for significant reductions in computational overhead and energy consumption, as subsequent models are derived… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  43. arXiv:2406.19971  [pdf, other

    cs.RO

    Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies

    Authors: Pingcheng Jian, Easop Lee, Zachary Bell, Michael M. Zavlanos, Boyuan Chen

    Abstract: Vision-based imitation learning has shown promising capabilities of endowing robots with various motion skills given visual observation. However, current visuomotor policies fail to adapt to drastic changes in their visual observations. We present Perception Stitching that enables strong zero-shot adaptation to large visual changes by directly stitching novel combinations of visual encoders. Our k… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  44. arXiv:2406.19963  [pdf, other

    cs.RO cs.AI cs.LG

    Text2Robot: Evolutionary Robot Design from Text Descriptions

    Authors: Ryan P. Ringel, Zachary S. Charlick, Jiaxun Liu, Boxi Xia, Boyuan Chen

    Abstract: Robot design has traditionally been costly and labor-intensive. Despite advancements in automated processes, it remains challenging to navigate a vast design space while producing physically manufacturable robots. We introduce Text2Robot, a framework that converts user text specifications and performance preferences into physical quadrupedal robots. Within minutes, Text2Robot can use text-to-3D mo… ▽ More

    Submitted 1 July, 2024; v1 submitted 28 June, 2024; originally announced June 2024.

    Comments: Our project website is at: http://generalroboticslab.com/Text2Robot

  45. arXiv:2406.19043  [pdf

    eess.IV cs.AI cs.CV cs.DB

    CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

    Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

    Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 19 pages, 3 figures, 2 tables

  46. arXiv:2406.18825  [pdf, other

    cs.IR

    ELCoRec: Enhance Language Understanding with Co-Propagation of Numerical and Categorical Features for Recommendation

    Authors: Jizheng Chen, Kounianhua Du, Jianghao Lin, Bo Chen, Ruiming Tang, Weinan Zhang

    Abstract: Large language models have been flourishing in the natural language processing (NLP) domain, and their potential for recommendation has been paid much attention to. Despite the intelligence shown by the recommendation-oriented finetuned models, LLMs struggle to fully understand the user behavior patterns due to their innate weakness in interpreting numerical features and the overhead for long cont… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  47. arXiv:2406.18204  [pdf, other

    cs.NI

    Analysis of Channel Uncertainty in Trusted Wireless Services via Repeated Interactions

    Authors: Bingwen Chen, Xintong Ling, Weihang Cao, Jiaheng Wang, Zhi Ding

    Abstract: The coexistence of heterogeneous sub-networks in 6G poses new security and trust concerns and thus calls for a perimeterless-security model. Blockchain radio access network (B-RAN) provides a trust-building approach via repeated interactions rather than relying on pre-established trust or central authentication. Such a trust-building process naturally supports dynamic trusted services across vario… ▽ More

    Submitted 2 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  48. arXiv:2406.17932  [pdf, other

    cs.RO cs.MM cs.SD eess.AS

    SonicSense: Object Perception from In-Hand Acoustic Vibration

    Authors: Jiaxun Liu, Boyuan Chen

    Abstract: We introduce SonicSense, a holistic design of hardware and software to enable rich robot object perception through in-hand acoustic vibration sensing. While previous studies have shown promising results with acoustic sensing for object perception, current solutions are constrained to a handful of objects with simple geometries and homogeneous materials, single-finger sensing, and mixing training a… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Our project website is at: http://generalroboticslab.com/SonicSense

  49. arXiv:2406.17600  [pdf, other

    cs.CL

    "Seeing the Big through the Small": Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?

    Authors: Beiduo Chen, Xinpeng Wang, Siyao Peng, Robert Litschko, Anna Korhonen, Barbara Plank

    Abstract: Human label variation (HLV) is a valuable source of information that arises when multiple human annotators provide different labels for valid reasons. In Natural Language Inference (NLI) earlier approaches to capturing HLV involve either collecting annotations from many crowd workers to represent human judgment distribution (HJD) or use expert linguists to provide detailed explanations for their c… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 22 pages, 9 figures

  50. arXiv:2406.17577  [pdf, other

    eess.IV cs.CV

    Advancing Cell Detection in Anterior Segment Optical Coherence Tomography Images

    Authors: Boyu Chen, Ameenat L. Solebo, Paul Taylor

    Abstract: Anterior uveitis, a common form of eye inflammation, can lead to permanent vision loss if not promptly diagnosed. Monitoring this condition involves quantifying inflammatory cells in the anterior chamber (AC) of the eye, which can be captured using Anterior Segment Optical Coherence Tomography (AS-OCT). However, manually identifying cells in AS-OCT images is time-consuming and subjective. Moreover… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.