
Showing 1–50 of 1,247 results for author: Zhang, K

Searching in archive cs. Search in all archives.
  1. arXiv:2408.11296  [pdf, other]

    cs.SE cs.CL

    RePair: Automated Program Repair with Process-based Feedback

    Authors: Yuze Zhao, Zhenya Huang, Yixiao Ma, Rui Li, Kai Zhang, Hao Jiang, Qi Liu, Linbo Zhu, Yu Su

    Abstract: The gap between the trepidation of program reliability and the expense of repairs underscores the indispensability of Automated Program Repair (APR). APR is instrumental in transforming vulnerable programs into more robust ones, bolstering program reliability while simultaneously diminishing the financial burden of manual repairs. Commercial-scale language models (LM) have taken APR to unprecedent…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 15 pages, 13 figures

    Journal ref: ACL 2024 Findings

  2. arXiv:2408.10899  [pdf, other]

    cs.RO

    All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents

    Authors: Zhiqiang Wang, Hao Zheng, Yunshuang Nie, Wenjun Xu, Qingwei Wang, Hua Ye, Zhe Li, Kaidong Zhang, Xuewen Cheng, Wanxi Dong, Chang Cai, Liang Lin, Feng Zheng, Xiaodan Liang

    Abstract: Embodied AI is transforming how AI systems interact with the physical world, yet existing datasets are inadequate for developing versatile, general-purpose agents. These limitations include a lack of standardized formats, insufficient data diversity, and inadequate data volume. To address these issues, we introduce ARIO (All Robots In One), a new data standard that enhances existing datasets by of…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project website: https://imaei.github.io/project_pages/ario/

  3. arXiv:2408.10666  [pdf, other]

    cs.IR

    Accelerating the Surrogate Retraining for Poisoning Attacks against Recommender Systems

    Authors: Yunfan Wu, Qi Cao, Shuchang Tao, Kaike Zhang, Fei Sun, Huawei Shen

    Abstract: Recent studies have demonstrated the vulnerability of recommender systems to data poisoning attacks, where adversaries inject carefully crafted fake user interactions into the training data of recommenders to promote target items. Current attack methods involve iteratively retraining a surrogate recommender on the poisoned data with the latest fake users to optimize the attack. However, this repet…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by RecSys 2024

  4. arXiv:2408.10609  [pdf, other]

    cs.LG q-bio.GN stat.ML

    PerturBench: Benchmarking Machine Learning Models for Cellular Perturbation Analysis

    Authors: Yan Wu, Esther Wershof, Sebastian M Schmon, Marcel Nassar, Błażej Osiński, Ridvan Eksi, Kun Zhang, Thore Graepel

    Abstract: We present a comprehensive framework for predicting the effects of perturbations in single cells, designed to standardize benchmarking in this rapidly evolving field. Our framework, PerturBench, includes a user-friendly platform, diverse datasets, metrics for fair model comparison, and detailed performance analysis. Extensive evaluations of published and baseline models reveal limitations like mod…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 9 pages plus 19 pages supplementary material. Code is available at https://github.com/altoslabs/perturbench

  5. arXiv:2408.10588  [pdf, other]

    cs.CV cs.GR

    DEGAS: Detailed Expressions on Full-Body Gaussian Avatars

    Authors: Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang

    Abstract: Although neural rendering has made significant advancements in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method lea…

    Submitted 20 August, 2024; originally announced August 2024.

  6. arXiv:2408.10469  [pdf, other]

    cs.CV cs.IR

    LSVOS Challenge 3rd Place Report: SAM2 and Cutie based VOS

    Authors: Xinyu Liu, Jing Zhang, Kexin Zhang, Xu Liu, Lingling Li

    Abstract: Video Object Segmentation (VOS) presents several challenges, including object occlusion and fragmentation, the dis-appearance and re-appearance of objects, and tracking specific objects within crowded scenes. In this work, we combine the strengths of the state-of-the-art (SOTA) models SAM2 and Cutie to address these challenges. Additionally, we explore the impact of various hyperparameters on vide…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: text overlap with arXiv:2406.03668

  7. arXiv:2408.10353  [pdf, other]

    cs.LG stat.ML

    On the Identifiability of Sparse ICA without Assuming Non-Gaussianity

    Authors: Ignavier Ng, Yujia Zheng, Xinshuai Dong, Kun Zhang

    Abstract: Independent component analysis (ICA) is a fundamental statistical tool used to reveal hidden generative processes from observed data. However, traditional ICA approaches struggle with the rotational invariance inherent in Gaussian distributions, often necessitating the assumption of non-Gaussianity in the underlying sources. This may limit their applicability in broader contexts. To accommodate Ga…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: NeurIPS 2023

  8. arXiv:2408.09937  [pdf, other]

    quant-ph cs.LG

    The curse of random quantum data

    Authors: Kaining Zhang, Junyu Liu, Liu Liu, Liang Jiang, Min-Hsiu Hsieh, Dacheng Tao

    Abstract: Quantum machine learning, which involves running machine learning algorithms on quantum devices, may be one of the most significant flagship applications for these devices. Unlike its classical counterparts, the role of data in quantum machine learning has not been fully understood. In this work, we quantify the performances of quantum machine learning in the landscape of quantum data. Provided th…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 40 pages, 8 figures

  9. arXiv:2408.09698  [pdf, other]

    cs.IR cs.AI

    Harnessing Multimodal Large Language Models for Multimodal Sequential Recommendation

    Authors: Yuyang Ye, Zhi Zheng, Yishan Shen, Tianshu Wang, Hengruo Zhang, Peijun Zhu, Runlong Yu, Kai Zhang, Hui Xiong

    Abstract: Recent advances in Large Language Models (LLMs) have demonstrated significant potential in the field of Recommendation Systems (RSs). Most existing studies have focused on converting user behavior logs into textual prompts and leveraging techniques such as prompt tuning to enable LLMs for recommendation tasks. Meanwhile, research interest has recently grown in multimodal recommendation systems tha…

    Submitted 20 August, 2024; v1 submitted 19 August, 2024; originally announced August 2024.

  10. arXiv:2408.09650  [pdf, other]

    cs.CV cs.AI cs.MM eess.IV

    ExpoMamba: Exploiting Frequency SSM Blocks for Efficient and Effective Image Enhancement

    Authors: Eashan Adhikarla, Kai Zhang, John Nicholson, Brian D. Davison

    Abstract: Low-light image enhancement remains a challenging task in computer vision, with existing state-of-the-art models often limited by hardware constraints and computational inefficiencies, particularly in handling high-resolution images. Recent foundation models, such as transformers and diffusion models, despite their efficacy in various domains, are limited in use on edge devices due to their comput…

    Submitted 18 August, 2024; originally announced August 2024.

    Journal ref: Efficient Systems for Foundation Models II, International Conference on Machine Learning (ICML) 2024

  11. arXiv:2408.08933  [pdf, other]

    cs.IR cs.AI cs.DB

    RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search

    Authors: Meng Chen, Kai Zhang, Zhenying He, Yinan Jing, X. Sean Wang

    Abstract: Approximate Nearest Neighbor Search (ANNS) is a fundamental and critical component in many applications, including recommendation systems and large language model-based applications. With the advancement of multimodal neural models, which transform data from different modalities into a shared high-dimensional space as feature vectors, cross-modal ANNS aims to use the data vector from one modality…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: to be published in PVLDB

  12. arXiv:2408.08926  [pdf, other]

    cs.CR cs.AI cs.CL cs.CY cs.LG

    Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models

    Authors: Andy K. Zhang, Neil Perry, Riya Dulepet, Eliot Jones, Justin W. Lin, Joey Ji, Celeste Menders, Gashon Hussein, Samantha Liu, Donovan Jasper, Pura Peetathawatchai, Ari Glenn, Vikram Sivashankar, Daniel Zamoshchin, Leo Glikbarg, Derek Askaryar, Mike Yang, Teddy Zhang, Rishi Alluri, Nathan Tran, Rinnara Sangpisit, Polycarpos Yiorkadjis, Kenny Osele, Gautham Raghupathi, Dan Boneh, et al. (2 additional authors not shown)

    Abstract: Language Model (LM) agents for cybersecurity that are capable of autonomously identifying vulnerabilities and executing exploits have the potential to cause real-world impact. Policymakers, model providers, and other researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such agents to help mitigate cyberrisk and investigate opportunities for penetrat…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 86 pages, 7 figures

  13. arXiv:2408.08736  [pdf, other]

    cs.CV

    Task-Aware Dynamic Transformer for Efficient Arbitrary-Scale Image Super-Resolution

    Authors: Tianyi Xu, Yiji Zhou, Xiaotao Hu, Kai Zhang, Anran Zhang, Xingye Qiu, Jun Xu

    Abstract: Arbitrary-scale super-resolution (ASSR) aims to learn a single model for image super-resolution at arbitrary magnifying scales. Existing ASSR networks typically comprise an off-the-shelf scale-agnostic feature extractor and an arbitrary scale upsampler. These feature extractors often use fixed network architectures to address different ASSR inference tasks, each of which is characterized by an inp…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: ECAI 2024

  14. arXiv:2408.08342  [pdf, other]

    cs.GR cs.CV

    CT4D: Consistent Text-to-4D Generation with Animatable Meshes

    Authors: Ce Chen, Shaoli Huang, Xuelin Chen, Guangyi Chen, Xiaoguang Han, Kun Zhang, Mingming Gong

    Abstract: Text-to-4D generation has recently been demonstrated viable by integrating a 2D image diffusion model with a video diffusion model. However, existing models tend to produce results with inconsistent motions and geometric structures over time. To this end, we present a novel framework, coined CT4D, which directly operates on animatable meshes for generating consistent 4D content from arbitrary user…

    Submitted 15 August, 2024; originally announced August 2024.

  15. "I Try to Represent Myself as I Am": Self-Presentation Preferences of People with Invisible Disabilities through Embodied Social VR Avatars

    Authors: Ria J. Gualano, Lucy Jiang, Kexin Zhang, Tanisha Shende, Andrea Stevenson Won, Shiri Azenkot

    Abstract: With the increasing adoption of social virtual reality (VR), it is critical to design inclusive avatars. While researchers have investigated how and why blind and d/Deaf people wish to disclose their disabilities in VR, little is known about the preferences of many others with invisible disabilities (e.g., ADHD, dyslexia, chronic conditions). We filled this gap by interviewing 15 participants, eac…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: To appear at ASSETS 2024

  16. arXiv:2408.08146  [pdf, other]

    cs.CL

    KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning

    Authors: Kaiqi Zhang, Jing Zhao, Rui Chen

    Abstract: Large Language Models (LLMs) exhibit high inference latency due to their autoregressive decoding nature. While the draft head in speculative decoding mitigates this issue, its full potential remains unexplored. In this paper, we introduce KOALA (K-layer Optimized Adversarial Learning Architecture), an orthogonal approach to the draft head. By transforming the conventional single-layer draft head i…

    Submitted 15 August, 2024; originally announced August 2024.

  17. arXiv:2408.07340  [pdf, other]

    cs.LG cs.AI

    Towards Few-shot Self-explaining Graph Neural Networks

    Authors: Jingyu Peng, Qi Liu, Linan Yue, Zaixi Zhang, Kai Zhang, Yunhao Sha

    Abstract: Recent advancements in Graph Neural Networks (GNNs) have spurred an upsurge of research dedicated to enhancing the explainability of GNNs, particularly in critical domains such as medicine. A promising approach is the self-explaining method, which outputs explanations along with predictions. However, existing self-explaining models require a large amount of training data, rendering them unavailabl…

    Submitted 14 August, 2024; originally announced August 2024.

  18. arXiv:2408.07278  [pdf, other]

    cs.IR cs.AI cs.CV

    Scene-wise Adaptive Network for Dynamic Cold-start Scenes Optimization in CTR Prediction

    Authors: Wenhao Li, Jie Zhou, Chuan Luo, Chao Tang, Kun Zhang, Shixiong Zhao

    Abstract: In the realm of modern mobile E-commerce, providing users with nearby commercial service recommendations through location-based online services has become increasingly vital. While machine learning approaches have shown promise in multi-scene recommendation, existing methodologies often struggle to address cold-start problems in unprecedented scenes: the increasing diversity of commercial choices,…

    Submitted 18 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: 10 pages, 6 figures, accepted by Recsys 2024

    MSC Class: 68T09; ACM Class: I.2.0

  19. arXiv:2408.07176  [pdf, other]

    cs.NE

    Surrogate-Assisted Search with Competitive Knowledge Transfer for Expensive Optimization

    Authors: Xiaoming Xue, Yao Hu, Liang Feng, Kai Zhang, Linqi Song, Kay Chen Tan

    Abstract: Expensive optimization problems (EOPs) have attracted increasing research attention over the decades due to their ubiquity in a variety of practical applications. Despite many sophisticated surrogate-assisted evolutionary algorithms (SAEAs) that have been developed for solving such problems, most of them lack the ability to transfer knowledge from previously-solved tasks and always start their sea…

    Submitted 20 August, 2024; v1 submitted 13 August, 2024; originally announced August 2024.

    Comments: 22 pages, 14 figures

  20. arXiv:2408.07060  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents

    Authors: Kexun Zhang, Weiran Yao, Zuxin Liu, Yihao Feng, Zhiwei Liu, Rithesh Murthy, Tian Lan, Lei Li, Renze Lou, Jiacheng Xu, Bo Pang, Yingbo Zhou, Shelby Heinecke, Silvio Savarese, Huan Wang, Caiming Xiong

    Abstract: Large language model (LLM) agents have shown great potential in solving real-world software engineering (SWE) problems. The most advanced open-source SWE agent can resolve over 27% of real GitHub issues in SWE-Bench Lite. However, these sophisticated agent frameworks exhibit varying strengths, excelling in certain tasks while underperforming in others. To fully harness the diversity of these agent…

    Submitted 13 August, 2024; originally announced August 2024.

  21. arXiv:2408.07009  [pdf, other]

    cs.CV

    Imagen 3

    Authors: Imagen-Team-Google: Jason Baldridge, Jakob Bauer, Mukul Bhutani, Nicole Brichtova, Andrew Bunner, Kelvin Chan, Yichang Chen, Sander Dieleman, Yuqing Du, Zach Eaton-Rosen, Hongliang Fei, Nando de Freitas, Yilin Gao, Evgeny Gladchenko, Sergio Gómez Colmenarejo, Mandy Guo, Alex Haig, Will Hawkins, Hexiang Hu, Huilian Huang, Tobenna Peter Igwe, Christos Kaplanis, Siavash Khodadadeh, et al. (227 additional authors not shown)

    Abstract: We introduce Imagen 3, a latent diffusion model that generates high quality images from text prompts. We describe our quality and responsibility evaluations. Imagen 3 is preferred over other state-of-the-art (SOTA) models at the time of evaluation. In addition, we discuss issues around safety and representation, as well as methods we used to minimize the potential harm of our models.

    Submitted 13 August, 2024; originally announced August 2024.

  22. arXiv:2408.06878  [pdf, other]

    cs.CV cs.GR

    PBIR-NIE: Glossy Object Capture under Non-Distant Lighting

    Authors: Guangyan Cai, Fujun Luan, Miloš Hašan, Kai Zhang, Sai Bi, Zexiang Xu, Iliyan Georgiev, Shuang Zhao

    Abstract: Glossy objects present a significant challenge for 3D reconstruction from multi-view input images under natural lighting. In this paper, we introduce PBIR-NIE, an inverse rendering framework designed to holistically capture the geometry, material attributes, and surrounding illumination of such objects. We propose a novel parallax-aware non-distant environment map as a lightweight and efficient li…

    Submitted 13 August, 2024; originally announced August 2024.

  23. arXiv:2408.06286  [pdf, other]

    cs.CV

    Mipmap-GS: Let Gaussians Deform with Scale-specific Mipmap for Anti-aliasing Rendering

    Authors: Jiameng Li, Yue Shi, Jiezhang Cao, Bingbing Ni, Wenjun Zhang, Kai Zhang, Luc Van Gool

    Abstract: 3D Gaussian Splatting (3DGS) has attracted great attention in novel view synthesis because of its superior rendering efficiency and high fidelity. However, the trained Gaussians suffer from severe zooming degradation due to non-adjustable representation derived from single-scale training. Though some methods attempt to tackle this problem via post-processing techniques such as selective rendering…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: 9 pages

  24. arXiv:2408.06141  [pdf, ps, other]

    cs.FL

    [Draft] High-order observers and high-order state-estimation-based properties of discrete-event systems

    Authors: Kuize Zhang, Xiaoguang Han, Alessandro Giua, Carla Seatzu

    Abstract: State-estimation-based properties are central properties in discrete-event systems modeled by labeled finite-state automata studied over the past 3 decades. Most existing results are based on a single agent who knows the structure of a system and can observe a subset of events and estimate the system's state based on the system's structure and the agent's observation to the system. The main tool u…

    Submitted 13 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: 32 pages, 38 figures

  25. arXiv:2408.06087  [pdf, other]

    cs.CL cs.AI cs.LG

    Building Decision Making Models Through Language Model Regime

    Authors: Yu Zhang, Haoxiang Liu, Feijun Jiang, Weihua Luo, Kaifu Zhang

    Abstract: We propose a novel approach for decision making problems leveraging the generalization capabilities of large language models (LLMs). Traditional methods such as expert systems, planning algorithms, and reinforcement learning often exhibit limited generalization, typically requiring the training of new models for each unique task. In contrast, LLMs demonstrate remarkable success in generalizing acr…

    Submitted 12 August, 2024; originally announced August 2024.

  26. arXiv:2408.06079  [pdf, other]

    cs.CV

    Towards Adversarial Robustness via Debiased High-Confidence Logit Alignment

    Authors: Kejia Zhang, Juanjuan Weng, Zhiming Luo, Shaozi Li

    Abstract: Despite the significant advances that deep neural networks (DNNs) have achieved in various visual tasks, they still exhibit vulnerability to adversarial examples, leading to serious security concerns. Recent adversarial training techniques have utilized inverse adversarial attacks to generate high-confidence examples, aiming to align the distributions of adversarial examples with the high-confiden…

    Submitted 12 August, 2024; originally announced August 2024.

  27. arXiv:2408.06047  [pdf, other]

    cs.CV

    BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training

    Authors: Xuanpu Zhang, Dan Song, Pengxin Zhan, Qingguo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Anan Liu

    Abstract: Image-based virtual try-on is an increasingly popular and important task to generate realistic try-on images of specific person. Existing methods always employ an accurate mask to remove the original garment in the source image, thus achieving realistic synthesized images in simple and conventional try-on scenarios based on powerful diffusion model. Therefore, acquiring suitable mask is vital to t…

    Submitted 12 August, 2024; originally announced August 2024.

  28. arXiv:2408.05926  [pdf, other]

    cs.AI cs.LG cs.MM

    BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation

    Authors: Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee, Kang Zhang, Yu-Jung Heo, Du-Seong Chang, Chang D. Yoo

    Abstract: Multimodal Dialogue Response Generation (MDRG) is a recently proposed task where the model needs to generate responses in texts, images, or a blend of both based on the dialogue context. Due to the lack of a large-scale dataset specifically for this task and the benefits of leveraging powerful pre-trained models, previous work relies on the text modality as an intermediary step for both the image…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  29. arXiv:2408.05788  [pdf, other]

    cs.LG cs.AI stat.ML

    Continual Learning of Nonlinear Independent Representations

    Authors: Boyang Sun, Ignavier Ng, Guangyi Chen, Yifan Shen, Qirong Ho, Kun Zhang

    Abstract: Identifying the causal relations between interested variables plays a pivotal role in representation learning as it provides deep insights into the dataset. Identifiability, as the central theme of this approach, normally hinges on leveraging data from multiple distributions (intervention, distribution shift, time series, etc.). Despite the exciting development in this field, a practical but often…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 9 pages, 5 Figures

  30. arXiv:2408.05694  [pdf, other]

    cs.CR

    ICSFuzz: Collision Detector Bug Discovery in Autonomous Driving Simulators

    Authors: Weiwei Fu, Heqing Huang, Yifan Zhang, Ke Zhang, Jin Huang, Wei-Bin Lee, Jianping Wang

    Abstract: With the increasing adoption of autonomous vehicles, ensuring the reliability of autonomous driving systems (ADSs) deployed on autonomous vehicles has become a significant concern. Driving simulators have emerged as crucial platforms for testing autonomous driving systems, offering realistic, dynamic, and configurable environments. However, existing simulation-based ADS testers have largely overlo…

    Submitted 11 August, 2024; originally announced August 2024.

  31. arXiv:2408.05428  [pdf, other]

    cs.LG stat.ME stat.ML

    Generalized Encouragement-Based Instrumental Variables for Counterfactual Regression

    Authors: Anpeng Wu, Kun Kuang, Ruoxuan Xiong, Xiangwei Chen, Zexu Sun, Fei Wu, Kun Zhang

    Abstract: In causal inference, encouragement designs (EDs) are widely used to analyze causal effects, when randomized controlled trials (RCTs) are impractical or compliance to treatment cannot be perfectly enforced. Unlike RCTs, which directly allocate treatments, EDs randomly assign encouragement policies that positively motivate individuals to engage in a specific treatment. These random encouragements ac…

    Submitted 10 August, 2024; originally announced August 2024.

  32. arXiv:2408.05411  [pdf, other]

    cs.CV

    How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model

    Authors: Yuxin Zhu, Huiyu Duan, Kaiwei Zhang, Yucheng Zhu, Xilei Zhu, Long Teng, Xiongkuo Min, Guangtao Zhai

    Abstract: Understanding and predicting viewer attention in omnidirectional videos (ODVs) is crucial for enhancing user engagement in virtual and augmented reality applications. Although both audio and visual modalities are essential for saliency prediction in ODVs, the joint exploitation of these two modalities has been limited, primarily due to the absence of large-scale audio-visual saliency databases and…

    Submitted 9 August, 2024; originally announced August 2024.

  33. arXiv:2408.05112  [pdf, other]

    cs.LG cs.AI eess.IV

    Semantic Successive Refinement: A Generative AI-aided Semantic Communication Framework

    Authors: Kexin Zhang, Lixin Li, Wensheng Lin, Yuna Yan, Rui Li, Wenchi Cheng, Zhu Han

    Abstract: Semantic Communication (SC) is an emerging technology aiming to surpass the Shannon limit. Traditional SC strategies often minimize signal distortion between the original and reconstructed data, neglecting perceptual quality, especially in low Signal-to-Noise Ratio (SNR) environments. To address this issue, we introduce a novel Generative AI Semantic Communication (GSC) system for single-user scen…

    Submitted 31 July, 2024; originally announced August 2024.

  34. arXiv:2408.04336  [pdf, other]

    cs.AI

    KnowPC: Knowledge-Driven Programmatic Reinforcement Learning for Zero-shot Coordination

    Authors: Yin Gu, Qi Liu, Zhi Li, Kai Zhang

    Abstract: Zero-shot coordination (ZSC) remains a major challenge in the cooperative AI field, which aims to learn an agent to cooperate with an unseen partner in training environments or even novel environments. In recent years, a popular ZSC solution paradigm has been deep reinforcement learning (DRL) combined with advanced self-play or population-based methods to enhance the neural policy's ability to han…

    Submitted 8 August, 2024; originally announced August 2024.

  35. arXiv:2408.04235  [pdf, other]

    cs.CV

    LLDif: Diffusion Models for Low-light Emotion Recognition

    Authors: Zhifeng Wang, Kaihao Zhang, Ramesh Sankaranarayana

    Abstract: This paper introduces LLDif, a novel diffusion-based facial expression recognition (FER) framework tailored for extremely low-light (LL) environments. Images captured under such conditions often suffer from low brightness and significantly reduced contrast, presenting challenges to conventional methods. These challenges include poor image quality that can significantly reduce the accuracy of emoti…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted by ICPR2024

  36. arXiv:2408.03360  [pdf, other]

    cs.LG cs.AI

    Prioritize Alignment in Dataset Distillation

    Authors: Zekai Li, Ziyao Guo, Wangbo Zhao, Tianle Zhang, Zhi-Qi Cheng, Samir Khaki, Kaipeng Zhang, Ahmad Sajedi, Konstantinos N Plataniotis, Kai Wang, Yang You

    Abstract: Dataset Distillation aims to compress a large dataset into a significantly more compact, synthetic one without compromising the performance of the trained models. To achieve this, existing methods use the agent model to extract information from the target dataset and embed it into the distilled dataset. Consequently, the quality of extracted and embedded information determines the quality of the d…

    Submitted 13 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: 18 pages, 9 figures

  37. arXiv:2408.03326  [pdf, other]

    cs.CV cs.AI cs.CL

    LLaVA-OneVision: Easy Visual Task Transfer

    Authors: Bo Li, Yuanhan Zhang, Dong Guo, Renrui Zhang, Feng Li, Hao Zhang, Kaichen Zhang, Yanwei Li, Ziwei Liu, Chunyuan Li

    Abstract: We present LLaVA-OneVision, a family of open large multimodal models (LMMs) developed by consolidating our insights into data, models, and visual representations in the LLaVA-NeXT blog series. Our experimental results demonstrate that LLaVA-OneVision is the first single model that can simultaneously push the performance boundaries of open LMMs in three important computer vision scenarios: single-i…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Project Homepage: https://llava-vl.github.io/blog/2024-08-05-llava-onevision/

  38. arXiv:2408.03286  [pdf, other]

    cs.CV

    Biomedical SAM 2: Segment Anything in Biomedical Images and Videos

    Authors: Zhiling Yan, Weixiang Sun, Rong Zhou, Zhengqing Yuan, Kai Zhang, Yiwei Li, Tianming Liu, Quanzheng Li, Xiang Li, Lifang He, Lichao Sun

    Abstract: Medical image segmentation and video object segmentation are essential for diagnosing and analyzing diseases by identifying and measuring biological structures. Recent advances in natural domain have been driven by foundation models like the Segment Anything Model 2 (SAM-2). To explore the performance of SAM-2 in biomedical applications, we designed three evaluation pipelines for single-frame 2D i…

    Submitted 17 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  39. arXiv:2408.03149  [pdf, other]

    cs.CV cs.CL

    Leveraging Entity Information for Cross-Modality Correlation Learning: The Entity-Guided Multimodal Summarization

    Authors: Yanghai Zhang, Ye Liu, Shiwei Wu, Kai Zhang, Xukai Liu, Qi Liu, Enhong Chen

    Abstract: The rapid increase in multimedia data has spurred advancements in Multimodal Summarization with Multimodal Output (MSMO), which aims to produce a multimodal summary that integrates both text and relevant images. The inherent heterogeneity of content within multimodal inputs and outputs presents a significant challenge to the execution of MSMO. Traditional approaches typically adopt a holistic pers…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: In ACL-Findings 2024

  40. arXiv:2408.02718  [pdf, other]

    cs.CV

    MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models

    Authors: Fanqing Meng, Jin Wang, Chuanhao Li, Quanfeng Lu, Hao Tian, Jiaqi Liao, Xizhou Zhu, Jifeng Dai, Yu Qiao, Ping Luo, Kaipeng Zhang, Wenqi Shao

    Abstract: The capability to process multiple images is crucial for Large Vision-Language Models (LVLMs) to develop a more thorough and nuanced understanding of a scene. Recent multi-image LVLMs have begun to address this need. However, their evaluation has not kept pace with their development. To fill this gap, we introduce the Multimodal Multi-image Understanding (MMIU) benchmark, a comprehensive evaluatio…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Project Page: https://mmiu-bench.github.io/

  41. arXiv:2407.21323  [pdf]

    eess.IV cs.CV

    STANet: A Novel Spatio-Temporal Aggregation Network for Depression Classification with Small and Unbalanced FMRI Data

    Authors: Wei Zhang, Weiming Zeng, Hongyu Chen, Jie Liu, Hongjie Yan, Kaile Zhang, Ran Tao, Wai Ting Siok, Nizhuan Wang

    Abstract: Accurate diagnosis of depression is crucial for timely implementation of optimal treatments, preventing complications and reducing the risk of suicide. Traditional methods rely on self-report questionnaires and clinical assessment, lacking objective biomarkers. Combining fMRI with artificial intelligence can enhance depression diagnosis by integrating neuroimaging indicators. However, the specific…

    Submitted 31 July, 2024; originally announced July 2024.

  42. arXiv:2407.20271  [pdf, other]

    cs.LG cs.AI cs.CL

    Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models

    Authors: Haoyu Tang, Ye Liu, Xukai Liu, Kai Zhang, Yanghai Zhang, Qi Liu, Enhong Chen

    Abstract: Recent advancements in machine learning, especially in Natural Language Processing (NLP), have led to the development of sophisticated models trained on vast datasets, but this progress has raised concerns about potential sensitive information leakage. In response, regulatory measures like the EU General Data Protection Regulation (GDPR) have driven the exploration of Machine Unlearning techniques…

    Submitted 25 July, 2024; originally announced July 2024.

  43. arXiv:2407.19628  [pdf, other]

    cs.CV

    Text2LiDAR: Text-guided LiDAR Point Cloud Generation via Equirectangular Transformer

    Authors: Yang Wu, Kaihua Zhang, Jianjun Qian, Jin Xie, Jian Yang

    Abstract: The complex traffic environment and various weather conditions make the collection of LiDAR data expensive and challenging. Achieving high-quality and controllable LiDAR data generation is urgently needed, controlling with text is a common practice, but there is little research in this field. To this end, we propose Text2LiDAR, the first efficient, diverse, and text-controllable LiDAR data generat…

    Submitted 28 July, 2024; originally announced July 2024.

  44. arXiv:2407.19572  [pdf, other]

    cs.CR

    Maximal Extractable Value Mitigation Approaches in Ethereum and Layer-2 Chains: A Comprehensive Survey

    Authors: Zeinab Alipanahloo, Abdelhakim Senhaji Hafid, Kaiwen Zhang

    Abstract: Maximal Extractable Value (MEV) represents a pivotal challenge within the Ethereum ecosystem; it impacts the fairness, security, and efficiency of both Layer 1 (L1) and Layer 2 (L2) networks. MEV arises when miners or validators manipulate transaction ordering to extract additional value, often at the expense of other network participants. This not only affects user experience by introducing unpre…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: 19 pages

  45. arXiv:2407.19426  [pdf, ps, other]

    cs.LG cs.AI stat.ML

    Causal Discovery in Linear Models with Unobserved Variables and Measurement Error

    Authors: Yuqin Yang, Mohamed Nafea, Negar Kiyavash, Kun Zhang, AmirEmad Ghassami

    Abstract: The presence of unobserved common causes and the presence of measurement error are two of the most limiting challenges in the task of causal structure learning. Ignoring either of the two challenges can lead to detecting spurious causal links among variables of interest. In this paper, we study the problem of causal discovery in systems where these two challenges can be present simultaneously. We…

    Submitted 28 July, 2024; originally announced July 2024.

  46. arXiv:2407.19371  [pdf, other]

    cs.LG

    Deep State-Space Generative Model For Correlated Time-to-Event Predictions

    Authors: Yuan Xue, Denny Zhou, Nan Du, Andrew M. Dai, Zhen Xu, Kun Zhang, Claire Cui

    Abstract: Capturing the inter-dependencies among multiple types of clinically-critical events is critical not only to accurate future event prediction, but also to better treatment planning. In this work, we propose a deep latent state-space generative model to capture the interactions among different types of correlated clinical events (e.g., kidney failure, mortality) by explicitly modeling the temporal d…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: Proceedings of the 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining

  47. arXiv:2407.18492  [pdf]

    cs.CV

    Neural Modulation Alteration to Positive and Negative Emotions in Depressed Patients: Insights from fMRI Using Positive/Negative Emotion Atlas

    Authors: Yu Feng, Weiming Zeng, Yifan Xie, Hongyu Chen, Lei Wang, Yingying Wang, Hongjie Yan, Kaile Zhang, Ran Tao, Wai Ting Siok, Nizhuan Wang

    Abstract: Background: Although it has been noticed that depressed patients show differences in processing emotions, the precise neural modulation mechanisms of positive and negative emotions remain elusive. FMRI is a cutting-edge medical imaging technology renowned for its high spatial resolution and dynamic temporal information, making it particularly suitable for the neural dynamics of depression research…

    Submitted 25 July, 2024; originally announced July 2024.

  48. arXiv:2407.16982  [pdf, other]

    cs.CV cs.AI

    Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model

    Authors: Lirui Zhao, Tianshuo Yang, Wenqi Shao, Yuxin Zhang, Yu Qiao, Ping Luo, Kaipeng Zhang, Rongrong Ji

    Abstract: This paper addresses an important problem of object addition for images with only text guidance. It is challenging because the new object must be integrated seamlessly into the image with consistent visual context, such as lighting, texture, and spatial location. While existing text-guided image inpainting methods can add objects, they either fail to preserve the background consistency or involve…

    Submitted 23 July, 2024; originally announced July 2024.

  49. arXiv:2407.16975  [pdf, other]

    cs.LG stat.ME

    On the Parameter Identifiability of Partially Observed Linear Causal Models

    Authors: Xinshuai Dong, Ignavier Ng, Biwei Huang, Yuewen Sun, Songyao Jin, Roberto Legaspi, Peter Spirtes, Kun Zhang

    Abstract: Linear causal models are important tools for modeling causal dependencies and yet in practice, only a subset of the variables can be observed. In this paper, we examine the parameter identifiability of these models by investigating whether the edge coefficients can be recovered given the causal structure and partially observed data. Our setting is more general than that of prior research - we allo…

    Submitted 23 July, 2024; originally announced July 2024.

  50. arXiv:2407.15325  [pdf, other]

    cs.AI

    Odyssey: Empowering Agents with Open-World Skills

    Authors: Shunyu Liu, Yaoru Li, Kongcheng Zhang, Zhenyu Cui, Wenkai Fang, Yuxuan Zheng, Tongya Zheng, Mingli Song

    Abstract: Recent studies have delved into constructing generalist agents for open-world embodied environments like Minecraft. Despite the encouraging results, existing efforts mainly focus on solving basic programmatic tasks, e.g., material collection and tool-crafting following the Minecraft tech-tree, treating the ObtainDiamond task as the ultimate goal. This limitation stems from the narrowly defined set…

    Submitted 21 July, 2024; originally announced July 2024.