Skip to main content

Showing 1–50 of 288 results for author: Gong, Z

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.10495  [pdf, other

    cs.SE cs.AI

    How Well Do Large Language Models Serve as End-to-End Secure Code Producers?

    Authors: Jianian Gong, Nachuan Duan, Ziheng Tao, Zhaohui Gong, Yuan Yuan, Minlie Huang

    Abstract: The rapid advancement of large language models (LLMs) such as GPT-4 has revolutionized the landscape of software engineering, positioning these models at the core of modern development practices. As we anticipate these models to evolve into the primary and trustworthy tools used in software development, ensuring the security of the code they produce becomes paramount. How well can LLMs serve as en… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    ACM Class: D.2

  2. arXiv:2408.09333  [pdf, other

    cs.CL

    SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama

    Authors: Jing Tang, Quanlu Jia, Yuqiang Xie, Zeyu Gong, Xiang Wen, Jiayi Zhang, Yalong Guo, Guibin Chen, Jiangping Yang

    Abstract: Generating high-quality shooting scripts containing information such as scene and shot language is essential for short drama script generation. We collect 6,660 popular short drama episodes from the Internet, each with an average of 100 short episodes, and the total number of short episodes is about 80,000, with a total duration of about 2,000 hours and totaling 10 terabytes (TB). We perform keyfr… ▽ More

    Submitted 17 August, 2024; originally announced August 2024.

    Comments: 18 pages, 12 figures

  3. arXiv:2408.08089  [pdf, other

    cs.CL cs.AI

    AgentCourt: Simulating Court with Adversarial Evolvable Lawyer Agents

    Authors: Guhong Chen, Liyang Fan, Zihan Gong, Nan Xie, Zixuan Li, Ziqiang Liu, Chengming Li, Qiang Qu, Shiwen Ni, Min Yang

    Abstract: In this paper, we present a simulation system called AgentCourt that simulates the entire courtroom process. The judge, plaintiff's lawyer, defense lawyer, and other participants are autonomous agents driven by large language models (LLMs). Our core goal is to enable lawyer agents to learn how to argue a case, as well as improving their overall legal skills, through courtroom process simulation. T… ▽ More

    Submitted 15 August, 2024; originally announced August 2024.

  4. arXiv:2408.07291  [pdf, other

    cs.CR

    Evaluating Large Language Model based Personal Information Extraction and Countermeasures

    Authors: Yupei Liu, Yuqi Jia, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: Automatically extracting personal information--such as name, phone number, and email address--from publicly available profiles at a large scale is a stepstone to many other security attacks including spear phishing. Traditional methods--such as regular expression, keyword search, and entity detection--achieve limited success at such personal information extraction. In this work, we perform a syste… ▽ More

    Submitted 14 August, 2024; originally announced August 2024.

  5. arXiv:2408.00118  [pdf, other

    cs.CL cs.AI

    Gemma 2: Improving Open Language Models at a Practical Size

    Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman , et al. (172 additional authors not shown)

    Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al… ▽ More

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  6. arXiv:2407.21315  [pdf, other

    cs.CL cs.AI

    Beyond Silent Letters: Amplifying LLMs in Emotion Recognition with Vocal Nuances

    Authors: Zehui Wu, Ziwei Gong, Lin Ai, Pengyuan Shi, Kaan Donbekci, Julia Hirschberg

    Abstract: This paper introduces a novel approach to emotion detection in speech using Large Language Models (LLMs). We address the limitation of LLMs in processing audio inputs by translating speech characteristics into natural language descriptions. Our method integrates these descriptions into text prompts, enabling LLMs to perform multimodal emotion analysis without architectural modifications. We evalua… ▽ More

    Submitted 31 July, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  7. arXiv:2407.21293  [pdf, other

    cs.CV cs.AI

    SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving

    Authors: Peiru Zheng, Yun Zhao, Zhan Gong, Hong Zhu, Shaohua Wu

    Abstract: Many fields could benefit from the rapid development of the large language models (LLMs). The end-to-end autonomous driving (e2eAD) is one of the typically fields facing new opportunities as the LLMs have supported more and more modalities. Here, by utilizing vision-language model (VLM), we proposed an e2eAD method called SimpleLLM4AD. In our method, the e2eAD task are divided into four stages, wh… ▽ More

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 16 pages, 3 figures

  8. arXiv:2407.21033  [pdf, other

    cs.IR cs.AI cs.CL cs.CV

    Multi-Grained Query-Guided Set Prediction Network for Grounded Multimodal Named Entity Recognition

    Authors: Jielong Tang, Zhenxing Wang, Ziyang Gong, Jianxing Yu, Shuang Wang, Jian Yin

    Abstract: Grounded Multimodal Named Entity Recognition (GMNER) is an emerging information extraction (IE) task, aiming to simultaneously extract entity spans, types, and entity-matched bounding box groundings in images from given sentence-image pairs data. Recent unified methods employing machine reading comprehension (MRC-based) frameworks or sequence generation-based models face challenges in understandin… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 11 pages, 5 figures

  9. arXiv:2407.16207  [pdf, other

    cs.CL

    Graph-Structured Speculative Decoding

    Authors: Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

    Abstract: Speculative decoding has emerged as a promising technique to accelerate the inference of Large Language Models (LLMs) by employing a small language model to draft a hypothesis sequence, which is then validated by the LLM. The effectiveness of this approach heavily relies on the balance between performance and efficiency of the draft model. In our research, we focus on enhancing the proportion of d… ▽ More

    Submitted 23 July, 2024; originally announced July 2024.

  10. arXiv:2407.15100  [pdf, other

    cs.CR cs.LG

    A General Framework for Data-Use Auditing of ML Models

    Authors: Zonghao Huang, Neil Zhenqiang Gong, Michael K. Reiter

    Abstract: Auditing the use of data in training machine-learning (ML) models is an increasingly pressing challenge, as myriad ML practitioners routinely leverage the effort of content creators to train models without their permission. In this paper, we propose a general method to audit an ML model for the use of a data-owner's data in training, without prior knowledge of the ML task for which the data might… ▽ More

    Submitted 4 August, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

    Comments: The full paper of "A General Framework for Data-Use Auditing of ML Models" accepted by CCS 2024

  11. arXiv:2407.09050  [pdf, other

    cs.CR cs.AI cs.CV cs.LG

    Refusing Safe Prompts for Multi-modal Large Language Models

    Authors: Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong

    Abstract: Multimodal large language models (MLLMs) have become the cornerstone of today's generative AI ecosystem, sparking intense competition among tech giants and startups. In particular, an MLLM generates a text response given a prompt consisting of an image and a question. While state-of-the-art MLLMs use safety filters and alignment techniques to refuse unsafe prompts, in this work, we introduce MLLM-… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  12. arXiv:2407.07221  [pdf, other

    cs.CV cs.CR

    Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning

    Authors: Yuqi Jia, Minghong Fang, Hongbin Liu, Jinghuai Zhang, Neil Zhenqiang Gong

    Abstract: Poisoning attacks compromise the training phase of federated learning (FL) such that the learned global model misclassifies attacker-chosen inputs called target inputs. Existing defenses mainly focus on protecting the training phase of FL such that the learnt global model is poison free. However, these defenses often achieve limited effectiveness when the clients' local training data is highly non… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  13. arXiv:2407.06677  [pdf, other

    cs.CL

    Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules

    Authors: Zhuocheng Gong, Ang Lv, Jian Guan, Junxi Yan, Wei Wu, Huishuai Zhang, Minlie Huang, Dongyan Zhao, Rui Yan

    Abstract: Is it always necessary to compute tokens from shallow to deep layers in Transformers? The continued success of vanilla Transformers and their variants suggests an undoubted "yes". In this work, however, we attempt to break the depth-ordered convention by proposing a novel architecture dubbed mixture-of-modules (MoM), which is motivated by an intuition that any layer, regardless of its position, ca… ▽ More

    Submitted 9 July, 2024; originally announced July 2024.

  14. arXiv:2407.04086  [pdf, other

    cs.CR cs.CV cs.LG

    Certifiably Robust Image Watermark

    Authors: Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Jinyuan Jia, Neil Zhenqiang Gong

    Abstract: Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns. Watermarking AI-generated content is a key technology to address these concerns and has been widely deployed in industry. However, watermarking is vulnerable to removal attacks and forgery attacks. In this work, we propose the first image watermarks with certified robustness guarantees against rem… ▽ More

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  15. arXiv:2407.01505  [pdf, other

    cs.CL cs.AI

    Self-Cognition in Large Language Models: An Exploratory Study

    Authors: Dongping Chen, Jiawen Shi, Yao Wan, Pan Zhou, Neil Zhenqiang Gong, Lichao Sun

    Abstract: While Large Language Models (LLMs) have achieved remarkable success across various applications, they also raise concerns regarding self-cognition. In this paper, we perform a pioneering study to explore self-cognition in LLMs. Specifically, we first construct a pool of self-cognition instruction prompts to evaluate where an LLM exhibits self-cognition and four well-designed principles to quantify… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted at ICML 2024 Large Language Models and Cognition Workshop

  16. arXiv:2407.01026  [pdf, other

    cs.CL cs.AI

    Augmenting Document-level Relation Extraction with Efficient Multi-Supervision

    Authors: Xiangyu Lin, Weijia Jia, Zhiguo Gong

    Abstract: Despite its popularity in sentence-level relation extraction, distantly supervised data is rarely utilized by existing work in document-level relation extraction due to its noisy nature and low information density. Among its current applications, distantly supervised data is mostly used as a whole for pertaining, which is of low time efficiency. To fill in the gap of efficient and robust utilizati… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  17. arXiv:2406.15968  [pdf, other

    cs.CL cs.LG

    ReCaLL: Membership Inference via Relative Conditional Log-Likelihoods

    Authors: Roy Xie, Junlin Wang, Ruomin Huang, Minxing Zhang, Rong Ge, Jian Pei, Neil Zhenqiang Gong, Bhuwan Dhingra

    Abstract: The rapid scaling of large language models (LLMs) has raised concerns about the transparency and fair use of the pretraining data used for training them. Detecting such content is challenging due to the scale of the data and limited exposure of each instance during training. We propose ReCaLL (Relative Conditional Log-Likelihood), a novel membership inference attack (MIA) to detect LLMs' pretraini… ▽ More

    Submitted 22 June, 2024; originally announced June 2024.

  18. arXiv:2406.14056  [pdf, other

    cs.CV

    VGA: Vision GUI Assistant -- Minimizing Hallucinations through Image-Centric Fine-Tuning

    Authors: Ziyang Meng, Yu Dai, Zezheng Gong, Shaoxiong Guo, Minglong Tang, Tongquan Wei

    Abstract: Recent advances in Large Vision-Language Models (LVLMs) have significantly improve performance in image comprehension tasks, such as formatted charts and rich-content images. Yet, Graphical User Interface (GUI) pose a greater challenge due to their structured format and detailed textual information. Existing LVLMs often overly depend on internal knowledge and neglect image content, resulting in ha… ▽ More

    Submitted 21 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

    Comments: 18 pages

    MSC Class: 68-04 68-04 ACM Class: I.2.7; I.2.10

  19. arXiv:2406.12723  [pdf, other

    cs.LG

    BIOSCAN-5M: A Multimodal Dataset for Insect Biodiversity

    Authors: Zahra Gharaee, Scott C. Lowe, ZeMing Gong, Pablo Millan Arias, Nicholas Pellegrino, Austin T. Wang, Joakim Bruslund Haurum, Iuliia Zarubiieva, Lila Kari, Dirk Steinke, Graham W. Taylor, Paul Fieguth, Angel X. Chang

    Abstract: As part of an ongoing worldwide effort to comprehend and monitor insect biodiversity, this paper presents the BIOSCAN-5M Insect dataset to the machine learning community and establish several benchmark tasks. BIOSCAN-5M is a comprehensive dataset containing multi-modal information for over 5 million insect specimens, and it significantly expands existing image-based biological datasets by includin… ▽ More

    Submitted 24 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  20. arXiv:2406.11431  [pdf, other

    cs.CL cs.AI

    Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

    Authors: Wenkai Yang, Shiqi Shen, Guangyao Shen, Zhi Gong, Yankai Lin

    Abstract: Superalignment, where humans are weak supervisors of superhuman models, has become an important and widely discussed issue in the current era of rapid development of Large Language Models (LLMs). The recent work preliminarily studies this problem by using weak models to supervise strong models. It discovers that weakly supervised strong students can consistently outperform weak teachers towards th… ▽ More

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Code is available at https://github.com/keven980716/weak-to-strong-deception

  21. arXiv:2406.11409  [pdf, other

    cs.CL cs.AI

    CodeGemma: Open Code Models Based on Gemma

    Authors: CodeGemma Team, Heri Zhao, Jeffrey Hui, Joshua Howland, Nam Nguyen, Siqi Zuo, Andrea Hu, Christopher A. Choquette-Choo, Jingyue Shen, Joe Kelley, Kshitij Bansal, Luke Vilnis, Mateo Wirth, Paul Michel, Peter Choy, Pratik Joshi, Ravin Kumar, Sarmad Hashmi, Shubham Agrawal, Zhitao Gong, Jane Fine, Tris Warkentin, Ale Jakse Hartman, Bin Ni, Kathy Korevec , et al. (2 additional authors not shown)

    Abstract: This paper introduces CodeGemma, a collection of specialized open code models built on top of Gemma, capable of a variety of code and natural language generation tasks. We release three model variants. CodeGemma 7B pretrained (PT) and instruction-tuned (IT) variants have remarkably resilient natural language understanding, excel in mathematical reasoning, and match code capabilities of other open… ▽ More

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

    Comments: v1: 11 pages, 4 figures, 5 tables. v2: Update metadata

  22. arXiv:2406.08187  [pdf, other

    cs.RO

    Learning-based Traversability Costmap for Autonomous Off-road Navigation

    Authors: Qiumin Zhu, Zhen Sun, Songpengcheng Xia, Guoqing Liu, Kehui Ma, Ling Pei, Zheng Gong

    Abstract: Traversability estimation in off-road terrains is an essential procedure for autonomous navigation. However, creating reliable labels for complex interactions between the robot and the surface is still a challenging problem in learning-based costmap generation. To address this, we propose a method that predicts traversability costmaps by leveraging both visual and geometric information of the envi… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  23. arXiv:2406.06979  [pdf, other

    cs.LG cs.CR cs.SD eess.AS

    AudioMarkBench: Benchmarking Robustness of Audio Watermarking

    Authors: Hongbin Liu, Moyang Guo, Zhengyuan Jiang, Lun Wang, Neil Zhenqiang Gong

    Abstract: The increasing realism of synthetic speech, driven by advancements in text-to-speech models, raises ethical concerns regarding impersonation and disinformation. Audio watermarking offers a promising solution via embedding human-imperceptible watermarks into AI-generated audios. However, the robustness of audio watermarking against common/adversarial perturbations remains understudied. We present A… ▽ More

    Submitted 11 June, 2024; originally announced June 2024.

  24. arXiv:2406.02252  [pdf, other

    cs.DC

    Exploring the Efficiency of Renewable Energy-based Modular Data Centers at Scale

    Authors: Jinghan Sun, Zibo Gong, Anup Agarwal, Shadi Noghabi, Ranveer Chandra, Marc Snir, Jian Huang

    Abstract: Modular data centers (MDCs) that can be placed right at the energy farms and powered mostly by renewable energy, are proven to be a flexible and effective approach to lowering the carbon footprint of data centers. However, the main challenge of using renewable energy is the high variability of power produced, which implies large volatility in powering computing resources at MDCs, and degraded appl… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  25. arXiv:2406.01460  [pdf, other

    cs.CV cs.AI

    MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization

    Authors: Yu Zhang, Qi Zhang, Zixuan Gong, Yiwei Shi, Yepeng Liu, Duoqian Miao, Yang Liu, Ke Liu, Kun Yi, Wei Fan, Liang Hu, Changwei Wang

    Abstract: Contrastive Language-Image Pretraining (CLIP) has achieved remarkable success, leading to rapid advancements in multimodal studies. However, CLIP faces a notable challenge in terms of inefficient data utilization. It relies on a single contrastive supervision for each image-text pair during representation learning, disregarding a substantial amount of valuable information that could offer richer s… ▽ More

    Submitted 4 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  26. arXiv:2405.19730  [pdf

    cs.AI cs.CV cs.LG

    Research on the Spatial Data Intelligent Large Model

    Authors: Shaohua Wang, Xing Xie, Yong Li, Danhuai Guo, Zhi Cai, Yu Liu, Yang Yue, Xiao Pan, Feng Lu, Huayi Wu, Zhipeng Gui, Zhiming Ding, Bolong Zheng, Fuzheng Zhang, Jingyuan Wang, Zhengchao Chen, Hao Lu, Jiayi Li, Peng Yue, Wenhao Yu, Yao Yao, Leilei Sun, Yong Zhang, Longbiao Chen, Xiaoping Du , et al. (6 additional authors not shown)

    Abstract: This report focuses on spatial data intelligent large models, delving into the principles, methods, and cutting-edge applications of these models. It provides an in-depth discussion on the definition, development history, current status, and trends of spatial data intelligent large models, as well as the challenges they face. The report systematically elucidates the key technologies of spatial dat… ▽ More

    Submitted 15 August, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

    Comments: V1 and V2 are in Chinese language, other versions are in English

  27. arXiv:2405.17537  [pdf, other

    cs.AI cs.CL cs.CV

    BIOSCAN-CLIP: Bridging Vision and Genomics for Biodiversity Monitoring at Scale

    Authors: ZeMing Gong, Austin T. Wang, Joakim Bruslund Haurum, Scott C. Lowe, Graham W. Taylor, Angel X. Chang

    Abstract: Measuring biodiversity is crucial for understanding ecosystem health. While prior works have developed machine learning models for the taxonomic classification of photographic images and DNA separately, in this work, we introduce a multimodal approach combining both, using CLIP-style contrastive learning to align images, DNA barcodes, and textual data in a unified embedding space. This allows for… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 16 pages with 9 figures

  28. arXiv:2405.10029  [pdf, other

    cs.MM

    AsCL: An Asymmetry-sensitive Contrastive Learning Method for Image-Text Retrieval with Cross-Modal Fusion

    Authors: Ziyu Gong, Chengcheng Mai, Yihua Huang

    Abstract: The image-text retrieval task aims to retrieve relevant information from a given image or text. The main challenge is to unify multimodal representation and distinguish fine-grained differences across modalities, thereby finding similar contents and filtering irrelevant contents. However, existing methods mainly focus on unified semantic representation and concept alignment for multi-modalities, w… ▽ More

    Submitted 17 May, 2024; v1 submitted 16 May, 2024; originally announced May 2024.

    Comments: This work has been strong-accepted as the oral conference paper by IEEE International Conference on Multimedia & Expo (ICME) 2024

  29. arXiv:2405.05784  [pdf, other

    cs.CR cs.LG

    Link Stealing Attacks Against Inductive Graph Neural Networks

    Authors: Yixin Wu, Xinlei He, Pascal Berrang, Mathias Humbert, Michael Backes, Neil Zhenqiang Gong, Yang Zhang

    Abstract: A graph neural network (GNN) is a type of neural network that is specifically designed to process graph-structured data. Typically, GNNs can be implemented in two settings, including the transductive setting and the inductive setting. In the transductive setting, the trained model can only predict the labels of nodes that were observed at the training time. In the inductive setting, the trained mo… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: To appear in the 24th Privacy Enhancing Technologies Symposium (PETS 2024), July 15-20, 2024

  30. arXiv:2404.17839  [pdf, other

    cs.CR cs.SE

    Improving Smart Contract Security with Contrastive Learning-based Vulnerability Detection

    Authors: Yizhou Chen, Zeyu Sun, Zhihao Gong, Dan Hao

    Abstract: Currently, smart contract vulnerabilities (SCVs) have emerged as a major factor threatening the transaction security of blockchain. Existing state-of-the-art methods rely on deep learning to mitigate this threat. They treat each input contract as an independent entity and feed it into a deep learning model to learn vulnerability patterns by fitting vulnerability labels. It is a pity that they disr… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Journal ref: 2024 IEEE/ACM 46th International Conference on Software Engineering (ICSE '24)

  31. arXiv:2404.17698  [pdf, other

    cs.HC

    "Actually I Can Count My Blessings": User-Centered Design of an Application to Promote Gratitude Among Young Adults

    Authors: Ananya Bhattacharjee, Zichen Gong, Bingcheng Wang, Timothy James Luckcock, Emma Watson, Elena Allica Abellan, Leslie Gutman, Anne Hsu, Joseph Jay Williams

    Abstract: Regular practice of gratitude has the potential to enhance psychological wellbeing and foster stronger social connections among young adults. However, there is a lack of research investigating user needs and expectations regarding gratitude-promoting applications. To address this gap, we employed a user-centered design approach to develop a mobile application that facilitates gratitude practice. O… ▽ More

    Submitted 26 April, 2024; originally announced April 2024.

  32. arXiv:2404.17031  [pdf, other

    cs.CV

    Motor Focus: Ego-Motion Prediction with All-Pixel Matching

    Authors: Hao Wang, Jiayou Qin, Xiwen Chen, Ashish Bastola, John Suchanek, Zihao Gong, Abolfazl Razi

    Abstract: Motion analysis plays a critical role in various applications, from virtual reality and augmented reality to assistive visual navigation. Traditional self-driving technologies, while advanced, typically do not translate directly to pedestrian applications due to their reliance on extensive sensor arrays and non-feasible computational frameworks. This highlights a significant gap in applying these… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  33. arXiv:2404.15611  [pdf, other

    cs.CR

    Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

    Authors: Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

    Abstract: Model poisoning attacks are critical security threats to Federated Learning (FL). Existing model poisoning attacks suffer from two key limitations: 1) they achieve suboptimal effectiveness when defenses are deployed, and/or 2) they require knowledge of the model updates or local training data on genuine clients. In this work, we make a key observation that their suboptimal effectiveness arises fro… ▽ More

    Submitted 6 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

  34. arXiv:2404.13282  [pdf, other

    cs.CV cs.MM

    Wills Aligner: A Robust Multi-Subject Brain Representation Learner

    Authors: Guangyin Bao, Zixuan Gong, Qi Zhang, Jialei Zhou, Wei Fan, Kun Yi, Usman Naseem, Liang Hu, Duoqian Miao

    Abstract: Decoding visual information from human brain activity has seen remarkable advancements in recent research. However, due to the significant variability in cortical parcellation and cognition patterns across subjects, current approaches personalized deep models for each subject, constraining the practicality of this technology in real-world contexts. To tackle the challenges, we introduce Wills Alig… ▽ More

    Submitted 20 April, 2024; originally announced April 2024.

    Comments: 15 pages

  35. arXiv:2404.12630  [pdf, other

    cs.CV cs.MM

    MindTuner: Cross-Subject Visual Decoding with Visual Fingerprint and Semantic Correction

    Authors: Zixuan Gong, Qi Zhang, Guangyin Bao, Lei Zhu, Ke Liu, Liang Hu, Duoqian Miao

    Abstract: Decoding natural visual scenes from brain activity has flourished, with extensive research in single-subject tasks and, however, less in cross-subject tasks. Reconstructing high-quality images in cross-subject tasks is a challenging problem due to profound individual differences between subjects and the scarcity of data annotation. In this work, we proposed MindTuner for cross-subject visual decod… ▽ More

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: 14 pages

  36. arXiv:2404.12022  [pdf, other

    cs.CL

    Parallel Decoding via Hidden Transfer for Lossless Large Language Model Acceleration

    Authors: Pengfei Wu, Jiahao Liu, Zhuocheng Gong, Qifan Wang, Jinpeng Li, Jingang Wang, Xunliang Cai, Dongyan Zhao

    Abstract: Large language models (LLMs) have recently shown remarkable performance across a wide range of tasks. However, the substantial number of parameters in LLMs contributes to significant latency during model inference. This is particularly evident when utilizing autoregressive decoding methods, which generate one token in a single forward process, thereby not fully capitalizing on the parallel computi… ▽ More

    Submitted 18 April, 2024; originally announced April 2024.

  37. arXiv:2404.10211  [pdf, other

    cs.LG cs.AI

    Anomaly Correction of Business Processes Using Transformer Autoencoder

    Authors: Ziyou Gong, Xianwen Fang, Ping Wu

    Abstract: Event log records all events that occur during the execution of business processes, so detecting and correcting anomalies in event log can provide reliable guarantee for subsequent process analysis. The previous works mainly include next event prediction based methods and autoencoder-based methods. These methods cannot accurately and efficiently detect anomalies and correct anomalies at the same t… ▽ More

    Submitted 15 April, 2024; originally announced April 2024.

  38. arXiv:2404.10179  [pdf, other

    cs.RO cs.AI cs.HC cs.LG

    Scaling Instructable Agents Across Many Simulated Worlds

    Authors: SIMA Team, Maria Abi Raad, Arun Ahuja, Catarina Barros, Frederic Besse, Andrew Bolt, Adrian Bolton, Bethanie Brownfield, Gavin Buttimore, Max Cant, Sarah Chakera, Stephanie C. Y. Chan, Jeff Clune, Adrian Collister, Vikki Copeman, Alex Cullum, Ishita Dasgupta, Dario de Cesare, Julia Di Trapani, Yani Donchev, Emma Dunleavy, Martin Engelcke, Ryan Faulkner, Frankie Garcia, Charles Gbadamosi , et al. (68 additional authors not shown)

    Abstract: Building embodied AI systems that can follow arbitrary language instructions in any 3D environment is a key challenge for creating general AI. Accomplishing this goal requires learning to ground language in perception and embodied actions, in order to accomplish complex tasks. The Scalable, Instructable, Multiworld Agent (SIMA) project tackles this by training agents to follow free-form instructio… ▽ More

    Submitted 17 April, 2024; v1 submitted 13 March, 2024; originally announced April 2024.

  39. arXiv:2404.07431  [pdf, other

    cs.RO eess.SY

    Parameterized Fast and Safe Tracking (FaSTrack) using Deepreach

    Authors: Hyun Joe Jeong, Zheng Gong, Somil Bansal, Sylvia Herbert

    Abstract: Fast and Safe Tracking (FaSTrack) is a modular framework that provides safety guarantees while planning and executing trajectories in real time via value functions of Hamilton-Jacobi (HJ) reachability. These value functions are computed through dynamic programming, which is notorious for being computationally inefficient. Moreover, the resulting trajectory does not adapt online to the environment,… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

    Comments: 12 pages, 6 figures, 1 table, to be published in L4DC

  40. arXiv:2404.05403  [pdf, other

    cs.CR cs.AI

    SoK: Gradient Leakage in Federated Learning

    Authors: Jiacheng Du, Jiahui Hu, Zhibo Wang, Peng Sun, Neil Zhenqiang Gong, Kui Ren

    Abstract: Federated learning (FL) enables collaborative model training among multiple clients without raw data exposure. However, recent studies have shown that clients' private training data can be reconstructed from the gradients they share in FL, known as gradient inversion attacks (GIAs). While GIAs have demonstrated effectiveness under \emph{ideal settings and auxiliary assumptions}, their actual effic… ▽ More

    Submitted 8 April, 2024; originally announced April 2024.

  41. arXiv:2404.04254  [pdf, other

    cs.CR cs.AI cs.CL cs.CV cs.LG

    Watermark-based Detection and Attribution of AI-Generated Content

    Authors: Zhengyuan Jiang, Moyang Guo, Yuepeng Hu, Neil Zhenqiang Gong

    Abstract: Several companies--such as Google, Microsoft, and OpenAI--have deployed techniques to watermark AI-generated content to enable proactive detection. However, existing literature mainly focuses on user-agnostic detection. Attribution aims to further trace back the user of a generative-AI service who generated a given content detected as AI-generated. Despite its growing importance, attribution is la… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

  42. arXiv:2404.02472  [pdf, other

    cs.RO eess.SY

    Safe Returning FaSTrack with Robust Control Lyapunov-Value Functions

    Authors: Zheng Gong, Boyang Li, Sylvia Herbert

    Abstract: Real-time navigation in a priori unknown environment remains a challenging task, especially when an unexpected (unmodeled) disturbance occurs. In this paper, we propose the framework Safe Returning Fast and Safe Tracking (SR-F) that merges concepts from 1) Robust Control Lyapunov-Value Functions (R-CLVF), and 2) the Fast and Safe Tracking (FaSTrack) framework. The SR-F computes an R-CLVF offline b… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 6 pages, 4 figures, 1 table, 2 algorithms. Submitted to LCSS on 03/06

  43. arXiv:2403.17710  [pdf, other

    cs.CR cs.AI

    Optimization-based Prompt Injection Attack to LLM-as-a-Judge

    Authors: Jiawen Shi, Zenghui Yuan, Yinuo Liu, Yue Huang, Pan Zhou, Lichao Sun, Neil Zhenqiang Gong

    Abstract: LLM-as-a-Judge is a novel solution that can assess textual information with large language models (LLMs). Based on existing research studies, LLMs demonstrate remarkable performance in providing a compelling alternative to traditional human assessment. However, the robustness of these systems against prompt injection attacks remains an open question. In this work, we introduce JudgeDeceiver, a nov… ▽ More

    Submitted 26 March, 2024; originally announced March 2024.

  44. arXiv:2403.17369  [pdf, other

    cs.CV

    CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning

    Authors: Ziyang Gong, Fuhao Li, Yupeng Deng, Deblina Bhattacharjee, Xianzheng Ma, Xiangwei Zhu, Zhenming Ji

    Abstract: Unsupervised Domain Adaptation (UDA) aims to adapt models from labeled source domains to unlabeled target domains. When adapting to adverse scenes, existing UDA methods fail to perform well due to the lack of instructions, leading their models to overlook discrepancies within all adverse scenes. To tackle this, we propose CoDA which instructs models to distinguish, focus, and learn from these disc… ▽ More

    Submitted 15 July, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  45. arXiv:2403.12415  [pdf, other

    cs.CV cs.HC

    VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation

    Authors: Hao Wang, Jiayou Qin, Ashish Bastola, Xiwen Chen, John Suchanek, Zihao Gong, Abolfazl Razi

    Abstract: This paper explores the potential of Large Language Models(LLMs) in zero-shot anomaly detection for safe visual navigation. With the assistance of the state-of-the-art real-time open-world object detection model Yolo-World and specialized prompts, the proposed framework can identify anomalies within camera-captured frames that include any possible obstacles, then generate concise, audio-delivered… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  46. arXiv:2403.11482  [pdf, other

    cs.LG physics.geo-ph

    SeisFusion: Constrained Diffusion Model with Input Guidance for 3D Seismic Data Interpolation and Reconstruction

    Authors: Shuang Wang, Fei Deng, Peifan Jiang, Zishan Gong, Xiaolin Wei, Yuqing Wang

    Abstract: Geographical, physical, or economic constraints often result in missing traces within seismic data, making the reconstruction of complete seismic data a crucial step in seismic data processing. Traditional methods for seismic data reconstruction require the selection of multiple empirical parameters and struggle to handle large-scale continuous missing data. With the development of deep learning,… ▽ More

    Submitted 18 March, 2024; originally announced March 2024.

  47. arXiv:2403.08295  [pdf, other

    cs.CL cs.AI

    Gemma: Open Models Based on Gemini Research and Technology

    Authors: Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari , et al. (83 additional authors not shown)

    Abstract: This work introduces Gemma, a family of lightweight, state-of-the art open models built from the research and technology used to create Gemini models. Gemma models demonstrate strong performance across academic benchmarks for language understanding, reasoning, and safety. We release two sizes of models (2 billion and 7 billion parameters), and provide both pretrained and fine-tuned checkpoints. Ge… ▽ More

    Submitted 16 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  48. arXiv:2403.06408  [pdf, other

    cs.LG cs.AI

    What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation

    Authors: Zhuocheng Gong, Jiahao Liu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan

    Abstract: Quantization has emerged as a promising technique for improving the memory and computational efficiency of large language models (LLMs). Though the trade-off between performance and efficiency is well-known, there is still much to be learned about the relationship between quantization and LLM performance. To shed light on this relationship, we propose a new perspective on quantization, viewing it… ▽ More

    Submitted 10 March, 2024; originally announced March 2024.

  49. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  50. arXiv:2403.05318  [pdf, other

    cs.AI cs.LG

    Looking Ahead to Avoid Being Late: Solving Hard-Constrained Traveling Salesman Problem

    Authors: Jingxiao Chen, Ziqin Gong, Minghuan Liu, Jun Wang, Yong Yu, Weinan Zhang

    Abstract: Many real-world problems can be formulated as a constrained Traveling Salesman Problem (TSP). However, the constraints are always complex and numerous, making the TSPs challenging to solve. When the number of complicated constraints grows, it is time-consuming for traditional heuristic algorithms to avoid illegitimate outcomes. Learning-based methods provide an alternative to solve TSPs in a soft… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.