Showing 1–50 of 729 results for author: Song, J

Searching in archive cs.
  1. arXiv:2409.06949  [pdf, other]

    cs.CL cs.AI

    You Have Thirteen Hours in Which to Solve the Labyrinth: Enhancing AI Game Masters with Function Calling

    Authors: Jaewoo Song, Andrew Zhu, Chris Callison-Burch

    Abstract: Developing a consistent and reliable AI game master for text-based games is a challenging task due to the limitations of large language models (LLMs) and the complexity of the game master's role. This paper presents a novel approach to enhance AI game masters by leveraging function calling in the context of the table-top role-playing game "Jim Henson's Labyrinth: The Adventure Game." Our methodolo…

    Submitted 10 September, 2024; originally announced September 2024.

    Comments: Wordplay Workshop @ ACL 2024

  2. arXiv:2409.05840  [pdf, other]

    cs.CL

    MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct

    Authors: Run Luo, Haonan Zhang, Longze Chen, Ting-En Lin, Xiong Liu, Yuchuan Wu, Min Yang, Minzheng Wang, Pengpeng Zeng, Lianli Gao, Heng Tao Shen, Yunshui Li, Xiaobo Xia, Fei Huang, Jingkuan Song, Yongbin Li

    Abstract: The development of Multimodal Large Language Models (MLLMs) has seen significant advancements. However, the quantity and quality of multimodal instruction data have emerged as significant bottlenecks in their progress. Manually creating multimodal instruction data is both time-consuming and inefficient, posing challenges in producing instructions of high complexity. Moreover, distilling instructio…

    Submitted 9 September, 2024; originally announced September 2024.

  3. arXiv:2409.03773  [pdf, other]

    q-bio.BM cs.LG

    CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures for Protein-RNA Binding Affinity Prediction

    Authors: Rong Han, Xiaohong Liu, Tong Pan, Jing Xu, Xiaoyu Wang, Wuyang Lan, Zhenyu Li, Zixuan Wang, Jiangning Song, Guangyu Wang, Ting Chen

    Abstract: Accurately measuring protein-RNA binding affinity is crucial in many biological processes and drug design. Previous computational methods for protein-RNA binding affinity prediction rely on either sequence or structure features, unable to capture the binding mechanisms comprehensively. The recent emerging pre-trained language models trained on massive unsupervised sequences of protein and RNA have…

    Submitted 21 August, 2024; originally announced September 2024.

  4. arXiv:2409.02339  [pdf, ps, other]

    cs.LG math-ph nlin.PS physics.comp-ph physics.optics

    Data-driven 2D stationary quantum droplets and wave propagations in the amended GP equation with two potentials via deep neural networks learning

    Authors: Jin Song, Zhenya Yan

    Abstract: In this paper, we develop a systematic deep learning approach to solve two-dimensional (2D) stationary quantum droplets (QDs) and investigate their wave propagation in the 2D amended Gross-Pitaevskii equation with Lee-Huang-Yang correction and two kinds of potentials. Firstly, we use the initial-value iterative neural network (IINN) algorithm for 2D stationary quantum droplets of stationary equati…

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 17 pages, 12 figures (Proc. R. Soc. A, accepted for publication). arXiv admin note: text overlap with arXiv:2409.01124

  5. arXiv:2409.01124  [pdf, ps, other]

    physics.comp-ph cs.AI cs.LG math-ph nlin.PS nlin.SI

    Two-stage initial-value iterative physics-informed neural networks for simulating solitary waves of nonlinear wave equations

    Authors: Jin Song, Ming Zhong, George Em Karniadakis, Zhenya Yan

    Abstract: We propose a new two-stage initial-value iterative neural network (IINN) algorithm for solitary wave computations of nonlinear wave equations based on traditional numerical iterative methods and physics-informed neural networks (PINNs). Specifically, the IINN framework consists of two subnetworks, one of which is used to fit a given initial value, and the other incorporates physical information an…

    Submitted 2 September, 2024; originally announced September 2024.

    Comments: 25 pages, 17 figures

    Journal ref: Journal of Computational Physics 505 (2024) 112917

  6. arXiv:2409.00942  [pdf, other]

    cs.CV

    VQ-Flow: Taming Normalizing Flows for Multi-Class Anomaly Detection via Hierarchical Vector Quantization

    Authors: Yixuan Zhou, Xing Xu, Zhe Sun, Jingkuan Song, Andrzej Cichocki, Heng Tao Shen

    Abstract: Normalizing flows, a category of probabilistic models famed for their capabilities in modeling complex data distributions, have exhibited remarkable efficacy in unsupervised anomaly detection. This paper explores the potential of normalizing flows in multi-class anomaly detection, wherein the normal data is compounded with multiple classes without providing class labels. Through the integration of…

    Submitted 2 September, 2024; originally announced September 2024.

  7. arXiv:2409.00343  [pdf, other]

    cs.CV

    EgoHDM: An Online Egocentric-Inertial Human Motion Capture, Localization, and Dense Mapping System

    Authors: Bonan Liu, Handi Yin, Manuel Kaufmann, Jinhao He, Sammy Christen, Jie Song, Pan Hui

    Abstract: We present EgoHDM, an online egocentric-inertial human motion capture (mocap), localization, and dense mapping system. Our system uses 6 inertial measurement units (IMUs) and a commodity head-mounted RGB camera. EgoHDM is the first human mocap system that offers dense scene mapping in near real-time. Further, it is fast and robust to initialize and fully closes the loop between physically plausibl…

    Submitted 5 September, 2024; v1 submitted 31 August, 2024; originally announced September 2024.

    Comments: Project Page: https://handiyin.github.io/EgoHDM/

  8. arXiv:2408.16354  [pdf, other]

    cs.RO

    An Accurate Filter-based Visual Inertial External Force Estimator via Instantaneous Accelerometer Update

    Authors: Junlin Song, Antoine Richard, Miguel Olivares-Mendez

    Abstract: Accurate disturbance estimation is crucial for reliable robotic physical interaction. To estimate environmental interference in a low-cost and sensorless way (without force sensor), a variety of tightly-coupled visual inertial external force estimators are proposed in the literature. However, existing solutions may suffer from relatively low-frequency preintegration. In this paper, a novel estimat…

    Submitted 29 August, 2024; originally announced August 2024.

    Comments: Accepted by the 40th Anniversary of the IEEE Conference on Robotics and Automation (ICRA@40)

  9. arXiv:2408.15310  [pdf, other]

    q-bio.MN cs.CE cs.LG

    RGDA-DDI: Residual graph attention network and dual-attention based framework for drug-drug interaction prediction

    Authors: Changjian Zhou, Xin Zhang, Jiafeng Li, Jia Song, Wensheng Xiang

    Abstract: Recent studies suggest that drug-drug interaction (DDI) prediction via computational approaches has significant importance for understanding the functions and co-prescriptions of multiple drugs. However, the existing in silico DDI prediction methods either ignore the potential interactions among drug-drug pairs (DDPs), or fail to explicitly model and fuse the multi-scale drug feature representations…

    Submitted 27 August, 2024; originally announced August 2024.

  10. arXiv:2408.14152  [pdf, other]

    cs.CV cs.LG

    Application of Disentanglement to Map Registration Problem

    Authors: Hae Jin Song, Patrycja Krawczuk, Po-Hsuan Huang

    Abstract: Geospatial data come from various sources, such as satellites, aircraft, and LiDAR. The variability of the source is not limited to the types of data acquisition techniques, as we have maps from different time periods. To incorporate these data for a coherent analysis, it is essential to first align different "styles" of geospatial data to its matching images that point to the same location on the…

    Submitted 26 August, 2024; originally announced August 2024.

  11. arXiv:2408.13587  [pdf, other]

    cs.CV

    Explainable Convolutional Networks for Crater Detection and Lunar Landing Navigation

    Authors: Jianing Song, Nabil Aouf, Duarte Rondao, Christophe Honvault, Luis Mansilla

    Abstract: The Lunar landing has drawn great interest in lunar exploration in recent years, and autonomous lunar landing navigation is fundamental to this task. AI is expected to play a critical role in autonomous and intelligent space missions, yet human experts question the reliability of AI solutions. Thus, explainable AI (XAI) for vision-based lunar landing is studied in this paper, aiming at providing transpar…

    Submitted 24 August, 2024; originally announced August 2024.

  12. arXiv:2408.10474  [pdf, other]

    cs.SE cs.AI cs.CL cs.CR cs.LG

    LeCov: Multi-level Testing Criteria for Large Language Models

    Authors: Xuan Xie, Jiayang Song, Yuheng Huang, Da Song, Fuyuan Zhang, Felix Juefei-Xu, Lei Ma

    Abstract: Large Language Models (LLMs) are widely used in many different domains, but because of their limited interpretability, there are questions about how trustworthy they are in various perspectives, e.g., truthfulness and toxicity. Recent research has started developing testing methods for LLMs, aiming to uncover untrustworthy issues, i.e., defects, before deployment. However, systematic and formalize…

    Submitted 19 August, 2024; originally announced August 2024.

  13. arXiv:2408.10123  [pdf, other]

    cs.RO cs.CV

    Learning Precise Affordances from Egocentric Videos for Robotic Manipulation

    Authors: Gen Li, Nikolaos Tsagkas, Jifei Song, Ruaridh Mon-Williams, Sethu Vijayakumar, Kun Shao, Laura Sevilla-Lara

    Abstract: Affordance, defined as the potential actions that an object offers, is crucial for robotic manipulation tasks. A deep understanding of affordance can lead to more intelligent AI systems. For example, such knowledge directs an agent to grasp a knife by the handle for cutting and by the blade when passing it to someone. In this paper, we present a streamlined affordance learning system that encompas…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Project page: https://reagan1311.github.io/affgrasp

  14. arXiv:2408.09819  [pdf, other]

    cs.CL cs.AI

    CMoralEval: A Moral Evaluation Benchmark for Chinese Large Language Models

    Authors: Linhao Yu, Yongqi Leng, Yufei Huang, Shang Wu, Haixin Liu, Xinmeng Ji, Jiahui Zhao, Jinwang Song, Tingting Cui, Xiaoqing Cheng, Tao Liu, Deyi Xiong

    Abstract: What would a large language model (LLM) respond in an ethically relevant context? In this paper, we curate a large benchmark CMoralEval for morality evaluation of Chinese LLMs. The data sources of CMoralEval are two-fold: 1) a Chinese TV program discussing Chinese moral norms with stories from the society and 2) a collection of Chinese moral anomies from various newspapers and academic papers on mora…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL 2024 (Findings)

  15. arXiv:2408.09503  [pdf, other]

    cs.CL cs.AI cs.LG stat.ML

    Out-of-distribution generalization via composition: a lens through induction heads in Transformers

    Authors: Jiajun Song, Zhuoyan Xu, Yiqiao Zhong

    Abstract: Large language models (LLMs) such as GPT-4 sometimes appear to be creative, solving novel tasks often with a few demonstrations in the prompt. These tasks require the models to generalize on distributions different from those from training data -- which is known as out-of-distribution (OOD) generalization. Despite the tremendous success of LLMs, how they approach OOD generalization remains an open…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 41 pages, 25 figures

  16. arXiv:2408.08152  [pdf, other]

    cs.CL cs.AI cs.LG cs.LO

    DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

    Authors: Huajian Xin, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, Z. F. Wu, Fuli Luo, Chong Ruan

    Abstract: We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-…

    Submitted 15 August, 2024; originally announced August 2024.

  17. arXiv:2408.08147  [pdf, other]

    cs.DC cs.CL cs.LG

    P/D-Serve: Serving Disaggregated Large Language Model at Scale

    Authors: Yibo Jin, Tao Wang, Huimin Lin, Mingyang Song, Peiyang Li, Yipeng Ma, Yicheng Shan, Zhengfan Yuan, Cailong Li, Yajing Sun, Tiandeng Wu, Xing Chu, Ruizhi Huan, Li Ma, Xiao You, Wenting Zhou, Yunpeng Ye, Wen Liu, Xiangkun Xu, Yongsheng Zhang, Tiantian Dong, Jiawei Zhu, Zhe Wang, Xijian Ju, Jianxun Song, et al. (5 additional authors not shown)

    Abstract: Serving disaggregated large language models (LLMs) over tens of thousands of xPU devices (GPUs or NPUs) with reliable performance faces multiple challenges. 1) Ignoring the diversity (various prefixes and tidal requests), treating all the prompts in a mixed pool is inadequate. To facilitate the similarity per scenario and minimize the inner mismatch on P/D (prefill and decoding) processing, fine-g…

    Submitted 15 August, 2024; originally announced August 2024.

  18. arXiv:2408.06665  [pdf, ps, other]

    cs.LG cs.AI

    RW-NSGCN: A Robust Approach to Structural Attacks via Negative Sampling

    Authors: Shuqi He, Jun Zhuang, Ding Wang, Jun Song

    Abstract: Node classification using Graph Neural Networks (GNNs) has been widely applied in various practical scenarios, such as predicting user interests and detecting communities in social networks. However, recent studies have shown that graph-structured networks often contain potential noise and attacks, in the form of topological perturbations and weight disturbances, which can lead to decreased classi…

    Submitted 13 August, 2024; originally announced August 2024.

  19. arXiv:2408.05917  [pdf]

    cs.CE cs.AI cs.LG

    Inverse design of Non-parameterized Ventilated Acoustic Resonator via Variational Autoencoder with Acoustic Response-encoded Latent Space

    Authors: Min Woo Cho, Seok Hyeon Hwang, Jun-Young Jang, Jin Yeong Song, Sun-kwang Hwang, Kyoung Je Cha, Dong Yong Park, Kyungjun Song, Sang Min Park

    Abstract: Ventilated acoustic resonators (VARs), a type of acoustic metamaterial, emerge as an alternative for sound attenuation in environments that require ventilation, owing to their excellent low-frequency attenuation performance and flexible shape adaptability. However, due to the non-linear acoustic responses of VARs, the VAR designs are generally obtained within a limited parametrized design space, and th…

    Submitted 12 August, 2024; originally announced August 2024.

  20. arXiv:2408.05707  [pdf, other]

    cs.LG

    Fast and Scalable Semi-Supervised Learning for Multi-View Subspace Clustering

    Authors: Huaming Ling, Chenglong Bao, Jiebo Song, Zuoqiang Shi

    Abstract: In this paper, we introduce a Fast and Scalable Semi-supervised Multi-view Subspace Clustering (FSSMSC) method, a novel solution to the high computational complexity commonly found in existing approaches. FSSMSC features linear computational and space complexity relative to the size of the data. The method generates a consensus anchor graph across all views, representing each data point as a spars…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 40 pages, 7 figures

  21. arXiv:2408.03892  [pdf, other]

    cs.SE cs.AI

    MORTAR: A Model-based Runtime Action Repair Framework for AI-enabled Cyber-Physical Systems

    Authors: Renzhi Wang, Zhehua Zhou, Jiayang Song, Xuan Xie, Xiaofei Xie, Lei Ma

    Abstract: Cyber-Physical Systems (CPSs) are increasingly prevalent across various industrial and daily-life domains, with applications ranging from robotic operations to autonomous driving. With recent advancements in artificial intelligence (AI), learning-based components, especially AI controllers, have become essential in enhancing the functionality and efficiency of CPSs. However, the lack of interpreta…

    Submitted 7 August, 2024; originally announced August 2024.

  22. arXiv:2408.03573  [pdf, other]

    cs.SE cs.AI cs.CL

    Active Testing of Large Language Model via Multi-Stage Sampling

    Authors: Yuheng Huang, Jiayang Song, Qiang Hu, Felix Juefei-Xu, Lei Ma

    Abstract: Performance evaluation plays a crucial role in the development life cycle of large language models (LLMs). It estimates the model's capability, elucidates behavior characteristics, and facilitates the identification of potential issues and limitations, thereby guiding further improvement. Given that LLMs' diverse task-handling abilities stem from large volumes of training data, a comprehensive eva…

    Submitted 7 August, 2024; originally announced August 2024.

    ACM Class: D.2.5; I.2.7

  23. arXiv:2408.02110  [pdf, other]

    cs.CV

    AvatarPose: Avatar-guided 3D Pose Estimation of Close Human Interaction from Sparse Multi-view Videos

    Authors: Feichi Lu, Zijian Dong, Jie Song, Otmar Hilliges

    Abstract: Despite progress in human motion capture, existing multi-view methods often face challenges in estimating the 3D pose and shape of multiple closely interacting people. This difficulty arises from reliance on accurate 2D joint estimations, which are hard to obtain due to occlusions and body contact when people are in close interaction. To address this, we propose a novel method leveraging the perso…

    Submitted 20 August, 2024; v1 submitted 4 August, 2024; originally announced August 2024.

    Comments: Project Page: https://eth-ait.github.io/AvatarPose/

  24. arXiv:2408.01569  [pdf, other]

    cs.RO

    TURTLMap: Real-time Localization and Dense Mapping of Low-texture Underwater Environments with a Low-cost Unmanned Underwater Vehicle

    Authors: Jingyu Song, Onur Bagoren, Razan Andigani, Advaith Venkatramanan Sethuraman, Katherine Skinner

    Abstract: Significant work has been done on advancing localization and mapping in underwater environments. Still, state-of-the-art methods are challenged by low-texture environments, which are common for underwater settings. This makes it difficult to use existing methods in diverse, real-world scenes. In this paper, we present TURTLMap, a novel solution that focuses on textureless underwater environments th…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted to IROS 2024

  25. arXiv:2408.01230  [pdf, other]

    cs.RO cs.LG

    HeteroMorpheus: Universal Control Based on Morphological Heterogeneity Modeling

    Authors: YiFan Hao, Yang Yang, Junru Song, Wei Peng, Weien Zhou, Tingsong Jiang, Wen Yao

    Abstract: In the field of robotic control, designing individual controllers for each robot leads to high computational costs. Universal control policies, applicable across diverse robot morphologies, promise to mitigate this challenge. Predominantly, models based on Graph Neural Networks (GNN) and Transformers are employed, owing to their effectiveness in capturing relational dynamics across a robot's limbs…

    Submitted 2 August, 2024; originally announced August 2024.

  26. arXiv:2408.00624  [pdf, other]

    eess.AS cs.CL cs.CV

    SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data

    Authors: Yichen Lu, Jiaqi Song, Xuankai Chang, Hengwei Bian, Soumi Maiti, Shinji Watanabe

    Abstract: In this work, we present SynesLM, a unified model which can perform three multimodal language understanding tasks: audio-visual automatic speech recognition (AV-ASR) and visual-aided speech/machine translation (VST/VMT). Unlike previous research that focused on lip motion as visual cues for speech signals, our work explores more general visual information within entire frames, such as objects and a…

    Submitted 1 August, 2024; originally announced August 2024.

  27. arXiv:2408.00347  [pdf, other]

    cs.CV cs.AI

    Advancing Medical Image Segmentation: Morphology-Driven Learning with Diffusion Transformer

    Authors: Sungmin Kang, Jaeha Song, Jihie Kim

    Abstract: Understanding the morphological structure of medical images and precisely segmenting the region of interest or abnormality is an important task that can assist in diagnosis. However, the unique properties of medical imaging make clear segmentation difficult, and the high cost and time-consuming task of labeling leads to a coarse-grained representation of ground truth. Faced with these problems, we…

    Submitted 31 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted in BMVC 2024

  28. arXiv:2408.00137  [pdf, other]

    cs.CL cs.AI

    Correcting Negative Bias in Large Language Models through Negative Attention Score Alignment

    Authors: Sangwon Yu, Jongyoon Song, Bongkyu Hwang, Hoyoung Kang, Sooah Cho, Junhwa Choi, Seongho Joe, Taehee Lee, Youngjune L. Gwon, Sungroh Yoon

    Abstract: A binary decision task, like yes-no questions or answer verification, reflects a significant real-world scenario such as where users look for confirmation about the correctness of their decisions on specific issues. In this work, we observe that language models exhibit a negative bias in the binary decisions of complex reasoning tasks. Based on our observations and the rationale about attention-ba…

    Submitted 31 July, 2024; originally announced August 2024.

  29. Discovery of 6G Services and Resources in Edge-Cloud-Continuum

    Authors: Mohammad Farhoudi, Masoud Shokrnezhad, Tarik Taleb, Richard Li, JaeSeung Song

    Abstract: The advent of 6G networks will present a pivotal juncture in the evolution of telecommunications, marked by the proliferation of devices, dynamic service requests, and the integration of edge and cloud computing. In response to these transformative shifts, this paper proposes a service and resource discovery architecture as part of service provisioning for the future 6G edge-cloud-continuum. Throu…

    Submitted 8 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

    Comments: 10 pages, 5 figures

  30. arXiv:2407.19471  [pdf, other]

    cs.CV

    On the Evaluation Consistency of Attribution-based Explanations

    Authors: Jiarui Duan, Haoling Li, Haofei Zhang, Hao Jiang, Mengqi Xue, Li Sun, Mingli Song, Jie Song

    Abstract: Attribution-based explanations are garnering increasing attention recently and have emerged as the predominant approach towards eXplainable Artificial Intelligence (XAI). However, the absence of consistent configurations and systematic investigations in prior literature impedes comprehensive evaluations of existing methodologies. In this work, we introduce Meta-Rank, an open platform for…

    Submitted 28 July, 2024; originally announced July 2024.

    Comments: Accepted as a conference paper by ECCV 2024

  31. arXiv:2407.19055  [pdf, other]

    cs.SE cs.AI cs.LG

    Effective Large Language Model Debugging with Best-first Tree Search

    Authors: Jialin Song, Jonathan Raiman, Bryan Catanzaro

    Abstract: Large Language Models (LLMs) show promise in code generation tasks. However, their code-writing abilities are often limited in scope: while they can successfully implement simple functions, they struggle with more complex tasks. A fundamental difference with how an LLM writes code, compared to a human programmer, is that it cannot consistently spot and fix bugs. Debugging is a crucial skill for pr…

    Submitted 26 July, 2024; originally announced July 2024.

  32. arXiv:2407.15383  [pdf, other]

    cs.CV

    Is user feedback always informative? Retrieval Latent Defending for Semi-Supervised Domain Adaptation without Source Data

    Authors: Junha Song, Tae Soo Kim, Junha Kim, Gunhee Nam, Thijs Kooi, Jaegul Choo

    Abstract: This paper aims to adapt the source model to the target environment, leveraging small user feedback (i.e., labeled target data) readily available in real-world applications. We find that existing semi-supervised domain adaptation (SemiSDA) methods often suffer from poorly improved adaptation performance when directly utilizing such feedback data, as shown in Figure 1. We analyze this phenomenon vi…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024, Project page: https://sites.google.com/view/junha/nbf-rld

  33. arXiv:2407.15089  [pdf, other]

    physics.geo-ph cs.AI cs.LG

    Learning Physics for Unveiling Hidden Earthquake Ground Motions via Conditional Generative Modeling

    Authors: Pu Ren, Rie Nakata, Maxime Lacour, Ilan Naiman, Nori Nakata, Jialin Song, Zhengfa Bi, Osman Asif Malik, Dmitriy Morozov, Omri Azencot, N. Benjamin Erichson, Michael W. Mahoney

    Abstract: Predicting high-fidelity ground motions for future earthquakes is crucial for seismic hazard assessment and infrastructure resilience. Conventional empirical simulations suffer from sparse sensor distribution and geographically localized earthquake locations, while physics-based methods are computationally intensive and require accurate representations of Earth structures and earthquake sources. W…

    Submitted 21 July, 2024; originally announced July 2024.

  34. arXiv:2407.14741  [pdf, other]

    cs.IR cs.AI

    Orthogonal Hyper-category Guided Multi-interest Elicitation for Micro-video Matching

    Authors: Beibei Li, Beihong Jin, Yisong Yu, Yiyuan Zheng, Jiageng Song, Wei Zhuo, Tao Xiang

    Abstract: Watching micro-videos is becoming a part of public daily life. Usually, user watching behaviors are thought to be rooted in their multiple different interests. In the paper, we propose a model named OPAL for micro-video matching, which elicits a user's multiple heterogeneous interests by disentangling multiple soft and hard interest embeddings from user interactions. Moreover, OPAL employs a two-s…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 6 pages, accepted by ICME 2024

  35. arXiv:2407.12538  [pdf, other]

    eess.IV cs.CV

    High Frequency Matters: Uncertainty Guided Image Compression with Wavelet Diffusion

    Authors: Juan Song, Jiaxiang He, Mingtao Feng, Keyan Wang, Yunsong Li, Ajmal Mian

    Abstract: Diffusion probabilistic models have recently achieved remarkable success in generating high-quality images. However, balancing high perceptual quality and low distortion remains challenging in image compression applications. To address this issue, we propose an efficient Uncertainty-Guided image compression approach with wavelet Diffusion (UGDiff). Our approach focuses on high frequency compressio…

    Submitted 17 July, 2024; originally announced July 2024.

  36. arXiv:2407.12292  [pdf, other]

    cs.CV cs.AI

    Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection

    Authors: Youheng Sun, Shengming Yuan, Xuanhan Wang, Lianli Gao, Jingkuan Song

    Abstract: Targeted adversarial attack, which aims to mislead a model to recognize any image as a target object by imperceptible perturbations, has become a mainstream tool for vulnerability assessment of deep neural networks (DNNs). Since existing targeted attackers only learn to attack known target classes, they cannot generalize well to unknown classes. To tackle this issue, we propose Generalized…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  37. arXiv:2407.07342  [pdf, other]

    cs.CL

    Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture

    Authors: Jiayang Song, Yuheng Huang, Zhehua Zhou, Lei Ma

    Abstract: As safety remains a crucial concern throughout the development lifecycle of Large Language Models (LLMs), researchers and industrial practitioners have increasingly focused on safeguarding and aligning LLM behaviors with human preferences and ethical standards. LLMs, trained on extensive multilingual corpora, exhibit powerful generalization abilities across diverse languages and domains. However,…

    Submitted 9 July, 2024; originally announced July 2024.

  38. arXiv:2407.07110  [pdf, other]

    cs.LG cs.AI eess.SP

    Foundation Models for Electrocardiograms

    Authors: Junho Song, Jong-Hwan Jang, Byeong Tak Lee, DongGyun Hong, Joon-myoung Kwon, Yong-Yeon Jo

    Abstract: Foundation models, enhanced by self-supervised learning (SSL) techniques, represent a cutting-edge frontier in biomedical signal analysis, particularly for electrocardiograms (ECGs), crucial for cardiac health monitoring and diagnosis. This study conducts a comprehensive analysis of foundation models for ECGs by employing and refining innovative SSL methodologies - namely, generative and contrasti…

    Submitted 25 June, 2024; originally announced July 2024.

    Comments: 27 pages

  39. FORAY: Towards Effective Attack Synthesis against Deep Logical Vulnerabilities in DeFi Protocols

    Authors: Hongbo Wen, Hanzhi Liu, Jiaxin Song, Yanju Chen, Wenbo Guo, Yu Feng

    Abstract: Blockchain adoption has surged with the rise of Decentralized Finance (DeFi) applications. However, the significant value of digital assets managed by DeFi protocols makes them prime targets for attacks. Current smart contract vulnerability detection tools struggle with DeFi protocols due to deep logical bugs arising from complex financial interactions between multiple smart contracts. These tools…

    Submitted 30 August, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  40. arXiv:2407.05125  [pdf, other]

    cs.DC cs.LG

    A Joint Approach to Local Updating and Gradient Compression for Efficient Asynchronous Federated Learning

    Authors: Jiajun Song, Jiajun Luo, Rongwei Lu, Shuzhao Xie, Bin Chen, Zhi Wang

    Abstract: Asynchronous Federated Learning (AFL) confronts inherent challenges arising from the heterogeneity of devices (e.g., their computation capacities) and low-bandwidth environments, both potentially causing stale model updates (e.g., local gradients) for global aggregation. Traditional approaches mitigating the staleness of updates typically focus on either adjusting the local updating or gradient co…

    Submitted 6 July, 2024; originally announced July 2024.

  41. arXiv:2407.04561  [pdf, other]

    cs.NI eess.SP

    Wireless Spectrum in Rural Farmlands: Status, Challenges and Opportunities

    Authors: Mukaram Shahid, Kunal Das, Taimoor Ul Islam, Christ Somiah, Daji Qiao, Arsalan Ahmad, Jimming Song, Zhengyuan Zhu, Sarath Babu, Yong Guan, Tusher Chakraborty, Suraj Jog, Ranveer Chandra, Hongwei Zhang

    Abstract: Due to factors such as low population density and expansive geographical distances, network deployment falls behind in rural regions, leading to a broadband divide. Wireless spectrum serves as the blood and flesh of wireless communications. Shared white spaces such as those in the TVWS and CBRS spectrum bands offer opportunities to expand connectivity, innovate, and provide affordable access to hi…

    Submitted 5 July, 2024; originally announced July 2024.

  42. arXiv:2407.04295  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    Jailbreak Attacks and Defenses Against Large Language Models: A Survey

    Authors: Sibo Yi, Yule Liu, Zhen Sun, Tianshuo Cong, Xinlei He, Jiaxing Song, Ke Xu, Qi Li

    Abstract: Large Language Models (LLMs) have performed exceptionally in various text-generative tasks, including question answering, translation, code completion, etc. However, the over-assistance of LLMs has raised the challenge of "jailbreaking", which induces the model to generate malicious responses against the usage policy and society by designing adversarial prompts. With the emergence of jailbreak att…

    Submitted 30 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.

  43. arXiv:2407.01598  [pdf]

    cs.LG cs.AI

    Long-Term Prediction Accuracy Improvement of Data-Driven Medium-Range Global Weather Forecast

    Authors: Yifan Hu, Fukang Yin, Weimin Zhang, Kaijun Ren, Junqiang Song, Kefeng Deng, Di Zhang

    Abstract: Long-term stability stands as a crucial requirement in data-driven medium-range global weather forecasting. Spectral bias is recognized as the primary contributor to instabilities, as data-driven methods find it difficult to learn small-scale dynamics. In this paper, we reveal that the universal mechanism for these instabilities is not only related to spectral bias but also to distortions brought by proce…

    Submitted 25 June, 2024; originally announced July 2024.

  44. arXiv:2407.00081  [pdf, other]

    cs.DC cs.AI cs.ET cs.LG cs.NI

    Semantic Revolution from Communications to Orchestration for 6G: Challenges, Enablers, and Research Directions

    Authors: Masoud Shokrnezhad, Hamidreza Mazandarani, Tarik Taleb, Jaeseung Song, Richard Li

    Abstract: In the context of emerging 6G services, the realization of everything-to-everything interactions involving a myriad of physical and digital entities presents a crucial challenge. This challenge is exacerbated by resource scarcity in communication infrastructures, necessitating innovative solutions for effective service implementation. Exploring the potential of Semantic Communications (SemCom) to…

    Submitted 24 June, 2024; originally announced July 2024.

    Comments: Accepted at IEEE Network magazine special issue: Goal-oriented Semantic Communication and Networking

  45. arXiv:2406.18151  [pdf, other]

    cs.CV

    SynRS3D: A Synthetic Dataset for Global 3D Semantic Understanding from Monocular Remote Sensing Imagery

    Authors: Jian Song, Hongruixuan Chen, Weihao Xuan, Junshi Xia, Naoto Yokoya

    Abstract: Global semantic 3D understanding from single-view high-resolution remote sensing (RS) imagery is crucial for Earth Observation (EO). However, this task faces significant challenges due to the high costs of annotations and data collection, as well as geographically restricted data availability. To address these challenges, synthetic data offer a promising solution by being easily accessible and thu…

    Submitted 26 June, 2024; originally announced June 2024.

  46. arXiv:2406.14984  [pdf, ps, other]

    cs.DS

    Colorful Priority $k$-Supplier

    Authors: Chandra Chekuri, Junkai Song

    Abstract: In the Priority $k$-Supplier problem the input consists of a metric space $(F \cup C, d)$ over a set of facilities $F$ and a set of clients $C$, an integer $k > 0$, and a non-negative radius $r_v$ for each client $v \in C$. The goal is to select $k$ facilities $S \subseteq F$ to minimize $\max_{v \in C} \frac{d(v,S)}{r_v}$ where $d(v,S)$ is the distance of $v$ to the closest facility in $S$. This pro…

    Submitted 21 June, 2024; originally announced June 2024.

  47. arXiv:2406.13929  [pdf, other]

    cs.CL cs.AI cs.LG

    Large Language Models are Skeptics: False Negative Problem of Input-conflicting Hallucination

    Authors: Jongyoon Song, Sangwon Yu, Sungroh Yoon

    Abstract: In this paper, we identify a new category of bias that induces input-conflicting hallucinations, where large language models (LLMs) generate responses inconsistent with the content of the input context. This issue, which we term the false negative problem, refers to the phenomenon where LLMs are predisposed to return negative judgments when assessing the correctness of a statement given the context…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 12 pages, 9 figures

  48. arXiv:2406.12907  [pdf, other]

    cs.LG cs.CL

    Reconciling Kaplan and Chinchilla Scaling Laws

    Authors: Tim Pearce, Jinyeop Song

    Abstract: Kaplan et al. [2020] (`Kaplan') and Hoffmann et al. [2022] (`Chinchilla') studied the scaling behavior of transformers trained on next-token language prediction. These studies produced different estimates for how the number of parameters ($N$) and training tokens ($D$) should be set to achieve the lowest possible loss for a given compute budget ($C$). Kaplan: $N_\text{optimal} \propto C^{0.73}$, C…

    Submitted 12 June, 2024; originally announced June 2024.

  49. arXiv:2406.12315  [pdf, other]

    cs.AI

    PruningBench: A Comprehensive Benchmark of Structural Pruning

    Authors: Haoling Li, Changhao Li, Mengqi Xue, Gongfan Fang, Sheng Zhou, Zunlei Feng, Huiqiong Wang, Yong Wang, Lechao Cheng, Mingli Song, Jie Song

    Abstract: Structural pruning has emerged as a promising approach for producing more efficient models. Nevertheless, the community suffers from a lack of standardized benchmarks and metrics, leaving the progress in this area not fully comprehended. To fill this gap, we present the first comprehensive benchmark, termed PruningBench, for structural pruning. PruningBench showcases the following three c…

    Submitted 20 July, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

    Comments: This paper aims to present an evaluation benchmark for structural pruning. The full text is 30 pages.

  50. arXiv:2406.11931  [pdf, other]

    cs.SE cs.AI cs.LG

    DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

    Authors: DeepSeek-AI, Qihao Zhu, Daya Guo, Zhihong Shao, Dejian Yang, Peiyi Wang, Runxin Xu, Y. Wu, Yukun Li, Huazuo Gao, Shirong Ma, Wangding Zeng, Xiao Bi, Zihui Gu, Hanwei Xu, Damai Dai, Kai Dong, Liyue Zhang, Yishi Piao, Zhibin Gou, Zhenda Xie, Zhewen Hao, Bingxuan Wang, Junxiao Song, Deli Chen, et al. (15 additional authors not shown)

    Abstract: We present DeepSeek-Coder-V2, an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT4-Turbo in code-specific tasks. Specifically, DeepSeek-Coder-V2 is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens. Through this continued pre-training, DeepSeek-Coder-V2 substantially enhances the coding and mathe…

    Submitted 17 June, 2024; originally announced June 2024.