Showing 1–50 of 2,343 results for author: Zhao, Y

Searching in archive cs.
  1. arXiv:2408.11048  [pdf, other]

    cs.RO cs.AI cs.LG

    RP1M: A Large-Scale Motion Dataset for Piano Playing with Bi-Manual Dexterous Robot Hands

    Authors: Yi Zhao, Le Chen, Jan Schneider, Quankai Gao, Juho Kannala, Bernhard Schölkopf, Joni Pajarinen, Dieter Büchler

    Abstract: It has been a long-standing research goal to endow robot hands with human-level dexterity. Bi-manual robot piano playing constitutes a task that combines challenges from dynamic tasks, such as generating fast while precise motions, with slower but contact-rich manipulation problems. Although reinforcement learning based approaches have shown promising results in single-task performance, these meth…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Project Website: https://rp1m.github.io/

  2. arXiv:2408.11030  [pdf, other]

    cs.CV

    OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding

    Authors: Youjun Zhao, Jiaying Lin, Shuquan Ye, Qianshi Pang, Rynson W. H. Lau

    Abstract: Open-vocabulary 3D scene understanding (OV-3D) aims to localize and classify novel objects beyond the closed object classes. However, existing approaches and benchmarks primarily focus on the open vocabulary problem within the context of object classes, which is insufficient to provide a holistic evaluation to what extent a model understands the 3D scene. In this paper, we introduce a more challen…

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.10935  [pdf, other]

    cs.CV

    Large Point-to-Gaussian Model for Image-to-3D Generation

    Authors: Longfei Lu, Huachen Gao, Tao Dai, Yaohua Zha, Zhi Hou, Junta Wu, Shu-Tao Xia

    Abstract: Recently, image-to-3D approaches have significantly advanced the generation quality and speed of 3D assets based on large reconstruction models, particularly 3D Gaussian reconstruction models. Existing large 3D Gaussian models directly map 2D image to 3D Gaussian parameters, while regressing 2D image to 3D Gaussian representations is challenging without 3D priors. In this paper, we propose a large…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 10 pages, 9 figures, ACM MM 2024

  4. arXiv:2408.10918  [pdf, other]

    cs.CL

    CHECKWHY: Causal Fact Verification via Argument Structure

    Authors: Jiasheng Si, Yibo Zhao, Yingjie Zhu, Haiyang Zhu, Wenpeng Lu, Deyu Zhou

    Abstract: With the growing complexity of fact verification tasks, the concern with "thoughtful" reasoning capabilities is increasing. However, recent fact verification benchmarks mainly focus on checking a narrow scope of semantic factoids within claims and lack an explicit logical reasoning process. In this paper, we introduce CheckWhy, a challenging dataset tailored to a novel causal fact verification tas…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL2024; Awarded as Outstanding Paper Award and Area Chair Award

  5. arXiv:2408.10912  [pdf, other]

    cs.IT

    An Achievable Rate-Distortion Region of Joint Identification and Sensing for Multiple Access Channels

    Authors: Yaning Zhao, Wafa Labidi, Holger Boche, Eduard Jorswieck, Christian Deppe

    Abstract: In contrast to Shannon transmission codes, the size of identification (ID) codes for discrete memoryless channels (DMCs) experiences doubly exponential growth with the block length when randomized encoding is used. Additional enhancements within the ID paradigm can be realized through supplementary resources such as quantum entanglement, common randomness (CR), and feedback. Joint transmission and…

    Submitted 20 August, 2024; originally announced August 2024.

  6. arXiv:2408.10718  [pdf, other]

    cs.SE cs.CL

    CodeJudge-Eval: Can Large Language Models be Good Judges in Code Understanding?

    Authors: Yuwei Zhao, Ziyang Luo, Yuchen Tian, Hongzhan Lin, Weixiang Yan, Annan Li, Jing Ma

    Abstract: Recent advancements in large language models (LLMs) have showcased impressive code generation capabilities, primarily evaluated through language-to-code benchmarks. However, these benchmarks may not fully capture a model's code understanding abilities. We introduce CodeJudge-Eval (CJ-Eval), a novel benchmark designed to assess LLMs' code understanding abilities from the perspective of code judging…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Work in progress

  7. arXiv:2408.10566  [pdf, other]

    cs.LG cs.AI

    SparseGrow: Addressing Growth-Induced Forgetting in Task-Agnostic Continual Learning

    Authors: Yuqing Zhao, Divya Saxena, Jiannong Cao, Xiaoyun Liu, Changlin Song

    Abstract: In continual learning (CL), model growth enhances adaptability over new data, improving knowledge retention for more tasks. However, improper model growth can lead to severe degradation of previously learned knowledge, an issue we name as growth-induced forgetting (GIFt), especially in task-agnostic CL using entire grown model for inference. Existing works, despite adopting model growth and random…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: This paper has been submitted to the AAAI conference. If accepted, the final version will be updated to reflect the conference proceedings

  8. arXiv:2408.10172  [pdf, other]

    cs.DS

    Eulerian Graph Sparsification by Effective Resistance Decomposition

    Authors: Arun Jambulapati, Sushant Sachdeva, Aaron Sidford, Kevin Tian, Yibin Zhao

    Abstract: We provide an algorithm that, given an $n$-vertex $m$-edge Eulerian graph with polynomially bounded weights, computes an $\breve{O}(n\log^{2} n \cdot \varepsilon^{-2})$-edge $\varepsilon$-approximate Eulerian sparsifier with high probability in $\breve{O}(m\log^3 n)$ time (where $\breve{O}(\cdot)$ hides $\text{polyloglog}(n)$ factors). Due to a reduction from [Peng-Song, STOC '22], this yields an…

    Submitted 19 August, 2024; originally announced August 2024.

  9. arXiv:2408.09647  [pdf, other]

    cs.CV

    C2P-CLIP: Injecting Category Common Prompt in CLIP to Enhance Generalization in Deepfake Detection

    Authors: Chuangchuang Tan, Renshuai Tao, Huan Liu, Guanghua Gu, Baoyuan Wu, Yao Zhao, Yunchao Wei

    Abstract: This work focuses on AIGC detection to develop universal detectors capable of identifying various types of forgery images. Recent studies have found large pre-trained models, such as CLIP, are effective for generalizable deepfake detection along with linear classifiers. However, two critical issues remain unresolved: 1) understanding why CLIP features are effective on deepfake detection through a…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures

  10. arXiv:2408.09533  [pdf, other]

    cs.CV

    AnomalyFactory: Regard Anomaly Generation as Unsupervised Anomaly Localization

    Authors: Ying Zhao

    Abstract: Recent advances in anomaly generation approaches alleviate the effect of data insufficiency on task of anomaly localization. While effective, most of them learn multiple large generative models on different datasets and cumbersome anomaly prediction models for different classes. To address the limitations, we propose a novel scalable framework, named AnomalyFactory, that unifies unsupervised anoma…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Accepted to the 2nd workshop on Vision-based InduStrial InspectiON (VISION) at ECCV 2024

  11. arXiv:2408.09462  [pdf, other]

    cs.MM

    SpeechEE: A Novel Benchmark for Speech Event Extraction

    Authors: Bin Wang, Meishan Zhang, Hao Fei, Yu Zhao, Bobo Li, Shengqiong Wu, Wei Ji, Min Zhang

    Abstract: Event extraction (EE) is a critical direction in the field of information extraction, laying an important foundation for the construction of structured knowledge bases. EE from text has received ample research and attention for years, yet there can be numerous real-world applications that require direct information acquisition from speech signals, online meeting minutes, interview summaries, press…

    Submitted 18 August, 2024; originally announced August 2024.

  12. arXiv:2408.09119  [pdf, ps, other]

    cs.IT

    Identification via Gaussian Multiple Access Channels in the Presence of Feedback

    Authors: Yaning Zhao, Wafa Labidi, Holger Boche, Eduard Jorswieck, Christian Deppe

    Abstract: We investigate message identification over a K-sender Gaussian multiple access channel (K-GMAC). Unlike conventional Shannon transmission codes, the size of randomized identification (ID) codes experiences a doubly exponential growth in the code length. Improvements in the ID approach can be attained through additional resources such as quantum entanglement, common randomness (CR), and feedback. I…

    Submitted 17 August, 2024; originally announced August 2024.

  13. arXiv:2408.08515  [pdf, other]

    cs.SE

    Selecting Initial Seeds for Better JVM Fuzzing

    Authors: Tianchang Gao, Junjie Chen, Dong Wang, Yile Guo, Yingquan Zhao, Zan Wang

    Abstract: Literature in traditional program fuzzing has confirmed that effectiveness is largely impacted by redundancy among initial seeds, thereby proposing a series of seed selection methods. JVM fuzzing, compared to traditional ones, presents unique characteristics, including large-scale and intricate code, and programs with both syntactic and semantic features. However, it remains unclear whether the ex…

    Submitted 16 August, 2024; originally announced August 2024.

  14. arXiv:2408.08252  [pdf, other]

    cs.LG cs.AI q-bio.GN stat.ML

    Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding

    Authors: Xiner Li, Yulai Zhao, Chenyu Wang, Gabriele Scalia, Gokcen Eraslan, Surag Nair, Tommaso Biancalani, Aviv Regev, Sergey Levine, Masatoshi Uehara

    Abstract: Diffusion models excel at capturing the natural design spaces of images, molecules, DNA, RNA, and protein sequences. However, rather than merely generating designs that are natural, we often aim to optimize downstream reward functions while preserving the naturalness of these design spaces. Existing methods for achieving this goal often require "differentiable" proxy models (e.g., class…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: The code is available at https://github.com/masa-ue/SVDD

  15. arXiv:2408.07605  [pdf, other]

    cs.CV

    Panacea+: Panoramic and Controllable Video Generation for Autonomous Driving

    Authors: Yuqing Wen, Yucheng Zhao, Yingfei Liu, Binyuan Huang, Fan Jia, Yanhui Wang, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

    Abstract: The field of autonomous driving increasingly demands high-quality annotated video training data. In this paper, we propose Panacea+, a powerful and universally applicable framework for generating video data in driving scenes. Built upon the foundation of our previous work, Panacea, Panacea+ adopts a multi-view appearance noise prior mechanism and a super-resolution module for enhanced consistency…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Project page: https://panacea-ad.github.io/. arXiv admin note: text overlap with arXiv:2311.16813

  16. Speech-based Mark for Data Sonification

    Authors: Yichun Zhao, Jingyi Lu, Miguel A Nacenta

    Abstract: Sonification serves as a powerful tool for data accessibility, especially for people with vision loss. Among various modalities, speech is a familiar means of communication similar to the role of text in visualization. However, speech-based sonification is underexplored. We introduce SpeechTone, a novel speech-based mark for data sonification and extension to the existing Erie declarative grammar…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted in ASSETS '24, October 27-30, 2024, St. John's, NL, Canada

    Journal ref: The 26th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS '24), October 27-30, 2024, St. John's, NL, Canada

  17. arXiv:2408.06834  [pdf, other]

    cs.CV

    GLGait: A Global-Local Temporal Receptive Field Network for Gait Recognition in the Wild

    Authors: Guozhen Peng, Yunhong Wang, Yuwei Zhao, Shaoxiong Zhang, Annan Li

    Abstract: Gait recognition has attracted increasing attention from academia and industry as a human recognition technology from a distance in non-intrusive ways without requiring cooperation. Although advanced methods have achieved impressive success in lab scenarios, most of them perform poorly in the wild. Recently, some Convolution Neural Networks (ConvNets) based methods have been proposed to address th…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted by ACM MM2024

  18. arXiv:2408.06567  [pdf, other]

    cs.CL cs.AI

    AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies

    Authors: Bo-Wen Zhang, Liangdong Wang, Ye Yuan, Jijie Li, Shuhao Gu, Mengdi Zhao, Xinya Wu, Guang Liu, Chengwei Wu, Hanyu Zhao, Li Du, Yiming Ju, Quanyue Ma, Yulong Ao, Yingli Zhao, Songhe Zhu, Zhou Cao, Dong Liang, Yonghua Lin, Ming Zhang, Shunfei Wang, Yanxin Zhou, Min Ye, Xuekai Chen, Xinyang Yu, et al. (2 additional authors not shown)

    Abstract: In recent years, with the rapid application of large language models across various fields, the scale of these models has gradually increased, and the resources required for their pre-training have grown exponentially. Training an LLM from scratch will cost a lot of computation resources while scaling up from a smaller model is a more efficient approach and has thus attracted significant attention…

    Submitted 12 August, 2024; originally announced August 2024.

  19. arXiv:2408.06550  [pdf, other]

    cs.HC

    Stretch or Vibrate? Rendering Spatial Information of Static and Moving Objects in VR via Haptic Feedback for Blind People

    Authors: Jiasheng Li, Zining Zhang, Zeyu Yan, Yuhang Zhao, Huaishu Peng

    Abstract: Perceiving spatial information of a virtual object (e.g., direction, distance) is critical yet challenging for blind users seeking an immersive virtual reality experience. To facilitate VR accessibility for blind users, in this paper, we investigate the effectiveness of two types of haptic cues--vibrotactile and skin-stretch cues--in conveying the spatial information of a virtual object when appli…

    Submitted 12 August, 2024; originally announced August 2024.

  20. arXiv:2408.06019  [pdf, other]

    cs.CV

    HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors

    Authors: Xiaozheng Zheng, Chao Wen, Zhaohu Li, Weiyi Zhang, Zhuo Su, Xu Chang, Yang Zhao, Zheng Lv, Xiaoyuan Zhang, Yongjie Zhang, Guidong Wang, Lan Xu

    Abstract: In this paper, we present a novel 3D head avatar creation approach capable of generalizing from few-shot in-the-wild data with high-fidelity and animatable robustness. Given the underconstrained nature of this problem, incorporating prior knowledge is essential. Therefore, we propose a framework comprising prior learning and avatar creation phases. The prior learning phase leverages 3D head priors…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Project page: https://headgap.github.io/

  21. arXiv:2408.05758  [pdf, other]

    eess.AS cs.AI cs.CL cs.SD

    VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

    Authors: Chunyu Qiang, Wang Geng, Yi Zhao, Ruibo Fu, Tao Wang, Cheng Gong, Tianrui Wang, Qiuyu Liu, Jiangyan Yi, Zhengqi Wen, Chen Zhang, Hao Che, Longbiao Wang, Jianwu Dang, Jianhua Tao

    Abstract: Deep learning has brought significant improvements to the field of cross-modal representation learning. For tasks such as text-to-speech (TTS), voice conversion (VC), and automatic speech recognition (ASR), a cross-modal fine-grained (frame-level) sequence representation is desired, emphasizing the semantic content of the text modality while de-emphasizing the paralinguistic information of the spe…

    Submitted 11 August, 2024; originally announced August 2024.

  22. arXiv:2408.05702  [pdf, other]

    cs.LG eess.SY nlin.CD

    Predicting Chaotic System Behavior using Machine Learning Techniques

    Authors: Huaiyuan Rao, Yichen Zhao, Qiang Lai

    Abstract: Recently, machine learning techniques, particularly deep learning, have demonstrated superior performance over traditional time series forecasting methods across various applications, including both single-variable and multi-variable predictions. This study aims to investigate the capability of i) Next Generation Reservoir Computing (NG-RC) ii) Reservoir Computing (RC) iii) Long short-term Memory…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 8 pages, 15 figures

  23. arXiv:2408.05686  [pdf, other]

    cs.LG cs.MA

    The Bandit Whisperer: Communication Learning for Restless Bandits

    Authors: Yunfan Zhao, Tonghan Wang, Dheeraj Nagaraj, Aparna Taneja, Milind Tambe

    Abstract: Applying Reinforcement Learning (RL) to Restless Multi-Arm Bandits (RMABs) offers a promising avenue for addressing allocation problems with resource constraints and temporal dynamics. However, classic RMAB models largely overlook the challenges of (systematic) data errors - a common occurrence in real-world scenarios due to factors like varying data collection protocols and intentional noise for…

    Submitted 10 August, 2024; originally announced August 2024.

  24. arXiv:2408.05517  [pdf, other]

    cs.CL

    SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning

    Authors: Yuze Zhao, Jintao Huang, Jinghan Hu, Xingjun Wang, Yunlin Mao, Daoze Zhang, Zeyinzi Jiang, Zhikai Wu, Baole Ai, Ang Wang, Wenmeng Zhou, Yingda Chen

    Abstract: Recent development in Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs) have leverage Attention-based Transformer architectures and achieved superior performance and generalization capabilities. They have since covered extensive areas of traditional learning tasks. For instance, text-based tasks such as text-classification and sequence-labeling, as well as multi-modal task…

    Submitted 18 August, 2024; v1 submitted 10 August, 2024; originally announced August 2024.

  25. arXiv:2408.05435  [pdf, other]

    quant-ph cs.LG

    SuperEncoder: Towards Universal Neural Approximate Quantum State Preparation

    Authors: Yilun Zhao, Bingmeng Wang, Wenle Jiang, Xiwei Pan, Bing Li, Yinhe Han, Ying Wang

    Abstract: Numerous quantum algorithms operate under the assumption that classical data has already been converted into quantum states, a process termed Quantum State Preparation (QSP). However, achieving precise QSP requires a circuit depth that scales exponentially with the number of qubits, making it a substantial obstacle in harnessing quantum advantage. Recent research suggests using a Parameterized Qua…

    Submitted 10 August, 2024; originally announced August 2024.

  26. arXiv:2408.05307  [pdf]

    cs.CE cs.LG

    Audio-visual cross-modality knowledge transfer for machine learning-based in-situ monitoring in laser additive manufacturing

    Authors: Jiarui Xie, Mutahar Safdar, Lequn Chen, Seung Ki Moon, Yaoyao Fiona Zhao

    Abstract: Various machine learning (ML)-based in-situ monitoring systems have been developed to detect laser additive manufacturing (LAM) process anomalies and defects. Multimodal fusion can improve in-situ monitoring performance by acquiring and integrating data from multiple modalities, including visual and audio data. However, multimodal fusion employs multiple sensors of different types, which leads to…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 36 pages, 12 figures, 6 tables

  27. arXiv:2408.05117  [pdf, other]

    eess.IV cs.AI cs.CV

    Beyond the Eye: A Relational Model for Early Dementia Detection Using Retinal OCTA Images

    Authors: Shouyue Liu, Jinkui Hao, Yonghuai Liu, Huazhu Fu, Xinyu Guo, Shuting Zhang, Yitian Zhao

    Abstract: Early detection of dementia, such as Alzheimer's disease (AD) or mild cognitive impairment (MCI), is essential to enable timely intervention and potential treatment. Accurate detection of AD/MCI is challenging due to the high complexity, cost, and often invasive nature of current diagnostic techniques, which limit their suitability for large-scale population screening. Given the shared embryologic…

    Submitted 9 August, 2024; originally announced August 2024.

  28. arXiv:2408.04863  [pdf, other]

    cs.SE

    Coding-PTMs: How to Find Optimal Code Pre-trained Models for Code Embedding in Vulnerability Detection?

    Authors: Yu Zhao, Lina Gong, Zhiqiu Huang, Yongwei Wang, Mingqiang Wei, Fei Wu

    Abstract: Vulnerability detection is garnering increasing attention in software engineering, since code vulnerabilities possibly pose significant security. Recently, reusing various code pre-trained models has become common for code embedding without providing reasonable justifications in vulnerability detection. The premise for casually utilizing pre-trained models (PTMs) is that the code embeddings genera…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by ASE 2024

  29. arXiv:2408.03284  [pdf, other]

    cs.CV cs.GR cs.MM

    ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer

    Authors: Jiazhi Guan, Zhiliang Xu, Hang Zhou, Kaisiyuan Wang, Shengyi He, Zhanwang Zhang, Borong Liang, Haocheng Feng, Errui Ding, Jingtuo Liu, Jingdong Wang, Youjian Zhao, Ziwei Liu

    Abstract: Lip-syncing videos with given audio is the foundation for various applications including the creation of virtual presenters or performers. While recent studies explore high-fidelity lip-sync with different techniques, their task-orientated models either require long-term videos for clip-specific training or retain visible artifacts. In this paper, we propose a unified and effective framework ReSyn…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2024. Project page: https://guanjz20.github.io/projects/ReSyncer

  30. arXiv:2408.02993  [pdf, other]

    cs.CV

    DreamLCM: Towards High-Quality Text-to-3D Generation via Latent Consistency Model

    Authors: Yiming Zhong, Xiaolin Zhang, Yao Zhao, Yunchao Wei

    Abstract: Recently, the text-to-3D task has developed rapidly due to the appearance of the SDS method. However, the SDS method always generates 3D objects with poor quality due to the over-smooth issue. This issue is attributed to two factors: 1) the DDPM single-step inference produces poor guidance gradients; 2) the randomness from the input noises and timesteps averages the details of the 3D contents. In…

    Submitted 9 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

    Comments: 15 pages, 9 figures, ACM MM 2024

  31. arXiv:2408.02705  [pdf, other]

    cs.LG cs.AI

    PSNE: Efficient Spectral Sparsification Algorithms for Scaling Network Embedding

    Authors: Longlong Lin, Yunfeng Yu, Zihao Wang, Zeli Wang, Yuying Zhao, Jin Zhao, Tao Jia

    Abstract: Network embedding has numerous practical applications and has received extensive attention in graph learning, which aims at mapping vertices into a low-dimensional and continuous dense vector space by preserving the underlying structural properties of the graph. Many network embedding methods have been proposed, among which factorization of the Personalized PageRank (PPR for short) matrix has been…

    Submitted 5 August, 2024; originally announced August 2024.

  32. arXiv:2408.02598  [pdf, other]

    cs.LG cs.CY

    AI-Driven Strategies for Reducing Student Withdrawal -- A Study of EMU Student Stopout

    Authors: Yan Zhao, Amy Otteson

    Abstract: Not everyone who enrolls in college will leave with a certificate or degree, but the number of people who drop out or take a break is much higher than experts previously believed. In December 2013, there were 29 million people with some college education but no degree. That number jumped to 36 million by December of 2018, according to a new report from the National Student Clearinghouse Research C…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 6 pages, 5 figures

  33. arXiv:2408.02446  [pdf, other]

    cs.IT

    Second 6G life Workshop on Post Shannon Theory

    Authors: Yaning Zhao, Christian Deppe

    Abstract: The one-day workshop, held prior to the "ZIF Workshop on Information Theory and Related Fields", provided an excellent opportunity for in-depth discussions on several topics within the field of post-Shannon theory. The agenda covered deterministic and randomized identification, focusing on various methods and algorithms for identifying data or signals deterministically and through randomized proce…

    Submitted 5 August, 2024; originally announced August 2024.

  34. arXiv:2408.01906  [pdf, ps, other]

    cs.IT

    Binary $[n,(n\pm1)/2]$ cyclic codes with good minimum distances from sequences

    Authors: Xianhong Xie, Yaxin Zhao, Zhonghua Sun, Xiaobo Zhou

    Abstract: Recently, binary cyclic codes with parameters $[n,(n\pm1)/2,\geq \sqrt{n}]$ have been a hot topic since their minimum distances have a square-root bound. In this paper, we construct four classes of binary cyclic codes $\mathcal{C}_{\mathcal{S},0}$, $\mathcal{C}_{\mathcal{S},1}$ and $\mathcal{C}_{\mathcal{D},0}$, $\mathcal{C}_{\mathcal{D},1}$ by using two families of sequences, and obtain some code…

    Submitted 3 August, 2024; originally announced August 2024.

  35. arXiv:2408.01840  [pdf, other]

    cs.CV

    E$^3$NeRF: Efficient Event-Enhanced Neural Radiance Fields from Blurry Images

    Authors: Yunshan Qi, Jia Li, Yifan Zhao, Yu Zhang, Lin Zhu

    Abstract: Neural Radiance Fields (NeRF) achieve impressive rendering performance by learning volumetric 3D representation from several images of different views. However, it is difficult to reconstruct a sharp NeRF from blurry input as it often occurs in the wild. To solve this problem, we propose a novel Efficient Event-Enhanced NeRF (E$^3$NeRF) by utilizing the combination of RGB images and event streams.…

    Submitted 3 August, 2024; originally announced August 2024.

  36. arXiv:2408.01687  [pdf, other]

    cs.SE

    Voices from the Frontier: A Comprehensive Analysis of the OpenAI Developer Forum

    Authors: Xinyi Hou, Yanjie Zhao, Haoyu Wang

    Abstract: OpenAI's advanced large language models (LLMs) have revolutionized natural language processing and enabled developers to create innovative applications. As adoption grows, understanding the experiences and challenges of developers working with these technologies is crucial. This paper presents a comprehensive analysis of the OpenAI Developer Forum, focusing on (1) popularity trends and user engage…

    Submitted 3 August, 2024; originally announced August 2024.

  37. arXiv:2408.01544  [pdf, other]

    cs.LG stat.AP

    Momentum Capture and Prediction System Based on Wimbledon Open2023 Tournament Data

    Authors: Chang Liu, Tongyuan Yang, Yan Zhao

    Abstract: There is a hidden energy in tennis, which cannot be seen or touched. It is the force that controls the flow of the game and is present in all types of matches. This mysterious force is Momentum. This study introduces an evaluation model that synergizes the Entropy Weight Method (EWM) and Gray Relation Analysis (GRA) to quantify momentum's impact on match outcomes. Empirical validation was conducte…

    Submitted 2 August, 2024; originally announced August 2024.

  38. arXiv:2408.00744  [pdf, other]

    cs.CV

    Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation

    Authors: Siyu Jiao, Hongguang Zhu, Jiannan Huang, Yao Zhao, Yunchao Wei, Humphrey Shi

    Abstract: Pre-trained vision-language models, e.g. CLIP, have been increasingly used to address the challenging Open-Vocabulary Segmentation (OVS) task, benefiting from their well-aligned vision-text embedding space. Typical solutions involve either freezing CLIP during training to unilaterally maintain its zero-shot capability, or fine-tuning CLIP vision encoder to achieve perceptual sensitivity to local r…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  39. arXiv:2408.00629  [pdf, other]

    cs.CV eess.IV

    Empowering Snapshot Compressive Imaging: Spatial-Spectral State Space Model with Across-Scanning and Local Enhancement

    Authors: Wenzhe Tian, Haijin Zeng, Yin-Ping Zhao, Yongyong Chen, Zhen Wang, Xuelong Li

    Abstract: Snapshot Compressive Imaging (SCI) relies on decoding algorithms such as CNN or Transformer to reconstruct the hyperspectral image (HSI) from its compressed measurement. Although existing CNN and Transformer-based methods have proven effective, CNNs are limited by their inadequate modeling of long-range dependencies, while Transformer ones face high computational costs due to quadratic complexity.…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 12 pages, 6 figures

  40. arXiv:2408.00496  [pdf, other]

    cs.CV

    SegStitch: Multidimensional Transformer for Robust and Efficient Medical Imaging Segmentation

    Authors: Shengbo Tan, Zeyu Zhang, Ying Cai, Daji Ergu, Lin Wu, Binbin Hu, Pengzhang Yu, Yang Zhao

    Abstract: Medical imaging segmentation plays a significant role in the automatic recognition and analysis of lesions. State-of-the-art methods, particularly those utilizing transformers, have been prominently adopted in 3D semantic segmentation due to their superior performance in scalability and generalizability. However, plain vision transformers encounter challenges due to their neglect of local features…

    Submitted 1 August, 2024; originally announced August 2024.

  41. arXiv:2408.00083  [pdf, other]

    cs.CV

    Localized Gaussian Splatting Editing with Contextual Awareness

    Authors: Hanyuan Xiao, Yingshu Chen, Huajian Huang, Haolin Xiong, Jing Yang, Pratusha Prasad, Yajie Zhao

    Abstract: Recent text-guided generation of individual 3D object has achieved great success using diffusion priors. However, these methods are not suitable for object insertion and replacement tasks as they do not consider the background, leading to illumination mismatches within the environment. To bridge the gap, we introduce an illumination-aware 3D scene editing pipeline for 3D Gaussian Splatting (3DGS)…

    Submitted 31 July, 2024; originally announced August 2024.

  42. arXiv:2407.21293  [pdf, other]

    cs.CV cs.AI

    SimpleLLM4AD: An End-to-End Vision-Language Model with Graph Visual Question Answering for Autonomous Driving

    Authors: Peiru Zheng, Yun Zhao, Zhan Gong, Hong Zhu, Shaohua Wu

    Abstract: Many fields could benefit from the rapid development of the large language models (LLMs). The end-to-end autonomous driving (e2eAD) is one of the typically fields facing new opportunities as the LLMs have supported more and more modalities. Here, by utilizing vision-language model (VLM), we proposed an e2eAD method called SimpleLLM4AD. In our method, the e2eAD task are divided into four stages, wh…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 16 pages, 3 figures

  43. arXiv:2407.21243  [pdf, other]

    cs.LG cs.AI

    Informed Correctors for Discrete Diffusion Models

    Authors: Yixiu Zhao, Jiaxin Shi, Lester Mackey, Scott Linderman

    Abstract: Discrete diffusion modeling is a promising framework for modeling and generating data in discrete spaces. To sample from these models, different strategies present trade-offs between computation and sample quality. A predominant sampling strategy is predictor-corrector $τ$-leaping, which simulates the continuous time generative process with discretized predictor steps and counteracts the accumulat…

    Submitted 30 July, 2024; originally announced July 2024.

  44. arXiv:2407.21075  [pdf, other]

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.

  45. arXiv:2407.20908  [pdf, other]

    cs.CV

    Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering

    Authors: Yanpeng Zhao, Yiwei Hao, Siyu Gao, Yunbo Wang, Xiaokang Yang

    Abstract: Learning object-centric representations from unsupervised videos is challenging. Unlike most previous approaches that focus on decomposing 2D images, we present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning within a differentiable volume rendering framework. The key idea is to perform object-centric voxelization to capture the 3D nature of the scene,…

    Submitted 30 July, 2024; originally announced July 2024.

  46. arXiv:2407.20600  [pdf, other]

    cs.CV

    Knowledge Fused Recognition: Fusing Hierarchical Knowledge for Image Recognition through Quantitative Relativity Modeling and Deep Metric Learning

    Authors: Yunfeng Zhao, Huiyu Zhou, Fei Wu, Xifeng Wu

    Abstract: Image recognition is an essential baseline for deep metric learning. Hierarchical knowledge about image classes depicts inter-class similarities or dissimilarities. Effective fusion of hierarchical knowledge about image classes to enhance image recognition remains a challenging topic to advance. In this paper, we propose a novel deep metric learning based method to effectively fuse hierarchical pr…

    Submitted 30 July, 2024; originally announced July 2024.

  47. arXiv:2407.19941  [pdf, other]

    cs.LG

    Boosting Graph Foundation Model from Structural Perspective

    Authors: Yao Cheng, Yige Zhao, Jianxiang Yu, Xiang Li

    Abstract: Graph foundation models have recently attracted significant attention due to their strong generalizability. Although existing methods resort to language models to learn unified semantic representations across domains, they disregard the unique structural characteristics of graphs from different domains. To address the problem, in this paper, we boost graph foundation model from structural perspectiv…

    Submitted 29 July, 2024; originally announced July 2024.

  48. arXiv:2407.19775  [pdf, other]

    cs.AI cs.CL cs.CR cs.DC

    Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference

    Authors: Claudio Angione, Yue Zhao, Harry Yang, Ahmad Farhan, Fielding Johnston, James Buban, Patrick Colangelo

    Abstract: The rapid growth of large-scale AI models, particularly large language models, has brought significant challenges in data privacy, computational resources, and accessibility. Traditional centralized architectures often struggle to meet required data security and scalability needs, which hinders the democratization of AI systems. Nesa introduces a model-agnostic sharding framework designed for decent…

    Submitted 29 July, 2024; originally announced July 2024.

  49. arXiv:2407.19711  [pdf, other]

    cs.SE

    TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data

    Authors: Shuaiyu Xie, Jian Wang, Hanbin He, Zhihao Wang, Yuqi Zhao, Neng Zhang, Bing Li

    Abstract: Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale. With the rapid growth of observability techniques, various methods have been proposed to achieve failure diagnosis, including root cause localization and failure type identification, by leveraging diverse monitoring data such as logs, metrics, or traces. However, traditional fai…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 30 pages

  50. arXiv:2407.19672  [pdf, other]

    cs.CL

    SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages

    Authors: Wenxuan Zhang, Hou Pong Chan, Yiran Zhao, Mahani Aljunied, Jianyu Wang, Chaoqun Liu, Yue Deng, Zhiqiang Hu, Weiwen Xu, Yew Ken Chia, Xin Li, Lidong Bing

    Abstract: Large Language Models (LLMs) have shown remarkable abilities across various tasks, yet their development has predominantly centered on high-resource languages like English and Chinese, leaving low-resource languages underserved. To address this disparity, we present SeaLLMs 3, the latest iteration of the SeaLLMs model family, tailored for Southeast Asian languages. This region, characterized by it…

    Submitted 28 July, 2024; originally announced July 2024.