Showing 101–150 of 747 results for author: Yu, F

  1. arXiv:2311.17956  [pdf, other]

    cs.LG cs.CV cs.NE

    QuadraNet: Improving High-Order Neural Interaction Efficiency with Hardware-Aware Quadratic Neural Networks

    Authors: Chenhui Xu, Fuxun Yu, Zirui Xu, Chenchen Liu, Jinjun Xiong, Xiang Chen

    Abstract: Recent progress in computer vision-oriented neural network designs is mostly driven by capturing high-order neural interactions among inputs and features. A variety of approaches have emerged to accomplish this, such as Transformers and their variants. However, these interactions generate a large amount of intermediate state and/or strong data dependency, leading to considerable memory consum…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: ASP-DAC 2024 Best Paper Nomination

  2. arXiv:2311.16434  [pdf, ps, other]

    astro-ph.SR

    Observational signature of continuously operating drivers of decayless kink oscillation

    Authors: Dong Li, Zhentong Li, Fanpeng Shi, Yang Su, Wei Chen, Fu Yu, Chuan Li, Ye Qiu, Yu Huang, Zongjun Ning

    Abstract: Decayless kink oscillations, which are nearly omnipresent in the solar corona, are believed to be driven by continuously operating energy supply. In this letter, we investigate an external continuous excitation of an apparent decayless oscillation during an X1.1 flare on June 20, 2023 (SOL2023-06-20T16:42). The decayless kink oscillation was identified in the coronal loop at extreme ultraviolet (EU…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: accepted by A&A

  3. arXiv:2311.13233  [pdf, other]

    cs.CR cs.AI

    A Survey of Adversarial CAPTCHAs on its History, Classification and Generation

    Authors: Zisheng Xu, Qiao Yan, F. Richard Yu, Victor C. M. Leung

    Abstract: Completely Automated Public Turing test to tell Computers and Humans Apart, or CAPTCHA for short, is an essential and relatively easy way to defend against malicious attacks implemented by bots. The security and usability trade-off limits the use of massive geometric transformations to interfere with deep model recognition, and deep models have even outperformed humans in complex CAPTCHAs. The discovery of adve…

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: Submitted to ACM Computing Surveys (Under Review)

  4. arXiv:2311.12345  [pdf, other]

    cs.CV cs.AI cs.LG

    Stable Diffusion For Aerial Object Detection

    Authors: Yanan Jian, Fuxun Yu, Simranjit Singh, Dimitrios Stamoulis

    Abstract: Aerial object detection is a challenging task, in which one major obstacle lies in the limitations of large-scale data collection and the long-tail distribution of certain classes. Synthetic data offers a promising solution, especially with recent advances in diffusion-based methods like stable diffusion (SD). However, the direct application of diffusion methods to aerial domains poses unique chal…

    Submitted 20 November, 2023; originally announced November 2023.

    Comments: Accepted at NeurIPS 2023 Synthetic Data Generation with Generative AI workshop

  5. arXiv:2311.10117  [pdf, other]

    cs.AI cs.LG

    Automatic Engineering of Long Prompts

    Authors: Cho-Jui Hsieh, Si Si, Felix X. Yu, Inderjit S. Dhillon

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in solving complex open-domain tasks, guided by comprehensive instructions and demonstrations provided in the form of prompts. However, these prompts can be lengthy, often comprising hundreds of lines and thousands of tokens, and their design often requires considerable human effort. Recent research has explored automatic promp…

    Submitted 16 November, 2023; originally announced November 2023.

  6. arXiv:2311.09724  [pdf, other]

    cs.AI cs.CL

    OVM, Outcome-supervised Value Models for Planning in Mathematical Reasoning

    Authors: Fei Yu, Anningzhe Gao, Benyou Wang

    Abstract: Large language models (LLMs) often struggle with maintaining accuracy throughout multiple reasoning steps, especially in mathematical reasoning where an error in earlier steps can propagate to subsequent ones, ultimately leading to an incorrect answer. To reduce error propagation, guided decoding is employed to direct the LM decoding on a step-by-step basis. We argue that in guided…

    Submitted 1 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL findings. https://github.com/FreedomIntelligence/OVM

  7. arXiv:2311.04471  [pdf, ps, other]

    math.AP

    Multiple blowing-up solutions for a slightly critical Lane-Emden system with non-power nonlinearity

    Authors: Shengbing Deng, Fang Yu

    Abstract: In this paper, we study the following Lane-Emden system with nearly critical non-power nonlinearity: $$\begin{cases} -\Delta u = \dfrac{|v|^{p-1}v}{[\ln(e+|v|)]^{\varepsilon}} & \text{in } \Omega, \\ -\Delta v = \dfrac{|u|^{q-1}u}{[\ln(e+|u|)]^{\varepsilon}} & \text{in } \Omega, \\ u = v = 0 & \text{on } \partial\Omega, \end{cases}$$ where $\Omega$ is a bounded smooth domain…

    Submitted 8 November, 2023; originally announced November 2023.

  8. arXiv:2310.20242  [pdf, other]

    cs.NI eess.SP

    Intelligent-Reflecting-Surface-Assisted UAV Communications for 6G Networks

    Authors: Zhaolong Ning, Tengfeng Li, Yu Wu, Xiaojie Wang, Qingqing Wu, Fei Richard Yu, Song Guo

    Abstract: In 6th-Generation (6G) mobile networks, Intelligent Reflective Surfaces (IRSs) and Unmanned Aerial Vehicles (UAVs) have emerged as promising technologies to address the coverage difficulties and resource constraints faced by terrestrial networks. UAVs, with their mobility and low costs, offer diverse connectivity options for mobile users and a novel deployment paradigm for 6G networks. However, th…

    Submitted 31 October, 2023; originally announced October 2023.

  9. arXiv:2310.19617  [pdf, other]

    astro-ph.SR physics.space-ph

    Data-driven Modeling of a Coronal Magnetic Flux Rope: from Birth to Death

    Authors: J. H. Guo, Y. W. Ni, Y. Guo, C. Xia, B. Schmieder, S. Poedts, Z. Zhong, Y. H. Zhou, F. Yu, P. F. Chen

    Abstract: Magnetic flux ropes are a bundle of twisted magnetic field lines produced by internal electric currents, which are responsible for solar eruptions and are the major drivers of geomagnetic storms. As such, it is crucial to develop a numerical model that can capture the entire evolution of a flux rope, from its birth to death, in order to predict whether adverse space weather events might occur or n…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: 30 pages, 10 figures, Accepted for ApJ

  10. arXiv:2310.17944  [pdf, other]

    cs.LG

    A Survey on Trustworthy Edge Intelligence: From Security and Reliability To Transparency and Sustainability

    Authors: Xiaojie Wang, Beibei Wang, Yu Wu, Zhaolong Ning, Song Guo, Fei Richard Yu

    Abstract: Edge Intelligence (EI) integrates Edge Computing (EC) and Artificial Intelligence (AI) to push the capabilities of AI to the network edge for real-time, efficient and secure intelligent decision-making and computation. However, EI faces various challenges due to resource constraints, heterogeneous network environments, and diverse service requirements of different applications, which together affe…

    Submitted 25 January, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

    Comments: 25 pages, 6 figures, 8 tables

  11. arXiv:2310.17784  [pdf, other]

    cs.CL cs.AI cs.LG

    Data-Centric Financial Large Language Models

    Authors: Zhixuan Chu, Huaiyu Guo, Xinyuan Zhou, Yijia Wang, Fei Yu, Hong Chen, Wanqing Xu, Xin Lu, Qing Cui, Longfei Li, Jun Zhou, Sheng Li

    Abstract: Large language models (LLMs) show promise for natural language tasks but struggle when applied directly to complex domains like finance. LLMs have difficulty reasoning about and integrating all relevant information. We propose a data-centric approach to enable LLMs to better handle financial tasks. Our key insight is that rather than overloading the LLM with everything at once, it is more effectiv…

    Submitted 13 November, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

  12. arXiv:2310.15301  [pdf, other]

    cs.LG

    ADMarker: A Multi-Modal Federated Learning System for Monitoring Digital Biomarkers of Alzheimer's Disease

    Authors: Xiaomin Ouyang, Xian Shuai, Yang Li, Li Pan, Xifan Zhang, Heming Fu, Sitong Cheng, Xinyan Wang, Shihua Cao, Jiang Xin, Hazel Mok, Zhenyu Yan, Doris Sau Fung Yu, Timothy Kwok, Guoliang Xing

    Abstract: Alzheimer's Disease (AD) and related dementia are a growing global health challenge due to the aging population. In this paper, we present ADMarker, the first end-to-end system that integrates multi-modal sensors and new federated learning algorithms for detecting multidimensional AD digital biomarkers in natural living environments. ADMarker features a novel three-stage multi-modal federated lear…

    Submitted 12 April, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

  13. arXiv:2310.15141  [pdf, other]

    cs.LG cs.CL cs.DS cs.IT

    SpecTr: Fast Speculative Decoding via Optimal Transport

    Authors: Ziteng Sun, Ananda Theertha Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix Yu

    Abstract: Autoregressive sampling from large language models has led to state-of-the-art results in several natural language tasks. However, autoregressive sampling generates tokens one at a time, making it slow, and even prohibitive in certain tasks. One way to speed up sampling is speculative decoding: use a small model to sample a draft (block or sequence of tokens), and then score a…

    Submitted 17 January, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  14. arXiv:2310.13810  [pdf]

    cs.LG

    A Better Match for Drivers and Riders: Reinforcement Learning at Lyft

    Authors: Xabi Azagirre, Akshay Balwally, Guillaume Candeli, Nicholas Chamandy, Benjamin Han, Alona King, Hyungjun Lee, Martin Loncaric, Sebastien Martin, Vijay Narasiman, Zhiwei Qin, Baptiste Richard, Sara Smoot, Sean Taylor, Garrett van Ryzin, Di Wu, Fei Yu, Alex Zamoshchin

    Abstract: To better match drivers to riders in our ridesharing application, we revised Lyft's core matching algorithm. We use a novel online reinforcement learning approach that estimates the future earnings of drivers in real time and use this information to find more efficient matches. This change was the first documented implementation of a ridesharing matching algorithm that can learn and improve in rea…

    Submitted 13 November, 2023; v1 submitted 20 October, 2023; originally announced October 2023.

  15. arXiv:2310.12970  [pdf, other]

    cs.CV cs.RO

    Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding

    Authors: Zhejun Zhang, Alexander Liniger, Christos Sakaridis, Fisher Yu, Luc Van Gool

    Abstract: The real-world deployment of an autonomous driving system requires its components to run on-board and in real-time, including the motion prediction module that predicts the future trajectories of surrounding traffic participants. Existing agent-centric methods have demonstrated outstanding performance on public benchmarks. However, they suffer from high computational overhead and poor scalability…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  16. arXiv:2310.10068  [pdf, other]

    cs.CV

    Generalizable Person Search on Open-world User-Generated Video Content

    Authors: Junjie Li, Guanshuo Wang, Yichao Yan, Fufu Yu, Qiong Jia, Jie Qin, Shouhong Ding, Xiaokang Yang

    Abstract: Person search is a challenging task that involves detecting and retrieving individuals from a large set of un-cropped scene images. Existing person search applications are mostly trained and deployed in the same-origin scenarios. However, collecting and annotating training samples for each scene is often difficult due to the limitation of resources and the labor cost. Moreover, large-scale intra-d…

    Submitted 16 October, 2023; originally announced October 2023.

  17. arXiv:2310.04863  [pdf, other]

    cs.SD eess.AS

    SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR

    Authors: Yangze Li, Fan Yu, Yuhao Liang, Pengcheng Guo, Mohan Shi, Zhihao Du, Shiliang Zhang, Lei Xie

    Abstract: Joint modeling of multi-speaker ASR and speaker diarization has recently shown promising results in speaker-attributed automatic speech recognition (SA-ASR). Although being able to obtain state-of-the-art (SOTA) performance, most of the studies are based on an autoregressive (AR) decoder which generates tokens one-by-one and results in a large real-time factor (RTF). To speed up inference, we intro…

    Submitted 7 October, 2023; originally announced October 2023.

  18. arXiv:2310.03006  [pdf, other]

    cs.CV

    COOLer: Class-Incremental Learning for Appearance-Based Multiple Object Tracking

    Authors: Zhizheng Liu, Mattia Segu, Fisher Yu

    Abstract: Continual learning allows a model to learn multiple tasks sequentially while retaining the old knowledge without the training data of the preceding tasks. This paper extends the scope of continual learning research to class-incremental learning for multiple object tracking (MOT), which is desirable to accommodate the continuously evolving needs of autonomous systems. Previous solutions for continu…

    Submitted 5 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: GCPR 2023 Oral

  19. arXiv:2310.02690  [pdf, other]

    eess.IV cs.CV

    Multi-Dimension-Embedding-Aware Modality Fusion Transformer for Psychiatric Disorder Clasification

    Authors: Guoxin Wang, Xuyang Cao, Shan An, Fengmei Fan, Chao Zhang, Jinsong Wang, Feng Yu, Zhiren Wang

    Abstract: Deep learning approaches, together with neuroimaging techniques, play an important role in psychiatric disorders classification. Previous studies on psychiatric disorders diagnosis mainly focus on using functional connectivity matrices of resting-state functional magnetic resonance imaging (rs-fMRI) as input, which still needs to fully utilize the rich temporal information of the time series of rs…

    Submitted 4 October, 2023; originally announced October 2023.

  20. arXiv:2310.02629  [pdf, other]

    cs.SD eess.AS

    BA-MoE: Boundary-Aware Mixture-of-Experts Adapter for Code-Switching Speech Recognition

    Authors: Peikun Chen, Fan Yu, Yuhao Lian, Hongfei Xue, Xucheng Wan, Naijun Zheng, Huan Zhou, Lei Xie

    Abstract: Mixture-of-experts based models, which use language experts to extract language-specific representations effectively, have been well applied in code-switching automatic speech recognition. However, there is still substantial space to improve as similar pronunciation across languages may result in ineffective multi-language modeling and inaccurate language boundary estimation. To eliminate these dr…

    Submitted 7 October, 2023; v1 submitted 4 October, 2023; originally announced October 2023.

    Comments: Accepted by ASRU2023

  21. arXiv:2310.01926  [pdf, other]

    cs.CV cs.AI

    DARTH: Holistic Test-time Adaptation for Multiple Object Tracking

    Authors: Mattia Segu, Bernt Schiele, Fisher Yu

    Abstract: Multiple object tracking (MOT) is a fundamental component of perception systems for autonomous driving, and its robustness to unseen conditions is a requirement to avoid life-critical failures. Despite the urge of safety in driving systems, no solution to the MOT adaptation problem to domain shift in test-time conditions has ever been proposed. However, the nature of a MOT system is manifold - req…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Proceedings of the IEEE/CVF International Conference on Computer Vision

  22. arXiv:2309.16421  [pdf, other]

    cs.CV

    Distilling ODE Solvers of Diffusion Models into Smaller Steps

    Authors: Sanghwan Kim, Hao Tang, Fisher Yu

    Abstract: Diffusion models have recently gained prominence as a novel category of generative models. Despite their success, these models face a notable drawback in terms of slow sampling speeds, requiring a high number of function evaluations (NFE) in the order of hundreds or thousands. In response, both learning-free and learning-based sampling strategies have been explored to expedite the samplin…

    Submitted 26 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

  23. arXiv:2309.13573  [pdf, other]

    cs.SD eess.AS

    The second multi-channel multi-party meeting transcription challenge (M2MeT) 2.0): A benchmark for speaker-attributed ASR

    Authors: Yuhao Liang, Mohan Shi, Fan Yu, Yangze Li, Shiliang Zhang, Zhihao Du, Qian Chen, Lei Xie, Yanmin Qian, Jian Wu, Zhuo Chen, Kong Aik Lee, Zhijie Yan, Hui Bu

    Abstract: With the success of the first Multi-channel Multi-party Meeting Transcription challenge (M2MeT), the second M2MeT challenge (M2MeT 2.0) held in ASRU2023 particularly aims to tackle the complex task of speaker-attributed ASR (SA-ASR), which directly addresses the practical and challenging problem of "who spoke what at when" in typical meeting scenarios. We particularly established two sub-tr…

    Submitted 5 October, 2023; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: 8 pages, Accepted by ASRU2023

  24. arXiv:2309.12053  [pdf, other]

    cs.CL

    AceGPT, Localizing Large Language Models in Arabic

    Authors: Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Abdulmohsen Alharthi, Bang An, Juncai He, Ziche Liu, Zhiyi Zhang, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu

    Abstract: This paper is devoted to the development of a localized Large Language Model (LLM) specifically for Arabic, a language imbued with unique cultural characteristics inadequately addressed by current mainstream models. Significant concerns emerge when addressing cultural sensitivity and local values. To address this, the paper proposes a comprehensive solution that includes further pre-training with…

    Submitted 2 April, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Accepted to NAACL main conference. https://github.com/FreedomIntelligence/AceGPT

  25. arXiv:2309.06006  [pdf, ps, other]

    cs.CV cs.AI

    SoccerNet 2023 Challenges Results

    Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, et al. (77 additional authors not shown)

    Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, fo…

    Submitted 12 September, 2023; originally announced September 2023.

  26. arXiv:2309.05396  [pdf, other]

    cs.SD eess.AS

    SlideSpeech: A Large-Scale Slide-Enriched Audio-Visual Corpus

    Authors: Haoxu Wang, Fan Yu, Xian Shi, Yuezhang Wang, Shiliang Zhang, Ming Li

    Abstract: Multi-Modal automatic speech recognition (ASR) techniques aim to leverage additional modalities to improve the performance of speech recognition systems. While existing approaches primarily focus on video or contextual information, the utilization of extra supplementary textual information has been overlooked. Recognizing the abundance of online conference videos with slides, which provide rich do…

    Submitted 25 December, 2023; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Accepted by ICASSP 2024

  27. arXiv:2309.04707  [pdf, other]

    cs.AI cs.LG

    Advantage Actor-Critic with Reasoner: Explaining the Agent's Behavior from an Exploratory Perspective

    Authors: Muzhe Guo, Feixu Yu, Tian Lan, Fang Jin

    Abstract: Reinforcement learning (RL) is a powerful tool for solving complex decision-making problems, but its lack of transparency and interpretability has been a major challenge in domains where decisions have significant real-world consequences. In this paper, we propose a novel Advantage Actor-Critic with Reasoner (A2CR), which can be easily applied to Actor-Critic-based RL models and make them interpre…

    Submitted 9 September, 2023; originally announced September 2023.

  28. arXiv:2309.04422  [pdf, other]

    cs.CV

    Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving

    Authors: Thomas E. Huang, Yifan Liu, Luc Van Gool, Fisher Yu

    Abstract: Performing multiple heterogeneous visual tasks in dynamic scenes is a hallmark of human perception capability. Despite remarkable progress in image and video recognition via representation learning, current research still focuses on designing specialized networks for singular, homogeneous, or simple combination of tasks. We instead explore the construction of a unified model for major image and vi…

    Submitted 26 November, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

    Comments: ICCV 2023, project page at https://www.vis.xyz/pub/vtd

  29. arXiv:2308.15726  [pdf]

    cs.SD cs.AI eess.AS

    AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition

    Authors: Nan Che, Chenrui Liu, Fei Yu

    Abstract: Environmental sound scene and sound event recognition is important for the recognition of suspicious events in indoor and outdoor environments (such as nurseries, smart homes, nursing homes, etc.) and is a fundamental task involved in many audio surveillance applications. In particular, there is no public common data set for the research field of sound event recognition for the data set of the ind…

    Submitted 29 August, 2023; originally announced August 2023.

  30. arXiv:2308.15070  [pdf, other]

    cs.CV

    DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior

    Authors: Xinqi Lin, Jingwen He, Ziyan Chen, Zhaoyang Lyu, Bo Dai, Fanghua Yu, Wanli Ouyang, Yu Qiao, Chao Dong

    Abstract: We present DiffBIR, a general restoration pipeline that could handle different blind image restoration tasks in a unified framework. DiffBIR decouples blind image restoration problem into two stages: 1) degradation removal: removing image-independent content; 2) information regeneration: generating the lost image content. Each stage is developed independently but they work seamlessly in a cascaded…

    Submitted 12 April, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

  31. arXiv:2308.14713  [pdf, other]

    cs.CV

    R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras

    Authors: Aron Schmied, Tobias Fischer, Martin Danelljan, Marc Pollefeys, Fisher Yu

    Abstract: Dense 3D reconstruction and ego-motion estimation are key challenges in autonomous driving and robotics. Compared to the complex, multi-modal systems deployed today, multi-camera systems provide a simpler, low-cost alternative. However, camera-based 3D reconstruction of complex dynamic scenes has proven extremely difficult, as existing solutions often produce incomplete or incoherent results. We p…

    Submitted 28 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023. Project page is available at https://www.vis.xyz/pub/r3d3/

  32. arXiv:2308.12581  [pdf, other]

    cs.LG cs.AI

    A Huber Loss Minimization Approach to Byzantine Robust Federated Learning

    Authors: Puning Zhao, Fei Yu, Zhiguo Wan

    Abstract: Federated learning systems are susceptible to adversarial attacks. To combat this, we introduce a novel aggregator based on Huber loss minimization, and provide a comprehensive theoretical analysis. Under the independent and identically distributed (i.i.d.) assumption, our approach has several advantages compared to existing methods. Firstly, it has optimal dependence on $ε$, which stands for the ratio…

    Submitted 25 March, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

  33. MolGrapher: Graph-based Visual Recognition of Chemical Structures

    Authors: Lucas Morin, Martin Danelljan, Maria Isabel Agea, Ahmed Nassar, Valery Weber, Ingmar Meijer, Peter Staar, Fisher Yu

    Abstract: The automatic analysis of chemical literature has immense potential to accelerate the discovery of new materials and drugs. Much of the critical information in patent documents and scientific articles is contained in figures, depicting the molecule structures. However, automatically parsing the exact chemical structure is a formidable challenge, due to the amount of detailed information, the diver…

    Submitted 23 August, 2023; originally announced August 2023.

  34. arXiv:2308.11093  [pdf, other]

    cs.CV cs.AI cs.LG

    Video OWL-ViT: Temporally-consistent open-world localization in video

    Authors: Georg Heigold, Matthias Minderer, Alexey Gritsenko, Alex Bewley, Daniel Keysers, Mario Lučić, Fisher Yu, Thomas Kipf

    Abstract: We present an architecture and a training recipe that adapts pre-trained open-world image models to localization in videos. Understanding the open visual world (without being constrained by fixed label spaces) is crucial for many real-world vision tasks. Contrastive pre-training on large image-text datasets has recently led to significant improvements for image-level tasks. For more structured tas…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  35. Primer on Axion Physics

    Authors: Felix Yu

    Abstract: I review the canonical axion potential, with an emphasis on the field theory underlying radial and angular modes of complex scalar fields. I present the explicit calculation of the instanton-induced breaking of the Goldstone field direction necessary to derive the canonical axion mass and decay constant relation. The primer is intended to serve an audience with elementary quantum field theory expe…

    Submitted 16 August, 2023; originally announced August 2023.

    Comments: 15 pages, 1 figure; invited contribution to Annalen der Physik

    Report number: MITP-23-038

    Journal ref: Ann. Phys.(Berlin) 2023, 2300106

  36. arXiv:2308.06272  [pdf, other]

    cs.HC cs.AI

    Beyond Reality: The Pivotal Role of Generative AI in the Metaverse

    Authors: Vinay Chamola, Gaurang Bansal, Tridib Kumar Das, Vikas Hassija, Naga Siva Sai Reddy, Jiacheng Wang, Sherali Zeadally, Amir Hussain, F. Richard Yu, Mohsen Guizani, Dusit Niyato

    Abstract: Imagine stepping into a virtual world that's as rich, dynamic, and interactive as our physical one. This is the promise of the Metaverse, and it's being brought to life by the transformative power of Generative Artificial Intelligence (AI). This paper offers a comprehensive exploration of how generative AI technologies are shaping the Metaverse, transforming it into a dynamic, immersive, and inter…

    Submitted 28 July, 2023; originally announced August 2023.

    Comments: 8 pages, 4 figures

  37. arXiv:2308.05023  [pdf]

    physics.chem-ph physics.atom-ph physics.comp-ph

    High-energy nitrogen rings stabilized by superatom properties

    Authors: Zhen Gong, Rui Wang, Famin Yu, Chenxi Wan, Xinrui Yang, Zhigang Wang

    Abstract: How to stabilize nitrogen-rich high-energy-density molecules under conventional conditions is particularly important for the energy storage and conversion of such systems and has attracted extensive attention. In this work, our theoretical study showed for the first time that the stabilization mechanism of the nitrogen ring conformed to the superatomic properties at the atomic level. This result o…

    Submitted 9 August, 2023; originally announced August 2023.

    Comments: 6 pages, 3 figures

  38. arXiv:2308.03422  [pdf, other]

    cs.CL cs.AI

    Prompt Guided Copy Mechanism for Conversational Question Answering

    Authors: Yong Zhang, Zhitao Li, Jianzong Wang, Yiming Gao, Ning Cheng, Fengying Yu, Jing Xiao

    Abstract: Conversational Question Answering (CQA) is a challenging task that aims to generate natural answers for conversational flow questions. In this paper, we propose a pluggable approach for extractive methods that introduces a novel prompt-guided copy mechanism to improve the fluency and appropriateness of the extracted answers. Our approach uses prompts to link questions to answers and employs attent…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: Accepted by 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023)

  39. arXiv:2308.03364  [pdf, other]

    cs.CV

    Dual Aggregation Transformer for Image Super-Resolution

    Authors: Zheng Chen, Yulun Zhang, Jinjin Gu, Linghe Kong, Xiaokang Yang, Fisher Yu

    Abstract: Transformer has recently gained considerable popularity in low-level vision tasks, including image super-resolution (SR). These networks utilize self-attention along different dimensions, spatial or channel, and achieve impressive performance. This inspires us to combine the two dimensions in Transformer for a more powerful representation capability. Based on the above idea, we propose a novel Tra…

    Submitted 11 August, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023. Code is available at https://github.com/zhengchen1999/DAT

  40. arXiv:2308.03166  [pdf, other]

    cs.CV cs.AI

    Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors by Generating Camouflaged Objects

    Authors: Chunming He, Kai Li, Yachao Zhang, Yulun Zhang, Zhenhua Guo, Xiu Li, Martin Danelljan, Fisher Yu

    Abstract: Camouflaged object detection (COD) is the challenging task of identifying camouflaged objects visually blended into surroundings. Albeit achieving remarkable success, existing COD detectors still struggle to obtain precise results in some challenging cases. To handle this problem, we draw inspiration from the prey-vs-predator game that leads preys to develop better camouflage and predators to acqu…

    Submitted 10 March, 2024; v1 submitted 6 August, 2023; originally announced August 2023.

    Comments: Accepted at ICLR 2024

  41. arXiv:2308.02621  [pdf, other]

    cs.CV cs.LG

    Color Image Recovery Using Generalized Matrix Completion over Higher-Order Finite Dimensional Algebra

    Authors: Liang Liao, Zhuang Guo, Qi Gao, Yan Wang, Fajun Yu, Qifeng Zhao, Stephen John Maybank

    Abstract: To improve the accuracy of color image completion with missing entries, we present a recovery method based on generalized higher-order scalars. We extend the traditional second-order matrix model to a more comprehensive higher-order matrix equivalent, called the "t-matrix" model, which incorporates a pixel neighborhood expansion strategy to characterize the local pixel constraints. This "t-matrix"…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: 24 pages; 9 figures

  42. arXiv:2308.01244  [pdf, ps, other]

    hep-th hep-ph quant-ph

    Quantum Imprint of the Anharmonic Oscillator

    Authors: Prisco Lo Chiatto, Sebastian Schenk, Felix Yu

    Abstract: We study the anharmonic double well in quantum mechanics using exact Wentzel-Kramers-Brillouin (WKB) methods in a 't Hooft-like double scaling limit where classical behavior is expected to dominate. We compute the tunneling action in this double scaling limit, and compare it to the transition amplitude from the vacuum to a highly excited state. Our results, exact in the semiclassical limit, show t…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: 23 pages

    Report number: MITP-23-034

  43. arXiv:2307.16147  [pdf]

    physics.optics

    Broadband Dispersive-Wave Emission Coupled with Two-Stage Soliton Self-Compression in Gas-Filled Anti-Resonant Hollow-Core Fibers

    Authors: Jinyu Pan, Zhiyuan Huang, Yifei Chen, Fei Yu, Dakun Wu, Tiandao Chen, Donghan Liu, Yue Yu, Xin Jiang, Meng Pang, Yuxin Leng, Ruxin Li

    Abstract: We studied the underlying mechanism of broadband dispersive-wave emission within a resonance band of gas-filled anti-resonant hollow-core fiber. Both theoretical and experimental results unveiled that the high-order soliton, launched into the hollow-core fiber, experienced two stages of pulse compression, resulting in a multi-peak structure of the dispersive-wave spectrum. Over the first-stage pul…

    Submitted 30 July, 2023; originally announced July 2023.

  44. arXiv:2307.16000  [pdf, other]

    cs.CV

    Automated Hit-frame Detection for Badminton Match Analysis

    Authors: Yu-Hang Chien, Fang Yu

    Abstract: Sports professionals constantly under pressure to perform at the highest level can benefit from sports analysis, which allows coaches and players to reduce manual efforts and systematically evaluate their performance using automated tools. This research aims to advance sports analysis in badminton, systematically detecting hit-frames automatically from match videos using modern deep learning techn…

    Submitted 2 August, 2023; v1 submitted 29 July, 2023; originally announced July 2023.

  45. arXiv:2307.14918  [pdf, other]

    cs.CV

    GET3D--: Learning GET3D from Unconstrained Image Collections

    Authors: Fanghua Yu, Xintao Wang, Zheyuan Li, Yan-Pei Cao, Ying Shan, Chao Dong

    Abstract: The demand for efficient 3D model generation techniques has grown exponentially, as manual creation of 3D models is time-consuming and requires specialized expertise. While generative models have shown potential in creating 3D textured shapes from 2D images, their applicability in 3D industries is limited due to the lack of a well-defined camera distribution in real-world scenarios, resulting in l…

    Submitted 27 July, 2023; originally announced July 2023.

  46. arXiv:2307.13048   

    astro-ph.HE astro-ph.IM

    The IceCube-Gen2 Collaboration -- Contributions to the 38th International Cosmic Ray Conference (ICRC2023)

    Authors: IceCube-Gen2, :, R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, J. Audehm, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, V. Basu, R. Bay, J. Becker Tjus, J. Beise, et al. (432 additional authors not shown)

    Abstract: IceCube-Gen2 is a planned next-generation neutrino observatory at the South Pole that builds upon the successful design of IceCube. Integrating two complementary detection technologies for neutrinos, optical and radio Cherenkov emission, in combination with a surface array for cosmic ray air shower detection, IceCube-Gen2 will cover a broad neutrino energy range from MeV to EeV. This index of cont…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: To access the list of contributions, please follow the "HTML" link. Links to individual contributions will fill in as authors upload their material

  47. arXiv:2307.13047   

    astro-ph.HE astro-ph.IM

    The IceCube Collaboration -- Contributions to the 38th International Cosmic Ray Conference (ICRC2023)

    Authors: IceCube, :, R. Abbasi, M. Ackermann, J. Adams, S. K. Agarwalla, J. A. Aguilar, M. Ahlers, J. M. Alameddine, N. M. Amin, K. Andeen, G. Anton, C. Argüelles, Y. Ashida, S. Athanasiadou, S. N. Axani, X. Bai, A. Balagopal V., M. Baricevic, S. W. Barwick, V. Basu, R. Bay, J. J. Beatty, J. Becker Tjus, J. Beise, et al. (382 additional authors not shown)

    Abstract: The IceCube Observatory at the South Pole has been operating in its full configuration since May 2011 with a duty cycle of about 99%. Its main component consists of a cubic-kilometer array of optical sensors deployed deep in the glacial ice designed for the detection of high-energy astrophysical neutrinos. A surface array for cosmic ray air shower detection, IceTop, and a denser inner subdetector,…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: To access the list of contributions, please follow the "HTML" link. Links to individual contributions will fill in as authors upload their material

  48. arXiv:2307.12862  [pdf, other]

    cs.SI cs.LG stat.CO stat.ML

    Stochastic Step-wise Feature Selection for Exponential Random Graph Models (ERGMs)

    Authors: Helal El-Zaatari, Fei Yu, Michael R Kosorok

    Abstract: Statistical analysis of social networks provides valuable insights into complex network interactions across various scientific disciplines. However, accurate modeling of networks remains challenging due to the heavy computational burden and the need to account for observed network dependencies. Exponential Random Graph Models (ERGMs) have emerged as a promising technique used in social network mod…

    Submitted 24 July, 2023; originally announced July 2023.

    Comments: 23 pages, 6 tables and 18 figures

  49. arXiv:2307.12191  [pdf, other]

    astro-ph.SR physics.space-ph

    Effects of Coronal Magnetic Field Configuration on Particle Acceleration and Release during the Ground Level Enhancement Events in Solar Cycle 24

    Authors: Wenlong Liu, Xiangliang Kong, Fan Guo, Lulu Zhao, Shiwei Feng, Feiyu Yu, Zelong Jiang, Yao Chen, Joe Giacalone

    Abstract: Ground level enhancements (GLEs) are extreme solar energetic particle (SEP) events that are of particular importance in space weather. In solar cycle 24, two GLEs were recorded on 2012 May 17 (GLE 71) and 2017 September 10 (GLE 72), respectively, by a range of advanced modern instruments. Here we conduct a comparative analysis of the two events by focusing on the effects of large-scale magnetic fi…

    Submitted 22 July, 2023; originally announced July 2023.

    Comments: Accepted for publication in ApJ

  50. arXiv:2307.11035  [pdf, other]

    cs.CV cs.AI

    Cascade-DETR: Delving into High-Quality Universal Object Detection

    Authors: Mingqiao Ye, Lei Ke, Siyuan Li, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu

    Abstract: Object localization in general environments is a fundamental part of vision systems. While dominating on the COCO benchmark, recent Transformer-based detection methods are not competitive in diverse domains. Moreover, these methods still struggle to very accurately estimate the object bounding boxes in complex environments. We introduce Cascade-DETR for high-quality universal object detection. W…

    Submitted 20 July, 2023; originally announced July 2023.

    Comments: Accepted in ICCV 2023. Our code and models will be released at https://github.com/SysCV/cascade-detr