Showing 1–50 of 155 results for author: You, S

Searching in archive cs.
  1. arXiv:2409.02284  [pdf, other

    cs.CV cs.AI

    Biochemical Prostate Cancer Recurrence Prediction: Thinking Fast & Slow

    Authors: Suhang You, Sanyukta Adap, Siddhesh Thakur, Bhakti Baheti, Spyridon Bakas

    Abstract: Time to biochemical recurrence in prostate cancer is essential for prognostic monitoring of the progression of patients after prostatectomy, which assesses the efficacy of the surgery. In this work, we proposed to leverage multiple instance learning through a two-stage ``thinking fast \& slow'' strategy for the time to recurrence (TTR) prediction. The first (``thinking fast'') stage finds the most… ▽ More

    Submitted 3 September, 2024; originally announced September 2024.

    Comments: 8 pages, 3 figures, methodology paper for LEOPARD Challenge

    MSC Class: 68T10 ACM Class: I.5.4

  2. arXiv:2408.13423  [pdf, other

    cs.CV

    Training-free Long Video Generation with Chain of Diffusion Model Experts

    Authors: Wenhao Li, Yichao Cao, Xiu Su, Xi Lin, Shan You, Mingkai Zheng, Yi Chen, Chang Xu

    Abstract: Video generation models hold substantial potential in areas such as filmmaking. However, current video diffusion models need high computational costs and produce suboptimal results due to high complexity of video generation task. In this paper, we propose \textbf{ConFiner}, an efficient high-quality video generation framework that decouples video generation into easier subtasks: structure \textbf{… ▽ More

    Submitted 2 September, 2024; v1 submitted 23 August, 2024; originally announced August 2024.

  3. arXiv:2408.07018  [pdf, other

    cs.CV

    Efficient Human-Object-Interaction (EHOI) Detection via Interaction Label Coding and Conditional Decision

    Authors: Tsung-Shan Yang, Yun-Cheng Wang, Chengwei Wei, Suya You, C. -C. Jay Kuo

    Abstract: Human-Object Interaction (HOI) detection is a fundamental task in image understanding. While deep-learning-based HOI methods provide high performance in terms of mean Average Precision (mAP), they are computationally expensive and opaque in training and inference processes. An Efficient HOI (EHOI) detector is proposed in this work to strike a good balance between detection performance, inference c… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  4. arXiv:2408.01437  [pdf, other

    cs.CV cs.GR

    Img2CAD: Reverse Engineering 3D CAD Models from Images through VLM-Assisted Conditional Factorization

    Authors: Yang You, Mikaela Angelina Uy, Jiaqi Han, Rahul Thomas, Haotong Zhang, Suya You, Leonidas Guibas

    Abstract: Reverse engineering 3D computer-aided design (CAD) models from images is an important task for many downstream applications including interactive editing, manufacturing, architecture, robotics, etc. The difficulty of the task lies in vast representational disparities between the CAD output and the image input. CAD models are precise, programmatic constructs that involve sequential operations comb… ▽ More

    Submitted 19 July, 2024; originally announced August 2024.

  5. arXiv:2407.19407  [pdf, other

    physics.comp-ph cond-mat.mtrl-sci cs.LG math.OC

    Near-Isotropic Sub-Ångstrom 3D Resolution Phase Contrast Imaging Achieved by End-to-End Ptychographic Electron Tomography

    Authors: Shengbo You, Andrey Romanov, Philipp Pelz

    Abstract: Three-dimensional atomic resolution imaging using transmission electron microscopes is a unique capability that requires challenging experiments. Linear electron tomography methods are limited by the missing wedge effect, requiring a high tilt range. Multislice ptychography can achieve deep sub-Ångstrom resolution in the transverse direction, but the depth resolution is limited to 2 to 3 nanometer… ▽ More

    Submitted 28 July, 2024; originally announced July 2024.

  6. arXiv:2407.04917  [pdf, other

    cs.PL

    A Calculus for Unreachable Code

    Authors: Peter Zhong, Shu-Hung You, Simone Campanoni, Robert Bruce Findler, Matthew Flatt, Christos Dimoulas

    Abstract: In Racket, the LLVM IR, Rust, and other modern languages, programmers and static analyses can hint, with special annotations, that certain parts of a program are unreachable. Same as other assumptions about undefined behavior, the compiler assumes these hints are correct and transforms the program aggressively. While compile-time transformations due to undefined behavior often perplex compiler w… ▽ More

    Submitted 5 July, 2024; originally announced July 2024.

  7. arXiv:2406.16822  [pdf, other

    cs.CR cs.DC

    A Multi-Party, Multi-Blockchain Atomic Swap Protocol with Universal Adaptor Secret

    Authors: Shengwei You, Aditya Joshi, Andrey Kuehlkamp, Jarek Nabrzyski

    Abstract: The increasing complexity of digital asset transactions across multiple blockchains necessitates a robust atomic swap protocol that can securely handle more than two participants. Traditional atomic swap protocols, including those based on adaptor signatures, are vulnerable to malicious dropout attacks, which break atomicity and compromise the security of the transaction. This paper presents a nov… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

  8. arXiv:2406.10744  [pdf, other

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  9. arXiv:2405.16144  [pdf, other

    cs.CV cs.AI

    GreenCOD: A Green Camouflaged Object Detection Method

    Authors: Hong-Shuo Chen, Yao Zhu, Suya You, Azad M. Madni, C. -C. Jay Kuo

    Abstract: We introduce GreenCOD, a green method for detecting camouflaged objects, distinct in its avoidance of backpropagation techniques. GreenCOD leverages gradient boosting and deep features extracted from pre-trained Deep Neural Networks (DNNs). Traditional camouflaged object detection (COD) approaches often rely on complex deep neural network architectures, seeking performance improvements through bac… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  10. arXiv:2404.06903  [pdf, other

    cs.CV cs.AI

    DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting

    Authors: Shijie Zhou, Zhiwen Fan, Dejia Xu, Haoran Chang, Pradyumna Chari, Tejas Bharadwaj, Suya You, Zhangyang Wang, Achuta Kadambi

    Abstract: The increasing demand for virtual reality applications has highlighted the significance of crafting immersive 3D assets. We present a text-to-3D 360$^{\circ}$ scene generation pipeline that facilitates the creation of comprehensive 360$^{\circ}$ scenes for in-the-wild environments in a matter of minutes. Our approach utilizes the generative power of a 2D diffusion model and prompt self-refinement… ▽ More

    Submitted 25 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

  11. arXiv:2403.20092  [pdf, other

    cs.CV

    Modeling Weather Uncertainty for Multi-weather Co-Presence Estimation

    Authors: Qi Bi, Shaodi You, Theo Gevers

    Abstract: Images from outdoor scenes may be taken under various weather conditions. It is well studied that weather impacts the performance of computer vision algorithms and needs to be handled properly. However, existing algorithms model weather condition as a discrete status and estimate it using multi-label classification. The fact is that, physically, specifically in meteorology, weather are modeled as… ▽ More

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Work in progress

  12. arXiv:2403.09338  [pdf, other

    cs.CV cs.AI

    LocalMamba: Visual State Space Model with Windowed Selective Scan

    Authors: Tao Huang, Xiaohuan Pei, Shan You, Fei Wang, Chen Qian, Chang Xu

    Abstract: Recent advancements in state space models, notably Mamba, have demonstrated significant progress in modeling long sequences for tasks like language understanding. Yet, their application in vision tasks has not markedly surpassed the performance of traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs). This paper posits that the key to enhancing Vision Mamba (ViM) lies in… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

  13. arXiv:2403.06517  [pdf, other

    cs.CV cs.AI

    Active Generation for Image Classification

    Authors: Tao Huang, Jiaqi Liu, Shan You, Chang Xu

    Abstract: Recently, the growing capabilities of deep generative models have underscored their potential in enhancing image classification accuracy. However, existing methods often demand the generation of a disproportionately large number of images compared to the original dataset, while having only marginal improvements in accuracy. This computationally expensive and time-consuming process hampers the prac… ▽ More

    Submitted 15 August, 2024; v1 submitted 11 March, 2024; originally announced March 2024.

    Comments: ECCV 2024

  14. Enhancing Wind Speed and Wind Power Forecasting Using Shape-Wise Feature Engineering: A Novel Approach for Improved Accuracy and Robustness

    Authors: Mulomba Mukendi Christian, Yun Seon Kim, Hyebong Choi, Jaeyoung Lee, SongHee You

    Abstract: Accurate prediction of wind speed and power is vital for enhancing the efficiency of wind energy systems. Numerous solutions have been implemented to date, demonstrating their potential to improve forecasting. Among these, deep learning is perceived as a revolutionary approach in the field. However, despite their effectiveness, the noise present in the collected data remains a significant challeng… ▽ More

    Submitted 16 January, 2024; originally announced January 2024.

    Journal ref: International Journal of Advanced Culture Technology Vol.11 No.4 393-405 (2023)

  15. Enhancing Acute Kidney Injury Prediction through Integration of Drug Features in Intensive Care Units

    Authors: Gabriel D. M. Manalu, Mulomba Mukendi Christian, Songhee You, Hyebong Choi

    Abstract: The relationship between acute kidney injury (AKI) prediction and nephrotoxic drugs, or drugs that adversely affect kidney function, is one that has yet to be explored in the critical care setting. One contributing factor to this gap in research is the limited investigation of drug modalities in the intensive care unit (ICU) context, due to the challenges of processing prescription data into the c… ▽ More

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 9 pages, 2 tables

    Journal ref: International Journal of Advanced Smart Convergence Vol.12 No.4 434-442 (2023)

  16. arXiv:2312.13307  [pdf, other

    cs.LG cs.AI cs.CV

    Not All Steps are Equal: Efficient Generation with Progressive Diffusion Models

    Authors: Wenhao Li, Xiu Su, Shan You, Tao Huang, Fei Wang, Chen Qian, Chang Xu

    Abstract: Diffusion models have demonstrated remarkable efficacy in various generative tasks with the predictive prowess of denoising model. Currently, these models employ a uniform denoising approach across all timesteps. However, the inherent variations in noisy latents at each timestep lead to conflicts during training, constraining the potential of diffusion models. To address this challenge, we propose… ▽ More

    Submitted 1 January, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  17. arXiv:2312.12471  [pdf, other

    cs.CV

    Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion

    Authors: Fan Zhang, Shaodi You, Yu Li, Ying Fu

    Abstract: Monocular depth estimation has experienced significant progress on terrestrial images in recent years, largely due to deep learning advancements. However, it remains inadequate for underwater scenes, primarily because of data scarcity. Given the inherent challenges of light attenuation and backscattering in water, acquiring clear underwater images or precise depth information is notably difficult… ▽ More

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: 10 pages

  18. arXiv:2312.03203  [pdf, other

    cs.CV

    Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields

    Authors: Shijie Zhou, Haoran Chang, Sicheng Jiang, Zhiwen Fan, Zehao Zhu, Dejia Xu, Pradyumna Chari, Suya You, Zhangyang Wang, Achuta Kadambi

    Abstract: 3D scene representations have gained immense popularity in recent years. Methods that use Neural Radiance fields are versatile for traditional tasks such as novel view synthesis. In recent times, some work has emerged that aims to extend the functionality of NeRF beyond view synthesis, for semantically aware tasks such as editing and segmentation using 3D feature field distillation from 2D foundat… ▽ More

    Submitted 8 April, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

  19. arXiv:2311.04944  [pdf, other

    cs.LG cs.AI cs.CR

    Edge-assisted U-Shaped Split Federated Learning with Privacy-preserving for Internet of Things

    Authors: Hengliang Tang, Zihang Zhao, Detian Liu, Yang Cao, Shiqiang Zhang, Siqing You

    Abstract: In the realm of the Internet of Things (IoT), deploying deep learning models to process data generated or collected by IoT devices is a critical challenge. However, direct data transmission can cause network congestion and inefficient execution, given that IoT devices typically lack computation and communication capabilities. Centralized data processing in data centers is also no longer feasible d… ▽ More

    Submitted 8 November, 2023; originally announced November 2023.

  20. arXiv:2311.03799  [pdf, other

    cs.CV

    Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models

    Authors: Yichao Cao, Qingfei Tang, Xiu Su, Chen Song, Shan You, Xiaobo Lu, Chang Xu

    Abstract: Human-object interaction (HOI) detection aims to comprehend the intricate relationships between humans and objects, predicting $<human, action, object>$ triplets, and serving as the foundation for numerous computer vision tasks. The complexity and diversity of human-object interactions in the real world, however, pose significant challenges for both annotation and recognition, particularly in reco… ▽ More

    Submitted 7 November, 2023; originally announced November 2023.

  21. arXiv:2311.02535   

    cs.CV

    TokenMotion: Motion-Guided Vision Transformer for Video Camouflaged Object Detection Via Learnable Token Selection

    Authors: Zifan Yu, Erfan Bank Tavakoli, Meida Chen, Suya You, Raghuveer Rao, Sanjeev Agarwal, Fengbo Ren

    Abstract: The area of Video Camouflaged Object Detection (VCOD) presents unique challenges in the field of computer vision due to texture similarities between target objects and their surroundings, as well as irregular motion patterns caused by both objects and camera movement. In this paper, we introduce TokenMotion (TMNet), which employs a transformer-based model to enhance VCOD by extracting motion-guide… ▽ More

    Submitted 1 February, 2024; v1 submitted 4 November, 2023; originally announced November 2023.

    Comments: Revising Needed

  22. arXiv:2310.20187  [pdf, other

    cs.LG cs.AI

    Self-Supervised Pre-Training for Precipitation Post-Processor

    Authors: Sojung An, Junha Lee, Jiyeon Jang, Inchae Na, Wooyeon Park, Sujeong You

    Abstract: Obtaining a sufficient forecast lead time for local precipitation is essential in preventing hazardous weather events. Global warming-induced climate change increases the challenge of accurately predicting severe precipitation events, such as heavy rainfall. In this paper, we propose a deep learning-based precipitation post-processor for numerical weather prediction (NWP) models. The precipitation… ▽ More

    Submitted 19 February, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: 7 pages, 3 figures, 1 table, accepted to NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning at https://www.climatechange.ai/papers/neurips2023/18

  23. arXiv:2310.18788  [pdf, other

    cs.CV

    PrObeD: Proactive Object Detection Wrapper

    Authors: Vishal Asnani, Abhinav Kumar, Suya You, Xiaoming Liu

    Abstract: Previous research in $2D$ object detection focuses on various tasks, including detecting objects in generic and camouflaged images. These works are regarded as passive works for object detection as they take the input image as is. However, convergence to global minima is not guaranteed to be optimal in neural networks; therefore, we argue that the trained weights in the object detector are not opt… ▽ More

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted at Neurips 2023

  24. arXiv:2310.16102  [pdf, other

    eess.IV cs.CV physics.optics

    Learned, Uncertainty-driven Adaptive Acquisition for Photon-Efficient Multiphoton Microscopy

    Authors: Cassandra Tong Ye, Jiashu Han, Kunzan Liu, Anastasios Angelopoulos, Linda Griffith, Kristina Monakhova, Sixian You

    Abstract: Multiphoton microscopy (MPM) is a powerful imaging tool that has been a critical enabler for live tissue imaging. However, since most multiphoton microscopy platforms rely on point scanning, there is an inherent trade-off between acquisition time, field of view (FOV), phototoxicity, and image quality, often resulting in noisy measurements when fast, large FOV, and/or gentle imaging is needed. Deep… ▽ More

    Submitted 24 October, 2023; originally announced October 2023.

  25. arXiv:2310.10879  [pdf, other

    cs.LG cs.DC

    BLoad: Enhancing Neural Network Training with Efficient Sequential Data Handling

    Authors: Raphael Ruschel, A. S. M. Iftekhar, B. S. Manjunath, Suya You

    Abstract: The increasing complexity of modern deep neural network models and the expanding sizes of datasets necessitate the development of optimized and scalable training methods. In this white paper, we addressed the challenge of efficiently training neural network models using sequences of varying sizes. To address this challenge, we propose a novel training scheme that enables efficient distributed data… ▽ More

    Submitted 25 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  26. arXiv:2310.04995  [pdf, other

    cs.CV

    SemST: Semantically Consistent Multi-Scale Image Translation via Structure-Texture Alignment

    Authors: Ganning Zhao, Wenhui Cui, Suya You, C. -C. Jay Kuo

    Abstract: Unsupervised image-to-image (I2I) translation learns cross-domain image mapping that transfers input from the source domain to output in the target domain while preserving its semantics. One challenge is that different semantic statistics in source and target domains result in content discrepancy known as semantic distortion. To address this problem, a novel I2I method that maintains semantic cons… ▽ More

    Submitted 7 October, 2023; originally announced October 2023.

  27. arXiv:2310.04750  [pdf, other

    cs.AI cs.CV cs.LG

    DiffNAS: Bootstrapping Diffusion Models by Prompting for Better Architectures

    Authors: Wenhao Li, Xiu Su, Shan You, Fei Wang, Chen Qian, Chang Xu

    Abstract: Diffusion models have recently exhibited remarkable performance on synthetic data. After a diffusion path is selected, a base model, such as UNet, operates as a denoising autoencoder, primarily predicting noises that need to be eliminated step by step. Consequently, it is crucial to employ a model that aligns with the expected budgets to facilitate superior synthetic performance. In this paper, we… ▽ More

    Submitted 9 October, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

  28. arXiv:2309.10421  [pdf, other

    cs.CV

    Exploring Different Levels of Supervision for Detecting and Localizing Solar Panels on Remote Sensing Imagery

    Authors: Maarten Burger, Rob Wijnhoven, Shaodi You

    Abstract: This study investigates object presence detection and localization in remote sensing imagery, focusing on solar panel recognition. We explore different levels of supervision, evaluating three models: a fully supervised object detector, a weakly supervised image classifier with CAM-based localization, and a minimally supervised anomaly detector. The classifier excels in binary presence detection (0… ▽ More

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Presented at the Netherlands Conference on Computer Vision (NCCV), The Hague, the Netherlands, September 14, 2023

  29. arXiv:2309.09078  [pdf, other

    cs.CV

    Unsupervised Green Object Tracker (GOT) without Offline Pre-training

    Authors: Zhiruo Zhou, Suya You, C. -C. Jay Kuo

    Abstract: Supervised trackers trained on labeled data dominate the single object tracking field for superior tracking accuracy. The labeling cost and the huge computational complexity hinder their applications on edge devices. Unsupervised learning methods have also been investigated to reduce the labeling cost but their complexity remains high. Aiming at lightweight high-performance tracking, feasibility w… ▽ More

    Submitted 16 September, 2023; originally announced September 2023.

  30. arXiv:2309.00237  [pdf, other

    cs.CL cs.AI

    Publicly Shareable Clinical Large Language Model Built on Synthetic Clinical Notes

    Authors: Sunjun Kweon, Junu Kim, Jiyoun Kim, Sujeong Im, Eunbyeol Cho, Seongsu Bae, Jungwoo Oh, Gyubok Lee, Jong Hak Moon, Seng Chan You, Seungjin Baek, Chang Hoon Han, Yoon Bin Jung, Yohan Jo, Edward Choi

    Abstract: The development of large language models tailored for handling patients' clinical notes is often hindered by the limited accessibility and usability of these notes due to strict privacy regulations. To address these challenges, we first create synthetic large-scale clinical notes using publicly available case reports extracted from biomedical literature. We then use these synthetic notes to train… ▽ More

    Submitted 29 July, 2024; v1 submitted 1 September, 2023; originally announced September 2023.

    Comments: ACL 2024 (Findings)

  31. arXiv:2308.11880  [pdf, other

    cs.CV cs.LG

    SUMMIT: Source-Free Adaptation of Uni-Modal Models to Multi-Modal Targets

    Authors: Cody Simons, Dripta S. Raychaudhuri, Sk Miraj Ahmed, Suya You, Konstantinos Karydis, Amit K. Roy-Chowdhury

    Abstract: Scene understanding using multi-modal data is necessary in many applications, e.g., autonomous navigation. To achieve this in a variety of situations, existing models must be able to adapt to shifting data distributions without arduous data annotation. Current approaches assume that the source data is available during adaptation and that the source consists of paired multi-modal data. Both these a… ▽ More

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: 12 pages, 5 figures, 9 tables, ICCV 2023

  32. arXiv:2308.10761  [pdf, other

    cs.CV

    CoNe: Contrast Your Neighbours for Supervised Image Classification

    Authors: Mingkai Zheng, Shan You, Lang Huang, Xiu Su, Fei Wang, Chen Qian, Xiaogang Wang, Chang Xu

    Abstract: Image classification is a longstanding problem in computer vision and machine learning research. Most recent works (e.g. SupCon, Triplet, and max-margin) mainly focus on grouping the intra-class samples aggressively and compactly, with the assumption that all intra-class samples should be pulled tightly towards their class centers. However, such an objective will be very hard to achieve since it… ▽ More

    Submitted 21 August, 2023; originally announced August 2023.

  33. arXiv:2308.06692  [pdf, other

    cs.CV cs.LG

    SimMatchV2: Semi-Supervised Learning with Graph Consistency

    Authors: Mingkai Zheng, Shan You, Lang Huang, Chen Luo, Fei Wang, Chen Qian, Chang Xu

    Abstract: Semi-Supervised image classification is one of the most fundamental problems in computer vision, which significantly reduces the need for human labor. In this paper, we introduce a new semi-supervised learning algorithm - SimMatchV2, which formulates various consistency regularizations between labeled and unlabeled data from the graph perspective. In SimMatchV2, we regard the augmented view of a sa… ▽ More

    Submitted 13 August, 2023; originally announced August 2023.

  34. arXiv:2307.13529  [pdf, other

    cs.CV cs.AI

    Re-mine, Learn and Reason: Exploring the Cross-modal Semantic Correlations for Language-guided HOI detection

    Authors: Yichao Cao, Qingfei Tang, Feng Yang, Xiu Su, Shan You, Xiaobo Lu, Chang Xu

    Abstract: Human-Object Interaction (HOI) detection is a challenging computer vision task that requires visual models to address the complex interactive relationship between humans and objects and predict HOI triplets. Despite the challenges posed by the numerous interaction combinations, they also offer opportunities for multimodal learning of visual texts. In this paper, we present a systematic and unified… ▽ More

    Submitted 18 September, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

    Comments: ICCV2023

  35. arXiv:2307.00371  [pdf, other

    cs.CV

    Learning Content-enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation

    Authors: Qi Bi, Shaodi You, Theo Gevers

    Abstract: Domain-generalized urban-scene semantic segmentation (USSS) aims to learn generalized semantic predictions across diverse urban-scene styles. Unlike domain gap challenges, USSS is unique in that the semantic categories are often similar in different urban scenes, while the styles can vary significantly due to changes in urban landscapes, weather conditions, lighting, and other factors. Existing ap… ▽ More

    Submitted 17 December, 2023; v1 submitted 1 July, 2023; originally announced July 2023.

    Comments: Accepted by AAAI 2024. Camera-ready version with available source code

  36. arXiv:2305.15712  [pdf, other

    cs.CV cs.AI

    Knowledge Diffusion for Distillation

    Authors: Tao Huang, Yuan Zhang, Mingkai Zheng, Shan You, Fei Wang, Chen Qian, Chang Xu

    Abstract: The representation gap between teacher and student is an emerging topic in knowledge distillation (KD). To reduce the gap and improve the performance, current methods often resort to complicated training schemes, loss functions, and feature alignments, which are task-specific and feature-specific. In this paper, we state that the essence of these methods is to discard the noisy information and dis… ▽ More

    Submitted 3 December, 2023; v1 submitted 25 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  37. arXiv:2304.12591  [pdf, other

    cs.CV cs.AI eess.IV

    Unsupervised Synthetic Image Refinement via Contrastive Learning and Consistent Semantic-Structural Constraints

    Authors: Ganning Zhao, Tingwei Shen, Suya You, C. -C. Jay Kuo

    Abstract: Ensuring the realism of computer-generated synthetic images is crucial to deep neural network (DNN) training. Due to different semantic distributions between synthetic and real-world captured datasets, there exists semantic mismatch between synthetic and refined images, which in turn results in the semantic distortion. Recently, contrastive learning (CL) has been successfully used to pull correlat… ▽ More

    Submitted 26 April, 2023; v1 submitted 25 April, 2023; originally announced April 2023.

  38. arXiv:2304.12463  [pdf, other

    cs.CV cs.LG eess.IV

    A Study on Improving Realism of Synthetic Data for Machine Learning

    Authors: Tingwei Shen, Ganning Zhao, Suya You

    Abstract: Synthetic-to-real data translation using generative adversarial learning has achieved significant success in improving synthetic data. Yet, limited studies focus on deep evaluation and comparison of adversarial training on general-purpose synthetic data for machine learning. This work aims to train and evaluate a synthetic-to-real generative model that transforms the synthetic renderings into more… ▽ More

    Submitted 28 April, 2023; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: 8 pages, 1 figure, 7 tables. Submit to the "SPIE Defense + Commercial Sensing" conference

  39. arXiv:2304.10970  [pdf, other

    cs.LG

    Can GPT-4 Perform Neural Architecture Search?

    Authors: Mingkai Zheng, Xiu Su, Shan You, Fei Wang, Chen Qian, Chang Xu, Samuel Albanie

    Abstract: We investigate the potential of GPT-4~\cite{gpt4} to perform Neural Architecture Search (NAS) -- the task of designing effective neural architectures. Our proposed approach, \textbf{G}PT-4 \textbf{E}nhanced \textbf{N}eural arch\textbf{I}tect\textbf{U}re \textbf{S}earch (GENIUS), leverages the generative capabilities of GPT-4 as a black-box optimiser to quickly navigate the architecture search spac… ▽ More

    Submitted 1 August, 2023; v1 submitted 21 April, 2023; originally announced April 2023.

  40. arXiv:2302.08595  [pdf, other

    cs.CV eess.IV

    Frequency-domain Learning for Volumetric-based 3D Data Perception

    Authors: Zifan Yu, Suya You, Fengbo Ren

    Abstract: Frequency-domain learning draws attention due to its superior tradeoff between inference accuracy and input data size. Frequency-domain learning in 2D computer vision tasks has shown that 2D convolutional neural networks (CNN) have a stationary spectral bias towards low-frequency channels so that high-frequency channels can be pruned with no or little accuracy degradation. However, frequency-domai… ▽ More

    Submitted 20 February, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: 13 pages

  41. arXiv:2302.08594  [pdf, other

    cs.CV

    TransUPR: A Transformer-based Uncertain Point Refiner for LiDAR Point Cloud Semantic Segmentation

    Authors: Zifan Yu, Meida Chen, Zhikang Zhang, Suya You, Raghuveer Rao, Sanjeev Agarwal, Fengbo Ren

    Abstract: Common image-based LiDAR point cloud semantic segmentation (LiDAR PCSS) approaches have bottlenecks resulting from the boundary-blurring problem of convolution neural networks (CNNs) and quantitation loss of spherical projection. In this work, we propose a transformer-based plug-and-play uncertain point refiner, i.e., TransUPR, to refine selected uncertain points in a learnable manner, which leads… ▽ More

    Submitted 12 October, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: 6 pages; Accepted by 2023 IROS

  42. arXiv:2301.07666  [pdf, other

    cs.CV

    DDS: Decoupled Dynamic Scene-Graph Generation Network

    Authors: A S M Iftekhar, Raphael Ruschel, Satish Kumar, Suya You, B. S. Manjunath

    Abstract: Scene-graph generation involves creating a structural representation of the relationships between objects in a scene by predicting subject-object-relation triplets from input data. However, existing methods show poor performance in detecting triplets outside of a predefined set, primarily due to their reliance on dependent feature learning. To address this issue we propose DDS -- a decoupled dynam… ▽ More

    Submitted 18 January, 2023; originally announced January 2023.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  43. MR.Brick: Designing A Remote Mixed-reality Educational Game System for Promoting Children's Social & Collaborative Skills

    Authors: Yudan Wu, Shanhe You, Zixuan Guo, Xiangyang Li, Guyue Zhou, Jiangtao Gong

    Abstract: Children are one of the groups most influenced by COVID-19-related social distancing, and a lack of contact with peers can limit their opportunities to develop social and collaborative skills. However, remote socialization and collaboration as an alternative approach is still a great challenge for children. This paper presents MR.Brick, a Mixed Reality (MR) educational game system that helps child…

    Submitted 26 January, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: 14 pages, 9 figures

    MSC Class: H.5.2

    Journal ref: CHI2023

  44. arXiv:2212.14623  [pdf, other]

    cs.LG physics.comp-ph stat.ML

    Essential Number of Principal Components and Nearly Training-Free Model for Spectral Analysis

    Authors: Yifeng Bie, Shuai You, Xinrui Li, Xuekui Zhang, Tao Lu

    Abstract: Through a study of multi-gas mixture datasets, we show that in multi-component spectral analysis, the number of functional or non-functional principal components required to retain the essential information is the same as the number of independent constituents in the mixture set. Due to the mutual in-dependency among different gas molecules, near one-to-one projection from the principal component…

    Submitted 30 December, 2022; originally announced December 2022.

  45. arXiv:2212.04096  [pdf, other]

    cs.CV

    ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction

    Authors: Zhen Wang, Shijie Zhou, Jeong Joon Park, Despoina Paschalidou, Suya You, Gordon Wetzstein, Leonidas Guibas, Achuta Kadambi

    Abstract: This work introduces alternating latent topologies (ALTO) for high-fidelity reconstruction of implicit 3D surfaces from noisy point clouds. Previous work identifies that the spatial arrangement of latent encodings is important to recover detail. One school of thought is to encode a latent vector for each point (point latents). Another school of thought is to project point latents into a grid (grid…

    Submitted 8 December, 2022; originally announced December 2022.

  46. arXiv:2211.03932  [pdf, other]

    cs.CV cs.MM

    Enhanced Low-resolution LiDAR-Camera Calibration Via Depth Interpolation and Supervised Contrastive Learning

    Authors: Zhikang Zhang, Zifan Yu, Suya You, Raghuveer Rao, Sanjeev Agarwal, Fengbo Ren

    Abstract: Motivated by the increasing application of low-resolution LiDAR recently, we target the problem of low-resolution LiDAR-camera calibration in this work. The main challenges are two-fold: sparsity and noise in point clouds. To address the problem, we propose to apply depth interpolation to increase the point density and supervised contrastive learning to learn noise-resistant features. The experime…

    Submitted 7 November, 2022; originally announced November 2022.

  47. arXiv:2210.14670  [pdf, other]

    cs.CV

    Boosting Semi-Supervised Semantic Segmentation with Probabilistic Representations

    Authors: Haoyu Xie, Changqi Wang, Mingkai Zheng, Minjing Dong, Shan You, Chong Fu, Chang Xu

    Abstract: Recent breakthroughs in semi-supervised semantic segmentation have been developed through contrastive learning. In prevalent pixel-wise contrastive learning solutions, the model maps pixels to deterministic representations and regularizes them in the latent space. However, there exist inaccurate pseudo-labels which map the ambiguous representations of pixels to the wrong classes due to the limited…

    Submitted 15 December, 2022; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted to AAAI 2023

  48. arXiv:2210.04708  [pdf, other]

    cs.CV

    GTAV-NightRain: Photometric Realistic Large-scale Dataset for Night-time Rain Streak Removal

    Authors: Fan Zhang, Shaodi You, Yu Li, Ying Fu

    Abstract: Rain is transparent, which reflects and refracts light in the scene to the camera. In outdoor vision, rain, especially rain streaks degrade visibility and therefore need to be removed. In existing rain streak removal datasets, although density, scale, direction and intensity have been considered, transparency is not fully taken into account. This problem is particularly serious in night scenes, wh…

    Submitted 10 October, 2022; originally announced October 2022.

  49. arXiv:2207.07629  [pdf, other]

    cs.CV

    GUSOT: Green and Unsupervised Single Object Tracking for Long Video Sequences

    Authors: Zhiruo Zhou, Hongyu Fu, Suya You, C. -C. Jay Kuo

    Abstract: Supervised and unsupervised deep trackers that rely on deep learning technologies are popular in recent years. Yet, they demand high computational complexity and a high memory cost. A green unsupervised single-object tracker, called GUSOT, that aims at object tracking for long videos under a resource-constrained environment is proposed in this work. Built upon a baseline tracker, UHP-SOT++, which…

    Submitted 15 July, 2022; originally announced July 2022.

  50. arXiv:2207.07267  [pdf, other]

    cs.CV cs.LG

    ScaleNet: Searching for the Model to Scale

    Authors: Jiyang Xie, Xiu Su, Shan You, Zhanyu Ma, Fei Wang, Chen Qian

    Abstract: Recently, community has paid increasing attention on model scaling and contributed to developing a model family with a wide spectrum of scales. Current methods either simply resort to a one-shot NAS manner to construct a non-structural and non-scalable model family or rely on a manual yet fixed scaling strategy to scale an unnecessarily best base model. In this paper, we bridge both two components…

    Submitted 14 July, 2022; originally announced July 2022.

    Comments: Accepted by ECCV2022