Showing 1–50 of 119 results for author: Porikli, F

  1. arXiv:2407.11306  [pdf, other]

    cs.CV

    PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer

    Authors: Pierre-David Letourneau, Manish Kumar Singh, Hsin-Pai Cheng, Shizhong Han, Yunxiao Shi, Dalton Jones, Matthew Harper Langston, Hong Cai, Fatih Porikli

    Abstract: We present Polynomial Attention Drop-in Replacement (PADRe), a novel and unifying framework designed to replace the conventional self-attention mechanism in transformer models. Notably, several recent alternative attention mechanisms, including Hyena, Mamba, SimA, Conv2Former, and Castling-ViT, can be viewed as specific instances of our PADRe framework. PADRe leverages polynomial functions and dra…

    Submitted 15 July, 2024; originally announced July 2024.

  2. arXiv:2407.04800  [pdf, other]

    cs.CV

    Segmentation-Free Guidance for Text-to-Image Diffusion Models

    Authors: Kambiz Azarian, Debasmit Das, Qiqi Hou, Fatih Porikli

    Abstract: We introduce segmentation-free guidance, a novel method designed for text-to-image diffusion models like Stable Diffusion. Our method does not require retraining of the diffusion model. At no additional compute cost, it uses the diffusion model itself as an implied segmentation network, hence named segmentation-free guidance, to dynamically adjust the negative prompt for each patch of the generate…

    Submitted 3 June, 2024; originally announced July 2024.

  3. arXiv:2407.00021  [pdf, other]

    cs.CV cs.GR eess.IV

    Neural Graphics Texture Compression Supporting Random Access

    Authors: Farzad Farhadzadeh, Qiqi Hou, Hoang Le, Amir Said, Randall Rauwendaal, Alex Bourd, Fatih Porikli

    Abstract: Advances in rendering have led to tremendous growth in texture assets, including resolution, complexity, and novel texture components, but this growth in data volume has not been matched by advances in its compression. Meanwhile, Neural Image Compression (NIC) has advanced significantly and shown promising results, but the proposed methods cannot be directly adapted to neural texture compression.…

    Submitted 6 May, 2024; originally announced July 2024.

    Comments: ECCV submission

  4. arXiv:2406.08816  [pdf, other]

    cs.CV

    ToSA: Token Selective Attention for Efficient Vision Transformers

    Authors: Manish Kumar Singh, Rajeev Yasarla, Hong Cai, Mingu Lee, Fatih Porikli

    Abstract: In this paper, we propose a novel token selective attention approach, ToSA, which can identify tokens that need to be attended as well as those that can skip a transformer layer. More specifically, a token selector parses the current attention maps and predicts the attention maps for the next layer, which are then used to select the important tokens that should participate in the attention operati…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: Accepted at CVPRW 2024

  5. arXiv:2406.08798  [pdf, other]

    cs.CV

    FouRA: Fourier Low Rank Adaptation

    Authors: Shubhankar Borse, Shreya Kadambi, Nilesh Prasad Pandey, Kartikeya Bhardwaj, Viswanath Ganapathy, Sweta Priyadarshi, Risheek Garrepalli, Rafael Esteves, Munawar Hayat, Fatih Porikli

    Abstract: While Low-Rank Adaptation (LoRA) has proven beneficial for efficiently fine-tuning large models, LoRA fine-tuned text-to-image diffusion models lack diversity in the generated images, as the model tends to copy data from the observed training samples. This effect becomes more pronounced at higher values of adapter strength and for adapters with higher ranks which are fine-tuned on smaller datasets…

    Submitted 13 June, 2024; originally announced June 2024.

  6. arXiv:2404.09918  [pdf, other]

    cs.CV

    EdgeRelight360: Text-Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait Relighting

    Authors: Min-Hui Lin, Mahesh Reddy, Guillaume Berger, Michel Sarkis, Fatih Porikli, Ning Bi

    Abstract: In this paper, we present EdgeRelight360, an approach for real-time video portrait relighting on mobile devices, utilizing text-conditioned generation of 360-degree high dynamic range image (HDRI) maps. Our method proposes a diffusion-based text-to-360-degree image generation in the HDR domain, taking advantage of the HDR10 standard. This technique facilitates the generation of high-quality, reali…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Camera-ready version (CVPR workshop - EDGE'24)

  7. arXiv:2404.08135  [pdf, other]

    cs.CV

    SciFlow: Empowering Lightweight Optical Flow Models with Self-Cleaning Iterations

    Authors: Jamie Menjay Lin, Jisoo Jeong, Hong Cai, Risheek Garrepalli, Kai Wang, Fatih Porikli

    Abstract: Optical flow estimation is crucial to a variety of vision tasks. Despite substantial recent advancements, achieving real-time on-device optical flow estimation remains a complex challenge. First, an optical flow model must be sufficiently lightweight to meet computation and memory constraints to ensure real-time performance on devices. Second, the necessity for real-time on-device operation impose…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: CVPRW 2024

  8. arXiv:2403.18092  [pdf, other]

    cs.CV

    OCAI: Improving Optical Flow Estimation by Occlusion and Consistency Aware Interpolation

    Authors: Jisoo Jeong, Hong Cai, Risheek Garrepalli, Jamie Menjay Lin, Munawar Hayat, Fatih Porikli

    Abstract: The scarcity of ground-truth labels poses one major challenge in developing optical flow estimation models that are both generalizable and robust. While current methods rely on data augmentation, they have yet to fully exploit the rich information available in labeled video sequences. We propose OCAI, a method that supports robust frame interpolation by generating intermediate video frames alongsi…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: CVPR 2024

  9. arXiv:2403.12953  [pdf, other]

    cs.CV

    FutureDepth: Learning to Predict the Future Improves Video Depth Estimation

    Authors: Rajeev Yasarla, Manish Kumar Singh, Hong Cai, Yunxiao Shi, Jisoo Jeong, Yinhao Zhu, Shizhong Han, Risheek Garrepalli, Fatih Porikli

    Abstract: In this paper, we propose a novel video depth estimation approach, FutureDepth, which enables the model to implicitly leverage multi-frame and motion cues to improve depth estimation by making it learn to predict the future at training. More specifically, we propose a future prediction network, F-Net, which takes the features of multiple consecutive frames and is trained to predict multi-frame fea…

    Submitted 19 March, 2024; originally announced March 2024.

  10. arXiv:2403.12202  [pdf, other]

    cs.CV

    DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions

    Authors: Yunxiao Shi, Manish Kumar Singh, Hong Cai, Fatih Porikli

    Abstract: In this paper, we introduce a novel approach that harnesses both 2D and 3D attentions to enable highly accurate depth completion without requiring iterative spatial propagations. Specifically, we first enhance a baseline convolutional depth completion model by applying attention to 2D features in the bottleneck and skip connections. This effectively improves the performance of this simple network…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted at CVPR 2024

  11. arXiv:2403.09620  [pdf, other]

    cs.CV

    PosSAM: Panoptic Open-vocabulary Segment Anything

    Authors: Vibashan VS, Shubhankar Borse, Hyojin Park, Debasmit Das, Vishal Patel, Munawar Hayat, Fatih Porikli

    Abstract: In this paper, we introduce an open-vocabulary panoptic segmentation model that effectively unifies the strengths of the Segment Anything Model (SAM) with the vision-language CLIP model in an end-to-end framework. While SAM excels in generating spatially-aware masks, its decoder falls short in recognizing object class information and tends to oversegment without additional guidance. Existing appr…

    Submitted 14 March, 2024; originally announced March 2024.

  12. arXiv:2402.16739  [pdf, other]

    cs.CV

    Neural Mesh Fusion: Unsupervised 3D Planar Surface Understanding

    Authors: Farhad G. Zanjani, Hong Cai, Yinhao Zhu, Leyla Mirvakhabova, Fatih Porikli

    Abstract: This paper presents Neural Mesh Fusion (NMF), an efficient approach for joint optimization of polygon mesh from multi-view image observations and unsupervised 3D planar-surface parsing of the scene. In contrast to implicit neural representations, NMF directly learns to deform surface triangle mesh and generate an embedding for unsupervised 3D planar segmentation through gradient-based optimization…

    Submitted 26 February, 2024; originally announced February 2024.

  13. arXiv:2402.09948  [pdf, other]

    eess.SP cs.LG

    Neural 5G Indoor Localization with IMU Supervision

    Authors: Aleksandr Ermolov, Shreya Kadambi, Maximilian Arnold, Mohammed Hirzallah, Roohollah Amiri, Deepak Singh Mahendar Singh, Srinivas Yerramalli, Daniel Dijkman, Fatih Porikli, Taesang Yoo, Bence Major

    Abstract: Radio signals are well suited for user localization because they are ubiquitous, can operate in the dark, and maintain privacy. Many prior works learn mappings between channel state information (CSI) and position in a fully supervised manner. However, that approach relies on position labels, which are very expensive to acquire. In this work, this requirement is relaxed by using pseudo-labels during deployment,…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: IEEE GLOBECOM 2023

  14. arXiv:2401.07727  [pdf, other]

    cs.CV

    HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation

    Authors: Antoine Mercier, Ramin Nakhli, Mahesh Reddy, Rajeev Yasarla, Hong Cai, Fatih Porikli, Guillaume Berger

    Abstract: Despite the latest remarkable advances in generative modeling, efficient generation of high-quality 3D assets from textual prompts remains a difficult task. A key challenge lies in data scarcity: the most extensive 3D datasets encompass merely millions of assets, while their 2D counterparts contain billions of text-image pairs. To address this, we propose a novel approach which harnesses the power…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: 9 pages, 8 figures, 2 tables

  15. arXiv:2401.05735  [pdf, other]

    cs.CV cs.LG

    Object-Centric Diffusion for Efficient Video Editing

    Authors: Kumara Kahatapitiya, Adil Karjauv, Davide Abati, Fatih Porikli, Yuki M. Asano, Amirhossein Habibian

    Abstract: Diffusion-based video editing methods have reached impressive quality and can transform the global style, local structure, and attributes of given video inputs, following textual edit prompts. However, such solutions typically incur heavy memory and computational costs to generate temporally-coherent frames, either in the form of diffusion inversion and/or cross-frame attention. In this paper, we c…

    Submitted 30 August, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

    Comments: ECCV24

  16. arXiv:2312.08128  [pdf, other]

    cs.CV

    Clockwork Diffusion: Efficient Generation With Model-Step Distillation

    Authors: Amirhossein Habibian, Amir Ghodrati, Noor Fathima, Guillaume Sautiere, Risheek Garrepalli, Fatih Porikli, Jens Petersen

    Abstract: This work aims to improve the efficiency of text-to-image diffusion models. While diffusion models use computationally expensive UNet-based denoising operations in every generation step, we identify that not all operations are equally relevant for the final output quality. In particular, we observe that UNet layers operating on high-res feature maps are relatively sensitive to small perturbations.…

    Submitted 20 February, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

  17. arXiv:2308.01483  [pdf, other]

    cs.CV cs.GR cs.LG

    Efficient neural supersampling on a novel gaming dataset

    Authors: Antoine Mercier, Ruan Erasmus, Yashesh Savani, Manik Dhingra, Fatih Porikli, Guillaume Berger

    Abstract: Real-time rendering for video games has become increasingly challenging due to the need for higher resolutions, framerates and photorealism. Supersampling has emerged as an effective solution to address this challenge. Our work introduces a novel neural algorithm for supersampling rendered content that is 4 times more efficient than existing methods while maintaining the same level of accuracy. Ad…

    Submitted 2 August, 2023; originally announced August 2023.

    Comments: ICCV'23

  18. arXiv:2307.14336  [pdf, other]

    cs.CV

    MAMo: Leveraging Memory and Attention for Monocular Video Depth Estimation

    Authors: Rajeev Yasarla, Hong Cai, Jisoo Jeong, Yunxiao Shi, Risheek Garrepalli, Fatih Porikli

    Abstract: We propose MAMo, a novel memory and attention framework for monocular video depth estimation. MAMo can augment and improve any single-image depth estimation network into a video depth estimation model, enabling it to take advantage of the temporal information to predict more accurate depth. In MAMo, we augment the model with memory which aids the depth prediction as the model streams through the vi…

    Submitted 12 September, 2023; v1 submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted at ICCV 2023

  19. arXiv:2306.05691  [pdf, other]

    cs.CV

    DIFT: Dynamic Iterative Field Transforms for Memory Efficient Optical Flow

    Authors: Risheek Garrepalli, Jisoo Jeong, Rajeswaran C Ravindran, Jamie Menjay Lin, Fatih Porikli

    Abstract: Recent advancements in neural network-based optical flow estimation often come with prohibitively high computational and memory requirements, presenting challenges in their model adaptation for mobile and low-power use cases. In this paper, we introduce a lightweight, low-latency, and memory-efficient model, Dynamic Iterative Field Transforms (DIFT), for optical flow estimation feasible for edge app…

    Submitted 9 June, 2023; originally announced June 2023.

    Comments: CVPR MAI 2023 Accepted Paper

  20. arXiv:2306.03810  [pdf, other]

    cs.CV cs.RO

    X-Align++: cross-modal cross-view alignment for Bird's-eye-view segmentation

    Authors: Shubhankar Borse, Senthil Yogamani, Marvin Klingner, Varun Ravi, Hong Cai, Abdulaziz Almuzairee, Fatih Porikli

    Abstract: Bird's-eye-view (BEV) grid is a typical representation of the perception of road components, e.g., drivable area, in autonomous driving. Most existing approaches rely on cameras only to perform segmentation in BEV space, which is fundamentally constrained by the absence of reliable depth information. The latest works leverage both camera and LiDAR modalities but suboptimally fuse their features us…

    Submitted 6 June, 2023; originally announced June 2023.

    Comments: Accepted for publication at Springer Machine Vision and Applications Journal. The Version of Record of this article is published in Machine Vision and Applications Journal, and is available online at https://doi.org/10.1007/s00138-023-01400-7. arXiv admin note: substantial text overlap with arXiv:2210.06778

  21. arXiv:2305.10764  [pdf, other]

    cs.CV

    OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding

    Authors: Minghua Liu, Ruoxi Shi, Kaiming Kuang, Yinhao Zhu, Xuanlin Li, Shizhong Han, Hong Cai, Fatih Porikli, Hao Su

    Abstract: We introduce OpenShape, a method for learning multi-modal joint representations of text, image, and point clouds. We adopt the commonly used multi-modal contrastive learning framework for representation alignment, but with a specific focus on scaling up 3D representations to enable open-world 3D shape understanding. To achieve this, we scale up training data by ensembling multiple 3D datasets and…

    Submitted 16 June, 2023; v1 submitted 18 May, 2023; originally announced May 2023.

    Comments: Project Website: https://colin97.github.io/OpenShape/

  22. arXiv:2304.11431  [pdf, other]

    cs.CV

    A Review of Deep Learning for Video Captioning

    Authors: Moloud Abdar, Meenakshi Kollati, Swaraja Kuraparthi, Farhad Pourpanah, Daniel McDuff, Mohammad Ghavamzadeh, Shuicheng Yan, Abduallah Mohamed, Abbas Khosravi, Erik Cambria, Fatih Porikli

    Abstract: Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work in the fields of computer vision, natural language processing (NLP), linguistics, and human-computer interaction. In essence, VC involves understanding a video and describing it with language. Captioning is used in a host of applications from creating more accessible interfaces (e.g., low-vision navigatio…

    Submitted 22 April, 2023; originally announced April 2023.

    Comments: 42 pages, 10 figures

  23. arXiv:2304.05669  [pdf, other]

    cs.CV cs.GR

    Factorized Inverse Path Tracing for Efficient and Accurate Material-Lighting Estimation

    Authors: Liwen Wu, Rui Zhu, Mustafa B. Yaldiz, Yinhao Zhu, Hong Cai, Janarbek Matai, Fatih Porikli, Tzu-Mao Li, Manmohan Chandraker, Ravi Ramamoorthi

    Abstract: Inverse path tracing has recently been applied to joint material and lighting estimation, given geometry and multi-view HDR observations of an indoor scene. However, it has two major limitations: path tracing is expensive to compute, and ambiguities exist between reflection and emission. Our Factorized Inverse Path Tracing (FIPT) addresses these challenges by using a factored light transport formu…

    Submitted 23 August, 2023; v1 submitted 12 April, 2023; originally announced April 2023.

    Comments: Updated experiment results; modified real-world sections

  24. arXiv:2304.03369  [pdf, other]

    cs.CV

    EGA-Depth: Efficient Guided Attention for Self-Supervised Multi-Camera Depth Estimation

    Authors: Yunxiao Shi, Hong Cai, Amin Ansari, Fatih Porikli

    Abstract: The ubiquitous multi-camera setup on modern autonomous vehicles provides an opportunity to construct surround-view depth. Existing methods, however, either perform independent monocular depth estimations on each camera or rely on computationally heavy self-attention mechanisms. In this paper, we propose a novel guided attention architecture, EGA-Depth, which can improve both the efficiency and acc…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: CVPR 2023 Workshop on Autonomous Driving

  25. arXiv:2303.15651  [pdf, other]

    cs.CV

    4D Panoptic Segmentation as Invariant and Equivariant Field Prediction

    Authors: Minghan Zhu, Shizhong Han, Hong Cai, Shubhankar Borse, Maani Ghaffari, Fatih Porikli

    Abstract: In this paper, we develop rotation-equivariant neural networks for 4D panoptic segmentation. 4D panoptic segmentation is a benchmark task for autonomous driving that requires recognizing semantic classes and object instances on the road based on LiDAR scans, as well as assigning temporally consistent IDs to instances across time. We observe that the driving scenario is symmetric to rotations on th…

    Submitted 12 September, 2023; v1 submitted 27 March, 2023; originally announced March 2023.

    Comments: 13 pages. Accepted at ICCV 2023

  26. arXiv:2303.14078  [pdf, other]

    cs.CV

    DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling

    Authors: Jisoo Jeong, Hong Cai, Risheek Garrepalli, Fatih Porikli

    Abstract: We propose a novel data augmentation approach, DistractFlow, for training optical flow estimation models by introducing realistic distractions to the input frames. Based on a mixing ratio, we combine one of the frames in the pair with a distractor image depicting a similar domain, which allows for inducing visual perturbations congruent with natural objects and scenes. We refer to such pairs as di…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  27. arXiv:2303.04336  [pdf, other]

    eess.IV cs.CV cs.LG

    QuickSRNet: Plain Single-Image Super-Resolution Architecture for Faster Inference on Mobile Platforms

    Authors: Guillaume Berger, Manik Dhingra, Antoine Mercier, Yashesh Savani, Sunny Panchal, Fatih Porikli

    Abstract: In this work, we present QuickSRNet, an efficient super-resolution architecture for real-time applications on mobile platforms. Super-resolution clarifies, sharpens, and upscales an image to higher resolution. Applications such as gaming and video playback along with the ever-improving display capabilities of TVs, smartphones, and VR headsets are driving the need for efficient upscaling solutions.…

    Submitted 14 May, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Camera-ready version (CVPR workshop - MAI'23)

  28. arXiv:2303.02203  [pdf, other]

    cs.CV cs.RO

    X$^3$KD: Knowledge Distillation Across Modalities, Tasks and Stages for Multi-Camera 3D Object Detection

    Authors: Marvin Klingner, Shubhankar Borse, Varun Ravi Kumar, Behnaz Rezaei, Venkatraman Narayanan, Senthil Yogamani, Fatih Porikli

    Abstract: Recent advances in 3D object detection (3DOD) have obtained remarkably strong results for LiDAR-based models. In contrast, surround-view 3DOD models based on multiple camera images underperform due to the necessary view transformation of features from perspective view (PV) to a 3D world representation which is ambiguous due to missing depth information. This paper introduces X$^3$KD, a comprehensi…

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  29. arXiv:2303.01573  [pdf, other]

    cs.CV

    DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction

    Authors: Shubhankar Borse, Debasmit Das, Hyojin Park, Hong Cai, Risheek Garrepalli, Fatih Porikli

    Abstract: We present DejaVu, a novel framework which leverages conditional image regeneration as additional supervision during training to improve deep networks for dense prediction tasks such as segmentation, depth estimation, and surface normal prediction. First, we apply redaction to the input image, which removes certain structural information by sparse sampling or selective frequency removal. Next, we…

    Submitted 29 March, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  30. arXiv:2302.14611  [pdf, other]

    cs.CV

    TransAdapt: A Transformative Framework for Online Test Time Adaptive Semantic Segmentation

    Authors: Debasmit Das, Shubhankar Borse, Hyojin Park, Kambiz Azarian, Hong Cai, Risheek Garrepalli, Fatih Porikli

    Abstract: Test-time adaptive (TTA) semantic segmentation adapts a source pre-trained image semantic segmentation model to unlabeled batches of target domain test images, which differs from real-world settings, where samples arrive one-by-one in an online fashion. To tackle online settings, we propose TransAdapt, a framework that uses transformer and input transformations to improve segmentation performance. Specifically…

    Submitted 23 February, 2023; originally announced February 2023.

    Comments: ICASSP 2023

  31. Adaptive Siamese Tracking with a Compact Latent Network

    Authors: Xingping Dong, Jianbing Shen, Fatih Porikli, Jiebo Luo, Ling Shao

    Abstract: In this paper, we provide an intuitive view that simplifies Siamese-based trackers by converting the tracking task to a classification task. Under this view, we perform an in-depth analysis of them through visual simulations and real tracking examples, and find that the failure cases in some challenging situations can be regarded as the issue of missing decisive samples in offline training. Sinc…

    Submitted 14 June, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Comments: Accepted at TPAMI

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022

  32. arXiv:2301.02240  [pdf, other]

    cs.CV

    Skip-Attention: Improving Vision Transformers by Paying Less Attention

    Authors: Shashanka Venkataramanan, Amir Ghodrati, Yuki M. Asano, Fatih Porikli, Amirhossein Habibian

    Abstract: This work aims to improve the efficiency of vision transformers (ViT). While ViTs use computationally expensive self-attention operations in every layer, we identify that these operations are highly correlated across layers -- a key redundancy that causes unnecessary computations. Based on this observation, we propose SkipAt, a method to reuse self-attention computation from preceding layers to ap…

    Submitted 17 January, 2023; v1 submitted 5 January, 2023; originally announced January 2023.

  33. arXiv:2212.14875  [pdf, other]

    cs.LG cs.AI cs.CR cs.CV

    Guidance Through Surrogate: Towards a Generic Diagnostic Attack

    Authors: Muzammal Naseer, Salman Khan, Fatih Porikli, Fahad Shahbaz Khan

    Abstract: Adversarial training is an effective approach to make deep neural networks robust against adversarial attacks. Recently, different adversarial training defenses have been proposed that not only maintain a high clean accuracy but also show significant robustness against popular and well-studied adversarial attacks such as PGD. High adversarial robustness can also arise if an attack fails to find adversar…

    Submitted 30 December, 2022; originally announced December 2022.

    Comments: IEEE Transactions on Neural Networks and Learning Systems (TNNLS)

  34. arXiv:2212.06242  [pdf, other]

    cs.CV cs.LG

    Test-time Adaptation vs. Training-time Generalization: A Case Study in Human Instance Segmentation using Keypoints Estimation

    Authors: Kambiz Azarian, Debasmit Das, Hyojin Park, Fatih Porikli

    Abstract: We consider the problem of improving the human instance segmentation mask quality for a given test image using keypoints estimation. We compare two alternative approaches. The first approach is a test-time adaptation (TTA) method, where we allow test-time modification of the segmentation network's weights using a single unlabeled test image. In this approach, we do not assume test-time access to t…

    Submitted 12 December, 2022; originally announced December 2022.

  35. arXiv:2212.01558  [pdf, other]

    cs.CV cs.RO

    PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models

    Authors: Minghua Liu, Yinhao Zhu, Hong Cai, Shizhong Han, Zhan Ling, Fatih Porikli, Hao Su

    Abstract: Generalizable 3D part segmentation is important but challenging in vision and robotics. Training deep models via conventional supervised methods requires large-scale 3D datasets with fine-grained part annotations, which are costly to collect. This paper explores an alternative way for low-shot part segmentation of 3D point clouds by leveraging a pretrained image-language model, GLIP, which achieve…

    Submitted 19 June, 2023; v1 submitted 3 December, 2022; originally announced December 2022.

    Comments: CVPR 2023, project page: https://colin97.github.io/PartSLIP_page/

  36. arXiv:2210.07199  [pdf, other]

    cs.CV cs.LG cs.RO

    Self-Supervised Geometric Correspondence for Category-Level 6D Object Pose Estimation in the Wild

    Authors: Kaifeng Zhang, Yang Fu, Shubhankar Borse, Hong Cai, Fatih Porikli, Xiaolong Wang

    Abstract: While 6D object pose estimation has wide applications across computer vision and robotics, it remains far from being solved due to the lack of annotations. The problem becomes even more challenging when moving to category-level 6D pose, which requires generalization to unseen instances. Current approaches are restricted by leveraging annotations from simulation or collected from humans. In this pa…

    Submitted 3 April, 2023; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Project page: https://kywind.github.io/self-pose

  37. arXiv:2210.06778  [pdf, other]

    cs.CV

    X-Align: Cross-Modal Cross-View Alignment for Bird's-Eye-View Segmentation

    Authors: Shubhankar Borse, Marvin Klingner, Varun Ravi Kumar, Hong Cai, Abdulaziz Almuzairee, Senthil Yogamani, Fatih Porikli

    Abstract: Bird's-eye-view (BEV) grid is a common representation for the perception of road components, e.g., drivable area, in autonomous driving. Most existing approaches rely on cameras only to perform segmentation in BEV space, which is fundamentally constrained by the absence of reliable depth information. Latest works leverage both camera and LiDAR modalities, but sub-optimally fuse their features usin…

    Submitted 31 October, 2022; v1 submitted 13 October, 2022; originally announced October 2022.

    Comments: Accepted to WACV 2023

  38. arXiv:2207.12272  [pdf, other]

    cs.CV

    Online Adaptive Personalization for Face Anti-spoofing

    Authors: Davide Belli, Debasmit Das, Bence Major, Fatih Porikli

    Abstract: Face authentication systems require a robust anti-spoofing module as they can be deceived by fabricating spoof images of authorized users. Most recent face anti-spoofing methods rely on optimized architectures and training objectives to alleviate the distribution shift between train and test users. However, in real online scenarios, past data from a user contains valuable information that could be…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: IEEE International Conference on Image Processing (ICIP) 2022

  39. arXiv:2206.08655  [pdf, other]

    cs.CV

    Learning Implicit Feature Alignment Function for Semantic Segmentation

    Authors: Hanzhe Hu, Yinbo Chen, Jiarui Xu, Shubhankar Borse, Hong Cai, Fatih Porikli, Xiaolong Wang

    Abstract: Integrating high-level context information with low-level details is of central importance in semantic segmentation. Towards this end, most existing segmentation models apply bilinear up-sampling and convolutions to feature maps of different scales, and then align them at the same resolution. However, bilinear up-sampling blurs the precise information learned in these feature maps and convolutions…

    Submitted 17 June, 2022; originally announced June 2022.

  40. arXiv:2206.08423  [pdf, other]

    cs.CV

    IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes

    Authors: Rui Zhu, Zhengqin Li, Janarbek Matai, Fatih Porikli, Manmohan Chandraker

    Abstract: Indoor scenes exhibit significant appearance variations due to myriad interactions between arbitrarily diverse object shapes, spatially-changing materials, and complex lighting. Shadows, highlights, and inter-reflections caused by visible and invisible light sources require reasoning about long-range interactions for inverse rendering, which seeks to recover the components of image formation, name…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: CVPR 22 camera ready version with supplementary

  41. arXiv:2206.08236  [pdf, other]

    cs.CV cs.LG eess.IV

    Simple and Efficient Architectures for Semantic Segmentation

    Authors: Dushyant Mehta, Andrii Skliar, Haitam Ben Yahia, Shubhankar Borse, Fatih Porikli, Amirhossein Habibian, Tijmen Blankevoort

    Abstract: Though the state-of-the-art architectures for semantic segmentation, such as HRNet, demonstrate impressive accuracy, the complexity arising from their salient design choices hinders a range of model acceleration tools, and further they make use of operations that are inefficient on current hardware. This paper demonstrates that a simple encoder-decoder architecture with a ResNet-like backbone and a sm…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: To be presented at Efficient Deep Learning for Computer Vision Workshop at CVPR 2022

  42. arXiv:2204.07262  [pdf, other]

    cs.CV

    Imposing Consistency for Optical Flow Estimation

    Authors: Jisoo Jeong, Jamie Menjay Lin, Fatih Porikli, Nojun Kwak

    Abstract: Imposing consistency through proxy tasks has been shown to enhance data-driven learning and enable self-supervision in various tasks. This paper introduces novel and effective consistency strategies for optical flow estimation, a problem where labels from real-world data are very challenging to derive. More specifically, we propose occlusion consistency and zero forcing in the forms of self-superv…

    Submitted 24 May, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

    Comments: CVPR 2022

  43. arXiv:2204.05370  [pdf, other]

    cs.CV

    Panoptic, Instance and Semantic Relations: A Relational Context Encoder to Enhance Panoptic Segmentation

    Authors: Shubhankar Borse, Hyojin Park, Hong Cai, Debasmit Das, Risheek Garrepalli, Fatih Porikli

    Abstract: This paper presents a novel framework to integrate both semantic and instance contexts for panoptic segmentation. In existing works, it is common to use a shared backbone to extract features for both things (countable classes such as vehicles) and stuff (uncountable classes such as roads). This, however, fails to capture the rich relations among them, which can be utilized to enhance visual unders…

    Submitted 11 April, 2022; originally announced April 2022.

    Comments: Accepted to CVPR 2022

  44. arXiv:2204.02397  [pdf, other]

    cs.CV

    SALISA: Saliency-based Input Sampling for Efficient Video Object Detection

    Authors: Babak Ehteshami Bejnordi, Amirhossein Habibian, Fatih Porikli, Amir Ghodrati

    Abstract: High-resolution images are widely adopted for high-performance object detection in videos. However, processing high-resolution inputs comes with high computation costs, and naive down-sampling of the input to reduce the computation costs quickly degrades the detection performance. In this paper, we propose SALISA, a novel non-uniform SALiency-based Input SAmpling technique for video object detecti…

    Submitted 5 April, 2022; originally announced April 2022.

    Comments: 20 pages, 7 figures

  45. arXiv:2203.09594  [pdf, other]

    cs.CV cs.LG

    Delta Distillation for Efficient Video Processing

    Authors: Amirhossein Habibian, Haitam Ben Yahia, Davide Abati, Efstratios Gavves, Fatih Porikli

    Abstract: This paper aims to accelerate video stream processing, such as object detection and semantic segmentation, by leveraging the temporal redundancies that exist between video frames. Instead of propagating and warping features using motion alignment, such as optical flow, we propose a novel knowledge distillation schema coined as Delta Distillation. In our proposal, the student learns the variations…

    Submitted 17 March, 2022; originally announced March 2022.

  46. arXiv:2202.04861  [pdf, other]

    cs.CV

    Consistency and Diversity induced Human Motion Segmentation

    Authors: Tao Zhou, Huazhu Fu, Chen Gong, Ling Shao, Fatih Porikli, Haibin Ling, Jianbing Shen

    Abstract: Subspace clustering is a classical technique that has been widely used for human motion segmentation and other related tasks. However, existing segmentation methods often cluster data without guidance from prior knowledge, resulting in unsatisfactory segmentation results. To this end, we propose a novel Consistency and Diversity induced human Motion Segmentation (CDMS) algorithm. Specifically, our…

    Submitted 10 February, 2022; originally announced February 2022.

    Comments: This paper has been accepted by IEEE TPAMI

  47. arXiv:2111.12265  [pdf, other]

    cs.CV

    Distribution Estimation to Automate Transformation Policies for Self-Supervision

    Authors: Seunghan Yang, Debasmit Das, Simyung Chang, Sungrack Yun, Fatih Porikli

    Abstract: In recent visual self-supervision works, an imitated classification objective, called pretext task, is established by assigning labels to transformed or augmented input images. The goal of pretext can be predicting what transformations are applied to the image. However, it is observed that image transformations already present in the dataset might be less effective in learning such self-supervised…

    Submitted 23 November, 2021; originally announced November 2021.

    Comments: NeurIPS 2021 Workshop: Self-Supervised Learning - Theory and Practice

  48. arXiv:2111.06500  [pdf, other]

    cs.CV

    Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation

    Authors: John Yang, Yash Bhalgat, Simyung Chang, Fatih Porikli, Nojun Kwak

    Abstract: While hand pose estimation is a critical component of most interactive extended reality and gesture recognition systems, contemporary approaches are not optimized for computational and memory efficiency. In this paper, we propose a tiny deep neural network of which partial layers are recursively exploited for refining its previous estimations. During its iterative refinements, we employ learned ga…

    Submitted 11 November, 2021; originally announced November 2021.

  49. arXiv:2111.02333  [pdf, other]

    cs.CV

    HS3: Learning with Proper Task Complexity in Hierarchically Supervised Semantic Segmentation

    Authors: Shubhankar Borse, Hong Cai, Yizhe Zhang, Fatih Porikli

    Abstract: While deeply supervised networks are common in recent literature, they typically impose the same learning objective on all transitional layers despite their varying representation powers. In this paper, we propose Hierarchically Supervised Semantic Segmentation (HS3), a training scheme that supervises intermediate layers in a segmentation network to learn meaningful representations by varying ta…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: Accepted to BMVC 2021

  50. arXiv:2110.12516  [pdf, other]

    cs.CV

    X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation

    Authors: Hong Cai, Janarbek Matai, Shubhankar Borse, Yizhe Zhang, Amin Ansari, Fatih Porikli

    Abstract: In this paper, we propose a novel method, X-Distill, to improve the self-supervised training of monocular depth via cross-task knowledge distillation from semantic segmentation to depth estimation. More specifically, during training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network. In order to enable such knowledge distillation…

    Submitted 24 October, 2021; originally announced October 2021.

    Comments: Accepted to BMVC 2021