
Showing 1–50 of 1,998 results for author: Lee, S

Searching in archive cs.
  1. arXiv:2408.10240  [pdf, other]

    cs.HC cs.AI cs.CV

    AltCanvas: A Tile-Based Image Editor with Generative AI for Blind or Visually Impaired People

    Authors: Seonghee Lee, Maho Kohga, Steve Landau, Sile O'Modhrain, Hari Subramonyam

    Abstract: People with visual impairments often struggle to create content that relies heavily on visual elements, particularly when conveying spatial and structural information. Existing accessible drawing tools, which construct images line by line, are suitable for simple tasks like math but not for more expressive artwork. On the other hand, emerging generative AI-based text-to-image tools can produce exp…

    Submitted 4 August, 2024; originally announced August 2024.

  2. arXiv:2408.09734  [pdf, other]

    cs.CV cs.AI

    Mutually-Aware Feature Learning for Few-Shot Object Counting

    Authors: Yerim Jeon, Subeen Lee, Jihwan Kim, Jae-Pil Heo

    Abstract: Few-shot object counting has garnered significant attention for its practicality as it aims to count target objects in a query image based on given exemplars without the need for additional training. However, there is a shortcoming in the prevailing extract-and-match approach: query and exemplar features lack interaction during feature extraction since they are extracted unaware of each other and…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Submitted to Pattern Recognition

  3. arXiv:2408.09657  [pdf, other]

    cs.SE

    Impact of Large Language Models of Code on Fault Localization

    Authors: Suhwan Ji, Sanghwa Lee, Changsup Lee, Hyeonseung Im, Yo-Sub Han

    Abstract: Identifying the point of error is imperative in software debugging. Traditional fault localization (FL) techniques rely on executing the program and using the code coverage matrix in tandem with test case results to calculate a suspiciousness score for each function or line. Recently, learning-based FL techniques have harnessed machine learning models to extract meaningful features from the code c…

    Submitted 18 August, 2024; originally announced August 2024.

  4. arXiv:2408.08461  [pdf, other]

    cs.CV

    TEXTOC: Text-driven Object-Centric Style Transfer

    Authors: Jihun Park, Jongmin Gim, Kyoungmin Lee, Seunghun Lee, Sunghoon Im

    Abstract: We present Text-driven Object-Centric Style Transfer (TEXTOC), a novel method that guides style transfer at an object-centric level using textual inputs. The core of TEXTOC is our Patch-wise Co-Directional (PCD) loss, meticulously designed for precise object-centric transformations that are closely aligned with the input text. This loss combines a patch directional loss for text-guided style direc…

    Submitted 15 August, 2024; originally announced August 2024.

    ACM Class: I.4; I.4.9

  5. arXiv:2408.08446  [pdf, other]

    cs.LG

    Lifelong Reinforcement Learning via Neuromodulation

    Authors: Sebastian Lee, Samuel Liebana Garcia, Claudia Clopath, Will Dabney

    Abstract: Navigating multiple tasks, for instance in succession as in continual or lifelong learning, or in distributions as in meta or multi-task learning, requires some notion of adaptation. Evolution over timescales of millennia has imbued humans and other animals with highly effective adaptive learning and decision-making strategies. Central to these functions are so-called…

    Submitted 15 August, 2024; originally announced August 2024.

  6. DIVE: Towards Descriptive and Diverse Visual Commonsense Generation

    Authors: Jun-Hyung Park, Hyuntae Park, Youjin Kang, Eojin Jeon, SangKeun Lee

    Abstract: Towards human-level visual understanding, visual commonsense generation has been introduced to generate commonsense inferences beyond images. However, current research on visual commonsense generation has overlooked an important human cognitive ability: generating descriptive and diverse inferences. In this work, we propose a novel visual commonsense generation framework, called DIVE, which aims t…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 19 pages, 10 figures, EMNLP 2023 (main)

  7. arXiv:2408.08019  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS eess.SP

    Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization

    Authors: Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee

    Abstract: This paper introduces PeriodWave-Turbo, a high-fidelity and highly efficient waveform generation model via adversarial flow matching optimization. Recently, conditional flow matching (CFM) generative models have been successfully adopted for waveform generation tasks, leveraging a single vector field estimation objective for training. Although these models can generate high-fidelity waveform signals…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 9 pages, 9 tables, 1 figure

  8. arXiv:2408.07790  [pdf, other]

    cs.CV

    Cropper: Vision-Language Model for Image Cropping through In-Context Learning

    Authors: Seung Hyun Lee, Junjie Ke, Yinxiao Li, Junfeng He, Steven Hickson, Katie Datsenko, Sangpil Kim, Ming-Hsuan Yang, Irfan Essa, Feng Yang

    Abstract: The goal of image cropping is to identify visually appealing crops within an image. Conventional methods rely on specialized architectures trained on specific datasets, which struggle to be adapted to new requirements. Recent breakthroughs in large vision-language models (VLMs) have enabled visual in-context learning without explicit training. However, effective strategies for vision downstream ta…

    Submitted 14 August, 2024; originally announced August 2024.

  9. arXiv:2408.07648  [pdf, other]

    cs.CV cs.CL

    See It All: Contextualized Late Aggregation for 3D Dense Captioning

    Authors: Minjung Kim, Hyung Suk Lim, Seung Hwan Kim, Soonyoung Lee, Bumsoo Kim, Gunhee Kim

    Abstract: 3D dense captioning is a task to localize objects in a 3D scene and generate descriptive sentences for each object. Recent approaches in 3D dense captioning have adopted transformer encoder-decoder frameworks from object detection to build an end-to-end pipeline without hand-crafted components. However, these approaches struggle with contradicting objectives where a single query attention has to s…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted to ACL 2024 Findings

  10. arXiv:2408.07547  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS eess.SP

    PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation

    Authors: Sang-Hoon Lee, Ha-Yeong Choi, Seong-Whan Lee

    Abstract: Recently, universal waveform generation tasks have been investigated conditioned on various out-of-distribution scenarios. Although GAN-based methods have shown their strength in fast waveform generation, they are vulnerable to train-inference mismatch scenarios such as two-stage text-to-speech. Meanwhile, diffusion-based models have shown their powerful generative performance in other domains; ho…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 24 pages, 16 tables, 4 figures

  11. arXiv:2408.06707  [pdf, other]

    cs.CV

    MAIR++: Improving Multi-view Attention Inverse Rendering with Implicit Lighting Representation

    Authors: JunYong Choi, SeokYeong Lee, Haesol Park, Seung-Won Jung, Ig-Jae Kim, Junghyun Cho

    Abstract: In this paper, we propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting. While multi-view images have been widely used for object-level inverse rendering, scene-level inverse rendering has primarily been studied using single-view images due to the lack of a dataset containing high dynamic range…

    Submitted 13 August, 2024; originally announced August 2024.

  12. arXiv:2408.06672  [pdf, other]

    cs.LG cs.AI

    Leveraging Priors via Diffusion Bridge for Time Series Generation

    Authors: Jinseong Park, Seungyun Lee, Woojin Jeong, Yujin Choi, Jaewook Lee

    Abstract: Time series generation is widely used in real-world applications such as simulation, data augmentation, and hypothesis test techniques. Recently, diffusion models have emerged as the de facto approach for time series generation, emphasizing diverse synthesis scenarios based on historical or correlated time series data streams. Since time series have unique characteristics, such as fixed time order…

    Submitted 13 August, 2024; originally announced August 2024.

  13. arXiv:2408.06662  [pdf, other]

    cs.CV

    Bi-directional Contextual Attention for 3D Dense Captioning

    Authors: Minjung Kim, Hyung Suk Lim, Soonyoung Lee, Bumsoo Kim, Gunhee Kim

    Abstract: 3D dense captioning is a task involving the localization of objects and the generation of descriptions for each object in a 3D scene. Recent approaches have attempted to incorporate contextual information by modeling relationships with object pairs or aggregating the nearest neighbor features of an object. However, the contextual information constructed in these scenarios is limited in two aspects…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024 (Oral)

  14. arXiv:2408.06201  [pdf, other]

    cs.HC cs.IR

    Investigating Characteristics of Media Recommendation Solicitation in r/ifyoulikeblank

    Authors: Md Momen Bhuiyan, Donghan Hu, Andrew Jelson, Tanushree Mitra, Sang Won Lee

    Abstract: Despite the existence of search-based recommender systems like Google, Netflix, and Spotify, online users sometimes turn to crowdsourced recommendations in places like the r/ifyoulikeblank subreddit. In this exploratory study, we probe why users go to r/ifyoulikeblank, how they look for recommendations, and how the subreddit users respond to recommendation requests. To answer, we collected samp…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: page 23

  15. arXiv:2408.05738  [pdf, other]

    cs.CL

    Language-Informed Beam Search Decoding for Multilingual Machine Translation

    Authors: Yilin Yang, Stefan Lee, Prasad Tadepalli

    Abstract: Beam search decoding is the de-facto method for decoding auto-regressive Neural Machine Translation (NMT) models, including multilingual NMT where the target language is specified as an input. However, decoding multilingual NMT models commonly produces "off-target" translations -- yielding translation outputs not in the intended language. In this paper, we first conduct an error analysis of off-…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: ACL 2024 Findings

  16. arXiv:2408.05453  [pdf, other]

    cs.RO

    TOSS: Real-time Tracking and Moving Object Segmentation for Static Scene Mapping

    Authors: Seoyeon Jang, Minho Oh, Byeongho Yu, I Made Aswin Nahrendra, Seungjae Lee, Hyungtae Lim, Hyun Myung

    Abstract: Safe navigation with simultaneous localization and mapping (SLAM) for autonomous robots is crucial in challenging environments. To achieve this goal, detecting moving objects in the surroundings and building a static map are essential. However, existing moving object segmentation methods have been developed separately for each field, making it challenging to perform real-time navigation and precis…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 13 pages, The 11th International Conference on Robot Intelligence Technology and Applications (RiTA 2023)

  17. arXiv:2408.04619  [pdf, other]

    cs.LG cs.AI cs.CL cs.HC

    Transformer Explainer: Interactive Learning of Text-Generative Models

    Authors: Aeree Cho, Grace C. Kim, Alexander Karpekov, Alec Helbling, Zijie J. Wang, Seongmin Lee, Benjamin Hoover, Duen Horng Chau

    Abstract: Transformers have revolutionized machine learning, yet their inner workings remain opaque to many. We present Transformer Explainer, an interactive visualization tool designed for non-experts to learn about Transformers through the GPT-2 model. Our tool helps users understand complex Transformer concepts by integrating a model overview and enabling smooth transitions across abstraction levels of m…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: To be presented at IEEE VIS 2024

  18. arXiv:2408.03663  [pdf, other]

    cs.CV

    Designing Extremely Memory-Efficient CNNs for On-device Vision Tasks

    Authors: Jaewook Lee, Yoel Park, Seulki Lee

    Abstract: In this paper, we introduce a memory-efficient CNN (convolutional neural network), which enables resource-constrained low-end embedded and IoT devices to perform on-device vision tasks, such as image classification and object detection, using extremely low memory, i.e., only 63 KB on ImageNet classification. Based on the bottleneck block of MobileNet, we propose three design principles that signif…

    Submitted 7 August, 2024; originally announced August 2024.

  19. arXiv:2408.03612  [pdf, other]

    cs.CV cs.LG

    JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling

    Authors: Seok Hwan Lee, Taein Son, Soo Won Seo, Jisong Kim, Jun Won Choi

    Abstract: Video action detection (VAD) is a formidable vision task that involves the localization and classification of actions within the spatial and temporal dimensions of a video clip. Among the myriad VAD architectures, two-stage VAD methods utilize a pre-trained person detector to extract the region of interest features, subsequently employing these features for action detection. However, the performan…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 31 pages, 10 figures

  20. Hierarchical Neural Constructive Solver for Real-world TSP Scenarios

    Authors: Yong Liang Goh, Zhiguang Cao, Yining Ma, Yanfei Dong, Mohammed Haroon Dupty, Wee Sun Lee

    Abstract: Existing neural constructive solvers for routing problems have predominantly employed transformer architectures, conceptualizing the route construction as a set-to-sequence learning task. However, their efficacy has primarily been demonstrated on entirely random problem instances that inadequately capture real-world scenarios. In this paper, we introduce realistic Traveling Salesman Problem (TSP)…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to KDD 2024

  21. arXiv:2408.03541  [pdf, ps, other]

    cs.CL cs.AI

    EXAONE 3.0 7.8B Instruction Tuned Language Model

    Authors: LG AI Research, Soyoung An, Kyunghoon Bae, Eunbi Choi, Stanley Jungkyu Choi, Yemuk Choi, Seokhee Hong, Yeonjung Hong, Junwon Hwang, Hyojin Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Yountae Jung, Euisoon Kim, Hyosang Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Youchul Kim, Edward Hwayoung Lee, Haeju Lee, et al. (14 additional authors not shown)

    Abstract: We introduce EXAONE 3.0 instruction-tuned language model, the first open model in the family of Large Language Models (LLMs) developed by LG AI Research. Among different model sizes, we publicly release the 7.8B instruction-tuned model to promote open research and innovations. Through extensive evaluations across a wide range of public and in-house benchmarks, EXAONE 3.0 demonstrates highly compet…

    Submitted 13 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  22. arXiv:2408.03204  [pdf, other]

    cs.SD eess.AS

    GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch

    Authors: Sungho Lee, Marco Martínez-Ramírez, Wei-Hsiang Liao, Stefan Uhlich, Giorgio Fabbro, Kyogu Lee, Yuki Mitsufuji

    Abstract: We present GRAFX, an open-source library designed for handling audio processing graphs in PyTorch. Along with various library functionalities, we describe technical details on the efficient parallel computation of input graphs, signals, and processor parameters in GPU. Then, we show its example use under a music mixing scenario, where parameters of every differentiable processor in a large graph a…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted to DAFx 2024 demo

  23. arXiv:2408.02954  [pdf, other]

    cs.CV

    WWW: Where, Which and Whatever Enhancing Interpretability in Multimodal Deepfake Detection

    Authors: Juho Jung, Sangyoun Lee, Jooeon Kang, Yunjin Na

    Abstract: All current benchmarks for multimodal deepfake detection manipulate entire frames using various generation techniques, resulting in oversaturated detection accuracies exceeding 94% at the video-level classification. However, these benchmarks struggle to detect dynamic deepfake attacks with challenging frame-by-frame alterations presented in real-world scenarios. To address this limitation, we intr…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 4 pages, 2 figures, 2 tables, Accepted as Oral Presentation at The Trustworthy AI Workshop @ IJCAI 2024

  24. arXiv:2408.02888  [pdf, other]

    cs.CV cs.AI

    VizECGNet: Visual ECG Image Network for Cardiovascular Diseases Classification with Multi-Modal Training and Knowledge Distillation

    Authors: Ju-Hyeon Nam, Seo-Hyung Park, Su Jung Kim, Sang-Chul Lee

    Abstract: An electrocardiogram (ECG) captures the heart's electrical signal to assess various heart conditions. In practice, ECG data is stored as either digitized signals or printed images. Despite the emergence of numerous deep learning models for digitized signals, many hospitals prefer image storage due to cost considerations. Recognizing the unavailability of raw ECG signals in many clinical settings,…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted in International Conference on Image Processing (ICIP) 2024

  25. arXiv:2408.01857  [pdf, other]

    math.NA cs.LG math.OC

    Using Linearized Optimal Transport to Predict the Evolution of Stochastic Particle Systems

    Authors: Nicholas Karris, Evangelos A. Nikitopoulos, Ioannis Kevrekidis, Seungjoon Lee, Alexander Cloninger

    Abstract: We develop an algorithm to approximate the time evolution of a probability measure without explicitly learning an operator that governs the evolution. A particular application of interest is discrete measures $μ_t^N$ that arise from particle systems. In many such situations, the individual particles move chaotically on short time scales, making it difficult to learn the dynamics of a governing ope…

    Submitted 3 August, 2024; originally announced August 2024.

  26. arXiv:2408.01426  [pdf, other]

    physics.chem-ph cond-mat.mtrl-sci cs.LG

    MolTRES: Improving Chemical Language Representation Learning for Molecular Property Prediction

    Authors: Jun-Hyung Park, Yeachan Kim, Mingyu Lee, Hyuntae Park, SangKeun Lee

    Abstract: Chemical representation learning has gained increasing interest due to the limited availability of supervised data in fields such as drug and materials design. This interest particularly extends to chemical language representation learning, which involves pre-training Transformers on SMILES sequences -- textual descriptors of molecules. Despite its success in molecular property prediction, current…

    Submitted 8 July, 2024; originally announced August 2024.

    Comments: 12 pages, 5 figures, submitted to EMNLP 2024 main track

    ACM Class: I.2.7

  27. arXiv:2408.01096  [pdf, other]

    cs.SD cs.AI eess.AS

    Six Dragons Fly Again: Reviving 15th-Century Korean Court Music with Transformers and Novel Encoding

    Authors: Danbinaerin Han, Mark Gotham, Dongmin Kim, Hannah Park, Sihun Lee, Dasaem Jeong

    Abstract: We introduce a project that revives a piece of 15th-century Korean court music, Chihwapyeong and Chwipunghyeong, composed upon the poem Songs of the Dragon Flying to Heaven. One of the earliest examples of Jeongganbo, a Korean musical notation system, the remaining version only consists of a rudimentary melody. Our research team, commissioned by the National Gugak (Korean Traditional Music) Center…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted at the 25th International Society for Music Information Retrieval Conference (ISMIR 2024)

  28. arXiv:2408.01084  [pdf, other]

    cs.CL

    Adaptive Contrastive Decoding in Retrieval-Augmented Generation for Handling Noisy Contexts

    Authors: Youna Kim, Hyuhng Joon Kim, Cheonbok Park, Choonghyun Park, Hyunsoo Cho, Junyeob Kim, Kang Min Yoo, Sang-goo Lee, Taeuk Kim

    Abstract: When using large language models (LLMs) in knowledge-intensive tasks, such as open-domain question answering, external context can bridge a gap between external knowledge and LLM's parametric knowledge. Recent research has been developed to amplify contextual knowledge over the parametric knowledge of LLM with contrastive decoding approaches. While these approaches could yield truthful responses w…

    Submitted 2 August, 2024; originally announced August 2024.

  29. arXiv:2408.00965  [pdf, other]

    cs.AI

    Integrating ESG and AI: A Comprehensive Responsible AI Assessment Framework

    Authors: Sung Une Lee, Harsha Perera, Yue Liu, Boming Xia, Qinghua Lu, Liming Zhu, Jessica Cairns, Moana Nottage

    Abstract: Artificial Intelligence (AI) is a widely developed and adopted technology across entire industry sectors. Integrating environmental, social, and governance (ESG) considerations with AI investments is crucial for ensuring ethical and sustainable technological advancement. Particularly from an investor perspective, this integration not only mitigates risks but also enhances long-term value creation…

    Submitted 5 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: 23 pages, 8 tables, 10 figures

  30. arXiv:2408.00380  [pdf, other]

    cs.LG cs.AI cs.CV

    Enhancing Whole Slide Pathology Foundation Models through Stain Normalization

    Authors: Juseung Yun, Yi Hu, Jinhyung Kim, Jongseong Jang, Soonyoung Lee

    Abstract: Recent advancements in digital pathology have led to the development of numerous foundational models that utilize self-supervised learning on patches extracted from gigapixel whole slide images (WSIs). While this approach leverages vast amounts of unlabeled data, we have discovered a significant issue: features extracted from these self-supervised models tend to cluster by individual WSIs, a pheno…

    Submitted 4 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: 13 pages, 8 figures

  31. arXiv:2407.21635  [pdf, other]

    cs.LG

    MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction

    Authors: Seongju Lee, Junseok Lee, Yeonguk Yu, Taeri Kim, Kyoobin Lee

    Abstract: Multi-agent trajectory prediction is crucial to autonomous driving and understanding the surrounding environment. Learning-based approaches for multi-agent trajectory prediction, such as primarily relying on graph neural networks, graph transformers, and hypergraph neural networks, have demonstrated outstanding performance on real-world datasets in recent years. However, the hypergraph transformer…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 19 pages, 12 figures, 7 tables, 8 pages of supplementary material. Paper accepted at ECCV 2024

  32. arXiv:2407.21037  [pdf, other]

    cs.CL cs.AI

    An Application of Large Language Models to Coding Negotiation Transcripts

    Authors: Ray Friedman, Jaewoo Cho, Jeanne Brett, Xuhui Zhan, Ningyu Han, Sriram Kannan, Yingxiang Ma, Jesse Spencer-Smith, Elisabeth Jäckel, Alfred Zerres, Madison Hooper, Katie Babbit, Manish Acharya, Wendi Adair, Soroush Aslani, Tayfun Aykaç, Chris Bauman, Rebecca Bennett, Garrett Brady, Peggy Briggs, Cheryl Dowie, Chase Eck, Igmar Geiger, Frank Jacob, Molly Kern, et al. (33 additional authors not shown)

    Abstract: In recent years, Large Language Models (LLMs) have demonstrated impressive capabilities in the field of natural language processing (NLP). This paper explores the application of LLMs in negotiation transcript analysis by the Vanderbilt AI Negotiation Lab. Starting in September 2022, we applied multiple strategies using LLMs, from zero-shot learning to fine-tuning models to in-context learning. The…

    Submitted 18 July, 2024; originally announced July 2024.

  33. arXiv:2407.20845  [pdf, other]

    cs.CV cs.HC cs.LG

    Assessing Graphical Perception of Image Embedding Models using Channel Effectiveness

    Authors: Soohyun Lee, Minsuk Chang, Seokhyeon Park, Jinwook Seo

    Abstract: Recent advancements in vision models have greatly improved their ability to handle complex chart understanding tasks, like chart captioning and question answering. However, it remains challenging to assess how these models process charts. Existing benchmarks only roughly evaluate model performance without evaluating the underlying mechanisms, such as how models extract image embeddings. This limit…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: In Proceedings of the 2024 IEEE Visualization and Visual Analytics (VIS)

  34. arXiv:2407.20806  [pdf, other]

    cs.AI cs.LG

    ARCLE: The Abstraction and Reasoning Corpus Learning Environment for Reinforcement Learning

    Authors: Hosung Lee, Sejin Kim, Seungpil Lee, Sanha Hwang, Jihwan Lee, Byung-Jun Lee, Sundong Kim

    Abstract: This paper introduces ARCLE, an environment designed to facilitate reinforcement learning research on the Abstraction and Reasoning Corpus (ARC). Addressing this inductive reasoning benchmark with reinforcement learning presents these challenges: a vast action space, a hard-to-reach goal, and a variety of tasks. We demonstrate that an agent with proximal policy optimization can learn individual ta…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Accepted by CoLLAs 2024, Project page: https://github.com/confeitoHS/arcle

  35. arXiv:2407.20391  [pdf, other]

    cs.CV cs.RO

    Alignment Scores: Robust Metrics for Multiview Pose Accuracy Evaluation

    Authors: Seong Hun Lee, Javier Civera

    Abstract: We propose three novel metrics for evaluating the accuracy of a set of estimated camera poses given the ground truth: Translation Alignment Score (TAS), Rotation Alignment Score (RAS), and Pose Alignment Score (PAS). The TAS evaluates the translation accuracy independently of the rotations, and the RAS evaluates the rotation accuracy independently of the translations. The PAS is the average of the…

    Submitted 2 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

  36. arXiv:2407.19862  [pdf, other]

    cs.SD eess.AS

    Wavespace: A Highly Explorable Wavetable Generator

    Authors: Hazounne Lee, Kihong Kim, Sungho Lee, Kyogu Lee

    Abstract: Wavetable synthesis generates quasi-periodic waveforms of musical tones by interpolating a list of waveforms called wavetable. As generative models that utilize latent representations offer various methods in waveform generation for musical applications, studies in wavetable generation with invertible architecture have also arisen recently. While they are promising, it is still challenging to gene…

    Submitted 29 July, 2024; originally announced July 2024.

  37. arXiv:2407.19156  [pdf, other]

    cs.CV

    Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble

    Authors: Juhan Cha, Minseok Joo, Jihwan Park, Sanghyeok Lee, Injae Kim, Hyunwoo J. Kim

    Abstract: Recent advancements in 3D object detection have benefited from multi-modal information from the multi-view cameras and LiDAR sensors. However, the inherent disparities between the modalities pose substantial challenges. We observe that existing multi-modal 3D object detection methods heavily rely on the LiDAR sensor, treating the camera as an auxiliary modality for augmenting semantic details. Thi…

    Submitted 19 August, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

  38. arXiv:2407.16329  [pdf, other]

    cs.HC cs.AI

    PhenoFlow: A Human-LLM Driven Visual Analytics System for Exploring Large and Complex Stroke Datasets

    Authors: Jaeyoung Kim, Sihyeon Lee, Hyeon Jeon, Keon-Joo Lee, Hee-Joon Bae, Bohyoung Kim, Jinwook Seo

    Abstract: Acute stroke demands prompt diagnosis and treatment to achieve optimal patient outcomes. However, the intricate and irregular nature of clinical data associated with acute stroke, particularly blood pressure (BP) measurements, presents substantial obstacles to effective visual analytics and decision-making. Through a year-long collaboration with experienced neurologists, we developed PhenoFlow, a…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 11 pages, 5 figures, paper to appear in IEEE Transactions on Visualization and Computer Graphics (TVCG) (Proc. IEEE VIS 2024)

  39. arXiv:2407.16125  [pdf, other]

    cs.CV

    Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems

    Authors: Sojin Lee, Dogyun Park, Inho Kong, Hyunwoo J. Kim

    Abstract: Recent studies on inverse problems have proposed posterior samplers that leverage the pre-trained diffusion models as powerful priors. These attempts have paved the way for using diffusion models in a wide range of inverse problems. However, the existing methods entail computationally demanding iterative sampling procedures and optimize a separate solution for each measurement, which leads to limi…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; 41 pages, 19 figures

  40. arXiv:2407.15475  [pdf, other]

    cs.RO cs.AI

    A Multi-Level Corroborative Approach for Verification and Validation of Autonomous Robotic Swarms

    Authors: Dhaminda B. Abeywickrama, Suet Lee, Chris Bennett, Razanne Abu-Aisheh, Tom Didiot-Cook, Simon Jones, Sabine Hauert, Kerstin Eder

    Abstract: Modelling and characterizing emergent behaviour within a swarm can pose significant challenges in terms of 'assurance'. Assurance tasks encompass adherence to standards, certification processes, and the execution of verification and validation (V&V) methods, such as model checking. In this study, we propose a holistic, multi-level modelling approach for formally verifying and validating autonomous…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 15 pages, 11 figures

    ACM Class: I.2.9; D.2; I.6

  41. arXiv:2407.14059  [pdf, other]

    cs.CV

    Regularizing Dynamic Radiance Fields with Kinematic Fields

    Authors: Woobin Im, Geonho Cha, Sebin Lee, Jumin Lee, Juhyeong Seon, Dongyoon Wee, Sung-Eui Yoon

    Abstract: This paper presents a novel approach for reconstructing dynamic radiance fields from monocular videos. We integrate kinematics with dynamic radiance fields, bridging the gap between the sparse nature of monocular videos and the real-world physics. Our method introduces the kinematic field, capturing motion through kinematic quantities: velocity, acceleration, and jerk. The kinematic field is joint…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  42. arXiv:2407.13942  [pdf, other]

    cs.CY cs.AI cs.CL cs.SI

    Harmful Suicide Content Detection

    Authors: Kyumin Park, Myung Jae Baik, YeongJun Hwang, Yen Shin, HoJae Lee, Ruda Lee, Sang Min Lee, Je Young Hannah Sun, Ah Rah Lee, Si Yeun Yoon, Dong-ho Lee, Jihyung Moon, JinYeong Bak, Kyunghyun Cho, Jong-Woo Paik, Sungjoon Park

    Abstract: Harmful suicide content on the Internet is a significant risk factor inducing suicidal thoughts and behaviors among vulnerable populations. Despite global efforts, existing resources are insufficient, specifically in high-risk regions like the Republic of Korea. Current research mainly focuses on understanding negative effects of such content or suicide risk in individuals, rather than on automati…

    Submitted 2 June, 2024; originally announced July 2024.

    Comments: 30 pages, 7 figures

  43. arXiv:2407.13853  [pdf, other]

    cs.LG cs.PF

    Data-driven Forecasting of Deep Learning Performance on GPUs

    Authors: Seonho Lee, Amar Phanishayee, Divya Mahajan

    Abstract: Deep learning kernels exhibit predictable memory accesses and compute patterns, making GPUs' parallel architecture well-suited for their execution. Software and runtime systems for GPUs are optimized to better utilize the stream multiprocessors, on-chip cache, and off-chip high-bandwidth memory. As deep learning models and GPUs evolve, access to newer GPUs is often limited, raising questions about…

    Submitted 18 July, 2024; originally announced July 2024.

  44. arXiv:2407.13808  [pdf, other]

    cs.CV

    CoAPT: Context Attribute words for Prompt Tuning

    Authors: Gun Lee, Subin An, Sungyong Baik, Soochahn Lee

    Abstract: We propose a novel prompt tuning method called CoAPT (Context Attribute words in Prompt Tuning) for few/zero-shot image classification. The core motivation is that attributes are descriptive words with rich information about a given concept. Thus, we aim to enrich text queries of existing prompt tuning methods, improving alignment between text and image embeddings in CLIP embedding space. To do so,…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 14 pages, 4 figures

  45. arXiv:2407.13437  [pdf, other]

    cs.CV

    FREST: Feature RESToration for Semantic Segmentation under Multiple Adverse Conditions

    Authors: Sohyun Lee, Namyup Kim, Sungyeon Kim, Suha Kwak

    Abstract: Robust semantic segmentation under adverse conditions is crucial in real-world applications. To address this challenging task in practical scenarios where labeled normal condition images are not accessible in training, we propose FREST, a novel feature restoration framework for source-free domain adaptation (SFDA) of semantic segmentation to adverse conditions. FREST alternates two steps: (1) lear…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  46. arXiv:2407.13078  [pdf, other]

    cs.CV cs.AI

    Enhancing Temporal Action Localization: Advanced S6 Modeling with Recurrent Mechanism

    Authors: Sangyoun Lee, Juho Jung, Changdae Oh, Sunghee Yun

    Abstract: Temporal Action Localization (TAL) is a critical task in video analysis, identifying precise start and end times of actions. Existing methods like CNNs, RNNs, GCNs, and Transformers have limitations in capturing long-range dependencies and temporal causality. To address these challenges, we propose a novel TAL architecture leveraging the Selective State Space Model (S6). Our approach integrates th…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 8 pages, 3 figures, Preprint

  47. arXiv:2407.13052  [pdf, other]

    cs.CY cs.DS cs.LG

    Matchings, Predictions and Counterfactual Harm in Refugee Resettlement Processes

    Authors: Seungeon Lee, Nina Corvelo Benz, Suhas Thejaswi, Manuel Gomez-Rodriguez

    Abstract: Resettlement agencies have started to adopt data-driven algorithmic matching to match refugees to locations using employment rate as a measure of utility. Given a pool of refugees, data-driven algorithmic matching utilizes a classifier to predict the probability that each refugee would find employment at any given location. Then, it uses the predicted probabilities to estimate the expected utility…

    Submitted 24 May, 2024; originally announced July 2024.

    Comments: 24 pages including reference and appendix

  48. arXiv:2407.12614  [pdf]

    cs.CV

    Strawberry detection and counting based on YOLOv7 pruning and information based tracking algorithm

    Authors: Shiyu Liu, Congliang Zhou, Won Suk Lee

    Abstract: The strawberry industry yields significant economic benefits for Florida, yet the process of monitoring strawberry growth and yield is labor-intensive and costly. The development of machine learning-based detection and tracking methodologies has been used for helping automated monitoring and prediction of strawberry yield, still, enhancement has been limited as previous studies only applied the de…

    Submitted 17 July, 2024; originally announced July 2024.

  49. arXiv:2407.12463  [pdf, other]

    cs.CV

    Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation

    Authors: Hyun Seok Seong, WonJun Moon, SuBeen Lee, Jae-Pil Heo

    Abstract: The labor-intensive labeling for semantic segmentation has spurred the emergence of Unsupervised Semantic Segmentation. Recent studies utilize patch-wise contrastive learning based on features from image-level self-supervised pretrained models. However, relying solely on similarity-based supervision from image-level pretrained models often leads to unreliable guidance due to insufficient patch-lev…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  50. arXiv:2407.12405  [pdf, other]

    eess.IV cs.CV cs.RO

    Fisheye-Calib-Adapter: An Easy Tool for Fisheye Camera Model Conversion

    Authors: Sangjun Lee

    Abstract: The increasing necessity for fisheye cameras in fields such as robotics and autonomous driving has led to the proposal of various fisheye camera models. While the evolution of camera models has facilitated the development of diverse systems in the field, the lack of adaptation between different fisheye camera models means that recalibration is always necessary, which is cumbersome. This paper intr…

    Submitted 19 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures