Showing 1–50 of 1,186 results for author: Kim, D

Searching in archive cs.
  1. arXiv:2408.09791  [pdf, other]

    stat.ML cs.LG

    ALTBI: Constructing Improved Outlier Detection Models via Optimization of Inlier-Memorization Effect

    Authors: Seoyoung Cho, Jaesung Hwang, Kwan-Young Bak, Dongha Kim

    Abstract: Outlier detection (OD) is the task of identifying unusual observations (or outliers) from a given or upcoming data by learning unique patterns of normal observations (or inliers). Recently, a study introduced a powerful unsupervised OD (UOD) solver based on a new observation of deep generative models, called inlier-memorization (IM) effect, which suggests that generative models memorize inliers be…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 24 pages in total

  2. arXiv:2408.08261  [pdf, other]

    cs.CL

    mhGPT: A Lightweight Generative Pre-Trained Transformer for Mental Health Text Analysis

    Authors: Dae-young Kim, Rebecca Hwa, Muhammad Mahbubur Rahman

    Abstract: This paper introduces mhGPT, a lightweight generative pre-trained transformer trained on mental health-related social media and PubMed articles. Fine-tuned for specific mental health tasks, mhGPT was evaluated under limited hardware constraints and compared with state-of-the-art models like MentaLLaMA and Gemma. Despite having only 1.98 billion parameters and using just 5% of the dataset, mhGPT ou…

    Submitted 15 August, 2024; originally announced August 2024.

  3. arXiv:2408.06010  [pdf, other]

    cs.CV

    DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation

    Authors: Jisoo Kim, Jungbin Cho, Joonho Park, Soonmin Hwang, Da Eun Kim, Geon Kim, Youngjae Yu

    Abstract: Speech-driven 3D facial animation has garnered lots of attention thanks to its broad range of applications. Despite recent advancements in achieving realistic lip motion, current methods fail to capture the nuanced emotional undertones conveyed through speech and produce monotonous facial motion. These limitations result in blunt and repetitive facial animations, reducing user engagement and hinde…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: First two authors contributed equally

  4. arXiv:2408.05749  [pdf, other]

    cs.CV cs.LG

    Efficient and Versatile Robust Fine-Tuning of Zero-shot Models

    Authors: Sungyeon Kim, Boseung Jeong, Donghyun Kim, Suha Kwak

    Abstract: Large-scale image-text pre-trained models enable zero-shot classification and provide consistent accuracy across various data distributions. Nonetheless, optimizing these models in downstream tasks typically requires fine-tuning, which reduces generalization to out-of-distribution (OOD) data and demands extensive computational resources. We introduce Robust Adapter (R-Adapter), a novel method for…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024

  5. arXiv:2408.04506  [pdf, other]

    cs.HC

    Who ruins the game?: unveiling cheating players in the "Battlefield" game

    Authors: Dong Young Kim, Huy Kang Kim

    Abstract: The "Battlefield" online game is well-known for its large-scale multiplayer capabilities and unique gaming features, including various vehicle controls. However, these features make the game a major target for cheating, significantly detracting from the gaming experience. This study analyzes user behavior in cheating play in the popular online game, the "Battlefield", using statistical methods. We…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 12 pages, 5 figures, 2 tables; accepted to the 25th World Conference on Information Security Applications (WISA 2024)

  6. arXiv:2408.04297  [pdf, other]

    cs.ET cs.HC

    Spatial Affordance-aware Interactable Subspace Allocation for Mixed Reality Telepresence

    Authors: Dooyoung Kim, Seonji Kim, Selin Choi, Woontack Woo

    Abstract: To enable remote Virtual Reality (VR) and Augmented Reality (AR) clients to collaborate as if they were in the same space during Mixed Reality (MR) telepresence, it is essential to overcome spatial heterogeneity and generate a unified shared collaborative environment by integrating remote spaces into a target host space. Especially when multiple remote users connect, a large shared space is necess…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: Accepted at the 2024 IEEE ISMAR Conference. 10 pages, 6 figures

  7. arXiv:2408.03758  [pdf, other]

    cs.CR cs.NI

    MTDSense: AI-Based Fingerprinting of Moving Target Defense Techniques in Software-Defined Networking

    Authors: Tina Moghaddam, Guowei Yang, Chandra Thapa, Seyit Camtepe, Dan Dongseong Kim

    Abstract: Moving target defenses (MTD) are proactive security techniques that enhance network security by confusing the attacker and limiting their attack window. MTDs have been shown to have significant benefits when evaluated against traditional network attacks, most of which are automated and untargeted. However, little has been done to address an attacker who is aware the network uses an MTD. In this wo…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 12 pages, 12 figures, 3 tables

  8. arXiv:2408.02957  [pdf, other]

    cs.CV

    Online Temporal Action Localization with Memory-Augmented Transformer

    Authors: Youngkil Song, Dongkeun Kim, Minsu Cho, Suha Kwak

    Abstract: Online temporal action localization (On-TAL) is the task of identifying multiple action instances given a streaming video. Since existing methods take as input only a video segment of fixed size per iteration, they are limited in considering long-term context and require tuning the segment size carefully. To overcome these limitations, we propose memory-augmented transformer (MATR). MATR utilizes…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: Accepted to ECCV 2024, Project page: https://cvlab.postech.ac.kr/research/MATR/

  9. arXiv:2408.01585  [pdf, other]

    cs.SE cs.AI

    OpenLogParser: Unsupervised Parsing with Open-Source Large Language Models

    Authors: Zeyang Ma, Dong Jae Kim, Tse-Hsun Chen

    Abstract: Log parsing is a critical step that transforms unstructured log data into structured formats, facilitating subsequent log-based analysis. Traditional syntax-based log parsers are efficient and effective, but they often experience decreased accuracy when processing logs that deviate from the predefined rules. Recently, large language models (LLM) based log parsers have shown superior parsing accura…

    Submitted 2 August, 2024; originally announced August 2024.

  10. arXiv:2408.01096  [pdf, other]

    cs.SD cs.AI eess.AS

    Six Dragons Fly Again: Reviving 15th-Century Korean Court Music with Transformers and Novel Encoding

    Authors: Danbinaerin Han, Mark Gotham, Dongmin Kim, Hannah Park, Sihun Lee, Dasaem Jeong

    Abstract: We introduce a project that revives a piece of 15th-century Korean court music, Chihwapyeong and Chwipunghyeong, composed upon the poem Songs of the Dragon Flying to Heaven. One of the earliest examples of Jeongganbo, a Korean musical notation system, the remaining version only consists of a rudimentary melody. Our research team, commissioned by the National Gugak (Korean Traditional Music) Center…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted at the 25th International Society for Music Information Retrieval Conference (ISMIR 2024)

  11. arXiv:2408.00973  [pdf, other]

    stat.ML cs.LG math.ST

    META-ANOVA: Screening interactions for interpretable machine learning

    Authors: Yongchan Choi, Seokhun Park, Chanmoo Park, Dongha Kim, Yongdai Kim

    Abstract: There are two things to be considered when we evaluate predictive models. One is prediction accuracy, and the other is interpretability. Over the recent decades, many prediction models of high performance, such as ensemble-based models and deep neural networks, have been developed. However, these models are often too complex, making it difficult to intuitively interpret their predictions. This comp…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 26 pages

  12. arXiv:2407.21260  [pdf, other]

    cs.LG cs.AI stat.ML

    Tractable and Provably Efficient Distributional Reinforcement Learning with General Value Function Approximation

    Authors: Taehyun Cho, Seungyub Han, Kyungjae Lee, Seokhun Ju, Dohyeong Kim, Jungwoo Lee

    Abstract: Distributional reinforcement learning improves performance by effectively capturing environmental stochasticity, but a comprehensive theoretical understanding of its effectiveness remains elusive. In this paper, we present a regret analysis for distributional reinforcement learning with general value function approximation in a finite episodic Markov decision process setting. We first introduce a…

    Submitted 30 July, 2024; originally announced July 2024.

  13. arXiv:2407.21199  [pdf, other]

    cs.HC

    A Survey on Exploratory Spatiotemporal Visual Analytics Approaches for Climate Science

    Authors: Abdullah-Al-Raihan Nayeem, Dongyun Han, Huikyo Lee, Donghoon Kim, Daniel Feldman, William J. Tolone, Daniel Crichton, Isaac Cho

    Abstract: Climate science produces a wealth of complex, high-dimensional, multivariate data from observations and numerical models. These data are critical for understanding climate changes and their socioeconomic impacts. Climate scientists are continuously evaluating output from numerical models against observations. This model evaluation process provides useful guidance to improve the numerical models an…

    Submitted 30 July, 2024; originally announced July 2024.

  14. arXiv:2407.20248  [pdf, other]

    cs.CL cs.AI cs.CY

    LAPIS: Language Model-Augmented Police Investigation System

    Authors: Heedou Kim, Dain Kim, Jiwoo Lee, Chanwoong Yoon, Donghee Choi, Mogan Gim, Jaewoo Kang

    Abstract: Crime situations are a race against time. Police officers need an AI-assisted criminal investigation system that provides prompt but precise legal counsel. We introduce LAPIS (Language Model Augmented Police Investigation System), an automated system that assists police officers to perform rational and legal investigative actions. We constructed a finetuning dataset and retrieval knowledgebas…

    Submitted 31 July, 2024; v1 submitted 19 July, 2024; originally announced July 2024.

  15. arXiv:2407.18257  [pdf]

    cs.NE stat.ML

    Estimation of Distribution Algorithms with Matrix Transpose in Bayesian Learning

    Authors: Dae-Won Kim, Song Ko, Bo-Yeong Kang

    Abstract: Estimation of distribution algorithms (EDAs) constitute a new branch of evolutionary optimization algorithms, providing effective and efficient optimization performance in a variety of research areas. Recent studies have proposed new EDAs that employ mutation operators in standard EDAs to increase the population diversity. We present a new mutation operator, a matrix transpose, specifically design…

    Submitted 11 July, 2024; originally announced July 2024.

  16. arXiv:2407.17710  [pdf, other]

    cs.LG

    Revisiting Machine Unlearning with Dimensional Alignment

    Authors: Seonguk Seo, Dongwan Kim, Bohyung Han

    Abstract: Machine unlearning, an emerging research topic focusing on compliance with data privacy regulations, enables trained models to remove the information learned from specific data. While many existing methods indirectly address this issue by intentionally injecting incorrect supervisions, they can drastically and unpredictably alter the decision boundaries and feature spaces, leading to training inst…

    Submitted 24 July, 2024; originally announced July 2024.

  17. arXiv:2407.17423  [pdf, ps, other]

    cs.CV

    On selection of centroids of fuzzy clusters for color classification

    Authors: Dae-Won Kim, Kwang H. Lee

    Abstract: A novel initialization method in the fuzzy c-means (FCM) algorithm is proposed for the color clustering problem. Given a set of color points, the proposed initialization extracts dominant colors that are the most vivid and distinguishable colors. Color points closest to the dominant colors are selected as initial centroids in the FCM. To obtain the dominant colors and their closest color points, w…

    Submitted 9 July, 2024; originally announced July 2024.

  18. arXiv:2407.16232  [pdf, other]

    cs.CV

    Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution

    Authors: Dinh Phu Tran, Dao Duy Hung, Daeyoung Kim

    Abstract: Recently, window-based attention methods have shown great potential for computer vision tasks, particularly in Single Image Super-Resolution (SISR). However, they may fall short in capturing long-range dependencies and relationships between distant tokens. Additionally, we find that learning on spatial domain does not convey the frequency content of the image, which is a crucial aspect in SISR. To t…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Version 1, BMVC 2024

  19. arXiv:2407.15296  [pdf, other]

    cs.CV cs.CL cs.LG

    Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection

    Authors: Kwanyong Park, Kuniaki Saito, Donghyun Kim

    Abstract: Vision-language (VL) models often exhibit a limited understanding of complex expressions of visual objects (e.g., attributes, shapes, and their relations), given complex and diverse language queries. Traditional approaches attempt to improve VL models using hard negative synthetic text, but their effectiveness is limited. In this paper, we harness the exceptional compositional understanding capabi…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  20. arXiv:2407.14612  [pdf, other]

    cs.RO

    A Biomechanics-Inspired Approach to Soccer Kicking for Humanoid Robots

    Authors: Daniel Marew, Nisal Perera, Shangqun Yu, Sarah Roelker, Donghyun Kim

    Abstract: Soccer kicking is a complex whole-body motion that requires intricate coordination of various motor actions. To accomplish such dynamic motion in a humanoid robot, the robot needs to simultaneously: 1) transfer high kinetic energy to the kicking leg, 2) maintain balance and stability of the entire body, and 3) manage the impact disturbance from the ball during the kicking moment. Prior studies on…

    Submitted 19 July, 2024; originally announced July 2024.

  21. arXiv:2407.14563  [pdf, other]

    cs.CV

    Learning Visual Grounding from Generative Vision and Language Model

    Authors: Shijie Wang, Dahun Kim, Ali Taalimi, Chen Sun, Weicheng Kuo

    Abstract: Visual grounding tasks aim to localize image regions based on natural language references. In this work, we explore whether generative VLMs predominantly trained on image-text data could be leveraged to scale up the text annotation of visual grounding data. We find that grounding knowledge already exists in generative VLM and can be elicited by proper prompting. We thus prompt a VLM to generate ob…

    Submitted 18 July, 2024; originally announced July 2024.

  22. arXiv:2407.13833  [pdf, other]

    cs.CL cs.AI

    Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle

    Authors: Emman Haider, Daniel Perez-Becker, Thomas Portet, Piyush Madan, Amit Garg, David Majercak, Wen Wen, Dongwoo Kim, Ziyi Yang, Jianwen Zhang, Hiteshi Sharma, Blake Bullwinkel, Martin Pouliot, Amanda Minnich, Shiven Chawla, Solianna Herrera, Shahed Warreth, Maggie Engler, Gary Lopez, Nina Chikanov, Raja Sekhar Rao Dheekonda, Bolor-Erdene Jagdagdorj, Roman Lutz, Richard Lundeen, Tori Westerhoff, et al. (5 additional authors not shown)

    Abstract: Recent innovations in language model training have demonstrated that it is possible to create highly performant models that are small enough to run on a smartphone. As these models are deployed in an increasing number of domains, it is critical to ensure that they are aligned with human preferences and safety considerations. In this report, we present our methodology for safety aligning the Phi-3…

    Submitted 18 July, 2024; originally announced July 2024.

  23. arXiv:2407.12777  [pdf, other]

    cs.CV cs.GR

    Generalizable Human Gaussians for Sparse View Synthesis

    Authors: Youngjoong Kwon, Baole Fang, Yixing Lu, Haoye Dong, Cheng Zhang, Francisco Vicente Carrasco, Albert Mosella-Montoro, Jianjin Xu, Shingo Takagi, Daeil Kim, Aayush Prakash, Fernando De la Torre

    Abstract: Recent progress in neural rendering has brought forth pioneering methods, such as NeRF and Gaussian Splatting, which revolutionize view rendering across various domains like AR/VR, gaming, and content creation. While these methods excel at interpolating within the training data, the challenge of generalizing to new scenes and objects from very sparse views persists. Specifically, modeling 3D…

    Submitted 17 July, 2024; originally announced July 2024.

  24. arXiv:2407.12637  [pdf, other]

    cs.CV

    Toward INT4 Fixed-Point Training via Exploring Quantization Error for Gradients

    Authors: Dohyung Kim, Junghyup Lee, Jeimin Jeon, Jaehyeon Moon, Bumsub Ham

    Abstract: Network quantization generally converts full-precision weights and/or activations into low-bit fixed-point values in order to accelerate an inference process. Recent approaches to network quantization further discretize the gradients into low-bit fixed-point values, enabling an efficient training. They typically set a quantization interval using a min-max range of the gradients or adjust the inter…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  25. arXiv:2407.12616  [pdf, other]

    cs.CV cs.AI

    Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models

    Authors: Donggeun Kim, Taesup Kim

    Abstract: Multimodal learning typically relies on the assumption that all modalities are fully available during both the training and inference phases. However, in real-world scenarios, consistently acquiring complete multimodal data presents significant challenges due to various factors. This often leads to the issue of missing modalities, where data for certain modalities are absent, posing considerable o…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  26. arXiv:2407.11714  [pdf, other]

    cs.CV

    Improving Unsupervised Video Object Segmentation via Fake Flow Generation

    Authors: Suhwan Cho, Minhyeok Lee, Jungho Lee, Donghyeong Kim, Seunghoon Lee, Sungmin Woo, Sangyoun Lee

    Abstract: Unsupervised video object segmentation (VOS), also known as video salient object detection, aims to detect the most prominent object in a video at the pixel level. Recently, two-stream approaches that leverage both RGB images and optical flow maps have gained significant attention. However, the limited amount of training data remains a substantial challenge. In this study, we propose a novel data…

    Submitted 16 July, 2024; originally announced July 2024.

  27. arXiv:2407.11330  [pdf]

    cs.NE cs.LG cs.MA nlin.AO

    Navigating the swarm: Deep neural networks command emergent behaviours

    Authors: Dongjo Kim, Jeongsu Lee, Ho-Young Kim

    Abstract: Interacting individuals in complex systems often give rise to coherent motion exhibiting coordinated global structures. Such phenomena are ubiquitously observed in nature, from cell migration, bacterial swarms, animal and insect groups, and even human societies. Primary mechanisms responsible for the emergence of collective behavior have been extensively identified, including local alignments base…

    Submitted 15 July, 2024; originally announced July 2024.

  28. arXiv:2407.10733  [pdf, other]

    cs.CV

    Joint-Embedding Predictive Architecture for Self-Supervised Learning of Mask Classification Architecture

    Authors: Dong-Hee Kim, Sungduk Cho, Hyeonwoo Cho, Chanmin Park, Jinyoung Kim, Won Hwa Kim

    Abstract: In this work, we introduce Mask-JEPA, a self-supervised learning framework tailored for mask classification architectures (MCA), to overcome the traditional constraints associated with training segmentation models. Mask-JEPA combines a Joint Embedding Predictive Architecture with MCA to adeptly capture intricate semantics and precise object boundaries. Our approach addresses two critical challenge…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 27 pages, 5 figures

  29. arXiv:2407.09541  [pdf, other]

    cs.CL cs.AI cs.CV

    MATE: Meet At The Embedding -- Connecting Images with Long Texts

    Authors: Young Kyun Jang, Junmo Kang, Yong Jae Lee, Donghyun Kim

    Abstract: While advancements in Vision Language Models (VLMs) have significantly improved the alignment of visual and textual data, these models primarily focus on aligning images with short descriptive captions. This focus limits their ability to handle complex text interactions, particularly with longer texts such as lengthy captions or documents, which have not been extensively explored yet. In this pape…

    Submitted 26 June, 2024; originally announced July 2024.

  30. arXiv:2407.09033  [pdf, other]

    cs.CV

    Textual Query-Driven Mask Transformer for Domain Generalized Segmentation

    Authors: Byeonghyun Pak, Byeongju Woo, Sunghwan Kim, Dae-hwan Kim, Hoseong Kim

    Abstract: In this paper, we introduce a method to tackle Domain Generalized Semantic Segmentation (DGSS) by utilizing domain-invariant semantic knowledge from text embeddings of vision-language models. We employ the text embeddings as object queries within a transformer-based segmentation framework (textual object queries). These queries are regarded as a domain-invariant basis for pixel grouping in DGSS. T…

    Submitted 31 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  31. arXiv:2407.08872  [pdf, other]

    cs.CV

    Visual Multi-Object Tracking with Re-Identification and Occlusion Handling using Labeled Random Finite Sets

    Authors: Linh Van Ma, Tran Thien Dat Nguyen, Changbeom Shim, Du Yong Kim, Namkoo Ha, Moongu Jeon

    Abstract: This paper proposes an online visual multi-object tracking (MOT) algorithm that resolves object appearance-reappearance and occlusion. Our solution is based on the labeled random finite set (LRFS) filtering approach, which in principle, addresses disappearance, appearance, reappearance, and occlusion via a single Bayesian recursion. However, in practice, existing numerical approximations cause rea…

    Submitted 11 July, 2024; originally announced July 2024.

  32. arXiv:2407.08073  [pdf, other]

    cs.RO cs.AI cs.LG

    NDST: Neural Driving Style Transfer for Human-Like Vision-Based Autonomous Driving

    Authors: Donghyun Kim, Aws Khalil, Haewoon Nam, Jaerock Kwon

    Abstract: Autonomous Vehicles (AV) and Advanced Driver Assistant Systems (ADAS) prioritize safety over comfort. The intertwining factors of safety and comfort emerge as pivotal elements in ensuring the effectiveness of Autonomous Driving (AD). Users often experience discomfort when AV or ADAS drive the vehicle on their behalf. Providing a personalized human-like AD experience, tailored to match users' uniqu…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 9 pages, 11 figures

  33. arXiv:2407.06782  [pdf, ps, other]

    cs.CV

    Fuzzy color model and clustering algorithm for color clustering problem

    Authors: Dae-Won Kim, Kwang H. Lee

    Abstract: The research interest of this paper is focused on the efficient clustering task for arbitrary color data. In order to tackle this problem, we have tried to model the inherent uncertainty and vagueness of color data using a fuzzy color model. By taking a fuzzy approach to color modeling, we could make a soft decision for the vague regions between neighboring colors. The proposed fuzzy color model de…

    Submitted 9 July, 2024; originally announced July 2024.

  34. arXiv:2407.06774  [pdf, ps, other]

    cs.AI

    A new validity measure for fuzzy c-means clustering

    Authors: Dae-Won Kim, Kwang H. Lee

    Abstract: A new cluster validity index is proposed for fuzzy clusters obtained from the fuzzy c-means algorithm. The proposed validity index exploits inter-cluster proximity between fuzzy clusters. Inter-cluster proximity is used to measure the degree of overlap between clusters. A low proximity value refers to well-partitioned clusters. The best fuzzy c-partition is obtained by minimizing inter-cluster proximi…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Accepted at FIP-2002

  35. arXiv:2407.06613  [pdf, other]

    cs.CV

    Sparse-DeRF: Deblurred Neural Radiance Fields from Sparse View

    Authors: Dogyoon Lee, Donghyeong Kim, Jungho Lee, Minhyeok Lee, Seunghoon Lee, Sangyoun Lee

    Abstract: Recent studies construct deblurred neural radiance fields (DeRF) using dozens of blurry images, which are not practical scenarios if only a limited number of blurry images are available. This paper focuses on constructing DeRF from sparse-view for more pragmatic real-world scenarios. As observed in our experiments, establishing DeRF from sparse views proves to be a more challenging problem due to…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Project page: https://dogyoonlee.github.io/sparsederf/

  36. arXiv:2407.06551  [pdf, other]

    cs.CL

    OffsetBias: Leveraging Debiased Data for Tuning Evaluators

    Authors: Junsoo Park, Seungyeon Jwa, Meiying Ren, Daeyoung Kim, Sanghyuk Choi

    Abstract: Employing Large Language Models (LLMs) to assess the quality of generated responses, such as prompting instruct-tuned models or fine-tuning judge models, has become a widely adopted evaluation method. It is also known that such evaluators are vulnerable to biases, such as favoring longer responses. While it is important to overcome this problem, the specifics of these biases remain under-explored.…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: Work in Progress

  37. arXiv:2407.06004  [pdf, other]

    cs.CL

    Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models

    Authors: Chani Jung, Dongkwan Kim, Jiho Jin, Jiseon Kim, Yeon Seonwoo, Yejin Choi, Alice Oh, Hyunwoo Kim

    Abstract: While humans naturally develop theory of mind (ToM), the capability to understand other people's mental states and beliefs, state-of-the-art large language models (LLMs) underperform on simple ToM benchmarks. We posit that we can extend our understanding of LLMs' ToM abilities by evaluating key human ToM precursors -- perception inference and perception-to-belief inference -- in LLMs. We introduce…

    Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

  38. arXiv:2407.03923  [pdf, other]

    cs.CV cs.AI

    CRiM-GS: Continuous Rigid Motion-Aware Gaussian Splatting from Motion Blur Images

    Authors: Junghe Lee, Donghyeong Kim, Dogyoon Lee, Suhwan Cho, Sangyoun Lee

    Abstract: Neural radiance fields (NeRFs) have received significant attention due to their high-quality novel view rendering ability, prompting research to address various real-world cases. One critical challenge is the camera motion blur caused by camera movement during exposure time, which prevents accurate 3D scene reconstruction. In this study, we propose continuous rigid motion-aware gaussian splatting…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: Project page: https://jho-yonsei.github.io/CRiM-Gaussian/

  39. arXiv:2407.03600  [pdf, other]

    cs.CL

    Contrastive Chain-of-Thought Prompting

    Authors: Grant Kruttschnitt, Jay Shim, Alyssa Ma, Daniel Kim, Benjamin Chek, Athul Anand, Kevin Zhu, Sean O'Brien

    Abstract: Rapidly increasing model scales coupled with steering methods such as chain-of-thought prompting have led to drastic improvements in language model reasoning. At the same time, models struggle with compositional generalization and are far from human performance on many reasoning-based benchmarks. Leveraging the success of chain-of-thought prompting, and also taking inspiration from context-aware d…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 6 pages, 0 figures

  40. arXiv:2407.00972  [pdf, other]

    cs.CV

    FALCON: Frequency Adjoint Link with CONtinuous Density Mask for Fast Single Image Dehazing

    Authors: Donghyun Kim, Seil Kang, Seong Jae Hwang

    Abstract: Image dehazing, addressing atmospheric interference like fog and haze, remains a pervasive challenge crucial for robust vision applications such as surveillance and remote sensing under adverse visibility. While various methodologies have evolved from early works predicting transmission matrix and atmospheric light features to deep learning and dehazing networks, they innately prioritize dehazing…

    Submitted 1 July, 2024; originally announced July 2024.

  41. arXiv:2406.18695  [pdf, other]

    cs.LG cs.AI cs.CL

    Learning to Correct for QA Reasoning with Black-box LLMs

    Authors: Jaehyung Kim, Dongyoung Kim, Yiming Yang

    Abstract: An open challenge in recent machine learning is about how to improve the reasoning capability of large language models (LLMs) in a black-box setting, i.e., without access to detailed information such as output token probabilities. Existing approaches either rely on accessibility (which is often unrealistic) or involve significantly increased train- and inference-time costs. This paper addresses th…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: preprint, 18 pages

  42. arXiv:2406.17998  [pdf, other]

    cs.CV

    Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model

    Authors: Zhuo Zheng, Stefano Ermon, Dongjun Kim, Liangpei Zhang, Yanfei Zhong

    Abstract: Our understanding of the temporal dynamics of the Earth's surface has been advanced by deep vision models, which often require lots of labeled multi-temporal images for training. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present change data generators based on gene…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: The enhanced extension of our ICCV 2023 (Changen)

  43. arXiv:2406.15275  [pdf, other]

    cs.CL

    Cognitive Map for Language Models: Optimal Planning via Verbally Representing the World Model

    Authors: Doyoung Kim, Jongwon Lee, Jinho Park, Minjoon Seo

    Abstract: Language models have demonstrated impressive capabilities across various natural language processing tasks, yet they struggle with planning tasks requiring multi-step simulations. Inspired by human cognitive processes, this paper investigates the optimal planning power of language models that can construct a cognitive map of a given environment. Our experiments demonstrate that cognitive map signi…

    Submitted 21 June, 2024; originally announced June 2024.

  44. arXiv:2406.14706  [pdf]

    cs.ET cs.AR

    SWANN: Shuffling Weights in Crossbar Arrays for Enhanced DNN Accuracy in Deeply Scaled Technologies

    Authors: Jeffry Victor, Dong Eun Kim, Chunguang Wang, Kaushik Roy, Sumeet Gupta

    Abstract: Deep neural network (DNN) accelerators employing crossbar arrays capable of in-memory computing (IMC) are highly promising for neural computing platforms. However, in deeply scaled technologies, interconnect resistance severely impairs IMC robustness, leading to a drop in the system accuracy. To address this problem, we propose SWANN - a technique based on shuffling weights in crossbar arrays whic…

    Submitted 20 June, 2024; originally announced June 2024.

  45. arXiv:2406.13964  [pdf, other]

    cs.NI

    Hierarchical Micro-Segmentations for Zero-Trust Services via Large Language Model (LLM)-enhanced Graph Diffusion

    Authors: Yinqiu Liu, Guangyuan Liu, Hongyang Du, Dusit Niyato, Jiawen Kang, Zehui Xiong, Dong In Kim, Xuemin Shen

    Abstract: In the rapidly evolving Next-Generation Networking (NGN) era, the adoption of zero-trust architectures has become increasingly crucial to protect security. However, provisioning zero-trust services in NGNs poses significant challenges, primarily due to the environmental complexity and dynamics. Motivated by these challenges, this paper explores efficient zero-trust service provisioning using hiera…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages

  46. arXiv:2406.13248  [pdf, other]

    cs.IT eess.SP

    Overlay Space-Air-Ground Integrated Networks with SWIPT-Empowered Aerial Communications

    Authors: Anuradha Verma, Pankaj Kumar Sharma, Pawan Kumar, Dong In Kim

    Abstract: In this article, we consider overlay space-air-ground integrated networks (OSAGINs) where a low earth orbit (LEO) satellite communicates with ground users (GUs) with the assistance of an energy-constrained coexisting air-to-air (A2A) network. Particularly, a non-linear energy harvester with a hybrid SWIPT utilizing both power-splitting and time-switching energy harvesting (EH) techniques is employ…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 36 pages, 14 figures, This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  47. arXiv:2406.12806  [pdf, other]

    cs.SE cs.AI

    Identifying Performance-Sensitive Configurations in Software Systems through Code Analysis with LLM Agents

    Authors: Zehao Wang, Dong Jae Kim, Tse-Hsun Chen

    Abstract: Configuration settings are essential for tailoring software behavior to meet specific performance requirements. However, incorrect configurations are widespread, and identifying those that impact system performance is challenging due to the vast number and complexity of possible settings. In this work, we present PerfSense, a lightweight framework that leverages Large Language Models (LLMs) to eff…

    Submitted 18 June, 2024; originally announced June 2024.

  48. arXiv:2406.11427  [pdf, other]

    eess.AS cs.AI cs.CL cs.LG cs.SD

    DiTTo-TTS: Efficient and Scalable Zero-Shot Text-to-Speech with Diffusion Transformer

    Authors: Keon Lee, Dong Won Kim, Jaehyeon Kim, Jaewoong Cho

    Abstract: Large-scale diffusion models have shown outstanding generative abilities across multiple modalities including images, videos, and audio. However, text-to-speech (TTS) systems typically involve domain-specific modeling factors (e.g., phonemes and phoneme-level durations) to ensure precise temporal alignments between text and speech, which hinders the efficiency and scalability of diffusion models f…

    Submitted 17 June, 2024; originally announced June 2024.

  49. arXiv:2406.11210  [pdf, other]

    cs.CV

    Zero-Shot Scene Change Detection

    Authors: Kyusik Cho, Dong Yeop Kim, Euntai Kim

    Abstract: We present a novel, training-free approach to scene change detection. Our method leverages tracking models, which inherently perform change detection between consecutive frames of video by identifying common objects and detecting new or missing objects. Specifically, our method takes advantage of the change detection effect of the tracking model by inputting reference and query images instead of c…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Preprint. Under review

  50. arXiv:2406.10935  [pdf, other]

    cs.CV

    Pick-or-Mix: Dynamic Channel Sampling for ConvNets

    Authors: Ashish Kumar, Daneul Kim, Jaesik Park, Laxmidhar Behera

    Abstract: Channel pruning approaches for convolutional neural networks (ConvNets) deactivate the channels, statically or dynamically, and require special implementation. In addition, channel squeezing in representative ConvNets is carried out via 1x1 convolutions which dominates a large portion of computations and network parameters. Given these challenges, we propose an effective multi-purpose module for d…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Published in Computer Vision and Pattern Recognition (CVPR 2024)

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024