Showing 1–50 of 2,606 results for author: Kim, J

  1. arXiv:2408.10937  [pdf, other]

    cs.HC

    Proxona: Leveraging LLM-Driven Personas to Enhance Creators' Understanding of Their Audience

    Authors: Yoonseo Choi, Eun Jeong Kang, Seulgi Choi, Min Kyung Lee, Juho Kim

    Abstract: Creators are nothing without their audience, and thereby understanding their audience is the cornerstone of their professional achievement. Yet many creators feel lost while comprehending audiences with existing tools, which offer insufficient insights for tailoring content to audience needs. To address the challenges creators face in understanding their audience, we present Proxona, a system for…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 32 pages (including 14 pages of Appendix)

  2. arXiv:2408.10900  [pdf, other]

    cs.AI cs.ET cs.NE

    Towards Efficient Formal Verification of Spiking Neural Network

    Authors: Baekryun Seong, Jieung Kim, Sang-Ki Ko

    Abstract: Recently, AI research has primarily focused on large language models (LLMs), and increasing accuracy often involves scaling up and consuming more power. The power consumption of AI has become a significant societal issue; in this context, spiking neural networks (SNNs) offer a promising solution. SNNs operate event-driven, like the human brain, and compress information temporally. These characteri…

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.10086  [pdf, other]

    cs.AI

    ARMADA: Attribute-Based Multimodal Data Augmentation

    Authors: Xiaomeng Jin, Jeonghwan Kim, Yu Zhou, Kuan-Hao Huang, Te-Lin Wu, Nanyun Peng, Heng Ji

    Abstract: In Multimodal Language Models (MLMs), the cost of manually annotating high-quality image-text pair data for fine-tuning and alignment is extremely high. While existing multimodal data augmentation frameworks propose ways to augment image-text pairs, they either suffer from semantic inconsistency between texts and images, or generate unrealistic images, causing knowledge gap with real world example…

    Submitted 19 August, 2024; originally announced August 2024.

  4. arXiv:2408.09734  [pdf, other]

    cs.CV cs.AI

    Mutually-Aware Feature Learning for Few-Shot Object Counting

    Authors: Yerim Jeon, Subeen Lee, Jihwan Kim, Jae-Pil Heo

    Abstract: Few-shot object counting has garnered significant attention for its practicality as it aims to count target objects in a query image based on given exemplars without the need for additional training. However, there is a shortcoming in the prevailing extract-and-match approach: query and exemplar features lack interaction during feature extraction since they are extracted unaware of each other and…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Submitted to Pattern Recognition

  5. arXiv:2408.09685  [pdf, ps, other]

    cs.IT quant-ph

    Triorthogonal Codes and Self-dual Codes

    Authors: Minjia Shi, Haodong Lu, Jon-Lark Kim, Patrick Sole

    Abstract: Triorthogonal matrices were introduced in Quantum Information Theory in connection with distillation of magic states (Bravyi and Haah (2012)). We give an algorithm to construct binary triorthogonal matrices from binary self-dual codes. Further, we generalize to this setting the classical coding techniques of shortening and extending. We also give some simple propagation rules.

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 21 pages

    MSC Class: 94B05

    Journal ref: Quantum Inf Process 23, 280 (2024)

  6. arXiv:2408.09354  [pdf, other]

    cs.CV

    Boundary-Recovering Network for Temporal Action Detection

    Authors: Jihwan Kim, Jaehyun Choi, Yerim Jeon, Jae-Pil Heo

    Abstract: Temporal action detection (TAD) is challenging, yet fundamental for real-world video applications. Large temporal scale variation of actions is one of the most primary difficulties in TAD. Naturally, multi-scale features have potential in localizing actions of diverse lengths as widely used in object detection. Nevertheless, unlike objects in images, actions have more ambiguity in their boundaries…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: Submitted to Pattern Recognition Journal

  7. arXiv:2408.09064  [pdf, other]

    cs.CV cs.LG

    MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality

    Authors: Zhiyi Shi, Junsik Kim, Wanhua Li, Yicong Li, Hanspeter Pfister

    Abstract: Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge is the frequent occurrence of missing modalities, which impairs performance. Additionally, fine-tuning the entire pre-trained model demands substantial…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted by MICCAI 2024

  8. arXiv:2408.08990  [pdf, other]

    stat.ME cs.AI cs.CL cs.LG stat.ML

    Adaptive Uncertainty Quantification for Generative AI

    Authors: Jungeum Kim, Sean O'Hagan, Veronika Rockova

    Abstract: This work is concerned with conformal prediction in contemporary applications (including generative AI) where a black-box model has been trained on data that are not accessible to the user. Mirroring split-conformal inference, we design a wrapper around a black-box algorithm which calibrates conformity scores. This calibration is local and proceeds in two stages by first adaptively partitioning th…

    Submitted 16 August, 2024; originally announced August 2024.

  9. arXiv:2408.08631  [pdf, other]

    cs.CL

    Persona is a Double-edged Sword: Enhancing the Zero-shot Reasoning by Ensembling the Role-playing and Neutral Prompts

    Authors: Junseok Kim, Nakyeong Yang, Kyomin Jung

    Abstract: Recent studies demonstrate that prompting an appropriate role-playing persona to an LLM improves its reasoning capability. However, assigning a proper persona is difficult since an LLM's performance is extremely sensitive to assigned prompts; therefore, personas sometimes hinder LLMs and degrade their reasoning capabilities. In this paper, we propose a novel framework, Jekyll & Hyde, which ensemb…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 13 pages, 4 figures

  10. arXiv:2408.08591  [pdf, other]

    cs.CV

    Zero-Shot Dual-Path Integration Framework for Open-Vocabulary 3D Instance Segmentation

    Authors: Tri Ton, Ji Woo Hong, SooHwan Eom, Jun Yeop Shim, Junyeong Kim, Chang D. Yoo

    Abstract: Open-vocabulary 3D instance segmentation transcends traditional closed-vocabulary methods by enabling the identification of both previously seen and unseen objects in real-world scenarios. It leverages a dual-modality approach, utilizing both 3D point clouds and 2D multi-view images to generate class-agnostic object mask proposals. Previous efforts predominantly focused on enhancing 3D mask propos…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: OpenSUN 3D: 2nd Workshop on Open-Vocabulary 3D Scene Understanding (CVPR 2024)

  11. arXiv:2408.08430  [pdf, other]

    cs.LG cs.CR

    Random Gradient Masking as a Defensive Measure to Deep Leakage in Federated Learning

    Authors: Joon Kim, Sejin Park

    Abstract: Federated Learning (FL), in theory, preserves privacy of individual clients' data while producing quality machine learning models. However, attacks such as Deep Leakage from Gradients (DLG) severely question the practicality of FL. In this paper, we empirically evaluate the efficacy of four defensive methods against DLG: Masking, Clipping, Pruning, and Noising. Masking, while only previously studied…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 13 pages, 5 figures, to be submitted to Applied Intelligence

  12. arXiv:2408.07900  [pdf, other]

    cs.SI physics.soc-ph

    Network analysis reveals news press landscape and asymmetric user polarization

    Authors: Byunghwee Lee, Hyo-sun Ryu, Jae Kook Lee, Hawoong Jeong, Beom Jun Kim

    Abstract: Unlike traditional media, online news platforms allow users to consume content that suits their tastes and to facilitate interactions with other people. However, as more personalized consumption of information and interaction with like-minded users increase, ideological bias can inadvertently increase and contribute to the formation of echo chambers, reinforcing the polarization of opinions. Altho…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: 21 pages, 6 figures

  13. arXiv:2408.07757  [pdf, other]

    cs.RO

    Inverse k-visibility for RSSI-based Indoor Geometric Mapping

    Authors: Junseo Kim, Matthew Lisondra, Yeganeh Bahoo, Sajad Saeedi

    Abstract: In recent years, the increased availability of WiFi in indoor environments has gained an interest in the robotics community to leverage WiFi signals for enhancing indoor SLAM (Simultaneous Localization and Mapping) systems. SLAM technology is widely used, especially for the navigation and control of autonomous robots. This paper discusses various works in developing WiFi-based localization and cha…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE Sensors Journal for possible publication

  14. arXiv:2408.07326  [pdf, other]

    cs.AR

    LPU: A Latency-Optimized and Highly Scalable Processor for Large Language Model Inference

    Authors: Seungjae Moon, Jung-Hoon Kim, Junsoo Kim, Seongmin Hong, Junseo Cha, Minsu Kim, Sukbin Lim, Gyubin Choi, Dongjin Seo, Jongho Kim, Hunjong Lee, Hyunjun Park, Ryeowook Ko, Soongyu Choi, Jongse Park, Jinwon Lee, Joo-Young Kim

    Abstract: The explosive arrival of OpenAI's ChatGPT has fueled the globalization of large language model (LLM), which consists of billions of pretrained parameters that embodies the aspects of syntax and semantics. HyperAccel introduces latency processing unit (LPU), a latency-optimized and highly scalable processor architecture for the acceleration of LLM inference. LPU perfectly balances the memory bandwi…

    Submitted 14 August, 2024; originally announced August 2024.

  15. arXiv:2408.07233  [pdf]

    q-bio.GN cs.LG

    Pan-cancer gene set discovery via scRNA-seq for optimal deep learning based downstream tasks

    Authors: Jong Hyun Kim, Jongseong Jang

    Abstract: The application of machine learning to transcriptomics data has led to significant advances in cancer research. However, the high dimensionality and complexity of RNA sequencing (RNA-seq) data pose significant challenges in pan-cancer studies. This study hypothesizes that gene sets derived from single-cell RNA sequencing (scRNA-seq) data will outperform those selected using bulk RNA-seq in pan-can…

    Submitted 13 August, 2024; originally announced August 2024.

    Comments: 16 pages, 3 figures, 1 table, and 6 supplementary tables

  16. arXiv:2408.06276  [pdf, other]

    cs.CL

    Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation

    Authors: Jieyong Kim, Hyunseo Kim, Hyunjin Cho, SeongKu Kang, Buru Chang, Jinyoung Yeo, Dongha Lee

    Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated exceptional performance across a wide range of tasks, generating significant interest in their application to recommendation systems. However, existing methods have not fully capitalized on the potential of LLMs, often constrained by limited input information or failing to fully utilize their advanced reasoning capabilities. To…

    Submitted 13 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

  17. Blind-Match: Efficient Homomorphic Encryption-Based 1:N Matching for Privacy-Preserving Biometric Identification

    Authors: Hyunmin Choi, Jiwon Kim, Chiyoung Song, Simon S. Woo, Hyoungshick Kim

    Abstract: We present Blind-Match, a novel biometric identification system that leverages homomorphic encryption (HE) for efficient and privacy-preserving 1:N matching. Blind-Match introduces a HE-optimized cosine similarity computation method, where the key idea is to divide the feature vector into smaller parts for processing rather than computing the entire vector at once. By optimizing the number of thes…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted to CIKM 2024 (Applied Research Track)

  18. arXiv:2408.06010  [pdf, other]

    cs.CV

    DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation

    Authors: Jisoo Kim, Jungbin Cho, Joonho Park, Soonmin Hwang, Da Eun Kim, Geon Kim, Youngjae Yu

    Abstract: Speech-driven 3D facial animation has garnered lots of attention thanks to its broad range of applications. Despite recent advancements in achieving realistic lip motion, current methods fail to capture the nuanced emotional undertones conveyed through speech and produce monotonous facial motion. These limitations result in blunt and repetitive facial animations, reducing user engagement and hinde…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: First two authors contributed equally

  19. arXiv:2408.05955  [pdf, other]

    cs.CV

    Probabilistic Vision-Language Representation for Weakly Supervised Temporal Action Localization

    Authors: Geuntaek Lim, Hyunwoo Kim, Joonsoo Kim, Yukyung Choi

    Abstract: Weakly supervised temporal action localization (WTAL) aims to detect action instances in untrimmed videos using only video-level annotations. Since many existing works optimize WTAL models based on action classification labels, they encounter the task discrepancy problem (i.e., localization-by-classification). To tackle this issue, recent studies have attempted to utilize action category names as…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: Accepted to ACM MM 2024

  20. arXiv:2408.04990  [pdf, ps, other]

    eess.SP cs.IT

    Stochastic Geometry Analysis of RIS-Assisted Cellular Networks with Reflective Intelligent Surfaces on Roads

    Authors: Chang-Sik Choi, Junhyeong Kim, Junil Choi

    Abstract: Reconfigurable intelligent surfaces (RISs) provide alternative routes for reflected signals to network users, offering numerous applications. This paper explores an innovative approach of strategically deploying RISs along road areas to leverage various propagation and blockage conditions present in cellular networks with roads. To address the local network geometries shown by such networks, we us…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: accepted to IEEE Transactions on Communications

  21. arXiv:2408.04874  [pdf, other]

    cs.HC

    DG Comics: Semi-Automatically Authoring Graph Comics for Dynamic Graphs

    Authors: Joohee Kim, Hyunwook Lee, Duc M. Nguyen, Minjeong Shin, Bum Chul Kwon, Sungahn Ko, Niklas Elmqvist

    Abstract: Comics are an effective method for sequential data-driven storytelling, especially for dynamic graphs -- graphs whose vertices and edges change over time. However, manually creating such comics is currently time-consuming, complex, and error-prone. In this paper, we propose DG Comics, a novel comic authoring tool for dynamic graphs that allows users to semi-automatically build and annotate comics.…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: To appear in IEEE Transactions on Visualization and Computer Graphics

  22. arXiv:2408.04693  [pdf, other]

    cs.CL cs.AI cs.LG

    Understanding the Performance and Estimating the Cost of LLM Fine-Tuning

    Authors: Yuchen Xia, Jiho Kim, Yuhan Chen, Haojie Ye, Souvik Kundu, Cong Hao, Nishil Talati

    Abstract: Due to the cost-prohibitive nature of training Large Language Models (LLMs), fine-tuning has emerged as an attractive alternative for specializing LLMs for specific tasks using limited compute resources in a cost-effective manner. In this paper, we characterize sparse Mixture of Experts (MoE) based LLM fine-tuning to understand their accuracy and runtime performance on a single GPU. Our evaluation…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 10 pages, conference

  23. arXiv:2408.04266  [pdf, other]

    cs.RO eess.SY

    BPMP-Tracker: A Versatile Aerial Target Tracker Using Bernstein Polynomial Motion Primitives

    Authors: Yunwoo Lee, Jungwon Park, Boseong Jeon, Seungwoo Jung, H. Jin Kim

    Abstract: This letter presents a versatile trajectory planning pipeline for aerial tracking. The proposed tracker is capable of handling various chasing settings such as complex unstructured environments, crowded dynamic obstacles and multiple-target following. Among the entire pipeline, we focus on developing a predictor for future target motion and a chasing trajectory planner. For rapid computation, we e…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 8 pages, 9 figures

  24. arXiv:2408.03612  [pdf, other]

    cs.CV cs.LG

    JARViS: Detecting Actions in Video Using Unified Actor-Scene Context Relation Modeling

    Authors: Seok Hwan Lee, Taein Son, Soo Won Seo, Jisong Kim, Jun Won Choi

    Abstract: Video action detection (VAD) is a formidable vision task that involves the localization and classification of actions within the spatial and temporal dimensions of a video clip. Among the myriad VAD architectures, two-stage VAD methods utilize a pre-trained person detector to extract the region of interest features, subsequently employing these features for action detection. However, the performan…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 31 pages, 10 figures

  25. arXiv:2408.03551  [pdf, other]

    cs.CV cs.RO

    VPOcc: Exploiting Vanishing Point for Monocular 3D Semantic Occupancy Prediction

    Authors: Junsu Kim, Junhee Lee, Ukcheol Shin, Jean Oh, Kyungdon Joo

    Abstract: Monocular 3D semantic occupancy prediction is becoming important in robot vision due to the compactness of using a single RGB camera. However, existing methods often do not adequately account for camera perspective geometry, resulting in information imbalance along the depth range of the image. To address this issue, we propose a vanishing point (VP) guided monocular 3D semantic occupancy predicti…

    Submitted 7 August, 2024; originally announced August 2024.

  26. arXiv:2408.03541  [pdf, ps, other]

    cs.CL cs.AI

    EXAONE 3.0 7.8B Instruction Tuned Language Model

    Authors: LG AI Research, Soyoung An, Kyunghoon Bae, Eunbi Choi, Stanley Jungkyu Choi, Yemuk Choi, Seokhee Hong, Yeonjung Hong, Junwon Hwang, Hyojin Jeon, Gerrard Jeongwon Jo, Hyunjik Jo, Jiyeon Jung, Yountae Jung, Euisoon Kim, Hyosang Kim, Joonkee Kim, Seonghwan Kim, Soyeon Kim, Sunkyoung Kim, Yireun Kim, Youchul Kim, Edward Hwayoung Lee, Haeju Lee, et al. (14 additional authors not shown)

    Abstract: We introduce EXAONE 3.0 instruction-tuned language model, the first open model in the family of Large Language Models (LLMs) developed by LG AI Research. Among different model sizes, we publicly release the 7.8B instruction-tuned model to promote open research and innovations. Through extensive evaluations across a wide range of public and in-house benchmarks, EXAONE 3.0 demonstrates highly compet…

    Submitted 13 August, 2024; v1 submitted 7 August, 2024; originally announced August 2024.

  27. arXiv:2408.02888  [pdf, other]

    cs.CV cs.AI

    VizECGNet: Visual ECG Image Network for Cardiovascular Diseases Classification with Multi-Modal Training and Knowledge Distillation

    Authors: Ju-Hyeon Nam, Seo-Hyung Park, Su Jung Kim, Sang-Chul Lee

    Abstract: An electrocardiogram (ECG) captures the heart's electrical signal to assess various heart conditions. In practice, ECG data is stored as either digitized signals or printed images. Despite the emergence of numerous deep learning models for digitized signals, many hospitals prefer image storage due to cost considerations. Recognizing the unavailability of raw ECG signals in many clinical settings,…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Accepted in International Conference on Image Processing (ICIP) 2024

  28. arXiv:2408.02883  [pdf, other]

    cs.HC cs.SI

    "Sharing, Not Showing Off": How BeReal Approaches Authentic Self-Presentation on Social Media Through Its Design

    Authors: JaeWon Kim, Robert Wolfe, Ishita Chordia, Katie Davis, Alexis Hiniker

    Abstract: Adolescents are particularly vulnerable to the pressures created by social media, such as heightened self-consciousness and the need for extensive self-presentation. In this study, we investigate how BeReal, a social media platform designed to counter some of these pressures, influences adolescents' self-presentation behaviors. We interviewed 29 users aged 13-18 to understand their experiences wit…

    Submitted 5 August, 2024; originally announced August 2024.

  29. arXiv:2408.02582  [pdf, other]

    cs.SD cs.AI eess.AS

    Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition

    Authors: Jaeyoung Kim, Han Lu, Soheil Khorram, Anshuman Tripathi, Qian Zhang, Hasim Sak

    Abstract: Modern automatic speech recognition (ASR) systems are typically trained on more than tens of thousands hours of speech data, which is one of the main factors for their great success. However, the distribution of such data is typically biased towards common accents or typical speech patterns. As a result, those systems often poorly perform on atypical accented speech. In this paper, we present acce…

    Submitted 5 August, 2024; originally announced August 2024.

  30. arXiv:2408.01585  [pdf, other]

    cs.SE cs.AI

    OpenLogParser: Unsupervised Parsing with Open-Source Large Language Models

    Authors: Zeyang Ma, Dong Jae Kim, Tse-Hsun Chen

    Abstract: Log parsing is a critical step that transforms unstructured log data into structured formats, facilitating subsequent log-based analysis. Traditional syntax-based log parsers are efficient and effective, but they often experience decreased accuracy when processing logs that deviate from the predefined rules. Recently, large language models (LLM) based log parsers have shown superior parsing accura…

    Submitted 2 August, 2024; originally announced August 2024.

  31. arXiv:2408.01446  [pdf, other]

    cs.CY cs.AI cs.CV cs.LG

    Estimating Environmental Cost Throughout Model's Adaptive Life Cycle

    Authors: Vishwesh Sangarya, Richard Bradford, Jung-Eun Kim

    Abstract: With the rapid increase in the research, development, and application of neural networks in the current era, there is a proportional increase in the energy needed to train and use models. Crucially, this is accompanied by the increase in carbon emissions into the environment. A sustainable and socially beneficial approach to reducing the carbon footprint and rising energy demands associated with t…

    Submitted 22 July, 2024; originally announced August 2024.

    Comments: Accepted in the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society, 2024

  32. arXiv:2408.01292  [pdf]

    eess.IV cs.AI cs.CV

    3DPX: Progressive 2D-to-3D Oral Image Reconstruction with Hybrid MLP-CNN Networks

    Authors: Xiaoshuang Li, Mingyuan Meng, Zimo Huang, Lei Bi, Eduardo Delamare, Dagan Feng, Bin Sheng, Jinman Kim

    Abstract: Panoramic X-ray (PX) is a prevalent modality in dental practice for its wide availability and low cost. However, as a 2D projection image, PX does not contain 3D anatomical information, and therefore has limited use in dental applications that can benefit from 3D information, e.g., tooth angular misalignment detection and classification. Reconstructing 3D structures directly from 2D PX has recent…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: accepted by MICCAI 2024

  33. arXiv:2408.01084  [pdf, other]

    cs.CL

    Adaptive Contrastive Decoding in Retrieval-Augmented Generation for Handling Noisy Contexts

    Authors: Youna Kim, Hyuhng Joon Kim, Cheonbok Park, Choonghyun Park, Hyunsoo Cho, Junyeob Kim, Kang Min Yoo, Sang-goo Lee, Taeuk Kim

    Abstract: When using large language models (LLMs) in knowledge-intensive tasks, such as open-domain question answering, external context can bridge a gap between external knowledge and LLM's parametric knowledge. Recent research has been developed to amplify contextual knowledge over the parametric knowledge of LLM with contrastive decoding approaches. While these approaches could yield truthful responses w…

    Submitted 2 August, 2024; originally announced August 2024.

  34. arXiv:2408.00994  [pdf, other]

    cs.SE cs.AI cs.CL

    ArchCode: Incorporating Software Requirements in Code Generation with Large Language Models

    Authors: Hojae Han, Jaejin Kim, Jaeseok Yoo, Youngwon Lee, Seung-won Hwang

    Abstract: This paper aims to extend the code generation capability of large language models (LLMs) to automatically manage comprehensive software requirements from given textual descriptions. Such requirements include both functional (i.e. achieving expected behavior for inputs) and non-functional (e.g., time/space performance, robustness, maintainability) requirements. However, textual descriptions can eit…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted by ACL 2024 main conference

  35. arXiv:2408.00380  [pdf, other]

    cs.LG cs.AI cs.CV

    Enhancing Whole Slide Pathology Foundation Models through Stain Normalization

    Authors: Juseung Yun, Yi Hu, Jinhyung Kim, Jongseong Jang, Soonyoung Lee

    Abstract: Recent advancements in digital pathology have led to the development of numerous foundational models that utilize self-supervised learning on patches extracted from gigapixel whole slide images (WSIs). While this approach leverages vast amounts of unlabeled data, we have discovered a significant issue: features extracted from these self-supervised models tend to cluster by individual WSIs, a pheno…

    Submitted 4 August, 2024; v1 submitted 1 August, 2024; originally announced August 2024.

    Comments: 13 pages, 8 figures

  36. arXiv:2408.00351  [pdf, other]

    cs.CV

    Hierarchically Structured Neural Bones for Reconstructing Animatable Objects from Casual Videos

    Authors: Subin Jeon, In Cho, Minsu Kim, Woong Oh Cho, Seon Joo Kim

    Abstract: We propose a new framework for creating and easily manipulating 3D models of arbitrary objects using casually captured videos. Our core ingredient is a novel hierarchy deformation model, which captures motions of objects with a tree-structured bones. Our hierarchy system decomposes motions based on the granularity and reveals the correlations between parts without exploiting any prior structural k…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ECCV 2024 accepted

  37. arXiv:2408.00347  [pdf, other]

    cs.CV cs.AI

    Advancing Medical Image Segmentation: Morphology-Driven Learning with Diffusion Transformer

    Authors: Sungmin Kang, Jaeha Song, Jihie Kim

    Abstract: Understanding the morphological structure of medical images and precisely segmenting the region of interest or abnormality is an important task that can assist in diagnosis. However, the unique properties of medical imaging make clear segmentation difficult, and the high cost and time-consuming task of labeling leads to a coarse-grained representation of ground truth. Facing with these problems, w…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted in BMVC 2024

  38. Exploiting Preferences in Loss Functions for Sequential Recommendation via Weak Transitivity

    Authors: Hyunsoo Chung, Jungtaek Kim, Hyungeun Jo, Hyungwon Choi

    Abstract: A choice of optimization objective is immensely pivotal in the design of a recommender system as it affects the general modeling process of a user's intent from previous interactions. Existing approaches mainly adhere to three categories of loss functions: pairwise, pointwise, and setwise loss functions. Despite their effectiveness, a critical and common drawback of such objectives is viewing the…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Accepted to CIKM 2024, Short Research Paper Track

  39. arXiv:2407.21604  [pdf, other]

    cs.CV

    MicroMIL: Graph-based Contextual Multiple Instance Learning for Patient Diagnosis Using Microscopy Images

    Authors: JongWoo Kim, Bryan Wong, YoungSin Ko, MunYong Yi

    Abstract: Current histopathology research has primarily focused on using whole-slide images (WSIs) produced by scanners with weakly-supervised multiple instance learning (MIL). However, WSIs are costly, memory-intensive, and require extensive analysis time. As an alternative, microscopy-based analysis offers cost and memory efficiency, though microscopy images face issues with unknown absolute positions and…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: The first two authors contributed equally to this work

  40. arXiv:2407.21571  [pdf, other]

    cs.CL cs.AI

    PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning

    Authors: Min Jae Jung, JooHee Kim

    Abstract: Large Language Models (LLMs) encounter significant challenges in continual learning due to catastrophic forgetting, where new information overwrites previously acquired knowledge. This limitation leads to substantial environmental and economic waste. In this study, we introduce the PMoE, Progressive Mixture of Experts with Asymmetric Transformer, which aims to minimize forgetting by utilizing an a…

    Submitted 31 July, 2024; originally announced July 2024.

  41. arXiv:2407.21448  [pdf, other]

    cs.CV

    Accelerating Image Super-Resolution Networks with Pixel-Level Classification

    Authors: Jinho Jeong, Jinwoo Kim, Younghyun Jo, Seon Joo Kim

    Abstract: In recent times, the need for effective super-resolution (SR) techniques has surged, especially for large-scale images ranging 2K to 8K resolutions. For DNN-based SISR, decomposing images into overlapping patches is typically necessary due to computational constraints. In such patch-decomposing scheme, one can allocate computational resources differently based on each patch's difficulty to further…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  42. arXiv:2407.21267  [pdf, other]

    cs.RO cs.AI cs.CV

    DEF-oriCORN: efficient 3D scene understanding for robust language-directed manipulation without demonstrations

    Authors: Dongwon Son, Sanghyeon Son, Jaehyung Kim, Beomjoon Kim

    Abstract: We present DEF-oriCORN, a framework for language-directed manipulation tasks. By leveraging a novel object-based scene representation and diffusion-model-based state estimation algorithm, our framework enables efficient and robust manipulation planning in response to verbal commands, even in tightly packed environments with sparse camera views without any demonstrations. Unlike traditional represe…

    Submitted 30 July, 2024; originally announced July 2024.

  43. arXiv:2407.21035  [pdf, other]

    cs.CV

    Direct Unlearning Optimization for Robust and Safe Text-to-Image Models

    Authors: Yong-Hyun Park, Sangdoo Yun, Jin-Hwa Kim, Junho Kim, Geonhui Jang, Yonghyun Jeong, Junghyo Jo, Gayoung Lee

    Abstract: Recent advancements in text-to-image (T2I) models have greatly benefited from large-scale datasets, but they also pose significant risks due to the potential generation of unsafe content. To mitigate this issue, researchers have developed unlearning techniques to remove the model's ability to generate potentially harmful content. However, these methods are easily bypassed by adversarial attacks, m…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Extended abstract accepted in GenLaw 2024 workshop @ ICML2024

  44. arXiv:2407.20648  [pdf, other]

    cs.LG cs.AI

    Leveraging Multi-facet Paths for Heterogeneous Graph Representation Learning

    Authors: JongWoo Kim, SeongYeub Chu, HyeongMin Park, Bryan Wong, MunYong Yi

    Abstract: Recent advancements in graph neural networks (GNNs) and heterogeneous GNNs (HGNNs) have advanced node embeddings and relationship learning for various tasks. However, existing methods often rely on domain-specific predefined meta-paths, which are coarse-grained and focus solely on aspects like node type, limiting their ability to capture complex interactions. We introduce MF2Vec, a model that uses…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 9 pages

  45. arXiv:2407.19900  [pdf, other]

    cs.SD cs.AI eess.AS

    Practical and Reproducible Symbolic Music Generation by Large Language Models with Structural Embeddings

    Authors: Seungyeon Rhyu, Kichang Yang, Sungjun Cho, Jaehyeon Kim, Kyogu Lee, Moontae Lee

    Abstract: Music generation introduces challenging complexities to large language models. Symbolic structures of music often include vertical harmonization as well as horizontal counterpoint, urging various adaptations and enhancements for large-scale Transformers. However, existing works share three major drawbacks: 1) their tokenization requires domain-specific annotations, such as bars and beats, that are…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: 9 pages, 6 figures, 4 tables

  46. arXiv:2407.19532  [pdf, other]

    cs.AI cs.LG

    The Interpretability of Codebooks in Model-Based Reinforcement Learning is Limited

    Authors: Kenneth Eaton, Jonathan Balloch, Julia Kim, Mark Riedl

    Abstract: Interpretability of deep reinforcement learning systems could assist operators with understanding how they interact with their environment. Vector quantization methods -- also called codebook methods -- discretize a neural network's latent space that is often suggested to yield emergent interpretability. We investigate whether vector quantization in fact provides interpretability in model-based re…

    Submitted 28 July, 2024; originally announced July 2024.

  47. arXiv:2407.19216  [pdf, other]

    cs.CR cs.AI cs.SE

    EaTVul: ChatGPT-based Evasion Attack Against Software Vulnerability Detection

    Authors: Shigang Liu, Di Cao, Junae Kim, Tamas Abraham, Paul Montague, Seyit Camtepe, Jun Zhang, Yang Xiang

    Abstract: Recently, deep learning has demonstrated promising results in enhancing the accuracy of vulnerability detection and identifying vulnerabilities in software. However, these techniques are still vulnerable to attacks. Adversarial examples can exploit vulnerabilities within deep neural networks, posing a significant threat to system security. This study showcases the susceptibility of deep learning m…

    Submitted 27 July, 2024; originally announced July 2024.

  48. arXiv:2407.19156  [pdf, other]

    cs.CV

    Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble

    Authors: Juhan Cha, Minseok Joo, Jihwan Park, Sanghyeok Lee, Injae Kim, Hyunwoo J. Kim

    Abstract: Recent advancements in 3D object detection have benefited from multi-modal information from the multi-view cameras and LiDAR sensors. However, the inherent disparities between the modalities pose substantial challenges. We observe that existing multi-modal 3D object detection methods heavily rely on the LiDAR sensor, treating the camera as an auxiliary modality for augmenting semantic details. Thi…

    Submitted 19 August, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

  49. arXiv:2407.18574  [pdf, other]

    cs.CV

    Learning to Enhance Aperture Phasor Field for Non-Line-of-Sight Imaging

    Authors: In Cho, Hyunbo Shim, Seon Joo Kim

    Abstract: This paper aims to facilitate more practical NLOS imaging by reducing the number of samplings and scan areas. To this end, we introduce a phasor-based enhancement network that is capable of predicting clean and full measurements from noisy partial observations. We leverage a denoising autoencoder scheme to acquire rich and noise-robust representations in the measurement space. Through this pipelin…

    Submitted 28 July, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

  50. arXiv:2407.18550  [pdf, other]

    cs.RO cs.AI

    ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments

    Authors: Taewoong Kim, Cheolhong Min, Byeonghwi Kim, Jinyeon Kim, Wonje Jeung, Jonghyun Choi

    Abstract: Simulated virtual environments have been widely used to learn robotic agents that perform daily household tasks. These environments encourage research progress by far, but often provide limited object interactability, visual appearance different from real-world environments, or relatively smaller environment sizes. This prevents the learned models in the virtual scenes from being readily deployabl…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: ECCV 2024 (Project page: https://twoongg.github.io/projects/realfred)