Skip to main content

Showing 1–50 of 205 results for author: Jung, S

Searching in archive cs. Search in all archives.
.
  1. arXiv:2408.09727  [pdf, other

    cs.RO

    Quantitative 3D Map Accuracy Evaluation Hardware and Algorithm for LiDAR(-Inertial) SLAM

    Authors: Sanghyun Hahn, Seunghun Oh, Minwoo Jung, Ayoung Kim, Sangwoo Jung

    Abstract: Accuracy evaluation of a 3D pointcloud map is crucial for the development of autonomous driving systems. In this work, we propose a user-independent software/hardware system that can quantitatively evaluate the accuracy of a 3D pointcloud map acquired from LiDAR(-Inertial) SLAM. We introduce a LiDAR target that functions robustly in the outdoor environment, while remaining observable by LiDAR. We… ▽ More

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: ICCAS 2024 accepted, 5 pages, 6 figures, 2 Tables

  2. arXiv:2408.09140  [pdf, other

    cs.LG cs.AI cs.CV

    Learning to Explore for Stochastic Gradient MCMC

    Authors: SeungHyun Kim, Seohyeon Jung, Seonghyeon Kim, Juho Lee

    Abstract: Bayesian Neural Networks(BNNs) with high-dimensional parameters pose a challenge for posterior inference due to the multi-modality of the posterior distributions. Stochastic Gradient MCMC(SGMCMC) with cyclical learning rate scheduling is a promising solution, but it requires a large number of sampling steps to explore high-dimensional multi-modal posteriors, making it computationally expensive. In… ▽ More

    Submitted 17 August, 2024; originally announced August 2024.

  3. arXiv:2408.06707  [pdf, other

    cs.CV

    MAIR++: Improving Multi-view Attention Inverse Rendering with Implicit Lighting Representation

    Authors: JunYong Choi, SeokYeong Lee, Haesol Park, Seung-Won Jung, Ig-Jae Kim, Junghyun Cho

    Abstract: In this paper, we propose a scene-level inverse rendering framework that uses multi-view images to decompose the scene into geometry, SVBRDF, and 3D spatially-varying lighting. While multi-view images have been widely used for object-level inverse rendering, scene-level inverse rendering has primarily been studied using single-view images due to the lack of a dataset containing high dynamic range… ▽ More

    Submitted 13 August, 2024; originally announced August 2024.

  4. arXiv:2408.04266  [pdf, other

    cs.RO eess.SY

    BPMP-Tracker: A Versatile Aerial Target Tracker Using Bernstein Polynomial Motion Primitives

    Authors: Yunwoo Lee, Jungwon Park, Boseong Jeon, Seungwoo Jung, H. Jin Kim

    Abstract: This letter presents a versatile trajectory planning pipeline for aerial tracking. The proposed tracker is capable of handling various chasing settings such as complex unstructured environments, crowded dynamic obstacles and multiple-target following. Among the entire pipeline, we focus on developing a predictor for future target motion and a chasing trajectory planner. For rapid computation, we e… ▽ More

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: 8 pages, 9 figures

  5. arXiv:2408.04190  [pdf, other

    cs.LG cs.AI

    Listwise Reward Estimation for Offline Preference-based Reinforcement Learning

    Authors: Heewoong Choi, Sangwon Jung, Hongjoon Ahn, Taesup Moon

    Abstract: In Reinforcement Learning (RL), designing precise reward functions remains to be a challenge, particularly when aligning with human intent. Preference-based RL (PbRL) was introduced to address this problem by learning reward models from human feedback. However, existing PbRL methods have limitations as they often overlook the second-order preference that indicates the relative strength of preferen… ▽ More

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: 21 pages, ICML 2024

  6. arXiv:2407.21032  [pdf, other

    cs.CV cs.AI

    Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion

    Authors: Sanghyun Kim, Seohyeon Jung, Balhae Kim, Moonseok Choi, Jinwoo Shin, Juho Lee

    Abstract: This paper addresses the societal concerns arising from large-scale text-to-image diffusion models for generating potentially harmful or copyrighted content. Existing models rely heavily on internet-crawled data, wherein problematic concepts persist due to incomplete filtration processes. While previous approaches somewhat alleviate the issue, they often rely on text-specified concepts, introducin… ▽ More

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. 56 pages, 24 figures. Caution: This paper contains discussions and examples related to harmful content, including text and images. Reader discretion is advised. Code is available at https://github.com/nannullna/safeguard-hfi

  7. arXiv:2407.18442  [pdf, other

    cs.CL

    Guidance-Based Prompt Data Augmentation in Specialized Domains for Named Entity Recognition

    Authors: Hyeonseok Kang, Hyein Seo, Jeesu Jung, Sangkeun Jung, Du-Seong Chang, Riwoo Chung

    Abstract: While the abundance of rich and vast datasets across numerous fields has facilitated the advancement of natural language processing, sectors in need of specialized data types continue to struggle with the challenge of finding quality data. Our study introduces a novel guidance data augmentation technique utilizing abstracted context and sentence structures to produce varied sentences while maintai… ▽ More

    Submitted 25 July, 2024; originally announced July 2024.

  8. arXiv:2407.17546  [pdf, other

    cs.LG cs.AI cs.CL

    Exploring Domain Robust Lightweight Reward Models based on Router Mechanism

    Authors: Hyuk Namgoong, Jeesu Jung, Sangkeun Jung, Yoonhyung Roh

    Abstract: Recent advancements in large language models have heavily relied on the large reward model from reinforcement learning from human feedback for fine-tuning. However, the use of a single reward model across various domains may not always be optimal, often requiring retraining from scratch when new domain data is introduced. To address these challenges, we explore the utilization of small language mo… ▽ More

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: This paper is accepted for ACL 2024

  9. arXiv:2407.11682  [pdf, other

    cs.CV

    MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation

    Authors: Xiaoshuai Hao, Ruikai Li, Hui Zhang, Dingzhe Li, Rong Yin, Sangil Jung, Seung-In Park, ByungIn Yoo, Haimei Zhao, Jing Zhang

    Abstract: Online high-definition (HD) map construction is an important and challenging task in autonomous driving. Recently, there has been a growing interest in cost-effective multi-view camera-based methods without relying on other sensors like LiDAR. However, these methods suffer from a lack of explicit depth information, necessitating the use of large models to achieve satisfactory performance. To addre… ▽ More

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  10. arXiv:2407.11348  [pdf, other

    cs.CV

    Flatfish Disease Detection Based on Part Segmentation Approach and Disease Image Generation

    Authors: Seo-Bin Hwang, Han-Young Kim, Chae-Yeon Heo, Hie-Yong Jung, Sung-Ju Jung, Yeong-Jun Cho

    Abstract: The flatfish is a major farmed species consumed globally in large quantities. However, due to the densely populated farming environment, flatfish are susceptible to injuries and diseases, making early disease detection crucial. Traditionally, diseases were detected through visual inspection, but observing large numbers of fish is challenging. Automated approaches based on deep learning technologie… ▽ More

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 16 page, 13 figures, 4 tables

  11. arXiv:2407.09184  [pdf, other

    cs.CL

    Does Incomplete Syntax Influence Korean Language Model? Focusing on Word Order and Case Markers

    Authors: Jong Myoung Kim, Young-Jun Lee, Yong-jin Han, Sangkeun Jung, Ho-Jin Choi

    Abstract: Syntactic elements, such as word order and case markers, are fundamental in natural language processing. Recent studies show that syntactic information boosts language model performance and offers clues for people to understand their learning mechanisms. Unlike languages with a fixed word order such as English, Korean allows for varied word sequences, despite its canonical structure, due to case m… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: COLM 2024; Code and dataset is available in https://github.com/grayapple-git/SIKO

  12. arXiv:2407.05820  [pdf, other

    cs.RO

    Co-RaL: Complementary Radar-Leg Odometry with 4-DoF Optimization and Rolling Contact

    Authors: Sangwoo Jung, Wooseong Yang, Ayoung Kim

    Abstract: Robust and accurate localization in challenging environments is becoming crucial for SLAM. In this paper, we propose a unique sensor configuration for precise and robust odometry by integrating chip radar and a legged robot. Specifically, we introduce a tightly coupled radar-leg odometry algorithm for complementary drift correction. Adopting the 4-DoF optimization and decoupled RANSAC to mmWave ch… ▽ More

    Submitted 10 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: IROS 2024 accepted, 8 pages, 7 figures, 4 Tables

  13. arXiv:2406.16994  [pdf, other

    eess.SP cs.AI

    Quantum Multi-Agent Reinforcement Learning for Cooperative Mobile Access in Space-Air-Ground Integrated Networks

    Authors: Gyu Seon Kim, Yeryeong Cho, Jaehyun Chung, Soohyun Park, Soyi Jung, Zhu Han, Joongheon Kim

    Abstract: Achieving global space-air-ground integrated network (SAGIN) access only with CubeSats presents significant challenges such as the access sustainability limitations in specific regions (e.g., polar regions) and the energy efficiency limitations in CubeSats. To tackle these problems, high-altitude long-endurance unmanned aerial vehicles (HALE-UAVs) can complement these CubeSat shortcomings for prov… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: 17 pages, 22 figures

  14. Lesion-Aware Cross-Phase Attention Network for Renal Tumor Subtype Classification on Multi-Phase CT Scans

    Authors: Kwang-Hyun Uhm, Seung-Won Jung, Sung-Hoo Hong, Sung-Jea Ko

    Abstract: Multi-phase computed tomography (CT) has been widely used for the preoperative diagnosis of kidney cancer due to its non-invasive nature and ability to characterize renal lesions. However, since enhancement patterns of renal lesions across CT phases are different even for the same lesion type, the visual assessment by radiologists suffers from inter-observer variability in clinical practice. Altho… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: This article has been accepted for publication in Computers in Biology and Medicine

    Journal ref: Computers in Biology and Medicine, 108746, 2024

  15. arXiv:2405.10944  [pdf, other

    physics.chem-ph cs.LG

    Probabilistic transfer learning methodology to expedite high fidelity simulation of reactive flows

    Authors: Bruno S. Soriano, Ki Sung Jung, Tarek Echekki, Jacqueline H. Chen, Mohammad Khalil

    Abstract: Reduced order models based on the transport of a lower dimensional manifold representation of the thermochemical state, such as Principal Component (PC) transport and Machine Learning (ML) techniques, have been developed to reduce the computational cost associated with the Direct Numerical Simulations (DNS) of reactive flows. Both PC transport and ML normally require an abundance of data to exhibi… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

  16. arXiv:2404.18063  [pdf, other

    cs.LG physics.flu-dyn

    Machine Learning Techniques for Data Reduction of CFD Applications

    Authors: Jaemoon Lee, Ki Sung Jung, Qian Gong, Xiao Li, Scott Klasky, Jacqueline Chen, Anand Rangarajan, Sanjay Ranka

    Abstract: We present an approach called guaranteed block autoencoder that leverages Tensor Correlations (GBATC) for reducing the spatiotemporal data generated by computational fluid dynamics (CFD) and other scientific applications. It uses a multidimensional block of tensors (spanning in space and time) for both input and output, capturing the spatiotemporal and interspecies relationship within a tensor. Th… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 10 pages, 8 figures

  17. arXiv:2404.14664  [pdf, ps, other

    cs.LG cs.AI

    Employing Layerwised Unsupervised Learning to Lessen Data and Loss Requirements in Forward-Forward Algorithms

    Authors: Taewook Hwang, Hyein Seo, Sangkeun Jung

    Abstract: Recent deep learning models such as ChatGPT utilizing the back-propagation algorithm have exhibited remarkable performance. However, the disparity between the biological brain processes and the back-propagation algorithm has been noted. The Forward-Forward algorithm, which trains deep learning models solely through the forward pass, has emerged to address this. Although the Forward-Forward algorit… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 8 pages, 6 figures

  18. arXiv:2404.06021  [pdf, other

    cs.HC

    Combinational Nonuniform Timeslicing of Dynamic Networks

    Authors: Seokweon Jung, DongHwa Shin, Hyeon Jeon, Jinwook Seo

    Abstract: Dynamic networks represent the complex and evolving interrelationships between real-world entities. Given the scale and variability of these networks, finding an optimal slicing interval is essential for meaningful analysis. Nonuniform timeslicing, which adapts to density changes within the network, is drawing attention as a solution to this problem. In this research, we categorized existing algor… ▽ More

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: PacificVis2024 poster

  19. arXiv:2404.04819  [pdf, other

    cs.CV

    Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer

    Authors: Hyeongjin Nam, Daniel Sungho Jung, Gyeongsik Moon, Kyoung Mu Lee

    Abstract: Human-object contact serves as a strong cue to understand how humans physically interact with objects. Nevertheless, it is not widely explored to utilize human-object contact information for the joint reconstruction of 3D human and object from a single image. In this work, we present a novel joint 3D human-object reconstruction method (CONTHO) that effectively exploits contact information between… ▽ More

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: Published at CVPR 2024, 19 pages including the supplementary material

  20. arXiv:2404.04656  [pdf, other

    cs.LG cs.AI cs.CL

    Binary Classifier Optimization for Large Language Model Alignment

    Authors: Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-Woon On

    Abstract: Aligning Large Language Models (LLMs) to human preferences through preference optimization has been crucial but labor-intensive, necessitating for each prompt a comparison of both a chosen and a rejected text completion by evaluators. Recently, Kahneman-Tversky Optimization (KTO) has demonstrated that LLMs can be aligned using merely binary "thumbs-up" or "thumbs-down" signals on each prompt-compl… ▽ More

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 18 pages, 9 figures

  21. arXiv:2404.03991  [pdf, other

    eess.IV cs.CV cs.LG

    Towards Efficient and Accurate CT Segmentation via Edge-Preserving Probabilistic Downsampling

    Authors: Shahzad Ali, Yu Rim Lee, Soo Young Park, Won Young Tak, Soon Ki Jung

    Abstract: Downsampling images and labels, often necessitated by limited resources or to expedite network training, leads to the loss of small objects and thin boundaries. This undermines the segmentation network's capacity to interpret images accurately and predict detailed labels, resulting in diminished performance compared to processing at original resolutions. This situation exemplifies the trade-off be… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: 5 pages (4 figures, 1 table); This work has been submitted to the IEEE Signal Processing Letters. Copyright may be transferred without notice, after which this version may no longer be accessible

  22. arXiv:2404.01954  [pdf, other

    cs.CL cs.AI

    HyperCLOVA X Technical Report

    Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

    Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More

    Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: 44 pages; updated authors list and fixed author names

  23. arXiv:2404.01690  [pdf, other

    cs.CV

    RefQSR: Reference-based Quantization for Image Super-Resolution Networks

    Authors: Hongjae Lee, Jun-Sang Yoo, Seung-Won Jung

    Abstract: Single image super-resolution (SISR) aims to reconstruct a high-resolution image from its low-resolution observation. Recent deep learning-based SISR models show high performance at the expense of increased computational costs, limiting their use in resource-constrained environments. As a promising solution for computationally efficient network design, network quantization has been extensively stu… ▽ More

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Transactions on Image Processing (TIP)

  24. Imaging radar and LiDAR image translation for 3-DOF extrinsic calibration

    Authors: Sangwoo Jung, Hyesu Jang, Minwoo Jung, Ayoung Kim, Myung-Hwan Jeon

    Abstract: The integration of sensor data is crucial in the field of robotics to take full advantage of the various sensors employed. One critical aspect of this integration is determining the extrinsic calibration parameters, such as the relative transformation, between each sensor. The use of data fusion between complementary sensors, such as radar and LiDAR, can provide significant benefits, particularly… ▽ More

    Submitted 27 March, 2024; originally announced March 2024.

  25. arXiv:2403.17330  [pdf, other

    cs.CV

    Staircase Localization for Autonomous Exploration in Urban Environments

    Authors: Jinrae Kim, Sunggoo Jung, Sung-Kyun Kim, Youdan Kim, Ali-akbar Agha-mohammadi

    Abstract: A staircase localization method is proposed for robots to explore urban environments autonomously. The proposed method employs a modular design in the form of a cascade pipeline consisting of three modules of stair detection, line segment detection, and stair localization modules. The stair detection module utilizes an object detection algorithm based on deep learning to generate a region of inter… ▽ More

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: 9 pages, 10 figures

  26. arXiv:2403.09193  [pdf, other

    cs.CV cs.AI cs.LG q-bio.NC

    Are Vision Language Models Texture or Shape Biased and Can We Steer Them?

    Authors: Paul Gavrikov, Jovita Lukasik, Steffen Jung, Robert Geirhos, Bianca Lamm, Muhammad Jehanzeb Mirza, Margret Keuper, Janis Keuper

    Abstract: Vision language models (VLMs) have drastically changed the computer vision model landscape in only a few years, opening an exciting array of new applications from zero-shot image classification, over to image captioning, and visual question answering. Unlike pure vision models, they offer an intuitive way to access visual content through language prompting. The wide applicability of such models en… ▽ More

    Submitted 14 March, 2024; originally announced March 2024.

  27. arXiv:2403.09055  [pdf, other

    cs.CV

    StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control

    Authors: Jaerin Lee, Daniel Sungho Jung, Kanggeon Lee, Kyoung Mu Lee

    Abstract: The enormous success of diffusion models in text-to-image synthesis has made them promising candidates for the next generation of end-user applications for image generation and editing. Previous works have focused on improving the usability of diffusion models by reducing the inference time or increasing user interactivity by allowing new, fine-grained controls such as region-based text prompts. H… ▽ More

    Submitted 1 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: 29 pages, 16 figures. v2: typos corrected, references added. Project page: https://jaerinlee.com/research/StreamMultiDiffusion

  28. arXiv:2403.08639  [pdf, other

    cs.CV

    HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction

    Authors: Yi Zhou, Hui Zhang, Jiaqian Yu, Yifan Yang, Sangil Jung, Seung-In Park, ByungIn Yoo

    Abstract: Vectorized High-Definition (HD) map construction requires predictions of the category and point coordinates of map elements (e.g. road boundary, lane divider, pedestrian crossing, etc.). State-of-the-art methods are mainly based on point-level representation learning for regressing accurate point coordinates. However, this pipeline has limitations in obtaining element-level information and handlin… ▽ More

    Submitted 26 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  29. arXiv:2403.05093  [pdf, other

    cs.CV eess.IV

    Spectrum Translation for Refinement of Image Generation (STIG) Based on Contrastive Learning and Spectral Filter Profile

    Authors: Seokjun Lee, Seung-Won Jung, Hyunseok Seo

    Abstract: Currently, image generation and synthesis have remarkably progressed with generative models. Despite photo-realistic results, intrinsic discrepancies are still observed in the frequency domain. The spectral discrepancy appeared not only in generative adversarial networks but in diffusion models. In this study, we propose a framework to effectively mitigate the disparity in frequency domain of the… ▽ More

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: Accepted to AAAI 2024

  30. arXiv:2402.09754  [pdf, other

    stat.ML cs.LG math.ST

    Robust SVD Made Easy: A fast and reliable algorithm for large-scale data analysis

    Authors: Sangil Han, Kyoowon Kim, Sungkyu Jung

    Abstract: The singular value decomposition (SVD) is a crucial tool in machine learning and statistical data analysis. However, it is highly susceptible to outliers in the data matrix. Existing robust SVD algorithms often sacrifice speed for robustness or fail in the presence of only a few outliers. This study introduces an efficient algorithm, called Spherically Normalized SVD, for robust SVD approximation… ▽ More

    Submitted 15 February, 2024; originally announced February 2024.

  31. arXiv:2402.05195  [pdf, other

    cs.CV cs.CL

    $λ$-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

    Authors: Maitreya Patel, Sangmin Jung, Chitta Baral, Yezhou Yang

    Abstract: Despite the recent advances in personalized text-to-image (P-T2I) generative models, it remains challenging to perform finetuning-free multi-subject-driven T2I in a resource-efficient manner. Predominantly, contemporary approaches, involving the training of Hypernetworks and Multimodal Large Language Models (MLLMs), require heavy computing resources that range from 600 to 12300 GPU hours of traini… ▽ More

    Submitted 9 April, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

    Comments: Project page: https://eclipse-t2i.github.io/Lambda-ECLIPSE/

  32. arXiv:2401.13921  [pdf, other

    eess.AS cs.SD

    Intelli-Z: Toward Intelligible Zero-Shot TTS

    Authors: Sunghee Jung, Won Jang, Jaesam Yoon, Bongwan Kim

    Abstract: Although numerous recent studies have suggested new frameworks for zero-shot TTS using large-scale, real-world data, studies that focus on the intelligibility of zero-shot TTS are relatively scarce. Zero-shot TTS demands additional efforts to ensure clear pronunciation and speech quality due to its inherent requirement of replacing a core parameter (speaker embedding or acoustic prompt) with a new… ▽ More

    Submitted 24 January, 2024; originally announced January 2024.

  33. arXiv:2401.13146  [pdf, other

    eess.AS cs.CL cs.SD

    Locality enhanced dynamic biasing and sampling strategies for contextual ASR

    Authors: Md Asif Jalal, Pablo Peso Parada, George Pavlidis, Vasileios Moschopoulos, Karthikeyan Saravanan, Chrysovalantis-Giorgos Kontoulis, Jisi Zhang, Anastasios Drosou, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: Automatic Speech Recognition (ASR) still face challenges when recognizing time-variant rare-phrases. Contextual biasing (CB) modules bias ASR model towards such contextually-relevant phrases. During training, a list of biasing phrases are selected from a large pool of phrases following a sampling strategy. In this work we firstly analyse different sampling strategies to provide insights into the t… ▽ More

    Submitted 23 January, 2024; originally announced January 2024.

    Comments: Accepted for IEEE ASRU 2023

  34. arXiv:2401.12326  [pdf, other

    cs.CL cs.AI

    Fine-tuning Large Language Models for Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection

    Authors: Feng Xiong, Thanet Markchom, Ziwei Zheng, Subin Jung, Varun Ojha, Huizhi Liang

    Abstract: SemEval-2024 Task 8 introduces the challenge of identifying machine-generated texts from diverse Large Language Models (LLMs) in various languages and domains. The task comprises three subtasks: binary classification in monolingual and multilingual (Subtask A), multi-class classification (Subtask B), and mixed text detection (Subtask C). This paper focuses on Subtask A & B. Each subtask is support… ▽ More

    Submitted 22 January, 2024; originally announced January 2024.

  35. arXiv:2401.12085  [pdf, other

    eess.AS cs.SD

    Consistency Based Unsupervised Self-training For ASR Personalisation

    Authors: Jisi Zhang, Vandana Rajan, Haaris Mehmood, David Tuckey, Pablo Peso Parada, Md Asif Jalal, Karthikeyan Saravanan, Gil Ho Lee, Jungin Lee, Seokyeong Jung

    Abstract: On-device Automatic Speech Recognition (ASR) models trained on speech data of a large population might underperform for individuals unseen during training. This is due to a domain shift between user data and the original training data, differed by user's speaking characteristics and environmental acoustic conditions. ASR personalisation is a solution that aims to exploit user data to improve model… ▽ More

    Submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted for IEEE ASRU 2023

  36. arXiv:2401.04143  [pdf, other

    cs.CV

    RHOBIN Challenge: Reconstruction of Human Object Interaction

    Authors: Xianghui Xie, Xi Wang, Nikos Athanasiou, Bharat Lal Bhatnagar, Chun-Hao P. Huang, Kaichun Mo, Hao Chen, Xia Jia, Zerui Zhang, Liangxian Cui, Xiao Lin, Bingqiao Qian, Jie Xiao, Wenfei Yang, Hyeongjin Nam, Daniel Sungho Jung, Kihoon Kim, Kyoung Mu Lee, Otmar Hilliges, Gerard Pons-Moll

    Abstract: Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is however a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose, and object pose but also the interaction between them. Reconstruction of 3D humans and objects has been two separate resear… ▽ More

    Submitted 7 January, 2024; originally announced January 2024.

    Comments: 14 pages, 5 tables, 7 figure. Technical report of the CVPR'23 workshop: RHOBIN challenge (https://rhobin-challenge.github.io/)

  37. arXiv:2312.16016  [pdf, other

    cs.RO

    V-STRONG: Visual Self-Supervised Traversability Learning for Off-road Navigation

    Authors: Sanghun Jung, JoonHo Lee, Xiangyun Meng, Byron Boots, Alexander Lambert

    Abstract: Reliable estimation of terrain traversability is critical for the successful deployment of autonomous systems in wild, outdoor environments. Given the lack of large-scale annotated datasets for off-road navigation, strictly-supervised learning approaches remain limited in their generalization ability. To this end, we introduce a novel, image-based self-supervised learning method for traversability… ▽ More

    Submitted 15 March, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: ICRA 2024; 8 pages

  38. arXiv:2312.05548  [pdf, other

    eess.IV cs.CV cs.LG

    A Unified Multi-Phase CT Synthesis and Classification Framework for Kidney Cancer Diagnosis with Incomplete Data

    Authors: Kwang-Hyun Uhm, Seung-Won Jung, Moon Hyung Choi, Sung-Hoo Hong, Sung-Jea Ko

    Abstract: Multi-phase CT is widely adopted for the diagnosis of kidney cancer due to the complementary information among phases. However, the complete set of multi-phase CT is often not available in practical clinical applications. In recent years, there have been some studies to generate the missing modality image from the available data. Nevertheless, the generated images are not guaranteed to be effectiv… ▽ More

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: This article has been accepted for publication in IEEE Journal of Biomedical and Health Informatics

    Journal ref: JBHI, 2022

  39. arXiv:2312.05528  [pdf, other

    eess.IV cs.CV

    Exploring 3D U-Net Training Configurations and Post-Processing Strategies for the MICCAI 2023 Kidney and Tumor Segmentation Challenge

    Authors: Kwang-Hyun Uhm, Hyunjun Cho, Zhixin Xu, Seohoon Lim, Seung-Won Jung, Sung-Hoo Hong, Sung-Jea Ko

    Abstract: In 2023, it is estimated that 81,800 kidney cancer cases will be newly diagnosed, and 14,890 people will die from this cancer in the United States. Preoperative dynamic contrast-enhanced abdominal computed tomography (CT) is often used for detecting lesions. However, there exists inter-observer variability due to subtle differences in the imaging features of kidney and kidney tumors. In this paper… ▽ More

    Submitted 9 December, 2023; originally announced December 2023.

    Comments: MICCAI 2023, KITS 2023 challenge 2nd place

  40. arXiv:2312.01638  [pdf, other

    eess.IV cs.CV

    J-Net: Improved U-Net for Terahertz Image Super-Resolution

    Authors: Woon-Ha Yeo, Seung-Hwan Jung, Seung Jae Oh, Inhee Maeng, Eui Su Lee, Han-Cheol Ryu

    Abstract: Terahertz (THz) waves are electromagnetic waves in the 0.1 to 10 THz frequency range, and THz imaging is utilized in a range of applications, including security inspections, biomedical fields, and the non-destructive examination of materials. However, THz images have low resolution due to the long wavelength of THz waves. Therefore, improving the resolution of THz images is one of the current hot… ▽ More

    Submitted 4 December, 2023; originally announced December 2023.

  41. arXiv:2312.00356  [pdf, other

    physics.chem-ph cs.LG

    Transfer learning for predicting source terms of principal component transport in chemically reactive flow

    Authors: Ki Sung Jung, Tarek Echekki, Jacqueline H. Chen, Mohammad Khalil

    Abstract: The objective of this study is to evaluate whether the number of requisite training samples can be reduced with the use of various transfer learning models for predicting, for example, the chemical source terms of the data-driven reduced-order model that represents the homogeneous ignition process of a hydrogen/air mixture. Principal component analysis is applied to reduce the dimensionality of th… ▽ More

    Submitted 1 December, 2023; originally announced December 2023.

    Comments: 41 pages, 14 figures

  42. arXiv:2311.15683  [pdf

    eess.AS cs.SD eess.SP

    Ultrasensitive Textile Strain Sensors Redefine Wearable Silent Speech Interfaces with High Machine Learning Efficiency

    Authors: Chenyu Tang, Muzi Xu, Wentian Yi, Zibo Zhang, Edoardo Occhipinti, Chaoqun Dong, Dafydd Ravenscroft, Sung-Min Jung, Sanghyo Lee, Shuo Gao, Jong Min Kim, Luigi G. Occhipinti

    Abstract: Our research presents a wearable Silent Speech Interface (SSI) technology that excels in device comfort, time-energy efficiency, and speech decoding accuracy for real-world use. We developed a biocompatible, durable textile choker with an embedded graphene-based strain sensor, capable of accurately detecting subtle throat movements. This sensor, surpassing other strain sensors in sensitivity by 42… ▽ More

    Submitted 7 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

    Comments: 5 figures in the article; 11 figures and 4 tables in supplementary information

    Journal ref: npj Flexible Electronics (2024)

  43. arXiv:2311.13338  [pdf, other

    cs.CV

    High-Quality Face Caricature via Style Translation

    Authors: Lamyanba Laishram, Muhammad Shaheryar, Jong Taek Lee, Soon Ki Jung

    Abstract: Caricature is an exaggerated form of artistic portraiture that accentuates unique yet subtle characteristics of human faces. Recently, advancements in deep end-to-end techniques have yielded encouraging outcomes in capturing both style and elevated exaggerations in creating face caricatures. Most of these approaches tend to produce cartoon-like results that could be more practical for real-world a… ▽ More

    Submitted 22 November, 2023; originally announced November 2023.

    Comments: 14 pages, 21 figures

  44. arXiv:2311.12805  [pdf, other

    cs.CV cs.AI

    DeepCompass: AI-driven Location-Orientation Synchronization for Navigating Platforms

    Authors: Jihun Lee, SP Choi, Bumsoo Kang, Hyekyoung Seok, Hyoungseok Ahn, Sanghee Jung

    Abstract: In current navigating platforms, the user's orientation is typically estimated based on the difference between two consecutive locations. In other words, the orientation cannot be identified until the second location is taken. This asynchronous location-orientation identification often leads to our real-life question: Why does my navigator tell the wrong direction of my car at the beginning? We pr… ▽ More

    Submitted 15 September, 2023; originally announced November 2023.

    Comments: 7page with 3 supplemental pages

  45. arXiv:2311.10922  [pdf, other

    cs.AI cs.CL cs.DB cs.IR

    Explainable Product Classification for Customs

    Authors: Eunji Lee, Sihyeon Kim, Sundong Kim, Soyeon Jung, Heeja Kim, Meeyoung Cha

    Abstract: The task of assigning internationally accepted commodity codes (aka HS codes) to traded goods is a critical function of customs offices. Like court decisions made by judges, this task follows the doctrine of precedent and can be nontrivial even for experienced officers. Together with the Korea Customs Service (KCS), we propose a first-ever explainable decision supporting model that suggests the mo… ▽ More

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: 24 pages, Accepted to ACM Transactions on Intelligent Systems and Technology

  46. arXiv:2311.05407  [pdf

    physics.comp-ph cs.LG physics.chem-ph

    Data Distillation for Neural Network Potentials toward Foundational Dataset

    Authors: Gang Seob Jung, Sangkeun Lee, Jong Youl Choi

    Abstract: Machine learning (ML) techniques and atomistic modeling have rapidly transformed materials design and discovery. Specifically, generative models can swiftly propose promising materials for targeted applications. However, the predicted properties of materials through the generative models often do not match with calculated properties through ab initio calculations. This discrepancy can arise becaus… ▽ More

    Submitted 9 November, 2023; originally announced November 2023.

  47. arXiv:2310.19583  [pdf, other

    cs.CV cs.LG

    GC-MVSNet: Multi-View, Multi-Scale, Geometrically-Consistent Multi-View Stereo

    Authors: Vibhas K. Vats, Sripad Joshi, David J. Crandall, Md. Alimoor Reza, Soon-heung Jung

    Abstract: Traditional multi-view stereo (MVS) methods rely heavily on photometric and geometric consistency constraints, but newer machine learning-based MVS methods check geometric consistency across multiple source views only as a post-processing step. In this paper, we present a novel approach that explicitly encourages geometric consistency of reference view depth maps across multiple source views at di… ▽ More

    Submitted 21 December, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted in WACV 2024 Link: https://openaccess.thecvf.com/content/WACV2024/html/Vats_GC-MVSNet_Multi-View_Multi-Scale_Geometrically-Consistent_Multi-View_Stereo_WACV_2024_paper.html

    Journal ref: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

  48. arXiv:2309.15314  [pdf

    physics.med-ph cs.CV

    Conversion of single-energy computed tomography to parametric maps of dual-energy computed tomography using convolutional neural network

    Authors: Sangwook Kim, Jimin Lee, Jungye Kim, Bitbyeol Kim, Chang Heon Choi, Seongmoon Jung

    Abstract: Objectives: We propose a deep learning (DL) multi-task learning framework using convolutional neural network (CNN) for a direct conversion of single-energy CT (SECT) to three different parametric maps of dual-energy CT (DECT): Virtual-monochromatic image (VMI), effective atomic number (EAN), and relative electron density (RED). Methods: We propose VMI-Net for conversion of SECT to 70, 120, and 2… ▽ More

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 29 pages, 17 figures

  49. arXiv:2309.13523  [pdf, other

    cs.CV

    LiDAR-UDA: Self-ensembling Through Time for Unsupervised LiDAR Domain Adaptation

    Authors: Amirreza Shaban, JoonHo Lee, Sanghun Jung, Xiangyun Meng, Byron Boots

    Abstract: We introduce LiDAR-UDA, a novel two-stage self-training-based Unsupervised Domain Adaptation (UDA) method for LiDAR segmentation. Existing self-training methods use a model trained on labeled source data to generate pseudo labels for target data and refine the predictions via fine-tuning the network on the pseudo labels. These methods suffer from domain shifts caused by different LiDAR sensor conf… ▽ More

    Submitted 23 September, 2023; originally announced September 2023.

    Comments: Accepted ICCV 2023 (Oral)

  50. arXiv:2309.13457  [pdf, other

    cs.LG cs.CV physics.comp-ph physics.flu-dyn

    Turbulence in Focus: Benchmarking Scaling Behavior of 3D Volumetric Super-Resolution with BLASTNet 2.0 Data

    Authors: Wai Tong Chung, Bassem Akoush, Pushan Sharma, Alex Tamkin, Ki Sung Jung, Jacqueline H. Chen, Jack Guo, Davy Brouzet, Mohsen Talei, Bruno Savard, Alexei Y. Poludnenko, Matthias Ihme

    Abstract: Analysis of compressible turbulent flows is essential for applications related to propulsion, energy generation, and the environment. Here, we present BLASTNet 2.0, a 2.2 TB network-of-datasets containing 744 full-domain samples from 34 high-fidelity direct numerical simulations, which addresses the current limited availability of 3D high-fidelity reacting and non-reacting compressible turbulent f… ▽ More

    Submitted 27 October, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

    Comments: Accepted in Adv. in Neural Information Processing Systems 36 (NeurIPS 2023). Link: https://nips.cc/virtual/2023/poster/73433 . 55 pages, 21 figures. Keywords: Super-resolution, 3D, Neural Scaling, Physics-informed Loss, Computational Fluid Dynamics, Partial Differential Equations, Turbulent Reacting Flows, Direct Numerical Simulation, Fluid Mechanics, Combustion, Computer Vision