-
Decoding the formation of hammerhead ion populations observed by Parker Solar Probe
Authors:
Shaaban M. Shaaban,
M. Lazar,
R. A. López,
P. H. Yoon,
S. Poedts
Abstract:
In situ observations by the Parker Solar Probe (PSP) have revealed new properties of the proton velocity distributions, including hammerhead features that suggest non-isotropic broadening of the beams. The present work proposes a very plausible explanation for the formation of these populations through the action of a proton firehose-like instability triggered by the proton beam. The quasi-linear (QL) theory proposed here shows that the resulting right-hand (RH) waves have two consequences for the protons: they (i) reduce the relative drift between the beam and the core and, above all, (ii) induce a strong perpendicular temperature anisotropy, specific to the observed hammerhead ion strahl. Moreover, the long-run QL results suggest that these hammerhead distributions are rather transitory states, still subject to relaxation mechanisms in which instabilities like the one discussed here are very likely involved.
Submitted 3 September, 2024;
originally announced September 2024.
-
AnoPLe: Few-Shot Anomaly Detection via Bi-directional Prompt Learning with Only Normal Samples
Authors:
Yujin Lee,
Seoyoon Jang,
Hyunsoo Yoon
Abstract:
Few-shot Anomaly Detection (FAD) poses significant challenges due to the limited availability of training samples and the frequent absence of abnormal samples. Previous approaches often rely on annotations or true abnormal samples to improve detection, but such textual or visual cues are not always accessible. To address this, we introduce AnoPLe, a multi-modal prompt learning method designed for anomaly detection without prior knowledge of anomalies. AnoPLe simulates anomalies and employs bidirectional coupling of textual and visual prompts to facilitate deep interaction between the two modalities. Additionally, we integrate a lightweight decoder with a learnable multi-view signal, trained on multi-scale images to enhance local semantic comprehension. To further improve performance, we align global and local semantics, enriching the image-level understanding of anomalies. The experimental results demonstrate that AnoPLe achieves strong FAD performance, recording 94.1% and 86.2% Image AUROC on MVTec-AD and VisA respectively, with only around a 1% gap compared to the SoTA, despite not being exposed to true anomalies. Code is available at https://github.com/YoojLee/AnoPLe.
Submitted 24 August, 2024;
originally announced August 2024.
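Image AUROC, the metric quoted in the AnoPLe abstract above, equals the probability that a randomly chosen anomalous image receives a higher anomaly score than a randomly chosen normal one. A minimal sketch of that rank-based definition, assuming per-image anomaly scores and binary labels are already available (the function name `image_auroc` is hypothetical and not part of the AnoPLe code):

```python
def image_auroc(scores, labels):
    """Rank-based AUROC: fraction of (anomalous, normal) pairs ordered correctly.

    scores: anomaly score per image (higher means more anomalous).
    labels: 1 for an anomalous image, 0 for a normal one.
    A tie between an anomalous and a normal score counts as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal image")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores: two of the three anomalous images outrank both normal ones.
    print(image_auroc([0.9, 0.8, 0.35, 0.4, 0.1], [1, 1, 0, 0, 1]))
```

The pairwise loop is quadratic in the number of images; for large test sets a sort-based rank computation gives the same value more efficiently.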
-
Dowker duality, profunctors, and spectral sequences
Authors:
Iris H. R. Yoon
Abstract:
The intent of this paper is to explore Dowker duality from a combinatorial, topological, and categorical perspective. The paper presents three short, new proofs of Dowker duality using various poset fiber lemmas. We introduce modifications of joins and products of simplicial complexes called relational join and relational product complexes. These relational complexes can be constructed whenever there is a relation between simplicial complexes, which includes the context of Dowker duality and covers of simplicial complexes. In this more general setting, we show that the homologies of the simplicial complexes and the relational complexes fit together in a long exact sequence. Similar results are then established for profunctors, which are generalizations of relations to categories. The cograph and graph of profunctors play the role of the relational join and relational product complexes. For a profunctor that arises from adjoint functors between $C$ and $D$, we show that $C$, $D$, the cograph, and the graph all have homotopy equivalent classifying spaces. Lastly, we show that given any profunctor from $D$ to $C$, the homologies of $C$, $D$, the cograph, and the graph form a long exact sequence. We make use of a spectral sequence for Grothendieck fibrations of small categories.
Submitted 23 August, 2024;
originally announced August 2024.
-
The First Large Absorption Survey in HI (FLASH): II. Pilot Survey data release and first results
Authors:
Hyein Yoon,
Elaine M. Sadler,
Elizabeth K. Mahony,
J. N. H. S. Aditya,
James R. Allison,
Marcin Glowacki,
Emily F. Kerrison,
Vanessa A. Moss,
Renzhi Su,
Simon Weng,
Matthew Whiting,
O. Ivy Wong,
Joseph R. Callingham,
Stephen J. Curran,
Jeremy Darling,
Alastair C. Edge,
Sara L. Ellison,
Kimberly L. Emig,
Lilian Garratt-Smithson,
Gordon German,
Kathryn Grasha,
Baerbel S. Koribalski,
Raffaella Morganti,
Tom Oosterloo,
Céline Péroux
, et al. (19 additional authors not shown)
Abstract:
The First Large Absorption Survey in HI (FLASH) is a large-area radio survey for neutral hydrogen in the redshift range 0.4<z<1.0, using the 21cm HI absorption line as a probe of cold neutral gas. FLASH uses the ASKAP radio telescope and is the first large 21cm absorption survey to be carried out without any optical preselection of targets. We use an automated Bayesian line-finding tool to search through large datasets and assign a statistical significance to potential line detections. The survey aims to explore the neutral gas content of galaxies at a cosmic epoch where almost no HI data are currently available, and to investigate the role of neutral gas in AGN fuelling and feedback. Two Pilot Surveys, covering around 3000 deg$^2$ of sky, were carried out in 2019-22 to test and verify the strategy for the full FLASH survey. The processed data from these Pilot Surveys (spectral-line cubes, continuum images, and catalogues) are available online. Here, we describe the FLASH spectral-line and continuum data and discuss the quality of the HI spectra and the completeness of our automated line search. Finally, we present a set of 30 new HI absorption lines that were robustly detected in the Pilot Surveys. These lines span a wide range in HI optical depth, including three lines with a peak optical depth $τ>1$, and appear to be a mixture of intervening and associated systems. The overall detection rate for HI absorption lines in the Pilot Surveys (0.3 to 0.5 lines per ASKAP field) is a factor of two below the expected value. There are several possible reasons for this, but one likely factor is the presence of a range of spectral-line artefacts in the Pilot Survey data that have now been mitigated and are not expected to recur in the full FLASH survey. A future paper will discuss the host galaxies of the HI absorption systems identified here.
Submitted 13 August, 2024;
originally announced August 2024.
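The peak optical depths quoted in the FLASH abstract above (including the three lines with $τ>1$) follow from the generic relation between the absorbed flux and the background continuum, $τ = -\ln(1 + ΔS/(c_f S_{\rm cont}))$, where $ΔS<0$ is the line depth and $c_f$ the covering factor. A minimal sketch of that textbook relation, assuming $c_f = 1$ by default; it is not necessarily the exact calculation used in the FLASH pipeline, and the function name is illustrative:

```python
import math


def hi_optical_depth(delta_s, s_cont, covering_factor=1.0):
    """HI optical depth from an absorption-line depth.

    delta_s: observed line depth (negative for absorption), same units as s_cont.
    s_cont: continuum flux density of the background radio source.
    covering_factor: fraction of the source covered by the absorber (assumed 1 by default).
    """
    frac = delta_s / (covering_factor * s_cont)
    if frac <= -1.0:
        raise ValueError("absorption deeper than the covered continuum; tau is undefined")
    return -math.log(1.0 + frac)


# A line 70 mJy deep against a 100 mJy continuum is optically thick (tau of about 1.2),
# while a 10 mJy-deep line is optically thin (tau close to 0.1).
thick = hi_optical_depth(-70.0, 100.0)
thin = hi_optical_depth(-10.0, 100.0)
```

For a fixed line depth, a covering factor below 1 raises the inferred $τ$, which is one reason peak optical depths from absorption surveys are often quoted as lower limits.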
-
BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
Authors:
Hee Suk Yoon,
Eunseop Yoon,
Joshua Tian Jin Tee,
Kang Zhang,
Yu-Jung Heo,
Du-Seong Chang,
Chang D. Yoo
Abstract:
Multimodal Dialogue Response Generation (MDRG) is a recently proposed task where the model needs to generate responses in text, images, or a blend of both based on the dialogue context. Due to the lack of a large-scale dataset specifically for this task and the benefits of leveraging powerful pre-trained models, previous work relies on the text modality as an intermediary step for both the image input and output of the model rather than adopting an end-to-end approach. However, this approach can overlook crucial information about the image, hindering 1) image-grounded text response and 2) consistency of objects in the image response. In this paper, we propose BI-MDRG, which bridges the response generation path such that the image history information is utilized for enhanced relevance of text responses to the image content and the consistency of objects in sequential image responses. Through extensive experiments on the multimodal dialogue benchmark dataset, we show that BI-MDRG can effectively increase the quality of multimodal dialogue. Additionally, recognizing the gap in benchmark datasets for evaluating the image consistency in multimodal dialogue, we have created a curated set of 300 dialogues annotated to track object consistency across conversations.
Submitted 12 August, 2024;
originally announced August 2024.
-
LI-TTA: Language Informed Test-Time Adaptation for Automatic Speech Recognition
Authors:
Eunseop Yoon,
Hee Suk Yoon,
John Harvill,
Mark Hasegawa-Johnson,
Chang D. Yoo
Abstract:
Test-Time Adaptation (TTA) has emerged as a crucial solution to the domain shift challenge, wherein the target environment diverges from the original training environment. A prime example is TTA for Automatic Speech Recognition (ASR), which enhances model performance by leveraging output prediction entropy minimization as a self-supervision signal. However, a key limitation of this self-supervision lies in its primary focus on acoustic features, with minimal attention to the linguistic properties of the input. To address this gap, we propose Language Informed Test-Time Adaptation (LI-TTA), which incorporates linguistic insights during TTA for ASR. LI-TTA integrates corrections from an external language model to merge linguistic with acoustic information by minimizing the CTC loss from the correction alongside the standard TTA loss. With extensive experiments, we show that LI-TTA effectively improves the performance of TTA for ASR in various distribution shift situations.
Submitted 11 August, 2024;
originally announced August 2024.
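The self-supervision signal described in the LI-TTA abstract above, entropy minimization over output predictions, can be illustrated by computing the mean per-frame entropy of softmax outputs. The combination with a CTC term through a weighting coefficient `lam` is a hypothetical schematic, not the paper's stated formula, and the CTC value is taken as an input because computing it requires a full CTC implementation:

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_frame_entropy(frame_logits):
    """Average Shannon entropy (in nats) of the per-frame output distributions."""
    entropies = []
    for logits in frame_logits:
        probs = softmax(logits)
        entropies.append(-sum(p * math.log(p) for p in probs if p > 0.0))
    return sum(entropies) / len(entropies)


def li_tta_style_loss(frame_logits, ctc_loss_value, lam=1.0):
    """Hypothetical combined objective: entropy term plus a weighted CTC term."""
    return mean_frame_entropy(frame_logits) + lam * ctc_loss_value


# Confident frames have lower entropy than uniform ones, so minimizing this
# quantity pushes the acoustic model toward sharper predictions.
uniform = mean_frame_entropy([[0.0, 0.0, 0.0, 0.0]])
confident = mean_frame_entropy([[10.0, 0.0, 0.0, 0.0]])
```

The entropy term alone sees only the acoustic model's outputs; the abstract's point is that the added CTC term injects the language model's corrections as an extra target.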
-
RT-Surv: Improving Mortality Prediction After Radiotherapy with Large Language Model Structuring of Large-Scale Unstructured Electronic Health Records
Authors:
Sangjoon Park,
Chan Woo Wee,
Seo Hee Choi,
Kyung Hwan Kim,
Jee Suk Chang,
Hong In Yoon,
Ik Jae Lee,
Yong Bae Kim,
Jaeho Cho,
Ki Chang Keum,
Chang Geol Lee,
Hwa Kyung Byun,
Woong Sub Koom
Abstract:
Accurate patient selection is critical in radiotherapy (RT) to prevent ineffective treatments. Traditional survival prediction models, relying on structured data, often lack precision. This study explores the potential of large language models (LLMs) to structure unstructured electronic health record (EHR) data, thereby improving survival prediction accuracy through comprehensive clinical information integration. Data from 34,276 patients treated with RT at Yonsei Cancer Center between 2013 and 2023 were analyzed, encompassing both structured and unstructured data. An open-source LLM was used to structure the unstructured EHR data via single-shot learning, with its performance compared against a domain-specific medical LLM and a smaller variant. Survival prediction models were developed using statistical, machine learning, and deep learning approaches, incorporating both structured and LLM-structured data. Clinical experts evaluated the accuracy of the LLM-structured data. The open-source LLM achieved 87.5% accuracy in structuring unstructured EHR data without additional training, significantly outperforming the domain-specific medical LLM, which reached only 35.8% accuracy. Larger LLMs were more effective, particularly in extracting clinically relevant features like general condition and disease extent, which closely correlated with patient survival. Incorporating LLM-structured clinical features into survival prediction models significantly improved accuracy, with the C-index of deep learning models increasing from 0.737 to 0.820. These models also became more interpretable by emphasizing clinically significant factors. This study shows that general-domain LLMs, even without specific medical training, can effectively structure large-scale unstructured EHR data, substantially enhancing the accuracy and interpretability of clinical predictive models.
Submitted 13 September, 2024; v1 submitted 9 August, 2024;
originally announced August 2024.
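The C-index values quoted in the RT-Surv abstract above (0.737 rising to 0.820) measure how well predicted risk orders patients by survival time. A minimal sketch of Harrell's concordance index, assuming higher risk scores mean earlier expected death and skipping pairs tied in time; the function name is illustrative and not taken from the study's code:

```python
def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times: observed follow-up time per patient.
    events: 1 if death was observed at that time, 0 if censored.
    risks: predicted risk score (higher means earlier expected death).
    A pair (i, j) is comparable when patient i has an observed event and
    times[i] < times[j]; it is concordant when risks[i] > risks[j].
    Risk ties count as half-concordant; pairs tied in time are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Perfect ordering: the earliest death has the highest risk.
perfect = harrell_c_index([1, 2, 3], [1, 1, 1], [3, 2, 1])
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why moving from 0.737 to 0.820 is a substantial gain.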
-
Identifying treatment response subgroups in observational time-to-event data
Authors:
Vincent Jeanselme,
Chang Ho Yoon,
Fabian Falck,
Brian Tom,
Jessica Barrett
Abstract:
Identifying patient subgroups with different treatment responses is an important task to inform medical recommendations, guidelines, and the design of future clinical trials. Existing approaches for subgroup analysis primarily focus on Randomised Controlled Trials (RCTs), in which treatment assignment is randomised. Furthermore, the patient cohort of an RCT is often constrained by cost, and is not representative of the heterogeneity of patients likely to receive treatment in real-world clinical practice. Therefore, when applied to observational studies, such approaches suffer from significant statistical biases because of the non-randomisation of treatment. Our work introduces a novel, outcome-guided method for identifying treatment response subgroups in observational studies. Our approach assigns each patient to a subgroup associated with two time-to-event distributions: one under treatment and one under the control regime. It therefore sits between individualised and average treatment effect estimation. The assumptions of our model result in a simple correction of the statistical bias from treatment non-randomisation through inverse propensity weighting. In experiments, our approach significantly outperforms the current state-of-the-art method for outcome-guided subgroup analysis in both randomised and observational treatment regimes.
Submitted 6 August, 2024;
originally announced August 2024.
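The bias correction mentioned in the abstract above relies on inverse propensity weighting (IPW): each patient is weighted by the inverse probability of the treatment they actually received, so that the weighted treated and control groups resemble the full population. A minimal sketch of generic IPW applied to a Kaplan-Meier survival curve, assuming propensity scores $e(x) = P(\text{treated} \mid x)$ have already been estimated; it illustrates the weighting idea only and is not the authors' subgroup model:

```python
def ipw_weights(treated, propensity):
    """Inverse propensity weights: 1/e(x) for treated patients, 1/(1 - e(x)) for controls."""
    weights = []
    for t, e in zip(treated, propensity):
        if not 0.0 < e < 1.0:
            raise ValueError("propensity scores must lie strictly in (0, 1)")
        weights.append(1.0 / e if t else 1.0 / (1.0 - e))
    return weights


def weighted_kaplan_meier(times, events, weights, t_eval):
    """Weighted Kaplan-Meier estimate of survival probability S(t_eval).

    At each distinct event time t <= t_eval, survival is multiplied by
    1 - (weighted deaths at t) / (weighted number still at risk at t).
    """
    survival = 1.0
    for t in sorted(set(times)):
        if t > t_eval:
            break
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        died = sum(w for ti, d, w in zip(times, events, weights) if ti == t and d)
        if at_risk > 0:
            survival *= 1.0 - died / at_risk
    return survival
```

To compare regimes, one would call `weighted_kaplan_meier` separately on the treated and control patients, each with their own IPW weights; with all weights equal to 1 it reduces to the ordinary Kaplan-Meier estimator.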
-
TLCR: Token-Level Continuous Reward for Fine-grained Reinforcement Learning from Human Feedback
Authors:
Eunseop Yoon,
Hee Suk Yoon,
SooHwan Eom,
Gunsoo Han,
Daniel Wontae Nam,
Daejin Jo,
Kyoung-Woon On,
Mark A. Hasegawa-Johnson,
Sungwoong Kim,
Chang D. Yoo
Abstract:
Reinforcement Learning from Human Feedback (RLHF) leverages human preference data to train language models to align more closely with human preferences. These human preference data, however, are labeled at the sequence level, creating a mismatch between sequence-level preference labels and tokens, which are autoregressively generated from the language model. Although several recent approaches have tried to provide token-level (i.e., dense) rewards for each individual token, these typically rely on predefined discrete reward values (e.g., positive: +1, negative: -1, neutral: 0), failing to account for varying degrees of preference inherent to each token. To address this limitation, we introduce TLCR (Token-Level Continuous Reward) for RLHF, which incorporates a discriminator trained to distinguish positive and negative tokens, and the confidence of the discriminator is used to assign continuous rewards to each token considering the context. Extensive experiments show that our proposed TLCR leads to consistent performance improvements over previous sequence-level or token-level discrete rewards on open-ended generation benchmarks.
Submitted 23 July, 2024;
originally announced July 2024.
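The contrast drawn in the TLCR abstract above, between fixed discrete token rewards (+1, -1, 0) and continuous rewards derived from a discriminator's confidence, can be sketched as follows. The abstract does not give the exact mapping, so the linear map $r = 2p - 1$ from the discriminator's probability $p$ that a token is positive, and the thresholds in the discrete baseline, are assumptions for illustration only:

```python
def discrete_token_rewards(probs, low=1 / 3, high=2 / 3):
    """Discrete baseline: +1, -1 or 0 per token (thresholds are hypothetical)."""
    rewards = []
    for p in probs:
        if p >= high:
            rewards.append(1.0)
        elif p <= low:
            rewards.append(-1.0)
        else:
            rewards.append(0.0)
    return rewards


def continuous_token_rewards(probs):
    """Assumed mapping of P(token is positive) in [0, 1] to a reward in [-1, 1]."""
    return [2.0 * p - 1.0 for p in probs]


# Two tokens the discriminator considers positive with different confidence
# receive the same discrete reward but different continuous rewards.
probs = [0.7, 0.95]
discrete = discrete_token_rewards(probs)      # both +1
continuous = continuous_token_rewards(probs)  # 0.4 versus 0.9, up to float rounding
```

The point of the continuous version is visible in the last lines: the discrete scheme collapses tokens with very different confidence onto the same value, whereas the continuous scheme preserves their ordering.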
-
Pacer and Runner: Cooperative Learning Framework between Single- and Cross-Domain Sequential Recommendation
Authors:
Chung Park,
Taesan Kim,
Hyungjun Yoon,
Junui Hong,
Yelim Yu,
Mincheol Cho,
Minsung Choi,
Jaegul Choo
Abstract:
Cross-Domain Sequential Recommendation (CDSR) improves recommendation performance by utilizing information from multiple domains, which contrasts with Single-Domain Sequential Recommendation (SDSR) that relies on historical interactions within a specific domain. However, CDSR may underperform compared to the SDSR approach in certain domains due to negative transfer, which occurs when there is a lack of relation between domains or different levels of data sparsity. To address the issue of negative transfer, our proposed CDSR model estimates the degree of negative transfer of each domain and adaptively assigns it as a weight factor to the prediction loss, to control gradient flows through domains with significant negative transfer. To this end, our model compares the performance of a model trained on multiple domains (CDSR) with a model trained solely on the specific domain (SDSR) to evaluate the negative transfer of each domain using our asymmetric cooperative network. In addition, to facilitate the transfer of valuable cues between the SDSR and CDSR tasks, we developed an auxiliary loss that maximizes the mutual information between the representation pairs from both tasks on a per-domain basis. This cooperative learning between SDSR and CDSR tasks is similar to the collaborative dynamics between pacers and runners in a marathon. Our model outperformed numerous previous works in extensive experiments on two real-world industrial datasets across ten service domains. We have also deployed our model in the recommendation system of our personal assistant app service, resulting in a 21.4% increase in click-through rate compared to existing models, which is valuable to real-world business.
Submitted 24 July, 2024; v1 submitted 15 July, 2024;
originally announced July 2024.
-
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
Authors:
Hyungjun Yoon,
Biniyam Aschalew Tolera,
Taesik Gong,
Kimin Lee,
Sung-Ju Lee
Abstract:
Large language models (LLMs) have demonstrated exceptional abilities across various domains. However, utilizing LLMs for ubiquitous sensing applications remains challenging as existing text-prompt methods show significant performance degradation when handling long sensor data sequences. We propose a visual prompting approach for sensor data using multimodal LLMs (MLLMs). We design a visual prompt that directs MLLMs to utilize visualized sensor data alongside the target sensory task descriptions. Additionally, we introduce a visualization generator that automates the creation of optimal visualizations tailored to a given sensory task, eliminating the need for prior task-specific knowledge. We evaluated our approach on nine sensory tasks involving four sensing modalities, achieving an average of 10% higher accuracy than text-based prompts and reducing token costs by 15.8x. Our findings highlight the effectiveness and cost-efficiency of visual prompts with MLLMs for various sensory tasks.
Submitted 14 July, 2024;
originally announced July 2024.
-
CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images
Authors:
Jisu Shin,
Junmyeong Lee,
Seongmin Lee,
Min-Gyu Park,
Ju-Mi Kang,
Ju Hong Yoon,
Hae-Gon Jeon
Abstract:
We present a novel framework for reconstructing animatable human avatars from multiple images, termed CanonicalFusion. Our central concept involves integrating individual reconstruction results into the canonical space. To be specific, we first predict Linear Blend Skinning (LBS) weight maps and depth maps using a shared-encoder-dual-decoder network, enabling direct canonicalization of the 3D mesh from the predicted depth maps. Here, instead of predicting high-dimensional skinning weights, we infer compressed skinning weights, i.e., 3-dimensional vectors, with the aid of pre-trained MLP networks. We also introduce a forward skinning-based differentiable rendering scheme to merge the reconstructed results from multiple images. This scheme refines the initial mesh by reposing the canonical mesh via the forward skinning and by minimizing photometric and geometric errors between the rendered and the predicted results. Our optimization scheme considers the position and color of vertices as well as the joint angles for each image, thereby mitigating the negative effects of pose errors. We conduct extensive experiments to demonstrate the effectiveness of our method and compare our CanonicalFusion with state-of-the-art methods. Our source code is available at https://github.com/jsshin98/CanonicalFusion.
Submitted 15 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
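The forward skinning step in the CanonicalFusion abstract above reposes canonical vertices with Linear Blend Skinning, $v' = \sum_i w_i (T_i v)$, where the $T_i$ are per-bone rigid transforms and the weights $w_i$ sum to 1. CanonicalFusion predicts compressed 3-dimensional weights that pre-trained MLPs decode; this minimal sketch instead assumes full per-bone weights are given directly, and the helper names are hypothetical:

```python
def apply_transform(matrix, point):
    """Apply a 4x4 affine transform (row-major nested lists) to a 3D point."""
    x, y, z = point
    hom = (x, y, z, 1.0)  # homogeneous coordinates; affine bottom row assumed
    return [sum(matrix[r][c] * hom[c] for c in range(4)) for r in range(3)]


def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def linear_blend_skinning(point, weights, bone_transforms):
    """Pose a canonical vertex: v' = sum_i w_i * (T_i v), with weights summing to 1."""
    if abs(sum(weights) - 1.0) > 1e-6:
        raise ValueError("skinning weights must sum to 1")
    posed = [0.0, 0.0, 0.0]
    for w, transform in zip(weights, bone_transforms):
        p = apply_transform(transform, point)
        for k in range(3):
            posed[k] += w * p[k]
    return posed


# A vertex influenced equally by a static bone and a bone shifted 2 units in x
# moves halfway, to x = 2.
identity = translation(0, 0, 0)
posed = linear_blend_skinning((1.0, 1.0, 1.0), [0.5, 0.5], [identity, translation(2, 0, 0)])
```

Canonicalization runs the same blend in reverse, applying the inverse of the blended transform to a posed vertex.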
-
Superconducting phase diagram in Bi$_x$Ni$_{1-x}$ thin films$\colon$ the effects of Bi stoichiometry on superconductivity
Authors:
Jihun Park,
Jarryd A. Horn,
Dylan J. Kirsch,
Rohit K. Pant,
Hyeok Yoon,
Sungha Baek,
Suchismita Sarker,
Apurva Mehta,
Xiaohang Zhang,
Seunghun Lee,
Richard Greene,
Johnpierre Paglione,
Ichiro Takeuchi
Abstract:
The Bi${-}$Ni binary system has been of interest due to the possible unconventional superconductivity arising therein, such as time-reversal symmetry breaking in Bi/Ni bilayers or the coexistence of superconductivity and ferromagnetism in Bi$_3$Ni crystals. While Ni acts as a ferromagnetic element in such systems, the role of Bi, an element with strong spin-orbit coupling, in superconductivity has remained unexplored. In this work, we systematically studied the effects of Bi stoichiometry on the superconductivity of Bi$_x$Ni$_{1-x}$ thin films (${x} \approx$ 0.5 to 0.9) fabricated via a composition-spread approach. The superconducting phase map of Bi$_x$Ni$_{1-x}$ thin films exhibited a superconducting composition region attributable to the intermetallic Bi$_3$Ni phase with different amounts of excess Bi, as revealed by synchrotron X-ray diffraction analysis. Interestingly, the mixed-phase region with Bi$_3$Ni and Bi showed unusual increases in the superconducting transition temperature and residual resistance ratio as more Bi impurities were included, with the maximum ${T}_{c}$ ($=$ 4.2 K) observed at $x \approx$ 0.79. A correlation analysis of structural, electrical, and magneto-transport characteristics across the composition variation revealed that the unusual superconducting "dome" is due to two competing roles of Bi$\colon$ impurity scattering and carrier doping. We found that the carrier doping effect is dominant in the mild doping regime (0.74 $\leq {x} \leq$ 0.79), while impurity scattering becomes more pronounced at larger Bi stoichiometry.
Submitted 26 June, 2024;
originally announced June 2024.
-
The Relation Between Variances of a 3D Density and Its 2D Column Density Revisited
Authors:
Heesun Yoon,
Jungyeon Cho
Abstract:
We revisit the relation between the variance of three-dimensional (3D) density ($σ^{2}_ρ$) and that of the projected two-dimensional (2D) column density ($σ^{2}_Σ$) in turbulent media, which is of great importance in obtaining turbulence properties from observations. Earlier studies showed that $σ^{2}_{Σ/ Σ_{0}}/σ^{2}_{ρ/ ρ_{0}} = \mathcal{R}$, where $Σ/Σ_0$ and $ρ/ρ_0$ are 2D column and 3D volume densities normalized by their mean values, respectively. The factor $\mathcal{R}$ depends only on the density spectrum for isotropic turbulence in a cloud that has similar dimensions along and perpendicular to the line of sight. Our major findings in this paper are as follows. First, we show that the factor $\mathcal{R}$ can be expressed in terms of $N$, the number of independent eddies along the line of sight. To be specific, $σ^{2}_{Σ/ Σ_{0}}/σ^{2}_{ρ/ρ_{0}}$ scales as $\sim 1/N$, due to the averaging effect arising from independent eddies along the line of sight. Second, we show that the factor $\mathcal{R}$ needs to be modified if the dimension of the cloud in the line-of-sight direction is different from that in the perpendicular direction. However, if we express $σ^{2}_{Σ/ Σ_{0}}/σ^{2}_{ρ/ ρ_{0}}$ in terms of $N$, the expression remains the same even when the cloud has different dimensions along and perpendicular to the line of sight. Third, when we plot $Nσ^{2}_{Σ/ Σ_{0}}$ against $σ^{2}_{ρ/ ρ_{0}}$, the two quantities roughly lie on a single curve regardless of the sonic Mach number, which implies that we can directly obtain the latter from the former. We discuss observational implications of our findings.
Submitted 24 June, 2024;
originally announced June 2024.
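The $\sim 1/N$ scaling stated in the abstract above is the familiar averaging effect for independent samples: if a line of sight crosses $N$ statistically independent eddies of equal path length, the normalized column density $Σ/Σ_0$ is the mean of $N$ independent values of $ρ/ρ_0$, so its variance drops by a factor of $N$. A toy Monte Carlo sketch, assuming independent, identically distributed lognormal cells with mean 1 (real turbulent clouds have correlated structure, so this illustrates only the scaling, not the paper's calculation):

```python
import random


def _population_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def column_variance_ratio(n_eddies, sigma=0.5, n_columns=20000, seed=1):
    """Monte Carlo estimate of var(Sigma/Sigma0) / var(rho/rho0) for N independent eddies.

    Each line of sight crosses n_eddies independent cells whose normalized density
    rho/rho0 is lognormal with mean 1 (mu = -sigma**2 / 2). The normalized column
    density is the mean of those cell densities (equal path length per cell).
    """
    rng = random.Random(seed)
    mu = -0.5 * sigma ** 2
    cells = []
    columns = []
    for _ in range(n_columns):
        rho = [rng.lognormvariate(mu, sigma) for _ in range(n_eddies)]
        cells.extend(rho)
        columns.append(sum(rho) / n_eddies)
    return _population_variance(columns) / _population_variance(cells)


# N times the ratio stays close to 1, i.e. the ratio itself scales as 1/N.
for n in (1, 4, 10):
    print(n, n * column_variance_ratio(n))
```

In this toy model $N σ^{2}_{Σ/Σ_0} \approx σ^{2}_{ρ/ρ_0}$ holds by construction; the paper's contribution is showing how $N$ should be counted for realistic turbulent spectra and cloud geometries.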
-
Recover as It is Designed to Be: Recovering from Compatibility Mobile App Crashes by Reusing User Flows
Authors:
Donghwi Kim,
Hyungjun Yoon,
Chang Min Park,
Sujin Han,
Youngjin Kwon,
Steven Y. Ko,
Sung-Ju Lee
Abstract:
Android OS is severely fragmented by API updates and device vendors' OS customization, creating a market condition where vastly different OS versions coexist. This gives rise to compatibility crash problems where Android apps crash on certain Android versions but not on others. Although well-known, this problem is extremely challenging for app developers to overcome due to the sheer number of Android versions in the market that must be tested. We present RecoFlow, a framework for enabling app developers to automatically recover an app from a crash by programming user flows with our API and visual tools. RecoFlow tracks app feature usage with the user flows on user devices and recovers an app from a crash by replaying UI actions of the app feature disrupted by the crash. To prevent recurring compatibility crashes, RecoFlow executes a previously crashed app in compatibility mode that is enabled by our novel Android OS virtualization technique. Our evaluation with professional Android developers shows that our API and tools are easy to use and effective in recovering from compatibility crashes.
Submitted 3 June, 2024;
originally announced June 2024.
-
Consistent Hamiltonian Reduction
Authors:
Jong Hyuk Yoon
Abstract:
I show that the recently proposed (2+2) Hamiltonian reduction of Einstein's equations of 4-dimensional spacetimes is consistent with general covariance. The consistency proof is {\it extrinsic}, as it follows from the fact that Hamilton's equations derived from the non-zero gravitational Hamiltonian are identical to the Ricci-flat condition of 4-dimensional spacetimes in privileged coordinates.
Submitted 30 May, 2024;
originally announced May 2024.
-
Poisson algebra of quasilocal angular momentum and its asymptotic limit
Authors:
Jong Hyuk Yoon,
Seung Hun Oh
Abstract:
We study the previously proposed quasilocal angular momentum of gravitational fields in the absence of isometries. The quasilocal angular momentum $L(ξ)$ has the following attractive properties: ({\it i}) it follows from Einstein's constraint equations, ({\it ii}) it satisfies the Poisson algebra $\{L(ξ), L(η) \}_{\rm P.B.} = (1/16π)\, L( [ξ, η]_{\rm L} )$, ({\it iii}) its Poisson algebra reduces to the standard $SO(3)$ algebra of angular momentum at null infinity, and ({\it iv}) it reproduces the standard value for the Kerr spacetime at null infinity. It will be argued that our definition is a quasilocal and canonical generalization of A. Rizzi's geometric definition at null infinity. We also propose a new definition of an {\it invariant} quasilocal angular momentum $L^{2}$ such that $\{ L^2, L(ξ) \}_{\rm P.B.} = 0$, which becomes $(ma)^{2}$ at null infinity of the Kerr spacetime. Therefore, it may be regarded as a quasilocal generalization of the Casimir invariant of ordinary angular momentum in flat spacetime.
Submitted 30 May, 2024;
originally announced May 2024.
-
Hamiltonian structure and constraint algebra in the (2+2) formalism
Authors:
J. H. Yoon
Abstract:
The canonical formalism of the (2+2) formulation of general relativity in 4 spacetime dimensions is studied under no symmetry assumptions, where the spacetime is viewed as a local product of a 2-dimensional base manifold of Lorentzian signature with the vertical space as its complement. The affine null parameter is chosen as the time coordinate, whose level surfaces are 3-dimensional spacelike hypersurfaces. From the first-order action principle, Hamilton's equations of motion and the constraints are obtained, which are found to be equivalent to Einstein's equations. The constraint algebra is also presented, which has interesting subalgebras such as the infinite-dimensional Lie algebra of the diffeomorphisms of the 2-dimensional vertical space, the infinite-dimensional Virasoro algebra associated with the 2-dimensional base manifold, and an analog of supertranslation. The symmetry algebra may be viewed as a generalization of the BMS or Spi group to a finite distance.
Submitted 30 May, 2024;
originally announced May 2024.
-
PULL: PU-Learning-based Accurate Link Prediction
Authors:
Junghun Kim,
Ka Hyun Park,
Hoyoung Yoon,
U Kang
Abstract:
Given an edge-incomplete graph, how can we accurately find the missing links? Link prediction in edge-incomplete graphs aims to discover the missing relations between entities when their relationships are represented as a graph. Edge-incomplete graphs are prevalent in the real world due to practical limitations, such as not checking all users when adding friends in a social network. Addressing the problem is crucial for various tasks, including recommending friends in social networks and finding references in citation networks. However, previous approaches rely heavily on the given edge-incomplete (observed) graph, making it challenging to consider the missing (unobserved) links during training. In this paper, we propose PULL (PU-Learning-based Link predictor), an accurate link prediction method based on positive-unlabeled (PU) learning. PULL treats the observed edges in the training graph as positive examples, and the unconnected node pairs as unlabeled ones. PULL effectively prevents the link predictor from overfitting to the observed graph by proposing latent variables for every edge, and leveraging the expected graph structure with respect to the variables. Extensive experiments on five real-world datasets show that PULL consistently outperforms the baselines for predicting links in edge-incomplete graphs.
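The positive-unlabeled framing described above can be illustrated with a minimal Python sketch, assuming an undirected graph given as an edge list over nodes `0..n-1`. The function name `pu_split` is hypothetical; it only shows how observed edges become positive examples and all unconnected node pairs become unlabeled ones, and it does not implement PULL's latent edge variables or its training procedure.

```python
from itertools import combinations


def pu_split(num_nodes, edges):
    """Split node pairs into positives (observed edges) and unlabeled pairs.

    Assumes an undirected graph; each pair is stored as a sorted (low, high) tuple.
    """
    positives = {tuple(sorted(e)) for e in edges}
    # every other pair of distinct nodes is unlabeled: it may be a missing link
    unlabeled = {
        pair for pair in combinations(range(num_nodes), 2)
        if pair not in positives
    }
    return positives, unlabeled


pos, unl = pu_split(4, [(0, 1), (2, 1)])
print(sorted(pos))  # [(0, 1), (1, 2)]
print(sorted(unl))  # [(0, 2), (0, 3), (1, 3), (2, 3)]
```

In a PU setting the unlabeled set is not treated as confirmed negatives; a hidden link such as `(0, 2)` may sit among them.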
Submitted 20 May, 2024;
originally announced May 2024.
-
Simplifying Multimodality: Unimodal Approach to Multimodal Challenges in Radiology with General-Domain Large Language Model
Authors:
Seonhee Cho,
Choonghan Kim,
Jiho Lee,
Chetan Chilkunda,
Sujin Choi,
Joo Heung Yoon
Abstract:
Recent advancements in Large Multimodal Models (LMMs) have attracted interest in their generalization capability with only a few samples in the prompt. This progress is particularly relevant to the medical domain, where the quality and sensitivity of data pose unique challenges for model training and application. However, the dependency on high-quality data for effective in-context learning raises questions about the feasibility of these models when encountering the inevitable variations and errors inherent in real-world medical data. In this paper, we introduce MID-M, a novel framework that leverages the in-context learning capabilities of a general-domain Large Language Model (LLM) to process multimodal data via image descriptions. MID-M achieves comparable or superior performance to task-specific fine-tuned LMMs and other general-domain ones, without extensive domain-specific training or pre-training on multimodal data, and with significantly fewer parameters. This highlights the potential of leveraging general-domain LLMs for domain-specific tasks and offers a sustainable and cost-effective alternative to traditional LMM development. Moreover, the robustness of MID-M against data quality issues demonstrates its practical utility in real-world medical applications.
Submitted 29 April, 2024;
originally announced May 2024.
-
Enhancing Diagnosis through AI-driven Analysis of Reflectance Confocal Microscopy
Authors:
Hong-Jun Yoon,
Chris Keum,
Alexander Witkowski,
Joanna Ludzik,
Tracy Petrie,
Heidi A. Hanson,
Sancy A. Leachman
Abstract:
Reflectance Confocal Microscopy (RCM) is a non-invasive imaging technique used in biomedical research and clinical dermatology. It provides virtual high-resolution images of the skin and superficial tissues, reducing the need for physical biopsies. RCM employs a laser light source to illuminate the tissue, capturing the reflected light to generate detailed images of microscopic structures at various depths. Recent studies have explored AI and machine learning, particularly convolutional neural networks (CNNs), for analyzing RCM images. Our study proposes a segmentation strategy based on textural features to identify clinically significant regions, supporting dermatologists in interpreting images effectively and boosting diagnostic confidence. This approach promises to advance dermatological diagnosis and treatment.
Submitted 24 April, 2024;
originally announced April 2024.
-
GaussianTalker: Real-Time High-Fidelity Talking Head Synthesis with Audio-Driven 3D Gaussian Splatting
Authors:
Kyusun Cho,
Joungbin Lee,
Heeji Yoon,
Yeobin Hong,
Jaehoon Ko,
Sangjun Ahn,
Seungryong Kim
Abstract:
We propose GaussianTalker, a novel framework for real-time generation of pose-controllable talking heads. It leverages the fast rendering capabilities of 3D Gaussian Splatting (3DGS) while addressing the challenges of directly controlling 3DGS with speech audio. GaussianTalker constructs a canonical 3DGS representation of the head and deforms it in sync with the audio. A key insight is to encode the 3D Gaussian attributes into a shared implicit feature representation, where it is merged with audio features to manipulate each Gaussian attribute. This design exploits the spatial-aware features and enforces interactions between neighboring points. The feature embeddings are then fed to a spatial-audio attention module, which predicts frame-wise offsets for the attributes of each Gaussian. It is more stable than previous concatenation or multiplication approaches for manipulating the numerous Gaussians and their intricate parameters. Experimental results showcase GaussianTalker's superiority in facial fidelity, lip synchronization accuracy, and rendering speed compared to previous methods. Specifically, GaussianTalker achieves rendering speeds of up to 120 FPS, surpassing previous benchmarks. Our code is available at https://github.com/KU-CVLAB/GaussianTalker/ .
Submitted 25 April, 2024; v1 submitted 24 April, 2024;
originally announced April 2024.
-
ADAPT^2: Adapting Pre-Trained Sensing Models to End-Users via Self-Supervision Replay
Authors:
Hyungjun Yoon,
Jaehyun Kwak,
Biniyam Aschalew Tolera,
Gaole Dai,
Mo Li,
Taesik Gong,
Kimin Lee,
Sung-Ju Lee
Abstract:
Self-supervised learning has emerged as a method for utilizing massive unlabeled data for pre-training models, providing an effective feature extractor for various mobile sensing applications. However, when deployed to end-users, these models encounter significant domain shifts attributed to user diversity. We investigate the performance degradation that occurs when self-supervised models are fine-tuned in heterogeneous domains. To address the issue, we propose ADAPT^2, a few-shot domain adaptation framework for personalizing self-supervised models. ADAPT^2 proposes self-supervised meta-learning for initial model pre-training, followed by user-side model adaptation that replays the self-supervision with user-specific data. This allows models to adjust their pre-trained representations to the user with only a few samples. Evaluation with four benchmarks demonstrates that ADAPT^2 outperforms existing baselines by an average of 8.8 percentage points in F1-score. Our on-device computational overhead analysis on a commodity off-the-shelf (COTS) smartphone shows that ADAPT^2 completes adaptation with unobtrusive latency (within three minutes) and only 9.54% memory consumption, demonstrating the computational efficiency of the proposed method.
Submitted 29 March, 2024;
originally announced April 2024.
-
Pegasus-v1 Technical Report
Authors:
Raehyuk Jung,
Hyojun Go,
Jaehyuk Yi,
Jiho Jang,
Daniel Kim,
Jay Suh,
Aiden Lee,
Cooper Han,
Jae Lee,
Jeff Kim,
Jin-Young Kim,
Junwan Kim,
Kyle Park,
Lucas Lee,
Mars Ha,
Minjoon Seo,
Abraham Jo,
Ed Park,
Hassan Kianinejad,
SJ Kim,
Tony Moon,
Wade Jeong,
Andrei Popescu,
Esther Kim,
EK Yoon
, et al. (19 additional authors not shown)
Abstract:
This technical report introduces Pegasus-1, a multimodal language model specialized in video content understanding and interaction through natural language. Pegasus-1 is designed to address the unique challenges posed by video data, such as interpreting spatiotemporal information, to offer nuanced video content comprehension across various lengths. This technical report gives an overview of Pegasus-1's architecture, training strategies, and performance on benchmarks for video conversation, zero-shot video question answering, and video summarization. We also explore qualitative characteristics of Pegasus-1, demonstrating its capabilities as well as its limitations, to give readers a balanced view of its current state and its future direction.
Submitted 22 April, 2024;
originally announced April 2024.
-
Nonclassicality in Two-Mode Stabilized Squeezed Coherent State: Quantum-to-Classical transition
Authors:
C. Lee,
T. H. Yoon
Abstract:
We consider a two-mode stabilized squeezed coherent state (SSCS) of light and introduce the $Π_{\rm N}$ indicator, a novel measure for characterizing nonclassicality in the resulting EPR-entangled state. Unlike existing methods based on Cauchy-Schwarz or Muirhead inequalities, $Π_{\rm N}$ leverages analytical solutions to the quantum Langevin equations to directly analyze nonclassicality arising from key processes like bichromatic injection, frequency conversion, and parametric down-conversion (both spontaneous and stimulated). This approach not only identifies the optimal phase for maximum nonclassicality but also reveals two new phenomena: first, both intra-cavity and extra-cavity fields exhibit the same degree of nonclassicality, and second, balanced seeding in phase-mismatched configurations induces nonclassicality across a broad range of squeezing and seeding parameters. Our work deepens the understanding of the intricate dependence of nonclassicality on system parameters in the context of SSCS, paving the way for investigations into the quantum-to-classical transition in entangled systems. $Π_{\rm N}$ holds significant promise for advancements in quantum optics and information science.
Submitted 19 April, 2024;
originally announced April 2024.
-
Laser mode-hopping assisted all-optical single beam pulsed atomic magnetometer
Authors:
Ji Hoon Yoon,
Sang Hyuk Hong,
Taek Jeong,
Sin Hyuk Yim,
Kyu Min Shim,
Sangkyung Lee
Abstract:
We demonstrate an all-optical single beam pulsed atomic magnetometer assisted by laser mode-hopping in a distributed Bragg reflector (DBR) laser. We implement a temporal sequence of the laser current: sinusoidal current modulation, including the laser mode-hop current, for synchronous optical pumping, followed by a constant current for paramagnetic Faraday rotation measurements that probe the free induction decay (FID) of transverse $^{87}$Rb spin polarization. Repetitive sudden frequency shifts of 20 GHz around the pressure-broadened $^{87}$Rb spectra, originating from laser mode-hopping, enable discontinuous optical pumping modulation with a large depth, which enhances transverse spin polarization. We achieve a sensitivity of 3.77 pT/Hz$^{1/2}$ in a magnetic field of 14 $μ$T, limited by the performance of the frequency counter. The Cramér-Rao lower bound (CRLB) of the sensitivity due to non-magnetic noise such as photon shot noise is 191 fT/Hz$^{1/2}$. Our approach based on laser mode-hopping can be applied to the miniaturization of all-optical atomic magnetometers with sub-pT/Hz$^{1/2}$ sensitivities.
Submitted 2 April, 2024;
originally announced April 2024.
-
Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images
Authors:
JungEun Kim,
Hangyul Yoon,
Geondo Park,
Kyungsu Kim,
Eunho Yang
Abstract:
4D medical images, which represent 3D images with temporal information, are crucial in clinical practice for capturing dynamic changes and monitoring long-term disease progression. However, acquiring 4D medical images poses challenges due to factors such as radiation exposure and imaging duration, necessitating a balance between achieving high temporal resolution and minimizing adverse effects. Given these circumstances, not only is data acquisition challenging, but increasing the frame rate for each dataset also proves difficult. To address this challenge, this paper proposes a simple yet effective Unsupervised Volumetric Interpolation framework, UVI-Net. This framework facilitates temporal interpolation without the need for any intermediate frames, distinguishing it from the majority of other existing unsupervised methods. Experiments on benchmark datasets demonstrate significant improvements across diverse evaluation metrics compared to unsupervised and supervised baselines. Remarkably, our approach achieves this superior performance even when trained with a dataset as small as one, highlighting its exceptional robustness and efficiency in scenarios with sparse supervision. This positions UVI-Net as a compelling alternative for 4D medical imaging, particularly in settings where data availability is limited. The source code is available at https://github.com/jungeun122333/UVI-Net.
Submitted 1 April, 2024;
originally announced April 2024.
-
Talk3D: High-Fidelity Talking Portrait Synthesis via Personalized 3D Generative Prior
Authors:
Jaehoon Ko,
Kyusun Cho,
Joungbin Lee,
Heeji Yoon,
Sangmin Lee,
Sangjun Ahn,
Seungryong Kim
Abstract:
Recent methods for audio-driven talking head synthesis often optimize neural radiance fields (NeRF) on a monocular talking portrait video, leveraging its capability to render high-fidelity and 3D-consistent novel-view frames. However, they often struggle to reconstruct complete face geometry due to the absence of comprehensive 3D information in the input monocular videos. In this paper, we introduce a novel audio-driven talking head synthesis framework, called Talk3D, that faithfully reconstructs plausible facial geometries by adopting the pre-trained 3D-aware generative prior. Given the personalized 3D generative model, we present a novel audio-guided attention U-Net architecture that predicts the dynamic face variations in the NeRF space driven by audio. Furthermore, our model is modulated by audio-unrelated conditioning tokens, which effectively disentangle variations unrelated to audio features. Compared to existing methods, our method excels in generating realistic facial geometries even under extreme head poses. We also conduct extensive experiments showing that our approach surpasses state-of-the-art benchmarks in both quantitative and qualitative evaluations.
Submitted 29 March, 2024;
originally announced March 2024.
-
C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion
Authors:
Hee Suk Yoon,
Eunseop Yoon,
Joshua Tian Jin Tee,
Mark Hasegawa-Johnson,
Yingzhen Li,
Chang D. Yoo
Abstract:
In deep learning, test-time adaptation has gained attention as a method for model fine-tuning without the need for labeled data. A prime example is the recently proposed test-time prompt tuning for large-scale vision-language models such as CLIP. Unfortunately, these prompts have mainly been developed to improve accuracy, overlooking the importance of calibration, which is crucial for quantifying prediction uncertainty. However, traditional calibration methods rely on substantial amounts of labeled data, making them impractical for test-time scenarios. To this end, this paper explores calibration during test-time prompt tuning by leveraging the inherent properties of CLIP. Through a series of observations, we find that the prompt choice significantly affects the calibration in CLIP, where the prompts leading to higher text feature dispersion result in better-calibrated predictions. Introducing the Average Text Feature Dispersion (ATFD), we establish its relationship with calibration error and present a novel method, Calibrated Test-time Prompt Tuning (C-TPT), for optimizing prompts during test-time with enhanced calibration. Through extensive experiments on different CLIP architectures and datasets, we show that C-TPT can effectively improve the calibration of test-time prompt tuning without needing labeled data. The code is publicly accessible at https://github.com/hee-suk-yoon/C-TPT.
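The dispersion quantity named above can be sketched in Python, assuming ATFD is defined as the mean Euclidean distance between each class's text feature and the centroid of all class text features. This definition, the function name `atfd`, and the toy 2-D vectors are assumptions for illustration; CLIP text features are typically L2-normalized and high-dimensional, and the C-TPT prompt-optimization loop is not shown.

```python
import math


def atfd(text_features):
    """Average Text Feature Dispersion (assumed definition): the mean L2
    distance between each class text feature and their common centroid."""
    k = len(text_features)
    dim = len(text_features[0])
    # centroid of the K class text features
    centroid = [sum(f[d] for f in text_features) / k for d in range(dim)]
    # math.dist (Python 3.8+) gives the Euclidean distance between two points
    return sum(math.dist(f, centroid) for f in text_features) / k


# identical features collapse onto the centroid: zero dispersion
print(atfd([[1.0, 0.0], [1.0, 0.0]]))   # 0.0
# opposite unit vectors each sit at distance 1 from their centroid
print(atfd([[1.0, 0.0], [-1.0, 0.0]]))  # 1.0
```

Under this reading, a larger value means the class text features are more spread out, which the abstract associates with better-calibrated predictions.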
Submitted 31 March, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
AdaMER-CTC: Connectionist Temporal Classification with Adaptive Maximum Entropy Regularization for Automatic Speech Recognition
Authors:
SooHwan Eom,
Eunseop Yoon,
Hee Suk Yoon,
Chanwoo Kim,
Mark Hasegawa-Johnson,
Chang D. Yoo
Abstract:
In Automatic Speech Recognition (ASR) systems, a recurring obstacle is the generation of narrowly focused output distributions. This phenomenon emerges as a side effect of Connectionist Temporal Classification (CTC), a robust sequence learning tool that utilizes dynamic programming for sequence mapping. While earlier efforts have tried to combine the CTC loss with an entropy maximization regularization term to mitigate this issue, they employed a constant weighting term on the regularization during training, which we find may not be optimal. In this work, we introduce Adaptive Maximum Entropy Regularization (AdaMER), a technique that can modulate the impact of entropy regularization throughout the training process. This approach not only refines ASR model training but also ensures that, as training proceeds, predictions display the desired model confidence.
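For orientation, an entropy-regularized CTC objective of the kind described above can be written as follows. This is a generic sketch, not the exact formulation of AdaMER: the frame-level entropy, the symbols $p_t$, $T$ and $\lambda_s$, and the dependence of the weight on the training step $s$ are illustrative assumptions. Earlier approaches keep the weight fixed ($\lambda_s = \lambda$), whereas AdaMER adapts it during training; its actual schedule is not specified here.

```latex
\mathcal{L}(\theta) = \mathcal{L}_{\rm CTC}(\theta) - \lambda_s \, \frac{1}{T} \sum_{t=1}^{T} H(p_t),
\qquad
H(p_t) = -\sum_{k} p_t(k) \, \log p_t(k)
```

Here $p_t$ denotes the model's output distribution over output labels (including the CTC blank) at frame $t$. Subtracting the weighted entropy means that minimizing $\mathcal{L}$ rewards less peaked output distributions, which counteracts the narrowly focused outputs described above.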
Submitted 18 March, 2024;
originally announced March 2024.
-
Probing $p$-wave superconductivity in UTe$_2$ via point-contact junctions
Authors:
Hyeok Yoon,
Yun Suk Eo,
Jihun Park,
Jarryd A. Horn,
Ryan G. Dorman,
Shanta R. Saha,
Ian M. Hayes,
Ichiro Takeuchi,
Philip M. R. Brydon,
Johnpierre Paglione
Abstract:
Uranium ditelluride (UTe$_2$) is the strongest contender to date for a $p$-wave superconductor in bulk form. Here we perform a spectroscopic study of the ambient-pressure superconducting phase of UTe$_2$, measuring conductance through point-contact junctions formed by metallic contacts on different crystalline facets down to 250 mK and up to 18 T. Fitting a range of qualitatively varying spectra with a Blonder-Tinkham-Klapwijk (BTK) model for $p$-wave pairing, we extract the gap amplitude and interface barrier strength for each junction. We find good agreement with the data for a $p_y$-wave gap function with amplitude 0.26 $\pm$ 0.06 meV. Our work provides spectroscopic evidence for a gap structure consistent with the proposed spin-triplet pairing in the superconducting state of UTe$_2$.
Submitted 4 September, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
High-Field Superconducting Halo in UTe$_2$
Authors:
Sylvia K. Lewin,
Peter Czajka,
Corey E. Frank,
Gicela Saucedo Salas,
Hyeok Yoon,
Yun Suk Eo,
Johnpierre Paglione,
Andriy H. Nevidomskyy,
John Singleton,
Nicholas P. Butch
Abstract:
Heavy fermion UTe$_2$ is a promising candidate for topological superconductivity that also exhibits multiple high-field superconducting phases. The SC$_{\rm{FP}}$ phase has only been observed in off-axis magnetic fields in the $bc$ plane at fields greater than 40 teslas, a striking scale given its critical temperature of only 2 kelvins. Here, we extend measurements of this unique superconducting state outside of the $bc$ plane and reveal its core structure. The SC$_{\rm{FP}}$ phase is not confined to fields in the $bc$ plane and in fact wraps around the $b$ axis in a halo-like fashion. In other words, this superconducting state, which exists in fields above 73 teslas, is stabilized by a field component perpendicular to the magnetic easy axis. These remarkable field scales further underscore UTe$_2$'s unique magnetophilic superconducting tendencies and suggest an underlying pairing mechanism that is qualitatively distinct from known theories for field-enhanced superconductivity. Phenomenological modeling points to a two-component, non-unitary spin triplet order parameter with finite orbital momentum of the Cooper pairs as a natural explanation for the field-angle dependence of the upper critical field of the SC$_{\rm{FP}}$ phase.
Submitted 28 February, 2024;
originally announced February 2024.
-
EHRNoteQA: An LLM Benchmark for Real-World Clinical Practice Using Discharge Summaries
Authors:
Sunjun Kweon,
Jiyoun Kim,
Heeyoung Kwak,
Dongchul Cha,
Hangyul Yoon,
Kwanghyun Kim,
Jeewon Yang,
Seunghyun Won,
Edward Choi
Abstract:
Discharge summaries in Electronic Health Records (EHRs) are crucial for clinical decision-making, but their length and complexity make information extraction challenging, especially when dealing with accumulated summaries across multiple patient admissions. Large Language Models (LLMs) show promise in addressing this challenge by efficiently analyzing vast and complex data. Existing benchmarks, however, fall short in properly evaluating LLMs' capabilities in this context, as they typically focus on single-note information or limited topics, failing to reflect the real-world inquiries required by clinicians. To bridge this gap, we introduce EHRNoteQA, a novel benchmark built on the MIMIC-IV EHR, comprising 962 different QA pairs, each linked to a distinct patient's discharge summaries. Every QA pair is initially generated using GPT-4 and then manually reviewed and refined by three clinicians to ensure clinical relevance. EHRNoteQA includes questions that require information across multiple discharge summaries and covers eight diverse topics, mirroring the complexity and diversity of real clinical inquiries. We offer EHRNoteQA in two formats, open-ended and multiple-choice question answering, and propose a reliable evaluation method for each. We evaluate 27 LLMs using EHRNoteQA and examine various factors affecting model performance (e.g., the length and number of discharge summaries). Furthermore, to validate EHRNoteQA as a reliable proxy for expert evaluations in clinical practice, we measure the correlation between LLM performance on EHRNoteQA and LLM performance manually evaluated by clinicians. Results show that LLM performance on EHRNoteQA has a higher correlation with clinician-evaluated performance (Spearman: 0.78, Kendall: 0.62) than other benchmarks do, demonstrating its practical relevance in evaluating LLMs in clinical settings.
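The two agreement statistics reported above can be computed with the following standard-library Python sketch. It assumes two equal-length lists of per-model scores (the benchmark scores and clinician scores in the example are hypothetical). Spearman's coefficient is taken as the Pearson correlation of average ranks; Kendall's coefficient is computed as tau-a, which applies no tie correction and can therefore differ from tau-b implementations when ties are present. Which variant the EHRNoteQA authors used is not stated in the abstract.

```python
import math


def average_ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with xs[order[i]]
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    return pearson(average_ranks(a), average_ranks(b))


def kendall_tau_a(a, b):
    """(concordant - discordant) / total pairs; tied pairs contribute 0."""
    n = len(a)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


# hypothetical scores for five models
bench = [0.61, 0.55, 0.72, 0.40, 0.68]
clinician = [3.0, 3.2, 3.8, 2.2, 3.3]
print(spearman(bench, clinician), kendall_tau_a(bench, clinician))
```

For real analyses, an established statistics library is preferable, since it handles ties and significance testing explicitly.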
Submitted 27 June, 2024; v1 submitted 25 February, 2024;
originally announced February 2024.
-
Interplay between 2D ferromagnetism and transport at the surface of FeSi
Authors:
Keenan E. Avers,
Yun Suk Eo,
Hyeok Yoon,
Jarryd A. Horn,
Shanta R. Saha,
Alonso Suarez,
Peter Zavalij,
Johnpierre Paglione
Abstract:
FeSi is a curious example of a $d$-electron system that manifests many of the same phenomena associated with $f$-electron Kondo insulators, including conducting surface states with potentially non-trivial topology. Here we investigate the magnetization and magnetotransport of these surface states and how a 2D ferromagnetic state at the surface of FeSi influences the surface conductivity. We confirm the 2D ferromagnetism via a systematic study of magnetization on groups of filtered fragments with increasing surface area-to-volume ratios, identifying characteristic temperatures and magnetic fields associated with the ordered state. The paramagnetic to ferromagnetic transition appears broadened, suggesting disorder, which allows spin fluctuations to manifest up to at least 9 T at 2 K. This highlights the need to understand the relation between the disorder of the 2D ferromagnetism and the surface conductivity in FeSi.
Submitted 21 February, 2024;
originally announced February 2024.
-
On the convergence of the graph sequence $\left\{ C^m(D) \right\}_{m=1}^{\infty}$ for a multipartite tournament $D$
Authors:
Ji-Hwan Jung,
Suh-Ryung Kim,
Hyesun Yoon
Abstract:
Given a positive integer $m$, the $m$-step competition graph of a digraph $D$, denoted by $C^m(D)$, has the same vertex set as $D$ and has an edge between vertices $u$ and $v$ if and only if there exists a vertex $w$ such that there exist directed walks of length $m$ from $u$ to $w$ and from $v$ to $w$, respectively. In this paper, we completely characterize the convergence of $\{C^m(D)\}_{m=1}^{\infty}$ for a multipartite tournament $D$ based on the last nontrivial strong component of $D$. Furthermore, we not only determine the limit when the sequence converges but also, when it diverges, specify how $C^m(D)$ changes periodically with $m$. Our results extend the work of Jung et al. [On the limit of the sequence $\{C^m (D)\}_{m=1}^{\infty}$ for a multipartite tournament $D$. Discrete Appl. Math., 340:1--13, 2023], which addresses the case of the last strong component being nontrivial, thereby completing the convergence analysis of $\{C^m(D)\}_{m=1}^{\infty}$ for a multipartite tournament $D$.
Our results can also be expressed in terms of the matrix sequence $\{A^m(A^T)^m\}_{m=1}^{\infty}$ for the adjacency matrix $A$ of $D$, and this formulation is also covered in the text.
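The matrix formulation follows from $(A^m(A^T)^m)_{uv} = \sum_w (A^m)_{uw}(A^m)_{vw}$, which is nonzero exactly when some vertex $w$ is reachable from both $u$ and $v$ by walks of length $m$. A minimal Python sketch of $C^m(D)$ using Boolean matrix powers follows; the 4-vertex tournament (a multipartite tournament whose parts are singletons) is a hypothetical example, not one taken from the paper.

```python
def bool_matmul(a, b):
    """Boolean matrix product: entry (i, j) is True iff some k has a[i][k] and b[k][j]."""
    n = len(a)
    return [[any(a[i][k] and b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_matrix(adj, m):
    """Entry (u, w) is True iff D has a directed walk of length m from u to w."""
    n = len(adj)
    reach = [[i == j for j in range(n)] for i in range(n)]  # A^0 = identity
    for _ in range(m):
        reach = bool_matmul(reach, adj)
    return reach


def m_step_competition_graph(adj, m):
    """Edges {u, v}, u < v, of C^m(D): u and v share an m-step prey w."""
    reach = walk_matrix(adj, m)
    n = len(adj)
    return {(u, v) for u in range(n) for v in range(u + 1, n)
            if any(reach[u][w] and reach[v][w] for w in range(n))}


# Hypothetical tournament: arcs 0->1, 1->2, 1->3, 2->0, 2->3, 3->0.
A = [[0, 1, 0, 0],
     [0, 0, 1, 1],
     [1, 0, 0, 1],
     [1, 0, 0, 0]]
for m in range(1, 4):
    print(m, sorted(m_step_competition_graph(A, m)))
# 1 [(1, 2), (2, 3)]
# 2 [(0, 1), (1, 2), (2, 3)]
# 3 [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
```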
Submitted 11 February, 2024;
originally announced February 2024.
-
Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes
Authors:
Darren Liu,
Cheng Ding,
Delgersuren Bold,
Monique Bouvier,
Jiaying Lu,
Benjamin Shickel,
Craig S. Jabaley,
Wenhui Zhang,
Soojin Park,
Michael J. Young,
Mark S. Wainwright,
Gilles Clermont,
Parisa Rashidi,
Eric S. Rosenthal,
Laurie Dimisko,
Ran Xiao,
Joo Heung Yoon,
Carl Yang,
Xiao Hu
Abstract:
The field of healthcare has increasingly turned its focus towards Large Language Models (LLMs) due to their remarkable performance. However, their performance in actual clinical applications has been underexplored. Traditional evaluations based on question-answering tasks do not fully capture the nuanced clinical contexts. This gap highlights the need for more in-depth and practical assessments of LLMs in real-world healthcare settings. Objective: We sought to evaluate the performance of LLMs in the complex clinical context of adult critical care medicine using systematic and comprehensible analytic methods, including clinician annotation and adjudication. Methods: We investigated the performance of three general LLMs in understanding and processing real-world clinical notes. Concepts from 150 clinical notes were identified by MetaMap and then labeled by 9 clinicians. Each LLM's proficiency was evaluated by identifying the temporality and negation of these concepts using different prompts for an in-depth analysis. Results: GPT-4 showed overall superior performance compared to the other LLMs. In contrast, both GPT-3.5 and text-davinci-003 exhibited enhanced performance when appropriate prompting strategies were employed. The GPT family models demonstrated considerable efficiency, evidenced by their cost-effectiveness and time-saving capabilities. Conclusion: A comprehensive qualitative performance evaluation framework for LLMs is developed and operationalized. This framework goes beyond singular performance aspects. With expert annotations, this methodology not only validates LLMs' capabilities in processing complex medical data but also establishes a benchmark for future LLM evaluations across specialized domains.
Submitted 24 January, 2024;
originally announced January 2024.
-
Broadband miniaturized spectrometers with a van der Waals tunnel diode
Authors:
MD Gius Uddin,
Susobhan Das,
Abde Mayeen Shafi,
Lei Wang,
Xiaoqi Cui,
Fedor Nigmatulin,
Faisal Ahmed,
Andreas C. Liapis,
Weiwei Cai,
Zongyin Yang,
Harri Lipsanen,
Tawfique Hasan,
Hoon Hahn Yoon,
Zhipei Sun
Abstract:
Miniaturized spectrometers are of immense interest for various on-chip and implantable photonic and optoelectronic applications. State-of-the-art conventional spectrometer designs rely heavily on bulky dispersive components (such as gratings, photodetector arrays, and interferometric optics) to capture different input spectral components, which increases their integration complexity. Here, we report a high-performance broadband spectrometer based on a simple and compact van der Waals heterostructure diode, leveraging a careful selection of active van der Waals materials (molybdenum disulfide and black phosphorus), their electrically tunable photoresponse, and advanced computational algorithms for spectral reconstruction. We achieve a remarkably high peak wavelength accuracy of ~2 nanometers and a broad operation bandwidth spanning ~500 to 1600 nanometers in a device with a ~30 × 20 μm² footprint. This diode-based spectrometer scheme with broadband operation offers an attractive pathway for various applications, such as sensing, surveillance and spectral imaging.
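A computational spectrometer of this kind infers the input spectrum from photocurrents recorded at several electrical settings. A minimal sketch of that inversion follows, assuming a linear model $I = Rs$ with a pre-calibrated responsivity matrix $R$ (one row per bias setting) and plain Tikhonov regularization; the paper's reconstruction algorithms are not specified in the abstract and may differ, and the random calibration matrix here is synthetic.

```python
import numpy as np


def reconstruct_spectrum(responsivity, photocurrents, alpha=1e-6):
    """Tikhonov-regularized least squares: argmin_s ||R s - I||^2 + alpha * ||s||^2."""
    r = responsivity
    gram = r.T @ r + alpha * np.eye(r.shape[1])
    return np.linalg.solve(gram, r.T @ photocurrents)


wavelengths = np.arange(500, 1601, 20)          # 56 bins from 500 to 1600 nm
rng = np.random.default_rng(0)
# Hypothetical calibration: one responsivity row per bias setting.
responsivity = rng.uniform(0.0, 1.0, size=(80, wavelengths.size))
true_spectrum = np.exp(-0.5 * ((wavelengths - 1000) / 30.0) ** 2)
photocurrents = responsivity @ true_spectrum     # noiseless forward model
estimate = reconstruct_spectrum(responsivity, photocurrents)
print("reconstructed peak (nm):", wavelengths[np.argmax(estimate)])
```

With measurement noise, a larger `alpha` trades resolution for stability; the tiny value here only guards against ill-conditioning in this noiseless toy case.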
Submitted 2 January, 2024;
originally announced January 2024.
-
HEAR: Hearing Enhanced Audio Response for Video-grounded Dialogue
Authors:
Sunjae Yoon,
Dahyun Kim,
Eunseop Yoon,
Hee Suk Yoon,
Junyeong Kim,
Chang D. Yoo
Abstract:
Video-grounded Dialogue (VGD) aims to answer questions regarding a given multi-modal input comprising video, audio, and dialogue history. Although there have been numerous efforts to develop VGD systems that improve the quality of their responses, existing systems can effectively incorporate only the information in the video and text, and tend to struggle to extract the necessary information from the audio when generating appropriate responses to the question. Such VGD systems appear to be deaf, and we therefore coin the term deaf response for this symptom of current systems ignoring audio data. To overcome the deaf response problem, the Hearing Enhanced Audio Response (HEAR) framework is proposed to perform sensible listening by selectively attending to audio whenever the question requires it. The HEAR framework enhances the accuracy and audibility of VGD systems in a model-agnostic manner. HEAR is validated on VGD datasets (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows effectiveness with various VGD systems.
Submitted 15 December, 2023;
originally announced December 2023.
-
Neural Radiance Fields for Transparent Object Using Visual Hull
Authors:
Heechan Yoon,
Seungkyu Lee
Abstract:
Unlike for opaque objects, novel view synthesis of transparent objects is a challenging task, because a transparent object refracts light from the background, causing visual distortions on its surface that change with the viewpoint. Neural Radiance Fields (NeRF), introduced recently, is a view synthesis method whose remarkable performance improvement has led to many follow-up applications across various topics. However, if a scene includes an object with a different refractive index, such as a transparent object, NeRF shows limited performance because the rays refracted at the surface of the transparent object are not appropriately considered. To resolve this problem, we propose a NeRF-based method consisting of three steps. First, we reconstruct the three-dimensional shape of the transparent object using a visual hull. Second, we simulate the refraction of the rays inside the transparent object according to Snell's law. Last, we sample points along the refracted rays and feed them into NeRF. Experimental results demonstrate that our method addresses the limitation of conventional NeRF with transparent objects.
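The refraction step can be made concrete with the standard vector form of Snell's law, $\mathbf{t} = \eta\,\mathbf{d} + (\eta\cos\theta_i - \cos\theta_t)\,\mathbf{n}$ with $\eta = n_1/n_2$. A minimal sketch follows; it assumes unit surface normals are available from the reconstructed visual hull (how the paper obtains them is not stated in the abstract). A ray crossing the object would be refracted twice, on entry and on exit, with the exit normal oriented to face the incoming ray.

```python
import math


def refract(direction, normal, n1, n2):
    """Vector form of Snell's law.

    direction: unit incident ray direction; normal: unit surface normal pointing
    back toward the incident side (dot(direction, normal) < 0).
    Returns the unit refracted direction, or None on total internal reflection.
    """
    cos_i = -sum(d * n for d, n in zip(direction, normal))
    eta = n1 / n2
    k = 1.0 - eta ** 2 * (1.0 - cos_i ** 2)   # equals cos^2 of the refraction angle
    if k < 0.0:
        return None
    cos_t = math.sqrt(k)
    return tuple(eta * d + (eta * cos_i - cos_t) * n
                 for d, n in zip(direction, normal))


# Ray entering glass (n = 1.5) from air at 45 degrees; the surface normal is +y.
theta = math.radians(45.0)
inside = refract((math.sin(theta), -math.cos(theta), 0.0), (0.0, 1.0, 0.0), 1.0, 1.5)
print(inside)  # x-component equals sin(45 degrees) / 1.5
```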
Submitted 13 December, 2023;
originally announced December 2023.
-
SimPSI: A Simple Strategy to Preserve Spectral Information in Time Series Data Augmentation
Authors:
Hyun Ryu,
Sunjae Yoon,
Hee Suk Yoon,
Eunseop Yoon,
Chang D. Yoo
Abstract:
Data augmentation is a crucial component in training neural networks to overcome the limitation imposed by data size, and several techniques have been studied for time series. Although these techniques are effective in certain tasks, they have yet to be generalized to time series benchmarks. We find that current data augmentation techniques ruin the core information contained within the frequency domain. To address this issue, we propose a simple strategy to preserve spectral information (SimPSI) in time series data augmentation. SimPSI preserves the spectral information by mixing the original and augmented input spectrum weighted by a preservation map, which indicates the importance score of each frequency. Specifically, we build three distinct preservation maps: magnitude spectrum, saliency map, and spectrum-preservative map. We apply SimPSI to various time series data augmentations and evaluate its effectiveness across a wide range of time series benchmarks. Our experimental results indicate that SimPSI considerably enhances the performance of time series data augmentations by preserving core spectral information. The source code used in the paper is available at https://github.com/Hyun-Ryu/simpsi.
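The spectral mixing step can be sketched with a real FFT. This is a minimal illustration, assuming a per-bin convex blend $M \odot X_{\text{orig}} + (1 - M) \odot X_{\text{aug}}$ with $M \in [0, 1]$; the magnitude map normalized by its maximum is a hypothetical stand-in for the magnitude-spectrum map named above, and SimPSI's saliency and spectrum-preservative maps are not reproduced.

```python
import numpy as np


def preserve_spectrum(original, augmented, preservation_map):
    """Mix spectra bin by bin: weight 1 keeps the original bin, weight 0 the augmented one."""
    spec_orig = np.fft.rfft(original)
    spec_aug = np.fft.rfft(augmented)
    mixed = preservation_map * spec_orig + (1.0 - preservation_map) * spec_aug
    return np.fft.irfft(mixed, n=len(original))


def magnitude_preservation_map(original):
    """Illustrative magnitude-based map, scaled so the strongest bin has weight 1."""
    magnitude = np.abs(np.fft.rfft(original))
    return magnitude / magnitude.max()


rng = np.random.default_rng(0)
t = np.arange(128)
x = np.sin(2 * np.pi * 4 * t / 128)                 # dominant frequency at bin 4
x_aug = x + 0.5 * rng.standard_normal(t.size)       # e.g. a jitter augmentation
x_mixed = preserve_spectrum(x, x_aug, magnitude_preservation_map(x))
```

Because the dominant bin receives weight 1, the signal's main frequency component survives the augmentation unchanged, while weak bins take on the augmented content.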
Submitted 10 December, 2023;
originally announced December 2023.
-
Single Image Reflection Removal with Reflection Intensity Prior Knowledge
Authors:
Dongshen Han,
Seungkyu Lee,
Chaoning Zhang,
Heechan Yoon,
Hyukmin Kwon,
HyunCheol Kim,
HyonGon Choo
Abstract:
Single Image Reflection Removal (SIRR) in real-world images is a challenging task due to diverse image degradations occurring on the glass surface during light transmission and reflection. Many existing methods rely on specific prior assumptions to resolve the problem. In this paper, we propose a general reflection intensity prior that captures the intensity of the reflection phenomenon and demonstrate its effectiveness. To learn the reflection intensity prior, we introduce the Reflection Prior Extraction Network (RPEN). By segmenting images into regional patches, RPEN learns a non-uniform reflection prior within an image. We propose the Prior-based Reflection Removal Network (PRRN), a simple transformer U-Net architecture that adapts to the reflection prior supplied by RPEN. Experimental results on real-world benchmarks demonstrate the effectiveness of our approach, which achieves state-of-the-art accuracy in SIRR.
Submitted 6 December, 2023;
originally announced December 2023.
-
Synergistic Perception and Control Simplex for Verifiable Safe Vertical Landing
Authors:
Ayoosh Bansal,
Yang Zhao,
James Zhu,
Sheng Cheng,
Yuliang Gu,
Hyung-Jin Yoon,
Hunmin Kim,
Naira Hovakimyan,
Lui Sha
Abstract:
Perception, Planning, and Control form the essential components of autonomy in advanced air mobility. This work advances the holistic integration of these components to enhance the performance and robustness of the complete cyber-physical system. We adapt Perception Simplex, a system for verifiable collision avoidance amidst obstacle detection faults, to the vertical landing maneuver for autonomous air mobility vehicles. We improve upon this system by replacing static assumptions of control capabilities with dynamic confirmation, i.e., real-time confirmation of control limitations of the system, ensuring reliable fulfillment of safety maneuvers and overrides, without dependence on overly pessimistic assumptions. Parameters defining control system capabilities and limitations, e.g., maximum deceleration, are continuously tracked within the system and used to make safety-critical decisions. We apply these techniques to propose a verifiable collision avoidance solution for autonomous aerial mobility vehicles operating in cluttered and potentially unsafe environments.
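Dynamic confirmation of control limits can be illustrated with constant-deceleration kinematics: a vehicle descending at speed $v$, with reaction latency $t_r$ and confirmed maximum deceleration $a_{\max}$, needs $v\,t_r + v^2/(2a_{\max})$ to stop. A minimal sketch of such a switching check follows, assuming a single tracked deceleration estimate, a fixed latency, and a fixed safety margin; all names and numbers are hypothetical, and this is not the paper's Perception Simplex implementation.

```python
from dataclasses import dataclass


@dataclass
class DescentState:
    speed: float       # downward speed, m/s
    clearance: float   # distance to the nearest detected obstacle below, m


def stopping_distance(speed, max_decel, latency):
    """Distance travelled during the reaction latency plus braking at max_decel."""
    return speed * latency + speed ** 2 / (2.0 * max_decel)


def must_override(state, max_decel_estimate, latency=0.2, margin=1.0):
    """Hypothetical simplex switching rule using the currently confirmed deceleration."""
    if max_decel_estimate <= 0.0:
        return True  # no confirmed braking authority: fail safe
    needed = stopping_distance(state.speed, max_decel_estimate, latency)
    return state.clearance <= needed + margin


state = DescentState(speed=4.0, clearance=10.0)
print(must_override(state, max_decel_estimate=2.0))  # False: needs 4.8 m + 1 m margin
print(must_override(state, max_decel_estimate=0.8))  # True: needs 10.8 m + 1 m margin
```

The same clearance is safe or unsafe depending on the deceleration currently confirmed, which is the point of replacing a static worst-case assumption with a tracked value.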
Submitted 5 December, 2023;
originally announced December 2023.
-
Augmenting x-ray single particle imaging reconstruction with self-supervised machine learning
Authors:
Zhantao Chen,
Cong Wang,
Mingye Gao,
Chun Hong Yoon,
Jana B. Thayer,
Joshua J. Turner
Abstract:
The development of X-ray Free Electron Lasers (XFELs) has opened numerous opportunities to probe atomic structure and ultrafast dynamics of various materials. Single Particle Imaging (SPI) with XFELs enables the investigation of biological particles in their natural physiological states with unparalleled temporal resolution, while circumventing the need for cryogenic conditions or crystallization. However, reconstructing real-space structures from reciprocal-space x-ray diffraction data is highly challenging due to the absence of phase and orientation information, which is further complicated by weak scattering signals and considerable fluctuations in the number of photons per pulse. In this work, we present an end-to-end, self-supervised machine learning approach to recover particle orientations and estimate reciprocal space intensities from diffraction images only. Our method demonstrates great robustness under demanding experimental conditions with significantly enhanced reconstruction capabilities compared with conventional algorithms, and signifies a paradigm shift in SPI as currently practiced at XFELs.
Submitted 28 November, 2023;
originally announced November 2023.
-
Cracking the Code of Negative Transfer: A Cooperative Game Theoretic Approach for Cross-Domain Sequential Recommendation
Authors:
Chung Park,
Taesan Kim,
Taekyoon Choi,
Junui Hong,
Yelim Yu,
Mincheol Cho,
Kyunam Lee,
Sungil Ryu,
Hyungjun Yoon,
Minsung Choi,
Jaegul Choo
Abstract:
This paper investigates Cross-Domain Sequential Recommendation (CDSR), a promising method that uses information from multiple domains (more than three) to generate accurate and diverse recommendations, and takes into account the sequential nature of user interactions. The effectiveness of these systems often depends on the complex interplay among the multiple domains. In this dynamic landscape, the problem of negative transfer arises, where heterogeneous knowledge between dissimilar domains leads to performance degradation due to differences in user preferences across these domains. As a remedy, we propose a new CDSR framework that addresses negative transfer by assessing the extent of negative transfer from one domain to another and adaptively assigning low weight values to the corresponding prediction losses. To this end, the amount of negative transfer is estimated by measuring the marginal contribution of each domain to model performance based on cooperative game theory. In addition, we develop a hierarchical contrastive learning approach that incorporates information from the sequence of coarse-level categories into that of fine-level categories (e.g., item level) to mitigate negative transfer. Despite the potentially low relevance between domains at the fine level, there may be higher relevance at the category level due to its generalised and broader preferences. We show that our model outperforms prior works on two real-world datasets across ten different domains.
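Measuring each domain's marginal contribution to performance is the idea behind the Shapley value in cooperative game theory. A minimal exact-enumeration sketch follows, assuming the "value" of a set of source domains is a scalar performance score on the target domain; the domain names and scores are invented for illustration, exact enumeration is exponential in the number of domains, and the paper's own estimator may differ.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player for a characteristic function value(frozenset)."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability that exactly this coalition precedes p in a random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical target-domain performance when training with a subset of source domains.
performance = {
    frozenset(): 0.40,
    frozenset({"books"}): 0.48,
    frozenset({"movies"}): 0.45,
    frozenset({"games"}): 0.37,
    frozenset({"books", "movies"}): 0.52,
    frozenset({"books", "games"}): 0.44,
    frozenset({"movies", "games"}): 0.41,
    frozenset({"books", "movies", "games"}): 0.49,
}
contrib = shapley_values(["books", "movies", "games"], performance.__getitem__)
print(contrib)  # a negative value flags a domain that transfers negatively
```

In this toy table "games" lowers performance in every coalition, so its contribution is negative and its loss would be a candidate for a low weight.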
Submitted 22 November, 2023;
originally announced November 2023.
-
Ultra-Long Sequence Distributed Transformer
Authors:
Xiao Wang,
Isaac Lyngaas,
Aristeidis Tsaris,
Peng Chen,
Sajal Dash,
Mayanka Chandra Shekar,
Tao Luo,
Hong-Jun Yoon,
Mohamed Wahib,
John Gounley
Abstract:
Transformer models trained on long sequences often achieve higher accuracy than those trained on short sequences. Unfortunately, conventional transformers struggle with long sequence training due to the overwhelming computation and memory requirements. Existing methods for long sequence training offer limited speedup and memory reduction, and may compromise accuracy. This paper presents a novel and efficient distributed training method, the Long Short-Sequence Transformer (LSS Transformer), for training transformers on long sequences. It distributes a long sequence into segments among GPUs, with each GPU computing a partial self-attention for its segment. Then, it uses fused communication and a novel double gradient averaging technique to avoid the need to aggregate partial self-attention and to minimize communication overhead. We compared the performance of the LSS Transformer with the state-of-the-art Nvidia sequence parallelism on the Wikipedia enwik8 dataset. Results show that our proposed method leads to a 5.6x faster and 10.2x more memory-efficient implementation than state-of-the-art sequence parallelism on 144 Nvidia V100 GPUs. Moreover, our algorithm scales to an extreme sequence length of 50,112 at 3,456 GPUs, achieving 161% super-linear parallel efficiency and a throughput of 32 petaflops.
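Splitting a sequence across workers can be illustrated with a single-process simulation. Because softmax normalizes each query row independently, splitting the queries into contiguous segments and concatenating the per-segment outputs reproduces full self-attention. This is a minimal sketch assuming every segment can see all keys and values; it does not model the LSS Transformer's fused communication or double gradient averaging.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(q, k, v):
    """Scaled dot-product attention for one head."""
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v


def segmented_attention(q, k, v, num_workers):
    """Simulate sequence sharding: each worker owns one contiguous query segment."""
    segments = np.array_split(np.arange(q.shape[0]), num_workers)
    # In a real multi-GPU run, producing each block needs keys/values from all segments.
    partial = [attention(q[idx], k, v) for idx in segments]
    return np.concatenate(partial, axis=0)


rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((12, 8)) for _ in range(3))
print(np.allclose(segmented_attention(q, k, v, num_workers=4), attention(q, k, v)))  # True
```

The comment in `segmented_attention` marks where the real cost lies: the key/value exchange between devices, which is the overhead that distributed schemes such as the one described above aim to reduce.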
Submitted 8 November, 2023; v1 submitted 4 November, 2023;
originally announced November 2023.
-
CXR-LLAVA: a multimodal large language model for interpreting chest X-ray images
Authors:
Seowoo Lee,
Jiwon Youn,
Hyungjin Kim,
Mansu Kim,
Soon Ho Yoon
Abstract:
Purpose: This study aimed to develop an open-source multimodal large language model (CXR-LLAVA) for interpreting chest X-ray images (CXRs), leveraging recent advances in large language models (LLMs) to potentially replicate the image interpretation skills of human radiologists. Materials and Methods: For training, we collected 592,580 publicly available CXRs, of which 374,881 had labels for certain radiographic abnormalities (Dataset 1) and 217,699 provided free-text radiology reports (Dataset 2). After pre-training a vision transformer with Dataset 1, we integrated it with an LLM influenced by the LLAVA network. Then, the model was fine-tuned, primarily using Dataset 2. The model's diagnostic performance for major pathological findings was evaluated, along with the acceptability of radiologic reports by human radiologists, to gauge its potential for autonomous reporting. Results: The model demonstrated impressive performance in test sets, achieving an average F1 score of 0.81 for six major pathological findings in the MIMIC internal test set and 0.62 for seven major pathological findings in the external test set. The model's F1 scores surpassed those of GPT-4-vision and Gemini-Pro-Vision in both test sets. In human radiologist evaluations of the external test set, the model achieved a 72.7% success rate in autonomous reporting, below the 84.0% rate of ground truth reports. Conclusion: This study highlights the significant potential of multimodal LLMs for CXR interpretation, while also acknowledging the performance limitations. Despite these challenges, we believe that making our model open-source will catalyze further research, expanding its effectiveness and applicability in various clinical contexts. CXR-LLAVA is available at https://github.com/ECOFRI/CXR_LLAVA.
Submitted 14 January, 2024; v1 submitted 22 October, 2023;
originally announced October 2023.
-
FedTherapist: Mental Health Monitoring with User-Generated Linguistic Expressions on Smartphones via Federated Learning
Authors:
Jaemin Shin,
Hyungjun Yoon,
Seungjoo Lee,
Sungjoon Park,
Yunxin Liu,
Jinho D. Choi,
Sung-Ju Lee
Abstract:
Psychiatrists diagnose mental disorders through patients' use of language. Still, due to data privacy concerns, existing passive mental health monitoring systems use alternative features such as activity, app usage, and location via mobile devices. We propose FedTherapist, a mobile mental health monitoring system that utilizes continuous speech and keyboard input in a privacy-preserving way via federated learning. We explore multiple model designs by comparing their performance and overhead for FedTherapist to overcome the complex nature of on-device language model training on smartphones. We further propose a Context-Aware Language Learning (CALL) methodology to effectively utilize smartphones' large and noisy text for mental health signal sensing. Our IRB-approved evaluation of the prediction of self-reported depression, stress, anxiety, and mood from 46 participants shows that FedTherapist achieves higher accuracy than models using non-language features, with a 0.15 AUROC improvement and an 8.21% MAE reduction.
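In federated learning, speech and keyboard text stay on each phone and only model updates are shared with the server. A minimal sketch of the standard FedAvg aggregation step, a sample-size-weighted average of client parameters, follows. Whether FedTherapist uses exactly this rule is an assumption, and the parameter vectors are hypothetical.

```python
def fed_avg(client_weights, client_sizes):
    """Average client parameter vectors, weighting each client by its local sample count."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(dim)]


# Hypothetical round: two phones send updated parameters, never their raw text.
phone_a = [1.0, 2.0]   # trained on 100 local samples
phone_b = [3.0, 4.0]   # trained on 300 local samples
print(fed_avg([phone_a, phone_b], [100, 300]))  # [2.5, 3.5]
```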
Submitted 25 October, 2023;
originally announced October 2023.
-
Diversity Enhanced Narrative Question Generation for Storybooks
Authors:
Hokeun Yoon,
JinYeong Bak
Abstract:
Question generation (QG) from a given context can enhance comprehension, engagement, assessment, and overall efficacy in learning or conversational environments. Despite recent advancements in QG, the challenge of enhancing or measuring the diversity of generated questions often remains unaddressed. In this paper, we introduce a multi-question generation model (mQG), which is capable of generating multiple, diverse, and answerable questions by focusing on context and questions. To validate the answerability of the generated questions, we employ a SQuAD2.0 fine-tuned question answering model, classifying the questions as answerable or not. We train and evaluate mQG on the FairytaleQA dataset, a well-structured QA dataset based on storybooks, with narrative questions. We further apply a zero-shot adaptation on the TellMeWhy and SQuAD1.1 datasets. mQG shows promising results across various evaluation metrics compared with strong baselines.
Submitted 25 October, 2023;
originally announced October 2023.
-
The FLASH pilot survey: an HI absorption search against MRC 1-Jy radio sources
Authors:
J. N. H. S. Aditya,
Hyein Yoon,
James R. Allison,
Tao An,
Rajan Chhetri,
Stephen J. Curran,
Jeremy Darling,
Kimberly L. Emig,
Marcin Glowacki,
Emily Kerrison,
Bärbel S. Koribalski,
Elizabeth K. Mahony,
Vanessa A. Moss,
John Morgan,
Elaine M. Sadler,
Roberto Soria,
Renzhi Su,
Simon Weng,
Matthew Whiting
Abstract:
We report an ASKAP search for associated HI 21-cm absorption against bright radio sources from the Molonglo Reference Catalogue (MRC) 1-Jy sample. The search uses pilot survey data from the ASKAP First Large Absorption Survey in HI (FLASH) covering the redshift range $0.42 < z < 1.00$. From a sample of 62 MRC 1-Jy radio galaxies and quasars in this redshift range we report three new detections of associated HI 21-cm absorption, yielding an overall detection fraction of $4.8\%^{+4.7\%}_{-2.6\%}$. The detected systems comprise two radio galaxies (MRC 2216$-$281 at $z=0.657$ and MRC 0531$-$237 at $z=0.851$) and one quasar (MRC 2156$-$245 at $z=0.862$). The MRC 0531$-$237 absorption system is the strongest found to date, with a velocity-integrated optical depth of $\rm 143.8 \pm 0.4 \ km \ s^{-1}$. All three objects with detected HI 21-cm absorption are peaked-spectrum or compact steep-spectrum (CSS) radio sources, classified based on our SED fits to the spectra. Two of them show strong interplanetary scintillation at 162 MHz, implying that the radio continuum source is smaller than 1 arcsec in size even at low frequencies. Among the class of peaked-spectrum and compact steep-spectrum radio sources, the HI detection fraction is $23\%^{+22\%}_{-13\%}$. This is consistent within $1\sigma$ with a detection fraction of $\approx 42\%^{+21\%}_{-15\%}$ in earlier reported GPS and CSS samples at intermediate redshifts ($0.4 < z < 1.0$). All three detections have a high 1.4 GHz radio luminosity, with MRC 0531$-$237 and MRC 2216$-$281 having the highest values in the sample, $\rm > 10^{27.5} \ W \ Hz^{-1}$. The preponderance of extended radio sources in our sample could partially explain the overall low detection fraction, while the effects of a redshift evolution in gas properties and AGN UV luminosity on the neutral gas absorption still need to be investigated.
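Small-sample detection fractions like these need asymmetric error bars. The abstract does not state how its errors were derived; a minimal sketch follows, assuming Gehrels' (1986) approximate 1σ Poisson confidence limits on the number of detections, divided by the sample size. With 3 detections among 13 sources (a sample size inferred from the quoted 23%, not stated in the abstract), it reproduces $23\%^{+22\%}_{-13\%}$.

```python
import math


def gehrels_limits(k, s=1.0):
    """Approximate s-sigma Poisson confidence limits on k counts (Gehrels 1986)."""
    upper = (k + 1) * (1 - 1 / (9 * (k + 1)) + s / (3 * math.sqrt(k + 1))) ** 3
    lower = 0.0 if k == 0 else k * (1 - 1 / (9 * k) - s / (3 * math.sqrt(k))) ** 3
    return lower, upper


def detection_fraction(detections, sample_size, s=1.0):
    """Return (fraction, +error, -error) in percent."""
    lower, upper = gehrels_limits(detections, s)
    frac = detections / sample_size
    return (100 * frac,
            100 * (upper / sample_size - frac),
            100 * (frac - lower / sample_size))


frac, plus, minus = detection_fraction(3, 13)
print(f"{frac:.0f}% +{plus:.0f}% -{minus:.0f}%")  # 23% +22% -13%
```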
Submitted 23 October, 2023;
originally announced October 2023.
-
Efficient machine-learning surrogates for large-scale geological carbon and energy storage
Authors:
Teeratorn Kadeethum,
Stephen J. Verzi,
Hongkyu Yoon
Abstract:
Geological carbon and energy storage are pivotal for achieving net-zero carbon emissions and addressing climate change. However, they face uncertainties due to geological factors and operational limitations, resulting in possibilities of induced seismic events or groundwater contamination. To overcome these challenges, we propose a specialized machine-learning (ML) model to manage extensive reservoir models efficiently.
While ML approaches hold promise for geological carbon storage, the substantial computational resources required for large-scale analysis remain a major obstacle. We have developed a method to reduce the training cost for deep neural operator models, using domain decomposition and a topology embedder to link spatio-temporal points. This approach allows accurate predictions within the model's domain, even for data not seen during training, enhancing ML efficiency for large-scale geological storage applications.
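Domain decomposition lets a surrogate be trained on small subdomains instead of the whole reservoir grid. A minimal sketch of the bookkeeping follows. It assumes non-overlapping square tiles and uses each tile's normalized global position as a simple, hypothetical stand-in for the topology embedder, whose actual design the abstract does not describe. The field is synthetic.

```python
import numpy as np


def decompose(field, tile):
    """Split a 2D field into non-overlapping tiles tagged with normalized global origins."""
    ny, nx = field.shape
    tiles = []
    for y0 in range(0, ny, tile):
        for x0 in range(0, nx, tile):
            patch = field[y0:y0 + tile, x0:x0 + tile]
            position = (y0 / ny, x0 / nx)   # illustrative location feature for a surrogate
            tiles.append((y0, x0, position, patch))
    return tiles


def reassemble(tiles, shape):
    """Stitch per-tile predictions back into the full domain."""
    out = np.zeros(shape)
    for y0, x0, _, patch in tiles:
        out[y0:y0 + patch.shape[0], x0:x0 + patch.shape[1]] = patch
    return out


pressure = np.arange(20 * 30, dtype=float).reshape(20, 30)  # stand-in reservoir field
tiles = decompose(pressure, tile=8)
print(len(tiles))  # 12 tiles (3 rows x 4 columns)
```

In practice a surrogate would predict each `patch` from local inputs plus its `position`, and `reassemble` would stitch the predictions together. Overlapping tiles are a common refinement to avoid seams at tile boundaries.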
Submitted 11 October, 2023;
originally announced October 2023.