Search | arXiv e-print repository

arXiv:2409.11924 [pdf, other]

Optical intensity-gradient torque due to chiral multipole interplay

Authors: Jiquan Wen, Huajin Chen, Hongxia Zheng, Xiaohao Xu, Shaohui Yan, Baoli Yao, Zhifang Lin

Abstract: Owing to the ubiquity and easy-to-shape property of optical intensity, the intensity gradient force of light has been most spectacularly exploited in optical manipulation of small particles. Manifesting the intensity gradient as an optical torque to spin particles is of great fascination on both fundamental and practical sides but remains elusive. Here, we uncover the existence of the optical inte… ▽ More Owing to the ubiquity and easy-to-shape property of optical intensity, the intensity gradient force of light has been most spectacularly exploited in optical manipulation of small particles. Manifesting the intensity gradient as an optical torque to spin particles is of great fascination on both fundamental and practical sides but remains elusive. Here, we uncover the existence of the optical intensity-gradient torque in the interaction of light with chiral particles. Such a new type of torque derives from the interplay between chirality induced multipoles, which switches its direction for particles with opposite chirality. We show that this torque can be directly detected by a simple standing wave field, created with the interference of two counterpropagating plane-like waves. Our work offers a unique route to achieve rotational control of matter by tailoring the field intensity of Maxwell waves. It also establishes a framework that maps a remarkable connection among the optical forces and torques, across chiral to nonchiral. △ Less

Submitted 18 September, 2024; originally announced September 2024.

arXiv:2409.05118 [pdf, other]

Physics-augmented Deep Learning with Adversarial Domain Adaptation: Applications to STM Image Denoising

Authors: Jianxin Xie, Wonhee Ko, Rui-Xing Zhang, Bing Yao

Abstract: Image denoising is a critical task in various scientific fields such as medical imaging and material characterization, where the accurate recovery of underlying structures from noisy data is essential. Although supervised denoising techniques have achieved significant advancements, they typically require large datasets of paired clean-noisy images for training. Unsupervised methods, while not reli… ▽ More Image denoising is a critical task in various scientific fields such as medical imaging and material characterization, where the accurate recovery of underlying structures from noisy data is essential. Although supervised denoising techniques have achieved significant advancements, they typically require large datasets of paired clean-noisy images for training. Unsupervised methods, while not reliant on paired data, typically necessitate a set of unpaired clean images for training, which are not always accessible. In this paper, we propose a physics-augmented deep learning with adversarial domain adaption (PDA-Net) framework for unsupervised image denoising, with applications to denoise real-world scanning tunneling microscopy (STM) images. Our PDA-Net leverages the underlying physics to simulate and envision the ground truth for denoised STM images. Additionally, built upon Generative Adversarial Networks (GANs), we incorporate a cycle-consistency module and a domain adversarial module into our PDA-Net to address the challenge of lacking paired training data and achieve information transfer between the simulated and real experimental domains. Finally, we propose to implement feature alignment and weight-sharing techniques to fully exploit the similarity between simulated and real experimental images, thereby enhancing the denoising performance in both the simulation and experimental domains. Experimental results demonstrate that the proposed PDA-Net successfully enhances the quality of STM images, offering promising applications to enhance scientific discovery and accelerate experimental quantum material research. △ Less

Submitted 8 September, 2024; originally announced September 2024.

arXiv:2408.16465 [pdf, other]

Human and LLM-Based Voice Assistant Interaction: An Analytical Framework for User Verbal and Nonverbal Behaviors

Authors: Szeyi Chan, Shihan Fu, Jiachen Li, Bingsheng Yao, Smit Desai, Mirjana Prpa, Dakuo Wang

Abstract: Recent progress in large language model (LLM) technology has significantly enhanced the interaction experience between humans and voice assistants (VAs). This project aims to explore a user's continuous interaction with LLM-based VA (LLM-VA) during a complex task. We recruited 12 participants to interact with an LLM-VA during a cooking task, selected for its complexity and the requirement for cont… ▽ More Recent progress in large language model (LLM) technology has significantly enhanced the interaction experience between humans and voice assistants (VAs). This project aims to explore a user's continuous interaction with LLM-based VA (LLM-VA) during a complex task. We recruited 12 participants to interact with an LLM-VA during a cooking task, selected for its complexity and the requirement for continuous interaction. We observed that users show both verbal and nonverbal behaviors, though they know that the LLM-VA can not capture those nonverbal signals. Despite the prevalence of nonverbal behavior in human-human communication, there is no established analytical methodology or framework for exploring it in human-VA interactions. After analyzing 3 hours and 39 minutes of video recordings, we developed an analytical framework with three dimensions: 1) behavior characteristics, including both verbal and nonverbal behaviors, 2) interaction stages--exploration, conflict, and integration--that illustrate the progression of user interactions, and 3) stage transition throughout the task. This analytical framework identifies key verbal and nonverbal behaviors that provide a foundation for future research and practical applications in optimizing human and LLM-VA interactions. △ Less

Submitted 3 September, 2024; v1 submitted 29 August, 2024; originally announced August 2024.

arXiv:2408.03586 [pdf, other]

Clinical Challenges and AI Opportunities in Decision-Making for Cancer Treatment-Induced Cardiotoxicity

Authors: Siyi Wu, Weidan Cao, Shihan Fu, Bingsheng Yao, Ziqi Yang, Changchang Yin, Varun Mishra, Daniel Addison, Ping Zhang, Dakuo Wang

Abstract: Cardiotoxicity induced by cancer treatment has become a major clinical concern, affecting the long-term survival and quality of life of cancer patients. Effective clinical decision-making, including the detection of cancer treatment-induced cardiotoxicity and the monitoring of associated symptoms, remains a challenging task for clinicians. This study investigates the current practices and needs of… ▽ More Cardiotoxicity induced by cancer treatment has become a major clinical concern, affecting the long-term survival and quality of life of cancer patients. Effective clinical decision-making, including the detection of cancer treatment-induced cardiotoxicity and the monitoring of associated symptoms, remains a challenging task for clinicians. This study investigates the current practices and needs of clinicians in the clinical decision making of cancer treatment-induced cardiotoxicity and explores the potential of digital health technologies to support this process. Through semi-structured interviews with seven clinical experts, we identify a three-step decision-making paradigm: 1) symptom identification, 2) diagnostic testing and specialist collaboration, and 3) clinical decision-making and intervention. Our findings highlight the difficulties of diagnosing cardiotoxicity (absence of unified protocols and high variability in symptoms) and monitoring patient symptoms (lacking accurate and timely patient self-reported symptoms). The clinicians also expressed their need for effective early detection tools that can integrate remote patient monitoring capabilities. Based on these insights, we discuss the importance of understanding the dynamic nature of clinical workflows, and the design considerations for future digital tools to support cancer-treatment-induced cardiotoxicity decision-making. △ Less

Submitted 7 August, 2024; originally announced August 2024.

Comments: In Submission

arXiv:2408.02456 [pdf, other]

doi 10.1145/3639472

Enhancing Heterogeneous Knowledge Graph Completion with a Novel GAT-based Approach

Authors: Wanxu Wei, Yitong Song, Bin Yao

Abstract: Knowledge graphs (KGs) play a vital role in enhancing search results and recommendation systems. With the rapid increase in the size of the KGs, they are becoming inaccuracy and incomplete. This problem can be solved by the knowledge graph completion methods, of which graph attention network (GAT)-based methods stand out since their superior performance. However, existing GAT-based knowledge graph… ▽ More Knowledge graphs (KGs) play a vital role in enhancing search results and recommendation systems. With the rapid increase in the size of the KGs, they are becoming inaccuracy and incomplete. This problem can be solved by the knowledge graph completion methods, of which graph attention network (GAT)-based methods stand out since their superior performance. However, existing GAT-based knowledge graph completion methods often suffer from overfitting issues when dealing with heterogeneous knowledge graphs, primarily due to the unbalanced number of samples. Additionally, these methods demonstrate poor performance in predicting the tail (head) entity that shares the same relation and head (tail) entity with others. To solve these problems, we propose GATH, a novel GAT-based method designed for Heterogeneous KGs. GATH incorporates two separate attention network modules that work synergistically to predict the missing entities. We also introduce novel encoding and feature transformation approaches, enabling the robust performance of GATH in scenarios with imbalanced samples. Comprehensive experiments are conducted to evaluate the GATH's performance. Compared with the existing SOTA GAT-based model on Hits@10 and MRR metrics, our model improves performance by 5.2% and 5.2% on the FB15K-237 dataset, and by 4.5% and 14.6% on the WN18RR dataset, respectively. △ Less

Submitted 5 August, 2024; originally announced August 2024.

Journal ref: ACM Transactions on Knowledge Discovery from Data, Volume 18, Issue 4, 2024

arXiv:2407.19544 [pdf]

Deep Generative Models-Assisted Automated Labeling for Electron Microscopy Images Segmentation

Authors: Wenhao Yuan, Bingqing Yao, Shengdong Tan, Fengqi You, Qian He

Abstract: The rapid advancement of deep learning has facilitated the automated processing of electron microscopy (EM) big data stacks. However, designing a framework that eliminates manual labeling and adapts to domain gaps remains challenging. Current research remains entangled in the dilemma of pursuing complete automation while still requiring simulations or slight manual annotations. Here we demonstrate… ▽ More The rapid advancement of deep learning has facilitated the automated processing of electron microscopy (EM) big data stacks. However, designing a framework that eliminates manual labeling and adapts to domain gaps remains challenging. Current research remains entangled in the dilemma of pursuing complete automation while still requiring simulations or slight manual annotations. Here we demonstrate tandem generative adversarial network (tGAN), a fully label-free and simulation-free pipeline capable of generating EM images for computer vision training. The tGAN can assimilate key features from new data stacks, thus producing a tailored virtual dataset for the training of automated EM analysis tools. Using segmenting nanoparticles for analyzing size distribution of supported catalysts as the demonstration, our findings showcased that the recognition accuracy of tGAN even exceeds the manually-labeling method by 5%. It can also be adaptively deployed to various data domains without further manual manipulation, which is verified by transfer learning from HAADF-STEM to BF-TEM. This generalizability may enable it to extend its application to a broader range of imaging characterizations, liberating microscopists and materials scientists from tedious dataset annotations. △ Less

Submitted 28 July, 2024; originally announced July 2024.

arXiv:2407.18271 [pdf, other]

Large Language Model for Verilog Generation with Golden Code Feedback

Authors: Ning Wang, Bingkun Yao, Jie Zhou, Xi Wang, Zhe Jiang, Nan Guan

Abstract: Recent advancements in large language models (LLMs) have catalyzed significant interest in the automatic generation of Register-Transfer Level (RTL) code, particularly Verilog, from natural language instructions. While commercial LLMs like ChatGPT have dominated this domain, open-source alternatives have lagged considerably in performance, limiting the flexibility and data privacy of this emerging… ▽ More Recent advancements in large language models (LLMs) have catalyzed significant interest in the automatic generation of Register-Transfer Level (RTL) code, particularly Verilog, from natural language instructions. While commercial LLMs like ChatGPT have dominated this domain, open-source alternatives have lagged considerably in performance, limiting the flexibility and data privacy of this emerging technology. This study introduces a novel approach utilizing reinforcement learning with golden code feedback to enhance the performance of pre-trained models. Leveraging open-source data and base models, we have achieved state-of-the-art (SOTA) results with a substantial margin. Notably, our 6.7B parameter model \ours{} demonstrates superior performance compared to current best-in-class 13B and 16B models. Furthermore, through a comprehensive analysis of the limitations in direct fine-tuning and the training dynamics of reinforcement learning, we posit that the development of comprehensive supervisory signals, which are align with the inherent parallel semantics of Verilog code, is critical to effective generation. The code and data associated with this research are publicly available at \url{https://github.com/CatIIIIIIII/veriseek}. The model weights can be accessed at \url{https://huggingface.co/WANGNingroci/VeriSeek}. △ Less

Submitted 5 August, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

arXiv:2407.17571 [pdf, other]

Diffusion Models for Multi-Task Generative Modeling

Authors: Changyou Chen, Han Ding, Bunyamin Sisman, Yi Xu, Ouye Xie, Benjamin Z. Yao, Son Dinh Tran, Belinda Zeng

Abstract: Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to a single-generation modeling. Can we generalize diffusion models with the ability of multi-modal generative training for more generalizable modeling? In this paper, we propose a principled way to define a diffusion model by constructing a unifi… ▽ More Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to a single-generation modeling. Can we generalize diffusion models with the ability of multi-modal generative training for more generalizable modeling? In this paper, we propose a principled way to define a diffusion model by constructing a unified multi-modal diffusion model in a common diffusion space. We define the forward diffusion process to be driven by an information aggregation from multiple types of task-data, e.g., images for a generation task and labels for a classification task. In the reverse process, we enforce information sharing by parameterizing a shared backbone denoising network with additional modality-specific decoder heads. Such a structure can simultaneously learn to generate different types of multi-modal data with a multi-task loss, which is derived from a new multi-modal variational lower bound that generalizes the standard diffusion model. We propose several multimodal generation settings to verify our framework, including image transition, masked-image training, joint image-label and joint image-representation generative modeling. Extensive experimental results on ImageNet indicate the effectiveness of our framework for various multi-modal generative modeling, which we believe is an important research direction worthy of more future explorations. △ Less

Submitted 24 July, 2024; originally announced July 2024.

Comments: Published as a conference paper at ICLR 2024

arXiv:2407.16999 [pdf, other]

doi 10.1145/3394486.3403129

SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing

Authors: Changchang Yin, Pin-Yu Chen, Bingsheng Yao, Dakuo Wang, Jeffrey Caterino, Ping Zhang

Abstract: Sepsis is the leading cause of in-hospital mortality in the USA. Early sepsis onset prediction and diagnosis could significantly improve the survival of sepsis patients. Existing predictive models are usually trained on high-quality data with few missing information, while missing values widely exist in real-world clinical scenarios (especially in the first hours of admissions to the hospital), wh… ▽ More Sepsis is the leading cause of in-hospital mortality in the USA. Early sepsis onset prediction and diagnosis could significantly improve the survival of sepsis patients. Existing predictive models are usually trained on high-quality data with few missing information, while missing values widely exist in real-world clinical scenarios (especially in the first hours of admissions to the hospital), which causes a significant decrease in accuracy and an increase in uncertainty for the predictive models. The common method to handle missing values is imputation, which replaces the unavailable variables with estimates from the observed data. The uncertainty of imputation results can be propagated to the sepsis prediction outputs, which have not been studied in existing works on either sepsis prediction or uncertainty quantification. In this study, we first define such propagated uncertainty as the variance of prediction output and then introduce uncertainty propagation methods to quantify the propagated uncertainty. Moreover, for the potential high-risk patients with low confidence due to limited observations, we propose a robust active sensing algorithm to increase confidence by actively recommending clinicians to observe the most informative variables. We validate the proposed models in both publicly available data (i.e., MIMIC-III and AmsterdamUMCdb) and proprietary data in The Ohio State University Wexner Medical Center (OSUWMC). The experimental results show that the propagated uncertainty is dominant at the beginning of admissions to hospitals and the proposed algorithm outperforms state-of-the-art active sensing methods. Finally, we implement a SepsisLab system for early sepsis prediction and active sensing based on our pre-trained models. Clinicians and potential sepsis patients can benefit from the system in early prediction and diagnosis of sepsis. △ Less

Submitted 24 July, 2024; originally announced July 2024.

Comments: To be published in KDD 2024

MSC Class: 68T07 (primary) 92C50 (secondary) ACM Class: H.2.8; I.2.1; J.3

arXiv:2407.14181 [pdf]

Harnessing Zn-Volatility for Compositional Tuning in PtZn Nanoalloy Catalysts

Authors: Bingqing Yao, Chaokai Xu, Yaxin Tang, Yankun Du, Shengdong Tan, Sheng Dai, Guangfu Luo, Qian He

Abstract: Bimetallic nanoalloys have gained extensive attention due to their tunable properties and wide range of catalytic applications. However, achieving good compositional control in nanoalloy catalysts remains a formidable challenge. In this work, we demonstrate that heat treatment can be used to tune the composition of Pt-Zn nanoalloy catalysts, leveraging the volatile nature of zinc to enhance their… ▽ More Bimetallic nanoalloys have gained extensive attention due to their tunable properties and wide range of catalytic applications. However, achieving good compositional control in nanoalloy catalysts remains a formidable challenge. In this work, we demonstrate that heat treatment can be used to tune the composition of Pt-Zn nanoalloy catalysts, leveraging the volatile nature of zinc to enhance their performance in propane dehydrogenation. Through identical location (scanning) transmission electron microscopy (IL-(S)TEM) using an in-situ EM gas cell, as well as other complementary techniques, we observed that the zinc content of the Pt-Zn nanoalloy particles decreased over time of the heat treatment under hydrogen. The rate of change depends on the original composition of the particles, as well as the heat treatment conditions such as temperature and flow rate. Our experimental results and theoretical calculations suggest that Zn in the intermetallic phase might be more stable, providing an opportunity for precise tuning the nanoparticle compositions. This approach presents a viable strategy for developing better Pt-Zn catalysts for propane dehydrogenation. △ Less

Submitted 19 July, 2024; originally announced July 2024.

arXiv:2407.13851 [pdf, other]

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

Authors: Sirnam Swetha, Jinyu Yang, Tal Neiman, Mamshad Nayeem Rizve, Son Tran, Benjamin Yao, Trishul Chilimbi, Mubarak Shah

Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have revolutionized the field of vision-language understanding by integrating visual perception capabilities into Large Language Models (LLMs). The prevailing trend in this field involves the utilization of a vision encoder derived from vision-language contrastive learning (CL), showing expertise in capturing overall representations w… ▽ More Recent advancements in Multimodal Large Language Models (MLLMs) have revolutionized the field of vision-language understanding by integrating visual perception capabilities into Large Language Models (LLMs). The prevailing trend in this field involves the utilization of a vision encoder derived from vision-language contrastive learning (CL), showing expertise in capturing overall representations while facing difficulties in capturing detailed local patterns. In this work, we focus on enhancing the visual representations for MLLMs by combining high-frequency and detailed visual representations, obtained through masked image modeling (MIM), with semantically-enriched low-frequency representations captured by CL. To achieve this goal, we introduce X-Former which is a lightweight transformer module designed to exploit the complementary strengths of CL and MIM through an innovative interaction mechanism. Specifically, X-Former first bootstraps vision-language representation learning and multimodal-to-multimodal generative learning from two frozen vision encoders, i.e., CLIP-ViT (CL-based) and MAE-ViT (MIM-based). It further bootstraps vision-to-language generative learning from a frozen LLM to ensure visual features from X-Former can be interpreted by the LLM. To demonstrate the effectiveness of our approach, we assess its performance on tasks demanding detailed visual understanding. Extensive evaluations indicate that X-Former excels in visual reasoning tasks involving both structural and semantic categories in the GQA dataset. Assessment on fine-grained visual perception benchmark further confirms its superior capabilities in visual understanding. △ Less

Submitted 18 July, 2024; originally announced July 2024.

Comments: Accepted at ECCV2024

arXiv:2407.12725 [pdf, other]

Is Sarcasm Detection A Step-by-Step Reasoning Process in Large Language Models?

Authors: Ben Yao, Yazhou Zhang, Qiuchi Li, Jing Qin

Abstract: Elaborating a series of intermediate reasoning steps significantly improves the ability of large language models (LLMs) to solve complex problems, as such steps would evoke LLMs to think sequentially. However, human sarcasm understanding is often considered an intuitive and holistic cognitive process, in which various linguistic, contextual, and emotional cues are integrated to form a comprehensiv… ▽ More Elaborating a series of intermediate reasoning steps significantly improves the ability of large language models (LLMs) to solve complex problems, as such steps would evoke LLMs to think sequentially. However, human sarcasm understanding is often considered an intuitive and holistic cognitive process, in which various linguistic, contextual, and emotional cues are integrated to form a comprehensive understanding, in a way that does not necessarily follow a step-by-step fashion. To verify the validity of this argument, we introduce a new prompting framework (called SarcasmCue) containing four sub-methods, viz. chain of contradiction (CoC), graph of cues (GoC), bagging of cues (BoC) and tensor of cues (ToC), which elicits LLMs to detect human sarcasm by considering sequential and non-sequential prompting methods. Through a comprehensive empirical comparison on four benchmarks, we highlight three key findings: (1) CoC and GoC show superior performance with more advanced models like GPT-4 and Claude 3.5, with an improvement of 3.5%. (2) ToC significantly outperforms other methods when smaller LLMs are evaluated, boosting the F1 score by 29.7% over the best baseline. (3) Our proposed framework consistently pushes the state-of-the-art (i.e., ToT) by 4.2%, 2.0%, 29.7%, and 58.2% in F1 scores across four datasets. This demonstrates the effectiveness and stability of the proposed framework. △ Less

Submitted 24 August, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

Comments: 9 pages, 5 figures

arXiv:2407.09073 [pdf, other]

Open Vocabulary Multi-Label Video Classification

Authors: Rohit Gupta, Mamshad Nayeem Rizve, Jayakrishnan Unnikrishnan, Ashish Tawari, Son Tran, Mubarak Shah, Benjamin Yao, Trishul Chilimbi

Abstract: Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation. Some recent works have focused on extending VLMs to open vocabulary single label action classification in videos. However, previous methods fall short in holistic video understanding which requires the ability to… ▽ More Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation. Some recent works have focused on extending VLMs to open vocabulary single label action classification in videos. However, previous methods fall short in holistic video understanding which requires the ability to simultaneously recognize multiple actions and entities e.g., objects in the video in an open vocabulary setting. We formulate this problem as open vocabulary multilabel video classification and propose a method to adapt a pre-trained VLM such as CLIP to solve this task. We leverage large language models (LLMs) to provide semantic guidance to the VLM about class labels to improve its open vocabulary performance with two key contributions. First, we propose an end-to-end trainable architecture that learns to prompt an LLM to generate soft attributes for the CLIP text-encoder to enable it to recognize novel classes. Second, we integrate a temporal modeling module into CLIP's vision encoder to effectively model the spatio-temporal dynamics of video concepts as well as propose a novel regularized finetuning technique to ensure strong open vocabulary classification performance in the video domain. Our extensive experimentation showcases the efficacy of our approach on multiple benchmark datasets. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: Accepted at ECCV 2024

arXiv:2407.03772 [pdf, other]

CS3: Cascade SAM for Sperm Segmentation

Authors: Yi Shi, Xu-Peng Tian, Yun-Kai Wang, Tie-Yi Zhang, Bin Yao, Hui Wang, Yong Shao, Cen-Cen Wang, Rong Zeng, De-Chuan Zhan

Abstract: Automated sperm morphology analysis plays a crucial role in the assessment of male fertility, yet its efficacy is often compromised by the challenges in accurately segmenting sperm images. Existing segmentation techniques, including the Segment Anything Model(SAM), are notably inadequate in addressing the complex issue of sperm overlap-a frequent occurrence in clinical samples. Our exploratory stu… ▽ More Automated sperm morphology analysis plays a crucial role in the assessment of male fertility, yet its efficacy is often compromised by the challenges in accurately segmenting sperm images. Existing segmentation techniques, including the Segment Anything Model(SAM), are notably inadequate in addressing the complex issue of sperm overlap-a frequent occurrence in clinical samples. Our exploratory studies reveal that modifying image characteristics by removing sperm heads and easily segmentable areas, alongside enhancing the visibility of overlapping regions, markedly enhances SAM's efficiency in segmenting intricate sperm structures. Motivated by these findings, we present the Cascade SAM for Sperm Segmentation (CS3), an unsupervised approach specifically designed to tackle the issue of sperm overlap. This method employs a cascade application of SAM to segment sperm heads, simple tails, and complex tails in stages. Subsequently, these segmented masks are meticulously matched and joined to construct complete sperm masks. In collaboration with leading medical institutions, we have compiled a dataset comprising approximately 2,000 unlabeled sperm images to fine-tune our method, and secured expert annotations for an additional 240 images to facilitate comprehensive model assessment. Experimental results demonstrate superior performance of CS3 compared to existing methods. △ Less

Submitted 9 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

Comments: Early accepted by MICCAI2024

arXiv:2407.03663 [pdf, other]

Limited-View Photoacoustic Imaging Reconstruction Via High-quality Self-supervised Neural Representation

Authors: Youshen xiao, Yuting Shen, Bowei Yao, Xiran Cai, Yuyao Zhang, Fei Gao

Abstract: In practical applications within the human body, it is often challenging to fully encompass the target tissue or organ, necessitating the use of limited-view arrays, which can lead to the loss of crucial information. Addressing the reconstruction of photoacoustic sensor signals in limited-view detection spaces has become a focal point of current research. In this study, we introduce a self-supervi… ▽ More In practical applications within the human body, it is often challenging to fully encompass the target tissue or organ, necessitating the use of limited-view arrays, which can lead to the loss of crucial information. Addressing the reconstruction of photoacoustic sensor signals in limited-view detection spaces has become a focal point of current research. In this study, we introduce a self-supervised network termed HIgh-quality Self-supervised neural representation (HIS), which tackles the inverse problem of photoacoustic imaging to reconstruct high-quality photoacoustic images from sensor data acquired under limited viewpoints. We regard the desired reconstructed photoacoustic image as an implicit continuous function in 2D image space, viewing the pixels of the image as sparse discrete samples. The HIS's objective is to learn the continuous function from limited observations by utilizing a fully connected neural network combined with Fourier feature position encoding. By simply minimizing the error between the network's predicted sensor data and the actual sensor data, HIS is trained to represent the observed continuous model. The results indicate that the proposed HIS model offers superior image reconstruction quality compared to three commonly used methods for photoacoustic image reconstruction. △ Less

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.00840 [pdf, other]

MUSE-Net: Missingness-aware mUlti-branching Self-attention Encoder for Irregular Longitudinal Electronic Health Records

Authors: Zekai Wang, Tieming Liu, Bing Yao

Abstract: The era of big data has made vast amounts of clinical data readily available, particularly in the form of electronic health records (EHRs), which provides unprecedented opportunities for developing data-driven diagnostic tools to enhance clinical decision making. However, the application of EHRs in data-driven modeling faces challenges such as irregularly spaced multi-variate time series, issues o… ▽ More The era of big data has made vast amounts of clinical data readily available, particularly in the form of electronic health records (EHRs), which provides unprecedented opportunities for developing data-driven diagnostic tools to enhance clinical decision making. However, the application of EHRs in data-driven modeling faces challenges such as irregularly spaced multi-variate time series, issues of incompleteness, and data imbalance. Realizing the full data potential of EHRs hinges on the development of advanced analytical models. In this paper, we propose a novel Missingness-aware mUlti-branching Self-attention Encoder (MUSE-Net) to cope with the challenges in modeling longitudinal EHRs for data-driven disease prediction. The MUSE-Net leverages a multi-task Gaussian process (MGP) with missing value masks for data imputation, a multi-branching architecture to address the data imbalance problem, and a time-aware self-attention encoder to account for the irregularly spaced time interval in longitudinal EHRs. We evaluate the proposed MUSE-Net using both synthetic and real-world datasets. Experimental results show that our MUSE-Net outperforms existing methods that are widely used to investigate longitudinal signals. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.19749 [pdf, other]

SPIRONet: Spatial-Frequency Learning and Topological Channel Interaction Network for Vessel Segmentation

Authors: De-Xing Huang, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Mei-Jiang Gui, Hao Li, Tian-Yu Xiang, Bo-Xian Yao, Zeng-Guang Hou

Abstract: Automatic vessel segmentation is paramount for developing next-generation interventional navigation systems. However, current approaches suffer from suboptimal segmentation performances due to significant challenges in intraoperative images (i.e., low signal-to-noise ratio, small or slender vessels, and strong interference). In this paper, a novel spatial-frequency learning and topological channel… ▽ More Automatic vessel segmentation is paramount for developing next-generation interventional navigation systems. However, current approaches suffer from suboptimal segmentation performances due to significant challenges in intraoperative images (i.e., low signal-to-noise ratio, small or slender vessels, and strong interference). In this paper, a novel spatial-frequency learning and topological channel interaction network (SPIRONet) is proposed to address the above issues. Specifically, dual encoders are utilized to comprehensively capture local spatial and global frequency vessel features. Then, a cross-attention fusion module is introduced to effectively fuse spatial and frequency features, thereby enhancing feature discriminability. Furthermore, a topological channel interaction module is designed to filter out task-irrelevant responses based on graph neural networks. Extensive experimental results on several challenging datasets (CADSA, CAXF, DCA1, and XCAD) demonstrate state-of-the-art performances of our method. Moreover, the inference speed of SPIRONet is 21 FPS with a 512x512 input size, surpassing clinical real-time requirements (6~12FPS). These promising outcomes indicate SPIRONet's potential for integration into vascular interventional navigation systems. Code is available at https://github.com/Dxhuang-CASIA/SPIRONet. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19205 [pdf, other]

Coordinated RSMA for Integrated Sensing and Communication in Emergency UAV Systems

Authors: Binghan Yao, Ruoguang Li, Yingyang Chen, Li Wang

Abstract: Recently, unmanned aerial vehicle (UAV)-enabled integrated sensing and communication (ISAC) is emerging as a promising technique for achieving robust and rapid emergency response capabilities. Such a novel framework offers high-quality and cost-efficient C\&S services due to the intrinsic flexibility and mobility of UAVs. In parallel, rate-splitting multiple access (RSMA) is able to achieve a tail… ▽ More Recently, unmanned aerial vehicle (UAV)-enabled integrated sensing and communication (ISAC) is emerging as a promising technique for achieving robust and rapid emergency response capabilities. Such a novel framework offers high-quality and cost-efficient C\&S services due to the intrinsic flexibility and mobility of UAVs. In parallel, rate-splitting multiple access (RSMA) is able to achieve a tailor-made communication by splitting the messages into private and common parts with adjustable rates, making it suitable for on-demand data transmission in disaster scenarios. In this paper, we propose a coordinated RSMA for integrated sensing and communication (CoRSMA-ISAC) scheme in emergency UAV system to facilitate search and rescue operations, where a number of ISAC UAVs simultaneously communicate with multiple communication survivors (CSs) and detect a potentially trapped survivor (TS) in a coordinated manner. Towards this end, an optimization problem is formulated to maximize the weighted sum rate (WSR) of the system, subject to the sensing signal-to-noise ratio (SNR) requirement. In order to solve the formulated non-convex problem, we first decompose it into three subproblems, i.e., UAV-CS association, UAV deployment, as well as beamforming optimization and rate allocation. Subsequently, we introduce an iterative optimization approach leveraging K-Means, successive convex approximation (SCA), and semi-definite relaxation (SDR) algorithms to reframe the subproblems into a more tractable form and efficiently solve them. Simulation results demonstrate that the proposed CoRSMA-ISAC scheme is superior to conventional space division multiple access (SDMA), non-orthogonal multiple access (NOMA), and orthogonal multiple access (OMA) in terms of both communication and sensing performance. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.17578 [pdf, other]

Sparse-view Signal-domain Photoacoustic Tomography Reconstruction Method Based on Neural Representation

Authors: Bowei Yao, Yi Zeng, Haizhao Dai, Qing Wu, Youshen Xiao, Fei Gao, Yuyao Zhang, Jingyi Yu, Xiran Cai

Abstract: Photoacoustic tomography is a hybrid biomedical technology, which combines the advantages of acoustic and optical imaging. However, for the conventional image reconstruction method, the image quality is affected obviously by artifacts under the condition of sparse sampling. in this paper, a novel model-based sparse reconstruction method via implicit neural representation was proposed for improving… ▽ More Photoacoustic tomography is a hybrid biomedical technology, which combines the advantages of acoustic and optical imaging. However, for the conventional image reconstruction method, the image quality is affected obviously by artifacts under the condition of sparse sampling. in this paper, a novel model-based sparse reconstruction method via implicit neural representation was proposed for improving the image quality reconstructed from sparse data. Specially, the initial acoustic pressure distribution was modeled as a continuous function of spatial coordinates, and parameterized by a multi-layer perceptron. The weights of multi-layer perceptron were determined by training the network in self-supervised manner. And the total variation regularization term was used to offer the prior knowledge. We compared our result with some ablation studies, and the results show that out method outperforms existing methods on simulation and experimental data. Under the sparse sampling condition, our method can suppress the artifacts and avoid the ill-posed problem effectively, which reconstruct images with higher signal-to-noise ratio and contrast-to-noise ratio than traditional methods. The high-quality results for sparse data make the proposed method hold the potential for further decreasing the hardware cost of photoacoustic tomography system. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2405.13803 [pdf, other]

Sunnie: An Anthropomorphic LLM-Based Conversational Agent for Mental Well-Being Activity Recommendation

Authors: Siyi Wu, Feixue Han, Bingsheng Yao, Tianyi Xie, Xuan Zhao, Dakuo Wang

Abstract: A longstanding challenge in mental well-being support is the reluctance of people to adopt psychologically beneficial activities, often due to lack of motivation, low perceived trustworthiness, and limited personalization of recommendations. Chatbots have shown promise in promoting positive mental health practices, yet their rigid interaction flows and less human-like conversational experiences pr… ▽ More A longstanding challenge in mental well-being support is the reluctance of people to adopt psychologically beneficial activities, often due to lack of motivation, low perceived trustworthiness, and limited personalization of recommendations. Chatbots have shown promise in promoting positive mental health practices, yet their rigid interaction flows and less human-like conversational experiences present significant limitations. In this work, we explore whether the anthropomorphic design (both LLM's persona design and conversational experience design) can enhance users' perception of the system and their willingness to adopt mental well-being activity recommendations. To this end, we introduce Sunnie, an anthropomorphic LLM-based conversational agent designed to offer personalized well-being support through multi-turn conversation and recommend practical actions grounded in positive psychology and social psychology. An empirical user study comparing the user experience with Sunnie and with a traditional survey-based activity recommendation system suggests that the anthropomorphic characteristics of Sunnie significantly enhance users' perception of the system and the overall usability; nevertheless, users' willingness to adopt activity recommendations did not change significantly. △ Less

Submitted 13 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: In Submission

arXiv:2404.13409 [pdf, other]

"I Wish There Were an AI": Challenges and AI Potential in Cancer Patient-Provider Communication

Authors: Ziqi Yang, Xuhai Xu, Bingsheng Yao, Jiachen Li, Jennifer Bagdasarian, Guodong Gao, Dakuo Wang

Abstract: Patient-provider communication has been crucial to cancer patients' survival after their cancer treatments. However, the research community and patients themselves often overlook the communication challenges after cancer treatments as they are overshadowed by the severity of the patient's illness and the variety and rarity of the cancer disease itself. Meanwhile, the recent technical advances in A… ▽ More Patient-provider communication has been crucial to cancer patients' survival after their cancer treatments. However, the research community and patients themselves often overlook the communication challenges after cancer treatments as they are overshadowed by the severity of the patient's illness and the variety and rarity of the cancer disease itself. Meanwhile, the recent technical advances in AI, especially in Large Language Models (LLMs) with versatile natural language interpretation and generation ability, demonstrate great potential to support communication in complex real-world medical situations. By interviewing six healthcare providers and eight cancer patients, our goal is to explore the providers' and patients' communication barriers in the post-cancer treatment recovery period, their expectations for future communication technologies, and the potential of AI technologies in this context. Our findings reveal several challenges in current patient-provider communication, including the knowledge and timing gaps between cancer patients and providers, their collaboration obstacles, and resource limitations. Moreover, based on providers' and patients' needs and expectations, we summarize a set of design implications for intelligent communication systems, especially with the power of LLMs. Our work sheds light on the design of future AI-powered systems for patient-provider communication under high-stake and high-uncertainty situations. △ Less

Submitted 20 April, 2024; originally announced April 2024.

Comments: 18 pages, 2 figures, submission to CSCW'24

arXiv:2404.05012 [pdf, other]

Towards Reliable and Empathetic Depression-Diagnosis-Oriented Chats

Authors: Kunyao Lan, Cong Ming, Binwei Yao, Lu Chen, Mengyue Wu

Abstract: Chatbots can serve as a viable tool for preliminary depression diagnosis via interactive conversations with potential patients. Nevertheless, the blend of task-oriented and chit-chat in diagnosis-related dialogues necessitates professional expertise and empathy. Such unique requirements challenge traditional dialogue frameworks geared towards single optimization goals. To address this, we propose… ▽ More Chatbots can serve as a viable tool for preliminary depression diagnosis via interactive conversations with potential patients. Nevertheless, the blend of task-oriented and chit-chat in diagnosis-related dialogues necessitates professional expertise and empathy. Such unique requirements challenge traditional dialogue frameworks geared towards single optimization goals. To address this, we propose an innovative ontology definition and generation framework tailored explicitly for depression diagnosis dialogues, combining the reliability of task-oriented conversations with the appeal of empathy-related chit-chat. We further apply the framework to D$^4$, the only existing public dialogue dataset on depression diagnosis-oriented chats. Exhaustive experimental results indicate significant improvements in task completion and emotional support generation in depression diagnosis, fostering a more comprehensive approach to task-oriented chat dialogue system development and its applications in digital mental health. △ Less

Submitted 7 April, 2024; originally announced April 2024.

arXiv:2403.16398 [pdf, other]

Rethinking the Representation in Federated Unsupervised Learning with Non-IID Data

Authors: Xinting Liao, Weiming Liu, Chaochao Chen, Pengyang Zhou, Fengyuan Yu, Huabin Zhu, Binhui Yao, Tao Wang, Xiaolin Zheng, Yanchao Tan

Abstract: Federated learning achieves effective performance in modeling decentralized data. In practice, client data are not well-labeled, which makes it potential for federated unsupervised learning (FUSL) with non-IID data. However, the performance of existing FUSL methods suffers from insufficient representations, i.e., (1) representation collapse entanglement among local and global models, and (2) incon… ▽ More Federated learning achieves effective performance in modeling decentralized data. In practice, client data are not well-labeled, which makes it potential for federated unsupervised learning (FUSL) with non-IID data. However, the performance of existing FUSL methods suffers from insufficient representations, i.e., (1) representation collapse entanglement among local and global models, and (2) inconsistent representation spaces among local models. The former indicates that representation collapse in local model will subsequently impact the global model and other local models. The latter means that clients model data representation with inconsistent parameters due to the deficiency of supervision signals. In this work, we propose FedU2 which enhances generating uniform and unified representation in FUSL with non-IID data. Specifically, FedU2 consists of flexible uniform regularizer (FUR) and efficient unified aggregator (EUA). FUR in each client avoids representation collapse via dispersing samples uniformly, and EUA in server promotes unified representation by constraining consistent client model updating. To extensively validate the performance of FedU2, we conduct both cross-device and cross-silo evaluation experiments on two benchmark datasets, i.e., CIFAR10 and CIFAR100. △ Less

Submitted 24 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2403.14870 [pdf, other]

VidLA: Video-Language Alignment at Scale

Authors: Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan, Son Tran, Benjamin Z. Yao, Belinda Zeng, Mubarak Shah, Trishul Chilimbi

Abstract: In this paper, we propose VidLA, an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First, they do not capture both short-range and long-range temporal dependencies and typically employ complex hierarchical deep network architectures that are hard to integrate with existing pretrained image-text foundation models. To… ▽ More In this paper, we propose VidLA, an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First, they do not capture both short-range and long-range temporal dependencies and typically employ complex hierarchical deep network architectures that are hard to integrate with existing pretrained image-text foundation models. To effectively address this limitation, we instead keep the network architecture simple and use a set of data tokens that operate at different temporal resolutions in a hierarchical manner, accounting for the temporally hierarchical nature of videos. By employing a simple two-tower architecture, we are able to initialize our video-language model with pretrained image-text foundation models, thereby boosting the final performance. Second, existing video-language alignment works struggle due to the lack of semantically aligned large-scale training data. To overcome it, we leverage recent LLMs to curate the largest video-language dataset to date with better visual grounding. Furthermore, unlike existing video-text datasets which only contain short clips, our dataset is enriched with video clips of varying durations to aid our temporally hierarchical data tokens in extracting better representations at varying temporal scales. Overall, empirical results show that our proposed approach surpasses state-of-the-art methods on multiple retrieval benchmarks, especially on longer videos, and performs competitively on classification benchmarks. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.08154 [pdf, other]

The Effect of Different Optimization Strategies to Physics-Constrained Deep Learning for Soil Moisture Estimation

Authors: Jianxin Xie, Bing Yao, Zheyu Jiang

Abstract: Soil moisture is a key hydrological parameter that has significant importance to human society and the environment. Accurate modeling and monitoring of soil moisture in crop fields, especially in the root zone (top 100 cm of soil), is essential for improving agricultural production and crop yield with the help of precision irrigation and farming tools. Realizing the full sensor data potential depe… ▽ More Soil moisture is a key hydrological parameter that has significant importance to human society and the environment. Accurate modeling and monitoring of soil moisture in crop fields, especially in the root zone (top 100 cm of soil), is essential for improving agricultural production and crop yield with the help of precision irrigation and farming tools. Realizing the full sensor data potential depends greatly on advanced analytical and predictive domain-aware models. In this work, we propose a physics-constrained deep learning (P-DL) framework to integrate physics-based principles on water transport and water sensing signals for effective reconstruction of the soil moisture dynamics. We adopt three different optimizers, namely Adam, RMSprop, and GD, to minimize the loss function of P-DL during the training process. In the illustrative case study, we demonstrate the empirical convergence of Adam optimizers outperforms the other optimization methods in both mini-batch and full-batch training. △ Less

Submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.07228 [pdf, other]

Physics-constrained Active Learning for Soil Moisture Estimation and Optimal Sensor Placement

Authors: Jianxin Xie, Bing Yao, Zheyu Jiang

Abstract: Soil moisture is a crucial hydrological state variable that has significant importance to the global environment and agriculture. Precise monitoring of soil moisture in crop fields is critical to reducing agricultural drought and improving crop yield. In-situ soil moisture sensors, which are buried at pre-determined depths and distributed across the field, are promising solutions for monitoring so… ▽ More Soil moisture is a crucial hydrological state variable that has significant importance to the global environment and agriculture. Precise monitoring of soil moisture in crop fields is critical to reducing agricultural drought and improving crop yield. In-situ soil moisture sensors, which are buried at pre-determined depths and distributed across the field, are promising solutions for monitoring soil moisture. However, high-density sensor deployment is neither economically feasible nor practical. Thus, to achieve a higher spatial resolution of soil moisture dynamics using a limited number of sensors, we integrate a physics-based agro-hydrological model based on Richards' equation in a physics-constrained deep learning framework to accurately predict soil moisture dynamics in the soil's root zone. This approach ensures that soil moisture estimates align well with sensor observations while obeying physical laws at the same time. Furthermore, to strategically identify the locations for sensor placement, we introduce a novel active learning framework that combines space-filling design and physics residual-based sampling to maximize data acquisition potential with limited sensors. Our numerical results demonstrate that integrating Physics-constrained Deep Learning (P-DL) with an active learning strategy within a unified framework--named the Physics-constrained Active Learning (P-DAL) framework--significantly improves the predictive accuracy and effectiveness of field-scale soil moisture monitoring using in-situ sensors. △ Less

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.01273 [pdf, other]

NoMAD-Attention: Efficient LLM Inference on CPUs Through Multiply-add-free Attention

Authors: Tianyi Zhang, Jonah Wonkyu Yi, Bowen Yao, Zhaozhuo Xu, Anshumali Shrivastava

Abstract: Large language model inference on Central Processing Units (CPU) is challenging due to the vast quantities of expensive Multiply-Add (MAD) matrix operations in the attention computations. In this paper, we argue that there is a rare gem in modern CPUs, Single-Instruction-Multiple-Data (SIMD) registers, which allow for ultra-low-latency lookups in batch. We leverage this unique capability of CPUs t… ▽ More Large language model inference on Central Processing Units (CPU) is challenging due to the vast quantities of expensive Multiply-Add (MAD) matrix operations in the attention computations. In this paper, we argue that there is a rare gem in modern CPUs, Single-Instruction-Multiple-Data (SIMD) registers, which allow for ultra-low-latency lookups in batch. We leverage this unique capability of CPUs to propose NoMAD-Attention, an efficient attention algorithm that replaces MAD operations with in-register lookups. Through hardware-aware algorithmic designs, NoMAD-Attention achieves the computation of attention scores using repeated fast accesses to SIMD registers despite their highly limited sizes. Moreover, NoMAD-Attention works with pre-trained attention-based LLMs without model finetuning. Empirical evaluations demonstrate that NoMAD-Attention maintains the quality of the original LLMs well, and speeds up the 4-bit quantized LLaMA-7B-based model by up to 2$\times$ at 16k context length. Our results are reproducible at https://github.com/tonyzhang617/nomad-dist. △ Less

Submitted 2 March, 2024; originally announced March 2024.

arXiv:2402.01994 [pdf, ps, other]

Human-Centered Privacy Research in the Age of Large Language Models

Authors: Tianshi Li, Sauvik Das, Hao-Ping Lee, Dakuo Wang, Bingsheng Yao, Zhiping Zhang

Abstract: The emergence of large language models (LLMs), and their increased use in user-facing systems, has led to substantial privacy concerns. To date, research on these privacy concerns has been model-centered: exploring how LLMs lead to privacy risks like memorization, or can be used to infer personal characteristics about people from their content. We argue that there is a need for more research focus… ▽ More The emergence of large language models (LLMs), and their increased use in user-facing systems, has led to substantial privacy concerns. To date, research on these privacy concerns has been model-centered: exploring how LLMs lead to privacy risks like memorization, or can be used to infer personal characteristics about people from their content. We argue that there is a need for more research focusing on the human aspect of these privacy issues: e.g., research on how design paradigms for LLMs affect users' disclosure behaviors, users' mental models and preferences for privacy controls, and the design of tools, systems, and artifacts that empower end-users to reclaim ownership over their personal data. To build usable, efficient, and privacy-friendly systems powered by these models with imperfect privacy properties, our goal is to initiate discussions to outline an agenda for conducting human-centered research on privacy issues in LLM-powered systems. This Special Interest Group (SIG) aims to bring together researchers with backgrounds in usable security and privacy, human-AI collaboration, NLP, or any other related domains to share their perspectives and experiences on this problem, to help our community establish a collective understanding of the challenges, research opportunities, research methods, and strategies to collaborate with researchers outside of HCI. △ Less

Submitted 2 February, 2024; originally announced February 2024.

Comments: 4 pages, CHI EA'24

arXiv:2401.13804 [pdf, other]

Exploring Parent's Needs for Children-Centered AI to Support Preschoolers' Interactive Storytelling and Reading Activities

Authors: Yuling Sun, Jiaju Chen, Bingsheng Yao, Jiali Liu, Dakuo Wang, Xiaojuan Ma, Yuxuan Lu, Ying Xu, Liang He

Abstract: Interactive storytelling is vital for preschooler development. While children's interactive partners have traditionally been their parents and teachers, recent advances in artificial intelligence (AI) have sparked a surge of AI-based storytelling and reading technologies. As these technologies become increasingly ubiquitous in preschoolers' lives, questions arise regarding how they function in pra… ▽ More Interactive storytelling is vital for preschooler development. While children's interactive partners have traditionally been their parents and teachers, recent advances in artificial intelligence (AI) have sparked a surge of AI-based storytelling and reading technologies. As these technologies become increasingly ubiquitous in preschoolers' lives, questions arise regarding how they function in practical storytelling and reading scenarios and, how parents, the most critical stakeholders, experience and perceive these technologies. This paper investigates these questions through a qualitative study with 17 parents of children aged 3-6. Our findings suggest that even though AI-based storytelling and reading technologies provide more immersive and engaging interaction, they still cannot meet parents' expectations due to a series of interactive and algorithmic challenges. We elaborate on these challenges and discuss the possible implications of future AI-based interactive storytelling technologies for preschoolers. △ Less

Submitted 31 August, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted at ACM CSCW 2024

arXiv:2401.13799 [pdf, other]

Who Changed the Destiny of Rural Students, and How?: Unpacking ICT-Mediated Remote Education in Rural China

Authors: Yuling Sun, Xiuqi Zhu, Xiaomu Zhou, Bingsheng Yao, Kai Zhang, Dakuo Wang, Jiaju Chen, Liang He

Abstract: The proliferation of Information and Communication Technologies (ICTs) has shown great promise in addressing educational challenges facing rural areas. However, the complex rural context poses significant challenges to the effective utilization of these technologies. This paper examines the empirical integration of live-streaming-based remote classrooms (LSRC) through a qualitative study in rural… ▽ More The proliferation of Information and Communication Technologies (ICTs) has shown great promise in addressing educational challenges facing rural areas. However, the complex rural context poses significant challenges to the effective utilization of these technologies. This paper examines the empirical integration of live-streaming-based remote classrooms (LSRC) through a qualitative study in rural China. Our findings suggest that while LSRC enables rural students equal access to high-quality educational resources, its practical integration faces numerous challenges. In particular, we emphasize the crucial role of local teachers in addressing these challenges, ultimately achieving the desired improvement of students' learning outcomes. We also examine the impact of LSRC on the original rural education ecosystem. Building upon our findings, we call for a reconsideration of interaction paradigms and evaluation systems of ICT-mediated rural education, emphasizing the significance of rural teachers. We conclude by discussing the implications for future ICT-mediated technology interventions in rural settings. △ Less

Submitted 24 January, 2024; originally announced January 2024.

Comments: In submission

arXiv:2401.11876 [pdf, other]

First-principles Based 3D Virtual Simulation Testing for Discovering SOTIF Corner Cases of Autonomous Driving

Authors: Lehang Li, Haokuan Wu, Botao Yao, Tianyu He, Shuohan Huang, Chuanyi Liu

Abstract: 3D virtual simulation, which generates diversified test scenarios and tests full-stack of Autonomous Driving Systems (ADSes) modules dynamically as a whole, is a promising approach for Safety of The Intended Functionality (SOTIF) ADS testing. However, as different configurations of a test scenario will affect the sensor perceptions and environment interaction, e.g. light pulses emitted by the LiDA… ▽ More 3D virtual simulation, which generates diversified test scenarios and tests full-stack of Autonomous Driving Systems (ADSes) modules dynamically as a whole, is a promising approach for Safety of The Intended Functionality (SOTIF) ADS testing. However, as different configurations of a test scenario will affect the sensor perceptions and environment interaction, e.g. light pulses emitted by the LiDAR sensor will undergo backscattering and attenuation, which is usually overlooked by existing works, leading to false positives or wrong results. Moreover, the input space of an ADS is extremely large, with infinite number of possible initial scenarios and mutations, along both temporal and spatial domains. This paper proposes a first-principles based sensor modeling and environment interaction scheme, and integrates it into CARLA simulator. With this scheme, a long-overlooked category of adverse weather related corner cases are discovered, along with their root causes. Moreover, a meta-heuristic algorithm is designed based on several empirical insights, which guide both seed scenarios and mutations, significantly reducing the search dimensions of scenarios and enhancing the efficiency of corner case identification. Experimental results show that under identical simulation setups, our algorithm discovers about four times as many corner cases as compared to state-of-the-art work. △ Less

Submitted 22 January, 2024; originally announced January 2024.

Comments: 11 pages, 10 figures

arXiv:2401.11480 [pdf]

Organ-Level Radiation Doses from CT Scans for 10,000 Chinese Subjects Undergoing Physical Examinations: The Feasibility of AI-Based Multi-organ CT Image Segmentation and Near Real-time Monte Carlo Dose Computing

Authors: Zirui Ye, Bei Yao, Haoran Zheng, Li Tao, Ripeng Wang, Yang Lu, Yankui Chang, Xi Pei, Zhi Chen, Xie George Xu

Abstract: Considering the increasing trend of physical examinations in China, the escalating frequency of Computed Tomography (CT) scans has amplified concerns regarding population radiation exposure and its consequent risks. The challenges mainly manifest in two aspects: one is the rapid construction of patient-specific human phantoms, and the other is the fast Monte Carlo (MC) simulation of radiation dose… ▽ More Considering the increasing trend of physical examinations in China, the escalating frequency of Computed Tomography (CT) scans has amplified concerns regarding population radiation exposure and its consequent risks. The challenges mainly manifest in two aspects: one is the rapid construction of patient-specific human phantoms, and the other is the fast Monte Carlo (MC) simulation of radiation dose. Hence, this study aims to demonstrate a near real-time MC patient-specific organ dose computation method, for the first time, involving automatic segmentation across a large dataset of 11,482 subjects undergoing chest CT scans. We developed a preliminary software platform, integrating the automatic segmentation software DeepViewer and the GPU-accelerated MC engine ARCHER-CT. Comparisons with traditional dosimetry methods revealed up to 100% discrepancies for a few subjects in organ-level dose, underscoring the patient-specific method's superior accuracy. This study paves the way for more accurate radiation risk assessment, crucial in the era of patient-specific radiation dosimetry. △ Less

Submitted 21 January, 2024; originally announced January 2024.

Comments: 19 pages, 10 figures

arXiv:2312.14478 [pdf, other]

Federated Learning via Input-Output Collaborative Distillation

Authors: Xuan Gong, Shanglin Li, Yuxiang Bao, Barry Yao, Yawen Huang, Ziyan Wu, Baochang Zhang, Yefeng Zheng, David Doermann

Abstract: Federated learning (FL) is a machine learning paradigm in which distributed local nodes collaboratively train a central model without sharing individually held private data. Existing FL methods either iteratively share local model parameters or deploy co-distillation. However, the former is highly susceptible to private data leakage, and the latter design relies on the prerequisites of task-releva… ▽ More Federated learning (FL) is a machine learning paradigm in which distributed local nodes collaboratively train a central model without sharing individually held private data. Existing FL methods either iteratively share local model parameters or deploy co-distillation. However, the former is highly susceptible to private data leakage, and the latter design relies on the prerequisites of task-relevant real data. Instead, we propose a data-free FL framework based on local-to-central collaborative distillation with direct input and output space exploitation. Our design eliminates any requirement of recursive local parameter exchange or auxiliary task-relevant data to transfer knowledge, thereby giving direct privacy control to local users. In particular, to cope with the inherent data heterogeneity across locals, our technique learns to distill input on which each local model produces consensual yet unique results to represent each expertise. Our proposed FL framework achieves notable privacy-utility trade-offs with extensive experiments on image classification and segmentation tasks under various real-world heterogeneous federated learning settings on both natural and medical images. △ Less

Submitted 22 December, 2023; originally announced December 2023.

Comments: Accepted at AAAI 2024

arXiv:2312.00029 [pdf, other]

Bergeron: Combating Adversarial Attacks through a Conscience-Based Alignment Framework

Authors: Matthew Pisano, Peter Ly, Abraham Sanders, Bingsheng Yao, Dakuo Wang, Tomek Strzalkowski, Mei Si

Abstract: Research into AI alignment has grown considerably since the recent introduction of increasingly capable Large Language Models (LLMs). Unfortunately, modern methods of alignment still fail to fully prevent harmful responses when models are deliberately attacked. Such vulnerabilities can lead to LLMs being manipulated into generating hazardous content: from instructions for creating dangerous materi… ▽ More Research into AI alignment has grown considerably since the recent introduction of increasingly capable Large Language Models (LLMs). Unfortunately, modern methods of alignment still fail to fully prevent harmful responses when models are deliberately attacked. Such vulnerabilities can lead to LLMs being manipulated into generating hazardous content: from instructions for creating dangerous materials to inciting violence or endorsing unethical behaviors. To help mitigate this issue, we introduce Bergeron: a framework designed to improve the robustness of LLMs against attacks without any additional parameter fine-tuning. Bergeron is organized into two tiers; with a secondary LLM acting as a guardian to the primary LLM. This framework better safeguards the primary model against incoming attacks while monitoring its output for any harmful content. Empirical analysis reviews that by using Bergeron to complement models with existing alignment training, we can significantly improve the robustness and safety of multiple, commonly used commercial and open-source LLMs. Specifically, we found that models integrated with Bergeron are, on average, nearly seven times more resistant to attacks compared to models without such support. △ Less

Submitted 18 August, 2024; v1 submitted 16 November, 2023; originally announced December 2023.

arXiv:2311.09825 [pdf, other]

Human Still Wins over LLM: An Empirical Study of Active Learning on Domain-Specific Annotation Tasks

Authors: Yuxuan Lu, Bingsheng Yao, Shao Zhang, Yun Wang, Peng Zhang, Tun Lu, Toby Jia-Jun Li, Dakuo Wang

Abstract: Large Language Models (LLMs) have demonstrated considerable advances, and several claims have been made about their exceeding human performance. However, in real-world tasks, domain knowledge is often required. Low-resource learning methods like Active Learning (AL) have been proposed to tackle the cost of domain expert annotation, raising this question: Can LLMs surpass compact models trained wit… ▽ More Large Language Models (LLMs) have demonstrated considerable advances, and several claims have been made about their exceeding human performance. However, in real-world tasks, domain knowledge is often required. Low-resource learning methods like Active Learning (AL) have been proposed to tackle the cost of domain expert annotation, raising this question: Can LLMs surpass compact models trained with expert annotations in domain-specific tasks? In this work, we conduct an empirical experiment on four datasets from three different domains comparing SOTA LLMs with small models trained on expert annotations with AL. We found that small models can outperform GPT-3.5 with a few hundreds of labeled data, and they achieve higher or similar performance with GPT-4 despite that they are hundreds time smaller. Based on these findings, we posit that LLM predictions can be used as a warmup method in real-world applications and human experts remain indispensable in tasks involving data annotation driven by domain-specific knowledge. △ Less

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.09782 [pdf, other]

More Samples or More Prompts? Exploring Effective In-Context Sampling for LLM Few-Shot Prompt Engineering

Authors: Bingsheng Yao, Guiming Chen, Ruishi Zou, Yuxuan Lu, Jiachen Li, Shao Zhang, Yisi Sang, Sijia Liu, James Hendler, Dakuo Wang

Abstract: While most existing works on LLM prompting techniques focus only on how to select a better set of data samples inside one single prompt input (In-Context Learning or ICL), why can not we design and leverage multiple prompts together to further improve the LLM's performance? In this work, we propose In-Context Sampling (ICS), a low-resource LLM prompting technique to produce confident predictions b… ▽ More While most existing works on LLM prompting techniques focus only on how to select a better set of data samples inside one single prompt input (In-Context Learning or ICL), why can not we design and leverage multiple prompts together to further improve the LLM's performance? In this work, we propose In-Context Sampling (ICS), a low-resource LLM prompting technique to produce confident predictions by optimizing the construction of multiple ICL prompt inputs. Extensive experiments with three open-source LLMs (FlanT5-XL, Mistral-7B, and Mixtral-8x7B) on four NLI datasets (e-SNLI, Multi-NLI, ANLI, and Contract-NLI) and one QA dataset (CommonsenseQA) illustrate that ICS can consistently enhance LLMs' performance. An in-depth evaluation with three data similarity-based ICS strategies suggests that these strategies can further elevate LLM's performance, which sheds light on a new yet promising future research direction. △ Less

Submitted 2 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

Comments: Accepted at NAACL 2024 Findings

arXiv:2311.09756 [pdf, other]

FairytaleCQA: Integrating a Commonsense Knowledge Graph into Children's Storybook Narratives

Authors: Jiaju Chen, Yuxuan Lu, Shao Zhang, Bingsheng Yao, Yuanzhe Dong, Ying Xu, Yunyao Li, Qianwen Wang, Dakuo Wang, Yuling Sun

Abstract: AI models (including LLM) often rely on narrative question-answering (QA) datasets to provide customized QA functionalities to support downstream children education applications; however, existing datasets only include QA pairs that are grounded within the given storybook content, but children can learn more when teachers refer the storybook content to real-world knowledge (e.g., commonsense knowl… ▽ More AI models (including LLM) often rely on narrative question-answering (QA) datasets to provide customized QA functionalities to support downstream children education applications; however, existing datasets only include QA pairs that are grounded within the given storybook content, but children can learn more when teachers refer the storybook content to real-world knowledge (e.g., commonsense knowledge). We introduce the FairytaleCQA dataset, which is annotated by children education experts, to supplement 278 storybook narratives with educationally appropriate commonsense knowledge. The dataset has 5,868 QA pairs that not only originate from the storybook narrative but also contain the commonsense knowledge grounded by an external knowledge graph (i.e., ConceptNet). A follow-up experiment shows that a smaller model (T5-large) fine-tuned with FairytaleCQA reliably outperforms much larger prompt-engineered LLM (e.g., GPT-4) in this new QA-pair generation task (QAG). This result suggests that: 1) our dataset brings novel challenges to existing LLMs, and 2) human experts' data annotation are still critical as they have much nuanced knowledge that LLMs do not know in the children educational domain. △ Less

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2310.17245 [pdf, other]

CROP: Conservative Reward for Model-based Offline Policy Optimization

Authors: Hao Li, Xiao-Hu Zhou, Xiao-Liang Xie, Shi-Qi Liu, Zhen-Qiu Feng, Xiao-Yin Liu, Mei-Jiang Gui, Tian-Yu Xiang, De-Xing Huang, Bo-Xian Yao, Zeng-Guang Hou

Abstract: Offline reinforcement learning (RL) aims to optimize policy using collected data without online interactions. Model-based approaches are particularly appealing for addressing offline RL challenges due to their capability to mitigate the limitations of offline data through data generation using models. Prior research has demonstrated that introducing conservatism into the model or Q-function during… ▽ More Offline reinforcement learning (RL) aims to optimize policy using collected data without online interactions. Model-based approaches are particularly appealing for addressing offline RL challenges due to their capability to mitigate the limitations of offline data through data generation using models. Prior research has demonstrated that introducing conservatism into the model or Q-function during policy optimization can effectively alleviate the prevalent distribution drift problem in offline RL. However, the investigation into the impacts of conservatism in reward estimation is still lacking. This paper proposes a novel model-based offline RL algorithm, Conservative Reward for model-based Offline Policy optimization (CROP), which conservatively estimates the reward in model training. To achieve a conservative reward estimation, CROP simultaneously minimizes the estimation error and the reward of random actions. Theoretical analysis shows that this conservative reward mechanism leads to a conservative policy evaluation and helps mitigate distribution drift. Experiments on D4RL benchmarks showcase that the performance of CROP is comparable to the state-of-the-art baselines. Notably, CROP establishes an innovative connection between offline and online RL, highlighting that offline RL problems can be tackled by adopting online RL techniques to the empirical Markov decision process trained with a conservative reward. The source code is available with https://github.com/G0K0URURI/CROP.git. △ Less

Submitted 26 October, 2023; originally announced October 2023.

arXiv:2310.15077 [pdf, other]

'Don't Get Too Technical with Me': A Discourse Structure-Based Framework for Science Journalism

Authors: Ronald Cardenas, Bingsheng Yao, Dakuo Wang, Yufang Hou

Abstract: Science journalism refers to the task of reporting technical findings of a scientific paper as a less technical news article to the general public audience. We aim to design an automated system to support this real-world task (i.e., automatic science journalism) by 1) introducing a newly-constructed and real-world dataset (SciTechNews), with tuples of a publicly-available scientific paper, its cor… ▽ More Science journalism refers to the task of reporting technical findings of a scientific paper as a less technical news article to the general public audience. We aim to design an automated system to support this real-world task (i.e., automatic science journalism) by 1) introducing a newly-constructed and real-world dataset (SciTechNews), with tuples of a publicly-available scientific paper, its corresponding news article, and an expert-written short summary snippet; 2) proposing a novel technical framework that integrates a paper's discourse structure with its metadata to guide generation; and, 3) demonstrating with extensive automatic and human experiments that our framework outperforms other baseline methods (e.g. Alpaca and ChatGPT) in elaborating a content plan meaningful for the target audience, simplifying the information selected, and producing a coherent final report in a layman's style. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023

arXiv:2310.05853 [pdf, other]

"Mango Mango, How to Let The Lettuce Dry Without A Spinner?'': Exploring User Perceptions of Using An LLM-Based Conversational Assistant Toward Cooking Partner

Authors: Szeyi Chan, Jiachen Li, Bingsheng Yao, Amama Mahmood, Chien-Ming Huang, Holly Jimison, Elizabeth D Mynatt, Dakuo Wang

Abstract: The rapid advancement of the Large Language Model (LLM) has created numerous potentials for integration with conversational assistants (CAs) assisting people in their daily tasks, particularly due to their extensive flexibility. However, users' real-world experiences interacting with these assistants remain unexplored. In this research, we chose cooking, a complex daily task, as a scenario to inve… ▽ More The rapid advancement of the Large Language Model (LLM) has created numerous potentials for integration with conversational assistants (CAs) assisting people in their daily tasks, particularly due to their extensive flexibility. However, users' real-world experiences interacting with these assistants remain unexplored. In this research, we chose cooking, a complex daily task, as a scenario to investigate people's successful and unsatisfactory experiences while receiving assistance from an LLM-based CA, Mango Mango. We discovered that participants value the system's ability to provide extensive information beyond the recipe, offer customized instructions based on context, and assist them in dynamically planning the task. However, they expect the system to be more adaptive to oral conversation and provide more suggestive responses to keep users actively involved. Recognizing that users began treating our LLM-CA as a personal assistant or even a partner rather than just a recipe-reading tool, we propose several design considerations for future development. △ Less

Submitted 9 October, 2023; originally announced October 2023.

Comments: Under submission to CHI2024

arXiv:2309.13879 [pdf, other]

LLM-Powered Conversational Voice Assistants: Interaction Patterns, Opportunities, Challenges, and Design Guidelines

Authors: Amama Mahmood, Junxiang Wang, Bingsheng Yao, Dakuo Wang, Chien-Ming Huang

Abstract: Conventional Voice Assistants (VAs) rely on traditional language models to discern user intent and respond to their queries, leading to interactions that often lack a broader contextual understanding, an area in which Large Language Models (LLMs) excel. However, current LLMs are largely designed for text-based interactions, thus making it unclear how user interactions will evolve if their modality… ▽ More Conventional Voice Assistants (VAs) rely on traditional language models to discern user intent and respond to their queries, leading to interactions that often lack a broader contextual understanding, an area in which Large Language Models (LLMs) excel. However, current LLMs are largely designed for text-based interactions, thus making it unclear how user interactions will evolve if their modality is changed to voice. In this work, we investigate whether LLMs can enrich VA interactions via an exploratory study with participants (N=20) using a ChatGPT-powered VA for three scenarios (medical self-diagnosis, creative planning, and debate) with varied constraints, stakes, and objectivity. We observe that LLM-powered VA elicits richer interaction patterns that vary across tasks, showing its versatility. Notably, LLMs absorb the majority of VA intent recognition failures. We additionally discuss the potential of harnessing LLMs for more resilient and fluid user-VA interactions and provide design guidelines for tailoring LLMs for voice assistance. △ Less

Submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.12368 [pdf, other]

doi 10.1145/3613904.3642343

Rethinking Human-AI Collaboration in Complex Medical Decision Making: A Case Study in Sepsis Diagnosis

Authors: Shao Zhang, Jianing Yu, Xuhai Xu, Changchang Yin, Yuxuan Lu, Bingsheng Yao, Melanie Tory, Lace M. Padilla, Jeffrey Caterino, Ping Zhang, Dakuo Wang

Abstract: Today's AI systems for medical decision support often succeed on benchmark datasets in research papers but fail in real-world deployment. This work focuses on the decision making of sepsis, an acute life-threatening systematic infection that requires an early diagnosis with high uncertainty from the clinician. Our aim is to explore the design requirements for AI systems that can support clinical e… ▽ More Today's AI systems for medical decision support often succeed on benchmark datasets in research papers but fail in real-world deployment. This work focuses on the decision making of sepsis, an acute life-threatening systematic infection that requires an early diagnosis with high uncertainty from the clinician. Our aim is to explore the design requirements for AI systems that can support clinical experts in making better decisions for the early diagnosis of sepsis. The study begins with a formative study investigating why clinical experts abandon an existing AI-powered Sepsis predictive module in their electrical health record (EHR) system. We argue that a human-centered AI system needs to support human experts in the intermediate stages of a medical decision-making process (e.g., generating hypotheses or gathering data), instead of focusing only on the final decision. Therefore, we build SepsisLab based on a state-of-the-art AI algorithm and extend it to predict the future projection of sepsis development, visualize the prediction uncertainty, and propose actionable suggestions (i.e., which additional laboratory tests can be collected) to reduce such uncertainty. Through heuristic evaluation with six clinicians using our prototype system, we demonstrate that SepsisLab enables a promising human-AI collaboration paradigm for the future of AI-assisted sepsis diagnosis and other high-stakes medical decision making. △ Less

Submitted 26 February, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

Comments: Accepted by CHI'24

MSC Class: 68U35 ACM Class: H.5.2; I.2.1

arXiv:2309.11653 [pdf, other]

doi 10.1145/3613904.3642385

"It's a Fair Game", or Is It? Examining How Users Navigate Disclosure Risks and Benefits When Using LLM-Based Conversational Agents

Authors: Zhiping Zhang, Michelle Jia, Hao-Ping Lee, Bingsheng Yao, Sauvik Das, Ada Lerner, Dakuo Wang, Tianshi Li

Abstract: The widespread use of Large Language Model (LLM)-based conversational agents (CAs), especially in high-stakes domains, raises many privacy concerns. Building ethical LLM-based CAs that respect user privacy requires an in-depth understanding of the privacy risks that concern users the most. However, existing research, primarily model-centered, does not provide insight into users' perspectives. To b… ▽ More The widespread use of Large Language Model (LLM)-based conversational agents (CAs), especially in high-stakes domains, raises many privacy concerns. Building ethical LLM-based CAs that respect user privacy requires an in-depth understanding of the privacy risks that concern users the most. However, existing research, primarily model-centered, does not provide insight into users' perspectives. To bridge this gap, we analyzed sensitive disclosures in real-world ChatGPT conversations and conducted semi-structured interviews with 19 LLM-based CA users. We found that users are constantly faced with trade-offs between privacy, utility, and convenience when using LLM-based CAs. However, users' erroneous mental models and the dark patterns in system design limited their awareness and comprehension of the privacy risks. Additionally, the human-like interactions encouraged more sensitive disclosures, which complicated users' ability to navigate the trade-offs. We discuss practical design guidelines and the needs for paradigm shifts to protect the privacy of LLM-based CA users. △ Less

Submitted 1 April, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: 26 pages, 5 figures

arXiv:2309.09357 [pdf, other]

Talk2Care: Facilitating Asynchronous Patient-Provider Communication with Large-Language-Model

Authors: Ziqi Yang, Xuhai Xu, Bingsheng Yao, Shao Zhang, Ethan Rogers, Stephen Intille, Nawar Shara, Guodong Gordon Gao, Dakuo Wang

Abstract: Despite the plethora of telehealth applications to assist home-based older adults and healthcare providers, basic messaging and phone calls are still the most common communication methods, which suffer from limited availability, information loss, and process inefficiencies. One promising solution to facilitate patient-provider communication is to leverage large language models (LLMs) with their po… ▽ More Despite the plethora of telehealth applications to assist home-based older adults and healthcare providers, basic messaging and phone calls are still the most common communication methods, which suffer from limited availability, information loss, and process inefficiencies. One promising solution to facilitate patient-provider communication is to leverage large language models (LLMs) with their powerful natural conversation and summarization capability. However, there is a limited understanding of LLMs' role during the communication. We first conducted two interview studies with both older adults (N=10) and healthcare providers (N=9) to understand their needs and opportunities for LLMs in patient-provider asynchronous communication. Based on the insights, we built an LLM-powered communication system, Talk2Care, and designed interactive components for both groups: (1) For older adults, we leveraged the convenience and accessibility of voice assistants (VAs) and built an LLM-powered VA interface for effective information collection. (2) For health providers, we built an LLM-based dashboard to summarize and present important health information based on older adults' conversations with the VA. We further conducted two user studies with older adults and providers to evaluate the usability of the system. The results showed that Talk2Care could facilitate the communication process, enrich the health information collected from older adults, and considerably save providers' efforts and time. We envision our work as an initial exploration of LLMs' capability in the intersection of healthcare and interpersonal communication. △ Less

Submitted 3 February, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

Comments: Under submission to IMWUT'23, 26 pages

MSC Class: 68U35 ACM Class: H.5.2; I.2.7

arXiv:2309.00221 [pdf, other]

A multinode quantum network over a metropolitan area

Authors: Jian-Long Liu, Xi-Yu Luo, Yong Yu, Chao-Yang Wang, Bin Wang, Yi Hu, Jun Li, Ming-Yang Zheng, Bo Yao, Zi Yan, Da Teng, Jin-Wei Jiang, Xiao-Bing Liu, Xiu-Ping Xie, Jun Zhang, Qing-He Mao, Xiao Jiang, Qiang Zhang, Xiao-Hui Bao, Jian-Wei Pan

Abstract: Towards realizing the future quantum internet, a pivotal milestone entails the transition from two-node proof-of-principle experiments conducted in laboratories to comprehensive, multi-node setups on large scales. Here, we report on the debut implementation of a multi-node entanglement-based quantum network over a metropolitan area. We equipped three quantum nodes with atomic quantum memories and… ▽ More Towards realizing the future quantum internet, a pivotal milestone entails the transition from two-node proof-of-principle experiments conducted in laboratories to comprehensive, multi-node setups on large scales. Here, we report on the debut implementation of a multi-node entanglement-based quantum network over a metropolitan area. We equipped three quantum nodes with atomic quantum memories and their telecom interfaces, and combined them into a scalable phase-stabilized architecture through a server node. We demonstrated heralded entanglement generation between two quantum nodes situated 12.5 km apart, and the storage of entanglement exceeding the round-trip communication time. We also showed the concurrent entanglement generation on three links. Our work provides a metropolitan-scale testbed for the evaluation and exploration of multi-node quantum network protocols and starts a new stage of quantum internet research. △ Less

Submitted 31 August, 2023; originally announced September 2023.

Comments: 21 pages in total, 4 figures and 1 table in the main text, 5 figures and 8 tables in the supplementary material

arXiv:2307.15868 [pdf, ps, other]

Faster Stochastic Algorithms for Minimax Optimization under Polyak--Łojasiewicz Conditions

Authors: Lesi Chen, Boyuan Yao, Luo Luo

Abstract: This paper considers stochastic first-order algorithms for minimax optimization under Polyak--Łojasiewicz (PL) conditions. We propose SPIDER-GDA for solving the finite-sum problem of the form $\min_x \max_y f(x,y)\triangleq \frac{1}{n} \sum_{i=1}^n f_i(x,y)$, where the objective function $f(x,y)$ is $μ_x$-PL in $x$ and $μ_y$-PL in $y$; and each $f_i(x,y)$ is $L$-smooth. We prove SPIDER-GDA could f… ▽ More This paper considers stochastic first-order algorithms for minimax optimization under Polyak--Łojasiewicz (PL) conditions. We propose SPIDER-GDA for solving the finite-sum problem of the form $\min_x \max_y f(x,y)\triangleq \frac{1}{n} \sum_{i=1}^n f_i(x,y)$, where the objective function $f(x,y)$ is $μ_x$-PL in $x$ and $μ_y$-PL in $y$; and each $f_i(x,y)$ is $L$-smooth. We prove SPIDER-GDA could find an $ε$-optimal solution within ${\mathcal O}\left((n + \sqrt{n}\,κ_xκ_y^2)\log (1/ε)\right)$ stochastic first-order oracle (SFO) complexity, which is better than the state-of-the-art method whose SFO upper bound is ${\mathcal O}\big((n + n^{2/3}κ_xκ_y^2)\log (1/ε)\big)$, where $κ_x\triangleq L/μ_x$ and $κ_y\triangleq L/μ_y$. For the ill-conditioned case, we provide an accelerated algorithm to reduce the computational cost further. It achieves $\tilde{\mathcal O}\big((n+\sqrt{n}\,κ_xκ_y)\log^2 (1/ε)\big)$ SFO upper bound when $κ_y \gtrsim \sqrt{n}$. Our ideas also can be applied to the more general setting that the objective function only satisfies PL condition for one variable. Numerical experiments validate the superiority of proposed methods. △ Less

Submitted 28 July, 2023; originally announced July 2023.

Comments: published in NeurIPS 2022; fix a mistake in the proof of Thm. 4.1 and polish the writing

arXiv:2307.14385 [pdf, other]

doi 10.1145/3643540

Mental-LLM: Leveraging Large Language Models for Mental Health Prediction via Online Text Data

Authors: Xuhai Xu, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind K. Dey, Dakuo Wang

Abstract: Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs on various mental health prediction tasks via online text data, including Alpaca, Alpaca-LoRA… ▽ More Advances in large language models (LLMs) have empowered a variety of applications. However, there is still a significant gap in research when it comes to understanding and enhancing the capabilities of LLMs in the field of mental health. In this work, we present a comprehensive evaluation of multiple LLMs on various mental health prediction tasks via online text data, including Alpaca, Alpaca-LoRA, FLAN-T5, GPT-3.5, and GPT-4. We conduct a broad range of experiments, covering zero-shot prompting, few-shot prompting, and instruction fine-tuning. The results indicate a promising yet limited performance of LLMs with zero-shot and few-shot prompt designs for mental health tasks. More importantly, our experiments show that instruction finetuning can significantly boost the performance of LLMs for all tasks simultaneously. Our best-finetuned models, Mental-Alpaca and Mental-FLAN-T5, outperform the best prompt design of GPT-3.5 (25 and 15 times bigger) by 10.9% on balanced accuracy and the best of GPT-4 (250 and 150 times bigger) by 4.8%. They further perform on par with the state-of-the-art task-specific language model. We also conduct an exploratory case study on LLMs' capability on mental health reasoning tasks, illustrating the promising capability of certain models such as GPT-4. We summarize our findings into a set of action guidelines for potential methods to enhance LLMs' capability for mental health tasks. Meanwhile, we also emphasize the important limitations before achieving deployability in real-world mental health settings, such as known racial and gender bias. We highlight the important ethical risks accompanying this line of research. △ Less

Submitted 28 January, 2024; v1 submitted 26 July, 2023; originally announced July 2023.

Comments: Published at Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT) 2024

MSC Class: 68U35 ACM Class: H.5.2; I.2.m

arXiv:2306.15096 [pdf, other]

Automated Identication of Atrial Fibrillation from Single-lead ECGs Using Multi-branching ResNet

Authors: Jianxin Xie, Stavros Stavrakis, Bing Yao

Abstract: Atrial fibrillation (AF) is the most common cardiac arrhythmia, which is clinically identified with irregular and rapid heartbeat rhythm. AF puts a patient at risk of forming blood clots, which can eventually lead to heart failure, stroke, or even sudden death. It is of critical importance to develop an advanced analytical model that can effectively interpret the electrocardiography (ECG) signals… ▽ More Atrial fibrillation (AF) is the most common cardiac arrhythmia, which is clinically identified with irregular and rapid heartbeat rhythm. AF puts a patient at risk of forming blood clots, which can eventually lead to heart failure, stroke, or even sudden death. It is of critical importance to develop an advanced analytical model that can effectively interpret the electrocardiography (ECG) signals and provide decision support for accurate AF diagnostics. In this paper, we propose an innovative deep-learning method for automated AF identification from single-lead ECGs. We first engage the continuous wavelet transform (CWT) to extract time-frequency features from ECG signals. Then, we develop a convolutional neural network (CNN) structure that incorporates ResNet for effective network training and multi-branching architectures for addressing the imbalanced data issue to process the 2D time-frequency features for AF classification. We evaluate the proposed methodology using two real-world ECG databases. The experimental results show a superior performance of our method compared with traditional deep learning models. △ Less

Submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.08126 [pdf, other]

PersonaPKT: Building Personalized Dialogue Agents via Parameter-efficient Knowledge Transfer

Authors: Xu Han, Bin Guo, Yoon Jung, Benjamin Yao, Yu Zhang, Xiaohu Liu, Chenlei Guo

Abstract: Personalized dialogue agents (DAs) powered by large pre-trained language models (PLMs) often rely on explicit persona descriptions to maintain personality consistency. However, such descriptions may not always be available or may pose privacy concerns. To tackle this bottleneck, we introduce PersonaPKT, a lightweight transfer learning approach that can build persona-consistent dialogue models with… ▽ More Personalized dialogue agents (DAs) powered by large pre-trained language models (PLMs) often rely on explicit persona descriptions to maintain personality consistency. However, such descriptions may not always be available or may pose privacy concerns. To tackle this bottleneck, we introduce PersonaPKT, a lightweight transfer learning approach that can build persona-consistent dialogue models without explicit persona descriptions. By representing each persona as a continuous vector, PersonaPKT learns implicit persona-specific features directly from a small number of dialogue samples produced by the same persona, adding less than 0.1% trainable parameters for each persona on top of the PLM backbone. Empirical results demonstrate that PersonaPKT effectively builds personalized DAs with high storage efficiency, outperforming various baselines in terms of persona consistency while maintaining good response generation quality. In addition, it enhances privacy protection by avoiding explicit persona descriptions. Overall, PersonaPKT is an effective solution for creating personalized DAs that respect user privacy. △ Less

Submitted 13 June, 2023; originally announced June 2023.

Comments: 10 pages, 3 figures, accepted to SustaiNLP 2023

arXiv:2306.02120 [pdf, other]

Giant Enhancement of Magnonic Frequency Combs by Exceptional Points

Authors: Congyi Wang, Jinwei Rao, Zhijian Chen, Kaixin Zhao, Liaoxin Sun, Bimu Yao, Tao Yu, Yi-Pu Wang, Wei Lu

Abstract: With their incomparable time-frequency accuracy, frequency combs have significantly advanced precision spectroscopy, ultra-sensitive detection, and atomic clocks. Traditional methods to create photonic, phononic, and magnonic frequency combs hinge on material nonlinearities which are often weak, necessitating high power densities to surpass their initiation thresholds, which subsequently limits th… ▽ More With their incomparable time-frequency accuracy, frequency combs have significantly advanced precision spectroscopy, ultra-sensitive detection, and atomic clocks. Traditional methods to create photonic, phononic, and magnonic frequency combs hinge on material nonlinearities which are often weak, necessitating high power densities to surpass their initiation thresholds, which subsequently limits their applications. Here, we introduce a novel nonlinear process to efficiently generate magnonic frequency combs (MFCs) by exploiting exceptional points (EPs) in a coupled system comprising a pump-induced magnon mode and a Kittel mode. Even without any cavity, our method greatly improves the efficiency of nonlinear frequency conversion and achieves optimal MFCs at low pump power. Additionally, our novel nonlinear process enables excellent tunability of EPs using the polarization and power of the pump, simplifying MFC generation and manipulation. Our work establishes a synergistic relationship between non-Hermitian physics and MFCs, which is advantages for coherent/quantum information processing and ultra-sensitive detection. △ Less

Submitted 3 June, 2023; originally announced June 2023.

Comments: 7 pages, 4 figures

Showing 1–50 of 171 results for author: Yao, B