Search | arXiv e-print repository

Sunflowers in set systems with small VC-dimension

Authors: József Balogh, Anton Bernshteyn, Michelle Delcourt, Asaf Ferber, Huy Tuan Pham

Abstract: A family of $r$ distinct sets $\{A_1,\ldots, A_r\}$ is an $r$-sunflower if for all $1 \leqslant i < j \leqslant r$ and $1 \leqslant i' < j' \leqslant r$, we have $A_i \cap A_j = A_{i'} \cap A_{j'}$. Erdős and Rado conjectured in 1960 that every family $\mathcal{H}$ of $\ell$-element sets of size at least $K(r)^\ell$ contains an $r$-sunflower, where $K(r)$ is some function that depends only on $r$.… ▽ More A family of $r$ distinct sets $\{A_1,\ldots, A_r\}$ is an $r$-sunflower if for all $1 \leqslant i < j \leqslant r$ and $1 \leqslant i' < j' \leqslant r$, we have $A_i \cap A_j = A_{i'} \cap A_{j'}$. Erdős and Rado conjectured in 1960 that every family $\mathcal{H}$ of $\ell$-element sets of size at least $K(r)^\ell$ contains an $r$-sunflower, where $K(r)$ is some function that depends only on $r$. We prove that if $\mathcal{H}$ is a family of $\ell$-element sets of VC-dimension at most $d$ and $|\mathcal{H}| > (C r (\log d+\log^\ast \ell))^\ell$ for some absolute constant $C > 0$, then $\mathcal{H}$ contains an $r$-sunflower. This improves a recent result of Fox, Pach, and Suk. When $d=1$, we obtain a sharp bound, namely that $|\mathcal{H}| > (r-1)^\ell$ is sufficient. Along the way, we establish a strengthening of the Kahn-Kalai conjecture for set families of bounded VC-dimension, which is of independent interest. △ Less

Submitted 7 August, 2024; originally announced August 2024.

Comments: 14 pages

arXiv:2408.03035 [pdf, other]

Training-Free Condition Video Diffusion Models for single frame Spatial-Semantic Echocardiogram Synthesis

Authors: Van Phi Nguyen, Tri Nhan Luong Ha, Huy Hieu Pham, Quoc Long Tran

Abstract: Conditional video diffusion models (CDM) have shown promising results for video synthesis, potentially enabling the generation of realistic echocardiograms to address the problem of data scarcity. However, current CDMs require a paired segmentation map and echocardiogram dataset. We present a new method called Free-Echo for generating realistic echocardiograms from a single end-diastolic segmentat… ▽ More Conditional video diffusion models (CDM) have shown promising results for video synthesis, potentially enabling the generation of realistic echocardiograms to address the problem of data scarcity. However, current CDMs require a paired segmentation map and echocardiogram dataset. We present a new method called Free-Echo for generating realistic echocardiograms from a single end-diastolic segmentation map without additional training data. Our method is based on the 3D-Unet with Temporal Attention Layers model and is conditioned on the segmentation map using a training-free conditioning method based on SDEdit. We evaluate our model on two public echocardiogram datasets, CAMUS and EchoNet-Dynamic. We show that our model can generate plausible echocardiograms that are spatially aligned with the input segmentation map, achieving performance comparable to training-based CDMs. Our work opens up new possibilities for generating echocardiograms from a single segmentation map, which can be used for data augmentation, domain adaptation, and other applications in medical imaging. Our code is available at \url{https://github.com/gungui98/echo-free} △ Less

Submitted 6 August, 2024; originally announced August 2024.

Comments: Accepted to MICCAI 2024

arXiv:2408.00892 [pdf, other]

Peptide Sequencing Via Protein Language Models

Authors: Thuong Le Hoai Pham, Jillur Rahman Saurav, Aisosa A. Omere, Calvin J. Heyl, Mohammad Sadegh Nasr, Cody Tyler Reynolds, Jai Prakash Yadav Veerla, Helen H Shang, Justyn Jaworski, Alison Ravenscraft, Joseph Anthony Buonomo, Jacob M. Luber

Abstract: We introduce a protein language model for determining the complete sequence of a peptide based on measurement of a limited set of amino acids. To date, protein sequencing relies on mass spectrometry, with some novel edman degregation based platforms able to sequence non-native peptides. Current protein sequencing techniques face limitations in accurately identifying all amino acids, hindering comp… ▽ More We introduce a protein language model for determining the complete sequence of a peptide based on measurement of a limited set of amino acids. To date, protein sequencing relies on mass spectrometry, with some novel edman degregation based platforms able to sequence non-native peptides. Current protein sequencing techniques face limitations in accurately identifying all amino acids, hindering comprehensive proteome analysis. Our method simulates partial sequencing data by selectively masking amino acids that are experimentally difficult to identify in protein sequences from the UniRef database. This targeted masking mimics real-world sequencing limitations. We then modify and finetune a ProtBert derived transformer-based model, for a new downstream task predicting these masked residues, providing an approximation of the complete sequence. Evaluating on three bacterial Escherichia species, we achieve per-amino-acid accuracy up to 90.5% when only four amino acids ([KCYM]) are known. Structural assessment using AlphaFold and TM-score validates the biological relevance of our predictions. The model also demonstrates potential for evolutionary analysis through cross-species performance. This integration of simulated experimental constraints with computational predictions offers a promising avenue for enhancing protein sequence analysis, potentially accelerating advancements in proteomics and structural biology by providing a probabilistic reconstruction of the complete protein sequence from limited experimental data. △ Less

Submitted 1 August, 2024; originally announced August 2024.

arXiv:2407.18532 [pdf, other]

Outer Approximation and Super-modular Cuts for Constrained Assortment Optimization under Mixed-Logit Model

Authors: Hoang Giang Pham, Tien Mai

Abstract: In this paper, we study the assortment optimization problem under the mixed-logit customer choice model. While assortment optimization has been a major topic in revenue management for decades, the mixed-logit model is considered one of the most general and flexible approaches for modeling and predicting customer purchasing behavior. Existing exact methods have primarily relied on mixed-integer lin… ▽ More In this paper, we study the assortment optimization problem under the mixed-logit customer choice model. While assortment optimization has been a major topic in revenue management for decades, the mixed-logit model is considered one of the most general and flexible approaches for modeling and predicting customer purchasing behavior. Existing exact methods have primarily relied on mixed-integer linear programming (MILP) or second-order cone (CONIC) reformulations, which allow for exact problem solving using off-the-shelf solvers. However, these approaches often suffer from weak continuous relaxations and are slow when solving large instances. Our work addresses the problem by focusing on components of the objective function that can be proven to be monotonically super-modular and convex. This allows us to derive valid cuts to outer-approximate the nonlinear objective functions. We then demonstrate that these valid cuts can be incorporated into Cutting Plane or Branch-and-Cut methods to solve the problem exactly. Extensive experiments show that our approaches consistently outperform previous methods in terms of both solution quality and computation time. △ Less

Submitted 26 July, 2024; originally announced July 2024.

arXiv:2407.17790 [pdf, other]

Exploring the Limitations of Kolmogorov-Arnold Networks in Classification: Insights to Software Training and Hardware Implementation

Authors: Van Duy Tran, Tran Xuan Hieu Le, Thi Diem Tran, Hoai Luan Pham, Vu Trung Duong Le, Tuan Hai Vu, Van Tinh Nguyen, Yasuhiko Nakashima

Abstract: Kolmogorov-Arnold Networks (KANs), a novel type of neural network, have recently gained popularity and attention due to the ability to substitute multi-layer perceptions (MLPs) in artificial intelligence (AI) with higher accuracy and interoperability. However, KAN assessment is still limited and cannot provide an in-depth analysis of a specific domain. Furthermore, no study has been conducted on t… ▽ More Kolmogorov-Arnold Networks (KANs), a novel type of neural network, have recently gained popularity and attention due to the ability to substitute multi-layer perceptions (MLPs) in artificial intelligence (AI) with higher accuracy and interoperability. However, KAN assessment is still limited and cannot provide an in-depth analysis of a specific domain. Furthermore, no study has been conducted on the implementation of KANs in hardware design, which would directly demonstrate whether KANs are truly superior to MLPs in practical applications. As a result, in this paper, we focus on verifying KANs for classification issues, which are a common but significant topic in AI using four different types of datasets. Furthermore, the corresponding hardware implementation is considered using the Vitis high-level synthesis (HLS) tool. To the best of our knowledge, this is the first article to implement hardware for KAN. The results indicate that KANs cannot achieve more accuracy than MLPs in high complex datasets while utilizing substantially higher hardware resources. Therefore, MLP remains an effective approach for achieving accuracy and efficiency in software and hardware implementation. △ Less

Submitted 25 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

Comments: 6 pages, 3 figures, 2 tables

arXiv:2407.16497 [pdf, other]

Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection

Authors: Trinh Le Ba Khanh, Huy-Hung Nguyen, Long Hoang Pham, Duong Nguyen-Ngoc Tran, Jae Wook Jeon

Abstract: In object detection, unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. However, UDA's reliance on labeled source data restricts its adaptability in privacy-related scenarios. This study focuses on source-free object detection (SFOD), which adapts a source-trained detector to an unlabeled target domain without using labeled s… ▽ More In object detection, unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. However, UDA's reliance on labeled source data restricts its adaptability in privacy-related scenarios. This study focuses on source-free object detection (SFOD), which adapts a source-trained detector to an unlabeled target domain without using labeled source data. Recent advancements in self-training, particularly with the Mean Teacher (MT) framework, show promise for SFOD deployment. However, the absence of source supervision significantly compromises the stability of these approaches. We identify two primary issues, (1) uncontrollable degradation of the teacher model due to inopportune updates from the student model, and (2) the student model's tendency to replicate errors from incorrect pseudo labels, leading to it being trapped in a local optimum. Both factors contribute to a detrimental circular dependency, resulting in rapid performance degradation in recent self-training frameworks. To tackle these challenges, we propose the Dynamic Retraining-Updating (DRU) mechanism, which actively manages the student training and teacher updating processes to achieve co-evolutionary training. Additionally, we introduce Historical Student Loss to mitigate the influence of incorrect pseudo labels. Our method achieves state-of-the-art performance in the SFOD setting on multiple domain adaptation benchmarks, comparable to or even surpassing advanced UDA methods. The code will be released at https://github.com/lbktrinh/DRU △ Less

Submitted 23 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.14738 [pdf, other]

Flatness-aware Sequential Learning Generates Resilient Backdoors

Authors: Hoang Pham, The-Anh Ta, Anh Tran, Khoa D. Doan

Abstract: Recently, backdoor attacks have become an emerging threat to the security of machine learning models. From the adversary's perspective, the implanted backdoors should be resistant to defensive algorithms, but some recently proposed fine-tuning defenses can remove these backdoors with notable efficacy. This is mainly due to the catastrophic forgetting (CF) property of deep neural networks. This pap… ▽ More Recently, backdoor attacks have become an emerging threat to the security of machine learning models. From the adversary's perspective, the implanted backdoors should be resistant to defensive algorithms, but some recently proposed fine-tuning defenses can remove these backdoors with notable efficacy. This is mainly due to the catastrophic forgetting (CF) property of deep neural networks. This paper counters CF of backdoors by leveraging continual learning (CL) techniques. We begin by investigating the connectivity between a backdoored and fine-tuned model in the loss landscape. Our analysis confirms that fine-tuning defenses, especially the more advanced ones, can easily push a poisoned model out of the backdoor regions, making it forget all about the backdoors. Based on this finding, we re-formulate backdoor training through the lens of CL and propose a novel framework, named Sequential Backdoor Learning (SBL), that can generate resilient backdoors. This framework separates the backdoor poisoning process into two tasks: the first task learns a backdoored model, while the second task, based on the CL principles, moves it to a backdoored region resistant to fine-tuning. We additionally propose to seek flatter backdoor regions via a sharpness-aware minimizer in the framework, further strengthening the durability of the implanted backdoor. Finally, we demonstrate the effectiveness of our method through extensive empirical experiments on several benchmark datasets in the backdoor domain. The source code is available at https://github.com/mail-research/SBL-resilient-backdoors △ Less

Submitted 19 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.06697 [pdf, other]

Certified Continual Learning for Neural Network Regression

Authors: Long H. Pham, Jun Sun

Abstract: On the one hand, there has been considerable progress on neural network verification in recent years, which makes certifying neural networks a possibility. On the other hand, neural networks in practice are often re-trained over time to cope with new data distribution or for solving different tasks (a.k.a. continual learning). Once re-trained, the verified correctness of the neural network is like… ▽ More On the one hand, there has been considerable progress on neural network verification in recent years, which makes certifying neural networks a possibility. On the other hand, neural networks in practice are often re-trained over time to cope with new data distribution or for solving different tasks (a.k.a. continual learning). Once re-trained, the verified correctness of the neural network is likely broken, particularly in the presence of the phenomenon known as catastrophic forgetting. In this work, we propose an approach called certified continual learning which improves existing continual learning methods by preserving, as long as possible, the established correctness properties of a verified network. Our approach is evaluated with multiple neural networks and on two different continual learning methods. The results show that our approach is efficient and the trained models preserve their certified correctness and often maintain high utility. △ Less

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.02873 [pdf, other]

Robot Shape and Location Retention in Video Generation Using Diffusion Models

Authors: Peng Wang, Zhihao Guo, Abdul Latheef Sait, Minh Huy Pham

Abstract: Diffusion models have marked a significant milestone in the enhancement of image and video generation technologies. However, generating videos that precisely retain the shape and location of moving objects such as robots remains a challenge. This paper presents diffusion models specifically tailored to generate videos that accurately maintain the shape and location of mobile robots. This developme… ▽ More Diffusion models have marked a significant milestone in the enhancement of image and video generation technologies. However, generating videos that precisely retain the shape and location of moving objects such as robots remains a challenge. This paper presents diffusion models specifically tailored to generate videos that accurately maintain the shape and location of mobile robots. This development offers substantial benefits to those working on detecting dangerous interactions between humans and robots by facilitating the creation of training data for collision detection models, circumventing the need for collecting data from the real world, which often involves legal and ethical issues. Our models incorporate techniques such as embedding accessible robot pose information and applying semantic mask regulation within the ConvNext backbone network. These techniques are designed to refine intermediate outputs, therefore improving the retention performance of shape and location. Through extensive experimentation, our models have demonstrated notable improvements in maintaining the shape and location of different robots, as well as enhancing overall video generation quality, compared to the benchmark diffusion model. Codes will be opensourced at \href{https://github.com/PengPaulWang/diffusion-robots}{Github}. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 8 pages, 10 figures

arXiv:2406.17381 [pdf, other]

Forget but Recall: Incremental Latent Rectification in Continual Learning

Authors: Nghia D. Nguyen, Hieu Trung Nguyen, Ang Li, Hoang Pham, Viet Anh Nguyen, Khoa D. Doan

Abstract: Intrinsic capability to continuously learn a changing data stream is a desideratum of deep neural networks (DNNs). However, current DNNs suffer from catastrophic forgetting, which hinders remembering past knowledge. To mitigate this issue, existing Continual Learning (CL) approaches either retain exemplars for replay, regularize learning, or allocate dedicated capacity for new tasks. This paper in… ▽ More Intrinsic capability to continuously learn a changing data stream is a desideratum of deep neural networks (DNNs). However, current DNNs suffer from catastrophic forgetting, which hinders remembering past knowledge. To mitigate this issue, existing Continual Learning (CL) approaches either retain exemplars for replay, regularize learning, or allocate dedicated capacity for new tasks. This paper investigates an unexplored CL direction for incremental learning called Incremental Latent Rectification or ILR. In a nutshell, ILR learns to propagate with correction (or rectify) the representation from the current trained DNN backward to the representation space of the old task, where performing predictive decisions is easier. This rectification process only employs a chain of small representation mapping networks, called rectifier units. Empirical experiments on several continual learning benchmarks, including CIFAR10, CIFAR100, and Tiny ImageNet, demonstrate the effectiveness and potential of this novel CL direction compared to existing representative CL methods. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2405.17926 [pdf, other]

SarcNet: A Novel AI-based Framework to Automatically Analyze and Score Sarcomere Organizations in Fluorescently Tagged hiPSC-CMs

Authors: Huyen Le, Khiet Dang, Tien Lai, Nhung Nguyen, Mai Tran, Hieu Pham

Abstract: Quantifying sarcomere structure organization in human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) is crucial for understanding cardiac disease pathology, improving drug screening, and advancing regenerative medicine. Traditional methods, such as manual annotation and Fourier transform analysis, are labor-intensive, error-prone, and lack high-throughput capabilities. In this st… ▽ More Quantifying sarcomere structure organization in human-induced pluripotent stem cell-derived cardiomyocytes (hiPSC-CMs) is crucial for understanding cardiac disease pathology, improving drug screening, and advancing regenerative medicine. Traditional methods, such as manual annotation and Fourier transform analysis, are labor-intensive, error-prone, and lack high-throughput capabilities. In this study, we present a novel deep learning-based framework that leverages cell images and integrates cell features to automatically evaluate the sarcomere structure of hiPSC-CMs from the onset of differentiation. This framework overcomes the limitations of traditional methods through automated, high-throughput analysis, providing consistent, reliable results while accurately detecting complex sarcomere patterns across diverse samples. The proposed framework contains the SarcNet, a linear layers-added ResNet-18 module, to output a continuous score ranging from one to five that captures the level of sarcomere structure organization. It is trained and validated on an open-source dataset of hiPSC-CMs images with the endogenously GFP-tagged alpha-actinin-2 structure developed by the Allen Institute for Cell Science (AICS). SarcNet achieves a Spearman correlation of 0.831 with expert evaluations, demonstrating superior performance and an improvement of 0.075 over the current state-of-the-art approach, which uses Linear Regression. Our results also show a consistent pattern of increasing organization from day 18 to day 32 of differentiation, aligning with expert evaluations. By integrating the quantitative features calculated directly from the images with the visual features learned during the deep learning model, our framework offers a more comprehensive and accurate assessment, thereby enhancing the further utility of hiPSC-CMs in medical research and therapy development. △ Less

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17002 [pdf, other]

UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models

Authors: Quan Van Nguyen, Huy Quang Pham, Dan Quang Tran, Thang Kien-Bao Nguyen, Nhat-Hao Nguyen-Dang, Bao-Thien Nguyen-Tat

Abstract: Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Metho… ▽ More Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Methods: In our participation in the ImageCLEFmedical2024 Caption evaluation campaign, we explored caption prediction tasks using advanced Transformer-based models. We developed methods incorporating Transformer encoder-decoder and Query Transformer architectures. These models were trained and evaluated to generate diagnostic captions from radiology images. Results: Experimental evaluations demonstrated the effectiveness of our models, with the VisionDiagnostor-BioBART model achieving the highest BERTScore of 0.6267. This performance contributed to our team, DarkCow, achieving third place on the leaderboard. Conclusion: Our diagnostic captioning models show great promise in aiding medical professionals by generating high-quality reports efficiently. This approach can facilitate better data processing and performance optimization in medical imaging departments, ultimately benefiting healthcare delivery. △ Less

Submitted 27 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.14781 [pdf, other]

Unified Neural Backdoor Removal with Only Few Clean Samples through Unlearning and Relearning

Authors: Nay Myat Min, Long H. Pham, Jun Sun

Abstract: The application of deep neural network models in various security-critical applications has raised significant security concerns, particularly the risk of backdoor attacks. Neural backdoors pose a serious security threat as they allow attackers to maliciously alter model behavior. While many defenses have been explored, existing approaches are often bounded by model-specific constraints, or necess… ▽ More The application of deep neural network models in various security-critical applications has raised significant security concerns, particularly the risk of backdoor attacks. Neural backdoors pose a serious security threat as they allow attackers to maliciously alter model behavior. While many defenses have been explored, existing approaches are often bounded by model-specific constraints, or necessitate complex alterations to the training process, or fall short against diverse backdoor attacks. In this work, we introduce a novel method for comprehensive and effective elimination of backdoors, called ULRL (short for UnLearn and ReLearn for backdoor removal). ULRL requires only a small set of clean samples and works effectively against all kinds of backdoors. It first applies unlearning for identifying suspicious neurons and then targeted neural weight tuning for backdoor mitigation (i.e., by promoting significant weight deviation on the suspicious neurons). Evaluated against 12 different types of backdoors, ULRL is shown to significantly outperform state-of-the-art methods in eliminating backdoors whilst preserving the model utility. △ Less

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.02287 [pdf, other]

Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models

Authors: Piotr Padlewski, Max Bain, Matthew Henderson, Zhongkai Zhu, Nishant Relan, Hai Pham, Donovan Ong, Kaloyan Aleksiev, Aitor Ormazabal, Samuel Phua, Ethan Yeo, Eugenie Lamprecht, Qi Liu, Yuqi Wang, Eric Chen, Deyu Fu, Lei Li, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Mikel Artetxe, Yi Tay

Abstract: We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models. Vibe-Eval consists of 269 visual understanding prompts, including 100 of hard difficulty, complete with gold-standard responses authored by experts. Vibe-Eval is open-ended and challenging with dual objectives: (i) vibe checking multimodal chat models for day-to-day tasks and (ii) rigorously testing a… ▽ More We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models. Vibe-Eval consists of 269 visual understanding prompts, including 100 of hard difficulty, complete with gold-standard responses authored by experts. Vibe-Eval is open-ended and challenging with dual objectives: (i) vibe checking multimodal chat models for day-to-day tasks and (ii) rigorously testing and probing the capabilities of present frontier models. Notably, our hard set contains >50% questions that all frontier models answer incorrectly. We explore the nuances of designing, evaluating, and ranking models on ultra challenging prompts. We also discuss trade-offs between human and automatic evaluation, and show that automatic model evaluation using Reka Core roughly correlates to human judgment. We offer free API access for the purpose of lightweight evaluation and plan to conduct formal human evaluations for public models that perform well on the Vibe-Eval's automatic scores. We release the evaluation code and data, see https://github.com/reka-ai/reka-vibe-eval △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2405.01488 [pdf, other]

Digital Twin Generators for Disease Modeling

Authors: Nameyeh Alam, Jake Basilico, Daniele Bertolini, Satish Casie Chetty, Heather D'Angelo, Ryan Douglas, Charles K. Fisher, Franklin Fuller, Melissa Gomes, Rishabh Gupta, Alex Lang, Anton Loukianov, Rachel Mak-McCully, Cary Murray, Hanalei Pham, Susanna Qiao, Elena Ryapolova-Webb, Aaron Smith, Dimitri Theoharatos, Anil Tolwani, Eric W. Tramel, Anna Vidovszky, Judy Viduya, Jonathan R. Walsh

Abstract: A patient's digital twin is a computational model that describes the evolution of their health over time. Digital twins have the potential to revolutionize medicine by enabling individual-level computer simulations of human health, which can be used to conduct more efficient clinical trials or to recommend personalized treatment options. Due to the overwhelming complexity of human biology, machine… ▽ More A patient's digital twin is a computational model that describes the evolution of their health over time. Digital twins have the potential to revolutionize medicine by enabling individual-level computer simulations of human health, which can be used to conduct more efficient clinical trials or to recommend personalized treatment options. Due to the overwhelming complexity of human biology, machine learning approaches that leverage large datasets of historical patients' longitudinal health records to generate patients' digital twins are more tractable than potential mechanistic models. In this manuscript, we describe a neural network architecture that can learn conditional generative models of clinical trajectories, which we call Digital Twin Generators (DTGs), that can create digital twins of individual patients. We show that the same neural network architecture can be trained to generate accurate digital twins for patients across 13 different indications simply by changing the training set and tuning hyperparameters. By introducing a general purpose architecture, we aim to unlock the ability to scale machine learning approaches to larger datasets and across more indications so that a digital twin could be created for any patient in the world. △ Less

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.01054 [pdf, other]

Continual Learning for Robust Gate Detection under Dynamic Lighting in Autonomous Drone Racing

Authors: Zhongzheng Qiao, Xuan Huy Pham, Savitha Ramasamy, Xudong Jiang, Erdal Kayacan, Andriy Sarabakha

Abstract: In autonomous and mobile robotics, a principal challenge is resilient real-time environmental perception, particularly in situations characterized by unknown and dynamic elements, as exemplified in the context of autonomous drone racing. This study introduces a perception technique for detecting drone racing gates under illumination variations, which is common during high-speed drone flights. The… ▽ More In autonomous and mobile robotics, a principal challenge is resilient real-time environmental perception, particularly in situations characterized by unknown and dynamic elements, as exemplified in the context of autonomous drone racing. This study introduces a perception technique for detecting drone racing gates under illumination variations, which is common during high-speed drone flights. The proposed technique relies upon a lightweight neural network backbone augmented with capabilities for continual learning. The envisaged approach amalgamates predictions of the gates' positional coordinates, distance, and orientation, encapsulating them into a cohesive pose tuple. A comprehensive number of tests serve to underscore the efficacy of this approach in confronting diverse and challenging scenarios, specifically those involving variable lighting conditions. The proposed methodology exhibits notable robustness in the face of illumination variations, thereby substantiating its effectiveness. △ Less

Submitted 2 May, 2024; originally announced May 2024.

Comments: 8 pages, 6 figures, in 2024 International Joint Conference on Neural Networks (IJCNN)

arXiv:2404.18831 [pdf, other]

ConPro: Learning Severity Representation for Medical Images using Contrastive Learning and Preference Optimization

Authors: Hong Nguyen, Hoang Nguyen, Melinda Chang, Hieu Pham, Shrikanth Narayanan, Michael Pazzani

Abstract: Understanding the severity of conditions shown in images in medical diagnosis is crucial, serving as a key guide for clinical assessment, treatment, as well as evaluating longitudinal progression. This paper proposes Con- PrO: a novel representation learning method for severity assessment in medical images using Contrastive learningintegrated Preference Optimization. Different from conventional co… ▽ More Understanding the severity of conditions shown in images in medical diagnosis is crucial, serving as a key guide for clinical assessment, treatment, as well as evaluating longitudinal progression. This paper proposes Con- PrO: a novel representation learning method for severity assessment in medical images using Contrastive learningintegrated Preference Optimization. Different from conventional contrastive learning methods that maximize the distance between classes, ConPrO injects into the latent vector the distance preference knowledge between various severity classes and the normal class. We systematically examine the key components of our framework to illuminate how contrastive prediction tasks acquire valuable representations. We show that our representation learning framework offers valuable severity ordering in the feature space while outperforming previous state-of-the-art methods on classification tasks. We achieve a 6% and 20% relative improvement compared to a supervised and a self-supervised baseline, respectively. In addition, we derived discussions on severity indicators and related applications of preference comparison in the medical domain. △ Less

Submitted 29 April, 2024; originally announced April 2024.

Comments: 8 pages

arXiv:2404.18397 [pdf, other]

ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images

Authors: Huy Quang Pham, Thang Kien-Bao Nguyen, Quan Van Nguyen, Dan Quang Tran, Nghia Hieu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Optical Character Recognition - Visual Question Answering (OCR-VQA) is the task of answering text information contained in images that have just been significantly developed in the English language in recent years. However, there are limited studies of this task in low-resource languages such as Vietnamese. To this end, we introduce a novel dataset, ViOCRVQA (Vietnamese Optical Character Recogniti… ▽ More Optical Character Recognition - Visual Question Answering (OCR-VQA) is the task of answering text information contained in images that have just been significantly developed in the English language in recent years. However, there are limited studies of this task in low-resource languages such as Vietnamese. To this end, we introduce a novel dataset, ViOCRVQA (Vietnamese Optical Character Recognition - Visual Question Answering dataset), consisting of 28,000+ images and 120,000+ question-answer pairs. In this dataset, all the images contain text and questions about the information relevant to the text in the images. We deploy ideas from state-of-the-art methods proposed for English to conduct experiments on our dataset, revealing the challenges and difficulties inherent in a Vietnamese dataset. Furthermore, we introduce a novel approach, called VisionReader, which achieved 0.4116 in EM and 0.6990 in the F1-score on the test set. Through the results, we found that the OCR system plays a very important role in VQA models on the ViOCRVQA dataset. In addition, the objects in the image also play a role in improving model performance. We open access to our dataset at link (https://github.com/qhnhynmm/ViOCRVQA.git) for further research in OCR-VQA task in Vietnamese. △ Less

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.12387 [pdf, other]

Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Authors: Reka Team, Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei Li, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu , et al. (1 additional authors not shown)

Abstract: We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, images, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but al… ▽ More We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, images, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized values for their respective compute class. Meanwhile, our most capable and largest model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g. MMMU, VQAv2), Core performs competitively to GPT4-V. Meanwhile, on multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other models such as Claude 3 Opus. On text benchmarks, Core not only performs competitively to other frontier models on a set of well-established benchmarks (e.g. MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped in production at http://chat.reka.ai . A showcase of non cherry picked qualitative examples can also be found at http://showcase.reka.ai . △ Less

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.10652 [pdf, other]

ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images

Authors: Quan Van Nguyen, Dan Quang Tran, Huy Quang Pham, Thang Kien-Bao Nguyen, Nghia Hieu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Visual Question Answering (VQA) is a complicated task that requires the capability of simultaneously processing natural language and images. Initially, this task was researched, focusing on methods to help machines understand objects and scene contexts in images. However, some text appearing in the image that carries explicit information about the full content of the image is not mentioned. Along… ▽ More Visual Question Answering (VQA) is a complicated task that requires the capability of simultaneously processing natural language and images. Initially, this task was researched, focusing on methods to help machines understand objects and scene contexts in images. However, some text appearing in the image that carries explicit information about the full content of the image is not mentioned. Along with the continuous development of the AI era, there have been many studies on the reading comprehension ability of VQA models in the world. As a developing country, conditions are still limited, and this task is still open in Vietnam. Therefore, we introduce the first large-scale dataset in Vietnamese specializing in the ability to understand text appearing in images, we call it ViTextVQA (\textbf{Vi}etnamese \textbf{Text}-based \textbf{V}isual \textbf{Q}uestion \textbf{A}nswering dataset) which contains \textbf{over 16,000} images and \textbf{over 50,000} questions with answers. Through meticulous experiments with various state-of-the-art models, we uncover the significance of the order in which tokens in OCR text are processed and selected to formulate answers. This finding helped us significantly improve the performance of the baseline models on the ViTextVQA dataset. Our dataset is available at this \href{https://github.com/minhquan6203/ViTextVQA-Dataset}{link} for research purposes. △ Less

Submitted 16 April, 2024; originally announced April 2024.

Comments: Preprint submitted to IJCV

arXiv:2403.19867 [pdf, ps, other]

Finding Decision Tree Splits in Streaming and Massively Parallel Models

Authors: Huy Pham, Hoang Ta, Hoa T. Vu

Abstract: In this work, we provide data stream algorithms that compute optimal splits in decision tree learning. In particular, given a data stream of observations $x_i$ and their labels $y_i$, the goal is to find the optimal split point $j$ that divides the data into two sets such that the mean squared error (for regression) or misclassification rate (for classification) is minimized. We provide various fa… ▽ More In this work, we provide data stream algorithms that compute optimal splits in decision tree learning. In particular, given a data stream of observations $x_i$ and their labels $y_i$, the goal is to find the optimal split point $j$ that divides the data into two sets such that the mean squared error (for regression) or misclassification rate (for classification) is minimized. We provide various fast streaming algorithms that use sublinear space and a small number of passes for these problems. These algorithms can also be extended to the massively parallel computation model. Our work, while not directly comparable, complements the seminal work of Domingos and Hulten (KDD 2000). △ Less

Submitted 17 April, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.15959 [pdf, other]

Risk-Calibrated Human-Robot Interaction via Set-Valued Intent Prediction

Authors: Justin Lidard, Hang Pham, Ariel Bachman, Bryan Boateng, Anirudha Majumdar

Abstract: Tasks where robots must anticipate human intent, such as navigating around a cluttered home or sorting everyday items, are challenging because they exhibit a wide range of valid actions that lead to similar outcomes. Moreover, zero-shot cooperation between human-robot partners is an especially challenging problem because it requires the robot to infer and adapt on the fly to a latent human intent,… ▽ More Tasks where robots must anticipate human intent, such as navigating around a cluttered home or sorting everyday items, are challenging because they exhibit a wide range of valid actions that lead to similar outcomes. Moreover, zero-shot cooperation between human-robot partners is an especially challenging problem because it requires the robot to infer and adapt on the fly to a latent human intent, which could vary significantly from human to human. Recently, deep learned motion prediction models have shown promising results in predicting human intent but are prone to being confidently incorrect. In this work, we present Risk-Calibrated Interactive Planning (RCIP), which is a framework for measuring and calibrating risk associated with uncertain action selection in human-robot cooperation, with the fundamental idea that the robot should ask for human clarification when the risk associated with the uncertainty in the human's intent cannot be controlled. RCIP builds on the theory of set-valued risk calibration to provide a finite-sample statistical guarantee on the cumulative loss incurred by the robot while minimizing the cost of human clarification in complex multi-step settings. Our main insight is to frame the risk control problem as a sequence-level multi-hypothesis testing problem, allowing efficient calibration using a low-dimensional parameter that controls a pre-trained risk-aware policy. Experiments across a variety of simulated and real-world environments demonstrate RCIP's ability to predict and adapt to a diverse set of dynamic human intents. △ Less

Submitted 23 April, 2024; v1 submitted 23 March, 2024; originally announced March 2024.

Comments: Website with additional information, videos, and code: https://risk-calibrated-planning.github.io/

arXiv:2403.04264 [pdf, other]

Competitive Facility Location under Random Utilities and Routing Constraints

Authors: Hoang Giang Pham, Tien Thanh Dam, Ngan Ha Duong, Tien Mai, Minh Hoang Ha

Abstract: In this paper, we study a facility location problem within a competitive market context, where customer demand is predicted by a random utility choice model. Unlike prior research, which primarily focuses on simple constraints such as a cardinality constraint on the number of selected locations, we introduce routing constraints that necessitate the selection of locations in a manner that guarantee… ▽ More In this paper, we study a facility location problem within a competitive market context, where customer demand is predicted by a random utility choice model. Unlike prior research, which primarily focuses on simple constraints such as a cardinality constraint on the number of selected locations, we introduce routing constraints that necessitate the selection of locations in a manner that guarantees the existence of a tour visiting all chosen locations while adhering to a specified tour length upper bound. Such routing constraints find crucial applications in various real-world scenarios. The problem at hand features a non-linear objective function, resulting from the utilization of random utilities, together with complex routing constraints, making it computationally challenging. To tackle this problem, we explore three types of valid cuts, namely, outer-approximation and submodular cuts to handle the nonlinear objective function, as well as sub-tour elimination cuts to address the complex routing constraints. These lead to the development of two exact solution methods: a nested cutting plane and nested branch-and-cut algorithms, where these valid cuts are iteratively added to a master problem through two nested loops. We also prove that our nested cutting plane method always converges to optimality after a finite number of iterations. Furthermore, we develop a local search-based metaheuristic tailored for solving large-scale instances and show its pros and cons compared to exact methods. Extensive experiments are conducted on problem instances of varying sizes, demonstrating that our approach excels in terms of solution quality and computation time when compared to other baseline approaches. △ Less

Submitted 9 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.13822 [pdf, other]

MSTAR: Multi-Scale Backbone Architecture Search for Timeseries Classification

Authors: Tue M. Cao, Nhat H. Tran, Hieu H. Pham, Hung T. Nguyen, Le P. Nguyen

Abstract: Most of the previous approaches to Time Series Classification (TSC) highlight the significance of receptive fields and frequencies while overlooking the time resolution. Hence, unavoidably suffered from scalability issues as they integrated an extensive range of receptive fields into classification models. Other methods, while having a better adaptation for large datasets, require manual design an… ▽ More Most of the previous approaches to Time Series Classification (TSC) highlight the significance of receptive fields and frequencies while overlooking the time resolution. Hence, unavoidably suffered from scalability issues as they integrated an extensive range of receptive fields into classification models. Other methods, while having a better adaptation for large datasets, require manual design and yet not being able to reach the optimal architecture due to the uniqueness of each dataset. We overcome these challenges by proposing a novel multi-scale search space and a framework for Neural architecture search (NAS), which addresses both the problem of frequency and time resolution, discovering the suitable scale for a specific dataset. We further show that our model can serve as a backbone to employ a powerful Transformer module with both untrained and pre-trained weights. Our search space reaches the state-of-the-art performance on four datasets on four different domains while introducing more than ten highly fine-tuned models for each data. △ Less

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.11702 [pdf, other]

doi 10.1145/3643991.3645074

Can ChatGPT Support Developers? An Empirical Evaluation of Large Language Models for Code Generation

Authors: Kailun Jin, Chung-Yu Wang, Hung Viet Pham, Hadi Hemmati

Abstract: Large language models (LLMs) have demonstrated notable proficiency in code generation, with numerous prior studies showing their promising capabilities in various development scenarios. However, these studies mainly provide evaluations in research settings, which leaves a significant gap in understanding how effectively LLMs can support developers in real-world. To address this, we conducted an em… ▽ More Large language models (LLMs) have demonstrated notable proficiency in code generation, with numerous prior studies showing their promising capabilities in various development scenarios. However, these studies mainly provide evaluations in research settings, which leaves a significant gap in understanding how effectively LLMs can support developers in real-world. To address this, we conducted an empirical analysis of conversations in DevGPT, a dataset collected from developers' conversations with ChatGPT (captured with the Share Link feature on platforms such as GitHub). Our empirical findings indicate that the current practice of using LLM-generated code is typically limited to either demonstrating high-level concepts or providing examples in documentation, rather than to be used as production-ready code. These findings indicate that there is much future work needed to improve LLMs in code generation before they can be integral parts of modern software development. △ Less

Submitted 16 March, 2024; v1 submitted 18 February, 2024; originally announced February 2024.

Comments: 4 pages, 3 figures, 21st International Conference on Mining Software Repositories (MSR '24), April 15-16, 2024, Lisbon, Portugal

ACM Class: I.2.2

arXiv:2401.13594 [pdf, other]

Graph Guided Question Answer Generation for Procedural Question-Answering

Authors: Hai X. Pham, Isma Hadji, Xinnuo Xu, Ziedune Degutyte, Jay Rainey, Evangelos Kazakos, Afsaneh Fazly, Georgios Tzimiropoulos, Brais Martinez

Abstract: In this paper, we focus on task-specific question answering (QA). To this end, we introduce a method for generating exhaustive and high-quality training data, which allows us to train compact (e.g., run on a mobile device), task-specific QA models that are competitive against GPT variants. The key technological enabler is a novel mechanism for automatic question-answer generation from procedural t… ▽ More In this paper, we focus on task-specific question answering (QA). To this end, we introduce a method for generating exhaustive and high-quality training data, which allows us to train compact (e.g., run on a mobile device), task-specific QA models that are competitive against GPT variants. The key technological enabler is a novel mechanism for automatic question-answer generation from procedural text which can ingest large amounts of textual instructions and produce exhaustive in-domain QA training data. While current QA data generation methods can produce well-formed and varied data, their non-exhaustive nature is sub-optimal for training a QA model. In contrast, we leverage the highly structured aspect of procedural text and represent each step and the overall flow of the procedure as graphs. We then condition on graph nodes to automatically generate QA pairs in an exhaustive and controllable manner. Comprehensive evaluations of our method show that: 1) small models trained with our data achieve excellent performance on the target QA task, even exceeding that of GPT3 and ChatGPT despite being several orders of magnitude smaller. 2) semantic coverage is the key indicator for downstream QA performance. Crucially, while large language models excel at syntactic diversity, this does not necessarily result in improvements on the end QA model. In contrast, the higher semantic coverage provided by our method is critical for QA performance. △ Less

Submitted 24 January, 2024; originally announced January 2024.

Comments: Accepted to EACL 2024 as long paper. 25 pages including appendix

MSC Class: I.2.7

arXiv:2401.00827 [pdf, ps, other]

A multipartite analogue of Dilworth's Theorem

Authors: Jacob Fox, Huy Tuan Pham

Abstract: We prove that every partially ordered set on $n$ elements contains $k$ subsets $A_{1},A_{2},\dots,A_{k}$ such that either each of these subsets has size $Ω(n/k^{5})$ and, for every $i<j$, every element in $A_{i}$ is less than or equal to every element in $A_{j}$, or each of these subsets has size $Ω(n/(k^{2}\log n))$ and, for every $i \not = j$, every element in $A_{i}$ is incomparable with every… ▽ More We prove that every partially ordered set on $n$ elements contains $k$ subsets $A_{1},A_{2},\dots,A_{k}$ such that either each of these subsets has size $Ω(n/k^{5})$ and, for every $i<j$, every element in $A_{i}$ is less than or equal to every element in $A_{j}$, or each of these subsets has size $Ω(n/(k^{2}\log n))$ and, for every $i \not = j$, every element in $A_{i}$ is incomparable with every element in $A_{j}$ for $i\ne j$. This answers a question of the first author from 2006. As a corollary, we prove for each positive integer $h$ there is $C_h$ such that for any $h$ partial orders $<_{1},<_{2},\dots,<_{h}$ on a set of $n$ elements, there exists $k$ subsets $A_{1},A_{2},\dots,A_{k}$ each of size at least $n/(k\log n)^{C_{h}}$ such that for each partial order $<_{\ell}$, either $a_{1}<_{\ell}a_{2}<_{\ell}\dots<_{\ell}a_{k}$ for any tuple of elements $(a_1,a_2,\dots,a_k) \in A_1\times A_2\times \dots \times A_k$, or $a_{1}>_{\ell}a_{2}>_{\ell}\dots>_{\ell}a_{k}$ for any $(a_1,a_2,\dots,a_k) \in A_1\times A_2\times \dots \times A_k$, or $a_i$ is incomparable with $a_j$ for any $i\ne j$, $a_i\in A_i$ and $a_j\in A_j$. This improves on a 2009 result of Pach and the first author motivated by problems in discrete geometry. △ Less

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2312.11011 [pdf, other]

VinaLLaMA: LLaMA-based Vietnamese Foundation Model

Authors: Quan Nguyen, Huy Pham, Dung Dao

Abstract: In this technical report, we present VinaLLaMA, an open-weight, state-of-the-art (SOTA) Large Language Model for the Vietnamese language, built upon LLaMA-2 with an additional 800 billion trained tokens. VinaLLaMA not only demonstrates fluency in Vietnamese but also exhibits a profound understanding of Vietnamese culture, making it a truly indigenous model. VinaLLaMA-7B-chat, trained on 1 million… ▽ More In this technical report, we present VinaLLaMA, an open-weight, state-of-the-art (SOTA) Large Language Model for the Vietnamese language, built upon LLaMA-2 with an additional 800 billion trained tokens. VinaLLaMA not only demonstrates fluency in Vietnamese but also exhibits a profound understanding of Vietnamese culture, making it a truly indigenous model. VinaLLaMA-7B-chat, trained on 1 million high-quality synthetic samples, achieves SOTA results on key benchmarks, including VLSP, VMLU, and Vicuna Benchmark Vietnamese, marking a significant advancement in the Vietnamese AI landscape and offering a versatile resource for various applications. △ Less

Submitted 18 December, 2023; originally announced December 2023.

Comments: VinaLLaMA Technical Report - 13 pages

arXiv:2312.09445 [pdf, other]

IncepSE: Leveraging InceptionTime's performance with Squeeze and Excitation mechanism in ECG analysis

Authors: Tue Minh Cao, Nhat Hong Tran, Le Phi Nguyen, Hieu Huy Pham, Hung Thanh Nguyen

Abstract: Our study focuses on the potential for modifications of Inception-like architecture within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel network characterized by strategic architectural incorporation that leverages the strengths of both InceptionTime and channel attention mechanisms. Furthermore, we propose a training setup that employs stabilization techniques tha… ▽ More Our study focuses on the potential for modifications of Inception-like architecture within the electrocardiogram (ECG) domain. To this end, we introduce IncepSE, a novel network characterized by strategic architectural incorporation that leverages the strengths of both InceptionTime and channel attention mechanisms. Furthermore, we propose a training setup that employs stabilization techniques that are aimed at tackling the formidable challenges of severe imbalance dataset PTB-XL and gradient corruption. By this means, we manage to set a new height for deep learning model in a supervised learning manner across the majority of tasks. Our model consistently surpasses InceptionTime by substantial margins compared to other state-of-the-arts in this domain, noticeably 0.013 AUROC score improvement in the "all" task, while also mitigating the inherent dataset fluctuations during training. △ Less

Submitted 16 November, 2023; originally announced December 2023.

arXiv:2312.00398 [pdf, other]

Learning to Estimate Critical Gait Parameters from Single-View RGB Videos with Transformer-Based Attention Network

Authors: Quoc Hung T. Le, Hieu H. Pham

Abstract: Musculoskeletal diseases and cognitive impairments in patients lead to difficulties in movement as well as negative effects on their psychological health. Clinical gait analysis, a vital tool for early diagnosis and treatment, traditionally relies on expensive optical motion capture systems. Recent advances in computer vision and deep learning have opened the door to more accessible and cost-effec… ▽ More Musculoskeletal diseases and cognitive impairments in patients lead to difficulties in movement as well as negative effects on their psychological health. Clinical gait analysis, a vital tool for early diagnosis and treatment, traditionally relies on expensive optical motion capture systems. Recent advances in computer vision and deep learning have opened the door to more accessible and cost-effective alternatives. This paper introduces a novel spatio-temporal Transformer network to estimate critical gait parameters from RGB videos captured by a single-view camera. Empirical evaluations on a public dataset of cerebral palsy patients indicate that the proposed framework surpasses current state-of-the-art approaches and show significant improvements in predicting general gait parameters (including Walking Speed, Gait Deviation Index - GDI, and Knee Flexion Angle at Maximum Extension), while utilizing fewer parameters and alleviating the need for manual feature extraction. △ Less

Submitted 1 March, 2024; v1 submitted 1 December, 2023; originally announced December 2023.

Comments: Accepted at ISBI 2024 (21st IEEE International Symposium on Biomedical Imaging)

arXiv:2311.15041 [pdf, other]

MPCNN: A Novel Matrix Profile Approach for CNN-based Sleep Apnea Classification

Authors: Hieu X. Nguyen, Duong V. Nguyen, Hieu H. Pham, Cuong D. Do

Abstract: Sleep apnea (SA) is a significant respiratory condition that poses a major global health challenge. Previous studies have investigated several machine and deep learning models for electrocardiogram (ECG)-based SA diagnoses. Despite these advancements, conventional feature extractions derived from ECG signals, such as R-peaks and RR intervals, may fail to capture crucial information encompassed wit… ▽ More Sleep apnea (SA) is a significant respiratory condition that poses a major global health challenge. Previous studies have investigated several machine and deep learning models for electrocardiogram (ECG)-based SA diagnoses. Despite these advancements, conventional feature extractions derived from ECG signals, such as R-peaks and RR intervals, may fail to capture crucial information encompassed within the complete PQRST segments. In this study, we propose an innovative approach to address this diagnostic gap by delving deeper into the comprehensive segments of the ECG signal. The proposed methodology draws inspiration from Matrix Profile algorithms, which generate an Euclidean distance profile from fixed-length signal subsequences. From this, we derived the Min Distance Profile (MinDP), Max Distance Profile (MaxDP), and Mean Distance Profile (MeanDP) based on the minimum, maximum, and mean of the profile distances, respectively. To validate the effectiveness of our approach, we use the modified LeNet-5 architecture as the primary CNN model, along with two existing lightweight models, BAFNet and SE-MSCNN, for ECG classification tasks. Our extensive experimental results on the PhysioNet Apnea-ECG dataset revealed that with the new feature extraction method, we achieved a per-segment accuracy up to 92.11 \% and a per-recording accuracy of 100\%. Moreover, it yielded the highest correlation compared to state-of-the-art methods, with a correlation coefficient of 0.989. By introducing a new feature extraction method based on distance relationships, we enhanced the performance of certain lightweight models, showing potential for home sleep apnea test (HSAT) and SA detection in IoT devices. The source code for this work is made publicly available in GitHub: https://github.com/vinuni-vishc/MPCNN-Sleep-Apnea. △ Less

Submitted 25 November, 2023; originally announced November 2023.

arXiv:2311.05192 [pdf, other]

TransReg: Cross-transformer as auto-registration module for multi-view mammogram mass detection

Authors: Hoang C. Nguyen, Chi Phan, Hieu H. Pham

Abstract: Screening mammography is the most widely used method for early breast cancer detection, significantly reducing mortality rates. The integration of information from multi-view mammograms enhances radiologists' confidence and diminishes false-positive rates since they can examine on dual-view of the same breast to cross-reference the existence and location of the lesion. Inspired by this, we present… ▽ More Screening mammography is the most widely used method for early breast cancer detection, significantly reducing mortality rates. The integration of information from multi-view mammograms enhances radiologists' confidence and diminishes false-positive rates since they can examine on dual-view of the same breast to cross-reference the existence and location of the lesion. Inspired by this, we present TransReg, a Computer-Aided Detection (CAD) system designed to exploit the relationship between craniocaudal (CC), and mediolateral oblique (MLO) views. The system includes cross-transformer to model the relationship between the region of interest (RoIs) extracted by siamese Faster RCNN network for mass detection problems. Our work is the first time cross-transformer has been integrated into an object detection framework to model the relation between ipsilateral views. Our experimental evaluation on DDSM and VinDr-Mammo datasets shows that our TransReg, equipped with SwinT as a feature extractor achieves state-of-the-art performance. Specifically, at the false positive rate per image at 0.5, TransReg using SwinT gets a recall at 83.3% for DDSM dataset and 79.7% for VinDr-Mammo dataset. Furthermore, we conduct a comprehensive analysis to demonstrate that cross-transformer can function as an auto-registration module, aligning the masses in dual-view and utilizing this information to inform final predictions. It is a replication diagnostic workflow of expert radiologists △ Less

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.00638 [pdf, other]

FAIRLABEL: Correcting Bias in Labels

Authors: Srinivasan H Sengamedu, Hien Pham

Abstract: There are several algorithms for measuring fairness of ML models. A fundamental assumption in these approaches is that the ground truth is fair or unbiased. In real-world datasets, however, the ground truth often contains data that is a result of historical and societal biases and discrimination. Models trained on these datasets will inherit and propagate the biases to the model outputs. We propos… ▽ More There are several algorithms for measuring fairness of ML models. A fundamental assumption in these approaches is that the ground truth is fair or unbiased. In real-world datasets, however, the ground truth often contains data that is a result of historical and societal biases and discrimination. Models trained on these datasets will inherit and propagate the biases to the model outputs. We propose FAIRLABEL, an algorithm which detects and corrects biases in labels. The goal of FAIRLABELis to reduce the Disparate Impact (DI) across groups while maintaining high accuracy in predictions. We propose metrics to measure the quality of bias correction and validate FAIRLABEL on synthetic datasets and show that the label correction is correct 86.7% of the time vs. 71.9% for a baseline model. We also apply FAIRLABEL on benchmark datasets such as UCI Adult, German Credit Risk, and Compas datasets and show that the Disparate Impact Ratio increases by as much as 54.2%. △ Less

Submitted 1 November, 2023; originally announced November 2023.

Comments: ICDM LegalAI Workshop 2023

MSC Class: 68T07 ACM Class: I.2.6

Journal ref: ICDM 2023 Workshop

arXiv:2310.18532 [pdf, other]

SkipAnalyzer: A Tool for Static Code Analysis with Large Language Models

Authors: Mohammad Mahdi Mohajer, Reem Aleithan, Nima Shiri Harzevili, Moshi Wei, Alvine Boaye Belle, Hung Viet Pham, Song Wang

Abstract: We introduce SkipAnalyzer, a large language model (LLM)-powered tool for static code analysis. SkipAnalyzer has three components: 1) an LLM-based static bug detector that scans source code and reports specific types of bugs, 2) an LLM-based false-positive filter that can identify false-positive bugs in the results of static bug detectors (e.g., the result of step 1) to improve detection accuracy,… ▽ More We introduce SkipAnalyzer, a large language model (LLM)-powered tool for static code analysis. SkipAnalyzer has three components: 1) an LLM-based static bug detector that scans source code and reports specific types of bugs, 2) an LLM-based false-positive filter that can identify false-positive bugs in the results of static bug detectors (e.g., the result of step 1) to improve detection accuracy, and 3) an LLM-based patch generator that can generate patches for the detected bugs above. As a proof-of-concept, SkipAnalyzer is built on ChatGPT, which has exhibited outstanding performance in various software engineering tasks. To evaluate SkipAnalyzer, we focus on two types of typical and critical bugs that are targeted by static bug detection, i.e., Null Dereference and Resource Leak as subjects. We employ Infer to aid the gathering of these two bug types from 10 open-source projects. Consequently, our experiment dataset contains 222 instances of Null Dereference bugs and 46 instances of Resource Leak bugs. Our study demonstrates that SkipAnalyzer achieves remarkable performance in the mentioned static analysis tasks, including bug detection, false-positive warning removal, and bug repair. In static bug detection, SkipAnalyzer achieves accuracy values of up to 68.37% for detecting Null Dereference bugs and 76.95% for detecting Resource Leak bugs, improving the precision of the current leading bug detector, Infer, by 12.86% and 43.13%, respectively. For removing false-positive warnings, SkipAnalyzer can reach a precision of up to 93.88% for Null Dereference bugs and 63.33% for Resource Leak bugs. Additionally, SkipAnalyzer surpasses state-of-the-art false-positive warning removal tools. Furthermore, in bug repair, SkipAnalyzer can generate syntactically correct patches to fix its detected bugs with a success rate of up to 97.30%. △ Less

Submitted 17 December, 2023; v1 submitted 27 October, 2023; originally announced October 2023.

arXiv:2310.14030 [pdf, other]

Visual Tracking Nonlinear Model Predictive Control Method for Autonomous Wind Turbine Inspection

Authors: Abdelhakim Amer, Mohit Mehndiratta, Jonas le Fevre Sejersen, Huy Xuan Pham, Erdal Kayacan

Abstract: Automated visual inspection of on-and offshore wind turbines using aerial robots provides several benefits, namely, a safe working environment by circumventing the need for workers to be suspended high above the ground, reduced inspection time, preventive maintenance, and access to hard-to-reach areas. A novel nonlinear model predictive control (NMPC) framework alongside a global wind turbine path… ▽ More Automated visual inspection of on-and offshore wind turbines using aerial robots provides several benefits, namely, a safe working environment by circumventing the need for workers to be suspended high above the ground, reduced inspection time, preventive maintenance, and access to hard-to-reach areas. A novel nonlinear model predictive control (NMPC) framework alongside a global wind turbine path planner is proposed to achieve distance-optimal coverage for wind turbine inspection. Unlike traditional MPC formulations, visual tracking NMPC (VT-NMPC) is designed to track an inspection surface, instead of a position and heading trajectory, thereby circumventing the need to provide an accurate predefined trajectory for the drone. An additional capability of the proposed VT-NMPC method is that by incorporating inspection requirements as visual tracking costs to minimize, it naturally achieves the inspection task successfully while respecting the physical constraints of the drone. Multiple simulation runs and real-world tests demonstrate the efficiency and efficacy of the proposed automated inspection framework, which outperforms the traditional MPC designs, by providing full coverage of the target wind turbine blades as well as its robustness to changing wind conditions. The implementation codes are open-sourced. △ Less

Submitted 21 October, 2023; originally announced October 2023.

Comments: 8 pages, accepted for publication at ICAR conference

arXiv:2310.06912 [pdf, other]

Benchmarking Deep Learning Fuzzers

Authors: Nima Shiri Harzevili, Hung Viet Pham, Song Wang

Abstract: In this work, we set out to conduct the first ground-truth empirical evaluation of state-of-the-art DL fuzzers. Specifically, we first manually created an extensive DL bug benchmark dataset, which includes 627 real-world DL bugs from TensorFlow and PyTorch libraries reported by users between 2020 and 2022. Then we run three state-of-the-art DL fuzzers, i.e., FreeFuzz, DeepRel, and DocTer, on the b… ▽ More In this work, we set out to conduct the first ground-truth empirical evaluation of state-of-the-art DL fuzzers. Specifically, we first manually created an extensive DL bug benchmark dataset, which includes 627 real-world DL bugs from TensorFlow and PyTorch libraries reported by users between 2020 and 2022. Then we run three state-of-the-art DL fuzzers, i.e., FreeFuzz, DeepRel, and DocTer, on the benchmark by following their instructions. We find that these fuzzers are unable to detect many real bugs collected in our benchmark dataset. Specifically, most (235) of the 257 applicable bugs cannot be detected by any fuzzer. Our systematic analysis further identifies four major, broad, and common factors that affect these fuzzers' ability to detect real bugs. These findings present opportunities to improve the performance of the fuzzers in future work. As a proof of concept, we propose a lightweight corner case generator as an extension to the three DL fuzzers, which simply covers several boundary values as well as DL-specific data types. It helps FreeFuzz, DeepRel, and DocTer detect 12, 12, and 14 more bugs, respectively, that were overlooked by the original fuzzers. Overall, this work complements prior studies on DL fuzzers with an extensive performance evaluation and provides a benchmark for future DL library fuzzing studies. Also, our proposed corner case generator proves that the fuzzers can be extended to detect more bugs by extending their internal fuzzing logic based on the insights provided in root cause analysis. △ Less

Submitted 10 October, 2023; originally announced October 2023.

arXiv:2309.04317 [pdf, other]

Actor critic learning algorithms for mean-field control with moment neural networks

Authors: Huyên Pham, Xavier Warin

Abstract: We develop a new policy gradient and actor-critic algorithm for solving mean-field control problems within a continuous time reinforcement learning setting. Our approach leverages a gradient-based representation of the value function, employing parametrized randomized policies. The learning for both the actor (policy) and critic (value function) is facilitated by a class of moment neural network f… ▽ More We develop a new policy gradient and actor-critic algorithm for solving mean-field control problems within a continuous time reinforcement learning setting. Our approach leverages a gradient-based representation of the value function, employing parametrized randomized policies. The learning for both the actor (policy) and critic (value function) is facilitated by a class of moment neural network functions on the Wasserstein space of probability measures, and the key feature is to sample directly trajectories of distributions. A central challenge addressed in this study pertains to the computational treatment of an operator specific to the mean-field framework. To illustrate the effectiveness of our methods, we provide a comprehensive set of numerical results. These encompass diverse examples, including multi-dimensional settings and nonlinear quadratic mean-field control problems with controlled volatility. △ Less

Submitted 8 September, 2023; originally announced September 2023.

Comments: 16 pages, 11 figures

MSC Class: 68T07

arXiv:2308.15772 [pdf, other]

Task-Based MoE for Multitask Multilingual Machine Translation

Authors: Hai Pham, Young Jin Kim, Subhabrata Mukherjee, David P. Woodruff, Barnabas Poczos, Hany Hassan Awadalla

Abstract: Mixture-of-experts (MoE) architecture has been proven a powerful method for diverse tasks in training deep models in many applications. However, current MoE implementations are task agnostic, treating all tokens from different tasks in the same manner. In this work, we instead design a novel method that incorporates task information into MoE models at different granular levels with shared dynamic… ▽ More Mixture-of-experts (MoE) architecture has been proven a powerful method for diverse tasks in training deep models in many applications. However, current MoE implementations are task agnostic, treating all tokens from different tasks in the same manner. In this work, we instead design a novel method that incorporates task information into MoE models at different granular levels with shared dynamic task-based adapters. Our experiments and analysis show the advantages of our approaches over the dense and canonical MoE models on multi-task multilingual machine translations. With task-specific adapters, our models can additionally generalize to new tasks efficiently. △ Less

Submitted 24 October, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.07832 [pdf, ps, other]

REFORMS: Reporting Standards for Machine Learning Based Science

Authors: Sayash Kapoor, Emily Cantrell, Kenny Peng, Thanh Hien Pham, Christopher A. Bail, Odd Erik Gundersen, Jake M. Hofman, Jessica Hullman, Michael A. Lones, Momin M. Malik, Priyanka Nanayakkara, Russell A. Poldrack, Inioluwa Deborah Raji, Michael Roberts, Matthew J. Salganik, Marta Serra-Garcia, Brandon M. Stewart, Gilles Vandewiele, Arvind Narayanan

Abstract: Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways acros… ▽ More Machine learning (ML) methods are proliferating in scientific research. However, the adoption of these methods has been accompanied by failures of validity, reproducibility, and generalizability. These failures can hinder scientific progress, lead to false consensus around invalid claims, and undermine the credibility of ML-based science. ML methods are often applied and fail in similar ways across disciplines. Motivated by this observation, our goal is to provide clear reporting standards for ML-based science. Drawing from an extensive review of past literature, we present the REFORMS checklist ($\textbf{Re}$porting Standards $\textbf{For}$ $\textbf{M}$achine Learning Based $\textbf{S}$cience). It consists of 32 questions and a paired set of guidelines. REFORMS was developed based on a consensus of 19 researchers across computer science, data science, mathematics, social sciences, and biomedical sciences. REFORMS can serve as a resource for researchers when designing and implementing a study, for referees when reviewing papers, and for journals when enforcing standards for transparency and reproducibility. △ Less

Submitted 19 September, 2023; v1 submitted 15 August, 2023; originally announced August 2023.

arXiv:2307.16834 [pdf]

doi 10.1007/978-3-031-53963-3_25

Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System

Authors: Hoang Viet Pham, Thinh Gia Tran, Chuong Dinh Le, An Dinh Le, Hien Bich Vo

Abstract: Innovative enhancement in embedded system platforms, specifically hardware accelerations, significantly influence the application of deep learning in real-world scenarios. These innovations translate human labor efforts into automated intelligent systems employed in various areas such as autonomous driving, robotics, Internet-of-Things (IoT), and numerous other impactful applications. NVIDIA's Jet… ▽ More Innovative enhancement in embedded system platforms, specifically hardware accelerations, significantly influence the application of deep learning in real-world scenarios. These innovations translate human labor efforts into automated intelligent systems employed in various areas such as autonomous driving, robotics, Internet-of-Things (IoT), and numerous other impactful applications. NVIDIA's Jetson platform is one of the pioneers in offering optimal performance regarding energy efficiency and throughput in the execution of deep learning algorithms. Previously, most benchmarking analysis was based on 2D images with a single deep learning model for each comparison result. In this paper, we implement an end-to-end video-based crime-scene anomaly detection system inputting from surveillance videos and the system is deployed and completely operates on multiple Jetson edge devices (Nano, AGX Xavier, Orin Nano). The comparison analysis includes the integration of Torch-TensorRT as a software developer kit from NVIDIA for the model performance optimisation. The system is built based on the PySlowfast open-source project from Facebook as the coding template. The end-to-end system process comprises the videos from camera, data preprocessing pipeline, feature extractor and the anomaly detection. We provide the experience of an AI-based system deployment on various Jetson Edge devices with Docker technology. Regarding anomaly detectors, a weakly supervised video-based deep learning model called Robust Temporal Feature Magnitude Learning (RTFM) is applied in the system. The approach system reaches 47.56 frames per second (FPS) inference speed on a Jetson edge device with only 3.11 GB RAM usage total. We also discover the promising Jetson device that the AI system achieves 15% better performance than the previous version of Jetson devices while consuming 50% less energy power. △ Less

Submitted 12 September, 2023; v1 submitted 28 July, 2023; originally announced July 2023.

Comments: Accepted in Future of Information and Communication Conference (FICC) 2024

arXiv:2307.10466 [pdf, ps, other]

Universality of Spectral Independence with Applications to Fast Mixing in Spin Glasses

Authors: Nima Anari, Vishesh Jain, Frederic Koehler, Huy Tuan Pham, Thuy-Duong Vuong

Abstract: We study Glauber dynamics for sampling from discrete distributions $μ$ on the hypercube $\{\pm 1\}^n$. Recently, techniques based on spectral independence have successfully yielded optimal $O(n)$ relaxation times for a host of different distributions $μ$. We show that spectral independence is universal: a relaxation time of $O(n)$ implies spectral independence. We then study a notion of tractabi… ▽ More We study Glauber dynamics for sampling from discrete distributions $μ$ on the hypercube $\{\pm 1\}^n$. Recently, techniques based on spectral independence have successfully yielded optimal $O(n)$ relaxation times for a host of different distributions $μ$. We show that spectral independence is universal: a relaxation time of $O(n)$ implies spectral independence. We then study a notion of tractability for $μ$, defined in terms of smoothness of the multilinear extension of its Hamiltonian -- $\log μ$ -- over $[-1,+1]^n$. We show that Glauber dynamics has relaxation time $O(n)$ for such $μ$, and using the universality of spectral independence, we conclude that these distributions are also fractionally log-concave and consequently satisfy modified log-Sobolev inequalities. We sharpen our estimates and obtain approximate tensorization of entropy and the optimal $\widetilde{O}(n)$ mixing time for random Hamiltonians, i.e. the classically studied mixed $p$-spin model at sufficiently high temperature. These results have significant downstream consequences for concentration of measure, statistical testing, and learning. △ Less

Submitted 19 July, 2023; originally announced July 2023.

arXiv:2307.02289 [pdf, other]

Fuzzing with Quantitative and Adaptive Hot-Bytes Identification

Authors: Tai D. Nguyen, Long H. Pham, Jun Sun

Abstract: Fuzzing has emerged as a powerful technique for finding security bugs in complicated real-world applications. American fuzzy lop (AFL), a leading fuzzing tool, has demonstrated its powerful bug finding ability through a vast number of reported CVEs. However, its random mutation strategy is unable to generate test inputs that satisfy complicated branching conditions (e.g., magic-byte comparisons, c… ▽ More Fuzzing has emerged as a powerful technique for finding security bugs in complicated real-world applications. American fuzzy lop (AFL), a leading fuzzing tool, has demonstrated its powerful bug finding ability through a vast number of reported CVEs. However, its random mutation strategy is unable to generate test inputs that satisfy complicated branching conditions (e.g., magic-byte comparisons, checksum tests, and nested if-statements), which are commonly used in image decoders/encoders, XML parsers, and checksum tools. Existing approaches (such as Steelix and Neuzz) on addressing this problem assume unrealistic assumptions such as we can satisfy the branch condition byte-to-byte or we can identify and focus on the important bytes in the input (called hot-bytes) once and for all. In this work, we propose an approach called \tool~which is designed based on the following principles. First, there is a complicated relation between inputs and branching conditions and thus we need not only an expressive model to capture such relationship but also an informative measure so that we can learn such relationship effectively. Second, different branching conditions demand different hot-bytes and we must adjust our fuzzing strategy adaptively depending on which branches are the current bottleneck. We implement our approach as an open source project and compare its efficiency with other state-of-the-art fuzzers. Our evaluation results on 10 real-world programs and LAVA-M dataset show that \tool~achieves sustained increases in branch coverage and discovers more bugs than other fuzzers. △ Less

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.06579 [pdf, other]

Learning Robust and Consistent Time Series Representations: A Dilated Inception-Based Approach

Authors: Anh Duy Nguyen, Trang H. Tran, Hieu H. Pham, Phi Le Nguyen, Lam M. Nguyen

Abstract: Representation learning for time series has been an important research area for decades. Since the emergence of the foundation models, this topic has attracted a lot of attention in contrastive self-supervised learning, to solve a wide range of downstream tasks. However, there have been several challenges for contrastive time series processing. First, there is no work considering noise, which is o… ▽ More Representation learning for time series has been an important research area for decades. Since the emergence of the foundation models, this topic has attracted a lot of attention in contrastive self-supervised learning, to solve a wide range of downstream tasks. However, there have been several challenges for contrastive time series processing. First, there is no work considering noise, which is one of the critical factors affecting the efficacy of time series tasks. Second, there is a lack of efficient yet lightweight encoder architectures that can learn informative representations robust to various downstream tasks. To fill in these gaps, we initiate a novel sampling strategy that promotes consistent representation learning with the presence of noise in natural time series. In addition, we propose an encoder architecture that utilizes dilated convolution within the Inception block to create a scalable and robust network architecture with a wide receptive field. Experiments demonstrate that our method consistently outperforms state-of-the-art methods in forecasting, classification, and abnormality detection tasks, e.g. ranks first over two-thirds of the classification UCR datasets, with only $40\%$ of the parameters compared to the second-best approach. Our source code for CoInception framework is accessible at https://github.com/anhduy0911/CoInception. △ Less

Submitted 11 June, 2023; originally announced June 2023.

arXiv:2306.03269 [pdf, other]

Security Knowledge-Guided Fuzzing of Deep Learning Libraries

Authors: Nima Shiri Harzevili, Mohammad Mahdi Mohajer, Moshi Wei, Hung Viet Pham, Song Wang

Abstract: Recently, many Deep Learning fuzzers have been proposed for testing of DL libraries. However, they either perform unguided input generation (e.g., not considering the relationship between API arguments when generating inputs) or only support a limited set of corner case test inputs. Furthermore, a substantial number of developer APIs crucial for library development remain untested, as they are typ… ▽ More Recently, many Deep Learning fuzzers have been proposed for testing of DL libraries. However, they either perform unguided input generation (e.g., not considering the relationship between API arguments when generating inputs) or only support a limited set of corner case test inputs. Furthermore, a substantial number of developer APIs crucial for library development remain untested, as they are typically not well-documented and lack clear usage guidelines. To fill this gap, we propose a novel fuzzer named Orion, which combines guided test input generation and corner case test input generation based on a set of fuzzing rules constructed from historical data that is known to trigger vulnerabilities in the implementation of DL APIs. To extract the fuzzing rules, we first conduct an empirical study regarding the root cause analysis of 376 vulnerabilities in two of the most popular DL libraries, i.e., PyTorch and TensorFlow. We then construct the rules based on the root causes of the historical vulnerabilities. Our evaluation shows that Orion reports 135 vulnerabilities on the latest releases of TensorFlow and PyTorch, 76 of which were confirmed by the library developers. Among the 76 confirmed vulnerabilities, 69 are previously unknown, and 7 have already been fixed. The rest are awaiting further confirmation. Regarding end-user APIs, Orion was able to detect 31.8% and 90% more vulnerabilities on TensorFlow and PyTorch, respectively, compared to the state-of-the-art conventional fuzzer, i.e., DeepRel. When compared to the state-of-the-art LLM-based DL fuzzer, AtlasFuzz, Orion detected 13.63% more vulnerabilities on TensorFlow and 18.42% more vulnerabilities on PyTorch. Regarding developer APIs, Orion stands out by detecting 117% more vulnerabilities on TensorFlow and 100% more vulnerabilities on PyTorch compared to the most relevant fuzzer designed for developer APIs, such as FreeFuzz. △ Less

Submitted 24 December, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

arXiv:2305.18607 [pdf, other]

doi 10.1145/3597926.3598135

How Effective Are Neural Networks for Fixing Security Vulnerabilities

Authors: Yi Wu, Nan Jiang, Hung Viet Pham, Thibaud Lutellier, Jordan Davis, Lin Tan, Petr Babkin, Sameena Shah

Abstract: Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise: (1) large code language models (LLMs) that have been pre-trained on source code for tasks such as code completion, and (2) automated program repair (APR) techniques that use deep learning (DL) models to automatically fix software bugs. This paper is the first to stud… ▽ More Security vulnerability repair is a difficult task that is in dire need of automation. Two groups of techniques have shown promise: (1) large code language models (LLMs) that have been pre-trained on source code for tasks such as code completion, and (2) automated program repair (APR) techniques that use deep learning (DL) models to automatically fix software bugs. This paper is the first to study and compare Java vulnerability repair capabilities of LLMs and DL-based APR models. The contributions include that we (1) apply and evaluate five LLMs (Codex, CodeGen, CodeT5, PLBART and InCoder), four fine-tuned LLMs, and four DL-based APR techniques on two real-world Java vulnerability benchmarks (Vul4J and VJBench), (2) design code transformations to address the training and test data overlapping threat to Codex, (3) create a new Java vulnerability repair benchmark VJBench, and its transformed version VJBench-trans and (4) evaluate LLMs and APR techniques on the transformed vulnerabilities in VJBench-trans. Our findings include that (1) existing LLMs and APR models fix very few Java vulnerabilities. Codex fixes 10.2 (20.4%), the most number of vulnerabilities. (2) Fine-tuning with general APR data improves LLMs' vulnerability-fixing capabilities. (3) Our new VJBench reveals that LLMs and APR models fail to fix many Common Weakness Enumeration (CWE) types, such as CWE-325 Missing cryptographic step and CWE-444 HTTP request smuggling. (4) Codex still fixes 8.3 transformed vulnerabilities, outperforming all the other LLMs and APR models on transformed vulnerabilities. The results call for innovations to enhance automated Java vulnerability repair such as creating larger vulnerability repair training data, tuning LLMs with such data, and applying code simplification transformation to facilitate vulnerability repair. △ Less

Submitted 1 April, 2024; v1 submitted 29 May, 2023; originally announced May 2023.

Comments: This paper was accepted in the proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA 2023), and was presented at the conference, that was held in Seattle, USA, 17-21 July 2023

arXiv:2305.14707 [pdf, other]

SciFix: Outperforming GPT3 on Scientific Factual Error Correction

Authors: Dhananjay Ashok, Atharva Kulkarni, Hai Pham, Barnabás Póczos

Abstract: Due to the prohibitively high cost of creating error correction datasets, most Factual Claim Correction methods rely on a powerful verification model to guide the correction process. This leads to a significant drop in performance in domains like scientific claims, where good verification models do not always exist. In this work, we introduce SciFix, a scientific claim correction system that does… ▽ More Due to the prohibitively high cost of creating error correction datasets, most Factual Claim Correction methods rely on a powerful verification model to guide the correction process. This leads to a significant drop in performance in domains like scientific claims, where good verification models do not always exist. In this work, we introduce SciFix, a scientific claim correction system that does not require a verifier but can outperform existing methods by a considerable margin -- achieving correction accuracy of 84% on the SciFact dataset, 77% on SciFact-Open and 72% on the CovidFact dataset, compared to next best accuracies of 7%, 5%, and 15% on the same datasets respectively. Our method leverages the power of prompting with LLMs during training to create a richly annotated dataset that can be used for fully supervised training and regularization. We additionally use a claim-aware decoding procedure to improve the quality of corrected claims. Our method outperforms the very LLM that was used to generate the annotated dataset -- with Few-Shot Prompting on GPT3.5 achieving 58%, 61%, and 64% on the respective datasets, a consistently lower correction accuracy, despite using nearly 800 times as many parameters as our model. △ Less

Submitted 12 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

Comments: To appear in proceedings of EMNLP2023 (findings)

arXiv:2305.14132 [pdf, ps, other]

Set-coloring Ramsey numbers and error-correcting codes near the zero-rate threshold

Authors: David Conlon, Jacob Fox, Huy Tuan Pham, Yufei Zhao

Abstract: For positive integers $n,r,s$ with $r > s$, the set-coloring Ramsey number $R(n;r,s)$ is the minimum $N$ such that if every edge of the complete graph $K_N$ receives a set of $s$ colors from a palette of $r$ colors, then there is a subset of $n$ vertices where all of the edges between them receive a common color. If $n$ is fixed and $\frac{s}{r}$ is less than and bounded away from… ▽ More For positive integers $n,r,s$ with $r > s$, the set-coloring Ramsey number $R(n;r,s)$ is the minimum $N$ such that if every edge of the complete graph $K_N$ receives a set of $s$ colors from a palette of $r$ colors, then there is a subset of $n$ vertices where all of the edges between them receive a common color. If $n$ is fixed and $\frac{s}{r}$ is less than and bounded away from $1-\frac{1}{n-1}$, then $R(n;r,s)$ is known to grow exponentially in $r$, while if $\frac{s}{r}$ is greater than and bounded away from $1-\frac{1}{n-1}$, then $R(n;r,s)$ is bounded. Here we prove bounds for $R(n;r,s)$ in the intermediate range where $\frac{s}{r}$ is close to $1 - \frac{1}{n-1}$ by establishing a connection to the maximum size of error-correcting codes near the zero-rate threshold. △ Less

Submitted 14 August, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

arXiv:2305.10429 [pdf, other]

DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

Authors: Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu

Abstract: The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of do… ▽ More The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance. In this paper, we propose Domain Reweighting with Minimax Optimization (DoReMi), which first trains a small proxy model using group distributionally robust optimization (Group DRO) over domains to produce domain weights (mixture proportions) without knowledge of downstream tasks. We then resample a dataset with these domain weights and train a larger, full-sized model. In our experiments, we use DoReMi on a 280M-parameter proxy model to set the domain weights for training an 8B-parameter model (30x larger) more efficiently. On The Pile, DoReMi improves perplexity across all domains, even when it downweights a domain. DoReMi improves average few-shot downstream accuracy by 6.5% points over a baseline model trained using The Pile's default domain weights and reaches the baseline accuracy with 2.6x fewer training steps. On the GLaM dataset, DoReMi, which has no knowledge of downstream tasks, even matches the performance of using domain weights tuned on downstream tasks. △ Less

Submitted 20 November, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: NeurIPS 2023

arXiv:2305.06198 [pdf, ps, other]

Optimal mixing of the down-up walk on independent sets of a given size

Authors: Vishesh Jain, Marcus Michelen, Huy Tuan Pham, Thuy-Duong Vuong

Abstract: Let $G$ be a graph on $n$ vertices of maximum degree $Δ$. We show that, for any $δ> 0$, the down-up walk on independent sets of size $k \leq (1-δ)α_c(Δ)n$ mixes in time $O_{Δ,δ}(k\log{n})$, thereby resolving a conjecture of Davies and Perkins in an optimal form. Here, $α_{c}(Δ)n$ is the NP-hardness threshold for the problem of counting independent sets of a given size in a graph on $n$ vertices of… ▽ More Let $G$ be a graph on $n$ vertices of maximum degree $Δ$. We show that, for any $δ> 0$, the down-up walk on independent sets of size $k \leq (1-δ)α_c(Δ)n$ mixes in time $O_{Δ,δ}(k\log{n})$, thereby resolving a conjecture of Davies and Perkins in an optimal form. Here, $α_{c}(Δ)n$ is the NP-hardness threshold for the problem of counting independent sets of a given size in a graph on $n$ vertices of maximum degree $Δ$. Our mixing time has optimal dependence on $k,n$ for the entire range of $k$; previously, even polynomial mixing was not known. In fact, for $k = Ω_Δ(n)$ in this range, we establish a log-Sobolev inequality with optimal constant $Ω_{Δ,δ}(1/n)$. At the heart of our proof are three new ingredients, which may be of independent interest. The first is a method for lifting $\ell_\infty$-independence from a suitable distribution on the discrete cube -- in this case, the hard-core model -- to the slice by proving stability of an Edgeworth expansion using a multivariate zero-free region for the base distribution. The second is a generalization of the Lee-Yau induction to prove log-Sobolev inequalities for distributions on the slice with considerably less symmetry than the uniform distribution. The third is a sharp decomposition-type result which provides a lossless comparison between the Dirichlet form of the original Markov chain and that of the so-called projected chain in the presence of a contractive coupling. △ Less

Submitted 10 May, 2023; originally announced May 2023.

Comments: 25 pages; comments welcome!

arXiv:2305.00627 [pdf]

CNN-based fully automatic mitral valve extraction using CT images and existence probability maps

Authors: Yukiteru Masuda, Ryo Ishikawa, Toru Tanaka, Gakuto Aoyama, Keitaro Kawashima, James V. Chapman, Masahiko Asami, Michael Huy Cuong Pham, Klaus Fuglsang Kofoed, Takuya Sakaguchi, Kiyohide Satoh

Abstract: Accurate extraction of mitral valve shape from clinical tomographic images acquired in patients has proven useful for planning surgical and interventional mitral valve treatments. However, manual extraction of the mitral valve shape is laborious, and the existing automatic extraction methods have not been sufficiently accurate. In this paper, we propose a fully automated method of extracting mitra… ▽ More Accurate extraction of mitral valve shape from clinical tomographic images acquired in patients has proven useful for planning surgical and interventional mitral valve treatments. However, manual extraction of the mitral valve shape is laborious, and the existing automatic extraction methods have not been sufficiently accurate. In this paper, we propose a fully automated method of extracting mitral valve shape from computed tomography (CT) images for the all phases of the cardiac cycle. This method extracts the mitral valve shape based on DenseNet using both the original CT image and the existence probability maps of the mitral valve area inferred by U-Net as input. A total of 1585 CT images from 204 patients with various cardiac diseases including mitral regurgitation (MR) were collected and manually annotated for mitral valve region. The proposed method was trained and evaluated by 10-fold cross validation using the collected data and was compared with the method without the existence probability maps. The mean error of shape extraction error in the proposed method is 0.88 mm, which is an improvement of 0.32 mm compared with the method without the existence probability maps. △ Less

Submitted 18 May, 2023; v1 submitted 30 April, 2023; originally announced May 2023.

Comments: 15 pages, 6 figure, 3 table. changed title, modified taipo

Showing 1–50 of 246 results for author: Pham, H