Showing 1–50 of 249 results for author: Tran, D

Searching in archive cs.
  1. arXiv:2408.10974  [pdf, other]

    cs.NI

    Deep Reinforcement Learning for Network Energy Saving in 6G and Beyond Networks

    Authors: Dinh-Hieu Tran, Nguyen Van Huynh, Soumeya Kaada, Van Nhan Vo, Eva Lagunas, Symeon Chatzinotas

    Abstract: Network energy saving has received great attention from operators and vendors to reduce energy consumption and CO2 emissions to the environment as well as significantly reduce costs for mobile network operators. However, the design of energy-saving networks also needs to ensure the mobile users' (MUs) QoS requirements such as throughput requirements (TR). This work considers a mobile cellular netw…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 7 pages, 4 figures

  2. arXiv:2407.21050  [pdf]

    cs.CL

    Artificial Intelligence in Extracting Diagnostic Data from Dental Records

    Authors: Yao-Shun Chuang, Chun-Teh Lee, Oluwabunmi Tokede, Guo-Hao Lin, Ryan Brandon, Trung Duong Tran, Xiaoqian Jiang, Muhammad F. Walji

    Abstract: This research addresses the issue of missing structured data in dental records by extracting diagnostic information from unstructured text. The updated periodontology classification system's complexity has increased incomplete or missing structured diagnoses. To tackle this, we use advanced AI and NLP methods, leveraging GPT-4 to generate synthetic notes for fine-tuning a RoBERTa model. This signi…

    Submitted 12 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: 11 pages, 2 tables, 3 figures, under review

  3. arXiv:2407.18839  [pdf, other]

    cs.CV

    Scalable Group Choreography via Variational Phase Manifold Learning

    Authors: Nhat Le, Khoa Do, Xuan Bui, Tuong Do, Erman Tjiputra, Quang D. Tran, Anh Nguyen

    Abstract: Generating group dance motion from the music is a challenging task with several industrial applications. Although several methods have been proposed to tackle this problem, most of them prioritize optimizing the fidelity in dancing movement, constrained by predetermined dancer counts in datasets. This limitation impedes adaptability to real-world applications. Our study addresses the scalability p…

    Submitted 31 July, 2024; v1 submitted 26 July, 2024; originally announced July 2024.

    Comments: Accepted at ECCV 2024

  4. arXiv:2407.18066  [pdf, other]

    cs.LG cs.NI

    Multi-Agent Deep Reinforcement Learning for Resilience Optimization in 5G RAN

    Authors: Soumeya Kaada, Dinh-Hieu Tran, Nguyen Van Huynh, Marie-Line Alberi Morel, Sofiene Jelassi, Gerardo Rubino

    Abstract: Resilience is defined as the ability of a network to resist, adapt, and quickly recover from disruptions, and to continue to maintain an acceptable level of services from users' perspective. With the advent of future radio networks, including advanced 5G and upcoming 6G, critical services become integral to future networks, requiring uninterrupted service delivery for end users. Unfortunately, wit…

    Submitted 25 July, 2024; originally announced July 2024.

  5. arXiv:2407.17790  [pdf, other]

    cs.LG cs.AR

    Exploring the Limitations of Kolmogorov-Arnold Networks in Classification: Insights to Software Training and Hardware Implementation

    Authors: Van Duy Tran, Tran Xuan Hieu Le, Thi Diem Tran, Hoai Luan Pham, Vu Trung Duong Le, Tuan Hai Vu, Van Tinh Nguyen, Yasuhiko Nakashima

    Abstract: Kolmogorov-Arnold Networks (KANs), a novel type of neural network, have recently gained popularity and attention due to the ability to substitute multi-layer perceptrons (MLPs) in artificial intelligence (AI) with higher accuracy and interoperability. However, KAN assessment is still limited and cannot provide an in-depth analysis of a specific domain. Furthermore, no study has been conducted on t…

    Submitted 25 July, 2024; v1 submitted 25 July, 2024; originally announced July 2024.

    Comments: 6 pages, 3 figures, 2 tables

  6. arXiv:2407.17571  [pdf, other]

    cs.CV

    Diffusion Models for Multi-Task Generative Modeling

    Authors: Changyou Chen, Han Ding, Bunyamin Sisman, Yi Xu, Ouye Xie, Benjamin Z. Yao, Son Dinh Tran, Belinda Zeng

    Abstract: Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to a single-generation modeling. Can we generalize diffusion models with the ability of multi-modal generative training for more generalizable modeling? In this paper, we propose a principled way to define a diffusion model by constructing a unifi…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Published as a conference paper at ICLR 2024

  7. arXiv:2407.17181  [pdf, other]

    eess.IV cs.CV

    Trans2Unet: Neural fusion for Nuclei Semantic Segmentation

    Authors: Dinh-Phu Tran, Quoc-Anh Nguyen, Van-Truong Pham, Thi-Thao Tran

    Abstract: Nuclei segmentation, despite its fundamental role in histopathological image analysis, is still a challenging task. The main challenge of this task is the existence of overlapping areas, which makes separating independent nuclei more complicated. In this paper, we propose a new two-branch architecture by combining the Unet and TransUnet networks for the nuclei segmentation task. In the proposed architec…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: ICCAIS 2022

  8. arXiv:2407.16497  [pdf, other]

    cs.CV

    Dynamic Retraining-Updating Mean Teacher for Source-Free Object Detection

    Authors: Trinh Le Ba Khanh, Huy-Hung Nguyen, Long Hoang Pham, Duong Nguyen-Ngoc Tran, Jae Wook Jeon

    Abstract: In object detection, unsupervised domain adaptation (UDA) aims to transfer knowledge from a labeled source domain to an unlabeled target domain. However, UDA's reliance on labeled source data restricts its adaptability in privacy-related scenarios. This study focuses on source-free object detection (SFOD), which adapts a source-trained detector to an unlabeled target domain without using labeled s…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  9. arXiv:2407.16235  [pdf, other]

    cs.SE cs.AI

    Comparison of Static Application Security Testing Tools and Large Language Models for Repo-level Vulnerability Detection

    Authors: Xin Zhou, Duc-Manh Tran, Thanh Le-Cong, Ting Zhang, Ivana Clairine Irsan, Joshua Sumarlin, Bach Le, David Lo

    Abstract: Software vulnerabilities pose significant security challenges and potential risks to society, necessitating extensive efforts in automated vulnerability detection. There are two popular lines of work to address automated vulnerability detection. On one hand, Static Application Security Testing (SAST) is usually utilized to scan source code for security vulnerabilities, especially in industry. On…

    Submitted 23 July, 2024; originally announced July 2024.

  10. arXiv:2407.16232  [pdf, other]

    cs.CV

    Channel-Partitioned Windowed Attention And Frequency Learning for Single Image Super-Resolution

    Authors: Dinh Phu Tran, Dao Duy Hung, Daeyoung Kim

    Abstract: Recently, window-based attention methods have shown great potential for computer vision tasks, particularly in Single Image Super-Resolution (SISR). However, they may fall short in capturing long-range dependencies and relationships between distant tokens. Additionally, we find that learning in the spatial domain does not convey the frequency content of the image, which is a crucial aspect in SISR. To t…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Version 1, BMVC 2024

  11. arXiv:2407.06434  [pdf, other]

    cs.DC

    Efficient Batched CPU/GPU Implementation of Orthogonal Matching Pursuit for Python

    Authors: Ariel Lubonja, Sebastian Kazmarek Præsius, Trac Duy Tran

    Abstract: Finding the most sparse solution to the underdetermined system $\mathbf{y}=\mathbf{Ax}$, given a tolerance, is known to be NP-hard. A popular way to approximate a sparse solution is by using Greedy Pursuit algorithms, and Orthogonal Matching Pursuit (OMP) is one of the most widely used such solutions. For this paper, we developed an efficient implementation of OMP that leverages Cholesky inverse…

    Submitted 8 July, 2024; originally announced July 2024.

  12. arXiv:2406.02897  [pdf, other]

    cs.SD eess.AS

    LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

    Authors: Trung Dang, David Aponte, Dung Tran, Kazuhito Koishida

    Abstract: Prior works have demonstrated zero-shot text-to-speech by using a generative language model on audio tokens obtained via a neural audio codec. It is still challenging, however, to adapt them to low-latency scenarios. In this paper, we present LiveSpeech - a fully autoregressive language model-based approach for zero-shot text-to-speech, enabling low-latency streaming of the output audio. To allow…

    Submitted 10 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

  13. Building a temperature forecasting model for the city with the regression neural network (RNN)

    Authors: Nguyen Phuc Tran, Duy Thanh Tran, Thi Thuy Nga Duong

    Abstract: In recent years, a study by environmental organizations in the world and Vietnam shows that weather change is quite complex. Global warming has become a serious problem in the modern world, which is a concern for scientists. Last century, it was difficult to forecast the weather due to missing weather monitoring stations and technological limitations. This made it hard to collect data for building…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 6 pages

    Journal ref: The 6th International Conference for Small & Medium Business in 2020 (ICSMB 2020)

  14. arXiv:2405.17002  [pdf, other]

    cs.CV

    UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models

    Authors: Quan Van Nguyen, Huy Quang Pham, Dan Quang Tran, Thang Kien-Bao Nguyen, Nhat-Hao Nguyen-Dang, Bao-Thien Nguyen-Tat

    Abstract: Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Metho…

    Submitted 27 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  15. arXiv:2405.15824  [pdf, other]

    cs.LG cs.AI

    Efficient Mitigation of Bus Bunching through Setter-Based Curriculum Learning

    Authors: Avidan Shah, Danny Tran, Yuhan Tang

    Abstract: Curriculum learning has been growing in the domain of reinforcement learning as a method of improving training efficiency for various tasks. It involves modifying the difficulty (lessons) of the environment as the agent learns, in order to encourage more optimal agent behavior and higher reward states. However, most curriculum learning methods currently involve discrete transitions of the curricul…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: 9 pages, preprint

  16. arXiv:2404.18397  [pdf, other]

    cs.CV

    ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images

    Authors: Huy Quang Pham, Thang Kien-Bao Nguyen, Quan Van Nguyen, Dan Quang Tran, Nghia Hieu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

    Abstract: Optical Character Recognition - Visual Question Answering (OCR-VQA) is the task of answering text information contained in images that have just been significantly developed in the English language in recent years. However, there are limited studies of this task in low-resource languages such as Vietnamese. To this end, we introduce a novel dataset, ViOCRVQA (Vietnamese Optical Character Recogniti…

    Submitted 28 April, 2024; originally announced April 2024.

  17. arXiv:2404.11152  [pdf, other]

    eess.IV cs.CV

    Multi-target and multi-stage liver lesion segmentation and detection in multi-phase computed tomography scans

    Authors: Abdullah F. Al-Battal, Soan T. M. Duong, Van Ha Tang, Quang Duc Tran, Steven Q. H. Truong, Chien Phan, Truong Q. Nguyen, Cheolhong An

    Abstract: Multi-phase computed tomography (CT) scans use contrast agents to highlight different anatomical structures within the body to improve the probability of identifying and detecting anatomical structures of interest and abnormalities such as liver lesions. Yet, detecting these lesions remains a challenging task as these lesions vary significantly in their size, shape, texture, and contrast with resp…

    Submitted 17 April, 2024; originally announced April 2024.

  18. arXiv:2404.10652  [pdf, other]

    cs.CL

    ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images

    Authors: Quan Van Nguyen, Dan Quang Tran, Huy Quang Pham, Thang Kien-Bao Nguyen, Nghia Hieu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

    Abstract: Visual Question Answering (VQA) is a complicated task that requires the capability of simultaneously processing natural language and images. Initially, this task was researched, focusing on methods to help machines understand objects and scene contexts in images. However, some text appearing in the image that carries explicit information about the full content of the image is not mentioned. Along…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: Preprint submitted to IJCV

  19. arXiv:2404.10078  [pdf, other]

    cs.CV

    Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets

    Authors: Dai Quoc Tran, Armstrong Aboah, Yuntae Jeon, Maged Shoman, Minsoo Park, Seunghee Park

    Abstract: This study addresses the evolving challenges in urban traffic monitoring detection systems based on fisheye lens cameras by proposing a framework that improves the efficacy and accuracy of these systems. In the context of urban infrastructure and transportation management, advanced traffic monitoring systems have become critical for managing the complexities of urbanization and increasing vehicle…

    Submitted 15 April, 2024; originally announced April 2024.

  20. arXiv:2404.09991  [pdf, other]

    cs.RO cs.CV

    EgoPet: Egomotion and Interaction Data from an Animal's Perspective

    Authors: Amir Bar, Arya Bakhtiar, Danny Tran, Antonio Loquercio, Jathushan Rajasegaran, Yann LeCun, Amir Globerson, Trevor Darrell

    Abstract: Animals perceive the world to plan their actions and interact with other agents to accomplish complex tasks, demonstrating capabilities that are still unmatched by AI systems. To advance our understanding and reduce the gap between the capabilities of animals and AI systems, we introduce a dataset of pet egomotion imagery with diverse examples of simultaneous egomotion and multi-agent interaction.…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: https://www.amirbar.net/egopet

  21. arXiv:2404.01041  [pdf, other]

    cs.LG cs.AI cs.CR cs.MA

    Can LLMs get help from other LLMs without revealing private information?

    Authors: Florian Hartmann, Duc-Hieu Tran, Peter Kairouz, Victor Cărbune, Blaise Aguera y Arcas

    Abstract: Cascades are a common type of machine learning system in which a large, remote model can be queried if a local model is not able to accurately label a user's data by itself. Serving stacks for large language models (LLMs) increasingly use cascades due to their ability to preserve task performance while dramatically reducing inference costs. However, applying cascade systems in situations where th…

    Submitted 2 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  22. arXiv:2403.18802  [pdf, other]

    cs.CL cs.AI cs.LG

    Long-form factuality in large language models

    Authors: Jerry Wei, Chengrun Yang, Xinying Song, Yifeng Lu, Nathan Hu, Jie Huang, Dustin Tran, Daiyi Peng, Ruibo Liu, Da Huang, Cosmo Du, Quoc V. Le

    Abstract: Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model's long-form factuality in open domains, we first use GPT-4 to generate LongFact, a prompt set comprising thousands of questions spanning 38 topics. We then propose that LLM agents can be used as automated evaluators for long-form factua…

    Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.

  23. arXiv:2403.10297  [pdf, other]

    cs.CV

    Leveraging Neural Radiance Field in Descriptor Synthesis for Keypoints Scene Coordinate Regression

    Authors: Huy-Hoang Bui, Bach-Thuan Bui, Dinh-Tuan Tran, Joo-Ho Lee

    Abstract: Classical structural-based visual localization methods offer high accuracy but face trade-offs in terms of storage, speed, and privacy. A recent innovation, keypoint scene coordinate regression (KSCR) named D2S addresses these issues by leveraging graph attention networks to enhance keypoint relationships and predict their 3D coordinates using a simple multilayer perceptron (MLP). Camera pose is t…

    Submitted 19 March, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

  24. arXiv:2403.09579  [pdf, other]

    cs.SD cs.LG eess.AS

    uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures

    Authors: Afrina Tabassum, Dung Tran, Trung Dang, Ismini Lourentzou, Kazuhito Koishida

    Abstract: Masked Autoencoders (MAEs) learn rich low-level representations from unlabeled data but require substantial labeled data to effectively adapt to downstream tasks. Conversely, Instance Discrimination (ID) emphasizes high-level semantics, offering a potential solution to alleviate annotation requirements in MAEs. Although combining these two approaches can address downstream tasks with limited label…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: 5 pages, 6 figures, 4 tables. To appear in ICASSP'2024

  25. arXiv:2403.07763  [pdf, other]

    cs.NI cs.ET

    Emerging Technologies for 6G Non-Terrestrial-Networks: From Academia to Industrial Applications

    Authors: Cong T. Nguyen, Yuris Mulya Saputra, Nguyen Van Huynh, Tan N. Nguyen, Dinh Thai Hoang, Diep N Nguyen, Van-Quan Pham, Miroslav Voznak, Symeon Chatzinotas, Dinh-Hieu Tran

    Abstract: Terrestrial networks form the fundamental infrastructure of modern communication systems, serving more than 4 billion users globally. However, terrestrial networks are facing a wide range of challenges, from coverage and reliability to interference and congestion. As the demands of the 6G era are expected to be much higher, it is crucial to address these challenges to ensure a robust and efficient…

    Submitted 3 July, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: 35 pages

  26. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  27. arXiv:2403.01454  [pdf, ps, other]

    cs.IT

    Maximum Length RLL Sequences in de Bruijn Graph

    Authors: Yeow Meng Chee, Tuvi Etzion, Tien Long Nguyen, Duy Hoang Ta, Vinh Duc Tran, Van Khu Vu

    Abstract: A timing and synchronization system based on a de Bruijn sequence has been proposed and studied recently for a channel associated with quantum communication that requires reliable synchronization. To avoid a long period of no-pulse in such a system, on-off pulses are used to simulate a zero and on-on pulses are used to simulate a one. However, these sequences have high redundancy. To reduce the red…

    Submitted 3 March, 2024; originally announced March 2024.

  28. arXiv:2402.18011  [pdf, other]

    cs.CV

    Representing 3D sparse map points and lines for camera relocalization

    Authors: Bach-Thuan Bui, Huy-Hoang Bui, Dinh-Tuan Tran, Joo-Ho Lee

    Abstract: Recent advancements in visual localization and mapping have demonstrated considerable success in integrating point and line features. However, expanding the localization framework to include additional mapping components frequently results in increased demand for memory and computational resources dedicated to matching tasks. In this study, we show how a lightweight neural network can learn to rep…

    Submitted 27 February, 2024; originally announced February 2024.

  29. arXiv:2402.08643  [pdf, other]

    cs.CV cs.LG

    Learned Image Compression with Text Quality Enhancement

    Authors: Chih-Yu Lai, Dung Tran, Kazuhito Koishida

    Abstract: Learned image compression has gained widespread popularity for its efficiency in achieving ultra-low bit-rates. Yet, images containing substantial textual content, particularly screen-content images (SCI), often suffer from text distortion at such compressed levels. To address this, we propose to minimize a novel text logit loss designed to quantify the disparity in text between the original an…

    Submitted 13 February, 2024; originally announced February 2024.

    Comments: Submitted to ICIP 2024

  30. arXiv:2402.01198  [pdf, other]

    cs.IT eess.SP

    Physical Layer Location Privacy in SIMO Communication Using Fake Paths Injection

    Authors: Trong Duy Tran, Maxime Ferreira Da Costa, Linh Trung Nguyen

    Abstract: Fake path injection is an emerging paradigm for inducing privacy over wireless networks. In this paper, fake paths are injected by the transmitter into a SIMO multipath communication channel to preserve her physical location from an eavesdropper. A novel statistical privacy metric is defined as the ratio between the largest (resp. smallest) eigenvalues of Bob's (resp. Eve's) Cramér-Rao lower bound…

    Submitted 2 February, 2024; originally announced February 2024.

  31. arXiv:2312.17255  [pdf, other]

    eess.AS cs.LG cs.SD

    Single-channel speech enhancement using learnable loss mixup

    Authors: Oscar Chang, Dung N. Tran, Kazuhito Koishida

    Abstract: Generalization remains a major problem in supervised learning of single-channel speech enhancement. In this work, we propose learnable loss mixup (LLM), a simple and effortless training diagram, to improve the generalization of deep learning-based speech enhancement models. Loss mixup, of which learnable loss mixup is a special variant, optimizes a mixture of the loss functions of random sample pa…

    Submitted 19 December, 2023; originally announced December 2023.

  32. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  33. arXiv:2312.02192  [pdf, other]

    cs.CV

    DiverseDream: Diverse Text-to-3D Synthesis with Augmented Text Embedding

    Authors: Uy Dieu Tran, Minh Luu, Phong Ha Nguyen, Khoi Nguyen, Binh-Son Hua

    Abstract: Text-to-3D synthesis has recently emerged as a new approach to sampling 3D models by adopting pretrained text-to-image models as guiding visual priors. An intriguing but underexplored problem with existing text-to-3D methods is that 3D models obtained from the sampling-by-optimization procedure tend to have mode collapses, and hence poor diversity in their results. In this paper, we provide an ana…

    Submitted 17 July, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

    Comments: Accepted to ECCV 2024. Project page: https://diversedream.github.io

  34. arXiv:2311.10810  [pdf]

    cs.CL cs.AI

    Use GPT-J Prompt Generation with RoBERTa for NER Models on Diagnosis Extraction of Periodontal Diagnosis from Electronic Dental Records

    Authors: Yao-Shun Chuang, Xiaoqian Jiang, Chun-Teh Lee, Ryan Brandon, Duong Tran, Oluwabunmi Tokede, Muhammad F. Walji

    Abstract: This study explored the usability of prompt generation on named entity recognition (NER) tasks and the performance in different settings of the prompt. The prompt generation by GPT-J models was utilized to directly test the gold standard as well as to generate the seed and further fed to the RoBERTa model with the spaCy package. In the direct test, a lower ratio of negative examples with higher nu…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: 2023 AMIA Annual Symposium, see https://amia.org/education-events/amia-2023-annual-symposium

  35. arXiv:2311.10809  [pdf]

    cs.AI

    Extracting periodontitis diagnosis in clinical notes with RoBERTa and regular expression

    Authors: Yao-Shun Chuang, Chun-Teh Lee, Ryan Brandon, Trung Duong Tran, Oluwabunmi Tokede, Muhammad F. Walji, Xiaoqian Jiang

    Abstract: This study aimed to utilize text processing and natural language processing (NLP) models to mine clinical notes for the diagnosis of periodontitis and to evaluate the performance of a named entity recognition (NER) model on different regular expression (RE) methods. Two complexity levels of RE methods were used to extract and generate the training data. The SpaCy package and RoBERTa transformer mo…

    Submitted 17 November, 2023; originally announced November 2023.

    Comments: IEEE ICHI 2023, see https://ieeeichi.github.io/ICHI2023/program.html

  36. arXiv:2310.18986  [pdf, other]

    cs.CV

    Controllable Group Choreography using Contrastive Diffusion

    Authors: Nhat Le, Tuong Do, Khoa Do, Hien Nguyen, Erman Tjiputra, Quang D. Tran, Anh Nguyen

    Abstract: Music-driven group choreography poses a considerable challenge but holds significant potential for a wide range of industrial applications. The ability to generate synchronized and visually appealing group dance motions that are aligned with music opens up opportunities in many fields such as entertainment, advertising, and virtual performances. However, most of the recent works are not able to ge…

    Submitted 3 November, 2023; v1 submitted 29 October, 2023; originally announced October 2023.

  37. STEER: Semantic Turn Extension-Expansion Recognition for Voice Assistants

    Authors: Leon Liyang Zhang, Jiarui Lu, Joel Ruben Antony Moniz, Aditya Kulkarni, Dhivya Piraviperumal, Tien Dung Tran, Nicholas Tzou, Hong Yu

    Abstract: In the context of a voice assistant system, steering refers to the phenomenon in which a user issues a follow-up command attempting to direct or clarify a previous turn. We propose STEER, a steering detection model that predicts whether a follow-up turn is a user's attempt to steer the previous command. Constructing a training dataset for steering use cases poses challenges due to the cold-start p…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Industry Track

  38. arXiv:2310.15543  [pdf, other]

    cs.LG

    Symmetry-preserving graph attention network to solve routing problems at multiple resolutions

    Authors: Cong Dao Tran, Thong Bach, Truong Son Hy

    Abstract: Travelling Salesperson Problems (TSPs) and Vehicle Routing Problems (VRPs) have achieved reasonable improvement in accuracy and computation time with the adaptation of Machine Learning (ML) methods. However, none of the previous works completely respects the symmetries arising from TSPs and VRPs including rotation, translation, permutation, and scaling. In this work, we introduce the first-ever co…

    Submitted 19 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

  39. arXiv:2310.15516  [pdf, other]

    cs.LG

    Graph Attention-based Deep Reinforcement Learning for solving the Chinese Postman Problem with Load-dependent costs

    Authors: Truong Son Hy, Cong Dao Tran

    Abstract: Recently, deep reinforcement learning (DRL) models have shown promising results in solving routing problems. However, most DRL solvers are commonly proposed to solve node routing problems, such as the Traveling Salesman Problem (TSP). Meanwhile, there has been limited research on applying neural methods to arc routing problems, such as the Chinese Postman Problem (CPP), since they often feature ir…

    Submitted 2 March, 2024; v1 submitted 24 October, 2023; originally announced October 2023.

  40. arXiv:2310.01148  [pdf, other]

    cs.LG

    Cryptocurrency Portfolio Optimization by Neural Networks

    Authors: Quoc Minh Nguyen, Dat Thanh Tran, Juho Kanniainen, Alexandros Iosifidis, Moncef Gabbouj

    Abstract: Many cryptocurrency brokers nowadays offer a variety of derivative assets that allow traders to perform hedging or speculation. This paper proposes an effective algorithm based on neural networks to take advantage of these investment products. The proposed algorithm constructs a portfolio that contains a pair of negatively correlated assets. A deep neural network, which outputs the allocation weig…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 8 pages, 4 figures, accepted at SSCI 2023

  41. arXiv:2309.16699  [pdf]

    cs.RO eess.SY

    Circular-Line Trajectory Tracking Controller for Mobile Robot using Multi-Pixy2 Sensors

    Authors: Xuan Quang Ngo, Tri Duc Tran, Huy Hung Nguyen, Van Dong Nguyen, Van Tu Duong, Tan Tien Nguyen

    Abstract: This study suggests a novel tracking method that employs three Pixy2 sensors to identify the desired line trajectories instead of traditional perceiving means. Firstly, the kinematic model of the mobile robot is derived from the information gathered by three Pixy2 sensors. Secondly, the sliding mode controller is implemented to regulate the tracking error. Finally, simulation results are analyzed…

    Submitted 12 August, 2023; originally announced September 2023.

    Comments: 6 pages, 12 figures, the 2023 International Symposium on Electrical and Electronics Engineering, Ho Chi Minh, Viet Nam, 2023

  42. arXiv:2309.13881  [pdf, other]

    cs.CV

    Skip-Connected Neural Networks with Layout Graphs for Floor Plan Auto-Generation

    Authors: Yuntae Jeon, Dai Quoc Tran, Seunghee Park

    Abstract: With the advent of AI and computer vision techniques, the quest for automated and efficient floor plan designs has gained momentum. This paper presents a novel approach using skip-connected neural networks integrated with layout graphs. The skip-connected layers capture multi-scale floor plan information, and the encoder-decoder networks with GNN facilitate pixel-level probability-based generation…

    Submitted 25 September, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  43. arXiv:2309.12025  [pdf, other]

    cs.DS cs.CC cs.LG math.CO

    Robust Approximation Algorithms for Non-monotone $k$-Submodular Maximization under a Knapsack Constraint

    Authors: Dung T. K. Ha, Canh V. Pham, Tan D. Tran, Huan X. Hoang

    Abstract: The problem of non-monotone $k$-submodular maximization under a knapsack constraint ($\kSMK$) over the ground set size $n$ has been raised in many applications in machine learning, such as data summarization, information propagation, etc. However, existing algorithms for the problem are facing questioning of how to overcome the non-monotone case and how to fast return a good solution in case of th…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: 12 pages

    Report number: KSE-ID38

  44. arXiv:2309.10740  [pdf, other]

    cs.SD cs.LG cs.MM eess.AS

    ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation

    Authors: Yatong Bai, Trung Dang, Dung Tran, Kazuhito Koishida, Somayeh Sojoudi

    Abstract: Diffusion models are instrumental in text-to-audio (TTA) generation. Unfortunately, they suffer from slow inference due to an excessive number of queries to the underlying denoising network per generation. To address this bottleneck, we introduce ConsistencyTTA, a framework requiring only a single non-autoregressive network query, thereby accelerating TTA by hundreds of times. We achieve so by pro…

    Submitted 24 June, 2024; v1 submitted 19 September, 2023; originally announced September 2023.

  45. arXiv:2307.15250  [pdf, other]

    cs.CV cs.RO

    D2S: Representing sparse descriptors and 3D coordinates for camera relocalization

    Authors: Bach-Thuan Bui, Huy-Hoang Bui, Dinh-Tuan Tran, Joo-Ho Lee

    Abstract: State-of-the-art visual localization methods mostly rely on complex procedures to match local descriptors and 3D point clouds. However, these procedures can incur significant costs in terms of inference, storage, and updates over time. In this study, we propose a direct learning-based approach that utilizes a simple network named D2S to represent complex local descriptors and their scene coordinat…

    Submitted 12 July, 2024; v1 submitted 27 July, 2023; originally announced July 2023.

  46. arXiv:2306.10208  [pdf, other]

    cs.CV

    Learning Space-Time Semantic Correspondences

    Authors: Du Tran, Jitendra Malik

    Abstract: We propose a new task of space-time semantic correspondence prediction in videos. Given a source video, a target video, and a set of space-time key-points in the source video, the task requires predicting a set of keypoints in the target video that are the semantic correspondences of the provided source keypoints. We believe that this task is important for fine-grain video understanding, potential…

    Submitted 16 June, 2023; originally announced June 2023.

  47. arXiv:2306.06893  [pdf, other]

    cs.CV cs.AI

    In-context Cross-Density Adaptation on Noisy Mammogram Abnormalities Detection

    Authors: Huy T. Nguyen, Thinh B. Lam, Quan D. D. Tran, Minh T. Nguyen, Dat T. Chung, Vinh Q. Dinh

    Abstract: This paper investigates the impact of breast density distribution on the generalization performance of deep-learning models on mammography images using the VinDr-Mammo dataset. We explore the use of domain adaptation techniques, specifically Domain Adaptive Object Detection (DAOD) with the Noise Latent Transferability Exploration (NLTE) framework, to improve model performance across breast densiti…

    Submitted 12 June, 2023; originally announced June 2023.

  48. arXiv:2305.10292  [pdf, other]

    cs.DS cs.AI

    Linear Query Approximation Algorithms for Non-monotone Submodular Maximization under Knapsack Constraint

    Authors: Canh V. Pham, Tan D. Tran, Dung T. K. Ha, My T. Thai

    Abstract: This work, for the first time, introduces two constant factor approximation algorithms with linear query complexity for non-monotone submodular maximization over a ground set of size $n$ subject to a knapsack constraint, $\mathsf{DLA}$ and $\mathsf{RLA}$. $\mathsf{DLA}$ is a deterministic algorithm that provides an approximation factor of $6+ε$ while $\mathsf{RLA}$ is a randomized algorithm with a…

    Submitted 10 July, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

  49. arXiv:2305.04594  [pdf, other]

    cs.RO

    A sensor fusion approach for improving implementation speed and accuracy of RTAB-Map algorithm based indoor 3D mapping

    Authors: Hoang-Anh Phan, Phuc Vinh Nguyen, Thu Hang Thi Khuat, Hieu Dang Van, Dong Huu Quoc Tran, Bao Lam Dang, Tung Thanh Bui, Van Nguyen Thi Thanh, Trinh Chu Duc

    Abstract: In recent years, 3D mapping for indoor environments has undergone considerable research and improvement because of its effective applications in various fields, including robotics, autonomous navigation, and virtual reality. Building an accurate 3D map for indoor environment is challenging due to the complex nature of the indoor space, the problem of real-time embedding and positioning errors of t…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to 20th International Joint Conference on Computer Science and Software Engineering (JCSSE 2023). 5 pages

  50. arXiv:2305.04576  [pdf, other]

    cs.RO

    An Enhanced Sampling-Based Method With Modified Next-Best View Strategy For 2D Autonomous Robot Exploration

    Authors: Dong Huu Quoc Tran, Hoang-Anh Phan, Hieu Dang Van, Tan Van Duong, Tung Thanh Bui, Van Nguyen Thi Thanh

    Abstract: Autonomous exploration is a new technology in the field of robotics that has found widespread application due to its objective to help robots independently localize, scan maps, and navigate any terrain without human control. Up to present, the sampling-based exploration strategies have been the most effective for aerial and ground vehicles equipped with depth sensors producing three-dimensional po…

    Submitted 8 May, 2023; originally announced May 2023.

    Comments: Accepted to 20th International Joint Conference on Computer Science and Software Engineering (JCSSE 2023). 6 pages