
Showing 1–50 of 1,887 results for author: Huang, J

Searching in archive cs.
  1. arXiv:2408.10575  [pdf, other]

    cs.CV

    MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval

    Authors: Haoran Tang, Meng Cao, Jinfa Huang, Ruyang Liu, Peng Jin, Ge Li, Xiaodan Liang

    Abstract: Text-Video Retrieval (TVR) aims to align and associate relevant video content with corresponding natural language queries. Most existing TVR methods are based on large-scale pre-trained vision-language models (e.g., CLIP). However, due to the inherent plain structure of CLIP, few TVR methods explore the multi-scale representations which offer richer contextual information for a more thorough under…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 8 pages

  2. arXiv:2408.10357  [pdf, other]

    cs.CL cs.IR

    Beyond Relevant Documents: A Knowledge-Intensive Approach for Query-Focused Summarization using Large Language Models

    Authors: Weijia Zhang, Jia-Hong Huang, Svitlana Vakulenko, Yumo Xu, Thilina Rajapakse, Evangelos Kanoulas

    Abstract: Query-focused summarization (QFS) is a fundamental task in natural language processing with broad applications, including search engines and report generation. However, traditional approaches assume the availability of relevant documents, which may not always hold in practical scenarios, especially in highly specialized topics. To address this limitation, we propose a novel knowledge-intensive app…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted by the 27th International Conference on Pattern Recognition (ICPR 2024)

  3. arXiv:2408.08470  [pdf, other]

    cs.LG cs.AI

    Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models

    Authors: Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar

    Abstract: Despite their widespread adoption, large language models (LLMs) remain prohibitive to use under resource constraints, with their ever growing sizes only increasing the barrier for use. One noted issue is the high latency associated with auto-regressive generation, rendering large LLMs use dependent on advanced computing infrastructure. Assisted decoding, where a smaller draft model guides a larger…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 14 pages (9 pages main content + references + appendix)

  4. arXiv:2408.07733  [pdf, other]

    cs.LG cs.CR

    Enhancing Adversarial Attacks via Parameter Adaptive Adversarial Attack

    Authors: Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Chenyu Zhang, Jiahao Huang, Jianlong Zhou, Fang Chen

    Abstract: In recent times, the swift evolution of adversarial attacks has captured widespread attention, particularly concerning their transferability and other performance attributes. These techniques are primarily executed at the sample level, frequently overlooking the intrinsic parameters of models. Such neglect suggests that the perturbations introduced in adversarial samples might have the potential f…

    Submitted 14 August, 2024; originally announced August 2024.

  5. arXiv:2408.07367  [pdf, other]

    cs.RO

    Risk Occupancy: A New and Efficient Paradigm through Vehicle-Road-Cloud Collaboration

    Authors: Jiaxing Chen, Wei Zhong, Bolin Gao, Yifei Liu, Hengduo Zou, Jiaxi Liu, Yanbo Lu, Jin Huang, Zhihua Zhong

    Abstract: This study introduces the 4D Risk Occupancy within a vehicle-road-cloud architecture, integrating the road surface spatial, risk, and temporal dimensions, and endowing the algorithm with beyond-line-of-sight, all-angles, and efficient abilities. The algorithm simplifies risk modeling by focusing on directly observable information and key factors, drawing on the concept of Occupancy Grid Maps (OGM)…

    Submitted 17 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures

  6. arXiv:2408.07349  [pdf, other]

    eess.IV cs.CV cs.MM

    Automated Retinal Image Analysis and Medical Report Generation through Deep Learning

    Authors: Jia-Hong Huang

    Abstract: The increasing prevalence of retinal diseases poses a significant challenge to the healthcare system, as the demand for ophthalmologists surpasses the available workforce. This imbalance creates a bottleneck in diagnosis and treatment, potentially delaying critical care. Traditional methods of generating medical reports from retinal images rely on manual interpretation, which is time-consuming and…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Ph.D. thesis, 124 pages

  7. arXiv:2408.06356  [pdf, other]

    cs.CV

    Enhancing Ecological Monitoring with Multi-Objective Optimization: A Novel Dataset and Methodology for Segmentation Algorithms

    Authors: Sophia J. Abraham, Jin Huang, Brandon RichardWebster, Michael Milford, Jonathan D. Hauenstein, Walter Scheirer

    Abstract: We introduce a unique semantic segmentation dataset of 6,096 high-resolution aerial images capturing indigenous and invasive grass species in Bega Valley, New South Wales, Australia, designed to address the underrepresented domain of ecological data in the computer vision community. This dataset presents a challenging task due to the overlap and distribution of grass species, which is critical for…

    Submitted 25 July, 2024; originally announced August 2024.

  8. arXiv:2408.05694  [pdf, other]

    cs.CR

    ICSFuzz: Collision Detector Bug Discovery in Autonomous Driving Simulators

    Authors: Weiwei Fu, Heqing Huang, Yifan Zhang, Ke Zhang, Jin Huang, Wei-Bin Lee, Jianping Wang

    Abstract: With the increasing adoption of autonomous vehicles, ensuring the reliability of autonomous driving systems (ADSs) deployed on autonomous vehicles has become a significant concern. Driving simulators have emerged as crucial platforms for testing autonomous driving systems, offering realistic, dynamic, and configurable environments. However, existing simulation-based ADS testers have largely overlo…

    Submitted 11 August, 2024; originally announced August 2024.

  9. arXiv:2408.05517  [pdf, other]

    cs.CL

    SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning

    Authors: Yuze Zhao, Jintao Huang, Jinghan Hu, Xingjun Wang, Yunlin Mao, Daoze Zhang, Zeyinzi Jiang, Zhikai Wu, Baole Ai, Ang Wang, Wenmeng Zhou, Yingda Chen

    Abstract: Recent developments in Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs) have leveraged Attention-based Transformer architectures and achieved superior performance and generalization capabilities. They have since covered extensive areas of traditional learning tasks. For instance, text-based tasks such as text-classification and sequence-labeling, as well as multi-modal task…

    Submitted 18 August, 2024; v1 submitted 10 August, 2024; originally announced August 2024.

  10. arXiv:2408.04808  [pdf, other]

    cs.DC cs.LG

    Scaling Deep Learning Computation over the Inter-Core Connected Intelligence Processor

    Authors: Yiqi Liu, Yuqi Xue, Yu Cheng, Lingxiao Ma, Ziming Miao, Jilong Xue, Jian Huang

    Abstract: As AI chips incorporate numerous parallelized cores to scale deep learning (DL) computing, inter-core communication is enabled recently by employing high-bandwidth and low-latency interconnect links on the chip (e.g., Graphcore IPU). It allows each core to directly access the fast scratchpad memory in other cores, which enables new parallel computing paradigms. However, without proper support for…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: This paper is accepted at The 30th ACM Symposium on Operating Systems Principles (SOSP'24)

  11. arXiv:2408.04804  [pdf, other]

    cs.CV

    Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation

    Authors: Yifan Feng, Jiangang Huang, Shaoyi Du, Shihui Ying, Jun-Hai Yong, Yipeng Li, Guiguang Ding, Rongrong Ji, Yue Gao

    Abstract: We introduce Hyper-YOLO, a new object detection method that integrates hypergraph computations to capture the complex high-order correlations among visual features. Traditional YOLO models, while powerful, have limitations in their neck designs that restrict the integration of cross-level features and the exploitation of high-order feature interrelationships. To address these challenges, we propos…

    Submitted 8 August, 2024; originally announced August 2024.

  12. arXiv:2408.04708  [pdf, other]

    cs.SD cs.AI eess.AS

    MulliVC: Multi-lingual Voice Conversion With Cycle Consistency

    Authors: Jiawei Huang, Chen Zhang, Yi Ren, Ziyue Jiang, Zhenhui Ye, Jinglin Liu, Jinzheng He, Xiang Yin, Zhou Zhao

    Abstract: Voice conversion aims to modify the source speaker's voice to resemble the target speaker while preserving the original speech content. Despite notable advancements in voice conversion these days, multi-lingual voice conversion (including both monolingual and cross-lingual scenarios) has yet to be extensively studied. It faces two main challenges: 1) the considerable variability in prosody and art…

    Submitted 8 August, 2024; originally announced August 2024.

  13. arXiv:2408.04307  [pdf, other]

    cs.DC cs.LG

    Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training

    Authors: Weilin Cai, Le Qin, Jiayi Huang

    Abstract: As large language models continue to scale up, the imperative for fault tolerance in distributed deep learning systems intensifies, becoming a focal area of AI infrastructure research. Checkpoint has emerged as the predominant fault tolerance strategy, with extensive studies dedicated to optimizing its efficiency. However, the advent of the sparse Mixture-of-Experts (MoE) model presents new challe…

    Submitted 8 August, 2024; originally announced August 2024.

  14. arXiv:2408.04216  [pdf]

    cs.CL cs.AI

    Attention Mechanism and Context Modeling System for Text Mining Machine Translation

    Authors: Shi Bo, Yuwei Zhang, Junming Huang, Sitong Liu, Zexi Chen, Zizheng Li

    Abstract: This paper advances a novel architectural schema anchored upon the Transformer paradigm and innovatively amalgamates the K-means categorization algorithm to augment the contextual apprehension capabilities of the schema. The transformer model performs well in machine translation tasks due to its parallel computing power and multi-head attention mechanism. However, it may encounter contextual ambig…

    Submitted 8 August, 2024; originally announced August 2024.

  15. arXiv:2408.04171  [pdf, other]

    cs.CV

    Rotation center identification based on geometric relationships for rotary motion deblurring

    Authors: Jinhui Qin, Yong Ma, Jun Huang, Fan Fan, You Du

    Abstract: Non-blind rotary motion deblurring (RMD) aims to recover the latent clear image from a rotary motion blurred (RMB) image. The rotation center is a crucial input parameter in non-blind RMD methods. Existing methods directly estimate the rotation center from the RMB image. However, they always suffer significant errors, and the performance of RMD is limited. For the assembled imaging systems, the pos…

    Submitted 7 August, 2024; originally announced August 2024.

  16. arXiv:2408.04170  [pdf]

    cs.CV

    M2EF-NNs: Multimodal Multi-instance Evidence Fusion Neural Networks for Cancer Survival Prediction

    Authors: Hui Luo, Jiashuang Huang, Hengrong Ju, Tianyi Zhou, Weiping Ding

    Abstract: Accurate cancer survival prediction is crucial for assisting clinical doctors in formulating treatment plans. Multimodal data, including histopathological images and genomic data, offer complementary and comprehensive information that can greatly enhance the accuracy of this task. However, the current methods, despite yielding promising results, suffer from two notable limitations: they do not eff…

    Submitted 7 August, 2024; originally announced August 2024.

  17. arXiv:2408.04104  [pdf, other]

    cs.AR cs.AI cs.LG cs.OS

    Hardware-Assisted Virtualization of Neural Processing Units for Cloud Platforms

    Authors: Yuqi Xue, Yiqi Liu, Lifeng Nai, Jian Huang

    Abstract: Cloud platforms today have been deploying hardware accelerators like neural processing units (NPUs) for powering machine learning (ML) inference services. To maximize the resource utilization while ensuring reasonable quality of service, a natural approach is to virtualize NPUs for efficient resource sharing for multi-tenant ML services. However, virtualizing NPUs for modern cloud platforms is not…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: Accepted to MICRO'24

  18. arXiv:2408.03771  [pdf]

    cs.CV

    Methodological Explainability Evaluation of an Interpretable Deep Learning Model for Post-Hepatectomy Liver Failure Prediction Incorporating Counterfactual Explanations and Layerwise Relevance Propagation: A Prospective In Silico Trial

    Authors: Xian Zhong, Zohaib Salahuddin, Yi Chen, Henry C Woodruff, Haiyi Long, Jianyun Peng, Nuwan Udawatte, Roberto Casale, Ayoub Mokhtari, Xiaoer Zhang, Jiayao Huang, Qingyu Wu, Li Tan, Lili Chen, Dongming Li, Xiaoyan Xie, Manxia Lin, Philippe Lambin

    Abstract: Artificial intelligence (AI)-based decision support systems have demonstrated value in predicting post-hepatectomy liver failure (PHLF) in hepatocellular carcinoma (HCC). However, they often lack transparency, and the impact of model explanations on clinicians' decisions has not been thoroughly evaluated. Building on prior research, we developed a variational autoencoder-multilayer perceptron (VAE…

    Submitted 7 August, 2024; originally announced August 2024.

  19. arXiv:2408.03567  [pdf, other]

    cs.CV cs.CL

    Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning

    Authors: Zi-Yi Dou, Xitong Yang, Tushar Nagarajan, Huiyu Wang, Jing Huang, Nanyun Peng, Kris Kitani, Fu-Jen Chu

    Abstract: We present EMBED (Egocentric Models Built with Exocentric Data), a method designed to transform exocentric video-language data for egocentric video representation learning. Large-scale exocentric data covers diverse activities with significant potential for egocentric learning, but inherent disparities between egocentric and exocentric data pose challenges in utilizing one view for the other seaml…

    Submitted 7 August, 2024; originally announced August 2024.

  20. arXiv:2408.03499  [pdf, other]

    cs.CV

    FacialPulse: An Efficient RNN-based Depression Detection via Temporal Facial Landmarks

    Authors: Ruiqi Wang, Jinyang Huang, Jie Zhang, Xin Liu, Xiang Zhang, Zhi Liu, Peng Zhao, Sigui Chen, Xiao Sun

    Abstract: Depression is a prevalent mental health disorder that significantly impacts individuals' lives and well-being. Early detection and intervention are crucial for effective treatment and management of depression. Recently, there are many end-to-end deep learning methods leveraging the facial expression features for automatic depression detection. However, most current methods overlook the temporal dy…

    Submitted 6 August, 2024; originally announced August 2024.

  21. arXiv:2408.02927  [pdf, other]

    cs.LG cs.AI cs.CL cs.CR

    HARMONIC: Harnessing LLMs for Tabular Data Synthesis and Privacy Protection

    Authors: Yuxin Wang, Duanyu Feng, Yongfu Dai, Zhengyu Chen, Jimin Huang, Sophia Ananiadou, Qianqian Xie, Hao Wang

    Abstract: Data serves as the fundamental foundation for advancing deep learning, particularly tabular data presented in a structured format, which is highly conducive to modeling. However, even in the era of LLM, obtaining tabular data from sensitive domains remains a challenge due to privacy or copyright concerns. Hence, exploring how to effectively use models like LLMs to generate realistic and privacy-pr…

    Submitted 5 August, 2024; originally announced August 2024.

  22. FDiff-Fusion:Denoising diffusion fusion network based on fuzzy learning for 3D medical image segmentation

    Authors: Weiping Ding, Sheng Geng, Haipeng Wang, Jiashuang Huang, Tianyi Zhou

    Abstract: In recent years, the denoising diffusion model has achieved remarkable success in image segmentation modeling. With its powerful nonlinear modeling capabilities and superior generalization performance, denoising diffusion models have gradually been applied to medical image segmentation tasks, bringing new perspectives and methods to this field. However, existing methods overlook the uncertainty of…

    Submitted 21 July, 2024; originally announced August 2024.

    Comments: This paper has been accepted by Information Fusion. Permission from Elsevier must be obtained for all other uses, in any current or future media. The final version is available at [doi:10.1016/J.INFFUS.2024.102540]

    Journal ref: Information Fusion, 2024: 102540

  23. arXiv:2408.01723  [pdf, other]

    cs.CV cs.IR

    A Novel Evaluation Framework for Image2Text Generation

    Authors: Jia-Hong Huang, Hongyi Zhu, Yixian Shen, Stevan Rudinac, Alessio M. Pacces, Evangelos Kanoulas

    Abstract: Evaluating the quality of automatically generated image descriptions is challenging, requiring metrics that capture various aspects such as grammaticality, coverage, correctness, and truthfulness. While human evaluation offers valuable insights, its cost and time-consuming nature pose limitations. Existing automated metrics like BLEU, ROUGE, METEOR, and CIDEr aim to bridge this gap but often show…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: The paper has been accepted for presentation at the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, specifically in the Large Language Model for Evaluation in IR (LLM4Eval) Workshop in 2024

  24. Degrade to Function: Towards Eco-friendly Morphing Devices that Function Through Programmed Sequential Degradation

    Authors: Qiuyu Lu, Semina Yi, Mentian Gan, Jihong Huang, Xiao Zhang, Yue Yang, Chenyi Shen, Lining Yao

    Abstract: While it seems counterintuitive to think of degradation within an operating device as beneficial, one may argue that when rationally designed, the controlled breakdown of materials can be harnessed for specific functions. To apply this principle to the design of morphing devices, we introduce the concept of Degrade to Function (DtF). This concept aims to create eco-friendly and self-contained morp…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 24 pages, 24 figures, The 37th Annual ACM Symposium on User Interface Software and Technology (UIST 24)

  25. arXiv:2408.01453  [pdf, other]

    cs.CY cs.AI cs.CL

    Reporting and Analysing the Environmental Impact of Language Models on the Example of Commonsense Question Answering with External Knowledge

    Authors: Aida Usmanova, Junbo Huang, Debayan Banerjee, Ricardo Usbeck

    Abstract: Human-produced emissions are growing at an alarming rate, causing already observable changes in the climate and environment in general. Each year global carbon dioxide emissions hit a new record, and it is reported that 0.5% of total US greenhouse gas emissions are attributed to data centres as of 2021. The release of ChatGPT in late 2022 sparked social interest in Large Language Models (LLMs), th…

    Submitted 24 July, 2024; originally announced August 2024.

    Comments: Presented at Bonn Sustainable AI 2023 conference

  26. arXiv:2408.01391  [pdf, other]

    cs.DC cs.LG

    FT K-means: A High-Performance K-means on GPU with Fault Tolerance

    Authors: Shixun Wu, Yitong Ding, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Bryan M. Wong, Zizhong Chen, Franck Cappello

    Abstract: K-means is a widely used algorithm in clustering; however, its efficiency is primarily constrained by the computational cost of distance computing. Existing implementations suffer from suboptimal utilization of computational units and lack resilience against soft errors. To address these challenges, we introduce FT K-means, a high-performance GPU-accelerated implementation of K-means with online f…

    Submitted 7 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

  27. arXiv:2408.00989  [pdf, other]

    cs.AI

    On the Resilience of Multi-Agent Systems with Malicious Agents

    Authors: Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Maarten Sap, Michael R. Lyu

    Abstract: Multi-agent systems, powered by large language models, have shown great abilities across various tasks due to the collaboration of expert agents, each focusing on a specific domain. However, when agents are deployed separately, there is a risk that malicious users may introduce malicious agents who generate incorrect or irrelevant results that are too stealthy to be identified by other non-special…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 10 pages

  28. arXiv:2408.00793  [pdf]

    physics.chem-ph cs.LG

    From 2015 to 2023: How Machine Learning Aids Natural Product Analysis

    Authors: Suwen Shi, Ziwei Huang, Xingxin Gu, Xu Lin, Chaoying Zhong, Junjie Hang, Jianli Lin, Claire Chenwen Zhong, Lin Zhang, Yu Li, Junjie Huang

    Abstract: In recent years, conventional chemistry techniques have faced significant challenges due to their inherent limitations, struggling to cope with the increasing complexity and volume of data generated in contemporary research endeavors. Computational methodologies represent robust tools in the field of chemistry, offering the capacity to harness potent machine-learning models to yield insightful ana…

    Submitted 17 July, 2024; originally announced August 2024.

    Comments: 19 pages, 4 figures

  29. arXiv:2408.00744  [pdf, other]

    cs.CV

    Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation

    Authors: Siyu Jiao, Hongguang Zhu, Jiannan Huang, Yao Zhao, Yunchao Wei, Humphrey Shi

    Abstract: Pre-trained vision-language models, e.g. CLIP, have been increasingly used to address the challenging Open-Vocabulary Segmentation (OVS) task, benefiting from their well-aligned vision-text embedding space. Typical solutions involve either freezing CLIP during training to unilaterally maintain its zero-shot capability, or fine-tuning CLIP vision encoder to achieve perceptual sensitivity to local r…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: ECCV 2024

  30. arXiv:2408.00247  [pdf, other]

    cs.IR

    Simple but Efficient: A Multi-Scenario Nearline Retrieval Framework for Recommendation on Taobao

    Authors: Yingcai Ma, Ziyang Wang, Yuliang Yan, Jian Wu, Yuning Jiang, Longbin Li, Wen Chen, Jianhang Huang

    Abstract: In recommendation systems, the matching stage is becoming increasingly critical, serving as the upper limit for the entire recommendation process. Recently, some studies have started to explore the use of multi-scenario information for recommendations, such as model-based and data-based approaches. However, the matching stage faces significant challenges due to the need for ultra-large-scale retri…

    Submitted 5 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  31. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  32. arXiv:2407.21631  [pdf, other]

    cs.CV

    RoadFormer+: Delivering RGB-X Scene Parsing through Scale-Aware Information Decoupling and Advanced Heterogeneous Feature Fusion

    Authors: Jianxin Huang, Jiahang Li, Ning Jia, Yuxiang Sun, Chengju Liu, Qijun Chen, Rui Fan

    Abstract: Task-specific data-fusion networks have marked considerable achievements in urban scene parsing. Among these networks, our recently proposed RoadFormer successfully extracts heterogeneous features from RGB images and surface normal maps and fuses these features through attention mechanisms, demonstrating compelling efficacy in RGB-Normal road scene parsing. However, its performance significantly d…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: 11 pages, 5 figures

  33. arXiv:2407.21328  [pdf, other]

    eess.IV cs.CV

    Knowledge-Guided Prompt Learning for Lifespan Brain MR Image Segmentation

    Authors: Lin Teng, Zihao Zhao, Jiawei Huang, Zehong Cao, Runqi Meng, Feng Shi, Dinggang Shen

    Abstract: Automatic and accurate segmentation of brain MR images throughout the human lifespan into tissue and structure is crucial for understanding brain development and diagnosing diseases. However, challenges arise from the intricate variations in brain appearance due to rapid early brain development, aging, and disorders, compounded by the limited availability of manually-labeled datasets. In response,…

    Submitted 31 July, 2024; originally announced July 2024.

  34. arXiv:2407.21022  [pdf, other]

    cs.IR

    A Comprehensive Survey on Retrieval Methods in Recommender Systems

    Authors: Junjie Huang, Jizheng Chen, Jianghao Lin, Jiarui Qin, Ziming Feng, Weinan Zhang, Yong Yu

    Abstract: In an era dominated by information overload, effective recommender systems are essential for managing the deluge of data across digital platforms. Multi-stage cascade ranking systems are widely used in the industry, with retrieval and ranking being two typical stages. Retrieval methods sift through vast candidates to filter out irrelevant items, while ranking methods prioritize these candidates to…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 38 pages

  35. arXiv:2407.21004  [pdf, other]

    cs.CL cs.CV

    Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection

    Authors: Jinfa Huang, Jinsheng Pan, Zhongwei Wan, Hanjia Lyu, Jiebo Luo

    Abstract: Recent advances show that two-stream approaches have achieved outstanding performance in hateful meme detection. However, hateful memes constantly evolve as new memes emerge by fusing progressive cultural ideas, making existing methods obsolete or ineffective. In this work, we explore the potential of Large Multimodal Models (LMMs) for hateful meme detection. To this end, we propose Evolver, which…

    Submitted 30 July, 2024; originally announced July 2024.

  36. arXiv:2407.20955  [pdf, other]

    cs.SD cs.AI eess.AS

    Emotion-driven Piano Music Generation via Two-stage Disentanglement and Functional Representation

    Authors: Jingyue Huang, Ke Chen, Yi-Hsuan Yang

    Abstract: Managing the emotional aspect remains a challenge in automatic music generation. Prior works aim to learn various emotions at once, leading to inadequate modeling. This paper explores the disentanglement of emotions in piano performance generation through a two-stage framework. The first stage focuses on valence modeling of lead sheet, and the second stage addresses arousal modeling by introducing…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: Proceedings of the 25th International Society for Music Information Retrieval Conference, ISMIR 2024

  37. arXiv:2407.20176  [pdf, other]

    cs.SD cs.AI eess.AS

    Emotion-Driven Melody Harmonization via Melodic Variation and Functional Representation

    Authors: Jingyue Huang, Yi-Hsuan Yang

    Abstract: Emotion-driven melody harmonization aims to generate diverse harmonies for a single melody to convey desired emotions. Previous research found it hard to alter the perceived emotional valence of lead sheets only by harmonizing the same melody with different chords, which may be attributed to the constraints imposed by the melody itself and the limitation of existing music representation. In this p…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: This work is the initial version of the ISMIR 2024 paper EMO-Disentanger

  38. arXiv:2407.19041  [pdf, other]

    cs.AI cs.CL

    Optimizing Numerical Estimation and Operational Efficiency in the Legal Domain through Large Language Models

    Authors: Jia-Hong Huang, Chao-Chun Yang, Yixian Shen, Alessio M. Pacces, Evangelos Kanoulas

    Abstract: The legal landscape encompasses a wide array of lawsuit types, presenting lawyers with challenges in delivering timely and accurate information to clients, particularly concerning critical aspects like potential imprisonment duration or financial repercussions. Compounded by the scarcity of legal experts, there's an urgent need to enhance the efficiency of traditional legal workflows. Recent advan…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: The paper has been accepted by the 33rd ACM International Conference on Information and Knowledge Management (CIKM) in 2024

  39. arXiv:2407.17933  [pdf, other]

    cs.CV

    Segmentation by registration-enabled SAM prompt engineering using five reference images

    Authors: Yaxi Chen, Aleksandra Ivanova, Shaheer U. Saeed, Rikin Hargunani, Jie Huang, Chaozong Liu, Yipeng Hu

    Abstract: The recently proposed Segment Anything Model (SAM) is a general tool for image segmentation, but it requires additional adaptation and careful fine-tuning for medical image segmentation, especially for small, irregularly-shaped, and boundary-ambiguous anatomical structures such as the knee cartilage that is of interest in this work. Repaired cartilage, after certain surgical procedures, exhibits i…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted to the 11th International Workshop on Biomedical Image Registration (WBIR 2024)

  40. arXiv:2407.17817  [pdf, other]

    cs.CL cs.LG

    Demystifying Verbatim Memorization in Large Language Models

    Authors: Jing Huang, Diyi Yang, Christopher Potts

    Abstract: Large Language Models (LLMs) frequently memorize long sequences verbatim, often with serious legal and privacy implications. Much prior work has studied such verbatim memorization using observational data. To complement such work, we develop a framework to study verbatim memorization in a controlled setting by continuing pre-training from Pythia checkpoints with injected sequences. We find that (1…

    Submitted 25 July, 2024; originally announced July 2024.

  41. arXiv:2407.17788  [pdf, other]

    cs.CR

    PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation

    Authors: Junjie Huang, Quanyan Zhu

    Abstract: Recent advances in Large Language Models (LLMs) have shown significant potential in enhancing cybersecurity defenses against sophisticated threats. LLM-based penetration testing is an essential step in automating system security evaluations by identifying vulnerabilities. Remediation, the subsequent crucial step, addresses these discovered vulnerabilities. Since details about vulnerabilities, expl…

    Submitted 25 July, 2024; originally announced July 2024.

  42. arXiv:2407.17535  [pdf, other]

    cs.AI cs.LG cs.SE

    LAMBDA: A Large Model Based Data Agent

    Authors: Maojun Sun, Ruijian Han, Binyan Jiang, Houduo Qi, Defeng Sun, Yancheng Yuan, Jian Huang

    Abstract: We introduce "LAMBDA," a novel open-source, code-free multi-agent data analysis system that harnesses the power of large models. LAMBDA is designed to address data analysis challenges in complex data-driven applications through the use of innovatively designed data agents that operate iteratively and generatively using natural language. At the core of LAMBDA are two key agent roles: the prog…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 30 pages, 21 figures and 5 tables

    MSC Class: 62-04; 62-08; 68T01; 68T09

  43. arXiv:2407.17164  [pdf, other]

    cs.LG cs.AI

    Robust Deep Hawkes Process under Label Noise of Both Event and Occurrence

    Authors: Xiaoyu Tan, Bin Li, Xihe Qiu, Jingjing Huang, Yinghui Xu, Wei Chu

    Abstract: Integrating deep neural networks with the Hawkes process has significantly improved predictive capabilities in finance, health informatics, and information technology. Nevertheless, these models often face challenges in real-world settings, particularly due to substantial label noise. This issue is of significant concern in the medical field, where label noise can arise from delayed updates in ele…

    Submitted 29 July, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

    Comments: ECAI2024

  44. arXiv:2407.17150  [pdf, other]

    cs.CL cs.SE

    SimCT: A Simple Consistency Test Protocol in LLMs Development Lifecycle

    Authors: Fufangchen Zhao, Guoqiang Jin, Rui Zhao, Jiangheng Huang, Fei Tan

    Abstract: In this work, we report our efforts to advance the standard operation procedure of developing Large Language Models (LLMs) or LLMs-based systems or services in industry. We introduce the concept of Large Language Model Development Lifecycle (LDLC) and then highlight the importance of consistency test in ensuring the delivery quality. The principled solution of consistency test, however, is usually…

    Submitted 8 August, 2024; v1 submitted 24 July, 2024; originally announced July 2024.

  45. Cascaded two-stage feature clustering and selection via separability and consistency in fuzzy decision systems

    Authors: Yuepeng Chen, Weiping Ding, Hengrong Ju, Jiashuang Huang, Tao Yin

    Abstract: Feature selection is a vital technique in machine learning, as it can reduce computational complexity, improve model performance, and mitigate the risk of overfitting. However, the increasing complexity and dimensionality of datasets pose significant challenges in the selection of features. Focusing on these challenges, this paper proposes a cascaded two-stage feature clustering and selection algo…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by IEEE Transactions on Fuzzy Systems for publication. Permission from IEEE must be obtained for all other uses, in any current or future media. The final version is available at [10.1109/TFUZZ.2024.3420963]

    Journal ref: IEEE Transactions on Fuzzy Systems 2024

  46. arXiv:2407.15693  [pdf, ps, other]

    math.AP cs.LG math.FA math.ST

    Fisher-Rao Gradient Flow: Geodesic Convexity and Functional Inequalities

    Authors: José A. Carrillo, Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Dongyi Wei

    Abstract: The dynamics of probability density functions has been extensively studied in science and engineering to understand physical phenomena and facilitate algorithmic design. Of particular interest are dynamics that can be formulated as gradient flows of energy functionals under the Wasserstein metric. The development of functional inequalities, such as the log-Sobolev inequality, plays a pivotal role…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 38 pages

  47. arXiv:2407.15420  [pdf, other]

    cs.CV

    Local All-Pair Correspondence for Point Tracking

    Authors: Seokju Cho, Jiahui Huang, Jisu Nam, Honggyu An, Seungryong Kim, Joon-Young Lee

    Abstract: We introduce LocoTrack, a highly accurate and efficient model designed for the task of tracking any point (TAP) across video sequences. Previous approaches in this task often rely on local 2D correlation maps to establish correspondences from a point in the query image to a local region in the target image, which often struggle with homogeneous regions or repetitive features, leading to matching a…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project page: https://ku-cvlab.github.io/locotrack Code: https://github.com/KU-CVLAB/locotrack

  48. FMDNN: A Fuzzy-guided Multi-granular Deep Neural Network for Histopathological Image Classification

    Authors: Weiping Ding, Tianyi Zhou, Jiashuang Huang, Shu Jiang, Tao Hou, Chin-Teng Lin

    Abstract: Histopathological image classification constitutes a pivotal task in computer-aided diagnostics. The precise identification and categorization of histopathological images are of paramount significance for early disease detection and treatment. In the diagnostic process of pathologists, a multi-tiered approach is typically employed to assess abnormalities in cell regions at different magnifications…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by IEEE Transactions on Fuzzy Systems for publication. Permission from IEEE must be obtained for all other uses, in any current or future media. The final version is available at [doi: 10.1109/TFUZZ.2024.3410929]

    Journal ref: IEEE Transactions on Fuzzy Systems (Early Access) 2024

  49. arXiv:2407.14804  [pdf, other]

    cs.CR

    WiFaKey: Generating Cryptographic Keys from Face in the Wild

    Authors: Xingbo Dong, Hui Zhang, Yen Lung Lai, Zhe Jin, Junduan Huang, Wenxiong Kang, Andrew Beng Jin Teoh

    Abstract: Deriving a unique cryptographic key from biometric measurements is a challenging task due to the existing noise gap between the biometric measurements and error correction coding. Additionally, privacy and security concerns arise as biometric measurements are inherently linked to the user. Biocryptosystems represent a key branch of solutions aimed at addressing these issues. However, many existing…

    Submitted 20 July, 2024; originally announced July 2024.

  50. arXiv:2407.14754  [pdf, other]

    eess.IV cs.CV

    Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures

    Authors: Jiaxing Huang, Yanfeng Zhou, Yaoru Luo, Guole Liu, Heng Guo, Ge Yang

    Abstract: Accurate segmentation of long and thin tubular structures is required in a wide variety of areas such as biology, medicine, and remote sensing. The complex topology and geometry of such structures often pose significant technical challenges. A fundamental property of such structures is their topological self-similarity, which can be quantified by fractal features such as fractal dimension (FD). In…

    Submitted 20 July, 2024; originally announced July 2024.