Showing 1–50 of 1,746 results for author: Wang, D

Searching in archive cs.
  1. arXiv:2408.10681  [pdf, other]

    cs.CL cs.LG

    HMoE: Heterogeneous Mixture of Experts for Language Modeling

    Authors: An Wang, Xingwu Sun, Ruobing Xie, Shuaipeng Li, Jiaqi Zhu, Zhen Yang, Pinxue Zhao, J. N. Han, Zhanhui Kang, Di Wang, Naoaki Okazaki, Cheng-zhong Xu

    Abstract: Mixture of Experts (MoE) offers remarkable performance and computational efficiency by selectively activating subsets of model parameters. Traditionally, MoE models use homogeneous experts, each with identical capacity. However, varying complexity in input data necessitates experts with diverse capabilities, while homogeneous MoE hinders effective expert specialization and efficient parameter util…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.10588  [pdf, other]

    cs.CV cs.GR

    DEGAS: Detailed Expressions on Full-Body Gaussian Avatars

    Authors: Zhijing Shao, Duotun Wang, Qing-Yao Tian, Yao-Dong Yang, Hengyu Meng, Zeyu Cai, Bo Dong, Yu Zhang, Kang Zhang, Zeyu Wang

    Abstract: Although neural rendering has made significant advancements in creating lifelike, animatable full-body and head avatars, incorporating detailed expressions into full-body avatars remains largely unexplored. We present DEGAS, the first 3D Gaussian Splatting (3DGS)-based modeling method for full-body avatars with rich facial expressions. Trained on multiview videos of a given subject, our method lea…

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.09916  [pdf, other]

    cs.CV cs.CL

    Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit

    Authors: Qizhou Chen, Taolin Zhang, Chengyu Wang, Xiaofeng He, Dakan Wang, Tingting Liu

    Abstract: Model editing aims to correct outdated or erroneous knowledge in large models without costly retraining. Recent research discovered that the mid-layer representation of the subject's final token in a prompt has a strong influence on factual predictions, and developed Large Language Model (LLM) editing techniques based on this observation. However, for Vision-LLMs (VLLMs), how visual representation…

    Submitted 19 August, 2024; originally announced August 2024.

  4. arXiv:2408.09839  [pdf, other]

    cs.CV cs.AI

    Segment-Anything Models Achieve Zero-shot Robustness in Autonomous Driving

    Authors: Jun Yan, Pengyu Wang, Danni Wang, Weiquan Huang, Daniel Watzenig, Huilin Yin

    Abstract: Semantic segmentation is a significant perception task in autonomous driving. It suffers from the risks of adversarial examples. In the past few years, deep learning has gradually transitioned from convolutional neural network (CNN) models with a relatively small number of parameters to foundation models with a huge number of parameters. The segment-anything model (SAM) is a generalized image segm…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: Accepted to IAVVC 2024

  5. arXiv:2408.09326  [pdf, other]

    cs.CL cs.AI cs.SE

    Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks

    Authors: Kexin Chen, Yi Liu, Dongxia Wang, Jiaying Chen, Wenhai Wang

    Abstract: Large Language Models (LLMs) have increasingly become pivotal in content generation with notable societal impact. These models hold the potential to generate content that could be deemed harmful. Efforts to mitigate this risk include implementing safeguards to ensure LLMs adhere to social ethics. However, despite such measures, the phenomenon of "jailbreaking" -- where carefully crafted prompts elic…

    Submitted 17 August, 2024; originally announced August 2024.

  6. arXiv:2408.08841  [pdf, other]

    cs.CL

    FLEXTAF: Enhancing Table Reasoning with Flexible Tabular Formats

    Authors: Xuanliang Zhang, Dingzirui Wang, Longxu Dou, Baoxin Wang, Dayong Wu, Qingfu Zhu, Wanxiang Che

    Abstract: The table reasoning task aims to answer the question according to the given table. Currently, using Large Language Models (LLMs) is the predominant method for table reasoning. Most existing methods employ a fixed tabular format to represent the table, which could limit the performance. Given that each instance requires different capabilities and models possess varying abilities, we assert that dif…

    Submitted 16 August, 2024; originally announced August 2024.

  7. arXiv:2408.08779  [pdf, other]

    cs.CL

    DAC: Decomposed Automation Correction for Text-to-SQL

    Authors: Dingzirui Wang, Longxu Dou, Xuanliang Zhang, Qingfu Zhu, Wanxiang Che

    Abstract: Text-to-SQL is an important task that helps people obtain information from databases by automatically generating SQL queries. Considering the brilliant performance, approaches based on Large Language Models (LLMs) become the mainstream for text-to-SQL. Among these approaches, automated correction is an effective approach that further enhances performance by correcting the mistakes in the generated…

    Submitted 16 August, 2024; originally announced August 2024.

  8. arXiv:2408.08724  [pdf, other]

    cs.CL

    ChatZero:Zero-shot Cross-Lingual Dialogue Generation via Pseudo-Target Language

    Authors: Yongkang Liu, Feng Shi, Daling Wang, Yifei Zhang, Hinrich Schütze

    Abstract: Although large language models (LLMs) show amazing capabilities, among various exciting applications discovered for LLMs fall short in other low-resource languages. Besides, most existing methods depend on large-scale dialogue corpora and thus building systems for dialogue generation in a zero-shot scenario remains a considerable challenge. To address this challenge, we propose a novel end-to-end z…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: ECAI2024

    Journal ref: ECAI2024

  9. arXiv:2408.08703  [pdf, other]

    cs.CV

    TsCA: On the Semantic Consistency Alignment via Conditional Transport for Compositional Zero-Shot Learning

    Authors: Miaoge Li, Jingcai Guo, Richard Yi Da Xu, Dongsheng Wang, Xiaofeng Cao, Song Guo

    Abstract: Compositional Zero-Shot Learning (CZSL) aims to recognize novel state-object compositions by leveraging the shared knowledge of their primitive components. Despite considerable progress, effectively calibrating the bias between semantically similar multimodal representations, as well as generalizing pre-trained knowledge to novel compositional contexts, remains an enduring challenge. In t…

    Submitted 16 August, 2024; originally announced August 2024.

  10. arXiv:2408.08515  [pdf, other]

    cs.SE

    Selecting Initial Seeds for Better JVM Fuzzing

    Authors: Tianchang Gao, Junjie Chen, Dong Wang, Yile Guo, Yingquan Zhao, Zan Wang

    Abstract: Literature in traditional program fuzzing has confirmed that effectiveness is largely impacted by redundancy among initial seeds, thereby proposing a series of seed selection methods. JVM fuzzing, compared to traditional ones, presents unique characteristics, including large-scale and intricate code, and programs with both syntactic and semantic features. However, it remains unclear whether the ex…

    Submitted 16 August, 2024; originally announced August 2024.

  11. arXiv:2408.07889  [pdf, other]

    cs.CV

    MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking

    Authors: Simiao Lai, Chang Liu, Jiawen Zhu, Ben Kang, Yang Liu, Dong Wang, Huchuan Lu

    Abstract: Existing RGB-T tracking algorithms have made remarkable progress by leveraging the global interaction capability and extensive pre-trained models of the Transformer architecture. Nonetheless, these methods mainly adopt image-pair appearance matching and face challenges of the intrinsic high quadratic complexity of the attention mechanism, resulting in constrained exploitation of temporal informatio…

    Submitted 14 August, 2024; originally announced August 2024.

  12. arXiv:2408.07875  [pdf, other]

    cs.LG stat.ML

    Incremental Structure Discovery of Classification via Sequential Monte Carlo

    Authors: Changze Huang, Di Wang

    Abstract: Gaussian Processes (GPs) provide a powerful framework for making predictions and understanding uncertainty for classification with kernels and Bayesian non-parametric learning. Building such models typically requires strong prior knowledge to define preselect kernels, which could be ineffective for online applications of classification that sequentially process data because features of data may sh…

    Submitted 14 August, 2024; originally announced August 2024.

  13. arXiv:2408.07685  [pdf, ps, other]

    cs.GT

    Auto-bidding and Auctions in Online Advertising: A Survey

    Authors: Gagan Aggarwal, Ashwinkumar Badanidiyuru, Santiago R. Balseiro, Kshipra Bhawalkar, Yuan Deng, Zhe Feng, Gagan Goel, Christopher Liaw, Haihao Lu, Mohammad Mahdian, Jieming Mao, Aranyak Mehta, Vahab Mirrokni, Renato Paes Leme, Andres Perlroth, Georgios Piliouras, Jon Schneider, Ariel Schvartzman, Balasubramanian Sivan, Kelly Spendlove, Yifeng Teng, Di Wang, Hanrui Zhang, Mingfei Zhao, Wennan Zhu, et al. (1 additional author not shown)

    Abstract: In this survey, we summarize recent developments in research fueled by the growing adoption of automated bidding strategies in online advertising. We explore the challenges and opportunities that have arisen as markets embrace this autobidding and cover a range of topics in this area, including bidding algorithms, equilibrium analysis and efficiency of common auction formats, and optimal auction d…

    Submitted 14 August, 2024; originally announced August 2024.

  14. arXiv:2408.07402  [pdf, other]

    cs.CL cs.AI cs.LO quant-ph

    A Quantum-Inspired Analysis of Human Disambiguation Processes

    Authors: Daphne Wang

    Abstract: Formal languages are essential for computer programming and are constructed to be easily processed by computers. In contrast, natural languages are much more challenging and instigated the field of Natural Language Processing (NLP). One major obstacle is the ubiquity of ambiguities. Recent advances in NLP have led to the development of large language models, which can resolve ambiguities with high…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: PhD thesis

  15. arXiv:2408.07302  [pdf]

    cs.CY cs.CL cs.HC

    Effects of a Prompt Engineering Intervention on Undergraduate Students' AI Self-Efficacy, AI Knowledge and Prompt Engineering Ability: A Mixed Methods Study

    Authors: David James Woo, Deliang Wang, Tim Yung, Kai Guo

    Abstract: Prompt engineering is critical for effective interaction with large language models (LLMs) such as ChatGPT. However, efforts to teach this skill to students have been limited. This study designed and implemented a prompt engineering intervention, examining its influence on undergraduate students' AI self-efficacy, AI knowledge, and proficiency in creating effective prompts. The intervention involv…

    Submitted 30 July, 2024; originally announced August 2024.

    Comments: 34 pages, 6 figures

  16. arXiv:2408.06665  [pdf, ps, other]

    cs.LG cs.AI

    RW-NSGCN: A Robust Approach to Structural Attacks via Negative Sampling

    Authors: Shuqi He, Jun Zhuang, Ding Wang, Jun Song

    Abstract: Node classification using Graph Neural Networks (GNNs) has been widely applied in various practical scenarios, such as predicting user interests and detecting communities in social networks. However, recent studies have shown that graph-structured networks often contain potential noise and attacks, in the form of topological perturbations and weight disturbances, which can lead to decreased classi…

    Submitted 13 August, 2024; originally announced August 2024.

  17. arXiv:2408.06481  [pdf, other]

    cs.RO

    UniT: Unified Tactile Representation for Robot Learning

    Authors: Zhengtong Xu, Raghava Uppuluri, Xinwei Zhang, Cael Fitch, Philip Glen Crandall, Wan Shou, Dongyi Wang, Yu She

    Abstract: UniT is a novel approach to tactile representation learning, using VQVAE to learn a compact latent space and serve as the tactile representation. It uses tactile images obtained from a single simple object to train the representation with transferability and generalizability. This tactile representation can be zero-shot transferred to various downstream tasks, including perception tasks and manipu…

    Submitted 12 August, 2024; originally announced August 2024.

  18. arXiv:2408.05107  [pdf, other]

    cs.RO

    Depth Helps: Improving Pre-trained RGB-based Policy with Depth Information Injection

    Authors: Xincheng Pang, Wenke Xia, Zhigang Wang, Bin Zhao, Di Hu, Dong Wang, Xuelong Li

    Abstract: 3D perception ability is crucial for generalizable robotic manipulation. While recent foundation models have made significant strides in perception and decision-making with RGB-based input, their lack of 3D perception limits their effectiveness in fine-grained robotic manipulation tasks. To address these limitations, we propose a Depth Information Injection (DI²) framework that…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: accepted by IROS 2024

  19. arXiv:2408.05019  [pdf, other]

    cs.CV

    Instruction Tuning-free Visual Token Complement for Multimodal LLMs

    Authors: Dongsheng Wang, Jiequan Cui, Miaoge Li, Wang Lin, Bo Chen, Hanwang Zhang

    Abstract: As the open community of large language models (LLMs) matures, multimodal LLMs (MLLMs) have promised an elegant bridge between vision and language. However, current research is inherently constrained by challenges such as the need for high-quality instruction pairs and the loss of visual information in image-to-text training objectives. To this end, we propose a Visual Token Complement framework (…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: Accepted by ECCV 2024 (20 pages)

  20. arXiv:2408.04901  [pdf, other]

    cs.RO

    CTE-MLO: Continuous-time and Efficient Multi-LiDAR Odometry with Localizability-aware Point Cloud Sampling

    Authors: Hongming Shen, Zhenyu Wu, Wei Wang, Qiyang Lyu, Huiqin Zhou, Tianchen Deng, Yeqing Zhu, Danwei Wang

    Abstract: In recent years, LiDAR-based localization and mapping methods have achieved significant progress thanks to their reliable and real-time localization capability. Considering single LiDAR odometry often faces hardware failures and degradation in practical scenarios, Multi-LiDAR Odometry (MLO), as an emerging technology, is studied to enhance the performance of LiDAR-based localization and mapping sy…

    Submitted 9 August, 2024; originally announced August 2024.

  21. arXiv:2408.04638  [pdf, other]

    cs.CL cs.CY

    Affective Computing in the Era of Large Language Models: A Survey from the NLP Perspective

    Authors: Yiqun Zhang, Xiaocui Yang, Xingle Xu, Zeran Gao, Yijie Huang, Shiyi Mu, Shi Feng, Daling Wang, Yifei Zhang, Kaisong Song, Ge Yu

    Abstract: Affective Computing (AC), integrating computer science, psychology, and cognitive science knowledge, aims to enable machines to recognize, interpret, and simulate human emotions. To create more value, AC can be applied to diverse scenarios, including social media, finance, healthcare, education, etc. Affective Computing (AC) includes two mainstream tasks, i.e., Affective Understanding (AU) and Affe…

    Submitted 30 July, 2024; originally announced August 2024.

  22. arXiv:2408.04472  [pdf, other]

    cs.CL

    Can LLMs Beat Humans in Debating? A Dynamic Multi-agent Framework for Competitive Debate

    Authors: Yiqun Zhang, Xiaocui Yang, Shi Feng, Daling Wang, Yifei Zhang, Kaisong Song

    Abstract: Competitive debate is a complex task of computational argumentation. Large Language Models (LLMs) suffer from hallucinations and lack competitiveness in this field. To address these challenges, we introduce Agent for Debate (Agent4Debate), a dynamic multi-agent framework based on LLMs designed to enhance their capabilities in competitive debate. Drawing inspiration from human behavior in debate pr…

    Submitted 20 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

    Comments: 12 pages (including appendix), 7 figures

  23. arXiv:2408.03586  [pdf, other]

    cs.HC

    Clinical Challenges and AI Opportunities in Decision-Making for Cancer Treatment-Induced Cardiotoxicity

    Authors: Siyi Wu, Weidan Cao, Shihan Fu, Bingsheng Yao, Ziqi Yang, Changchang Yin, Varun Mishra, Daniel Addison, Ping Zhang, Dakuo Wang

    Abstract: Cardiotoxicity induced by cancer treatment has become a major clinical concern, affecting the long-term survival and quality of life of cancer patients. Effective clinical decision-making, including the detection of cancer treatment-induced cardiotoxicity and the monitoring of associated symptoms, remains a challenging task for clinicians. This study investigates the current practices and needs of…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: In Submission

  24. arXiv:2408.03482  [pdf, other]

    cs.CR

    Beyond App Markets: Demystifying Underground Mobile App Distribution Via Telegram

    Authors: Yanhui Guo, Dong Wang, Liu Wang, Yongsheng Fang, Chao Wang, Minghui Yang, Tianming Liu, Haoyu Wang

    Abstract: The thriving mobile app ecosystem encompasses a wide range of functionalities. However, within this ecosystem, a subset of apps provides illicit services such as gambling and pornography to pursue economic gains, collectively referred to as "underground economy apps". While previous studies have examined these apps' characteristics and identification methods, investigations into their distribution…

    Submitted 6 August, 2024; originally announced August 2024.

  25. arXiv:2408.02912  [pdf, other]

    cs.RO cs.AI

    KOI: Accelerating Online Imitation Learning via Hybrid Key-state Guidance

    Authors: Jingxian Lu, Wenke Xia, Dong Wang, Zhigang Wang, Bin Zhao, Di Hu, Xuelong Li

    Abstract: Online Imitation Learning methods struggle with the gap between extensive online exploration space and limited expert trajectories, which hinder efficient exploration due to inaccurate task-aware reward estimation. Inspired by the findings from cognitive neuroscience that task decomposition could facilitate cognitive processing for efficient learning, we hypothesize that an agent could estimate pr…

    Submitted 8 August, 2024; v1 submitted 5 August, 2024; originally announced August 2024.

  26. arXiv:2408.02865  [pdf, other]

    eess.IV cs.AI cs.CL cs.CV

    VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge

    Authors: Zihan Li, Diping Song, Zefeng Yang, Deming Wang, Fei Li, Xiulan Zhang, Paul E. Kinahan, Yu Qiao

    Abstract: The need for improved diagnostic methods in ophthalmology is acute, especially in the less developed regions with limited access to specialists and advanced equipment. Therefore, we introduce VisionUnite, a novel vision-language foundation model for ophthalmology enhanced with clinical knowledge. VisionUnite has been pretrained on an extensive dataset comprising 1.24 million image-text pairs, and…

    Submitted 5 August, 2024; originally announced August 2024.

  27. arXiv:2408.02814  [pdf, other]

    cs.LG cs.CR

    Pre-trained Encoder Inference: Revealing Upstream Encoders In Downstream Machine Learning Services

    Authors: Shaopeng Fu, Xuexue Sun, Ke Qing, Tianhang Zheng, Di Wang

    Abstract: Though pre-trained encoders can be easily accessed online to build downstream machine learning (ML) services quickly, various attacks have been designed to compromise the security and privacy of these encoders. While most attacks target encoders on the upstream side, it remains unknown how an encoder could be threatened when deployed in a downstream ML service. This paper unveils a new vulnerabili…

    Submitted 5 August, 2024; originally announced August 2024.

  28. arXiv:2408.02262  [pdf, other]

    cs.SE

    Towards Identifying Code Proficiency through the Analysis of Python Textbooks

    Authors: Ruksit Rojpaisarnkit, Gregorio Robles, Raula Gaikovina Kula, Dong Wang, Chaiyong Ragkhitwetsagul, Jesus M. Gonzalez-Barahona, Kenichi Matsumoto

    Abstract: Python, one of the most prevalent programming languages today, is widely utilized in various domains, including web development, data science, machine learning, and DevOps. Recent scholarly efforts have proposed a methodology to assess Python competence levels, similar to how proficiency in natural languages is evaluated. This method involves assigning levels of competence to Python constructs, fo…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: 12 pages, 7 figures, 6 tables, ICSME2024

    ACM Class: D.2.0; D.2.7

  29. arXiv:2408.01760  [pdf, other]

    cs.SE

    Large Language Models for Equivalent Mutant Detection: How Far Are We?

    Authors: Zhao Tian, Honglin Shu, Dong Wang, Xuejie Cao, Yasutaka Kamei, Junjie Chen

    Abstract: Mutation testing is vital for ensuring software quality. However, the presence of equivalent mutants is known to introduce redundant cost and bias issues, hindering the effectiveness of mutation testing in practical use. Although numerous equivalent mutant detection (EMD) techniques have been proposed, they exhibit limitations due to the scarcity of training data and challenges in generalizing to…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by ISSTA'2024

  30. arXiv:2408.01431  [pdf]

    cs.CY cs.AI

    Building an Ethical and Trustworthy Biomedical AI Ecosystem for the Translational and Clinical Integration of Foundational Models

    Authors: Simha Sankar Baradwaj, Destiny Gilliland, Jack Rincon, Henning Hermjakob, Yu Yan, Irsyad Adam, Gwyneth Lemaster, Dean Wang, Karol Watson, Alex Bui, Wei Wang, Peipei Ping

    Abstract: Foundational Models (FMs) are gaining increasing attention in the biomedical AI ecosystem due to their ability to represent and contextualize multimodal biomedical data. These capabilities make FMs a valuable tool for a variety of tasks, including biomedical reasoning, hypothesis generation, and interpreting complex imaging data. In this review paper, we address the unique challenges associated wi…

    Submitted 13 August, 2024; v1 submitted 18 July, 2024; originally announced August 2024.

    Comments: 3 figures, 3 tables

  31. arXiv:2408.01091  [pdf, other]

    cs.AI

    Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions

    Authors: Jin Gao, Lei Gan, Yuankai Li, Yixin Ye, Dequan Wang

    Abstract: Large multimodal models (LMMs) excel in adhering to human instructions. However, self-contradictory instructions may arise due to the increasing trend of multimodal interaction and context length, which is challenging for language beginners and vulnerable populations. We introduce the Self-Contradictory Instructions benchmark to evaluate the capability of LMMs in recognizing conflicting commands.…

    Submitted 5 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: Accepted by the 18th European Conference on Computer Vision ECCV 2024

  32. arXiv:2408.00230  [pdf, other]

    cs.AI cs.CL

    Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models

    Authors: Juntu Zhao, Junyu Deng, Yixin Ye, Chongxuan Li, Zhijie Deng, Dequan Wang

    Abstract: Advancements in text-to-image diffusion models have broadened extensive downstream practical applications, but such models often encounter misalignment issues between text and image. Taking the generation of a combination of two disentangled concepts as an example, say given the prompt "a tea cup of iced coke", existing models usually generate a glass cup of iced coke because the iced coke usually…

    Submitted 5 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

    Comments: Accepted by the 18th European Conference on Computer Vision ECCV 2024

  33. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  34. arXiv:2407.20499  [pdf, other]

    cs.LG

    Optimizing Long-tailed Link Prediction in Graph Neural Networks through Structure Representation Enhancement

    Authors: Yakun Wang, Daixin Wang, Hongrui Liu, Binbin Hu, Yingcui Yan, Qiyang Zhang, Zhiqiang Zhang

    Abstract: Link prediction, as a fundamental task for graph neural networks (GNNs), has boasted significant progress in varied domains. Its success is typically influenced by the expressive power of node representation, but recent developments reveal the inferior performance of low-degree nodes owing to their sparse neighbor connections, known as the degree-based long-tailed problem. Will the degree-based lo…

    Submitted 29 July, 2024; originally announced July 2024.

  35. arXiv:2407.19053  [pdf, other]

    cs.SE

    A Study of Using Multimodal LLMs for Non-Crash Functional Bug Detection in Android Apps

    Authors: Bangyan Ju, Jin Yang, Tingting Yu, Tamerlan Abdullayev, Yuanyuan Wu, Dingbang Wang, Yu Zhao

    Abstract: Numerous approaches employing various strategies have been developed to test the graphical user interfaces (GUIs) of mobile apps. However, traditional GUI testing techniques, such as random and model-based testing, primarily focus on generating test sequences that excel in achieving high code coverage but often fail to act as effective test oracles for non-crash functional (NCF) bug detection. To…

    Submitted 26 July, 2024; originally announced July 2024.

  36. arXiv:2407.18526  [pdf, other]

    cs.LG

    Constructing Enhanced Mutual Information for Online Class-Incremental Learning

    Authors: Huan Zhang, Fan Lyu, Shenghua Fan, Yujin Zheng, Dingwen Wang

    Abstract: Online Class-Incremental continual Learning (OCIL) addresses the challenge of continuously learning from a single-channel data stream, adapting to new tasks while mitigating catastrophic forgetting. Recently, Mutual Information (MI)-based methods have shown promising performance in OCIL. However, existing MI-based methods treat various knowledge components in isolation, ignoring the knowledge conf…

    Submitted 26 July, 2024; originally announced July 2024.

  37. arXiv:2407.18324  [pdf, other]

    cs.LG cs.CL eess.AS q-fin.CP q-fin.ST

    AMA-LSTM: Pioneering Robust and Fair Financial Audio Analysis for Stock Volatility Prediction

    Authors: Shengkun Wang, Taoran Ji, Jianfeng He, Mariam Almutairi, Dan Wang, Linhan Wang, Min Zhang, Chang-Tien Lu

    Abstract: Stock volatility prediction is an important task in the financial industry. Recent advancements in multimodal methodologies, which integrate both textual and auditory data, have demonstrated significant improvements in this domain, such as earnings calls (Earnings calls are publicly available and often involve the management team of a public company and interested parties to discuss the company's ea…

    Submitted 3 July, 2024; originally announced July 2024.

  38. arXiv:2407.16999  [pdf, other]

    cs.LG cs.AI cs.HC

    SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing

    Authors: Changchang Yin, Pin-Yu Chen, Bingsheng Yao, Dakuo Wang, Jeffrey Caterino, Ping Zhang

    Abstract: Sepsis is the leading cause of in-hospital mortality in the USA. Early sepsis onset prediction and diagnosis could significantly improve the survival of sepsis patients. Existing predictive models are usually trained on high-quality data with few missing information, while missing values widely exist in real-world clinical scenarios (especially in the first hours of admissions to the hospital), wh…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: To be published in KDD 2024

    MSC Class: 68T07 (primary), 92C50 (secondary)

    ACM Class: H.2.8; I.2.1; J.3

  39. arXiv:2407.16670  [pdf, other]

    cs.CV cs.CY cs.MM

    FakingRecipe: Detecting Fake News on Short Video Platforms from the Perspective of Creative Process

    Authors: Yuyan Bu, Qiang Sheng, Juan Cao, Peng Qi, Danding Wang, Jintao Li

    Abstract: As short-form video-sharing platforms become a significant channel for news consumption, fake news in short videos has emerged as a serious threat in the online information ecosystem, making developing detection methods for this new scenario an urgent need. Compared with that in text and image formats, fake news on short video platforms contains rich but heterogeneous information in various modali…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Will appear at ACM Multimedia 2024 (MM 2024), 13 pages, 15 figures

  40. arXiv:2407.16634  [pdf, other]

    eess.IV cs.AI cs.CV cs.HC

    Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical…

    Submitted 23 July, 2024; originally announced July 2024.

  41. arXiv:2407.14020  [pdf, other]

    q-bio.NC cs.LG

    NeuroBind: Towards Unified Multimodal Representations for Neural Signals

    Authors: Fengyu Yang, Chao Feng, Daniel Wang, Tianye Wang, Ziyao Zeng, Zhiyang Xu, Hyoungseob Park, Pengliang Ji, Hanbin Zhao, Yuanning Li, Alex Wong

    Abstract: Understanding neural activity and information representation is crucial for advancing knowledge of brain function and cognition. Neural activity, measured through techniques like electrophysiology and neuroimaging, reflects various aspects of information processing. Recent advances in deep neural networks offer new approaches to analyzing these signals using pre-trained models. However, challenges…

    Submitted 19 July, 2024; originally announced July 2024.

  42. arXiv:2407.12888  [pdf]

    cs.CL cs.AI

    Explainable Biomedical Hypothesis Generation via Retrieval Augmented Generation enabled Large Language Models

    Authors: Alexander R. Pelletier, Joseph Ramirez, Irsyad Adam, Simha Sankar, Yu Yan, Ding Wang, Dylan Steinecke, Wei Wang, Peipei Ping

    Abstract: The vast amount of biomedical information available today presents a significant challenge for investigators seeking to digest, process, and understand these findings effectively. Large Language Models (LLMs) have emerged as powerful tools to navigate this complex and challenging data landscape. However, LLMs may lead to hallucinatory responses, making Retrieval Augmented Generation (RAG) crucial…

    Submitted 17 July, 2024; originally announced July 2024.

  43. arXiv:2407.12593  [pdf, other]

    cs.CV

    EvSign: Sign Language Recognition and Translation with Streaming Events

    Authors: Pengyu Zhang, Hao Yin, Zeren Wang, Wenyue Chen, Shengming Li, Dong Wang, Huchuan Lu, Xu Jia

    Abstract: Sign language is one of the most effective communication tools for people with hearing difficulties. Most existing works focus on improving the performance of sign language tasks on RGB videos, which may suffer from degraded recording conditions, such as fast movement of hands with motion blur and textured signer's appearance. The bio-inspired event camera, which asynchronously captures brightness…

    Submitted 21 July, 2024; v1 submitted 17 July, 2024; originally announced July 2024.

    Comments: To appear at ECCV 2024

  44. arXiv:2407.12580  [pdf, other]

    cs.CL cs.CV cs.IR

    E5-V: Universal Embeddings with Multimodal Large Language Models

    Authors: Ting Jiang, Minghui Song, Zihan Zhang, Haizhen Huang, Weiwei Deng, Feng Sun, Qi Zhang, Deqing Wang, Fuzhen Zhuang

    Abstract: Multimodal large language models (MLLMs) have shown promising advancements in general visual and language understanding. However, the representation of multimodal information using MLLMs remains largely unexplored. In this work, we introduce a new framework, E5-V, designed to adapt MLLMs for achieving universal multimodal embeddings. Our findings highlight the significant potential of MLLMs in rep…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Code and models are available at https://github.com/kongds/E5-V

  45. arXiv:2407.12338  [pdf, other]

    cs.IR cs.AI

    GUME: Graphs and User Modalities Enhancement for Long-Tail Multimodal Recommendation

    Authors: Guojiao Lin, Zhen Meng, Dongjie Wang, Qingqing Long, Yuanchun Zhou, Meng Xiao

    Abstract: Multimodal recommendation systems (MMRS) have received considerable attention from the research community due to their ability to jointly utilize information from user behavior and product images and text. Previous research has two main issues. First, many long-tail items in recommendation systems have limited interaction data, making it difficult to learn comprehensive and informative representat…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: 11 pages, accepted by CIKM 2024

  46. The Great AI Witch Hunt: Reviewers Perception and (Mis)Conception of Generative AI in Research Writing

    Authors: Hilda Hadan, Derrick Wang, Reza Hadi Mogavi, Joseph Tu, Leah Zhang-Kennedy, Lennart E. Nacke

    Abstract: Generative AI (GenAI) use in research writing is growing fast. However, it is unclear how peer reviewers recognize or misjudge AI-augmented manuscripts. To investigate the impact of AI-augmented writing on peer reviews, we conducted a snippet-based online survey with 17 peer reviewers from top-tier HCI conferences. Our findings indicate that while AI-augmented writing improves readability, languag…

    Submitted 26 June, 2024; originally announced July 2024.

    Journal ref: ResearchGate 2024

  47. arXiv:2407.11853  [pdf, other]

    cs.ET

    A Case for Application-Aware Space Radiation Tolerance in Orbital Computing

    Authors: Meiqi Wang, Han Qiu, Longnv Xu, Di Wang, Yuanjie Li, Tianwei Zhang, Jun Liu, Hewu Li

    Abstract: We are witnessing a surge in the use of commercial off-the-shelf (COTS) hardware for cost-effective in-orbit computing, such as deep neural network (DNN) based on-satellite sensor data processing, Earth object detection, and task decision. However, once exposed to harsh space environments, COTS hardware is vulnerable to cosmic radiation and suffers from exhaustive single-event upsets (SEUs) and mul…

    Submitted 16 July, 2024; originally announced July 2024.

  48. arXiv:2407.11477  [pdf, other]

    cs.LG cs.AI

    XTraffic: A Dataset Where Traffic Meets Incidents with Explainability and More

    Authors: Xiaochuan Gou, Ziyue Li, Tian Lan, Junpeng Lin, Zhishuai Li, Bingyu Zhao, Chen Zhang, Di Wang, Xiangliang Zhang

    Abstract: Long-separated research has been conducted on two highly correlated tracks: traffic and incidents. Traffic track witnesses complicating deep learning models, e.g., to push the prediction a few percent more accurate, and the incident track only studies the incidents alone, e.g., to infer the incident risk. We, for the first time, spatiotemporally aligned the two tracks in a large-scale region (16,9…

    Submitted 16 July, 2024; originally announced July 2024.

  49. arXiv:2407.11389  [pdf, ps, other]

    cs.NI eess.SP

    Spatial-spectral Cell-free Networks: A Large-scale Case Study

    Authors: Zesheng Zhu, Lifeng Wang, Xin Wang, Dongming Wang, Kai-Kit Wong

    Abstract: This paper studies the large-scale cell-free networks where dense distributed access points (APs) serve many users. As a promising next-generation network architecture, cell-free networks enable ultra-reliable connections and minimal fading/blockage, which are much favorable to the millimeter wave and Terahertz transmissions. However, conventional beam management with large phased arrays in a cell…

    Submitted 16 July, 2024; originally announced July 2024.

  50. arXiv:2407.11038  [pdf, other]

    cs.LG cs.AI cs.NE

    Fuzzy Recurrent Stochastic Configuration Networks for Industrial Data Analytics

    Authors: Dianhui Wang, Gang Dang

    Abstract: This paper presents a novel neuro-fuzzy model, termed fuzzy recurrent stochastic configuration networks (F-RSCNs), for industrial data analytics. Unlike the original recurrent stochastic configuration network (RSCN), the proposed F-RSCN is constructed by multiple sub-reservoirs, and each sub-reservoir is associated with a Takagi-Sugeno-Kang (TSK) fuzzy rule. Through this hybrid framework, first, t…

    Submitted 12 August, 2024; v1 submitted 5 July, 2024; originally announced July 2024.