Showing 1–50 of 1,721 results for author: Wu, Z

Searching in archive cs.
  1. arXiv:2408.09878  [pdf, other]

    cs.CR

    Transferring Backdoors between Large Language Models by Knowledge Distillation

    Authors: Pengzhou Cheng, Zongru Wu, Tianjie Ju, Wei Du, Zhuosheng Zhang, Gongshen Liu

    Abstract: Backdoor Attacks have been a serious vulnerability against Large Language Models (LLMs). However, previous methods only reveal such risk in specific models, or present tasks transferability after attacking the pre-trained phase. So, how risky is the model transferability of a backdoor attack? In this paper, we focus on whether existing mini-LLMs may be unconsciously instructed in backdoor knowledg…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: 13 pages, 16 figures, 5 tables

  2. arXiv:2408.09172  [pdf, other]

    cs.AI cs.CL

    Unc-TTP: A Method for Classifying LLM Uncertainty to Improve In-Context Example Selection

    Authors: Hsiu-Yuan Huang, Zichen Wu, Yutong Yang, Junzhao Zhang, Yunfang Wu

    Abstract: Nowadays, Large Language Models (LLMs) have demonstrated exceptional performance across various downstream tasks. However, it is challenging for users to discern whether the responses are generated with certainty or are fabricated to meet user expectations. Estimating the uncertainty of LLMs is particularly challenging due to their vast scale and the lack of white-box access. In this work, we prop…

    Submitted 20 August, 2024; v1 submitted 17 August, 2024; originally announced August 2024.

    Comments: 9 pages, long paper

  3. arXiv:2408.09070  [pdf, other]

    cs.CL cs.IR

    CodeTaxo: Enhancing Taxonomy Expansion with Limited Examples via Code Language Prompts

    Authors: Qingkai Zeng, Yuyang Bai, Zhaoxuan Tan, Zhenyu Wu, Shangbin Feng, Meng Jiang

    Abstract: Taxonomies play a crucial role in various applications by providing a structural representation of knowledge. The task of taxonomy expansion involves integrating emerging concepts into existing taxonomies by identifying appropriate parent concepts for these new query concepts. Previous approaches typically relied on self-supervised methods that generate annotation data from existing taxonomies. Ho…

    Submitted 16 August, 2024; originally announced August 2024.

  4. arXiv:2408.08870  [pdf, other]

    cs.CV

    SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation

    Authors: Xinyu Xiong, Zihuang Wu, Shuangyi Tan, Wenxue Li, Feilong Tang, Ying Chen, Siying Li, Jie Ma, Guanbin Li

    Abstract: Image segmentation plays an important role in vision understanding. Recently, the emerging vision foundation models continuously achieved superior performance on various tasks. Following such success, in this paper, we prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models. We propose a simple but effective framework, termed SAM2-UNet, for versatile…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Technical Report

  5. arXiv:2408.08852  [pdf, other]

    cs.AI cs.LG

    GeoTransformer: Enhancing Urban Forecasting with Geospatial Attention Mechanisms

    Authors: Yuhao Jia, Zile Wu, Shengao Yi, Yifei Sun

    Abstract: Recent advancements have focused on encoding urban spatial information into high-dimensional spaces, with notable efforts dedicated to integrating sociodemographic data and satellite imagery. These efforts have established foundational models in this field. However, the effective utilization of these spatial representations for urban forecasting applications remains under-explored. To address this…

    Submitted 16 August, 2024; originally announced August 2024.

  6. arXiv:2408.08332  [pdf, other]

    cs.CV cs.LG

    TurboEdit: Instant text-based image editing

    Authors: Zongze Wu, Nicholas Kolkin, Jonathan Brandt, Richard Zhang, Eli Shechtman

    Abstract: We address the challenges of precise image inversion and disentangled image editing in the context of few-step diffusion models. We introduce an encoder based iterative inversion technique. The inversion network is conditioned on the input image and the reconstructed image from the previous step, allowing for correction of the next reconstruction towards the input image. We demonstrate that disent…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted to European Conference on Computer Vision (ECCV), 2024. Project page: https://betterze.github.io/TurboEdit/

  7. arXiv:2408.08152  [pdf, other]

    cs.CL cs.AI cs.LG cs.LO

    DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search

    Authors: Huajian Xin, Z. Z. Ren, Junxiao Song, Zhihong Shao, Wanjia Zhao, Haocheng Wang, Bo Liu, Liyue Zhang, Xuan Lu, Qiushi Du, Wenjun Gao, Qihao Zhu, Dejian Yang, Zhibin Gou, Z. F. Wu, Fuli Luo, Chong Ruan

    Abstract: We introduce DeepSeek-Prover-V1.5, an open-source language model designed for theorem proving in Lean 4, which enhances DeepSeek-Prover-V1 by optimizing both training and inference processes. Pre-trained on DeepSeekMath-Base with specialization in formal mathematical languages, the model undergoes supervised fine-tuning using an enhanced formal theorem proving dataset derived from DeepSeek-Prover-…

    Submitted 15 August, 2024; originally announced August 2024.

  8. arXiv:2408.08088  [pdf, other]

    cs.CR cs.IR

    KGV: Integrating Large Language Models with Knowledge Graphs for Cyber Threat Intelligence Credibility Assessment

    Authors: Zongzong Wu, Fengxiao Tang, Ming Zhao, Yufeng Li

    Abstract: Cyber threat intelligence is a critical tool that many organizations and individuals use to protect themselves from sophisticated, organized, persistent, and weaponized cyber attacks. However, few studies have focused on the quality assessment of threat intelligence provided by intelligence platforms, and this work still requires manual analysis by cybersecurity experts. In this paper, we propose…

    Submitted 15 August, 2024; originally announced August 2024.

  9. arXiv:2408.07543  [pdf, other]

    cs.CV cs.CL

    MathScape: Evaluating MLLMs in multimodal Math Scenarios through a Hierarchical Benchmark

    Authors: Minxuan Zhou, Hao Liang, Tianpeng Li, Zhiyu Wu, Mingan Lin, Linzhuang Sun, Yaqi Zhou, Yan Zhang, Xiaoqin Huang, Yicong Chen, Yujing Qiao, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

    Abstract: With the development of Multimodal Large Language Models (MLLMs), the evaluation of multimodal models in the context of mathematical problems has become a valuable research field. Multimodal visual-textual mathematical reasoning serves as a critical indicator for evaluating the comprehension and complex multi-step quantitative reasoning abilities of MLLMs. However, previous multimodal math benchma…

    Submitted 15 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

  10. arXiv:2408.07084  [pdf]

    cs.LG cs.AI

    Dynamic Hypergraph-Enhanced Prediction of Sequential Medical Visits

    Authors: Wangying Yang, Zitao Zheng, Shi Bo, Zhizhong Wu, Bo Zhang, Yuanfang Yang

    Abstract: This study introduces a pioneering Dynamic Hypergraph Networks (DHCE) model designed to predict future medical diagnoses from electronic health records with enhanced accuracy. The DHCE model innovates by identifying and differentiating acute and chronic diseases within a patient's visit history, constructing dynamic hypergraphs that capture the complex, high-order interactions between diseases. It…

    Submitted 19 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  11. arXiv:2408.05894  [pdf, other]

    cs.CV cs.CL

    GlyphPattern: An Abstract Pattern Recognition for Vision-Language Models

    Authors: Zixuan Wu, Yoolim Kim, Carolyn Jane Anderson

    Abstract: Vision-Language Models (VLMs) building upon the foundation of powerful large language models have made rapid progress in reasoning across visual and textual data. While VLMs perform well on vision tasks that they are trained on, our results highlight key challenges in abstract pattern recognition. We present GlyphPattern, a 954 item dataset that pairs 318 human-written descriptions of visual patte…

    Submitted 11 August, 2024; originally announced August 2024.

  12. arXiv:2408.05517  [pdf, other]

    cs.CL

    SWIFT:A Scalable lightWeight Infrastructure for Fine-Tuning

    Authors: Yuze Zhao, Jintao Huang, Jinghan Hu, Xingjun Wang, Yunlin Mao, Daoze Zhang, Zeyinzi Jiang, Zhikai Wu, Baole Ai, Ang Wang, Wenmeng Zhou, Yingda Chen

    Abstract: Recent developments in Large Language Models (LLMs) and Multi-modal Large Language Models (MLLMs) have leveraged Attention-based Transformer architectures and achieved superior performance and generalization capabilities. They have since covered extensive areas of traditional learning tasks. For instance, text-based tasks such as text-classification and sequence-labeling, as well as multi-modal task…

    Submitted 18 August, 2024; v1 submitted 10 August, 2024; originally announced August 2024.

  13. arXiv:2408.04901  [pdf, other]

    cs.RO

    CTE-MLO: Continuous-time and Efficient Multi-LiDAR Odometry with Localizability-aware Point Cloud Sampling

    Authors: Hongming Shen, Zhenyu Wu, Wei Wang, Qiyang Lyu, Huiqin Zhou, Tianchen Deng, Yeqing Zhu, Danwei Wang

    Abstract: In recent years, LiDAR-based localization and mapping methods have achieved significant progress thanks to their reliable and real-time localization capability. Considering single LiDAR odometry often faces hardware failures and degradation in practical scenarios, Multi-LiDAR Odometry (MLO), as an emerging technology, is studied to enhance the performance of LiDAR-based localization and mapping sy…

    Submitted 9 August, 2024; originally announced August 2024.

  14. arXiv:2408.04325  [pdf, other]

    eess.AS cs.CL

    HydraFormer: One Encoder For All Subsampling Rates

    Authors: Yaoxun Xu, Xingchen Song, Zhiyong Wu, Di Wu, Zhendong Peng, Binbin Zhang

    Abstract: In automatic speech recognition, subsampling is essential for tackling diverse scenarios. However, the inadequacy of a single subsampling rate to address various real-world situations often necessitates training and deploying multiple models, consequently increasing associated costs. To address this issue, we propose HydraFormer, comprising HydraSub, a Conformer-based encoder, and a BiTransformer-…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: accepted by ICME 2024

  15. Tackling Noisy Clients in Federated Learning with End-to-end Label Correction

    Authors: Xuefeng Jiang, Sheng Sun, Jia Li, Jingjing Xue, Runhan Li, Zhiyuan Wu, Gang Xu, Yuwei Wang, Min Liu

    Abstract: Recently, federated learning (FL) has achieved wide successes for diverse privacy-sensitive applications without sacrificing the sensitive private information of clients. However, the data quality of client datasets cannot be guaranteed since corresponding annotations of different clients often contain complex label noise of varying degrees, which inevitably causes performance degradation. In…

    Submitted 8 August, 2024; originally announced August 2024.

    Comments: To appear in ACM CIKM'24 full research paper track

  16. arXiv:2408.04299  [pdf, other]

    cs.CV

    Respiratory Subtraction for Pulmonary Microwave Ablation Evaluation

    Authors: Wan Li, Xinyun Zhong, Wei Li, Song Zhang, Moheng Rong, Yan Xi, Peng Yuan, Zechen Wang, Xiaolei Jiang, Rongxi Yi, Hui Tang, Yang Chen, Chaohui Tong, Zhan Wu, Feng Wang

    Abstract: Currently, lung cancer is a leading cause of global cancer mortality, often necessitating minimally invasive interventions. Microwave ablation (MWA) is extensively utilized for both primary and secondary lung tumors. Although numerous clinical guidelines and standards for MWA have been established, the clinical evaluation of ablation surgery remains challenging and requires long-term patient follo…

    Submitted 8 August, 2024; originally announced August 2024.

  17. arXiv:2408.03097  [pdf, other]

    cs.CV

    Prototype Learning for Micro-gesture Classification

    Authors: Guoliang Chen, Fei Wang, Kun Li, Zhiliang Wu, Hehe Fan, Yi Yang, Meng Wang, Dan Guo

    Abstract: In this paper, we briefly introduce the solution developed by our team, HFUT-VUT, for the track of Micro-gesture Classification in the MiGA challenge at IJCAI 2024. The micro-gesture classification task involves recognizing the category of a given video clip, which focuses on more fine-grained and subtle body movements compared to typical action recognition tasks. Given the inherent comple…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 1st Place in Micro-gesture Classification in MiGA at IJCAI-2024

  18. arXiv:2408.03017  [pdf, other]

    cs.RO

    Closed-Loop Magnetic Control of Medical Soft Continuum Robots for Deflection

    Authors: Zhiwei Wu, Siyi Wei, Zhanxin Geng, Jinhui Zhang, Duanduan Chen

    Abstract: Magnetic soft continuum robots (MSCRs) have emerged as powerful devices in endovascular interventions owing to their hyperelastic fibre matrix and enhanced magnetic manipulability. Effective closed-loop control of tethered magnetic devices contributes to the achievement of autonomous vascular robotic surgery. In this article, we employ a magnetic actuation system equipped with a single rotatable p…

    Submitted 6 August, 2024; originally announced August 2024.

  19. arXiv:2408.01946  [pdf, other]

    cs.CV

    Masked Angle-Aware Autoencoder for Remote Sensing Images

    Authors: Zhihao Li, Biao Hou, Siteng Ma, Zitong Wu, Xianpeng Guo, Bo Ren, Licheng Jiao

    Abstract: To overcome the inherent domain gap between remote sensing (RS) images and natural images, some self-supervised representation learning methods have made promising progress. However, they have overlooked the diverse angles present in RS objects. This paper proposes the Masked Angle-Aware Autoencoder (MA3E) to perceive and learn angles during pre-training. We design a scaling center crop o…

    Submitted 4 August, 2024; originally announced August 2024.

    Comments: This paper has been accepted by ECCV 2024

  20. arXiv:2408.01705  [pdf, other]

    cs.CV cs.AI

    Downstream Transfer Attack: Adversarial Attacks on Downstream Models with Pre-trained Vision Transformers

    Authors: Weijie Zheng, Xingjun Ma, Hanxun Huang, Zuxuan Wu, Yu-Gang Jiang

    Abstract: With the advancement of vision transformers (ViTs) and self-supervised learning (SSL) techniques, pre-trained large ViTs have become the new foundation models for computer vision applications. However, studies have shown that, like convolutional neural networks (CNNs), ViTs are also susceptible to adversarial attacks, where subtle perturbations in the input can fool the model into making false pre…

    Submitted 3 August, 2024; originally announced August 2024.

  21. arXiv:2408.01319  [pdf, other]

    cs.AI

    A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks

    Authors: Jiaqi Wang, Hanqi Jiang, Yiheng Liu, Chong Ma, Xu Zhang, Yi Pan, Mengyuan Liu, Peiran Gu, Sichen Xia, Wenjun Li, Yutong Zhang, Zihao Wu, Zhengliang Liu, Tianyang Zhong, Bao Ge, Tuo Zhang, Ning Qiang, Xintao Hu, Xi Jiang, Xin Zhang, Wei Zhang, Dinggang Shen, Tianming Liu, Shu Zhang

    Abstract: In an era defined by the explosive growth of data and rapid technological advancements, Multimodal Large Language Models (MLLMs) stand at the forefront of artificial intelligence (AI) systems. Designed to seamlessly integrate diverse data types, including text, images, videos, audio, and physiological sequences, MLLMs address the complexities of real-world applications far beyond the capabilities of…

    Submitted 2 August, 2024; originally announced August 2024.

  22. arXiv:2408.00241  [pdf, other]

    cs.AI

    Multiple Greedy Quasi-Newton Methods for Saddle Point Problems

    Authors: Minheng Xiao, Shi Bo, Zhizhong Wu

    Abstract: This paper introduces the Multiple Greedy Quasi-Newton (MGSR1-SP) method, a novel approach to solving strongly-convex-strongly-concave (SCSC) saddle point problems. Our method enhances the approximation of the squared indefinite Hessian matrix inherent in these problems, significantly improving both stability and efficiency through iterative greedy updates. We provide a thorough theoretical analys…

    Submitted 31 July, 2024; originally announced August 2024.

    Comments: Submitted to DOCS 2024

  23. arXiv:2407.21531  [pdf, other]

    cs.SD cs.CL cs.MM eess.AS

    Can LLMs "Reason" in Music? An Evaluation of LLMs' Capability of Music Understanding and Generation

    Authors: Ziya Zhou, Yuhang Wu, Zhiyue Wu, Xinyue Zhang, Ruibin Yuan, Yinghao Ma, Lu Wang, Emmanouil Benetos, Wei Xue, Yike Guo

    Abstract: Symbolic Music, akin to language, can be encoded in discrete symbols. Recent research has extended the application of large language models (LLMs) such as GPT-4 and Llama2 to the symbolic music domain including understanding and generation. Yet scant research explores the details of how these LLMs perform on advanced music understanding and conditioned generation, especially from the multi-step re…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: Accepted by ISMIR2024

  24. arXiv:2407.21417  [pdf, other]

    cs.CL

    Dancing in Chains: Reconciling Instruction Following and Faithfulness in Language Models

    Authors: Zhengxuan Wu, Yuhao Zhang, Peng Qi, Yumo Xu, Rujun Han, Yian Zhang, Jifan Chen, Bonan Min, Zhiheng Huang

    Abstract: Modern language models (LMs) need to follow human instructions while being faithful; yet, they often fail to achieve both. Here, we provide concrete evidence of a trade-off between instruction following (i.e., follow open-ended instructions) and faithfulness (i.e., ground responses in given context) when training LMs with these objectives. For instance, fine-tuning LLaMA-7B on instruction followin…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: preprint

  25. arXiv:2407.21315  [pdf, other]

    cs.CL cs.AI

    Beyond Silent Letters: Amplifying LLMs in Emotion Recognition with Vocal Nuances

    Authors: Zehui Wu, Ziwei Gong, Lin Ai, Pengyuan Shi, Kaan Donbekci, Julia Hirschberg

    Abstract: This paper introduces a novel approach to emotion detection in speech using Large Language Models (LLMs). We address the limitation of LLMs in processing audio inputs by translating speech characteristics into natural language descriptions. Our method integrates these descriptions into text prompts, enabling LLMs to perform multimodal emotion analysis without architectural modifications. We evalua…

    Submitted 31 July, 2024; v1 submitted 30 July, 2024; originally announced July 2024.

  26. arXiv:2407.21057  [pdf, other]

    cs.CL cs.AI cs.LG

    Multi-group Uncertainty Quantification for Long-form Text Generation

    Authors: Terrance Liu, Zhiwei Steven Wu

    Abstract: While large language models are rapidly moving towards consumer-facing applications, they are often still prone to factual errors and hallucinations. In order to reduce the potential harms that may come from these errors, it is important for users to know to what extent they can trust an LLM when it makes a factual claim. To this end, we study the problem of uncertainty quantification of factual c…

    Submitted 24 July, 2024; originally announced July 2024.

  27. arXiv:2407.21045  [pdf]

    cs.CL cs.AI

    Unlocking the Potential: Benchmarking Large Language Models in Water Engineering and Research

    Authors: Boyan Xu, Liang Wen, Zihao Li, Yuxing Yang, Guanlan Wu, Xiongpeng Tang, Yu Li, Zihao Wu, Qingxian Su, Xueqing Shi, Yue Yang, Rui Tong, How Yong Ng

    Abstract: Recent advancements in Large Language Models (LLMs) have sparked interest in their potential applications across various fields. This paper embarked on a pivotal inquiry: Can existing LLMs effectively serve as "water expert models" for water engineering and research tasks? This study was the first to evaluate LLMs' contributions across various water engineering and research tasks by establishing a…

    Submitted 22 July, 2024; originally announced July 2024.

  28. An Efficient Convex-Hull Relaxation Based Algorithm for Multi-User Discrete Passive Beamforming

    Authors: Wenhai Lai, Zheyu Wu, Yi Feng, Kaiming Shen, Ya-Feng Liu

    Abstract: Intelligent reflecting surface (IRS) is an emerging technology to enhance spatial multiplexing in wireless networks. This letter considers the discrete passive beamforming design for IRS in order to maximize the minimum signal-to-interference-plus-noise ratio (SINR) among multiple users in an IRS-assisted downlink network. The main design difficulty lies in the discrete phase-shift constraint. Dif…

    Submitted 30 July, 2024; originally announced July 2024.

    Comments: 5 pages

    Journal ref: IEEE Signal Processing Letters 2024

  29. arXiv:2407.19352  [pdf]

    cs.LG q-fin.RM

    Design and Optimization of Big Data and Machine Learning-Based Risk Monitoring System in Financial Markets

    Authors: Liyang Wang, Yu Cheng, Xingxin Gu, Zhizhong Wu

    Abstract: With the increasing complexity of financial markets and rapid growth in data volume, traditional risk monitoring methods no longer suffice for modern financial institutions. This paper designs and optimizes a risk monitoring system based on big data and machine learning. By constructing a four-layer architecture, it effectively integrates large-scale financial data and advanced machine learning al…

    Submitted 27 July, 2024; originally announced July 2024.

  30. arXiv:2407.19035  [pdf, other]

    cs.CV

    ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting

    Authors: Shen Chen, Jiale Zhou, Zhongyu Jiang, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang, Lei Li

    Abstract: The creation of high-quality 3D assets is paramount for applications in digital heritage preservation, entertainment, and robotics. Traditionally, this process necessitates skilled professionals and specialized software for the modeling, texturing, and rendering of 3D objects. However, the rising demand for 3D assets in gaming and virtual reality (VR) has led to the creation of accessible image-to…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 14 pages

  31. arXiv:2407.18039  [pdf, other]

    cs.LG cs.AI

    Peak-Controlled Logits Poisoning Attack in Federated Distillation

    Authors: Yuhan Tang, Aoxu Zhang, Zhiyuan Wu, Bo Gao, Tian Wen, Yuwei Wang, Sheng Sun

    Abstract: Federated Distillation (FD) offers an innovative approach to distributed machine learning, leveraging knowledge distillation for efficient and flexible cross-device knowledge transfer without necessitating the upload of extensive model parameters to a central server. While FD has gained popularity, its vulnerability to poisoning attacks remains underexplored. To address this gap, we previously int…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2401.03685

  32. arXiv:2407.18038  [pdf, other]

    cs.CV cs.RO

    TiCoSS: Tightening the Coupling between Semantic Segmentation and Stereo Matching within A Joint Learning Framework

    Authors: Guanfeng Tang, Zhiyuan Wu, Rui Fan

    Abstract: Semantic segmentation and stereo matching, respectively analogous to the ventral and dorsal streams in our human brain, are two key components of autonomous driving perception systems. Addressing these two tasks with separate networks is no longer the mainstream direction in developing computer vision algorithms, particularly with the recent advances in large vision models and embodied artificial…

    Submitted 25 July, 2024; originally announced July 2024.

  33. arXiv:2407.17915  [pdf, other]

    cs.CR cs.AI

    The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models

    Authors: Zihui Wu, Haichang Gao, Jianping He, Ping Wang

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities, but their power comes with significant security considerations. While extensive research has been conducted on the safety of LLMs in chat mode, the security implications of their function calling feature have been largely overlooked. This paper uncovers a critical vulnerability in the function calling process of LLMs, introduc…

    Submitted 25 July, 2024; originally announced July 2024.

  34. arXiv:2407.17227  [pdf, other]

    cs.AI cs.CL

    LEAN-GitHub: Compiling GitHub LEAN repositories for a versatile LEAN prover

    Authors: Zijian Wu, Jiayu Wang, Dahua Lin, Kai Chen

    Abstract: Recently, large language models have presented promising results in aiding formal mathematical reasoning. However, their performance is restricted due to the scarcity of formal theorem-proving data, which requires additional effort to be extracted from raw formal language corpora. Meanwhile, a significant amount of human-written formal language corpora remains underutilized. To address this issue,…

    Submitted 24 July, 2024; originally announced July 2024.

  35. arXiv:2407.16508  [pdf, other]

    cs.CV

    ToDER: Towards Colonoscopy Depth Estimation and Reconstruction with Geometry Constraint Adaptation

    Authors: Zhenhua Wu, Yanlin Jin, Liangdong Qiu, Xiaoguang Han, Xiang Wan, Guanbin Li

    Abstract: Visualizing colonoscopy is crucial for medical auxiliary diagnosis to prevent undetected polyps in areas that are not fully observed. Traditional feature-based and depth-based reconstruction approaches usually end up with undesirable results due to incorrect point matching or imprecise depth estimation in realistic colonoscopy videos. Modern deep-based methods often require a sufficient number of…

    Submitted 23 July, 2024; originally announced July 2024.

  36. SOAP: Enhancing Spatio-Temporal Relation and Motion Information Capturing for Few-Shot Action Recognition

    Authors: Wenbo Huang, Jinghui Zhang, Xuwei Qian, Zhen Wu, Meng Wang, Lei Zhang

    Abstract: High frame-rate (HFR) videos of action recognition improve fine-grained expression while reducing the spatio-temporal relation and motion information density. Thus, large amounts of video samples are continuously required for traditional data-driven training. However, samples are not always sufficient in real-world scenarios, promoting few-shot action recognition (FSAR) research. We observe that m…

    Submitted 24 July, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

    Comments: Accepted by ACM MM 2024

  37. arXiv:2407.16165  [pdf, other]

    eess.IV cs.CV cs.LG

    Advanced AI Framework for Enhanced Detection and Assessment of Abdominal Trauma: Integrating 3D Segmentation with 2D CNN and RNN Models

    Authors: Liheng Jiang, Xuechun yang, Chang Yu, Zhizhong Wu, Yuting Wang

    Abstract: Trauma is a significant cause of mortality and disability, particularly among individuals under forty. Traditional diagnostic methods for traumatic injuries, such as X-rays, CT scans, and MRI, are often time-consuming and dependent on medical expertise, which can delay critical interventions. This study explores the application of artificial intelligence (AI) and machine learning (ML) to improve t…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 6 Pages

  38. Blueprinting the Cloud: Unifying and Automatically Optimizing Cloud Data Infrastructures with BRAD -- Extended Version

    Authors: Geoffrey X. Yu, Ziniu Wu, Ferdi Kossmann, Tianyu Li, Markos Markakis, Amadou Ngom, Samuel Madden, Tim Kraska

    Abstract: Modern organizations manage their data with a wide variety of specialized cloud database engines (e.g., Aurora, BigQuery, etc.). However, designing and managing such infrastructures is hard. Developers must consider many possible designs with non-obvious performance consequences; moreover, current software abstractions tightly couple applications to specific systems (e.g., with engine-specific cli…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 17 pages, 15 figures

  39. arXiv:2407.14903  [pdf, other]

    cs.CV

    Automated Patient Positioning with Learned 3D Hand Gestures

    Authors: Zhongpai Gao, Abhishek Sharma, Meng Zheng, Benjamin Planche, Terrence Chen, Ziyan Wu

    Abstract: Positioning patients for scanning and interventional procedures is a critical task that requires high precision and accuracy. The conventional workflow involves manually adjusting the patient support to align the center of the target body part with the laser projector or other guiding devices. This process is not only time-consuming but also prone to inaccuracies. In this work, we propose an autom…

    Submitted 20 July, 2024; originally announced July 2024.

  40. arXiv:2407.13509  [pdf, other]

    cs.SD cs.CL cs.LG eess.AS

    Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models

    Authors: Weiqin Li, Peiji Yang, Yicheng Zhong, Yixuan Zhou, Zhisheng Wang, Zhiyong Wu, Xixin Wu, Helen Meng

    Abstract: Spontaneous style speech synthesis, which aims to generate human-like speech, often encounters challenges due to the scarcity of high-quality data and limitations in model capabilities. Recent language model-based TTS systems can be trained on large, diverse, and low-quality speech datasets, resulting in highly natural synthesized speech. However, they are limited by the difficulty of simulating v…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Accepted by INTERSPEECH 2024

  41. arXiv:2407.13372  [pdf, other]

    cs.CV

    Any Image Restoration with Efficient Automatic Degradation Adaptation

    Authors: Bin Ren, Eduard Zamfir, Yawei Li, Zongwei Wu, Danda Pani Paudel, Radu Timofte, Nicu Sebe, Luc Van Gool

    Abstract: With the emergence of mobile devices, there is a growing demand for an efficient model to restore any degraded image for better perceptual quality. However, existing models often require specific learning modules tailored for each degradation, resulting in complex architectures and high computation costs. Different from previous work, in this paper, we propose a unified manner to achieve joint emb…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: Efficient Any Image Restoration

  42. arXiv:2407.13211  [pdf]

    cs.CV eess.IV

    Research on Image Super-Resolution Reconstruction Mechanism based on Convolutional Neural Network

    Authors: Hao Yan, Zixiang Wang, Zhengjia Xu, Zhuoyue Wang, Zhizhong Wu, Ranran Lyu

    Abstract: Super-resolution reconstruction techniques entail the utilization of software algorithms to transform one or more sets of low-resolution images captured from the same scene into high-resolution images. In recent years, considerable advancement has been observed in the domain of single-image super-resolution algorithms, particularly those based on deep learning techniques. Nevertheless, the extract…

    Submitted 31 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

  43. arXiv:2407.13147  [pdf, other

    cs.CV

    DFMSD: Dual Feature Masking Stage-wise Knowledge Distillation for Object Detection

    Authors: Zhourui Zhang, Jun Li, Zhijian Wu, Jifeng Shen, Jianhua Xu

    Abstract: In recent years, current mainstream feature masking distillation methods mainly function by reconstructing selectively masked regions of a student network from the feature maps of a teacher network. In these methods, attention mechanisms can help to identify spatially important regions and crucial object-aware channel clues, such that the reconstructed features are encoded with sufficient discrimi…

    Submitted 18 July, 2024; originally announced July 2024.

  44. arXiv:2407.12951  [pdf, other]

    cs.CV

    AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer

    Authors: Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong, Di Huang, Yunhong Wang

    Abstract: Vision Transformer (ViT) has become one of the most prevailing fundamental backbone networks in the computer vision community. Despite the high accuracy, deploying it in real applications raises critical challenges including the high computational cost and inference latency. Recently, the post-training quantization (PTQ) technique has emerged as a promising way to enhance ViT's efficiency. Neverth…

    Submitted 17 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  45. arXiv:2407.12857  [pdf, other]

    cs.CL cs.DL cs.IR

    Automated Peer Reviewing in Paper SEA: Standardization, Evaluation, and Analysis

    Authors: Jianxiang Yu, Zichen Ding, Jiaqi Tan, Kangyang Luo, Zhenmin Weng, Chenghua Gong, Long Zeng, Renjing Cui, Chengcheng Han, Qiushi Sun, Zhiyong Wu, Yunshi Lan, Xiang Li

    Abstract: In recent years, the rapid increase in scientific papers has overwhelmed traditional review mechanisms, resulting in varying quality of publications. Although existing methods have explored the capabilities of Large Language Models (LLMs) for automated scientific reviewing, their generated contents are often generic or partial. To address the issues above, we introduce an automated paper reviewing…

    Submitted 9 July, 2024; originally announced July 2024.

  46. arXiv:2407.12249  [pdf, other]

    cs.IT

    Beamforming Design for Secure MC-NOMA Empowered ISAC Systems with an Active Eve

    Authors: Zhongqing Wu, Xuehua Li, Yuanxin Cai, Weijie Yuan

    Abstract: As the integrated sensing and communication (ISAC) technology emerges as a promising component of sixth generation (6G), the study of its physical layer security has become a key concern for researchers. Specifically, in this work, we focus on the security issues over a multi-carrier (MC)-non-orthogonal multiple access (NOMA) assisted ISAC system, considering imperfect channel state information (CS…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 6 pages, 5 figures, conference, This paper has been accepted by ICCC Workshops 2024

  47. arXiv:2407.11280  [pdf, other]

    cs.AI cs.CE cs.DB cs.LG

    Intelligent Cross-Organizational Process Mining: A Survey and New Perspectives

    Authors: Yiyuan Yang, Zheshun Wu, Yong Chu, Zhenghua Chen, Zenglin Xu, Qingsong Wen

    Abstract: Process mining, as a high-level field in data mining, plays a crucial role in enhancing operational efficiency and decision-making across organizations. In this survey paper, we delve into the growing significance and ongoing trends in the field of process mining, advocating a specific viewpoint on its contents, application, and development in modern businesses and process management, particularly…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Under review; 13 pages, 7 figures, 2 tables

  48. arXiv:2407.09694  [pdf, other]

    cs.CV

    Divide and Fuse: Body Part Mesh Recovery from Partially Visible Human Images

    Authors: Tianyu Luan, Zhongpai Gao, Luyuan Xie, Abhishek Sharma, Hao Ding, Benjamin Planche, Meng Zheng, Ange Lou, Terrence Chen, Junsong Yuan, Ziyan Wu

    Abstract: We introduce a novel bottom-up approach for human body mesh reconstruction, specifically designed to address the challenges posed by partial visibility and occlusion in input images. Traditional top-down methods, relying on whole-body parametric models like SMPL, falter when only a small part of the human is visible, as they require visibility of most of the human body for accurate mesh reconstruc…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  49. arXiv:2407.09509  [pdf, other]

    q-bio.NC cs.HC

    Brain Dialogue Interface (BDI): A User-Friendly fMRI Model for Interactive Brain Decoding

    Authors: Heng Huang, Lin Zhao, Zihao Wu, Xiaowei Yu, Jing Zhang, Xintao Hu, Dajiang Zhu, Tianming Liu

    Abstract: Brain decoding techniques are essential for understanding the neurocognitive system. Although numerous methods have been introduced in this field, accurately aligning complex external stimuli with brain activities remains a formidable challenge. To alleviate alignment difficulties, many studies have simplified their models by employing single-task paradigms and establishing direct links between br…

    Submitted 17 June, 2024; originally announced July 2024.

  50. arXiv:2407.09417  [pdf, other]

    cs.CL cs.IR

    Mitigating Entity-Level Hallucination in Large Language Models

    Authors: Weihang Su, Yichen Tang, Qingyao Ai, Changyue Wang, Zhijing Wu, Yiqun Liu

    Abstract: The emergence of Large Language Models (LLMs) has revolutionized how users access information, shifting from traditional search engines to direct question-and-answer interactions with LLMs. However, the widespread adoption of LLMs has revealed a significant challenge known as hallucination, wherein LLMs generate coherent yet factually inaccurate responses. This hallucination phenomenon has led to…

    Submitted 22 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.