Showing 1–50 of 1,349 results for author: Yu, H

Searching in archive cs.
  1. arXiv:2408.10631  [pdf, other]

    cs.LG cs.AI cs.CL

    LLM-Barber: Block-Aware Rebuilder for Sparsity Mask in One-Shot for Large Language Models

    Authors: Yupeng Su, Ziyi Guan, Xiaoqun Liu, Tianlai Jin, Dongkuan Wu, Graziano Chesi, Ngai Wong, Hao Yu

    Abstract: Large language models (LLMs) have grown significantly in scale, leading to a critical need for efficient model pruning techniques. Existing post-training pruning techniques primarily focus on measuring weight importance on converged dense models to determine salient weights to retain. However, they often overlook the changes in weight importance during the pruning process, which can lead to perfor…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.10599  [pdf, other]

    hep-ex cs.CV

    Vision Calorimeter for Anti-neutron Reconstruction: A Baseline

    Authors: Hongtian Yu, Yangu Li, Mingrui Wu, Letian Shen, Yue Liu, Yunxuan Song, Qixiang Ye, Xiaorui Lyu, Yajun Mao, Yangheng Zheng, Yunfan Liu

    Abstract: In high-energy physics, anti-neutrons ($\bar{n}$) are fundamental particles that frequently appear as final-state particles, and the reconstruction of their kinematic properties provides an important probe for understanding the governing principles. However, this confronts significant challenges instrumentally with the electromagnetic calorimeter (EMC), a typical experimental sensor but recovering…

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.10531  [pdf, other]

    cs.RO

    Leveraging Temporal Contexts to Enhance Vehicle-Infrastructure Cooperative Perception

    Authors: Jiaru Zhong, Haibao Yu, Tianyi Zhu, Jiahui Xu, Wenxian Yang, Zaiqing Nie, Chao Sun

    Abstract: Infrastructure sensors installed at elevated positions offer a broader perception range and encounter fewer occlusions. Integrating both infrastructure and ego-vehicle data through V2X communication, known as vehicle-infrastructure cooperation, has shown considerable advantages in enhancing perception capabilities and addressing corner cases encountered in single-vehicle autonomous driving. Howeve…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: Accepted by IEEE ITSC 2024

  4. arXiv:2408.09688  [pdf, other]

    cs.CL

    Recording for Eyes, Not Echoing to Ears: Contextualized Spoken-to-Written Conversion of ASR Transcripts

    Authors: Jiaqing Liu, Chong Deng, Qinglin Zhang, Qian Chen, Hai Yu, Wen Wang

    Abstract: Automatic Speech Recognition (ASR) transcripts exhibit recognition errors and various spoken language phenomena such as disfluencies, ungrammatical sentences, and incomplete sentences, hence suffering from poor readability. To improve readability, we propose a Contextualized Spoken-to-Written conversion (CoS2W) task to address ASR and grammar errors and also transfer the informal text into the for…

    Submitted 18 August, 2024; originally announced August 2024.

    Comments: 7 pages, 3 figures

  5. arXiv:2408.07576  [pdf, other]

    cs.CV cs.AI

    MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation

    Authors: Beoungwoo Kang, Seunghun Moon, Yubin Cho, Hyunwoo Yu, Suk-Ju Kang

    Abstract: Beyond the Transformer, it is important to explore how to exploit the capacity of the MetaFormer, an architecture that is fundamental to the performance improvements of the Transformer. Previous studies have exploited it only for the backbone network. Unlike previous studies, we explore the capacity of the Metaformer architecture more extensively in the semantic segmentation task. We propose a pow…

    Submitted 14 August, 2024; v1 submitted 14 August, 2024; originally announced August 2024.

    Comments: Accepted by WACV 2024

  6. Cross-aware Early Fusion with Stage-divided Vision and Language Transformer Encoders for Referring Image Segmentation

    Authors: Yubin Cho, Hyunwoo Yu, Suk-ju Kang

    Abstract: Referring segmentation aims to segment a target object related to a natural language expression. Key challenges of this task are understanding the meaning of complex and ambiguous language expressions and determining the relevant regions in the image with multiple objects by referring to the expression. Recent models have focused on the early fusion with the language features at the intermediate s…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Published in IEEE Transactions on Multimedia (TMM)

  7. arXiv:2408.06327  [pdf, other]

    cs.AI cs.CL cs.CV

    VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

    Authors: Xiao Liu, Tianjie Zhang, Yu Gu, Iat Long Iong, Yifan Xu, Xixuan Song, Shudan Zhang, Hanyu Lai, Xinyi Liu, Hanlin Zhao, Jiadai Sun, Xinyue Yang, Yu Yang, Zehan Qi, Shuntian Yao, Xueqiao Sun, Siyi Cheng, Qinkai Zheng, Hao Yu, Hanchen Zhang, Wenyi Hong, Ming Ding, Lihang Pan, Xiaotao Gu, Aohan Zeng , et al. (5 additional authors not shown)

    Abstract: Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents. These agents are postulated to excel across a myriad of tasks, potentially approaching general artificial intelligence. However, existing benchmarks fail to sufficiently challenge or showcase the full potential of LMM…

    Submitted 12 August, 2024; originally announced August 2024.

  8. arXiv:2408.06288  [pdf, ps, other]

    cs.IT eess.SP

    RIS-Aided Free-Space Optics Communications in A2G Networks over Inverted Gamma-Gamma Turbulent Channels

    Authors: Md. Abdur Rakib, Md. Ibrahim, A. S. M. Badrudduza, Imran Shafique Ansari, Md. Shahid Uz Zaman, Heejung Yu

    Abstract: With the advent of sixth-generation networks, reconfigurable intelligent surfaces (RISs) have revolutionized wireless communications through dynamic electromagnetic wave manipulation, thereby facilitating the adaptability and unparalleled control of real-time performance evaluations. This study proposed a framework to analyze the performance of RIS-assisted free-space optics (FSO) communication ov…

    Submitted 12 August, 2024; originally announced August 2024.

  9. arXiv:2408.05776  [pdf]

    cs.NI eess.SP

    Convergence of Symbiotic Communications and Blockchain for Sustainable and Trustworthy 6G Wireless Networks

    Authors: Haoxiang Luo, Gang Sun, Cheng Chi, Hongfang Yu, Mohsen Guizani

    Abstract: Symbiotic communication (SC) is known as a new wireless communication paradigm, similar to the natural ecosystem population, and can enable multiple communication systems to cooperate and mutualize through service exchange and resource sharing. As a result, SC is seen as an important potential technology for future sixth-generation (6G) communications, solving the problem of lack of spectrum resou…

    Submitted 11 August, 2024; originally announced August 2024.

  10. arXiv:2408.05555  [pdf, other]

    cs.CL

    Large Language Model-based Role-Playing for Personalized Medical Jargon Extraction

    Authors: Jung Hoon Lim, Sunjae Kwon, Zonghai Yao, John P. Lalor, Hong Yu

    Abstract: Previous studies reveal that Electronic Health Records (EHR), which have been widely adopted in the U.S. to allow patients to access their personal medical information, do not have high readability to patients due to the prevalence of medical jargon. Tailoring medical notes to individual comprehension by identifying jargon that is difficult for each person will enhance the utility of generative mo…

    Submitted 10 August, 2024; originally announced August 2024.

    Comments: 17 pages, 3 figures, 3 tables

  11. arXiv:2408.05326  [pdf, other]

    cs.CL

    A Psychology-based Unified Dynamic Framework for Curriculum Learning

    Authors: Guangyu Meng, Qingkai Zeng, John P. Lalor, Hong Yu

    Abstract: Directly learning from examples of random difficulty levels is often challenging for both humans and machine learning models. A more effective strategy involves exposing learners to examples in a progressive order, from easy to difficult. Curriculum Learning (CL) has been proposed to implement this strategy in machine learning model training. However, two key challenges persist in CL framework des…

    Submitted 9 August, 2024; originally announced August 2024.

  12. arXiv:2408.04138  [pdf, other]

    cs.CL cs.AI

    Enhancing Healthcare through Large Language Models: A Study on Medical Question Answering

    Authors: Haoran Yu, Chang Yu, Zihan Wang, Dongxian Zou, Hao Qin

    Abstract: In recent years, the application of Large Language Models (LLMs) in healthcare has shown significant promise in improving the accessibility and dissemination of medical knowledge. This paper presents a detailed study of various LLMs trained on the MedQuAD medical question-answering dataset, with a focus on identifying the most effective model for providing accurate medical information. Among the m…

    Submitted 7 August, 2024; originally announced August 2024.

    Comments: received by IEEE ICPICS

  13. arXiv:2408.03092  [pdf, other]

    cs.CL

    Extend Model Merging from Fine-Tuned to Pre-Trained Large Language Models via Weight Disentanglement

    Authors: Le Yu, Bowen Yu, Haiyang Yu, Fei Huang, Yongbin Li

    Abstract: Merging Large Language Models (LLMs) aims to amalgamate multiple homologous LLMs into one with all the capabilities. Ideally, any LLMs sharing the same backbone should be mergeable, irrespective of whether they are Fine-Tuned (FT) with minor parameter changes or Pre-Trained (PT) with substantial parameter shifts. However, existing methods often manually assign the model importance, rendering them…

    Submitted 6 August, 2024; originally announced August 2024.

    Comments: 17 pages

  14. arXiv:2408.01861  [pdf, other]

    cs.LG stat.ML

    Batch Active Learning in Gaussian Process Regression using Derivatives

    Authors: Hon Sum Alec Yu, Christoph Zimmer, Duy Nguyen-Tuong

    Abstract: We investigate the use of derivative information for Batch Active Learning in Gaussian Process regression models. The proposed approach employs the predictive covariance matrix for selection of data batches to exploit full correlation of samples. We theoretically analyse our proposed algorithm taking different optimality criteria into consideration and provide empirical comparisons highlighting th…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: 29 pages, 10 figures

  15. arXiv:2408.00619  [pdf, other]

    cs.CV

    Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection

    Authors: Ruiyang Zhang, Hu Zhang, Hang Yu, Zhedong Zheng

    Abstract: Unsupervised 3D object detection aims to identify objects of interest from unlabeled raw data, such as LiDAR points. Recent approaches usually adopt pseudo 3D bounding boxes (3D bboxes) from clustering algorithm to initialize the model training, and then iteratively updating both pseudo labels and the trained model. However, pseudo bboxes inevitably contain noises, and such inaccurate annotation a…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: Preprint, 14 pages, 4 figures, 4 tables

  16. arXiv:2408.00365  [pdf, other]

    cs.AI cs.CV eess.IV

    Multimodal Fusion and Coherence Modeling for Video Topic Segmentation

    Authors: Hai Yu, Chong Deng, Qinglin Zhang, Jiaqing Liu, Qian Chen, Wen Wang

    Abstract: The video topic segmentation (VTS) task segments videos into intelligible, non-overlapping topics, facilitating efficient comprehension of video content and quick access to specific content. VTS is also critical to various downstream video understanding tasks. Traditional VTS methods using shallow features or unsupervised approaches struggle to accurately discern the nuances of topical transitions…

    Submitted 1 August, 2024; originally announced August 2024.

  17. arXiv:2407.21325  [pdf]

    cs.AR

    EdgeLLM: A Highly Efficient CPU-FPGA Heterogeneous Edge Accelerator for Large Language Models

    Authors: Mingqiang Huang, Ao Shen, Kai Li, Haoxiang Peng, Boyu Li, Hao Yu

    Abstract: The rapid advancements in artificial intelligence (AI), particularly the Large Language Models (LLMs), have profoundly affected our daily work and communication forms. However, the colossal scale of LLM presents significant operational challenges, particularly when attempting to deploy them on resource-constrained edge devices such as smartphones, robots, and embedded systems. In this work, we pro…

    Submitted 31 July, 2024; originally announced July 2024.

  18. arXiv:2407.20427  [pdf, other]

    cs.CV eess.IV

    Mean Opinion Score as a New Metric for User-Evaluation of XAI Methods

    Authors: Hyeon Yu, Jenny Benois-Pineau, Romain Bourqui, Romain Giot, Alexey Zhukov

    Abstract: This paper investigates the use of Mean Opinion Score (MOS), a common image quality metric, as a user-centric evaluation metric for XAI post-hoc explainers. To measure the MOS, a user experiment is proposed, which has been conducted with explanation maps of intentionally distorted images. Three methods from the family of feature attribution methods - Gradient-weighted Class Activation Mapping (Gra…

    Submitted 29 July, 2024; originally announced July 2024.

    Comments: Supported by organization Laboratoire Bordelais de Recherche en Informatique, 15 pages, 4 figures, 3 tables

    ACM Class: I.4.7

  19. arXiv:2407.19074  [pdf]

    cs.CE cs.LG

    Parsimonious Universal Function Approximator for Elastic and Elasto-Plastic Cavity Expansion Problems

    Authors: Xiao-Xuan Chen, Pin Zhang, Hai-Sui Yu, Zhen-Yu Yin, Brian Sheil

    Abstract: Cavity expansion is a canonical problem in geotechnics, which can be described by partial differential equations (PDEs) and ordinary differential equations (ODEs). This study explores the potential of using a new solver, a physics-informed neural network (PINN), to calculate the stress field in an expanded cavity in the elastic and elasto-plastic regimes. Whilst PINNs have emerged as an effective…

    Submitted 8 July, 2024; originally announced July 2024.

  20. arXiv:2407.18766  [pdf, other]

    cs.IT eess.SP

    Secrecy Performance Analysis of Integrated RF-UWOC IoT Networks Enabled by UAV and Underwater-RIS

    Authors: Abrar Bin Sarawar, A. S. M. Badrudduza, Md. Ibrahim, Imran Shafique Ansari, Heejung Yu

    Abstract: In the sixth-generation (6G) Internet of Things (IoT) networks, the use of UAV-mounted base stations and reconfigurable intelligent surfaces (RIS) has been considered to enhance coverage, flexibility, and security in non-terrestrial networks (NTNs). In addition to aerial networks enabled by NTN technologies, the integration of underwater networks with 6G IoT can be considered one of the most innov…

    Submitted 26 July, 2024; originally announced July 2024.

  21. arXiv:2407.18323  [pdf, other]

    cs.NI eess.SP

    Active Reconfigurable Intelligent Surface-Aided Terahertz Wireless Communications

    Authors: Waqas Khalid, Heejung Yu, Yazdan Ahmad Qadri

    Abstract: Terahertz (THz) communication is expected to be a key technology for future sixth-generation (6G) wireless networks. Furthermore, reconfigurable intelligent surfaces (RIS) have been proposed to modify the wireless propagation environment and enhance system performance. Given the sensitivity to blockages and limited coverage range, RIS is particularly promising for THz communications. Active RIS ca…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Submitted in KICS Summer Conference 2024, (19 June 2024 - 22 June 2024), Jeju, Korea

  22. arXiv:2407.17261  [pdf, other]

    cs.CV

    Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation

    Authors: Hyunwoo Yu, Yubin Cho, Beoungwoo Kang, Seunghun Moon, Kyeongbo Kong, Suk-Ju Kang

    Abstract: We present an Encoder-Decoder Attention Transformer, EDAFormer, which consists of the Embedding-Free Transformer (EFT) encoder and the all-attention decoder leveraging our Embedding-Free Attention (EFA) structure. The proposed EFA is a novel global context modeling mechanism that focuses on functioning the global non-linearity, not the specific roles of the query, key and value. For the decoder, w…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  23. arXiv:2407.17023  [pdf, other]

    cs.CL cs.AI

    From Internal Conflict to Contextual Adaptation of Language Models

    Authors: Sara Vera Marjanović, Haeun Yu, Pepa Atanasova, Maria Maistro, Christina Lioma, Isabelle Augenstein

    Abstract: Knowledge-intensive language understanding tasks require Language Models (LMs) to integrate relevant context, mitigating their inherent weaknesses, such as incomplete or outdated knowledge. Nevertheless, studies indicate that LMs often ignore the provided context as it can conflict with the pre-existing LM's memory learned during pre-training. Moreover, conflicting knowledge can already be present…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: 22 pages, 15 figures

    MSC Class: 68T50 ACM Class: I.2.7

  24. arXiv:2407.17020  [pdf, other]

    cs.CV

    EAFormer: Scene Text Segmentation with Edge-Aware Transformers

    Authors: Haiyang Yu, Teng Fu, Bin Li, Xiangyang Xue

    Abstract: Scene text segmentation aims at cropping texts from scene images, which is usually used to help generative models edit or remove texts. The existing text segmentation methods tend to involve various text-related supervisions for better performance. However, most of them ignore the importance of text edges, which are significant for downstream applications. In this paper, we propose Edge-Aware Tran…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  25. arXiv:2407.16634  [pdf, other]

    eess.IV cs.AI cs.CV cs.HC

    Knowledge-driven AI-generated data for accurate and interpretable breast ultrasound diagnoses

    Authors: Haojun Yu, Youcheng Li, Nan Zhang, Zihan Niu, Xuantong Gong, Yanwen Luo, Quanlin Wu, Wangyan Qin, Mengyuan Zhou, Jie Han, Jia Tao, Ziwei Zhao, Di Dai, Di He, Dong Wang, Binghui Tang, Ling Huo, Qingli Zhu, Yong Wang, Liwei Wang

    Abstract: Data-driven deep learning models have shown great capabilities to assist radiologists in breast ultrasound (US) diagnoses. However, their effectiveness is limited by the long-tail distribution of training data, which leads to inaccuracies in rare cases. In this study, we address a long-standing challenge of improving the diagnostic model performance on rare cases using long-tailed data. Specifical…

    Submitted 23 July, 2024; originally announced July 2024.

  26. arXiv:2407.15869  [pdf, other]

    cs.LG cs.AI

    Long Input Sequence Network for Long Time Series Forecasting

    Authors: Chao Ma, Yikai Hou, Xiang Li, Yinggang Sun, Haining Yu

    Abstract: Short fixed-length inputs are the main bottleneck of deep learning methods in long time-series forecasting tasks. Prolonging input length causes overfitting, rapidly deteriorating accuracy. Our research indicates that the overfitting is a combination reaction of the multi-scale pattern coupling in time series and the fixed focusing scale of current models. First, we find that the patterns exhibite…

    Submitted 18 July, 2024; originally announced July 2024.

    Comments: 9 pages

  27. arXiv:2407.15762  [pdf, other]

    cs.LG cs.AI cs.CL

    Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

    Authors: Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem, Edouard Leurent

    Abstract: Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge here is to develop steerable language models that trade-off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditioned Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Bui…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 40 pages

  28. arXiv:2407.14785  [pdf, ps, other]

    cs.DS

    Stochastic Online Metric Matching: Adversarial is no Harder than Stochastic

    Authors: Amin Saberi, Mingwei Yang, Sophie H. Yu

    Abstract: We study the stochastic online metric matching problem. In this problem, $m$ servers and $n$ requests are located in a metric space, where all servers are available upfront and requests arrive one at a time. In particular, servers are adversarially chosen, and requests are independently drawn from a known distribution. Upon the arrival of a new request, it needs to be immediately and irrevocably m…

    Submitted 20 July, 2024; originally announced July 2024.

  29. arXiv:2407.14568  [pdf, other]

    cs.CL cs.AI cs.DB

    SQLfuse: Enhancing Text-to-SQL Performance through Comprehensive LLM Synergy

    Authors: Tingkai Zhang, Chaoyu Chen, Cong Liao, Jun Wang, Xudong Zhao, Hang Yu, Jianchao Wang, Jianguo Li, Wenhui Shi

    Abstract: Text-to-SQL conversion is a critical innovation, simplifying the transition from complex SQL to intuitive natural language queries, especially significant given SQL's prevalence in the job market across various roles. The rise of Large Language Models (LLMs) like GPT-3.5 and GPT-4 has greatly advanced this field, offering improved natural language understanding and the ability to generate nuanced…

    Submitted 19 July, 2024; originally announced July 2024.

  30. arXiv:2407.14239  [pdf, other]

    cs.AI

    KoMA: Knowledge-driven Multi-agent Framework for Autonomous Driving with Large Language Models

    Authors: Kemou Jiang, Xuan Cai, Zhiyong Cui, Aoyong Li, Yilong Ren, Haiyang Yu, Hao Yang, Daocheng Fu, Licheng Wen, Pinlong Cai

    Abstract: Large language models (LLMs) as autonomous agents offer a novel avenue for tackling real-world challenges through a knowledge-driven manner. These LLM-enhanced methodologies excel in generalization and interpretability. However, the complexity of driving tasks often necessitates the collaboration of multiple, heterogeneous agents, underscoring the need for such LLM-driven agents to engage in coope…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 13 pages, 18 figures

  31. arXiv:2407.14084  [pdf, other]

    math.CO cs.IT

    A Purely Entropic Approach to the Rainbow Triangle Problem

    Authors: Ting-Wei Chao, Hung-Hsun Hans Yu

    Abstract: In this short note, we present a purely entropic proof that in a $3$-edge-colored simple graph with $R$ red edges, $G$ green edges, and $B$ blue edges, the number of rainbow triangles is at most $\sqrt{2RGB}$.

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 5 pages, 5 figures

    MSC Class: 05D05; 05D40; 94A17

  32. arXiv:2407.13996  [pdf, other]

    cs.DC cs.AR cs.PF

    Missile: Fine-Grained, Hardware-Level GPU Resource Isolation for Multi-Tenant DNN Inference

    Authors: Yongkang Zhang, Haoxuan Yu, Chenxia Han, Cheng Wang, Baotong Lu, Yang Li, Xiaowen Chu, Huaicheng Li

    Abstract: Colocating high-priority, latency-sensitive (LS) and low-priority, best-effort (BE) DNN inference services reduces the total cost of ownership (TCO) of GPU clusters. Limited by bottlenecks such as VRAM channel conflicts and PCIe bus contentions, existing GPU sharing solutions are unable to avoid resource conflicts among concurrently executing tasks, failing to achieve both low latency for LS tasks…

    Submitted 27 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: 18 pages, 18 figures

    ACM Class: D.4.9; I.2.5

  33. arXiv:2407.13863  [pdf, other]

    cs.CV

    A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks

    Authors: Yixiang Qiu, Hao Fang, Hongyao Yu, Bin Chen, MeiKang Qiu, Shu-Tao Xia

    Abstract: Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training data from released models by utilizing output information, raising extensive concerns about the security of Deep Neural Networks (DNNs). Recent advances in generative adversarial networks (GANs) have contributed significantly to the improved performance of MI attacks due to their powerful ability to generate realistic image…

    Submitted 27 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  34. arXiv:2407.11730  [pdf, other]

    cs.CV

    Monocular Occupancy Prediction for Scalable Indoor Scenes

    Authors: Hongxiao Yu, Yuqi Wang, Yuntao Chen, Zhaoxiang Zhang

    Abstract: Camera-based 3D occupancy prediction has recently garnered increasing attention in outdoor driving scenes. However, research in indoor scenes remains relatively unexplored. The core differences in indoor scenes lie in the complexity of scene scale and the variance in object size. In this paper, we propose a novel method, named ISO, for predicting indoor scene occupancy using monocular images. ISO…

    Submitted 16 July, 2024; v1 submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

  35. arXiv:2407.11548  [pdf, other]

    cs.IR

    A PLMs based protein retrieval framework

    Authors: Yuxuan Wu, Xiao Yi, Yang Tan, Huiqun Yu, Guisheng Fan

    Abstract: Protein retrieval, which targets the deconstruction of the relationship between sequences, structures and functions, empowers the advancing of biology. Basic Local Alignment Search Tool (BLAST), a sequence-similarity-based algorithm, has proved the efficiency of this field. Despite the existing tools for protein retrieval, they prioritize sequence similarity and probably overlook proteins that are…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: 16 pages, 12 figures

    ACM Class: H.3.3

  36. arXiv:2407.11537  [pdf, other]

    cs.CV cs.AI

    AEMIM: Adversarial Examples Meet Masked Image Modeling

    Authors: Wenzhao Xiang, Chang Liu, Hang Su, Hongyang Yu

    Abstract: Masked image modeling (MIM) has gained significant traction for its remarkable prowess in representation learning. As an alternative to the traditional approach, the reconstruction from corrupted images has recently emerged as a promising pretext task. However, the regular corrupted images are generated using generic generators, often lacking relevance to the specific reconstruction task involved…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Under review of International Journal of Computer Vision (IJCV)

  37. Exploring Knowledge Transfer in Evolutionary Many-task Optimization: A Complex Network Perspective

    Authors: Yudong Yang, Kai Wu, Xiangyi Teng, Handing Wang, He Yu, Jing Liu

    Abstract: The field of evolutionary many-task optimization (EMaTO) is increasingly recognized for its ability to streamline the resolution of optimization challenges with repetitive characteristics, thereby conserving computational resources. This paper tackles the challenge of crafting efficient knowledge transfer mechanisms within EMaTO, a task complicated by the computational demands of individual task e…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 9 pages, accepted by GECCO 2024 poster

  38. arXiv:2407.08569  [pdf, other]

    cs.CV

    Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene

    Authors: Ruiyang Zhang, Hu Zhang, Hang Yu, Zhedong Zheng

    Abstract: The unsupervised 3D object detection is to accurately detect objects in unstructured environments with no explicit supervisory signals. This task, given sparse LiDAR point clouds, often results in compromised performance for detecting distant or small objects due to the inherent sparsity and limited spatial resolution. In this paper, we are among the early attempts to integrate LiDAR data with 2D…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV'24, 18 pages, 5 figures, 6 tables

  39. arXiv:2407.08481  [pdf, other]

    eess.IV cs.CV

    SliceMamba with Neural Architecture Search for Medical Image Segmentation

    Authors: Chao Fan, Hongyuan Yu, Yan Huang, Liang Wang, Zhenghan Yang, Xibin Jia

    Abstract: Despite the progress made in Mamba-based medical image segmentation models, existing methods utilizing unidirectional or multi-directional feature scanning mechanisms struggle to effectively capture dependencies between neighboring positions, limiting the discriminant representation learning of local features. These local features are crucial for medical image segmentation as they provide critical…

    Submitted 19 August, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  40. arXiv:2407.08337  [pdf, other]

    cs.LG cs.DC stat.ML

    FedLog: Personalized Federated Classification with Less Communication and More Flexibility

    Authors: Haolin Yu, Guojun Zhang, Pascal Poupart

    Abstract: In federated learning (FL), the common paradigm that FedAvg proposes and most algorithms follow is that clients train local models with their private data, and the model parameters are shared for central aggregation, mostly averaging. In this paradigm, the communication cost is often a challenge, as modern massive neural networks can contain millions to billions parameters. We suggest that clients…

    Submitted 11 July, 2024; originally announced July 2024.

  41. arXiv:2407.08106  [pdf, other]

    cs.RO

    SGLC: Semantic Graph-Guided Coarse-Fine-Refine Full Loop Closing for LiDAR SLAM

    Authors: Neng Wang, Xieyuanli Chen, Chenghao Shi, Zhiqiang Zheng, Hongshan Yu, Huimin Lu

    Abstract: Loop closing is a crucial component in SLAM that helps eliminate accumulated errors through two main steps: loop detection and loop pose correction. The first step determines whether loop closing should be performed, while the second estimates the 6-DoF pose to correct odometry drift. Current methods mostly focus on developing robust descriptors for loop closure detection, often neglecting loop po…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 8 pages, 4 figures

  42. arXiv:2407.07347  [pdf, other]

    cs.CV eess.IV

    MNeRV: A Multilayer Neural Representation for Videos

    Authors: Qingling Chang, Haohui Yu, Shuxuan Fu, Zhiqiang Zeng, Chuangquan Chen

    Abstract: As a novel video representation method, Neural Representations for Videos (NeRV) has shown great potential in the fields of video compression, video restoration, and video interpolation. In the process of representing videos using NeRV, each frame corresponds to an embedding, which is then reconstructed into a video frame sequence after passing through a small number of decoding layers (E-NeRV, HN…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 14 pages, 12 figures, 8 table

  43. arXiv:2407.06772  [pdf, other]

    cs.IT eess.SP

    Revealing the evanescent components in Kronecker-product based codebooks: insights and implications

    Authors: Jun Yang, Yijian Chen, Yunqi Sun, Yuan Si, Hongkang Yu, Shujuan Zhang, Zhaohua Lu

    Abstract: The orthogonal bases of the discrete Fourier transform (DFT) have been recognized as the standard spatial-domain bases for Type I, Type II and enhanced Type II codewords by the 3rd Generation Partnership Project (3GPP). For uniform planar arrays, these spatial-domain bases are derived as the Kronecker product of one-dimensional DFT bases. Theoretically, each spatial basis corresponds to a beam directed…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 11 pages, 9 figures

  44. arXiv:2407.06459  [pdf, other]

    cs.RO cs.AI cs.LG

    How Much Progress Did I Make? An Unexplored Human Feedback Signal for Teaching Robots

    Authors: Hang Yu, Qidi Fang, Shijie Fang, Reuben M. Aronson, Elaine Schaertl Short

    Abstract: Enhancing the expressiveness of human teaching is vital for both improving robots' learning from humans and the human-teaching-robot experience. In this work, we characterize and test a little-used teaching signal: \textit{progress}, designed to represent the completion percentage of a task. We conducted two online studies with 76 crowd-sourced participants and one public space study with 40 non-e…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 8 pages. RO-MAN 2024

  45. arXiv:2407.05643  [pdf, other]

    cs.IT eess.SP

    Spatial Non-Stationary Dual-Wideband Channel Estimation for XL-MIMO Systems

    Authors: Anzheng Tang, Jun-Bo Wang, Yijin Pan, Tuo Wu, Chuanwen Chang, Yijian Chen, Hongkang Yu, Maged Elkashlan

    Abstract: In this paper, we investigate the channel estimation problem for extremely large-scale multi-input and multi-output (XL-MIMO) systems, considering the spherical wavefront effect, spatially non-stationary (SnS) property, and dual-wideband effects. To accurately characterize the XL-MIMO channel, we first derive a novel spatial-and-frequency-domain channel model for XL-MIMO systems and carefully exam…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: This paper has been submitted to an IEEE journal for possible publication

  46. arXiv:2407.05571  [pdf, other]

    cs.NI eess.SP

    Cost-Efficient Computation Offloading in SAGIN: A Deep Reinforcement Learning and Perception-Aided Approach

    Authors: Yulan Gao, Ziqiang Ye, Han Yu

    Abstract: The Space-Air-Ground Integrated Network (SAGIN), crucial to the advancement of sixth-generation (6G) technology, plays a key role in ensuring universal connectivity, particularly by addressing the communication needs of remote areas lacking cellular network infrastructure. This paper delves into the role of unmanned aerial vehicles (UAVs) within SAGIN, where they act as a control layer owing to th…

    Submitted 7 July, 2024; originally announced July 2024.

  47. arXiv:2407.03672  [pdf, other]

    cs.LG cs.AI

    A Survey of Data Synthesis Approaches

    Authors: Hsin-Yu Chang, Pei-Yu Chen, Tun-Hsiang Chou, Chang-Sheng Kao, Hsuan-Yun Yu, Yen-Ting Lin, Yun-Nung Chen

    Abstract: This paper provides a detailed survey of synthetic data techniques. We first discuss the expected goals of using synthetic data in data augmentation, which can be divided into four parts: 1) Improving Diversity, 2) Data Balancing, 3) Addressing Domain Shift, and 4) Resolving Edge Cases. Synthesizing data is closely related to the prevailing machine learning techniques at the time; therefore, we s…

    Submitted 4 July, 2024; originally announced July 2024.

  48. arXiv:2407.03418  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    HEMM: Holistic Evaluation of Multimodal Foundation Models

    Authors: Paul Pu Liang, Akshay Goindani, Talha Chafekar, Leena Mathur, Haofei Yu, Ruslan Salakhutdinov, Louis-Philippe Morency

    Abstract: Multimodal foundation models that can holistically process text alongside images, video, audio, and other sensory modalities are increasingly used in a variety of real-world applications. However, it is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains. In this paper, we introduce Holistic Evaluation o…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Code available at https://github.com/pliang279/HEMM

  49. arXiv:2407.03205  [pdf, other]

    cs.CV

    Category-Aware Dynamic Label Assignment with High-Quality Oriented Proposal

    Authors: Mingkui Feng, Hancheng Yu, Xiaoyu Dang, Ming Zhou

    Abstract: Objects in aerial images are typically embedded in complex backgrounds and exhibit arbitrary orientations. When employing oriented bounding boxes (OBB) to represent arbitrarily oriented objects, the periodicity of angles could lead to discontinuities in label regression values at the boundaries, inducing abrupt fluctuations in the loss function. To address this problem, an OBB representation based o…

    Submitted 3 July, 2024; originally announced July 2024.

  50. arXiv:2407.02052  [pdf, other]

    eess.AS cs.SD

    The USTC-NERCSLIP Systems for The ICMC-ASR Challenge

    Authors: Minghui Wu, Luzhen Xu, Jie Zhang, Haitao Tang, Yanyan Yue, Ruizhi Liao, Jintao Zhao, Zhengzhe Zhang, Yichi Wang, Haoyin Yan, Hongliang Yu, Tongle Ma, Jiachen Liu, Chongliang Wu, Yongchao Li, Yanyong Zhang, Xin Fang, Yue Zhang

    Abstract: This report describes the system submitted to the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) challenge, which considers the ASR task with multi-speaker overlapping and Mandarin accent dynamics in the ICMC case. We implement the front-end speaker diarization using the self-supervised learning representation based multi-speaker embedding and beamforming using the speaker position,…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted at ICASSP 2024