Showing 1–50 of 211 results for author: Peng, L

Searching in archive cs.
  1. arXiv:2408.11810  [pdf, other]

    cs.CV

    Pixel Is Not A Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models

    Authors: Chun-Yen Shih, Li-Xuan Peng, Jia-Wei Liao, Ernie Chu, Cheng-Fu Chou, Jun-Cheng Chen

    Abstract: Diffusion Models have emerged as powerful generative models for high-quality image synthesis, with many subsequent image editing techniques based on them. However, the ease of text-based image editing introduces significant risks, such as malicious editing for scams or intellectual property infringement. Previous works have attempted to safeguard images from diffusion-based editing by adding imper…

    Submitted 21 August, 2024; originally announced August 2024.

  2. arXiv:2408.11518  [pdf, other]

    cs.CV

    EmoFace: Emotion-Content Disentangled Speech-Driven 3D Talking Face with Mesh Attention

    Authors: Yihong Lin, Liang Peng, Jianqiao Hu, Xiandong Li, Wenxiong Kang, Songju Lei, Xianjia Wu, Huang Xu

    Abstract: The creation of increasingly vivid 3D virtual digital humans has become a hot topic in recent years. Currently, most speech-driven work focuses on training models to learn the relationship between phonemes and visemes to achieve more realistic lips. However, they fail to capture the correlations between emotions and facial expressions effectively. To solve this problem, we propose a new model, ter…

    Submitted 21 August, 2024; originally announced August 2024.

  3. arXiv:2408.10145  [pdf, other]

    cs.CV

    Multi-Scale Representation Learning for Image Restoration with State-Space Model

    Authors: Yuhong He, Long Peng, Qiaosi Yi, Chen Wu, Lu Wang

    Abstract: Image restoration endeavors to reconstruct a high-quality, detail-rich image from a degraded counterpart, which is a pivotal process in photography and various computer vision systems. In real-world scenarios, different types of degradation can cause the loss of image details at various scales and degrade image contrast. Existing methods predominantly rely on CNN and Transformer to capture multi-s…

    Submitted 19 August, 2024; originally announced August 2024.

  4. arXiv:2408.08665  [pdf, other]

    cs.CV

    QMambaBSR: Burst Image Super-Resolution with Query State Space Model

    Authors: Xin Di, Long Peng, Peizhe Xia, Wenbo Li, Renjing Pei, Yang Cao, Yang Wang, Zheng-Jun Zha

    Abstract: Burst super-resolution aims to reconstruct high-resolution images with higher quality and richer details by fusing the sub-pixel information from multiple burst low-resolution frames. In BurstSR, the key challenge lies in extracting the base frame's content complementary sub-pixel details while simultaneously suppressing high-frequency noise disturbance. Existing methods attempt to extract sub-pix…

    Submitted 16 August, 2024; originally announced August 2024.

  5. arXiv:2408.06742  [pdf, other]

    cs.CV

    Long-Tailed Out-of-Distribution Detection: Prioritizing Attention to Tail

    Authors: Yina He, Lei Peng, Yongcun Zhang, Juanjuan Weng, Zhiming Luo, Shaozi Li

    Abstract: Current out-of-distribution (OOD) detection methods typically assume balanced in-distribution (ID) data, while most real-world data follow a long-tailed distribution. Previous approaches to long-tailed OOD detection often involve balancing the ID data by reducing the semantics of head classes. However, this reduction can severely affect the classification accuracy of ID data. The main challenge of…

    Submitted 13 August, 2024; originally announced August 2024.

  6. arXiv:2408.01826  [pdf, other]

    cs.CV

    GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer

    Authors: Yihong Lin, Zhaoxin Fan, Lingyu Xiong, Liang Peng, Xiandong Li, Wenxiong Kang, Xianjia Wu, Songju Lei, Huang Xu

    Abstract: Speech-driven talking head generation is an important but challenging task for many downstream applications such as augmented reality. Existing methods have achieved remarkable performance by utilizing autoregressive models or diffusion models. However, most still suffer from modality inconsistencies, specifically the misalignment between audio and mesh modalities, which causes inconsistencies in…

    Submitted 16 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: 9 pages, 5 figures

  7. arXiv:2408.01246  [pdf, other]

    cs.CR

    MapComp: A Secure View-based Collaborative Analytics Framework for Join-Group-Aggregation

    Authors: Xinyu Peng, Feng Han, Li Peng, Weiran Liu, Zheng Yan, Kai Kang, Xinyuan Zhang, Guoxing Wei, Jianling Sun, Jinfei Liu

    Abstract: This paper introduces MapComp, a novel view-based framework to facilitate join-group-aggregation (JGA) queries for collaborative analytics. Through a specially crafted materialized view for join and a novel design of group-aggregation (GA) protocols, MapComp removes duplicated join workload and expedites subsequent GA, improving the efficiency of JGA query execution. To support continuous data updates…

    Submitted 15 August, 2024; v1 submitted 2 August, 2024; originally announced August 2024.

    Comments: 12 pages

  8. arXiv:2407.20871  [pdf, other]

    cs.LG

    Co-Neighbor Encoding Schema: A Light-cost Structure Encoding Method for Dynamic Link Prediction

    Authors: Ke Cheng, Linzhi Peng, Junchen Ye, Leilei Sun, Bowen Du

    Abstract: Structure encoding has proven to be the key feature for distinguishing links in a graph. However, structure encoding in the temporal graph keeps changing as the graph evolves, and repeatedly computing such features can be time-consuming due to the high-order subgraph construction. We develop the Co-Neighbor Encoding Schema (CNES) to address this issue. Instead of recomputing the feature by the link, CN…

    Submitted 30 July, 2024; originally announced July 2024.

  9. arXiv:2407.20824  [pdf, other]

    cs.LG

    DyGKT: Dynamic Graph Learning for Knowledge Tracing

    Authors: Ke Cheng, Linzhi Peng, Pengyang Wang, Junchen Ye, Leilei Sun, Bowen Du

    Abstract: Knowledge Tracing aims to assess student learning states by predicting their performance in answering questions. Unlike existing research, which utilizes fixed-length learning sequences to obtain the student states and regards KT as a static problem, this work is motivated by three dynamical characteristics: 1) The scales of students' answering records are constantly growing; 2) The seman…

    Submitted 30 July, 2024; originally announced July 2024.

  10. arXiv:2407.19284  [pdf, other]

    eess.IV cs.CV

    Optimizing Synthetic Data for Enhanced Pancreatic Tumor Segmentation

    Authors: Linkai Peng, Zheyuan Zhang, Gorkem Durak, Frank H. Miller, Alpay Medetalibeyoglu, Michael B. Wallace, Ulas Bagci

    Abstract: Pancreatic cancer remains one of the leading causes of cancer-related mortality worldwide. Precise segmentation of pancreatic tumors from medical images is a bottleneck for effective clinical decision-making. However, achieving high accuracy is often limited by the small size and availability of real patient data for training deep learning models. Recent approaches have employed synthetic data g…

    Submitted 27 July, 2024; originally announced July 2024.

    Comments: MICCAI Workshop AIPAD 2024

  11. arXiv:2407.16961  [pdf, other]

    cs.CV cs.RO eess.IV

    Pose Estimation from Camera Images for Underwater Inspection

    Authors: Luyuan Peng, Hari Vishnu, Mandar Chitre, Yuen Min Too, Bharath Kalyan, Rajat Mishra, Soo Pieng Tan

    Abstract: High-precision localization is pivotal in underwater reinspection missions. Traditional localization methods like inertial navigation systems, Doppler velocity loggers, and acoustic positioning face significant challenges and are not cost-effective for some applications. Visual localization is a cost-effective alternative in such cases, leveraging the cameras already equipped on inspection vehicle…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: Submitted to IEEE Journal of Oceanic Engineering

  12. arXiv:2407.14491  [pdf, other]

    cs.CV

    PD-TPE: Parallel Decoder with Text-guided Position Encoding for 3D Visual Grounding

    Authors: Chenshu Hou, Liang Peng, Xiaopei Wu, Wenxiao Wang, Xiaofei He

    Abstract: 3D visual grounding aims to locate the target object mentioned by free-formed natural language descriptions in 3D point cloud scenes. Most previous work requires the encoder-decoder to simultaneously align the attribute information of the target object and its relational information with the surrounding environment across modalities. This causes the queries' attention to be dispersed, potentially…

    Submitted 19 July, 2024; originally announced July 2024.

  13. arXiv:2407.09787  [pdf, other]

    cs.CV

    Semi-supervised 3D Object Detection with PatchTeacher and PillarMix

    Authors: Xiaopei Wu, Liang Peng, Liang Xie, Yuenan Hou, Binbin Lin, Xiaoshui Huang, Haifeng Liu, Deng Cai, Wanli Ouyang

    Abstract: Semi-supervised learning aims to leverage numerous unlabeled data to improve the model performance. Current semi-supervised 3D object detection methods typically use a teacher to generate pseudo labels for a student, and the quality of the pseudo labels is essential for the final performance. In this paper, we propose PatchTeacher, which focuses on partial scene 3D object detection to provide high…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: Accepted by AAAI 2024

  14. arXiv:2407.09209  [pdf, other]

    cs.CL eess.AS

    Pronunciation Assessment with Multi-modal Large Language Models

    Authors: Kaiqi Fu, Linkai Peng, Nan Yang, Shuran Zhou

    Abstract: Large language models (LLMs), renowned for their powerful conversational abilities, are widely recognized as exceptional tools in the field of education, particularly in the context of automated intelligent instruction systems for language learning. In this paper, we propose a scoring system based on LLMs, motivated by their positive impact on text-related scoring tasks. Specifically, the speech e…

    Submitted 18 July, 2024; v1 submitted 12 July, 2024; originally announced July 2024.

  15. arXiv:2407.02158  [pdf, other]

    cs.CV

    UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks

    Authors: Jingjing Ren, Wenbo Li, Haoyu Chen, Renjing Pei, Bin Shao, Yong Guo, Long Peng, Fenglong Song, Lei Zhu

    Abstract: Ultra-high-resolution image generation poses great challenges, such as increased semantic planning complexity and detail synthesis difficulties, alongside substantial training resource demands. We present UltraPixel, a novel architecture utilizing cascade diffusion models to generate high-quality images at multiple resolutions (e.g., 1K to 6K) within a single model, while maintaining comp…

    Submitted 4 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: Project page https://jingjingrenabc.github.io/ultrapixel

  16. arXiv:2406.16477  [pdf, other]

    cs.CV cs.CL

    DaLPSR: Leverage Degradation-Aligned Language Prompt for Real-World Image Super-Resolution

    Authors: Aiwen Jiang, Zhi Wei, Long Peng, Feiqiang Liu, Wenbo Li, Mingwen Wang

    Abstract: Image super-resolution pursues reconstructing a high-fidelity, high-resolution counterpart for a low-resolution image. In recent years, diffusion-based models have garnered significant attention due to their capabilities with rich prior knowledge. The success of diffusion models based on general text prompts has validated the effectiveness of textual control in the field of text2image. However, given…

    Submitted 24 June, 2024; originally announced June 2024.

  17. arXiv:2406.15132  [pdf, other]

    cs.LG cs.AI

    Younger: The First Dataset for Artificial Intelligence-Generated Neural Network Architecture

    Authors: Zhengxin Yang, Wanling Gao, Luzhou Peng, Yunyou Huang, Fei Tang, Jianfeng Zhan

    Abstract: Designing and optimizing neural network architectures typically requires extensive expertise, starting with handcrafted designs and then manual or automated refinement. This dependency presents a significant barrier to rapid innovation. Recognizing the complexity of automatically generating neural network architecture from scratch, we introduce Younger, a pioneering dataset to advance this ambitio…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 31 pages, 29 figures, 11 tables

  18. arXiv:2406.14912  [pdf, other]

    cs.CV

    FC3DNet: A Fully Connected Encoder-Decoder for Efficient Demoiréing

    Authors: Zhibo Du, Long Peng, Yang Wang, Yang Cao, Zheng-Jun Zha

    Abstract: Moiré patterns are commonly seen when taking photos of screens. Camera devices usually have limited hardware performance but take high-resolution photos. However, users are sensitive to the photo processing time, which presents a hardly considered challenge of efficiency for demoiréing methods. To balance the network speed and quality of results, we propose a Fully Connected en…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Accepted by ICIP2024

  19. arXiv:2406.11115  [pdf, other]

    cs.CL

    Text Grafting: Near-Distribution Weak Supervision for Minority Classes in Text Classification

    Authors: Letian Peng, Yi Gu, Chengyu Dong, Zihan Wang, Jingbo Shang

    Abstract: For extremely weak-supervised text classification, pioneer research generates pseudo labels by mining texts similar to the class names from the raw corpus, which may end up with very limited or even no samples for the minority classes. Recent works have started to generate the relevant texts by prompting LLMs using the class names or definitions; however, there is a high risk that LLMs cannot gene…

    Submitted 16 June, 2024; originally announced June 2024.

  20. arXiv:2406.07255  [pdf, other]

    cs.CV eess.IV

    Towards Realistic Data Generation for Real-World Super-Resolution

    Authors: Long Peng, Wenbo Li, Renjing Pei, Jingjing Ren, Xueyang Fu, Yang Wang, Yang Cao, Zheng-Jun Zha

    Abstract: Existing image super-resolution (SR) techniques often fail to generalize effectively in complex real-world settings due to the significant divergence between training data and practical scenarios. To address this challenge, previous efforts have either manually simulated intricate physical-based degradations or utilized learning-based techniques, yet these approaches remain inadequate for producin…

    Submitted 11 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  21. arXiv:2406.03215  [pdf, other]

    cs.CV

    Searching Priors Makes Text-to-Video Synthesis Better

    Authors: Haoran Cheng, Liang Peng, Linxuan Xia, Yuepeng Hu, Hengjia Li, Qinglin Lu, Xiaofei He, Boxi Wu

    Abstract: Significant advancements in video diffusion models have brought substantial progress to the field of text-to-video (T2V) synthesis. However, existing T2V synthesis models struggle to accurately generate complex motion dynamics, leading to a reduction in video realism. One possible solution is to collect massive data and train the model on it, but this would be extremely expensive. To alleviate this…

    Submitted 5 June, 2024; originally announced June 2024.

  22. arXiv:2406.03097  [pdf, other]

    cs.LG cs.AI

    Enhancing the Resilience of Graph Neural Networks to Topological Perturbations in Sparse Graphs

    Authors: Shuqi He, Jun Zhuang, Ding Wang, Luyao Peng, Jun Song

    Abstract: Graph neural networks (GNNs) have been extensively employed in node classification. Nevertheless, recent studies indicate that GNNs are vulnerable to topological perturbations, such as adversarial attacks and edge disruptions. Considerable efforts have been devoted to mitigating these challenges. For example, pioneering Bayesian methodologies, including GraphSS and LlnDT, incorporate Bayesian labe…

    Submitted 5 June, 2024; originally announced June 2024.

  23. arXiv:2405.16848  [pdf, other]

    cs.CV

    A re-calibration method for object detection with multi-modal alignment bias in autonomous driving

    Authors: Zhihang Song, Lihui Peng, Jianming Hu, Danya Yao, Yi Zhang

    Abstract: Multi-modal object detection in autonomous driving has achieved great breakthroughs by fusing complementary information from different sensors. The calibration in fusion between sensors such as LiDAR and camera is always assumed to be precise in previous work. However, in reality, calibration matrices are fixed when the vehicles leave the factory, but vibration, bumps, and data l…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures

  24. arXiv:2405.12367  [pdf, other]

    eess.IV cs.CV

    Large-Scale Multi-Center CT and MRI Segmentation of Pancreas with Deep Learning

    Authors: Zheyuan Zhang, Elif Keles, Gorkem Durak, Yavuz Taktak, Onkar Susladkar, Vandan Gorade, Debesh Jha, Asli C. Ormeci, Alpay Medetalibeyoglu, Lanhong Yao, Bin Wang, Ilkin Sevgi Isler, Linkai Peng, Hongyi Pan, Camila Lopes Vendrami, Amir Bourhani, Yury Velichko, Boqing Gong, Concetto Spampinato, Ayis Pyrros, Pallavi Tiwari, Derk C. F. Klatte, Megan Engels, Sanne Hoogenboom, Candice W. Bolan, et al. (13 additional authors not shown)

    Abstract: Automated volumetric segmentation of the pancreas on cross-sectional imaging is needed for diagnosis and follow-up of pancreatic diseases. While CT-based pancreatic segmentation is more established, MRI-based segmentation methods are understudied, largely due to a lack of publicly available datasets, benchmarking research efforts, and domain-specific deep learning methods. In this retrospective st…

    Submitted 25 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: under review version

  25. arXiv:2405.11856  [pdf, other]

    cs.RO eess.SY

    Modeling and simulation of a mechanism for suppressing the flipping problem of a jumping robot

    Authors: Qi Li, Liang Peng, Zhiyuan Wu, Pengda Ye, Weitao Zhang, Yi Xu, Qing Shi

    Abstract: In order to solve the problem of stable jumping of a micro robot, we design a special mechanism: the elastic passive joint (EPJ). EPJ can assist in achieving smooth jumping through the opening-closing process when the robot jumps. First, we introduce the composition and operation principle of EPJ, and perform dynamic modeling of the robot's jumping process. Then, in order to verify the effectiveness o…

    Submitted 20 May, 2024; originally announced May 2024.

  26. arXiv:2405.07726  [pdf, other]

    cs.CL

    Quantifying and Optimizing Global Faithfulness in Persona-driven Role-playing

    Authors: Letian Peng, Jingbo Shang

    Abstract: Persona-driven role-playing (PRP) aims to build AI characters that can respond to user queries by faithfully sticking with all persona statements. Unfortunately, existing faithfulness criteria for PRP are limited to coarse-grained LLM-based scoring without a clear definition or formulation. This paper presents a pioneering exploration to quantify PRP faithfulness as a fine-grained and explainable…

    Submitted 13 May, 2024; originally announced May 2024.

  27. arXiv:2405.07023  [pdf, other]

    eess.IV cs.CV

    Efficient Real-world Image Super-Resolution Via Adaptive Directional Gradient Convolution

    Authors: Long Peng, Yang Cao, Renjing Pei, Wenbo Li, Jiaming Guo, Xueyang Fu, Yang Wang, Zheng-Jun Zha

    Abstract: Real-SR endeavors to produce high-resolution images with rich details while mitigating the impact of multiple degradation factors. Although existing methods have achieved impressive results in detail recovery, they still fall short when addressing regions with complex gradient arrangements due to the intensity-based linear weighting feature extraction manner. Moreover, the stochastic artifact…

    Submitted 11 May, 2024; originally announced May 2024.

  28. arXiv:2405.06784  [pdf, other]

    cs.LG

    Open Challenges and Opportunities in Federated Foundation Models Towards Biomedical Healthcare

    Authors: Xingyu Li, Lu Peng, Yuping Wang, Weihua Zhang

    Abstract: This survey explores the transformative impact of foundation models (FMs) in artificial intelligence, focusing on their integration with federated learning (FL) for advancing biomedical research. Foundation models such as ChatGPT, LLaMa, and CLIP, which are trained on vast datasets through methods including unsupervised pretraining, self-supervised learning, instructed fine-tuning, and reinforceme…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 42 pages

  29. arXiv:2405.05160  [pdf, other]

    cs.LG cs.AI cs.CV

    Selective Classification Under Distribution Shifts

    Authors: Hengyue Liang, Le Peng, Ju Sun

    Abstract: In selective classification (SC), a classifier abstains from making predictions that are likely to be wrong to avoid excessive errors. To deploy imperfect classifiers -- imperfect either due to intrinsic statistical noise of data or due to robustness issues of the classifier or beyond -- in high-stakes scenarios, SC appears to be an attractive and necessary path to follow. Despite decades of research…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: Total 25 pages (14 pages for main body); preprint for journal submission

  30. arXiv:2404.19534  [pdf, other]

    cs.CV

    MIPI 2024 Challenge on Nighttime Flare Removal: Methods and Results

    Authors: Yuekun Dai, Dafeng Zhang, Xiaoming Li, Zongsheng Yue, Chongyi Li, Shangchen Zhou, Ruicheng Feng, Peiqing Yang, Zhezhu Jin, Guanqun Liu, Chen Change Loy, Lize Zhang, Shuai Liu, Chaoyu Feng, Luyang Wang, Shuan Chen, Guangqi Shao, Xiaotao Wang, Lei Lei, Qirui Yang, Qihua Cheng, Zhiqiang Xu, Yihao Liu, Huanjing Yue, Jingyu Yang, et al. (38 additional authors not shown)

    Abstract: The increasing demand for computational photography and imaging on mobile platforms has led to the widespread development and integration of advanced image sensors with novel algorithms in camera systems. However, the scarcity of high-quality data for research and the rare opportunity for in-depth exchange of views from industry and academia constrain the development of mobile intelligent photogra…

    Submitted 27 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

    Comments: CVPR 2024 Mobile Intelligent Photography and Imaging (MIPI) Workshop--Nighttime Flare Removal Challenge Report. Website: https://mipi-challenge.org/MIPI2024/

  31. arXiv:2404.19384  [pdf, other]

    cs.CV cs.AI

    Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection

    Authors: Zhanwei Zhang, Minghao Chen, Shuai Xiao, Liang Peng, Hengjia Li, Binbin Lin, Ping Li, Wenxiao Wang, Boxi Wu, Deng Cai

    Abstract: Recent self-training techniques have shown notable improvements in unsupervised domain adaptation for 3D object detection (3D UDA). These techniques typically select pseudo labels, i.e., 3D boxes, to supervise models for the target domain. However, this selection process inevitably introduces unreliable 3D boxes, in which 3D points cannot be definitively assigned as foreground or background. Previ…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR2024

  32. arXiv:2404.16484  [pdf, other]

    cs.CV eess.IV

    Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey

    Authors: Marcos V. Conde, Zhijun Lei, Wen Li, Cosmin Stejerean, Ioannis Katsavounidis, Radu Timofte, Kihwan Yoon, Ganzorig Gankhuyag, Jiangtao Lv, Long Sun, Jinshan Pan, Jiangxin Dong, Jinhui Tang, Zhiyuan Li, Hao Wei, Chenyang Ge, Dongyang Zhang, Tianle Liu, Huaian Chen, Yi Jin, Menghan Zhou, Yiqiang Yan, Si Gao, Biao Wu, Shaoli Liu, et al. (50 additional authors not shown)

    Abstract: This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real-time on commercial GPUs. For this, we use a diverse test set containing a variety of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF cod…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: CVPR 2024, AI for Streaming (AIS) Workshop

  33. arXiv:2404.10877  [pdf, other]

    cs.CL

    Incubating Text Classifiers Following User Instruction with Nothing but LLM

    Authors: Letian Peng, Jingbo Shang

    Abstract: In this paper, we aim to generate text classification data given arbitrary class definitions (i.e., user instruction), so one can train a small text classifier without any human annotation or raw corpus. Compared with pioneer attempts, our proposed Incubator is the first framework that can handle complicated and even mutually dependent classes (e.g., "TED Talk given by Educator" and "Other"). Spec…

    Submitted 20 May, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

  34. arXiv:2404.10343  [pdf, other]

    cs.CV eess.IV

    The Ninth NTIRE 2024 Efficient Super-Resolution Challenge Report

    Authors: Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, Yuqing Liu, Jizhe Li, Keji He, Chao Fan, Heng Zhang, Xiaolin Zhang, Xuanwu Yin, Kunlong Zuo, Bohao Liao, Peizhe Xia, Long Peng, Zhibo Du, Xin Di, Wangkai Li, Yang Wang, et al. (109 additional authors not shown)

    Abstract: This paper provides a comprehensive review of the NTIRE 2024 challenge, focusing on efficient single-image super-resolution (ESR) solutions and their outcomes. The task of this challenge is to super-resolve an input image with a magnification factor of x4 based on pairs of low and corresponding high-resolution images. The primary objective is to develop networks that optimize various aspects such…

    Submitted 25 June, 2024; v1 submitted 16 April, 2024; originally announced April 2024.

    Comments: The report paper of NTIRE2024 Efficient Super-resolution, accepted by CVPRW2024

  35. arXiv:2404.07382  [pdf, other]

    cs.AI cs.LO

    Learn from Failure: Fine-Tuning LLMs with Trial-and-Error Data for Intuitionistic Propositional Logic Proving

    Authors: Chenyang An, Zhibo Chen, Qihao Ye, Emily First, Letian Peng, Jiayun Zhang, Zihan Wang, Sorin Lerner, Jingbo Shang

    Abstract: Recent advances in Automated Theorem Proving have shown the effectiveness of leveraging a (large) language model that generates tactics (i.e. proof steps) to search through proof states. The current model, while trained solely on successful proof paths, faces a discrepancy at the inference stage, as it must sample and try various tactics at each proof state until finding success, unlike its traini…

    Submitted 29 July, 2024; v1 submitted 10 April, 2024; originally announced April 2024.

    Comments: Accepted as a main conference paper at ACL 2024

  36. arXiv:2404.06155  [pdf, ps, other]

    cs.CV cs.RO

    Efficient and Robust Point Cloud Registration via Heuristics-guided Parameter Search

    Authors: Tianyu Huang, Haoang Li, Liangzu Peng, Yinlong Liu, Yun-Hui Liu

    Abstract: Estimating the rigid transformation with 6 degrees of freedom based on a putative 3D correspondence set is a crucial procedure in point cloud registration. Existing correspondence identification methods usually lead to large outlier ratios (>95% is common), underscoring the significance of robust registration methods. Many researchers turn to parameter search-based strategies (e.g., Branch-…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: 21 pages, 16 figures. Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024

  37. arXiv:2404.00915  [pdf, ps, other]

    cs.CV cs.RO

    Scalable 3D Registration via Truncated Entry-wise Absolute Residuals

    Authors: Tianyu Huang, Liangzu Peng, René Vidal, Yun-Hui Liu

    Abstract: Given an input set of 3D point pairs, the goal of outlier-robust 3D registration is to compute some rotation and translation that align as many point pairs as possible. This is an important problem in computer vision, for which many highly accurate approaches have been recently proposed. Despite their impressive performance, these approaches lack scalability, often overflowing the 16GB of me…

    Submitted 9 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: 24 pages, 12 figures. Accepted to CVPR 2024

  38. arXiv:2404.00457  [pdf, other]

    cs.CL

    MetaIE: Distilling a Meta Model from LLM for All Kinds of Information Extraction Tasks

    Authors: Letian Peng, Zilong Wang, Feng Yao, Zihan Wang, Jingbo Shang

    Abstract: Information extraction (IE) is a fundamental area in natural language processing where prompting large language models (LLMs), even with in-context examples, cannot defeat small LMs tuned on very small IE datasets. We observe that IE tasks, such as named entity recognition and relation extraction, all focus on extracting important information, which can be formalized as a label-to-span matching. I…

    Submitted 30 March, 2024; originally announced April 2024.

  39. arXiv:2403.15878  [pdf, other]

    cs.CV

    Diffusion-based Aesthetic QR Code Generation via Scanning-Robust Perceptual Guidance

    Authors: Jia-Wei Liao, Winston Wang, Tzu-Sian Wang, Li-Xuan Peng, Cheng-Fu Chou, Jun-Cheng Chen

    Abstract: QR codes, prevalent in daily applications, lack visual appeal due to their conventional black-and-white design. Integrating aesthetics while maintaining scannability poses a challenge. In this paper, we introduce a novel diffusion-model-based aesthetic QR code generation pipeline, utilizing pre-trained ControlNet and guided iterative refinement via a novel classifier guidance (SRG) based on the pr…

    Submitted 23 March, 2024; originally announced March 2024.

  40. arXiv:2403.11627  [pdf, other]

    cs.CV

    LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models

    Authors: Yang Yang, Wen Wang, Liang Peng, Chaotian Song, Yao Chen, Hengjia Li, Xiaolong Yang, Qinglin Lu, Deng Cai, Boxi Wu, Wei Liu

    Abstract: Customization generation techniques have significantly advanced the synthesis of specific concepts across varied contexts. Multi-concept customization emerges as the challenging task within this domain. Existing approaches often rely on training a fusion matrix of multiple Low-Rank Adaptations (LoRAs) to merge various concepts into a single image. However, we identify this straightforward method f…

    Submitted 10 July, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: project page: https://github.com/Young98CN/LoRA_Composer

  41. arXiv:2403.08360  [pdf, other]

    cs.CV cs.RO

    Improved Image-based Pose Regressor Models for Underwater Environments

    Authors: Luyuan Peng, Hari Vishnu, Mandar Chitre, Yuen Min Too, Bharath Kalyan, Rajat Mishra

    Abstract: We investigate the performance of image-based pose regressor models in underwater environments for relocalization. Leveraging PoseNet and PoseLSTM, we regress a 6-degree-of-freedom pose from single RGB images with high accuracy. Additionally, we explore data augmentation with stereo camera images to improve model accuracy. Experimental results demonstrate that the models achieve high accuracy in b…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Presented at AUV Symposium 2022

  42. arXiv:2403.03493  [pdf, other]

    cs.CV

    VastTrack: Vast Category Visual Object Tracking

    Authors: Liang Peng, Junyuan Gao, Xinran Liu, Weihong Li, Shaohua Dong, Zhipeng Zhang, Heng Fan, Libo Zhang

    Abstract: In this paper, we introduce a novel benchmark, dubbed VastTrack, towards facilitating the development of more general visual tracking via encompassing abundant classes and videos. VastTrack possesses several attractive properties: (1) Vast Object Category. In particular, it covers target objects from 2,115 classes, largely surpassing object categories of existing popular benchmarks (e.g., GOT-10k…

    Submitted 6 March, 2024; originally announced March 2024.

    Comments: Tech. report

  43. arXiv:2402.09642  [pdf, other]

    cs.CL

    Answer is All You Need: Instruction-following Text Embedding via Answering the Question

    Authors: Letian Peng, Yuwei Zhang, Zilong Wang, Jayanth Srinivasa, Gaowen Liu, Zihan Wang, Jingbo Shang

    Abstract: This work aims to build a text embedder that can capture characteristics of texts specified by user instructions. Despite its tremendous potential to deploy user-oriented embeddings, none of the previous approaches provides a concrete solution for it. This paper offers a new viewpoint, which treats the instruction as a question about the input text and encodes the expected answers to obtain the repres…

    Submitted 14 February, 2024; originally announced February 2024.

  44. arXiv:2401.14718  [pdf, other]

    cs.CV

    A Survey on Video Prediction: From Deterministic to Generative Approaches

    Authors: Ruibo Ming, Zhewei Huang, Zhuoxuan Ju, Jianming Hu, Lihui Peng, Shuchang Zhou

    Abstract: Video prediction, a fundamental task in computer vision, aims to enable models to generate sequences of future frames based on existing video content. This task has garnered widespread application across various domains. In this paper, we comprehensively survey both historical and contemporary works in this field, encompassing the most widely used datasets and algorithms. Our survey scrutinizes th…

    Submitted 22 July, 2024; v1 submitted 26 January, 2024; originally announced January 2024.

    Comments: under review

  45. arXiv:2401.04908  [pdf, other]

    cs.IT cs.NI

    On Achieving High-Fidelity Grant-free Non-Orthogonal Multiple Access

    Authors: Haoran Mei, Limei Peng, Pin-Han Ho

    Abstract: Grant-free access (GFA) has been envisioned to play an active role in massive Machine Type Communication (mMTC) under 5G and Beyond mobile systems, which aims to significantly reduce signaling overhead and access latency in the presence of sporadic traffic and small-size data. The paper focuses on a novel K-repetition GFA (K-GFA) scheme by incorporating Reed-Solomon (RS) code with…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 9 pages, 5 figures

  46. arXiv:2401.04539  [pdf, other]

    cs.IT cs.NI

    A Novel Framework of K-repetition Grant-free Access via Diversity Slotted Aloha (DSA)

    Authors: Haoran Mei, Limei Peng, Pin-Han Ho

    Abstract: This article introduces a novel framework of multi-user detection (MUD) for K-repetition grant-free non-orthogonal multiple access (K-GF-NOMA), called $α$ iterative interference cancellation diversity slotted aloha ($α$-IIC-DSA). The proposed framework targets a simple yet effective decoding process where the AP can intelligently exploit the correlation among signals received at different resou…

    Submitted 9 January, 2024; originally announced January 2024.

    Comments: 7 pages, 5 figures

  47. arXiv:2401.01724  [pdf, other]

    cs.CV

    Lightweight Adaptive Feature De-drifting for Compressed Image Classification

    Authors: Long Peng, Yang Cao, Yuejin Sun, Yang Wang

    Abstract: JPEG is a widely used compression scheme to efficiently reduce the volume of transmitted images. The artifacts appear among blocks due to the information loss, which not only affects the quality of images but also harms the subsequent high-level tasks in terms of feature drifting. High-level vision models trained on high-quality images will suffer performance degradation when dealing with compress…

    Submitted 3 January, 2024; originally announced January 2024.

    Comments: Accepted by IEEE Transactions on Multimedia 2024

  48. arXiv:2312.14574  [pdf, other]

    cs.CV cs.LG

    MMGPL: Multimodal Medical Data Analysis with Graph Prompt Learning

    Authors: Liang Peng, Songyue Cai, Zongqian Wu, Huifang Shang, Xiaofeng Zhu, Xiaoxiao Li

    Abstract: Prompt learning has demonstrated impressive efficacy in the fine-tuning of multimodal large models to a wide range of downstream tasks. Nonetheless, applying existing prompt learning methods for the diagnosis of neurological disorders still suffers from two issues: (i) existing methods typically treat all patches equally, despite the fact that only a small number of patches in neuroimaging are rele…

    Submitted 27 June, 2024; v1 submitted 22 December, 2023; originally announced December 2023.

  49. arXiv:2312.11837  [pdf, other]

    cs.CV

    Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving

    Authors: Junkai Xu, Liang Peng, Haoran Cheng, Linxuan Xia, Qi Zhou, Dan Deng, Wei Qian, Wenxiao Wang, Deng Cai

    Abstract: Multi-camera perception tasks have gained significant attention in the field of autonomous driving. However, existing frameworks based on Lift-Splat-Shoot (LSS) in the multi-camera setting cannot produce suitable dense 3D features due to the projection nature and uncontrollable densification process. To resolve this problem, we propose to regulate intermediate dense 3D features with the help of vo…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  50. arXiv:2312.08768  [pdf, other]

    cs.CV

    Local Conditional Controlling for Text-to-Image Diffusion Models

    Authors: Yibo Zhao, Liang Peng, Yang Yang, Zekai Luo, Hengjia Li, Yao Chen, Wei Zhao, Qinglin Lu, Boxi Wu, Wei Liu

    Abstract: Diffusion models have exhibited impressive prowess in the text-to-image task. Recent methods add image-level controls, e.g., edge and depth maps, to manipulate the generation process together with text prompts to obtain desired images. This controlling process is globally operated on the entire image, which limits the flexibility of control regions. In this paper, we introduce a new simple yet pra…

    Submitted 6 February, 2024; v1 submitted 14 December, 2023; originally announced December 2023.