Showing 1–50 of 167 results for author: Hou, X

  1. arXiv:2408.07600  [pdf, other]

    cs.CV

    Disentangle and denoise: Tackling context misalignment for video moment retrieval

    Authors: Kaijing Ma, Han Fang, Xianghao Zang, Chao Ban, Lanxiang Zhou, Zhongjiang He, Yongxiang Li, Hao Sun, Zerun Feng, Xingsong Hou

    Abstract: Video Moment Retrieval, which aims to locate in-context video moments according to a natural language query, is an essential task for cross-modal grounding. Existing methods focus on enhancing the cross-modal interactions between all moments and the textual description for video understanding. However, constantly interacting with all locations is unreasonable because of uneven semantic distributio…

    Submitted 14 August, 2024; originally announced August 2024.

  2. arXiv:2408.06608  [pdf, other]

    cs.AR cs.GR

    Potamoi: Accelerating Neural Rendering via a Unified Streaming Architecture

    Authors: Yu Feng, Weikai Lin, Zihan Liu, Jingwen Leng, Minyi Guo, Han Zhao, Xiaofeng Hou, Jieru Zhao, Yuhao Zhu

    Abstract: Neural Radiance Field (NeRF) has emerged as a promising alternative for photorealistic rendering. Despite recent algorithmic advancements, achieving real-time performance on today's resource-constrained devices remains challenging. In this paper, we identify the primary bottlenecks in current NeRF algorithms and introduce a unified algorithm-architecture co-design, Potamoi, designed to accommodate…

    Submitted 12 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2404.11852

  3. arXiv:2408.01687  [pdf, other]

    cs.SE

    Voices from the Frontier: A Comprehensive Analysis of the OpenAI Developer Forum

    Authors: Xinyi Hou, Yanjie Zhao, Haoyu Wang

    Abstract: OpenAI's advanced large language models (LLMs) have revolutionized natural language processing and enabled developers to create innovative applications. As adoption grows, understanding the experiences and challenges of developers working with these technologies is crucial. This paper presents a comprehensive analysis of the OpenAI Developer Forum, focusing on (1) popularity trends and user engage…

    Submitted 3 August, 2024; originally announced August 2024.

  4. arXiv:2407.21220  [pdf, other]

    cs.LG cs.CR cs.CV

    DeepBaR: Fault Backdoor Attack on Deep Neural Network Layers

    Authors: C. A. Martínez-Mejía, J. Solano, J. Breier, D. Bucko, X. Hou

    Abstract: Machine Learning using neural networks has received prominent attention recently because of its success in solving a wide variety of computational tasks, in particular in the field of computer vision. However, several works have drawn attention to potential security risks involved with the training and implementation of such networks. In this work, we introduce DeepBaR, a novel approach that impla…

    Submitted 30 July, 2024; originally announced July 2024.

  5. arXiv:2407.18333  [pdf, other]

    cs.AR cs.AI

    AutoVCoder: A Systematic Framework for Automated Verilog Code Generation using LLMs

    Authors: Mingzhe Gao, Jieru Zhao, Zhe Lin, Wenchao Ding, Xiaofeng Hou, Yu Feng, Chao Li, Minyi Guo

    Abstract: Recently, the use of large language models (LLMs) for software code generation, e.g., C/C++ and Python, has proven a great success. However, LLMs still suffer from low syntactic and functional correctness when it comes to the generation of register-transfer level (RTL) code, such as Verilog. To address this issue, in this paper, we develop AutoVCoder, a systematic open-source framework that signif…

    Submitted 21 July, 2024; originally announced July 2024.

  6. arXiv:2407.16467  [pdf, other]

    cs.CR cs.AI

    Side-Channel Analysis of OpenVINO-based Neural Network Models

    Authors: Dirmanto Jap, Jakub Breier, Zdenko Lehocký, Shivam Bhasin, Xiaolu Hou

    Abstract: Embedded devices with neural network accelerators offer great versatility for their users, reducing the need to use cloud-based services. At the same time, they introduce new security challenges in the area of hardware attacks, the most prominent being side-channel analysis (SCA). It was shown that SCA can recover model parameters with a high accuracy, posing a threat to entities that wish to keep…

    Submitted 20 August, 2024; v1 submitted 23 July, 2024; originally announced July 2024.

  7. arXiv:2407.14660  [pdf, ps, other]

    math.NT cs.IT

    More on the sum-freedom of the multiplicative inverse function

    Authors: Claude Carlet, Xiang-dong Hou

    Abstract: In two papers entitled "Two generalizations of almost perfect nonlinearity" and "On the vector subspaces of $\mathbb{F}_{2^n}$ over which the multiplicative inverse function sums to zero", the first author has introduced and studied the notion of sum-freedom of vectorial functions, which expresses that a function sums to nonzero values over all affine subspaces of $\mathbb{F}_{2^n}$ of a given dimens…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: 17 pages, 2 tables

    MSC Class: 11G25; 11T06; 11T71; 94D10

  8. arXiv:2407.12295  [pdf, ps, other]

    cs.CV eess.IV

    Exploiting Inter-Image Similarity Prior for Low-Bitrate Remote Sensing Image Compression

    Authors: Junhui Li, Xingsong Hou

    Abstract: Deep learning-based methods have garnered significant attention in remote sensing (RS) image compression due to their superior performance. Most of these methods focus on enhancing the coding capability of the compression network and improving entropy model prediction accuracy. However, they typically compress and decompress each image independently, ignoring the significant inter-image similarity…

    Submitted 16 July, 2024; originally announced July 2024.

  9. arXiv:2407.11699  [pdf, other]

    cs.CV

    Relation DETR: Exploring Explicit Position Relation Prior for Object Detection

    Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen, Xuguang Lan

    Abstract: This paper presents a general scheme for enhancing the convergence and performance of DETR (DEtection TRansformer). We investigate the slow convergence problem in transformers from a new perspective, suggesting that it arises from the self-attention that introduces no structural bias over inputs. To address this issue, we explore incorporating position relation prior as attention bias to augment o…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  10. arXiv:2407.08422  [pdf, other]

    cs.CR cs.AI

    On the (In)Security of LLM App Stores

    Authors: Xinyi Hou, Yanjie Zhao, Haoyu Wang

    Abstract: LLM app stores have seen rapid growth, leading to the proliferation of numerous custom LLM apps. However, this expansion raises security concerns. In this study, we propose a three-layer concern framework to identify the potential security risks of LLM apps, i.e., LLM apps with abusive potential, LLM apps with malicious intent, and LLM apps with exploitable vulnerabilities. Over five months, we co…

    Submitted 29 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  11. arXiv:2407.03040  [pdf, other]

    cs.CL cs.AI

    Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model

    Authors: Xia Hou, Qifeng Li, Jian Yang, Tongliang Li, Linzheng Chai, Xianjie Wu, Hangyuan Ji, Zhoujun Li, Jixuan Nie, Jingbo Dun, Wenfeng Song

    Abstract: Instruction tuning as an effective technique aligns the outputs of large language models (LLMs) with human preference. But how to generate the seasonal multi-turn dialogues from raw documents for instruction tuning still requires further exploration. In this paper, we present a novel framework named R2S that leverages the CoD-Chain of Dialogue logic to guide large language models (LLMs) in generat…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 11 pages, 3 figures

    MSC Class: 68T50 ACM Class: I.2.7

  12. arXiv:2406.13381  [pdf, other]

    cs.CL

    CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration

    Authors: Xinming Hou, Mingming Yang, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Wayne Xin Zhao

    Abstract: Existing LLMs exhibit remarkable performance on various NLP tasks, but still struggle with complex real-world tasks, even equipped with advanced strategies like CoT and ReAct. In this work, we propose the CoAct framework, which transfers the hierarchical planning and collaboration patterns in human society to LLM systems. Specifically, our CoAct framework involves two agents: (1) A global planning…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 9 pages, 4 figures

  13. arXiv:2406.12805  [pdf, other]

    cs.CV

    AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation

    Authors: Xinyu Hou, Xiaoming Li, Chen Change Loy

    Abstract: Despite the high-quality results of text-to-image generation, stereotypical biases have been spotted in their generated contents, compromising the fairness of generative models. In this work, we propose to learn adaptive inclusive tokens to shift the attribute distribution of the final generative outputs. Unlike existing de-biasing approaches, our method requires neither explicit attribute specifi…

    Submitted 18 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

  14. arXiv:2406.10185  [pdf, other]

    cs.CV

    Detecting and Evaluating Medical Hallucinations in Large Vision Language Models

    Authors: Jiawei Chen, Dingkang Yang, Tong Wu, Yue Jiang, Xiaolu Hou, Mingcheng Li, Shunli Wang, Dongling Xiao, Ke Li, Lihua Zhang

    Abstract: Large Vision Language Models (LVLMs) are increasingly integral to healthcare applications, including medical visual question answering and imaging report generation. While these models inherit the robust capabilities of foundational Large Language Models (LLMs), they also inherit susceptibility to hallucinations, a significant concern in high-stakes medical contexts where the margin for error is mi…

    Submitted 14 June, 2024; originally announced June 2024.

  15. arXiv:2406.06451  [pdf, other]

    cs.HC cs.AI cs.CY

    Insights from Social Shaping Theory: The Appropriation of Large Language Models in an Undergraduate Programming Course

    Authors: Aadarsh Padiyath, Xinying Hou, Amy Pang, Diego Viramontes Vargas, Xingjian Gu, Tamara Nelson-Fromm, Zihan Wu, Mark Guzdial, Barbara Ericson

    Abstract: The capability of large language models (LLMs) to generate, debug, and explain code has sparked the interest of researchers and educators in undergraduate programming, with many anticipating their transformative potential in programming education. However, decisions about why and how to use LLMs in programming education may involve more than just the assessment of an LLM's technical capabilities.…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Accepted to the ACM Conference on International Computing Education Research V.1 (ICER '24 Vol. 1)

  16. arXiv:2406.04170  [pdf]

    cs.LG cs.AI cs.NE

    Element-wise Multiplication Based Physics-informed Neural Networks

    Authors: Feilong Jiang, Xiaonan Hou, Min Xia

    Abstract: As a promising framework for resolving partial differential equations (PDEs), physics-informed neural networks (PINNs) have received widespread attention from industrial and scientific fields. However, lack of expressive ability and initialization pathology issues are found to prevent the application of PINNs in complex PDEs. In this work, we propose Element-wise Multiplication Based Physics-infor…

    Submitted 16 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

  17. arXiv:2406.03961  [pdf, ps, other]

    eess.IV cs.CV

    LDM-RSIC: Exploring Distortion Prior with Latent Diffusion Models for Remote Sensing Image Compression

    Authors: Junhui Li, Jutao Li, Xingsong Hou, Huake Wang, Yutao Zhang, Yujie Dun, Wenke Sun

    Abstract: Deep learning-based image compression algorithms typically focus on designing encoding and decoding networks and improving the accuracy of entropy model estimation to enhance the rate-distortion (RD) performance. However, few algorithms leverage the compression distortion prior from existing compression algorithms to improve RD performance. In this paper, we propose a latent diffusion model-based…

    Submitted 6 June, 2024; originally announced June 2024.

  18. arXiv:2405.15630  [pdf, other]

    cs.SE

    GPTZoo: A Large-scale Dataset of GPTs for the Research Community

    Authors: Xinyi Hou, Yanjie Zhao, Shenao Wang, Haoyu Wang

    Abstract: The rapid advancements in Large Language Models (LLMs) have revolutionized natural language processing, with GPTs, customized versions of ChatGPT available on the GPT Store, emerging as a prominent technology for specific domains and tasks. To support academic research on GPTs, we introduce GPTZoo, a large-scale dataset comprising 730,420 GPT instances. Each instance includes rich metadata with 21…

    Submitted 24 May, 2024; originally announced May 2024.

  19. arXiv:2405.13891  [pdf, other]

    cs.CR cs.AI

    DeepNcode: Encoding-Based Protection against Bit-Flip Attacks on Neural Networks

    Authors: Patrik Velčický, Jakub Breier, Mladen Kovačević, Xiaolu Hou

    Abstract: Fault injection attacks are a potent threat against embedded implementations of neural network models. Several attack vectors have been proposed, such as misclassification, model extraction, and trojan/backdoor planting. Most of these attacks work by flipping bits in the memory where quantized model parameters are stored. In this paper, we introduce an encoding-based protection method against bi…

    Submitted 2 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

  20. arXiv:2405.12377  [pdf]

    eess.SY cs.LG

    Spatio-temporal Attention-based Hidden Physics-informed Neural Network for Remaining Useful Life Prediction

    Authors: Feilong Jiang, Xiaonan Hou, Min Xia

    Abstract: Predicting the Remaining Useful Life (RUL) is essential in Prognostic Health Management (PHM) for industrial systems. Although deep learning approaches have achieved considerable success in predicting RUL, challenges such as low prediction accuracy and interpretability pose significant challenges, hindering their practical implementation. In this work, we introduce a Spatio-temporal Attention-base…

    Submitted 20 May, 2024; originally announced May 2024.

  21. arXiv:2405.10626  [pdf, other]

    cs.CL

    Dynamic data sampler for cross-language transfer learning in large language models

    Authors: Yudong Li, Yuhao Feng, Wen Zhou, Zhe Zhao, Linlin Shen, Cheng Hou, Xianxu Hou

    Abstract: Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications. However, training LLMs for languages other than English poses significant challenges, due to the difficulty in acquiring large-scale corpus and the requisite computing resources. In this paper, we propose ChatFlow, a cross-language transfer-based…

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Accepted by ICASSP 2024

  22. arXiv:2405.10518  [pdf, ps, other]

    cs.CV eess.IV

    Enhancing Perception Quality in Remote Sensing Image Compression via Invertible Neural Network

    Authors: Junhui Li, Xingsong Hou

    Abstract: Decoding remote sensing images to achieve high perceptual quality, particularly at low bitrates, remains a significant challenge. To address this problem, we propose the invertible neural network-based remote sensing image compression (INN-RSIC) method. Specifically, we capture compression distortion from an existing image compression algorithm and encode it as a set of Gaussian-distributed latent…

    Submitted 16 May, 2024; originally announced May 2024.

  23. arXiv:2405.10210  [pdf, other]

    cs.LG cs.SE

    GPT Store Mining and Analysis

    Authors: Dongxun Su, Yanjie Zhao, Xinyi Hou, Shenao Wang, Haoyu Wang

    Abstract: As a pivotal extension of the renowned ChatGPT, the GPT Store serves as a dynamic marketplace for various Generative Pre-trained Transformer (GPT) models, shaping the frontier of conversational AI. This paper presents an in-depth measurement study of the GPT Store, with a focus on the categorization of GPTs by topic, factors influencing GPT popularity, and the potential security risks. Our investi…

    Submitted 16 May, 2024; originally announced May 2024.

  24. arXiv:2405.04645  [pdf, other]

    cs.HC cs.CY

    Enhancing LLM-Based Feedback: Insights from Intelligent Tutoring Systems and the Learning Sciences

    Authors: John Stamper, Ruiwei Xiao, Xinying Hou

    Abstract: The field of Artificial Intelligence in Education (AIED) focuses on the intersection of technology, education, and psychology, placing a strong emphasis on supporting learners' needs with compassion and understanding. The growing prominence of Large Language Models (LLMs) has led to the development of scalable solutions within educational settings, including generating different types of feedback…

    Submitted 11 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: Accepted to 25th International Conference on Artificial Intelligence in Education (AIED 2024) BlueSky special track

  25. arXiv:2405.03181  [pdf, other]

    cs.DC

    Collaborative Satellite Computing through Adaptive DNN Task Splitting and Offloading

    Authors: Shifeng Peng, Xuefeng Hou, Zhishu Shen, Qiushi Zheng, Jiong Jin, Atsushi Tagami, Jingling Yuan

    Abstract: Satellite computing has emerged as a promising technology for next-generation wireless networks. This innovative technology provides data processing capabilities, which facilitates the widespread implementation of artificial intelligence (AI)-based applications, especially for image processing tasks involving deep neural network (DNN). With the limited computing resources of an individual satellit…

    Submitted 20 May, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted by 29th IEEE Symposium on Computers and Communications (ISCC)

  26. arXiv:2404.16385  [pdf, other]

    cs.CV

    Efficiency in Focus: LayerNorm as a Catalyst for Fine-tuning Medical Visual Language Pre-trained Models

    Authors: Jiawei Chen, Dingkang Yang, Yue Jiang, Mingcheng Li, Jinjie Wei, Xiaolu Hou, Lihua Zhang

    Abstract: In the realm of Medical Visual Language Models (Med-VLMs), the quest for universal efficient fine-tuning mechanisms remains paramount, especially given researchers in interdisciplinary fields are often extremely short of training resources, yet largely unexplored. Given the unique challenges in the medical domain, such as limited data scope and significant domain-specific requirements, evaluating…

    Submitted 25 April, 2024; originally announced April 2024.

  27. arXiv:2404.13677  [pdf, other]

    cs.CV eess.IV

    A Dataset and Model for Realistic License Plate Deblurring

    Authors: Haoyan Gong, Yuzheng Feng, Zhenrong Zhang, Xianxu Hou, Jingxin Liu, Siqi Huang, Hongbin Liu

    Abstract: Vehicle license plate recognition is a crucial task in intelligent traffic management systems. However, the challenge of achieving accurate recognition persists due to motion blur from fast-moving vehicles. Despite the widespread use of image synthesis approaches in existing deblurring and recognition algorithms, their effectiveness in real-world scenarios remains unproven. To address this, we int…

    Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  28. arXiv:2404.12737  [pdf, other]

    cs.SE

    LLM App Store Analysis: A Vision and Roadmap

    Authors: Yanjie Zhao, Xinyi Hou, Shenao Wang, Haoyu Wang

    Abstract: The rapid growth and popularity of large language model (LLM) app stores have created new opportunities and challenges for researchers, developers, users, and app store managers. As the LLM app ecosystem continues to evolve, it is crucial to understand the current landscape and identify potential areas for future research and development. This paper presents a forward-looking analysis of LLM app s…

    Submitted 19 April, 2024; originally announced April 2024.

  29. arXiv:2404.12736  [pdf, other]

    cs.SE

    Large Language Model Supply Chain: A Research Agenda

    Authors: Shenao Wang, Yanjie Zhao, Xinyi Hou, Haoyu Wang

    Abstract: The rapid advancements in pre-trained Large Language Models (LLMs) and Large Multimodal Models (LMMs) have ushered in a new era of intelligent applications, transforming fields ranging from natural language processing to content generation. The LLM supply chain represents a crucial aspect of the contemporary artificial intelligence landscape. It encompasses the entire lifecycle of pre-trained mode…

    Submitted 19 April, 2024; originally announced April 2024.

  30. arXiv:2404.09425  [pdf, other]

    eess.IV cs.CV

    Super-resolution of biomedical volumes with 2D supervision

    Authors: Cheng Jiang, Alexander Gedeon, Yiwei Lyu, Eric Landgraf, Yufeng Zhang, Xinhai Hou, Akhil Kondepudi, Asadur Chowdury, Honglak Lee, Todd Hollon

    Abstract: Volumetric biomedical microscopy has the potential to increase the diagnostic information extracted from clinical tissue specimens and improve the diagnostic accuracy of both human pathologists and computational pathology models. Unfortunately, barriers to integrating 3-dimensional (3D) volumetric microscopy into clinical medicine include long imaging times, poor depth / z-axis resolution, and an…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: CVPR Workshop on Computer Vision for Microscopy Image Analysis 2024

  31. arXiv:2404.08364  [pdf, other]

    cs.DC

    FlowWalker: A Memory-efficient and High-performance GPU-based Dynamic Graph Random Walk Framework

    Authors: Junyi Mei, Shixuan Sun, Chao Li, Cheng Xu, Cheng Chen, Yibo Liu, Jing Wang, Cheng Zhao, Xiaofeng Hou, Minyi Guo, Bingsheng He, Xiaoliang Cong

    Abstract: Dynamic graph random walk (DGRW) emerges as a practical tool for capturing structural relations within a graph. Effectively executing DGRW on GPU presents certain challenges. First, existing sampling methods demand a pre-processing buffer, causing substantial space complexity. Moreover, the power-law distribution of graph vertex degrees introduces workload imbalance issues, rendering DGRW embarras…

    Submitted 26 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

  32. arXiv:2404.05253  [pdf, other]

    cs.CV

    CodeEnhance: A Codebook-Driven Approach for Low-Light Image Enhancement

    Authors: Xu Wu, XianXu Hou, Zhihui Lai, Jie Zhou, Ya-nan Zhang, Witold Pedrycz, Linlin Shen

    Abstract: Low-light image enhancement (LLIE) aims to improve low-illumination images. However, existing methods face two challenges: (1) uncertainty in restoration from diverse brightness degradations; (2) loss of texture and color information caused by noise suppression and light enhancement. In this paper, we propose a novel enhancement approach, CodeEnhance, by leveraging quantized priors and image refin…

    Submitted 30 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: 10 pages, 13 figures

  33. arXiv:2404.02213  [pdf, other]

    cs.HC cs.AI cs.CY

    Exploring How Multiple Levels of GPT-Generated Programming Hints Support or Disappoint Novices

    Authors: Ruiwei Xiao, Xinying Hou, John Stamper

    Abstract: Recent studies have integrated large language models (LLMs) into diverse educational contexts, including providing adaptive programming hints, a type of feedback that focuses on helping students move forward during problem-solving. However, most existing LLM-based hint systems are limited to one single hint type. To investigate whether and how different levels of hints can support students' problem-sol…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted CHI 2024 LBW - 10 pages

  34. arXiv:2403.16131  [pdf, other]

    cs.CV

    Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement

    Authors: Xiuquan Hou, Meiqin Liu, Senlin Zhang, Ping Wei, Badong Chen

    Abstract: DETR-like methods have significantly increased detection performance in an end-to-end manner. The mainstream two-stage frameworks of them perform dense self-attention and select a fraction of queries for sparse cross-attention, which is proven effective for improving performance but also introduces a heavy computational burden and high dependence on stable query selection. This paper demonstrates…

    Submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted to CVPR 2024

  35. arXiv:2403.16002  [pdf, other]

    cs.CV

    SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking

    Authors: Xiaojun Hou, Jiazheng Xing, Yijie Qian, Yaowei Guo, Shuo Xin, Junhao Chen, Kai Tang, Mengmeng Wang, Zhengkai Jiang, Liang Liu, Yong Liu

    Abstract: Multimodal Visual Object Tracking (VOT) has recently gained significant attention due to its robustness. Early research focused on fully fine-tuning RGB-based trackers, which was inefficient and lacked generalized representation due to the scarcity of multimodal data. Therefore, recent studies have utilized prompt tuning to transfer pre-trained RGB-based trackers to multimodal data. However, the m…

    Submitted 27 March, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  36. arXiv:2403.13680  [pdf, other]

    eess.IV cs.CV

    Step-Calibrated Diffusion for Biomedical Optical Image Restoration

    Authors: Yiwei Lyu, Sung Jik Cha, Cheng Jiang, Asadur Chowdury, Xinhai Hou, Edward Harake, Akhil Kondepudi, Christian Freudiger, Honglak Lee, Todd C. Hollon

    Abstract: High-quality, high-resolution medical imaging is essential for clinical care. Raman-based biomedical optical imaging uses non-ionizing infrared radiation to evaluate human tissues in real time and is used for early cancer detection, brain tumor diagnosis, and intraoperative tissue analysis. Unfortunately, optical imaging is vulnerable to image degradation due to laser scattering and absorption, wh…

    Submitted 16 May, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  37. arXiv:2403.11614  [pdf, other]

    cs.CV

    CRS-Diff: Controllable Generative Remote Sensing Foundation Model

    Authors: Datao Tang, Xiangyong Cao, Xingsong Hou, Zhongyuan Jiang, Deyu Meng

    Abstract: The emergence of generative models has revolutionized the field of remote sensing (RS) image generation. Despite generating high-quality images, existing methods are limited in relying mainly on text control conditions and thus don't always generate images accurately and stably. In this paper, we propose CRS-Diff, a new RS generative foundation framework specifically tailored for RS image genera…

    Submitted 11 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  38. arXiv:2402.13430  [pdf, other]

    cs.LG cs.AI cs.SI

    LinkSAGE: Optimizing Job Matching Using Graph Neural Networks

    Authors: Ping Liu, Haichao Wei, Xiaochen Hou, Jianqiang Shen, Shihai He, Kay Qianqi Shen, Zhujun Chen, Fedor Borisyuk, Daniel Hewlett, Liang Wu, Srikant Veeraraghavan, Alex Tsun, Chengming Jiang, Wenjing Zhang

    Abstract: We present LinkSAGE, an innovative framework that integrates Graph Neural Networks (GNNs) into large-scale personalized job matching systems, designed to address the complex dynamics of LinkedIn's extensive professional network. Our approach capitalizes on a novel job marketplace graph, the largest and most intricate of its kind in industry, with billions of nodes and edges. This graph is not merel…

    Submitted 20 February, 2024; originally announced February 2024.

  39. arXiv:2402.11139  [pdf, other]

    cs.LG cs.AI

    LiGNN: Graph Neural Networks at LinkedIn

    Authors: Fedor Borisyuk, Shihai He, Yunbo Ouyang, Morteza Ramezani, Peng Du, Xiaochen Hou, Chengming Jiang, Nitin Pasumarthy, Priya Bannur, Birjodh Tiwana, Ping Liu, Siddharth Dangi, Daqi Sun, Zhoutao Pei, Xiao Shi, Sirou Zhu, Qianqi Shen, Kuang-Hsuan Lee, David Stein, Baolei Li, Haichao Wei, Amol Ghoting, Souvik Ghosh

    Abstract: In this paper, we present LiGNN, a deployed large-scale Graph Neural Networks (GNNs) Framework. We share our insight on the development and deployment of GNNs at large scale at LinkedIn. We present a set of algorithmic improvements to the quality of GNN representation learning including temporal graph architectures with long term losses, effective cold start solutions via graph densification, ID embedd…

    Submitted 16 February, 2024; originally announced February 2024.

  40. arXiv:2402.06859  [pdf, other]

    cs.LG cs.AI cs.IR

    LiRank: Industrial Large Scale Ranking Models at LinkedIn

    Authors: Fedor Borisyuk, Mingzhou Zhou, Qingquan Song, Siyu Zhu, Birjodh Tiwana, Ganesh Parameswaran, Siddharth Dangi, Lars Hertel, Qiang Xiao, Xiaochen Hou, Yunbo Ouyang, Aman Gupta, Sheallika Singh, Dan Liu, Hailing Cheng, Lei Le, Jonathan Hung, Sathiya Keerthi, Ruoyan Wang, Fengyu Zhang, Mohit Kothari, Chen Zhu, Daqi Sun, Yun Dai, Xun Luan, et al. (9 additional authors not shown)

    Abstract: We present LiRank, a large-scale ranking framework at LinkedIn that brings to production state-of-the-art modeling architectures and optimization methods. We unveil several modeling improvements, including Residual DCN, which adds attention and residual connections to the famous DCNv2 architecture. We share insights into combining and tuning SOTA architectures to create a unified model, including…

    Submitted 7 August, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    ACM Class: H.3.3

  41. arXiv:2402.06188  [pdf, other]

    cs.CV cs.AI cs.LG

    A self-supervised framework for learning whole slide representations

    Authors: Xinhai Hou, Cheng Jiang, Akhil Kondepudi, Yiwei Lyu, Asadur Chowdury, Honglak Lee, Todd C. Hollon

    Abstract: Whole slide imaging is fundamental to biomedical microscopy and computational pathology. Previously, learning representations for gigapixel-sized whole slide images (WSIs) has relied on multiple instance learning with weak labels, which do not annotate the diverse morphologic features and spatial heterogeneity of WSIs. A high-quality self-supervised learning method for WSIs would provide transfera…

    Submitted 23 May, 2024; v1 submitted 9 February, 2024; originally announced February 2024.

    Comments: 26 pages, 11 figures

  42. arXiv:2402.04390  [pdf]

    cs.LG cs.AI

    Densely Multiplied Physics Informed Neural Networks

    Authors: Feilong Jiang, Xiaonan Hou, Min Xia

    Abstract: Although physics-informed neural networks (PINNs) have shown great potential in dealing with nonlinear partial differential equations (PDEs), it is common that PINNs will suffer from the problem of insufficient precision or obtaining incorrect outcomes. Unlike most of the existing solutions trying to enhance the ability of PINN by optimizing the training process, this paper improved the neural net…

    Submitted 12 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: 15 pages, 9 figures

  43. CodeTailor: LLM-Powered Personalized Parsons Puzzles for Engaging Support While Learning Programming

    Authors: Xinying Hou, Zihan Wu, Xu Wang, Barbara J. Ericson

    Abstract: Learning to program can be challenging, and providing high-quality and timely support at scale is hard. Generative AI and its products, like ChatGPT, can create a solution for most intro-level programming problems. However, students might use these tools to just generate code for them, resulting in reduced engagement and limited learning. In this paper, we present CodeTailor, a system that leverag…

    Submitted 30 May, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

    Comments: Accepted to the Eleventh ACM Conference on Learning @ Scale (L@S 24) as a full paper

  44. Integrating Personalized Parsons Problems with Multi-Level Textual Explanations to Scaffold Code Writing

    Authors: Xinying Hou, Barbara J. Ericson, Xu Wang

    Abstract: Novice programmers need to write basic code as part of the learning process, but they often face difficulties. To assist struggling students, we recently implemented personalized Parsons problems, which are code puzzles where students arrange blocks of code to solve them, as pop-up scaffolding. Students found them to be more engaging and preferred them for learning, instead of simply receiving the…

    Submitted 11 January, 2024; v1 submitted 6 January, 2024; originally announced January 2024.

    Comments: Peer-Reviewed, Accepted for publication in Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 2 (SIGCSE 2024)

  45. arXiv:2401.00701  [pdf, other]

    cs.CV

    Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning

    Authors: Kaibin Tian, Yanhua Cheng, Yi Liu, Xinglin Hou, Quan Chen, Han Li

    Abstract: In recent years, text-to-video retrieval methods based on CLIP have experienced rapid development. The primary direction of evolution is to exploit the much wider gamut of visual and textual cues to achieve alignment. Concretely, those methods with impressive performance often design a heavy fusion block for sentence (words)-video (frames) interaction, regardless of the prohibitive computation com…

    Submitted 1 January, 2024; originally announced January 2024.

  46. arXiv:2312.06683  [pdf, other]

    cs.IR

    AT4CTR: Auxiliary Match Tasks for Enhancing Click-Through Rate Prediction

    Authors: Qi Liu, Xuyang Hou, Defu Lian, Zhe Wang, Haoran Jin, Jia Cheng, Jun Lei

    Abstract: Click-through rate (CTR) prediction is a vital task in industrial recommendation systems. Most existing methods focus on the network architecture design of the CTR model for better accuracy and suffer from the data sparsity problem. Especially in industrial recommendation systems, the widely applied negative sample down-sampling technique due to resource limitation worsens the problem, resulting i…

    Submitted 18 December, 2023; v1 submitted 9 December, 2023; originally announced December 2023.

  47. Understanding the Effects of Using Parsons Problems to Scaffold Code Writing for Students with Varying CS Self-Efficacy Levels

    Authors: Xinying Hou, Barbara J. Ericson, Xu Wang

    Abstract: Introductory programming courses aim to teach students to write code independently. However, transitioning from studying worked examples to generating their own code is often difficult and frustrating for students, especially those with lower CS self-efficacy in general. Therefore, we investigated the impact of using Parsons problems as a code-writing scaffold for students with varying levels of C…

    Submitted 29 November, 2023; originally announced November 2023.

    Comments: Peer-Reviewed, Accepted for publication in the proceedings of the 2023 ACM Koli Calling International Conference on Computing Education Research

  48. arXiv:2311.17461  [pdf, other]

    cs.CV

    When StyleGAN Meets Stable Diffusion: a $\mathscr{W}_+$ Adapter for Personalized Image Generation

    Authors: Xiaoming Li, Xinyu Hou, Chen Change Loy

    Abstract: Text-to-image diffusion models have remarkably excelled in producing diverse, high-quality, and photo-realistic images. This advancement has spurred a growing interest in incorporating specific identities into generated content. Most current methods employ an inversion approach to embed a target visual concept into the text embedding space using a single reference image. However, the newly synthes…

    Submitted 29 November, 2023; originally announced November 2023.

  49. arXiv:2311.12850  [pdf, other]

    cs.CV cs.CR cs.LG

    PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining

    Authors: Kecen Li, Chen Gong, Zhixiang Li, Yuzhong Zhao, Xinwen Hou, Tianhao Wang

    Abstract: Differential Privacy (DP) image data synthesis leverages the DP technique to generate synthetic data to replace the sensitive data, allowing organizations to share and utilize synthetic images without privacy concerns. Previous methods incorporate the advanced techniques of generative models and pre-training on a public dataset to produce exceptional DP image data, but suffer from problems…

    Submitted 12 April, 2024; v1 submitted 19 October, 2023; originally announced November 2023.

    Comments: Accepted at USENIX Security 2024. The first two authors contributed equally

  50. arXiv:2311.10764  [pdf, other]

    cs.IR cs.AI

    Deep Group Interest Modeling of Full Lifelong User Behaviors for CTR Prediction

    Authors: Qi Liu, Xuyang Hou, Haoran Jin, Jin Chen, Zhe Wang, Defu Lian, Tan Qu, Jia Cheng, Jun Lei

    Abstract: Extracting users' interests from their lifelong behavior sequence is crucial for predicting Click-Through Rate (CTR). Most current methods employ a two-stage process for efficiency: they first select historical behaviors related to the candidate item and then deduce the user's interest from this narrowed-down behavior sub-sequence. This two-stage paradigm, though effective, leads to information lo…

    Submitted 15 November, 2023; originally announced November 2023.