
Showing 1–50 of 266 results for author: Fang, H

Searching in archive cs.
  1. arXiv:2408.10129  [pdf, other]

    cs.CV

    UNINEXT-Cutie: The 1st Solution for LSVOS Challenge RVOS Track

    Authors: Hao Fang, Feiyu Pan, Xiankai Lu, Wei Zhang, Runmin Cong

    Abstract: Referring video object segmentation (RVOS) relies on natural language expressions to segment target objects in video. This year, the LSVOS Challenge RVOS Track replaced the original YouTube-RVOS benchmark with MeViS. MeViS focuses on referring to the target object in a video through its motion descriptions instead of static attributes, posing a greater challenge to the RVOS task. In this work, we integrate…

    Submitted 19 August, 2024; originally announced August 2024.

  2. arXiv:2408.10125  [pdf, other]

    cs.CV

    Video Object Segmentation via SAM 2: The 4th Solution for LSVOS Challenge VOS Track

    Authors: Feiyu Pan, Hao Fang, Runmin Cong, Wei Zhang, Xiankai Lu

    Abstract: The Video Object Segmentation (VOS) task aims to segment a particular object instance throughout the entire video sequence given only the object mask of the first frame. Recently, Segment Anything Model 2 (SAM 2) was proposed, a foundation model towards solving promptable visual segmentation in images and videos. SAM 2 builds a data engine, which improves model and data via user interaction…

    Submitted 19 August, 2024; originally announced August 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2408.00714

  3. arXiv:2408.07630  [pdf, other]

    cs.IR cs.LG

    Towards Fair and Rigorous Evaluations: Hyperparameter Optimization for Top-N Recommendation Task with Implicit Feedback

    Authors: Hui Fang, Xu Feng, Lu Qin, Zhu Sun

    Abstract: The widespread use of the internet has led to an overwhelming amount of data, which has resulted in the problem of information overload. Recommender systems have emerged as a solution to this problem by providing personalized recommendations to users based on their preferences and historical data. However, as recommendation models become increasingly complex, finding the best hyperparameter combin…

    Submitted 14 August, 2024; originally announced August 2024.

  4. arXiv:2408.07600  [pdf, other]

    cs.CV

    Disentangle and denoise: Tackling context misalignment for video moment retrieval

    Authors: Kaijing Ma, Han Fang, Xianghao Zang, Chao Ban, Lanxiang Zhou, Zhongjiang He, Yongxiang Li, Hao Sun, Zerun Feng, Xingsong Hou

    Abstract: Video Moment Retrieval, which aims to locate in-context video moments according to a natural language query, is an essential task for cross-modal grounding. Existing methods focus on enhancing the cross-modal interactions between all moments and the textual description for video understanding. However, constantly interacting with all locations is unreasonable because of uneven semantic distributio…

    Submitted 14 August, 2024; originally announced August 2024.

  5. arXiv:2408.06265  [pdf, other]

    cs.RO

    EyeSight Hand: Design of a Fully-Actuated Dexterous Robot Hand with Integrated Vision-Based Tactile Sensors and Compliant Actuation

    Authors: Branden Romero, Hao-Shu Fang, Pulkit Agrawal, Edward Adelson

    Abstract: In this work, we introduce the EyeSight Hand, a novel 7 degrees of freedom (DoF) humanoid hand featuring integrated vision-based tactile sensors tailored for enhanced whole-hand manipulation. Additionally, we introduce an actuation scheme centered around quasi-direct drive actuation to achieve human-like strength and speed while ensuring robustness for large-scale data collection. We evaluate the…

    Submitted 12 August, 2024; originally announced August 2024.

  6. arXiv:2408.05966  [pdf, other]

    cs.CV cs.AI cs.GR cs.MM

    Freehand Sketch Generation from Mechanical Components

    Authors: Zhichao Liao, Di Huang, Heming Fang, Yue Ma, Fengyuan Piao, Xinghui Li, Long Zeng, Pingfa Feng

    Abstract: Drawing freehand sketches of mechanical components on multimedia devices for AI-based engineering modeling has become a new trend. However, its development is being impeded because existing works cannot produce suitable sketches for data-driven research. These works either generate sketches lacking a freehand style or utilize generative models not originally designed for this task resulting in poo…

    Submitted 21 August, 2024; v1 submitted 12 August, 2024; originally announced August 2024.

    Comments: Published at ACM Multimedia (ACM MM) 2024

  7. arXiv:2408.02455  [pdf, other]

    cs.RO

    A Surprisingly Efficient Representation for Multi-Finger Grasping

    Authors: Hengxu Yan, Hao-Shu Fang, Cewu Lu

    Abstract: The problem of grasping objects using a multi-finger hand has received significant attention in recent years. However, it remains challenging to handle a large number of unfamiliar objects in real and cluttered environments. In this work, we propose a representation that can be effectively mapped to the multi-finger grasp space. Based on this representation, we develop a simple decision model that…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Published at International Conference on Robotics and Automation (ICRA) 2024

  8. arXiv:2408.02053  [pdf]

    cs.CV

    PanicleNeRF: low-cost, high-precision in-field phenotyping of rice panicles with smartphone

    Authors: Xin Yang, Xuqi Lu, Pengyao Xie, Ziyue Guo, Hui Fang, Haowei Fu, Xiaochun Hu, Zhenbiao Sun, Haiyan Cen

    Abstract: Rice panicle traits significantly influence grain yield, making them a primary target for rice phenotyping studies. However, most existing techniques are limited to controlled indoor environments and have difficulty capturing rice panicle traits under natural growth conditions. Here, we developed PanicleNeRF, a novel method that enables high-precision and low-cost reconstruction of rice panicle…

    Submitted 4 August, 2024; originally announced August 2024.

  9. arXiv:2408.01342  [pdf, other]

    cs.IR cs.AI

    Leveraging Knowledge Graph Embedding for Effective Conversational Recommendation

    Authors: Yunwen Xia, Hui Fang, Jie Zhang, Chong Long

    Abstract: Conversational recommender systems (CRS), which combine the techniques of dialogue systems and recommender systems, have attracted increasing interest recently. In contrast to traditional recommender systems, a CRS learns the user preference better through interactions (i.e., conversations), and then further boosts the recommendation performance. However, existing studies on CRS fail to address the relat…

    Submitted 2 August, 2024; originally announced August 2024.

    Comments: 26 pages, 15 figures

  10. arXiv:2407.18849  [pdf]

    cs.SI cs.CY

    MNTD: An Efficient Dynamic Community Detector Based on Nonnegative Tensor Decomposition

    Authors: Hao Fang, Qu Wang, Qicong Hu, Hao Wu

    Abstract: Dynamic community detection is crucial for elucidating the temporal evolution of social structures, information dissemination, and interactive behaviors within complex networks. Nonnegative matrix factorization provides an efficient framework for identifying communities in static networks but falls short in depicting temporal variations in community affiliations. To solve this problem, this paper p…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 10 pages, 5 figures. This paper will be published at the 2024 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

  11. arXiv:2407.17689  [pdf, other]

    cs.CV

    SAM-MIL: A Spatial Contextual Aware Multiple Instance Learning Approach for Whole Slide Image Classification

    Authors: Heng Fang, Sheng Huang, Wenhao Tang, Luwen Huangfu, Bo Liu

    Abstract: Multiple Instance Learning (MIL) represents the predominant framework in Whole Slide Image (WSI) classification, covering aspects such as sub-typing, diagnosis, and beyond. Current MIL models predominantly rely on instance-level features derived from pretrained models such as ResNet. These models segment each WSI into independent patches and extract features from these local patches, leading to a…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: accepted by ACM Multimedia 2024

  12. arXiv:2407.14230  [pdf, other]

    cs.CV cs.LG

    ETSCL: An Evidence Theory-Based Supervised Contrastive Learning Framework for Multi-modal Glaucoma Grading

    Authors: Zhiyuan Yang, Bo Zhang, Yufei Shi, Ningze Zhong, Johnathan Loh, Huihui Fang, Yanwu Xu, Si Yong Yeo

    Abstract: Glaucoma is one of the leading causes of vision impairment. Digital imaging techniques, such as color fundus photography (CFP) and optical coherence tomography (OCT), provide quantitative and noninvasive methods for glaucoma diagnosis. Recently, in the field of computer-aided glaucoma diagnosis, multi-modality methods that integrate the CFP and OCT modalities have achieved greater diagnostic accur…

    Submitted 19 July, 2024; originally announced July 2024.

    Comments: Accepted by Ophthalmic Medical Image Analysis Workshop at MICCAI'24

  13. arXiv:2407.13863  [pdf, other]

    cs.CV

    A Closer Look at GAN Priors: Exploiting Intermediate Features for Enhanced Model Inversion Attacks

    Authors: Yixiang Qiu, Hao Fang, Hongyao Yu, Bin Chen, MeiKang Qiu, Shu-Tao Xia

    Abstract: Model Inversion (MI) attacks aim to reconstruct privacy-sensitive training data from released models by utilizing output information, raising extensive concerns about the security of Deep Neural Networks (DNNs). Recent advances in generative adversarial networks (GANs) have contributed significantly to the improved performance of MI attacks due to their powerful ability to generate realistic image…

    Submitted 27 July, 2024; v1 submitted 18 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  14. arXiv:2407.11843  [pdf, other]

    cs.CL cs.AI

    InferAct: Inferring Safe Actions for LLM-Based Agents Through Preemptive Evaluation and Human Feedback

    Authors: Haishuo Fang, Xiaodan Zhu, Iryna Gurevych

    Abstract: A crucial requirement for deploying LLM-based agents in real-life applications is robustness against risky or irreversible mistakes. However, existing research lacks a focus on the preemptive evaluation of reasoning trajectories performed by LLM agents, leading to a gap in ensuring safe and reliable operations. To explore better solutions, this paper introduces InferAct, a novel approach that leve…

    Submitted 16 July, 2024; originally announced July 2024.

  15. arXiv:2407.10179  [pdf, other]

    cs.CV

    CLIP-Guided Networks for Transferable Targeted Attacks

    Authors: Hao Fang, Jiawei Kong, Bin Chen, Tao Dai, Hao Wu, Shu-Tao Xia

    Abstract: Transferable targeted adversarial attacks aim to mislead models into outputting adversary-specified predictions in black-box scenarios. Recent studies have introduced single-target generative attacks that train a generator for each target class to generate highly transferable perturbations, resulting in substantial computational overhead when handling multiple classes. Multi-targe…

    Submitted 22 July, 2024; v1 submitted 14 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  16. arXiv:2407.08926  [pdf, other]

    cs.IR

    Toward Automatic Group Membership Annotation for Group Fairness Evaluation

    Authors: Fumian Chen, Dayu Yang, Hui Fang

    Abstract: With the increasing research attention on fairness in information retrieval systems, more and more fairness-aware algorithms have been proposed to ensure fairness for a sustainable and healthy retrieval ecosystem. However, as the most adopted measurement of fairness-aware algorithms, group fairness evaluation metrics require group membership information that needs massive human annotations and is…

    Submitted 11 July, 2024; originally announced July 2024.

    Journal ref: NLDB2024

  17. arXiv:2407.08348  [pdf, other]

    cs.AI cs.CL cs.LG

    Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On

    Authors: Liang Zeng, Liangjun Zhong, Liang Zhao, Tianwen Wei, Liu Yang, Jujie He, Cheng Cheng, Rui Hu, Yang Liu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this paper, we investigate the underlying factors that potentially enhance the mathematical reasoning capabilities of large language models (LLMs). We argue that the data scaling law for math reasoning capabilities in modern LLMs is far from being saturated, highlighting how the model's quality improves with increases in data quantity. To support this claim, we introduce the Skywork-Math model…

    Submitted 17 July, 2024; v1 submitted 11 July, 2024; originally announced July 2024.

  18. arXiv:2407.07427  [pdf, other]

    cs.CV

    Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation

    Authors: Hao Fang, Peng Wu, Yawei Li, Xinxin Zhang, Xiankai Lu

    Abstract: Open-Vocabulary Video Instance Segmentation (VIS) is attracting increasing attention due to its ability to segment and track arbitrary objects. However, the recent Open-Vocabulary VIS attempts obtained unsatisfactory results, especially in terms of generalization ability of novel categories. We discover that the domain gap between the VLM features (e.g., CLIP) and the instance queries and the unde…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  19. arXiv:2406.17458  [pdf, other]

    cs.CV

    Continuous Urban Change Detection from Satellite Image Time Series with Temporal Feature Refinement and Multi-Task Integration

    Authors: Sebastian Hafner, Heng Fang, Hossein Azizpour, Yifang Ban

    Abstract: Urbanization advances at unprecedented rates, resulting in negative effects on the environment and human well-being. Remote sensing has the potential to mitigate these effects by supporting sustainable development strategies with accurate information on urban growth. Deep learning-based methods have achieved promising urban change detection results from optical satellite image pairs using convolut…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE Transactions on Geoscience and Remote Sensing. Code will be available at https://github.com/SebastianHafner/ContUrbanCD.git

  20. arXiv:2406.17005  [pdf, other]

    cs.CV

    PVUW 2024 Challenge on Complex Video Understanding: Methods and Results

    Authors: Henghui Ding, Chang Liu, Yunchao Wei, Nikhila Ravi, Shuting He, Song Bai, Philip Torr, Deshui Miao, Xin Li, Zhenyu He, Yaowei Wang, Ming-Hsuan Yang, Zhensong Xu, Jiangtao Yao, Chengjing Wu, Ting Liu, Luoqi Liu, Xinyu Liu, Jing Zhang, Kexin Zhang, Yuting Yang, Licheng Jiao, Shuyuan Yang, Mingqi Gao, Jingnan Luo, et al. (12 additional authors not shown)

    Abstract: The Pixel-level Video Understanding in the Wild Challenge (PVUW) focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks, the Complex Video Object Segmentation Track based on the MOSE dataset and the Motion Expression guided Video Segmentation track based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations that feature challenging elements, such as…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: MOSE Challenge: https://henghuiding.github.io/MOSE/ChallengeCVPR2024, MeViS Challenge: https://henghuiding.github.io/MeViS/ChallengeCVPR2024

  21. arXiv:2406.11142  [pdf, other]

    cs.RO cs.CV

    Graspness Discovery in Clutters for Fast and Accurate Grasp Detection

    Authors: Chenxi Wang, Hao-Shu Fang, Minghao Gou, Hongjie Fang, Jin Gao, Cewu Lu

    Abstract: Efficient and robust grasp pose detection is vital for robotic manipulation. For general 6 DoF grasping, conventional methods treat all points in a scene equally and usually adopt uniform sampling to select grasp candidates. However, we discover that ignoring where to grasp greatly harms the speed and accuracy of current grasp pose detection methods. In this paper, we propose "graspness", a qualit…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: ICCV 2021

  22. arXiv:2406.07080  [pdf, other]

    cs.CL

    DARA: Decomposition-Alignment-Reasoning Autonomous Language Agent for Question Answering over Knowledge Graphs

    Authors: Haishuo Fang, Xiaodan Zhu, Iryna Gurevych

    Abstract: Answering Questions over Knowledge Graphs (KGQA) is key to well-functioning autonomous language agents in various real-life applications. To improve the neural-symbolic reasoning capabilities of language agents powered by Large Language Models (LLMs) in KGQA, we propose the Decomposition-Alignment-Reasoning Agent (DARA) framework. DARA effectively parses questions into formal queries through a dual…

    Submitted 11 June, 2024; originally announced June 2024.

    Comments: Accepted by ACL2024 findings

  23. arXiv:2406.06563  [pdf, other]

    cs.CL cs.AI

    Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models

    Authors: Tianwen Wei, Bo Zhu, Liang Zhao, Cheng Cheng, Biye Li, Weiwei Lü, Peng Cheng, Jianhao Zhang, Xiaoyu Zhang, Liang Zeng, Xiaokun Wang, Yutuan Ma, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: In this technical report, we introduce the training methodologies implemented in the development of Skywork-MoE, a high-performance mixture-of-experts (MoE) large language model (LLM) with 146 billion parameters and 16 experts. It is initialized from the pre-existing dense checkpoints of our Skywork-13B model. We explore the comparative effectiveness of upcycling versus training from scratch initi…

    Submitted 2 June, 2024; originally announced June 2024.

  24. arXiv:2406.05704  [pdf, other]

    cs.CV

    Hierarchical Features Matter: A Deep Exploration of GAN Priors for Improved Dataset Distillation

    Authors: Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Tao Dai, Meikang Qiu, Shu-Tao Xia

    Abstract: Dataset distillation is an emerging dataset reduction method, which condenses large-scale datasets while maintaining task accuracy. Current methods have integrated parameterization techniques to boost synthetic dataset performance by shifting the optimization space from pixel to another informative feature domain. However, they limit themselves to a fixed optimization space for distillation, negle…

    Submitted 12 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  25. arXiv:2406.05491  [pdf, other]

    cs.CV cs.CR

    One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models

    Authors: Hao Fang, Jiawei Kong, Wenbo Yu, Bin Chen, Jiawei Li, Shutao Xia, Ke Xu

    Abstract: Vision-Language Pre-training (VLP) models trained on large-scale image-text pairs have demonstrated unprecedented capability in many practical applications. However, previous studies have revealed that VLP models are vulnerable to adversarial samples crafted by a malicious adversary. While existing attacks have achieved great success in improving attack effect and transferability, they all focus o…

    Submitted 8 June, 2024; originally announced June 2024.

  26. arXiv:2406.04842  [pdf, other]

    cs.CV

    3rd Place Solution for MeViS Track in CVPR 2024 PVUW workshop: Motion Expression guided Video Segmentation

    Authors: Feiyu Pan, Hao Fang, Xiankai Lu

    Abstract: Referring video object segmentation (RVOS) relies on natural language expressions to segment target objects in video, emphasizing modeling dense text-video relations. The current RVOS methods typically use independently pre-trained vision and language models as backbones, resulting in a significant domain gap between video and text. In cross-modal feature interaction, text features are only used a…

    Submitted 7 June, 2024; originally announced June 2024.

  27. arXiv:2406.04214  [pdf, other]

    cs.CL

    ValueBench: Towards Comprehensively Evaluating Value Orientations and Understanding of Large Language Models

    Authors: Yuanyi Ren, Haoran Ye, Hanjun Fang, Xin Zhang, Guojie Song

    Abstract: Large Language Models (LLMs) are transforming diverse fields and gaining increasing influence as human proxies. This development underscores the urgent need for evaluating value orientations and understanding of LLMs to ensure their responsible integration into public-facing applications. This work introduces ValueBench, the first comprehensive psychometric benchmark for evaluating value orientati…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at ACL 2024

  28. arXiv:2406.01436  [pdf, other]

    cs.CL

    Editing the Mind of Giants: An In-Depth Exploration of Pitfalls of Knowledge Editing in Large Language Models

    Authors: Cheng-Hsun Hsueh, Paul Kuo-Ming Huang, Tzu-Han Lin, Che-Wei Liao, Hung-Chieh Fang, Chao-Wei Huang, Yun-Nung Chen

    Abstract: Knowledge editing is a rising technique for efficiently updating factual knowledge in Large Language Models (LLMs) with minimal alteration of parameters. However, recent studies have identified concerning side effects, such as knowledge distortion and the deterioration of general abilities, that have emerged after editing. This survey presents a comprehensive study of these side effects, providing…

    Submitted 3 June, 2024; originally announced June 2024.

  29. arXiv:2406.00605  [pdf, other]

    cs.CL cs.AI

    LongSkywork: A Training Recipe for Efficiently Extending Context Length in Large Language Models

    Authors: Liang Zhao, Tianwen Wei, Liang Zeng, Cheng Cheng, Liu Yang, Peng Cheng, Lijie Wang, Chenxia Li, Xuejie Wu, Bo Zhu, Yimeng Gan, Rui Hu, Shuicheng Yan, Han Fang, Yahui Zhou

    Abstract: We introduce LongSkywork, a long-context Large Language Model (LLM) capable of processing up to 200,000 tokens. We provide a training recipe for efficiently extending context length of LLMs. We identify that the critical element in enhancing long-context processing capability is to incorporate a long-context SFT stage following the standard SFT stage. A mere 200 iterations can convert the standard…

    Submitted 1 June, 2024; originally announced June 2024.

  30. arXiv:2405.20725  [pdf, other]

    cs.AI cs.CV

    GI-NAS: Boosting Gradient Inversion Attacks through Adaptive Neural Architecture Search

    Authors: Wenbo Yu, Hao Fang, Bin Chen, Xiaohang Sui, Chuan Chen, Hao Wu, Shu-Tao Xia, Ke Xu

    Abstract: Gradient Inversion Attacks invert the transmitted gradients in Federated Learning (FL) systems to reconstruct the sensitive data of local clients and have raised considerable privacy concerns. A majority of gradient inversion methods rely heavily on explicit prior knowledge (e.g., a well pre-trained generative model), which is often unavailable in realistic scenarios. To alleviate this issue, rese…

    Submitted 31 May, 2024; originally announced May 2024.

  31. Learn to be Fair without Labels: a Distribution-based Learning Framework for Fair Ranking

    Authors: Fumian Chen, Hui Fang

    Abstract: Ranking algorithms as an essential component of retrieval systems have been constantly improved in previous studies, especially regarding relevance-based utilities. In recent years, more and more research attempts have been proposed regarding fairness in rankings due to increasing concerns about potential discrimination and the issue of echo chamber. These attempts include traditional score-based…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: ICTIR'23

  32. LEO Satellite Network Access in the Wild: Potentials, Experiences, and Challenges

    Authors: Sami Ma, Yi Ching Chou, Miao Zhang, Hao Fang, Haoyuan Zhao, Jiangchuan Liu, William I. Atlas

    Abstract: In the past three years, working with the Pacific Salmon Foundation and various First Nations groups, we have established Starlink-empowered wild salmon monitoring sites in remote Northern British Columbia, Canada. We report our experiences with the network services in these challenging environments, including deep woods and deep valleys, that lack infrastructural support with some close to Starli…

    Submitted 10 May, 2024; originally announced May 2024.

    Comments: 8 pages, 6 figures

    ACM Class: C.2.1

  33. arXiv:2405.03458  [pdf, other]

    cs.CV

    SSyncOA: Self-synchronizing Object-aligned Watermarking to Resist Cropping-paste Attacks

    Authors: Chengxin Zhao, Hefei Ling, Sijing Xie, Han Fang, Yaokun Fang, Nan Sun

    Abstract: Modern image processing tools have made it easy for attackers to crop the region or object of interest in images and paste it into other images. The challenge this cropping-paste attack poses to the watermarking technology is that it breaks the synchronization of the image watermark, introducing multiple superimposed desynchronization distortions, such as rotation, scaling, and translation. Howeve…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: 7 pages, 5 figures (accepted by ICME 2024)

  34. arXiv:2404.16233  [pdf, other]

    cs.LG cs.AI

    AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models

    Authors: Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis

    Abstract: AutoGluon-Multimodal (AutoMM) is introduced as an open-source AutoML library designed specifically for multimodal learning. Distinguished by its exceptional ease of use, AutoMM enables fine-tuning of foundation models with just three lines of code. Supporting various modalities including image, text, and tabular data, both independently and in combination, the library offers a comprehensive suite…

    Submitted 30 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted at AutoML 2024 Conference

  35. arXiv:2404.13263  [pdf, other]

    cs.CV

    FilterPrompt: Guiding Image Transfer in Diffusion Models

    Authors: Xi Wang, Yichen Peng, Heng Fang, Haoran Xie, Xi Yang, Chuntao Li

    Abstract: In controllable generation tasks, flexibly manipulating the generated images to attain a desired appearance or structure based on a single input image cue remains a critical and longstanding challenge. Achieving this requires the effective decoupling of key attributes within the input image data, aiming to get representations accurately. Previous research has predominantly concentrated on disentan…

    Submitted 12 May, 2024; v1 submitted 20 April, 2024; originally announced April 2024.

    Comments: Project Page: https://meaoxixi.github.io/FilterPrompt/

  36. arXiv:2404.12281  [pdf, other]

    cs.RO

    RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective

    Authors: Chenxi Wang, Hongjie Fang, Hao-Shu Fang, Cewu Lu

    Abstract: Precise robot manipulations require rich spatial information in imitation learning. Image-based policies model object positions from fixed cameras, which are sensitive to camera view changes. Policies utilizing 3D point clouds usually predict keyframes rather than continuous actions, posing difficulty in dynamic and contact-rich scenarios. To utilize 3D perception efficiently, we present RISE, an…

    Submitted 21 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  37. arXiv:2404.12216  [pdf, other]

    cs.CV

    ProTA: Probabilistic Token Aggregation for Text-Video Retrieval

    Authors: Han Fang, Xianghao Zang, Chao Ban, Zerun Feng, Lanxiang Zhou, Zhongjiang He, Yongxiang Li, Hao Sun

    Abstract: Text-video retrieval aims to find the most relevant cross-modal samples for a given query. Recent methods focus on modeling the whole spatial-temporal relations. However, since video clips contain more diverse content than captions, the model aligning these asymmetric video-text pairs has a high risk of retrieving many false positive results. In this paper, we propose Probabilistic Token Aggregati…

    Submitted 20 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  38. Behavior Alignment: A New Perspective of Evaluating LLM-based Conversational Recommendation Systems

    Authors: Dayu Yang, Fumian Chen, Hui Fang

    Abstract: Large Language Models (LLMs) have demonstrated great potential in Conversational Recommender Systems (CRS). However, the application of LLMs to CRS has exposed a notable discrepancy in behavior between LLM-based CRS and human recommenders: LLMs often appear inflexible and passive, frequently rushing to complete the recommendation task without sufficient inquiry. This behavior discrepancy can lead t…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: Accepted by the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024)

  39. arXiv:2404.04956  [pdf, other]

    cs.CV cs.CR

    Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models

    Authors: Zijin Yang, Kai Zeng, Kejiang Chen, Han Fang, Weiming Zhang, Nenghai Yu

    Abstract: Ethical concerns surrounding copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models. One effective solution involves watermarking the generated images. However, existing methods often compromise the model performance or require additional training, which is undesirable for operators and users. To address this issue, we propose…

    Submitted 6 May, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: 17 pages, 11 figures, accepted by CVPR 2024

  40. arXiv:2404.02284  [pdf, other]

    cs.RO

    APEX: Ambidextrous Dual-Arm Robotic Manipulation Using Collision-Free Generative Diffusion Models

    Authors: Apan Dastider, Hao Fang, Mingjie Lin

    Abstract: Dexterous manipulation, particularly adept coordinating and grasping, constitutes a fundamental and indispensable capability for robots, facilitating the emulation of human-like behaviors. Integrating this capability into robots empowers them to supplement and even supplant humans in undertaking increasingly intricate tasks in both daily life and industrial settings. Unfortunately, contemporary me…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Under Review in IEEE IROS 2024

  41. arXiv:2404.00261  [pdf, other]

    cs.IR cs.AI

    A Simple Yet Effective Approach for Diversified Session-Based Recommendation

    Authors: Qing Yin, Hui Fang, Zhu Sun, Yew-Soon Ong

    Abstract: Session-based recommender systems (SBRSs) have become extremely popular in view of the core capability of capturing short-term and dynamic user preferences. However, most SBRSs primarily maximize recommendation accuracy but ignore user minor preferences, thus leading to filter bubbles in the long run. Only a handful of works, being devoted to improving diversity, depend on unique model designs and…

    Submitted 30 March, 2024; originally announced April 2024.

  42. arXiv:2403.19622  [pdf, other]

    cs.RO cs.CV

    RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

    Authors: Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

    Abstract: The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments. Recent progress in utilizing language models as high-level planners has demonstrated that the complexity of tasks can be reduced through decomposing them into primitive-level plans, mak…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 24 pages, 12 figures, 6 tables

  43. arXiv:2403.18462  [pdf, other]

    cs.IR

    Decoy Effect In Search Interaction: Understanding User Behavior and Measuring System Vulnerability

    Authors: Nuo Chen, Jiqun Liu, Hanpei Fang, Yuankai Luo, Tetsuya Sakai, Xiao-Ming Wu

    Abstract: This study examines the decoy effect's underexplored influence on user search interactions and methods for measuring information retrieval (IR) systems' vulnerability to this effect. It explores how decoy results alter users' interactions on search engine result pages, focusing on metrics like click-through likelihood, browsing time, and perceived document usefulness. By analyzing user interaction…

    Submitted 27 March, 2024; originally announced March 2024.

  44. Data-driven Energy Consumption Modelling for Electric Micromobility using an Open Dataset

    Authors: Yue Ding, Sen Yan, Maqsood Hussain Shah, Hongyuan Fang, Ji Li, Mingming Liu

    Abstract: The escalating challenges of traffic congestion and environmental degradation underscore the critical importance of embracing E-Mobility solutions in urban spaces. In particular, micro E-Mobility tools such as E-scooters and E-bikes play a pivotal role in this transition, offering sustainable alternatives for urban commuters. However, the energy consumption patterns for these tools are a critical…

    Submitted 19 August, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: 7 pages, 5 figures, 4 tables. This manuscript has been accepted by the IEEE ITEC 2024

  45. arXiv:2403.16788  [pdf, other]

    cs.CV

    HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation

    Authors: Linglin Jing, Yiming Ding, Yunpeng Gao, Zhigang Wang, Xu Yan, Dong Wang, Gerald Schaefer, Hui Fang, Bin Zhao, Xuelong Li

    Abstract: Event-based semantic segmentation has gained popularity due to its capability to deal with scenarios under high-speed motion and extreme lighting conditions, which cannot be addressed by conventional RGB cameras. Since it is hard to annotate event data, previous approaches rely on event-to-image reconstruction to obtain pseudo labels for training. However, this will inevitably introduce noise, and…

    Submitted 25 March, 2024; originally announced March 2024.

  46. arXiv:2403.08948  [pdf, ps, other]

    eess.SY cs.GT

    Model-free Resilient Controller Design based on Incentive Feedback Stackelberg Game and Q-learning

    Authors: Jiajun Shen, Fengjun Li, Morteza Hashemi, Huazhen Fang

    Abstract: In the swift evolution of Cyber-Physical Systems (CPSs) within intelligent environments, especially in the industrial domain shaped by Industry 4.0, the surge in development brings forth unprecedented security challenges. This paper explores the intricate security issues of Industrial CPSs (ICPSs), with a specific focus on the unique threats presented by intelligent attackers capable of directly c…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 8 pages

  47. arXiv:2403.04746  [pdf, other]

    cs.CL cs.AI cs.LG

    LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

    Authors: Boshi Wang, Hao Fang, Jason Eisner, Benjamin Van Durme, Yu Su

    Abstract: Tools are essential for large language models (LLMs) to acquire up-to-date information and take consequential actions in external environments. Existing work on tool-augmented LLMs primarily focuses on the broad coverage of tools and the flexibility of adding new tools. However, a critical aspect that has surprisingly been understudied is simply how accurately an LLM uses tools for which it has be…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: Code and data available at https://github.com/microsoft/simulated-trial-and-error

  48. arXiv:2402.14872  [pdf, other]

    cs.CL cs.AI cs.NE

    Semantic Mirror Jailbreak: Genetic Algorithm Based Jailbreak Prompts Against Open-source LLMs

    Authors: Xiaoxia Li, Siyuan Liang, Jiyi Zhang, Han Fang, Aishan Liu, Ee-Chien Chang

    Abstract: Large Language Models (LLMs), used in creative writing, code generation, and translation, generate text based on input sequences but are vulnerable to jailbreak attacks, where crafted prompts induce harmful outputs. Most jailbreak prompt methods use a combination of jailbreak templates followed by questions to ask to create jailbreak prompts. However, existing jailbreak prompt designs generally su…

    Submitted 27 February, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  49. arXiv:2402.09270  [pdf, other]

    cs.CV

    Fast Window-Based Event Denoising with Spatiotemporal Correlation Enhancement

    Authors: Huachen Fang, Jinjian Wu, Qibin Hou, Weisheng Dong, Guangming Shi

    Abstract: Previous deep learning-based event denoising methods mostly suffer from poor interpretability and difficulty in real-time processing due to their complex architecture designs. In this paper, we propose window-based event denoising, which simultaneously deals with a stack of events while existing element-based denoising focuses on one event each time. Besides, we give the theoretical analysis based…

    Submitted 14 February, 2024; originally announced February 2024.

  50. arXiv:2402.05819  [pdf, other]

    eess.AS cs.CL cs.LG

    Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model

    Authors: Hung-Chieh Fang, Nai-Xuan Ye, Yi-Jen Shih, Puyuan Peng, Hsuan-Fu Wang, Layne Berry, Hung-yi Lee, David Harwath

    Abstract: Recent advances in self-supervised speech models have shown significant improvement in many downstream tasks. However, these models are predominantly centered on frame-level training objectives, which can fall short in spoken language understanding tasks that require semantic comprehension. Existing works often rely on additional speech-text data as intermediate targets, which is costly in the real-wo…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted to ICASSP 2024 workshop on Self-supervision in Audio, Speech, and Beyond (SASB)