Showing 1–50 of 1,970 results for author: Zhou, J

Searching in archive cs.

  1. arXiv:2408.10908  [pdf, other]

    cs.RO cs.HC

    Enhancing End-to-End Autonomous Driving Systems Through Synchronized Human Behavior Data

    Authors: Yiqun Duan, Zhuoli Zhuang, Jinzhao Zhou, Yu-Cheng Chang, Yu-Kai Wang, Chin-Teng Lin

    Abstract: This paper presents a pioneering exploration into the integration of fine-grained human supervision within the autonomous driving domain to enhance system performance. The current advances in End-to-End autonomous driving normally are data-driven and rely on given expert trials. However, this reliance limits the systems' generalizability and their ability to earn human trust. Addressing this gap,…

    Submitted 20 August, 2024; originally announced August 2024.

  2. arXiv:2408.10822  [pdf, other]

    cs.LG

    Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting

    Authors: Jianxiang Zhou, Erdong Liu, Wei Chen, Siru Zhong, Yuxuan Liang

    Abstract: Traffic forecasting has emerged as a crucial research area in the development of smart cities. Although various neural networks with intricate architectures have been developed to address this problem, they still face two key challenges: i) Recent advancements in network designs for modeling spatio-temporal correlations are starting to see diminishing returns in performance enhancements. ii) Addit…

    Submitted 20 August, 2024; originally announced August 2024.

  3. arXiv:2408.10764  [pdf, other]

    cs.CL

    Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model

    Authors: Chenhan Yuan, Fei Huang, Ru Peng, Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou

    Abstract: Transformer-based large language models (LLMs) exhibit limitations such as generating unsafe responses, unreliable reasoning, etc. Existing inference intervention approaches attempt to mitigate these issues by finetuning additional models to produce calibration signals (such as rewards) that guide the LLM's decoding process. However, this solution introduces substantial time and space overhead due…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: 16 pages

  4. arXiv:2408.10679  [pdf, other]

    cs.CV

    DemMamba: Alignment-free Raw Video Demoireing with Frequency-assisted Spatio-Temporal Mamba

    Authors: Shuning Xu, Xina Liu, Binbin Song, Xiangyu Chen, Qiubo Chen, Jiantao Zhou

    Abstract: Moire patterns arise when two similar repetitive patterns interfere, a phenomenon frequently observed during the capture of images or videos on screens. The color, shape, and location of moire patterns may differ across video frames, posing a challenge in learning information from adjacent frames and preserving temporal consistency. Previous video demoireing methods heavily rely on well-designed a…

    Submitted 20 August, 2024; originally announced August 2024.

  5. arXiv:2408.10567  [pdf, other]

    q-bio.NC cs.AI cs.CV cs.LG

    Prompt Your Brain: Scaffold Prompt Tuning for Efficient Adaptation of fMRI Pre-trained Model

    Authors: Zijian Dong, Yilei Wu, Zijiao Chen, Yichi Zhang, Yueming Jin, Juan Helen Zhou

    Abstract: We introduce Scaffold Prompt Tuning (ScaPT), a novel prompt-based framework for adapting large-scale functional magnetic resonance imaging (fMRI) pre-trained models to downstream tasks, with high parameter efficiency and improved performance compared to fine-tuning and baselines for prompt tuning. The full fine-tuning updates all pre-trained parameters, which may distort the learned feature space…

    Submitted 20 August, 2024; originally announced August 2024.

    Comments: MICCAI 2024

  6. arXiv:2408.09058  [pdf, other]

    cs.RO

    Vision-assisted Avocado Harvesting with Aerial Bimanual Manipulation

    Authors: Zhichao Liu, Jingzong Zhou, Caio Mucchiani, Konstantinos Karydis

    Abstract: Robotic fruit harvesting holds potential in precision agriculture to improve harvesting efficiency. While ground mobile robots are mostly employed in fruit harvesting, certain crops, like avocado trees, cannot be harvested efficiently from the ground alone. This is because of unstructured ground and planting arrangement and high-to-reach fruits. In such cases, aerial robots integrated with manipul…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: First Two Authors Share Equal Contribution. 13 Pages, 15 Figures

  7. arXiv:2408.08601  [pdf, other]

    cs.CV

    Learning A Low-Level Vision Generalist via Visual Task Prompt

    Authors: Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, Chao Dong

    Abstract: Building a unified model for general low-level vision tasks holds significant research and practical value. Current methods encounter several critical issues. Multi-task restoration approaches can address multiple degradation-to-clean restoration tasks, while their applicability to tasks with different target domains (e.g., image stylization) is limited. Methods like PromptGIP can handle multiple…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: Accepted to ACMMM24

  8. arXiv:2408.08570  [pdf, other]

    cs.CV

    EraW-Net: Enhance-Refine-Align W-Net for Scene-Associated Driver Attention Estimation

    Authors: Jun Zhou, Chunsheng Liu, Faliang Chang, Wenqian Wang, Penghui Hao, Yiming Huang, Zhiqiang Yang

    Abstract: Associating driver attention with driving scene across two fields of views (FOVs) is a hard cross-domain perception problem, which requires comprehensive consideration of cross-view mapping, dynamic driving scene analysis, and driver status tracking. Previous methods typically focus on a single view or map attention to the scene via estimated gaze, failing to exploit the implicit connection betwee…

    Submitted 16 August, 2024; originally announced August 2024.

    Comments: 13 pages, 9 figures

  9. arXiv:2408.08018  [pdf, other]

    cs.HC

    Investigating Size Congruency Between the Visual Perception of a VR Object and the Haptic Perception of Its Physical World Agent

    Authors: Wenqi Zheng, Dawei Xiong, Cekai Weng, Jiajun Jiang, Junwei Li, Jinni Zhou, Mingming Fan

    Abstract: The perception of physical objects and miniatures enhances the realism and immersion in VR. This work explores the relationship between haptic feedback from real objects and their visual representations in VR. The study examines how users confirm and adjust the sizes of different virtual objects. The results show that as the size of the virtual cubes increases, users are less likely to perceive th…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 8 pages, 6 figures, VINCI 2024

  10. arXiv:2408.08003  [pdf, other]

    cs.CL

    Leveraging Web-Crawled Data for High-Quality Fine-Tuning

    Authors: Jing Zhou, Chenglin Jiang, Wei Shen, Xiao Zhou, Xiaonan He

    Abstract: Most large language models are fine-tuned using either expensive human-annotated data or GPT-4 generated data which cannot guarantee performance in certain domains. We argue that although the web-crawled data often has formatting errors causing semantic inaccuracies, it can still serve as a valuable source for high-quality supervised fine-tuning in specific domains without relying on advanced mode…

    Submitted 15 August, 2024; originally announced August 2024.

  11. arXiv:2408.07733  [pdf, other]

    cs.LG cs.CR

    Enhancing Adversarial Attacks via Parameter Adaptive Adversarial Attack

    Authors: Zhibo Jin, Jiayu Zhang, Zhiyu Zhu, Chenyu Zhang, Jiahao Huang, Jianlong Zhou, Fang Chen

    Abstract: In recent times, the swift evolution of adversarial attacks has captured widespread attention, particularly concerning their transferability and other performance attributes. These techniques are primarily executed at the sample level, frequently overlooking the intrinsic parameters of models. Such neglect suggests that the perturbations introduced in adversarial samples might have the potential f…

    Submitted 14 August, 2024; originally announced August 2024.

  12. arXiv:2408.07556  [pdf, other]

    cs.LG

    PolyCL: Contrastive Learning for Polymer Representation Learning via Explicit and Implicit Augmentations

    Authors: Jiajun Zhou, Yijie Yang, Austin M. Mroz, Kim E. Jelfs

    Abstract: Polymers play a crucial role in a wide array of applications due to their diverse and tunable properties. Establishing the relationship between polymer representations and their properties is crucial to the computational design and screening of potential polymers via machine learning. The quality of the representation significantly influences the effectiveness of these computational methods. Here,…

    Submitted 14 August, 2024; originally announced August 2024.

  13. arXiv:2408.07444  [pdf, other]

    eess.IV cs.CV

    Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

    Authors: Senmao Wang, Haifan Gong, Runmeng Cui, Boyao Wan, Yicheng Liu, Zhonglin Hu, Haiqing Yang, Jingyang Zhou, Bo Pan, Lin Lin, Haiyue Jiang

    Abstract: Costal cartilage segmentation is crucial to various medical applications, necessitating precise and reliable techniques due to its complex anatomy and the importance of accurate diagnosis and surgical planning. We propose a novel deep learning-based approach called topology-guided deformable Mamba (TGDM) for costal cartilage segmentation. The TGDM is tailored to capture the intricate long-range co…

    Submitted 14 August, 2024; originally announced August 2024.

  14. arXiv:2408.07278  [pdf, other]

    cs.IR cs.AI cs.CV

    Scene-wise Adaptive Network for Dynamic Cold-start Scenes Optimization in CTR Prediction

    Authors: Wenhao Li, Jie Zhou, Chuan Luo, Chao Tang, Kun Zhang, Shixiong Zhao

    Abstract: In the realm of modern mobile E-commerce, providing users with nearby commercial service recommendations through location-based online services has become increasingly vital. While machine learning approaches have shown promise in multi-scene recommendation, existing methodologies often struggle to address cold-start problems in unprecedented scenes: the increasing diversity of commercial choices,…

    Submitted 18 August, 2024; v1 submitted 3 August, 2024; originally announced August 2024.

    Comments: 10 pages, 6 figures, accepted by Recsys 2024

    MSC Class: 68T09 ACM Class: I.2.0

  15. arXiv:2408.07083  [pdf, other]

    cs.LG cs.AI

    Masked EEG Modeling for Driving Intention Prediction

    Authors: Jinzhao Zhou, Justin Sia, Yiqun Duan, Yu-Cheng Chang, Yu-Kai Wang, Chin-Teng Lin

    Abstract: Driving under drowsy conditions significantly escalates the risk of vehicular accidents. Although recent efforts have focused on using electroencephalography to detect drowsiness, helping prevent accidents caused by driving in such states, seamless human-machine interaction in driving scenarios requires a more versatile EEG-based system. This system should be capable of understanding a driver's in…

    Submitted 7 August, 2024; originally announced August 2024.

  16. arXiv:2408.06927  [pdf, other]

    cs.CV cs.LG

    Breaking Class Barriers: Efficient Dataset Distillation via Inter-Class Feature Compensator

    Authors: Xin Zhang, Jiawei Du, Ping Liu, Joey Tianyi Zhou

    Abstract: Dataset distillation has emerged as a technique aiming to condense informative features from large, natural datasets into a compact and synthetic form. While recent advancements have refined this technique, its performance is bottlenecked by the prevailing class-specific synthesis paradigm. Under this paradigm, synthetic data is optimized exclusively for a pre-assigned one-hot label, creating an i…

    Submitted 13 August, 2024; originally announced August 2024.

  17. arXiv:2408.05740  [pdf, other]

    cs.LG cs.AI stat.ML

    MTSCI: A Conditional Diffusion Model for Multivariate Time Series Consistent Imputation

    Authors: Jianping Zhou, Junhao Li, Guanjie Zheng, Xinbing Wang, Chenghu Zhou

    Abstract: Missing values are prevalent in multivariate time series, compromising the integrity of analyses and degrading the performance of downstream tasks. Consequently, research has focused on multivariate time series imputation, aiming to accurately impute the missing values based on available observations. A key research question is how to ensure imputation consistency, i.e., intra-consistency between…

    Submitted 11 August, 2024; originally announced August 2024.

    Comments: 10 pages, 5 figures, accepted by CIKM2024

  18. arXiv:2408.04967  [pdf, other]

    eess.AS cs.SD

    ADD 2023: Towards Audio Deepfake Detection and Analysis in the Wild

    Authors: Jiangyan Yi, Chu Yuan Zhang, Jianhua Tao, Chenglong Wang, Xinrui Yan, Yong Ren, Hao Gu, Junzuo Zhou

    Abstract: The growing prominence of the field of audio deepfake detection is driven by its wide range of applications, notably in protecting the public from potential fraud and other malicious activities, prompting the need for greater attention and research in this area. The ADD 2023 challenge goes beyond binary real/fake classification by emulating real-world scenarios, such as the identification of manip…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  19. arXiv:2408.04879  [pdf, other]

    cs.CV

    On the Element-Wise Representation and Reasoning in Zero-Shot Image Recognition: A Systematic Survey

    Authors: Jingcai Guo, Zhijie Rao, Zhi Chen, Song Guo, Jingren Zhou, Dacheng Tao

    Abstract: Zero-shot image recognition (ZSIR) aims at empowering models to recognize and reason in unseen domains via learning generalized knowledge from limited data in the seen domain. The gist for ZSIR is to execute element-wise representation and reasoning from the input visual space to the target semantic space, which is a bottom-up modeling paradigm inspired by the process by which humans observe the w…

    Submitted 9 August, 2024; originally announced August 2024.

    Comments: 24 pages, 7 figures

  20. arXiv:2408.04840  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG

    mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models

    Authors: Jiabo Ye, Haiyang Xu, Haowei Liu, Anwen Hu, Ming Yan, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou

    Abstract: Multi-modal Large Language Models (MLLMs) have demonstrated remarkable capabilities in executing instructions for a variety of single-image tasks. Despite this progress, significant challenges remain in modeling long image sequences. In this work, we introduce the versatile multi-modal large language model, mPLUG-Owl3, which enhances the capability for long image-sequence understanding in scenario…

    Submitted 13 August, 2024; v1 submitted 8 August, 2024; originally announced August 2024.

  21. arXiv:2408.04679  [pdf, other]

    cs.CL cs.AI cs.LG

    Towards Linguistic Neural Representation Learning and Sentence Retrieval from Electroencephalogram Recordings

    Authors: Jinzhao Zhou, Yiqun Duan, Ziyi Zhao, Yu-Cheng Chang, Yu-Kai Wang, Thomas Do, Chin-Teng Lin

    Abstract: Decoding linguistic information from non-invasive brain signals using EEG has gained increasing research attention due to its vast applicational potential. Recently, a number of works have adopted a generative-based framework to decode electroencephalogram (EEG) signals into sentences by utilizing the power generative capacity of pretrained large language models (LLMs). However, this approach has…

    Submitted 7 August, 2024; originally announced August 2024.

  22. arXiv:2408.03631  [pdf, ps, other]

    cs.AI cs.CL

    Large Language Models for Base Station Siting: Intelligent Deployment based on Prompt or Agent

    Authors: Yanhu Wang, Muhammad Muzammil Afzal, Zhengyang Li, Jie Zhou, Chenyuan Feng, Shuaishuai Guo, Tony Q. S. Quek

    Abstract: Traditional base station siting (BSS) methods rely heavily on drive testing and user feedback, which are laborious and require extensive expertise in communication, networking, and optimization. As large language models (LLMs) and their associated technologies advance, particularly in the realms of prompt engineering and agent engineering, network optimization will witness a revolutionary approach…

    Submitted 7 August, 2024; originally announced August 2024.

  23. arXiv:2408.03429  [pdf, other]

    quant-ph cs.ET

    MarQSim: Reconciling Determinism and Randomness in Compiler Optimization for Quantum Simulation

    Authors: Xiuqi Cao, Junyu Zhou, Yuhao Liu, Yunong Shi, Gushu Li

    Abstract: Quantum simulation, fundamental in quantum algorithm design, extends far beyond its foundational roots, powering diverse quantum computing applications. However, optimizing the compilation of quantum Hamiltonian simulation poses significant challenges. Existing approaches fall short in reconciling deterministic and randomized compilation, lack appropriate intermediate representations, and struggle…

    Submitted 6 August, 2024; originally announced August 2024.

  24. arXiv:2408.03095  [pdf, other]

    cs.SE

    TestART: Improving LLM-based Unit Test via Co-evolution of Automated Generation and Repair Iteration

    Authors: Siqi Gu, Chunrong Fang, Quanjun Zhang, Fangyuan Tian, Jianyi Zhou, Zhenyu Chen

    Abstract: Unit test is crucial for detecting bugs in individual program units but consumes time and effort. The existing automated unit test generation methods are mainly based on search-based software testing (SBST) and language models to liberate developers. Recently, large language models (LLMs) have demonstrated remarkable reasoning and generation capabilities. However, several problems limit their abil…

    Submitted 12 August, 2024; v1 submitted 6 August, 2024; originally announced August 2024.

  25. arXiv:2408.02920  [pdf, other]

    cs.SE cs.AI

    A Taxonomy of Architecture Options for Foundation Model-based Agents: Analysis and Decision Model

    Authors: Jingwen Zhou, Qinghua Lu, Jieshan Chen, Liming Zhu, Xiwei Xu, Zhenchang Xing, Stefan Harrer

    Abstract: The rapid advancement of AI technology has led to widespread applications of agent systems across various domains. However, the need for detailed architecture design poses significant challenges in designing and operating these systems. This paper introduces a taxonomy focused on the architectures of foundation-model-based agents, addressing critical aspects such as functional capabilities and non…

    Submitted 5 August, 2024; originally announced August 2024.

    Comments: Under review

  26. arXiv:2408.02751  [pdf, other]

    cs.LG

    A Novel Hybrid Approach for Tornado Prediction in the United States: Kalman-Convolutional BiLSTM with Multi-Head Attention

    Authors: Jiawei Zhou

    Abstract: Tornadoes are among the most intense atmospheric vortex phenomena and pose significant challenges for detection and forecasting. Conventional methods, which heavily depend on ground-based observations and radar data, are limited by issues such as decreased accuracy over greater distances and a high rate of false positives. To address these challenges, this study utilizes the Seamless Hybrid Scan R…

    Submitted 5 August, 2024; originally announced August 2024.

  27. arXiv:2408.01800  [pdf, other]

    cs.CV

    MiniCPM-V: A GPT-4V Level MLLM on Your Phone

    Authors: Yuan Yao, Tianyu Yu, Ao Zhang, Chongyi Wang, Junbo Cui, Hongji Zhu, Tianchi Cai, Haoyu Li, Weilin Zhao, Zhihui He, Qianyu Chen, Huarong Zhou, Zhensheng Zou, Haoye Zhang, Shengding Hu, Zhi Zheng, Jie Zhou, Jie Cai, Xu Han, Guoyang Zeng, Dahai Li, Zhiyuan Liu, Maosong Sun

    Abstract: The recent surge of Multimodal Large Language Models (MLLMs) has fundamentally reshaped the landscape of AI research and industry, shedding light on a promising path toward the next AI milestone. However, significant challenges remain preventing MLLMs from being practical in real-world applications. The most notable challenge comes from the huge cost of running an MLLM with a massive number of par…

    Submitted 3 August, 2024; originally announced August 2024.

    Comments: preprint

  28. arXiv:2408.00989  [pdf, other]

    cs.AI

    On the Resilience of Multi-Agent Systems with Malicious Agents

    Authors: Jen-tse Huang, Jiaxu Zhou, Tailin Jin, Xuhui Zhou, Zixi Chen, Wenxuan Wang, Youliang Yuan, Maarten Sap, Michael R. Lyu

    Abstract: Multi-agent systems, powered by large language models, have shown great abilities across various tasks due to the collaboration of expert agents, each focusing on a specific domain. However, when agents are deployed separately, there is a risk that malicious users may introduce malicious agents who generate incorrect or irrelevant results that are too stealthy to be identified by other non-special…

    Submitted 1 August, 2024; originally announced August 2024.

    Comments: 10 pages

  29. arXiv:2408.00641  [pdf, other]

    cs.LG

    Enhancing Ethereum Fraud Detection via Generative and Contrastive Self-supervision

    Authors: Chenxiang Jin, Jiajun Zhou, Chenxuan Xie, Shanqing Yu, Qi Xuan, Xiaoniu Yang

    Abstract: The rampant fraudulent activities on Ethereum hinder the healthy development of the blockchain ecosystem, necessitating the reinforcement of regulations. However, multiple imbalances involving account interaction frequencies and interaction types in the Ethereum transaction environment pose significant challenges to data mining-based fraud detection research. To address this, we first propose the…

    Submitted 1 August, 2024; originally announced August 2024.

  30. arXiv:2408.00325  [pdf, other]

    cs.SD eess.AS

    Iterative Prototype Refinement for Ambiguous Speech Emotion Recognition

    Authors: Haoqin Sun, Shiwan Zhao, Xiangyu Kong, Xuechen Wang, Hui Wang, Jiaming Zhou, Yong Qin

    Abstract: Recognizing emotions from speech is a daunting task due to the subtlety and ambiguity of expressions. Traditional speech emotion recognition (SER) systems, which typically rely on a singular, precise emotion label, struggle with this complexity. Therefore, modeling the inherent ambiguity of emotions is an urgent problem. In this paper, we propose an iterative prototype refinement framework (IPR) f…

    Submitted 1 August, 2024; originally announced August 2024.

  31. arXiv:2408.00144  [pdf, other]

    cs.CL cs.AI

    Distributed In-Context Learning under Non-IID Among Clients

    Authors: Siqi Liang, Sumyeong Ahn, Jiayu Zhou

    Abstract: Advancements in large language models (LLMs) have shown their effectiveness in multiple complicated natural language reasoning tasks. A key challenge remains in adapting these models efficiently to new or unfamiliar tasks. In-context learning (ICL) provides a promising solution for few-shot adaptation by retrieving a set of data points relevant to a query, called in-context examples (ICE), from a…

    Submitted 31 July, 2024; originally announced August 2024.

    Comments: 12 pages

    ACM Class: I.2.7

  32. arXiv:2408.00118  [pdf, other]

    cs.CL cs.AI

    Gemma 2: Improving Open Language Models at a Practical Size

    Authors: Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, Johan Ferret, Peter Liu, Pouya Tafti, Abe Friesen, Michelle Casbon, Sabela Ramos, Ravin Kumar, Charline Le Lan, Sammy Jerome, Anton Tsitsulin, Nino Vieillard, Piotr Stanczyk, Sertan Girgin, Nikola Momchev, Matt Hoffman, et al. (172 additional authors not shown)

    Abstract: In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al., 2020a) and group-query attention (Ainslie et al., 2023). We al…

    Submitted 2 August, 2024; v1 submitted 31 July, 2024; originally announced August 2024.

  33. RainMamba: Enhanced Locality Learning with State Space Models for Video Deraining

    Authors: Hongtao Wu, Yijun Yang, Huihui Xu, Weiming Wang, Jinni Zhou, Lei Zhu

    Abstract: The outdoor vision systems are frequently contaminated by rain streaks and raindrops, which significantly degenerate the performance of visual tasks and multimedia applications. The nature of videos exhibits redundant temporal cues for rain removal with higher stability. Traditional video deraining methods heavily rely on optical flow estimation and kernel-based manners, which have a limited recep…

    Submitted 31 July, 2024; originally announced July 2024.

    Comments: ACM Multimedia 2024

  34. arXiv:2407.21560  [pdf, ps, other]

    cs.CL cs.AI

    Generative Sentiment Analysis via Latent Category Distribution and Constrained Decoding

    Authors: Jun Zhou, Dongyang Yu, Kamran Aziz, Fangfang Su, Qing Zhang, Fei Li, Donghong Ji

    Abstract: Fine-grained sentiment analysis involves extracting and organizing sentiment elements from textual data. However, existing approaches often overlook issues of category semantic inclusion and overlap, as well as inherent structural patterns within the target sequence. This study introduces a generative sentiment analysis model. To address the challenges related to category semantic inclusion and ov…

    Submitted 31 July, 2024; originally announced July 2024.

  35. arXiv:2407.21075  [pdf, other]

    cs.AI cs.CL cs.LG

    Apple Intelligence Foundation Language Models

    Authors: Tom Gunter, Zirui Wang, Chong Wang, Ruoming Pang, Andy Narayanan, Aonan Zhang, Bowen Zhang, Chen Chen, Chung-Cheng Chiu, David Qiu, Deepak Gopinath, Dian Ang Yap, Dong Yin, Feng Nan, Floris Weers, Guoli Yin, Haoshuo Huang, Jianyu Wang, Jiarui Lu, John Peebles, Ke Ye, Mark Lee, Nan Du, Qibin Chen, Quentin Keunebroek, et al. (130 additional authors not shown)

    Abstract: We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used…

    Submitted 29 July, 2024; originally announced July 2024.

  36. arXiv:2407.20878  [pdf]

    eess.IV cs.CV

    S3PET: Semi-supervised Standard-dose PET Image Reconstruction via Dose-aware Token Swap

    Authors: Jiaqi Cui, Pinxian Zeng, Yuanyuan Xu, Xi Wu, Jiliu Zhou, Yan Wang

    Abstract: To acquire high-quality positron emission tomography (PET) images while reducing the radiation tracer dose, numerous efforts have been devoted to reconstructing standard-dose PET (SPET) images from low-dose PET (LPET). However, the success of current fully-supervised approaches relies on abundant paired LPET and SPET images, which are often unavailable in clinic. Moreover, these methods often mix…

    Submitted 30 July, 2024; originally announced July 2024.

  37. arXiv:2407.20459  [pdf, other]

    cs.CR

    Excavating Vulnerabilities Lurking in Multi-Factor Authentication Protocols: A Systematic Security Analysis

    Authors: Ang Kok Wee, Eyasu Getahun Chekole, Jianying Zhou

    Abstract: Nowadays, cyberattacks are growing exponentially, causing havoc to Internet users. In particular, authentication attacks constitute the major attack vector where intruders impersonate legitimate users to maliciously access systems or resources. Traditional single-factor authentication (SFA) protocols are often bypassed by side-channel and other attack techniques, hence they are no longer sufficien…

    Submitted 29 July, 2024; originally announced July 2024.

  38. arXiv:2407.20080  [pdf, other]

    cs.CV cs.LG

    UniTTA: Unified Benchmark and Versatile Framework Towards Realistic Test-Time Adaptation

    Authors: Chaoqun Du, Yulin Wang, Jiayi Guo, Yizeng Han, Jie Zhou, Gao Huang

    Abstract: Test-Time Adaptation (TTA) aims to adapt pre-trained models to the target domain during testing. In reality, this adaptability can be influenced by multiple factors. Researchers have identified various challenging scenarios and developed diverse methods to address these challenges, such as dealing with continual domain shifts, mixed domains, and temporally correlated or imbalanced class distributi…

    Submitted 29 July, 2024; originally announced July 2024.

  39. arXiv:2407.19813  [pdf, other]

    cs.CL cs.AI

    Improving Retrieval Augmented Language Model with Self-Reasoning

    Authors: Yuan Xia, Jingbo Zhou, Zhenhui Shi, Jun Chen, Haifeng Huang

    Abstract: The Retrieval-Augmented Language Model (RALM) has shown remarkable performance on knowledge-intensive tasks by incorporating external knowledge during inference, which mitigates the factual hallucinations inherited in large language models (LLMs). Despite these advancements, challenges persist in the implementation of RALMs, particularly concerning their reliability and traceability. To be specifi…

    Submitted 2 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

  40. Multiscale Representation Enhanced Temporal Flow Fusion Model for Long-Term Workload Forecasting

    Authors: Shiyu Wang, Zhixuan Chu, Yinbo Sun, Yu Liu, Yuliang Guo, Yang Chen, Huiyang Jian, Lintao Ma, Xingyu Lu, Jun Zhou

    Abstract: Accurate workload forecasting is critical for efficient resource management in cloud computing systems, enabling effective scheduling and autoscaling. Despite recent advances with transformer-based forecasting models, challenges remain due to the non-stationary, nonlinear characteristics of workload time series and the long-term dependencies. In particular, inconsistent performance between long-te…

    Submitted 18 August, 2024; v1 submitted 29 July, 2024; originally announced July 2024.

    Comments: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24), October 21--25, 2024, Boise, ID, USA

  41. arXiv:2407.19035  [pdf, other]

    cs.CV

    ScalingGaussian: Enhancing 3D Content Creation with Generative Gaussian Splatting

    Authors: Shen Chen, Jiale Zhou, Zhongyu Jiang, Tianfang Zhang, Zongkai Wu, Jenq-Neng Hwang, Lei Li

    Abstract: The creation of high-quality 3D assets is paramount for applications in digital heritage preservation, entertainment, and robotics. Traditionally, this process necessitates skilled professionals and specialized software for the modeling, texturing, and rendering of 3D objects. However, the rising demand for 3D assets in gaming and virtual reality (VR) has led to the creation of accessible image-to…

    Submitted 26 July, 2024; originally announced July 2024.

    Comments: 14 pages

  42. arXiv:2407.18461  [pdf, other]

    cs.SD cs.CL eess.AS

    Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation

    Authors: Shiyao Wang, Shiwan Zhao, Jiaming Zhou, Aobo Kong, Yong Qin

    Abstract: Dysarthric speech recognition (DSR) presents a formidable challenge due to inherent inter-speaker variability, leading to severe performance degradation when applying DSR models to new dysarthric speakers. Traditional speaker adaptation methodologies typically involve fine-tuning models for each speaker, but this strategy is cost-prohibitive and inconvenient for disabled users, requiring substanti…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: Accepted by Interspeech 2024

    Journal ref: INTERSPEECH 2024

  43. arXiv:2407.18271  [pdf, other]

    cs.AR cs.AI

    Large Language Model for Verilog Generation with Golden Code Feedback

    Authors: Ning Wang, Bingkun Yao, Jie Zhou, Xi Wang, Zhe Jiang, Nan Guan

    Abstract: Recent advancements in large language models (LLMs) have catalyzed significant interest in the automatic generation of Register-Transfer Level (RTL) code, particularly Verilog, from natural language instructions. While commercial LLMs like ChatGPT have dominated this domain, open-source alternatives have lagged considerably in performance, limiting the flexibility and data privacy of this emerging…

    Submitted 5 August, 2024; v1 submitted 21 July, 2024; originally announced July 2024.

  44. arXiv:2407.17789  [pdf, other]

    cs.MA cs.AI

    Very Large-Scale Multi-Agent Simulation in AgentScope

    Authors: Xuchen Pan, Dawei Gao, Yuexiang Xie, Zhewei Wei, Yaliang Li, Bolin Ding, Ji-Rong Wen, Jingren Zhou

    Abstract: Recent advances in large language models (LLMs) have opened new avenues for applying multi-agent systems in very large-scale simulations. However, there remain several challenges when conducting multi-agent simulations with existing platforms, such as limited scalability and low efficiency, unsatisfied agent diversity, and effort-intensive management processes. To address these challenges, we deve…

    Submitted 25 July, 2024; originally announced July 2024.

    Comments: We have released code on https://github.com/modelscope/agentscope

  45. arXiv:2407.17349  [pdf, other]

    cs.CL

    Boosting Large Language Models with Socratic Method for Conversational Mathematics Teaching

    Authors: Yuyang Ding, Hanglei Hu, Jie Zhou, Qin Chen, Bo Jiang, Liang He

    Abstract: With the introduction of large language models (LLMs), automatic math reasoning has seen tremendous success. However, current methods primarily focus on providing solutions or using techniques like Chain-of-Thought to enhance problem-solving accuracy. In this paper, we focus on improving the capability of mathematics teaching via a Socratic teaching-based LLM (SocraticLLM), which guides l…

    Submitted 24 July, 2024; originally announced July 2024.

    Comments: Accepted by CIKM 2024

  46. arXiv:2407.16931  [pdf, other]

    cs.CL

    ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering

    Authors: Xiuying Chen, Tairan Wang, Taicheng Guo, Kehan Guo, Juexiao Zhou, Haoyang Li, Mingchen Zhuge, Jürgen Schmidhuber, Xin Gao, Xiangliang Zhang

    Abstract: Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth. While QA datasets are plentiful in areas like general domain and biomedicine, academic chemistry is less explored. Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into a readily understandable format. Addressing this gap, we introduce S…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: 14 pages

  47. arXiv:2407.16266  [pdf, other]

    cs.CL

    Beyond Binary Gender: Evaluating Gender-Inclusive Machine Translation with Ambiguous Attitude Words

    Authors: Yijie Chen, Yijin Liu, Fandong Meng, Jinan Xu, Yufeng Chen, Jie Zhou

    Abstract: Gender bias has been a focal point in the study of bias in machine translation and language models. Existing machine translation gender bias evaluations are primarily focused on male and female genders, limiting the scope of the evaluation. To assess gender bias accurately, these studies often rely on calculating the accuracy of gender pronouns or the masculine and feminine attributes of grammatic…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: The code is publicly available at https://github.com/pppa2019/ambGIMT

  48. arXiv:2407.16260  [pdf, other]

    cs.CV

    DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors

    Authors: Zizheng Yan, Jiapeng Zhou, Fanpeng Meng, Yushuang Wu, Lingteng Qiu, Zisheng Ye, Shuguang Cui, Guanying Chen, Xiaoguang Han

    Abstract: Text-to-3D generation has recently seen significant progress. To enhance its practicality in real-world applications, it is crucial to generate multiple independent objects with interactions, similar to layer-compositing in 2D image editing. However, existing text-to-3D methods struggle with this task, as they are designed to generate either non-independent objects or independent objects lacking s…

    Submitted 23 July, 2024; originally announced July 2024.

    Comments: ECCV 2024. Project page: https://chester256.github.io/dreamdissector

  49. arXiv:2407.15227  [pdf, other]

    cs.CL cs.SI

    A Community-Centric Perspective for Characterizing and Detecting Anti-Asian Violence-Provoking Speech

    Authors: Gaurav Verma, Rynaa Grover, Jiawei Zhou, Binny Mathew, Jordan Kraemer, Munmun De Choudhury, Srijan Kumar

    Abstract: Violence-provoking speech, speech that implicitly or explicitly promotes violence against the members of the targeted community, contributed to a massive surge in anti-Asian crimes during the pandemic. While previous works have characterized and built tools for detecting other forms of harmful speech, like fear speech and hate speech, our work takes a community-centric approach to studying anti-…

    Submitted 21 July, 2024; originally announced July 2024.

    Comments: Accepted to ACL 2024 Main

  50. arXiv:2407.14788  [pdf, other]

    cs.LG cs.AI cs.CL

    On the Design and Analysis of LLM-Based Algorithms

    Authors: Yanxi Chen, Yaliang Li, Bolin Ding, Jingren Zhou

    Abstract: We initiate a formal investigation into the design and analysis of LLM-based algorithms, i.e. algorithms that contain one or multiple calls of large language models (LLMs) as sub-routines and critically rely on the capabilities of LLMs. While LLM-based algorithms, ranging from basic LLM calls with prompt engineering to complicated LLM-powered agent systems and compound AI systems, have achieved re…

    Submitted 20 July, 2024; originally announced July 2024.