Showing 1–50 of 111 results for author: Ma, A

Searching in archive cs.
  1. arXiv:2409.04005  [pdf, other]

    cs.CV

    Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task

    Authors: Jing Wang, Ao Ma, Jiasong Feng, Dawei Leng, Yuhui Yin, Xiaodan Liang

    Abstract: The global self-attention mechanism in diffusion transformers involves redundant computation due to the sparse and redundant nature of visual information, and the attention map of tokens within a spatial window shows significant similarity. To address this redundancy, we propose the Proxy Token Diffusion Transformer (PT-DiT), which employs sparse representative token attention (where the number of…

    Submitted 5 September, 2024; originally announced September 2024.

  2. arXiv:2408.08189  [pdf, other]

    cs.CV

    FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

    Authors: Jiasong Feng, Ao Ma, Jing Wang, Bo Cheng, Xiaodan Liang, Dawei Leng, Yuhui Yin

    Abstract: Synthesizing motion-rich and temporally consistent videos remains a challenge in artificial intelligence, especially when dealing with extended durations. Existing text-to-video (T2V) models commonly employ spatial cross-attention for text control, equivalently guiding different frame generations without frame-specific textual guidance. Thus, the model's capacity to comprehend the temporal logic c…

    Submitted 16 August, 2024; v1 submitted 15 August, 2024; originally announced August 2024.

  3. arXiv:2408.08105  [pdf, other]

    cs.CV cs.AI

    Multimodal Causal Reasoning Benchmark: Challenging Vision Large Language Models to Infer Causal Links Between Siamese Images

    Authors: Zhiyuan Li, Heng Wang, Dongnan Liu, Chaoyi Zhang, Ao Ma, Jieting Long, Weidong Cai

    Abstract: Large Language Models (LLMs) have showcased exceptional ability in causal reasoning from textual information. However, will these causalities remain straightforward for Vision Large Language Models (VLLMs) when only visual hints are provided? Motivated by this, we propose a novel Multimodal Causal Reasoning benchmark, namely MuCR, to challenge VLLMs to infer semantic cause-and-effect relationship…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 20 pages

  4. arXiv:2407.18496  [pdf, other]

    cs.CL cs.LG

    Towards More Accurate Prediction of Human Empathy and Emotion in Text and Multi-turn Conversations by Combining Advanced NLP, Transformers-based Networks, and Linguistic Methodologies

    Authors: Manisha Singh, Divy Sharma, Alonso Ma, Nora Goldfine

    Abstract: Based on the WASSA 2022 Shared Task on Empathy Detection and Emotion Classification, we predict the level of empathic concern and personal distress displayed in essays. For the first stage of this project we implemented a Feed-Forward Neural Network using sentence-level embeddings as features. We experimented with four different embedding models for generating the inputs to the neural network. The…

    Submitted 26 July, 2024; originally announced July 2024.

  5. arXiv:2407.18471  [pdf, other]

    cs.CL cs.IR cs.LG

    Constructing the CORD-19 Vaccine Dataset

    Authors: Manisha Singh, Divy Sharma, Alonso Ma, Bridget Tyree, Margaret Mitchell

    Abstract: We introduce new dataset 'CORD-19-Vaccination' to cater to scientists specifically looking into COVID-19 vaccine-related research. This dataset is extracted from CORD-19 dataset [Wang et al., 2020] and augmented with new columns for language detail, author demography, keywords, and topic per paper. Facebook's fastText model is used to identify languages [Joulin et al., 2016]. To establish author d…

    Submitted 25 July, 2024; originally announced July 2024.

  6. arXiv:2407.15645  [pdf, other]

    cs.CL cs.AI

    Psychometric Alignment: Capturing Human Knowledge Distributions via Language Models

    Authors: Joy He-Yueya, Wanjing Anya Ma, Kanishk Gandhi, Benjamin W. Domingue, Emma Brunskill, Noah D. Goodman

    Abstract: Language models (LMs) are increasingly used to simulate human-like responses in scenarios where accurately mimicking a population's behavior can guide decision-making, such as in developing educational materials and designing public policies. The objective of these simulations is for LMs to capture the variations in human responses, rather than merely providing the expected correct answers. Prior…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: Code and data: https://github.com/joyheyueya/psychometric-alignment

  7. arXiv:2407.13746  [pdf, ps, other]

    cs.LG stat.ML

    Multi-Label Learning with Stronger Consistency Guarantees

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We present a detailed study of surrogate losses and algorithms for multi-label learning, supported by $H$-consistency bounds. We first show that, for the simplest form of multi-label loss (the popular Hamming loss), the well-known consistent binary relevance surrogate suffers from a sub-optimal dependency on the number of labels in terms of $H$-consistency bounds, when using smooth losses such as…

    Submitted 18 July, 2024; originally announced July 2024.

  8. arXiv:2407.13732  [pdf, other]

    cs.LG stat.ML

    Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We present a comprehensive study of surrogate loss functions for learning to defer. We introduce a broad family of surrogate losses, parameterized by a non-increasing function $Ψ$, and establish their realizable $H$-consistency under mild conditions. For cost functions based on classification error, we further show that these losses admit $H$-consistency bounds when the hypothesis set is symmetric…

    Submitted 18 July, 2024; originally announced July 2024.

  9. arXiv:2407.13722  [pdf, ps, other]

    cs.LG stat.ML

    Enhanced $H$-Consistency Bounds

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: Recent research has introduced a key notion of $H$-consistency bounds for surrogate losses. These bounds offer finite-sample guarantees, quantifying the relationship between the zero-one estimation error (or other target loss) and the surrogate loss estimation error for a specific hypothesis set. However, previous bounds were derived under the condition that a lower bound of the surrogate loss con…

    Submitted 18 July, 2024; originally announced July 2024.

  10. arXiv:2407.12421  [pdf, other]

    cs.LG cs.AI

    SafePowerGraph: Safety-aware Evaluation of Graph Neural Networks for Transmission Power Grids

    Authors: Salah Ghamizi, Aleksandar Bojchevski, Aoxiang Ma, Jun Cao

    Abstract: Power grids are critical infrastructures of paramount importance to modern society and their rapid evolution and interconnections has heightened the complexity of power systems (PS) operations. Traditional methods for grid analysis struggle with the computational demands of large-scale RES and ES integration, prompting the adoption of machine learning (ML) techniques, particularly Graph Neural Net…

    Submitted 17 July, 2024; originally announced July 2024.

  11. arXiv:2407.07140  [pdf, other]

    cs.LG stat.ML

    Cardinality-Aware Set Prediction and Top-$k$ Classification

    Authors: Corinna Cortes, Anqi Mao, Christopher Mohri, Mehryar Mohri, Yutao Zhong

    Abstract: We present a detailed study of cardinality-aware top-$k$ classification, a novel approach that aims to learn an accurate top-$k$ set predictor while maintaining a low cardinality. We introduce a new target loss function tailored to this setting that accounts for both the classification error and the cardinality of the set predicted. To optimize this loss function, we propose two families of surrog…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2403.19625

  12. arXiv:2407.03600  [pdf, other]

    cs.CL

    Chain-of-Thought Augmentation with Logit Contrast for Enhanced Reasoning in Language Models

    Authors: Jay Shim, Grant Kruttschnitt, Alyssa Ma, Daniel Kim, Benjamin Chek, Athul Anand, Kevin Zhu, Sean O'Brien

    Abstract: Rapidly increasing model scales coupled with steering methods such as chain-of-thought prompting have led to drastic improvements in language model reasoning. At the same time, models struggle with compositional generalization and are far from human performance on many reasoning-based benchmarks. Leveraging the success of chain-of-thought prompting, and also taking inspiration from context-aware d…

    Submitted 27 August, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

  13. arXiv:2406.17319  [pdf, other]

    cs.CV

    DMF-Net: Image-Guided Point Cloud Completion with Dual-Channel Modality Fusion and Shape-Aware Upsampling Transformer

    Authors: Aihua Mao, Yuxuan Tang, Jiangtao Huang, Ying He

    Abstract: In this paper we study the task of a single-view image-guided point cloud completion. Existing methods have got promising results by fusing the information of image into point cloud explicitly or implicitly. However, given that the image has global shape information and the partial point cloud has rich local details, We believe that both modalities need to be given equal attention when performing…

    Submitted 25 June, 2024; originally announced June 2024.

  14. Single-Temporal Supervised Learning for Universal Remote Sensing Change Detection

    Authors: Zhuo Zheng, Yanfei Zhong, Ailong Ma, Liangpei Zhang

    Abstract: Bitemporal supervised learning paradigm always dominates remote sensing change detection using numerous labeled bitemporal image pairs, especially for high spatial resolution (HSR) remote sensing imagery. However, it is very expensive and labor-intensive to label change regions in large-scale bitemporal HSR remote sensing image pairs. In this paper, we propose single-temporal supervised learning (…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: IJCV 2024. arXiv admin note: text overlap with arXiv:2108.07002

  15. arXiv:2406.10215  [pdf, other]

    cs.CL cs.LG

    DevBench: A multimodal developmental benchmark for language learning

    Authors: Alvin Wei Ming Tan, Sunny Yu, Bria Long, Wanjing Anya Ma, Tonya Murray, Rebecca D. Silverman, Jason D. Yeatman, Michael C. Frank

    Abstract: How (dis)similar are the learning trajectories of vision-language models and children? Recent modeling work has attempted to understand the gap between models' and humans' data efficiency by constructing models trained on less data, especially multimodal naturalistic data. However, such models are often evaluated on adult-level benchmarks, with limited breadth in language abilities tested, and wit…

    Submitted 14 June, 2024; originally announced June 2024.

  16. arXiv:2405.07905  [pdf, other]

    eess.IV cs.CV

    PLUTO: Pathology-Universal Transformer

    Authors: Dinkar Juyal, Harshith Padigela, Chintan Shah, Daniel Shenker, Natalia Harguindeguy, Yi Liu, Blake Martin, Yibo Zhang, Michael Nercessian, Miles Markey, Isaac Finberg, Kelsey Luu, Daniel Borders, Syed Ashar Javed, Emma Krause, Raymond Biju, Aashish Sood, Allen Ma, Jackson Nyman, John Shamshoian, Guillaume Chhor, Darpan Sanghavi, Marc Thibault, Limin Yu, Fedaa Najdawi, et al. (8 additional authors not shown)

    Abstract: Pathology is the study of microscopic inspection of tissue, and a pathology diagnosis is often the medical gold standard to diagnose disease. Pathology images provide a unique challenge for computer-vision-based analysis: a single pathology Whole Slide Image (WSI) is gigapixel-sized and often contains hundreds of thousands to millions of objects of interest across multiple resolutions. In this wor…

    Submitted 13 May, 2024; originally announced May 2024.

  17. arXiv:2405.05968  [pdf, other]

    cs.LG stat.ML

    A Universal Growth Rate for Learning with Smooth Surrogate Losses

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: This paper presents a comprehensive analysis of the growth rate of $H$-consistency bounds (and excess error bounds) for various surrogate losses used in classification. We prove a square-root growth rate near zero for smooth margin-based surrogate losses in binary classification, providing both upper and lower bounds under mild assumptions. This result also translates to excess error bounds. Our l…

    Submitted 8 July, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

  18. arXiv:2403.19625  [pdf, other]

    cs.LG stat.ML

    Top-$k$ Classification and Cardinality-Aware Prediction

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We present a detailed study of top-$k$ classification, the task of predicting the $k$ most probable classes for an input, extending beyond single-class prediction. We demonstrate that several prevalent surrogate loss functions in multi-class classification, such as comp-sum and constrained losses, are supported by $H$-consistency bounds with respect to the top-$k$ loss. These bounds guarantee cons…

    Submitted 28 March, 2024; originally announced March 2024.

  19. arXiv:2403.19494  [pdf, ps, other]

    cs.LG stat.ML

    Regression with Multi-Expert Deferral

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: Learning to defer with multiple experts is a framework where the learner can choose to defer the prediction to several experts. While this problem has received significant attention in classification contexts, it presents unique challenges in regression due to the infinite and continuous nature of the label space. In this work, we introduce a novel framework of regression with deferral, which invo…

    Submitted 28 March, 2024; originally announced March 2024.

  20. arXiv:2403.19480  [pdf, ps, other]

    cs.LG stat.ML

    $H$-Consistency Guarantees for Regression

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We present a detailed study of $H$-consistency bounds for regression. We first present new theorems that generalize the tools previously given to establish $H$-consistency bounds. This generalization proves essential for analyzing $H$-consistency bounds specific to regression. Next, we prove a series of novel $H$-consistency bounds for surrogate loss functions of the squared loss, under the assump…

    Submitted 28 March, 2024; originally announced March 2024.

  21. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1110 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 8 August, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  22. arXiv:2403.00892  [pdf, other]

    eess.SY cs.LG

    PowerFlowMultiNet: Multigraph Neural Networks for Unbalanced Three-Phase Distribution Systems

    Authors: Salah Ghamizi, Jun Cao, Aoxiang Ma, Pedro Rodriguez

    Abstract: Efficiently solving unbalanced three-phase power flow in distribution grids is pivotal for grid analysis and simulation. There is a pressing need for scalable algorithms capable of handling large-scale unbalanced power grids that can provide accurate and fast solutions. To address this, deep learning techniques, especially Graph Neural Networks (GNNs), have emerged. However, existing literature pr…

    Submitted 6 September, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

  23. arXiv:2402.18078  [pdf, other]

    cs.CV

    Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis

    Authors: Yanzuo Lu, Manlin Zhang, Andy J Ma, Xiaohua Xie, Jian-Huang Lai

    Abstract: Diffusion model is a promising approach to image generation and has been employed for Pose-Guided Person Image Synthesis (PGPIS) with competitive performance. While existing methods simply align the person appearance to the target pose, they are prone to overfitting due to the lack of a high-level semantic understanding on the source person image. In this paper, we propose a novel Coarse-to-Fine L…

    Submitted 9 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR 2024 (Highlight)

  24. arXiv:2402.10434  [pdf, other]

    cs.LG

    Parametric Augmentation for Time Series Contrastive Learning

    Authors: Xu Zheng, Tianchun Wang, Wei Cheng, Aitian Ma, Haifeng Chen, Mo Sha, Dongsheng Luo

    Abstract: Modern techniques like contrastive learning have been effectively used in many areas, including computer vision, natural language processing, and graph-structured data. Creating positive examples that assist the model in learning robust and discriminative representations is a crucial stage in contrastive learning approaches. Usually, preset human intuition directs the selection of relevant data au…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: Accepted by International Conference on Learning Representations (ICLR 2024)

  25. arXiv:2401.16450  [pdf, other]

    cs.HC cs.AI cs.SE

    ACCESS: Prompt Engineering for Automated Web Accessibility Violation Corrections

    Authors: Calista Huang, Alyssa Ma, Suchir Vyasamudri, Eugenie Puype, Sayem Kamal, Juan Belza Garcia, Salar Cheema, Michael Lutz

    Abstract: With the increasing need for inclusive and user-friendly technology, web accessibility is crucial to ensuring equal access to online content for individuals with disabilities, including visual, auditory, cognitive, or motor impairments. Despite the existence of accessibility guidelines and standards such as Web Content Accessibility Guidelines (WCAG) and the Web Accessibility Initiative (W3C), ove…

    Submitted 10 February, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: 11 pages, 6 figures

  26. arXiv:2401.16348  [pdf, other]

    cs.CL cs.CY cs.HC

    Improving the TENOR of Labeling: Re-evaluating Topic Models for Content Analysis

    Authors: Zongxia Li, Andrew Mao, Daniel Stephens, Pranav Goel, Emily Walpole, Alden Dima, Juan Fung, Jordan Boyd-Graber

    Abstract: Topic models are a popular tool for understanding text collections, but their evaluation has been a point of contention. Automated evaluation metrics such as coherence are often used, however, their validity has been questioned for neural topic models (NTMs) and can overlook a models benefits in real world applications. To this end, we conduct the first evaluation of neural, supervised and classic…

    Submitted 19 February, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

    Comments: 19 pages, 5 tables, 6 figures, Accepted to EACL Main Conference 2024

  27. arXiv:2312.12246  [pdf, other]

    cs.CV cs.LG

    MDD-UNet: Domain Adaptation for Medical Image Segmentation with Theoretical Guarantees, a Proof of Concept

    Authors: Asbjørn Munk, Ao Ma, Mads Nielsen

    Abstract: The current state-of-the art techniques for image segmentation are often based on U-Net architectures, a U-shaped encoder-decoder networks with skip connections. Despite the powerful performance, the architecture often does not perform well when used on data which has different characteristics than the data it was trained on. Many techniques for improving performance in the presence of domain shif…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Published at NLDL 2024

  28. arXiv:2312.12222  [pdf, other]

    cs.CV

    EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering

    Authors: Junjue Wang, Zhuo Zheng, Zihang Chen, Ailong Ma, Yanfei Zhong

    Abstract: Earth vision research typically focuses on extracting geospatial object locations and categories but neglects the exploration of relations between objects and comprehensive reasoning. Based on city planning needs, we develop a multi-modal multi-task VQA dataset (EarthVQA) to advance relational reasoning-based judging, counting, and comprehensive analysis. The EarthVQA dataset contains 6000 images,…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted By AAAI 2024

  29. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  30. arXiv:2312.11468  [pdf, other]

    physics.med-ph cs.CV

    Bias-Reduced Neural Networks for Parameter Estimation in Quantitative MRI

    Authors: Andrew Mao, Sebastian Flassbeck, Jakob Assländer

    Abstract: Purpose: To develop neural network (NN)-based quantitative MRI parameter estimators with minimal bias and a variance close to the Cramér-Rao bound. Theory and Methods: We generalize the mean squared error loss to control the bias and variance of the NN's estimates, which involves averaging over multiple noise realizations of the same measurements during training. Bias and variance properties of…

    Submitted 10 April, 2024; v1 submitted 13 November, 2023; originally announced December 2023.

  31. arXiv:2312.07871  [pdf, other]

    cs.CV

    MLNet: Mutual Learning Network with Neighborhood Invariance for Universal Domain Adaptation

    Authors: Yanzuo Lu, Meng Shen, Andy J Ma, Xiaohua Xie, Jian-Huang Lai

    Abstract: Universal domain adaptation (UniDA) is a practical but challenging problem, in which information about the relation between the source and the target domains is not given for knowledge transfer. Existing UniDA methods may suffer from the problems of overlooking intra-domain variations in the target domain and difficulty in separating between the similar known and unknown class. To address these is…

    Submitted 27 February, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024 (Poster)

  32. arXiv:2312.00111  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Multimodal Learning for Materials

    Authors: Viggo Moro, Charlotte Loh, Rumen Dangovski, Ali Ghorashi, Andrew Ma, Zhuo Chen, Samuel Kim, Peter Y. Lu, Thomas Christensen, Marin Soljačić

    Abstract: Artificial intelligence is transforming computational materials science, improving the prediction of material properties, and accelerating the discovery of novel materials. Recently, publicly available material data repositories have grown rapidly. This growth encompasses not only more materials, but also a greater variety and quantity of their associated properties. Existing machine learning effo…

    Submitted 12 April, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

    Comments: 11 pages, 4 figures

  33. arXiv:2311.18495  [pdf, other]

    cs.LG cs.CV

    Improving Adversarial Transferability via Model Alignment

    Authors: Avery Ma, Amir-massoud Farahmand, Yangchen Pan, Philip Torr, Jindong Gu

    Abstract: Neural networks are susceptible to adversarial perturbations that are transferable across different models. In this paper, we introduce a novel model alignment technique aimed at improving a given source model's ability in generating transferable adversarial perturbations. During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss. This loss measu…

    Submitted 17 July, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: Accepted at the European Conference on Computer Vision (ECCV) 2024. Code: https://github.com/averyma/model-alignment

  34. arXiv:2311.10266  [pdf, other]

    cs.CL

    Diagnosing and Debiasing Corpus-Based Political Bias and Insults in GPT2

    Authors: Ambri Ma, Arnav Kumar, Brett Zeligson

    Abstract: The training of large language models (LLMs) on extensive, unfiltered corpora sourced from the internet is a common and advantageous practice. Consequently, LLMs have learned and inadvertently reproduced various types of biases, including violent, offensive, and toxic language. However, recent research shows that generative pretrained transformer (GPT) language models can recognize their own biase…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 9 pages

  35. arXiv:2311.02762   

    cs.CV cs.LG

    Fast Sparse 3D Convolution Network with VDB

    Authors: Fangjun Zhou, Anyong Mao, Eftychios Sifakis

    Abstract: We proposed a new Convolution Neural Network implementation optimized for sparse 3D data inference. This implementation uses NanoVDB as the data structure to store the sparse tensor. It leaves a relatively small memory footprint while maintaining high performance. We demonstrate that this architecture is around 20 times faster than the state-of-the-art dense CNN model on a high-resolution 3D objec…

    Submitted 14 November, 2023; v1 submitted 5 November, 2023; originally announced November 2023.

    Comments: Unauthorized publication

  36. arXiv:2310.19859  [pdf, other]

    cs.CV cs.AI

    Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone

    Authors: Zeyinzi Jiang, Chaojie Mao, Ziyuan Huang, Ao Ma, Yiliang Lv, Yujun Shen, Deli Zhao, Jingren Zhou

    Abstract: Parameter-efficient tuning has become a trend in transferring large-scale foundation models to downstream applications. Existing methods typically embed some light-weight tuners into the backbone, where both the design and the learning of the tuners are highly dependent on the base model. This work offers a new tuning paradigm, dubbed Res-Tuning, which intentionally unbinds tuners from the backbon…

    Submitted 30 October, 2023; originally announced October 2023.

    Comments: Accepted to NeurIPS 2023

  37. arXiv:2310.17626  [pdf, ps, other]

    cs.CV

    A Survey on Transferability of Adversarial Examples across Deep Neural Networks

    Authors: Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqain Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li, Xiaochun Cao, Philip Torr

    Abstract: The emergence of Deep Neural Networks (DNNs) has revolutionized various domains by enabling the resolution of complex tasks spanning image recognition, natural language processing, and scientific problem-solving. However, this progress has also brought to light a concerning vulnerability: adversarial examples. These crafted inputs, imperceptible to humans, can manipulate machine learning models in…

    Submitted 1 May, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: Accepted to Transactions on Machine Learning Research (TMLR)

  38. arXiv:2310.14774  [pdf, ps, other]

    cs.LG stat.ML

    Principled Approaches for Learning to Defer with Multiple Experts

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We present a study of surrogate losses and algorithms for the general problem of learning to defer with multiple experts. We first introduce a new family of surrogate losses specifically tailored for the multiple-expert setting, where the prediction and deferral functions are learned simultaneously. We then prove that these surrogate losses benefit from strong $H$-consistency bounds. We illustrate…

    Submitted 31 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: ISAIM 2024

  39. arXiv:2310.14772  [pdf, other]

    cs.LG stat.ML

    Predictor-Rejector Multi-Class Abstention: Theoretical Analysis and Algorithms

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We study the key framework of learning with abstention in the multi-class classification setting. In this setting, the learner can choose to abstain from making a prediction with some pre-defined cost. We present a series of new theoretical and algorithmic results for this learning problem in the predictor-rejector framework. We introduce several new families of surrogate losses for which we prove…

    Submitted 31 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: ALT 2024

  40. arXiv:2310.14770  [pdf, ps, other]

    cs.LG stat.ML

    Theoretically Grounded Loss Functions and Algorithms for Score-Based Multi-Class Abstention

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: Learning with abstention is a key scenario where the learner can abstain from making a prediction at some cost. In this paper, we analyze the score-based formulation of learning with abstention in the multi-class classification setting. We introduce new families of surrogate losses for the abstention loss function, which include the state-of-the-art surrogate losses in the single-stage setting and…

    Submitted 31 March, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: AISTATS 2024

  41. arXiv:2310.06837  [pdf, other]

    cs.CL cs.LG

    Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency

    Authors: Eric Zelikman, Wanjing Anya Ma, Jasmine E. Tran, Diyi Yang, Jason D. Yeatman, Nick Haber

    Abstract: Developing an educational test can be expensive and time-consuming, as each item must be written by experts and then evaluated by collecting hundreds of student responses. Moreover, many tests require multiple distinct sets of questions administered throughout the school year to closely monitor students' progress, known as parallel tests. In this study, we focus on tests of silent sentence reading…

    Submitted 10 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 (Main)

  42. arXiv:2309.17031  [pdf, other]

    cs.CV cs.AI

    Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process

    Authors: Zhuo Zheng, Shiqi Tian, Ailong Ma, Liangpei Zhang, Yanfei Zhong

    Abstract: Understanding the temporal dynamics of Earth's surface is a mission of multi-temporal remote sensing image analysis, significantly promoted by deep vision models with its fuel -- labeled multi-temporal images. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present a sca…

    Submitted 29 September, 2023; originally announced September 2023.

    Comments: ICCV 2023

  43. arXiv:2309.03893  [pdf, other]

    cs.CV cs.AI cs.LG

    DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection

    Authors: Manlin Zhang, Jie Wu, Yuxi Ren, Ming Li, Jie Qin, Xuefeng Xiao, Wei Liu, Rui Wang, Min Zheng, Andy J. Ma

    Abstract: Data is the cornerstone of deep learning. This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection. Existing methods for scaling up detection-oriented data often require manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs, which are costly, complex, or lacking diver…

    Submitted 7 September, 2023; originally announced September 2023.

    Comments: Code and Models are publicly available. Project Page: https://mettyz.github.io/DiffusionEngine

  44. arXiv:2308.16904  [pdf, other]

    math.NA cs.LG math.OC

    A Note on Randomized Kaczmarz Algorithm for Solving Doubly-Noisy Linear Systems

    Authors: El Houcine Bergou, Soumia Boucherouite, Aritra Dutta, Xin Li, Anna Ma

    Abstract: Large-scale linear systems, $Ax=b$, frequently arise in practice and demand effective iterative solvers. Often, these systems are noisy due to operational errors or faulty data-collection processes. In the past decade, the randomized Kaczmarz (RK) algorithm has been studied extensively as an efficient iterative solver for such systems. However, the convergence study of RK in the noisy regime is li…

    Submitted 23 August, 2024; v1 submitted 31 August, 2023; originally announced August 2023.

    MSC Class: 15A06; 15A09; 15A10; 15A18; 65F10; 65Y20; 68Q25; 68W20; 68W40

  45. arXiv:2308.06703  [pdf, other]

    cs.LG

    Understanding the robustness difference between stochastic gradient descent and adaptive gradient methods

    Authors: Avery Ma, Yangchen Pan, Amir-massoud Farahmand

    Abstract: Stochastic gradient descent (SGD) and adaptive gradient methods, such as Adam and RMSProp, have been widely used in training deep neural networks. We empirically show that while the difference between the standard generalization performance of models trained using these methods is small, those trained using SGD exhibit far greater robustness under input perturbations. Notably, our investigation de…

    Submitted 28 November, 2023; v1 submitted 13 August, 2023; originally announced August 2023.

    Comments: Accepted at TMLR (Featured Certification). Code: see https://github.com/averyma/opt-robust

  46. arXiv:2307.02035  [pdf, ps, other]

    cs.LG stat.ML

    Ranking with Abstention

    Authors: Anqi Mao, Mehryar Mohri, Yutao Zhong

    Abstract: We introduce a novel framework of ranking with abstention, where the learner can abstain from making a prediction at some limited cost $c$. We present an extensive theoretical analysis of this framework including a series of $H$-consistency bounds for both the family of linear functions and that of neural networks with one hidden-layer. These theoretical guarantees are the state-of-the-art consistenc…

    Submitted 5 July, 2023; originally announced July 2023.

  47. arXiv:2306.08838  [pdf, other]

    cs.LG cs.CR stat.ML

    Differentially Private Domain Adaptation with Theoretical Guarantees

    Authors: Raef Bassily, Corinna Cortes, Anqi Mao, Mehryar Mohri

    Abstract: In many applications, the labeled data at the learner's disposal is subject to privacy constraints and is relatively limited. To derive a more accurate predictor for the target domain, it is often beneficial to leverage publicly available labeled data from an alternative domain, somewhat close to the target domain. This is the modern problem of supervised domain adaptation from a public source to…

    Submitted 4 February, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

  48. arXiv:2306.04730  [pdf, other]

    eess.SP cs.LG math.NA math.OC stat.ML

    Stochastic Natural Thresholding Algorithms

    Authors: Rachel Grotheer, Shuang Li, Anna Ma, Deanna Needell, Jing Qin

    Abstract: Sparse signal recovery is one of the most fundamental problems in various applications, including medical imaging and remote sensing. Many greedy algorithms based on the family of hard thresholding operators have been developed to solve the sparse signal recovery problem. More recently, Natural Thresholding (NT) has been proposed with improved computational efficiency. This paper proposes and disc…

    Submitted 7 June, 2023; originally announced June 2023.

  49. arXiv:2306.00357  [pdf, other]

    stat.ML cs.HC cs.LG math.PR math.ST

    Efficient and Robust Bayesian Selection of Hyperparameters in Dimension Reduction for Visualization

    Authors: Yin-Ting Liao, Hengrui Luo, Anna Ma

    Abstract: We introduce an efficient and robust auto-tuning framework for hyperparameter selection in dimension reduction (DR) algorithms, focusing on large-scale datasets and arbitrary performance metrics. By leveraging Bayesian optimization (BO) with a surrogate model, our approach enables efficient hyperparameter selection with multi-objective trade-offs and allows us to perform data-driven sensitivity an…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: 20 pages, 16 figures

    MSC Class: 62F15; 68T09; 94A16

  50. arXiv:2305.17358  [pdf, other]

    cs.CL

    CTC-based Non-autoregressive Speech Translation

    Authors: Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu

    Abstract: Combining end-to-end speech translation (ST) and non-autoregressive (NAR) generation is promising in language and speech processing for their advantages of less error propagation and low latency. In this paper, we investigate the potential of connectionist temporal classification (CTC) for non-autoregressive speech translation (NAST). In particular, we develop a model consisting of two encoders th…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Main Conference