Showing 1–50 of 78 results for author: Pfister, T

Searching in archive cs.
  1. arXiv:2408.12733

    cs.AI cs.CL cs.DB cs.LG

    SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging

    Authors: Mohammadreza Pourreza, Ruoxi Sun, Hailong Li, Lesly Miculicich, Tomas Pfister, Sercan O. Arik

    Abstract: Text-to-SQL systems, which convert natural language queries into SQL commands, have seen significant progress primarily for the SQLite dialect. However, adapting these systems to other SQL dialects like BigQuery and PostgreSQL remains a challenge due to the diversity in SQL syntax and functions. We introduce SQL-GEN, a framework for generating high-quality dialect-specific synthetic data guided by…

    Submitted 22 August, 2024; originally announced August 2024.

  2. arXiv:2408.06610

    cs.CV cs.CL cs.LG

    CROME: Cross-Modal Adapters for Efficient Multimodal LLM

    Authors: Sayna Ebrahimi, Sercan O. Arik, Tejas Nama, Tomas Pfister

    Abstract: Multimodal Large Language Models (MLLMs) demonstrate remarkable image-language capabilities, but their widespread use faces challenges in cost-effective training and adaptation. Existing approaches often necessitate expensive language model retraining and limited adaptability. Additionally, the current focus on zero-shot performance improvements offers insufficient guidance for task-specific tunin…

    Submitted 12 August, 2024; originally announced August 2024.

  3. arXiv:2408.01875

    cs.CL

    Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval

    Authors: Yanfei Chen, Jinsung Yoon, Devendra Singh Sachan, Qingze Wang, Vincent Cohen-Addad, Mohammadhossein Bateni, Chen-Yu Lee, Tomas Pfister

    Abstract: Recent advances in large language models (LLMs) have enabled autonomous agents with complex reasoning and task-fulfillment capabilities using a wide range of tools. However, effectively identifying the most relevant tools for a given task becomes a key bottleneck as the toolset size grows, hindering reliable tool utilization. To address this, we introduce Re-Invoke, an unsupervised tool retrieval…

    Submitted 3 August, 2024; originally announced August 2024.

  4. arXiv:2407.20243

    cs.CL cs.LG

    Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions

    Authors: Jinsung Yoon, Raj Sinha, Sercan O Arik, Tomas Pfister

    Abstract: Embeddings from Large Language Models (LLMs) have emerged as critical components in various applications, particularly for information retrieval. While high-dimensional embeddings generally demonstrate superior performance as they contain more salient information, their practical application is frequently hindered by elevated computational latency and the associated higher cost. To address these c…

    Submitted 17 July, 2024; originally announced July 2024.

  5. arXiv:2407.08223

    cs.CL cs.AI

    Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

    Authors: Zilong Wang, Zifeng Wang, Long Le, Huaixiu Steven Zheng, Swaroop Mishra, Vincent Perot, Yuwei Zhang, Anush Mattapalli, Ankur Taly, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

    Abstract: Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Specul…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: Preprint

  6. arXiv:2406.16008

    cs.CL cs.AI cs.LG

    Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

    Authors: Cheng-Yu Hsieh, Yung-Sung Chuang, Chun-Liang Li, Zifeng Wang, Long T. Le, Abhishek Kumar, James Glass, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister

    Abstract: Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon has been known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between…

    Submitted 3 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

    Comments: ACL Findings 2024

  7. arXiv:2406.05365

    cs.CL cs.AI cs.LG

    CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation

    Authors: I-Hung Hsu, Zifeng Wang, Long T. Le, Lesly Miculicich, Nanyun Peng, Chen-Yu Lee, Tomas Pfister

    Abstract: Grounded generation aims to equip language models (LMs) with the ability to produce more credible and accountable responses by accurately citing verifiable sources. However, existing methods, by either feeding LMs with raw or preprocessed materials, remain prone to errors. To address this, we introduce CaLM, a novel verification framework. CaLM leverages the insight that a robust grounded response…

    Submitted 24 June, 2024; v1 submitted 8 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Camera Ready Version

  8. arXiv:2406.04153

    cs.LG

    Learned Feature Importance Scores for Automated Feature Engineering

    Authors: Yihe Dong, Sercan Arik, Nathanael Yoder, Tomas Pfister

    Abstract: Feature engineering has demonstrated substantial utility for many machine learning workflows, such as in the small data regime or when distribution shifts are severe. Thus automating this capability can relieve much manual effort and improve model performance. Towards this, we propose AutoMAN, or Automated Mask-based Feature Engineering, an automated feature engineering framework that achieves hig…

    Submitted 6 June, 2024; originally announced June 2024.

  9. arXiv:2406.02818

    cs.CL

    Chain of Agents: Large Language Models Collaborating on Long-Context Tasks

    Authors: Yusen Zhang, Ruoxi Sun, Yanfei Chen, Tomas Pfister, Rui Zhang, Sercan Ö. Arik

    Abstract: Addressing the challenge of effectively processing long contexts has become a critical issue for Large Language Models (LLMs). Two common strategies have emerged: 1) reducing the input length, such as retrieving relevant chunks by Retrieval-Augmented Generation (RAG), and 2) expanding the context window limit of LLMs. However, both strategies have drawbacks: input reduction has no guarantee of cov…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 19 pages, 6 figures

  10. arXiv:2406.00222

    cs.CL cs.AI cs.LG

    Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

    Authors: Maximillian Chen, Ruoxi Sun, Sercan Ö. Arık, Tomas Pfister

    Abstract: Large language models (LLMs) aligned through reinforcement learning from human feedback (RLHF) have quickly become one of the dominant paradigms for building intelligent conversational assistant agents. However, despite their strong performance across many benchmarks, LLM-based agents still lack conversational skills such as disambiguation: when generalized assistants are faced with ambiguity, the…

    Submitted 31 May, 2024; originally announced June 2024.

  11. arXiv:2405.18654

    cs.CV

    Mitigating Object Hallucination via Data Augmented Contrastive Tuning

    Authors: Pritam Sarkar, Sayna Ebrahimi, Ali Etemad, Ahmad Beirami, Sercan Ö. Arık, Tomas Pfister

    Abstract: Despite their remarkable progress, Multimodal Large Language Models (MLLMs) tend to hallucinate factually inaccurate information. In this work, we address object hallucinations in MLLMs, where information is offered about an object that is not present in the model input. We introduce a contrastive tuning method that can be applied to a pretrained off-the-shelf MLLM for mitigating hallucinations wh…

    Submitted 28 May, 2024; originally announced May 2024.

  12. arXiv:2404.09491

    cs.LG

    Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning

    Authors: Sungwon Han, Jinsung Yoon, Sercan O Arik, Tomas Pfister

    Abstract: Large Language Models (LLMs), with their remarkable ability to tackle challenging and unseen reasoning problems, hold immense potential for tabular learning, that is vital for many real-world applications. In this paper, we propose a novel in-context learning framework, FeatLLM, which employs LLMs as feature engineers to produce an input data set that is optimally suited for tabular predictions. T…

    Submitted 6 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted to ICML, 2024

  13. arXiv:2404.05875

    cs.CL cs.AI cs.LG

    CodecLM: Aligning Language Models with Tailored Synthetic Data

    Authors: Zifeng Wang, Chun-Liang Li, Vincent Perot, Long T. Le, Jin Miao, Zizhao Zhang, Chen-Yu Lee, Tomas Pfister

    Abstract: Instruction tuning has emerged as the key in aligning large language models (LLMs) with specific task instructions, thereby mitigating the discrepancy between the next-token prediction objective and users' actual goals. To reduce the labor and time cost to collect or annotate data by humans, researchers start to explore the use of LLMs to generate instruction-aligned synthetic data. Recent works f…

    Submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted to Findings of NAACL 2024

  14. arXiv:2401.04398

    cs.CL

    Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

    Authors: Zilong Wang, Hao Zhang, Chun-Liang Li, Julian Martin Eisenschlos, Vincent Perot, Zifeng Wang, Lesly Miculicich, Yasuhisa Fujii, Jingbo Shang, Chen-Yu Lee, Tomas Pfister

    Abstract: Table-based reasoning with large language models (LLMs) is a promising direction to tackle many table understanding tasks, such as table-based question answering and fact verification. Compared with generic reasoning, table-based reasoning requires the extraction of underlying semantics from both free-form questions and semi-structured tabular data. Chain-of-Thought and its similar approaches inco…

    Submitted 18 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

    Comments: Accepted to ICLR 2024

  15. arXiv:2312.01279

    cs.CL cs.AI cs.LG

    TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents

    Authors: James Enouen, Hootan Nakhost, Sayna Ebrahimi, Sercan O Arik, Yan Liu, Tomas Pfister

    Abstract: Large language models (LLMs) have attracted huge interest in practical applications given their increasingly accurate responses and coherent reasoning abilities. Given their nature as black-boxes using complex reasoning processes on their inputs, it is inevitable that the demand for scalable and faithful explanations for LLMs' generated content will continue to grow. There have been major developm…

    Submitted 2 December, 2023; originally announced December 2023.

  16. arXiv:2311.09533

    cs.CL

    Effective Large Language Model Adaptation for Improved Grounding and Citation Generation

    Authors: Xi Ye, Ruoxi Sun, Sercan Ö. Arik, Tomas Pfister

    Abstract: Large language models (LLMs) have achieved remarkable advancements in natural language understanding and generation. However, one major issue towards their widespread deployment in the real world is that they can generate "hallucinated" answers that are not factual. Towards this end, this paper focuses on improving LLMs by grounding their responses in retrieved passages and by providing citations.…

    Submitted 2 April, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: NAACL 2024

  17. arXiv:2311.02883

    cs.CL

    SQLPrompt: In-Context Text-to-SQL with Minimal Labeled Data

    Authors: Ruoxi Sun, Sercan Ö. Arik, Rajarishi Sinha, Hootan Nakhost, Hanjun Dai, Pengcheng Yin, Tomas Pfister

    Abstract: Text-to-SQL aims to automate the process of generating SQL queries on a database from natural language text. In this work, we propose "SQLPrompt", tailored to improve the few-shot prompting capabilities of Text-to-SQL for Large Language Models (LLMs). Our methods include innovative prompt design, execution-based consistency decoding strategy which selects the SQL with the most consistent execution…

    Submitted 6 November, 2023; originally announced November 2023.

  18. arXiv:2311.00886

    cs.LG

    COSTAR: Improved Temporal Counterfactual Estimation with Self-Supervised Learning

    Authors: Chuizheng Meng, Yihe Dong, Sercan Ö. Arık, Yan Liu, Tomas Pfister

    Abstract: Estimation of temporal counterfactual outcomes from observed history is crucial for decision-making in many domains such as healthcare and e-commerce, particularly when randomized controlled trials (RCTs) suffer from high cost or impracticality. For real-world datasets, modeling time-dependent confounders is challenging due to complex dynamics, long-range dependencies and both past treatments and…

    Submitted 12 February, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

  19. arXiv:2310.11689

    cs.CL cs.LG

    Adaptation with Self-Evaluation to Improve Selective Prediction in LLMs

    Authors: Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan O Arik, Tomas Pfister, Somesh Jha

    Abstract: Large language models (LLMs) have recently shown great advances in a variety of tasks, including natural language understanding and generation. However, their use in high-stakes decision-making scenarios is still limited due to the potential for errors. Selective prediction is a technique that can be used to improve the reliability of the LLMs by allowing them to abstain from making predictions wh…

    Submitted 11 November, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: Paper published at Findings of the Association for Computational Linguistics: EMNLP, 2023

  20. arXiv:2310.08750

    cs.LG

    Search-Adaptor: Embedding Customization for Information Retrieval

    Authors: Jinsung Yoon, Sercan O Arik, Yanfei Chen, Tomas Pfister

    Abstract: Embeddings extracted by pre-trained Large Language Models (LLMs) have significant potential to improve information retrieval and search. Beyond the zero-shot setup in which they are being conventionally used, being able to take advantage of the information from the relevant query-corpus paired data can further boost the LLM capabilities. In this paper, we propose a novel method, Search-Adaptor, fo…

    Submitted 23 August, 2024; v1 submitted 12 October, 2023; originally announced October 2023.

    Comments: Published in 2024 ACL Main Conference

  21. arXiv:2310.04948

    cs.LG cs.CL

    TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting

    Authors: Defu Cao, Furong Jia, Sercan O Arik, Tomas Pfister, Yixiang Zheng, Wen Ye, Yan Liu

    Abstract: The past decade has witnessed significant advances in time series modeling with deep learning. While achieving state-of-the-art results, the best-performing architectures vary highly across applications and domains. Meanwhile, for natural language processing, the Generative Pre-trained Transformer (GPT) has demonstrated impressive performance via training one general-purpose model across various t…

    Submitted 2 April, 2024; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: Accepted by ICLR 2024. Camera Ready Version

  22. arXiv:2308.13703

    cs.LG

    PAITS: Pretraining and Augmentation for Irregularly-Sampled Time Series

    Authors: Nicasia Beebe-Wang, Sayna Ebrahimi, Jinsung Yoon, Sercan O. Arik, Tomas Pfister

    Abstract: Real-world time series data that commonly reflect sequential human behavior are often uniquely irregularly sampled and sparse, with highly nonuniform sampling over time and entities. Yet, commonly-used pretraining and augmentation methods for time series are not specifically designed for such scenarios. In this paper, we present PAITS (Pretraining and Augmentation for Irregularly-sampled Time Seri…

    Submitted 25 August, 2023; originally announced August 2023.

    Comments: Code: https://github.com/google-research/google-research/tree/master/irregular_timeseries_pretraining

  23. arXiv:2308.00675

    cs.CL cs.AI cs.CV cs.LG

    Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

    Authors: Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister

    Abstract: Today, large language models (LLMs) are taught to use new tools by providing a few demonstrations of the tool's usage. Unfortunately, demonstrations are hard to acquire, and can result in undesirable biased usage if the wrong demonstration is chosen. Even in the rare scenario that demonstrations are readily available, there is no principled selection protocol to determine how many and which ones t…

    Submitted 1 August, 2023; originally announced August 2023.

  24. arXiv:2306.00739

    cs.CL cs.AI cs.DB

    SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended)

    Authors: Ruoxi Sun, Sercan Ö. Arik, Alex Muzio, Lesly Miculicich, Satya Gundabathula, Pengcheng Yin, Hanjun Dai, Hootan Nakhost, Rajarishi Sinha, Zifeng Wang, Tomas Pfister

    Abstract: Text-to-SQL, the process of translating natural language into Structured Query Language (SQL), represents a transformative application of large language models (LLMs), potentially revolutionizing how humans interact with data. This paper introduces the SQL-PaLM framework, a comprehensive solution for understanding and enhancing Text-to-SQL using LLMs, using in the learning regimes of few-shot prom…

    Submitted 30 March, 2024; v1 submitted 26 May, 2023; originally announced June 2023.

  25. arXiv:2305.16556

    cs.LG cs.AI

    LANISTR: Multimodal Learning from Structured and Unstructured Data

    Authors: Sayna Ebrahimi, Sercan O. Arik, Yihe Dong, Tomas Pfister

    Abstract: Multimodal large-scale pretraining has shown impressive performance for unstructured data such as language and image. However, a prevalent real-world scenario involves structured data types, tabular and time-series, along with unstructured data. Such scenarios have been understudied. To bridge this gap, we propose LANISTR, an attention-based framework to learn from LANguage, Image, and STRuctured…

    Submitted 24 April, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

  26. arXiv:2305.14926

    cs.CL cs.AI cs.LG

    Universal Self-Adaptive Prompting

    Authors: Xingchen Wan, Ruoxi Sun, Hootan Nakhost, Hanjun Dai, Julian Martin Eisenschlos, Sercan O. Arik, Tomas Pfister

    Abstract: A hallmark of modern large language models (LLMs) is their impressive general zero-shot and few-shot abilities, often elicited through in-context learning (ICL) via prompting. However, while highly coveted and being the most general, zero-shot performances in LLMs are still typically weaker due to the lack of guidance and the difficulty of applying existing automatic prompt design methods in gener…

    Submitted 20 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: EMNLP 2023 (Main). 10 pages, 5 figures, 4 tables (26 pages, 9 figures and 13 tables including references and appendices)

  27. arXiv:2305.14106

    cs.CL cs.AI cs.LG

    Better Zero-Shot Reasoning with Self-Adaptive Prompting

    Authors: Xingchen Wan, Ruoxi Sun, Hanjun Dai, Sercan O. Arik, Tomas Pfister

    Abstract: Modern large language models (LLMs) have demonstrated impressive capabilities at sophisticated tasks, often through step-by-step reasoning similar to humans. This is made possible by their strong few and zero-shot abilities -- they can effectively learn from a handful of handcrafted, completed responses ("in-context examples"), or are prompted to reason spontaneously through specially designed tri…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: Findings of the Association for Computational Linguistics: ACL 2023. 10 pages, 2 tables, 4 figures (20 pages, 8 tables, 7 figures including references and appendices)

  28. arXiv:2305.02549

    cs.CL cs.CV cs.LG

    FormNetV2: Multimodal Graph Contrastive Learning for Form Document Information Extraction

    Authors: Chen-Yu Lee, Chun-Liang Li, Hao Zhang, Timothy Dozat, Vincent Perot, Guolong Su, Xiang Zhang, Kihyuk Sohn, Nikolai Glushnev, Renshen Wang, Joshua Ainslie, Shangbang Long, Siyang Qin, Yasuhisa Fujii, Nan Hua, Tomas Pfister

    Abstract: The recent advent of self-supervised pre-training techniques has led to a surge in the use of multimodal learning in form document understanding. However, existing approaches that extend the mask language modeling to other modalities require careful multi-task tuning, complex reconstruction target designs, or additional pre-training data. In FormNetV2, we introduce a centralized multimodal graph c…

    Submitted 13 June, 2023; v1 submitted 4 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023

  29. arXiv:2305.02301

    cs.CL cs.AI cs.LG

    Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes

    Authors: Cheng-Yu Hsieh, Chun-Liang Li, Chih-Kuan Yeh, Hootan Nakhost, Yasuhisa Fujii, Alexander Ratner, Ranjay Krishna, Chen-Yu Lee, Tomas Pfister

    Abstract: Deploying large language models (LLMs) is challenging because they are memory inefficient and compute-intensive for practical applications. In reaction, researchers train smaller task-specific models by either finetuning with human labels or distilling using LLM-generated labels. However, finetuning and distillation require large amounts of training data to achieve comparable performance to LLMs.…

    Submitted 5 July, 2023; v1 submitted 3 May, 2023; originally announced May 2023.

    Comments: Accepted to Findings of ACL 2023

  30. arXiv:2304.03870

    cs.LG

    ASPEST: Bridging the Gap Between Active Learning and Selective Prediction

    Authors: Jiefeng Chen, Jinsung Yoon, Sayna Ebrahimi, Sercan Arik, Somesh Jha, Tomas Pfister

    Abstract: Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain. These predictions can then be deferred to humans for further evaluation. As an everlasting challenge for machine learning, in many real-world scenarios, the distribution of test data is different from the training data. This results in more inaccurate predictions, and often increased dependenc…

    Submitted 29 February, 2024; v1 submitted 7 April, 2023; originally announced April 2023.

  31. arXiv:2303.06053

    cs.LG cs.AI

    TSMixer: An All-MLP Architecture for Time Series Forecasting

    Authors: Si-An Chen, Chun-Liang Li, Nate Yoder, Sercan O. Arik, Tomas Pfister

    Abstract: Real-world time-series datasets are often multivariate with complex dynamics. To capture this complexity, high capacity architectures like recurrent- or attention-based sequential deep learning models have become popular. However, recent work demonstrates that simple univariate linear models can outperform such deep learning models on several commonly used academic benchmarks. Extending them, in t…

    Submitted 11 September, 2023; v1 submitted 10 March, 2023; originally announced March 2023.

    Journal ref: Transactions on Machine Learning Research (TMLR), 09/2023

  32. arXiv:2302.03084

    cs.CV

    Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval

    Authors: Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

    Abstract: In Composed Image Retrieval (CIR), a user combines a query image with text to describe their intended target. Existing methods rely on supervised learning of CIR models using labeled triplets consisting of the query image, text specification, and the target image. Labeling such triplets is expensive and hinders broad applicability of CIR. In this work, we propose to study an important task, Zero-S…

    Submitted 15 May, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

    Comments: CVPR2023

  33. arXiv:2301.04857

    cs.AI stat.ME

    Neural Spline Search for Quantile Probabilistic Modeling

    Authors: Ruoxi Sun, Chun-Liang Li, Sercan O. Arik, Michael W. Dusenberry, Chen-Yu Lee, Tomas Pfister

    Abstract: Accurate estimation of output quantiles is crucial in many use cases, where it is desired to model the range of possibility. Modeling target distribution at arbitrary quantile levels and at arbitrary input attribute levels are important to offer a comprehensive picture of the data, and requires the quantile function to be expressive enough. The quantile function describing the target distribution…

    Submitted 12 January, 2023; originally announced January 2023.

  34. arXiv:2212.00173

    cs.LG

    SPADE: Semi-supervised Anomaly Detection under Distribution Mismatch

    Authors: Jinsung Yoon, Kihyuk Sohn, Chun-Liang Li, Sercan O. Arik, Tomas Pfister

    Abstract: Semi-supervised anomaly detection is a common problem, as often the datasets containing anomalies are partially labeled. We propose a canonical framework: Semi-supervised Pseudo-labeler Anomaly Detection with Ensembling (SPADE) that isn't limited by the assumption that labeled and unlabeled data come from the same distribution. Indeed, the assumption is often violated in many applications - for ex…

    Submitted 30 November, 2022; originally announced December 2022.

  35. arXiv:2211.07730

    cs.LG cs.AI cs.CL

    QueryForm: A Simple Zero-shot Form Entity Query Framework

    Authors: Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister

    Abstract: Zero-shot transfer learning for document understanding is a crucial yet under-investigated scenario to help reduce the high cost involved in annotating document entities. We present a novel query-based framework, QueryForm, that extracts entity values from form-like documents in a zero-shot fashion. QueryForm contains a dual prompting mechanism that composes both the document schema and a specific…

    Submitted 27 June, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: Accepted to Findings of ACL 2023

  36. arXiv:2206.07240

    cs.CV cs.AI cs.LG

    Test-Time Adaptation for Visual Document Understanding

    Authors: Sayna Ebrahimi, Sercan O. Arik, Tomas Pfister

    Abstract: For visual document understanding (VDU), self-supervised pretraining has been shown to successfully generate transferable representations, yet, effective adaptation of such representations to distribution shifts at test-time remains to be an unexplored area. We propose DocTTA, a novel test-time adaptation method for documents, that does source-free domain adaptation using unlabeled target document…

    Submitted 23 August, 2023; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted at TMLR 2023

  37. arXiv:2206.06469

    cs.LG stat.ML

    Invariant Structure Learning for Better Generalization and Causal Explainability

    Authors: Yunhao Ge, Sercan Ö. Arik, Jinsung Yoon, Ao Xu, Laurent Itti, Tomas Pfister

    Abstract: Learning the causal structure behind data is invaluable for improving generalization and obtaining high-quality explanations. We propose a novel framework, Invariant Structure Learning (ISL), that is designed to improve causal structure discovery by utilizing generalization as an indication. ISL splits the data into different environments, and learns a structure that is invariant to the target acr…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: 16 pages (including Appendix), 4 figures

  38. arXiv:2206.02107

    cs.LG

    Interpretable Mixture of Experts

    Authors: Aya Abdelsalam Ismail, Sercan Ö. Arik, Jinsung Yoon, Ankur Taly, Soheil Feizi, Tomas Pfister

    Abstract: The need for reliable model explanations is prominent for many machine learning applications, particularly for tabular and time-series data as their use cases often involve high-stakes decision making. Towards this goal, we introduce a novel interpretable modeling framework, Interpretable Mixture of Experts (IME), that yields high accuracy, comparable to 'black-box' Deep Neural Networks (DNNs) in…

    Submitted 25 May, 2023; v1 submitted 5 June, 2022; originally announced June 2022.

  39. arXiv:2206.01125

    cs.CV

    Prefix Conditioning Unifies Language and Label Supervision

    Authors: Kuniaki Saito, Kihyuk Sohn, Xiang Zhang, Chun-Liang Li, Chen-Yu Lee, Kate Saenko, Tomas Pfister

    Abstract: Image-classification datasets have been used to pretrain image recognition models. Recently, web-scale image-caption datasets have emerged as a source of powerful pretraining alternative. Image-caption datasets are more "open-domain", containing a wider variety of scene types and vocabulary words than traditional classification datasets, and models trained on these datasets have demonstrated str…

    Submitted 15 May, 2023; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: CVPR2023

  40. arXiv:2204.04799

    cs.LG cs.CV

    DualPrompt: Complementary Prompting for Rehearsal-free Continual Learning

    Authors: Zifeng Wang, Zizhao Zhang, Sayna Ebrahimi, Ruoxi Sun, Han Zhang, Chen-Yu Lee, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister

    Abstract: Continual learning aims to enable a single model to learn a sequence of tasks without catastrophic forgetting. Top-performing methods usually require a rehearsal buffer to store past pristine examples for experience replay, which, however, limits their practical value due to privacy and memory constraints. In this work, we present a simple yet effective framework, DualPrompt, which learns a tiny s…

    Submitted 5 August, 2022; v1 submitted 10 April, 2022; originally announced April 2022.

    Comments: Published at ECCV 2022 as a conference paper

  41. arXiv:2203.16530

    cs.CV

    Learning Instance-Specific Adaptation for Cross-Domain Segmentation

    Authors: Yuliang Zou, Zizhao Zhang, Chun-Liang Li, Han Zhang, Tomas Pfister, Jia-Bin Huang

    Abstract: We propose a test-time adaptation method for cross-domain image segmentation. Our method is simple: Given a new unseen instance at test time, we adapt a pre-trained model by conducting instance-specific BatchNorm (statistics) calibration. Our approach has two core components. First, we replace the manually designed BatchNorm calibration rule with a learnable module. Second, we leverage strong data…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: Project page: https://yuliang.vision/InstCal/

  42. arXiv:2203.08411  [pdf, other]

    cs.CL cs.CV cs.LG

    FormNet: Structural Encoding beyond Sequential Modeling in Form Document Information Extraction

    Authors: Chen-Yu Lee, Chun-Liang Li, Timothy Dozat, Vincent Perot, Guolong Su, Nan Hua, Joshua Ainslie, Renshen Wang, Yasuhisa Fujii, Tomas Pfister

    Abstract: Sequence modeling has demonstrated state-of-the-art performance on natural language and document understanding tasks. However, it is challenging to correctly serialize tokens in form-like documents in practice due to their varied layout patterns. We propose FormNet, a structure-aware sequence model to mitigate the suboptimal serialization of forms. First, we design Rich Attention that leverage…

    Submitted 23 March, 2022; v1 submitted 16 March, 2022; originally announced March 2022.

    Comments: Accepted to ACL 2022

  43. arXiv:2203.02034  [pdf, other]

    cs.LG

    Data-Efficient and Interpretable Tabular Anomaly Detection

    Authors: Chun-Hao Chang, Jinsung Yoon, Sercan Arik, Madeleine Udell, Tomas Pfister

    Abstract: Anomaly detection (AD) plays an important role in numerous applications. We focus on two understudied aspects of AD that are critical for integration into real-world applications. First, most AD methods cannot incorporate labeled data that are often available in practice in small quantities and can be crucial to achieve high AD accuracy. Second, most AD methods are not interpretable, a bottleneck…

    Submitted 4 June, 2023; v1 submitted 3 March, 2022; originally announced March 2022.

    Comments: Accepted to KDD 2023

  44. arXiv:2202.02403  [pdf, other]

    cs.LG cs.AI

    Self-Adaptive Forecasting for Improved Deep Learning on Non-Stationary Time-Series

    Authors: Sercan O. Arik, Nathanael C. Yoder, Tomas Pfister

    Abstract: Real-world time-series datasets often violate the assumptions of standard supervised learning for forecasting -- their distributions evolve over time, rendering the conventional training and model selection procedures suboptimal. In this paper, we propose a novel method, Self-Adaptive Forecasting (SAF), to modify the training of time-series forecasting models to improve their performance on foreca…

    Submitted 26 September, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

  45. arXiv:2202.02262  [pdf, other]

    cs.LG

    Decoupling Local and Global Representations of Time Series

    Authors: Sana Tonekaboni, Chun-Liang Li, Sercan Arik, Anna Goldenberg, Tomas Pfister

    Abstract: Real-world time series data are often generated from several sources of variation. Learning representations that capture the factors contributing to this variability enables a better understanding of the data via its underlying generative process and improves performance on downstream machine learning tasks. This paper proposes a novel generative approach for learning representations for the globa…

    Submitted 11 February, 2022; v1 submitted 4 February, 2022; originally announced February 2022.

  46. arXiv:2201.03668  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Towards Group Robustness in the presence of Partial Group Labels

    Authors: Vishnu Suresh Lokhande, Kihyuk Sohn, Jinsung Yoon, Madeleine Udell, Chen-Yu Lee, Tomas Pfister

    Abstract: Learning invariant representations is an important requirement when training machine learning models that are driven by spurious correlations in the datasets. These spurious correlations, between input samples and the target labels, wrongly direct the neural network predictions resulting in poor performance on certain groups, especially the minority groups. Robust training against these spurious c…

    Submitted 10 January, 2022; originally announced January 2022.

  47. arXiv:2112.11573  [pdf, other]

    cs.CV

    Anomaly Clustering: Grouping Images into Coherent Clusters of Anomaly Types

    Authors: Kihyuk Sohn, Jinsung Yoon, Chun-Liang Li, Chen-Yu Lee, Tomas Pfister

    Abstract: We study anomaly clustering, grouping data into coherent clusters of anomaly types. This differs from anomaly detection, which aims to separate anomalies from normal data. Unlike object-centered image clustering, anomaly clustering is particularly challenging as anomalous patterns are subtle and local. We present a simple yet effective clustering framework using a patch-based pretrained deep embe…

    Submitted 14 October, 2022; v1 submitted 21 December, 2021; originally announced December 2021.

    Comments: WACV 2023

  48. arXiv:2112.08654  [pdf, other]

    cs.LG cs.CV

    Learning to Prompt for Continual Learning

    Authors: Zifeng Wang, Zizhao Zhang, Chen-Yu Lee, Han Zhang, Ruoxi Sun, Xiaoqi Ren, Guolong Su, Vincent Perot, Jennifer Dy, Tomas Pfister

    Abstract: The mainstream paradigm behind continual learning has been to adapt the model parameters to non-stationary data distributions, where catastrophic forgetting is the central challenge. Typical methods rely on a rehearsal buffer or known task identity at test time to retrieve learned knowledge and address forgetting, while this work presents a new paradigm for continual learning that aims to train a…

    Submitted 21 March, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

    Comments: Published at CVPR 2022 as a conference paper

  49. arXiv:2109.03216  [pdf, other]

    cs.LG cs.CV

    Learning Fast Sample Re-weighting Without Reward Data

    Authors: Zizhao Zhang, Tomas Pfister

    Abstract: Training sample re-weighting is an effective approach for tackling data biases such as imbalanced and corrupted labels. Recent methods develop learning-based algorithms to learn sample re-weighting strategies jointly with model training based on the frameworks of reinforcement learning and meta learning. However, their dependence on additional unbiased reward data limits their general applicability.…

    Submitted 7 September, 2021; originally announced September 2021.

    Comments: ICCV 2021

  50. arXiv:2106.10786  [pdf, other]

    cs.CL cs.LG

    ROPE: Reading Order Equivariant Positional Encoding for Graph-based Document Information Extraction

    Authors: Chen-Yu Lee, Chun-Liang Li, Chu Wang, Renshen Wang, Yasuhisa Fujii, Siyang Qin, Ashok Popat, Tomas Pfister

    Abstract: Natural reading orders of words are crucial for information extraction from form-like documents. Despite recent advances in Graph Convolutional Networks (GCNs) on modeling spatial layout patterns of documents, they have limited ability to capture reading orders of given word-level node representations in a graph. We propose Reading Order Equivariant Positional Encoding (ROPE), a new positional enc…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: Accepted to ACL-IJCNLP 2021 (Oral)