
Showing 1–30 of 30 results for author: Liu, P J

Searching in archive cs.
  1. arXiv:2408.07852

    cs.CL cs.AI cs.LG

    Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability

    Authors: Jiri Hron, Laura Culp, Gamaleldin Elsayed, Rosanne Liu, Ben Adlam, Maxwell Bileschi, Bernd Bohnet, JD Co-Reyes, Noah Fiedel, C. Daniel Freeman, Izzeddin Gur, Kathleen Kenealy, Jaehoon Lee, Peter J. Liu, Gaurav Mishra, Igor Mordatch, Azade Nova, Roman Novak, Aaron Parisi, Jeffrey Pennington, Alex Rizkowsky, Isabelle Simpson, Hanie Sedghi, Jascha Sohl-Dickstein, Kevin Swersky, et al. (6 additional authors not shown)

    Abstract: While many capabilities of language models (LMs) improve with increased training budget, the influence of scale on hallucinations is not yet fully understood. Hallucinations come in many forms, and there is no universally accepted definition. We thus focus on studying only those hallucinations where a correct answer appears verbatim in the training set. To fully control the training data content,…

    Submitted 14 August, 2024; originally announced August 2024.

    Comments: Published at COLM 2024. 16 pages, 11 figures

  2. arXiv:2407.05872

    cs.LG

    Scaling Exponents Across Parameterizations and Optimizers

    Authors: Katie Everett, Lechao Xiao, Mitchell Wortsman, Alexander A. Alemi, Roman Novak, Peter J. Liu, Izzeddin Gur, Jascha Sohl-Dickstein, Leslie Pack Kaelbling, Jaehoon Lee, Jeffrey Pennington

    Abstract: Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices. In this work, we propose a new perspective on parameterization by investigating a key assumption in prior work about the alignment between parameters and data and derive new theoretical results unde…

    Submitted 16 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

    Comments: 63 pages, International Conference on Machine Learning 2024

  3. arXiv:2402.01878

    cs.CL cs.LG

    LiPO: Listwise Preference Optimization through Learning-to-Rank

    Authors: Tianqi Liu, Zhen Qin, Junru Wu, Jiaming Shen, Misha Khalman, Rishabh Joshi, Yao Zhao, Mohammad Saleh, Simon Baumgartner, Jialu Liu, Peter J. Liu, Xuanhui Wang

    Abstract: Aligning language models (LMs) with curated human feedback is critical to control their behaviors in real-world applications. Several recent policy optimization methods, such as DPO and SLiC, serve as promising alternatives to the traditional Reinforcement Learning from Human Feedback (RLHF) approach. In practice, human feedback often comes in a format of a ranked list over multiple responses to a…

    Submitted 22 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  4. arXiv:2312.09300

    cs.CL cs.AI cs.LG

    Self-Evaluation Improves Selective Generation in Large Language Models

    Authors: Jie Ren, Yao Zhao, Tu Vu, Peter J. Liu, Balaji Lakshminarayanan

    Abstract: Safe deployment of large language models (LLMs) may benefit from a reliable method for assessing their generated content to determine when to abstain or to selectively generate. While likelihood-based metrics such as perplexity are widely employed, recent research has demonstrated the limitations of using sequence-level probability estimates given by LLMs as reliable indicators of generation quali…

    Submitted 14 December, 2023; originally announced December 2023.

  5. arXiv:2312.06585

    cs.LG

    Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

    Authors: Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, et al. (16 additional authors not shown)

    Abstract: Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investig…

    Submitted 17 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to TMLR. Camera-ready version. First three authors contributed equally

  6. arXiv:2311.07587

    cs.CL cs.AI cs.CY cs.LG

    Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"

    Authors: C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L Bileschi, Gamaleldin F Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, et al. (5 additional authors not shown)

    Abstract: We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment. This problem is comprised of arithmetic questions posed in natural language, with an arbitrary adversarial string inserted before the question is complete. Even in the simple setting of 1-digit addition problems, it is easy to find adversarial prompts that mak…

    Submitted 15 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  7. arXiv:2310.10047

    cs.CL

    Improving Large Language Model Fine-tuning for Solving Math Problems

    Authors: Yixin Liu, Avi Singh, C. Daniel Freeman, John D. Co-Reyes, Peter J. Liu

    Abstract: Despite their success in many natural language tasks, solving math problems remains a significant challenge for large language models (LLMs). A large gap exists between LLMs' pass-at-one and pass-at-N performance in solving math problems, suggesting LLMs might be close to finding correct solutions, motivating our exploration of fine-tuning methods to unlock LLMs' performance. Using the challenging…

    Submitted 16 October, 2023; originally announced October 2023.

  8. arXiv:2309.14322

    cs.LG

    Small-scale proxies for large-scale Transformer training instabilities

    Authors: Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith

    Abstract: Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study train…

    Submitted 16 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  9. arXiv:2309.06657

    cs.CL

    Statistical Rejection Sampling Improves Preference Optimization

    Authors: Tianqi Liu, Yao Zhao, Rishabh Joshi, Misha Khalman, Mohammad Saleh, Peter J. Liu, Jialu Liu

    Abstract: Improving the alignment of language models with human preferences remains an active research challenge. Previous approaches have primarily utilized Reinforcement Learning from Human Feedback (RLHF) via online RL methods such as Proximal Policy Optimization (PPO). Recently, offline methods such as Sequence Likelihood Calibration (SLiC) and Direct Preference Optimization (DPO) have emerged as attrac…

    Submitted 23 January, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

    Comments: Accepted in ICLR 2024

  10. arXiv:2305.10425

    cs.CL cs.AI

    SLiC-HF: Sequence Likelihood Calibration with Human Feedback

    Authors: Yao Zhao, Rishabh Joshi, Tianqi Liu, Misha Khalman, Mohammad Saleh, Peter J. Liu

    Abstract: Learning from human feedback has been shown to be effective at aligning language models with human preferences. Past work has often relied on Reinforcement Learning from Human Feedback (RLHF), which optimizes the language model using reward scores assigned from a reward model trained on human preference data. In this work we show how the recently introduced Sequence Likelihood Calibration (SLiC),…

    Submitted 17 May, 2023; originally announced May 2023.

  11. arXiv:2212.09928

    cs.CL cs.LG

    Improving the Robustness of Summarization Models by Detecting and Removing Input Noise

    Authors: Kundan Krishna, Yao Zhao, Jie Ren, Balaji Lakshminarayanan, Jiaming Luo, Mohammad Saleh, Peter J. Liu

    Abstract: The evaluation of abstractive summarization models typically uses test data that is identically distributed as training data. In real-world practice, documents to be summarized may contain input noise caused by text extraction artifacts or data pipeline bugs. The robustness of model performance under distribution shift caused by such noise is relatively under-studied. We present a large empirical…

    Submitted 4 December, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: EMNLP Findings 2023 Camera Ready

  12. arXiv:2210.00045

    cs.CL

    Calibrating Sequence likelihood Improves Conditional Language Generation

    Authors: Yao Zhao, Misha Khalman, Rishabh Joshi, Shashi Narayan, Mohammad Saleh, Peter J. Liu

    Abstract: Conditional language models are predominantly trained with maximum likelihood estimation (MLE), giving probability mass to sparsely observed target sequences. While MLE trained models assign high probability to plausible sequences given the context, the model probabilities often do not accurately rank-order generated sequences by quality. This has been empirically observed in beam search decoding…

    Submitted 30 September, 2022; originally announced October 2022.

  13. arXiv:2209.15558

    cs.CL

    Out-of-Distribution Detection and Selective Generation for Conditional Language Models

    Authors: Jie Ren, Jiaming Luo, Yao Zhao, Kundan Krishna, Mohammad Saleh, Balaji Lakshminarayanan, Peter J. Liu

    Abstract: Machine learning algorithms typically assume independent and identically distributed samples in training and at test time. Much work has shown that high-performing ML classifiers can degrade significantly and provide overly-confident, wrong classification predictions, particularly for out-of-distribution (OOD) inputs. Conditional language models (CLMs) are predominantly trained to classify the nex…

    Submitted 7 March, 2023; v1 submitted 30 September, 2022; originally announced September 2022.

    Comments: Published in ICLR 2023

  14. arXiv:2208.04347

    cs.CL

    Investigating Efficiently Extending Transformers for Long Input Summarization

    Authors: Jason Phang, Yao Zhao, Peter J. Liu

    Abstract: While large pretrained Transformer models have proven highly capable at tackling natural language tasks, handling long sequence inputs continues to be a significant challenge. One such task is long input summarization, where inputs are longer than the maximum input context of most pretrained models. Through an extensive set of experiments, we investigate what model architectural changes and pretra…

    Submitted 8 August, 2022; originally announced August 2022.

  15. arXiv:2208.01030

    cs.CL

    SMART: Sentences as Basic Units for Text Evaluation

    Authors: Reinald Kim Amplayo, Peter J. Liu, Yao Zhao, Shashi Narayan

    Abstract: Widely used evaluation metrics for text generation either do not work well with longer texts or fail to evaluate all aspects of text quality. In this paper, we introduce a new metric called SMART to mitigate such limitations. Specifically, we treat sentences as basic units of matching instead of tokens, and use a sentence matching function to soft-match candidate and reference sentences. Candidate…

    Submitted 1 August, 2022; originally announced August 2022.

    Comments: code coming soon

  16. arXiv:2006.10213

    cs.CL cs.IR cs.LG

    SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization

    Authors: Yao Zhao, Mohammad Saleh, Peter J. Liu

    Abstract: Most prior work in the sequence-to-sequence paradigm focused on datasets with input sequence lengths in the hundreds of tokens due to the computational constraints of common RNN and Transformer architectures. In this paper, we study long-form abstractive text summarization, a sequence-to-sequence setting with input sequence lengths up to 100,000 tokens and output sequence lengths up to 768 tokens.…

    Submitted 17 June, 2020; originally announced June 2020.

  17. arXiv:1912.08777

    cs.CL

    PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization

    Authors: Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu

    Abstract: Recent work pre-training Transformers with self-supervised objectives on large text corpora has shown great success when fine-tuned on downstream NLP tasks including text summarization. However, pre-training objectives tailored for abstractive text summarization have not been explored. Furthermore there is a lack of systematic evaluation across diverse domains. In this work, we propose pre-trainin…

    Submitted 10 July, 2020; v1 submitted 18 December, 2019; originally announced December 2019.

    Comments: Added results from mixed+stochastic model, test-set overlapping analysis; Code link added; Accepted for ICML 2020. arXiv admin note: text overlap with arXiv:1605.06560, arXiv:1205.2395, arXiv:0902.4351, arXiv:1610.09932, arXiv:nucl-ex/0512029 by other authors

  18. arXiv:1910.10683

    cs.LG cs.CL stat.ML

    Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

    Authors: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu

    Abstract: Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a diversity of approaches, methodology, and practice. In this paper, we explore the landscape of transfer learning techniques for NLP by introducing…

    Submitted 19 September, 2023; v1 submitted 23 October, 2019; originally announced October 2019.

  19. arXiv:1910.00998

    cs.CL cs.LG cs.NE

    SummAE: Zero-Shot Abstractive Text Summarization using Length-Agnostic Auto-Encoders

    Authors: Peter J. Liu, Yu-An Chung, Jie Ren

    Abstract: We propose an end-to-end neural model for zero-shot abstractive text summarization of paragraphs, and introduce a benchmark task, ROCSumm, based on ROCStories, a subset for which we collected human summaries. In this task, five-sentence stories (paragraphs) are summarized with one sentence, using human summaries only for evaluation. We show results for extractive and human baselines to demonstrate…

    Submitted 2 October, 2019; originally announced October 2019.

    Comments: first two authors contributed equally

  20. arXiv:1906.02845

    stat.ML cs.LG

    Likelihood Ratios for Out-of-Distribution Detection

    Authors: Jie Ren, Peter J. Liu, Emily Fertig, Jasper Snoek, Ryan Poplin, Mark A. DePristo, Joshua V. Dillon, Balaji Lakshminarayanan

    Abstract: Discriminative neural networks offer little or no performance guarantees when deployed on data not generated by the same process as the training distribution. On such out-of-distribution (OOD) inputs, the prediction may not only be erroneous, but confidently so, limiting the safe deployment of classifiers in real-world applications. One such challenging application is bacteria identification based…

    Submitted 5 December, 2019; v1 submitted 6 June, 2019; originally announced June 2019.

    Comments: Accepted to NeurIPS 2019

  21. Assessing The Factual Accuracy of Generated Text

    Authors: Ben Goodrich, Vinay Rao, Mohammad Saleh, Peter J Liu

    Abstract: We propose a model-based metric to estimate the factual accuracy of generated text that is complementary to typical scoring schemes like ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (Bilingual Evaluation Understudy). We introduce and release a new large-scale dataset based on Wikipedia and Wikidata to train relation classifiers and end-to-end fact extraction models. The end-t…

    Submitted 25 May, 2021; v1 submitted 30 May, 2019; originally announced May 2019.

    Journal ref: The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '19), August 4--8, 2019, Anchorage, AK, USA

  22. arXiv:1905.12126

    cs.LG cs.AI stat.ML

    Using Ontologies To Improve Performance In Massively Multi-label Prediction Models

    Authors: Ethan Steinberg, Peter J. Liu

    Abstract: Massively multi-label prediction/classification problems arise in environments like health-care or biology where very precise predictions are useful. One challenge with massively multi-label problems is that there is often a long-tailed frequency distribution for the labels, which results in few positive examples for the rare labels. We propose a solution to this problem by modifying the output la…

    Submitted 28 May, 2019; originally announced May 2019.

  23. arXiv:1810.05739

    cs.CL

    MeanSum: A Neural Model for Unsupervised Multi-document Abstractive Summarization

    Authors: Eric Chu, Peter J. Liu

    Abstract: Abstractive summarization has been studied using neural sequence transduction methods with datasets of large, paired document-summary examples. However, such datasets are rare and the models trained from them do not generalize to other domains. Recently, some progress has been made in learning sequence-to-sequence mappings with only unpaired examples. In our work, we consider the setting where the…

    Submitted 22 May, 2019; v1 submitted 12 October, 2018; originally announced October 2018.

    Comments: Accepted to ICML 2019

  24. arXiv:1808.02622

    cs.CL

    Learning to Write Notes in Electronic Health Records

    Authors: Peter J. Liu

    Abstract: Clinicians spend a significant amount of time inputting free-form textual notes into Electronic Health Records (EHR) systems. Much of this documentation work is seen as a burden, reducing time spent with patients and contributing to clinician burnout. With the aspiration of AI-assisted note-writing, we propose a new language modeling task predicting the content of notes conditioned on past data fr…

    Submitted 8 August, 2018; originally announced August 2018.

    Comments: preprint

  25. arXiv:1801.10198

    cs.CL

    Generating Wikipedia by Summarizing Long Sequences

    Authors: Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam Shazeer

    Abstract: We show that generating English Wikipedia articles can be approached as a multi-document summarization of source documents. We use extractive summarization to coarsely identify salient information and a neural abstractive model to generate the article. For the abstractive model, we introduce a decoder-only architecture that can scalably attend to very long sequences, much longer than typical enco…

    Submitted 30 January, 2018; originally announced January 2018.

    Comments: Published as a conference paper at ICLR 2018

  26. Scalable and accurate deep learning for electronic health records

    Authors: Alvin Rajkomar, Eyal Oren, Kai Chen, Andrew M. Dai, Nissan Hajaj, Peter J. Liu, Xiaobing Liu, Mimi Sun, Patrik Sundberg, Hector Yee, Kun Zhang, Gavin E. Duggan, Gerardo Flores, Michaela Hardt, Jamie Irvine, Quoc Le, Kurt Litsch, Jake Marcus, Alexander Mossin, Justin Tansuwan, De Wang, James Wexler, Jimbo Wilson, Dana Ludwig, Samuel L. Volchenboum, et al. (9 additional authors not shown)

    Abstract: Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of p…

    Submitted 11 May, 2018; v1 submitted 24 January, 2018; originally announced January 2018.

    Comments: Published version from https://www.nature.com/articles/s41746-018-0029-1

    Journal ref: npj Digital Medicine 1:18 (2018)

  27. arXiv:1801.05453

    cs.CL cs.LG stat.ML

    Beyond Word Importance: Contextual Decomposition to Extract Interactions from LSTMs

    Authors: W. James Murdoch, Peter J. Liu, Bin Yu

    Abstract: The driving force behind the recent success of LSTMs has been their ability to learn complex and non-linear relationships. Consequently, our inability to describe these relationships has led to LSTMs being characterized as black boxes. To this end, we introduce contextual decomposition (CD), an interpretation algorithm for analysing individual predictions made by standard LSTMs, without any change…

    Submitted 27 April, 2018; v1 submitted 16 January, 2018; originally announced January 2018.

    Comments: Oral presentation at ICLR 2018

  28. arXiv:1704.04368

    cs.CL

    Get To The Point: Summarization with Pointer-Generator Networks

    Authors: Abigail See, Peter J. Liu, Christopher D. Manning

    Abstract: Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text). However, these models have two shortcomings: they are liable to reproduce factual details inaccurately, and they tend to repeat themselves. In this work we propose a novel architecture that a…

    Submitted 25 April, 2017; v1 submitted 14 April, 2017; originally announced April 2017.

    Comments: Add METEOR evaluation results, add some citations, fix some equations (what are now equations 1, 8 and 11 were missing a bias term), fix url to pyrouge package, add acknowledgments

  29. arXiv:1704.00784

    cs.LG cs.CL

    Online and Linear-Time Attention by Enforcing Monotonic Alignments

    Authors: Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck

    Abstract: Recurrent neural network models with an attention mechanism have proven to be extremely effective on a wide variety of sequence-to-sequence problems. However, the fact that soft attention mechanisms perform a pass over the entire input sequence when producing each element in the output sequence precludes their use in online settings and results in a quadratic time complexity. Based on the insight…

    Submitted 29 June, 2017; v1 submitted 3 April, 2017; originally announced April 2017.

    Comments: ICML camera-ready version; 10 pages + 9 page appendix

  30. arXiv:1611.02683

    cs.CL cs.LG cs.NE

    Unsupervised Pretraining for Sequence to Sequence Learning

    Authors: Prajit Ramachandran, Peter J. Liu, Quoc V. Le

    Abstract: This work presents a general unsupervised learning method to improve the accuracy of sequence to sequence (seq2seq) models. In our method, the weights of the encoder and decoder of a seq2seq model are initialized with the pretrained weights of two language models and then fine-tuned with labeled data. We apply this method to challenging benchmarks in machine translation and abstractive summarizati…

    Submitted 21 February, 2018; v1 submitted 8 November, 2016; originally announced November 2016.

    Comments: Updated to accepted EMNLP 2017 version