Showing 1–42 of 42 results for author: Yogatama, D

  1. arXiv:2405.02287  [pdf, other]

    cs.CL cs.AI cs.CV

    Vibe-Eval: A hard evaluation suite for measuring progress of multimodal language models

    Authors: Piotr Padlewski, Max Bain, Matthew Henderson, Zhongkai Zhu, Nishant Relan, Hai Pham, Donovan Ong, Kaloyan Aleksiev, Aitor Ormazabal, Samuel Phua, Ethan Yeo, Eugenie Lamprecht, Qi Liu, Yuqi Wang, Eric Chen, Deyu Fu, Lei Li, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Mikel Artetxe, Yi Tay

    Abstract: We introduce Vibe-Eval: a new open benchmark and framework for evaluating multimodal chat models. Vibe-Eval consists of 269 visual understanding prompts, including 100 of hard difficulty, complete with gold-standard responses authored by experts. Vibe-Eval is open-ended and challenging with dual objectives: (i) vibe checking multimodal chat models for day-to-day tasks and (ii) rigorously testing a…

    Submitted 3 May, 2024; originally announced May 2024.

  2. arXiv:2404.12387  [pdf, other]

    cs.CL cs.CV

    Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

    Authors: Reka Team, Aitor Ormazabal, Che Zheng, Cyprien de Masson d'Autume, Dani Yogatama, Deyu Fu, Donovan Ong, Eric Chen, Eugenie Lamprecht, Hai Pham, Isaac Ong, Kaloyan Aleksiev, Lei Li, Matthew Henderson, Max Bain, Mikel Artetxe, Nishant Relan, Piotr Padlewski, Qi Liu, Ren Chen, Samuel Phua, Yazheng Yang, Yi Tay, Yuqi Wang, Zhongkai Zhu, et al. (1 additional author not shown)

    Abstract: We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, images, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but al…

    Submitted 18 April, 2024; originally announced April 2024.

  3. arXiv:2404.01266  [pdf, other]

    cs.AI cs.CL

    IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations

    Authors: Deqing Fu, Ghazal Khalighinejad, Ollie Liu, Bhuwan Dhingra, Dani Yogatama, Robin Jia, Willie Neiswanger

    Abstract: Current foundation models exhibit impressive capabilities when prompted either with text only or with both image and text inputs. But do their capabilities change depending on the input modality? In this work, we propose $\textbf{IsoBench}$, a benchmark dataset containing problems from four major areas: math, science, algorithms, and games. Each example is presented with multiple…

    Submitted 2 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  4. arXiv:2402.10424  [pdf, other]

    cs.CL cs.AI

    Understanding In-Context Learning with a Pelican Soup Framework

    Authors: Ting-Rui Chiang, Dani Yogatama

    Abstract: Many existing theoretical analyses of in-context learning for natural language processing are based on latent variable models that leave gaps between theory and practice. We aim to close these gaps by proposing a theoretical framework, the Pelican Soup Framework. In this framework, we introduce (1) the notion of a common sense knowledge base, (2) a general formalism for natural language classific…

    Submitted 15 February, 2024; originally announced February 2024.

  5. arXiv:2402.02392  [pdf, other]

    cs.AI cs.CL cs.LG

    DeLLMa: A Framework for Decision Making Under Uncertainty with Large Language Models

    Authors: Ollie Liu, Deqing Fu, Dani Yogatama, Willie Neiswanger

    Abstract: The potential of large language models (LLMs) as decision support tools is increasingly being explored in fields such as business, engineering, and medicine, which often face challenging tasks of decision-making under uncertainty. In this paper, we show that directly prompting LLMs on these types of decision-making problems can yield poor results, especially as the problem complexity increases. To…

    Submitted 9 June, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

    Comments: 35 pages, 25 figures

  6. arXiv:2311.09615  [pdf, other]

    cs.CL

    On Retrieval Augmentation and the Limitations of Language Model Training

    Authors: Ting-Rui Chiang, Xinyan Velocity Yu, Joshua Robinson, Ollie Liu, Isabelle Lee, Dani Yogatama

    Abstract: Augmenting a language model (LM) with $k$-nearest neighbors ($k$NN) retrieval on its training data alone can decrease its perplexity, though the underlying reasons for this remain elusive. In this work, we rule out one previously posited possibility -- the "softmax bottleneck." We then create a new dataset to evaluate LM generalization ability in the setting where training data contains additional…

    Submitted 2 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024

  7. arXiv:2310.16261  [pdf, other]

    cs.CL

    The Distributional Hypothesis Does Not Fully Explain the Benefits of Masked Language Model Pretraining

    Authors: Ting-Rui Chiang, Dani Yogatama

    Abstract: We analyze the masked language modeling pretraining objective function from the perspective of the distributional hypothesis. We investigate whether better sample efficiency and the better generalization capability of models pretrained with masked language modeling can be attributed to the semantic similarity encoded in the pretraining data's distributional property. Via a synthetic dataset, our a…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023

  8. arXiv:2310.07972  [pdf, other]

    cs.LG cs.AI cs.IT

    Interpretable Diffusion via Information Decomposition

    Authors: Xianghao Kong, Ollie Liu, Han Li, Dani Yogatama, Greg Ver Steeg

    Abstract: Denoising diffusion models enable conditional generation and density modeling of complex relationships like images and text. However, the nature of the learned relationships is opaque making it difficult to understand precisely what relationships between words and parts of an image are captured, or to predict the effect of an intervention. We illuminate the fine-grained relationships learned by di…

    Submitted 18 May, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: 32 pages, 18 figures

  9. arXiv:2207.10551  [pdf, other]

    cs.LG cs.CL

    Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?

    Authors: Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, William Fedus, Jinfeng Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler

    Abstract: There has been a lot of interest in the scaling properties of Transformer models. However, not much has been done on the front of investigating the effect of scaling properties of different inductive biases and model architectures. Do model architectures scale differently? If so, how does inductive bias affect scaling behaviour? How does this influence upstream (pretraining) and downstream (trans…

    Submitted 21 July, 2022; originally announced July 2022.

  10. arXiv:2206.10658  [pdf, other]

    cs.CL cs.IR

    Questions Are All You Need to Train a Dense Passage Retriever

    Authors: Devendra Singh Sachan, Mike Lewis, Dani Yogatama, Luke Zettlemoyer, Joelle Pineau, Manzil Zaheer

    Abstract: We introduce ART, a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data. Dense retrieval is a central challenge for open-domain tasks, such as Open QA, where state-of-the-art methods typically require large supervised datasets with custom hard-negative mining and denoising of positive examples. ART, in contrast, only requires a…

    Submitted 2 April, 2023; v1 submitted 21 June, 2022; originally announced June 2022.

    Comments: Accepted to TACL, pre MIT Press publication version

  11. arXiv:2206.07682  [pdf, other]

    cs.CL

    Emergent Abilities of Large Language Models

    Authors: Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, William Fedus

    Abstract: Scaling up language models has been shown to predictably improve performance and sample efficiency on a wide range of downstream tasks. This paper instead discusses an unpredictable phenomenon that we refer to as emergent abilities of large language models. We consider an ability to be emergent if it is not present in smaller models but is present in larger models. Thus, emergent abilities cannot…

    Submitted 26 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: Transactions on Machine Learning Research (TMLR), 2022

  12. arXiv:2205.02655  [pdf, other]

    cs.CV cs.CL

    Language Models Can See: Plugging Visual Controls in Text Generation

    Authors: Yixuan Su, Tian Lan, Yahui Liu, Fangyu Liu, Dani Yogatama, Yan Wang, Lingpeng Kong, Nigel Collier

    Abstract: Generative language models (LMs) such as GPT-2/3 can be prompted to generate text with remarkable quality. While they are designed for text-prompted generation, it remains an open question how the generation process could be guided by modalities beyond text such as images. In this work, we propose a training-free framework, called MAGIC (iMAge-Guided text generatIon with CLIP), for plugging in vis…

    Submitted 30 May, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

    Comments: 21 pages, 5 figures, 5 tables; (v2 adds some experimental details)

  13. arXiv:2203.01311  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.MM

    High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning

    Authors: Paul Pu Liang, Yiwei Lyu, Xiang Fan, Jeffrey Tsaw, Yudong Liu, Shentong Mo, Dani Yogatama, Louis-Philippe Morency, Ruslan Salakhutdinov

    Abstract: Many real-world problems are inherently multimodal, from spoken language, gestures, and paralinguistics humans use to communicate, to force, proprioception, and visual sensors on robots. While there has been an explosion of interest in multimodal learning, these methods are focused on a small set of modalities primarily in language, vision, and audio. In order to accelerate generalization towards…

    Submitted 28 June, 2023; v1 submitted 2 March, 2022; originally announced March 2022.

    Comments: TMLR 2023, Code available at https://github.com/pliang279/HighMMT

  14. arXiv:2202.06417  [pdf, other]

    cs.CL

    A Contrastive Framework for Neural Text Generation

    Authors: Yixuan Su, Tian Lan, Yan Wang, Dani Yogatama, Lingpeng Kong, Nigel Collier

    Abstract: Text generation is of great importance to many natural language processing applications. However, maximization-based decoding methods (e.g. beam search) of neural language models often lead to degenerate solutions -- the generated text is unnatural and contains undesirable repetitions. Existing approaches introduce stochasticity via sampling or modify training objectives to decrease probabilities…

    Submitted 26 September, 2022; v1 submitted 13 February, 2022; originally announced February 2022.

    Comments: NeurIPS 2022

  15. arXiv:2201.09680  [pdf, other]

    cs.CL cs.AI

    Relational Memory Augmented Language Models

    Authors: Qi Liu, Dani Yogatama, Phil Blunsom

    Abstract: We present a memory-augmented approach to condition an autoregressive language model on a knowledge graph. We represent the graph as a collection of relation triples and retrieve relevant relations for a given context to improve text generation. Experiments on WikiText-103, WMT19, and enwik8 English datasets demonstrate that our approach produces a better language model in terms of perplexity and…

    Submitted 24 January, 2022; originally announced January 2022.

    Comments: Accepted to TACL, pre MIT Press publication version

  16. arXiv:2110.05838  [pdf, other]

    cs.LG cs.AI cs.CL

    Balancing Average and Worst-case Accuracy in Multitask Learning

    Authors: Paul Michel, Sebastian Ruder, Dani Yogatama

    Abstract: When training and evaluating machine learning models on a large number of tasks, it is important to not only look at average task accuracy -- which may be biased by easy or redundant tasks -- but also worst-case accuracy (i.e. the performance on the task with the lowest accuracy). In this work, we show how to use techniques from the distributionally robust optimization (DRO) literature to improve…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: Under review

  17. arXiv:2110.02488  [pdf, other]

    cs.CL

    ABC: Attention with Bounded-memory Control

    Authors: Hao Peng, Jungo Kasai, Nikolaos Pappas, Dani Yogatama, Zhaofeng Wu, Lingpeng Kong, Roy Schwartz, Noah A. Smith

    Abstract: Transformer architectures have achieved state-of-the-art results on a variety of sequence modeling tasks. However, their attention mechanism comes with a quadratic complexity in sequence lengths, making the computational overhead prohibitive, especially for long sequences. Attention context can be seen as a random-access memory with each token taking a slot. Under this perspective, the memory size…

    Submitted 1 June, 2022; v1 submitted 5 October, 2021; originally announced October 2021.

  18. arXiv:2109.10686  [pdf, other]

    cs.CL cs.AI cs.CV cs.LG

    Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers

    Authors: Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler

    Abstract: There remain many open questions pertaining to the scaling behaviour of Transformer architectures. These scaling decisions and findings can be critical, as training runs often come with an associated computational cost which has both financial and/or environmental impact. The goal of this paper is to present scaling insights from pretraining and finetuning Transformers. While Kaplan et al. presen…

    Submitted 30 January, 2022; v1 submitted 22 September, 2021; originally announced September 2021.

    Comments: ICLR 2022 + Updated Checkpoint Release

  19. arXiv:2106.05346  [pdf, other]

    cs.CL cs.AI cs.IR

    End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

    Authors: Devendra Singh Sachan, Siva Reddy, William Hamilton, Chris Dyer, Dani Yogatama

    Abstract: We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers. We model retrieval decisions as latent variables over sets of relevant documents. Since marginalizing over sets of retrieved documents is computationally hard, we approximate this using an expectat…

    Submitted 4 December, 2021; v1 submitted 9 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021 camera-ready version

  20. arXiv:2103.13076  [pdf, other]

    cs.CL

    Finetuning Pretrained Transformers into RNNs

    Authors: Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith

    Abstract: Transformers have outperformed recurrent neural networks (RNNs) in natural language generation. But this comes with a significant computational cost, as the attention mechanism's complexity scales quadratically with sequence length. Efficient transformer variants have received increasing interest in recent works. Among them, a linear-complexity recurrent variant has proven well suited for autoregr…

    Submitted 20 September, 2021; v1 submitted 24 March, 2021; originally announced March 2021.

    Comments: EMNLP 2021

  21. arXiv:2103.02143  [pdf, other]

    cs.CL

    Random Feature Attention

    Authors: Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong

    Abstract: Transformers are state-of-the-art models for a variety of sequence modeling tasks. At their core is an attention function which models pairwise interactions between the inputs at every timestep. While attention is powerful, it does not scale efficiently to long sequences due to its quadratic time and space complexity in the sequence length. We propose RFA, a linear time and space attention that us…

    Submitted 19 March, 2021; v1 submitted 2 March, 2021; originally announced March 2021.

    Comments: ICLR 2021

  22. arXiv:2102.02557  [pdf, other]

    cs.CL

    Adaptive Semiparametric Language Models

    Authors: Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong

    Abstract: We present a language model that combines a large parametric neural network (i.e., a transformer) with a non-parametric episodic memory component in an integrated architecture. Our model uses extended short-term context by caching local hidden states -- similar to transformer-XL -- and global long-term memory by retrieving a set of nearest neighbor tokens at each timestep. We design a gating funct…

    Submitted 4 February, 2021; originally announced February 2021.

    Comments: Accepted to TACL, pre MIT Press publication version

  23. arXiv:2102.01951  [pdf, other]

    cs.CL cs.AI

    Mind the Gap: Assessing Temporal Generalization in Neural Language Models

    Authors: Angeliki Lazaridou, Adhiguna Kuncoro, Elena Gribovskaya, Devang Agrawal, Adam Liska, Tayfun Terzi, Mai Gimenez, Cyprien de Masson d'Autume, Tomas Kocisky, Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom

    Abstract: Our world is open-ended, non-stationary, and constantly evolving; thus what we talk about and how we talk about it change over time. This inherent dynamic nature of language contrasts with the current static language modelling paradigm, which trains and evaluates models on utterances from overlapping time periods. Despite impressive recent progress, we demonstrate that Transformer-XL language mode…

    Submitted 26 October, 2021; v1 submitted 3 February, 2021; originally announced February 2021.

    Comments: To appear as a Spotlight at NeurIPS 2021

  24. arXiv:2005.13482  [pdf, other]

    cs.CL

    Syntactic Structure Distillation Pretraining For Bidirectional Encoders

    Authors: Adhiguna Kuncoro, Lingpeng Kong, Daniel Fried, Dani Yogatama, Laura Rimell, Chris Dyer, Phil Blunsom

    Abstract: Textual representation learners trained on large amounts of data have achieved notable success on downstream tasks; intriguingly, they have also performed well on challenging tests of syntactic competence. Given this success, it remains an open question whether scalable learners like BERT can become fully proficient in the syntax of natural language by virtue of data scale alone, or whether they s…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: 17 pages, 6 tables, 2 figures. AK and LK contributed equally

  25. A Call for More Rigor in Unsupervised Cross-lingual Learning

    Authors: Mikel Artetxe, Sebastian Ruder, Dani Yogatama, Gorka Labaka, Eneko Agirre

    Abstract: We review motivations, definition, approaches, and methodology for unsupervised cross-lingual learning and call for a more rigorous position in each of them. An existing rationale for such research is based on the lack of parallel data for many of the world's languages. However, we argue that a scenario without any parallel data and abundant monolingual data is unrealistic in practice. We also dis…

    Submitted 30 April, 2020; originally announced April 2020.

    Comments: ACL 2020

  26. arXiv:2002.09543  [pdf, other]

    cs.CL

    Modelling Latent Skills for Multitask Language Generation

    Authors: Kris Cao, Dani Yogatama

    Abstract: We present a generative model for multitask conditional language generation. Our guiding hypothesis is that a shared set of latent skills underlies many disparate language generation tasks, and that explicitly modelling these skills in a task embedding space can help with both positive transfer across tasks and with efficient adaptation to new tasks. We instantiate this task embedding space as a l…

    Submitted 21 February, 2020; originally announced February 2020.

  27. arXiv:1911.03064  [pdf, other]

    cs.CL cs.CY cs.LG

    Reducing Sentiment Bias in Language Models via Counterfactual Evaluation

    Authors: Po-Sen Huang, Huan Zhang, Ray Jiang, Robert Stanforth, Johannes Welbl, Jack Rae, Vishal Maini, Dani Yogatama, Pushmeet Kohli

    Abstract: Advances in language modeling architectures and the availability of large text corpora have driven progress in automatic text generation. While this results in models capable of generating coherent texts, it also prompts models to internalize social biases present in the training corpus. This paper aims to quantify and reduce a particular type of bias exhibited by language models: bias in the sent…

    Submitted 8 October, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

    Comments: Accepted in the Findings of EMNLP, 2020

  28. On the Cross-lingual Transferability of Monolingual Representations

    Authors: Mikel Artetxe, Sebastian Ruder, Dani Yogatama

    Abstract: State-of-the-art unsupervised multilingual models (e.g., multilingual BERT) have been shown to generalize in a zero-shot cross-lingual setting. This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions. We evaluate this hypothesis by designing an alternative approach that tran…

    Submitted 26 May, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: ACL 2020

  29. arXiv:1910.08350  [pdf, other]

    cs.CL cs.LG

    A Mutual Information Maximization Perspective of Language Representation Learning

    Authors: Lingpeng Kong, Cyprien de Masson d'Autume, Wang Ling, Lei Yu, Zihang Dai, Dani Yogatama

    Abstract: We show state-of-the-art word representation learning methods maximize an objective function that is a lower bound on the mutual information between different parts of a word sequence (i.e., a sentence). Our formulation provides an alternative perspective that unifies classical word embedding models (e.g., Skip-gram) and modern contextual embeddings (e.g., BERT, XLNet). In addition to enhancing ou…

    Submitted 26 November, 2019; v1 submitted 18 October, 2019; originally announced October 2019.

    Comments: 12 pages, 3 figures

  30. arXiv:1909.01492  [pdf, other]

    cs.CL cs.CR cs.LG stat.ML

    Achieving Verified Robustness to Symbol Substitutions via Interval Bound Propagation

    Authors: Po-Sen Huang, Robert Stanforth, Johannes Welbl, Chris Dyer, Dani Yogatama, Sven Gowal, Krishnamurthy Dvijotham, Pushmeet Kohli

    Abstract: Neural networks are part of many contemporary NLP systems, yet their empirical successes come at the price of vulnerability to adversarial attacks. Previous work has used adversarial training and data augmentation to partially mitigate such brittleness, but these are unlikely to find worst-case adversaries due to the complexity of the search space arising from discrete text perturbations. In this…

    Submitted 20 December, 2019; v1 submitted 3 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019

  31. arXiv:1906.01076  [pdf, other]

    cs.LG cs.CL stat.ML

    Episodic Memory in Lifelong Language Learning

    Authors: Cyprien de Masson d'Autume, Sebastian Ruder, Lingpeng Kong, Dani Yogatama

    Abstract: We introduce a lifelong language learning setup where a model needs to learn from a stream of text examples without any dataset identifier. We propose an episodic memory model that performs sparse experience replay and local adaptation to mitigate catastrophic forgetting in this setup. Experiments on text classification and question answering demonstrate the complementary benefits of sparse experi…

    Submitted 25 November, 2019; v1 submitted 3 June, 2019; originally announced June 2019.

    Comments: Proceedings of NeurIPS 2019

  32. arXiv:1901.11373  [pdf, other]

    cs.LG cs.CL stat.ML

    Learning and Evaluating General Linguistic Intelligence

    Authors: Dani Yogatama, Cyprien de Masson d'Autume, Jerome Connor, Tomas Kocisky, Mike Chrzanowski, Lingpeng Kong, Angeliki Lazaridou, Wang Ling, Lei Yu, Chris Dyer, Phil Blunsom

    Abstract: We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation to evaluate them against these criteria through a series of ex…

    Submitted 31 January, 2019; originally announced January 2019.

  33. arXiv:1901.09296  [pdf, other]

    cs.CL

    Variational Smoothing in Recurrent Neural Network Language Models

    Authors: Lingpeng Kong, Gabor Melis, Wang Ling, Lei Yu, Dani Yogatama

    Abstract: We present a new theoretical perspective of data noising in recurrent neural network language models (Xie et al., 2017). We show that each variant of data noising is an instance of Bayesian recurrent neural networks with a particular variational distribution (i.e., a mixture of Gaussians whose weights depend on statistics derived from the corpus such as the unigram distribution). We use this insig…

    Submitted 26 January, 2019; originally announced January 2019.

    Comments: Accepted as a conference paper at ICLR 2019

  34. Jointly Learning Sentence Embeddings and Syntax with Unsupervised Tree-LSTMs

    Authors: Jean Maillard, Stephen Clark, Dani Yogatama

    Abstract: We introduce a neural network that represents sentences by composing their words according to induced binary parse trees. We use Tree-LSTM as our composition function, applied along a tree structure found by a fully differentiable natural language chart parser. Our model simultaneously optimises both the composition function and the parser, thus eliminating the need for externally-provided parse t…

    Submitted 25 May, 2017; originally announced May 2017.

    Journal ref: Natural Language Engineering 25, no. 4 (2019): 433-49

  35. arXiv:1705.04146  [pdf, other]

    cs.AI cs.CL cs.LG

    Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems

    Authors: Wang Ling, Dani Yogatama, Chris Dyer, Phil Blunsom

    Abstract: Solving algebraic word problems requires executing a series of arithmetic operations---a program---to obtain a final answer. However, since programs can be arbitrarily complicated, inducing them directly from question-answer pairs is a formidable challenge. To make this task more feasible, we solve these problems by generating answer rationales, sequences of natural language and human-readable mat…

    Submitted 23 October, 2017; v1 submitted 11 May, 2017; originally announced May 2017.

  36. arXiv:1703.01898  [pdf, other]

    stat.ML cs.CL cs.LG

    Generative and Discriminative Text Classification with Recurrent Neural Networks

    Authors: Dani Yogatama, Chris Dyer, Wang Ling, Phil Blunsom

    Abstract: We empirically characterize the performance of discriminative and generative LSTM models for text classification. We find that although RNN-based generative models are more powerful than their bag-of-words ancestors (e.g., they account for conditional dependencies across words in a document), they have higher asymptotic error rates than discriminatively trained RNN models. However we also find tha…

    Submitted 25 May, 2017; v1 submitted 6 March, 2017; originally announced March 2017.

  37. arXiv:1611.09100  [pdf, other]

    cs.CL

    Learning to Compose Words into Sentences with Reinforcement Learning

    Authors: Dani Yogatama, Phil Blunsom, Chris Dyer, Edward Grefenstette, Wang Ling

    Abstract: We use reinforcement learning to learn tree-structured neural networks for computing representations of natural language sentences. In contrast with prior work on tree-structured models in which the trees are either provided as input or predicted using supervision from explicit treebank annotations, the tree structures in this work are optimized to improve performance on a downstream task. Experim…

    Submitted 28 November, 2016; originally announced November 2016.

  38. arXiv:1512.02595  [pdf, other]

    cs.CL

    Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

    Authors: Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, et al. (9 additional authors not shown)

    Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech--two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our app…

    Submitted 8 December, 2015; originally announced December 2015.

  39. arXiv:1506.02004  [pdf, other]

    cs.CL

    Sparse Overcomplete Word Vector Representations

    Authors: Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, Noah Smith

    Abstract: Current distributed representations of words show little resemblance to theories of lexical semantics. The former are dense and uninterpretable, the latter largely based on familiar, discrete classes (e.g., supersenses) and relations (e.g., synonymy and hypernymy). We propose methods that transform word vectors into sparse (and optionally binary) vectors. The resulting representations are more sim…

    Submitted 5 June, 2015; originally announced June 2015.

    Comments: Proceedings of ACL 2015

  40. arXiv:1503.00693  [pdf, other]

    cs.CL cs.LG stat.ML

    Bayesian Optimization of Text Representations

    Authors: Dani Yogatama, Noah A. Smith

    Abstract: When applying machine learning to problems in NLP, there are many choices to make about how to represent input texts. These choices can have a big effect on performance, but they are often uninteresting to researchers or practitioners who simply need a module that performs well. We propose an approach to optimizing over this space of choices, formulating the problem as global optimization. We appl…

    Submitted 2 March, 2015; originally announced March 2015.

  41. arXiv:1406.2035  [pdf, other]

    cs.CL cs.LG stat.ML

    Learning Word Representations with Hierarchical Sparse Coding

    Authors: Dani Yogatama, Manaal Faruqui, Chris Dyer, Noah A. Smith

    Abstract: We propose a new method for learning word representations using hierarchical regularization in sparse coding inspired by the linguistic study of word meanings. We show an efficient learning algorithm based on stochastic proximal methods that is significantly faster than previous approaches, making it possible to perform hierarchical sparse coding on a corpus of billions of word tokens. Experiments…

    Submitted 6 November, 2014; v1 submitted 8 June, 2014; originally announced June 2014.

  42. arXiv:1310.2627  [pdf, ps, other]

    stat.ML cs.AI cs.LG

    A Sparse and Adaptive Prior for Time-Dependent Model Parameters

    Authors: Dani Yogatama, Bryan R. Routledge, Noah A. Smith

    Abstract: We consider the scenario where the parameters of a probabilistic model are expected to vary over time. We construct a novel prior distribution that promotes sparsity and adapts the strength of correlation between parameters at successive timesteps, based on the data. We derive approximate variational inference procedures for learning and prediction with this prior. We test the approach on two task…

    Submitted 7 November, 2015; v1 submitted 9 October, 2013; originally announced October 2013.