
Showing 1–24 of 24 results for author: Heafield, K

Searching in archive cs.
  1. arXiv:2407.21783  [pdf, other]

    cs.AI cs.CL cs.CV

    The Llama 3 Herd of Models

    Authors: Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava Spataru, Baptiste Roziere, Bethany Biron, Binh Tang, et al. (510 additional authors not shown)

    Abstract: Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical…

    Submitted 15 August, 2024; v1 submitted 31 July, 2024; originally announced July 2024.

  2. arXiv:2402.01505  [pdf, other]

    cs.CL

    Code-Switched Language Identification is Harder Than You Think

    Authors: Laurie Burchell, Alexandra Birch, Robert P. Thompson, Kenneth Heafield

    Abstract: Code switching (CS) is a very common phenomenon in written and spoken communication but one that is handled poorly by many natural language processing applications. Looking to the application of building CS corpora, we explore CS language identification (LID) for corpus building. We make the task more realistic by scaling it to more languages and considering models with simpler architectures for f…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: EACL 2024

  3. arXiv:2309.08958  [pdf, other]

    cs.CL cs.AI

    Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca

    Authors: Pinzhen Chen, Shaoxiong Ji, Nikolay Bogoychev, Andrey Kutuzov, Barry Haddow, Kenneth Heafield

    Abstract: Foundational large language models (LLMs) can be instruction-tuned to perform open-domain question answering, facilitating applications like chat assistants. While such efforts are often carried out in a single language, we empirically analyze cost-efficient strategies for multilingual scenarios. Our study employs the Alpaca dataset and machine translations of it to form multilingual data, which i…

    Submitted 30 January, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

    Comments: Accepted to Findings of ACL: EACL 2024. Added human evaluation and shortened writing

  4. arXiv:2306.03856  [pdf, other]

    cs.CL cs.AI

    Iterative Translation Refinement with Large Language Models

    Authors: Pinzhen Chen, Zhicheng Guo, Barry Haddow, Kenneth Heafield

    Abstract: We propose iteratively prompting a large language model to self-correct a translation, with inspiration from their strong language understanding and translation capability as well as a human-like translation approach. Interestingly, multi-turn querying reduces the output's string-based metric scores, but neural metrics suggest comparable or improved quality. Human evaluations indicate better fluen…

    Submitted 1 May, 2024; v1 submitted 6 June, 2023; originally announced June 2023.

  5. An Open Dataset and Model for Language Identification

    Authors: Laurie Burchell, Alexandra Birch, Nikolay Bogoychev, Kenneth Heafield

    Abstract: Language identification (LID) is a fundamental step in many natural language processing pipelines. However, current LID systems are far from perfect, particularly on lower-resource languages. We present a LID model which achieves a macro-average F1 score of 0.93 and a false positive rate of 0.033 across 201 languages, outperforming previous work. We achieve this by training on a curated dataset of…

    Submitted 23 May, 2023; originally announced May 2023.

    Comments: To be published in ACL 2023

  6. arXiv:2209.00099  [pdf, other]

    cs.CL

    Efficient Methods for Natural Language Processing: A Survey

    Authors: Marcos Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro H. Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz

    Abstract: Recent work in natural language processing (NLP) has yielded appealing results from scaling model parameters and training data; however, using only scale to improve performance means that resource consumption also grows. Such resources include data, time, storage, or energy, all of which are naturally limited and unevenly distributed. This motivates research into efficient methods that require few…

    Submitted 24 March, 2023; v1 submitted 31 August, 2022; originally announced September 2022.

    Comments: Accepted at TACL, pre-publication version

  7. arXiv:2207.04672  [pdf]

    cs.CL cs.AI

    No Language Left Behind: Scaling Human-Centered Machine Translation

    Authors: NLLB Team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, et al. (14 additional authors not shown)

    Abstract: Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today. However, such efforts have coalesced around a small subset of languages, leaving behind the vast majority of mostly low-resource languages. What does it take to break the 200 language barrier while ensuring safe, high quality res…

    Submitted 25 August, 2022; v1 submitted 11 July, 2022; originally announced July 2022.

    Comments: 190 pages

    MSC Class: 68T50; ACM Class: I.2.7

  8. Exploring Diversity in Back Translation for Low-Resource Machine Translation

    Authors: Laurie Burchell, Alexandra Birch, Kenneth Heafield

    Abstract: Back translation is one of the most widely used methods for improving the performance of neural machine translation systems. Recent research has sought to enhance the effectiveness of this method by increasing the 'diversity' of the generated translations. We argue that the definitions and metrics used to quantify 'diversity' in previous work have been insufficient. This work puts forward a more n…

    Submitted 1 June, 2022; originally announced June 2022.

  9. arXiv:2109.10194  [pdf, other]

    cs.CL

    TranslateLocally: Blazing-fast translation running on the local CPU

    Authors: Nikolay Bogoychev, Jelmer Van der Linde, Kenneth Heafield

    Abstract: Every day, millions of people sacrifice their privacy and browsing habits in exchange for online machine translation. Companies and governments with confidentiality requirements often ban online translation or pay a premium to disable logging. To bring control back to the end user and demonstrate speed, we developed translateLocally. Running locally on a desktop or laptop CPU, translateLocally del…

    Submitted 21 September, 2021; originally announced September 2021.

    Comments: Accepted at EMNLP 2021 demo track; https://translatelocally.com

  10. arXiv:2106.00169  [pdf, other]

    cs.CL

    Gender Bias Amplification During Speed-Quality Optimization in Neural Machine Translation

    Authors: Adithya Renduchintala, Denise Diaz, Kenneth Heafield, Xian Li, Mona Diab

    Abstract: Is bias amplified when neural machine translation (NMT) models are optimized for speed and evaluated on generic test sets using BLEU? We investigate architectures and techniques commonly used to speed up decoding in Transformer-based models, such as greedy search, quantization, average attention networks (AANs) and shallow decoder models and show their effect on gendered noun translation. We const…

    Submitted 31 May, 2021; originally announced June 2021.

    Comments: Accepted at ACL 2021

  11. arXiv:2012.15455  [pdf, other]

    cs.CL

    Fully Synthetic Data Improves Neural Machine Translation with Knowledge Distillation

    Authors: Alham Fikri Aji, Kenneth Heafield

    Abstract: This paper explores augmenting monolingual data for knowledge distillation in neural machine translation. Source language monolingual text can be incorporated as a forward translation. Interestingly, we find the best way to incorporate target language monolingual text is to translate it to the source language and round-trip translate it back to the target language, resulting in a fully synthetic c…

    Submitted 15 September, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

  12. arXiv:2008.05348  [pdf, other]

    cs.CL

    Approaching Neural Chinese Word Segmentation as a Low-Resource Machine Translation Task

    Authors: Pinzhen Chen, Kenneth Heafield

    Abstract: Chinese word segmentation has entered the deep learning era which greatly reduces the hassle of feature engineering. Recently, some researchers attempted to treat it as character-level translation, which further simplified model designing, but there is a performance gap between the translation-based approach and other methods. This motivates our work, in which we apply the best practices from low-…

    Submitted 11 October, 2022; v1 submitted 12 August, 2020; originally announced August 2020.

    Comments: PACLIC 2022

  13. arXiv:2008.04885  [pdf, ps, other]

    cs.CL

    The Sockeye 2 Neural Machine Translation Toolkit at AMTA 2020

    Authors: Tobias Domhan, Michael Denkowski, David Vilar, Xing Niu, Felix Hieber, Kenneth Heafield

    Abstract: We present Sockeye 2, a modernized and streamlined version of the Sockeye neural machine translation (NMT) toolkit. New features include a simplified code base through the use of MXNet's Gluon API, a focus on state of the art model architectures, distributed mixed precision training, and efficient CPU decoding with 8-bit quantization. These improvements result in faster training and inference, hig…

    Submitted 11 August, 2020; originally announced August 2020.

  14. arXiv:1909.06091  [pdf, ps, other]

    cs.CL cs.LG

    Neural Machine Translation with 4-Bit Precision and Beyond

    Authors: Alham Fikri Aji, Kenneth Heafield

    Abstract: Neural Machine Translation (NMT) is resource intensive. We design a quantization procedure to compress NMT models better for devices with limited hardware capability. Because most neural network parameters are near zero, we employ logarithmic quantization in lieu of fixed-point quantization. However, we find bias terms are less amenable to log quantization but note they comprise a tiny fraction of…

    Submitted 20 September, 2019; v1 submitted 13 September, 2019; originally announced September 2019.

  15. Making Asynchronous Stochastic Gradient Descent Work for Transformers

    Authors: Alham Fikri Aji, Kenneth Heafield

    Abstract: Asynchronous stochastic gradient descent (SGD) is attractive from a speed perspective because workers do not wait for synchronization. However, the Transformer model converges poorly with asynchronous SGD, resulting in substantially lower quality compared to synchronous SGD. To investigate why this is the case, we isolate differences between asynchronous and synchronous methods to investigate batc…

    Submitted 8 June, 2019; originally announced June 2019.

    Journal ref: WNGT 2019, 80-89

  16. arXiv:1808.10267  [pdf, ps, other]

    cs.CL

    Multi-Source Syntactic Neural Machine Translation

    Authors: Anna Currey, Kenneth Heafield

    Abstract: We introduce a novel multi-source technique for incorporating source syntax into neural machine translation using linearized parses. This is achieved by employing separate encoders for the sequential and parsed versions of the same source sentence; the resulting representations are then combined using a hierarchical attention mechanism. The proposed model improves over both seq2seq and parsed base…

    Submitted 30 August, 2018; originally announced August 2018.

    Comments: EMNLP 2018

  17. arXiv:1808.08859  [pdf, ps, other]

    cs.CL

    Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation

    Authors: Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji

    Abstract: In order to extract the best possible performance from asynchronous stochastic gradient descent one must increase the mini-batch size and scale the learning rate accordingly. In order to achieve further speedup we introduce a technique that delays gradient updates effectively increasing the mini-batch size. Unfortunately with the increase of mini-batch size we worsen the stale gradient problem in…

    Submitted 14 September, 2018; v1 submitted 27 August, 2018; originally announced August 2018.

    Comments: To appear in EMNLP 2018 as a short paper

  18. arXiv:1805.12096  [pdf, other]

    cs.CL

    Marian: Cost-effective High-Quality Neural Machine Translation in C++

    Authors: Marcin Junczys-Dowmunt, Kenneth Heafield, Hieu Hoang, Roman Grundkiewicz, Anthony Aue

    Abstract: This paper describes the submissions of the "Marian" team to the WNMT 2018 shared task. We investigate combinations of teacher-student training, low-precision matrix products, auto-tuning and other methods to optimize the Transformer model on GPU and CPU. By further integrating these methods with the new averaging attention networks, a recently introduced faster Transformer variant, we create a nu…

    Submitted 30 May, 2018; originally announced May 2018.

    Comments: System submission to the Workshop for Neural Machine Translation 2018, efficiency task

  19. arXiv:1805.09863  [pdf, other]

    cs.CL

    Fast Neural Machine Translation Implementation

    Authors: Hieu Hoang, Tomasz Dwojak, Rihards Krislauks, Daniel Torregrosa, Kenneth Heafield

    Abstract: This paper describes the submissions to the efficiency track for GPUs at the Workshop for Neural Machine Translation and Generation by members of the University of Edinburgh, Adam Mickiewicz University, Tilde and University of Alicante. We focus on efficient implementation of the recurrent deep-learning model as implemented in Amun, the fast inference engine for neural machine translation. We impr…

    Submitted 7 June, 2018; v1 submitted 24 May, 2018; originally announced May 2018.

  20. arXiv:1805.02094  [pdf, other]

    cs.CL

    Exploring Hyper-Parameter Optimization for Neural Machine Translation on GPU Architectures

    Authors: Robert Lim, Kenneth Heafield, Hieu Hoang, Mark Briers, Allen Malony

    Abstract: Neural machine translation (NMT) has been accelerated by deep learning neural networks over statistical-based approaches, due to the plethora and programmability of commodity heterogeneous computing architectures such as FPGAs and GPUs and the massive amount of training corpuses generated from news outlets, government agencies and social media. Training a learning classifier for neural networks en…

    Submitted 13 September, 2021; v1 submitted 5 May, 2018; originally announced May 2018.

    Comments: 2018 2nd Naval Applications for Machine Learning

  21. arXiv:1804.05940  [pdf, other]

    cs.CL

    Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task

    Authors: Marcin Junczys-Dowmunt, Roman Grundkiewicz, Shubha Guha, Kenneth Heafield

    Abstract: Previously, neural methods in grammatical error correction (GEC) did not reach state-of-the-art results compared to phrase-based statistical machine translation (SMT) baselines. We demonstrate parallels between neural GEC and low-resource neural MT and successfully adapt several methods from low-resource MT to neural GEC. We further establish guidelines for trustable results in neural GEC and prop…

    Submitted 16 April, 2018; originally announced April 2018.

    Comments: Accepted for oral presentation in long paper research track at NAACL 2018

  22. arXiv:1804.00344  [pdf, other]

    cs.CL

    Marian: Fast Neural Machine Translation in C++

    Authors: Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, Alexandra Birch

    Abstract: We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs. Marian is written entirely in C++. We describe the design of the encoder-decoder framework and demonstrate that a research-friendly toolkit can achieve high training and translation speed.

    Submitted 4 April, 2018; v1 submitted 1 April, 2018; originally announced April 2018.

    Comments: Demonstration paper

  23. arXiv:1708.00726  [pdf, other]

    cs.CL

    The University of Edinburgh's Neural MT Systems for WMT17

    Authors: Rico Sennrich, Alexandra Birch, Anna Currey, Ulrich Germann, Barry Haddow, Kenneth Heafield, Antonio Valerio Miceli Barone, Philip Williams

    Abstract: This paper describes the University of Edinburgh's submissions to the WMT17 shared news translation and biomedical translation tasks. We participated in 12 translation directions for news, translating between English and Czech, German, Latvian, Russian, Turkish and Chinese. For the biomedical task we submitted systems for English to Czech, German, Polish and Romanian. Our systems are neural machin…

    Submitted 2 August, 2017; originally announced August 2017.

    Comments: WMT 2017 shared task track; for Bibtex, see http://homepages.inf.ed.ac.uk/rsennric/bib.html#uedin-nmt:2017

  24. arXiv:1704.05021  [pdf, other]

    cs.CL cs.DC cs.LG

    Sparse Communication for Distributed Gradient Descent

    Authors: Alham Fikri Aji, Kenneth Heafield

    Abstract: We make distributed stochastic gradient descent faster by exchanging sparse updates instead of dense updates. Gradient updates are positively skewed as most updates are near zero, so we map the 99% smallest updates (by absolute value) to zero then exchange sparse matrices. This method can be combined with quantization to further improve the compression. We explore different configurations and appl…

    Submitted 24 July, 2017; v1 submitted 17 April, 2017; originally announced April 2017.

    Comments: EMNLP 2017

    Journal ref: EMNLP 2017, 440-445
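The abstract of entry 24 (Sparse Communication for Distributed Gradient Descent) states the core step concretely: map the 99% smallest gradient updates (by absolute value) to zero, then exchange only the sparse remainder. Below is a minimal Python sketch of that thresholding step, assuming the gradient is a flat list of floats. The helper names `sparsify` and `densify` are hypothetical, and the sketch leaves out everything the truncated abstract does not state (for example, how dropped values are treated on later steps, or how the method combines with quantization). It is an illustration, not the authors' implementation.

```python
def sparsify(gradient, drop_ratio=0.99):
    """Zero out the smallest `drop_ratio` fraction of entries by absolute value.

    Returns (indices, values) for the surviving entries: a sparse
    representation that a worker could send instead of the dense vector.
    """
    n = len(gradient)
    # Number of entries to keep; round() absorbs float error in 1 - drop_ratio,
    # and at least one entry is always kept.
    keep = max(1, round(n * (1 - drop_ratio)))
    # Rank indices by magnitude, largest first (sorted() is stable for ties).
    order = sorted(range(n), key=lambda i: abs(gradient[i]), reverse=True)
    kept = sorted(order[:keep])
    return kept, [gradient[i] for i in kept]


def densify(indices, values, size):
    """Rebuild a dense vector from a sparse (indices, values) pair."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


if __name__ == "__main__":
    idx, vals = sparsify([3.0, -1.0, 0.5, -4.0], drop_ratio=0.5)
    print(idx, vals)  # keeps the two largest magnitudes: indices 0 and 3
```

Sorting the whole vector costs O(n log n). A real system would more likely choose the cut-off with a selection algorithm or by sampling, but a full sort keeps the sketch easy to check.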