Showing 1–9 of 9 results for author: Fernando, A

  1. arXiv:2406.06021 [pdf, other]

    cs.CL

    Shoulders of Giants: A Look at the Degree and Utility of Openness in NLP Research

    Authors: Surangika Ranathunga, Nisansa de Silva, Dilith Jayakody, Aloka Fernando

    Abstract: We analysed a sample of NLP research papers archived in ACL Anthology as an attempt to quantify the degree of openness and the benefit of such an open culture in the NLP community. We observe that papers published in different NLP venues show different patterns related to artefact reuse. We also note that more than 30% of the papers we analysed do not release their artefacts publicly, despite prom…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Will appear in ACL 2024

  2. arXiv:2404.07839 [pdf, other]

    cs.LG cs.AI cs.CL

    RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

    Authors: Aleksandar Botev, Soham De, Samuel L Smith, Anushan Fernando, George-Cristian Muraru, Ruba Haroun, Leonard Berrada, Razvan Pascanu, Pier Giuseppe Sessa, Robert Dadashi, Léonard Hussenot, Johan Ferret, Sertan Girgin, Olivier Bachem, Alek Andreev, Kathleen Kenealy, Thomas Mesnard, Cassidy Hardin, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, et al. (37 additional authors not shown)

    Abstract: We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent performance on language. It has a fixed-sized state, which reduces memory use and enables efficient inference on long sequences. We provide a pre-trained model with 2B non-embedding parameters, and an instruction tuned var…

    Submitted 11 April, 2024; originally announced April 2024.

  3. arXiv:2402.19427 [pdf, other]

    cs.LG cs.CL

    Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

    Authors: Soham De, Samuel L. Smith, Anushan Fernando, Aleksandar Botev, George Cristian-Muraru, Albert Gu, Ruba Haroun, Leonard Berrada, Yutian Chen, Srivatsan Srinivasan, Guillaume Desjardins, Arnaud Doucet, David Budden, Yee Whye Teh, Razvan Pascanu, Nando De Freitas, Caglar Gulcehre

    Abstract: Recurrent neural networks (RNNs) have fast inference and scale efficiently on long sequences, but they are difficult to train and hard to scale. We propose Hawk, an RNN with gated linear recurrences, and Griffin, a hybrid model that mixes gated linear recurrences with local attention. Hawk exceeds the reported performance of Mamba on downstream tasks, while Griffin matches the performance of Llama…

    Submitted 29 February, 2024; originally announced February 2024.

    Comments: 25 pages, 11 figures

  4. arXiv:2402.07446 [pdf, other]

    cs.CL

    Quality Does Matter: A Detailed Look at the Quality and Utility of Web-Mined Parallel Corpora

    Authors: Surangika Ranathunga, Nisansa de Silva, Menan Velayuthan, Aloka Fernando, Charitha Rathnayake

    Abstract: We conducted a detailed analysis on the quality of web-mined corpora for two low-resource languages (making three language pairs, English-Sinhala, English-Tamil and Sinhala-Tamil). We ranked each corpus according to a similarity measure and carried out an intrinsic and extrinsic evaluation on different portions of this ranked corpus. We show that there are significant quality differences between d…

    Submitted 14 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

  5. arXiv:2303.06349 [pdf, other]

    cs.LG

    Resurrecting Recurrent Neural Networks for Long Sequences

    Authors: Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

    Abstract: Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train. Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks, and have the added benefits of fast parallelizable training and RNN-like fast inference. However, while SSMs are superficially similar to RNNs, there are important diff…

    Submitted 11 March, 2023; originally announced March 2023.

    Comments: 30 pages, 9 figures

  6. arXiv:2302.13721 [pdf, other]

    cs.CV

    Wireless End-to-End Image Transmission System using Semantic Communications

    Authors: Maheshi Lokumarambage, Vishnu Gowrisetty, Hossein Rezaei, Thushan Sivalingam, Nandana Rajatheva, Anil Fernando

    Abstract: Semantic communication is considered the future of mobile communication, which aims to transmit data beyond Shannon's theorem of communications by transmitting the semantic meaning of the data rather than the bit-by-bit reconstruction of the data at the receiver's end. The semantic communication paradigm aims to bridge the gap of limited bandwidth problems in modern high-volume multimedia applicat…

    Submitted 10 April, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

    Comments: Accepted for IEEE Access

  7. arXiv:2205.08722 [pdf]

    cs.CL

    Data Augmentation to Address Out-of-Vocabulary Problem in Low-Resource Sinhala-English Neural Machine Translation

    Authors: Aloka Fernando, Surangika Ranathunga

    Abstract: Out-of-Vocabulary (OOV) is a problem for Neural Machine Translation (NMT). OOV refers to words with a low occurrence in the training data, or to those that are absent from the training data. To alleviate this, word or phrase-based Data Augmentation (DA) techniques have been used. However, existing DA techniques have addressed only one of these OOV types and limit to considering either syntactic co…

    Submitted 18 May, 2022; originally announced May 2022.

    Journal ref: Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation (2021) 61-70

  8. arXiv:2011.02821 [pdf]

    cs.CL

    Data Augmentation and Terminology Integration for Domain-Specific Sinhala-English-Tamil Statistical Machine Translation

    Authors: Aloka Fernando, Surangika Ranathunga, Gihan Dias

    Abstract: Out of vocabulary (OOV) is a problem in the context of Machine Translation (MT) in low-resourced languages. When source and/or target languages are morphologically rich, it becomes even worse. Bilingual list integration is an approach to address the OOV problem. This allows more words to be translated than are in the training data. However, since bilingual lists contain words in the base form, it…

    Submitted 3 February, 2021; v1 submitted 5 November, 2020; originally announced November 2020.

  9. arXiv:1910.07395 [pdf, other]

    cs.CV

    Offline handwritten mathematical symbol recognition utilising deep learning

    Authors: Azadeh Nazemi, Niloofar Tavakolian, Donal Fitzpatrick, Chandrika Fernando, Ching Y. Suen

    Abstract: This paper describes an approach for offline recognition of handwritten mathematical symbols. The process of symbol recognition in this paper includes symbol segmentation and accurate classification for over 300 classes. Many multidimensional mathematical symbols need both horizontal and vertical projection to be segmented. However, some symbols do not permit to be projected and stop segmentation,…

    Submitted 16 October, 2019; originally announced October 2019.

    ACM Class: I.4.6; I.5.4