Showing 1–1 of 1 results for author: Steinegger, M

  1. arXiv:2007.06225

    cs.LG cs.CL cs.DC stat.ML

    ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing

    Authors: Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rihawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, Burkhard Rost

    Abstract: Computational biology and bioinformatics provide vast data gold-mines from protein sequences, ideal for Language Models taken from NLP. These LMs reach for new prediction frontiers at low inference costs. Here, we trained two auto-regressive models (Transformer-XL, XLNet) and four auto-encoder models (BERT, Albert, Electra, T5) on data from UniRef and BFD containing up to 393 billion amino acids.…

    Submitted 4 May, 2021; v1 submitted 13 July, 2020; originally announced July 2020.

    Comments: 17 pages, 9 figures, 4 tables