Genes From Scratch The Evolutionary Fate of de Novo Genes
Review
Although considered an extremely unlikely event, many genes emerge from previously noncoding genomic regions. This review covers the entire life cycle of such de novo genes. Two competing hypotheses about the process of de novo gene birth are discussed as well as the high death rate of de novo genes. Despite the high death rate, some de novo genes are retained and remain functional, even in distantly related species, through their integration into gene networks. Further studies combining gene expression with ribosome profiling in multiple populations across different species will be instrumental for an improved understanding of the evolutionary processes operating on de novo genes.

Corresponding author: Schlötterer, C. ([email protected]).
Keywords: orphans; de novo genes; transcription; population genetics.
0168-9525/© 2015 Published by Elsevier Ltd. http://dx.doi.org/10.1016/j.tig.2015.02.007
Trends in Genetics xx (2015) 1–5

Are orphan genes a dated concept?
For many years, it had been considered extremely unlikely, if not impossible, that genes with no detectable homology could emerge (e.g., [1]). With the availability of the full genomic sequence of yeast, however, this picture changed. About one third of the entire set of genes in baker’s yeast has no sequence similarity to genes from other organisms [2]. Because nothing was known about their ancestors, these new genes were termed orphans (or ORFans in the microbial world [3]).

It has become common practice to identify orphan genes based on sequence similarity searches (e.g., BLAST) using a very relaxed significance cutoff: those genes with no hit in other species are classified as orphans [4]. The term orphan was not only appealing but also precise as long as only a few sequenced genomes were available. With an increasing number of sequenced genomes, the taxonomic sampling became denser and the definition of orphans lost its precision: orphans could now be detected in related species, leading to a violation of the definition. To account for this, it has been proposed that orphans be renamed as taxonomically restricted genes [5], but this concept requires an often arbitrary definition of the taxonomic depth to distinguish the relevant units.

Mechanisms giving rise to orphans
Given this imprecision, it may be more informative to focus on the biological processes generating orphan genes. When the definition of orphan genes is relaxed such that some sequence similarity of orphans with other genes is permitted, processes like exaptation of transposable elements, gene duplication, and horizontal gene transfer emerge as potential forces underlying the generation of orphan genes [6]. Genes originating from such processes with detectable sequence similarities are better characterized as young genes and should be clearly distinguished from orphan genes sensu stricto. Mechanisms resulting in true orphans can be placed into four categories, which I outline here. (i) Origin of new genes from previously noncoding DNA – these genes have also been called de novo genes indicating that the ancestral sequence was not functional. (ii) Gene duplication and rapid divergence: either gene duplications or insertions of reverse transcribed mRNA sequences into the genome result in duplications of already existing genes. It has been proposed that duplicated copies may undergo phases of rapid evolution in a combination of neutral and adaptive changes [4]. This rapid evolution erases the sequence similarity with the other copies, generating an orphan gene. Despite being conceptually appealing, this class of orphan genes is difficult to distinguish from de novo genes because it is very challenging to identify historically rapidly evolving sequences. Hence, I treat this class jointly with de novo genes. (iii) Horizontal gene transfer: integration of foreign DNA from bacteria or viruses into the host genome may result in the acquisition of hitherto absent genes. Given the vast number of viral sequences, it is very likely that the source of the acquired gene has not yet been sequenced. Although this mechanism is prevalent in prokaryotes, based on the current surveys of orphan genes in eukaryotes, very little support for horizontal gene transfer has been found [6]. (iv) Frameshift mutations (overprinting): N-terminal frameshifts could generate an entirely different protein with almost no change in the protein coding DNA sequence (CDS) [7]. In viruses, de novo genes are frequently generated without frameshifts in the ancestral gene [8]. Although up to 7% of the orphan genes may originate by this process [9], I suggest their evolutionary dynamics be treated separately because their emergence is frequently coupled with the loss of the progenitor gene.

Shifting the focus from orphans to de novo genes
Given the diversity of processes underlying orphan births and the uncertainty surrounding orphan definition, I propose that future studies describing the patterns of molecular evolution focus solely on de novo genes. The unambiguous definition of de novo genes will be of key importance for informative meta analyses providing a general picture of the evolutionary dynamics of these
genes. The importance of separating novel genes according to the underlying molecular mechanism is emphasized by their previously documented different evolutionary dynamics [10].

Are de novo genes real?
De novo genes arise from previously noncoding DNA, are short, and are expressed at low levels [10–12]. These features frequently raise doubts about the biological significance of de novo genes. In light of these concerns, several approaches have been used to distinguish true de novo genes from random noise.

studies use mRNA expression as an indicator for functional de novo genes. Given that a large fraction of the genome is transcribed [19], several researchers additionally validated the translation of these mRNAs into proteins by the presence of the corresponding peptides in databases. Such databases are biased towards larger proteins, however, and de novo genes are short. This bias has motivated the use of other methods, such as ribosome profiling to study the translation of putative de novo genes [14]. Overall, functional importance of de novo genes is well-supported by the combined evidence from mRNA and protein expression [14,15,20,21].
Figure 1. Two competing models of de novo gene birth. Open reading frames (ORFs) are shown as colored blocks. Active transcription is symbolized by an arrow and
the presence of translation by a peptide. Non-neutral phases are indicated by a broken box. (A) and (B) illustrate two versions of the expression first model. (A) The
protogene model assumes that several short peptides are expressed and during the course of evolution they are combined into a larger de novo gene. (B) The ORF contains
premature stop codons (yellow circles), which prevent the translation of the expressed mRNA; only after new mutations generate a full-length ORF is the functional de novo
gene obtained. (C) The ORF first model states that a fully functional ORF is present but not expressed because the necessary regulatory signals are missing. Once new
mutations generate functional transcription factor (TF) binding sites, the de novo gene is expressed and translated.
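The scenario in panel (B) can be illustrated with a minimal Python sketch, offered only as a toy example under stated assumptions: the sequences are invented, the function name `first_orf_length` is hypothetical, and only the forward strand and the first ATG are considered. It shows how a single point mutation that removes a premature stop codon can turn a short, interrupted reading frame into a full-length ORF.

```python
# Toy illustration of Figure 1B: a premature stop codon truncates the ORF,
# and a single point mutation restores the full-length reading frame.
# Sequences and names are invented for illustration only.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_orf_length(seq: str) -> int:
    """Count codons from the first ATG up to (not including) the first
    in-frame stop codon. Returns 0 if the sequence has no ATG."""
    start = seq.find("ATG")
    if start == -1:
        return 0
    codons = 0
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return codons
        codons += 1
    return codons  # reading frame runs to the end without a stop codon


# Hypothetical ancestral transcript: ATG GCT TAA GGC AAA TTT TGA
ancestral = "ATGGCTTAAGGCAAATTTTGA"

# Hypothetical derived allele: T -> C at position 6 turns TAA into CAA
derived = ancestral[:6] + "C" + ancestral[7:]

ancestral_len = first_orf_length(ancestral)
derived_len = first_orf_length(derived)
print(ancestral_len, derived_len)
```

In this toy case the ancestral sequence is read for only two codons before the premature stop, whereas the derived allele is read through to the terminal stop codon.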
of new mutations, proto genes can grow and result in functional de novo genes [14] (Figure 1A). Alternatively, it is also possible that the full-length transcript is initially interrupted by stop codons, but new mutations generate the full-length ORF of the de novo gene (Figure 1B). The appeal of this model is that it builds on the ubiquity of expressed genomic regions and also circumvents the implausibility problem of de novo genes, noted by [1]. This model is strongly supported by a range of studies that found transcription preceded the emergence of an ORF and translation [14,22,32].

lines of evidence support this interpretation. (i) More strains expressed the de novo genes than expected under neutrality. (ii) Consistent with selectively favored spread of the expressed de novo genes, the amount of polymorphism around them was lower in individuals carrying the expressed variant than in those with the non-expressed copy. Importantly, because only the expression of a functional gene could confer a fitness advantage, this pattern suggests that a new mutation resulting in the expression of a pre-existing ORF leads to these de novo genes becoming functional.
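The polymorphism comparison in point (ii) can be sketched with a small Python example. This is a hedged illustration, not the analysis used in the cited work: the haplotypes are invented, the function name `nucleotide_diversity` is hypothetical, and average pairwise differences per site is assumed here as one common way to summarize polymorphism.

```python
# Toy comparison of polymorphism around a de novo gene in haplotypes that
# carry the expressed allele versus the non-expressed allele.
# All sequences are invented; this is not the cited studies' method.
from itertools import combinations


def nucleotide_diversity(haplotypes):
    """Average number of pairwise differences per site among aligned,
    equal-length haplotype sequences."""
    if len(haplotypes) < 2:
        return 0.0
    length = len(haplotypes[0])
    pairs = list(combinations(haplotypes, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical flanking sequence from strains expressing the de novo gene
expressed = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT"]

# Hypothetical flanking sequence from strains with the non-expressed copy
non_expressed = ["ACGTACGTAC", "ACTTACGAAC", "GCGTACGTAT"]

pi_expressed = nucleotide_diversity(expressed)
pi_non_expressed = nucleotide_diversity(non_expressed)
print(pi_expressed, pi_non_expressed)
```

In this invented data set the haplotypes carrying the expressed allele are more similar to one another than those carrying the non-expressed copy, which is the direction of the pattern described above.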