Vol 440|9 March 2006|doi:10.


Expression profiling in primates reveals a rapid evolution of human transcription factors
Yoav Gilad1†, Alicia Oshlack2, Gordon K. Smyth2, Terence P. Speed2,3 & Kevin P. White1

Although it has been hypothesized for thirty years that many human adaptations are likely to be due to changes in gene regulation1, almost nothing is known about the modes of natural selection acting on regulation in primates. Here we identify a set of genes for which expression is evolving under natural selection. We use a new multi-species complementary DNA array to compare steady-state messenger RNA levels in liver tissues within and between humans, chimpanzees, orangutans and rhesus macaques. Using estimates from a linear mixed model, we identify a set of genes for which expression levels have remained constant across the entire phylogeny (~70 million years), and are therefore likely to be under stabilizing selection. Among the top candidates are five genes with expression levels that have previously been shown to be altered in liver carcinoma. We also find a number of genes with similar expression levels among non-human primates but significantly elevated or reduced expression in the human lineage, features that point to the action of directional selection. Among the gene set with a human-specific increase in expression, there is an excess of transcription factors; the same is not true for genes with increased expression in chimpanzee.

A number of recent studies have used DNA microarrays to compare patterns of gene expression between closely related species2–9. Within primates, the focus has been primarily on human–chimpanzee comparisons, estimating gene expression profiles for a number of tissues, including liver, brain and heart2,6,7,10. The aim has been to characterize general trends in the evolution of gene expression rather than to identify specific genes of interest. To date, conclusions about the selection pressures acting on gene expression have been conflicting2,3,6,11–13.

These studies have all relied on data collected from arrays using gene probes that were designed on the basis of human sequences only. However, sequence mismatches affect hybridization intensity and can therefore bias estimates of gene expression differences between species14. This limitation of single-species arrays is especially problematic when the goal is to study how expression changes over evolutionary time. To make comparisons between more distantly related primate species, we generated a multi-species cDNA array that allows comparison of gene expression between species without the confounding effects of sequence divergence14. This cDNA array contains probes for 1,056 orthologous genes from four species (see Supplementary Methods)14.

We used this array to compare gene expression profiles in the livers of humans, chimpanzees (Pan troglodytes), orangutans (Pongo pygmaeus) and rhesus macaques (Macaca mulatta), the phylogeny of which represents approximately 70 million years (Myr) of evolution. By assigning expression changes in the liver to particular lineages, we were able to identify the first set of genes for which regulation seems to be under lineage-specific selection pressures. In order to measure gene expression levels within and between species, we extracted RNA from liver samples of five adult males from each of the four species. A common reference design was used, with a sixth human liver sample serving as the reference. We performed four technical replicates of each comparison, for a total of 80 hybridizations. Results from all species were obtained for 907 genes, used in subsequent analyses (Supplementary Table S1).

After image analysis, background correction and normalization, the log expression values were analysed using a linear mixed model with fixed effects for species and sequence mismatches, and a random effect for individuals within species (see Methods). For each gene, we used residual maximum likelihood15 to estimate the fixed effects and variances. Hypothesis testing was performed using likelihood ratio tests (see Methods).

As a first step, we identified genes that are differentially expressed between species (Table 1). A phylogenetic tree based on the number of differentially expressed genes between species16 recapitulates their known phylogeny (Supplementary Fig. S1). However, the number of significantly differentially expressed genes does not always increase with evolutionary time.

Focusing on human and chimpanzee, we found 110 genes (12%) to be differentially expressed at a false discovery rate (FDR)17 of 0.01, with a mean absolute log ratio of 1.56-fold difference (Supplementary Table S2). Our observation is in general agreement with a statistical meta-analysis11 of the data from ref. 2. In contrast to this meta-analysis, however, we find that equal numbers of genes have elevated (55) or reduced (55) expression levels in humans compared to chimpanzees.

To estimate lineage-specific changes in expression levels, we used the expression profiles from orangutans and rhesus macaques as outgroups for 84 of the genes that show significantly different expression between human and chimpanzee (Fig. 1a; see Methods). Using this approach, we found similar numbers of genes for which expression has been altered in either the human or the chimpanzee lineage. Moreover, in both species, the numbers of genes that show increased or decreased expression levels relative to the estimated ancestral expression level are similar (45 and 43 of the genes are upregulated in humans and chimpanzees, respectively). In addition, the average or median fold change in gene expression level is similar regardless of the lineage or the trend (that is, up or down)

Table 1 | Inter-species differentially expressed genes

             Chimpanzee   Orangutan   Rhesus macaque
Human        110          128         176
Chimpanzee   –            150         141
Orangutan    –            –           129
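The human–chimpanzee comparison calls genes differentially expressed at an FDR of 0.01. The minimal Python sketch below shows how per-gene P values (for example, from the likelihood ratio tests) could be thresholded at that FDR. It assumes the standard Benjamini–Hochberg step-up procedure, which is an assumption: the exact procedure is the one given by ref. 17 and the Methods, not by this sketch. The name `benjamini_hochberg` is our own, and the P values in the usage lines are invented.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.01) -> List[bool]:
    """Flag which hypotheses are rejected by the Benjamini-Hochberg step-up rule.

    Assumption: the standard BH procedure, used only to illustrate calling
    differentially expressed genes at a chosen FDR (alpha).
    """
    m = len(pvalues)
    # Indices of the P values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k with p_(k) <= k * alpha / m (0 if there is none).
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Step-up: every hypothesis ranked at or below the cutoff is rejected,
    # even if its own P value exceeded its individual threshold.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= cutoff_rank
    return reject


# Hypothetical per-gene P values, not data from this study.
calls = benjamini_hochberg([0.0001, 0.003, 0.03, 0.2, 0.5], alpha=0.01)
print(calls)  # [True, True, False, False, False]
```

The step-up rule can call a gene whose own P value misses its rank threshold. With P values 0.006 and 0.009 at alpha = 0.01, the smaller one fails its threshold of 0.005. Both are still called because 0.009 passes the second-rank threshold of 0.01.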
1Department of Genetics and Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06510, USA. 2Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia. 3Department of Statistics and Program in Biostatistics, University of California, Berkeley, California 94720, USA. †Present address: Department of Human Genetics, University of Chicago, Chicago, Illinois 60605, USA.

NATURE|Vol 440|9 March 2006 LETTERS

Figure 1 | Expression changes in specific lineages. a, For 84 genes that are differentially expressed between human and chimpanzee, the log2-fold change relative to the common ancestor is given for the human (blue) and chimpanzee (orange) lineages. Genes are ordered by the ratio of their expression changes in the human lineage compared to the chimpanzee lineage. b, For 446 genes that are not differentially expressed between human and chimpanzee, and for which an ancestral state could be estimated (see Methods), the log2-fold change relative to the common ancestor is shown for the human lineage.
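Figure 1 reports log2-fold changes relative to the human–chimpanzee common ancestor, with orangutan and rhesus macaque as outgroups. The actual estimator is given in the Methods, which are not part of this section. The sketch below therefore substitutes a deliberately simple rule that is our assumption, not the authors' method:

- The ancestral level is the mean of the two outgroup levels.
- That ancestor is used only when the outgroups lie within a hypothetical tolerance, `max_outgroup_gap`.

`GeneLevels`, `lineage_changes`, `human_share` and `order_genes` are illustrative names. We also read "ratio" as a ratio of absolute changes, which is an interpretation.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class GeneLevels:
    """Mean log2 expression per species for one gene (hypothetical input format)."""
    name: str
    human: float
    chimp: float
    orangutan: float
    macaque: float


def lineage_changes(gene: GeneLevels,
                    max_outgroup_gap: float = 0.5) -> Optional[Tuple[float, float]]:
    """Return (human_change, chimp_change) in log2 units relative to an estimated
    human-chimpanzee ancestor, or None when no ancestor is estimated.

    Simplifying assumption: the ancestor is the outgroup mean, used only when
    orangutan and macaque differ by at most max_outgroup_gap.
    """
    if abs(gene.orangutan - gene.macaque) > max_outgroup_gap:
        return None
    ancestor = (gene.orangutan + gene.macaque) / 2.0
    return gene.human - ancestor, gene.chimp - ancestor


def human_share(human_change: float, chimp_change: float) -> float:
    """Fraction |h| / (|h| + |c|) of the total change on the human branch.

    Orders genes exactly as the ratio |h| / |c| does when both are non-zero,
    but stays defined when the chimpanzee change is zero.
    """
    total = abs(human_change) + abs(chimp_change)
    return 0.5 if total == 0 else abs(human_change) / total


def order_genes(genes: Iterable[GeneLevels]) -> List[Tuple[str, float, float]]:
    """(name, human_change, chimp_change) for genes with an estimated ancestor,
    ordered from the most human-biased change to the most chimpanzee-biased."""
    rows = []
    for gene in genes:
        changes = lineage_changes(gene)
        if changes is not None:
            rows.append((gene.name, changes[0], changes[1]))
    rows.sort(key=lambda row: human_share(row[1], row[2]), reverse=True)
    return rows


# Hypothetical levels, not data from this study.
genes = [
    GeneLevels("gene_a", human=0.1, chimp=1.0, orangutan=0.0, macaque=0.0),
    GeneLevels("gene_b", human=1.0, chimp=0.1, orangutan=0.0, macaque=0.0),
    GeneLevels("gene_c", human=0.0, chimp=0.0, orangutan=0.0, macaque=3.0),
]
print([row[0] for row in order_genes(genes)])  # ['gene_b', 'gene_a']
```

Here gene_c is dropped because its outgroups disagree by more than the tolerance. This mirrors, in spirit only, the caption's restriction to genes "for which an ancestral state could be estimated".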

(Supplementary Fig. S2). The pattern also holds for expression changes in the human lineage in genes that are not differentially expressed between human and chimpanzee (Fig. 1b; 52% of the genes were upregulated). These observations do not agree with previous studies2,10. Possible explanations for the discrepancy are the use of human microarrays for inter-primate comparisons2, or the assignment of expression changes to lineages in the absence of outgroup data10.

Our approach also allows the identification of genes for which regulation is likely to have evolved under stabilizing selection. Previous studies have done this by testing for deviations from neutrality or stabilizing selection13,16,18. Such an approach requires a model for the evolution of expression, and thus relies on a number of parameter estimates about which there is considerable uncertainty in primates (for example, the neutral expression change per generation, and the environmental and mutational variance for each gene). Instead of specifying an explicit model, we used statistical analyses to rank genes according to their pattern of evolutionary change among the four species, and focused on those at the top of the list as the most promising candidates.

First, we identified genes that best fitted a model of constant expression level throughout the phylogeny, reasoning that these represent promising candidates for stabilizing selection. A majority of the genes on the array (60%) do not show significant inter-species expression differences. However, failure to reject the null hypothesis of no expression difference between species can result from constant expression level in all individuals in all species (Fig. 2a) or from large within-species variance (Fig. 2b), especially as primate tissues cannot be staged10. As our aim is to identify genes under stabilizing selection, we are only interested in the former scenario. We therefore ranked genes by their expression variation among individuals across all species (see Methods). Genes at the top of our list are not significantly differentially expressed between species, and also have low within-species variance (Fig. 2). The expression levels of these genes seem to have remained constant for ~70 Myr19, suggesting that their regulation is under evolutionary constraint. Among the first 100 genes on our list (Supplementary Table S3), the most significant enrichment (P < 10^-8; uncorrected for multiple tests) is for genes from the category ‘regulation of cellular physiological process’ (Gene Ontology ID 0051244). As we expect transcription of such genes to be similar across individuals and species, this finding serves as a validation of the approach.

A number of recent papers have argued that the majority of expression differences observed between primates are neutral, based primarily on the observation that the mean square fold change in expression levels in liver and brain increases linearly with species divergence time6,12. Having found no clear increase in the number of significantly differentially expressed genes with time (Table 1), we re-examined the mean square fold change for our data. This revealed no linear increase over time (Supplementary Fig. S3). Moreover, our observation that many genes show stable expression levels over 70 Myr suggests that, rather than evolving mostly neutrally, expression levels are often under stabilizing selection, consistent with findings in Drosophila16,18 and in C. elegans20.

This finding has implications for studies of human disease. Indeed, our observations suggest that many changes in gene regulation may be deleterious and hence influence disease susceptibility. Consistent with this, among the top 100 genes for which regulation is probably evolving under stabilizing selection, genes associated with human cancer are slightly enriched (9% compared to 5% in the total gene sample; P = 0.10, one-tailed Fisher’s exact test). Moreover, the expression levels of five genes (MBD4, WWOX, ING1, ATP7B and IGFBP2; ranked 5th, 12th, 28th, 58th and 66th, respectively) have been shown to be altered specifically in liver carcinoma21–24. These

Figure 2 | Genes that are not differentially expressed across species. In each plot, different genes (x-axis) are represented by different colours. For each gene, the estimated expression level (±s.e.m.) is shown for humans, chimpanzees, orangutans and rhesus macaques (left to right). a, The five highest-ranked genes (see Methods). These genes have constant expression levels in all species, suggesting that their expression levels are under stabilizing selection. b, Examples of genes that are not differentially expressed across species, probably due to high within-species variance (gene rankings 489–493).

Figure 3 | Genes with distinct expression patterns in humans. Different genes (x-axis) are represented by distinct colours. For each gene, the log2 expression level for humans is set to zero. Estimated gene expression level relative to human (±s.e.m.) is shown for humans, chimpanzees, orangutans and rhesus macaques (left to right). Shown are examples of five genes that are not differentially expressed in the non-human primates but are upregulated (a) or downregulated (b) in humans. The expression levels of these genes seem to have been under stabilizing selection in the non-human primates and under directional selection in the human lineage.
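Several enrichment claims in this Letter rest on Fisher's exact tests:

- cancer-associated genes among the top 100 stabilizing-selection candidates;
- transcription factors among genes with higher expression in humans;
- positively selected genes in set A.

The one-tailed version is stated for two of these. For a one-tailed test of over-representation, the P value is the upper tail of a hypergeometric distribution. The sketch below computes that tail exactly with integer arithmetic from Python's standard library. `enrichment_p_value` is a hypothetical helper, not the authors' code. The counts in the usage lines are invented, because the annotated array totals are not given in this section, so the printed value should not be expected to reproduce any P value reported here.

```python
from math import comb


def enrichment_p_value(total: int, total_in_category: int,
                       selected: int, selected_in_category: int) -> float:
    """One-tailed Fisher's exact test for over-representation.

    Returns P(X >= selected_in_category), where X is hypergeometric: the number
    of category members among `selected` genes drawn without replacement from
    `total` genes, `total_in_category` of which belong to the category.
    """
    if not (0 <= total_in_category <= total and 0 <= selected <= total):
        raise ValueError("category and selection sizes must fit within total")
    if not 0 <= selected_in_category <= selected:
        raise ValueError("selected_in_category must lie between 0 and selected")
    outside = total - total_in_category
    # Sum hypergeometric numerators over the upper tail k = observed..selected.
    # math.comb returns 0 when k exceeds n, so impossible k values add nothing.
    tail = sum(
        comb(total_in_category, k) * comb(outside, selected - k)
        for k in range(selected_in_category, selected + 1)
    )
    return tail / comb(total, selected)


# Hypothetical totals: 800 annotated genes, 80 (10%) of them transcription
# factors; 5 of 12 selected genes are transcription factors.
p = enrichment_p_value(total=800, total_in_category=80,
                       selected=12, selected_in_category=5)
print(f"one-tailed P = {p:.4f}")
```

Using exact binomial coefficients rather than floating-point probabilities avoids underflow in the far tail. This matters when a category is rare and the observed count is large.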

findings suggest that focusing on genes with conserved expression levels among primates may be helpful in identifying promising candidates for disease-association studies, much like phylogenetic shadowing of DNA sequences25 can aid in the identification of non-coding elements of functional importance.

Using this general approach, we also identified genes for which expression levels are not significantly different among non-human primates but are significantly elevated or reduced in humans relative to each of the three other species (see Methods and Supplementary Table S4). In other words, the expression level of the gene has remained similar over ~65 Myr of evolution and then changed over the ~5 Myr of the human lineage, indicative of directional selection in humans. Our analysis revealed 14 genes with significantly higher expression levels in humans and five with lower expression (Fig. 3). We note that we are likely to be missing a number of targets of positive selection: gene expression varies across tissues and developmental stages26, and as a result, the absence of support for selection in primate expression data is weak evidence against it.

Notably, among the genes with higher expression in humans, we find a significant excess of transcription factors (5/12, 42% compared with 10% representation on the array; P = 0.003 by Fisher’s exact test, including all genes for which GO annotation was available), whereas no transcription factors were found among genes with unusually low expression in humans. We repeated this analysis using a less stringent criterion to identify genes for which the mean expression level in humans differed significantly from that of non-human primates (see Methods). Again, transcription factors were overrepresented among the 30 genes with elevated expression in humans (30%; P = 0.001, Fisher’s exact test), and no transcription factors were found among 19 genes with reduced expression. In contrast, when these analyses were applied to chimpanzee (Supplementary Table S5), the proportion of transcription factors was the same among genes with elevated (9%) or reduced (9%) expression levels (for the less stringent cutoff), and neither proportion was significantly different from the overall representation on the array (that is, 10%). It is unlikely that these observations can be explained by differential degradation of transcripts encoding specific classes of proteins27, as no difference in RNA quality was observed between human and non-human primate samples during sample preparation (on the basis of electrophoretic analyses).

In addition to the rapid evolution of expression levels, genes encoding transcription factors have also been shown to evolve rapidly in the human lineage at the coding sequence level28. Together, these findings raise the possibility that the function and regulation of transcription factors have been substantially modified in the human lineage, potentially affecting many downstream targets over a short evolutionary time frame. Notably, the opposite finding emerged from studies of closely related Drosophila species, in which the expression levels of transcription factors were shown to evolve more slowly than those of genes encoding other types of proteins16,18. Given the large number of phenotypic changes in the human lineage1, it is tempting to speculate that relative rates of transcription factor evolution may serve as an indicator of rates of phenotypic evolution at the organismal level.

Finally, to examine the extent to which evolution of protein-coding regions mirrors gene expression level changes in the liver, we considered three sets of genes:

- set A: genes whose expression levels seem to be under directional selection in humans;
- set B: the top 100 candidates for stabilizing selection;
- set C: the remaining genes.

To assess the evidence for natural selection acting on coding regions, we used estimates of the posterior probability that a gene is subject to positive or negative selection based on synonymous and non-synonymous nucleotide polymorphism and divergence levels at genes on our array28. Using this approach (with a posterior probability of 0.05), only 6% of genes in set C and 4% in set B are inferred to evolve under positive selection. In contrast, among set A, significantly more genes (25%) are inferred to evolve under positive selection (P = 0.03, one-tailed Fisher’s exact test). These observations suggest that genes with expression levels under directional selection in humans are somewhat more likely to show accelerated amino acid evolution.

In summary, the use of a new multi-species cDNA array has allowed us to identify a set of genes with regulation under natural selection in humans. In particular, the over-representation of transcription factors among the genes with modified expression levels in the human lineage is consistent with the suggestion that most differences between human and chimpanzee are due to changes in gene regulation1, and might provide insight into their genetic architecture.

Study design and analysis. The 80 arrays were scanned using a GenePix Axon scanner and data were extracted using GenePix 6 (Molecular Devices) to give Cy5 and Cy3 foreground and background fluorescence intensities. Analysis was done in the R computing environment. Background-corrected Cy5 and Cy3 intensities were produced using the ‘normexp’ method

