
Vol 440|9 March 2006|doi:10.1038/nature04559

LETTERS
Expression profiling in primates reveals a rapid
evolution of human transcription factors
Yoav Gilad1†, Alicia Oshlack2, Gordon K. Smyth2, Terence P. Speed2,3 & Kevin P. White1

Although it has been hypothesized for thirty years that many human adaptations are likely to be due to changes in gene regulation1, almost nothing is known about the modes of natural selection acting on regulation in primates. Here we identify a set of genes for which expression is evolving under natural selection. We use a new multi-species complementary DNA array to compare steady-state messenger RNA levels in liver tissues within and between humans, chimpanzees, orangutans and rhesus macaques. Using estimates from a linear mixed model, we identify a set of genes for which expression levels have remained constant across the entire phylogeny (~70 million years), and are therefore likely to be under stabilizing selection. Among the top candidates are five genes with expression levels that have previously been shown to be altered in liver carcinoma. We also find a number of genes with similar expression levels among non-human primates but significantly elevated or reduced expression in the human lineage, features that point to the action of directional selection. Among the gene set with a human-specific increase in expression, there is an excess of transcription factors; the same is not true for genes with increased expression in chimpanzee.

A number of recent studies have used DNA microarrays to compare patterns of gene expression between closely related species2–9. Within primates, the focus has been primarily on human–chimpanzee comparisons, estimating gene expression profiles for a number of tissues, including liver, brain and heart2,6,7,10. The aim has been to characterize general trends in the evolution of gene expression rather than to identify specific genes of interest. To date, conclusions about the selection pressures acting on gene expression have been conflicting2,3,6,11–13.

These studies have all relied on data collected from arrays using gene probes that were designed on the basis of human sequences only. However, sequence mismatches affect hybridization intensity and can therefore bias estimates of gene expression differences between species14. This limitation of single-species arrays is especially problematic when the goal is to study how expression changes over evolutionary time. To make comparisons between more distantly related primate species, we generated a multi-species cDNA array that allows comparison of gene expression between species without the confounding effects of sequence divergence14. This cDNA array contains probes for 1,056 orthologous genes from four species (see Supplementary Methods)14.

We used this array to compare gene expression profiles in the livers of humans, chimpanzees (Pan troglodytes), orangutans (Pongo pygmaeus) and rhesus macaques (Macaca mulatta), the phylogeny of which represents approximately 70 million years (Myr) of evolution. By assigning expression changes in the liver to particular lineages, we were able to identify the first set of genes for which regulation seems to be under lineage-specific selection pressures. In order to measure gene expression levels within and between species, we extracted RNA from liver samples of five adult males from each of the four species. A common reference design was used, with a sixth human liver sample serving as the reference. We performed four technical replicates of each comparison, for a total of 80 hybridizations. Results from all species were obtained for 907 genes, used in subsequent analyses (Supplementary Table S1).

After image analysis, background correction and normalization, the log expression values were analysed using a linear mixed model with fixed effects for species and sequence mismatches, and a random effect for individuals within species (see Methods). For each gene, we used residual maximum likelihood15 to estimate the fixed effects and variances. Hypothesis testing was performed using likelihood ratio tests (see Methods).

As a first step, we identified genes that are differentially expressed between species (Table 1). A phylogenetic tree based on the number of differentially expressed genes between species16 recapitulates their known phylogeny (Supplementary Fig. S1). However, the number of significantly differentially expressed genes does not always increase with evolutionary time.

Table 1 | Inter-species differentially expressed genes

             Chimpanzee   Orangutan   Rhesus macaque
Human        110          128         176
Chimpanzee   –            150         141
Orangutan    –            –           129

Focusing on human and chimpanzee, we found 110 genes (12%) to be differentially expressed at a false discovery rate (FDR)17 of 0.01, with a mean absolute log ratio of 1.56-fold difference (Supplementary Table S2). Our observation is in general agreement with a statistical meta-analysis11 of the data from ref. 2. In contrast to this meta-analysis, however, we find that equal numbers of genes have elevated (55) or reduced (55) expression levels in humans compared to chimpanzees.

To estimate lineage-specific changes in expression levels, we used the expression profiles from orangutan and rhesus macaques as outgroups for 84 of the genes that show significantly different expression between human and chimpanzee (Fig. 1a; see Methods). Using this approach, we found similar numbers of genes for which expression has been altered in either the human or the chimpanzee lineage. Moreover, in both species, the numbers of genes that show increased or decreased expression levels relative to the estimated ancestral expression level is similar (45 and 43 of the genes are upregulated in humans and chimpanzees, respectively). In addition, the average or median fold change in gene expression level is similar regardless of the lineage or the trend (that is, up or down)
1Department of Genetics and Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut 06510, USA. 2Walter and Eliza Hall Institute of Medical Research, Parkville, Victoria 3052, Australia. 3Department of Statistics and Program in Biostatistics, University of California, Berkeley, California 94720, USA. †Present address: Department of Human Genetics, University of Chicago, Chicago, Illinois 60605, USA.

© 2006 Nature Publishing Group

Figure 1 | Expression changes in specific lineages. a, For 84 genes that are differentially expressed between human and chimpanzee, the log2-fold change relative to the common ancestor is given for the human (blue) and chimpanzee (orange) lineages. Genes are ordered by the ratio of their expression changes in the human lineage compared to the chimpanzee lineage. b, For 446 genes that are not differentially expressed between human and chimpanzee, and for which an ancestral state could be estimated (see Methods), the log2-fold change relative to the common ancestor is shown for the human lineage.

(Supplementary Fig. S2). The pattern also holds for expression changes in the human lineage in genes that are not differentially expressed between human and chimpanzee (Fig. 1b; 52% of the genes were upregulated). These observations do not agree with previous studies2,10. Possible explanations for the discrepancy are the use of human microarrays for inter-primate comparisons2, or the assignment of expression changes to lineages in the absence of outgroup data10.

Our approach also allows the identification of genes for which regulation is likely to have evolved under stabilizing selection. Previous studies have done this by testing for deviations from neutrality or stabilizing selection13,16,18. Such an approach requires a model for the evolution of expression, and thus relies on a number of parameter estimates about which there is considerable uncertainty in primates (for example, the neutral expression change per generation, and the environmental and mutational variance for each gene). Instead of specifying an explicit model, we used statistical analyses to rank genes according to their pattern of evolutionary change among the four species, and focused on those at the top of the list as the most promising candidates.

First, we identified genes that best fitted a model of constant expression level throughout the phylogeny, reasoning that these represent promising candidates for stabilizing selection. A majority of the genes on the array (60%) do not show significant inter-species expression differences. However, failure to reject the null hypothesis of no expression difference between species can result from constant expression level in all individuals in all species (Fig. 2a) or large within-species variance (Fig. 2b), especially as primate tissues cannot be staged10. As our aim is to identify genes under stabilizing selection, we are only interested in the former scenario. We therefore ranked genes by their expression variation among individuals across all species (see Methods). Genes at the top of our list are not significantly differentially expressed between species, and also have low within-species variance (Fig. 2). The expression levels of these genes seem to have remained constant for ~70 Myr19, suggesting that their regulation is under evolutionary constraint. Among the first 100 genes on our list (Supplementary Table S3), the most significant enrichment (P < 10⁻⁸; uncorrected for multiple tests) is for genes from the category ‘regulation of cellular physiological process’ (Gene Ontology ID 0051244; http://www.geneontology.org). As we expect transcription of such genes to be similar across individuals and species, this finding serves as a validation of the approach.

A number of recent papers have argued that the majority of expression differences observed between primates are neutral, based primarily on the observation that the mean square fold change in expression levels in liver and brain increases linearly with species divergence time6,12. Having found no clear increase in the number of significantly differentially expressed genes with time (Table 1), we re-examined the mean square fold change for our data. This revealed no linear increase over time (Supplementary Fig. S3). Moreover, our observation that many genes show stable expression levels over 70 Myr suggests that, rather than evolving mostly neutrally, expression levels are often under stabilizing selection, consistent with findings in Drosophila16,18 and in C. elegans20.

This finding has implications for studies of human disease. Indeed, our observations suggest that many changes in gene regulation may be deleterious and hence influence disease susceptibility. Consistent with this, among the top 100 genes for which regulation is probably evolving under stabilizing selection, genes associated with human cancer are slightly enriched (9% compared to 5% in the total gene sample; P = 0.10, one-tailed Fisher’s exact test). Moreover, the expression levels of five genes (MBD4, WWOX, ING1, ATP7B and IGFBP2; ranked 5th, 12th, 28th, 58th and 66th, respectively) have been shown to be altered specifically in liver carcinoma21–24. These

Figure 2 | Genes that are not differentially expressed across species. In each plot, different genes (x-axis) are represented by different colours. For each gene, the estimated expression level (±s.e.m.) is shown for humans, chimpanzees, orangutans and rhesus macaque (left to right). a, The five highest-ranked genes (see Methods). These genes have constant expression levels in all species, suggesting that their expression levels are under stabilizing selection. b, Examples of genes that are not differentially expressed across species, probably due to high within-species variance (gene rankings 489–493).

Figure 3 | Genes with distinct expression pattern in humans. Different genes (x-axis) are represented by distinct colours. For each gene, the log2 expression levels for humans are set to zero. Estimated gene expression level relative to human (±s.e.m.) is shown for humans, chimpanzees, orangutans and rhesus macaque (left to right). Shown are examples of five genes that are not differentially expressed in the non-human primates but are upregulated (a) or downregulated (b) in humans. The expression levels of these genes seem to have been under stabilizing selection in the non-human primates and under directional selection in the human lineage.

findings suggest that focusing on genes with conserved expression levels among primates may be helpful in identifying promising candidates for disease-association studies, much like phylogenetic shadowing of DNA sequences25 can aid in the identification of non-coding elements of functional importance.

Using this general approach, we also identified genes for which expression levels are not significantly different among non-human primates but are significantly elevated or reduced in humans relative to each of the three other species (see Methods and Supplementary Table S4). In other words, the expression level of the gene has remained similar over ~65 Myr of evolution and then changed over the ~5 Myr of the human lineage, indicative of directional selection in humans. Our analysis revealed 14 genes with significantly higher expression levels in humans and five with lower expression (Fig. 3). We note that we are likely to be missing a number of targets of positive selection: gene expression varies across tissues and developmental stages26, and as a result, the absence of support for selection in primate expression data is weak evidence against it.

Notably, among the genes with higher expression in humans, we find a significant excess of transcription factors (5/12, 42% compared with 10% representation on the array; P = 0.003 by Fisher’s exact test, including all genes for which GO annotation was available), whereas no transcription factors were found among genes with unusually low expression in humans. We repeated this analysis using a less stringent criterion to identify genes for which the mean expression level in humans differed significantly from that of non-human primates (see Methods). Again, transcription factors were overrepresented among the 30 genes with elevated expression in humans (30%; P = 0.001, Fisher’s exact test), and no transcription factors were found among 19 genes with reduced expression. In contrast, when these analyses were applied to chimpanzee (Supplementary Table S5), the number of transcription factors was equivalent among genes with elevated (9%) or reduced (9%) expression levels (for the less stringent cutoff), and neither proportion was significantly different from the overall representation on the array (that is, 10%). It is unlikely that these observations can be explained by differential degradation of transcripts encoding specific classes of proteins27, as no difference in RNA quality was observed between human and non-human primate samples during sample preparation (on the basis of electrophoretic analyses).

In addition to the rapid evolution of expression levels, genes encoding transcription factors have also been shown to evolve rapidly in the human lineage at the coding sequence level28. Together, these findings raise the possibility that the function and regulation of transcription factors have been substantially modified in the human lineage, potentially affecting many downstream targets over a short evolutionary time frame. Notably, the opposite finding emerged from studies of closely related Drosophila species, in which the expression levels of transcription factors were shown to evolve slower than genes encoding other types of proteins16,18. Given the large number of phenotypic changes in the human lineage1, it is tempting to speculate that relative rates of transcription factor evolution may serve as an indicator of rates of phenotypic evolution at the organismal level.

Finally, to examine the extent to which evolution of protein-coding regions mirrors gene expression level changes in the liver, we considered three sets of genes: those for which expression levels seem to be under directional selection in humans (set A), the top 100 candidates for stabilizing selection (set B) and the remaining genes (set C). To assess the evidence for natural selection acting on coding regions, we used estimates of the posterior probability that a gene is subject to positive or negative selection based on synonymous and non-synonymous nucleotide polymorphism and divergence levels at genes on our array28. Using this approach (with a posterior probability of 0.05), only 6% of genes in set C and 4% in set B are inferred to evolve under positive selection. In contrast, among set A, significantly more genes (25%) are inferred to evolve under positive selection (P = 0.03, one-tailed Fisher’s exact test). These observations suggest that genes with expression levels under directional selection in humans are somewhat more likely to show accelerated amino acid evolution.

In summary, the use of a new multi-species cDNA array has allowed us to identify a set of genes with regulation under natural selection in humans. In particular, the over-representation of transcription factors among the genes with modified expression levels in the human lineage is consistent with the suggestion that most differences between human and chimpanzee are due to changes in gene regulation1, and might provide insight into their genetic architecture.

METHODS
Study design and analysis. The 80 arrays were scanned using a GenePix Axon scanner and data were extracted using GenePix 6 (Molecular Devices) to give Cy5 and Cy3 foreground and background fluorescence intensities. Analysis was done in the R computing environment (http://www.r-project.org). Background-corrected Cy5 and Cy3 intensities were produced using the ‘normexp’ method

with an offset of 50, implemented in the limma software package29. Lowess curves for intensity-dependent normalization were generated in a way similar to ref. 14, where probes from the two species involved in the hybridization were used to fit the curves. All probes on the array were adjusted by the fitted lowess curve (see Supplementary Methods). We concentrated on the 907 genes on the array for which successful polymerase chain reaction (PCR) products were obtained from all species14. The expression log ratios for each gene were analysed using the linear mixed model:

$$y_{tijp} = \mu_t + \kappa_{tp} - \kappa_{hp} + \alpha_{ti} + \varepsilon_{tijp}$$

in which we have suppressed the gene labels. Here, $y_{tijp}$ is the normalized log2 ratio measured for target species t for replicate j of individual i on species probe p. The term $\mu_t$ is the expected log ratio of the expression level of the gene in target species t relative to the human reference, and $\kappa_{tp}$ and $\kappa_{hp}$ are parameters corresponding to the reduction in the log expression levels caused by reduced affinity owing to target and probe sequence mismatches. As each hybridization has target species t on the red channel and the human reference on the green channel, there are two $\kappa$ terms for each measurement. We assume that $\kappa_{tt}$ is equal to 0, and that the affinity adjustments are symmetrical in target and probe (that is, $\kappa_{tp} = \kappa_{pt}$). The term $\alpha_{ti}$ is the random effect for individual i of species t, assumed to be uncorrelated with mean zero and variance $\sigma_a^2$. Finally, $\varepsilon_{tijp}$ is the residual error term, and these are assumed to be uncorrelated with mean zero and variance $\sigma_\varepsilon^2$. We also considered models that included random effects for probes within arrays and a crossed term for an array × probe interaction, but found that the contributions from these terms were substantially smaller than the error term and therefore did not warrant inclusion in the model. (See Supplementary Information for further details on the parameters and model.) For each gene, the model was fitted by residual maximum likelihood using statmod and lme software packages30.

Hypothesis testing. Likelihood ratio tests were used for hypothesis testing. Under the full model, for each gene, 12 parameters (4 $\mu_t$ parameters, 6 $\kappa_{tp}$ parameters, $\sigma_a^2$ and $\sigma_\varepsilon^2$) were estimated by maximum likelihood. Genes deemed to be under stabilizing selection were those for which the fit of a reduced model with $\mu = \mu_h = \mu_c = \mu_o = \mu_r$ was adequate (h, human; c, chimpanzee; o, orangutan; r, rhesus macaque). Such genes were selected on the basis of the likelihood ratio test statistic comparing the fit under this sub-model to that under the full model. Under the null hypothesis, −2(log-likelihood ratio) has an approximate $\chi^2$ distribution on 3 degrees of freedom, and genes for which this statistic was less than 12.4 (P = 6.1 × 10⁻³) were chosen. We then ranked these genes according to the magnitude of the between-to-within individual ratio mean squares $(16\hat{\sigma}_a^2 + \hat{\sigma}_\varepsilon^2)/\hat{\sigma}_\varepsilon^2$, starting with genes for which this was small. We note that the latter process alone would not suffice to identify genes that are not differentially expressed between species (Supplementary Fig. S4).

To select genes that were different in human compared to the other three species, we combined three criteria. First, we used a likelihood ratio statistic to exclude genes that were differentially expressed in the non-human primates. We maximized the likelihood under the constraints $\mu = \mu_c = \mu_o = \mu_r$, constructed the ratio of this likelihood compared to the full model, and removed genes where we estimated significant differences. Second, we used a likelihood ratio statistic to rank genes on the basis of differences between human and the other species (that is, $\mu - \mu_h$). We chose a cutoff statistic of 16 (P = 6.3 × 10⁻⁵) to select genes, but also investigated genes selected under a more relaxed cutoff of 12, which corresponds to ~1% FDR17. Third, we restricted the list to genes with small between relative to within individual variance. Pairwise differences between species were also constructed using a likelihood ratio statistic with a cutoff chosen to give 1% FDR, assuming a $\chi^2_1$ distribution (numbers are given in Table 1). We found by simulations that the null likelihood ratio test statistic was well approximated by a $\chi^2$ distribution, implying that our assumptions are accurate (data not shown).

We note that correlation between species due to shared phylogeny is not expected to influence our results, as no structure is imposed on the parameters for the means of the different species and no model is fitted to them across species.

Received 5 November; accepted 29 December 2005.

1. King, M. C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).
2. Enard, W. et al. Intra- and interspecific variation in primate gene expression patterns. Science 296, 340–343 (2002).
3. Caceres, M. et al. Elevated gene expression levels distinguish human from non-human primate brains. Proc. Natl Acad. Sci. USA 100, 13030–13035 (2003).
4. Ranz, J. M., Castillo-Davis, C. I., Meiklejohn, C. D. & Hartl, D. L. Sex-dependent gene expression and evolution of the Drosophila transcriptome. Science 300, 1742–1745 (2003).
5. Karaman, M. W. et al. Comparative analysis of gene-expression patterns in human and African great ape cultured fibroblasts. Genome Res. 13, 1619–1630 (2003).
6. Khaitovich, P. et al. A neutral model of transcriptome evolution. PLoS Biol. 2, E132 (2004).
7. Khaitovich, P. et al. Regional patterns of gene expression in human and chimpanzee brains. Genome Res. 14, 1462–1473 (2004).
8. Rise, M. L. et al. Development and application of a salmonid EST database and cDNA microarray: data mining and interspecific hybridization characteristics. Genome Res. 14, 478–490 (2004).
9. Nuzhdin, S. V., Wayne, M. L., Harmon, K. L. & McIntyre, L. M. Common pattern of evolution of gene expression level and protein sequence in Drosophila. Mol. Biol. Evol. 21, 1308–1317 (2004).
10. Khaitovich, P. et al. Parallel patterns of evolution in the genomes and transcriptomes of humans and chimpanzees. Science 309, 1850–1854 (2005).
11. Hsieh, W. P., Chu, T. M., Wolfinger, R. D. & Gibson, G. Mixed-model reanalysis of primate data suggests tissue and species biases in oligonucleotide-based gene expression profiles. Genetics 165, 747–757 (2003).
12. Khaitovich, P., Paabo, S. & Weiss, G. Toward a neutral evolutionary model of gene expression. Genetics 170, 929–939 (2005).
13. Lemos, B., Meiklejohn, C. D., Caceres, M. & Hartl, D. L. Rates of divergence in gene expression profiles of primates, mice, and flies: stabilizing selection and variability among functional categories. Evolution Int. J. Org. Evolution 59, 126–137 (2005).
14. Gilad, Y., Rifkin, S. A., Bertone, P., Gerstein, M. & White, K. P. Multi-species microarrays reveal the effect of sequence divergence on gene expression profiles. Genome Res. 15, 674–680 (2005).
15. Patterson, H. D. & Thompson, R. Recovery of inter-block information when block sizes are unequal. Biometrika 58, 545–554 (1971).
16. Rifkin, S. A., Kim, J. & White, K. P. Evolution of gene expression in the Drosophila melanogaster subgroup. Nature Genet. 33, 138–144 (2003).
17. Benjamini, Y. & Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. B 57, 289–300 (1995).
18. Rifkin, S. A., Houle, D., Kim, J. & White, K. P. A mutation accumulation assay reveals a broad capacity for rapid evolution of gene expression. Nature 438, 220–223 (2005).
19. Glazko, G. V. & Nei, M. Estimation of divergence times for major lineages of primate species. Mol. Biol. Evol. 20, 424–434 (2003).
20. Denver, D. R. et al. The transcriptional consequences of mutation and natural selection in Caenorhabditis elegans. Nature Genet. 37, 544–548 (2005).
21. Park, S. W. et al. Frequent downregulation and loss of WWOX gene expression in human hepatocellular carcinoma. Br. J. Cancer 91, 753–759 (2004).
22. Sugeno, H. et al. Expression of copper-transporting P-type adenosine triphosphatase (ATP7B) in human hepatocellular carcinoma. Anticancer Res. 24, 1045–1048 (2004).
23. Zhu, Z. et al. Inhibitory effect of tumour suppressor p33ING1b and its synergy with p53 gene in hepatocellular carcinoma. World J. Gastroenterol. 11, 1903–1909 (2005).
24. Chiba, T. et al. Identification and investigation of methylated genes in hepatoma. Eur. J. Cancer 41, 1185–1194 (2005).
25. Boffelli, D. et al. Phylogenetic shadowing of primate sequences to find functional regions of the human genome. Science 299, 1391–1394 (2003).
26. Ludwig, M. Z. et al. Functional evolution of a cis-regulatory module. PLoS Biol. 3, e93 (2005).
27. Yang, E. et al. Decay rates of human mRNAs: correlation with functional characteristics and sequence attributes. Genome Res. 13, 1863–1872 (2003).
28. Bustamante, C. D. et al. Natural selection on protein-coding genes in the human genome. Nature 437, 1153–1157 (2005).
29. Smyth, G. K. in Bioinformatics and Computational Biology Solutions using R and Bioconductor (eds Gentleman, R., Carey, V., Dudoit, S., Irizarry, R. & Huber, W.) 397–420 (Springer, New York, 2005).
30. Pinheiro, J. C. & Bates, D. M. Mixed-Effects Models in S and S-PLUS (Springer-Verlag, New York, 2000).

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Acknowledgements We thank K. E. Holt for pointing out a possible technical explanation for the excess of upregulated transcription factors in humans, the Yale hospitals, S. Paabo and the Yerkes Primate Center for providing samples used in the study, and A. Clark and M. Przeworski for comments on the manuscript. This research was supported by grants to K.P.W. from the W. M. Keck Foundation, the Arnold and Mabel Beckman Foundation and the National Human Genome Research Institute of the National Institutes of Health, and by NHMRC grants to G.K.S., T.P.S. and A.O. Y.G. was supported by an EMBO fellowship.

Author Information Expression data from this study have been deposited in the GEO database under the series accession number GSE2569. Reprints and permissions information is available at npg.nature.com/reprintsandpermissions. The authors declare no competing financial interests. Correspondence and requests for materials should be addressed to Y.G. ([email protected]) or K.P.W. ([email protected]).
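
Worked example (likelihood ratio cutoffs). The Methods quote two likelihood ratio cutoffs together with their tail probabilities: 12.4 on 3 degrees of freedom for the stabilizing-selection screen, and 16 for the human-specific screen. The sketch below recomputes those tail probabilities from the chi-square distribution using only the Python standard library. It is an illustrative check, not the authors' code (their analysis was done in R). The function names are hypothetical, and the use of 1 degree of freedom for the cutoffs of 16 and 12 is an assumption, made because it reproduces the quoted P value.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 1 d.f."""
    return math.erfc(math.sqrt(x / 2.0))


def chi2_sf_3df(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 3 d.f.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    """
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Cutoff for the reduced model with one common mean (3 d.f., as stated in Methods).
p_stabilizing = chi2_sf_3df(12.4)

# Cutoffs for the human-versus-other-species statistic (1 d.f. is an assumption).
p_human_strict = chi2_sf_1df(16.0)
p_human_relaxed = chi2_sf_1df(12.0)

print(f"3 d.f., cutoff 12.4: P = {p_stabilizing:.2e}")
print(f"1 d.f., cutoff 16:   P = {p_human_strict:.2e}")
print(f"1 d.f., cutoff 12:   P = {p_human_relaxed:.2e}")
```

Under these assumptions the first two values agree, to the precision quoted in Methods, with P = 6.1 × 10⁻³ and P = 6.3 × 10⁻⁵.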

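
Worked example (false discovery rate). The text reports differentially expressed genes at a false discovery rate of 0.01 and cites Benjamini and Hochberg (ref. 17). As a minimal sketch of that general step-up procedure, the code below applies it to a small list of made-up P values. The function name and the toy P values are hypothetical and are not data from the study, and the paper does not state that its cutoffs were derived with exactly this routine.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Return sorted indices of hypotheses rejected by the step-up procedure.

    With m tests and P values sorted in ascending order, find the largest
    rank k such that p_(k) <= k * fdr / m, then reject the k smallest.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            k_max = rank
    return sorted(order[:k_max])


# Toy P values for eight hypothetical genes (illustration only).
toy_p = [0.0001, 0.0004, 0.003, 0.02, 0.04, 0.2, 0.5, 0.9]
rejected = benjamini_hochberg(toy_p, fdr=0.01)
print(rejected)
```

Because the procedure is step-up, a P value that misses its own rank threshold is still rejected whenever a larger P value further down the sorted list meets its threshold.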
