
Articles

Genomic analyses provide insights into the history of tomato breeding
Tao Lin1,2,16, Guangtao Zhu1,16, Junhong Zhang3,16, Xiangyang Xu4,16, Qinghui Yu5,16, Zheng Zheng1,16,
Zhonghua Zhang1, Yaoyao Lun1, Shuai Li1, Xiaoxuan Wang1, Zejun Huang1, Junming Li1, Chunzhi Zhang1,
Taotao Wang3, Yuyang Zhang3, Aoxue Wang4, Yancong Zhang6, Kui Lin6, Chuanyou Li7, Guosheng Xiong2,7,
Yongbiao Xue8,9, Andrea Mazzucato10, Mathilde Causse11, Zhangjun Fei12, James J Giovannoni12,
Roger T Chetelat13, Dani Zamir14, Thomas Städler15, Jingfu Li4, Zhibiao Ye3, Yongchen Du1 & Sanwen Huang1,2
© 2014 Nature America, Inc. All rights reserved.

The histories of crop domestication and breeding are recorded in genomes. Although tomato is a model species for plant biology
and breeding, the nature of human selection that altered its genome remains largely unknown. Here we report a comprehensive
analysis of tomato evolution based on the genome sequences of 360 accessions. We provide evidence that domestication and
improvement focused on two independent sets of quantitative trait loci (QTLs), resulting in modern tomato fruit ~100 times larger
than its ancestor. Furthermore, we discovered a major genomic signature for modern processing tomatoes, identified the causative
variants that confer pink fruit color and precisely visualized the linkage drag associated with wild introgressions. This study outlines
the accomplishments as well as the costs of historical selection and provides molecular insights toward further improvement.

Global food production relies heavily on the capacity and effectiveness of plant breeding1. More than 10,000 years ago, our Neolithic ancestors successfully domesticated hundreds of wild plant species into cultivated crops that remain principal human food sources2,3. Domestication can be regarded as the first stage of plant breeding and is often followed by species distribution along corridors of human migration. Migration and subsequent differential selection by local farmers likely contributed to geographical differences in preferences for cultivated species and traits4. Until recently, crop breeding has relied heavily on accumulated experience and careful observation. The current genomic era, enabled by next-generation DNA sequencing technologies, offers new and powerful tools for targeted and precise selection. Scientists can now ‘read’ entire genomes during selection and track the history of plant breeding via population genomics, as recently demonstrated in rice5, maize6, soybean7 and cucumber8. These studies illustrate how human-involved evolutionary processes have shaped modern crop genomes and provide insights for further crop improvement.

Tomato (Solanum lycopersicum) has a worldwide distribution and is considered the leading vegetable crop, with a global yield of 162 million tons in 2012 (United Nations Food and Agriculture Organization (FAO) statistics; see URLs) and a net value of over $55 billion9. Tomato is also an important model system for plants and especially for fleshy fruit biology10. It represents the cornerstone for biological research on and genetic improvement of all solanaceous crops, including potato, pepper and eggplant. Tomato and its wild relatives originated from the Andean region of South America. Cherry tomato (S. lycopersicum var. cerasiforme) is considered the probable ancestor of the big-fruited tomato and was likely domesticated from the red-fruited wild species Solanum pimpinellifolium11. Tomatoes were brought to Europe by the conquistadors in the sixteenth century12, and subsequent migration and continued selection reduced the genetic diversity of this crop. To further boost the performance of modern tomato cultivars, wild tomato genomes were deliberately introgressed into elite cultivars13. However, how human selection has changed the tomato genome remains largely unknown.

1Key Laboratory of Biology and Genetic Improvement of Horticultural Crops of the Ministry of Agriculture, Sino-Dutch Joint Laboratory of Horticultural Genomics,
Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China. 2Agricultural Genome Institute at Shenzhen, Chinese Academy of
Agricultural Sciences, Shenzhen, China. 3Key Laboratory of Horticultural Plant Biology, Huazhong Agricultural University, Wuhan, China. 4College of Horticulture,
Northeast Agricultural University, Harbin, China. 5Institute of Horticulture, Xinjiang Academy of Agricultural Sciences, Urumqi, China. 6College of Life Sciences,
Beijing Normal University, Beijing, China. 7State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Chinese Academy of Sciences
and National Plant Gene Research Centre, Beijing, China. 8State Key Laboratory of Molecular Developmental Biology, Institute of Genetics and Developmental Biology,
Chinese Academy of Sciences and National Plant Gene Research Centre, Beijing, China. 9Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China.
10Department of Agriculture, Forests, Nature and Energy (DAFNE), University of Tuscia, Viterbo, Italy. 11Institut National de la Recherche Agronomique (INRA), Unité
de Génétique et Amélioration des Fruits et Légumes, Domaine Saint-Maurice, Montfavet, France. 12Boyce Thompson Institute for Plant Research, US Department
of Agriculture (USDA) Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, New York, USA. 13C.M. Rick Tomato Genetics Resource
Center, Department of Plant Sciences, University of California, Davis, Davis, California, USA. 14Robert H. Smith Institute of Plant Sciences and Genetics, Faculty of
Agriculture, Hebrew University of Jerusalem, Rehovot, Israel. 15Plant Ecological Genetics, Institute of Integrative Biology, Eidgenössische Technische Hochschule
(ETH) Zurich, Zurich, Switzerland. 16These authors contributed equally to this work. Correspondence should be addressed to S.H. ([email protected]),
Y.D. ([email protected]), Z.Y. ([email protected]) or Jingfu Li ([email protected]).

Received 20 July; accepted 22 September; published online 12 October 2014; doi:10.1038/ng.3117

Nature Genetics ADVANCE ONLINE PUBLICATION 



RESULTS
A map of tomato genome variation
We constructed a genomic history of tomato breeding by analyzing the genomes of 360 diverse accessions collected from around the world. These included 333 accessions from the red-fruited clade (S. pimpinellifolium, S. lycopersicum var. cerasiforme and S. lycopersicum) representing various geographical origins, consumption types and improvement statuses, 10 accessions of wild tomato species including some known donors of disease resistance genes (R genes) and 17 modern commercial hybrids (F1) (Supplementary Table 1). Resequencing of the 360 accessions generated a total of 2.6 trillion base pairs of sequence, with a median depth of 5.7× and coverage of 93.1% of the assembled genome14 (release SL2.40) (Supplementary Table 1). After aligning the reads against the tomato reference genome, we generated a final set of 11,620,517 SNPs and 1,303,213 small indels (shorter than 5 bp) (Supplementary Tables 2 and 3). The accuracy of the identified SNPs was estimated to be 98.4% and 97.6% using Sanger sequencing (349 SNPs in 3 accessions) and existing SNP array data15 (48 accessions with an average of 5,800 SNPs), respectively (Supplementary Tables 4 and 5, and Supplementary Note). We identified 207,306 nonsynonymous SNPs in 30,945 genes, including 12,035 nonsense SNPs in 7,678 genes that caused start codon changes, the introduction of premature stop codons or the production of elongated transcripts. This SNP data set represents a new resource for tomato biology and breeding.

We explored the phylogenetic relationships among the accessions using 20,111 SNPs (minor allele frequency (MAF) > 0.05) at fourfold-degenerate sites that represent neutral or near-neutral variants. The resulting neighbor-joining tree (Fig. 1a) supports the clustering of the red-fruited clade. Interestingly, three Solanum cheesmaniae accessions and one Solanum galapagense accession were situated within this clade, consistent with previous studies16. These two species bear small mature fruits (2–3 g) of yellow or red color. Endemic to the Galapagos Islands, they can be experimentally crossed to S. lycopersicum without difficulty. On the basis of passport information, fruit weight and other morphological traits, we assigned the 331 red-fruited accessions to 3 groups (2 accessions could not be assigned as fruit weight was highly segregated within each accession): PIM including 53 S. pimpinellifolium accessions (fruit weight = 2.04 ± 0.85 g), CER including 112 S. lycopersicum var. cerasiforme accessions (fruit weight = 13.29 ± 9.54 g) and BIG including 166 big-fruited S. lycopersicum accessions (fruit weight = 111.33 ± 68.19 g) (Supplementary Fig. 1). The neighbor-joining tree largely supported this division, although some discrepancies between phenotypic characterization and phylogenetic clustering can be anticipated due to shared ancestral variation and historical gene flow among these very closely related groups (Fig. 1a). For instance, some cherry tomatoes residing in the BIG group could be the feral descendants of cultivated tomatoes or the products of introgression between the groups, as previously hypothesized17.

Model-based clustering analyses (Fig. 1b and Supplementary Figs. 2 and 3) enabled the division of the CER group into two main clusters. One cluster showed obvious admixture in genetic composition and consisted of 49 accessions mainly distributed in South America, where CER accessions might experience occasional gene flows with the local wild relative S. pimpinellifolium. Another cluster was homogeneous with the BIG group in genetic composition and contained 38 accessions mainly of non–South American origin (from Mesoamerica, Europe, North America and Asia). Within the BIG group, we identified a cluster of processing tomatoes, highlighting their genetic similarity and distinction from other accessions (Fig. 1b). Nucleotide diversity measured by the π value18 for the PIM group (3.23 × 10⁻³) was substantially higher than that for the CER (1.74 × 10⁻³) and BIG (0.73 × 10⁻³) groups. In addition, the PIM group had more private SNPs (582,954) than the CER (207,892) and BIG (194,919) groups. We aligned the Solanum pennellii genome19 to the reference Heinz 1706 genome. Of the ~11.6 million SNP sites, ~3.5 million could be reliably recovered in the S. pennellii genome. Among the ~3.5 million SNPs, on average, 30.4% of the sites in PIM accessions, 6.6% of the sites in CER accessions and 2.8% of the sites in BIG accessions were identical to the corresponding S. pennellii sites that are presumably ancestral alleles. The decay of linkage disequilibrium (LD) with physical distance between SNPs occurred at 8.8 kb in PIM (r² = 0.2), 256.8 kb in CER (r² = 0.35) and 865.7 kb in BIG

[Figure 1a, enlarged branches. Wild accessions: TS-407 S. habrochaites PI 247087; TS-408 S. chilense LA1969; TS-403 S. peruvianum PI 128650; TS-402 S. peruvianum PI 126935; TS-404 S. peruvianum PI 128657; TS-146 S. neorickii LA2133. Galapagos accessions: TS-217 S. cheesmaniae LA0429; TS-208 S. galapagense LA0528; TS-207 S. cheesmaniae LA1037; TS-199 S. cheesmaniae LA0746. Fruit photographs: S. neorickii, S. habrochaites, S. peruvianum, S. galapagense, S. pimpinellifolium, S. lycopersicum var. cerasiforme, S. lycopersicum, S. cheesmaniae.]
Figure 1 Genome-wide relationship and fruit morphology in cultivated tomato and its wild relatives. (a) The neighbor-joining tree of the population (331 accessions from the red-fruited clade and 10 wild accessions) was generated using 20,111 SNPs at fourfold-degenerate sites. The bars indicate the PIM (green), CER (orange) and BIG (blue) lines. The two branches containing wild accessions are enlarged for visualization. Typical fruits of the species studied are shown. (b) Model-based clustering analysis with different numbers of clusters (K = 2, 3 and 4). The y axis quantifies cluster membership, and the x axis lists the different accessions. The orders and positions of these accessions on the x axis are consistent with those for the neighbor-joining tree. South American CER, non–South American CER and processing tomato clusters are separated by dashed red lines.
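The nucleotide diversity values compared across the PIM, CER and BIG groups can be illustrated with a small calculation. The sketch below is not the authors' pipeline. It assumes biallelic SNPs, a hypothetical list of alternate-allele counts per site, and a common per-site estimator (the average number of pairwise differences divided by the number of sites considered). The function and parameter names are illustrative only.

```python
def nucleotide_diversity(alt_counts, n_chrom, n_sites):
    """Average pairwise differences per site for one group.

    alt_counts: alternate-allele count at each variable site (hypothetical input).
    n_chrom:    number of sampled chromosomes in the group.
    n_sites:    number of sites surveyed, variable or not.
    """
    if n_chrom < 2:
        raise ValueError("need at least two chromosomes to compare")
    n_pairs_scale = n_chrom * (n_chrom - 1)
    total = 0.0
    for k in alt_counts:
        # Fraction of chromosome pairs that differ at this site.
        total += 2.0 * k * (n_chrom - k) / n_pairs_scale
    return total / n_sites


# Toy example: two variable sites among 10 surveyed sites, 4 chromosomes.
example_pi = nucleotide_diversity([1, 2], n_chrom=4, n_sites=10)
```

Under this definition, a value near 3 × 10⁻³ corresponds to roughly three differences per kilobase between two randomly chosen chromosomes.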




Figure 2 Evolution of fruit mass during domestication and improvement. (a,b) A total of 186 (top 5%; πPIM/πCER ≥ 3.0) and 133 (top 5%; πCER/πBIG ≥ 6.9) regions are considered to be candidate domestication sweeps (orange bars above the dashed horizontal threshold line) (a) and improvement sweeps (green bars above the dashed horizontal threshold line) (b), respectively. Pink arrows indicate the 5 and 13 QTLs related to fruit mass located within the domestication and improvement sweeps, respectively. (c,d) Distribution of nucleotide diversity (π) of the PIM (green), CER (orange) and BIG (blue) lines within the domestication sweep harboring fw12.1 (with TG180 as the signature marker) (c) and within the improvement sweep harboring five fruit mass QTLs on chromosome 2 (d). (e–g) Verification of the improvement sweeps related to fruit mass. (e) Fruit phenotype and mass of the parental lines, the F1 and the two bulk populations with extreme fruit size from the F2 population each containing 50 individuals. (f) The SNP indices (ratio of the SNPs that are identical to those in the big-fruited parent) in the big-fruited and small-fruited bulk populations are shown using blue and orange lines, respectively. (g) The ∆SNP index (subtracting the SNP index of the small-fruited bulk population from that of the big-fruited bulk population) and its 95% confidence interval are shown using red and black lines, respectively. Regions with a ∆SNP index above the confidence line are highlighted with pink bars. (h) Schematic of the two-step evolution of tomato fruit size. QTLs that were putatively selected during domestication and improvement are listed, and those in pink were verified in this study. Note that the size of the photos is different for e and h.

[Figure 2 panel labels. (a) Domestication, πPIM/πCER by chromosome: fw1.1, fw5.2, fw7.2, fw12.1, lcn12.1. (b) Improvement, πCER/πBIG by chromosome: fw2.1, fw2.2, fw2.3, lcn2.1, lcn2.2, fw3.2, lcn3.1, fw9.1, fw9.3, lcn10.1, fw11.1, fw11.2, fw11.3. (c) π (10⁻³) along Chr. 12, 0–10 Mb. (d) π (10⁻³) along Chr. 2, 35–49 Mb. (e) P1, 260.15 g; P2, 13.61 g; F1, 37.45 g; small-fruited bulk population, 18.66 ± 2.71 g; big-fruited bulk population, 88.04 ± 18.62 g. (h) PIM, 2.04 ± 0.85 g; CER, 13.29 ± 9.54 g; BIG, 111.33 ± 68.19 g.]
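The SNP index and ∆SNP index defined for panels f and g lend themselves to a short worked example. The sketch below assumes hypothetical per-SNP read counts for each bulk (reads carrying the big-fruited parent's allele, and total depth) and averages the per-SNP difference in sliding windows. The study reports a 1,000-kb window with a 10-kb step; the toy call uses much smaller values, the confidence-interval calculation is not reproduced, and all names are illustrative.

```python
def snp_index(parent_allele_reads, depth):
    """Fraction of reads at a SNP that carry the big-fruited parent's allele."""
    return parent_allele_reads / depth


def delta_snp_index(positions, big_bulk, small_bulk, window, step, chrom_len):
    """Windowed mean of (big-bulk index minus small-bulk index).

    positions:  SNP coordinates on one chromosome.
    big_bulk:   (parent_allele_reads, depth) per SNP in the big-fruited bulk.
    small_bulk: same, for the small-fruited bulk.
    Returns a list of (window_start, mean_delta) for windows containing SNPs.
    """
    deltas = [snp_index(*b) - snp_index(*s) for b, s in zip(big_bulk, small_bulk)]
    result = []
    start = 0
    while start < chrom_len:
        in_window = [d for p, d in zip(positions, deltas)
                     if start <= p < start + window]
        if in_window:
            result.append((start, sum(in_window) / len(in_window)))
        start += step
    return result


# Toy data: three SNPs, the first two strongly skewed between the bulks.
positions = [100, 600, 1200]
big_bulk = [(9, 10), (8, 10), (5, 10)]
small_bulk = [(1, 10), (2, 10), (5, 10)]
windows = delta_snp_index(positions, big_bulk, small_bulk,
                          window=1000, step=500, chrom_len=1500)
```

Windows whose mean difference stays well above zero point to regions where the big-fruited bulk preferentially inherited the big-fruited parent's alleles.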
(r² = 0.35) accessions (Supplementary Figs. 4 and 5, and Supplementary Note). Overall, these morphological and genomic data support the PIM group as the ancestor of the red-fruited clade and the CER group as the evolutionary intermediate between PIM and BIG, consistent with a previous report20. We therefore define the evolutionary processes yielding the CER group from PIM as ‘domestication’ and yielding the BIG group from CER as ‘improvement’. Demographic modeling using ∂a∂i21 suggested an effective population size of ~300 for tomato at domestication, an estimate similar to that for cucumber8 (Supplementary Table 6).

Two-step evolution of fruit mass
It is well known that the indigenous people of the Andes domesticated quinoa, lima bean, peanut, potato, sweet potato and squash2. They likely also kept and propagated seeds from wild tomato plants with bigger and tastier fruits. Fruit mass is the key trait of human selection in tomato, as it affects both yield and quality. Typical fruits from PIM lines are tiny and have a thick skin, thin pericarp and high seed content (Fig. 2). Previous reports on segregating populations from crosses between PIM and BIG accessions identified multiple QTLs for fruit mass, including several genes that were cloned22–25. However, whether these QTLs and genes were selected during domestication or improvement remains elusive. To address this question, we scanned genomic regions with a drastic reduction in nucleotide diversity in the comparison of PIM and CER lines (πPIM/πCER; domestication sweeps) as well as in the comparison of CER and BIG lines (πCER/πBIG; improvement sweeps). In total, we identified 186 domestication sweeps and 133 improvement sweeps covering 8.3% (64.6 Mb) and 7.0% (54.5 Mb) of the assembled genome, harboring 5,605 and 4,807 genes, respectively (Fig. 2a,b and Supplementary Tables 7–10). We note that 21% of the domestication sweeps overlapped with improvement sweeps (8.0 Mb; 1.0% of the genome), indicating that some of the domestication loci might have undergone a second round of selection for further increase in fruit size and improvement of other agronomic traits. Jointly, the domestication and improvement sweeps occupied 111.0 Mb (14.2% of the assembled genome).

Five QTLs (Supplementary Table 11) related to fruit mass (fw1.1, fw5.2, fw7.2, fw12.1 and lcn12.1) are located within the domestication sweeps and likely contributed to the enlargement of tomato fruits during the evolutionary transition from PIM to CER lines (Fig. 2a).
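The sweep scan described above can be sketched as a ranking of genomic windows by a diversity ratio. The example assumes that π has already been computed for the same windows in two groups (for example PIM and CER) and flags the top 5% of ratios, as in the study. The window definition, the handling of windows with zero diversity and all names are assumptions made for illustration, not details taken from the paper.

```python
def candidate_sweeps(pi_ancestral, pi_derived, top_fraction=0.05):
    """Flag windows with the largest pi_ancestral / pi_derived ratios.

    pi_ancestral, pi_derived: per-window diversity in the two groups,
    in the same window order. Windows with zero diversity in the derived
    group are skipped here (an assumption of this sketch).
    Returns (threshold_ratio, list_of_window_indices).
    """
    if len(pi_ancestral) != len(pi_derived):
        raise ValueError("both groups must use the same windows")
    ratios = [(i, a / d)
              for i, (a, d) in enumerate(zip(pi_ancestral, pi_derived))
              if d > 0]
    if not ratios:
        return None, []
    ranked = sorted((r for _, r in ratios), reverse=True)
    n_keep = max(1, int(len(ranked) * top_fraction))
    threshold = ranked[n_keep - 1]
    return threshold, [i for i, r in ratios if r >= threshold]


# Toy data: 20 windows, one of which lost most of its diversity.
pi_pim = [3.0e-3] * 20
pi_cer = [1.5e-3] * 20
pi_cer[7] = 0.3e-3
threshold, sweep_windows = candidate_sweeps(pi_pim, pi_cer)
```

Adjacent flagged windows would then be merged into sweep regions before counting them or intersecting them with QTL positions.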


Figure 3 A major genomic signature of modern processing tomatoes and three causative variants for pink fruit. (a) FST values for all SNP sites between tomatoes for fresh consumption and modern processing tomatoes. Blue dots above the horizontal dashed line indicate highly divergent SNPs (top 1%; FST = 0.4464). Three SSC QTLs and a firmness QTL on chromosome 5 are indicated. (b) Manhattan plot of the GWAS for fruit color using the compressed mixed linear model (MLM). A significantly associated SNPy is identified upstream of the SlMYB12 (y) gene. (c) The structure of SlMYB12 and the positions of SNPy, the 603-bp deletion and the two other deleterious mutations. Exons are depicted as black blocks. The numbers of tomato accessions with the six genotypes (I–VI) are given. Note that genotypes II and IV are recombinants between SNPy and the 603-bp deletion.
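Per-SNP FST, as plotted in panel a, can be computed in several ways, and the estimator used in the study is not stated in this section. The sketch below uses a simple textbook form, FST = (HT − HS)/HT, from the allele frequencies and sample sizes of two groups. The function name and inputs are hypothetical, and the group sizes in the toy calls merely echo the 22 processing and 144 other BIG accessions mentioned in the text.

```python
def fst_per_site(p1, n1, p2, n2):
    """Simple FST for one biallelic SNP in two groups.

    p1, p2: frequency of the same allele in each group.
    n1, n2: number of accessions in each group (used as weights).
    """
    total = n1 + n2
    p_bar = (n1 * p1 + n2 * p2) / total       # pooled allele frequency
    h_total = 2.0 * p_bar * (1.0 - p_bar)     # expected heterozygosity, pooled
    if h_total == 0.0:
        return 0.0                            # monomorphic site
    # Mean within-group expected heterozygosity, weighted by group size.
    h_within = (n1 * 2.0 * p1 * (1.0 - p1)
                + n2 * 2.0 * p2 * (1.0 - p2)) / total
    return (h_total - h_within) / h_total


# Toy calls: a site fixed for different alleles, and a site with equal frequencies.
fixed_difference = fst_per_site(1.0, 22, 0.0, 144)
no_difference = fst_per_site(0.3, 22, 0.3, 144)
```

Sites would then be ranked by this value and the top 1% taken as highly divergent, which is the thresholding step the caption describes.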
Figure 3c genotypes at SlMYB12 (Y), Chr. 1, 71.246–71.259 Mb:
Genotype  Fruit color  Accessions  SNPy  603-bp deletion  TG>TAG site  CAA>TAA site
I         Red          204         C     Absent           TG           CAA
II        Red          1           A     Absent           TG           CAA
III       Pink         117         A     Present          TG           CAA
IV        Pink         1           C     Present          TG           CAA
V         Pink         1           C     Absent           TG           TAA
VI        Pink         3           A     Absent           TAG          CAA

For instance, the fw12.1 QTL resides in a domestication sweep spanning the telomeric region of the short arm of chromosome 12 (Fig. 2c). Its signature marker, TG180, is physically close to the Solyc12g005310 gene that encodes a putative auxin-responsive GH3-like protein with predominant expression in flower buds, making it a logical candidate for the gene corresponding to fw12.1 (ref. 26). We detected 13 QTLs (Supplementary Table 11) located within improvement sweeps that might have contributed to the second round of fruit enlargement during the CER-to-BIG transition (Fig. 2b). Remarkably, a major improvement sweep spanned a 10.3-Mb region at the distal end of the long arm of chromosome 2, where two fruit mass QTLs (fw2.2 and lcn2.1) were cloned and three others (fw2.1, fw2.3 and lcn2.2) were mapped (Fig. 2d). Selection for these QTLs in the BIG group might be causative of the low genetic diversity in this region (π = 0.22 × 10⁻³ in the 10-Mb region versus 0.73 × 10⁻³ in the whole genome).

The gene fw2.2 controls carpel cell number and contributes substantially to the evolution of tomato fruit mass and size24. However, its causative variation remains undetermined. We exploited the SNP data set to perform a local association study around fw2.2 for allelic variation conferring the phenotypic change. A SNP in the promoter region (−912 bp relative to the start codon) showed a signal (P < 1 × 10⁻³) that was almost fixed in BIG accessions (97.3%) but not in CER accessions (66.7%). In PIM accessions, it was a minor allele (2.6%). Taking this finding together with the fact that fw2.2 is not located within a domestication sweep, we infer that fw2.2 is more likely an improvement rather than a domestication gene. The same held true for two other cloned QTLs: lcn2.1, which contributes to increased locule number25, and fw3.2, which corresponds to a cytochrome P450 gene controlling fruit cell number22. The SNP for fw2.2 could be developed as a marker for selecting recombinants to break apart the improvement sweep and to introduce new variations into this region for modern tomato breeding.

To further verify improvement sweeps related to fruit mass, we sequenced 2 bulk populations with extreme fruit size, each consisting of 50 progenies from an F2 population of 500 individuals from a cross between the CER and BIG lines, to a depth of 50× (Fig. 2e, Supplementary Fig. 6 and Supplementary Note). We called SNPs between two parental genomes, and we computed the SNP indices for the big-fruited and small-fruited bulk populations as well as their differences (∆SNP index) using a 1,000-kb sliding window with a step size of 10 kb. This analysis led to the identification of four genomic regions contributing to fruit mass, all of which overlapped with previously identified improvement sweeps, i.e., the chromosome 2 distal end carrying fw2.1, fw2.2, fw2.3, lcn2.1 and lcn2.2, both distal ends of chromosome 9 carrying fw9.1 and fw9.3, and the distal end of the long arm of chromosome 11 carrying fw11.1, fw11.2 and fw11.3 (Fig. 2f,g). To summarize, we propose a two-step evolution of fruit mass that involved two different sets of loci (Fig. 2h), which jointly gave rise to modern tomatoes about 100 times larger in fruit size than their wild ancestor.

Divergence in big-fruited tomatoes
After domestication in South America, tomatoes were dispersed to other parts of the world and selected by local farmers and breeders. In general, big-fruited tomatoes were bred for fresh consumption or for processing into tomato paste. Modern processing tomatoes have several characteristic traits, including determinate growth for homogeneous fruit setting and harvest27, jointless pedicel28, increased firmness for mechanical harvest, and higher soluble solid content (SSC) and lycopene content for processing quality. However, the genome-wide genetic basis underlying the divergence between tomatoes for fresh consumption and processing tomatoes was not previously studied. To search for SNPs underlying this divergence, we computed the population differentiation statistic (FST) of each SNP site for 22 modern processing accessions and the remaining 144 BIG accessions. We observed a non-random distribution of highly divergent sites (the top 1% had FST ≥ 0.4464; the genome average was 0.07). Intriguingly, 90.53% (63,009 of 69,603) of these sites resided on chromosome 5 (Fig. 3a), spanning the majority of the chromosome (from 3.5 to 62.8 Mb). We note that a previous study identified three SSC QTLs (ssc5.1, ssc5.2 and ssc5.3) located on the short arm, in the centromeric region and on the long arm of chromosome 5, respectively29. A major fruit firmness QTL, fir5.1, also resides in the centromeric region of chromosome 5 (ref. 30). In addition, the chromosome has a large centromere with a length of ~50 Mb, extending from 10 to 60 Mb on the assembled chromosome14. Therefore, selection of the QTLs for higher SSC and better fruit firmness likely resulted in the hitchhiking of almost the entire chromosome 5, representing a genomic signature of modern processing tomatoes.

Red-fruited tomatoes are widely consumed, but pink-fruited tomatoes are especially popular in China and Japan31. Two independent studies31,32 demonstrated that the pink gene y on chromosome 1 corresponds to SlMYB12, which controls the accumulation of yellow-colored flavonoid (naringenin chalcone) in the tomato fruit epidermis (peels from pink fruit are colorless owing to the absence




Figure 4 Introgressions and sweeps. The introgression fragments from different wild relatives of the 27 accessions (10 inbreeding lines (I), 12 fresh market hybrids (F) and 5 processing hybrids (P)) are displayed with colored bars. Light-blue bars depict the chromosomes. The nucleotide diversities of the PIM, CER and BIG groups are depicted as green, orange and blue lines, respectively. The orange and green bars above the chromosomes denote the identified domestication and improvement sweeps, respectively. The locations of fruit mass QTLs are marked by red lines. AgpL1, Mi-1, Ty-1, Tm-2a and Sw-5 represent genes encoding the ADP-Glc pyrophosphorylase large subunit, the root knot nematode resistance gene, the tomato yellow leaf curl virus resistance gene, the tomato mosaic virus resistance gene and the tomato spotted wilt virus resistance gene, respectively.

[Figure 4 legend. Introgression donors: AgpL1, S. habrochaites; Mi-1, S. peruvianum PI 128657; Ty-1, S. chilense LA1969; Tm-2a, S. peruvianum PI 128650; Sw-5, S. peruvianum PI 126935. Accessions: I, TS-11, TS-12, TS-40, TS-52, TS-151, TS-175, TS-210, TS-211, TS-272, TS-400; F, TS-305 to TS-316; P, TS-317 to TS-321. Chromosomes shown: Chr. 1 (fw1.1, fw1.2), Chr. 6 (fw6.2) and Chr. 9 (fw9.1, fw9.3). Scale bar, 10 Mb.]

of flavonoid); however, the causative variant remains unknown. To identify the allelic variation underlying this phenotype, we performed a genome-wide association study (GWAS) using 231 tomato accessions with known phenotypes. The strongest association signal (SNPy, P < 1 × 10⁻³²) resided 8,616 bp upstream of the start codon of SlMYB12 (Fig. 3b). We further analyzed the upstream and downstream sequences of the target gene for structural variants and discovered a 603-bp deletion in the upstream region of SlMYB12 (−4,865 bp relative to the start codon) that was present in most pink-fruited accessions (Fig. 3c). Among 205 red-fruited accessions and 122 pink-fruited ones (96 additional pink-fruited accessions were added for this analysis), the 603-bp deletion coincided with the phenotype in all but 4 accessions. We sequenced the genic regions of SlMYB12 in these four accessions and identified two nonsense mutations (a nucleotide substitution (C>T) and a 1-bp insertion (TG>TAG)), both resulting in the introduction of premature stop codons. It is noteworthy that the 603-bp deletion was more diagnostic than SNPy, as two recombinants were found (genotypes II and IV in Fig. 3c). We hypothesize that the deletion might impair the transcription of SlMYB12, whose expression is silenced in pink fruits32. The silencing of SlMYB12 likely relaxed purifying selection in the coding sequence, which could accumulate more deleterious mutations as observed. The three recessive alleles of the y gene represent useful markers for pink tomato breeding.

Wild introgressions
Domestication and improvement have increased tomato productivity as well as narrowed its genetic basis. In recent decades, wild germplasm has increasingly been used as a source of new alleles for tomato breeding. These efforts rely largely on the pioneering research and comprehensive germplasm collections of Charles Rick at the University of California, Davis33. As an example, resistance genes (R genes) introgressed from wild species are necessary for the success of modern commercial cultivars. To determine how introgression changed the tomato genome, we scanned all accessions in the CER and BIG lines and the 17 F1 hybrids for genome regions similar to those in wild accessions (Fig. 4). We detected a large exotic fragment on chromosome 9 (51.7–54.7 Mb in length) carrying the tomato mosaic virus resistance gene Tm-2a derived from Solanum peruvianum34 (PI 128650). In addition, there were two major introgressions on chromosome 6: one (26.6–27.7 Mb in length) carrying the root knot nematode resistance gene Mi-1 from S. peruvianum35 (PI 128657) and the other (30.9–32.5 Mb in length) carrying the tomato yellow leaf curl virus resistance gene Ty-1 from Solanum chilense36 (LA1969). Both introgressions occupied nearly the same genomic region, making it difficult to recombine both genes into a single cultivar. Even after multiple generations of backcrossing, these introgressed fragments remain intact, possibly owing to chromosomal inversions or a centromeric location that would inhibit recombination, as shown in the case of Ty-1 and Mi-1 (refs. 36,37). An introgression from Solanum habrochaites carrying the AgpL1 gene on chromosome 1, which enhances SSC in mature fruits, was observed in four modern processing hybrids (Fig. 4). Understanding the precise position and size of these large wild introgressions will enable the deployment of molecular markers to minimize the limitation from linkage drag and maximize the potential of wild germplasm.

Intriguingly, introgressions carrying resistance genes showed relatively low overlap with the genomic locations of domestication and improvement sweeps (Fig. 4). Only one 4.1-Mb region out of the 92.2 Mb of introgressions (Mi-1, Ty-1, Tm-2a, Sw-5 and AgpL1)


overlapped with the domestication and improvement sweeps (4.5% AUTHOR CONTRIBUTIONS
in comparison to the genome level of 14.2%; P = 0.01), indicating that introgressions are less likely to have occurred within swept regions. For the three cloned fruit mass genes (fw2.2, fw3.2 and lcn2.1), gene action was either recessive or additive (refs. 22,24,25), requiring both ‘big’ alleles for full phenotypic penetration. This could also be true for other domestication and improvement genes (ref. 38). Therefore, it may prove difficult to introgress new alleles into swept regions, implying that domestication and improvement bear a cost in terms of potential future improvement.

DISCUSSION
The genomic foundation for modern tomato breeding was shaped by human-involved selection, as illustrated in this study. Despite their historical contribution to desirable phenotypic traits, these human-induced processes also resulted in the near fixation of a large proportion of the tomato genome. As shown, the domestication and improvement sweeps and linkage drags associated with introgression jointly occupy nearly 200 Mb (25.6% of the assembled genome), limiting further improvement via conventional breeding. The genome sequence (ref. 14) and the variation map generated here will facilitate the separation of genes for favorable traits from their embedded sweeps and linkage drags by variome-guided selection for rare recombination or possibly by genome editing. These efforts should enable a redesign of the genomic foundation for future tomato breeding.

URLs. The SNPs from this study can be viewed in a genome browser at http://solgenomics.net/jbrowse/JBrowse-1.11.4/?data=data/json/tomato_variants. Food and Agriculture Organization of the United Nations (FAO) statistics, http://faostat.fao.org/; Tomato Heinz 1706 genome, ftp://ftp.sgn.cornell.edu/genomes/Solanum_lycopersicum/; SOAP software, http://soap.genomics.org.cn; PHYLIP software, http://evolution.genetics.washington.edu/phylip.html; STRUCTURE software, http://pritchardlab.stanford.edu/structure.html.

Methods
Methods and any associated references are available in the online version of the paper.

Accession codes. The sequence data have been deposited in the NCBI Sequence Read Archive (SRA) under accession SRP045767.

Note: Any Supplementary Information and Source Data files are available in the online version of the paper.

Acknowledgments
We thank J. Maloof (University of California, Davis) for providing tomato RNA sequencing data and L.A. Mueller and N. Menda (Cornell University) for setting up a genome browser of SNPs. This work was supported by funding from the National Program on Key Basic Research Projects in China (973 program; 2012CB113900 and 2011CB100600), the National Science Fund for Distinguished Young Scholars (31225025 to S.H.), the National HighTech Research Development Program in China (863 Program; 2012AA100101 and 2012AA100105), the National Natural Science Foundation of China (31272160, 31230064, 31272171 and 31171962), the Chinese Ministry of Finance (1251610601001), CAAS (an Agricultural Science and Technology Innovation Program grant to S.H.), the China Agriculture Research System (CARS-25-A-09 and CARS-25-A-15), the Special Fund for Agro-Scientific Research in the Public Interest of China (201303115), the Major Special Science and Technology Project during the Twelfth Five-Year Plan Period of Xinjiang (201230116-3) and the US National Science Foundation Plant Genome Program (IOS-0923312). This work was also supported by the Shenzhen municipal and Dapeng district governments.

AUTHOR CONTRIBUTIONS
S.H., Y.D., Z.Y. and Jingfu Li conceived and designed the research. T.L., G.Z., J.Z., X.X., Q.Y., Z. Zheng, Y.L., S.L., T.W. and Yuyang Zhang performed DNA sequencing and biological experiments. T.L., G.Z., Z. Zhang, K.L., Yancong Zhang, C.L., Y.X., X.W., Z.H., D.Z., Junming Li, G.X., C.Z., A.M., M.C., Z.F., J.J.G., R.T.C., A.W. and T.S. performed the data analysis. S.H., G.Z., T.L., J.Z., X.X., Q.Y. and Z. Zhang wrote the manuscript. Y.D., Z.Y., Jingfu Li, Z. Zhang, C.L., Y.X., A.M., M.C., Z.F., J.J.G., R.T.C., D.Z. and T.S. revised the manuscript.

COMPETING FINANCIAL INTERESTS
The authors declare no competing financial interests.

Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

1. Borlaug, N.E. Contributions of conventional plant breeding to food production. Science 219, 689–693 (1983).
2. Diamond, J.M. Guns, Germs, and Steel (W.W. Norton & Company, New York, 1997).
3. Doebley, J.F., Gaut, B.S. & Smith, B.D. The molecular genetics of crop domestication. Cell 127, 1309–1321 (2006).
4. Gross, B.L. & Olsen, K.M. Genetic perspectives on crop domestication. Trends Plant Sci. 15, 529–537 (2010).
5. Huang, X. et al. A map of rice genome variation reveals the origin of cultivated rice. Nature 490, 497–501 (2012).
6. Hufford, M.B. et al. Comparative population genomics of maize domestication and improvement. Nat. Genet. 44, 808–811 (2012).
7. Lam, H.M. et al. Resequencing of 31 wild and cultivated soybean genomes identifies patterns of genetic diversity and selection. Nat. Genet. 42, 1053–1059 (2010).
8. Qi, J. et al. A genomic variation map provides insights into the genetic basis of cucumber domestication and diversity. Nat. Genet. 45, 1510–1515 (2013).
9. Vincent, H. et al. A prioritized crop wild relative inventory to help underpin global food security. Biol. Conserv. 167, 265–275 (2013).
10. Meissner, R. et al. A new model system for tomato genetics. Plant J. 12, 1465–1472 (1997).
11. Ranc, N., Muños, S., Santoni, S. & Causse, M. A clarified position for Solanum lycopersicum var. cerasiforme in the evolutionary history of tomatoes (Solanaceae). BMC Plant Biol. 8, 130 (2008).
12. Jenkins, J. The origin of the cultivated tomato. Econ. Bot. 2, 379–392 (1948).
13. Rick, C.M. Hybridization between Lycopersicon esculentum and Solanum pennellii: phylogenetic and cytogenetic significance. Proc. Natl. Acad. Sci. USA 46, 78–82 (1960).
14. Tomato Genome Consortium. The tomato genome sequence provides insights into fleshy fruit evolution. Nature 485, 635–641 (2012).
15. Sim, S.C. et al. High-density SNP genotyping of tomato (Solanum lycopersicum L.) reveals patterns of genetic variation due to breeding. PLoS ONE 7, e45520 (2012).
16. Spooner, D.M., Peralta, I.E. & Knapp, S. Comparison of AFLPs with other markers for phylogenetic inference in wild tomatoes (Solanum L. section Lycopersicon (Mill.) Wettst.). Taxon 54, 43–61 (2005).
17. Rick, C. & Holle, M. Andean Lycopersicon esculentum var. cerasiforme: genetic variation and its evolutionary significance. Econ. Bot. 44, 69–78 (1990).
18. Tajima, F. Evolutionary relationship of DNA sequences in finite populations. Genetics 105, 437–460 (1983).
19. Bolger, A. et al. The genome of the stress-tolerant wild tomato species Solanum pennellii. Nat. Genet. 46, 1034–1038 (2014).
20. Blanca, J. et al. Variation revealed by SNP genotyping and morphology provides insight into the origin of the tomato. PLoS ONE 7, e48198 (2012).
21. Gutenkunst, R.N., Hernandez, R.D., Williamson, S.H. & Bustamante, C.D. Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 5, e1000695 (2009).
22. Chakrabarti, M. et al. A cytochrome P450 regulates a domestication trait in cultivated tomato. Proc. Natl. Acad. Sci. USA 110, 17125–17130 (2013).
23. Grandillo, S., Ku, H. & Tanksley, S. Identifying the loci responsible for natural variation in fruit size and shape in tomato. Theor. Appl. Genet. 99, 978–987 (1999).
24. Frary, A. et al. fw2.2: a quantitative trait locus key to the evolution of tomato fruit size. Science 289, 85–88 (2000).
25. Muños, S. et al. Increase in tomato locule number is controlled by two single-nucleotide polymorphisms located near WUSCHEL. Plant Physiol. 156, 2244–2254 (2011).
26. Liu, K. et al. A GH3-like gene, CcGH3, isolated from Capsicum chinense L. fruit is regulated by auxin and ethylene. Plant Mol. Biol. 58, 447–464 (2005).
27. Pnueli, L. et al. The SELF-PRUNING gene of tomato regulates vegetative to reproductive switching of sympodial meristems and is the ortholog of CEN and TFL1. Development 125, 1979–1989 (1998).
28. Mao, L. et al. JOINTLESS is a MADS-box gene controlling tomato flower abscission zone development. Nature 406, 910–913 (2000).
29. Tanksley, S.D. et al. Advanced backcross QTL analysis in a cross between an elite processing line of tomato and its wild relative L. pimpinellifolium. Theor. Appl. Genet. 92, 213–224 (1996).

 aDVANCE ONLINE PUBLICATION Nature Genetics


30. Xu, J. et al. Phenotypic diversity and association mapping for fruit quality traits in cultivated tomato and related species. Theor. Appl. Genet. 126, 567–581 (2013).
31. Ballester, A.R. et al. Biochemical and molecular analysis of pink tomatoes: deregulated expression of the gene encoding transcription factor SlMYB12 leads to pink tomato fruit color. Plant Physiol. 152, 71–84 (2010).
32. Adato, A. et al. Fruit-surface flavonoid accumulation in tomato is controlled by a SlMYB12-regulated transcriptional network. PLoS Genet. 5, e1000777 (2009).
33. Rick, C.M. The tomato. Sci. Am. 239, 76–87 (1978).
34. Tanksley, S.D. et al. Yield and quality evaluations on a pair of processing tomato lines nearly isogenic for the Tm2a gene for resistance to the tobacco mosaic virus. Euphytica 99, 77–83 (1998).
35. Kaloshian, I. et al. Genetic and physical localization of the root-knot nematode resistance locus Mi in tomato. Mol. Gen. Genet. 257, 376–385 (1998).
36. Verlaan, M.G. et al. The Tomato Yellow Leaf Curl Virus resistance genes Ty-1 and Ty-3 are allelic and code for DFDGD-class RNA-dependent RNA polymerases. PLoS Genet. 9, e1003399 (2013).
37. Seah, S., Yaghoobi, J., Rossi, M., Gleason, C.A. & Williamson, V.M. The nematode-resistance gene, Mi-1, is associated with an inverted chromosomal segment in susceptible compared to resistant tomato. Theor. Appl. Genet. 108, 1635–1642 (2004).
38. Schauer, N. et al. Mode of inheritance of primary metabolic traits in tomato. Plant Cell 20, 509–523 (2008).


ONLINE METHODS
Plant materials and sequencing. A total of 360 tomato accessions were collected from TGRC (Tomato Genetics Resource Center), USDA (US Department of Agriculture), EU-SOL (European Union Solanaceae project), INRA (National Institute for Agricultural Research) and IVF-CAAS (Institute of Vegetables and Flowers, Chinese Academy of Agricultural Science). These accessions included 10 wild tomato accessions (1 S. habrochaites, 3 S. cheesmaniae, 1 S. galapagense, 3 S. peruvianum, 1 Solanum neorickii and 1 S. chilense), 53 PIM accessions, 112 CER accessions, 166 BIG accessions (2 accessions were excluded for extreme phenotype segregation) and 17 modern commercial hybrids (F1) (Supplementary Table 1). Tomato plants were grown in the greenhouses of IVF-CAAS, Beijing, and the Huazhong Agricultural University, Wuhan, China. Genomic DNA was extracted from young leaves using the cetyltrimethylammonium bromide (CTAB) method (ref. 39). At least 5 µg of genomic DNA was used for each accession to construct paired-end sequencing libraries with insert sizes of approximately 500 bp according to the manufacturer’s instructions (Illumina). We generated more than 5 Gb of sequence data for each accession with 100-bp paired-end reads using the Illumina HiSeq 2000 platform.

Mapping and variation calling. To call SNPs, we used SOAP2 (ref. 40) to map all the sequence reads from each accession to the tomato reference genome (ref. 14; release SL2.40) with the following parameters: -m 100, -x 888, -s 35, -l 32, -v 3. Mapped reads were filtered to remove PCR duplicates, assigned to the chromosomes and sorted according to mapping coordinates. Both paired-end and single-end mapped reads were then used for SNP detection throughout the entire collection of tomato accessions via the following procedures. Scripts and analysis pipelines for the SNP data set are available in Supplementary Data Set.
To identify SNPs for each genotype, we identified possible SNPs for each accession relative to the reference using SOAPsnp (ref. 41) with the following parameters: -L 100 -u -F 1. The likelihood of each individual’s genotype in glf format was then generated for each chromosome with SNP quality ≥40 and base quality ≥40. To integrate SNPs across the entire collection, we called each SNP using GLFmulti on the basis of the maximum-likelihood estimation of site frequency. The core set of SNPs was then obtained by filtering according to site frequency and the quality score given by GLFmulti. These SNPs were further filtered using the following criteria: (i) one position with more than two alleles was considered to be a polymorphic site in the population; (ii) the total sequencing depth had to be >150× and <3,500× and the SNP quality value had to be greater than 40; (iii) positions with an average mapping rate of reads of less than 1.5 were retained to rule out the effect of duplications; and (iv) the nearest SNPs had to be more than 1 bp away.
To obtain the final set of SNPs, we performed filtering on the basis of segregation tests and the proportion of homozygosity. Segregation tests can distinguish any segregation pattern from random sequencing errors on the basis of the sequencing depth of the two putative alleles in different individuals. We thus performed segregation tests on the contingency table of read depth for SNP alleles from the 360 tomato accessions. Permutations were used to determine the significance of allele depth in the population, and only sites with P < 0.01 were retained. In addition, we filtered out sites at which fewer than 85% of the lines appeared to be homozygous and sites with a proportion of heterozygous genotypes greater than three times that of the homozygous genotypes with the minor allele.
Finally, to detect small indels (≤5 bp in length), we mapped all the sequence reads from each accession with a gap of less than 5 bp allowed (parameter -g 5) using SOAP2 (ref. 40). Indels (1–5 bp) were called by the SOAPindel pipeline (see URLs).

Annotation of SNPs. The identified SNPs were further categorized as variations in intergenic regions, UTRs, coding sequences and introns according to the tomato genome annotation (release ITAG2.3). SNPs in coding sequences were further grouped into synonymous SNPs (not causing amino acid changes) and nonsynonymous SNPs (causing amino acid changes) (Supplementary Table 2).

Evaluation of SNPs. We used Sanger sequencing and previously released SNP array data to evaluate the accuracy of SNPs. First, we randomly selected 349 SNPs from 3 tomato accessions (PIM, TS-244; CER, TS-252; BIG, TS-137) for validation by PCR and Sanger sequencing. Second, we compared 285,508 SNPs for 48 tomato accessions identified in this study to previously published tomato SNP array data (ref. 15; Supplementary Tables 4 and 5, and Supplementary Note).

Phylogenetic analysis and population structure of tomato. To build a neighbor-joining tree, we screened a subset of 20,111 SNPs at fourfold-degenerate sites (MAF > 5% and missing data < 10%) in the 341 tomato accessions (excluding the F1 individuals) from the entire SNP data set. These SNPs should be under less selective pressure, thus more reliably reflecting population structure and demography. We constructed a phylogenetic tree using PHYLIP (ref. 42; version 3.695) with 100 bootstrap replicates. Using the same data set, we also investigated the population structure using STRUCTURE (ref. 43; version 2.3.1) on the basis of allele frequencies. To determine the most likely group number, STRUCTURE was run 20 times on 1,000 randomly selected SNPs at fourfold-degenerate sites for each K value from 2 to 19 (Supplementary Fig. 2). After determining ΔK, we used 20,111 SNPs at fourfold-degenerate sites to determine the group membership of each accession by 10,000 iterations with K values from 2 to 4. In addition, we performed principal-component analysis (PCA; ref. 44) using 2,340,973 SNPs across the genome (MAF > 10%, missing < 5%). Two-dimensional coordinates were plotted for the 331 tomato accessions (excluding 10 wild accessions) (Supplementary Fig. 3).

Demographic analysis of tomato evolution history. The best parameters for fitting were estimated using δaδi (ref. 21). We fitted the three-population model with PIM and BIG mixed together (Supplementary Table 6) for all three groups. The simulation was carried out 20 times, and each time we randomly selected 500,000 SNPs and estimated 95% confidence intervals on the basis of the best fitting parameters. The parameters inferred by δaδi were scaled by $2N_e$, with $N_e$ being the ancestral population size. We estimated the ancestral population size using the formula $4N_e \times \mu \times L = \theta$, where $\mu$ is the mutation rate, $L$ is the generation time and $\theta$ is the genetic diversity. We used $\theta_{PIM}$ to estimate $\theta$ (set to ~$3.23 \times 10^{-3}$). Here the neutral mutation rate of $1 \times 10^{-8}$ (ref. 45) was used for $\mu$ (the mutation rate per generation). Thus, $2N_e$ was estimated to be $1.615 \times 10^{5}$. All the parameters were then scaled by $2N_e$ to estimate time in years and the population size in number of individuals.

Detection of domestication and improvement sweeps. Nucleotide diversity ($\pi$) is often applied to measure the degree of variability in a group (ref. 18). To identify genomic regions affected by domestication and improvement, two key stages in tomato evolution, we first measured the level of genetic diversity ($\pi$) using a 100-kb window with a step size of 10 kb in PIM, CER and BIG. The regions affected by domestication should have substantially lower diversity in CER than in PIM. Improvement sweeps should show a much stronger reduction of diversity in BIG in comparison to CER. If $\pi_{PIM}$ for a window was lower than 0.002 in domestication analysis and $\pi_{CER}$ was lower than 0.001 in improvement analysis, then the window was excluded. By scanning the ratios of genetic diversity between PIM and CER ($\pi_{PIM}/\pi_{CER}$) as well as between CER and BIG ($\pi_{CER}/\pi_{BIG}$), we selected windows with the top 5% of ratios (3.0 and 6.9 for domestication and improvement, respectively) as candidate regions for further analysis. Finally, windows that were ≤100 kb apart were merged into a single selected region (Supplementary Tables 7–10). To verify the empirical thresholds with low false discovery rate, we performed whole-genome permutation tests to ascertain the thresholds for identifying domestication and improvement sweeps (Supplementary Note). The regions shared by domestication and improvement sweeps were defined as overlapping regions, and these regions should have undergone further selection during improvement. In our study, we analyzed almost all the fruit mass QTLs and genes that segregated between PIM parents and cultivated parents. If closely linked markers or the mapped intervals were located in domestication (improvement) sweeps, we considered them to be candidate domestication (improvement) QTLs or genes (Supplementary Table 11).

Linkage disequilibrium analysis. LD values for PIM, CER and BIG were calculated on the basis of SNPs (MAF > 0.05) using Haploview software (ref. 46). The parameters were as follows: -n -pedfile -info -log -minMAF 0.05 -hwcutoff 0 -dprime -memory 2096. LD decay was calculated on the basis of the $r^2$ value and corresponding distance between two SNPs (Supplementary Fig. 5).
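The window scan described under "Detection of domestication and improvement sweeps" can be illustrated with a minimal Python sketch. It is not the authors' pipeline (their scripts are in the Supplementary Data Set). The function name `candidate_sweeps`, the tuple layout and the handling of zero diversity are assumptions made for illustration, and the cutoff is recomputed from the input as the top 5% of ratios rather than fixed at 3.0 or 6.9.

```python
from typing import List, Tuple

# (start, end, pi_ancestor, pi_derived), e.g. pi_PIM and pi_CER for the
# domestication scan. This layout is an assumption of the sketch.
Window = Tuple[int, int, float, float]


def candidate_sweeps(
    windows: List[Window],
    min_pi_ancestor: float,
    top_fraction: float = 0.05,
    max_gap: int = 100_000,
) -> List[Tuple[int, int]]:
    """Return merged candidate regions from per-window diversity values."""
    # Exclude windows whose ancestral diversity is below the cutoff
    # (0.002 for pi_PIM, 0.001 for pi_CER in the text).
    kept = [w for w in windows if w[2] >= min_pi_ancestor]
    if not kept:
        return []

    # Diversity ratio ancestor/derived; a derived diversity of zero is
    # treated as an infinite ratio (an assumption of this sketch).
    ratios = [w[2] / w[3] if w[3] > 0 else float("inf") for w in kept]

    # Empirical cutoff: the ratio of the window ranked at the top fraction.
    n_top = max(1, int(len(kept) * top_fraction))
    cutoff = sorted(ratios, reverse=True)[n_top - 1]
    selected = sorted((w[0], w[1]) for w, r in zip(kept, ratios) if r >= cutoff)

    # Merge selected windows separated by at most max_gap bp.
    merged = [selected[0]]
    for start, end in selected[1:]:
        last_start, last_end = merged[-1]
        if start - last_end <= max_gap:
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged
```

For the domestication scan one would pass per-window values of π for PIM and CER with `min_pi_ancestor=0.002`; for the improvement scan, values for CER and BIG with `min_pi_ancestor=0.001`.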

Nature Genetics doi:10.1038/ng.3117
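The ancestral population size quoted in the demographic analysis can be checked by solving 4Ne × µ × L = θ for 2Ne. The short Python sketch below does this with the values given in the text; the function name is hypothetical, and the generation time L is set to 1 as an assumption, which is the value that reproduces the reported estimate.

```python
def scaled_ancestral_size(theta: float, mu: float, generation_time: float = 1.0) -> float:
    """Solve 4 * Ne * mu * L = theta for 2 * Ne."""
    return theta / (2.0 * mu * generation_time)


# theta_PIM and the neutral mutation rate as given in the text;
# generation_time = 1 is an assumption of this sketch.
two_ne = scaled_ancestral_size(theta=3.23e-3, mu=1e-8)
print(f"2Ne = {two_ne:.4g}")
```

With θ = 3.23 × 10^−3 and µ = 1 × 10^−8 this gives 2Ne = 1.615 × 10^5, in agreement with the value stated in the methods.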


Bulked segregant analysis of the F2 population by whole-genome resequencing. We planted an F2 population of 500 individuals derived from the cross between TS-400 (a big-fruit accession) and TS-my (a small-fruit accession) in the fall of 2013 in IVF-CAAS, China. For each individual, the average weight of approximately ten representative fruits was recorded (Fig. 2e, Supplementary Fig. 6 and Supplementary Note) and genomic DNA was isolated from fresh leaves using the CTAB method. For bulked segregant analysis, bulk DNA samples for big- and small-fruit accessions were constructed by mixing equal amounts of DNA from 50 individuals showing extremely big and small fruits, respectively. Roughly 20× genome sequences for each parent (TS-400 and TS-my) and 50× data for each bulk sample (big fruit and small fruit) were generated. Short reads were aligned against the reference genome (release SL2.40) using the Burrows-Wheeler Aligner (BWA; ref. 47), and SNPs were identified using SAMtools (ref. 48). SNPs between two parental genomes were identified for further analysis when the base quality value was ≥20 and the SNP quality value was ≥20. On the basis of these criteria and the number of SNPs with read depth from 4 to 200, a SNP index was calculated for both bulk samples expressing the proportion of reads harboring SNPs that were identical to those in the big-fruit parent (TS-400). A ΔSNP index was obtained by subtracting the SNP index for the small-fruit bulk sample from that for the big-fruit bulk sample. An average SNP index for the big-fruit and small-fruit bulk samples was calculated using a 1,000-kb sliding window with a step size of 10 kb (Fig. 2f,g). We also calculated the statistical confidence intervals of the ΔSNP index under the null hypothesis of no QTLs. For each position, the 95% confidence intervals of the ΔSNP index were obtained following the method described in Takagi et al. (ref. 49).

Genome-wide association studies for fruit color. We used 10,990,318 high-quality SNPs (MAF > 0.05) to perform GWAS for fruit color in 231 accessions (205 red- and 26 pink-fruit accessions). The association analyses were performed using the compressed MLM (refs. 50,51) (Fig. 3b) with TASSEL 4.0 (ref. 52). To further detect the causative variant in the significantly associated region (chromosome 1: 71,229,871–71,258,882), we analyzed the discordant paired-end reads between pink and red tomatoes by aligning the resequenced reads of 20 pink- and 20 red-fruit accessions against the reference genome (release SL2.40) using BWA and SAMtools. We further checked the variants in 205 red- and 122 pink-fruit accessions from our tomato germplasm and IVF-CAAS accessions by direct PCR and Sanger sequencing.

Detection of introgression status from wild germplasm in cultivars. Some wild accessions harbor important loci, including disease resistance genes (R genes), and have been widely applied in modern tomato breeding programs. To detect introgressions in modern cultivars, we analyzed the introgression pattern by calculating the ratio of identical SNPs between cultivated accessions (including 112 CER, 166 BIG and 17 F1 accessions) and wild donor accessions (Fig. 4). The ratio from each chromosome was plotted in a 100-kb sliding window with a step size of 10 kb.

39. Gawel, N. & Jarret, R. A modified CTAB DNA extraction procedure for Musa and Ipomoea. Plant Mol. Biol. Rep. 9, 262–266 (1991).
40. Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009).
41. Li, R. et al. SNP detection for massively parallel whole-genome resequencing. Genome Res. 19, 1124–1132 (2009).
42. Felsenstein, J. PHYLIP - phylogeny inference package (version 3.2). Cladistics 5, 164–166 (1989).
43. Falush, D., Stephens, M. & Pritchard, J.K. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164, 1567–1587 (2003).
44. Patterson, N., Price, A.L. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006).
45. Gaut, B.S. Molecular clocks and nucleotide substitution rates in higher plants. Evol. Biol. 30, 93–120 (1998).
46. Barrett, J.C., Fry, B., Maller, J. & Daly, M.J. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265 (2005).
47. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
48. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
49. Takagi, H. et al. QTL-seq: rapid mapping of quantitative trait loci in rice by whole genome resequencing of DNA from two bulked populations. Plant J. 74, 174–183 (2013).
50. Yu, J. et al. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat. Genet. 38, 203–208 (2006).
51. Zhang, Z. et al. Mixed linear model approach adapted for genome-wide association studies. Nat. Genet. 42, 355–360 (2010).
52. Bradbury, P.J. et al. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics 23, 2633–2635 (2007).
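The SNP index and ΔSNP index used in the bulked segregant analysis can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names and the dictionary layout are hypothetical, the depth range of 4 to 200 is treated as inclusive, and windows without SNPs are simply skipped. The confidence intervals under the null hypothesis of no QTLs (Takagi et al.) are not covered here.

```python
from typing import Dict, List, Optional, Tuple

# position -> (reads carrying the big-fruit parent allele, total read depth)
BulkCounts = Dict[int, Tuple[int, int]]


def snp_index(parent_reads: int, depth: int,
              min_depth: int = 4, max_depth: int = 200) -> Optional[float]:
    """Proportion of reads matching the big-fruit parent, or None if depth is out of range."""
    if depth < min_depth or depth > max_depth:
        return None
    return parent_reads / depth


def delta_snp_index(big_bulk: BulkCounts, small_bulk: BulkCounts) -> Dict[int, float]:
    """SNP index of the big-fruit bulk minus that of the small-fruit bulk."""
    delta = {}
    for pos in sorted(big_bulk.keys() & small_bulk.keys()):
        idx_big = snp_index(*big_bulk[pos])
        idx_small = snp_index(*small_bulk[pos])
        if idx_big is not None and idx_small is not None:
            delta[pos] = idx_big - idx_small
    return delta


def sliding_mean(values: Dict[int, float], chrom_len: int,
                 window: int = 1_000_000, step: int = 10_000) -> List[Tuple[int, float]]:
    """Average the per-SNP values in each window; windows without SNPs are skipped."""
    out = []
    for start in range(0, max(chrom_len - window, 0) + 1, step):
        in_window = [v for p, v in values.items() if start <= p < start + window]
        if in_window:
            out.append((start, sum(in_window) / len(in_window)))
    return out
```

A region where the windowed ΔSNP index departs strongly from zero is one in which the two bulks differ in the frequency of the big-fruit parent allele, which is the signal the analysis looks for.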
