Simple sequence repeat (SSR) markers are widely used for analyses of genetic diversity and relationships and molecular breeding due to their abundance in the genome, polymorphism, codominant inheritance and ease of detection by PCR (Kuleung et al. 2004, Li et al. 2018).
RNA sequencing is an effective way of obtaining a large amount of sequence data for SSR mining. The development of SSR markers for many plants based on transcriptome data resources has rapidly progressed (Hodel et al. 2016). In recent years, many SSR markers have been developed for plants of economic importance and for endangered species, such as elephant grass (López et al. 2018), Torreya grandis Fort. (Zeng et al. 2018), peanut (Bosamia et al. 2015), tobacco (Bindler et al. 2011), rubber tree (Li et al. 2012), and Myracrodruon urundeuva (Allemão) Engl. (Souza et al. 2018). Single-molecule long reads that capture the entire RNA molecule can provide insight into the transcriptome; thus, many studies aimed at SSR marker development and other objectives based on full-length mRNA sequencing have been carried out (Chen et al. 2018, Yi et al. 2018).
C. alismatifolia originated in Thailand and was introduced into China in recent years. During its introduction into China, it was cultivated in southern provinces such as Hainan, Guangdong and Guangxi, which have warmer winters than more northern provinces. It has also been planted in the Yangtze River basin, where its bulbs are harvested and stored over the winter (Liu et al. 2017, Liu et al. 2013). This summer-flowering plant has become an important urban garden flower in Zhejiang, Jiangsu, Anhui and adjacent areas. Although it has become a popular ornamental plant, research on this plant at the molecular level remains lacking.
The genetic relationships among the ten most common C. alismatifolia varieties remain unclear and require elucidation to improve breeding programs. Previous studies on the genetic diversity of C. alismatifolia have been based mainly on dominant and universal markers, such as RAPD, ISSR and AFLP (Syamkumar & Sasikumar 2007, Das et al. 2011). However, SSRs, as codominant markers that are more stable than other marker types, are needed for research on molecular breeding and the genetic relationships of germplasm resources.
In this work, a large number of SSR primers were designed based on single-molecule long-read sequencing data. Seventy primer pairs were selected at random for testing, and 35 (50 %) displayed polymorphism within the 10 selected varieties. The genetic diversity and relationships of the 10 varieties were assessed based on the newly developed SSR markers. The present study provides a public resource and information that can aid future genetic studies and breeding programs in Curcuma alismatifolia.
Materials and methods
Plant materials. A mature plant of C. alismatifolia ‘Chiang Mai pink’ cultivated in the Flower Research and Development Center of Zhejiang Academy of Agricultural Sciences was selected for PacBio single-molecule long-read sequencing. Ten varieties (Figure 1), with one individual per variety, were selected to test the validity and polymorphism of 70 SSR primer pairs. High-quality DNA was extracted from leaves of each variety according to previous methods (Doyle & Doyle 1987) and stored at -20 °C for later use.
Full-length mRNA sequencing of C. alismatifolia. High-quality RNA of leaf, scape, fertile bract, sterile bract (ornamental bract) and flower was extracted and mixed in a proportion of 1:1:1:2:2 for cDNA library construction using the Clontech SMARTer cDNA synthesis kit (Takara, Japan). We performed size selection using the BluePippin Size Selection System protocol and produced three libraries corresponding to fragments of 1-2, 2-3 and 3-6 kb in length. The three libraries were sequenced on three cells with the PacBio Sequel system (PacBio, CA, USA). Long reads produced by the PacBio sequencer were processed with the PacBio IsoSeq pipeline (github.com/PacificBiosciences/IsoSeq_SA3nUP) to generate full-length refined consensus transcripts. The reads were filtered using standard protocols with the SMRT Analysis software suite (www.pacb.com/support/software-downloads/) (Yi et al. 2018).
SSR mining and character analysis. The software MISA (a microsatellite identification tool) was used to search for SSRs in all of the unigenes. For mononucleotide repeats, nucleotide sequences with fewer than ten repeats were excluded. For di-, tri-, tetra-, penta- and hexanucleotides, a minimum of six repeats was adopted as a filtering criterion. For compound microsatellites, a cutoff value of 100 bp was chosen as the maximum length of bases interrupting two SSRs (Bosamia et al. 2015).
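The repeat-count thresholds above can be illustrated with a short regular-expression search. This is only a minimal sketch and not the MISA implementation: the function names `is_primitive` and `find_ssrs` are hypothetical, only perfect repeats are reported, and compound microsatellites (the 100 bp interruption rule) are not handled.

```python
import re

# Minimum repeat counts per motif length, taken from the text:
# mononucleotides need >= 10 repeats, di- to hexanucleotides >= 6 repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 6, 4: 6, 5: 6, 6: 6}


def is_primitive(motif):
    """Return True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % d == 0 and motif[:d] * (n // d) == motif for d in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Simplified illustration only: compound SSRs are not merged, and a
    repeat is reported under the motif phase at which it is first matched.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One motif of length k followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip e.g. "CTCT" (a dinucleotide in disguise) or "AA".
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical test sequence: (CT)13, then (A)12, then (AGA)5.
    example = "GG" + "CT" * 13 + "ACGT" + "A" * 12 + "GCA" + "AGA" * 5
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

In this made-up sequence the (AGA)5 stretch is not reported because it falls below the six-repeat threshold, whereas the (CT)13 and (A)12 stretches are.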
SSR primer development and SSR-PCR amplification. Unigenes containing SSRs were used to design primers from sequences flanking SSR loci with Primer 3.0. All SSR loci except mononucleotide repeats were used for primer design. The criteria used for primer design were as follows: primer length, 20-25 bp; temperature, 50-60 °C; GC content, 40-60 % and product size range, 95-295 bp.
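The design criteria listed above can be expressed as a simple filter over candidate primer pairs. The sketch below is an illustration under stated assumptions, not the Primer3 workflow used in the study: the helper names `gc_content` and `passes_design_criteria` are hypothetical, and the 50-60 °C temperature criterion is deliberately not checked because the text does not state how that temperature was computed.

```python
def gc_content(primer):
    """Return the GC content of a primer as a percentage."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def passes_design_criteria(forward, reverse, product_size):
    """Check a primer pair against the length, GC and product-size criteria.

    Thresholds come from the text: primer length 20-25 bp, GC content
    40-60 %, product size 95-295 bp. The temperature criterion is omitted
    (assumption: it would be evaluated by the primer design software).
    """
    for primer in (forward, reverse):
        if not 20 <= len(primer) <= 25:
            return False
        if not 40.0 <= gc_content(primer) <= 60.0:
            return False
    return 95 <= product_size <= 295


if __name__ == "__main__":
    # Hypothetical primer sequences, for illustration only.
    fwd = "ATGCATGCATGCATGCATGC"
    rev = "GCTAGCTAGCTAGCTAGCTA"
    print(passes_design_criteria(fwd, rev, 131))
```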
Seventy randomly selected SSR primer pairs were used for validation testing in the 10 C. alismatifolia varieties (Supplemental material S1). Each 20 µl SSR-PCR reaction mixture consisted of 1 µl of Taq DNA polymerase with 1× reaction buffer, 0.4 µl dNTPs, 0.3 µmol/L each of forward and reverse primer, and 50 ng DNA template. PCR amplification was performed with an initial denaturation at 94 °C for 5 min; followed by 10 cycles of 94 °C for 30 s, 60 to 50 °C for 30 s and 72 °C for 40 s, with a 1 °C decrement in annealing temperature per cycle; 25 cycles of 94 °C for 30 s, 50 °C for 30 s and 72 °C for 40 s; and a final extension at 72 °C for 3 min. Electrophoresis was performed on a vertical polyacrylamide gel at 2,000 V for 1.5 h, and the gels were stained with GoldenView (Gajjar et al. 2014). The gels were imaged with an automated gel documentation system (GelDoc XR+ Imager, USA) and scored for marker amplification. The alleles amplified from the 10 varieties by each primer pair were named A, B, C and so on, in order of increasing fragment length.
Data analysis. The polymorphic primers were selected for further analysis. POPGENE software version 1.31 (Yeh et al. 1999) was used to calculate the number of alleles (Na), the number of effective alleles (Ne), the observed heterozygosity (Ho), and Shannon’s information index (I).
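For readers without access to POPGENE, the per-locus statistics named above can be approximated from scored genotypes. The sketch below uses the commonly cited textbook definitions (Ne = 1/Σp², I = −Σ p ln p, Ho = proportion of heterozygous individuals); whether POPGENE 1.31 applies exactly these formulas, including any sample-size corrections, is an assumption. The function name `locus_stats` and the input format (one tuple of two allele labels per diploid individual) are hypothetical.

```python
import math
from collections import Counter


def locus_stats(genotypes):
    """Compute Na, Ne, Ho and Shannon's I for one codominant locus.

    genotypes: list of (allele_1, allele_2) tuples, one per diploid
    individual, using allele labels such as "A", "B", "C".
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    counts = Counter(alleles)
    total = len(alleles)
    freqs = [count / total for count in counts.values()]

    na = len(counts)                                  # number of alleles
    ne = 1.0 / sum(p * p for p in freqs)              # effective alleles
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    shannon = -sum(p * math.log(p) for p in freqs)    # Shannon's index
    return {"Na": na, "Ne": ne, "Ho": ho, "I": shannon}


if __name__ == "__main__":
    # Hypothetical scores for four individuals at one locus.
    scores = [("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")]
    print(locus_stats(scores))
```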
The polymorphic information content (PIC) of the alleles was calculated by the formula PIC = 1-Σ(Pi)², where Pi is the frequency of the ith SSR allele. The genetic distances across the varieties were calculated using POPGENE software version 1.31 (Yeh et al. 1999). A cluster analysis of the 10 varieties based on Nei’s unbiased measure of genetic distance was carried out using the unweighted pair-group method with arithmetic average (UPGMA), and a dendrogram was constructed by NTSYS-pc version 2.11V (Rohlf 2004).
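The PIC formula given above, PIC = 1 − Σ(Pi)², is straightforward to evaluate from allele frequencies. The following is a minimal sketch; the function name `pic` and the example frequencies are hypothetical and are not taken from the study's data.

```python
def pic(allele_freqs):
    """Return PIC = 1 - sum(p_i ** 2) for one locus.

    allele_freqs: iterable of allele frequencies that sum to 1.
    """
    freqs = list(allele_freqs)
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


if __name__ == "__main__":
    # Two equally frequent alleles give the two-allele maximum of 0.5.
    print(pic([0.5, 0.5]))
    # One rare allele (hypothetical frequency 0.05) gives a low PIC.
    print(pic([0.95, 0.05]))
```

Under this formula, a locus approaches its maximum PIC when its alleles are equally frequent, and the value falls toward zero as a single allele comes to dominate.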
Results
SSR mining and feature analysis. Out of the 64,471 unigenes subjected to SSR screening, 15,891 were found to contain SSRs. A total of 19,902 SSRs were identified among these 15,891 unigenes, with an average of one SSR per 2.06 kb; 3,155 unigenes contained more than one SSR (Table 1). The CT motif was the most common SSR motif in C. alismatifolia. The frequencies of di- to hexanucleotide SSRs were calculated, and the top 20 most frequent motifs, including 6 di- and 14 trinucleotides, are shown in Figure 2.
Design of novel primer sets and validation. A total of 3,637 primer pairs were designed (Supplemental material S2), of which trinucleotides (54.88 %) showed the highest frequency, followed by di- (28.84 %), compound- (10.53 %), tetra- (3.38 %), hexa- (1.48 %) and pentanucleotides (0.88 %). The remaining SSRs contained sequences that failed to generate primer pairs, due either to the unavailability of flanking sites for primer design or to nonconformance with the primer design parameters. Most Curcuma plants have medicinal and ornamental value; however, few SSR markers for this genus have been developed. As EST-SSR markers are usually transferable among distantly related species, these newly developed markers could be used with other Curcuma species for which little SSR and EST information is available.
Feature | Values |
---|---|
Total number of sequences examined | 64,471 |
Total size of examined sequences (Mb) | 132.8 |
Total number of identified SSRs | 19,902 |
Number of SSR containing sequences | 15,891 |
Number of sequences containing more than one SSR | 3,155 |
Number of SSRs present in compound formation | 1,132 |
Seventy primer pairs were selected at random to test their validity, of which 41 (59 %) were successfully used to amplify PCR products and 35 (50 %) displayed polymorphism within the 10 selected varieties.
Genetic diversity and relationships of C. alismatifolia varieties. The thirty-five SSR markers verified as polymorphic were then used to assess the genetic diversity and genetic relationships of the 10 core C. alismatifolia varieties used as garden flowers in China. A total of 139 alleles were detected in the 10 varieties, of which 49 were determined to be variety specific. The Na, Ne, Ho, PIC and I values for each SSR marker are listed in Table 2.
Primer name | Sequence name | Motif | Ho | No. of alleles | Product length (bp) | Annealing temperature (°C) | PIC value |
---|---|---|---|---|---|---|---|
P2 | c13778 | (AGA)6 | 0.4 | 4 | 131 | 54 | 0.345 |
P5 | c18662 | (AGA)7 | 0.4 | 3 | 170 | 56 | 0.545 |
P11 | c28839 | (CTCTC)5 | 0.4 | 4 | 266 | 54 | 0.3475 |
P12 | c9314 | (GAGAT)5 | 0.9 | 5 | 245 | 52 | 0.635 |
P13 | c3424 | (TGAGC)5 | 0.5 | 3 | 295 | 56 | 0.485 |
P14 | c27068 | (CT)13 | 0.4 | 4 | 210 | 54 | 0.595 |
P16 | c3374 | (CAG)9 | 0.6 | 5 | 166 | 54 | 0.485 |
P17 | c9346 | (CTC)9 | 0.11 | 3 | 157 | 54 | 0.290 |
P18 | c36716 | (GA)14 | 0.2 | 5 | 158 | 54 | 0.64 |
P19 | c19380 | (GAAG)7 | 0.6 | 4 | 183 | 56 | 0.475 |
P21 | c18940 | (AGC)10 | 0.8 | 4 | 184 | 56 | 0.64 |
P22 | c30792 | (CAA)5(CTA)5 | 0.3 | 2 | 283 | 55 | 0.375 |
P23 | c30453 | (CGATGG)5 | 0.6 | 3 | 189 | 57 | 0.58 |
P25 | c32105 | (CGATGG)5 | 0.6 | 3 | 188 | 57 | 0.58 |
P26 | c33727 | (TAAA)5(AT)6 | 0.33 | 4 | 270 | 55 | 0.725 |
P27 | c30007 | (TG)7(T)17 | 0.3 | 4 | 259 | 54 | 0.615 |
P30 | c5429 | (GAA)11 | 0.8 | 8 | 183 | 54 | 0.665 |
P31 | c1628 | (TTC)11 | 0.2 | 4 | 187 | 53 | 0.475 |
P39 | c33612 | (GAA)11 | 0.8 | 6 | 164 | 54 | 0.655 |
P40 | c74290 | (GAG)6 | 0.4 | 4 | 195 | 57 | 0.345 |
P47 | c35063 | (AG)14 | 0.44 | 6 | 156 | 54 | 0.772 |
P48 | c21222 | (GAAG)7 | 0.6 | 4 | 184 | 54 | 0.685 |
P52 | c17820 | (CAG)10 | 0.5 | 4 | 278 | 54 | 0.625 |
P54 | c27844 | (GAAAGG)5 | 0.6 | 4 | 193 | 53 | 0.65 |
P55 | c2465 | (GAGAA)6 | 0.9 | 4 | 292 | 53 | 0.565 |
P58 | c6985 | (TC)17 | 0.1 | 4 | 126 | 52 | 0.345 |
P60 | c28299 | (TATC)9 | 0.89 | 5 | 157 | 54 | 0.728 |
P61 | c19402 | (CTGCTC)9 | 0.1 | 2 | 156 | 56 | 0.095 |
P62 | c13037 | (TCC)6 | 0.5 | 2 | 237 | 53 | 0.455 |
P63 | c2023 | (CCA)6 | 0.7 | 4 | 144 | 57 | 0.705 |
P64 | c43017 | (TCT)9 | 0.2 | 3 | 141 | 52 | 0.445 |
P66 | c2969 | (GCA)5(ACA)9 | 0.4 | 6 | 281 | 52 | 0.795 |
P67 | c32672 | (TC)25 | 0.1 | 4 | 141 | 52 | 0.475 |
P69 | c5196 | (ACG)5(ACC)5act(ACC)6 | 0.1 | 2 | 192 | 57 | 0.095 |
P70 | c84245 | (TCC)8tt(CTC)9 | 0.5 | 3 | 286 | 53 | 0.405 |
The genetic distance among the ten varieties ranged from 0.30 to 0.96 (Table S2). The largest genetic distance was observed between ‘Scarlet’ and ‘Emerald ChocoZebra’, and the smallest was observed between ‘Splash’ and ‘Swift’. The white, green, light purple or light pink, and pink or rose-red flower varieties can be distinguished by P23 and P25, indicating that these two markers might be related to flower color. UPGMA clustering grouped the 10 varieties into three groups: ‘Emerald ChocoZebra’ formed one group, ‘Scarlet’ and ‘Chiang Mai pink’ formed another group, and the remaining 7 varieties formed the third group (Figure 3).
Discussion
The number of alleles at each SSR locus of C. alismatifolia was found to range from 1 to 12, with an average of 3.77. Research on peanut (Bosamia et al. 2015) yielded results regarding allele number similar to those of our study on C. alismatifolia. The CT motif was the most common SSR motif in C. alismatifolia, as also observed in Hevea brasiliensis (Li et al. 2012) and Corchorus capsularis (Saha et al. 2017). In contrast, the AT motif was found to be most common for Cryptomeria japonica (Ueno et al. 2012), and AAG was most common for Arachis hypogaea (Bosamia et al. 2015).
The high PIC values (0.095 to 0.795) indicated that most loci were highly polymorphic and informative. Flower color is one of the most important characters for ornamental plant breeding. The colors of the 10 varieties in this study can be classified into four categories: pink to rose red (1-6), green (7), pure white (8), and light pink or light purple (9, 10). The two molecular markers P23 and P25 can distinguish the 10 varieties by color and thus will be beneficial for molecular marker-assisted breeding related to flower color. Marker-based molecular linkage maps have been constructed for many ornamental plants, such as Anthurium andraeanum (Venkat et al. 2014), rose (Spiller et al. 2011) and Dendranthema morifolium (Zhang et al. 2011). The large number of primers we designed could be used to construct a molecular linkage map.
The identification of polymorphic EST-SSR markers can not only enhance our understanding of SSRs in the Curcuma alismatifolia transcriptome but also provide resources for genetic and genomic studies aimed at improving this ornamental flower.
There are more than fifty species in the Curcuma genus worldwide. Many members of the genus, such as Curcuma longa, C. viridiflora and C. zedoaria, have medicinal value due to their high contents of curcumin, which has blood-lipid lowering, antitumor, anti-inflammatory and antioxidant effects (Yang et al. 2020). The many SSR molecular markers detected in this work, including 35 identified as polymorphic, are potentially transferable to other Curcuma species.