SciELO - Scientific Electronic Library Online

 


Botanical Sciences

On-line version ISSN 2007-4476
Print version ISSN 2007-4298

Abstract

RUIZ-SANCHEZ, Eduardo et al. Datataxa: a new script to extract metadata sequence information from GenBank, the Flora of Bajío as a case study. Bot. sci [online]. 2019, vol.97, n.4, pp.754-760.  Epub Feb 04, 2020. ISSN 2007-4476.  https://doi.org/10.17129/botsci.2226.

Background:

GenBank is a public repository that houses millions of nucleotide sequences. Several software tools have been developed to extract information stored in GenBank. However, none of them is suited to extracting and organizing GenBank accessions based on their metadata. We developed a new script called Datataxa to mine GenBank information. The checklist of the Flora del Bajío y de Regiones Adyacentes (FBRA) was used as a case study to apply our script.

Questions:

How many species occurring in the FBRA have records in GenBank? What percentage of those records have been used for phylogenetic, phylogeographic, phylogenomic, barcoding, genetic diversity, and biogeographic studies?

Methods:

Datataxa was written in the AutoIt scripting language to facilitate the extraction of information from GenBank. The extracted information was classified into six study categories. A checklist of species from the published fascicles of the FBRA was used as a case study: the script was run on the FBRA species list, and the resulting records were assigned to these categories.
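The two steps described above, querying GenBank by species name and sorting the returned publication titles into study categories, can be sketched in Python. This is an illustrative sketch, not Datataxa itself, which is written in AutoIt. The function names and the keyword stems in `CATEGORY_KEYWORDS` are assumptions made here; the abstract names the six categories but not the rules used to assign them. The only external interface assumed is the public NCBI E-utilities `esearch` endpoint, and the sketch builds the request URL without sending it.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical keyword stems for the six study categories named in the abstract.
# The actual matching rules used by Datataxa are not given in this abstract.
CATEGORY_KEYWORDS = {
    "phylogenetic": ("phylogenetic", "phylogeny"),
    "phylogeographic": ("phylogeograph",),
    "phylogenomic": ("phylogenom",),
    "barcoding": ("barcod",),
    "genetic diversity": ("genetic diversity", "genetic variation"),
    "biogeographic": ("biogeograph",),
}


def esearch_url(species, retmax=20):
    """Build an esearch URL for nucleotide records of one species."""
    params = {
        "db": "nuccore",                  # GenBank nucleotide database
        "term": f"{species}[Organism]",   # restrict the search to the organism field
        "retmax": retmax,
        "retmode": "json",
    }
    return ESEARCH_URL + "?" + urlencode(params)


def classify_title(title):
    """Return every category whose keyword stems occur in a publication title."""
    lowered = title.lower()
    return [
        category
        for category, stems in CATEGORY_KEYWORDS.items()
        if any(stem in lowered for stem in stems)
    ]


if __name__ == "__main__":
    print(esearch_url("Quercus rugosa"))
    print(classify_title("A molecular phylogeny and DNA barcoding of Mexican oaks"))
```

A title can match more than one category, so `classify_title` returns a list rather than a single label.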

Results:

The script allowed us to search for metadata, such as publication titles, for the 2,558 species included in the FBRA. Of these, 1,575 had at least one record in GenBank. A total of 1,322 species were used in phylogenetic studies, followed by barcoding studies (326) and biogeographic studies (298). Phylogenomic (41), phylogeographic (34), and genetic diversity studies (34) were the least represented.
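The counts above can be converted into the percentages asked for under Questions. The short Python sketch below does this arithmetic under one assumption: the 1,575 species with at least one GenBank record are used as the denominator for the per-category shares. The category counts sum to more than 1,575, so a species can fall into several categories and the shares are not expected to add up to 100%.

```python
# Counts taken from the Results paragraph.
total_species = 2558   # species included in the FBRA checklist
with_records = 1575    # species with at least one GenBank record

category_counts = {
    "phylogenetic": 1322,
    "barcoding": 326,
    "biogeographic": 298,
    "phylogenomic": 41,
    "phylogeographic": 34,
    "genetic diversity": 34,
}


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Share of checklist species that have any GenBank record.
coverage = percent(with_records, total_species)

# Share of species with records in each study category
# (assumed denominator: species with at least one record).
shares = {name: percent(count, with_records) for name, count in category_counts.items()}

if __name__ == "__main__":
    print(f"Species with GenBank records: {coverage}%")
    for name, share in shares.items():
        print(f"{name}: {share}%")
```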

Conclusions:

Datataxa was useful for mining sequence metadata from GenBank and can be applied to any list of species to retrieve the metadata of the corresponding GenBank accessions.

Keywords: API; checklist; entrez; floristic treatment; GenBank; vascular plants.
