SciELO - Scientific Electronic Library Online

 

Botanical Sciences

Online version ISSN 2007-4476; print version ISSN 2007-4298

Abstract

RUIZ-SANCHEZ, Eduardo et al. Datataxa: a new script to extract metadata sequence information from GenBank, the Flora of Bajío as a case study. Bot. sci [online]. 2019, vol.97, n.4, pp.754-760.  Epub 04-Feb-2020. ISSN 2007-4476.  https://doi.org/10.17129/botsci.2226.

Background:

GenBank is a public repository that houses millions of nucleotide sequences. Several software tools have been developed to extract information stored in GenBank. However, none of them is suited to extracting and organizing GenBank accessions based on their metadata. We developed a new script called Datataxa, which mines GenBank information. The checklist of the Flora del Bajío y de Regiones Adyacentes (FBRA) was used as a case study to apply our script.

Questions:

How many species occurring in the FBRA have records in GenBank? What percentage of those records have been used for phylogenetic, phylogeographic, phylogenomic, barcoding, genetic diversity, and biogeographic studies?

Methods:

Datataxa was written in the AutoIt scripting language to facilitate the extraction of information from GenBank. This information was classified into six study categories. A checklist of species from the published fascicles of the FBRA was used as a case study to apply our new script, and the six categories were applied to the FBRA species list.
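As an illustration of the workflow described above, the following minimal sketch builds a GenBank search request for one species and assigns a publication title to the six study categories. It is written in Python, not AutoIt, and is not the Datataxa code. The NCBI E-utilities `esearch.fcgi` endpoint and its `db`, `term`, and `retmode` parameters are part of the public Entrez API; the keyword rules and the function names `esearch_url` and `classify_title` are hypothetical, since the abstract does not state how titles were assigned to categories.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint (Entrez API).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# ASSUMPTION: these keyword rules are illustrative only. The abstract names
# the six categories but not the criteria used to assign them.
CATEGORY_KEYWORDS = {
    "phylogenomic": ("phylogenomic",),
    "phylogeographic": ("phylogeograph",),
    "phylogenetic": ("phylogenetic", "phylogeny", "phylogenies"),
    "barcoding": ("barcod",),
    "genetic diversity": ("genetic diversity", "genetic variation"),
    "biogeographic": ("biogeograph",),
}


def esearch_url(species: str) -> str:
    """Build an esearch URL for nucleotide records of one species."""
    params = {
        "db": "nuccore",                 # GenBank nucleotide database
        "term": f"{species}[Organism]",  # restrict the search to the organism field
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def classify_title(title: str) -> list[str]:
    """Return every category whose keywords occur in a publication title.

    A title may match several categories, or none.
    """
    lowered = title.lower()
    return [
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


if __name__ == "__main__":
    print(esearch_url("Quercus rugosa"))
    print(classify_title("DNA barcoding and phylogenetic relationships in Mexican oaks"))
```

Returning a list rather than a single label lets one title count toward more than one category, which matches the overlapping category counts reported in the results.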

Results:

The script allowed us to search for metadata, such as publication titles, for 2,558 species included in the FBRA. Of these, 1,575 had at least one record in GenBank. A total of 1,322 species were used in phylogenetic studies, followed by barcoding studies (326) and biogeographic studies (298). Phylogenomic (41), phylogeographic (34), and genetic diversity studies (34) were the least represented.
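The counts above can be turned into percentages with a short calculation. The sketch below is in Python and uses only the figures reported in this abstract. Using the 1,575 species with GenBank records as the denominator for the category shares is an assumption, as the abstract does not state which denominator the authors used.

```python
# Figures taken directly from the abstract.
TOTAL_SPECIES = 2558   # species in the FBRA checklist
WITH_RECORDS = 1575    # species with at least one GenBank record

CATEGORY_COUNTS = {
    "phylogenetic": 1322,
    "barcoding": 326,
    "biogeographic": 298,
    "phylogenomic": 41,
    "phylogeographic": 34,
    "genetic diversity": 34,
}


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# Share of checklist species that have any GenBank record.
coverage = percent(WITH_RECORDS, TOTAL_SPECIES)

# ASSUMPTION: category shares are computed relative to species with records.
shares = {name: percent(count, WITH_RECORDS) for name, count in CATEGORY_COUNTS.items()}

if __name__ == "__main__":
    print(f"Species with GenBank records: {coverage}%")
    for name, share in shares.items():
        print(f"{name}: {share}%")
```

Under these assumptions, about 61.6% of the checklist species have a GenBank record, and about 83.9% of those were used in phylogenetic studies. The category counts sum to more than 1,575, so a species can fall into more than one category and the shares do not add up to 100%.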

Conclusions:

Datataxa was useful for mining sequence metadata from GenBank and can be used with any list of species to retrieve the metadata of the corresponding GenBank accessions.

Keywords: API; checklist; entrez; floristic treatment; GenBank; vascular plants.
