SciELO - Scientific Electronic Library Online

 
Botanical Sciences

Online version ISSN 2007-4476; print version ISSN 2007-4298

Abstract

RUIZ-SANCHEZ, Eduardo et al. Datataxa: a new script to extract metadata sequence information from GenBank, the Flora of Bajío as a case study. Bot. sci [online]. 2019, vol.97, n.4, pp.754-760. Epub 04-Feb-2020. ISSN 2007-4476. https://doi.org/10.17129/botsci.2226.

Background:

GenBank is a public repository that houses millions of nucleotide sequences. Several software tools have been developed to extract information stored in GenBank. However, none of them can extract GenBank accessions and organize them based on their metadata. We developed a new script, Datataxa, to mine GenBank information. The checklist of the Flora del Bajío y de Regiones Adyacentes (FBRA) was used as a case study to apply our script.
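To make the mining step concrete, the following is a minimal Python sketch of the kind of query a GenBank-mining tool can send to NCBI's Entrez E-utilities (the `esearch` endpoint of the `nucleotide` database). It is not Datataxa, which is written in AutoIt; the function names (`build_esearch_url`, `parse_esearch`, `query_species`) and the default `retmax` of 20 are illustrative assumptions. It only counts records and collects their UIDs for one species; reading publication titles would require fetching the full records (for example with `efetch` and `rettype=gb`), which is left out here.

```python
"""Hypothetical sketch: count GenBank nucleotide records for one species via NCBI E-utilities.

This is NOT Datataxa (which is written in AutoIt); it only illustrates the
kind of Entrez query such a tool might issue.
"""
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(species: str, retmax: int = 20) -> str:
    """Build an esearch URL restricted to one organism in the nucleotide database."""
    params = {
        "db": "nucleotide",
        "term": f'"{species}"[Organism]',  # organism-field search, e.g. "Quercus rugosa"[Organism]
        "retmax": retmax,                  # assumed default: how many UIDs to return
    }
    return f"{ESEARCH}?{urlencode(params)}"


def parse_esearch(xml_text: str) -> tuple[int, list[str]]:
    """Return (total hit count, list of UIDs) from an esearch XML response."""
    root = ET.fromstring(xml_text)
    # findtext/findall with simple paths look only at direct children of <eSearchResult>
    count = int(root.findtext("Count", default="0"))
    ids = [elem.text for elem in root.findall("IdList/Id") if elem.text]
    return count, ids


def query_species(species: str) -> tuple[int, list[str]]:
    """Live network call; NCBI asks clients without an API key to stay under 3 requests/second."""
    with urlopen(build_esearch_url(species)) as resp:
        return parse_esearch(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Toy response for offline testing, not real GenBank data.
    sample = (
        "<eSearchResult><Count>2</Count><RetMax>2</RetMax><RetStart>0</RetStart>"
        "<IdList><Id>111</Id><Id>222</Id></IdList></eSearchResult>"
    )
    print(parse_esearch(sample))  # (2, ['111', '222'])
```

Only `query_species` touches the network, so the URL building and parsing can be checked offline; a loop over a long checklist should pause between calls to respect NCBI's rate limit.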

Questions:

How many species occurring in the FBRA have records in GenBank? What percentage of those records have been used for phylogenetic, phylogeographic, phylogenomic, barcoding, genetic diversity, and biogeographic studies?

Methods:

Datataxa was written in the AutoIt scripting language to facilitate the extraction of information from GenBank. This information was classified into six study categories. A checklist of the species in the published fascicles of the FBRA was used as a case study to apply the new script, and the six categories were then applied to the FBRA species list.
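The abstract does not say how records are assigned to the six categories. As one hypothetical approach, assuming the categories are inferred from keywords in publication titles, a Python sketch could look like the following; the keyword lists and the names `CATEGORY_KEYWORDS`, `classify_title`, and `species_per_category` are illustrative assumptions, not taken from Datataxa.

```python
"""Hypothetical sketch: assign publication titles to study categories by keyword matching."""
from collections import Counter

# Assumed keyword rules; Datataxa's actual classification criteria are not described here.
# Substrings are chosen so that, e.g., "phylogeography" does not match "phylogenetic".
CATEGORY_KEYWORDS = {
    "phylogenetic": ("phylogenetic", "phylogeny", "phylogenies"),
    "phylogeographic": ("phylogeograph",),
    "phylogenomic": ("phylogenom",),
    "barcoding": ("barcod",),
    "genetic diversity": ("genetic diversity", "population genetic"),
    "biogeographic": ("biogeograph",),
}


def classify_title(title: str) -> set[str]:
    """Return every category whose keywords occur in the lower-cased title (multi-label)."""
    text = title.lower()
    return {cat for cat, words in CATEGORY_KEYWORDS.items() if any(w in text for w in words)}


def species_per_category(titles_by_species: dict[str, list[str]]) -> Counter:
    """Count each species at most once per category, however many of its titles match."""
    counts: Counter = Counter()
    for titles in titles_by_species.values():
        cats: set[str] = set()
        for title in titles:
            cats |= classify_title(title)
        counts.update(cats)  # one increment per category for this species
    return counts


if __name__ == "__main__":
    print(classify_title("DNA barcoding of Y"))  # {'barcoding'}
```

Counting species rather than titles, and letting one species fall into several categories, fits the reported results: the six category totals add up to 2,055, more than the 1,575 species with GenBank records.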

Results:

The script allowed us to search for metadata, such as publication titles, for the 2,558 species included in the FBRA. Of these, 1,575 had at least one record in GenBank. Phylogenetic studies included the most species (1,322), followed by barcoding (326) and biogeographic studies (298). Phylogenomic (41), phylogeographic (34), and genetic diversity studies (34) were the least represented.

Conclusions:

Datataxa proved useful for mining sequence metadata from GenBank and can be used with any species list to retrieve the metadata of the corresponding GenBank accessions.

Keywords: API; checklist; entrez; floristic treatment; GenBank; vascular plants.
