The R Journal: article published in 2017, volume 9:2
David J. Winter, The R Journal (2017) 9:2, pages 520-526.
Abstract: The USA National Center for Biotechnology Information (NCBI) is one of the world’s most important sources of biological information. NCBI databases like PubMed and GenBank contain millions of records describing bibliographic, genetic, genomic, and medical data. Here I present rentrez, a package which provides an R interface to 50 NCBI databases. The package is well documented, contains an extensive suite of unit tests and has an active user base. The programmatic interface to the NCBI provided by rentrez allows researchers to query databases and download or import particular records into R sessions for subsequent analysis. The complete nature of the package, its extensive test suite and the fact that the package implements the NCBI’s usage policies all make rentrez a powerful aid to developers of new packages that perform more specific tasks.
Received: 2017-09-01; online 2017-11-16; supplementary material (11.2 Kb)
CRAN packages: ape, RISmed, pubmed.mineR, rentrez, reutils, rotl, fulltext, treemap
CRAN Task Views implied by cited CRAN packages: Phylogenetics, Environmetrics, Genetics, Graphics, OfficialStatistics, WebTechnologies
Bioconductor packages: genomes, RMassBank, MeSHSim, genbankr
The fulltext package makes it easy to do text-mining by supporting the following steps:
- Search for articles
- Fetch articles
- Get links for full text articles (xml, pdf)
- Extract text from articles / convert formats
- Collect bits of articles that you actually need
- Download supplementary materials from papers
A set of tools to extract bibliographic content from the National Center for Biotechnology Information (NCBI) databases, including PubMed. The name RISmed is a portmanteau of RIS (for Research Information Systems, a common tag format for bibliographic data) and PubMed.
Text mining of PubMed Abstracts (text and XML)
Provides an R interface to the NCBI's 'EUtils' API, allowing users to search databases like 'GenBank' <https://www.ncbi.nlm.nih.gov/genbank/> and 'PubMed' <https://www.ncbi.nlm.nih.gov/pubmed/>, process the results of those searches and pull data into their R sessions.
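As a rough illustration of the workflow this description outlines (search a database, process the results, pull data into the R session), the sketch below chains three rentrez functions: `entrez_search()`, `entrez_summary()` and `extract_from_esummary()`. The helper name `pubmed_titles` and the example query are assumptions made for illustration only. Running the helper needs the rentrez package and network access to the NCBI.

```r
# Sketch only: assumes rentrez is installed and the NCBI servers are reachable.
# `pubmed_titles` is a hypothetical helper name, not part of rentrez.
pubmed_titles <- function(term, retmax = 5) {
  if (!requireNamespace("rentrez", quietly = TRUE)) {
    stop("The rentrez package is required for this sketch.")
  }
  # Step 1: search PubMed and collect the IDs of matching records
  hits <- rentrez::entrez_search(db = "pubmed", term = term, retmax = retmax)
  if (length(hits$ids) == 0) {
    return(character(0))
  }
  # Step 2: download summary records for those IDs
  summaries <- rentrez::entrez_summary(db = "pubmed", id = hits$ids)
  # Step 3: pull the "title" field out of each summary
  unlist(rentrez::extract_from_esummary(summaries, "title"), use.names = FALSE)
}

# Example call (needs network access, so it is left commented out):
# pubmed_titles("Entrez Programming Utilities", retmax = 3)
```

Defining the steps inside a function keeps the network requests out of the way until the helper is actually called, which is convenient when the code is sourced from a script.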
An interface to NCBI databases such as PubMed, GenBank, or GEO powered by the Entrez Programming Utilities (EUtils). The nine EUtils provide programmatic access to the NCBI Entrez query and database system for searching and retrieving biological data.
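To make the phrase "programmatic access" a little more concrete, the sketch below builds the kind of request URL that one of the EUtils, ESearch, accepts. It uses base R only and sends nothing over the network. The helper name `esearch_url` and the example query are hypothetical, chosen for illustration. Packages such as reutils and rentrez assemble and send requests like this on the user's behalf, and they expose many more parameters than the three shown here.

```r
# Base URL of the NCBI E-utilities.
eutils_base <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# `esearch_url` is a hypothetical helper used for illustration only.
# It builds an ESearch request URL but does not send the request.
esearch_url <- function(db, term, retmax = 20) {
  # Percent-encode the query so spaces and brackets are valid in a URL
  encoded_term <- utils::URLencode(term, reserved = TRUE)
  paste0(eutils_base, "esearch.fcgi",
         "?db=", db,
         "&term=", encoded_term,
         "&retmax=", retmax)
}

url <- esearch_url("pubmed", "rentrez", retmax = 5)
print(url)
```

Pasting the printed URL into a browser returns an XML document listing matching record IDs, which is the raw material the R packages above parse into R objects.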