## First check for the required packages, install if needed, and load the libraries.
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("sangerseqR")
remotes::install_github("ropensci/bold")
if (!require("pacman")) install.packages("pacman")
pacman::p_load(dplyr, curl, zip, readr, rgbif, usethis, stringr)
Fetching plant occurrence records from GBIF
This notebook contains code used to pull plant species occurrence records from the GBIF API.
We use pacman to manage the R packages and load libraries.
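For readers who want to see which packages are absent before anything is installed, the check that pacman performs can be approximated in base R. This is only a sketch: `missing_packages` is a hypothetical helper name, not a pacman function, and it reports missing packages rather than installing them.

```r
# Hypothetical helper (not part of pacman): list packages that are not installed.
missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages loaded in this notebook.
needed <- c("dplyr", "curl", "zip", "readr", "rgbif", "usethis", "stringr")
missing_packages(needed)
```

An empty result means every package in the list is already available.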
Read in BOLD species list and obtain GBIF keys
This block uses a site-specific list of species from the Yellowstone BOLD project to pull any taxon keys for those species hosted on GBIF, matched by exact scientific names.
The list of species we used can be accessed from Anderson and Hoff (2024), and filtered for trnL.
Similarly, any data from a BOLD project could be downloaded and used in this analysis to generate a similar map of global coverage from a localized sampling effort.
The data being read in for this block is data set S4 in the Supplement provided with the publication.
species_list <- readr::read_csv("../data/Kartzinel_et_al_Dataset_S4_20241030.csv") %>%
  pull("Species")
Rows: 570 Columns: 62
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (36): Project Code, Process ID, Sample ID, Field ID, rbcL Seq. Length, r...
dbl (7): rbcL Trace Count, matK Trace Count, trnL-F Trace Count, trnH-psbA ...
lgl (19): BIN, Catalog Num, Image Count, Contamination, Stop Codon, Flagged ...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Get all backbone results (without filtering)
all_matches <- name_backbone_checklist(species_list, kingdom = "plants")

exact_key_matches <- all_matches %>%
  filter(matchType == "EXACT") %>%
  select(usageKey) %>%
  as.list()
# Find taxa that didn't match at species level.
not_exact_matches <- all_matches %>%
  filter(matchType != "EXACT")
Investigate taxon keys for fuzzy matches and higher rank matches
Some keys may indicate that local species are sharing a taxon key, or the keys returned were backed off to higher taxonomic levels. Beware that these can result in many more occurrence records matched at higher taxonomic levels. This can happen for hyper-local species with no occurrence records in GBIF. For our purposes, we kept only exact species matches.
# View results
not_exact_matches
# A tibble: 53 × 26
usageKey scientificName canonicalName rank status confidence matchType
<int> <chr> <chr> <chr> <chr> <int> <chr>
1 3033620 Pulsatilla Mill. Pulsatilla GENUS ACCEP… 99 HIGHERRA…
2 2704858 Calamagrostis Adans. Calamagrostis GENUS ACCEP… 96 HIGHERRA…
3 3064 Amaranthaceae Amaranthaceae FAMI… ACCEP… 99 HIGHERRA…
4 3171742 Erythranthe Spach Erythranthe GENUS ACCEP… 99 HIGHERRA…
5 8148051 Bistorta (L.) Scop. Bistorta GENUS ACCEP… 99 HIGHERRA…
6 8148051 Bistorta (L.) Scop. Bistorta GENUS ACCEP… 99 HIGHERRA…
7 8148051 Bistorta (L.) Scop. Bistorta GENUS ACCEP… 99 HIGHERRA…
8 NA <NA> <NA> <NA> <NA> 100 NONE
9 NA <NA> <NA> <NA> <NA> 100 NONE
10 NA <NA> <NA> <NA> <NA> 100 NONE
# ℹ 43 more rows
# ℹ 19 more variables: kingdom <chr>, phylum <chr>, order <chr>, family <chr>,
# genus <chr>, species <chr>, kingdomKey <int>, phylumKey <int>,
# classKey <int>, orderKey <int>, familyKey <int>, genusKey <int>,
# speciesKey <int>, synonym <lgl>, class <chr>, acceptedUsageKey <int>,
# verbatim_name <chr>, verbatim_index <dbl>, verbatim_kingdom <chr>
Six species did not match a species key, so 98% of the species have data in GBIF that we can use to explore their global geographic coverage.
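The checks described in this section can be sketched on a small mock table. Everything below is made up for illustration: `mock_matches` only imitates a few columns of the tibble returned by `name_backbone_checklist()`, and the names and keys are not real GBIF values. Base R is used so the sketch runs without the packages loaded earlier.

```r
# Hypothetical stand-in for a few columns of the backbone match results.
mock_matches <- data.frame(
  verbatim_name = c("Genus alpha", "Genus beta", "Genus sp. A",
                    "Genus sp. B", "Unplaced name"),
  usageKey  = c(101L, 102L, 900L, 900L, NA),
  rank      = c("SPECIES", "SPECIES", "GENUS", "GENUS", NA),
  matchType = c("EXACT", "EXACT", "HIGHERRANK", "HIGHERRANK", "NONE"),
  stringsAsFactors = FALSE
)

# Count input names per match type.
match_counts <- table(mock_matches$matchType)

# Keys returned for more than one input name, e.g. several local taxa
# that were backed off to the same genus (NA keys are ignored by table()).
key_counts <- table(mock_matches$usageKey)
shared_keys <- as.integer(names(key_counts[key_counts > 1]))

# Percentage of input names with an exact match.
pct_exact <- round(100 * mean(mock_matches$matchType == "EXACT"))
```

On the real results, any key that shows up in `shared_keys` is worth inspecting before it is used in a download request, since a genus-level or family-level key would pull in records for many other species.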
Set GBIF credentials
The following block will open your .Renviron file. Register an account with GBIF on their website, then add these environment variables to .Renviron and save: GBIF_USER="user" GBIF_PWD="password" GBIF_EMAIL="email".
After requesting the data based on our list of taxon keys, we will get millions of occurrence records that we can download; the data will be held in your GBIF portal.
usethis::edit_r_environ()
☐ Edit '/Users/tdivoll/.Renviron'.
☐ Restart R for changes to take effect.
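After restarting R, it can help to confirm that the three variables are visible to the session before submitting a request. The sketch below uses only base R; `missing_gbif_credentials` is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: return the names of GBIF variables that are unset or empty.
missing_gbif_credentials <- function(vars = c("GBIF_USER", "GBIF_PWD", "GBIF_EMAIL")) {
  values <- Sys.getenv(vars, unset = "")
  vars[!nzchar(values)]
}

# An empty result means all three variables were found.
missing_gbif_credentials()
```

This only checks that the variables exist. It does not verify that the username and password are accepted by GBIF.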
Request the occurrence data
We’ll further restrict the data returned to records that have reliable coordinate data, and use a simple CSV format to reduce the size of the data. The Darwin Core Archive format will include much more metadata, but we’re only interested in the locations for this analysis.
gbif_data_BOLDlist <- occ_download(
  pred_in("taxonKey", exact_key_matches$usageKey),
  pred("hasCoordinate", TRUE),
  pred("hasGeospatialIssue", FALSE),
  format = "SIMPLE_CSV"
)
Get metadata and wait
Get the metadata about the request.
# this will print some info, including the download ID we need to check on the job
gbif_data_BOLDlist
<<gbif download>>
Your download is being processed by GBIF:
https://www.gbif.org/occurrence/download/0007289-250127130748423
Most downloads finish within 15 min.
Check status with
occ_download_wait('0007289-250127130748423')
After it finishes, use
d <- occ_download_get('0007289-250127130748423') %>%
occ_download_import()
to retrieve your download.
Download Info:
Username: tdivoll
E-mail: timothy_divoll@brown.edu
Format: SIMPLE_CSV
Download key: 0007289-250127130748423
Created: 2025-02-01T12:16:59.420+00:00
Citation Info:
Please always cite the download DOI when using this data.
https://www.gbif.org/citation-guidelines
DOI: 10.15468/dl.7bdmx3
Citation:
GBIF Occurrence Download https://doi.org/10.15468/dl.7bdmx3 Accessed from R via rgbif (https://github.com/ropensci/rgbif) on 2025-02-01
Check the status of the download.
occ_download_wait('0066939-241126133413365')
status: succeeded
download is done, status: succeeded
<<gbif download metadata>>
Status: SUCCEEDED
DOI: 10.15468/dl.48qedg
Format: SIMPLE_CSV
Download key: 0066939-241126133413365
Created: 2025-01-13T17:50:02.365+00:00
Modified: 2025-01-13T18:02:18.702+00:00
Download link: https://api.gbif.org/v1/occurrence/download/request/0066939-241126133413365.zip
Total records: 20510078
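Once a download has been fetched and unzipped, the SIMPLE_CSV file is tab-delimited despite its name. The sketch below reads a tiny mock file standing in for the real download and keeps only the location columns. The rows are made up, and the column names (`species`, `decimalLatitude`, `decimalLongitude`) are assumed to follow GBIF's simple download format.

```r
# Write a small mock file that imitates the tab-delimited SIMPLE_CSV layout.
mock_path <- tempfile(fileext = ".csv")
writeLines(c(
  "gbifID\tspecies\tdecimalLatitude\tdecimalLongitude\tcountryCode",
  "1001\tGenus alpha\t44.6\t-110.5\tUS",
  "1002\tGenus alpha\t51.2\t-0.9\tGB",
  "1003\tGenus beta\t60.1\t24.9\tFI"
), mock_path)

# Read the file as tab-delimited text; quoting is disabled so that stray
# quote characters in free-text fields do not swallow rows.
occurrences <- read.delim(mock_path, quote = "", stringsAsFactors = FALSE)

# Keep only the columns needed for mapping.
coords <- occurrences[, c("species", "decimalLatitude", "decimalLongitude")]

# Number of occurrence records per species.
records_per_species <- table(coords$species)
```

For a download of roughly 20 million records, `occ_download_import()`, as suggested in the rgbif output above, is likely a better fit than base R.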