How to disambiguate taxonomy

Overview

Teaching: 20 min
Exercises: 15 min
Questions
  • How do I deal with homonyms in the ALA?

  • How do I look up more detailed taxonomic information?

  • How do I use this information to look up counts and occurrences?

Objectives
  • Disambiguate a species using the scientific_name argument

  • Identify filters to use for your data, their possible values, and add them to your galah query

  • Begin to learn when to use scientific_name vs. filters=.

Note: based on https://galah.ala.org.au/Python/galah_user_guide/Taxonomic_Filtering.html#taxonomic-filtering

Taxonomic complexity can confound the process of searching, filtering, and downloading records using galah, but galah provides several functions and arguments that help ensure records are not missed. Let’s start by configuring galah to use the ALA.

import galah
galah.galah_config(atlas="Australia",email="your-email-here")

Learning about search_taxa()

search_taxa() lets users look up taxonomic names before downloading data, which helps to disambiguate homonyms and to check that the search term matches the taxon name used in the ALA. search_taxa() returns the scientific name, authorship, rank, and full classification of the taxon matched to the provided search term.

galah.search_taxa(taxa="Petroica boodang")
                scientificName scientificNameAuthorship  ...           species vernacularName   issues
0  Petroica (Petroica) boodang           (Lesson, 1838)  ...  Petroica boodang  Scarlet Robin  noIssue
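The `issues` column is worth checking before any download: a value other than `noIssue` (such as `homonym`, which appears later in this episode) means the search term needs attention. The sketch below is a hypothetical helper, not part of galah, and assumes the search_taxa() result has been converted to a list of dictionaries with pandas' `DataFrame.to_dict("records")`.

```python
def terms_needing_review(records):
    """Return the rows whose 'issues' value is anything other than 'noIssue'.

    `records` is a list of dicts, e.g. galah.search_taxa(...).to_dict("records").
    """
    return [row for row in records if row.get("issues") != "noIssue"]


# Usage (requires galah and network access, so shown as comments):
# result = galah.search_taxa(taxa=["Morganella"])
# terms_needing_review(result.to_dict("records"))
```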

It can also return taxonomic information for multiple species at once, and it matches synonyms and Indigenous names to the accepted taxon.

# Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
# Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
galah.search_taxa(taxa = ["Muscicapa chrysoptera", "Guniibuu"])
                   scientificName    scientificNameAuthorship  ...    vernacularName   issues
0   Petroica (Littlera) phoenicea                 Gould, 1837  ...       Flame Robin  noIssue
1  Petroica (Petroica) goodenovii  (Vigors & Horsfield, 1827)  ...  Red-capped Robin  noIssue
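Comparing what you typed with what the ALA matched makes synonyms and Indigenous or vernacular names easy to spot. A minimal sketch with a hypothetical helper, assuming the rows come back in the same order as the search terms (as they do in the output above); the records below are copied from that output.

```python
def renamed_terms(search_terms, records):
    """Map each search term to its matched scientificName, keeping only terms
    that were matched to a different name (e.g. synonyms, Indigenous names).

    Assumes records[i] is the match for search_terms[i].
    """
    return {
        term: row["scientificName"]
        for term, row in zip(search_terms, records)
        if term != row["scientificName"]
    }


terms = ["Muscicapa chrysoptera", "Guniibuu"]
records = [
    {"scientificName": "Petroica (Littlera) phoenicea"},
    {"scientificName": "Petroica (Petroica) goodenovii"},
]
print(renamed_terms(terms, records))
# {'Muscicapa chrysoptera': 'Petroica (Littlera) phoenicea', 'Guniibuu': 'Petroica (Petroica) goodenovii'}
```

Note that a name returned with a subgenus, such as Petroica (Petroica) boodang for Petroica boodang, also counts as different, so treat the result as a list to review rather than a list of synonyms.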

Where homonyms exist, search_taxa() warns users and asks them to clarify the search term by supplying one or more taxonomic ranks through the scientific_name argument. This example distinguishes among the genera named Morganella in three kingdoms by selecting the one in kingdom Fungi:

galah.search_taxa(taxa = ["Morganella"])
Warning: Search returned multiple taxa due to a homonym issue.
Please use the `scientific_name` argument to clarify taxa.
  search_term   issues
0  Morganella  homonym
galah.search_taxa(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
  scientificName scientificNameAuthorship   rank  ...       order       family       genus   issues
0     Morganella                   Zeller  genus  ...  Agaricales  Agaricaceae  Morganella  noIssue
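The scientific_name argument above is a dictionary that maps field names to lists of values. If you build these queries often, a small hypothetical helper (not part of galah) can wrap single values in lists for you and produce the same dictionary used above:

```python
def scientific_name_query(**ranks):
    """Build a scientific_name dictionary, wrapping any single value in a list.

    Keyword names are passed through unchanged, so use the ALA field names
    (e.g. kingdom, family, scientificName).
    """
    return {
        field: value if isinstance(value, list) else [value]
        for field, value in ranks.items()
    }


morganella_fungi = scientific_name_query(kingdom="Fungi", scientificName="Morganella")
print(morganella_fungi)  # {'kingdom': ['Fungi'], 'scientificName': ['Morganella']}
# galah.search_taxa(scientific_name=morganella_fungi)
```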

This disambiguated query can then be passed to atlas_counts(), atlas_occurrences(), atlas_species() or atlas_media() by supplying the same scientific_name keyword argument to any of these functions.
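Because the same dictionary is passed to every function, it can be defined once and reused. The sketch below is hypothetical (the helper name is illustrative, not a galah function) and simply calls each function you hand it with one shared scientific_name value; it assumes galah has already been configured as shown at the start of this episode.

```python
def run_with_taxon(functions, scientific_name):
    """Call each function with the same scientific_name keyword argument and
    return the results keyed by function name."""
    return {func.__name__: func(scientific_name=scientific_name) for func in functions}


# Usage (needs galah, a configured email, and network access):
# morganella_fungi = {"kingdom": ["Fungi"], "scientificName": ["Morganella"]}
# results = run_with_taxon([galah.atlas_counts, galah.atlas_occurrences], morganella_fungi)
# results["atlas_counts"]
```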

atlas_counts()

galah.atlas_counts(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
   totalRecords
0           149

atlas_occurrences()

galah.atlas_occurrences(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
     decimalLatitude  decimalLongitude  ...                                   dataResourceName occurrenceStatus
0         -47.000000        168.200000  ...    New Zealand Fungal and Plant Disease Collection          PRESENT
1         -46.879900        168.136500  ...    New Zealand Fungal and Plant Disease Collection          PRESENT
2         -46.874875        168.124660  ...    New Zealand Fungal and Plant Disease Collection          PRESENT
3         -46.862757        168.116777  ...    New Zealand Fungal and Plant Disease Collection          PRESENT
4         -46.554617        169.479051  ...    New Zealand Fungal and Plant Disease Collection          PRESENT
..               ...               ...  ...                                                ...              ...
144              NaN               NaN  ...   Royal Botanic Gardens, Kew - Fungarium Specimens          PRESENT
145              NaN               NaN  ...   Royal Botanic Gardens, Kew - Fungarium Specimens          PRESENT
146       -22.500000        145.000000  ...     USDA United States National Fungus Collections          PRESENT
147              NaN               NaN  ...     USDA United States National Fungus Collections          PRESENT
148        -8.916667        148.150000  ...  Centre for Australian National Biodiversity Re...          PRESENT

[149 rows x 8 columns]
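Several rows above have NaN coordinates, which matters if you plan to map the records. With pandas, `DataFrame.dropna(subset=["decimalLatitude", "decimalLongitude"])` removes them directly; the standard-library sketch below does the same split on a list of dictionaries (for example from `to_dict("records")`) so you can also see how many records lack coordinates. The helper names are hypothetical.

```python
import math


def has_coordinates(row):
    """True if both decimalLatitude and decimalLongitude are present and not NaN."""
    for key in ("decimalLatitude", "decimalLongitude"):
        value = row.get(key)
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return False
    return True


def split_by_coordinates(records):
    """Split occurrence records into (with coordinates, without coordinates)."""
    with_coords = [row for row in records if has_coordinates(row)]
    without_coords = [row for row in records if not has_coordinates(row)]
    return with_coords, without_coords


# Usage with the DataFrame returned by atlas_occurrences():
# mapped, unmapped = split_by_coordinates(occurrences.to_dict("records"))
```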

OPTIONAL: Using filters= to search for exact matches

filters= subsets records by searching for exact matches to an expression, and it may also be used for taxonomic filtering. For comparison, the robin example below counts records with taxa=, first for a single species and then for a list of species, grouped by species and vernacular name so we can compare the number of records for each Australian robin (genus Petroica).

galah.atlas_counts(taxa="Petroica boodang")
   totalRecords
0        132331
aus_petroica = ["Petroica boodang", "Petroica goodenovii",
                "Petroica phoenicea", "Petroica rosea",
                "Petroica rodinogaster", "Petroica multicolor"]
galah.atlas_counts(taxa=aus_petroica,group_by=["species","vernacularName"])
                  species               vernacularName   count
0        Petroica boodang        Eastern Scarlet Robin    3766
1        Petroica boodang                Scarlet Robin  128261
2        Petroica boodang  South-western Scarlet Robin     211
3        Petroica boodang      Tasmanian Scarlet Robin      93
4     Petroica goodenovii             Red-capped Robin  120947
5     Petroica multicolor                Pacific Robin    6856
6      Petroica phoenicea                  Flame Robin   82751
7   Petroica rodinogaster          Mainland Pink Robin      69
8   Petroica rodinogaster                   Pink Robin   15608
9   Petroica rodinogaster         Tasmanian Pink Robin      47
10         Petroica rosea                   Rose Robin   60552
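Grouping by both species and vernacularName splits some species across several rows. Summing the rows per species gives one total each; for Petroica boodang the four rows sum to 132,331, the same figure atlas_counts(taxa="Petroica boodang") returned above. A minimal standard-library sketch using rows copied from the output (the helper name is hypothetical):

```python
from collections import defaultdict


def totals_by_species(rows):
    """Sum grouped counts per species, merging rows that differ only in vernacularName."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["species"]] += row["count"]
    return dict(totals)


# Petroica boodang rows copied from the grouped output above
boodang_rows = [
    {"species": "Petroica boodang", "vernacularName": "Eastern Scarlet Robin", "count": 3766},
    {"species": "Petroica boodang", "vernacularName": "Scarlet Robin", "count": 128261},
    {"species": "Petroica boodang", "vernacularName": "South-western Scarlet Robin", "count": 211},
    {"species": "Petroica boodang", "vernacularName": "Tasmanian Scarlet Robin", "count": 93},
]
print(totals_by_species(boodang_rows))  # {'Petroica boodang': 132331}
```

With the pandas DataFrame itself, `df.groupby("species")["count"].sum()` gives the same per-species totals.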

filters= is especially useful for paraphyletic or polyphyletic groups, which cannot be requested with a single taxon name. For example, to count animal records outside the phylum Chordata, grouped by phylum:

non_chordates = galah.atlas_counts(
    filters=["kingdom=Animalia","phylum!=Chordata"],
    group_by=["phylum"],
    expand=False
)
non_chordates.head()
           phylum     count
0  Acanthocephala       482
1        Annelida    332234
2      Arthropoda  10135041
3     Brachiopoda     11634
4         Bryozoa     32937
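Each filter is a string of the form field=value, or field!=value to exclude matches. A hypothetical helper (not part of galah) can assemble these strings from dictionaries, reproducing the filters used above:

```python
def taxon_filters(include=None, exclude=None):
    """Build galah-style filter strings: 'field=value' for each pair in `include`
    and 'field!=value' for each pair in `exclude`."""
    include = include or {}
    exclude = exclude or {}
    return [f"{field}={value}" for field, value in include.items()] + [
        f"{field}!={value}" for field, value in exclude.items()
    ]


filters = taxon_filters(include={"kingdom": "Animalia"}, exclude={"phylum": "Chordata"})
print(filters)  # ['kingdom=Animalia', 'phylum!=Chordata']
# galah.atlas_counts(filters=filters, group_by=["phylum"], expand=False)
```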

OPTIONAL: Deciding between filters=, search_taxa(), and taxonomic ranks

Deciding between filters= and search_taxa() in a query comes down to how a record has been classified, and whether you know the correct unique name and classification of the taxon of interest.

The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some secondary ranks (e.g. subfamily, subgenus), all of which may be used with filters= and search_taxa(). Additionally, there is a field named scientificName, which holds the name at the lowest taxonomic rank to which a record has been identified. For example, grouping Pitta records by scientificName and taxonRank:

pitta_ranks = galah.atlas_counts(
    taxa="Pitta",
    group_by=["scientificName","taxonRank"]
)
# keep only rows that have a scientificName
pitta_ranks = pitta_ranks.loc[pitta_ranks["scientificName"].notnull()]
pitta_ranks
                                 scientificName   taxonRank  count
0                                         Pitta       genus     70
1                          Pitta (Erythropitta)    subgenus    882
2            Pitta (Erythropitta) erythrogaster     species    190
3   Pitta (Erythropitta) erythrogaster digglesi  subspecies      6
4                            Pitta (Pitta) iris     species   6600
5                       Pitta (Pitta) iris iris  subspecies     91
6              Pitta (Pitta) iris johnstoneiana  subspecies     27
7                      Pitta (Pitta) versicolor     species  30295
8           Pitta (Pitta) versicolor intermedia  subspecies     64
9            Pitta (Pitta) versicolor simillima  subspecies     53
10          Pitta (Pitta) versicolor versicolor  subspecies    424
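Totalling these counts by taxonRank shows how many records are identified only to genus or subgenus rather than to species or subspecies. A minimal standard-library sketch with rows copied from the output above (the helper name is hypothetical; with the DataFrame itself, `pitta_ranks.groupby("taxonRank")["count"].sum()` does the same):

```python
from collections import Counter


def counts_by_rank(rows):
    """Total grouped counts for each taxonRank."""
    totals = Counter()
    for row in rows:
        totals[row["taxonRank"]] += row["count"]
    return dict(totals)


# (scientificName, taxonRank, count) copied from the Pitta output above
pitta_rows = [
    {"scientificName": name, "taxonRank": rank, "count": count}
    for name, rank, count in [
        ("Pitta", "genus", 70),
        ("Pitta (Erythropitta)", "subgenus", 882),
        ("Pitta (Erythropitta) erythrogaster", "species", 190),
        ("Pitta (Erythropitta) erythrogaster digglesi", "subspecies", 6),
        ("Pitta (Pitta) iris", "species", 6600),
        ("Pitta (Pitta) iris iris", "subspecies", 91),
        ("Pitta (Pitta) iris johnstoneiana", "subspecies", 27),
        ("Pitta (Pitta) versicolor", "species", 30295),
        ("Pitta (Pitta) versicolor intermedia", "subspecies", 64),
        ("Pitta (Pitta) versicolor simillima", "subspecies", 53),
        ("Pitta (Pitta) versicolor versicolor", "subspecies", 424),
    ]
]
print(counts_by_rank(pitta_rows))
# {'genus': 70, 'subgenus': 882, 'species': 37085, 'subspecies': 665}
```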

If, for instance, you have the correct species or subspecies name, then searching for exact matches in the species or subspecies field, respectively, will give more precise results, because the scientificName field may include a subgenus (for example, Pitta (Pitta) iris rather than Pitta iris). If you’ve used search_taxa() to get the ALA-matched name of a taxon and only want records identified to that particular level of classification, searching for matches against scientificName is recommended.
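To compare scientificName values with the species field, it can help to strip the subgenus. The sketch below is a hypothetical helper, assuming the only parenthesised part of a scientificName is the subgenus (authorship is held in the separate scientificNameAuthorship column):

```python
import re

# Matches a parenthesised subgenus and the whitespace before it, e.g. " (Pitta)"
SUBGENUS = re.compile(r"\s*\([^)]*\)")


def drop_subgenus(scientific_name):
    """Remove a parenthesised subgenus from a scientific name."""
    return SUBGENUS.sub("", scientific_name)


print(drop_subgenus("Pitta (Pitta) iris"))              # Pitta iris
print(drop_subgenus("Aquila (Uroaetus) audax fleayi"))  # Aquila audax fleayi
```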

Paraphyletic or polyphyletic groups may contain taxa identified to different taxonomic levels. In this case, it is simpler to use search_taxa(). In the example below, search_taxa() matches a list of Tasmanian endemic taxa to one genus, three species, and two subspecies. The same list can then be passed to atlas_counts() to get counts for each scientific name.

tas_endemic = ["Sarcophilus", # Tasmanian Devil
               "Bettongia gaimardi", # Tasmanian Bettong
               "Melanodryas vittata", # Dusky Robin
               "Platycercus caledonicus", # Green Rosella
               "Aquila audax fleayi", # Tasmanian Wedge-Tailed Eagle
               "Tyto novaehollandiae castanops" # Tasmanian Masked Owl
              ]
galah.search_taxa(taxa=tas_endemic)
                          scientificName scientificNameAuthorship  ...                  species                vernacularName
0                            Sarcophilus             Cuvier, 1837  ...                      NaN                           NaN
1                     Bettongia gaimardi        (Desmarest, 1822)  ...       Bettongia gaimardi             Tasmanian Bettong
2      Melanodryas (Amaurodryas) vittata   (Quoy & Gaimard, 1830)  ...      Melanodryas vittata                   Dusky Robin
3  Platycercus (Platycercus) caledonicus           (Gmelin, 1788)  ...  Platycercus caledonicus                 Green Rosella
4         Aquila (Uroaetus) audax fleayi    Condon & Amadon, 1954  ...             Aquila audax  Tasmanian Wedge-tailed Eagle
5         Tyto novaehollandiae castanops            (Gould, 1837)  ...     Tyto novaehollandiae          Tasmanian Masked Owl
galah.atlas_counts(
    taxa=tas_endemic,
    group_by=["scientificName"],
    expand=False
)
                                       scientificName  count
0                      Aquila (Uroaetus) audax fleayi   5090
1                                  Bettongia gaimardi   2284
2                        Bettongia gaimardi cuniculus     54
3                         Bettongia gaimardi gaimardi      9
4                   Melanodryas (Amaurodryas) vittata  15807
5             Melanodryas (Amaurodryas) vittata kingi     16
6           Melanodryas (Amaurodryas) vittata vittata     62
7               Platycercus (Platycercus) caledonicus  51508
8       Platycercus (Platycercus) caledonicus brownii     24
9   Platycercus (Platycercus) caledonicus caledonicus     50
10                                        Sarcophilus    131
11                               Sarcophilus harrisii  36607
12                     Tyto novaehollandiae castanops     85
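Because group_by=["scientificName"] reports subspecies separately, you may want one total per item in the original list. A minimal sketch, assuming you pass in the ALA-matched names returned by search_taxa() (for example Melanodryas (Amaurodryas) vittata rather than Melanodryas vittata). The helper is hypothetical: it adds each row to the longest matched name that equals the row's scientificName or is followed by a space in it. The rows below are copied from the output above.

```python
def roll_up_counts(rows, matched_names):
    """Add each row's count to the most specific (longest) matched name that equals
    its scientificName or is a whole-word prefix of it; unmatched rows are skipped."""
    totals = {name: 0 for name in matched_names}
    by_length = sorted(matched_names, key=len, reverse=True)
    for row in rows:
        sci_name = row["scientificName"]
        for name in by_length:
            if sci_name == name or sci_name.startswith(name + " "):
                totals[name] += row["count"]
                break
    return totals


rows = [
    {"scientificName": "Bettongia gaimardi", "count": 2284},
    {"scientificName": "Bettongia gaimardi cuniculus", "count": 54},
    {"scientificName": "Bettongia gaimardi gaimardi", "count": 9},
    {"scientificName": "Sarcophilus", "count": 131},
    {"scientificName": "Sarcophilus harrisii", "count": 36607},
]
print(roll_up_counts(rows, ["Sarcophilus", "Bettongia gaimardi"]))
# {'Sarcophilus': 36738, 'Bettongia gaimardi': 2347}
```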

Key Points

  • When looking up taxa, getting the right scientific name may not be straightforward.

  • Filtering your data on taxonomic names can help with disambiguation.

  • Providing extra arguments, such as scientific_name, filters= and group_by, helps you get exactly the data you want.