Species confidence level #7

Open
JackRoache opened this issue Apr 12, 2018 · 1 comment

Comments

@JackRoache

We'd like a way to store the confidence level of an automated or manual species ID in GUANO. The use case is running automated species identification tools, so that the confidence can be written to the metadata. I also believe many government agencies now require confidence levels to be included in any report submitted.

I think the simplest way to incorporate this into the existing GUANO spec would be to add an optional field to Species Auto ID and Species Manual ID. This would be a percentage, stored as a float.

The problems I think need to be solved are

  • Use of existing Guano fields
  • Confidence separator

I think using the existing Guano fields would be acceptable, as older reading implementations would still output the species label in a user readable way, although it may make it clumsy to use.

The separator is a bit harder, but I think we could get away with using a space and having the last item in the species label be the confidence level. Other options could be to use braces "{ }", brackets "[ ]", or chevrons "< >", as I'm sure these would be uncommon in existing species labels.

I would propose something similar to:
optional, list of strings, each with an optional float confidence level. The optional float confidence level is represented as a percentage and should be the last field.

Species Auto ID: Bat 70.3, A very long bat name 50.8
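
To make the proposal concrete, here is a minimal sketch of how a reader might parse such a value, assuming comma-separated items and a space before a trailing numeric confidence. The function name `parse_species_with_confidence` is hypothetical and is not part of any GUANO implementation.

```python
from typing import List, Optional, Tuple


def parse_species_with_confidence(value: str) -> List[Tuple[str, Optional[float]]]:
    """Split a comma-separated species list (hypothetical format).

    A trailing token that parses as a float is treated as the confidence
    percentage; otherwise the whole item is taken as the species label.
    """
    results: List[Tuple[str, Optional[float]]] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last space: everything before it is the candidate label.
        label, _, last = item.rpartition(" ")
        try:
            confidence = float(last)
        except ValueError:
            # No trailing number: the item is a bare species label.
            results.append((item, None))
            continue
        if not label:
            # A single token with no space is kept as a label, even if numeric.
            results.append((item, None))
        else:
            results.append((label.strip(), confidence))
    return results


print(parse_species_with_confidence("Bat 70.3, A very long bat name 50.8"))
```

For the example above this pairs `Bat` with 70.3 and `A very long bat name` with 50.8. A label that itself ends in a number would be misread as carrying a confidence, which is one weakness of the space separator.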

@riggsd
Owner

riggsd commented Apr 12, 2018

The "confidence level" for a given automated species classification is exclusively relevant to the specific algorithm used; to compare it between algorithms is akin to comparing "apples to oranges".

For example, Sonobat 3's expert system autoclassifier ranked classifications with a discriminant probability in the range 0.0 - 1.0. Sonobat 4's neural network provides a probability in the range 0.0 - 1.0 (but which is also - confusingly - sometimes greater than 1.0), though this value is not the same as the former discriminant probability value, and thus was explicitly renamed in its output report. Kaleidoscope Pro provides a "margin" value which they explicitly say cannot even be compared between two different species classifications, simply that higher values are more confident than lower values for a given species.

For these reasons, the confidence level of an automated classification is entirely vendor-dependent.

SonoBat places its vendor-specific metadata fields under the SB namespace. SonoBatLIVE writes an SB|Accepted XofY field, but - as of version 4.2, at least - does not write the probability to GUANO metadata. Please encourage them to do so!

Kaleidoscope Pro writes all of its confidence information to the WA|Kaleidoscope|Classifier|Statistics metadata field, though it is wrapped in JSON markup. Please commend them, and encourage them to break each field out into a separate GUANO field so they can be used independently.
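
As a rough illustration of "breaking each field out", the sketch below flattens a JSON-valued field into separate keys under the same namespace. The helper name `flatten_json_field` and the JSON keys in the example (`margin`, `detail`, `pulses`) are made up for illustration; they are not Kaleidoscope Pro's actual structure.

```python
import json
from typing import Any, Dict


def flatten_json_field(prefix: str, raw: str) -> Dict[str, Any]:
    """Expand a JSON object into 'prefix|key' entries (hypothetical helper)."""
    flat: Dict[str, Any] = {}

    def walk(key: str, node: Any) -> None:
        if isinstance(node, dict):
            # Descend into nested objects, extending the key with '|'.
            for child_key, child in node.items():
                walk(f"{key}|{child_key}", child)
        else:
            # Scalars (and lists) are stored as-is under the accumulated key.
            flat[key] = node

    walk(prefix, json.loads(raw))
    return flat


# Hypothetical field contents, for illustration only.
raw_value = '{"margin": 0.42, "detail": {"pulses": 17}}'
for key, value in flatten_json_field("WA|Kaleidoscope|Classifier|Statistics", raw_value).items():
    print(f"{key}: {value}")
```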

As for manual species identification, if your organization has agreed upon a specific way to rank confidence, you may use your own field under the User namespace, for example User|Manual ID Confidence: high.

Do not stuff additional values into the Species Manual ID field; it is defined as a list of species labels, and placing anything other than species labels into it is illegal.
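
A minimal sketch of that separation, assuming the plain `Key: Value` line format of GUANO metadata: the species label stays alone in `Species Manual ID`, and the confidence goes in its own `User` field. The helper `format_guano_fields` and the label `MYLU` are illustrative only, and the sketch does not handle escaping of multi-line values.

```python
from typing import Dict


def format_guano_fields(fields: Dict[str, str]) -> str:
    """Render fields as 'Key: Value' lines (illustrative, no escaping)."""
    lines = [f"{key}: {value}" for key, value in fields.items()]
    return "\n".join(lines) + "\n"


metadata = {
    "GUANO|Version": "1.0",
    "Species Manual ID": "MYLU",           # species labels only
    "User|Manual ID Confidence": "high",   # organization-defined ranking
}
print(format_guano_fields(metadata), end="")
```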

With regard to reporting confidence levels to government agencies, I believe you are referring to Maximum Likelihood Estimation (MLE) values, which are a probabilistic confidence level at the site/night level, NOT the file level. This is outside the scope of GUANO metadata. See the USFWS Indiana Bat Summer Survey Guidelines for more information on the calculation and interpretation of MLE.
