
Merge branch 'master' into gh-3243/add_pickle_support
helpmefindaname authored Mar 29, 2024
2 parents 7021c2e + 72dc4ad commit a4fd5ed
Showing 63 changed files with 4,814 additions and 1,016 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ jobs:
test:
runs-on: ubuntu-latest
env:
TRANSFORMERS_CACHE: ./cache/transformers
HF_HOME: ./cache/transformers
FLAIR_CACHE_ROOT: ./cache/flair
steps:
- uses: actions/checkout@v3
Expand Down
2 changes: 1 addition & 1 deletion .github/workflows/issues.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ on: issue_comment
jobs:
issue_commented:
name: Issue comment
if: ${{ github.event.issue.pull_request && github.event.issue.author == github.even.issue_comment.author }}
if: ${{ github.event.issue.author == github.even.issue_comment.author }}
runs-on: ubuntu-latest
steps:
- uses: actions-ecosystem/action-remove-labels@v1
Expand Down
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -108,3 +108,4 @@ venv.bak/

resources/taggers/
regression_train/
/doc_build/
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ document embeddings, including our proposed [Flair embeddings](https://www.aclwe
* **A PyTorch NLP framework.** Our framework builds directly on [PyTorch](https://pytorch.org/), making it easy to
train your own models and experiment with new approaches using Flair embeddings and classes.

Now at [version 0.12.2](https://github.com/flairNLP/flair/releases)!
Now at [version 0.13.1](https://github.com/flairNLP/flair/releases)!


## State-of-the-Art Models
Expand Down Expand Up @@ -191,7 +191,7 @@ If you use our new "FLERT" models or approach, please cite [this paper](https://
}
```

If you use our TARS approach for few-shot and zero-shot learning, please cite [this paper](https://kishaloyhalder.github.io/pdfs/tars_coling2020.pdf/):
If you use our TARS approach for few-shot and zero-shot learning, please cite [this paper](https://aclanthology.org/2020.coling-main.285/):

```
@inproceedings{halder2020coling,
Expand Down
6 changes: 3 additions & 3 deletions assets/redirect.html
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
<!DOCTYPE html>
<html>
<head>
<title>Redirecting to https://flairnlp.github.io/</title>
<title>Redirecting to https://flairnlp.github.io/flair/v[VERSION]/</title>
<meta charset="utf-8">
<meta http-equiv="refresh" content="0; URL=https://flairnlp.github.io/">
<link rel="canonical" href="https://flairnlp.github.io/">
<meta http-equiv="refresh" content="0; URL=https://flairnlp.github.io/flair/v[VERSION]/">
<link rel="canonical" href="https://flairnlp.github.io/flair/v[VERSION]/">
</head>
</html>
8 changes: 4 additions & 4 deletions docs/conf.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,8 @@
# -- Project information -----------------------------------------------------
from sphinx_github_style import get_linkcode_resolve

version = "0.12.2"
release = "0.12.2"
version = "0.13.1"
release = "0.13.1"
project = "flair"
author = importlib_metadata.metadata(project)["Author"]
copyright = f"2023 {author}"
Expand Down Expand Up @@ -113,7 +113,7 @@ def linkcode_resolve(*args):
smv_latest_version = importlib_metadata.version(project)

# Whitelist pattern for tags (set to None to ignore all tags)
smv_tag_whitelist = r"^\d+\.\d+\.\d+$"
smv_tag_whitelist = r"^v\d+\.\d+\.\d+$"

# Whitelist pattern for branches (set to None to ignore all branches)
smv_branch_whitelist = r"^master$"
Expand All @@ -122,7 +122,7 @@ def linkcode_resolve(*args):
smv_remote_whitelist = r"^origin$"

# Pattern for released versions
smv_released_pattern = r"^refs/tags/\d+\.\d+\.\d+$"
smv_released_pattern = r"^refs/tags/v\d+\.\d+\.\d+$"

# Format for versioned output directories inside the build directory
smv_outputdir_format = "{ref.name}"
Expand Down
2 changes: 1 addition & 1 deletion docs/tutorial/intro.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ In your favorite virtual environment, simply do:
pip install flair
```

Flair requires Python 3.7+.
Flair requires Python 3.8+.

## Example 1: Tag Entities in Text

Expand Down
127 changes: 127 additions & 0 deletions docs/tutorial/tutorial-basics/entity-mention-linking.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
# Using and creating an entity mention linker

As of Flair 0.14 we ship the [entity mention linker](#flair.models.EntityMentionLinker) - the core framework behind the [Hunflair BioNEN approach](https://huggingface.co/hunflair).

## Example 1: Printing Entity linking outputs to console

To illustrate, let's run the hunflair models on an example biomedical sentence:

```python
from flair.models import EntityMentionLinker
from flair.nn import Classifier
from flair.tokenization import SciSpacyTokenizer
from flair.data import Sentence

sentence = Sentence(
"The mutation in the ABCD1 gene causes X-linked adrenoleukodystrophy, "
"a neurodegenerative disease, which is exacerbated by exposure to high "
"levels of mercury in dolphin populations.",
use_tokenizer=SciSpacyTokenizer()
)

ner_tagger = Classifier.load("hunflair")
ner_tagger.predict(sentence)

nen_tagger = EntityMentionLinker.load("disease-linker-no-ab3p")
nen_tagger.predict(sentence)

for tag in sentence.get_labels():
print(tag)
```

```{note}
Here we use the `disease-linker-no-ab3p` model, as it is the simplest one to run. You might get better results by using `disease-linker` instead,
but under the hood ab3p relies on an executable that is only compiled for Linux and therefore won't run on every system.
Analogously to `disease`, there are also linkers for `chemical`, `species` and `gene`;
all follow the `{entity_type}-linker` or `{entity_type}-linker-no-ab3p` naming schema.
```


This should print:
```console
Span[4:5]: "ABCD1" → Gene (0.9509)
Span[7:11]: "X-linked adrenoleukodystrophy" → Disease (0.9872)
Span[7:11]: "X-linked adrenoleukodystrophy" → MESH:D000326/name=Adrenoleukodystrophy (195.30780029296875)
Span[13:15]: "neurodegenerative disease" → Disease (0.8988)
Span[13:15]: "neurodegenerative disease" → MESH:D019636/name=Neurodegenerative Diseases (201.1804962158203)
Span[29:30]: "mercury" → Chemical (0.9484)
Span[31:32]: "dolphin" → Species (0.8071)
```

As we can see, the hunflair NER model recognized entities of several types; however, for the disease linker, only those of type disease were relevant:
- "X-linked adrenoleukodystrophy" refers to the entity "[Adrenoleukodystrophy](https://id.nlm.nih.gov/mesh/D000326.html)"
- "neurodegenerative disease" refers to the entity "[Neurodegenerative Diseases](https://id.nlm.nih.gov/mesh/D019636.html)"
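
Note that each linker label bundles the ontology identifier and the preferred name into a single string, rendered as `ID/name=...` in the console output above. Assuming the label value keeps that rendering, a minimal plain-Python helper (not part of the flair API) could split it apart:

```python
# Split a linker label value such as "MESH:D000326/name=Adrenoleukodystrophy"
# (the format shown in the console output above) into identifier and name.
# This is a plain-Python sketch, not part of the flair API.
def parse_linker_value(value: str):
    if "/name=" in value:
        identifier, name = value.split("/name=", 1)
        return identifier, name
    return value, None

print(parse_linker_value("MESH:D000326/name=Adrenoleukodystrophy"))
# → ('MESH:D000326', 'Adrenoleukodystrophy')
```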


## Example 2: Structured handling of predictions

After prediction, the flair sentence object carries multiple labels:
* Each NER prediction adds a span referenced by the `label_type` of the span tagger.
* Each NEL prediction adds one or more labels (up to `k`) to the respective span. Those carry the `label_type` of the entity mention linker.
* The NEL labels are ordered by their score. Depending on the exact implementation, the order may be ascending or descending; however, the first label is always the best prediction.
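
The "best first" guarantee can be made concrete with a plain-Python mock of this structure (hypothetical values, not real flair objects):

```python
# Hypothetical spans mirroring the structure described above: one NER label
# per span, plus a best-first list of (id, score) NEN candidates.
# These values are illustrative, not real flair predictions.
spans = [
    {
        "text": "X-linked adrenoleukodystrophy",
        "ner_label": "Disease",
        "nen_labels": [("MESH:D000326", 195.31), ("MESH:D019636", 181.05)],
    },
    {"text": "mercury", "ner_label": "Chemical", "nen_labels": []},
]

# The first NEN label is always the best prediction; a span may also have none.
best_ids = [
    span["nen_labels"][0][0] if span["nen_labels"] else None
    for span in spans
]
print(best_ids)
# → ['MESH:D000326', None]
```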

The following example therefore extracts this information into dictionaries that can be used for further processing:

```python
from flair.models import EntityMentionLinker
from flair.nn import Classifier
from flair.tokenization import SciSpacyTokenizer
from flair.data import Sentence

sentence = Sentence(
"The mutation in the ABCD1 gene causes X-linked adrenoleukodystrophy, "
"a neurodegenerative disease, which is exacerbated by exposure to high "
"levels of mercury in dolphin populations.",
use_tokenizer=SciSpacyTokenizer()
)

ner_tagger = Classifier.load("hunflair")
ner_tagger.predict(sentence)

nen_tagger = EntityMentionLinker.load("disease-linker-no-ab3p")

# top_k = 5 so that a span can have up to 5 labels assigned.
nen_tagger.predict(sentence, top_k=5)

result_mentions = []

for span in sentence.get_spans(ner_tagger.label_type):

# basic information about the span that is tagged.
span_data = {
"start": span.start_position + sentence.start_position,
"end": span.end_position + sentence.start_position,
"text": span.text,
}

# add the NER label. There is always exactly one, so we can use `span.get_label(...)`
span_data["ner_label"] = span.get_label(ner_tagger.label_type).value

mentions_found = []

# since `top_k` is larger than 1, there can be multiple NEN labels, so we use `span.get_labels(...)`
for label in span.get_labels(nen_tagger.label_type):
mentions_found.append({
"id": label.value,
"score": label.score,
})

# extract the most probable prediction if any prediction is found.
if mentions_found:
span_data["nen_id"] = mentions_found[0]["id"]
else:
span_data["nen_id"] = None

# add all found candidates with their scores if you want to explore more than just the most probable prediction.
span_data["mention_candidates"] = mentions_found

result_mentions.append(span_data)

print(result_mentions)
```

```{note}
If you need more than the extracted ids, you can use `nen_tagger.dictionary[span_data["nen_id"]]`
to look up the [`flair.data.EntityCandidate`](#flair.data.EntityCandidate) which contains further information.
```
1 change: 1 addition & 0 deletions docs/tutorial/tutorial-basics/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ and showcases various models we ship with Flair.
tagging-entities
tagging-sentiment
entity-linking
entity-mention-linking
part-of-speech-tagging
other-models
how-to-tag-corpus
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ This tutorial section show you how to train state-of-the-art NER models and othe

## Training a named entity recognition (NER) model with transformers

For a state-of-the-art NER sytem you should fine-tune transformer embeddings, and use full document context
For a state-of-the-art NER system you should fine-tune transformer embeddings, and use full document context
(see our [FLERT](https://arxiv.org/abs/2011.06993) paper for details).

Use the following script:
Expand Down
