Add metrics/wer (#63)
* Init commit

* Init commit

* Fix relative imports

* Format with black

* Remove redundant parameters

* Refactor differ class

* Reformat with black

* Refactor class and format with black

* Refactor class and format with black

* Add more-itertools

* refactor whisper normalizers for linting

* Move metrics dir

* Update README

* Add example transcripts for WER

* add diarization metrics

* update README

* update README

* update README

* Fix issue with printing the errors

* Add installation / usage guidance

* Fix issue with the README

* update utils to parse jsons in the same format the transcriber returns

* update version in setup.py

* Add metrics dir to linting, package, requirements

* Init Commit

* Ignore virtual environment

* Formatting

* Move wer scripts into dedicated dir

* Add examples

* Update diarization README

* Update metrics entrypoint

* Fix ctm format, fix normalizers

* Add top level README for metrics

* Update metrics READMEs

* Update READMEs + changelog

* Skip empty files, improve printing

* Improve handling of disfluencies

* Allow using SM JSON for metrics

* Modify disfluencies

* Bump version

---------

Co-authored-by: Dan Cochrane <[email protected]>
Co-authored-by: Ellena Reid <[email protected]>
3 people authored Dec 7, 2023
1 parent 5cf9f07 commit 848f249
Showing 54 changed files with 12,689 additions and 5 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -87,6 +87,7 @@ target/

# pyenv
.python-version
venv

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -5,7 +5,11 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
## [1.13.0] - 2023-12-07

### Added

- Add metrics toolkit for transcription and diarization

## [1.12.0] - 2023-11-03

2 changes: 1 addition & 1 deletion Makefile
@@ -1,4 +1,4 @@
SOURCES := speechmatics/ tests/ examples/ setup.py
SOURCES := speechmatics/ tests/ examples/ metrics/ setup.py
VERSION ?= $(shell cat VERSION)

.PHONY: all
6 changes: 6 additions & 0 deletions README.md
@@ -315,6 +315,12 @@ A complete list of commands and flags can be found in the SDK docs at https://sp
and [RT API documentation](https://docs.speechmatics.com/rt-api-ref#transcription-config).
## SM Metrics
This package includes tooling for benchmarking transcription and diarization accuracy.
For more information, see `metrics/README.md`.
## Testing
To install development dependencies and run tests
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
1.12.0
1.13.0
47 changes: 47 additions & 0 deletions metrics/README.md
@@ -0,0 +1,47 @@
# SM Metrics

We provide some additional tooling to help benchmark transcription and diarization performance.

## Getting Started

### CLI

The `sm-metrics` binary is installed when you install the package from PyPI or run `python3 setup.py install` from the source code. To see the available options from the command line, run:
```bash
sm-metrics -h
```

### Source Code

When executing directly from the source code:
```bash
python3 -m metrics.cli -h
```

## What's Included?

### Transcription Metrics

This includes tools to:
- Normalise transcripts
- Calculate Word Error Rate and Character Error Rate
- Calculate the number of substitutions, deletions and insertions for a given ASR transcript
- Visualise the alignment and differences between a reference and ASR transcript
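As a rough illustration of how the substitution, deletion, and insertion counts relate to WER, here is a minimal edit-distance sketch (`word_error_rate` is a hypothetical helper, not the package's implementation):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumes a non-empty reference.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,  # deletion
                d[i][j - 1] + 1,  # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

The three terms inside the `min` correspond to the deletion, insertion, and substitution counts listed above.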

### Diarization Metrics

This includes tools to calculate a number of metrics used in benchmarking diarization, including:

- Diarization Error Rate
- Segmentation precision, recall and F1-Scores
- Word Diarization Error Rate
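For intuition, Word Diarization Error Rate can be sketched as the fraction of words carrying the wrong speaker label, assuming the word sequences are already aligned and the hypothesis speaker labels have been mapped onto the reference labels (the helper name is illustrative, not the package's API):

```python
def word_diarization_error_rate(reference, hypothesis):
    """Fraction of aligned words whose speaker label is wrong.

    reference/hypothesis: equal-length lists of (word, speaker) tuples,
    with hypothesis speaker labels already mapped to reference labels.
    """
    assert len(reference) == len(hypothesis)
    errors = sum(
        1
        for (_, ref_spk), (_, hyp_spk) in zip(reference, hypothesis)
        if ref_spk != hyp_spk
    )
    return errors / len(reference)
```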

## Documentation

More extensive information on the metrics themselves, as well as how to run them, can be found in the README for each metric.

For diarization, we provide an additional PDF.

## Support

If you have any issues with this library or encounter any bugs, please get in touch with us at [email protected] or raise an issue on this repo.
Empty file added metrics/__init__.py
36 changes: 36 additions & 0 deletions metrics/cli.py
@@ -0,0 +1,36 @@
"""Entrypoint for SM metrics"""
import argparse

import metrics.diarization.sm_diarization_metrics.cookbook as diarization_metrics
import metrics.wer.__main__ as wer_metrics


def main():
    parser = argparse.ArgumentParser(
        description="Metrics for Speechmatics transcription and diarization"
    )

    # Create subparsers
    subparsers = parser.add_subparsers(
        dest="mode", help="Metrics mode. Choose from 'wer' or 'diarization'"
    )
    subparsers.required = True  # Make sure a subparser is always provided

    wer_parser = subparsers.add_parser("wer", help="Entrypoint for WER metrics")
    wer_metrics.get_wer_args(wer_parser)

    diarization_parser = subparsers.add_parser(
        "diarization", help="Entrypoint for diarization metrics"
    )
    diarization_metrics.get_diarization_args(diarization_parser)

    args = parser.parse_args()

    if args.mode == "wer":
        wer_metrics.main(args)
    elif args.mode == "diarization":
        diarization_metrics.main(args)
    else:
        print("Unsupported mode. Please use 'wer' or 'diarization'")


if __name__ == "__main__":
    main()
13 changes: 13 additions & 0 deletions metrics/diarization/Makefile
@@ -0,0 +1,13 @@
clean_files := .deps .deps-dev build dist

.PHONY: clean
clean:
	$(RM) -rf $(clean_files)

.PHONY: wheel
wheel:
	(pip install wheel && python3 setup.py bdist_wheel)

.PHONY: install
install:
	pip install ./dist/*
88 changes: 88 additions & 0 deletions metrics/diarization/README.md
@@ -0,0 +1,88 @@
# SM Diarization Metrics

This package includes tooling for a number of different metrics for benchmarking speaker diarization, including:

- Diarization Error Rate
- Word Diarization Error Rate
- Jaccard Error Rate
- Segmentation precision, recall and F1-Scores
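As intuition for Diarization Error Rate, here is a simplified frame-based sketch. It assumes single-speaker segments and hypothesis labels already mapped to reference labels; `frame_der` is a hypothetical helper, and real implementations use optimal label mapping and timeline arithmetic rather than frame sampling:

```python
def frame_der(reference, hypothesis, step=0.01):
    """Fraction of frames with a wrong (or missing/spurious) speaker label.

    reference/hypothesis: lists of (start, end, speaker) segments,
    with hypothesis labels already mapped to reference labels.
    """

    def label_at(segments, t):
        for start, end, speaker in segments:
            if start <= t < end:
                return speaker
        return None  # non-speech at time t

    end_time = max(end for _, end, _ in reference)
    n_frames = int(end_time / step)
    errors = ref_frames = 0
    for i in range(n_frames):
        t = (i + 0.5) * step  # sample at frame centres
        ref = label_at(reference, t)
        if ref is not None:
            ref_frames += 1
        if ref != label_at(hypothesis, t):
            errors += 1  # confusion, miss, or false alarm
    return errors / ref_frames
```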

## Getting Started

This project is Speechmatics' fork of https://github.com/pyannote/pyannote-metrics, used to calculate various speaker diarization metrics from reference/hypothesis transcript pairs.

### Run from PyPI

```bash
pip install speechmatics-python
```

This package has a CLI supporting CTM, LAB, or V2 JSON format transcripts and can be run using:

```bash
sm-metrics diarization <reference file> <hypothesis file>
```

For further guidance run:

```bash
sm-metrics diarization -h
```

### Run from source code

If you would prefer to clone the repo and run the source code, that can be done as follows.

Clone the repository and install package:
```bash
git clone https://github.com/speechmatics/speechmatics-python.git && cd speechmatics-python && python3 setup.py install
```

And run directly:
```bash
python3 -m metrics.cli diarization <reference file> <hypothesis file>
```



## Permitted Formats

### CTM

Plain text file with the '.ctm' extension. Each line is of the form:
```
<file id> <speaker> <start time> <end time> <word> <confidence>
```
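A line in this format can be parsed with a few lines of Python (`parse_ctm_line` is a hypothetical helper illustrating the field order, not the package's parser):

```python
from typing import NamedTuple


class CtmWord(NamedTuple):
    file_id: str
    speaker: str
    start: float
    end: float
    word: str
    confidence: float


def parse_ctm_line(line: str) -> CtmWord:
    # Fields are whitespace-separated in the order documented above
    file_id, speaker, start, end, word, confidence = line.split()
    return CtmWord(
        file_id, speaker, float(start), float(end), word, float(confidence)
    )
```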

### LAB

Plain text file with the '.lab' extension. Each line is of the form:
```
<start time> <end time> <speaker>
```

### JSON (Diarisation Reference format)

JSON file of the form:

```json
[
  {
    "speaker_name": "Speaker 1",
    "word": "Seems",
    "start": 0.75,
    "duration": 0.29
  }
]

```
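Loading this format into (speaker, word, start, end) tuples is straightforward; the `load_reference_words` helper below is illustrative, not part of the package:

```python
import json


def load_reference_words(json_text: str):
    """Turn the reference JSON into (speaker, word, start, end) tuples."""
    return [
        # end time is start + duration
        (w["speaker_name"], w["word"], w["start"], w["start"] + w["duration"])
        for w in json.loads(json_text)
    ]
```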

### JSON (Speechmatics ASR Output)

V2 JSON output of Speechmatics ASR can be used directly as the hypothesis for diarization metrics.

## Docs

Further information on how to use the tool and on the available metrics can be found in sm_diarization_metrics.pdf.

When using the PDF, be aware that it assumes you are running the source code directly from `./metrics/diarization`.
4 changes: 4 additions & 0 deletions metrics/diarization/requirements.txt
@@ -0,0 +1,4 @@
pyannote.core
pyannote.database
docopt
tabulate
34 changes: 34 additions & 0 deletions metrics/diarization/setup.py
@@ -0,0 +1,34 @@
# -*- coding: utf-8 -*-
"""Package module."""

import os

from pip._internal.req import parse_requirements
from setuptools import find_packages, setup

requirements = parse_requirements("./requirements.txt", session=False)

git_tag = os.environ.get("CI_COMMIT_TAG")
if git_tag:
    assert git_tag.startswith("diarization-metrics")
# str.lstrip removes a set of characters, not a prefix, so slice the prefix off
version = git_tag[len("diarization-metrics/") :] if git_tag else "0.0.3"


def read(fname):
    return open(os.path.join(os.path.dirname(__file__), fname)).read()


setup(
    author="Speechmatics",
    author_email="[email protected]",
    description="Python module for evaluating speaker diarization.",
    install_requires=[str(r.requirement) for r in requirements],
    name="speechmatics_diarization_metrics",
    license="Speechmatics Proprietary License",
    packages=find_packages(exclude=("tests",)),
    platforms=["linux"],
    python_requires=">=3.5",
    version=version,
    long_description=read("README.md"),
    long_description_content_type="text/markdown",
)
Binary file added metrics/diarization/sm_diarisation_metrics.pdf
