Add metrics/wer (#63)
* Init commit

* Init commit

* Fix relative imports

* Format with black

* Remove redundant parameters

* Refactor differ class

* Reformat with black

* Refactor class and format with black

* Refactor class and format with black

* Add more-itertools

* refactor whisper normalizers for linting

* Move metrics dir

* Update README

* Add example transcripts for WER

* add diarization metrics

* update README

* update README

* update README

* Fix issue with printing the errors

* Add installation / usage guidance

* Fix issue with the README

* update utils to parse jsons in the same format the transcriber returns

* update version in setup.py

* Add metrics dir to linting, package, requirements

* Init Commit

* Ignore virtual environment

* Formatting

* Move wer scripts into dedicated dir

* Add examples

* Update diarization README

* Update metrics entrypoint

* Fix ctm format, fix normalizers

* Add top level README for metrics

* Update metrics READMEs

* Update READMEs + changelog

* Skip empty files, improve printing

* Improve handling of disfluencies

* Allow using SM JSON for metrics

* Modify disfluencies

* Bump version

---------

Co-authored-by: Dan Cochrane <[email protected]>
Co-authored-by: Ellena Reid <[email protected]>
3 people authored Dec 7, 2023
1 parent 5cf9f07 commit 848f249
Showing 54 changed files with 12,689 additions and 5 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -87,6 +87,7 @@ target/

# pyenv
.python-version
venv

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
6 changes: 5 additions & 1 deletion CHANGELOG.md
@@ -5,7 +5,11 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]
## [1.13.0] - 2023-12-07

### Added

- Add metrics toolkit for transcription and diarization

## [1.12.0] - 2023-11-03

2 changes: 1 addition & 1 deletion Makefile
@@ -1,4 +1,4 @@
SOURCES := speechmatics/ tests/ examples/ setup.py
SOURCES := speechmatics/ tests/ examples/ metrics/ setup.py
VERSION ?= $(shell cat VERSION)

.PHONY: all
6 changes: 6 additions & 0 deletions README.md
@@ -315,6 +315,12 @@ A complete list of commands and flags can be found in the SDK docs at https://sp
and [RT API documentation](https://docs.speechmatics.com/rt-api-ref#transcription-config).
## SM Metrics
This package includes tooling for benchmarking transcription and diarization accuracy.
For more information, see `metrics/README.md`.
## Testing
To install development dependencies and run tests
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
1.12.0
1.13.0
47 changes: 47 additions & 0 deletions metrics/README.md
@@ -0,0 +1,47 @@
# SM Metrics

We provide some additional tooling to help benchmark transcription and diarization performance.

## Getting Started

### CLI

The `sm-metrics` binary is installed when you install the package from PyPI or run `python3 setup.py install` from the source code. To see the available options from the command line, run:
```bash
sm-metrics -h
```

### Source Code

When executing directly from the source code:
```bash
python3 -m metrics.cli -h
```

## What's Included?

### Transcription Metrics

This includes tools to:
- Normalise transcripts
- Calculate Word Error Rate and Character Error Rate
- Calculate the number of substitutions, deletions and insertions for a given ASR transcript
- Visualise the alignment and differences between a reference and ASR transcript
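As a rough illustration of how the substitution, deletion, and insertion counts relate to WER, here is a minimal edit-distance sketch (`word_error_rate` is a hypothetical helper, not the package's implementation):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Assumes a non-empty reference.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance via dynamic programming
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,  # deletion
                d[i][j - 1] + 1,  # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

The three terms inside the `min` correspond to the deletion, insertion, and substitution counts listed above.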

### Diarization Metrics

This includes tools to calculate a number of metrics used in benchmarking diarization, including:

- Diarization Error Rate
- Segmentation precision, recall and F1-Scores
- Word Diarization Error Rate
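For intuition, Word Diarization Error Rate can be sketched as the fraction of words carrying the wrong speaker label, assuming the word sequences are already aligned and the hypothesis speaker labels have been mapped onto the reference labels (the helper name is illustrative, not the package's API):

```python
def word_diarization_error_rate(reference, hypothesis):
    """Fraction of aligned words whose speaker label is wrong.

    reference/hypothesis: equal-length lists of (word, speaker) tuples,
    with hypothesis speaker labels already mapped to reference labels.
    """
    assert len(reference) == len(hypothesis)
    errors = sum(
        1
        for (_, ref_spk), (_, hyp_spk) in zip(reference, hypothesis)
        if ref_spk != hyp_spk
    )
    return errors / len(reference)
```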

## Documentation

More extensive information on the metrics themselves, as well as how to run them, can be found in the README for each metric.

For diarization, we provide an additional PDF.

## Support

If you have any issues with this library or encounter any bugs, please get in touch with us at [email protected] or raise an issue on this repo.
Empty file added metrics/__init__.py
36 changes: 36 additions & 0 deletions metrics/cli.py
@@ -0,0 +1,36 @@
"""Entrypoint for SM metrics"""
import argparse

import metrics.diarization.sm_diarization_metrics.cookbook as diarization_metrics
import metrics.wer.__main__ as wer_metrics


def main():
    parser = argparse.ArgumentParser(
        description="Metrics for Speechmatics transcription and diarization"
    )

    # Create subparsers
    subparsers = parser.add_subparsers(
        dest="mode", help="Metrics mode. Choose from 'wer' or 'diarization'"
    )
    subparsers.required = True  # Make sure a subparser is always provided

    wer_parser = subparsers.add_parser("wer", help="Entrypoint for WER metrics")
    wer_metrics.get_wer_args(wer_parser)

    diarization_parser = subparsers.add_parser(
        "diarization", help="Entrypoint for diarization metrics"
    )
    diarization_metrics.get_diarization_args(diarization_parser)

    args = parser.parse_args()

    if args.mode == "wer":
        wer_metrics.main(args)
    elif args.mode == "diarization":
        diarization_metrics.main(args)
    else:
        print("Unsupported mode. Please use 'wer' or 'diarization'")


if __name__ == "__main__":
    main()
13 changes: 13 additions & 0 deletions metrics/diarization/Makefile
@@ -0,0 +1,13 @@
clean_files := .deps .deps-dev build dist

.PHONY: clean
clean:
	$(RM) -rf $(clean_files)

.PHONY: wheel
wheel:
	(pip install wheel && python3 setup.py bdist_wheel)

.PHONY: install
install:
	pip install ./dist/*
88 changes: 88 additions & 0 deletions metrics/diarization/README.md
@@ -0,0 +1,88 @@
# SM Diarization Metrics

This package includes tooling for a number of different metrics for benchmarking speaker diarization, including:

- Diarization Error Rate
- Word Diarization Error Rate
- Jaccard Error Rate
- Segmentation precision, recall and F1-Scores
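As intuition for Diarization Error Rate, here is a simplified frame-based sketch. It assumes single-speaker segments and hypothesis labels already mapped to reference labels; `frame_der` is a hypothetical helper, and real implementations use optimal label mapping and timeline arithmetic rather than frame sampling:

```python
def frame_der(reference, hypothesis, step=0.01):
    """Fraction of frames with a wrong (or missing/spurious) speaker label.

    reference/hypothesis: lists of (start, end, speaker) segments,
    with hypothesis labels already mapped to reference labels.
    """

    def label_at(segments, t):
        for start, end, speaker in segments:
            if start <= t < end:
                return speaker
        return None  # non-speech at time t

    end_time = max(end for _, end, _ in reference)
    n_frames = int(end_time / step)
    errors = ref_frames = 0
    for i in range(n_frames):
        t = (i + 0.5) * step  # sample at frame centres
        ref = label_at(reference, t)
        if ref is not None:
            ref_frames += 1
        if ref != label_at(hypothesis, t):
            errors += 1  # confusion, miss, or false alarm
    return errors / ref_frames
```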

## Getting Started

This project is Speechmatics' fork of https://github.com/pyannote/pyannote-metrics, used to calculate various speaker diarization metrics from reference/hypothesis transcript pairs.

### Run from PyPI

```bash
pip install speechmatics-python
```

This package has a CLI supporting CTM, LAB, or V2 JSON format transcripts and can be run using:

```bash
sm-metrics diarization <reference file> <hypothesis file>
```

For further guidance run:

```bash
sm-metrics diarization -h
```

### Run from source code

If you would prefer to clone the repo and run the source code, that can be done as follows.

Clone the repository and install package:
```bash
git clone https://github.com/speechmatics/speechmatics-python.git && cd speechmatics-python && python3 setup.py install
```

And run directly:
```bash
python3 -m metrics.cli diarization <reference file> <hypothesis file>
```



## Permitted Formats

### CTM

Plain text file with the '.ctm' extension. Each line is of the form:
```
<file id> <speaker> <start time> <end time> <word> <confidence>
```
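A line in this format can be parsed with a few lines of Python (`parse_ctm_line` is a hypothetical helper illustrating the field order, not the package's parser):

```python
from typing import NamedTuple


class CtmWord(NamedTuple):
    file_id: str
    speaker: str
    start: float
    end: float
    word: str
    confidence: float


def parse_ctm_line(line: str) -> CtmWord:
    # Fields are whitespace-separated in the order documented above
    file_id, speaker, start, end, word, confidence = line.split()
    return CtmWord(
        file_id, speaker, float(start), float(end), word, float(confidence)
    )
```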

### LAB

Plain text file with the '.lab' extension. Each line is of the form:
```
<start time> <end time> <speaker>
```

### JSON (Diarisation Reference format)

JSON file of the form:

```json
[
  {
    "speaker_name": "Speaker 1",
    "word": "Seems",
    "start": 0.75,
    "duration": 0.29
  }
]

```
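Loading this format into (speaker, word, start, end) tuples is straightforward; the `load_reference_words` helper below is illustrative, not part of the package:

```python
import json


def load_reference_words(json_text: str):
    """Turn the reference JSON into (speaker, word, start, end) tuples."""
    return [
        # end time is start + duration
        (w["speaker_name"], w["word"], w["start"], w["start"] + w["duration"])
        for w in json.loads(json_text)
    ]
```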

### JSON (Speechmatics ASR Output)

V2 JSON output of Speechmatics ASR can be used directly as the hypothesis for diarization metrics.

## Docs

Further information on how to use the tool and on the available metrics can be found in sm_diarization_metrics.pdf.

When using the PDF, be aware that it assumes you are running the source code directly from `./metrics/diarization`.
4 changes: 4 additions & 0 deletions metrics/diarization/requirements.txt
@@ -0,0 +1,4 @@
pyannote.core
pyannote.database
docopt
tabulate
34 changes: 34 additions & 0 deletions metrics/diarization/setup.py
@@ -0,0 +1,34 @@
# -*- coding: utf-8 -*-
"""Package module."""

import os

from pip._internal.req import parse_requirements
from setuptools import find_packages, setup

requirements = parse_requirements("./requirements.txt", session=False)

git_tag = os.environ.get("CI_COMMIT_TAG")
if git_tag:
    assert git_tag.startswith("diarization-metrics")
# str.lstrip removes a set of characters, not a prefix, so slice the prefix off
version = git_tag[len("diarization-metrics/") :] if git_tag else "0.0.3"


def read(fname):
    return open(os.path.join(os.path.dirname(__file__), fname)).read()


setup(
    author="Speechmatics",
    author_email="[email protected]",
    description="Python module for evaluating speaker diarization.",
    install_requires=[str(r.requirement) for r in requirements],
    name="speechmatics_diarization_metrics",
    license="Speechmatics Proprietary License",
    packages=find_packages(exclude=("tests",)),
    platforms=["linux"],
    python_requires=">=3.5",
    version=version,
    long_description=read("README.md"),
    long_description_content_type="text/markdown",
)
Binary file added metrics/diarization/sm_diarisation_metrics.pdf
