From 486f4e4a66d096264a6ca641f1d7a59dcb8863b3 Mon Sep 17 00:00:00 2001
From: rhiaro
Date: Fri, 24 May 2024 16:05:14 +0100
Subject: [PATCH] chore(docs): Add docs about scoring

---
 docs/howto.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/docs/howto.md b/docs/howto.md
index 5468e88..252b0e3 100644
--- a/docs/howto.md
+++ b/docs/howto.md
@@ -127,6 +127,19 @@ You can adjust some thresholds based on the accuracy and completeness of the dat
 
 ### Scoring
 
+The tool first compares all pairs of nodes in the two networks that are within the [node match radius](#settings) of each other. It then updates the span data to use the consolidated nodes, and compares all pairs of spans that have the same start and end nodes.
+
+The **overall confidence score** for a pair of features is calculated by comparing the values of corresponding fields in the two features. Each pair of field values gets its own confidence score, and these field scores are combined into the overall score.
+
+The scoring uses heuristics based on:
+
+* the purpose of the field, according to the Open Fibre Data Standard, and
+* the type of data the field holds.
+
+We use a combination of exact matching, string similarity metrics, list overlaps, and geographical distance to calculate the scores.
+
+When you compare features manually, the interface shows the overall confidence score and the field scores it was derived from, alongside maps of the two features being compared. You can use this information to decide whether the two features are the same (and should be consolidated into one) or different (and should both be kept).
+
 ### Output
 
 The final output of the tool are geoJSON files saved to your computer locally.
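The node matching, span pairing and field scoring described in this patch could be sketched roughly as below. This is a minimal illustration under stated assumptions, not the tool's implementation: every function name, the flat `coordinates` key (OFDS nodes actually carry a GeoJSON `location` object), the field-to-scorer mapping, haversine distance, `difflib.SequenceMatcher` for string similarity, Jaccard overlap for lists, a linear distance decay and an unweighted mean for the overall score are all hypothetical choices.

```python
import math
from difflib import SequenceMatcher
from itertools import product

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) points (GeoJSON order)."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def candidate_node_pairs(nodes_a, nodes_b, radius_m):
    """Yield every (node_a, node_b) pair lying within radius_m of each other."""
    for a, b in product(nodes_a, nodes_b):
        if haversine_m(a["coordinates"], b["coordinates"]) <= radius_m:
            yield a, b


def candidate_span_pairs(spans_a, spans_b, node_map):
    """Pair spans with the same start and end nodes, after rewriting network-B
    node ids to the network-A ids they were consolidated into (hypothetical node_map)."""
    by_endpoints = {}
    for span in spans_a:
        by_endpoints.setdefault((span["start"], span["end"]), []).append(span)
    for span in spans_b:
        key = (node_map.get(span["start"], span["start"]),
               node_map.get(span["end"], span["end"]))
        for match in by_endpoints.get(key, []):
            yield match, span


# Hypothetical per-field scorers; each returns a score between 0 and 1.
def exact(x, y):
    return 1.0 if x == y else 0.0


def string_similarity(x, y):
    return SequenceMatcher(None, x or "", y or "").ratio()


def list_overlap(x, y):
    """Jaccard overlap of two lists treated as sets."""
    sx, sy = set(x or []), set(y or [])
    return len(sx & sy) / len(sx | sy) if sx | sy else 1.0


# Illustrative field-to-scorer mapping, not the tool's actual rules.
FIELD_SCORERS = {"name": string_similarity, "status": exact, "technologies": list_overlap}


def field_scores(feature_a, feature_b, radius_m):
    scores = {field: scorer(feature_a.get(field), feature_b.get(field))
              for field, scorer in FIELD_SCORERS.items()}
    # Geographical score decays linearly from 1 (same point) to 0 (at the radius).
    distance = haversine_m(feature_a["coordinates"], feature_b["coordinates"])
    scores["location"] = max(0.0, 1.0 - distance / radius_m)
    return scores


def overall_confidence(scores):
    """Unweighted mean of the per-field scores."""
    return sum(scores.values()) / len(scores)


# Made-up example nodes from two networks.
node_a = {"id": "a1", "name": "Accra Exchange", "status": "operational",
          "technologies": ["fibre"], "coordinates": (-0.1870, 5.6037)}
node_b = {"id": "b7", "name": "Accra Exchange Node", "status": "operational",
          "technologies": ["fibre", "microwave"], "coordinates": (-0.1872, 5.6038)}

for a, b in candidate_node_pairs([node_a], [node_b], radius_m=100):
    scores = field_scores(a, b, radius_m=100)
    print(a["id"], b["id"], {k: round(v, 2) for k, v in scores.items()},
          round(overall_confidence(scores), 2))
```

Comparing every node in one network with every node in the other is quadratic in network size; for large networks a spatial index would be a natural way to find nodes within the match radius more cheaply.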