Commit

chore(docs): Updated UI
rhiaro committed May 24, 2024
1 parent 19f90c6 commit 9e5a0e3
Showing 2 changed files with 9 additions and 103 deletions.
110 changes: 8 additions & 102 deletions docs/howto.md
@@ -111,33 +111,27 @@ Tip: To view a map underneath the nodes and spans, go to the Browser window > XY
4. Click "Consolidate OFDS" in the toolbar.
5. Select the layers for the spans and nodes of each network using the dropdown menus in the Select Inputs tab.
6. Change any settings you need (see [settings](#settings)).
7. The tool presents data on nodes and spans which are geographically close to each other, pair by pair, along with a confidence score for how likely they are to be duplicates. Click "Same" to confirm the pair presented are duplicates and should be merged. Click "Not Same" to confirm the pair are _not_ duplicates, and should not be merged. If you're not sure, click "Next". You can use the "Next" and "Previous" buttons to cycle through the comparisons until you have marked them all as either "Same" or "Not Same".
7. The tool presents data on nodes and spans which are geographically close to each other, pair by pair, along with a confidence score for how likely they are to be duplicates. Click "Consolidate" to confirm the pair presented are duplicates and should be merged. Click "Keep Both" to confirm the pair are _not_ duplicates, and should not be merged. If you're not sure, click "Next". You can use the "Next" and "Previous" buttons to cycle through the comparisons until you have marked them all as either "Consolidate" or "Keep Both".
8. When you've reviewed all of the comparisons, click "Finish".
9. TODO: Choose the level of detail you would like in the additional provenance metadata generated with your output(s):
* None: the output is OFDS conformant geoJSON nodes and spans networks, with no additional provenance metadata.
* Basic: the output includes a reference to the source nodes/spans for any that were generated from duplicates, the overall confidence score for the match, and whether they were merged by hand or automatically.
* Detailed: the output includes all the data for each pair of nodes/spans compared, similarity scores for each field, the overall confidence score, and whether duplicate nodes/spans were merged by hand or automatically.
* Detailed (include non-matches): the same as "Detailed", plus data for nodes/spans which were compared but not merged.
10. Choose where you would like to save the consolidated network JSON files.
9. Choose where you would like to save the consolidated node and span GeoJSON files.

### Settings

When two features are consolidated, for some fields the data cannot be merged or combined, and only the data from one network will be kept. The network you select for the "Primary Network" will be the one for which data is kept in this case.

You can adjust some thresholds based on the accuracy and completeness of the data you are comparing, and how much oversight you want over the consolidation.

* Node match radius: compare nodes within this distance of each other. On data with high precision and accuracy for geographic elements, you may wish to set this number low; for less precise or inaccurate data, a higher number means more comparisons will be made.
* TODO: Primary network: in the event of a match of two similar but not identical nodes/spans, keep the data from this network.
* Ask above (%): the confidence score above which the tool should prompt you to mark a node/span pair as "Same" or "Not Same". Below this score, pairs are assumed to not be matches, and both are kept in the final output.
* Auto consolidate above (%): the confidence score above which the tool should automatically consolidate nodes/spans without prompting.
* **Node match radius:** compare nodes within this distance of each other. On data with high precision and accuracy for geographic elements, you may wish to set this number low; for less precise or inaccurate data, a higher number means more comparisons will be made.
* **Ask above (%):** the confidence score above which the tool should prompt you to consolidate. Below this score, pairs are assumed to not be matches, and both are kept in the final output.
* **Auto consolidate above (%):** the confidence score above which the tool should automatically consolidate nodes/spans without prompting.
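
The way the two percentage thresholds split candidate pairs can be sketched as below. This is an illustration only, not the plugin's code: the function name, the return labels and the strict "greater than" comparison at each boundary are all assumptions.

```python
def classify_pair(confidence_pct, ask_above, auto_above):
    """Decide what happens to one candidate pair (hypothetical helper).

    confidence_pct: confidence score for the pair, as a percentage.
    ask_above:      the "Ask above (%)" setting.
    auto_above:     the "Auto consolidate above (%)" setting.
    """
    # Assumption: "above" means strictly greater than the threshold.
    if confidence_pct > auto_above:
        return "auto_consolidate"  # merged without prompting
    if confidence_pct > ask_above:
        return "ask_user"          # shown to you for "Consolidate" / "Keep Both"
    return "keep_both"             # assumed not a match; both features are kept


# Example settings: ask above 50%, auto consolidate above 90%.
for score in (95, 70, 30):
    print(score, classify_pair(score, ask_above=50, auto_above=90))
```

Under these assumptions, raising "Ask above (%)" reduces the number of pairs you review by hand, and lowering "Auto consolidate above (%)" lets more pairs be merged without review.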

### Scoring

### Output

The final output of the tool is a set of GeoJSON files saved locally to your computer.

If you choose "None" for your provenance data, the structure of the output will be the same as that of the input.

If you choose "Basic", each feature will have an additional `provenance` field:
Each feature has an additional `provenance` object, containing the following:

* `wasDerivedFrom`: array of the ids of the two features that were consolidated.
* `generatedAtTime`: date or datetime the network was generated.
@@ -179,91 +173,3 @@ If you choose "Basic", each feature will have an additional `provenance` field:
]
}
```
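
To see which features in an output file came from a consolidation, you could read the `provenance` object on each feature. The sketch below is a rough example: the helper names are hypothetical, the sample data is made up for illustration, and it assumes `provenance` sits either directly on the feature or inside its `properties`.

```python
def get_provenance(feature):
    """Return the feature's provenance object, or None if it has none."""
    # Assumption: provenance is either a member of the feature itself
    # or nested inside its "properties" object.
    return feature.get("provenance") or feature.get("properties", {}).get("provenance")


def consolidated_features(feature_collection):
    """Return the features that record the ids they were derived from."""
    result = []
    for feature in feature_collection.get("features", []):
        provenance = get_provenance(feature)
        if provenance and provenance.get("wasDerivedFrom"):
            result.append(feature)
    return result


# Made-up sample data: one consolidated node and one untouched node.
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [40.116, -3.219]},
            "properties": {"id": "f787b3ce-dc40-4d09-ac8a-78ded5811578"},
            "provenance": {
                "wasDerivedFrom": [
                    "f787b3ce-dc40-4d09-ac8a-78ded5811578",
                    "debba101-49e9-4454-a613-f474dcc73f1c",
                ],
                "generatedAtTime": "2024-04-04",
            },
        },
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [37.071, -1.036]},
            "properties": {"id": "a35a6d78-e7b6-4125-abb3-ae62489ab370"},
        },
    ],
}

merged = consolidated_features(sample)
for feature in merged:
    print(feature["properties"]["id"], "<-", get_provenance(feature)["wasDerivedFrom"])
```

For a real output file, load it with `json.load` and pass the resulting dictionary to `consolidated_features`.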

If you choose "Detailed", each feature will have the additional provenance field as above, as well as the confidence scores for each field compared for each feature:

* `similarFieldScores`: an object where the properties are the field names used for comparison, and the values are the similarity scores between 0 and 1 for each. Only includes fields which scored highly.

_And_ the whole network will contain a copy of the source of both original networks for all features which were consolidated:

* `allFieldScores`: an object where the properties are the field names used for comparison, and the values are the similarity scores between 0 and 1 for each.
* `merged`: bool; false means the features were not consolidated together; true means they were.
* `manual`: the same as this property on individual features in a consolidated network, but note that if `manual` is `false` and `merged` is `false`, the tool decided not to consolidate the two features without prompting the user to confirm.

If "include non-matches" is set, the source and scores of both original networks for all features which were _compared_ are included, even if they were not consolidated:

```
{
"type": "FeatureCollection",
"features":
[
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
40.116275928237194,
-3.2187281346824963
]
},
"properties": {
"id": "f787b3ce-dc40-4d09-ac8a-78ded5811578",
"name": "Name of a Node",
"status": "operational",
...
},
"provenance": {
"wasDerivedFrom": ["f787b3ce-dc40-4d09-ac8a-78ded5811578", "debba101-49e9-4454-a613-f474dcc73f1c"],
"generatedAtTime": "2024-04-04",
"confidence": 0.89,
"similarFieldScores": {"name": 1, "coordinates": 0.8, "status": 1},
"manual": "true"
}
},
{
...
}
],
"comparisons": [
{
"features": [
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
40.116275928237194,
-3.2187281346824963
]
},
"properties": {
"id": "f787b3ce-dc40-4d09-ac8a-78ded5811578",
"name": "Name of a Node",
"status": "operational",
...
}
},
{
"type": "Feature",
"geometry": {
"type": "Point",
"coordinates": [
37.07134114614533,
-1.0362943463841985
]
},
"properties": {
"id": "a35a6d78-e7b6-4125-abb3-ae62489ab370",
"name": "A completely different one",
"status": "operational",
...
}
}
],
"confidence": 0.29,
"allFieldScores": {"name": 0.1, "coordinates": 0.4, "status": 1, "type": 0, "power": 0},
"merged": "false",
"manual": "false"
}
]
}
```
2 changes: 1 addition & 1 deletion docs/index.md
@@ -16,7 +16,7 @@ This is a [QGIS](https://qgis.org/) plugin to consolidate (deduplicate, combine)
For more detail on any of these steps, see the [how to guide](howto).

1. Add your nodes and spans geoJSON files as Vector Layers in QGIS (optionally add map tiles; make sure the map is at the bottom of the layers).
2. Start and configure the consolidation tool. Click through each presented pair of nodes and spans, and mark them as "Same" or "Not Same". Click "Finish" when you've compared them all.
2. Start and configure the consolidation tool. Click through each presented pair of nodes and spans, and "Consolidate" or "Keep Both". Click "Finish" when you've compared them all.
3. Configure how much provenance metadata you want to keep with the output, then save the output.

Both the input and the output of the tool should be OFDS conformant geoJSON.