-
Notifications
You must be signed in to change notification settings - Fork 575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expand relative mask_path
and make paths valid for used OS (#3257)
#3704
base: develop
Are you sure you want to change the base?
Conversation
Expands relative mask_path values when importing a dataset. These keys are searched recursively to ensure compatibility with nested structures. Previously only filepath keys got expanded.
When importing a dataset all path separators of keys `filepath` and `mask_path` get replaced with the ones used by the OS. Resolves: voxel51#3257
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @denkrau thanks for the PR!
Can you explain a bit more about your use case? The existing implementation of the FiftyOneDataset
format was designed to handle relative -> absolute path conversions for you, but it sounds like it's not working in your case?
The example below demonstrates this working as expected:
import fiftyone as fo
import fiftyone.zoo as foz
import fiftyone.utils.labels as foul
dataset = foz.load_zoo_dataset(
"coco-2017",
split="validation",
label_types="segmentations",
max_samples=10,
)
# Create a `Segmentation` field with masks stored on disk
foul.objects_to_segmentations(
dataset,
"ground_truth",
"segmentations",
output_dir="/tmp/segmentations",
overwrite=True,
)
# Export in FiftyOneDataset format
dataset.export(
export_dir="/tmp/fod",
dataset_type=fo.types.FiftyOneDataset,
)
The directory structure of /tmp/fod
is like this:
/tmp/fod/
data/
000000001000.jpg
...
fields/
segmentations/
000000001000.png
...
metadata.json
samples.json
And here's how the samples.json
stores the mask paths (as relative to this JSON file):
$ python -m json.tool /tmp/fod/samples.json
...
"segmentations": {
"_id": {
"$oid": "6532854e5f4d8589d27625e8"
},
"_cls": "Segmentation",
"tags": [],
"mask_path": "fields/segmentations/000000001000.png"
}
}
]
}
When I then import this dataset:
dataset2 = fo.Dataset.from_dir(
dataset_dir="/tmp/fod",
dataset_type=fo.types.FiftyOneDataset,
)
print(dataset2.first())
the absolute mask paths are correctly constructed:
...
'segmentations': <Segmentation: {
'id': '6532854e5f4d8589d27625df',
'tags': [],
'mask': None,
'mask_path': '/tmp/fod/fields/segmentations/000000000139.png',
}>,
}>
@@ -1848,6 +1848,7 @@ def _import_samples(self, dataset, dataset_dict, tags=None): | |||
def _parse_sample(sd): | |||
if not os.path.isabs(sd["filepath"]): | |||
sd["filepath"] = os.path.join(rel_dir, sd["filepath"]) | |||
sd["filepath"] = fou.make_path_os_safe(sd["filepath"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We actually have a fiftyone.core.storage.normpath()
utility that does this. Can you use that instead?
@@ -1856,6 +1857,10 @@ def _parse_sample(sd): | |||
_parse_media_fields(sd, media_fields, rel_dir) | |||
|
|||
sd["_dataset_id"] = dataset_id | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _parse_media_fields()
method call on line 1857 is intended to handle this for you. I'd like to understand a bit more about your use case to see why it's not working for you 🤔
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #3704 +/- ##
========================================
Coverage 16.24% 16.24%
========================================
Files 640 640
Lines 74031 74031
Branches 988 988
========================================
Hits 12029 12029
Misses 62002 62002
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Hey @brimoor, thanks for the quick reply. Indeed your sample is working, so this got me wondering. tl;dr: It's about In my case I parse samples from (a network) drive and create a dataset. Data copying must be avoided, this means But you can test it with your code example as well. Just set
|
@denkrau your example with The Generally you wouldn't combine |
Thanks for the clarification. So there is no way to avoid copying/restructuring and to have all relative paths expanded? Is such a feature even desired from the project's point of view? If not, could it be possible to use file links with I'm just trying to figure out how to proceed with this or another PR. |
Can you clarify what you're trying to achieve that's not possible? I'm not sure 😅 |
Sure, thanks for your help, here we go. There are several raw datasets available on a shared network drive. Multiple users access them. The mount point for each user might vary. Now for some raw datasets FiftyOne dataset versions shall be created (i.e. exported) additionally, so users can import them on demand. The raw datasets cannot be moved/restructured, because some users work with this structure. Also I would like to avoid copying raw datasets for disk space and maintaining reasons. In my understanding of FiftyOne so far with the insights you gave me, such a scenario is not possible to achieve. Because it would require to expand all relative paths from |
What changes are proposed in this pull request?
When importing a dataset relative
mask_path
keys get expanded as well.This even works with nested structures.
Additionally,
filepath
andmask_path
path separators get converted based on the OS.How is this patch tested? If it is not, please explain why.
Importing a dataset in Ubuntu (which was exported in Windows) with relative
mask_path
and check if path is absolute and valid now.Release Notes
Is this a user-facing change that should be mentioned in the release notes?
notes for FiftyOne users.
(Details in 1-2 sentences. You can just refer to another PR with a description
if this PR is part of a larger change.)
What areas of FiftyOne does this PR affect?
fiftyone
Python library changes