feature request: fail if any unknown files exist #182
Can you elaborate on this use case a bit? You have a workflow that generates files with completely random filenames (i.e. timestamped) when an error occurs? How do you propose this would be implemented in an elegant fashion? Wouldn't it be easier to simply use the custom test interface for this problem?
I am working with a very large and complex workflow. When someone adds a new process, sometimes they don't intend for the outputs of that process to get published to the results directory because it's an intermediate process, but if they misconfigure things, then those files can get published. It would be nice if those mistakes could be caught in an automated way using this package. It may be easier to do this with the custom test interface; I'll look into that. I'm not familiar with the code itself, so I'm not sure of a good implementation, but the interface could look something like this
I do think that is one of the best possible interfaces, but I don't like how the meaning of should_exist would then suddenly depend on context. This will make the YAML harder to read. What about a custom test?

```python
import pathlib
from typing import Optional, Set

import pytest


def check_allowed_paths(
    directory: pathlib.Path,
    allowed_paths: Set[str],
    root: Optional[pathlib.Path] = None,
):
    """Raise if any file below `directory` is not listed in `allowed_paths`."""
    if root is None:
        root = directory
    for path in directory.iterdir():
        if path.is_dir():
            check_allowed_paths(path, allowed_paths, root)
        elif path.is_file() or path.is_symlink():
            relpath = path.relative_to(root)
            if str(relpath) not in allowed_paths:
                raise FileExistsError(f"{relpath} not allowed!")


ALL_WORKFLOW_FILES = [
    "bam/test1.bam",
    "vcf/test1.vcf",
]

MULTISAMPLE_FILES = [
    "bam/test2.bam",
    "vcf/test2.vcf",
]

ANNOTATED_FILES = [
    "vep/test1.vep.vcf",
]


@pytest.mark.workflow("annotated")
def test_annotated_workflow(workflow_dir):
    check_allowed_paths(
        workflow_dir, set(ALL_WORKFLOW_FILES + ANNOTATED_FILES))


# Renamed from test_annotated_workflow: the duplicate name would have
# silently shadowed the first test during collection.
@pytest.mark.workflow("multisample")
def test_multisample_workflow(workflow_dir):
    check_allowed_paths(
        workflow_dir, set(ALL_WORKFLOW_FILES + MULTISAMPLE_FILES))
```

EDIT: Even better: you could use the
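For reference, a minimal standalone demonstration of what check_allowed_paths catches, using only the standard library (the helper is re-declared here so the snippet is self-contained; outside a pytest run there is no `workflow_dir` fixture, so a temporary directory stands in for it):

```python
import pathlib
import tempfile


def check_allowed_paths(directory, allowed_paths, root=None):
    # Same logic as the helper above: walk the tree and reject any
    # file whose path relative to the root is not in the allow-set.
    if root is None:
        root = directory
    for path in directory.iterdir():
        if path.is_dir():
            check_allowed_paths(path, allowed_paths, root)
        elif path.is_file() or path.is_symlink():
            relpath = path.relative_to(root)
            if str(relpath) not in allowed_paths:
                raise FileExistsError(f"{relpath} not allowed!")


with tempfile.TemporaryDirectory() as tmp:
    tmp = pathlib.Path(tmp)
    (tmp / "bam").mkdir()
    (tmp / "bam" / "test1.bam").touch()
    # All files accounted for: no exception.
    check_allowed_paths(tmp, {"bam/test1.bam"})

    # A stray file, e.g. a misconfigured publishDir target, is rejected.
    (tmp / "bam" / "stray.log").touch()
    try:
        check_allowed_paths(tmp, {"bam/test1.bam"})
    except FileExistsError as e:
        print(e)
```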
Thanks! I started looking through the code to see if there was a way to automatically append this custom test to every workflow, without having to copy/paste the custom test for every new workflow. In the process I put together a rough example implementation of this in the code: #183. The interface is a bit different from what I mentioned above. In order to check for unknown files you'd add the
Sorry, I think this feature is too specific to your needs to warrant a change to the test YAML. Why not write a custom test as mentioned above?
Makes sense. The only reason I thought a custom test wasn't quite as good is that you have to duplicate the test collection. Do you know if there's a way to hook into pytest's test collection to make sure that the custom test applies to all the relevant YAML files? If not, another idea would be to extend the
You can parse the YAML yourself. That should give you all the workflow names and all the relevant files. In that case there will be no duplication.
There is duplication in finding the YAML files. pytest's test collection handles that, and any difference between the custom code and pytest's test collection will result in missed tests.
Sorry for my late reply. I have been very busy lately and haven't kept up.
Yes, there will be duplication. Sometimes a little duplication is better than adding extra complexity. It is possible to take DRY too far. I would take simplicity and a little duplication over complexity every single time. But you might have noticed that from the design of pytest-workflow already ;-).
As far as I'm aware, currently it is only possible to check that specified output files match certain criteria. I'm also aware that you can specify that a certain file must not exist. It would also be useful for me to specify that NO other files besides the specified ones exist in a given directory.