Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Continue improving tests #79

Open
JoeZiminski opened this issue Mar 2, 2023 · 2 comments · May be fixed by #100
Open

Continue improving tests #79

JoeZiminski opened this issue Mar 2, 2023 · 2 comments · May be fixed by #100

Comments

@JoeZiminski
Copy link
Member

I'll put some notes here on the existing test SSH stucture, and what it will be nice to test next.

Existing:

There are two sets of file transfer tests. test_transfer_file.py is the original set of tests that mostly make directories of different data_types and check they are uploaded / downloaded properly. These do not test the newer keys for non-data_type files e.g. "all_non_sub".

test_ssh_file_transfer.py has extended tests thats test files and folders, with lots of permutations of different sub/ses/datatype (including all the new keys (e.g. all_non_sub). This has options for testing over the local filesystem which is always on. It also has option for testing over SSH (and there are also tests in test_ssh_setup.py that test SSH setup). The data transfer tests do not require a password, but the SSH setup tests do. Currently all SSH test settings are stored in a conftest.py, including a path to a txt file containg the password. However, this is bad idea and is probably safest to remove, and just forgot the couple of tests that check SSH key setup.

The idea is that SSH tests are always off on CI, and we can run them just on the dedicated PC we will setup.

As for tests on real data, I had a look into the memory blocking thing for linux but I don't think there is a simple way to make it cross-platform. As such started to put together a fake experimental project folder with real data at: (ceph) X:\neuroinformatics\scratch\datashuttle_tests\real_data/). We can extend this, and then setup real-life test around it that can check long-latency transfers.

Things to test next:

  • need to test what happens when connection drops during transfer. Rclone has protocols for this
    but we need to test our end (notes: https://stackoverflow.com/questions/18601828/python-block-network-connections-for-testing-purposes, but this might be just for dropping access to connection for python not entire machine)
  • Currently the default setting is that we will never overwrite data (there is a config for this can be changed). One problem
    is that if somehow a transfer fails halfway, a corrupt version of the file will exist on remote, but will never be updated
    again, because it will not be overwritten. This shoudl still warn the user, but need to explictly test what happens in
    this case, that the warning is clear and see if there are any nice solutions.
  • currently all test are from one project (local) to another (remote). It will be nice to simulate 2-3 local projects
    and push them all to the remote at the same time. Not 100% sure what rlcone will do in this situation.
  • currently rawdata only tested. WHen extend to derivatives will need to extend the tests
  • switching between local and SSH transfer methods in a single repo (currently these are isolateed tests)
  • SSH tests are very slow on windows due to the way they are checkde through mounted filesystem (see test_ssh_file_transfer.py pathtable_and_project()
@adamltyson
Copy link
Member

The idea is that SSH tests are always off on CI, and we can run them just on the dedicated PC we will setup.

If we can, it would be great to get these set up as local GH actions runners and integrate with CI.

Currently the default setting is that we will never overwrite data (there is a config for this can be changed). One problem
is that if somehow a transfer fails halfway, a corrupt version of the file will exist on remote, but will never be updated
again, because it will not be overwritten. This shoudl still warn the user, but need to explictly test what happens in
this case, that the warning is clear and see if there are any nice solutions.

One way address this is to mark datasets as either OK (leave e.g. a .transfer.complete empty file after transfer) or not OK (write a .transfer.inprogress file at the start, and clean up at the end). datashuttle can read these and ask the user what they'd like to do. Although, does rclone leave a corrupt version if transfer is interrupted? I think e.g. rsync writes to a temp file until transfer is complete.

currently rawdata only tested. WHen extend to derivatives will need to extend the tests

Might be worth thinking about this sooner rather than later in the context of SWC-BIDS and datashuttle.\

@JoeZiminski
Copy link
Member Author

I've not tested it yet, but it seems rclone will store incomplete transfer to meta-data and re-upload the next time, according to here and here. However, this doesn't seem very well documented. I've opened a new forum question here asking for a bit more detail and checking if I've missed any documentation anywhere.

That would be cool to integrate the SSH tests with local CI runners. Yes I think adding the derivatives option is high priority next step, it actually won't take much to implement so could be worth doing before beta release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants