Continue improving tests #79

JoeZiminski · 2023-03-02T19:56:47Z

I'll put some notes here on the existing test SSH stucture, and what it will be nice to test next.

Existing:

There are two sets of file transfer tests. test_transfer_file.py is the original set of tests that mostly make directories of different data_types and check they are uploaded / downloaded properly. These do not test the newer keys for non-data_type files e.g. "all_non_sub".

test_ssh_file_transfer.py has extended tests thats test files and folders, with lots of permutations of different sub/ses/datatype (including all the new keys (e.g. all_non_sub). This has options for testing over the local filesystem which is always on. It also has option for testing over SSH (and there are also tests in test_ssh_setup.py that test SSH setup). The data transfer tests do not require a password, but the SSH setup tests do. Currently all SSH test settings are stored in a conftest.py, including a path to a txt file containg the password. However, this is bad idea and is probably safest to remove, and just forgot the couple of tests that check SSH key setup.

The idea is that SSH tests are always off on CI, and we can run them just on the dedicated PC we will setup.

As for tests on real data, I had a look into the memory blocking thing for linux but I don't think there is a simple way to make it cross-platform. As such started to put together a fake experimental project folder with real data at: (ceph) X:\neuroinformatics\scratch\datashuttle_tests\real_data/). We can extend this, and then setup real-life test around it that can check long-latency transfers.

Things to test next:

need to test what happens when connection drops during transfer. Rclone has protocols for this
but we need to test our end (notes: https://stackoverflow.com/questions/18601828/python-block-network-connections-for-testing-purposes, but this might be just for dropping access to connection for python not entire machine)
Currently the default setting is that we will never overwrite data (there is a config for this can be changed). One problem
is that if somehow a transfer fails halfway, a corrupt version of the file will exist on remote, but will never be updated
again, because it will not be overwritten. This shoudl still warn the user, but need to explictly test what happens in
this case, that the warning is clear and see if there are any nice solutions.
currently all test are from one project (local) to another (remote). It will be nice to simulate 2-3 local projects
and push them all to the remote at the same time. Not 100% sure what rlcone will do in this situation.
currently rawdata only tested. WHen extend to derivatives will need to extend the tests
switching between local and SSH transfer methods in a single repo (currently these are isolateed tests)
SSH tests are very slow on windows due to the way they are checkde through mounted filesystem (see test_ssh_file_transfer.py pathtable_and_project()

adamltyson · 2023-03-06T11:17:25Z

The idea is that SSH tests are always off on CI, and we can run them just on the dedicated PC we will setup.

If we can, it would be great to get these set up as local GH actions runners and integrate with CI.

Currently the default setting is that we will never overwrite data (there is a config for this can be changed). One problem
is that if somehow a transfer fails halfway, a corrupt version of the file will exist on remote, but will never be updated
again, because it will not be overwritten. This shoudl still warn the user, but need to explictly test what happens in
this case, that the warning is clear and see if there are any nice solutions.

One way address this is to mark datasets as either OK (leave e.g. a .transfer.complete empty file after transfer) or not OK (write a .transfer.inprogress file at the start, and clean up at the end). datashuttle can read these and ask the user what they'd like to do. Although, does rclone leave a corrupt version if transfer is interrupted? I think e.g. rsync writes to a temp file until transfer is complete.

currently rawdata only tested. WHen extend to derivatives will need to extend the tests

Might be worth thinking about this sooner rather than later in the context of SWC-BIDS and datashuttle.\

JoeZiminski · 2023-03-07T10:34:28Z

I've not tested it yet, but it seems rclone will store incomplete transfer to meta-data and re-upload the next time, according to here and here. However, this doesn't seem very well documented. I've opened a new forum question here asking for a bit more detail and checking if I've missed any documentation anywhere.

That would be cool to integrate the SSH tests with local CI runners. Yes I think adding the derivatives option is high priority next step, it actually won't take much to implement so could be worth doing before beta release.

JoeZiminski linked a pull request May 31, 2023 that will close this issue

[Testing] Setup SSH tests to cluster (internal testing only) #100

Draft

JoeZiminski added this to the Second release v0.0.2 milestone Oct 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Continue improving tests #79

Continue improving tests #79

JoeZiminski commented Mar 2, 2023

adamltyson commented Mar 6, 2023

JoeZiminski commented Mar 7, 2023

Continue improving tests #79

Continue improving tests #79

Comments

JoeZiminski commented Mar 2, 2023

adamltyson commented Mar 6, 2023

JoeZiminski commented Mar 7, 2023