-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] unintuitive behavior with split files #1064
Comments
Yes please share a sample dataset we can use to test and better understand the issue |
Here you go. Most of it has the purpose to generate the file structure necessary to reproduce it. I stumbled over this problem, because I use datalad which uses git-annex to store data. Therefore the original files have some hash-based file path, with a human readable symlink in the appropriate bids directory that points to the originals. I think the randomness of the hash path causes the problem, as finding the next split is not trivial. So if you have datalad already installed, you can skip most of the code by simply creating a dataset from mne-sample-bids dataset and then to read the data in the different scenarios. Let me know if anything is unclear import mne
import mne_bids as mb
import os
import os.path as op
from mne import datasets
# first we need to create all sets of files
# one pair of split files that are real files
# and another pair of split files that are symlinks
# load the sample data
out_path = op.join('temp', 'sub-01', 'ses-01', 'meg')
os.makedirs(out_path, exist_ok=True)
# we need to more directories where we will hide the original symlinks in later
os.makedirs('temp/foo1', exist_ok=True)
os.makedirs('temp/foo2', exist_ok=True)
raw_dir = op.join(datasets.sample.data_path(), 'MEG', 'sample')
raw = mne.io.read_raw(op.join(raw_dir, 'sample_audvis_raw.fif'))
# define the paths for the files to be written
out_file = op.join(out_path, 'sub-01_ses-01_task-real_run-01_meg.fif')
out_symlink = op.abspath(op.join(out_path,
'sub-01_ses-01_task-sym_run-01_meg.fif'))
# crop the raw and save them
raw.crop(0, 10)
raw.save(out_file, split_size="10MB", split_naming='bids')
raw.save(out_symlink, split_size="10MB", split_naming='bids')
# adjust paths to match the splits
split1_link_src = out_symlink.replace('_meg', '_split-01_meg')
split2_link_src = out_symlink.replace('_meg', '_split-02_meg')
split1_link_tgt = split1_link_src.replace('sym', 'link')
split2_link_tgt = split2_link_src.replace('sym', 'link')
# move the symlink files somewhere non-bids conforming
# I think it is important that the splits are in different directories
split1_link_src_new = split1_link_src.replace('sub-01/ses-01/meg', 'foo1')
split2_link_src_new = split2_link_src.replace('sub-01/ses-01/meg', 'foo2')
split1_link_tgt = split1_link_tgt.replace('link', 'sym')
split2_link_tgt = split2_link_tgt.replace('link', 'sym')
# mv the original files out of the meg directory
os.system(f'mv {split1_link_src} {split1_link_src_new}')
os.system(f'mv {split2_link_src} {split2_link_src_new}')
# generate symlinks
os.symlink(split1_link_src_new, split1_link_tgt)
os.symlink(split2_link_src_new, split2_link_tgt)
# the actual exercise: trying to read data
sub = '01'
ses = '01'
datatype ='meg'
run = '01'
root = 'temp'
split='01'
# path to symlink
task = 'sym'
# split not specified
path = mb.BIDSPath(root=root, subject=sub, session=ses, task=task,
datatype=datatype, run=run)
raw = mb.read_raw_bids(path) # fails
# split specified
path2 = mb.BIDSPath(root=root, subject=sub, session=ses, task=task,
datatype=datatype, run=run, split=split)
raw2 = mb.read_raw_bids(path2) # works
# path to real files
task= 'real'
# split not specified
path3 = mb.BIDSPath(root=root, subject=sub, session=ses, task=task,
datatype=datatype, run=run)
raw3 = mb.read_raw_bids(path3) # fails
# split specified
path4 = mb.BIDSPath(root=root, subject=sub, session=ses, task=task,
datatype=datatype, run=run, split=split)
raw4 = mb.read_raw_bids(path4) # fails
os.system("rm -r temp") |
ok now I think I understand the problem. We use resolve() to
resolve symlinks on read. Now
this is useful if the name of the second split of the file is still the old
one in the original file.
But in your case the split-01 file points to the correct split-02 file as
see by:
❯ mne show_fiff
temp/sub-01/ses-01/meg/sub-01_ses-01_task-real_run-01_split-01_meg.fif -t
118
118 = FIFFB_REF
115 = FIFF_REF_ROLE (4b >i4) = [2]
118 = FIFF_REF_FILE_NAME (47b str) =
sub-01_ses-01_task-real_run-01_split-02_meg.fif ... str len=47
116 = FIFF_REF_FILE_ID (20b ids) = {'version': 65537, 'machid': array([
3173893, -1922170880], dtype=i ... dict len=4
117 = FIFF_REF_FILE_NUM (4b >i4) = [1]
(note I had to extend the default string length in show_fiff function to
see the full file)
so here we should not resolve. Now I assume this was added cause users
wanted to avoid writing new files
hence keep the original pointers to the old naming in the storage server.
what do you think we should do?
… Message ID: ***@***.***>
|
In this scenario, what would happen if the split field is specified in the path for reading? It seems that specifying it solves the issues with symlinks that I have encountered. If they also work for the situation you describe, one solution could be to require the split field when reading split files. |
specifying the split entity is not that easy as if you have fif files > 2GB
you get split files automatically...
as the 2 uses cases (resolve or not symlinks) makes sense I am wondering if
we could not have a resolver_symlink param in read_raw_bids
anyone sees an alternative to this?
… Message ID: ***@***.***>
|
similar issues have come up before, see mne-tools/mne-python#9221 and the fix (though apparently not enough of a fix?) in mne-tools/mne-python#9227. cc @adswa in case she has some ideas here. |
Description of the problem
Hi,
I think there are some problems with the way mne-bids handles split files, potentially in combination with symlinks. This report is based on this discussion on the forum.
Suppose you have split recordings and try to read them. There are following scenarios:
As you see there are two things going on here:
The reason for the failure in 2. is that mne-bids resolves the symlink already, such that when mne-python later checks the
next_fname
field, it gets confused because the specified next_fname is base on the symlink and not that underlying file.Apart from that, I have doubts whether it is desirable to silently read a file that wasn't requested. Perhaps that should be prevented in first place?
In any case, I think the current situation is a bit confusing. It shouldn't matter on whether a filename points to a file or a symlink whether the split field is required or not. I see two options:
As mentioned on the forum, it seems removing the
.resolve()
is sufficient for (1)mne-bids/mne_bids/read.py
Line 690 in 64f95f8
Steps to reproduce
Expected results
unclear
Actual results
Actual files are read regardless of whether split is specified. Symlinks cannot be read if split is not specified.
Additional information
Running mne-bids 0.11.dev0 and mne-python 1.2.dev0
The text was updated successfully, but these errors were encountered: