
Error in _06_make_cov: missing in_files task-noise_proc-clean_raw.fif #1025

Open
skjerns opened this issue Nov 18, 2024 · 13 comments

Comments

@skjerns

skjerns commented Nov 18, 2024

I'm trying to run a simple preprocessing pipeline for 3 different runs: two resting-state and one main task. However, the pipeline fails at the step where the noise covariance is calculated, which I set to "emptyroom" (the empty-room recordings are successfully assigned in the first step of the pipeline).

It seems like the function wants to access a file that has not (yet?) been written/computed.

┌────────┬ sensor/_06_make_cov ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│16:48:32│ ❌ sub-05 A critical error occurred. The error message was: missing in_files["raw"] = /data/fastreplay/Fast-Replay-MEG-bids/derivatives/sub-05/meg/sub-05_task-noise_proc-clean_raw.fif

Aborting pipeline run. The traceback is:

  File "/zi/home/simon.kern/anaconda3/lib/python3.11/site-packages/mne_bids_pipeline/_run.py", line 55, in __mne_bids_pipeline_failsafe_wrapper__
    out = memory.cache(func)(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/zi/home/simon.kern/anaconda3/lib/python3.11/site-packages/mne_bids_pipeline/_run.py", line 179, in wrapper
    hashes.append(hash_(k, v))
                  ^^^^^^^^^^^
  File "/zi/home/simon.kern/anaconda3/lib/python3.11/site-packages/mne_bids_pipeline/_run.py", line 456, in _path_to_str_hash
    assert v.exists(), f'missing {kind}_files["{k}"] = {v}'


Sorry for opening so many issues; I'm really trying to get the pipeline working, and mostly I can move forward by trial and error, but for this error I'm lost as to where to even begin searching.

So far the pipeline has successfully created the files for the main task ('sub-02_task-main_proc-clean_raw'), but the rest sessions have not been computed yet. I'm not sure whether the pipeline will do that after the main task is completely processed?

pipeline_config.py
###############################################################################
### Preprocessing pipeline settings for mne-bids-pipeline
###############################################################################
from collections.abc import Sequence
from typing import Literal, Optional

from mne_bids_pipeline.typing import PathLike
# %%
# General Settings

bids_root: PathLike | None = '/data/fastreplay/Fast-Replay-MEG-bids/'
deriv_root: PathLike = f"{bids_root}/derivatives/"  # Save all processed data under /derivatives/
subjects_dir: Optional[PathLike] = f"{deriv_root}/freesurfer/subjects/"  # Path to FreeSurfer subject reconstructions
interactive: bool = False  # Disable interactive elements
# sessions: Literal["all"] = "all"  # Process all sessions
# task: str = ""  # Process all tasks by leaving empty
task_is_rest: bool = True  # Treat data as resting-state, disable epoching
# runs: Literal["all"] = "all"  # Process all runs
exclude_runs: Optional[dict[str, list[str]]] = None  # No excluded runs
subjects: Sequence[str] | Literal["all"] = "all"  # Analyze all subjects
process_empty_room: bool = True  # Preprocess empty-room data
process_rest: bool = True  # Preprocess resting-state data
ch_types: Sequence[Literal["meg"]] = ["meg"]  # Include only MEG channels
data_type: Literal["meg", "eeg"] = "meg"  # Data type is MEG
eog_channels: Sequence[str] = ["BIO002", "BIO003"]  # Specify EOG channels
ecg_channel: str = "BIO001"  # Specify ECG channel
spatial_filter: Literal["ica"] = "ica"  # Use ICA for artifact removal
ica_n_components: int = 50  # Number of ICA components
ica_algorithm: str = 'picard'
rest_epochs_duration = 2
rest_epochs_overlap = 0
epochs_tmin = 0
# on_error  = 'continue'
exclude_subjects: Sequence[str] = ['01', '23']  # Exclude subjects 01 and 23
baseline = None
# %%
# Preprocessing

raw_resample_sfreq: float = 100.0  # Resample data to 100 Hz
l_freq: float = 0.1  # Apply high-pass filter at 0.1 Hz
h_freq: Optional[float] = None  # Disable low-pass filter
notch_freq: Sequence[float] = [50.0]  # Apply notch filter at 50 Hz
notch_trans_bandwidth: float = 1.0  # Set notch filter transition bandwidth to 1 Hz

# %%
# Artifact Removal via ICA

# The pipeline will automatically identify and remove ICA components related to EOG and ECG.

# %%
# Source-level Analysis

run_source_estimation: bool = True  # Enable source-level analysis
inverse_method: Literal["dSPM"] = "dSPM"  # Use dSPM as the inverse solution method
loose: float = 0.2  # Weigh parallel dipole components by 0.2
depth: float = 0.8  # Set depth weighting exponent to 0.8
noise_cov = "emptyroom"  # Use empty-room recording for noise covariance

# %%
# FreeSurfer recon-all Settings

recon_all: bool = True  # Enable FreeSurfer's recon-all
freesurfer_verbose: bool = True  # print the complete recon-all pipeline

# %%
# Parallelization

n_jobs: int = 4  # Use 4 parallel jobs
parallel_backend: Literal["loky"] = "loky"  # Use 'loky' backend for parallel processing

# %%
# Logging

log_level: Literal["info"] = "info"  # Set pipeline logging verbosity to 'info'
mne_log_level: Literal["error"] = "error"  # Set MNE-Python logging verbosity to 'error'

# %%
# Error Handling

on_error: Literal["abort"] = "abort"  # Abort processing on errors
config_validation: Literal["raise"] = "raise"  # Raise exceptions on config validation issues
@larsoner
Member

So far the pipeline has successfully created the files for the main task ('sub-02_task-main_proc-clean_raw'), but the rest sessions are missing yet and not computed. Not sure if the pipeline will do that after the main is completely calculated?

No, this is weird... the rest and empty-room data should both be processed during the ICA application steps, because you have

process_rest: bool = True  # Preprocess resting-state data
process_empty_room: bool = True  # Preprocess empty-room data

It sounds like neither of these is processed fully for you? That's a bug, and a bit of a confusing one at that. If you can confirm this to be the case, I can check whether we have a testing dataset with task, rest, and empty-room data (not sure if we do?) and see if I can replicate the issue.

Incidentally, if you have two resting-state recordings, I'm not sure our pipeline is smart enough yet to process both of them. But in principle it shouldn't be too difficult, I think, to make it process more than one. A hard part might actually be getting a test dataset that is configured this way.

One option would be for you to upload one subject's anonymized data to OSF.io. Then we could probably fairly easily add your dataset as a testing dataset :) Or even if you don't want to make the data public, sharing one subject's files privately somewhere would make it easy for one of us to test and fix locally.

@skjerns
Author

skjerns commented Nov 20, 2024

Yes, the empty-room and rest data are not processed unless I request them explicitly via --task rest1,rest2

Can I help with debugging somehow? I failed to find where config.task == '' is translated into finding the individual runs.

The data is scheduled for upload to a repository anyway (though possibly not OSF), so I could move that step forward if it helps.

@hoechenberger
Member

The data is scheduled for upload to a repository anyway (though possibly not OSF), so I could move that step forward if it helps.

https://openneuro.org might be a good choice 🙂

@SophieHerbst
Collaborator

We recently had a seemingly similar issue where the noise covariance matrix was computed on the task data and was thus missing for resting state in the source step, leading to an error. We had to rerun the noise covariance computation.
Setting source_info_path_update to rest somehow helped.
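For reference, a sketch of what that workaround might look like in the config file. The entity keys below ("task", "processing", "suffix") are assumptions based on the option's dict-of-BIDS-entities form; check the mne-bids-pipeline documentation for source_info_path_update before relying on them:

```python
# Hypothetical config fragment (keys are assumptions, not verified):
# point the source step's info lookup at the cleaned resting-state raw
# file instead of the default task file.
source_info_path_update: dict = {
    "task": "rest",
    "processing": "clean",
    "suffix": "raw",
}
```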

In general, it is a bit confusing that the noise covariance matrix is computed in the sensor step (even though it makes sense from the data point of view), at a stage where the user might not yet have thought about what they want for the inverse.
Also, the documentation of the noise covariance is split between the two places (sensor and source steps). Maybe this could be simplified?

@hoechenberger
Member

In general, it is a bit confusing that the noise covariance matrix is computed in the sensor step

Good point. It was a conscious decision back then to put it into the "sensor" set of steps, but I don't think we're really using it outside of anything inverse-modeling-related, right?

@larsoner
Member

I don't think we're really using it outside of anything inverse-modeling-related, right?

Looking at whitened evoked data and related SNR can be useful outside of source imaging contexts as well

@hoechenberger
Member

I don't think we're really using it outside of anything inverse-modeling-related, right?

Looking at whitened evoked data and related SNR can be useful outside of source imaging contexts as well

Good point.

What do you make of the behavior described above? Bug? Or conceptual issue in the Pipeline? I have to admit I don't fully grasp it yet!

@larsoner
Member

It sounds like there are potentially multiple bugs in terms of which files we preprocess and/or which files we use to create noise covariances 😬

@berkgercek

I'm also having the same issue: the pipeline runs fine (_02_find_empty_room runs as expected and succeeds), but when I get to the source processing steps, the clean empty-room covariance files are not found. When I attempt to run the pipeline directly with --task noise, it fails while attempting to locate the noise file, with run-None included as the run identity.

I'm running pipeline version 1.9.0 for reference, with MNE 1.7.1 (1.8 and above cause issues with anonymized Helium info)

@berkgercek

Upon looking at the _emptyroommatch.json file, the filename is null, so succeeding in that step doesn't necessarily mean an actual empty-room file was associated. I'll poke around the code to see if I can find the issue.
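A quick way to surface such silent failures is to scan the sidecars directly. The internal key layout of *_emptyroommatch.json is an assumption here; this sketch simply treats a JSON object whose values are all null/empty as "no empty-room recording was matched":

```python
import json
from pathlib import Path


def emptyroom_match_is_empty(match_json: Path) -> bool:
    """Return True if the emptyroom-match sidecar records no matched file.

    Assumption: the exact keys inside *_emptyroommatch.json are not
    checked; any object whose values are all null/empty strings is
    treated as "no empty-room recording was matched".
    """
    data = json.loads(match_json.read_text())
    return all(v in (None, "") for v in data.values())
```

Running this over derivatives/**/*_emptyroommatch.json would show which subjects silently failed the match.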

@hoechenberger
Member

Thanks @berkgercek, I appreciate the effort!

@berkgercek

OK, so after looking at _02_find_empty_room, it seems to look exclusively at the candidates offered by MNE-BIDS' path._find_empty_room_candidates. This function will only look for files that use the emptyroom subject ID.

The BIDS specification suggests two possible routes to tagging empty-room recordings:

  1. The subject is set to emptyroom, as implemented in MNE-BIDS, and the empty-room recordings are date-tagged using the session identifier.
  2. The task is set to noise but otherwise the subject and session identities are preserved. This is not searched for in file paths by MNE-BIDS.

Unless mne-bids-pipeline implements additional search logic (I didn't see anything like this in _02_find_empty_room), case 2 will not be recognized if the AssociatedEmptyRoom field in the JSON sidecar is empty.

This, at least, is why I've had issues in my pipeline. For now I've solved the issue by switching to method 1, which accommodates my laziness in not maintaining the sidecar JSON.

I'll create an issue in MNE-BIDS to suggest looking for same-subject/session task-noise files as part of the BIDSPath.find_empty_room logic and post it in this thread when I do.
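The two routes can be illustrated with a tiny filename-entity parser. This is illustrative only, not how MNE-BIDS actually searches (the real logic lives in path._find_empty_room_candidates, which currently covers route 1):

```python
def is_empty_room(fname: str) -> bool:
    """Sketch of the two BIDS routes for tagging empty-room recordings.

    Route 1: the subject entity is 'emptyroom'.
    Route 2: the task entity is 'noise' (subject/session as in the data).
    Illustrative only -- not how MNE-BIDS performs its search.
    """
    # Parse "key-value" entities from an underscore-separated BIDS name.
    entities = dict(
        part.split("-", 1) for part in fname.split("_") if "-" in part
    )
    return entities.get("sub") == "emptyroom" or entities.get("task") == "noise"
```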

@berkgercek

The issue is now open at mne-bids.
