ENH: Add mne_ch_name_mapping #29

Closed
larsoner wants to merge 1 commit

Conversation

larsoner
Member

@jnenonen this PR adds a new tag to the MNE namespace to allow channel names longer than 15 characters by giving a JSON dictionary mapping between 15-char and arbitrary-length names. But before doing that, I thought I'd check to see if MEGIN folks would be interested in using this sort of functionality, in which case it could live in the main namespace. Quoting the MNE issue:

The reason there is a limitation at all IIUC is because the channel info struct is a fixed-size C struct. We could add a FIFFV_MNE_CHANNEL_NAMES_MAPPING tag as a string JSON dump of the dict mapping short-to-original names that has the following behavior:

  1. When writing, (temporarily) replace info['chs'] and info['bads'] names with the shortened ones; write the FIFFV_MNE_CHANNEL_NAMES_MAPPING JSON
  2. When reading, check for the presence of the FIFFV_MNE_CHANNEL_NAMES_MAPPING JSON and use it to make the appropriate substitutions in info['chs'] and info['bads'].

In other words, we add a tag that says "do rename_channels(...) during writing and reading".
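
A minimal sketch of the round trip described above, in plain Python without MNE. The helper names (`make_short_names`, `restore_names`) and the truncate-plus-counter shortening scheme are assumptions made for illustration only; they are not part of this PR or of the MNE API. The same mapping would be applied to both the info['chs'] names and info['bads'].

```python
import json

MAX_LEN = 15  # name length limit mentioned in this thread


def make_short_names(names, max_len=MAX_LEN):
    """Return (short_names, mapping), where mapping is short -> original.

    Hypothetical helper: only names longer than max_len get a mapping entry.
    """
    short_names = []
    mapping = {}
    # Names that already fit are reserved so shortened names cannot collide.
    used = {name for name in names if len(name) <= max_len}
    for name in names:
        if len(name) <= max_len:
            short_names.append(name)
            continue
        # Assumed scheme: truncate and append a counter until the name is unique.
        count = 0
        while True:
            suffix = f"-{count}"
            short = name[: max_len - len(suffix)] + suffix
            if short not in used:
                break
            count += 1
        used.add(short)
        mapping[short] = name
        short_names.append(short)
    return short_names, mapping


def restore_names(short_names, mapping_json):
    """Undo the shortening using the JSON string stored in the mapping tag."""
    mapping = json.loads(mapping_json)
    return [mapping.get(name, name) for name in short_names]


# Writing: shorten the names and serialize the mapping as one JSON string.
names = ["MEG 0111", "a_very_long_channel_name_1", "a_very_long_channel_name_2"]
short, mapping = make_short_names(names)
payload = json.dumps(mapping)

# Reading: the presence of the payload triggers the reverse substitution.
restored = restore_names(short, payload)
```

A reader that does not know about the tag simply sees the shortened names, which is what keeps the approach backward compatible.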

In theory this could be implemented in MEGIN C programs, too, but I assume it would be more of a pain because of structs/types/assumptions about these channel names.

Any interest in having this in the main DictionaryTags (e.g., if you think you would use it someday), or should we just keep it here in DictionaryTags_MNE.txt ?

@agramfort
Member

@jnenonen we need you to approve or question this.

@jnenonen
Collaborator

jnenonen commented Nov 19, 2020 via email

@larsoner
Member Author

@mkajola sounds like it's up to you!

@mkajola
Collaborator

mkajola commented Nov 24, 2020 via email

@larsoner
Member Author

larsoner commented Nov 24, 2020

Embedded json is probably practical for python, but not necessarily for other platforms... it kind of breaks the basic idea of the fiff file... One also needs multiple parsers to read the files.

For what it's worth, we chose JSON because of its adoption across multiple languages -- a quick search turned up several possible C parsers (e.g., this one that is MIT/commercial-compatible single source and single header file) and the same goes for other languages. But it is indeed more overhead than using FIF tags directly, as you say, and is really just a quick workaround/band-aid solution to use a single string tag to encode what typically would be encoded using multiple tags (via a new block type, etc.).

What is the actual need here? Is it just a longer name for the channel? Then no new logical info would be in the file, just encoded differently. In some contexts, there also seems to be a need to have e.g. user defined labels etc. which are not necessarily the same thing as the ‘name’ of the channel... in this century Unicode is the way to go... The idea that has been in the air for a very long time has been to change the rigid channel info records to normal fiff-blocks.

Channel-name-length is the most pressing need, but it would be great to encode other information in a more flexible manner, support unicode, etc. as these are also limitations of the current format. A new "channel block" scheme that takes precedence over the old "channel struct" entries is definitely a better solution!

To me the question is, what do you see as a potential timeline for finalization and adoption of this sort of scheme? If we knew that this proposal could be discussed, finalized, and implemented within 6 months, there wouldn't be any point in us implementing the band-aid JSON approach. On the other hand, if we knew it would take 6 years, then the JSON band-aid would have value as a short/medium-term workaround because it solves a real problem for our users in the meantime.

Proposal

To try to keep the proposed "channel block" solution toward the <= 6-month end of the spectrum, how about we start small with only the components we need within a framework we can expand later as we see a need for new tags? Concretely, we need to keep the channel info struct for backward compatibility, so there is no reason to replicate the entries already in there. So let's create a new block and for now only add the types that would take care of two concrete problems -- channel name length, and (optionally) unicode encoding:

  1. Create a new FIFFB_CH_BLOCK: the number of these blocks should match FIFF_NCHAN, and it should live within FIFFB_MEAS_INFO
  2. Add a tag FIFF_CH_INFO_NAME that links the given channel block to its channel info struct
  3. Add a tag FIFF_CH_NAME that is a channel name string of arbitrary length
  4. (Optional) Add a new FIFF.FIFFT_UNICODE that indicates that the tag is a unicode string

Step (4) seems like the most annoying to implement in C, so it could be omitted and implemented later -- LATIN-1 strings could be used for now. To represent this visually with how it would show up in mne show_fiff, it would be something like:

999 = FIFFB_ROOT
    ...
    100 = FIFFB_MEAS
        103 = FIFF_BLOCK_ID (20b ids) = {'version': 65537, 'machid': a ... dict len=4
        101 = FIFFB_MEAS_INFO
            ...
            200 = FIFF_NCHAN (4b >i4) = [376]
            x376: 203 = FIFF_CH_INFO (96b cis) = {'scanno': 376, 'logno': 61, ' ... dict len=11
            000 = FIFFB_CH_BLOCK
                000 = FIFF_CH_INFO_NAME (8b str) = 'MEG 0111'
                000 = FIFF_CH_NAME (19b unicode) = 'MEG 0111 ⏯️⏯️⏯️⏯️⏯️⏯️⏯️⏯️⏯️⏯️'
            000 = FIFFB_CH_BLOCK
                000 = FIFF_CH_INFO_NAME (8b str) = 'MEG 0112'
                ...
            ...

Does this sort of thing seem like a reasonable starting point? I think it checks all the boxes you mention above @mkajola and it should be easy enough for us to implement at the MATLAB/Python end, and definitely seems better from a long-term perspective.
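
As a rough sketch of how the reading side of this proposal might look, assuming the channel structs and channel blocks have already been parsed into plain dicts. The function name `apply_channel_blocks` and the dict keys `info_name` and `name` are hypothetical stand-ins for the placeholder FIFF_CH_INFO_NAME and FIFF_CH_NAME tags; nothing here is an existing MNE or MEGIN API.

```python
def apply_channel_blocks(chs, ch_blocks):
    """Return copies of chs with names replaced by the channel-block names.

    chs: list of dicts with a 'ch_name' key (parsed from the fixed-size structs).
    ch_blocks: list of dicts with 'info_name' (the struct name the block links
        to) and 'name' (the arbitrary-length name). Both keys are hypothetical.
    """
    # Item 1 of the proposal: one block per channel.
    if len(ch_blocks) != len(chs):
        raise ValueError(
            f"Expected {len(chs)} channel blocks, got {len(ch_blocks)}"
        )
    long_names = {block["info_name"]: block["name"] for block in ch_blocks}
    # Item 2 of the proposal: every block must link to an existing struct.
    unknown = set(long_names) - {ch["ch_name"] for ch in chs}
    if unknown:
        raise ValueError(f"Blocks refer to unknown channels: {sorted(unknown)}")
    # The block name takes precedence over the struct name.
    return [
        dict(ch, ch_name=long_names.get(ch["ch_name"], ch["ch_name"]))
        for ch in chs
    ]


chs = [{"ch_name": "MEG 0111", "kind": 1}, {"ch_name": "MEG 0112", "kind": 1}]
blocks = [
    {"info_name": "MEG 0111", "name": "MEG 0111 left temporal gradiometer"},
    {"info_name": "MEG 0112", "name": "MEG 0112"},
]
renamed = apply_channel_blocks(chs, blocks)
```

Older readers would ignore the unknown block and keep using the struct names, so the file stays readable either way.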

@larsoner
Member Author

(FYI I edited my GitHub comment to fix name consistency in the proposal, so if you're looking at my replies over email it's worth going on GitHub to see the up-to-date version of the proposal. Also the tag/block names and 000's are just placeholders -- if we agree this is a good way to move forward, I can open a PR with some proposed names and we can iterate there!)

@agramfort
Member

I can only voice my support to a solution over the next few months. We have more and more users hitting the limitations of fif when working with different input formats that do not have the same restrictions. It's becoming problematic for a wider adoption of mne. my 2c

@mkajola
Collaborator

mkajola commented Nov 25, 2020

I think that this would be a useful thing, so let's see if we can realize it. That would then be in version 1.5. (Do we have the Manual somewhere on GitHub, or only the tag directory text file?) Unicode should then come later in 2.x, since touching the structures/primary types will not be backwards compatible.
The tags for the fields exist already - tags 250 - 258.
There seems to be no block for this (I suspect that the hole at 113 might have been it), so that needs to be specified.
The most interesting part here is to check how many readers we break with this. But to know that, I suppose we first need to create some files and test them. I think we can do that here for all (well, at least for the most important) Megin software.

What would be the correct talking/specification environment for this? I suppose here we are approving/disapproving the JSON workaround. For me that's fine if it is in the MNE reserved area.
This question also applies to the other restrictions in the format. It would be good to collect a list of those so that it would be possible to evaluate whether there is a good way to change FIFF or whether there should be some entirely different way to go.
In the same arena we should probably also discuss how to handle the different profiles of fiff usage. Currently, specifying that one supports a particular version is a bit fuzzy: it is unclear whether it means the Megin spec or the extended one. It starts to resemble DICOM.

@larsoner
Member Author

What would be the correct talking/specification environment for this?

I opened a PR that is an alternative to / supersedes this one: #33. Let's discuss there as people on GitHub can leave comments directly on the code in line, comment on the diff, etc. If we merge that PR we'll close this JSON band-aid one as it's redundant and a less good solution.

Do we have the Manual somewhere on GitHub, or only the tag directory text file?

Just text -- let's discuss further in #31, it would be great to have the build automated.

This question also applies to the other restrictions in the format. It would be good to collect a list of those so that it would be possible to evaluate whether there is a good way to change FIFF or whether there should be some entirely different way to go.

Opened an issue to discuss as people have ideas: #30

@larsoner
Member Author

Closing this as we hopefully won't need it!

@larsoner larsoner closed this Nov 25, 2020
4 participants