request to provide a sample script to de-identify edf #64

Closed
catvasily opened this issue Dec 13, 2019 · 25 comments

@catvasily

Thank you for sharing the package.
May I kindly ask you to provide a sample script showing how to remove the patient's name from a given EDF+ file, or to replace it with a fake name?
Thank you.

@catvasily catvasily changed the title sample scipt to de-identify edf sample script to de-identify edf Dec 13, 2019
@catvasily catvasily changed the title sample script to de-identify edf request to provide a sample script to de-identify edf Dec 13, 2019
@skjerns
Collaborator

skjerns commented Jan 7, 2020

I have something like that written. I'll upload it for you, but it requires a few more scripts and functions as well. I'll see if I can make a PR with some more high-level functions to this package.

@skjerns
Collaborator

skjerns commented Jan 7, 2020

@catvasily

I've added the functions and created PR #65.

Until then you can find the functions here: https://github.com/skjerns/pyedflib

more specifically: https://github.com/skjerns/pyedflib/blob/master/pyedflib/highlevel.py

@catvasily
Author

Thank you @skjerns ! It was helpful!
My original EDF files contain annotations, and after anonymization, they are gone.
Is there a way to keep annotations?

@skjerns
Collaborator

skjerns commented Feb 11, 2020

Ahh, right, I totally forgot that those exist.

I'll see if I can find time today to adapt the script.

@skjerns
Collaborator

skjerns commented Feb 12, 2020

@catvasily try again, should work now :)

@catvasily
Author

Thank you @skjerns for refining your toolbox!
The annotations are there! The only thing is that their timing in the anonymized EDF is rounded to whole seconds, so in general the annotations are shifted by up to 1 second.

@skjerns
Collaborator

skjerns commented Feb 13, 2020

Found the culprit.

Can you check again @catvasily?

@skjerns
Collaborator

skjerns commented Feb 21, 2020

@catvasily ? does it work for you?

edit: marking as closed, as no response

@skjerns skjerns closed this as completed Feb 29, 2020
@catvasily
Author

Thank you @skjerns for improving your toolbox.

The annotations are still rounded up to 1 second.

By the way, the last line in the sample script demonstrating "Highlevel Interface" on https://pypi.org/project/pyEDFlib/, should be highlevel.anonymize_edf('edf_file.edf'), right?
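For reference, a minimal sketch of the kind of script originally requested. It assumes `highlevel.anonymize_edf` accepts the keyword arguments `new_file`, `to_remove`, `new_values` and `verify`; check the docstring of your installed pyedflib version before relying on them. The `deidentified_path` helper and the `_deid` suffix are hypothetical names introduced only for this example.

```python
from pathlib import Path


def deidentified_path(src: str, suffix: str = "_deid") -> str:
    """Hypothetical helper: 'rec.edf' -> 'rec_deid.edf'."""
    p = Path(src)
    return str(p.with_name(p.stem + suffix + p.suffix))


def deidentify(src: str, fake_name: str = "xxx") -> str:
    """Write a copy of `src` with the patient name replaced and the birthdate blanked."""
    # Imported lazily so the helper above works without pyedflib installed.
    from pyedflib import highlevel

    dst = deidentified_path(src)
    # Assumed keyword arguments; verify them against your pyedflib version.
    highlevel.anonymize_edf(
        src,
        new_file=dst,
        to_remove=["patientname", "birthdate"],
        new_values=[fake_name, ""],
        verify=True,  # assumed to compare the written file against the original
    )
    return dst
```

Calling `deidentify("edf_file.edf")` would then leave the original untouched and write the de-identified copy next to it.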

@skjerns
Collaborator

skjerns commented May 12, 2020

Did you install the development version?

pip install git+https://github.com/holgern/pyedflib.git --upgrade

If not, try that and check again. It should be working in the current dev version. There's no new release yet.

@catvasily
Author

catvasily commented May 12, 2020

Failed to build pyedflib
I use Anaconda environment 3.7, and am getting an error:
ERROR: Could not build wheels for pyEDFlib which use PEP 517
I tried to use

pip install --upgrade pip setuptools wheel

but to no avail.
Any suggestions?
Thank you.

@skjerns
Collaborator

skjerns commented May 12, 2020

You need to install a VS C++ compiler, e.g. Visual Studio 2015 Community, or maybe even this one might work: Microsoft Visual C++ Compiler for Python 2.7. Make sure the compiler executable (cl.exe) is on your PATH. For some reason the newer versions (Visual Studio 2019) do not work well.

@catvasily
Author

Thank you @skjerns !
I was able to successfully build pyedflib (by installing the python compiler and fixing some issues with Studio 2019).
highlevel.anonymize_edf() introduces one small change though:
it rounds the start time up to 1 second.
For example, for the original edf: start time = 5 sep 2017 10:33:33.4628906
but for the new (de-identified) edf: 5 sep 2017 10:33:33
so it shifts the timing of all the annotations accordingly.
I am wondering if there is a way to fix it. Thanks.

@skjerns
Collaborator

skjerns commented May 12, 2020

Unfortunately, EDF only allows the starttime to be set to seconds, not milliseconds, see EDF specifications or BDF specifications.

8 ascii : starttime of recording (hh.mm.ss), e.g. 12:23:45

Can you provide me with a sample file? Alternatively, open the file in a text editor and send me the first 500 characters. Where did you obtain a file with sub-second resolution?
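To illustrate why the fractional part cannot survive in the header field quoted above, here is a small sketch that formats a start time into the two 8-character ASCII fields. The function name `edf_start_fields` is hypothetical, and the timestamp is the one reported earlier in this thread, truncated to microseconds.

```python
from datetime import datetime
from typing import Tuple


def edf_start_fields(start: datetime) -> Tuple[str, str]:
    """Return the 8-character startdate (dd.mm.yy) and starttime (hh.mm.ss) fields."""
    return start.strftime("%d.%m.%y"), start.strftime("%H.%M.%S")


# 5 Sep 2017 10:33:33.4628906, truncated to microsecond precision
start = datetime(2017, 9, 5, 10, 33, 33, 462890)
date_field, time_field = edf_start_fields(start)
print(date_field, time_field)  # the fraction of a second has no place in either field
```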

@catvasily
Author

Now I am surprised myself.
It is a clinical EEG converted from Natus's proprietary format.
Would be happy if you can take a look.
Could we switch to email correspondence, please?

@catvasily
Author

0 X F 24-SEP-1996 Lastname,Firstname Startdate 05-SEP-2017 X X X 05.09.1710.33.339216 EDF+C 1347 1 35 Trigger Event Patient Event C3 C4 CZ F3 F4 F7 F8 FZ FP1 FP2 FPZ O1 O2 P3 P4 PZ T3 T4 T5 T6 AUX1 ECG1 ECG2 AUX4 AUX5 AUX6 AUX7 AUX8 PG1 PG2 A1 A2 EDF Annotations uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV uV 0 0 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 8711 -1 1 1 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 -8711 1 0 0 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 -32768 1 1 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 32767 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 512 184

@skjerns
Collaborator

skjerns commented May 12, 2020

leave me a message at [removed]

@skjerns
Collaborator

skjerns commented May 12, 2020

05.09.1710.33.339216
This reads as dd.mm.yy, then hh.mm.ss, followed by the 8-byte header-length field (9216).

Your recording also starts at a second mark, according to the header.
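The way these fields run together can be reproduced with a short sketch of the fixed-width layout of the first 256 header bytes. The field widths follow the EDF specification; the names `EDF_MAIN_HEADER` and `parse_main_header` are made up for this example, and the sample header is synthetic, built from the values posted above rather than read from the actual file.

```python
# Fixed-width layout of the first 256 header bytes: (field name, width in bytes).
EDF_MAIN_HEADER = [
    ("version", 8), ("patient", 80), ("recording", 80),
    ("startdate", 8), ("starttime", 8), ("header_bytes", 8),
    ("reserved", 44), ("n_records", 8), ("record_duration", 8),
    ("n_signals", 4),
]


def parse_main_header(raw: bytes) -> dict:
    """Slice the fixed-width ASCII fields and strip their space padding."""
    fields, pos = {}, 0
    for name, width in EDF_MAIN_HEADER:
        fields[name] = raw[pos:pos + width].decode("ascii").strip()
        pos += width
    return fields


# Synthetic header assembled from the values posted in this thread.
values = ["0", "X F 24-SEP-1996 Lastname,Firstname", "Startdate 05-SEP-2017 X X X",
          "05.09.17", "10.33.33", "9216", "EDF+C", "1347", "1", "35"]
sample = b"".join(
    value.ljust(width).encode("ascii")
    for value, (_, width) in zip(values, EDF_MAIN_HEADER)
)
header = parse_main_header(sample)
print(header["startdate"], header["starttime"], header["header_bytes"])
```

Because startdate and starttime each fill their 8 bytes completely, a text editor shows them fused with the following field. The header length is also consistent with 35 signals: 256 bytes for the main header plus 256 bytes per signal.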

@skjerns skjerns reopened this May 12, 2020
@skjerns
Collaborator

skjerns commented May 13, 2020

It seems that the information is not stored in the header itself, but rather inside the signal blocks. This must be an EDF+C/D feature I'm not aware of. However, I cannot figure out yet how to access this information, as it's not part of the header itself.

pyEDFlib is a wrapper for a library written in C called EDFlib. EDFBrowser is also based on EDFlib, and apparently displays the subsecond starttime. So theoretically EDFlib should be able to read this offset information, which is probably called starttime_subsecond there. However, I can't seem to read this information with our wrapper; it always returns 0. I'll ask the creator of EDFlib for assistance.

However, it might be non-trivial to implement this change, as most of the library is set up for second-accuracy starttimes and not subsecond ones. But I'll see.
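As background, the EDF+ specification stores the start of each data record as the onset of the first time-stamped annotation list (TAL) in the "EDF Annotations" signal, so the first record can begin a fraction of a second after the header start time. The sketch below parses such an onset from raw annotation bytes. The function name `record_onset` is hypothetical and the sample bytes are constructed from the start time reported in this thread, not extracted from the real file.

```python
def record_onset(annotation_bytes: bytes) -> float:
    """Onset in seconds (relative to the header start time) of the first TAL."""
    first_tal = annotation_bytes.split(b"\x00", 1)[0]  # TALs end with a 0x00 byte
    onset = first_tal.split(b"\x14", 1)[0]             # onset precedes the first 0x14
    onset = onset.split(b"\x15", 1)[0]                 # drop an optional duration part
    return float(onset.decode("ascii"))


# Constructed example: a time-keeping TAL for a record starting 0.4628906 s late.
offset = record_onset(b"+0.4628906\x14\x14\x00")
print(offset)
```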

@Teuniz

Teuniz commented May 13, 2020

There were two problems with EDFlib:

  1. When opening a file for reading, a bug caused the subsecond starttime to be always zero.

  2. There was no function to write the subsecond starttime when writing a new file.

Both issues have been addressed in EDFlib.

Kind regards,

Teunis

@skjerns
Collaborator

skjerns commented May 13, 2020

@Teuniz Thanks for the quick response! That's great news. I'll move this to #79 and implement the changes from there, and report back here once I'm done.

@skjerns
Collaborator

skjerns commented Jul 11, 2020

@catvasily I've implemented the changes for subsecond accuracy. Can you check if it works?

Test it with the dev version: pip install git+https://github.com/holgern/pyedflib.git

@catvasily
Author

Hi @skjerns, it is working!!
The timing is preserved now, and
max(original_signal_amplitudes - anonymized_signal_amplitudes) = 0, as expected.
I have noticed a very small difference in the time stamps for annotations - it is of the order of 10^(-4) second.
For example,
for one annotation in the original edf: 0:09:25:3905,
but for the corresponding annotation in the anonymized edf: 0:09:25:3906,
which is not critical.
The rest seems to be the same.
Thank you for developing this toolbox.  It is really helpful!

@skjerns
Collaborator

skjerns commented Jul 13, 2020

@catvasily that's great to hear.

I'll look into the issue with the shift by 0.1ms. It really isn't much, however it would still be better to have it consistent.

@skjerns skjerns closed this as completed Jul 13, 2020
@skjerns skjerns reopened this Jul 15, 2020
@skjerns
Collaborator

skjerns commented Jul 17, 2020

This seems to be a problem with writing the annotations in general, not with the starttime itself. The starttime is set accurately, but the annotation timings are not; they are only written with up to 0.1 ms accuracy. It should be possible to keep them the same. I'll continue in #86.
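One way a one-tick difference like 3905 versus 3906 could arise is sketched below. This is an assumption for illustration, not a diagnosis confirmed in this thread: if onsets are stored on a 0.1 ms grid, truncating and rounding the same value can land on neighbouring ticks. The onset value and the name `quantize_onset` are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

TICK = Decimal("0.0001")  # 0.1 ms, the accuracy mentioned above


def quantize_onset(onset_s: str, rounding: str = ROUND_HALF_UP) -> Decimal:
    """Snap an onset given in seconds to the 0.1 ms grid."""
    return Decimal(onset_s).quantize(TICK, rounding=rounding)


onset = "565.39055"  # hypothetical onset, i.e. 0:09:25.39055
truncated = quantize_onset(onset, ROUND_DOWN)
rounded = quantize_onset(onset, ROUND_HALF_UP)
print(truncated, rounded, rounded - truncated)
```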

@skjerns skjerns closed this as completed Jul 17, 2020

3 participants