Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sample_rate vs sample_frequency: how to continue? #247

Open
skjerns opened this issue Nov 9, 2023 · 10 comments
Open

sample_rate vs sample_frequency: how to continue? #247

skjerns opened this issue Nov 9, 2023 · 10 comments

Comments

@skjerns
Copy link
Collaborator

skjerns commented Nov 9, 2023

There is confusions about the two terms and how they are implemented in pyedflib and also edflib. I'd like you'r opinion on how to continue. @DimitriPapadopoulos @cbrnr @holgern @raphaelvallat @LucaCerina @cfranklin11

EDF specs

In the specs the following is defined:
* 8 ascii : duration of a data record, in seconds -> we call this internally record_duration
*ns * 8 ascii : ns * nr of samples in each data record (per channel) -> we call this internally smp_per_record

There is no sample_frequency or sample_rate within the EDF specs. However, these can be derived by dividing smp_per_record by record_duration.

edflib

edflib, introduces samplefrequency (equal to smp_per_record/record_duration, not always Hz) and samplerate (Hz), which is not actually implemented to set, just mentioned in the docstrings. There's an inconsistency as Hz is defined as cycle per second, but this isn't always reflected in edflib:

  • For example, if you want to use a samplerate of 0.5 Hz, set the samplefrequency to 5 Hz and the datarecord duration to 10 seconds, or set the samplefrequency to 1 Hz and the datarecord duration to 2 seconds."

pyedflib

In pyedflib, sample_rate was initially like edflib's samplefrequency. Later, sample_frequency was introduced to represent Hz, simplifying the user experience. The definition of sample_frequency in Hz is also how it is done in `python-edf.

Solutions

In principle I think nobody cares about the record_duration or how things are implemented internally. People want to set and retrieve a sampling rate in Hz and be done. You should not have to think about the record_duration unless you have good reason to. Imo the decision to be made is which of the two terms to keep.

Options:

A) Keep both terms, eventually removing sample_rate. This is user-friendly but inconsistent with edflib.

B) Revert to sample_rate only, aligning with edflib but disrupting recent changes. Treat sample_rate as Hz.

C) Align with edflib: use sample_frequency for smp_per_record and sample_rate for Hz, which could confuse users.

Personally I think A&B seem to be the best options, with A the least disrupting. If the unfortunate introduction of sample_frequency would not have been done, I would favour B, but that's too late now. What do you think?

@cbrnr
Copy link
Contributor

cbrnr commented Nov 9, 2023

I agree with you, and since sample_rate and sample_frequency are synonymous (at least in general, i.e. not talking about some specific implementation details), I would vote for B. I don't think it's too late for that, you can just deprecate sample_frequency and go back to sample_rate.

@raphaelvallat
Copy link
Contributor

I prefer sample_frequency over sample_rate, which is more consistent with scipy.signal and MNE. My vote is for A :)

@skjerns
Copy link
Collaborator Author

skjerns commented Dec 16, 2023

Any more opinions? @DimitriPapadopoulos @holgern @LucaCerina @cfranklin11

@cbrnr
Copy link
Contributor

cbrnr commented Dec 16, 2023

BTW I'm also fine with A as long as there's only one term (eventually after a deprecation cycle).

@LucaCerina
Copy link
Contributor

I would go with A.
Looking at EDFlib their definition of samplefrequency forcing an integer value is a bit weird and I think it's ok to not align with that

@TrianecT-Wouter
Copy link

A. Does not make sense to talk about samples / record for end users

@cfranklin11
Copy link
Contributor

I've changed jobs since submitting my PR, so don't work with EDFs anymore. I'll defer to those who still work with them on a day-to-day basis.

Just to offer some context, the changes introduced in PR #121, were due to the fact that where I worked we had a lot of EDFs whose sample frequency wasn't per second, which led to incorrect calculations when working with pyedflib. I don't have enough experience with EDFs to know if this is an edge case that's not worth the added complexity, but it is a situation that exists for some users.

@cbrnr
Copy link
Contributor

cbrnr commented Dec 17, 2024

Out of curiosity, what unit for the sampling frequency did you use?

@cfranklin11
Copy link
Contributor

Out of curiosity, what unit for the sampling frequency did you use?

It's been a few years, so my memory is a little fuzzy on the details, but it did vary from study to study. Most of the studies had the usual 1 sample per second, but a sizeable minority didn't. The non-standard frequencies weren't always whole numbers, but I think they were generally between 1 and 2 per second (inclusive).

@cbrnr
Copy link
Contributor

cbrnr commented Dec 18, 2024

But in this case they were still all per second (i.e., Hz), which is the standard unit of frequency. It is perfectly fine (and very common) that the sampling frequency is not an integer. What I don't understand is if there are cases where the sampling frequency is not in units of Hz (1 / s), as @skjerns writes in the first comment. Even smp_per_record/record_duration is a sampling frequency in units of 1 / s, but the problem with the internal EDF implementation seems to be that to realize a non-integer sampling frequency, it has to be constructed with an integer record duration (consisting of an integer number of samples of course).

In summary, I can only echo what has already been stated by several people, end users do not care about how things are implemented internally, they want to be able to specify a sampling frequency (which is also referred to as sampling rate). In my opinion, consistency with edflib is not important. FWIW, edfio also uses sampling_frequency, but its meaning is just "sampling frequency" and not some internal EDF implementation detail. This would also be a nice way to use a name that differs from edflib, because they use sample_frequency, but pyedflib could use sampling_frequency (consistent with edfio).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants