-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OndaEDF should use EDF-representable encoding parameters (physical min & max) #90
Comments
EDF (I presume) was designed as a format for data generated by A/D converters, which (excluding μ-law or other non-linear types) has a natural resolution (how much is 1 lsb worth?) and a span or range of valid values. On the other hand, if EDF is used as an output format from some complex process, the contextual clues from the data's origin may be unavailable (or never existed) and the problem changes to one of finding the best mapping given the limits of the format (e.g. 8 char ASCII representation of mapping parameters and Int16 samples). Those two constraints can play against each other. The EDF error reported above disappears if one chooses The error does not disappear if one naïvely chooses Is there an algorithmic way to find the optimal mapping into EDF? Would optimum be defined as minimum maximum error over a whole signal? Minimum RMS error? Should there be a function to determine the error in a given case? |
I'm not sure if I'm am following 100% but I think "two constraints can play against each other" refers to the two independent sources of error:
I don't think these two are really fundamentally in tension however! Why?
We can use function _nearest_representable_edf_value(x, rm::RoundingMode=RoundNearest)
return round(x, rm; digits=(8 - (ndigits(floor(Int, x)) + signbit(x) + !isinteger(x))))
end to choose one, deciding on whether or not we want to look up or down to get it. (See beacon-biosignals/EDF.jl#76 for the bugfix I've made to this function compared to it's version in EDF.jl). So I think if we just choose BTW, I will include some code below I have used while investigating, in case it is useful (and to put it somewhere). Codeusing Printf
function edf_roundtrip(x::Real)
result = missing
if isinteger(x)
str = string(trunc(Int, x))
if length(str) <= 8
result = str
end
else
fpart, ipart = modf(x)
ipart_str = string('-'^signbit(x), Int(abs(ipart))) # handles `-0.0` case
fpart_str = @sprintf "%.7f" abs(fpart)
fpart_str = fpart_str[3:end] # remove leading `0.`
if length(ipart_str) < 7
result = ipart_str * '.' * fpart_str[1:(7-length(ipart_str))]
elseif length(ipart_str) <= 8
result = ipart_str
end
end
if !ismissing(result)
return parse(Float64, result)
end
error("failed to fit header field into EDF's 8 ASCII character limit. Got: $x")
return nothing
end
edf_roundtrip_error(x) = abs(x - edf_roundtrip(x))
function _nearest_representable_edf_time_value(x, rm::RoundingMode=RoundNearest)
return round(x, rm; digits=(8 - (ndigits(floor(Int, x)) + signbit(x) + !isinteger(x))))
end
decode(sample, dmin, dmax, pmin, pmax) = ((sample - dmin) / (dmax - dmin)) * (pmax - pmin) + pmin
encode(sample, resolution_in_unit, offset_in_unit) = round(Int16, clamp((sample - offset_in_unit) / resolution_in_unit, typemin(Int16), typemax(Int16)))
function quantization_error(sample, dmin, dmax, pmin, pmax)
sample_resolution_in_unit = (pmax - pmin) / (dmax - dmin)
sample_offset_in_unit = pmin - (sample_resolution_in_unit * dmin)
encoded_sample = encode(sample, sample_resolution_in_unit, sample_offset_in_unit)
decoded_sample = decode(encoded_sample, dmin, dmax, pmin, pmax)
return abs(sample - decoded_sample)
end
function quantization_plus_thresholding_error(sample, dmin, dmax, pmin, pmax)
sample_resolution_in_unit = (pmax - pmin) / (dmax - dmin)
sample_offset_in_unit = pmin - (sample_resolution_in_unit * dmin)
encoded_sample = encode(sample, sample_resolution_in_unit, sample_offset_in_unit)
decoded_sample = decode(encoded_sample, edf_roundtrip(dmin), edf_roundtrip(dmax), edf_roundtrip(pmin), edf_roundtrip(pmax))
return abs(sample - decoded_sample)
end
additional_error_due_to_truncation = (args...) -> (quantization_plus_thresholding_error(args...) - quantization_error(args...))
additional_quantization_error_due_to_rounding = (sample, dmin, dmax, pmin, pmax) -> (quantization_error(sample, _nearest_representable_edf_time_value(dmin, RoundDown), _nearest_representable_edf_time_value(dmax, RoundUp), _nearest_representable_edf_time_value(pmin, RoundDown), _nearest_representable_edf_time_value(pmax, RoundUp)) - quantization_error(sample, dmin, dmax, pmin, pmax))
function error_curve(f, pmin, pmax)
dmin, dmax = Float64(typemin(Int16)), Float64(typemax(Int16))
xs = range(pmin, pmax; length=100)
ys = [f(sample, dmin, dmax, pmin, pmax) for sample in xs]
return (xs, ys)
end
using CairoMakie
pmin, pmax = -3200.0, 3200.0 # or pmin, pmax = 1e6 + 0.999999, 1e6 + 1.999999
# periodic, probably an analytic formula
lines(error_curve(quantization_error, pmin, pmax)...)
lines(error_curve(quantization_plus_thresholding_error, pmin, pmax)...)
lines(error_curve(additional_error_due_to_truncation, pmin, pmax)...)
lines(error_curve(additional_quantization_error_due_to_rounding, pmin, pmax)...) BTW: this is why I've been saying that I don't think stuff like beacon-biosignals/EDF.jl#70 is really getting at the root problem here. If we represent more numbers accurately, that's helpful, but really we should be choosing exactly-representable encoding parameters from the start, since we both get to choose them here and know the target is EDF.jl, and can thus eliminate truncation error completely, just by calling |
@ericphanson Did you mean for the funcitons |
@ericphanson I don't have time today to keep up with all the great detail you've put in here, but let me know if I've got the TL;DR summary correct: The root of the problem in your original example was choosing offset and resolution at the Onda stage without considering that the chosen encoding parameters are not exactly representable in EDF. |
No, sorry, they are the same. The
Yes! That was actually what I was trying to say with this whole issue. |
physical_minimum
andphysical_maximum
) and then tries to store those in the file as 8 ASCII charactersOne approach here is to make EDF.jl better at representing encoding parameters, e.g. using scientific notation. I think improving EDF.jl's representation is a good thing to do. However, since OndaEDF gets to choose it's encoding, and a wide variety of encoding parameters can perform reasonably well for a given recording, I claim it should choose easy-to-represent ones.
That is, we could round
physical_minimum
down to the nearest EDF representable value, andphysical_maximum
up to the nearest EDF representable value before encoding. Thus, there should be no truncation/rounding when EDF encodes that parameter, and thus reduces unnecessary quantization noise.As a very silly example, the current approach can result in 5 orders of magnitude more encoding noise, simply due to the truncation of encoding parameters:
yields
Note in both cases we are using
OndaEDF.reencode_samples
to re-encode them. The only difference is passing through EDF.jl writing, which rounds/truncates these parameters.The text was updated successfully, but these errors were encountered: