When I generate statistics from a `.tfrecord` file with `generate_statistics_from_tfrecord`, the histograms contain unexpected float values as the `sample_count` of each bucket.
For example, a bucket that should contain 10 samples reports `sample_count: 9.94000000834465` instead. How can I get an exact integer `sample_count` for each bucket? Here's a Colab to reproduce.
TFDV currently uses an approximate method to determine the bucket boundaries in a single pass, and that approximation is what produces the fractional counts. One option is to post-process the statistics and round the values.
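A minimal sketch of that post-processing, assuming the statistics are the `DatasetFeatureStatisticsList` proto returned by `generate_statistics_from_tfrecord` and that only numeric-feature histograms (`num_stats.histograms`) need fixing. Rather than rounding each bucket independently, it uses the largest-remainder method so the integer counts still add up to the rounded total. The helper names are hypothetical, not part of TFDV, and the result only looks exact: it does not recover the true per-bucket counts.

```python
import math
from types import SimpleNamespace
from typing import List


def round_preserving_total(counts: List[float]) -> List[int]:
    """Round fractional counts to integers that sum to round(sum(counts)).

    Largest-remainder method: floor every count, then hand the leftover
    units to the counts with the largest fractional parts.
    """
    target = round(sum(counts))
    floors = [math.floor(c) for c in counts]
    shortfall = target - sum(floors)  # always between 0 and len(counts)
    # Indices by descending fractional part; sorted() stays stable on ties.
    order = sorted(range(len(counts)), key=lambda i: counts[i] - floors[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors


def round_histogram(histogram) -> None:
    """Round bucket counts in place (expects an object with .buckets[].sample_count,
    e.g. a tensorflow_metadata Histogram proto)."""
    rounded = round_preserving_total([b.sample_count for b in histogram.buckets])
    for bucket, count in zip(histogram.buckets, rounded):
        bucket.sample_count = count


def round_numeric_histograms(stats_list) -> None:
    """Apply round_histogram to every numeric feature's histograms (hypothetical helper)."""
    for dataset in stats_list.datasets:
        for feature in dataset.features:
            if feature.HasField("num_stats"):
                for histogram in feature.num_stats.histograms:
                    round_histogram(histogram)


if __name__ == "__main__":
    # Stand-in objects mimicking Histogram/Bucket protos, for illustration only.
    hist = SimpleNamespace(
        buckets=[SimpleNamespace(sample_count=c) for c in (9.94000000834465, 10.03, 10.03)]
    )
    round_histogram(hist)
    print([b.sample_count for b in hist.buckets])  # [10, 10, 10]
```

In practice you would call `round_numeric_histograms(stats)` on the object returned by `tfdv.generate_statistics_from_tfrecord(...)` before visualizing or saving it. Keep in mind that the rounded counts are still approximations; they are simply presented as integers.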