TensorFlow Lite and Fixed-Point Quantization | SF-Zhou's Blog #183
SF-Zhou
announced in
Announcements
Replies: 1 comment
-
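For float32, under the IEEE 754 standard, the significand consists of the 23 explicitly stored low-order bits plus one implicit leading 1 bit, which is why I described it as 24 bits. Running the following C++ code is a simple way to verify this:
#include <cstdio>
int main() {
    // Count upward in steps of 1 until adding 1 no longer changes the value.
    float a = 0.f;
    while (a + 1.f > a) {
        a += 1.f;
    }
    // On targets that evaluate float expressions in float precision,
    // the loop stops at 2^24 and this prints 16777216.00.
    printf("%.2f\n", a);
}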
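The same 24-bit claim can also be checked without a loop. The sketch below is only an illustration, assuming a platform whose `float` is IEEE 754 binary32; the helper names `first_saturating_float`, `plus_one_is_lost`, and `check_float_precision` are hypothetical and do not come from the original post.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Significand bits of float, counting the implicit leading 1.
// This is 24 when float is IEEE 754 binary32 (an assumption of this sketch).
constexpr int kFloatDigits = std::numeric_limits<float>::digits;

// Hypothetical helper: 2^digits, the first value at which adding 1.0f
// can no longer be represented exactly.
inline float first_saturating_float() {
    return std::ldexp(1.0f, kFloatDigits);  // 1.0f * 2^kFloatDigits
}

// Hypothetical helper: true when x + 1 rounds back to x in float precision.
inline bool plus_one_is_lost(float x) {
    // volatile forces the sum to be stored (and rounded) as a float,
    // even if the compiler evaluates expressions in higher precision.
    volatile float next = x + 1.0f;
    return next == x;
}

// Hypothetical helper: aborts via assert if the assumptions do not hold.
inline void check_float_precision() {
    assert(kFloatDigits == 24);
    assert(first_saturating_float() == 16777216.0f);  // 2^24
    assert(plus_one_is_lost(16777216.0f));            // 2^24 + 1 is not representable
    assert(!plus_one_is_lost(16777215.0f));           // 2^24 - 1 still increments exactly
}
```

Calling `check_float_precision()` from `main` should return silently on such a platform. Every integer up to 2^24 is exactly representable in float32, and 2^24 + 1 is the first one that is not, which matches the 23 stored bits plus one implicit bit described above.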
https://sf-zhou.github.io/ml/tensorflow_lite_and_uint8_quantization.html