Train imatrix with model weights in 64Bit Precision #11072
Closed
joseph777111 started this conversation in Ideas
Replies: 1 comment · 5 replies
-
Unless the model was trained and saved at 64 bits, it provides no value; even going bf16 -> f32 has no value (other than allowing GPU offloading during the calculation, until we get CUDA bf16 support).
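To make the upcasting point concrete (my own arithmetic, not from the thread): bf16 keeps only 8 significand bits, and every bf16 value is exactly representable in f32 and f64, so casting up reproduces the same number rather than recovering the precision lost when the checkpoint was saved. For example:

$$
0.1 \;\xrightarrow{\ \text{round to bf16}\ }\; 1.1001101_2 \times 2^{-4} = 0.10009765625 \;\xrightarrow{\ \text{cast to f32 or f64}\ }\; 0.10009765625
$$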
-
This thought has been bugging me in the back of my mind: can we train imatrices with the model weights in 64-bit precision? I know it's overkill, but is it possible? Training the imatrix against an F32 conversion of the model yields superior results. In fact, on my M1 Mac I can train the F32 imatrix with the --process-output flag set (for llama-imatrix), and the model genuinely benefits from it. Extrapolating from that, I imagine that training imatrices in 64-bit precision would yield even better results, given that I run my GGUF IQuants as OF32.EF32.IQ8_0 (Output Tensor.Embeddings.QuantSize). So I'm curious: what would an imatrix and a model, computed and quantized respectively from 64-bit model weights, yield? Is this possible, or am I ranting like a madman? 🤔
PS... Llama.cpp + Apple Metal + FA (Flash Attention) is AWESOME! ❤️
@ggerganov @bartowski1182 @ikawrakow
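For readers who want to reproduce the F32 setup described above, a minimal sketch of the workflow; the file names, calibration data, and the final IQ4_XS target are placeholders chosen for illustration, and flag spellings may differ slightly between llama.cpp versions:

```bash
# 1. Convert the HF checkpoint to an F32 GGUF.
python convert_hf_to_gguf.py ./my-model --outtype f32 --outfile my-model-f32.gguf

# 2. Compute the importance matrix from the F32 model, including the output
#    tensor (--process-output), using a calibration text file.
./llama-imatrix -m my-model-f32.gguf -f calibration.txt --process-output -o my-model.imatrix

# 3. Quantize with the imatrix, keeping the output and embedding tensors at F32
#    (the "OF32.EF32" part of the naming above); IQ4_XS is just an example target.
./llama-quantize --imatrix my-model.imatrix \
  --output-tensor-type f32 --token-embedding-type f32 \
  my-model-f32.gguf my-model-OF32-EF32-IQ4_XS.gguf IQ4_XS
```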