Inquiry about Embedding Extraction #1663
Unanswered
PhilipAmadasun asked this question in Q&A
-
On your first question: identification by similarity can probably be done roughly like that. I tried out the following:
But for me the open question is how large this "sim" value must be for the two voices to be considered identical.
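For illustration, here is a minimal sketch of what a threshold check on such a similarity value could look like. It assumes "sim" is a cosine similarity (1 minus the cosine distance); the helper names `cosine_similarity` and `same_speaker` are hypothetical, and the threshold is deliberately left as a parameter because a suitable value depends on the model and would have to be tuned on labelled pairs of recordings.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D embeddings (1.0 means same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_speaker(a, b, threshold):
    """Hypothetical decision rule: treat the pair as one speaker if sim >= threshold.

    The threshold is an assumption to be tuned on labelled data, not a known constant.
    """
    return cosine_similarity(a, b) >= threshold
```

In practice one would compute the similarity for many known same-speaker and different-speaker pairs and pick the threshold that separates them best.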
-
I have two questions about extracting embeddings
First:
from scipy.spatial.distance import cdist
distance = cdist(embedding1, embedding2, metric="cosine")[0,0]
This gives an error. I was following the tutorial here.
I have to do this instead to get results:
import numpy as np
distance = cdist(np.expand_dims(embedding1, axis=0), np.expand_dims(embedding2, axis=0), metric="cosine")[0,0]
Now I'm unsure whether I'm even supposed to do this.
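For context, `cdist` computes pairwise distances between two collections of vectors and expects 2-D arrays of shape (n_vectors, dim), so a 1-D embedding needs a leading axis first. That is likely the cause of the error, assuming the extracted embeddings are 1-D. The sketch below uses random vectors as stand-ins for real embeddings (the dimension 512 is an arbitrary assumption) and shows that the reshaped `cdist` call agrees with `scipy.spatial.distance.cosine`, which takes two 1-D vectors directly.

```python
import numpy as np
from scipy.spatial.distance import cdist, cosine

# Placeholder data: random 1-D vectors standing in for real speaker embeddings.
rng = np.random.default_rng(0)
embedding1 = rng.normal(size=512)
embedding2 = rng.normal(size=512)

# cdist wants shape (n_vectors, dim), so add a leading axis to each vector.
d_cdist = cdist(embedding1[np.newaxis, :], embedding2[np.newaxis, :], metric="cosine")[0, 0]

# For a single pair of 1-D vectors, cosine() returns the same cosine distance.
d_pair = cosine(embedding1, embedding2)

print(d_cdist, d_pair)
```

If the embeddings already come out with shape (1, dim), the original `cdist` call works without any reshaping.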
Second:
Do you get the most accurate "Essence" of someone's voice via embedding by:
- Extracting an embedding from longer audio clips of the person talking?
- Gathering (let's say) 30 millisecond chunks of audio, extracting the embeddings from each chunk, then getting an average embedding from them?
Which one of these is the way to go?
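To make the second option concrete, here is a rough sketch of averaging per-chunk embeddings. It does not say which option is better; it only shows one common way to combine chunk embeddings, assuming they have already been extracted and stacked into an array of shape (n_chunks, dim). The function name `average_embedding` is hypothetical, and normalising each chunk before averaging is a design choice, not something the tutorial prescribes.

```python
import numpy as np

def average_embedding(chunk_embeddings):
    """Mean of L2-normalised chunk embeddings, re-normalised to unit length."""
    x = np.asarray(chunk_embeddings, dtype=float)      # shape (n_chunks, dim)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)   # give every chunk equal weight
    mean = x.mean(axis=0)
    return mean / np.linalg.norm(mean)                 # unit length for cosine comparisons
```

The result can be compared against an embedding extracted from one long clip of the same speaker to see how far the two approaches diverge on your own data.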