-
I want to automate generating cluster IDs for my dataset without providing labeled pairs or any input other than the data and schema. Is this possible with this library?
Replies: 4 comments
-
@fgregg @NickCrews Please help!
-
Dedupe does not support unsupervised training. If you need it, you will want to look at a different software package. This list is a bit outdated, but it's a reasonable starting point: https://github.com/J535D165/data-matching-software
-
@NickCrews is right. I think there are some very interesting possibilities, but nothing in dedupe yet.
-
Thank you for the quick response and the list, @NickCrews @fgregg! Dedupe is wonderfully written, and I learnt a lot.
For unsupervised record linkage, I personally recommend Splink.