Python implementation of binary similarity measures (see [1]) and distance measures (see [2]). Both bitsets
(an immutable ordered set data type) and numpy.ndarray
are supported as feature vectors.
Example based on bitsets:
from bitsets import bitset
from binsdpy.similarity import jaccard
from binsdpy.distance import euclid
Colors = bitset("Colors", ("red", "blue", "green", "yellow"))
a = Colors.frommembers(["red", "blue"])
b = Colors.frommembers(["red", "yellow"])
jaccard(a, b)
# > 0.3333333333333333
euclid(a, b)
# > 1.4142135623730951
Example based on np.ndarray:
import numpy as np
from binsdpy.similarity import jaccard
from binsdpy.distance import euclid
a = np.array([1, 1, 0, 0], dtype=bool)
b = np.array([1, 0, 0, 1], dtype=bool)
jaccard(a, b)
# > 0.3333333333333333
euclid(a, b)
# > 1.4142135623730951
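The values in both examples are consistent with the usual contingency-count definitions of these two measures: Jaccard as a / (a + b + c) and Euclidean distance as sqrt(b + c), where a counts features present in both vectors, b and c count features present in only one of them, and d counts features absent from both. The following is a minimal sketch under that assumption, written in plain Python; the helper names (contingency_counts, jaccard_sketch, euclid_sketch) are hypothetical and are not part of the binsdpy API, and this is not necessarily how the package computes its results.

```python
import math


def contingency_counts(x, y):
    """Return (a, b, c, d) for two equal-length binary sequences.

    a: present in both, b: only in x, c: only in y, d: absent from both.
    Hypothetical helper for illustration; not part of binsdpy.
    """
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    a = sum(1 for p, q in zip(x, y) if p and q)
    b = sum(1 for p, q in zip(x, y) if p and not q)
    c = sum(1 for p, q in zip(x, y) if not p and q)
    d = sum(1 for p, q in zip(x, y) if not p and not q)
    return a, b, c, d


def jaccard_sketch(x, y):
    """Jaccard similarity a / (a + b + c).

    Raises ZeroDivisionError when neither vector has any feature set.
    """
    a, b, c, _ = contingency_counts(x, y)
    return a / (a + b + c)


def euclid_sketch(x, y):
    """Binary Euclidean distance sqrt(b + c)."""
    _, b, c, _ = contingency_counts(x, y)
    return math.sqrt(b + c)


# Same feature vectors as in the examples above.
x = [True, True, False, False]
y = [True, False, False, True]

print(contingency_counts(x, y))  # (1, 1, 1, 1)
print(jaccard_sketch(x, y))      # 0.3333333333333333
print(euclid_sketch(x, y))       # 1.4142135623730951
```

Here one feature is shared, one is unique to each vector, and one is absent from both, which gives 1/3 for Jaccard and sqrt(2) for the Euclidean distance.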
The package is available as an alpha version via pip.
$ pip install binsdpy
binsdpy requires:
- Python (>= 3.6)
- bitsets
- numpy
[1] Brusco, M., Cradit, J. D., & Steinley, D. (2021). A comparison of 71 binary similarity coefficients: The effect of base rates. PLoS ONE, 16(4), e0247751. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0247751
[2] Choi, S. S., Cha, S. H., & Tappert, C. C. (2010). A survey of binary similarity and distance measures. Journal of Systemics, Cybernetics and Informatics, 8(1), 43-48. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.352.6123&rep=rep1&type=pdf