A Python implementation of the imbalance-degree measure for characterizing multi-class imbalanced datasets.
This measure was proposed in [1] as an alternative to the well-known imbalance ratio, which is used with binary imbalanced datasets.
This implementation is developed and maintained by Mario Juez-Gil of the ADMIRABLE research group at the University of Burgos, with the help and useful advice of Álvar Arnaiz-González, Juan J. Rodriguez, and his PhD thesis supervisors, César García-Osorio and Carlos López-Nozal.
This module exposes a function called imbalance_degree which takes two arguments:
classes: A list of classes (targets) of each instance of the dataset.
distance: Distance or similarity function identifier. It can take the following values (EU by default):
    EU: Euclidean distance.
    CH: Chebyshev distance.
    KL: Kullback-Leibler divergence.
    HE: Hellinger distance.
    TV: Total variation distance.
    CS: Chi-square divergence.
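To make the identifiers above concrete, the following sketch writes each of the six functions in its standard textbook form for two discrete distributions p and q. The function names and the DISTANCES mapping are hypothetical and chosen for illustration only; the module's internals may be organized differently or use other normalization constants.

```python
import math

# Illustrative textbook definitions; these names are NOT part of the module.
def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))

def kullback_leibler(p, q):
    # Terms with p_i == 0 contribute 0 by convention; q_i must be > 0.
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def hellinger(p, q):
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)

def total_variation(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def chi_square(p, q):
    # q_i must be > 0.
    return sum((a - b) ** 2 / b for a, b in zip(p, q))

# Hypothetical lookup table keyed by the identifiers listed above.
DISTANCES = {
    "EU": euclidean,
    "CH": chebyshev,
    "KL": kullback_leibler,
    "HE": hellinger,
    "TV": total_variation,
    "CS": chi_square,
}

# Class proportions of a 3-class dataset with 3, 2 and 1 instances,
# compared against the perfectly balanced (uniform) distribution.
zeta = [3 / 6, 2 / 6, 1 / 6]
e = [1 / 3] * 3
for name, fn in DISTANCES.items():
    print(name, fn(zeta, e))
```

All six functions return 0 when the two distributions are identical, which is why a perfectly balanced dataset sits at zero distance from the uniform distribution.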
A usage example:
example.py
from imbalance_degree import imbalance_degree
import numpy as np
classes = np.array([0,0,0,1,1,2])
print(imbalance_degree(classes, "EU"))
output:
0.49999999999999994
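To see where a value close to 0.5 comes from, the measure can be sketched from scratch for the Euclidean case. This sketch assumes the definition given in [1]: ID = d(ζ, e) / d(ι_m, e) + (m − 1), where ζ is the empirical class distribution, e is the uniform distribution over the K classes, m is the number of minority classes (those with a proportion below 1/K), and ι_m is the distribution with exactly m minority classes that lies farthest from e. The function name imbalance_degree_eu is hypothetical and this is not the module's actual code.

```python
import math
from collections import Counter

def imbalance_degree_eu(classes):
    """Illustrative re-derivation for the Euclidean distance only."""
    counts = Counter(classes)
    n = len(classes)
    k = len(counts)

    zeta = [c / n for c in counts.values()]   # empirical class distribution
    e = [1 / k] * k                           # balanced (uniform) distribution

    # Minority classes have proportion < 1/k; compared with integers
    # (count * k < n) to avoid floating-point ties.
    m = sum(1 for c in counts.values() if c * k < n)
    if m == 0:
        return 0.0                            # perfectly balanced dataset

    # Farthest distribution from e with exactly m minority classes:
    # m empty classes, k - m - 1 classes at 1/k, one class with the rest.
    iota = [0.0] * m + [1 / k] * (k - m - 1) + [1 - (k - m - 1) / k]

    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    return dist(zeta, e) / dist(iota, e) + (m - 1)

print(imbalance_degree_eu([0, 0, 0, 1, 1, 2]))
```

In the example dataset only class 2 is a minority class (m = 1), so the result is the ratio of the two distances, √2/6 divided by √2/3, which is one half up to floating-point rounding.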
[1] J. Ortigosa-Hernández, I. Inza, and J. A. Lozano, “Measuring the class-imbalance extent of multi-class problems,” Pattern Recognit. Lett., 2017. DOI: 10.1016/j.patrec.2017.08.002
Licensed under the GNU GPLv3. See the LICENSE file for more details.
This work was partially supported by the Consejería de Educación of the Junta de Castilla y León and by the European Social Fund through the EDU/1100/2017 pre-doctoral grants; by the project TIN2015-67534-P (MINECO/FEDER, UE) of the Ministerio de Economía y Competitividad of the Spanish Government; and by the project BU085P17 (JCyL/FEDER, UE) of the Junta de Castilla y León, both co-financed by European Union FEDER funds.