Defining Classification Ambiguity to Discover a Potential Bias Applied to Emotion Recognition Data Sets
Biondi, G. (Software)
Franzoni, V. (Supervision)
Milani, A. (Formal Analysis)
2022
Abstract
This work proposes a criterion for identifying anomalies derived from potential biases in the labelling of data sets for supervised classification algorithms. Experimentation is carried out in emotion recognition, a domain particularly prone to structural and induced biases. Through a preliminary analysis of the results, in this case from the automated classification of images using neural networks, a metric is proposed to identify anomalies and ambiguities in the results by comparing them to a knowledge base. The achievements of this work include developing an ambiguity graph; clarifying the mutual relationships among the classes through the correlation of anomalies, ambiguities, and biases; and setting up two experimental methods that test class robustness by downsampling and class ambiguities by contamination with samples from a neutral class. The same approach can be generalized to any classification context with simple adaptations of the target classes. The open problem of identifying bias in a data set is of major relevance and criticality for an ethical approach to automatic classification, especially in domains where classification results affect the quality of human life.
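To make the notion of an ambiguity graph concrete, the following is a minimal illustrative sketch, not the paper's actual metric: it assumes one simple way such a graph could be derived, by linking two classes whenever their mutual misclassification rate in a confusion matrix exceeds a threshold. All names, numbers, and the threshold value are hypothetical.

```python
import numpy as np

def ambiguity_graph(confusion, labels, threshold=0.1):
    """Build a weighted ambiguity graph from a confusion matrix.

    Hypothetical construction: rows are true classes, columns predicted
    classes. An undirected edge (i, j) is added when the average mutual
    confusion rate between classes i and j exceeds `threshold`.
    """
    conf = confusion.astype(float)
    # row-normalize: fraction of class-i samples predicted as each class
    rates = conf / conf.sum(axis=1, keepdims=True)
    edges = {}
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            w = (rates[i, j] + rates[j, i]) / 2  # symmetric ambiguity weight
            if w >= threshold:
                edges[(labels[i], labels[j])] = round(w, 3)
    return edges

# Toy confusion matrix over three emotion classes (invented numbers):
# "anger" and "fear" are frequently confused, "neutral" rarely is.
labels = ["anger", "fear", "neutral"]
confusion = np.array([
    [80, 15,  5],
    [12, 78, 10],
    [ 3,  4, 93],
])
print(ambiguity_graph(confusion, labels))  # → {('anger', 'fear'): 0.135}
```

On this toy matrix only the anger–fear pair crosses the threshold, so the resulting graph has a single weighted edge; in the paper's setting the edge structure would instead be correlated with anomalies and biases identified against the knowledge base.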