Never underestimate biodiversity: how undersampling affects Bray–Curtis similarity estimates and a possible countermeasure
La Porta, G.
2023
Abstract
The Bray–Curtis dissimilarity is widely used to calculate beta diversity from abundance data. However, the effect of undersampling on this index has received limited attention, and only a few studies have addressed the topic. This paper investigates the error introduced by undersampling and its correlation with the similarity of the complete datasets, and proposes a possible countermeasure based on the addition of dummy species. To evaluate the performance of this approach, we applied a meta-analytic technique based on repeated random subsamples of 16 published datasets of biological assemblage data. We estimated the effect of undersampling on the resulting similarities and compared the results with an adjusted version of the index obtained by adding extra species, called dummy species, to the original abundance dataset. Undersampling generally resulted in poor accuracy and led to underestimates of assemblage similarity. Adding dummy species reduced the severity of these underestimates. To reach an accuracy above 80% in similarity, more than 300 individuals needed to be randomly sampled, and the under- and overestimation rates decreased consistently with the addition of dummy species. Additionally, the more similar two assemblages were, the more likely their similarity was to be underestimated, and this tendency was more severe at low sample sizes. Our simulations indicated that datasets containing more than 300 individuals provided reliable estimates of similarity, and that adding one to three dummy species to the abundance matrices was a good choice for reducing underestimates and increasing accuracy.