Machine Learning Classifier Evaluation for Different Input Combinations: A Case Study with Landsat 9 and Sentinel-2 Data

Palanisamy, P. A.; Jain, K.; Bonafoni, S.

doi:10.3390/rs15133241

High-resolution multispectral remote sensing images offer valuable information about various land features, providing essential details and spatially accurate representations. In complex urban environments, the classification accuracy obtained from the complete set of original multispectral bands is often inadequate for practical applications. To improve the classification accuracy of multispectral images, band reduction techniques are used, which can be categorized into feature extraction and feature selection techniques. The present study examined the use of multispectral satellite bands, spectral indices (including the Normalized Difference Built-up Index, Normalized Difference Vegetation Index, and Normalized Difference Water Index) for feature extraction, and the principal component analysis (PCA) technique for feature selection. These methods were analyzed both independently and in combination for the classification of multiple land use and land cover features. The classification was performed on Landsat 9 and Sentinel-2 satellite images of Delhi, India, using six machine learning techniques on the Google Earth Engine platform: Classification and Regression Tree, Minimum Distance, Naive Bayes, Random Forest (RF), Gradient Tree Boosting (GTB), and Support Vector Machine. The performance of the classifiers was evaluated quantitatively and qualitatively, analyzing the classification results on the whole image (comprehensive features) and on a small subset (targeted features). The RF and GTB classifiers outperformed all others in the quantitative analysis of all input combinations for both the Landsat 9 and Sentinel-2 datasets. RF achieved an overall classification accuracy of 96.19% for Landsat 9 and 96.95% for Sentinel-2, whereas GTB achieved 91.62% for Landsat 9 and 92.89% for Sentinel-2 in all band combinations. Furthermore, the RF classifier achieved the highest F1 score, 0.97, on both the Landsat 9 and Sentinel-2 datasets. The qualitative analysis revealed that the PCA bands were particularly useful to the classifiers in distinguishing even slight differences among the feature classes. The findings contribute to the understanding of feature extraction and selection techniques for land use and land cover classification, offering insights into their effectiveness in different scenarios.
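
The three spectral indices named in the abstract are all normalized differences of two bands. The sketch below is illustrative only and is not the authors' Google Earth Engine workflow. It assumes the commonly used definitions NDVI = (NIR - Red) / (NIR + Red), NDBI = (SWIR1 - NIR) / (SWIR1 + NIR), and the McFeeters form NDWI = (Green - NIR) / (Green + NIR); the abstract does not state which NDWI variant was used. The function names and the reflectance values are hypothetical.

```python
import numpy as np


def normalized_difference(a, b):
    """Return (a - b) / (a + b) element-wise, with NaN where a + b == 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    # Divide only where the denominator is non-zero; other cells stay NaN.
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out


def spectral_indices(green, red, nir, swir1):
    """Compute NDVI, NDBI, and NDWI (assumed McFeeters form) from band arrays."""
    return {
        "NDVI": normalized_difference(nir, red),
        "NDBI": normalized_difference(swir1, nir),
        "NDWI": normalized_difference(green, nir),
    }


# Hypothetical surface reflectance for two pixels:
# the first vegetation-like, the second built-up-like.
green = np.array([0.10, 0.08])
red = np.array([0.05, 0.20])
nir = np.array([0.45, 0.25])
swir1 = np.array([0.20, 0.35])

indices = spectral_indices(green, red, nir, swir1)
for name, values in indices.items():
    print(name, np.round(values, 3))
```

Each index lies in the range [-1, 1], so the resulting layers can be stacked with the original bands or the PCA components as additional classifier inputs. Which sensor bands play the roles of Green, Red, NIR, and SWIR1 differs between Landsat 9 and Sentinel-2 and should be checked against each sensor's band table.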
