MinION Sequencing of Yeast Mock Communities To Assess the Effect of Databases and ITS-LSU Markers on the Reliability of Metabarcoding Analysis

Conti, Angela; Casagrande Pierantoni, Debora; Robert, Vincent; Corte, Laura; Cardinali, Gianluigi

doi:10.1128/spectrum.01052-22

Microbial communities play key roles for both humans and the environment. They are involved in ecosystem functions, maintain ecosystem stability, and provide important services, such as carbon and nitrogen cycling. Because their members act both as symbionts and as pathogens, describing the structure and composition of these communities is important. Metabarcoding uses ribosomal DNA (rDNA) (eukaryotic) or rRNA gene (prokaryotic) sequences to identify the species present at a site and to measure their abundance. This procedure requires several technical steps that could be a source of bias, producing a distorted view of the real community composition. In this work, we took advantage of an innovative "long-read" next-generation sequencing (NGS) technology (MinION), amplifying the DNA region spanning from the internal transcribed spacer (ITS) to the large subunit (LSU), which can be read simultaneously on this platform and provides more information than "short-read" systems. The experimental system consisted of six fungal mock communities composed of species present in various relative amounts to mimic natural situations characterized by predominant and low-frequency species. The influence of the sequencing platform (MinION and Illumina MiSeq) and the effect of different reference databases and marker sequences on metagenomic identification of species were evaluated. The results showed that the ITS-based database provided more accurate species identification than the LSU-based one. Furthermore, we propose a procedure based on a preliminary identification with standard reference databases, followed by the production of custom databases that include only the best outputs of the first step.
This additional step improved the estimates of species proportions in the mock communities and reduced the number of "ghost" species not actually present in the simulated communities.

IMPORTANCE Metagenomic analyses are fundamental in many research areas; therefore, improvement of methods and protocols for the description of microbial communities is increasingly necessary. Long-read sequencing could be used to reduce biases due to the multicopy nature of rDNA sequences and to short-read limitations. However, these novel technologies need to be assessed and standardized with controlled experiments, such as mock communities. The aim of this work was to evaluate how well long reads performed in identifying and quantifying species mixed in precise proportions and how the choice of database affects such analyses. Development of a pipeline that mitigates the effect of the barcoding sequences and the impact of the reference database on metagenomic analyses can help microbiome studies go one step further.

IRIS - Res&Arch Institutional Research Information System - Research & Archive