Emu: Species-Level Microbial Community Profiling for Full-Length Nanopore 16S Reads

Kristen D. Curry; Qi Wang; Michael G. Nute; Alona Tyshaieva; Elizabeth Reeves; Sirena Soriano; Enid Graeber; Patrick Finzer; Werner Mendling; Qinglong Wu; Tor Savidge; Sonia Villapol; Alexander Dilthey; Todd J. Treangen

doi:10.1101/2021.05.02.442339

Research output: Contribution to journal › Article

Abstract

16S rRNA based analysis is the established standard for elucidating microbial community composition. While short read 16S analyses are largely confined to genus-level resolution at best since only a portion of the gene is sequenced, full-length 16S sequences have the potential to provide species-level accuracy. However, existing taxonomic identification algorithms are not optimized for the increased read length and error rate of long-read data. Here we present Emu, a novel approach that employs an expectation-maximization (EM) algorithm to generate taxonomic abundance profiles from full-length 16S rRNA reads. Results produced from one simulated data set and two mock communities prove Emu capable of accurate microbial community profiling while obtaining fewer false positives and false negatives than alternative methods. Additionally, we illustrate a real-world application of our new software by comparing clinical sample composition estimates generated by an established whole-genome shotgun sequencing workflow to those returned by full-length 16S sequences processed with Emu.

Competing Interest Statement

The authors have declared no competing interest.
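The abstract names an expectation-maximization (EM) algorithm as the basis for the abundance profiles. As a rough illustration of that general idea only, the Python sketch below runs a textbook mixture-model EM over per-read likelihoods. It is not Emu's implementation: the function name `em_abundances`, the input layout, and the toy likelihood values are all assumptions made for this example, and the record above does not describe how Emu derives likelihoods from alignments.

```python
from typing import Dict, List


def em_abundances(
    read_likelihoods: List[Dict[str, float]],
    max_iter: int = 100,
    tol: float = 1e-9,
) -> Dict[str, float]:
    """Estimate relative abundances with a generic mixture-model EM.

    read_likelihoods[i][t] is an assumed, precomputed likelihood of read i
    given taxon t (a hypothetical input format chosen for this sketch).
    """
    taxa = sorted({t for read in read_likelihoods for t in read})
    if not taxa:
        return {}
    # Start from a uniform abundance profile.
    abundance = {t: 1.0 / len(taxa) for t in taxa}

    for _ in range(max_iter):
        totals = {t: 0.0 for t in taxa}
        for read in read_likelihoods:
            # E-step: posterior that this read came from each candidate taxon.
            weights = {t: abundance[t] * p for t, p in read.items()}
            norm = sum(weights.values())
            if norm == 0.0:
                continue  # read has no support under the current profile
            for t, w in weights.items():
                totals[t] += w / norm
        assigned = sum(totals.values())
        if assigned == 0.0:
            break  # nothing could be assigned; keep the current profile
        # M-step: new abundance is the normalized sum of posteriors.
        updated = {t: totals[t] / assigned for t in taxa}
        delta = max(abs(updated[t] - abundance[t]) for t in taxa)
        abundance = updated
        if delta < tol:
            break
    return abundance


if __name__ == "__main__":
    # Toy input: two reads match only taxon A, one only B, one is ambiguous.
    toy_reads = [{"A": 1.0}, {"A": 1.0}, {"B": 1.0}, {"A": 0.5, "B": 0.5}]
    print(em_abundances(toy_reads))
```

On the toy input, the ambiguous read is split in proportion to the current estimates, so the profile converges toward roughly 2/3 for A and 1/3 for B.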

Original language	Undefined/Unknown
Journal	bioRxiv
DOIs	https://doi.org/10.1101/2021.05.02.442339
State	Unpublished - 2021

Access to Document

https://www.biorxiv.org/content/early/2021/05/03/2021.05.02.442339

Cite this

@article{96498d1852834c0881cfa733bf2ff949,

title = "Emu: Species-Level Microbial Community Profiling for Full-Length Nanopore 16S Reads",

abstract = "16S rRNA based analysis is the established standard for elucidating microbial community composition. While short read 16S analyses are largely confined to genus-level resolution at best since only a portion of the gene is sequenced, full-length 16S sequences have the potential to provide species-level accuracy. However, existing taxonomic identification algorithms are not optimized for the increased read length and error rate of long-read data. Here we present Emu, a novel approach that employs an expectation-maximization (EM) algorithm to generate taxonomic abundance profiles from full-length 16S rRNA reads. Results produced from one simulated data set and two mock communities prove Emu capable of accurate microbial community profiling while obtaining fewer false positives and false negatives than alternative methods. Additionally, we illustrate a real-world application of our new software by comparing clinical sample composition estimates generated by an established whole-genome shotgun sequencing workflow to those returned by full-length 16S sequences processed with Emu. Competing Interest Statement: The authors have declared no competing interest.",

author = "Curry, {Kristen D.} and Qi Wang and Nute, {Michael G.} and Alona Tyshaieva and Elizabeth Reeves and Sirena Soriano and Enid Graeber and Patrick Finzer and Werner Mendling and Qinglong Wu and Tor Savidge and Sonia Villapol and Alexander Dilthey and Treangen, {Todd J.}",

year = "2021",

doi = "10.1101/2021.05.02.442339",

language = "Undefined/Unknown",

journal = "bioRxiv",

publisher = "Cold Spring Harbor Laboratory Press",

}

TY - JOUR

T1 - Emu: Species-Level Microbial Community Profiling for Full-Length Nanopore 16S Reads

AU - Curry, Kristen D.

AU - Wang, Qi

AU - Nute, Michael G.

AU - Tyshaieva, Alona

AU - Reeves, Elizabeth

AU - Soriano, Sirena

AU - Graeber, Enid

AU - Finzer, Patrick

AU - Mendling, Werner

AU - Wu, Qinglong

AU - Savidge, Tor

AU - Villapol, Sonia

AU - Dilthey, Alexander

AU - Treangen, Todd J.

PY - 2021

Y1 - 2021

N2 - 16S rRNA based analysis is the established standard for elucidating microbial community composition. While short read 16S analyses are largely confined to genus-level resolution at best since only a portion of the gene is sequenced, full-length 16S sequences have the potential to provide species-level accuracy. However, existing taxonomic identification algorithms are not optimized for the increased read length and error rate of long-read data. Here we present Emu, a novel approach that employs an expectation-maximization (EM) algorithm to generate taxonomic abundance profiles from full-length 16S rRNA reads. Results produced from one simulated data set and two mock communities prove Emu capable of accurate microbial community profiling while obtaining fewer false positives and false negatives than alternative methods. Additionally, we illustrate a real-world application of our new software by comparing clinical sample composition estimates generated by an established whole-genome shotgun sequencing workflow to those returned by full-length 16S sequences processed with Emu. Competing Interest Statement: The authors have declared no competing interest.

AB - 16S rRNA based analysis is the established standard for elucidating microbial community composition. While short read 16S analyses are largely confined to genus-level resolution at best since only a portion of the gene is sequenced, full-length 16S sequences have the potential to provide species-level accuracy. However, existing taxonomic identification algorithms are not optimized for the increased read length and error rate of long-read data. Here we present Emu, a novel approach that employs an expectation-maximization (EM) algorithm to generate taxonomic abundance profiles from full-length 16S rRNA reads. Results produced from one simulated data set and two mock communities prove Emu capable of accurate microbial community profiling while obtaining fewer false positives and false negatives than alternative methods. Additionally, we illustrate a real-world application of our new software by comparing clinical sample composition estimates generated by an established whole-genome shotgun sequencing workflow to those returned by full-length 16S sequences processed with Emu. Competing Interest Statement: The authors have declared no competing interest.

U2 - 10.1101/2021.05.02.442339

DO - 10.1101/2021.05.02.442339

M3 - Article

JO - bioRxiv

JF - bioRxiv

ER -