Tracking Pathogens via Next Generation Sequencing (NGS)
From the Spring 2021 issue of "Microcosm."
Since the discovery and isolation of DNA, investigative processes aiming to decipher life's complex and variable genetic codes have become indispensable to public health infrastructure. Advances in nucleic acid sequencing have revolutionized the way we identify, characterize and track causative agents of disease. Where symptomology was once considered the gold standard of diagnostics, next generation — or high-throughput — sequencing has now become the enabling instrument of "precision public health," with applications in emerging infectious diseases, foodborne illness, antimicrobial resistance, biosurveillance, bioforensics and epidemiology, allowing for earlier detection and management of outbreaks and disease.
Here, we discuss some of the broad applications of pathogen genomics, as well as past, present and ongoing developments in this powerful area of research.
Next Generation Sequencing in the Microbiology Lab
Next generation sequencing (NGS) encompasses any high-throughput sequencing method that can process millions of individual DNA fragments (or cDNA fragments from RNA) at one time. It was predated by first-generation methods such as Sanger sequencing (also known as dideoxynucleotide chain termination sequencing), which reads only one DNA fragment per reaction. There is considerable debate about what defines the generations of DNA sequencing technology. While some contend that NGS began when massively parallel pyrosequencing was commercially released in 2005, others link the beginning of the NGS era to the emergence of single molecule sequencing (SMS) technology.
Either way, genomic sequencing facilitates both genome assembly and metagenomic analysis. According to Dr. Trish Simner, director of the Medical Bacteriology and Infectious Disease Sequencing Laboratories at the Johns Hopkins Hospital and Early Career At-Large representative for the American Society for Microbiology (ASM) Council on Microbial Sciences, "NGS has three main applications in the clinical microbiology lab."
- Whole genome sequencing (WGS) allows for comprehensive analysis of entire microbial genomes. This approach is often applied to pure, isolated colonies of organisms in the lab and provides large amounts of data in a short amount of time.
- Targeted next generation sequencing allows for sequence analysis of specific regions of the genome. Because it enriches for those regions, the approach increases sensitivity when sequencing directly from the specimen. However, selecting specific targets requires either PCR amplification or hybridization capture after the library is prepared.
- Metagenomic sequencing (mNGS) allows for agnostic analysis of all nucleic acid in a given sample. Nothing specific is targeted, resulting in both host and microbial nucleic acid being sequenced. mNGS data can then be further mined to detect microbial nucleic acid and determine whether a pathogen of interest is present in the sample.
Conventionally, a battery of tests is ordered to establish a diagnosis, guided by physician ordering practices. mNGS, by contrast, provides a hypothesis-free approach to detecting all microbial groups in a given sample. Sequencing all of the nucleic acid in a sample lets researchers examine any portion of the sequenced genomes, uncover coinfections, and identify new or unexpected organisms.
RNA-based mNGS of a respiratory sample from a patient in Wuhan is what allowed researchers to identify the cause of an outbreak of pneumonia spreading through China in late 2019. The causative agent turned out to be the novel coronavirus, later named SARS-CoV-2, which is responsible for the ongoing COVID-19 pandemic.
"It shows you the power of the method," said Dr. Simner. "Because it's so broad, it allows you to capture the rare, atypical or unknown pathogens."
How do researchers know when they have a novel pathogen on their hands? Sequencing is still considered a complex, expensive and relatively new diagnostic technique in the clinical lab. But when a cluster of patients presents with similar symptoms (like in Wuhan), for which standard diagnostic methods fail to identify a likely cause, it's a good sign that something unique is taking place, and a deeper analysis is warranted. At that point, sequencing approaches such as metagenomics can be especially helpful.
"You can sequence any nucleic acid in your sample, and the fun thing with metagenomics is that you can extract just the reads assigned to a single pathogen. And if you got sufficient coverage, you can actually assemble the genome with those reads and taxonomically classify the organism to its closest relative," Dr. Simner explained. In the case of SARS-CoV-2, the closest known relative was a bat coronavirus called RaTG13.
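The read-extraction step Dr. Simner describes can be illustrated with a minimal sketch. It assumes each read has already been given a taxonomic label by an upstream classifier (the `Read` type and the `taxon` field are hypothetical), pulls out the reads for one organism, and estimates average depth as total sequenced bases divided by genome length. The `min_depth` cutoff is an illustrative assumption, not a published standard.

```python
from dataclasses import dataclass


@dataclass
class Read:
    read_id: str
    sequence: str
    taxon: str  # label assumed to come from an upstream taxonomic classifier


def reads_for_taxon(reads: list[Read], taxon: str) -> list[Read]:
    """Keep only the reads assigned to a single organism of interest."""
    return [r for r in reads if r.taxon == taxon]


def mean_depth(reads: list[Read], genome_length: int) -> float:
    """Rough average coverage: total sequenced bases divided by genome length."""
    total_bases = sum(len(r.sequence) for r in reads)
    return total_bases / genome_length


def enough_for_assembly(reads: list[Read], genome_length: int,
                        min_depth: float = 10.0) -> bool:
    """min_depth is a hypothetical cutoff chosen for illustration only."""
    return mean_depth(reads, genome_length) >= min_depth


if __name__ == "__main__":
    sample = [
        Read("r1", "ACGT" * 25, "Homo sapiens"),  # host read, discarded below
        Read("r2", "ACGT" * 25, "SARS-CoV-2"),
        Read("r3", "ACGT" * 25, "SARS-CoV-2"),
    ]
    viral = reads_for_taxon(sample, "SARS-CoV-2")
    # Toy genome length; the real SARS-CoV-2 genome is roughly 30 kb.
    print(len(viral), mean_depth(viral, 200))  # prints: 2 1.0
```

Average depth is only a first check: assembly also depends on how evenly the reads cover the genome, which this sketch does not measure.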
Surveillance and Tracking of Variants
After SARS-CoV-2 was identified, preexisting libraries of microbial knowledge, medical tools and scientific practices were rapidly deployed or repurposed to fight the evolving pandemic. Among these, genomic sequencing has remained (rather quietly) indispensable, not only for the early detection and investigation of SARS-CoV-2 outbreaks but also for the tracking of new variants.
In fact, by the time reports of a "new" SARS-CoV-2 variant, B.1.1.7, reached the general public, many research labs and medical and academic institutions around the world had already been tracking the evolution of the novel coronavirus for quite some time.
Dr. Heba Mostafa, assistant professor of pathology at the Johns Hopkins University School of Medicine, recalls the experience vividly. "When the pandemic started and we began diagnosing [SARS-CoV-2] in the lab in March 2020, it was very logical to me that we needed to start sequencing right away to understand what kind of virus diversity we have and if there are any correlations between changes in the viral genome and the severity of the disease," she shared, adding that they were already detecting diversity in the SARS-CoV-2 genome at the time.
Dr. Jacques Ravel, associate director of the Institute for Genome Sciences at the University of Maryland School of Medicine and an American Academy of Microbiology Fellow, described a similar experience. "We established genome sequencing of variants a long time ago, because that's what we do. We are a genomic center. We sequence things, and we are really good at that," he explained. "But you know, way back in the summer (2020), nobody cared about it." Genomic surveillance is expensive: WGS typically costs approximately $150-$200 per sample. Despite having the required expertise and equipment, many labs were unable to sustain ongoing sequencing efforts in the absence of regular funding.
Meanwhile, the U.K. invested in genomic surveillance early in the pandemic. The COVID-19 Genomics UK Consortium (COG-UK) was created in late March 2020 with £20 million in funding from UK Research and Innovation, and was granted an additional £12.2 million from the Department of Health and Social Care's Testing Innovation Fund in November 2020 to build a real-time surveillance system for emerging outbreaks. Early action and adequate funding enabled the U.K. to discover the B.1.1.7 variant and alert much of the world that SARS-CoV-2 was evolving.
As news of circulating SARS-CoV-2 variants, including B.1.351 and P.1, spread, local, national and international efforts to expand sequencing capacity grew as well. Both the University of Maryland and Johns Hopkins University are now working closely with the Maryland Department of Health to ramp up sequencing, with a goal of sequencing 10% of all positive cases in Maryland, a proportion that epidemiologists consider adequate, according to Dr. Ravel.
"First you need to catch where the variants are located, and then you can do your epidemiological study to capture contacts and so on," Dr. Ravel said. "Together, this gives a good picture of the penetration in a given area." Many other academic institutions and research and medical centers across the country are collaborating in a similar manner.
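To put the 10% sequencing target and the $150-$200 per-sample WGS cost in concrete terms, here is a minimal sketch of the arithmetic. The function name and the weekly case count in the example are hypothetical; only the 10% target and the per-sample cost range come from this article.

```python
def sequencing_plan(positive_cases: int, percent: int = 10,
                    cost_low: int = 150, cost_high: int = 200) -> tuple[int, int, int]:
    """Estimate how many positive samples to sequence and what it might cost.

    percent: share of positive cases to sequence (the 10% target).
    cost_low, cost_high: per-sample WGS cost range in US dollars.
    Integer ceiling division avoids floating-point rounding surprises.
    """
    samples = (positive_cases * percent + 99) // 100
    return samples, samples * cost_low, samples * cost_high


if __name__ == "__main__":
    # Hypothetical week with 5,000 positive cases in a state.
    samples, low, high = sequencing_plan(5000)
    print(f"{samples} genomes, ${low:,} to ${high:,}")  # 500 genomes, $75,000 to $100,000
```

Rounding up ensures that a small number of positives still yields at least one sequenced sample; the estimate covers sequencing only, not the analysis costs Dr. Cooper highlights.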
Dr. Vaughn Cooper is a professor of Microbiology and Molecular Genetics and Computational and Systems Biology at the University of Pittsburgh School of Medicine, an ASM COMS-elected Board director and co-founder of the Microbial Genome Sequencing Center (MiGS), which has sequenced thousands of SARS-CoV-2 isolates for customers and collaborators. In an NPR interview, he stated, "There are a few big companies being contracted to do sequencing by the CDC, but they've also contracted with a relatively small number of academic medical centers. And I understand that number of contractees is growing. We really need to engage researchers at academic medical centers who have this ability to join the effort."
According to Dr. Cooper, a major unmet need in legislation is training people in county and state public health labs and academic partners to turn WGS data into knowledge in a timely fashion. "Sequencing is cheap; analysis is expensive," he said. "Our training and funding should reflect these facts."
Since its inception in 2014, the Centers for Disease Control and Prevention's (CDC) Advanced Molecular Detection (AMD) program has worked to increase the availability of next generation sequencing in state and local public health systems, with the goals of faster disease and outbreak detection and protection from emerging and evolving disease threats. ASM is a leading advocate for greater investment in pathogen genomics and strongly supports the AMD goals.
"The work of CDC's AMD program is fundamental to U.S. leadership in sequencing SARS-CoV-2 samples, and strengthening this effort will allow us to get ahead and stay ahead of the COVID-19 variants of concern as they emerge and circulate," said ASM CEO Dr. Stefano Bertuzzi.
On March 11, 2021, $1.75 billion was allocated to the CDC's AMD program as part of the Energy and Commerce Committee's COVID-19 relief budget reconciliation package. Thanks to the dedicated efforts of many microbiologists who have tirelessly built a foundation upon which SARS-CoV-2 surveillance can now expand, the increased funding will help efficiently ramp up sequencing capacity to track SARS-CoV-2 evolution; elucidate the source, timing, transmission and spread of circulating variants; inform public health practices; and guide vaccine rollouts.
Screening for Foodborne Pathogens
Before and during the COVID-19 pandemic, a number of other vital applications for NGS have also expanded. For example, NGS is revolutionizing food microbiology. Where pulsed-field gel electrophoresis (PFGE) was once the gold standard for characterizing outbreaks, WGS is now being used to screen for pathogens contaminating the food chain. One advantage of WGS is that it provides data about evolutionary relationships between bacterial isolates, and comparing lineages can help link cases and trace transmission. Routine surveillance may therefore help prevent future outbreaks of foodborne illness.
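One common way to turn WGS data into links between cases is to count single-nucleotide differences (SNPs) between aligned isolate genomes and group isolates that fall within a distance threshold. The sketch below is a simplified illustration of that idea, not the method of any particular agency: it assumes the sequences are already aligned to equal length, ignores ambiguous bases (`N`) and gaps, and uses single-linkage clustering. The threshold and the toy isolate names are hypothetical; real thresholds vary by organism and program.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned sequences differ, ignoring N and gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(1 for x, y in zip(a, b) if x != y and x not in skip and y not in skip)


def cluster_isolates(isolates: dict[str, str], threshold: int) -> list[set[str]]:
    """Single-linkage clustering: isolates within `threshold` SNPs are linked."""
    parent = {name: name for name in isolates}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(isolates, 2):
        if snp_distance(isolates[a], isolates[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in isolates:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


if __name__ == "__main__":
    isolates = {  # toy 10-base "genomes"
        "case1": "ACGTACGTAC",
        "case2": "ACGTACGTAT",
        "case3": "TTGTACCTAA",
        "food": "ACGTACGTAC",
    }
    for cluster in sorted(sorted(c) for c in cluster_isolates(isolates, threshold=2)):
        print(cluster)
    # ['case1', 'case2', 'food']
    # ['case3']
```

Here the two patient isolates and the food isolate cluster together, which is the kind of signal that points investigators toward a common source.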
In 2012, the U.S. Food and Drug Administration (FDA) created the GenomeTrakr Network to aggregate and share WGS data on foodborne pathogens collected by public health and university laboratories across the country. The data are housed and analyzed by the National Center for Biotechnology Information (NCBI) and are also publicly accessible for real-time comparison and analysis.
Thanks to information collected by this network, 709 public health actions have been taken to prevent foodborne illness since 2013, including investigations of E. coli infections linked to flour (2017) and Salmonella linked to dried coconut (2018), as well as numerous pet food recalls. Furthermore, the FDA has partnered with the CDC to sequence all Listeria monocytogenes isolates in the U.S., and with state labs in Washington, Minnesota and New York to conduct real-time sampling of food. Environmental and clinical samples of Salmonella, E. coli, Campylobacter, Vibrio, Cronobacter, parasites and viruses are also beginning to be sequenced by labs in the GenomeTrakr network, and it's exciting to consider where analysis of those data might lead.
Assessment of Antimicrobial Resistance and Predicting Antimicrobial Susceptibilities
Antimicrobial resistance (AMR) remains one of the greatest public health threats of our time and causes more than 700,000 deaths annually, according to World Health Organization (WHO) estimates. Globally, most of these deaths are caused by resistance in malaria, tuberculosis and HIV. In highly developed countries, the culprits are often hospital-acquired infections such as methicillin-resistant Staphylococcus aureus (MRSA), extended-spectrum beta-lactamase (ESBL)-producing Enterobacterales and other emerging pan-resistant Gram-negative bacteria.
NGS offers the technology to identify genetic determinants of antimicrobial resistance and the power to monitor events related to the emergence and spread of AMR. Sequence-based approaches are being employed to analyze the microbial resistome (collection of antibiotic resistance genes) in a variety of ways. For example, WGS has been used to predict the species and drug susceptibility profile of mycobacteria with 93% accuracy, analyze the origins of MRSA to reveal that resistant strains of the bacteria emerged long before methicillin was introduced into clinical practice, and provide useful information about detecting known and new mechanisms of drug resistance in Leishmania, to name a few.
Today, broad-based panels that use targeted NGS to identify pathogens or to profile bacteria by 16S rRNA gene sequencing are beginning to include a wide variety of AMR genes.
"This antimicrobial resistance detection by next gen sequencing is one of my interests," explained Dr. Simner. Carbapenem-resistant organisms are one of her primary research foci. "If we think about direct-from-specimen detection of antimicrobial resistance, the targeted approach is going to be the best approach because oftentimes for antimicrobial resistance, we're only looking for a specific gene or a single nucleotide variant, as opposed to trying to sequence any part of the genome, which is used by a metagenomic approach." She expects that we will one day see broad-based targeted NGS platforms expanding past pathogen identification and AMR detection to include other characteristics like virulence factors. And as NGS becomes more affordable and bioinformatic analysis of NGS data improves, analysis of the entire genome may allow us to see things we might not have otherwise predicted, including suppressor mutations.
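Dr. Simner's point that targeted AMR detection often looks for "a specific gene or a single nucleotide variant" can be illustrated with a minimal sketch. It assumes the targeted region has already been sequenced and aligned to a reference of the same length, lists every single-nucleotide variant (SNV), and keeps only those that appear in a catalog of resistance-associated changes. The 12-base target and the catalog entry are hypothetical toy values, not real resistance markers.

```python
def find_snvs(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes the sample's target region is already aligned to the reference;
    ambiguous 'N' calls in the sample are ignored.
    """
    if len(reference) != len(sample):
        raise ValueError("target region must be aligned to the reference")
    return [(i + 1, r, s) for i, (r, s) in enumerate(zip(reference, sample))
            if r != s and s != "N"]


# Hypothetical catalog for a toy 12-base target: position -> resistance-associated base.
TOY_RESISTANCE_CATALOG = {8: "T"}


def resistance_calls(reference: str, sample: str,
                     catalog: dict[int, str] = TOY_RESISTANCE_CATALOG) -> list[tuple[int, str, str]]:
    """Keep only the SNVs that match a catalogued resistance-associated change."""
    return [(pos, ref, alt) for pos, ref, alt in find_snvs(reference, sample)
            if catalog.get(pos) == alt]


if __name__ == "__main__":
    ref = "ATGGCCTCGAAA"
    sample = "ATGGCATTGAAA"
    print(find_snvs(ref, sample))         # [(6, 'C', 'A'), (8, 'C', 'T')]
    print(resistance_calls(ref, sample))  # [(8, 'C', 'T')]
```

The variant at position 6 is reported but not flagged, which mirrors the targeted approach: only changes already known to matter are called, whereas whole-genome analysis could surface unexpected ones such as suppressor mutations.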
Precision Medicine: The Future of Pathogen Genomics
Genomic sequencing has certainly come a long way since the first bacterial genome (Haemophilus influenzae) was sequenced in 1995 at The Institute for Genomic Research (TIGR) in Rockville, Md., where Dr. Ravel was an assistant investigator before accepting his current position with the University of Maryland School of Medicine. In the near future, high-throughput sequencing of metagenomic samples could revolutionize the speed and accuracy with which we identify pathogens and treat infections, and metagenomic analysis of patient samples could transform the practice of medicine as a whole. Metagenomic analysis captures not only microbial DNA/RNA but also the host transcriptome (with an RNA-sequencing approach). Researchers currently ignore this host information during analysis, but Dr. Simner hopes it will one day be put to use.
"There's so much host information there," Dr. Simner pointed out. "Novel diagnostic tools that rely on biomarkers are being developed, but we already sequence the host biomarkers in the metagenomic approach."
Next steps include looking at the combination of biomarkers to try to identify whether there is a host response to the pathogen, and searching for shifts in the microbiome that might make the host more prone to infection, rather than simply acquiring or being colonized by a particular pathogen. Dr. Simner adds, "It truly is a precision medicine test!"