Golden Helix · Clinical Genomics Guide
Next Generation Sequencing
A Complete Guide for Clinical Labs
From raw sequencing reads to a defensible clinical report. Everything labs, bioinformaticians, and genetic counselors need to know about NGS.
Introduction
The technology is solved.
The challenge is what comes next.
Next generation sequencing has reshaped what clinical laboratories can offer their patients. What once required years of work and billions of dollars, sequencing a single human genome, can now be accomplished in days, at a cost that makes routine diagnostic testing feasible at scale.
The technology itself is only part of the story. For clinical labs the real challenge is not running the sequencer. It is what happens to the data afterward: the computational pipelines, the variant classification frameworks, the regulatory requirements, and the interpretive expertise needed to translate billions of base pairs into a meaningful clinical report.
Definition
What Is Next Generation Sequencing?
Next generation sequencing (NGS) is a family of high-throughput DNA and RNA sequencing technologies that read millions of nucleotide fragments simultaneously. The same capability appears under several names: massively parallel sequencing, high-throughput sequencing, deep sequencing. All describe the same core idea of reading enormous amounts of genetic material in a single run.
The "next generation" label distinguishes these technologies from first-generation Sanger sequencing, which reads one DNA fragment at a time. NGS achieves its throughput by breaking genomic DNA into millions of small fragments, sequencing those fragments in parallel, then using computational software to reassemble them against a reference genome.
In the clinical context, NGS is the foundational technology behind genetic panels, whole exome sequencing (WES), and whole genome sequencing (WGS). It powers the diagnosis of rare inherited disease, cancer tumor profiling, newborn screening programs, and pharmacogenomic testing.
Context
A Brief History: From Sanger to NGS
1977 · First Generation
Sanger Sequencing
Chain-termination chemistry reads one DNA fragment at a time. Sanger remains the gold standard for confirming specific variants, but it is too slow and expensive for genome-scale work.
2000s onward · Second Generation
Next Generation Sequencing (NGS)
Massively parallel sequencing-by-synthesis. Sketched in 2003 at the Panton Arms pub in Cambridge by biochemist Shankar Balasubramanian and laser physicist David Klenerman, who founded Solexa. Illumina acquired Solexa in 2007, and the reversible-terminator chemistry has dominated clinical genomics ever since. Sequencing cost dropped from billions to hundreds of dollars.
Now entering clinical use · Third Generation
Long-Read Sequencing (PacBio, Oxford Nanopore)
Reads longer DNA fragments without an amplification step, resolving complex structural variants and repetitive regions. Has not yet displaced short-read NGS for most routine diagnostic applications.
Laboratory Process
How NGS Works: The Lab Workflow
Before data ever reaches a bioinformatician, the sample passes through a series of wet-lab preparation steps. The quality of the sequencing data is largely determined by what happens before the sequencer is ever loaded.
- 01
Sample Collection & DNA Extraction
Sequencing begins with biological material: blood, saliva, tissue, or a tumor biopsy. DNA is extracted and quantified. Sample quality at this stage directly drives downstream data quality. Degraded DNA produces shorter fragments, lower coverage, and a higher rate of artifacts.
- 02
Library Preparation
DNA is fragmented into ~150–500 bp pieces. Synthetic adapters are ligated to both ends, letting fragments attach to the flow cell and enabling multiplexing of multiple samples. Failures here, including poor ligation efficiency, PCR bias, or inadequate GC-rich coverage, propagate through the entire downstream analysis.
- 03
Sequencing (Primary Analysis)
The library is loaded onto a sequencer. Fragments attach to a flow cell, are amplified into clusters, and fluorescently labeled nucleotides are incorporated one at a time. The output is a FASTQ file containing millions of reads, each base paired with a Phred quality score. A WES run produces 10–20 GB of raw data; WGS produces 100–200+ GB.
- 04
Secondary Analysis: Alignment & Variant Calling
Reads map to the human reference genome (GRCh38) using BWA-class aligners, producing a BAM file. Clinical NGS always aligns to a reference; de novo assembly is a research-only path for organisms without one. Duplicates are removed. Variant callers then identify positions that differ from the reference. SNV and small-indel calling reaches very high accuracy in well-covered regions, particularly with modern deep-learning-assisted callers (DeepVariant, Sentieon DNAscope, Illumina DRAGEN). CNV and structural variant calling remain active design choices, with meaningful sensitivity differences between strategies. Output: a VCF file.
- 05
Tertiary Analysis: Annotation, Filtering & Interpretation
A human genome contains ~3–4 million variants vs. the reference. Tertiary analysis narrows this to the handful that may explain the patient's phenotype. It involves annotation with ClinVar, gnomAD, and OMIM; filtering for quality and frequency; ACMG/AMP classification; and final report generation. This stage requires the most expertise, the most time, and the best software.
What NGS Detects
Variant Types Detected by NGS
One of NGS's core advantages is its ability to detect multiple classes of genetic variation in a single assay. Clinical labs should understand each type, its detection reliability, and the analytical approaches required.
Most common
Single Nucleotide Variants (SNVs)
A change at a single base position. The best-characterized class in clinical databases. Standard NGS detects germline SNVs with high sensitivity and specificity at ≥30x coverage. Examples: BRAF V600E, BRCA1/2 pathogenic variants.
Small scale
Insertions & Deletions (Indels)
Insertions or deletions of 1–50 bases. Frameshift indels are a common loss-of-function mechanism. Detection accuracy falls for longer indels and homopolymer runs.
Large scale
Copy Number Variants (CNVs)
Gains or losses of large genomic segments (kb to Mb). Examples: ERBB2 amplification in breast cancer, 22q11.2 deletion syndrome. Requires dedicated CNV callers (VS-CNV, CNVkit, others); general-purpose SNV callers cannot detect these.
Rearrangements
Structural Variants (SVs)
Large-scale rearrangements: inversions, translocations, large deletions and duplications. Includes the gene fusions critical in cancer (BCR-ABL1, EML4-ALK). Short-read NGS has limited SV sensitivity; long-read improves this substantially.
Drug response
Pharmacogenomic (PGx) Star Alleles
Haplotype combinations across genes like CYP2D6, CYP2C19, and DPYD that predict drug metabolism. Requires specialized databases and haplotype-calling algorithms beyond standard variant calling pipelines.
Assay Selection
NGS Test Types: Panels, Exomes & Genomes
Clinical NGS is not a single test. It is a family of assays with different scopes, costs, and diagnostic yields. Choosing the right test for the clinical question is as important as running it correctly.
Targeted Gene Panels
Sequences a predefined set of genes for a specific indication. Achieves very high depth (500x–1000x+), improving sensitivity for low-frequency somatic and mosaic variants. Less expensive. Misses any causal variant outside covered genes.
High depth · Lower costWhole Exome Sequencing (WES)
Captures protein-coding regions of all ~20,000 genes. About 1–2% of the genome but ~85% of known disease-causing variants. The workhorse of rare disease diagnosis, carrier screening, and research-grade clinical testing.
Broad · Diagnostic workhorseWhole Genome Sequencing (WGS)
Covers all ~3.2 billion base pairs, including non-coding regulatory variants. Detects all variant types at single-base resolution. Increasingly used first-line for critically ill neonates and rare disease with no prior diagnosis.
Complete · Growing as first-lineLiquid Biopsy / cfDNA
Analyzes tumor-shed DNA circulating in the bloodstream. Detects somatic variants at very low allele frequencies (<1%), enabling non-invasive tumor monitoring and minimal residual disease detection. Requires ultra-sensitive variant callers.
Non-invasive · Ultra-sensitiveEnd-to-End View
From Sequencing to Clinical Report
The NGS pipeline is a process of progressive data reduction and progressive clinical enrichment. At each step, data volume shrinks by orders of magnitude. Clinical value increases at every stage.
| Stage | Output Format | Data Volume | Clinical Value |
|---|---|---|---|
| Sequencer | FASTQ | ~100–200 GB | Raw, no clinical value yet |
| Alignment & Variant Calling | BAM / VCF | ~1–10 GB | Candidate variants identified |
| Annotation & Filtering | Annotated VCF | ~50 MB | Candidates narrowed to review set |
| Interpretation & Classification | Variant list | ~10 KB | Clinically classified findings |
| Clinical Report | PDF / EHR | <1 MB | Actionable diagnostic conclusions |
A failure anywhere in this chain matters. A poorly prepared library, a misaligned read, an outdated annotation database, or an incorrectly applied ACMG criterion can each produce a false-negative or false-positive clinical conclusion.
Before You Deploy
Key Considerations for Clinical Laboratories
- 01
Coverage and Uniformity
Minimum coverage: ≥20x for WGS, ≥30x for WES. Coverage at individual loci matters as much as mean coverage. A single exon with inadequate depth is a diagnostic gap. Clinical validation must characterize coverage uniformity across all target regions.
- 02
Reference Genome Version
GRCh38 (hg38) is the current standard and recommended for new pipelines. Labs still on GRCh37 (hg19) should plan migrations carefully. Variant coordinates differ between assemblies, and a mismatch can produce incorrect clinical conclusions.
- 03
Analytical Validation
Every NGS assay must be analytically validated before clinical deployment, confirming sensitivity, specificity, precision, and reportable range for each variant type. CAP and ACMG have published guidelines (Gargis et al., 2012; Roy et al., 2018) defining minimum thresholds for CAP/CLIA accreditation.
- 04
Pipeline Reproducibility
For clinical use the analytical pipeline must be deterministic. The same input files must produce the same output every time. This rules out algorithms with random sampling or non-deterministic steps. Regulatory bodies require that labs can reproduce any result on demand.
- 05
Data Storage and Security
Genomic data is uniquely sensitive. Unlike a compromised password, a patient's genome cannot be changed once exposed. Labs must comply with HIPAA, GDPR (for EU data), and institutional governance policies. On-premises vs. cloud, encryption, access controls, and retention decisions belong in the design phase, not the launch phase.
- 06
Regulatory Framework
NGS-based tests are regulated as laboratory-developed tests (LDTs) under CLIA in the US, with CAP inspector oversight of bioinformatics pipelines, validation studies, and quality management. Work with software vendors operating under ISO 13485-certified quality management systems for audit-ready documentation.
Pricing Reference
How Much Does NGS Cost?
NGS pricing varies by test type, clinical vs. research context, and insurance coverage. The figures below reflect typical U.S. clinical laboratory pricing as of 2024 to 2025.
| Test Type | Typical Clinical Cost | Common Use Cases |
|---|---|---|
| Targeted gene panel | $300 – $1,500 | Hereditary cancer, cardiology, pharmacogenomics |
| Whole exome sequencing (WES) | $1,000 – $3,000 | Rare disease diagnosis, pediatric genetics |
| Whole genome sequencing (WGS) | $2,000 – $5,000 | Critically ill neonates, undiagnosed disease |
| Liquid biopsy / ctDNA panel | $1,500 – $5,000+ | Tumor monitoring, therapy selection |
The sequencing cost is only part of the total. Clinical interpretation, the labor required to review, classify, and report findings, adds significantly, especially for complex WGS rare disease cases. Labs that invest in automated interpretation platforms reduce per-case interpretation cost substantially over time. For research programs, WGS can be obtained from commercial providers for $200 to $400 per sample at scale, but this excludes library preparation, bioinformatics, and interpretation.
Common Questions
Frequently Asked Questions
What are the 4 steps of NGS?
What is the difference between NGS and Sanger sequencing?
Is NGS the same as Illumina?
What is the difference between NGS and whole genome sequencing?
How long does NGS analysis take?
What coverage depth is required for clinical NGS?
Is NGS data considered protected health information (PHI)?
What is the difference between germline and somatic NGS testing?
Keep Reading
Related Resources
Dive deeper into specific stages of the NGS pipeline, software selection, and clinical lab infrastructure.
See These Concepts in Practice
VarSeq is Golden Helix's platform for tertiary analysis, ACMG/AMP-guided variant classification, and clinical reporting. Purpose-built for laboratories that need a deterministic, validated, and scalable interpretation environment.