Golden Helix · Clinical Genomics Guide

Next Generation Sequencing
A Complete Guide for Clinical Labs

From raw sequencing reads to a defensible clinical report. Everything labs, bioinformaticians, and genetic counselors need to know about NGS.

NGS AnalysisClinical PipelineVariant ClassificationLab ConsiderationsCosts & Coverage

Introduction

The technology is solved.
The challenge is what comes next.

Next generation sequencing has reshaped what clinical laboratories can offer their patients. What once required years of work and billions of dollars, sequencing a single human genome, can now be accomplished in days, at a cost that makes routine diagnostic testing feasible at scale.

The technology itself is only part of the story. For clinical labs the real challenge is not running the sequencer. It is what happens to the data afterward: the computational pipelines, the variant classification frameworks, the regulatory requirements, and the interpretive expertise needed to translate billions of base pairs into a meaningful clinical report.

~$3B
Cost to sequence one genome in 2000
~$100K
Cost of one genome in 2006, Illumina 1G Genetic Analyzer
~$400
Cost per genome today at research scale
10+ yrs
Time the Human Genome Project took
1–2 days
Typical clinical sequencing run time
~3.2B
Base pairs sequenced in a human whole genome

Definition

What Is Next Generation Sequencing?

Next generation sequencing (NGS) is a family of high-throughput DNA and RNA sequencing technologies that read millions of nucleotide fragments simultaneously. The same capability appears under several names: massively parallel sequencing, high-throughput sequencing, deep sequencing. All describe the same core idea of reading enormous amounts of genetic material in a single run.

The "next generation" label distinguishes these technologies from first-generation Sanger sequencing, which reads one DNA fragment at a time. NGS achieves its throughput by breaking genomic DNA into millions of small fragments, sequencing those fragments in parallel, then using computational software to reassemble them against a reference genome.

In the clinical context, NGS is the foundational technology behind genetic panels, whole exome sequencing (WES), and whole genome sequencing (WGS). It powers the diagnosis of rare inherited disease, cancer tumor profiling, newborn screening programs, and pharmacogenomic testing.

Context

A Brief History: From Sanger to NGS

  1. 1977 · First Generation

    Sanger Sequencing

    Chain-termination chemistry reads one DNA fragment at a time. Sanger remains the gold standard for confirming specific variants, but it is too slow and expensive for genome-scale work.

  2. 2000s onward · Second Generation

    Next Generation Sequencing (NGS)

    Massively parallel sequencing-by-synthesis. Sketched in 2003 at the Panton Arms pub in Cambridge by biochemist Shankar Balasubramanian and laser physicist David Klenerman, who founded Solexa. Illumina acquired Solexa in 2007, and the reversible-terminator chemistry has dominated clinical genomics ever since. Sequencing cost dropped from billions to hundreds of dollars.

  3. Now entering clinical use · Third Generation

    Long-Read Sequencing (PacBio, Oxford Nanopore)

    Reads longer DNA fragments without an amplification step, resolving complex structural variants and repetitive regions. Has not yet displaced short-read NGS for most routine diagnostic applications.

Laboratory Process

How NGS Works: The Lab Workflow

Before data ever reaches a bioinformatician, the sample passes through a series of wet-lab preparation steps. The quality of the sequencing data is largely determined by what happens before the sequencer is ever loaded.

  1. 01

    Sample Collection & DNA Extraction

    Sequencing begins with biological material: blood, saliva, tissue, or a tumor biopsy. DNA is extracted and quantified. Sample quality at this stage directly drives downstream data quality. Degraded DNA produces shorter fragments, lower coverage, and a higher rate of artifacts.

  2. 02

    Library Preparation

    DNA is fragmented into ~150–500 bp pieces. Synthetic adapters are ligated to both ends, letting fragments attach to the flow cell and enabling multiplexing of multiple samples. Failures here, including poor ligation efficiency, PCR bias, or inadequate GC-rich coverage, propagate through the entire downstream analysis.

  3. 03

    Sequencing (Primary Analysis)

    The library is loaded onto a sequencer. Fragments attach to a flow cell, are amplified into clusters, and fluorescently labeled nucleotides are incorporated one at a time. The output is a FASTQ file containing millions of reads, each base paired with a Phred quality score. A WES run produces 10–20 GB of raw data; WGS produces 100–200+ GB.

  4. 04

    Secondary Analysis: Alignment & Variant Calling

    Reads map to the human reference genome (GRCh38) using BWA-class aligners, producing a BAM file. Clinical NGS always aligns to a reference; de novo assembly is a research-only path for organisms without one. Duplicates are removed. Variant callers then identify positions that differ from the reference. SNV and small-indel calling reaches very high accuracy in well-covered regions, particularly with modern deep-learning-assisted callers (DeepVariant, Sentieon DNAscope, Illumina DRAGEN). CNV and structural variant calling remain active design choices, with meaningful sensitivity differences between strategies. Output: a VCF file.

  5. 05

    Tertiary Analysis: Annotation, Filtering & Interpretation

    A human genome contains ~3–4 million variants vs. the reference. Tertiary analysis narrows this to the handful that may explain the patient's phenotype. It involves annotation with ClinVar, gnomAD, and OMIM; filtering for quality and frequency; ACMG/AMP classification; and final report generation. This stage requires the most expertise, the most time, and the best software.

What NGS Detects

Variant Types Detected by NGS

One of NGS's core advantages is its ability to detect multiple classes of genetic variation in a single assay. Clinical labs should understand each type, its detection reliability, and the analytical approaches required.

Most common

Single Nucleotide Variants (SNVs)

A change at a single base position. The best-characterized class in clinical databases. Standard NGS detects germline SNVs with high sensitivity and specificity at ≥30x coverage. Examples: BRAF V600E, BRCA1/2 pathogenic variants.

Small scale

Insertions & Deletions (Indels)

Insertions or deletions of 1–50 bases. Frameshift indels are a common loss-of-function mechanism. Detection accuracy falls for longer indels and homopolymer runs.

Large scale

Copy Number Variants (CNVs)

Gains or losses of large genomic segments (kb to Mb). Examples: ERBB2 amplification in breast cancer, 22q11.2 deletion syndrome. Requires dedicated CNV callers (VS-CNV, CNVkit, others); general-purpose SNV callers cannot detect these.

Rearrangements

Structural Variants (SVs)

Large-scale rearrangements: inversions, translocations, large deletions and duplications. Includes the gene fusions critical in cancer (BCR-ABL1, EML4-ALK). Short-read NGS has limited SV sensitivity; long-read improves this substantially.

Drug response

Pharmacogenomic (PGx) Star Alleles

Haplotype combinations across genes like CYP2D6, CYP2C19, and DPYD that predict drug metabolism. Requires specialized databases and haplotype-calling algorithms beyond standard variant calling pipelines.

Assay Selection

NGS Test Types: Panels, Exomes & Genomes

Clinical NGS is not a single test. It is a family of assays with different scopes, costs, and diagnostic yields. Choosing the right test for the clinical question is as important as running it correctly.

Targeted Gene Panels

Sequences a predefined set of genes for a specific indication. Achieves very high depth (500x–1000x+), improving sensitivity for low-frequency somatic and mosaic variants. Less expensive. Misses any causal variant outside covered genes.

High depth · Lower cost

Whole Exome Sequencing (WES)

Captures protein-coding regions of all ~20,000 genes. About 1–2% of the genome but ~85% of known disease-causing variants. The workhorse of rare disease diagnosis, carrier screening, and research-grade clinical testing.

Broad · Diagnostic workhorse

Whole Genome Sequencing (WGS)

Covers all ~3.2 billion base pairs, including non-coding regulatory variants. Detects all variant types at single-base resolution. Increasingly used first-line for critically ill neonates and rare disease with no prior diagnosis.

Complete · Growing as first-line

Liquid Biopsy / cfDNA

Analyzes tumor-shed DNA circulating in the bloodstream. Detects somatic variants at very low allele frequencies (<1%), enabling non-invasive tumor monitoring and minimal residual disease detection. Requires ultra-sensitive variant callers.

Non-invasive · Ultra-sensitive

End-to-End View

From Sequencing to Clinical Report

The NGS pipeline is a process of progressive data reduction and progressive clinical enrichment. At each step, data volume shrinks by orders of magnitude. Clinical value increases at every stage.

StageOutput FormatData VolumeClinical Value
SequencerFASTQ~100–200 GBRaw, no clinical value yet
Alignment & Variant CallingBAM / VCF~1–10 GBCandidate variants identified
Annotation & FilteringAnnotated VCF~50 MBCandidates narrowed to review set
Interpretation & ClassificationVariant list~10 KBClinically classified findings
Clinical ReportPDF / EHR<1 MBActionable diagnostic conclusions

A failure anywhere in this chain matters. A poorly prepared library, a misaligned read, an outdated annotation database, or an incorrectly applied ACMG criterion can each produce a false-negative or false-positive clinical conclusion.

Before You Deploy

Key Considerations for Clinical Laboratories

  • 01

    Coverage and Uniformity

    Minimum coverage: ≥20x for WGS, ≥30x for WES. Coverage at individual loci matters as much as mean coverage. A single exon with inadequate depth is a diagnostic gap. Clinical validation must characterize coverage uniformity across all target regions.

  • 02

    Reference Genome Version

    GRCh38 (hg38) is the current standard and recommended for new pipelines. Labs still on GRCh37 (hg19) should plan migrations carefully. Variant coordinates differ between assemblies, and a mismatch can produce incorrect clinical conclusions.

  • 03

    Analytical Validation

    Every NGS assay must be analytically validated before clinical deployment, confirming sensitivity, specificity, precision, and reportable range for each variant type. CAP and ACMG have published guidelines (Gargis et al., 2012; Roy et al., 2018) defining minimum thresholds for CAP/CLIA accreditation.

  • 04

    Pipeline Reproducibility

    For clinical use the analytical pipeline must be deterministic. The same input files must produce the same output every time. This rules out algorithms with random sampling or non-deterministic steps. Regulatory bodies require that labs can reproduce any result on demand.

  • 05

    Data Storage and Security

    Genomic data is uniquely sensitive. Unlike a compromised password, a patient's genome cannot be changed once exposed. Labs must comply with HIPAA, GDPR (for EU data), and institutional governance policies. On-premises vs. cloud, encryption, access controls, and retention decisions belong in the design phase, not the launch phase.

  • 06

    Regulatory Framework

    NGS-based tests are regulated as laboratory-developed tests (LDTs) under CLIA in the US, with CAP inspector oversight of bioinformatics pipelines, validation studies, and quality management. Work with software vendors operating under ISO 13485-certified quality management systems for audit-ready documentation.

Pricing Reference

How Much Does NGS Cost?

NGS pricing varies by test type, clinical vs. research context, and insurance coverage. The figures below reflect typical U.S. clinical laboratory pricing as of 2024 to 2025.

Test TypeTypical Clinical CostCommon Use Cases
Targeted gene panel$300 – $1,500Hereditary cancer, cardiology, pharmacogenomics
Whole exome sequencing (WES)$1,000 – $3,000Rare disease diagnosis, pediatric genetics
Whole genome sequencing (WGS)$2,000 – $5,000Critically ill neonates, undiagnosed disease
Liquid biopsy / ctDNA panel$1,500 – $5,000+Tumor monitoring, therapy selection

The sequencing cost is only part of the total. Clinical interpretation, the labor required to review, classify, and report findings, adds significantly, especially for complex WGS rare disease cases. Labs that invest in automated interpretation platforms reduce per-case interpretation cost substantially over time. For research programs, WGS can be obtained from commercial providers for $200 to $400 per sample at scale, but this excludes library preparation, bioinformatics, and interpretation.

Common Questions

Frequently Asked Questions

What are the 4 steps of NGS?
The NGS workflow is typically described in four stages: (1) library preparation, where DNA is fragmented and adapters are added; (2) sequencing, where the prepared library is loaded onto a sequencer and billions of bases are read in parallel, producing FASTQ files; (3) secondary analysis, where reads are aligned to the reference genome and variants are called, producing BAM and VCF files; and (4) tertiary analysis, where variants are annotated, filtered, classified, and reported. In practice, library preparation is preceded by sample collection and DNA extraction, making it a five-step process end to end.
What is the difference between NGS and Sanger sequencing?
Sanger sequencing reads one DNA fragment at a time using chain-termination chemistry. It is highly accurate for short, targeted regions and remains the standard for confirming specific variants. NGS sequences millions of fragments simultaneously, making it orders of magnitude faster and cheaper for genome-scale analysis. A Sanger run reads a few hundred bases; an NGS run reads hundreds of billions. The trade-off: NGS produces shorter individual reads (150–300 bp for short-read platforms), creating challenges in repetitive regions. In clinical practice the two are complementary: NGS for discovery, Sanger for confirmation.
Is NGS the same as Illumina?
No. NGS is the category of technology; Illumina is the most widely used manufacturer of NGS instruments. Illumina's sequencing-by-synthesis platform dominates clinical genomics, which is why the terms are sometimes used interchangeably. Other NGS platforms include Thermo Fisher (Ion Torrent), Pacific Biosciences (long-read), and Oxford Nanopore (long-read). Downstream analytical software like VarSeq is platform-agnostic and works with data from any short-read or long-read sequencer.
What is the difference between NGS and whole genome sequencing?
NGS is the umbrella technology. Whole genome sequencing (WGS) is one application of NGS that sequences the complete genome. Other NGS applications include targeted gene panels and whole exome sequencing. All use massively parallel sequencing technology; they differ in which regions of the genome are captured and sequenced.
How long does NGS analysis take?
Sequencing run times vary by platform and assay: a typical Illumina WES run takes 24–48 hours. Secondary analysis (alignment and variant calling) takes an additional 4–12 hours per sample with optimized pipelines. Tertiary analysis, interpretation and reporting, is the rate-limiting step and can take hours to days per case in manual workflows. Automated platforms like VarSeq reduce this to minutes for routine cases.
What coverage depth is required for clinical NGS?
Minimum depth requirements depend on the assay type and variant class. Germline WES typically requires ≥30x mean coverage with ≥95% of targets covered at ≥20x. Somatic panels may require 500x–1000x to detect low-frequency mutations. Specific coverage requirements should be established during analytical validation.
Is NGS data considered protected health information (PHI)?
Yes. Under HIPAA, genomic sequence data from an identified or identifiable individual is considered PHI and is subject to the full protections of the HIPAA Privacy and Security Rules. Laboratories must implement appropriate administrative, technical, and physical safeguards for all NGS data.
What is the difference between germline and somatic NGS testing?
Germline testing sequences DNA from normal tissue (typically blood or saliva) to identify inherited variants. Somatic testing sequences DNA from tumor tissue to identify mutations acquired during cancer development. Each requires a different analytical pipeline, different variant callers, and different interpretation guidelines (ACMG for germline; AMP/ASCO/CAP for somatic).

See These Concepts in Practice

VarSeq is Golden Helix's platform for tertiary analysis, ACMG/AMP-guided variant classification, and clinical reporting. Purpose-built for laboratories that need a deterministic, validated, and scalable interpretation environment.