Human Genome Project — goals, methodology, and key findings


Question

What were the goals of the Human Genome Project (HGP)? Describe the methodology used. List the salient findings of the project.

(NCERT Class 12 — directly asked in CBSE boards and NEET)


Solution — Step by Step

The Human Genome Project (1990-2003, coordinated by the US Department of Energy (DOE) and the National Institutes of Health (NIH)) had six goals:

  1. Identify all genes in human DNA (~20,000-25,000 genes)
  2. Determine the sequence of all 3.1 billion base pairs
  3. Store the information in databases (GenBank)
  4. Develop tools for data analysis (bioinformatics)
  5. Address ethical, legal, and social issues (ELSI programme)
  6. Transfer technology to the private sector

Approach 1: Hierarchical shotgun (HGP’s method)

  1. Break chromosomes into large fragments using restriction enzymes
  2. Clone these fragments into BAC (bacterial artificial chromosome) or YAC (yeast artificial chromosome) vectors
  3. Map fragments to specific chromosomal locations
  4. Break each clone into smaller pieces, sequence them, and assemble

Approach 2: Whole genome shotgun (Craig Venter/Celera)

  1. Shatter the entire genome into random small fragments
  2. Sequence all fragments simultaneously using automated sequencers
  3. Use powerful computers to align overlapping sequences and assemble the whole genome
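
The assembly step in both approaches comes down to finding reads whose ends overlap and merging them. A minimal sketch of that idea, assuming error-free reads from a single strand; the four reads and the function names `overlap` and `greedy_assemble` are made up for illustration, and real assemblers also have to cope with sequencing errors, both strands, and repeats:

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two reads that share the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical toy reads, each sharing 4 bases with the next one
reads = ["ATGGCGT", "GCGTACC", "TACCTGA", "CTGAAAG"]
contigs = greedy_assemble(reads)
print(contigs)
```

The greedy strategy is the simplest way to show the principle. It is not a description of the software actually used by the HGP or Celera.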

Both approaches used the Sanger sequencing method (dideoxy chain termination) as the core sequencing technique.
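
The logic of dideoxy chain termination can be sketched in a few lines: synthesis stops wherever a labelled dideoxynucleotide is incorporated, so the reaction yields one fragment for every position, and sorting the fragments by length while reading the terminal base gives the sequence. This is a simplified model, assuming one fragment per position and a perfect size separation; the template sequence and function names are hypothetical:

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesised_strand(template):
    """New strand written 5'->3', i.e. the reverse complement of the template."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def terminated_fragments(new_strand):
    """One fragment per position: (fragment length, labelled terminal base)."""
    fragments = [(i + 1, base) for i, base in enumerate(new_strand)]
    random.shuffle(fragments)  # fragments are mixed together in the reaction
    return fragments


def read_by_size(fragments):
    """Separate fragments by length (shortest first) and read the end labels."""
    return "".join(base for _, base in sorted(fragments))


template = "TACGGATC"  # hypothetical template, written 5'->3'
new_strand = synthesised_strand(template)
read = read_by_size(terminated_fragments(new_strand))
print(read)
```

The read recovers the newly synthesised strand; taking its reverse complement gives back the template.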

Salient findings:

  • Total base pairs: ~3.1 billion (3.1 × 10⁹)
  • Total genes: ~20,000-25,000 (far fewer than the expected 100,000)
  • Less than 2% of the genome codes for proteins — the rest is non-coding (once called “junk DNA”)
  • Average gene size: ~3,000 bases; largest known gene: dystrophin (2.4 million bases)
  • 99.9% of DNA sequence is identical between any two humans — only 0.1% varies
  • Repetitive sequences make up a large portion of the genome; they have no direct coding function but shed light on chromosome structure, dynamics, and evolution
  • Chromosome 1 has the most genes (~2968); Y chromosome has the fewest (~231)
  • The non-coding DNA includes regulatory sequences, introns, and transposable elements
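
To get a feel for what these percentages mean in absolute terms, the figures above can be turned into base counts. A quick back-of-the-envelope sketch using the rounded values from this list; the variable names are just for illustration, and "less than 2%" is treated as an upper bound of exactly 2%:

```python
GENOME_BP = 3_100_000_000      # ~3.1 billion base pairs
AVERAGE_GENE_BP = 3_000        # average gene size
DYSTROPHIN_BP = 2_400_000      # largest known human gene

# Upper bound on protein-coding DNA: 2% of the genome
coding_bp_max = GENOME_BP * 2 // 100

# Bases expected to differ between two people: 0.1% of the genome
variable_bp = GENOME_BP // 1000

# How many times larger dystrophin is than an average gene
dystrophin_ratio = DYSTROPHIN_BP // AVERAGE_GENE_BP

print(coding_bp_max)     # 62000000
print(variable_bp)       # 3100000
print(dystrophin_ratio)  # 800
```

So "only 0.1%" still means roughly 3 million differing bases between any two people, and at most about 62 million bases code for proteins.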

Why This Works

The HGP was transformative because it gave us the complete “parts list” of human biology. Knowing the sequence allows us to identify genes responsible for diseases, develop targeted drugs, and understand human evolution.

The surprise that we have only ~20,000-25,000 genes (fewer than a grapevine) showed that complexity comes not from gene number but from alternative splicing, gene regulation, and protein-protein interactions. One gene can produce multiple proteins through different splicing patterns.
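
A toy model shows how quickly splicing multiplies the number of possible products. The sketch below assumes a hypothetical gene whose first and last exons are always kept and whose internal exons can each be independently included or skipped; real splicing is regulated, so not every combination occurs in a cell:

```python
from itertools import combinations


def isoforms(first_exon, optional_exons, last_exon):
    """List every exon combination, keeping the exons in genomic order."""
    result = []
    for count in range(len(optional_exons) + 1):
        for kept in combinations(optional_exons, count):
            result.append([first_exon, *kept, last_exon])
    return result


# Hypothetical gene with 5 exons, 3 of them optional
variants = isoforms("E1", ["E2", "E3", "E4"], "E5")
for variant in variants:
    print("-".join(variant))
print(len(variants))  # 8, i.e. 2**3
```

With n optional exons this simple model gives 2ⁿ possible transcripts from a single gene, which is why the protein count can far exceed the gene count.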


Alternative Method — Key Numbers for NEET

Memorise these numbers — NEET loves factual recall from HGP:

  • 3.1 billion base pairs total
  • ~20,000-25,000 genes
  • Less than 2% codes for proteins
  • 99.9% similarity between individuals
  • Completed in 2003 (final announcement on April 14, 2003)
  • Cost: approximately $2.7 billion over 13 years

For the methodology, remember: BAC/YAC cloning → physical mapping → sequencing → assembly.


Common Mistake

Students often say the HGP “sequenced the entire genome including every individual variation.” The HGP sequenced a reference genome, a composite from a few anonymous donors. It did not capture all population-level variation. Projects that catalogue variation across populations (like the 1000 Genomes Project) came later.

Also, don’t confuse the number of genes (~20,000-25,000) with the number of proteins (over 100,000). Alternative splicing produces many more proteins than there are genes.
