We have been discussing the human genome, the sum of all
of the DNA in all of our chromosomes. With the official completion
of the Human Genome Project, we know, for the first time ever, the sequence
of the DNA in us. How has it been possible to determine the sequence of about
3,000,000,000 base pairs? As recently as the beginning of the 1990s, it was not clear that
this was even a feasible undertaking.
To put this in perspective, let's glance at a half-century of genetics history:
In the 1950s: basic DNA structure ('53) and replication ('58) were figured
out.
In the 1960s: the genetic code for mRNA to protein was figured out ('64).
In the 1970s: a technique for sequencing ~200 base pairs of DNA was developed.
In the 1980s: many individual genes could be located, sequenced and analyzed.
In the 1990s: automated processes were developed, making "genome sequencing"
possible.
2001 - 2004: human and some other genome sequences were completed.
Let's start in the 1970s and ask: how is it possible to determine the base pair sequence of a very short (200 base pairs or so) DNA molecule? Described
most simply, the technique involves the following steps: (i) prepare
appropriate template DNA, (ii) use enzymatic in vitro DNA synthesis
to make single DNA strands of all possible lengths from the template strand, (iii) separate these "all possible lengths" single-stranded DNA molecules
by electrophoresis, and then (iv) determine what the "end" nucleotide is for each of the molecules.
1. What is a "dideoxynucleotide", and what happens if
one of these is put into a growing DNA strand during in vitro DNA synthesis?
Figure 6.20 shows the addition of a normal deoxynucleotide onto the 3' OH of
the growing strand during DNA synthesis.
If the added nucleotide is a "dideoxynucleotide" (Figure 6.29), the 3' end of
the growing strand will now have an H rather than an OH. Because the next
nucleotide can only be attached to a 3' OH, no additional nucleotides can be
added on; i.e., in vitro DNA synthesis will stop at this point.
2. How does the "dideoxy sequencing method" work?
Figure 6.30 shows the technique, which involves four separate in vitro
DNA strand synthesis reactions. Each reaction contains all four deoxynucleotides
plus a small amount of one of the four dideoxynucleotides (a different one in
each tube). Incorporation of the dideoxynucleotide into the growing DNA strand
stops further synthesis, so in each of the four reaction tubes we end up with
a mixture of newly made DNA strands of various lengths, all ending in that
tube's dideoxynucleotide.
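To make the "mixture of various lengths" idea concrete, here is a minimal Python sketch of the four reactions. It is an idealized model, not a description of any real protocol: it assumes there are enough molecules in each tube that synthesis gets terminated at every possible position, and the function name and the example template are made up for illustration.

```python
# Idealized model of the four dideoxy reactions (illustrative assumption:
# termination happens at every position where the tube's base occurs).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def dideoxy_fragment_lengths(template_3_to_5):
    """Return {base: [fragment lengths]} for the four reaction tubes.

    The template is read 3'->5', so the new strand grows 5'->3' as its
    base-by-base complement. A fragment of length n ends in the n-th
    base of the new strand.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template_3_to_5)
    tubes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        tubes[base].append(length)
    return tubes

# Hypothetical 7-base template; the new strand would be GATTACA.
tubes = dideoxy_fragment_lengths("CTAATGT")
print(tubes)
```

Notice that every length from 1 to 7 shows up in exactly one tube. That is the property the rest of the method depends on.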
In "the old days" (the 1970s and '80s), the products of the
four reactions were electrophoresed in adjacent lanes of a gel. Each possible
length of DNA showed up as a band somewhere in the gel, and from the locations of the
bands in the four lanes one could "read" the sequence of the overall DNA strand.
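"Reading" the gel amounts to walking up from the shortest band to the longest and noting which lane each band is in. A small sketch of that logic, assuming each lane is represented simply as a list of fragment lengths (the lane contents and the function name below are invented for illustration):

```python
def read_gel(lanes):
    """Reconstruct the new strand (5'->3') from four lanes of band lengths.

    lanes maps a dideoxy base to the lengths of the fragments in its lane.
    The shortest fragment runs farthest, so sorting by length is the same
    as reading the gel from the bottom up.
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)

# Hypothetical gel: band lengths observed in the A, C, G and T lanes.
example_lanes = {"A": [2, 5, 7], "C": [6], "G": [1], "T": [3, 4]}
sequence = read_gel(example_lanes)
print(sequence)
```

This only works because each length appears in exactly one lane; a real gel also has to resolve fragments that differ by a single nucleotide, which is what limits the readable length.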
By the mid 1990s, the process had become much faster and less labor-intensive through the use of
fluorescent-tagged dideoxynucleotides, with a different color tag for each of the four.
All four reaction samples could then be electrophoresed together in one lane, and
each band could be read automatically by a fluorescence detector as it reached
the bottom of the gel. By 2000,
automated DNA sequencers were using capillary tubes rather than gel lanes.
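With one lane and four colors, the "reading" step reduces to translating the order of colors seen at the detector into bases. A toy sketch follows; the dye colors assigned to each base here are an arbitrary assumption for illustration, and real instruments use their own dye sets and work from continuous signal traces rather than clean color names.

```python
# Hypothetical color assignment, chosen only for this example.
DYE_TO_BASE = {"green": "A", "blue": "C", "yellow": "G", "red": "T"}

def call_bases(detected_colors):
    """Translate colors seen at the detector, in arrival order, into bases."""
    return "".join(DYE_TO_BASE[color] for color in detected_colors)

# The shortest fragments arrive first, so arrival order matches the
# 5'->3' order of the newly made strand.
trace = ["yellow", "green", "red", "red", "green", "blue", "green"]
called = call_bases(trace)
print(called)
```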
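A whole genome is far too long to sequence in one piece, so it is sequenced as
many short, overlapping stretches from random locations. Putting these
overlapping sequences together in the right order requires specialized computer applications.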
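The core idea behind such applications can be sketched as a greedy merge: repeatedly find the two sequences with the longest end-to-start overlap and join them. The Python below is only a toy version under simplifying assumptions (error-free reads, all from the same strand, no repeated regions, no read contained entirely inside another); the function names and example reads are made up, and this is not the algorithm used for the human genome.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by longest overlap until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to join; remaining reads stay separate
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Three hypothetical overlapping reads, given out of order.
contigs = greedy_assemble(["ACCTGAAC", "ATGGCGT", "GCGTACCT"])
print(contigs)
```

Real genomes contain long repeated sequences and reads contain errors, which is why the actual assembly programs are far more elaborate than this.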
In the 2001 Nature paper that reported the first draft of the human genome,
a simple overview of the technique was presented in Figure 2.
The final draft of the human genome is being published chromosome by chromosome, working in general from the smallest chromosomes to the largest. The most recent article, published just this month in Nature and reporting an accuracy of "greater than 99.99%", is Schmutz et al. (over 80 authors), "The DNA sequence and comparative analysis of human chromosome 5".
3. What are some of the main genome sequences that have
been determined and are available for analysis?
The Ensembl
Genome Browser provides an excellent entry point into investigating some of the genomes
that have been sequenced.