Open Questions: Molecular Biology and Genetics

See also: Gene expression and regulation -- RNA biology -- Genetics and genomics -- Molecular evolution


Reviewing the basics

Filling in some details

Regulation of genes

Exons and introns

Translation of RNA into proteins



'Junk' DNA

Recommended references: Web sites

Recommended references: Magazine/journal articles

Recommended references: Books


The word root Gen- is one of the most pregnant in the whole Indo-European family of languages. It is associated with the concept of giving birth and being born. It's clearly the root of English words such as gene, genesis, genetic, genital, gender, genealogy, generate, generation, congenital, indigenous, and progeny. A little more distant are words like genus, general, generic. We've inherited all these from Latin. Gens is a mostly obsolete English word for clan. Gente is Spanish for people. Generous and gentle, as in gentleman and genteel, also come from the clan idea in gens. In Germanic languages (of which English is one), the root became kin (like clan), king, and kind (like both genus and gentle).

Even further afield, historical linguists think that the root is related to other words suggesting birth, such as native (=indigenous), nativity, natal, nation, and nature. Not to mention, pregnant.

The point here is that the word "genetics" has many rich associations with the means by which life has come to be. And that is precisely what the subject is about. We have the additional scientific terms ontogeny (how individual creatures develop) and phylogeny (how the tree of life itself developed). The word could not be more apt, because molecular genetics is absolutely fundamental to understanding both these processes.

The original scientific meaning of the word gene was some (at the time unknown) unit representing genetic information. It was the something in the hereditary material of peas which (Mendel presumed) made them wrinkled or smooth. It is the something in our makeup (we presume) that gives us blue eyes or brown.

But that meaning, we now know, is a vast oversimplification. The full truth is much more complicated, but also much more amazing. Unfortunately, we too often encounter a misuse of the outdated concept. There are certainly genetic factors in personality and intelligence, for example. But it isn't just one gene. Traits of this sort are undoubtedly influenced by dozens or even hundreds of genes, to say nothing of environmental influences besides.

The term "molecular genetics" refers to our modern, rigorous science of genes, "the genome", DNA, etc. The subject has received a lot of attention since it was announced in 2000 that the "sequencing" of the human genome was (almost) complete. Does that mean there are no more important open questions left in this area?


More than a year after the announcement, there were still disagreements over some pretty basic details. Are there only 30,000 human genes? Or more like twice that? Or somewhere in between? How is it that we can know (99%) the precise order of the 3 billion base pairs in the human genome, and yet not know what all the genes are?

The answer is simple: knowing the order of all the base pairs is a good start, but it's only a start. It simply brings us to the point where we can reasonably ask (about any given genome, not necessarily that of humans) three fundamental questions:

  1. Except for the simplest organisms (bacteria, basically), most of the DNA in the chromosomes is not part of genes, and it's not even clear why it's there. So, exactly what identifies the 3 to 5% of base pairs which make up the genes, and where is each gene located?
  2. Every cell in the body contains the whole genome, yet at any given time, most genes are switched off rather than on. I. e., they are not "expressed". What is the mechanism which determines whether any particular gene is or isn't expressed at a particular time?
  3. Most genes (outside of a handful that have specialized purposes) contain the code for constructing one or more proteins, which are produced when the gene is expressed. What is the biological function of each of those proteins? (Since there may be a million or so of them produced in the human body, answering this question is likely to take quite a while.)

Reviewing the basics

You can skim this part if it's already familiar. But to start at the beginning, let's review the biochemical details. DNA stands for deoxyribonucleic acid. It's very similar to ribonucleic acid (RNA). Each of these is a polymer consisting of an arbitrary number of units called "nucleotides". Each nucleotide, in turn, is composed of three sorts of more elementary constituents. The first constituent is a sugar: ribose (in the case of RNA) or deoxyribose (for DNA). The only difference between the two is that deoxyribose has one fewer oxygen atom than ribose (4 instead of 5). They are called sugars because they are chemically similar to more familiar sugars, like dextrose. The second constituent is a phosphate group (PO4). In a strand of DNA or RNA there is one phosphate between each pair of neighboring sugars, linking them together like beads on a chain. These chemical bonds are very strong.

The third constituent is one of a set of 5 "bases", each of which is a relatively simple molecule consisting of carbon, nitrogen, oxygen, and hydrogen. The 5 bases are adenine, guanine, cytosine, thymine, and uracil, usually abbreviated A, G, C, T, U respectively. Thymine occurs only in DNA, while uracil occurs only in RNA, but chemically they are very similar. In DNA and RNA one base is attached to each sugar molecule, like charms on a bracelet.

Each base has a tendency to pair off with a specific one of the others, by means of a "hydrogen bond". G always pairs with C. A pairs with either T or U. There is one more important difference between RNA and DNA. RNA is usually a single strand. In DNA, however, two strands usually occur together with corresponding bases pairing together in the manner described. This usually results in the "double helix" structure, with the two strands twining around each other, their bases in the center and the sugar-phosphate backbone on the outside.
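
The pairing rule is mechanical enough to write down in a few lines. Here is a minimal Python sketch of it. The names DNA_PAIR and complement_strand are our own, chosen only for illustration, and the sketch ignores everything about strand direction.

```python
# Pairing rules for DNA as described above: G with C, A with T.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the bases that pair with `strand`, position by position."""
    return "".join(DNA_PAIR[base] for base in strand)


print(complement_strand("GATTACA"))  # prints CTAATGT
```

Applying the function twice gives back the original strand, which is the sense in which each strand of a double helix carries the same information as the other.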

Given all this, the process of making a protein from a gene is easy enough to describe at a high level. In the first step, an enzyme (i. e. a special type of protein) called RNA polymerase (because it builds RNA polymers) does the work. The RNA polymerase attaches to a strand of DNA at a particular marker sequence which indicates the start of a gene. It then moves down the strand building an RNA molecule as it goes along. For each nucleotide on the DNA strand, the RNA polymerase adds a new unit on the growing chain using a nucleotide which contains the complementary base, selecting nucleotides from the surrounding aqueous medium. This "transcription" process stops when the polymerase encounters an appropriate base-sequence marker at the end of the gene.
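
As a rough illustration of the matching step only, here is a minimal Python sketch. The function name transcribe and the example sequence are made up. The template is assumed to be written in the order in which the polymerase reads it, and the start and stop markers are left out altogether.

```python
# Complementary RNA base for each DNA base (uracil takes the place of thymine).
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Build the RNA complementary to a DNA template, one nucleotide at a time."""
    rna = []
    for base in template:            # the polymerase moves along the template
        rna.append(RNA_PAIR[base])   # and adds the complementary RNA nucleotide
    return "".join(rna)


print(transcribe("TACCGGATT"))  # prints AUGGCCUAA
```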

The resulting RNA is called "messenger RNA" (mRNA). Once completed, it drifts away from the DNA to another part of the cell (outside the cell nucleus if there is one). Eventually the mRNA encounters an important cellular device called a ribosome. The ribosome is a complex of several proteins and another type of RNA molecule (ribosomal RNA or rRNA). It is the machine that "translates" mRNA into proteins. Basically what the ribosome does is to read the mRNA sequence just as the RNA polymerase read the original DNA and to build up proteins from the given sequence information.

It is at this point that one important fact comes in. Proteins are polymers of another sort of small molecule called an amino acid. A large number of amino acids are known, but only 20 actually make up proteins. Still, there are only 4 different bases in RNA, so there can't be a simple 1:1 correspondence. But if you take any string of 3 bases together, that yields 64 possible sequences, which is more than enough to specify any needed amino acid. That is exactly what happens, and the resulting mapping from base triplets to amino acids is called the "genetic code". It's a four-letter alphabet that forms 64 possible 3-letter words. Each word specifies a particular amino acid (and in most cases a particular amino acid can be specified by more than one word). A few words are reserved to specify the end of a coding sequence, like the period at the end of a sentence. It is a remarkable fact that this code is universal to all forms of life (with only a few almost trivial exceptions). This is very strong evidence for a single common ancestor of all presently existing organisms.
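
The counting argument can be checked directly. This small Python sketch (the variable names are ours) lists every possible word of one, two, and three bases and compares the counts with the 20 amino acids that need to be specified.

```python
from itertools import product

BASES = "UCAG"      # the four bases found in RNA
AMINO_ACIDS = 20    # number of amino acids used in proteins

for length in (1, 2, 3):
    words = ["".join(w) for w in product(BASES, repeat=length)]
    enough = len(words) >= AMINO_ACIDS
    print(length, len(words), "enough" if enough else "too few")

codons = ["".join(w) for w in product(BASES, repeat=3)]
```

Words of one or two bases give only 4 or 16 possibilities, so three is the shortest word length that works, and it leaves 44 spare words for synonyms and stop signals.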

The ribosome, then, simply employs this genetic code to construct proteins one amino acid at a time, given the sequence information in the mRNA. The code itself is actually not built into the ribosome, but rather resides in yet another type of RNA called "transfer RNA" (tRNA). Each tRNA is a relatively short molecule which has an appropriate 3-base sequence at one end and a docking location at the other which attaches only one specific amino acid. The tRNA molecules with attached amino acids are found floating around in the ambient medium, and fundamentally all that the ribosome does is find the right one to use at any particular point in its reading of the mRNA.
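
To make the lookup concrete, here is a hedged Python sketch of translation. The dictionary CODON_TABLE stands in for the whole collection of tRNA molecules, and the loop stands in for the ribosome. The table is the standard genetic code written with the usual one-letter amino acid abbreviations, with "*" marking a stop word. The names are ours, and the sketch simply starts reading at the first base, which glosses over how a real ribosome finds its starting point.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in the order UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Read mRNA three bases at a time until a stop word or the end of the string."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # the "right tRNA" for this word
        if amino_acid == "*":                    # stop word: end of the sentence
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCUAA"))  # prints MA (methionine, alanine, then stop)
```

Using a plain dictionary reflects the point made above: the code lives in the lookup table, not in the machine that consults it.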

There are a number of important details omitted from this description, but at a high level, there isn't anything more to the process of building proteins from a DNA template than the mechanical process of matching one sort of thing with the appropriate other sort of thing.

Note also how three different types of RNA have played crucial roles: mRNA acts as a copy of the original (DNA) template. rRNA makes up a key functional part of the ribosome. And tRNA implements the genetic code. The several active roles that RNA plays in this process have led to speculation that RNA was one of the original molecules of life, and that it was somehow, at an early stage, the key chemical player in a primordial "RNA world". This is one principal possible scenario for the origins of life. It seems easier to imagine that different types of RNA working together eventually managed to "invent" proteins -- which today actually make up the material of living things as we know them -- than to suppose that proteins somehow came first and "invented" RNA.

But we're certainly not sure what happened. Proteins definitely play crucial roles at every stage of copying RNA and DNA. It is possible that proteins somehow came about first and eventually managed to encode their own blueprints in a crude primal form of RNA. In other words, at some point in time, there may have been relatively simple proteins which were able to "read" the sequence of other proteins and record it as RNA. This is doubtful mainly because we don't know of any such process going on today. Because of its linear nature, RNA (or DNA) is much easier to "read" than proteins are, making it much more convenient for information storage and retrieval.

Filling in some details

So much for the basics. The truth of the matter is that, as usual, the devil is in the details. We've glossed over a lot of important points which are crucial to answering two key questions: What distinguishes genes from the rest of the DNA in the genome? And what causes a particular gene to be expressed at any particular time? In fact, although we know many of the details here, large parts of the full answers to these questions are still unknown.

The subject we're looking at here is sometimes called "gene regulation": How does the cell chemistry distinguish genes from all the rest of the DNA in the first place? And how does it decide whether to actually transcribe the gene into mRNA, i. e., whether to "express" the gene? One key fact we must allow for in a more detailed examination of gene regulation is that not all cells do it in the same way. In other words, unlike the genetic code itself, gene regulation is not the same in all life forms. Hence it has gradually developed during the evolutionary process. (Yet most of the details do seem to have appeared early enough to occur in most life forms other than bacteria.)

Very broadly speaking, there are two types of cells. The first, called prokaryotic cells or prokaryotes, are very small, simple, and presumably primitive cells -- with bacteria being the prime example. The second type, called eukaryotic cells or eukaryotes, are larger, more complex, and presumably a later evolutionary development. The cells in almost all types of life other than bacteria -- i. e. protists, fungi, plants, and animals -- are eukaryotic. Because it's more complex and interesting, we'll focus on gene regulation in eukaryotes, but point out how it differs in prokaryotes. (One other type of cell, found in organisms known as archaea, is somewhat intermediate in this regard, but we'll leave it out, for simplicity.)

There's also one key fact we need to note about nucleic acids, either RNA or DNA. That is, they are not just symmetrical strings of nucleotides in which there is no sense of "forward" or "backward". On the contrary, there is a definite directionality. It arises because the orientation of the sugar units in each nucleotide of the chain is important. Each sugar is conventionally said to have a 3' end and a 5' end (the numbers referring to numbering of the 5 carbon atoms in the sugar molecule). The 5' position is where the phosphate group is attached in a single nucleotide. (The base of a nucleotide is attached at the 1' position.) In the process of polymerization, one nucleotide is attached to another by establishing a bond between the phosphate on one nucleotide and the 3' position of the sugar of another nucleotide. As a result, a string of nucleotides itself has a 3' end and a 5' end, and each additional nucleotide can be added only at the 3' end.

What is the practical significance of this? It is simply that polymerase enzymes which construct DNA or RNA from an existing string of nucleotides must build the new string in the 5' to 3' direction, since the 3' end is the only one that can "grow". Given this, it is a further fact that RNA polymerase "reads" a nucleotide string only in the 3' to 5' direction. That is, RNA polymerase looks for the "next" nucleotide to be transcribed at the 5' end of the "current" nucleotide. (DNA polymerase, which replicates an existing strand of DNA, also reads in the 3' to 5' direction.)

Somewhat confusingly, by convention, nucleotide sequences in single strands of DNA and RNA are specified in the 5' to 3' direction, which is the order in which they are assembled, but the opposite of the way they are read by polymerase. One more fact is that in double-stranded DNA, the two strands run in opposite directions, so in this form of DNA, choice of direction is again arbitrary -- how would one choose which strand defines it? One final bit of terminology: the 5' to 3' direction of a single strand is said to be "downstream", while the opposite 3' to 5' direction is said to be "upstream".

We now have enough terminology to talk about how gene transcription (the copying of nucleotide sequences from DNA to mRNA) and regulation occur. Most importantly, there are specific short sequences which are markers -- usually called "promoters" -- of the approximate start of a gene. Such markers often include the sequence TATA, in which case the marker is sometimes called a TATA box. The RNA polymerase enzyme attaches to the promoter sequence in a specific direction -- "downstream", so that it is oriented, on the strand containing the promoter, in the 5' to 3' direction relative to the promoter. (Hence the promoter occurs upstream of the gene itself.)
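
Looking for such a marker in a sequence is a simple string search, as the following Python sketch suggests. The function name find_tata_boxes and the example sequences are invented for illustration, and the sketch looks only for the literal letters TATA on a strand written 5' to 3'. Finding real promoters takes a good deal more than this.

```python
def find_tata_boxes(seq_5_to_3: str) -> list[int]:
    """Return the 0-based start of every TATA in the sequence, overlaps included."""
    positions = []
    start = seq_5_to_3.find("TATA")
    while start != -1:
        positions.append(start)
        # Resume one base further along so overlapping matches are not missed.
        start = seq_5_to_3.find("TATA", start + 1)
    return positions


print(find_tata_boxes("GGCTATAAAGGC"))  # prints [3]
```

A four-letter pattern turns up by chance about once in every 256 positions of a random sequence, which hints at why a marker of this kind cannot, on its own, settle where the genes are.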

In eukaryotes, however, there is a special protein, one of a class called "transcription factors", which first attaches to the marker region. Additional transcription factors then attach to the first, and only when enough of the "right" transcription factors are present can the RNA polymerase attach and begin working. The transcription factors are one of the main ways in which gene expression is regulated. Since transcription factors are proteins, which are manufactured under the direction of other genes, this is how one gene can affect the expression of another. Indeed, there can be a whole sequence of such relationships between genes. In this way, a set of genes may become turned on in sequence, altering the characteristics of the cell at each stage. This seems to be the basic "secret" of how specialized cells arise from more generalized types of cells in the embryonic development process of multicellular organisms (which are almost always composed of eukaryotic cells).
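
The idea of genes switching one another on in sequence can be sketched as a toy model in Python. Everything here is hypothetical: the gene names, the requires table, and the assumption that a gene is simply on or off and turns on once all the factors it needs are present. Real regulation is far less tidy.

```python
def expression_cascade(requires: dict[str, set[str]], initially_on: set[str]) -> list[set[str]]:
    """Return the genes that switch on at each stage, starting from `initially_on`.

    `requires` maps a gene to the genes whose products (transcription
    factors) must all be present before that gene can be transcribed.
    """
    on = set(initially_on)
    stages = [set(on)]
    while True:
        newly_on = {gene for gene, needed in requires.items()
                    if gene not in on and needed <= on}
        if not newly_on:          # nothing further can switch on
            return stages
        on |= newly_on
        stages.append(newly_on)


# Hypothetical genes: B needs the product of A, C needs both A and B, D needs X.
toy_requirements = {"B": {"A"}, "C": {"A", "B"}, "D": {"X"}}
print(expression_cascade(toy_requirements, {"A"}))  # prints [{'A'}, {'B'}, {'C'}]
```

In this toy run gene D never turns on, because the factor it needs is never produced. Two cells with the same genome but different starting factors would end up expressing different sets of genes.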

If you've followed this discussion closely, one detail may be bothering you. We said that RNA polymerase reads in the 3' to 5' direction. Yet it is oriented in the 5' to 3' direction relative to the marker where it attaches. To resolve this apparent contradiction, it is crucial that DNA contains two strands oriented in opposite directions. The marker region may occur on either strand (when read in the appropriate sense). But what then happens is that the polymerase actually reads from the opposite strand from the one containing the marker. Because the opposite strand is oriented in the "correct" (3' to 5') direction to be read by the polymerase, everything works out just right. The net result is that either strand may be the one which is actually transcribed, but there is no ambiguity, because everything is arranged properly by the location and orientation of the marker.
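
A short Python sketch may help in checking that the bookkeeping really does work out. The function names and the example sequence are ours, and every sequence is written 5' to 3' as the convention demands. The strand carrying the marker is called marker_strand here, and the polymerase reads the opposite strand backwards, from its 3' end.

```python
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def opposite_strand(strand_5_to_3: str) -> str:
    """The partner strand, also written 5' to 3' (so it comes out reversed)."""
    return "".join(DNA_PAIR[base] for base in reversed(strand_5_to_3))


def transcribe_reading_3_to_5(template_5_to_3: str) -> str:
    """Read the template from its 3' end and build the mRNA from 5' to 3'."""
    return "".join(RNA_PAIR[base] for base in reversed(template_5_to_3))


marker_strand = "ATGGCCTAA"                  # strand containing the marker
template = opposite_strand(marker_strand)    # strand actually read: TTAGGCCAT
mrna = transcribe_reading_3_to_5(template)
print(mrna)  # prints AUGGCCUAA
```

The mRNA comes out as a copy of the strand containing the marker, with U in place of T, even though the polymerase never reads that strand.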

Is this perhaps a reason why DNA is double-stranded? Not necessarily. There are other clear advantages of the double-strandedness. Redundancy may be the main one: Any damage that affects one strand may be detected and even corrected by various mechanisms which use the information from the other (hopefully undamaged) strand. (This is in addition to the redundancy which occurs because genes are duplicated when they are found on paired chromosomes.) The double-stranded architecture of DNA has a variety of ramifications, and it's impressive how it all fits together.

Regulation of genes

We have only begun to examine the mechanisms by which the expression of genes may be regulated, in considering how the action of transcription factors enables the beginning of transcription by RNA polymerase. There are a number of other mechanisms, which may operate either earlier or later in the process.

We must note at this point that prokaryotic cells do not have transcription factors, so things are a little simpler. In this case, the RNA polymerase is ready to begin working as soon as it attaches to the promoter. (Even so, a protein called a "sigma factor" assists in this process.) There are, however, proteins -- called "repressors" -- which can bind to the promoter and prevent polymerase from attaching. This is called "negative regulation". But there is yet another type of protein -- called an "inducer" -- that can bind to a repressor and make it let go of the promoter region, thus enabling transcription. This is an example of "positive regulation". The genes that specify such repressors and inducers are thus able to affect the expression of other genes. This mechanism seems best suited to allow prokaryotic cells (which are usually single-cell organisms like bacteria) to change behavior depending on their environment.
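
The repressor and inducer logic just described boils down to a small truth table, sketched here in Python. The function name and the yes-or-no treatment of each molecule are our simplifications. In a real cell these are matters of concentration and degree.

```python
def is_transcribed(repressor_present: bool, inducer_present: bool) -> bool:
    """Toy model: the gene is blocked only while a repressor sits on the promoter."""
    # An inducer binds the repressor and makes it let go of the promoter.
    repressor_on_promoter = repressor_present and not inducer_present
    return not repressor_on_promoter


for repressor in (False, True):
    for inducer in (False, True):
        print(repressor, inducer, is_transcribed(repressor, inducer))
```

Only one of the four combinations, repressor present and inducer absent, keeps the gene switched off.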

Something similar seems to happen in eukaryotes. But in this case there may be multiple regions of the DNA, not necessarily either upstream or close to the affected gene, to which proteins may attach and either facilitate or inhibit transcription of the gene. Such regions of DNA are not (as far as is known) parts of any gene. They are simply noncoding regions called enhancers or silencers, depending on their effect. The proteins which bind to such regions to make them effective are called activators and repressors, respectively. Here again, the genes which code for the activators and repressors are able to affect the expression of other genes, in a potentially multiple-step chain.

There is a totally different mechanism of gene regulation which works in eukaryotes but is entirely absent from prokaryotes. To discuss it, we have to know a few more facts about eukaryotic DNA. Unlike prokaryotic DNA, it does not just occur "loose" within the cell. Instead, the main cellular DNA (as opposed to that found in mitochondria) is contained in the chromosomes.

Since eukaryotic cells are complex (and usually part of even more complex organisms), their DNA needs to contain a lot of nucleotides (although a lot of this seems to be "junk" of no apparent function). The 3 billion base pairs of the human genome would stretch out about 1 meter if laid out straight, and a typical cell carries two copies, or about 2 meters in all. Yet this needs to be packed within a cell nucleus that's only a few millionths of a meter in diameter. And all without getting impossibly tangled up!
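
The length figure is easy to check with a little arithmetic, sketched below in Python. Two numbers here are our assumptions rather than anything stated above: a spacing of about 0.34 nanometers between successive base pairs along the double helix, and a nucleus roughly 6 millionths of a meter across. Under those assumptions one copy of the genome comes to about 1 meter, and the two copies carried by a typical cell to about 2 meters.

```python
RISE_PER_BASE_PAIR_M = 0.34e-9   # assumed spacing between base pairs, in meters
GENOME_BASE_PAIRS = 3e9          # one copy of the human genome
NUCLEUS_DIAMETER_M = 6e-6        # assumed typical nucleus diameter, in meters


def stretched_length_m(base_pairs: float) -> float:
    """Length of a DNA double helix laid out straight, in meters."""
    return base_pairs * RISE_PER_BASE_PAIR_M


one_copy_m = stretched_length_m(GENOME_BASE_PAIRS)
two_copies_m = 2 * one_copy_m
compaction = two_copies_m / NUCLEUS_DIAMETER_M

print(f"one copy: {one_copy_m:.2f} m, two copies: {two_copies_m:.2f} m")
print(f"about {compaction:,.0f} nucleus diameters end to end")
```

With these figures the DNA in one cell is several hundred thousand times longer than the nucleus is wide, which gives some idea of how much packing is needed.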

There is a systematic way this is accomplished. The DNA is wound around a set of proteins called histones. DNA winds twice around each histone, forming what is called a nucleosome. It then winds around another histone, and so on. The histones neutralize negative charges on the DNA sugar-phosphate backbone and allow it to be stored quite compactly. The entire string of nucleosomes is called chromatin, which is the "stuff" of the chromosomes.

As you might suspect, it is not as easy for proteins to get access to any segment of DNA when it's bound up this way as it would be if the DNA were simply floating around naked. Although we are just beginning to understand what's happening here, it appears that there are chemical markers attached to the histones which affect how proteins such as RNA polymerase, transcription factors, activators, and repressors interact with DNA. From our description of the transcription process, it's easy to see that anything which interferes with the interaction between DNA and these proteins is going to affect gene expression.

For example, the attachment of acetyl groups to histone proteins (a process called acetylation) may play a role in turning genes on. Likewise, the attachment of methyl groups to DNA itself ("methylation") seems to have the effect of turning genes off. (Or perhaps, more exactly, it prevents genes that are turned off from being turned back on.) In the process known as "imprinting", this seems (in a few cases) to allow for disabling genes which come specifically from either the mother or father.

Researchers are now speculating that there may be such a thing as a "histone code", analogous to the genetic code of DNA, which dictates gene regulation at a high level. In other words, the occurrence of a specific histone at a specific place may be meaningful in gene regulation.

Exons and introns

And what about mechanisms which affect gene regulation that occur after DNA has been transcribed into mRNA?

Translation of RNA into proteins


So what, in a nutshell, are the most interesting questions in molecular genetics today? We've had to cover a lot of definitions and basic biology in order even to state them with any precision, but here's our refined list:
  1. Out of all the hundreds of thousands of possible genes in the genome of any organism, i. e. sequences following known promoters, is there any reasonably good set of rules that identify the actual functioning genes of the organism?
  2. What is the full set of molecular mechanisms which affect exactly when a given gene will be transcribed into mRNA? I. e., exactly when may genes be expressed and when not?
  3. Is there a code we don't fully understand yet, embedded in histone proteins, which affects gene regulation in eukaryotic cells?
  4. What exactly is the reason for the presence of non-coding DNA (introns) mixed in with the coding regions (exons) of genes in eukaryotes? How and when do different subsets of the mRNA from a single gene become spliced into RNA which can be translated into different proteins?
  5. What, in fact, is the origin or "purpose" of all the "junk" DNA that occurs in eukaryotic genomes?

Expression of genes

Biologists have used the term "gene" since long before it was learned what a gene was at the molecular level. Although we now know far more about the molecular nature of genes, and even have precise sequences for thousands of them, a number of questions still remain.
What identifies the start of a gene?
How are exons/introns within a gene identified?
How does one gene encode multiple proteins?
What determines whether a given gene is expressed or not?
Which of the two strands of DNA within a gene is used for decoding, and how is it selected?
What is the origin and use (if any) of "junk" DNA?

Recommended references: Web sites

Site indexes

Open Directory Project: Molecular Biology
Categorized and annotated links. A version of this list is at Google, with entries sorted in "page rank" order. May also be found at Netscape.
Virtual Library on Genetics
Categorized and annotated list of links.
Genomic and Genetic Resources on the World Wide Web
Directory of Web sites, maintained by NHGRI.
The Virtual Library of Biochemistry and Cell Biology: Genes and Gene Expression
Extensive categorized and annotated list of links.
Links (
Moderate number of selected, well-annotated and categorized links.
Genetics Resources (The DNA Files)
Substantial list of resources, both Web links and other references, organized into many categories and subcategories, with extensive annotations.
Gene Structure and Function Links
Directory of Web sites, from the Usenet bionet.genome.gene-structure news group. Some of the more interesting categories are for educational material and bioinformatics.
Collection of Links Useful for Exploration
Covers sites related to various areas of genetic research. Maintained by the Human Genome Center at the University of Tokyo.
Links for Biomolecular Studies
Maintained by Daisuke Kihara, mainly for researchers.
Galaxy: Genetics
Categorized site directory. Entries usually include descriptive annotations. More here.
Galaxy: Molecular Biology
Categorized site directory. Entries usually include descriptive annotations. Has a subcategory for DNA structure.

Sites with general resources

GenStructure FAQ
Frequently asked questions and general information from the Usenet bionet.genome.gene-structure news group. Contents include recommended reading, and gene structure and function links. (The newsgroup archives are here.)
Double Helix: 50 Years of DNA
A retrospective collection of articles from Nature "celebrating the historical, scientific and cultural impacts of the discovery of the double helix" 50 years after the event.
DNA Anniversary
Very good site from Nature with feature articles, news articles, biographies, and external links related to DNA.
Genetic Science Learning Center
Diverse collection of educational and tutorial material related to genetics. Includes suggested activities, research articles, news, and listing of events.
Biology Project: Molecular Biology
Part of the University of Arizona Biology Project. There are external links and tutorials on topics like molecular genetics of prokaryotes, nucleic acids, and eukaryotic gene expression.
Good general information on genetics and molecular biology, with separate beginner, intermediate, and advanced sections. Also has news and discussion forums, and good external links. Produced by the Wellcome Trust Sanger Institute.
The DNA Files
Home page for a series of 14 public radio documentary programs, produced by (U. S.) National Public Radio. Tapes and transcripts may be ordered. The list of resources for further study is especially good.
Dolan DNA Learning Center
The Learning Center, supported by the Cold Spring Harbor Laboratory, has produced various educational sites, such as DNA Interactive, Your Genes, Your Health, DNA from the Beginning, Genetic Origins and Bioservers.
In Depth: Human Genome
News stories and articles on genetics and the human genome, provided by the BBC.
The RNA World Website
An extensive site supported by the Institute of Molecular Biology in Jena (Germany). Provides many external links and other information on RNA related topics, mostly oriented towards professionals, but with some tutorial and educational resources. Contains some useful references on RNA interference (RNAi).
DNA - Celebrating 50 Years of the Double Helix
A special collection of articles and news stories related to DNA, from New Scientist.
A Revolution at 50
February 25, 2003 articles from the New York Times commemorating the 50th anniversary of the discovery of the structure of DNA. See DNA, the Keeper of Life's Secrets, Starts to Talk for the overview.
A portal to relevant Nature Publishing Group resources in the field of genetics.
Nature Web Focus: The Y Chromosome
Collection of articles, papers, and other materials from Nature related to the Y chromosome.

Surveys, overviews, tutorials

Category: Genetics
Topic category from Wikipedia.
Category: Molecular biology
Topic category from Wikipedia.
Molecular biology
Article from Wikipedia. See also DNA, RNA, Chromosome, Protein,
Molecular Genetics
Extensive hyperlinked textbook by Ulrich Melcher. The text is intended for first year graduate students but useful to a general audience.
Primer on Molecular Genetics
Covers general background information on DNA, genes, and chromosomes, as well as more detailed information on genome mapping and sequencing, and bioinformatics. (The document was originally developed in 1992.)
DNA From the Beginning
An animated primer on the basics of DNA, genes, and heredity. Main sections are classical genetics, molecules of genetics, and genetic organization and control. The site was produced by the Dolan DNA Learning Center.
DNA Interactive
Online interactive presentations of the history of DNA and research into molecular biology and genomes. The site was produced by the Dolan DNA Learning Center.
Damage Control
March 2009 article in The Scientist. "Researchers unlock a treasure trove of information about how cells sense and respond to DNA damage."
What is junk DNA, and what is it worth?
February 2007 Scientific American Ask the Experts article. Good explanation of "junk" DNA.
RNA to the Rescue
May 2005 Scientific American In Focus article, subtitled "Novel inheritance patterns violate Mendel's laws."
Explore DNA and Genetics
Collection of news stories, broadcast transcripts, and articles about DNA and genetics at ABC News (Australia).
Introduction to DNA Structure
Tutorial on the components and structure of DNA, with illustrative graphics.
What is known about the function of introns?
Scientific American page with answers and external links from several experts.
Genius of Junk (DNA)
July 2003 feature story and interview with Malcolm Simons, a pioneer in research into "junk" DNA.


Human Molecular Genetics
Complete online textbook, by Tom Strachan and Andrew P. Read. Index. Part of the NCBI Bookshelf.
An Introduction to Genetic Analysis
Complete online textbook, by Anthony J. F. Griffiths, Jeffrey H. Miller, David T. Suzuki, Richard Lewontin, and William M. Gelbart. Index. Part of the NCBI Bookshelf.
Modern Genetic Analysis
Complete online textbook, by Anthony J. F. Griffiths, Jeffrey H. Miller, Richard Lewontin, and William M. Gelbart. The material is organized to emphasize molecular genetics, compared to the more chronological organization of An Introduction to Genetic Analysis. Index. Part of the NCBI Bookshelf.
Molecular Biology of the Cell
Complete online textbook, by Bruce Alberts, Dennis Bray, Julian Lewis, Martin Raff, Keith Roberts, and James D. Watson. Probably the leading textbook in the field. (But this is not the latest edition.) Index. Part of the NCBI Bookshelf.
Molecular Cell Biology
Complete online textbook, by Harvey Lodish, Arnold Berk, S. Lawrence Zipursky, Paul Matsudaira, David Baltimore, and James E. Darnell. Index. Part of the NCBI Bookshelf.

Recommended references: Magazine/journal articles

Alphabet of Life
Marissa Cevallos
Science News, February 12, 2011
Searching for clues to the genetic code's origin.
Unnatural selection
Laura Bell
Science News, October 9, 2010
Chemists build proteins with parts not in the typical toolkit.
Triple Helix: Designing a New Molecule of Life
Peter E. Nielsen
Scientific American, December 2008
"Peptide nucleic acid, a synthetic hybrid of protein and DNA, could form the basis of a new class of drugs - and of artificial life unlike anything found in nature."
The Human Genome: RNA Machine
John S. Mattick
The Scientist, October 2007
Contrary to current dogma, most of the genome may be functional.
Looking at Variation in Numbers
Josh P. Roberts
The Scientist, May 2005
Nipping at SNP's heels, copy-number polymorphisms gain ground.
The Hidden Genetic Program of Complex Organisms
John S. Mattick
Scientific American, October 2004
The Unseen Genome: Gems among the Junk
W. Wayt Gibbs
Scientific American, November 2003
Biological Dark Matter
John Travis
Science News, January 12, 2002, pp. 24-25
Some of the genes in DNA code directly for sequences of RNA instead of proteins.
Is Life That Simple?
Eli Kintisch
Discover, April 2001, pp. 66-71
The bacterium Mycoplasma genitalium has the smallest known genome -- 470 genes. Research is being performed to determine whether even fewer genes are enough for life.
Code Breakers
Tina Hesman Saey
Science News, June 3, 2000
Scientists are altering bacteria in a most fundamental way.
Ayala Ochert
Discover, December 1999, pp. 59-66
About 95% of the human genome is DNA which is not part of actual genes. Transposons make up nearly half of this "junk DNA". New findings indicate that transposons do actually play a genetic role.
A Gene for Nothing
Robert Sapolsky
Discover, October 1997, pp. 40-46
A very simple overview article that explains in a general way how genes work.
Joan Argetsinger Steitz
Scientific American, June 1988, pp. 56-63
A "snurp" is a small nuclear ribonucleoprotein, one of a class of molecular complexes consisting of RNA and proteins. As a component of a larger structure called a spliceosome, snurps help with the essential task of removing introns from messenger RNA precursors.
The Processing of RNA
James E. Darnell, Jr.
Scientific American, October 1983, pp. 90-100
The process by which genetic information in DNA is converted into proteins by means of three different types of RNA is intricate. Of particular interest is the fact that the DNA information that encodes a protein is not continuous and must be carefully spliced together. There may be clues in this mechanism to the mystery of how the RNA/DNA coding system evolved.

Recommended references: Books

Matt Ridley -- Genome: The Autobiography of a Species in 23 Chapters
HarperCollins Publishers, 1999
Ridley writes good books. Genome is an impressive exposition of the ways that our human genetic material (our "genotype") affects our physical form, both in health and in illness (our "phenotype"). The relationship is sometimes direct, but more usually it isn't, in which case it is all the more important to know the mechanism through which it works. The book is organized into 23 chapters, one for each chromosome. Each chapter treats a single topic -- such as intelligence, conflict, disease, memory, sex, and death -- according to the presence on the corresponding chromosome of a gene thought to be relevant to the topic.
Maxim D. Frank-Kamenetskii -- Unraveling DNA: The Most Important Molecule of Life
Perseus Books, 1997
This is an efficient but inclusive presentation of many aspects of the structure and biological function of DNA. Topics include the way in which information is encoded in and transcribed from DNA, forms that DNA may take in addition to the usual double helix (circles, knots), and genetic engineering.
Karl Drlica -- Understanding DNA and Gene Cloning: A Guide for the Curious
John Wiley & Sons, 1997
Drlica provides a solid, technical -- but not overly detailed -- introduction to the subject of molecular biology and genetic technology. It's refreshingly free of the fluff present in many such books that are accessible to a general audience. Topics include the structure and replication of DNA, gene expression, manipulation of DNA in the laboratory, gene cloning, and applications of genetic technology.
Robert Pollack -- Signs of Life: The Language and Meanings of DNA
Houghton Mifflin Company, 1994
The author presents a fairly elementary account of the nature and function of the DNA molecule. A great deal of the book deals with philosophical and even political speculation about the uses to which knowledge of DNA might be put.
Christopher Wills -- Exons, Introns, and Talking Genes: The Science Behind the Human Genome Project
Basic Books, 1991
Wills' book appeared near the beginning of the Human Genome Project. The Project, as such, is now history, and it has accumulated information that will take decades to digest. Nevertheless, its outcome has not yet revolutionized the fundamentals that were understood at the outset, which are ably presented in this volume. The book remains worthwhile reading for an understanding of the underlying science, as well as of the basic questions -- many of which are still unanswered.


Copyright © 2002-04 by Charles Daney, All Rights Reserved