Open Questions: Protein Chemistry and Biology

Prerequisites: Molecular biology and genetics

See also: Proteomics

Introduction

Protein chemistry

Protein folding

Structural biology

Synthesizing proteins

Glycosylation

Enzymes


Recommended references: Web sites

Recommended references: Magazine/journal articles

Recommended references: Books

Introduction

Proteins are the "stuff" that living things are made of. No deep understanding of biology can occur without an understanding of proteins.

They are, if anything, more important even than DNA and RNA -- which are the blueprints for making proteins. However, life as we know it is not even conceivable without both proteins and nucleic acids. The simplest forms of life -- viruses (treating them for the sake of discussion as "living") -- consist of nothing but protein and nucleic acid.

Schoolchildren learn that "proteins" are an essential component of food, among others such as carbohydrates, fats, etc. Interestingly enough, however, proteins consumed as food are usually broken down by the process of digestion into their constituent amino acids. They are seldom used by the body directly. Instead, the amino acids are the genuinely important nutrients. All the proteins which a living body actually needs -- which may be as many as a million in humans -- are manufactured within cells under direction from the genes.

Just to show the diversity of functions that proteins have in living cells, here is a partial list of the principal ones:


Protein chemistry

What exactly is a protein, chemically? It is a polymer consisting of multiple amino acids linked together by "peptide bonds". An amino acid, in turn, is a relatively small molecule consisting of a central carbon atom, to which is attached one hydrogen atom, a carboxyl group (COO-), an amino group (NH3+), and one other variable group called a "radical" or "side chain". (One biological amino acid, proline, is a slight exception.)

The peptide bond forms between the carboxyl group of one amino acid and the amino group of another (expelling one water molecule in the process). Proteins may contain upwards of 1500 amino acids linked in this way. A generic name for such a chain is a "polypeptide". Shorter chains, of fewer than about 40 amino acids, are conventionally referred to as "peptides" rather than proteins. Peptides often occur as hormones and neurotransmitters. In addition, they can usually be synthesized chemically in the laboratory, whereas biologically active proteins usually can't.
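The loss of one water molecule per peptide bond can be illustrated with a little arithmetic. The following is only a sketch: the masses are approximate average molar masses for a handful of free amino acids, and the names FREE_AMINO_ACID_MASS, peptide_bonds and peptide_mass are invented here for illustration.

```python
# Approximate average molar masses (g/mol) of a few free amino acids.
# These rounded values are assumptions for illustration only.
FREE_AMINO_ACID_MASS = {
    "G": 75.07,   # glycine
    "A": 89.09,   # alanine
    "S": 105.09,  # serine
    "V": 117.15,  # valine
    "L": 131.17,  # leucine
}
WATER_MASS = 18.015  # g/mol, one molecule is expelled per peptide bond


def peptide_bonds(sequence: str) -> int:
    """A linear chain of n amino acids contains n - 1 peptide bonds."""
    return max(len(sequence) - 1, 0)


def peptide_mass(sequence: str) -> float:
    """Sum of the free amino acid masses minus one water per peptide bond."""
    total = sum(FREE_AMINO_ACID_MASS[residue] for residue in sequence)
    return total - peptide_bonds(sequence) * WATER_MASS


if __name__ == "__main__":
    for seq in ("G", "GA", "GASVL"):
        print(seq, peptide_bonds(seq), round(peptide_mass(seq), 1))
```

For the two-residue chain "GA" (glycine joined to alanine) this gives roughly 146.1 g/mol, which is the sum of the two free amino acids less one water molecule.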

The side chain of an amino acid can be as simple as a single hydrogen atom, yielding "glycine". Glycine therefore has two identical hydrogen atoms attached to the central carbon, and so has no handedness. Except for glycine, all amino acids can have their four parts arranged around the carbon in one of two different ways -- clockwise or counter-clockwise. This leads to the circumstance that each amino acid can exist in two forms, known as "stereo-isomers" or "enantiomers", which have the same chemical composition but are mirror images that cannot be superimposed on each other. This handedness property is known as "chirality". In solution, the two enantiomers rotate the plane of polarized light passing through in opposite directions.

But more importantly, all amino acids that occur naturally in proteins are of one type rather than the other. These types are designated as L- or D- enantiomers (from Latin laevus, "left", and dexter, "right"). The L-form is the one that occurs naturally in proteins. The D-forms are not built into proteins by the normal protein-making machinery of cells. When D-amino acids or polypeptides based on them are synthesized chemically, they are generally not used by living cells. It is a rather mysterious open question as to how this asymmetry happened to develop as it did in all forms of life.

Another unanswered question is why only 20 different amino acids -- out of a large number that are chemically possible -- actually occur in proteins. Undoubtedly these choices reflect chemical conditions which existed when protein-based life first emerged. We just don't happen to know what those conditions were.


Protein folding

The work that any given protein performs in a cell is determined mostly by its physical shape rather than its chemical composition. But the chemical composition does matter, indirectly, since it controls the shape. The shape is all-important, because chemical reactions and other important processes actually happen when other molecules attach to proteins, or vice versa. And whether or not a protein and another molecule can attach to each other depends on whether they have the right shape to fit together. If you know a little about "nanotechnology", it is clear that proteins are the original nano-machines.

Since any given protein is a linear sequence of specific amino acids, the protein is chemically defined by the sequence. This sequence is known as the "primary structure" of the protein. Most proteins, however, do not actually have a linear shape. They tend to fold up in very complex ways into compact forms, which is the secret of their biological activity. (A few structural proteins that are fibrous, such as the keratin in hair, are mostly linear.)

It turns out that most proteins fold up into their effective 3-dimensional shapes in just a single way. That is, the amino acid sequence usually determines the shape, and hence the function, of the protein. (And the sequence was itself determined by the sequence of nucleotides in the gene which ultimately specified the protein.)
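The chain of determination just described, from nucleotide sequence to amino acid sequence, can be sketched in a few lines of code. This is only an illustration under simplifying assumptions: it uses the standard genetic code, reads the coding strand of the DNA in steps of three bases starting at the first base, and ignores everything else a real cell does. The name translate and the example sequence are made up for the purpose.

```python
from itertools import product

# Standard genetic code, written in the conventional order in which
# each of the three codon positions cycles through T, C, A, G.
# One-letter amino acid codes are used; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence into an amino acid sequence.

    Reading starts at the first base and stops at the first stop codon
    or when fewer than three bases remain.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    # ATG = methionine, GCC = alanine, TGG = tryptophan, TAA = stop
    print(translate("ATGGCCTGGTAA"))  # prints "MAW"
```

The output is the primary structure only. Nothing in this calculation says anything about how the resulting chain folds, which is the hard part.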

The few cases where more than one shape may result from a given amino acid sequence can cause all sorts of havoc. This circumstance is exactly what gives rise to so-called "prion diseases", such as the famous mad cow disease. What happens here is that a very few proteins which can assume an alternate form are also capable of converting other proteins of the same kind to the alternate form, destroying whatever functionality they may have had.

Anyhow, there are three more levels of structure beyond the primary structure. The second level -- the "secondary structure" -- consists of the "local" shapes assumed by portions of the protein. Certain shapes occur commonly enough that they have names such as the "alpha helix" and the "beta-pleated sheet".

The way that the various components of the secondary structure fold together into some definite configuration relative to each other creates the "tertiary structure". The various bumps and pockets thus formed in the protein molecule become the main way that other molecules attach to the protein, or vice versa.

There is, lastly, a "quaternary structure". This comes about in many proteins which actually consist of two or more polypeptide chains (subunits). Each of these assumes some tertiary structure, and then chemical bonds or weaker attractive forces between certain locations hold the subunits together.

As it happens, most complete polypeptide chains will spontaneously assume just one tertiary structure determined by the amino acid sequence (the primary structure). This 3-dimensional shape generally represents the lowest energy configuration for the sequence. At least, this is what has long been thought. It now appears that many, perhaps most, proteins may have alternative tertiary structures. This seems to be somewhat of a murky issue now.

In any case, whether there is just one tertiary structure or several that are possible, the shape or shapes which can result should in principle be computable knowing just the primary structure. But in our present state of computational technology, actually performing this calculation from first principles for largish proteins is enormously difficult -- mostly beyond the capabilities of our most powerful existing supercomputers, even in the "terascale" class. Finding some more effective techniques or algorithms for doing this computation is known as the protein folding problem.
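A rough sense of why brute force is hopeless comes from a back-of-envelope count of possible shapes. The sketch below is not how any real folding program works. It assumes, purely for illustration, that each amino acid in the chain can take one of 3 local conformations and that a "terascale" machine could check 10^12 complete shapes per second, which is generous. The function names are invented here, and logarithms are used because the raw numbers are too large to handle comfortably.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_conformations(n_residues: int, states_per_residue: int = 3) -> float:
    """log10 of states_per_residue ** n_residues, the number of chain shapes
    under the (assumed) model of independent choices per residue."""
    return n_residues * math.log10(states_per_residue)


def log10_years_to_enumerate(
    n_residues: int,
    states_per_residue: int = 3,
    shapes_per_second: float = 1e12,
) -> float:
    """log10 of the years needed to examine every shape exactly once."""
    return (
        log10_conformations(n_residues, states_per_residue)
        - math.log10(shapes_per_second)
        - math.log10(SECONDS_PER_YEAR)
    )


if __name__ == "__main__":
    for n in (20, 100, 1500):
        print(
            f"{n} residues: about 10^{log10_conformations(n):.0f} shapes, "
            f"about 10^{log10_years_to_enumerate(n):.0f} years to check them all"
        )
```

Under these assumptions a chain of only 100 amino acids has on the order of 10^47 possible shapes, and checking them one by one would take on the order of 10^28 years. Any practical method therefore has to avoid exhaustive search, which is why better techniques and algorithms are the heart of the problem.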

There's one more wrinkle to this folding issue which might be noted. All the observations on the general uniqueness of the 3-dimensional structure for a particular protein assume that the polypeptide already exists as a complete entity. But of course, during the synthesis of proteins in the ribosomes of cells, each polypeptide chain is only partially complete until the very end. What prevents these chains from folding in an inappropriate way when they are, say, only half complete? The answer to this is partially known. It turns out that there are specialized proteins called "chaperones", appropriately enough, which assist in the process to ensure that the partially completed polypeptides don't fold up the "wrong" way.


Synthesizing proteins

Chemical synthesis of small peptides in the laboratory is no big deal. The longer the chain the more work is involved, however. A technique called "solid phase chemistry" allows for automated production of proteins over 100 amino acids in length. Unfortunately, many proteins, especially the most interesting ones, can be 10 times that size, or larger.

But there is another problem, even for relatively small proteins. Some simple proteins, such as blood serum albumin, really are composed of nothing but amino acids. Unfortunately, most proteins actually consist of more than just amino acids. These are called "conjugated" proteins, and they are decorated with a variety of other molecules -- like sugars, fats, or nucleic acids. Somehow, cells know how to do this in just the right way. Protein chemists don't know how to do this, yet.

Genetic engineering of bacteria theoretically makes possible the production of at least smaller proteins, by inserting the appropriate DNA into the bacterial genome. (There's a limit on how much DNA can be inserted, limiting the size of proteins that can be produced.) Unfortunately, there are glitches even here. It appears that mammalian cells (including human ones) attach different sugars to their proteins than bacteria do. (The process is called "glycosylation".) They may fold the proteins differently. And they can make larger proteins than bacteria.

The net result is that, even with genetic engineering of other organisms, it isn't at all easy to manufacture proteins. The best we've done so far is to learn how to use other mammalian cells (such as those of Chinese hamster ovaries) to make proteins acceptable for humans.


Glycosylation


Enzymes



Recommended references: Web sites

Site indexes

The Virtual Library of Biochemistry and Cell Biology: Proteins: Biogenesis to Degradation
Extensive categorized and annotated list of links.

Sites with general resources

Blue Gene: A vision for protein science using a petaflop supercomputer
2001 article from the IBM Systems Journal about the Blue Gene computer project, with a great deal of information about computational approaches to protein folding. More recent information can be found at the Blue Gene page.
The Baker Laboratory Home Page
Research group of David Baker, the developer of the Rosetta software. Contains information on the group's research, publications, and members. The laboratory operates a distributed computation project called Rosetta@home.
Folding@home
Cooperative distributed computing project to study protein folding.
The Human Proteome Folding Project
A distributed computing project of Grid.org whose objective is to compute the structure of all human proteins in about a year.

Surveys, overviews, tutorials

Protein
Article from Wikipedia. See also Amino acid, Peptide bond, Protein folding.
Protein Folding Background Reading
Four articles providing background information for a symposium on protein folding. See also the list of key questions.
Structural biology: Breaking the protein rules
March 2011 Nature News Feature, about intrinsically disordered proteins. "If dogma dictates that proteins need a structure to function, then why do so many of them live in a state of disorder?"
The Protein Tango
August 2009 article in The Scientist. "Researchers unravel the complexities of coupled protein binding and folding and lead others towards new drug targets."
Lively proteins move and shake
May 2001 news article from Physics World, about the movement of protein side chains.
Molecular Modeling: A Method for Unraveling Protein Structure and Function
General information about the science of protein structure, from the NCBI Science Primer.
Unraveling the Mystery of Protein Folding
Good article (PDF format) for a general audience by W. A. (Bill) Thomasson. Explains details such as fundamental patterns of protein structure and the role of molecular chaperones in folding.
Physicists Take on Challenge Of Showing How Proteins Fold
October 1998 article from The Scientist, by Steve Bunk.
The Bridge from Genes to Proteins
July 1998 article by Charles L. Brooks III with good overview information on the folding problem.
Rosetta Tackles the Extreme Origami of Protein Folding
Article from the July 2001 issue of the Howard Hughes Medical Institute Bulletin. "Rosetta" is a computational technique for computing protein folding developed by HHMI investigator David Baker.
Gene Machine
Wired magazine article about IBM's "Blue Gene" supercomputer project.
Protein Structure Basics
Good single-page overview.


Recommended references: Magazine/journal articles

Shock and Age
Richard Morimoto
The Scientist, June 2010
The accumulation of misfolded protein marks the accrual of years as the body ages. Could heat shock proteins be used to reduce the effects of aging and diminish the risk of disease by untangling improperly folded proteins?
Proteins by Design
David Baker
The Scientist, July 2006
New functional proteins are being built on advances in modeling and structure prediction.
Gene Machine
Oliver Morton
Wired, July 2001, pp. 148-159
Computing the structure of proteins is a daunting challenge. IBM's "Blue Gene" project is developing a petaflop (1000 teraflop) computer to attack the problem by 2004.


Recommended references: Books


Copyright © 2002 by Charles Daney, All Rights Reserved