Most genes contain the information needed to make functional molecules called proteins. (A few genes produce other molecules that help the cell assemble proteins.) The journey from gene to protein is complex and tightly controlled within each cell.
It consists of two major steps: transcription and translation. Together, transcription and translation are known as gene expression.
During the process of transcription, the information stored in a gene’s DNA is transferred to a similar molecule called RNA (ribonucleic acid) in the cell nucleus. Both RNA and DNA are made up of a chain of nucleotide bases, but they have slightly different chemical properties.
The type of RNA that contains the information for making a protein is called messenger RNA (mRNA) because it carries the information, or message, from the DNA out of the nucleus into the cytoplasm.
Translation, the second step in getting from a gene to a protein, takes place in the cytoplasm. The mRNA interacts with a specialized complex called a ribosome, which “reads” the sequence of mRNA bases. Each sequence of three bases, called a codon, usually codes for one particular amino acid. (Amino acids are the building blocks of proteins.)
A type of RNA called transfer RNA (tRNA) delivers the amino acids to the ribosome, which joins them into a protein one amino acid at a time. Protein assembly continues until the ribosome encounters a “stop” codon (a sequence of three bases that does not code for an amino acid).
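The codon-by-codon reading described above can be sketched in a few lines of Python. This is only an illustration, not a model of the ribosome: the function name `translate` is hypothetical, the table is the standard genetic code written in the compact UCAG ordering, and starting at the first AUG is a simplifying assumption.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# One-letter amino acid codes; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna: str) -> str:
    """Read codons from the first AUG until a stop codon (illustrative only)."""
    start = mrna.find("AUG")  # assumption: translation begins at the first AUG
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: no amino acid is added
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGGCCUGGUAAGG"))  # MAW
```

In the toy sequence, the codons AUG, GCC and UGG give methionine (M), alanine (A) and tryptophan (W), and UAA ends the chain.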
The flow of information from DNA to RNA to proteins is one of the fundamental principles of molecular biology. It is so important that it is sometimes called the “central dogma.”
Several steps in the gene expression process may be modulated, including transcription, RNA splicing, translation, and post-translational modification of the protein.
Gene regulation gives the cell control over structure and function, and is the basis for cellular differentiation, morphogenesis and the versatility and adaptability of any organism. Gene regulation may also serve as a substrate for evolutionary change, since control of the timing, location, and amount of gene expression can have a profound effect on the functions (actions) of the gene in a cell or in a multicellular organism.
Transcription
A gene is a stretch of DNA that encodes information. Genomic DNA consists of two antiparallel and reverse complementary strands, each having 5’ and 3’ ends. With respect to a gene, the two strands may be labeled the “template strand,” which serves as a blueprint for the production of an RNA transcript, and the “coding strand,” which includes the DNA version of the transcript sequence.
The production of RNA copies of the DNA is called transcription, and is performed in the nucleus by RNA polymerase, which adds one RNA nucleotide at a time to a growing RNA strand. This RNA is complementary to the template 3’ → 5’ DNA strand, which is itself complementary to the coding 5’ → 3’ DNA strand.
Therefore, the resulting 5’ → 3’ RNA strand is identical to the coding DNA strand with the exception that thymines (T) are replaced with uracils (U) in the RNA. A coding DNA strand reading “ATG” is indirectly transcribed through the non-coding strand as “AUG” in RNA.
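The relationship between the coding strand, the template strand and the RNA can be checked with a short Python sketch. The function names `template_from_coding` and `transcribe` are hypothetical, and the example sequence is a toy, not a real gene.

```python
# Base-pairing rules: DNA with DNA, and DNA template with new RNA.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def template_from_coding(coding_5to3: str) -> str:
    """Return the template strand, written 3'->5' so it lines up
    base by base under the coding strand."""
    return "".join(DNA_TO_DNA[base] for base in coding_5to3)

def transcribe(template_3to5: str) -> str:
    """Build the RNA 5'->3' by pairing an RNA base with each template base."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)

coding = "ATGGCCTAA"                      # toy coding strand, 5'->3'
template = template_from_coding(coding)   # TACCGGATT, 3'->5'
rna = transcribe(template)                # AUGGCCUAA, 5'->3'

# The RNA matches the coding strand with every T replaced by U.
assert rna == coding.replace("T", "U")
print(template, rna)
```

Going through the template strand twice inverts the base pairing, which is why the RNA ends up with the same sequence as the coding strand.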
Transcription in prokaryotes is carried out by a single type of RNA polymerase, which needs a DNA sequence called a Pribnow box as well as a sigma factor (σ factor) to start transcription. In eukaryotes, transcription is performed by three types of RNA polymerases, each of which needs a special DNA sequence called the promoter and a set of DNA-binding proteins—transcription factors—to initiate the process.
RNA polymerase I is responsible for transcription of ribosomal RNA (rRNA) genes. RNA polymerase II (Pol II) transcribes all protein-coding genes as well as some non-coding RNAs (e.g., snRNAs, snoRNAs or long non-coding RNAs). Pol II includes a C-terminal domain (CTD) that is rich in serine residues.
When these residues are phosphorylated, the CTD binds to various protein factors that promote transcript maturation and modification. RNA polymerase III transcribes 5S rRNA, transfer RNA (tRNA) genes, and some small non-coding RNAs (e.g., 7SK). Transcription ends when the polymerase encounters a sequence called the terminator.
Translation
For some RNA (non-coding RNA) the mature RNA is the final gene product. In the case of messenger RNA (mRNA) the RNA is an information carrier coding for the synthesis of one or more proteins. mRNA carrying a single protein sequence (common in eukaryotes) is monocistronic whilst mRNA carrying multiple protein sequences (common in prokaryotes) is known as polycistronic.
Every mRNA consists of three parts: a 5’ untranslated region (5’UTR), a protein-coding region or open reading frame (ORF), and a 3’ untranslated region (3’UTR). The coding region carries the information for protein synthesis, written in the genetic code as a series of nucleotide triplets.
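As a rough illustration of the three-part layout, the Python sketch below cuts a toy mRNA into its 5’UTR, ORF and 3’UTR. The function name `split_mrna` is hypothetical, and two simplifications are assumed: the ORF starts at the first AUG, and the stop codon is counted as part of the ORF.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def split_mrna(mrna: str):
    """Return (5'UTR, ORF, 3'UTR), or None if no complete ORF is found."""
    start = mrna.find("AUG")  # assumption: the ORF begins at the first AUG
    if start == -1:
        return None
    # Step through the sequence one triplet at a time, in frame with the AUG.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon is kept inside the ORF here
            return mrna[:start], mrna[start:end], mrna[end:]
    return None

print(split_mrna("GGAUGGCCUGGUAAGG"))
# ('GG', 'AUGGCCUGGUAA', 'GG')
```

Real untranslated regions are far longer than the two bases used here and carry regulatory signals that this sketch ignores.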
Each triplet of nucleotides of the coding region is called a codon and corresponds to a binding site complementary to an anticodon triplet in transfer RNA. Transfer RNAs with the same anticodon sequence always carry an identical type of amino acid.
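Codon and anticodon pair in antiparallel fashion, so an anticodon written 5’ → 3’ is the reverse complement of its codon. A minimal Python sketch, assuming strict Watson-Crick pairing only (real tRNAs also tolerate some "wobble" at the third codon position, which is ignored here); the function name `anticodon_for` is hypothetical.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def anticodon_for(codon: str) -> str:
    """Return the perfectly matching anticodon, written 5'->3'."""
    # Reverse first (antiparallel pairing), then complement each base.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon))

print(anticodon_for("AUG"))  # CAU
print(anticodon_for("GCC"))  # GGC
```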
Amino acids are then chained together by the ribosome according to the order of triplets in the coding region. The ribosome helps transfer RNA bind to messenger RNA, takes the amino acid from each transfer RNA, and joins the amino acids into a chain that has not yet folded into its final structure.
Each mRNA molecule is translated into many protein molecules, on average ~2800 in mammals.
In prokaryotes translation generally occurs at the point of transcription (co-transcriptionally), often using a messenger RNA that is still in the process of being created. In eukaryotes translation can occur in a variety of regions of the cell, depending on where the protein being made is needed.
Major locations are the cytoplasm for soluble cytoplasmic proteins and the membrane of the endoplasmic reticulum for proteins destined for export from the cell or insertion into a cell membrane.
Proteins destined for the endoplasmic reticulum are recognised part-way through the translation process. This is governed by the signal recognition particle, a protein-RNA complex that binds to the ribosome and directs it to the endoplasmic reticulum when it finds a signal peptide on the growing (nascent) amino acid chain.
MicroRNA
Recently, molecules called microRNAs have been found in organisms as diverse as plants, worms and people. The molecules are truly “micro,” consisting of only a few dozen nucleotides, compared to typical human mRNAs that are a few thousand nucleotides long.
What’s particularly interesting about microRNAs is that many of them arise from DNA that used to be considered merely filler material.
How do these small but important RNA molecules do their work?
They start out much bigger but get trimmed by cellular enzymes, including one aptly named Dicer. Like tiny pieces of Velcro®, microRNAs stick to certain mRNA molecules and stop them from passing on their protein-making instructions.
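The "Velcro" works by base pairing: a microRNA sticks where part of its sequence is complementary to the mRNA. The Python sketch below searches a toy mRNA for sites that match a microRNA's "seed", taken here as nucleotides 2 to 8, a common convention for animal microRNAs and an assumption of this example. The function names and both sequences are hypothetical, and real target recognition involves more than an exact seed match.

```python
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Return the sequence that would pair with `rna`, written 5'->3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(rna))

def find_seed_sites(mirna: str, mrna: str) -> list:
    """List start positions (0-based) in `mrna` that pair with the seed."""
    seed = mirna[1:8]                  # nucleotides 2-8 of the microRNA
    target = reverse_complement(seed)  # what the mRNA must contain
    sites = []
    position = mrna.find(target)
    while position != -1:
        sites.append(position)
        position = mrna.find(target, position + 1)
    return sites

toy_mirna = "UGAGGUAGUAGG"   # toy sequence for illustration
toy_mrna = "AAACUACCUCAAA"   # toy sequence containing one matching site
print(find_seed_sites(toy_mirna, toy_mrna))  # [3]
```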
MicroRNAs were first discovered in a roundworm model system, where some of them help determine the organism’s body plan. In their absence, development can go badly wrong.
For example, worms engineered to lack a microRNA called let-7 develop so abnormally that they often rupture and practically break in half as they grow.
Perhaps it is not surprising that since microRNAs help specify the timing of an organism’s developmental plan, the appearance of the microRNAs themselves is carefully timed inside a developing organism. Biologists, including Amy Pasquinelli of the University of California, San Diego, are currently figuring out how microRNAs are made and cut to size, as well as how they are produced at the proper time during development.
MicroRNA molecules also have been linked to cancer. For example, Gregory Hannon of the Cold Spring Harbor Laboratory on Long Island, New York, found that certain microRNAs are associated with the severity of the blood cancer B-cell lymphoma in mice.
Since the discovery of microRNAs in the first years of the 21st century, scientists have identified hundreds of them that likely exist as part of a large family with similar nucleotide sequences. New roles for these molecules are still being found.