The Molecular Biology Notebook Online
A Beginners' Guide to Molecular Biology




The genetic code


Once it was established that DNA carries the genetic information, the next question was how that information is coded.

A simple counting argument shows that each amino acid must be coded by a combination of at least three nucleotides:

  • If singlets coded for amino acids, with one nucleotide corresponding to one amino acid, then the four bases of RNA could code for only four amino acids.

  • If doublets (pairs of nucleotides) coded for amino acids, there would be only 16 possible combinations of the four nucleotides, so this type of code could specify at most 16 amino acids, which is still too few.

  • If triplets coded for amino acids, there would be 64 possible combinations, more than enough to code for the twenty amino acids.

    The table below lists all the possible codons for a code that uses one, two or three nucleotides at a time.

    Singlet code: four possible combinations
        A    U    C    G

    Doublet code: 16 possible combinations
        AA   AG   AC   AU
        UA   UG   UC   UU
        CA   CG   CC   CU
        GA   GG   GC   GU

    Triplet code: 64 possible combinations
        AAA  AAU  AAC  AAG
        AUA  AUU  AUC  AUG
        ACA  ACU  ACC  ACG
        AGA  AGU  AGC  AGG
        UAA  UAU  UAC  UAG
        UUA  UUU  UUC  UUG
        UCA  UCU  UCC  UCG
        UGA  UGU  UGC  UGG
        CAA  CAU  CAC  CAG
        CUA  CUU  CUC  CUG
        CCA  CCU  CCC  CCG
        CGA  CGU  CGC  CGG
        GAA  GAU  GAC  GAG
        GUA  GUU  GUC  GUG
        GCA  GCU  GCC  GCG
        GGA  GGU  GGC  GGG
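The counts in the table follow from simple arithmetic: with four bases, a code that uses n nucleotides at a time has 4^n possible code words. The short Python sketch below (the function name `code_words` is our own, not from any library) enumerates every word for n = 1, 2 and 3 and checks each count against the twenty amino acids.

```python
from itertools import product

BASES = "AUCG"      # the four RNA bases
AMINO_ACIDS = 20    # number of standard amino acids to be encoded


def code_words(length):
    """Return every possible string of `length` bases (4 ** length of them)."""
    return ["".join(combo) for combo in product(BASES, repeat=length)]


for length, name in [(1, "Singlet"), (2, "Doublet"), (3, "Triplet")]:
    count = len(code_words(length))
    # Only a code with at least 20 distinct words can cover every amino acid.
    print(f"{name} code: {count} combinations, "
          f"enough for {AMINO_ACIDS} amino acids: {count >= AMINO_ACIDS}")
```

Only the triplet code passes the check, which is the reasoning behind the three-letter hypothesis.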


    The hypothesis of a three-letter code must be tested.

    In 1961, Francis Crick and his colleagues provided experimental evidence for this hypothesis. Working with bacteriophage T4, a virus that infects the bacterium Escherichia coli, they studied mutations that added or deleted nucleotides in a specific gene. Adding or deleting one or two bases caused the gene to be misread and resulted in abnormal phenotypes. But when three bases were added or deleted, the resulting phenotype was normal or almost normal.
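Why should adding three bases restore function when adding one does not? The toy Python sketch below illustrates the idea. It reads a made-up mRNA in consecutive triplets from its first base, using the standard genetic code written in one-letter amino acid codes, and compares inserting one base with inserting three at the same place. The sequence and the names `gene`, `plus_one`, `plus_three` and `translate` are hypothetical, chosen only for illustration; this is not a model of the original phage experiments.

```python
BASES = "UCAG"
# Standard genetic code in one-letter amino acid codes, listed in U, C, A, G
# order for the first, second and third base; "*" marks a stop codon.
PACKED = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: PACKED[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna):
    """Read mRNA three bases at a time from the first base; stop at a stop codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        residue = CODE[mrna[start:start + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


gene = "AUGAAACCCUUUGGGCAUUGGUAA"              # hypothetical 'normal' gene
plus_one = gene[:4] + "G" + gene[4:]           # one extra base after the 4th
plus_three = gene[:4] + "GGC" + gene[4:]       # three extra bases at the same spot

print(translate(gene))        # MKPFGHW
print(translate(plus_one))    # MRTLWALV
print(translate(plus_three))  # MRQPFGHW
```

Inserting one base shifts the reading frame, so every codon after the insertion is misread. Inserting three bases changes only the codons at the insertion site, and the rest of the protein (PFGHW) is read correctly.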


    The code must be deciphered.

    The code was eventually broken by adding synthetic messenger RNA to a cell-free enzymatic protein synthesis system and analysing the polypeptides produced. The first synthetic RNAs were each made of a single type of nucleotide, so that one triplet was repeated many times: AAA-AAA-AAA-AAA-AAA coded for a polypeptide made only of lysine (Lys-Lys-Lys...), CCC-CCC-CCC for proline (Pro-Pro-Pro...), and UUU-UUU-UUU for phenylalanine (Phe-Phe-Phe...). Subsequently, synthetic RNAs with various other known sequences of nucleotides were produced and tested in the same way.
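The logic of these experiments can be mimicked in a few lines of Python. The sketch below is not a model of the cell-free system; it simply reads a synthetic RNA in consecutive triplets from its first base, using a small lookup table that contains only the codons needed here (a hypothetical subset copied from the genetic code table). The last call uses a repeating UC sequence as an example of an RNA with a known, non-uniform sequence: read in triplets, it alternates between UCU (Ser) and CUC (Leu).

```python
# Only the codons needed for this sketch, copied from the genetic code table.
CODONS = {"AAA": "Lys", "CCC": "Pro", "UUU": "Phe", "UCU": "Ser", "CUC": "Leu"}


def translate(rna):
    """Read a synthetic RNA in consecutive triplets, starting at its first base."""
    usable = len(rna) - len(rna) % 3   # ignore a trailing partial codon
    return "-".join(CODONS[rna[i:i + 3]] for i in range(0, usable, 3))


print(translate("A" * 15))    # Lys-Lys-Lys-Lys-Lys
print(translate("C" * 9))     # Pro-Pro-Pro
print(translate("U" * 9))     # Phe-Phe-Phe
print(translate("UC" * 9))    # Ser-Leu-Ser-Leu-Ser-Leu
```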



    The genetic code

                                 Second base
    First base    U           C           A           G           Third base
    U             UUU: Phe    UCU: Ser    UAU: Tyr    UGU: Cys    U
                  UUC: Phe    UCC: Ser    UAC: Tyr    UGC: Cys    C
                  UUA: Leu    UCA: Ser    UAA: Stop   UGA: Stop   A
                  UUG: Leu    UCG: Ser    UAG: Stop   UGG: Trp    G

    C             CUU: Leu    CCU: Pro    CAU: His    CGU: Arg    U
                  CUC: Leu    CCC: Pro    CAC: His    CGC: Arg    C
                  CUA: Leu    CCA: Pro    CAA: Gln    CGA: Arg    A
                  CUG: Leu    CCG: Pro    CAG: Gln    CGG: Arg    G

    A             AUU: Ile    ACU: Thr    AAU: Asn    AGU: Ser    U
                  AUC: Ile    ACC: Thr    AAC: Asn    AGC: Ser    C
                  AUA: Ile    ACA: Thr    AAA: Lys    AGA: Arg    A
                  AUG: Met    ACG: Thr    AAG: Lys    AGG: Arg    G

    G             GUU: Val    GCU: Ala    GAU: Asp    GGU: Gly    U
                  GUC: Val    GCC: Ala    GAC: Asp    GGC: Gly    C
                  GUA: Val    GCA: Ala    GAA: Glu    GGA: Gly    A
                  GUG: Val    GCG: Ala    GAG: Glu    GGG: Gly    G
    Ala: Alanine
    Arg: Arginine
    Asn: Asparagine
    Asp: Aspartic acid
    Cys: Cysteine
    Gln: Glutamine
    Glu: Glutamic acid
    Gly: Glycine
    His: Histidine
    Ile: Isoleucine
    Leu: Leucine
    Lys: Lysine
    Met: Methionine
    Phe: Phenylalanine
    Pro: Proline
    Ser: Serine
    Thr: Threonine
    Trp: Tryptophan
    Tyr: Tyrosine
    Val: Valine
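In computational work, a table like this one is usually stored as a lookup from codon to amino acid. The Python sketch below builds such a dictionary compactly: the amino acids for all 64 codons are packed into one string in U, C, A, G order for the first, second and third base, the same order as the table, and then expanded to the three-letter abbreviations listed above. The names `GENETIC_CODE` and `translate` are our own, and the mRNA in the usage line is made up.

```python
BASES = "UCAG"
# One-letter amino acid for each codon in U, C, A, G order; "*" marks a stop.
PACKED = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}

# Codon -> three-letter amino acid (or "Stop"), one entry per cell of the table.
GENETIC_CODE = {
    first + second + third: THREE_LETTER[PACKED[16 * i + 4 * j + k]]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna):
    """Translate mRNA codon by codon from the first base, stopping at a stop codon."""
    residues = []
    for i in range(0, len(mrna) - 2, 3):
        residue = GENETIC_CODE[mrna[i:i + 3]]
        if residue == "Stop":
            break
        residues.append(residue)
    return "-".join(residues)


print(translate("AUGUUUGGCUAAGCA"))  # Met-Phe-Gly
```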



    The code is degenerate: more than one codon may code for the same amino acid (for example, UUU and UUC both code for phenylalanine).

    The code is (almost) universal: any particular codon represents the same amino acid in bacteria, plants, fungi and animals, with only a few minor exceptions (for example in mitochondria and some bacteria).

    Only two amino acids are coded by a single codon: methionine (AUG) and tryptophan (UGG).

    There are three stop codons (UAA, UAG and UGA): they do not code for any amino acid, but instead signal the end of the protein.
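These properties can be checked directly against the table. The Python sketch below (all names are our own) rebuilds the standard code as a dictionary and counts how many codons specify each amino acid, then lists the amino acids with a single codon, those with the most codons, and the stop codons.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code in one-letter amino acid codes, listed in U, C, A, G
# order for the first, second and third base; "*" marks a stop codon.
PACKED = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}
CODE = {
    a + b + c: THREE_LETTER[PACKED[16 * i + 4 * j + k]]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Number of codons that specify each amino acid (and "Stop").
codons_per_residue = Counter(CODE.values())
max_count = max(codons_per_residue.values())

single = sorted(aa for aa, n in codons_per_residue.items() if n == 1)
most = sorted(aa for aa, n in codons_per_residue.items() if n == max_count)
stops = sorted(codon for codon, aa in CODE.items() if aa == "Stop")

print("Single codon:", single)   # ['Met', 'Trp']
print("Most codons:", most)      # ['Arg', 'Leu', 'Ser']
print("Stop codons:", stops)     # ['UAA', 'UAG', 'UGA']
```

The counts also show how degenerate the code is: 61 codons share the twenty amino acids, and leucine, serine and arginine each have six.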


    Copyright Rothamsted Research 1999-2010
    Web technical: Nathalie Castells-Brooke