Molecular Evolution Lecture Section I Part v) Proteomics and Protein Control

The proteome, as a term, refers to the complement to the genome: the physical expression of the genes. Since the genes encode the information necessary to make proteins, the proteome may be regarded as the direct physical expression of the information in the genome. Since this representation is linear, the genome and proteome evolve in lockstep, as one represents the other. When we examine proteins and their homology by structure and function, and cross-reference this with the genetic sequence that encodes each protein, we are presented with immediate evidence of common descent: homologous proteins diverge through noise mutations while retaining conserved signature sequences, and this pattern holds even down to the supramolecular level. So that this does not sound like blinding jargon, we must first ask ourselves: what is a protein, and how is this relevant?
I suppose that before beginning to explain why, an explanation of fundamental concepts is in order. To understand evolutionary mechanisms, we need a very deep understanding of genes and proteins. Since an advanced explanation of evolutionary genetics is covered in an essay I am writing called Hox Flow Mechanisms and Their Effect on Evolutionary Phenotypes and Structures, the bulk of this essay will be about proteins. Why genes and proteins? The relationship between genes and proteins, and their interactions with RNA, is deep and ancient. Many times have I said that only someone who truly understands the three interlocking pillars of molecular biology (genes, RNA and proteins), with the depth that comes with years of study, has any right to comment on the validity of the theory that life evolves. Essentially, a protein is a string of amino acids, usually 50-2000 amino acids long. The whole of life depends on proteins. Everything else, save the genes, is a mere passive bystander in the biological dance of life. When we observe the cell, we are in essence observing proteins. Proteins control movement (motor proteins), they control structure (structural proteins), they control concentration (transmembrane proteins), they control ion gradients (pump proteins), and most importantly, they control nearly every chemical reaction in the body (enzymes).
Proteins don't just control the body, they are the body. Most proteins fold up tightly into one highly preferred conformation. There is almost no limit to the number of tasks they do in the cell. Proteins can be subdivided into two large classes: the globular proteins, which fold up into irregular ball-like shapes, and the fibrous proteins. Many globular proteins are allosteric, which means they can adopt two slightly different conformations and have two binding sites, one for a regulatory molecule and one for the substrate. Allosteric control is very complex. Suffice it to say for now that it works by either positive or negative control: either the regulatory molecule increases the protein's affinity for the substrate (and, reciprocally, the substrate increases its affinity for the regulatory molecule), or the regulatory molecule decreases the protein's affinity for the substrate, which is likewise reciprocal. In this way, regulatory molecules can turn the protein on or off, and in negative control there is a tug of war between the regulatory ligand and the substrate, each of which is affected by the other's concentration in the cell.
A short summary of biological proteins would look like this: A protein is a specific type of biological polymer made up of a specific family of chemical subunits called amino acids. There are 20 biological amino acids, and they are distinguished by the fact that they all have a central alpha carbon, which is attached to an amine group (-NH2), a carboxyl group (-COOH), a hydrogen, and a side chain. It is the side chain that gives each amino acid its properties, and each of the 20 has a different side chain. Proteins can be almost any length. Usually they are 50-2000 amino acids long, and some run to 7000 amino acids or more. The interaction between the side chains (which is determined largely by charge and polarity, since three are basic, two are acidic, five are polar but uncharged and ten are nonpolar) determines the shape of the protein.
For instance, the nonpolar side chains are all hydrophobic (water hating), which means the protein will fold up so that the nonpolar side chains face inwards and are not exposed to water (this is the most energetically favorable conformation). This is just one of many subtle interplays between amino acids that determine a protein's shape. Nearly all proteins fold spontaneously in solution, indicating that all the information necessary to fold them is stored in the amino acid sequence.
Proteins are:
Structural: Nearly all large structures in the body are composed of structural proteins. Repeated addition of protein subunits allows for the geometric assembly of thousands of structures.
For example, tubulin can assemble the microtubules of the cell by repeated addition of the tubulin subunit. Actin assembles into long, ropelike filaments, which, as with most fibrous proteins, can be arrayed together into bundles and sheets of fibers. Actin filaments are the fibers responsible for muscle contraction. Another example is elastin, which is made of a loosely bound collection of elastin polypeptide chains that, when bonded to each other, make a rubberlike sheet that gives skin its property of stretching without tearing. Many structural proteins can self-assemble just by the repeated addition of a single protein. For example, the capsid, which is the coat of a virus, is in the simplest cases a roughly spherical shell built from just 60 identical protein subunits.
Enzymatic: Globular proteins function as enzymes, which speed up the body's chemical reactions. Enzymes are better catalysts than anything man has made, and can speed up a reaction by a factor of as much as 100,000,000,000,000 (100 trillion). They control the rate of the thousands of reactions in the cell, and by regulation, coordination and feedback loops, create massive, intricate metabolic pathways. All enzymes have an active site, to which the molecule to be acted upon attaches for catalysis. Any molecule that binds to a protein is called a ligand; a ligand that an enzyme acts upon is called a substrate. Usually enzymes operate in steps, so the product of one enzyme becomes the substrate for the next.
Transmembrane: Proteins can be arrayed across the membrane of the cell and control the concentration of various chemicals inside, allowing certain chemicals in and out. The pumps among them are usually powered by ATP hydrolysis and usually control the flow of small ions like calcium and potassium. Transmembrane pumps can be regarded as a class of motor proteins, which are detailed below. Transmembrane proteins are important in cell regulation and enzyme kinetics. In muscles they are particularly important, as it is the flow of calcium ions that triggers muscle contraction. They are also very important in neurons and synaptic vesicles, as the flow of ions (calcium, potassium and chloride) is what creates the electrochemical gradient that carries the neuron's signal.
Motor proteins: All proteins have precisely engineered moving parts, but motor proteins especially so, since a tiny movement has to induce a major conformational change. For instance, the protein myosin has to control muscle contraction, which, as you can imagine, is a tremendous organizational problem. Many motor proteins have the very impressive ability to “walk” along structures like microtubules and DNA. This is an inbuilt function of the protein. It results from the protein having three distinct conformations, which it switches between via ATP hydrolysis. Since ATP hydrolysis is extremely energetically favorable, the protein is forced to move in one direction, since the reverse reaction (condensation of ADP and phosphate back into ATP) is almost certainly not going to occur. The protein is driven forward by a catalyst called adenine nucleotide exchange factor, which releases the ADP after hydrolysis, allowing a fresh ATP to bind to the regulatory site almost immediately. In this way the protein is forced from conformation 1 to conformation 2 to conformation 3 and then back to conformation 1, and so on.
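To put a rough number on why the cycle runs in only one direction, here is a minimal Python sketch. It assumes the textbook standard free energy of ATP hydrolysis (about -30.5 kJ/mol) and body temperature; the real value inside a cell differs with concentrations, so treat the figure as illustrative. The function names and the three state labels are mine, chosen for illustration only.

```python
import math

R = 8.314  # gas constant in J/(mol*K)

# Assumption: textbook standard free energy of ATP hydrolysis, in J/mol.
# The in-cell value depends on ATP, ADP and phosphate concentrations.
ASSUMED_ATP_DELTA_G = -30_500.0


def forward_backward_ratio(delta_g: float, temp_kelvin: float = 310.0) -> float:
    """Boltzmann ratio of forward to reverse likelihood for a step that
    releases free energy delta_g (negative means favorable)."""
    return math.exp(-delta_g / (R * temp_kelvin))


def walk(steps: int) -> list[str]:
    """List the conformations visited when every step goes forward:
    1 -> 2 -> 3 -> 1 -> ..."""
    states = ("conformation 1", "conformation 2", "conformation 3")
    return [states[i % len(states)] for i in range(steps)]


if __name__ == "__main__":
    ratio = forward_backward_ratio(ASSUMED_ATP_DELTA_G)
    print(f"forward steps outnumber reverse steps by roughly {ratio:.2e} to 1")
    print(walk(7))
```

With a bias this large, a reverse step is rare enough that the cycle can be treated as a one-way ratchet, which is all the "walking" requires.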
Now, what we must understand regarding proteins is that, with the exception of a few viral capsids, all the information they need to fold into their conformation (for globular proteins) is completely present in the amino acid sequence of the protein in question, since proteins fold up spontaneously. The fold depends on the thousands of noncovalent interactions between the side chains, carboxyl groups and amine groups of the amino acids in the protein, where the variation comes from the side chains, which differentiate the amino acids. Hence, proteins are autonomous: the information necessary to create the structure, on which the function depends, is present in the sequence, and the protein does not need any exterior help to assemble. So, we must now consider the interrelated roles of genes and proteins in a cell.
Consider a cell. Just one cell. It is tiny, barely 50 microns across. It is enclosed by a membrane made largely of phospholipids. This membrane gives the cell defined boundaries and separates it from the outside world, defining it against the background. Inside the cell is a watery gel which fills up the whole interior. This watery gel is called the cytosol, and it is the main stage on which cellular events take place. The cell's membrane will be studded with transmembrane proteins which control things coming in and out of the cell (organic molecules in, waste out). Meanwhile, inside the cell, enzymes will be running the day-to-day operations. Structures inside the cell (usually made of proteins) will be broken down, assembled and repaired in a series of complex pathways, all controlled by enzymes. The cell also needs energy and raw materials, so it imports organic molecules (aka “food”) and breaks them down into simple subunits (this process is controlled by enzymes), which are then used for energy (a process which is also controlled by enzymes) or used to construct large cellular structures (this is also controlled by enzymes).
For all this to happen, a lot of chemical messages must fly between different parts of the cell so that the cooperative process keeps going, and so that all the different cellular projects are in communication and taking cues from the environment for what to do (these processes are controlled by signal-integrating proteins, signal-amplifying proteins and signal-transducing proteins). Controlling all this is the genome. The genome holds the “master key” to all the proteins. The rate at which proteins are assembled from genes is controlled by other genes, which in turn usually end up being controlled by other genes. Since proteins work in teams, the concentration of each different protein, as controlled by the genome, affects the cell as a whole. Most of the time, the demand for various products operates on a feedback loop. If a product is needed, it triggers a stimulus which sends a message to the genetic template. This can result in a particular gene being switched on or off, its rate of production increasing or decreasing, or a host of other things. In other words, the genome of a cell functions like a microprocessor. It takes input from the environment, processes it, and delivers an output. In this way, the whole balance of the cell can be controlled by the genes. However, this analogy is not entirely accurate, since the relation between proteins and genes is reciprocal, i.e. proteins can control genes (these are called DNA-binding proteins). So let us now examine the basic principles of organization of proteins and genetic information:
Duplicative mutation: A genetic mutation where a stretch of genes is accidentally duplicated through an error in replication or recombination. This provides the mutation carrier with superfluous genetic baggage, basically an extra copy of a gene. This copy is free to mutate based solely on random frequency probability.
Homology: A family relationship between two or more genes (or sets of genes) that descend from a common ancestral gene, typically as the result of a duplicative mutation. For instance, the human genome contains seven haemoglobin proteins, all of which are in a gene family called the haemoglobin family. This is part of a larger family called the globin family, under which all oxygen-binding proteins, like the myoglobins, are classed.
Paralogy: A relationship between two closely related genes in a single genome as the result of a duplicative mutation. These two genes (or sets) are said to be paralogous to each other in the same carrier species. For instance, the seven human haemoglobins are paralogous to each other.
Orthology: A relationship between genes or sets of genes in different species. When two species diverge, the new arm of the phylogenetic tree retains much of the genetic code of its predecessor. Any related batches of genes in two species that trace back to the same gene in their common ancestor are said to be orthologous to each other. The seven human haemoglobins are orthologous to the seven chimp ones.
Recombinative mutation: Chunks of genetic information are shuffled around. Bear this in mind for what we talk about below.
This is a very short primer on how genetic information is shifted around and created.
Polypeptide chain: One very long molecule made of many amino acid subunits, all covalently bonded, usually 50-2000 amino acids long. A tertiary protein is one consisting of only one polypeptide chain.
Tertiary domain: A recognizable subunit of a polypeptide chain. Domains usually consist of many alpha helices and beta sheets twisted around each other. A tertiary domain is essentially a clearly distinct building block of a polypeptide chain. When I say distinct, I mean obvious. For example, the protein neuraminidase is constructed of four distinct regions symmetrically aligned in a tetramer (four identical units). Perhaps a picture would help to show how obvious the distinction is: neuraminidase tetramer http://www.bio2001.csiro.au/images/THblue-NA.jpg Domains can also be aligned in stack-up formation (N-terminus to C-terminus and so on) to create long strandlike proteins, as seen in fibronectin, shown here: http://www.steve.gb.com/images/molecules/proteins/fibronectin_type_III_repeats.jpg Hopefully these pictures will make you understand what I mean when I say that a domain is a “distinct region”. The technical definition of a domain is “a region of protein which folds up independently of the rest of the chain”. Thus we can see that a tertiary domain is a clear, distinct region of protein which makes up the large building subunits of polypeptide chains, a fact which will be important later.
Modular domain: A domain which pops up all over the proteome, in many different proteins, alongside many different domains, in many different species; essentially a building block ubiquitous throughout evolution.
*Critical note about tertiary and modular domains: when we say the domain is in many places, keep in mind that domains are not identical protein strings to the letter. Two protein strings can be 50% different in amino acid sequence and still be the same domain (as a domain is defined by its fold and function). This is why the human haemoglobin alpha is a different protein string from, say, the chimp haemoglobin alpha, but they are still the same domain. This is crucial for a test discussed soon called the molecular clock.
Quaternary domain: Many proteins consist of several polypeptide chains (tertiary proteins) noncovalently bonded together. If this is the case then it is called a quaternary protein complex, and the subunits of this large complex (called quaternary domains) are the tertiary proteins (the subunits of which, of course, are tertiary domains).
Primary structure: Unfolded protein. A straight string of amino acids.
Secondary structure: The local folding patterns of the chain, held in place by hydrogen bonds along the backbone. The secondary structure of a protein includes fundamental structures that are totally ubiquitous to life. For instance, the alpha helix is the most common structure in the whole of biology. A hydrogen bond between every fourth amino acid along the string produces a helical structure.
The oxygen on the carboxyl group of every amino acid binds to the hydrogen on the amide group four amino acids away, producing a full twist every 3.6 amino acids. Most times these helices will coil around each other, because the outside typically has a hydrophobic "stripe". The resulting structure is a coiled-coil, which is very important in structural proteins like keratin (hair and skin). The beta sheet is equally important, and it occurs when two stretches of the amino acid string lie alongside each other, either parallel or antiparallel. The strands are held together by hydrogen bonds between the carbonyl oxygens of one strand and the amide hydrogens of its neighbour. These are very important, because the tight turns between the strands usually end up forming the binding site for a ligand.
Supersecondary structure: This includes ubiquitous short patterns of different secondary structures that are not large enough to be domains, and crop up all over the place. These include Greek keys, coiled-coils, collagen triple helices, beta motifs, zinc fingers and Delta modules.
Tertiary structure: The folding becomes more intricate in the progression to tertiary structure, when the secondary structures pack against each other and the full three-dimensional shape of the chain takes form.
Quaternary structure: Only applicable in a quaternary protein. Same as tertiary structure, except, of course, for a protein complex holding multiple noncovalently bonded chains.
From all this we can conclude that the protein is built by higher and higher levels of organization. As we shall see, it is the shuffling of the higher levels of organization that gives us the novel protein combinations that give rise to molecular evolution, and avoids the deleterious nature of changes in the fundamental units of protein organization.
Signature sequence: These are crucial for identifying two protein strings which are the same domain but which have very different amino acid sequences (note: the key factor in determining whether two strings are indeed the same domain is 3D geometry, aka the fold or conformation, but this can often take years to determine). For instance, two protein strings can still be considered the same domain even if, say, only 30% of the amino acids match between them. This is because many of the mutations do not alter the geometry of the string (these neutral mutations are also called “background noise”). So to help us tell whether two protein strings are indeed the same domain, we need to find the signature sequences, conserved amino acid strings critical for the functioning of the protein (often these strings are so sensitive that they do not change whatsoever, even over two billion years).
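As a rough illustration of how signature sequences are picked out from background noise, the sketch below scans a set of already-aligned sequences for columns that never change, then groups adjacent conserved columns into runs. The three short sequences are made up for the example and are not real protein data; the function names are hypothetical, and real work would use a proper multiple-alignment tool first.

```python
def conserved_columns(aligned: list[str], threshold: float = 1.0) -> list[int]:
    """Return indices of alignment columns where the commonest residue
    makes up at least `threshold` of the column (gaps never count)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    conserved = []
    for i in range(length):
        column = [seq[i] for seq in aligned]
        commonest = max(set(column), key=column.count)
        if commonest != "-" and column.count(commonest) / len(column) >= threshold:
            conserved.append(i)
    return conserved


def signature_runs(positions: list[int], min_len: int = 3) -> list[tuple[int, int]]:
    """Group consecutive conserved positions into (start, end) runs,
    keeping only runs of at least `min_len` residues."""
    runs = []
    start = prev = None
    for p in positions:
        if start is None:
            start = prev = p
        elif p == prev + 1:
            prev = p
        else:
            if prev - start + 1 >= min_len:
                runs.append((start, prev))
            start = prev = p
    if start is not None and prev - start + 1 >= min_len:
        runs.append((start, prev))
    return runs


# Made-up toy alignment: the middle block is conserved, the flanks are noise.
TOY_ALIGNMENT = [
    "MKGDSGKTAL",
    "LRGDSGKSVI",
    "MAGDSGKTPV",
]

if __name__ == "__main__":
    cols = conserved_columns(TOY_ALIGNMENT)
    print("conserved columns:", cols)
    print("signature runs:", signature_runs(cols))
```

Lowering `threshold` below 1.0 tolerates the occasional substitution inside a signature, which is closer to how conserved motifs behave over very long divergence times.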
The proteome, then, shows a direct relationship between structure, sequence, function and time of divergence. Its diversity comes from duplication and divergence from a common base proteome, and the continuous way in which sequence, structure and function drift apart with time is exactly what such a history predicts. Paralogous gene families run right across the proteome, some reaching back beyond the separation of the three domains of life, and amino acid tracking shows the differences narrowing as the organisms in question become more closely related (a fact which is reinforced by radiometric dating). This is only possible via repeated duplication and divergence of genes, bearing gene families which in turn branched out depending on the survival requisites and location of the organism. The lack of originality in the proteome, especially the vertebrate proteome, which can be explained entirely in terms of domain shuffling and protein string recombination, can only be explained by common descent from a primordial genome which bore only the survival requisites for the simplest of life. What this genome may have looked like is mysterious, but a small bacterium called Mycoplasma genitalium gives us some insight: when its genome is computationally cross-referenced against the genes of archaea, eubacteria and eukaryotes (excluding ESP proteins, of course), we arrive at an answer of roughly 200 genes dedicated to basic metabolic and structural proteins, rRNAs and the control of cell division. And it is from this humble beginning that life evolved. This is corroborated by genomic analysis, which indicates that the homology of gene sets, and by extension that of the proteome, is present not only in sequence but also in structure and hence function, with the divergence directly related to time.
Such a relationship is only possible in light of common descent. Quite simply, molecular genetic tracking, ERVs and mtDNA, in addition to computational searches for paralogies across the spectrum, lead us inevitably to the conclusion that the whole swath of life arose from a single, simple, primordial cell.
I will now turn to a more complex explanation of homologous amino acid tracking, and how we may distinguish signature sequences from noise mutations. The distinction rests on the distinct signature sequences found on the protein. For instance, a protein which is common to the whole of life, called elongation factor Tu or EF-Tu, is a control mechanism for tRNA at the ribosomal junction with the mRNA string during protein synthesis. A critical control mechanism for the mRNA/tRNA match, EF-Tu is a GTPase protein with multiple domains. EF-Tu holds the tRNA, masked, in position for translation while it pairs with the mRNA. The hydrolysis of the GTP on EF-Tu induces a major conformational change, since the protein has a crucial alpha helix called the switch helix which, due to the positions of key polar amino acids, can switch between two domains. The position of the switch helix depends on the hydrolysis of GTP. When GTP is bound to EF-Tu, the switch helix is locked in place, which means that the tRNA is still masked. When the GTP is hydrolyzed to GDP, the switch helix opens the domain latch, releasing the tRNA. In this way we can see how a small change can amplify the signal and induce a major movement in the conformation of the protein; such is the precision of the protein's moving parts.
They are the most well-engineered devices in nature (and much more complex than any of man's devices). So when we examine the vast swath of EF-Tu domains across the spectrum of life, the switch/latch sequences are almost unchanged, such is the precision of the protein (this effect is also present in catalysts, since the replacing of a single aspartate with a glycine in the binding site of aspartate transcarbamylase is enough to shift the position of the transition-inducing carboxylate by the radius of a hydrogen atom, which is enough to decrease the enzymatic activity a thousandfold). Therefore, even with all the noise mutations, the signature sequences do not change in billions of years. But it is the noise we are concerned with, since the divergence of the same domains, while retaining the signature sequences, clearly indicates that they arose from a single common ancestor.
http://bass.bio.uci.edu/~hudel/bs99a/lecture25/tugdp_for_bs99a.gif
The switch helix on EF-Tu is distinct; it is orange in this picture. Notice how it can swing between the two domains, the one on the bottom right and the one on the top right. The helix, as well as the receptor domains, the GTP-binding site, the allosteric ligand control system and the tRNA site, changes very little throughout the spectrum of life. Everything else freely mutates on the basis of random frequency, at a rate and in an order that molecular biologists can predict and demonstrate. This noise mutation is excellent evidence for evolution, because it clearly indicates that all EF-Tu-holding species have a common ancestor (EF-Tu itself is the prokaryotic form of the protein; eukaryotes carry a homologue, eEF1A). This is checked against the most highly conserved proteins, because these indicate evolutionary relationships with distinction. Some proteins and gene sequences are so common, so vital, that they hardly change at all. For example, a protein complex critical to all eukaryotic life, the histone octamer around which DNA is wound to package chromatin inside the nucleus, is so highly conserved that between the human being and the potted pea plant (which are separated by over 1.5 billion years) its most conserved subunits differ in only one or two percent of their amino acids.
To generate new domains for the creation of new phenotypes, we turn to the following: duplicative mutations, followed by recombinative, or shuffling, mutations. A protein is not subdivided merely into its amino acids. It is grouped into large subunits called domains, regional stretches of protein roughly 100 amino acids long. In this way we can see that massive proteins (>1000 amino acids) are defined not only by their individual amino acids but, ultimately, by the order of the different units created by smaller strings of amino acids within the complex.
The protein takes on its secondary structure by folding locally along the chain, with kinks between the subunits. The shape of a protein, therefore, is directly determined by its chemical sequence. The folding becomes more intricate in the progression to tertiary structure, when the folds between individual units take shape.
Finally, the protein reaches its native state, with its intricate system of folds; for a protein built from several chains, this is its quaternary structure.
Point mutations are often damaging, as they can create stop codons (nonsense mutations), but as we shall soon see, bearing in mind the organization of the proteome, point mutations are far from the whole story of evolutionary mechanisms. There is almost nothing original in the vertebrate genome. It is the result of multiple whole-genome duplications throughout evolution. Even in humans, the proteome contains only 7% vertebrate-specific proteins. The only place we really seem to have any originality is in domain shuffling (the human trypsin domain can be found joined to at least 18 other domains, while in Drosophila it's only 5). As I said about protein structure, much of the innovation merely comes from rearrangement of subunits, and shuffling mutations of this kind are quite often beneficial. An excellent example of how evolutionary mechanisms can create novel protein combinations which give survival benefits to the carrier organisms is found in a pair of the most critical classes of proteins in the whole of life: kinases and phosphatases.
Kinases are a protein-phosphorylating class of signal transducers which control a large number of proteins and amplify many, many signals, also acting as signal-integrating proteins by anchoring to the extracellular matrix junction and relaying signals from the membrane to the endoplasmic reticulum and the nucleus. Instead of a reciprocal or cooperative allosteric binding site for a ligand to control the action of the protein in question, a certain side chain (always a threonine, serine or tyrosine) is phosphorylated, which activates or deactivates the protein. The cyclic, switch-like nature of kinase control is very similar to that of GTPases such as the Ras protein. At the heart of every kinase is a single conserved catalytic domain. As evolutionary mechanisms took their course and organisms became more complex, a wider range of transducers became required, which evolved in lockstep with other evolving functions, a process called coevolution. This is indicated by the fact that the kinase domain has since become integrated into totally different proteins, creating entire classes of kinases simply by joining it to many other domains throughout the course of evolution to make novel protein combinations. The branching of the various kinase families that results from this is fully consistent with molecular clock tracking of the divergence rate of the amino acids (recall noise mutations). This means whole families of kinases have been generated at different times in the evolutionary process by duplication and divergence. We now have many, many families of kinases, including Cdc7, PDGF receptors, TGF-beta receptors, Ca2+-dependent kinases, Cdk integrators (which include a large range of Cdks, including Cdk2 and Cdk3), Src kinases and KSS1; the list goes on and on. This is just a small example of how evolutionary mechanisms can generate huge numbers of novel proteins simply by recombination and duplication.
The lack of originality or "design" in the kinase family, as well as the presence of the same catalytic domain in every kinase and the underlying signature sequences, is clear evidence for a primordial kinase upon which the whole family was built, simply by the course of time and natural selection. This is the tree for the relationship between some kinase families:
http://bmc.ub.uni-potsdam.de/1471-213X-5-8/1471-213X-5-8-2.jpg
As we can see, this short stretch has signature sequences, exactly like the test for haemoglobin. These are divergences found in an identical protein domain, confirmed exactly by molecular clock tracking against the known divergence rate of the domain and the orthologous separation of the species in question. If (as the creationists claim) these species were created within days of each other, or had no common ancestor, this divergence would not exist. This is the same domain for each animal, which I took the liberty of sequencing myself.
Orthologous divergence of the haemoglobin chain of various vertebrates can be correlated by molecular tracking, as shown by a commonly repeated molecular clock test, that of haemoglobin. As oxygen-binding proteins, haemoglobins are present throughout the vertebrates, and related globins are found far beyond them. Haemoglobin is an excellent example of homology, due to the fact that it is arrayed in batches and is part of a superfamily of oxygen-binding proteins, and it is also very easy to test with the molecular clock, to see the rate of divergence from noise mutations. The very fact that we get positives on the molecular clock is enough to disprove the notion of design, seeing as the time/divergence relationship is only possible via evolution of the protein batch through duplication, homology and divergence. Percentage identity in amino acids between the conserved domain of haemoglobin (the lower the figure, the greater the divergence):
Human/lamprey (divergence: 550 million years ago) 35%
Human/shark (520 million years) 51%
Human/tuna fish (450 million years) 55%
Human/frog (350 million years) 56%
Human/chicken (320 million years) 70%
Human/lizard (270 million years) 77%
Bird/crocodile (220 million years) 76%
Human/kangaroo (170 million years) 81%
Human/sloth, mouse, elephant, rabbit, pig, sheep, whale, cat, dog, rat (all between 150 and 50 million years) all 80-85%
Human/orangutan (10 million years) 98%
Human/chimp (7 million years) 100%
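For anyone who wants to check the time/divergence relationship in the figures above, here is a minimal Python sketch. It reads the percentages as identity (higher means more similar), which is how the values behave, and it leaves out the grouped mammal row because that row gives ranges, not single values. The helper names are my own, and `percent_identity` assumes the two sequences have already been aligned.

```python
import math


def percent_identity(a: str, b: str) -> float:
    """Percentage of matching residues between two aligned sequences,
    ignoring any column where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# (pair, divergence time in millions of years, percent identity),
# copied from the list above; the grouped mammal row is omitted.
HAEMOGLOBIN_TABLE = [
    ("human/lamprey", 550, 35),
    ("human/shark", 520, 51),
    ("human/tuna", 450, 55),
    ("human/frog", 350, 56),
    ("human/chicken", 320, 70),
    ("human/lizard", 270, 77),
    ("bird/crocodile", 220, 76),
    ("human/kangaroo", 170, 81),
    ("human/orangutan", 10, 98),
    ("human/chimp", 7, 100),
]

if __name__ == "__main__":
    times = [row[1] for row in HAEMOGLOBIN_TABLE]
    identities = [row[2] for row in HAEMOGLOBIN_TABLE]
    r = pearson(times, identities)
    print(f"correlation between divergence time and identity: {r:.3f}")
```

A strongly negative coefficient means identity falls as divergence time rises, which is the pattern the molecular clock test looks for. A straight-line correlation is only a first check; substitutions saturate over very long times, so the true relationship is not expected to be perfectly linear.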
And now it is key to recall the discussion about protein domains: the fact that there are almost no original protein domains in the whole proteomic spectrum makes it very obvious that they evolved by duplication and unguided divergence. The entire vertebrate genome was created by shuffling mutations which rearranged domains into novel combinations. The human genome contains only 7% vertebrate-specific proteins, and differs in size from that of a fruit fly by only a factor of 1.2, yet much more novel and complex arrangements of the same protein domains mean the construction of a much more complex organism.
There is nothing original about the vast diversity of life. It all came from very simple, repeating, diverging, primordial protein domains.
So let us sum up. Proteins are the physical product of the genome and, like the genes, are represented in a linear fashion by chemical encoding. Whilst homology in genes is determined by sequence, homology in proteins is determined by structure and function. It is established that we can find a gene's function by working in reverse, which is to discover what the organism in question lacks when the gene is deleted artificially. We can also conclude that homologous genes retain the same function, and that there is a directly proportional relationship between divergence of function and divergence of base pairs.
The gene families are all homologous, which is to say that they are all built by duplication, which increases the raw material of genetics, and by divergence of the new orthologous and paralogous arms of a set which were created by said mutations. The proteome is the physical expression of these families, and as such it has precisely the same homologous properties. In evolutionary terms, we examine proteins by their higher subdivisions. We can divide a protein into quaternary domains. We can divide quaternary domains into polypeptide chains, or tertiary domains, which we can divide into batches of supersecondary structures, which we can divide into patterns of secondary structures, which can be explained in terms of primary structure (sequence). Once we get to a high enough level, the section of the protein in question is often simply self-assembling and is not affected by other domains on the same protein. In other words, the building blocks of proteins as we measure them at higher levels are the functional modules by which evolutionary mechanisms occur, not the deleterious lower-level changes. Since at higher levels the functional units of proteins fold up utterly independently, the result is that we have modular functional domains that are interchangeable throughout a wide range of proteins and retain their preexisting function.
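The idea of interchangeable modules can be made concrete with a toy data structure. This is a sketch under stated assumptions, not a model of any real proteome: each protein is reduced to an ordered tuple of domain labels, the labels D1 to D3 and the protein names are placeholders I have invented, and recombine is a hypothetical helper that joins the front of one architecture to the back of another.

```python
# Hypothetical toy data: each protein is an ordered tuple of domain labels.
# The labels and protein names are placeholders, not real annotations.
PROTEINS = {
    "protein_a": ("D1", "D2", "D3"),
    "protein_b": ("D2", "D3"),
    "protein_c": ("D3", "D1", "D1"),
}


def domain_vocabulary(proteins):
    """Set of distinct domains used anywhere in the collection."""
    return {domain for arch in proteins.values() for domain in arch}


def recombine(left, right, cut_left, cut_right):
    """Join the first cut_left domains of left to right's domains from cut_right on."""
    return tuple(left[:cut_left]) + tuple(right[cut_right:])


def is_novel(architecture, proteins):
    """True if no existing protein already has this exact domain order."""
    return tuple(architecture) not in set(proteins.values())


vocabulary = domain_vocabulary(PROTEINS)
shuffled = recombine(PROTEINS["protein_a"], PROTEINS["protein_c"], 2, 1)

print("vocabulary:", sorted(vocabulary))
print("shuffled architecture:", shuffled)
print("uses only existing domains:", set(shuffled) <= vocabulary)
print("is a new arrangement:", is_novel(shuffled, PROTEINS))
```

The shuffled architecture contains no domain that was not already in the vocabulary, yet its arrangement matches no existing protein. That is the sense in which reshuffling produces novelty without producing original domains.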
In this way, we can see that the bulk of evolutionary innovation comes from the reshuffling of such domains at higher levels. There is a directly inverse relationship between the diversity of organisms and their complexity (by which I mean that prokaryota are by far the most simple, and hence the most diverse, while protozoa are much more constrained, and multicellular Eukaryota are the most constrained by far, due to limits on physiology and anatomy). Hence, originality decreases as we travel up the taxonomy. By the time we reach the least diverse groups, such as mammals (by far the least diverse Class in the whole animal Kingdom), we find that the vast bulk of changes are little more than quantitative tweaks in Hox genes, as opposed to actual innovation.
At any rate, the result of such a relationship is that the vertebrates in particular are noted for having no proteomic originality, which is to say that the proteins of the vertebrate proteome are always found elsewhere, hence any innovation in the vertebrate proteome occurs by simple reshuffling of the modular domains. Since the relationship between time of divergence and amino acid divergence is directly proportional, noise mutations which alter the sequence but not the structure or function of proteins are an indicator of time/divergence relationships in a proteomic homology. The rate at which the sequences diverge is calculable, and is different for each protein depending on how strongly the protein in question is conserved. In every protein, we invariably find sequences of amino acids 10-30 long which are absolutely necessary for the function of the protein, and which are therefore so highly conserved that they do not change over evolutionary history. By comparing such sequences with the noise mutations, which are neutral and accumulate at random, we find the rate of divergence, which tells us the time separation associated with the arising of two species. But this would only work if the species arose in a continuum from a common ancestor. The homology indicates that all proteins are the result of recombination following duplication and the reassembly of modular domains to produce novel structures (the originality of which lessens as the organisms in question become more complex). The lack of originality in such domains indicates that the entire proteome originates from a common ancestral proteome, whose appearance we can shed light on by cross-referencing universal domains with one of the simplest organisms known and, in terms of genome and hence proteome size, among the smallest: the bacterium Mycoplasma genitalium.
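The separation of conserved stretches from noise sites can be sketched in a few lines. The two sequences below are invented for illustration and are assumed to be already aligned with no gaps. The threshold of five residues is chosen only to suit such short toy strings (the conserved stretches described above are 10-30 residues long), and the function names are hypothetical.

```python
# Hypothetical, pre-aligned toy sequences of equal length (no gaps).
# They are invented for illustration and are not real protein data.
SEQ_A = "AKLHEDCWGGHPRTVS"
SEQ_B = "GKIHEDCWGGHPKTLS"


def conserved_runs(a, b, min_run):
    """Half-open (start, end) spans where a and b match for at least min_run sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    runs = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if start is None:
                start = i  # a matching stretch begins here
        else:
            if start is not None and i - start >= min_run:
                runs.append((start, i))
            start = None
    # Close a matching stretch that runs to the end of the alignment.
    if start is not None and len(a) - start >= min_run:
        runs.append((start, len(a)))
    return runs


def divergence_outside(a, b, runs):
    """Fraction of differing sites among positions not inside any conserved run."""
    covered = {i for start, end in runs for i in range(start, end)}
    free = [i for i in range(len(a)) if i not in covered]
    if not free:
        return 0.0
    return sum(1 for i in free if a[i] != b[i]) / len(free)


runs = conserved_runs(SEQ_A, SEQ_B, min_run=5)
noise_divergence = divergence_outside(SEQ_A, SEQ_B, runs)

for start, end in runs:
    print(f"conserved stretch at {start}-{end}: {SEQ_A[start:end]}")
print(f"divergence at the remaining sites: {noise_divergence:.2f}")
```

Real comparisons would start from a proper alignment with gaps and would use many sequences rather than two. The sketch only shows the bookkeeping: set the conserved stretch aside, then measure divergence over what is left.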
The noise mutations found in the homologous sets of proteins, which do not affect structure or function, serve as the markers by which we may catalogue evolutionary relationships. Indeed, without the process of evolution, the existence of such markers would be absurd, since they indicate a directly proportional relationship between sequence divergence and time of divergence, which must mean that such mutations occurred in a continuum reaching from a common ancestor all the way along the lineage to the organisms in question.
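To show how a proportional relationship between sequence divergence and time turns into a date, here is a minimal sketch using the standard Poisson correction for repeated changes at the same site. This is a textbook method that I am swapping in for illustration, not necessarily the calculation behind the figures quoted earlier. The observed fraction of differing sites (0.30) and the rate (1e-9 substitutions per site per year per lineage) are both made-up example values.

```python
from math import log


def poisson_distance(p):
    """Estimated substitutions per site, d = -ln(1 - p), where p is the
    observed fraction of differing sites between two aligned sequences."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must satisfy 0 <= p < 1")
    return -log(1.0 - p)


def divergence_time(p, rate):
    """Years since two lineages split, t = d / (2 * rate).

    rate is substitutions per site per year along ONE lineage; the factor
    of two is there because both lineages accumulate changes after the split.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    return poisson_distance(p) / (2.0 * rate)


# Hypothetical example values, chosen only for illustration.
observed_difference = 0.30
assumed_rate = 1e-9

years = divergence_time(observed_difference, assumed_rate)
print(f"estimated separation: {years / 1e6:.0f} million years")
```

The correction matters because an uncorrected count undercounts sites that have changed more than once, so raw percentage difference stops being proportional to time for distant comparisons. The rate has to be calibrated separately for each protein, which fits the point that strongly conserved proteins diverge more slowly.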
Were the organisms designed, or created spontaneously, or created without a line of descent from such a common ancestor, such a distinct molecular trail would not exist, since there would be no familial relationship by lineage between the organism and the marker organism whose noise mutations are being compared. So we have a homologous protein set that is clearly the result of duplication and divergence, which most certainly indicates common descent. We may then track the progress of that divergence through its direct relationship with amino acid noise mutations (non-deleterious, neutral mutations which alter neither structure nor function), which, again, would be impossible without common descent.
The innovation of creating new proteins is only a small portion of the genetic mechanism by which evolution may occur, however. We need to consider systems of genetic control: the workings of regulatory DNA; the Hox genes, and how changes in utero through alteration of the Hox genes and the associated node pathway may produce novel structures in multicellular Eukaryota; and, at a simpler level, how changes in regulatory DNA as well as protein innovation may produce novelties in protozoa and prokaryota, which lack Hox genes because they are not multicellular and do not need such control systems.
Thus we encounter books that use quantum mechanics as a justification for an array of metaphysical and spiritual beliefs, written by people who would be unable to interpret a Feynman diagram or recognize, much less solve, a simple work function problem; articles smugly asserting that certain structures and organisms could not possibly have evolved, whose authors would be unable to draw a Punnett Square; and brazen proclamations that evolution violates the laws of thermodynamics from people who would be unable to calculate enthalpy changes, use the combined gas law or solve a simple problem of dynamic equilibrium.
-Me
































