Up until this point we have given very little thought to the replication of biological systems as a whole, and we haven’t discussed biological evolution in detail. Nonetheless, as has been stressed, replication is the central process of biology and one of its defining characteristics: the action, direct or indirect, by which molecules capable of holding the template for their own synthesis make more of themselves. In modern cells this is an indirect process, in which the chemical system that maintains the central catalytic and metabolic processes, and so makes replication thermodynamically possible, is itself encoded in the molecules that guide their own synthesis. The duplication of DNA is of very high fidelity, as it must be in order to maintain the genomic integrity of the offspring. Duplicating polymers hundreds of millions or sometimes billions of base pairs long (in eukaryotes) requires that the chemical system which directs resources toward this process and catalyzes it must also provide extensive error-correction and checking mechanisms. This is necessary for two reasons. First, DNA polymers are chemically vulnerable and constantly subject to chemical and UV attack, despite being protected from the outside world by the cell membrane and, in eukaryotes, by the nuclear envelope. Second, the DNA polymerase responsible for catalyzing the formation of the phosphodiester bond is not a perfect copying device.
Even so, the fidelity of the process is not perfect, and this gives rise to the notion of descent genes: errors in the replication process will produce offspring copies that are not exactly faithful replications of their parent copies. We haven’t actually discussed what genome sequences are made of or how they are organized. One of the crucial points to stress is that there is a distinct difference between damage to DNA that results from chemical and UV attack, and alterations of sequence that result from errors in the copying process itself. The latter, not the former, are crucial to what we are trying to understand here. It is also important to understand that, from a genetic standpoint, the central processes of evolution are gene duplication and homologous recombination. Gene duplication is not the same as the replication of the genome. Gene duplication is an error in which a stretch of DNA containing a gene is copied by accident, so that the genome gains an extra copy of that gene. This plays a major role in evolutionary processes and is central to the creation of genetic innovation.
Here we need another crucial distinction: that between recombination errors and point mutations. The former are the central molecular processes underlying the formation of descent genes. They are processes in which stretches of genetic material are accidentally duplicated and inserted back into the genome, the order of particular sequences is altered, and sequences recombine to form new sequences. Point mutations, on the other hand, result from the accidental incorporation of an incorrect base in place of the base specified by the template. The term point mutation is also sometimes used to describe frameshift mutations, in which the insertion or deletion of a single base pair shifts the reading frame of every codon downstream of it. These are the errors that the error-correction mechanisms must remove in order to maintain genomic fidelity. Point mutations must be carefully distinguished from recombination and duplication when discussing molecular evolution.
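To make the contrast between a substitution and a frameshift concrete, here is a small Python sketch. The sequence is made up for illustration, and `codons` and `changed_codons` are hypothetical helper names, not any library's API. Splitting the sequence into triplets shows that swapping one base alters a single codon, while inserting one base alters every codon after it.

```python
# Toy illustration: substitution versus single-base insertion in a reading frame.
# The sequence below is made up for illustration and is not a real gene.

def codons(seq):
    """Split a sequence into consecutive triplets, dropping any incomplete tail."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]

def changed_codons(a, b):
    """Count codon positions at which two sequences differ."""
    return sum(1 for x, y in zip(codons(a), codons(b)) if x != y)

original = "ATGGCCAAATTTGGG"
substituted = original[:4] + "G" + original[5:]  # swap one base in place
inserted = original[:4] + "T" + original[4:]     # insert one extra base

print(codons(original))     # ['ATG', 'GCC', 'AAA', 'TTT', 'GGG']
print(codons(substituted))  # ['ATG', 'GGC', 'AAA', 'TTT', 'GGG']
print(codons(inserted))     # ['ATG', 'GTC', 'CAA', 'ATT', 'TGG']
print(changed_codons(original, substituted), changed_codons(original, inserted))  # 1 4
```

Only the reading frame matters here; the sketch does not translate codons into amino acids.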
We will not go into much detail about genetic recombination in this essay, because its subject is chemical evolution. For now, we need only understand the above as a general principle.
We should, however, define evolution in some detail. What we discussed above are processes by which the gene sequences of organisms can change from one generation to the next, giving rise to descent genes. What holds all these notions together is the powerful and central concept of evolution. Broadly speaking, in biological terms, evolution is a change in inherited characteristics over generations. Because of the imperfect fidelity discussed above, lineages of biological organisms change over time. This is called descent with modification, and it is crucial to the existence of biological life.
All biological life in existence today, and indeed all life that has existed over the past 3 billion years and more, is governed by the fundamental processes discussed above. The reason is that all biological life descends from a common ancestor, and the lineages propagated from that ancestor have diverged over the course of those 3 billion years. The power of this idea is that descent with modification allows lineages with distinct phenotypes to diverge, over time, from a common ancestor. In other words, descent with modification can produce many distinct lineages from the offspring of a single ancestor, and with repeated application over many generations the divergence becomes more pronounced.
This by itself does not sound very mind-bending, and that’s because, by itself, it is not. The ingenuity of descent with modification lies in the fact that all biological life can be explained in terms of it. Many modern organisms are large, complex, multicellular organisms that arose late in the history of life on Earth. What makes descent with modification so powerful is that it can explain how, through the alteration of lineages over successive generations, all modern organisms are related to very primitive, single-celled forms of life that existed over 3 billion years ago.

This brings us to the crucial feature that allows descent with modification to explain how lineages change over generations: although the variation it produces arises by chance, descent with modification is not a random process. What exactly does this mean? Let us return to fundamental principles to interpret it. Biological systems must survive by acquiring resources from their surroundings in order to run the metabolic processes that sustain them (refer to the diagram). Previously we paid almost no attention to the “environment” in which a biological system survives, except to state that the system takes in raw materials from the environment and converts them into chemical energy to drive the central replication processes. Now, however, the relationship between the organism and its environment becomes crucial, because organisms must (a) compete with each other for the resources they need to survive and (b) be adapted to survive in their environment. “Adapt” does not mean that individual organisms themselves adapt, but rather that their lineages adapt by virtue of the already discussed principle of descent with modification. This is crucial. These principles are effectively summarized as the struggle for resources. The principle of descent with modification is that, over generations, the descendants of organisms bear phenotypic differences from their ancestors.
The principle of the struggle for resources is that organisms better suited to obtaining resources in their environment, and therefore to surviving and reproducing (in other words, better adapted), have a higher probability of passing their genetic material on to their progeny. Put the two principles together and we have the principle of natural selection:
Over generations, the struggle for resources ensures that organisms better adapted to the environment have a higher probability of surviving and propagating their genetic material. Because of the principle of descent with modification, a population of descendants exhibits variation. As a result of this variation, some organisms will be better adapted to their environment than others by virtue of their genetic material. Thus, genetic material that confers upon its organism a higher probability of reproductive success will in turn have a higher probability of spreading through the population. As a consequence, the frequency of genes that confer such reproductive success will increase over time.
This is all stated compactly below:
Evolution: Over time, the characteristics of a lineage change.
Common Descent: All organisms have diverged from a common ancestor.
Gradualism: All organisms, however different and distant from each other, are related, some only distantly. Radical changes in phenotype and genotype have occurred through incremental processes by which lineages diverge from a common ancestor.
Gene Frequency: The method by which evolution (the change in lineages) occurs is by changes in gene frequencies of populations. It is the change in proportion of individuals which have certain characteristics that determines the characteristic divergence of a lineage.
Natural Selection: The process by which gene frequencies are altered depends on the variation among organisms in a population, and on how that variation affects each organism’s ability to survive and reproduce. The selection of some alleles over others in a population will accordingly alter the frequencies of those alleles and hence the phenotype of a lineage.
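The Gene Frequency and Natural Selection principles can be made quantitative with a very simple model. The Python sketch below assumes the standard textbook haploid model of selection (one favored allele with fitness 1 + s against an alternative with fitness 1, no genetic drift, no new mutation); the parameter values are illustrative, not measurements. Each generation, the favored allele's frequency is reweighted by its relative fitness, so even a small advantage carries it from rarity toward fixation.

```python
# A minimal sketch of selection changing allele frequency, assuming the textbook
# haploid model: one favored allele with fitness 1 + s, the alternative with
# fitness 1, an effectively infinite population (no drift) and no new mutation.

def next_frequency(p, s):
    """Frequency of the favored allele after one generation of selection."""
    mean_fitness = p * (1 + s) + (1 - p) * 1.0
    return p * (1 + s) / mean_fitness

def trajectory(p0, s, generations):
    """Allele frequency in every generation from 0 to `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs

# Illustrative values only: a rare allele (1%) with a 5% fitness advantage.
freqs = trajectory(p0=0.01, s=0.05, generations=300)
for generation in (0, 100, 200, 300):
    print(generation, round(freqs[generation], 3))
```

Setting `s` to zero leaves the frequency unchanged from one generation to the next: in this model, the frequency moves only when variants differ in reproductive success.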
What has been stated above is the central theory of modern biology. Having a firm understanding of evolution is critical for what follows.
We have seen that modern biological systems depend on crucial mutual relationships between distinct types of polymers. Before we discuss chemical evolution, we must therefore understand a fundamental principle about such mutually co-dependent systems: the principle of coevolution.
All life on this planet has a common ancestor. We therefore speak of the similarity of organisms in terms of shared characteristics. An ancestral characteristic, usually called a0, is one held by a common ancestor of the group of taxa under discussion; it then changes or remains the same depending on the course of natural selection in the different lineages descending from that ancestor. This is especially complicated because, under some circumstances, an ancestral characteristic can become more conserved than before, such as when one of its homologs becomes a pseudogene, or less conserved than before, such as when it undergoes a paralogous duplication. So we can think of an organism as a set of characteristics, and we can group organisms based on their characteristic sets. When two distinct organisms have an identical or very similar characteristic, there are two possibilities. The first is that the similarity exists because the two organisms have a common ancestor that possessed the characteristic. This is then called a homologous characteristic between the two organisms. Alternatively, the two organisms could belong to different lineages and have evolved the characteristic independently. This is called a homoplasious characteristic, and the independent evolution of such similar characteristics is called convergent evolution.
The fact that pseudogenes produced from homologs can alter the conservation of related paralogous sequences is crucial, for it is a primary mechanism for what we are talking about. (A pseudogene is typically defined as a gene that is inactive and does not produce a functional protein, which can happen for a number of reasons, such as a frameshift or a nonsense mutation.) Understanding this principle puts us in a position to understand coevolution. Coevolutionary processes occur when alterations in one thing produce alterations in another. This is very vague, and we need to state it more precisely. For our purposes, the most important principle is that altering the conservation of sequences, by introducing or altering other sequences, can create a mutual dependency between those sequences. There is no magic behind the mechanism by which introducing or altering one sequence changes the conservation of another: it merely changes the probability that mutations in that sequence will eliminate the organism carrying them through natural selection. A paralogous duplication of a conserved gene, for example, will relax the conservation on both copies, since the existence of another copy decreases the probability that a mutation in one copy will be deleterious to the organism. However, since both copies are subject to random mutation more or less independently, a common result of this setup is that each copy loses a different part of the original function, so that both become necessary and both are once again conserved. This is quite simple to understand. If the introduction or alteration of sequences can relax conservation, it can also do the opposite; in this case, it can conserve both sequences. This is a crucial component of coevolution. A nucleotide sequence being conserved does not mean it was always conserved. We will return to this later.
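How a duplication can first relax conservation and then leave both copies conserved can be illustrated with a toy simulation. The Python sketch below loosely follows the duplication-degeneration-complementation idea and rests on assumptions that are not in the essay: the ancestral gene has two hypothetical subfunctions, "A" and "B"; each mutation destroys one subfunction in one copy; selection purges any mutation that leaves the organism without A or without B; and every other loss is neutral and equally likely. Losses accumulate until no further loss would be tolerated.

```python
import random

# Toy model of a freshly duplicated gene. Assumptions (not from the essay): the
# gene performs two hypothetical subfunctions "A" and "B"; each mutation destroys
# one subfunction in one copy; selection purges any mutation that leaves the
# organism without A or without B; all other losses are neutral.

REQUIRED = frozenset({"A", "B"})

def tolerated_losses(copies):
    """All (copy index, subfunction) losses that selection would not purge."""
    options = []
    for i, copy in enumerate(copies):
        for f in sorted(copy):
            trial = [c - {f} if j == i else c for j, c in enumerate(copies)]
            if REQUIRED <= set().union(*trial):
                options.append((i, f))
    return options

def run_lineage(rng):
    """Accumulate tolerated losses until every further loss would be purged."""
    copies = [set(REQUIRED), set(REQUIRED)]
    while True:
        options = tolerated_losses(copies)
        if not options:
            return copies
        i, f = rng.choice(options)
        copies[i] = copies[i] - {f}

def outcome(copies):
    """Classify a final state: one copy degraded away, or both copies required."""
    sizes = sorted(len(c) for c in copies)
    return "one copy lost" if sizes == [0, 2] else "both conserved"

rng = random.Random(1)
results = [outcome(run_lineage(rng)) for _ in range(2000)]
print(results.count("both conserved") / len(results))
```

In this idealized setup, roughly half of the lineages end with one copy degraded into a pseudogene and half with two complementary copies, each now indispensable. The even split is an artifact of the toy's equal-probability assumption, not an empirical estimate.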
This is an example of a fundamental principle we must understand and must bear in mind as we discuss what follows.
During the course of this essay, the complex and mutual relationships between DNA and protein have been stressed repeatedly in terms of understanding what life is in chemical terms. The extent of these mutual relations, and the degree to which the two require each other, may have led some readers to think that their evolution as self-replicating systems would not be possible.
I suppose this is the point in the essay at which this confusion is exploded.
We haven’t discussed in depth how modern evolutionary theory incorporates our understanding of life at the molecular level; like many other topics not covered in this essay, it is too large and complex to include here. We can, however, give a short discussion of chemical evolution. Previously, our discussion referred almost entirely to the relationship between DNA and protein. Now, however, RNA takes center stage. It makes perfect sense to discuss chemical evolution here, because we are supposed to be discussing life in fundamental chemical terms, and it doesn’t get much more fundamental than this.
Previously, our discussion focused on modern biological life; in other words, the systems that exist today. The processes we discussed are so fundamental to these systems that they have been in place for a very long time: the fundamental processes of transcription and translation have existed for some 3 billion years (the differences between prokaryotes and eukaryotes came later). That discussion focused almost entirely on DNA and protein, and emphasized the fundamental basis of biological life as the interaction between polymers that can guide their own synthesis and polymers whose sequences give rise to physical properties that maintain the chemical system, which in turn replicates under the control of the self-templating polymers. Nonetheless, these complex interlocking systems did not exist in the most primitive forms of life, which gave rise to the cellular processes upon which all modern life is based. In other words, we have to discuss chemical evolution in order to make sense of the emergence of modern biological systems, which involve a mutual dependency between fundamentally different polymers.
In order to understand how the modern cell (defined henceforth as a cell which performs the central processes of protein biosynthesis and DNA replication as given by the central dogma of molecular biology) arose, we must turn our attention to RNA. Previously, RNA molecules were relegated to the role of intermediary molecules. Now they take center stage. RNA is crucial to understanding the rise of modern cells performing protein biosynthesis.
RNA molecules have been studied extensively in vitro, and as a consequence, a possible scenario for the early history of life on Earth, which fits best with the in vitro evidence gathered over the last 25 years, and certain features in modern cellular structures, has emerged as a leading candidate to explain the evolution of the most primitive systems that could be called “biological”. This scenario, called the RNA world hypothesis, is sketched out in some detail below. We will not be going further backward than the RNA world, since we know little about the hypothetical RNA world as it is, let alone what may have come before.
We have seen that RNA molecules can serve functions in cellular processes by virtue of templated polymerization and the ability of short RNA sequences to fold into distinct conformations through hydrogen bonding. This ability should have hinted at something crucial. Primitive forms of life, before the rise of protein biosynthesis, maintained a chemical system and encoded this system in the same molecules. In modern cells, these processes are closely linked but separate. DNA molecules serve their function solely by virtue of their sequence and retain a linear structure. Polypeptides, on the other hand, bristle with reactive chemical groups and serve their function by virtue of their 3D structure, but they cannot encode their own replication. Because proteins cannot encode their own replication, it is thought that protein biosynthesis was not present in the most primitive forms of life before the rise of modern cells. The crucial point is that, before the rise of protein biosynthesis, RNA molecules are thought to have served simultaneously as templates and as the central catalysts of the templated polymerization of RNA. When an RNA molecule serves as a catalyst, it is called a ribozyme. Ribozymes have been almost entirely superseded by polypeptides, but they remain in modern cells in fundamental processes such as protein biosynthesis and splicing, and are thought to be a very ancient feature of cellular life. Like polypeptides, RNA molecules can fold into preferred conformations on the basis of their sequence. Also like polypeptides, this folding can create active sites that allow RNA molecules to act as catalysts, and RNA molecules can change conformation on binding particular ligands, such as small molecules or other RNA molecules. Unlike polypeptides, RNA molecules can template their own polymerization.
Putting these four properties together, and recalling from our previous discussion how these properties underlie everything in modern biology, it is reasonable to conclude that an RNA world before the rise of protein biosynthesis could have reached a high level of biological sophistication.
Let us expand on this in more detail. A catalytic process that would have been crucial in the early RNA world is the formation of the phosphodiester bond linking RNA monomers together, with other RNA polymers serving as templates for their polymerization. This original process would have been far more primitive than the one in place today, which responds to very complex intracellular feedback loops and has extensive mechanisms to ensure that the fidelity of the genome is maintained. However, the lack of sophistication of templated polymerization in the RNA world would have posed less of a problem for ribozymes, since they were much shorter than modern genomes. Today, the maintenance of genomes hundreds of millions of base pairs long is only possible because of error-correction mechanisms. We shall shortly see how this process evolved by applying the principle of coevolution to the early history of DNA.
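The point about length can be put in numbers. Assuming, purely for illustration, that every base is miscopied independently with the same probability, a copy of length L is error-free with probability (1 - error rate)^L. The Python sketch below uses a hypothetical error rate of one mistake per thousand bases, which is not a measured value for any ribozyme or polymerase.

```python
# A minimal sketch, assuming each base is miscopied independently with the same
# probability. The error rate below is an arbitrary illustrative number, not a
# measured value for any ribozyme or polymerase.

def error_free_probability(length, error_rate):
    """Probability that a copy of `length` bases contains no errors at all."""
    return (1.0 - error_rate) ** length

def expected_errors(length, error_rate):
    """Mean number of miscopied bases per copy."""
    return length * error_rate

ERROR_RATE = 1e-3  # hypothetical: one mistake per thousand bases copied

for length in (100, 10_000, 1_000_000):
    print(length, expected_errors(length, ERROR_RATE),
          error_free_probability(length, ERROR_RATE))
```

With this rate, a 100-base ribozyme is copied perfectly about nine times in ten, while a million-base genome essentially never is. Keeping the error-free probability high as length grows requires lowering the error rate roughly in proportion, which is the job the essay assigns to error correction.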
The existence of ribozymes which catalyzed the formation of the phosphodiester linkage on RNA templates presents opportunities for the process of natural selection to act on RNA based systems. In pools of RNA molecules, the existence of ribozymes whose catalytic properties allow them to replicate RNA templates allows for the development of cooperative systems of RNA molecules. This process would have allowed for the propagation of RNA templates which, while not able to catalyze the phosphodiester linkage leading directly to their own replication, can catalyze other reactions which aid the system in maintaining and propagating itself such as the sequestering of particular metabolites. If a particular RNA can serve to catalyze the formation of the phosphodiester bond, it can serve as a catalyst not just for the replication of other RNA, but also itself in a positive feedback loop. Such an RNA is known as an autocatalyst. Autocatalysis was crucial in the early history of life on Earth, because the catalysis of the formation of the phosphodiester bond linking RNA molecules together is central to the process of replication and evolution of RNA based systems.
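The positive feedback of autocatalysis can be illustrated with a toy growth model. In the Python sketch below, which assumes an unlimited supply of monomers and a hypothetical rate constant, an autocatalytic replicase catalyzes the synthesis of new copies of itself, so the amount of catalyst grows along with the product; for comparison, the same product is made by a fixed amount of some other catalyst.

```python
# A minimal sketch of why autocatalysis matters, assuming unlimited monomers and
# a hypothetical rate constant. "Autocatalytic" means each existing copy also
# catalyzes the production of new copies; the comparison case is a product made
# by a fixed amount of some other catalyst.

def grow(initial, rate, steps, autocatalytic, fixed_catalyst=1.0):
    """Amount of product after each discrete time step."""
    amount = initial
    history = [amount]
    for _ in range(steps):
        catalyst = amount if autocatalytic else fixed_catalyst
        amount = amount + rate * catalyst
        history.append(amount)
    return history

auto = grow(initial=1.0, rate=0.1, steps=50, autocatalytic=True)
fixed = grow(initial=1.0, rate=0.1, steps=50, autocatalytic=False)
print(round(auto[-1], 1), round(fixed[-1], 1))
```

Because the autocatalytic rate is proportional to the amount already present, growth is exponential rather than linear. Real RNA pools would be limited by the supply of activated monomers, which this sketch ignores.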
Primitive processes that we would recognize as precursors of the modern DNA-protein system would have involved mutually beneficial sets of RNA, in which ribozymes capable of propagating RNA templates catalyzed the replication of RNA molecules in their immediate vicinity. Such advantageous ribozymes could then spread through a population of other ribozymes. Because the process by which ribozymes are duplicated is subject to the same descent with modification as modern copying processes, we can apply the principle of natural selection to it. Ribozymes that confer particular advantages on the systems to which they are confined will survive and propagate by directing resources toward the continuation of those systems, at the expense of systems lacking them. Hence, during ribozymal replication, those advantageous ribozymes will increase in frequency in a given pool of ribozymes.
A crucial step for competing systems of ribozymes would therefore have been the development of distinct compartments separating them. By ensuring that particular advantageous ribozymes are confined to particular systems, compartments allow natural selection to act on physically distinct systems.
The formation of distinct vesicular compartments, in which biological molecules are separated from their surroundings, long predates the rise of sophisticated modern cells. It occurs because the molecules that make up cell membranes have properties that allow them to assemble spontaneously into a spherical membrane in an aqueous environment. This is one of many reasons why water has left a permanent stamp on our biology. All the processes under discussion in modern cells occur in aqueous environments. So too do all the processes we are discussing now, in which mutually cooperative systems of ribozymes compete with other such systems. A spherical membrane can form spontaneously in such an environment because the molecules that constitute it are phospholipids, as shown below:
Phospholipids are able to perform this function because their long alkyl chains, ester-linked to glycerol, are hydrophobic, while the phosphate group is hydrophilic. Phospholipid molecules therefore assemble spontaneously into the most thermodynamically favorable arrangement, in which the hydrophobic alkyl chains aggregate to exclude water and the phosphate groups contact the surrounding aqueous environment. As a consequence, a spherical bilayer forms:
Thus far we have discussed systems that rely solely on RNA-based replication and metabolism. To discuss how this was replaced by a DNA-protein system in which RNA was relegated to the role of intermediary, we must apply the principles of coevolution discussed above.
Ribozymes have been studied extensively since they were discovered in 1982, and some are known to bind amino acids. It is still not known how RNA molecules initially came to template amino acid sequences by means of codons, as a precursor to modern encoding by DNA. In vitro analysis suggests a correlation between the frequency of codons for particular amino acids in ribozymes and the tendency of those ribozymes to bind those amino acids. The imperfection of this correlation, however, means that its interpretation as the origin of the current system of codons is still open to debate, especially because the modern process does not involve direct matching between a codon and its amino acid anyway. Instead, each tRNA carries a particular amino acid and bears the anticodon complementary to particular triplets. It is known, however, that amino-acid-specific ribozymes catalyzing the aminoacylation reaction, ribozymes catalyzing the peptidyl transferase reaction, and ribozymes capable of binding particular amino acids can all be obtained in vitro from pools of randomly generated sequences. Therefore a process using some RNAs as templates, other RNAs as carriers that hold amino acids in place on a particular template sequence, and a third RNA as a catalyst could plausibly have evolved in these primitive cells.
It is certainly worth noting that even in modern cells, the central peptidyl transferase reaction of the ribosome is still catalyzed by rRNA. It is thought that this feature reflects the descent of modern cells from RNA-based systems in which the binding of amino acids to RNA templates would have conferred particular advantages on those RNAs. In vitro, from randomly generated sequences, it is possible to obtain RNAs that catalyze the formation of the phosphodiester linkage, and others that catalyze the formation of amide linkages. It is thought, therefore, that the DNA-protein system evolved from a primitive precursor in which ribozymes held the sequences of short polypeptides whose sequences conferred catalytic abilities. This would have given an advantage to any RNA-based system that could sustain the synthesis of such polypeptides, because polypeptides make better catalysts than ribozymes do, largely because their 20 distinct side chains allow a much more varied range of 3D structures and noncovalent interactions. Natural selection would therefore also have favored the propagation of ribozymes that could catalyze the formation of the amide linkage on this template. Only a very small fraction of possible polypeptide sequences would fold into a 3D structure conferring catalytic benefits on ribozymal systems, but because those sequences would have been naturally selected for (in other words, the ribozymes encoding those sequences would have been selected for) out of the enormous pool of ribozymes in the RNA world, they would have propagated with greater frequency. Once again, we apply the principle of natural selection, this time to the formation of short polypeptides from ribozymes.
Copying errors during the duplication of these RNA-based systems would have produced variation in their sequences, and hence variation in the polypeptide sequences they encoded; variants would then be selected for if they conferred benefits.
Now, if you can remember that far back, apply the principle of coevolution to this process. The formation of mutually cooperative RNA-based systems, built from individual ribozymes that help to propagate the ribozymes in their immediate vicinity, would have led to the conservation of those ribozymes. In other words, their sequences would tend to become more resistant to the accumulation of random errors, since those sequences were necessary and such errors would have been deleterious, removing the system carrying them from the population by natural selection. “Resistant to random errors” simply means that those errors are less likely to propagate. It is helpful to think of the conservation of sequences as an evolutionary ratchet: a sequence that was not conserved before can become conserved because of alterations in other sequences.
We can apply the principle of coevolution to the replacement of ribozymes by polypeptides. At first glance this would appear intractable, because our modern, overwhelmingly polypeptide-based systems require catalysis by polypeptides in order to propagate. But we should know better by now. If a particular ribozyme templated a polypeptide that could perform a catalytic function more efficiently than another ribozyme within the system, then, by the principle of coevolution, such an event would relax the conservation on the sequence of that ribozyme. It is most likely that such processes led to the phasing out of ribozymes in modern cells. The nice thing about this explanation is that it also explains why the formation of the peptide bond in polypeptides is still catalyzed by RNA. That particular ribozyme’s function cannot be taken over by a polypeptide that performs it better, because any such polypeptide would itself first have to be synthesized by the very ribozyme it would replace.
What differs, therefore, between the modern central processes and those described for these primitive cells is the use of DNA rather than RNA sequences to encode the polypeptides and ribozymes that maintain the chemical system. DNA-based systems would have presented an advantage once biological complexity had reached the point at which very long nucleotide sequences were needed to store all the hereditary information of the system. Because deoxyribose lacks the 2'-hydroxyl group of ribose, DNA polymers are much more stable and resistant to breakage than RNA polymers.
In the course of this discussion of the hypothetical RNA world as the most likely scenario for early life on Earth, the application of the principles of coevolution and natural selection has been crucial. By applying the principle of coevolution to RNA-based systems, we have no conceptual difficulty with the fact that, in the modern cell, the processes that create and maintain the polypeptide-based system are themselves dependent on complex systems of polypeptides. This should be instructive to all those wishing to understand fundamental evolutionary processes.
"Physical reality” isn’t some arbitrary demarcation. It is defined in terms of what we can systematically investigate, directly or not, by means of our senses. It is preposterous to assert that the process of systematic scientific reasoning arbitrarily excludes “non-physical explanations” because the very notion of “non-physical explanation” is contradictory.