Chapter 2 Evidence for Evolution

Darwin described evolution as descent with modification. It turns out that he was not the only one to think about the ever-changing world in this way. Another prominent naturalist of the time, Alfred Russel Wallace, independently conceived the theory of evolution through natural selection. Like Darwin, Wallace conducted extensive fieldwork in the tropics and was a meticulous observer of the natural world. In 1858, Wallace wrote a letter to Darwin—who was by then an eminent scholar but had not published his views on evolution yet—detailing his own ideas about natural selection. This led to the joint publication of short abstracts detailing Darwin’s and Wallace’s views of evolution, and more importantly, it motivated Darwin to finish and publish his famous work, On the Origin of Species, in 1859. So, why does most of the credit for formalizing evolutionary theory go to Darwin rather than Wallace? Well, Darwin was undoubtedly first, ruminating on his ideas about evolution for decades before deciding to publish. As a consequence, he was able to introduce his views in much richer detail and provided many lines of evidence in support of his theory.

Explore More

To learn more about Alfred Russel Wallace, listen to “He Helped Discover Evolution, And Then Became Extinct”, an NPR story published on the 100th anniversary of his death.

So, what evidence do we have that evolution is actually happening? What is the evidence for the occurrence of change in inherited traits across successive generations, the transformation of species through time, and the emergence of new species?

As mentioned in Chapter 1, we can (must!) treat Darwin’s idea of descent with modification like every other scientific hypothesis and develop testable predictions that are falsifiable with data. The idea of descent with modification makes five predictions that we can address with data:

  1. Species change through time (microevolution).
  2. Lineages split to form new species (speciation).
  3. Novel forms are derived from earlier forms (macroevolution).
  4. Species are not independent but connected by descent from a common ancestor (common ancestry and homology).
  5. Earth and life on Earth are old (deep time).

This chapter takes a closer look at the different lines of evidence we have in support of evolution.

2.1 Microevolution

Microevolution is the change in inherited traits of a population from one generation to the next, ultimately leading to the accumulation of changes and the transformation of species through extended time periods. A heritable trait in this context can be any phenotypic trait (for example, the average beak size in a population of birds) or a molecular trait (for example, the frequency of alternative alleles at a particular locus). While changes in most traits from one generation to the next are subtle at best, strong natural selection can lead to significant and detectable evolutionary changes in very short periods of time. For example, check out the brief video produced by the Kishony Lab at Harvard Medical School below. They have designed a simple way to observe how bacteria evolve as they encounter increasingly higher doses of an antibiotic and adapt to survive, and even thrive, despite it.

You might say that bacteria are different. After all, assuming a generation time of 30 minutes, the two-week experiment described in the video represents over 670 generations of bacterial evolution. Translated to humans, that would represent about 17,000 years. That far back in history, humans lived exclusively as hunter-gatherers and had only just begun to migrate into North America over the Bering Land Bridge.
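The arithmetic behind this comparison can be sketched in a few lines of Python. The 30-minute generation time and the two-week duration come from the text; the human generation time of 25 years is my assumption, chosen because it reproduces the figure of roughly 17,000 years, and the function name is purely illustrative.

```python
# Back-of-the-envelope generation arithmetic for the bacteria-to-human comparison.
MINUTES_PER_DAY = 24 * 60


def generations_elapsed(days, generation_time_minutes):
    """Number of generations that fit into a period of the given length."""
    return days * MINUTES_PER_DAY / generation_time_minutes


# Values stated in the text: a two-week experiment, 30 minutes per generation.
bacterial_generations = generations_elapsed(days=14, generation_time_minutes=30)

# ASSUMPTION: a human generation time of 25 years (not stated in the text).
HUMAN_GENERATION_YEARS = 25
human_years = bacterial_generations * HUMAN_GENERATION_YEARS

print(bacterial_generations)  # 672.0
print(human_years)            # 16800.0
```

Changing the assumed human generation time shifts the result proportionally, so the number is best read as an order-of-magnitude comparison rather than a precise date.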

One of the most persistent misconceptions about evolution is that it takes millions of years to occur. In reality, microevolution can, in principle, happen in as little as one generation. Those short-term changes can be very hard to detect, because the measurement error of a trait of interest is often larger than the actual per-generation evolutionary change. Nonetheless, over the course of just a handful of generations, natural populations may exhibit significant evolutionary change that we can detect with high confidence using either genetic markers (i.e., measuring changes in allele frequencies) or phenotypic measurements.

The convergence of ecological and evolutionary timescales is a relatively recent insight. Darwin did not think that we would be able to directly observe evolutionary change over short periods of time:

“We see nothing of these slow changes in progress, until the hand of time has marked the lapse of ages.”

— Darwin, 1859

However, with technological breakthroughs that improved the precision of measurements we take in natural populations and with scientists’ ability to track populations continuously through time, we have accumulated data across dozens of study systems—from microbes to vertebrates—documenting microevolutionary change within a few to a few dozen generations (see Hairston et al. 2005; Carroll et al. 2007). Here, I will briefly introduce you to evidence for rapid evolution gathered in one such study system (the threespine stickleback). In this chapter’s case study, you will explore another example based on a time series of beak size variation of a species of Darwin’s finch.

2.1.1 The Case of Threespine Stickleback

Threespine stickleback (Gasterosteus aculeatus; Figure 2.1) are a widely used system to study evolution and have been shown to rapidly adapt to novel environmental conditions. Stickleback are primarily marine and inhabit coastal waters throughout much of the Northern Hemisphere. They are small fish (usually less than 8 cm in length) that exhibit exquisite adaptations to avoid predation in their environment: the sides of their body are covered in bony plates, and they have spines associated with their dorsal and pelvic fins that, when spread out, can dissuade a predator from capturing or consuming them.

Figure 2.1: Threespine stickleback (Gasterosteus aculeatus). Photo by Gilles San Martin, CC BY-SA 2.0, via Wikimedia Commons

Since the last ice age, as glaciers retreated and left behind a plethora of new streams and lakes, stickleback have also colonized freshwater habitats, which differ in many ways from the original marine habitats. Freshwaters not only exhibit a different water chemistry, but they also tend to harbor fewer predators and different food resources. So, over the past 10,000-20,000 years, stickleback in freshwater environments have evolved a number of phenotypic differences compared to their marine ancestors, including a drastic reduction of the armor plates along the body and—in some instances—a loss of the pelvic spines (Jones et al. 2012). Moreover, stickleback have also adapted to different niches within freshwaters, and there are distinct morphs in streams (vs. lakes), and in benthic and pelagic habitats within lakes (Hendry et al. 2013). Different freshwater ecotypes exhibit distinct body shapes and colorations and are adapted to consuming different types of food items.

So, how long might it take for the evolution of the traits that vary so drastically across different stickleback forms? Sure, 20,000 years is a blink of an eye in the history of life on the planet, but it is still an eternity for any researcher that might want to observe evolution in action!

One hint at how fast stickleback might evolve comes from a fascinating natural experiment. In 1964, the Great Alaska Earthquake brought widespread destruction to the region and literally reshaped the regional topography. For example, multiple islands in the Prince William Sound and the Gulf of Alaska were lifted further out of the ocean, creating new freshwater ponds where previously there were none. In the time since the earthquake, stickleback have colonized these new freshwater ponds, and within just 50 years, they have evolved phenotypic traits similar to those we know from stickleback in continental freshwaters (Lescak et al. 2015). Hence, adaptation to freshwaters upon colonization from the ocean may occur in a matter of a few decades rather than gradually over thousands of years of evolution.

To get a better understanding of just how fast evolution may proceed, researchers from the University of Basel in Switzerland decided to conduct a manipulative field experiment using lake and stream stickleback (Laurentino et al. 2020). The researchers first sequenced the genomes of lake and stream stickleback to detect the genomic regions that are differentiated between ecotypes and likely contain the genes involved in shaping the phenotypic differences between them. After that, they generated F2 crosses between the ecotypes, which is like shuffling a deck of cards from a genomic perspective: individual F2 offspring essentially exhibit a random mixture of genomic segments from their lake and stream ancestors. If these F2 hybrids were introduced into a stream environment, it is predicted that individuals that exhibit stream alleles in regions of the genome important for the expression of stream-specific phenotypic traits perform better than individuals with lake alleles. At a population level, this should lead to an increase in the frequency of alleles characteristic for stream stickleback. And… this is exactly what happened when the researchers actually conducted the experiment! More importantly, the predicted genetic changes were actually detectable within just one generation of F2 individuals being released into a stream habitat. So, when selection is strong and researchers have the capability to measure changes in traits with adequate precision, we can actually detect the small, generation-to-generation changes that ultimately accumulate to give rise to more conspicuous evolutionary changes that are also easier to detect.
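To make the logic of this prediction concrete, here is a minimal sketch of how selection can shift an allele frequency within a single generation. It is a textbook one-locus viability-selection model, not the analysis used by Laurentino et al. (2020), who tracked many genomic regions at once. The selection coefficient `s`, the additive fitness scheme, and the function name are all assumptions made for illustration; the starting frequency of 0.5 reflects the even mixture of lake and stream alleles expected in an F2 cross.

```python
def stream_allele_frequency_after_selection(p, s):
    """Frequency of the stream allele after one round of viability selection.

    p: frequency of the stream allele before selection (0..1)
    s: hypothetical fitness cost of carrying two lake alleles in a stream (0..1)
    """
    q = 1.0 - p
    # ASSUMPTION: additive fitness effects; genotypes are stream/stream,
    # stream/lake, and lake/lake.
    w_ss, w_sl, w_ll = 1.0, 1.0 - s / 2.0, 1.0 - s
    # Mean fitness, with genotype frequencies p^2, 2pq, and q^2 before selection.
    mean_fitness = p * p * w_ss + 2.0 * p * q * w_sl + q * q * w_ll
    # Stream alleles among survivors: all alleles of stream/stream individuals
    # plus half of the alleles carried by stream/lake individuals.
    return (p * p * w_ss + p * q * w_sl) / mean_fitness


p_before = 0.5   # F2 hybrids carry lake and stream alleles in equal proportion
s_assumed = 0.2  # hypothetical value, not an estimate from the study
p_after = stream_allele_frequency_after_selection(p_before, s_assumed)
print(p_before, p_after)
```

Even with a modest assumed cost, the stream allele becomes more common among survivors after one generation. Detecting a shift of this size in a real population depends on sampling enough individuals, which is why measurement precision matters so much in such experiments.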

2.2 Speciation

Speciation is the process by which new species arise. Before we dive into how speciation actually works, we should agree on what species actually are:

Definition: Species

A biological species is a group of organisms that can reproduce with one another in nature and produce fertile offspring. Species are characterized by the fact that they are reproductively isolated from other groups, which means that the organisms in one species are incapable of reproducing with organisms in another species.

In phylogenetic trees, speciation is depicted as a singular point that represents the moment one lineage splits into two (red circle in Figure 2.2). Although speciation can occur instantaneously, for example when polyploidization is involved (see Chapter 11), new species typically evolve gradually, from a single variable population, to populations within a species that are differentiated but still connected through gene flow, to distinct species that are completely isolated from each other (Figure 2.2). Movement along this “speciation continuum” is driven by the accumulation of reproductive barriers that prevent individuals from mating or successfully producing offspring with each other. Importantly, movement along the speciation continuum can be bidirectional, and reproductive barriers can disappear such that two species merge back together into one. If speciation is a gradual process, we should be able to observe all stages along the speciation continuum in nature, not just the endpoints of the speciation process with reproductively isolated species as insinuated by phylogenetic trees.

Figure 2.2: Speciation is not typically an instantaneous process. Rather species evolve gradually along a speciation continuum.

2.2.1 Ring Species

One phenomenon that perfectly illustrates the speciation continuum and provides evidence for ongoing speciation is the so-called ring species, in which two reproductively isolated populations living sympatrically (purple and yellow in Figure 2.3) are connected by a geographic ring of populations that can interbreed. Such ring species arise when an original population disperses around a geographic barrier, and populations diverge gradually, for example as a consequence of adaptation to local environmental conditions. Once populations come into sympatry again behind the geographic barrier, sufficient differences have accumulated such that they can no longer interbreed with each other.

Figure 2.3: A schematic representing a ring species. Individuals are able to successfully reproduce with members of adjacent populations, as indicated by the black arrows. However, as populations disperse around a geographic barrier and diverge gradually, they are unable to reproduce when they come into contact again. This process represents a form of speciation occurring with gene flow.

Several well-studied examples of ring species exist. For example, plethodontid salamanders of the Ensatina eschscholtzii complex have colonized different parts of California from the north and expanded southward around the Central Valley, which represents unsuitable habitat for salamanders. At the southern tip of the Central Valley, salamander populations from the eastern and western mountain ridges that surround the valley came into secondary contact and are unable to interbreed due to the genetic changes that have accumulated during evolution in isolation (Pereira et al. 2011). Other examples of ring species include the herring and lesser black-backed gulls (genus Larus) that have a circumpolar distribution and cannot interbreed in northern Europe. In addition, greenish warblers (Phylloscopus trochiloides) form a ring species around the Himalayas (Irwin et al. 2005).

2.2.2 Catching Speciation in Action

Evidence for ongoing speciation comes from a wide variety of study systems that do not necessarily occur in a ring. Especially when populations are subject to strong natural selection (for example, if they inhabit different habitat types), we observe not only adaptive differentiation across populations but also the inadvertent emergence of reproductive isolation. So, as populations acquire traits that make them better suited to the environments they inhabit, individuals also tend to stop interbreeding with individuals from different populations that have different traits. This is called ecological speciation, and we explore this concept in more detail in Chapter 11. As a consequence, we can often find populations along various stages of the speciation continuum.

Timema stick insects are a great example. These insects are distributed in the western United States, and different species of Timema have adapted to live and feed on different host plants. While most species are uniformly green, gray, or brown and live on broad-leaved host plants, several species have independently evolved a dorsal stripe that provides camouflage on needle-like leaves, providing protection from predators (Figure 2.4). In at least one species, T. cristinae, both uniformly colored and striped populations exist depending on whether they live on host plants with broad or needle-like leaves. While these phenotypic differences within T. cristinae parallel differences between other Timema species, reproductive isolation between different T. cristinae populations utilizing alternative host plants is still incomplete (Nosil 2007). Hence, the different ecotypes represent intermediate stages of speciation. Similar variation along the speciation continuum has been uncovered in a wide variety of other natural study systems, and we will get to know some of them in more detail later in the book.

Figure 2.4: Timema stick insects have adapted to different host plants. Left: T. cristinae on one of its hosts, Greenbark (Ceanothus spinosus). Photo by Aaron C, public domain. Right: T. poppensis on its host, Redwood (Sequoia sempervirens). Photo by Moritz Muschick, CC BY 2.0.

2.3 Macroevolution

When discussing evidence for microevolution and speciation, we primarily turned to evidence from observations of living forms and from experiments. Here, we will consider a different type of evidence, fossils, which are simply the remains of prehistoric organisms. Fossils are usually petrified or preserved as a mold in rock, although the rapid thawing of permafrost accelerated by climate change has also revealed frozen fossils with amazing preservation of soft tissues. The mere fact that fossils exist and often represent forms that are clearly distinct from any organisms alive today (Figure 2.5) is testament to the ever-changing faunas and floras that have inhabited Earth through time. The fossil record also teaches us that extinction is an important aspect of evolutionary change, just like the generation of novel forms.

Explore More

If you are interested in learning more about what fossils teach us about evolution, I highly recommend Donald R. Prothero’s book “Evolution: What the Fossils Say and Why It Matters”.

Figure 2.5: A woolly mammoth (left) and an American mastodon (right) facing each other, showing the physical differences between the two extinct animals. Illustration by Dantheman9758, CC BY-SA 3.0.

2.3.1 Geographic and Temporal Patterns of Succession

Perhaps the strongest evidence fossils provide for Darwin’s notion of evolution is that they are not randomly distributed, neither in terms of geography nor time. The constant cycle of descent with modification and extinction has created predictable patterns in fossil deposits across the planet, which is evident as a succession of novel forms that are derived from earlier ones.

Patterns of succession are evident geographically, because there tends to be a regional correspondence between fossils and living forms. That is, we tend to find fossil relatives of extant species in the same areas where extant species live today. For example, marsupial fossils are particularly common in Australia just like living marsupials today. This is also reflected in the biogeographic distribution of extant forms, which frequently tracks the changing landmasses driven by continental drift. So marsupials are not only found in Australia but also South America, which were once connected through Antarctica in a large landmass called Gondwana until about 140 million years ago. The non-random distribution of fossils means that we can develop hypotheses and test predictions about the fossil record. For example, if the biogeographic distribution of marsupials was the consequence of a once contiguous distribution on Gondwana, we would predict the presence of marsupial fossils on Antarctica, which have indeed been found (Woodburne & Zinsmeister 1982).

Patterns of succession are also evident temporally, with fossils exhibiting more ancestral traits being found in older layers of rock compared to derived forms. Successively deposited rock layers can therefore shed light on the temporal dynamics of trait evolution and reveal entire time series that connect forms with disparate phenotypes. For example, we know that horses with their single hoof descended from multi-toed ancestors because we have a time series of horse fossils that illustrates the toe reductions through time (Figure 2.6).

Figure 2.6: Equine evolution, composed from skeletons of the State Museum for Natural History Karlsruhe, Germany. From left to right: Size development, biometrical changes in the cranium, reduction of toes on the left forefoot. Image by H. Zell, CC BY-SA 3.0, via Wikimedia Commons.

2.3.2 Transitional Fossils

If novel forms are indeed descendants of earlier forms, the fossil record should capture evidence of what Darwin called the “transmutation of species”. Hence, we should be able to find transitional fossils that exhibit traits common to both an ancestral group and its derived descendants. The first, and perhaps still one of the most spectacular, transitional fossils ever found was Archaeopteryx (Figure 2.7), which shared traits with Mesozoic dinosaurs and modern birds. Its jaws contained sharp teeth, and it had wings with fingers and claws, a long bony tail, hyperextensible second toes (like Velociraptor’s), and feathers. Hence, Archaeopteryx represents a transitional fossil between modern birds and non-avian dinosaurs (Ostrom 1976). Similar transitional fossils have been discovered that document the transition from aquatic to terrestrial vertebrates (Tiktaalik), from terrestrial to marine mammals (Pakicetus, Ambulocetus, and Remingtonocetus), and from quadrupedal to bipedal hominids (Australopithecus afarensis), among many others.

Figure 2.7: Archaeopteryx lithographica, specimen displayed at the Museum of Natural History, Berlin, Germany. Photo by H. Raab, CC BY-SA 3.0, via Wikimedia Commons.

2.4 Common Ancestry and Homology

A key prediction of Darwin’s notion of evolution is that species are not independent but connected by descent from a common ancestor. Phylogenetic trees (Figure 2.8) are representations of that connectedness, and if we were to travel back in time toward the root of the phylogenetic tree, we would expect lineages to merge into the origin of life, the original being from which all living forms descended. In the absence of time travel, the critical evidence for common ancestry of all life is homology.

Definition: Homology

Homology is the similarity of the structure, physiology, or development of different species based upon their descent from a common evolutionary ancestor.

Figure 2.8: The original phylogenetic trees Darwin used to illustrate common ancestry. Left: The Tree of Life image that appeared in Darwin’s On the Origin of Species by Means of Natural Selection (1859). It was the book’s only illustration. Right: Charles Darwin’s original sketch, his first diagram of an evolutionary tree, from his First Notebook on Transmutation of Species (1837). Illustrations by Charles Darwin, Public Domain.

Homology explains why all forms of life share certain characteristics. All forms of life, from microbes to plants and animals, share the same molecular building blocks: lipids that form the boundaries of cells and organelles, nucleic acids that encode information, proteins that play both structural and catalytic roles, and glycans that serve structure, energy storage, and regulatory purposes (Marth 2008). Different life forms, even those separated by billions of years of evolution, share these building blocks because they inherited them from a common ancestor.

Homologies also occur at a narrower scope. For example, Darwin was puzzled by the structural similarity in the forelimbs of terrestrial vertebrates even though they serve entirely different functions (Figure 2.9):

“What can be more curious than that the hand of a man, formed for grasping, that of a mole for digging, the leg of the horse, the paddle of the porpoise, and the wing of the bat, should all be constructed on the same pattern, and should include the same bones, in the same relative positions?”

— Darwin, 1859

How do these similarities in structure arise? Again, it is because all terrestrial vertebrates inherited this shared limb structure from their common ancestor. Rather than “inventing” different types of forelimbs for different purposes (grabbing, running, flying, swimming), evolution has gradually modified and repurposed existing forelimb structures for new functions.

Figure 2.9: Limbs of terrestrial vertebrates exhibit the same structure, with homologous bones (color-coded) that are arranged in the same order irrespective of function. Illustration by Волков Владислав Петрович, CC BY-SA 4.0, via Wikimedia Commons.

Homologies occur in nested sets. Closely related species exhibit a higher number of homologies, because they inherited those traits from a shared ancestor. More distantly related taxa exhibit differences in their traits, because they have been on independent evolutionary trajectories for prolonged periods of time. Accordingly, analyses of homologous structures are used to infer phylogenetic relationships among taxa. We group species that share a lot of homologous traits closely together on a phylogenetic tree, while those that share few are further apart (see Chapter 7 for more information).

Important Note

Not all trait similarities are the consequence of homology! Analogy in biology describes similarity of function and superficial resemblance of structures that have different origins. For example, the wings of a fly, a moth, and a bird are analogous, because they evolved independently as adaptations to a common function (flying). Analogies are a consequence of convergent evolution, where unrelated lineages evolve similar traits, typically as adaptations to similar lifestyles or environmental conditions.

2.4.1 Shared Flaws

Perhaps the most compelling evidence for common ancestry comes from homologous structures that serve no purpose at all. The origin of these structures cannot be explained by adaptive evolution, and the only reason that some organisms exhibit functionless structures is because they have inherited them from an ancestor. I want to briefly introduce two such cases, vestigial organs and pseudogenes.

Vestigial Organs

Vestigial organs are rudimentary traits that have lost some or all of the ancestral functions of the structure. Classic examples of vestigial organs in humans are the appendix (vestigial caecum), the coccyx (vestigial tail), and some muscles connected to the ear, which allow for ear mobility in other primates. Structural vestigial organs in other animals include remnants of limbs that are still expressed in some whales, snakes, and flightless birds (Figure 2.10).

Figure 2.10: Vestigial limbs are common in tetrapods. A. Skeleton of a baleen whale showing the vestigial hindlegs (structure c). Illustration from Meyers Konversations-Lexikon, Public Domain. B. Vestigial hindlegs (spurs) in a Boa constrictor. Photo by Stefan3345, CC BY-SA 4.0. C. Little spotted kiwi (Apteryx owenii) have vestigial wings that are completely invisible below the plumage. Photo by Judi Lapsley Miller, CC BY 4.0.

One of the most fascinating examples of vestigiality comes from Mexican cavefish (Astyanax mexicanus) that are eyeless and completely blind as adults (Figure 2.11). Cavefish actually start growing eyes during embryonic development, and the lack of eyes in adult fish is a consequence of the abortion of eye development within the first few days of embryonic growth. The eye in this case is a developmental (rather than a structural) vestigial organ that makes a transient appearance during certain embryonic stages. More importantly, cavefish actually possess all the genes required for the normal development of an eye. It turns out that eye abortion is initiated by signaling factors associated with the developing cavefish lens. Transplantation of a surface fish lens into a developing cavefish actually leads to the normal formation of an eye, just like transplantation of a cavefish lens into a surface fish embryo leads to eye abortion (Krishnan and Rohner 2017). If cavefish had originated independently, there would be no need for evolution to “create” the developmental machinery for eye development. Cavefish simply have that machinery because they lost eyes secondarily and inherited all the information for making an eye from their eyed surface ancestors.

Figure 2.11: Different forms of Astyanax mexicanus. Left: Cave form, which is completely blind and lacks body pigmentation as an adult. Right: An individual from a surface stream for comparison. Photos by Daniel Castranova, NICHD/NIH, Public Domain.

Pseudogenes

Pseudogenes are inactive copies of functional genes in the genome and represent another kind of evolutionary “flaw”. Pseudogenes arise when processed messenger RNA (mRNA) is reverse-transcribed and inserted back into the genome. Reverse transcription is typically associated with retrotransposons or the activity of retroviruses in cells. Because processed mRNA lacks introns and other genetic elements important for transcription and translation, the complementary DNA (cDNA) that is built back into the genome ends up being functionless. Hence, pseudogenes essentially represent junk DNA that is invisible not only to the cellular machinery responsible for protein synthesis, but to natural selection as well.

The reason that pseudogenes are invisible to natural selection is that they make no contribution to the phenotype of an organism, neither good nor bad. While copy mistakes (i.e., mutations) that impair the function of normal genes are usually eliminated by selection, similar mutations in pseudogenes have no effect and simply linger. As generations pass, pseudogenes consequently tend to accumulate more and more mutations compared to the original functional genes they originated from. Since mutations in the genome occur at predictable rates, we can compare pseudogenes to their functional equivalents to estimate when pseudogenes first arose. Conducting such analyses for pseudogenes in the human genome revealed that some of them are really old, much older in fact than the human species! This suggests that those pseudogenes must have arisen in an ancient ancestor, which we share with other closely related species. If this is the case, we should be able to find the same pseudogenes in other primates, but only in those that diverged from a common ancestor after the origin of the pseudogene.
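The dating logic described here can be sketched as a simple molecular clock calculation. This is an illustration under strong assumptions rather than the method of any particular study: the two short sequences are invented, the substitution rate is a hypothetical placeholder, all differences are assumed to have accumulated in the pseudogene while the functional copy stayed unchanged, and the Jukes-Cantor correction is just one common way to account for repeated substitutions at the same site.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Estimated substitutions per site between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    p = differences / len(seq_a)  # raw proportion of differing sites
    if p >= 0.75:
        raise ValueError("sequences are too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def estimated_age_years(distance, rate_per_site_per_year):
    """Age of the pseudogene if substitutions accumulate at a constant rate."""
    return distance / rate_per_site_per_year


# Invented toy sequences that differ at a single site.
functional_gene = "ATGGCCAAGTTGACCGATGA"
pseudogene = "ATGGCCAAGTTGACTGATGA"

# ASSUMPTION: hypothetical neutral substitution rate, for illustration only.
ASSUMED_RATE = 1e-9

distance = jukes_cantor_distance(functional_gene, pseudogene)
print(distance, estimated_age_years(distance, ASSUMED_RATE))
```

Real analyses use much longer alignments and calibrated rates, and they must account for the fact that the functional copy also changes, albeit more slowly.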

Friedberg and Rhoads (2000) put this hypothesis to the test (Table 2.1). The oldest pseudogene they investigated (CALM II 𝛙3), which has an estimated age of about 36 million years (Myr) based on the mutational difference from the functional equivalent, is found in all five species of primates that they investigated (including divergence times between 8 and 36 Myr). In contrast, the youngest pseudogene (𝛂-Enolase 𝛙1; 11 Myr old) is only found in chimpanzees and gorillas, the only primates that have diverged from the human lineage less than 11 Myr ago. Overall, the pattern of the presence and absence of pseudogenes is consistent with common ancestry. There is no reason for species to evolve the same pseudogenes independently in a predictable pattern. Rather, species share these “flaws” simply because they were passed down from one generation to the next, even as lineages split and formed new species.

Table 2.1: Pseudogenes that Friedberg and Rhoads (2000) detected in different primates (with hamsters as an outgroup). Individual pseudogenes identified in the human genome, along with their age estimates, are listed in rows. Different species with their estimated divergence times from humans are in columns. Symbols indicate the presence (+) or absence (-) of specific pseudogenes in a particular species.
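| Pseudogene | Age | Chimpanzee (8 Myr) | Gorilla (9 Myr) | Orangutan (16 Myr) | Rhesus (25 Myr) | Capuchin (36 Myr) | Hamster (>85 Myr) |
|---|---|---|---|---|---|---|---|
| 𝛂-Enolase 𝛙1 | 11 Myr | + | + | - | - | - | - |
| AS 𝛙7 | 16 Myr | + | - | + | - | - | - |
| | 19 Myr | + | + | + | - | - | - |
| AS 𝛙1 | 25 Myr | + | + | + | + | - | - |
| AS 𝛙3 | 25 Myr | + | + | + | + | - | - |
| CALM II 𝛙3 | 36 Myr | + | + | + | + | + | - |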
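The prediction can be checked cell by cell against Table 2.1. The sketch below is only an illustration and not the analysis from the original study. It assumes a simple rule: a pseudogene should be present in every species whose divergence time from humans is no greater than the age of the pseudogene. The object names are made up for this example, the hamster divergence time (>85 Myr) is entered as 85, and the row whose name is missing from the table is labeled by its age.

```r
# Divergence times from humans (Myr), taken from Table 2.1
divergence <- c(Chimpanzee = 8, Gorilla = 9, Orangutan = 16,
                Rhesus = 25, Capuchin = 36, Hamster = 85)  # ">85" entered as 85

# Estimated pseudogene ages (Myr), taken from Table 2.1
age <- c("a-Enolase psi1" = 11, "AS psi7" = 16, "unnamed (19 Myr)" = 19,
         "AS psi1" = 25, "AS psi3" = 25, "CALM II psi3" = 36)

# Observed presence (TRUE) or absence (FALSE), one row per pseudogene
observed <- matrix(c(TRUE, TRUE,  FALSE, FALSE, FALSE, FALSE,
                     TRUE, FALSE, TRUE,  FALSE, FALSE, FALSE,
                     TRUE, TRUE,  TRUE,  FALSE, FALSE, FALSE,
                     TRUE, TRUE,  TRUE,  TRUE,  FALSE, FALSE,
                     TRUE, TRUE,  TRUE,  TRUE,  FALSE, FALSE,
                     TRUE, TRUE,  TRUE,  TRUE,  TRUE,  FALSE),
                   nrow = 6, byrow = TRUE,
                   dimnames = list(names(age), names(divergence)))

# Predicted presence under the assumed rule (age >= divergence time)
predicted <- outer(age, divergence, FUN = ">=")

# Cells in which the table and the simple rule disagree
mismatch <- observed != predicted
sum(mismatch)
which(mismatch, arr.ind = TRUE)
```

Under these assumptions, all cells but one follow the nested pattern; the exception is AS 𝛙7, which the table lists as absent in gorillas.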

2.4.2 Why Homologies Matter

The fact that evolutionary novelties occur in nested sets, as predicted by descent with modification, provides strong evidence for common ancestry. The finding also has far-reaching implications, because it underlies all biomedical research and applications. We can study DNA repair mechanisms in bacteria to learn about their role in cancer development because DNA repair mechanisms in bacteria and humans are homologous. We can study cell cycle regulation in yeast because both yeast and humans inherited the same regulatory machinery from a common ancestor. We can study drug responses in rodents because the physiological processing of many substances is mediated by homologous pathways in rodents and humans. And we can gain insights about neurophysiology and psychiatry from other primates, again because we all inherited our nervous system from a common ancestor. Evolution is the unifying theory of biology because it provides the critical framework for comparative studies among species, helping us to make sure that we are actually comparing apples to apples (i.e., homologous structures). If not, inferences from comparative analyses can be deeply flawed.

2.5 Deep Time

The last prediction of Darwin’s idea of descent with modification is that Earth and life on it are old. The study of the age of Earth and the universe is not really a subject of biology (hence, I will only touch on this briefly). The age of Earth is chiefly studied by geologists who combine isotopic analyses with an understanding of radioactive decay (radiometric dating), and they have established that Earth is about 4.54 billion years old (see Paul Braterman’s article in Scientific American if you want more information). Similarly, astronomers have estimated the age of the universe at 13.8 billion years by measuring the rate of expansion of the universe and extrapolating back to the Big Bang (see Ethan Siegel’s article in Forbes). Evidence for the age of life on Earth comes, unsurprisingly, from the fossil record. The oldest known fossils are cyanobacteria found in Australian rock formations. Radiometric dating has revealed that they are 3.5 billion years old. Hence, all evidence indicates that life on Earth has had incredibly long periods of time to evolve and create the diversity of organisms observable today.

2.6 Correspondence of Different Lines of Evidence

Inference in science is strongest when there is a clear correspondence between different lines of evidence that all support a central hypothesis. In fact, we can often use existing information to formulate testable hypotheses that then can be addressed with alternative approaches. You have already learned about one such example in the context of pseudogenes. Estimating the age of pseudogenes by tallying the number of mutations between a pseudogene and its functional equivalent led to clear predictions about the phylogenetic distribution of pseudogenes to test for common ancestry.

The discovery of Tiktaalik, a transitional fossil between aquatic and terrestrial vertebrates found by Neil Shubin and his research team (Figure 2.12), is another example of the role of interdisciplinary research in making discoveries that transform our understanding. Tiktaalik was not discovered haphazardly by a bunch of rock-loving paleontologists who were just looking for fossils. Its discovery was deliberate and a testimony to the power of the scientific method. Wanting to find a transitional fossil that exhibited characteristics of both fish and early tetrapods, Shubin and his team first turned to molecular phylogenetic analyses of vertebrates. Essentially, they used DNA sequences not only to infer the evolutionary relationships between different vertebrate groups but also to date when different lineages split from each other (we will learn exactly how this works in Chapter 7). These analyses revealed that terrestrial vertebrates (Tetrapoda) are sister to the lungfishes (Dipnoi), a lineage from which they split between 350 and 425 million years ago (Figure 2.13). Any transitional fossils that exhibit traits intermediate between the two groups should consequently be found in rock layers of about that age. So Shubin and colleagues took out a geological map of Earth in search of exposed rock formations of the correct age range, and they found some on Ellesmere Island in the Nunavut Territory of Canada. After a few disappointing field seasons, Shubin and his team indeed found a fossil with the desired traits in 2004. Tiktaalik roseae, as they named the newly discovered species, exhibited gills and scales like fish but also limb bones characteristic of today’s land animals (Daeschler et al. 2006). It was the combination of multiple approaches rooted in molecular biology, evolutionary analyses, and paleontology that ultimately led to the discovery of this missing link. Looking for corresponding evidence from different research approaches leads to the most robust inference in science, and it is an approach frequently used in evolutionary biology.

Explore More

If you want to learn more about the fascinating discovery of Tiktaalik and its implications for evolution and our own origins, I recommend you either read Neil Shubin’s book “Your Inner Fish” or watch the PBS series based on the book.

Figure 2.12: Tiktaalik roseae, artist reconstruction and cast of the fossil as displayed at The Harvard Museum of Natural History. Photo by Maggie, CC BY-NC-ND 2.0.

Figure 2.13: A simplified phylogenetic tree of vertebrates. Terrestrial vertebrates (Tetrapoda) are part of the lobe-finned fishes (Sarcopterygii) and split from their sister group (the lungfishes, Dipnoi) between 350 and 425 million years ago. The estimated range for potential transitional fossils is highlighted in gray, the age of the Tiktaalik fossil in red.

2.7 Absence of Evidence…

A frequent argument of critics of evolutionary theory (and science in general) is that we cannot explain everything, and indeed there are some major open questions that remain largely unaddressed: How did life on Earth originate? What were the characteristics of the last universal common ancestor of Archaea, Bacteria, and Eukaryotes? How and why did the eukaryotic cell arise? What did the transitional forms between some major taxonomic groups look like? Why is it that so many species reproduce by having sex?

The fact that we do not know the answers to these questions and many others like them does not undermine what we do know about evolution and science. It is precisely why evolutionary biology is an exciting field of research! More importantly, absence of evidence is not evidence of absence. For example, gaps in the fossil record and a lack of transitional forms between some taxonomic groups do not negate what we have learned about evolutionary patterns and processes. Perhaps some of these key fossils have just not been found yet, or they may be completely lost to time (because fossilization is actually a rare process). Ultimately, the probability that evolution is true based on other evidence is high enough that the lack of a specific fossil cannot call it into question. Only novel evidence (for example, new fossils that directly contradict our current understanding) has the potential to reshape evolutionary biology. In other words, the burden of proof about any inaccuracies in our current understanding of evolution lies with the critic and not with the untouched gaps in our current understanding. And as scientists, it is our day-to-day business to detect and correct those inaccuracies, rather than to defend the status quo blindly.

2.8 Case Study: Darwin’s Finches

For the first case study, we will take a closer look at some evidence for microevolution, using one of the most iconic study systems in evolutionary biology: Darwin’s finches on the Galapagos Islands. These are the same finches that helped to inspire Darwin, but it turns out that much of what we know about them actually comes from two biologists, Rosemary and Peter Grant, who have studied these birds in their natural habitats for many decades.

The Grants’ primary study site is Daphne Major. With a size of less than half a square kilometer, it is one of the smallest islands in the Galapagos Archipelago (Figure 2.14). Daphne Major harbors a significant population of the Medium Ground Finch (Geospiza fortis, Figure 2.15), which was the focus of much of the Grants’ research. Over decades, they followed this finch population, not only keeping track of individual birds and their offspring, but meticulously measuring the population’s phenotypic traits generation after generation. This resulted in a massive, long-term data set that allows us to address key questions about microevolutionary change. For this exercise, we will take a look at the beak size data the Grants collected from 1972 to 1994.

Note that this week’s case study also provides a general introduction to RStudio and R Notebooks. The practical skills required to complete the exercise are also explained in the section below.

Explore More

To learn more about Rosemary and Peter Grant, check out the portrait that Emily Singer wrote for Wired. If you are interested in their work on finches, I can recommend the popular science book “The Beak of the Finch: A Story of Evolution in Our Time” by Jonathan Weiner and “How and Why Species Multiply: The Radiation of Darwin’s Finches” written by the Grants themselves.

Figure 2.14: Daphne Major, a small rugged island in the Galapagos. Photo by Sam LaRussa, CC BY 2.0.

Figure 2.15: Medium Ground Finch (Geospiza fortis), Santa Cruz, Galapagos. Photo by Putney Mark, CC BY-SA 2.0.

2.9 Practical Skills: R Notebooks and Plotting with ggplot

2.9.1 R Notebooks

In the last chapter, you learned how to enter commands in the RStudio console to receive an output from R. This showed you the general principle of how you can prompt R to execute a function you want. In reality, you will rarely work in the console, at least for this class. This is because RStudio provides the ability to create R Notebooks (*.Rmd files) that allow you to combine text elements (formatted using the Markdown text formatting system) with chunks of code and the code output. Essentially, your R Notebook will contain multiple mini-consoles (the code chunks) with code that you can execute, and the output will be displayed immediately below. The big advantage is that you can create documents that contain code, its output (like graphs), and explanatory text (e.g., instructions provided to you or interpretations of the results provided by you).

Each chapter comes with a downloadable *.zip file that contains a folder with the materials for the accompanying exercises (also see Appendix B). Once unzipped, the folder contains a pre-formatted *.Rmd file as well as additional files, like data sets and images. To avoid issues with the import of data and the display of images, it is important that you keep all files together in the same folder as you received them. If you want to move the files (for example from your Downloads folder to your Class folder), I recommend that you move the entire folder containing the exercise files (rather than the individual files).

Once you have downloaded the files associated with the first exercise, you can open the *.Rmd file by double-clicking. It will automatically open in RStudio. As you can see, there are three main parts to an R Notebook file.

The Header

The header, which you can see at the beginning of the document, is delineated with three dashes (---) at the beginning and the end. It includes some code that is important for the formatting of output files. This section of the document is pre-formatted, and I would recommend not altering it; there is no reason for you to change the header for any exercises in this course. However, if you would like to learn more about the different header options for your use of R Notebooks in the future, you can find a good tutorial here.

Code Chunks

Code chunks are delineated with three backticks (```) at the beginning and the end, and the {r} after the first set of backticks lets your computer know that you will be using the R programming language. You can always add a code chunk by clicking “Insert > Code Chunk > R” above, although we have usually already created all the chunks that you will need. Any text within a chunk, if written correctly, represents executable code, which the computer can interpret as a command to execute certain tasks. You can make your computer execute the code in a chunk by pressing the small, green play arrow on the top right corner of each chunk, or you can just highlight the code and press command+enter (control+enter on PC). When you execute the code, the output will automatically appear below the code chunk.

Sometimes you will find us using hash tags (#) within code chunks. Hash tags “silence” the code that follows on the same line, such that the computer jumps over that section when executing the code. That is useful for code annotation, and you will frequently see us using the hash tags to add instructions or explanations.
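As a small illustration (the variable name is arbitrary), the code below mixes annotation with executable code. Only the parts that are not preceded by a hash tag are run.

```r
# This entire line is an annotation and is skipped by R
total <- 2 + 3  # text after a hash tag on the same line is skipped as well
# total <- 100  # a "silenced" line of code that is never executed
print(total)
```

Removing the hash tag at the start of the third line would reactivate that line of code and change the value that gets printed.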


The text in between code chunks is just that: text. We will use these sections to provide you with background information and discussion prompts, and you will use these sections to respond to questions and offer your interpretations of data. Sections where you need to write something are always highlighted in italics (designated with asterisks). You can use a variety of Markdown prompts to format your text (see here for a cheat sheet), although the current version of RStudio allows you to change formatting with a click of a button, just like other word processing software.

HTML Preview and Output

A key strength of the R Notebook system is that you can output your notebook in a wide variety of file formats that automatically integrate text, code, and code output. In fact, this learning resource has entirely been written in RStudio!

In order to generate an output, your R Notebook (including text, code chunks, and the outputs from your code) can be automatically “knitted” into an HTML file. You can click “Preview > Preview Notebook” (or “Knit > Knit to HTML”) to see the live HTML file as you are working on your R Notebook (just make sure to save to update), and you can find the shareable *.html file in the same folder as your *.Rmd file. The *.html file will have the same file name as your *.Rmd file with “.nb” added to it.

2.9.2 Using Libraries

When you install R, your computer can understand and execute a number of commands. This is what is known as “Base R”. The power of R, however, is that you can expand the number of commands your computer can understand by installing and loading additional R packages (also called libraries). There are R packages specialized for pretty much any area of biology, providing a capability to analyze data from the level of genes and genomes to ecosystem level processes. We will frequently use a package called ggplot2, which allows for plotting data.

Installing Libraries

To successfully complete some of the R exercises, you will need to install additional libraries. To download and install new R packages, go to “Tools > Install Packages…” and type in the name of the package you want to install (e.g., “ggplot2”). Alternatively, you can use the install.packages() command as in the following code chunk:

#To install ggplot2, execute the following code:
install.packages("ggplot2")

Important Note

You only need to install packages once unless you re-install R. I recommend deleting code chunks with install.packages() prompts after you run them successfully, or you can silence them by adding a hash tag in front of the particular line of code. Failure to deactivate package installation code can lead to errors during the creation of HTML outputs.
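One way to avoid re-running installations is to check first which packages are already present. The following is a minimal sketch, not part of the course materials: it uses the Base R function requireNamespace(), and the object names pkgs and missing_pkgs are made up for this example.

```r
# Packages used in this chapter
pkgs <- c("ggplot2", "plyr")

# requireNamespace() returns TRUE if a package is installed and FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Not yet installed: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # remove the hash tag to install them
} else {
  message("All packages are already installed.")
}
```

Because the install.packages() line is silenced with a hash tag, this code can stay in a notebook without interfering with the creation of HTML outputs.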

Loading Libraries

To make use of installed libraries, you also need to load the libraries every time you use R (i.e., every time you restart the program). You can do this with the library() command, and you will find a code chunk prompting you to load all required libraries at the beginning of each R Notebook. For example, the following code chunk loads the ggplot2 library:

#Note that loading a library does not lead to an output
library(ggplot2)

Important Note

You have to re-load your libraries every time you restart RStudio! The most common error students in this class encounter is that a particular function cannot be found:

 Error in function.x(): could not find function "function.x"

This means that the function name is either misspelled, or the library containing a particular function has not been loaded (so R does not actually understand the command you are entering).
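If you run into this error, you can ask R directly what it currently knows. The lines below are a small sketch using the Base R functions search() and exists(); the ggplot function only serves as an example here.

```r
# Is the ggplot2 library loaded in the current session?
"package:ggplot2" %in% search()

# Can R currently find a function with this name?
exists("ggplot")  # usually FALSE until library(ggplot2) has been run
exists("mean")    # mean() is part of Base R, so no library is needed
```

If the first two lines return FALSE, loading the library with library(ggplot2) should resolve the error, provided the function name is spelled correctly.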

2.9.3 Importing Data

For most R exercises, you will work with real data sets that illuminate evolutionary concepts. Data sets will typically be provided as *.csv files (which stands for comma-separated values). *.csv files are essentially text files containing data tables, and you can also open these up in a text editor or Excel. If you do so, you will see a data structure familiar from regular spreadsheets: different variables are organized in columns, and observations are organized in rows.

Setting Your Working Directory

Having a well-organized file structure is critical to avoid issues with coding, because you will frequently read in data files, and you need to make sure that R knows where to look for those files. Unless otherwise specified, R will only look for files you may want to import in a particular folder called the working directory. If you are not sure what your current working directory is, you can simply execute the command getwd() in the console, and R will tell you the current working directory.

If you move the exercise files around (or if you are working on your own projects), you need to make sure that R is looking for the files in the right folder. To do so, you need to set the working directory with the setwd() command using the path to your specific folder.


Note that the path on a Mac usually looks something like this: /Users/michitobler/Documents

On a Windows PC, it looks something like this: C:\Users\michitobler\Documents
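The following sketch shows how getwd() and setwd() work together. To keep it runnable on any computer, it uses R's temporary folder (tempdir()) as a stand-in for your own folder, and the object name original_dir is made up for this example.

```r
# Remember where R is currently looking for files
original_dir <- getwd()

# Point R to a different folder; tempdir() is only a stand-in here.
# With your own folder, the command would look something like this:
# setwd("/Users/michitobler/Documents")     (Mac)
# setwd("C:/Users/michitobler/Documents")   (Windows)
setwd(tempdir())
getwd()

# List the files R can see in the new working directory
list.files()

# Switch back to the original working directory
setwd(original_dir)
```

Note that the Windows path is written with forward slashes inside setwd(). In R, a single backslash within quotation marks is treated as a special character, so Windows paths need either forward slashes or doubled backslashes.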

Important Note

If you don’t want to deal with having to set your working directory, simply follow the advice from above: Retain your *.Rmd file and all the additional files together in the same folder. If you open the *.Rmd file by double-clicking, the working directory should be set automatically, and R will look in the right spot for files you may want to import.

Reading a *.csv File

In order to import data in the form of *.csv files, you can use R’s read.csv() function. In the code chunk below, you can import a simple test data set (“test_data.csv”) provided with this chapter that includes three variables: sex, length, and mass of individuals in a population.

#The line of code simply prompts the computer to read the "test_data.csv" file and generate a data.frame called
#Note that the file encoding flag simply indicates that the file was generated on a Mac (the operating system I use). It helps to prevent issues for Windows users.
<- read.csv("data/test_data.csv", fileEncoding = 'UTF-8-BOM')

If this worked correctly, you should now see a new data frame called “” in your workspace (Global Environment; top right panel). You can double click it to view it or use View( in the console as described in Chapter 1. There should be three columns: sex, length, and mass.
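If you want to try out read.csv() without the course files, the sketch below writes a tiny table to a temporary *.csv file and reads it back in. The values are made up, and the names demo_file and demo_data are hypothetical and not part of the exercise materials.

```r
# Write a tiny, made-up data table to a temporary *.csv file
demo_file <- tempfile(fileext = ".csv")
writeLines(c("sex,length,mass",
             "female,98,21.5",
             "male,120,30.2",
             "female,105,24.0"), demo_file)

# Read the file; the first line is used for the column names
demo_data <- read.csv(demo_file)

# Quick checks that the import worked as intended
str(demo_data)   # variable names and data types
head(demo_data)  # first rows of the table
nrow(demo_data)  # number of observations
```

Checking the structure right after the import is a useful habit, because it catches problems such as numbers that were accidentally read in as text.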

2.9.4 Graphing Data

A key learning objective of this course is that you learn to visualize and interpret data to address different evolutionary hypotheses. In the following sections, I will explain step by step (that is code line by code line) how to make a simple graph with our sample data. Let’s aim to make a scatter plot showing the relationship between length and mass of individuals in the population. The process is not much different than sketching a graph by hand and layering different parts of the graph on top of each other, just that you use words (code) to make the computer draw. To graph data, we will primarily use the ggplot command that comes with the ggplot2 library.

Defining the Axes and Coordinate System

The first step of making any graph is to define the axes and establish the coordinate grid that allows for the plotting of the data. In order to do this, R first needs to know what data frame the data is stored in (in this case, the data frame is called The axes are then defined by specifying the aesthetics aes() within the ggplot function, as shown below.

#This line of code calls for the ggplot function (a plotting function) and makes a grid based on the data frame, using length as the x axis and mass as the y axis
ggplot(, aes(x=length, y=mass))

The output is a simple coordinate system based on the data we provided, with length as the x-axis and mass as the y-axis.

Adding a Layer with Data Points

The second step is to draw the actual data into the established coordinate system. To do so, you just need to tell the program what kind of graph you want to draw. Different graph types in ggplot are referred to as geoms (geometries), and a scatter plot is designated as geom_point. You can literally add that to your existing code with a plus sign.

ggplot(, aes(x=length, y=mass)) +
  geom_point()

For an overview of some of the graph types (geoms) ggplot offers, check here. In the coming chapters, I will introduce you to a variety of geoms that you can use to visualize different types of data.

Adding a Trendline

Whenever we look at the relationship between two variables, we may want to add a trendline. You can add a trendline by adding the geom_smooth command to your existing code. method="lm" within the geom_smooth command indicates that we want to draw a straight line (linear model, lm). se=FALSE indicates that we do not want to draw a confidence interval around the estimated best-fit line. Change it to se=TRUE and see what happens!

#The code within the brackets of the geom_smooth command specifies some additional options, namely that we want to draw a straight line (method="lm") and that we do not want to show the confidence interval (se=FALSE).
ggplot(, aes(x=length, y=mass)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE)

Changing the Axes Labels

The variable names in the data frame do not always provide the clearest description of what a variable means. We can modify the x and y axis labels using the xlab() and ylab() commands, respectively. Note that labels need to be written in quotation marks:

#Simply add the new label text in quotation marks
ggplot(, aes(x=length, y=mass)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  xlab("Body length in cm") +
  ylab("Body mass in kg")

Adding Additional Complexity

If you look at the data frame, you will see that we do not only have information about the length and mass of individuals in the population, but also their sex. So, we may want to account for potential sex differences in the relationship between length and mass. To do so, we can color-code individual points based on the sex of the individual by adding another term to the aesthetics of the ggplot function (color=sex):

ggplot(, aes(x=length, y=mass, color=sex)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  xlab("Body length in cm") +
  ylab("Body mass in kg")

As you can see, this not only changes the color of individual points, but it also draws a separate regression line for males and females.

Changing the Theme

I honestly just hate the default theme of ggplot with its gray background. But you can quickly alter the look of a graph by switching to a number of other possible themes. I personally like the theme_classic(), but you can customize the look of your graph with any theme you may like (see here).

ggplot(, aes(x=length, y=mass, color=sex)) +
  geom_point() +
  geom_smooth(method="lm", se=FALSE) +
  xlab("Body length in cm") +
  ylab("Body mass in kg") +
  theme_classic()
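To experiment with these layers outside of the exercise files, you can use the self-contained sketch below. It does not use the course data set: demo_data is simulated on the spot, and both demo_data and demo_plot are hypothetical names chosen for this example.

```r
library(ggplot2)

# Simulated stand-in data (not the course data set)
set.seed(1)
demo_data <- data.frame(
  sex = rep(c("female", "male"), each = 20),
  length = c(rnorm(20, mean = 100, sd = 12), rnorm(20, mean = 118, sd = 15))
)
demo_data$mass <- 0.25 * demo_data$length + rnorm(40, mean = 0, sd = 2)

# The same layers as above, stored in an object so the plot can be reused
demo_plot <- ggplot(demo_data, aes(x = length, y = mass, color = sex)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  xlab("Body length in cm") +
  ylab("Body mass in kg") +
  theme_classic()

# Typing the name of the object draws the plot
demo_plot
```

Storing a plot in an object is optional, but it lets you add further layers later (for example, demo_plot + theme_bw()) without retyping the code.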

Generating and Visualizing Aggregate Data

In the exercise associated with this chapter, you will not be plotting data from individuals but rather aggregate data that is compiled from many individuals and provides a mean and a measurement of variation around a mean for different sampling groups. To show you how we can visualize such data as mean (± variation), I first calculate the mean and standard deviation (sd) of length separately for each sex based on the data above, using the ddply function from the plyr package.

#Load the plyr package that includes the ddply function
library(plyr)

#Use the ddply function to calculate mean and standard deviation of length for each sex
means <- ddply(,~sex,summarise,mean=mean(length),sd=sd(length))
##      sex     mean       sd
## 1 female 101.2414 12.34520
## 2   male 118.6452 15.05224
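To see what this summary step does, it can help to reproduce it on a data set small enough to check by hand. The sketch below uses made-up values and the Base R function tapply() instead of ddply(), so it is an alternative illustration rather than the code used in the exercise; all object names are hypothetical.

```r
# Six made-up individuals, three per sex
demo_data <- data.frame(
  sex = rep(c("female", "male"), each = 3),
  length = c(100, 102, 104, 116, 118, 120)
)

# Mean and standard deviation of length, calculated separately for each sex
group_mean <- tapply(demo_data$length, demo_data$sex, mean)
group_sd <- tapply(demo_data$length, demo_data$sex, sd)

# Combine the results into one summary data frame
demo_means <- data.frame(
  sex = names(group_mean),
  mean = as.numeric(group_mean),
  sd = as.numeric(group_sd)
)

# Lower and upper ends of an error bar (mean minus and plus one sd)
demo_means$lower <- demo_means$mean - demo_means$sd
demo_means$upper <- demo_means$mean + demo_means$sd
demo_means
```

The lower and upper columns correspond to the ymin and ymax values that an error bar spans when it shows one standard deviation in each direction.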

To visualize means and standard deviations, we can again use the ggplot function with sex on the x axis and the mean value for each sex on the y axis (note that we are now referring to the ‘means’ data frame that we just created in the last code chunk). As before, we are using geom_point to draw our data as points. In addition, we are using geom_errorbar to draw the standard deviations around the mean in both directions. You already know all the other code elements from above.

ggplot(means, aes(x=sex, y=mean)) +
  geom_point() +
  geom_errorbar(aes(ymin=mean-sd, ymax=mean+sd), width=0.3)  +  #width designates the width of the horizontal bars
  xlab("Sex") +
  ylab("Mean length [mm]") +
  theme_classic()

2.10 Reflection Questions

  1. There is phenomenal variation in human height. The graph below shows variation among over 80 human populations (countries; in different colors) and across time (from 1897 to 1996). As you can see, there is a spread of about 20 cm in mean height among populations that has persisted through time. In addition, mean height has increased by an average of ~8 cm over 100 years (black line). Do you think the variation in height among the populations and the change through time are the product of evolution? Why? If you want to explore these data further, you can download them here. Data were originally retrieved from Our World in Data (CC BY 4.0).

Figure 2.16: Mean height of male humans in different countries (by color) and across years.

  2. Transitional fossils are a hallmark of evolution. However, we lack transitional fossils between many groups of organisms, especially between phyla that arose during the Cambrian explosion. Why do you think this is? How does this undermine evolutionary theory?

2.11 References

  1. Transitions are interchanges of two-ring purines (A and G) or of one-ring pyrimidines (C and T). Transversions are interchanges of purine for pyrimidine bases, which therefore involve exchange of one-ring and two-ring structures.