Friday, December 16, 2011

Transposable Elements and Common Descent of Humans and other Primates

I am writing this because it seems to me that the type of evidence for evolution that I am going to describe here is both decisive and readily comprehensible by laymen. I am going to concentrate on humans, chimpanzees and other primates because this seems to be what concerns people the most, and because several primate genomes have been fully sequenced (including the human genome, which has of course been studied more intensely than any other). This means that there is a staggering amount of evidence. I will just note that the same kind of analysis could be done for any other group of closely related organisms where substantial amounts of genome sequence are available, because transposable elements are present in nearly all animals and plants and in a large portion of microbes.

The argument, simply put, is this. Primate genomes contain large numbers (millions) of sequences (ranging from 50 or so bp to 6000 bp) which got where they are by being copied from another location in the genome and inserted where they are now. (There are longer sequences that have been duplicated to new locations in the human genome, but they are not my focus here.) The processes by which these sequences get inserted are not target specific. When a sequence segment "jumps" (actually in most cases the sequence is copied and inserted) its final location in the genome is largely random. And here is the essential point. When you compare the genomes of different species (say human and chimp) huge numbers of these transposed sequences are found in exactly the corresponding position in the 2 genomes. Depending on when the transposon at a particular site was inserted, it may be there in multiple species. Very old transposons can be present at a particular site in all mammals. Generally the more closely related two species are, the more transposon insertion sites they will share.

Some transposable elements are still "jumping" in the human and other genomes. About 80 instances of genetic diseases have been found where a transposable element inserted into a gene in an individual and was not present in either parent, and over 7000 locations have been identified in the human genomes sequenced so far where a transposable element has inserted in some chromosomes but is absent in other copies of the chromosome and in all chimp chromosomes. These latter new insertions are interesting but not relevant to my argument, except that they may serve to convince people that transposons really do get where they are by being copied and inserted. They aren't just repetitive sequences. One creationist web site did a whole series on Alu transposons and never mentioned that they get where they are by insertion.

Now logically, the presence of a transposable sequence at a particular location in different species could be the result of parallel transpositions of the same element in the two separate species. However, none of the transposon types that occur in primates are targeted to specific locations. There are transposons in bacteria that target specific sites that can be unique in a genome, but none of the transposons in primates work this way. When a transposon in a mammal jumps, it has about 3 billion base pairs to "choose from" when it lands. Targeting a single unique site would require a recognition sequence of at least 16 bp, and that is if the enzymes involved had absolute specificity for one particular sequence. (There are 4^16 = 2^32 possible sequences 16 bp long, which is over 4 billion sequences.) The transposons in mammals do have a statistical preference for much shorter (AT-rich) sequences (about 5-6 bases long), but there are millions of short sequences in a mammalian genome that fit these preferences, and the preference isn't absolute. A transposon will sometimes land in a suboptimal sequence. Any break in the DNA caused by various kinds of damage can be a target for insertion of a transposon.
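The 16 bp figure above can be checked with a few lines of arithmetic. This is only a sketch: the function names are mine, the genome size is the round 3 billion bp used in this post, and the count of short sites assumes all four bases are equally frequent, which real genomes do not strictly satisfy.

```python
# Rough arithmetic behind the "at least 16 bp" figure.
# Assumptions (not measured values): a 3 billion bp genome and equal
# frequencies of the four bases.

GENOME_SIZE_BP = 3_000_000_000


def min_unique_site_length(genome_size_bp: int) -> int:
    """Smallest k for which the number of possible k-mers (4**k) is at
    least the number of positions in the genome."""
    k = 1
    while 4 ** k < genome_size_bp:
        k += 1
    return k


def expected_occurrences(k: int, genome_size_bp: int) -> float:
    """Expected number of times one particular k-mer occurs in a random
    genome of the given size (uniform base composition assumed)."""
    return genome_size_bp / 4 ** k


if __name__ == "__main__":
    print(min_unique_site_length(GENOME_SIZE_BP))
    print(round(expected_occurrences(6, GENOME_SIZE_BP)))
```

Under these assumptions a single 6-base preference sequence is expected to occur several hundred thousand times, and since the preference covers a family of similar short sequences rather than one exact string, the number of acceptable landing sites runs into the millions.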

The upshot of this is that even a single case of transposons inserting independently at exactly the corresponding site in two different species is a rare event. This has been reported, but it was possible to distinguish the two events even in this case because the two insertions were by different classes of transposon (the sequence that inserted at the site was completely different in the two species). In the human genome over 900 classes and subclasses of transposable element have been distinguished. It is also possible to distinguish insertions of the same element at the same site, because the inserted elements are often truncated from their full length, and the different lengths distinguish separate events.

The result of this is that the hundreds of thousands of cases of the same transposon being inserted at exactly the corresponding position in the genomes of different species can only be accounted for by the different species having common ancestors in which the transposon insertions took place.

An additional level of specificity is added by the fact that transposons often insert within previously inserted transposons. This is not surprising when you realize that the human genome (like other primate genomes) is at least 50% composed of transposable element sequences. When transposons have inserted into previously inserted transposons it is possible to analyze the sequences and determine the order in which the different transposons were inserted. There are over 600,000 clusters of multiple transposons like this in the human genome. When you compare the human and chimp genomes you find that the transposons were not only inserted at the same sites in the two genomes, they were inserted in the same order.
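To make the ordering idea concrete, here is a toy sketch of how nesting gives an insertion order. An element that sits inside another must have arrived after its host, so sorting by nesting depth puts the oldest first. The element names and coordinates below are invented for illustration and are not taken from any real annotation. Each element is represented by its outermost extent, including the fragments on either side of whatever interrupted it.

```python
from typing import List, NamedTuple


class Element(NamedTuple):
    name: str
    start: int  # outermost start, counting interrupted fragments
    end: int    # outermost end, counting interrupted fragments


def nesting_depth(element: Element, cluster: List[Element]) -> int:
    """Number of other elements in the cluster whose extent encloses this one."""
    return sum(
        1
        for other in cluster
        if other is not element
        and other.start <= element.start
        and element.end <= other.end
        and (other.start, other.end) != (element.start, element.end)
    )


def insertion_order(cluster: List[Element]) -> List[str]:
    """Names from oldest (outermost host) to youngest (innermost insert)."""
    ranked = sorted(cluster, key=lambda e: nesting_depth(e, cluster))
    return [e.name for e in ranked]


# Hypothetical cluster: an Alu inside a LINE-1 inside an old L2.
human = [Element("AluY", 2000, 2300), Element("L2", 100, 5000),
         Element("L1", 1200, 3400)]
# Same cluster in a second genome, with coordinates shifted by small indels.
chimp = [Element("L1", 1238, 3436), Element("AluY", 2036, 2338),
         Element("L2", 140, 5035)]

if __name__ == "__main__":
    print(insertion_order(human))
    print(insertion_order(chimp))
    print(insertion_order(human) == insertion_order(chimp))
```

Elements at the same depth that do not enclose one another have no order defined by nesting, so this sketch is only meaningful within a single nested cluster.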

When you put all this together it is apparent that the odds against millions of transposon insertions (the human genome contains about 3 million total) occurring in parallel at the corresponding sites in different species and in the same order are astronomical. (Actually trans-astronomical. The calculation would produce a number larger than any number that is useful in astronomy.)
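For a feel for the size of that number, here is a deliberately crude model. It assumes each insertion lands independently and uniformly at any of about 3 billion positions, which is my simplification; as discussed above, real insertion is not strictly uniform. Because the probability is far too small for ordinary floating-point numbers, the sketch works with its base-10 logarithm. The function name is hypothetical.

```python
import math

GENOME_SIZE_BP = 3_000_000_000  # round figure used in this post


def log10_coincidence_probability(shared_sites: int,
                                  genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """log10 of the chance that `shared_sites` independent insertions each
    land on one specified position, under a uniform toy model."""
    return -shared_sites * math.log10(genome_size_bp)


if __name__ == "__main__":
    # Even 100,000 shared sites, far fewer than are actually observed.
    exponent = log10_coincidence_probability(100_000)
    print(f"probability is about 10^{exponent:,.0f}")
```

With only 100,000 shared sites the exponent is already close to minus 950,000. Relaxing the uniform assumption to allow millions of preferred landing sites changes the per-site term but leaves the total hopelessly small.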

The result of this is that there are only two possibilities to account for the transposons in animal genomes: common ancestors or miracles, millions and millions of miracles. But the trouble with miracles is that once you have invoked them, you have quit doing science, because miracles can account for anything. You can postulate that the whole world, including all our memories and all the physical evidence, was created 5 minutes ago. No one can prove that it didn't happen. It just isn't very interesting. Once you have started doing that, evidence is irrelevant. You might as well just go to the beach or watch TV or whatever you prefer. There's no reason to do all the work and spend all the money that it takes to do science. So, if you want to stick to science, the only way to account for all those transposons inserted at the same position in different species is that those species had common ancestors. At least that's the only scientific possibility I can think of.

That is the argument in brief. There is a huge mass of details that one can get into about the different kinds of transposons in mammalian genomes, their mechanisms of transposition, the different ways of estimating the age of individual insertion sites, the occurrence of sequences in the transposable elements that have some function for the cell, the many ways that transposons alter chromosome sequences by carrying neighboring sequences with them when they transpose, the mechanisms that cells employ to suppress the transcription of TEs most of the time, the way that new TE insertions get into the germ line so that they are inherited in the next generation, the occurrence of transposition in somatic cells and the effect that this sometimes has in inducing cancer, etc. But what I have presented here is the basic argument that TEs provide for the fact that current species groups have common ancestors, i.e., that speciation and evolution have occurred.

To give some idea of what interspecies comparison of TE insertion sites looks like, I am including a figure that aligns a 50,000 bp region of the human and chimp genomes, with the TEs marked.

Figure 1. The upper panel shows repetitive sequences identified by the RepeatMasker software in a segment of human chromosome 3. The bottom panel is the corresponding segment in chimp. Generally the elements present in human are present in chimp, although they may not line up perfectly due to small insertions or deletions in the intervening sequences. Darker shades of grey mean that the element is very similar to the standard sequence that the software uses for that type of element, and thus that the element is younger and has had less time for its sequence to diverge. In at least one case of an old, highly diverged L2 element, the software detected it in human but missed it in the chimp sequence. SINEs are short interspersed elements, the most common of which in humans are Alu elements. LINEs are long interspersed elements, the most common of which in humans are LINE-1s. LTR elements are endogenous retroviruses and related elements which lack the envelope gene and thus can't form virus particles. LTR stands for long terminal repeat, the diagnostic characteristic of this kind of element. DNA transposons are old elements that moved by a cut-and-paste mechanism, but none of them have been active in the line that led to humans for a very long time. The bottom 2 or 3 tracks of each part of the figure show TEs that have been interrupted by the insertion of another TE.


2 comments:

  1. How do you know that the final location of the transposons in the genome is random?

  2. Sorry for the delay in replying. I've been a bit under the weather for the last couple of days.

    It isn't strictly random, in the sense of every possible site being equally likely, but the only thing that is necessary for the argument is that transposition doesn't have anywhere near the site-specificity that would be needed to account for what is observed in related species by parallel events.

    The most straightforward way to show this is by activating a marked element in cultured cells or in an experimental animal. You have to mark the element somehow so you can identify the new insertions easily, since there are a very large number of old elements already present in the genome. You activate it simply by putting promoter elements next to it that will cause an RNA to be transcribed from the element. When you do this you find that copies of the element get inserted all over the genome, on all the chromosomes.

    I didn't go into a whole lot of detail about specific types of elements in the post. If you look at the figure, there are tracks for SINEs (short interspersed elements), LINEs (long interspersed elements) and other kinds of elements. I made this figure from the UC Santa Cruz genome browser. When you are working live in the browser, you can mouse over each element in a track and it shows you the specific type and subtype of element. Most of the SINEs are a type called Alus and the LINEs are mostly LINE-1, with a few LINE-2 or LINE-3s. Each of those has subtypes based on specific sets of mutations that distinguish them. If you compare each element in human and chimp, you find that not only are the types the same, so are the subtypes. (The different subtypes of each element were active at different times in evolution - recently inserted elements are restricted to a few currently active subtypes.) There is also specificity about LINE elements in the length of the insertion - in the figure you can see that the lengths are very similar for each specific insertion in chimp and human. This is because LINE elements tend to be truncated on the 5' end during insertion. Only about 25% of LINE insertions are the full length (~6000 bp). The rest are shorter (and hence defective for further transposition). Transposition is generally a pretty sloppy process. It is common for the element to carry some extra sequence from one end or the other of the source site, to cause a deletion at the target site, or to have a rearrangement in the sequence of the new element. These peculiarities at each site are present in the different species that contain the insertion.

    Hope this helps with the question. If you want to know more, feel free to ask.
