Image of a collection of ancient skulls.
OK, which one of you is the father?

Shortly before the publication of the first Neanderthal genome, a number of researchers had seen hints that there might be something strange lurking in the statistics of the human genome. The publication of the genome erased any doubts about these hints and provided a clear identity for the strangeness: a few percent of the bases in European and Asian populations came from our now-extinct relatives.

But what if we didn’t have the certainty provided by the Neanderthal genome? That’s the situation we find ourselves in now, as several studies have recently identified “ghost lineages”: hints of branches in the human family tree for which we have no DNA sequence but whose imprint we find on the genomes of populations alive today. The existence of these ghost lineages rests on statistical arguments, so it depends heavily on the statistical methods and underlying assumptions used, and those are frequent subjects of disagreement within the community that studies human evolution.

Now, researchers at the University of Utah are arguing that they have evidence of a very old ghost lineage contributing to Neanderthals and Denisovans (and so, indirectly, possibly to us). This is a claim that others in the field will undoubtedly contest, in part because the evidence comes from an analysis that would also revise the dates of many key events in human evolution. But it’s interesting to look at in light of how scientists deal with a question that may never be answered by definitive data.

Looking for ghosts

Ghost lineages have made their presence known in two ways. In the first, sequences of DNA from different populations can reveal shared ancestry groups. Native Americans, for example, have sequences that descended from an ancestral population that contributed DNA to modern East Asians, as well as another population that contributed to modern Siberians. In West Africans, we’ve found a significant contribution from a population that doesn’t seem to have contributed to any other existing population (along with contributions from groups that do have current descendants).

While that population’s contribution is well within the range of normal human variation, we still don’t know anything about who they were or where they interacted with the ancestors of West Africans. They’re a historical ghost at the moment, though further studies could always provide more details.

But there are hints of additional ghost lineages in our past. In these cases, the contribution comes from something outside the normal range of human variability. Take the Neanderthal DNA, for example. European and Asian populations all share common ancestors that seem to have left Africa about 50,000 years ago and thus have a relatively small range of variations in their DNA. Neanderthals, in contrast, split off from the lineage that produced modern humans hundreds of thousands of years ago and have been largely separated since. They had plenty of time to build up their own variations that are distinct to their lineage and not found in modern human populations.

Neanderthals contributed DNA that had developed its own distinct variations after hundreds of thousands of years of reproductive isolation.

John Timmer

Thus, the DNA Neanderthals contributed to Eurasian populations included variants that fall well outside the range of the variation we see in other parts of the genome. And while we know about Neanderthals, it’s possible you can get a similar contribution from a group we don’t know about.

The problem is that this sort of branching is impossible to identify at the single-base level. There’s no way to distinguish a variant that has arisen recently due to mutation from one that was brought in from a more distantly related lineage. In the diagram below, we take some known branches of the recent human family tree and add a potential ghost lineage. We can imagine an example where, at a specific location in the genome, modern humans and Neanderthals have an A, while Denisovans have a G.

If one human lineage has a distinctive variation, we can’t tell whether it arose in that lineage or was contributed by interbreeding with a separate branch unless we look at lots of additional variants.

John Timmer

One explanation for this is that modern humans got their A from Neanderthals, which we know interbred with us. But that interbreeding has mostly contributed to non-African populations, so this is unlikely. Another option is that a mutation occurred on the Denisovan lineage. But a third option is that the G came into the Denisovan population thanks to a completely separate human lineage that interbred with them. At the individual base level, these last two options are impossible to tell apart.

Testing all the things

To discriminate among all the possible models of our evolutionary past, we have to consider both the information we know—that Neanderthal DNA is rare in African populations, for example—as well as statistical arguments. DNA variants tend to be inherited together, so if there is a contribution from a ghost lineage, it would likely involve some unusual variants clustering near each other in the genome. With enough solid knowledge and a careful statistical analysis of enough genomes, it should be possible to figure out which models are more likely and which can be ruled out.
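
The clustering idea can be made concrete with a small sketch. The Python below is an illustration only, not the method used in the study: it assumes we already have a list of genome positions that carry unusual variants, and it flags stretches where those variants sit close together. The function name, the positions, the gap size, and the minimum count are all made up for the example.

```python
def variant_clusters(positions, max_gap=10_000, min_variants=3):
    """Group flagged variants into clusters of neighbors.

    positions: base coordinates of variants flagged as unusual (hypothetical input).
    A cluster is a run of variants where each is within `max_gap` bases of the
    previous one; runs shorter than `min_variants` are discarded.
    Returns a list of (first_position, last_position, variant_count) tuples.
    """
    clusters = []
    current = []
    for p in sorted(positions):
        # A large gap ends the current run; keep it only if it is big enough.
        if current and p - current[-1] > max_gap:
            if len(current) >= min_variants:
                clusters.append((current[0], current[-1], len(current)))
            current = []
        current.append(p)
    # Handle the final run.
    if len(current) >= min_variants:
        clusters.append((current[0], current[-1], len(current)))
    return clusters


# Made-up positions: a few isolated variants plus one tight group.
example = [12_000, 480_000, 1_200_500, 1_203_000, 1_207_500, 1_211_000, 3_900_000]
clusters = variant_clusters(example)
print(clusters)
```

Isolated variants are what you would expect from ordinary new mutations, while a tight group is the kind of signal that might point to a stretch of DNA inherited together from a divergent lineage. A real analysis would also have to account for how recombination breaks these stretches up over time.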

That’s more or less what this new research did. It starts with two Neanderthal genomes, one Denisovan genome, and one genome each from modern English, French, and Yoruban populations. It then builds different models of potential evolutionary histories (a branch here, a bit of interbreeding there) and determines how well each model is supported by the statistics. Given enough models to test, there should be a pattern where a collection of similar trees is favored. And that model had better be consistent with the things we already know.

The rough outline of the tree that comes out of this analysis does a reasonably good job of matching up with things that have been seen in other analyses. The relatively recent gene flow from Neanderthals into modern humans is there, as is an earlier one from the ancestors of modern humans into early Neanderthals. There’s also an indication of gene flow from a ghost population into the Denisovan lineage, which has been seen in other studies. This ghost lineage would have had to occupy some part of Eurasia as a contemporary of the Neanderthals and Denisovans, something that’s certainly possible, given that the two groups we know about managed to get there.

Trees upon trees

Things start to get a bit strange, however, in the earlier parts of the favored tree. The same ghost lineage would have also contributed DNA to the common ancestor of Neanderthals and Denisovans, suggesting it was a distinct lineage already by the time of their split from the part of the tree that includes modern humans. There’s no indication, however, of it contributing to the modern human lineage (except perhaps indirectly via its contribution to Neanderthals). That would suggest that the ghost lineage was outside of Africa by the time the modern human lineage started and only encountered the Neanderthal/Denisovan ancestors once they migrated into Eurasia.

That’s possible, but the only lineages that we know were present outside of Africa at the time were variants of Homo erectus, a much earlier lineage.

What are we missing?

Which brings us to the dates of the different splits. The authors use what they acknowledge is a low estimate of the mutation rate per generation to figure out when the lineage splits occur. A low rate pushes all the splits earlier than estimates from other sources would. But even accounting for that, the lineage splits are older than most other estimates in the literature.
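
To see why the choice of mutation rate matters so much, here is a rough back-of-the-envelope sketch in Python. It assumes the simplest possible molecular clock, in which differences between two lineages pile up at a constant rate on both branches, and it ignores complications such as variation that already existed in the ancestral population. Every number in it is illustrative and not taken from the paper.

```python
def split_time_years(divergence, mu_per_gen, gen_years):
    """Years since two lineages split under a simple molecular clock.

    divergence: fraction of sites that differ between the two lineages.
    mu_per_gen: mutations per site per generation.
    gen_years:  assumed length of a generation in years.
    Mutations accumulate on both branches, hence the factor of 2.
    """
    generations = divergence / (2 * mu_per_gen)
    return generations * gen_years


# Illustrative values only: the same observed divergence, two assumed rates.
divergence = 0.001
older = split_time_years(divergence, mu_per_gen=1.0e-8, gen_years=25)
younger = split_time_years(divergence, mu_per_gen=2.0e-8, gen_years=25)
print(f"lower rate:  {older:,.0f} years")
print(f"higher rate: {younger:,.0f} years")
```

The observed differences between genomes are fixed, so halving the assumed mutation rate doubles the time needed to produce them. That is why a low rate estimate makes every split in the tree look older.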

And that has a rather dramatic impact on the origin of the ghost lineage. Even using a mutation rate that produces a relatively recent split, the ghost lineage would have been a distinct branch of the human family tree roughly two million years ago. That’s right about the same time as Homo erectus shows up in the fossil record. So this tree would have an extremely early branch of H. erectus moving out into Asia and being isolated from the rest of the human lineage until the ancestors of Neanderthals showed up roughly a million years later.

There’s no shortage of reasons to be skeptical about that theory, including the rapid isolation of the lineage from the lineages that remained in Africa and the fact that fertility was still possible after such a long time spent in reproductive isolation. That, plus the fact that the dates disagree with so much else in the literature, pretty much guarantees that the paper will be controversial.

But the paper was never going to be the final word, since the analysis it describes doesn’t even try to include a number of additional events in human evolution that we know to be significant. We know that Denisovans contributed DNA to a number of Asian and Pacific lineages, but there are no sequences included from modern humans in those lineages. We also know another ghost lineage from around the time of the branch leading to modern humans contributed DNA—including an entire Y chromosome—to a small group of West African populations. Those aren’t represented here, either.

It’s not hard to understand why. More sequences and more branches would mean increased computation time for each tree evaluated, and adding additional potential branches means that far more trees have to be evaluated in total. But including these sorts of well-defined cases of interbreeding has the potential to provide a strong validation of any results produced by this analysis.
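
A quick sketch shows how fast the problem grows. The number of distinct rooted, branching trees that can relate n labeled populations is the standard double factorial (2n-3)!!, and the Python below simply computes it. This counts plain trees only; it is not a description of how the study enumerated its models, and every possible interbreeding event would multiply the options further.

```python
def rooted_tree_count(n_tips):
    """Number of distinct rooted, bifurcating trees for n labeled tips: (2n-3)!!."""
    if n_tips < 1:
        raise ValueError("need at least one tip")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-3.
    for k in range(3, 2 * n_tips - 2, 2):
        count *= k
    return count


# The tip counts here are arbitrary, chosen just to show the growth.
for n in range(4, 11):
    print(f"{n} populations: {rooted_tree_count(n):,} possible trees")
```

Going from a handful of populations to ten takes the count from a few dozen trees to tens of millions, before any gene flow between branches is considered.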

Fortunately, the data is all out there, and someone will undoubtedly find the computer time to make sure it gets done eventually. But this is a case where we’re unlikely to get the sort of certainty that a genome from the ghost lineage would provide, given the age of these events. And it may be that the remaining signals in populations we can get genomes from aren’t strong enough to eliminate ambiguity.

It will be interesting to watch how researchers in the field deal with all these remaining uncertainties.

Science Advances, 2019. DOI: 10.1126/sciadv.aay5483.