Centers for Disease Control and Prevention Director Robert Redfield speaks during a press conference about the 2019-nCoV outbreak.

As the recently discovered coronavirus has rapidly spread beyond its origins in China, health authorities around the world have needed to quickly develop testing capabilities. In the United States, that task has been performed by the Centers for Disease Control and Prevention (CDC), which has published its methodology and is applying for an emergency waiver to allow medical-testing facilities to perform these tests.

But if you’re not familiar with the tools of molecular biology, the CDC’s testing procedure might as well be written in another language. What follows is a description of how to go from an unknown virus to a diagnostic test in less than a month.

Starting from nothing

When Chinese health authorities were first confronted with the outbreak, it had a disturbing familiarity. They had already dealt with a similar set of symptoms during the SARS outbreak in the early 2000s and had seen the spread of MERS a decade later. Thanks to these and related viruses, we already had a detailed description of the structure of the typical coronavirus genome as early as 2005. That knowledge would undoubtedly prove essential for the first step in developing a rapid diagnostic test: characterization of the genome of the new virus, 2019-nCoV.

Because we know what the average coronavirus looks like, we have been able to identify areas that don’t change much over the evolution of new members of this family of viruses. And that allows us to obtain sequences of a new member’s genome without first isolating the virus.

The first challenge of sequencing a coronavirus genome is that it’s made of RNA rather than DNA. Most of our tools for working with nucleic acids are specific to DNA. Fortunately, we’ve discovered an enzyme called “reverse transcriptase” that takes RNA and makes a DNA copy of it—transcription is the copying of DNA into RNA; this enzyme does the opposite, hence the name. (Reverse transcriptase was first identified in other RNA viruses that need to be copied into DNA as part of infection.) Using reverse transcriptase, researchers were able to make DNA copies of parts of 2019-nCoV as a first step to studying its genome.
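
To make the copying rule concrete, here is a minimal Python sketch of what reverse transcription does at the level of sequence. The function name `reverse_transcribe` is our own invention, and the sketch assumes a clean RNA string containing only A, U, G, and C; it models the base pairing, not the enzyme's chemistry.

```python
# Base-pairing rules for building a DNA strand on an RNA template.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna: str) -> str:
    """Return the complementary DNA (cDNA) for an RNA template, written 5' to 3'."""
    # The new strand runs antiparallel to the template, so we walk the
    # template backward while pairing each base with its partner.
    return "".join(RNA_TO_DNA[base] for base in reversed(rna.upper()))


print(reverse_transcribe("AUGC"))  # GCAT
```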

But reverse transcription of samples from infected individuals would simply create a mess of DNA fragments from everything present: the patient’s own cells, harmless bacteria, and so on. Fortunately, DNA sequencing and analysis techniques have become so advanced that it’s now possible to just sequence the whole mess, irrelevant stuff and all, and let computers sort out what’s present. Software is able to take what we know about the average coronavirus genome and identify all of the fragments of sequence that look like they came from a coronavirus. Other software can determine how all these fragments overlap and then stitch them together, producing a near-complete coronavirus genome.
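
The two software steps described above, filtering and stitching, can be sketched in a few lines of Python. This is a toy under loud assumptions: the function names, the short made-up sequences, and the choice of shared five-letter "words" (k-mers) as the similarity test are all ours, and real pipelines also have to cope with sequencing errors, reads from either strand, and repeats.

```python
def kmers(seq, k):
    """All substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def looks_viral(read, reference, k=5):
    """Keep a read if it shares at least one k-mer with the reference genome."""
    return bool(kmers(read, k) & kmers(reference, k))


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(reads, min_len=3):
    """Greedily merge the best-overlapping pair until no overlaps remain."""
    reads = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing left to join; return separate pieces
        merged = reads[i] + reads[j][n:]
        reads = [r for idx, r in enumerate(reads) if idx not in (i, j)] + [merged]
    return reads


# Hypothetical data: a short stand-in reference and a mixed bag of reads.
reference = "ATGGCGTACGTTAGC"
sample = ["ATGGCGTA", "TTTTTTTTTT", "CGTACGTT", "CGTTAGC"]
viral = [r for r in sample if looks_viral(r, reference)]
print(assemble(viral))  # ['ATGGCGTACGTTAGC']
```

The greedy strategy is the simplest one to read, not necessarily what any production assembler does.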

At this point, Chinese health authorities recognized that the virus involved in these infections was new, and they rapidly published the virus’s genome sequence so that other health organizations could be prepared.

From genome to sampling

To make a diagnostic test specific to 2019-nCoV, researchers had to look for areas of its genome that don’t change rapidly over coronavirus evolution but have changed enough in this branch of the family that they can be viewed as its distinctive signature. Those sequences can be used to design a means of amplifying a piece of the 2019-nCoV genome using a technique called the polymerase chain reaction, or PCR.

We won’t go into all of the technical details of how PCR works, in part because we’ve already done so. For the purposes of understanding the diagnostic test, all you have to know is that you need to design two small pieces of DNA that match (meaning they can base pair with) two sections of the genome a few hundred base pairs apart. These small pieces of DNA are called “primers.” PCR will amplify the section of DNA between the two primers.
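
As a rough illustration of how two primers define what gets amplified, the following Python sketch does "PCR on paper": it finds where the primers would bind and returns the stretch they bracket. The sequences and function names are invented for the example, and it assumes perfect matches on a single strand; real primers tolerate some mismatches and are far longer than six bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Sequence of the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def amplicon(template, forward, reverse):
    """Return the stretch of template bounded by the two primers, or None.

    The forward primer matches the template strand directly. The reverse
    primer pairs with the template, so its binding site on this strand is
    its reverse complement. The product includes both primer sites.
    """
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Hypothetical template and primers, far shorter than real ones.
template = "TTGACCTGAAAACCCCGGGGTACGATCC"
print(amplicon(template, forward="GACCTG", reverse="ATCGTA"))
# GACCTGAAAACCCCGGGGTACGAT
```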

It does this by putting the DNA through heating and cooling cycles in the presence of enzymes that copy DNA. Each time through the cycle, the enzymes can double the number of copies of the section between the primers. Using this process, it’s possible to take a stretch of DNA that’s extremely rare and produce billions of copies of it.
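
The arithmetic behind "billions of copies" is simple doubling, and a short Python sketch shows how few cycles it takes. The `efficiency` parameter is our assumption for illustration: 1.0 means every copy is duplicated each cycle, which real reactions only approximate.

```python
import math


def copies_after(cycles, start=1, efficiency=1.0):
    """Copies present after a number of cycles, starting from `start` copies."""
    return start * (1 + efficiency) ** cycles


def cycles_to_reach(target, start=1, efficiency=1.0):
    """Smallest number of cycles needed to reach at least `target` copies."""
    return math.ceil(math.log(target / start) / math.log(1 + efficiency))


print(copies_after(30))      # 1073741824.0
print(cycles_to_reach(1e9))  # 30
```

Under perfect doubling, a single starting molecule passes the billion mark at cycle 30.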

But PCR works with DNA, and the coronavirus is made of RNA. So we need to use reverse transcriptase first before trying to perform PCR. Fortunately, companies have developed solutions that have all the enzymes and raw materials that both reactions need, allowing for coupled reverse transcriptase-PCR reaction mixes. The combination of reactions has been termed RT-PCR. With the right primers, RT-PCR can allow us to start with a chaotic mix of RNA and leave us with a lot of copies of a specific piece of the 2019-nCoV genome, provided any was present in the original sample.

The problem is that PCR is so sensitive that it can also amplify small errors—primers sticking to a distantly related sequence, a distantly related coronavirus in the sample, or even contamination from the previous sample. Even though these errors are rare, the exponential amplification provided by PCR can eventually allow one to dominate the sample. Fortunately, people have devised a way of taking advantage of the rarity of these errors.

Get real

If the right sequence for the primers is present—meaning 2019-nCoV is present in the sample—amplification will typically start with the very first cycle and grow rapidly. Errors, in contrast, may take a few cycles to occur and amplification therefore lags for a bit. To figure out when 2019-nCoV is really present, we have to identify when the amplification happens quickly and when it lags. We have to observe the progress of the PCR cycles in real time.

To do so, scientists developed a dye that only fluoresces if double-stranded DNA is present. As the reaction starts, there’s very little of that around, so fluorescence is low. But as more amplifications occur, the glow rapidly rises until there’s so much DNA that sensing the difference between cycles becomes impossible. If the amplification starts early, the rise and saturation occur early; if it depends on an error, they take longer to appear.
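
The difference between an early and a late rise can be simulated with a toy model in Python. Everything numeric here is an assumption for illustration: perfect doubling, a signal measured in copies of DNA, a plateau of 1e11, a detection threshold of 1e9, and a `lag` parameter standing in for the cycles that pass before an error product first appears. None of these values comes from the CDC protocol.

```python
def fluorescence_curve(start_copies, cycles=40, lag=0, plateau=1e11):
    """Signal at each cycle, modeled as DNA copies capped at a plateau."""
    curve = []
    for cycle in range(1, cycles + 1):
        doublings = max(0, cycle - lag)  # no amplification during the lag
        curve.append(min(start_copies * 2 ** doublings, plateau))
    return curve


def threshold_cycle(curve, threshold=1e9):
    """First cycle at which the signal crosses the threshold, or None."""
    for cycle, signal in enumerate(curve, start=1):
        if signal >= threshold:
            return cycle
    return None


# A sample with 1,000 viral copies rises early...
print(threshold_cycle(fluorescence_curve(1000)))      # 20
# ...a single stray product that first appears at cycle 5 rises late...
print(threshold_cycle(fluorescence_curve(1, lag=5)))  # 35
# ...and a sample with no template never crosses the threshold.
print(threshold_cycle(fluorescence_curve(0)))         # None
```

Where to draw the line between "early" and "late" is a decision for the test's designers; this sketch only shows why the two cases separate.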

Thus, real-time RT-PCR (RRT-PCR, for those excited about jargon) gives us a way to determine whether a PCR amplification occurs because our sequence of interest is present. (It can also be used to get an estimate of the relative amount of that sequence that is present, but that’s not needed for this test.)

Because this is such an important technique, companies have developed products based around it. You can buy the fluorescent dye, enzymes, etc., as well as a machine that integrates the thermal hardware to cycle the reaction and has a light sensor to monitor the fluorescence. If you wanted to do this yourself, appropriate hardware seems to be available on eBay for somewhere in the neighborhood of $2,000.

Kits aren’t all you need

If you look at the CDC’s instructions, however, you’ll see little discussion of the hardware or enzymes. Instead, you’ll find discussion of ways to avoid contamination. If a facility is doing lots of sample testing, there’s going to be no shortage of 2019-nCoV DNA around, both from the samples and from the previous PCR reactions. Given the ease with which PCR can amplify rare sequences, this can create the risk of hordes of false positives. So the CDC details reams of best practices, like preparing RT-PCR reaction mixes with a set of hardware separate from the one used to handle samples.

Another big chunk of instructions involves the details of appropriate controls. Some of those leave out key reaction components like enzymes or sample RNA, in order to make sure that contamination is not producing spurious results. This will tell you whether you should trust positive results. There’s also a positive control, to make sure that there isn’t something wrong with the reaction mix, thus telling you whether you can trust negative results.
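
The logic of the controls can be written down as a small decision function. This Python sketch is a simplification of the reasoning in the paragraph above, not the CDC's actual interpretation table; the function name, argument names, and result strings are ours.

```python
def interpret(sample_positive, negative_control_positive, positive_control_positive):
    """Decide how far to trust one sample's result, given the run's controls."""
    if sample_positive:
        # A positive is only believable if the negative controls stayed clean.
        if negative_control_positive:
            return "inconclusive: possible contamination"
        return "positive"
    # A negative is only believable if the reaction mix demonstrably works.
    if not positive_control_positive:
        return "inconclusive: reaction may have failed"
    return "negative"


print(interpret(True, False, True))    # positive
print(interpret(True, True, True))     # inconclusive: possible contamination
print(interpret(False, False, False))  # inconclusive: reaction may have failed
print(interpret(False, False, True))   # negative
```

A testing lab would likely be stricter and throw out an entire run if any control misbehaves.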

That said, the tests aren’t going to be definitive. We don’t yet know enough about the virus’s life cycle to understand the dynamics of infection: how long after infection the virus becomes detectable, and how that compares with the onset of symptoms. It’s quite possible that asymptomatic infected people won’t carry enough virus for this test to detect it consistently. So the CDC is still advising caution with people considered to be at risk of infection.

Still, as cases of person-to-person transmission outside China appear to be ramping up, testing without the need to ship samples to CDC headquarters in Atlanta could help significantly with our ability to respond to a rapidly changing outbreak.