Bench philosophy: More than the Sum of its Parts

Single cell genomics
by Steven Buckingham, Labtimes 02/2014

Photo: National Science Foundation

A diploid cell contains about seven picograms of genomic DNA. Amplifying this tiny amount without messing things up and creating erroneous nucleotide sequences is only one of the challenges in single cell genomics.

How times change. Not so long ago, biological variation was the researcher’s mortal enemy. We would strangle it with statistics and bury it out of sight in means and medians. Showing error bars was a form of public confession. Now, the wind seems to have shifted in the opposite direction. Whereas until now we have focussed on what cells have in common, researchers are waking up to the interesting things that make them unique. Welcome to the world of single cell analysis (SCA).

Single cell genomics is the latest omics to jump on the SCA train, following close on the heels of single cell transcriptomics (such as RNAseq), single cell proteomics and single cell metabolomics. It is getting people excited because a) we can do it and b) it turns out that – despite what our teachers told us – different cells in the same organism do not all have the same genome. And what is more, these genomic differences are important both for biology and for medicine.

It’s all about sequencing

So, why has single cell genomics suddenly become possible? It is largely the result of vast improvements in nucleotide analysis, although accompanying advances in bioinformatics and smart new ways of getting hold of single cells in the first place also play a part.

Progress in nucleotide analysis over the past few years has continued to astound us. Just as we were getting used to the idea of “Next Generation Sequencing” (NGS), people are already beginning to talk of third generation sequencing. NGS embraces technologies, mostly based on optical methods, that parallelise the sequencing process, bringing sequencing into the high throughput bracket. There is no single distinctive feature that defines what is meant by third generation sequencing – the term is used loosely to describe several sequencing technologies that are fast and cheap enough to make personal sequencing feasible (such as the $1,000 genome) and allow ever smaller starting samples.

Not exactly new

But there is nothing fundamentally new or particularly surprising about the overall strategy behind single cell genomics. First, get hold of your cell (or perhaps a small group of cells) and pull out the DNA. Amplify up said DNA – a process known as Whole Genome Amplification (WGA) – and sequence it. Simple!

Only it’s not. Anyone who knows anything about DNA amplification will wince at the thought of all the potential traps and pitfalls. For instance, amplifying with standard polymerases is notoriously error-prone. So, if you see something interesting in your sequence, it may not be a genuine cell-specific SNP or point mutation. It may just be an amplification artefact. Then there is the problem of amplification bias. Some bits of sequence, for some reason, simply out-compete others when it comes to getting amplified. Others end up under-represented or even completely absent (“Allelic Dropout” – wouldn’t that make a great name for a band?), wreaking havoc when it comes to counting Copy Number Variations (CNVs) and leading to a host of erroneous conclusions. It’s what happens when you start out with such meagre amounts of material.
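To put "meagre" into numbers, here is a back-of-the-envelope sketch. The figures in it are assumptions rather than values taken from this article: a haploid human genome of roughly 3.2 billion base pairs, an average mass of about 650 g/mol per base pair, and a purely hypothetical target of 1 microgram of DNA for library preparation.

```python
import math

AVOGADRO = 6.02214076e23      # molecules per mole
HAPLOID_GENOME_BP = 3.2e9     # assumed size of the human haploid genome
GRAMS_PER_MOL_PER_BP = 650.0  # assumed average molar mass of one base pair


def diploid_dna_mass_pg(haploid_bp=HAPLOID_GENOME_BP):
    """Approximate mass of genomic DNA in one diploid cell, in picograms."""
    grams = 2 * haploid_bp * GRAMS_PER_MOL_PER_BP / AVOGADRO
    return grams * 1e12


def fold_amplification(target_ug, start_pg):
    """How many-fold the starting DNA must be copied to reach the target mass."""
    return (target_ug * 1e6) / start_pg


mass_pg = diploid_dna_mass_pg()
# 1 microgram is a hypothetical target, chosen only for illustration.
fold = fold_amplification(target_ug=1.0, start_pg=mass_pg)

print(f"DNA per diploid cell: {mass_pg:.1f} pg")
print(f"Fold amplification needed: {fold:.2e}")
print(f"Equivalent perfect doublings: {math.log2(fold):.1f}")
```

Under these assumptions the mass comes out at just under 7 pg, and reaching a microgram takes an amplification of the order of a hundred-thousand-fold. Any error or bias introduced in the first few copying events is carried through all of that.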

Amplification bias

Sure, there are computational methods of compensating for things like amplification bias. But at present there is no sure-fire way of confidently separating biological signal from PCR “noise”. So, how do we get around this? One way would be to use analytical tools that don’t require NGS. Researchers have successfully applied DNA microarrays and SNP arrays to single cell genomic analysis. But these methods miss out on the fine-grained detail that full sequencing offers. Admittedly, they will give good indications of things like CNVs but they are designed to spot a specific, limited set of variations. After all, if you want the full picture you need full sequencing.
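As a rough idea of what such a computational compensation can look like, the sketch below corrects per-bin read counts with a reference bias profile and rescales them to copy number. It is an illustrative assumption, not a method described in this article: the function name, the made-up read counts and the median-based scaling to two copies are all hypothetical.

```python
from statistics import median


def estimate_copy_number(cell_counts, reference_counts, ploidy=2):
    """Estimate per-bin copy number from binned read counts.

    cell_counts: reads per genomic bin for the single cell.
    reference_counts: reads per bin from reference material amplified the
        same way and assumed to be diploid throughout (the bias profile).
    """
    if len(cell_counts) != len(reference_counts):
        raise ValueError("count lists must have the same length")
    if any(r <= 0 for r in reference_counts):
        raise ValueError("reference counts must be positive")
    # Dividing by the reference removes bias shared by both samples.
    ratios = [c / r for c, r in zip(cell_counts, reference_counts)]
    # Assume the typical (median) bin sits at the normal ploidy.
    centre = median(ratios)
    if centre == 0:
        raise ValueError("median ratio is zero; too few reads to scale")
    return [ploidy * ratio / centre for ratio in ratios]


# Made-up counts: the raw numbers vary eight-fold between bins,
# but most of that variation is also present in the reference.
reference = [100, 400, 50, 200, 100, 300]
cell = [52, 198, 24, 150, 49, 151]
copy_numbers = estimate_copy_number(cell, reference)
print([round(cn, 1) for cn in copy_numbers])
```

A correction of this kind only removes bias that repeats from sample to sample. Random, cell-specific losses are not in the reference profile, so they survive the correction and can still pass for biology.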

Not surprisingly, then, success in single cell genomics hinges on minimising the impact of the artefacts introduced at the WGA stage. The standard approach to WGA that has dominated the scene for a few years is called “Multiple Displacement Amplification” (MDA). You take the double-stranded DNA to be sequenced and denature it to separate the two strands. Add some random primers and let them anneal to the DNA. Then amplify at a constant 30°C temperature using a DNA polymerase with strand-displacing properties, such as phi29. As the primers elongate, they get stripped off the template like a banana being peeled, allowing new primers to access the template. At the same time, the stripped-off portions themselves are randomly primed, resulting in a hyper-branched amplification product. The peeling means the genome gets good coverage and, thanks to the proofreading activity of phi29, the technique is known for high fidelity, though it does not fix the problems of allelic drop-out or preferential amplification.

Souped-up MDA

If MDA isn’t up to scratch, then you can always try MALBAC – “Multiple Annealing and Looping-Based Amplification Cycles”. This is a sort of souped-up MDA, in which the random primers share common 27-nucleotide sequences along with their own unique 8-nucleotide sequence. You anneal the random primers to the template and, as for MDA, they peel off as they elongate. But with MALBAC you then raise the temperature again to pull them off the template, then re-anneal so that they attach to different positions on the template. This gives better genome coverage and reduces amplification bias.

You then take these partial amplicons through more priming and extension to yield, eventually, full-length amplicons. Importantly, these have the MALBAC primer sequence at one end and its complement at the other. These complementary ends allow the amplicon to form loops, which make the full-length amplicons unavailable for further amplification. Meanwhile, you take the temperature back up to recover the template and generate a load of new, and yet more diverse, amplicons. The initial priming is spread more evenly over several cycles, so the process has been described as “quasi-linear”, in contrast with the heavily exponential nature of traditional amplification. The inventors (Science, 2012, 338:1622-6) claim it is far better than MDA at picking out both alleles of known SNPs (70% compared to 10%).
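Why "quasi-linear" matters can be seen in a toy model. The sketch below is a deliberately crude illustration, not a model of the actual MALBAC chemistry: it supposes two loci that are copied with different, made-up per-cycle efficiencies, and compares copying every molecule present in each cycle (exponential) with copying only the original template in each cycle (linear).

```python
def exponential_copies(efficiency, cycles):
    """Copies after `cycles` rounds when every molecule present can be copied."""
    return (1.0 + efficiency) ** cycles


def linear_copies(efficiency, cycles):
    """Copies after `cycles` rounds when only the original template is copied."""
    return 1.0 + efficiency * cycles


def bias(copy_model, eff_good, eff_poor, cycles):
    """Ratio of a well-amplified locus to a poorly amplified one."""
    return copy_model(eff_good, cycles) / copy_model(eff_poor, cycles)


# Hypothetical efficiencies: 90% versus 70% chance of being copied per cycle.
for cycles in (5, 10, 20):
    exp_bias = bias(exponential_copies, 0.9, 0.7, cycles)
    lin_bias = bias(linear_copies, 0.9, 0.7, cycles)
    print(f"{cycles:2d} cycles: exponential {exp_bias:6.2f}x, linear {lin_bias:4.2f}x")
```

In the exponential case the gap between the two loci compounds with every cycle. In the linear case it can never exceed the ratio of the two efficiencies, however many cycles are run.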

Another extension of MDA is MIDAS – Microwell Displacement Amplification System. MIDAS aims to tackle the problem of amplification bias by reducing the amount of amplification to just enough for sequencing, as well as reducing the reaction volume to just 12 nl (the effectively higher template concentration is thought to produce more favourable primer-annealing kinetics). The cells are dispensed individually into microwells where lysis buffers and amplification reagents are added. The rest is just pure MDA.

But wouldn’t it be great if we could bypass the amplification step altogether? That is effectively what Single Molecule Real Time (SMRT) sequencing does. SMRT, the product of Stephen Quake’s research at Stanford, is based on a repeating cycle of adding fluorescent-labelled nucleotides one at a time, reading off the fluorescence signal, then cutting off the fluorescent tag. The platform offered by Pacific Biosciences does it in a slightly different way. You attach a polymerase molecule and a bit of template to the end of a “Zero Mode Waveguide” – a fancy bit of kit that focusses a beam of light narrowly enough that only one nucleotide base gets lit up. You add in fluorescent-tagged bases and, when one of them joins on to the end of the template, you read off its fluorescent tag – that is your base call. Cut off the fluorescent tag, which then floats off out of view of the detector, and wait for the next nucleotide to come along.

A bumpy road

Sounds perfect but SMRT has had a rough ride. Quake took his neat trick and formed Helicos Biosciences. They promptly went bust in 2012. And SMRT earned a reputation, not entirely unjustified, for being error-prone. But the technology has improved since then and Pacific Biosciences continue to stand by their offering.

Some of the latest sequencing platforms take a fresh approach altogether. Take, for example, the Ion Torrent Personal Genome Machine, originally developed by Ion Torrent but now owned by Life Technologies. The template DNA strand to be sequenced is placed in a microwell and one nucleotide is added at a time. If the nucleotide is capable of being added to the template, the reaction results in a single proton being released. The proton is detected by a very sensitive probe – probably the fanciest pH meter on the market.

But why bother with single cell genomics? First of all, it turns out that the copies of the genome in our cells are not, despite what they told us at school, all the same. Remember, we are not talking about differences at the transcription level here – everyone knows about that. I am talking about differences at the level of the genes themselves, variations in the DNA sequence in the nucleus. As cells divide, for example, mutations creep in.
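How quickly does "creeping in" add up? A toy simulation gives a feel for it. Everything in it is a simplifying assumption rather than a measured value: each daughter cell picks up one new, unique mutation with a fixed probability at every division, no cell ever dies, and the probability used in the example is made up.

```python
import random


def grow_clone(generations, p_mutation, seed=0):
    """Grow a clone from one founder cell.

    Returns one frozenset of mutation IDs per cell in the final generation.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    next_mutation_id = 0
    cells = [frozenset()]  # the founder carries no new mutations
    for _ in range(generations):
        daughters = []
        for genome in cells:
            for _ in range(2):  # every cell divides into two daughters
                if rng.random() < p_mutation:
                    # Inherit the parental genome plus one brand-new mutation.
                    daughters.append(genome | {next_mutation_id})
                    next_mutation_id += 1
                else:
                    daughters.append(genome)
        cells = daughters
    return cells


# Hypothetical rate: a 5% chance of a new mutation per daughter per division.
cells = grow_clone(generations=10, p_mutation=0.05)
distinct_genomes = len(set(cells))
print(f"{len(cells)} cells, {distinct_genomes} distinct genomes")
```

Because a mutation that arises early is inherited by every descendant, the result is a patchwork of sub-clones rather than a crowd of unrelated genomes. That is the kind of mosaic that bulk sequencing averages away.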

So what? Well, it is now becoming clear that these differences might actually be biologically important. For one thing, it looks like the amount of cell-to-cell variation is greater than we thought. Ira Hall and Fred Gage reported in Science last year (Science, 2013, 342: 632-637) that between 13 and 41% of human frontal cortex neurons had at least one megabase-scale CNV, and that some neurons had “highly aberrant genomes marked by multiple alterations”.

What makes the difference?

Single cell genomics is hot in the cancer field, too. For instance, several labs have noted that only a tiny fraction of the cells in a tumour spread to other parts of the body to give rise to metastases, and have wondered what makes them do so while others are content to stay at home. And what about ageing? What mosaic genomic changes, if any, play a part here? Single cell analyses are beginning to answer these questions.

Of course, there are some biological questions that have to be answered using single cell analyses. Take, for instance, microbes that can’t be cultured. McClean et al. (Genome Research 23: 867-87) showed how the approach can be used to analyse pathogenic bacteria.

Single cell genomics is not easy. Overcoming the hazards of hugely amplifying a minuscule signal can, so far, only be done by using a combination of molecular biological expertise and some pretty expensive equipment. It is part of the SCA trend that acknowledges the importance of cellular uniqueness and, as such, it raises new statistical challenges. But for those determined to meet those challenges, it offers a whole new dimension to our understanding of cells.

Last Changed: 20.03.2014