Bench philosophy: Nanopore sequencing

Making the MinION Better
by Steven Buckingham, Labtimes 01/2017

A new generation of DNA sequencing is here, spearheaded by nanopore sequencing, such as the MinION sequencer from Oxford Nanopore Technologies (ONT). Is MinION the cure for all sequencing woes?

Nanopore sequencing is so much of a step forward that it has been called the “third generation” of sequencing (TGS). It is seen as being as much of an advance over second-generation sequencing (SGS) as SGS was over the Sanger method. That is partly an acknowledgement that the way it works is completely different to any method of sequencing that went before it. Whereas those “old” methods all relied on chemically manipulating the source DNA, nanopore sequencing almost literally reads the sequence of bases along a single DNA molecule, one by one.

It does this by passing the DNA strands through nanometre-scale pores and reading the resistance of the pore as the DNA passes through. The key point is that the reading takes place at the molecular scale – directly reading off individual molecules, rather than masses of molecules, as is done in chemically-based sequencing. Further, it relies on physics, not chemistry. That means the method lends itself particularly well to miniaturisation and automation.

ONT's MinION device represents the culmination of those two drives. It is incredibly small, just 10 x 3 x 2 cm, about the size of a glasses case. It weighs some 90 grams, so you would hardly notice it in your pocket. It connects to your computer through a USB3 connection and comes with driving software that works on Windows or Mac.

The MinION is simply plugged into a laptop for base calling via the Metrichor programme. Photo: Exeter Sequencing Service

The working heart of the MinION is a membrane containing 2,048 pores. These pores are divided into 512 groups of four, because you can't predict exactly how well an individual pore is going to perform. The driving software looks at each group of four pores and chooses the best one. When the performance of that best pore eventually declines, the software switches to another pore to keep the throughput up. The pore-bearing membrane sits on top of a specialised chip, which controls the pores and reads off the data.
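The selection logic described above can be sketched in a few lines of Python. This is only an illustration, not ONT's driving software: the `quality` score, the `threshold` and the rule of re-picking within the same group of four are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

PORES_PER_GROUP = 4   # from the article: 2,048 pores in 512 groups of four
NUM_GROUPS = 512


@dataclass
class Pore:
    quality: float    # hypothetical performance score, higher is better


def pick_best(group: List[Pore]) -> int:
    """Return the index of the best-scoring pore in a group."""
    return max(range(len(group)), key=lambda i: group[i].quality)


def update_active(group: List[Pore], active: int, threshold: float) -> int:
    """Keep the active pore while it scores at least `threshold`, else re-pick."""
    if group[active].quality >= threshold:
        return active
    return pick_best(group)


# One group of four pores with made-up scores.
group = [Pore(0.4), Pore(0.9), Pore(0.7), Pore(0.2)]
active = pick_best(group)            # the pore with score 0.9 is chosen first
group[active].quality = 0.3          # its performance declines over time
active = update_active(group, active, threshold=0.5)
print("active pore index:", active)
```

With one active pore per group, 512 groups give the 512 channels that are read at any one time.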

The DNA in the sample has to be modified for the process to work. All the ingredients for the chemical preparation come in a kit. Two adaptors are added to the DNA. The first is a Y-shaped adaptor, which is ligated onto one end of the double-stranded DNA (dsDNA). This adaptor guides the DNA towards a molecular motor located at the pore. The second adaptor is a hairpin that caps the other end of the DNA. The result is a very long hairpin with the Y-adaptor at the end.

Once the Y-adaptor has attached the DNA to the motor, the DNA is unwound as it is threaded through the pore. Eventually, the hairpin adaptor goes through the pore, followed by the complementary strand. This means that each dsDNA molecule is read twice, raising the accuracy of the read. The user thus has the option of a quicker but less accurate, one-directional (1D) read, or a slower but more trustworthy 2D read.
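To see why a second pass helps, here is a toy sketch in Python. It assumes both passes have already been base called, are the same length and contain no insertions or deletions, none of which holds for real data. The function names are made up for the illustration, and a real 2D base caller combines the two passes at the signal level rather than comparing finished sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def toy_2d_consensus(template_read: str, complement_read: str) -> str:
    """Combine two passes; positions where they disagree become 'N'."""
    # The complement strand follows the hairpin through the pore, so its
    # read is the reverse complement of the template read.
    second_pass = reverse_complement(complement_read)
    if len(second_pass) != len(template_read):
        raise ValueError("toy model assumes equal-length reads without indels")
    return "".join(a if a == b else "N"
                   for a, b in zip(template_read, second_pass))


template = "ACGTTGCA"
# Simulated second pass that disagrees with the first at one position.
complement = reverse_complement("ACGTAGCA")
print(toy_2d_consensus(template, complement))
```

A 1D read has no second opinion, so the disagreement flagged here would simply go unnoticed.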

As the residues pass through the pore, they change the electrical resistance of the pore. This can be observed by applying a constant voltage and measuring the fluctuations in the current, something familiar to anyone who has experience with patch clamping of single ion channels. You can look at this raw output (squiggle plot), if you really want to, but thankfully the software takes care of translating this Morse code for you.

Not that a mere human has any chance of being able to make sense of the raw output in the first place. It is not, unfortunately, a matter of mapping each squiggle to a residue. That is because, in reality, bases are not read one at a time. There is no known way of doing that with this kind of nanopore technology.

Instead, MinION actually reads five at a time. The relationship between the shape of the squiggle and the 5mer that caused it is therefore really complicated. So the MinION device uses an inference approach, modelling that relationship as a hidden Markov process and solving it using a standard inference algorithm.
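For readers who want to see what such a model looks like, here is a minimal Python sketch. It is not ONT's base caller, and the article does not say which inference algorithm ONT uses; Viterbi decoding is used here as one standard choice. The current levels in `LEVEL`, the noise value `SIGMA` and the rule that every event advances the strand by exactly one base are all invented for the illustration.

```python
from itertools import product

K = 5                      # the article says five bases are read at a time
BASES = "ACGT"
KMERS = ["".join(p) for p in product(BASES, repeat=K)]   # 4**5 hidden states

# Hypothetical current level (arbitrary units) for each 5-mer.
# A real model would be learned from training data.
LEVEL = {kmer: 50.0 + 0.05 * i for i, kmer in enumerate(KMERS)}
SIGMA = 1.0                # assumed noise standard deviation


def log_emission(current: float, kmer: str) -> float:
    """Gaussian log-likelihood of a current reading, up to a constant."""
    return -((current - LEVEL[kmer]) ** 2) / (2 * SIGMA ** 2)


def viterbi(events):
    """Return the most likely base sequence for a list of current readings."""
    # score[k] is the best log-probability of any path ending in 5-mer k.
    score = {k: log_emission(events[0], k) for k in KMERS}
    backpointers = []
    for current in events[1:]:
        new_score, pointer = {}, {}
        for k in KMERS:
            # Predecessors overlap k by four bases. All four transitions are
            # assumed equally likely, so the constant term is left out.
            prev = max((b + k[:-1] for b in BASES), key=score.__getitem__)
            new_score[k] = score[prev] + log_emission(current, k)
            pointer[k] = prev
        score = new_score
        backpointers.append(pointer)
    # Trace the best path back from the best final state.
    state = max(score, key=score.get)
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    # First 5-mer, then the one new base contributed by each later 5-mer.
    return path[0] + "".join(k[-1] for k in path[1:])


# Noise-free toy signal generated from a known sequence.
truth = "GATTACAGATTACA"
events = [LEVEL[truth[i:i + K]] for i in range(len(truth) - K + 1)]
decoded = viterbi(events)
print(decoded == truth)
```

Even in this toy version each reading depends on five neighbouring bases, which is why no single squiggle can be mapped to a single residue.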

Base calling over the internet

We have been talking about MinION doing this base calling but that isn't actually quite what happens. In fact, none of this “base calling” is done by that tiny box attached to your computer. Instead, base calling is done over the internet, using ONT's cloud-based Metrichor service.

That may present a problem for some researchers. First, Metrichor is proprietary software. Until recently, that meant that you didn't really know how it does its work, which is a barrier to fully evaluating published papers. However, ONT have now made their Metrichor source code available to registered users. All the same, base calling over the web does mean relying on an internet connection. Thankfully, a number of fully open-source alternatives have become available.

But MinION is not perfect. For one thing, getting long reads is great but it can be a struggle handling the DNA, even before it hits the pores.

David Buck is head of High Throughput Genomics at the Oxford Genomics Centre. He has worked with Genomics PLC, which has gathered together some of the world leaders in deep genomic sequencing. “We were interested in MinION because we wanted to be able to piece together haplotypes to find a single mutation, or even two mutations, in the same gene,” says Buck. “We also needed to get full length transcriptome sequences, so we could look for splice variants. MinION's long reads was key to that: being able to amplify full length of DNA means it is much easier to find splice variants.”

But there was a problem with preparing the DNA. “The protocols are actually pretty straightforward but the hardest thing is keeping the molecules large. It's quite ironic really. In theory, MinION should have no limit as to the length of the reads. But no matter how hard we tried, we would at best get 100-200 kb reads.”

The problem is that the DNA is easily sheared, making long reads impossible. “There are all sorts of tricks to get around it,” says Buck. “For instance, there are kits that repair nicks, or you can do size selection. But they don't often work. People wanting to sequence clinical samples, for example – they may want to sequence long reads from frozen samples that have been hanging around for ages. They are unlikely to get large enough molecules. So MinION must be able to offer the compensatory advantages of being cheap and quick.”

Johanna Rhodes, an Early Career Research Fellow at Imperial College London, came across the same problem. Rhodes is using DNA sequencing to identify resistance mutations in Aspergillus fumigatus. Aspergillus infections are a major cause of death in lung transplants. But her work got sidelined by an outbreak in another pathogen, Candida auris. Rhodes wanted to trace the spread of infection with a genetic epidemiology approach.

Rhodes was attracted by MinION's promise of long reads. “We were looking for SNPs and insertions of 34 or 48bp – in Aspergillus these are fairly easy to spot but in the case of Candida we didn't know what we were looking for. The pathogen had only been discovered in 2009. The mutations of interest could be anywhere, so we had no real choice but to sequence the whole genome.”

But when Rhodes started feeding Candida DNA into MinION, things went wrong. “MinION turned out not to be the best choice, after all. DNA extraction methods require bead beating, which sheared the DNA. That's a real problem for Aspergillus – it is a tough-skinned fungus so you need to do some pretty vigorous bead beating, which broke up the DNA. But Candida is a yeast, so we could treat it gently enough to get the DNA. Then MinION really came into its own.”

Buck's lab was involved in the early roll-out of the MinION. “I have always been a tech guru,” Buck confesses. “I took the opportunity to sign up with the early access program. We were in there right from the early days of Oxford Nanopore when they were first developing this stuff. My colleagues helped develop the algorithms and I even shared an office one day a week with a member of the Nanopore team. We have grown with MinION, since we got one in 2014.”

Yes, definitely an early adopter.

But Buck does not look back on MinION's early performance through rose-tinted glasses. “In the early days, there was a slew of Principal Investigators with interesting questions that relied on being able to get long read lengths. But we found MinION did not produce very much data and the reads were of disappointingly poor quality. They weren't very long, either. So, one by one, those eager PIs gradually disappeared.”

Improved accuracy

Indeed, MinION's Achilles heel has always been its read accuracy. Buck was getting some 95% accuracy, compared to the 99.8% you expect from Illumina. But he persevered. “I am stubborn,” says Buck. “I got involved in the MinION Analysis and Reference Consortium (MARC) programme, set about sequencing bacterial genomes, characterising the E. coli genome, all the while working with ONT to make MinION better.”

ONT's open involvement with the research community has paid off. MinION's performance has soared. “The early days were awful,” recalls Buck. “But the quality of the data has improved amazingly. We started getting an accuracy of 95% – still not as good as Illumina. The main problem seemed to be that the original device used a base calling algorithm that applies a training set that did not translate well to humans. So, we had to retrain the system and it was only a matter of time before we helped ONT fix the problem.”

MinION today is not the MinION of even a year ago. There have been major changes in the chemistry in the DNA preparation kits and in the pores themselves, coupled with equally large adjustments to the bioinformatics. The SGS assembly algorithms, which were originally designed with short read limitations in mind, have been replaced. With the current system, Buck is getting up to 6-7 GB of human genomic data in 24 hours.

Sequencing data overkill

Rhodes is also impressed with the pace of MinION's improvements. “I was using their ‘old’ chemistry and getting a decent yield and long reads. But using a new pathogen meant I needed to create a reference genome. With that old chemistry I had no choice but to sequence with Illumina to get a scaffold. Then ONT sent me their latest chemistry and now I can do away with using Illumina altogether. I can identify mutations within 24h. MinION streams data in real time, so I ended up with a lot of sequence in a very short time.”

Ironically, MinION's rate of delivery raises its own problems. And it will get worse – the next step after MinION is PromethION, which is basically the same as MinION except that instead of MinION's 512 channels, it comes with 144,000 channels. Buck says his lab estimates that if they go with PromethION, they will have to spend £0.75 million on storage alone.
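A back-of-envelope calculation in Python gives a feel for the scale. It rests on two loud assumptions that the article does not make: that output grows in direct proportion to the number of channels, and that the 6-7 GB per 24 hours quoted earlier for MinION is a data volume in gigabytes. Treat the result as an order-of-magnitude illustration, not a forecast.

```python
MINION_CHANNELS = 512            # from the article
PROMETHION_CHANNELS = 144_000    # from the article
MINION_GB_PER_DAY = (6, 7)       # Buck's quoted range for 24 hours

# Assumption: throughput scales linearly with channel count.
scale = PROMETHION_CHANNELS / MINION_CHANNELS

# Convert the scaled daily output from GB to TB (1 TB = 1,000 GB).
low_tb, high_tb = (gb * scale / 1000 for gb in MINION_GB_PER_DAY)

print(f"{scale:.2f} times as many channels")
print(f"roughly {low_tb:.1f} to {high_tb:.1f} TB per day under these assumptions")
```

On those assumptions a single day's run would already be measured in terabytes, which is the kind of figure that turns storage into a budget line.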

The PromethION Early Access Programme (PEAP) is already under way. And ONT has yet more up its sleeve. At December's Nanopore Community Meeting, ONT referred to a forthcoming flow-cell dongle – or “Flongle” – that allegedly takes care of the whole process, including the DNA preparation. Just drop your material in the top, snap the lid on and a genome comes out the other side.

Sounds too much like science fiction. But with companies like ONT around, an awful lot of science fiction is beginning to look like science fact.

Last Changed: 11.02.2017