What is it about?
DNA is a long, stringy, sticky molecule that encodes the recipe book of the living world. Specific sequences of instructions encoded in DNA dictate the behaviour of living things, and can be used to distinguish populations, individuals, and sometimes even cells. The distinguishing features of these DNA sequences are so small that they cannot be observed directly, so we manipulate and observe DNA through chemical, optical, and electrical means to create a representation of DNA sequences as files on a computer, a process known as DNA sequencing. Occasionally, these representations can get jumbled up, such that the data stored on the computer is no longer a true representation of the thing that it came from. These "chimeric" sequences can be formed either by chemical manipulations during the preparation of DNA for sequencing, or by computer code joining different sequences together after sequencing. We have used a DNA sequencing device, the Oxford Nanopore MinION, to characterise chimeric sequences. The sequencing is carried out by hundreds of electrical eyes observing the changing shape of DNA as it passes through tiny holes in the device. The electrical signals produced by this device are descriptive enough that chimeric reads can be detected and filtered out, resulting in a truer representation of the thing that has been sequenced.
Why is it important?
Our findings demonstrate one particular advantage of long-read sequencing: these chimeric reads can be characterised and filtered out before they become a problem for downstream analysis. Chimeric sequences have been observed in reads produced by other sequencing devices. This paper demonstrates that these chimeric reads also occur in nanopore sequencing, and the methods we have used for this investigation indicate that the majority of these chimeric reads are formed at the sample preparation stage, rather than in silico. One area in which chimeric reads may be a problem is sample barcoding, where each sample has a characteristic tag attached to its DNA prior to sequencing. If a chimeric read is formed from two different tags, the read could be mislabelled as originating from only one of the two samples (or from another sample entirely). This can be a problem when small counts of particular sequences are important (e.g. single-cell sequencing, metagenomic assembly). This paper demonstrates the benefit of long-read sequencing, which allows researchers to drill down further into the data to find out why a particular sequence-associated event might be happening. This work is part of the beginning of discoveries that are not possible using other existing sequencing technologies. We have also explored the raw electrical signal in a few cases, and have demonstrated that it can provide more information than can be obtained from a called sequence alone.
Read the Original
This page is a summary of: Investigation of chimeric reads using the MinION, F1000Research, May 2017, Faculty of 1000, Ltd.
DOI: 10.12688/f1000research.11547.1.