What is it about?

DNA is a long, stringy, sticky molecule that encodes the recipe book of the living world. Specific sequences of instructions encoded in DNA dictate the behaviour of living things, and can be used to distinguish populations, individuals, and sometimes even cells. The distinguishing features of these DNA sequences are so small that they cannot be observed directly, so we manipulate and observe DNA through chemical, optical, and electrical means to create a representation of DNA sequences as files on a computer, a process known as DNA sequencing. Occasionally, these representations can get jumbled up, such that the data stored on the computer is no longer a true representation of the thing that it came from. These "chimeric" sequences can be formed either as a result of chemical manipulations during the preparation of DNA for sequencing, or as a result of computer code joining different sequences together afterwards on the computer. We have used a DNA sequencing device, the Oxford Nanopore MinION, to characterise chimeric sequences. The sequencing is carried out by hundreds of electrical eyes observing the changing shape of DNA as it passes through tiny holes in the device. The electrical signals produced by this device are descriptive enough that chimeric reads can be detected and filtered out, resulting in a truer representation of the thing that has been sequenced.

Why is it important?

Our findings demonstrate one particular advantage of long-read sequencing: chimeric reads can be characterised and filtered out before they become a problem for downstream analysis. Chimeric sequences have been observed in reads produced by other sequencing devices. This paper demonstrates that chimeric reads also occur in nanopore sequencing, and the methods we have used for this investigation indicate that the majority of them are formed at the sample preparation stage, rather than on the computer (in silico). One area in which chimeric reads may be a problem is sample barcoding, where different samples have a characteristic tag attached to their DNA prior to sequencing. If a chimeric read is formed from two different tags, it is possible that a DNA sequencer could mislabel the read as originating from only one of the two samples (or from another sample entirely), which can be a problem when small counts of particular sequences are important (e.g. single-cell sequencing, metagenomic assembly). This paper also demonstrates a wider benefit of long-read sequencing: it allows researchers to drill down further into the data to find out why a particular sequence-associated event might be happening. It is an early example of the kind of discovery that is not possible using other existing sequencing technologies. We have also explored the raw electrical signal in a few cases, and have demonstrated that it can provide more information than can be obtained from a called sequence alone.

Perspectives

This investigation arose out of a serendipitous discovery of sequenced reads that mapped to multiple regions of the mouse genome. This was completely unexpected given that our DNA samples had been created from three separate PCR reactions, which were only combined when sequencing adapters were joined onto the DNA. The discovery only happened because I released the entire set of base-called reads to Olivier (a co-investigator), who carried out a simple BLAST search on one of the longest reads. Although this discovery seemed important at the time, a lot more work was needed to answer obvious questions about the nature of the chimeric reads:

* How common was it?
* Was it a side-effect of the sample preparation method?
* Was it something that was real, or was it done entirely in the computer?
* Was it possible to fully characterise the chimeric nature of *every* read?

Answering these questions took a long time, and in the interest of getting this discovery out as early as possible, I have not fully explored the phenomenon. As always, more questions come up, and there's never enough time to find an answer to everything.

Dr David A Eccles
Malaghan Institute of Medical Research

Read the Original

This page is a summary of: Investigation of chimeric reads using the MinION, F1000Research, May 2017, Faculty of 1000, Ltd.,
DOI: 10.12688/f1000research.11547.1.