• Question: When carrying out DNA sequencing, are there ever errors in the DNA that has been sequenced? I don't mean mutations in the DNA, I mean that there has been a malfunction in the sequencing method or technology used :) During what steps in DNA sequencing are errors most likely to arise?

    Asked by anon-256449 on 8 Jul 2020.
    • Photo: Ailith Ewing

      Ailith Ewing answered on 8 Jul 2020:


      Hi Ella, yes, this is really common. DNA sequencing isn’t perfect, and errors can be caused both by problems with sample preparation and by the sequencing technology itself. Technologies vary quite a lot in their error rates. As well as optimising sample preparation and improving the sequencing technology itself, we can minimise errors by sequencing the same section of DNA many, many times. The chance of getting the same error every time we sequence it is then a lot lower, and we can work out the right sequence from the other times we sequenced it. When we sequence DNA from tumours, which have many more mutations, we sequence each bit of DNA on average 70-100 times to get the best results! We can also apply filters to the sequencing data after it’s been produced to reduce the error rate further.
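To illustrate the idea of sequencing the same section many times, here is a minimal Python sketch of a majority vote across repeated reads. It assumes the reads are already lined up and all the same length, which real data is not; the function name `consensus` and the example reads are made up for illustration and are not taken from any real sequencing pipeline.

```python
from collections import Counter

def consensus(reads):
    """Majority-vote consensus over reads that are pre-aligned and equal in length."""
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must have the same length")
    result = []
    # zip(*reads) walks through the reads one position (column) at a time.
    for column in zip(*reads):
        # Pick the most frequent base; ties go to the base seen first.
        base, _count = Counter(column).most_common(1)[0]
        result.append(base)
    return "".join(result)

# Hypothetical reads of the same stretch of DNA; two contain a single error.
reads = [
    "ATGCTTGACG",
    "ATGCTTGACG",
    "ATGCATGACG",  # error at position 4 (T read as A)
    "ATGCTTGACG",
    "TTGCTTGACG",  # error at position 0 (A read as T)
]
print(consensus(reads))
```

Because the two errors fall at different positions, each one is outvoted by the other four reads at that position. The more reads there are, the less likely it is that the same wrong base wins the vote.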

    • Photo: Alfonsina Arriaga Jimenez

      Alfonsina Arriaga Jimenez answered on 8 Jul 2020:


      Hello, yes, there can be a lot of errors. As with every methodology, we can make mistakes in different ways. Mistakes can range from not getting any result at all to sequencing something different because of contamination. For example, I once made extractions from a beetle and thought everything was perfect until the end, when I took the DNA sequence and searched it online for closely related groups, and found that it was a mite and not a beetle. Somehow, while doing the extraction, I had picked up a leg of a mite and done everything with it. In the end, we always compare our results to be sure that what we have is indeed DNA of the species we want.

    • Photo: Kim Liu

      Kim Liu answered on 9 Jul 2020: last edited 9 Jul 2020 8:58 am


      Hey Ella – most DNA sequencing methods rely on the base-pairing interaction, which is known to be incredibly accurate. Nonetheless, all biological interactions rely on randomly moving molecules landing in the right places, and protein parts moving at the right times. Chemical interactions are usually strong enough to hold down the randomness, but if a molecule is moving especially fast, it may not be enough – this is the fundamental cause of base-pairing errors, which lead to mistakes both in sequencing and in DNA replication in our cells. Sometimes the sequence itself is fundamentally difficult to sequence; the more mixed up the DNA sequence is, the easier it is to sequence accurately (e.g. ATGCTTGACGGACTG is much easier than CCCCCCCCCCC or AAAAAAAAAAA). Also, if the DNA forms a funny shape (e.g. GGGGCCGGGGCCGGGG forms a box-like shape), the polymerase sometimes struggles to read through it. I haven’t discussed Nanopore sequencing, which uses a more biophysical mechanism and currently isn’t as accurate as the biochemistry-based methods.
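As a rough illustration of the point about repetitive stretches, the sketch below finds the longest run of one repeated base in a sequence. Treating run length as a stand-in for "hard to sequence" is a simplifying assumption made here, and the function name `longest_homopolymer` is hypothetical; the example sequences are the ones mentioned above.

```python
from itertools import groupby

def longest_homopolymer(seq):
    """Return (base, length) for the longest run of a single repeated base."""
    best_base, best_len = "", 0
    # groupby splits the sequence into consecutive runs of identical bases.
    for base, group in groupby(seq):
        run = sum(1 for _ in group)
        if run > best_len:  # strict '>' keeps the first run when lengths tie
            best_base, best_len = base, run
    return best_base, best_len

for s in ["ATGCTTGACGGACTG", "CCCCCCCCCCC", "AAAAAAAAAAA"]:
    base, length = longest_homopolymer(s)
    print(f"{s}: longest run is {base} x {length}")
```

The mixed sequence has no run longer than two bases, while the other two are a single run from start to finish, which is the kind of stretch where it is easy to miscount how many identical bases there are.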

      As Ailith says – in sequencing, we like to sequence the same thing multiple times to make sure we know what we’re looking at 🙂 I suspect it can get very tricky to distinguish very rare mutations from sequencing errors or degrading DNA, and it doesn’t help that there’s always the possibility I messed up the sample somehow!

    • Photo: Anabel Martinez Lyons

      Anabel Martinez Lyons answered on 9 Jul 2020:


      Hi Ella, as the others have said very well, DNA sequencing is definitely prone to some degree of error, and certain techniques are known to have lower error rates (or you could say higher accuracy) than others. One thing that is always worth doing with all lab experiments, and this definitely goes for DNA sequencing, is repetition. As Ailith mentioned, it is common practice to sequence the same bits of DNA many, many times (each one of these is referred to as a ‘read’). Especially for whole exome sequencing or whole genome sequencing (where you want to sequence millions or billions of base pairs), producing many reads is important not only for accuracy but also to be able to ‘stitch’ the exome/genome together in the correct order, since it isn’t possible to sequence such large stretches of DNA in one go. Instead, we have to fragment the DNA into lots of smaller, easier-to-handle pieces and sequence these individually. It then takes a great deal of computational processing to align all the reads into the correct order. The more reads you have, the easier it is to tell which ones contain a technological/computational error and which ones represent the ‘real’ sequence. Hope that helps answer your question :-).
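The fragment-then-align idea can be sketched in a few lines of Python. This is a toy version under strong assumptions: a known reference sequence is available, reads contain only substitution errors (no inserted or deleted bases), and each read is placed by brute force at the position with the fewest mismatches. Real alignment software is far more sophisticated; the names `best_offset` and `pileup_consensus` and all the example reads are invented for illustration.

```python
from collections import Counter

def best_offset(reference, read):
    """Position in the reference where the read has the fewest mismatches."""
    def mismatches(offset):
        window = reference[offset:offset + len(read)]
        return sum(a != b for a, b in zip(window, read))
    # Try every possible start position; the earliest one wins on ties.
    return min(range(len(reference) - len(read) + 1), key=mismatches)

def pileup_consensus(reference, reads):
    """Place each read on the reference, then vote on the base at each position."""
    columns = [Counter() for _ in reference]
    for read in reads:
        start = best_offset(reference, read)
        for i, base in enumerate(read):
            columns[start + i][base] += 1
    # Depth = how many reads cover each position.
    depth = [sum(c.values()) for c in columns]
    # 'N' marks positions that no read covered.
    called = "".join(c.most_common(1)[0][0] if c else "N" for c in columns)
    return called, depth

reference = "ATGCTTGACGGACTG"
reads = [
    "ATGCTTGA",
    "GCTTGACG",
    "CTTGACGG",
    "TGACGGAC",
    "ACGGACTG",
    "ATGGTTGA",  # same fragment as the first read, with one wrong base
    "ACGGTCTG",  # same fragment as the fifth read, with one wrong base
]
called, depth = pileup_consensus(reference, reads)
print(called)
print(depth)
```

Positions covered by more reads are where a single faulty read is most easily outvoted, which is why higher depth makes it easier to separate errors from the real sequence.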
