[HN Gopher] Sequencing your DNA with a USB dongle and open sourc...
       ___________________________________________________________________
        
       Sequencing your DNA with a USB dongle and open source code
        
       Author : TangerineDream
       Score  : 194 points
       Date   : 2021-02-03 15:25 UTC (7 hours ago)
        
 (HTM) web link (stackoverflow.blog)
 (TXT) w3m dump (stackoverflow.blog)
        
       | yters wrote:
       | How do we get a dongle?
        
         | jmiskovic wrote:
         | The source README mentions supporting MinION ($1000)
         | https://nanoporetech.com/products/minion
         | 
         | Nice video here https://www.youtube.com/watch?v=1_mER5qmaVk
         | 
         | Found some previous discussion of HW:
         | 
         | https://news.ycombinator.com/item?id=16262719
         | 
         | https://news.ycombinator.com/item?id=7893158
        
           | yters wrote:
           | Incredible, thanks! Cheaper than a shotgun sequencer :)
        
             | danpalmer wrote:
             | Bear in mind that it'll only work once. Additional
             | consumable flow cells are a similar price.
        
         | snypher wrote:
         | I think it's misleading for the article to call this a dongle.
         | It's a $1k USB device with expensive consumables, more like a
         | printer.
        
         | [deleted]
        
       | kneel wrote:
       | Nanopores have unacceptably high error rates. Around 10%
        
         | sannee wrote:
         | Is this an accuracy or precision issue? I am imagining that if
         | you actually have access to the device, you could do as many
         | runs as you want, getting to arbitrarily low error rates.
        
           | brofallon wrote:
           | This is a common misconception - "averaging out" errors only
           | works if the errors are pretty rare at any given site. This
           | is true for some types of errors & sequencing technologies,
           | but not universally true. Some types of DNA sequences (most
           | notably homopolymers and other simple repeats) are very
           | difficult to sequence correctly, and X% of the reads there
           | will be incorrect. If X>20% or so, then it may look like real
           | germline variation no matter how many reads are sequenced
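A minimal sketch of the point above, assuming every miscall at the site produces the same wrong base and that a naive caller flags a site once more than 20% of reads disagree with the reference (the threshold is taken loosely from the "X>20%" figure in the comment; the function names are hypothetical). It only illustrates that more depth sharpens the estimate of X rather than shrinking it.

```python
import random

def wrong_base_fraction(depth, site_error_rate, seed=0):
    """Fraction of `depth` simulated reads that miscall one site."""
    rng = random.Random(seed)
    # Each read independently miscalls the site with the site-specific rate.
    wrong = sum(rng.random() < site_error_rate for _ in range(depth))
    return wrong / depth

def looks_like_variant(depth, site_error_rate, threshold=0.2, seed=0):
    """Naive caller (assumption): flag the site if the wrong-base fraction
    exceeds `threshold`."""
    return wrong_base_fraction(depth, site_error_rate, seed) > threshold

if __name__ == "__main__":
    # The wrong-base fraction settles near the site error rate as depth grows.
    for depth in (30, 300, 3000, 30000):
        frac = wrong_base_fraction(depth, site_error_rate=0.3)
        print(depth, round(frac, 3), looks_like_variant(depth, 0.3))
```

With a site error rate of 0.3 the fraction converges towards 0.3 instead of towards zero, so the site keeps looking like a variant at any depth.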
        
           | koeng wrote:
           | The errors are non-random. That's why they use machine
           | learning to figure out those errors. You could, of course,
           | also just do traditional statistics on sequences that you
           | want to sequence all the time. I've done that with plasmids
           | before, and it works pretty good. I think there are a few
           | papers on it too.
        
             | searune wrote:
             | > The errors are non-random.
             | 
             | Could you elaborate / give an example? Are the errors
             | deterministic? Is it like ISI (Inter-Symbol
             | Interference[1]) in signal processing, where some symbols
             | interfere with the reception of the next symbol(s)? Are
             | there short range errors (one letter) or long continuous
             | errors?
             | 
             | [1] https://en.wikipedia.org/wiki/Intersymbol_interference
        
               | marsdentech wrote:
               | It's a complicated issue; I tend to think of the error
               | component of any one MinION observation as being a
               | function of the k-mer in the pore at the time (i.e. the
               | subject of the observation) and, with some decaying
               | dependence, the sequences (i.e. in both directions) that
               | extend out from either side of the target k-mer. You
               | might say that MinION error is a function of the target
               | k-mer and its immediate environment. It gets even messier
               | when you try to imagine the form of that function; for
               | one, it's not _completely_ good enough to remain in
               | sequence space alone: among other things, the "shape"
               | (i.e. the conformation) of that (DNA or RNA) molecule
               | around the target k-mer will influence how the shape of
               | the pore will change in response to the target k-mer,
               | which, in turn, will influence the observed current
               | signal (i.e. manifest as a deviation from the "expected"
               | or "ideal" current signal for that k-mer!). As I
               | understand it, Nanopore don't spend too much time
               | actually modelling k-mer-in-pore dwell-mechanics; instead
               | their best base callers use machine learning to
               | generalise across the swathes of available sequencing
               | data for known targets (and give really quite impressive
               | results, all things considered).
        
               | koeng wrote:
               | https://gist.github.com/Koeng101/abc674e1acd575646748afcb
               | cc7...
               | 
               | There is a real example I ran a few months ago. How to
               | read it is here
               | https://en.m.wikipedia.org/wiki/Pileup_format
               | 
               | Positions like 172 have errors more often than not
               | because the basecaller is wrong sometimes (note: this is
               | from a sequence verified sample).
               | 
               | The errors come up more often in some sequences than they
               | do in others. I'm not really sure about symbol
               | processing, but if you have any beginner resources for
               | that I'd appreciate them!
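For readers who want to tally a pileup like the one linked above, here is a rough sketch of counting the calls in the read-bases column of one pileup line. It assumes the standard tab-separated layout (chromosome, position, reference base, depth, read bases, qualities) described on the Wikipedia page; the function names are hypothetical and it is not the script used for the gist.

```python
from collections import Counter

def count_pileup_bases(ref_base, read_bases):
    """Tally calls in a pileup read-bases string for a single position."""
    counts = Counter()
    i, n = 0, len(read_bases)
    while i < n:
        c = read_bases[i]
        if c == "^":
            i += 2  # read start: skip the marker and its mapping-quality char
            continue
        if c == "$":
            i += 1  # read end marker carries no base call
            continue
        if c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            j = i + 1
            while j < n and read_bases[j].isdigit():
                j += 1
            length = int(read_bases[i + 1:j])
            counts["ins" if c == "+" else "del"] += 1
            i = j + length
            continue
        if c in ".,":
            counts[ref_base.upper()] += 1  # match on forward/reverse strand
        elif c in "*#":
            counts["*"] += 1  # placeholder for a base deleted in this read
        elif c.upper() in "ACGTN":
            counts[c.upper()] += 1  # mismatch; case encodes the strand
        i += 1
    return counts

def parse_pileup_line(line):
    """Return (chrom, pos, ref, depth, counts) for one pileup line."""
    chrom, pos, ref, depth, bases = line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), ref, int(depth), count_pileup_bases(ref, bases)
```

A position where the count for the reference base is smaller than the count for some other base is one where the reads are wrong more often than not.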
        
           | dnautics wrote:
           | don't know why this was downvoted. If I'm not mistaken, there
           | is generally a high error rate per pore fundamentally because
           | it's a single molecule experiment. These get averaged out,
           | but may be difficult to align as it might not necessarily be
           | a straightforward averaging. There are also segments that are
           | fundamentally generally difficult to sequence correctly
           | (single nucleotide runs, not even a super high n) that will
           | probably never get satisfyingly resolved no matter how many
           | times you sequence.
        
         | searine wrote:
         | It should be noted that the "errors" in this case are gaps in
         | sequence. Sometimes the DNA strand slips through the pore and
         | some bases aren't called.
         | 
         | The actual base calling is on par with Hi-seq in my experience.
         | In software terms, you are missing chunks of code, but aren't
         | flipping bits.
         | 
         | This is important because in certain experiments, you care less
         | about those gaps (scaffolding for example). So you can get a
         | lot of cheap utility out of nanopore sequencing.
        
         | chrisamiller wrote:
         | That all depends what you want to do with the data. For
         | assembling new genomes, they produce very long reads that are
         | essential for "scaffolding". They're also great for structural
         | variant detection (large rearrangements of DNA). DNA sequencing
         | is not a monolith and there's room for lots of different
         | complementary technologies.
        
         | koeng wrote:
         | Are you sure about that? My last consensus run worked with
         | complete coverage of ~410 bp region. Here is a gist of the raw
         | pileup without consensus -
         | https://gist.github.com/Koeng101/abc674e1acd575646748afcbcc7...
         | 
         | Visually, I think, you can see that it isn't THAT bad (low
         | coverage at the ends is because of how I barcoded the
         | sequences).
         | 
         | I hate to be that guy, but have you actually used the
         | technology? And if so, approximately what year? Unacceptable
         | for what procedure? Do you have any raw reads that have been
         | troubling you?
        
           | searine wrote:
           | They mean at genome-wide scales. If you are just doing a
           | 410bp region, the sequence is short enough that the signal is
           | going to crush any noise you get from strands slipping in the
           | pores.
           | 
           | The errors nanopores get are gaps, not base pair
           | substitutions. So with things like viral or bacterial
           | sequencing you don't really have huge issues.
           | 
           | When you are doing large eukaryotic sequences with lower
           | coverage on average, you start picking up a lot of deletion
           | artifacts. Which isn't a huge deal if you have a very well
           | annotated genome like human, but if you are doing pioneer
           | genomics it can create some difficulties. Often if the genome
           | isn't well annotated, it's best to pair nanopore with short
           | reads.
        
             | koeng wrote:
             | The gaps are usually homopolymers and such, which should
             | get helped by R10 pores. But true, at low coverage, things
             | can get tougher!
        
         | marsdentech wrote:
         | This is a common, and often justified, though not always fair,
         | criticism. MinIONs have an error rate of around 10% for _any
         | given base_. Moreover, these errors aren't entirely independent
         | of one another, so if you struggle to sequence a given base the
         | first time, you're likely also to struggle if you try again.
         | That said, if your experiment is such that you're only
         | sequencing a guaranteed single target (e.g. one, isolated
         | coronavirus genome), in that one sequencing run (on that one
         | flow cell), you'll "re-sequence" any given region many
         | times and, unless you're looking at "problematic" (i.e. low-
         | complexity) regions, you _will_ be able to "average out" the
         | errors to reveal the true target sequence. On the other hand,
         | if you're trying to co-sequence a mixture of closely-related
         | targets, that's when the headache starts...
        
       | samchorlton wrote:
       | So happy to see this here. While sequencing is quite old, mass
       | adoption still has not come. The benefits are clear - faster
       | infectious disease diagnosis, personalized treatment, tracking
       | the spread of infection, identifying food contamination - the
       | use-cases are endless. However before nanopore sequencing came,
       | it was always out of reach of the masses.
       | 
       | We've actually started BugSeq[0] to help labs get into nanopore
       | sequencing - improving these open source tools and also writing
       | our own. Orgs like FDA, USDA, big food co's, CDC, etc are now all
       | adopting nanopore sequencing. Happy to see the industry taking
       | off, this will be a step function improvement for public health
       | in general.
       | 
       | (disclaimer: founder of BugSeq) 0: https://bugseq.com
        
         | dekhn wrote:
         | personalized treatment is still best handled by gene panels.
         | nobody has made a compelling argument for WGS for personalized
         | med. Right now it's a huge waste of investment until we
         | understand the multigenicity of diseases better (which is a
         | research problem best solved by sequencing millions of
         | individuals and using high quality WGS sequencers).
        
           | nextos wrote:
           | I think typing your HLA class I and II genes is the single
           | most valuable thing you can get now from your genome. It's
           | also pretty likely to remain extraordinarily valuable even if
           | whole-genome sequencing prices drop to nearly zero.
           | 
           | HLA associations with autoimmune disorders are
           | extraordinarily strong. Same applies to infectious diseases,
           | vaccine efficiency and checkpoint inhibitor efficiency.
           | 
           | While you can type HLA with classical techniques, the only
           | really reliable way is really to use long reads.
           | 
           | Same applies to CYP enzyme superfamily, where variation is
           | linked to some rare drug toxicity events for example.
           | 
           | We should all know our HLA and our CYP genotypes. Why 23andme
           | does not even attempt to impute HLA is beyond my
           | understanding.
        
           | teekert wrote:
           | One example: Homologous Recombination Deficiency, the
           | signature it leaves genome-wide and the associated
           | sensitivity to PARP inhibitors.
           | 
           | But agreed, it is about time we start to understand
           | regulatory regions better. But that will require gathering
           | more WGS data, and indeed most data is Whole Exome or Panel.
        
             | dekhn wrote:
             | Research project, not actionable human health. I fully
             | support large-scale WGS projects and hope that some day one
             | of them will have a recognizable impact.
        
               | samchorlton wrote:
               | I don't know about this specific example, but DNA
               | sequencing is already routinely used for personalized
               | oncology therapeutics outside of clinical trials, so not
               | really research project.
               | 
               | Source: Am MD and practice laboratory medicine.
        
               | dekhn wrote:
               | Sure. Doctors love to try new technologies. most of the
               | reports of success are happy narratives, not evidence
               | based medicine.
        
           | samchorlton wrote:
           | We work within the infectious disease space, so I'll give an
           | example from our work that is still personalized medicine:
           | Faster detection of antimicrobial resistance. Every infection
           | will be resistant to different
           | antibacterials/antivirals/antifungals/antiparasitics. What if
           | we could get the patient on the right antimicrobial for their
           | specific infection faster? There's strong evidence that
           | timely administration of correct antimicrobials in septic
           | shock results in improved mortality.
           | 
           | Nanopore sequencing very much has the potential to deliver
           | this personalized treatment, without looking at any human
           | genes or panels. If we could rapidly sequence bacteria in the
           | bloodstream and predict their antimicrobial susceptibilities,
           | we can make a difference.
        
             | dekhn wrote:
             | What you're describing is a very reasonable research topic
             | with some supporting evidence.
             | 
             | What I'm saying is that nobody has delivered on any of the
             | huge claims about the genome which genomicists made for the
             | last 20 years, specifically in terms of actionable human
             | health.
             | 
             | it's time to start calling the bluff.
        
               | samchorlton wrote:
               | I'm not exactly sure how you can say that.
               | 
               | The following have been revolutionized by the human
               | genome project and subsequent technological innovation in
               | sequencing:
               | 
               | -Non-invasive prenatal diagnostics
               | 
               | -Screening for cancer with cell-free DNA
               | 
               | -Rapid and accurate diagnostics for children with
               | suspected genetic disorders
               | 
               | -Targeted cancer therapeutics
               | 
               | Many of these are already in routine clinical use in high
               | income countries and result in significant improvement in
               | human health.
        
               | [deleted]
        
               | dekhn wrote:
               | The impact is minor and most of the progress did NOT come
               | from HGP data.
               | 
               | I worked in genomics for 20 years. I have deep knowledge
               | of biology and medicine. And the reality is, for the
               | amount of money invested, the actionable medical returns
               | have been relatively tiny and industry continues to not
               | invest in sequencers for a good reason.
        
               | searine wrote:
               | >What I'm saying is that nobody has delivered on any of
               | the huge claims about the genome which genomicists made
               | for the last 20 years, specifically in terms of
               | actionable human health.
               | 
               | I mean. Sure, sequencing the human genome didn't solve
               | our problem overnight, and you can't sequence a genome at
               | a vending machine for a nickel to tell your future, but I
               | think there has been an avalanche of medical data derived
               | from the genome, and that is only going to get bigger.
               | 
               | Now that we are really starting to figure out the
               | polygenic risks and the single deleterious variants and
               | their links with phenotype, people will have a much
               | better picture of what their future might hold (and how
               | to prevent it).
               | 
               | I don't think it was ever a bluff. The problem just
               | turned out harder than we thought it was going to be.
        
               | dekhn wrote:
               | it didn't turn out to be harder than _I_ thought it was
               | going to be. I came into this in the 90s fully prepared
               | for the idea of polygenic risk. In my opinion, most
               | people who did molecular biology first think that way,
               | while most people who learned mendelian genetics don't.
               | 
               | I had my genome sequenced a few years ago by Illumina.
               | They had a big slick presentation, blah blah blah, ApoE1,
               | etc. When the genetic counsellors came to my genome they
               | said "huh. you don't have any risk factors". I checked
               | and each of their risks was from an existing gene panel,
               | so the WGS wasn't valuable (it's on PGP, if you want to
               | work with it https://my.pgp-hms.org/profile/hu80855C).
               | 
               | I talked in more detail with the counsellors. Turns out,
               | whenever they saw a novel variant that wasn't covered by
               | a gene panel they were googling the variant and skimming
               | the abstracts of papers.
               | 
               | It was at that point I realized the difference between
               | research, PR, and actionable medical data.
        
         | ngcc_hk wrote:
         | All great until this was used for people control. Collecting
         | dna which you cannot control and even can trace your race or
         | relatives.
         | 
         | We have internet. Great. But look at the dark side. DNA is
         | great like target medicine but you have totalitarian regime
         | which might use it.
         | 
         | Need some sort of awareness. How to deal with the two sides,
         | let us discuss once you know there is a very dark side to it.
        
           | samchorlton wrote:
           | Thanks for your concern. All technologies come with benefits
           | and risks. Of course, DNA sequencing can be used for harmful
           | purposes, eg. tracking individuals. We should be very
           | cautious of these risks as the technology develops, and take
           | well thought out steps to mitigate them. A similar analogy
           | can be made to the internet and tracking people. Overall,
           | however, the benefits of DNA sequencing to society already
           | far outweigh these risks.
        
       | ordu wrote:
       | _> If you try to commercialize it, that takes a while to start a
       | company, and it can take so long that by the time you go to the
       | mechanics of that, the next thing has already emerged._
       | 
       | Technological singularity is here! :)
       | 
       | [1] https://en.wikipedia.org/wiki/Technological_singularity
        
       | Ovah wrote:
       | Anyone with hands on experience using NanoPore? I've been
       | thinking about buying one of these to play around with. But
       | anecdotally I've heard that they lack utility. Or are my concerns
       | just myths? a) they are designed to handle many batched samples
       | at once rather than many runs of few samples over time. So in
       | practice they don't really last for many individual samples. b)
       | the computational requirements are high. So while a NanoPore can
       | be plugged into a laptop in the field it would take forever to
       | run the data processing on said computer.
        
         | bioinformatics wrote:
         | Computational requirements are quite high, but OK if you have
         | good GPUs on hand. A coronavirus sequenced sample on the fast
         | mode without GPUs would take 3-4 hours to complete, while on
         | the high accuracy mode days. GPU access would speed up
         | performance considerably.
         | 
         | Error rate for MinIONs is still quite high (10-15%), so a human
         | genome sequencing would be quite inaccurate in some regions.
         | 
         | Sequencer is quite cheap, reagents and flow cells are a little
         | bit more expensive.
        
           | gnramires wrote:
           | Is the error rate per base pair?
        
           | Ovah wrote:
           | Thank you. The upfront cost of the sequencer sure makes it
           | tempting at first sight.
           | 
           | My desired hobbyist use case is to key out plants, lichens
           | and mushrooms that I find in the field. I have the
           | bioinformatics knowhow just need the hardware. 3-4h seems
           | like a long time for a genome that is <30k nucleotides long.
           | Mushrooms on average seem to have almost as many genes as
           | coronaviruses have nucleotides. I guess partial sequences (and
           | thus reduced comp time?) might do the trick but it's probably
           | hard to target those partial reference sequences with a long-
           | read method like NanoPore.
        
           | alwaysdoit wrote:
           | If you repeat the process many times will it reduce that
           | error rate, or are the errors non-independent?
        
             | staplung wrote:
             | Unfortunately, with nanopore the errors are biased so you
             | tend to get errors in the same places. All sequencing
             | techniques also have error rates but some are unbiased so
             | running a single sample through (which will usually have
             | many, many copies of any sequence) will average out to a
             | good read of the sequence.
             | 
             | Some good info on next-gen sequencing techniques:
             | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3841808/
        
               | tdido wrote:
               | Still, some of the errors can be compensated for with
               | more coverage. So if you can manage 20-30X you're left
               | with the homopolymer problem (nanopores can't tell how
               | long a stretch of the same repeated nucleotide is,
               | because you can't control how long the sensed kmer stays
               | in the pore), but lots of other types can be improved
               | quite a lot.
        
             | alextheparrot wrote:
             | Last time I looked into Nanopore the cost wasn't that much
             | better where you'd even consider this experiment.
             | 
             | On the other hand, when doing a genome assembly, the
             | Nanopore reads are good for a draft sequence and then the
             | Illumina reads can be used to polish the sequence.
        
         | z991 wrote:
         | Here's my write-up of buying one for fun:
         | https://abarry.org/dna-sequencing-in-our-extra-bedroom/
        
           | carlsborg wrote:
           | From your post the thing uploads the sequenced data and their
           | service generates the report. Is the raw data available?
           | 
           | Also: truly remarkable phd thesis!
        
           | ipsum2 wrote:
           | Nice write up! How much did it cost to get an Oxford Nanopore?
        
             | carlsborg wrote:
             | $1k prices are on the website.
        
         | danpalmer wrote:
         | A close friend of mine has worked there for many years. We've
         | spoken a lot about the tech.
         | 
         | I don't know the answers to all your questions, but I do know
         | that the emphasis is on research, not consumer (or hobbyist)
         | use. I believe the devices are ~free, but each run requires
         | using a consumable part that has to be either disposed of or
         | returned for refurbishment, and I believe these are hundreds of
         | dollars each.
         | 
         | The big advancement is the size and cost of the devices, the
         | fact that a lab can have one on every desk rather than a
         | communal machine that you have to queue your samples up for, or
         | a device you can transport in field kit.
         | 
         | They do have cloud services that do much of the processing for
         | you, but I suspect you'd want to be able to manipulate the data
         | so you'd need your own data processing tools locally. It's not
         | going to give you a 23andMe style report, it's more likely to
         | say "yep, that's a human" vs "you're ecoli". I believe they do
         | have training for how to do this data analysis, but I suspect
         | this is targeted at customers on large contracts.
        
           | Ovah wrote:
           | Thank you for the practical insight. I suspected that
           | NanoPores are not just yet geared towards hobbyists. I happen
           | to have some bioinformatics knowhow so it's mainly a matter
           | of hardware for me. As both you, u/bioinformatics and
           | u/searine mention it is the overhead cost of flow cells etc
           | that worries me from a hobbyist point of view.
        
         | marsdentech wrote:
         | I used to run a department at a biotech where ~50% of our data
         | came from MinIONs (although, that said, I'm a bioinformatician,
         | rather than a molecular biologist), so I can answer your
         | questions. For (a.), you can for sure "batch" samples. The term
         | of art you're looking for is "multiplexing". Nanopore provide
         | prep kits that allow you to "barcode" different samples (i.e.
         | tag all the molecules in a given sample with a unique,
         | synthetic sequence, which allows them to be distinguished by
         | software downstream), but note that (as with all DNA prep kits,
         | but some more than others) you'll need access to a fair whack
         | of lab equipment and consumables to use it (these kits aren't
         | "all-in"). For (b.), for one anecdata point, I used to process
         | a whole flow cell's data on an M4800 with a 4th Gen i7 and 32
         | GB of RAM in a few hours. Most of the "high" computational
         | requirements you hear about relate to either assembly or
         | variant calling (both of which are downstream of just
         | retrieving "usable" sequencing data); and even both of those
         | I've managed on that same laptop overnight. Actually acquiring
         | the data (you can delay base calling if you like, although you
         | probably wouldn't need to) is real-time and only needs very
         | modest hardware (IMHO the Nanopore "system requirements" are
         | very much on the "safe-side".) "In the field", your challenge
         | would be physically preparing the samples!
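The software side of the multiplexing described above can be sketched roughly as follows. This is a deliberately simplified illustration: it assumes the barcode sits at the very start of each read and compares it by Hamming distance, whereas real demultiplexers are more involved. The barcode sequences, sample names, and mismatch limit are all made up for the example.

```python
# Hypothetical barcodes; real kits define their own sequences.
BARCODES = {
    "sample_1": "AAGGTTCC",
    "sample_2": "CCTTGGAA",
}

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def assign_sample(read, barcodes, max_mismatches=2):
    """Return the sample whose barcode best matches the read prefix,
    or None if nothing is close enough or the best match is tied."""
    best_sample, best_dist, tie = None, max_mismatches + 1, False
    for sample, barcode in barcodes.items():
        prefix = read[:len(barcode)]
        if len(prefix) < len(barcode):
            continue  # read too short to carry this barcode
        d = hamming(prefix, barcode)
        if d < best_dist:
            best_sample, best_dist, tie = sample, d, False
        elif d == best_dist and d <= max_mismatches:
            tie = True  # ambiguous: two barcodes match equally well
    return None if tie else best_sample
```

Allowing a couple of mismatches matters here because the barcode itself is read with the same per-base error rate as the rest of the molecule.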
        
         | searine wrote:
         | They are a fun tool, great for doing molecular work in the
         | field. The error rate is still very high compared to short
         | reads, but if you know this and plan for it going in you should
         | be fine.
         | 
         | Flowcells last for one sample. The machine should last
         | indefinitely. You can sometimes add more of the same DNA to a
         | flowcell after one use to get a bit more out of it, but the
         | quality degrades quickly. 500-1000 dollars each for flowcells,
         | depending on how much you order.
         | 
         | In my experience of field use, I was using Oxford Nanopore's
         | software, which does processing remotely, and was able to run
         | the platform on just a regular 2015-era laptop.
        
           | twobitshifter wrote:
           | What is a flow cell made out of and why is the cost so high?
        
             | searine wrote:
             | It's made of plastic, glass, and the special protein pores
             | which split the strand and read the DNA. Reagents and
             | sample are applied to it to make the reaction happen.
             | 
             | The flowcell gets contaminated with your sample after one
             | run so they are 'one time use'. The nanopore protein
             | eventually stops working also.
             | 
             | They are expensive because doing molecular biology is
             | expensive. It requires expensive machines and expensive
             | reagents at atomic scales to create. Thus money is
             | required.
        
               | tdido wrote:
               | Actually, one of the main features of this tech apart
               | from the obvious size-factor is that it's a streaming
               | process. You can analyse data on the fly and decide when
               | to stop the run. Wash the flowcell, and use it for
               | another sample. Eventually the pores die, yes, how fast
               | depends on the sample type. I think they guarantee 48
               | hours or something of the sort.
               | 
               | The expensive part is not the chemistry. Each flowcell
               | has a very expensive piece of metal that senses the very
               | small current variations that each kmer causes when going
               | through each pore. They've actually come up with a device
               | (horribly named "flongle") that has the same shape of a
               | flowcell but no pores, and the mini flowcell it uses is
               | ~90USD (against ~900USD for a full flowcell). Of course,
               | yield is much lower.
        
           | nojokes wrote:
           | Is the price a question of scale? If this technology would
           | become commonplace, would the price go down? Are there
           | patents that would prevent cheaper chemical production?
        
             | hobofan wrote:
             | I assume scale and more R&D on how to produce nanopores
             | more cheaply would be the main ways to drive price down. As
             | for patents, Oxford Nanopore has a pretty big portfolio for
             | all things nanopore, so a direct competitor based on
             | nanopores that would drive the price down seems unlikely
             | (though they obviously have to compete on price with other
             | sequencing methods to some degree).
        
       | phkahler wrote:
       | How does it handle repeats? I can understand reading AACCCT...
       | since they say the signal depends on several letters. But what
       | about 12 Gs? Or longer runs of the same letter. Is there some
       | way to clock one nucleotide at a time?
        
         | tdido wrote:
         | Nope. You're working with kmers. I think it's 6mers in the
         | current models. It's good because you get redundancy as you
         | move, but coupled with the fact that you can't control dwelling
         | time it makes repetition hard to handle.
        
         | marsdentech wrote:
         | As others have said, you're reading a sliding window of k-mers
         | over the target sequence; I think for the MinION k is presently
         | 5. To answer your question directly, it struggles with
         | homopolymer runs, not inherently because they're low
         | complexity, but actually because it's tricky to "clock" how
         | many like, contiguous k-mers have passed through the pore after
         | a given period of time. That is to say, for example, if your
         | target sequence is "GGGGGGG" (i.e. a homopolymer run of 7 Gs),
         | you'd expect to observe three like, contiguous signals (i.e. in
         | current space) for the all-G 5-mer, one signal each per "clock
         | cycle" (which corresponds to the dwell time of the k-mer in the
         | pore). If these "clock cycles" were always constant, it's
         | merely a case of dividing the "time spent on the observed all-G
         | 5-mer" signal by the "time spent on one clock cycle".
         | Sadly, for our purposes, there's enough wobble in any one such
         | "clock cycle" that that calculation won't always yield a
         | reliable result. The upshot: your "GGGGGGG" (7 Gs) target
         | sequence may be registered as "GGGGGG" (6 Gs) or "GGGGGGGG" (8
         | Gs), or even something else. Now, for distinguishing two
         | alleles where the difference between them is, say, a doubling
         | in length of an already-very-long homopolymer run, even with
         | the aforementioned "clock wobble", you'd likely be able to see
         | that in MinION data quite clearly. As with all things DNA
         | sequencing (for the time being, at least!), your precise
         | biological question will determine which (one or more)
         | sequencing techniques are best for the job!
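
The "clock wobble" argument above can be sketched in a few lines. This is a toy model, not how any real basecaller works: the function names, the Gaussian dwell-time jitter, and the default k=5 are all assumptions made for illustration. It shows that a run of 7 Gs yields three identical all-G 5-mers, and that dividing the total time on that signal by a nominal "clock cycle" recovers the run length only when the dwell times are steady.

```python
import random


def kmers(seq, k=5):
    """Return the sliding-window k-mers of seq, one per single-base step."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def estimate_run_length(total_dwell, mean_dwell, k=5):
    """Estimate a homopolymer length from time spent on the all-same k-mer.

    A run of n identical bases produces n - k + 1 identical k-mers, so the
    estimated number of steps (total time / time per step) plus k - 1 gives n.
    """
    steps = max(1, round(total_dwell / mean_dwell))
    return steps + k - 1


def simulate_calls(run_length, k=5, mean_dwell=1.0, jitter=0.5,
                   trials=1000, seed=0):
    """Count which run lengths get called when per-step dwell times wobble.

    The Gaussian jitter is a hypothetical noise model chosen for simplicity.
    """
    rng = random.Random(seed)
    steps = run_length - k + 1
    counts = {}
    for _ in range(trials):
        # Sum the noisy dwell times of each identical k-mer step.
        total = sum(max(0.05, rng.gauss(mean_dwell, jitter))
                    for _ in range(steps))
        called = estimate_run_length(total, mean_dwell, k)
        counts[called] = counts.get(called, 0) + 1
    return counts


if __name__ == "__main__":
    print(kmers("GGGGGGG"))            # three identical all-G 5-mers
    print(simulate_calls(7, jitter=0.0))  # steady clock: every call is 7
    print(simulate_calls(7, jitter=0.5))  # wobbly clock: calls spread out
```

With the jitter set to zero every trial calls 7 Gs; with a nonzero jitter some trials land on neighbouring lengths, which is the off-by-one behaviour described in the comment.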
        
           | phkahler wrote:
           | Just a thought. If the DNA were run through 2 such holes, you
           | could use a nearby non-uniform sequence to clock the reading
           | of the other one. Not a magic bullet, but maybe an
           | improvement. Assumes the readers can be close enough to bound
           | the amount of slack between them, and that they don't
           | interfere with each other.
        
       | RocketSyntax wrote:
       | Maybe if you ran the test 100 times and did some pileups by
       | position it would be usable in comparison to WGS
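
A pileup by position boils down to a per-column vote. The sketch below assumes the reads are already aligned to the same coordinates and padded with "-" for gaps; the `consensus` function and the example reads are made up for illustration and are not taken from any real pipeline.

```python
from collections import Counter


def consensus(aligned_reads, gap="-"):
    """Return the majority base at each column of pre-aligned reads.

    Columns where the gap character wins the vote are dropped from the output.
    """
    length = max(len(read) for read in aligned_reads)
    called = []
    for i in range(length):
        # Collect the bases that every read reports at this position.
        column = [read[i] for read in aligned_reads if i < len(read)]
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            called.append(base)
    return "".join(called)


if __name__ == "__main__":
    # Hypothetical reads of the same region, each with one random error.
    reads = ["ACGTACGT", "ACGTTCGT", "ACCTACGT", "ACGTACGA", "ACGTACGT"]
    print(consensus(reads))
```

Voting like this removes errors that fall at different positions in different reads. It does nothing for errors that recur at the same position in most reads, such as a homopolymer that is consistently called one base short, which is the caveat raised elsewhere in the thread.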
        
         | koeng wrote:
         | If you want to see what a real run looks like, here is a little
         | gist of my last Nanopore run, raw basecall -> alignment (no
         | consensus)
         | 
         | https://gist.github.com/Koeng101/abc674e1acd575646748afcbcc7...
        
       | RocketSyntax wrote:
       | Also, this is not new. It's been around for yrs
        
         | JabavuAdams wrote:
         | Maybe, but it was a good summary for me and I've been in
         | biophysics for 3 years or so. Also, lots of good keywords and
         | discussion generated here to follow up on. Overall, very useful
         | article and discussion.
        
         | dekhn wrote:
         | schatz periodically dumps PR for attention
        
       | u678u wrote:
       | Wow I never thought of this. I understand all the controversy
       | over 23&me and DNA secrecy, but it seems pretty soon it'll be
       | trivial to run DNA anywhere anytime.
        
         | garettmd wrote:
         | I'm wondering about the impacts of cheap/accessible DNA
         | sequencing in the future. Not just impacts to existing
         | businesses, but what does it mean from a privacy perspective?
         | If someone could take a strand of your hair and then get your
         | genome sequence from it - what would be the implications?
        
           | pishpash wrote:
           | In the long future: total loss of privacy and identity as
           | meaningful concepts.
        
       | devops000 wrote:
       | Could a DNA sequence be used as a private key / seed for a
       | Bitcoin wallet? Does it make sense?
        
         | koeng wrote:
         | At the 2014 DEFCON Biohacking village I did exactly that. I
         | gave out like 50 tubes of plasmid, all you had to do is go
         | sequence em to extract the private key, and boom, you get like
         | $200 (or like 15K today...)
         | 
         | Literally nobody did it for a couple years, so I ended up
         | taking out the bitcoin to pay for more DNA synthesis a few
         | years ago. I actually did delete the bitcoin private key
         | though, so I had to pay for sequencing it back out...
        
           | a-dub wrote:
           | what was your encoding scheme? hash of some character
           | representation was the key?
        
             | koeng wrote:
             | 2 base pairs per byte mapping. Super simple.
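
The exact mapping is not spelled out in the comment. One simple possibility, assumed here and not confirmed by the thread, is a fixed code of 2 bits per base, so that each byte becomes four bases. The base order "ACGT" and both function names are arbitrary choices for this sketch.

```python
BASES = "ACGT"  # assumed mapping: A=00, C=01, G=10, T=11


def bytes_to_dna(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def dna_to_bytes(seq: str) -> bytes:
    """Decode a sequence produced by bytes_to_dna back into bytes."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            # BASES.index raises ValueError on anything other than A, C, G, T.
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


if __name__ == "__main__":
    key = bytes.fromhex("1b00ff")  # placeholder bytes, not a real key
    dna = bytes_to_dna(key)
    print(dna)
    print(dna_to_bytes(dna) == key)
```

A naive code like this turns runs of equal bytes into homopolymers (0x00 becomes "AAAA"), which is exactly the kind of sequence nanopore reads struggle with, so a practical scheme would probably want to avoid long repeats.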
        
         | blamestross wrote:
         | Same problem as all biometrics. Data about you makes for a bad
         | password. It can make an ok username tho.
        
         | WanderPanda wrote:
         | Memorising the seedwords of one key + a backup key in a 1 of 2
         | multisig setup seems to be a good alternative.
        
         | CapitalistCartr wrote:
         | Any password-like object has to be changeable. And easily.
        
       | Abishek_Muthian wrote:
       | The advancement in DNA sequencing tech for humans has been a
       | boon for fighting extinction of other animals too. Sequencing
       | bird DNA from feathers to determine their migration and check
       | population was envisioned decades ago and has only been made
       | possible recently thanks to the advancement of the tech.
       | 
       | The Bird Genoscape Project[1] was also showcased in this
       | excellent Nat Geo video[2].
       | 
       | [1]https://www.birdgenoscape.org/
       | 
       | [2]https://www.youtube.com/watch?v=_p43ksRgIlk
        
       | tingletech wrote:
       | seems pretty impressive. Here is the code linked in the article
       | that does the signal processing to decode the sensor data into
       | DNA sequences. https://github.com/skovaka/UNCALLED
        
       | lifeisstillgood wrote:
       | My first reaction after reaching the halfway point in the article
       | was to check it was not April 1st already.
       | 
       | But even on a site like Stack Overflow (hey I can trust Joel
       | right?), and even after coming here and reading "hey yes we build
       | / use those too" I am struggling to believe this.
       | 
       | What else don't I know about in biotech? How far ahead is the
       | industry compared to where the average man on the Clapham omnibus
       | thinks it is?
       | 
       | Please stop the world I want to get off.
        
       | koeng wrote:
       | Until you also realize you need a Qubit and the library preps and
       | oh now you need NEB next gen enzymes and wow turns out pipette
       | technique really matters.
       | 
       | That said, I love Nanopores, I use them in my business, and those
       | error rates you can hack around if you know what's going on under
       | the hood.
        
         | tdido wrote:
         | I don't think you need the Qubit with the rapid prep.
        
           | koeng wrote:
           | it works but your efficiency drops by quite a bit
        
         | Florin_Andrei wrote:
         | > _those error rates_
         | 
         | Do a thousand readings, fix the parts that don't match across
         | the board?
        
         | dekhn wrote:
         | "wow turns out pipette technique really matters" <- one of the
         | most underrated comments of all time.
        
           | jacquesm wrote:
           | Boris Johnson gives a nice demonstration here:
           | 
           | https://twitter.com/neilhall_uk/status/1355088791220985857
        
             | dekhn wrote:
             | The worst for me was coming in early and setting up gels.
             | I'd drink a bunch of coffee, have shaky hands, and then
             | break the gel with the pipette tip repeatedly while trying
             | to jam the dna into the well.
             | 
             | there's a reason I went into automated biological robots.
        
           | andi999 wrote:
           | Pipette skills improve rapidly if you practice with a
           | microscale.
        
             | dekhn wrote:
             | that's how we calibrated ours. turns out: most pipettes in
             | the lab were miscalibrated, with 50+% error. Then it turned
             | out our scale wasn't properly calibrated, so we had to
             | replace that too.
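
Checking a pipette against a balance is simple arithmetic: weigh repeated dispenses of water, convert mass to volume using the density of water, and compare with the nominal setting. The sketch below is an illustration only; the function name and the example masses are hypothetical, and it ignores evaporation and temperature corrections that a proper calibration procedure would include.

```python
from statistics import mean, stdev

# Approximate density of pure water at 20 deg C, in mg per microlitre.
WATER_DENSITY_MG_PER_UL = 0.9982


def check_pipette(nominal_ul, masses_mg, density=WATER_DENSITY_MG_PER_UL):
    """Return (mean volume in uL, systematic error in %, CV in %).

    masses_mg holds the weighed mass of each dispense; at least two
    measurements are needed to compute the spread.
    """
    volumes = [mass / density for mass in masses_mg]
    avg = mean(volumes)
    systematic_error_pct = 100.0 * (avg - nominal_ul) / nominal_ul
    cv_pct = 100.0 * stdev(volumes) / avg
    return avg, systematic_error_pct, cv_pct


if __name__ == "__main__":
    # Hypothetical readings for a pipette set to 100 uL.
    avg, err, cv = check_pipette(100, [148.9, 150.2, 149.6, 150.4])
    print(f"mean {avg:.1f} uL, error {err:+.1f}%, CV {cv:.2f}%")
```

The result is only as good as the balance, which is the point of the comment: a miscalibrated scale makes every pipette look wrong in the same direction.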
        
         | samchorlton wrote:
         | Exactly. Better analytics can enable this technology to produce
         | better results than competing technologies in less time. Once
         | automated/easy/rapid sample prep comes, there will be mass
         | adoption in the space.
         | 
         | Disclaimer: Co-Founder of BugSeq[0]
         | 
         | [0]: https://bugseq.com
        
           | matthew_stone wrote:
           | > Once automated/easy/rapid sample prep comes, there will be
           | mass adoption in the space.
           | 
           | Sounds like Elon calling biology a "software problem".
           | 
           | Not saying that you're wrong, just saying that the
           | computational folk tend to discount the challenges and skills
           | required in the wet lab.
        
             | samchorlton wrote:
             | Agreed - Definitely a different class of problem than
             | "software". There are large barriers, eg. lab
             | contamination, biocontainment, low input protocols, etc;
             | however, technological innovation _will_ help with these.
             | 
             | That being said, we see a future where someone without
             | advanced molecular training can put a sample (whether
             | that's a nasal swab, concerning white powder received in
             | the mail or lab-grown meat) in a black box and get out a
             | meaningful report.
        
             | phkahler wrote:
             | >> Not saying that you're wrong, just saying that the
             | computational folk tend to discount the challenges and
             | skills required in the wet lab.
             | 
             | It's time to bring in the industrial automation folks. They
             | probably won't invent a fancy new algorithm to reduce the
             | time to splice the pieces together, but they'll fine tune
             | and automate your reader to the 9's.
        
           | koeng wrote:
           | Yea automated sample preps are key for me. The main thing
           | that is overlooked in synthetic biology about nanopore is it
           | has the capability to dramatically lower cost of indexing,
           | which turns out to be one of the main prohibiting costs for
           | dropping the cost of plasmid production.
        
       ___________________________________________________________________
       (page generated 2021-02-03 23:01 UTC)