Thoughts on the $1,000 genome
The idea of sequencing a human genome for $1,000 was what drew me into research as a second-year undergraduate at UCLA in 2006. I had signed up to do research in the lab of Professor Jacob Schmidt, a biophysicist masquerading as a biological engineer, and of the several projects to choose from I was most excited by the hare-brained idea of threading strands of DNA through a one-nanometer hole in a protein and reading the bases as they slid by.
The U.S. Human Genome Project started in 1990, published a draft human genome sequence in 2001, and declared the sequence complete in 2003, at a total cost of about $3 billion. A parallel project at Celera Genomics, founded by Craig Venter, started in 1998 and published its own draft at the same time as the public one for about $300 million (but used a significant amount of data released by the public program).
Also in 2006, a $10 million Archon X-prize was offered to the first organization that could sequence 100 human genomes in ten days for less than $1,000 each, setting up the symbolic target for science and for the public. The actual competition was to start in September 2013 but was cancelled in June for reasons that appear to vary based on how politically correct the source desires to be: some say that the prize was "outpaced by innovation" and that the industry no longer needed that incentive to reach the target; others, that potential competitors could not agree on a maximum error rate (the sponsors wanted one in one million, but industry did not think it needed to be that low).
Nonetheless, on January 14, 2014, Illumina launched the HiSeq X and claimed victory. This was achieved through what can be understated as "economies of scale": the machines are designed to be used and must be bought in sets of ten (which they are awkwardly calling "HiSeq X 10"), and the calculations assume that 72,000 genomes will be sequenced over four years. That's about 50 genomes per day for a total cost (machines, reagents, labor) of $82 million. Dividing by 72,000 we see that this should really be called the $1,140 genome; presumably the $1,000 figure comes from subtracting the $1 million-per-machine initial purchase cost. They had six customers at press time.
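For anyone who wants to check the arithmetic, here is a quick sketch in Python. The dollar and genome figures are the ones quoted above; the 365-day operating year and the variable names are my own assumptions, and subtracting the purchase price is only my guess at how the $1,000 figure was reached.

```python
# Figures quoted in the text (Illumina's stated assumptions).
TOTAL_COST_USD = 82_000_000    # machines, reagents, labor over four years
GENOMES = 72_000               # genomes sequenced over four years
YEARS = 4
MACHINES = 10                  # machines are sold in sets of ten
MACHINE_PRICE_USD = 1_000_000  # initial purchase cost per machine

# Assumption: the machines run every day of a 365-day year.
genomes_per_day = GENOMES / (YEARS * 365)

# All-in cost per genome: everything divided by the number of genomes.
all_in = TOTAL_COST_USD / GENOMES

# Guess: the headline figure leaves out the up-front machine purchase.
purchase_cost = MACHINES * MACHINE_PRICE_USD
excluding_purchase = (TOTAL_COST_USD - purchase_cost) / GENOMES

print(f"genomes per day:          {genomes_per_day:.1f}")
print(f"all-in cost per genome:   ${all_in:,.0f}")
print(f"excluding purchase cost:  ${excluding_purchase:,.0f}")
```

The all-in figure comes out just under $1,140, and dropping the $10 million purchase cost lands on $1,000 exactly, which is why I suspect that is where the number comes from.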
The HiSeq X uses "next-generation" sequencing technology. Marketing phraseology can get complex, but next-generation sequencing is what Illumina and its competitors do now. The so-called "current-generation" technology would be Sanger sequencing and its derivatives. The new technologies that are currently in the lab research phase (we fell into this group) are called third- or fourth-generation technologies.
I'm a bit disappointed that what brought us to the $1,000 genome in the end was not one of these elegant new technologies but what amounts to brute force: Illumina massively multiplexed its existing next-generation technology, improved the chemistry and optics (the machine works by imaging fluorescent nucleotides as they are incorporated into short strands of DNA), and bolted on better sample-preparation and data analysis systems. Our project at UCLA didn't work out; the one company, Oxford Nanopore Technologies, that tried to commercialize the technology announced a commercial product in dramatic fashion in 2012, but has not released any actual data (they are going to let a small number of customers, whom they will select themselves, use their product later this year). So it goes.
The value of a genome sequence to individual patients is not $1,000, yet. This is because of two things: we don't know enough about the genetic connections to diseases, and what little we do know does not translate into useful actions the patient can take (beyond "eat well" and "exercise as much as you can", which I could have told you without needing your genome sequence, or really any information at all). Still, Aetna for example lists 100 diseases for which it is willing to reimburse genetic testing, today. And at the research level, there are hundreds more for which genetic associations have been found. As soon as there are more than a handful of important conditions that are tied to particular genes, it will make more sense to just get a whole genome sequence rather than sequencing those specific genes, and more so the cheaper a genome sequence becomes. (To compare, the most sophisticated BRCA gene test for breast cancer costs about $4,000.)
So far as we know, the best way to increase our knowledge of the connection between genome and disease (and therefore the value of a whole genome sequence to patients) is to amass more genome sequences. A whole genome sequence is not just more data than a collection of individual genes; it is higher-quality data: it means we don't have to bias a study by sequencing a particular set of genes chosen by hypothesis. We can instead get it all, and then quietly search through it in hindsight for what was most likely to have been significant. Systems like Illumina's will speed that along.
February 15, 2014
Update, March 16: Oxford Nanopore released preliminary data from one of its machines in February. They achieved ~5.5 kb read lengths, limited by the size of the DNA fragments that came out of the library prep and not necessarily by the pore, for an E. coli genome sequence. (For comparison, standard next-gen sequencers give reads of ~300 bases.) But a full assembly of the E. coli genome required help from an Illumina machine, and there are still questions as to the error rate and coverage that are possible.
Human Longevity, Inc., a new company founded by Craig Venter, has bought two HiSeq X 10 systems to sequence 40,000 human genomes (including the genomes of the bacteria in their guts) per year. They've decided the best way to get at the fundamental nature of aging is to amass the largest population genome database ever created.