Ambitious Plans to Store Genome Data Online Closer to Reality


(November 9, 2014) Moore’s Law, the observation that computing power roughly doubles every two years, suggests that what is impossible today will likely be routine tomorrow. The Human Genome Project took 15 years and $3 billion to sequence the first human genome. Today, a genome can be sequenced for about $1,000 in roughly 24 hours.

Following that logic, researchers at Google are working to store human genomes online, although Google Drive currently cannot handle complete copies of a genome.

Google estimates that around 100 gigabytes are needed to store all six billion nucleotide letters of a single genome, which isn’t much on its own. The problem starts when that figure is multiplied by every person on the planet. A DNA database of everyone living in Moscow alone, for example, would need more than 1.2 million terabytes.
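The Moscow figure is simple multiplication. The sketch below reproduces it under a few labeled assumptions that are not stated in the article: decimal storage units, the per-genome figure read as 100 gigabytes, and a Moscow population of roughly 12 million people. The function name is hypothetical and purely illustrative.

```python
# Back-of-envelope storage estimate for a city-wide DNA database.
# ASSUMPTIONS (not from the article): decimal units (1 TB = 1,000 GB),
# the per-genome figure read as gigabytes, and a Moscow population of
# roughly 12 million people.

GB_PER_GENOME = 100   # the article's per-genome estimate, in GB
GB_PER_TB = 1_000     # decimal storage units


def database_size_tb(population: int, gb_per_genome: float = GB_PER_GENOME) -> float:
    """Total storage in terabytes if one full genome is kept per person."""
    return population * gb_per_genome / GB_PER_TB


moscow_population = 12_000_000  # assumed, for illustration only
print(f"{database_size_tb(moscow_population):,.0f} TB")  # prints "1,200,000 TB"
```

Swapping in a different population shows how quickly the total grows with the number of people covered.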

According to experts, the real challenge is not storage capacity but searching and indexing the data. For comparison, Google’s entire search index is currently about 100,000 terabytes. The saving grace is that people’s DNA is 99.1 percent identical, so if only each person’s differences from a shared reference genome are stored, less than one gigabyte is actually needed to keep our genetic essence in the cloud.
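The saving from shared DNA comes from storing a person’s genome as a list of differences against a common reference. The following is a minimal sketch of that idea, not the method Google or anyone else actually uses: it handles only single-letter substitutions (real genomes also differ by insertions and deletions), and the 9-bytes-per-difference cost and all function names are hypothetical assumptions. With the article’s 99.1 percent figure it gives an estimate of a few hundred megabytes per person, a small fraction of the full per-genome figure.

```python
# Minimal sketch of storing a genome as differences from a shared reference.
# SIMPLIFICATION: only single-letter substitutions are handled; real genomes
# also differ by insertions and deletions, which this sketch ignores.


def diff_against_reference(reference: str, sample: str) -> dict[int, str]:
    """Return {position: letter} for every position where sample differs."""
    if len(reference) != len(sample):
        raise ValueError("sketch assumes equal-length, aligned sequences")
    return {i: s for i, (r, s) in enumerate(zip(reference, sample)) if r != s}


def reconstruct(reference: str, diffs: dict[int, str]) -> str:
    """Rebuild the full sample sequence from the reference plus its diffs."""
    letters = list(reference)
    for position, letter in diffs.items():
        letters[position] = letter
    return "".join(letters)


def estimated_diff_megabytes(genome_letters: int, differing_per_mille: int,
                             bytes_per_variant: int = 9) -> int:
    """Rough size of a diff, in whole megabytes.

    ASSUMPTION: each difference costs 9 bytes (an 8-byte position
    plus a 1-byte letter).
    """
    variants = genome_letters * differing_per_mille // 1000
    return variants * bytes_per_variant // 1_000_000


reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
diffs = diff_against_reference(reference, sample)
print(diffs)  # prints {4: 'T', 9: 'A'}
assert reconstruct(reference, diffs) == sample

# 99.1 percent identical means about 9 letters in every 1,000 differ.
print(estimated_diff_megabytes(6_000_000_000, 9), "MB")  # prints "486 MB"
```

Only the short diff needs to be stored per person; the reference is shared by everyone.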

Why, then, would we want to save our genome sequences online at all?

According to scientists at the Institute for Systems Biology in Seattle, the vision is to compare data from “cancer genome clouds” and run virtual experiments as easily as a web search. Others believe the approach could help fight terminal diseases with greater success.

“If I were to get lung cancer in the future, doctors are going to sequence my genome and my tumor’s genome, and then query them against a database of 50 million other genomes,” said Deniz Kural, CEO of Seven Bridges, which stores genome data on behalf of 1,600 researchers in Amazon’s cloud. “The result will be ‘Hey, here’s the drug that will work best for you,’” he said.

The genome database is already under construction: some 3,500 genomes from public projects are stored on Google’s server farm. IBM, Microsoft and Amazon have joined the race, and whoever solves the data indexing problem stands to profit. As competition increases, however, prices are falling. Storing a single human genome with Google currently costs about $25, roughly the same as with Amazon.