Got alarmed by the NSA snooping around? It’s only metadata, we were assured by everybody from the heads of the intelligence services to the President; nobody listens to your phone conversations. Which is true, as far as it goes. And let me hasten to say that personally I am not unduly alarmed, as long as we don’t become a China or a Russia, or even an Ecuador-like country; we have to strike a balance between absolute liberty and security.
But consider what can be accomplished with metadata. A news story in Science (18 January 2013, p. 262) highlights a paper in the same issue by Yaniv Erlich and his graduate student Melissa Gymrek, both of MIT. They wanted to find out whether anonymized databases, like the 1000 Genomes Project, could be used to identify individuals who had donated their DNA, without needing a sample of their DNA as a reference. As Science describes it:
“Erlich’s team exploited two tricks. The first is that metadata about anonymous DNA donors, such as age at the time of donation and state of residence, is often included with their sequences. Erlich started with the genomes of 32 men of northern and western European ancestry collected in a public database as part of the International HapMap Project (Science, 26 May 2006, p. 1131). Based on the metadata, he knew the men’s ages and that each resided in Utah when they donated their DNA. But that only narrowed the search down to approximately 10,000 men.
For the next step in Erlich’s hack, he turned to a few dozen SNPs on the Y chromosome called Y-STR markers. These are almost certain to remain unchanged between father and son. Taken together, Y-STR markers are like a family crest that distinguishes one patrilineal pedigree from another. That’s a powerful tool if you want to know whether a man is a member of a particular family.
The second trick
That is where the second trick comes in. Cheap DNA-sequencing has made it possible for people to share their genetic markers in databases on recreational genealogy Web sites. To ferret out the donors’ identities, Erlich used the two most popular, which provide free access to databases containing nearly 40,000 records matching Y-STR to surnames.
When he plugged the 10 genomes with the most recoverable Y-STR markers into those genealogy databases, eight strongly matched to surnames of Mormon families in Utah. Ultimately, he was confident of his guesses for the surnames of five of the genome donors.
Erlich then gathered more information on each one using online resources such as public record search engines and obituaries. He hit the jackpot with metadata in records from Coriell Cell Repositories, a facility in New Jersey that provides cells from the 1000 Genomes Project donors to researchers. With that, he identified family members who had donated their own genomes to the same project, including women.”
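To make the logic of the attack concrete, here is a minimal sketch in Python of the two steps described above: match a donor's Y-STR profile against a surname database, then narrow the candidates using the metadata (state and approximate birth year). This is not Erlich's actual code or method. The record layout, the thresholds, the function names, and all the people and repeat counts are made up for illustration; only the marker names (DYS19 and so on) are real Y-STR markers.

```python
from dataclasses import dataclass


@dataclass
class SurnameRecord:
    """One hypothetical entry in a genealogy database: a surname plus Y-STR repeat counts."""
    surname: str
    ystr: dict  # marker name -> repeat count


def count_matches(profile, record_markers):
    """Return (matching markers, markers present in both profiles)."""
    shared = set(profile) & set(record_markers)
    matches = sum(1 for m in shared if profile[m] == record_markers[m])
    return matches, len(shared)


def infer_surnames(profile, records, min_shared=3, min_fraction=0.9):
    """Surnames whose Y-STR record agrees with the donor profile.

    The thresholds are illustrative assumptions, not values from the study.
    """
    candidates = []
    for rec in records:
        matches, shared = count_matches(profile, rec.ystr)
        if shared >= min_shared and matches / shared >= min_fraction:
            candidates.append((rec.surname, matches, shared))
    # Best-supported surname first.
    return sorted(candidates, key=lambda c: c[1], reverse=True)


def narrow_by_metadata(people, surname, state, birth_year, tolerance=1):
    """Filter a (hypothetical) public-records list by surname, state, and birth year."""
    return [
        p for p in people
        if p["surname"] == surname
        and p["state"] == state
        and abs(p["birth_year"] - birth_year) <= tolerance
    ]


# Toy data: every value below is invented.
records = [
    SurnameRecord("Alpha", {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}),
    SurnameRecord("Beta", {"DYS19": 15, "DYS390": 23, "DYS391": 10, "DYS393": 13}),
]
donor_profile = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 13}

people = [
    {"name": "Person A", "surname": "Alpha", "state": "UT", "birth_year": 1950},
    {"name": "Person B", "surname": "Alpha", "state": "NV", "birth_year": 1950},
    {"name": "Person C", "surname": "Alpha", "state": "UT", "birth_year": 1975},
]

surname_hits = infer_surnames(donor_profile, records)
suspects = narrow_by_metadata(people, surname_hits[0][0], state="UT", birth_year=1950)
```

Neither step is sophisticated. The Y-STR match alone yields only a surname, and the metadata alone yields thousands of men; it is the join between the two that singles out one person.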
The rapid proliferation and expansion of DNA databases will allow the correlation of genetic signatures with specific diseases, which is a boon to medical science and society. Federal law prohibits health insurance companies from using a person’s genetic data. But what prevents an insurance company from subcontracting the snooping work and using the contractor’s “recommendations” in the form of a number on a risk scale, without formally having access to the DNA data? Alternatively, insurance companies could create finely tuned policies, based on DNA data, so as to target small groups of individuals sharing certain genetic traits. The possibilities of using these public DNA databases without ostensibly breaking the law are almost endless, the only limiting factor being human ingenuity.
The common refrain we hear from critics of the NSA program is that, unlike private enterprises collecting personal data, the government has prosecutorial powers. I don’t know… To me, the likelihood of insurance companies surreptitiously laying their hands on my genetic makeup is far more alarming than the government having a record of my telephone calls.