Genetic Privacy in Practice

A year ago, NIH was touting its plan to open up its genome-wide association study (GWAS) data to all researchers. After all, research subjects should have nothing to worry about if their identities are not attached to their genomes and if the data on all the subjects’ genomes is aggregated. But at the beginning of the month, NIH pulled pooled GWAS data from its website and began encouraging other institutions to follow suit, because a team of scientists has figured out how to identify a single person’s DNA in a pooled sample of hundreds.

The paper describing the identification technique was published in the August 29 issue of PLoS Genetics by a team led by David W. Craig at the Translational Genomics Research Institute, also known as TGen, in Phoenix, AZ. In it, Craig and his team detail a new statistical technique that allows researchers to search through genomic mixtures that contain the DNA of more than 200 individuals and identify the presence of a single person’s DNA—even if that person’s DNA only makes up 0.1 percent of the entire mixture. They were even able to show how, theoretically, they could find an individual’s DNA in a mixture containing over 1,000 people.

This technique would be helpful to forensic experts, who usually find DNA samples at crime scenes that contain trace amounts of many individuals’ DNA. Specifically, the technique uses Single Nucleotide Polymorphism chips, or SNP chips, to detect the presence of tens of thousands of SNPs in a genomic mixture. SNP detection is usually employed to study the prevalence of certain genes and their correlation with certain diseases. Academic researchers have been using SNP chips to compile databases of human genomic variation like the one at the NIH, whereas clinicians and commercial ventures such as 23andMe and deCODEme have been using SNP chips to determine whether a particular patient or consumer possesses SNPs that are correlated with certain traits or conditions. In fact, the TGen study used SNP chips from two companies, Affymetrix and Illumina; Illumina is the company partnered with 23andMe.
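The article does not spell out the statistic itself, so the following Python sketch is only an illustration of the general idea, not the TGen team's published method. It assumes a simple per-SNP comparison: for each SNP, check whether a person's allele fraction sits closer to the pooled mixture than to an independent reference population, then summarize those differences across all SNPs with a t-statistic. Every name, the simulated genotypes, and the cohort sizes (a 50-person pool and 10,000 SNPs, chosen to keep the run short) are hypothetical.

```python
import math
import random
import statistics


def simulate_genotype(freqs, rng):
    """Draw one person's allele fraction (0, 0.5 or 1) at every SNP."""
    return [((rng.random() < p) + (rng.random() < p)) / 2 for p in freqs]


def pooled_frequencies(cohort):
    """Average allele fraction per SNP, i.e. what a pooled sample would report."""
    n = len(cohort)
    return [sum(column) / n for column in zip(*cohort)]


def membership_score(person, mixture_freqs, reference_freqs):
    """One-sample t-statistic over per-SNP distance differences.

    Assumed statistic: |person - reference| - |person - mixture| for each SNP.
    A clearly positive score suggests the person is closer to the mixture
    than chance would explain.
    """
    diffs = [
        abs(y - r) - abs(y - m)
        for y, m, r in zip(person, mixture_freqs, reference_freqs)
    ]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


# Hypothetical simulation parameters, not figures from the study.
rng = random.Random(0)
NUM_SNPS = 10_000
POOL_SIZE = 50

snp_freqs = [rng.uniform(0.1, 0.9) for _ in range(NUM_SNPS)]
mixture = [simulate_genotype(snp_freqs, rng) for _ in range(POOL_SIZE)]
reference = [simulate_genotype(snp_freqs, rng) for _ in range(POOL_SIZE)]
outsider = simulate_genotype(snp_freqs, rng)
member = mixture[0]  # someone whose DNA really is in the pool

mixture_freqs = pooled_frequencies(mixture)
reference_freqs = pooled_frequencies(reference)

score_member = membership_score(member, mixture_freqs, reference_freqs)
score_outsider = membership_score(outsider, mixture_freqs, reference_freqs)
print(f"member score:   {score_member:.2f}")
print(f"outsider score: {score_outsider:.2f}")
```

Under these made-up parameters, the member's score should come out well above the outsider's, which should hover near zero. The per-SNP signal is tiny, but it accumulates across thousands of SNPs, which is why chips that read tens of thousands of SNPs matter here. Real chip data would add measurement noise that this sketch ignores.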

If this method is made more cost effective for crime labs, “it would be an amazing asset,” said Commander Brent Vermeer, director of the Phoenix Police Department crime lab, in the TGen press release. For some time, one of the standing assumptions about forensic DNA tests has been that it is impossible to identify individuals from pooled data. Investigators currently use techniques that detect about 20 SNPs and cost about $50. The chips used in the TGen study detect tens of thousands of SNPs and cost several hundred dollars.

The TGen press release also notes a bill, passed in the Arizona Senate in June, that “requires police agencies to keep DNA evidence in cases of homicide or felony sexual assault for as long as convicts are in prison or on supervised release, or at least 55 years in unsolved cases. Some like Phoenix keep it indefinitely.”

Vermeer added in the press release, “As technology advances, we need to be prepared to keep evidence that, down the road, could prove again to be useful.”

In an email to GenomeWeb News, Kathy Hudson, director of the Genetics and Public Policy Center (GPPC), explained the legal implications: “So, the unlikely but concerning scenario is that law enforcement has a DNA sample from a crime scene, searches an NIH database, finds a match and gets a subpoena to identify what researcher provided the cohort data.”

“While a fairly remote concern, and there are some protections even against subpoena, NIH did the right thing in acting to protect research participants,” she wrote.

The larger privacy concern that led to the NIH’s new database restrictions is that this technique allows anyone with the technology to go into an aggregate genomic database and search for an individual’s particular genetic signature, provided, of course, that the searcher already knows what that person’s genetic signature is. There have not been any breaches yet, but the NIH decided to abide by the precautionary principle and make the data available only to researchers who apply for access for a certain period of time. The NIH also confirmed that other groups, including the Wellcome Trust Case Control Consortium and the Broad Institute of MIT and Harvard, have removed their aggregate data from public availability.

To allay any other concerns, the NIH told GenomeWeb, “even if an individual’s SNP profile was found within a pooled dataset, all that would be learned is that this profile was contained in the dataset and, thus, it could then be associated with the characteristics of that dataset (e.g., disease or control population).”
