We reported on a dispute over the methods used by New York City’s crime lab to analyze complex DNA samples. Now similar concerns are prompting a national study. In this Q&A, a leading expert explains why labs may be making mistakes — and what can be done about it.
By Lauren Kirchner
The National Institute of Standards and Technology announced last week that it is launching a new study of certain types of DNA analysis used in criminal prosecutions. These methods, “if misapplied, could lead to innocent people being wrongly convicted,” according to the institute’s statement. NIST will invite all public and private forensics labs in the U.S. to participate by testing the same set of complex DNA samples, and will compare the results, which will be published online next summer. Its goal is to develop a new set of national standards for DNA analysis.
This study comes at a time when labs are seeking to identify suspects based on especially small samples (such as “touch” DNA, which consists of just a few skin cells), and using software to help analyze mixtures of more than one person’s genetic material. ProPublica recently reported on a dispute over two such methods used by New York City’s crime lab: high-sensitivity testing of trace amounts of DNA, and the Forensic Statistical Tool, a type of program known technically as “probabilistic genotyping software.”
John Butler, a DNA expert and the author of several textbooks on forensic DNA testing, will be leading a team of scientists for the NIST study. He spoke to ProPublica Tuesday from the institute’s offices in Gaithersburg, Maryland.
Why this study, and why now?
Just in the past two years, there has been a huge rush to go into the probabilistic genotyping field, and people are jumping into this without really thinking about a lot of these issues: how sensitivity impacts what they’re doing, how “transfer” and “persistence” of DNA can impact their results, and what they’re doing in terms of the way that they set up their propositions that go into the likelihood ratios of their probabilistic genotyping programs.
The goal of this study is not to do a Consumer Reports on software, that’s not the purpose of this. I know that perhaps some commercial manufacturers may feel like we’re going to unjustly review their software — that’s not the plan. It’s to see, if presented with mixtures — and people are free to use manual methods or different software systems — what the different responses are. Nobody’s ever really looked at the results from the same samples, across different platforms, to see what happens.
There was a criminal case in upstate New York last year where two different commercial programs, TrueAllele and STRmix, came up with two different results for the same DNA evidence, or at least characterized the results in very different ways.
Yes, there are several things going on there. One is, how are they modeling the data collected from the evidence? So you may have the exact same DNA profile, but the modeling of what the profile means, and how the data is evaluated, can be different. But then the other aspect is, what propositions are put into it? So are you assuming that everyone [in a mixed DNA sample] is unrelated? All those things factor into what the final result is, and so that’s one of the reasons you see a difference.
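For readers unfamiliar with the term, the likelihood ratio Butler refers to is conventionally written as shown here. This is the standard textbook form, not the specific model inside any of the programs named in this article, and the symbols E, H_p and H_d are notation introduced only for this illustration.

```latex
\mathrm{LR} = \frac{\Pr(E \mid H_p)}{\Pr(E \mid H_d)},
\qquad
\underbrace{\frac{\Pr(H_p \mid E)}{\Pr(H_d \mid E)}}_{\text{posterior odds}}
= \mathrm{LR} \times
\underbrace{\frac{\Pr(H_p)}{\Pr(H_d)}}_{\text{prior odds}}
```

Here E stands for the observed DNA profile, H_p for the prosecution's proposition (for example, that the suspect is one of the contributors) and H_d for the defense's proposition (for example, that the contributors are unknown, unrelated people). Changing either proposition, or the model used to compute the two probabilities, changes the ratio, which is one way two programs can report different numbers for the same evidence. The second identity is why a large ratio is not, by itself, the probability that the suspect is the source.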
I think to most readers, the fact that two programs could come up with two different results is really alarming.
Well, I think we have to do a better job of trying to explain why those differences exist, and then to really tease them apart. Is it reasonable to get a vastly different result? And what does that mean, to a jury or a judge, or even to the police or prosecutor who are getting the results? Do they really appreciate what those results mean, or the range of possibilities that are there [in those results]? Just because you have a big number doesn’t mean that you got the right person. That kind of thing.
Will that be something that this study will look at — the way DNA results are explained, as well as how they are obtained?
They go together. If you can’t communicate the results, then you’re not really effective. Why generate them in the first place? That’s been my attitude: There’s just as much effort that needs to go into making sure you convey the right information, and not misinformation, with the DNA test results.
What will this study look like, and what do you hope to find?
We’ll start with a historical perspective on the literature, going back into the last 20 or so years. But it’s really been in the last 10 years that things have changed dramatically, because of the change in [testing] sensitivity. You have people looking at more and more mixtures, which they didn’t have when the sensitivity wasn’t as high, and they weren’t looking down into the weeds for their results. So I’ve been looking through all the proficiency tests that exist out there now that we can get our hands on, to understand what people have actually been tested for, in mixtures, and then how they’ve all performed, to get our current data points.
Then the other part we’re looking at is, DNA can get transferred between people, like when they shake hands. There have been lots of studies that have been done on this, but a lot of people don’t know about it. So this is to inform the forensic scientists as well. Understanding the implications of, if you have a really high-sensitivity technique and you’re putting it into a computer program to do testing on whether someone’s DNA is in there or not — just because someone’s DNA is there, what’s the meaning of that?
I know you’ve been looking into FST. One of the challenges is that you have proprietary software systems, and you’ll never be able to get to the bottom of some of those things. Getting access to the code for TrueAllele or for STRmix may never easily happen, because of the commercial environment that they’re in. And all the interlab studies we’ve ever done before have never really had these systems involved. We have now had a massive sea change in terms of labs moving towards probabilistic genotyping, and not really knowing what they’re doing, in terms of what will be their impact.
ProPublica has actually intervened in a federal court case to try to gain access to the source code for FST.
My personal opinion is that companies have a right to proprietary information (which would include the source code so that a competitor does not have the ability to steal their hard work). But, with situations like DNA testing, it is important for users to understand what models are used, how data are processed, and the impact of assumptions being made. In other words, be transparent — which is a bedrock scientific principle.
I saw that NIST might also do a similar study on bite mark evidence in the future. So many types of forensic science, from firearms analysis to hair analysis to arson science, have been recently called into question. Some scientists even consider fingerprinting to be controversial now. But is it still fair to characterize DNA as the “gold standard” of forensic science?
In my talk that I gave back in 2015, I made an analogy to math. You have basic math, like two plus two equals four — basic arithmetic — that’s the equivalent of single-source DNA profiling. That works, and that’s your gold standard. When you get to sexual assault evidence — when you have a perpetrator’s and a victim’s DNA mixed together — that’s algebra. Usually there is a high level of DNA there, and it’s not an issue. But when you get to “touch” evidence, which is what we’re increasingly seeing in the forensic field, that’s calculus. So when we talk about DNA, it’s not that calculus, it’s not the touch evidence that’s the gold standard. The gold standard is the use of DNA with databases in either single-source samples or simple two-person mixtures.
And here’s the challenge: Labs are not prepared to do the complex mixtures. The reality is, all the labs’ proficiency tests, as I’m looking at them, are like basic math or algebra. So you’re going into a final exam on calculus, but you’ve only done homework on algebra and basic arithmetic. Are you going to pass that exam? That’s the reality of what we’re facing.