EE user Bob raised the $64K question of the DNA era: How do we evaluate DNA using those two basic research tools, the Genealogical Proof Standard (GPS) and the Evidence Analysis Map (EAM)?
In our past two postings, we discussed
- The difference between the GPS and the EAM.
- How to evaluate DNA test results as a source: Is it an original record, a derivative of some sort, or a narrative report?
Today, let’s consider the information we get from DNA test results.
The reason we separate information from sources is that they are different entities. Think of it this way. When we buy a product at the store, we evaluate at least two different things. We evaluate the packaging and we evaluate the contents.
In the research world, the packaging is the source. The content is the information. The packaging may have problems. Say we buy a dozen pens and the package is torn. The pens may still be in perfect shape, some may be damaged, or all of them may be.
The information we get from a source is like those pens in that package. Each is an individual item. One may be reliable while another has problems. Following this concept, the EAM teaches us two critical points:
A. Information comes in three broad types:
- Primary (firsthand information);
- Secondary (secondhand information);
- Undetermined (for all those cases in which we don’t know who the informant is).
B. Each piece of information has to be separately evaluated. So, too, with DNA test results.
Different companies report their results in different formats—that format is what we evaluate as a source. Within that format, different companies provide different types of information. Some of it is primary, aka firsthand. Some of it is secondary, aka secondhand. As examples:
The report we get from a testing company is typically firsthand.
Most testing companies report our test results by setting up an individual database for us. That database generally provides information based on firsthand knowledge. The company is the informant. It was the principal participant in the process. It created the data we are getting. That database may present the data in different ways. For example:
- It may offer raw data.
- It may create a database of other testers who match us.
- It may build that information into a chromosome browser.
- It may create a certificate we can print out, showing specific markers.
Within all these formats, the company is providing primary information about our DNA. It has firsthand knowledge of the details of the testing process and firsthand knowledge of how it is interpreting the results.
When it tells us that we and another tested person share x-number of segments on Chromosome Whatever, with a length of x.x cM, it has firsthand knowledge of the data, the algorithms, and the methodology used.
Many testing companies have a module in which we and other users can upload our genealogical trees. This module offers secondhand information. The testing company does not create those trees. It does not have firsthand knowledge of all the information in those trees.
Some testing companies incorporate these trees into their reports. They tell us that we and Joe Blow seem to be fourth cousins, once removed. They give us a personalized tree showing how we and Joe Blow both descend from William and Mary Blow. That information is secondhand. The testing company’s algorithm did create the tree, but it does not have firsthand knowledge of each genealogical “fact” on Joe’s tree or our tree through which we climb back to William and Mary.
With many sources, there are situations in which we do not know the identity of the informant. In that case, we consider the nature of our information to be “undetermined.” With DNA test results, this is not a situation we would expect. Even if we do not know the informant for each piece of information on a genealogical tree, if the testing company uses that tree to predict a common ancestor, then that portion of the company’s report is secondhand.
Of course, labeling a piece of information as primary or secondary is not the end game when it comes to evaluating the reliability of what we have. Labels can be misleading, and labels don’t tell the full story.
It’s true: firsthand is usually better than secondhand. That said, primary information is not infallible. A piece of information received directly from a testing company can be wrong. Algorithms are not perfect, and the humans who conduct the tests and key the results into databases can make mistakes. Conversely, the secondhand information provided for matching trees may be correct.
With every piece of information, we also have to evaluate whether that information provides EVIDENCE for the research question we are trying to answer. If so, we have to understand the kind of evidence it provides and how we can use it to support our conclusion. Tomorrow, we’ll tackle that.
HOW TO CITE: Elizabeth Shown Mills, "Applying the EAM to DNA: Part 2, Information," blog post, QuickTips: The Blog @ Evidence Explained (https://www.evidenceexplained.org/quicktips/applying-EAM-to-DNA-part-2-information : posted 11 August 2018)