DNA 101 - A (hopefully GENTLE) INTRODUCTION

1.  DNA is the part of every cell in our bodies that tells the cells how to make us - it is the "master building plan" for our bodies.

2.  The “points”, or “marker” genes, in human DNA (tens of thousands of genes in all) have been “mapped” and assigned identifying numbers.  These are the numbers you see at the top of the chart:  the “Marker Numbers.”  Most of them are called “DYS” Numbers (DNA Y-Chromosome Segment).

 

3.  But if there are tens of thousands of genes in our personal DNA, why are there only a few numbers on the chart?  This is because the genealogy DNA test checks only a FEW of the marker genes, and only on the Y (male) chromosome; different companies may test different markers.  These markers were chosen because they have proved to be the most “stable” predictors of “clan” relationships.  In other words, they usually remain the same for long periods of time with only slight modifications, which makes them good for comparing family lines back many generations.

4.  A person submitting their DNA to the company we use (Family Tree DNA - http://www.ftdna.com) can have a "12-marker test", a "25-marker test", a "37-marker test", or a "67-marker test".
            a.  The first set of 12 markers is the most "stable" of the group and USUALLY (“usually” means there CAN be variations even in closely related lines!) doesn't change as rapidly as the other markers.
            b.  Markers 13-25 are SOMEWHAT less "stable" than the first group (they can change a little more rapidly), and the same goes for 26-37 and then 38-67.  All of this is interpreted BROADLY when we talk about matches:  you should not rule out a match just because you do not match on one or two markers in the first 12.  The match may still be there; it just may not be as close genealogically as it would be if they all matched.

 

5.  So what are we matching? 

a.  Each marker is a short stretch of DNA code that repeats itself a certain number of times in a row.

b.  That number of repeats is the number in the box below the marker number (the Allele number in genetic lingo). 

c.  Taken across the whole set of DNA in the cell (all of its genes), each person’s combination of repeat counts is specific to that individual, which is why DNA can be used to determine paternity or identify people in criminal cases.

d.  BUT  . . . remember genealogy DNA only uses a FEW of the genes and ONLY from ONE of the 46 chromosomes in the cell:  the Y (male) chromosome.  These are genes that remain the same for many generations, the ones that are more STABLE.  They are not useful in the kind of identification done in legal or criminal cases simply because they do NOT differ much from one related man to the next.

e.  As you compare numbers, you are looking for someone who matches as many of your numbers as possible.
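For readers who like to see the idea spelled out, here is a minimal sketch of that comparison in Python.  The kit contents and the function name `count_matches` are made up for illustration; the repeat counts are not taken from the chart.

```python
# Hypothetical results for two kits: marker name -> number of repeats.
# These values are illustrative only and do not come from the project chart.
kit_a = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "DYS391": 11}
kit_b = {"DYS393": 13, "DYS390": 23, "DYS19": 14, "DYS391": 11}


def count_matches(a, b):
    """Return (matching, compared) over the markers both kits were tested on."""
    shared = sorted(set(a) & set(b))
    matching = sum(1 for marker in shared if a[marker] == b[marker])
    return matching, len(shared)


matching, compared = count_matches(kit_a, kit_b)
print(f"{matching} of {compared} markers match")  # 3 of 4 markers match
```

Only markers that both kits were tested on are compared, so a 12-marker kit can still be set against a 37-marker kit.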

6.  Looking at the chart, some DYS numbers at the top are in red:  those numbers indicate markers which can change (mutate) rapidly - sometimes from one generation to the next. 

a.  Matches on these numbers are more conclusive than matches on the other markers because these fast-changing markers have NOT changed between the two samples.

b.  And, similarly, a non-match on these numbers means less than a non-match on the numbers in black simply because they could have changed in a single generation, perhaps even between the DNA donor and his father.


7.  There is also the consideration of how GREAT the difference is when you do have a difference:  a difference from 11 in one sample to 10 or 12 in the other is not as great as a difference from 11 in one to 8 or 14 in the other.  This is called “Genetic Distance” and is part of the report you get from FTDNA when your results come back.

 

8.  Here is the FTDNA site explanation of how they get the “genetic distance report” based on the number of steps of difference:

    “When comparing people’s samples in our system we show individuals who are closely matched, but not identically matched, as being different by what the Anthropologists call genetic distance.

    “If two people were identical in all markers except they are off in one marker by 1 point, the genetic distance would be 1. If they were off at 2 different markers by 1 point in each marker, then the genetic distance of those two samples would be 2. If they were off by 2 points at one marker and 1 point in a second marker, then the genetic distance would be 3. This is called the Stepwise Model of calculating genetic distance for shallow time depths. (i.e. Genealogy not Anthropology).”

 

    “Some markers have shown themselves to be more volatile than others and the population geneticists have created a second model to account for these ‘aberrations’. That model is called the Infinite allele model. For markers that fall into this category, despite the fact that two people could be separated by 2 (or 3) mutations, the scientific assumption is that the change took place in a single generation (between a father and a son) and therefore it is treated as a single step, despite the fact that more than one ‘point’ separates two samples.

    “Currently the Scientists have asked us to classify DYS 464 and YCAII a and b as following the Infinite Allele Model.”
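The arithmetic in that explanation can be sketched in a few lines of Python.  This is only an illustration of the two counting rules as quoted, not FTDNA's actual program:  the function name, the marker labels such as `YCAIIa`, and all repeat counts are assumptions made for the example, and treating each copy of a multi-copy marker as its own entry is a simplification.

```python
# Markers the quoted text places under the infinite allele model.
# Matching on a name prefix (e.g. "YCAIIa") is an assumption for this sketch.
INFINITE_ALLELE_PREFIXES = ("DYS464", "YCAII")


def genetic_distance(a, b):
    """Sum the differences over the markers both samples share."""
    total = 0
    for marker in set(a) & set(b):
        diff = abs(a[marker] - b[marker])
        if diff == 0:
            continue
        if marker.startswith(INFINITE_ALLELE_PREFIXES):
            total += 1      # infinite allele model: any change is one step
        else:
            total += diff   # stepwise model: each point of difference counts
    return total


# Illustrative repeat counts only; not taken from the project chart.
base = {"DYS393": 13, "DYS390": 24, "DYS19": 14, "YCAIIa": 19}

print(genetic_distance(base, dict(base, DYS390=25)))            # 1
print(genetic_distance(base, dict(base, DYS390=25, DYS19=15)))  # 2
print(genetic_distance(base, dict(base, DYS390=26, DYS19=15)))  # 3
print(genetic_distance(base, dict(base, YCAIIa=21)))            # 1
```

The first three results reproduce the three cases in the quoted text; the last shows a two-point difference on a volatile marker being counted as a single step.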

 

9.  There is another column on the chart that has not been addressed:  the “Haplogroup” column.  Haplogroup is a geneticist’s term for the broad, overall “clan” group from which our genetic pattern originated, and it is based on general marker groupings of the first 12 markers.  When you think about haplogroups, you are thinking in terms of 10,000 or even tens of thousands of years back.  There are currently 18 haplogroups lettered A-R, with subgroups labeled alternately by numbers and letters (for example, the main haplogroup for our Claxton/Clarkson/Clarkston project is R1b1, which is the most common grouping).  Here is FTDNA’s explanation of the R1b1 group:

“Haplogroup R1b1 is the most common haplogroup in European populations. It is believed to have expanded throughout Europe as humans re-colonized after the last glacial maximum 10-12 thousand years ago. This lineage is also the haplogroup containing the Atlantic modal haplotype.”

You’ll also notice that some of the haplogroup designations are green and the rest are red:  the ones that are green have had an additional test done (called a “deep clade test”) to determine their haplogroup for sure; the ones in red are PREDICTED based on FTDNA’s database.

To see what the overall pattern of named haplogroups looks like, you can go to FTDNA’s web site and look at their “Human Phylogenetic Tree.”

 

10.  Now that you have the “technical” part down, let’s look at the chart itself.

            a.  On the left is the kit number.

            b.  Next to the kit number is the name of the progenitor for that group – the most distant known ancestor – and a little of what is known about him.

            c.  Next is the haplogroup.  Notice that we do have two samples which came back as I’s:  the I haplogroup is the Nordic line with roots in Scandinavia.

            d.  The rest of the columns have the results for each kit for each marker (colors are explained in #11 below).

 

11.  Colors on the chart.

            a.  We now have 8 lines that match exactly on the first 12 markers (and an additional one with only one marker different), so we are taking that group as our primary group:  numbers which match those lines have been colored Light Yellow.  We have arranged those lines in order by the date of birth of the progenitor (with the exception of Robert C. Clarkson who differs from the main group by two markers in the 37 marker section and so is listed immediately after the lines that match on all 37 markers).

            b.  Absence of color (white) is used to show where there is a single differing number (in other words, that number does not match the primary group, and no one in the other groups has that number in that column either).

            c.  The other colors (Light Green, Lavender, Pale Blue, Light Turquoise, Rose, Gray, and Tan) are used to GROUP numbers which are different from the primary group, but have others which match.  I have tried to group these to show the majority of their matches together.

            d.  In addition, we have marked several “pairs” of progenitor boxes with color.  These colors indicate that the progenitors with the same color are assumed to be the same person even though the DNA does not match exactly between the two lines.  There are a variety of reasons why the DNA would not match, one of which is undocumented adoptions where family members raised children as their own when the parents of the children both died, and then the children carried the adoptive parents’ surname instead of their biological father’s name.
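
The coloring rules in a-c above can be roughly sketched in Python for a single marker column.  This is a simplified illustration, not how the chart was actually produced (the groupings on the chart were arranged by hand across whole lines); the function name, the labels, and the sample values are all assumptions.

```python
from collections import Counter


def classify_column(values, primary_value):
    """Label each kit's value in one marker column of the chart."""
    counts = Counter(values)
    labels = []
    for value in values:
        if value == primary_value:
            labels.append("primary")  # matches the primary group: Light Yellow
        elif counts[value] == 1:
            labels.append("unique")   # no other kit has this value: left white
        else:
            labels.append("shared")   # differs but has company: a group color
    return labels


# Illustrative values for one marker across six kits.
column = [13, 13, 13, 12, 14, 14]
print(classify_column(column, primary_value=13))
# ['primary', 'primary', 'primary', 'unique', 'shared', 'shared']
```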

 

For further information (clarification?) on this explanation or for additions or corrections to the chart or the explanation, please e-mail me.