comparing DNA match data - redux #dna


Lee Jaffe
 

I decided to try to reframe my earlier post, with the initial responses and new data I've gathered, to place the question in context. 

My original post featured two DNA match reports, one from MyHeritage and another from Ancestry, for my mother Nancy and an unknown person Mary.  MyHeritage reported total shared DNA of 123.8 cM (1.8%), in 10 segments, the largest being 25.2cM (and 4 other segments larger than 10cM).  In turn, Ancestry reported 21cM (66cM unweighted) across 7 segments with the longest 15cM.  I asked the list for help understanding such a wide discrepancy in these reports.  

I received several responses, here and privately, most sharing a consistent view that Ancestry's algorithms are more conservative while MyHeritage often inflates its reports.  I don't want to go into a lot of detail here (you can see my original post and the full exchange at https://groups.jewishgen.org/g/main/message/673482), but here is one representative quote: 

MH has some imputation problems where they "connect" some small segments that may be close.  They also include segments under 7 cM.  So they tend to inflate their numbers.   Ancestry excludes the segments under 7 cM, which would bring down the match size, and then TIMBER takes out pile-up areas. 

Most responses recommended comparing tests at GEDMatch for a more realistic reading.  I should also mention that there were at least a couple of responses warning not to dismiss MyHeritage results, though still acknowledging that their numbers were not as rigorous.

These responses were alarming.  Anyone who has spent any time trying to use DNA to further their research – especially in Ashkenazi inheritance where endogamy muddies the waters – will recognize that a DNA match of 123.8cM with a 25.2cM longest segment is considered significant, while a match of 21cM (or even 66cM) with a 15cM longest segment is to be given lower priority.  Processing what to do with this information, I wondered whether other matches I'd been exploring also exhibited such variation and were correspondingly questionable.  

I reviewed a table of my mother's DNA matches collected from Ancestry, MyHeritage, GEDMatch and FTDNA (see below).  For the 22 instances where I found matches from two or more testing sites, the average (mean) difference between the highest and lowest reported total was 18cM (when the 102.8 discrepancy reported for Mary was removed from the calculation).  Seven of the 21 (again, excluding Mary) posted above average variations: the largest was 71.6cM and the smallest 1.2cM.  In all but a few cases the variation was not significant and would not affect a determination about which matches to pursue.   

At the same time, a review of the data strongly suggests that assertions that MyHeritage inflates its reports and Ancestry is more conservative don't hold up.  In fact, Ancestry reported a higher total cM than MyHeritage did in 7 of the 9 instances where a head-to-head comparison was available.  And Ancestry posted the highest total cM, compared to any other test site, in half (11 of 22) cases.  
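For anyone who wants to re-check these figures, here is a minimal Python sketch that recomputes them from the table at the end of this post. The values are transcribed by hand from that table; the B.W. GEDMatch value is read as 60.1 (treating the stray "5" in that row as a typo, which is an assumption), and the helper names `spread` and `best` are just illustrative.

```python
# Total shared cM per match and site, transcribed from the table in this post.
# Assumption: the B.W. GEDMatch row is read as 60.1 cM.
DATA = {
    "A.S.": [("FTDNA", 582), ("MyHeritage", 590.4), ("Ancestry", 598), ("GEDmatch", 603.3)],
    "B-D": [("GEDmatch", 99.4), ("FTDNA", 105)],
    "B.F.": [("FTDNA", 62), ("MyHeritage", 63.2)],
    "B.G.": [("GEDmatch", 118.8), ("FTDNA", 130), ("MyHeritage", 131.5)],
    "B.H.": [("GEDmatch", 57.3), ("MyHeritage", 128.9)],
    "B.W.": [("GEDmatch", 60.1), ("MyHeritage", 69.8)],
    "D.H.": [("GEDmatch", 175.7), ("Ancestry", 205)],
    "G.L.": [("MyHeritage", 183.3), ("Ancestry", 197)],
    "H.R.": [("GEDmatch", 192.9), ("Ancestry", 195)],
    "J.F.": [("MyHeritage", 85), ("Ancestry", 106)],
    "J.P.": [("MyHeritage", 138.6), ("Ancestry", 148)],
    "J.Q.": [("FTDNA", 85), ("GEDmatch", 92.2), ("MyHeritage", 96)],
    "M.B.": [("FTDNA", 287), ("GEDmatch", 287), ("MyHeritage", 300.2)],
    "M.L.": [("GEDmatch", 139.2), ("Ancestry", 143)],
    "M.R.": [("FTDNA", 220.9), ("GEDmatch", 221.8), ("MyHeritage", 223.5),
             ("Ancestry", 234), ("FTDNA", 239)],
    "Mary": [("Ancestry", 21), ("MyHeritage", 123.8)],
    "N.T.": [("FTDNA", 125), ("GEDmatch", 130.2), ("Ancestry", 150), ("MyHeritage", 153.5)],
    "P.G.": [("GEDmatch", 168.8), ("GEDmatch", 172.1), ("GEDmatch", 172.2),
             ("MyHeritage", 197.9), ("Ancestry", 207)],
    "R.A.": [("GEDmatch", 221.4), ("Ancestry", 258)],
    "R.F.": [("GEDmatch", 315.4), ("Ancestry", 324)],
    "S.H.": [("GEDmatch", 105.3), ("MyHeritage", 115), ("Ancestry", 122)],
    "S.R.": [("GEDmatch", 104.3), ("Ancestry", 118)],
}


def spread(readings):
    """Highest minus lowest reported total for one match, rounded to 0.1 cM."""
    values = [cm for _, cm in readings]
    return round(max(values) - min(values), 1)


def best(readings, site):
    """Highest total reported by one site for this match, or None if absent."""
    values = [cm for s, cm in readings if s == site]
    return max(values) if values else None


spreads = {name: spread(r) for name, r in DATA.items()}
without_mary = {n: s for n, s in spreads.items() if n != "Mary"}
mean_spread = sum(without_mary.values()) / len(without_mary)
above_average = [n for n, s in without_mary.items() if s > mean_spread]

# Head-to-head: matches that have both an Ancestry and a MyHeritage total.
head_to_head = [
    (n, best(r, "Ancestry"), best(r, "MyHeritage"))
    for n, r in DATA.items()
    if best(r, "Ancestry") is not None and best(r, "MyHeritage") is not None
]
ancestry_higher = [n for n, a, m in head_to_head if a > m]

# Matches where Ancestry reports the highest total of any site.
ancestry_top = [
    n for n, r in DATA.items()
    if best(r, "Ancestry") is not None
    and best(r, "Ancestry") == max(cm for _, cm in r)
]

print(f"Mean spread excluding Mary: {mean_spread:.1f} cM")
print(f"Above-average spreads: {len(above_average)} of {len(without_mary)}")
print(f"Ancestry higher than MyHeritage: {len(ancestry_higher)} of {len(head_to_head)}")
print(f"Ancestry highest overall: {len(ancestry_top)} of {len(DATA)}")
```

The same dictionary can be extended with new matches as they turn up, and the counts will update without any hand arithmetic.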

For my part, this leaves me still looking for an answer to my original question:  What can I make of two tests between my mother and Mary reporting such extremely different results?  If I can't explain it away with the algorithms used by the two testing services, what is the explanation?  And, on the practical side, which test do I trust?  I understand that a 3rd party comparison is most likely to answer that question, if Mary is willing to copy her results to GEDMatch.  I've asked her but she hasn't responded so far.  Even if this individual case is eventually resolved, what do we make of such discrepancies?  This reminds me of something I found in a fortune cookie: "A person with one watch is certain of the time.  A person with two watches isn't sure."  Testing with more than one service is supposed to help with our family history research, not create more uncertainty.   

Lee David Jaffe
===============
Surnames / Towns:  Jaffe / Suchowola and Bialystok, Poland ; Stein (Sztejnsapir) / Bialystok and Rajgrod ; Roterozen / Rajgrod ; Joroff (Jaroff, Zarov) / Chernigov, Ukraine ; Schwartz (Schwarzstein) / Ternivka, Ukraine ;  Weinblatt / Brooklyn, Perth Amboy, NJ ; Koshkin / Snovsk, Ukraine ; Rappoport / ? ; Braun / Wizajny, Suwalki,  Ludwinowski / Wizajny, Suwalki

Match Site Total cM
A.S. FTDNA 582
A.S. MyHeritage 590.4
A.S. Ancestry 598
A.S. GEDmatch 603.3
B-D GEDMatch 99.4
B-D FTDNA 105
B.F. FTDNA 62
B.F. MyHeritage 63.2
B.G. GEDMatch 118.8
B.G. FTDNA 130
B.G. MyHeritage 131.5
B.H. GEDmatch 57.3
B.H. MyHeritage 128.9
B.W. GEDMatch 60.1
B.W. MyHeritage 69.8
D.H. GEDMatch 175.7
D.H. Ancestry 205
G.L. MyHeritage 183.3
G.L. Ancestry 197
H.R. GEDMatch 192.9
H.R. Ancestry 195
J.F. MyHeritage 85
J.F. Ancestry 106
J.P. MyHeritage 138.6
J.P. Ancestry 148
J.Q. FTDNA 85
J.Q. GEDMatch 92.2
J.Q. MyHeritage 96
M.B. FTDNA 287
M.B. GEDmatch 287
M.B MyHeritage 300.2
M.L. GEDMatch 139.2
M.L. Ancestry 143
M.R. FTDNA 220.9
M.R. GEDMatch 221.8
M.R. MyHeritage 223.5
M.R. Ancestry 234
M.R. FTDNA 239
Mary Ancestry 21
Mary MyHeritage 123.8
N.T. FTDNA 125
N.T. GEDMatch 130.2
N.T. Ancestry 150
N.T. MyHeritage 153.5
P.G. GEDMatch 168.8
P.G. GEDMatch 172.1
P.G. GEDMatch 172.2
P.G. MyHeritage 197.9
P.G. Ancestry 207
R.A. GEDMatch 221.4
R.A. Ancestry 258
R.F. GEDmatch 315.4
R.F. Ancestry 324
S.H. GEDMatch 105.3
S.H. MyHeritage 115
S.H. Ancestry 122
S.R. GEDMatch 104.3
S.R. Ancestry 118
 


stoub@...
 

The dramatic differences between MyHeritage and Ancestry are most noticeable when Ancestry thinks the total is <90cM. Above that threshold, TIMBER is not in effect. For Nancy's match to Mary, Ancestry calculates the unweighted total as 67cM, so TIMBER is in effect and downweights it to 21cM. [Most of the matches you list above are well above 90cM.] Ancestry is basically saying that 2/3 of the DNA shared is too common to be genealogically meaningful. Some folks don't like TIMBER, but my experience is that it is very useful, and Ancestry numbers are far more accurate than MyHeritage's. I doubt this is a traceable match.
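The "2/3" figure is simple arithmetic on the two Ancestry numbers, sketched here in Python. The 90cM cutoff is the threshold as stated in this post, not a constant taken from Ancestry's documentation, and `downweighted_share` is a made-up helper name.

```python
# Threshold as described in this post (an assumption, not an official constant).
TIMBER_THRESHOLD_CM = 90.0


def downweighted_share(unweighted_cm, weighted_cm):
    """Fraction of the unweighted total that the weighting removed."""
    return (unweighted_cm - weighted_cm) / unweighted_cm


unweighted, weighted = 67.0, 21.0  # Nancy/Mary totals quoted above
timber_in_effect = unweighted < TIMBER_THRESHOLD_CM
share_removed = downweighted_share(unweighted, weighted)

print(f"TIMBER in effect: {timber_in_effect}")
print(f"Removed: {unweighted - weighted:.0f} of {unweighted:.0f} cM ({share_removed:.0%})")
```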

The main question is why Ancestry says the total is 67cM and MyHeritage says 124cM. If Mary doesn't transfer to GEDmatch, we're left guessing, but given the problems MyHeritage has with imputing data, I wouldn't be surprised if that accounts for the bulk of the difference. One researcher who tested both her parents found that 32% of all her matches had no match with either of her parents (https://cruwys.blogspot.com/2018/01/a-chromosome-browser-and-new-matching.html). That's a lot of bad data!
--
Steve Toub


Jill Whitehead
 

Following the reference to my friend Debbie Kennett's Cruwys blog post from 2018, I contacted her for an update on the relative merits of Ancestry and MyHeritage for making DNA comparisons. Here is her reply:

"MyHeritage have updated their matching algorithms since I wrote that post. However, there is still a very high false positive rate because they accept uploads in different file formats and there is very little overlap in the SNPs used for matching on the different chips. See:

 

https://isogg.org/wiki/Autosomal_SNP_comparison_chart

 

I don’t trust matches sharing below about 40 cM at MyHeritage and I’ve also found some problematic matches sharing between 40 and 50 cM. The matches at AncestryDNA are much more reliable.

 

I have two kits at MyHeritage. I did a new test on the Global Screening Array and with that kit 47% of my matches don’t match either of my parents. My other kit is a transfer from Ancestry and with that kit 38% of my matches don’t match either of my parents. I also get over a thousand more matches with the transfer kit.

 

With Jewish matches at MyHeritage I would suggest you ignore all the low and medium confidence matches and only work with matches where the largest segment is 30 cM or more and where there are preferably at least two reasonable-sized segments. If you look in the chromosome browser you’ll probably find that a lot of the matches where you appear to share high cM totals are actually made up of lots of really tiny segments that are more likely to be false. You may want to recalibrate the total cM shared and remove all the segments under 10 cM to get a more realistic estimate of the total cM shared."

I am aware that a lot of people mistakenly put faith in smaller segments, when many of these are false positives and should be ignored; some of these are the ones MH joins together to make larger segments. Debbie recommends looking only at matches with a segment of at least 30cM plus some other larger segments, and ignoring the smaller ones under 10cM. That is what I normally do anyway, though I have personally looked for one segment of 25cM or more, rather than 30cM, as these seem to be more prevalent. Then again, I am also aware that on some chromosomes you are more likely to get a larger segment of around 25cM, which has a skewing effect, so 30cM would avoid that.

This strategy would avoid those matches where clearly no recent family tree relationship can be found, and help in resolving the problem of endogamy. 
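To make the screening rule concrete, here is a rough Python sketch applied to the ten MyHeritage segment lengths that Lee reports for the Nancy/Mary match elsewhere in this thread. The function names are illustrative, and reading "at least two reasonable-sized segments" as "at least two segments of 10cM or more" is my assumption, not Debbie's wording.

```python
# MyHeritage segment lengths (cM) for the Nancy/Mary match, as posted in this thread.
MH_SEGMENTS_CM = [25.2, 8.3, 14.1, 20.2, 6.9, 7.6, 15.2, 11.7, 6.3, 8.3]


def recalibrate(segments, min_segment=10.0):
    """Drop segments under min_segment cM and return (kept segments, new total)."""
    kept = [s for s in segments if s >= min_segment]
    return kept, round(sum(kept), 1)


def passes_screen(segments, min_largest=30.0, min_segment=10.0, min_kept=2):
    """Largest segment >= min_largest and at least min_kept segments >= min_segment.

    Assumption: "reasonable-sized" is interpreted as >= min_segment cM.
    """
    kept, _ = recalibrate(segments, min_segment)
    return len(kept) >= min_kept and max(kept) >= min_largest


kept, recalibrated_total = recalibrate(MH_SEGMENTS_CM)
print(f"Reported total: {round(sum(MH_SEGMENTS_CM), 1)} cM")
print(f"Recalibrated total (segments >= 10 cM): {recalibrated_total} cM")
print(f"Passes 30 cM screen: {passes_screen(MH_SEGMENTS_CM)}")
print(f"Passes 25 cM screen: {passes_screen(MH_SEGMENTS_CM, min_largest=25.0)}")
```

On these numbers the match drops from 123.8cM to 86.4cM, fails the 30cM screen, and only just passes the looser 25cM one.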

Jill Whitehead, Surrey, UK


Lee Jaffe
 

As a follow-up to my post, and at the suggestion of several replies, I was able to compare the match data for my mother and another person (Mary) at MyHeritage, FTDNA and GEDMatch.  

MyHeritage
Chromosome Start Location End Location Start RSID End RSID Centimorgans SNPs
1 109759998 151341570 rs2924 rs41284998 25.2 7552
2 79935344 89125131 rs76042919 rs2012201 8.3 3712
4 37349537 55823638 rs13120150 rs7684211 14.1 7168
5 149420698 167290121 rs2304070 rs116611957 20.2 9600
10 72680375 79844239 rs7079039 rs7100515 6.9 3968
10 108448200 115522272 rs1252035 rs2419878 7.6 3968
11 36423369 63143137 rs12290256 rs201479912 15.2 10368
12 5906873 12094208 rs6489659 rs78137435 11.7 2688
18 11225618 19261413 rs12457940 rs11662721 6.3 2048
20 53357605 56460394 rs74422090 rs6099816 8.3 2304

FamilyTree DNA
Chromosome Start Location End Location Centimorgans Matching SNPs
1 109727832 113810931 6.38 1123
2 79803287 88999248 8.68 1436
4 37250289 55984959 15.47 2816
5 149294495 159073110 9.55 2197
10 108316278 115620108 8.01 1649
11 36376603 44312566 6.71 1419
12 5775875 12208613 11.94 1718
18 11229655 19485946 6.91 847
20 54409617 56478627 6.17 831

GEDMatch
Chr B37 Start Pos'n B37 End Pos'n Centimorgans (cM) SNPs Segment threshold Bunch limit SNP Density Ratio
4 37,250,289 49,061,848 12.4 1,721 202 121 0.32
5 149,301,335 158,958,254 10.2 1,630 195 117 0.32
10 108,859,104 115,614,740 8.1 1,164 198 118 0.33
11 36,412,655 44,311,968 7 1,095 196 117 0.3
12 6,009,595 12,208,578 12.1 1,226 184 110 0.37
20 53,438,513 56,478,015 7.8 884 195 117 0.4

Some idea about where the discrepancy occurs came from a note in the FTDNA report:  the section on Chr 1 where MyHeritage reported a 25.2cM segment match was marked by FTDNA as a "SNP poor region: not tested for Family Finder."  
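The three tables can also be lined up segment by segment. The Python sketch below is one way to do it, assuming all three sites report positions on the same genome build (GEDMatch labels its columns B37; that the other two match it is my assumption). Each tuple is (chromosome, start, end, cM) copied from the tables above, and the function names are only illustrative.

```python
MYHERITAGE = [
    (1, 109759998, 151341570, 25.2), (2, 79935344, 89125131, 8.3),
    (4, 37349537, 55823638, 14.1), (5, 149420698, 167290121, 20.2),
    (10, 72680375, 79844239, 6.9), (10, 108448200, 115522272, 7.6),
    (11, 36423369, 63143137, 15.2), (12, 5906873, 12094208, 11.7),
    (18, 11225618, 19261413, 6.3), (20, 53357605, 56460394, 8.3),
]
FTDNA = [
    (1, 109727832, 113810931, 6.38), (2, 79803287, 88999248, 8.68),
    (4, 37250289, 55984959, 15.47), (5, 149294495, 159073110, 9.55),
    (10, 108316278, 115620108, 8.01), (11, 36376603, 44312566, 6.71),
    (12, 5775875, 12208613, 11.94), (18, 11229655, 19485946, 6.91),
    (20, 54409617, 56478627, 6.17),
]
GEDMATCH = [
    (4, 37250289, 49061848, 12.4), (5, 149301335, 158958254, 10.2),
    (10, 108859104, 115614740, 8.1), (11, 36412655, 44311968, 7.0),
    (12, 6009595, 12208578, 12.1), (20, 53438513, 56478015, 7.8),
]


def overlapping_cm(segment, others):
    """Sum the cM of segments in `others` that overlap `segment` on the same chromosome."""
    chrom, start, end, _ = segment
    return sum(cm for c, s, e, cm in others if c == chrom and s <= end and e >= start)


def total_cm(segments):
    """Total shared cM across all segments, rounded to 0.01 cM."""
    return round(sum(seg[3] for seg in segments), 2)


for seg in MYHERITAGE:
    chrom, _, _, cm = seg
    print(f"chr {chrom:>2}: MyHeritage {cm:5.1f} | "
          f"FTDNA {overlapping_cm(seg, FTDNA):5.2f} | "
          f"GEDMatch {overlapping_cm(seg, GEDMATCH):5.1f}")

# MyHeritage segments that both other sites also report (by position overlap).
confirmed_by_both = [
    seg for seg in MYHERITAGE
    if overlapping_cm(seg, FTDNA) > 0 and overlapping_cm(seg, GEDMATCH) > 0
]

print(f"Totals: MyHeritage {total_cm(MYHERITAGE)}, "
      f"FTDNA {total_cm(FTDNA)}, GEDMatch {total_cm(GEDMATCH)}")
print(f"MyHeritage segments seen at both other sites: {len(confirmed_by_both)} of {len(MYHERITAGE)}")
```

Laid out this way, the Chr 1 segment stands out: 25.2cM at MyHeritage, 6.38cM at FTDNA, and nothing at GEDMatch. The Chr 5 and Chr 11 segments are also roughly twice as long at MyHeritage as at the other two sites.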

The bottom line for my purposes is that the MH report in this case was misleading and a more realistic reading  (3 out of 4 test sites) suggests this match falls below the threshold where I could reasonably expect to find a family tree connection.  I do want to point out that in the other matches I reviewed, MyHeritage was often within range of the other services' results and my experience in this case does not mean that MH testing should be deprecated generally.  I also want to note that I was fortunate to have the cooperation of the other test taker, who engaged in an exchange to explore a possible connection, shared pointers to other sites where she was tested and provided access to her family tree.  Thanks also to those who responded to my post.

Lee David Jaffe

 


Carolyn Lea
 

Check out Kitty Cooper's blog on Ashkenazi DNA matches. Due to endogamy within the Ashkenazi population we need to look for a much larger threshold. 
"Finding relatives is difficult because all of us Ashkenazim are related multiple times, both way back when and more recently. Most AJs look like 4th or 5th cousins to each other even when that is not the case.  Cousin marriages, uncle-niece marriages, and other close family marriages abound in our trees. In my own family, on my Jewish line, my great grandmother fixed up her sister with her husband’s brother to get that dowry for the family business so I have double third cousins. Click here for my article from back in 2014 that suggested that we are all descended from 350 people in the 1300s." https://blog.kittycooper.com/2021/03/more-on-ashkenazi-dna/ 
She also has an older one you can check out at her blog site. 
Carolyn Lea (nee Schwarzbaum) 
OKC, OK
Schwarzbaum(Posen/NY/Georgia, US)
Lewisohn/Levison
Rothschild
Wittkowski
Basch









Steven Usdansky
 

Here are my matches with HM on MyHeritage, FTDNA, and GEDmatch. All I know for sure is that the 60+ cM segment didn't come from my mother, who doesn't match HM at all on chromosome 7 (with a 3 cM minimum).
Yellow - Largest estimate for a given segment
Blue - Segment only reported by one site

 

MyHeritage              
Chromosome Start Location End Location Start RSID End RSID Centimorgans SNPs    
6 1.056E+08 112518090 rs1417736 rs7738951 8.7 3584    
6 1.516E+08 1.603E+08 rs4489165 rs220729 13.5 5632    
7 4776780 46390788 rs75662232 rs898930 62.2 26368    
7 67594586 82373405 rs7777871 rs35847872 17.3 6656    
16 88165 2908703 rs2541696 rs8046218 6.1 1920    
22 36604605 41736267 rs132744 rs5751080 7.8 2816    
                 
FTDNA                
Chromosome Start Location End Location Centimorgans Matching SNPs      
6 1.063E+08 112621250 7.94 2370        
6 1.51E+08 1.601E+08 14.86 3984        
7 4940038 47102645 64.32 18302        
7 68622989 83429355 16.03 4575        
13 81485364 91104009 6.02 2546        
14 35853388 44331104 6.50 2412        
22 36639742 44744200 11.96 3146        
                 
GEDmatch                
Chr B37 Start Pos'n B37 End Pos'n Centimorgans (cM) SNPs Segment threshold Bunch limit SNP Density Ratio
2 2.274E+08 2.319E+08 7.2 722 201 120 0.28  
6 1.056E+08 112345811 8.3 1177 222 133 0.36  
6 1.515E+08 1.599E+08 12.7 1704 190 114 0.33  
7 4646190 46618644 61.2 8710 181 108 0.34  
7 75059630 83311410 10.5 1470 191 114 0.35  
14 30714764 33709003 8.1 418 199 119 0.25  
22 36570688 45118533 14.7 1174 192 115 0.25  


--
Steven Usdansky
usdanskys@...
USDANSKY (Узданский): Turec, Kapyl, Klyetsk, Nyasvizh, Slutsk, Grosovo
SINIENSKI: Karelichy, Lyubcha, Navahrudak
NAMENWIRTH: Bobowa, Rzepiennik
SIGLER: "Minsk"


Dorann Cafaro
 

Endogamy is a challenge but for those of us who have only one Ashkenazi relationship, in my case my father’s mother, I find my small Ashkenazi DNA (27%) is more accurate and much easier to connect. I am connected to almost everyone from Raudnitz an der Elbe, N Bohemia. So do not overlook these small matches for those who are not 100% Jewish DNA.

Dorann Jacobs Cafaro
Charleston, SC
732 687-5318

Researching Grauer, Dray, Marcus, May, Weisbach, Hecht, Cohen, Kahn
(Dray being a brick wall)


Bob Smiley
 

I did my own test on the validity of large segments of DNA matches on MyHeritage. I had a personal pile-up on chromosome 16. There were 30 matches which were in the range of 25 to 30 cM in length which overlapped each other. I tested each set to determine which ones were from my Paternal chromosome, and which were from my Maternal chromosome.

I had two known paternal cousins who had matches in that region, and none of the other 28 triangulated with both of them. So I only had two valid Paternal matches.
There were five who matched each other (triangulated) and they were presumed Maternal matches. None of them were known relatives and the total match was too small to expect to find the relationship through genealogical trees.

The other 23 large matches did not triangulate with anyone. These were all false matches or, as I prefer to call them, pseudomatches.

The final determination is that, in MyHeritage, over 76% (23 of 30) of these large (bigger than 25 cM) matches are false matches. They are not real. This is undeniable data.
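For the record, the percentage comes from the counts given above for the 30 overlapping matches in this one region of chromosome 16. A short Python check (variable names are just for illustration):

```python
total_matches = 30          # overlapping 25-30 cM matches in the pile-up
paternal_confirmed = 2      # known paternal cousins
maternal_triangulated = 5   # matched each other, presumed maternal

# Everything left over did not triangulate with anyone.
no_triangulation = total_matches - paternal_confirmed - maternal_triangulated
share_not_triangulating = no_triangulation / total_matches

print(f"{no_triangulation} of {total_matches} did not triangulate "
      f"({share_not_triangulating:.1%})")
```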

--
Bob Smiley
Kirkland, Washington USA