144 random matches in 65,000


I am aware that full-locus false matches (the first few dozen or so) occur only between people whose alleles are of high frequency at every locus. So, to me, it is no surprise that this effect comes to the fore, rather than matches between relatives, the acceptable "explanation" for the Arizona partial-match data. Whatever the contribution of related partial matches, that proportion will stay the same as other states' databases are added; it is naturally limited. But the number of unrelated partial matches will increase quadratically, with no such limit.
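Counting pairs makes the quadratic growth concrete: n profiles give n(n-1)/2 distinct pairs to compare, so doubling a database roughly quadruples the comparisons and, at a fixed per-pair rate, the expected number of unrelated partial matches. A minimal Python sketch; the 3,000,000 figure is only an illustrative national database size, and no per-pair match rate is assumed.

```python
def pair_count(n: int) -> int:
    """Distinct unordered pairs among n profiles: n(n-1)/2."""
    return n * (n - 1) // 2


def growth_factor(n_small: int, n_large: int) -> float:
    """How many times more pairwise comparisons the larger database allows."""
    return pair_count(n_large) / pair_count(n_small)


if __name__ == "__main__":
    arizona = 65_493       # Arizona database size
    national = 3_000_000   # illustrative national database size (assumption)
    print(pair_count(arizona))                              # 2144633778
    print(round(growth_factor(arizona, 2 * arizona), 2))    # 4.0
    print(round(growth_factor(arizona, national)))          # 2098
```

Related matches do not follow this curve, since each person has only a handful of relatives who could be in the database.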
Trying to simulate
http://www.stanford.edu/~kdevlin/Cold_Hit_Probabilities.pdf
Scientific Heat about Cold Hit
"... It should be noted that a recent analysis of the Arizona convicted offender data base (a database that uses the 13 CODIS loci) revealed that among the approximately 65,000 entries listed there were 144 individuals whose DNA profiles match at 9 loci (including one match between individuals of different races, one Caucasian, the other African American), another few who match at 10 loci, one pair that match at 11, and one pair that match at 12. The 11 and 12 loci matches were siblings, hence not random. But matches on 9 or 10 loci among a database as small as 65,000 entries cast considerable doubt on figures such as "one in ten trillion" for a match that extends to just 3 or 4 additional loci. ..."

http://www.latimes.com/news/local/la-me-dna20-2008jul20,0,1506170,full.story
How reliable is DNA in identifying suspects? (Los Angeles Times, July 19, 2008)

... "The FBI laboratory, which administers the national DNA database system, tried to stop distribution of Troyer's results and began an aggressive behind-the-scenes campaign to block similar searches elsewhere, even those ordered by courts, a Times investigation found. At stake is the credibility of the compelling odds often cited in DNA cases, which can suggest an all but certain link between a suspect and a crime scene. .... As a result, Thomas Callaghan, head of the FBI's CODIS unit, has dismissed Troyer's findings as "misleading" and "meaningless." He urged authorities in several states to object to Arizona-style searches, advising them to tell courts that the probes could violate the privacy of convicted offenders, tie up crucial databases and even lead the FBI to expel offending states from CODIS -- a penalty that could cripple states' ability to solve crimes. ... After the judge, Steven Platt, rejected her arguments, Groves returned to court, saying the search was too risky. FBI officials had now warned her that it could corrupt the entire state database, something they would not help fix, she told the court."

With further information from http://www.maa.org/devlin/devlin_10_06.html (Devlin's Angle, October 2006):

"... As far as I am aware, to date there has been only one attempt to do this, and the results obtained were both startling and worrying. A study of the Arizona CODIS database carried out in 2005 showed that approximately 1 in every 228 profiles in the database matched another profile in the database at nine or more loci, that approximately 1 in every 1,489 profiles matched at 10 loci, 1 in 16,374 profiles matched at 11 loci, and 1 in 32,747 matched at 12 loci. How big a population does it take to produce so many matches that appear to contradict so dramatically the astronomical, theoretical figures given by the naive application of the product rule? The Arizona database contained at the time a mere 65,493 entries. Scary isn't it? It is not much of a leap to estimate that the FBI's national CODIS database of 3,000,000 entries will contain not just one but several pairs that match on all 13 loci, contrary (and how!) to the prediction made by the currently much touted RMP that you can expect a single match only when you have on the order of 15 quadrillion profiles. ..."

These rates translate to 144 pairs matching at 9 or more loci, 22 pairs at 10 or more, 2 pairs at 11 or more and 1 pair at 12.

The Arizona data (http://www.nlada.org/Defender/forensics/for_lib/Documents/1148592247.61/Myers%20CAC%20Presentation.pdf) are: 144 9-locus matches, 22 10-locus matches, 2 related 11-locus matches and 1 related 12-locus match among 65,493 13-locus DNA profiles. The counts are cumulative: the 144 pairs matching at 9 or more loci include the 22 matching at 10 or more, which in turn include the 2 matching at 11 or more (one at 11 loci and one at 12). So 122 pairs match at exactly 9 loci, 20 at exactly 10, 1 at 11 and 1 at 12.

A quote from the Myers presentation that should be writ large in labs and courtrooms: "Avoid Saying that 13-Locus Profiles are de facto Unique".

The Arizona database disclosures came via Kathryn Troyer. The source information is now in the Myers presentation linked above (also at http://tinyurl.com/ytp6ho), and the original is at http://www.promega.com/geneticidproc/ussymp12proc/abstracts/troyer.pdf. FYI, another related file on that site is a mistranscribed transcript which refers throughout to a mysterious "low side", a mishearing of "loci".
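As a check on the pair counts above, a short Python sketch converting Devlin's "1 in N profiles" rates into numbers of pairs. It assumes each matching pair accounts for two profiles and that the rates are cumulative (9 or more loci, 10 or more, and so on); the names are illustrative.

```python
DB_SIZE = 65_493  # Arizona entries quoted by Devlin

# Cumulative "1 in N profiles" rates quoted in Devlin's Angle, October 2006
RATES = {9: 228, 10: 1_489, 11: 16_374, 12: 32_747}


def pairs_from_rate(db_size: int, one_in: int) -> int:
    """Profiles involved = db_size / one_in; two profiles per matching pair."""
    return round(db_size / one_in / 2)


# Pairs matching at this many loci or more
cumulative = {loci: pairs_from_rate(DB_SIZE, n) for loci, n in RATES.items()}

# Pairs matching at exactly this many loci: remove those also counted one level up
exact = {loci: cumulative[loci] - cumulative.get(loci + 1, 0) for loci in cumulative}

if __name__ == "__main__":
    for loci in sorted(cumulative):
        print(f">= {loci} loci: {cumulative[loci]} pairs, exactly {loci}: {exact[loci]}")
```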
Three of the 9-locus partial matches are shown in that presentation as pairs A/B, C/D and E/F. I've added the allele frequencies (AFs, as %) for each genotype, followed by the minimum AF among the included (matching) loci and among the excluded (non-matching) loci for each pair.

Pair A/B (Caucasian and Hispanic AFs, %)
Locus   A / B               AF-cauc             AF-hisp
D3      16,16 / 16,16       25,25 / 25,25       29,29 / 29,29
vWA     17,18 / 17,18       28,20 / 28,20       22,17 / 22,17
FGA     22.2,24 / 19,19     <.2,13.6 / 5.3,5.3  <.4,15 / 6.4,6.4
Amel    X,Y / X,Y           - / -               - / -
D8      12,13 / 8,15        15,30 / 1.2,11      14,27 / 0.7,13
D21     30,31 / 29,30       28,8.3 / 19,28      26,8.2 / 20,26
D18     14,16 / 14,16       14,14 / 14,14       14,14 / 14,14
D5      11,12 / 11,12       36,38 / 36,38       35,35 / 35,35
D13     12,13 / 12,13       25,12 / 25,12       22,12 / 22,12
D7      11,11 / 11,11       21,21 / 21,21       26,26 / 26,26
D16     11,12 / 11,12       32,33 / 32,33       26,25 / 26,25
THO1    7,8 / 6,8           19,8.4 / 23,8.4     28,9.6 / 21,9.6
TPOX    8,11 / 8,11         53,24 / 53,24       47,27 / 47,27
CSF     12,13 / 12,13       36,9.6 / 36,9.6     36,6.1 / 36,6.1
min inc                     9.6                 6.1
min exc                     <.2                 <.4

Pair C/D (Caucasian and Hispanic AFs, %)
Locus   C / D               AF-cauc             AF-hisp
D3      16,17 / 14,17       25,21 / 10,21       29,20 / 7.9,20
vWA     16,17 / 16,17       20,28 / 20,28       26,22 / 26,22
FGA     22,23 / 21,21       22,13 / 18,18       15,14 / 17,17
Amel    X,Y / X,Y           - / -               - / -
D8      12,14 / 12,16       18,17 / 18,3        14,25 / 14,2.5
D21     28,30 / 28,30       16,28 / 16,28       9.6,26 / 9.6,26
D18     16,16 / 16,18       14,14 / 14,7.6      14,14 / 14,6.8
D5      12,12 / 12,12       38,38 / 38,38       35,35 / 35,35
D13     11,12 / 11,12       34,25 / 34,25       24,12 / 24,12
D7      9,10 / 9,10         18,24 / 18,24       11,29 / 11,29
D16     11,13 / 11,13       32,15 / 32,15       26,19 / 26,19
THO1    8,9 / 8,9           8.4,12 / 8.4,12     9.6,15 / 9.6,15
TPOX    8,9 / 8,9           53,12 / 53,12       47,10 / 47,10
CSF     10,11 / 10,11       21,30 / 21,30       23,29 / 23,29
min inc                     8.4                 9.6
min exc                     3.1                 6.8

Pair E/F (Caucasian and Hispanic AFs, %)
Locus   E / F               AF-cauc             AF-hisp
D3      15,15 / 15,15       26,26 / 26,26       29,29 / 29,29
vWA     17,17 / 17,17       28,28 / 28,28       22,22 / 22,22
FGA     18,24 / 20,25       2.6,3.6 / 13,7.1    1.8,15 / 8.9,12
Amel    X,Y / X,Y           - / -               - / -
D8      13,14 / 13,14       30,17 / 30,17       27,25 / 27,25
D21     28,31.2 / 28,31.2   16,10 / 16,10       9.6,11 / 9.6,11
D18     13,16 / 14,19       13,14 / 14,3.8      11,14 / 14,3.9
D5      11,12 / 11,12       36,38 / 36,39       35,35 / 35,35
D13     11,11 / 11,11       34,34 / 34,34       24,24 / 24,24
D7      11,12 / 10,11       21,17 / 24,21       26,16 / 29,26
D16     12,12 / 12,12       33,33 / 33,33       25,25 / 25,25
THO1    6,8 / 6,8           23,8.4 / 23,8.4     21,9.6 / 21,9.6
TPOX    8,8 / 8,8           53,53 / 53,53       47,47 / 47,47
CSF     10,12 / 10,10       21,36 / 21,21       23,26 / 23,23
min inc                     8.4                 9.6
min exc                     2.6                 1.8

It is instructive to convert genotypes to allele frequencies, because in simulations the unrelated false matches that turn up in "population" sizes of the order of 1 million to 10 million come only from a subset of the population. My own profile falls in the "all >8%" subset, which puts me in the firing line for a false match at some indeterminate future date via the UK system.

In simulations using all published alleles completely at random, the minimum AF across all matching pairs was 6.6% for the UK 10-locus system and 7.8% for CODIS 13-locus profiles.

Some subset proportions (Caucasian):
- UK population with all AFs >8%: 6.6% (10 loci)
- UK population with all AFs >6.9%: 15.8% (10 loci)
- USA population with all AFs >8%: 5.6% (13 loci)
- Australian population with all AFs >3%: 52% (9 loci)

So of pairs A to F above, whether Caucasian or Hispanic, I would say all are likely unrelated, except perhaps A/B if they were Hispanic. If a pair were related, there would be a 9 in 13 chance that any rarish allele (AF below 6.6 to 7.8%) would fall among the matching alleles, which would not be the case for unrelated matches. To show how rare a possibility this is:
- about 7.2% of alleles have an AF below 5.6% (C. Brenner), so the chance of an allele above 5.6% is 1 - 0.072 = 0.928;
- the expected number of low-AF alleles in 54 draws (3 pairs x 9 matching loci x 2 alleles) is 0.072 x 54 = 3.89;
- the chance that all 54 draws are above 5.6% is 0.928^54 = 0.0177;
- so 3.89 / 0.0177 = 220, i.e. finding no low-AF allele among the matching alleles is about 220 times less likely if the pairs were related.
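Both calculations in the paragraph above can be checked with a short Python sketch: the minimum included and excluded AFs for pair E/F (Caucasian column, copied from the table, with Amelogenin left out because it is a sex marker rather than one of the 13 STR loci), and the 220 ratio from Brenner's 7.2% figure. The function names are hypothetical; the ratio divides an expected count by a probability, exactly as the arithmetic in the text does.

```python
# Caucasian AFs (%) for pair E/F, from the table above:
# locus -> (genotypes match?, [E allele 1, E allele 2, F allele 1, F allele 2])
EF_CAUC = {
    "D3": (True, [26, 26, 26, 26]),
    "vWA": (True, [28, 28, 28, 28]),
    "FGA": (False, [2.6, 3.6, 13, 7.1]),
    "D8": (True, [30, 17, 30, 17]),
    "D21": (True, [16, 10, 16, 10]),
    "D18": (False, [13, 14, 14, 3.8]),
    "D5": (True, [36, 38, 36, 39]),
    "D13": (True, [34, 34, 34, 34]),
    "D7": (False, [21, 17, 24, 21]),
    "D16": (True, [33, 33, 33, 33]),
    "THO1": (True, [23, 8.4, 23, 8.4]),
    "TPOX": (True, [53, 53, 53, 53]),
    "CSF": (False, [21, 36, 21, 21]),
}


def min_afs(pair):
    """Minimum AF over matching (included) and non-matching (excluded) loci."""
    included = [af for match, afs in pair.values() if match for af in afs]
    excluded = [af for match, afs in pair.values() if not match for af in afs]
    return min(included), min(excluded)


def low_af_ratio(p_low=0.072, draws=54):
    """Expected number of low-AF alleles in `draws` draws, and that number
    divided by the probability that none of the draws is low-AF."""
    expected_low = p_low * draws
    p_none_low = (1 - p_low) ** draws
    return expected_low, p_none_low, expected_low / p_none_low


if __name__ == "__main__":
    print(min_afs(EF_CAUC))  # (8.4, 2.6)
    expected, p_none, ratio = low_af_ratio()
    print(round(expected, 2), round(p_none, 4), round(ratio))  # 3.89 0.0177 220
```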
Repeating the AF analysis with African American frequencies shows 2 pairs that would count as related, i.e. their included AFs contain one below 5.6%.

Pair A/B (African American AFs, %)
Locus   A / B               AF-AfAm
D3      16,16 / 16,16       29,29 / 29,29
vWA     17,18 / 17,18       18,14 / 18,14
FGA     22.2,24 / 19,19     <.3,18 / 7,7
Amel    X,Y / X,Y           -
D8      12,13 / 8,15        8,15 / .6,23
D21     30,31 / 29,30       28,8.3 / 19,28
D18     14,16 / 14,16       18,9 / 18,9
D5      11,12 / 11,12       25,37 / 25,37
D13     12,13 / 12,13       40,16 / 40,16
D7      11,11 / 11,11       21,21 / 21,21
D16     11,12 / 11,12       35,17 / 35,17
THO1    7,8 / 6,8           38,22 / 16,22
TPOX    8,11 / 8,11         32,25 / 32,25
CSF     12,13 / 12,13       27,5.1 / 36,5.1
min inc                     5.1
min exc                     <.3

Pair C/D (African American AFs, %)
Locus   C / D               AF-AfAm
D3      16,17 / 14,17       29,27 / 9,27
vWA     16,17 / 16,17       25,18 / 25,18
FGA     22,23 / 21,21       19,13 / 18,18
Amel    X,Y / X,Y           -
D8      12,14 / 12,16       8,36 / 8,6.4
D21     28,30 / 28,30       26,18 / 26,18
D18     16,16 / 16,18       18,18 / 18,10
D5      12,12 / 12,12       37,37 / 37,37
D13     11,12 / 11,12       28,40 / 28,40
D7      9,10 / 9,10         13,30 / 13,30
D16     11,13 / 11,13       35,12 / 35,12
THO1    8,9 / 8,9           22,12 / 22,12
TPOX    8,9 / 8,9           32,21 / 32,21
CSF     10,11 / 10,11       25,27 / 25,27
min inc                     12
min exc                     9

Pair E/F (African American AFs, %)
Locus   E / F               AF-AfAm
D3      15,15 / 15,15       24,24 / 24,24
vWA     17,17 / 17,17       18,18 / 18,18
FGA     18,24 / 20,25       0.9,18 / 3.5,10
Amel    X,Y / X,Y           -
D8      13,14 / 13,14       15,36 / 15,36
D21     28,31.2 / 28,31.2   26,4.8 / 26,4.8
D18     13,16 / 14,19       5.4,19 / 8,9.2
D5      11,12 / 11,12       25,37 / 25,37
D13     11,11 / 11,11       28,28 / 28,28
D7      11,12 / 10,11       21,12 / 30,21
D16     12,12 / 12,12       17,17 / 17,17
THO1    6,8 / 6,8           16,22 / 16,22
TPOX    8,8 / 8,8           32,32 / 32,32
CSF     10,12 / 10,10       25,27 / 25,25
min inc                     4.8
min exc                     0.9

Repeating for Native American frequencies, all three pairs count as related by my criterion:
- A/B: minimum included AF 4.7% (CSF)
- C/D: minimum included AFs 5.3% (THO1) and 4.2% (TPOX)
- E/F: minimum included AF 5.2% (THO1)

Next I took the principal subgroups of the Arizona prison population, Hispanic 46.1%, Caucasian 34.7%, African American 8.6% and Native American 4.4% (from http://acjc.state.az.us/pubs/home/Crime_Trends_2005.pdf, page 46), ignoring Asian and other groups and rescaling to 49.1 / 37.0 / 9.2 / 4.7%.

Combining these: for pairs A to F in the Myers presentation, the ratio of the likelihood of unrelated to related matches is 220 in the Hispanic and Caucasian communities; if the pairs came from the smaller African American population, pairs A/B and E/F (2 of the 3) would probably be related matches; and if from the Native American population, all 3 pairs would. On that basis I estimate that the A to F profiles, together with the 2 stated related pairs, represent about 87.1% unrelated to 12.9% related among the 9-locus matches for a representative Arizona offender DNA database, i.e. NOT mostly or overwhelmingly related matches. I've not yet seen anyone else's evaluation from the (so far?) available evidence. Admittedly, further account has to be taken of the probability that related matches have no low-AF alleles.

From "The Rarity of DNA Profiles" by Bruce S. Weir (The Annals of Applied Statistics 2007, Vol. 1, No. 2, 358-370), Table 2: for a simulation of 65,493 profiles with theta = 0.03, he would expect a ratio of 9-locus to 10-locus matches of 10.5 to 1. So with 2 of the 22 10-locus pairs known to be related, he would expect only about 21 related pairs among the 122 9-locus pairs.

The Visual Basic code follows. To generate the profiles, use the 13-locus CODIS generator on dnas11.htm; I initially chose to generate 32,500, half the quoted number. The VB routines are:
- locus 1 divider/sorter, then match checker for loci 2 to 13
- locus 2 sorter, then checker
- locus 3 sorter, then checker
- locus 4 sorter, then checker
- converting 'profile' strings back to standard CODIS form
- CODIS generator for only >8 percent AFs
- Results

Use the CODIS generator on dnas11, changing the - to a space in the file names for consistency. Place each VB routine between Sub and End Sub, and change the file names to whatever you prefer (I prefer names based on date). It would be possible to join these routines together, but I prefer to keep things a bit hands-on, outputting to separate files.
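The bucketing idea behind the VB routines (sort on one locus, then compare only profiles that share it) can be sketched compactly in Python. This is not a translation of the routines: it is a minimal sketch assuming profiles are encoded, as in the VB code, as strings with two characters per locus, and the function names are hypothetical. Because a pair matching on at least 9 of 13 loci can mismatch on at most 4, it must match on one of the first 5 loci, so bucketing on each of those loci in turn finds every such pair; the dictionary keeps each pair only once.

```python
from collections import defaultdict
from itertools import combinations


def count_matching_loci(p: str, q: str, width: int = 2) -> int:
    """Number of loci at which two encoded profiles carry the same code."""
    return sum(p[i:i + width] == q[i:i + width] for i in range(0, len(p), width))


def partial_matches(profiles: list[str], min_loci: int = 9, n_loci: int = 13,
                    width: int = 2) -> dict[tuple[int, int], int]:
    """Map (i, j) index pairs to matching-locus counts for pairs with >= min_loci."""
    found: dict[tuple[int, int], int] = {}
    # A qualifying pair must match on one of the first n_loci - min_loci + 1 loci
    for locus in range(n_loci - min_loci + 1):
        buckets: dict[str, list[int]] = defaultdict(list)
        for idx, prof in enumerate(profiles):
            buckets[prof[locus * width:(locus + 1) * width]].append(idx)
        # Only profiles sharing this locus code are compared with each other
        for members in buckets.values():
            for i, j in combinations(members, 2):
                if (i, j) in found:
                    continue  # already found via an earlier locus
                m = count_matching_loci(profiles[i], profiles[j], width)
                if m >= min_loci:
                    found[(i, j)] = m
    return found
```

For example, a profile that differs from another only at loci 1 to 4 is found through the locus 5 bucket and reported with 9 matching loci.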
' ------------------------------------------------------------
' Locus 1 sorter
' Stage 1: divide the input file into 10 files by the first digit.
' Stage 2: divide each of those 10 files into 10 by the second digit,
' so each final file holds profiles sharing one locus-1 code (characters 1-2).
' Str() adds a leading space, so files are named "jul28a 3", "jul28a 3 7"
' and so on; the locus 1 match checker relies on these names.
Dim ps As String
Dim prefix As String
Dim counts(0 To 9) As Long
Dim countt As Long
Dim d As Integer, digit As Integer, xx As Integer, fnum As Integer

prefix = "jul28a"

' Stage 1: split by the first character of each profile
Open "jul12-d.txt" For Input As #1
For d = 0 To 9
    fnum = 10 + d
    Open prefix & Str(d) For Output As #fnum
    counts(d) = 0
Next d
Do Until EOF(1)
    Input #1, ps
    digit = Val(Mid(ps, 1, 1))
    fnum = 10 + digit
    Write #fnum, ps
    counts(digit) = counts(digit) + 1
Loop
Close #1
For d = 0 To 9
    fnum = 10 + d
    Close #fnum
Next d

' output counts per file and the total to "jul28a c"
countt = 0
For d = 0 To 9
    countt = countt + counts(d)
Next d
Open prefix & " c" For Output As #20
Write #20, 0, counts(0), 1, counts(1), 2, counts(2), 3, counts(3), 4, counts(4), _
    5, counts(5), 6, counts(6), 7, counts(7), 8, counts(8), 9, counts(9), "total", countt
Close #20

' Stage 2: split each first-stage file by the second character
For xx = 0 To 9
    Open prefix & Str(xx) For Input As #1
    For d = 0 To 9
        fnum = 10 + d
        Open prefix & Str(xx) & Str(d) For Output As #fnum
        counts(d) = 0
    Next d
    Do Until EOF(1)
        Input #1, ps
        digit = Val(Mid(ps, 2, 1))
        fnum = 10 + digit
        Write #fnum, ps
        counts(digit) = counts(digit) + 1
    Loop
    Close #1
    For d = 0 To 9
        fnum = 10 + d
        Close #fnum
    Next d
    ' output counts for this group
    Open prefix & Str(xx) & " c" For Output As #20
    Write #20, 0, counts(0), 1, counts(1), 2, counts(2), 3, counts(3), 4, counts(4), _
        5, counts(5), 6, counts(6), 7, counts(7), 8, counts(8), 9, counts(9)
    Close #20
Next xx
' end beep
Beep
' ------------------------------------------------------------
' Locus 1 match checker
' Progressively check for locus 1 matches plus matches in loci 2 to 13,
' then the next locus, and so on, since only matches on a minimum of
' 9 loci are required. Within each file of profiles sharing a locus-1 code,
' every pair is compared on loci 2 to 13. Each profile is a 26-character
' string, two characters per locus.
Dim ps As String
Dim pt As String
Dim Array1(40000, 1) As String
Dim aa As Integer, bb As Integer, locus As Integer, xx As Integer
Dim n As Long, q As Long, nn As Long, j As Long
Dim beepc As Long, beept As Long
Dim beepu As Double

Open "jul29b-r.txt" For Output As #10
For bb = 0 To 9
    For aa = 0 To 9
        ' load one locus-1 group, e.g. "jul28a 3 7"
        Open "jul28a" & Str(bb) & Str(aa) For Input As #2
        nn = 0
        Do While Not EOF(2)
            Input #2, pt
            Array1(nn, 1) = Mid(pt, 1, 26)
            nn = nn + 1
        Loop
        Close #2

        j = 0
        For n = 0 To nn - 1
            ' pt is the profile checked against every later profile ps
            pt = Array1(n, 1)
            ' stop at nn - 1: slots beyond the current group can still hold
            ' stale profiles left over from a larger, earlier group
            For q = n + 1 To nn - 1
                ps = Array1(q, 1)
                ' xx counts non-matching loci; more than 4 mismatches among
                ' loci 2 to 13 means fewer than 9 matching loci overall
                xx = 0
                For locus = 2 To 13
                    If Mid(ps, 2 * locus - 1, 2) <> Mid(pt, 2 * locus - 1, 2) Then xx = xx + 1
                    If xx > 4 Then Exit For
                Next locus
                ' record both profiles and the number of matching loci
                If xx <= 4 Then Write #10, ps, pt, 13 - xx
            Next q
            ' progress: after every 1000 profiles, beep once per thousand
            j = j + 1
            If j Mod 1000 = 0 Then
                For beepc = 1 To j \ 1000
                    For beept = 1 To 200000   ' delay loop between beeps
                        beepu = 1 / beept
                    Next beept
                    Beep
                Next beepc
            End If
        Next n
    Next aa
    ' progress: beep bb times after each first-digit block
    For beepc = 1 To bb
        For beept = 1 To 200000
            beepu = 1 / beept
        Next beept
        Beep
    Next beepc
Next bb
Close #10

' end beep
For beepc = 1 To 10
    For beept = 1 To 200000
        beepu = 1 / beept
    Next beept
    Beep
Next beepc
' ------------------------------------------------------------
' Locus 2 sorter
' Stage 1: divide the input file into 10 files by the third digit.
' Stage 2: divide each of those 10 files into 10 by the fourth digit,
' so each final file holds profiles sharing one locus-2 code (characters 3-4).
' Str() adds a leading space, so files are named "jul28b 3", "jul28b 3 7"
' and so on; the locus 2 match checker relies on these names.
Dim ps As String
Dim prefix As String
Dim counts(0 To 9) As Long
Dim countt As Long
Dim d As Integer, digit As Integer, xx As Integer, fnum As Integer

prefix = "jul28b"

' Stage 1: split by the third character of each profile
Open "jul12-d.txt" For Input As #1
For d = 0 To 9
    fnum = 10 + d
    Open prefix & Str(d) For Output As #fnum
    counts(d) = 0
Next d
Do Until EOF(1)
    Input #1, ps
    digit = Val(Mid(ps, 3, 1))
    fnum = 10 + digit
    Write #fnum, ps
    counts(digit) = counts(digit) + 1
Loop
Close #1
For d = 0 To 9
    fnum = 10 + d
    Close #fnum
Next d

' output counts per file and the total to "jul28b c"
countt = 0
For d = 0 To 9
    countt = countt + counts(d)
Next d
Open prefix & " c" For Output As #20
Write #20, 0, counts(0), 1, counts(1), 2, counts(2), 3, counts(3), 4, counts(4), _
    5, counts(5), 6, counts(6), 7, counts(7), 8, counts(8), 9, counts(9), "total", countt
Close #20

' Stage 2: split each first-stage file by the fourth character
For xx = 0 To 9
    Open prefix & Str(xx) For Input As #1
    For d = 0 To 9
        fnum = 10 + d
        Open prefix & Str(xx) & Str(d) For Output As #fnum
        counts(d) = 0
    Next d
    Do Until EOF(1)
        Input #1, ps
        digit = Val(Mid(ps, 4, 1))
        fnum = 10 + digit
        Write #fnum, ps
        counts(digit) = counts(digit) + 1
    Loop
    Close #1
    For d = 0 To 9
        fnum = 10 + d
        Close #fnum
    Next d
    ' output counts for this group
    Open prefix & Str(xx) & " c" For Output As #20
    Write #20, 0, counts(0), 1, counts(1), 2, counts(2), 3, counts(3), 4, counts(4), _
        5, counts(5), 6, counts(6), 7, counts(7), 8, counts(8), 9, counts(9)
    Close #20
Next xx
' end beep
Beep
' ------------------------------------------------------------
' Locus 2 match checker
' After sorting into 10 x 10 files on locus 2, compare loci 3 to 13 of
' every pair in each file. Locus 1 is not compared in this pass, so the
' count written covers loci 2 to 13 only.
Dim ps As String
Dim pt As String
Dim Array1(40000, 1) As String
Dim aa As Integer, bb As Integer, locus As Integer, xx As Integer
Dim n As Long, q As Long, nn As Long, j As Long
Dim beepc As Long, beept As Long
Dim beepu As Double

Open "jul29a-r.txt" For Output As #10
For bb = 0 To 9
    For aa = 0 To 9
        ' load one locus-2 group, e.g. "jul28b 3 7"
        Open "jul28b" & Str(bb) & Str(aa) For Input As #2
        nn = 0
        Do While Not EOF(2)
            Input #2, pt
            Array1(nn, 1) = Mid(pt, 1, 26)
            nn = nn + 1
        Loop
        Close #2

        j = 0
        For n = 0 To nn - 1
            ' pt is the profile checked against every later profile ps
            pt = Array1(n, 1)
            ' stop at nn - 1: slots beyond the current group can still hold
            ' stale profiles left over from a larger, earlier group
            For q = n + 1 To nn - 1
                ps = Array1(q, 1)
                ' xx counts non-matching loci; more than 3 mismatches among
                ' loci 3 to 13 means fewer than 9 matches among loci 2 to 13
                xx = 0
                For locus = 3 To 13
                    If Mid(ps, 2 * locus - 1, 2) <> Mid(pt, 2 * locus - 1, 2) Then xx = xx + 1
                    If xx > 3 Then Exit For
                Next locus
                ' record both profiles and the number of matching loci (2 to 13)
                If xx <= 3 Then Write #10, ps, pt, 12 - xx
            Next q
            ' progress: after every 1000 profiles, beep once per thousand
            j = j + 1
            If j Mod 1000 = 0 Then
                For beepc = 1 To j \ 1000
                    For beept = 1 To 200000   ' delay loop between beeps
                        beepu = 1 / beept
                    Next beept
                    Beep
                Next beepc
            End If
        Next n
    Next aa
Next bb
Close #10

' end beep
For beepc = 1 To 10
    For beept = 1 To 200000
        beepu = 1 / beept
    Next beept
    Beep
Next beepc
' ------------------------------------------------------------
' Locus 3 sorter
' Stage 1: divide the input file into 10 files by the fifth digit.
' Stage 2: divide each of those 10 files into 10 by the sixth digit,
' so each final file holds profiles sharing one locus-3 code (characters 5-6).
' Str() adds a leading space, so files are named "jul28c 3", "jul28c 3 7"
' and so on; the locus 3 match checker relies on these names.
Dim ps As String
Dim prefix As String
Dim counts(0 To 9) As Long
Dim countt As Long
Dim d As Integer, digit As Integer, xx As Integer, fnum As Integer

prefix = "jul28c"

' Stage 1: split by the fifth character of each profile
Open "jul12-d.txt" For Input As #1
For d = 0 To 9
    fnum = 10 + d
    Open prefix & Str(d) For Output As #fnum
    counts(d) = 0
Next d
Do Until EOF(1)
    Input #1, ps
    digit = Val(Mid(ps, 5, 1))
    fnum = 10 + digit
    Write #fnum, ps
    counts(digit) = counts(digit) + 1
Loop
Close #1
For d = 0 To 9
    fnum = 10 + d
    Close #fnum
Next d

' output counts per file and the total to "jul28c c"
countt = 0
For d = 0 To 9
    countt = countt + counts(d)
Next d
Open prefix & " c" For Output As #20
Write #20, 0, counts(0), 1, counts(1), 2, counts(2), 3, counts(3), 4, counts(4), _
    5, counts(5), 6, counts(6), 7, counts(7), 8, counts(8), 9, counts(9), "total", countt
Close #20

' Stage 2: split each first-stage file by the sixth character
For xx = 0 To 9
    Open prefix & Str(xx) For Input As #1
    For d = 0 To 9
        fnum = 10 + d
        Open prefix & Str(xx) & Str(d) For Output As #fnum
        counts(d) = 0
    Next d
    Do Until EOF(1)
        Input #1, ps
        digit = Val(Mid(ps, 6, 1))
        fnum = 10 + digit
        Write #fnum, ps
        counts(digit) = counts(digit) + 1
    Loop
    Close #1
    For d = 0 To 9
        fnum = 10 + d
        Close #fnum
    Next d
    ' output counts for this group
    Open prefix & Str(xx) & " c" For Output As #20
    Write #20, 0, counts(0), 1, counts(1), 2, counts(2), 3, counts(3), 4, counts(4), _
        5, counts(5), 6, counts(6), 7, counts(7), 8, counts(8), 9, counts(9)
    Close #20
Next xx
' end beep
Beep
Dim ps As String
Dim pt As String
Dim ph(13)
Dim pk(13)
Dim Array1(40000, 1)
' after dividing into 10 x 10 files on locus 3
' compares loci 4 to 13 of each profile with every other profile in the same file
temp0 = "jul30-r.txt"
Open temp0 For Output As #10
For bb = 0 To 9
    For aa = 0 To 9
        temp = "jul28c" & Str(bb) & Str(aa)
        temp2 = "jul28c" & Str(bb) & Str(aa)
        Open temp2 For Input As #2
        nn = 0
        Do While (EOF(2) = False)
            Input #2, pt
            c1$ = Mid(pt, 1, 26)
            Array1(nn, 1) = c1$
            nn = nn + 1
        Loop
        Close (2)
        Close #2
        startlim = 0
        ' endlim= nn-1
        endlim = nn - 1
        j = startlim
        k = 0
        For n = 0 To endlim
            ' pt is the profile to be checked against each
            ' other profile (not itself)
            ' comparing all columns 4 to 13
            pt = Array1(n, 1)
            b4$ = Mid(pt, 7, 2)
            pk(4) = b4$
            b5$ = Mid(pt, 9, 2)
            pk(5) = b5$
            b6$ = Mid(pt, 11, 2)
            pk(6) = b6$
            b7$ = Mid(pt, 13, 2)
            pk(7) = b7$
            b8$ = Mid(pt, 15, 2)
            pk(8) = b8$
            b9$ = Mid(pt, 17, 2)
            pk(9) = b9$
            b10$ = Mid(pt, 19, 2)
            pk(10) = b10$
            b11$ = Mid(pt, 21, 2)
            pk(11) = b11$
            b12$ = Mid(pt, 23, 2)
            pk(12) = b12$
            b13$ = Mid(pt, 25, 2)
            pk(13) = b13$
            k = 0
            count0 = 0
            count1 = 0
            ' qq quasi-loop because the Exit For in the
            ' loop forces closure of the loop rather than Next
            ' q stops at endlim (nn - 1): Array1(nn, 1) lies past this file's profiles
            ' and may still hold a stale profile left over from an earlier, larger file
            For q = n + 1 To endlim
                For qq = 0 To 0
                    ps = Array1(q, 1)
                    ' ps is the profile checked against pt
                    zz = 0
                    xx = 0
                    a4$ = Mid(ps, 7, 2)
                    ph(4) = a4$
                    If ph(4) <> pk(4) Then zz = zz + 1
                    xx = xx + zz
                    zz = 0
                    a5$ = Mid(ps, 9, 2)
                    ph(5) = a5$
                    If ph(5) <> pk(5) Then zz = zz + 1
                    xx = xx + zz
                    zz = 0
                    a6$ = Mid(ps, 11, 2)
                    ph(6) = a6$
                    If ph(6) <> pk(6) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 2 Then
                        Exit For
                    End If
                    zz = 0
                    a7$ = Mid(ps, 13, 2)
                    ph(7) = a7$
                    If ph(7) <> pk(7) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 2 Then
                        Exit For
                    End If
                    zz = 0
                    a8$ = Mid(ps, 15, 2)
                    ph(8) = a8$
                    If ph(8) <> pk(8) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 2 Then
                        Exit For
                    End If
                    zz = 0
                    a9$ = Mid(ps, 17, 2)
                    ph(9) = a9$
                    If ph(9) <> pk(9) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 2 Then
                        Exit For
                    End If
                    zz = 0
                    a10$ = Mid(ps, 19, 2)
                    ph(10) = a10$
                    If ph(10) <> pk(10) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 2 Then
                        Exit For
                    End If
                    zz = 0
                    a11$ = Mid(ps, 21, 2)
                    ph(11) = a11$
                    If ph(11) <> pk(11) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 2 Then
                        Exit For
                    End If
                    zz = 0
                    a12$ = Mid(ps, 23, 2)
                    ph(12) = a12$
                    If ph(12) <> pk(12) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 2 Then
                        Exit For
                    End If
                    zz = 0
                    a13$ = Mid(ps, 25, 2)
                    ph(13) = a13$
                    If ph(13) <> pk(13) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 2 Then
                        Exit For
                    End If
                    Write #10, ps, pt, 11 - xx
                Next qq
                ' xx is the count of non-matching loci between ps and pt
                count1 = count1 + 1
                k = k + 1
            Next q
            j = j + 1
            ' beeps after 1000s
            If j / 1000 = Int(j / 1000) Then
                For beepc = 1 To (j / 1000)
                    For beept = 1 To 200000
                        beepu = 1 / beept
                    Next beept
                    Beep
                Next beepc
            End If
        Next n
        Close #1
    Next aa
Next bb
Close #10
' end beep
For beepc = 1 To 10
    For beept = 1 To 200000
        beepu = 1 / beept
    Next beept
    Beep
Next beepc
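The inner comparison above counts the non-matching loci between two profiles. It abandons the pair as soon as that count exceeds the tolerance (2 here), which is what keeps the each-with-each check fast. Below is a Python sketch of the same early-exit test. It assumes 2 characters per locus, matching the `Mid(ps, 2 * locus - 1, 2)` offsets, and all function names are hypothetical. The reported score `1 + (10 - xx)` mirrors the VB `11 - xx`, which appears to count the dividing locus (locus 3) plus the matched loci 4 to 13.

```python
def locus(profile: str, n: int) -> str:
    """Two-character genotype code of 1-based locus n."""
    return profile[2 * n - 2: 2 * n]


def mismatches(p: str, q: str, first: int, last: int, limit: int):
    """Count the loci first..last at which p and q differ.

    Returns None as soon as the count exceeds `limit` (the VB 'Exit For').
    """
    count = 0
    for n in range(first, last + 1):
        if locus(p, n) != locus(q, n):
            count += 1
            if count > limit:
                return None
    return count


def block_matches(block, first=4, last=13, limit=2):
    """Each-with-each check inside one sub-file; yields (ps, pt, score)."""
    checked = last - first + 1
    for i, pt in enumerate(block):
        for ps in block[i + 1:]:      # every later profile, never past the end
            xx = mismatches(ps, pt, first, last, limit)
            if xx is not None:
                yield ps, pt, 1 + checked - xx


p = "48144645365545393545040024"
q = "33554645135545393545040024"
print(list(block_matches([p, q])))
```

Using the 10-loci pair from the results below, the pair shares the locus-3 code "46", differs only at locus 5 among loci 4 to 13, and so scores 10.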
' dividing into 10 by seventh digit
' Dividing 10 files into 10 by eighth digit
Dim ps As String
Dim ph(26)
xx = 0
yyyy = xx
temp = "jul12-d.txt"
temp0 = "jul28d 0"
temp1 = "jul28d 1"
temp2 = "jul28d 2"
temp3 = "jul28d 3"
temp4 = "jul28d 4"
temp5 = "jul28d 5"
temp6 = "jul28d 6"
temp7 = "jul28d 7"
temp8 = "jul28d 8"
temp9 = "jul28d 9"
tempc = "jul28d c"
Open temp For Input As #1
Open temp0 For Output As #10
Open temp1 For Output As #11
Open temp2 For Output As #12
Open temp3 For Output As #13
Open temp4 For Output As #14
Open temp5 For Output As #15
Open temp6 For Output As #16
Open temp7 For Output As #17
Open temp8 For Output As #18
Open temp9 For Output As #19
count0 = 0
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
Do Until (EOF(1) = True)
    Input #1, ps
    a2$ = Mid(ps, 7, 1)
    ph(1) = Val(a2$)
    If ph(1) = 0 Then
        Write #10, ps
        count0 = count0 + 1
    End If
    If ph(1) = 1 Then
        Write #11, ps
        count1 = count1 + 1
    End If
    If ph(1) = 2 Then
        Write #12, ps
        count2 = count2 + 1
    End If
    If ph(1) = 3 Then
        Write #13, ps
        count3 = count3 + 1
    End If
    If ph(1) = 4 Then
        Write #14, ps
        count4 = count4 + 1
    End If
    If ph(1) = 5 Then
        Write #15, ps
        count5 = count5 + 1
    End If
    If ph(1) = 6 Then
        Write #16, ps
        count6 = count6 + 1
    End If
    If ph(1) = 7 Then
        Write #17, ps
        count7 = count7 + 1
    End If
    If ph(1) = 8 Then
        Write #18, ps
        count8 = count8 + 1
    End If
    If ph(1) = 9 Then
        Write #19, ps
        count9 = count9 + 1
    End If
    x = x + 1
Loop
Close (1)
Close #1
Close #10, #11, #12, #13, #14, #15, #16, #17, #18, #19
' output counts
countt = count0 + count1 + count2 + count3 + count4 + count5 + count6 + count7 + count8 + count9
Open tempc For Output As #20
' "total" is a label; as a bare, undeclared name it would be written as an empty field
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9, "total", countt
Close #20
For xx = 0 To 9
    yyyy = xx
    temp = "jul28d" & Str(yyyy)
    temp0 = "jul28d" & Str(yyyy) & " 0"
    temp1 = "jul28d" & Str(yyyy) & " 1"
    temp2 = "jul28d" & Str(yyyy) & " 2"
    temp3 = "jul28d" & Str(yyyy) & " 3"
    temp4 = "jul28d" & Str(yyyy) & " 4"
    temp5 = "jul28d" & Str(yyyy) & " 5"
    temp6 = "jul28d" & Str(yyyy) & " 6"
    temp7 = "jul28d" & Str(yyyy) & " 7"
    temp8 = "jul28d" & Str(yyyy) & " 8"
    temp9 = "jul28d" & Str(yyyy) & " 9"
    tempc = "jul28d" & Str(yyyy) & " c"
    Open temp For Input As #1
    Open temp0 For Output As #10
    Open temp1 For Output As #11
    Open temp2 For Output As #12
    Open temp3 For Output As #13
    Open temp4 For Output As #14
    Open temp5 For Output As #15
    Open temp6 For Output As #16
    Open temp7 For Output As #17
    Open temp8 For Output As #18
    Open temp9 For Output As #19
    count0 = 0
    count1 = 0
    count2 = 0
    count3 = 0
    count4 = 0
    count5 = 0
    count6 = 0
    count7 = 0
    count8 = 0
    count9 = 0
    Do Until (EOF(1) = True)
        Input #1, ps
        a2$ = Mid(ps, 8, 1)
        ph(1) = Val(a2$)
        If ph(1) = 0 Then
            Write #10, ps
            count0 = count0 + 1
        End If
        If ph(1) = 1 Then
            Write #11, ps
            count1 = count1 + 1
        End If
        If ph(1) = 2 Then
            Write #12, ps
            count2 = count2 + 1
        End If
        If ph(1) = 3 Then
            Write #13, ps
            count3 = count3 + 1
        End If
        If ph(1) = 4 Then
            Write #14, ps
            count4 = count4 + 1
        End If
        If ph(1) = 5 Then
            Write #15, ps
            count5 = count5 + 1
        End If
        If ph(1) = 6 Then
            Write #16, ps
            count6 = count6 + 1
        End If
        If ph(1) = 7 Then
            Write #17, ps
            count7 = count7 + 1
        End If
        If ph(1) = 8 Then
            Write #18, ps
            count8 = count8 + 1
        End If
        If ph(1) = 9 Then
            Write #19, ps
            count9 = count9 + 1
        End If
        x = x + 1
    Loop
    Close (1)
    Close #1
    Close #10, #11, #12, #13, #14, #15, #16, #17, #18, #19
    ' output counts
    Open tempc For Output As #20
    Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9
    Close #20
Next xx
'End
Beep
Beep
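Splitting by the 7th character and then splitting each resulting file by the 8th, as above, leaves each of the 100 files holding exactly the profiles that share the same two-character code at positions 7-8. In other words, the profiles are grouped by their locus-4 code. Below is a short Python check of that equivalence on made-up strings. `nested_split` and `group_by_field` are hypothetical names. The VB `Val()` treatment of non-digits is ignored because the test data contains digits only.

```python
import random
from collections import defaultdict


def nested_split(profiles, pos1, pos2):
    """Split on the character at pos1, then split each bucket on pos2 (1-based)."""
    outer = defaultdict(list)
    for p in profiles:
        outer[p[pos1 - 1]].append(p)
    result = {}
    for key1, bucket in outer.items():
        for p in bucket:
            result.setdefault(key1 + p[pos2 - 1], []).append(p)
    return result


def group_by_field(profiles, start, length):
    """Group profiles on the substring of `length` chars starting at 1-based `start`."""
    groups = defaultdict(list)
    for p in profiles:
        groups[p[start - 1:start - 1 + length]].append(p)
    return dict(groups)


rng = random.Random(1)
sample = ["".join(rng.choice("0123456789") for _ in range(26)) for _ in range(500)]
print(nested_split(sample, 7, 8) == group_by_field(sample, 7, 2))
```

Both approaches keep the input order within each group, so the two dictionaries compare equal as a whole.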
Dim ps As String
Dim pt As String
Dim ph(13)
Dim pk(13)
Dim Array1(40000, 1)
' after dividing into 10 x 10 files on locus 4
' compares loci 5 to 13 of each profile with every other profile in the same file
temp0 = "jul30a-r.txt"
Open temp0 For Output As #10
For bb = 0 To 9
    For aa = 0 To 9
        temp = "jul28d" & Str(bb) & Str(aa)
        temp2 = "jul28d" & Str(bb) & Str(aa)
        Open temp2 For Input As #2
        nn = 0
        Do While (EOF(2) = False)
            Input #2, pt
            c1$ = Mid(pt, 1, 26)
            Array1(nn, 1) = c1$
            nn = nn + 1
        Loop
        Close (2)
        Close #2
        startlim = 0
        ' endlim= nn-1
        endlim = nn - 1
        j = startlim
        k = 0
        For n = 0 To endlim
            ' pt is the profile to be checked against each
            ' other profile (not itself)
            ' comparing all columns 5 to 13
            pt = Array1(n, 1)
            b5$ = Mid(pt, 9, 2)
            pk(5) = b5$
            b6$ = Mid(pt, 11, 2)
            pk(6) = b6$
            b7$ = Mid(pt, 13, 2)
            pk(7) = b7$
            b8$ = Mid(pt, 15, 2)
            pk(8) = b8$
            b9$ = Mid(pt, 17, 2)
            pk(9) = b9$
            b10$ = Mid(pt, 19, 2)
            pk(10) = b10$
            b11$ = Mid(pt, 21, 2)
            pk(11) = b11$
            b12$ = Mid(pt, 23, 2)
            pk(12) = b12$
            b13$ = Mid(pt, 25, 2)
            pk(13) = b13$
            k = 0
            count0 = 0
            count1 = 0
            ' qq quasi-loop because the Exit For in the
            ' loop forces closure of the loop rather than Next
            ' q stops at endlim (nn - 1): Array1(nn, 1) lies past this file's profiles
            ' and may still hold a stale profile left over from an earlier, larger file
            For q = n + 1 To endlim
                For qq = 0 To 0
                    ps = Array1(q, 1)
                    ' ps is the profile checked against pt
                    zz = 0
                    xx = 0
                    a5$ = Mid(ps, 9, 2)
                    ph(5) = a5$
                    If ph(5) <> pk(5) Then zz = zz + 1
                    xx = xx + zz
                    zz = 0
                    a6$ = Mid(ps, 11, 2)
                    ph(6) = a6$
                    If ph(6) <> pk(6) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 1 Then
                        Exit For
                    End If
                    zz = 0
                    a7$ = Mid(ps, 13, 2)
                    ph(7) = a7$
                    If ph(7) <> pk(7) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 1 Then
                        Exit For
                    End If
                    zz = 0
                    a8$ = Mid(ps, 15, 2)
                    ph(8) = a8$
                    If ph(8) <> pk(8) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 1 Then
                        Exit For
                    End If
                    zz = 0
                    a9$ = Mid(ps, 17, 2)
                    ph(9) = a9$
                    If ph(9) <> pk(9) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 1 Then
                        Exit For
                    End If
                    zz = 0
                    a10$ = Mid(ps, 19, 2)
                    ph(10) = a10$
                    If ph(10) <> pk(10) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 1 Then
                        Exit For
                    End If
                    zz = 0
                    a11$ = Mid(ps, 21, 2)
                    ph(11) = a11$
                    If ph(11) <> pk(11) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 1 Then
                        Exit For
                    End If
                    zz = 0
                    a12$ = Mid(ps, 23, 2)
                    ph(12) = a12$
                    If ph(12) <> pk(12) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 1 Then
                        Exit For
                    End If
                    zz = 0
                    a13$ = Mid(ps, 25, 2)
                    ph(13) = a13$
                    If ph(13) <> pk(13) Then zz = zz + 1
                    xx = xx + zz
                    If xx > 1 Then
                        Exit For
                    End If
                    Write #10, ps, pt, 10 - xx
                Next qq
                ' xx is the count of non-matching loci between ps and pt
                count1 = count1 + 1
                k = k + 1
            Next q
            j = j + 1
            ' beeps after 1000s
            If j / 1000 = Int(j / 1000) Then
                For beepc = 1 To (j / 1000)
                    For beept = 1 To 200000
                        beepu = 1 / beept
                    Next beept
                    Beep
                Next beepc
            End If
        Next n
        Close #1
    Next aa
Next bb
Close #10
' end beep
For beepc = 1 To 10
    For beept = 1 To 200000
        beepu = 1 / beept
    Next beept
    Beep
Next beepc
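Each pass only compares profiles within the same sub-file. It therefore finds a pair only if the two profiles agree at the locus used for the dividing, which is locus 4 in this pass. The Python sketch below works on random, made-up profiles whose loci are binary-coded so that near-matches actually occur. It checks that two searches give the same pairs:

- blocking on locus 4 and then comparing loci 5 to 13 with at most one mismatch;
- a brute-force scan over all pairs under the same rule.

It also counts how many comparisons the blocked version makes. The data generator, the planted pair and all names are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations


def locus(p: str, n: int) -> str:
    """Two-character code of 1-based locus n."""
    return p[2 * n - 2: 2 * n]


def close_enough(p, q, first, last, limit):
    """True if p and q differ at no more than `limit` of loci first..last."""
    return sum(locus(p, n) != locus(q, n) for n in range(first, last + 1)) <= limit


def brute_force(profiles, block_locus, limit):
    """All pairs agreeing at block_locus and within `limit` on the later loci."""
    return {frozenset((p, q)) for p, q in combinations(profiles, 2)
            if locus(p, block_locus) == locus(q, block_locus)
            and close_enough(p, q, block_locus + 1, 13, limit)}


def blocked(profiles, block_locus, limit):
    """The same search, comparing only profiles in the same block."""
    groups = defaultdict(list)
    for p in profiles:
        groups[locus(p, block_locus)].append(p)
    hits, comparisons = set(), 0
    for group in groups.values():
        for p, q in combinations(group, 2):
            comparisons += 1
            if close_enough(p, q, block_locus + 1, 13, limit):
                hits.add(frozenset((p, q)))
    return hits, comparisons


rng = random.Random(7)
sample = ["".join(rng.choice("01") for _ in range(26)) for _ in range(300)]
# Plant a near-duplicate of sample[0] that differs only at locus 7.
first = sample[0]
planted = first[:12] + ("11" if locus(first, 7) != "11" else "00") + first[14:]
sample.append(planted)

hits, comparisons = blocked(sample, 4, 1)
print(len(hits), comparisons, len(sample) * (len(sample) - 1) // 2)
```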
' converting integer values back to DNA loci, alleles, 13 loci CODIS
' afterwards tidy up using Word search/replace
' to get rid of string quotes "" etc
' xxxx is number of profiles to be converted
Dim ph(26)
Dim pj(26)
Dim ps As String
Open "result.txt" For Input As #1
Open "result_conv.txt" For Output As #2
For x = 1 To xxxx
    Input #1, ps
    a1$ = Mid(ps, 1, 1)
    a2$ = Mid(ps, 2, 1)
    a3$ = Mid(ps, 3, 1)
    a4$ = Mid(ps, 4, 1)
    a5$ = Mid(ps, 5, 1)
    a6$ = Mid(ps, 6, 1)
    a7$ = Mid(ps, 7, 1)
    a8$ = Mid(ps, 8, 1)
    a9$ = Mid(ps, 9, 1)
    a10$ = Mid(ps, 10, 1)
    a11$ = Mid(ps, 11, 1)
    a12$ = Mid(ps, 12, 1)
    a13$ = Mid(ps, 13, 1)
    a14$ = Mid(ps, 14, 1)
    a15$ = Mid(ps, 15, 1)
    a16$ = Mid(ps, 16, 1)
    a17$ = Mid(ps, 17, 1)
    a18$ = Mid(ps, 18, 1)
    a19$ = Mid(ps, 19, 1)
    a20$ = Mid(ps, 20, 1)
    a21$ = Mid(ps, 21, 1)
    a22$ = Mid(ps, 22, 1)
    a23$ = Mid(ps, 23, 1)
    a24$ = Mid(ps, 24, 1)
    a25$ = Mid(ps, 25, 1)
    a26$ = Mid(ps, 26, 1)
    ph(0) = a1$
    ph(1) = a2$
    ph(2) = a3$
    ph(3) = a4$
    ph(4) = a5$
    ph(5) = a6$
    ph(6) = a7$
    ph(7) = a8$
    ph(8) = a9$
    ph(9) = a10$
    ph(10) = a11$
    ph(11) = a12$
    ph(12) = a13$
    ph(13) = a14$
    ph(14) = a15$
    ph(15) = a16$
    ph(16) = a17$
    ph(17) = a18$
    ph(18) = a19$
    ph(19) = a20$
    ph(20) = a21$
    ph(21) = a22$
    ph(22) = a23$
    ph(23) = a24$
    ph(24) = a25$
    ph(25) = a26$
    For j = 0 To 1
        ' D3
        If ph(j) = "0" Then pj(j) = 12
        If ph(j) = "1" Then pj(j) = 13
        If ph(j) = "2" Then pj(j) = 14
        If ph(j) = "3" Then pj(j) = 15
        If ph(j) = "4" Then pj(j) = 16
        If ph(j) = "5" Then pj(j) = 17
        If ph(j) = "6" Then pj(j) = 18
        If ph(j) = "7" Then pj(j) = 19
        If ph(j) = "8" Then pj(j) = 20
    Next j
    For j = 2 To 3
        ' vWA
        ' 11, "8"
        ' 12, "9"
        If ph(j) = "0" Then pj(j) = 13
        If ph(j) = "1" Then pj(j) = 14
        If ph(j) = "2" Then pj(j) = 15
        If ph(j) = "3" Then pj(j) = 16
        If ph(j) = "4" Then pj(j) = 17
        If ph(j) = "5" Then pj(j) = 18
        If ph(j) = "6" Then pj(j) = 19
        If ph(j) = "7" Then pj(j) = 20
        ' 21, "A"
    Next j
    For j = 4 To 5
        ' D8
        If ph(j) = "0" Then pj(j) = 8
        If ph(j) = "1" Then pj(j) = 9
        If ph(j) = "2" Then pj(j) = 10
        If ph(j) = "3" Then pj(j) = 11
        If ph(j) = "4" Then pj(j) = 12
        If ph(j) = "5" Then pj(j) = 13
        If ph(j) = "6" Then pj(j) = 14
        If ph(j) = "7" Then pj(j) = 15
        If ph(j) = "8" Then pj(j) = 16
        '17 '18
    Next j
    For j = 6 To 7
        ' D5
        If ph(j) = "0" Then pj(j) = 7
        If ph(j) = "1" Then pj(j) = 8
        If ph(j) = "2" Then pj(j) = 9
        If ph(j) = "3" Then pj(j) = 10
        If ph(j) = "4" Then pj(j) = 11
        If ph(j) = "5" Then pj(j) = 12
        If ph(j) = "6" Then pj(j) = 13
        If ph(j) = "7" Then pj(j) = 14
        ' 8
    Next j
    For j = 8 To 9
        ' D13
        If ph(j) = "0" Then pj(j) = 8
        If ph(j) = "1" Then pj(j) = 9
        If ph(j) = "2" Then pj(j) = 10
        If ph(j) = "3" Then pj(j) = 11
        If ph(j) = "4" Then pj(j) = 12
        If ph(j) = "5" Then pj(j) = 13
        If ph(j) = "6" Then pj(j) = 14
        ' 15
    Next j
    For j = 10 To 11
        ' D7
        If ph(j) = "0" Then pj(j) = 7
        If ph(j) = "1" Then pj(j) = 8
        If ph(j) = "2" Then pj(j) = 9
        If ph(j) = "3" Then pj(j) = 10
        If ph(j) = "4" Then pj(j) = 11
        If ph(j) = "5" Then pj(j) = 12
        If ph(j) = "6" Then pj(j) = 13
        If ph(j) = "7" Then pj(j) = 14
    Next j
    For j = 12 To 13
        ' D16
        If ph(j) = "0" Then pj(j) = 8
        If ph(j) = "1" Then pj(j) = 9
        If ph(j) = "2" Then pj(j) = 10
        If ph(j) = "3" Then pj(j) = 11
        If ph(j) = "4" Then pj(j) = 12
        If ph(j) = "5" Then pj(j) = 13
        If ph(j) = "6" Then pj(j) = 14
    Next j
    For j = 14 To 15
        ' FGA
        If ph(j) = "0" Then pj(j) = 16
        If ph(j) = "1" Then pj(j) = 18
        If ph(j) = "2" Then pj(j) = 19
        If ph(j) = "3" Then pj(j) = 20
        If ph(j) = "4" Then pj(j) = 21
        If ph(j) = "5" Then pj(j) = 21.2
        If ph(j) = "6" Then pj(j) = 22
        If ph(j) = "7" Then pj(j) = 22.2
        If ph(j) = "8" Then pj(j) = 23
        If ph(j) = "9" Then pj(j) = 24
        If ph(j) = "A" Then pj(j) = 25
        If ph(j) = "B" Then pj(j) = 26
        If ph(j) = "C" Then pj(j) = 27
    Next j
    For j = 16 To 17
        ' D21
        If ph(j) = "0" Then pj(j) = 26
        If ph(j) = "1" Then pj(j) = 27
        If ph(j) = "2" Then pj(j) = 28
        If ph(j) = "3" Then pj(j) = 29
        If ph(j) = "4" Then pj(j) = 29.2
        If ph(j) = "5" Then pj(j) = 30
        If ph(j) = "6" Then pj(j) = 30.2
        If ph(j) = "7" Then pj(j) = 31
        If ph(j) = "8" Then pj(j) = 31.2
        If ph(j) = "9" Then pj(j) = 32
        If ph(j) = "A" Then pj(j) = 32.2
        If ph(j) = "B" Then pj(j) = 33.2
        If ph(j) = "C" Then pj(j) = 34
        If ph(j) = "D" Then pj(j) = 34.2
    Next j
    For j = 18 To 19
        ' D18
        If ph(j) = "0" Then pj(j) = 10
        If ph(j) = "1" Then pj(j) = 11
        If ph(j) = "2" Then pj(j) = 12
        If ph(j) = "3" Then pj(j) = 13
        If ph(j) = "4" Then pj(j) = 14
        If ph(j) = "5" Then pj(j) = 15
        If ph(j) = "6" Then pj(j) = 16
        If ph(j) = "7" Then pj(j) = 17
        If ph(j) = "8" Then pj(j) = 18
        If ph(j) = "9" Then pj(j) = 19
        If ph(j) = "A" Then pj(j) = 20
        If ph(j) = "B" Then pj(j) = 21
        If ph(j) = "C" Then pj(j) = 22
        If ph(j) = "D" Then pj(j) = 23
        If ph(j) = "E" Then pj(j) = 24
    Next j
    For j = 20 To 21
        ' THO1
        If ph(j) = "0" Then pj(j) = 6
        If ph(j) = "1" Then pj(j) = 7
        If ph(j) = "2" Then pj(j) = 8
        If ph(j) = "3" Then pj(j) = 9
        If ph(j) = "4" Then pj(j) = 9.3
        If ph(j) = "5" Then pj(j) = 10
    Next j
    For j = 22 To 23
        ' TPOX
        If ph(j) = "0" Then pj(j) = 8
        If ph(j) = "1" Then pj(j) = 9
        If ph(j) = "2" Then pj(j) = 10
        If ph(j) = "3" Then pj(j) = 11
        If ph(j) = "4" Then pj(j) = 12
    Next j
    For j = 24 To 25
        ' CSF1PO
        If ph(j) = "0" Then pj(j) = 8
        If ph(j) = "1" Then pj(j) = 9
        If ph(j) = "2" Then pj(j) = 10
        If ph(j) = "3" Then pj(j) = 11
        If ph(j) = "4" Then pj(j) = 12
        If ph(j) = "5" Then pj(j) = 13
        If ph(j) = "6" Then pj(j) = 14
        If ph(j) = "7" Then pj(j) = 15
    Next j
    Write #2, ""; pj(0), pj(1); ""; pj(2), pj(3); ""; pj(4), pj(5); ""; pj(6), pj(7); ""; pj(8), pj(9); ""; pj(10), pj(11); ""; pj(12), pj(13); ""; pj(14), pj(15); ""; pj(16), pj(17); ""; pj(18), pj(19); ""; pj(20), pj(21); ""; pj(22), pj(23); ""; pj(24), pj(25); ""
Next x
Close #1
Close #2
' Generating 13 loci x2 profiles CODIS with >8 percent AFs
' directing pairs and first divider
' with changed order
' L11,L12,L13 then L1 to L10 to deliberately cherry-pick for greatest number
' of matches
Dim ph(26)
Dim pb(26)
' initialising Random Number Generator - RNG
count9 = 0
count8 = 0
Randomize
a = 214013
c = 2531011
x0 = Timer
z = 2 ^ 24
' 1 file 'aug02a-g' for original, un-directed pairs, source data.
' This file is necessary to check on the performance of the RNG
' when a matched pair is found then it is highly unlikely that
' both sequences as generated, before pair directing, would
' be the same - more likely a manifest of repeat within the RNG
' (reason for adopting the 214013 / 2531011 RNG)
' Use 'Word' find function on part of the sequences, including pair reversals,
' with luck would include a 'homozygotic' pair eg (3,3) say, so no reversal
' on that pair
Open "aug02a-g" For Output As #1
' outputs directed and divided by first digit
Open "aug02a-0" For Output As #10
Open "aug02a-1" For Output As #11
Open "aug02a-2" For Output As #12
Open "aug02a-3" For Output As #13
Open "aug02a-4" For Output As #14
Open "aug02a-5" For Output As #15
Open "aug02a-6" For Output As #16
Open "aug02a-7" For Output As #17
Open "aug02a-8" For Output As #18
Open "aug02a-9" For Output As #19
' change xxxx for different total size
' for xxxx = 10000000 my computer took 5 hours to generate over-night
xxxx = 100000
xxxx = xxxx - 1
For x = 0 To xxxx
    flag = 0
    For j = 0 To 1
        ' D3, locus 1
        ' RNG random number generator
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        If ph(j) < 0.005 Then ph(j) = 11
        If ph(j) < 0.01 Then ph(j) = 1
        If ph(j) < 0.124 Then ph(j) = 2
        If ph(j) < 0.382 Then ph(j) = 3
        If ph(j) < 0.623 Then ph(j) = 4
        If ph(j) < 0.84 Then ph(j) = 5
        If ph(j) < 0.988 Then ph(j) = 6
        If ph(j) < 0.998 Then ph(j) = 7
        If ph(j) < 1 Then ph(j) = 8
        If ph(j) > 10 Then ph(j) = 0
        If ph(j) = "7" Then flag = 1
        If ph(j) = "8" Then flag = 1
    Next j
    For j = 2 To 3
        ' vWA, locus 2
        ' RNG
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        If ph(j) < 0.002 Then ph(j) = 11
        If ph(j) < 0.084 Then ph(j) = 1
        If ph(j) < 0.193 Then ph(j) = 2
        If ph(j) < 0.42 Then ph(j) = 3
        If ph(j) < 0.691 Then ph(j) = 4
        If ph(j) < 0.903 Then ph(j) = 5
        If ph(j) < 0.987 Then ph(j) = 6
        If ph(j) < 1 Then ph(j) = 7
        If ph(j) > 10 Then ph(j) = 0
        If ph(j) = "0" Then flag = 1
        If ph(j) = "1" Then flag = 1
        If ph(j) = "6" Then flag = 1
        If ph(j) = "7" Then flag = 1
    Next j
    For j = 4 To 5
        ' D8, locus 3
        ' RNG
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        If ph(j) < 0.012 Then ph(j) = 11
        If ph(j) < 0.022 Then ph(j) = 1
        If ph(j) < 0.107 Then ph(j) = 2
        If ph(j) < 0.187 Then ph(j) = 3
        If ph(j) < 0.324 Then ph(j) = 4
        If ph(j) < 0.64 Then ph(j) = 5
        If ph(j) < 0.865 Then ph(j) = 6
        If ph(j) < 0.976 Then ph(j) = 7
        If ph(j) < 1 Then ph(j) = 8
        If ph(j) > 10 Then ph(j) = 0
        If ph(j) = "0" Then flag = 1
        If ph(j) = "1" Then flag = 1
        If ph(j) = "8" Then flag = 1
    Next j
    For j = 6 To 7
        ' D5, locus 4
        ' RNG
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        If ph(j) < 0.002 Then ph(j) = 11
        If ph(j) < 0.004 Then ph(j) = 1
        If ph(j) < 0.031 Then ph(j) = 2
        If ph(j) < 0.099 Then ph(j) = 3
        If ph(j) < 0.435 Then ph(j) = 4
        If ph(j) < 0.824 Then ph(j) = 5
        If ph(j) < 0.985 Then ph(j) = 6
        If ph(j) < 1 Then ph(j) = 7
        If ph(j) > 10 Then ph(j) = 0
        If ph(j) = "0" Then flag = 1
        If ph(j) = "1" Then flag = 1
        If ph(j) = "2" Then flag = 1
        If ph(j) = "3" Then flag = 1
        If ph(j) = "7" Then flag = 1
    Next j
    For j = 8 To 9
        ' D13, locus 5
        ' RNG
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        If ph(j) < 0.111 Then ph(j) = 11
        If ph(j) < 0.198 Then ph(j) = 1
        If ph(j) < 0.268 Then ph(j) = 2
        If ph(j) < 0.543 Then ph(j) = 3
        If ph(j) < 0.862 Then ph(j) = 4
        If ph(j) < 0.944 Then ph(j) = 5
        If ph(j) < 1 Then ph(j) = 6
        If ph(j) > 10 Then ph(j) = 0
        If ph(j) = "1" Then flag = 1
        If ph(j) = "5" Then flag = 1
        If ph(j) = "6" Then flag = 1
    Next j
    For j = 10 To 11
        ' D7, locus 6
        ' RNG
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        If ph(j) < 0.02 Then ph(j) = 11
        If ph(j) < 0.162 Then ph(j) = 1
        If ph(j) < 0.302 Then ph(j) = 2
        If ph(j) < 0.589 Then ph(j) = 3
        If ph(j) < 0.816 Then ph(j) = 4
        If ph(j) < 0.955 Then ph(j) = 5
        If ph(j) < 0.993 Then ph(j) = 6
        If ph(j) < 1 Then ph(j) = 7
        If ph(j) > 10 Then ph(j) = 0
        If ph(j) = "0" Then flag = 1
        If ph(j) = "6" Then flag = 1
        If ph(j) = "7" Then flag = 1
    Next j
    For j = 12 To 13
        ' D16, locus 7
        ' RNG
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        If ph(j) < 0.017 Then ph(j) = 11
        If ph(j) < 0.133 Then ph(j) = 1
        If ph(j) < 0.182 Then ph(j) = 2
        If ph(j) < 0.472 Then ph(j) = 3
        If ph(j) < 0.824 Then ph(j) = 4
        If ph(j) < 0.972 Then ph(j) = 5
        If ph(j) < 1 Then ph(j) = 6
        If ph(j) > 10 Then ph(j) = 0
        If ph(j) = "0" Then flag = 1
        If ph(j) = "2" Then flag = 1
        If ph(j) = "6" Then flag = 1
    Next j
    For j = 14 To 15
        ' FGA, locus 8
        ' RNG
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        pb(j) = "Z"
        If ph(j) < 0.002 Then ph(j) = 11
        If ph(j) < 0.011 Then ph(j) = 1
        If ph(j) < 0.091 Then ph(j) = 2
        If ph(j) < 0.245 Then ph(j) = 3
        If ph(j) < 0.429 Then ph(j) = 4
        If ph(j) < 0.431 Then ph(j) = 5
        If ph(j) < 0.605 Then ph(j) = 6
        If ph(j) < 0.612 Then ph(j) = 7
        If ph(j) < 0.755 Then ph(j) = 8
        If ph(j) < 0.897 Then ph(j) = 9
        If ph(j) < 0.963 And ph(j) >= 0.897 Then pb(j) = "A"
        If ph(j) < 0.993 And ph(j) >= 0.963 Then pb(j) = "B"
        If ph(j) < 1 And ph(j) >= 0.993 Then pb(j) = "C"
        If ph(j) > 10 Then ph(j) = 0
        If pb(j) <> "Z" Then ph(j) = pb(j)
        If ph(j) = "0" Then flag = 1
        If ph(j) = "1" Then flag = 1
        If ph(j) = "2" Then flag = 1
        If ph(j) = "5" Then flag = 1
        If ph(j) = "7" Then flag = 1
        If pb(j) = "A" Then flag = 1
        If pb(j) = "B" Then flag = 1
        If pb(j) = "C" Then flag = 1
    Next j
    For j = 16 To 17
        ' D21, locus 9
        ' RNG
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        pb(j) = "Z"
        If ph(j) < 0.005 Then ph(j) = 11
        If ph(j) < 0.041 Then ph(j) = 1
        If ph(j) < 0.201 Then ph(j) = 2
        If ph(j) < 0.452 Then ph(j) = 3
        If ph(j) < 0.454 Then ph(j) = 4
        If ph(j) < 0.683 Then ph(j) = 5
        If ph(j) < 0.7 Then ph(j) = 6
        If ph(j) < 0.785 Then ph(j) = 7
        If ph(j) < 0.874 Then ph(j) = 8
        If ph(j) < 0.886 Then ph(j) = 9
        If ph(j) < 0.964 And ph(j) >= 0.886 Then pb(j) = "A"
        If ph(j) < 0.996 And ph(j) >= 0.964 Then pb(j) = "B"
        If ph(j) < 0.998 And ph(j) >= 0.996 Then pb(j) = "C"
        If ph(j) < 1 And ph(j) >= 0.998 Then pb(j) = "D"
        If ph(j) > 10 Then ph(j) = 0
        If pb(j) <> "Z" Then ph(j) = pb(j)
        If ph(j) = "0" Then flag = 1
        If ph(j) = "1" Then flag = 1
        If ph(j) = "4" Then flag = 1
        If ph(j) = "6" Then flag = 1
        If ph(j) = "7" Then flag = 1
        If ph(j) = "9" Then flag = 1
        If pb(j) = "A" Then flag = 1
        If pb(j) = "B" Then flag = 1
        If pb(j) = "C" Then flag = 1
        If pb(j) = "D" Then flag = 1
    Next j
    For j = 18 To 19
        ' D18, locus 10
        ' RNG
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        pb(j) = "Z"
        If ph(j) < 0.005 Then ph(j) = 11
        If ph(j) < 0.017 Then ph(j) = 1
        If ph(j) < 0.137 Then ph(j) = 2
        If ph(j) < 0.256 Then ph(j) = 3
        If ph(j) < 0.435 Then ph(j) = 4
        If ph(j) < 0.602 Then ph(j) = 5
        If ph(j) < 0.742 Then ph(j) = 6
        If ph(j) < 0.863 Then ph(j) = 7
        If ph(j) < 0.924 Then ph(j) = 8
        If ph(j) < 0.955 Then ph(j) = 9
        If ph(j) < 0.974 And ph(j) >= 0.955 Then pb(j) = "A"
        If ph(j) < 0.986 And ph(j) >= 0.974 Then pb(j) = "B"
        If ph(j) < 0.991 And ph(j) >= 0.986 Then pb(j) = "C"
        If ph(j) < 0.998 And ph(j) >= 0.991 Then pb(j) = "D"
        If ph(j) < 1 And ph(j) >= 0.998 Then pb(j) = "E"
        If ph(j) > 10 Then ph(j) = 0
        If pb(j) <> "Z" Then ph(j) = pb(j)
        If ph(j) = "0" Then flag = 1
        If ph(j) = "1" Then flag = 1
        If ph(j) = "8" Then flag = 1
        If ph(j) = "9" Then flag = 1
        If pb(j) = "A" Then flag = 1
        If pb(j) = "B" Then flag = 1
        If pb(j) = "C" Then flag = 1
        If pb(j) = "D" Then flag = 1
        If pb(j) = "E" Then flag = 1
        If pb(j) = "F" Then flag = 1
    Next j
    For j = 20 To 21
        ' THO1, locus 11
        ' RNG random number generator
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        If ph(j) < 0.239 Then ph(j) = 11
        If ph(j) < 0.403 Then ph(j) = 1
        If ph(j) < 0.512 Then ph(j) = 2
        If ph(j) < 0.679 Then ph(j) = 3
        If ph(j) < 0.988 Then ph(j) = 4
        If ph(j) < 1 Then ph(j) = 5
        If ph(j) > 10 Then ph(j) = 0
        If ph(j) = "5" Then flag = 1
    Next j
    For j = 22 To 23
        ' TPOX, locus 12
        ' RNG
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        If ph(j) < 0.543 Then ph(j) = 11
        If ph(j) < 0.632 Then ph(j) = 1
        If ph(j) < 0.7 Then ph(j) = 2
        If ph(j) < 0.968 Then ph(j) = 3
        If ph(j) < 1 Then ph(j) = 4
        If ph(j) > 10 Then ph(j) = 0
        If ph(j) = "2" Then flag = 1
        If ph(j) = "4" Then flag = 1
    Next j
    For j = 24 To 25
        ' CSF1PO, locus 13
        ' RNG
        temp = x0 * a + c
        temp = temp / z
        x1 = (temp - Fix(temp)) * z
        x0 = x1
        phj = x1 / z
        ph(j) = phj
        If ph(j) < 0.002 Then ph(j) = 11
        If ph(j) < 0.031 Then ph(j) = 1
        If ph(j) < 0.301 Then ph(j) = 2
        If ph(j) < 0.62 Then ph(j) = 3
        If ph(j) < 0.928 Then ph(j) = 4
        If ph(j) < 0.991 Then ph(j) = 5
        If ph(j) < 0.998 Then ph(j) = 6
        If ph(j) < 1 Then ph(j) = 7
        If ph(j) > 10 Then ph(j) = 0
        If ph(j) = "0" Then flag = 1
        If ph(j) = "1" Then flag = 1
        If ph(j) = "5" Then flag = 1
        If ph(j) = "6" Then flag = 1
        If ph(j) = "7" Then flag = 1
    Next j
    If flag = 1 Then countf = countf + 1
    If flag = 0 Then
        ' output the original generated file
        Write #1, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) & ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25)
        ' Because in real DNA profiles without further info, no one
        ' knows which allele in each pair came from the mother or father
        ' by convention they are written smaller, larger (or equal).
        ' The following directs each pair
        For j = 0 To 24 Step 2
            If ph(j + 1) < ph(j) Then
                jjj = ph(j)
                ph(j) = ph(j + 1)
                ph(j + 1) = jjj
            End If
        Next j
        ' put extra conditional statements here to reduce
        ' the number of files or just delete some of the following
        '
        ' dividing on first col
        ' record reordered as loci 11 to 13 (THO1, TPOX, CSF1PO) then loci 1 to 10
        rec$ = ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25) & ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
        If ph(20) = 0 Then
            Write #10, rec$
            count0 = count0 + 1
        End If
        If ph(20) = 1 Then
            Write #11, rec$
            count1 = count1 + 1
        End If
        If ph(20) = 2 Then
            Write #12, rec$
            count2 = count2 + 1
        End If
        If ph(20) = 3 Then
            Write #13, rec$
            count3 = count3 + 1
        End If
        If ph(20) = 4 Then
            Write #14, rec$
            count4 = count4 + 1
        End If
        If ph(20) = 5 Then
            Write #15, rec$
            count5 = count5 + 1
        End If
        If ph(20) = 6 Then
            Write #16, rec$
            count6 = count6 + 1
        End If
        If ph(20) = 7 Then
            Write #17, rec$
            count7 = count7 + 1
        End If
        If ph(20) = 8 Then
            Write #18, rec$
            count8 = count8 + 1
        End If
        If ph(20) = 9 Then
            Write #19, rec$
            count9 = count9 + 1
        End If
    End If
Next x
Close #10, #11, #12, #13, #14, #15, #16, #17, #18, #19
Close #1
' count file for data to fix for - next loops in successive dividings
Open "aug02a-c" For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9
Close #20
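The generator above draws each allele by inverse-CDF lookup. A uniform number from a linear congruential generator (multiplier 214013, increment 2531011, modulus 2^24) is compared against cumulative allele frequencies. Each pair is then "directed" (smaller code first), and profiles containing a flagged allele are discarded.

Below is a Python sketch for a single locus, using the THO1 cumulative bounds from the code. The integer seed, the names and the one-locus scope are assumptions made for illustration.

With an integer seed, these constants satisfy the Hull-Dobell conditions (the increment is odd and the multiplier minus 1 is divisible by 4). The sequence therefore has the full period of 2^24 = 16,777,216 draws, which is roughly 645,000 profiles at 26 draws each. The VB code seeds from `Timer`, which is fractional, so its state does not follow a pure integer LCG.

```python
import bisect

A, C, M = 214013, 2531011, 2 ** 24   # constants from the VB generator


def lcg_uniform(seed: int):
    """Yield floats in [0, 1): x <- (A * x + C) mod M, returned as x / M."""
    x = seed % M
    while True:
        x = (A * x + C) % M
        yield x / M


# THO1 cumulative bounds from the VB code; codes 0..5 are alleles 6, 7, 8, 9, 9.3, 10.
THO1_CUM = [0.239, 0.403, 0.512, 0.679, 0.988, 1.0]
THO1_FLAGGED = {5}                    # the VB code flags only code 5 at THO1


def draw_code(u: float, cum) -> int:
    """Index of the first cumulative bound strictly greater than u."""
    return bisect.bisect_right(cum, u)


def draw_genotype(rng, cum):
    """Two draws, then 'directed' so the smaller code comes first."""
    a, b = draw_code(next(rng), cum), draw_code(next(rng), cum)
    return (a, b) if a <= b else (b, a)


rng = lcg_uniform(12345)
genotypes = [draw_genotype(rng, THO1_CUM) for _ in range(20000)]
kept = [g for g in genotypes if not (set(g) & THO1_FLAGGED)]
share_93 = sum(g.count(4) for g in genotypes) / (2 * len(genotypes))
print(len(kept), round(share_93, 3))
```

The share of code 4 (allele 9.3) should come out close to the 0.309 width of its interval.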
Results

With a quicker algorithm it now takes me a couple of hours on a ten-year-old PC to check for all partial matches, which presumably translates to about 10 minutes on a modern PC. I'll stick with that, without burrowing into B-trees, BB-trees, hash tables etc. Basically the algorithm leans towards discarding non-matches early, as >3 non-matches occur sooner than >8 matches.

Anyway, here are the results for half of 65,000, using RCMP Caucasian allele frequency data.

13 loci CODIS simulation using RCMP Toronto Caucasian data
http://www.csfs.ca/databases/cfs_CC_ProfilerPlus_freq.htm
http://www.csfs.ca/databases/cfs_CC_Cofiler_freq.htm

The number of pairs grows with the square of the database size, so halving the database should roughly quarter the number of matches. I was therefore anticipating about 1/4 of 72 = 18 matches at 9 loci in 32,500.

Result: 34 matches at 9 loci and one match at 10 loci. For D3, vWA, D8, D5, D13, D7, D16, FGA, D21, D18, THO1, TPOX, CSF1PO, that one is:
(16,20)(14,17)(12,14)(11,12)(11,14)(12,12)(12,13)(20,24)(29,30)(14,15)(6,9.3)(8,8)(10,12)
and
(15,15)(18,18)(12,14)(11,12)(9,11)(12,12)(12,13)(20,24)(29,30)(14,15)(6,9.3)(8,8)(10,12)

Does anyone have any idea of the Arizona allele frequencies? I was expecting to have to decrease the randomness, building in co-ancestry to match real-world results. I cannot, of course, increase the randomness: it is a very well-behaved RNG. If I included a proportion of Afro-Caribbean 'profiles', I would not expect much difference. These sorts of matches are much more heavily biased towards the largest allele frequencies, and for those the Afro-Caribbean values are in some cases higher and in some cases lower than the Caucasian ones.

Analysing the allele frequencies occurring in these 35 matches, as usual the highest standard AFs are increased, middle-ranking ones stay much the same and rare ones in effect drop out.
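The sketch below shows, for a single locus, why the commonest alleles are over-represented among matches. It assumes Hardy-Weinberg genotype frequencies. Two unrelated people share genotype g with probability P(g)², so among matching pairs each genotype is weighted by P(g)², which favours the common alleles. The sketch uses the THO1 frequencies implied by the generator's cumulative bounds, and the names are hypothetical. With these frequencies, allele 9.3 rises from 0.309 to roughly 0.40 and allele 10 almost vanishes. This single-locus figure is not expected to equal the simulated 9-loci values, where any given locus may be one of the non-matching ones.

```python
from itertools import combinations_with_replacement

# THO1 allele frequencies implied by the generator's cumulative bounds.
THO1 = {"6": 0.239, "7": 0.164, "8": 0.109, "9": 0.167, "9.3": 0.309, "10": 0.012}


def genotype_freq(afs, a, b):
    """Hardy-Weinberg genotype frequency."""
    return afs[a] ** 2 if a == b else 2 * afs[a] * afs[b]


def allele_freqs_among_matches(afs):
    """Allele frequencies among pairs of unrelated people who share a genotype.

    Each genotype g is weighted by P(g) ** 2, the chance that both people have it.
    """
    weights = {(a, b): genotype_freq(afs, a, b) ** 2
               for a, b in combinations_with_replacement(afs, 2)}
    total = sum(weights.values())
    out = dict.fromkeys(afs, 0.0)
    for (a, b), w in weights.items():
        out[a] += 0.5 * w / total     # each allele copy carries half the genotype weight
        out[b] += 0.5 * w / total
    return out


matched = allele_freqs_among_matches(THO1)
for allele, freq in THO1.items():
    print(f"THO1 {allele}: {freq:.3f} -> {matched[allele]:.3f}")
```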
For just the most common allele per locus (normal AF to AF among the matches, per cent):
D3 15: 26 to 30
vWA 17: 27 to 31
D8 14: 32 to 48
D5 12: 38 to 46
D13 12: 31 to 46
D7 10: 29 to 31
D16 12: 35 to 45
FGA 21: 18 to 21
D21 29: 25 to 35
D18 14: 18 to 23
THO1 9.3: 31 to 37
TPOX 8: 54 to 76
CSF1PO 11: 31 to 40

I've still not seen confirmation that the Arizona results refer to permuted partial matches (any 9 or 10 of the 13 loci) rather than matches on the first 9 of 13 or first 10 of 13 loci. If the TPOX one is left out, the latter would bring the 13 down to more like 12 in effect, lowering the results.

The results, not converted back to alleles:

Locus 1
"244555664414349C5A77040034","24235566442434465A45040034",9
"25355545451334393548340324","25355545444434463544340324",9
"3324455533143536335A240345","33244555331135443525240345",9
"33343544343534495545230023","33342344343534193545040023",9
"33345655334534495558030034","3334565533453426AA46440034",9
"34334555141334487A57340023","34334555141334343525340323",9
"34345755133544465A24440024","34345755023544693524440034",9
"343555364534342A5744010345","34352536233434135744010334",9
"3435574604354646337B440034","34355745043546383336000034",9
"344546443424554A3A35130024","344546440424556A5A35120024",9
"34455655443314382535340423","34455645463314382534340323",9
"35265544443433462356440233","35245544443433892378440235",9
"35344745044434462544040334","35342745044434465A9A040034",9
"44364446443434685722030024","44344456440234685722110024",9
"44364545343414463734030345","443545453434144A3725030324",9
"45345545233445235A57010024","45345545233345045A48040024",9
"45354555343433442934030324","45354555343433396866230324",9
"45355545043434383338110323","45355545042646383379110333",9
"45456645143435441226340023","45456645143434445826340333",9
"55354535333414493524010334","55144535334414693522010334",9
"553645553411344B3522240045","55364555341134365722240334",9
"55465645141345283523340034","55465645041345242723010034",9
"56255545042434292366240134","56252445042434462347140134",9
"56333556343434232357040044","56255856343434233A38040044",9
"56345656444535695A24240024","56345656344535393547240024",9

Locus 2
"45345536342434363522030334","23344444341234363522030334",9
"453466450414458B5A47440123","343466450414132A5A47440124",9
"55346745340344463345240324","35346745342334393345240324",9
"36355544044534682327010023","25355544041334362327010024",9
"45455644343444483578140024","25455636343444283545140024",9
"44453425441534693522030024","34453556441534693537030024",9
"46453545233514391248013323","45453545233514663A57013323",9
"354656453424148A2569440034","344656453445144A3369440034",9
"34565545331234682327040024","26564645331234682324010024",9

Locus 3
"48144645365545393545040024","33554645135545393545040024",10

So most matches are found in the first 2 processings. These took a few seconds each for dividing and about an hour each for match checking.

Running the >8 percent version with xxxx = 1,055,000 would produce about 32,500 profiles. Then sort each sub-file, reconcatenate and use the dnas5 match checker. Results for 32,488 such profiles:
matches on the first 8 loci = 70
matches at 9 loci = 5
matches at 10 loci = 0

I've not had time to fully check the following possible algorithm, let alone implement it. Its aim is to reduce the number of profiles before the "each with each" check for maximal partial matches.

For 13 loci:
Pair up loci 1 & 2, ..., 11 & 12.
Produce a look-up table of "allele frequencies" (occurrence counts) for each pairing of allele quads.
Then delete all profiles that show a frequency of occurrence of only 1 in 5 or more pairings. That would seem not to throw out the baby with the bathwater, but would need further checking. Loci pairs with one locus matching and one not matching would apparently not have been eligible for inclusion in the next deletion round, if rearranged as the next step.
Re-assign the remaining profiles as 2 & 3, ..., 12 & 13 and repeat.
Then check each remaining profile against each other for 9, 10, 11, 12 and 13 loci partial matches.
For 10 loci the procedure is similar, but the final output is for 8, 9 and 10 loci matches:
Pair up as 1 & 2, ..., 9 & 10 and delete any profile with 3 or more single pairwise occurrences.
Pair up as 2 & 3, ..., 8 & 9, 10 & 1 and delete any profile with 3 or more single pairwise occurrences.

This is my attempt to explain the Arizona partial DNA profile matches. It requires the simple and neat, but surprisingly accurate, "Jan Haugland" approximation for non-integer factorials, going via the Gamma function and back to factorial notation:

(n+a)! ~= n! * (n + (1+a)/2)^a

or else a Gamma function calculator on the net (allowing for the supplementary "1", as x! = Gamma(x+1)). Non-integer factorials arise because the various coefficients of relationship (C of R) lead to statistical combinations such as 6.5 from 9 (for brothers C of R = 1/2, so 13/2) as well as 9 from 13, and so to numbers like 6.5!, 3.25!, 0.5! etc.

For a C of R of 0.0385, or 0.5/13 (half a locus of co-ancestry on average), with T9 meaning the matching chance for 9 loci:
T9 (for AFs > 5.6 per cent, C of R 0.5/13) = 2.6 * 10^-11, giving
134, 9 loci matches
22.4, 10 loci matches
2.2, 11 loci matches
0.07, 12 loci matches
T10 = 1.44*10^-11, T11 = 3.9*10^-12.
On top of that, it is only necessary to add 2 or 3 people from one consanguineous family, increasing the C of R to 7/8, to supply the related 11 and 12 loci matches.
With the AF cut-off at > 6 per cent and C of R 0.4/13, T9 = 3.6 * 10^-11, giving
149, 9 loci matches
39, 10 loci matches
3.1, 11 loci matches
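A quick Python check of the "Jan Haugland" approximation against the exact value from the standard library's `math.gamma` may help (x! = Gamma(x + 1), which is the supplementary "1" mentioned above). The approximation is exact at a = 0 and a = 1. Its error is largest for small n (a little over 2% for 0.5!) and shrinks quickly as n grows. The function names are illustrative only.

```python
import math


def haugland_factorial(n, a):
    """(n + a)! ~= n! * (n + (1 + a) / 2) ** a, for integer n >= 0 and 0 <= a <= 1."""
    return math.factorial(n) * (n + (1 + a) / 2) ** a


def exact_factorial(x):
    """Non-integer factorial via the Gamma function: x! = Gamma(x + 1)."""
    return math.gamma(x + 1)


for n, a in [(0, 0.5), (3, 0.25), (6, 0.5)]:
    approx = haugland_factorial(n, a)
    exact = exact_factorial(n + a)
    print(f"{n + a}!: approx {approx:.4f}, exact {exact:.4f}, "
          f"relative error {abs(approx - exact) / exact:.2%}")
```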
The Arizona data: 144 matches at 9 loci, 22 at 10 loci, 2 related matches at 11 loci and 1 related match at 12 loci, in 65,493 13-loci DNA profiles.

My maths uses the formula for the first match from the loaded dice derivation, but progressively ignores the minor alleles, rescales the rest to give an AF sum of 1, and re-processes until the maths agrees with reality. That is total anathema to forensic 'scientists', but DNA matches only involve the small subset of people with ALL large allele frequencies (like my own DNA profile, presumably because of at least 3 generations of ancestry from only 2 counties): not necessarily the largest at any locus, but certainly none of the minor ones.

For the Arizona near-match above, my T13 value was determined by ignoring all AFs (allele frequencies from the RCMP site) of less than 5.6 per cent. The simulated populations below used 6 per cent as the cut-off.

My "coefficient of allelic co-ancestry" for the Arizona simulation above is T13 = 3.6 * 10^-14 for 13 loci. Thence, via the square law, the minimum number of unrelated CODIS profiles before an evens chance of one 13-loci match is SQRT(2/T13) ~= 7.4 million. Then T is scaled by 715, 286, 78 etc. (the numbers of ways of choosing 9, 10, 11 ... loci from 13) for the T9, T10, T11 etc. partial matches, and then by the non-integer combination factors for 1/2, 1/4, 1/8, 1/26 etc. shared DNA.

The bounds on T13 restrict the AF cut-off to the range > 5.6 per cent to > 6 per cent (C of R between 0.4/13 and 0.5/13). This gives fewer than 144 unrelated 9-loci matches on the one hand and no more than 0.6666 unrelated 11-loci matches on the other.
So, with T9 the matching chance (T) for 9 loci, x9 the number of 9-loci matches, n half the square of the population being considered, C(9,13) the number of combinations of 9 from 13 (715), and C(2.5,9) the number of combinations of 2.5 from 9 (because 6.5 loci, 13/2, match between brothers, say):

x9 = T9 * n * C(9,13) * C(2.5,9)

Attempts to simulate a population to give the Arizona 144, 22, 2, 1 numbers did not work. Even if there were as much as 2.5 per cent cousin-cousin marriage (USA generally less than 1 per cent), that would only contribute a single 9-locus partial match.

It is a juggling optimisation exercise, with the following main constraints. I have kept the unrelated 11-loci partial matches below 0.6666, so less than one when summed and rounded; this precludes putting the co-ancestry coefficient too high. For related matches, the 11-loci figure must be > 1.3333 to give 2 when rounded; this precludes increasing the related numbers too much. I have made sons and brothers non-exclusive to a certain extent, so that, say, 59,000 unrelated plus 6,000 fathers and sons (F+Ss) and 4,000 brothers (B+Bs) can sum to 65,000. I have also added a cross-component of random matches between the related and unrelated sections to the unrelated side; it is relatively minor, but considered.

Target from the Arizona data:
144 pairs at 9 loci
22 pairs at 10 loci
2 pairs at 11 loci
1 pair at 12 loci
......... Unrelated / F+Ss / B+Bs / .... Totals
.......... 59,000 ...6,000 .. 4,000 ... 65,000
9 loci... 54.3 ...... 26.9 ... 19.9 ... 101.1
10 loci . 8.7 ....... 12.4 ... 4.7 .... 25.8
11 loci . 0.64 ...... 1.33 ... 0.6 .... 2.6
12 loci . 0.02 ...... 0.05 ... 0.02 ... 0.09
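The x9 formula given earlier can be written out as a short Python sketch. It uses the exact generalised binomial coefficient (via `math.gamma`) rather than the Haugland approximation. It follows the scaling in the text: T_L = C(13, L) * T13, n = population^2 / 2, and C(L, shared) for the shared loci. With T13 = 3.6 * 10^-14, 65,000 profiles and 0.5 shared loci, it gives about 133 nine-loci and about 22.5 ten-loci matches, close to the 134 and 22.4 quoted earlier. The function names are illustrative.

```python
import math


def gen_comb(n, k):
    """Generalised 'n choose k' for non-integer k: n! / (k! (n - k)!) via Gamma."""
    return math.gamma(n + 1) / (math.gamma(k + 1) * math.gamma(n - k + 1))


def expected_partial_matches(t13, loci, population, shared_loci, total_loci=13):
    """x_L = T_L * n * C(L from total) * C(shared from L), as in the text,
    with T_L = C(L from total) * T13 and n = half the square of the population."""
    ways = math.comb(total_loci, loci)      # written C(9,13) in the text: 715
    t_l = ways * t13
    n = 0.5 * population ** 2
    return t_l * n * ways * gen_comb(loci, shared_loci)


for loci in (9, 10):
    x = expected_partial_matches(3.6e-14, loci, 65_000, 0.5)
    print(loci, "loci:", round(x, 1))
```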

One 12-loci match is easily added by using
one 7/8 consanguinity pair: a grandfather and
a grandson born of an incestuous mating of his son and daughter.
Changing to the following gives a better match but
I do not know how to increase the 9 loci figures
without increasing the 10 loci figures outside
the bounds. Cousin matches do not work
either.

......... Unrelated / F+S / B+B / .... Totals
.......... 59,000 ...1,000 .. 5,500 ... 65,000
9 loci... 54.3 ...... 1.0 ... 51.1 .... 106
10 loci . 8.7 ....... 0.46 ... 12.1 .... 21.3
11 loci . 0.64 ...... 0.05 ... 1.27 .... 1.96
and adding a 7/8 consanguineous pair for the
12 loci match.


So the simpler simulations, using a non-Bayesian coefficient of
co-ancestry of the order of about one allele in 26, give
the closer results, unless anyone has any ideas how to juggle a
hypothetical population of fathers, brothers, cousins, etc.

The following adapts the maths to allow
for an overall population coefficient of relationship,
i.e. various degrees of shared DNA loci from 0.1 in 13 to
3.9 in 13, and also for the cut-down AF data.


Below are VB routines to place between
Sub and End Sub. The first one enters
tabulated allele frequency (AF) data into
a file for later use.


'Allele Frequency data entry routine
' Some error trapping: if the sum of allele frequencies
' is not 1 +/- 0.002 then the routine repeats that
' locus data. If you know you've made an error,
' just enter 1 for each allele to the end and then repeat
' that locus data from the start.
'
' If you manually cut/paste the output file you
' have to add 2 numbers at the end of each line
' before using it in later processing; their values
' are ignored but some numbers have to be there.
ll = InputBox("Enter the number of Loci")
ll = Val(ll)
Dim af(20, 30)
yyyy = InputBox(" Output file number ")
yyyy = Val(yyyy)
temp0 = "allele frequencies" & Str(yyyy)
Open temp0 For Output As #10
For m = 1 To ll
    Beep
    nn = InputBox("Enter the number of alleles in this locus")
    nn = Val(nn)
    If nn < 2 Or nn > 30 Or nn <> Int(nn) Then
        Beep
        nn = InputBox(" *** Error *** Non integer or other problem , Enter the number of alleles in this locus")
        nn = Val(nn)
    End If
    Sn = 0
    Qn = 0
    For p = 1 To nn
        pq = InputBox("Enter the allele frequency as a decimal ")
        pq = Val(pq)
        af(m, p) = pq
        Qn = Qn + (pq * pq)
        Sn = Sn + pq
    Next p
    ' re-enter the whole locus if the AFs do not sum to about 1
    If Sn < 0.998 Or Sn > 1.002 Then
        Beep
        nn = InputBox(" *** Error *** Sum of AFs not approximately 1 , Enter the number of alleles in this locus")
        nn = Val(nn)
        Sn = 0
        Qn = 0
        For p = 1 To nn
            pq = InputBox("Enter the allele frequency ")
            pq = Val(pq)
            'pq = pq / 100 here if data is as percent
            af(m, p) = pq
            Qn = Qn + (pq * pq)
            Sn = Sn + pq
        Next p
    End If
    ' one line per locus: count, AFs, sum, sum of squares
    Write #10, nn,
    For p = 1 To nn
        Write #10, af(m, p),
    Next p
    Write #10, Sn, Qn
Next m
Close #10
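For anyone preparing the AF files by hand, a Python sketch of a checker for one line of the routine's output may help. It assumes the layout written by the VB above: the number of alleles, the allele frequencies, then the two stored numbers (sum and sum of squares) that later processing reads but ignores. `parse_af_line` is a hypothetical helper, not part of the original routines.

```python
def parse_af_line(line, tolerance=0.002):
    """Parse 'count, AF_1, ..., AF_count, sum, sum_of_squares' and
    return the list of allele frequencies."""
    fields = [float(x) for x in line.split(",") if x.strip()]
    count = int(fields[0])
    # the two trailing numbers must be present, although their values are ignored
    if len(fields) != count + 3:
        raise ValueError(f"expected {count + 3} numbers, found {len(fields)}")
    afs = fields[1:count + 1]
    if abs(sum(afs) - 1) > tolerance:
        raise ValueError(f"allele frequencies sum to {sum(afs):.4f}, not 1")
    return afs


print(parse_af_line("4,.4,.3,.2,.1,1,.3"))
```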
'13 loci processing
' If it fails to process, check that in the input file the
' first integer on each line equals the number of values after it minus 2,
' or, if it is a cut-and-paste job, concatenate all rows into
' one, replacing each end-of-line character with a comma.
ll = InputBox("Enter the number of Loci")
ll = Val(ll)
yy = InputBox(" Input file number ")
yy = Val(yy)
temp1 = "allele frequencies" & Str(yy)
yyyy = InputBox(" Output file number ")
yyyy = Val(yyyy)
temp2 = "allele frequencies" & Str(yyyy)
rr = InputBox("Enter the cut-off AF as percent, process only > this value ")
rr = Val(rr)
rr = rr / 100
Dim af(20, 30)
Open temp1 For Input As #1
Ms = 0
Mt = 1
For m = 1 To ll
    Sr = 0
    Qr = 0
    Ss = 0
    Qs = 0
    Input #1, rs
    For q = 1 To rs
        Input #1, rq
        ' keep only the alleles above the cut-off
        If rq > rr Then
            Qr = Qr + rq * rq
            Sr = Sr + rq
        End If
    Next q
    ' skip the stored sum and sum of squares
    Input #1, ru
    Input #1, rv
    ' sum of squares after rescaling the kept AFs to sum to 1
    Ar = Sr * Sr + 0.000001
    Ar = 1 / Ar
    Ss = Ar * Qr
    Qs = Ss * Ss
    ' per-locus factor S^2 * (2 - S), multiplied over all loci
    Ms = (2 - Ss) * Qs
    Mt = Ms * Mt
Next m
Close #1
Open temp2 For Output As #4
vv = InputBox("Enter the number in population, no commas ")
vv = Val(vv)
Write #4, Mt
' n = half the square of the population
vp = 0.5 * vv * vv
t13 = Mt
t9 = 715 * t13
t10 = 286 * t13
t11 = 78 * t13
t12 = 13 * t13
Write #4, "AF cut-off =", 100 * rr, "population n = ", vp, "T13", t13
For k = 0 To 3
    XL = 9 + k
    ' j / 10 = shared (co-ancestry) loci, 0.1 to 3.9
    For j = 1 To 39
        uu2 = Int(j / 10)
        uu3 = (j - uu2 * 10) / 10
        If uu2 = 0 Then XLf = XL
        If uu2 = 1 Then XLf = XL * (XL - 1)
        If uu2 = 2 Then XLf = XL * (XL - 1) * (XL - 2) / 2
        If uu2 = 3 Then XLf = XL * (XL - 1) * (XL - 2) * (XL - 3) / 6
        ' the following statistical combinations factor
        ' uses the "Jan Haugland" approximation for n!
        ' (n+a)! == n! * (n + (1+a)/2 )^a or
        ' for greater accuracy use a Gamma Function Calculator
        ' or look-up tables
        uu5 = Exp((1 - uu3) * (Log((XL - uu2 - 1) + (2 - uu3) / 2)))
        uu6 = Exp((uu3) * (Log(uu2 + (1 + uu3) / 2)))
        uu7 = uu5 * uu6
        ' C(13, 11) = 78, matching t11 above
        x9 = 715 * XLf * vp * t9 / uu7
        x10 = 286 * XLf * vp * t10 / uu7
        x11 = 78 * XLf * vp * t11 / uu7
        x12 = 13 * XLf * vp * t12 / uu7
        If k = 0 Then
            Write #4, 9 + k, j / 10, 0.001 * Int(1000 * x9 + 0.5)
        End If
        If k = 1 Then
            Write #4, 9 + k, j / 10, 0.001 * Int(1000 * x10 + 0.5)
        End If
        If k = 2 Then
            Write #4, 9 + k, j / 10, 0.001 * Int(1000 * x11 + 0.5)
        End If
        If k = 3 Then
            Write #4, 9 + k, j / 10, 0.001 * Int(1000 * x12 + 0.5)
        End If
    Next j
Next k
Close #4
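To see what the `XLf` and `uu5`/`uu6` lines in these routines compute, here is a line-by-line Python transcription. It is a sketch, and the variable names mirror the VB. `XLf / uu7` approximates the generalised binomial coefficient C(XL, j/10), i.e. the number of ways of choosing j/10 shared loci out of XL, via the Haugland form. It is exact when j/10 is a whole number. For 0.5 shared loci out of 9 it comes out about 2% above the Gamma-function value from `exact_comb`.

```python
import math


def vb_comb_factor(XL, j):
    """Transcription of the VB lines computing XLf / uu7 for j = 1 .. 39."""
    uu2 = j // 10                   # whole part of j / 10 (VB Int for positive j)
    uu3 = (j - uu2 * 10) / 10       # fractional part of j / 10
    if uu2 == 0:
        XLf = XL
    elif uu2 == 1:
        XLf = XL * (XL - 1)
    elif uu2 == 2:
        XLf = XL * (XL - 1) * (XL - 2) / 2
    else:  # uu2 == 3
        XLf = XL * (XL - 1) * (XL - 2) * (XL - 3) / 6
    uu5 = math.exp((1 - uu3) * math.log((XL - uu2 - 1) + (2 - uu3) / 2))
    uu6 = math.exp(uu3 * math.log(uu2 + (1 + uu3) / 2))
    return XLf / (uu5 * uu6)


def exact_comb(n, k):
    """Generalised n choose k via the Gamma function."""
    return math.gamma(n + 1) / (math.gamma(k + 1) * math.gamma(n - k + 1))


for j in (5, 10, 25):
    print(j / 10, round(vb_comb_factor(9, j), 3), round(exact_comb(9, j / 10), 3))
```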
' For 10 loci maximum
ll = InputBox("Enter the number of Loci")
ll = Val(ll)
yy = InputBox(" Input file number ")
yy = Val(yy)
temp1 = "allele frequencies" & Str(yy)
yyyy = InputBox(" Output file number ")
yyyy = Val(yyyy)
temp2 = "allele frequencies" & Str(yyyy)
rr = InputBox("Enter the cut-off AF as percent, process only > this value ")
rr = Val(rr)
rr = rr / 100
Dim af(20, 30)
Open temp1 For Input As #1
Ms = 0
Mt = 1
For m = 1 To ll
    Sr = 0
    Qr = 0
    Ss = 0
    Qs = 0
    Input #1, rs
    For q = 1 To rs
        Input #1, rq
        If rq > rr Then
            Qr = Qr + rq * rq
            Sr = Sr + rq
        End If
    Next q
    Input #1, ru
    Input #1, rv
    Ar = Sr * Sr + 0.000001
    Ar = 1 / Ar
    Ss = Ar * Qr
    Qs = Ss * Ss
    Ms = (2 - Ss) * Qs
    Mt = Ms * Mt
Next m
Close #1
Open temp2 For Output As #4
vv = InputBox("Enter the number in population, no commas ")
vv = Val(vv)
Write #4, Mt
vp = 0.5 * vv * vv
t10 = Mt
t6 = 210 * t10
t7 = 120 * t10
t8 = 45 * t10
t9 = 10 * t10
Write #4, "AF cut-off =", 100 * rr, "population n = ", vp, "T10", t10
For k = 0 To 4
    XL = 6 + k
    For j = 1 To 39
        uu2 = Int(j / 10)
        uu3 = (j - uu2 * 10) / 10
        If uu2 = 0 Then XLf = XL
        If uu2 = 1 Then XLf = XL * (XL - 1)
        If uu2 = 2 Then XLf = XL * (XL - 1) * (XL - 2) / 2
        If uu2 = 3 Then XLf = XL * (XL - 1) * (XL - 2) * (XL - 3) / 6
        ' the following statistical combinations factor
        ' uses the "Jan Haugland" approximation for n!
        ' (n+a)! == n! * (n + (1+a)/2 )^a or
        ' for greater accuracy use a Gamma Function Calculator
        ' or look-up tables
        uu5 = Exp((1 - uu3) * (Log((XL - uu2 - 1) + (2 - uu3) / 2)))
        uu6 = Exp((uu3) * (Log(uu2 + (1 + uu3) / 2)))
        uu7 = uu5 * uu6
        x6 = 210 * XLf * vp * t6 / uu7
        x7 = 120 * XLf * vp * t7 / uu7
        x8 = 45 * XLf * vp * t8 / uu7
        x9 = 10 * XLf * vp * t9 / uu7
        x10 = XLf * vp * t10 / uu7
        If k = 0 Then
            Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x6 + 0.5)
        End If
        If k = 1 Then
            Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x7 + 0.5)
        End If
        If k = 2 Then
            Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x8 + 0.5)
        End If
        If k = 3 Then
            Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x9 + 0.5)
        End If
        If k = 4 Then
            Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x10 + 0.5)
        End If
    Next j
Next k
Close #4
' For 9 loci maximum
ll = InputBox("Enter the number of Loci")
ll = Val(ll)
yy = InputBox(" Input file number ")
yy = Val(yy)
temp1 = "allele frequencies" & Str(yy)
yyyy = InputBox(" Output file number ")
yyyy = Val(yyyy)
temp2 = "allele frequencies" & Str(yyyy)
rr = InputBox("Enter the cut-off AF as percent, process only > this value ")
rr = Val(rr)
rr = rr / 100
Dim af(20, 30)
Open temp1 For Input As #1
Ms = 0
Mt = 1
For m = 1 To ll
    Sr = 0
    Qr = 0
    Ss = 0
    Qs = 0
    Input #1, rs
    For q = 1 To rs
        Input #1, rq
        If rq > rr Then
            Qr = Qr + rq * rq
            Sr = Sr + rq
        End If
    Next q
    Input #1, ru
    Input #1, rv
    Ar = Sr * Sr + 0.000001
    Ar = 1 / Ar
    Ss = Ar * Qr
    Qs = Ss * Ss
    Ms = (2 - Ss) * Qs
    Mt = Ms * Mt
Next m
Close #1
Open temp2 For Output As #4
vv = InputBox("Enter the number in population, no commas ")
vv = Val(vv)
Write #4, Mt
vp = 0.5 * vv * vv
t9 = Mt
t6 = 84 * t9
t7 = 36 * t9
t8 = 9 * t9
Write #4, "AF cut-off =", 100 * rr, "population n = ", vp, "t9", t9
For k = 0 To 3
    XL = 6 + k
    For j = 1 To 39
        uu2 = Int(j / 10)
        uu3 = (j - uu2 * 10) / 10
        If uu2 = 0 Then XLf = XL
        If uu2 = 1 Then XLf = XL * (XL - 1)
        If uu2 = 2 Then XLf = XL * (XL - 1) * (XL - 2) / 2
        If uu2 = 3 Then XLf = XL * (XL - 1) * (XL - 2) * (XL - 3) / 6
        ' the following statistical combinations factor
        ' uses the "Jan Haugland" approximation for n!
        ' (n+a)! == n! * (n + (1+a)/2 )^a or
        ' for greater accuracy use a Gamma Function Calculator
        ' or look-up tables
        uu5 = Exp((1 - uu3) * (Log((XL - uu2 - 1) + (2 - uu3) / 2)))
        uu6 = Exp((uu3) * (Log(uu2 + (1 + uu3) / 2)))
        uu7 = uu5 * uu6
        x6 = 84 * XLf * vp * t6 / uu7
        x7 = 36 * XLf * vp * t7 / uu7
        x8 = 9 * XLf * vp * t8 / uu7
        x9 = 1 * XLf * vp * t9 / uu7
        If k = 0 Then
            Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x6 + 0.5)
        End If
        If k = 1 Then
            Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x7 + 0.5)
        End If
        If k = 2 Then
            Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x8 + 0.5)
        End If
        If k = 3 Then
            Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x9 + 0.5)
        End If
    Next j
Next k
Close #4
' For 6 loci maximum
ll = InputBox("Enter the number of Loci")
ll = Val(ll)
yy = InputBox(" Input file number ")
yy = Val(yy)
temp1 = "allele frequencies" & Str(yy)
yyyy = InputBox(" Output file number ")
yyyy = Val(yyyy)
temp2 = "allele frequencies" & Str(yyyy)
rr = InputBox("Enter the cut-off AF as percent, process only > this value ")
rr = Val(rr)
rr = rr / 100
Dim af(20, 30)
Open temp1 For Input As #1
Ms = 0
Mt = 1
For m = 1 To ll
    Sr = 0
    Qr = 0
    Ss = 0
    Qs = 0
    Input #1, rs
    For q = 1 To rs
        Input #1, rq
        If rq > rr Then
            Qr = Qr + rq * rq
            Sr = Sr + rq
        End If
    Next q
    Input #1, ru
    Input #1, rv
    Ar = Sr * Sr + 0.000001
    Ar = 1 / Ar
    Ss = Ar * Qr
    Qs = Ss * Ss
    Ms = (2 - Ss) * Qs
    Mt = Ms * Mt
Next m
Close #1
Open temp2 For Output As #4
vv = InputBox("Enter the number in population, no commas ")
vv = Val(vv)
Write #4, Mt
vp = 0.5 * vv * vv
t6 = Mt
Write #4, "AF cut-off =", 100 * rr, "population n = ", vp, "T6", t6
XL = 6
' j only runs to 9 here (for-next and integer problem),
' so uu2 is always 0 and XLf = XL
For j = 1 To 9
    uu2 = Int(j / 10)
    uu3 = (j - uu2 * 10) / 10
    XLf = XL
    ' the following statistical combinations factor
    ' uses the "Jan Haugland" approximation for n!
    ' (n+a)! == n! * (n + (1+a)/2 )^a or
    ' for greater accuracy use a Gamma Function Calculator
    ' or look-up tables
    uu5 = Exp((1 - uu3) * (Log((XL - uu2 - 1) + (2 - uu3) / 2)))
    uu6 = Exp((uu3) * (Log(uu2 + (1 + uu3) / 2)))
    uu7 = uu5 * uu6
    x6 = XLf * vp * t6 / uu7
    Write #4, 6, j / 10, 0.001 * Int(1000 * x6 + 0.5)
Next j
Close #4
' Partial match checker, set for 10 loci.
' It has found all deliberately seeded partial match profiles,
' but there is no guarantee it will not miss some: profiles in
' odd-numbered array rows never move, so only pairs made of one
' even-row and one odd-row profile are ever compared.
' Each pass through the array is linear in the number of profiles,
' but about half as many passes as profiles are needed, so the total
' time grows roughly with the square of the number of profiles.
Dim ps As String
Dim pt As String
Dim ph(13)
Dim pk(13)
Dim Array2(40000, 3)
Dim Array3(40000, 3)
count3 = 0
temp0 = "a_jan_19.txt"
temp1 = "a_jan_20_test"
temp2 = "a_jan_20_out"
temp3 = "a_jan_20_out2"
Open temp0 For Input As #10
Open temp1 For Output As #11
Open temp2 For Output As #12
Open temp3 For Output As #13
nn = 0
' load the 10 loci source file into array column 1
Do While (EOF(10) = False)
    Input #10, pt
    Array2(nn, 1) = Mid(pt, 1, 20)
    nn = nn + 1
Loop
endlim = nn
Close #10
' fix the original file place number of each profile in array column 3
For nn = 0 To endlim - 1
    Array2(nn, 3) = nn
Next nn
Write #12, "number of iterations and output partial matches"
' nnn: some number more than half the number of profiles
For nnn = 0 To 20000
    For nn = 0 To endlim - 1 Step 2
        ps = Array2(nn, 1)
        pt = Array2(nn + 1, 1)
        ' split each profile into 10 two-character loci
        For j = 1 To 10
            ph(j) = Mid(ps, 2 * j - 1, 2)
            pk(j) = Mid(pt, 2 * j - 1, 2)
        Next j
        ' compare adjacent pairs for partial matches
        Count = 0
        For j = 1 To 10
            If ph(j) = pk(j) Then Count = Count + 1
        Next j
        ' number of matching loci to array column 2
        Array2(nn, 2) = Count
        Array2(nn + 1, 2) = Count
        ' output pairs with 5 or more matching loci
        If Count > 4 Then
            Write #12, Array2(nn, 1), Array2(nn, 2), Array2(nn, 3)
            Write #12, Array2(nn + 1, 1), Array2(nn + 1, 2), Array2(nn + 1, 3)
        End If
    Next nn
    ' rotate the profiles in the even rows by one place
    For nn = 0 To endlim - 3 Step 2
        Array3(nn, 1) = Array2(nn, 1)
        Array3(nn, 2) = Array2(nn, 2)
        Array3(nn, 3) = Array2(nn, 3)
        Array3(nn + 2, 1) = Array2(nn + 2, 1)
        Array3(nn + 2, 2) = Array2(nn + 2, 2)
        Array3(nn + 2, 3) = Array2(nn + 2, 3)
        Array2(nn, 1) = Array3(nn + 2, 1)
        Array2(nn + 2, 1) = Array3(nn, 1)
        Array2(nn, 2) = -1
        Array2(nn + 2, 2) = -1
        Array2(nn, 3) = Array3(nn + 2, 3)
        Array2(nn + 2, 3) = Array3(nn, 3)
        ' output snapshots of the processed array every 100 passes
        If nnn / 100 = Int(nnn / 100) Then
            Write #13, Array2(nn, 1), Array2(nn, 2), Array2(nn, 3)
            Write #13, Array2(nn + 1, 1), Array2(nn + 1, 2), Array2(nn + 1, 3)
            'Write #13, Array2(nn + 2, 1), Array2(nn + 2, 2), Array2(nn + 2, 3)
        End If
    Next nn
    Write #12, nnn
    ' end loop when the first profile has reached the end
    ' and the last profile has reached the first row in the array
    If Array2(0, 3) = endlim - 1 Then count3 = count3 + 1
    If Array2(endlim - 1, 3) = 0 Then count3 = count3 + 1
    If count3 = 2 Then
        Exit For
    End If
    ' single iteration beep
    For beepc = 1 To 1
        For beept = 1 To 200
            beepu = 1 / beept
        Next beept
        Beep
    Next beepc
    ' 100 fold iteration beep
    If nnn / 100 = Int(nnn / 100) Then
        For beepc = 1 To Int(nnn / 100)
            For beept = 1 To 20000
                beepu = 1 / beept
            Next beept
            Beep
        Next beepc
    End If
Next nnn
j = 2000
' beeps after 1000s
If j / 1000 = Int(j / 1000) Then
    For beepc = 1 To (j / 1000)
        For beept = 1 To 20000
            beepu = 1 / beept
        Next beept
        Beep
    Next beepc
End If
Close #11
Close #12
Close #13
' end beep
For beepc = 1 To 2
    For beept = 1 To 200000
        beepu = 1 / beept
    Next beept
    Beep
Next beepc
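The rotation scheme above keeps the odd-row profiles fixed, so it never compares two profiles that both start in even rows (or both in odd rows). A plain each-with-each comparison avoids that, at the cost of N(N-1)/2 comparisons, roughly 2.1 billion for 65,000 profiles. Below is a Python sketch of that approach. It assumes 20-character records with 2 characters per locus, as the checker reads; `partial_matches` is a hypothetical helper.

```python
from itertools import combinations


def partial_matches(profiles, n_loci=10, min_loci=5):
    """Compare every profile with every other one and return
    (row_i, row_j, matching_loci) for pairs matching at >= min_loci loci.

    Each profile is assumed to hold 2 characters per locus, as in the
    20-character records read by the VB checker.
    """
    split = [[p[2 * k: 2 * k + 2] for k in range(n_loci)] for p in profiles]
    hits = []
    for i, j in combinations(range(len(profiles)), 2):
        same = sum(a == b for a, b in zip(split[i], split[j]))
        if same >= min_loci:
            hits.append((i, j, same))
    return hits


p0 = "01020304050607080910"
p1 = p0[:12] + "90919293"       # matches p0 at loci 1 to 6
p2 = p0[:10] + "8081828384"     # matches p0 (and p1) at loci 1 to 5
print(partial_matches([p0, p1, p2]))   # [(0, 1, 6), (0, 2, 5), (1, 2, 5)]
```

The (0, 2) pair, made of two even rows, is the kind the rotation scheme never reaches.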
Results, using the AF data from "Allele Frequencies for 15 Autosomal STR Loci on U.S. Caucasian, African American, and Hispanic Populations", J Forensic Sci, July 2003, Vol. 48, No. 4, Paper ID JFS2003045_484, published 19 May 2003, to approximate the Arizona AFs; Native American data from J. of Forensic Sciences 2006, Vol 51, pt 6, 1410-1413 (error in the Caucasian D7 listing: allele 9 should probably read 15.313, not 5.313, as the sum is not 1); and the UK and OZ data previously detailed.

"Arizona mixed" approximates the prison population of Arizona, using its 2 major components from http://acjc.state.az.us/pubs/home/Crime_Trends_2005.pdf page 46: Hispanic 46%, Caucasian 35%, scaled to 100%.

The best fits for the Arizona data are in the region of a 5 per cent AF inclusion/exclusion cut-off with 0.7 to 0.8 loci of co-ancestry, and a 6 per cent cut-off with 0.5 to 0.6 loci of co-ancestry. I have not gone with the 8% cut-off because AF tables are discrete rather than continuous, which makes the calculations very hit and miss in that region, as there are so few data points.

The Arizona data target is 144 at 9 loci / 22 at 10 loci / 2 at 11 loci (related) / 1 related 12-loci match.

For the Hispanic/Caucasian mixed, 65,000 simulated Arizona population set (numbers of matches):

AF inc/exc .. co-anc (shared loci) .. 9 loci .. 10 loci .. 11 loci
0.01% ....... 2.6 ................... 100 ..... 21.6 ..... 2.0
0.01% ....... 2.7 ................... 108 ..... 23.6 ..... 2.3
0.01% ....... 3.1 ................... 140 ..... - ........ -
0.01% ....... 3.2 ................... 148 ..... - ........ -
1% .......... 2.1 ................... 109 ..... 19.6 ..... 2.0
1% .......... 2.2 ................... 121 ..... 24.8 ..... 2.2
1% .......... 2.3 ................... 133 ..... 27.7 ..... 2.5
1% .......... 2.4 ................... 146 ..... 30.8 ..... 2.8
3% .......... 1.3 ................... 114 ..... 20.9 ..... 1.7
3% .......... 1.4 ................... 132 ..... 24.6 ..... 2.0
3% .......... 1.5 ................... 152 ..... - ........ -
5% .......... 0.7 ................... 127 ..... 21.8 ..... 1.7
5% .......... 0.8 ................... 153 ..... 26.6 ..... 2.0
6% .......... 0.5 ................... 128 ..... 21.6 ..... 1.6
6% .......... 0.6 ................... 153 ..... 26.8 ..... 2.1
8% .......... 0.1 ................... 129 ..... 20.8 ..... 1.5
8% .......... 0.2 ................... 168 ..... 27.4 ..... 2.0

(0.01% means all AFs are taken into account; 1% means excluding all AFs of less than 1% and rescaling, and similarly for > 3% etc.; - means no figure given.)

How to interpret related versus unrelated matches out of this I don't know. Projecting to 13 loci, the first CODIS match would then require 2.3 to 2.57 million profiles, compared with something like 20.5 million for completely
random profiles, i.e. Bayesian independence.

Interestingly, because of the peaky modal AF structure for Native Americans, these are the results for 65,000 of such a population, using the 5% (0.7/0.8) and 6% (0.5/0.6) co-ancestry factors. For 5% to 6%:
9 loci: between 2394 and 6454
10 loci: 412 to 1099
11 loci: 32 to 84
12 loci: 1 to 2.5

For the UK, 10 loci but the same co-ancestry treatment, for a representative UK population and the NDNAD total of 3.2 million samples: for 5 to 6 per cent, 140 to 177 10-loci matches (totally random simulations show about 2 unrelated 10-loci matches in 3.2 million).

For 30 million (half the population), 5 to 6 per cent means 12,300 to 15,600 10-loci matches. That is roughly 14,000 in 30 million, or a 1 in 2,150 chance for any one person in the UK to have a match with someone else in the UK. This stands until the corrupt keepers of such databases distinguish related from unrelated matches, e.g. among the 22 Arizona 10-loci matches.

www.aph.gov.au/hansard/senate/commttee/s8081.pdf refers to 144,546 Australian DNA profiles in 2005:
if all were Caucasian, 4.3 to 6.2 9-loci matches in 144,546;
if all were Aboriginal, 12 to 41 9-loci matches.

I mixed the 2 principal Arizona allele frequency sets, Hispanic and Caucasian, in the ratio 46 to 35 scaled to 100, to represent the Arizona population. Then I found the minimum AF and co-ancestry factors that gave between 127 and 153 9-loci partial matches and 21.6 to 26.8 10-loci partial matches. I then applied those coefficients to a general USA (half) population of 151 million and 13 loci, using USA AFs. This gave the fundamental number of profiles for an evens chance of 1 unrelated full 13-loci match as between 2.3 and 2.57 million. Scaling to 151 million by the square law then gave between 3,500 and 4,300 matches, so an evens chance of an unrelated false match for between 1 in 35,000 and 1 in 44,000 people in the USA population. For a 2008 CODIS total of 6 million DNA profiles, that means about 5 to 7 unrelated false matches.
In other words, in a large arena containing 40,000 males there is a better than evens chance of someone there being falsely matched to an unrelated person in the USA.

Corresponding figures for the UK, 10 loci and 30 million (half the population): between 12,300 and 15,600 unrelated false matches in 30 million, or between 1 in 1,900 and 1 in 2,400, so a large hall or small arena of men.

For Australia, 9 loci and 10.5 million (half the population): between 22,700 and 32,700 unrelated false matches in 10.5 million, or between 1 in 320 and 1 in 460, so just a small hall.

The only other disclosed database data I am aware of is from Forensic Science International 95 (1998) p30: 10 matches at 6 loci in 6,311 profiles. Adapting the UK 10-loci routine back to the 6-loci mid-90s SGM structure, the result for 6,311 profiles is between 5.8 and 7.7 matches, so the maths is at least consistent.
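The scaling steps in the last few paragraphs all rest on the square law. If N is the database size where p*N^2/2 = 1 (the "evens chance" size used throughout), the expected number of false matches at size M is (M/N)^2, and an observed count scales by the square of the ratio of database sizes. A small Python sketch reproducing some of that arithmetic follows; the function names are illustrative.

```python
def matches_at_size(m, n_evens):
    """Square law: with p * n_evens**2 / 2 = 1, expected matches at size m is (m / n_evens)**2."""
    return (m / n_evens) ** 2


def scale_count(observed, from_size, to_size):
    """Scale an observed match count from one database size to another."""
    return observed * (to_size / from_size) ** 2


usa = [matches_at_size(151e6, n) for n in (2.57e6, 2.3e6)]   # roughly 3,450 and 4,310
codis = [matches_at_size(6e6, n) for n in (2.57e6, 2.3e6)]   # roughly 5.5 and 6.8
uk = [scale_count(c, 3.2e6, 30e6) for c in (140, 177)]       # roughly 12,300 and 15,560
print([round(x) for x in usa], [round(x, 1) for x in codis], [round(x) for x in uk])
print("1 in", round(151e6 / usa[1]), "to 1 in", round(151e6 / usa[0]))
```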
Repeated background maths
Derivation of the Square Law concerning DNA databases.
Acknowledgement to PeteM ,02 July ,2003
Arrange the N members of the population in a list m1, m2, .... mN.
The probability of a profile match between two members selected at random is i.
The expected number of matches between m1 and subsequent individuals in the list (m2,m3,m4 ...) is (N-1)i
The expected number of matches between m2 and subsequent individuals in the list (m3,m4,m5 ...) is (N-2)i.
And so on. So the total number of expected matches (including triples etc) is
[sigma from j=1 to j=N] {i*(j-1)} = iN(N-1)/2 ~ 0.5iN^2
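The sum above can be checked numerically in Python. Adding up the expected matches of each member with those after it in the list gives exactly i*N(N-1)/2, which is close to 0.5*i*N^2 for large N. The values of N and i below are arbitrary illustrations.

```python
def expected_matches_by_sum(n_members, i):
    # member k (1-based) is compared with the n_members - k members after it
    return sum((n_members - k) * i for k in range(1, n_members + 1))


def expected_matches_closed_form(n_members, i):
    return i * n_members * (n_members - 1) / 2


N, i = 2_000, 1e-6
print(expected_matches_by_sum(N, i), expected_matches_closed_form(N, i), 0.5 * i * N ** 2)
```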

General formula for deriving the minimum number of profiles in a database before false matches occur
Starting with a simpler analogue
Consider a 10-faced loaded dice weighted so that faces 0 and 1 have a probability of 0.2 each, faces 2 and 3 a probability of 0.15 each, and faces 4 to 9 a probability of 0.05 each.
Toss it 10 times and record the 10-digit number. Repeat n times. Determine the number N at which a repeat of a previously occurring 10-digit number is likely to occur.
The probability of a random pair of single digits matching is the sum of squares = 2(0.2^2) + 2(0.15^2) + 6(0.05^2) = 0.14. The digits in each of the 10 positions are independent, so the overall probability of all 10 digits matching is (sum of squares)^10 ~= 2.893e-9; call this p.
To generate N numbers, there are N(N-1)/2 pairs of numbers which must all be different to avoid a repeat. If the pairs were independent then the expected number of repeats would be pN(N-1)/2, which will be 1 when N is about 26,000. The pairs won't actually be independent, but this estimate for the expected value should be fairly close for N << 1/p.
N = SQRT(2/p)
By comparison, if the numbers were unbiased there would be about 1 repeat in the first 140,000 numbers.
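The loaded-dice numbers can be reproduced, and checked by simulation, with a short Python sketch. The simulation counts generated numbers that repeat an earlier one. With a fixed seed it is only one random run, so the count at N ~= 26,000 will scatter around 1 rather than equal it.

```python
import math
import random

FACES = range(10)
WEIGHTS = [0.2, 0.2, 0.15, 0.15] + [0.05] * 6

s = sum(w * w for w in WEIGHTS)      # chance two single digits match: 0.14
p = s ** 10                          # chance two 10-digit numbers match
n_biased = math.sqrt(2 / p)          # size for about one expected repeat
n_fair = math.sqrt(2 / 0.1 ** 10)    # the same for an unbiased dice
print(f"s = {s:.2f}, p = {p:.4g}, N = {n_biased:,.0f} (unbiased: {n_fair:,.0f})")


def count_repeats(n_numbers, seed=1):
    """Generate n_numbers 10-digit numbers and count those seen before."""
    rng = random.Random(seed)
    seen, repeats = set(), 0
    for _ in range(n_numbers):
        number = tuple(rng.choices(FACES, weights=WEIGHTS, k=10))
        repeats += number in seen
        seen.add(number)
    return repeats


print("repeats in one simulated run:", count_repeats(round(n_biased)))
```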
Now convert to factor in directed pairs.
If all pairs were directed then, taking 2 digits at a time, the new directed pair (dp) probability would be 2S*S, where S = 0.14 is the single-digit sum of squares. But the doublets 00, 11, 22 etc. are not directed, so 2S*S is inflated by the probability of just the doublets; deduct this factor (S^3) from the formula.
For the 10 digits taken as 5 pairs, the factor dp now becomes (2 * 0.14^2 - 0.14^3)^5.
Now convert to the DNA profile situation and formula becomes
For n loci, numbered 1 ... n (e.g. 5, 6, 9, 10, 13, 15 or any number),
with m (valid) alleles at each locus, and 2 alleles per person at each locus.
So Allele Frequencies are AF1 ..... AFm
Let Sn be the sum of the squares of AFs at locus n
ie Sn = AF1^2 + AF2^2 +...... + AFm^2 for each n
Let Qn = Sn^2 for each n
Let p = (Q1 * Q2 * .... * Qn ) [(2-S1) * (2-S2) * .... * (2-Sn)]
Then the minimum number N before an evens chance of finding a match is
N = SQRT (2/p)
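The general formula translates directly into a few lines of Python. This sketch takes one list of allele frequencies per locus. The optional `cutoff` is an addition here: it mirrors the "ignore AFs below x per cent and rescale" step used in the main text and in the VB routines, dropping minor alleles and rescaling the rest to sum to 1. Treating the loaded dice as 5 "loci" reproduces the directed-pair factor (2 * 0.14^2 - 0.14^3)^5, and T13 = 3.6 * 10^-14 gives the 7.4 million quoted earlier. The function names are illustrative.

```python
import math


def match_probability(loci_afs, cutoff=0.0):
    """p = product over loci of S^2 * (2 - S), where S is the sum of squared AFs
    after dropping AFs <= cutoff and rescaling the remainder to sum to 1."""
    p = 1.0
    for afs in loci_afs:
        kept = [f for f in afs if f > cutoff]
        if not kept:
            raise ValueError("no allele frequency above the cut-off at a locus")
        s = sum(f * f for f in kept) / sum(kept) ** 2
        p *= s * s * (2 - s)
    return p


def evens_size(p):
    """N = SQRT(2/p): database size for about one expected match."""
    return math.sqrt(2 / p)


die = [0.2, 0.2, 0.15, 0.15] + [0.05] * 6
print(match_probability([die] * 5), (2 * 0.14 ** 2 - 0.14 ** 3) ** 5)
print(f"{evens_size(3.6e-14):,.0f}")
```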

An interesting study analysing the Troyer Arizona disclosure. Its author is of the opinion that there are 120 rather than 144 9-loci matches (inclusive of higher-order matches):
http://www.ias.ac.in/jgenet/Vol87No2/temp/jgen00133.pdf
Can simple population genetic models reconcile partial match
frequencies observed in large forensic databases?
LAURENCE D. MUELLER
Department of Ecology and Evolutionary Biology, University of California,
Irvine, CA 92697-2525, USA

Surprisingly, there is a lot in there I have to agree with from my own
analysis. But he cannot get a reliable handle on the sub-patterning of
the 9/10-loci partial match numbers, and consequently cannot scale up to
whole populations and full 13-loci false matches, which makes the
exercise rather futile.

Some quotes from his article worth repeating here:

If some combination of theta, and relatives can correctly
predict the number of 9-locus matches but not the number of
10-locus matches then it is an unsatisfactory explanation of
the Arizona observations.
...

The results (figure 3) show that as theta increases the
number of matches at 9 and 10 loci increase but the number of matches
at 9 loci increase faster than the number of
matches at 10 loci.

...

Adding pairs of full sibs to the Arizona database increases both the number of 9-locus and 10-locus matches (figure 5), but as in the substructure-only-simulations, the number of 9-locus matches quickly exceeds the number in Arizona well before the number of 10-locus matches is even close to 20. Consequently, no models that add sibs alone can adequately explain the Arizona observations.
...

The general findings are that acceptable parameter values require fewer pairs of siblings as theta increases. The range of sibling pairs that produce an adequate description of the Arizona observations is relatively narrow. Thus, if the true number of sibling pairs was much less than 1000, or much greater than 3000, then none of these models would produce reliable predictions of the observed number of matches. The claim that there is a relatively narrow parameter range that explains the Arizona results can be put into perspective as follows. If 150 9-locus matches and 15 10-locus matches had actually been observed in Arizona, then virtually all simulations in figures 6–8 would have been consistent with this result.

...
These results (figure 9) show that even with a database composed almost entirely of parent–offspring pairs the number of matches at 10 loci is far below the Arizona value. From these results it is reasonable to conclude that the only relatives that would possibly contribute to explaining the Arizona observations are full sibs.

...

Not any number of siblings will work.
...

An additional method for studying these problems would be to get the profiles from two different states, say Arizona and Maryland. The number of matches within databases could be compared to the number between databases. This latter number would not be expected to be inflated by numerous full sibs and thus should be close to the numbers predicted by substructure only.
It is clear from these simulations that, even for the best
models, the probability of the Arizona observations is only
9%–12%. The study of additional offender databases would
help add to the empirical foundation of this study and help
assess whether Arizona is the norm or, for some reason, an
odd outlier. Ultimately, if the simple models examined here
cannot adequately explain the number of matches observed
in the Arizona offender database, some modification of the
underlying probability models may be required.
The product rule with some minor modification is the most common method for computing the frequency of DNA profiles in forensic laboratories. This method relies critically on the assumption that there is statistical independence between loci. The empirical support for this method comes mainly from tests of independence between pairs of loci (Budowle et al. 1999). However, recent research on finite populations, with mutation and a monogamous mating system shows that departures from the product rule get worse as one looks at more loci (Dr Yun Song, personal communication). Thus, rigorous testing of the product rule predictions at many loci may yield different results than prior work at only two loci. Perhaps the most important quality control issue in forensic DNA typing is determining the adequacy of the methods for computing profile frequencies. In this respect offender databases can serve a useful and unique purpose, as apparently intended by the DNA Identification Act. The tremendous size of these databases makes them a unique resource which would cost many millions of dollars to recreate. There is certainly much more that can be learned from additional scientific research with offender databases.

End quotes

From the LA Times piece (URL above):
to get the Illinois figure of 903 nine-locus
matches in 220,000 profiles also requires about
3.1 loci of shared co-ancestry. This uses a
.62/.3/.08 A-C/Caucasian/Hispanic allele-frequency
split to reflect an Illinois demographic, not
that this probably makes much difference
to this sort of analysis.
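The number of pairs compared in a database of n profiles is n(n-1)/2, so coincidental matches grow roughly with the square of database size. Dividing an observed count by the number of pairs gives a crude per-pair match rate that lets different-sized databases be compared. Below is a minimal sketch. It assumes the 144 (Arizona) and 903 (Illinois) figures are counts of matching pairs at nine or more loci, and the rates include any related pairs.

```python
def pairs(n):
    """Number of distinct pairs in a database of n profiles: n(n-1)/2."""
    return n * (n - 1) // 2

def implied_pair_rate(matches, n):
    """Observed matching pairs divided by the number of pairs compared."""
    return matches / pairs(n)

# Figures quoted on this page: (matching pairs at >= 9 loci, database size).
databases = {"Arizona": (144, 65_000), "Illinois": (903, 220_000)}

for name, (matches, n) in databases.items():
    rate = implied_pair_rate(matches, n)
    print(f"{name}: {pairs(n):,} pairs, about {rate:.2e} matches per pair")

# Quadratic growth: what Arizona's per-pair rate would predict for a
# database the size of Illinois'.
az_rate = implied_pair_rate(144, 65_000)
print(f"Illinois-sized database at Arizona's rate: {az_rate * pairs(220_000):.0f} pairs")
```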

The Maryland data, 32 nine-locus matches in 30,000
profiles, look too corrupted to be of much use.
I would expect fewer than 0.001 matches
at 13 loci in 30,000 profiles, so presumably
the 3 matches at 13 loci are duplicates. The
remaining 29 nine-locus matches are difficult
to achieve even with 4 loci of background co-ancestry.
Getting partial-match data divulged from these
databases avoids this problem of duplicates
when determining the true full-profile
false-match figures within a whole population.
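If fewer than 0.001 coincidental 13-locus matches are expected, the chance of seeing 3 or more by coincidence can be checked directly. Below is a minimal sketch. It assumes the count of coincidental full matches is Poisson-distributed with mean 0.001, taking the author's rough expectation at face value. The tail is summed term by term because 1 - P(X < 3) loses precision when the answer is this small.

```python
import math

def poisson_tail(lam, k_min, terms=50):
    """P(X >= k_min) for X ~ Poisson(lam), summed term by term
    to avoid cancellation in 1 - P(X < k_min) for tiny tails."""
    total = 0.0
    term = math.exp(-lam) * lam**k_min / math.factorial(k_min)
    for k in range(k_min, k_min + terms):
        total += term
        term *= lam / (k + 1)  # next term: e^-lam * lam^(k+1) / (k+1)!
    return total

p = poisson_tail(0.001, 3)
print(f"P(>= 3 coincidental 13-locus matches | mean 0.001) = {p:.2e}")
```

A probability of this order is why duplicate samples, rather than coincidence, are the natural explanation for the three full matches.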

From John Buckleton, New Zealand.
"I have been asked to join this discussion. The partial matches in the
Arizona database have attracted quite an amount of attention much based
on preconception to me. The simple matter is to compare observed and
expected under the relevant population genetic models. Of interest is
whether we use the US model or the European one (which I prefer) and
whether or not we allow a correction for relatives in the datasets.
Whilst I have not done this for the Arizona dataset I (and others) have
now done it for Caucasians in New Zealand and Australia, Australian
Aboriginals, Eastern and Western Polynesians (all published) and
Croatians (in draft). ..." [He then deviated off the subject and did not return to it.]

The Annals of Applied Statistics
2007, Vol. 1, No. 2, 358–370
DOI:10.1214/07-AOAS128
Institute of Mathematical Statistics, 2007

THE RARITY OF DNA PROFILES
BY BRUCE S. WEIR
University of Washington
http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdfview_1&handle=euclid.aoas/1196438022

"The finding of Troyer, Gilroy and Koeneman (2001) was for a pair of profiles
that matched at nine loci, partially matched at three loci and mismatched at one locus."

So, in a sense, this is a 10.5-locus unrelated match (the NDNAD uses 10 loci).
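The 10.5 figure comes from counting each partially matching locus (one shared allele) as half a locus. Below is a minimal sketch of that rough scoring. It is the author's heuristic, not a standard forensic statistic, and the function name is hypothetical.

```python
def effective_loci(full, partial, mismatch, total_loci=13):
    """Rough heuristic: a full match counts as 1 locus, a partial match
    (one shared allele) as 1/2, and a mismatch as 0."""
    if full + partial + mismatch != total_loci:
        raise ValueError("locus counts must add up to total_loci")
    return full + 0.5 * partial

# The Troyer, Gilroy and Koeneman pair quoted by Weir:
# 9 full matches, 3 partial matches, 1 mismatch across the 13 CODIS loci.
print(effective_loci(9, 3, 1))
```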


Since the first tranche of unrelated matches
only appears in the modal group, people with
all high-frequency alleles, like myself (all 4
grandparents from England), perhaps it should
not be too surprising that partial or
full matches apparently reflect
a high level of co-ancestry.
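One way to see why early coincidental matches cluster among people with common alleles: a random person carries a genotype of frequency f with probability f, but two unrelated people share it with probability f^2, so matching pairs are weighted towards the commonest genotypes. Below is a minimal sketch under Hardy-Weinberg proportions and independence between loci. The allele frequencies are HYPOTHETICAL, and for simplicity all 9 loci are assumed to share the same frequency spectrum.

```python
from itertools import combinations_with_replacement

def genotype_freqs(allele_freqs):
    """Hardy-Weinberg genotype frequencies at one locus:
    p^2 for homozygotes, 2pq for heterozygotes."""
    out = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        p, q = allele_freqs[a], allele_freqs[b]
        out[(a, b)] = p * p if a == b else 2 * p * q
    return out

def top_share(allele_freqs, top_n):
    """Per-locus match probability (sum of f^2), plus the share of the top_n
    commonest genotypes among random people (weight f) and among
    coincidentally matching pairs (weight f^2)."""
    g = sorted(genotype_freqs(allele_freqs).values(), reverse=True)
    match_prob = sum(f * f for f in g)
    in_people = sum(g[:top_n])
    in_matches = sum(f * f for f in g[:top_n]) / match_prob
    return match_prob, in_people, in_matches

# HYPOTHETICAL allele frequencies for one STR locus (not real data).
locus = {"14": 0.30, "15": 0.25, "16": 0.20, "17": 0.12, "18": 0.08, "19": 0.05}
match_prob, in_people, in_matches = top_share(locus, top_n=3)
loci = 9  # assumption: every locus has this same frequency spectrum

print(f"per-locus match probability: {match_prob:.4f}")
print(f"top-3 genotype share: people {in_people:.2f}, matching pairs {in_matches:.2f}")
print(f"top-3 at all {loci} loci: people {in_people**loci:.1e}, "
      f"matching pairs {in_matches**loci:.1e}")
```

The per-locus enrichment is modest, but it compounds multiplicatively across loci. That fits the observation that early unrelated matches involve profiles built from common alleles.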

No wonder the FBI want to keep a lid on
this sort of data. It is strange that they
have, in the past, released (doctored) DNA
profile data with ethnicity. The FSS
is perfectly happy to sell millions of
DNA profiles, with ethnicity, to UK companies:
http://www.telegraph.co.uk/news/newstopics/politics/lawandorder/2459976/Millions-of-profiles-from-DNA-database-passed-to-private-firms.html
25 Jul 2008

The full set of Arizona nine-locus matching pairs, with
ethnicities, should be released into the public domain,
or the only obvious interpretation will be
drawn from its suppression.


Email Paul Nutteing by removing 4 of the 5 dots
or email Paul Nutteing, removing all but one dot

Alternatively, a message on the Usenet group uk.legal has reached me a couple of times recently.
A lot of the contents of this file, plus other material, has been 'peer reviewed' on the main forensic science user group.

Background
A simulation of a large DNA profile database
A simulation of DNA profile 'families'
A simulation of DNA profile families with consanguinity
A simulation of DNA profile 'families' for 6 generations
dnas.htm revisited with all alleles represented
dnas.htm revisited for >8 percent allele frequency subset (similar ancestry )
Simulation of Taiwanese Tao and Rukai populations to explore the effect of within and without ancestral clusters
Basques autochthonous DNA profiles simulation, 9 loci
Australian Capital Caucasian 9 loci simulation
Australian Capital Caucasian 9 loci simulation, >= 5% allele frequency
CODIS, 13 Loci Caucasian Simulation
Automating the macros
Exploring other DNA profile match scenarios
Suspect familial matching
Return to co-ancestry factor in the NDNAD simulations
144 random matches in 65,000 -- ONLY?
