I am aware that full-loci false matches (the first few dozen
or so) occur only among people who have high-frequency
alleles at all loci. So, to me, it is no surprise that
this effect comes to the fore rather than related matches, the accepted
"explanation" for the Arizona partial match data. Whatever the
contribution of related partial matches, that proportion
will stay the same with the addition of other states' databases;
it is naturally limited.
But the number of unrelated partial matches will increase
quadratically, as it has no such limitation.
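The quadratic growth comes from the number of distinct pairs in a database of N profiles, N(N-1)/2. A minimal Python sketch, assuming only the two database sizes quoted further down this page (65,493 for Arizona and 3,000,000 for national CODIS); the function name is my own illustration:

```python
def pair_count(n):
    """Number of distinct pairs that can be formed from n profiles."""
    return n * (n - 1) // 2

arizona = 65_493      # Arizona database size quoted on this page
national = 3_000_000  # national CODIS size quoted on this page

az_pairs = pair_count(arizona)
nat_pairs = pair_count(national)

# The database is about 46 times larger, but the number of pairwise
# comparisons (the opportunities for an unrelated match) grows by
# roughly the square of that factor.
size_ratio = national / arizona
pair_ratio = nat_pairs / az_pairs
print(f"size ratio {size_ratio:.1f}, pair ratio {pair_ratio:.0f}")
```

This only counts comparisons; it says nothing about the per-pair match probability, which is taken to be unchanged.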
Trying to simulate
http://www.stanford.edu/~kdevlin/Cold_Hit_Probabilities.pdf
Scientific Heat about Cold Hit
"... It should be noted that a recent analysis of the Arizona
convicted offender data base (a database that uses the 13 CODIS loci)
revealed that among the approximately 65,000 entries listed there were
144 individuals whose DNA profiles match at 9 loci (including one
match between individuals of different races, one Caucasion, the other
African American), another few who match at 10 loci, one pair that
match at 11, and one pair that match at 12. The 11 and 12 loci matches
were siblings, hence not random. But matches on 9 or 10 loci among a
database as small as 65,000 entries cast considerable doubt on figures
such as "one in ten trillion" for a match that extends to just 3 or 4
additional loci. ..."
http://www.latimes.com/news/local/la-me-dna20-2008jul20,0,1506170,full.story
How reliable is DNA in identifying suspects?
July 19, 2008
... "The FBI laboratory, which administers the national DNA database system, tried to stop distribution of Troyer's results and began an aggressive behind-the-scenes campaign to block similar searches elsewhere, even those ordered by courts, a Times investigation found.
At stake is the credibility of the compelling odds often cited in DNA cases, which can suggest an all but certain link between a suspect and a crime scene. .... As a result, Thomas Callaghan, head of the FBI's CODIS unit, has dismissed Troyer's findings as "misleading" and "meaningless."
He urged authorities in several states to object to Arizona-style searches, advising them to tell courts that the probes could violate the privacy of convicted offenders, tie up crucial databases and even lead the FBI to expel offending states from CODIS -- a penalty that could cripple states' ability to solve crimes.
... After the judge, Steven Platt, rejected her arguments, Groves returned to court, saying the search was too risky. FBI officials had now warned her that it could corrupt the entire state database, something they would not help fix, she told the court."
With further info from
http://www.maa.org/devlin/devlin_10_06.html
Devlin's Angle
October 2006
...
As far as I am aware, to date there has been only one attempt to do this,
and the results obtained were both startling and worrying. A study of the
Arizona CODIS database carried out in 2005 showed that approximately 1 in
every 228 profiles in the database matched another profile in the database
at nine or more loci, that approximately 1 in every 1,489 profiles matched
at 10 loci, 1 in 16,374 profiles matched at 11 loci, and 1 in 32,747
matched at 12 loci.
How big a population does it take to produce so many matches that appear
to contradict so dramatically the astronomical, theoretical figures given
by the naive application of the product rule? The Arizona database
contained at the time a mere 65,493 entries. Scary isn't it?
It is not much of a leap to estimate that the FBI's national CODIS database
of 3,000,000 entries will contain not just one but several pairs that match
on all 13 loci, contrary (and how!) to the prediction made by the currently
much touted RMP that you can expect a single match only when you have on
the order of 15 quadrillion profiles.
...
Which translates to
144 pairs at 9 loci
22 pairs at 10 loci
2 pairs at 11 loci
1 pair at 12 loci
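The conversion from Devlin's "1 in every X profiles" figures to pair counts can be checked with a short Python sketch. It assumes each matching profile belongs to exactly one pair, so the number of pairs is half the number of profiles involved; the names are my own:

```python
DATABASE_SIZE = 65_493  # Arizona entries, from the Devlin quote

# "1 in every X profiles" figures from the Devlin quote, keyed by loci matched
one_in = {9: 228, 10: 1_489, 11: 16_374, 12: 32_747}

def pairs_from_rate(n_profiles, one_in_x):
    """Profiles involved = n / X; each matching pair involves two profiles."""
    return round(n_profiles / one_in_x / 2)

pairs = {loci: pairs_from_rate(DATABASE_SIZE, x) for loci, x in one_in.items()}
for loci, count in pairs.items():
    print(f"{count} pair(s) at {loci} loci")
```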
The Arizona Data
http://www.nlada.org/Defender/forensics/for_lib/Documents/1148592247.61/Myers%20CAC%20Presentation.pdf
gives 144 9-loci matches; 22 10-loci matches;
2 related 11-loci matches and 1 related 12-loci match in
65,493 13-loci DNA profiles. That is, the 144 is inclusive: 122 at exactly
9 loci, 20 at exactly 10 loci, and 2 at 11 or 12 loci.
"Avoid Saying that 13-Locus Profiles are de facto Unique"
Quote from the Myers presentation - should be writ
large in labs and courtrooms
Source info for the Arizona database disclosures, via
Kathryn Troyer, is now on
http://www.nlada.org/Defender/forensics/for_lib/Documents/1148592247.61/Myers%20CAC%20Presentation.pdf
or http://tinyurl.com/ytp6ho
and original
http://www.promega.com/geneticidproc/ussymp12proc/abstracts/troyer.pdf
FYI, another related file on that site is a mistranscribed
transcript which, all the way through, has a
mysterious reference to "low side", a mishearing of "loci".
One further clarification: the 144 pairs matching at
9 or more loci consist of 122 at exactly
9 loci + 20 at exactly 10 loci + 1 at 11 loci + 1 at
12 loci.
22 at 10 or more loci = 20 + 1 + 1
2 at 11 or more loci = 1 + 1
1 at 12 loci as is.
Three of the 9-loci partial matches are shown in
that presentation as pairs A/B, C/D and E/F.
I've added the AFs (allele frequencies) as percentages and,
after each table, the minimum AF among the alleles included in
the match and among those excluded from it.
A/B   Genotypes          AF-cauc (%)          AF-hisp (%)
D3    16,16 / 16,16      25,25 / 25,25        29,29 / 29,29
vWA   17,18 / 17,18      28,20 / 28,20        22,17 / 22,17
FGA   22.2,24 / 19,19    <.2,13.6 / 5.3,5.3   <.4,15 / 6.4,6.4
Amel  X,Y / X,Y          -                    -
D8    12,13 / 8,15       15,30 / 1.2,11       14,27 / 0.7,13
D21   30,31 / 29,30      28,8.3 / 19,28       26,8.2 / 20,26
D18   14,16 / 14,16      14,14 / 14,14        14,14 / 14,14
D5    11,12 / 11,12      36,38 / 36,38        35,35 / 35,35
D13   12,13 / 12,13      25,12 / 25,12        22,12 / 22,12
D7    11,11 / 11,11      21,21 / 21,21        26,26 / 26,26
D16   11,12 / 11,12      32,33 / 32,33        26,25 / 26,25
THO1  7,8 / 6,8          19,8.4 / 23,8.4      28,9.6 / 21,9.6
TPOX  8,11 / 8,11        53,24 / 53,24        47,27 / 47,27
CSF   12,13 / 12,13      36,9.6 / 36,9.6      36,6.1 / 36,6.1
                         min inc = 9.6        min inc = 6.1
                         min exc = <.2        min exc = <.4
C/D   Genotypes          AF-cauc (%)          AF-hisp (%)
D3    16,17 / 14,17      25,21 / 10,21        29,20 / 7.9,20
vWA   16,17 / 16,17      20,28 / 20,28        26,22 / 26,22
FGA   22,23 / 21,21      22,13 / 18,18        15,14 / 17,17
Amel  X,Y / X,Y          -                    -
D8    12,14 / 12,16      18,17 / 18,3         14,25 / 14,2.5
D21   28,30 / 28,30      16,28 / 16,28        9.6,26 / 9.6,26
D18   16,16 / 16,18      14,14 / 14,7.6       14,14 / 14,6.8
D5    12,12 / 12,12      38,38 / 38,38        35,35 / 35,35
D13   11,12 / 11,12      34,25 / 34,25        24,12 / 24,12
D7    9,10 / 9,10        18,24 / 18,24        11,29 / 11,29
D16   11,13 / 11,13      32,15 / 32,15        26,19 / 26,19
THO1  8,9 / 8,9          8.4,12 / 8.4,12      9.6,15 / 9.6,15
TPOX  8,9 / 8,9          53,12 / 53,12        47,10 / 47,10
CSF   10,11 / 10,11      21,30 / 21,30        23,29 / 23,29
                         min inc = 8.4        min inc = 9.6
                         min exc = 3.1        min exc = 6.8
E/F   Genotypes          AF-cauc (%)          AF-hisp (%)
D3    15,15 / 15,15      26,26 / 26,26        29,29 / 29,29
vWA   17,17 / 17,17      28,28 / 28,28        22,22 / 22,22
FGA   18,24 / 20,25      2.6,3.6 / 13,7.1     1.8,15 / 8.9,12
Amel  X,Y / X,Y          -                    -
D8    13,14 / 13,14      30,17 / 30,17        27,25 / 27,25
D21   28,31.2 / 28,31.2  16,10 / 16,10        9.6,11 / 9.6,11
D18   13,16 / 14,19      13,14 / 14,3.8       11,14 / 14,3.9
D5    11,12 / 11,12      36,38 / 36,38        35,35 / 35,35
D13   11,11 / 11,11      34,34 / 34,34        24,24 / 24,24
D7    11,12 / 10,11      21,17 / 24,21        26,16 / 29,26
D16   12,12 / 12,12      33,33 / 33,33        25,25 / 25,25
THO1  6,8 / 6,8          23,8.4 / 23,8.4      21,9.6 / 21,9.6
TPOX  8,8 / 8,8          53,53 / 53,53        47,47 / 47,47
CSF   10,12 / 10,10      21,36 / 21,21        23,26 / 23,23
                         min inc = 8.4        min inc = 9.6
                         min exc = 2.6        min exc = 1.8
It is instructive to convert to allele
frequencies because, in simulations, unrelated
false matches in "population" sizes of the
order of 1 million to 10 million
arise only within a subset of the population.
My own profile falls in the all >8% subset,
which puts me in the firing line for a false match
at some indeterminate future date via the UK
system.
For the UK (10 loci), the minimum AF was 6.6%
in all matches
when all published alleles were used totally
randomly in simulations.
For CODIS simulated profiles with
all published alleles, the minimum AF was 7.8%
for all the matching pairs.
Some subset proportions (caucasian)
Percentage of UK population with all >8% is 6.6% (10loci)
Percentage of UK population with all > 6.9% is 15.8% (10 loci)
Percentage of USA population with all >8% is 5.6% (13 loci)
Percentage of Oz population with all >3% is 52% (9 loci)
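One way such subset proportions could be computed is sketched below in Python. It assumes alleles are drawn independently within and between loci, as in a simple random simulation, and the allele frequencies shown are invented toy values, NOT the published tables used for the figures above; the function name is my own:

```python
def fraction_all_common(loci_freqs, threshold):
    """Fraction of profiles whose alleles all exceed `threshold`.

    Assumes independent draws of both alleles at every locus
    (Hardy-Weinberg within a locus, independence between loci).
    """
    fraction = 1.0
    for freqs in loci_freqs:
        common = sum(f for f in freqs if f > threshold)
        fraction *= common ** 2  # both alleles at this locus must be common
    return fraction

# Hypothetical allele frequencies for two loci (toy data, not published values)
toy_loci = [
    [0.50, 0.30, 0.15, 0.05],
    [0.40, 0.35, 0.18, 0.07],
]
result = fraction_all_common(toy_loci, threshold=0.08)
print(f"{100 * result:.1f} percent of toy profiles have all alleles > 8%")
```

With real frequency tables for 10 or 13 loci the same product shrinks quickly, which is why the all >8% subsets are only a few percent of the population.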
So for the above A, B, C, D, E, F, whether Caucasian or Hispanic, I would
say they were all likely unrelated, except maybe A-B if they
were Hispanic. There is a 9 in 13 chance that any
rareish allele (AF below 6.6 to 7.8 percent) would be included
in the matching alleles if the pair were related,
which would not be the case for unrelated matches.
I will attempt to show how rare a possibility
this is. There is about a 7.2 percent chance of
an allele having less than 5.6 percent allele frequency (C. Brenner).
Chance of a < 5.6 percent allele: 0.072
Chance of a > 5.6 percent allele: 1 - 0.072 = 0.928
Expected number of low ones in 54 draws: 0.072 * 54 = 3.89
Chance of all 54 being > 5.6 percent: 0.928^54 = 0.0177
So 3.89 / 0.0177 = 220 times more unlikely than the situation
if related.
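The arithmetic above can be reproduced with a few lines of Python. This is only a check of the numbers; reading the 54 draws as 3 pairs x 9 matching loci x 2 alleles is my assumption, and the variable names are my own:

```python
p_low = 0.072        # chance an allele is below 5.6 percent AF (figure quoted above)
p_high = 1 - p_low   # 0.928
draws = 54           # assumed: 3 pairs x 9 matching loci x 2 alleles

expected_low = p_low * draws      # about 3.89
all_high = p_high ** draws        # about 0.0177
ratio = expected_low / all_high   # about 220

print(f"{expected_low:.2f} / {all_high:.4f} = {ratio:.0f}")
```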
Repeating the AF analysis with African American frequencies
shows 2 pairs (A/B and E/F) matching as if related, i.e.
their included AFs have at least one below 5.6 percent.
A/B AF- African American
D3 16,16 / 16,16 29,29 / 29,29
vWA 17,18 / 17,18 18,14 / 18,14
FGA 22.2,24/19,19 <.3,18 / 7,7
Amel X,Y / X,Y -
D8 12,13 / 8,15 8,15 / .6,23
D21 30,31 / 29,30 28,8.3 / 19,28
D18 14,16 / 14,16 18,9 / 18,9
D5 11,12 / 11,12 25,37 / 25,37
D13 12,13 / 12,13 40,16 / 40,16
D7 11,11 / 11,11 21,21 / 21,21
D16 11,12 / 11,12 35,17 / 35,17
THO1 7,8 / 6,8 38,22 / 16,22
TPOX 8,11 / 8,11 32,25 / 32,25
CSF 12,13 / 12,13 27,5.1 / 36,5.1
min inc = 5.1
min exc = <.3
C/D
D3 16,17 / 14,17 29,27 / 9,27
vWA 16,17 / 16,17 25,18 / 25,18
FGA 22,23 / 21,21 19,13 / 18,18
Amel X,Y / X,Y -
D8 12,14 / 12,16 8,36 / 8,6.4
D21 28,30 / 28,30 26,18 / 26,18
D18 16,16 / 16,18 18,18 / 18,10
D5 12,12 / 12,12 37,37 / 37,37
D13 11,12 / 11,12 28,40 / 28,40
D7 9,10 / 9,10 13,30 / 13,30
D16 11,13 / 11,13 35,12 / 35,12
THO1 8,9 / 8,9 22,12 / 22,12
TPOX 8,9 / 8,9 32,21 / 32,21
CSF 10,11 / 10,11 25,27 / 25,27
min inc = 12
min exc = 9
E/F
D3 15,15 / 15,15 24,24 / 24,24
vWA 17,17 / 17,17 18,18 / 18,18
FGA 18,24 / 20,25 0.9,18 / 3.5,10
Amel X,Y / X,Y -
D8 13,14 / 13,14 15,36 / 15,36
D21 28,31.2/28,31.2 26,4.8 / 26,4.8
D18 13,16 / 14,19 5.4,19 / 8,9.2
D5 11,12 / 11,12 25,37 / 25,37
D13 11,11 / 11,11 28,28 / 28,28
D7 11,12 / 10,11 21,12 / 30,21
D16 12,12 / 12,12 17,17 / 17,17
THO1 6,8 / 6,8 16,22 / 16,22
TPOX 8,8 / 8,8 32,32 / 32,32
CSF 10,12 / 10,10 25,27 / 25,25
min inc = 4.8
min exc = 0.9
And repeating for Native American, all three pairs
count as related by my criteria:
AB min incl CSF 4.7 percent
CD min incl THO1 5.3 and TPOX 4.2 percent
EF min incl THO1 5.2 percent
Also taking the principal 4 subcomponents of the Arizona
prison population, Hispanic 46.1 percent / Cauc 34.7 percent
/ Af-Am 8.6 percent / Native American 4.4 percent,
from http://acjc.state.az.us/pubs/home/Crime_Trends_2005.pdf
page 46, ignoring oriental etc.,
and rescaling to 49.1 / 37.0 / 9.2 / 4.7 percent.
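The rescaling is just each share divided by the sum of the shares that are kept. A short Python check, using only the percentages quoted above (the dictionary keys are my own labels):

```python
# Shares of the Arizona prison population quoted above, in percent
shares = {"Hispanic": 46.1, "Cauc": 34.7, "Af-Am": 8.6, "Nat-Am": 4.4}

# The kept groups sum to about 93.8 percent; the rest are the ignored groups
total = sum(shares.values())

# Rescale so that the kept groups sum to 100 percent
rescaled = {group: round(100 * pct / total, 1) for group, pct in shares.items()}
print(rescaled)
```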
The ratio of likelihood of unrelated to related matches
in just the Hispanic and Caucasian communities for
matching pairs A to F in the Myers presentation
is 220.
Taking 2 of the 3 pairs (AB and EF) as related
matches if from the smaller Af-Am population,
and all 3 pairs as related if from Nat-Am,
I make the revised ratio that those A to F profiles
and the stated 2 related pairs represent
about 87.1 percent unrelated to 12.9 percent related
in the 9 loci matches for a representative Az
offender DNA database,
i.e. NOT mostly or overwhelmingly related matches.
I've not yet seen anyone else's evaluation
from the (so far ?) available evidence.
Admittedly, further account has to be taken of the probability
of related matches having no low AF alleles.
From
The Annals of Applied Statistics
2007,Vol.1,No.2,358–370
THE RARITY OF DNA PROFILES
BY BRUCE S. WEIR
In Table 2, for a simulation of 65,493 profiles
and theta = 0.03, he would expect a ratio of 9-loci to
10-loci matches of 10.5 to 1. So, with 2 pairs of the 22 10-loci
matches known to be related, he would expect only 21 related
pairs in the 122 9-loci pairs.
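As a quick check of that step in Python, using only the figures quoted above (the variable names are my own):

```python
ratio_9_to_10 = 10.5   # Weir's expected 9-loci to 10-loci ratio, as quoted above
related_at_10 = 2      # pairs among the 10-or-more-loci matches known to be related
pairs_at_9 = 122       # pairs matching at exactly 9 loci

expected_related_at_9 = related_at_10 * ratio_9_to_10
share_related = expected_related_at_9 / pairs_at_9

print(f"{expected_related_at_9:.0f} of {pairs_at_9} pairs, "
      f"about {100 * share_related:.0f} percent")
```

On that basis roughly one in six of the 9-loci pairs would be related, which is the same order as my own estimate above.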
The Visual Basic code is between the Horizontal Rules
as dividers. To generate use the
13 loci CODIS generator on dnas11.htm
I initially chose to generate 32,500, half
the quoted number. The following VB
routines are for locus 1
divider/sorter then match checker for 2 to 13
locus 2 sorter then checker
Locus 3 sorter then checker
Locus 4 sorter then checker
Converting 'profile' strings back to standard Codis
form.
CODIS generator for only >8 percent AFs
Results
Use the CODIS generator on dnas11,
changing the - to a space in the file names for consistency.
Place the VB code between Sub and End Sub and
change the file names to whatever you
prefer; I prefer names based on date.
It would be possible to join
these routines together, but I prefer to stay
a bit hands-on, outputting to separate files.
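The routines that follow split the profiles into files by the digits of one locus and then compare profiles only within each file. A minimal Python sketch of that bucket-then-compare idea, assuming each profile is a 26-character string of 13 two-character locus codes (my reading of the Mid(pt, 1, 26) calls); the function names are my own and this is not a translation of the VB:

```python
from collections import defaultdict

LOCI = 13
WIDTH = 2  # characters per locus code (assumed profile layout)

def locus(profile, i):
    """Return the two-character code for locus i (0-based)."""
    return profile[i * WIDTH:(i + 1) * WIDTH]

def partial_matches(profiles, key_locus=0, min_match=9):
    """Yield (a, b, matched) for pairs that agree on key_locus and on at
    least min_match loci counted from key_locus onward."""
    # Group profiles that share the key locus; only these can pair up
    buckets = defaultdict(list)
    for p in profiles:
        buckets[locus(p, key_locus)].append(p)

    others = range(key_locus + 1, LOCI)
    # Loci before key_locus are not compared and count as non-matching
    max_miss = (LOCI - key_locus) - min_match

    for bucket in buckets.values():
        for i, a in enumerate(bucket):
            for b in bucket[i + 1:]:
                miss = 0
                for j in others:
                    if locus(a, j) != locus(b, j):
                        miss += 1
                        if miss > max_miss:
                            break  # too many mismatches, give up early
                else:
                    yield a, b, (LOCI - key_locus) - miss

# Tiny made-up example: three profiles share locus 1, one does not
base = "01" * 13
near = "01" * 9 + "02" * 4     # differs from base at 4 loci
far = "01" * 8 + "02" * 5      # differs from base at 5 loci
other = "03" + "01" * 12       # different first locus, so a different bucket
results = list(partial_matches([base, near, far, other]))
```

Bucketing first means pairs that differ at the key locus are never compared at all, which is what keeps tens of thousands of profiles manageable.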
' dividing into 10 by first digit
' Dividing 10 files into 10 by second digit
Dim ps As String
Dim ph(26)
xx = 0
yyyy = xx
temp = "jul12-d.txt"
temp0 = "jul28a 0"
temp1 = "jul28a 1"
temp2 = "jul28a 2"
temp3 = "jul28a 3"
temp4 = "jul28a 4"
temp5 = "jul28a 5"
temp6 = "jul28a 6"
temp7 = "jul28a 7"
temp8 = "jul28a 8"
temp9 = "jul28a 9"
tempc = "jul28a c"
Open temp For Input As #1
Open temp0 For Output As #10
Open temp1 For Output As #11
Open temp2 For Output As #12
Open temp3 For Output As #13
Open temp4 For Output As #14
Open temp5 For Output As #15
Open temp6 For Output As #16
Open temp7 For Output As #17
Open temp8 For Output As #18
Open temp9 For Output As #19
count0 = 0
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
Do Until (EOF(1) = True)
Input #1, ps
a2$ = Mid(ps, 1, 1)
ph(1) = Val(a2$)
If ph(1) = 0 Then
Write #10, ps
count0 = count0 + 1
End If
If ph(1) = 1 Then
Write #11, ps
count1 = count1 + 1
End If
If ph(1) = 2 Then
Write #12, ps
count2 = count2 + 1
End If
If ph(1) = 3 Then
Write #13, ps
count3 = count3 + 1
End If
If ph(1) = 4 Then
Write #14, ps
count4 = count4 + 1
End If
If ph(1) = 5 Then
Write #15, ps
count5 = count5 + 1
End If
If ph(1) = 6 Then
Write #16, ps
count6 = count6 + 1
End If
If ph(1) = 7 Then
Write #17, ps
count7 = count7 + 1
End If
If ph(1) = 8 Then
Write #18, ps
count8 = count8 + 1
End If
If ph(1) = 9 Then
Write #19, ps
count9 = count9 + 1
End If
x = x + 1
Loop
Close (1)
Close #1
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #18
Close #19
' output counts
countt = count0 + count1 + count2 + count3 + count4 + count5 + count6 + count7 + count8 + count9
Open tempc For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9, "total", countt
Close #20
For xx = 0 To 9
yyyy = xx
' beep count on every tenth division
' as a progress indicator
temp = "jul28a" & Str(yyyy)
temp0 = "jul28a" & Str(yyyy) & " 0"
temp1 = "jul28a" & Str(yyyy) & " 1"
temp2 = "jul28a" & Str(yyyy) & " 2"
temp3 = "jul28a" & Str(yyyy) & " 3"
temp4 = "jul28a" & Str(yyyy) & " 4"
temp5 = "jul28a" & Str(yyyy) & " 5"
temp6 = "jul28a" & Str(yyyy) & " 6"
temp7 = "jul28a" & Str(yyyy) & " 7"
temp8 = "jul28a" & Str(yyyy) & " 8"
temp9 = "jul28a" & Str(yyyy) & " 9"
tempc = "jul28a" & Str(yyyy) & " c"
Open temp For Input As #1
Open temp0 For Output As #10
Open temp1 For Output As #11
Open temp2 For Output As #12
Open temp3 For Output As #13
Open temp4 For Output As #14
Open temp5 For Output As #15
Open temp6 For Output As #16
Open temp7 For Output As #17
Open temp8 For Output As #18
Open temp9 For Output As #19
count0 = 0
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
Do Until (EOF(1) = True)
Input #1, ps
a2$ = Mid(ps, 2, 1)
ph(1) = Val(a2$)
If ph(1) = 0 Then
Write #10, ps
count0 = count0 + 1
End If
If ph(1) = 1 Then
Write #11, ps
count1 = count1 + 1
End If
If ph(1) = 2 Then
Write #12, ps
count2 = count2 + 1
End If
If ph(1) = 3 Then
Write #13, ps
count3 = count3 + 1
End If
If ph(1) = 4 Then
Write #14, ps
count4 = count4 + 1
End If
If ph(1) = 5 Then
Write #15, ps
count5 = count5 + 1
End If
If ph(1) = 6 Then
Write #16, ps
count6 = count6 + 1
End If
If ph(1) = 7 Then
Write #17, ps
count7 = count7 + 1
End If
If ph(1) = 8 Then
Write #18, ps
count8 = count8 + 1
End If
If ph(1) = 9 Then
Write #19, ps
count9 = count9 + 1
End If
x = x + 1
Loop
Close (1)
Close #1
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #18
Close #19
' output counts
Open tempc For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9
Close #20
Next xx
'End Beep
Beep
' Within each file (profiles sharing locus 1), compare every pair
' on loci 2 to 13. A pair is abandoned as soon as more than 4 loci
' differ, as only matches on 9 or more loci are required
' match checker for locus 1
Dim ps As String
Dim pt As String
Dim ph(13)
Dim pk(13)
Dim Array1(40000, 1)
temp0 = "jul29b-r.txt"
Open temp0 For Output As #10
For bb = 0 To 9
For aa = 0 To 9
temp = "jul28a" & Str(bb) & Str(aa)
temp2 = "jul28a" & Str(bb) & Str(aa)
Open temp2 For Input As #2
nn = 0
Do While (EOF(2) = False)
Input #2, pt
c1$ = Mid(pt, 1, 26)
Array1(nn, 1) = c1$
nn = nn + 1
Loop
Close (2)
Close #2
startlim = 0
' endlim= nn-1
endlim = nn - 1
j = startlim
k = 0
For n = 0 To endlim
' pt is the profile to be checked against for each
' other than itself
' comparing all columns 2 to 13
pt = Array1(n, 1)
b2$ = Mid(pt, 3, 2)
pk(2) = b2$
b3$ = Mid(pt, 5, 2)
pk(3) = b3$
b4$ = Mid(pt, 7, 2)
pk(4) = b4$
b5$ = Mid(pt, 9, 2)
pk(5) = b5$
b6$ = Mid(pt, 11, 2)
pk(6) = b6$
b7$ = Mid(pt, 13, 2)
pk(7) = b7$
b8$ = Mid(pt, 15, 2)
pk(8) = b8$
b9$ = Mid(pt, 17, 2)
pk(9) = b9$
b10$ = Mid(pt, 19, 2)
pk(10) = b10$
b11$ = Mid(pt, 21, 2)
pk(11) = b11$
b12$ = Mid(pt, 23, 2)
pk(12) = b12$
b13$ = Mid(pt, 25, 2)
pk(13) = b13$
k = 0
count0 = 0
count1 = 0
' qq quasi-loop because the Exit For in the
' loop forces closure of the loop rather than Next
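' stop at endlim: Array1(nn, 1) is past the last profile read
' and may still hold a profile from a previous, larger file
For q = n + 1 To endlim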
For qq = 0 To 0
ps = Array1(q, 1)
' pt is the profile to be checked
zz = 0
xx = 0
a2$ = Mid(ps, 3, 2)
ph(2) = a2$
If ph(2) <> pk(2) Then zz = zz + 1
xx = xx + zz
zz = 0
a3$ = Mid(ps, 5, 2)
ph(3) = a3$
If ph(3) <> pk(3) Then zz = zz + 1
xx = xx + zz
zz = 0
a4$ = Mid(ps, 7, 2)
ph(4) = a4$
If ph(4) <> pk(4) Then zz = zz + 1
xx = xx + zz
zz = 0
a5$ = Mid(ps, 9, 2)
ph(5) = a5$
If ph(5) <> pk(5) Then zz = zz + 1
xx = xx + zz
zz = 0
a6$ = Mid(ps, 11, 2)
ph(6) = a6$
If ph(6) <> pk(6) Then zz = zz + 1
xx = xx + zz
If xx > 4 Then
Exit For
End If
zz = 0
a7$ = Mid(ps, 13, 2)
ph(7) = a7$
If ph(7) <> pk(7) Then zz = zz + 1
xx = xx + zz
If xx > 4 Then
Exit For
End If
zz = 0
a8$ = Mid(ps, 15, 2)
ph(8) = a8$
If ph(8) <> pk(8) Then zz = zz + 1
xx = xx + zz
If xx > 4 Then
Exit For
End If
zz = 0
a9$ = Mid(ps, 17, 2)
ph(9) = a9$
If ph(9) <> pk(9) Then zz = zz + 1
xx = xx + zz
If xx > 4 Then
Exit For
End If
zz = 0
a10$ = Mid(ps, 19, 2)
ph(10) = a10$
If ph(10) <> pk(10) Then zz = zz + 1
xx = xx + zz
If xx > 4 Then
Exit For
End If
zz = 0
a11$ = Mid(ps, 21, 2)
ph(11) = a11$
If ph(11) <> pk(11) Then zz = zz + 1
xx = xx + zz
If xx > 4 Then
Exit For
End If
zz = 0
a12$ = Mid(ps, 23, 2)
ph(12) = a12$
If ph(12) <> pk(12) Then zz = zz + 1
xx = xx + zz
If xx > 4 Then
Exit For
End If
zz = 0
a13$ = Mid(ps, 25, 2)
ph(13) = a13$
If ph(13) <> pk(13) Then zz = zz + 1
xx = xx + zz
If xx > 4 Then
Exit For
End If
Write #10, ps, pt, 13 - xx
Next qq
' xx is the count of non-matching loci between the ps profile and the pt profile
count1 = count1 + 1
k = k + 1
Next q
j = j + 1
' beeps after 1000s
If j / 1000 = Int(j / 1000) Then
For beepc = 1 To (j / 1000)
For beept = 1 To 200000
beepu = 1 / beept
Next beept
Beep
Next beepc
End If
Next n
Close #1
Next aa
' end beep
For beepc = 1 To bb
For beept = 1 To 200000
beepu = 1 / beept
Next beept
Beep
Next beepc
Next bb
Close #10
' end beep
For beepc = 1 To 10
For beept = 1 To 200000
beepu = 1 / beept
Next beept
Beep
Next beepc
' dividing into 10 by third digit
' Dividing 10 files into 10 by fourth digit
Dim ps As String
Dim ph(26)
xx = 0
yyyy = xx
temp = "jul12-d.txt"
temp0 = "jul28b 0"
temp1 = "jul28b 1"
temp2 = "jul28b 2"
temp3 = "jul28b 3"
temp4 = "jul28b 4"
temp5 = "jul28b 5"
temp6 = "jul28b 6"
temp7 = "jul28b 7"
temp8 = "jul28b 8"
temp9 = "jul28b 9"
tempc = "jul28b c"
Open temp For Input As #1
Open temp0 For Output As #10
Open temp1 For Output As #11
Open temp2 For Output As #12
Open temp3 For Output As #13
Open temp4 For Output As #14
Open temp5 For Output As #15
Open temp6 For Output As #16
Open temp7 For Output As #17
Open temp8 For Output As #18
Open temp9 For Output As #19
count0 = 0
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
Do Until (EOF(1) = True)
Input #1, ps
a2$ = Mid(ps, 3, 1)
ph(1) = Val(a2$)
If ph(1) = 0 Then
Write #10, ps
count0 = count0 + 1
End If
If ph(1) = 1 Then
Write #11, ps
count1 = count1 + 1
End If
If ph(1) = 2 Then
Write #12, ps
count2 = count2 + 1
End If
If ph(1) = 3 Then
Write #13, ps
count3 = count3 + 1
End If
If ph(1) = 4 Then
Write #14, ps
count4 = count4 + 1
End If
If ph(1) = 5 Then
Write #15, ps
count5 = count5 + 1
End If
If ph(1) = 6 Then
Write #16, ps
count6 = count6 + 1
End If
If ph(1) = 7 Then
Write #17, ps
count7 = count7 + 1
End If
If ph(1) = 8 Then
Write #18, ps
count8 = count8 + 1
End If
If ph(1) = 9 Then
Write #19, ps
count9 = count9 + 1
End If
x = x + 1
Loop
Close (1)
Close #1
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #18
Close #19
' output counts
countt = count0 + count1 + count2 + count3 + count4 + count5 + count6 + count7 + count8 + count9
Open tempc For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9, "total", countt
Close #20
For xx = 0 To 9
yyyy = xx
temp = "jul28b" & Str(yyyy)
temp0 = "jul28b" & Str(yyyy) & " 0"
temp1 = "jul28b" & Str(yyyy) & " 1"
temp2 = "jul28b" & Str(yyyy) & " 2"
temp3 = "jul28b" & Str(yyyy) & " 3"
temp4 = "jul28b" & Str(yyyy) & " 4"
temp5 = "jul28b" & Str(yyyy) & " 5"
temp6 = "jul28b" & Str(yyyy) & " 6"
temp7 = "jul28b" & Str(yyyy) & " 7"
temp8 = "jul28b" & Str(yyyy) & " 8"
temp9 = "jul28b" & Str(yyyy) & " 9"
tempc = "jul28b" & Str(yyyy) & " c"
Open temp For Input As #1
Open temp0 For Output As #10
Open temp1 For Output As #11
Open temp2 For Output As #12
Open temp3 For Output As #13
Open temp4 For Output As #14
Open temp5 For Output As #15
Open temp6 For Output As #16
Open temp7 For Output As #17
Open temp8 For Output As #18
Open temp9 For Output As #19
count0 = 0
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
Do Until (EOF(1) = True)
Input #1, ps
a2$ = Mid(ps, 4, 1)
ph(1) = Val(a2$)
If ph(1) = 0 Then
Write #10, ps
count0 = count0 + 1
End If
If ph(1) = 1 Then
Write #11, ps
count1 = count1 + 1
End If
If ph(1) = 2 Then
Write #12, ps
count2 = count2 + 1
End If
If ph(1) = 3 Then
Write #13, ps
count3 = count3 + 1
End If
If ph(1) = 4 Then
Write #14, ps
count4 = count4 + 1
End If
If ph(1) = 5 Then
Write #15, ps
count5 = count5 + 1
End If
If ph(1) = 6 Then
Write #16, ps
count6 = count6 + 1
End If
If ph(1) = 7 Then
Write #17, ps
count7 = count7 + 1
End If
If ph(1) = 8 Then
Write #18, ps
count8 = count8 + 1
End If
If ph(1) = 9 Then
Write #19, ps
count9 = count9 + 1
End If
x = x + 1
Loop
Close (1)
Close #1
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #18
Close #19
' output counts
Open tempc For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9
Close #20
Next xx
'End Beep
Beep
Dim ps As String
Dim pt As String
Dim ph(13)
Dim pk(13)
Dim Array1(40000, 1)
' after dividing into 10 x 10 files on locus 2, L2
' compares L3 columns each with the other
temp0 = "jul29a-r.txt"
Open temp0 For Output As #10
For bb = 0 To 9
For aa = 0 To 9
temp = "jul28b" & Str(bb) & Str(aa)
temp2 = "jul28b" & Str(bb) & Str(aa)
Open temp2 For Input As #2
nn = 0
Do While (EOF(2) = False)
Input #2, pt
c1$ = Mid(pt, 1, 26)
Array1(nn, 1) = c1$
nn = nn + 1
Loop
Close (2)
Close #2
startlim = 0
' endlim= nn-1
endlim = nn - 1
j = startlim
k = 0
For n = 0 To endlim
' pt is the profile to be checked against for each
' other than itself
' comparing all columns 3 to 13
pt = Array1(n, 1)
b3$ = Mid(pt, 5, 2)
pk(3) = b3$
b4$ = Mid(pt, 7, 2)
pk(4) = b4$
b5$ = Mid(pt, 9, 2)
pk(5) = b5$
b6$ = Mid(pt, 11, 2)
pk(6) = b6$
b7$ = Mid(pt, 13, 2)
pk(7) = b7$
b8$ = Mid(pt, 15, 2)
pk(8) = b8$
b9$ = Mid(pt, 17, 2)
pk(9) = b9$
b10$ = Mid(pt, 19, 2)
pk(10) = b10$
b11$ = Mid(pt, 21, 2)
pk(11) = b11$
b12$ = Mid(pt, 23, 2)
pk(12) = b12$
b13$ = Mid(pt, 25, 2)
pk(13) = b13$
k = 0
count0 = 0
count1 = 0
' qq quasi-loop because the Exit For in the
' loop forces closure of the loop rather than Next
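' stop at endlim: Array1(nn, 1) is past the last profile read
' and may still hold a profile from a previous, larger file
For q = n + 1 To endlim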
For qq = 0 To 0
ps = Array1(q, 1)
' pt is the profile to be checked
zz = 0
xx = 0
a3$ = Mid(ps, 5, 2)
ph(3) = a3$
If ph(3) <> pk(3) Then zz = zz + 1
xx = xx + zz
zz = 0
a4$ = Mid(ps, 7, 2)
ph(4) = a4$
If ph(4) <> pk(4) Then zz = zz + 1
xx = xx + zz
zz = 0
a5$ = Mid(ps, 9, 2)
ph(5) = a5$
If ph(5) <> pk(5) Then zz = zz + 1
xx = xx + zz
zz = 0
a6$ = Mid(ps, 11, 2)
ph(6) = a6$
If ph(6) <> pk(6) Then zz = zz + 1
xx = xx + zz
If xx > 3 Then
Exit For
End If
zz = 0
a7$ = Mid(ps, 13, 2)
ph(7) = a7$
If ph(7) <> pk(7) Then zz = zz + 1
xx = xx + zz
If xx > 3 Then
Exit For
End If
zz = 0
a8$ = Mid(ps, 15, 2)
ph(8) = a8$
If ph(8) <> pk(8) Then zz = zz + 1
xx = xx + zz
If xx > 3 Then
Exit For
End If
zz = 0
a9$ = Mid(ps, 17, 2)
ph(9) = a9$
If ph(9) <> pk(9) Then zz = zz + 1
xx = xx + zz
If xx > 3 Then
Exit For
End If
zz = 0
a10$ = Mid(ps, 19, 2)
ph(10) = a10$
If ph(10) <> pk(10) Then zz = zz + 1
xx = xx + zz
If xx > 3 Then
Exit For
End If
zz = 0
a11$ = Mid(ps, 21, 2)
ph(11) = a11$
If ph(11) <> pk(11) Then zz = zz + 1
xx = xx + zz
If xx > 3 Then
Exit For
End If
zz = 0
a12$ = Mid(ps, 23, 2)
ph(12) = a12$
If ph(12) <> pk(12) Then zz = zz + 1
xx = xx + zz
If xx > 3 Then
Exit For
End If
zz = 0
a13$ = Mid(ps, 25, 2)
ph(13) = a13$
If ph(13) <> pk(13) Then zz = zz + 1
xx = xx + zz
If xx > 3 Then
Exit For
End If
Write #10, ps, pt, 12 - xx
Next qq
' xx is the count of non-matching loci between the ps profile and the pt profile
count1 = count1 + 1
k = k + 1
Next q
j = j + 1
' beeps after 1000s
If j / 1000 = Int(j / 1000) Then
For beepc = 1 To (j / 1000)
For beept = 1 To 200000
beepu = 1 / beept
Next beept
Beep
Next beepc
End If
Next n
Close #1
Next aa
Next bb
Close #10
' end beep
For beepc = 1 To 10
For beept = 1 To 200000
beepu = 1 / beept
Next beept
Beep
Next beepc
' dividing into 10 by fifth digit
' Dividing 10 files into 10 by sixth digit
Dim ps As String
Dim ph(26)
xx = 0
yyyy = xx
temp = "jul12-d.txt"
temp0 = "jul28c 0"
temp1 = "jul28c 1"
temp2 = "jul28c 2"
temp3 = "jul28c 3"
temp4 = "jul28c 4"
temp5 = "jul28c 5"
temp6 = "jul28c 6"
temp7 = "jul28c 7"
temp8 = "jul28c 8"
temp9 = "jul28c 9"
tempc = "jul28c c"
Open temp For Input As #1
Open temp0 For Output As #10
Open temp1 For Output As #11
Open temp2 For Output As #12
Open temp3 For Output As #13
Open temp4 For Output As #14
Open temp5 For Output As #15
Open temp6 For Output As #16
Open temp7 For Output As #17
Open temp8 For Output As #18
Open temp9 For Output As #19
count0 = 0
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
Do Until (EOF(1) = True)
Input #1, ps
a2$ = Mid(ps, 5, 1)
ph(1) = Val(a2$)
If ph(1) = 0 Then
Write #10, ps
count0 = count0 + 1
End If
If ph(1) = 1 Then
Write #11, ps
count1 = count1 + 1
End If
If ph(1) = 2 Then
Write #12, ps
count2 = count2 + 1
End If
If ph(1) = 3 Then
Write #13, ps
count3 = count3 + 1
End If
If ph(1) = 4 Then
Write #14, ps
count4 = count4 + 1
End If
If ph(1) = 5 Then
Write #15, ps
count5 = count5 + 1
End If
If ph(1) = 6 Then
Write #16, ps
count6 = count6 + 1
End If
If ph(1) = 7 Then
Write #17, ps
count7 = count7 + 1
End If
If ph(1) = 8 Then
Write #18, ps
count8 = count8 + 1
End If
If ph(1) = 9 Then
Write #19, ps
count9 = count9 + 1
End If
x = x + 1
Loop
Close (1)
Close #1
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #18
Close #19
' output counts
countt = count0 + count1 + count2 + count3 + count4 + count5 + count6 + count7 + count8 + count9
Open tempc For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9, "total", countt
Close #20
For xx = 0 To 9
yyyy = xx
temp = "jul28c" & Str(yyyy)
temp0 = "jul28c" & Str(yyyy) & " 0"
temp1 = "jul28c" & Str(yyyy) & " 1"
temp2 = "jul28c" & Str(yyyy) & " 2"
temp3 = "jul28c" & Str(yyyy) & " 3"
temp4 = "jul28c" & Str(yyyy) & " 4"
temp5 = "jul28c" & Str(yyyy) & " 5"
temp6 = "jul28c" & Str(yyyy) & " 6"
temp7 = "jul28c" & Str(yyyy) & " 7"
temp8 = "jul28c" & Str(yyyy) & " 8"
temp9 = "jul28c" & Str(yyyy) & " 9"
tempc = "jul28c" & Str(yyyy) & " c"
Open temp For Input As #1
Open temp0 For Output As #10
Open temp1 For Output As #11
Open temp2 For Output As #12
Open temp3 For Output As #13
Open temp4 For Output As #14
Open temp5 For Output As #15
Open temp6 For Output As #16
Open temp7 For Output As #17
Open temp8 For Output As #18
Open temp9 For Output As #19
count0 = 0
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
Do Until (EOF(1) = True)
Input #1, ps
a2$ = Mid(ps, 6, 1)
ph(1) = Val(a2$)
If ph(1) = 0 Then
Write #10, ps
count0 = count0 + 1
End If
If ph(1) = 1 Then
Write #11, ps
count1 = count1 + 1
End If
If ph(1) = 2 Then
Write #12, ps
count2 = count2 + 1
End If
If ph(1) = 3 Then
Write #13, ps
count3 = count3 + 1
End If
If ph(1) = 4 Then
Write #14, ps
count4 = count4 + 1
End If
If ph(1) = 5 Then
Write #15, ps
count5 = count5 + 1
End If
If ph(1) = 6 Then
Write #16, ps
count6 = count6 + 1
End If
If ph(1) = 7 Then
Write #17, ps
count7 = count7 + 1
End If
If ph(1) = 8 Then
Write #18, ps
count8 = count8 + 1
End If
If ph(1) = 9 Then
Write #19, ps
count9 = count9 + 1
End If
x = x + 1
Loop
Close (1)
Close #1
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #18
Close #19
' output counts
Open tempc For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9
Close #20
Next xx
'End Beep
Beep
Dim ps As String
Dim pt As String
Dim ph(13)
Dim pk(13)
Dim Array1(40000, 1)
' after dividing into 10 x 10 files on locus 3
' compares L4 columns each with the other
temp0 = "jul30-r.txt"
Open temp0 For Output As #10
For bb = 0 To 9
For aa = 0 To 9
temp = "jul28c" & Str(bb) & Str(aa)
temp2 = "jul28c" & Str(bb) & Str(aa)
Open temp2 For Input As #2
nn = 0
Do While (EOF(2) = False)
Input #2, pt
c1$ = Mid(pt, 1, 26)
Array1(nn, 1) = c1$
nn = nn + 1
Loop
Close (2)
Close #2
startlim = 0
' endlim= nn-1
endlim = nn - 1
j = startlim
k = 0
For n = 0 To endlim
' pt is the profile to be checked against for each
' other than itself
' comparing all columns 4 to 13
pt = Array1(n, 1)
b4$ = Mid(pt, 7, 2)
pk(4) = b4$
b5$ = Mid(pt, 9, 2)
pk(5) = b5$
b6$ = Mid(pt, 11, 2)
pk(6) = b6$
b7$ = Mid(pt, 13, 2)
pk(7) = b7$
b8$ = Mid(pt, 15, 2)
pk(8) = b8$
b9$ = Mid(pt, 17, 2)
pk(9) = b9$
b10$ = Mid(pt, 19, 2)
pk(10) = b10$
b11$ = Mid(pt, 21, 2)
pk(11) = b11$
b12$ = Mid(pt, 23, 2)
pk(12) = b12$
b13$ = Mid(pt, 25, 2)
pk(13) = b13$
k = 0
count0 = 0
count1 = 0
' qq quasi-loop because the Exit For in the
' loop forces closure of the loop rather than Next
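' stop at endlim: Array1(nn, 1) is past the last profile read
' and may still hold a profile from a previous, larger file
For q = n + 1 To endlim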
For qq = 0 To 0
ps = Array1(q, 1)
' pt is the profile to be checked
zz = 0
xx = 0
a4$ = Mid(ps, 7, 2)
ph(4) = a4$
If ph(4) <> pk(4) Then zz = zz + 1
xx = xx + zz
zz = 0
a5$ = Mid(ps, 9, 2)
ph(5) = a5$
If ph(5) <> pk(5) Then zz = zz + 1
xx = xx + zz
zz = 0
a6$ = Mid(ps, 11, 2)
ph(6) = a6$
If ph(6) <> pk(6) Then zz = zz + 1
xx = xx + zz
If xx > 2 Then
Exit For
End If
zz = 0
a7$ = Mid(ps, 13, 2)
ph(7) = a7$
If ph(7) <> pk(7) Then zz = zz + 1
xx = xx + zz
If xx > 2 Then
Exit For
End If
zz = 0
a8$ = Mid(ps, 15, 2)
ph(8) = a8$
If ph(8) <> pk(8) Then zz = zz + 1
xx = xx + zz
If xx > 2 Then
Exit For
End If
zz = 0
a9$ = Mid(ps, 17, 2)
ph(9) = a9$
If ph(9) <> pk(9) Then zz = zz + 1
xx = xx + zz
If xx > 2 Then
Exit For
End If
zz = 0
a10$ = Mid(ps, 19, 2)
ph(10) = a10$
If ph(10) <> pk(10) Then zz = zz + 1
xx = xx + zz
If xx > 2 Then
Exit For
End If
zz = 0
a11$ = Mid(ps, 21, 2)
ph(11) = a11$
If ph(11) <> pk(11) Then zz = zz + 1
xx = xx + zz
If xx > 2 Then
Exit For
End If
zz = 0
a12$ = Mid(ps, 23, 2)
ph(12) = a12$
If ph(12) <> pk(12) Then zz = zz + 1
xx = xx + zz
If xx > 2 Then
Exit For
End If
zz = 0
a13$ = Mid(ps, 25, 2)
ph(13) = a13$
If ph(13) <> pk(13) Then zz = zz + 1
xx = xx + zz
If xx > 2 Then
Exit For
End If
Write #10, ps, pt, 11 - xx
Next qq
' xx is the count of non-matching loci (2-character allele pairs) between ps and pt
count1 = count1 + 1
k = k + 1
Next q
j = j + 1
' beeps after 1000s
If j / 1000 = Int(j / 1000) Then
For beepc = 1 To (j / 1000)
For beept = 1 To 200000
beepu = 1 / beept
Next beept
Beep
Next beepc
End If
Next n
Close #1
Next aa
Next bb
Close #10
' end beep
For beepc = 1 To 10
For beept = 1 To 200000
beepu = 1 / beept
Next beept
Beep
Next beepc
' dividing into 10 by seventh digit
' dividing 10 files into 10 by eighth digit
Dim ps As String
Dim ph(26)
xx = 0
yyyy = xx
temp = "jul12-d.txt"
temp0 = "jul28d 0"
temp1 = "jul28d 1"
temp2 = "jul28d 2"
temp3 = "jul28d 3"
temp4 = "jul28d 4"
temp5 = "jul28d 5"
temp6 = "jul28d 6"
temp7 = "jul28d 7"
temp8 = "jul28d 8"
temp9 = "jul28d 9"
tempc = "jul28d c"
Open temp For Input As #1
Open temp0 For Output As #10
Open temp1 For Output As #11
Open temp2 For Output As #12
Open temp3 For Output As #13
Open temp4 For Output As #14
Open temp5 For Output As #15
Open temp6 For Output As #16
Open temp7 For Output As #17
Open temp8 For Output As #18
Open temp9 For Output As #19
count0 = 0
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
Do Until (EOF(1) = True)
Input #1, ps
a2$ = Mid(ps, 7, 1)
ph(1) = Val(a2$)
If ph(1) = 0 Then
Write #10, ps
count0 = count0 + 1
End If
If ph(1) = 1 Then
Write #11, ps
count1 = count1 + 1
End If
If ph(1) = 2 Then
Write #12, ps
count2 = count2 + 1
End If
If ph(1) = 3 Then
Write #13, ps
count3 = count3 + 1
End If
If ph(1) = 4 Then
Write #14, ps
count4 = count4 + 1
End If
If ph(1) = 5 Then
Write #15, ps
count5 = count5 + 1
End If
If ph(1) = 6 Then
Write #16, ps
count6 = count6 + 1
End If
If ph(1) = 7 Then
Write #17, ps
count7 = count7 + 1
End If
If ph(1) = 8 Then
Write #18, ps
count8 = count8 + 1
End If
If ph(1) = 9 Then
Write #19, ps
count9 = count9 + 1
End If
x = x + 1
Loop
Close #1
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #18
Close #19
' output counts
countt = count0 + count1 + count2 + count3 + count4 + count5 + count6 + count7 + count8 + count9
Open tempc For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9, "total", countt
Close #20
For xx = 0 To 9
yyyy = xx
temp = "jul28d" & Str(yyyy)
temp0 = "jul28d" & Str(yyyy) & " 0"
temp1 = "jul28d" & Str(yyyy) & " 1"
temp2 = "jul28d" & Str(yyyy) & " 2"
temp3 = "jul28d" & Str(yyyy) & " 3"
temp4 = "jul28d" & Str(yyyy) & " 4"
temp5 = "jul28d" & Str(yyyy) & " 5"
temp6 = "jul28d" & Str(yyyy) & " 6"
temp7 = "jul28d" & Str(yyyy) & " 7"
temp8 = "jul28d" & Str(yyyy) & " 8"
temp9 = "jul28d" & Str(yyyy) & " 9"
tempc = "jul28d" & Str(yyyy) & " c"
Open temp For Input As #1
Open temp0 For Output As #10
Open temp1 For Output As #11
Open temp2 For Output As #12
Open temp3 For Output As #13
Open temp4 For Output As #14
Open temp5 For Output As #15
Open temp6 For Output As #16
Open temp7 For Output As #17
Open temp8 For Output As #18
Open temp9 For Output As #19
count0 = 0
count1 = 0
count2 = 0
count3 = 0
count4 = 0
count5 = 0
count6 = 0
count7 = 0
count8 = 0
count9 = 0
Do Until (EOF(1) = True)
Input #1, ps
a2$ = Mid(ps, 8, 1)
ph(1) = Val(a2$)
If ph(1) = 0 Then
Write #10, ps
count0 = count0 + 1
End If
If ph(1) = 1 Then
Write #11, ps
count1 = count1 + 1
End If
If ph(1) = 2 Then
Write #12, ps
count2 = count2 + 1
End If
If ph(1) = 3 Then
Write #13, ps
count3 = count3 + 1
End If
If ph(1) = 4 Then
Write #14, ps
count4 = count4 + 1
End If
If ph(1) = 5 Then
Write #15, ps
count5 = count5 + 1
End If
If ph(1) = 6 Then
Write #16, ps
count6 = count6 + 1
End If
If ph(1) = 7 Then
Write #17, ps
count7 = count7 + 1
End If
If ph(1) = 8 Then
Write #18, ps
count8 = count8 + 1
End If
If ph(1) = 9 Then
Write #19, ps
count9 = count9 + 1
End If
x = x + 1
Loop
Close #1
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #18
Close #19
' output counts
Open tempc For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9
Close #20
Next xx
'End Beep
Beep
Dim ps As String
Dim pt As String
Dim ph(13)
Dim pk(13)
Dim Array1(40000, 1)
' after dividing into 10 x 10 files on locus 4
' compares L5 columns each with the other
temp0 = "jul30a-r.txt"
Open temp0 For Output As #10
For bb = 0 To 9
For aa = 0 To 9
temp = "jul28d" & Str(bb) & Str(aa)
temp2 = "jul28d" & Str(bb) & Str(aa)
Open temp2 For Input As #2
nn = 0
Do While (EOF(2) = False)
Input #2, pt
c1$ = Mid(pt, 1, 26)
Array1(nn, 1) = c1$
nn = nn + 1
Loop
Close #2
startlim = 0
' endlim= nn-1
endlim = nn - 1
j = startlim
k = 0
For n = 0 To endlim
' pt is the profile to be checked against for each
' other than itself
' comparing all columns 5 to 13
pt = Array1(n, 1)
b5$ = Mid(pt, 9, 2)
pk(5) = b5$
b6$ = Mid(pt, 11, 2)
pk(6) = b6$
b7$ = Mid(pt, 13, 2)
pk(7) = b7$
b8$ = Mid(pt, 15, 2)
pk(8) = b8$
b9$ = Mid(pt, 17, 2)
pk(9) = b9$
b10$ = Mid(pt, 19, 2)
pk(10) = b10$
b11$ = Mid(pt, 21, 2)
pk(11) = b11$
b12$ = Mid(pt, 23, 2)
pk(12) = b12$
b13$ = Mid(pt, 25, 2)
pk(13) = b13$
k = 0
count0 = 0
count1 = 0
' The single-pass qq loop exists only so that Exit For abandons
' the current comparison and carries on at Next q
For q = n + 1 To nn - 1
For qq = 0 To 0
ps = Array1(q, 1)
' ps is the candidate profile compared against pt
zz = 0
xx = 0
a5$ = Mid(ps, 9, 2)
ph(5) = a5$
If ph(5) <> pk(5) Then zz = zz + 1
xx = xx + zz
zz = 0
a6$ = Mid(ps, 11, 2)
ph(6) = a6$
If ph(6) <> pk(6) Then zz = zz + 1
xx = xx + zz
If xx > 1 Then
Exit For
End If
zz = 0
a7$ = Mid(ps, 13, 2)
ph(7) = a7$
If ph(7) <> pk(7) Then zz = zz + 1
xx = xx + zz
If xx > 1 Then
Exit For
End If
zz = 0
a8$ = Mid(ps, 15, 2)
ph(8) = a8$
If ph(8) <> pk(8) Then zz = zz + 1
xx = xx + zz
If xx > 1 Then
Exit For
End If
zz = 0
a9$ = Mid(ps, 17, 2)
ph(9) = a9$
If ph(9) <> pk(9) Then zz = zz + 1
xx = xx + zz
If xx > 1 Then
Exit For
End If
zz = 0
a10$ = Mid(ps, 19, 2)
ph(10) = a10$
If ph(10) <> pk(10) Then zz = zz + 1
xx = xx + zz
If xx > 1 Then
Exit For
End If
zz = 0
a11$ = Mid(ps, 21, 2)
ph(11) = a11$
If ph(11) <> pk(11) Then zz = zz + 1
xx = xx + zz
If xx > 1 Then
Exit For
End If
zz = 0
a12$ = Mid(ps, 23, 2)
ph(12) = a12$
If ph(12) <> pk(12) Then zz = zz + 1
xx = xx + zz
If xx > 1 Then
Exit For
End If
zz = 0
a13$ = Mid(ps, 25, 2)
ph(13) = a13$
If ph(13) <> pk(13) Then zz = zz + 1
xx = xx + zz
If xx > 1 Then
Exit For
End If
Write #10, ps, pt, 10 - xx
Next qq
' xx is the count of non-matching loci (2-character allele pairs) between ps and pt
count1 = count1 + 1
k = k + 1
Next q
j = j + 1
' beeps after 1000s
If j / 1000 = Int(j / 1000) Then
For beepc = 1 To (j / 1000)
For beept = 1 To 200000
beepu = 1 / beept
Next beept
Beep
Next beepc
End If
Next n
Close #1
Next aa
Next bb
Close #10
' end beep
For beepc = 1 To 10
For beept = 1 To 200000
beepu = 1 / beept
Next beept
Beep
Next beepc
' converting integer values back to DNA loci, alleles, 13 loci CODIS
' afterwards tidy up using Word search/replace
' to get rid of string quotes "" etc
' xxxx is the number of profiles to be converted
Dim ph(26)
Dim pj(26)
Dim ps As String
Open "result.txt" For Input As #1
Open "result_conv.txt" For Output As #2
For x = 1 To xxxx
Input #1, ps
a1$ = Mid(ps, 1, 1)
a2$ = Mid(ps, 2, 1)
a3$ = Mid(ps, 3, 1)
a4$ = Mid(ps, 4, 1)
a5$ = Mid(ps, 5, 1)
a6$ = Mid(ps, 6, 1)
a7$ = Mid(ps, 7, 1)
a8$ = Mid(ps, 8, 1)
a9$ = Mid(ps, 9, 1)
a10$ = Mid(ps, 10, 1)
a11$ = Mid(ps, 11, 1)
a12$ = Mid(ps, 12, 1)
a13$ = Mid(ps, 13, 1)
a14$ = Mid(ps, 14, 1)
a15$ = Mid(ps, 15, 1)
a16$ = Mid(ps, 16, 1)
a17$ = Mid(ps, 17, 1)
a18$ = Mid(ps, 18, 1)
a19$ = Mid(ps, 19, 1)
a20$ = Mid(ps, 20, 1)
a21$ = Mid(ps, 21, 1)
a22$ = Mid(ps, 22, 1)
a23$ = Mid(ps, 23, 1)
a24$ = Mid(ps, 24, 1)
a25$ = Mid(ps, 25, 1)
a26$ = Mid(ps, 26, 1)
ph(0) = a1$
ph(1) = a2$
ph(2) = a3$
ph(3) = a4$
ph(4) = a5$
ph(5) = a6$
ph(6) = a7$
ph(7) = a8$
ph(8) = a9$
ph(9) = a10$
ph(10) = a11$
ph(11) = a12$
ph(12) = a13$
ph(13) = a14$
ph(14) = a15$
ph(15) = a16$
ph(16) = a17$
ph(17) = a18$
ph(18) = a19$
ph(19) = a20$
ph(20) = a21$
ph(21) = a22$
ph(22) = a23$
ph(23) = a24$
ph(24) = a25$
ph(25) = a26$
For j = 0 To 1
' D3
If ph(j) = "0" Then pj(j) = 12
If ph(j) = "1" Then pj(j) = 13
If ph(j) = "2" Then pj(j) = 14
If ph(j) = "3" Then pj(j) = 15
If ph(j) = "4" Then pj(j) = 16
If ph(j) = "5" Then pj(j) = 17
If ph(j) = "6" Then pj(j) = 18
If ph(j) = "7" Then pj(j) = 19
If ph(j) = "8" Then pj(j) = 20
Next j
For j = 2 To 3
' vWA
' 11, "8"
' 12, "9"
If ph(j) = "0" Then pj(j) = 13
If ph(j) = "1" Then pj(j) = 14
If ph(j) = "2" Then pj(j) = 15
If ph(j) = "3" Then pj(j) = 16
If ph(j) = "4" Then pj(j) = 17
If ph(j) = "5" Then pj(j) = 18
If ph(j) = "6" Then pj(j) = 19
If ph(j) = "7" Then pj(j) = 20
' 21, "A"
Next j
For j = 4 To 5
' D8
If ph(j) = "0" Then pj(j) = 8
If ph(j) = "1" Then pj(j) = 9
If ph(j) = "2" Then pj(j) = 10
If ph(j) = "3" Then pj(j) = 11
If ph(j) = "4" Then pj(j) = 12
If ph(j) = "5" Then pj(j) = 13
If ph(j) = "6" Then pj(j) = 14
If ph(j) = "7" Then pj(j) = 15
If ph(j) = "8" Then pj(j) = 16
'17
'18
Next j
For j = 6 To 7
' D5
If ph(j) = "0" Then pj(j) = 7
If ph(j) = "1" Then pj(j) = 8
If ph(j) = "2" Then pj(j) = 9
If ph(j) = "3" Then pj(j) = 10
If ph(j) = "4" Then pj(j) = 11
If ph(j) = "5" Then pj(j) = 12
If ph(j) = "6" Then pj(j) = 13
If ph(j) = "7" Then pj(j) = 14
' 8
Next j
For j = 8 To 9
' D13
If ph(j) = "0" Then pj(j) = 8
If ph(j) = "1" Then pj(j) = 9
If ph(j) = "2" Then pj(j) = 10
If ph(j) = "3" Then pj(j) = 11
If ph(j) = "4" Then pj(j) = 12
If ph(j) = "5" Then pj(j) = 13
If ph(j) = "6" Then pj(j) = 14
' 15
Next j
For j = 10 To 11
' D7
If ph(j) = "0" Then pj(j) = 7
If ph(j) = "1" Then pj(j) = 8
If ph(j) = "2" Then pj(j) = 9
If ph(j) = "3" Then pj(j) = 10
If ph(j) = "4" Then pj(j) = 11
If ph(j) = "5" Then pj(j) = 12
If ph(j) = "6" Then pj(j) = 13
If ph(j) = "7" Then pj(j) = 14
Next j
For j = 12 To 13
' D16
If ph(j) = "0" Then pj(j) = 8
If ph(j) = "1" Then pj(j) = 9
If ph(j) = "2" Then pj(j) = 10
If ph(j) = "3" Then pj(j) = 11
If ph(j) = "4" Then pj(j) = 12
If ph(j) = "5" Then pj(j) = 13
If ph(j) = "6" Then pj(j) = 14
Next j
For j = 14 To 15
' FGA
If ph(j) = "0" Then pj(j) = 16
If ph(j) = "1" Then pj(j) = 18
If ph(j) = "2" Then pj(j) = 19
If ph(j) = "3" Then pj(j) = 20
If ph(j) = "4" Then pj(j) = 21
If ph(j) = "5" Then pj(j) = 21.2
If ph(j) = "6" Then pj(j) = 22
If ph(j) = "7" Then pj(j) = 22.2
If ph(j) = "8" Then pj(j) = 23
If ph(j) = "9" Then pj(j) = 24
If ph(j) = "A" Then pj(j) = 25
If ph(j) = "B" Then pj(j) = 26
If ph(j) = "C" Then pj(j) = 27
Next j
For j = 16 To 17
' D21
If ph(j) = "0" Then pj(j) = 26
If ph(j) = "1" Then pj(j) = 27
If ph(j) = "2" Then pj(j) = 28
If ph(j) = "3" Then pj(j) = 29
If ph(j) = "4" Then pj(j) = 29.2
If ph(j) = "5" Then pj(j) = 30
If ph(j) = "6" Then pj(j) = 30.2
If ph(j) = "7" Then pj(j) = 31
If ph(j) = "8" Then pj(j) = 31.2
If ph(j) = "9" Then pj(j) = 32
If ph(j) = "A" Then pj(j) = 32.2
If ph(j) = "B" Then pj(j) = 33.2
If ph(j) = "C" Then pj(j) = 34
If ph(j) = "D" Then pj(j) = 34.2
Next j
For j = 18 To 19
' D18
If ph(j) = "0" Then pj(j) = 10
If ph(j) = "1" Then pj(j) = 11
If ph(j) = "2" Then pj(j) = 12
If ph(j) = "3" Then pj(j) = 13
If ph(j) = "4" Then pj(j) = 14
If ph(j) = "5" Then pj(j) = 15
If ph(j) = "6" Then pj(j) = 16
If ph(j) = "7" Then pj(j) = 17
If ph(j) = "8" Then pj(j) = 18
If ph(j) = "9" Then pj(j) = 19
If ph(j) = "A" Then pj(j) = 20
If ph(j) = "B" Then pj(j) = 21
If ph(j) = "C" Then pj(j) = 22
If ph(j) = "D" Then pj(j) = 23
If ph(j) = "E" Then pj(j) = 24
Next j
For j = 20 To 21
' THO1
If ph(j) = "0" Then pj(j) = 6
If ph(j) = "1" Then pj(j) = 7
If ph(j) = "2" Then pj(j) = 8
If ph(j) = "3" Then pj(j) = 9
If ph(j) = "4" Then pj(j) = 9.3
If ph(j) = "5" Then pj(j) = 10
Next j
For j = 22 To 23
' TPOX
If ph(j) = "0" Then pj(j) = 8
If ph(j) = "1" Then pj(j) = 9
If ph(j) = "2" Then pj(j) = 10
If ph(j) = "3" Then pj(j) = 11
If ph(j) = "4" Then pj(j) = 12
Next j
For j = 24 To 25
' CSF1PO
If ph(j) = "0" Then pj(j) = 8
If ph(j) = "1" Then pj(j) = 9
If ph(j) = "2" Then pj(j) = 10
If ph(j) = "3" Then pj(j) = 11
If ph(j) = "4" Then pj(j) = 12
If ph(j) = "5" Then pj(j) = 13
If ph(j) = "6" Then pj(j) = 14
If ph(j) = "7" Then pj(j) = 15
Next j
Write #2, ""; pj(0), pj(1); ""; pj(2), pj(3); ""; pj(4), pj(5); ""; pj(6), pj(7); ""; pj(8), pj(9); ""; pj(10), pj(11); ""; pj(12), pj(13); ""; pj(14), pj(15); ""; pj(16), pj(17); ""; pj(18), pj(19); ""; pj(20), pj(21); ""; pj(22), pj(23); ""; pj(24), pj(25); ""
Next x
Close #1
Close #2
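The conversion above is a per-locus table look-up. As a minimal sketch of the same idea, written in Python rather than the Visual Basic of the listing, the following decodes one coded pair. Only the THO1 and TPOX tables are copied from the listing; the names ALLELES and decode_pair are made up for this illustration.

```python
# Allele tables copied from the Visual Basic listing above (THO1 and TPOX only).
ALLELES = {
    "THO1": ["6", "7", "8", "9", "9.3", "10"],
    "TPOX": ["8", "9", "10", "11", "12"],
}

def decode_pair(locus, codes):
    """Turn a two-character coded pair such as '04' into allele labels."""
    table = ALLELES[locus]
    # int(ch, 16) also copes with the letter codes "A".."E" used by longer tables.
    return tuple(table[int(ch, 16)] for ch in codes)

print(decode_pair("THO1", "04"))  # ('6', '9.3')
print(decode_pair("TPOX", "00"))  # ('8', '8')
```

Adding the other eleven loci would only mean adding their lists to the table.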
' Generating 13 loci x2 profiles CODIS with >8 percent AFs
' directing pairs and first divider
' with changed order
' L11,L12,L13 then L1 to L10 to deliberately cherry-pick for greatest number
' of matches
Dim ph(26)
Dim pb(26)
' initialising Random Number Generator - RNG
count9 = 0
count8 = 0
Randomize
a = 214013
c = 2531011
x0 = Timer
z = 2 ^ 24
' 1 file 'aug02a-g' for the original, un-directed pairs (source data).
' This file is needed to check on the performance of the RNG:
' when a matched pair is found it is highly unlikely that
' both sequences as generated, before pair directing, would
' be the same - that would more likely be a manifestation of a
' repeat within the RNG
' (the reason for adopting the 214013 / 2531011 RNG).
' Use the 'Word' find function on part of the sequences, including pair reversals;
' with luck this would include a 'homozygotic' pair, e.g. (3,3), so no reversal
' on that pair
Open "aug02a-g" For Output As #1
' outputs directed and divided by first digit
Open "aug02a-0" For Output As #10
Open "aug02a-1" For Output As #11
Open "aug02a-2" For Output As #12
Open "aug02a-3" For Output As #13
Open "aug02a-4" For Output As #14
Open "aug02a-5" For Output As #15
Open "aug02a-6" For Output As #16
Open "aug02a-7" For Output As #17
Open "aug02a-8" For Output As #18
Open "aug02a-9" For Output As #19
' change xxxx for different total size
' for xxxx = 10000000 my computer took 5 hours to generate over-night
xxxx = 100000
xxxx = xxxx - 1
For x = 0 To xxxx
flag = 0
For j = 0 To 1
' D3 , locus 1
' RNG random number generator
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.005 Then ph(j) = 11
If ph(j) < 0.01 Then ph(j) = 1
If ph(j) < 0.124 Then ph(j) = 2
If ph(j) < 0.382 Then ph(j) = 3
If ph(j) < 0.623 Then ph(j) = 4
If ph(j) < 0.84 Then ph(j) = 5
If ph(j) < 0.988 Then ph(j) = 6
If ph(j) < 0.998 Then ph(j) = 7
If ph(j) < 1 Then ph(j) = 8
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "7" Then flag = 1
If ph(j) = "8" Then flag = 1
Next j
For j = 2 To 3
' vWA locus 2
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.002 Then ph(j) = 11
If ph(j) < 0.084 Then ph(j) = 1
If ph(j) < 0.193 Then ph(j) = 2
If ph(j) < 0.42 Then ph(j) = 3
If ph(j) < 0.691 Then ph(j) = 4
If ph(j) < 0.903 Then ph(j) = 5
If ph(j) < 0.987 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 7
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "6" Then flag = 1
If ph(j) = "7" Then flag = 1
Next j
For j = 4 To 5
' D8 , locus 3
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.012 Then ph(j) = 11
If ph(j) < 0.022 Then ph(j) = 1
If ph(j) < 0.107 Then ph(j) = 2
If ph(j) < 0.187 Then ph(j) = 3
If ph(j) < 0.324 Then ph(j) = 4
If ph(j) < 0.64 Then ph(j) = 5
If ph(j) < 0.865 Then ph(j) = 6
If ph(j) < 0.976 Then ph(j) = 7
If ph(j) < 1 Then ph(j) = 8
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "8" Then flag = 1
Next j
For j = 6 To 7
' D5 , locus 4
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.002 Then ph(j) = 11
If ph(j) < 0.004 Then ph(j) = 1
If ph(j) < 0.031 Then ph(j) = 2
If ph(j) < 0.099 Then ph(j) = 3
If ph(j) < 0.435 Then ph(j) = 4
If ph(j) < 0.824 Then ph(j) = 5
If ph(j) < 0.985 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 7
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "2" Then flag = 1
If ph(j) = "3" Then flag = 1
If ph(j) = "7" Then flag = 1
Next j
For j = 8 To 9
' D13 , locus 5
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.111 Then ph(j) = 11
If ph(j) < 0.198 Then ph(j) = 1
If ph(j) < 0.268 Then ph(j) = 2
If ph(j) < 0.543 Then ph(j) = 3
If ph(j) < 0.862 Then ph(j) = 4
If ph(j) < 0.944 Then ph(j) = 5
If ph(j) < 1 Then ph(j) = 6
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "1" Then flag = 1
If ph(j) = "5" Then flag = 1
If ph(j) = "6" Then flag = 1
Next j
For j = 10 To 11
' D7, locus 6
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.02 Then ph(j) = 11
If ph(j) < 0.162 Then ph(j) = 1
If ph(j) < 0.302 Then ph(j) = 2
If ph(j) < 0.589 Then ph(j) = 3
If ph(j) < 0.816 Then ph(j) = 4
If ph(j) < 0.955 Then ph(j) = 5
If ph(j) < 0.993 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 7
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "6" Then flag = 1
If ph(j) = "7" Then flag = 1
Next j
For j = 12 To 13
' D16, locus 7
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.017 Then ph(j) = 11
If ph(j) < 0.133 Then ph(j) = 1
If ph(j) < 0.182 Then ph(j) = 2
If ph(j) < 0.472 Then ph(j) = 3
If ph(j) < 0.824 Then ph(j) = 4
If ph(j) < 0.972 Then ph(j) = 5
If ph(j) < 1 Then ph(j) = 6
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "2" Then flag = 1
If ph(j) = "6" Then flag = 1
Next j
For j = 14 To 15
' FGA, locus 8
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.002 Then ph(j) = 11
If ph(j) < 0.011 Then ph(j) = 1
If ph(j) < 0.091 Then ph(j) = 2
If ph(j) < 0.245 Then ph(j) = 3
If ph(j) < 0.429 Then ph(j) = 4
If ph(j) < 0.431 Then ph(j) = 5
If ph(j) < 0.605 Then ph(j) = 6
If ph(j) < 0.612 Then ph(j) = 7
If ph(j) < 0.755 Then ph(j) = 8
If ph(j) < 0.897 Then ph(j) = 9
If ph(j) < 0.963 And ph(j) >= 0.897 Then pb(j) = "A"
If ph(j) < 0.993 And ph(j) >= 0.963 Then pb(j) = "B"
If ph(j) < 1 And ph(j) >= 0.993 Then pb(j) = "C"
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "2" Then flag = 1
If ph(j) = "5" Then flag = 1
If ph(j) = "7" Then flag = 1
If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1
If pb(j) = "C" Then flag = 1
Next j
For j = 16 To 17
' D21, locus 9
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.005 Then ph(j) = 11
If ph(j) < 0.041 Then ph(j) = 1
If ph(j) < 0.201 Then ph(j) = 2
If ph(j) < 0.452 Then ph(j) = 3
If ph(j) < 0.454 Then ph(j) = 4
If ph(j) < 0.683 Then ph(j) = 5
If ph(j) < 0.7 Then ph(j) = 6
If ph(j) < 0.785 Then ph(j) = 7
If ph(j) < 0.874 Then ph(j) = 8
If ph(j) < 0.886 Then ph(j) = 9
If ph(j) < 0.964 And ph(j) >= 0.886 Then pb(j) = "A"
If ph(j) < 0.996 And ph(j) >= 0.964 Then pb(j) = "B"
If ph(j) < 0.998 And ph(j) >= 0.996 Then pb(j) = "C"
If ph(j) < 1 And ph(j) >= 0.998 Then pb(j) = "D"
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "4" Then flag = 1
If ph(j) = "6" Then flag = 1
If ph(j) = "7" Then flag = 1
If ph(j) = "9" Then flag = 1
If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1
If pb(j) = "C" Then flag = 1
If pb(j) = "D" Then flag = 1
Next j
For j = 18 To 19
' D18, locus 10
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.005 Then ph(j) = 11
If ph(j) < 0.017 Then ph(j) = 1
If ph(j) < 0.137 Then ph(j) = 2
If ph(j) < 0.256 Then ph(j) = 3
If ph(j) < 0.435 Then ph(j) = 4
If ph(j) < 0.602 Then ph(j) = 5
If ph(j) < 0.742 Then ph(j) = 6
If ph(j) < 0.863 Then ph(j) = 7
If ph(j) < 0.924 Then ph(j) = 8
If ph(j) < 0.955 Then ph(j) = 9
If ph(j) < 0.974 And ph(j) >= 0.955 Then pb(j) = "A"
If ph(j) < 0.986 And ph(j) >= 0.974 Then pb(j) = "B"
If ph(j) < 0.991 And ph(j) >= 0.986 Then pb(j) = "C"
If ph(j) < 0.998 And ph(j) >= 0.991 Then pb(j) = "D"
If ph(j) < 1 And ph(j) >= 0.998 Then pb(j) = "E"
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "8" Then flag = 1
If ph(j) = "9" Then flag = 1
If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1
If pb(j) = "C" Then flag = 1
If pb(j) = "D" Then flag = 1
If pb(j) = "E" Then flag = 1
If pb(j) = "F" Then flag = 1
Next j
For j = 20 To 21
' THO1 , locus 11
' RNG random number generator
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.239 Then ph(j) = 11
If ph(j) < 0.403 Then ph(j) = 1
If ph(j) < 0.512 Then ph(j) = 2
If ph(j) < 0.679 Then ph(j) = 3
If ph(j) < 0.988 Then ph(j) = 4
If ph(j) < 1 Then ph(j) = 5
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "5" Then flag = 1
Next j
For j = 22 To 23
' TPOX, locus 12
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.543 Then ph(j) = 11
If ph(j) < 0.632 Then ph(j) = 1
If ph(j) < 0.7 Then ph(j) = 2
If ph(j) < 0.968 Then ph(j) = 3
If ph(j) < 1 Then ph(j) = 4
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "2" Then flag = 1
If ph(j) = "4" Then flag = 1
Next j
For j = 24 To 25
' CSF1PO , locus 13
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.002 Then ph(j) = 11
If ph(j) < 0.031 Then ph(j) = 1
If ph(j) < 0.301 Then ph(j) = 2
If ph(j) < 0.62 Then ph(j) = 3
If ph(j) < 0.928 Then ph(j) = 4
If ph(j) < 0.991 Then ph(j) = 5
If ph(j) < 0.998 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 7
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "5" Then flag = 1
If ph(j) = "6" Then flag = 1
If ph(j) = "7" Then flag = 1
Next j
If flag = 1 Then countf = countf + 1
If flag = 0 Then
' output the original generated file
Write #1, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) & ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25)
' Because in real DNA profiles, without further info, no one
' knows which allele in each pair came from the mother or father,
' by convention they are written smaller, larger (or equal).
' The following directs each pair
For j = 0 To 24 Step 2
If ph(j + 1) < ph(j) Then
jjj = ph(j)
ph(j) = ph(j + 1)
ph(j + 1) = jjj
End If
Next j
' put extra conditional statements here to reduce
' the number of files or just delete some of the following
'
' dividing on first col
If ph(20) = 0 Then
Write #10, ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25) & ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count0 = count0 + 1
End If
If ph(20) = 1 Then
Write #11, ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25) & ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count1 = count1 + 1
End If
If ph(20) = 2 Then
Write #12, ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25) & ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count2 = count2 + 1
End If
If ph(20) = 3 Then
Write #13, ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25) & ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count3 = count3 + 1
End If
If ph(20) = 4 Then
Write #14, ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25) & ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count4 = count4 + 1
End If
If ph(20) = 5 Then
Write #15, ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25) & ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count5 = count5 + 1
End If
If ph(20) = 6 Then
Write #16, ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25) & ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count6 = count6 + 1
End If
If ph(20) = 7 Then
Write #17, ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25) & ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count7 = count7 + 1
End If
If ph(20) = 8 Then
Write #18, ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25) & ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count8 = count8 + 1
End If
If ph(20) = 9 Then
Write #19, ph(20) & ph(21) & ph(22) & ph(23) & ph(24) & ph(25) & ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count9 = count9 + 1
End If
End If
Next x
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #18
Close #19
Close #1
' count file: data for fixing the For - Next loops in successive dividings
Open "aug02a-c" For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9
Close #20
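The generation step above maps a uniform random number to an allele code through cumulative frequency thresholds, then writes each pair smaller-first. A minimal sketch of those two steps follows, in Python rather than Visual Basic. It assumes Python's own random module in place of the 214013 / 2531011 generator, uses only the THO1 thresholds copied from the listing, and the function names are invented for this illustration.

```python
import bisect
import random

# Cumulative thresholds for THO1, copied from the listing above.
THO1_CUM = [0.239, 0.403, 0.512, 0.679, 0.988, 1.0]

def draw_code(cum, u):
    """Map a uniform number u in [0, 1) to an allele code 0..len(cum)-1."""
    # bisect_right reproduces the chain of strict "u < threshold" tests.
    return bisect.bisect_right(cum, u)

def draw_directed_pair(cum, rng):
    """Draw two allele codes and return them smaller-first ("directed")."""
    a = draw_code(cum, rng.random())
    b = draw_code(cum, rng.random())
    return (a, b) if a <= b else (b, a)

rng = random.Random(1)  # fixed seed only so that a run can be repeated
print([draw_directed_pair(THO1_CUM, rng) for _ in range(5)])
```

The rare-allele flag and the division into ten files are left out to keep the sketch short.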
Results
With a quicker algorithm it now takes me a couple
of hours on a ten-year-old PC to check for all partial matches.
This presumably translates to about 10 minutes on a
modern PC. I'll stick with that, without burrowing
into b-trees, bb-trees, hash tables etc.
Basically it goes more in the direction of
discarding on non-matching alleles, as >3 non-matches
occur sooner than >8 matches.
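The early-discard idea can be sketched as follows, in Python rather than the Visual Basic of the listings. The function name and the max_mismatch parameter are made up for this illustration; the two coded profiles are the 10 loci pair from the unconverted results further down.

```python
def matching_loci(p, q, max_mismatch):
    """Count matching loci between two coded profiles, or return None
    as soon as more than max_mismatch loci differ."""
    mismatches = 0
    for i in range(0, len(p), 2):      # each locus is a two-character pair
        if p[i:i + 2] != q[i:i + 2]:
            mismatches += 1
            if mismatches > max_mismatch:
                return None            # abandon this pair early
    return len(p) // 2 - mismatches

a = "48144645365545393545040024"
b = "33554645135545393545040024"
print(matching_loci(a, b, 4))  # 10
```

Most unrelated pairs hit the mismatch limit within the first few loci, which is where the saving comes from.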
Anyway, results for half of 65,000 using RCMP
Caucasian allele frequency data.
13 loci CODIS simulation using RCMP Toronto Caucasian data
http://www.csfs.ca/databases/cfs_CC_ProfilerPlus_freq.htm
http://www.csfs.ca/databases/cfs_CC_Cofiler_freq.htm
So, as halving the database quarters the number of pairs, anticipating
about 1/4 of 72 = 18 9-loci matches in 32,500.
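The factor of 1/4 comes from the number of distinct pairs, n(n-1)/2, growing roughly with the square of the database size. A small Python check of that arithmetic (pair_count is a name made up here):

```python
def pair_count(n):
    """Number of distinct pairs among n profiles."""
    return n * (n - 1) // 2

full = pair_count(65000)
half = pair_count(32500)
print(full, half, full / half)  # the ratio comes out very close to 4
```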
Result:
34 9-loci matches
One 10-loci match, that one for
D3,vWA,D8,D5,D13,D7,D16,FGA,D21,D18,THO1,TPOX,CSF1PO
(16,20)(14,17)(12,14)(11,12)(11,14)(12,12)(12,13)(20,24)(29,30)(14,15)(6,9.3)(8,8)(10,12) and
(15,15)(18,18)(12,14)(11,12)(9,11)(12,12)(12,13)(20,24)(29,30)(14,15)(6,9.3)(8,8)(10,12)
Does anyone have any idea of the Arizona allele frequencies?
I was expecting to have to decrease the randomness,
building in co-ancestry to match real-world results.
I cannot of course increase the randomness;
it is a very well-behaved RNG.
Including a proportion of Afro-Caribbean 'profiles',
I would not expect that much difference, as these
sorts of matches are much more heavily biased towards
the largest allele frequencies, for which some A-C values
increase and some decrease cf. Caucasian.
Analysing the allele frequencies occurring in these 35
matches, then as usual the highest standard AFs
are increased, middle-ranking ones stay much the same and
rare ones drop out in effect.
For just the most common allele per locus:
Locus    Allele   Normal AF (pc)   AF in matches (pc)
D3       15       26               30
vWA      17       27               31
D8       14       32               48
D5       12       38               46
D13      12       31               46
D7       10       29               31
D16      12       35               45
FGA      21       18               21
D21      29       25               35
D18      14       18               23
THO1     9.3      31               37
TPOX     8        54               76
CSF1PO   11       31               40
I've still not seen confirmation that the Arizona
results refer to permuted partial matches (any 9 or 10
of the 13 loci) rather than matches on the first 9 of 13 /
10 of 13 loci. Leaving out the TPOX locus would
bring the 13 down to more like 12 in effect,
lowering the results.
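To give a feel for how much the distinction matters, the number of ways of choosing which loci match can be counted directly. This is only the combinatorial count, not a match probability, and it is written in Python rather than Visual Basic:

```python
import math

# Ways of choosing which 9 (or 10) of the 13 loci are the matching ones.
print(math.comb(13, 9))   # 715
print(math.comb(13, 10))  # 286
```

So "any 9 of 13" covers 715 different locus subsets, whereas "the first 9 of 13" is just one of them.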
The results not converted
Locus 1
"244555664414349C5A77040034","24235566442434465A45040034",9
"25355545451334393548340324","25355545444434463544340324",9
"3324455533143536335A240345","33244555331135443525240345",9
"33343544343534495545230023","33342344343534193545040023",9
"33345655334534495558030034","3334565533453426AA46440034",9
"34334555141334487A57340023","34334555141334343525340323",9
"34345755133544465A24440024","34345755023544693524440034",9
"343555364534342A5744010345","34352536233434135744010334",9
"3435574604354646337B440034","34355745043546383336000034",9
"344546443424554A3A35130024","344546440424556A5A35120024",9
"34455655443314382535340423","34455645463314382534340323",9
"35265544443433462356440233","35245544443433892378440235",9
"35344745044434462544040334","35342745044434465A9A040034",9
"44364446443434685722030024","44344456440234685722110024",9
"44364545343414463734030345","443545453434144A3725030324",9
"45345545233445235A57010024","45345545233345045A48040024",9
"45354555343433442934030324","45354555343433396866230324",9
"45355545043434383338110323","45355545042646383379110333",9
"45456645143435441226340023","45456645143434445826340333",9
"55354535333414493524010334","55144535334414693522010334",9
"553645553411344B3522240045","55364555341134365722240334",9
"55465645141345283523340034","55465645041345242723010034",9
"56255545042434292366240134","56252445042434462347140134",9
"56333556343434232357040044","56255856343434233A38040044",9
"56345656444535695A24240024","56345656344535393547240024",9
Locus 2
"45345536342434363522030334","23344444341234363522030334",9
"453466450414458B5A47440123","343466450414132A5A47440124",9
"55346745340344463345240324","35346745342334393345240324",9
"36355544044534682327010023","25355544041334362327010024",9
"45455644343444483578140024","25455636343444283545140024",9
"44453425441534693522030024","34453556441534693537030024",9
"46453545233514391248013323","45453545233514663A57013323",9
"354656453424148A2569440034","344656453445144A3369440034",9
"34565545331234682327040024","26564645331234682324010024",9
Locus 3
"48144645365545393545040024","33554645135545393545040024",10
So most matches are found on the first 2 processings.
These took a few seconds each for dividing
and about an hour each for match checking.
Running the >8 percent version for xx = 1,055,000
would produce about 32,500 profiles. Then sort each
sub-file, reconcatenate and then use the dnas5
match checker.
Results for 32,488 such profiles:
first 8 loci matches = 70
9 loci matches = 5
10 loci matches = 0
I've not had time to fully check the following possible algorithm, let
alone implement it.
The aim is to reduce the number of profiles before "each with each"
checking for maximal partial matches.
For 13 loci:
Pair up loci, 1 & 2, .... 11 & 12.
Produce a look-up table of "allele frequencies" (counts of occurrence) for
each pairing of allele quads. Then delete all profiles that show a
frequency of occurrence of only 1 in 5 or more pairings. That would seem
not to throw out the baby with the bathwater but would
need further checking. A loci pair with 1 locus matching and 1 not matching
would, if rearranged as in the next step, apparently not have
been eligible for inclusion in the next deletion round.
Re-assign the remaining profiles as
2 & 3, ....... 12 & 13 and repeat.
Then check each remaining profile against each other
for 9, 10, 11, 12 and 13 loci partial matches.
For 10 loci, similar,
but with final output for 8, 9 and 10 loci matches:
Pair up as 1 & 2 ...... 9 & 10,
delete any profile with 3 or more single pairwise occurrences.
Pair up as 2 & 3, ........... 8 & 9, 10 & 1,
delete any profile with 3 or more single pairwise occurrences.
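
The pre-filter outlined above could be sketched as follows. This is only an illustration in Python (not the VB used for the routines in this post), and it is as unchecked as the outline itself. The function name prefilter, the tuple-of-strings profile layout and the parameter names are assumptions made for the example.

```python
from collections import Counter

def prefilter(profiles, max_unique_pairs, offset=0):
    """Drop profiles that cannot reach the wanted number of matching loci.

    profiles: list of tuples, one genotype string per locus (assumed layout).
    offset:   0 pairs loci 1&2, 3&4, ...; 1 pairs loci 2&3, 4&5, ...
              (wrapping round to locus 1 when the number of loci is even).
    A profile is dropped when its pair value occurs only once in
    max_unique_pairs or more pairings.
    """
    n_loci = len(profiles[0])
    pairs = [((offset + 2 * k) % n_loci, (offset + 2 * k + 1) % n_loci)
             for k in range(n_loci // 2)]
    # Look-up table: how often each allele quad occurs in each pairing.
    tables = [Counter((p[a], p[b]) for p in profiles) for a, b in pairs]
    kept = []
    for p in profiles:
        unique = sum(1 for (a, b), t in zip(pairs, tables)
                     if t[(p[a], p[b])] == 1)
        if unique < max_unique_pairs:
            kept.append(p)
    return kept
```

For 13 loci the two rounds would be prefilter(profiles, 5, offset=0) followed by prefilter(survivors, 5, offset=1); for 10 loci the threshold would be 3. The survivors still need the full "each with each" check.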
This is my attempt to explain the
Arizona partial DNA profile matches.
It requires the simple and neat
but surprisingly accurate "Jan Haugland"
approximation for non-integer factorials
(otherwise obtained via the Gamma function and converted back to factorial notation):
(n+a)! ~= n! * (n + (1+a)/2 )^a
or use a Gamma Function Calculator
on the net (and account for the supplementary "1", since Gamma(x+1) = x!).
These are needed for various coefficients of relationship (C of R),
so statistical combinations of e.g. 6.5 from 9
(for brothers, CofR = 1/2 so 13/2) as well as 9 from 13, so
numbers like 6.5!, 3.25!, 0.5! etc.
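
As a quick check of how close that approximation gets, here is a small Python sketch comparing it with the Gamma function from the standard library. The function names are made up for the example; the three test values are the ones mentioned above.

```python
import math

def haugland_factorial(n, a):
    """Approximate (n+a)! as n! * (n + (1+a)/2)**a, integer n >= 0, 0 <= a < 1."""
    return math.factorial(n) * (n + (1 + a) / 2) ** a

def exact_factorial(x):
    """x! for non-integer x via the Gamma function: x! = Gamma(x + 1)."""
    return math.gamma(x + 1)

for n, a in [(6, 0.5), (3, 0.25), (0, 0.5)]:
    approx = haugland_factorial(n, a)
    exact = exact_factorial(n + a)
    print(f"{n + a}!  approx={approx:.4f}  exact={exact:.4f}  "
          f"rel.err={(approx - exact) / exact:+.2%}")
```

The approximation is well within 0.1 per cent for 6.5! and 3.25!, and a couple of per cent out for 0.5!, which is where a Gamma Function Calculator is the safer choice.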
For a C of R of
0.0385 or 0.5/13, half a locus co-ancestry on average
(T9 meaning the matching chance for 9 loci):
T9 (for > 5.6 per cent, CofR 0.5/13) = 2.6 * 10^-11.
134 , 9 loci matches
22.4 , 10 loci matches
2.2 , 11 loci matches
0.07 , 12 loci matches
T10 = 1.44*10^-11, T11 = 3.9*10^-12.
On top of that it is only required to add
2 or 3 people from one consanguineous family,
so increasing the C of R to 7/8, to supply
the related 11 and 12 loci matches.
T9 (for > 6 per cent, CofR 0.4/13) = 3.6 * 10^-11.
149, 9 loci matches
39 , 10 loci matches
3.1, 11 loci matches
The Arizona Data is 144, 9 loci matches; 22 , 10 loci matches;
2 related 11 loci matches and 1 related 12 loci match in
65,493 , 13 loci DNA profiles.
My maths involves using the formula for
the first match, from the loaded dice derivation, but gradually ignoring
the minor alleles, rescaling to give an
AF sum of 1 and re-processing until the
maths agrees with reality.
Total anathema to forensic 'scientists',
but DNA matches only involve the small
sub-set of people with ALL large
allele frequencies (like my own DNA
profile, presumably because of at least 3
generations of ancestry from only 2 counties).
Not necessarily the largest
at any locus but certainly not any of the minor
ones. For the Arizona near-match above
my T13 value was determined by ignoring all
AFs (allele frequencies from the RCMP site)
less than 5.6 per cent. The simulated
populations below used 6 per cent as the cut-off.
My "coefficient of allelic co-ancestry"
for the Arizona simulation above is
T13 = 3.6 * 10^-14 for 13 loci.
Thence via the square law the minimum
number of unrelated CODIS profiles
before an evens chance of 1, 13 loci
match is SQRT (2/T) = 7.4 million.
Then scaling T by 715, 286, 78 etc
for T9, T10, T11 etc partial matches, and then scaling
by the non-integer combination factors
for 1/2, 1/4, 1/8, 1/26 etc shared DNA.
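
The arithmetic in that paragraph can be reproduced with a few lines of Python. This is only a worked check of the numbers quoted above; the variable names are made up for the example.

```python
import math

T13 = 3.6e-14  # full 13 loci matching chance quoted in the text

# Square law: expected matches ~ 0.5 * T * N^2, so one match at N = sqrt(2/T).
n_first_match = math.sqrt(2 / T13)
print(f"profiles for first 13 loci match: {n_first_match:,.0f}")

# The scaling factors are the number of ways of choosing k loci from 13.
factors = {k: math.comb(13, k) for k in (9, 10, 11, 12)}
print(factors)

# Scaling T13 up to the 9 loci partial match figure.
T9 = factors[9] * T13
print(f"T9 = {T9:.2e}")
```

The 715, 286, 78 (and 13) are simply 13-choose-9, 13-choose-10, 13-choose-11 and 13-choose-12.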
The bounds for T13 restrict the allele frequency
cut-off to the range >5.6 per cent to >6 per cent
(CofR in the range 0.4/13 to 0.5/13),
to give the unrelated 9 loci matches to be less than
144 on the one hand and not more than 0.6666 unrelated
11 loci matches on the other.
So for T9 = the T (matching chance) for 9 loci, x9 =
number of 9 loci matches, n = half the
square of the population being considered,
C(2.5,9) the number of combinations of
2.5 from 9 because 6.5 (13 loci/2) match, as brothers say.
Then x9 = T9 * n * C(9,13) * C(2.5,9)
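
One way to read that formula is sketched below in Python. It assumes that C(a,9) means 9!/(a! * (9-a)!) with the non-integer factorials taken from the Gamma function; that reading, and the function names, are assumptions made for the example. With half a locus shared (the CofR 0.5/13 case) and T9 = 2.6 * 10^-11, the result should land in the same region as the 134 quoted earlier.

```python
import math

def gen_comb(n, a):
    """Combinations of a from n for non-integer a: n! / (a! * (n-a)!) via Gamma."""
    return math.gamma(n + 1) / (math.gamma(a + 1) * math.gamma(n - a + 1))

def expected_matches(t_k, population, ways, shared, k):
    """x_k = T_k * n * ways * C(shared, k), with n = half the population squared."""
    n = 0.5 * population ** 2
    return t_k * n * ways * gen_comb(k, shared)

x9 = expected_matches(t_k=2.6e-11, population=65493,
                      ways=math.comb(13, 9), shared=0.5, k=9)
print(round(x9, 1))
```

Swapping math.gamma for the Jan Haugland approximation changes the result by a few per cent.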
Attempts to simulate a population to give
the Arizona 144,22,2,1 numbers did not work.
Even if there were as many as 2.5 percent cousin-
cousin marriages (USA generally less than 1 percent)
that would only contribute a single 9 locus partial match.
It is a juggling optimisation exercise with the
main constraints being:
I've allowed the maximum of 11 loci unrelated partial
matches to be less than 0.6666, so less than one
when summed and rounded, which precludes putting the co-ancestry
coefficient too high.
For related matches, 11 loci,
> 1.3333, to give 2 when rounded,
precludes increasing the related numbers too high.
I've made sons and brothers non-exclusive to
a certain extent, so say 59,000 unrelated
plus 6,000 (fathers and sons, F+Ss) and 4,000 brothers
(B+Bs) can sum to 65,000. I've also added a
cross-component of random matches between the
related and unrelated sections, to the unrelated
side; relatively minor, but considered.
Target from the Arizona data
144 pairs at 9 loci
22 pairs at 10 loci
2 pairs at 11 loci
1 pair at 12 loci
            Unrelated   F+Ss     B+Bs     Totals
Profiles    59,000      6,000    4,000    65,000
9 loci      54.3        26.9     19.9     101.1
10 loci     8.7         12.4     4.7      25.8
11 loci     0.64        1.33     0.6      2.6
12 loci     0.02        0.05     0.02     0.09
One 12 loci match is easily added by the use of
one 7/8 consanguinity pair of grandfather and
grandson via incestuous son and daughter mating.
Changing to the following gives a better match but
I do not know how to increase the 9 loci figures
without increasing the 10 loci figures outside
the bounds. Cousin matches do not work
either.
            Unrelated   F+S      B+B      Totals
Profiles    59,000      1,000    5,500    65,000
9 loci      54.3        1.0      51.1     106
10 loci     8.7         0.46     12.1     21.3
11 loci     0.64        0.05     1.27     1.96
and adding a 7/8 consanguineous pair for the
12 loci match.
So the simpler simulations, using a non-Bayesian coefficient of
co-ancestry of the order of about one allele in 26, give
the closer results, unless anyone has any ideas how to juggle a
hypothetical population of fathers, brothers, cousins, etc.
The following adapts the maths to allow
for an overall population coefficient of relationship,
i.e. various degrees of shared DNA loci from .1 in 13 to
3.9 in 13, and also the cut-down AF data.
Below are VB routines to place between
Sub and End Sub. The first one is to enter
tabulated Allele Frequency (AF) data into
a file for later use.
'Allele Frequency data entry routine
' Some error trapping: if the sum of allele frequencies
' is not = 1 +/- 0.002 then the routine repeats that
' locus data. If you know you've made an error,
' just enter 1 for each allele to the end and then repeat
' that locus data from the start
'
' If you manually cut/paste the output file you
' have to add 2 numbers at the end of each line
' before using it in later processing; the values as such
' are ignored but some numbers have to be there
ll = InputBox("Enter the number of Loci")
ll = Val(ll)
Dim af(20, 30)
yyyy = InputBox(" Output file number ")
yyyy = Val(yyyy)
temp0 = "allele frequencies" & Str(yyyy)
Open temp0 For Output As #10
For m = 1 To ll
Beep
nn = InputBox("Enter the number of alleles in this locus")
nn = Val(nn)
If nn < 2 Or nn > 30 Or nn <> Int(nn) Then
Beep
nn = InputBox(" *** Error *** Non integer or other problem , Enter the number of alleles in this locus")
nn = Val(nn)
End If
Sn = 0
Qn = 0
For p = 1 To nn
pq = InputBox("Enter the allele frequency as a decimal ")
pq = Val(pq)
af(m, p) = pq
Qn = Qn + (pq * pq)
Sn = Sn + pq
Next p
If Sn < 0.998 Or Sn > 1.002 Then
Beep
nn = InputBox(" *** Error *** Sum of AFs not approximately 1 , Enter the number of alleles in this locus")
nn = Val(nn)
Sn = 0
Qn = 0
For p = 1 To nn
pq = InputBox("Enter the allele frequency ")
pq = Val(pq)
'pq=pq/100 here if data is as percent
af(m, p) = pq
Qn = Qn + (pq * pq)
Sn = Sn + pq
Next p
End If
Write #10, nn,
For p = 1 To nn
Write #10, af(m, p),
Next p
Write #10, Sn, Qn
Next m
Close #10
'13 loci processing
' if it fails to process, check that in the input file the
' first integer is equal to the total in the line minus 2,
' or if a cut and paste job then concatenate all rows into
' one, replacing each end of line character with a comma
ll = InputBox("Enter the number of Loci")
ll = Val(ll)
yy = InputBox(" Input file number ")
yy = Val(yy)
temp1 = "allele frequencies" & Str(yy)
yyyy = InputBox(" Output file number ")
yyyy = Val(yyyy)
temp2 = "allele frequencies" & Str(yyyy)
rr = InputBox("Enter the cut-off AF as percent, process only > this value ")
rr = Val(rr)
rr = rr / 100
Dim af(20, 30)
Open temp1 For Input As #1
Ms = 0
Mt = 1
For m = 1 To ll
Sr = 0
Qr = 0
Ss = 0
Qs = 0
Input #1, rs
For q = 1 To rs
Input #1, rq
If rq > rr Then
Qr = Qr + rq * rq
Sr = Sr + rq
End If
Next q
Input #1, ru
Input #1, rv
Ar = Sr * Sr + 0.000001
Ar = 1 / Ar
Ss = Ar * Qr
Qs = Ss * Ss
Ms = (2 - Ss) * Qs
Mt = Ms * Mt
Next m
Close #1
Open temp2 For Output As #4
vv = InputBox("Enter the number in population, no commas ")
vv = Val(vv)
Write #4, Mt
vp = 0.5 * vv * vv
t13 = Mt
t9 = 715 * t13
t10 = 286 * t13
t11 = 78 * t13
t12 = 13 * t13
Write #4, "AF cut-off =", 100 * rr, "population n = ", vp, "T13", t13
For k = 0 To 3
XL = 9 + k
For j = 1 To 39
uu2 = Int(j / 10)
uu3 = (j - uu2 * 10) / 10
If uu2 = 0 Then XLf = XL
If uu2 = 1 Then XLf = XL * (XL - 1)
If uu2 = 2 Then XLf = XL * (XL - 1) * (XL - 2) / 2
If uu2 = 3 Then XLf = XL * (XL - 1) * (XL - 2) * (XL - 3) / 6
' the following statistical combinations factor
' uses the "Jan Haugland" approximation for n!
' (n+a)! ~= n! * (n + (1+a)/2 )^a or
' for greater accuracy use a Gamma Function Calculator
' or look-up tables
uu5 = Exp((1 - uu3) * (Log((XL - uu2 - 1) + (2 - uu3) / 2)))
uu6 = Exp((uu3) * (Log(uu2 + (1 + uu3) / 2)))
uu7 = uu5 * uu6
x9 = 715 * XLf * vp * t9 / uu7
x10 = 286 * XLf * vp * t10 / uu7
x11 = 78 * XLf * vp * t11 / uu7
x12 = 13 * XLf * vp * t12 / uu7
If k = 0 Then
Write #4, 9 + k, j / 10, 0.001 * Int(1000 * x9 + 0.5)
End If
If k = 1 Then
Write #4, 9 + k, j / 10, 0.001 * Int(1000 * x10 + 0.5)
End If
If k = 2 Then
Write #4, 9 + k, j / 10, 0.001 * Int(1000 * x11 + 0.5)
End If
If k = 3 Then
Write #4, 9 + k, j / 10, 0.001 * Int(1000 * x12 + 0.5)
End If
Next j
Next k
Close #4
' For 10 loci maximum
ll = InputBox("Enter the number of Loci")
ll = Val(ll)
yy = InputBox(" Input file number ")
yy = Val(yy)
temp1 = "allele frequencies" & Str(yy)
yyyy = InputBox(" Output file number ")
yyyy = Val(yyyy)
temp2 = "allele frequencies" & Str(yyyy)
rr = InputBox("Enter the cut-off AF as percent, process only > this value ")
rr = Val(rr)
rr = rr / 100
Dim af(20, 30)
Open temp1 For Input As #1
Ms = 0
Mt = 1
For m = 1 To ll
Sr = 0
Qr = 0
Ss = 0
Qs = 0
Input #1, rs
For q = 1 To rs
Input #1, rq
If rq > rr Then
Qr = Qr + rq * rq
Sr = Sr + rq
End If
Next q
Input #1, ru
Input #1, rv
Ar = Sr * Sr + 0.000001
Ar = 1 / Ar
Ss = Ar * Qr
Qs = Ss * Ss
Ms = (2 - Ss) * Qs
Mt = Ms * Mt
Next m
Close #1
Open temp2 For Output As #4
vv = InputBox("Enter the number in population, no commas ")
vv = Val(vv)
Write #4, Mt
vp = 0.5 * vv * vv
t10 = Mt
t6 = 210 * t10
t7 = 120 * t10
t8 = 45 * t10
t9 = 10 * t10
Write #4, "AF cut-off =", 100 * rr, "population n = ", vp, "T10", t10
For k = 0 To 4
XL = 6 + k
For j = 1 To 39
uu2 = Int(j / 10)
uu3 = (j - uu2 * 10) / 10
If uu2 = 0 Then XLf = XL
If uu2 = 1 Then XLf = XL * (XL - 1)
If uu2 = 2 Then XLf = XL * (XL - 1) * (XL - 2) / 2
If uu2 = 3 Then XLf = XL * (XL - 1) * (XL - 2) * (XL - 3) / 6
' the following statistical combinations factor
' uses the "Jan Haugland" approximation for n!
' (n+a)! ~= n! * (n + (1+a)/2 )^a or
' for greater accuracy use a Gamma Function Calculator
' or look-up tables
uu5 = Exp((1 - uu3) * (Log((XL - uu2 - 1) + (2 - uu3) / 2)))
uu6 = Exp((uu3) * (Log(uu2 + (1 + uu3) / 2)))
uu7 = uu5 * uu6
x6 = 210 * XLf * vp * t6 / uu7
x7 = 120 * XLf * vp * t7 / uu7
x8 = 45 * XLf * vp * t8 / uu7
x9 = 10 * XLf * vp * t9 / uu7
x10 = XLf * vp * t10 / uu7
If k = 0 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x6 + 0.5)
End If
If k = 1 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x7 + 0.5)
End If
If k = 2 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x8 + 0.5)
End If
If k = 3 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x9 + 0.5)
End If
If k = 4 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x10 + 0.5)
End If
Next j
Next k
Close #4
' For 9 loci maximum
ll = InputBox("Enter the number of Loci")
ll = Val(ll)
yy = InputBox(" Input file number ")
yy = Val(yy)
temp1 = "allele frequencies" & Str(yy)
yyyy = InputBox(" Output file number ")
yyyy = Val(yyyy)
temp2 = "allele frequencies" & Str(yyyy)
rr = InputBox("Enter the cut-off AF as percent, process only > this value ")
rr = Val(rr)
rr = rr / 100
Dim af(20, 30)
Open temp1 For Input As #1
Ms = 0
Mt = 1
For m = 1 To ll
Sr = 0
Qr = 0
Ss = 0
Qs = 0
Input #1, rs
For q = 1 To rs
Input #1, rq
If rq > rr Then
Qr = Qr + rq * rq
Sr = Sr + rq
End If
Next q
Input #1, ru
Input #1, rv
Ar = Sr * Sr + 0.000001
Ar = 1 / Ar
Ss = Ar * Qr
Qs = Ss * Ss
Ms = (2 - Ss) * Qs
Mt = Ms * Mt
Next m
Close #1
Open temp2 For Output As #4
vv = InputBox("Enter the number in population, no commas ")
vv = Val(vv)
Write #4, Mt
vp = 0.5 * vv * vv
t9 = Mt
t6 = 84 * t9
t7 = 36 * t9
t8 = 9 * t9
Write #4, "AF cut-off =", 100 * rr, "population n = ", vp, "t9", t9
For k = 0 To 3
XL = 6 + k
For j = 1 To 39
uu2 = Int(j / 10)
uu3 = (j - uu2 * 10) / 10
If uu2 = 0 Then XLf = XL
If uu2 = 1 Then XLf = XL * (XL - 1)
If uu2 = 2 Then XLf = XL * (XL - 1) * (XL - 2) / 2
If uu2 = 3 Then XLf = XL * (XL - 1) * (XL - 2) * (XL - 3) / 6
' the following statistical combinations factor
' uses the "Jan Haugland" approximation for n!
' (n+a)! ~= n! * (n + (1+a)/2 )^a or
' for greater accuracy use a Gamma Function Calculator
' or look-up tables
uu5 = Exp((1 - uu3) * (Log((XL - uu2 - 1) + (2 - uu3) / 2)))
uu6 = Exp((uu3) * (Log(uu2 + (1 + uu3) / 2)))
uu7 = uu5 * uu6
x6 = 84 * XLf * vp * t6 / uu7
x7 = 36 * XLf * vp * t7 / uu7
x8 = 9 * XLf * vp * t8 / uu7
x9 = 1 * XLf * vp * t9 / uu7
If k = 0 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x6 + 0.5)
End If
If k = 1 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x7 + 0.5)
End If
If k = 2 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x8 + 0.5)
End If
If k = 3 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x9 + 0.5)
End If
Next j
Next k
Close #4
' For 6 loci maximum
ll = InputBox("Enter the number of Loci")
ll = Val(ll)
yy = InputBox(" Input file number ")
yy = Val(yy)
temp1 = "allele frequencies" & Str(yy)
yyyy = InputBox(" Output file number ")
yyyy = Val(yyyy)
temp2 = "allele frequencies" & Str(yyyy)
rr = InputBox("Enter the cut-off AF as percent, process only > this value ")
rr = Val(rr)
rr = rr / 100
Dim af(20, 30)
Open temp1 For Input As #1
Ms = 0
Mt = 1
For m = 1 To ll
Sr = 0
Qr = 0
Ss = 0
Qs = 0
Input #1, rs
For q = 1 To rs
Input #1, rq
If rq > rr Then
Qr = Qr + rq * rq
Sr = Sr + rq
End If
Next q
Input #1, ru
Input #1, rv
Ar = Sr * Sr + 0.000001
Ar = 1 / Ar
Ss = Ar * Qr
Qs = Ss * Ss
Ms = (2 - Ss) * Qs
Mt = Ms * Mt
Next m
Close #1
Open temp2 For Output As #4
vv = InputBox("Enter the number in population, no commas ")
vv = Val(vv)
Write #4, Mt
vp = 0.5 * vv * vv
t6 = Mt
Write #4, "AF cut-off =", 100 * rr, "population n = ", vp, "T6", t6
For k = 0 To 0
XL = 6 + k
For j = 1 To 9
' for-next and integer problem
uu2 = Int(j / 10)
uu3 = (j - uu2 * 10) / 10
If uu2 = 0 Then XLf = XL
' the following statistical combinations factor
' uses the "Jan Haugland" approximation for n!
' (n+a)! ~= n! * (n + (1+a)/2 )^a or
' for greater accuracy use a Gamma Function Calculator
' or look-up tables
uu5 = Exp((1 - uu3) * (Log((XL - uu2 - 1) + (2 - uu3) / 2)))
uu6 = Exp((uu3) * (Log(uu2 + (1 + uu3) / 2)))
uu7 = uu5 * uu6
x6 = XLf * vp * t6 / uu7
If k = 0 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x6 + 0.5)
End If
If k = 1 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x7 + 0.5)
End If
If k = 2 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x8 + 0.5)
End If
If k = 3 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x9 + 0.5)
End If
If k = 4 Then
Write #4, 6 + k, j / 10, 0.001 * Int(1000 * x10 + 0.5)
End If
Next j
Next k
Close #4
' partial match checker, set for 10 loci
' has found all deliberately seeded partial match profiles
' but there is no guarantee that it will not miss some
' it is linear in operation, double the number of profiles
' and processing time doubles, so 65000 profiles can be easily checked
Dim ps As String
Dim pt As String
Dim ph(13)
Dim pk(13)
Dim Array2(40000, 3)
Dim Array3(40000, 3)
count3=0
temp0 = "a_jan_19.txt"
temp1 = "a_jan_20_test"
temp2 = "a_jan_20_out"
temp3 = "a_jan_20_out2"
Open temp0 For Input As #10
Open temp1 For Output As #11
Open temp2 For Output As #12
Open temp3 For Output As #13
nn = 0
' load up, 10 loci source file into an array, eventually as column 1 and 3
Do While (EOF(10) = False)
Input #10, pt
c1$ = Mid(pt, 1, 20)
Array2(nn, 1) = c1$
nn = nn + 1
Loop
endlim = nn
Close #10
Write #12,"number of iterations and output partial matches"
' nnn some number more than half the number of profiles
For nnn = 0 To 20000
For nn = 0 To endlim - 1 Step 2
ps = Array2(nn, 1)
a1$ = Mid(ps, 1, 2)
ph(1) = a1$
a2$ = Mid(ps, 3, 2)
ph(2) = a2$
a3$ = Mid(ps, 5, 2)
ph(3) = a3$
a4$ = Mid(ps, 7, 2)
ph(4) = a4$
a5$ = Mid(ps, 9, 2)
ph(5) = a5$
a6$ = Mid(ps, 11, 2)
ph(6) = a6$
a7$ = Mid(ps, 13, 2)
ph(7) = a7$
a8$ = Mid(ps, 15, 2)
ph(8) = a8$
a9$ = Mid(ps, 17, 2)
ph(9) = a9$
a10$ = Mid(ps, 19, 2)
ph(10) = a10$
pt = Array2(nn + 1, 1)
b1$ = Mid(pt, 1, 2)
pk(1) = b1$
b2$ = Mid(pt, 3, 2)
pk(2) = b2$
b3$ = Mid(pt, 5, 2)
pk(3) = b3$
b4$ = Mid(pt, 7, 2)
pk(4) = b4$
b5$ = Mid(pt, 9, 2)
pk(5) = b5$
b6$ = Mid(pt, 11, 2)
pk(6) = b6$
b7$ = Mid(pt, 13, 2)
pk(7) = b7$
b8$ = Mid(pt, 15, 2)
pk(8) = b8$
b9$ = Mid(pt, 17, 2)
pk(9) = b9$
b10$ = Mid(pt, 19, 2)
pk(10) = b10$
Count = 0
' compare adjacent pairs for partial matches
For j = 1 To 10
If ph(j) = pk(j) Then Count = Count + 1
Next j
Array2(nn, 2) = Count
Array2(nn + 1, 2) = Count
' 5 or more partial matches output to array column 2
If Count > 4 Then
Write #12, Array2(nn, 1), Array2(nn, 2); Array2(nn, 3)
Write #12, Array2(nn + 1, 1), Array2(nn, 2), Array2(nn + 1, 3)
End If
' fix original file place number with each profile in array column 3
If nnn = 0 Then
Array2(nn, 3) = nn
Array2(nn + 1, 3) = nn + 1
End If
Next nn
' swap every other profile in the array
For nn = 0 To endlim - 3 Step 2
Array3(nn, 1) = Array2(nn, 1)
Array3(nn, 2) = Array2(nn, 2)
Array3(nn, 3) = Array2(nn, 3)
Array3(nn + 2, 1) = Array2(nn + 2, 1)
Array3(nn + 2, 2) = Array2(nn + 2, 2)
Array3(nn + 2, 3) = Array2(nn + 2, 3)
Array2(nn, 1) = Array3(nn + 2, 1)
Array2(nn + 2, 1) = Array3(nn, 1)
Array2(nn, 2) = -1
Array2(nn + 2, 2) = -1
Array2(nn, 3) = Array3(nn + 2, 3)
Array2(nn + 2, 3) = Array3(nn, 3)
' output snapshots of processed array
if nnn/100 =int(nnn/100) then
Write #13, Array2(nn, 1), Array2(nn, 2), Array2(nn, 3)
Write #13, Array2(nn + 1, 1), Array2(nn + 1, 2), Array2(nn + 1, 3)
'Write #13, Array2(nn + 2, 1), Array2(nn + 2, 2), Array2(nn + 2, 3)
end if
Next nn
Write #12, nnn
' end loop when first profile has reached end
' and last profile reached the first row in the array
If Array2(0, 3) = endlim - 1 Then count3 = count3+1
If Array2(endlim-1,3) = 0 then count3=count3+1
If count3 = 2 then
Exit For
End If
' single iteration beep
For beepc = 1 To 1
For beept = 1 To 200
beepu = 1 / beept
Next beept
Beep
Next beepc
' 100 fold iteration beep
if nnn/100 =int(nnn/100) then
For beepc = 1 To int(nnn/100)
For beept = 1 To 20000
beepu = 1 / beept
Next beept
Beep
Next beepc
end if
Next nnn
j = 2000
' beeps after 1000s
If j / 1000 = Int(j / 1000) Then
For beepc = 1 To (j / 1000)
For beept = 1 To 20000
beepu = 1 / beept
Next beept
Beep
Next beepc
End If
Close #10
Close #11
Close #12
Close #13
' end beep
For beepc = 1 To 2
For beept = 1 To 200000
beepu = 1 / beept
Next beept
Beep
Next beepc
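
For cross-checking the output of the routine above on a small file, a brute-force "each with each" comparison is easy to write. The Python sketch below is not a translation of the VB checker: it assumes profiles are strings with two characters per locus (as the Mid(ps, n, 2) calls above imply), it compares every pair, so its run time grows with the square of the number of profiles, and the function names are made up for the example. The four test profiles are taken from the partial match listings earlier in this post.

```python
from itertools import combinations

def loci_of(profile, width=2):
    """Split a profile string into fixed-width locus fields."""
    return [profile[i:i + width] for i in range(0, len(profile), width)]

def matching_loci(p, q, width=2):
    """Number of locus fields that are identical in both profiles."""
    return sum(a == b for a, b in zip(loci_of(p, width), loci_of(q, width)))

def partial_matches(profiles, threshold, width=2):
    """Check each profile with each other; list (i, j, count) for count >= threshold."""
    hits = []
    for (i, p), (j, q) in combinations(enumerate(profiles), 2):
        count = matching_loci(p, q, width)
        if count >= threshold:
            hits.append((i, j, count))
    return hits

profiles = [
    "45456645143435441226340023",
    "45456645143434445826340333",
    "48144645365545393545040024",
    "33554645135545393545040024",
]
print(partial_matches(profiles, threshold=9))
```

This is only practical for a few thousand profiles, but any pair the faster checker reports should also turn up here.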
Results using AF data from
"Allele Frequencies for 15 Autosomal STR
Loci on U.S. Caucasian, African American,
and Hispanic Populations",
J Forensic Sci, July 2003, Vol. 48, No. 4,
Paper ID JFS2003045_484,
published 19 May 2003,
to approximate Arizona AFs;
Native American data from J. of Forensic Sciences
2006, Vol 51, pt 6, 1410-1413
(error in the Caucasian D7 listing: allele 9
should probably read 15.313 not 5.313, as the sum is otherwise not 1);
and UK and OZ data previously detailed.
"Arizona mixed" approximates the prison population of
Arizona, taking the 2 major components from
http://acjc.state.az.us/pubs/home/Crime_Trends_2005.pdf
page 46:
Hispanic 46%
Caucasian 35%
scaled to 100%.
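
Mixing two allele frequency tables in a 46 to 35 ratio, scaled to 100 per cent, amounts to a weighted average per allele. A minimal Python sketch follows, assuming each locus is held as a dictionary of allele to frequency. The function name is made up, and the frequencies in the example are hypothetical, NOT taken from the cited papers.

```python
def mix_allele_frequencies(tables, weights):
    """Weighted average of per-population allele frequency tables for one locus.

    tables:  list of dicts {allele: frequency}, one per population.
    weights: population shares; rescaled here so that they sum to 1.
    """
    total = sum(weights)
    mixed = {}
    for table, w in zip(tables, weights):
        for allele, freq in table.items():
            mixed[allele] = mixed.get(allele, 0.0) + freq * w / total
    return mixed

# Hypothetical frequencies for a single locus, for illustration only.
hispanic = {"8": 0.10, "9": 0.30, "10": 0.60}
caucasian = {"9": 0.20, "10": 0.50, "11": 0.30}
mixed = mix_allele_frequencies([hispanic, caucasian], [46, 35])
print(mixed)
```

An allele present in only one population still appears in the mix, at its share-weighted frequency, and the mixed frequencies again sum to 1.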
The best fits for the Arizona data are in the region
of 5 percent AF inclusion/exclusion with .7 to .8 loci co-ancestry,
and 6 percent AF with .5 to .6 loci co-ancestry.
I've not gone with the 8% cut-off because AF tables
are discrete rather than continuous, making calculations
very hit and miss in that region as there are so few datapoints.
The Arizona data target is 144 , 9 loci /
22 , 10 loci / 2 , 11 loci (related) / 1 related 12 loci match.
For Hispanic/Caucasian mixed, 65,000 simulated Arizona population set

AF inc/exc   loci (9 to 12)   co-anc (shared loci)   number of matches
.01%         9L               2.6                    100    (0.01% is all AFs taken into account)
                              2.7                    108
                              3.1                    140
                              3.2                    148
.01%         10L              2.6                    21.6
                              2.7                    23.6
.01%         11L              2.6                    2.0
                              2.7                    2.3
1%           9L               2.1                    109    (excluding all AFs less than 1% and rescaling)
                              2.2                    121
                              2.3                    133
                              2.4                    146
1%           10L              2.1                    19.6
                              2.2                    24.8
                              2.3                    27.7
                              2.4                    30.8
1%           11L              2.1                    2.0
                              2.2                    2.2
                              2.3                    2.5
                              2.4                    2.8
3%           9L               1.3                    114    (>3%)
                              1.4                    132
                              1.5                    152
             10L              1.3                    20.9
                              1.4                    24.6
             11L              1.3                    1.7
                              1.4                    2.0
5%           9L               .7                     127
                              .8                     153
             10L              .7                     21.8
                              .8                     26.6
             11L              .7                     1.7
                              .8                     2.0
6%           9L               .5                     128
                              .6                     153
             10L              .5                     21.6
                              .6                     26.8
             11L              .5                     1.6
                              .6                     2.1
8%           9L               .1                     129
                              .2                     168
             10L              .1                     20.8
                              .2                     27.4
             11L              .1                     1.5
                              .2                     2.0
How you interpret related versus unrelated out
of this I don't know.
Projecting to 13 loci, the first CODIS
match would require 2.3 to 2.57 million profiles,
compared with something like 20.5 million
for completely random profiles, i.e. Bayesian independence.
Interestingly, due to the peaky modal AF structure
for Native Americans, these are the results for
a 65,000 such population using the 5% : .7/.8
and 6% : .5/.6 co-ancestry factors.
For 5% to 6%:
9 loci , between 2394 and 6454
10 loci , 412 to 1099
11 loci , 32 to 84
12 loci , 1 to 2.5
For the UK, 10 loci, but the same
co-ancestry treatment, for a representative
UK population and an NDNAD total of 3.2 million samples.
For 5 to 6 percent:
140 to 177 , 10 loci matches
(totally random simulations show about
2 , unrelated 10 loci matches in 3.2 million).
For 30 million, half the population,
5 to 6 percent means
12,300 to 15,600 , 10 loci matches,
leading to a 14,000 in 30 million
chance of a match to someone else, or
1 in 2,150 for any 1 person in the UK
to have a match with someone else in the
UK. That is, until the corrupt keepers of such databases
discriminate the related from the unrelated, e.g. among the
22 , 10 loci Arizona matches.
www.aph.gov.au/hansard/senate/commttee/s8081.pdf
has a reference to 144,546 Australian DNA profiles
in 2005.
If all were Caucasian:
4.3 to 6.2 , 9 loci matches in 144,546
If all were Aborigine:
12 to 41 , 9 loci matches
I mixed the 2 principal Arizona allele frequency sets, Hispanic and Caucasian,
in the ratio of 46 to 35,
scaled to 100, to
represent the Arizona population.
Then found the minimum AF and co-ancestry
factors that gave between
127 and 153 , 9 loci partial matches
and 21.6 to 26.8 , 10 loci partial matches.
Then used those coefficients,
applied to a general USA (1/2) population of
151 million and 13 loci, using USA AFs.
This gave the fundamental number of profiles
for an evens chance of 1 unrelated full 13 loci
match as between 2.3 and 2.57 million.
Then the square law of scaling to 151 million
gave between 3,500 and 4,300 matches, so between
1 in 35,000 and 1 in 44,000 of an evens
chance of an unrelated false match
in the USA population. For a 2008 CODIS
total of 6 million DNA profiles, then
about 7 unrelated false matches.
In other words, for a large arena containing
40,000 males there is a better than evens
chance for someone there to be falsely matched to
an unrelated person in the USA.
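
The square law scaling used for those figures is short enough to check in Python. The sketch assumes that the expected number of matches is 1 at the "first match" size and grows with the square of the number of profiles; the function name is made up for the example.

```python
def expected_false_matches(population, n_first_match):
    """Square law: one match expected at n_first_match, scaling as population squared."""
    return (population / n_first_match) ** 2

usa_half = 151e6
for n1 in (2.3e6, 2.57e6):
    matches = expected_false_matches(usa_half, n1)
    print(f"first match at {n1:,.0f}: {matches:,.0f} matches, "
          f"1 in {usa_half / matches:,.0f}")
    print(f"  CODIS at 6 million: {expected_false_matches(6e6, n1):.1f}")
```

The same function with 30 million or 10.5 million as the population, and the corresponding first match size, gives the UK and Australian scalings.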
Corresponding figures for the UK , 10 loci
and 30 million , half population.
Between 12,300 and 15,600 unrelated
false matches in 30 million
or between 1 in 1,900 and 1 in 2,400
so a large hall or small arena of men
For Australia and 9 loci and 10.5 million
half population.
Between 22,700 and 32,700 unrelated
false matches in 10.5m
or between 1 in 320 and 1 in 460 so
just a small hall.
The only other disclosed database data I
am aware of is
from Forensic Science
International 95 (1998) p30,
a declaration of 10 matches at 6 loci in 6311 profiles.
Adapting the UK 10 loci routine back to the
6 loci mid 90s SGM structure, the results
for 6311 are between 5.8 and 7.7 matches, so
at least consistent.
Repeated background maths
Derivation of the Square Law concerning DNA databases.
Acknowledgement to PeteM ,02 July ,2003
Arrange the N members of the population in a list m1, m2, .... mN.
The probability of a profile match between two members selected at random
is i.
The expected number of matches between m1 and subsequent individuals in
the list (m2,m3,m4 ...) is (N-1)i.
The expected number of matches between m2 and subsequent individuals in
the list (m3,m4,m5 ...) is (N-2)i.
And so on. So the total number of expected matches (including triples
etc) is
[sigma from j=1 to j=N] {i*(j-1)} = iN(N-1)/2 ~ 0.5iN^2
General formula for deriving the minimum number of
profiles in a database before false matches occur
Starting with a simpler analogue.
Consider a 10 faced loaded die with weighting
such that
face 0 and face 1 have a probability of 0.2 each,
faces 2 and 3 , probability 0.15 each,
and faces 4 to 9 , 0.05 each.
Toss 10 times and record the 10 digit number.
Repeat n times.
Determine the number N at which a repeat
of a previously occurring 10 digit number is expected.
The probability of a random pair of single
digits matching is
sum of squares = 2(.2^2) + 2(.15^2) + 6(.05^2) = 0.14 . The digits
in each of the 10 positions are independent, so the overall
probability of all 10 digits matching is ( sum of squares )^10 ~= 2.893e-9; call this p.
To generate N numbers, there are N(N-1)/2 pairs of numbers which
must all be different to avoid a repeat. If the pairs were
independent then the expected number of repeats would be pN(N-1)/2,
which will be 1 when N is about 26,000. The pairs won't actually be
independent, but this estimate for the expected value should be fairly
close for N << 1/p.
N = SQRT(2/p)
By comparison, if
the numbers were unbiased then about 1 repeat in the
first 140,000 numbers.
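
The dice figures can be checked with a few lines of Python. This only reproduces the arithmetic above (it is not a simulation), and the variable names are made up for the example.

```python
import math

faces = [0.2, 0.2, 0.15, 0.15] + [0.05] * 6   # loaded 10 faced die

s = sum(f * f for f in faces)     # chance that two single digits match
p = s ** 10                       # chance that two 10 digit numbers match
n_loaded = math.sqrt(2 / p)       # about one repeat expected at this N

p_fair = (10 * 0.1 ** 2) ** 10    # unbiased die: sum of squares is 0.1
n_fair = math.sqrt(2 / p_fair)

print(f"loaded: s={s:.2f}  p={p:.3e}  N={n_loaded:,.0f}")
print(f"fair:   p={p_fair:.1e}  N={n_fair:,.0f}")
```

The loading alone cuts the number of throws needed for a repeat by a factor of more than 5.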
Now convert to factor in directed pairs.
If all pairs were directed then the new directed pair (dp) probability
would, taking digits 2 at a time, be dp = 2p*p (with p here standing for the
single-digit sum of squares, 0.14), but the pairs 00,11,22 etc
are not directed, so 2p*p is inflated by the probability of just the doublets,
so deduct this factor from the formula.
The factor dp now becomes (2 * 0.14^2 - 0.14^3)^5.
Now convert to the DNA profile situation and the formula becomes:
For n loci (6, 9, 10, 13, 15 or any number; 5 in the dice analogue above)
with m (valid) alleles at each locus and 2 alleles per locus per person,
the Allele Frequencies at a locus are AF1 ..... AFm.
Let Sk be the sum of the squares of the AFs at locus k,
ie Sk = AF1^2 + AF2^2 +...... + AFm^2 for each k = 1 ... n
Let Qk = Sk^2 for each k
Let p = (Q1 * Q2 * .... * Qn ) * [(2-S1) * (2-S2) * .... * (2-Sn)]
Then N = minimum number before an evens chance of finding a match is
N = SQRT (2/p)
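
A compact Python sketch of that formula, including the AF cut-off and rescaling step used in the VB routines earlier, might look like this. The function names are made up, the sketch assumes at least one allele at every locus exceeds the cut-off, and the only data used is the loaded die from the analogue above standing in for 5 identical loci.

```python
import math

def locus_factor(afs, cutoff=0.0):
    """Per-locus term Q * (2 - S), using only allele frequencies above cutoff,
    rescaled so that the kept frequencies sum to 1."""
    kept = [f for f in afs if f > cutoff]
    total = sum(kept)
    s = sum(f * f for f in kept) / (total * total)   # rescaled sum of squares
    return s * s * (2 - s)

def match_probability(loci, cutoff=0.0):
    """p = product over all loci of Q * (2 - S)."""
    p = 1.0
    for afs in loci:
        p *= locus_factor(afs, cutoff)
    return p

def first_match_size(p):
    """N = SQRT(2/p): profiles needed before an evens chance of a match."""
    return math.sqrt(2 / p)

die = [0.2, 0.2, 0.15, 0.15] + [0.05] * 6
p = match_probability([die] * 5)
print(p, first_match_size(p))
```

For the die, p works out as (2 * 0.14^2 - 0.14^3)^5, the directed pair factor given above. Raising the cut-off drops the minor alleles and so increases p.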
An interesting study analysing the Troyer Arizona disclosure;
the author is of the opinion that there were 120 instead of 144 (inclusive of
higher order matches) 9 loci matches:
http://www.ias.ac.in/jgenet/Vol87No2/temp/jgen00133.pdf
Can simple population genetic models reconcile partial match
frequencies observed in large forensic databases?
LAURENCE D. MUELLER
Department of Ecology and Evolutionary Biology, University of California,
Irvine, CA 92697-2525, USA
Surprisingly, there is a lot in there I have to agree with from my own analysis.
But he cannot get a reliable handle on the sub-patterning of the 9/10
loci partial match numbers and consequently cannot scale up to whole
populations and full 13 loci false matches, so making the exercise
rather futile.
Some quotes from his article worth repeating here:
If some combination of theta, and relatives can correctly
predict the number of 9-locus matches but not the number of
10-locus matches then it is an unsatisfactory explanation of
the Arizona observations.
...
The results (figure 3) show that as theta increases the
number of matches at 9 and 10 loci increase but the number of matches
at 9 loci increase faster than the number of
matches at 10 loci.
...
Adding pairs of full sibs to the Arizona database increases
both the number of 9-locus and 10-locus matches (figure 5),
but as in the substructure-only-simulations, the number of
9-locus matches quickly exceeds the number in Arizona well
before the number of 10-locus matches is even close to 20.
Consequently, no models that add sibs alone can adequately
explain the Arizona observations.
...
The general findings are that acceptable parameter values
require fewer pairs of siblings as theta increases. The range of
sibling pairs that produce an adequate description of the
Arizona observations is relatively narrow. Thus, if the true
number of sibling pairs was much less than 1000, or much greater
than 3000, then none of these models would produce reliable
predictions of the observed number of matches. The claim
that there is a relatively narrow parameter range that explains
the Arizona results can be put into perspective as follows. If
150 9-locus matches and 15 10-locus matches had actually
been observed in Arizona, then virtually all simulations in
figures 6–8 would have been consistent with this result.
...
These results (figure 9) show that
even with a database composed almost entirely of
parent–offspring pairs the number of matches at 10 loci is far below
the Arizona value. From these results it is reasonable to
conclude that the only relatives that would possibly contribute to
explaining the Arizona observations are full sibs.
...
Not any number of siblings will work.
...
An additional method for studying these problems would
be to get the profiles from two different states, say Arizona
and Maryland. The number of matches within databases
could be compared to the number between databases. This
latter number would not be expected to be inflated by
numerous full sibs and thus should be close to the numbers
predicted by substructure only.
It is clear from these simulations that, even for the best
models, the probability of the Arizona observations is only
9%–12%. The study of additional offender databases would
help add to the empirical foundation of this study and help
assess whether Arizona is the norm or, for some reason, an
odd outlier. Ultimately, if the simple models examined here
cannot adequately explain the number of matches observed
in the Arizona offender database, some modification of the
underlying probability models may be required.
The product rule with some minor modification is the
most common method for computing the frequency of DNA
profiles in forensic laboratories. This method relies critically
on the assumption that there is statistical independence
between loci. The empirical support for this method comes
mainly from tests of independence between pairs of loci
(Budowle et al. 1999). However, recent research on finite
populations, with mutation and a monogamous mating system
shows that departures from the product rule get worse as
one looks at more loci (Dr Yun Song, personal
communication). Thus, rigorous testing of the product rule predictions
at many loci may yield different results than prior work at
only two loci. Perhaps the most important quality control
issue in forensic DNA typing is determining the adequacy
of the methods for computing profile frequencies. In this
respect offender databases can serve a useful and unique
purpose, as apparently intended by the DNA Identification Act.
The tremendous size of these databases makes them a unique
resource which would cost many millions of dollars to
recreate. There is certainly much more that can be learned from
additional scientific research with offender databases.
End quotes
From the LAtimes piece, URL above.
To get the Illinois figure of 903 matches
at 9 loci in 220,000 also requires about 3.1
loci of shared co-ancestry. That is using a
.62/.3/.08 A-C/Caucasian/Hispanic AF split
to reflect an Illinois demographic, not
that that probably makes much difference
to these sorts of analyses.
The Maryland data of 32 matches at 9 loci in 30,000
looks too corrupted to be of much use.
I would expect fewer than 0.001 matches
at 13 loci in 30,000, so presumably
those 3 are duplicates. Even the remaining
29 matches at 9 loci are difficult to achieve,
even with 4 loci of background co-ancestry.
Getting partial match data divulged from these
databases avoids this problem of duplicates
when determining what the true full-profile
false match figures are within a whole population.
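As a rough check on the Illinois and Maryland figures above, here is
a minimal Python sketch that counts the number of distinct pairs in
a database of n profiles, n(n-1)/2, and divides the reported number of
9 loci matches by it to give a crude observed rate per pair. The
function names are my own for illustration. It makes no attempt to model
co-ancestry, relatives or duplicates, so it is only bookkeeping, not a
population genetic model, and the Maryland count still includes the
3 presumed duplicates.

```python
from math import comb

def pair_count(n):
    """Number of distinct pairs of profiles in a database of n profiles."""
    return comb(n, 2)

def matches_per_pair(matches, n):
    """Crude observed rate: reported matches divided by pairs compared."""
    return matches / pair_count(n)

# (reported 9 loci matches, database size) as quoted from the LA Times piece
databases = {
    "Illinois": (903, 220_000),
    "Maryland": (32, 30_000),
}

for name, (matches, n) in databases.items():
    print(f"{name}: {pair_count(n):,} pairs, "
          f"{matches_per_pair(matches, n):.2e} matches per pair")
```

Because the pair count grows roughly as the square of the database
size, the same per-pair rate gives far more matches in a merged
multi-state database than in the separate state databases added together.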
From John Buckleton, New Zealand.
"I have been asked to join this discussion. The partial matches in the
Arizona database have attracted quite an amount of attention much based
on preconception to me. The simple matter is to compare observed and
expected under the relevant population genetic models. Of interest is
whether we use the US model or the European one (which I prefer) and
whether or not we allow a correction for relatives in the datasets.
Whilst I have not done this for the Arizona dataset I (and others) have
now done it for Caucasians in New Zealand and Australia, Australian
Aboriginals, Eastern and Western Polynesians (all published) and
Croatians (in draft). ... deviating off subject and not returning"
The Annals of Applied Statistics
2007, Vol. 1, No. 2, 358–370
DOI: 10.1214/07-AOAS128
Institute of Mathematical Statistics, 2007
THE RARITY OF DNA PROFILES
BY BRUCE S. WEIR
University of Washington
http://projecteuclid.org/DPubS/Repository/1.0/Disseminate?view=body&id=pdfview_1&handle=euclid.aoas/1196438022
"The finding of Troyer, Gilroy and Koeneman (2001) was for a pair of profiles
that matched at nine loci, partially matched at three loci and mismatched at one locus."
So, in a sense, a 10.5 loci unrelated match (the NDNAD uses 10 loci).
As you only get the first tranche of unrelated matches
in the modal group, those with high frequency
alleles at every locus, like myself (all 4 grandparents
from England), perhaps it should
not be too surprising that partial or
full matches apparently reflect
a high level of co-ancestry.
No wonder the FBI want to keep a lid on
this sort of data. Strange that they
have released, in the past, (doctored) data
of DNA profiles with ethnicity. The FSS
is perfectly happy to sell millions of
DNA profiles, with ethnicity, to UK companies:
http://www.telegraph.co.uk/news/newstopics/politics/lawandorder/2459976/Millions-of-profiles-from-DNA-database-passed-to-private-firms.html
25 Jul 2008
The full set of pairs of Arizona 9 loci matches, with
ethnicities, should be released into the public realm,
or the only obvious interpretation will be
drawn from its suppression.