Simulation of a large DNA Profile Database, full alleles , >8 per
cent allele frequency co-ancestry
Simulation of a large DNA Profile Database, full alleles, >8 per
cent allele frequency co-ancestry
Result - for people with the same sort of ancestry as myself then
in that subset of the general population there would probably
be between 50 and 80 10 loci matches within 2 million such people.
' Generating 10 loci x2 profiles with
' AF >= 8.7%
' directing pairs and first divider
Dim ph(20)
Dim pb(20)
' initialising Random Number Generator - RNG
count9 = 0
count8 = 0
countf = 0
Randomize
a = 214013
c = 2531011
x0 = Timer
z = 2 ^ 24
' 1 file 'dec29-g' for original, un-directed pairs, source data.
' This file is necessary to check on the performance of the RNG
' when a matched pair is found then it is highly unlikely that
' both sequences as generated, before pair directing, would
' be the same - more likely a manifest of repeat within the RNG
' (reason for adopting the 214013 / 2531011 RNG )
' In final match checking use 'Word' find function on part of the sequences, including pair reversals,
' with luck would include a 'homozygotic' pair eg (3,3) say ,so no reversal
' on that pair
Open "dec29-g" For Output As #1
' outputs directed and divided by first digit
Open "dec29-0" For Output As #10
Open "dec29-1" For Output As #11
Open "dec29-2" For Output As #12
Open "dec29-3" For Output As #13
Open "dec29-4" For Output As #14
Open "dec29-5" For Output As #15
Open "dec29-6" For Output As #16
Open "dec29-7" For Output As #17
Open "dec29-8" For Output As #18
Open "dec29-9" For Output As #19
' change for different total size eg 199999 for 200,000
For x = 0 To 4999
flag = 0
For j = 0 To 1
' vWA ,first locus
' RNG random number generator
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.001 Then ph(j) = 11
If ph(j) < 0.106 Then ph(j) = 1
If ph(j) < 0.186 Then ph(j) = 2
If ph(j) < 0.402 Then ph(j) = 3
If ph(j) < 0.672 Then ph(j) = 4
If ph(j) < 0.891 Then ph(j) = 5
If ph(j) < 0.984 Then ph(j) = 6
If ph(j) < 0.998 Then ph(j) = 7
If ph(j) < 1 Then ph(j) = 8
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "2" Then flag = 1
If ph(j) = "7" Then flag = 1
If ph(j) = "8" Then flag = 1
Next j
For j = 2 To 3
' THO1
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.002 Then ph(j) = 11
If ph(j) < 0.243 Then ph(j) = 1
If ph(j) < 0.437 Then ph(j) = 2
If ph(j) < 0.545 Then ph(j) = 3
If ph(j) < 0.546 Then ph(j) = 4
If ph(j) < 0.686 Then ph(j) = 5
If ph(j) < 0.99 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 7
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "4" Then flag = 1
If ph(j) = "7" Then flag = 1
Next j
For j = 4 To 5
' D8
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.018 Then ph(j) = 11
If ph(j) < 0.031 Then ph(j) = 1
If ph(j) < 0.125 Then ph(j) = 2
If ph(j) < 0.191 Then ph(j) = 3
If ph(j) < 0.334 Then ph(j) = 4
If ph(j) < 0.667 Then ph(j) = 5
If ph(j) < 0.876 Then ph(j) = 6
If ph(j) < 0.964 Then ph(j) = 7
If ph(j) < 0.995 Then ph(j) = 8
If ph(j) < 1 Then ph(j) = 9
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "3" Then flag = 1
If ph(j) = "8" Then flag = 1
If ph(j) = "9" Then flag = 1
Next j
For j = 6 To 7
' FGA
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.025 Then ph(j) = 11
If ph(j) < 0.081 Then ph(j) = 1
If ph(j) < 0.224 Then ph(j) = 2
If ph(j) < 0.226 And ph(j) >= 0.224 Then pb(j) = "A"
If ph(j) < 0.413 Then ph(j) = 3
If ph(j) < 0.415 And ph(j) >= 0.413 Then pb(j) = "B"
If ph(j) < 0.58 Then ph(j) = 4
If ph(j) < 0.591 And ph(j) >= 0.58 Then pb(j) = "C"
If ph(j) < 0.73 Then ph(j) = 5
If ph(j) < 0.734 And ph(j) >= 0.73 Then pb(j) = "D"
If ph(j) < 0.88 Then ph(j) = 6
If ph(j) < 0.882 And ph(j) >= 0.88 Then pb(j) = "E"
If ph(j) < 0.957 Then ph(j) = 7
If ph(j) < 0.992 Then ph(j) = 8
If ph(j) < 0.999 Then ph(j) = 9
If ph(j) < 1 And ph(j) >= 0.999 Then pb(j) = "F"
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "7" Then flag = 1
If ph(j) = "8" Then flag = 1
If ph(j) = "9" Then flag = 1
If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1
If pb(j) = "C" Then flag = 1
If pb(j) = "D" Then flag = 1
If pb(j) = "E" Then flag = 1
If pb(j) = "F" Then flag = 1
Next j
For j = 8 To 9
' D21
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.001 Then pb(j) = "A"
If ph(j) < 0.002 And ph(j) >= 0.001 Then pb(j) = "B"
If ph(j) < 0.033 Then ph(j) = 11
If ph(j) < 0.193 Then ph(j) = 1
If ph(j) < 0.419 Then ph(j) = 2
If ph(j) < 0.677 Then ph(j) = 3
If ph(j) < 0.704 Then ph(j) = 4
If ph(j) < 0.773 Then ph(j) = 5
If ph(j) < 0.866 Then ph(j) = 6
If ph(j) < 0.884 Then ph(j) = 7
If ph(j) < 0.974 Then ph(j) = 8
If ph(j) < 0.975 And ph(j) >= 0.974 Then pb(j) = "C"
If ph(j) < 0.997 Then ph(j) = 9
If ph(j) < 1 And ph(j) >= 0.997 Then pb(j) = "D"
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
If ph(j) = "0" Then flag = 1
If ph(j) = "4" Then flag = 1
If ph(j) = "5" Then flag = 1
If ph(j) = "7" Then flag = 1
If ph(j) = "9" Then flag = 1
If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1
If pb(j) = "C" Then flag = 1
If pb(j) = "D" Then flag = 1
Next j
For j = 10 To 11
' D18
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.001 Then pb(j) = "A"
If ph(j) < 0.009 And ph(j) >= 0.001 Then pb(j) = "B"
If ph(j) < 0.021 Then ph(j) = 11
If ph(j) < 0.16 Then ph(j) = 1
If ph(j) < 0.285 Then ph(j) = 2
If ph(j) < 0.449 Then ph(j) = 3
If ph(j) < 0.594 Then ph(j) = 4
If ph(j) < 0.731 Then ph(j) = 5
If ph(j) < 0.846 Then ph(j) = 6
If ph(j) < 0.926 Then ph(j) = 7
If ph(j) < 0.967 Then ph(j) = 8
If ph(j) < 0.982 Then ph(j) = 9
If ph(j) < 0.992 And ph(j) >= 0.982 Then pb(j) = "C"
If ph(j) < 0.997 And ph(j) >= 0.992 Then pb(j) = "D"
If ph(j) < 0.998 And ph(j) >= 0.997 Then pb(j) = "E"
If ph(j) < 1 And ph(j) >= 0.998 Then pb(j) = "F"
' allele 20 (C) reduced from .017 to .015 as allele
' frequencies summed to 1.002
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
If ph(j) = "0" Then flag = 1
If ph(j) = "7" Then flag = 1
If ph(j) = "8" Then flag = 1
If ph(j) = "9" Then flag = 1
If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1
If pb(j) = "C" Then flag = 1
If pb(j) = "D" Then flag = 1
If pb(j) = "E" Then flag = 1
If pb(j) = "F" Then flag = 1
Next j
For j = 12 To 13
' D2S1338
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.037 Then ph(j) = 11
If ph(j) < 0.222 Then ph(j) = 1
If ph(j) < 0.309 Then ph(j) = 2
If ph(j) < 0.419 Then ph(j) = 3
If ph(j) < 0.557 Then ph(j) = 4
If ph(j) < 0.589 Then ph(j) = 5
If ph(j) < 0.613 Then ph(j) = 6
If ph(j) < 0.725 Then ph(j) = 7
If ph(j) < 0.867 Then ph(j) = 8
If ph(j) < 0.978 Then ph(j) = 9
If ph(j) < 0.997 And ph(j) >= 0.978 Then pb(j) = "A"
If ph(j) < 1 And ph(j) >= 0.997 Then pb(j) = "B"
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
If ph(j) = "0" Then flag = 1
If ph(j) = "5" Then flag = 1
If ph(j) = "6" Then flag = 1
If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1
Next j
For j = 14 To 15
' D16
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.019 Then ph(j) = 11
If ph(j) < 0.148 Then ph(j) = 1
If ph(j) < 0.202 Then ph(j) = 2
If ph(j) < 0.491 Then ph(j) = 3
If ph(j) < 0.779 Then ph(j) = 4
If ph(j) < 0.965 Then ph(j) = 5
If ph(j) < 0.994 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 7
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "2" Then flag = 1
If ph(j) = "6" Then flag = 1
If ph(j) = "7" Then flag = 1
Next j
For j = 16 To 17
' D19
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.087 Then ph(j) = 11
If ph(j) < 0.309 Then ph(j) = 1
If ph(j) < 0.322 Then ph(j) = 2
If ph(j) < 0.704 Then ph(j) = 3
If ph(j) < 0.719 Then ph(j) = 4
If ph(j) < 0.896 Then ph(j) = 5
If ph(j) < 0.934 Then ph(j) = 6
If ph(j) < 0.975 Then ph(j) = 7
If ph(j) < 0.992 Then ph(j) = 8
If ph(j) < 0.997 Then ph(j) = 9
If ph(j) < 0.999 And ph(j) >= 0.997 Then pb(j) = "A"
If ph(j) < 1 And ph(j) >= 0.999 Then pb(j) = "B"
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
If ph(j) = "2" Then flag = 1
If ph(j) = "4" Then flag = 1
If ph(j) = "6" Then flag = 1
If ph(j) = "7" Then flag = 1
If ph(j) = "8" Then flag = 1
If ph(j) = "9" Then flag = 1
If pb(j) = "A" Then flag = 1
If pb(j) = "B" Then flag = 1
Next j
For j = 18 To 19
' D3
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.001 Then ph(j) = 11
If ph(j) < 0.007 Then ph(j) = 1
If ph(j) < 0.139 Then ph(j) = 2
If ph(j) < 0.404 Then ph(j) = 3
If ph(j) < 0.651 Then ph(j) = 4
If ph(j) < 0.846 Then ph(j) = 5
If ph(j) < 0.987 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 7
If ph(j) > 10 Then ph(j) = 0
If ph(j) = "0" Then flag = 1
If ph(j) = "1" Then flag = 1
If ph(j) = "7" Then flag = 1
Next j
If flag = 1 Then countf = countf + 1
If flag = 0 Then
' output the original generated file
Write #1, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
' Because in real DNA profiles without further info ,no one
' knows which allele in each pair came from the mother or father
' by convention they are written smaller ,larger (or equal).
' The following directs each pair
For j = 0 To 18 Step 2
If ph(j + 1) < ph(j) Then
jjj = ph(j)
ph(j) = ph(j + 1)
ph(j + 1) = jjj
End If
Next j
' put extra conditional statements here to reduce
' the number of files or just delete some of the following
'
' dividing on first column, file by file
If ph(0) = 0 Then
Write #10, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count0 = count0 + 1
End If
If ph(0) = 1 Then
Write #11, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count1 = count1 + 1
End If
If ph(0) = 2 Then
Write #12, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count2 = count2 + 1
End If
If ph(0) = 3 Then
Write #13, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count3 = count3 + 1
End If
If ph(0) = 4 Then
Write #14, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count4 = count4 + 1
End If
If ph(0) = 5 Then
Write #15, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count5 = count5 + 1
End If
If ph(0) = 6 Then
Write #16, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count6 = count6 + 1
End If
If ph(0) = 7 Then
Write #17, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count7 = count7 + 1
End If
If ph(0) = 8 Then
Write #18, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count8 = count8 + 1
End If
If ph(0) = 9 Then
Write #19, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19)
count9 = count9 + 1
End If
End If
Next x
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #18
Close #19
Close #1
' count file for data to fix for - next loops in sucessive dividings
' count for >= 8.7% alleles is x minus countf
' countf is the count of rejected profiles with at least one
' allele less than 8.7% af
Open "dec29-c" For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9, "countflag", countf
Close #20
Results
My own genetic background is parents and
grandparents from 2 separated English counties.
Then a generation further back one from another
county and one from Ireland. Further back ,
then a Huguenot input but too far back to influence
my DNA profile.
So for these purposes '4 fourths English' presumably
reflected in my own DNA profile which contains
alleles of 8.7% or more frequency of occurance,
ie nothing rarer than 8.7%.
Unless anyone can show me otherwise I will
take that as farely representative, I cannot
believe my background is that exceptional
for England.
I have adjusted my simulation to only generate
profiles with allele frequencies more than 8%
This increases the chance of false matches within
THAT subset from the general UK Caucasian
of 1.2 in 2 million to about 1 in 270,000.
381,843 such >8% profiles produced 2
' 10 loci ' matches
"33566645121479143336"
"44164545232517133546"
Which translate to standard form
vWA,THO1,D8,FGA,D2,D18,D2,D16,D19,D3
(16,16)(9,9.3)(14,14)(22,23)(28,29)(12,15)(23,25)(9,12)(14,14)(15,18) and
(17,17)(6,9.3)(12,13)(22,23)(29,30)(13,16)(17,23)(9,11)(14,15)(16,18)
and as usual the lowest frequency of any allele found
in the above 2 matches is increased, this time to 11.1%
Matches started as
33566654124179413363 and
33656645121479143336
44615454325271315364 and
44165445232571313546
so RNG good
Other matches for first (not any )
12 digit 19,089
14 837
16 104
18 8
19 true excluding '20s' 5
20 2
So 2 matches in 381,843 scales to about
1 in 220,000 to 270,000 (as 2 represents >2 and <3 )
or going the other way assuming square law
doesn't breakdown then about
54 to 82 in 2 million or
1,400 to 2,000 10 loci matches in 10 million.
The corresponding result for 19 out of 20 matches
and that is permutate any 19 from 20 and for
a population of 4 million is about
(4*60) * 120 <> 28,000
( 60 between 54 and 82, scaled for 4 million and 120 from dnas5.htm results)
or 1 in 140
If the NDNAD was still using the pre-1999 6 loci
structure then for 2 million profiles of this >= 8%
subset then that would have meant about 523,000
matching profiles within the database.
For the NDNAD as it stood with 840,000 profiles
in Spring 2000 mainly 6 loci profiles then number of
matches must be greater than 5,000 and less than 90,000 .
5,000 scaled from dnas5.htm simulation (no co-ancestry ) and 90,000 scaled from
the dnas6.htm simulation (>= 8% allele frequency only).
On top of this would be people using aliases and double
entries but that figure is unquantifiable by simulation.
For the brother matching situation then the
general UK Caucasian figure of 1 in 30,000
probably drops to about 1 in 600 for '4-fourths English'.