Simulation of a large DNA Profile Database, full alleles , for within
and without ethnic clustering
Simulation of a large DNA Profile Database, full alleles, for within
and without ethnic clustering
' Generating 8 loci x2 profiles (Taiwan Tao loci
' vWA,THO1,D8,FGA,D21,D18,D16,D3
' 10% flagged 'TT'
' directing pairs and first divider
Dim ph(20)
Dim pb(20)
' initialising Random Number Generator - RNG
count9 = 0
count8 = 0
Randomize
a = 214013
c = 2531011
x0 = Timer
z = 2 ^ 24
' 1 file 'jan03-Tfg' for original, un-directed pairs, source data.
' This file is necessary to check on the performance of the RNG
' when a matched pair is found then it is highly unlikely that
' both sequences as generated, before pair directing, would
' be the same - more likely a manifest of repeat within the RNG
' (reason for adopting the 214013 / 2531011 RNG )
' Use 'Word' find function on part of the sequences, including pair reversals,
' with luck would include a 'homozygotic' pair eg (3,3) say ,so no reversal
' on that pair
Open "jan03-Tfg" For Output As #1
' outputs directed and divided by first digit
Open "jan03-Tf0" For Output As #10
Open "jan03-Tf1" For Output As #11
Open "jan03-Tf2" For Output As #12
Open "jan03-Tf3" For Output As #13
Open "jan03-Tf4" For Output As #14
Open "jan03-Tf5" For Output As #15
Open "jan03-Tf6" For Output As #16
Open "jan03-Tf7" For Output As #17
' change for different total size eg 199999 for 200,000
For x = 0 To 4999
For j = 0 To 1
' vWA
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.111 Then ph(j) = 11
If ph(j) < 0.179 Then ph(j) = 1
If ph(j) < 0.358 Then ph(j) = 2
If ph(j) < 0.482 Then ph(j) = 3
If ph(j) < 0.734 Then ph(j) = 4
If ph(j) < 0.953 Then ph(j) = 5
If ph(j) < 1 Then ph(j) = 6
If ph(j) > 10 Then ph(j) = 0
Next j
For j = 2 To 3
' THO1
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.026 Then ph(j) = 11
If ph(j) < 0.351 Then ph(j) = 1
If ph(j) < 0.419 Then ph(j) = 2
If ph(j) < 0.987 Then ph(j) = 3
If ph(j) < 1 Then ph(j) = 4
If ph(j) > 10 Then ph(j) = 0
Next j
For j = 4 To 5
' D8
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.128 Then ph(j) = 11
If ph(j) < 0.222 Then ph(j) = 1
If ph(j) < 0.243 Then ph(j) = 2
If ph(j) < 0.807 Then ph(j) = 3
If ph(j) < 0.905 Then ph(j) = 4
If ph(j) < 0.978 Then ph(j) = 5
If ph(j) < 0.991 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 8
If ph(j) > 10 Then ph(j) = 0
Next j
For j = 6 To 7
' FGA
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.021 Then ph(j) = 11
If ph(j) < 0.495 Then ph(j) = 1
If ph(j) < 0.559 Then ph(j) = 2
If ph(j) < 0.606 Then ph(j) = 3
If ph(j) < 0.683 Then ph(j) = 4
If ph(j) < 0.687 Then ph(j) = 5
If ph(j) < 0.846 Then ph(j) = 6
If ph(j) < 0.863 Then ph(j) = 7
If ph(j) < 0.958 Then ph(j) = 8
If ph(j) < 0.996 Then ph(j) = 9
If ph(j) < 1 And ph(j) >= 0.996 Then pb(j) = "A"
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
Next j
For j = 8 To 9
' D21
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.009 Then ph(j) = 11
If ph(j) < 0.304 Then ph(j) = 1
If ph(j) < 0.436 Then ph(j) = 2
If ph(j) < 0.466 Then ph(j) = 4
If ph(j) < 0.547 Then ph(j) = 5
If ph(j) < 0.556 Then ph(j) = 6
If ph(j) < 0.761 Then ph(j) = 7
If ph(j) < 1 Then ph(j) = 9
If ph(j) > 10 Then ph(j) = 0
Next j
For j = 10 To 11
' D18
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.004 Then ph(j) = 11
If ph(j) < 0.107 Then ph(j) = 1
If ph(j) < 0.214 Then ph(j) = 2
If ph(j) < 0.47 Then ph(j) = 3
If ph(j) < 0.577 Then ph(j) = 4
If ph(j) < 0.667 Then ph(j) = 5
If ph(j) < 0.877 Then ph(j) = 6
If ph(j) < 0.945 Then ph(j) = 7
If ph(j) < 0.992 Then ph(j) = 8
If ph(j) < 0.996 Then ph(j) = 9
If ph(j) < 1 And ph(j) >= 0.996 Then pb(j) = "A"
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
Next j
For j = 12 To 13
' D16
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.235 Then ph(j) = 11
If ph(j) < 0.312 Then ph(j) = 1
If ph(j) < 0.594 Then ph(j) = 2
If ph(j) < 0.833 Then ph(j) = 3
If ph(j) < 1 Then ph(j) = 4
If ph(j) > 10 Then ph(j) = 0
Next j
For j = 14 To 15
' D3
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.021 Then ph(j) = 11
If ph(j) < 0.26 Then ph(j) = 1
If ph(j) < 0.529 Then ph(j) = 2
If ph(j) < 0.85 Then ph(j) = 3
If ph(j) < 0.987 Then ph(j) = 4
If ph(j) < 1 Then ph(j) = 5
If ph(j) > 10 Then ph(j) = 0
Next j
ph(16) = " "
If (x / 10) - Int(x / 10) = 0 Then ph(16) = "TT"
' output the original generated file
Write #1, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
' Because in real DNA profiles without further info ,no one
' knows which allele in each pair came from the mother or father
' by convention they are written smaller ,larger (or equal).
' The following directs each pair
For j = 0 To 4 Step 2
If ph(j + 1) < ph(j) Then
jjj = ph(j)
ph(j) = ph(j + 1)
ph(j + 1) = jjj
End If
Next j
' put extra conditional statements here to reduce
' the number of files or just delete some of the following
'
' dividing on first column, file by file
If ph(0) = 0 Then
Write #10, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count0 = count0 + 1
End If
If ph(0) = 1 Then
Write #11, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count1 = count1 + 1
End If
If ph(0) = 2 Then
Write #12, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count2 = count2 + 1
End If
If ph(0) = 3 Then
Write #13, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count3 = count3 + 1
End If
If ph(0) = 4 Then
Write #14, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count4 = count4 + 1
End If
If ph(0) = 5 Then
Write #15, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count5 = count5 + 1
End If
If ph(0) = 6 Then
Write #16, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count6 = count6 + 1
End If
If ph(0) = 7 Then
Write #17, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count7 = count7 + 1
End If
Next x
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #1
' count file for data to fix for - next loops in sucessive dividings
Open "jan03-Tfc" For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7
Close #20
' Generating 8 loci x2 profiles (Taiwan Rukai loci
' vWA,THO1,D8,FGA,D21,D18,D16,D3
' 10% flagged 'RR'
' directing pairs and first divider
Dim ph(20)
Dim pb(20)
' initialising Random Number Generator - RNG
count9 = 0
count8 = 0
Randomize
a = 214013
c = 2531011
x0 = Timer
z = 2 ^ 24
' 1 file 'jan03-Rfg' for original, un-directed pairs, source data.
' This file is necessary to check on the performance of the RNG
' when a matched pair is found then it is highly unlikely that
' both sequences as generated, before pair directing, would
' be the same - more likely a manifest of repeat within the RNG
' (reason for adopting the 214013 / 2531011 RNG )
' Use 'Word' find function on part of the sequences, including pair reversals,
' with luck would include a 'homozygotic' pair eg (3,3) say ,so no reversal
' on that pair
Open "jan03-Rfg" For Output As #1
' outputs directed and divided by first digit
Open "jan03-Rf0" For Output As #10
Open "jan03-Rf1" For Output As #11
Open "jan03-Rf2" For Output As #12
Open "jan03-Rf3" For Output As #13
Open "jan03-Rf4" For Output As #14
Open "jan03-Rf5" For Output As #15
Open "jan03-Rf6" For Output As #16
Open "jan03-Rf7" For Output As #17
' change for different total size eg 199999 for 200,000
For x = 0 To 4999
For j = 0 To 1
' vWA
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.074 Then ph(j) = 11
If ph(j) < 0.111 Then ph(j) = 1
If ph(j) < 0.162 Then ph(j) = 2
If ph(j) < 0.434 Then ph(j) = 3
If ph(j) < 0.794 Then ph(j) = 4
If ph(j) < 0.956 Then ph(j) = 5
If ph(j) < 0.993 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 7
If ph(j) > 10 Then ph(j) = 0
Next j
For j = 2 To 3
' THO1
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.154 Then ph(j) = 11
If ph(j) < 0.551 Then ph(j) = 1
If ph(j) < 0.735 Then ph(j) = 2
If ph(j) < 0.926 Then ph(j) = 3
If ph(j) < 1 Then ph(j) = 4
If ph(j) > 10 Then ph(j) = 0
Next j
For j = 4 To 5
' D8
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.132 Then ph(j) = 11
If ph(j) < 0.280 Then ph(j) = 1
If ph(j) < 0.449 Then ph(j) = 2
If ph(j) < 0.589 Then ph(j) = 3
If ph(j) < 0.883 Then ph(j) = 4
If ph(j) < 0.986 Then ph(j) = 5
If ph(j) < 0.993 Then ph(j) = 6
If ph(j) < 1 Then ph(j) = 7
If ph(j) > 10 Then ph(j) = 0
Next j
For j = 6 To 7
' FGA
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.088 Then ph(j) = 1
If ph(j) < 0.103 Then ph(j) = 2
If ph(j) < 0.294 Then ph(j) = 3
If ph(j) < 0.434 Then ph(j) = 4
If ph(j) < 0.544 Then ph(j) = 6
If ph(j) < 0.742 Then ph(j) = 7
If ph(j) < 0.860 Then ph(j) = 8
If ph(j) < 0.963 Then ph(j) = 9
If ph(j) < 0.978 And ph(j) >= 0.963 Then pb(j) = "A"
If ph(j) < 1 And ph(j) >= 0.978 Then pb(j) = "B"
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
Next j
For j = 8 To 9
' D21
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.066 Then ph(j) = 11
If ph(j) < 0.257 Then ph(j) = 1
If ph(j) < 0.581 Then ph(j) = 2
If ph(j) < 0.596 Then ph(j) = 4
If ph(j) < 0.640 Then ph(j) = 5
If ph(j) < 0.669 Then ph(j) = 6
If ph(j) < 0.838 Then ph(j) = 7
If ph(j) < 0.934 Then ph(j) = 8
If ph(j) < 0.941 Then ph(j) = 9
If ph(j) < 0.985 And ph(j) >= 0.941 Then pb(j) = "A"
If ph(j) < 1 And ph(j) >= 0.985 Then pb(j) = "B"
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
Next j
For j = 10 To 11
' D18
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
pb(j) = "Z"
If ph(j) < 0.022 Then ph(j) = 1
If ph(j) < 0.081 Then ph(j) = 2
If ph(j) < 0.375 Then ph(j) = 3
If ph(j) < 0.663 Then ph(j) = 4
If ph(j) < 0.869 Then ph(j) = 5
If ph(j) < 0.920 Then ph(j) = 6
If ph(j) < 0.942 Then ph(j) = 8
If ph(j) < 0.949 Then ph(j) = 9
If ph(j) < 0.986 And ph(j) >= 0.949 Then pb(j) = "A"
If ph(j) < 0.993 And ph(j) >= 0.986 Then pb(j) = "B"
If ph(j) < 1 And ph(j) >= 0.993 Then pb(j) = "C"
If ph(j) > 10 Then ph(j) = 0
If pb(j) <> "Z" Then ph(j) = pb(j)
Next j
For j = 12 To 13
' D16
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.162 Then ph(j) = 11
If ph(j) < 0.324 Then ph(j) = 1
If ph(j) < 0.434 Then ph(j) = 2
If ph(j) < 0.706 Then ph(j) = 3
If ph(j) < 0.985 Then ph(j) = 4
If ph(j) < 1 Then ph(j) = 5
If ph(j) > 10 Then ph(j) = 0
Next j
For j = 14 To 15
' D3
' RNG
temp = x0 * a + c
temp = temp / z
x1 = (temp - Fix(temp)) * z
x0 = x1
phj = x1 / z
ph(j) = phj
If ph(j) < 0.022 Then ph(j) = 11
If ph(j) < 0.368Then ph(j) = 1
If ph(j) < 0.787 Then ph(j) = 2
If ph(j) < 0.949Then ph(j) = 3
If ph(j) < 1 Then ph(j) = 4
If ph(j) > 10 Then ph(j) = 0
Next j
ph(16) = " "
If (x / 10) - Int(x / 10) = 0 Then ph(16) = "RR"
' output the original generated file
Write #1, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
' Because in real DNA profiles without further info ,no one
' knows which allele in each pair came from the mother or father
' by convention they are written smaller ,larger (or equal).
' The following directs each pair
For j = 0 To 4 Step 2
If ph(j + 1) < ph(j) Then
jjj = ph(j)
ph(j) = ph(j + 1)
ph(j + 1) = jjj
End If
Next j
' put extra conditional statements here to reduce
' the number of files or just delete some of the following
'
' dividing on first column, file by file
If ph(0) = 0 Then
Write #10, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count0 = count0 + 1
End If
If ph(0) = 1 Then
Write #11, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count1 = count1 + 1
End If
If ph(0) = 2 Then
Write #12, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count2 = count2 + 1
End If
If ph(0) = 3 Then
Write #13, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count3 = count3 + 1
End If
If ph(0) = 4 Then
Write #14, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count4 = count4 + 1
End If
If ph(0) = 5 Then
Write #15, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count5 = count5 + 1
End If
If ph(0) = 6 Then
Write #16, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count6 = count6 + 1
End If
If ph(0) = 7 Then
Write #17, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16)
count7 = count7 + 1
End If
Next x
Close #10
Close #11
Close #12
Close #13
Close #14
Close #15
Close #16
Close #17
Close #1
' count file for data to fix for - next loops in sucessive dividings
Open "jan03-Rfc" For Output As #20
Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7
Close #20
This is the results for simulating 2 separate populations
within the same country. To explore the increased
probability of false matches within a highly inter-related
community compared to a background population
Source file
http://www.cpu.edu.tw/~fsjournal/content/vol1.no.1/p4(31-37).pdf
shows indiginous populations AF data in Taiwan.
From the cross-plot the genetically most dissimilar
are the Tao and Rukai.
I adapted the generator of my simulation for 8
of the loci common to SGM+ for both Tao
and Rukai 'populations'.
Did a 500,000 run for Tao with also a randomly
selected and flagged 10 per cent of that population.
Determined the number of matches in the 500,000
and the 50,000.
Repeated with modification for the Rukai 500,000
and 50,000.
Then took the 50,000 Tao and merged with the 450,000
Rukai and determined matches concerning only Tao and
Rukai. Not many at all.
Rukai 500,000
10 digit matches 38,582
12 digit 2,359
14 digit 132
16 digit 12
Scales according to assumed square law to
10 loci equivalent of about 1 in 620,000
matches within selected 50,000
10 digit 471
12 digit 23
14 digit 3
16 digit 0
( NB the square law in operation 50,000 to 500,000
is tenfold increase but 12 digit matches increase hundred-fold
from 23 to 2,359 )
------------------
Tao 500,000
10 digit 174,534
12 digit 17,880
14 digit 1,172
16 digit 59
Scales according to assumed square law to
10 loci equivalent of about 1 in 360,000
matches within the randomly selected 50,000
10 digit 5,533
12 digit 234
14 digit 9
16 digit 0
---------------------
Matches within 50,000 Tao and 45,000 Rukai
10 digit 38, 402
12 digit 2,189
14 digit 114
16 digit 10
Matches only concerning Tao with Tao and Tao with Rukai
10 digit 5,926
12 digit 247
14 digit 11
16 digit 0
So extracting the within 50K Tao matches of 5,533 & 234 & 9 & 0
gives only
10 digit 393
12 digit 13
14 digit 2
16 digit 0
Scales according to assumed square law to
10 loci equivalent of about 1 in 8 million
So making a difference of about 500 by square law