Simulation of a large DNA Profile Database, full alleles

Simulation of a large DNA Profile Database, full alleles



Previous simulation ( file dnas.htm ) used only a maximum of 10 alleles, 
wheras the average for UK caucasians published in journal 
articles have an average of 11.3 alleles per locus.
The method disclosed here would be expandable to 
any number of loci (eg 13 for the USA ) and more than 
60 alleles per locus if ever required.

Visual basic macros and Word macros are 
between the horizontal rules. 
Before going into Visual Basic Editor go into ordinary 
Word and call up anything in the directory you want 
the VB files to go into as this is not designated in the 
following code.
Using plain text handling Notepad with no line wrap 
copy to Run/Create and place between Sub / End Sub ,Reset and Run.
Some long conditional statements may break 
and would need reconcattenating before running.

If using straight VB6 then designate the directory 
for files by "replace all" occurances of dec14 to 
c:\vb\dec14 or whatever, also add a sound progress 
indicator before the [ next x ] line 
 
If x/1000 = Int (x/1000 ) Then Beep
 
before highlighting and copying.

In VB6 open New Project
In Form1 open up a Command1 button
Double click this button to open command 
code window and copy and paste the 'DNA' VB code 
between the Private Sub Command1_click ()
and the End Sub
Then Run/Start
Press command1
Wait until Beep/ clicks cease


Despite official government sites linking to these files there are still corrupt persons knocking out my sites, so for the purposes of searchengines cross-linking them, files no longer available on the original web hosting sites were on http://www.nutteing.50megs.com/dnas5.htm , http://www.nutteing.freeisp.co.uk/dnas5.htm , http://www.nutteing.batcave.net/dnas5.htm , http://home.graffiti.net/nutteing/dnas5.htm , http://nutteing.no-frills.net/dnas5.htm and http://nutteing3.no-frills.net/dnas5.htm (last 2 due now to host failure)

' Generating 10 loci x2 profiles ' directing pairs and first divider Dim ph(20) Dim pb(20) ' initialising Random Number Generator - RNG count9 = 0 count8 = 0 Randomize a = 214013 c = 2531011 x0 = Timer z = 2 ^ 24 ' 1 file 'dec14-g' for original, un-directed pairs, source data. ' This file is necessary to check on the performance of the RNG ' when a matched pair is found then it is highly unlikely that ' both sequences as generated, before pair directing, would ' be the same - more likely a manifest of repeat within the RNG ' (reason for adopting the 214013 / 2531011 RNG ) ' Use 'Word' find function on part of the sequences, including pair reversals, ' with luck would include a 'homozygotic' pair eg (3,3) say ,so no reversal ' on that pair Open "dec14-g" For Output As #1 ' outputs directed and divided by first digit Open "dec14-0" For Output As #10 Open "dec14-1" For Output As #11 Open "dec14-2" For Output As #12 Open "dec14-3" For Output As #13 Open "dec14-4" For Output As #14 Open "dec14-5" For Output As #15 Open "dec14-6" For Output As #16 Open "dec14-7" For Output As #17 Open "dec14-8" For Output As #18 Open "dec14-9" For Output As #19 ' change for different total size eg 199999 for 200,000 For x = 0 To 4999 For j = 0 To 1 ' vWA ,first locus ' RNG random number generator temp = x0 * a + c temp = temp / z x1 = (temp - Fix(temp)) * z x0 = x1 phj = x1 / z ph(j) = phj If ph(j) < 0.001 Then ph(j) = 11 If ph(j) < 0.106 Then ph(j) = 1 If ph(j) < 0.186 Then ph(j) = 2 If ph(j) < 0.402 Then ph(j) = 3 If ph(j) < 0.672 Then ph(j) = 4 If ph(j) < 0.891 Then ph(j) = 5 If ph(j) < 0.984 Then ph(j) = 6 If ph(j) < 0.998 Then ph(j) = 7 If ph(j) < 1 Then ph(j) = 8 If ph(j) > 10 Then ph(j) = 0 Next j For j = 2 To 3 ' THO1 ' RNG temp = x0 * a + c temp = temp / z x1 = (temp - Fix(temp)) * z x0 = x1 phj = x1 / z ph(j) = phj If ph(j) < 0.002 Then ph(j) = 11 If ph(j) < 0.243 Then ph(j) = 1 If ph(j) < 0.437 Then ph(j) = 2 If ph(j) < 0.545 Then ph(j) = 3 If ph(j) < 0.546 Then ph(j) = 4 If ph(j) < 0.686 Then ph(j) = 5 If ph(j) < 0.99 Then ph(j) = 6 If ph(j) < 1 Then ph(j) = 7 If ph(j) > 10 Then ph(j) = 0 Next j For j = 4 To 5 ' D8 ' RNG temp = x0 * a + c temp = temp / z x1 = (temp - Fix(temp)) * z x0 = x1 phj = x1 / z ph(j) = phj If ph(j) < 0.018 Then ph(j) = 11 If ph(j) < 0.031 Then ph(j) = 1 If ph(j) < 0.125 Then ph(j) = 2 If ph(j) < 0.191 Then ph(j) = 3 If ph(j) < 0.334 Then ph(j) = 4 If ph(j) < 0.667 Then ph(j) = 5 If ph(j) < 0.876 Then ph(j) = 6 If ph(j) < 0.964 Then ph(j) = 7 If ph(j) < 0.995 Then ph(j) = 8 If ph(j) < 1 Then ph(j) = 9 If ph(j) > 10 Then ph(j) = 0 Next j For j = 6 To 7 ' FGA ' RNG temp = x0 * a + c temp = temp / z x1 = (temp - Fix(temp)) * z x0 = x1 phj = x1 / z ph(j) = phj pb(j) = "Z" If ph(j) < 0.025 Then ph(j) = 11 If ph(j) < 0.081 Then ph(j) = 1 If ph(j) < 0.224 Then ph(j) = 2 If ph(j) < 0.226 And ph(j) >= 0.224 Then pb(j) = "A" If ph(j) < 0.413 Then ph(j) = 3 If ph(j) < 0.415 And ph(j) >= 0.413 Then pb(j) = "B" If ph(j) < 0.58 Then ph(j) = 4 If ph(j) < 0.591 And ph(j) >= 0.58 Then pb(j) = "C" If ph(j) < 0.73 Then ph(j) = 5 If ph(j) < 0.734 And ph(j) >= 0.73 Then pb(j) = "D" If ph(j) < 0.88 Then ph(j) = 6 If ph(j) < 0.882 And ph(j) >= 0.88 Then pb(j) = "E" If ph(j) < 0.957 Then ph(j) = 7 If ph(j) < 0.992 Then ph(j) = 8 If ph(j) < 0.999 Then ph(j) = 9 If ph(j) < 1 And ph(j) >= 0.999 Then pb(j) = "F" If ph(j) > 10 Then ph(j) = 0 If pb(j) <> "Z" Then ph(j) = pb(j) Next j For j = 8 To 9 ' D21 ' RNG temp = x0 * a + c temp = temp / z x1 = (temp - Fix(temp)) * z x0 = x1 phj = x1 / z ph(j) = phj pb(j) = "Z" If ph(j) < 0.001 Then pb(j) = "A" If ph(j) < 0.002 And ph(j) >= 0.001 Then pb(j) = "B" If ph(j) < 0.033 Then ph(j) = 11 If ph(j) < 0.193 Then ph(j) = 1 If ph(j) < 0.419 Then ph(j) = 2 If ph(j) < 0.677 Then ph(j) = 3 If ph(j) < 0.704 Then ph(j) = 4 If ph(j) < 0.773 Then ph(j) = 5 If ph(j) < 0.866 Then ph(j) = 6 If ph(j) < 0.884 Then ph(j) = 7 If ph(j) < 0.974 Then ph(j) = 8 If ph(j) < 0.975 And ph(j) >= 0.974 Then pb(j) = "C" If ph(j) < 0.997 Then ph(j) = 9 If ph(j) < 1 And ph(j) >= 0.997 Then pb(j) = "D" If ph(j) > 10 Then ph(j) = 0 If pb(j) <> "Z" Then ph(j) = pb(j) Next j For j = 10 To 11 ' D18 ' RNG temp = x0 * a + c temp = temp / z x1 = (temp - Fix(temp)) * z x0 = x1 phj = x1 / z ph(j) = phj pb(j) = "Z" If ph(j) < 0.001 Then pb(j) = "A" If ph(j) < 0.009 And ph(j) >= 0.001 Then pb(j) = "B" If ph(j) < 0.021 Then ph(j) = 11 If ph(j) < 0.16 Then ph(j) = 1 If ph(j) < 0.285 Then ph(j) = 2 If ph(j) < 0.449 Then ph(j) = 3 If ph(j) < 0.594 Then ph(j) = 4 If ph(j) < 0.731 Then ph(j) = 5 If ph(j) < 0.846 Then ph(j) = 6 If ph(j) < 0.926 Then ph(j) = 7 If ph(j) < 0.967 Then ph(j) = 8 If ph(j) < 0.982 Then ph(j) = 9 If ph(j) < 0.992 And ph(j) >= 0.982 Then pb(j) = "C" If ph(j) < 0.997 And ph(j) >= 0.992 Then pb(j) = "D" If ph(j) < 0.998 And ph(j) >= 0.997 Then pb(j) = "E" If ph(j) < 1 And ph(j) >= 0.998 Then pb(j) = "F" ' allele 20 (C) reduced from .017 to .015 as allele ' frequencies summed to 1.002 If ph(j) > 10 Then ph(j) = 0 If pb(j) <> "Z" Then ph(j) = pb(j) Next j For j = 12 To 13 ' D2S1338 ' RNG temp = x0 * a + c temp = temp / z x1 = (temp - Fix(temp)) * z x0 = x1 phj = x1 / z ph(j) = phj pb(j) = "Z" If ph(j) < 0.037 Then ph(j) = 11 If ph(j) < 0.222 Then ph(j) = 1 If ph(j) < 0.309 Then ph(j) = 2 If ph(j) < 0.419 Then ph(j) = 3 If ph(j) < 0.557 Then ph(j) = 4 If ph(j) < 0.589 Then ph(j) = 5 If ph(j) < 0.613 Then ph(j) = 6 If ph(j) < 0.725 Then ph(j) = 7 If ph(j) < 0.867 Then ph(j) = 8 If ph(j) < 0.978 Then ph(j) = 9 If ph(j) < 0.997 And ph(j) >= 0.978 Then pb(j) = "A" If ph(j) < 1 And ph(j) >= 0.997 Then pb(j) = "B" If ph(j) > 10 Then ph(j) = 0 If pb(j) <> "Z" Then ph(j) = pb(j) Next j For j = 14 To 15 ' D16 ' RNG temp = x0 * a + c temp = temp / z x1 = (temp - Fix(temp)) * z x0 = x1 phj = x1 / z ph(j) = phj If ph(j) < 0.019 Then ph(j) = 11 If ph(j) < 0.148 Then ph(j) = 1 If ph(j) < 0.202 Then ph(j) = 2 If ph(j) < 0.491 Then ph(j) = 3 If ph(j) < 0.779 Then ph(j) = 4 If ph(j) < 0.965 Then ph(j) = 5 If ph(j) < 0.994 Then ph(j) = 6 If ph(j) < 1 Then ph(j) = 7 If ph(j) > 10 Then ph(j) = 0 Next j For j = 16 To 17 ' D19 ' RNG temp = x0 * a + c temp = temp / z x1 = (temp - Fix(temp)) * z x0 = x1 phj = x1 / z ph(j) = phj pb(j) = "Z" If ph(j) < 0.087 Then ph(j) = 11 If ph(j) < 0.309 Then ph(j) = 1 If ph(j) < 0.322 Then ph(j) = 2 If ph(j) < 0.704 Then ph(j) = 3 If ph(j) < 0.719 Then ph(j) = 4 If ph(j) < 0.896 Then ph(j) = 5 If ph(j) < 0.934 Then ph(j) = 6 If ph(j) < 0.975 Then ph(j) = 7 If ph(j) < 0.992 Then ph(j) = 8 If ph(j) < 0.997 Then ph(j) = 9 If ph(j) < 0.999 And ph(j) >= 0.997 Then pb(j) = "A" If ph(j) < 1 And ph(j) >= 0.999 Then pb(j) = "B" If ph(j) > 10 Then ph(j) = 0 If pb(j) <> "Z" Then ph(j) = pb(j) Next j For j = 18 To 19 ' D3 ' RNG temp = x0 * a + c temp = temp / z x1 = (temp - Fix(temp)) * z x0 = x1 phj = x1 / z ph(j) = phj If ph(j) < 0.001 Then ph(j) = 11 If ph(j) < 0.007 Then ph(j) = 1 If ph(j) < 0.139 Then ph(j) = 2 If ph(j) < 0.404 Then ph(j) = 3 If ph(j) < 0.651 Then ph(j) = 4 If ph(j) < 0.846 Then ph(j) = 5 If ph(j) < 0.987 Then ph(j) = 6 If ph(j) < 1 Then ph(j) = 7 If ph(j) > 10 Then ph(j) = 0 Next j ' output the original generated file Write #1, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) ' Because in real DNA profiles without further info ,no one ' knows which allele in each pair came from the mother or father ' by convention they are written smaller ,larger (or equal). ' The following directs each pair For j = 0 To 18 Step 2 If ph(j + 1) < ph(j) Then jjj = ph(j) ph(j) = ph(j + 1) ph(j + 1) = jjj End If Next j ' put extra conditional statements here to reduce ' the number of files or just delete some of the following ' ' dividing on first column, file by file If ph(0) = 0 Then Write #10, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) count0 = count0 + 1 End If If ph(0) = 1 Then Write #11, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) count1 = count1 + 1 End If If ph(0) = 2 Then Write #12, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) count2 = count2 + 1 End If If ph(0) = 3 Then Write #13, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) count3 = count3 + 1 End If If ph(0) = 4 Then Write #14, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) count4 = count4 + 1 End If If ph(0) = 5 Then Write #15, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) count5 = count5 + 1 End If If ph(0) = 6 Then Write #16, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) count6 = count6 + 1 End If If ph(0) = 7 Then Write #17, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) count7 = count7 + 1 End If If ph(0) = 8 Then Write #18, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) count8 = count8 + 1 End If If ph(0) = 9 Then Write #19, ph(0) & ph(1) & ph(2) & ph(3) & ph(4) & ph(5) & ph(6) & ph(7) & ph(8) & ph(9) & ph(10) & ph(11) & ph(12) & ph(13) & ph(14) & ph(15) & ph(16) & ph(17) & ph(18) & ph(19) count9 = count9 + 1 End If Next x Close #10 Close #11 Close #12 Close #13 Close #14 Close #15 Close #16 Close #17 Close #18 Close #19 Close #1 ' count file for data to fix for - next loops in sucessive dividings Open "dec14-c" For Output As #20 Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9 Close #20
' Dividing file into 10 by second digit Dim ph(20) dim ps as string ' xxxx = count size from count file xxxx = ' input file Open "dec14-1" For Input As #1 ' 10 divided files Open "dec14-10" For Output As #10 Open "dec14-11" For Output As #11 Open "dec14-12" For Output As #12 Open "dec14-13" For Output As #13 Open "dec14-14" For Output As #14 Open "dec14-15" For Output As #15 Open "dec14-16" For Output As #16 Open "dec14-17" For Output As #17 Open "dec14-18" For Output As #18 Open "dec14-19" For Output As #19 count9 = 0 count8 = 0 xxxx = xxxx - 1 For x = 0 To xxxx Input #1, ps a2$ = Mid(ps, 2, 1) ph(1) = Val(a2$) If ph(1) = 0 Then Write #10, ps count0 = count0 + 1 End If If ph(1) = 1 Then Write #11, ps count1 = count1 + 1 End If If ph(1) = 2 Then Write #12, ps count2 = count2 + 1 End If If ph(1) = 3 Then Write #13, ps count3 = count3 + 1 End If If ph(1) = 4 Then Write #14, ps count4 = count4 + 1 End If If ph(1) = 5 Then Write #15, ps count5 = count5 + 1 End If If ph(1) = 6 Then Write #16, ps count6 = count6 + 1 End If If ph(1) = 7 Then Write #17, ps count7 = count7 + 1 End If If ph(1) = 8 Then Write #18, ps count8 = count8 + 1 End If If ph(1) = 9 Then Write #19, ps count9 = count9 + 1 End If Next x Close #1 Close #10 Close #11 Close #12 Close #13 Close #14 Close #15 Close #16 Close #17 Close #18 Close #19 ' output counts Open "dec14-1c" For Output As #20 Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9 Close #20
' Dividing file into 10 by third digit Dim ph(20) dim ps as string ' enter count in xxxx xxxx = Open "dec14-11" For Input As #1 Open "dec14-110" For Output As #10 Open "dec14-111" For Output As #11 Open "dec14-112" For Output As #12 Open "dec14-113" For Output As #13 Open "dec14-114" For Output As #14 Open "dec14-115" For Output As #15 Open "dec14-116" For Output As #16 Open "dec14-117" For Output As #17 Open "dec14-118" For Output As #18 Open "dec14-119" For Output As #19 count9 = 0 count8 = 0 xxxx=xxxx - 1 For x = 0 To xxxx Input #1, ps a3$ = Mid(ps, 3, 1) ph(2) = Val(a3$) If ph(2) = 0 Then Write #10, ps count0 = count0 + 1 End If If ph(2) = 1 Then Write #11, ps count1 = count1 + 1 End If If ph(2) = 2 Then Write #12, ps count2 = count2 + 1 End If If ph(2) = 3 Then Write #13, ps count3 = count3 + 1 End If If ph(2) = 4 Then Write #14, ps count4 = count4 + 1 End If If ph(2) = 5 Then Write #15, ps count5 = count5 + 1 End If If ph(2) = 6 Then Write #16, ps count6 = count6 + 1 End If If ph(2) = 7 Then Write #17, ps count7 = count7 + 1 End If If ph(2) = 8 Then Write #18, ps count8 = count8 + 1 End If If ph(2) = 9 Then Write #19, ps count9 = count9 + 1 End If Next x Close #1 Close #10 Close #11 Close #12 Close #13 Close #14 Close #15 Close #16 Close #17 Close #18 Close #19 Open "dec14-11c" For Output As #20 Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9 Close #20
' Dividing file into 10 by fourth digit Dim ph(20) dim ps as string ' enter count in xxxx xxxx = Open "dec14-131" For Input As #1 Open "dec14-1310" For Output As #10 Open "dec14-1311" For Output As #11 Open "dec14-1312" For Output As #12 Open "dec14-1313" For Output As #13 Open "dec14-1314" For Output As #14 Open "dec14-1315" For Output As #15 Open "dec14-1316" For Output As #16 Open "dec14-1317" For Output As #17 Open "dec14-1318" For Output As #18 Open "dec14-1319" For Output As #19 count9 = 0 count8 = 0 xxxx=xxxx - 1 For x = 0 To xxxx Input #1, ps a4$ = Mid(ps, 4, 1) ph(3) = Val(a4$) If ph(3) = 0 Then Write #10, ps count0 = count0 + 1 End If If ph(3) = 1 Then Write #11, ps count1 = count1 + 1 End If If ph(3) = 2 Then Write #12, ps count2 = count2 + 1 End If If ph(3) = 3 Then Write #13, ps count3 = count3 + 1 End If If ph(3) = 4 Then Write #14, ps count4 = count4 + 1 End If If ph(3) = 5 Then Write #15, ps count5 = count5 + 1 End If If ph(3) = 6 Then Write #16, ps count6 = count6 + 1 End If If ph(3) = 7 Then Write #17, ps count7 = count7 + 1 End If If ph(3) = 8 Then Write #18, ps count8 = count8 + 1 End If If ph(3) = 9 Then Write #19, ps count9 = count9 + 1 End If Next x Close #1 Close #10 Close #11 Close #12 Close #13 Close #14 Close #15 Close #16 Close #17 Close #18 Close #19 Open "dec14-131c" For Output As #20 Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9 Close #20
' Dividing file into 10 by fifth digit Dim ph(20) Dim ps As String ' enter count in xxxx xxxx = Open "dec14-3412" For Input As #1 Open "dec14-34120" For Output As #10 Open "dec14-34121" For Output As #11 Open "dec14-34122" For Output As #12 Open "dec14-34123" For Output As #13 Open "dec14-34124" For Output As #14 Open "dec14-34125" For Output As #15 Open "dec14-34126" For Output As #16 Open "dec14-34127" For Output As #17 Open "dec14-34128" For Output As #18 Open "dec14-34129" For Output As #19 count9 = 0 count8 = 0 xxxx = xxxx - 1 For x = 0 To xxxx Input #1, ps a5$ = Mid(ps, 5, 1) ph(4) = Val(a5$) If ph(4) = 0 Then Write #10, ps count0 = count0 + 1 End If If ph(4) = 1 Then Write #11, ps count1 = count1 + 1 End If If ph(4) = 2 Then Write #12, ps count2 = count2 + 1 End If If ph(4) = 3 Then Write #13, ps count3 = count3 + 1 End If If ph(4) = 4 Then Write #14, ps count4 = count4 + 1 End If If ph(4) = 5 Then Write #15, ps count5 = count5 + 1 End If If ph(4) = 6 Then Write #16, ps count6 = count6 + 1 End If If ph(4) = 7 Then Write #17, ps count7 = count7 + 1 End If If ph(4) = 8 Then Write #18, ps count8 = count8 + 1 End If If ph(4) = 9 Then Write #19, ps count9 = count9 + 1 End If Next x Close #1 Close #10 Close #11 Close #12 Close #13 Close #14 Close #15 Close #16 Close #17 Close #18 Close #19 Open "dec14-3412c" For Output As #20 Write #20, 0, count0, 1, count1, 2, count2, 3, count3, 4, count4, 5, count5, 6, count6, 7, count7, 8, count8, 9, count9 Close #20
The next is sorting using Word Tables/ Sort Before using ,make a test batch of numbers as there are various Sort outcomes. Now I'm using string data, Text sort gave the right form on my machine. Use Ctrl+shift+Home(or End) to highlight text up or down . After sort and before saving to disk press up or down arrow to select which way the text is returned to you. My set-up was limited to no more than 15,000. To sort say 28,000 sort upper half ,then lower half then cut and paste say 0 to 2 section of lower half into the top of the top half. Re-sort the expanded 0 to 2 section then re-sort the remainder. If say selecting 2 to 3 section then cut and paste at the juncture of 2 and 3 in the other block to save some repeated sorting. Other times it is quicker to oversort then backtrack / overlap on the next sort. Many of the subdivision files are empty because of the directing. They consist of eg 4,4.. 4,5.... etc never 4,0.., 4,1.. etc and a number of 8 and 9 sections are absent back to the generator characteristics eg only first 8 of 10 are used. When you know all files are less than 15,000, or whatever Sort limit, use the next (simply a recorded macro) to sort 10 related files. An empty file will stop the macro so edit out empty files before running.
'Sort 10 related files in one go ' Documents.Open FileName:="dec14-130", ConfirmConversions:=False, ReadOnly _ :=False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate _ :="", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="" _ , Format:=wdOpenFormatAuto Selection.Sort ExcludeHeader:=False, FieldNumber:="Paragraphs", _ SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _ FieldNumber2:="", SortFieldType2:=wdSortFieldAlphanumeric, SortOrder2:= _ wdSortOrderAscending, FieldNumber3:="", SortFieldType3:= _ wdSortFieldAlphanumeric, SortOrder3:=wdSortOrderAscending, Separator:= _ wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _ :=wdLanguageNone ActiveDocument.Save ' Documents.Open FileName:="dec14-131", ConfirmConversions:=False, ReadOnly _ :=False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate _ :="", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="" _ , Format:=wdOpenFormatAuto Selection.Sort ExcludeHeader:=False, FieldNumber:="Paragraphs", _ SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _ FieldNumber2:="", SortFieldType2:=wdSortFieldAlphanumeric, SortOrder2:= _ wdSortOrderAscending, FieldNumber3:="", SortFieldType3:= _ wdSortFieldAlphanumeric, SortOrder3:=wdSortOrderAscending, Separator:= _ wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _ :=wdLanguageNone ActiveDocument.Save ' Documents.Open FileName:="dec14-132", ConfirmConversions:=False, ReadOnly _ :=False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate _ :="", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="" _ , Format:=wdOpenFormatAuto Selection.Sort ExcludeHeader:=False, FieldNumber:="Paragraphs", _ SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _ FieldNumber2:="", SortFieldType2:=wdSortFieldAlphanumeric, SortOrder2:= _ wdSortOrderAscending, FieldNumber3:="", SortFieldType3:= _ wdSortFieldAlphanumeric, SortOrder3:=wdSortOrderAscending, Separator:= _ wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _ :=wdLanguageNone ActiveDocument.Save ' Documents.Open FileName:="dec14-133", ConfirmConversions:=False, ReadOnly _ :=False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate _ :="", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="" _ , Format:=wdOpenFormatAuto Selection.Sort ExcludeHeader:=False, FieldNumber:="Paragraphs", _ SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _ FieldNumber2:="", SortFieldType2:=wdSortFieldAlphanumeric, SortOrder2:= _ wdSortOrderAscending, FieldNumber3:="", SortFieldType3:= _ wdSortFieldAlphanumeric, SortOrder3:=wdSortOrderAscending, Separator:= _ wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _ :=wdLanguageNone ActiveDocument.Save ' Documents.Open FileName:="dec14-134", ConfirmConversions:=False, ReadOnly _ :=False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate _ :="", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="" _ , Format:=wdOpenFormatAuto Selection.Sort ExcludeHeader:=False, FieldNumber:="Paragraphs", _ SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _ FieldNumber2:="", SortFieldType2:=wdSortFieldAlphanumeric, SortOrder2:= _ wdSortOrderAscending, FieldNumber3:="", SortFieldType3:= _ wdSortFieldAlphanumeric, SortOrder3:=wdSortOrderAscending, Separator:= _ wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _ :=wdLanguageNone ActiveDocument.Save ' Documents.Open FileName:="dec14-135", ConfirmConversions:=False, ReadOnly _ :=False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate _ :="", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="" _ , Format:=wdOpenFormatAuto Selection.Sort ExcludeHeader:=False, FieldNumber:="Paragraphs", _ SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _ FieldNumber2:="", SortFieldType2:=wdSortFieldAlphanumeric, SortOrder2:= _ wdSortOrderAscending, FieldNumber3:="", SortFieldType3:= _ wdSortFieldAlphanumeric, SortOrder3:=wdSortOrderAscending, Separator:= _ wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _ :=wdLanguageNone ActiveDocument.Save ' Documents.Open FileName:="dec14-136", ConfirmConversions:=False, ReadOnly _ :=False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate _ :="", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="" _ , Format:=wdOpenFormatAuto Selection.Sort ExcludeHeader:=False, FieldNumber:="Paragraphs", _ SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _ FieldNumber2:="", SortFieldType2:=wdSortFieldAlphanumeric, SortOrder2:= _ wdSortOrderAscending, FieldNumber3:="", SortFieldType3:= _ wdSortFieldAlphanumeric, SortOrder3:=wdSortOrderAscending, Separator:= _ wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _ :=wdLanguageNone ActiveDocument.Save ' Documents.Open FileName:="dec14-137", ConfirmConversions:=False, ReadOnly _ :=False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate _ :="", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="" _ , Format:=wdOpenFormatAuto Selection.Sort ExcludeHeader:=False, FieldNumber:="Paragraphs", _ SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _ FieldNumber2:="", SortFieldType2:=wdSortFieldAlphanumeric, SortOrder2:= _ wdSortOrderAscending, FieldNumber3:="", SortFieldType3:= _ wdSortFieldAlphanumeric, SortOrder3:=wdSortOrderAscending, Separator:= _ wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _ :=wdLanguageNone ActiveDocument.Save ' Documents.Open FileName:="dec14-138", ConfirmConversions:=False, ReadOnly _ :=False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate _ :="", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="" _ , Format:=wdOpenFormatAuto Selection.Sort ExcludeHeader:=False, FieldNumber:="Paragraphs", _ SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _ FieldNumber2:="", SortFieldType2:=wdSortFieldAlphanumeric, SortOrder2:= _ wdSortOrderAscending, FieldNumber3:="", SortFieldType3:= _ wdSortFieldAlphanumeric, SortOrder3:=wdSortOrderAscending, Separator:= _ wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _ :=wdLanguageNone ActiveDocument.Save ' Documents.Open FileName:="dec14-139", ConfirmConversions:=False, ReadOnly _ :=False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate _ :="", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="" _ , Format:=wdOpenFormatAuto Selection.Sort ExcludeHeader:=False, FieldNumber:="Paragraphs", _ SortFieldType:=wdSortFieldAlphanumeric, SortOrder:=wdSortOrderAscending, _ FieldNumber2:="", SortFieldType2:=wdSortFieldAlphanumeric, SortOrder2:= _ wdSortOrderAscending, FieldNumber3:="", SortFieldType3:= _ wdSortFieldAlphanumeric, SortOrder3:=wdSortOrderAscending, Separator:= _ wdSortSeparateByTabs, SortColumn:=False, CaseSensitive:=False, LanguageID _ :=wdLanguageNone ActiveDocument.Save
' empty files will append spurious carriage returns at ' head or tail of files so check for this before final match routine ' otherwise use Insert / File to merge files ' merge 10 related files back to one ' for convenience I named these re-concattenated ' files as .txt so they were obvious in listing ' compared to no suffix ones ' in windows explorer the file sizes of these .txt files ' should have the same file-size as the original ' unexpanded and unsorted file ' use "Word" find for any repeated repeatedend of line "^p^p" ' Documents.Add Template:="", NewTemplate:=False Selection.InsertFile FileName:="dec14-130", Range:="", ConfirmConversions _ :=False, Link:=False, Attachment:=False Selection.InsertFile FileName:="dec14-131", Range:="", ConfirmConversions _ :=False, Link:=False, Attachment:=False Selection.InsertFile FileName:="dec14-132", Range:="", ConfirmConversions _ :=False, Link:=False, Attachment:=False Selection.InsertFile FileName:="dec14-133", Range:="", ConfirmConversions _ :=False, Link:=False, Attachment:=False Selection.InsertFile FileName:="dec14-134", Range:="", ConfirmConversions _ :=False, Link:=False, Attachment:=False Selection.InsertFile FileName:="dec14-135", Range:="", ConfirmConversions _ :=False, Link:=False, Attachment:=False Selection.InsertFile FileName:="dec14-136", Range:="", ConfirmConversions _ :=False, Link:=False, Attachment:=False Selection.InsertFile FileName:="dec14-137", Range:="", ConfirmConversions _ :=False, Link:=False, Attachment:=False Selection.InsertFile FileName:="dec14-138", Range:="", ConfirmConversions _ :=False, Link:=False, Attachment:=False Selection.InsertFile FileName:="dec14-139", Range:="", ConfirmConversions _ :=False, Link:=False, Attachment:=False ActiveDocument.SaveAs FileName:="dec14-13.txt", FileFormat:=wdFormatText, _ LockComments:=False, Password:="", AddToRecentFiles:=True, WritePassword _ :="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _ SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _ False
Copy and paste all these subfiles together to submit to the next section. The final match finding, initially for 12 digits ,then change to 14,16,18 and finally 20 if 18 shows something. This routine after hours of dividing/sorting/re-merging takes only seconds to complete.
' Find matching pairs in 12 digits ' xxxx is count = ???? xxxx = b$ = "0" Count = 0 Dim ps As String Open "dec14-24.txt" For Input As #1 Open "dec14-24m12.txt" For Output As #2 ' change the 12 in the #2 file name above and ' the Left function below to suit number of matches xxxx = xxxx - 1 For x = 0 To xxxx Input #1, ps a$ = Left(ps, 12) If a$ = b$ Then Write #2, ps Count = Count + 1 End If b$ = a$ Next x Write #2, "Count ", Count close #1 Close #2
' Find matching triples in 12 digits ' xxxx is count from the count files xxxx = b$ = "0" c$ = "0" Count = 0 Dim ps As String xxxx = xxxx - 1 Open "dec14-1.txt" For Input As #1 Open "dec14-1trip.txt" For Output As #2 ' change the 12 in the #2 file name above and ' the Left function below to suit number of matches For x = 0 To xxxx Input #1, ps a$ = Left(ps, 12) a2$ = ps If a$ = c$ Then Write #2, a2$, b2$, c2$ Count = Count + 1 End If If a$ = b$ Then c$ = b$ c2$ = b2$ End If b$ = a$ b2$ = a2$ Next x Write #2, "Count ", Count Close #1 Close #2
' Find matching quadruples in 12 digits ' xxxx is from the count files xxxx = b$ = "0" c$ = "0" Count = 0 Dim ps As String xxxx = xxxx - 1 Open "dec14-3.txt" For Input As #1 Open "dec14-3quad.txt" For Output As #2 ' change the 12 in the #2 file name above and ' the Left function below to suit number of matches For x = 0 To xxxx Input #1, ps a$ = Left(ps, 12) a2$ = ps If a$ = d$ Then Write #2, a2$, b2$, c2$, d2$ Count = Count + 1 End If If a$ = c$ Then d$ = c$ d2$ = c2$ End If If a$ = b$ Then c$ = b$ c2$ = b2$ End If b$ = a$ b2$ = a2$ Next x Write #2, "Count ", Count Close #1 Close #2
' Find matching quintuples in 12 digits ' xxxx is from the count files xxxx = b$ = "0" c$ = "0" Count = 0 Dim ps As String xxxx = xxxx - 1 Open "dec14-4.txt" For Input As #1 Open "dec14-4quin.txt" For Output As #2 ' change the 12 in the #2 file name above and ' the Left function below to suit number of matches For x = 0 To xxxx Input #1, ps a$ = Left(ps, 12) a2$ = ps If a$ = e$ Then Write #2, a2$, b2$, c2$, d2$, e2$ Count = Count + 1 End If If a$ = d$ Then e$ = d$ e2$ = d2$ End If If a$ = c$ Then d$ = c$ d2$ = c2$ End If If a$ = b$ Then c$ = b$ c2$ = b2$ End If b$ = a$ b2$ = a2$ Next x Write #2, "Count ", Count Close #1 Close #2
' converting integre values back to DNA loci,alleles, UK 10 loci xxxx = ' xxxx is number of profiles to be converted Dim ph(20) Dim pj(20) Dim ps As String Open "dec14-aaa.txt" For Input As #1 Open "dec14-rrr.txt" For Output As #2 For x = 1 To xxxx Input #1, ps a1$ = Mid(ps, 1, 1) a2$ = Mid(ps, 2, 1) a3$ = Mid(ps, 3, 1) a4$ = Mid(ps, 4, 1) a5$ = Mid(ps, 5, 1) a6$ = Mid(ps, 6, 1) a7$ = Mid(ps, 7, 1) a8$ = Mid(ps, 8, 1) a9$ = Mid(ps, 9, 1) a10$ = Mid(ps, 10, 1) a11$ = Mid(ps, 11, 1) a12$ = Mid(ps, 12, 1) a13$ = Mid(ps, 13, 1) a14$ = Mid(ps, 14, 1) a15$ = Mid(ps, 15, 1) a16$ = Mid(ps, 16, 1) a17$ = Mid(ps, 17, 1) a18$ = Mid(ps, 18, 1) a19$ = Mid(ps, 19, 1) a20$ = Mid(ps, 20, 1) ph(0) = a1$ ph(1) = a2$ ph(2) = a3$ ph(3) = a4$ ph(4) = a5$ ph(5) = a6$ ph(6) = a7$ ph(7) = a8$ ph(8) = a9$ ph(9) = a10$ ph(10) = a11$ ph(11) = a12$ ph(12) = a13$ ph(13) = a14$ ph(14) = a15$ ph(15) = a16$ ph(16) = a17$ ph(17) = a18$ ph(18) = a19$ ph(19) = a20$ For j = 0 To 1 ' vWA If ph(j) = "0" Then pj(j) = 13 If ph(j) = "1" Then pj(j) = 14 If ph(j) = "2" Then pj(j) = 15 If ph(j) = "3" Then pj(j) = 16 If ph(j) = "4" Then pj(j) = 17 If ph(j) = "5" Then pj(j) = 18 If ph(j) = "6" Then pj(j) = 19 If ph(j) = "7" Then pj(j) = 20 If ph(j) = "8" Then pj(j) = 21 If ph(j) = "9" Then pj(j) = 0 Next j For j = 2 To 3 ' THO1 If ph(j) = "0" Then pj(j) = 5 If ph(j) = "1" Then pj(j) = 6 If ph(j) = "2" Then pj(j) = 7 If ph(j) = "3" Then pj(j) = 8 If ph(j) = "4" Then pj(j) = 8.3 If ph(j) = "5" Then pj(j) = 9 If ph(j) = "6" Then pj(j) = 9.3 If ph(j) = "7" Then pj(j) = 10 If ph(j) = "8" Then pj(j) = 0 If ph(j) = "9" Then pj(j) = 0 Next j For j = 4 To 5 ' D8 If ph(j) = "0" Then pj(j) = 8 If ph(j) = "1" Then pj(j) = 9 If ph(j) = "2" Then pj(j) = 10 If ph(j) = "3" Then pj(j) = 11 If ph(j) = "4" Then pj(j) = 12 If ph(j) = "5" Then pj(j) = 13 If ph(j) = "6" Then pj(j) = 14 If ph(j) = "7" Then pj(j) = 15 If ph(j) = "8" Then pj(j) = 16 If ph(j) = "9" Then pj(j) = 17 Next j For j = 6 To 7 ' FGA If ph(j) = "0" Then pj(j) = 18 If ph(j) = "1" Then pj(j) = 19 If ph(j) = "2" Then pj(j) = 20 If ph(j) = "A" Then pj(j) = 20.2 If ph(j) = "3" Then pj(j) = 21 If ph(j) = "B" Then pj(j) = 21.2 If ph(j) = "4" Then pj(j) = 22 If ph(j) = "C" Then pj(j) = 22.2 If ph(j) = "5" Then pj(j) = 23 If ph(j) = "D" Then pj(j) = 23.2 If ph(j) = "6" Then pj(j) = 24 If ph(j) = "E" Then pj(j) = 24.2 If ph(j) = "7" Then pj(j) = 25 If ph(j) = "8" Then pj(j) = 26 If ph(j) = "9" Then pj(j) = 27 If ph(j) = "F" Then pj(j) = 30 Next j For j = 8 To 9 ' D21 If ph(j) = "A" Then pj(j) = 25 If ph(j) = "B" Then pj(j) = 26 If ph(j) = "0" Then pj(j) = 27 If ph(j) = "1" Then pj(j) = 28 If ph(j) = "2" Then pj(j) = 29 If ph(j) = "3" Then pj(j) = 30 If ph(j) = "4" Then pj(j) = 30.2 If ph(j) = "5" Then pj(j) = 31 If ph(j) = "6" Then pj(j) = 31.2 If ph(j) = "7" Then pj(j) = 32 If ph(j) = "8" Then pj(j) = 32.2 If ph(j) = "C" Then pj(j) = 33 If ph(j) = "9" Then pj(j) = 33.2 If ph(j) = "D" Then pj(j) = 34.2 Next j For j = 10 To 11 ' D18 If ph(j) = "A" Then pj(j) = 9.2 If ph(j) = "B" Then pj(j) = 10 If ph(j) = "0" Then pj(j) = 11 If ph(j) = "1" Then pj(j) = 12 If ph(j) = "2" Then pj(j) = 13 If ph(j) = "3" Then pj(j) = 14 If ph(j) = "4" Then pj(j) = 15 If ph(j) = "5" Then pj(j) = 16 If ph(j) = "6" Then pj(j) = 17 If ph(j) = "7" Then pj(j) = 18 If ph(j) = "8" Then pj(j) = 19 If ph(j) = "9" Then pj(j) = 20 If ph(j) = "C" Then pj(j) = 21 If ph(j) = "D" Then pj(j) = 22 If ph(j) = "E" Then pj(j) = 23 If ph(j) = "F" Then pj(j) = 24 Next j For j = 12 To 13 ' D2S1338 If ph(j) = "0" Then pj(j) = 16 If ph(j) = "1" Then pj(j) = 17 If ph(j) = "2" Then pj(j) = 18 If ph(j) = "3" Then pj(j) = 19 If ph(j) = "4" Then pj(j) = 20 If ph(j) = "5" Then pj(j) = 21 If ph(j) = "6" Then pj(j) = 22 If ph(j) = "7" Then pj(j) = 23 If ph(j) = "8" Then pj(j) = 24 If ph(j) = "9" Then pj(j) = 25 If ph(j) = "A" Then pj(j) = 26 If ph(j) = "B" Then pj(j) = 27 Next j For j = 14 To 15 ' D16 If ph(j) = "0" Then pj(j) = 8 If ph(j) = "1" Then pj(j) = 9 If ph(j) = "2" Then pj(j) = 10 If ph(j) = "3" Then pj(j) = 11 If ph(j) = "4" Then pj(j) = 12 If ph(j) = "5" Then pj(j) = 13 If ph(j) = "6" Then pj(j) = 14 If ph(j) = "7" Then pj(j) = 15 If ph(j) = "8" Then pj(j) = 0 If ph(j) = "9" Then pj(j) = 0 Next j For j = 16 To 17 ' D19 If ph(j) = "0" Then pj(j) = 12 If ph(j) = "1" Then pj(j) = 13 If ph(j) = "2" Then pj(j) = 13.2 If ph(j) = "3" Then pj(j) = 14 If ph(j) = "4" Then pj(j) = 14.2 If ph(j) = "5" Then pj(j) = 15 If ph(j) = "6" Then pj(j) = 15.2 If ph(j) = "7" Then pj(j) = 16 If ph(j) = "8" Then pj(j) = 16.2 If ph(j) = "9" Then pj(j) = 17 If ph(j) = "A" Then pj(j) = 18.2 If ph(j) = "B" Then pj(j) = 19.2 Next j For j = 18 To 19 ' D3 If ph(j) = "0" Then pj(j) = 12 If ph(j) = "1" Then pj(j) = 13 If ph(j) = "2" Then pj(j) = 14 If ph(j) = "3" Then pj(j) = 15 If ph(j) = "4" Then pj(j) = 16 If ph(j) = "5" Then pj(j) = 17 If ph(j) = "6" Then pj(j) = 18 If ph(j) = "7" Then pj(j) = 19 If ph(j) = "8" Then pj(j) = 0 If ph(j) = "9" Then pj(j) = 0 Next j Write #2, ""; pj(0), pj(1); ""; pj(2), pj(3); ""; pj(4), pj(5); ""; pj(6), pj(7); ""; pj(8), pj(9); ""; pj(10), pj(11); ""; pj(12), pj(13); ""; pj(14), pj(15); ""; pj(16), pj(17); ""; pj(18), pj(19); "" Next x Close #1 Close #2
Cross-check on generator functioning comparing output ' allele frequencies ' to published original for UK caucasian. From 5000 'profiles' and 10,000 'alleles' for each locus Allele / Published / Generated Locus VWA 13 0.001 0.0010 14 0.105 0.1063 15 0.080 0.0794 16 0.216 0.2159 17 0.270 0.2727 18 0.219 0.2184 19 0.093 0.0919 20 0.014 0.0124 21 0.002 0.0020 THO1 5 0.002 0.0016 6 0.241 0.2513 7 0.194 0.1976 8 0.108 0.1041 8.3 0.001 0.0006 9 0.140 0.1407 9.3 0.304 0.2938 10 0.012 0.0103 D8 D8S1179 / D6S50 8 0.018 0.0195 9 0.013 0.0133 10 0.094 0.0929 11 0.066 0.0661 12 0.143 0.1363 13 0.333 0.3395 14 0.209 0.2067 15 0.088 0.0910 16 0.031 0.0302 17 0.004 0.0045 FGA 18 0.025 0.0239 19 0.056 0.0561 20 0.143 0.1474 20.2 0.002 0.0021 21 0.187 0.1831 21.2 0.002 0.0022 22 0.165 0.1609 22.2 0.011 0.0110 23 0.139 0.1394 23.2 0.004 0.0038 24 0.146 0.1494 24.2 0.002 0.0019 25 0.075 0.0764 26 0.035 0.0347 27 0.007 0.0069 30 0.001 0.0008 D21 D21S11 with Urquhart to Moller conversion 54 0.001 0.0013 57 (26) 0.001 0.0013 59 (27) 0.031 0.0330 61 (28) 0.160 0.1589 63 (29) 0.226 0.2210 65 (30) 0.258 0.2653 66 0.027 0.0267 67 (31) 0.069 0.0653 68 0.093 0.0913 69 (32) 0.018 0.0203 70 0.090 0.0926 71 (33) 0.001 0.0006 72 0.022 0.0200 74 0.002 0.0024 D18 D18S51 9.2 0.001 0.0005 10 0.008 0.0083 11 0.012 0.0111 12 0.139 0.1379 13 0.125 0.1273 14 0.164 0.1678 15 0.145 0.1405 16 0.137 0.1402 17 0.115 0.1116 18 0.080 0.0808 19 0.041 0.0406 20 0.017 0.0147 21 0.010 0.0102 22 0.005 0.0039 23 0.001 0.0021 24 0.002 0.0025 D2 D2S1338 16 0.037 0.0355 17 0.185 0.1875 18 0.087 0.841 19 0.110 0.1162 20 0.138 0.1369 21 0.032 0.0326 22 0.024 0.0244 23 0.112 0.1098 24 0.142 0.1409 25 0.111 0.1121 26 0.019 0.0174 27 0.002 0.0026 D16 D16S539 8 0.019 0.0221 9 0.129 0.1255 10 0.054 0.0567 11 0.289 0.2882 12 0.288 0.2819 13 0.186 0.1895 14 0.029 0.0210 15 0.005 0.0051 D19 D19S433 12 0.087 0.0834 13.0 0.222 0.2197 13.2 0.013 0.0132 14 0.382 0.3786 14.2 0.015 0.0184 15 0.177 0.1773 15.2 0.038 0.0385 16 0.041 0.0422 16.2 0.017 0.0194 17 0.005 0.0052 18.2 0.002 0.0025 19.2 0.001 0.0016 D3 D3S1358 12 0.001 0.0012 13 0.006 0.0063 14 0.132 0.1308 15 0.265 0.2645 16 0.247 0.2451 17 0.195 0.1998 18 0.141 0.1393 19 0.014 0.0130 As a further check for a 4 million run the combined 2 loci count for E 'allele' was 23,822 ,desired was 24,000 and F 'allele' was 24,181 for desired 24,000 Exploring further the 7 x 19 digit matches in one subset. Including the other 19 digit matches and shifting columns and checking all then there are at least 38 x 19 digit matches given a free choice. That is ANY 9 pairs matching and 1 pair matching in the 4 remaining alleles. This time with 4 million and have included all rare alleles for UK caucasians. Simulating all loci and alleles and in agreement with the published allele frequency tables. All generated profiles are parthenogenic in effect as generated purely randomly in each of the 4 million cases. With no allowance made for parentage, co-ancestry, in-breeding etc which will increase the chance of matches above the base-level determined here. So far I've fully sorted and match-checked the majority starting 1...,2.... and 3.... Remaining subsets 0..., 4... , 5.... , 6.... ,7.... 8... results over the next few days. Note the 12 digit matches as pairs,triples,quads and quins was enough to falsely convict people in the UK up till 1999. 19 datapoint matches are these days considered enough of a match to convict the likes of Colin Waite , with no other corroborative evidence. A single mismatch is stated in court to be an anomaly with biology eg 'spontaneous mutation' without any further biological testing to confirm this conjecture. Wheras a single datapoint mismatch should be evidence of exclusion unless there is biological confirmation of single allele mismatch within an individual ,say between semen and buccal cells. Results so far for 2,562,221 of the 4 million 'profiles' Subset 1................... ( 795,224 profiles) 12 digit matches 14,190 14 digit 429 16 digit 36 18 digit 3 20 digit 0 12 digit triples 806 12 digit quads 60 12 digit quins 4 subset 2................... ( 546,224 ' profiles') 12 digit matches 7,863 14 digit 241 16 digit 25 18 digit 2 20 digit match 1 which was 23165525234649133524 starting as 32165552326494315342 and 32615552324649313542 which before processing are directed into ordered pairs, as structured in DNA profiles. So no problem with the random number generator spuriously repeating the same sequence of numbers. converted to standard loci pairs on vWA,THO1,D8,FGA,D21,D18,D2,D16,D19,D3 (15,16)(6,9.3)(13,13)(20,23)(29,30)(15,17)(20,25)(9,11)(14,15)(14,16) 12 digit triple matches 355 qudruples 34 quintuples 7 subset 3................... ( 1,220,712 'profiles' ) 12 digit matches 42,679 14 digit 1,302 16 digit 118 18 digit 11 19 digit 7 20 digit match 1 One of those annoying matters of chance. So nearly 2x 20 digit matches but only one. match was 34235726232658443555 converted to standard (16,17)(7,8)(13,15)(20,24)(29,30)(13,17)(21,24)(12,12)(14,15)(17,17) started as 34237526326285445355 and 43325726326258445355 so random number generator ok. As a further check the generator function for the first 3 pairs has not changed with the addition of rare alleles. Just considering the first 6 digits then these numbers have not appeared in any of the previous outings of this simulation. Subset 4................... (999635 'profiles' ) 12 digit matches 23,166 14 digit 657 16 digit 49 18 digit 3 19 'real' ,not first 19, 2 20 digit match 1 44264525134528441322 which converts to vWA,THO1,D8,FGA,D21,D18,D2,D16,D19,D3 (17,17)(7,9.3)(12,13)(20,23)(28,30)(15,16)(18,24)(12,12)(13,14)(14,14) Started as 44625452314582441322 and 44625452134528441322 so RNG ok 12 digit triple matches 2,097 qudruples 263 quintuples 45 The 19 digit matches were from checking the 18 point matches for any match in either 19 or 20 position. There is of course many more 19 point matches possible if given free choce from any 9 pairs rather than in this simulation the first 9 pairs. If any statistician can tell me what this result implies for general case, I would be interested to hear. That is 7 matches on 9 designated pairs plus one match in either position of the tenth pair. Compared to how many matches on any 9 pairs plus one in either position of the remaining pairs. 7 x 10 ? = 70 ? 12 digit triple matches 4,364 12 digit quadruple 630 12 digit quintuple 116 A few of the added rare alleles do come through in 12,14 and 16 digit matches but none so far in 18 or 20 point matches. A final piece of analysis will be the quantitive change of allele frequencies in matches compared to the original profiles. Qualitively - increase in frequency of common alleles and decrease in frequency of rare alleles ( multi-modal matches). One more '10 locus' match in the remainder. 0....., and 5........ , 6.......... , 7........ 8....... 12 digit matches 7,082 14 digit 208 16 digit 15 18 digit 2 19 digit (real) 1 20 1 55164623232349351335 converts to (18,18)(6,9.3)(12,14)(20,21)(29,30)(13,14)(20,25)(11,13)(13,14)(15,17) started as 55614623232394531353 and 55614623322349353153 so RNG ok 12 digit triple matches 396 quad 37 quin 4 Summary of results of matches on first x of 10 loci matches rather than any x of 4 million 10 loci. 12 digit = 6 loci, 94,980 7 loci , 2,837 8 loci , 243 9 loci , 21 10 loci , 4 matches Aggrgated 6 loci higher order matches in 4 million Triples, 3,654 Quadruples , 394 Quins , 60 ----------------------------- I also checked whether there were any triple matches on 18 digits but none. Multi-modal match analysis First 7 loci data based on 2837 profiles from the total 14 digit match data and full alleles for UK Caucasian Allele / UK Caucasian allele frequency / match allele frequency Locus VWA 13 0.001 0 14 0.105 0.0767 15 0.080 0.0458 16 0.216 0.2843 17 0.270 0.3255 18 0.219 0.2249 19 0.093 0.0419 20 0.014 0.0009 21 0.002 0 THO1 5 0.002 0 6 0.241 0.2608 7 0.194 0.1791 8 0.108 0.0730 8.3 0.001 0 9 0.140 0.1078 9.3 0.304 0.3791 10 0.012 0.0001 D8 D8S1179 / D6S50 8 0.018 0.0025 9 0.013 0.0016 10 0.094 0.0597 11 0.066 0.0261 12 0.143 0.1251 13 0.333 0.4977 14 0.209 0.2346 15 0.088 0.0476 16 0.031 0.0051 17 0.004 0 FGA 18 0.025 0.0002 19 0.056 0.0263 20 0.143 0.1581 20.2 0.002 0 21 0.187 0.2545 21.2 0.002 0 22 0.165 0.1942 22.2 0.011 0.0005 23 0.139 0.1442 23.2 0.004 0 24 0.146 0.1576 24.2 0.002 0 25 0.075 0.0483 26 0.035 0.0120 27 0.007 0.0002 30 0.001 0 D21, D21S11 with Urquhart to Moller conversion in brackets 54 0.001 0 57 (26) 0.001 0 59 (27) 0.031 0.0070 61 (28) 0.160 0.1641 63 (29) 0.226 0.3086 65 (30) 0.258 0.3611 66 0.027 0.0062 67 (31) 0.069 0.0312 68 0.093 0.0627 69 (32) 0.018 0.0019 70 0.090 0.0543 71 (33) 0.001 0 72 0.022 0.0028 74 0.002 0 D18, D18S51 9.2 0.001 0 10 0.008 0.0007 11 0.012 0.0018 12 0.139 0.1581 13 0.125 0.1255 14 0.164 0.2201 15 0.145 0.1660 16 0.137 0.1532 17 0.115 0.1027 18 0.080 0.0555 19 0.041 0.0137 20 0.017 0.0016 21 0.010 0.0007 22 0.005 0.0002 23 0.001 0 24 0.002 0.0002 D2, D2S1338 16 0.037 0.0134 17 0.185 0.2624 18 0.087 0.0587 19 0.110 0.1059 20 0.138 0.1556 21 0.032 0.0085 22 0.024 0.0058 23 0.112 0.1110 24 0.142 0.1674 25 0.111 0.1066 26 0.019 0.0044 27 0.002 0.0002 Next based on 243 profiles only from 16 digit match data D16, D16S539 8 0.019 0.0041 9 0.129 0.0638 10 0.054 0.0144 11 0.289 0.3560 12 0.288 0.3724 13 0.186 0.1893 14 0.029 0 15 0.005 0 D19 D19S433 and D3 D3S1358 not analysed because only 23 profiles from 18 digit matches and 4 on 20 digit Unmistakable conclusion - the commonest alleles increase in frequency in matches, rare alleles decrease in frequency , and rarest alleles are practically absent from matches. At least at the level of 4 million pool. Giving 'Average Joe' profile on first 7 loci of VWA,THO1,D8,FGA,D21,D18,D2,D16 (16,17)(6,9.3)(13,13)(21,21)(29,30)(14,14)(17,17)(11,12) So for 4million profiles ,fully sorted and match checked there were 4 matches. Because each of these 4 is a minimum ie greater than 1 and less than 2 and as so near a pair of matches in the 3..... section I will declare result as 4.8 matches in 4 million as reasonable conclusion. For an interesting exploration and derrivation of this "Square Law" look in the Usenet 2003 archives for group "uk.legal" or "alt.sci.math.probability" and thread titled "Prosecutors fallacy revisited " Then by square law implies 1.2 matches in 2 million or 1 in 1.8 million. Whether the square law applies scaling upward I've no idea but assuming it does then it implies:- 4.8 matches in 4 million or 1 in 830,000 30 matches in 10 million or 1 in 330,000 750 matches in 50 million or 1 in 67,000 75,000 in 500 million or 1 in 6,700 7.5 million in 5000 million or 1 in 670 if whole earth was profiled with SGM+ The next determination is how co-ancestry factored in reduces these figures for real world representation. The other result for 4 million population is for matches of 19 out of 20. For matches in first 9 pairs and 1 allele match in 10th pair is 12. Giving about 120 matches of any 19 out of 20 in a 4 million population or 1 in 33,000 To show how wrong forensic 'scientists' are. This was the first discoverd randomly generated 10 loci matching profile. vWA,THO1,D8,FGA,D21,D18,D2,D16,D19,D3 (15,16)(6,9.3)(13,13)(20,23)(29,30)(15,17)(20,25)(9,11)(14,15)(14,16) Multiply the allele frequencies together gives 0.08x.216x.241x..304x.333x.333x.143x.139x .226x.258x.145x.115x.138x.111x .129x.289x.382x.177x.132x.247 =3.42 *10^-15 reciprocal = 290 million million or 290 trillion. A forensic 'scientist' given the above profile is likely to say that the chance of finding a pair of such profiles would be 1 in 290 trillion.

Email Paul Nutteing by removing 4 of the 5 dots
or email Paul Nutteing ,remove all but one dot
Or a message on usenet group uk.legal has got to me recenently a couple of times.
My original accounts nutteing@quickfindit.com and the nutteing2@quickfindit account are now dead. A lot of the contents of this file plus other material 'peer reviewed' on the main forensic science usergroup

Background
A simulation of a large DNA profile database
A simulation of DNA profile 'families'
A simulation of DNA profile families with consanguinity
A simulation of DNA profile 'families' for 6 generations
dnas.htm revisited with all alleles represented
dnas.htm revisited for >8 percent allele frequency subset (similar ancestry )
Simulation of Taiwanese Tao and Rukai populations to explore the effect of within and without ancestral clusters
Basques autochthonous DNA profiles simulation, 9 loci
Australian Capital Caucasian 9 loci simulation
Australian Capital Caucasian 9 loci simulation, >= 5% allele frequency
CODIS, 13 Loci Caucasian Simulation
Automating the macros
Exploring other DNA profile match scenarios
Suspect familial matching
Return to co-ancestry factor in the NDNAD simulations
144 random matches in 65,000 -- ONLY?


Powered by counter.bloke.com