Saturday, September 25, 2010

Polymarker (PM)


The PM portion of the PM plus DQA1™ kit involves 5 genetic loci in addition to DQ alpha.  (The manufacturer/distributor's name has changed from Roche Molecular to PE Applied Biosystems.)  These additional loci are named for historical reasons.  The 5 loci are LDLR, GYPA, HBGG, D7S8 and GC.  Each of these represents a distinct location, or locus, in the DNA.  The 5 non-DQ alpha loci have rather simple allelic variations compared to DQ alpha.  For example, the system detects only two LDLR alleles, allele A and allele B.  The same is true for GYPA and D7S8, each of which has detectable A and B alleles.  The HBGG and GC loci each have A, B and C alleles; in other words, three variations each.  Thus, reading PM typing strips is fairly simple, at least on the surface.  Here are some examples:



 
The manufacturer recommends a lower limit of input DNA for PM plus DQA1 typing.  The reason for this lower limit (2 nanograms, ng) is the possibility of missing alleles if the input DNA is too low.  Missing alleles (related terms are "allelic dropout" and "differential amplification") are certainly possible in the author's opinion, particularly with low amounts of DNA or degraded DNA.  The phenomenon of failing to detect all alleles present was discussed in the context of DQ alpha in the original User's Guide, although the conditions under which this may occur were not precisely defined with respect to the amount and condition of the input DNA.  The potential for missing an allele of course increases if the control or S dot is absent or extremely weak, remembering that the C and S dots test whether a threshold has been reached at the PCR and later stages.  The following example illustrates how a DNA profile of one person might change to that of another due to failure to detect an allele.



 
Failure to detect alleles under certain circumstances is a theoretical possibility and was actually demonstrated for DQ alpha in the original User's Guide.  The theory that addresses this is called the "stochastic effect."  In addition to the stochastic effect, a PCR phenomenon called "differential amplification" may play a role when input DNA amounts are low, when input DNA is extensively degraded, and possibly at other times.
 PM plus DQA1 is frequently used on mixed DNA samples from two or more people.  The following example illustrates some of the ambiguity that can arise if interpretations are not cautious:

 
In the example above, since two of the loci (HBGG and GC) show three alleles, the sample was a mixture of at least two people.  The problem here is that any two people can be included as contributing to the mixture.  The typing strip is saturated, meaning every dot that can be showing is showing.  A poorly recognized limitation of the PM strip is that it is very easily saturated.  For example, two people of types AB/AA/AB/BB/BC (person 1) and AB/BB/AC/AA/AA (person 2) could, when their DNAs are mixed, produce the pattern in the example.  In fact, there are almost limitless combinations of 2 types that could produce the pattern.  There are also many combinations of two people that would lead to a typing strip lacking one or two dots.  Finally, there are many mixtures that may mimic a single source of DNA.  For example:
  

 
The profile in this example could have come from a single person whose profile was AB/AB/AC/AB/AB.  Alternatively, two people of types AA/AB/AC/AA/BB and BB/AB/AA/BB/AA, if mixed, could produce the profile.  There are many other possible combinations of people who, when their DNAs are mixed, could produce the profile.  In fact, the only individuals excluded are those possessing the HBGG B allele or the GC C allele, assuming that the typing strip is reliably detecting all the alleles present.  Extreme caution should be used when there is a possibility of a DNA mixture.  It is arguable whether the system should be relied upon when there is an unresolved mixture.  The ease of saturation may lead to false inclusions.  False exclusions are also possible when the amount of input DNA is low, the input DNA is degraded, or the S dot is weak or absent.
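The inclusion and exclusion reasoning above can be checked mechanically.  The following is a minimal Python sketch, not part of any typing kit software.  It assumes the locus order LDLR/GYPA/HBGG/D7S8/GC used in the genotypes above and, like the text, assumes that every allele present is detected (no dropout).  The helper names (`strip_pattern`, `is_excluded` and so on) are hypothetical.  It reproduces both examples and counts how many genotypes fit each strip.

```python
from itertools import combinations_with_replacement, product

# Loci in the order used for the PM genotypes in the text, and the
# alleles each locus can show on the strip.
LOCI = ["LDLR", "GYPA", "HBGG", "D7S8", "GC"]
DETECTABLE = {"LDLR": "AB", "GYPA": "AB", "HBGG": "ABC", "D7S8": "AB", "GC": "ABC"}

def parse(profile):
    """Split a profile such as 'AB/AA/AB/BB/BC' into {locus: allele pair}."""
    return dict(zip(LOCI, profile.split("/")))

def strip_pattern(*people):
    """Dots expected on the strip: the union of all contributors' alleles,
    assuming every allele present is detected (no dropout)."""
    return {locus: set().union(*(parse(p)[locus] for p in people)) for locus in LOCI}

def is_saturated(pattern):
    """True if every dot the strip can show is showing."""
    return all(pattern[locus] == set(DETECTABLE[locus]) for locus in LOCI)

def is_excluded(person, pattern):
    """Excluded if the person carries any allele absent from the strip."""
    typed = parse(person)
    return any(not set(typed[locus]) <= pattern[locus] for locus in LOCI)

def locus_genotypes(locus):
    """All single-locus genotypes, e.g. AA, AB, BB for a two-allele locus."""
    return ["".join(g) for g in combinations_with_replacement(DETECTABLE[locus], 2)]

def count_pairs_producing(pattern):
    """Ordered pairs of five-locus genotypes whose mixture gives exactly this
    pattern; the loci are treated independently, so per-locus counts multiply."""
    total = 1
    for locus in LOCI:
        gts = locus_genotypes(locus)
        total *= sum(1 for g1, g2 in product(gts, repeat=2)
                     if set(g1) | set(g2) == pattern[locus])
    return total

def count_included(pattern):
    """Number of single-person genotypes that the pattern does not exclude."""
    total = 1
    for locus in LOCI:
        total *= sum(1 for g in locus_genotypes(locus) if set(g) <= pattern[locus])
    return total

saturated = strip_pattern("AB/AA/AB/BB/BC", "AB/BB/AC/AA/AA")
single = strip_pattern("AB/AB/AC/AB/AB")
mixture = strip_pattern("AA/AB/AC/AA/BB", "BB/AB/AA/BB/AA")
print(is_saturated(saturated))
print(mixture == single)        # a two-person mixture that mimics one person
print(count_pairs_producing(saturated))
print(count_included(single))
```

Under these assumptions, 49,392 ordered pairs of genotypes (24,696 distinct pairs of people) would saturate the strip.  Also, 243 of the 972 possible single-person genotypes, one in four, are not excluded by the AB/AB/AC/AB/AB pattern.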

Dot Hybridization


 
From the pattern of probes that the amplified DNA binds to, a potential DNA type, also called a genotype, can be inferred. 
DQ alpha typing strips look like this before any types are obtained:
 



 
The invisible dot to the right of the number 1 has a DNA probe for the 1-allele (variation) of DQ alpha.  The invisible dot to the right of the 2 has a DNA probe for the 2-allele, and so on.  The 1-allele itself has variations, the 1.1, 1.2 and 1.3 subtypes, also called alleles.  Notice that the typing strip has no specific dot or probe for the 1.2 subtype.  Also, the typing strip can't distinguish between the 4.2 and 4.3 subtypes; there is a single dot for both.  It is quite possible that there exist DQ alpha alleles that the typing strip would not detect, as well as further subtypes of the alleles that the strip does detect.
 Here are some examples of how the strips are read: 



 



 
This last example brings up an important issue with DQ alpha typing.  The 1.2 allele is actually the second most common allele in most populations.  This means there will be frequent situations where the 1.2 allele may be present but undetected as in the last example.  An obvious question is:  Why not just have a specific probe for the 1.2 allele?  The answer is that the typing strip already maximizes the probing of a relatively short stretch of DNA.  That is, the DQ alpha locus itself is only about 240 base pairs long.  The multiple probe typing strip was probably about the best that could be done in terms of detecting multiple alleles of this small locus in a single typing step.  
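The masking described above can be illustrated with a toy model.  The probe set below is hypothetical and simplified.  It is built only from the features the text mentions (a general 1 dot, no 1.2-specific dot, one shared 4.2/4.3 dot), plus made-up dots for illustration, and it is not the layout of the actual commercial strip.  The sketch lists every genotype consistent with a given dot pattern.

```python
from itertools import combinations_with_replacement

# HYPOTHETICAL, simplified probe set; NOT the layout of any commercial strip.
# Each dot lists the alleles whose amplified DNA it would bind.
PROBES = {
    "1": {"1.1", "1.2", "1.3"},   # any 1-subtype
    "1.1": {"1.1"},
    "1.3": {"1.3"},               # note: no dot specific to 1.2
    "2": {"2"},
    "3": {"3"},
    "4": {"4.1", "4.2", "4.3"},
    "4.1": {"4.1"},
    "4.2/4.3": {"4.2", "4.3"},    # one dot shared by 4.2 and 4.3
}
ALLELES = sorted(set().union(*PROBES.values()))

def dots(genotype):
    """Dots that light up for a genotype (a pair of alleles)."""
    return {name for name, targets in PROBES.items() if targets & set(genotype)}

def genotypes_matching(pattern):
    """Every genotype whose dot pattern is identical to `pattern`."""
    return [g for g in combinations_with_replacement(ALLELES, 2) if dots(g) == pattern]

print(genotypes_matching(dots(("1.1", "1.2"))))  # 1.2 hidden behind 1.1
print(genotypes_matching(dots(("1.2", "3"))))    # 1.2 inferred by elimination
```

In this toy model a 1.1,1.2 sample produces the same dots as a 1.1,1.1 sample, so the 1.2 allele goes unseen.  The 1.2 allele can only be inferred when the 1 dot lights up with no other 1-subtype dot beside it.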
Historically, DQ alpha was often the first PCR-based test that forensic labs used.  Actually, the DQ alpha system is quite different from the majority of PCR applications in the scientific community.  This will be explained in more detail below.



 
 



 

DQA1 (also known as DQ alpha)


The PM plus DQA1™ (PE Applied Biosystems) typing kit targets six genetic loci.  All six are copied in the initial PCR.  The products from this reaction are then placed onto two separate typing strips.  One strip is for DQ alpha and the other types the remaining five loci.
There are several steps in a DQ alpha PCR test: 
1.  DNA from 50 or more cells is extracted.  Notice that this test requires fewer cells than the RFLP test.  Sensitivity (the number of cells needed) is the main advantage of PCR tests.  However, the increased sensitivity also makes PCR tests more vulnerable to trace contaminants, in other words, DNA from unexpected sources.
2.  The DNA from the sample is copied over and over resulting in amplification of the original target sequence.  The copying or amplification is accomplished in a machine specially designed for this purpose.  This machine is called a thermal cycler. 
3.  The amplified DNA is now exposed to a variety of probes that are bound to a blot (see RFLP).  Note that in RFLP, the target DNA is bound to the blot and the probe DNA is added; for the DQ alpha dot blot, the probe DNAs are bound to a small blot strip and the target DNA is added.
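The copying in step 2 compounds with each thermal cycle.  A minimal sketch of the idealized arithmetic follows.  The cycle counts, the `efficiency` parameter and the assumption that 50 diploid cells supply about 100 copies of a locus are illustrative assumptions, not kit specifications.

```python
def copies_after(cycles, start_copies, efficiency=1.0):
    """Idealized PCR: every cycle multiplies the target by (1 + efficiency).
    efficiency=1.0 means perfect doubling; real reactions fall short of this
    and eventually level off, which this sketch ignores."""
    return start_copies * (1 + efficiency) ** cycles

# Assumption: 50 diploid cells carry about 100 copies of a given locus.
start = 50 * 2
for cycles in (10, 20, 30):
    print(cycles, f"{copies_after(cycles, start):.3g}",
          f"{copies_after(cycles, start, efficiency=0.9):.3g}")
```

The same exponential growth that makes the test sensitive also amplifies any stray contaminating DNA present at the start.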

Partial Profiles


Use of "partial profiles" is a newly emerging and fairly disturbing trend.  A partial profile is one in which not all of the targeted loci show up in the sample.  For example, if 13 loci were targeted and only 9 could be reported, that would be termed a partial profile.  Failure of all targeted loci to show up demonstrates a serious deficiency in the sample.  Normally, all human cells (except red blood cells and cells called "platelets") have all 13 loci.  Therefore, a partial profile represents the equivalent of less than a single human cell.  This presents some important problems:
1.  A partial profile essentially proves that one is operating outside of well-characterized and recommended limits.
2.  Contaminating DNA usually presents as a partial profile, although not always.  For this reason, the risk that the result is a contaminant is greater than for samples that present as full profiles.
3.  A partial profile is at risk of being incomplete and misleading.  The partial nature of it proves that DNA molecules have been missed.  There is no way of firmly determining what the complete profile would have been, except by seeking other samples that may present a full profile.
Most forensic laboratories will try to obtain full profiles.  Unfortunately, in an important case, it may be tempting to use a partial profile, especially if that is all that one has.  However, such profiles should be viewed skeptically.  Over-interpretation of partial profiles may lead to serious mistakes, including false inclusions and false exclusions alike.  It could be said that, compared to the first PCR-based tests introduced into the courts, use of partial profiles represents a decline in standards.  This is because those earlier tests, while less discriminating, had controls (known as "control dots") that helped prevent the use of partial profiles.  The earlier tests will be discussed below, primarily for historic reasons, but also because they do still appear on occasion.

Multiplex STR


One of the more commonly encountered STR test designs in forensic testing is called multiplex STR.  Multiplex PCR reagent kits are sold by both PE Applied Biosystems and Promega.  Such systems combine three or more different PCRs in one reaction that targets distinct STR loci at the same time.  Three of the commonly used loci are called CSF1PO, TPOX and TH01.  Again, the names of the loci have historical significance but are of little importance as names.
Profiler Plus and Cofiler™ (PE Applied Biosystems) together combine 13 different STR loci.  Powerplex™ (Promega) uses the same 13 loci, but the primers are different.  The Promega kit incorporates published primer sequences, a significant scientific advantage, since without the primer sequences it is unclear which STRs at some loci are targeted.  A newer typing kit, Identifiler™ (PE Applied Biosystems), incorporates the original 13 loci and adds 2 more.  By design (meaning where the designers placed the primers on the DNA), the loci in a multiplex STR kit produce DNA fragments with different, non-overlapping size ranges.  Where the size ranges do overlap, the fragments are tagged with different dyes to help distinguish the 13 loci.  These test systems have boldly ambitious designs and should be considered fairly experimental, especially for samples whose quantity and/or quality is outside tested limits.
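The size-and-dye scheme can be sketched as a lookup table.  Everything in `BINS` below is hypothetical (locus names, dyes and base-pair ranges are invented for illustration).  The point is only the rule that loci sharing a dye must not overlap in size, so that any detected fragment maps to at most one locus.

```python
# HYPOTHETICAL kit layout: (locus, dye, smallest_bp, largest_bp).
# Locus names and sizes are made up for illustration.
BINS = [
    ("locus_A", "blue", 100, 150),
    ("locus_B", "blue", 160, 220),
    ("locus_C", "green", 110, 170),  # overlaps locus_A in size, so it uses another dye
    ("locus_D", "green", 180, 250),
]

def design_is_unambiguous(bins):
    """True if no two loci sharing a dye have overlapping size ranges."""
    by_dye = {}
    for _locus, dye, low, high in bins:
        by_dye.setdefault(dye, []).append((low, high))
    for ranges in by_dye.values():
        ranges.sort()
        for (_, high1), (low2, _) in zip(ranges, ranges[1:]):
            if low2 <= high1:
                return False
    return True

def assign_locus(dye, size_bp, bins=BINS):
    """Name the locus whose dye and size range contain a detected fragment."""
    for locus, bin_dye, low, high in bins:
        if bin_dye == dye and low <= size_bp <= high:
            return locus
    return None  # outside every bin for this dye

print(design_is_unambiguous(BINS))
print(assign_locus("blue", 130), assign_locus("green", 130))
```

A 130-base fragment is assigned to a different locus depending only on its dye, which is why the dye label matters as much as the size.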
There have been some discrepancies in profiles obtained with test kits from the two manufacturers when the same samples were analyzed.  These discrepancies are not extremely common but are noticeable and fairly dramatic when they occur.  Any base within DNA can mutate (i.e., change).  For example, an A base at a particular position can change to a G.  Such mutations usually first appear in a sperm or egg cell.  Each mutation then appears throughout the body of the person who results from such a sperm or egg.  Discrepancies in test kit results are thought to be due to mutations in the sites that the primers bind.  These events are called primer binding site mutations, or PBS mutations.
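A toy model shows how a single primer binding site mutation could make two kits disagree.  The template, the mutation and both primers below are hypothetical, and the primers are shorter than real ones.  The exact-match rule also overstates the effect, since real primers can sometimes tolerate a mismatch.

```python
def mutate(seq, position, new_base):
    """Return a copy of seq with one base changed (a point mutation)."""
    return seq[:position] + new_base + seq[position + 1:]

def primer_binds(primer, template):
    """Toy rule: a primer binds only where it matches the template exactly."""
    return primer in template

# Hypothetical template: left flank, STR, right flank.
template = "ATGCTAGTATTTG" + "GATA" * 7 + "AAAAAATTTTTTTT"
mutant = mutate(template, 5, "G")  # an A changes to a G inside the kit 1 primer site

KIT_1_UPSTREAM = "ATGCTAGTA"  # hypothetical primer covering position 5
KIT_2_UPSTREAM = "GTATTTG"    # hypothetical primer placed just past it

for name, primer in [("kit 1", KIT_1_UPSTREAM), ("kit 2", KIT_2_UPSTREAM)]:
    print(name, primer_binds(primer, template), primer_binds(primer, mutant))
```

In this toy model, if a person carried the mutation on only one of their two copies, kit 1 would miss that allele and could report a heterozygote as a homozygote, while kit 2 would still see both copies.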
Multiplex STRs are often combined with PCR for another locus called amelogenin (pronunciation varies, but usually it is AM'-EEL-O-GEN-IN).  Amelogenin adds little to the discriminating power of the test.  Its purpose is to help distinguish male and female sources of DNA by detecting the X and Y chromosomes.  The amelogenin products have sizes that place them outside the size ranges of the other loci.
Compared to PCR-based systems originally introduced, such as PM plus DQA1 (PE Applied Biosystems), multiplex STRs are technically simpler and more direct at the allele detection stage.  On the other hand, multiplex STRs are slightly more vulnerable to missing alleles, for two reasons.  1) Larger DNA fragments are degraded before smaller ones, simply because larger DNA molecules are bigger targets for degradative enzymes than smaller DNA molecules.  2) PCR itself favors (will produce more of) smaller DNA targets compared to larger ones, which take more time to copy.  The copying is done by a protein called an enzyme, which can finish copying smaller DNA fragments more rapidly than larger ones.
Both of these factors result in a tendency for small DNA fragments to be seen more readily than larger ones.  This is not an overwhelming tendency but certainly should be considered when amounts of input DNA are low, when DNA degradation is suspected, and particularly when a single small STR allele is weakly observed at a given STR locus.  
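The size effect of degradation can be put into a simple formula.  Assuming, purely for illustration, that damage strikes each base independently with a fixed probability p, a template of length L survives intact with probability (1 - p)^L, which falls off faster for longer fragments.  The break probability below is hypothetical.

```python
def fraction_intact(length_bp, break_prob_per_base):
    """Chance that a template of length_bp has no break anywhere along it,
    assuming each base breaks independently with the same probability."""
    return (1 - break_prob_per_base) ** length_bp

BREAK_PROB = 0.005  # hypothetical level of damage, for illustration only
for length in (100, 200, 300):
    print(length, round(fraction_intact(length, BREAK_PROB), 2))
```

With the same damage, a 300-base target keeps roughly a third as many intact copies as a 100-base target, so a large allele can fade while a small allele at the same locus still shows.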
STRs are prone to an artifact called "stutter bands" or "shadow bands."  These are thought to be due to the DNA repeats slipping out of register during the PCR process.  They are spurious PCR products that are usually one repeat length smaller than the main band.  The main problem they pose is that it may be difficult or impossible to determine whether light-intensity bands are due to stutter or to the presence of a mixture.  Although the stutter bands are predictably below (shorter than) the main band, they often align with common alleles.



 
Most forensic laboratories are aware of stutter artifacts, and many take extremely careful and appropriate countermeasures.  However, stutter artifacts could conceivably play a role if inappropriate attempts are made to interpret minor components of a mixture.
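The ambiguity can be seen in a simple filter of the kind used to flag stutter.  This is a hypothetical sketch: the 15% cutoff is an illustrative assumption, not a validated laboratory value, and peaks are keyed by repeat number rather than measured size.

```python
STUTTER_RATIO = 0.15  # hypothetical cutoff, not a validated laboratory value

def flag_possible_stutter(peaks, ratio=STUTTER_RATIO):
    """peaks maps repeat number -> peak height.  Flag peaks that sit one
    repeat below a taller peak at less than `ratio` of its height."""
    flagged = []
    for repeats, height in sorted(peaks.items()):
        parent = peaks.get(repeats + 1)
        if parent is not None and height < ratio * parent:
            flagged.append(repeats)
    return flagged

# A weak 7-repeat peak next to a strong 8-repeat peak.  Whether the 7 is
# stutter or a minor contributor's real allele, the numbers look the same.
locus_peaks = {7: 180, 8: 2000, 11: 1900}
print(flag_possible_stutter(locus_peaks))
```

The filter flags the 7-repeat peak, but it would flag it in exactly the same way if that peak were a real allele from a minor contributor; the peak heights alone cannot tell the two apart.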
Some of the current STR detection/typing schemes use thin tubes called capillaries instead of flat gels.  When a capillary is used, the results are often displayed as tracings on a graph instead of the image display shown above.  On such tracings, each main STR product appears as a large peak, while stutter bands appear as smaller peaks (to the left).  The tracings are called electropherograms (ELECTRO-FERO-GRAMS).  The tracing data should be accompanied by numeric data that reveal the measured size of each PCR product, the intensity (peak height) and the estimated allele size.  The numeric data can be important in determining the quality of the results.
 
 

 
The two figures above show some alternative ways in which STR results/data are presented.  Basically, the peaks represent tracings of bands that have come off the end of a gel, or they may represent tracings of the gel itself, depending on the equipment used.  Larger DNA fragments are on the right and smaller ones on the left.  There are recommended standards, called thresholds, for how high or low the peaks may be.
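Threshold checks amount to sorting each peak against a lower and an upper limit.  In the sketch below, the limits are hypothetical values in relative fluorescence units (RFU), a common unit for peak height on capillary instruments; each laboratory sets and validates its own.

```python
# HYPOTHETICAL thresholds in relative fluorescence units (RFU).
LOWER_RFU = 50     # below this, a peak is too weak to call reliably
UPPER_RFU = 7000   # above this, the signal may be too strong to measure accurately

def classify_peak(height_rfu, lower=LOWER_RFU, upper=UPPER_RFU):
    """Place a peak below, within, or above the laboratory's thresholds."""
    if height_rfu < lower:
        return "below"
    if height_rfu > upper:
        return "above"
    return "within"

for height in (30, 1500, 9000):
    print(height, classify_peak(height))
```

Peaks outside the window are the ones whose numeric data deserve the closest scrutiny before any allele is reported.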
All technology has limitations.  For multiplex PCRs, the most serious limitations are in the areas of samples that are minimal, degraded, mixed, over-interpreted, contaminated or even potential combinations of these.  Some current practices lack support by the established literature.  Over-interpretation can also occur when there are partial profiles.[1]  The scientific system recognizes the human tendency toward over-interpretation and offers these countermeasures: independent review, independent verification, scientific controls and demonstrations of reproducibility.  These reviews and controls are considered integral parts of the scientific process.
PCR-based testing is potentially useful since it is currently the only quick method of amplifying really minuscule amounts of DNA.  However, it is important to recognize that PCR-based methods are exquisitely sensitive to contamination and need to be interpreted with extreme caution.  Match probabilities generated with some STR typing systems may involve extreme numbers, perhaps giving the impression of an infallible result.  Scientific rigor often requires that extreme numbers be placed in a context that considers all aspects of testing, including laboratory error rates and technical limitations.
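Match probabilities of this kind are commonly computed with the product rule.  The sketch below assumes Hardy-Weinberg genotype frequencies (p² for a homozygote, 2pq for a heterozygote) and independence between loci; the allele frequencies are invented.  It also shows that dropping a locus, as in a partial profile, changes the number.  Note that the calculation contains no term at all for laboratory error.

```python
from math import prod

def genotype_frequency(p, q=None):
    """Hardy-Weinberg estimate: p*p for a homozygote, 2*p*q for a heterozygote."""
    return p * p if q is None else 2 * p * q

def profile_frequency(genotype_freqs):
    """Product rule: multiply across loci, assuming the loci are independent."""
    return prod(genotype_freqs)

# HYPOTHETICAL allele frequencies at three loci.
per_locus = [
    genotype_frequency(0.10, 0.20),  # heterozygote, 2pq = 0.04
    genotype_frequency(0.30),        # homozygote, p*p = 0.09
    genotype_frequency(0.05, 0.25),  # heterozygote, 2pq = 0.025
]
full = profile_frequency(per_locus)
partial = profile_frequency(per_locus[:2])  # third locus failed to type
print(f"full: 1 in {1 / full:,.0f}; partial: 1 in {1 / partial:,.0f}")
```

Each added locus multiplies in another small fraction, which is how a full profile at many loci can yield the extreme numbers mentioned above.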

What is the complement of a DNA sequence?

  This might be more information than you would like, but to really understand PCR primers, try to walk through this: 
The complement of a DNA sequence (strictly speaking, the reverse complement) is the sequence written backwards, exchanging all A's for T's, all T's for A's, all G's for C's and all C's for G's.  For example, the complement of the sequence AGTA is TACT.  An easy way to get the complement of a DNA sequence is to write another line below the original sequence, remembering that A and T replace each other, as do G and C.  Then read the lower line backwards:
So, for the sequence:
 GATCTTAGCTTTAAAGCCC
 write the complementary line below it giving:
GATCTTAGCTTTAAAGCCC
CTAGAATCGAAATTTCGGG
 Then, just read the lower line backwards (from right to left) giving the complement: 
GGGCTTTAAAGCTAAGATC
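The two-step recipe above translates directly into code.  This minimal Python sketch uses only the standard library (`str.maketrans`, `str.translate` and slicing); the function names are illustrative.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
PAIR = str.maketrans("ATGC", "TACG")

def complement_line(seq):
    """The line written below the sequence: each base swapped for its partner."""
    return seq.translate(PAIR)

def reverse_complement(seq):
    """Swap each base for its partner, then read the result backwards."""
    return complement_line(seq)[::-1]

print(complement_line("GATCTTAGCTTTAAAGCCC"))     # CTAGAATCGAAATTTCGGG
print(reverse_complement("GATCTTAGCTTTAAAGCCC"))  # GGGCTTTAAAGCTAAGATC
print(reverse_complement("AGTA"))                 # TACT
```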
In practical terms, the upstream (left) primer can be a direct reading of the target sequence, while the downstream (right) primer must be the complement of the directly read sequence.
If the above is confusing, it may suffice to think of the primers as  two arrows that point at one another with the STR located between them.  This is how the PCR targets the locus and the STR.
In practice, PCR primers are usually at least 17 bases in length.  The point here is that to use PCR to target an STR, the primers recognize constant, conserved sequences that flank the actual STR.  This means that the actual length of the target sequence depends on where the primers are placed in the flanking sequence.  For instance, the Promega and PE Applied Biosystems test kits use mostly different primers.  The upstream primer could be designed to recognize DNA 100 bases upstream of the sequence shown.  Similarly, the downstream primer could be designed to recognize DNA further downstream.  Such placement of the primers, further upstream and downstream, would make all alleles (variations) of the STR appear larger than if the primers were placed close to the STR itself.  Wherever the primers are placed, that defines the region we will examine.  That region will then vary among individuals due to changes in the STR itself, as explained above for the simple STR based on the repeating AT motif.
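The dependence of product size on primer placement is simple addition.  In this sketch the flank lengths for the two primer placements are hypothetical.  The point is that moving the primers shifts every allele by the same constant, while the one-repeat spacing between alleles stays at 4 bases.

```python
def amplicon_length(repeats, repeat_bp, left_flank_bp, right_flank_bp):
    """PCR product length: flanking DNA captured on each side of the STR
    (primer sites included) plus the repeat region itself."""
    return left_flank_bp + repeats * repeat_bp + right_flank_bp

# HYPOTHETICAL primer placements for a 4-base repeat such as GATA.
CLOSE = (20, 25)   # primers placed near the STR
FAR = (120, 60)    # primers placed farther out in the flanking DNA

for repeats in (7, 8, 9):
    near = amplicon_length(repeats, 4, *CLOSE)
    far = amplicon_length(repeats, 4, *FAR)
    print(repeats, near, far, far - near)
```

This is why the same allele can have different base-pair sizes in two different kits while still receiving the same repeat-based designation.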

After PCR is used to provide many copies of a given person's STR, the products (copies) are separated according to size on an electrophoretic gel (see RFLP above for more details about gels).  The gel can be flat, as for RFLP, or it can be in a round tube, called a capillary with a detector at the end of it.  Typical flat gel STR results look like this:
The black bars are called bands.  Each band is made up of many identical-size DNA molecules that were produced by PCR.  The gel separates smaller bands (DNA molecules) from larger ones.  The bands near the lower end of the gel are smaller (i.e., the DNA fragments are shorter) than those near the top.  For example, looking at the reference ladder, the first band near the lower end of the gel is the smallest STR.  For simplicity, let's say this smallest band contains a single repeat such as CATG, flanked by other DNA that the primers actually recognize in everyone's DNA.  The next higher band in the ladder would then contain 2 repeats, CATGCATG; the next, 3 repeats; and so on.  By comparing the positions of bands in the unknown samples with the reference ladder, the allele sizes are deduced.  In this example, Sample A had bands at the 2-repeat and 5-repeat positions.  Common terminology would call this sample a 2,5 type.  Sample B would be called 2,4.  For a single person, each locus normally has two alleles, and these can be different (heterozygous) or the same (homozygous).
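Reading a lane against the ladder can be sketched as a nearest-rung lookup.  The ladder positions (in millimetres from the bottom of the gel) and the matching tolerance below are hypothetical, and real sizing software is considerably more involved.  Note that this sketch reports a single band as homozygous, although a lone band can also mean that an allele was missed.

```python
# HYPOTHETICAL ladder: distance of each rung from the bottom of the gel (mm),
# for alleles with 1 to 6 repeats.  Smaller fragments sit nearer the bottom.
LADDER_MM = {1: 5.0, 2: 10.0, 3: 15.0, 4: 20.0, 5: 25.0, 6: 30.0}
TOLERANCE_MM = 1.0  # hypothetical matching window

def call_allele(band_mm):
    """Repeat number of the nearest ladder rung, or None if no rung is close."""
    repeats, rung_mm = min(LADDER_MM.items(), key=lambda rung: abs(rung[1] - band_mm))
    return repeats if abs(rung_mm - band_mm) <= TOLERANCE_MM else None

def call_genotype(bands_mm):
    """Genotype label such as '2,5' plus zygosity, for one or two bands."""
    calls = [call_allele(b) for b in bands_mm]
    if not 1 <= len(calls) <= 2 or None in calls:
        return None  # empty lane, off-ladder band, or more bands than one person gives
    low, high = min(calls), max(calls)
    zygosity = "homozygous" if low == high else "heterozygous"
    return f"{low},{high}", zygosity

print(call_genotype([10.2, 24.8]))  # Sample A
print(call_genotype([9.9, 20.3]))   # Sample B
print(call_genotype([15.1]))        # a single band
```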

STRs


This will be presented in some detail because STRs are important in current forensic DNA testing.  The abbreviation STR stands for Short Tandem Repeat.  STRs are the type of DNA used in most of the currently popular forensic DNA tests.  STR is a generic term that describes any short, repeating DNA sequence.  For example, the DNA sequence ATATATATATAT is an STR that has a repeating motif consisting of two bases, A and T.  It turns out that our DNA has a variety of STRs scattered among DNA sequences that encode cellular functions.  For reasons that are not entirely understood, people vary from one another in the number of repeats they have, at least for some STR loci.  For example, person #1 may have ATATAT at a particular locus while person #2 may have ATATATATATAT.  Thus, STRs are often variable (polymorphic), and these variations are used to try to distinguish people.  The term STR doesn't necessarily imply PCR.  PCR is one of many methods that might be used to help analyze STRs; STRs have also been analyzed by DNA sequencing, for example.  To understand PCR-assisted STR typing, it is useful to briefly consider how such PCRs are designed.
Suppose that laboratory data revealed the following DNA sequence:
 --ATGCTAGTATTTGGATAGATAGATAGATAGATAGATAGATAAAAAAATTTTTTTT--
The STR is the run in the middle of this sequence, consisting of GATA repeated 7 times.  The dashes at the beginning and end of the overall sequence shown indicate that there is more sequence available both upstream and downstream of the region shown.  Remember, DNA is relatively very long and linear, and we are just going to look at a small region of it.
Now, let's say we want to design a PCR to examine this same locus in other people.  To design the PCR, we need two primers, short synthetic DNA molecules that recognize the region.  One primer might be ATGCTAGTA (the first nine bases of the sequence above), a sequence that would recognize the DNA flanking the left side of the STR.  The second primer might be AAAAAAAATTTTTT.  This is called the downstream primer, and it might be difficult to recognize in the sequence.  The reason it is difficult to recognize at first is that it is the complement of the sequence AAAAAATTTTTTTT (the last 14 bases on the right of the longer sequence above).  See "General Considerations" for a more detailed discussion.
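The primer logic above can be checked with a short sketch.  The sequence is assembled from the example above (left flank, seven GATA repeats, right flank).  The exact-match rule is a simplifying assumption, and the function names are hypothetical.  The sketch finds the upstream primer directly, finds the downstream primer through its complement, and counts the repeats in the product.

```python
import re

# Watson-Crick partners, used to find where the downstream primer sits.
PAIR = str.maketrans("ATGC", "TACG")

def reverse_complement(seq):
    return seq.translate(PAIR)[::-1]

# Example sequence from the text, assembled as: left flank + 7 x GATA + right flank.
SEQUENCE = "ATGCTAGTATTTG" + "GATA" * 7 + "AAAAAATTTTTTTT"
UPSTREAM = "ATGCTAGTA"          # read directly off the left flank
DOWNSTREAM = "AAAAAAAATTTTTT"   # complement of the right flank

def pcr_product(template, upstream, downstream):
    """Stretch from the upstream primer site to the end of the downstream
    primer site, or None if either primer has no exact match (toy rule)."""
    start = template.find(upstream)
    site = reverse_complement(downstream)  # what the downstream primer pairs with
    end = template.find(site)
    if start == -1 or end == -1 or end < start:
        return None
    return template[start:end + len(site)]

def longest_repeat_run(seq, motif):
    """Largest number of back-to-back copies of motif in seq."""
    runs = re.findall(f"(?:{motif})+", seq)
    return max((len(run) // len(motif) for run in runs), default=0)

amplified = pcr_product(SEQUENCE, UPSTREAM, DOWNSTREAM)
print(len(amplified), longest_repeat_run(amplified, "GATA"))

# A hypothetical second person with 9 repeats gives a longer product.
other_person = "ATGCTAGTATTTG" + "GATA" * 9 + "AAAAAATTTTTTTT"
amplified_other = pcr_product(other_person, UPSTREAM, DOWNSTREAM)
print(len(amplified_other), longest_repeat_run(amplified_other, "GATA"))
```

The same pair of primers brackets both people's STR; only the number of GATA copies between them, and hence the product length, differs.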