Genetics: Linkage Analysis

Introduction

Linkage analysis is a method used both for the detection of 'at-risk' female carriers and for prenatal diagnosis. In many families, linkage analysis has been replaced by direct mutation analysis, but in the small number of families in whom the mutation cannot be identified, linkage analysis remains the only method for the genetic diagnosis of carriers.

Two genetic loci are said to be linked if the alleles at these loci segregate together more often than would be expected by chance, that is, the two loci are so close together on the same chromosome that the chance of them being separated by a crossover event (recombination) during meiosis is small. For two unlinked loci (for example, loci on different chromosomes), the probability that alleles at the two loci are inherited together is 0.5. If two loci are linked, the probability of a recombination event between them is less than 0.5, and the closer the two loci are, the lower this probability. The recombination fraction [θ] is therefore a measure of the genetic distance between two loci. Genetic distance is measured in centimorgans (cM), and 1 centimorgan is defined as the genetic distance between two loci with a recombination frequency of 1%. Although the centimorgan is not a measure of physical distance, in humans it corresponds on average to roughly one million base pairs. So a marker with a 5% probability of recombination with the F8 gene would be about 5 centimorgans from it, i.e. approximately 5 million base pairs away.
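The arithmetic in the paragraph above can be sketched in Python. This is a minimal illustration that assumes the simple linear rule given in the text (1 cM per 1% recombination) and the average figure of roughly one million base pairs per centimorgan; the function names are hypothetical.

```python
def theta_to_cm(theta: float) -> float:
    """Convert a recombination fraction (theta) to centimorgans.

    Assumes the linear rule 1 cM = 1% recombination, which is only a
    reasonable approximation when theta is small.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("a recombination fraction must lie between 0 and 0.5")
    return 100.0 * theta


def cm_to_approx_bp(cm: float, bp_per_cm: float = 1_000_000) -> float:
    """Rough physical distance, assuming ~1 million base pairs per cM on average."""
    return cm * bp_per_cm


if __name__ == "__main__":
    theta = 0.05  # the 5% example from the text
    cm = theta_to_cm(theta)
    print(f"theta={theta}: {cm:.1f} cM, approximately {cm_to_approx_bp(cm):,.0f} bp")
```

For larger distances the linear rule underestimates the distance, because multiple crossovers between the loci start to cancel out; map functions such as Haldane's are used to correct for this.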

The aim of linkage analysis is to identify a marker that co-segregates with the gene of interest and so can be used to track the gene within a family without actually knowing the mutation. By definition, the marker allele that co-segregates with the mutant gene should be present in affected family members and absent in unaffected males (unaffected carrier females will, of course, also carry it). In the era before rapid sequence analysis, linkage analysis was the principal method for establishing the carrier status of 'at-risk' females within a family and for pre-natal diagnosis.

Whilst we usually think of linkage analysis in terms of DNA markers, other markers such as protein variants can also be used. One example is the gene for glucose-6-phosphate dehydrogenase [G6PD], which maps to the long arm of the X chromosome at Xq28, close to the gene for factor VIII [F8]. Close linkage between the G6PD and F8 loci has allowed prenatal diagnosis of haemophilia in the foetuses of women who are heterozygous for two electrophoretic variants (A and B) of G6PD [Edgell et al].

The pedigree below illustrates the theoretical use of G6PD variants (A and B) for carrier detection in a family with severe haemophilia A.

 

In this pedigree, I:1 and III:2 have severe haemophilia A (FVIII:C <1 IU/dL). Analysis shows that both have the A variant of G6PD, whereas the unaffected males in this pedigree have the B variant. So in this family we can use the A variant of the G6PD protein to track the abnormal F8 gene.

III:3 wishes to know whether or not she is a carrier of severe haemophilia A. From the pedigree, she has a 1/2 chance of being a carrier. Using the G6PD variants [remember that the gene for G6PD is located on the X chromosome at Xq28, close to the F8 gene, which also maps to Xq28], III:3 has inherited the B allele from her father and, from her mother, the A allele that tracks with the abnormal F8 gene; she is, therefore, likely to be a carrier. Bayesian risk analysis would allow us to make a more confident prediction of her carrier status, but to undertake this we would need to know the frequency of recombination between the F8 gene and the G6PD gene. Furthermore, measurement of FVIII:C and VWF:Ag would allow us to derive an FVIII:C/VWF:Ag ratio, which may allow us to predict the carrier status of III:3 more accurately - see Bayesian risk analysis.
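The Bayesian step mentioned above can be sketched with the odds form of Bayes' theorem: the prior odds of being a carrier are multiplied by a likelihood ratio for each independent piece of evidence. This sketch assumes, hypothetically, that III:3's mother is an obligate carrier whose phase is known (her mutant X carries G6PD A), so that inheriting the maternal A allele has likelihood 1 - θ if III:3 is a carrier and θ if she is not. The value of θ and the likelihoods for the FVIII:C/VWF:Ag ratio are illustrative placeholders, not published figures, and the function name is hypothetical.

```python
def combine_evidence(prior: float, likelihoods: list[tuple[float, float]]) -> float:
    """Posterior probability of being a carrier.

    prior: probability of being a carrier from the pedigree alone.
    likelihoods: one (P(observation | carrier), P(observation | non-carrier))
    pair per piece of evidence, assumed independent given carrier status.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    odds = prior / (1.0 - prior)
    for p_if_carrier, p_if_non_carrier in likelihoods:
        if p_if_non_carrier <= 0.0:
            raise ValueError("P(observation | non-carrier) must be positive")
        odds *= p_if_carrier / p_if_non_carrier  # multiply by the likelihood ratio
    return odds / (1.0 + odds)  # convert posterior odds back to a probability


if __name__ == "__main__":
    theta = 0.05                   # illustrative G6PD-F8 recombination fraction
    marker = (1.0 - theta, theta)  # III:3 inherited the maternal G6PD A allele
    ratio = (0.2, 0.8)             # hypothetical likelihoods for her FVIII:C/VWF:Ag ratio
    print(round(combine_evidence(0.5, [marker]), 3))
    print(round(combine_evidence(0.5, [marker, ratio]), 3))
```

With a prior of 1/2 and the marker alone, the result is simply 1 - θ; adding a phenotypic result that is more typical of non-carriers pulls the estimate back down.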

There are, of course, serious limitations to this method of linkage analysis, in particular the risk of a recombination event occurring in each generation. Such an event would lead to the incorrect assignment of carrier status to 'at-risk' females or, in the case of pre-natal diagnosis, an incorrect conclusion as to whether a fetus is an affected male. In addition, the method relies upon identifying women who are heterozygous for variants of G6PD. Such heterozygosity is found in approximately 40% of Black females in the US but is uncommon in other ethnic groups.
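The concern about recombination with each generation can be quantified under a simple assumption: if each relevant meiosis independently has probability θ of a crossover between the marker and the gene, the chance that at least one of n meioses is recombinant is 1 - (1 - θ)^n. The θ value and function name below are illustrative.

```python
def prob_at_least_one_recombinant(theta: float, meioses: int) -> float:
    """Chance that at least one of `meioses` independent meioses is
    recombinant between a marker and the gene (recombination fraction theta)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie between 0 and 0.5")
    if meioses < 0:
        raise ValueError("meioses must be non-negative")
    return 1.0 - (1.0 - theta) ** meioses


if __name__ == "__main__":
    # Illustrative theta of 0.05 across one to four meioses.
    for n in (1, 2, 3, 4):
        print(n, round(prob_at_least_one_recombinant(0.05, n), 4))
```

Whether a given recombination actually produces a wrong assignment depends on which meioses a prediction relies on, so this figure is a rough guide rather than a full pedigree likelihood calculation.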

 

Polymorphic DNA Markers

We have seen how we can use protein variants to track a gene within a family but more commonly we use DNA markers.

The aim of linkage analysis is to identify a DNA marker that co-segregates with the gene of interest and so can be used to track the gene within a family without actually knowing the mutation. The markers that we now commonly use to track a gene within a family are known as polymorphic markers or polymorphisms. The main types of polymorphism are:

A. Single Nucleotide Polymorphisms [SNPs] - pronounced 'SNIPS'

B. Short Tandem Repeats [STRs] or Variable Number Tandem Repeats [VNTRs]

A. Single Nucleotide Polymorphisms [SNPs]: these are single nucleotide changes that usually, although not always, cause no alteration in the amino acid sequence of the protein of interest. Polymorphisms are located throughout the human genome and can be found either within a gene (so-called intragenic polymorphisms, usually within the introns of a gene or in the immediate 5' and 3' untranslated regions [upstream or downstream of the coding sequence of a gene]) or closely linked to a gene (so-called extragenic markers). The further a marker is from the gene of interest, the greater the chance that recombination will occur between them during meiosis.

Historically, SNPs were often designated by the restriction endonuclease (restriction enzyme) used to digest the DNA prior to gel electrophoresis and Southern blotting. For example, within the F8 gene the enzyme Bcl I identifies an intragenic polymorphism located within intron 18, which gives rise to fragments of 0.8 kb and 1.1 kb when the digested DNA is resolved on an agarose gel, blotted onto a nylon membrane and then probed with a labelled DNA fragment that binds to the DNA sequence of interest.

Similarly, the enzyme Bgl I identifies a SNP located within intron 25 of the F8 gene and gives rise to fragments of 5 kb and 20 kb.

The enzyme Bgl II identifies a SNP located close to, but not within, the F8 gene and gives rise to fragments of 5.8 kb and 2.8 kb.

The common feature is that, when DNA is digested with a restriction endonuclease such as Bgl II, Bcl I or Bgl I, fragments of different lengths are generated depending on whether the polymorphic restriction site is present. The polymorphisms giving rise to these differing fragments are known as Restriction Fragment Length Polymorphisms or RFLPs.
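The principle of an RFLP can be sketched in Python: cut a sequence at every occurrence of a restriction site and report the fragment lengths. The example assumes the Bcl I recognition site T^GATCA (cut after the first base) and uses short, made-up sequences rather than the real F8 intron 18 sequence; because TGATCA is palindromic, scanning one strand is enough here. The function name is hypothetical.

```python
def digest(sequence: str, site: str, cut_offset: int) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of `site`.

    cut_offset is the position within the site at which the enzyme cuts
    the top strand (1 for Bcl I, T^GATCA). Only the given strand is
    scanned, which is sufficient for palindromic sites.
    """
    seq, site = sequence.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)  # continue searching after this site
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    flank = "A" * 10                                # hypothetical flanking DNA
    allele_site_present = flank + "TGATCA" + flank  # Bcl I site intact
    allele_site_absent = flank + "TGGTCA" + flank   # single-base change abolishes the site
    print(digest(allele_site_present, "TGATCA", 1))
    print(digest(allele_site_absent, "TGATCA", 1))
```

On a Southern blot only fragments that hybridise to the probe are seen, so the observed band sizes depend on where the probe binds as well as on the positions of the sites.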

Southern blotting is rarely performed today, and most SNPs are detected by PCR followed by either sequence analysis or resolution of the DNA fragments by gel electrophoresis.

 

B. Short Tandem Repeats [STRs]: STRs are useful DNA markers as they are highly polymorphic and inherited in a strictly Mendelian fashion.

Regions of repetitive DNA in which the repeating unit is very small, usually 1-6 nucleotides, occur throughout the genome. These are generally polymorphic within a population and are used for monitoring bone marrow transplant engraftment, forensics, identity testing, paternity testing etc. Common STRs include dinucleotide repeats such as [CA], in which the unit occurs multiple times in tandem, e.g. [CACACACACA], and which are therefore notated as [CA]n. Other STRs include trinucleotide repeats, e.g. [ATT]n, and tetranucleotide repeats, e.g. [GATA]n. STRs are widely used in genetic linkage studies because, with many alleles in the population, an individual is more likely to be heterozygous (and therefore informative) for an STR than for a two-allele SNP. Although the number of repeats can change from one generation to the next, such changes are rare.
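Repeat notation such as [CA]n can be made concrete with a small Python sketch that counts the longest uninterrupted run of a motif in a sequence, plus a helper that infers the repeat number from a PCR product size. The sequences, the flanking length and the function names are hypothetical; in practice the flanking length is set by where the primers sit.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Largest number of consecutive, uninterrupted copies of `motif`."""
    seq, motif = sequence.upper(), motif.upper()
    best = 0
    for start in range(len(seq)):
        count, pos = 0, start
        while seq.startswith(motif, pos):  # step through adjacent copies
            count += 1
            pos += len(motif)
        best = max(best, count)
    return best


def repeats_from_amplicon(product_bp: int, flanking_bp: int, unit_bp: int) -> int:
    """Repeat number implied by a PCR product size, given the total length of
    non-repeat sequence in the product (primers plus flanking DNA)."""
    repeat_bp = product_bp - flanking_bp
    if repeat_bp < 0 or repeat_bp % unit_bp != 0:
        raise ValueError("product size is not consistent with whole repeat units")
    return repeat_bp // unit_bp


if __name__ == "__main__":
    print(longest_tandem_run("TTCACACACACAGG", "CA"))      # [CA]5
    print(longest_tandem_run("CCGATAGATAGATATT", "GATA"))  # [GATA]3
    print(repeats_from_amplicon(134, 100, 2))              # hypothetical amplicon: 17 repeats
```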


The pedigrees below illustrate linkage analysis using two hypothetical polymorphisms:

Pedigree 1a: Linkage analysis using a SNP.

In the pedigree above we have undertaken linkage analysis using a SNP that has two possible alleles, A or B. As we are looking in this case at the F8 gene, males are hemizygous, that is, they have only a single X chromosome and so can have only a single SNP allele [A or B], whilst females possess two X chromosomes and so can have three possible combinations: homozygous AA, homozygous BB or heterozygous AB.
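The difference between hemizygous males and females for an X-linked marker can be shown by enumerating the possible genotypes. This Python sketch (the function name is hypothetical) lists one-allele genotypes for males and unordered two-allele genotypes for females.

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles: list[str], sex: str) -> list[tuple[str, ...]]:
    """Possible genotypes at an X-linked marker with the given alleles."""
    if sex == "male":
        # Hemizygous: a single X chromosome, so a single allele.
        return [(a,) for a in alleles]
    if sex == "female":
        # Two X chromosomes: unordered pairs, including homozygotes.
        return list(combinations_with_replacement(alleles, 2))
    raise ValueError("sex must be 'male' or 'female'")


if __name__ == "__main__":
    print(possible_genotypes(["A", "B"], "male"))
    print(possible_genotypes(["A", "B"], "female"))
    # With k alleles, females have k * (k + 1) / 2 possible genotypes.
    repeat_alleles = [str(n) for n in range(5, 19)]  # hypothetical marker with 14 alleles
    print(len(possible_genotypes(repeat_alleles, "female")))
```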

In this pedigree with severe haemophilia A, we can see that the abnormal F8 gene is marked by the A allele of our SNP. The affected males in the family, shown as solid squares, all have the 'A' allele of our SNP, whereas the unaffected male III:2 has the 'B' allele.

II:2 must carry both the A and B alleles, i.e. she is heterozygous for the polymorphism, since she has two sons with differing genotypes.

III:3 must have inherited the A allele from her father [he has only a single X chromosome], and she has inherited the A allele from her obligate carrier mother II:2, so III:3 must be a carrier; indeed this is confirmed by the finding that she has a son, IV:3, with severe haemophilia A. However, we could not use this polymorphism for pre-natal diagnosis in III:3, as she is homozygous AA and so we would be unable to establish which of her two A alleles tracks with the abnormal F8 gene.

In the cases of IV:1 and IV:2, both must have inherited a B allele from their father, but again it is not clear which A allele they have inherited from their mother [III:3], i.e. the one that tracks with the abnormal F8 gene or the other. This SNP cannot, therefore, be used to establish the carrier status of IV:1 or IV:2.
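The reasoning used for III:3, IV:1 and IV:2 can be written as a small decision procedure. This is a minimal sketch with hypothetical function names: it assumes the mother's phase is already known (which of her alleles lies on the X chromosome carrying the F8 mutation) and that paternity is correct, and it ignores recombination, so a "likely" result still carries a small error.

```python
def maternal_allele(daughter: tuple[str, str], father_allele: str) -> str:
    """Allele a daughter received from her mother, given the father's single X allele."""
    alleles = list(daughter)
    if father_allele not in alleles:
        # A daughter must carry her father's X allele; otherwise suspect
        # non-paternity or a genotyping error.
        raise ValueError("daughter lacks the father's allele")
    alleles.remove(father_allele)
    return alleles[0]


def carrier_status(mother: tuple[str, str], linked_allele: str,
                   father_allele: str, daughter: tuple[str, str]) -> str:
    """Classify a daughter using one marker, assuming the mother's phase is known."""
    if linked_allele not in mother:
        raise ValueError("the linked allele must be one of the mother's alleles")
    if mother[0] == mother[1]:
        # Homozygous mother: both of her X chromosomes look the same at this marker.
        return "uninformative"
    inherited = maternal_allele(daughter, father_allele)
    if inherited not in mother:
        raise ValueError("daughter's maternal allele is not found in the mother")
    return "likely carrier" if inherited == linked_allele else "likely non-carrier"


if __name__ == "__main__":
    # Pedigree 1a: II:2 is A/B with A linked to the mutation; III:3 is A/A
    # and her father carries A.
    print(carrier_status(("A", "B"), "A", "A", ("A", "A")))
    # IV:1 or IV:2: mother III:3 is A/A, father carries B, daughter is A/B.
    print(carrier_status(("A", "A"), "A", "B", ("A", "B")))
```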

 

Pedigree 1b: Linkage analysis using a VNTR

In this pedigree, which is the same as pedigree 1a, we have used an STR, in this case the repeat sequence [GT]n located within intron 1 of the F8 gene. Again, males can have only a single allele of this marker, but females can have various combinations depending upon the number of repeats on each of their X chromosomes.

We can see that the abnormal F8 gene is marked by the 17 repeat allele of the [GT]n VNTR [i.e. there are 17 GT repeats within intron 1 of the F8 gene in III:1, IV:3 and II:3], whereas the unaffected male III:2 has the 15 repeat allele. II:2 must therefore carry both the 15 and 17 repeat alleles, since she has two sons with differing genotypes.

III:3 must have inherited the 20 repeat allele from her father [he has only a single X chromosome], and she has inherited the 17 repeat allele from her obligate carrier mother II:2, so III:3 must be a carrier; indeed this is confirmed by the finding that she has a son, IV:3, with severe haemophilia A. In contrast to pedigree 1a, we can now use this polymorphism for pre-natal diagnosis in III:3, as she is heterozygous 17/20 and we know that the 17 repeat allele tracks with the abnormal F8 gene.

In the cases of IV:1 and IV:2, both must have inherited an 18 repeat allele from their father, but now we can see that IV:1 has inherited the 20 repeat allele from her mother and so is not a carrier of severe haemophilia A, whereas IV:2 has inherited the 17 repeat allele and so is a carrier. Furthermore, we can use this [GT]n repeat for pre-natal diagnosis in IV:2.

This pedigree highlights the value of VNTRs in both carrier detection and pre-natal diagnosis. Because the number of repeats varies between individuals, a female is more likely to be heterozygous for a VNTR than for a SNP.
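The advantage of multi-allelic markers can be quantified. This Python sketch estimates allele frequencies by simple allele counting and then computes the expected heterozygosity H = 1 - sum(p_i^2), which assumes Hardy-Weinberg proportions. The genotypes and frequencies are hypothetical and the function names are illustrative; for an X-linked marker the figure applies to females only, since males are hemizygous.

```python
from collections import Counter


def allele_frequencies(genotypes: list[tuple[int, ...]]) -> dict[int, float]:
    """Allele frequencies by counting every allele in every genotype."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in sorted(counts.items())}


def expected_heterozygosity(freqs: list[float]) -> float:
    """Chance that a female is heterozygous, assuming Hardy-Weinberg
    proportions: H = 1 - sum(p_i ** 2)."""
    if abs(sum(freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


if __name__ == "__main__":
    # Two-allele SNP at its most informative (equal frequencies).
    print(expected_heterozygosity([0.5, 0.5]))
    # Hypothetical repeat marker with five equally common alleles.
    print(round(expected_heterozygosity([0.2] * 5), 3))
    # Frequencies from a small hypothetical sample of female genotypes (repeat numbers).
    freqs = allele_frequencies([(5, 8), (8, 8), (5, 12), (8, 12)])
    print(freqs, expected_heterozygosity(list(freqs.values())))
```

A two-allele SNP can never exceed 50% heterozygosity, whereas markers with several common alleles readily exceed it; the 76% to 87% heterozygosity quoted below for the antithrombin [ATT]n repeat is an example.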

 

The following illustrations show a variety of SNPs and VNTRs.

1. Electropherogram showing an [ATT]n repeat sequence located within intron 5 of the human antithrombin gene at chromosome 1q23. Alleles range in size from [ATT]5 to [ATT]18, and this locus shows a heterozygosity of 76% to 87%.

The sequencing gel below shows the antithrombin [ATT]n repeat sequence but instead of displaying an electropherogram - the bases are displayed as bands on an autoradiograph.

To establish the frequencies of the various alleles of the antithrombin gene [ATT]n repeat, primers were designed to amplify the repeat, a series of DNA samples were amplified and the products were run on an agarose gel. The allelic frequencies are summarised in the table below:

Comments

Linkage analysis requires:

1. The DNA of an affected male, so that the allele that tracks with the abnormal gene can be established. In some cases it may be possible to infer which polymorphic allele tracks with the abnormal gene if sufficient family members are available - see above.

2. Correct paternity. Linkage analysis rests on the fundamental assumption that paternity is as stated, i.e. correct. The rate of non-paternity is commonly quoted as around 10%, but the true value varies according to the population studied.

3. Linkage analysis can be combined with the results of factor assays, and Bayesian risk analysis can then be undertaken to establish the risk that a particular female is or is not a carrier of haemophilia or another inherited coagulopathy.

4. Linkage analysis has in many cases been replaced by direct mutation analysis. However, in approximately 3% of cases of severe haemophilia A no mutation can be identified within the F8 gene, and in these families linkage analysis remains a valuable method for establishing carrier status. This relies on the fundamental assumption that the cause of the haemophilia A in these families lies within the F8 gene, which is what justifies the use of polymorphisms in and linked to the F8 gene. The approach is clearly inappropriate if the cause of the disorder lies on another part of the X chromosome or on another chromosome.

5. In families who are non-informative for all the intragenic polymorphisms, i.e. we cannot find a polymorphism for which a particular female is heterozygous, we may have to use linked extragenic markers. In these cases, because of the risk of recombination, it is unwise to rely upon the results of a single linked marker; a number of linked markers should be used to confirm the findings, taking into account any additional information that may be available from phenotypic assays.

6. The allelic frequencies of some of these polymorphisms vary between ethnic populations.
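The advice in point 5 to use several linked markers can be illustrated for the common case of two markers flanking the gene. Assuming recombination in the two intervals is independent (no interference) and using illustrative recombination fractions, the markers either agree and are correct, disagree (a single crossover, so the result is uninformative), or agree but are both wrong, which requires a double crossover. The function name is hypothetical.

```python
def flanking_marker_outcomes(theta_left: float, theta_right: float) -> dict[str, float]:
    """Outcome probabilities for one meiosis when the gene lies between two
    markers, assuming independent recombination in each interval."""
    for theta in (theta_left, theta_right):
        if not 0.0 <= theta <= 0.5:
            raise ValueError("recombination fractions must lie between 0 and 0.5")
    return {
        # No crossover in either interval: both markers track the gene.
        "concordant_correct": (1 - theta_left) * (1 - theta_right),
        # Crossover in exactly one interval: markers disagree, result uninformative.
        "discordant": theta_left * (1 - theta_right) + theta_right * (1 - theta_left),
        # Crossover in both intervals: markers agree but both point the wrong way.
        "concordant_wrong": theta_left * theta_right,
    }


if __name__ == "__main__":
    outcomes = flanking_marker_outcomes(0.05, 0.05)  # illustrative values only
    for outcome, p in outcomes.items():
        print(outcome, round(p, 4))
```

Compared with an error rate of 0.05 for a single marker at θ = 0.05, the chance of a wrong but apparently consistent result falls to about 0.0025, at the cost of some meioses being uninformative.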

 

Useful Links & References

1. Bennett, R.L., et al., Recommendations for standardized human pedigree nomenclature. Pedigree Standardization Task Force of the National Society of Genetic Counselors. Am J Hum Genet, 1995. 56(3): p. 745-52.

2. Bernardi, F., et al., RFLP analysis in families with sporadic hemophilia A. Estimate of the mutation ratio in male and female gametes. Hum Genet, 1987. 76(3): p. 253-6.

3. Bowen, D.J., Haemophilia A and haemophilia B: molecular insights. Mol Pathol, 2002. 55(2): p. 127-44.

4. Brocker-Vriends, A.H., et al., Sex ratio of the mutation frequencies in haemophilia A: coagulation assays and RFLP analysis. J Med Genet, 1991. 28(10): p. 672-80.

5. Edgell, C.J., et al., Prenatal diagnosis by linkage: hemophilia A and polymorphic glucose-6-phosphate dehydrogenase. Am J Hum Genet, 1978. 30(1): p. 80-4.

6. Fischer, C., et al., Modelling germline mosaicism and different new mutation rates simultaneously for appropriate risk calculations in families with Duchenne muscular dystrophy. Ann Hum Genet, 2006. 70(Pt 2): p. 237-48.

7. Gitschier, J., et al., Antenatal diagnosis and carrier detection of haemophilia A using factor VIII gene probe. Lancet, 1985. 1(8437): p. 1093-4.

8. He, M. and W. Li, PediDraw: a web-based tool for drawing a pedigree in genetic counseling. BMC Med Genet, 2007. 8: p. 31.

9. Jayandharan, G., et al., Informativeness of linkage analysis for genetic diagnosis of haemophilia A in India. Haemophilia, 2004. 10(5): p. 553-9.

10. Ljung, R.C. and E. Sjorin, Origin of mutation in sporadic cases of haemophilia A. Br J Haematol, 1999. 106(4): p. 870-4.

11. Mitchell, M., S. Keeney, and A. Goodeve, The molecular analysis of haemophilia B: a guideline from the UK haemophilia centre doctors' organization haemophilia genetics laboratory network. Haemophilia, 2005. 11(4): p. 398-404.

12. Ogino, S., et al., Standard mutation nomenclature in molecular diagnostics: practical and educational challenges. J Mol Diagn, 2007. 9(1): p. 1-6.

13. Peyvandi, F., et al., Genetic diagnosis of haemophilia and other inherited bleeding disorders. Haemophilia, 2006. 12 Suppl 3: p. 82-9.

14. Pruthi, R.K., Hemophilia: a practical approach to genetic testing. Mayo Clin Proc, 2005. 80(11): p. 1485-99.

15. Pruthi, R.K., A practical approach to genetic testing for von Willebrand disease. Mayo Clin Proc, 2006. 81(5): p. 679-91.

16. Rosendaal, F.R., et al., Sex ratio of the mutation frequencies in haemophilia A: estimation and meta-analysis. Hum Genet, 1990. 86(2): p. 139-46.

17. Steinhaus, K.A., et al., Inconsistencies in pedigree symbols in human genetics publications: a need for standardization. Am J Med Genet, 1995. 56(3): p. 291-5.

18. Tuddenham, E.G., et al., Haemophilia A: carrier detection and prenatal diagnosis by linkage analysis using DNA polymorphism. J Clin Pathol, 1987. 40(9): p. 971-7.

19. Winter, R.M., et al., A maximum likelihood estimate of the sex ratio of mutation rates in haemophilia A. Hum Genet, 1983. 64(2): p. 156-9.

20. El-Maarri, O., et al., Analysis of mRNA in hemophilia A patients with undetectable mutations reveals normal splicing in the factor VIII gene. J Thromb Haemost, 2005. 3(2): p. 332-9.

21. Graw, J., et al., Haemophilia A: from mutation analysis to new therapies. Nat Rev Genet, 2005. 6(6): p. 488-501.

 
