Probabilities for heterozygote genetic markers in hybrids

by Ketil Malde; March 10, 2016

Nature is nothing if not flexible, and when the environment changes, plants and animals will try their best to adapt to new conditions. Migration can be one such adaptation, and one which sometimes brings previously separate populations into contact with each other. For example, we have recently seen Antarctic minke whales migrating all the way to the North Atlantic to breed with the whales native to that region. We don't know precisely what drives this, but one likely candidate is a change in the ecosystem, perhaps caused by global warming.

In order to monitor this migration, we would like to use genetic markers to identify migrants and their offspring. Ideally, we want markers that are fully diagnostic, that is, they always give one value (or allele) in one population, and always a different value in the other. In practice, even for quite good markers there are often some occurrences of the foreign allele, and even if a marker appears fully diagnostic, we can't say for sure, since our testing will always be limited by our sampling. Usually, the best we can do is to quantify the allele frequencies as confidence intervals.

In order to classify hybrids, we need to know the probability for the different combinations of alleles (genotypes) in the various types of hybrids. To simplify the analysis, we limit our interest to the case where a migrant enters a native population and interbreeds with it, and where the offspring (referred to as the F1 generation) continues to interbreed with the native population, resulting in new generations (F2, F3, and so on) of back-crossed hybrids.

Assuming fully diagnostic markers

As mentioned, a marker is fully diagnostic if no allele occurs in both populations. If we restrict analysis to single nucleotide polymorphisms (SNPs), where there are two possible alleles per marker, all non-hybrid individuals are homozygote.

An F1 hybrid by necessity inherits one allele from each population, and thus it is always heterozygote. An F2 back-cross inherits one allele from the native population, and one from the F1 hybrid. The probability of heterozygosity is therefore the probability of inheriting the foreign allele (a) from the hybrid, i.e. 0.5. Similarly, the probability of homozygosity is the probability of inheriting the native allele (A), also 0.5. In general, the probability of retaining the foreign allele is halved for each subsequent back-cross.

If we label the native allele A, and the foreign allele a, we can list the probabilities for the different genotypes, as seen in Table diagnostic below.

Table diagnostic: Genotype probabilities with fully diagnostic markers for increasing generations of back-crossed hybrids.
Generation	P(AA)	P(Aa)	P(aa)
migrant	0	0	1
native	1	0	0
F1	0	1	0
F2	0.5	0.5	0
F3	0.75	0.25	0
:	:	:	:
Fn	1 − 2^{1−n}	2^{1−n}	0
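
As a small sketch of the halving rule in the table above, the rows can be generated with exact fractions. The function name `diagnostic_genotype_probs` is made up for this illustration, and it only covers the hybrid generations F1, F2, and so on, not the migrant and native rows.

```python
from fractions import Fraction

def diagnostic_genotype_probs(n):
    """Return (P(AA), P(Aa), P(aa)) for back-cross generation Fn, n >= 1."""
    if n < 1:
        raise ValueError("n must be at least 1 (F1 is the first hybrid generation)")
    # 2^(1-n): probability that the foreign-origin allele has survived n-1 back-crosses
    p_het = Fraction(1, 2 ** (n - 1))
    return (1 - p_het, p_het, Fraction(0))

for n in range(1, 5):
    print(f"F{n}", *diagnostic_genotype_probs(n))
```

Using `Fraction` rather than floats keeps the values exact, so F3 comes out as 3/4 and 1/4 rather than rounded decimals.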

Arbitrary allele frequencies

Although fully diagnostic markers are the ideal case, in practice the foreign allele often occurs in the native population, and vice versa. In any case, with limited testing we cannot ascertain that the markers are fully diagnostic; at best, we can give a confidence interval for the minor allele frequency.
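
To give a feeling for how much limited sampling can tell us, here is a rough sketch of one common way to bound the frequency of an allele that was never observed. It is not necessarily the method used for the minke data. It assumes that the 2N allele copies from N diploid individuals are independent draws from the population, and the name `maf_upper_bound` and the sample size in the example are made up for illustration.

```python
def maf_upper_bound(n_individuals, confidence=0.95):
    """One-sided upper confidence bound on the frequency of an allele that was
    never observed among n_individuals diploid samples (two copies each).

    ASSUMPTION: allele copies are independent draws from the population.
    """
    copies = 2 * n_individuals
    alpha = 1.0 - confidence
    # The chance of seeing zero copies is (1 - q)^copies; solve (1 - q)^copies = alpha
    # for q to get the largest frequency still compatible with the observation.
    return 1.0 - alpha ** (1.0 / copies)

# Hypothetical example: 200 individuals, none carrying the foreign allele.
print(maf_upper_bound(200))
```

With 200 individuals the bound is roughly 0.7%, so even a marker that looks fully diagnostic in such a sample could have a minor allele frequency of that order.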

To address this we let A and a no longer represent the actual allele values, but instead the allele origin. In other words, A means an allele inherited from the native population, and a represents an allele with foreign origin, inherited from its migrant ancestor. We see that we can then use Table diagnostic to determine the probability of a back-cross having two alleles from the native population, or retaining one allele from its migrant forebear.

Table freqs: Definition of allele frequencies in the two populations.
Population	B	b
native	p_n	q_n
foreign	q_f	p_f

Now, we turn to the actual allele values. Let the alleles be labelled B and b, and allele frequencies defined as in Table freqs.

Table probs: The probability of the two cases of allele heritage, and the associated probabilities for the possible genotypes.
Case	Probability	Genotype BB	Genotype Bb	Genotype bb
Two native alleles	1 − 2^{1−n}	p_n²	2 p_n q_n	q_n²
One native, one foreign	2^{1−n}	p_n q_f	p_n p_f + q_n q_f	q_n p_f

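Table probs combines the allele frequencies with Table diagnostic to calculate the probability of the different genotypes. Weighting the Bb column by the probability of each case, we see that the probability of a heterozygote (genotype Bb) in an Fn hybrid is:

$$P(Bb) = (1 - 2^{1-n}) \, 2 p_n q_n + 2^{1-n} (p_n p_f + q_n q_f) \tag{1}$$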
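
The weighting of the two heritage cases can be sketched in a few lines of Python. The function name `hybrid_genotype_probs` and its argument order are made up for this illustration; the arithmetic follows the rows of Table probs directly.

```python
def hybrid_genotype_probs(n, p_n, p_f):
    """Return (P(BB), P(Bb), P(bb)) for an Fn back-cross, n >= 1.

    p_n: frequency of allele B in the native population (q_n = 1 - p_n)
    p_f: frequency of allele b in the foreign population (q_f = 1 - p_f)
    """
    q_n, q_f = 1.0 - p_n, 1.0 - p_f
    w_mixed = 2.0 ** (1 - n)      # one native-origin and one foreign-origin allele
    w_native = 1.0 - w_mixed      # two native-origin alleles
    prob_BB = w_native * p_n ** 2 + w_mixed * p_n * q_f
    prob_Bb = w_native * 2 * p_n * q_n + w_mixed * (p_n * p_f + q_n * q_f)
    prob_bb = w_native * q_n ** 2 + w_mixed * q_n * p_f
    return prob_BB, prob_Bb, prob_bb

# Example: F3 back-cross with a minor allele frequency of 0.1 in both populations.
print(hybrid_genotype_probs(3, 0.9, 0.9))
```

The three values always sum to one, which is a handy sanity check when plugging in estimated frequencies.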

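Now assume that the two populations have the same minor allele frequency, that is, p_n = p_f = p and q_n = q_f = q = 1 − p, so that p_n p_f + q_n q_f = p² + q². From the relationship p = 1 − q, it follows that p² + q² = p² + (1 − p)² = 2p² − 2p + 1 = 1 − 2pq, and we get:

$$P(Bb) = (1 - 2^{1-n}) \, 2pq + 2^{1-n} (1 - 2pq) = 2pq + 2^{1-n} (1 - 4pq)$$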

We observe here that if p = 1, the probability of a heterozygote is 2^{1−n}, as in Table diagnostic, and as n increases, the heterozygote probability converges to 2pq, the heterozygote probability in the native population. The table below gives the heterozygote probabilities for back-cross generations F1 to F10 under various minor allele frequencies.

Heterozygote probabilities by back-cross generation n and minor allele frequency (MAF) q, assuming the same MAF in both populations.
Gen	q=0.1	q=0.05	q=0.025	q=0.01	q=0
1	0.820	0.905	0.951	0.980	1.000
2	0.500	0.500	0.500	0.500	0.500
3	0.340	0.298	0.274	0.260	0.250
4	0.260	0.196	0.162	0.140	0.125
5	0.220	0.146	0.105	0.080	0.063
6	0.200	0.120	0.077	0.050	0.031
7	0.190	0.108	0.063	0.035	0.016
8	0.185	0.101	0.056	0.027	0.008
9	0.183	0.098	0.052	0.024	0.004
10	0.181	0.097	0.051	0.022	0.002
native	0.180	0.095	0.049	0.020	0.000
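
For readers who want to try other frequencies, the table above can be recomputed from the equal-MAF expression 2pq + 2^{1−n}(1 − 4pq). This is only a sketch, and the function name `het_prob_equal_maf` is made up for this illustration. The last digit may differ from the table where a value falls exactly on a rounding boundary.

```python
def het_prob_equal_maf(n, q):
    """Heterozygote probability in an Fn back-cross when both populations
    share the same minor allele frequency q (and p = 1 - q)."""
    p = 1.0 - q
    return 2 * p * q + 2.0 ** (1 - n) * (1 - 4 * p * q)

mafs = [0.1, 0.05, 0.025, 0.01, 0.0]
print("Gen\t" + "\t".join(f"q={q}" for q in mafs))
for n in range(1, 11):
    print(f"{n}\t" + "\t".join(f"{het_prob_equal_maf(n, q):.3f}" for q in mafs))
# Limit for large n: the ordinary heterozygote frequency 2pq of the native population.
print("native\t" + "\t".join(f"{2 * (1 - q) * q:.3f}" for q in mafs))
```

Note that the F2 row is 0.5 for every q: with n = 2 the expression reduces to 2pq + (1 − 4pq)/2 = 1/2.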

When q is small, we can ignore q², so that pq = q(1 − q) ≈ q, and make the following approximation:

$$P(Bb) \approx 2q + 2^{1-n} (1 - 4q)$$

Fixed markers in the native population

For the minke whale, we tested fifty markers on about 400 specimens. There are some cases of non-zero minor allele frequency in the Antarctic population, but the common minke population (Atlantic and Pacific) appears to be entirely homozygote for these markers. One possible explanation may be that the larger Antarctic population allows the maintenance of a wider genetic diversity. Since we are primarily interested in the introgression of (foreign) Antarctic minke into (native) common minke populations, we might assume that p_n = 1 and q_n = 0. In that case, the term for two native alleles in (1) vanishes, and the probability of a heterozygote becomes:

$$P(Bb) = 2^{1-n} p_f$$
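
To see what this means for a whole marker panel, here is a rough sketch that computes the expected number of heterozygous markers in an Fn back-cross, and the chance of seeing none at all. It assumes that the markers are inherited independently (unlinked), which is my simplification and not something established above. The function names and the foreign allele frequency of 0.95 are made up for illustration and are not the measured minke values.

```python
import math

def het_prob_fixed_native(n, p_f):
    """Heterozygote probability in Fn when the native population is fixed for B."""
    return 2.0 ** (1 - n) * p_f

def prob_no_heterozygotes(n, foreign_freqs):
    """Chance that an Fn back-cross is homozygous at every marker.

    ASSUMPTION: markers are inherited independently of each other.
    """
    return math.prod(1.0 - het_prob_fixed_native(n, p_f) for p_f in foreign_freqs)

# Hypothetical panel: fifty markers, each with p_f = 0.95 in the foreign population.
panel = [0.95] * 50
for n in (2, 4, 6, 8):
    expected = sum(het_prob_fixed_native(n, p_f) for p_f in panel)
    print(f"F{n}: expected heterozygous markers = {expected:.2f}, "
          f"P(none) = {prob_no_heterozygotes(n, panel):.3g}")
```

Under these assumptions, the expected count halves with each generation, so a panel of this size quickly loses the power to distinguish late back-crosses from natives.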

Previous literature

As usual, I didn't check the literature too closely before putting pen to paper. But Anderson and Thompson give an overview of various methods, and are probably a good starting point. Many of the methods mentioned depend on fully diagnostic markers, and many apply only to a limited number of generations. Some methods attempt to identify hybrids without known allele frequencies in the native populations, whereas our analysis above implicitly requires these frequencies to be known. We have also used more-or-less standard classification methods (e.g., programs like Structure and Geneclass), but as I understand it, these only look at allele frequencies, and don't take into account the special distribution of genotypes (i.e., the increased number of heterozygotes) that is particular to hybrids.

Acknowledgments

Thanks to Hans J. Skaug for helping out with the math. I am still to blame for any remaining errors, of course; if you find any, I would appreciate being made aware of them.


Feedback? Please email ketil@malde.org.