Use of Single Nucleotide Polymorphisms for Whole Genome Selection in Cattle

Author Information

G.E. Seidel, Jr.
Animal Reproduction and Biotechnology Laboratory
Colorado State University


Genomic selection using SNPs (single nucleotide polymorphisms) is a powerful new tool for genetic selection. Current SNP profiles for individual animals are generated using a small plastic chip that is diagnostic for up to 50,000 SNPs spaced throughout the bovine genome. Phenotypes, usually averaged over offspring of bulls, are matched with SNP profiles of bulls mathematically so that animals can be ranked for siring desirable phenotypes through their SNP profiles. For improving many traits in dairy cattle, the rate of genetic improvement can be nearly doubled when SNP information is used in addition to the current methods of genetic evaluation. Separate SNP analyses need to be developed for different populations; for example, the system for Holsteins is not useful for Jerseys. Also, the value of these systems is very dependent on the number of accurate phenotypes matched with SNP profiles; increasing the number of North American Holstein bulls evaluated from 1,151 to 3,576 quadrupled the additional genetic gain in net merit from this approach. Thus, available information will be insufficient to exploit this technology fully for most populations. However, once a valid SNP evaluation system is developed, any animal in that population, including embryos, can be evaluated with similar accuracy. Biopsying embryos and screening them through SNP analysis will greatly enhance the value of this technology by minimizing generation intervals.

Brief Summary of Some Genetic Principles

Phenotype and Genotype

Phenotype, what organisms look like and how they perform, is determined in several ways: entirely by genetic makeup for some traits (e.g., sex, hair color) or mostly or entirely by environment for other traits (e.g., death due to lightning or becoming infected with certain viruses), but, for most traits, by the combination of genetics and environment, often with an interaction. An example of an interaction is that dairy cows selected genetically for high milk production may express that genetic advantage differently when fed intensively than when fed extensively. Epigenetic effects also can affect phenotype (Bromfield et al. 2008) but will not be covered here. Although cattle examples will be used, the broad principles apply to most mammals.

The focus of animal breeding is to manipulate genetics, mostly by selective breeding, to obtain desired phenotypes. The genotype of an animal is fixed at fertilization, when a haploid sperm containing about 2.8 billion base pairs of DNA in the case of cattle (about 4% fewer for a Y sperm than an X sperm) fertilizes a haploid oocyte, which after extrusion of the second polar body, also contains about 2.8 billion base pairs of DNA (Elsik et al. 2009). The resulting zygote duplicates this DNA and divides to produce a two-cell embryo, so each blastomere contains about 5.6 billion base pairs of DNA. As the cells of the embryo continue to duplicate DNA and divide, the resulting adult animal will have about 50 trillion somatic cells, each (with a few exceptions) containing the same 5.6 billion base pairs of DNA (11.2 billion if duplicated in preparation for cell division) that were present in the zygote.

Genes and Alleles

From genetic principles, genes are units or lengths of DNA that contain two kinds of information: 1) specification of the amino acid makeup of proteins through making mRNA, and 2) regulation of when, where, and how much of that specific RNA is made. For example, the gene specifying the RNA for the amino acid sequence of the milk protein casein has a regulatory part that causes that RNA to be made only in the mammary gland and only when lactation is physiologically appropriate. There are around 22,000 such genes in cattle, specifying proteins ranging from hemoglobin to follicle-stimulating hormone (FSH) (The Bovine Genome Sequence and Analysis Consortium, 2009). Thousands of additional genes produce RNA that is not translated into proteins. Some of these are structural RNAs (e.g., for ribosomes), but most are small regulatory RNAs that interact with the regulatory regions of protein-specifying genes so that they are turned on to produce the right amount of RNA at the right time in the right tissues. Thus, skin cells do not make FSH, and pituitary cells do not make skin proteins, partly due to regulatory RNAs.

A final elementary concept is alleles, which are alternate forms of a gene. Genes inherited from one parent often differ in small but important ways from those inherited from the other parent (Fig. 1). These differences are the basis of genetic variation, and the differing forms are termed alleles. Familiar examples are the alleles that determine coat color and whether an animal is horned or polled, with sex being a special case. For most of the 22,000 proteins (and their variants due to alternative splicing), these differences are less dramatic. For example, hundreds of genes affect growth, one of which is the growth hormone gene, which comes in different forms due to alleles, primarily due to differences in the regulatory parts of the gene. Thus, some animals produce more growth hormone than others, affecting not only growth but also such traits as milk production.


Figure 1. Illustration of how a SNP marks an allelic difference between two chromosomes, which could be considered homologous (one from each parent within an individual, or chromosomes from two individuals).

Note that the base pair sequence is identical for the top and bottom chromosomes except for the SNP marker and allele. This is simplified in various ways (e.g., directionality of the DNA is not specified).

Genetic selection for phenotypic traits is nothing more than choosing different combinations of alleles. For example, an individual animal will have three possibilities for the allelic composition of the regulatory regions of the growth hormone gene: high from father and high from mother; low from father and low from mother; or high from one parent and low from the other. For a number of genes, there are more than two alleles present in the population of animals, but in an individual animal, only two alleles are possible.
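The counting in the paragraph above can be sketched in a few lines of Python. This is only an illustration: the function name possible_genotypes and the allele labels ("high", "low", "A1", "A2", "A3") are hypothetical and are not taken from any genetic evaluation software. It lists every unordered pair of alleles an individual could carry, which gives three genotypes for two alleles and six for three alleles.

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles):
    """Return every unordered pair of alleles one animal could carry.

    An individual receives one allele from each parent, so a genotype is a
    pair of alleles; the order (sire vs. dam) is ignored in this sketch.
    """
    return list(combinations_with_replacement(alleles, 2))


# Hypothetical two-allele case from the text: regulatory region of the
# growth hormone gene with a "high" and a "low" allele.
two_allele_genotypes = possible_genotypes(["high", "low"])

# Hypothetical three-allele case: more alleles exist in the population,
# but each animal still carries only two of them.
three_allele_genotypes = possible_genotypes(["A1", "A2", "A3"])

for genotype in two_allele_genotypes:
    print(genotype)
print(len(three_allele_genotypes), "genotypes with three alleles")
```

In general, a population with k alleles at a locus allows k(k + 1)/2 distinct genotypes, even though each animal carries only two alleles.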

Gene Sequences and Maps

Sequencing the bovine or human or rice genome is nothing more than determining the linear order of four bases in the genome: adenine (A), thymine (T), guanine (G), and cytosine (C). Each A is paired with a T, and each G with a C. In the case of cattle, the DNA is arranged on 29 pairs of linear autosomal chromosomes (one of each pair from each parent) and the sex chromosomes. A special case is the circular mitochondrial genome of around 16,000 base pairs with about 35 genes; this is inherited maternally through mitochondria in the oocyte. Because sperm mitochondria degenerate after fertilization, sperm generally do not contribute to embryo mitochondrial genes.
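The base-pairing rule described above (A with T, G with C) can be written as a simple lookup. The sketch below is illustrative only; the names PAIRING and complementary_strand and the four-base example are assumptions made for this example, and the directionality of the DNA strands is ignored for simplicity.

```python
# Pairing rule from the text: A pairs with T, and G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand):
    """Return the bases that pair with each base of the given strand.

    Strand directionality is ignored in this simplified sketch.
    """
    return "".join(PAIRING[base] for base in strand)


example = "ATGC"  # hypothetical short sequence
paired = complementary_strand(example)
print(example, "pairs with", paired)
```

Because each base has exactly one partner, knowing the sequence of one strand is enough to determine the other, which is why a genome can be described by a single linear order of bases.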

A huge problem in sequencing 2.8 billion bases of the haploid bovine genome is knowing where you are, even though the genome is divided up into the 29 autosomes and two sex chromosomes. The process of working this out is called mapping, and there are several kinds of maps. The same kinds of problems and solutions occur when designing and using maps for transportation. One needs to be able to match up information along the road, such as road signs and mile markers, with the map for it to be of much value. Particular DNA sequences serve as markers in the same way; for example, there are 256 ways to arrange the four DNA bases in sequences of four: ATGC, ATCG, AGTC, TACG, AATT, etc. With a sequence of 20 bases, there are about a trillion combinations, far more than the number of bases in the genome, so with a few exceptions, such as 20 A’s in a row, 20 base pairs define a unique map marker. Large numbers of markers make it possible to know where you are in the genome with reasonable precision.
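The counting argument behind sequence markers can be checked directly: with four possible bases at each position, a sequence of n bases can take 4 to the power n forms. The short Python sketch below is an illustration only; the function name sequence_count is hypothetical, and the genome size is the approximate 2.8 billion figure used in this article.

```python
def sequence_count(length, alphabet_size=4):
    """Number of distinct sequences of a given length over the DNA bases."""
    return alphabet_size ** length


four_base = sequence_count(4)      # sequences of four bases
twenty_base = sequence_count(20)   # sequences of twenty bases

# Approximate haploid bovine genome size used in this article.
haploid_genome = 2_800_000_000

print(f"4-base sequences:  {four_base:,}")
print(f"20-base sequences: {twenty_base:,}")
print(f"20-base sequences per genome position: {twenty_base / haploid_genome:.0f}")
```

A 20-base sequence has roughly 1.1 trillion possible forms, several hundred times the number of positions in the haploid genome, which is why a marker of that length would rarely be expected to occur twice by chance. Real genomes contain repeated sequences, so this is a rough guide rather than a guarantee.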

Another problem is that there really are two maps for each animal, one for the genome inherited from the mother and the other for the genome inherited from the father. As already covered, these maps differ in the alleles of genes. They differ even more in the DNA sequences between the genes, which comprise over 90% of the genome. Any difference at a particular point, termed a locus, whether in a gene or between genes, is called a polymorphism. Often a polymorphism is a change in a single base pair, frequently with the adjacent base pairs unchanged. These are called single base pair (nucleotide) polymorphisms, or SNPs (pronounced “snips”).
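To make the idea of a SNP concrete, the following Python sketch compares two aligned sequences position by position and reports each place where a single base differs. It is a toy illustration, not how SNP chips work: the function name find_snps and both 12-base sequences are invented for this example, and the sequences are assumed to be already aligned and of equal length.

```python
def find_snps(maternal, paternal):
    """Return (position, maternal_base, paternal_base) for each mismatch.

    Both sequences are assumed to be aligned and of equal length;
    positions are counted from zero.
    """
    if len(maternal) != len(paternal):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        (position, m_base, p_base)
        for position, (m_base, p_base) in enumerate(zip(maternal, paternal))
        if m_base != p_base
    ]


# Hypothetical stretches of the same chromosome region from each parent.
maternal = "ATGCCGTAAGCT"
paternal = "ATGCCGTCAGCT"

snps = find_snps(maternal, paternal)
for position, m_base, p_base in snps:
    print(f"SNP at position {position}: maternal {m_base}, paternal {p_base}")
```

In this made-up example the two sequences are identical except at one position, where the bases on either side are unchanged, which matches the description of a SNP given above.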