Basic genetics concepts: DNA regulation and gene expression

Literature review current through: Jan 2024.
This topic last updated: Sep 20, 2023.

INTRODUCTION — The role of genetic information in the practice of medicine is increasing at a rapid pace, and an understanding of the basic principles underlying DNA regulation, genetic variation, and genetic disease is integral to many aspects of patient care.

The basic principles of DNA regulation and gene expression are reviewed here, along with the types of genetic disorders that can occur and the types of tools available to evaluate them. A related discussion of chromosome organization and segregation is presented separately. (See "Basic genetics concepts: Chromosomes and cell division".)

The following subjects are also discussed in separate topic reviews:

Terminology – (See "Genetics: Glossary of terms".)

Genetic testing – (See "Genetic testing".)

Genetic counseling – (See "Genetic counseling: Family history interpretation and risk assessment".)

NORMAL ORGANIZATION AND REGULATION

Nuclear genome — The nuclear genome provides the complete blueprint (other than mitochondrial genes) for the human genome in every nucleated cell in the body.

It has the following properties:

It consists of approximately 3 billion (3 × 10^9) base pairs of DNA.

It is organized into 23 chromosomes (figure 1), with 22 autosomes and 1 sex chromosome (X or Y). (See "Basic genetics concepts: Chromosomes and cell division", section on 'Chromosome organization'.)

The default sex is female. The Y chromosome contains the SRY gene (sex-determining region, Y chromosome), which suppresses the formation of female gonads (via production of anti-Müllerian hormone) and initiates development of male reproductive organs. Individuals with a Y chromosome and an intact SRY gene are phenotypically male; those without a Y chromosome are phenotypically female. Exceptions include XY females with androgen resistance and rare XX males in XY translocation kindreds.

Between 1 and 2 percent of the nuclear genome sequence encodes genes (discrete functional units) that provide instructions for making proteins. Collectively, the protein-coding portions of the genome are known as the exome.

The exome contains approximately 21,000 different genes, although estimates vary greatly depending on how genes are defined and how the analyses are conducted [1]. This information became available as a result of the completion of the Human Genome Project in 2001 to 2003 [2]. Some experts were surprised that the number was this small. Coding genes are transcribed into messenger RNA (mRNA) that is subsequently translated into protein. Noncoding genes code for other RNA types, such as microRNAs (miRNAs) and small nuclear RNAs (snRNAs), with functional properties that do not require translation to protein. (See 'Central dogma versus more complex regulation' below.)

In most cases, protein-coding genes consist of exons (segments that code for amino acids), introns (intervening segments that are spliced out of the RNA transcript; many introns regulate the expression of genes), and other regulatory regions (5' and 3' untranslated regions) (figure 2). Alternative splicing uses different combinations of exons to produce different versions of proteins and expands the breadth of functions encoded by each gene. (See 'Transcription' below.)

There are several ways to describe the physical position (locus) of a gene on a chromosome:

Data from completely sequenced genomes allows researchers to refer to chromosomal position at the resolution of a single base pair. Although the relative position of most DNA sequences is stable, absolute base pair numbering varies from one sequence build to another due to ongoing additions and editing of the genome. The build number must be noted when using absolute base numbering.

This standardization of the reference genome is critical to ensure that pathogenic or likely pathogenic variants in a gene (see 'Clinical classification of pathogenicity' below) are reported consistently across different laboratories.

Prior to the completion of the Human Genome Project, genetic loci were described with reference to cytogenetic markers (dark regions on chromosomes that appear with Giemsa staining, also known as G-bands) (figure 3). High-resolution karyotyping allows greater subdivision. Although array comparative genomic hybridization (aCGH) has grown in popularity for assessing copy number variation at the chromosome level, karyotyping remains important clinically, such as when examining for a balanced translocation that may have been responsible for an offspring with an unbalanced chromosomal rearrangement (partial trisomy or monosomy), or in evaluating certain hematologic malignancies. (See "Genomic disorders: An overview", section on 'Copy number variations' and "Tools for genetics and genomics: Cytogenetics and molecular genetics" and "Chromosomal translocations, deletions, and inversions".)

Mitochondrial genome — Mitochondria are cytoplasmic organelles thought to be derived from an ancient acquisition of bacteria by eukaryotic cells. Their genome is 16,569 base pairs long and contains 37 genes, 13 of which are protein coding. Many of these are involved in mitochondrial respiration and the electron transport chain. Mitochondrial DNA is found within DNA-protein complexes called nucleoids, which are located in the mitochondrial matrix. (See "Mitochondrial regulation and functions".)

During conception, mitochondria come exclusively from the egg. Thus, all mitochondrial DNA is maternally inherited, and mitochondrial disorders are exclusively transmitted through the maternal line. (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)", section on 'Mitochondrial inheritance'.)

Gene expression — Gene expression is the process in which the genetic information is converted to biochemical or biologic functions. The process is highly regulated on multiple levels.

DNA and RNA — DNA and RNA are nucleic acids (polymers with a repeating modular structure) in which nucleotide bases are bound to a sugar-phosphate backbone. The sugars are five-carbon rings, also called pentoses. The sugar in DNA is deoxyribose (hence, deoxyribonucleic acid) and in RNA is ribose (ribonucleic acid).

The sugar moieties are connected to one another by phosphodiester bonds between the 3rd and 5th positions of adjacent carbon rings.

The bases are covalently bound to the 1st position of the carbon ring. Bases with two rings are referred to as purines, and bases with one ring are referred to as pyrimidines (figure 4).

DNA – The four bases in DNA are:

Purines – Adenine (A) and guanine (G)

Pyrimidines – Cytosine (C) and thymine (T)

RNA – RNA contains the same purines (A and G). The pyrimidines are C and uracil (U). U and T differ by a single methyl group (CH3).

DNA forms a double helix: two paired, antiparallel strands held together by noncovalent hydrogen bonds between complementary bases on opposite strands (figure 5). A pairs with T, and C pairs with G. Adenine and thymine pair with two hydrogen bonds, and cytosine and guanine pair with three hydrogen bonds. This rule of sequence complementarity, which preserves sequence information in newly transcribed RNA and newly synthesized DNA, provides the basis for accurate DNA transcription and replication.

RNA is usually single stranded, although some RNAs base-pair with other RNAs to regulate them. (See 'Central dogma versus more complex regulation' below.)

The difference in the sugar moiety between DNA and RNA affects the relative stability of the molecules. DNA is more stable because the carbon at position 2' carries a hydrogen atom rather than a hydroxyl group. In contrast, RNA is more susceptible to degradation by heat and ultraviolet light. RNases also greatly reduce the stability of RNA. This degradation is one method of rapid inactivation of gene expression.
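The pairing rules described above can be sketched in a few lines of Python. This is a minimal illustration, not a bioinformatics tool: the function names are hypothetical, and the code assumes an uppercase or lowercase DNA string containing only A, C, G, and T, paired perfectly with its complement.

```python
# Watson-Crick pairing rules for DNA, as described in the text.
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Hydrogen bonds contributed by each base pair (A-T: 2, C-G: 3).
HYDROGEN_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, written 5' to 3'.

    The two strands are antiparallel, so the complement is reversed.
    """
    return "".join(DNA_COMPLEMENT[base] for base in reversed(strand.upper()))


def hydrogen_bond_count(strand: str) -> int:
    """Count hydrogen bonds between a strand and its perfect complement."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


print(reverse_complement("ATGC"))   # GCAT
print(hydrogen_bond_count("ATGC"))  # 10
```

Applying reverse_complement twice returns the original strand, which is the property that allows either strand to serve as a template for rebuilding the other.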

Central dogma versus more complex regulation — The fundamental processes by which heritable information is translated from genetic code to function are common throughout all living organisms. The order in which these processes occur (transcription of DNA to RNA and translation of RNA to protein in a linear, unidirectional process) was termed the "Central Dogma of Molecular Biology" by Francis Crick in 1958. The figure illustrates the processes of transcription and translation (figure 6).

However, regulation of gene expression is much more complex than simply the unidirectional copying of a sequence.

Proteins and RNA species regulate the expression of specific genes, and there are complex interactions between DNA, RNA, and protein. Examples of other types of RNA besides mRNA include snRNAs, miRNAs, ribosomal RNA (rRNA), and short interfering RNAs (siRNAs) [3]. (See 'DNA and RNA' above.)

Reversible epigenetic changes to the DNA also occur. These changes are modifications superimposed on individual DNA strands (eg, methylation) and/or on histones (eg, methylation, acetylation), which confer higher-order chromosome regulation. (See "Principles of epigenetics".)

Three-dimensional folding of DNA brings together segments of chromosomes that are separated by great genetic distances in ways that can be critical to tissue-specific and stage-specific gene expression.

Some viruses have an RNA-based genome. Retroviruses use reverse transcription to integrate their genome into the host DNA-based genome, causing genetic information to flow from RNA to DNA and then back to RNA again [4].

Transcription — The process of transcription transfers genetic information from DNA to RNA, mediated by new RNA synthesis using the DNA as a template. It occurs in the nucleus, at distinct genomic positions at defined times in response to various triggers, which may include internal programs related to development or maturation, or external triggers such as arrival of nutrients, starvation, or exposure to a number of substances or circumstances.

Transcription uses the antisense DNA strand as the template. The template is read in the 3' to 5' direction, and the new RNA chain is synthesized in the 5' to 3' direction. (See 'DNA and RNA' above.)

Transcription requires that cellular machinery unfold the DNA to allow transcriptional regulators (transcription factors) and the transcription machinery to bind a single-stranded region of the DNA. This is followed by a cascade of protein recruitment that ultimately leads to the binding of RNA polymerase to the 5' end of the DNA sequence. One type of RNA polymerase catalyzes RNA chain elongation by reading the DNA template and incorporating nucleotides (A, G, C, and U) into a new chain linked by phosphodiester bonds (figure 7); other RNA polymerases have other functions.

Differences in DNA sequence that alter chromatin structure or affect the binding of transcription factors are often implicated as susceptibility factors for common, complex genetic traits [5]. (See 'Genetic variation' below and 'Modes of inheritance' below.)
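As a rough sketch of templated RNA synthesis, the following Python function builds an RNA sequence from a DNA template strand. It assumes the template (antisense) strand is supplied in 5' to 3' notation and ignores promoters, transcription factors, and RNA processing; the function name and the example sequences are hypothetical.

```python
# Pairing used when RNA is synthesized on a DNA template:
# uracil (U) is incorporated opposite adenine instead of thymine.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA (written 5' to 3') made from a DNA template strand.

    The template is supplied 5' to 3', so it is reversed here to walk it
    from its 3' end while complementary ribonucleotides are appended.
    """
    read_order = reversed(template_5_to_3.upper())
    return "".join(DNA_TO_RNA[base] for base in read_order)


coding_strand = "ATGGCC"    # sense strand, 5' to 3' (hypothetical)
template_strand = "GGCCAT"  # antisense strand, 5' to 3'
rna = transcribe(template_strand)
print(rna)  # AUGGCC
```

The resulting RNA matches the sense strand with U in place of T, which is why gene sequences are conventionally reported using the sense strand.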

The first phase of transcription produces precursor messenger RNA molecules (pre-mRNA) that subsequently undergo modifications to make a stable mRNA transcript. These modifications are illustrated in the figure (figure 8) and include:

Splicing – DNA contains protein-coding exons and noncoding introns. Splicing is the process that removes the noncoding segments from pre-mRNA molecules and splices together the ends of the intervening coding segments to produce a seamless mature mRNA product.

Differential splicing allows a single gene to encode more than one protein (figure 9). Differences in splicing may be tissue-specific and may be highly regulated within the same tissue.

Splicing relies on spliceosomes (enzymatic ribonucleoprotein complexes). Intron boundaries are marked by conserved splice donor and splice acceptor sites on the pre-mRNA that provide sequence recognition sites for the spliceosomes. Mutations that disrupt splice sites can impair normal splicing and alter protein structure and function. (See 'Genetic variation' below.)

Capping – Capping improves the stability and function of mRNA by linking an inverted methylated guanosine (m7G) to the 5' end. The cap prevents the 5' end from binding to other nucleic acid chains, protects the mRNA from exonucleases, promotes translocation of the mRNA from the cell nucleus to the cytoplasm, and facilitates ribosomal binding to the 5' end of the mRNA to initiate translation.

Polyadenylation – Polyadenylation improves the stability and function of mRNA by linking a long tail of adenine nucleotides (a polyA tail) to the 3' end. The polyA tail increases transcript stability, promotes the tertiary structure of the transcript, and facilitates the initiation of translation.
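Splicing and alternative splicing can be illustrated with a small Python sketch. The exon and intron sequences below are invented for illustration, and the introns are assumed to begin with GU and end with AG (commonly described donor and acceptor dinucleotides that are not specified in this topic). Capping and polyadenylation are not modeled.

```python
from typing import List, Tuple

# Hypothetical pre-mRNA assembled from three exons and two introns.
EXONS = ["AUGGCC", "GGAUUU", "CCCUAA"]
INTRONS = ["GUAAGUCAG", "GUCCCCAG"]
PRE_MRNA = EXONS[0] + INTRONS[0] + EXONS[1] + INTRONS[1] + EXONS[2]


def exon_coordinates(exons: List[str], introns: List[str]) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) positions of each exon."""
    coords, pos = [], 0
    for i, exon in enumerate(exons):
        coords.append((pos, pos + len(exon)))
        pos += len(exon)
        if i < len(introns):
            pos += len(introns[i])  # skip over the intron that follows
    return coords


def splice(pre_mrna: str, exon_coords: List[Tuple[int, int]]) -> str:
    """Join the selected exons in order, discarding everything between them."""
    return "".join(pre_mrna[start:end] for start, end in exon_coords)


coords = exon_coordinates(EXONS, INTRONS)
full_length = splice(PRE_MRNA, coords)                   # all three exons
exon_skipped = splice(PRE_MRNA, [coords[0], coords[2]])  # exon 2 omitted
print(full_length)
print(exon_skipped)
```

Passing a different subset of exon coordinates to splice yields a different mature transcript from the same pre-mRNA, which is the essence of differential splicing.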

Translation — The process of translation uses the information in mRNA to create proteins (polypeptide chains of amino acids that fold into three-dimensional structures capable of interacting with each other). It occurs in the cytoplasm, as illustrated in the figure (figure 10).

The mRNA is transported from the nucleus to the endoplasmic reticulum (figure 10), where translation is initiated [6].

Mature mRNA serves as the template that determines the linear sequence of amino acids. By convention, the amino acid sequence is listed in the direction of synthesis, from the amino (N) terminus to the carboxy (C) terminus.

Translation is carried out by ribosomes (complex ribonucleoprotein structures that contain the enzymatic machinery for protein synthesis) (figure 11). The ribosome binds transfer RNAs (tRNAs), which carry the amino acids and match them to the mRNA sequence. Each tRNA has a three-base sequence (an anticodon) complementary to an mRNA triplet.

There are 20 amino acids, each encoded by a three-nucleotide mRNA sequence known as a codon (figure 12). There are 64 codons; thus, there is some redundancy in amino acid determination (some amino acids can be specified by more than one codon) (figure 13).

Each amino acid has an amino-carboxyl backbone with a unique side chain. They can be classified according to whether their side chain is polar or nonpolar, and for the polar amino acids, whether it is charged or not (figure 14). The collective interactions of the side chains confer tertiary structure (folding) and protein-protein interactions.

There are three stop codons (UGA, UAA, and UAG); the presence of a stop codon signifies the end of the protein-coding sequence. In the mitochondrial genome, UGA encodes tryptophan rather than a stop codon [7,8]. AUG, which encodes methionine, also signals the start of translation.

Ribosomes recognize and bind the start codon (methionine) at the 5' end of mRNA to initiate the translation process. The ribosomal machinery advances in a 5' to 3' direction along the mRNA, adding successive amino acids to the growing polypeptide chain until a stop codon is reached (figure 11). Multiple ribosomes can advance along the same mRNA molecule; polyribosomes (polysomes) represent tight, spatial arrays of ribosomes translating a single mRNA molecule.
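The translation steps described above (start at AUG, read successive codons 5' to 3', stop at a stop codon) can be sketched in Python as follows. The codon table is the standard genetic code written compactly in UCAG order. The function name and test sequences are hypothetical, and the mitochondrial option models only the UGA difference mentioned in the text; it is not a complete mitochondrial code.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str, mitochondrial: bool = False) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA.

    Returns one-letter amino acid codes, N terminus first.
    """
    table = dict(CODON_TABLE)
    if mitochondrial:
        table["UGA"] = "W"  # only the UGA difference noted in the text
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGUUUGGAUGA"))                      # MFG
print(translate("AUGUUUGGAUGA", mitochondrial=True))  # MFGW
```

The table has 64 entries for 20 amino acids and 3 stop signals, which makes the redundancy of the code explicit.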

Further processing steps (post-translational modifications of the polypeptide chain) provide another layer of regulation:

Cleavage of an upstream leader sequence by a protease can convert an inactive precursor to an active protein (eg, the cleavage of proinsulin to active insulin).

Addition of various side groups can facilitate interactions with other proteins or trafficking to various cellular compartments. Examples include glycosylation, carboxylation, phosphorylation, oxidation, or attachment of a membrane anchor.

Membrane-bound and secreted proteins are moved into the endoplasmic reticulum (ER).

MODES OF INHERITANCE — Inheritance patterns describe the manner in which a trait is expressed in members of a family, as determined by the phenotypes and genotypes of the individual and family members. Broadly, inheritance patterns are described as either monogenic (one gene) or polygenic (many genes). Oligogenic inheritance refers to patterns in which a few genes contribute.

Monogenic (Mendelian and non-Mendelian) — Single-gene (monogenic) disorders often follow Mendelian inheritance patterns such as autosomal dominant, autosomal recessive, or X-linked (table 1). (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)".)

Mendelian – Mendelian inheritance is often characterized by a very tight correlation between a trait or disease genotype and its phenotype. This can occur because these variants often confer significant effects on a gene's function. In these situations, knowledge of an individual's genotype can be highly predictive for a trait or disease risk, and the information can be very useful clinically. (See 'Implications for medicine' below.)

Since somatic cells contain two copies of each autosome, there are paired alleles for every gene or locus in the genome. In most recessive disorders, heterozygosity (presence of a pathogenic variant at only one allele) is a benign carrier state with minimal to no disease manifestations, whereas homozygosity (the same pathogenic variant on both alleles) or compound heterozygosity (different pathogenic variants on both alleles) is sufficient to cause the phenotype (development of disease or increased disease risk).

In autosomal dominant inheritance, a pathogenic variant in one allele of a heterozygote can be sufficient to cause the phenotype. One mechanism is haploinsufficiency, in which the remaining wild-type allele does not produce enough functional protein. Dominant-negative effects are another common mechanism of disease pathogenesis, in which an abnormal protein produced by one allele impairs the function of the normal protein produced by the other allele.

Non-Mendelian – Some monogenic disorders do not appear to follow these textbook patterns, due to a number of factors that can influence the relationship between genotype and phenotype, including incomplete penetrance, variable expressivity, anticipation, mosaicism, parent-of-origin effects (imprinting), and mitochondrial inheritance. Details are described separately. (See "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)", section on 'Causes of non-Mendelian inheritance'.)
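For the Mendelian patterns above, expected offspring genotype proportions at a single autosomal locus can be computed with a simple Punnett-square sketch. The allele labels are hypothetical ("A" for the reference allele, "a" for a pathogenic variant), and the code assumes each parental allele is transmitted with probability 1/2, with none of the non-Mendelian factors listed above.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from typing import Dict


def cross(parent1: str, parent2: str) -> Dict[str, Fraction]:
    """Offspring genotype probabilities for one autosomal locus.

    Each parent is a two-letter genotype such as "Aa".
    """
    # Sorting puts the uppercase allele first, so "aA" and "Aa" are merged.
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


# Two heterozygous carriers of a recessive pathogenic variant.
for genotype, probability in cross("Aa", "Aa").items():
    print(genotype, probability)
```

For two carriers, the output corresponds to 1/4 unaffected noncarriers (AA), 1/2 carriers (Aa), and 1/4 homozygotes (aa).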

Polygenic — Many "common" diseases, such as hypertension or diabetes, are typically caused by the cumulative and interactive effects of genetic variation in more than one gene (often dozens, and sometimes as many as thousands). Moreover, the risk of developing many of the common diseases is also influenced by developmental, environmental, and social factors. (See "Principles of complex trait genetics".)

In these disorders, a 1:1 correlation between risk genotype and disease is not observed, and each risk variant contributes only a fraction of the total genetic risk for disease. As a result, genes that contribute to polygenic phenotypes may be harder to identify, and the clinical usefulness of genetic information may be lower than for monogenic traits or diseases.

Best practices for testing, communicating, or managing polygenic findings are lacking, but this is an active area of research. (See 'Implications for medicine' below and 'Research areas' below.)
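One simplified way to picture the cumulative contribution of many small-effect variants is an additive score: the number of risk alleles at each variant multiplied by a per-allele weight, summed across variants. The sketch below is an assumption-laden illustration only. The variant names and effect sizes are invented, a purely additive model ignores the interactive and environmental effects noted above, and the result has no clinical meaning.

```python
from typing import Dict


def polygenic_score(risk_allele_counts: Dict[str, int],
                    effect_sizes: Dict[str, float]) -> float:
    """Sum of (number of risk alleles x per-allele effect size).

    Variants without a listed effect size are ignored.
    """
    return sum(
        count * effect_sizes[variant]
        for variant, count in risk_allele_counts.items()
        if variant in effect_sizes
    )


# Hypothetical variants and effect sizes, for illustration only.
effect_sizes = {"variant_1": 0.10, "variant_2": 0.05, "variant_3": 0.20}
# Risk allele counts (0, 1, or 2) for one hypothetical individual.
individual = {"variant_1": 2, "variant_2": 1, "variant_3": 0}
score = polygenic_score(individual, effect_sizes)
print(score)
```

No single variant determines the score, which mirrors the point that each risk variant accounts for only a fraction of the total genetic risk.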

Complex — Complex is a broader category than polygenic inheritance, in which environmental and gene-environment interactions, as well as multiple genes and gene-gene interactions, also play a role.

GENETIC VARIATION — The human genome is rich in variation. Completion of the Human Genome Project (see 'Nuclear genome' above) and expansion of tools for determining genotype have greatly expanded the catalog of genetic variation that contributes to disease susceptibility, disease phenotypes, and disease response to treatment.

Variation can be categorized into three broad groupings based on the scale of the change, from single nucleotide to entire chromosome, and whether it is permanent (genetic) or reversible (epigenetic). Small-scale variation (single nucleotide changes or small insertions or deletions) is thought to comprise the majority of genetic variation. (See 'Sequence variants' below and "Basic genetics concepts: Chromosomes and cell division", section on 'Structural aberrations' and "Principles of epigenetics".)

Variation is also characterized according to whether it occurs in the germline (and thus is transmitted to offspring) or in somatic cells.

Terminology that refers to variants according to their pathogenicity rather than calling all variants "mutations" is more useful clinically because it helps the patient and clinician distinguish among pathogenic variants that require a specific action or intervention; benign variants that represent normal genetic variety among individuals; and variants of uncertain (unknown) significance (VUS) for which more study of the variant is needed to determine pathogenicity. (See 'Clinical classification of pathogenicity' below.)

Variation in the genome

Sequence variants — DNA sequence variation is the most abundant type of genetic variation and is responsible for most of the genetic diversity in heritable traits. While the reference sequence is sometimes referred to as the "normal" sequence, this presumes that there is one true normal and does not account for a heavy bias toward reference sequences from certain populations such as those of European descent.

Difference from a reference DNA sequence is not inherently advantageous or deleterious. On the one hand, variation is responsible for genetic diversity and evolution. On the other hand, variation can interfere with normal gene functions and can increase the risk of deleterious traits or disease.

There is a wide range in the estimated frequency of DNA sequence variation, with some regions of the genome such as the major histocompatibility complex (MHC) showing high heterogeneity and others, especially coding regions, showing more restricted variation. It has been proposed that critical functional elements are less tolerant of variation [9-13].

Common types of sequence variation include the following:

Single nucleotide substitutions – These are variants in which one nucleotide is replaced with a different nucleotide. These primarily arise due to single-base slip mispairing during DNA replication and/or to cytosine deamination [14,15]. Single nucleotide substitutions are thought to represent more than half of human DNA variation.

Missense – Missense mutations substitute one nucleotide for another. The messenger RNA (mRNA) remains "in frame," meaning that only a single codon is affected and all downstream codons remain the same. In some cases, the mutation is "silent" (not changing the amino acid) and in others, one amino acid is substituted for another. This may or may not affect protein function depending on whether the residue is critical and/or whether the polarity and charge are maintained. (See 'Translation' above.)

Nonsense – Nonsense mutations substitute one nucleotide for another, but the new nucleotide introduces a stop codon that was not previously present. When the mRNA is translated, the protein ends prematurely, creating a truncated form of the protein that may undergo nonsense-mediated decay and/or lack important functional domains.

Splice site – Single nucleotide changes can also alter sites where the protein-coding DNA (exons) are spliced together, referred to as splice site variants.
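The effect of a single nucleotide substitution on one codon can be checked against the standard genetic code. The Python sketch below labels a substitution as silent, missense (amino acid changed), or nonsense (new stop codon). The function name is hypothetical; the code assumes the reference codon encodes an amino acid and does not evaluate splice site effects or protein function.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon: str, alt_codon: str) -> str:
    """Classify a single nucleotide substitution within one codon.

    Codons may be given as DNA (with T) or RNA (with U).
    """
    ref = ref_codon.upper().replace("T", "U")
    alt = alt_codon.upper().replace("T", "U")
    if len(ref) != 3 or len(alt) != 3:
        raise ValueError("codons must be three nucleotides long")
    if sum(1 for r, a in zip(ref, alt) if r != a) != 1:
        raise ValueError("codons must differ at exactly one position")
    ref_aa, alt_aa = CODON_TABLE[ref], CODON_TABLE[alt]
    if alt_aa == ref_aa:
        return "silent"    # same amino acid
    if alt_aa == "*":
        return "nonsense"  # premature stop codon
    return "missense"      # different amino acid


print(classify_substitution("GAA", "GAG"))  # silent
print(classify_substitution("GAG", "GTG"))  # missense
print(classify_substitution("TAC", "TAA"))  # nonsense
```

The reading frame is unchanged in all three cases; only the identity of one codon differs.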

Insertion/deletion (indel) polymorphisms – Indels are variants in which one or more nucleotides are removed or added, potentially changing the reading frame of the resulting mRNA (figure 15). Most indels range in length from one to four nucleotides; the vast majority are a single nucleotide [16]. Although less common than single nucleotide substitutions, indels are distributed throughout the genome and are thought to represent approximately 10 to 25 percent of human DNA variation [13]. Like single-base substitutions, most indels (particularly insertions) arise through slip-mispairing [17,18].

Frameshift – Most clinically significant indels introduce a shift in the reading frame (figure 15); codons downstream of the insertion or deletion are no longer in register with the original protein sequence and a premature stop codon may eventually be created. An example is a 32 base-pair deletion in the CCR5 gene, which encodes the receptor used by HIV to enter T cells. Homozygotes for CCR5 Delta32 are protected from HIV infection, and heterozygotes have a slower rate of disease progression [19-21].

If a frameshift mutation causes a premature stop codon, the transcript may undergo nonsense-mediated decay.

In-frame – Insertions or deletions of a multiple of three nucleotides (eg, three or six) can have pathogenic consequences despite preserving the reading frame for the downstream amino acids. As an example, a 3-nucleotide (CTT) deletion at codon 508 (phenylalanine) of the CFTR gene (DeltaF508) is responsible for approximately 70 percent of cystic fibrosis cases in populations of European descent. This variant prevents normal trafficking of mature CFTR protein to the cell surface [22-24]. (See "Cystic fibrosis: Genetics and pathogenesis".)
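Whether an indel shifts the reading frame depends on whether its length is a multiple of three. The Python sketch below shows this rule and its effect on downstream codons. The function names and the reference sequence are hypothetical, and the code considers only indel length, not the biological consequences of the resulting protein.

```python
from typing import List


def indel_effect(length: int) -> str:
    """Classify an insertion or deletion by its length in nucleotides."""
    if length <= 0:
        raise ValueError("length must be a positive number of nucleotides")
    return "in-frame" if length % 3 == 0 else "frameshift"


def codons(sequence: str) -> List[str]:
    """Split a coding sequence into complete triplets, starting at position 0."""
    usable = len(sequence) - len(sequence) % 3
    return [sequence[i:i + 3] for i in range(0, usable, 3)]


def delete(sequence: str, start: int, length: int) -> str:
    """Remove `length` nucleotides beginning at 0-based position `start`."""
    return sequence[:start] + sequence[start + length:]


reference = "AUGGCCUUUGGAUAA"  # hypothetical coding sequence
print(codons(reference))
print(codons(delete(reference, 3, 1)))  # 1-nucleotide deletion: frameshift
print(codons(delete(reference, 3, 3)))  # 3-nucleotide deletion: in-frame

print(indel_effect(32))  # frameshift (eg, CCR5 Delta32)
print(indel_effect(3))   # in-frame (eg, CFTR DeltaF508)
```

After the 1-nucleotide deletion every downstream codon changes, whereas the 3-nucleotide deletion removes one codon and leaves the rest intact.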

Short tandem repeats (STR) – STRs are small segments of DNA, typically two to six nucleotides long, that are repeated in a row, often 20 to 30 times in healthy individuals. STRs are common in the genome, with lengths that often vary between generations in a kindred due to their susceptibility to misalignment during DNA replication (figure 16). Disease-causing expansions of tandem repeat sequences (trinucleotide repeats) have been described in coding regions of genes, intronic sequences, and 5' and 3' untranslated regions (5'UTR and 3'UTR) [25-27]. When repeats expand to 100 or even 1000 copies, they can be associated with one of several neurologic syndromes. Susceptibility to prostate cancer is another example of a trait associated with repeat length.

Examples include fragile X syndrome (expansion of CGG in the 5'UTR of the FMR1 gene), myotonic dystrophy (expansion of CTG in the 3'UTR of the DMPK gene), and Huntington disease (expansion of CAG in the coding region of the HTT gene, formerly called HD) [25,28-30]. (See "Autosomal dominant spinocerebellar ataxias", section on 'Fragile X-associated tremor/ataxia syndrome' and "Myotonic dystrophy: Etiology, clinical features, and diagnosis" and "Huntington disease: Genetics and pathogenesis".)

Repeat expansion is generated by slip-mispairing during DNA replication, with formation of DNA loops and back-priming of the leading strand [31,32]. The longer a repetitive segment, the more susceptible it is to slip-mispairing. As a result, these disorders may be associated with genetic anticipation (increasing disease severity and earlier age of onset in subsequent generations due to increasing numbers of repeats). As an example, in myotonic dystrophy, diagnosis of the condition in a severely affected child may lead to identification of mild manifestations in a grandparent who was previously unaware of the diagnosis.

The converse phenomenon of repeat length contraction has also been observed and may explain variable penetrance observed in myotonic dystrophy, although this phenomenon is very rare [29]. Parent-of-origin biases have also been observed in several of these conditions, likely due to sex-related differences in repeat expansion rates.

STRs were previously widely used to facilitate genotyping, but other variants have largely replaced them.
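Counting the number of consecutive copies of a repeat unit is the basic measurement behind STR length. The Python sketch below finds the longest uninterrupted run of a given unit in a sequence. The function name and sequences are hypothetical, no disease thresholds are implied, and real repeat sizing must handle interruptions and sequencing error, which this code does not.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    unit = unit.upper()
    # A lookahead is used so that a run starting at any position is examined,
    # including runs that overlap an earlier, shorter match.
    pattern = re.compile(f"(?=((?:{re.escape(unit)})+))")
    runs = (
        len(match.group(1)) // len(unit)
        for match in pattern.finditer(sequence.upper())
    )
    return max(runs, default=0)


sequence = "GG" + "CAG" * 5 + "TT"  # hypothetical sequence with five CAG copies
print(longest_repeat_run(sequence, "CAG"))  # 5
```

Comparing the returned count between a parent and child would show an expansion or contraction of the repeat across generations.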

Terminology for DNA variation is evolving. Previously, single base changes with a high frequency in the population (generally defined as 1 percent or more) have been referred to as single nucleotide polymorphisms (SNPs, pronounced "snips"), whereas less common changes were called mutations. However, as noted below, the term "variant" is now preferred, with a qualifier about pathogenicity. (See 'Clinical classification of pathogenicity' below.)

Structural variation — Variation in chromosome number (aneuploidy) or chromosome structure can occur in the germline. Structural variants within chromosomes such as copy number variants (CNVs; structural variants affecting the number of copies of certain genes) are common and often are benign; however, certain CNVs can affect drug metabolism, such as those in genes that affect metabolic enzymes (eg, CYP2D6). Other structural variants in the germline can lead to congenital syndromes, and acquired structural variants are common in cancer. These types of variations are discussed separately. (See "Basic genetics concepts: Chromosomes and cell division", section on 'Numerical and structural chromosome variation'.)

Variation in gene expression/regulation

Epigenetic variation — Epigenetic variation refers to changes that are superimposed on the DNA molecule or chromatin without altering its primary sequence. Examples include addition or removal of a methyl group or other side group to DNA or histones (figure 17), which can alter DNA expression in numerous ways.

These changes are heritable (passed down from one cell to another), but they can be reversed (removed and/or added) in response to developmental cues or environmental exposures. Functional consequences include normal silencing of genes from one parent and abnormal changes in gene regulation in certain malignancies. (See "Principles of epigenetics".)

Regulated gene expression — Differential expression of genes is responsible for phenotypic variation within and between individuals.

Examples include:

Temporal – Gene expression varies across the lifespan, from embryogenesis to older age.

Spatial – Tissue-specific gene expression promotes tissue differentiation during embryogenesis and normal tissue remodeling.

Circadian – A study of gene expression over time suggests an internal clock controls some aspects of gene regulation [33].

Expression (the level of gene activity) is controlled by transcription factors, regulatory RNAs, and other factors. These factors act in concert like a "dimmer switch" for a gene, ranging from on/off status to differing levels of gene expression. Typically, the level of expression is measured quantitatively through the levels of mRNA being produced for a particular gene, although gene expression is also affected by how quickly mRNA is processed or degraded. These factors can vary depending on the conditions the cell or an organism is experiencing at the time of measurement.

Gene expression profiling can be used to study variation in gene expression. (See "Tools for genetics and genomics: Gene expression profiling".)

IMPLICATIONS FOR MEDICINE — The roles of DNA regulation and gene expression in clinical medicine continue to expand as additional information about disease associations and therapeutic implications becomes available and as testing methods and costs improve.

Single nucleotide polymorphism (SNP) microarray technologies enable simultaneous genotyping of more than 1 million variants. (See "Tools for genetics and genomics: Cytogenetics and molecular genetics", section on 'Array comparative genomic hybridization'.)

Next-generation sequencing (NGS) technologies provide complete exome or whole genome sequencing data. (See "Next-generation DNA sequencing (NGS): Principles and clinical applications".)

Various tests for cancer early detection and prognostication use assays of expression of multiple genes simultaneously. (See "Tools for genetics and genomics: Gene expression profiling", section on 'Clinical use'.)

Advances in mRNA technologies have facilitated the development of mRNA-based vaccines, including vaccines against coronavirus disease 2019 (COVID-19) [34]. (See "COVID-19: Vaccines", section on 'General principles'.)

These technologies and the associated computational tools have transformed the way genetic information is gathered and integrated into clinical practice due to the enormous amount of information they can generate.

Because an individual's genomic sequence remains largely invariant over their lifetime, it is possible that genome-wide data generated in childhood or early adulthood could be used across many aspects of clinical care over different stages of life. However, this potential benefit must be weighed against potential adverse effects related to ethical issues of testing children, privacy concerns, and costs and burdens of follow-up testing, which carries its own risk-benefit calculation. (See "Genetic testing", section on 'Ethical, legal, and psychosocial issues'.)

Online tutorials and instructional videos are available from a number of sources, such as: http://learn.genetics.utah.edu.

Clinical classification of pathogenicity — All types of genetic variation have the potential to cause disease or contribute to disease susceptibility. The role of a variant in disease pathogenesis is determined by its functional impact on gene expression or protein function.

A variety of classification conventions have been applied by clinical laboratories when reporting back results of genetic testing. In an attempt to standardize this, the American College of Medical Genetics and Genomics (ACMG), together with the Association for Molecular Pathology and the College of American Pathologists, developed a set of guidelines for variant reporting and have recommended use of a five-tier classification system consisting of the following designations [35]:

Pathogenic – A pathogenic variant is a disease-causing variant, as determined by very strong genetic and experimental evidence, including consistent familial co-segregation with disease and definitive functional studies.

Likely pathogenic – A likely pathogenic variant is a variant with strong, but not definitive, evidence of pathogenicity based on its similarity to known pathogenic variants, co-segregation with disease in families or populations, and functional evidence.

Uncertain significance – A variant of uncertain significance (VUS) is a variant for which the specific criteria for the other four categories are not met, or for which contradictory lines of evidence support both benign and pathogenic classifications.

Likely benign – A likely benign variant is a variant with multiple supporting (but not conclusive) lines of evidence suggesting it is not disease causing.

Benign – A benign variant is a variant with conclusive evidence that it is not disease causing, as determined typically (but not exclusively) by a high prevalence of the variant in the general (healthy) population, at a prevalence that exceeds that of the suspected disease.

The recommendations were accompanied by a detailed description of the process for variant classification [35]. This process combines multiple lines of evidence, including allele frequency data from the general population, evidence for co-segregation of the variant with disease in families, results from prediction algorithms that consider functional potential and cross-species conservation, and experimental studies demonstrating altered gene or protein function. Individual testing laboratories may classify a particular genetic variant differently depending on their assessments of the evidence. The criteria, and an approach to discussing the clinical implications, are discussed in more detail separately. (See "Secondary findings from genetic testing", section on 'Definitions and classification of variants'.)
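The five tiers and the rule for uncertain significance described above can be sketched in code. The following Python example is a minimal illustration only: the names `Tier` and `resolve_tier` are hypothetical, and the function assumes that the pathogenic-side and benign-side evidence have already been reviewed separately. It does not implement the detailed evidence-weighting criteria in the guideline [35].

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    """The five reporting tiers named in the text."""
    PATHOGENIC = "Pathogenic"
    LIKELY_PATHOGENIC = "Likely pathogenic"
    UNCERTAIN_SIGNIFICANCE = "Uncertain significance"
    LIKELY_BENIGN = "Likely benign"
    BENIGN = "Benign"


PATHOGENIC_SIDE = {Tier.PATHOGENIC, Tier.LIKELY_PATHOGENIC}
BENIGN_SIDE = {Tier.BENIGN, Tier.LIKELY_BENIGN}


def resolve_tier(pathogenic_call: Optional[Tier],
                 benign_call: Optional[Tier]) -> Tier:
    """Combine the outcomes of the pathogenic and benign evidence reviews.

    Each argument is the tier supported by that side's evidence, or None
    when that side's criteria were not met (hypothetical inputs).
    """
    if pathogenic_call is not None and pathogenic_call not in PATHOGENIC_SIDE:
        raise ValueError("pathogenic_call must be a pathogenic-side tier or None")
    if benign_call is not None and benign_call not in BENIGN_SIDE:
        raise ValueError("benign_call must be a benign-side tier or None")

    # Contradictory evidence on both sides: uncertain significance.
    if pathogenic_call is not None and benign_call is not None:
        return Tier.UNCERTAIN_SIGNIFICANCE
    # Criteria for none of the other four categories met: uncertain significance.
    if pathogenic_call is None and benign_call is None:
        return Tier.UNCERTAIN_SIGNIFICANCE
    # Exactly one side met its criteria; report that tier.
    return pathogenic_call if pathogenic_call is not None else benign_call
```

For example, `resolve_tier(Tier.PATHOGENIC, Tier.LIKELY_BENIGN)` returns `Tier.UNCERTAIN_SIGNIFICANCE`, reflecting the handling of contradictory evidence. Different laboratories may supply different inputs for the same variant, which is one way their classifications can diverge.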

Searchable databases of human variants are available on the following websites:

ClinGen, an NIH-funded resource that defines the clinical relevance of genes and variants.

The ClinVar website of the National Library of Medicine [36].

The Genome Aggregation Database (gnomAD) [37].

The Genome Reference Consortium (GRC; https://www.ncbi.nlm.nih.gov/grc) helps provide standardization to the reference genome to which sequence data are compared. The University of California, Santa Cruz (UCSC) Genome Browser (https://genome.ucsc.edu/index.html) is a visualization tool developed to help analyze the human genome.

Use of sequence information in clinical care

Screening — The reference genome sequence from the Human Genome Project has made it possible to inform any individual about variation between their genotype and that of the reference genome, even when they are healthy. The Genetics Home Reference (https://ghr.nlm.nih.gov/) is a robust source of information for understanding the impact of genetic changes on health.

Carrier screening for some recessive genetic disorders is routinely performed as part of preconception or early prenatal testing, with partner testing if the mother is a carrier. In some cases, the specific genes and/or variants tested are tailored to the individual's racial or ethnic background, whereas in others, testing is performed in all individuals. (See "Preconception and prenatal carrier screening for genetic disorders more common in people of Ashkenazi Jewish descent and others with a family history of these disorders" and "Hemoglobinopathy: Screening and counseling in the reproductive setting and fetal diagnosis".)

Screening of healthy adults for monogenic disorders has been proposed. Generally, other means of identifying these disorders are preferred when they exist (eg, biochemical testing for glucose-6-phosphate dehydrogenase [G6PD] deficiency; iron studies rather than testing for HFE variants). However, organizations such as the Centers for Disease Control and Prevention (CDC) have begun to consider conditions with strong actionability that may be appropriate for population screening [38]. Frameworks for screening healthy individuals have been proposed [39]. In addition, opportunistic screening, where exome or genome sequencing data generated for diagnostic purposes are screened for pathogenic variants in unrelated genes, is recommended by the American College of Medical Genetics and Genomics for selected genes associated with highly actionable conditions [40].

Characterizing blood group antigens using genetic rather than serologic testing is gaining uptake, especially in populations with a high risk of alloimmunization or complex serologic results. (See "Pretransfusion testing for red blood cell transfusion", section on 'RBC genotyping' and "Red blood cell transfusion in sickle cell disease: Indications and transfusion techniques", section on 'Genetic RBC antigen typing'.)

Perinatal screening, either by sampling of prenatal cells or cell-free DNA, or before implantation of an embryo conceived by in vitro fertilization, can be used to identify fetuses or embryos with a deleterious genetic variant. (See "Prenatal screening for common aneuploidies using cell-free DNA" and "Preimplantation genetic testing".)

Newborn screening, which may involve genetic or biochemical tests, is routinely performed to identify life-threatening metabolic or other disorders with onset in infancy or early childhood that are common and/or amenable to medical interventions. (See "Overview of newborn screening".)

Polygenic risk predictions for complex conditions are gaining interest in clinical settings. Unlike monogenic disorders for which there are laboratory tests (eg, iron studies for hemochromatosis), there are often no biochemical tests that could provide insight about the genetic risks in polygenic disorders. Early evidence suggests that behavioral modifications can be effective for mitigating high polygenic risks [41].

Another change has come with the marketing of genetic screening for common variants directly to consumers (DTC marketed genetics). This practice has accelerated discussions of bioethical issues and has raised a range of questions regarding the oversight of genetic testing, as discussed separately. (See "Personalized medicine", section on 'Direct-to-consumer testing'.)
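For the polygenic risk predictions mentioned above, one common way of summarizing many common variants is a weighted sum of risk-allele counts. The Python sketch below illustrates that arithmetic only. The function name `polygenic_score`, the variant identifiers, and the weights are hypothetical; real scores depend on the source studies, the ancestry of the reference population, and calibration steps that are not shown here.

```python
from typing import Mapping


def polygenic_score(weights: Mapping[str, float],
                    dosages: Mapping[str, int]) -> float:
    """Return the weighted sum of risk-allele counts.

    weights: per-variant effect size (hypothetical values; in practice
             often taken from association studies).
    dosages: number of risk alleles the individual carries at each
             variant (0, 1, or 2).
    Variants without a genotype in `dosages` are skipped.
    """
    score = 0.0
    for variant_id, weight in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            continue  # genotype not available for this variant
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {variant_id} must be 0, 1, or 2")
        score += weight * dosage
    return score


# Hypothetical example: three variants, one of which was not genotyped.
example_weights = {"var_a": 0.5, "var_b": -0.25, "var_c": 1.0}
example_dosages = {"var_a": 2, "var_b": 1}
example_score = polygenic_score(example_weights, example_dosages)
```

A raw score of this kind is usually interpreted relative to the distribution of scores in a reference population, not as an absolute risk.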

Diagnostic testing and familial variant testing — Diagnostic genetic testing is used to provide a molecular explanation for an individual’s condition. In addition to confirming a specific diagnosis in an individual with disease manifestations, diagnostic genetic testing can affect treatment strategies, including surveillance for pleiotropic outcomes that may manifest in the future. In general, diagnostic genetic testing is conducted on patients for whom the disease genotype is suspected based on a screening test.

This subject is discussed in detail separately (see "Genetic testing") and in disease-specific reviews. A brief summary is as follows:

For many highly penetrant autosomal dominant disorders such as certain cancer syndromes, testing of asymptomatic relatives of an affected individual can have important benefits for those who test positive (carriers) and those who test negative (noncarriers) for a familial variant that had been identified in a relative.

The concept of testing relatives for an identified pathogenic or likely pathogenic variant is referred to as familial variant testing or cascade testing. Many clinical laboratories offer this testing at a reduced price if the laboratory identifies the variant in the proband. (See "Genetic testing", section on 'Which relatives should be tested?'.)

One example is familial adenomatous polyposis, due to pathogenic variants in the APC gene. Individuals who carry the familial disease variant can undergo endoscopic surveillance or colectomy, while those who do not carry the variant can be reassured that their risk for polyp development is not increased. (See "Molecular genetics of colorectal cancer".)

Another example is hereditary breast and ovarian cancer syndrome caused by pathogenic variants in the BRCA1 or BRCA2 genes. Individuals who carry the familial variant may undergo increased surveillance and/or risk-reducing surgeries, while noncarriers can avoid these procedures. (See "Genetic testing and management of individuals at risk of hereditary breast and ovarian cancer syndromes".)

Identification of genetic risk factors can influence how other health conditions are managed. For instance, use of medications known to prolong the QT interval is discouraged for patients who have variants associated with congenital long QT syndrome. (See "Congenital long QT syndrome: Treatment".)

Individuals sometimes undergo genetic testing because of a personal or family history of disease when a genetic variant has not previously been identified in the family. In these instances, negative findings are often called "uninformative"; these individuals should generally still be managed as having an increased risk for disease.

Evaluation of complex neurodevelopmental disorders in children may reveal a genetic cause, with implications for treatment, prognosis, and family testing. (See "Next-generation DNA sequencing (NGS): Principles and clinical applications", section on 'Children'.)

Assay of germline or tumor tissue for cancer disease variants may inform cancer treatments, additional risk reduction interventions, and family testing. (See "Personalized medicine".)

Testing for specific variants in a person with a disease such as cystic fibrosis can help determine which therapy is appropriate. (See "Cystic fibrosis: Clinical manifestations and diagnosis", section on 'Molecular diagnosis'.)

Prenatal testing of fetal cells for chromosome aberrations is typically offered to pregnant individuals at high risk due to advanced maternal age, an abnormal screening test, or a known familial genetic condition for which the genotype is known. (See "Prenatal care: Initial assessment" and "Preconception and prenatal carrier screening for genetic disorders more common in people of Ashkenazi Jewish descent and others with a family history of these disorders" and "Diagnostic amniocentesis" and "Chorionic villus sampling".)

Advantages and disadvantages of different testing methods are discussed separately. (See "Genetic testing".)
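The rationale for familial variant (cascade) testing described above can be illustrated with simple Mendelian arithmetic. The Python sketch below estimates the probability, before testing, that a relative carries a familial variant. It rests on simplifying assumptions that are not stated in the text: an autosomal dominant variant that is heterozygous in the proband and inherited rather than de novo, a single line of descent, and no additional clinical or genotype information. The function name `prior_carrier_probability` is hypothetical.

```python
from fractions import Fraction


def prior_carrier_probability(degree: int) -> Fraction:
    """Pre-test probability that a relative carries the familial variant.

    degree: degree of relationship to a known carrier
            (1 = parent, sibling, or child; 2 = grandparent, aunt,
            uncle, niece, nephew, or grandchild; and so on).
    Assumes each step of relationship halves the probability
    (simplified autosomal dominant model, see the assumptions above).
    """
    if degree < 1:
        raise ValueError("degree must be 1 or greater")
    return Fraction(1, 2) ** degree


first_degree = prior_carrier_probability(1)   # 1/2
second_degree = prior_carrier_probability(2)  # 1/4
```

Under these assumptions, testing first-degree relatives first and then extending to the relatives of those who test positive concentrates testing on individuals with the highest pre-test probability.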

Pharmacogenetic testing — Genetic testing can identify variants associated with drug metabolism that affect medication dosing and efficacy.

This information can identify individuals who are at increased risk for an adverse drug event (eg, hemolysis in an individual with G6PD deficiency; Stevens-Johnson syndrome in an individual treated with carbamazepine) or those for whom a medication may not provide expected benefits (eg, clopidogrel in patients with certain CYP2C19 variants). (See "Overview of pharmacogenomics".)

Increasingly, health systems are providing pharmacogenomic testing while patients are healthy and storing genetic information until needed. However, some approaches to exome and genome sequencing may be inadequate for providing comprehensive pharmacogenetic information.

Therapeutics — A number of therapies that manipulate gene expression levels and/or gene sequence information are under investigation. Some have progressed to early clinical use. (See "Overview of gene therapy, gene editing, and gene silencing".)

RESEARCH AREAS — Information about individual and population genetic variation can provide new directions for research involving diagnostic testing, drug development, and understanding of the molecular underpinnings of a number of types of traits and diseases. However, this type of research also poses significant challenges including the need for a large infrastructure and informatics resources. There is a paucity of evidence-based clinical decision support for many conditions, and provider education is often lacking.

Some of the relevant questions address how to best do the following:

Advance noninvasive means for prenatal genetic testing.

Improve diagnostic testing and care for genetic disorders.

Identify opportunities for personalized management of cancer.

Enhance discovery for complex and multigenic traits.

Optimally store the data so that patients and clinicians can access it as needed.

Safeguard patient privacy and prevent discrimination based on genetic data.

Share data with the research community in a way that is accessible and easy to use across multiple platforms.

Take advantage of differences between human DNA and that of infectious organisms to create new antimicrobials.

Take advantage of new tools for therapeutic gene therapy, gene editing, or gene silencing.

Genome-wide association studies (GWAS) are useful for identifying previously unknown susceptibility loci in complex diseases. (See "Genetic association and GWAS studies: Principles and applications".)

Other strategies for identifying disease genes take advantage of next-generation sequencing (NGS) and systems biology approaches, as it has been recognized that susceptibility loci identified by GWAS account for only a small proportion of the estimated heritability of common, complex traits [42-44]. (See "Next-generation DNA sequencing (NGS): Principles and clinical applications".)

Gene therapy, gene editing, and gene silencing methods and some of their potential clinical uses are discussed separately. (See "Overview of gene therapy, gene editing, and gene silencing".)

INFORMATION FOR PATIENTS — UpToDate offers two types of patient education materials, "The Basics" and "Beyond the Basics." The Basics patient education pieces are written in plain language, at the 5th to 6th grade reading level, and they answer the four or five key questions a patient might have about a given condition. These articles are best for patients who want a general overview and who prefer short, easy-to-read materials. Beyond the Basics patient education pieces are longer, more sophisticated, and more detailed. These articles are written at the 10th to 12th grade reading level and are best for patients who want in-depth information and are comfortable with some medical jargon.

Here are the patient education articles that are relevant to this topic. We encourage you to print or e-mail these topics to your patients. (You can also locate patient education articles on a variety of subjects by searching on "patient info" and the keyword(s) of interest.)

Basics topics (see "Patient education: Genetic testing (The Basics)")

SUMMARY

Nuclear and mitochondrial genomes – The nuclear genome provides the complete blueprint (other than mitochondrial genes) for the human genome in every nucleated cell in the body. It consists of approximately 21,000 protein-coding genes, along with RNA-coding, regulatory, and structural sequences distributed among 23 pairs of chromosomes (46,XX in females and 46,XY in males). Mitochondrial DNA, inherited exclusively from the mother, encodes 37 genes, 13 of which are protein coding. (See 'Nuclear genome' above and 'Mitochondrial genome' above.)

DNA, RNA, and protein – DNA consists of four bases (nucleotides; A, T, C, and G) on a sugar-phosphate backbone in two antiparallel strands with complementary base-pairing to form a double helix that supplies genetic information through its sequence. Transcription is the process by which messenger RNA (mRNA; a single-stranded, less-stable nucleic acid than DNA) is synthesized using the DNA as a template (figure 6). Translation is the process by which proteins are synthesized using mRNA as a template (figure 10), using a combination of 20 amino acids. (See 'Gene expression' above.)

Inheritance patterns – Inheritance of single gene (monogenic) disorders can follow classic Mendelian patterns (autosomal dominant, autosomal recessive, X-linked); sometimes these patterns can vary due to factors such as incomplete penetrance, variable expression, mosaicism, and other processes. Most common traits and disorders are polygenic or multifactorial and are further influenced by environmental factors. (See 'Modes of inheritance' above and "Inheritance patterns of monogenic disorders (Mendelian and non-Mendelian)".)

Types of genetic variation – Genetic variation can occur in the germline or in somatic tissues. The consequences can be advantageous (facilitating genetic diversity and evolution) or deleterious (leading to disease due to dysfunction of vital cellular proteins). Variation can occur on different scales, from single nucleotide differences, insertions, and small deletions, to epigenetic changes, to alterations of large regions of chromosomes (copy number variants, translocations) or entire chromosomes (aneuploidies). Large-scale variation affecting entire chromosomes or larger regions of chromosomes is discussed in detail separately. (See 'Genetic variation' above and "Basic genetics concepts: Chromosomes and cell division", section on 'Numerical and structural chromosome variation'.)

Clinical implications – The clinical implications of genetics continue to expand. Examples include screening in healthy people; testing for disease risk, diagnosis, or disease stratification; or treating a variety of diseases from infections, to cancer, to various single gene disorders. By convention, variants are classified from pathogenic to benign according to the confidence in their association with disease. (See 'Use of sequence information in clinical care' above and 'Clinical classification of pathogenicity' above.)

  1. Willyard C. New human gene tally reignites debate. Nature 2018; 558:354.
  2. Collins FS, Green ED, Guttmacher AE, et al. A vision for the future of genomics research. Nature 2003; 422:835.
  3. Plasterk RH. RNA silencing: the genome's immune system. Science 2002; 296:1263.
  4. Temin HM. The DNA Provirus Hypothesis. Nobel Prize Lecture, 1975. Available at: nobelprize.org/medicine/laureates/1975/temin-lecture.pdf (Accessed on March 21, 2012).
  5. Murphy A, Chu JH, Xu M, et al. Mapping of numerous disease-associated expression polymorphisms in primary peripheral blood CD4+ lymphocytes. Hum Mol Genet 2010; 19:4745.
  6. Nirenberg M. Protein synthesis and the RNA code. Harvey Lect 1965; 59:155.
  7. Barrell BG, Bankier AT, Drouin J. A different genetic code in human mitochondria. Nature 1979; 282:189.
  8. Hall BD. Mitochondria spring surprises. Nature 1979; 282:129.
  9. Chakravarti A. It's raining SNPs, hallelujah? Nat Genet 1998; 19:216.
  10. Li WH, Sadler LA. Low nucleotide diversity in man. Genetics 1991; 129:513.
  11. Stephens JC, Schneider JA, Tanguay DA, et al. Haplotype variation and linkage disequilibrium in 313 human genes. Science 2001; 293:489.
  12. Halushka MK, Fan JB, Bentley K, et al. Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nat Genet 1999; 22:239.
  13. 1000 Genomes Project Consortium, Abecasis GR, Altshuler D, et al. A map of human genome variation from population-scale sequencing. Nature 2010; 467:1061.
  14. Rideout WM 3rd, Coetzee GA, Olumi AF, Jones PA. 5-Methylcytosine as an endogenous mutagen in the human LDL receptor and p53 genes. Science 1990; 249:1288.
  15. Nabel CS, Manning SA, Kohli RM. The curious chemical biology of cytosine: deamination, methylation, and oxidation as modulators of genomic potential. ACS Chem Biol 2012; 7:20.
  16. Mills RE, Luttig CT, Larkins CE, et al. An initial map of insertion and deletion (INDEL) variation in the human genome. Genome Res 2006; 16:1182.
  17. Krawczak M, Cooper DN. Gene deletions causing human genetic disease: mechanisms of mutagenesis and the role of the local DNA sequence environment. Hum Genet 1991; 86:425.
  18. Taylor MS, Ponting CP, Copley RR. Occurrence and consequences of coding sequence insertions and deletions in Mammalian genomes. Genome Res 2004; 14:555.
  19. Liu R, Paxton WA, Choe S, et al. Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection. Cell 1996; 86:367.
  20. Samson M, Libert F, Doranz BJ, et al. Resistance to HIV-1 infection in caucasian individuals bearing mutant alleles of the CCR-5 chemokine receptor gene. Nature 1996; 382:722.
  21. Eugen-Olsen J, Iversen AK, Garred P, et al. Heterozygosity for a deletion in the CKR-5 gene leads to prolonged AIDS-free survival and slower CD4 T-cell decline in a cohort of HIV-seropositive individuals. AIDS 1997; 11:305.
  22. Kerem B, Rommens JM, Buchanan JA, et al. Identification of the cystic fibrosis gene: genetic analysis. Science 1989; 245:1073.
  23. Riordan JR, Rommens JM, Kerem B, et al. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science 1989; 245:1066.
  24. Rommens JM, Zengerling S, Burns J, et al. Identification and regional localization of DNA markers on chromosome 7 for the cloning of the cystic fibrosis gene. Am J Hum Genet 1988; 43:645.
  25. A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes. The Huntington's Disease Collaborative Research Group. Cell 1993; 72:971.
  26. Lalioti MD, Scott HS, Buresi C, et al. Dodecamer repeat expansion in cystatin B gene in progressive myoclonus epilepsy. Nature 1997; 386:847.
  27. Muragaki Y, Mundlos S, Upton J, Olsen BR. Altered growth and branching patterns in synpolydactyly caused by mutations in HOXD13. Science 1996; 272:548.
  28. Verkerk AJ, Pieretti M, Sutcliffe JS, et al. Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell 1991; 65:905.
  29. Buxton J, Shelbourne P, Davies J, et al. Detection of an unstable fragment of DNA specific to individuals with myotonic dystrophy. Nature 1992; 355:547.
  30. Harley HG, Brook JD, Rundle SA, et al. Expansion of an unstable DNA region and phenotypic variation in myotonic dystrophy. Nature 1992; 355:545.
  31. Schlötterer C. Evolutionary dynamics of microsatellite DNA. Chromosoma 2000; 109:365.
  32. Ellegren H. Microsatellite mutations in the germline: implications for evolutionary inference. Trends Genet 2000; 16:551.
  33. Talamanca L, Gobet C, Naef F. Sex-dimorphic and age-dependent organization of 24-hour gene expression rhythms in humans. Science 2023; 379:478.
  34. Pardi N, Hogan MJ, Porter FW, Weissman D. mRNA vaccines - a new era in vaccinology. Nat Rev Drug Discov 2018; 17:261.
  35. Richards S, Aziz N, Bale S, et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet Med 2015; 17:405.
  36. https://www.ncbi.nlm.nih.gov/clinvar/.
  37. https://gnomad.broadinstitute.org/about (Accessed on April 15, 2020).
  38. https://www.cdc.gov/genomics/implementation/toolkit/tier1.htm (Accessed on June 24, 2020).
  39. https://doi.org/10.31478/201812a (Accessed on June 30, 2020).
  40. Miller DT, Lee K, Gordon AS, et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2021 update: a policy statement of the American College of Medical Genetics and Genomics (ACMG). Genet Med 2021; 23:1391.
  41. Khera AV, Emdin CA, Drake I, et al. Genetic Risk, Adherence to a Healthy Lifestyle, and Coronary Disease. N Engl J Med 2016; 375:2349.
  42. Prins BP, Lagou V, Asselbergs FW, et al. Genetics of coronary artery disease: genome-wide association studies and beyond. Atherosclerosis 2012; 225:1.
  43. Guerra SG, Vyse TJ, Cunninghame Graham DS. The genetics of lupus: a functional perspective. Arthritis Res Ther 2012; 14:211.
  44. Marian AJ. Elements of 'missing heritability'. Curr Opin Cardiol 2012; 27:197.
Topic 2900 Version 27.0
