What is the difference between non-coding and intergenic regions?


The initial question was about understanding what lies downstream of a gene in a eukaryotic organism. I understand that this region is located 3' of a gene, and therefore I would expect to find non-coding regions there, but is there a difference between a non-coding region and an intergenic region?

A little background on what I am doing: I am studying the effect of retrotransposons in S. cerevisiae populations. I have already identified the potential impacts of these retrotransposons, and these were the consequences:

  • Upstream gene variant: 49
  • Downstream gene variant: 42
  • Intergenic variant: 4
  • Transcript ablation: 3
  • Coding sequence variant: 1
  • Feature elongation: 1
  • 3' UTR variant: 1

So is it reasonable to say that downstream of a gene there are non-coding genes?

"Intergenic" is, well, an embarrassment, though it can be hard to avoid. Intergenic means, literally, between genes. Genes are, as you'd expect, genetically defined as regions of the chromosome with some genetic role. Between genes, we have, well, junk, spam, without meaning or significance, since otherwise, it would be a gene. In the 1990s everybody knew most of the genome was "junk DNA" and it was pretty straightforward. And the central dogma was DNA makes mRNA makes protein and if there was no mRNA making protein, or even no long protein, then they assumed there was no gene.

Well, the problem with using these terms nowadays is, first, it is very hard to show any given piece of DNA never has a role. After all, you can often remove a whole gene without visible effect on an organism, so how can some noncoding sequence 'between genes' be proved conclusively to be meaningless and of no relevance to either of its neighbors? And second, DNA definitely does have a role even if all it does is produce non-coding RNA, or act as a regulatory sequence that does. So for example, on OMIM you can look up many records for "intergenic", with results like H19, for which one term is "LONG INTERGENIC NONCODING RNA H19". Note also H19 is listed as the HGNC approved gene symbol, so it is an intergenic gene. Tricky, eh? (I have great respect for OMIM - this isn't their fault) Also, enhancers can be quite a long way from a gene, even past the end of another gene or affecting multiple genes, so even though a mutation in the enhancer might genetically act as an allele of a specific gene, it might be described in sequence terms as located in an intergenic region.

Bottom line: the intergenic regions you're reading about are going to be between things that are recognized as genes, and may or may not be known to have some effect on one or the other (or both… ). They won't contain CDS, which is always considered to be a gene, nor 5'UTR or 3'UTR because those are transcribed before or after a CDS. Past that, how they are being distinguished from "upstream sequence" and "downstream sequence", that could be completely arbitrary. They might have gone to WormBase and pulled out curated gene records, and added some number of base pairs before the start site as promoter - but I don't know that. You'll just have to try to find the particular reference for whatever tool you are using.
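To make the "some number of base pairs" point concrete, here is a minimal sketch of how a tool might label a position relative to annotated genes. It is not the logic of any particular annotation tool: the `Gene` record, the `classify` function, and the 5000 bp flank are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def classify(pos, genes, flank=5000):
    """Label a position relative to each gene; 'flank' is an arbitrary window."""
    labels = set()
    for g in genes:
        if g.start <= pos <= g.end:
            labels.add(f"within {g.name}")
        elif g.start - flank <= pos < g.start:
            # Left of the gene: upstream on "+", downstream on "-".
            side = "upstream" if g.strand == "+" else "downstream"
            labels.add(f"{side} of {g.name}")
        elif g.end < pos <= g.end + flank:
            # Right of the gene: downstream on "+", upstream on "-".
            side = "downstream" if g.strand == "+" else "upstream"
            labels.add(f"{side} of {g.name}")
    # Only positions outside every gene and every flank are called intergenic.
    return labels or {"intergenic"}


# Hypothetical annotations, not real yeast genes.
genes = [Gene("A", 10000, 12000, "+"), Gene("B", 30000, 31000, "-")]
print(classify(20000, genes))               # beyond both 5 kb flanks
print(classify(20000, genes, flank=10000))  # same position, wider flank
```

With the default flank, position 20000 is labelled intergenic. With a 10000 bp flank, the same position is labelled downstream of both genes, which is why the counts in a table like the one in the question depend on the tool's settings.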

An intergenic region is just one of many types of non-coding sequence.

Others include:

  • promoters
  • regulatory binding sites
  • terminators
  • introns
  • aptamers
  • ribosome entry sites

There's a lot of stuff that's part of genes besides the protein coding sequence.

A good way to get a picture of the sorts of things there are besides coding regions is to browse the Sequence Ontology. If you browse to coding sequence and start browsing parent and cousin terms, you'll find a vast number including all of the things that I mentioned above, as well as intergenic region, with definitions associated with each.


An intergenic region is a stretch of DNA located between genes or clusters of genes that itself contains few or no genes. Occasionally some intergenic DNA acts to control genes close by, but most of it has no currently known function. It is one of the DNA sequences collectively referred to as junk DNA, though it is only one of the phenomena given that label.

Differences between coding and non-coding regions in the Trichomonas vaginalis genome: an actin gene as a locus model

The sequence of a cloned genomic fragment of Trichomonas vaginalis containing a complete actin gene was determined. An uninterrupted open reading frame of 1128 nucleotides was found that codes for actin. Two overlapping consensus promoter sequences for T. vaginalis were found 12 nucleotides upstream of the actin initiation codon. In addition to actin, two incomplete open reading frames were found at the 5′ and 3′ ends of the clone. These two sequences are expressed and showed similarity to adenylate cyclase genes and a yeast hypothetical protein. The overall sequence showed a higher G+C content and a lower frequency of repeated sequences in the coding regions when compared with the non-coding regions. A similar unequal nucleotide distribution was found in various T. vaginalis genes retrieved from databases.
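The G+C comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names `gc_content` and `compare_regions`, the 0-based half-open coordinates, and the toy sequence are hypothetical and are not taken from the study.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "GC" for b in counted) / len(counted)


def compare_regions(seq, cds_start, cds_end):
    """Return (coding GC, non-coding GC) for one open reading frame.

    Coordinates are 0-based and half-open, like Python slices.
    """
    coding = seq[cds_start:cds_end]
    noncoding = seq[:cds_start] + seq[cds_end:]
    return gc_content(coding), gc_content(noncoding)


# Toy sequence: AT-only flanks around a GC-only "coding" core.
coding_gc, noncoding_gc = compare_regions("ATATGCGCGCATAT", 4, 10)
print(coding_gc, noncoding_gc)
```

On a real clone the same comparison would be run over every annotated open reading frame and its flanking sequence, rather than over a single made-up example.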

Functional Microbial Diversity in Contaminated Environment and Application in Bioremediation

Satyanarayan Panigrahi, ... Toleti Subba Rao, in Microbial Diversity in the Genomic Era, 2019

Ribosomal Intergenic Spacer Analysis (RISA)

rRNA intergenic spacer analysis (RISA) is a microbial community analysis method that involves PCR amplification of a region of the rRNA gene operon between the small (16S) and large (23S) subunits (Fisher and Triplett, 1999), called the intergenic spacer region (ISR) (Fig. 21.8).

Figure 21.8. Ribosomal intergenic spacer analysis: a typical overview.

By using oligonucleotide primers targeted to conserved regions in the 16S and 23S genes, RISA fragments can be generated from most of the dominant bacteria in an environmental sample. While the majority of the rRNA operon serves a structural function, portions of the 16S–23S intergenic region can encode tRNAs, depending on the bacterial species. However, the taxonomic value of the ISR lies in its significant heterogeneity in both length and nucleotide sequence. ISR lengths range between 150 and 1500 bp, with the majority falling between 150 and 500 bp. The automated version of RISA is known as ARISA and involves the use of a fluorescence-labeled forward primer; ISR fragments are detected automatically by a laser detector. ARISA allows simultaneous analysis of many samples; however, the technique has been shown to overestimate microbial richness and diversity (Fisher and Triplett, 1999). RISA has been used to detect microbial populations involved in the degradation of polyaromatic hydrocarbons at low temperature under aerobic and nitrate-reducing enriched soil conditions (Eriksson et al., 2003). It was observed that the dominant bacteria present in the enrichments, which belonged to the Alpha-, Beta-, and Gamma-proteobacteria, were successfully brought into culture. A recent study utilized ARISA to investigate the diversity of hydrocarbon-degrading bacteria and the bacterial community response in oil spill-contaminated beach sands in the Gulf of Mexico (Kostka et al., 2011). The authors observed increased abundance of Alcanivorax sequences in the oil-contaminated sand. Ranjard et al. (2001) used ARISA to characterize the bacterial communities from four different types of soil. Their results demonstrated that ARISA is a very effective and sensitive method for detecting differences between complex bacterial communities at various spatial scales (between- and within-site variability).
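As a rough illustration of how an ARISA profile might be summarized, the sketch below bins detected fragments by length and derives a richness count and a Shannon index from the binned peak areas. The 150 to 1500 bp window follows the ISR range quoted above. The 3 bp bin width, the function names, and the peak values are assumptions made for this example, not parameters from the cited studies.

```python
import math
from collections import defaultdict


def bin_peaks(peaks, bin_width=3, min_len=150, max_len=1500):
    """Group (fragment_length_bp, peak_area) pairs into fixed-width length bins."""
    bins = defaultdict(float)
    for length, area in peaks:
        if min_len <= length <= max_len and area > 0:
            bins[int(length // bin_width) * bin_width] += area
    return dict(bins)


def richness_and_shannon(bins):
    """Return (number of occupied bins, Shannon index of relative peak areas)."""
    total = sum(bins.values())
    props = [area / total for area in bins.values()]
    shannon = -sum(p * math.log(p) for p in props)
    return len(bins), shannon


# Hypothetical peaks: two nearby fragments merge into one bin, and the
# 100 bp peak is dropped because it falls outside the ISR size window.
peaks = [(300.2, 10.0), (301.5, 10.0), (450.0, 20.0), (100.0, 5.0)]
bins = bin_peaks(peaks)
print(bins, richness_and_shannon(bins))
```

Because a single organism can carry several rRNA operons with different spacer lengths, a bin count of this kind is at best a proxy for richness, which is consistent with the overestimation noted above.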


Coding DNA refers to the DNA in the genome that contains protein-coding sequence, while noncoding DNA refers to the rest of the DNA, which does not code for proteins.

Percentage in the Genome

Coding DNA accounts for only about 1% of the human genome, while noncoding DNA accounts for about 99%.


Coding DNA consists of exons, while noncoding DNA consists of regulatory elements, noncoding RNA genes, introns, pseudogenes, repeating sequences, and telomeres.

Encoding for Proteins

Coding DNA encodes proteins, while noncoding DNA does not.

Resultants of Transcription

Coding DNA is transcribed into mRNAs, while the transcribed portion of noncoding DNA gives rise to tRNAs, rRNAs, and other regulatory RNAs.

The function of the Gene Products

Proteins encoded by coding DNA have structural, functional, and regulatory importance in the cell while noncoding DNA is important for controlling gene activity.


Coding DNA is the DNA in the genome that encodes proteins. Generally, protein-coding genes undergo transcription to synthesize mRNA. In eukaryotes, the coding region of protein-coding genes is interrupted by introns, which are removed after transcription. The mRNAs then undergo translation to produce proteins. Proteins play a key role by serving as structural, functional, and regulatory components of the cell. In contrast, noncoding DNA is the other type of DNA, representing around 99% of the genome. It contains genes for noncoding RNAs, including tRNAs, rRNAs, and other regulatory RNAs, which are important in the translation of mRNA. In addition, noncoding DNA includes regulatory elements, introns, pseudogenes, repeating sequences, and telomeres. Therefore, the main difference between coding DNA and noncoding DNA is the type of genes present and their gene products.



About the Author: Lakna

Lakna, a graduate in Molecular Biology & Biochemistry, is a Molecular Biologist and has a broad and keen interest in the discovery of nature-related things.


The amount of total genomic DNA varies widely between organisms, and the proportion of coding and non-coding DNA within these genomes varies greatly as well. For example, it was originally suggested that over 98% of the human genome does not encode protein sequences, including most sequences within introns and most intergenic DNA, [16] while 20% of a typical prokaryote genome is non-coding. [3]

In eukaryotes, genome size, and by extension the amount of non-coding DNA, is not correlated to organism complexity, an observation known as the C-value enigma. [17] For example, the genome of the unicellular Polychaos dubium (formerly known as Amoeba dubia) has been reported to contain more than 200 times the amount of DNA in humans. [18] The pufferfish Takifugu rubripes genome is only about one eighth the size of the human genome, yet seems to have a comparable number of genes; approximately 90% of the Takifugu genome is non-coding DNA. [16] Therefore, most of the difference in genome size is not due to variation in the amount of coding DNA; rather, it is due to a difference in the amount of non-coding DNA. [19]

In 2013, a new "record" for the most efficient eukaryotic genome was reported for Utricularia gibba, a bladderwort plant that has only 3% non-coding DNA and 97% coding DNA. Parts of the non-coding DNA were being deleted by the plant, which suggested that non-coding DNA may not be as critical for plants, even though non-coding DNA is useful for humans. [15] Other studies on plants have discovered crucial functions in portions of non-coding DNA that were previously thought to be negligible and have added a new layer to the understanding of gene regulation. [20]

Cis- and trans-regulatory elements

Cis-regulatory elements are sequences that control the transcription of a nearby gene. Many such elements are involved in the evolution and control of development. [21] Cis-elements may be located in 5' or 3' untranslated regions or within introns. Trans-regulatory elements control the transcription of a distant gene.

Promoters facilitate the transcription of a particular gene and are typically upstream of the coding region. Enhancer sequences may also exert very distant effects on the transcription levels of genes. [22]

Introns

Introns are non-coding sections of a gene, transcribed into the precursor mRNA sequence, but ultimately removed by RNA splicing during the processing to mature messenger RNA. Many introns appear to be mobile genetic elements. [23]

Studies of group I introns from Tetrahymena protozoans indicate that some introns appear to be selfish genetic elements, neutral to the host because they remove themselves from flanking exons during RNA processing and do not produce an expression bias between alleles with and without the intron. [23] Some introns appear to have significant biological function, possibly through ribozyme functionality that may regulate tRNA and rRNA activity as well as protein-coding gene expression. This is evident in hosts that have become dependent on such introns over long periods of time; for example, the trnL-intron is found in all green plants and appears to have been vertically inherited for several billion years, including more than a billion years within chloroplasts and an additional 2–3 billion years prior in the cyanobacterial ancestors of chloroplasts. [23]

Pseudogenes

Pseudogenes are DNA sequences, related to known genes, that have lost their protein-coding ability or are otherwise no longer expressed in the cell. Pseudogenes arise from retrotransposition or genomic duplication of functional genes, and become "genomic fossils" that are nonfunctional due to mutations that prevent the transcription of the gene, such as within the gene promoter region, or fatally alter the translation of the gene, such as premature stop codons or frameshifts. [24] Pseudogenes resulting from the retrotransposition of an RNA intermediate are known as processed pseudogenes; pseudogenes that arise from the genomic remains of duplicated genes or residues of inactivated genes are nonprocessed pseudogenes. [24] Transpositions of once functional mitochondrial genes from the cytoplasm to the nucleus, also known as NUMTs, also qualify as one type of common pseudogene. [25] NUMTs occur in many eukaryotic taxa.

While Dollo's Law suggests that the loss of function in pseudogenes is likely permanent, silenced genes may actually retain function for several million years and can be "reactivated" into protein-coding sequences [26] and a substantial number of pseudogenes are actively transcribed. [24] [27] Because pseudogenes are presumed to change without evolutionary constraint, they can serve as a useful model of the type and frequencies of various spontaneous genetic mutations. [28]

Repeat sequences, transposons and viral elements

Transposons and retrotransposons are mobile genetic elements. Retrotransposon repeated sequences, which include long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs), account for a large proportion of the genomic sequences in many species. Alu sequences, classified as a short interspersed nuclear element, are the most abundant mobile elements in the human genome. Some examples have been found of SINEs exerting transcriptional control of some protein-encoding genes. [29] [30] [31]

Endogenous retrovirus sequences are the product of reverse transcription of retrovirus genomes into the genomes of germ cells. Mutation within these retro-transcribed sequences can inactivate the viral genome. [32]

Over 8% of the human genome is made up of (mostly decayed) endogenous retrovirus sequences, as part of the over 42% fraction that is recognizably derived from retrotransposons, while another 3% can be identified as the remains of DNA transposons. Much of the remaining half of the genome that is currently without an explained origin is expected to have originated in transposable elements that were active so long ago (> 200 million years) that random mutations have rendered them unrecognizable. [33] Genome size variation in at least two kinds of plants is mostly the result of retrotransposon sequences. [34] [35]

Telomeres

Telomeres are regions of repetitive DNA at the end of a chromosome, which provide protection from chromosomal deterioration during DNA replication. Recent studies have shown that telomeres function to aid in their own stability. Telomeric repeat-containing RNAs (TERRA) are transcripts derived from telomeres. TERRA has been shown to maintain telomerase activity and lengthen the ends of chromosomes. [36]

The term "junk DNA" became popular in the 1960s. [37] [38] According to T. Ryan Gregory, the nature of junk DNA was first discussed explicitly in 1972 by a genomic biologist, David Comings, who applied the term to all non-coding DNA. [39] The term was formalized that same year by Susumu Ohno, [19] who noted that the mutational load from deleterious mutations placed an upper limit on the number of functional loci that could be expected given a typical mutation rate. Ohno hypothesized that mammal genomes could not have more than 30,000 loci under selection before the "cost" from the mutational load would cause an inescapable decline in fitness, and eventually extinction. This prediction remains robust, with the human genome containing approximately 20,000 protein-coding genes. Another source for Ohno's theory was the observation that even closely related species can have widely (orders-of-magnitude) different genome sizes, which had been dubbed the C-value paradox in 1971. [6]

The term "junk DNA" has been questioned on the grounds that it provokes a strong a priori assumption of total non-functionality and some have recommended using more neutral terminology such as "non-coding DNA" instead. [39] Yet "junk DNA" remains a label for the portions of a genome sequence for which no discernible function has been identified and that through comparative genomics analysis appear under no functional constraint suggesting that the sequence itself has provided no adaptive advantage.

Since the late 70s it has become apparent that the majority of non-coding DNA in large genomes finds its origin in the selfish amplification of transposable elements, of which W. Ford Doolittle and Carmen Sapienza in 1980 wrote in the journal Nature: "When a given DNA, or class of DNAs, of unproven phenotypic function can be shown to have evolved a strategy (such as transposition) which ensures its genomic survival, then no other explanation for its existence is necessary." [40] The amount of junk DNA can be expected to depend on the rate of amplification of these elements and the rate at which non-functional DNA is lost. [41] In the same issue of Nature, Leslie Orgel and Francis Crick wrote that junk DNA has "little specificity and conveys little or no selective advantage to the organism". [42] The term occurs mainly in popular science and in a colloquial way in scientific publications, and it has been suggested that its connotations may have delayed interest in the biological functions of non-coding DNA. [43]

Some evidence indicates that some "junk DNA" sequences are sources for (future) functional activity in evolution through exaptation of originally selfish or non-functional DNA. [44]

ENCODE Project

In 2012, the ENCODE project, a research program supported by the National Human Genome Research Institute, reported that 76% of the human genome's non-coding DNA sequences were transcribed and that nearly half of the genome was in some way accessible to genetic regulatory proteins such as transcription factors. [1] However, the suggestion by ENCODE that over 80% of the human genome is biochemically functional has been criticized by other scientists, [5] who argue that neither accessibility of segments of the genome to transcription factors nor their transcription guarantees that those segments have biochemical function and that their transcription is selectively advantageous. After all, non-functional sections of the genome can be transcribed, given that transcription factors typically bind to short sequences that are found (randomly) all over the whole genome. [45]

Furthermore, the much lower estimates of functionality prior to ENCODE were based on genomic conservation estimates across mammalian lineages. [6] [7] [8] [9] Widespread transcription and splicing in the human genome has been discussed as another indicator of genetic function in addition to genomic conservation, which may miss poorly conserved functional sequences. [11] Furthermore, much of the apparent junk DNA is involved in epigenetic regulation and appears to be necessary for the development of complex organisms. [4] [13] [14] Genetic approaches may miss functional elements that do not manifest physically on the organism; evolutionary approaches have difficulties using accurate multispecies sequence alignments, since genomes of even closely related species vary considerably; and with biochemical approaches, though they have high reproducibility, the biochemical signatures do not always automatically signify a function. [11] Kellis et al. noted that 70% of the transcription coverage was less than 1 transcript per cell (and may thus be based on spurious background transcription). On the other hand, they argued that a 12–15% fraction of human DNA may be under functional constraint, and that this may still be an underestimate when lineage-specific constraints are included. Ultimately, genetic, evolutionary, and biochemical approaches can all be used in a complementary way to identify regions that may be functional in human biology and disease. [11] Some critics have argued that functionality can only be assessed in reference to an appropriate null hypothesis. In this case, the null hypothesis would be that these parts of the genome are non-functional and have properties, be it on the basis of conservation or biochemical activity, that would be expected of such regions based on our general understanding of molecular evolution and biochemistry. According to these critics, until a region in question has been shown to have additional features, beyond what is expected of the null hypothesis, it should provisionally be labelled as non-functional. [46]

Some non-coding DNA sequences must have some important biological function. This is indicated by comparative genomics studies that report highly conserved regions of non-coding DNA, sometimes on time-scales of hundreds of millions of years. This implies that these non-coding regions are under strong evolutionary pressure, specifically purifying selection that removes changes to the sequence. [47] For example, in the genomes of humans and mice, which diverged from a common ancestor 65–75 million years ago, protein-coding DNA sequences account for only about 20% of conserved DNA, with the remaining 80% of conserved DNA represented in non-coding regions. [48] Linkage mapping often identifies chromosomal regions associated with a disease with no evidence of functional coding variants of genes within the region, suggesting that disease-causing genetic variants lie in the non-coding DNA. [48] The significance of non-coding DNA mutations in cancer was explored in April 2013. [49]

Non-coding genetic polymorphisms play a role in infectious disease susceptibility, such as hepatitis C. [50] Moreover, non-coding genetic polymorphisms contribute to susceptibility to Ewing sarcoma, an aggressive pediatric bone cancer. [51]

Some specific sequences of non-coding DNA may be features essential to chromosome structure, centromere function and recognition of homologous chromosomes during meiosis. [52]

According to a comparative study of over 300 prokaryotic and over 30 eukaryotic genomes, [53] eukaryotes appear to require a minimum amount of non-coding DNA. The amount can be predicted using a growth model for regulatory genetic networks, implying that it is required for regulatory purposes. In humans the predicted minimum is about 5% of the total genome.

Over 10% of 32 mammalian genomes may function through the formation of specific RNA secondary structures. [54] The study used comparative genomics to identify compensatory DNA mutations that maintain RNA base-pairings, a distinctive feature of RNA molecules. Over 80% of the genomic regions presenting evolutionary evidence of RNA structure conservation do not present strong DNA sequence conservation.

Non-coding DNA may perhaps serve to decrease the probability of gene disruption during chromosomal crossover. [55]

Evidence from Polygenic Scores and GWAS

Genome-wide association studies (GWAS) and machine learning analysis of large genomic datasets have led to the construction of polygenic predictors for human traits such as height, bone density, and many disease risks. Similar predictors exist for plant and animal species and are used in agricultural breeding. [57] The detailed genetic architecture of human predictors has been analyzed, and significant effects used in prediction are associated with DNA regions far outside coding regions. The fraction of variance accounted for (i.e., the fraction of predictive power captured by the predictor) in coding vs. non-coding regions varies widely for different complex traits. For example, atrial fibrillation and coronary artery disease risk are mostly controlled by variants in non-coding regions (non-coding variance fraction over 70 percent), whereas diabetes and high cholesterol display the opposite pattern (non-coding variance roughly 20-30 percent). [56] Individual differences between humans are clearly affected in a significant way by non-coding genetic loci, which is strong evidence for functional effects. Whole exome genotypes (i.e., those containing information restricted to coding regions only) do not contain enough information to build or even evaluate polygenic predictors for many well-studied complex traits and disease risks.

In 2013, it was estimated that, in general, up to 85% of GWAS loci have non-coding variants as the likely causal association. The variants are often common in populations and were predicted to affect disease risks through small phenotypic effects, as opposed to the large effects of Mendelian variants. [58]

Some non-coding DNA sequences determine the expression levels of various genes, both those that are transcribed to proteins and those that themselves are involved in gene regulation. [59] [60] [61]

Transcription factors

Some non-coding DNA sequences determine where transcription factors attach. [59] A transcription factor is a protein that binds to specific non-coding DNA sequences, thereby controlling the flow (or transcription) of genetic information from DNA to mRNA. [62] [63]

Operators

An operator is a segment of DNA to which a repressor binds. A repressor is a DNA-binding protein that regulates the expression of one or more genes by binding to the operator and blocking the attachment of RNA polymerase to the promoter, thus preventing transcription of the genes. This blocking of expression is called repression. [64]

Enhancers

An enhancer is a short region of DNA that can be bound by proteins (trans-acting factors, such as a set of transcription factors) to enhance transcription levels of genes in a gene cluster. [65]

Silencers

A silencer is a region of DNA that inactivates gene expression when bound by a regulatory protein. It functions in much the same way as an enhancer, differing only in that it inactivates genes. [66]

Promoters

A promoter is a region of DNA that facilitates transcription of a particular gene when a transcription factor binds to it. Promoters are typically located near the genes they regulate and upstream of them. [67]

Insulators

A genetic insulator is a boundary element that plays two distinct roles in gene expression, either as an enhancer-blocking code, or rarely as a barrier against condensed chromatin. An insulator in a DNA sequence is comparable to a linguistic word divider such as a comma in a sentence, because the insulator indicates where an enhanced or repressed sequence ends. [68]

Evolution

Shared sequences of apparently non-functional DNA are a major line of evidence of common descent. [69]

Pseudogene sequences appear to accumulate mutations more rapidly than coding sequences due to a loss of selective pressure. [28] This allows for the creation of mutant alleles that incorporate new functions that may be favored by natural selection; thus, pseudogenes can serve as raw material for evolution and can be considered "protogenes". [70]

A study published in 2019 shows that new genes can be fashioned from non-coding regions, a process termed de novo gene birth. [71] Some studies suggest at least one-tenth of genes could be made in this way. [71]

Long range correlations

A statistical distinction between coding and non-coding DNA sequences has been found. It has been observed that nucleotides in non-coding DNA sequences display long range power law correlations while coding sequences do not. [72] [73] [74]
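One common way to look for such correlations is a "DNA walk": step up for a pyrimidine, down for a purine, and measure how the root-mean-square displacement F(l) grows with window length l. The Python sketch below is only an illustration under that assumption; the function names and window sizes are hypothetical and are not taken from the cited studies. If F(l) grows like l raised to an exponent, a value near 0.5 indicates no long-range correlation, while larger values indicate persistence.

```python
import math
import random


def dna_walk(seq):
    """Cumulative walk: +1 for a pyrimidine (C/T), -1 for a purine (A/G)."""
    y = [0]
    for base in seq.upper():
        if base in "CT":
            y.append(y[-1] + 1)
        elif base in "AG":
            y.append(y[-1] - 1)
        else:
            y.append(y[-1])  # ambiguous base (e.g. N): no step
    return y


def fluctuation(y, window):
    """Root-mean-square fluctuation of the displacement over `window` steps."""
    deltas = [y[i + window] - y[i] for i in range(len(y) - window)]
    mean = sum(deltas) / len(deltas)
    return math.sqrt(sum((d - mean) ** 2 for d in deltas) / len(deltas))


def scaling_exponent(seq, windows=(4, 8, 16, 32, 64)):
    """Least-squares slope of log F(l) against log l."""
    y = dna_walk(seq)
    xs = [math.log(w) for w in windows]
    ys = [math.log(fluctuation(y, w)) for w in windows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    den = sum((a - mean_x) ** 2 for a in xs)
    return num / den


# Synthetic inputs only: an uncorrelated random sequence and a strongly
# persistent one made of long alternating runs of C and A.
random.seed(0)
random_seq = "".join(random.choice("ACGT") for _ in range(20000))
blocky_seq = ("C" * 128 + "A" * 128) * 40

alpha_random = scaling_exponent(random_seq)
alpha_blocky = scaling_exponent(blocky_seq)
print(f"random: {alpha_random:.2f}, blocky: {alpha_blocky:.2f}")
```

Real analyses use genomic sequences and more careful detrending; the synthetic strings here only show that the exponent separates an uncorrelated sequence from a persistent one.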

Forensic anthropology

Police sometimes gather DNA as evidence for purposes of forensic identification. As described in Maryland v. King, a 2013 U.S. Supreme Court decision: [75]

The current standard for forensic DNA testing relies on an analysis of the chromosomes located within the nucleus of all human cells. 'The DNA material in chromosomes is composed of "coding" and "non-coding" regions. The coding regions are known as genes and contain the information necessary for a cell to make proteins. . . . Non-protein coding regions . . . are not related directly to making proteins, [and] have been referred to as "junk" DNA.' The adjective "junk" may mislead the lay person, for in fact this is the DNA region used with near certainty to identify a person. [75]

More Clues that Intergenic DNA Is Functional

You're an RNA polymerase enzyme floating in the nucleus of a cell. Your job is to transcribe a gene, but you are blind and it's dark. Other machines guide you to a promoter, where your work begins, but how do you know which direction to read?

Two recent papers add more insight to the wondrous design of DNA transcription. Both papers recognize that protein-coding genes represent only a tiny part, about 3%, of the DNA in a cell. The looming question that the ENCODE project began to answer last year is, how much of that intergenic DNA is functional? Since most of it is transcribed (a process that requires the expenditure of energy), the cell presumably performs all that work for a reason.

The first paper, published in Nature, examined how RNA polymerase (RNAP) knows which way to begin transcription. Gene starts are designated by "promoter" regions, but from that point, RNAP can read in either direction, on either strand, once the double helix is unwound. The authors found that two DNA segments, working against each other, regulate the reading of genes and non-genes.

One, the polyadenylation signal (PAS), controls whether a polyadenylation tail (a series of adenines, or "A" letters in the code) is added to the growing messenger RNA (mRNA). For genes, that tail prepares the mRNA for export from the nucleus. For intergenic transcripts, though, polyadenylation signals other enzymes to cleave it into small transcripts.

The other, a recognition site for U1 snRNP (a small nuclear ribonucleoprotein), controls whether the transcript is cleaved early, by suppressing polyadenylation. When U1 snRNP is bound, it allows RNAP to proceed uninterrupted.

Gene regions are rich in U1 snRNP sites but low in PAS. The reverse is true for intergenic regions. The authors believe this is how RNAP avoids excessive transcribing of non-coding DNA. The shortened, cleaved transcripts, like lincRNAs, stay in the nucleus to perform other functions. A report from MIT explains how these sequences offset each other:

The work demonstrates the important role of U1 snRNP in protecting mRNA as it is transcribed from genes and in preventing the cell from unnecessary copying of non-protein-coding DNA, says Gideon Dreyfuss, a professor of biochemistry and biophysics at the University of Pennsylvania School of Medicine.

"They've identified a very likely mechanism for early termination of these upstream RNAs by depriving them of U1 snRNP suppression of polyadenylation and cleavage," says Dreyfuss, who was not part of the research team.
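The interplay between U1 snRNP sites and polyadenylation signals described above can be sketched as a toy model. This is a loose illustration, not the method of the Nature paper: the U1 motif, the protection window and the function names are all assumptions chosen for the example. Only the AATAAA hexamer (the DNA form of the canonical AAUAAA signal) is a standard motif.

```python
PAS_MOTIF = "AATAAA"       # canonical polyadenylation signal hexamer
U1_MOTIF = "GGTAAG"        # assumed stand-in for a U1 snRNP recognition site
PROTECTION_WINDOW = 1000   # assumed reach (nt) of U1 suppression; illustrative


def find_all(seq, motif):
    """Return every start position of `motif` in `seq` (overlaps allowed)."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def first_unprotected_pas(seq):
    """Position of the first PAS with no U1 site within the window upstream.

    Returns None when every PAS is protected, i.e. the toy polymerase
    reads through the whole sequence without early termination.
    """
    seq = seq.upper()
    u1_sites = find_all(seq, U1_MOTIF)
    for pas in find_all(seq, PAS_MOTIF):
        protected = any(0 < pas - u1 <= PROTECTION_WINDOW for u1 in u1_sites)
        if not protected:
            return pas
    return None


# Synthetic sequences: a "gene-like" one with a U1 site shortly before the
# PAS, and an "intergenic-like" one with a PAS but no U1 site.
gene_like = "GGTAAG" + "C" * 50 + "AATAAA" + "C" * 20
intergenic_like = "C" * 30 + "AATAAA" + "C" * 20

print(first_unprotected_pas(gene_like))        # None: PAS is suppressed
print(first_unprotected_pas(intergenic_like))  # 30: early termination here
```

In this sketch a sequence rich in U1 sites is read through, while a sequence with an exposed PAS is cut short, which is the qualitative behaviour the article attributes to genic versus intergenic regions.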

The authors of the Nature paper, though, remained undecided about the roles of these upstream, intergenic transcripts:

The function of all of this upstream noncoding RNA is still a subject of much investigation. "That transcriptional process could produce an RNA that has some function, or it could be a product of the nature of the biochemical reaction. This will be debated for a long time," Sharp says.

His lab is now exploring the relationship between this transcription process and the observation of large numbers of so-called long noncoding RNAs (lncRNAs). He plans to investigate the mechanisms that control the synthesis of such RNAs and try to determine their functions. (Emphasis added.)

In their paper, the authors toss in a Darwinian speculation. They proposed that upstream antisense RNAs (uaRNA), or RNAs transcribed in the "wrong" direction, might represent ancestors of protein-coding genes, and that lncRNAs are intermediate forms that gained or lost U1 snRNP and polyadenylation sequences. They found some differences in U1 snRNP counts between orthologous regions in human and mouse genomes as support for the idea.

This hypothesis, though, seems absurd for several reasons. For one, how or why would a non-functional transcript acquire a function? Before it had a function, why would it be transcribed and conserved? Natural selection cannot act to "store up" variations in hopes of finding a future function. Functional protein sequences, as William Dembski and Robert Marks have shown, represent a tiny fraction of sequence space. Imagining that a blind, unguided process would find one of them seems optimistic to the point of being ridiculous. The authors did not pursue their wishful thinking in detail, but rather dropped the subject after a brief mention, focusing primarily on the "U1-PAS axis" as having "wide use as a general mechanism to regulate transcription elongation in mammals." Regulation by a mechanism is the language of design.

A second paper, in PLoS Genetics, is more confident that the intergenic transcripts are functional. Confirming what ENCODE found last year (that at least 85% of intergenic regions are transcribed and regulated), these authors believe functions are soon to be discovered in the forest of intergenic DNA. The Abstract says:

Known protein coding gene exons compose less than 3% of the human genome. The remaining 97% is largely uncharted territory, with only a small fraction characterized. The recent observation of transcription in this intergenic territory has stimulated debate about the extent of intergenic transcription and whether these intergenic RNAs are functional. Here we directly observed with a large set of RNA-seq data covering a wide array of human tissue types that the majority of the genome is indeed transcribed, corroborating recent observations by the ENCODE project. Furthermore, using de novo transcriptome assembly of this RNA-seq data, we found that intergenic regions encode far more long intergenic noncoding RNAs (lincRNAs) than previously described, helping to resolve the discrepancy between the vast amount of observed intergenic transcription and the limited number of previously known lincRNAs. In total, we identified tens of thousands of putative lincRNAs expressed at a minimum of one copy per cell, significantly expanding upon prior lincRNA annotation sets. These lincRNAs are specifically regulated and conserved rather than being the product of transcriptional noise. In addition, lincRNAs are strongly enriched for trait-associated SNPs suggesting a new mechanism by which intergenic trait-associated regions may function. These findings will enable the discovery and interrogation of novel intergenic functional elements.

The clear implication is that lincRNAs are functional, else why would the cell regulate them and ensure their conservation? The authors' optimism continues in their Introduction:

A large fraction of the human genome consists of intergenic sequence. Once referred to as "junk DNA", it is now clear that functional elements exist in intergenic regions. In fact, genome wide association studies have revealed that approximately half of all disease and trait-associated genomic regions are intergenic. While some of these regions may function solely as DNA elements, it is now known that intergenic regions can be transcribed, and a growing list of functional noncoding RNA genes within intergenic regions has emerged.

What do we know about lincRNA functions at this time?

Long intergenic noncoding RNAs (lincRNAs) are defined as intergenic (relative to current gene annotations) transcripts longer than 200 nucleotides in length that lack protein coding capacity. LincRNAs are known to perform myriad functions through diverse mechanisms ranging from the regulation of epigenetic modifications and gene expression to acting as scaffolds for protein signaling complexes.
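The definition quoted above has three testable parts: the transcript is intergenic relative to a given annotation, longer than 200 nucleotides, and lacks coding capacity. A minimal Python sketch of that filter follows. The `Interval` type, the function names and the coordinates are hypothetical, and coding potential is passed in as a flag because assessing it is a separate problem.

```python
from dataclasses import dataclass

MIN_LINCRNA_LENGTH = 200  # nucleotides, per the quoted definition


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a, b):
    """True when two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def is_candidate_lincrna(locus, length_nt, has_coding_potential, annotated_genes):
    """Apply the three criteria: length, no coding capacity, intergenic."""
    if length_nt <= MIN_LINCRNA_LENGTH:
        return False
    if has_coding_potential:
        return False
    return not any(overlaps(locus, gene) for gene in annotated_genes)


# Made-up annotation with two genes on one chromosome.
genes = [Interval("chrI", 1000, 3000), Interval("chrI", 8000, 9500)]

between_genes = Interval("chrI", 4000, 4600)
inside_gene = Interval("chrI", 2500, 3200)

print(is_candidate_lincrna(between_genes, 600, False, genes))  # True
print(is_candidate_lincrna(inside_gene, 700, False, genes))    # False
```

Because "intergenic" is evaluated against `annotated_genes`, the same transcript can change category when the annotation is updated, which is why the definition says "relative to current gene annotations".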

Since these authors found significantly more lincRNAs in their survey than previously known, the implication is that more of those "myriad functions" are waiting to be found. (For more functions already discovered, see the lncRNA blog.) Here's their concluding statement:

Owing to the extended breadth of tissues sampled and relaxed constraints on transcript structure, we find significantly more lincRNAs than all previous lincRNA annotation sets combined. Our analyses revealed that these lincRNAs display many features consistent with functionality, contrasting prior claims that intergenic transcription is primarily the product of transcriptional noise. In sum, our findings corroborate recent reports of pervasive transcription across the human genome and demonstrate that intergenic transcription results in the production of a large number of previously unknown lincRNAs. We provide this vastly expanded lincRNA annotation set as an important resource for the study of intergenic functional elements in human health and disease.

It's clear that the search for function is driving this cutting-edge research. Search for function is exactly what intelligent-design science would recommend. Darwinians describe natural selection as a tinkerer, generating useless parts as well as structures cobbled together that might do something by chance, since there is no supervising designer to guide the process in a particular way. By contrast, intelligent design expects that what exists, as the product of mind, is there for a reason.

Remember how Darwinists call ID a "science stopper," since it supposedly counsels just giving up and saying, "God did it"? The real science stopper is Darwinism. It focused only on protein-coding genes and dismissed everything else as "transcriptional noise" or "junk DNA" left behind by the blind tinkerer. Why waste time studying junk? Were it not for that attitude, our understanding of intergenic DNA function might have been much farther along by now.


Long intergenic non-coding RNA (lincRNA) genes have diverse features that distinguish them from mRNA-encoding genes and exercise functions such as remodelling chromatin and genome architecture, RNA stabilization and transcription regulation, including enhancer-associated activity. Some genes currently annotated as encoding lincRNAs include small open reading frames (smORFs) and encode functional peptides and thus may be more properly classified as coding RNAs. lincRNAs may broadly serve to fine-tune the expression of neighbouring genes with remarkable tissue specificity through a diversity of mechanisms, highlighting our rapidly evolving understanding of the non-coding genome.

Which are found in non coding sections of DNA?

Non-coding DNA sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules (e.g. transfer RNA, ribosomal RNA, and regulatory RNAs).

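One may also ask, what are the coding regions of DNA called? Within a gene, coding sequences are interrupted by stretches of DNA called introns, which are removed from the RNA transcript by splicing. The segments retained in the mature RNA are called exons; protein-coding sequence lies within exons, although exons can also include untranslated regions.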

Likewise, people ask, why do you have non coding areas of DNA?

Some introns can regulate transfer RNA and ribosomal RNA activity and protein-coding gene expression. A very important non-coding sequence of DNA is called a telomere, which is a region of repetitive DNA at the end of a chromosome and protects coding DNA from being lost during cell division.

Genomic Variation between Organisms

The amount of total genomic DNA varies widely between organisms, and the proportion of coding and noncoding DNA within these genomes varies greatly as well. More than 98% of the human genome does not encode protein sequences, including most sequences within introns and most intergenic DNA. While overall genome size, and by extension the amount of noncoding DNA, are correlated with organism complexity, there are many exceptions. For example, the genome of the unicellular Polychaos dubium (formerly known as Amoeba dubia) has been reported to contain more than 200 times the amount of DNA in humans. The pufferfish Takifugu rubripes genome is only about one eighth the size of the human genome, yet seems to have a comparable number of genes; approximately 90% of the Takifugu genome is noncoding DNA.

In 2013, a new "record" for most efficient genome was discovered. Utricularia gibba, a bladderwort plant, has only 3% noncoding DNA. The extensive variation in nuclear genome size among eukaryotic species is known as the C-value enigma or C-value paradox. Most of the genome size difference appears to lie in the noncoding DNA. About 80 percent of the nucleotide bases in the human genome may be transcribed, but transcription does not necessarily imply function.

Figure 1: Utricularia gibba flower. Utricularia gibba has 3% noncoding DNA, which is low for flowering plants. This 3% has given the plant the title of "most efficient" genome.
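The claim that most of the genome size difference lies in noncoding DNA can be checked with rough arithmetic on the human and Takifugu figures given above. The short Python sketch below assumes a human genome of about 3,100 megabases, a number not stated in this text; the fractions (98% and 90% noncoding, one eighth the size) come from the passage, and the variable names are illustrative.

```python
HUMAN_GENOME_MB = 3100.0          # assumed approximate human genome size
HUMAN_NONCODING_FRACTION = 0.98   # "more than 98%" from the text, used as a floor
FUGU_RELATIVE_SIZE = 1 / 8        # "about one eighth the size" from the text
FUGU_NONCODING_FRACTION = 0.90    # "approximately 90%" from the text

fugu_genome_mb = HUMAN_GENOME_MB * FUGU_RELATIVE_SIZE

human_noncoding_mb = HUMAN_GENOME_MB * HUMAN_NONCODING_FRACTION
fugu_noncoding_mb = fugu_genome_mb * FUGU_NONCODING_FRACTION

# Share of the total size difference that is accounted for by noncoding DNA.
size_difference_mb = HUMAN_GENOME_MB - fugu_genome_mb
noncoding_difference_mb = human_noncoding_mb - fugu_noncoding_mb
noncoding_share = noncoding_difference_mb / size_difference_mb

print(f"Takifugu genome: about {fugu_genome_mb:.0f} Mb")
print(f"Noncoding share of the size difference: {noncoding_share:.1%}")
```

Under these assumptions, nearly all of the difference in size between the two genomes is noncoding DNA, consistent with the two species having a comparable number of genes.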
