Solving the Mystery of Genomic 'Dark Matter'

Graduate Division

In human cells, three billion base pairs of As, Ts, Gs, and Cs make up the genome. Despite the genome's large size, however, only 1% to 2% of it is actually organized into genes. So what does the remaining, mysterious 98% of the human genome do?

A team of researchers led by a group from Lawrence Berkeley National Laboratory is trying to solve part of this mystery by analyzing snake genomes to discern how the reptiles lost their legs.

The Central Dogma states that DNA is transcribed into RNA, and RNA is translated into protein. Proteins are the functional products of genomes that carry out a variety of tasks inside, on the surface of, and outside of cells.
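The two steps of the Central Dogma can be sketched in a few lines of Python. This is only a toy illustration: the DNA sequence is hypothetical, and the codon table lists just the handful of codons the example needs, whereas the real genetic code has 64.

```python
# Toy illustration of the Central Dogma: DNA -> RNA -> protein.
# Only a few codons are listed; a complete table has 64 entries.
CODON_TABLE = {
    "AUG": "M",  # methionine, also the start codon
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCU": "A",  # alanine
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching the coding (sense) strand: T becomes U."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = not in toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


dna = "ATGTTTGGCGCTTAA"  # hypothetical sequence, made up for this example
mrna = transcribe(dna)
protein = translate(mrna)
print(mrna, protein)
```

Here the gene is the only part of the sequence being read; the regulatory DNA discussed next determines whether this process happens at all in a given cell.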

All cells in an organism contain all its genetic information. So how does a melanocyte in the skin know how to develop and behave differently than a neuron?

This gets back to the 98% of the human genome that does not encode genes. Some of this genomic “dark matter” controls when or where a gene is transcribed, or how much product is produced.

There are several types of non-coding regulatory sequences in genomes, but promoters and enhancers are of particular interest.

Promoters are sequences of DNA before a gene that are bound by proteins called transcription factors (TFs).

Activating TFs recruit to a gene’s promoter other proteins that transcribe that gene’s DNA into RNA.

Enhancers are sequences of DNA that can also bind TFs. They can be located proximal to or distal from genes.

Enhancers can increase the activity of a promoter by recruiting transcriptional activators and bringing them closer to gene promoters, often through physical looping of DNA.

They can also be bound by repressive TFs, which recruit other proteins that decrease transcription of a gene, and prevent the enhancer from looping to a promoter.

In human cells, promoter and enhancer activity is regulated by epigenetic mechanisms. Epigenetic (above the genome) changes, like DNA methylation or histone modifications, do not change the underlying DNA sequence but influence how genes are expressed.

For example, DNA methylation in a promoter can prevent an activating TF from binding, which means the gene won’t be expressed.

Histone modifications can prevent enhancers from being able to loop DNA and reach their target promoter. There are also activating epigenetic changes.

Neurons and melanocytes have different active promoters and enhancers that lead to the transcription of different regions of DNA into RNA, and translation of RNA into protein, giving them their unique cell structure and function.

Enhancers and promoters can also have DNA sequence changes that influence their activity.

Studying how the sequences of enhancers and promoters vary between species can help scientists understand the evolutionary relationship of species.

Limb shape is a morphological feature that varies widely among vertebrate animals. Although fish fins and mouse legs look very different from human arms and legs, the way limbs are directed to form during development is very similar among them all.

A gene called Sonic hedgehog (SHH) is expressed in the developing limb bud in the Zone of Polarizing Activity (ZPA). SHH expression in the ZPA is under the control of an enhancer called the ZPA Regulatory Sequence, or ZRS.

If SHH is not correctly expressed in the limb bud, limb abnormalities, like too many digits, too few digits, or stunted limbs, can occur.

At UCSF, the lab of Nadav Ahituv studies the function of enhancers in development, disease, and evolution, including the enhancer control of SHH in limb development.

Snakes are an ideal species in which to study genetic control of limb development.

Basal snakes actually have a tiny, evolutionarily leftover pelvic girdle and hindlimb bones hidden underneath their scales, while advanced snakes lack both of these features. Advanced snakes comprise the majority of snakes alive today.

These two types of snakes provide morphological timepoints on the evolutionary continuum of limb loss in snakes.

As mentioned earlier, a team of researchers analyzed snake genomes to see if they could discern how snakes lost their legs.

First, the authors wanted to know if the ZRS, which is important in limb development of other vertebrates like humans and mice, also plays a role in snake limb development (or lack thereof).

They compared the genomes of the Burmese python, boa constrictor, king cobra, speckled rattlesnake, viper, and corn snake. The first two are basal snakes, and the remainder are advanced snakes.

Through evolutionary time, it is expected that some spontaneous mutations will randomly accumulate in the genome and persist in offspring. There is an expected substitution rate in non-coding DNA sequences, and divergence from this expected rate can suggest evolutionary selection.

Basal snakes have a ZRS that is 80% similar to that of lizards and shows the expected substitution rate. Advanced snakes have many more substitutions in the ZRS than expected, indicating a fast evolutionary rate that coincides with the loss of the pelvic girdle in advanced snakes.
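To make the idea of sequence similarity concrete, here is a minimal sketch of how percent identity between two aligned sequences can be computed. It is not the authors' method, which the article does not describe; a real substitution-rate analysis uses evolutionary models and much longer alignments. The three short sequences below are invented for illustration, and the function name is an assumption.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, non-gap columns in which both sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical pre-aligned fragments ("-" marks an alignment gap).
lizard = "ACGTTGCAAGGAAGTCCATG"
basal_snake = "ACGTTGCTAGGAAGTCGATG"
advanced_snake = "ACTTAGCTAG--AGACGTTG"

basal_identity = percent_identity(lizard, basal_snake)
advanced_identity = percent_identity(lizard, advanced_snake)
print(basal_identity, advanced_identity)
```

In this made-up example the "advanced" fragment has diverged further from the lizard fragment than the "basal" one has, mirroring the pattern described above.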

Next, the authors wanted to know if the differences in ZRS observed between snakes were functional in animals.

To test this, the authors designed DNA constructs that put ZRSs in front of a gene called LacZ, which produces an enzyme that yields a blue color when given its substrate.

They put these DNA constructs into mice, harvested mouse embryos, and looked to see where each ZRS was driving expression.

Human, cow, horse, chicken, lizard, platypus, sloth, bony fish, and mouse ZRSs showed the same expression pattern in mouse limb buds, indicating that ZRS function was conserved amongst these species.

Dolphin, megabat, and cartilaginous fish ZRSs had expanded function. ZRS enhancer activity in basal snakes was reduced compared to the other vertebrates, and advanced snakes had no enhancer activity.

To determine how severely the shape of a limb would be changed by the reduced ZRS activity, the authors created transgenic mice using CRISPR/Cas9 gene editing.

They replaced the mouse ZRS with sequences of the same length from human, bony fish, python, and cobra. ZRS enhancers from human and bony fish produced normal limbs in mice, whereas python and cobra enhancers could not induce normal limb development, leading to nearly legless, “serpentized” mice.

The visible deficit in limb formation in the mouse was accompanied by decreased expression of SHH in the developing limb bud.

After seeing how drastically the snake enhancers altered limb development in mice, the authors wanted to pinpoint what part of the ZRS was responsible for the deficit.

They reexamined the DNA sequence information and found that snakes have a 17 base pair deletion in a region of the ZRS that is conserved in all the limbed vertebrates and fish they analyzed.

They added these 17 base pairs back to the python ZRS and created another transgenic mouse carrying the python + 17 base pair ZRS. This ZRS restored normal limb development and SHH expression in mice.

The authors analyzed this 17 base pair sequence and found that it contained a specific sequence that is known to be a binding site for an activating TF called ETS1.

When they examined the whole ZRS, they found five ETS1 binding sites that are conserved across limbed vertebrates and fish. Only three of these ETS1 sites are conserved in snakes.
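Counting candidate binding sites in a sequence can be sketched as a simple motif scan. The example below searches both DNA strands for GGA(A/T), the core commonly cited for ETS-family binding sites. This is a simplification and an assumption, not the authors' procedure: real binding-site prediction typically scores longer motifs with position weight matrices. The sequences, the six-base deletion, and the function names are all hypothetical.

```python
import re

# Lookahead so that overlapping matches are also reported.
ETS_CORE = re.compile(r"(?=(GGA[AT]))")
CORE_LENGTH = 4


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_core_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based forward-strand start, strand) for each core match."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in ETS_CORE.finditer(seq)]
    for m in ETS_CORE.finditer(reverse_complement(seq)):
        # Convert the reverse-strand coordinate back to a forward-strand start.
        hits.append((len(seq) - m.start() - CORE_LENGTH, "-"))
    return sorted(hits)


# Hypothetical enhancer fragment, and the same fragment with six bases removed.
intact = "TTGCAGGAAGTACCTTCCGATC"
deleted = intact[:4] + intact[10:]

intact_sites = find_core_sites(intact)
deleted_sites = find_core_sites(deleted)
print(intact_sites, deleted_sites)
```

In this toy case the deletion removes one of the two candidate sites, which is the kind of loss the comparison between snakes and limbed vertebrates revealed.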

However, loss of two ETS1 binding sites in snakes was not sufficient to explain limb loss.

The ZRS was re-scanned for other TF binding sites, and a loss of homeodomain TF binding sites was also found in snakes.

Together, the 17 base pair deletion and the loss of other TF binding sites in the ZRS help explain how snakes lost their legs.