CTCFBSDB: a CTCF binding site database for characterization of vertebrate genomic insulators

A new version of the database is available. To access it, click CTCFBSDB 2.0.

Table of Contents

  1. Background
  2. Content
  3. Browsing Features
  4. In silico CTCFBS prediction tool
  5. Data Download
  6. Data Submission
  7. References
  8. Contact us

1. Background

CCCTC-binding factor (CTCF) is a versatile transcription regulator that is evolutionarily conserved from fruit fly to human. CTCF binds to different DNA sequences through combinatorial use of its 11 zinc fingers and performs distinct functions (transcription activation or repression and chromatin insulation) depending on the biological context [1]. CTCF is the only protein identified so far in vertebrates that binds to insulators and shows enhancer-blocking activity. In eukaryotic genomes, maintenance of distinct chromatin domains is critical for transcription control. Insulators, which block enhancers and demarcate domain borders, are critical regulatory elements for gene expression control [2,3]. They represent a class of diverged DNA sequences capable of shielding genes against inappropriate cis-regulatory signals from their genomic neighborhood. Studies have also linked insulators to epigenetic phenomena such as imprinting [4,5] and X-chromosome inactivation [6]. High-throughput chromatin immunoprecipitation experiments [7,8,12] and comparative genomic studies [8,9] have identified tens of thousands of potential CTCF binding sites (CTCFBS) in mammalian genomes. To analyze this important type of DNA regulatory element, we created the CTCF binding site database (CTCFBSDB), a comprehensive collection of experimentally determined and computationally predicted CTCFBS from the literature. The database is designed to facilitate studies of insulators and their roles in demarcating functional genomic domains.

2. Content

Currently, the database contains 34417 experimentally determined CTCF binding sites (110 INSUL_MAN, 244 INSUL_OHL [7], 13801 INSUL_REN [8] and 20262 INSUL_ZHAO [12]) and 18905 predicted CTCF binding sites [8,9]. Biological knowledge and data from multiple resources were integrated to annotate the CTCF binding sites. Users can browse insulator sequence features, functional annotations, genomic contexts (including histone methylation profiles), flanking gene expression patterns and orthologous regions in other mammalian genomes. Users can also retrieve data by text search, sequence search and genomic range search.

3. Browsing Features

  • Validation Method

  • In vitro binding: CTCF binding is validated by in vitro assays (EMSA etc.).
    In vivo binding: CTCF binding is validated by in vivo assays (ChIP etc.).
    Enhancer-blocking assay: Enhancer-blocking activity is validated by the assay of Chung et al. [10].

  • Flanking Gene Expression

  • The flanking gene expression track compares the expression status of genes flanking the CTCF binding site. Red indicates overexpression in the tissue, and green indicates underexpression. The expression data are obtained from the GNF Gene Expression Atlas 2 [11], which contains genome-wide gene expression profiles of 61 mouse tissues and 79 human tissues. The raw data were base 2 log-transformed and normalized to have a mean of zero and a standard deviation of one. The images were generated using the slcview software.
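
The transformation described above can be sketched as follows. This is a minimal illustration, not the pipeline used to build the track: it assumes that each gene's profile is normalized across tissues and that the population standard deviation is used, neither of which is stated in the text. The function name and the input values are hypothetical.

```python
import math
from statistics import mean, pstdev

def log2_zscore(raw_values):
    """Log2-transform raw intensities, then scale to zero mean and unit SD."""
    logged = [math.log2(v) for v in raw_values]  # every value must be > 0
    mu = mean(logged)
    sd = pstdev(logged)  # assumption: population SD; the sample SD is equally plausible
    if sd == 0:
        raise ValueError("constant profile: standard deviation is zero")
    return [(x - mu) / sd for x in logged]

# Hypothetical raw expression values for one gene across four tissues.
profile = log2_zscore([100.0, 200.0, 400.0, 800.0])
```

After this scaling, positive values correspond to tissues where the gene is expressed above its own average (shown in red) and negative values to tissues where it is below average (shown in green).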

  • Genomic context

  • CTCFBS sequences were mapped onto the genomes using the BLAT program. The assemblies used are hg18 for human, mm8 for mouse, rn3 for rat and galGal2 for chicken. CTCF binding sites without chromosome location information probably map to heterochromatic regions of the genome. We use the UCSC Genome Browser to display the surrounding genomic context. Each query CTCFBS (red) and the flanking genes within 100 kb of the CTCFBS are displayed in the genome browser. Other CTCF binding sites located within this genomic range are also displayed. Different colors designate the different sources of CTCF binding sites: yellow for INSUL_MAN, blue for INSUL_OHL, green for INSUL_REN and black for INSUL_PRE.

    H3K4 trimethylation (H3K4me3) and H3K27 trimethylation (H3K27me3) were found to be a pair of "Yin-Yang" modifications: high levels of H3K4me3 and H3K27me3 mark gene activation and silencing, respectively [12]. It was also found that CTCF binding sites may demarcate the boundaries of chromatin domains defined by histone methylation [12]. We integrated the H3K4me3 and H3K27me3 maps with our CTCFBS map in the genome browser to help users generate hypotheses about the functions of CTCF binding sites.
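
As an illustration of the 100 kb display window, the sketch below selects the features that overlap a region extending 100 kb on either side of a query site. It is not the database's own code: the `Interval` type, the 0-based half-open coordinate convention, and all names and coordinates are assumptions made for the example.

```python
from typing import NamedTuple

class Interval(NamedTuple):
    name: str
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive

WINDOW = 100_000  # 100 kb, as stated in the text

def features_near(site, features, window=WINDOW):
    """Return features on the same chromosome that overlap site +/- window."""
    lo = max(0, site.start - window)
    hi = site.end + window
    return [f for f in features
            if f.chrom == site.chrom and f.start < hi and f.end > lo]

# Hypothetical query site and gene annotations.
site = Interval("query_CTCFBS", "chr1", 500_000, 500_020)
genes = [
    Interval("geneA", "chr1", 380_000, 410_000),  # reaches into the upstream window
    Interval("geneB", "chr1", 650_000, 700_000),  # starts beyond the downstream window
    Interval("geneC", "chr2", 500_000, 510_000),  # different chromosome
]
nearby = [f.name for f in features_near(site, genes)]
```

The same overlap test applies to other CTCF binding sites in the range, which is how neighboring sites from the different sources would end up in the same view.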

  • Orthologous region

  • Comparative genomic studies of human, mouse and rat may provide clues to the evolutionary dynamics of CTCFBS. To this end, we built the mammalian orthologous region track. The region encompassing the insulators and their flanking genes in the reference genome was used to retrieve orthologous regions in the other two genomes from the UCSC precomputed block chains. Only the DNA block with the maximal alignment score against the query region was retained as the orthologous region. The aligned genomic sequences in up to 16 vertebrate genomes that are orthologous to the query CTCFBS sequence can be displayed by clicking the "view alignment" button.
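
The selection rule for the orthologous region (keep only the best-scoring block) can be sketched as below. The `ChainBlock` fields and the candidate values are hypothetical and do not reproduce the UCSC chain file layout; how ties are handled in the database is not stated, so this sketch simply keeps the first block with the top score.

```python
from typing import NamedTuple, Optional

class ChainBlock(NamedTuple):
    target_chrom: str
    target_start: int
    target_end: int
    score: int

def best_block(blocks) -> Optional[ChainBlock]:
    """Return the block with the highest alignment score, or None if there are none."""
    return max(blocks, key=lambda b: b.score, default=None)

# Hypothetical candidate blocks aligned to one query region.
candidates = [
    ChainBlock("chr7", 1_000, 5_000, 3_200),
    ChainBlock("chr7", 90_000, 99_000, 41_500),
    ChainBlock("chr12", 10_000, 12_000, 870),
]
ortholog = best_block(candidates)
```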

4. In silico CTCFBS prediction tool

CTCF uses different combinations of its zinc fingers to recognize divergent DNA sequences. Recent studies have identified core motifs for CTCFBS sequences, and these motifs are represented by position weight matrices (PWMs). Altogether, four closely related PWMs have been derived to accommodate the divergence of CTCFBS sequences [8,9]. We offer users a simple web tool to search for CTCFBS core motifs in a query sequence. The tool uses the STORM program [13] with each of the four PWMs and reports the best hit in the query sequence for each.

The PWM score corresponds to the log-odds of the observed sequence being generated by the motif versus being generated by the background, so a large positive score suggests a good match. Usually, a sequence with a PWM score >3.0 is a suggestive match.
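
To make the log-odds score concrete, the sketch below scans a sequence with a small PWM and reports the best-scoring window. It is a simplified illustration and not the STORM program: the four-column matrix is a toy example rather than one of the CTCF PWMs, the background is assumed to be uniform, natural logarithms are assumed, and only the forward strand is scanned.

```python
import math

BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform

# Toy 4-column PWM (per-position base probabilities); NOT a real CTCF matrix.
TOY_PWM = [
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]

def log_odds(window, pwm, background=BACKGROUND):
    """Sum over positions of ln(P(base | motif) / P(base | background))."""
    return sum(math.log(col[b] / background[b]) for col, b in zip(pwm, window))

def best_hit(sequence, pwm):
    """Return (score, offset, window) for the top-scoring forward-strand window."""
    seq = sequence.upper()  # only A, C, G and T are supported in this sketch
    width = len(pwm)
    if len(seq) < width:
        raise ValueError("sequence shorter than motif")
    hits = ((log_odds(seq[i:i + width], pwm), i, seq[i:i + width])
            for i in range(len(seq) - width + 1))
    return max(hits)

score, offset, window = best_hit("TTGACCAGTTA", TOY_PWM)
```

Each position that matches the motif's preferred base contributes a positive term and each mismatch a negative one, which is why a large positive total indicates a good match. Scores depend on the matrix, the background model and the logarithm base, so values from this toy matrix are not comparable with the 3.0 guideline quoted for the tool.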

5. Data Download

All experimentally identified data can be downloaded here.
All computationally predicted data can be downloaded here.

6. Data Submission

To maintain an up-to-date resource, we encourage researchers to submit their insulator data to CTCFBSDB. Data can be submitted directly using our web submission interface. Submitted data will be manually checked and added to the database.

7. References

1. Ohlsson, R., Renkawitz, R. and Lobanenkov, V. (2001) CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends Genet, 17, 520-527.
2. Bell, A.C., West, A.G. and Felsenfeld, G. (2001) Insulators and boundaries: versatile regulatory elements in the eukaryotic genome. Science, 291, 447-450.
3. West, A.G., Gaszner, M. and Felsenfeld, G. (2002) Insulators: many functions, many mechanisms. Genes Dev, 16, 271-288.
4. Hark, A.T., et al. (2000) CTCF mediates methylation-sensitive enhancer-blocking activity at the H19/Igf2 locus. Nature, 405, 486-489.
5. Bell, A.C. and Felsenfeld, G. (2000) Methylation of a CTCF-dependent boundary controls imprinted expression of the Igf2 gene. Nature, 405, 482-485.
6. Chao, W., et al. (2002) CTCF, a candidate trans-acting factor for X-inactivation choice. Science, 295, 345-347.
7. Mukhopadhyay, R., et al. (2004) The binding sites for the chromatin insulator protein CTCF map to DNA methylation-free domains genome-wide. Genome Res, 14, 1594-1602.
8. Kim, T.H., et al. (2007) Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell, 128, 1231-1245.
9. Xie, X., et al. (2007) Systematic discovery of regulatory motifs in conserved regions of the human genome, including thousands of CTCF insulator sites. Proc Natl Acad Sci U S A, 104, 7145-7150.
10. Chung, J.H., et al. (1993) A 5' element of the chicken beta-globin domain serves as an insulator in human erythroid cells and protects against position effect in Drosophila. Cell, 74, 505-514.
11. Su, A.I., et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A, 101, 6062-6067.
12. Barski, A., et al. (2007) High-resolution profiling of histone methylations in the human genome. Cell, 129, 823-837.
13. Schones, D.E., et al. (2007) Statistical significance of cis-regulatory modules. BMC Bioinformatics, 8, 19.

8. Contact us

Please send questions and comments to Dr. Yan Cui at the University of Tennessee Health Science Center.