CTCFBSDB: a CTCF binding site database for characterization of vertebrate genomic insulators
A new version of the database, CTCFBSDB 2.0, is available.
Currently, the database contains 34417 experimentally determined CTCF binding sites (110 INSUL_MAN, 244 INSUL_OHL [7], 13801 INSUL_REN [8] and 20262 INSUL_ZHAO [12]) and 18905 predicted CTCF binding sites [8,9]. Biological knowledge and data from multiple resources were integrated to annotate the CTCF binding sites. Users can browse insulator sequence features, function annotations, and genomic contexts including histone methylation profiles, flanking gene expression patterns and orthologous regions in other mammalian genomes. Users can also retrieve data by text search, sequence search and genomic range search.

Experimental evidence for each site is recorded in three categories:
In vitro binding: CTCF binding is validated by in vitro assays (EMSA, etc.).
In vivo binding: CTCF binding is validated by in vivo assays (ChIP, etc.).
Enhancer-blocking assay: enhancer-blocking activity is validated by the assay of [10].

The flanking gene expression track compares the expression status of the genes flanking the CTCF binding site. Red indicates overexpression in the tissue, and green indicates underexpression. The expression data are obtained from the GNF Gene Expression Atlas 2 [11], which contains genome-wide gene expression profiles of 61 mouse tissues and 79 human tissues. The raw data were log2-transformed and normalized to zero mean and a standard deviation of one. The images were generated using the slcview software.

CTCFBS sequences were mapped onto the genomes using the BLAT program. The assemblies used are hg18 for human, mm8 for mouse, rn3 for rat and galGal2 for chicken. CTCF binding sites without chromosome location information probably map to heterochromatic regions of the genome.

We use the UCSC Genome Browser to display the surrounding genomic context. Each query CTCFBS (red) and the flanking genes within 100 kb of the CTCFBS are displayed in the genome browser. Other CTCF binding sites located within this genomic range are also displayed.
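The 100 kb window described above amounts to a simple interval filter. The following Python sketch illustrates the idea only and is not the database's own code: the `Feature` class, the function names, the toy coordinates, and the choice to measure distance between the nearest edges of two intervals are all assumptions made for this example.

```python
from dataclasses import dataclass

WINDOW_BP = 100_000  # 100 kb, the window size stated in the text


@dataclass(frozen=True)
class Feature:
    """Hypothetical record for a gene or CTCF binding site."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def gap(a: Feature, b: Feature) -> int:
    """Bases separating two intervals on one chromosome (0 if they touch or overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def features_near(site: Feature, features: list[Feature],
                  window: int = WINDOW_BP) -> list[Feature]:
    """Keep features on the same chromosome whose nearest edge lies within `window` bp."""
    return [f for f in features
            if f.chrom == site.chrom and gap(site, f) <= window]


# Toy coordinates, invented purely for illustration.
site = Feature("query_CTCFBS", "chr1", 1_000_000, 1_000_020)
genes = [
    Feature("geneA", "chr1", 950_000, 960_000),      # 40,000 bp upstream
    Feature("geneB", "chr1", 1_150_000, 1_160_000),  # 149,980 bp downstream
    Feature("geneC", "chr2", 1_000_000, 1_010_000),  # different chromosome
    Feature("geneD", "chr1", 1_100_020, 1_110_000),  # exactly 100,000 bp downstream
]
nearby = features_near(site, genes)
print([g.name for g in nearby])  # ['geneA', 'geneD']
```

Measuring from the site's midpoint instead of its edges would change which borderline genes are kept; the text does not say which convention the database uses.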
We use different colors to designate the different sources of CTCF binding sites: yellow for INSUL_MAN, blue for INSUL_OHL, green for INSUL_REN and black for INSUL_PRE.

H3K4 trimethylation (H3K4me3) and H3K27 trimethylation (H3K27me3) were found to be a pair of "Yin-Yang" modifications: high levels of H3K4me3 and H3K27me3 mark gene activation and silencing, respectively [12]. It was also found that CTCF binding sites may demarcate the boundaries of chromatin domains defined by histone methylation [12]. We integrated the H3K4me3 and H3K27me3 maps with our CTCFBS map in the genome browser to help users generate hypotheses about the functions of CTCF binding sites.

Comparative genomic studies on human, mouse and rat may provide clues to the evolutionary dynamics of CTCFBS. To this end, we built the mammalian orthologous region track. The region encapsulating insulators and their flanking genes in the reference genome was used to retrieve orthologous regions in the other two genomes from the UCSC precomputed block chains. Only the DNA block with the maximal alignment score against the query region was retained as the orthologous region. The aligned genomic sequences in up to 16 vertebrate genomes that are orthologous to the query CTCFBS sequence can be displayed by clicking the "view alignment" button.

4. In silico CTCFBS prediction tool

CTCF uses different combinations of its zinc fingers to recognize divergent DNA sequences. Recent studies have identified core motifs for CTCFBS sequences, and the motifs are represented by position weight matrices (PWMs). Altogether, four closely related PWMs have been derived to accommodate the divergence of CTCFBS sequences [8,9]. We offer users a simple web tool to search for CTCFBS core motifs in a query sequence. We use the STORM program [13] with each of the four PWMs to report the best hit in the query sequence.
The PWM score corresponds to the log-odds of the observed sequence being generated by the motif versus being generated by the background. A large positive score therefore suggests a good match; as a rule of thumb, a sequence with a PWM score > 3.0 is a suggestive match.
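To make the log-odds score concrete, here is a minimal Python sketch of scoring every window of a query sequence against a PWM and reporting the best hit. It is not the STORM implementation and does not use the database's CTCF matrices: the three-position motif, the uniform background, the natural-log base and the forward-strand-only scan are all assumptions chosen to keep the example small.

```python
import math

# Assumed uniform background; a real analysis would use genome-specific frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds_matrix(probs, background=BACKGROUND):
    """Turn per-position base probabilities into log-odds scores.

    Natural log is an assumption; probabilities must be non-zero
    (real matrices usually get pseudocounts first).
    """
    return [{b: math.log(col[b] / background[b]) for b in "ACGT"} for col in probs]


def best_hit(sequence, pwm):
    """Scan the forward strand and return (score, offset) of the best window, or None."""
    seq = sequence.upper()
    width = len(pwm)
    best = None
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(col[base] for col, base in zip(pwm, window))
        if best is None or score > best[0]:
            best = (score, i)
    return best


# Toy motif preferring "ACG"; NOT one of the four CTCF PWMs.
toy_probs = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
toy_pwm = log_odds_matrix(toy_probs)
hit = best_hit("TTACGTT", toy_pwm)
print(hit)  # best window is "ACG" at offset 2
```

Each matching position contributes a positive term and each mismatch a negative one, which is why the summed score rises with the quality of the match. A fuller scan would also score the reverse complement of each window.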
All computationally predicted data can be downloaded from the database.

To maintain an up-to-date resource, we encourage researchers to submit their insulator data to CTCFBSDB. Data can be submitted directly using our web submission interface. Submitted data will be manually checked and added to the database.
1. Ohlsson, R., Renkawitz, R. and Lobanenkov, V. (2001) CTCF is a uniquely versatile transcription regulator linked to epigenetics and disease. Trends Genet, 17, 520-527.