- modENCODE
modENCODE is run as a Research Network, and the consortium is formed by 11 primary projects, divided between worm and fly, spanning the domains of gene structure, mRNA and ncRNA expression profiling, transcription factor binding sites, histone modifications and replacement, chromatin structure, DNA replication initiation and timing, and copy number variation.
- ENCODE
The ENCODE (Encyclopedia of DNA Elements) Consortium is an international collaboration of research groups funded by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to build a comprehensive parts list of functional elements in the human genome, including elements that act at the protein and RNA levels, and regulatory elements that control the cells and circumstances in which a gene is active.
- UCSC
The UCSC Genome Browser is an online genome browser hosted by the University of California, Santa Cruz (UCSC). It is an interactive website offering access to genome sequence data from a variety of vertebrate and invertebrate species and major model organisms, integrated with a large collection of aligned annotations.
- Basic analysis
The ChIP experimental designs were extracted from the original data sources. Brief descriptions of each TF (species name, gene ID, gene name, GO terms) were retrieved from the Ensembl database using BioMart. The length distribution of ChIP peaks and their distances to the TSS were calculated using a custom R script.
- Putative target gene assignment
The ChIP peaks were compared with the coordinates of gene bodies using a custom Perl script. In total, 5 methods were used to assign target genes: 'physical overlapping', 'nearest gene' and 'neighbor overlapping' at three window sizes (1 kb, 10 kb and 100 kb).
- Co-binding analysis
To identify potential co-binding of TFs, we extracted experimentally confirmed TFBSs from RedFly and overlapped them with ChIP peaks.
- Motif enrichment scanning
The presence of known motifs in TF ChIP peaks was identified using PWMEnrich with default parameters, and de novo motif discovery was performed using findMotifsGenome.pl from HOMER.
- The length distribution of ChIP peaks
- The frequency distribution of distances between ChIP peaks and TSSs
- The proportions of ChIP peaks associated with promoter, 5' UTR, 3' UTR, exon, intron, downstream and intergenic regions
- The overlap between ChIP peaks and experimentally verified enhancers in the RedFly database
- Enriched known motifs in ChIP peaks
- Enrichment of de novo motifs in ChIP peaks
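Several of the analyses described above (peak length and TSS-distance distributions, the five target-gene assignment rules, and overlap with RedFly enhancers) reduce to simple interval arithmetic on genomic coordinates. The following is a minimal Python sketch of that arithmetic, not the custom R and Perl scripts used to build TFBSbank. It assumes 0-based half-open coordinates, uses the peak midpoint as the reference point for TSS distances, and interprets 'neighbor overlapping' as overlap with the gene body extended by 1 kb, 10 kb or 100 kb on each side. All class names, function names and example coordinates are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

# Assumption: coordinates are 0-based and half-open, i.e. [start, end).

@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self) -> int:
        # The TSS is the 5' end of the gene body on the gene's own strand.
        return self.start if self.strand == "+" else self.end - 1

@dataclass(frozen=True)
class Peak:
    chrom: str
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start

    @property
    def midpoint(self) -> int:
        # Assumption: the midpoint stands in for the peak summit.
        return (self.start + self.end) // 2

def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if [a_start, a_end) and [b_start, b_end) share at least one base."""
    return a_start < b_end and b_start < a_end

def signed_tss_distance(peak: Peak, gene: Gene) -> int:
    """Peak midpoint minus TSS, oriented by strand (negative = upstream)."""
    d = peak.midpoint - gene.tss
    return d if gene.strand == "+" else -d

def nearest_gene(peak: Peak, genes: list[Gene]) -> Optional[Gene]:
    """Gene on the same chromosome whose TSS is closest to the peak midpoint."""
    same_chrom = [g for g in genes if g.chrom == peak.chrom]
    return min(same_chrom, key=lambda g: abs(peak.midpoint - g.tss), default=None)

def genes_overlapping(peak: Peak, genes: list[Gene], flank: int = 0) -> list[str]:
    """Genes whose body, extended by `flank` bp on each side, overlaps the peak."""
    return [g.name for g in genes
            if g.chrom == peak.chrom
            and overlaps(peak.start, peak.end, g.start - flank, g.end + flank)]

def assign_targets(peak: Peak, genes: list[Gene]) -> dict[str, list[str]]:
    """Five rules: physical overlap, nearest gene, and 1/10/100 kb neighbor windows."""
    nearest = nearest_gene(peak, genes)
    targets = {
        "physical overlapping": genes_overlapping(peak, genes),
        "nearest gene": [nearest.name] if nearest is not None else [],
    }
    for flank in (1_000, 10_000, 100_000):
        key = f"neighbor overlapping ({flank // 1_000}kb)"
        targets[key] = genes_overlapping(peak, genes, flank)
    return targets

def overlapping_enhancers(peak: Peak, enhancers: list[tuple[str, int, int, str]]) -> list[str]:
    """Names of enhancers, given as (chrom, start, end, name), that overlap the peak."""
    return [name for chrom, start, end, name in enhancers
            if chrom == peak.chrom and overlaps(peak.start, peak.end, start, end)]

def length_histogram(peaks: list[Peak], bin_size: int) -> dict[int, int]:
    """Number of peaks per length bin, keyed by the lower edge of each bin."""
    counts = Counter(p.length // bin_size * bin_size for p in peaks)
    return dict(sorted(counts.items()))

# Hypothetical example data (not taken from TFBSbank).
genes = [
    Gene("geneA", "chr2L", 5_000, 8_000, "+"),
    Gene("geneB", "chr2L", 20_000, 25_000, "-"),
]
enhancers = [("chr2L", 4_000, 4_600, "enhancer_1")]
peak = Peak("chr2L", 4_500, 4_900)

print(peak.length, signed_tss_distance(peak, genes[0]))  # 400 -300
print(assign_targets(peak, genes))
print(overlapping_enhancers(peak, enhancers))  # ['enhancer_1']
```

With these made-up coordinates the peak sits 300 bp upstream of the geneA TSS without touching its body, so 'physical overlapping' returns nothing, while geneA is the nearest gene and is picked up by the 1 kb and 10 kb windows; geneB only appears in the 100 kb window. This illustrates why the wider neighbor windows yield larger, less specific candidate target lists.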
1. A comprehensive functional annotation database for TF binding in human and model species. In total, 1870 ChIP data sets of 617 TFs from 5 species (human, mouse, fly, worm and yeast) are included in TFBSbank.
2. Our database predicts putative cofactors/collaborators and enriched motifs based on ChIP data, as these are crucial for the in vivo functions of TFs.
Genome-wide transcription factor (TF) binding data have been generated extensively in the past few years, posing a great challenge to our currently limited capacity to interpret such data. Comprehensive, dedicated functional annotation databases for TF-DNA interactions are therefore in great demand to manage, explore and utilize these invaluable data resources. Here, we constructed a state-of-the-art, user-friendly platform, ‘TFBSbank’, which houses the annotation of 1870 ChIP data sets of 585 TFs in 5 species (human, mouse, fly, worm and yeast). TFBSbank has 5 main functional modules, aimed at characterizing chromatin immunoprecipitation (ChIP) peaks, identifying putative targets, predicting TF-responsive enhancers, revealing potential cofactors/collaborators and discovering enriched TF motifs. TFBSbank has two distinctive features compared with existing databases. First, it provides putative cofactor/collaborator analysis (for Drosophila melanogaster), as cofactors are crucial for the in vivo functions of TFs. Second, it predicts the enrichment of both known and de novo motifs based on ChIP data. TFBSbank is freely accessible at http://tfbsbank.co.uk