Close
Help




JOURNAL

Bioinformatics and Biology Insights

An Empirical Prior Improves Accuracy for Bayesian Estimation of Transcription Factor Binding Site Frequencies within Gene Promoters

Submit a Paper


Bioinformatics and Biology Insights 2015:Suppl. 4 59-69

Methodology

Published on 25 Oct 2016

DOI: 10.4137/BBI.S29330


Further metadata provided in PDF



Sign up for email alerts to receive notifications of new articles published in Bioinformatics and Biology Insights

Abstract

A Bayesian method for sampling from the distribution of matches to a precompiled transcription factor binding site (TFBS) sequence pattern (conditioned on an observed nucleotide sequence and the sequence pattern) is described. The method takes a position frequency matrix as input for a set of representative binding sites for a transcription factor and two sets of noncoding, 5′ regulatory sequences for gene sets that are to be compared. An empirical prior on the frequency A (per base pair of gene-vicinal, noncoding DNA) of TFBSs is developed using data from the ENCODE project and incorporated into the method. In addition, a probabilistic model for binding site occurrences conditioned on λ is developed analytically, taking into account the finite-width effects of binding sites. The count of TFBS β (conditioned on the observed sequence) is sampled using Metropolis–Hastings with an information entropy-based move generator. The derivation of the method is presented in a step-by-step fashion, starting from specific conditional independence assumptions. Empirical results show that the newly proposed prior on β improves accuracy for estimating the number of TFBS within a set of promoter sequences.



Downloads

PDF  (1.51 MB PDF FORMAT)

RIS citation   (ENDNOTE, REFERENCE MANAGER, PROCITE, REFWORKS)

BibTex citation   (BIBDESK, LATEX)

XML




Quick Links


New article and journal news notification services