Additional material to manuscript:
Local Rényi entropic profiles of DNA sequences
Susana Vinga (a,b,*) and Jonas S. Almeida (c,d)
BMC Bioinformatics (2007), 8:393 (Oct 16)
a Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento (INESC-ID), R. Alves Redol 9, 1000-029 Lisboa, Portugal
b Departamento de Bioestatística e Informática, Faculdade de Ciências Médicas – Universidade Nova de Lisboa (FCM/UNL), Campo dos Mártires da Pátria 130, 1169-056 Lisboa, Portugal
c Dept Biostatistics and Applied Mathematics, Univ. Texas MD Anderson Cancer Center - unit 447, 1515 Holcombe Blvd, Houston TX 77030-4009, USA
d Biomathematics Group, Instituto de Tecnologia Química e Biológica – Universidade Nova de Lisboa (ITQB/UNL), R. Qta. Grande 6, 2780-156 Oeiras, Portugal
*Corresponding author
NOTE: This is an old implementation of the algorithm, kept here just for archaeological purposes. Please visit the new webpage for an efficient version of the entropic profiler.
Download text files (in FASTA format) with all the DNA sequences used in this study [the first (shortest) four are in seq.zip].
Sequence name | Brief description |
---|---|
m3 | random with inserted motif L=3 'ATC' |
m4 | random with inserted motif L=4 'ATCG' |
m5 | random with inserted motif L=5 'ATCGA' |
Es | experimental promoter regions of B. subtilis (see the paper for a full description) |
Ecoli | Escherichia coli K12, complete genome [GenBank:NC_000913], 4,639,675 bp, sense of replication |
Hinf | Haemophilus influenzae Rd KW20, complete genome [GenBank:NC_000907], 1,830,138 bp, sense of replication |
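For readers who want a test sequence similar to m3, m4 or m5, the following Python sketch builds a random DNA string with a motif written into it and formats it as FASTA. It is only an illustration and is not part of the toolbox: the function names, the number of inserted copies (20) and the uniform base composition are assumptions, since the table above only states that the sequences are random with an inserted motif.

```python
import random


def random_dna_with_motif(length, motif, n_copies, seed=0):
    """Return (sequence, positions): a uniform random DNA string with
    `motif` written over it at `n_copies` non-overlapping random positions.
    The insertion scheme is an assumption made for illustration."""
    rng = random.Random(seed)
    seq = [rng.choice("ACGT") for _ in range(length)]
    positions = []
    while len(positions) < n_copies:
        start = rng.randrange(0, length - len(motif) + 1)
        # Reject starts that would overlap an already placed copy.
        if all(abs(start - p) >= len(motif) for p in positions):
            seq[start:start + len(motif)] = motif
            positions.append(start)
    return "".join(seq), sorted(positions)


def to_fasta(name, seq, width=70):
    """Format a sequence as FASTA text with a fixed line width."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# 2000 nucleotides, matching the length reported for m4 in the log below.
sequence, starts = random_dna_with_motif(2000, "ATCG", n_copies=20, seed=1)
fasta_text = to_fasta("m4_like", sequence)
```

Fixing the seed makes the sequence reproducible, which helps when comparing profiles across runs.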
Current version 2 (Aug 27, 2007). Future upgrades will be posted here.
See an application example and look at the functions' help. (NOTE: since these files were automatically generated, some graphs appear different from those in the manuscript.)
Download all the MATLAB functions (m-code) in entropicprofile.zip, which includes the following files:
File name | Brief description |
---|---|
readfasta.m | Reads sequences from FASTA format files to struct MATLAB variables |
count_repeat.m | Counts L-tuple repetitions for each position in input DNA sequences |
fill_kernel3D | Calculates probability density estimation matrix (KM) with fractal kernel. Calls kernel_analytical.m |
kernel_analytical.m | Closed form for fractal kernel calculation |
normKM.m | Normalizes KM estimations. |
find_scale.m | Finds the scale where maxima and minima of KM occur |
local_study.m | Analyses specific user defined position/symbol |
vinga_entropic.m | Main function - calls all others |
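The first two steps in the table (reading FASTA and counting L-tuple repetitions per position) can be sketched in Python as follows. This is not the code of readfasta.m or count_repeat.m: the function names are hypothetical, and the convention used here (for each position, count how often the L-tuple ending at that position occurs in the whole sequence) is an assumption about what "for each position" means.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a dict {header: sequence}."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {k: "".join(v) for k, v in records.items()}


def count_ltuples_per_position(seq, L):
    """For each position i, count occurrences in `seq` of the L-tuple
    ending at i. Positions with fewer than L symbols up to i get 0."""
    tuples = Counter(seq[i:i + L] for i in range(len(seq) - L + 1))
    counts = [0] * len(seq)
    for i in range(L - 1, len(seq)):
        counts[i] = tuples[seq[i - L + 1:i + 1]]
    return counts


example = ">toy\nATCGATCG\nAT\n"
seq = read_fasta(example)["toy"]
counts = count_ltuples_per_position(seq, 3)
```

Counting all L-tuples once with a `Counter` and then looking each position up keeps the work linear in the sequence length for a fixed L.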
For example, to run vinga_entropic('m4.seq',6,12,253) with the compiled executable, one would use, at the command line (this might take a couple of minutes):
vinga_entropic m4.seq 6 12 253
nucleotides: 2000
Calculating Kernel distribution function ...
Forward counts...
Backward counts...
save results also as XML: m4_kernel3D_cf.xml
saving plot of density as a function of phi in png and pdf format
saving plot of density as a function of position with N variable in png and pdf format
saving plot of density as a function of position for maximum values of KMn in png and pdf format
saving plot of density as a function of position detail (MAX) in png and pdf format
saving plot of density as a function of position detail (MIN) in png and pdf format
The detailed help information for this function is provided below.
Please note that the executable binary file produces several export files: the figures in pdf and png formats, and the numerical results of the calculations as XML.
To study another position in the sequence, for example 380, there is no need to repeat all the calculations (this takes a couple of seconds):
vinga_entropic m4.seq 6 12 380
intermediate results were found (m4_kernel3D_cf.mat) for a sequence with the same name and will now be retrieved.
saving plot of density as a function of phi in png and pdf format
saving plot of density as a function of position with N variable in png and pdf format
saving plot of density as a function of position for maximum values of KMn in png and pdf format
saving plot of density as a function of position detail (MAX) in png and pdf format
saving plot of density as a function of position detail (MIN) in png and pdf format
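The reuse of intermediate results seen in the second run can be illustrated with a small Python sketch. It only mimics the behaviour visible in the log (a results file named after the sequence file is loaded if it exists); the helper `cached_compute`, the `.pkl` extension and the `expensive_step` stand-in are hypothetical and do not reproduce the toolbox's .mat handling.

```python
import os
import pickle
import tempfile


def cached_compute(fasta_path, compute, cache_dir="."):
    """Return (result, from_cache). The cache file name is derived from the
    FASTA file name only, mirroring the m4.seq -> m4_kernel3D_cf pattern."""
    base = os.path.splitext(os.path.basename(fasta_path))[0]
    cache_path = os.path.join(cache_dir, base + "_kernel3D_cf.pkl")
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh), True
    result = compute(fasta_path)
    with open(cache_path, "wb") as fh:
        pickle.dump(result, fh)
    return result, False


calls = []


def expensive_step(path):
    """Stand-in for the slow kernel calculation (placeholder values)."""
    calls.append(path)
    return {"source": path, "density": [0.1, 0.2, 0.3]}


with tempfile.TemporaryDirectory() as tmp:
    first, hit1 = cached_compute("m4.seq", expensive_step, cache_dir=tmp)
    second, hit2 = cached_compute("m4.seq", expensive_step, cache_dir=tmp)
```

A cache keyed on the file name alone would not notice a changed sequence or changed N and phi. This page does not say whether the toolbox checks for that, so removing the intermediate file before rerunning with different settings is the cautious choice.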
help on vinga_entropic:
VINGA_ENTROPIC is the main function of the toolbox for processing sequences
EXAMPLE: vinga_entropic('m4.seq',6,12,253)
NOTE: The compiled version takes char inputs, therefore the type conversion is
checked and corrected if needed
Syntax: vinga_entropic(fastafile,N,phi,position)
Description: this is the main function that calls all others and is the
right function to compile for sequence processing. The numerical results,
figures and intermediate calculations will be stored with names built
from the fasta file name. For example, if the fastafile is m4.seq, all
graphs and figures will be saved as m4_*.pdf and m4_*.png.
****Input arguments:
fastafile - text file with sequence to be analyzed (in FASTA format)
N - kernel resolution parameter
phi - kernel smoothing parameter
position - local study of particular symbol in the original sequence
****Output:
figure files with all the graphical results (see webpage)
XML page with numerical data
-----------------------------------------------------------------
Authors: Susana Vinga and Jonas S Almeida
Reference: "Local Rényi entropic profiles of DNA sequences"
BMC Bioinformatics (submitted)
Version: 2007.08.27
Webpage: http://algos.inesc-id.pt/~svinga/ep/
-----------------------------------------------------------------
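Two points from the help text lend themselves to a short illustration: the compiled version receives all arguments as character strings and converts them, and output names are built from the FASTA file name. The Python sketch below shows one way to do both. The helper names are hypothetical, and treating N and position as integers and phi as a real number is an assumption, since the help text does not state the types.

```python
import os


def parse_args(fastafile, N, phi, position):
    """Accept numbers or their string forms (as a command line would pass
    them) and return typed values. The chosen types are assumptions."""
    return str(fastafile), int(N), float(phi), int(position)


def output_prefix(fastafile):
    """Derive the prefix used for output names, e.g. m4.seq -> m4."""
    return os.path.splitext(os.path.basename(fastafile))[0]


# Command-line style (all strings) and function-call style (numbers)
# should lead to the same typed arguments.
args_cli = parse_args("m4.seq", "6", "12", "253")
args_fn = parse_args("m4.seq", 6, 12, 253)
prefix = output_prefix("m4.seq")
```

With this prefix, figure names follow the m4_*.pdf and m4_*.png pattern described in the help text.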
Suggestions & Comments:
svinga at algos inesc-id pt
Created: 2007 Jan 19 -- Last update: 2011 Jul 20