Additional material to manuscript:


Local Rényi entropic profiles of DNA sequences
Susana Vinga (a,b,*) and Jonas S. Almeida (c,d)

BMC Bioinformatics (2007), 8:393 (Oct 16)

a Instituto de Engenharia de Sistemas e Computadores: Investigação e Desenvolvimento (INESC-ID), R. Alves Redol 9, 1000-029 Lisboa, Portugal
b Departamento de Bioestatística e Informática, Faculdade de Ciências Médicas – Universidade Nova de Lisboa (FCM/UNL), Campo dos Mártires da Pátria 130, 1169-056 Lisboa, Portugal
c Dept Biostatistics and Applied Mathematics, Univ. Texas MDAnderson Cancer Center - unit 447, 1515 Holcombe Blvd, Houston TX 77030-4009, USA
d Biomathematics Group, Instituto de Tecnologia Química e Biológica – Universidade Nova de Lisboa (ITQB/UNL), R. Qta. Grande 6, 2780-156 Oeiras, Portugal

E-mail addresses: svinga at algos inesc-id pt (SV), jalmeida at mdanderson org (JSA).

*Corresponding author


 

NOTE: This is an old implementation of the algorithm, kept here for archival purposes only. Please visit the new webpage, which hosts an efficient version of the entropic profiler.


DNA Datasets

All the DNA sequences used in this study can be downloaded as text files in FASTA format; the first four (the shortest) are bundled in seq.zip.

 

Sequence name   Brief description
m3              random with inserted motif L=3 'ATC'
m4              random with inserted motif L=4 'ATCG'
m5              random with inserted motif L=5 'ATCGA'
Es              experimental promoter regions of B. subtilis - see paper for full description
Ecoli           Escherichia coli K12, complete genome [GenBank:NC_000913], 4,639,675 bp, sense of replication
Hinf            Haemophilus influenzae Rd KW20, complete genome [GenBank:NC_000907], 1,830,138 bp, sense of replication
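
Since all the datasets above are plain FASTA text, a minimal sketch of loading one into a MATLAB struct may help when inspecting them outside the toolbox. The function name read_fasta_sketch and the field names header and sequence are assumptions made for this illustration; this is not the toolbox's readfasta.m, whose exact output layout is not documented on this page.

```matlab
function seqs = read_fasta_sketch(filename)
% READ_FASTA_SKETCH  Illustrative FASTA reader (hypothetical helper,
% not the readfasta.m shipped with the toolbox).
% Returns a struct array with fields 'header' and 'sequence'.
    fid = fopen(filename, 'r');
    if fid == -1
        error('read_fasta_sketch:open', 'Cannot open file %s', filename);
    end
    cleaner = onCleanup(@() fclose(fid));    % close the file even on error
    seqs = struct('header', {}, 'sequence', {});
    txt = fgetl(fid);
    while ischar(txt)                        % fgetl returns -1 at end of file
        txt = strtrim(txt);
        if isempty(txt)
            % skip blank lines
        elseif txt(1) == '>'
            seqs(end+1).header = txt(2:end); % '>' starts a new record
            seqs(end).sequence = '';
        elseif ~isempty(seqs)
            % sequence lines are concatenated and upper-cased
            seqs(end).sequence = [seqs(end).sequence upper(txt)];
        end
        txt = fgetl(fid);
    end
end
```

Under these assumptions, s = read_fasta_sketch('m4.seq') would give the nucleotide string in s(1).sequence.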

 

MATLAB source code

Current version 2 (Aug 27, 2007). Future upgrades will be posted here.

See an application example and the help text of each function. (NOTE: because these files were generated automatically, some graphs appear differently from those in the manuscript.)

All the MATLAB m-code functions can be downloaded as entropicprofile.zip, which includes the following files:

 

File name             Brief description
readfasta.m           Reads sequences from FASTA format files to struct MATLAB variables
count_repeat.m        Counts L-tuple repetitions for each position in input DNA sequences
fill_kernel3D         Calculates probability density estimation matrix (KM) with fractal kernel. Calls kernel_analytical.m
kernel_analytical.m   Closed form for fractal kernel calculation
normKM.m              Normalizes KM estimations
find_scale.m          Finds the scale where maxima and minima of KM occur
local_study.m         Analyses specific user defined position/symbol
vinga_entropic.m      Main function - calls all others
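
To make "L-tuple repetitions for each position" concrete, here is a minimal sketch of one plausible reading: for every start position i, count how often the L-tuple beginning at i occurs anywhere in the sequence (overlaps allowed). The function name count_ltuples_sketch is hypothetical, and the toolbox's count_repeat.m may define or orient its counts differently.

```matlab
function counts = count_ltuples_sketch(seq, L)
% COUNT_LTUPLES_SKETCH  Illustrative only (not the toolbox's count_repeat.m).
% counts(i) = number of occurrences in seq of the L-tuple seq(i:i+L-1).
    n = numel(seq) - L + 1;                  % number of L-tuple start positions
    if n < 1
        counts = zeros(1, 0);
        return
    end
    tuples = cell(1, n);
    for i = 1:n
        tuples{i} = seq(i:i+L-1);
    end
    [~, ~, idx] = unique(tuples);            % idx(i): label of the tuple at position i
    freq = accumarray(idx(:), 1);            % occurrences of each distinct tuple
    counts = reshape(freq(idx), 1, []);      % map the frequencies back to positions
end
```

For example, count_ltuples_sketch('ATCGATCA', 3) returns [2 1 1 1 2 1], because 'ATC' starts at positions 1 and 5 and every other 3-tuple occurs once.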
 

Binary files

Stand-alone executables are provided for Windows, Mac and Linux. Each environment requires a runtime application (MCR) to be downloaded and installed. Even if you have MATLAB installed, you will most likely still need to install the appropriate MCR: the stand-alone executables were produced for MCR 2006a, so unless your MATLAB version is 2006a you must also install the MCR listed below for your operating system. No MathWorks/MATLAB licenses are needed.
  1. WINDOWS: first install MCRinstaller 2006a for Windows and then unzip the files in renyi_bin_win.zip.
  2. LINUX: first install MCRinstaller 2006a for Linux and then unzip the files in renyi_bin_lin.zip. (coming soon)
  3. MACINTOSH: first install MCRinstaller 2006a for Macintosh and then unzip the files in renyi_bin_mac.zip. (coming soon)
The executable is run like the m-code function vinga_entropic.m, except that the input arguments are passed space-delimited on the command line. For example, instead of using:

  vinga_entropic('m4.seq',6,12,253)

one would use, at the command line (this might take a couple of minutes):

  vinga_entropic m4.seq 6 12 253

nucleotides: 2000
Calculating Kernel distribution function ...
Forward counts...
Backward counts...
save results also as XML: m4_kernel3D_cf.xml
saving plot of density as a function of phi in png and pdf format
saving plot of density as a function of position with N variable in png and pdf format
saving plot of density as a function of position for maximum values of KMn in png and pdf format
saving plot of density as a function of position detail (MAX) in png and pdf for mat
saving plot of density as a function of position detail (MIN) in png and pdf for mat
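
The mapping between the two call styles above is mechanical, so it can be sketched as a small helper that turns the m-code arguments into the space-delimited command line. The helper name build_cli_call is hypothetical and not part of the toolbox; the sketch also assumes the file name contains no spaces.

```matlab
function cmd = build_cli_call(fastafile, N, phi, position)
% BUILD_CLI_CALL  Hypothetical helper: format the arguments of
% vinga_entropic(fastafile,N,phi,position) as the equivalent command line.
    % num2str prints integer values without exponent notation, which matters
    % for large genome positions
    cmd = sprintf('vinga_entropic %s %s %s %s', fastafile, ...
                  num2str(N), num2str(phi), num2str(position));
end
```

With the values used above, build_cli_call('m4.seq', 6, 12, 253) returns 'vinga_entropic m4.seq 6 12 253'. Assuming the executable is on the system path, the string could then be passed to MATLAB's system function.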

Detailed help for this function is provided below.
Note that the executable writes several export files: the figures in pdf and png formats, and the numerical results of the calculations as an XML file.

To study another position in the sequence, for example 380, there is no need to repeat all the calculations (this takes only a couple of seconds):

  vinga_entropic m4.seq 6 12 380

intermediate results were found (m4_kernel3D_cf.mat) for a sequence with the same name and will now be retrieved.
saving plot of density as a function of phi in png and pdf format
saving plot of density as a function of position with N variable in png and pdf format
saving plot of density as a function of position for maximum values of KMn in png and pdf format
saving plot of density as a function of position detail (MAX) in png and pdf for mat
saving plot of density as a function of position detail (MIN) in png and pdf for mat
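
The reuse of intermediate results reported in the log above follows a common compute-once pattern, sketched below. Only the file name pattern (m4_kernel3D_cf.mat) comes from the log; the function name cached_kernel_sketch, the variable name KM inside the .mat file, and the computeFcn placeholder for the expensive kernel estimation are assumptions for illustration, not the toolbox's actual code.

```matlab
function KM = cached_kernel_sketch(fastafile, computeFcn)
% CACHED_KERNEL_SKETCH  Hypothetical illustration of reusing stored results.
% computeFcn is a function handle standing in for the slow calculation.
    [~, prefix] = fileparts(fastafile);        % 'm4.seq' -> 'm4'
    matfile = [prefix '_kernel3D_cf.mat'];     % name pattern seen in the log
    if exist(matfile, 'file') == 2
        stored = load(matfile, 'KM');          % fast path: retrieve stored results
        KM = stored.KM;
    else
        KM = computeFcn();                     % slow path: full calculation
        save(matfile, 'KM');                   % keep for later positions
    end
end
```

In this sketch the cache is keyed on the sequence file name only, which matches the log message ("a sequence with the same name"); a stale .mat file would therefore need to be deleted by hand if the sequence content changed.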

help on vinga_entropic:

VINGA_ENTROPIC is the main function of the toolbox for processing sequences
Syntax: vinga_entropic(fastafile,N,phi,position)
Description: this is the main function that calls all others and is the right function to compile for sequence processing. The numerical results, figures and intermediate calculations will be stored with names built from the fasta file name. For example, if the fastafile is m4.seq, all graphs and figures will be saved as m4_*.pdf and m4_*.png.
****Input arguments:
fastafile - text file with sequence to be analyzed (in FASTA format)
N - kernel resolution parameter
phi - kernel smoothing parameter
position - local study of particular symbol in the original sequence
****Output:
figure files with all the graphical results (see webpage)
XML page with numerical data

EXAMPLE: vinga_entropic('m4.seq',6,12,253)

NOTE: The compiled version takes char inputs, therefore the type conversion is checked and corrected if needed
-----------------------------------------------------------------
Authors: Susana Vinga and Jonas S Almeida
Reference: "Local Rényi entropic profiles of DNA sequences"
BMC Bioinformatics (submitted)
Version: 2007.08.27
Webpage: http://algos.inesc-id.pt/~svinga/ep/
-----------------------------------------------------------------
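
The NOTE in the help text says that the compiled version receives char inputs and converts them when needed. A minimal sketch of such a check is shown below; the function names coerce_inputs_sketch and to_number are hypothetical, and the actual validation inside vinga_entropic.m may differ.

```matlab
function [N, phi, position] = coerce_inputs_sketch(N, phi, position)
% COERCE_INPUTS_SKETCH  Hypothetical illustration: accept each parameter
% either as a number (m-code call) or as char (compiled, command-line call).
    N = to_number(N);
    phi = to_number(phi);
    position = to_number(position);
end

function x = to_number(x)
% Convert char input to double and reject anything that is not a numeric scalar.
    if ischar(x)
        x = str2double(x);     % '12' -> 12; non-numeric text -> NaN
    end
    if ~isnumeric(x) || ~isscalar(x) || isnan(x)
        error('coerce_inputs_sketch:badInput', 'Expected a numeric scalar.');
    end
end
```

With this sketch, coerce_inputs_sketch('6', '12', 253) and coerce_inputs_sketch(6, 12, 253) yield the same numeric values, so one code path can serve both the m-code and the compiled call.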

Links

[Renyi continuous entropy][Alfréd Rényi's Biography][MATLAB site]


Suggestions & Comments: svinga at algos inesc-id pt
Created: 2007 Jan 19 -- Last update: 2011 Jul 20