NASC - Non-Aligned Sequence Comparison
Matlab® companion toolbox to the paper:
Alignment-free sequence comparison – a review
Bioinformatics Vol. 19 no. 4 (2003), pages 513-523
1 Biomathematics Group, ITQB – Univ. Nova Lisboa, P.O. Box 127, 2780-156 Oeiras, Portugal. E-mail: svinga@itqb.unl.pt
2 Dept Biometry & Epidemiology, Medical Univ. South Carolina, 135 Cannon Street, Suite 303, P.O. Box 250835, Charleston, SC 29425, USA. E-mail: almeidaj@musc.edu
The downloadable zip file includes the manual with examples and all Matlab functions used.
Current version: 4.21 (April 2003). Future upgrades will be posted here. A more efficient implementation of the Mahalanobis distance is currently being prepared.
See the full description in MANUAL.doc and in each function's help text.
Download NASC_Toolbox v4.21 zip file.
File name | Brief description |
---|---|
MANUAL.doc | Toolbox Manual with examples. |
EU.txt | Natural language example in correct file format. [See HTML] |
HUMHBB.txt | Protein example. Translations of the Human beta globin region on chromosome 11. [NCBI gi:455025]. [See HTML] |
thrABC.txt | DNA example. E. coli K12 threonine operon. [See HTML] |
cgcria.m | Reads text file, transforms symbol to number, creates USM coordinates. |
freqseq.m | Calculates counts and frequencies of L-tuples (or L-words) in sequences extracted previously from file. |
overlap.m | Calculates overlap capability of words present. |
word_var.m | Variances of L-tuple counts. |
word_cov.m | Covariances of L-tuple counts. |
distance.m | Calculates different metrics on sequences (see references for distance definitions). |
nasc.m | Calls all previous functions. |
plotdistance.m | Plots all types of distances between chosen sequences. |
classif.m | Final sequence classification and dendrogram construction (Cluster Analysis). |
crossd.m | USM cross distances calculations (not yet optimized); see also bUSM (boolean USM) toolbox. |
ang.m | Auxiliary function. Angle between vectors in Euclidean space. |
h_rel.m | Auxiliary function. Relative entropy between vectors. |
isquareform.m | Auxiliary function. Matrix operations. |
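
The word-frequency approach summarized in the table (L-tuple frequencies, followed by distances such as the Euclidean distance, the angle between vectors, and relative entropy) can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the toolbox's Matlab code: the function names `word_frequencies`, `euclidean`, `angle` and `relative_entropy` are hypothetical, and the toolbox may use different normalizations or conventions. See MANUAL.doc for the actual definitions.

```python
from collections import Counter
from itertools import product
from math import acos, log2, sqrt


def word_frequencies(seq, L, alphabet):
    """Relative frequencies of overlapping L-tuples, in a fixed word order.

    Hypothetical helper for illustration; words containing symbols
    outside `alphabet` are ignored.
    """
    words = ["".join(p) for p in product(alphabet, repeat=L)]
    counts = Counter(seq[i:i + L] for i in range(len(seq) - L + 1))
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no L-tuples over the alphabet")
    return [counts[w] / total for w in words]


def euclidean(f, g):
    """Euclidean distance between two frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def angle(f, g):
    """Angle (radians) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(f, g))
    norm = sqrt(sum(a * a for a in f)) * sqrt(sum(b * b for b in g))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return acos(max(-1.0, min(1.0, dot / norm)))


def relative_entropy(f, g):
    """Relative entropy of f with respect to g, in bits.

    Assumes g is non-zero wherever f is non-zero; otherwise a
    ZeroDivisionError is raised.
    """
    return sum(a * log2(a / b) for a, b in zip(f, g) if a > 0)


# Two toy DNA sequences compared through their 2-tuple frequencies.
fx = word_frequencies("ATATAC", 2, "ACGT")
fy = word_frequencies("ATATAT", 2, "ACGT")
print(euclidean(fx, fy), angle(fx, fy), relative_entropy(fy, fx))
```

Note that relative entropy is not symmetric: in this toy example `relative_entropy(fx, fy)` would fail, because the word `AC` occurs in the first sequence but not in the second.
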
Graphical results with EU text (see above): dendrogram with language classification obtained with combined Euclidean distance [Figure]
2003-04-16: Change classif.m (add ; to avoid extended output). Add Protein (HUMHBB.txt) and DNA (thrABC.txt) examples. Manual.
2003-02-24: New function isquareform.m included (vectorized)
2003-02-24: Old version 4.1 (July 2002) - NASC_Toolbox v4.1 zip file
[see paper - not yet available]
Suggestions & comments: svinga@itqb.unl.pt
Created: 2002 Jul 24 -- Last update: 2004 Feb 10