NASC - Non-Aligned Sequence Comparison

Matlab companion toolbox to the paper:


Alignment-free sequence comparison - a review
Bioinformatics Vol. 19 no. 4 (2003), pages 513-523

Susana Vinga (1) & Jonas Almeida (2,1)

(1) Biomathematics Group, ITQB Univ. Nova Lisboa, P.O. Box 127, 2780-156 Oeiras, Portugal. E-mail: svinga@itqb.unl.pt

(2) Dept. Biometry & Epidemiology, Medical Univ. South Carolina, 135 Cannon Street, Suite 303, P.O. Box 250835, Charleston, SC 29425, USA. E-mail: almeidaj@musc.edu


 

Source code

The downloadable zip file includes the manual with examples and all Matlab functions used.

Current version: 4.21 (April 2003). Future upgrades will be posted here. A new, more efficient version of the Mahalanobis distance is currently being prepared.

See the full description in MANUAL.doc and in each function's help text.

Download NASC_Toolbox v4.21 zip file.

 

File name, followed by a brief description:

MANUAL.doc

Toolbox Manual with examples.

EU.txt

Natural-language example in the correct file format. [See HTML]

HUMHBB.txt

Protein example. Translations of the human beta-globin region on chromosome 11 [NCBI gi:455025]. [See HTML]

thrABC.txt

DNA example. E. coli K12 threonine operon. [See HTML]

cgcria.m
cgcgr0.m
cgcgr1.m
cgcode.m
cgle.m
cgtp.m

Reads a text file, transforms symbols to numbers, and creates USM coordinates.

freqseq.m

Calculates counts and frequencies of L-tuples (or L-words) in sequences previously extracted from a file.

overlap.m

Calculates the overlap capability of the words present.

word_var.m

Variances of L-tuple counts.

word_cov.m

Covariances of L-tuple counts.

distance.m

Calculates different metrics on sequences (see references for distance definitions).

nasc.m

Calls all previous functions.

plotdistance.m

Plots all types of distances between chosen sequences.

classif.m

Final sequence classification and dendrogram construction (Cluster Analysis).

crossd.m

USM cross-distance calculations (not yet optimized); see also the bUSM (boolean USM) toolbox.

ang.m

Auxiliary function. Angle between vectors in Euclidean space.

h_rel.m

Auxiliary function. Relative entropy between vectors.

isquareform.m

Auxiliary function. Matrix operations.
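
To make the quantities named in the list above more concrete, here is a minimal Python sketch of L-tuple counting, a Euclidean distance, the angle between vectors, and relative entropy. It is not the toolbox's Matlab code: all function names are hypothetical, the base-2 logarithm is an assumption, and the toolbox's own definitions (see MANUAL.doc and the paper) may differ in normalization or detail.

```python
import math
from collections import Counter
from itertools import product


def ltuple_counts(seq, L):
    """Count overlapping L-tuples (words of length L) in seq."""
    return Counter(seq[i:i + L] for i in range(len(seq) - L + 1))


def frequency_vector(seq, L, alphabet):
    """Relative frequency of every possible L-tuple over alphabet, in a fixed order."""
    counts = ltuple_counts(seq, L)
    words = ["".join(p) for p in product(alphabet, repeat=L)]
    total = sum(counts[w] for w in words)
    if total == 0:
        raise ValueError("sequence contains no L-tuple over the given alphabet")
    return [counts[w] / total for w in words]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def angle(u, v):
    """Angle (in radians) between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point round-off.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def relative_entropy(p, q):
    """Relative entropy sum(p * log2(p / q)) in bits (log base 2 is an assumption)."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue  # a term with p = 0 contributes nothing
        if b == 0:
            return math.inf  # p has mass where q has none
        total += a * math.log2(a / b)
    return total


# Toy DNA sequences (made up for illustration), compared by 2-tuple frequencies.
f1 = frequency_vector("ACGTACGTAC", 2, "ACGT")
f2 = frequency_vector("AAAACCCCGG", 2, "ACGT")
print(euclidean(f1, f2), angle(f1, f2))
```

Both frequency vectors list the 16 possible 2-tuples in the same order, which is what allows them to be compared component by component.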

Example

Graphical results for the EU text (see above): dendrogram of the language classification obtained with the combined Euclidean distance. [Figure]

What's New & Old versions

2003-04-16: Change classif.m (add ; to avoid extended output). Add Protein (HUMHBB.txt) and DNA (thrABC.txt) examples. Manual.
2003-02-24: New function isquareform.m included (vectorized)
2003-02-24: Old version 4.1 (July 2002) - NASC_Toolbox v4.1 zip file

References:

[see paper - not yet available]


Suggestions & Comments: svinga@itqb.unl.pt
Created: 2002 Jul 24 -- Last update: 2004 Feb 10