Additional material and results for the paper:

Comparative evaluation of word composition distances for the recognition of SCOP relationships
Bioinformatics, Vol. 20, no. 2 (2004), pp. 206-215

Susana Vinga(1), Rodrigo Gouveia-Oliveira(1) & Jonas Almeida(2,1)*

(1) Biomathematics Group, ITQB, Univ. Nova Lisboa, R. Qta. Grande, 2780-156 Oeiras, Portugal.

(2*) Dept. of Biometry & Epidemiology, Medical Univ. of South Carolina, 135 Cannon Street, Suite 303, P.O. Box 250835, Charleston, SC 29425, USA.


Datasets & Metrics compared

The sequences can be downloaded as the MATLAB files PDB40b.mat and PDB40v.mat. The PDB40-b set (release 1.61) is available from the ASTRAL web page. Older versions of the SCOP database can also be consulted, including PDB40 (release 1.35), described in [Brenner et al., 1998]. For additional code, contact the authors.
The definitions of the distance functions used are briefly recalled.
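As an illustration of word-composition distances, the Python sketch below computes amino acid composition vectors (1-tuple frequencies) and two of the distances named in this material: the Euclidean distance and the standardized ("Std.") Euclidean distance. It is not the authors' code. The function names are hypothetical and only the 20 standard residues are counted. The standardization follows one common convention, assumed here: each squared coordinate difference is divided by that coordinate's population variance across a reference set of sequences. The definitions used in the paper may differ in detail.

```python
import math

# Assumption: only the 20 standard residues are counted; other symbols are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Relative frequency of each standard amino acid (1-tuples) in seq."""
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence has no standard residues")
    return [c / total for c in counts]


def euclidean(x, y):
    """Plain Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def column_variances(vectors):
    """Population variance of each coordinate across a reference set of vectors."""
    n = len(vectors)
    means = [sum(col) / n for col in zip(*vectors)]
    return [sum((v - m) ** 2 for v in col) / n
            for col, m in zip(zip(*vectors), means)]


def standardized_euclidean(x, y, variances):
    """Euclidean distance with each squared difference divided by its variance.

    Assumption: coordinates with zero variance carry no information in the
    reference set, so they are skipped instead of dividing by zero.
    """
    return math.sqrt(sum((a - b) ** 2 / v
                         for a, b, v in zip(x, y, variances) if v > 0))
```

For example, with the reference set ["AAAA", "CCCC"], only the A and C coordinates vary, each with variance 0.25. The standardized distance between those two sequences is therefore sqrt(8), compared with sqrt(2) for the plain Euclidean distance.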


Complete dataset results

ROC Curves and AUC values

PDB40-v                              | PDB40-b                              | Obs.
ROC curves, four levels, all metrics | ROC curves, four levels, all metrics | Figures 2 and 3
AUC values                           | AUC values                           | Figures 4 and 5

Table with AUC values
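Each AUC value summarizes one ROC curve. One minimal way to compute it uses the Mann-Whitney formulation. This assumes that sequence pairs are pooled and labeled as related or unrelated at a given SCOP level (for instance, same family versus different family). The AUC is then the probability that a randomly chosen related pair has a smaller distance than a randomly chosen unrelated pair, with ties counted as one half. The function name and the pooling scheme are assumptions for illustration, and the sketch does not reproduce the paper's exact ROC protocol.

```python
def roc_auc(distances, related):
    """AUC via the Mann-Whitney statistic (sketch; pooling scheme assumed).

    distances: one float per sequence pair; smaller means "predicted related"
    related:   one bool per pair, True if the pair shares the SCOP level tested
    """
    pos = [d for d, r in zip(distances, related) if r]
    neg = [d for d, r in zip(distances, related) if not r]
    if not pos or not neg:
        raise ValueError("need at least one related and one unrelated pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:        # related pair ranked closer: full credit
                wins += 1.0
            elif p == n:     # tie: half credit
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The double loop is quadratic in the number of pairs, which keeps the sketch readable. For large pair sets, sorting the distances and summing ranks gives the same value more efficiently.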


Equivalence of metrics

W-metric AUC values for the PDB40-b dataset, using other scoring matrices and normalization procedures.
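The W-metric combines amino acid composition with a scoring matrix. The Python sketch below assumes it takes the quadratic form d_W(p, q) = sum_i sum_j (p_i - q_i)(p_j - q_j) W_ij, where p and q are composition vectors and W is a weight matrix derived from a scoring matrix. The toy 3-letter scoring matrix and the min-max rescaling, used here as a stand-in "normalization procedure", are hypothetical. They are not the matrices or normalizations evaluated in the paper.

```python
def w_metric(p, q, W):
    """Quadratic-form distance sum_i sum_j (p_i - q_i)(p_j - q_j) W[i][j].

    Assumed form of the W-metric. p and q are composition vectors, and W is a
    square weight matrix, all in the same alphabet order.
    """
    n = len(p)
    if len(q) != n or len(W) != n or any(len(row) != n for row in W):
        raise ValueError("dimension mismatch")
    d = [a - b for a, b in zip(p, q)]
    return sum(d[i] * d[j] * W[i][j] for i in range(n) for j in range(n))


def rescale_to_unit(S):
    """Hypothetical normalization: min-max rescale a scoring matrix to [0, 1]."""
    flat = [v for row in S for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        raise ValueError("constant matrix cannot be rescaled")
    return [[(v - lo) / (hi - lo) for v in row] for row in S]


# Toy example with a hypothetical 3-letter scoring matrix.
toy_scores = [[4, -1, -2], [-1, 5, 0], [-2, 0, 6]]
W = rescale_to_unit(toy_scores)
p, q = [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]
print(w_metric(p, q, W))
```

With W set to the identity matrix, the expression reduces to the squared Euclidean distance, which makes a quick sanity check. A rescaled scoring matrix need not be positive semidefinite, so the quadratic form is not guaranteed to be non-negative for arbitrary inputs.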


Higher-order tuples

AUC values for 2- and 3-tuples
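Higher-order tuples replace single residues with overlapping words of length k. The Python sketch below makes two assumptions: windows overlap, and frequencies are normalized by the number of windows. The function names are hypothetical. It stores k-tuple frequencies sparsely because the number of possible words grows as 20^k (400 for 2-tuples and 8000 for 3-tuples), while a protein of length L contains only L - k + 1 windows.

```python
import math
from collections import Counter


def kmer_frequencies(seq, k):
    """Relative frequencies of overlapping k-tuples (words of length k) in seq.

    Assumption: every window of length k counts, and frequencies are divided
    by the number of windows, len(seq) - k + 1.
    """
    if k < 1 or len(seq) < k:
        raise ValueError("need 1 <= k <= len(seq)")
    total = len(seq) - k + 1
    counts = Counter(seq[i:i + k] for i in range(total))
    return {word: c / total for word, c in counts.items()}


def euclidean_sparse(f, g):
    """Euclidean distance between sparse frequency vectors (absent words = 0)."""
    words = set(f) | set(g)
    return math.sqrt(sum((f.get(w, 0.0) - g.get(w, 0.0)) ** 2 for w in words))
```

Storing only the observed words keeps memory proportional to sequence length rather than to 20^k. The distance treats any word absent from one sequence as having zero frequency.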

Stratified analysis

AUC values:

PDB40-v               | PDB40-b
SW vs. W-metric       | SW vs. W-metric
SW vs. Std. Euclidean | SW vs. Std. Euclidean

Copy-friendly table with AUC values for all metrics


