BSig: Toolbox to Evaluate the Statistical Significance of Flexible Biclustering Solutions
BSig provides a wide set of statistical tests to assess the significance of biclustering solutions with varying properties. BSig makes available:
- statistical tests to evaluate the significance of biclusters with varying homogeneity criteria, including biclusters with constant, additive, multiplicative, symmetric, plaid and order-preserving coherency assumptions;
- sound statistical criteria for both symbolic and real-valued data settings with arbitrary levels of noise and missing values;
- statistical tests to assess biclusters with continuous shifting and scaling factors;
- parameterizations to control the risk of false positive and/or false negative discoveries;
- revised state-of-the-art biclustering algorithms (including BicPAM, BicNET, BicSPAM, BiP, DeBi and BiModule) that integrate these statistical views to guide the search and guarantee statistically significant outputs;
- global statistical tests to infer constraints that guide biclustering tasks (such as the homogeneity-conditional minimum number of rows and columns per bicluster).
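To make the coherency assumptions listed above concrete, the following Python sketch checks whether a submatrix follows a constant, additive, multiplicative or order-preserving model. It is not part of BSig and does not reflect its API: the function names are hypothetical, the additive check uses the Cheng and Church mean squared residue as an assumed criterion, and the symmetric and plaid models are left out for brevity.

```python
import math
from statistics import mean


def mean_squared_residue(sub):
    """Cheng & Church mean squared residue: 0 for a perfectly additive bicluster."""
    row_means = [mean(row) for row in sub]
    col_means = [mean(col) for col in zip(*sub)]
    overall = mean(row_means)  # equals the grand mean since all rows have equal length
    return mean(
        (v - row_means[i] - col_means[j] + overall) ** 2
        for i, row in enumerate(sub)
        for j, v in enumerate(row)
    )


def is_constant(sub, tol=1e-9):
    # a_ij = c for every cell
    values = [v for row in sub for v in row]
    return max(values) - min(values) <= tol


def is_additive(sub, tol=1e-9):
    # a_ij = c + alpha_i + beta_j  <=>  the mean squared residue is zero
    return mean_squared_residue(sub) <= tol


def is_multiplicative(sub, tol=1e-9):
    # a_ij = c * alpha_i * beta_j becomes additive after a log transform (positive values only)
    if any(v <= 0 for row in sub for v in row):
        return False
    return is_additive([[math.log(v) for v in row] for row in sub], tol)


def is_order_preserving(sub):
    # every row must induce the same ranking of the columns
    orders = {tuple(sorted(range(len(row)), key=row.__getitem__)) for row in sub}
    return len(orders) == 1


print(is_additive([[1, 2, 4], [3, 4, 6]]))         # True
print(is_multiplicative([[1, 2, 4], [3, 6, 12]]))  # True
```

A real significance test would add tolerances for noise and missing values on top of such exact checks; the `tol` parameter is only a crude stand-in for that.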
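The significance, error-control and constraint-inference items above can be illustrated with a binomial-tail test for a pattern-based bicluster on symbolic data. This is a minimal sketch under stated assumptions, not BSig's implementation: columns are assumed independent, symbol probabilities are estimated per column from the observed (non-missing) values, the p-value is the probability that at least |I| of the N rows match the bicluster pattern by chance, and a Bonferroni correction is used as one simple way to control false positives. All function names are hypothetical.

```python
import math
from collections import Counter


def symbol_probabilities(matrix):
    """Empirical symbol frequencies per column; missing values (None) are skipped."""
    probs = []
    for col in zip(*matrix):
        observed = [v for v in col if v is not None]
        counts = Counter(observed)
        probs.append({s: c / len(observed) for s, c in counts.items()} if observed else {})
    return probs


def pattern_probability(pattern, probs):
    """Probability that a random row matches {column: symbol}, assuming independent columns."""
    p = 1.0
    for j, symbol in pattern.items():
        p *= probs[j].get(symbol, 0.0)
    return p


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k, n + 1))


def bicluster_pvalue(matrix, rows, pattern, probs=None):
    """Chance of observing at least len(rows) rows matching the pattern."""
    if probs is None:
        probs = symbol_probabilities(matrix)
    if any(matrix[i][j] != s for i in rows for j, s in pattern.items()):
        raise ValueError("some rows do not match the bicluster pattern")
    p = pattern_probability(pattern, probs)
    return binomial_tail(len(matrix), len(rows), p)


def min_rows(n_rows, p, alpha):
    """Smallest number of supporting rows that makes a pattern significant at level alpha."""
    for k in range(1, n_rows + 1):
        if binomial_tail(n_rows, k, p) <= alpha:
            return k
    return None


def bonferroni(pvalues, alpha=0.05):
    """Flag p-values that survive a Bonferroni correction over all tested biclusters."""
    threshold = alpha / len(pvalues)
    return [p <= threshold for p in pvalues]


matrix = [["a", "a", "b"], ["a", "a", "b"], ["a", "a", "a"], ["b", "b", "a"]]
print(bicluster_pvalue(matrix, rows=[0, 1, 2], pattern={0: "a", 1: "a"}))
print(min_rows(10, 0.5, 0.05))  # 9
```

`min_rows` mirrors the idea of deriving a minimum bicluster size from a global test: any candidate with fewer supporting rows cannot reach significance, so a search can prune it early.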
Software: BSig JAR v2.3.0 (01-02-2016, 6.10 MB) (soon available without request for access)
Data
- Real datasets:
  - dlblc.arff (180 conditions, 660 genes)
  - yeast.arff (17 conditions, 1884 genes)
  - coloncancer.arff (62 conditions, 2000 genes)
  - leukemia.arff (38 conditions, 7129 genes)
- Synthetic datasets:
  - Data with constant, additive, multiplicative and symmetric models
  - Data with order-preserving and plaid models
  - Data with planted biclusters combining the previous coherencies
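Synthetic data of the kind listed above is commonly produced by planting a bicluster with a known coherency into random background values. The sketch below does this for constant, additive and multiplicative models. It is a hypothetical illustration: the function name, the uniform background, the effect ranges and the Gaussian noise are all assumptions and need not match how the BSig synthetic datasets were generated.

```python
import random


def plant_bicluster(n_rows, n_cols, rows, cols, model="additive", noise=0.0, seed=0):
    """Return an n_rows x n_cols matrix with one planted bicluster on (rows, cols).

    Hypothetical generator: background ~ Uniform(0, 10); planted values follow
    the chosen coherency model, optionally perturbed by Gaussian noise.
    """
    if model not in ("constant", "additive", "multiplicative"):
        raise ValueError(f"unknown model: {model}")
    rng = random.Random(seed)
    data = [[rng.uniform(0, 10) for _ in range(n_cols)] for _ in range(n_rows)]
    base = rng.uniform(1, 5)                          # c
    row_eff = {i: rng.uniform(0.5, 2) for i in rows}  # alpha_i
    col_eff = {j: rng.uniform(0.5, 2) for j in cols}  # beta_j
    for i in rows:
        for j in cols:
            if model == "constant":
                value = base
            elif model == "additive":
                value = base + row_eff[i] + col_eff[j]
            else:  # multiplicative
                value = base * row_eff[i] * col_eff[j]
            if noise > 0:
                value += rng.gauss(0, noise)
            data[i][j] = value
    return data


matrix = plant_bicluster(20, 8, rows=[1, 4, 7], cols=[0, 2, 5], model="additive", noise=0.1, seed=42)
```

Setting `noise` above zero yields biclusters that only approximately satisfy their model, which is the setting where noise-tolerant significance tests become necessary.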
Others: Raw Results and Statistical Sheets (soon available without request for access)