BicPAM

BicPAM provides a set of pattern-based approaches for biclustering. It integrates previously dispersed efforts on pattern-based biclustering and introduces novel methods to discover biclusters with multiple patterns of expression, varying quality, and alternative underlying structures. Additionally, BicPAM allows a parameterizable definition of mining, mapping, and closing options (including search, pattern representation, normalization, discretization, extension, merging, and filtering strategies) and offers alternative ways to deal with missing values and noise.
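To illustrate the idea behind pattern-based biclustering, here is a minimal, self-contained Java sketch that does not use the BicPAM API. Each row of an already-discretized matrix becomes a transaction of `column:symbol` items, closed itemsets are enumerated naively (closed itemsets are exactly the intersections of non-empty sets of transactions), and every closed itemset supported by enough rows yields a constant bicluster. The class and method names (`NaiveConstantBiclusters`, `itemize`, `closedItemsets`, `mine`) are hypothetical; BicPAM itself plugs in dedicated frequent itemset miners (such as the CharmClosedFIM class used in the example further on) rather than this brute-force closure.

```java
import java.util.*;

public class NaiveConstantBiclusters {
    // Hypothetical helper: each row becomes a transaction of items "column:symbol".
    static List<Set<String>> itemize(int[][] symbols) {
        List<Set<String>> transactions = new ArrayList<>();
        for (int[] row : symbols) {
            Set<String> items = new TreeSet<>();
            for (int j = 0; j < row.length; j++) items.add(j + ":" + row[j]);
            transactions.add(items);
        }
        return transactions;
    }

    // Closed itemsets are the intersections of non-empty sets of transactions:
    // keep intersecting known sets with every transaction until nothing new appears.
    static Set<Set<String>> closedItemsets(List<Set<String>> transactions) {
        Set<Set<String>> closed = new HashSet<>(transactions);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Set<String> c : new ArrayList<>(closed)) {
                for (Set<String> t : transactions) {
                    Set<String> inter = new TreeSet<>(c);
                    inter.retainAll(t);
                    if (closed.add(inter)) changed = true;
                }
            }
        }
        return closed;
    }

    // A closed itemset with >= minCols items and >= minRows supporting rows is a constant bicluster.
    static List<String> mine(int[][] symbols, int minRows, int minCols) {
        List<Set<String>> transactions = itemize(symbols);
        List<String> result = new ArrayList<>();
        for (Set<String> itemset : closedItemsets(transactions)) {
            if (itemset.size() < minCols) continue;
            List<Integer> rows = new ArrayList<>();
            for (int i = 0; i < transactions.size(); i++)
                if (transactions.get(i).containsAll(itemset)) rows.add(i);
            if (rows.size() >= minRows) result.add("rows=" + rows + " items=" + itemset);
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        int[][] symbols = {
            {1, 2, 0, 3},
            {1, 2, 4, 3},
            {1, 2, 0, 0},
            {4, 0, 0, 1}
        };
        for (String bic : mine(symbols, 2, 2)) System.out.println(bic);
    }
}
```

With at least 2 rows and 2 columns required, `main` prints three biclusters: `rows=[0, 1, 2] items=[0:1, 1:2]`, `rows=[0, 1] items=[0:1, 1:2, 3:3]` and `rows=[0, 2] items=[0:1, 1:2, 2:0]`. The closure is exponential in the worst case, which is why efficient closed-itemset miners matter on real data.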

Authors: Rui Henriques and Sara Madeira

@article{henriques2014bicpam,
	title = {BicPAM: Pattern-based biclustering for biomedical data analysis},
	journal = {Algorithms for Molecular Biology},
	author = {Henriques, Rui and Madeira, Sara},
	volume = {9},
	year = {2014},
	number = {1},
	pages = {27},
	url = {http://www.almob.org/content/9/1/27},
	doi = {10.1186/s13015-014-0027-z},
	issn = {1748-7188},
}

Synthetic datasets (non-exhaustive set):

Datasets with constant biclusters (noise up to ±5%)
- 500x60: datasets (5 Uniform + 5 Normal), and hidden biclusters
- 1000x100: datasets (5 Uniform + 5 Normal), and hidden biclusters
- 2000x200: datasets (5 Uniform + 5 Normal), and hidden biclusters
- 4000x400: datasets I (5 Uniform), datasets II (5 Normal), and hidden biclusters
Datasets with additive biclusters (noise up to ±5%)
- 1000x100: datasets (5 Uniform + 5 Normal), and hidden biclusters
- 2000x200: datasets (5 Uniform + 5 Normal), and hidden biclusters
Datasets with multiplicative biclusters (noise up to ±5%)
- 1000x100: datasets (5 Uniform + 5 Normal), and hidden biclusters
- 2000x200: datasets (5 Uniform + 5 Normal), and hidden biclusters
Datasets with 2% missing values (noise up to ±5%)
- 1000x100 (constant assumption): datasets, and hidden biclusters
- 1000x100 (additive assumption): datasets, and hidden biclusters
- 1000x100 (multiplicative assumption): datasets, and hidden biclusters
Datasets with 5% missing values (noise up to ±5%)
- 1000x100 (constant assumption): datasets, and hidden biclusters
- 1000x100 (additive assumption): datasets, and hidden biclusters
- 1000x100 (multiplicative assumption): datasets, and hidden biclusters
Datasets with no noise and no missing values
- 1000x100 (constant assumption): datasets, and hidden biclusters
- 1000x100 (additive assumption): datasets, and hidden biclusters
- 1000x100 (multiplicative assumption): datasets, and hidden biclusters
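The three coherence assumptions behind these synthetic datasets can be made concrete with a small generator. The following Java sketch is hypothetical and independent of BicPAM's `BicGenerator`: it plants one bicluster in which every row repeats the same column pattern `p[j]` (constant), the pattern shifted by a row offset `p[j] + g[i]` (additive), or the pattern scaled by a row factor `p[j] * g[i]` (multiplicative), then applies a relative perturbation of up to ±noise and can blank out a fraction of entries as missing values. How BicPAM's generator actually draws values and applies noise is an assumption here.

```java
import java.util.Random;

public class PlantedBicluster {
    enum Coherence { CONSTANT, ADDITIVE, MULTIPLICATIVE }

    // Hypothetical: plants a bicluster on rows [r0, r0 + nRows) and columns [c0, c0 + nCols), in place.
    static void plant(double[][] data, int r0, int nRows, int c0, int nCols,
                      Coherence type, double noise, Random rnd) {
        double[] pattern = new double[nCols];
        for (int j = 0; j < nCols; j++) pattern[j] = 1 + rnd.nextInt(10); // column pattern p[j]
        for (int i = r0; i < r0 + nRows; i++) {
            double g = 1 + rnd.nextInt(5); // row offset or scale g[i]
            for (int j = 0; j < nCols; j++) {
                double v;
                switch (type) {
                    case ADDITIVE:       v = pattern[j] + g; break;
                    case MULTIPLICATIVE: v = pattern[j] * g; break;
                    default:             v = pattern[j];     break; // CONSTANT
                }
                // Relative perturbation drawn uniformly from [-noise, +noise).
                v *= 1 + noise * (2 * rnd.nextDouble() - 1);
                data[i][c0 + j] = v;
            }
        }
    }

    // Replaces roughly a fraction `rate` of all entries with NaN to simulate missing values.
    static void putMissing(double[][] data, double rate, Random rnd) {
        for (double[] row : data)
            for (int j = 0; j < row.length; j++)
                if (rnd.nextDouble() < rate) row[j] = Double.NaN;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[][] data = new double[6][5];
        for (double[] row : data)
            for (int j = 0; j < row.length; j++) row[j] = rnd.nextInt(10); // random background
        plant(data, 1, 3, 0, 3, Coherence.ADDITIVE, 0.0, rnd);
        // Without noise, two rows of an additive bicluster differ by the same amount on every bicluster column.
        for (int j = 0; j < 3; j++) System.out.println(data[2][j] - data[1][j]);
        putMissing(data, 0.02, rnd);
    }
}
```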

Real datasets:

dlblc.arff (diffuse large B-cell lymphoma).
From Rosenwald et al. (2002), consisting of 180 samples and 661 probe sets, with a skewness of -0.05 and an excess kurtosis of 0.35 after standardization.
The goal of the original study was to predict survival after chemotherapy. Hoshida et al. (2007) found three classes that can be identified directly by pattern-based biclustering.
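The skewness and excess kurtosis quoted above are standard moment statistics. A minimal Java sketch, assuming population (biased) moments computed over a flat array of values: after z-score standardization, skewness is the mean of z³ and excess kurtosis is the mean of z⁴ minus 3. The class name `Moments` is hypothetical, and the source does not state which estimator produced the reported figures.

```java
public class Moments {
    // z-score standardization: (x - mean) / std, using the population standard deviation.
    static double[] standardize(double[] x) {
        double mean = 0;
        for (double v : x) mean += v;
        mean /= x.length;
        double var = 0;
        for (double v : x) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / x.length);
        double[] z = new double[x.length];
        for (int i = 0; i < x.length; i++) z[i] = (x[i] - mean) / std;
        return z;
    }

    // Skewness: third standardized moment.
    static double skewness(double[] x) {
        double s = 0;
        for (double z : standardize(x)) s += z * z * z;
        return s / x.length;
    }

    // Excess kurtosis: fourth standardized moment minus 3, so a normal distribution scores 0.
    static double excessKurtosis(double[] x) {
        double s = 0;
        for (double z : standardize(x)) s += z * z * z * z;
        return s / x.length - 3;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5};
        System.out.println(skewness(x));       // symmetric sample: 0 up to rounding
        System.out.println(excessKurtosis(x)); // flatter than a normal: about -1.3
    }
}
```

Values near 0 for both statistics, as reported for this dataset, indicate a distribution close to normal in shape.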
hughes.arff (oligonucleotide array for Saccharomyces cerevisiae).
High-resolution, genome-wide profiles of S. cerevisiae (samples prepared from haploid yeast collected in the logarithmic phase of growth in YPD medium and hybridized to an Affymetrix tiling array).
The original goal, in David et al. (2006) and Lee et al. (2007), was to identify the boundaries, structure, and levels of coding and noncoding transcripts and to study nucleosome occupancy.
gasch.txt (Yeast responses to different stress conditions).
From Gasch et al. (2000), capturing the response of Saccharomyces cerevisiae to diverse environmental transitions. DNA microarrays were used to measure changes in transcript levels over time for almost every yeast gene as cells responded to temperature shocks, hydrogen peroxide, the superoxide-generating drug menadione, the sulfhydryl-oxidizing agent diamide, the disulfide-reducing agent dithiothreitol, hyper- and hypo-osmotic shock, amino acid starvation, nitrogen source depletion, and progression into stationary phase.

Material:

Software

BicPAM JAR v1.4.0 (04-20-2013, 6.40 MB)

Example of how to use BicPAM:

// Requires the BicPAM JAR on the classpath; import statements for its classes are omitted here.
public class SyntheticTests {

	//bicluster types
	static PatternType[] types = new PatternType[]{PatternType.Constant, PatternType.Additive, PatternType.Multiplicative};

	//mining methods
	static FIM[] pminers = new FIM[]{new CharmClosedFIM(), new AprioriTIDClosedFIM(), new FPGrowthTIDSimpleFIM(), new EclatSimpleFIM()};

	//mapping options
	static Itemizer itemizer = new Itemizer(10, //nr of items
				NormalizationCriteria.Overall,
				DiscretizationCriteria.NormalDist,
				FillingCriteria.Replace, //missings handler
				OutlierizationCriteria.Overall);
				
	//closing options
	static Biclusterizer bichandler = new Biclusterizer(
				new BiclusterExtender(0.25,0.25), //criteria for the default extender
				new BiclusterIntraFilter(0.8), 
				new BiclusterBicFilter(0.4), 
				new BiclusterMerger(0.7));

	//Generator properties
	static Background background = Background.Random;
	static String distRowsBics = "Uniform", distColsBics = "Uniform";
	static double noise = 0.05; //5% of noise
	static double missings = 0.02; //2% of missing values
	static int[] alphabets = new int[]{10,20};
	static int[] numRows = new int[]{500,1000,2000,4000};
	static int[] numColumns = new int[]{60,100,200,400};
	static int[] numBics = new int[]{5,10,15,20};
	static int[] minRowsBics = new int[]{15,20,40,60}, maxRowsBics = new int[]{30,40,70,100};
	static int[] minColsBics = new int[]{6,6,6,6}, maxColsBics = new int[]{8,10,14,20};
	
	public static void main(String[] args) throws Exception{
		for(PatternType type : types){
			for(int alphabet : alphabets){	  
				for(int i = 0, l = numRows.length; i<l; i++){
					BicGenerator generator = new BicGenerator(numRows[i],numColumns[i],numBics[i],background,alphabet);

					Biclusters trueBics = generator.generateKBiclustersWithCoherentEvolutionOnColumns(type,
							distRowsBics, minRowsBics[i], maxRowsBics[i], distColsBics, minColsBics[i], maxColsBics[i]);
					
					int[][] dataset = generator.getSymbolicExpressionMatrix();
					dataset = generator.putNoise(dataset,noise);
					dataset = generator.putMissings(dataset,missings);
					Dataset data = new Dataset(dataset,true);

					for(FIM pminer : pminers){
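						// findMinRowsSupExpectations and findMinColsSupExpectations are helper methods
						// (not shown in this example) that set the minimum row and column support.
						double minRowsSup = findMinRowsSupExpectations(data);
						double minColsSup = findMinColsSupExpectations(data);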
						pminer.inputParams(minColsSup,minRowsSup);

						BiclusterMiner bicminer = null;
						if(type.equals(PatternType.Constant)) 
							bicminer = new ColumnConstantBiclusterMiner(data,pminer,bichandler,itemizer);
						else if(type.equals(PatternType.Additive))
							bicminer = new ColumnAdditiveBiclusterMiner(data,pminer,bichandler,itemizer);
						else bicminer = new ColumnMultiplicativeBiclusterMiner(data,pminer,bichandler,itemizer);
						core(bicminer,trueBics);
					}
				}
			}
		}
	}

	private static void core(BiclusterMiner bicminer, Biclusters trueBics) throws Exception {
		Biclusters bics = bicminer.mineBiclusters();
		System.out.println("True Bics: " + trueBics.toShortString());
		System.out.println("Found Bics: " + bics.toShortString());
		System.out.println(MatchMetrics.run(bics, trueBics));
	}
}
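The Itemizer in the example above is configured with overall normalization and a normal-distribution discretization into 10 items. As a rough illustration of what such a mapping step can look like, here is a hypothetical Java sketch, not BicPAM's implementation: values are z-scored over the whole matrix, and cut points are placed at the k-quantiles of the standard normal, so each item is expected to be about equally frequent when the normalized values are roughly normal. Java has no built-in normal CDF, so the sketch uses the Abramowitz and Stegun 7.1.26 approximation of erf (absolute error below 1.5e-7) and inverts it by bisection. Mapping missing values (NaN) to -1 is an assumption of this sketch.

```java
import java.util.Arrays;

public class NormalDiscretizer {
    // Abramowitz & Stegun 7.1.26 approximation of erf(x), |error| < 1.5e-7.
    static double erf(double x) {
        double sign = x < 0 ? -1 : 1;
        x = Math.abs(x);
        double t = 1 / (1 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                    + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1 - poly * Math.exp(-x * x));
    }

    // Standard normal CDF.
    static double phi(double z) { return 0.5 * (1 + erf(z / Math.sqrt(2))); }

    // Inverse of phi by bisection on [-10, 10].
    static double inversePhi(double p) {
        double lo = -10, hi = 10;
        for (int it = 0; it < 100; it++) {
            double mid = (lo + hi) / 2;
            if (phi(mid) < p) lo = mid; else hi = mid;
        }
        return (lo + hi) / 2;
    }

    // Hypothetical: z-scores the whole matrix, then maps each value to an item in [0, k-1]; NaN becomes -1.
    static int[][] discretize(double[][] data, int k) {
        double mean = 0; int n = 0;
        for (double[] row : data) for (double v : row) if (!Double.isNaN(v)) { mean += v; n++; }
        mean /= n;
        double var = 0;
        for (double[] row : data) for (double v : row) if (!Double.isNaN(v)) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / n);
        double[] cuts = new double[k - 1]; // ascending cut points at the k-quantiles
        for (int m = 1; m < k; m++) cuts[m - 1] = inversePhi((double) m / k);
        int[][] items = new int[data.length][];
        for (int i = 0; i < data.length; i++) {
            items[i] = new int[data[i].length];
            for (int j = 0; j < data[i].length; j++) {
                double v = data[i][j];
                if (Double.isNaN(v)) { items[i][j] = -1; continue; }
                double z = (v - mean) / std;
                int item = 0;
                while (item < cuts.length && cuts[item] < z) item++; // count cut points below z
                items[i][j] = item;
            }
        }
        return items;
    }

    public static void main(String[] args) {
        double[][] data = {{0, 10}, {5, 5}};
        System.out.println(Arrays.deepToString(discretize(data, 3))); // [[0, 2], [1, 1]]
    }
}
```

Equal-probability cut points under a normal assumption keep the item frequencies balanced, which keeps the support of any single item from dominating the subsequent itemset mining.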