About 

CompostBin is a DNA-composition-based binning algorithm for classifying metagenomic reads. Unlike previous methods, which bin assembled contigs and often require training on known reference genomes, CompostBin can accurately bin raw sequence reads without assembly or training. It applies principal component analysis to project the data into an informative lower-dimensional space, and then uses the normalized cut clustering algorithm on the projected data to classify sequences into taxon-specific bins. We have demonstrated the algorithm’s accuracy on a variety of simulated data sets and on one metagenomic data set with known species assignments. CompostBin is a work in progress, and several refinements of the algorithm are planned.
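The three stages named above (DNA composition vectors, principal component analysis, normalized cut) can be illustrated with a small NumPy sketch. This is not the CompostBin source code. The k-mer length, the Gaussian similarity graph, the bandwidth heuristic, the single two-way split, and every function name below are assumptions made for illustration; the distributed implementation may differ in all of these choices.

```python
import itertools
import numpy as np


def kmer_profile(read, k=4):
    """Normalized k-mer frequency vector of one read (k is an assumed value)."""
    index = {"".join(p): i
             for i, p in enumerate(itertools.product("ACGT", repeat=k))}
    counts = np.zeros(len(index))
    for i in range(len(read) - k + 1):
        j = index.get(read[i:i + k])
        if j is not None:  # skip k-mers containing N or other ambiguity codes
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def pca_project(X, n_components=3):
    """Project the rows of X onto their top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def normalized_cut_bisect(Y, sigma=None):
    """Two-way normalized cut using the spectral relaxation.

    Builds a fully connected Gaussian similarity graph (an assumption) and
    splits on the sign of the second eigenvector of the normalized Laplacian.
    """
    d2 = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    if sigma is None:
        # Heuristic bandwidth: root of the median squared pairwise distance.
        sigma = np.sqrt(np.median(d2[d2 > 0]))
    W = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(Y)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)       # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]     # generalized eigenvector of (L, D)
    return (fiedler > 0).astype(int)


def simulate_reads(rng, base_probs, n_reads, length):
    """Hypothetical helper: random reads with a given A, C, G, T composition."""
    bases = np.array(list("ACGT"))
    return ["".join(rng.choice(bases, size=length, p=base_probs))
            for _ in range(n_reads)]
```

To try it, generate reads from two sources with different base composition using simulate_reads, stack their kmer_profile vectors into a matrix, pass the matrix through pca_project, and call normalized_cut_bisect on the result to obtain a 0/1 bin label per read. Producing more than two bins would require applying the bisection recursively, which this sketch leaves out.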

Source Code

The source code is available here. Please email Sourav Chatterji (schatterji AT ucdavis DOT edu) with questions or comments.

Reference

CompostBin is described in "Sourav Chatterji, Ichitaro Yamazaki, Zhaojun Bai and Jonathan Eisen, CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads, to appear in RECOMB 2008."

A pre-publication version has been submitted to arXiv.org here. The data sets used to test the program can be downloaded by following this link.