These are files for the hierarchical multi-label version of C4.5, as
described in my PhD thesis:

    Clare, A. (2003) Machine learning and data mining for yeast
    functional genomics. PhD thesis. University of Wales Aberystwyth.
    http://users.aber.ac.uk/afc/papers/AClarePhDThesis.pdf (1Mb)

They should be untarred over the top of Ross Quinlan's C4.5 Release 8,
which can be downloaded from
http://www.cse.unsw.edu.au/~quinlan/c4.5r8.tar.gz

To install:

    tar xvfz c4.5r8.tar.gz
    tar xvfz HierMultiLabelC4.5.tar.gz
    cd R8/Src
    make all

.data and .names files follow the normal C4.5 conventions, except that
each data item may have multiple class labels and a class hierarchy may
be provided.

Multiple classes for a data item should be separated by the '@'
character. For example:

    3.4,2.5,sunny,class1@class2

The class hierarchy is shown by indentation in a separate file with the
same filestem as the .data and .names files, but ending in .classes
(e.g. foo.data, foo.names, foo.classes). For example:

    a
      a1
        a11
      a2
    b
      b1
        b12
      b2
        b21
        b22
          b221

Example data is given in testhier.data, testhier.names and
testhier.classes.

The windowing and attribute subset options to c4.5 do not work - I
haven't updated this part of the code.

I developed this only for use with c4.5 and c4.5rules - any other
executables are untested and are used at your own risk.

Confusion matrices are still produced, but should really be ignored -
how can confusion matrices of multi-label problems be represented
truthfully? If a classification is wrong, which column should it be
reported under?

Email me if you have any problems: afc@aber.ac.uk
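
The '@'-separated labels and the indented .classes hierarchy described above can be illustrated with a short Python sketch. It is not part of this distribution and does not reflect how the C code reads these files. The function names (parse_class_hierarchy, ancestors, labels_of) are hypothetical, and the sketch assumes that a child class is simply indented further than its parent and that the class field is the last comma-separated value on a data line, with no trailing period.

```python
def parse_class_hierarchy(lines):
    """Map each class name to its parent (None for top-level classes).

    Assumption: a class is a child of the nearest preceding line that
    is indented less than it. Tabs are expanded before measuring.
    """
    parents = {}
    stack = []  # (indent width, class name) for the current ancestor chain
    for raw in lines:
        text = raw.expandtabs().rstrip()
        if not text.strip():
            continue  # skip blank lines
        indent = len(text) - len(text.lstrip())
        name = text.strip()
        # Drop entries that are siblings or deeper than the current line.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parents[name] = stack[-1][1] if stack else None
        stack.append((indent, name))
    return parents


def ancestors(parents, name):
    """Return the chain of ancestors of a class, nearest first."""
    chain = []
    parent = parents[name]
    while parent is not None:
        chain.append(parent)
        parent = parents[parent]
    return chain


def labels_of(data_line):
    """Split the class field (assumed to be the last value) on '@'."""
    return data_line.strip().split(",")[-1].split("@")


# The example hierarchy from this README, indented two spaces per level.
EXAMPLE_CLASSES = """\
a
  a1
    a11
  a2
b
  b1
    b12
  b2
    b21
    b22
      b221
"""

if __name__ == "__main__":
    parents = parse_class_hierarchy(EXAMPLE_CLASSES.splitlines())
    print(ancestors(parents, "b221"))
    print(labels_of("3.4,2.5,sunny,class1@class2"))
```

Storing only the parent of each class keeps the sketch small; the full path of any label can be recovered by walking up with ancestors().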