ABSTRACT
We present Circle, a classification algorithm based on the priciples
of boolean function minimization. This classification process
uses a recursive method to generate a set of implicants (or rules).
The novelty of this algorithm is in the fact that the rules generated
contain information about not only presence of features, but also
their absence in determining class values. Although function minimization
is inherently exponential on the number of attributes, we
introduce several optimization techniques to reduce the complexity
to the extent that we are able to scale the algorithm to 1000
columns, the limit in most commercial database systems. Circle is
levelwise algorithm that iteratively produces implicants. For portions
of the training set that are misclassified, Circle recurs producing
additional implicants that will be logically conjoined to the
previous implicants. Several optimization techniques were applied
to the base Circle algorithm, as well as numerous ways it can be
configured for specifc data mining tasks. Circle is completely implemented
in Java with a JDBC-compliant database backend. One
of the primary applications of Circle is mining bioinformatics data,
and particularly genomic data. We present results of our experiments
running circle on several well-known data sets for machine
learning, as well as special large genomic data sets.