When constructing a predictive model from high-dimensional data, it is desirable to remove redundant and non-predictive independent variables. Doing so leads to models that are easier to understand, and it often improves generalization by reducing overfitting. The problem of determining the right subset of variables to use is known as the feature selection problem. In many domains, especially medicine, identifying the important variables is a significant problem in its own right. The general form of the feature selection problem is NP-hard, and it is further complicated by the fact that the objective must be estimated and depends on the form of the predictive model being used. This talk will describe two techniques for finding optimal feature sets in an inductive learning setting. The first uses concave minimization to solve an approximation of the combinatorial problem. The second frames feature selection as a multi-objective optimization problem and uses genetic algorithms to explore the space of possible solutions. Finally, the talk will explore clustering (unsupervised learning) and use the genetic algorithm framework to determine not only the optimal feature set but also the appropriate number of clusters.
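As a rough illustration of the idea behind the first technique, the sketch below compares the exact count of features in use (the combinatorial quantity) with a smooth concave surrogate of that count. The exponential form 1 - exp(-alpha * |w|) is one commonly used approximation and is an assumption here; the abstract does not state which formulation the talk uses. The names `count_selected`, `concave_surrogate`, and `alpha` are hypothetical and chosen only for this example.

```python
import math


def count_selected(weights, tol=0.0):
    """Exact number of features in use: weights whose magnitude exceeds tol."""
    return sum(1 for w in weights if abs(w) > tol)


def concave_surrogate(weights, alpha=5.0):
    """Smooth stand-in for the count: sum of 1 - exp(-alpha * |w|).

    Each term is concave in |w|, equals 0 when w == 0, and approaches 1
    as alpha * |w| grows, so the sum approaches the exact count.
    (Assumed form for illustration; not taken from the talk itself.)
    """
    return sum(1.0 - math.exp(-alpha * abs(w)) for w in weights)


# Hypothetical weight vector of a linear model: two of five features are used.
weights = [0.0, 2.0, 0.0, -1.5, 0.0]

exact = count_selected(weights)
for alpha in (1.0, 5.0, 50.0):
    approx = concave_surrogate(weights, alpha)
    print(f"alpha={alpha:5.1f}  surrogate={approx:.4f}  exact={exact}")
```

Larger values of `alpha` bring the surrogate closer to the exact count, at the cost of a steeper and harder-to-optimize function near zero. In a full method, a term like this would presumably be added to a classification-error term and minimized over the weights; that part is not shown here.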
Nick Street is an assistant professor in the Management Sciences Department at the University of Iowa. He received a Ph.D. in 1994 in Computer Sciences from the University of Wisconsin. He was a postdoctoral researcher in the departments of Surgery and Computer Sciences at Wisconsin from 1994 to 1996, and an assistant professor in the Computer Science department at Oklahoma State University from 1996 to 1998. His research interests are in machine learning and data mining, particularly the use of mathematical optimization in inductive learning techniques. Areas of his previous and current work include: feature selection, overfitting avoidance, clustering, ensemble methods, prediction with censored data, and image and text segmentation. He has published extensively in the medical literature on breast cancer diagnosis and prognosis. Professor Street was the recipient of a CAREER award from the National Science Foundation in 1997 and an INRSA postdoctoral fellowship from the National Cancer Institute in 1995, as well as several smaller research awards from various public and private sources.