This is the abstract of a talk prepared for the Oeiras Mathematical and Computational Biology Workshop, June 20, 2003, Instituto Gulbenkian de Ciência.
Abstract: Biology needs Informatics. The production of ever larger databases in molecular biology, particularly those containing genomic data, has led to a strong interest in Bioinformatics and Computational Biology, driven by the obvious need to analyze and understand such large collections of data. In particular, DNA microarray technology, with its ability to measure the expression patterns of thousands of genes simultaneously, presents researchers with formidable data analysis difficulties.
Analysis so far has mostly been limited to identifying genes and arrays with similar overall expression patterns by clustering methods. However, in general, we expect the expression behavior of genes to be influenced by more than one regulatory network or cellular process. Therefore, it is desirable to apply ‘spectral’ analysis methods to gene expression data, since these could reveal the distinct, superposed processes influencing a gene’s expression level. We describe our research using such methods for gene expression analysis.
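As a rough illustration of what a ‘spectral’ decomposition of expression data can look like, the sketch below applies a singular value decomposition (SVD) to a small synthetic genes-by-arrays matrix. The abstract does not say which spectral method is used, so the choice of SVD, the function name `spectral_decompose`, and the synthetic data (two superposed processes plus noise) are all assumptions made for illustration only.

```python
import numpy as np


def spectral_decompose(expr):
    """Decompose a genes x arrays expression matrix with the SVD.

    Hypothetical helper, not the method described in the talk.
    Returns gene loadings, singular values, array-level patterns, and the
    fraction of total variance carried by each component.
    """
    # Center each gene's profile so components describe variation across arrays.
    centered = expr - expr.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    return u, s, vt, variance_fraction


# Synthetic data (an assumption): every gene is a weighted mix of two
# underlying processes, plus a small amount of measurement noise.
rng = np.random.default_rng(0)
n_genes, n_arrays = 200, 12
t = np.linspace(0, 2 * np.pi, n_arrays, endpoint=False)
processes = np.vstack([np.sin(t), np.cos(2 * t)])   # shape: 2 x arrays
weights = rng.normal(size=(n_genes, 2))             # per-gene influence of each process
expr = weights @ processes + 0.05 * rng.normal(size=(n_genes, n_arrays))

u, s, vt, variance_fraction = spectral_decompose(expr)
print("variance fraction of the first three components:", variance_fraction[:3])
```

In this toy setting the first two components carry nearly all of the variance, reflecting the two superposed processes, whereas a clustering of whole profiles would assign each gene to a single group. Real microarray data would also need normalization and handling of missing values, which this sketch omits.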
But data-mining approaches to large-scale measurements in biology (e.g. in gene expression) are only the first stage of a more comprehensive approach to bioinformatics. These methods are typically used to discover patterns of expression behavior associated with subsets of genes, which are thus identified. But this analysis is pursued using exclusively the numerical expression values obtained from microarray experiments. Therefore, these methods cannot directly help us in deriving functional knowledge. The biological reasons for the patterns they identify must ultimately be ascertained by biologists, who need to be able to integrate knowledge about a large number of possible underlying biological mechanisms. Given the large number of genes in microarrays and the myriad possible networks of cellular interaction, this is a daunting task indeed. The second stage of the analysis of the large-scale measurements now available to biology derives from the need to assist biologists in generating functional hypotheses about the results of numerical analysis (the first stage).
Recent renewed interest in Systems Biology has led researchers in Bioinformatics to the idea that, in general, no single set of measurements, data analysis method, or research team will be sufficient to understand complex biological networks of vast size. Instead, this research needs to be carried out by interdisciplinary teams empowered with Informatics technology capable of automatically integrating the results of pattern recognition analysis of microarray data with available sources of functional knowledge. Clearly, such integrative technology does not aim to replace biologists, but rather to assist them by reducing the number of possible explanations of functional behavior.
At Los Alamos we are a) investigating and developing ‘spectral’ methods for gene-expression analysis and b) extracting functional knowledge from literature sources using several information retrieval techniques, which we intend to develop for this area. Our work is pursued in collaboration with a team of researchers from computer science and biology in several LANL divisions and external research institutions, as such an interdisciplinary endeavor requires.