Indiana University School of Informatics



Catharine Wyss

Faculty Title

Assistant Professor of Informatics and Computer Science

Research Statement

Data Integration and Interoperability

My research is primarily in the area of database theory and systems. Currently, my central project involves integrating data from different sources, which may be technologically or structurally dissimilar, or both. There is great need in our increasingly interconnected world for disparate data and information sources to communicate and interoperate, and the database community is particularly well positioned to contribute to this effort. Over the last decade, it was thought that emerging web standards, including XML, would lead to comprehensive solutions to the data integration and interoperability problem. However, it has turned out that most of the challenges that existed before XML still exist. Thus, the sub-field of databases comprising federated and heterogeneous information systems has renewed importance within the spectrum of database research. As someone who never “left” this sub-field, I find this a welcome and anticipated development.

Comprehensive Metadata Management

Our approach to the data integration and interoperability problem centers on comprehensive metadata management. All information systems must contain meta-information (or metadata) that describes how the data within these systems is structured. Among structurally dissimilar information sources in particular, it is therefore important to access the metadata of the constituent systems and to use this metadata to construct useful mappings among the sources.

My previous contribution involved extending the most common information system technology, the relational model, with intrinsic metadata support. Unlike previous attempts, my extension framework includes a formal algebraic language for metadata manipulation. This algebra has clean, inherently relational semantics, bounded complexity, and relationally complete expressiveness. This extended framework allows metadata to be queried and exchanged as easily as data.
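To give a rough feel for what "querying metadata as easily as data" can mean, here is a minimal sketch in Python using the standard sqlite3 module. It is not the algebra or framework described above. It only flattens a database's schema into ordinary (table, column, type) rows so that a question about structure can be answered like a question about data. The helper names schema_as_relation and tables_with_column, and the sample tables, are hypothetical and chosen for illustration only.

```python
import sqlite3


def schema_as_relation(conn):
    """Return the schema as rows of (table_name, column_name, declared_type)."""
    rows = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, decl_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            rows.append((table, column, decl_type))
    return rows


def tables_with_column(conn, column_name):
    """A metadata query: which tables expose a column with this name?"""
    return sorted(
        {table for table, column, _ in schema_as_relation(conn) if column == column_name}
    )


# Hypothetical sample schema, used only to exercise the helpers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, dept TEXT)")
conn.execute("CREATE TABLE contractor (id INTEGER, name TEXT, agency TEXT)")

print(tables_with_column(conn, "name"))
```

Once the schema is available as plain rows, it can be filtered, joined, or shipped to another system with the same tools used for the data itself, which is the intuition behind treating metadata as a first-class relational citizen.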

Currently, we are building on this work to develop a larger framework to address data integration and interoperability on a much larger scale. We are developing simple systemic extensions to relational databases for storing more comprehensive metadata, including metadata about the model semantics behind the data structures. Such extensions facilitate a highly sought-after model of peer-to-peer interoperability, where large networks of interconnected information sources respond to individual queries with quick-return, incremental answers. Such a network depends on a notion of structural “closeness” among constituent data sources, and does not require these sources to conform to a single (possibly constricting) global structural description, unlike the frameworks currently in use.
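The statement above does not define structural “closeness”, so the following Python sketch is only one possible reading, under a loud assumption: each source is summarized by its set of attribute names, and closeness is the Jaccard overlap of those sets. The functions jaccard_closeness and closest_peers and the peer names are hypothetical. The point is simply that peers can be ranked pairwise without any global schema.

```python
def jaccard_closeness(schema_a, schema_b):
    """Assumed closeness measure: shared attributes over all attributes."""
    a, b = set(schema_a), set(schema_b)
    if not a and not b:
        return 1.0  # two empty schemas are treated as identical
    return len(a & b) / len(a | b)


def closest_peers(query_schema, peers, k=2):
    """Rank peers by closeness to the query schema; ties break by peer name."""
    ranked = sorted(
        peers.items(),
        key=lambda item: (-jaccard_closeness(query_schema, item[1]), item[0]),
    )
    return [name for name, _ in ranked[:k]]


# Hypothetical peers, each described only by its attribute names.
peers = {
    "hr": {"id", "name", "dept"},
    "payroll": {"id", "name", "salary"},
    "inventory": {"sku", "qty"},
}

print(closest_peers({"id", "name", "dept", "salary"}, peers))
```

In a sketch like this, a query would be forwarded first to the structurally closest peers, which can begin returning partial answers while more distant sources are still being consulted.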

Information Representation and Database Design

I organize and teach the third-year undergraduate informatics core course I308 Information Representation. The goal of this course is to teach our students how to go from real-world descriptions of problems, enterprises, and general knowledge to a useful digital representation of these systems. A primary emphasis is on information design, which is important for several reasons. First, the design process is crucial for truly understanding the real-world domain that is to be modeled. Second, a good design facilitates future maintenance and usage of the resultant system, whereas a poor design is the single biggest predictor of failure of information systems. Finally, my work in information design and modeling relates back to the comprehensive problems of data integration and interoperability that I research (and that every informatics student will no doubt face), in that the design is what becomes encoded as metadata in the system. If this metadata works properly, ostensibly disparate systems can nonetheless assist in decision-making and query response, since these systems will be much more likely to communicate with one another automatically via efficient and comprehensive metadata exchange.

Select Presentations

  • “Intrinsic Support for Metadata Integration in Relational Federations”. With James Lu, Shun Yan Cheung, and Mehdi Akhavein. Engineering Federated Information Systems (EFIS 2003), Coventry, UK, July 2003.
  • “Managing XML Schemas Through XRDBMS”. With James Lu, Shun Yan Cheung, and Mehdi Akhavein. Information and Knowledge Engineering (IKE 2003), Las Vegas, Nevada, June 2003.
  • “A Relational Algebra for Data/Metadata Integration in a Federated Database System”. With Dirk Van Gucht. Conference on Information and Knowledge Management (CIKM 2001), Atlanta, Georgia, November 2001.
  • “Augmenting SQL with Dynamic Typing to Support Interoperability in a Relational Federation”. With Felix Wyss and Dirk Van Gucht. Engineering Federated Information Systems (EFIS 2001), Berlin, Germany, October 2001.


Our faculty research profiles highlight the research interests and accomplishments of a select faculty member from the IU School of Informatics.