Information Extraction & Object Views

Subject

Information extraction
object data model
object view

Abstract

Information extraction consists of identifying classes of events and the relationships between extracted instances of these classes. Extracted data usually fills slots in a template and is stored in tables. We propose extending this approach to use an object database. Information extraction tools are represented conceptually as schema components: concept classes, meta-concepts, and attributes. In a query, the user expresses a target structure that reflects their understanding of the domain and serves as the schema for the database. We use the object data model, whose syntax matches both the user's target structure and the conceptual representation of the extraction capabilities. Query evaluation proceeds in three steps: first, it determines the schema of the database as expressed by the user; second, it populates the database through methods that invoke extraction tools on a given source of documents; third, it returns the output of the query against the resulting database. The first two steps define an object view of the given source(s) as a materialized extension of the current schema (each refinement of a query may add more structure, and thus more extracted data), followed by a non-materialized projection. Our approach is user-oriented: the object representation of data gives the user the flexibility to phrase a query according to their own understanding of the domain, and object views are built on the fly according to the user's organization of the data. Because the extraction capabilities are represented modularly as a pool of schema components, new extraction tools can be plugged in easily.
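
The three evaluation steps described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the report: `SchemaComponent`, `POOL`, `ObjectView`, and `evaluate` are hypothetical names, and the two regular-expression extractors merely stand in for real extraction tools.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, str]


@dataclass
class SchemaComponent:
    """Hypothetical description of one extraction tool: a concept class,
    its attributes, and the method that invokes the tool on a document."""
    concept: str
    attributes: List[str]
    extract: Callable[[str], List[Record]]


# Toy extractors (assumptions): regexes standing in for real extraction tools.
def extract_person(doc: str) -> List[Record]:
    pattern = r"\b(?:Dr|Mr|Ms)\. ([A-Z][a-z]+)"
    return [{"name": m.group(1)} for m in re.finditer(pattern, doc)]


def extract_meeting(doc: str) -> List[Record]:
    pattern = r"\b(?:Dr|Mr|Ms)\. ([A-Z][a-z]+) met (?:Dr|Mr|Ms)\. ([A-Z][a-z]+)"
    return [{"first": m.group(1), "second": m.group(2)}
            for m in re.finditer(pattern, doc)]


# Pool of schema components; plugging in a new tool means adding an entry.
POOL: Dict[str, SchemaComponent] = {
    "Person": SchemaComponent("Person", ["name"], extract_person),
    "Meeting": SchemaComponent("Meeting", ["first", "second"], extract_meeting),
}


@dataclass
class ObjectView:
    """Materialized extension of the current schema over a document source."""
    schema: Dict[str, List[str]] = field(default_factory=dict)
    extents: Dict[str, List[Record]] = field(default_factory=dict)

    def extend(self, target: Dict[str, List[str]], documents: List[str]) -> None:
        for concept, wanted in target.items():
            component = POOL[concept]
            unknown = [a for a in wanted if a not in component.attributes]
            if unknown:
                raise ValueError(f"{concept} has no attributes {unknown}")
            if concept in self.schema:
                continue  # already materialized by an earlier query
            # Step 1: add the concept class to the schema.
            self.schema[concept] = list(component.attributes)
            # Step 2: populate its extent by invoking the extraction tool.
            self.extents[concept] = [
                obj for doc in documents for obj in component.extract(doc)
            ]

    def project(self, concept: str, attributes: List[str]) -> List[Record]:
        # Step 3: non-materialized projection, computed on demand.
        return [{a: obj[a] for a in attributes} for obj in self.extents[concept]]


def evaluate(view: ObjectView, target: Dict[str, List[str]],
             documents: List[str], concept: str) -> List[Record]:
    """Extend the view with the target structure, then answer the query."""
    view.extend(target, documents)
    return view.project(concept, target[concept])
```

In this sketch, a first call such as `evaluate(view, {"Person": ["name"]}, docs, "Person")` materializes only the `Person` class. A refined query that also mentions `Meeting` adds that class to the same view without re-extracting `Person`, which mirrors the incremental refinement the abstract describes.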

Publication date

1998-03-01

Comments

University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-98-11.