Open Access Repository

Incremental knowledge-based system for schema mapping


Anam, S (2016) Incremental knowledge-based system for schema mapping. PhD thesis, University of Tasmania.

PDF (Whole thesis)
Available under University of Tasmania Standard License.


Abstract

Schemas describe the data structures of domains such as purchase orders, conferences, health and music, and a large number of schemas are available on the Web. Since different schema elements may have the same semantics but appear in distinct schemas, it is important to manage their semantic heterogeneity. Schema matching is usually used to determine mappings between semantically corresponding elements of different schemas. It can be conducted manually, semi-automatically or automatically. Manual matching is a time-consuming, error-prone and expensive process, while fully automated matching is not possible because of the complexity of the schemas.

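The abstract does not specify which string similarity metrics or text processing steps are applied to element names (the keywords only list them). As an illustration only, here is a minimal Python sketch, assuming camelCase/underscore tokenization and an average of token-set Jaccard similarity and `difflib.SequenceMatcher` character similarity. The names `tokenize`, `name_similarity` and `match_elements` and the 0.6 threshold are hypothetical, not taken from the thesis.

```python
import re
from difflib import SequenceMatcher


def tokenize(name: str) -> list[str]:
    """Split an element name such as 'PurchaseOrderID' or 'ship_to' into lowercase tokens."""
    # Insert a space at lower/digit -> upper camelCase boundaries, then split on non-alphanumerics.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def name_similarity(a: str, b: str) -> float:
    """Average of token-set Jaccard similarity and character-level similarity of normalized names."""
    ta, tb = set(tokenize(a)), set(tokenize(b))
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 0.0
    # Compare the sorted token strings so that token order does not matter.
    chars = SequenceMatcher(None, " ".join(sorted(ta)), " ".join(sorted(tb))).ratio()
    return (jaccard + chars) / 2


def match_elements(source, target, threshold=0.6):
    """For each source element, return (source, best target, score) if the score reaches threshold."""
    if not target:
        return []
    matches = []
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            matches.append((s, best, round(score, 2)))
    return matches
```

For example, `match_elements(["OrderID", "shipTo"], ["order_id", "ship_to_address", "customer"])` pairs `OrderID` with `order_id` (score 1.0) and `shipTo` with `ship_to_address`. A purely name-based matcher like this cannot tell apart elements whose names are similar but whose meanings differ, which is why later stages add corrections and structural evidence.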
This research investigated semi-automatic schema matching systems to reduce the manual work involved in schema mapping. In general, these systems use machine learning and knowledge engineering approaches. Machine learning approaches require training datasets for building matching models; however, it is usually very difficult to obtain appropriate training data for large datasets and to change the trained models once the mappings have been made. Knowledge engineering approaches require domain experts and time-consuming knowledge acquisition. To address these problems, Ripple-Down Rules (RDR), an incremental knowledge engineering approach, is promising because it allows its knowledge base to grow incrementally. However, acquiring matching rules is still a time-intensive task. To overcome the limitations of these approaches when used independently, a hybrid approach called Hybrid-RDR has been developed by combining a machine learning approach with a Censor Production Rules (CPR) based RDR approach.

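To make the incremental aspect of RDR concrete, the following is a minimal sketch of generic single-classification RDR in Python; it is not the CPR-based variant used in the thesis. Each rule has an exception branch (tried when the rule fires) and an else branch (tried when it does not); the conclusion of the last rule that fired is returned, and a correction is attached exactly where inference stopped, so existing rules are never edited. The class names and the case attributes `name_sim` and `same_type` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Case = dict  # hypothetical case: attribute -> value for one pair of schema elements


@dataclass
class Rule:
    condition: Callable[[Case], bool]
    conclusion: str
    except_child: Optional["Rule"] = None  # tried when this rule fires (a refinement)
    else_child: Optional["Rule"] = None    # tried when this rule does not fire


class SCRDR:
    """Single-classification RDR: the conclusion of the last rule that fired wins."""

    def __init__(self, default: str):
        # The root rule always fires and supplies the default conclusion.
        self.root = Rule(lambda c: True, default)

    def _trace(self, case):
        """Return (last rule that fired, last rule evaluated, whether that last rule fired)."""
        rule, last_fired = self.root, self.root
        last_eval, fired = self.root, True
        while rule is not None:
            last_eval = rule
            fired = rule.condition(case)
            if fired:
                last_fired = rule
                rule = rule.except_child
            else:
                rule = rule.else_child
        return last_fired, last_eval, fired

    def classify(self, case):
        return self._trace(case)[0].conclusion

    def add_rule(self, case, condition, conclusion):
        """Correct a misclassified case by attaching a new rule where inference stopped."""
        _, last_eval, fired = self._trace(case)
        new_rule = Rule(condition, conclusion)
        if fired:
            last_eval.except_child = new_rule  # refine the rule that gave the wrong conclusion
        else:
            last_eval.else_child = new_rule    # try the new rule when the last one fails
```

A typical session starts with `kb = SCRDR("no_match")`, adds `lambda c: c["name_sim"] > 0.8` concluding `"match"` for a first misclassified pair, and later adds an exception such as `lambda c: not c["same_type"]` concluding `"no_match"` when a similar-named pair with different types is wrongly matched. Earlier cases keep their conclusions because the new rule is reached only through the rule it refines.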
First, J48 was identified as the most suitable machine learning algorithm by comparing eleven machine learning approaches, including decision trees, rules, Naive Bayes and AdaBoostM1; it was then combined with CPR-based RDR to build the Hybrid-RDR approach. This approach constructs a matching model using J48. When new data become available, the model may suggest incorrect matches for some cases; these are corrected by incrementally adding rules to the knowledge base. The approach reuses previous match operations (rules) and handles schema matching problems through an incremental knowledge acquisition process, so users do not need to add, delete or modify schema matching results manually. Hybrid-RDR performs element-level matching, which considers only the names of schema elements. Structure-level matching, which considers the hierarchical structure of the schema, is required to correct the incorrect matches produced by element-level matching.

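The hybrid idea of a learned model whose mistakes are patched by incrementally added rules can be sketched as follows. This is an assumption-laden illustration: `train_stump` is a hypothetical one-threshold learner standing in for J48 (Weka's C4.5 decision-tree learner, which is not reproduced here), and the correction rules follow the general "IF condition THEN conclusion UNLESS censor" form of censor production rules rather than the thesis's exact CPR-based RDR structure. `HybridMatcher`, `CensoredRule` and the case attributes are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Case = dict  # hypothetical case: attribute -> value for one pair of schema elements


def train_stump(cases, labels, feature):
    """Hypothetical stand-in for J48: pick the threshold on one feature with the best accuracy."""
    best_t, best_acc = 0.0, -1.0
    for t in sorted({c[feature] for c in cases}):
        acc = sum((c[feature] >= t) == (y == "match") for c, y in zip(cases, labels)) / len(cases)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda case: "match" if case[feature] >= best_t else "no_match"


@dataclass
class CensoredRule:
    """IF condition THEN conclusion UNLESS censor (the censor is optional)."""
    condition: Callable[[Case], bool]
    conclusion: str
    censor: Optional[Callable[[Case], bool]] = None

    def fires(self, case) -> bool:
        blocked = self.censor is not None and self.censor(case)
        return self.condition(case) and not blocked


class HybridMatcher:
    def __init__(self, model):
        self.model = model  # learned base classifier
        self.rules = []     # correction rules added incrementally by a user

    def classify(self, case):
        # The most recently added rule that fires overrides the model's suggestion.
        for rule in reversed(self.rules):
            if rule.fires(case):
                return rule.conclusion
        return self.model(case)

    def correct(self, rule: CensoredRule):
        self.rules.append(rule)
```

Because corrections are layered on top of the model instead of retraining it, a user fixes a wrong suggestion by adding one rule. For instance, the rule "IF `name_sim >= 0.8` THEN `no_match` UNLESS `same_type`" overrides the model only for similar-named pairs whose types differ.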
A Knowledge-based Schema Matching System (KSMS) has also been developed that performs element-level matching using Hybrid-RDR and structure-level matching using the Similarity Flooding algorithm. This algorithm is based on the idea that two nodes are similar when their neighboring elements are similar. The final mappings are generated by combining the results of element-level and structure-level matching using aggregation functions. To evaluate the system, experiments were conducted using real-world schemas found on the Web. The experimental results show that the system performs well at both element-level and structure-level matching. This research has resolved the ongoing problem of elements having different names within different schemas: KSMS allows different schemas to be matched to produce accurate mappings.
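The structure-level step and the final aggregation can be sketched as follows. This is a simplified version of Similarity Flooding with several assumptions. Schemas are given as parent-to-child edge lists. Each node pair shares its similarity equally among its neighbours in the pairwise connectivity graph, whereas the original algorithm uses per-label propagation coefficients. Each iteration adds the initial similarities back in before normalizing. The weighted-average `aggregate` function and its default weight are hypothetical, since the abstract does not say which aggregation functions KSMS uses.

```python
from itertools import product


def similarity_flooding(edges_a, edges_b, initial, iterations=50, eps=1e-6):
    """Simplified Similarity Flooding over parent->child edges of two schema trees.

    initial: dict mapping every (node_a, node_b) pair to its element-level similarity.
    Returns fixpoint similarities normalized so that the maximum is 1.0.
    """
    # Pairwise connectivity graph: (a, b) <-> (a2, b2) whenever a->a2 and b->b2.
    neighbours = {pair: [] for pair in initial}
    for (a, a2), (b, b2) in product(edges_a, edges_b):
        if (a, b) in neighbours and (a2, b2) in neighbours:
            neighbours[(a, b)].append((a2, b2))  # parent pair -> child pair
            neighbours[(a2, b2)].append((a, b))  # child pair -> parent pair

    sigma0 = dict(initial)
    sigma = dict(initial)
    for _ in range(iterations):
        base = {p: sigma0[p] + sigma[p] for p in sigma}
        nxt = dict(base)
        for p, nbrs in neighbours.items():
            for q in nbrs:
                # Each pair shares its current value equally among its neighbours.
                nxt[q] += base[p] / len(nbrs)
        top = max(nxt.values()) or 1.0
        nxt = {p: v / top for p, v in nxt.items()}
        delta = max(abs(nxt[p] - sigma[p]) for p in sigma)
        sigma = nxt
        if delta < eps:
            break
    return sigma


def aggregate(element, structure, weight=0.5):
    """Hypothetical aggregation: weighted average of element-level and structure-level scores."""
    return {p: weight * element[p] + (1 - weight) * structure.get(p, 0.0) for p in element}
```

In a small example with `PurchaseOrder -> Item, Customer` and `Order -> Line, Client`, giving `(Customer, Client)` a higher initial score than `(Customer, Line)` keeps it ahead after flooding, while the root pair `(PurchaseOrder, Order)` is reinforced by all of its child pairs. This reflects the principle that two nodes are similar when their neighbours are similar.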

Item Type: Thesis (PhD)
Keywords: Schema matching, schema mapping, string similarity metrics, text processing techniques, machine learning approaches, Ripple-Down Rules, Hybrid-RDR approach
Copyright Information: Copyright 2016 the author

Date Deposited: 12 Oct 2016 01:31
Last Modified: 19 Oct 2016 03:40
