Elimination of Redundant Information for Web Data Mining

Taib, SM and Yeom, SJ and Kang, BH (2005) Elimination of Redundant Information for Web Data Mining. In: International Conference on Information Technology, 4-6 April 2005, Las Vegas, USA.

PDF: PID51344.pdf (176kB)
Available under University of Tasmania Standard License.

Abstract

Billions of Web pages are now created with HTML or other markup languages. Compared with traditional text-based documents, these pages share few uniform structures and vary widely in authoring style. Users, however, usually focus on the particular section of a page that presents the information most relevant to their interests. Web document classification therefore needs to group and filter pages based on their content and relevant information. Many studies on Web mining report on mining Web structure and extracting information from Web content. However, they have focused on detecting tables that convey specific data, not tables that are used as a mechanism for structuring the layout of Web pages. Case models of tables can be constructed based on structure abstraction. Furthermore, Ripple Down Rules (RDR) is used to implement knowledge organization and construction, because it supports simple rule maintenance based on cases and local validation.
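
The case-based rule maintenance and local validation mentioned above can be illustrated with a minimal sketch of a generic single-classification RDR tree. This is not the authors' implementation: the table features (rows, cols, nested_tables), the "layout" and "data" labels, the default rule, and the helper names classify and add_rule are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

Case = Dict[str, int]  # hypothetical abstraction of one HTML table


@dataclass
class Rule:
    condition: Callable[[Case], bool]
    conclusion: str
    cornerstone: Optional[Case] = None   # case that motivated this rule
    if_true: Optional["Rule"] = None     # exception branch (rule fired)
    if_false: Optional["Rule"] = None    # alternative branch (rule did not fire)


def classify(root: Rule, case: Case) -> Optional[str]:
    """Return the conclusion of the last rule that fired on the path."""
    conclusion, node = None, root
    while node is not None:
        if node.condition(case):
            conclusion, node = node.conclusion, node.if_true
        else:
            node = node.if_false
    return conclusion


def add_rule(root: Rule, case: Case,
             condition: Callable[[Case], bool], conclusion: str) -> Rule:
    """Attach a new rule where inference for `case` ended."""
    if not condition(case):
        raise ValueError("new rule must cover the case that motivated it")
    node, parent, went_true, last_fired = root, root, False, None
    while node is not None:
        parent = node
        if node.condition(case):
            last_fired, went_true, node = node, True, node.if_true
        else:
            went_true, node = False, node.if_false
    # Local validation: only the cornerstone case of the rule being
    # corrected is checked, not the whole knowledge base.
    if (last_fired is not None and last_fired.cornerstone is not None
            and condition(last_fired.cornerstone)):
        raise ValueError("new rule would also fire on the cornerstone case")
    new_rule = Rule(condition, conclusion, cornerstone=dict(case))
    if went_true:
        parent.if_true = new_rule
    else:
        parent.if_false = new_rule
    return new_rule


# Hypothetical default rule: treat every table as layout until corrected.
root = Rule(lambda c: True, "layout")

data_case = {"rows": 12, "cols": 4, "nested_tables": 0}
add_rule(root, data_case,
         lambda c: c["nested_tables"] == 0 and c["rows"] >= 3, "data")

# A single-column table is misclassified as "data"; add an exception.
nav_case = {"rows": 5, "cols": 1, "nested_tables": 0}
add_rule(root, nav_case, lambda c: c["cols"] == 1, "layout")
```

In this sketch each correction is stored as an exception to the rule that produced the wrong conclusion, so earlier rules are never edited, and the new rule only has to be checked against the cornerstone case of the rule it corrects.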

Item Type: Conference or Workshop Item (Paper)
Keywords: Web Monitoring, Web information management, Ripple Down Rules, RDR, MCRDR
Publisher: IEEE
Page Range: pp. 200-205
Date Deposited: 18 May 2005
Last Modified: 18 Nov 2014 03:10
URI: http://eprints.utas.edu.au/id/eprint/178