Open Access Repository

Elimination of Redundant Information for Web Data Mining



Taib, SM, Yeom, SJ and Kang, BH 2005, 'Elimination of Redundant Information for Web Data Mining', paper presented at the International Conference on Information Technology, 4-6 April 2005, Las Vegas, USA.

Available under University of Tasmania Standard License.



These days, billions of Web pages are created with
HTML or other markup languages. Compared to traditional
text-based documents, they have few uniform structures
and exhibit a wide variety of authoring styles. However,
users usually focus on the particular section of a page
that presents the information most relevant to their
interests. Therefore, Web document classification needs
to group and filter pages based on their contents and
relevant information. Many studies on Web mining
report on mining Web structure and extracting
information from Web contents. However, they have
focused on detecting tables that convey specific data, not
the tables that are used as a mechanism for structuring
the layout of Web pages. Case modeling of tables can be
constructed based on structure abstraction. Furthermore,
Ripple Down Rules (RDR) is used to implement
knowledge organization and construction, because it
supports simple rule maintenance based on cases and
local validation.
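
As a rough illustration of the RDR idea mentioned in the abstract, the following is a minimal sketch of a single-classification Ripple Down Rules tree applied to table cases. It is not the authors' implementation: the feature names (`rows`, `has_header`, `nested_tables`), the rule conditions, and the labels are all hypothetical, chosen only to show how an exception rule refines an earlier rule locally without rewriting it.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A case is a plain dict of table features (hypothetical feature names).
Case = dict


@dataclass
class Rule:
    condition: Callable[[Case], bool]
    conclusion: str
    if_true: Optional["Rule"] = None   # exception: refines this rule when it fires
    if_false: Optional["Rule"] = None  # alternative: tried when this rule does not fire


def classify(rule: Optional[Rule], case: Case, default: str = "unknown") -> str:
    """Return the conclusion of the last rule that fired along the path."""
    conclusion = default
    while rule is not None:
        if rule.condition(case):
            conclusion = rule.conclusion
            rule = rule.if_true
        else:
            rule = rule.if_false
    return conclusion


# Hypothetical knowledge base: every table is assumed to be a layout table,
# unless it looks like a data table, unless it contains nested tables.
nested_rule = Rule(lambda c: c["nested_tables"] > 0, "layout table")
data_rule = Rule(lambda c: c["rows"] >= 3 and c["has_header"], "data table",
                 if_true=nested_rule)
root = Rule(lambda c: True, "layout table", if_true=data_rule)

print(classify(root, {"rows": 5, "has_header": True, "nested_tables": 0}))
print(classify(root, {"rows": 5, "has_header": True, "nested_tables": 2}))
```

In this sketch a misclassified case would be corrected by attaching a new exception rule beneath the rule that gave the wrong conclusion, so existing rules are left untouched and the change only affects cases that reach that point in the tree.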

Item Type: Conference or Workshop Item (Paper)
Authors/Creators: Taib, SM and Yeom, SJ and Kang, BH
Keywords: Web Monitoring, Web information management, Ripple Down Rules, RDR, MCRDR
Publisher: IEEE
