Elimination of Redundant Information for Web Data Mining
Taib, SM and Yeom, SJ and Kang, BH (2005) Elimination of Redundant Information for Web Data Mining. In: International Conference on Information Technology, 4-6 April 2005, Las Vegas, USA.
Abstract
These days, billions of Web pages are created with
HTML or other markup languages. Compared to traditional
text-based documents, they share few uniform structures
and vary widely in authoring style. However,
users usually focus on a particular section of the page
that presents the most relevant information to their
interest. Therefore, Web document classification needs
to group and filter pages based on their content and
relevant information. Many Web mining studies
report on mining Web structure and extracting
information from Web content. However, they have
focused on detecting tables that convey specific data, not
the tables that are used as a mechanism for structuring
the layout of Web pages. Case modeling of tables can be
constructed based on structure abstraction. Furthermore,
Ripple Down Rules (RDR) is used to implement
knowledge organization and construction, because it
supports simple rule maintenance based on cases and
local validation.
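
To illustrate the case-based rule maintenance and local validation that the abstract attributes to RDR, the following is a minimal sketch of a generic single-classification Ripple Down Rules tree, not the authors' implementation. The table features (`numeric_ratio`, `nested_tables`), the class labels (`"layout"`, `"data"`), and all function names are hypothetical and chosen only for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Case = Dict[str, float]  # hypothetical abstraction of one HTML table


@dataclass
class Rule:
    condition: Callable[[Case], bool]
    conclusion: str
    cornerstone: Optional[Case] = None   # case that prompted this rule
    if_true: Optional["Rule"] = None     # exception branch (rule fired)
    if_false: Optional["Rule"] = None    # alternative branch (rule did not fire)


def _trace(root: Rule, case: Case) -> Tuple[Rule, Rule, bool]:
    """Walk the tree; return (last fired rule, last visited rule, whether it fired)."""
    node, fired, last, last_fired = root, root, root, False
    while node is not None:
        last, last_fired = node, node.condition(case)
        if last_fired:
            fired, node = node, node.if_true
        else:
            node = node.if_false
    return fired, last, last_fired


def classify(root: Rule, case: Case) -> str:
    """The conclusion of the last rule that fired on the path is the answer."""
    return _trace(root, case)[0].conclusion


def add_rule(root: Rule, case: Case,
             condition: Callable[[Case], bool], conclusion: str) -> None:
    """Attach a correction for a misclassified case where its path ended."""
    fired, last, last_fired = _trace(root, case)
    if not condition(case):
        raise ValueError("new rule must cover the new case")
    # Local validation: only the cornerstone of the corrected rule is checked.
    if fired.cornerstone is not None and condition(fired.cornerstone):
        raise ValueError("new rule must exclude the parent's cornerstone case")
    new_rule = Rule(condition, conclusion, cornerstone=case)
    if last_fired:
        last.if_true = new_rule
    else:
        last.if_false = new_rule


# Default rule (assumption): treat every table as layout unless a rule says otherwise.
root = Rule(lambda c: True, "layout")

numeric_table = {"numeric_ratio": 0.8, "nested_tables": 0}
add_rule(root, numeric_table, lambda c: c["numeric_ratio"] > 0.5, "data")

nested_table = {"numeric_ratio": 0.6, "nested_tables": 2}
add_rule(root, nested_table, lambda c: c["nested_tables"] > 0, "layout")

print(classify(root, numeric_table), classify(root, nested_table))
```

In this sketch each correction is stored together with the case that prompted it, and a new rule is validated only against the cornerstone case of the rule it overrides rather than against the whole rule base, which is what keeps maintenance local.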