Noise Elimination from the Web Documents by Using URL paths and Information Redundancy

Kang, BH and Kim, YS (2006) Noise Elimination from the Web Documents by Using URL paths and Information Redundancy. In: The 2006 International Conference on Information & Knowledge Engineering, 26-29 Jun, Las Vegas, US.

IKE06-Noise_Elimination_from.pdf | Download (346kB)
Available under University of Tasmania Standard License.

Abstract

Noise data in Web documents significantly affect the performance of Web information management systems. Many researchers have proposed noise elimination methods based on document structure. In this paper, we propose a different approach that eliminates redundant information across Web documents that share the same URL path. We propose a redundant word/phrase filtering method for single or multiple tokenizations. We conducted two experiments to examine the efficiency and effectiveness of our filtering approaches. The experimental results show that our approach performs well on both criteria.
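
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that documents are grouped by host plus the directory part of the URL, that a "single or multiple" tokenization corresponds to word n-grams of length n, and that a token counts as redundant when it appears in at least a given fraction of the documents in its group. The names url_path_key, ngrams, filter_redundant, and the threshold parameter are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse
import posixpath


def url_path_key(url):
    """Assumed grouping key: host plus the directory part of the URL path."""
    parts = urlparse(url)
    return parts.netloc + posixpath.dirname(parts.path)


def ngrams(words, n):
    """Return consecutive n-word tuples (n=1 gives single-word tokens)."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def filter_redundant(docs, n=1, threshold=0.8):
    """docs maps URL -> text. Returns URL -> list of words kept after filtering.

    Hypothetical rule: an n-gram is redundant if it occurs in at least
    `threshold` of the documents that share the same URL path group.
    Groups with a single document are left unfiltered.
    """
    groups = defaultdict(list)
    for url in docs:
        groups[url_path_key(url)].append(url)

    result = {}
    for urls in groups.values():
        tokenized = {u: docs[u].split() for u in urls}
        # Document frequency of each n-gram within this URL-path group.
        df = Counter()
        for words in tokenized.values():
            df.update(set(ngrams(words, n)))
        redundant = {g for g, c in df.items()
                     if len(urls) > 1 and c / len(urls) >= threshold}
        for u, words in tokenized.items():
            # Collect the word positions covered by any redundant n-gram.
            drop = set()
            for i, g in enumerate(ngrams(words, n)):
                if g in redundant:
                    drop.update(range(i, i + n))
            result[u] = [w for i, w in enumerate(words) if i not in drop]
    return result
```

Under these assumptions, navigation text that repeats across every page below the same path (for example a shared menu) would be dropped, while words unique to one page would be kept. Whitespace tokenization and the 0.8 threshold are placeholders chosen for illustration.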

Item Type: Conference or Workshop Item (Paper)
Keywords: MCRDR Filtering Web
Date Deposited: 08 Feb 2007
Last Modified: 18 Nov 2014 03:13
URI: http://eprints.utas.edu.au/id/eprint/723