Noise Elimination from the Web Documents by Using URL paths and Information Redundancy
Kang, BH and Kim, YS (2006) Noise Elimination from the Web Documents by Using URL paths and Information Redundancy. In: The 2006 International Conference on Information & Knowledge Engineering, 26-29 Jun, Las Vegas, US.
Available under University of Tasmania Standard License.
Noise data in Web documents significantly affects the performance of Web information management systems. Many researchers have proposed noise elimination methods based on document structure. In this paper, we propose a different approach that eliminates redundant information across Web documents sharing the same URL path. We propose a redundant word/phrase filtering method for single or multiple tokenizations. We conducted two experiments to examine the efficiency and effectiveness of our filtering approaches. Experimental results show that our approach achieves high performance on both criteria.
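The abstract does not give the algorithm in detail, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that pages are grouped by host plus the directory part of the URL path, that tokens are whitespace-separated single words (phrases are not handled), and that a token counts as redundant when it occurs in at least a chosen fraction of the pages in its group. The names `url_path_key`, `filter_redundant_tokens`, `threshold`, and `min_group_size`, and the example.com URLs, are all hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse
import posixpath


def url_path_key(url):
    """Group key: host plus the directory part of the URL path (an assumption)."""
    parts = urlparse(url)
    return parts.netloc + posixpath.dirname(parts.path)


def filter_redundant_tokens(pages, threshold=0.8, min_group_size=2):
    """Drop tokens shared by most pages under the same URL path.

    pages: dict mapping URL -> page text.
    Returns a dict mapping URL -> list of retained tokens.
    """
    # Collect the URLs that share a path key.
    groups = defaultdict(list)
    for url in pages:
        groups[url_path_key(url)].append(url)

    filtered = {}
    for urls in groups.values():
        tokens = {u: pages[u].split() for u in urls}
        if len(urls) < min_group_size:
            # Too few pages to judge redundancy; keep everything.
            filtered.update(tokens)
            continue
        # Document frequency: number of pages in the group containing each token.
        doc_freq = Counter()
        for toks in tokens.values():
            doc_freq.update(set(toks))
        for u in urls:
            filtered[u] = [
                t for t in tokens[u] if doc_freq[t] / len(urls) < threshold
            ]
    return filtered


# Hypothetical sample data: three pages under /news share a navigation bar.
pages = {
    "http://example.com/news/a.html": "Home Contact storm hits coast",
    "http://example.com/news/b.html": "Home Contact budget vote delayed",
    "http://example.com/news/c.html": "Home Contact team wins final",
    "http://example.com/about.html": "Home Contact staff list",
}
result = filter_redundant_tokens(pages)
```

In this sample, "Home" and "Contact" appear on every page under `/news` and are filtered out there, while the lone page at the site root is left unchanged because its group is too small to compare against.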
|Item Type:||Conference or Workshop Item (Paper)|
|Keywords:||MCRDR Filtering Web|
|Date Deposited:||08 Feb 2007|
|Last Modified:||18 Nov 2014 03:13|