Open Access Repository

Effectiveness of methods for syntactic and semantic recognition of numeral strings: tradeoffs between number of features and length of word N-grams

Min, KH and Wilson, WH and Kang, BH (2007) Effectiveness of methods for syntactic and semantic recognition of numeral strings: tradeoffs between number of features and length of word N-grams. In: 20th Australian Joint Conference on Artificial Intelligence (AI2007), 2-6 December 2007, Gold Coast, Australia.

Abstract

This paper describes and compares methods based on word N-grams (specifically trigrams and pentagrams), combined with five features, for recognising the syntactic and semantic categories of numeral strings in text, such as money, number, and date. The system employs three interpretation processes: construction of word N-grams with a tokeniser; rule-based processing of numeral strings; and N-gram-based classification. We extracted numeral strings from 1,111 online newspaper articles. To evaluate numeral string interpretation, we held out 112 (10%) of the 1,111 articles as unseen test data (1,278 numeral strings) and used the remaining 999 articles, which contain 11,525 numeral strings, to extract N-gram-based constraints for disambiguating the meanings of numeral strings. The word trigram method achieved 83.8% precision, 81.2% recall, and an F-measure of 82.5%. The word pentagram method achieved 86.6% precision, 82.9% recall, and an F-measure of 84.7%.
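
As a rough illustration of the quantities named in the abstract, the sketch below builds word trigram and pentagram windows around numeral tokens and recomputes the reported F-measure from precision and recall, taking the F-measure to be their harmonic mean. It is not the authors' system: the whitespace tokenisation, the digit-based numeral test, the `<pad>` marker, the centred window, and the names `numeral_windows` and `f_measure` are all assumptions made for the example, since the abstract does not specify these details.

```python
import re

# Assumption: a token counts as a "numeral string" if it contains a digit.
NUMERAL = re.compile(r"\d")


def numeral_windows(tokens, n):
    """Return (index, window) pairs for every numeral token.

    Each window is a tuple of n tokens centred on the numeral token,
    padded with "<pad>" at the sentence boundaries (n is assumed odd).
    """
    half = n // 2
    padded = ["<pad>"] * half + list(tokens) + ["<pad>"] * half
    windows = []
    for i, tok in enumerate(tokens):
        if NUMERAL.search(tok):
            # In the padded list, token i sits at position i + half,
            # so the slice starting at i is centred on it.
            windows.append((i, tuple(padded[i:i + n])))
    return windows


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentence, tokenised on whitespace.
tokens = "The company paid $ 20 million in 2007 .".split()

trigrams = numeral_windows(tokens, 3)
pentagrams = numeral_windows(tokens, 5)

# Figures taken from the abstract (percentages).
f_trigram = round(f_measure(83.8, 81.2), 1)
f_pentagram = round(f_measure(86.6, 82.9), 1)
```

With these inputs, the trigram around `20` is `("$", "20", "million")` and the pentagram is `("paid", "$", "20", "million", "in")`; the wider window exposes more context words at the cost of sparser matches, which is the tradeoff the title refers to. The recomputed F-measures round to 82.5 and 84.7, matching the values reported in the abstract.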

Item Type: Conference or Workshop Item (Paper)
Keywords: numeral strings - N-grams - named entity recognition - natural language processing
Page Range: pp. 445-455
Additional Information:

The original publication is available at www.springerlink.com

Date Deposited: 07 Apr 2008 14:57
Last Modified: 18 Nov 2014 03:36