Semantic Similarity Measurement between Words using Lexical Patterns
Abstract
Measuring semantic similarity between words is a challenging task in web mining, information extraction, and natural language processing. Web mining applications such as community extraction and relation identification require a measure of semantic similarity between entities. In this paper, the authors propose an automatic approach that estimates the semantic similarity between words or entities with the help of web search engines. Page counts returned by a search engine are used to define several word co-occurrence measures, which are then integrated with lexical patterns. To identify meaningful relationships between two given words, the authors propose a new pattern extraction algorithm and a pattern clustering algorithm. A Support Vector Machine (SVM) is used to learn the optimal combination of page-count-based co-occurrence measures and lexical pattern clusters. On benchmark data sets, the proposed method outperforms various previously proposed web-based similarity measures and shows a high correlation with human ratings.
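
As an illustration of what page-count-based co-occurrence measures can look like, the following minimal sketch computes four standard set-overlap and association scores from search engine hit counts. The abstract does not name the specific measures used, so the choice of Jaccard, Dice, Overlap, and pointwise mutual information is an assumption made here for illustration. The function names, the zero-count handling, and the index size n are likewise hypothetical, and the page counts are taken as plain integers rather than fetched from a real search engine.

```python
import math


def web_jaccard(p: int, q: int, pq: int) -> float:
    """Jaccard coefficient: co-occurrence count over the size of the union."""
    denom = p + q - pq
    return pq / denom if denom > 0 else 0.0


def web_dice(p: int, q: int, pq: int) -> float:
    """Dice coefficient: twice the co-occurrence count over the summed counts."""
    denom = p + q
    return 2 * pq / denom if denom > 0 else 0.0


def web_overlap(p: int, q: int, pq: int) -> float:
    """Overlap coefficient: co-occurrence count over the smaller page count."""
    denom = min(p, q)
    return pq / denom if denom > 0 else 0.0


def web_pmi(p: int, q: int, pq: int, n: int) -> float:
    """Pointwise mutual information (base 2) from page counts.

    n is an assumed estimate of the number of pages indexed by the engine.
    Returns 0.0 when any count is zero (an illustrative convention).
    """
    if p == 0 or q == 0 or pq == 0:
        return 0.0
    return math.log2((pq / n) / ((p / n) * (q / n)))


def cooccurrence_features(p: int, q: int, pq: int, n: int) -> list[float]:
    """Collect the four scores into one feature vector for a word pair."""
    return [
        web_jaccard(p, q, pq),
        web_dice(p, q, pq),
        web_overlap(p, q, pq),
        web_pmi(p, q, pq, n),
    ]
```

Here p and q stand for the page counts of the two individual words, and pq for the page count of the conjunctive query containing both. A feature vector of this kind, extended with features derived from lexical pattern clusters, is the sort of input a classifier such as an SVM could be trained on.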