A Hash Map based Binary Matrix Approach for Text Document Classification

Suhail Afroz; M. Hanimi Reddy

Suhail Afroz, Assoc. Professor, CVR College of Engineering / CSE Department, Hyderabad, India.
M. Hanimi Reddy, Sr. Asst. Professor, CVR College of Engineering / CSE Department, Hyderabad, India.

Abstract

Conventional models classify text documents using a sequential approach. This paper proposes a new approach to text document classification that preserves the sequence in which words occur in a document. The data structure used to preserve these word sequences is called a “Binary Matrix”. A technique for classifying text documents is also proposed. Terms are indexed with a hash map, in which each term is associated with the list of class labels of the documents in which it occurs.
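
The two structures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not define the Binary Matrix layout or the classification rule, so several parts here are assumptions made only for illustration: the matrix is taken to be a term-by-term adjacency matrix in which entry [i][j] is 1 when term j immediately follows term i, the classifier is a naive label vote over the hash map, and the names tokenize, build_term_index, build_binary_matrix, and classify are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def build_term_index(labeled_docs):
    """Hash map from each term to the class labels of the documents containing it."""
    index = defaultdict(set)
    for label, text in labeled_docs:
        for term in tokenize(text):
            index[term].add(label)
    # Store labels as sorted lists so the result is deterministic.
    return {term: sorted(labels) for term, labels in index.items()}


def build_binary_matrix(text):
    """Assumed layout: matrix[i][j] == 1 if vocab[j] directly follows vocab[i]."""
    tokens = tokenize(text)
    vocab = sorted(set(tokens))
    position = {term: i for i, term in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for current, following in zip(tokens, tokens[1:]):
        matrix[position[current]][position[following]] = 1
    return vocab, matrix


def classify(text, index):
    """Assumed rule (not the paper's): each known term votes for its labels."""
    votes = defaultdict(int)
    for term in tokenize(text):
        for label in index.get(term, []):
            votes[label] += 1
    if not votes:
        return None
    # Iterate labels in sorted order so ties resolve alphabetically.
    return max(sorted(votes), key=votes.get)
```

With a term index built from labeled training documents, classify looks up each term of an unseen document in the hash map, while build_binary_matrix records which word pairs occur in sequence. How the paper combines the two during classification is not stated in the abstract, so the sketch keeps them separate.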