Pattern-Based Topics for Document Modelling in Information Filtering
Topic modelling, such as Latent Dirichlet Allocation (LDA), was proposed to generate statistical models that represent multiple topics in a collection of documents, and it has been widely used in fields such as machine learning and information retrieval. However, its effectiveness in information filtering is not well understood. In reality, users' interests can be diverse, and the documents in a collection often involve multiple topics. Yet in a topic model, topics are represented by distributions over words, which are limited in their ability to represent the semantics of topics distinctively. Patterns, by contrast, are generally considered more representative and more discriminative than single terms for representing documents, and they can reveal the inner relations between words.

In this project, a new topic model, called MPBTM, is proposed for document representation and document relevance ranking. The proposed model consists of topic distributions, which describe the topic preferences of a document or a document collection, and structured pattern-based topic representations, which represent the semantic meaning of the topics in a document. Unlike the word distributions of a conventional topic model, the patterns in MPBTM are grouped based on their support and structured based on their taxonomic relationships. Their specificity is enhanced by using the associations between words and the taxonomic levels of the patterns. The proposed approach thus combines the semantic topics from topic modelling with the specificity of the representative patterns.

The proposed model has been evaluated on the information filtering task using RCV1 and TREC topics. Compared with state-of-the-art models, MPBTM shows strong performance in both document modelling and relevance ranking.
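The description does not specify how the patterns are mined, grouped or scored, so the following is only a toy sketch of the general idea under several assumptions of ours. It assumes patterns are frequent term sets mined from paragraphs already assigned to each topic. It reads "grouped based on their support" as grouping patterns that occur in exactly the same paragraphs, so each group shares one support value. It uses pattern length as a stand-in for taxonomic level (specificity). It also hard-codes the topic weights in place of the proportions an LDA model would produce. All function names (`mine_patterns`, `group_patterns`, `topic_score`, `relevance`) and the scoring formula are hypothetical and are not MPBTM's actual method.

```python
from collections import defaultdict
from itertools import combinations


def mine_patterns(transactions, min_support, max_len=3):
    """Brute-force frequent term-set mining (adequate for toy data only).

    transactions: list of term sets (e.g. paragraphs assigned to one topic).
    Returns {pattern: cover}, where cover is the frozenset of indices of the
    transactions that contain every term of the pattern.
    """
    vocab = sorted(set().union(*transactions))
    covers = {}
    for k in range(1, max_len + 1):
        for combo in combinations(vocab, k):
            pattern = frozenset(combo)
            cover = frozenset(i for i, t in enumerate(transactions) if pattern <= t)
            if len(cover) >= min_support:
                covers[pattern] = cover
    return covers


def group_patterns(covers):
    """Group patterns that occur in exactly the same transactions.

    Every pattern in a group has the same support. Within a group, patterns
    are ordered by length (our proxy for taxonomic level), most specific first.
    Returns a list of (support, patterns) pairs.
    """
    groups = defaultdict(list)
    for pattern, cover in covers.items():
        groups[cover].append(pattern)
    return [(len(cover), sorted(ps, key=lambda p: (-len(p), sorted(p))))
            for cover, ps in groups.items()]


def topic_score(doc_terms, groups):
    """Per group, credit only the most specific pattern the document fully
    contains, weighted by that group's share of the topic's total support."""
    total_support = sum(support for support, _ in groups)
    score = 0.0
    for support, patterns in groups:
        for pattern in patterns:          # longest (most specific) first
            if pattern <= doc_terms:
                score += len(pattern) * support / total_support
                break                     # skip less specific sub-patterns
    return score


def relevance(doc_terms, topic_weights, topic_groups):
    """Weight each topic's pattern score by the profile's topic proportion."""
    return sum(w * topic_score(doc_terms, topic_groups[z])
               for z, w in topic_weights.items())


# Toy data: paragraphs (as term sets) already assigned to two topics.
topic_paragraphs = {
    0: [{"stock", "market", "price"}, {"stock", "price"}, {"market", "trade"}],
    1: [{"oil", "price"}, {"oil", "export"}, {"oil", "price", "opec"}],
}
topic_weights = {0: 0.7, 1: 0.3}  # hypothetical topic proportions of the user profile
topic_groups = {z: group_patterns(mine_patterns(paras, min_support=2))
                for z, paras in topic_paragraphs.items()}

docs = {"d1": {"stock", "price", "rise"}, "d2": {"oil", "export", "ban"}}
ranking = sorted(docs, key=lambda d: relevance(docs[d], topic_weights, topic_groups),
                 reverse=True)
print(ranking)  # ['d1', 'd2']
```

In this toy run, `d1` scores 0.82 and `d2` scores 0.18. `d1` fully contains the two-term pattern {price, stock} from the profile's dominant topic, while `d2` only matches the single term "oil" from the minor topic. Crediting just the most specific match in each group is one simple way to keep sub-patterns that share the same support from being counted twice. A realistic implementation would replace the brute-force miner with an efficient frequent-pattern algorithm.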