A Machine Learning Approach for Identifying Disease-Treatment Relations in Short Texts
Description
Information in text form remains a greatly underutilized resource in biomedical applications. We have begun a research effort aimed at learning routines for automatically mapping information from biomedical text sources, such as Medline, into structured representations, such as knowledge bases. The Medline database is a rich source of information for the biomedical sciences, providing bibliographic information and abstracts for more than nine million articles. A fundamental limitation of Medline and similar sources, however, is that the information they contain is represented in natural language text rather than in a structured format. The articles in Medline describe a vast web of relationships among the genes, proteins, pathways, tissues, and diseases of various systems and organisms of interest.

The goal of our research is to develop methods that can inexpensively and accurately map information in scientific text sources, such as Medline, into a structured representation, such as a knowledge base or a database. Toward this end, we are investigating methods for automatically extracting key facts from scientific texts. There are three major approaches to extracting relations between entities: co-occurrence analysis, rule-based approaches, and statistical methods. Co-occurrence methods rely mostly on lexical knowledge and words in context; although they tend to achieve good recall, their precision is low.

This scheme describes an ML-based methodology for building an application that is capable of identifying and disseminating healthcare information. It extracts sentences from published medical papers that mention diseases and treatments, and identifies the semantic relations that exist between those diseases and treatments. First, it identifies sentences in published Medline abstracts that discuss diseases and treatments. Second, it focuses on three relations: Cure, Prevent, and Side Effect, a subset of the eight relations with which the corpus is annotated.
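The two tasks described above can be illustrated with a minimal Java sketch. It is not the ML-based method of this project: it is a plain lexical co-occurrence baseline of the kind mentioned earlier, which shows the shape of the pipeline (select sentences that mention both a disease and a treatment, then assign one of the three relations). The class name `RelationSketch`, the tiny word lists, and the cue words are all hypothetical and chosen only for illustration; a real system would replace them with a medical vocabulary and a trained classifier.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class RelationSketch {
    // The three relations named in the description, plus NONE for everything else.
    public enum Relation { CURE, PREVENT, SIDE_EFFECT, NONE }

    // Hypothetical toy lexicons (assumptions, not taken from the annotated corpus).
    static final Set<String> DISEASES = Set.of("malaria", "influenza", "migraine");
    static final Set<String> TREATMENTS = Set.of("chloroquine", "vaccine", "aspirin");

    // Hypothetical cue words per relation; EnumMap iterates in enum declaration order.
    static final Map<Relation, Set<String>> CUES = new EnumMap<>(Relation.class);
    static {
        CUES.put(Relation.CURE, Set.of("cures", "cured", "treats", "treated"));
        CUES.put(Relation.PREVENT, Set.of("prevents", "prevented", "prevention"));
        CUES.put(Relation.SIDE_EFFECT, Set.of("induced", "adverse", "toxicity"));
    }

    // Lowercase the sentence and split it into a set of alphabetic tokens.
    static Set<String> tokenize(String sentence) {
        String[] parts = sentence.toLowerCase(Locale.ROOT).split("[^a-z]+");
        return new HashSet<>(Arrays.asList(parts));
    }

    // Task 1: keep only sentences that mention both a disease and a treatment.
    public static boolean mentionsDiseaseAndTreatment(String sentence) {
        Set<String> tokens = tokenize(sentence);
        return !Collections.disjoint(tokens, DISEASES)
            && !Collections.disjoint(tokens, TREATMENTS);
    }

    // Task 2: assign a relation using the first cue set that matches a token.
    public static Relation classify(String sentence) {
        if (!mentionsDiseaseAndTreatment(sentence)) {
            return Relation.NONE;
        }
        Set<String> tokens = tokenize(sentence);
        for (Map.Entry<Relation, Set<String>> entry : CUES.entrySet()) {
            if (!Collections.disjoint(tokens, entry.getValue())) {
                return entry.getKey();
            }
        }
        return Relation.NONE;
    }

    public static void main(String[] args) {
        String[] sentences = {
            "Chloroquine cures malaria in most patients.",
            "The vaccine prevents influenza.",
            "Aspirin induced gastric bleeding in migraine patients.",
            "Malaria is common in the tropics."
        };
        for (String s : sentences) {
            System.out.println(classify(s) + " <- " + s);
        }
    }
}
```

A baseline like this makes the recall and precision trade-off concrete: any sentence containing a listed disease, a listed treatment, and a cue word is labelled, whether or not the cue actually links the two entities. That weakness is what the ML-based approach is meant to address.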
Tags: 2012, Data Mining Projects, Java


