MULTIMEDICA: Multilingual Information Extraction in Health domain and application to scientific and informative documents

Reference
TIN2010-20644-C03-01

The aim of this project is to define and develop information extraction and retrieval techniques for texts from the medical domain. The work follows two main lines: first, processing English-language scientific documents on pharmacology; and second, processing informative texts on health topics in other languages, such as Spanish and Arabic. The information extraction techniques include domain entity recognition, pattern recognition, machine learning for extracting semantic relations, and the integration of lexical resources specific to the health domain (UMLS, SNOMED, etc.) to improve these applications. In addition, the extracted information will be used to enrich information retrieval tools. To demonstrate the feasibility of the proposed techniques, three information search prototypes will be built. The first is an application for pharmacists that extracts knowledge about drug-drug interactions from scientific publications. The second is a search tool for the general public and patients to find information about illnesses and medicines. The third will use the terminology extracted from the Spanish-Arabic parallel corpus to support terminology teaching in the biomedical domain.

Year
-
Funding bodies
Plan Nacional de I+D, Ministerio de Ciencia e Innovación
Status
Active
Type
Public
Principal investigator