TRENDMINER: Large-scale Cross-lingual Trend Mining of Real-time Media Streams

The recent massive growth in online media and the rise of user-authored content (e.g. weblogs, Twitter, Facebook) have led to challenges in how to access and interpret these strongly multilingual data in a timely, efficient, and affordable manner. Scientifically, streaming online media pose new challenges due to their shorter, noisier, and more colloquial nature. Moreover, they form a temporal stream strongly grounded in events and context. Consequently, existing language technologies fall short on accuracy, scalability and portability. The goal of this project is to deliver.

MULTIMEDICA: Multilingual Information Extraction in Health domain and application to scientific and informative documents

The aim of this project is to define and develop information extraction and retrieval techniques based on texts from the medical domain. The work is organized into two basic tasks: first, processing scientific documents in English about pharmacology, and second, processing informative texts about health topics in other languages such as Spanish and Arabic.

BRAVO: Multimodal and Multilingual Advanced Answers Search

BRAVO is devoted to research on technologies that improve answer search in both text and voice. Its main result is a platform for a modular answer search system that makes it possible to measure the improvement contributed by different techniques for question classification, answer extraction, passage retrieval, etc. SPINDEL, one of the techniques developed in this project, is a language-independent entity recognizer that applies machine learning based on bootstrapping.