Julián Moreno Schneider (Projects)

TRENDMINER: Large-scale Cross-lingual Trend Mining of Real-time media streams

The recent massive growth in online media and the rise of user-authored content (e.g. weblogs, Twitter, Facebook) have led to challenges in accessing and interpreting these strongly multilingual data in a timely, efficient, and affordable manner. Scientifically, streaming online media pose new challenges due to their shorter, noisier, and more colloquial nature. Moreover, they form a temporal stream strongly grounded in events and context. Consequently, existing language technologies fall short on accuracy, scalability and portability. The goal of this project is to deliver innovative, portable, open-source, real-time methods for cross-lingual mining and summarisation of large-scale stream media. TrendMiner will achieve this through an interdisciplinary approach, combining deep linguistic methods from text processing, knowledge-based reasoning from web science, machine learning, economics, and political science. No expensive human-annotated data will be required, due to our use of time-series data (e.g. financial markets, political polls) as a proxy. A key novelty will be weakly supervised machine learning algorithms for automatic discovery of new trends and correlations. Scalability and affordability will be addressed through a cloud-based infrastructure for real-time text mining from stream media. Results will be validated in two high-profile case studies: financial decision support (with analysts, traders, regulators, and economists) and political analysis and monitoring (with politicians, economists, and political journalists). The techniques will be generic, with many business applications: business intelligence, customer relations management, community support. The project will also benefit society and ordinary citizens by enabling enhanced access to government data archives, summarisation of online health information, and tracking of hot societal issues.

Reference: FP7-ICT 287863
Financing: European Commission
Project type: Public
State: Active
Principal investigator: Paloma Martínez Fernández
Other investigator: Paloma Martínez Fernández, Lourdes Moreno, Isabel Segura Bedmar, Julián Moreno Schneider, María González García, María Herrero Zazo, Ricardo Revert Arenaz
Duration: 2013 - 2014
https://cordis.europa.eu/project/id/287863

MULTIMEDICA: Multilingual Information Extraction in Health domain and application to scientific and informative documents

The aim of this project is to define and develop information extraction and retrieval techniques based on texts from the medical domain. This will be carried out through two basic tasks: first, processing scientific documents in English about pharmacology, and second, processing informative texts about health topics in other languages such as Spanish and Arabic. These information extraction techniques include domain entity recognition, pattern recognition, machine learning for extracting semantic relations, and the integration of lexical resources specific to the public health system (UMLS, SNOMED and so on) in order to improve applications. In addition, the information extracted in the processing task must be used to enrich the information retrieval tools. Thus, three information search prototypes will be created to show the feasibility of the proposed techniques. The first is an application aimed at pharmacists for extracting knowledge about drug-drug interactions from scientific publications. The second prototype will be a tool aimed at the general public or patients for searching for information about illnesses and medicines. The third will use the terminology extracted from the Spanish-Arabic parallel corpus to aid terminology teaching in the biomedical domain.

Reference: TIN2010-20644-C03-01
Financing: Plan Nacional de I+D, Ministerio de Ciencia e Innovación
Project type: Public
State: Active
Principal investigator: Paloma Martínez Fernández
Other investigator: Paloma Martínez Fernández, Lourdes Moreno, Elena Castro Galán, Ana M. Iglesias Maqueda, Isabel Segura Bedmar, María Teresa Vicente-Díez, José Luis Martínez Fernández, Julián Moreno Schneider, Daniel Sánchez Cisneros, María Herrero Zazo
Duration: 2011 - 2013
http://labda.inf.uc3m.es/multimedica/

BRAVO: Multimodal and Multilingual Advanced Answers Search

BRAVO is devoted to research on technologies that improve answer search in both text and voice. Its main result is a platform for a modular answer search system that makes it possible to measure the improvement contributed by different techniques for question classification, answer extraction, passage retrieval, etc. SPINDEL, one of the techniques developed in this project, is a language-independent entity recognizer that applies machine learning based on bootstrapping. Within the BRAVO project, one of the current research areas concerns locating drug names and the interactions between them in the medical literature using UMLS, dictionaries and the USAN drug naming rules. As a result, a corpus is available that was automatically annotated with generic drug names and other biomedical concepts using the DrugNer system (developed by the Advanced Databases Group) and manually evaluated by a pharmacological expert. The system combines information obtained by the UMLS MetaMap Transfer (MMTx) program with nomenclature rules recommended by the World Health Organization (WHO) International Nonproprietary Names (INNs) Program to identify and classify pharmaceutical substances.

Reference: TIN2007-67407-C03-01
Financing:
Project type: Public
State: Active
Principal investigator: Paloma Martínez Fernández
Other investigator: Lourdes Moreno, Elena Castro Galán, Ana M. Iglesias Maqueda, César De Pablo Sánchez, Isabel Segura Bedmar, María Teresa Vicente-Díez, José Luis Martínez Fernández, Belén Ruiz-Mezcua, Julián Moreno Schneider, Mario Crespo
Duration: 2007 - 2010