Article "A clinical narrative corpus on nut allergy: annotation schema, guidelines and use case" | HULAT | Human Language & Accessibility Technologies Group

The article is available here, and the dataset is available in the Zenodo repository here. The article describes a dataset on nut allergy extracted from Spanish clinical records provided by the Hospital Universitario Fundación Alcorcón (HUFA) in Madrid, Spain, in collaboration with its Allergology Unit and its Information Systems and Technologies Department.

Publicly available clinical texts in Spanish are scarce, and more are needed as a resource for training and testing information extraction systems. In total, 828 clinical notes in Spanish were used. Several experts took part in the annotation process, classifying the annotated entities into medical semantic groups related to allergies. To evaluate inter-annotator agreement, 8% of the texts were annotated in triplicate.
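As an illustration of how agreement among three annotators can be quantified, the sketch below computes Fleiss' kappa over token-level labels. This is an assumption made for illustration only: the text above does not say which agreement measure was used for the corpus, and the labels `Allergen`, `Symptom` and `O`, as well as the function name `fleiss_kappa`, are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for items that were each labelled by the same number of raters.

    ratings: list of lists, one inner list of labels per item (e.g. per token).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    label_totals = Counter()
    per_item_agreement = []
    for item in ratings:
        counts = Counter(item)
        label_totals.update(counts)
        # Proportion of rater pairs that agree on this item.
        agreeing_pairs = sum(c * (c - 1) for c in counts.values())
        per_item_agreement.append(agreeing_pairs / (n_raters * (n_raters - 1)))
    observed = sum(per_item_agreement) / n_items
    # Agreement expected by chance, from the overall label distribution.
    expected = sum(
        (total / (n_items * n_raters)) ** 2 for total in label_totals.values()
    )
    if expected == 1.0:
        return 1.0  # only one label was ever used, so agreement is trivial
    return (observed - expected) / (1 - expected)


# Hypothetical token-level labels from three annotators (one inner list per token).
tokens = [
    ["Allergen", "Allergen", "Allergen"],
    ["O", "O", "O"],
    ["Symptom", "Symptom", "O"],
    ["O", "O", "O"],
]
print(round(fleiss_kappa(tokens), 3))
```

Values close to 1 indicate near-perfect agreement, and values close to 0 indicate agreement no better than chance.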

The guidelines followed to create the corpus are also provided. To validate the corpus and present a real use case, we ran experiments using it in a supervised named entity recognition (NER) task, fine-tuning encoder-based transformers. These experiments achieved an average F-measure of 86.2%. The results indicate that the corpus is suitable for training and testing NER approaches in the field of allergology.
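For readers unfamiliar with the F-measure, the sketch below shows one common way to compute it for NER: an exact-match comparison of predicted entity spans against gold spans. It is a minimal illustration, not the evaluation procedure used in the article. The span format `(start, end, label)`, the label names and the function name `entity_prf` are assumptions.

```python
def entity_prf(gold, predicted):
    """Exact-match precision, recall and F1 over (start, end, label) entity spans."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold and predicted spans for one clinical note.
gold = [(0, 2, "Allergen"), (5, 6, "Symptom"), (9, 10, "Allergen")]
predicted = [(0, 2, "Allergen"), (5, 6, "Allergen")]  # second span has the wrong label
precision, recall, f1 = entity_prf(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

An average F-measure such as the one reported above is typically obtained by averaging scores like these across entity types or across models, depending on the evaluation setup.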