HULAT, CSIC and the Vienna University of Technology present a key study for the detection of biomedical texts generated by AI

Leonardo Campillo presenting MedAID-ML at CLEF

On September 11, the HULAT (Human Language and Accessibility Technologies) research group, in collaboration with the Spanish National Research Council (CSIC) and TU Wien (Vienna University of Technology), presented the paper “MedAID-ML: A Multilingual Dataset of Biomedical Texts for Detecting AI-Generated Content” at the international conference CLEF (Conference and Labs of the Evaluation Forum).

MedAID-ML is a new dataset designed to advance the detection of AI-generated texts in the biomedical domain. It brings together original human-written documents, collected from public and authorized sources, alongside texts produced by some of today’s most powerful language models: Mistral-7b, Llama-3.1-70b, and GPT-4o. It also includes parallel, carefully revised translations in four languages (English, German, Spanish, and French), making it a valuable multilingual resource. With more than 13,000 documents and nearly 3.8 million words, MedAID-ML stands out as a key tool to foster research in this emerging field.

In this work, the researchers explored different language model architectures for the automatic detection of AI-generated texts, achieving very promising results. They also applied explainability methods that provide valuable insights into the linguistic patterns and features most common in AI-generated content.

One of the study’s most striking findings comes from the human evaluation. The researchers asked a group of experts to determine whether a text had been written by a human or by AI. The results were clear: the experts reached an accuracy of only about 50%, roughly what random guessing would achieve on a two-way choice, so the task proved extremely difficult even for specialists. This underscores the urgent need for robust automatic detection tools.
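To make the comparison with chance concrete, here is a minimal Python sketch of how accuracy on a human-versus-AI judgment task can be computed and checked against random guessing. It is an illustration only, not the evaluation procedure used in the paper: the function names `accuracy` and `p_value_vs_chance` are hypothetical, and the labels and judgments are made-up toy values, not data from the study.

```python
from math import comb


def accuracy(predictions, labels):
    """Fraction of judgments that match the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one judgment is required")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def p_value_vs_chance(n_correct, n_total, chance=0.5):
    """One-sided binomial tail: probability of getting at least
    n_correct judgments right out of n_total by guessing alone."""
    return sum(
        comb(n_total, k) * chance**k * (1 - chance) ** (n_total - k)
        for k in range(n_correct, n_total + 1)
    )


# Hypothetical toy judgments (NOT data from the study).
labels = ["human", "ai", "ai", "human", "ai", "human", "human", "ai"]
predictions = ["human", "human", "ai", "ai", "ai", "ai", "human", "human"]

acc = accuracy(predictions, labels)
n_correct = sum(p == y for p, y in zip(predictions, labels))
p = p_value_vs_chance(n_correct, len(labels))
print(f"accuracy = {acc:.2f}, p-value vs. chance = {p:.3f}")
```

When accuracy sits near 0.5 and the p-value is large, as in this toy example, the judgments cannot be distinguished from coin flips. That is the sense in which an accuracy of about 50% signals that the task is very hard.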

This work represents a crucial step towards ensuring that medical information is reliable and transparent, a fundamental requirement for both patient safety and the integrity of scientific research.

The presentation at CLEF, a global benchmark in the evaluation of information retrieval systems, marks a significant milestone for the HULAT group and its collaborators. The project strengthens their leadership in the field of natural language processing and highlights the importance of international collaboration in addressing the challenges of artificial intelligence.