Text Simplification: Automatic text simplification as an NLP task arose from the need to make electronic textual content equally accessible to everyone. It is a complex task encompassing a number of operations applied to a text at different linguistic levels. The aim is to turn a complex text into a simplified variant, taking into account the specific needs of a particular target user. Automatic text simplification has traditionally served a double purpose: it can act as a preprocessing tool for other NLP applications, and it can fulfil a social function, making content accessible to different users such as foreign language learners, readers with aphasia, and low-literacy individuals. The first attempts at text simplification were rule-based syntactic simplification systems; nowadays, however, with the availability of large parallel corpora such as the English Wikipedia and the Simple English Wikipedia, approaches to automatic text simplification have become more data-driven.
Text simplification is a very active research topic where progress is still needed. In this seminar I will provide the audience with a panorama of more than a decade of work in the area, also emphasizing the important social contribution that content simplification can make to the information society.
Text Summarization: A summary is a text with a very specific purpose: to give the reader a concise idea of the contents of another text. The idea of automatically producing summaries has a long history in the field of natural language processing; however, with the ever-growing amount of texts and messages available online in public or private networks, this research field has become, more than ever before, key for the information society.
The automatic generation of summaries or abstracts has been addressed from different angles, starting with seminal work in the late fifties. The applied techniques were first focused on the generation of sentence extracts, and several methods grounded in statistical techniques were proposed to assess the relevance of sentences in a document. In the eighties, symbolic Artificial Intelligence techniques, which considered summarization an instance of text understanding, focused on the production of abstracts. Hybrid techniques combining symbolic and statistical approaches, sometimes relying on machine learning, became popular with a renewed interest in summarization in the late nineties. Nowadays, with the availability of huge volumes of texts for training machine learning systems, several methods have emerged in the area of deep learning. In particular, neural networks today achieve state-of-the-art performance.
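The early statistical approach to sentence extraction mentioned above can be illustrated with a minimal sketch in the spirit of Luhn's frequency-based method: score each sentence by the frequency of its content words and keep the top-ranked sentences in document order. The stopword list and scoring scheme below are illustrative assumptions, not a faithful reimplementation of any particular system.

```python
# Minimal frequency-based extractive summarizer (Luhn-style sketch).
# The stopword list and normalization are illustrative assumptions.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "that", "this"}

def summarize(text, n_sentences=2):
    """Return the n highest-scoring sentences, in document order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Term frequencies over content words of the whole document.
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sent):
        # Average frequency of the sentence's content words.
        tokens = [w for w in re.findall(r"[a-z']+", sent.lower()) if w not in STOPWORDS]
        return sum(freq[t] for t in tokens) / (len(tokens) or 1)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:n_sentences])  # restore document order
    return " ".join(sentences[i] for i in chosen)
```

Such extractive scoring remains a useful baseline against which the symbolic, hybrid, and neural approaches discussed here are typically compared.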
Offering a historical perspective, I will go through relevant solutions in the area of text summarization, emphasizing the role of current machine learning systems. Likewise, I will describe evaluation methods, challenges, and resources available for system development.