We are thrilled to share the publication of From Zero to Hero: Human-In-The-Loop Entity Linking in Low Resource Domains by Jan-Christoph Klie, Richard Eckart de Castilho, and Iryna Gurevych. The project presents a Human-In-The-Loop approach to entity linking (EL) annotation that speeds up the annotation process and makes it less tedious.
From Zero to Hero worked with three datasets, including data from Women Writers Online (WWO). Documents from WWO have been annotated with named entities, and persons have been linked to create a personography. Candidate ranking was central to the project. The authors found that users preferred their approach and that it improved annotation speed by around 35%.
Abstract: Entity linking (EL) is concerned with disambiguating entity mentions in a text against knowledge bases (KB). It is crucial in a considerable number of fields like humanities, technical writing and biomedical sciences to enrich texts with semantics and discover more knowledge. The use of EL in such domains requires handling noisy texts, low resource settings and domain-specific KBs. Existing approaches are mostly inappropriate for this, as they depend on training data. However, in the above scenario, there exists hardly annotated data, and it needs to be created from scratch. We therefore present a novel domain-agnostic Human-In-The-Loop annotation approach: we use recommenders that suggest potential concepts and adaptive candidate ranking, thereby speeding up the overall annotation process and making it less tedious for users. We evaluate our ranking approach in a simulation on difficult texts and show that it greatly outperforms a strong baseline in ranking accuracy. In a user study, the annotation speed improves by 35 % compared to annotating without interactive support; users report that they strongly prefer our system. An open-source and ready-to-use implementation based on the text annotation platform INCEpTION is made available.
Klie presented the project at ACL 2020, the annual meeting of the Association for Computational Linguistics (ACL), which was held virtually. Watch his presentation here.
If you would like to use the XML files for the texts in Women Writers Online in your own research, please send an email with a brief description of your research plans to [email protected].