AI-Powered Entity Extraction for Archive Catalogue Enhancement

The research resulting in this technical report was one of two projects funded by the Archives and Records Association (ARA) to investigate AI (and other emerging technologies) and their application within the recordkeeping sector:

AI-Powered Entity Extraction for Archive Catalogue Enhancement: Analysis for LLM-Based Named Entity Recognition in Legacy Catalogues

David Boardman, Alexandrina Buchanan and Rosa Methol

D.boardman@liverpool.ac.uk

This technical report provides a detailed analysis of the methodologies, technologies and implementations used in a digital archives project focused on extracting and analysing named entities from legacy catalogues of archival material related to the transatlantic trade of enslaved African people. The project tests the feasibility of using Large Language Models (LLMs) for Named Entity Recognition (NER) and evaluates them comprehensively across different model architectures. It concludes that LLMs show potential in this area but are not yet adequate for identifying the elements needed to convert legacy catalogues into Records in Contexts (RiC)-ready finding aids. Whilst this is disappointing, the research has identified useful directions for future work. Our findings will inform further research, including a forthcoming project to convert a database dump of surrogate archive material from the Ministry of Defence into a searchable archive.

This report and the underpinning research were funded by the Archives and Records Association’s Research, Development and Advocacy Fund. The authors would like to record their gratitude to the ARA and its Board for supporting the project.