Archives and AI Part 1: Opportunities

Archives are knowledge-focused organizations, but they lag in adopting artificial intelligence (AI) and machine learning (ML) technology. This article outlines specific AI and ML use cases and opportunities for archives. My goal is to encourage archivists to think about the possibilities and technologists to consider new opportunities.

Many opportunities arise only when archival materials are in digital form. Every year, a larger portion of archival material is born digital, and physical materials can also be digitised. Digitisation by itself can let researchers access materials without a journey to the archive, but when it is coupled with AI tools like handwriting recognition or image analysis, the benefits really start to flow.

In this article, I’m focusing on the distinctive business processes of archives, which I take broadly to include national, regional, city, university, business, and cultural heritage institutions that preserve significant records for current and future use. For now, I’ll consider five core business processes:

  1. Appraisal and Acquisition: Determining which records to add to the collection based on their value and the archive’s mission, policy, and resources. This applies to gifts and loans as well.
  2. Arrangement and Description: Organizing and describing materials for findability and understanding. Arrangement follows archival principles like provenance and original order. Description creates finding aids, catalogue records, and other discovery tools.
  3. Preservation: Ensuring that holdings survive for the long term through proper storage, monitoring, and preventive measures. This includes disaster planning and recovery for both digital and physical materials.
  4. Access: Providing access through research guides, discovery tools, reference services, and digital access services.
  5. Outreach: Actively promoting collections and their value through exhibitions, educational programs, and publications.

Modern AI tools, such as large language models (LLMs) like Google’s Gemini, OpenAI’s ChatGPT, Anthropic’s Claude, or Meta’s Llama, can benefit every one of these processes. The first three are oriented primarily towards archivists. Depending on the organisation, access is often oriented towards researchers, while outreach is aimed at the wider public.

Appraisal and acquisition

Let’s consider two situations.

  • A national archive receives materials from a government department on a scheduled basis. One of the main tasks here is to ensure that the archive gets everything that it is supposed to; another is to ensure that it does not receive things that it shouldn’t. It is challenging for an archivist to keep up to date with a department’s major projects, or even its organisational structure. An AI system, however, could read departmental and government web sites to identify projects, organisational units, and key people, and then cross-check the transfer to confirm that the relevant material is included.
  • An archivist is considering whether to accept a private donation. She has expert staff, but she realizes that it will be years before they have time to organise and describe the content. That means it will be years before any researcher can use it, if ever, and if it can’t be used, it has little value. If the material is digital or digitised, an AI system could take on a first pass at the second core process, arrangement and description, by producing an initial description that would help archivists and researchers understand it better.
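To make the cross-check in the first situation a little more concrete, here is a minimal Python sketch. It assumes that an AI step has already extracted a list of project and unit names from departmental web sites, and that the transfer arrives with a manifest of record titles. The `Record` class, the `cross_check` function, and all example names are hypothetical, and matching is a deliberately naive case-insensitive substring test.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One entry in a (hypothetical) transfer manifest from a department."""
    ref: str
    title: str


def cross_check(expected: list[str], records: list[Record]) -> tuple[list[str], list[str]]:
    """Compare expected projects/units with the records actually transferred.

    Returns (missing, unmatched):
      missing   - expected names not mentioned in any record title
      unmatched - references of records that mention none of the expected names,
                  flagged for an archivist to check whether they belong
    Matching is a naive case-insensitive substring test.
    """
    titles = [(r.ref, r.title.lower()) for r in records]
    names = [n.lower() for n in expected]
    missing = [
        orig for orig, name in zip(expected, names)
        if not any(name in title for _, title in titles)
    ]
    unmatched = [
        ref for ref, title in titles
        if not any(name in title for name in names)
    ]
    return missing, unmatched


# Names an AI step might have extracted from departmental web sites (invented examples).
expected = ["Coastal Flood Review", "Digital Services Unit", "Rail Electrification"]
transfer = [
    Record("DEP/1/1", "Coastal Flood Review: minutes 2019"),
    Record("DEP/1/2", "Rail Electrification business case"),
    Record("DEP/1/3", "Staff canteen menu"),
]
missing, unmatched = cross_check(expected, transfer)
print(missing)    # ['Digital Services Unit']
print(unmatched)  # ['DEP/1/3']
```

In practice, names would need fuzzier matching (acronyms, renamed units, records described only by file number), and both lists are prompts for an archivist’s judgment rather than decisions.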

For me, this is one of the most important contributions. By reducing the cost of description, AI can reduce the cost of acquisition. I used to hate it when I worked at the British Library and we had to turn down fascinating donations – often the archive of an author, researcher, or artist – because we couldn’t afford to make them accessible. Even worse was when we could have raised money to cover the cost of digitisation but not enough for description.

I think that AI can also support the acquisition process itself, but those use cases are broadly shared with other organizations that negotiate over and price materials, such as auction houses, galleries, and many other businesses.

Arrangement and description

This is an area with huge potential – especially if material is in digital form or can be digitised. Current AI is very capable of summarising documents, identifying people, places, and projects, building up timelines, and many other tasks that lead to useful descriptions. It can even do a decent job of judging which people are key or which projects were important. There have also been major improvements over the last few years in handling images, audio, and video. This could turn material that was previously completely inaccessible, or very expensive to describe, into a full part of the archive.
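Once an AI system has extracted people and dated events from individual items, much of a draft collection-level description is simple aggregation. The Python sketch below assumes a hypothetical per-item extraction format (`ItemExtraction`) and shows how those results might be rolled up into a date range, a chronological timeline, and the most frequently mentioned people for an archivist to review. It does not perform the extraction itself, and the field names and sample data are invented for illustration.

```python
from collections import Counter
from datetime import date
from typing import TypedDict


class ItemExtraction(TypedDict):
    """Hypothetical output of an AI extraction step for one archival item."""
    ref: str                         # item reference, e.g. a catalogue number
    people: list[str]                # names mentioned in the item
    events: list[tuple[date, str]]   # dated events described in the item


def collection_summary(items: list[ItemExtraction], top_n: int = 3) -> dict:
    """Roll item-level extractions up into draft collection-level fields."""
    # Timeline of (date, event, item reference), sorted chronologically.
    timeline = sorted(
        (when, what, item["ref"]) for item in items for when, what in item["events"]
    )
    # Count how many items mention each person (repeats within one item count once).
    mentions = Counter(
        name for item in items for name in dict.fromkeys(item["people"])
    )
    date_range = (
        f"{timeline[0][0].year}-{timeline[-1][0].year}" if timeline else "undated"
    )
    return {
        "date_range": date_range,
        "timeline": timeline,
        "key_people": [name for name, _ in mentions.most_common(top_n)],
    }


items: list[ItemExtraction] = [
    {"ref": "A/1", "people": ["Ada Smith", "Tom Lee"],
     "events": [(date(1958, 3, 1), "Project proposal")]},
    {"ref": "A/2", "people": ["Ada Smith", "Ruth Khan"],
     "events": [(date(1952, 6, 9), "First sketches"),
                (date(1967, 1, 20), "Final exhibition")]},
    {"ref": "A/3", "people": ["Tom Lee", "Ada Smith", "Ada Smith"], "events": []},
]

summary = collection_summary(items, top_n=2)
print(summary["date_range"])  # 1952-1967
print(summary["key_people"])  # ['Ada Smith', 'Tom Lee']
```

Counting items that mention a person, rather than raw mentions, is one simple proxy for "key people"; it is a design choice for the sketch, not a rule, and an archivist would still decide what goes into the finding aid.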

Preservation

For now, off-the-shelf AI components may have less to contribute to this area. There are, however, many possibilities. For example, a drone could take images or video of (open) storage areas that could then be reviewed automatically to identify damage, rot, or signs of rodents. People working with obsolete software systems could consult an AI system that had absorbed detailed knowledge from the manuals, user forums, and YouTube videos and could show them how to accomplish their tasks. Photographs could be digitised and minor damage, like scratches, repaired.

Access

AI tools are absolutely revolutionising how researchers interrogate and interact with content. We are all doing it! Summarise this document! What is this project about? Explain this long email chain! Who are the key people involved? Tell me about this artist’s broader career! What was the broader social context for this work? Describe the photos in this folder! Who is speaking on this recording? Translate this to a language that I speak!

Researchers need a new generation of AI-enabled tools to work with and understand archival materials. This has the potential not only to accelerate their work, but also to enable them to use material that they would never have considered before because it was too much additional work to find, access, or understand. This could dramatically increase the real value of archival material and of archives themselves.

Of course, when a summary is created – just as when a researcher reads an item – there are opportunities for misunderstandings and unexpected biases to be introduced. In a later article, I’ll discuss how I like to think about the contributions that these sorts of tools make and point to some new research in this area.

Outreach

Archival collections can bring value to diverse, distributed communities. AI tools have the potential to lower the cost of people interacting with collections in ways that they find meaningful – in a language and style that is natural for them. Sometimes this is literally language: current LLMs can do a brilliant job of translating text from one language to another. But they are also good at paraphrasing in a way that makes text easier for someone to understand. For example, I can ask the system to “explain it like I’m five” or “paraphrase this without the math.” With historical English texts, I can ask for them to be paraphrased in modern English. All of these methods can introduce some errors or misunderstandings, but they also give people an enriched starting point.

Current LLMs also make it easier to embed archival items in a broader business, cultural, and historical context. People can have open-ended conversations that approximate the ones they might have with a curator. AI tools also open up collections to people who are not comfortable asking a person questions directly or coming into a traditional archive building. They never get tired of answering the same question or of explaining something in a new way.

Conclusion

My goal in this article was to explore AI and ML use cases for archives and help to build a bridge between archivists and technologists. Archivists need to develop a realistic vision of how AI can benefit them. Technologists need to understand more about what archives do so that they can develop helpful solutions.

The next article in this series will talk about some of the barriers to introducing AI technology in archives and how to address them. After that, we will look at the intersection of possibilities and technology and build up a timeline to guide thinking about implementations.
