15 Jun 2020 3:30pm - 4:30pm Online


Machine Reading the Archive – End of Programme Workshop 2019/20.

This end-of-programme workshop will showcase the digital archive projects created by our cohort of project participants, as well as an invited contribution from Federico Nanni, a leading expert in the field.

Federico Nanni is a Research Data Scientist at The Alan Turing Institute, working as part of the Research Engineering Group, and a visiting fellow at the School of Advanced Study, University of London. He completed a PhD in History of Technology and Digital Humanities at the University of Bologna focusing on the use of web archives in historical research and has been a post-doc in Computational Social Science at the Data and Web Science Group of the University of Mannheim. He also spent time as a visiting researcher at the Foundation Bruno Kessler and the University of New Hampshire, working on Natural Language Processing and Information Retrieval.

Federico will offer an overview of his research at the intersection of information retrieval and digital libraries. In particular, he will focus on approaches for the automatic creation of fine-grained collections from large-scale web archives, in order to support studies in digital humanities and computational social sciences concerning specific events, entities and topics.

Chair: Anne Alexander, CDH Learning Director

The introduction to each digital archive project created by our cohort can be found here.

Booking for this workshop is required.


About Machine Reading the Archive

Machine Reading the Archive aims to bring humanities researchers, archivists and computer scientists together to explore the challenges of working with archives in the digital age. Through a series of reading group sessions, practical workshops, technical demonstrations, field trips and a one-day end-of-programme workshop, we hope to seed new collaborations and encourage the exchange of ideas and practices across professions and disciplines. The programme is born out of a recognition that the practice of making, curating and using archives has been changed by the adoption of digital technologies, at both an institutional and individual level.

Archives and library special collections are developing new roles as platforms for different kinds of data, held in a variety of formats from XML to PDFs and TIFFs, rather than as physical containers for people, books and documents. Many researchers return from visits to the archive (or the archive’s website) having filled hard drives with collections of digital photographs of rare books, documents, manuscripts, maps, pictures and objects of scholarly interest whose fragility and immobility required the production of a digital copy. The digital archive thus seeds new private sub-collections on researchers’ laptops and tablets. At times these are a promising and overwhelmingly rich resource; at other times they remain invisible and inaccessible, even as they grow in scale and complexity over the trajectory of a scholarly life.

The primary aim of Machine Reading the Archive is to help participants develop a deeper understanding of the challenges and possibilities of working with archival data in the digital age, drawing on theory, methods and practice from the humanities, computer science and the archival profession. The programme provides a chance to develop skills to engage with existing digital archives in new ways, to turn a cluttered hard drive of archival photographs into a refined dataset, or to embark on text-mining to reveal new aspects of existing research or lay the groundwork for prospective projects. In addition to giving participants the chance to learn practical skills and experiment with digital methods using their own or provided datasets, the course is designed to prompt reflection on the significance of the ways private and institutional digital archives are sorted, structured and accessed, and to discuss how these insular knowledge infrastructures influence writing, thinking and the development of research projects.

Tel: +44 1223 766886
Email: enquiries@crassh.cam.ac.uk