Introduction to OCR: tools for turning PDFs into machine-readable data

23 January 2018, 11:00 - 12:30

S3, Alison Richard Building, Sidgwick Site

Optical character recognition (OCR) is a term used to describe techniques for converting images of printed or handwritten text into a format that can be searched and analysed computationally. Despite recent advances in the technology, the OCR tools available to researchers are not always as accurate as one might hope, and they cannot handle handwritten text without a significant investment of time and a large body of source material written in the same hand. Nevertheless, several computational tools can be applied to images and PDFs to enable text mining and to make scanned documents more searchable. This workshop will introduce several such tools, along with some practical techniques for using them, and will also highlight the OCR and related services offered by the Digital Content Unit at Cambridge University Library.

For more information and to register for a place, please use the online registration link.