Automatic Text Recognition: diving into the background

27 Mar 2018

11:00am - 1:00pm

S1, Alison Richard Building

Description

Speaker: Professor Roger Labahn (University of Rostock, READ project)

Automatic Text Recognition (ATR) is increasingly becoming an essential core component of application software in Digital Humanities. After years of applying “classical” OCR (Optical Character Recognition) to printed texts, we are now seeing impressive results from applying ATR to demanding handwritten texts. The shift from OCR to ATR requires, however, the development of entirely new paradigms in algorithms and technology: rather than processing single characters, entire sequences have to be considered, e.g. words, lines, whole text cells or paragraph blocks, and even entire pages. Moreover, beyond traditional full-text reading, use cases such as Keyword Searching / Spotting (KWS) and Advanced Text Investigation (ATI) are receiving increasing attention in both the technology and the application domains.

This presentation will offer a general survey of the foundations of these new approaches and explain in more detail selected basic algorithms of contemporary ATR technology, focussing on Machine Learning, mainly with Recurrent Neural Networks. We will also explore fundamental decoding ideas, i.e. how to move from the network's 'magic' output to meaningful recognition results, and provide a more detailed introduction to the technological background needed to use advanced KWS software successfully. Finally, we will demonstrate realistic application examples from and with the Transkribus platform.

Spaces are limited and should be booked in advance via the online registration link on this page.

A sandwich lunch will be provided – please email Michelle Maciejewska (mm405 @ cam.ac.uk) by 19 March if you have any specific dietary requirements. If you book a ticket and find you can no longer attend, please cancel through Eventbrite so that we can cater accurately.