Showing 1–5 of 5 results for author: Kettunen, K

  1. arXiv:2206.00369  [pdf]

    cs.CL

    Optical character recognition quality affects perceived usefulness of historical newspaper clippings

    Authors: Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen, Juha Rautiainen

    Abstract: Introduction. We study effect of different quality optical character recognition in interactive information retrieval with a collection of one digitized historical Finnish newspaper. Method. This study is based on the simulated interactive information retrieval work task model. Thirty-two users made searches to an article collection of Finnish newspaper Uusi Suometar 1869-1918 with ca. 1.45 millio…

    Submitted 1 June, 2022; originally announced June 2022.

    Comments: 21 pages, 6 figures, 2 tables, 1 appendix. arXiv admin note: substantial text overlap with arXiv:2203.03557

  2. arXiv:2203.03557  [pdf]

    cs.IR cs.CL cs.DL

    OCR quality affects perceived usefulness of historical newspaper clippings -- a user study

    Authors: Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen, Juha Rautiainen

    Abstract: Effects of Optical Character Recognition (OCR) quality on historical information retrieval have so far been studied in data-oriented scenarios regarding the effectiveness of retrieval results. Such studies have either focused on the effects of artificially degraded OCR quality (see, e.g., [1-2]) or utilized test collections containing texts based on authentic low quality OCR data (see, e.g., [3]).…

    Submitted 4 March, 2022; originally announced March 2022.

    Comments: IRCDL2022

    Journal ref: IRCDL 2022 Italian Research Conference on Digital Libraries 2022, http://ceur-ws.org/Vol-3160/

  3. arXiv:2002.12793  [pdf, ps, other]

    cs.PL

    Behavioural Types for Memory and Method Safety in a Core Object-Oriented Language

    Authors: Mario Bravetti, Adrian Francalanza, Iaroslav Golovanov, Hans Hüttel, Mathias Steen Jakobsen, Mikkel Klinke Kettunen, António Ravara

    Abstract: We present a type-based analysis ensuring memory safety and object protocol completion in the Java-like language Mungo. Objects are annotated with usages, typestates-like specifications of the admissible sequences of method calls. The analysis entwines usage checking, controlling the order in which methods are called, with a static check determining whether references may contain null values. The…

    Submitted 28 February, 2020; originally announced February 2020.

  4. arXiv:1611.05239  [pdf]

    cs.CL

    How to do lexical quality estimation of a large OCRed historical Finnish newspaper collection with scarce resources

    Authors: Kimmo Kettunen

    Abstract: The National Library of Finland has digitized the historical newspapers published in Finland between 1771 and 1910. This collection contains approximately 1.95 million pages in Finnish and Swedish. Finnish part of the collection consists of about 2.40 billion words. The National Library's Digital Collections are offered via the digi.kansalliskirjasto.fi web service, also known as Digi. Part of the…

    Submitted 17 October, 2019; v1 submitted 16 November, 2016; originally announced November 2016.

    Comments: 23 pages, 6 tables, 6 figures

  5. arXiv:1611.02839  [pdf]

    cs.CL

    Old Content and Modern Tools - Searching Named Entities in a Finnish OCRed Historical Newspaper Collection 1771-1910

    Authors: Kimmo Kettunen, Eetu Mäkelä, Teemu Ruokolainen, Juha Kuokkala, Laura Löfberg

    Abstract: Named Entity Recognition (NER), search, classification and tagging of names and name like frequent informational elements in texts, has become a standard information extraction procedure for textual data. NER has been applied to many types of texts and different types of entities: newspapers, fiction, historical records, persons, locations, chemical compounds, protein families, animals etc. In gen…

    Submitted 9 November, 2016; originally announced November 2016.

    Comments: 24 pages, 13 tables