
Showing 1–6 of 6 results for author: Lee, B C G

Searching in archive cs.
  1. arXiv:2207.02960  [pdf, other]

    cs.LG cs.DL

    The "Collections as ML Data" Checklist for Machine Learning & Cultural Heritage

    Authors: Benjamin Charles Germain Lee

    Abstract: Within the cultural heritage sector, there has been a growing and concerted effort to consider a critical sociotechnical lens when applying machine learning techniques to digital collections. Though the cultural heritage community has collectively developed an emerging body of work detailing responsible operations for machine learning in libraries and other cultural heritage institutions at the or…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: 32 pages

  2. arXiv:2112.02471  [pdf]

    cs.DL cs.IR

    Grappling with the Scale of Born-Digital Government Publications: Toward Pipelines for Processing and Searching Millions of PDFs

    Authors: Benjamin Charles Germain Lee, Trevor Owens

    Abstract: Official government publications are key sources for understanding the history of societies. Web publishing has fundamentally changed the scale and processes by which governments produce and disseminate information. Significantly, a range of web archiving programs have captured massive troves of government publications. For example, hundreds of millions of unique U.S. Government documents posted t…

    Submitted 4 December, 2021; originally announced December 2021.

    Comments: 22 pages, 4 figures

  3. arXiv:2109.01732  [pdf, other]

    cs.CV cs.DL cs.IR

    Navigating the Mise-en-Page: Interpretive Machine Learning Approaches to the Visual Layouts of Multi-Ethnic Periodicals

    Authors: Benjamin Charles Germain Lee, Joshua Ortiz Baco, Sarah H. Salter, Jim Casey

    Abstract: This paper presents a computational method of analysis that draws from machine learning, library science, and literary studies to map the visual layouts of multi-ethnic newspapers from the late 19th and early 20th century United States. This work departs from prior approaches to newspapers that focus on individual pieces of textual and visual content. Our method combines Chronicling America's MARC…

    Submitted 3 September, 2021; originally announced September 2021.

    Comments: 13 pages, 4 figures

  4. arXiv:2103.15348  [pdf, other]

    cs.CV cs.AI

    LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis

    Authors: Zejiang Shen, Ruochen Zhang, Melissa Dell, Benjamin Charles Germain Lee, Jacob Carlson, Weining Li

    Abstract: Recent advances in document image analysis (DIA) have been primarily driven by the application of neural networks. Ideally, research outcomes could be easily deployed in production and extended for further investigation. However, various factors like loosely organized codebases and sophisticated model configurations complicate the easy reuse of important innovations by a wide audience. Though ther…

    Submitted 21 June, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: Accepted at ICDAR 2021, 16 pages, 6 figures, 2 tables

  5. arXiv:2005.01583  [pdf, other]

    cs.IR cs.CV cs.LG

    The Newspaper Navigator Dataset: Extracting And Analyzing Visual Content from 16 Million Historic Newspaper Pages in Chronicling America

    Authors: Benjamin Charles Germain Lee, Jaime Mears, Eileen Jakeway, Meghan Ferriter, Chris Adams, Nathan Yarasavage, Deborah Thomas, Kate Zwaard, Daniel S. Weld

    Abstract: Chronicling America is a product of the National Digital Newspaper Program, a partnership between the Library of Congress and the National Endowment for the Humanities to digitize historic newspapers. Over 16 million pages of historic American newspapers have been digitized for Chronicling America to date, complete with high-resolution images and machine-readable METS/ALTO OCR. Of considerable int…

    Submitted 4 May, 2020; originally announced May 2020.

    Comments: 14 pages, 5 figures

  6. arXiv:2003.04315  [pdf, ps, other]

    cs.IR cs.LG stat.ML

    LIMEADE: From AI Explanations to Advice Taking

    Authors: Benjamin Charles Germain Lee, Doug Downey, Kyle Lo, Daniel S. Weld

    Abstract: Research in human-centered AI has shown the benefits of systems that can explain their predictions. Methods that allow an AI to take advice from humans in response to explanations are similarly useful. While both capabilities are well-developed for transparent learning models (e.g., linear models and GA²Ms), and recent techniques (e.g., LIME and SHAP) can generate explanations for opaque models…

    Submitted 17 January, 2023; v1 submitted 9 March, 2020; originally announced March 2020.

    Comments: 18 pages, 7 figures