Our Vision

There Are Undiscovered Stories in the Words We Publish

Illustration by Yoshi Sodeoka

Natural language processing (N.L.P.), one of the most active fields within artificial intelligence, seeks to help machines understand human language. Advancements in N.L.P. now power everything from voice assistants and automatic voice transcription to intelligent text analysis and new tools that can generate text. R&D is interested in the potential these same underlying breakthroughs may also hold for journalism.

We recently published work using N.L.P. models for reader question-and-answer tools that might help our readers get quick access to trusted information in a search or assistant-like experience. We also shared our thoughts on how researchers should evaluate language models for bias before using them in production. We have several other prototype projects underway that use natural language processing techniques and will share more soon. We believe we are only at the beginning of the possibilities N.L.P. may bring to journalism.

The following is an incomplete list (in no particular order) of other promising N.L.P. use cases within journalism that we are thinking about. If you are working in this field and have research that you believe could be applied in service of journalism, we’re eager to hear from you.

Possible N.L.P. Use Cases for Journalism

  1. News Q&A. Journalists work to help people understand the world, often by answering questions from readers. We’re interested in techniques to understand reader questions, provide trusted answers and deliver the answers where readers are, whether on the web, mobile, voice assistants or messaging platforms.
  2. Content organization and discovery. The Times offers millions of articles covering a huge array of topics in our 160+ year archive of NYT content. We’re interested in how we might use N.L.P. techniques to organize content more granularly, especially evergreen content, and to develop more robust systems to help with search, personalization, recommendation algorithms and more.
  3. Machine translation. The truth knows no language, and ensuring equal access to journalism regardless of language is an important yet unrealized goal. Today, for instance, we translate only a small subset of our articles into Spanish. We’re interested in advancements in machine translation that might allow us to provide all of our news articles in other languages while maintaining tone and factual accuracy on par with human translations.
  4. Topic explainers. Journalists regularly cover complex topics and provide readers with explainers along the way. We’re interested in how we might enhance the reader experience by surfacing past reporting on the topics, concepts or entities an article mentions, to help readers better understand the subjects at hand.
  5. News timelines. Journalists cover events as they evolve over time. We’re interested in techniques that would allow us to extract significant moments of coverage from our 160+ year archive of NYT content. How might we give readers new ways to understand the news over time? How might we order coverage based on when events happened instead of when an article was published?
  6. Article summarization. Readers are consuming journalism in increasingly diverse ways. We’re interested in techniques for accurately summarizing articles without changing their meaning. How might we provide a version of an article for readers on the go, listening on their smartphones or in their cars? How might we summarize updates across topics or fields that a reader is interested in following? How might we make an article more accessible for readers with disabilities?
  7. Article archive text transcription. The Times archive contains over 160 years of articles as images. We’re looking at ways we can achieve human-level performance with text transcriptions from scanned images of documents, including our own archive. How might we use state-of-the-art O.C.R., image classification models, entity extraction, language models and more to unlock the text from millions of archival articles? How might we make any such tooling available to others to unlock the knowledge in their archives?
  8. Understanding potential bias in N.L.P. models. As more organizations incorporate N.L.P. techniques into their work, we hope to extend our early research on bias detection by researching new ways to detect potential bias in models, techniques for mitigating bias and education efforts to help users understand the possible pitfalls of using language models.

Read more about our vision: Journalists Should Be Able to Know Everything Computers Can See

By R&D
