Ben Lambert

Cambridge, Massachusetts, United States


Experience

  • Nightfall AI

    Boston, Massachusetts, United States

  • -

    Greater Boston Area

  • -

    Greater Boston Area

  • -

    Cambridge, MA

  • -

    Pittsburgh, PA

  • -

    Urbana-Champaign, Illinois

  • -

    Greater Boston Area

  • -

    Greater Boston Area

Education

  • Carnegie Mellon University

    Coursework in language technologies, including natural language processing, AI, knowledge representation, and machine learning.

    Adviser: Scott Fahlman

  • University of Illinois at Urbana-Champaign

    Course- and research-based degree. Research focused on machine learning and on using linguistic information for information retrieval (search engines). Additional experience teaching introductory computer science courses and working in corporate relations and fund-raising.

    Member of the Cognitive Computation research group with Dan Roth.

  • University of Massachusetts Amherst

    Magna cum laude, Commonwealth College honors, departmental honors. Bachelor of Science in Computer Science, with a minor in Mathematics.

    Undergraduate honors thesis on relation extraction from text with Andrew McCallum.

Publications

  • Discriminatively Trained Dependency Language Modeling for Conversational Speech Recognition

    Interspeech

    We present a discriminatively trained dependency-parser-based language model. The model operates on utterances, rather than words, and so can utilize long-distance structural features of each sentence. We train the model discriminatively on n-best lists, using the perceptron algorithm to tune the model weights. Our features include standard n-gram-style features, long-distance co-occurrence features, and syntactic structural features. We evaluate this model by re-ranking n-best lists of recognized speech from the Fisher dataset of informal telephone conversations. We compare various combinations of feature types, as well as methods of training the model.
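
    A minimal sketch of the kind of perceptron-trained n-best re-ranker described above; the feature map here (word unigrams and bigrams only) and the training data layout are illustrative stand-ins, not the paper's actual feature set.

        # Illustrative structured-perceptron re-ranker for ASR n-best lists.
        # The paper's features also include long-distance co-occurrence and
        # dependency-structure features; this toy version uses word n-grams only.
        from collections import defaultdict

        def features(hypothesis):
            """Toy feature map: unigram and bigram counts over the word sequence."""
            feats = defaultdict(float)
            words = hypothesis.split()
            for w in words:
                feats[("uni", w)] += 1.0
            for a, b in zip(words, words[1:]):
                feats[("bi", a, b)] += 1.0
            return feats

        def score(weights, hypothesis):
            return sum(weights[f] * v for f, v in features(hypothesis).items())

        def train_reranker(nbest_lists, oracles, epochs=5):
            """nbest_lists: list of n-best hypothesis lists; oracles: the
            lowest-error hypothesis in each list."""
            weights = defaultdict(float)
            for _ in range(epochs):
                for nbest, oracle in zip(nbest_lists, oracles):
                    best = max(nbest, key=lambda h: score(weights, h))
                    if best != oracle:
                        # Perceptron update toward the oracle hypothesis.
                        for f, v in features(oracle).items():
                            weights[f] += v
                        for f, v in features(best).items():
                            weights[f] -= v
            return weights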

  • Creating a linguistic plausibility dataset with non-expert annotators

    Interspeech

    We describe the creation of a linguistic plausibility dataset that contains annotated examples of language judged to be linguistically plausible, implausible, and everything in between. To create the dataset, we randomly generate sentences and have them annotated by crowdsourcing on Amazon Mechanical Turk. Obtaining inter-annotator agreement is a difficult problem because linguistic plausibility is highly subjective. The annotations obtained depend, among other factors, on the manner in which annotators are questioned about the plausibility of sentences. We describe our experiments on posing a number of different questions to the annotators, in order to elicit the responses with the greatest agreement, and present several methods for analyzing the resulting responses. The generated dataset and annotations are being made available to the public.
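
    One standard way to quantify the inter-annotator agreement discussed above is Fleiss' kappa; the self-contained sketch below uses invented rating counts and a made-up three-category scheme, not the dataset's actual annotations.

        # Fleiss' kappa over crowdsourced plausibility ratings.
        # ratings[i][j] = number of annotators assigning item i to category j.
        def fleiss_kappa(ratings):
            n_items = len(ratings)
            n_raters = sum(ratings[0])       # assumes equal raters per item
            n_categories = len(ratings[0])

            # Proportion of all assignments falling into each category.
            p_cat = [sum(row[j] for row in ratings) / (n_items * n_raters)
                     for j in range(n_categories)]

            # Mean per-item agreement.
            p_bar = sum((sum(c * c for c in row) - n_raters) /
                        (n_raters * (n_raters - 1)) for row in ratings) / n_items

            # Agreement expected by chance.
            p_e = sum(p * p for p in p_cat)
            return (p_bar - p_e) / (1 - p_e)

        # 4 sentences, 5 annotators, categories: implausible / neutral / plausible.
        print(fleiss_kappa([[5, 0, 0], [0, 2, 3], [1, 4, 0], [0, 0, 5]]))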

  • The use of sense in unsupervised training of acoustic models for HMM-based ASR systems

    Interspeech

    In unsupervised training of ASR systems, no annotated data are assumed to exist. Word-level annotations for training audio are generated iteratively using an ASR system. At each iteration, a subset of the data judged as having the most reliable transcriptions is selected to train the next set of acoustic models. Data selection, however, remains a difficult problem, particularly when the error rate of the recognizer providing the initial annotation is very high. In this paper we propose an iterative algorithm that uses a combination of likelihoods and a simple model of sense to select data. We show that the algorithm is effective for unsupervised training of acoustic models, particularly when the initial annotation is highly erroneous. Experiments conducted on Fisher-1 data, using initial models from Switchboard and a vocabulary and LM derived from the Google N-grams, show that performance on a held-out test set selected from the Fisher data improves more across iterations than with likelihood-based data selection.
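
    As an illustration of the likelihood-based selection baseline that the paper improves on (the full algorithm additionally folds in a simple model of sense), a minimal sketch of one selection step; the utterance IDs, hypotheses, and scores are made up.

        # Likelihood-based data selection for one iteration of unsupervised
        # acoustic-model training: keep the utterances whose automatic
        # transcriptions look most reliable. Decoding and retraining are
        # outside this sketch.
        def select_most_reliable(decoded_utterances, keep_fraction=0.3):
            """decoded_utterances: dicts with 'utt_id', 'hypothesis', and a
            normalized acoustic 'likelihood' from the current recognizer."""
            ranked = sorted(decoded_utterances,
                            key=lambda d: d["likelihood"], reverse=True)
            return ranked[: max(1, int(len(ranked) * keep_fraction))]

        decoded = [
            {"utt_id": "fsh_001", "hypothesis": "hello how are you", "likelihood": -2.1},
            {"utt_id": "fsh_002", "hypothesis": "i um well maybe", "likelihood": -7.8},
            {"utt_id": "fsh_003", "hypothesis": "see you tomorrow", "likelihood": -1.4},
        ]
        print([d["utt_id"] for d in select_most_reliable(decoded, keep_fraction=0.67)])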

  • A Knowledge-Based Architecture for using Semantics in Automatic Speech Recognition

    Ph.D. Thesis Proposal, Carnegie Mellon University

    Human languages allow us to express infinitely many ideas and expressions, in phenomenally creative ways. The range and variety of expressivity in human language poses a significant challenge to automatic speech recognition. To automatically recognize speech, it is necessary to build computational models of speech, to model both the acoustics and the language. These are typically modeled separately—the acoustics are modeled using signal processing and pattern recognition techniques, whereas the language is typically modeled with a statistical language model. However, current statistical language models lack the ability to incorporate semantic and pragmatic knowledge. I propose to develop and demonstrate a framework for incorporating semantics of spoken language, as represented in a symbolic knowledge base, into automatic speech recognition (ASR) systems.

  • Knowledge-Driven Learning and Discovery

    AAAI

    The goal of our current research is machine learning with the help and guidance of a knowledge base (KB). Rather than learning numerical models, our approach generates explicit symbolic hypotheses. These hypotheses are subject to the constraints of the KB and are easily human-readable and verifiable. Toward this end, we have implemented algorithms that hypothesize new relations and new types of entities in a KB by examining structural regularities in the KB that represent implicit knowledge. We evaluate these algorithms on a publications KB and a zoology KB.
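
    A toy illustration of the kind of structural regularity such algorithms exploit: entities that share an identical set of relations can be grouped as candidates for a hypothesized new type. The relation triples below are invented and are not drawn from the publications or zoology KBs evaluated in the paper.

        # Group KB entities by their "relation signature" (the set of relations
        # they appear in as a subject); groups with a shared signature are
        # candidates for a new, as-yet-unnamed entity type.
        from collections import defaultdict

        triples = [
            ("sparrow", "eats", "seeds"), ("sparrow", "lays", "eggs"),
            ("robin", "eats", "worms"), ("robin", "lays", "eggs"),
            ("trout", "eats", "insects"), ("trout", "lays", "eggs"),
            ("dog", "eats", "meat"),
        ]

        signatures = defaultdict(set)
        for subj, rel, _ in triples:
            signatures[subj].add(rel)

        groups = defaultdict(list)
        for entity, sig in signatures.items():
            groups[frozenset(sig)].append(entity)

        for sig, entities in groups.items():
            if len(entities) > 1:
                print(f"candidate new type for {entities}: shared relations {sorted(sig)}")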

  • Improving Information Retrieval with Natural Language Processing

    Master's thesis, University of Illinois at Urbana-Champaign

    The goal of this research is to transform a text corpus into a database rich in linguistic information and robust across multifarious applications. Computational natural language processing liberates the syntax and semantics of natural language to convert a plain text corpus into a rich database that stores these implicit linguistic components. Information retrieval techniques index the otherwise implicit data that is encoded by the language and its syntax. We use Lemur and the Indri query language to index this linguistic information. The resulting index supports numerous applications that can utilize the additional linguistic information. For one application, searching a document corpus with natural language queries, we show that syntactic and semantic information improve retrieval performance.
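
    The thesis builds its index with Lemur and the Indri query language; purely as an illustration of the underlying idea (and not Indri's actual API), the sketch below keys a toy inverted index on (field, value) pairs so that queries can mix surface words with linguistic annotations. The documents, fields, and tags are invented.

        # Toy inverted index over per-token linguistic annotations: each posting
        # maps a (field, value) pair to the documents containing it, so a query
        # can combine words with, e.g., part-of-speech or named-entity labels.
        from collections import defaultdict

        docs = {
            1: [("word", "rivers"), ("pos", "NNS"), ("word", "flood"), ("pos", "VBP")],
            2: [("word", "Lambert"), ("pos", "NNP"), ("ne", "PERSON")],
        }

        index = defaultdict(set)
        for doc_id, tokens in docs.items():
            for field, value in tokens:
                index[(field, value)].add(doc_id)

        def search(*terms):
            """Return the documents matching every (field, value) constraint."""
            result = None
            for term in terms:
                postings = index.get(term, set())
                result = postings if result is None else result & postings
            return result or set()

        print(search(("ne", "PERSON")))                    # -> {2}
        print(search(("word", "flood"), ("pos", "VBP")))   # -> {1}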

  • SconeEdit: A Text-guided Domain Knowledge Editor

    NAACL

    We will demonstrate SconeEdit, a new tool for exploring and editing knowledge bases (KBs) that leverages interaction with domain texts. The tool provides an annotated view of user-selected text, allowing a user to see which concepts from the text are in the KB and to edit the KB directly from this Text View. Alongside the Text View, SconeEdit provides a navigable KB View of the knowledge base, centered on concepts that appear in the text. This unified tool gives the user a text-driven way to explore a KB and add new knowledge.

  • Classifying Entity Relations from Natural Language

    Senior undergraduate honors thesis, University of Massachusetts Amherst

    In an effort to expand upon previous work, this research investigates the automation of classifying relations between entities (such as organizations, people, and locations) in natural, written language. This research differs from earlier endeavors in the types of relations sought and the assumed information availability. More general relations, such as located at and part of, are sought on a per-document basis, with the assumption that complete knowledge of co-reference is available. To automate this task, three feature-based machine learning algorithms are used to train software classifiers. Using only relatively simple features, such as sentence tokens, digrams, and trigrams, many of these relations can be identified. The middling results achieved with few features show this task to have potential and support further study.
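
    The thesis predates today's standard machine learning toolkits, but a roughly equivalent feature-based classifier over token, digram, and trigram features can be sketched with scikit-learn; the example sentences and relation labels below are invented, and this is not the exact setup of the three algorithms evaluated in the thesis.

        # Relation classification from sentence text using unigram, bigram, and
        # trigram features feeding a linear classifier (toy training data).
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        sentences = [
            "The office is located in Boston .",
            "The lab sits in Pittsburgh .",
            "The institute is a division of the university .",
            "The group is part of the larger consortium .",
        ]
        labels = ["located_at", "located_at", "part_of", "part_of"]

        model = make_pipeline(
            CountVectorizer(ngram_range=(1, 3)),   # tokens, bigrams, trigrams
            LogisticRegression(max_iter=1000),
        )
        model.fit(sentences, labels)
        print(model.predict(["The startup is based in Cambridge ."]))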

Languages

  • French

    Elementary proficiency

  • Spanish

    Elementary proficiency

  • English

    Native or bilingual proficiency

  • Mandarin Chinese

    Elementary proficiency

Organizations

  • ACL

