Ben Lambert

Cambridge, Massachusetts, United States


Experience

  • Nightfall AI

    Boston, Massachusetts, United States

  • -

    Greater Boston Area

  • -

    Greater Boston Area

  • -

    Cambridge, MA

  • -

    Pittsburgh, PA

  • -

    Urbana-Champaign, Illinois

  • -

    Greater Boston Area

  • -

    Greater Boston Area

Education

  • Carnegie Mellon University

    Coursework in language technologies, including natural language processing, AI, knowledge representation, and machine learning.

    Adviser: Scott Fahlman

  • University of Illinois at Urbana-Champaign

    Course- and research-based degree. Research focused on machine learning and on using linguistic information for information retrieval (search engines). Additional experience teaching introductory computer science courses and working in corporate relations and fund-raising.

    Member of the Cognitive Computation research group with Dan Roth.

  • University of Massachusetts Amherst

    Magna cum laude, Commonwealth College honors, departmental honors. Bachelor of Science in Computer Science, with a minor in Mathematics.

    Undergraduate honors thesis on relation extraction from text with Andrew McCallum.

Publications

  • Discriminatively Trained Dependency Language Modeling for Conversational Speech Recognition

    Interspeech

    We present a discriminatively trained dependency-parser-based language model. The model operates on utterances, rather than words, and so can utilize long-distance structural features of each sentence. We train the model discriminatively on n-best lists, using the perceptron algorithm to tune the model weights. Our features include standard n-gram-style features, long-distance co-occurrence features, and syntactic structural features. We evaluate this model by re-ranking n-best lists of recognized speech from the Fisher dataset of informal telephone conversations. We compare various combinations of feature types, as well as methods of training the model.
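
    A minimal sketch of the kind of perceptron-trained n-best re-ranker described above; the feature map here (word unigrams and bigrams only) and the training data layout are illustrative stand-ins, not the paper's actual feature set.

        # Illustrative structured-perceptron re-ranker for ASR n-best lists.
        # The paper's features also include long-distance co-occurrence and
        # dependency-structure features; this toy version uses word n-grams only.
        from collections import defaultdict

        def features(hypothesis):
            """Toy feature map: unigram and bigram counts over the word sequence."""
            feats = defaultdict(float)
            words = hypothesis.split()
            for w in words:
                feats[("uni", w)] += 1.0
            for a, b in zip(words, words[1:]):
                feats[("bi", a, b)] += 1.0
            return feats

        def score(weights, hypothesis):
            return sum(weights[f] * v for f, v in features(hypothesis).items())

        def train_reranker(nbest_lists, oracles, epochs=5):
            """nbest_lists: list of n-best hypothesis lists; oracles: the
            lowest-error hypothesis in each list."""
            weights = defaultdict(float)
            for _ in range(epochs):
                for nbest, oracle in zip(nbest_lists, oracles):
                    best = max(nbest, key=lambda h: score(weights, h))
                    if best != oracle:
                        # Perceptron update toward the oracle hypothesis.
                        for f, v in features(oracle).items():
                            weights[f] += v
                        for f, v in features(best).items():
                            weights[f] -= v
            return weights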

  • Creating a linguistic plausibility dataset with non-expert annotators

    Interspeech

    We describe the creation of a linguistic plausibility dataset that contains annotated examples of language judged to be linguistically plausible, implausible, and everything in between. To create the dataset, we randomly generate sentences and have them annotated by crowdsourcing on Amazon Mechanical Turk. Obtaining inter-annotator agreement is a difficult problem because linguistic plausibility is highly subjective. The annotations obtained depend, among other factors, on the manner in which annotators are questioned about the plausibility of sentences. We describe our experiments on posing a number of different questions to the annotators, in order to elicit the responses with the greatest agreement, and present several methods for analyzing the resulting responses. The generated dataset and annotations are being made available to the public.
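
    One standard way to quantify the inter-annotator agreement discussed above is Fleiss' kappa; the self-contained sketch below uses invented rating counts and a made-up three-category scheme, not the dataset's actual annotations.

        # Fleiss' kappa over crowdsourced plausibility ratings.
        # ratings[i][j] = number of annotators assigning item i to category j.
        def fleiss_kappa(ratings):
            n_items = len(ratings)
            n_raters = sum(ratings[0])       # assumes equal raters per item
            n_categories = len(ratings[0])

            # Proportion of all assignments falling into each category.
            p_cat = [sum(row[j] for row in ratings) / (n_items * n_raters)
                     for j in range(n_categories)]

            # Mean per-item agreement.
            p_bar = sum((sum(c * c for c in row) - n_raters) /
                        (n_raters * (n_raters - 1)) for row in ratings) / n_items

            # Agreement expected by chance.
            p_e = sum(p * p for p in p_cat)
            return (p_bar - p_e) / (1 - p_e)

        # 4 sentences, 5 annotators, categories: implausible / neutral / plausible.
        print(fleiss_kappa([[5, 0, 0], [0, 2, 3], [1, 4, 0], [0, 0, 5]]))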

  • The use of sense in unsupervised training of acoustic models for HMM-based ASR systems

    Interspeech

    In unsupervised training of ASR systems, no annotated data are assumed to exist. Word-level annotations for training audio are generated iteratively using an ASR system. At each iteration, a subset of the data judged as having the most reliable transcriptions is selected to train the next set of acoustic models. Data selection, however, remains a difficult problem, particularly when the error rate of the recognizer providing the initial annotation is very high. In this paper we propose an iterative algorithm that uses a combination of likelihoods and a simple model of sense to select data. We show that the algorithm is effective for unsupervised training of acoustic models, particularly when the initial annotation is highly erroneous. Experiments conducted on Fisher-1 data, using initial models from Switchboard and a vocabulary and LM derived from the Google N-grams, show that performance on a held-out test set selected from the Fisher data improves more across iterations than with likelihood-based data selection.
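
    As an illustration of the likelihood-based selection baseline that the paper improves on (the full algorithm additionally folds in a simple model of sense), a minimal sketch of one selection step; the utterance IDs, hypotheses, and scores are made up.

        # Likelihood-based data selection for one iteration of unsupervised
        # acoustic-model training: keep the utterances whose automatic
        # transcriptions look most reliable. Decoding and retraining are
        # outside this sketch.
        def select_most_reliable(decoded_utterances, keep_fraction=0.3):
            """decoded_utterances: dicts with 'utt_id', 'hypothesis', and a
            normalized acoustic 'likelihood' from the current recognizer."""
            ranked = sorted(decoded_utterances,
                            key=lambda d: d["likelihood"], reverse=True)
            return ranked[: max(1, int(len(ranked) * keep_fraction))]

        decoded = [
            {"utt_id": "fsh_001", "hypothesis": "hello how are you", "likelihood": -2.1},
            {"utt_id": "fsh_002", "hypothesis": "i um well maybe", "likelihood": -7.8},
            {"utt_id": "fsh_003", "hypothesis": "see you tomorrow", "likelihood": -1.4},
        ]
        print([d["utt_id"] for d in select_most_reliable(decoded, keep_fraction=0.67)])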

  • A Knowledge-Based Architecture for using Semantics in Automatic Speech Recognition

    Ph.D. Thesis Proposal, Carnegie Mellon University

    Human languages allow us to express infinitely many ideas and expressions, in phenomenally creative ways. The range and variety of expressivity in human language poses a significant challenge to automatic speech recognition. To automatically recognize speech, it is necessary to build computational models of speech, to model both the acoustics and the language. These are typically modeled separately—the acoustics are modeled using signal processing and pattern recognition techniques, whereas the language is typically modeled with a statistical language model. However, current statistical language models lack the ability to incorporate semantic and pragmatic knowledge. I propose to develop and demonstrate a framework for incorporating semantics of spoken language, as represented in a symbolic knowledge base, into automatic speech recognition (ASR) systems.

  • Knowledge-Driven Learning and Discovery

    AAAI

    The goal of our current research is machine learning with the help and guidance of a knowledge base (KB). Rather than learning numerical models, our approach generates explicit symbolic hypotheses. These hypotheses are subject to the constraints of the KB and are easily human-readable and verifiable. Toward this end, we have implemented algorithms that hypothesize new relations and new types of entities in a KB by examining structural regularities in the KB that represent implicit knowledge. We evaluate these algorithms on a publications KB and a zoology KB.
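
    A toy illustration of the kind of structural regularity such algorithms exploit: entities that share an identical set of relations can be grouped as candidates for a hypothesized new type. The relation triples below are invented and are not drawn from the publications or zoology KBs evaluated in the paper.

        # Group KB entities by their "relation signature" (the set of relations
        # they appear in as a subject); groups with a shared signature are
        # candidates for a new, as-yet-unnamed entity type.
        from collections import defaultdict

        triples = [
            ("sparrow", "eats", "seeds"), ("sparrow", "lays", "eggs"),
            ("robin", "eats", "worms"), ("robin", "lays", "eggs"),
            ("trout", "eats", "insects"), ("trout", "lays", "eggs"),
            ("dog", "eats", "meat"),
        ]

        signatures = defaultdict(set)
        for subj, rel, _ in triples:
            signatures[subj].add(rel)

        groups = defaultdict(list)
        for entity, sig in signatures.items():
            groups[frozenset(sig)].append(entity)

        for sig, entities in groups.items():
            if len(entities) > 1:
                print(f"candidate new type for {entities}: shared relations {sorted(sig)}")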

  • Improving Information Retrieval with Natural Language Processing

    Master's thesis, University of Illinois at Urbana-Champaign

    The goal of this research is to transform a text corpus into a database rich in linguistic information and robust across multifarious applications. Computational natural language processing liberates the syntax and semantics of natural language to convert a plain text corpus into a rich database that stores these implicit linguistic components. Information retrieval techniques index the otherwise implicit data that is encoded by the language and its syntax. We use Lemur and the Indri query language to index this linguistic information. The resulting index supports numerous applications that can utilize the additional linguistic information. For one application, searching a document corpus with natural language queries, we show that syntactic and semantic information improve retrieval performance.
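
    The thesis builds its index with Lemur and the Indri query language; purely as an illustration of the underlying idea (and not Indri's actual API), the sketch below keys a toy inverted index on (field, value) pairs so that queries can mix surface words with linguistic annotations. The documents, fields, and tags are invented.

        # Toy inverted index over per-token linguistic annotations: each posting
        # maps a (field, value) pair to the documents containing it, so a query
        # can combine words with, e.g., part-of-speech or named-entity labels.
        from collections import defaultdict

        docs = {
            1: [("word", "rivers"), ("pos", "NNS"), ("word", "flood"), ("pos", "VBP")],
            2: [("word", "Lambert"), ("pos", "NNP"), ("ne", "PERSON")],
        }

        index = defaultdict(set)
        for doc_id, tokens in docs.items():
            for field, value in tokens:
                index[(field, value)].add(doc_id)

        def search(*terms):
            """Return the documents matching every (field, value) constraint."""
            result = None
            for term in terms:
                postings = index.get(term, set())
                result = postings if result is None else result & postings
            return result or set()

        print(search(("ne", "PERSON")))                    # -> {2}
        print(search(("word", "flood"), ("pos", "VBP")))   # -> {1}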

  • SconeEdit: A Text-guided Domain Knowledge Editor

    NAACL

    We will demonstrate SconeEdit, a new tool for exploring and editing knowledge bases (KBs) that leverages interaction with domain texts. The tool provides an annotated view of user-selected text, allowing a user to see which concepts from the text are in the KB and to edit the KB directly from this Text View. Alongside the Text View, SconeEdit provides a navigable KB View of the knowledge base, centered on concepts that appear in the text. This unified tool gives the user a text-driven way to explore a KB and add new knowledge.

  • Classifying Entity Relations from Natural Language

    Senior undergraduate honors thesis, University of Massachusetts Amherst

    In an effort to expand upon previous work, this research investigates the automation of classifying relations between entities (such as organizations, people, and locations) in natural, written language. This research differs from earlier endeavors in the types of relations sought and the assumed information availability. More general relations, such as located at and part of, are sought on a per-document basis, with the assumption that complete knowledge of co-reference is available. To automate this task, three feature-based machine learning algorithms are used to train software classifiers. Using only relatively simple features, such as sentence tokens, digrams, and trigrams, many of these relations can be identified. The middling results achieved with few features show this task to have potential and support further study.
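
    The thesis predates today's standard machine learning toolkits, but a roughly equivalent feature-based classifier over token, digram, and trigram features can be sketched with scikit-learn; the example sentences and relation labels below are invented, and this is not the exact setup of the three algorithms evaluated in the thesis.

        # Relation classification from sentence text using unigram, bigram, and
        # trigram features feeding a linear classifier (toy training data).
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        sentences = [
            "The office is located in Boston .",
            "The lab sits in Pittsburgh .",
            "The institute is a division of the university .",
            "The group is part of the larger consortium .",
        ]
        labels = ["located_at", "located_at", "part_of", "part_of"]

        model = make_pipeline(
            CountVectorizer(ngram_range=(1, 3)),   # tokens, bigrams, trigrams
            LogisticRegression(max_iter=1000),
        )
        model.fit(sentences, labels)
        print(model.predict(["The startup is based in Cambridge ."]))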

Languages

  • French

    Elementary proficiency

  • Spanish

    Elementary proficiency

  • English

    Native or bilingual proficiency

  • Mandarin Chinese

    Elementary proficiency

Organizations

  • ACL

