Showing 1–22 of 22 results for author: Newman, B

Searching in archive cs.
  1. arXiv:2407.17468  [pdf, other]

    cs.CL cs.AI

    WildHallucinations: Evaluating Long-form Factuality in LLMs with Real-World Entity Queries

    Authors: Wenting Zhao, Tanya Goyal, Yu Ying Chiu, Liwei Jiang, Benjamin Newman, Abhilasha Ravichander, Khyathi Chandu, Ronan Le Bras, Claire Cardie, Yuntian Deng, Yejin Choi

    Abstract: While hallucinations of large language models (LLMs) prevail as a major challenge, existing evaluation benchmarks on factuality do not cover the diverse domains of knowledge that the real-world users of LLMs seek information about. To bridge this gap, we introduce WildHallucinations, a benchmark that evaluates factuality. It does so by prompting LLMs to generate information about entities mined fr…

    Submitted 24 July, 2024; originally announced July 2024.

  2. arXiv:2407.08876  [pdf, other]

    cs.CV cs.RO

    DegustaBot: Zero-Shot Visual Preference Estimation for Personalized Multi-Object Rearrangement

    Authors: Benjamin A. Newman, Pranay Gupta, Kris Kitani, Yonatan Bisk, Henny Admoni, Chris Paxton

    Abstract: De gustibus non est disputandum ("there is no accounting for others' tastes") is a common Latin maxim describing how many solutions in life are determined by people's personal preferences. Many household tasks, in particular, can only be considered fully successful when they account for personal preferences such as the visual aesthetic of the scene. For example, setting a table could be optimized…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: 19 pages, 10 figures

  3. arXiv:2404.10733  [pdf, other]

    cs.AI cs.HC cs.RO

    Bootstrapping Linear Models for Fast Online Adaptation in Human-Agent Collaboration

    Authors: Benjamin A Newman, Chris Paxton, Kris Kitani, Henny Admoni

    Abstract: Agents that assist people need to have well-initialized policies that can adapt quickly to align with their partners' reward functions. Initializing policies to maximize performance with unknown partners can be achieved by bootstrapping nonlinear models using imitation learning over large, offline datasets. Such policies can require prohibitive computation to fine-tune in-situ and therefore may mi…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 10 pages, 4 figures, Accepted to AAMAS 2024

  4. arXiv:2401.13045  [pdf]

    stat.ML cs.LG stat.AP stat.ME

    Assessment of Sports Concussion in Female Athletes: A Role for Neuroinformatics?

    Authors: Rachel Edelstein, Sterling Gutterman, Benjamin Newman, John Darrell Van Horn

    Abstract: Over the past decade, the intricacies of sports-related concussions among female athletes have become readily apparent. Traditional clinical methods for diagnosing concussions suffer limitations when applied to female athletes, often failing to capture subtle changes in brain structure and function. Advanced neuroinformatics techniques and machine learning models have become invaluable assets in t…

    Submitted 9 March, 2024; v1 submitted 23 January, 2024; originally announced January 2024.

  5. arXiv:2311.00059  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG

    The Generative AI Paradox: "What It Can Create, It May Not Understand"

    Authors: Peter West, Ximing Lu, Nouha Dziri, Faeze Brahman, Linjie Li, Jena D. Hwang, Liwei Jiang, Jillian Fisher, Abhilasha Ravichander, Khyathi Chandu, Benjamin Newman, Pang Wei Koh, Allyson Ettinger, Yejin Choi

    Abstract: The recent wave of generative AI has sparked unprecedented global attention, with both excitement and concern over potentially superhuman levels of artificial intelligence: models now take only seconds to produce outputs that would challenge or exceed the capabilities even of expert humans. At the same time, models still show basic errors in understanding that would not be expected even in non-exp…

    Submitted 31 October, 2023; originally announced November 2023.

  6. arXiv:2305.14772  [pdf, other]

    cs.CL

    A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents

    Authors: Benjamin Newman, Luca Soldaini, Raymond Fok, Arman Cohan, Kyle Lo

    Abstract: Many real-world applications (e.g., note taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document. Yet, users may find snippets difficult to understand as they lack context from the original document. In this work, we use language models to rewrite snippets from scientific documents to be read on their own. First,…

    Submitted 30 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: 19 pages, 2 figures, 8 tables, EMNLP2023

  7. Comparing Sentence-Level Suggestions to Message-Level Suggestions in AI-Mediated Communication

    Authors: Liye Fu, Benjamin Newman, Maurice Jakesch, Sarah Kreps

    Abstract: Traditionally, writing assistance systems have focused on short or even single-word suggestions. Recently, large language models like GPT-3 have made it possible to generate significantly longer natural-sounding suggestions, offering more advanced assistance opportunities. This study explores the trade-offs between sentence- vs. message-level suggestions for AI-mediated communication. We recruited…

    Submitted 26 February, 2023; originally announced February 2023.

    Comments: 13 pages, 10 figures

  8. arXiv:2211.09110  [pdf, other]

    cs.CL cs.AI cs.LG

    Holistic Evaluation of Language Models

    Authors: Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, et al. (25 additional authors not shown)

    Abstract: Language models (LMs) are becoming the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models. First, we taxonomize the vast space of potential scenarios (i.e. use cases) and metrics (i.e. desiderata) that are of interest fo…

    Submitted 1 October, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Project page: https://crfm.stanford.edu/helm/v1.0

    Journal ref: Published in Transactions on Machine Learning Research (TMLR), 2023

  9. arXiv:2110.07280  [pdf, other]

    cs.CL

    P-Adapters: Robustly Extracting Factual Information from Language Models with Diverse Prompts

    Authors: Benjamin Newman, Prafulla Kumar Choubey, Nazneen Rajani

    Abstract: Recent work (e.g. LAMA (Petroni et al., 2019)) has found that the quality of the factual information extracted from Large Language Models (LLMs) depends on the prompts used to query them. This inconsistency is problematic because different users will query LLMs for the same information using different wording, but should receive the same, accurate responses regardless. In this work we aim to addre…

    Submitted 19 April, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 15 pages, 6 figures, 4 tables

  10. arXiv:2108.07258  [pdf, other]

    cs.LG cs.AI cs.CY

    On the Opportunities and Risks of Foundation Models

    Authors: Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, Stefano Ermon, John Etchemendy, Kawin Ethayarajh, et al. (89 additional authors not shown)

    Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their cap…

    Submitted 12 July, 2022; v1 submitted 16 August, 2021; originally announced August 2021.

    Comments: Authored by the Center for Research on Foundation Models (CRFM) at the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Report page with citation guidelines: https://crfm.stanford.edu/report.html

  11. arXiv:2104.09635  [pdf, other]

    cs.CL

    Refining Targeted Syntactic Evaluation of Language Models

    Authors: Benjamin Newman, Kai-Siang Ang, Julia Gong, John Hewitt

    Abstract: Targeted syntactic evaluation of subject-verb number agreement in English (TSE) evaluates language models' syntactic knowledge using hand-crafted minimal pairs of sentences that differ only in the main verb's conjugation. The method evaluates whether language models rate each grammatical sentence as more likely than its ungrammatical counterpart. We identify two distinct goals for TSE. First, eval…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: 14 pages, 5 figures, 3 tables. To appear at NAACL 2021

    ACM Class: I.2.7

  12. arXiv:2010.07358  [pdf, other]

    cs.HC cs.AI

    Optimal Assistance for Object-Rearrangement Tasks in Augmented Reality

    Authors: Benjamin Newman, Kevin Carlberg, Ruta Desai

    Abstract: Augmented-reality (AR) glasses that will have access to onboard sensors and an ability to display relevant information to the user present an opportunity to provide user assistance in quotidian tasks. Many such tasks can be characterized as object-rearrangement tasks. We introduce a novel framework for computing and displaying AR assistance that consists of (1) associating an optimal action sequen…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: 19 pages including supplementary. Under review for ACM IUI 2021

  13. arXiv:2010.07174  [pdf, other]

    cs.CL

    The EOS Decision and Length Extrapolation

    Authors: Benjamin Newman, John Hewitt, Percy Liang, Christopher D. Manning

    Abstract: Extrapolation to unseen sequence lengths is a challenge for neural generative models of language. In this work, we characterize the effect on length extrapolation of a modeling decision often overlooked: predicting the end of the generative process through the use of a special end-of-sequence (EOS) vocabulary item. We study an oracle setting - forcing models to generate to the correct sequence len…

    Submitted 14 October, 2020; originally announced October 2020.

    Comments: 16 pages, 7 Figures, 9 Tables, Blackbox NLP Workshop at EMNLP 2020

  14. arXiv:1909.07290  [pdf, other]

    cs.CL

    Communication-based Evaluation for Natural Language Generation

    Authors: Benjamin Newman, Reuben Cohn-Gordon, Christopher Potts

    Abstract: Natural language generation (NLG) systems are commonly evaluated using n-gram overlap measures (e.g. BLEU, ROUGE). These measures do not directly capture semantics or speaker intentions, and so they often turn out to be misaligned with our true goals for NLG. In this work, we argue instead for communication-based evaluations: assuming the purpose of an NLG system is to convey information to a read…

    Submitted 11 October, 2019; v1 submitted 16 September, 2019; originally announced September 2019.

    Comments: 11 pages, 2 figures, SCiL, camera-ready - clarified certain points, updated acknowledgements

  15. arXiv:1807.11154  [pdf, other]

    cs.RO cs.HC

    HARMONIC: A Multimodal Dataset of Assistive Human-Robot Collaboration

    Authors: Benjamin A. Newman, Reuben M. Aronson, Siddartha S. Srinivasa, Kris Kitani, Henny Admoni

    Abstract: We present the Human And Robot Multimodal Observations of Natural Interactive Collaboration (HARMONIC) data set. This is a large multimodal data set of human interactions with a robotic arm in a shared autonomy setting designed to imitate assistive eating. The data set provides human, robot, and environmental data views of twenty-four different people engaged in an assistive eating task with a 6 d…

    Submitted 30 July, 2020; v1 submitted 29 July, 2018; originally announced July 2018.

  16. arXiv:1302.3912  [pdf]

    cs.HC cs.CY cs.SI

    An Online Environment for Democratic Deliberation: Motivations, Principles, and Design

    Authors: Todd Davies, Brendan O'Connor, Alex Cochran, Jonathan J. Effrat, Andrew Parker, Benjamin Newman, Aaron Tam

    Abstract: We have created a platform for online deliberation called Deme (which rhymes with 'team'). Deme is designed to allow groups of people to engage in collaborative drafting, focused discussion, and decision making using the Internet. The Deme project has evolved greatly from its beginning in 2003. This chapter outlines the thinking behind Deme's initial design: our motivations for creating it, the pr…

    Submitted 15 February, 2013; originally announced February 2013.

    Comments: Appeared in Todd Davies and Seeta Peña Gangadharan (Editors), Online Deliberation: Design, Research, and Practice, CSLI Publications/University of Chicago Press, October 2009, pp. 275-292; 18 pages, 3 figures

    ACM Class: H.5.3; K.4.1; K.4.3

  17. arXiv:1302.3545  [pdf]

    cs.HC

    Displaying Asynchronous Reactions to a Document: Two Goals and a Design

    Authors: Todd Davies, Benjamin Newman, Brendan O'Connor, Aaron Tam, Leo Perry

    Abstract: We describe and motivate two goals for the screen display of asynchronous text deliberation pertaining to a document: (1) visibility of relationships between comments and the text they reference, between different comments, and between group members and the document and discussion, and (2) distinguishability of boundaries between contextually related and unrelated text and comments and between i…

    Submitted 14 February, 2013; originally announced February 2013.

    Comments: Appeared as a Poster Paper, Conference on Computer Supported Cooperative Work, 20th Anniversary - Conference Supplement (CSCW 2006, Banff, November 4-8, 2006), pp. 169-170; Modified as "Document Centered Discussion: A Design Pattern for Online Deliberation", in D. Schuler, Liberating Voices: A Pattern Language for Communication Revolution, MIT Press, 2008, pp. 384-386; 2 pages, 1 figure, 1 table

    ACM Class: H.5.3; I.7.1

  18. arXiv:cs/0306116  [pdf]

    cs.MM cs.NI

    Global Platform for Rich Media Conferencing and Collaboration

    Authors: Harvey B. Newman, Philippe Galvez, Gregory Denis, David Collados, Kun Wei, David Adamczyk

    Abstract: The Virtual Rooms Videoconferencing Service (VRVS) provides a worldwide videoconferencing service and collaborative environment to the research and education communities. This system provides a low cost, bandwidth-efficient, extensible means for videoconferencing and remote collaboration over networks within the High Energy and Nuclear Physics communities (HENP). VRVS has become a standard part…

    Submitted 15 July, 2003; v1 submitted 19 June, 2003; originally announced June 2003.

    Comments: CHEP03 Conference

    ACM Class: H.5.3

  19. arXiv:cs/0306109  [pdf]

    cs.DC cs.DB

    Distributed Heterogeneous Relational Data Warehouse In A Grid Environment

    Authors: Saima Iqbal, Julian J. Bunn, Harvey B. Newman

    Abstract: This paper examines how a "Distributed Heterogeneous Relational Data Warehouse" can be integrated in a Grid environment that will provide physicists with efficient access to large and small object collections drawn from databases at multiple sites. This paper investigates the requirements of Grid-enabling such a warehouse, and explores how these requirements may be met by extensions to existing…

    Submitted 18 June, 2003; originally announced June 2003.

    Comments: 4 pages, 6 figures

    ACM Class: H.2.1; H.2.2; H.2.4; H.2.7; H.3.1; H.3.5

  20. arXiv:cs/0306096  [pdf]

    cs.DC

    MonALISA : A Distributed Monitoring Service Architecture

    Authors: H. B. Newman, I. C. Legrand, P. Galvez, R. Voicu, C. Cirstoiu

    Abstract: The MonALISA (Monitoring Agents in A Large Integrated Services Architecture) system provides a distributed monitoring service. MonALISA is based on a scalable Dynamic Distributed Services Architecture which is designed to meet the needs of physics collaborations for monitoring global Grid systems, and is implemented using JINI/JAVA and WSDL/SOAP technologies. The scalability of the system derive…

    Submitted 16 June, 2003; originally announced June 2003.

    Comments: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 8 pages, pdf. PSN MOET001

    ACM Class: H4.3; H5.2; J2; D2.8

  21. arXiv:cs/0306002  [pdf, ps, other]

    cs.DC

    The Clarens web services architecture

    Authors: Conrad D. Steenberg, Eric Aslakson, Julian J. Bunn, Harvey B. Newman, Michael Thomas, Frank van Lingen

    Abstract: Clarens is a uniquely flexible web services infrastructure providing a unified access protocol to a diverse set of functions useful to the HEP community. It uses the standard HTTP protocol combined with application layer, certificate based authentication to provide single sign-on to individuals, organizations and hosts, with fine-grained access control to services, files and virtual organization…

    Submitted 14 July, 2003; v1 submitted 30 May, 2003; originally announced June 2003.

    Comments: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 6 pages, LaTeX, 4 figures, PSN MONT008

    ACM Class: H.3.4

  22. arXiv:cs/0306001  [pdf, ps, other]

    cs.DC

    Clarens Client and Server Applications

    Authors: Conrad D. Steenberg, Eric Aslakson, Julian J. Bunn, Harvey B. Newman, Michael Thomas, Frank van Lingen

    Abstract: Several applications have been implemented with access via the Clarens web service infrastructure, including virtual organization management, JetMET physics data analysis using relational databases, and Storage Resource Broker (SRB) access. This functionality is accessible transparently from Python scripts, the Root analysis framework and from Java applications and browser applets.

    Submitted 14 July, 2003; v1 submitted 30 May, 2003; originally announced June 2003.

    Comments: Talk from the 2003 Computing in High Energy and Nuclear Physics (CHEP03), La Jolla, Ca, USA, March 2003, 4 pages, LaTeX, no figures, PSN TUCT005

    ACM Class: H.3.4