Showing 1–13 of 13 results for author: Wexler, J

  1. arXiv:2403.04894  [pdf, other]

    cs.CL cs.AI

    ConstitutionalExperts: Training a Mixture of Principle-based Prompts

    Authors: Savvas Petridis, Ben Wedin, Ann Yuan, James Wexler, Nithum Thain

    Abstract: Large language models (LLMs) are highly capable at a variety of tasks given the right prompt, but writing one is still a difficult and tedious process. In this work, we introduce ConstitutionalExperts, a method for learning a prompt consisting of constitutional principles (i.e. rules), given a training dataset. Unlike prior methods that optimize the prompt as a single entity, our method incrementa…

    Submitted 7 March, 2024; originally announced March 2024.

  2. Take It, Leave It, or Fix It: Measuring Productivity and Trust in Human-AI Collaboration

    Authors: Crystal Qian, James Wexler

    Abstract: Although recent developments in generative AI have greatly enhanced the capabilities of conversational agents such as Google's Gemini (formerly Bard) or OpenAI's ChatGPT, it's unclear whether the usage of these agents aids users across various contexts. To better understand how access to conversational AI affects productivity and trust, we conducted a mixed-methods, task-based user study, observin…

    Submitted 1 April, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

    Comments: 15 pages. Published in the 29th International Conference on Intelligent User Interfaces (IUI '24)

  3. arXiv:2402.14880  [pdf, other]

    cs.CL cs.AI cs.HC

    Automatic Histograms: Leveraging Language Models for Text Dataset Exploration

    Authors: Emily Reif, Crystal Qian, James Wexler, Minsuk Kahng

    Abstract: Making sense of unstructured text datasets is perennially difficult, yet increasingly relevant with Large Language Models. Data workers often rely on dataset summaries, especially distributions of various derived features. Some features, like toxicity or topics, are relevant to many datasets, but many interesting features are domain specific: instruments and genres for a music dataset, or diseases…

    Submitted 21 February, 2024; originally announced February 2024.

  4. arXiv:2402.10524  [pdf, other]

    cs.HC cs.AI cs.CL cs.LG

    LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models

    Authors: Minsuk Kahng, Ian Tenney, Mahima Pushkarna, Michael Xieyang Liu, James Wexler, Emily Reif, Krystal Kallarackal, Minsuk Chang, Michael Terry, Lucas Dixon

    Abstract: Automatic side-by-side evaluation has emerged as a promising approach to evaluating the quality of responses from large language models (LLMs). However, analyzing the results from this evaluation approach raises scalability and interpretability challenges. In this paper, we present LLM Comparator, a novel visual analytics tool for interactively analyzing results from automatic side-by-side evaluat…

    Submitted 16 February, 2024; originally announced February 2024.

  5. arXiv:2310.15428  [pdf, other]

    cs.HC cs.AI

    ConstitutionMaker: Interactively Critiquing Large Language Models by Converting Feedback into Principles

    Authors: Savvas Petridis, Ben Wedin, James Wexler, Aaron Donsbach, Mahima Pushkarna, Nitesh Goyal, Carrie J. Cai, Michael Terry

    Abstract: Large language model (LLM) prompting is a promising new approach for users to create and customize their own chatbots. However, current methods for steering a chatbot's outputs, such as prompt engineering and fine-tuning, do not support users in converting their natural feedback on the model's outputs to changes in the prompt or model. In this work, we explore how to enable users to interactively…

    Submitted 23 October, 2023; originally announced October 2023.

  6. arXiv:2112.06989  [pdf, other]

    cs.LG

    Analyzing a Caching Model

    Authors: Leon Sixt, Evan Zheran Liu, Marie Pellat, James Wexler, Milad Hashemi, Been Kim, Martin Maas

    Abstract: Machine Learning has been successfully applied in systems applications such as memory prefetching and caching, where learned models have been shown to outperform heuristics. However, the lack of understanding the inner workings of these models -- interpretability -- remains a major obstacle for adoption in real-world deployments. Understanding a model's behavior can help system administrators and…

    Submitted 11 February, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

    Comments: Presented at the NeurIPS 2021 Workshop on ML for Systems

  7. arXiv:2106.08641  [pdf, other]

    cs.LG

    Best of both worlds: local and global explanations with human-understandable concepts

    Authors: Jessica Schrouff, Sebastien Baur, Shaobo Hou, Diana Mincu, Eric Loreaux, Ralph Blanes, James Wexler, Alan Karthikesalingam, Been Kim

    Abstract: Interpretability techniques aim to provide the rationale behind a model's decision, typically by explaining either an individual prediction (local explanation, e.g. 'why is this patient diagnosed with this condition') or a class of predictions (global explanation, e.g. 'why is this set of patients diagnosed with this condition in general'). While there are many methods focused on either one, few f…

    Submitted 31 January, 2022; v1 submitted 16 June, 2021; originally announced June 2021.

  8. arXiv:2008.05122  [pdf, other]

    cs.CL

    The Language Interpretability Tool: Extensible, Interactive Visualizations and Analysis for NLP Models

    Authors: Ian Tenney, James Wexler, Jasmijn Bastings, Tolga Bolukbasi, Andy Coenen, Sebastian Gehrmann, Ellen Jiang, Mahima Pushkarna, Carey Radebaugh, Emily Reif, Ann Yuan

    Abstract: We present the Language Interpretability Tool (LIT), an open-source platform for visualization and understanding of NLP models. We focus on core questions about model behavior: Why did my model make this prediction? When does it perform poorly? What happens under a controlled change in the input? LIT integrates local explanations, aggregate analysis, and counterfactual generation into a streamline…

    Submitted 12 August, 2020; originally announced August 2020.

  9. arXiv:1907.06637  [pdf, other]

    cs.SD cs.HC cs.LG eess.AS stat.ML

    The Bach Doodle: Approachable music composition with machine learning at scale

    Authors: Cheng-Zhi Anna Huang, Curtis Hawthorne, Adam Roberts, Monica Dinculescu, James Wexler, Leon Hong, Jacob Howcroft

    Abstract: To make music composition more approachable, we designed the first AI-powered Google Doodle, the Bach Doodle, where users can create their own melody and have it harmonized by a machine learning model Coconet (Huang et al., 2017) in the style of Bach. For users to input melodies, we designed a simplified sheet-music based interface. To support an interactive experience at scale, we re-implemented…

    Submitted 14 July, 2019; originally announced July 2019.

    Comments: Proceedings of the 20th International Society for Music Information Retrieval Conference, ISMIR 2019

  10. The What-If Tool: Interactive Probing of Machine Learning Models

    Authors: James Wexler, Mahima Pushkarna, Tolga Bolukbasi, Martin Wattenberg, Fernanda Viegas, Jimbo Wilson

    Abstract: A key challenge in developing and deploying Machine Learning (ML) systems is understanding their performance across a wide range of inputs. To address this challenge, we created the What-If Tool, an open-source application that allows practitioners to probe, visualize, and analyze ML systems, with minimal coding. The What-If Tool lets practitioners test performance in hypothetical situations, anal…

    Submitted 3 October, 2019; v1 submitted 9 July, 2019; originally announced July 2019.

    Comments: IEEE VIS (VAST) 2019

    ACM Class: H.5.2

  11. arXiv:1902.03129  [pdf, other]

    stat.ML cs.CV cs.LG

    Towards Automatic Concept-based Explanations

    Authors: Amirata Ghorbani, James Wexler, James Zou, Been Kim

    Abstract: Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions. Most of the current explanation methods provide explanations through feature importance scores, which identify features that are important for each individual input. However, how to systematically summarize and interpret such per sample feature…

    Submitted 8 October, 2019; v1 submitted 6 February, 2019; originally announced February 2019.

  12. arXiv:1810.05798  [pdf, other]

    cs.HC

    ClinicalVis: Supporting Clinical Task-Focused Design Evaluation

    Authors: Marzyeh Ghassemi, Mahima Pushkarna, James Wexler, Jesse Johnson, Paul Varghese

    Abstract: Making decisions about what clinical tasks to prepare for is multi-factored, and especially challenging in intensive care environments where resources must be balanced with patient needs. Electronic health records (EHRs) are a rich data source, but are task-agnostic and can be difficult to use as summarizations of patient needs for a specific task, such as "could this patient need a ventilator tom…

    Submitted 13 October, 2018; originally announced October 2018.

  13. Scalable and accurate deep learning for electronic health records

    Authors: Alvin Rajkomar, Eyal Oren, Kai Chen, Andrew M. Dai, Nissan Hajaj, Peter J. Liu, Xiaobing Liu, Mimi Sun, Patrik Sundberg, Hector Yee, Kun Zhang, Gavin E. Duggan, Gerardo Flores, Michaela Hardt, Jamie Irvine, Quoc Le, Kurt Litsch, Jake Marcus, Alexander Mossin, Justin Tansuwan, De Wang, James Wexler, Jimbo Wilson, Dana Ludwig, Samuel L. Volchenboum, et al. (9 additional authors not shown)

    Abstract: Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of p…

    Submitted 11 May, 2018; v1 submitted 24 January, 2018; originally announced January 2018.

    Comments: Published version from https://www.nature.com/articles/s41746-018-0029-1

    Journal ref: npj Digital Medicine 1:18 (2018)