Showing 1–17 of 17 results for author: Hall, K

  1. arXiv:2309.10539  [pdf, other]

    cs.CL cs.AI

    OpenMSD: Towards Multilingual Scientific Documents Similarity Measurement

    Authors: Yang Gao, Ji Ma, Ivan Korotkov, Keith Hall, Dana Alon, Don Metzler

    Abstract: We develop and evaluate multilingual scientific documents similarity measurement models in this work. Such models can be used to find related works in different languages, which can help multilingual researchers find and explore papers more efficiently. We propose the first multilingual scientific documents dataset, Open-access Multilingual Scientific Documents (OpenMSD), which has 74M papers in 1…

    Submitted 19 September, 2023; originally announced September 2023.

    Comments: Scripts for constructing the OpenMSD dataset are available at: https://github.com/google-research/google-research/tree/master/OpenMSD

  2. My Model is Unfair, Do People Even Care? Visual Design Affects Trust and Perceived Bias in Machine Learning

    Authors: Aimen Gaba, Zhanna Kaufman, Jason Chueng, Marie Shvakel, Kyle Wm. Hall, Yuriy Brun, Cindy Xiong Bearfield

    Abstract: Machine learning technology has become ubiquitous, but, unfortunately, often exhibits bias. As a consequence, disparate stakeholders need to interact with and make informed decisions about using machine learning models in everyday systems. Visualization technology can support stakeholders in understanding and evaluating trade-offs between, for example, accuracy and fairness of models. This paper a…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: 11 pages, 6 figures, to appear in IEEE Transactions on Visualization and Computer Graphics (also in the proceedings of IEEE VIS 2023)

    ACM Class: H.5.0

    Journal ref: IEEE TVCG 30(1):327-337

  3. arXiv:2212.10528  [pdf, other]

    cs.CL cs.IR

    HYRR: Hybrid Infused Reranking for Passage Retrieval

    Authors: Jing Lu, Keith Hall, Ji Ma, Jianmo Ni

    Abstract: We present Hybrid Infused Reranking for Passages Retrieval (HYRR), a framework for training rerankers based on a hybrid of BM25 and neural retrieval models. Retrievers based on hybrid models have been shown to outperform both BM25 and neural models alone. Our approach exploits this improved performance when training a reranker, leading to a robust reranking model. The reranker, a cross-attention n…

    Submitted 20 December, 2022; originally announced December 2022.

  4. arXiv:2210.04723  [pdf, other]

    cs.AI cs.HC

    Experiential Explanations for Reinforcement Learning

    Authors: Amal Alabdulkarim, Madhuri Singh, Gennie Mansi, Kaely Hall, Mark O. Riedl

    Abstract: Reinforcement Learning (RL) systems can be complex and non-interpretable, making it challenging for non-AI experts to understand or intervene in their decisions. This is due in part to the sequential nature of RL in which actions are chosen because of future rewards. However, RL agents discard the qualitative features of their training, making it difficult to recover user-understandable informatio…

    Submitted 13 December, 2023; v1 submitted 10 October, 2022; originally announced October 2022.

    Comments: preprint

  5. arXiv:2209.11755  [pdf, other]

    cs.CL cs.IR

    Promptagator: Few-shot Dense Retrieval From 8 Examples

    Authors: Zhuyun Dai, Vincent Y. Zhao, Ji Ma, Yi Luan, Jianmo Ni, Jing Lu, Anton Bakalov, Kelvin Guu, Keith B. Hall, Ming-Wei Chang

    Abstract: Much recent research on information retrieval has focused on how to transfer from one task (typically with abundant supervised data) to various other tasks where supervision is limited, with the implicit assumption that it is possible to generalize from one task to all the rest. However, this overlooks the fact that there are many diverse and unique retrieval tasks, each targeting different search…

    Submitted 23 September, 2022; originally announced September 2022.

  6. arXiv:2201.06469  [pdf, ps, other]

    cs.CL

    Handling Compounding in Mobile Keyboard Input

    Authors: Andreas Kabel, Keith Hall, Tom Ouyang, David Rybach, Daan van Esch, Françoise Beaufays

    Abstract: This paper proposes a framework to improve the typing experience of mobile users in morphologically rich languages. Smartphone keyboards typically support features such as input decoding, corrections and predictions that all rely on language models. For latency reasons, these operations happen on device, so the models are of limited size and cannot easily cover all the words needed by users for th…

    Submitted 17 January, 2022; originally announced January 2022.

    Comments: 7 pages

  7. arXiv:2201.01745  [pdf, other]

    cs.IR cs.CL

    Atomized Search Length: Beyond User Models

    Authors: John Alex, Keith Hall, Donald Metzler

    Abstract: We argue that current IR metrics, modeled on optimizing user experience, measure too narrow a portion of the IR space. If IR systems are weak, these metrics undersample or completely filter out the deeper documents that need improvement. If IR systems are relatively strong, these metrics undersample deeper relevant documents that could underpin even stronger IR systems, ones that could present con…

    Submitted 5 January, 2022; originally announced January 2022.

    Comments: 13 pages, 6 figures

  8. arXiv:2112.07899  [pdf, other]

    cs.IR cs.CL

    Large Dual Encoders Are Generalizable Retrievers

    Authors: Jianmo Ni, Chen Qu, Jing Lu, Zhuyun Dai, Gustavo Hernández Ábrego, Ji Ma, Vincent Y. Zhao, Yi Luan, Keith B. Hall, Ming-Wei Chang, Yinfei Yang

    Abstract: It has been shown that dual encoders trained on one domain often fail to generalize to other domains for retrieval tasks. One widespread belief is that the bottleneck layer of a dual encoder, where the final score is simply a dot-product between a query vector and a passage vector, is too limited to make dual encoders an effective retrieval model for out-of-domain generalization. In this paper, we…

    Submitted 15 December, 2021; originally announced December 2021.

  9. arXiv:2108.08877  [pdf, other]

    cs.CL

    Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models

    Authors: Jianmo Ni, Gustavo Hernández Ábrego, Noah Constant, Ji Ma, Keith B. Hall, Daniel Cer, Yinfei Yang

    Abstract: We provide the first exploration of sentence embeddings from text-to-text transformers (T5). Sentence embeddings are broadly useful for language processing tasks. While T5 achieves impressive performance on language tasks cast as sequence-to-sequence mapping problems, it is unclear how to produce sentence embeddings from encoder-decoder models. We investigate three methods for extracting T5 senten…

    Submitted 14 December, 2021; v1 submitted 19 August, 2021; originally announced August 2021.

  10. arXiv:2108.02333  [pdf, other]

    cs.HC cs.GR

    Professional Differences: A Comparative Study of Visualization Task Performance and Spatial Ability Across Disciplines

    Authors: Kyle Wm. Hall, Anthony Kouroupis, Anastasia Bezerianos, Danielle Albers Szafir, Christopher Collins

    Abstract: Problem-driven visualization work is rooted in deeply understanding the data, actors, processes, and workflows of a target domain. However, an individual's personality traits and cognitive abilities may also influence visualization use. Diverse user needs and abilities raise natural questions for specificity in visualization design: Could individuals from different domains exhibit performance diff…

    Submitted 4 August, 2021; originally announced August 2021.

    Comments: The paper has been accepted to IEEE VIS 2021, and will appear in IEEE TVCG. 11 pages with 9 figures

    ACM Class: H.5.0; H.5.2; I.3.0

  11. arXiv:2010.00200  [pdf, other]

    cs.IR cs.CL

    RRF102: Meeting the TREC-COVID Challenge with a 100+ Runs Ensemble

    Authors: Michael Bendersky, Honglei Zhuang, Ji Ma, Shuguang Han, Keith Hall, Ryan McDonald

    Abstract: In this paper, we report the results of our participation in the TREC-COVID challenge. To meet the challenge of building a search engine for rapidly evolving biomedical collection, we propose a simple yet effective weighted hierarchical rank fusion approach, that ensembles together 102 runs from (a) lexical and semantic retrieval systems, (b) pre-trained and fine-tuned BERT rankers, and (c) releva…

    Submitted 1 October, 2020; originally announced October 2020.

    Comments: 14 pages

  12. arXiv:2007.01176  [pdf]

    cs.CL

    Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset

    Authors: Brian Roark, Lawrence Wolf-Sonkin, Christo Kirov, Sabrina J. Mielke, Cibu Johny, Isin Demirsahin, Keith Hall

    Abstract: This paper describes the Dakshina dataset, a new resource consisting of text in both the Latin and native scripts for 12 South Asian languages. The dataset includes, for each language: 1) native script Wikipedia text; 2) a romanization lexicon; and 3) full sentence parallel data in both a native script of the language and the basic Latin alphabet. We document the methods used for preparation and s…

    Submitted 2 July, 2020; originally announced July 2020.

    Comments: Published at LREC 2020

  13. arXiv:2005.14288  [pdf, other]

    cs.CV cs.IR

    ePillID Dataset: A Low-Shot Fine-Grained Benchmark for Pill Identification

    Authors: Naoto Usuyama, Natalia Larios Delgado, Amanda K. Hall, Jessica Lundin

    Abstract: Identifying prescription medications is a frequent task for patients and medical professionals; however, this is an error-prone task as many pills have similar appearances (e.g. white round pills), which increases the risk of medication errors. In this paper, we introduce ePillID, the largest public benchmark on pill image recognition, composed of 13k images representing 9804 appearance classes (t…

    Submitted 7 September, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

    Comments: CVPR 2020 VL3. Project Page: https://github.com/usuyama/ePillID-benchmark

  14. arXiv:2004.14503  [pdf, other]

    cs.IR cs.CL

    Zero-shot Neural Passage Retrieval via Domain-targeted Synthetic Question Generation

    Authors: Ji Ma, Ivan Korotkov, Yinfei Yang, Keith Hall, Ryan McDonald

    Abstract: A major obstacle to the wide-spread adoption of neural retrieval models is that they require large supervised training sets to surpass traditional term-based techniques, which are constructed from raw corpora. In this paper, we propose an approach to zero-shot learning for passage retrieval that uses synthetic question generation to close this gap. The question generation system is trained on gene…

    Submitted 27 January, 2021; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: 14 pages, 4 figures

  15. Design by Immersion: A Transdisciplinary Approach to Problem-Driven Visualizations

    Authors: Kyle Wm. Hall, Adam J. Bradley, Uta Hinrichs, Samuel Huron, Jo Wood, Christopher Collins, Sheelagh Carpendale

    Abstract: While previous work exists on how to conduct and disseminate insights from problem-driven visualization projects and design studies, the literature does not address how to accomplish these goals in transdisciplinary teams in ways that advance all disciplines involved. In this paper we introduce and define a new methodological paradigm we call design by immersion, which provides an alternative pers…

    Submitted 17 October, 2019; v1 submitted 1 August, 2019; originally announced August 2019.

    Comments: The paper has been accepted to IEEE VIS (InfoVis) 2019, and will appear in IEEE TVCG. ACM 2012 CCS - Human-centered computing, Visualization, Visualization design and evaluation methods

    ACM Class: H.5.0; H.5.2; I.3.6; I.3.8

  16. Unwind: Interactive Fish Straightening

    Authors: Francis Williams, Alexander Bock, Harish Doraiswamy, Cassandra Donatelli, Kayla Hall, Adam Summers, Daniele Panozzo, Cláudio T. Silva

    Abstract: The ScanAllFish project is a large-scale effort to scan all the world's 33,100 known species of fishes. It has already generated thousands of volumetric CT scans of fish species which are available on open access platforms such as the Open Science Framework. To achieve a scanning rate required for a project of this magnitude, many specimens are grouped together into a single tube and scanned all a…

    Submitted 5 February, 2020; v1 submitted 9 April, 2019; originally announced April 2019.

  17. arXiv:cs/0605051  [pdf, ps, other]

    cs.IT

    A General Method for Finding Low Error Rates of LDPC Codes

    Authors: Chad A. Cole, Stephen G. Wilson, Eric K. Hall, Thomas R. Giallorenzi

    Abstract: This paper outlines a three-step procedure for determining the low bit error rate performance curve of a wide class of LDPC codes of moderate length. The traditional method to estimate code performance in the higher SNR region is to use a sum of the contributions of the most dominant error events to the probability of error. These dominant error events will be both code and decoder dependent, co…

    Submitted 11 May, 2006; originally announced May 2006.

    Comments: Submitted Trans. Inf. Theory