Showing 1–8 of 8 results for author: Minhas, U F

Searching in archive cs.
  1. arXiv:2404.01626  [pdf, other]

    cs.CL cs.IR

    Entity Disambiguation via Fusion Entity Decoding

    Authors: Junxiong Wang, Ali Mousavi, Omar Attia, Ronak Pradeep, Saloni Potdar, Alexander M. Rush, Umar Farooq Minhas, Yunyao Li

    Abstract: Entity disambiguation (ED), which links the mentions of ambiguous entities to their referent entities in a knowledge base, serves as a core component in entity linking (EL). Existing generative approaches demonstrate improved accuracy compared to classification approaches under the standardized ZELDA benchmark. Nevertheless, generative approaches suffer from the need for large-scale pre-training a…

    Submitted 7 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted at NAACL'24 main

  2. arXiv:2311.15781  [pdf, other]

    cs.AI cs.CL cs.LG

    Increasing Coverage and Precision of Textual Information in Multilingual Knowledge Graphs

    Authors: Simone Conia, Min Li, Daniel Lee, Umar Farooq Minhas, Ihab Ilyas, Yunyao Li

    Abstract: Recent work in Natural Language Processing and Computer Vision has been using textual information (e.g., entity names and descriptions) available in knowledge graphs to ground neural models to high-quality structured data. However, for non-English languages, textual information is comparatively scarce in both quantity and quality. To address this issue, we introduce the novel task…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: Camera ready for EMNLP 2023

  3. Growing and Serving Large Open-domain Knowledge Graphs

    Authors: Ihab F. Ilyas, JP Lacerda, Yunyao Li, Umar Farooq Minhas, Ali Mousavi, Jeffrey Pound, Theodoros Rekatsinas, Chiraag Sumanth

    Abstract: Applications of large open-domain knowledge graphs (KGs) to real-world problems pose many unique challenges. In this paper, we present extensions to Saga, our platform for continuous construction and serving of knowledge at scale. In particular, we describe a pipeline for training knowledge graph embeddings that powers key capabilities such as fact ranking, fact verification, a related entities ser…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: To be published in SIGMOD 2023

  4. arXiv:2304.01926  [pdf]

    cs.DB cs.AI cs.LG

    High-Throughput Vector Similarity Search in Knowledge Graphs

    Authors: Jason Mohoney, Anil Pacaci, Shihabur Rahman Chowdhury, Ali Mousavi, Ihab F. Ilyas, Umar Farooq Minhas, Jeffrey Pound, Theodoros Rekatsinas

    Abstract: There is an increasing adoption of machine learning for encoding data into vectors to serve online recommendation and search use cases. As a result, recent data management systems propose augmenting query processing with online vector similarity search. In this work, we explore vector similarity search in the context of Knowledge Graphs (KGs). Motivated by the tasks of finding related KG queries a…

    Submitted 4 April, 2023; originally announced April 2023.

    Comments: 13 pages, 7 figures, to be published in ACM SIGMOD 2023

  5. arXiv:2111.14905  [pdf, other]

    cs.DB cs.LG

    Bounding the Last Mile: Efficient Learned String Indexing

    Authors: Benjamin Spector, Andreas Kipf, Kapil Vaidya, Chi Wang, Umar Farooq Minhas, Tim Kraska

    Abstract: We introduce the RadixStringSpline (RSS) learned index structure for efficiently indexing strings. RSS is a tree of radix splines, each indexing a fixed number of bytes. RSS approaches or exceeds the performance of traditional string indexes while using 7-70× less memory. RSS achieves this by using the minimal string prefix needed to sufficiently distinguish the data, unlike most learned approaches…

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: 3rd International Workshop on Applied AI for Database Systems and Applications (AIDB'21), August 20, 2021, Copenhagen, Denmark

  6. APEX: A High-Performance Learned Index on Persistent Memory

    Authors: Baotong Lu, Jialin Ding, Eric Lo, Umar Farooq Minhas, Tianzheng Wang

    Abstract: The recently released persistent memory (PM) offers high performance and persistence, and is cheaper than DRAM. This opens up new possibilities for indexes that operate and persist data directly on the memory bus. Recent learned indexes exploit data distribution and have shown great potential for some workloads. However, none support persistence or instant recovery, and existing PM-based indexes typi…

    Submitted 6 December, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: To appear at VLDB 2022 (PVLDB Vol. 15 Issue 3)

  7. arXiv:2004.10898  [pdf, other]

    cs.DB cs.DS cs.LG

    Qd-tree: Learning Data Layouts for Big Data Analytics

    Authors: Zongheng Yang, Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li, Umar Farooq Minhas, Per-Åke Larson, Donald Kossmann, Rajeev Acharya

    Abstract: Corporations today collect data at an unprecedented and accelerating scale, making the need to run queries on large datasets increasingly important. Technologies such as columnar block-based data organization and compression have become standard practice in most commercial database systems. However, the problem of best assigning records to data blocks on storage is still open. For example, today's…

    Submitted 22 April, 2020; originally announced April 2020.

    Comments: ACM SIGMOD 2020

  8. arXiv:1905.08898  [pdf, other]

    cs.DB cs.DS cs.LG

    ALEX: An Updatable Adaptive Learned Index

    Authors: Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David Lomet, Tim Kraska

    Abstract: Recent work on "learned indexes" has changed the way we look at the decades-old field of DBMS indexing. The key idea is that indexes can be thought of as "models" that predict the position of a key in a dataset. Indexes can, thus, be learned. The original work by Kraska et al. shows that a learned index beats a B+Tree by a factor of up to three in search time and by an order of magnitude in memory…

    Submitted 20 May, 2020; v1 submitted 21 May, 2019; originally announced May 2019.

    Report number: MSR-TR-2020-12