Showing 1–4 of 4 results for author: López, J A H

  1. arXiv:2407.04147  [pdf, other]

    cs.SE

    ALPINE: An adaptive language-agnostic pruning method for language models for code

    Authors: Mootez Saad, José Antonio Hernández López, Boqi Chen, Dániel Varró, Tushar Sharma

    Abstract: Language models of code have demonstrated state-of-the-art performance across various software engineering and source code analysis tasks. However, their demanding computational resource requirements and consequential environmental footprint remain as significant challenges. This work introduces ALPINE, an adaptive programming language-agnostic pruning technique designed to substantially reduce th…

    Submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2401.07930  [pdf, other]

    cs.SE

    On Inter-dataset Code Duplication and Data Leakage in Large Language Models

    Authors: José Antonio Hernández López, Boqi Chen, Mootez Saad, Tushar Sharma, Dániel Varró

    Abstract: Motivation. Large language models (LLMs) have exhibited remarkable proficiency in diverse software engineering (SE) tasks. Handling such tasks typically involves acquiring foundational coding knowledge on large, general-purpose datasets during a pre-training phase, and subsequently refining on smaller, task-specific datasets as part of a fine-tuning phase. Problem statement. While intra-dataset…

    Submitted 1 August, 2024; v1 submitted 15 January, 2024; originally announced January 2024.

  3. arXiv:2206.11719  [pdf, other]

    cs.CL cs.AI cs.LG cs.PL cs.SE

    AST-Probe: Recovering abstract syntax trees from hidden representations of pre-trained language models

    Authors: José Antonio Hernández López, Martin Weyssow, Jesús Sánchez Cuadrado, Houari Sahraoui

    Abstract: The objective of pre-trained language models is to learn contextual representations of textual data. Pre-trained language models have become mainstream in natural language processing and code modeling. Using probes, a technique to study the linguistic properties of hidden vector spaces, previous works have shown that these pre-trained language models encode simple linguistic properties in their hi…

    Submitted 10 September, 2022; v1 submitted 23 June, 2022; originally announced June 2022.

  4. arXiv:2008.11858  [pdf, other]

    cs.SE cs.IR

    MAR: A structure-based search engine for models

    Authors: José Antonio Hernández López, Jesús Sánchez Cuadrado

    Abstract: The availability of shared software models provides opportunities for reusing, adapting and learning from them. Public models are typically stored in a variety of locations, including model repositories, regular source code repositories, web pages, etc. To profit from them developers need effective search mechanisms to locate the models relevant for their tasks. However, to date, there has been li…

    Submitted 26 August, 2020; originally announced August 2020.