
Showing 1–2 of 2 results for author: Hussein, G

Searching in archive cs.
  1. arXiv:2408.08926 [pdf, other]

    cs.CR cs.AI cs.CL cs.CY cs.LG

    Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models

    Authors: Andy K. Zhang, Neil Perry, Riya Dulepet, Eliot Jones, Justin W. Lin, Joey Ji, Celeste Menders, Gashon Hussein, Samantha Liu, Donovan Jasper, Pura Peetathawatchai, Ari Glenn, Vikram Sivashankar, Daniel Zamoshchin, Leo Glikbarg, Derek Askaryar, Mike Yang, Teddy Zhang, Rishi Alluri, Nathan Tran, Rinnara Sangpisit, Polycarpos Yiorkadjis, Kenny Osele, Gautham Raghupathi, Dan Boneh, et al. (2 additional authors not shown)

    Abstract: Language Model (LM) agents for cybersecurity that are capable of autonomously identifying vulnerabilities and executing exploits have the potential to cause real-world impact. Policymakers, model providers, and other researchers in the AI and cybersecurity communities are interested in quantifying the capabilities of such agents to help mitigate cyberrisk and investigate opportunities for penetrat…

    Submitted 15 August, 2024; originally announced August 2024.

    Comments: 86 pages, 7 figures

  2. arXiv:2405.00715 [pdf, other]

    cs.CL cs.AI cs.LG

    Adapting Open-Source Large Language Models for Cost-Effective, Expert-Level Clinical Note Generation with On-Policy Reinforcement Learning

    Authors: Hanyin Wang, Chufan Gao, Bolun Liu, Qiping Xu, Guleid Hussein, Mohamad El Labban, Kingsley Iheasirim, Hariprasad Korsapati, Chuck Outcalt, Jimeng Sun

    Abstract: Proprietary Large Language Models (LLMs) such as GPT-4 and Gemini have demonstrated promising capabilities in clinical text summarization tasks. However, due to patient data privacy concerns and computational costs, many healthcare providers prefer using small, locally-hosted models over external generic LLMs. This study presents a comprehensive domain- and task-specific adaptation process for the…

    Submitted 9 June, 2024; v1 submitted 25 April, 2024; originally announced May 2024.