Showing 1–1 of 1 results for author: Gelmi, M

  1. arXiv:2407.15762

    cs.LG cs.AI cs.CL

    Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning

    Authors: Kaiwen Wang, Rahul Kidambi, Ryan Sullivan, Alekh Agarwal, Christoph Dann, Andrea Michi, Marco Gelmi, Yunxuan Li, Raghav Gupta, Avinava Dubey, Alexandre Ramé, Johan Ferret, Geoffrey Cideron, Le Hou, Hongkun Yu, Amr Ahmed, Aranyak Mehta, Léonard Hussenot, Olivier Bachem, Edouard Leurent

    Abstract: Reward-based finetuning is crucial for aligning language policies with intended behaviors (e.g., creativity and safety). A key challenge here is to develop steerable language models that trade off multiple (conflicting) objectives in a flexible and efficient manner. This paper presents Conditioned Language Policy (CLP), a general framework for finetuning language models on multiple objectives. Bui…

    Submitted 22 July, 2024; originally announced July 2024.

    Comments: 40 pages