Paloma: A Benchmark for Evaluating Language Model Fit

Magnusson, Ian; Bhagia, Akshita; Hofmann, Valentin; Soldaini, Luca; Jha, Ananya Harsh; Tafjord, Oyvind; Schwenk, Dustin; Walsh, Evan Pete; Elazar, Yanai; Lo, Kyle; Groeneveld, Dirk; Beltagy, Iz; Hajishirzi, Hannaneh; Smith, Noah A.; Richardson, Kyle; Dodge, Jesse

Computer Science > Computation and Language

arXiv:2312.10523 (cs)

[Submitted on 16 Dec 2023]

Title: Paloma: A Benchmark for Evaluating Language Model Fit

Authors: Ian Magnusson, Akshita Bhagia, Valentin Hofmann, Luca Soldaini, Ananya Harsh Jha, Oyvind Tafjord, Dustin Schwenk, Evan Pete Walsh, Yanai Elazar, Kyle Lo, Dirk Groeneveld, Iz Beltagy, Hannaneh Hajishirzi, Noah A. Smith, Kyle Richardson, Jesse Dodge


Abstract: Language models (LMs) commonly report perplexity on monolithic data held out from training. Implicitly or explicitly, this data is composed of domains: varying distributions of language. Rather than assuming perplexity on one distribution extrapolates to others, Perplexity Analysis for Language Model Assessment (Paloma) measures LM fit to 585 text domains, ranging from this http URL to r/depression on Reddit. We invite submissions to our benchmark and organize results by comparability based on compliance with guidelines such as removal of benchmark contamination from pretraining. Submissions can also record parameter and training token counts, enabling comparisons of Pareto efficiency, that is, performance as a function of these measures of cost. We populate our benchmark with results from 6 baselines pretrained on popular corpora. In case studies, we demonstrate analyses that are possible with Paloma, such as finding that pretraining without data beyond Common Crawl leads to inconsistent fit to many domains.
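The contrast between a single monolithic perplexity and per-domain fit can be made concrete with a small sketch. This is not Paloma's released evaluation code. It assumes per-token natural-log probabilities are already available and tagged with a domain label. The function names, the toy domains ("news", "reddit"), and the use of a single scalar cost (for example parameter count) are hypothetical illustrations. The sketch computes perplexity per domain, compares it with a pooled number over all tokens, and then keeps the models that are Pareto-efficient in (cost, perplexity).

```python
import math
from collections import defaultdict


def domain_perplexities(records):
    """Per-domain perplexity from (domain, token_logprob) pairs.

    Assumes natural-log token probabilities; perplexity is exp of the
    mean negative log-likelihood per token within each domain.
    """
    total_nll = defaultdict(float)
    n_tokens = defaultdict(int)
    for domain, logprob in records:
        total_nll[domain] -= logprob  # accumulate negative log-likelihood
        n_tokens[domain] += 1
    return {d: math.exp(total_nll[d] / n_tokens[d]) for d in total_nll}


def pooled_perplexity(records):
    """Single 'monolithic' perplexity over all tokens, ignoring domains."""
    records = list(records)
    nll = -sum(lp for _, lp in records)
    return math.exp(nll / len(records))


def pareto_frontier(points):
    """Names of (name, cost, perplexity) points not dominated by any other.

    Lower is better for both cost and perplexity.
    """
    frontier = []
    for name, cost, ppl in points:
        dominated = any(
            c2 <= cost and p2 <= ppl and (c2 < cost or p2 < ppl)
            for _, c2, p2 in points
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Toy data: a large, easy domain and a small, hard one (hypothetical values).
records = [("news", -1.0)] * 4 + [("reddit", -3.0)]
print(domain_perplexities(records))  # news ~2.718, reddit ~20.09
print(pooled_perplexity(records))    # ~4.055, dominated by the larger domain

# Hypothetical models as (name, parameter count, perplexity).
models = [("A", 1e9, 20.0), ("B", 7e9, 15.0), ("C", 7e9, 18.0)]
print(pareto_frontier(models))  # ['A', 'B']
```

The pooled value weights each domain by its token count, so poor fit to a small domain barely moves it. Reporting each domain separately keeps that gap visible.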

Comments:	Project Page: this https URL
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2312.10523 [cs.CL]
	(or arXiv:2312.10523v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2312.10523

Submission history

From: Ian Magnusson
[v1] Sat, 16 Dec 2023 19:12:45 UTC (5,893 KB)