Large Language Models TAP 2022 Final 051622
What's in the Chatterbox?
Large Language Models, Why They Matter, and What We Should Do About Them
Johanna Okerlund
Evan Klasky
Aditya Middha
Sujin Kim
Hannah Rosenfeld
Molly Kleinman
Shobita Parthasarathy
Contents

Acronyms and Definitions
Executive Summary
Introduction
Background: How do Large Language Models Work?

IMPLICATIONS OF LLM ADOPTION
Section 4: Reinforcing Social Inequalities

LLM CASE STUDY
Section 7: Transforming the Scientific Landscape

Policy Recommendations
Developers' Code of Conduct
AI: Artificial intelligence

App developer: Developers who integrate the LLM into an app or product that is deployed for others to use

CI: Cochlear implant

Corpus (plural, corpora): Dataset consisting of text-based documents that an LLM is trained on

End user: Person or entity that uses an app or product built on top of an LLM; we also refer to them as users

GPU: Graphics processing unit; a highly parallel computing circuit used for fast processing

Open source: Software for which the original source code is openly available and licensed so that future developers can use and build on it, so long as they promise to keep their source code open so others can innovate beyond it

Parameters: LLM size is measured in parameters; the more parameters there are, the more complex information about language a model can store
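To make the "parameters" entry concrete: a model's parameter count is simply the total number of adjustable weights in its layers. The sizes below are hypothetical, chosen only to show how the count grows multiplicatively; this is a toy sketch, not any particular model's architecture.

```python
# Toy illustration with hypothetical layer sizes: a model's parameter
# count is the total number of entries in its weight matrices and biases.
vocab_size = 50_000    # distinct tokens the model knows
embedding_dim = 512    # vector length used to represent each token
hidden_dim = 2_048     # width of one internal layer

embedding_params = vocab_size * embedding_dim            # token lookup table
hidden_params = embedding_dim * hidden_dim + hidden_dim  # weights + biases
output_params = hidden_dim * vocab_size + vocab_size     # projection back to vocab

total = embedding_params + hidden_params + output_params
print(f"{total:,} parameters")  # 129,100,624 for this toy configuration
```

Scaling any one of these dimensions, or stacking many such layers, multiplies the total, which is how models reach the billions or trillions of parameters discussed in this report.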
Executive Summary
Large language models (LLMs)—machine learning algorithms that can recognize, summarize, translate, predict, and generate human languages on the basis of very large text-based datasets—are likely to provide the most convincing computer-generated imitation of human language yet. Because language generated by LLMs will be more sophisticated and human-like than their predecessors', and because they perform better on tasks for which they have not been explicitly trained, we expect that they will be widely used. Policymakers might use them to assess public sentiment about pending legislation, patients could summarize and evaluate the state of biomedical knowledge to empower their interactions with healthcare professionals, and scientists could translate research findings across languages. In sum, LLMs have the potential to transform how and with whom we communicate.

Our analogical case study (ACS) approach compares LLMs to similar technologies–in terms of form, function, and impacts–to anticipate the implications of emerging technologies.

This report first summarizes the LLM landscape and the technology's basic features. We then outline the implications identified through our ACS approach. We conclude that LLMs will produce enormous social change, including: 1) exacerbating environmental injustice; 2) accelerating our thirst for data; 3) becoming quickly integrated into existing infrastructure; 4) reinforcing inequality; 5) reorganizing labor and expertise; and 6) increasing social fragmentation. LLMs will transform a range of sectors, but the final section of the report focuses on how these changes could unfold in one specific area: scientific research. Finally, using these insights we provide informed guidance on how to develop, manage, and govern LLMs.
Although some organizations are developing more transparent and open approaches to LLMs, they are supported by the same venture capital firms and tech companies shaping the industry overall. Meanwhile, although there are many academic researchers in this area, they tend to depend on the private sector for LLM access and therefore work in partnership with them. Government funding agencies, including the National Science Foundation, support these collaborations. This tightness in the LLM development landscape means that even seemingly alternative or democratic approaches to LLM development are likely to reinforce the priorities and biases of large companies.

Developing an LLM involves three steps. First, developers assemble a corpus of text-based documents, often taking advantage of collections of digitized books and user-generated content on the internet. Second, the model learns about word relationships from this data. Large models are able to retain complex patterns, such as how sentences, paragraphs, and documents are structured. Finally, developers assess and manually fine-tune the model to address undesirable language patterns it may have learned from the data.

After the model is trained, a human can use it by feeding it a sentence or paragraph, to which the model will respond with a sentence or paragraph that it determines is appropriate to follow. Developers are under no obligation
We are concerned about the strain that LLM computing will place on scarce resources and its subsequent effects. Already, residents near Google and Microsoft data centers on the West Coast have expressed concerns about the companies' overconsumption of water and contribution to toxic air pollution. Unfortunately, it is unlikely that these concerns will influence siting decisions; like oil and gas pipelines, we expect that data centers will be legally classified as "critical infrastructure". Attempted protests will be treated as criminal offenses.

Accelerating the Thirst for Data

We are also concerned that LLM developers will turn to unethical methods of data collection in order to diversify the corpora. As noted above, researchers have already demonstrated how LLMs reflect historical biases about race, gender, religion, and sexuality. The best way to address these biases is to ensure that the corpora include more texts authored by people from marginalized communities. However, this poses serious risks of unethical data extraction, such as when Google attempted to improve the accuracy of its facial recognition technology by, in part, taking pictures of homeless people without complete informed consent.
Our analysis shows that LLMs are likely to reinforce inequalities in a few ways. In addition to producing biased text, they will reinforce the inequitable distribution of resources by continuing to favor those who are privileged through their design. For example, racial bias is already embedded in medical devices such as the spirometer, which is used to measure lung function. The technology considers race in its assessment of "normal" lung function, falsely assuming that Black people naturally have lower lung function than their white counterparts. This makes it more difficult for Black people to access treatment. Similarly, imagine an LLM app designed to summarize insights from previous scientific publications and generate health care recommendations accordingly. If those publications contain biased assumptions, or simply ignore the needs of particular groups, the LLM's advice is likely to be inaccurate too. We expect similar scenarios in other domains including criminal justice, housing, and education, where biased texts are likely to generate advice that perpetuates inequities in resource allocation. Unfortunately, because the models are opaque and appear objective, it will be difficult to identify and address such problems. As a result, individuals will bear the brunt of them.

LLMs are likely to function best in their dominant training language. Eventually this will reinforce the dominance of standard American English in ways that will expedite the extinction of lesser-known languages or dialects, and contribute to the cultural erasure of marginalized people. Furthermore, because they are based on historical texts, LLMs are likely to preserve limited, historically suspended understandings, especially of the non-American or Chinese cultures represented in their corpora.

Remaking Labor and Expertise

Most people studying the impact of automation expect job losses, particularly for those in lower skilled occupations. In the case of LLMs, we expect job losses to be more prevalent in professions tightly coupled with previous technologies; LLMs will completely eliminate certain jobs, such as some moderation of social media, while creating new kinds of tech-based work. But our analysis suggests that LLMs are also likely to transform labor. In particular, we expect that with widespread adoption LLMs will perform mundane tasks while shifting humans to
Professions that heavily use writing (e.g., law, academia, journalism) will have to develop new standards and mechanisms for evaluating authorship and authenticity. For example, the invention of the typewriter led to the creation of the "document examiner" position to determine the provenance of typed text; we could imagine a similar job for LLM-based text. Finally, we expect widespread use of LLMs to trigger labor resistance. There is a long legacy of technology-driven labor unrest, including the Luddites of the 19th century. More recently, the United Food and Commercial Workers International Union developed public campaigns against Amazon's cashierless grocery store model. LLMs will incite similar resistance from workers and consumers based on fear of job loss, violations of social norms, and reduced income taxes.

We also expect LLMs to make it easier for people to consume only text that aligns with their interests and values, and to erode shared realities further.

Finally, as LLMs get better at writing text that is indistinguishable from something a human could have written, they will not only challenge the cultural position of authors but also trust in their authorship. For example, many schools and universities today use plagiarism detection technologies to prevent student cheating. However, this has triggered a technological arms race. A variety of services have emerged to help students cheat while evading detection by Turnitin, from websites full of how-to advice to paid essay writing services. LLMs will trigger a similar dynamic. The more writers of all kinds use LLMs for assistance, the more efforts there will be to authenticate whether they "really" wrote their article or book, and the more writers will find new ways to take advantage of LLM
In addition to shaping access to knowledge, we expect that LLMs will transform scientific knowledge itself. Technologies, from the microscope to the superconducting supercollider, have long shaped the substance of research, and LLMs will be no exception. We expect fields that analyze text, including the digital humanities, to be the most affected. Researchers will need to develop standard protocols on how to scrutinize insights generated by LLMs and how to cite LLM output so that others can replicate the results. LLMs are likely to have profound impacts on the nature of scientific inquiry as well, by encouraging recent trends that focus on finding patterns in big data rather than establishing causal relationships.

LLMs are also likely to transform scientific evaluation systems. Editors currently struggle to find peer reviewers, and LLMs could help. However, LLMs are likely to be rigid and

Finally, we expect that LLMs will help some researchers improve their English or Chinese writing skills and increase their publications in top journals. The technology will likely be particularly useful for scholars from British Commonwealth countries whose language may differ only slightly from standard English. However, we expect translation in and out of other languages to be poor, and researchers unfortunately may not always be aware of such limitations at the outset. Meanwhile, the more common LLMs become as a scientific tool, the more they will reinforce English as the lingua franca of science. This will likely also mean that the values and concerns of the English-speaking world–particularly the United States and Britain–will dominate global scientific priorities. And yet, these political implications may remain hidden because LLMs will be promoted as a technology that will be able to truly globalize science.
Introduction
Large language models (LLMs) are a type of artificial intelligence (AI) intended to recognize, generate, summarize, and translate human language. They are different from previous approaches to natural language processing (NLP) because they are based on enormous datasets and designed to extract and replicate the rules of language (Radford et al., 2019). Although some smaller scale language automation algorithms are currently in use, LLMs have the potential to transform how and with whom we communicate because their output is likely to be more sophisticated and human-like than their predecessors', and because they perform better on tasks for which they have not been explicitly trained. To create LLMs, developers use machine learning techniques to model the relationships between different text elements based on extremely large datasets of text from internet and book archives. Once the LLM model is complete, it can be applied to tasks like automated question answering, translation, text summarization, and chatbots (Tamkin et al., 2021).

Scientists, entrepreneurs, and tech-watchers excited about LLMs describe them as a revolutionary technology with potential applications in a dizzying array of contexts and fields (Bommasani et al., 2021; Dale, 2021). LLMs could be used to bolster international collaboration in science, provide legal services to those who traditionally can't afford them, and help patients advocate for their health care (Bommasani et al., 2021). Their ability to answer questions and hold conversations could transform customer service (Dale, 2021). In the classroom they could be used to create virtual teachers personalized to a student's learning style (Manjoo, 2020). And, because LLMs gain new functionalities as the scale of their datasets increases, enthusiasts claim that future LLMs will develop new and unforeseen applications with additional benefits (Seabrook, 2019).

Despite these promises, LLMs have already prompted controversies that complicate these claims. Because LLMs are trained on datasets that include substantial quantities of old texts that often contain antiquated and violently prejudiced language, LLMs repeat and perpetuate those same violent tendencies (Abid et al., 2021; Tamkin et al., 2021). The large number of computers and colossal amount of computing power required to both train and operate LLMs lead to resource extraction that degrades the environment, and carbon emissions that contribute to climate change (Bender & Gebru et al., 2021). Their ability to produce text that sounds human with minimal prompts makes LLMs a potential tool to efficiently and effectively manufacture propaganda and disinformation through false news articles and social media posts (Tamkin et al., 2021). Most importantly, critics point out that these equity and environmental problems are likely to go unaddressed because the high cost of running LLMs has made their use exclusive to very large and well-resourced corporations, creating economic barriers and limiting access to only wealthier and more powerful entities (Knight, 2021).

In this report, we anticipate the potential implications of LLMs by analyzing the history of similar technologies, using what we call an analogical case study method. We then focus on one domain where LLMs are likely to have a significant impact: scientific research. We conclude with recommendations for both policymakers and the scientific research community, and a "code of conduct" to guide the practices of LLM developers.

The "Stochastic Parrots" Controversy
Large language models gained notoriety in the wake of the firing of ex-Google employees Timnit Gebru and Margaret Mitchell. Gebru co-led Google's "Ethical AI team" with Margaret Mitchell, and along with academic and Google colleagues, co-authored a paper on the risks and failings of LLMs called "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" The paper raised concerns about the environmental impacts, and problems with training data including unmanageability, encoded bias, and lack of accountability (Bender & Gebru et al., 2021). According to Gebru, in winter 2020 Google attempted to prevent her from releasing the paper without major revisions; when Gebru refused, they fired her (Metz & Wakabayashi, 2020). The ensuing controversy eventually led to Mitchell's firing, and publication of the original version of the paper without Google's edits (Bender & Gebru et al., 2021; Simonite, 2021).

LLMs are still new and experimental, and therefore their social impact is still emerging. But history teaches us that their impact will be profoundly shaped by those creating it, and at present, because of the enormous capacity and resources needed, the primary developers are a handful of large tech companies. As we discuss throughout the report, this should give us some cause for concern. We noted above that LLMs could improve and increase access to specialized expertise from law, medicine, science, and more. They could also make technical information more widely available to the public, ultimately empowering individuals and communities. However, it is unlikely that the technology will be able to achieve any of these benefits if it is built by a narrow group of elites and without proper technology assessment and
The ELIZA program, created by Joseph Weizenbaum in 1964 at the Massachusetts Institute of Technology's (MIT) Artificial Intelligence Lab, was one of the first language automation programs (Hutchins, 2003). From the mid 1950s to 1964, most federal research funding focused on machine translation, specifically Russian to English translations for the Cold War. In 1964, the Department of Defense, National Science Foundation, and Central Intelligence Agency, among other branches of the government, created the Automatic Language Processing Advisory Committee (ALPAC) to assess the utility of this work in terms of advancing government priorities, reducing costs, improving performance, or addressing a need that humans could not fill. To evaluate the translation program, ALPAC compared Russian to English machine translations to human translations based on intelligibility and fidelity, and found that humans far outperformed machines, human translation was less costly after edits, and that the government had sufficient capacity for Russian translation. Based on these findings, the committee recommended that defense agencies stop funding machine translation, and that the NSF switch to only funding basic computational linguistics research.

As a result of the ALPAC report, the federal government largely stopped funding machine translation research until the Advanced Research Projects Agency took up the subject again in the 1990s, but private industry continued to tackle machine language projects aimed at other uses like text generation and conversation. During that period, research in machine language moved towards methods of representing and communicating meaning in dialogue and natural language generation, which more closely resembles the goals of today's LLMs (Hutchins, 2003). For example, in 1969 and 1970, researchers introduced new methods

Aside from ELIZA, most early language automation research was driven by U.S. military priorities, and government funders largely considered them inadequate for representing language input designed to help machines develop conceptual understandings of words so their responses could be more useful (Moltzau, 2020). These developments led to the first program that used simple natural language inputs - language written as it normally would be in human conversation - to control a machine at the Massachusetts Institute of Technology in 1971. By 1985, the main uses developers imagined for language programs could be loosely categorized into six groups: 1) interfacing with databases, 2) conversational interfaces for programs, 3) content scanning of semi-formatted texts to determine actions, 4) text editing for grammar and style, 5) translation, and 6) transcription of spoken input (Dale, 2017).

By the 1980s, the increased availability of computing power allowed researchers to begin integrating statistical methods into natural language program development (Hutchins, 2003). These statistical approaches, called Natural Language Processing (NLP), essentially allowed the computer to "learn" for itself how language works, by identifying patterns from text-based "training" datasets rather than relying on researchers to lay out complex rules based on linguistics research. As the amount of digital text grew, researchers were able to compile larger and larger datasets, which improved the performance of these "statistical" language programs. Eventually statistical methods outperformed and replaced programs based on linguistic rules, and NLP has become an interdisciplinary field bringing together insights from linguistics, computer science, and artificial intelligence.
Stanford's Institute for Human-Centered Artificial Intelligence, for example, recently announced a new interdisciplinary research arm, the Center for Research on Foundation Models (CRFM), to study foundation models and partner with Massachusetts Institute of Technology's (MIT) Computer Science and Artificial Intelligence Lab (Stanford HAI, n.d.; MIT CSAIL, n.d.). The top cited LLM research papers are co-authored by researchers from industry and academia. In addition to providing university researchers with LLM access, these collaborations also provide the private sector counterparts with scholarly credibility.

Government funding agencies have increasingly encouraged university-industry collaboration. In 2019, the US National Science & Technology Council released a strategic plan that prioritized collaboration between academia and industry (Select Committee on Artificial Intelligence, 2019). The plan argued that these collaborations could address predictability, ethics, and legal questions proactively, before AI products are developed. This had an immediate impact: the NSF, which had previously focused on AI research within universities, has now launched seven national AI research institutes (Gibson, 2020). Three of these have focus areas involving collaborations, often financing links between academia and industry.

There are also attempts to create open-source or more transparent LLMs, but some of these projects are backed by the same venture capital firms that fund the for-profit entities. For example, Hugging Face has developed open source resources for the production of LLMs, including datasets and models, and explicitly invokes the values of accessibility and democratizing innovation (Dillet, 2021). It supports users as they develop and upload their own content, allowing for transparency and collaboration in model and dataset construction. Hugging Face hired Margaret Mitchell, a co-author of the Stochastic Parrots paper (see the "Stochastic Parrots" text box above), to lead data governance efforts. In addition, Hugging Face has initiated the BigScience Project, an effort to create and share datasets, models, and software tools in order to reveal and minimize potential problems with LLMs (Hao, 2021). Similarly, EleutherAI has developed open source versions of GPT-3, as well as an 860 GB dataset for language modeling (EleutherAI, n.d.; Gao et al., 2020). While EleutherAI is volunteer-based, the collective depends on donated GPU compute from CoreWeave, which is part of the NVIDIA Preferred Cloud Services Provider program. This tightness in the LLM development landscape means that even these projects remain tied to dominant developers (AI Now Institute, 2021).
TABLE 1. INFLUENTIAL LLMS

BERT (Google AI Language, 2018, 340 million parameters): Demonstrates the transformer architecture, which is what enables all LLMs to be large. Open source (Devlin et al., 2018b): model and corpus are available to use and adapt for free.

GPT-3 (OpenAI, 2020, 175 billion parameters): Was the largest model at the time of release; demonstrates that new properties emerge simply by increasing the size of a model. App developers can apply for paid API access; OpenAI indicates this is a temporary safety measure.

GPT-J (EleutherAI, 2020, 6.7 billion parameters): Open source grassroots version of GPT-3. Open source: model and corpus are all available to use and adapt for free.

WuDao 2.0 (Beijing Academy of Artificial Intelligence (BAAI), 2021, 1.75 trillion parameters): First model to reach 1 trillion parameters; the model is multimodal (trained on both text and images). Open source (Romero, 2021b): model and corpus are all available to use and adapt for free.

UNIVERSITY OF MICHIGAN TECHNOLOGY ASSESSMENT PROJECT, APRIL 2022
TABLE 2. TAXONOMY OF ENTITIES INVOLVED IN LLM DEVELOPMENT AND DEPLOYMENT

App developer: Developers who integrate the LLM into an app or product that is deployed for others to use. Example: a company that develops an LLM-enabled platform for extracting insights from customer feedback.

End user: Person or entity that uses an app or product built on top of an LLM. Example: a company that uses an LLM-enabled platform for extracting insights from customer feedback.
When an application or end user gives an input, the LLM will be able to return the following ... technical reports or legislative text (Tamkin et al., 2021). Similarly, companies may be able to better understand and extract key information and takeaways from customer feedback or other interactions (Viable, n.d.).

... and integrating it into different end-user ... 2021), which raises questions about code ... "they don't spend money at all in a month" (Syverson, 2020).

Perhaps most immediately, companies are likely to use conversational LLMs to make more sophisticated customer service

... of the EU, 2018). The corpora themselves are not protected by their own copyright, ... proprietary and remain secretive about what ...

... US states. Because many LLM corpora include text scraped from the internet, they may ... or multiple pieces of text that the model could ...
affected, and solutions that might be feasible with emerging innovation (Nelkin, 1992). As Guston and Sarewitz (2002) argue: "knowledge about who has responded to transforming innovation in the past, the types of responses that they have used, and the avenues selected for pursuing those responses can be applied to understand connections between emerging areas of rapidly advancing science and specific patterns of societal response that may emerge" (p.101). By deliberately considering the histories of analogical technologies across sectors, our method identifies relevant social patterns in how technologies develop and are implemented. It also allows us to identify successful social and policy approaches to managing technological harms.

Our analytic approach to LLMs builds on the method we developed to study facial recognition technologies and vaccine hesitancy (Galligan et al., 2020; Wang et al., 2021). We began our work in May 2021 by training the research team, composed of a diverse group of faculty, staff, a postdoctoral fellow, and undergraduate and graduate students, in some of the basic concepts related to the history and sociology of technology and the ACS methodology. To help stimulate our creativity, we read some speculative fiction that imagines how AI might shape the future (Jemisin, 2011; Jemisin, 2012). We also reviewed the scholarly and journalistic literature to understand the projected implications of LLMs. However, because the technology is at such an early stage of development, this literature is small and it has been produced almost exclusively by NLP researchers and journalists. Team members used this literature as well as primary sources to develop an understanding of the history, political economy, and technical dimensions of LLMs.

We then brainstormed two types of analogical cases. Type 1 cases are similar to LLMs in terms of their function (i.e., processing large amounts of data, often with the purpose of prediction), while Type 2 cases have implications similar to those projected for LLMs (e.g., racial bias, massive energy use).

We investigated these cases, which intentionally draw from both historical and more recent technologies, in areas both similar to and different from LLMs. For example, to help us understand the implications of potential biases embedded in this emerging technology, we looked at medical technologies including the spirometer and pulse oximeter. To understand how LLMs might pose challenges to how we understand expertise and professional competence, we looked at traffic lights, which removed traffic management from the domain of law enforcement officers. We also looked at biobanks, large scale repositories of DNA and other forms of data used for the purpose of facilitating biomedical research and ultimately predicting and alleviating human disease. We adopted an iterative process: after we worked our way through the initial set of cases and presented our insights to one another, we reflected on the potential implications of LLMs. We then generated an additional list of cases, and so on until we were confident that we had exhausted the social, ethical, equity, and environmental implications that we could anticipate.
KEY POINTS
• LLMs are considered more “intelligent” than previous NLP efforts due to their capacity for
complex language patterns and ability to behave appropriately in novel situations.
• LLMs learn language from datasets of human-written text from the internet and digitized
books that are so large the developers who assemble them often do not know the entirety of
their contents.
• While LLM developers may be able to assess the performance of their models, there is no
standard approach.
• LLM developers can fix the model’s behavior through fine-tuning, but they must both
identify problems and develop solutions manually.
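The second key point above — that LLMs learn language patterns from human-written text — can be illustrated with a deliberately tiny sketch. Real LLMs learn statistical patterns with neural networks trained on enormous corpora; the toy model below (a hypothetical, minimal stand-in) only counts which word follows which, but it shows the basic idea of extracting word relationships from a corpus and using them to continue a prompt.

```python
from collections import defaultdict

# Tiny stand-in for the massive text corpora the report describes.
corpus = "the model learns patterns . the model generates text . the model learns patterns"
tokens = corpus.split()

# "Learning": count which word follows which (a bigram table).
follower_counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(tokens, tokens[1:]):
    follower_counts[prev][nxt] += 1

def continue_text(prompt, n_words=3):
    """Greedily extend the prompt with the most frequently observed next word."""
    words = prompt.split()
    for _ in range(n_words):
        followers = follower_counts.get(words[-1])
        if not followers:
            break  # word never seen in the corpus; a real LLM degrades more gracefully
        words.append(max(followers, key=followers.get))
    return " ".join(words)

print(continue_text("the model"))  # -> "the model learns patterns ."
```

Everything such a model can say is inherited from its corpus, which is why the composition of training data, discussed below, matters so much.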
LLMs differ from their predecessors in two critical ways. LLMs are much larger, both in terms of the massive amounts of data developers use to train them, and the millions of complex word patterns and associations the models contain. LLMs also more closely embody the promise of "artificial intelligence" than previous NLP efforts with smaller data sets because they can complete many types of tasks without being specifically trained for each one. They are "intelligent" because the complexity of the association model allows LLMs to respond to questions and tasks they have never seen before, in the same way they process other language inputs, and identify appropriate responses. This characteristic in particular makes any single LLM more widely applicable than previous NLPs.

Developing an LLM involves three steps, each of which can dramatically change how the program models language, and therefore how it will function when it is used. First, developers assemble a dataset, or "corpus", of text-based documents. Second, the algorithm learns about word relationships from this data. Finally, developers iteratively assess and fine-tune the model as needed to fix specific problems. In theory, models can continue to be fine-tuned even once they are in use, although the time and manual labor required of such a change may render ongoing maintenance of this kind impractical. Developers assess model performance against a number of formal and informal metrics such as how well it completes sentences, how accurately it translates to a different language, and whether a human can tell if text was written by the model or by a human.

... be able to generate a possible body of the article based on text that followed similar phrases in the training set. While LLMs do not understand the text they generate in the way a human does, because they are
Developers must balance drawing on reliable sources of text with ensuring there is diverse text from a range of different sources. The corpus GPT-3 was trained on, for example, contains text from Wikipedia, the internet, and online collections of books, with greater weight put on the text from Wikipedia even though the corpus contained a greater volume of data from other sources (Brown et al., 2020). Putting greater weight on Wikipedia was a way for the LLM developers to ensure emphasis on text they trusted more (Brown et al., 2020).

Many LLMs (including GPT-3 and BERT) use text from the internet in their corpora. One widely used internet corpus is the Common Crawl dataset, which is hosted by Amazon Web Services. The Common Crawl dataset includes millions of GB of data (Common Crawl, n.d.), and is updated once a month. Each update contains 200 to 300 terabytes (TB; a TB is equal to 1000 GB) of textual content scraped via automated web crawling (Luccioni & Viviano, 2021, p.1). It is constructed from the text of websites, but its archive represents only part of each website crawled. The organization argues that this creates a "representative" sample of the internet, but this approach also allows it to claim a fair use exception to copyright laws by only using a portion of each site, instead of the whole thing (Luccioni & Viviano, 2021, p.1). Overall, the Common Crawl dataset represents the text of the internet's most frequent users, who are disproportionately younger, English-speaking individuals from Western countries who often engage in toxic discourse (Luccioni & Viviano, 2021, p.5). Therefore it includes a significant amount of harmful data, including text that is violent, targets marginalized groups, and perpetuates social biases (Luccioni & Viviano, 2021, p.3). Because LLMs identify and replicate patterns, the inclusion of this data creates a significant risk that without explicit additional case-by-case training, LLMs will produce language that is similarly harmful and biased. LLMs do not at present have the capacity to automatically detect this kind of language without specialized training.

LLM developers also draw on collections of digitized books, but often offer few details about the composition of the collections
or the rationale behind it. This matters
private tech venture that built GPT-3, has less appropriate for training an LLM. The
another approach to overcoming the quality BooksCorpus, for example, part of the corpus
challenge of internet text. Its internet corpus, used to train BERT, was originally curated
WebText, is a 40GB dataset that contains for a completely different project designed
the text from outbound Reddit links that to train an AI to generate rich descriptive
received at least 3 “karma,” or upvotes from text when given video or images (Zhu et al.,
users (Radford et al., 2019). Their rationale is 2015). It contains over 11,000 free web-based
that these linked and upvoted webpages are books written by unpublished authors, but
more likely to contain quality text because the curators do not include much detail about
someone bothered to link and upvote them. its contents other than a breakdown of a few
This may be true, but these linked web pages of the genres (2,865 romance books, 1,479
are also more likely to represent the values fantasy books, etc.). The developers who used
and ideology of Reddit users, which are also this dataset to train the BERT LLM also did
not representative of the general population not not provide additional details such as
(Morales et al., 2021). Meanwhile, because what ideas the corpus includes, whose voices
WebText is not available in full for use by it represents, or why it is an appropriate part
people outside of OpenAI, others have tried of their training corpus. Even more opaque
to construct a publicly accessible version of are the Books1 and Books2 datasets, which
the same corpora, also based on Reddit links are part of the training data for OpenAI’s
(Gokaslan & Cohen, 2019). GPT-3 (Gokaslan & Cohen, 2019). There is
no discussion at all of what these datasets
contain, how they were constructed, or what
While Common Crawl’s corpus is open
they represent in the context of training the
and available for anyone to use, it is huge,
LLM (Scareflow, 2020).
complex, and heterogeneous. As a result, it
requires a large amount of computational
resources to download and process the Furthermore, these corpora are often
data, which means it requires high compute private; most LLM developers do not allow
(Luccioni & Viviano, 2021, p.2). Therefore, others to inspect or build on their dataset.
only researchers at elite universities and large A rare exception is Eleuther AI, which, as
companies are likely to have the financial we describe above, takes a more democratic
resources, expertise, and personnel to be able approach to its LLM overall. Eleuther AI
to use it to build their own LLMs. The LLMs developers created the Pile, a publicly
they build are likely to reflect their values and available English-text corpus that is about
priorities; this will influence LLMs’ capacity 886 GB and made up of existing corpora
to truly democratize text and knowledge. such as OpenWebText2 and Books3, as well
as internet based datasets such as a filtered
version of Common Crawl (Gao et al., 2020). vectors: lists of numbers that represent
The Eye, a non-profit, community driven coordinates in a many-dimensional space.
and funded group, which archives a variety This allows computers to use math to
of creative materials, hosts this corpus (The understand the relationships between words
Eye, 2020). and sentences and predict what words should
be used to complete a sentence or paragraph.
There are different methods
for generating these word
vectors, but recent advances
These bodies of text are often so have developed techniques
large that not even the developers that take into account the
fact that different words
know what is in them. have different meanings
in different sentences.
Everything the LLM learns
about the relationship
Overall, our observations about the between words is based on what is written
composition of LLM corpora echo what in the training data. For example, it will only
Bender and Gebru et al. (2021) have said about associate “sky” with “blue” if there are many
LLMs; these bodies of text are often so large sentences in the corpus that demonstrate
that not even the developers know what is in an association between those words. Bender
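The idea of words as coordinates can be illustrated with a toy sketch. The three-dimensional vectors below are hand-made for illustration, not the output of any real model; actual LLMs learn vectors with hundreds of dimensions from their training corpora.

```python
import math

# Hand-made "word vectors" (invented for illustration; real models
# learn hundreds of dimensions from the training corpus).
vectors = {
    "sky":    [0.9, 0.1, 0.0],
    "blue":   [0.8, 0.2, 0.1],
    "lawyer": [0.0, 0.9, 0.4],
}

def cosine_similarity(a, b):
    """Angle-based closeness of two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Words that co-occur often in the corpus end up close together...
print(cosine_similarity(vectors["sky"], vectors["blue"]))    # high
# ...while unrelated words end up farther apart.
print(cosine_similarity(vectors["sky"], vectors["lawyer"]))  # low
```

In this geometric picture, "sky" is associated with "blue" only because the (here, hand-picked) coordinates place them near each other, just as a trained model places them near each other only if the corpus contains many sentences linking them.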
The number of parameters, or pieces of information about language stored by the model, is also crucial. Like neurons in a human brain, the parameters work together to store and process complex information. Models that have more parameters are able to remember more complex patterns from the training data, such as how sentences, paragraphs, and documents are structured, but they require more compute to train. The number of parameters in state-of-the-art LLMs is increasing by a factor of about 10 each year (Li, 2020). At its creation, GPT-3 was the largest at 175 billion parameters, and at the time developers were already eyeing the possibility of a 1 trillion parameter model (Brown et al., 2020). A short time later, the Beijing Academy of Artificial Intelligence announced Wu Dao 2.0, a 1.75 trillion parameter model (Zhavoronkov, 2021). Parameter size seems to have a significant impact on LLM performance. GPT-3 (175 billion parameters) performs far better than GPT-2 (2.7 billion parameters) along several measures (Brown et al., 2020). Likely as a result, competition among developers is primarily about the size of the LLM.

After the developers have determined what architecture to use, the number of parameters, and the strategy the model will use for tokenization, the model is ready for training. The actual training process involves the model traversing the training corpus piece by piece and updating its parameters according to what it learns from each phrase it encounters.

Training LLMs requires a massive amount of compute, which is only available to the handful of people who have access to high-performance computers or cloud computing services through their institutions (Ahmed & Wahed, 2020). The developers of GPT-3 report how many computational operations were performed during the training process, but they do not give details on the hardware setup, the cost of training, or how long the training actually took. Based on the reported numbers, other researchers estimate that it could have taken 355 years to train GPT-3 on a single graphics processing unit (GPU), or could have cost $4.6 million with the necessary parallel hardware for faster training (Li, 2020). There is no doubt that training LLMs is expensive. In fact, the developers of GPT-3 noticed errors in their data during the training process, but did not start over because it would have been too expensive and time consuming (Brown et al., 2020). While initiatives such as the federally funded National Research Cloud aim to broaden access to compute, they are likely to mostly help those who already have significant amounts of compute on their own because the gap in access is so great (AI Now Institute, 2021).
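The scale of the 355-GPU-year estimate can be reproduced with back-of-envelope arithmetic. The figures below (roughly 3.14×10²³ floating-point operations of total training compute, one GPU sustaining about 28 teraFLOPS, and an assumed cloud price of $1.50 per GPU-hour) are assumptions drawn from Li's (2020) analysis, not numbers reported in this report, and small differences from the published $4.6 million figure come from rounding.

```python
# Back-of-envelope reproduction of the single-GPU training estimate.
# Assumed inputs (from Li's 2020 analysis, not from this report):
total_flops = 3.14e23        # estimated total training compute for GPT-3
gpu_flops_per_sec = 28e12    # one GPU at ~28 teraFLOPS
gpu_hour_cost = 1.50         # assumed cloud price per GPU-hour, USD

seconds = total_flops / gpu_flops_per_sec
years = seconds / (365 * 24 * 3600)          # wall-clock time on one GPU
cost = (seconds / 3600) * gpu_hour_cost      # same compute bought by the hour

print(f"~{years:.0f} GPU-years, ~${cost / 1e6:.1f} million")
```

The point of the arithmetic is not the exact dollar figure but the order of magnitude: a single-GPU lifetime measured in centuries, or a parallel-hardware bill in the millions.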
Fine-tuning the model

After an LLM is trained, it can be fine-tuned. This is a relatively light-weight process that involves feeding the model a few hand-picked examples, and it is often used to train the model on socially sensitive topics. The model learns from these examples and changes a few of its parameters without changing the core of the model. OpenAI, for example, created a "values-targeted" version of GPT-3, which it fine-tuned using 80 additional human-written examples that illustrate preferred behavior, so that the model would answer questions about subjective topics such as violence, injustice, human characteristics, and political opinion in what the developers deemed a desirable manner (Solaiman & Dennison, 2021). After learning from these additional examples, GPT-3 generated drastically different responses to questions about these topics.
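As a loose analogy for how a handful of curated examples can shift a model's behavior, the sketch below nudges a toy next-word table with a few hand-written examples. The sentences are invented, and real fine-tuning adjusts model parameters by gradient descent rather than by counting; the sketch only illustrates the idea that a small, weighted set of examples can change what the model prefers to say.

```python
from collections import Counter, defaultdict

# A toy "model": next-word counts learned from text it has seen.
model = defaultdict(Counter)

def train(text):
    """Update the model's counts from one training phrase."""
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        model[prev][nxt] += 1

def predict(prev):
    """Return the most likely next word after `prev`."""
    return model[prev].most_common(1)[0][0]

# Base training data pushes the model toward one continuation...
for _ in range(5):
    train("disagreements are settled by fighting")

# ...and a few hand-picked fine-tuning examples (invented here),
# repeated with extra weight, shift it toward preferred behavior.
for _ in range(10):
    train("disagreements are settled by talking")

print(predict("by"))  # now "talking", not "fighting"
```

Note how little of the table changed: only the counts following "by" moved, which mirrors the report's description of fine-tuning changing a few parameters without changing the core of the model.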
Fine-tuning allows dynamic updates to address problems with LLM outputs that are far faster and require less compute than model training. However, this process depends on the sensibilities and knowledge of developers, who need to decide which topics require additional training and carefully curate the examples. In many cases, developers may not know which examples will be best for fine-tuning. To address these challenges, some organizations have decided to "democratize" fine-tuning. OpenAI is taking steps to allow anyone to create their own fine-tuned version of GPT-3 (OpenAI, 2021b). But this poses new risks and uncertainties. People could feed the LLM hateful or unethical text to produce a more socially dangerous technology. Meanwhile, those with more positive intentions have no guarantee that socially beneficial fine-tuning will be adopted in the main GPT-3 model.

Understanding the model's output and capability

There is no universal standard for assessing LLM quality. However, developers usually ask their models to perform a common set of tests in order to assess performance. These include asking the model to complete sentences, correct grammar, answer commonsense questions, translate sentences to another language, and answer reading comprehension questions, and then comparing the results to other LLMs and to humans on the same tasks.

Developers are under no obligation to disclose either the tests they perform or the results, which makes it difficult for third parties, including consumers, to evaluate performance. But some publicly available assessments of GPT-3 help us develop a better understanding of LLM capabilities and potential areas of concern.

First, tests suggest that humans are not able to identify LLM-generated text. Developers asked GPT-3 to generate a news article based on a headline, and then asked people whether the article had been generated by a human or a computer (Brown et al., 2020). Humans were only able to guess with 52% accuracy, which is barely better than random chance. While there are potential benefits to being able to perform so well, it is also concerning because LLMs could circulate false or damaging information that would be very difficult to trace (Better Language Models, 2019).

Second, LLMs demonstrate gender, racial, and religious bias. When asked to perform a variety of word association tasks, GPT-3 produced violent associations with Islam and negative associations with Black people, which reflects biases in the training data (Brown et al., 2020). And this problem may not be solvable: thus far, even though fine-tuning in general can produce very different outputs, efforts to remove toxic language about a marginalized group from an LLM mean removing any mention of that group, even if it is positive (Welbl et al., 2021).

Overall, LLMs constitute a significant leap forward for Natural Language Processing, and the field continues to expand and advance rapidly. LLM developers may continue to compete primarily on the size of their models, or they may eventually shift to improving quality. New uses and functions of LLMs, as well as new ways to capitalize on them, are emerging regularly, but the basic operation of the technology described here remains consistent. In the following sections, we discuss the implications of widespread growth and adoption of LLMs, both as the technology exists today and as we expect it to advance.
Section 1: Exacerbating
Environmental Injustice
KEY POINTS
• LLMs will require the construction of more data centers, which will increase energy and
water consumption.
• Data centers will displace and disrupt the lives of already marginalized communities.
• Those living near data centers will experience resource scarcity, higher utility prices, and
pollution from backup diesel generators.
• Data centers will be classified as “critical infrastructure”, which will make them more
difficult to challenge. This will erode the civil rights of affected communities.
Although we tend to think of artificial intelligence as purely digital and "in the cloud", LLMs rely on physical data centers to process the corpora and run the algorithms (Bender & Gebru et al., 2021). Data centers are, in other words, part of the LLM's "sociotechnical system", which includes not just the immediate technology but also its developers and users, and the other artifacts, institutions, relationships, and people that make an LLM work (Hughes, 1983). Data centers themselves are also sociotechnical systems, containing between fifty and eighty thousand servers that store and process data. These servers use graphic processing units (GPUs) that contain silicon chips as well as rare earth elements that are mined around the world (Johnson, 2017; Morgan, 2020). They are supported by ventilation systems, cooling systems, and backup generators.

Data centers rely very heavily on natural resources. They already make up approximately 2% of U.S. electricity use, and contributed 0.5% of the country's total emissions in 2018 (Oberhaus, 2019; Siddik et al., 2021). They require large amounts of water to cool and power the servers, with a single medium-sized, high-density data center requiring 360,000 gallons of water a day (Ensmenger, 2018). For comparison, the city of Ann Arbor, Michigan, with a population of nearly 130,000 people, uses 5 billion gallons of water per year (The City of Ann Arbor, n.d.). Not surprisingly, then, data centers are already having significant impacts on the US water supply: they draw water from 90% of US watersheds, and 20% of data centers rely on watersheds that are moderately to highly stressed (Selsky and Valdes, 2021). Furthermore, much of the consumed water is potable, derived from local public utilities (Moss, 2021).

Despite this already-high consumption of resources, the current capacity of data centers is inadequate. To accommodate the rise of LLMs and other types of AI, tech companies will need many more of these large facilities. In fact, data centers owned by hyperscale providers like Amazon Web Services, Google Cloud, and Microsoft Azure doubled in number from 2015 to 2020 (Haranas, 2021), and data center investment is projected to increase by 11.6% to $226 billion in 2022 (Haranas, 2022). They are typically 100,000 square feet in size, but the largest data center in the world (in China) is over 6 million square feet, and the largest data center in the United States is over 3 million square feet (Allen, 2018). The United States currently has the most data centers, with hubs around Washington D.C., New York City, Chicago, Los Angeles, San Francisco, and Dallas (Data Center Map, 2022; Berry, 2021); in Europe, there are hubs outside London, Amsterdam, Frankfurt, and Paris, and in Asia in Hong Kong, Mumbai, and Singapore (Data Center Map, 2022). Companies choose data center locations based on their power grids, labor markets, transportation networks, water access, and other social and environmental factors. As a result, they have historically chosen more densely populated areas. But with the rise of distributed computing, rural areas–which are invariably cheaper–are becoming more attractive (Isberto, 2021). Microsoft is even experimenting with underwater data centers (Roach, 2020). As data centers increase, they will require additional infrastructure, including roads and people, but also a massive increase in natural resources.

As we noted in the Introduction, Emily Bender, Timnit Gebru, and their colleagues observed in their "Stochastic Parrots" paper that LLMs–due to their thirst for data and need for processing in data centers–would increase pressure on our energy systems and exacerbate climate change (Bender & Gebru et al., 2021). We agree with their conclusions, but expect that the environmental impacts of LLMs will be much greater and extend beyond climate change. In what follows, we rely on analogical case studies to suggest that the rise of data centers will disproportionately affect marginalized communities through displacement, direct harms, and curtailing of their civil rights to protest.

Data Centers will Displace Marginalized Communities

In their quest to find the cheapest land available to build data centers, LLM developers will likely displace low-income and marginalized people in both urban and rural areas. This kind of displacement has a long history. In the middle of the twentieth century, officials used the federally funded highway program to eliminate areas they saw as areas of urban "blight," a vague term that could suggest … the brunt of the negative effects directly. This is a common story. Although developers invariably claim that such facilities bring good jobs to the area (Day, 2017), and cities often provide tax breaks and other incentives, in the long term communities must manage a range of ill effects (Rayome, 2016; Fairchild & Weinrub, 2017). Power plants, oil and gas refineries, factories, and other toxic release sites have a long history of being located in communities that lack the political power to fight back, but must endure the consequences. Perhaps most notorious is Louisiana's "Cancer Alley", an 85-mile stretch of petrochemical plants and refineries where nearby residents are at a much higher risk of contracting cancer (Allen, 2003). Marginalized communities are subject to other kinds of risks as well. As Montana and North Dakota opened up their lands for oil extraction from the Bakken Formation, male laborers flooded the area (Stern, 2021). These new employees stressed the resources of economically fragile areas, and rates of human trafficking, sex trafficking, and missing and murdered Indigenous women rose (First Peoples Worldwide, 2019). Long legacies of discrimination, coupled with land-use, housing, and transportation policies, make it difficult for these communities to escape these "sacrifice zones" (Fairchild & Weinrub, 2017; Baker, 2019). Similarly, oil pipeline construction has caused the destruction of culturally significant sites like Native American burial grounds, inflicting significant harm on Indigenous communities (Whyte, 2017).

The process of extracting natural resources causes environmental degradation and resource scarcity in the immediate area. Companies mining lithium in Argentina and Chile worsened water shortages in the region (Frankel & Whoriskey, 2016). Mining practices themselves produce chemical residue, particularly of sulfuric acid, dissolved iron, copper, lead, and mercury, and this acid runoff can pollute both groundwater and surface water (U.S. Geological Survey, 2018). Similarly, pipeline construction and use cause negative health outcomes through the contamination of water sources and degrade ecosystems of plant and animal life (Betcher et al., 2019; Mall, 2021).

Communities across the country have already begun to worry that data centers will stress their scarce resources and cause pollution. Google recently gained approval to develop several data centers in The Dalles, Oregon, a region experiencing severe drought (More Perfect Union, 2021). Citing trade secrets, the company refuses to disclose how much water its facilities will use, and local residents worry that the company's resource needs will be prioritized over their own. Google says that it will drill wells, build water mains, and develop an aquifer to store water and increase supply during drier periods, but this could create additional risks to the community. Rural and …

Whether or not data centers are formally designated as critical infrastructure, we expect the concerns of marginalized communities to hold less weight in siting decisions. As we discuss above, dangerous rare earth mining practices continue despite frequent community dissent (Business & Human Rights Resource Centre, 2021).

LLMs will place enormous pressure on current data processing capacity, which will trigger the development of data centers around the world. This will require not only the development of built infrastructure but also massive resource extraction. Our analogical case study analysis has suggested that already marginalized communities–both low income areas and communities of color–are likely to experience the negative …
Section 2: Accelerating
the Thirst for Data
KEY POINTS
• LLMs will further test the model of individual informed consent which currently governs data
sharing and privacy.
• LLM developers may use unethical tactics to diversify the corpora, placing a disproportionate
burden on marginalized communities.
• Users who formerly relied on human interpreters will feel that LLMs offer more privacy than
relying on another person.
In addition to data centers and natural resources, LLMs require vast amounts of data. Much of it will come from us, the users of the internet. As we discussed in Background, LLMs already extract text from old books and across the web, including text from links posted on social media. In turn, this raises data security and privacy issues.

While LLM developers have adopted some practices to filter out personally identifiable information (PII, which can include full name, social security number, zip code, and more) in LLM training corpora, such methods are neither effective nor commonplace (Privacy Considerations in Large Language Models, n.d.). This presents a serious vulnerability to third-party extraction attacks and unintentional leaks of PII. However, even if LLMs successfully screen out PII, they might still be able to triangulate bits of disconnected information, such as mental health status or political opinions that appear in the corpora, to develop a full, personalized picture of an actual individual, their family, or community (Kulkarni, 2021). Thus far, there has been little transparency as to whether the most popular LLMs have been security-tested, but the vulnerabilities are likely to increase as model development increases.

Meanwhile, Americans are increasingly concerned about data security: 79% of adults worry that companies are using their personal information, and 64% are worried about government data collection (Auxier et al., 2019). These concerns are valid, as data security is a challenge: while Illinois and California have passed data privacy laws, the United States lacks federal legislation, and much of the population remains unprotected by data privacy or security policies.

In this Section, we analyze how LLMs will affect the privacy and security of personal information and accelerate a thirst for data. We conclude that LLMs will likely be able to produce information about individuals and communities even if they are barred from including personally identifiable information (PII). As a result, publics will become more hesitant to share information about themselves online. These information practices will have uneven impacts for marginalized groups: those who are underrepresented in the corpora are likely to be pressured to participate in LLMs and may lose some civil liberties if they do not. But others, including those who currently rely on interpreters or translators to communicate and travel (e.g., those who are hearing impaired), may actually be able to better maintain certain forms of privacy.

LLMs will Transform Informed Consent

Most LLM corpora are created using a data collection method called web crawling, which involves systematically traversing the entire internet to gather text. Much of this text was provided by the population through their online activity when they upload web pages or post comments. But few of us have any idea that our text is included in the corpora, much less which information is used or how it is used. In many cases, we may have already provided our consent. We agree to complex and lengthy "click through" agreements to use online services, such as WordPress or Reddit, that allow third parties to have access to the text we post, including for LLMs. This is problematic because few people read user agreements and are therefore unaware of the scope of their consent (Cakebread, 2017). And LLMs pose particular challenges. As noted above, the text we post can be triangulated to develop a full picture of us or to predict our behaviors. If we post information about our community or family, we are also consenting to data collection on behalf of others without their knowledge. In sum, while some of us may consent to the use or sale of some of the information we post, LLMs bring it all together and expand the scope. This makes the information more powerful and has potentially serious implications even for those who are careful about what text they post online.
The field of genomics has been dealing forensic databases that include the DNA
with these kinds of challenges for decades. information of all individuals convicted of
With the rise of mapping and sequencing (and sometimes, even arrested for) a crime
technologies and the infrastructure to build (National Conference on State Legislatures,
and process large databases of information 2014; Interpol, 2020). When they find DNA
about an individual’s genome as well as at a crime scene, they then search these
their health, environment, and lifestyle, databases to find not only matches but also
there has been growing concern about “familial matches”, i.e., people whose DNA
individual, family, and community privacy. partially matches the DNA found at a crime
An individual’s decision to get a genetic test scene. This helps police officers narrow down
has ripple effects for their family members. the pool of potential suspects and focus on
If someone tests positive for a gene mutation
related to Huntington’s Disease, for example,
then not only will their family members
feel additional stress or anxiety that their
loved one will soon experience a debilitating
neurodegenerative condition, but their
children will also be more concerned that they
too will have the disease (Oliveri et al., 2018).
However, these individuals have no say in
whether their family member gets a genetic
test. Similarly, when a handful of members of
a racial group or ethnic community choose to
participate in genomics research, all members
of the group are all affected by the findings.
In the 1970s, the US federal and state Credit: Darryl Leja, NHGRI
governments created screening programs to
identify African Americans at risk of sickle
cell disease, a painful blood disorder, and
ensure that they receive appropriate services a specific family. But, it also means that
(Duster, 1990). Unfortunately, the program individuals who never agreed to participate in
resulted in stigmatization and employment the database are affected by its findings. This
exclusion based on race; the US Air Force took place in the infamous Golden State Killer
Academy, for example, erroneously used the case, where investigators identified the killer
data to exclude sickle cell carriers from the via the genetic profiles of distant relatives
applicant pool. dating back to the 1800s. Clearly, these
relatives never consented to upload their
database can also have broad criminal out (Zabel, 2019). Some might argue that this
agencies across the world have created of public safety, but studies show that these
databases have a disproportionate number were quite isolated, and perturbed by the
of samples from historically overpoliced Western scientists making these requests.
communities of color and are thus more likely They were also distressed by the concept of
to affect them (Murphy & Tong, 2019). We individual, informed consent. After all, one
have also begun to see similar use of health person’s DNA would provide information
or ancestry-focused DNA services, such as about the whole community. What if others
23andMe. Even though individuals provide in the community questioned the use of
their DNA in a non-criminal context, the this information? In response, some of the
services have demonstrated a willingness to scientists proposed a new approach that
share this information with law enforcement. would include informed consent from both
For example, researchers investigating individuals and the group through “culturally
the Y-chromosome Haplotype Reference appropriate authorities” (North American
(YHRD) forensic database, which contains Regional Committee, 1997). Later attempts
300,000 anonymous genetic profiles, to map human genomic diversity, including
have raised ethical concerns over a lack of the International Haplotype Mapping Project,
informed consent for the Uyghur and Roma have tried to implement this approach (The
populations. Without knowledge of where International HapMap Consortium, 2004).
their genetic information will go, these
minority ethnic groups stand at an increased Another similar framework involves engaging
risk for persecution (Schiermeier, 2021). In all of these cases, the decisions of a few people to share information had widespread impacts.

In almost all these cases, individuals provided free and individual informed consent, a framework developed in the latter 20th century in response to scandals about unethical medical experimentation and practice. But this framework is clearly insufficient for situations where one person may share information that has implications for others. In response, researchers have pioneered new approaches that take human connection seriously. Consider, for example, the Human Genome Diversity Project initiated in the 1990s. Excited about the opportunity to use new techniques to map genomic diversity, scientists identified populations around the world and began to ask for their DNA using well-established consent procedures (Reardon, 2005). Many of these communities objected, which eventually led researchers to develop group consent: an approach that engages community members throughout the life of a project. Researchers from the University of Oklahoma followed this model when conducting genomic research with Native American populations. They surveyed participants about health decision-making, explained intentions behind the project during public meetings, and established community review boards to review manuscripts. While this approach can slow down research and is often logistically challenging, it ensures public trust, and effective guidelines can help minimize negative effects (Foster et al., 1997). The way researchers obtain consent can have long-term impacts, particularly if the researchers want to return to the community for further data collection. For example, in the case of the Havasupai in Arizona, the tribe provided their DNA to Arizona State University researchers with the understanding that they would use it for diabetes research (Harmon, 2010). But researchers also used the samples for studies the tribe had not approved, eroding the trust between the university and the tribe.

LLMs will raise similar concerns and controversy, as people realize that their text is being used for purposes that they did not intend or with which they disagree. If developers do not address these concerns at the outset, they risk further erosion of trust in the tech industry, and ultimately, resistance, as we discuss in Section 6. However, they can learn from the genomics and medical arenas, which have been experimenting with new forms of consent. This includes both group consent, described above, as well as qualified granular consent (Simon et al., 2011), which is designed to provide users with more authority over how their data is used. As users become more aware of LLMs and realize the depth and breadth of the data they are trained on, as well as their potential to disclose sensitive information, they will hesitate to provide personal information online.

In 2017, Equifax, an American credit reporting agency, suffered a cyber attack that uncovered and downloaded sensitive PII of over 140 million customers. Negligent Equifax security officials were at fault because they failed to install a security patch from their software provider, Apache Struts, which had been released before the attack. Equifax shares dropped 13% in early trading, people were outraged over the lack of transparency about the attack, and hundreds sued the agency for damages and won. After these sanctions, 54.2% of those publicly surveyed believed that Equifax should no longer serve as a credit bureau (Brown, 2018). One year later, Equifax attempted to remedy the issue by providing consumers with free credit monitoring, but only 30.6% said that these steps had improved their perception of the company.
standards and data governance practices – but keeping PII private continues to be a challenge. In just the first quarter of 2021, 4 billion online accounts were hacked worldwide, with LinkedIn and Facebook being the most vulnerable (C., 2022). Occasionally, the party misusing the data is the one collecting the data itself. For example, employees of the ride-sharing app Uber used the company's database to track the locations of politicians, celebrities, and even ex-spouses. They exploited this "God View" feature for over 2 years with little to no user knowledge (Evans, 2016). The cumulative effect is one of user mistrust across companies, and a feeling that the onus is on the user to take preventative measures.

But the effects go beyond loss of trust. User behavior often changes dramatically. Consider recent concerns over email trackers in the most popular email clients (Google's Gmail, Microsoft's Outlook, Yahoo Mail), which link a user's email address and activity on a user's web browser. With this information linked, companies can follow users across the web.

LLMs present similar risks, especially because the training dataset is large and, in some cases, contains text from private sources. Hackers could extract specific parts of training data that the LLM has memorized, known as a training data extraction attack. An adversary with access to an LLM would simply have to input probable phrases (e.g. "The phone number of John Doe is"…) and let the model complete information that might reveal PII. Using confidential data to train the LLM is dangerous, as it risks revealing information that users intended to keep secret. This technique has already been put into practice, as Gmail's auto-complete model is trained on private text communication between users (Privacy Considerations in Large Language Models, n.d.). We expect that public-facing LLMs will in part use confidential data for training, which means personal data breaches will be possible. But not all breaches of privacy will rely on private data and sensitive PII. Even an LLM that connects a person's professional online presence with their personal one could cause harm, particularly if the latter includes information about things like health, sexual orientation, or immigration status. In response, users may become reluctant to give PII and may create new ways to stay anonymous online, such as tools that prevent web scraping, although this may not be accessible to or benefit everyone (Zou et al., 2018).

LLMs will Create New Forms of Data Exploitation

With the rise of surveillance capitalism (Zuboff, 2019), all digital data has increased in value. LLMs exemplify this trend as they take advantage of the freely generated data of millions of individuals to produce commercial technologies that summarize, generate, and predict language. But in this ecosystem, not all data has the same value. At present, LLM corpora overwhelmingly include English or Chinese language texts, and many of these texts are quite old. The racist, sexist, and homophobic output described in the Introduction is one result. In order to ensure that LLMs are more useful and less offensive, developers are keen to expand the corpora to include more languages and dialects, genders, cultures, and populations. History suggests that the texts least likely to be currently represented in the corpora, and thus most likely to be valuable in the future, come from marginalized communities and cultures. But as developers try to improve their models by expanding corpora in these directions, they will create new forms of exploitation that will disproportionately affect already marginalized communities.

Facial recognition technologies have posed similar problems. They are famously inaccurate for all populations except white male adults, including people of color, women, children, and gender non-conforming people (Grother et al., 2019). But they are increasingly being used by law enforcement, schools, airlines, and even, briefly, the Internal Revenue Service (Epstein et al., 2022; Galligan et al., 2020). To deal with the technology's accuracy problems, developers have sought out pictures of individuals from marginalized communities. Most famously and problematically, a contractor hired by Google targeted attendees of the BET Awards, college students of color, and even homeless Black people in Atlanta for facial scans. The practice was exploitative, as volunteers were rushed through consent forms and misled about what would be done with the scans (Dillon, 2019).

Similarly, as we suggest above, the rise of genomic science has also made particular genomes valuable. This has, in turn, triggered unethical practices and created new burdens for already marginalized communities. A 2019 controversy at the UK's Sanger Centre, the UK's premier genomics institute, echoes the Havasupai and Human Genome Diversity Project cases described above (Stokstad, 2019). The Centre was trying to develop and commercialize a "gene chip" that would identify genetic links to common diseases, and needed African DNA samples in order to ensure that this technology was adequately representative. So it entered into agreements with scientific institutes in Africa that had collected indigenous DNA. But it did not disclose that the DNA would be used commercially, and many of the original DNA sharing agreements had forbidden this kind of use. The African scientists who collected the data worried that it would alienate communities who had just begun to participate in genomics research (AT Editor, 2019).

In the coming years, LLM developers are likely to prioritize collecting texts from marginalized communities in the name of increasing accuracy. This might mean purchasing access to non-digitized texts in a variety of languages, or deploying speech-to-text apps to capture the rare dialects of some communities. But given the power of these companies, there is little guarantee that these communities will benefit. Some communities have learned how to take advantage of the value of their data and demand proper credit and compensation (Muller, …).

At their core, LLMs rely on data about the way humans communicate through language, and thus LLM developers will continually need data to maintain their models. The cochlear implant (CI), for example, has enhanced privacy by eliminating the need for human intermediaries; communities who have traditionally relied on assistive technology may gain some immediate privacy but will now be disclosing their personal information to an LLM.
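The training data extraction attack described above can be sketched in a few lines. This is a toy illustration, not the method used against any real system: `toy_complete`, `TRAINING_CORPUS`, and the sample strings are hypothetical stand-ins for an LLM's next-token generation over memorized training text.

```python
# A minimal sketch of a training data extraction attack.
# The "model" here is a toy stand-in: a completer that has memorized
# its training corpus verbatim, which is exactly the failure mode the
# attack exploits in a real LLM.

TRAINING_CORPUS = [
    "The phone number of John Doe is 555-0142.",       # hypothetical PII
    "Contact support at help@example.com for issues.",
]

def toy_complete(prompt: str) -> str:
    """Return the rest of any memorized training sentence that starts
    with `prompt` -- a stand-in for greedy generation in an LLM."""
    for doc in TRAINING_CORPUS:
        if doc.startswith(prompt):
            return doc[len(prompt):]
    return ""

# The adversary supplies probable prefixes and harvests completions.
probes = [
    "The phone number of John Doe is",
    "The home address of Jane Roe is",
]

for probe in probes:
    leaked = toy_complete(probe)
    if leaked:
        print(f"extracted: {probe}{leaked}")
```

In a real attack the adversary cannot inspect the corpus; they simply query the deployed model with many plausible prefixes and keep the completions the model reproduces with unusually high confidence.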
Section 3:
Normalizing LLMs
KEY POINTS
• Developers will try to incorporate LLMs into existing sociotechnical systems, particularly
those governed by trusted institutions, in order to ensure their longevity.
• When LLMs produce hateful language or errors, developers will deflect blame onto
infrastructure or human users.
Developers have made experimental LLMs available to both researchers and publics, which has stimulated early excitement about the technology's text summarization, generation, and translation capabilities. And yet, as we have noted repeatedly thus far, there is already concern about the social harms. Journalists worry that LLMs will be able to generate articles and further damage their job prospects (Seabrook, 2019). Others worry that, like cryptocurrency, LLMs will require the use of so much energy that they will make it harder to fight climate change (Bender & Gebru et al., 2021). There is already evidence that LLMs will reflect the historical biases of English-language texts by using racist and sexist language and reproducing harmful assumptions about marginalized communities (Abid et al., 2021). But these conversations have been largely restricted to the field of artificial intelligence and specialist technology publications. An important exception is the 2021 controversy over Google's firing of Timnit Gebru and Margaret Mitchell, which we discuss in the Introduction. Newspapers across the globe covered Gebru's firing, likely due to combined concerns about the practices of the major technology companies, artificial intelligence and algorithmic bias, and heightened media attention to discrimination against Black people in the wake of George Floyd's murder in May 2020.

These events have likely triggered some skepticism about LLMs, most significantly within Black and research communities. Despite these concerns, based on our analysis of analogical cases we expect Google and other developers to emphasize LLMs' democratizing and even empowering potential, as well as their modularity. In fact, they have already begun to highlight the broad social benefits, particularly in terms of increased access to crucial services such as legal aid and health advice. They will also try to make LLMs ubiquitous quickly, and promote their use particularly among established authoritative institutions. In the process, they will continue to dismiss the technology's limitations and errors, deflecting blame onto infrastructure and other users.

In the project's early days, he said to the MIT Technology Review that OLPC "is probably the only hope. I don't want to place too much on OLPC, but if I really had to look at how to eliminate poverty, create peace, and work on the environment, I can't think of a better way to do it" (Ames, 2019). As Negroponte and his team tried to sell the technology first to investors and then to the governments of Southern countries, he framed it as not just transformational but as leveling the playing field across the world.

Although OLPC was explicitly designed for humanitarian purposes, we expect LLMs to be framed in similar ways. Consulting firm

and advocates, the technology has become harder to challenge. While civil society groups and even policymakers have tried to ban or otherwise regulate the technology, they have met limited local success. And as a result, facial recognition's reach is growing: in 2022, Clearview AI, with one of the largest indexes of faces, announced massive expansion of their services beyond law enforcement (Harwell, 2022). The situation is similar with the breathalyzer, which is used to evaluate cognitive impairment due to alcohol (Cowley & Silver-Greenberg, 2019). Despite extensive evidence that it generates inaccurate results, it is still widely used by law enforcement.

This technological entrenchment is not unique to law enforcement. Consider the pulse oximeter, which assesses blood oxygen levels and is crucial to diagnosing severe cases of COVID-19. In 2020, an anthropologist published an article observing that the device was likely to be less accurate among people of color because its reading is based on light refraction (Moran-Thomas, 2020; Sjoding et al., 2020). A few months later, a group of physician scientists validated this hypothesis in a large clinical study: they found that people with darker skin tones tended to have higher readings than their white counterparts. This means that when already marginalized people of color went to the hospital unable to breathe, a pulse oximeter reading might suggest to the health care professional that they were not in distress. Likely as a result, in the early days of the COVID-19 pandemic, Black patients were turned away from hospitals because their blood oxygen was not low enough (Lothian-McLean, 2020; Rahman, 2020). The New York Times and other prominent media outlets published these scientists' findings (Rabin, 2020; Harris, 2020). But today, doctors' offices and hospitals still regularly use the pulse oximeter as part of their health care, to determine the severity of their patients' condition. It seems too difficult to change professional practices, despite the human cost.

Deflecting Blame for the Technology's Problems

Especially in the early days of LLMs, we expect users to identify a range of errors and problems with the technology. Developers will first try to maintain the technology's credibility by ignoring these problems. If that proves impossible, they will likely blame the infrastructure or users. Let's return to the OLPC. It never had the positive impacts that Negroponte envisioned. Demand for the machine was less than anticipated, and even when governments or civil society groups donated them to low-income children, many broke or simply went unused. And yet, OLPC's developers largely do not acknowledge the technology's failure (Ames, 2019). When they do, they suggest that the technology simply lacked the needed support structure.

Or, consider Boeing's introduction of the Maneuvering Characteristics Augmentation System (MCAS) in its 737 MAX planes, and the subsequent crashes of two planes in Indonesia and Ethiopia in 2018 and 2019; 346 passengers died in total. Boeing had installed the technology in its planes without alerting regulators in the US or elsewhere. But when the first plane crashed in October 2018, the
Section 4: Reinforcing
Social Inequalities
KEY POINTS
• Individuals, particularly those from already marginalized communities, will bear the blame
for LLM error, rather than the developers or the technology itself.
• LLMs will reinforce English language, and ultimately Anglo-American, dominance, and
alienate those outside these cultures.
• Because they seem technical and objective, LLMs will obscure systemic biases embedded in
their design. This will make inequities even harder to identify.
In this section, we shift from the construction of LLMs to the implications of their use. AI developers tend to emphasize the objectivity of their technologies, but scholars point out how technologies always reflect the societies that make them (Benjamin, 2019; Parthasarathy, 2007). The history of technology provides numerous examples of tools and systems that are presented as value-free and yet are skewed by built-in biases, including racism, sexism, and xenophobia. The same is true for LLMs. As described in the Introduction, they are trained on vast datasets composed of internet text and historic literature; both contain enormous amounts of prejudiced and hateful language towards minoritized groups. In other words, they reflect historical biases. Not surprisingly, then, LLMs that generate "new" content end up reproducing these biases, often in the form of violent language (Abid et al., 2021; Tamkin et al., 2021). But fixing these problems isn't just a matter of including more, better data. LLMs are built and maintained by humans who bring prejudices and biases to their work, and who operate within institutions, in social and political contexts. This will shape the biases that developers perceive, and how they choose to fix them. Meanwhile, researchers have already brought attention to how artificial intelligence is exacerbating what they call a "compute divide": wealthy companies and academic institutions have greater resources to invest in emerging technologies, which is likely to reflect their worldviews, needs, and biases (Ahmed & Wahed, 2020).

In what follows, we suggest that LLMs are likely to reproduce social biases in a variety of ways beyond what observers have identified so far. First, they are likely to replicate and perpetuate racism and other biases. Second, people are likely to assume that because they are based on vast amounts of data and produced by highly technical, proprietary algorithms, they will be objective. Therefore, these biases will be harder to identify and challenge. This section identifies these biases at a societal level; other sections discuss how LLMs will affect equity in the
oximeter, the LLM's advice is likely to be inaccurate too. And while these cases focus on medicine, we can imagine other domains, including criminal justice, housing, and education, where biases and discrimination enshrined in historical texts are likely to generate advice that perpetuates inequities in resource allocation. Aadhar, India's biometric identification system, has already begun to highlight such inequities. In order to receive an Aadhar number, citizens must provide their fingerprints, iris scans, and their photograph. For many of India's marginalized communities, such forms of identification are impossible to provide: their fingerprints have rubbed off due to years of manual labor, or they cannot provide an accurate iris scan due to a disability (Singh & Jackson, 2021). And yet, these are the communities that need Aadhar the most: in order to access any social services, they must provide not just their Aadhar number but also their biometric information.

Cochlear implants (CIs), which we introduced in Section 2, also demonstrate how technology can distort needs and ultimately erode services for marginalized populations. The FDA first approved CIs for adults in 1984 and for children in 1990. CIs consist of two components: a permanently implanted internal set of electrodes that interface directly with the nervous system, and an external processor that picks up sounds and translates them to patterns of electrical impulses for the electrodes. While many may assume that CIs cure deafness by giving the wearers the same level of aural capacity as those with natural hearing, in fact they only have a small range of audial inputs, and patients must spend months or years of therapy to develop new connections in the brain to accommodate the CIs and learn to associate the signals with different sounds. In other words, deaf adults may still need auxiliary services even when people assume their problem has been solved.

Meanwhile, just as CIs were approved, academics and activists developed the concept of Deaf Culture (Denworth, 2014). Deaf Culture is based on shared values, experiences, and beliefs of people influenced by deafness, and serves as a form of organizing political power. Activists define deafness as a neutral trait rather than a disability. Because most deaf children are born to hearing parents, acculturation happens primarily within deaf organizations and schools. Activists fear that the perception of CIs as a cure might lead to the reinterpretation of deafness as a voluntary disability, which could result in the defunding of deaf institutions (Tucker, 1998). This could also make it more difficult to access accommodations in employment or education such as interpreters, and alienate deaf people for whom CIs do not work or who choose not to get CIs (Cooper, 2019). Similarly, LLMs, as a cheap and fast translation and interpretation tool, could actually lead to a reduction in other kinds of support, including human interpreters, written materials offered in languages other than English, and even language learning programs. This would create harm for Indigenous groups, marginalized groups that use dialects not covered well by LLMs, and immigrant communities. After all, because the LLM corpora are mostly in English and Chinese, they will be less accurate in other languages. However, most people may not know about these deficiencies. In high-stakes settings such as hospitals and courtrooms, human translators, though fallible, can rely on a variety of cues to ensure the person they are assisting understands what is happening, and can ask and answer clarifying questions. LLMs cannot do those things. If the LLM has a poor understanding of a particular language, or is otherwise unable to accurately translate technical medical or legal terminology, individuals are left without support, which depending on the setting could result in a variety of negative outcomes.

We can also expect similar outcomes when LLMs are used in social service provision. LLMs may be used to automatically screen applicants, or they might be used as a chat function on websites to assist people seeking resources or help. But the historical use of automated decision-making tools by social service agencies produced results that are biased or inequitable in ways the tool is meant to prevent. Allegheny County, in southwestern Pennsylvania, adopted the Allegheny Family Screening Tool (AFST), a computer-based program designed to assess the risk that a child might be experiencing harm and require intervention (Eubanks, 2018). The AFST uses a wide array of historical data about children and families, including data from local housing authorities, the criminal justice system, and local school districts, to produce a risk score that assesses the urgency of individual reports that come in through a child welfare hotline. Though it is designed to be used in tandem with a human screener, in practice the algorithm tends to train the humans, and over time the screeners' scores begin to match the algorithm's. In other words, independent human oversight is diminished.

AFST is supposed to be objective and evidence-based, but its results overrepresent poor and working-class families in ways that become a self-fulfilling prophecy (Eubanks, 2018). Simply asking for support from public services, including childcare, tutoring, or therapy, increases a family's risk score in the system. When wealthier families get that same support, they do so privately, so it does not affect their score. As a result, simply being poor or requesting help become "risk factors". When a family's score is higher, it increases the likelihood that a report will result in a home visit, and pulls parents into a system of increased state surveillance, which itself increases the risk of further harms: children may be removed from the home, and parents may be arrested, lose their jobs, or even lose their housing. These outcomes then get fed back into the algorithm as evidence that its risk assessment was correct, even though in fact the risk assessment caused the outcomes.

When institutions such as social service agencies, hospitals, insurers, or banks use LLMs to determine eligibility for or recommend products and services, we can expect that LLMs will make recommendations rooted in historical biases that will then produce inequitable outcomes. They may systematically miscategorize or misinterpret people based on language use patterns, or fail to include key elements of their situation. Overall, like all conclusions drawn from huge datasets, the LLM is likely to focus on correlations in historical data (Bender & Gebru et al., 2021; Spinney, 2022). When evaluating eligibility for social services, LLMs might reinforce stereotypes about people who have previously used social services such as food banks or have a criminal record. They may also be used to collect and store additional information about people, similar to the way facial recognition technologies are being used in housing under the guise of ease of use (Strong et al., 2020). Or, institutions might use LLMs to determine whether a request for services is sincere, or to provide advice in the interest of lowering the workload of social workers. LLMs might recommend against issuing a loan to some historically disadvantaged communities of color based on historical evidence that they might default. But this correlation is the outcome of prejudice and stereotypes in the training data, rather than a characteristic of these communities. Regardless, the consequences are dire: these decisions will generally favor the powerful, and further perpetuate inequitable distribution of resources.

LLMs will Reinforce Dominant Cultures

Developers have argued that LLMs hold tremendous promise for language translation (Brown et al., 2020), which will ultimately promote closer relationships across communities and international cooperation. We discuss how this capacity might transform scientific work in Section 7. However, as noted above, the largest and most powerful LLMs are being built in the United States and China, and their corpora are overwhelmingly dominated by English and Chinese language texts. Although developers are optimistic that LLMs will be able to translate across languages based on minimal training text, mistranslation is still likely. We are also likely to see language dominance (often English) reinforced, and cultures distorted and even erased.

Let's return to the case of cochlear implants. Some deaf activists argue that they are an attempt by hearing medical professionals and parents to erase deaf culture (Ramsey, 2000). They fear CIs will lead to fewer students participating in deaf organizations where acculturation to deaf culture and visual language learning happen, and that a generation of deaf children will grow up without learning sign language and will struggle to access social or health care services. And all the while, they may not know that the problems are systemic and instead blame themselves.

The same is true for pulse oximeters: health care professionals trusted the technology, and perhaps also mistrusted the patients due to systemic biases (Fitzgerald & Hurst, 2017). Patients were left to manage the consequences without knowing that the seemingly objective technology had failed them. We anticipate similar outcomes with LLMs. They might produce biased text, or some communities will not be able to use them due to financial limitations, disability, or language barriers. But the technology

Finally, we know that building and training LLMs requires an enormous amount of computing power. Even using them requires access to a computer and high-speed internet; as LLMs become embedded in more areas of life, lack of access will deepen existing inequalities, but the cause will not be visible. In the US, for example, both infrastructure and architecture are largely built for cars. In all regions of the country, including urban, suburban, and rural areas, private cars are the only available method of transportation, but cars are inaccessible to large portions of the population. They are expensive to purchase and maintain, and 40% of people with disabilities in the U.S. cannot or do not drive (Bureau of Transportation Statistics, 2018). Meanwhile, services and markets are only accessible by car. But governments and businesses rarely acknowledge or do anything to address these inequities. As a result, these communities are not only further marginalized, but also alienated because their concerns seem rare and out of the mainstream (Schmitt, 2020). Both road design and transportation policy favor the speed and convenience of people driving private cars, and are less likely to support other forms of transportation or protect the safety of more vulnerable road users (Shill, 2020; Singer, 2022). Lack of access to a car, as well as being killed or injured by one, is treated as a personal shortcoming rather than as a societal failure. Lack of access to LLMs, as well as any negative impacts an LLM might have on a person's life, will similarly be blamed on the individual, rather than the systems that produced those impacts.

First, because they rely on historical texts, they are likely to reproduce systemic biases reflected in those texts. But these biases will not be clearly visible to LLM users because they will be reproduced by seemingly objective algorithms. Second, they are likely to reinforce already dominant cultures, while creating historically arrested caricatures of others. Finally, as they become ubiquitous, their limitations and errors will become less clear. Users will absorb the responsibility and blame, sometimes without even realizing it. This phenomenon is likely to be much more acute in marginalized populations.
Section 5: Remaking
Labor and Expertise
KEY POINTS
• LLMs will transform, rather than replace, most occupations. In most cases, humans will shift
to more complex and risky tasks.
• LLMs will transform authorship and associated standards for certification and evaluation.
for particular services or treatment plans. In market environments like the United States, some are even prepared to visit another physician to confirm their diagnosis and facilitate their proposed treatment. Thus far, the "new expert patient" has not led to massive deprofessionalization of medicine. However, the doctor-patient relationship has changed (Tan & Goonawardene, 2017; Broom, 2005). Rather than providing their clients with information, physicians are increasingly focusing on helping them manage and interpret it. This new role comes with new expectations, as it requires physicians to stay up to date on new medical research. And patients may make more demands for access to diagnostic, prevention, and treatment technologies. LLMs are likely to increase patients' access to biomedical knowledge. As more scientific research is incorporated

[Image credit: Sven Dowideit (CC BY SA 2.0)]

LLMs will also perform more mundane tasks and shift the risky work onto humans. In the early part of the 20th century, genetic services were only offered to the public through geneticists or genetic counselors who had extensive graduate training and worked in specialized centers. These experts work with families to trace family histories with particular diseases and then use this information to predict whether a disease might emerge in subsequent generations and advise how to avoid such circumstances. By the middle of the 20th century, new technologies such as chromosomal analysis assisted their work, but counselors still played the primary role in interpreting the results and guiding people through difficult decisions.

Later, companies began to offer genetic testing directly to consumers. In contrast to genetic counseling services available mostly at universities for relatively high prices, genetic tests could be ordered online for a much lower fee. And testing companies claimed greater accuracy than traditional services. Today, these direct-to-consumer genetic tests play a significant role in how people learn about their risk of disease. While specialized genetics clinics remain, they are small, unknown to many, and tend to focus on complex cases. Primary care physicians are often not able to answer

We can expect some affected workers to resist LLMs as well. Labor mobilization is likely to emerge first among unions, and then spread more widely. Workers, both unionized and not, have consistently opposed the implementation of automated checkout devices at grocery stores, arguing that they eliminate jobs and degrade working conditions (Harnisch, 2019). Some cashiers opposed the devices (Pilon, 2019), and the United Food and Commercial Workers (UFCW) developed public campaigns against Amazon's cashierless grocery store model (UFCW, 2020). Consumers have also refused to engage with automated POS systems in solidarity with grocery workers, as well as in opposition to the loss of tax revenue that results from replacing tax-paying human workers with machines (Harris, 2018). These fears of job loss and degradation are not unfounded; grocery stores in France used automated POS systems to circumvent national labor laws, keeping stores managed by machine check-out counters open past legal working hours (France24, 2019; Alderman, 2019). The case of POS automation

In addition to automation, there are several recent cases of workplace surveillance technologies prompting resistance from workers. In Amazon fulfillment centers, inventory scanners also track the workers; the scanners calculate the workers' efficiency, and the company has used software to track union activities in facilities (Del Rey & Ghaffary, 2020). In response, employees started using encrypted communication platforms to organize. Elsewhere, resistance is less coordinated: truck drivers have creative modes of resistance that are less visible (Levy, 2016). One trucker, for example, simply found a way to reprogram his vehicle's surveillance technology so that he could play solitaire. During the Covid-19 pandemic, when office workers shifted to remote work en masse, companies deployed tracking software on employee computers that captured when people were using their mouse or keyboard. In response, employees started using mouse jigglers, keyboard tappers, and special apps to trick the trackers (Cole, 2021). When bosses use technology to control their workers, workers use technology to evade control. In the next Section, we describe how this technological one-upmanship leads to a loss

or services that were previously limited by certification and professional gatekeeping, as in the case of direct-to-consumer genetic testing. Some will improve convenience for users at the expense of workers whose job or expertise becomes obsolete. Meanwhile, whole new forms of work and expertise will grow around the design, management, and use of LLMs. Those new roles will map onto existing corporate hierarchies rather than disrupting or overturning the current economic order. Even so, we expect that LLMs will exacerbate existing tensions between workers and corporations. Because of their flexibility, LLMs will function as both automation and surveillance technology, producing many areas for worker and consumer resistance.
Section 6: Increasing
Social Fragmentation
KEY POINTS
• Publics will use LLMs to gather information that aligns with their interests and values, and
ultimately challenge traditional expert authorities.
• LLMs will help outsider groups participate more actively in highly technical discussions
related to science, technology, medicine, and the economy.
• LLMs will be less useful to already marginalized groups, increasing their social alienation.
Thus far, we have described LLMs primarily as a workplace technology that can improve professional life. But we also expect that publics will find great use in these technologies. As we suggested in the last Section, patients may use LLMs to access technical information about their medical conditions, which will empower them in their interactions with physicians. Governments might use LLMs to extract insights from large volumes of public comments about a proposed regulation, as a step towards more politically legitimate policies.

But this movement towards public empowerment will likely have negative impacts for institutional trust and social cohesion. In recent years, the United States has seen declining trust in authoritative institutions, from the government to science (Kennedy et al., 2022). We also have less trust in one another (Rainie & Perrin, 2019). Citizens feel that decisions made by elite institutions do not reflect their knowledge and lived realities (Parthasarathy, 2017). Media fragmentation has contributed to this: citizens can now find information that fits with their needs and values. We expect that LLMs will accelerate this trend. Publics will solicit information that aligns with their interests and values, which may contradict expert knowledge authorized by, for example,
scientific, legal, and medical establishments. LLMs will also be a crucial information-finding tool for social justice groups traditionally excluded from technology and technology policy domains, which will continue to erode institutional legitimacy and accelerate social fragmentation. Meanwhile, LLMs will generate questions about authorship that will erode interpersonal trust. Overall, we expect that the least privileged groups will experience the greatest social alienation. These groups already feel ignored by authoritative institutions and, as we discuss earlier in this report, LLMs are likely to reproduce historical biases about, for example, the physiology, health, and lives of marginalized communities.

This dynamic has a long history. For centuries, Catholic priests held authority over the Bible's teachings because the text was in Latin. Worshippers could not access the Bible's teachings directly. German priest Martin Luther was frustrated that his fellow priests abused this power; for example, they claimed that the Bible endorsed the exchange of money for admittance to heaven (Edwards, 2004). Luther responded by translating the Bible into vernacular German and using the newly developed printing press to disseminate the text, and his critique of the Catholic Church, across the country. His actions triggered the decline of the Catholic Church and rise of Protestantism in Germany and other parts of Europe (Dickens, 1974).

Similarly, patients today use the internet and social media to challenge the knowledge
of physicians, presenting their concerns and the evidence they have uncovered (Murray et al., 2003). In some cases, following misleading online advice can cause physical harm (Bessell et al., 2003), as in the recent case of COVID-19 treatments (Mariana, 2020). Overall, because physicians no longer have exclusive access to medical knowledge, the internet is eroding some of their power. Patients now feel emboldened to ask questions, and sometimes get other medical opinions if they are unsatisfied with the first.

In Section 5, we discuss how LLMs are likely to reshape knowledge-based professions. Here, we conclude that in the aggregate this will also destabilize the cultural power of professionals. Developers may create apps that provide individuals with medical or legal advice, or scientific information. LLMs could go as far as generating contracts, drafting an amicus brief in a court case, or offering chat-based psychiatric services, thus providing individuals with direct access to some of the services that they would normally only access through experts and often for a significant fee. We expect that this will profoundly challenge social understandings of expertise and authority structures, and may even challenge certification systems. While authority figures will still be necessary for some tasks such as writing prescriptions, they will need to evolve to the changing information landscape.

LLMs as a mobilization tool

We also expect activists to use LLMs to challenge particularly technical areas of science, technology, and public policy. In recent decades, communities–particularly those that are low-income and historically disadvantaged–have become frustrated that science and technology do not reflect their needs and priorities, and have mobilized in response. But in order to influence decision making, they need to develop a technical understanding of the issue at hand and also translate their concerns into quantitative or scientific language (Parthasarathy, 2010). In the 1980s, fed up with the lack of research and treatments for AIDS, activists taught themselves immunology, microbiology, public health, and the science of clinical trials in order to translate their concerns to scientists (Epstein, 1996). They used this knowledge to force the National Institutes of Health (NIH) to fund more HIV/AIDS research, demand that the Food and Drug Administration expedite approval of potentially useful treatments, and change how biomedical scientists tested the effectiveness of drugs. A decade later breast cancer activists, similarly concerned about the scale and ferocity of the disease and what they perceived to be a weak scientific and government response, took a similar path (Dickersin et al., 2001). They set up special
workshops to train people with breast cancer about the science of the disease, diagnostics, treatments, and prevention measures, and then successfully lobbied the government to include these people on committees that reviewed applications for breast cancer research funding.

Environmental justice activists took a similar approach to air pollution (Ottinger, 2010). Ultimately, this influenced air pollution monitoring systems not only in the United States but around the world (Scott & Barnett, 2009). Similarly, patients have used technology to mobilize. Recently, long COVID sufferers used Twitter to identify themselves, find one another, and crowdsource symptoms and treatments months before scientists or physicians acknowledged that the condition existed (Callard & Perego, 2021). However, although they were pleased that the biomedical community finally began to notice, and Congress authorized a research program dedicated to the disease, they were frustrated that their own knowledge-gathering and expertise were quickly dismissed once authoritative figures stepped in.

As these examples suggest, publics will use LLMs to solicit information that fits with their interests and values, which means that neighbors might end up with very different understandings of the world. Of course, this phenomenon is not new. Benedict Anderson (1983) once famously argued that shared media helped to build “imagined communities”, but we are now seeing how the explosion of online media and cable news, by providing content according to the user's needs and priorities, can actually produce the opposite. While this diverse media landscape broadens the perspectives involved in public and policy discussion by allowing more people access to information and the ability to find communities where they are most comfortable, it also increases social fragmentation. We expect that LLMs will further erode the shared realities that were once constructed through common media consumption and ultimately, erode social trust.

The impact of the fragmentation of US television news provides a cautionary tale. For decades, Americans gathered around their television sets at 6pm to watch the evening news on one of three broadcast networks: NBC, ABC, or CBS. Executives at these networks had spent the day deciding which news was important to tell the American public. The rise of cable changed this. Cable news channels were successful, building loyal audiences over time. Fox News, in particular, became a favorite for conservative audiences and eventually integral to their identities (Hoewe et al., 2020). These channels don't just provide different perspectives on the same issues. They often report on different issues entirely, which gives their viewers different understandings of what is happening in the world and makes it difficult to maintain a shared reality, which contributes to political polarization (Gordon, 2000). In recent years frequent viewing of Fox News, for example, shaped attitudes towards building a wall on the US border with Mexico and government action regarding climate change (Hoewe et al., 2020).
We expect LLMs to erode shared realities even further. As noted above, unlike social media, LLMs will not even be able to build new communities. Meanwhile, developers will have an incentive to intentionally tune LLMs to generate text that is as attention-grabbing as possible, in order to capture more users.

In addition, as LLMs get better at writing text that is indistinguishable from something a human would write, they will raise new questions about trust and authenticity. For example, many schools and universities today use plagiarism detection technologies to prevent student cheating. One such service, Turnitin, compares submitted papers against a large database of existing texts; if a paper matches material in the database, it flags the paper for cheating. Students are required to submit their papers to Turnitin for plagiarism review before they go to the instructor for grading. However, this has triggered a technological arms race. A variety of services have emerged to help students cheat while evading detection by Turnitin, from websites full of how-to advice to paid essay writing services. They are all findable with a quick web search. We expect that, as LLMs spread, writers will find new ways to take advantage of LLM capabilities without detection. In the long run, this will foster cultures of suspicion.
LLMs' capacity to summarize and generate text will undoubtedly benefit users, offering mechanisms for empowerment and increased access to information. But the composition of the corpora, coupled with the priorities of developers, will likely result in a technology that is more useful for dominant groups than for communities that are already marginalized, because LLMs better reflect dominant groups' views and linguistic style. There is already evidence that LLMs reflect racial and other forms of social bias against marginalized populations (Abid et al., 2021; Greene, 2021). Meanwhile, privileged members of society are likely to have more opportunities to shape the ways LLMs are integrated in daily life. This, in turn, will create distance between social groups. The more that LLMs shape public and private sector services, the more marginalized communities risk being left behind.

Surveillance technologies offer a preview of these disparate impacts. For privileged consumers, surveillance is often marketed as a “service” (West, 2019). Scholars call this “luxury surveillance”: surveillance experienced as self-reflection, empowerment, or care for a select group (Gilliard & Golumbia, 2021). This framing focuses on white and wealthier members of society, who tend to view law enforcement as protective. While many individuals dislike surveillance, they may feel like they have nothing to hide and are in control of the technology (rather than the opposite). By contrast, people of color and other disadvantaged communities tend to experience “imposed surveillance”, such as Detroit's facial recognition technology program Project Greenlight, which is imposed on the local population. They tend not to trust police or surveillance technologies because they have a long legacy of being victimized by them.

Biometric technologies, which track location and bodily measurements such as heart rate, pulse, and sleep, have similar disparate impacts (Gilliard & Golumbia, 2021). Law enforcement officials have long used ankle monitors to keep track of individuals caught up in the criminal justice system, whether out on bail, on house arrest, or on parole. But the ankle monitor is quite similar to the FitBit or Apple Watch, except that the agency of the user differs. As we discuss in Section 4, even roads have had these kinds of impacts. In the 1950s, city planners across the United States used the emerging interstate highway system to segregate Black and white communities. This quickly became a crucial dividing line that allowed white neighborhoods to attract investment and feel protected, while their Black neighborhoods were isolated and economically starved (Miller, 2018).

If LLMs are similarly less useful to these communities, it may make it more difficult for these individuals to access crucial programs that might improve their lives. This is likely to increase social alienation of already marginalized communities. Thanks to the long legacy of racism in biomedicine, Black communities distrust both scientists and physicians. This has had serious impacts during the COVID-19 pandemic, for example, when the US Black community had a particularly low vaccination rate (Willis et al., 2021). Similarly, frustrated by a long history of media bias (Race Forward, 2014), Black Americans tend to trust media sources based on how they portray their racial group (Kilgo et al., 2020).
LLM CASE STUDY
Section 7: Transforming
the Scientific Landscape
KEY POINTS
• LLMs will transform both the kind of research scientists do, and how they do it.
• Academic publishers are likely to develop LLMs to maintain their monopoly power over most
scientific literature.
• Using LLMs to conduct scientific evaluation will generate controversy among scientists.
Throughout this report, we have anticipated the social, political, and equity implications if LLMs are adopted across a range of sectors. In this chapter, we examine how LLMs might transform one sector in particular: science. In this analysis, it is crucial to remember that the major LLMs currently under construction are based on corpora composed primarily of open access texts available online. But most recent research publications–particularly scientific journal articles–are owned by academic publishing companies such as Elsevier and JSTOR. Therefore, they are not part of these corpora. We expect that these publishers might develop their own LLMs that leverage their proprietary text databases, particularly at a moment when universities are frustrated by their high fees (Resnick & Belluz, 2019). These proprietary LLMs are likely to be of greatest interest to the scientific community because they will be the most up-to-date, in contrast to publicly available LLMs that may contain slightly older scientific knowledge. As they become more important to academic researchers, universities may be forced to maintain their subscriptions. Less likely is that academic publishers will sell their texts to the large companies for inclusion in their corpora, because it would make their texts essentially available to everyone.

In this new environment, LLMs will transform scientific practices, including authorship and citations. They may also transform peer review systems, which have increasingly come under scrutiny. LLMs will also reinforce Anglo-American dominance in science. While they may help some scientists from low and middle income countries participate more actively in the international scientific community and engage in cross-national collaboration, the English and Chinese language dominance of the corpora will limit efforts to “decolonize” science. Finally, LLMs will limit the power of the open access movement, as academic publishers are likely to have more resources than governments, non-profit organizations, and individuals to generate LLMs.

LLMs will transform scientific practices

Remaking scientific authorship and methods

Given their capacity to process and summarize huge amounts of text, we expect LLMs to have a profound impact on authorship and scientific methods as well as evaluation. As we describe in more detail below, researchers in non-English speaking countries are likely to use LLMs to more accurately translate texts or check their grammar or spelling. This might make it easier for them to publish in top journals, which are invariably published in English. Even English-dominant researchers might use LLMs to generate more generic parts of scientific texts, including materials and methods, and parts of introductions and conclusions. As we discuss in Section 5, we expect that these uses will trigger questions about rightful authorship.

We also expect LLMs to profoundly shape scientific practice. The development of particle accelerators in the 1930s allowed physicists to investigate the structure of the atomic nucleus, and more recently to investigate subatomic particles (Ishkhanov, 2012). The polymerase chain reaction technique, which makes millions of copies of small pieces of DNA, transformed genetics and biotechnology research and enabled mapping and sequencing the human genome, the study of ancient DNA, and gene manipulation including CRISPR gene editing (Rabinow, 2011). And the internet has already had profound impacts on research. It has made it easier for scholars to read research across fields, and thus promote interdisciplinary thinking (Herring, 2002). It has also helped researchers contact a wider array of potential subjects, whether for clinical trials or for surveys and interviews. Social scientists, for example, use email, social media, and even the “crowdworking” platform Mechanical Turk (MTurk) owned by Amazon to publicize their studies and recruit subjects. MTurk allows researchers to access a fairly representative population for a small fee (less than half of minimum wage) (Fort et al., 2011).

LLMs will similarly enable new forms of research, perhaps most notably in the humanities. Historians and scholars of English literature will be able to quickly generate summary information about historical texts or genres in the major corpora or new texts they wish to consider. However, scholars may be reticent to use these sources for two reasons. First, scholars accustomed to using archives and carefully documenting the provenance of texts are likely to be wary
of LLMs as data sources at least initially, because of the lack of transparency about the texts contained in the corpora and the inability to cite them specifically. Scholars and academic publications will likely have to develop conventions about whether and how LLMs are used and documented. Wikipedia, for example, has become an important source introducing scholars to a particular topic, but is generally not acceptable as a reference in serious scholarly work (Chen, 2009). Second, because corpora predominantly include dominant and privileged voices, they may be of less utility in fields that are increasingly trying to capture the perspectives and experiences of those who have been marginalized. Qualitative researchers currently analyze texts through manual coding, for example, but LLMs would allow them to analyze greater quantities of data or draw insights from data sources such as social media posts that were previously too large to consider as research sources. Psychologists and political scientists could use data from the corpora to assess public attitudes and concerns. Given academic pressures to publish (“or perish”), we expect the proliferation of articles identifying data correlations. However, without changing statistical methods, this could also increase the production of spurious data that cannot be reproduced.

Scientific Credit Systems will Change

Scientists identify the lineage of their interests, theories, and methods through citations. LLMs could obscure this lineage and ultimately reinforce existing biases in research fields. While LLMs currently do not have the technical capability to identify which text from the corpus informed the generated text, if a future LLM is able to provide
citations along with the text summaries, we expect it to privilege highly cited articles which are not likely to represent the field's diversity or its most novel findings. But in the more likely scenario, scientists might query an LLM about the prevailing knowledge related to a particular phenomenon and simply treat the output as general knowledge that doesn't need to be cited. Consider the recent controversy over sharing data about COVID-19 genomic variants. Western scientists advocated putting this information into an open database that could be used across the world, to facilitate quicker understanding of disease progression and development of prophylactics, diagnostics, and treatments (Van Noorden, 2021). However, scientists from Southern countries protested, arguing that the open approach would rob them of the opportunity to receive credit for their hard work identifying variants such as Omicron (Maxmen, 2021). They worried further that scientists from wealthy nations would publish papers based on–but not citing–their results, because they had the resources to do further analysis, write up their findings, and submit them for publication. More generally, they were frustrated that as soon as they had begun to build expertise and resources to participate in the transnational world of science, Western leaders seemed to be changing the game. Similarly, marginalized scientists might worry that LLMs will make it more difficult for them to receive credit and for their ideas to become recognized as part of a mainstream corpus of knowledge.

Transforming Peer Review

We also expect research funding agencies, scientific publishers and editors, and even patent systems to consider incorporating LLMs into their review processes. These institutions depend on technical experts to assess the novelty of a study or invention, the appropriateness of the methods, and the plausibility of findings. Invariably, these experts also advise researchers how to consider and address counterfactuals,
strengthen their claims or findings, or simply improve their writing. But peer reviewers are unpaid, and as academic pressures increase it is difficult to find good peer reviewers; editors say that they spend an enormous amount of time searching, and even then the reviewers may be uninformed, provide insufficient evaluation, or take too long and delay publication (Benos et al., 2007; Severin & Chataway, 2021). LLMs could solve many of these problems. Developers could create algorithms based on the backlists of all scholarly publications, or smaller ones targeted to a particular field or a particular journal, in order to identify high-quality publications and even advise authors how to improve their publications or fit better with the journal's standards. In fact, researchers have already begun to develop algorithms that claim to predict the grantability of patent applications, and even which patents are likely to be the most consequential (Candia & Uzzi, 2021). The next step would be to use LLMs to determine patentability, a particularly attractive option as patent offices struggle to hire and retain their personnel.

In the short term, editors might use LLMs as a half-measure, to help identify peer reviewers. They might ask the LLM: “who is an expert in X topic?” Editors have long used email and the internet in this way, which has allowed them to diversify their pool of reviewers. However, because LLM corpora are composed of historical texts, this use might actually eliminate the gains in reviewer and field diversity made in recent years. Unless the LLM is used very carefully, and with additional checks, this use could also affect a field's trajectory. An LLM might define reviewer expertise in terms of the number of citations in a particular journal (or set of journals), which may not represent a field's cutting edge.

If humans begin to use LLMs to conduct peer review itself, this could become a bigger problem. LLMs are likely to produce conservative peer reviews. We expect editors to use LLMs to scaffold parts of the peer review process–that is, to train the technology to look for particular elements in a paper, such as particular methods–to ensure quality reviews. However, this scaffolding could produce inflexible standards and slower recognition of truly novel results. It could also transform scientific practices. Consider the history of the IRB, in which narrow definitions of risk, benefit, and generalizable research have become hurdles for researchers (White, 2007). Or, educators in K-12 schools, who have increasingly had to twist their instructional strategies to accommodate standardized testing (Shelton & Brooks, 2019). Overall, LLMs might be good at evaluating papers in a field where the conventions, materials, and methods are well-established. However, it is hard to imagine how a corpus based on historical texts could adequately evaluate new and evolving science (Kuhn, 1962); we already know that this is a challenge for human reviewers (Pontis et al., 2017). As a result, widespread use of LLMs for primary peer review could limit creativity. It could also perpetuate biases against certain types of investigation, such as on structural racism or systemic inequality (Hoppe et al., 2019).
Scientific Evaluation by LLMs will Create Crises of Credibility

LLM-based scientific evaluation systems could also erode trust both within and beyond science. Today, peer review is the predominant form of scientific evaluation. Experts in a subfield review grant applications and scientific publications, and validate the ideas or findings as credible and worthy of funding or further circulation through scholarly journals or academic presses (Latour, 1987). Media outlets and governments often expect research to be peer reviewed before reporting on it or using it as the basis for policymaking. But this approach to evaluating scientific results is not natural or self-evident; it is the product of social negotiations and settlement. And it could certainly be otherwise. In the 17th century, wealthy gentlemen were assumed to be trustworthy–and producing credible scientific findings–because they were free from economic pressures (Shapin, 1995). They maintained their credibility by employing probabilistic discourse and minimizing precision, so as to avoid direct conflict with their peers. Scientists also trusted others' findings because they could witness the experiments themselves (Shapin & Schaffer, 1985). As the scientific enterprise grew, witnessing became “virtual”, through standardization of methods, research publications, and peer review (Baldwin, 2018). These changes, however, came from within the scientific community, invariably when they concluded that they needed to establish credibility among new audiences.

In fact, professional communities respond quite poorly to externally imposed evaluation systems, and these external impositions tend to be less successful when the community is powerful. For example, in 1836 the US Congress passed a law requiring the Patent Office to employ examiners with science and engineering backgrounds, to replace the clerks who had previously handled patent applications. It was concerned that the bureaucracy was issuing too many patents based on old, unoriginal, and non-workable ideas, and believed that highly trained technical experts would solve the problem (Swanson, 2009). However, when these new examiners applied scientific standards for novelty and nonobviousness, they found that very few applications should be granted. Patent agents and lawyers, who were accustomed to a bureaucracy that had only legal criteria for granting patents, protested vigorously and threatened that if no patents were granted, the fledgling US economy would fail. They were ultimately successful; Patent Office administrators negotiated with them.

Credit: Philadelphia College of Pharmacy and Science (CC BY 4.0)
Scientific communities have revisited their evaluation systems in the early days of a technology or in response to the publication of particularly controversial ideas. And if communities don't trust evaluation systems then they will challenge the institutions promoting them. Prescription drug recalls have engendered not only mistrust in the US Food and Drug Administration, but hesitancy towards prescription drugs themselves. Similarly, distrust in the US Centers for Disease Control and Prevention has exacerbated resistance to mask wearing and other protection measures during the COVID-19 pandemic.

Because an LLM will summarize scientific findings in accessible language, we expect that it will flatten important nuance, caveats, error rates, and uncertainties. This, we fear, will reinforce the illusion that scientific findings are objective, stanceless, value-free, and are generated with a view from nowhere. Ultimately, this could exacerbate public skepticism of science. We have seen this with previous forms of science communication. Science journalism, for example, tends to minimize what scholars call the “translational gap”: the amount of additional research needed before scientific findings can lead to better
medical practice (Summers-Trio et al.,
2019). Instead, they tend to overestimate
the importance of early stage studies. For
example, many early biomedical studies
are performed on mice. This can provide
general indicators about the safety or
effectiveness of a particular treatment,
or the shape of a particular phenomenon, but
mice are quite different physiologically than humans. However, media articles
still report these results with breathless excitement, creating false
expectations about the imminence of treatments and the power of science
(Chakradhar, 2019). Similarly, museums and other exhibitions such as World's
Fairs tend to produce idealized images of cultures and countries, reinforcing
distorted public understandings with real geopolitical consequences (Swift,
2019). We expect LLMs to reinforce a similarly idealized image of science,
which will leave publics bewildered and frustrated when they confront its
realities. Ultimately, this could exacerbate problems of public trust and
alienation, particularly among publics already questioning scientific findings
(Funk, Kennedy, & Tyson, 2020; Funk, Kennedy, & Johnson, 2020).

LLMs Will Hurt Open Access Movements

Finally, we expect LLMs to become another tool for academic publishing giants
to maintain their control over scientific knowledge. In recent years,
researchers have become increasingly concerned about how journal subscription
costs hurt access to knowledge. This, they argue, limits who can participate in
scientific knowledge production and, ultimately, the quality of science itself.
In response, universities are canceling huge journal subscriptions (Resnick &
Belluz, 2019). Researchers are sharing preprints on their own websites, or on
portals such as Sci-Hub and arXiv.org (Nicholas et al., 2019). They are
publishing in "open access" journals.

Journals may implement new forms of monetization by charging LLM developers
who use their university subscriptions to incorporate journal articles into
training corpora. But we believe that LLMs will increase the attractiveness of
Elsevier and other academic publishers themselves. Given their financial
resources and monopolies over huge volumes of scientific texts, publishers
could create their own LLMs for researchers and bundle them in their services
to academic institutions. They might even require universities to purchase all
of their journals in order to access their LLM. Indeed, companies frequently
leverage emerging technologies to maintain or enhance their monopoly power.
Monsanto spliced "terminator gene" technology into its genetically modified
crops in order to prevent them from replicating (Masood, 1998). This meant that
farmers could not replant their seeds after the growing season, which they had
done for hundreds of years. Similarly, academic publisher JSTOR, in conjunction
with MIT, used its internet surveillance capabilities to track down and stop
excessive downloads of journal articles it owned. The MIT student activist
Aaron Swartz downloaded these articles in order to promote their open access;
he was later criminally charged for this act and died by suicide (Schwartz,
2013).

Given the vitality of the open access movement, we expect scientists to resist
by creating grassroots LLMs. They might build on the work of non-profit
initiatives such as EleutherAI and rely on pro bono expertise and donated
preprints and other text to develop apps. Scientists made similar attempts to
gather data about disease-causing mutations in genes linked to breast and
ovarian cancer (known as the BRCA genes), to compete with biotechnology company
Myriad Genetics' virtual monopoly on BRCA gene testing in the United States
(Conley et al., 2014). Myriad used its testing monopoly to build a proprietary
database of information about the genomic variants discovered, their
association with disease, as well as individual and family health histories.
Even though it lost its US testing monopoly in 2013 after patients, physicians,
and scientists contested its patents (Parthasarathy, 2017), Myriad maintained
its intellectual property through this database; patients and physicians
preferred to use Myriad's testing service rather than others because the
database could provide better interpretations about the implications of the
genetic variants for disease. In order to build their alternative, scientists
had to rely on word of mouth from patients and physicians. This made it
virtually impossible to build a database as powerful or useful as Myriad's,
which in turn made it difficult to challenge the company's monopoly. We expect
scientists developing grassroots LLMs to confront similar challenges, even if
they have access to adequate technical expertise and financial resources.

Credit: Ernesto Del Aguila III, NHGRI

LLMs Will Reinforce Anglo-American Scientific Dominance

Like the telephone and the internet, LLMs may facilitate global scientific
communication and even cooperation. However, given the technology's capacity to
summarize and translate text, some may assume that it could facilitate real
international inclusion and even the "decolonization" of science. Consider how
the internet has changed science. Internet search engines, scientific
databases, and social media have helped scientists learn about and build upon
one another's work, regardless of where they are in the world. Email has
facilitated communication, allowing researchers to contact one another and even
collaborate despite living in different time zones or on distant continents.
Indeed, there is evidence that international scientific collaboration has
increased significantly in recent years, allowing scientists to share project
costs, gain access to expansive or unique physical resources, share more data,
and enhance creativity (Matthews et al., 2020). And yet, technology-mediated
collaboration has costs of its own. Whereas collaborations may once have
required scientists to visit laboratories for extended periods of time to learn
methods, now such collaborations can occur without any in-person contact. This
makes it much more difficult to transfer tacit knowledge, the intangible
scientific practices that are essential for proper collaboration (Collins,
1992). However, scientists may not be aware that this knowledge is lost.

Scientists could use LLMs to improve their English writing, to facilitate
journal publication. While scientists in former British or US colonies could
also use them to gain easier access to knowledge, they may still not have
access to the proprietary LLMs sold by academic publishing companies. Thus,
while LLMs may help some scientists in low and middle income countries, the
prevailing political economy of science is likely to prevent true mutual
learning and engagement.

…attention to another region beyond the large and prosperous city of Seoul
(Knowles, 2019, p. 207). Similarly, while both the United States and Soviet
Union focused on similar themes of technological progress and cultural
diversity in the 1958 World's Fair, the United States took a less serious
approach in order to downplay the perception of its strength and power during
the Cold War (Swift, 2019, p. 38). Nature, likewise, has always characterized
itself as a premier scientific journal that explicitly serves an international
community despite its British base. However, in its early decades it saw the
world through a British lens (Baldwin, 2015). Contributors adopted a
voyeuristic approach to foreign science, and often used it as a foil to comment
on national affairs.

The more common LLMs become as a scientific tool, the more they will reinforce
English as the lingua franca of science. This will likely also mean that the
values and concerns of the English-speaking world, particularly the United
States and Britain, will dominate global scientific priorities. Furthermore,
knowledge produced in English may be viewed as more generalizable than
knowledge produced in other languages. And yet, these political implications
may remain hidden because LLMs will be promoted as a technology that will be
able to truly globalize science.

In this section, we have explored the range of implications that LLMs will
have on scientific knowledge and practice. We expect LLMs to transform
scientific priorities and practices, and systems of authorship, credit, and
evaluation. This may produce crises of credibility, both within science and
beyond. It will also strengthen the power of scientific publishers, despite
growing frustration about their knowledge monopolies. Finally, while we are
hopeful that LLMs could facilitate international cooperation and inclusion, we
fear that this will not materialize unless the corpora become much more
diverse.
Policy Recommendations
LLMs have great potential to benefit society. However, the priorities of the current
development landscape make it difficult for the technology to achieve this goal.
Below, we articulate how both LLMs (the models themselves, corpora, and output)
and LLM-based apps must be regulated in order to maximize the public good. We also
recommend greater scrutiny of LLMs’ impacts on labor and the environment. Finally,
we recommend that the National Science Foundation (and similar science funding
agencies around the globe) invest more heavily in research related to LLMs and their
impacts, to balance attention in an area currently dominated by the private sector.
RECOMMENDATION 1
The US government must regulate LLMs, for example through the Federal Trade
Commission. This should include:
f. Requirement to label all LLM output as such and include information about
the developer.
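To make the labeling requirement concrete, such a label could be machine-readable metadata that travels with the generated text. The sketch below is purely illustrative; the field names are our own invention, not a mandated schema, and a regulator would still need to standardize the actual required fields.

```python
import json
from datetime import datetime, timezone

def label_llm_output(text: str, model_name: str, developer: str) -> str:
    """Wrap generated text with provenance metadata.

    The schema is hypothetical: it simply shows that a label can carry
    both a machine-generated flag and accountability information.
    """
    record = {
        "content": text,
        "generated_by_llm": True,   # explicit machine-generated flag
        "model": model_name,        # which LLM produced the text
        "developer": developer,     # who is accountable for the model
        "generated_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record)

labeled = label_llm_output("Sample output.", "example-llm-1", "Example Labs")
parsed = json.loads(labeled)
print(parsed["generated_by_llm"], parsed["developer"])
```

An app displaying this content could then surface the flag and developer information to end users, as the recommendation envisions.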
RECOMMENDATION 2
The US government must regulate all apps that use LLMs, for example through
the Federal Trade Commission, according to their use. The more consequential
the LLM output, the greater the regulatory scrutiny (e.g., LLM-based apps related
to criminal justice and patient care receive more extensive evaluation). Evaluation
should consider:
a. Whether app developers are using the right LLM for their needs.
RECOMMENDATION 3
Either a national or international standard-setting organization (e.g., the
National Institute of Standards and Technology, the International Organization
for Standardization) must publish yearly evaluations of LLMs. They should
assess: 1) diversity of the corpora; 2) performance; 3) transparency;
4) accuracy; 5) data security; and 6) bias towards marginalized communities.
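Criterion 1, diversity of the corpora, could be operationalized in many ways. One simple illustrative measure (a sketch, not a proposed standard) is the Shannon entropy of a corpus's language distribution: 0 bits when every document is in one language, log2(k) bits when k languages are equally represented.

```python
import math
from collections import Counter

def language_diversity(doc_languages: list[str]) -> float:
    """Shannon entropy (in bits) of a corpus's language distribution.

    0.0 means every document is in one language; log2(k) is the
    maximum for k languages represented equally.
    """
    counts = Counter(doc_languages)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A corpus dominated by English scores near zero bits...
print(round(language_diversity(["en"] * 98 + ["hi", "sw"]), 3))
# ...while an evenly mixed four-language corpus scores log2(4) = 2 bits.
print(language_diversity(["en", "hi", "sw", "yo"] * 25))  # 2.0
```

A real evaluation would of course need richer measures (dialects, domains, demographic representation), but even a crude statistic like this makes year-over-year comparisons auditable.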
RECOMMENDATION 4
The US government must enact comprehensive data privacy and security laws.
RECOMMENDATION 5
Under no circumstances should LLM-based apps deployed by the government
(e.g., chatbots that provide information about social services, pre-trial risk
assessment apps in criminal justice proceedings) harvest personally identifiable
information.
RECOMMENDATION 6
The agencies that regulate LLMs and LLM-based apps, agencies that incorporate
LLMs into their services, and all standard-setting bodies (e.g., the National
Institute of Standards and Technology) must employ full-time advisors on the
social and equity dimensions of technology. This "Chief Human Rights in Tech"
Officer would advise procurement and technology evaluation decisions, monitor
the technology once it is used and flag problems, and address disparate
impacts.
RECOMMENDATION 7
Both national and international intellectual property authorities (e.g., the US
Copyright Office, the World Intellectual Property Organization) must develop
clear rules about the copyright status of LLM-generated inventions and artistic
works.
RECOMMENDATION 8
All environmental assessments of new data centers must evaluate the impacts
on local utility prices, local marginalized communities, human rights in minerals
mining, and climate change.
RECOMMENDATION 9
The US government must work with other governments around the world
(perhaps under the auspices of the United Nations) to develop global labor
standards for tech work (including minerals mining).
RECOMMENDATION 10
The government must evaluate the health, safety, and psychological risks that
LLMs and other forms of artificial intelligence create for workers, e.g., reorienting
them towards more complex and often unsafe tasks. The Occupational Safety
and Health Administration can perform this role, but it will require new
regulations for workplace safety and an expansion of its purview to include
psychological risks.
RECOMMENDATION 11
The US government must develop a robust response to the job consolidation
that LLMs, and automation more generally, are likely to create. At a targeted
level, this should include job retraining programs; at a broader level, a
guaranteed basic income and universal health care.
RECOMMENDATION 12
The National Science Foundation must substantially increase its funding for LLM
development. This funding should prioritize:
d. Support research into building new types of models that are more easily
updated and maintained.
UNIVERSITY OF MICHIGAN TECHNOLOGY ASSESSMENT PROJECT, APRIL 2022
Developers’ Code
of Conduct
LLMs are likely to trigger profound social change. Both LLM and app developers
must recognize their public responsibilities and try to maximize the benefits of
these technologies while minimizing the risks. To do this, they should adhere to the
following practices:
• LLM developers should dedicate significant effort and resources to
maintaining and improving on existing LLMs rather than exclusively developing
new ones. LLMs must be kept up to date with changing language and sentiments.

• LLM developers should curate corpora with care. They should resist
appropriating already assembled bodies of text that were created for other
purposes. They should instead define standards their corpus needs to meet and
build a collection of texts with those standards in mind.

• LLM developers should prioritize research in the following areas:

  • Building models that are easily updated and maintained

  • Evaluating the fitness of a model for a particular task

  • Equity, social, and environmental impacts of LLMs

  • Understanding and explaining to end users the rationale behind LLM output
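The curation practice described above, declaring standards first and then admitting only texts that meet them, can be sketched as a simple inclusion filter. The specific criteria shown here (a consent flag, an allowed-language list, a minimum length) are hypothetical examples, not standards drawn from this report.

```python
from dataclasses import dataclass

@dataclass
class CorpusStandards:
    """Hypothetical inclusion criteria, declared before collection begins."""
    allowed_languages: set
    min_words: int
    require_consent: bool

def meets_standards(doc: dict, standards: CorpusStandards) -> bool:
    """Admit a candidate text only if it satisfies every declared standard."""
    if standards.require_consent and not doc.get("author_consented", False):
        return False
    if doc.get("language") not in standards.allowed_languages:
        return False
    return len(doc.get("text", "").split()) >= standards.min_words

standards = CorpusStandards({"en", "es"}, min_words=5, require_consent=True)
docs = [
    {"text": "Scraped text reused without permission from another project.",
     "language": "en", "author_consented": False},
    {"text": "A donated essay whose author agreed to corpus inclusion.",
     "language": "en", "author_consented": True},
]
corpus = [d for d in docs if meets_standards(d, standards)]
print(len(corpus))  # only the consented document is admitted
```

The point of the sketch is the ordering: the standards object exists before any text is collected, so every admission decision is explicit and auditable rather than inherited from a repurposed dataset.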
technology (i.e., science and technology studies scholars). This includes
systematic analysis of both positive and negative implications for marginalized
communities.

• App developers must label LLM-generated text as such.

Both LLM and App Developers

• Rather than creating a few general purpose LLMs and assuming they are ready
to be integrated into a variety of apps, LLMs should be designed and evaluated
for specific purposes. App and LLM developers should either work together or
take on both of these roles.

• Both LLM and app developers must support low income and marginalized
communities' capacity to drive development. This includes providing funding and
technical support so that community organizations can develop their own apps
and LLMs. In the process, developers must recognize that the trust of
marginalized communities is fragile, and can only be achieved through authentic
engagement and long-term relationships.

• LLM developers must be fully transparent about the limitations of their
technology, including in their discussions with app developers. App developers,
in turn, must not use LLMs to perform tasks they are not suited for.
Specifically:

  • LLMs should not be treated as a source of intelligence, since they were
  trained to model language, not understand the world. The fact that LLMs
  "know" some things about the world is coincidental.

  • Developers should build apps and deploy LLMs only in situations where
  up-to-date language patterns are not necessary; since LLMs are conservative,
  they replicate the past.

  • An LLM cannot speak for everyone. LLMs are universalizing; they favor
  dominant language patterns and flatten nuance, but language is diverse even
  within a single language. This means that even an LLM that appears to be
  "neutral" will serve members of the dominant group while alienating others.

• Both LLM and app developers should implement a complaint system for end
users and other stakeholders to document their negative experiences with an
LLM. Developers should be sympathetic and responsive to these concerns.
RECOMMENDATIONS FOR THE SCIENTIFIC COMMUNITY (CONTINUED)
• All publications that rely on LLMs for text analysis should provide detail
about the corpora and algorithms on which the results are based.
Acknowledgements
The authors would like to thank Shelby Pitts, Daniel Rivkin, and Nick Pfost for their assistance in
researching, revising, and producing this report.
The Technology Assessment Project is supported in part through a generous grant
from the Alfred P. Sloan Foundation (grant #G-2021-16769).
References
Abid, A., Farooqi, M., & Zou, J. (2021, July). Persistent anti-Muslim bias in
large language models. Proceedings of the 2021 AAAI/ACM Conference on AI,
Ethics, and Society (pp. 298-306). https://doi.org/10.1145/3461702.3462624

AI Now Institute. (2021, October 5). Democratize AI? How the proposed National
AI Research Resource falls short. AI Now.
https://medium.com/@AINowInstitute/democratize-ai-how-the-proposed-national-ai-research-resource-falls-short-96ae5f67ccfa

Alford, A. (2020, November 3). Large-scale multilingual AI models from Google,
Facebook, and Microsoft. InfoQ.
https://www.infoq.com/news/2020/11/multilingual-ai-models/

Allen, M. (2018, June 14). And the title of the largest data center in the
world and largest data center in the US goes to…. DataCenters.com.
https://www.datacenters.com/news/and-the-title-of-the-largest-data-center-in-the-world-and-largest-data-center-in
Benos, D. J., Bashari, E., Chaves, J. M., Gaggar, A., Kapoor, N., LaFrance, M.,
... & Zotov, A. (2007). The ups and downs of peer review. Advances in
Physiology Education, 31(2), 145-152. https://doi.org/10.1152/advan.00104.2006

Bijker, W., Hughes, T.P., & Pinch, T. (1987). The social construction of
technological systems: New directions in the sociology and history of
technology. MIT Press.
on the doctor/patient relationship. Health, 9(3), 319-338.

Brown, A., Parrish, W., and Speri, A. Expose inner workings of
"Surveillance-Industrial Complex". The Intercept.
https://theintercept.com/2017/06/03/standing-rock-documents-expose-inner-workings-of-surveillance-industrial-complex/

https://tdwi.org/articles/2018/10/29/biz-all-impact-of-equifax-data-breach.aspx

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... &
Amodei,

Buss, D., Rutherford, B., Stewart, J., Côté, G.E., Sebina-Zziwa, A., Kibombo,
R., Hinton, J., and Lebert, J. (2019, November). Gender and artisanal and
small-scale mining. The Extractive Industries and Society, 6(4).
https://www.sciencedirect.com/science/article/pii/S2214790X19301522

C., R. (2022). Almost 6 billion accounts affected in data breaches in 2021.

Cagle, S. (2019, July 8). 'Protesters as terrorists': growing number of states
turn anti-pipeline activism into a crime. The Guardian. https://www.theguardian.
Caplar, N., Tacchella, S., & Birrer, S. (2017). Quantitative evaluation of
gender bias in astronomical publications from citation counts. Nature
Astronomy, 1:141. https://www.nature.com/articles/s41550-017-0141

Chakradhar, S. (2019, April 15). It's just in mice! This scientist is calling
out hype. STAT News.
https://www.statnews.com/2019/04/15/in-mice-twitter-account-hype-science-reporting/

Chen, C.J. (2009). Art history: a guide.
https://doi.org/10.1108/01604950910971152

Chui, M., Manyika, J., and Miremadi, M.

Cohen, P. (2010, April 5). Indian tribes go in search of their lost languages.
The New York Times.
https://www.nytimes.com/2010/04/06/books/06language.html#:~:text=Of%20the%20more%20than%20300,reclamation%20efforts%20

Cole, S. (2021, December 8). Workers are using 'mouse movers' so they can use
the bathroom in peace. Vice. https://www.vice.

Collins, H. (1992). Changing order: Replication and induction in scientific
practice. University of Chicago Press.

Institute for Policy Studies.
https://ips-dc.org/wp-content/uploads/2020/10/Muzzling-
Cooper, A. (2019). Hear me out. Missouri Medicine, 116(6), 469-471.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6913847/

position.
https://www.consilium.europa.eu/en/press/press-releases/2018/05/25/copyright-rules-for-the-digital-environment-council-agrees-its-position/#

Data Center Map. (2022). Data Center Map.

Daws, R. (2020, October 28). Medical chatbot using OpenAI's GPT-3 told a fake
patient to kill themselves. AI News. https://artificialintelligence-news.

Day, T. (2017, August 1). Building data centers creates jobs. U.S. Chamber of
Commerce. https://www.uschamber.com/technology/building-data-centers-creates-jobs

of deep bidirectional transformers
Dickens, A. G. (1974). The German nation and Martin Luther.

Society and History, 26(4), 571-586.
Privacy implications of email tracking. Proceedings on Privacy Enhancing
Technologies, 2018(1), 109–126. https://doi.org/10.1515/popets-2018-0006

Eschrich, J., & Miller, C. (2021, March 12). Cities of Light: A Collection of
Solar Futures. Center for Science and the Imagination, Arizona State
University.
First Peoples Worldwide. (2019, March 14). New report finds increase of
violence coincides with oil boom. University of Colorado, Boulder.
https://www.colorado.edu/program/fpw/2019/03/14/new-report-finds-increase-violence-coincides-oil-boom

Fisher, E., Mahajan, R. L., & Mitcham, C. (2006). Midstream modulation of
technology: Governance from within. Bulletin of Science, Technology & Society,
26(2), 485-496. https://doi.org/10.1177/0270467606295402

Fitzgerald, C., & Hurst, S. (2017). Implicit bias in health care professionals:
a systematic review. BMC Medical Ethics, 18(19).
https://doi.org/10.1186/s12910-017-0179-8

Fort, K., Adda, G., & Cohen, K. B. (2011). Amazon Mechanical Turk: Gold mine or
coal mine? Computational Linguistics, 413-420.
https://hal.archives-ouvertes.fr/hal-00569450

Foster, M. W., Eisenbraun, A. J., & Carter,

France24. (2019, August 27). Protests erupt after French supermarket uses
automation to evade labour laws.
https://www.france24.com/en/20190827-protests-erupt-french-supermarket-automation-labour-laws-sunday-laws

Frankel, T.C., & Whoriskey, P. (2016, December 19). Tossed aside in the 'white
gold' rush. Washington Post.
https://www.washingtonpost.com/graphics/business/batteries/tossed-aside-in-the-lithium-rush/

Frynas, J.G. (2001). Corporate and state responses to anti-oil protests in the
Niger Delta. African Affairs, 100(398), 27–54.
http://www.jstor.org/stable/3518371

Funk, C., Kennedy, B., Johnson, C. (2020, May 21). Trust in medical scientists
has grown in U.S., but mainly among Democrats. Pew Research Center.
https://www.pewresearch.org/science/2020/05/21/trust-in-medical-scientists-has-grown-

Galligan, C., Rosenfeld, H., Kleinman, M.,
Hoppe, T., Foster, C., Phang, J., He, H., Thite,

air-plane-crash-pilots.html?partner=IFTTT
Grother, P., Ngan, M., & Hanaoka, K. (2019). Face recognition vendor test part
3: demographic effects. National Institute of Standards and Technology, 8280.
https://doi.org/10.6028/nist.ir.8280

Guadamuz, A. (2016). The monkey selfie: copyright lessons for originality in
photographs and internet jurisdiction. Internet Policy Review, 5(1).
https://doi.org/10.14763/2016.1.398

Guendelsberger, E. (2019, July 18). I worked at an Amazon fulfillment center;
they treat workers like robots. TIME.
https://time.com/5629233/amazon-warehouse-employee-treatment-robots/

Hao, K. (2020, December 16). 'I started crying': Inside Timnit Gebru's last
days at Google--and what happens next. MIT Technology Review.
https://www.technologyreview.com/2020/12/16/1014634/google-ai-ethics-lead-timnit-gebru-tells-story/

Hao, K. (2021, May 20). The race to understand the exhilarating, dangerous
world of language AI. MIT Technology Review.
https://www.technologyreview.com/2021/05/20/1025135/ai-large-language-models-bigscience-project/

Haranas, M. (2021, March 21). Microsoft to build new $200M data center as Azure
sales soar. CRN.
https://www.crn.com/news/data-center/microsoft-to-build-new-200m-data-center-as-azure-sales-soar
Hoppe, T. A., Litovitz, A., Willis, K. A., Meseroll, R. A., Perkins, M. J.,
Hutchins, B. I., Davis, A.F., Lauer, M.S., Valentine, H.A., and Santangelo, G.
M. (2019). Topic choice contributes to the lower rate of NIH awards to
African-American/black scientists. Science Advances, 5(10).
https://doi.org/10.1126/sciadv.aaw7238

Howell, J.D. (1995). Technology in the hospital: Transforming patient care in
the early twentieth century. Johns Hopkins University Press.

Hughes, T.P. (1983). Networks of power: Electrification in Western society,
1880-1930. Johns Hopkins University Press.

Hutchins, J. (2003). ALPAC: the (in)famous report. Readings in Machine
Translation, 14, 131-135.

Isberto, M. (2021, June 9). Are there benefits of a rural data center?
Colocation America. https://www.colocationamerica.com/blog/rural-data-center-benefits

Ishkhanov, B. S. (2012). The atomic nucleus. Moscow University Physics
Bulletin, 67(1), 1-24. https://doi.org/10.3103/S0027134912010092

Jemisin, N. K. (2011). The Trojan Girl. Weird Tales #357.
https://nkjemisin.com/2012/08/the-trojan-girl/

Kelly, H. (2021, November 19). For seniors using tech to age in place,
surveillance can be the price of independence. The Washington Post.
https://www.washingtonpost.com/technology/2021/11/19/seniors-smart-home-privacy/
Kilgo, D. K., Wilner, T., Masullo, G. M., & Bennett, L. K. (2020). News
distrust among Black Americans is a fixable problem. Center for Media
Engagement. https://mediaengagement.org/research/news-distrust-among-black-americans

Kulkarni, P., & K, C. N. (2021). Personally Identifiable Information (PII)
detection in the unstructured large text corpus using Natural Language
Processing and unsupervised learning technique. International Journal of
Advanced Computer Science and Applications, 12(9). https://
Matthews, K. R., Yang, E., Lewis, S. W., Vaidyanathan, B. R., & Gorman, M.
(2020). International scientific collaborative activities and barriers to them
in eight societies. Accountability in Research, 27(8), 477-495.
https://doi.org/10.1080/08989621.2020.1774373

Maxmen, A. (2021). Why some researchers oppose unrestricted sharing of
coronavirus genome data. Nature, 593(7858), 176-177.

Metz, C., & Wakabayashi, D. (2020, December 3). Google researcher says she was
fired over paper highlighting bias in A.I. The New York Times.
https://www.nytimes.com/2020/12/03/technology/google-researcher-timnit-gebru.html

Michelson, E. (2016). Assessing the societal implications of emerging
technologies: Anticipatory governance in practice. Routledge.

Miller, J. (2018, February 21). Roads to nowhere: How infrastructure built on
American inequality. The Guardian.
https://www.theguardian.com/cities/2018/feb/21/roads-nowhere-infrastructure-american-inequality

Monaghan, J., & Walby, K. (2017). Surveillance of environmental movements in
Canada: critical infrastructure protection and the petro-security apparatus.
Contemporary Justice Review, 20(1), 51-70.
https://doi.org/10.1080/10282580.2016.1262770
Moore, W.C. (1959). The questioned typewritten document. Minnesota Law Review,
2585. https://core.ac.uk/download/pdf/217208687.pdf

bostonreview.net/articles/amy-moran-thomas-pulse-oximeter/

More Perfect Union. (2021, November 9).

Morgan, T.P. (2020, February 15). The datacenter has an appetite for GPU
compute. an-appetite-for-gpu-compute/

Moss, S. (2021, October 21). Data center water usage remains hidden. Data
Center Dynamics.
https://www.datacenterdynamics.com/en/analysis/data-center-water-usage-remains-hidden/#:~:text=Direct%20water%20

Murphy, E. E., & Tong, J. (2019). The Racial Composition of Forensic DNA
Databases. California Law Review, 108(6).

R. (2003). The impact of health information on the internet on the
physician-patient relationship. https://doi.org/10.1001/archinte.163.14.1727
columbiacommunityconnection.com/the-dalles/google/data/centers/tax-abatements/city-council

Pilon, M. (2019, April 29). Stop & Shop strike reveals concerns about
job-killing technology. Hartford Business Journal.
https://www.hartfordbusiness.com/article/stop-shop-strike-reveals-concerns-about-job-killing-technology

Rabinow, P. (2011). Making PCR: A story of biotechnology. University of Chicago
Press.

Race Forward: The Center for Racial Justice Innovation. (2014, January 22).
Moving the race conversation forward.
https://www.raceforward.org/research/reports/moving-race-conversation-forward
Rainie, L., & Perrin, A. (2019, July 22). Key findings about Americans’ declining trust in government and each other. Pew Research Center. https://www.pewresearch.org/fact-tank/2019/07/22/key-findings-about-americans-declining-trust-in-government-and-each-other/

bring-new-jobs-to-small-towns/

Reardon, J., & Princeton University. (2005). Race to the finish: Identity

digitalcommons.wou.edu/theses/21/

Resnick, B., & Belluz, J. (2019, July 10). The war to free science: How librarians, pirates, and funders are liberating the world’s academic research from paywalls. Vox. https://www.vox.com/the-highlight/2019/6/3/18271538/open-access-elsevier-california-sci-hub-academic-paywalls

Robinson, R. (2021, November 16). Boeing built an unsafe plane, and blamed the pilots when it crashed. Bloomberg.

https://towardsdatascience.com/cant-access-gpt-3-here-s-gpt-j-its-open-source-cousin-8af86a638b11
Romero, A. (2021, June 5). GPT-3 scared you? Meet Wu Dao 2.0: A monster of 1.75 trillion parameters. Towards Data Science. https://towardsdatascience.com/gpt-3-scared-you-meet-wu-dao-2-0-a-monster-of-1-75-trillion-parameters-832cd83db484

Rousseau, A., Baudelaire, C., & Riera, K. (2020, October 27). Doctor GPT-3: Hype or reality? Nabla. https://www.nabla.com/blog/gpt-3/

Scareflow, A. [@vboykis]. (2020, August 2). NLP People: Anyone know what these two Books1 and Books2 data sources in GPT-3 are? Writing a newsletter about them... [Tweet]. Twitter. https://twitter.com/vboykis/status/1290030614410702848?lang=en

Schiermeier, Q. (2021). Forensic database challenged over ethics of DNA holdings. Nature, 594(7863), 320–322. https://doi.org/10.1038/d41586-021-01584-w
Selinger, E., & Durant, D. (2021). Amazon’s Ring: Surveillance as a Slippery Slope Service. Science as Culture, 1-15. https://doi.org/10.1080/09505431.2021.1983797

Selsky, A., & Valdes, M. (2021, October 25). Big tech data centers spark worry over scarce western water. Associated Press. https://apnews.com/article/technology-business-environment-and-nature-oregon-united-states-2385c62f1a87030d344261ef9c76ccda

Semuels, A. (2016, March 18). The role of highways in American poverty. The Atlantic. https://www.theatlantic.com/business/archive/2016/03/role-of-highways-in-american-poverty/474282/

Severin, A., & Chataway, J. (2021). Overburdening of peer reviewers: A multi-stakeholder perspective on causes and effects. Learned Publishing, 34(4), 537-546. https://doi.org/10.1002/leap.1392

Siddik, M.A.B., Shehabi, A., & Marston, L. (2021). The environmental footprint of data centers in the United States. Environmental Research Letters, 16. https://iopscience.iop.org/article/10.1088/1748-9326/abfba1/pdf

Simon, C. M., L’Heureux, J., Murray, J. C., Winokur, P., Weiner, G., Newbury, E., Shinkunas, L., & Zimmerman, B. (2011). Active choice but not too active: Public perspectives on biobank consent models. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 13(9), 821–831. https://doi.org/10.1097/GIM.0b013e31821d2f88

Simonite, T. (2021). What really happened when Google ousted Timnit Gebru? Wired. https://www.wired.com/story/google-timnit-gebru-ai-what-really-happened/
Singh, R., & Jackson, S. (2021). Seeing like an infrastructure: Low-resolution citizens and the Aadhaar identification project. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1-26. https://doi.org/10.1145/3476056

Sjoding, M. W., Dickson, R. P., Iwashyna, T. J., Gay, S. E., & Valley, T. S. (2020). Racial bias in pulse oximetry measurement. New England Journal of Medicine, 383(25), 2477-2478. https://www.nejm.org/doi/full/10.1056/nejmc2029240

Solaiman, I., & Dennison, C. (2021). Process for adapting language models to society (PALMS) with values-targeted datasets. Advances in Neural Information Processing Systems, 34.

Sovacool, B.K. (2021, March). When subterranean slavery supports sustainability transitions? Power, patriarchy, and child labor in artisanal Congolese cobalt mining. The Extractive Industries & Society, 8(1), 271-293. https://www.sciencedirect.com/science/article/pii/S2214790X20303154

Stanford CRFM. (n.d.). Developing and understanding responsible foundation models. Retrieved March 3, 2022, from https://crfm.stanford.edu/

Stanford HAI. (n.d.). Corporate members program. Retrieved March 3, 2022, from https://hai.stanford.edu/about/corporate-members-program

Stangeland, C. (2016, December). Fracking: Unintended consequences for local communities. Homeland Security Affairs. https://www.hsaj.org/articles/13753

Starr, P. (1982). The social transformation of American medicine. Basic Books.

Stern, J. (2021, May 28). Pipeline of violence: The oil industry and missing and murdered Indigenous women. Immigration and Human Rights Law Review. https://lawblogs.uc.edu/ihrlr/2021/05/28/pipeline-of-violence-the-oil-industry-and-missing-and-murdered-indigenous-women/#post-274-footnote-ref-6
Stilgoe, J., Owen, R., & Macnaghten, P. (2013). Developing a framework for responsible innovation. Research Policy, 42(9), 1568-1580. https://doi.org/10.1016/j.respol.2013.05.008

Summers-Trio, P., Hayes-Conroy, A., Singer, B., & Horwitz, R. I. (2019). Biology, biography, and the translational gap. Science Translational Medicine, 11(479). https://doi.org/10.1126/scitranslmed.aat7027
Terry, S. F., Terry, P. F., Rauen, K. A., Uitto, J., & Bercovitch, L. G. (2007). Advocacy groups as research organizations: The PXE International example. Nature Reviews Genetics, 8(2), 157-164. https://doi.org/10.1038/nrg1991

The British Psychological Society. (2017). Working with interpreters: Guidelines for psychologists. https://www.bps.org.uk/news-and-policy/working-interpreters-guidelines-psychologists

The City of Ann Arbor. (n.d.). Drinking water. Retrieved March 13, 2022, from https://www.a2gov.org/departments/systems-planning/planning-areas/water-resources/Pages/Drinking-Water.aspx

The Eye. (2020). Enter the Eye: An open directory data archive. https://the-eye.eu/

Hastings Center Report, 28(4), 6-14. https://doi.org/10.2307/3528607

Turing, A. M. (1950). Computing machinery and intelligence. Creative Computing, 6(1), 44-53.

U.S. Copyright Office Review Board. (2022, February 14). Re: Second Request for Reconsideration for Refusal to Register A Recent Entrance to Paradise (Correspondence ID 1-3ZPC6C3; SR # 1-7100387071) [Letter]. https://www.copyright.gov/rulings-filings/review-board/docs/a-recent-entrance-to-paradise.pdf

U.S. Geological Survey. (2018, June 8). Mining and water quality. USGS Water Science School. https://www.usgs.gov/special-topics/water-science-school/science/mining-and-water-quality
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems (pp. 5998-6008). https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

Welbl, J., Glaese, A., Uesato, J., Dathathri, S., Mellor, J., Hendricks, L. A., Anderson, K., Kohli, P., Coppin, B., & Huang, P. S. (2021). Challenges in detoxifying language models. arXiv. https://arxiv.org/abs/2109.07445

West, D. (2019). The Future of Work: Robots, AI, and Automation. Brookings Institution Press.

Wyden, R. (2022). Wyden, Booker and Clarke Introduce Algorithmic Accountability Act of 2022 To Require New Transparency And Accountability For Automated Decision Systems. Press release.

Zabel, J. (2019). The killer inside us: Law, ethics, and the forensic use of family genetics. Berkeley Journal of Criminal Law, 24(2). https://doi.org/10.15779/Z385D8NF7

Zeavin, H. (2021). The distance cure: A

Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., & Fidler, S. (2015). Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. Proceedings of the IEEE International Conference on Computer Vision (pp. 19-27).

Zou, Y., Mhaidli, A. H., McCall, A., & Schaub, F. (2018). “I’ve got nothing to lose”:
LEARN MORE
myumi.ch/LLMReport
(734) 764-0453
stpp.fordschool.umich.edu
[email protected]