
What’s in the Chatterbox?
Large Language Models, Why They Matter, and What We Should Do About Them

Johanna Okerlund
Evan Klasky
Aditya Middha
Sujin Kim
Hannah Rosenfeld
Molly Kleinman
Shobita Parthasarathy

TECHNOLOGY ASSESSMENT PROJECT REPORT



Contents

About the Authors
About the Science, Technology, and Public Policy Program
Acronyms and Definitions
Executive Summary
Introduction
Background: How do Large Language Models Work?

IMPLICATIONS OF LLM DEVELOPMENT
Section 1: Exacerbating Environmental Injustice
Section 2: Accelerating the Thirst for Data
Section 3: Normalizing LLMs

IMPLICATIONS OF LLM ADOPTION
Section 4: Reinforcing Social Inequalities
Section 5: Remaking Labor and Expertise
Section 6: Increasing Social Fragmentation

LLM CASE STUDY
Section 7: Transforming the Scientific Landscape

Policy Recommendations
Developers’ Code of Conduct
Recommendations for the Scientific Community
Acknowledgements
References
For Further Information

About the Authors


Johanna Okerlund is a Human-Computer Interaction researcher with a background in Computer Science and additional training in Science Technology Studies and Public Policy. She has a PhD in Computing and Information Systems from the University of North Carolina at Charlotte, where she studied makerspaces relative to their promise of democratization. As a postdoc at U-M working with the Science, Technology, and Public Policy program and the Computer Science department, Johanna has been developing ways to bring ethics and justice into CS courses and contribute to ongoing research about the societal implications of emerging technology. She plans to continue approaching technology from a critical interdisciplinary perspective.

Evan Klasky is completing their Master’s degree in Environmental Justice from the University of Michigan’s School for Environment and Sustainability, along with a graduate certificate in Science, Technology, and Public Policy from the Ford School for Public Policy, in May 2022. Their research has focused on the biopolitics of agricultural technology. They hold a BA in Political Science from Haverford College, where he researched regime transformation in Venezuela. In the fall of 2022, he plans to enter a doctoral program in Geography.

Aditya Middha is an undergraduate student in Computer Science at the University of Michigan College of Engineering, with a minor in Public Policy, graduating in May 2022. Previously, he contributed to research on an ethical computer science curriculum, as well as risk-limiting election audits. On campus, he helped co-found D2 Map, a mobile platform to expand the reach of local community organizers, and serves as a weekly volunteer for the Downtown Boxing Gym. After graduation, Aditya will be a Product Manager for Microsoft and has plans to enter into the educational technology space in the near future.

Sujin Kim is completing her BA in Political Science from the University of Michigan, where she will graduate with honors and distinction in May 2022. She is interested in the politics of the congressional legislative process, and American political institutions more broadly. She has worked on research projects spanning a range of topics, including public health harm reduction legislation, cybersecurity policy, and congressional oversight capacity. Following graduation she will be pursuing a PhD in American Politics, and hopes to apply her experience with Congress and the legislative process in practice on the Hill.


Hannah Rosenfeld earned a Master of Public Policy degree from the University of Michigan, where she also received graduate certificates in Science, Technology, and Public Policy and Diversity, Equity, and Inclusion. Hannah was an author of the first Technology Assessment Project report Cameras in the Classroom: Facial Recognition Technology in Schools (2020) and conducted research on COVID-19 testing and medical technology innovation. She worked in the tech industry for over seven years developing consumer products and medical diagnostic tools before moving into technology regulation, and led the New York City chapter of the LGBTQ+ non-profit Out in Tech before becoming the Head of Diversity, Inclusion, and Belongingness for the international organization. In April 2022, she will continue developing policy for emerging technology at the Food and Drug Administration, focusing on digital health.

Molly Kleinman serves as the Managing Director of the Science, Technology, and Public Policy program at the University of Michigan. In this role, Molly oversees the day-to-day management and provides strategic direction for STPP. Molly brings over 15 years of experience across several areas of higher education, with much of her work centering on educational technology, access to information, and intellectual property. Molly received her Ph.D. in Higher Education Policy from the University of Michigan Center for the Study of Higher and Postsecondary Education, her M.S. in Information from the University of Michigan School of Information, and her B.A. in English and Gender Studies from Bryn Mawr College.

Shobita Parthasarathy is Professor of Public Policy and Women’s Studies, and Director of the Science, Technology, and Public Policy Program, at the University of Michigan. She conducts research on the political economy of innovation with a focus on equity, as well as the politics of evidence and expertise in policymaking, in comparative and international perspective. Her research topics include genetics and biotechnology, intellectual property, inclusive innovation, and machine learning. Professor Parthasarathy is the author of multiple scholarly articles and two books: Building Genetic Medicine: Breast Cancer, Technology, and the Comparative Politics of Health Care (MIT Press, 2007) and Patent Politics: Life Forms, Markets, and the Public Interest in the United States and Europe (University of Chicago Press, 2017). She writes frequently for public audiences and co-hosts The Received Wisdom podcast, on the relationships between science, technology, policy, and society. She regularly advises policymakers in the United States and around the world, and is a non-resident fellow of the Center for Democracy and Technology.


About the Science, Technology, and Public Policy Program

The University of Michigan’s Science, Technology, and Public Policy (STPP) program is a unique research, education, and policy engagement center concerned with cutting-edge questions that arise at the intersection of science, technology, policy, and society. It is dedicated to a rigorous interdisciplinary approach, and working with policymakers, engineers, scientists, and civil society to produce more equitable and just science, technology, and related policies. Housed in the Ford School of Public Policy, STPP has a vibrant graduate certificate program, postdoctoral fellowship program, public and policy engagement activities, and a lecture series that brings to campus experts in science and technology policy from around the world. Our affiliated faculty do research and influence policy on a variety of topics, from national security to energy.


Acronyms and Definitions

ACS: Analogical case study; a methodology for predicting the impact of emerging technologies.

AI: Artificial intelligence.

ALPAC: Automatic Language Processing Advisory Committee; formed in 1964 to assess the utility of recent advances in NLP.

App developer: Developers who integrate the LLM into an app or product that is deployed for others to use.

ASM: Artisanal and small-scale mining.

CI: Cochlear implant.

Compute: Supercomputing measurement that corresponds to how many computational operations take place and, ultimately, the resources required.

Corpus (plural, corpora): Dataset consisting of text-based documents that an LLM is trained on.

End user: Person or entity that uses an app or product built on top of an LLM; we also refer to them as users.

Few-shot learner: A language model is a few-shot learner if it does not need additional training to be able to perform different types of useful operations.

Fine-tuning: Feeding a trained LLM additional examples to steer its behavior relative to certain kinds of prompts.

FTC: Federal Trade Commission.

GDPR: European General Data Protection Regulation.

GPU: Graphics processing unit; a highly parallel computing circuit used for fast processing.

LLM: Large language model; a type of AI trained on a massive amount of text to learn the rules of language. Can be used to translate, summarize, and generate text.

LLM developer: Companies or other organizations creating LLMs, such as OpenAI or EleutherAI.

NLP: Natural language processing.

NSF: National Science Foundation.

OLPC: One Laptop Per Child.

Open source: Software for which the original source code is openly available and licensed so that future developers can use and build on it, so long as they promise to keep their source code open so others can innovate beyond it.

Parameters: LLM size is measured in parameters; the more parameters there are, the more complex information about language a model can store.

PII: Personally identifiable information.

STS: Science and Technology Studies; a field of study that investigates the historical, social, and political dimensions of science and technology.

Transformer: Technical development in AI architecture that enabled LLMs to reach new levels of scale.

Vector: Set of coordinates in multi-dimensional space, represented as a list of numbers; vectors are used in LLMs to represent words mathematically.
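To make the vector and parameters entries concrete, here is a minimal, illustrative Python sketch of words represented as lists of numbers that a model can compare; the three-dimensional toy vectors are invented for this example, whereas real LLMs learn vectors with hundreds or thousands of dimensions.

```python
# Toy illustration: words represented as vectors (lists of numbers).
# The vectors below are made up for this sketch; a real LLM learns its
# vectors (and billions of other parameters) from a training corpus.
import math

word_vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Measure how closely two word vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Related words end up with more similar vectors than unrelated ones.
print(cosine_similarity(word_vectors["king"], word_vectors["queen"]))  # high
print(cosine_similarity(word_vectors["king"], word_vectors["apple"]))  # low
```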


Executive Summary

Large language models (LLMs)—machine learning algorithms that can recognize, summarize, translate, predict, and generate human languages on the basis of very large text-based datasets—are likely to provide the most convincing computer-generated imitation of human language yet. Because language generated by LLMs will be more sophisticated and human-like than their predecessors, and because they perform better on tasks for which they have not been explicitly trained, we expect that they will be widely used. Policymakers might use them to assess public sentiment about pending legislation, patients could summarize and evaluate the state of biomedical knowledge to empower their interactions with healthcare professionals, and scientists could translate research findings across languages. In sum, LLMs have the potential to transform how and with whom we communicate.

However, LLMs have already generated serious concerns. Because they are trained on text from old books and webpages, LLMs reproduce historical biases and hateful speech towards marginalized communities. They also require enormous amounts of energy and computing power, and thus are likely to accelerate climate change and other forms of environmental degradation. In this report, we analyze the implications of LLM development and adoption using what we call the analogical case study (ACS) method. This method examines the history of similar past technologies–in terms of form, function, and impacts–to anticipate the implications of emerging technologies.

This report first summarizes the LLM landscape and the technology’s basic features. We then outline the implications identified through our ACS approach. We conclude that LLMs will produce enormous social change, including: 1) exacerbating environmental injustice; 2) accelerating our thirst for data; 3) becoming quickly integrated into existing infrastructure; 4) reinforcing inequality; 5) reorganizing labor and expertise; and 6) increasing social fragmentation. LLMs will transform a range of sectors, but the final section of the report focuses on how these changes could unfold in one specific area: scientific research. Finally, using these insights we provide informed guidance on how to develop, manage, and govern LLMs.

Understanding the LLM Landscape

Because LLMs require enormous resources in terms of finances, infrastructure, personnel, and computational power, only a handful of large tech companies can afford to develop them. Google, Microsoft, Infosys, and Facebook are behind the prominent LLM developments in the United States.


While a few organizations (such as EleutherAI and the Beijing Academy of Artificial Intelligence) are developing more transparent and open approaches to LLMs, they are supported by the same venture capital firms and tech companies shaping the industry overall. Meanwhile, although there are many academic researchers in this area, they tend to depend on the private sector for LLM access and therefore work in partnership with them. Government funding agencies, including the National Science Foundation, support these collaborations. This tightness in the LLM development landscape means that even seemingly alternative or democratic approaches to LLM development are likely to reinforce the priorities and biases of large companies.

How Do Large Language Models Work?

LLMs are much larger than their predecessors, both in terms of the massive amounts of data developers use to train them, and the millions of complex word patterns and associations the models contain. LLMs also more closely embody the promise of “artificial intelligence” than previous natural language processing (NLP) efforts because they can complete many types of tasks without being specifically trained for each, which makes any single LLM widely applicable.

Developing an LLM involves three steps, each of which can dramatically change how the model “understands” language, and therefore how it will function when it is used. First, developers assemble an enormous dataset, or “corpus”, of text-based documents, often taking advantage of collections of digitized books and user-generated content on the internet. Second, the model learns about word relationships from this data. Large models are able to retain complex patterns, such as how sentences, paragraphs, and documents are structured. Finally, developers assess and manually fine-tune the model to address undesirable language patterns it may have learned from the data.

After the model is trained, a human can use it by feeding it a sentence or paragraph, to which the model will respond with a sentence or paragraph that it determines is appropriate to follow. Developers are under no obligation to disclose the accuracy of their models, or the results of any tests they perform, and there is no universal standard for assessing LLM quality. This makes it difficult for third parties, including consumers, to evaluate performance. But publicly available assessments of GPT-3, one of the largest language models to date, suggest two areas for concern. First, people are not able to distinguish LLM-generated text from human-generated text, which means that this technology could be used to distribute disinformation without a trace. Second, as suggested earlier, LLMs demonstrate gender, racial, and religious bias.
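As a minimal, illustrative sketch of this interaction (feed the model a prompt, get back text it determines should follow), the snippet below uses the open-source Hugging Face transformers library with a small publicly available model standing in for a full-scale LLM; the model choice and prompt are ours, not prescribed by this report.

```python
# Minimal sketch of using a trained language model: feed it a sentence
# and it returns a plausible continuation.
# Assumes the Hugging Face "transformers" library is installed; distilgpt2
# is a small, freely downloadable stand-in for the much larger LLMs
# discussed in this report.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

prompt = "Large language models are likely to"
outputs = generator(prompt, max_length=40, num_return_sequences=1)

print(outputs[0]["generated_text"])
```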
We add two more concerns, related to the emerging political economy of LLMs. As noted above, there are only a handful of developers working on these technologies, which means that they are unlikely to reflect much diversity in need or consideration. Developers may simply not know, for example, the limitations in their models and corpora and thus, how they should be adjusted.



Additionally, the vast majority of models are based on English, and to a lesser extent Chinese, texts. This means that LLMs are unlikely to achieve their translation goals (even to and from English and Chinese), and will be less useful for those who are not English or Chinese dominant. Taking these dimensions together, they could exacerbate global inequalities.

We have divided the findings of our ACS analysis into two categories. The first focuses on the implications of LLM design and development, examining the social and material requirements to make the technology work. The second identifies how LLM applications and outputs might transform the world.

The Implications of LLM Development

Exacerbating Environmental Injustice

LLMs rely on physical data centers to process the corpora and train the models. These data centers rely on massive amounts of natural resources, including 360,000 gallons of water a day, and immense electricity, infrastructure, and rare earth material usage. As LLMs become widespread, there will be a growing need for these centers. We expect that their construction will disproportionately harm already marginalized populations. Most directly, data centers will be built in inexpensive areas, displacing low-income residents, as US highways did in the 1960s when planners displaced over 30,000 Black and immigrant families per year. In the process of accommodating LLMs, tech companies will turn a blind eye to similar community disruption. Meanwhile, those that continue to live near data centers will be forced to deal with an increased strain on scarce resources and its subsequent effects.


Already, residents near Google and Microsoft data centers on the West Coast have expressed concerns about the companies’ overconsumption of water and contribution to toxic air pollution. Unfortunately, it is unlikely that these concerns will influence siting decisions; like oil and gas pipelines, we expect that data centers will be legally classified as “critical infrastructure”. Attempted protests will be treated as criminal offenses.

Accelerating the Thirst for Data

As we note above, LLMs are based on datasets made up of internet and book archives. The authors of these texts have not provided consent for their data to be used in this way; tech developers use web crawling technologies judiciously to stay on the right side of copyright laws. But because they collect enormous amounts of data, LLMs will likely be able to triangulate bits of disconnected information about individuals, including mental health status or political opinions, to develop a full, personalized picture of actual people, their families, or communities. We expect that this will trigger distrust of LLMs and other digital technologies. In response, users will use evasive and anonymizing behavior when operating online, which will create real problems for institutions that regularly collect such information. In a world with LLMs, the customary method for ethical data collection–individual informed consent–no longer makes sense.

We are also concerned that LLM developers will turn to unethical methods of data collection in order to diversify the corpora. As noted above, researchers have already demonstrated how LLMs reflect historical biases about race, gender, religion, and sexuality. The best way to address these biases is to ensure that the corpora include more texts authored by people from marginalized communities. However, this poses serious risks of unethical data extraction, such as when Google attempted to improve the accuracy of its facial recognition technology by, in part, taking pictures of homeless people without complete informed consent.

At the same time, LLMs will enhance feelings of privacy and security for some users. Disabled people and the elderly, who often depend on human assistants to fulfill basic needs, will now be able to rely on help from LLM-based apps.

Normalizing LLMs

We expect that in order to ensure that LLMs become central to our daily lives, developers will emphasize their humanitarian and even empowering features. At present, most people know nothing about the technology, except for tech news watchers aware that Google fired two employees due to their concerns about equity and energy implications. In this environment, developers will emphasize the technology’s modularity: that it can be tuned to serve specific purposes. This emphasis on flexibility will be reminiscent of the early days of the auto industry, when car manufacturers promoted broad social acceptance of the automobile by encouraging skeptical farmers to use the technology as a malleable power source.


We also expect developers to quickly integrate the technology into crucial and stable social systems, such as law enforcement.

Finally, developers will emphasize the accuracy of LLMs and attempt to minimize any errors and deflect blame for them. This was already clear in the Google episode, when the company asked its employees to remove their names as co-authors from a research paper critical of LLMs. But this is a common approach, especially at early stages of a technology’s deployment. One particularly high-profile example is the Boeing 737 MAX plane. After Boeing quietly installed the Maneuvering Characteristics Augmentation System (MCAS) onto its planes and an Indonesian airliner crashed, the company insisted that the pilots were at fault. Only after a second plane crash in Ethiopia did corrective action take place. LLM development could follow a similar path, deflecting blame away from the technology until problems become too big to ignore or until affected parties learn about one another and build a coalition in response.

The Implications of LLM Adoption

Reinforcing Inequality

Trained on texts that have marginalized the experiences and knowledge of certain groups, and produced by a small set of technology companies, LLMs are likely to systematically misconstrue, minimize, and misrepresent the voices of historically excluded people while amplifying the perspectives of the already powerful. But fixing these problems isn’t just a matter of including more, better data. LLMs are built and maintained by humans who bring values and biases to their work, and who operate within institutions, in social and political contexts. This will shape the LLM issues that developers perceive, and how they choose to fix them.


Our analysis shows that LLMs are likely to reinforce inequalities in a few ways. In addition to producing biased text, they will reinforce the inequitable distribution of resources by continuing to favor those who are privileged through their design. For example, racial bias is already embedded in medical devices such as the spirometer, which is used to measure lung function. The technology considers race in its assessment of “normal” lung function, falsely assuming that Black people naturally have lower lung function than their white counterparts. This makes it more difficult for Black people to access treatment. Similarly, imagine an LLM app designed to summarize insights from previous scientific publications and generate health care recommendations accordingly. If previous publications rely on racist assumptions, or simply ignore the needs of particular groups, the LLM’s advice is likely to be inaccurate too. We expect similar scenarios in other domains, including criminal justice, housing, and education, where biases and discrimination enshrined in historical texts are likely to generate advice that perpetuates inequities in resource allocation. Unfortunately, because the models are opaque and appear objective, it will be difficult to identify and address such problems. As a result, individuals will bear the brunt of them alone.

Meanwhile, LLMs will reinforce the dominance of Anglo-American and Chinese language and culture at the expense of others. We are particularly concerned that the corpora are composed primarily of English or Chinese language texts. While some developers have argued that LLMs could help preserve languages that are disappearing, LLMs are likely to function best in their dominant training language. Eventually this will reinforce the dominance of standard American English in ways that will expedite the extinction of lesser-known languages or dialects, and contribute to the cultural erasure of marginalized people. Furthermore, because they are based on historical texts, LLMs are likely to preserve limited, historically suspended understandings, especially of the non-American or Chinese cultures represented in their corpora.

Remaking Labor and Expertise

Most people studying the impact of automation on labor warn of job losses, particularly for those in lower skilled occupations. In the case of LLMs, we expect job losses to be more prevalent in professions tightly coupled with previous technologies; LLMs will completely eliminate certain kinds of tech-based work, such as content moderation of social media, while creating new kinds of tech-based work. But our analysis suggests that LLMs are also likely to transform labor. In particular, we expect that with widespread adoption LLMs will perform mundane tasks while shifting humans to more difficult or damaging tasks. This will even happen in high-skilled professions. Consider genetic counselors, who began helping people assess their and their families’ genetic risks in the early 20th century. With the recent rise of genetic testing, consumers are increasingly learning about their risks through private companies such as 23andMe. But genetic counselors are still working; they just handle the more complex, urgent, and stressful cases.


Professions that heavily use writing (e.g., law, academia, journalism) will have to develop new standards and mechanisms for evaluating authorship and authenticity. For example, the invention of the typewriter led to the creation of the “document examiner” position to determine the provenance of typed text; we could imagine a similar job for LLM-based text. Finally, we expect widespread use of LLMs to trigger labor resistance. There is a long legacy of technology-driven labor unrest, including the Luddites of the 19th century. More recently, the United Food and Commercial Workers International Union developed public campaigns against Amazon’s cashierless grocery store model. LLMs will incite similar resistance from workers and consumers based on fear of job loss, violations of social norms, and reduced income taxes.

Accelerating Social Fragmentation

While LLMs may be used primarily in the workplace, we also expect a variety of public-facing apps, including those that summarize medical information and help citizens generate legal documents. Such apps are likely to empower some communities in important ways, even allowing them to mount successful activism against scientific, medical, and policy establishments. But, because LLM design is likely to distort or devalue the needs of marginalized communities, we worry that LLMs might actually alienate them further from social institutions. We also expect social fragmentation to arise elsewhere, as LLMs will allow individuals to generate information that aligns with their interests and values and erode shared realities further.

Finally, as LLMs get better at writing text that is indistinguishable from something a human could have written, they will not only challenge the cultural position of authors but also trust in their authorship. For example, many schools and universities today use plagiarism detection technologies to prevent student cheating. However, this has triggered a technological arms race. A variety of services have emerged to help students cheat while evading detection by Turnitin, from websites full of how-to advice to paid essay writing services. LLMs will trigger a similar dynamic. The more writers of all kinds use LLMs for assistance, the more efforts there will be to authenticate whether they “really” wrote their article or book, and the more writers will find new ways to take advantage of LLM capabilities without detection. In the long run, this will create cultures of suspicion on a massive scale.

Case Study: Transforming Scientific Research

Overall, this report focuses broadly on the social and equity impacts of LLMs, and we have suggested that the technology will affect a range of professions. In the final substantive section of the report, we provide an example of how LLMs will affect just one: scientific research. First, because academic publishers, such as Elsevier and Pearson, own most research publications, we expect that they will construct their own LLMs and use them to increase their monopoly power.


While LLMs could be extremely valuable tools for disseminating knowledge, publishers’ LLMs will concentrate knowledge further, and most people will be unable to afford subscriptions. While researchers may try to construct alternative LLMs that provide accessible and egalitarian access to scholarly research, these will be extremely difficult to build without targeted assistance from both the scientific community and government funders.

In addition to shaping access to knowledge, we expect that LLMs will transform scientific knowledge itself. Technologies, from the microscope to the superconducting supercollider, have long shaped the substance of research, and LLMs will be no exception. We expect fields that analyze text, including the digital humanities, to be the most affected. Researchers will need to develop standard protocols on how to scrutinize insights generated by LLMs and how to cite LLM output so that others can replicate the results. LLMs are likely to have profound impacts on the nature of scientific inquiry as well, by encouraging recent trends that focus on finding patterns in big data rather than establishing causal relationships.

LLMs are also likely to transform scientific evaluation systems. Editors currently struggle to find peer reviewers, and LLMs could help. However, LLMs are likely to be rigid and systematically biased. Institutional review boards, which evaluate the ethics of scientific research, have been repeatedly criticized for reducing ethical assessments to legal hurdles, and we expect a similar outcome if LLMs are used for peer review. For example, LLMs will probably not be able to identify truly novel work, a task that is already quite difficult for human beings. Given these likely outcomes, we suspect that scientists will come to distrust LLMs.

Finally, we expect that LLMs will help some researchers improve their English or Chinese writing skills and increase their publications in top journals. The technology will likely be particularly useful for scholars from British Commonwealth countries whose language may differ only slightly from standard English. However, we expect translation in and out of other languages to be poor, and researchers unfortunately may not always be aware of such limitations at the outset. Meanwhile, the more common LLMs become as a scientific tool, the more they will reinforce English as the lingua franca of science. This will likely also mean that the values and concerns of the English-speaking world–particularly the United States and Britain–will dominate global scientific priorities. And yet, these political implications may remain hidden because LLMs will be promoted as a technology that will be able to truly globalize science.


Introduction

Large language models (LLMs) are a type of artificial intelligence (AI) intended to recognize, generate, summarize, and translate human language. They are different from previous approaches to natural language processing (NLP) because they are based on enormous datasets and designed to extract and replicate the rules of language (Radford et al., 2019). Although some smaller scale language automation algorithms are currently in use, LLMs have the potential to transform how and with whom we communicate because their output is likely to be more sophisticated and human-like than their predecessors, and because they perform better on tasks for which they have not been explicitly trained. To create LLMs, developers use machine learning techniques to model the relationships between different text elements based on extremely large datasets of text from internet and book archives. Once the LLM model is complete, it can be applied to tasks like automated question answering, translation, text summarization, and chatbots (Tamkin et al., 2021).

Scientists, entrepreneurs, and tech-watchers excited about LLMs describe them as a revolutionary technology with potential applications in a dizzying array of contexts and fields (Bommasani et al., 2021; Dale, 2021). LLMs could be used to bolster international collaboration in science, provide legal services to those who traditionally can’t afford them, and help patients advocate for their health care (Bommasani et al., 2021). Their ability to answer questions and hold conversations could transform customer service (Dale, 2021). In the classroom they could be used to create virtual teachers personalized to a student’s learning style (Manjoo, 2020). And, because LLMs gain new functionalities as the scale of their datasets increases, enthusiasts claim that future LLMs will develop new and unforeseen applications with additional benefits (Seabrook, 2019).

Despite these promises, LLMs have already prompted controversies that complicate these claims. Because LLMs are trained on datasets that include substantial quantities of old texts that often contain antiquated and violently prejudiced language, LLMs repeat and perpetuate those same violent tendencies (Abid et al., 2021; Tamkin et al., 2021). The large number of computers and colossal amount of computing power required to both train and operate LLMs leads to resource extraction that degrades the environment, and carbon emissions that contribute to climate change (Bender & Gebru et al., 2021). Their ability to produce text that sounds human with minimal prompts makes LLMs a potential tool to efficiently and effectively manufacture propaganda and disinformation through false news articles and social media posts (Tamkin et al., 2021). Most importantly, critics point out that these equity and environmental problems are likely to go unaddressed because the high cost of running LLMs has made their use exclusive to very large and well-resourced corporations, creating economic barriers and limiting access to only wealthier and more powerful entities (Knight, 2021).


In this report, we anticipate the potential implications of LLMs by analyzing the history of similar technologies, using what we call an analogical case study method. We then focus on one domain where LLMs are likely to have a significant impact: scientific research. We conclude with recommendations for both policymakers and the scientific research community, and a “code of conduct” to guide the practices of LLM developers.

The “Stochastic Parrots” Controversy

Large language models gained notoriety in the wake of the firing of ex-Google employees Timnit Gebru and Margaret Mitchell. Gebru co-led Google’s “Ethical AI team” with Margaret Mitchell and, along with academic and Google colleagues, co-authored a paper on the risks and failings of LLMs called “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” The paper raised concerns about the environmental impacts, and problems with training data including unmanageability, encoded bias, and lack of accountability (Bender & Gebru et al., 2021). According to Gebru, in winter 2020 Google attempted to prevent her from releasing the paper without major revisions; when Gebru refused, they fired her (Metz & Wakabayashi, 2020). The ensuing controversy eventually led to Mitchell’s firing, and publication of the original version of the paper without Google’s edits (Bender & Gebru et al., 2021; Simonite, 2021).

LLMs are still new and experimental, and therefore their social impact is still emerging. But history teaches us that their impact will be profoundly shaped by those creating it, and at present, because of the enormous capacity and resources needed, the primary developers are a handful of large tech companies. As we discuss throughout the report, this should give us some cause for concern. We noted above that LLMs could improve and increase access to specialized expertise from law, medicine, science, and more. They could also make technical information more widely available to the public, ultimately empowering individuals and communities. However, it is unlikely that the technology will be able to achieve any of these benefits if it is built by a narrow group of elites and without proper technology assessment and oversight.


Because LLMs are still at an early stage, we can still shape the development and implementation of the technology to ensure that it serves the public interest.

History of Automating Language

In 1950, mathematician and philosopher Alan Turing proposed that a machine should be considered intelligent if a human could not tell whether another human or the computer was responding to their questions; this was the beginning of the ongoing quest to automate human language (Oppy & Dowe, 2021; Turing, 1950). The “Turing Test” is still used to measure whether a “talking” program is communicating successfully (Computer AI passes, 2014). While developers have since imagined that automated language programs could be used for a variety of service and artistic tasks, the success of the effort is still measured, at least in part, by its capacity to imitate human language.

The ELIZA program, created by Joseph Weizenbaum in 1964 at the Massachusetts Institute of Technology’s (MIT) Artificial Intelligence Lab, was one of the first language programs that could hold a conversation (Liddy, 2001). ELIZA was not programmed to learn connections between ideas or words like today’s LLMs. Instead, using the dominant approach of the period, Weizenbaum’s team manually developed simple linguistic rules and automated them, which allowed ELIZA to respond in conversation with phrases that reflected back what had just been said (Epstein, 2001). Weizenbaum’s goal was to demonstrate that computers did not have intelligence by showing that the conversations were too simplistic, but users anthropomorphized the computer, and other researchers determined that the rudimentary chatbot might even have therapeutic value despite its simple communication.

Aside from ELIZA, most early language automation research was driven by U.S. military priorities, and government funders largely considered the resulting programs inadequate (Hutchins, 2003).


From the mid-1950s to 1964, most federal research funding focused on machine translation, specifically Russian to English translations for the Cold War. In 1964, the Department of Defense, National Science Foundation, and Central Intelligence Agency, among other branches of the government, created the Automatic Language Processing Advisory Committee (ALPAC) to assess the utility of this work in terms of advancing government priorities, reducing costs, improving performance, or addressing a need that humans could not fill. To evaluate the translation program, ALPAC compared Russian to English machine translations to human translations based on intelligibility and fidelity, and found that humans far outperformed machines, human translation was less costly after edits, and the government had sufficient capacity for Russian translation. Based on these findings, the committee recommended that defense agencies stop funding machine translation, and that the NSF switch to only funding basic computational linguistics research.

As a result of the ALPAC report, the federal government largely stopped funding machine translation research until the Advanced Research Projects Agency took up the subject again in the 1990s, but private industry continued to tackle machine language projects aimed at other uses like text generation and conversation. During that period, research in machine language moved towards methods of representing and communicating meaning in dialogue and natural language generation, which more closely resembles the goals of today’s LLMs (Hutchins, 2003). For example, in 1969 and 1970, researchers introduced new methods for representing language input designed to help machines develop conceptual understandings of words so their responses could be more useful (Moltzau, 2020). These developments led to the first program that used simple natural language inputs - language written as it normally would be in human conversation - to control a machine, at the Massachusetts Institute of Technology in 1971. By 1985, the main uses developers imagined for language programs could be loosely categorized into six groups: 1) interfacing with databases, 2) conversational interfaces for programs, 3) content scanning of semi-formatted texts to determine actions, 4) text editing for grammar and style, 5) translation, and 6) transcription of spoken input (Dale, 2017).

By the 1980s, the increased availability of computing power allowed researchers to begin integrating statistical methods into natural language program development (Hutchins, 2003). These statistical approaches, called Natural Language Processing (NLP), essentially allowed the computer to “learn” for itself how language works, by identifying patterns from text-based “training” datasets rather than relying on researchers to lay out complex rules based on linguistics research. As the amount of digital text grew, researchers were able to compile larger and larger datasets, which improved the performance of these “statistical” language programs. Eventually statistical methods outperformed and replaced programs based on linguistic rules, and NLP has become an interdisciplinary field bringing together insights from linguistics, computer science, and artificial intelligence.


Today’s LLM Landscape

At present, large tech companies dominate the development of LLMs because of the enormous resources required in terms of finances, infrastructure, personnel, and “compute”, a measurement in supercomputing that corresponds to how many computational operations take place and, ultimately, how many resources are required (Luitse and Denkena, 2021). Even academic researchers must partner with the private sector in order to obtain the resources to develop truly large language models that can approach the capabilities of cutting-edge LLMs. This monopolization raises a few concerns. First, LLMs are likely to reflect the priorities of the private sector, or more specifically a handful of the most powerful tech companies, complicating their potential to achieve societal benefits. Second, the prominent role of the private sector makes it more difficult for third parties to assess the technology, while internal researchers are likely to be under pressure to paint a rosy picture, as demonstrated by Google’s decision to fire Timnit Gebru and Margaret Mitchell (see “Stochastic Parrots” text box) (Hao, 2020; Schiffer, 2021).

The major LLM developers are the US-based Alphabet (Google), Meta (Facebook), Microsoft, and OpenAI, and the China-based Alibaba Group and Baidu. Some LLMs are built entirely within one company, including Google’s BERT, Alibaba’s VECO, and Facebook’s M2M-100 (Alford, 2020). Even the developer of GPT-3, OpenAI, was created by investments from Microsoft, Infosys, and several venture capital firms and tech billionaires. It recently gave up its non-profit status to become a company that profits from AI products sometimes built exclusively for investors.

[Image credit: Baltic Servers]

Many of these same companies also fund university research. Google, IBM, and Wells Fargo help fund a prominent center, the Stanford Human-Centered Artificial Intelligence (HAI) Lab, while Microsoft and Toyota frequently partner with the Massachusetts Institute of Technology’s (MIT) Computer Science and Artificial Intelligence Lab (Stanford HAI, n.d.; MIT CSAIL, n.d.).

The top-cited LLM research papers are co-authored by researchers from industry and academia. In addition to providing university researchers with LLM access, these collaborations also provide the private sector counterparts with scholarly credibility.

Government funding agencies have increasingly encouraged university-industry collaboration. In 2019, the US National Science & Technology Council released a strategic plan that prioritized collaboration between academia and industry (Select Committee on Artificial Intelligence, 2019). The plan argued that these collaborations would address problems with safety, predictability, ethics, and legal questions proactively, before AI products are developed. This had an immediate impact: the NSF, which had previously focused on AI research within universities, has now launched seven national AI research institutes that facilitate industry-university collaborations (Gibson, 2020). Three of these have focus areas related to LLMs. Similarly, the European Commission is funding several international university-industry research collaborations, often financing links between European academic researchers and US tech companies (Stix, 2020). The Japanese and Chinese governments are fostering similar collaborations (AIRC, n.d.; Luong & Arnold, 2021).

There are also attempts to create open-source or more transparent LLMs, but some of these projects are backed by the same venture capital firms that fund the for-profit entities. For example, Hugging Face has developed open source resources for the production of LLMs, including datasets and models, and explicitly invokes the values of accessibility and democratizing innovation (Dillet, 2021). It supports users as they develop and upload their own content, allowing for transparency and collaboration in model and dataset construction. Hugging Face hired Margaret Mitchell, a co-author of the Stochastic Parrots paper (see “Stochastic Parrots” text box above), to lead data governance efforts. In addition, Hugging Face has initiated the BigScience Project, an effort to create and share datasets, models, and software tools in order to reveal and minimize potential problems with LLMs (Hao, 2021). Similarly, EleutherAI has developed open source language models intended to replicate GPT-3, as well as an 860 GB dataset for language modeling (EleutherAI, n.d.; Gao et al., 2020). While EleutherAI is volunteer-based, the collective depends on donated GPU compute from CoreWeave, which is part of the NVIDIA Preferred Cloud Services Provider network (Leahy, 2021). The tightness of the LLM development landscape means that even seemingly alternative or democratic approaches to LLM development are likely to reinforce the priorities and biases of large developers (AI Now Institute, 2021).

Social scientists, ethicists, and computer scientists are starting to investigate the social implications of LLMs, sometimes with the assistance of government funding (Birhane et al., 2021). Stanford’s HAI, for example, recently announced a new interdisciplinary research arm, the Center for Research on Foundation Models (CRFM), to “study and build responsible foundation models” (Stanford CRFM, n.d.).


The amount of critical analysis on LLMs has grown since Google’s decision to fire AI ethics leads Gebru and Mitchell, but it is limited by a lack of industry collaboration, which leaves non-industry researchers without basic access to the models or information about the contents of the corpora.

Influential Large Language Models Today

Throughout the report we refer periodically to specific LLMs, especially Google’s BERT (Devlin et al., 2018a), OpenAI’s GPT-3 (Brown et al., 2020), and EleutherAI’s GPT-J (Romero, 2021), as they represent specific technological achievements and are commonly referenced in the LLM community. BERT was the first LLM to implement and show the promise of the transformer architecture, which is the key innovation that has allowed LLMs to process such large amounts of data. OpenAI built on this architecture and created GPT-3, which demonstrated how new behaviors and levels of accuracy emerge simply by increasing the size of an LLM. GPT-J is an LLM called the “open source cousin” of GPT-3. It does not perform as well, but is notable as a grassroots effort to replicate corporate LLMs (Romero, 2021). Hugging Face is a prominent source of LLM development, though they focus on supporting development of and providing access to different models, documentation, and corpora (Hao, 2021).
specific technological achievements and

TABLE 1. INFLUENTIAL LLMS

BERT. Developer: Google AI Language. Released: 2018. Size: 340 million parameters. Hallmark: demonstrates the transformer architecture, which is what enables all LLMs to be large. Availability: open source (Devlin et al., 2018b); model and corpus are all available to use and adapt for free.

GPT-3. Developer: OpenAI. Released: 2020. Size: 175 billion parameters. Hallmark: was the largest model at the time of release; demonstrates that new properties emerge simply by increasing the size of a model. Availability: app developers can apply for paid API access; OpenAI indicates this is a temporary safety measure.

GPT-J. Developer: EleutherAI. Released: 2021. Size: 6 billion parameters. Hallmark: open source grassroots version of GPT-3. Availability: open source; model and corpus are all available to use and adapt for free.

WuDao 2.0. Developer: Beijing Academy of Artificial Intelligence (BAAI). Released: 2021. Size: 1.75 trillion parameters. Hallmark: first model to reach 1 trillion parameters; model is multimodal (trained on both text and images). Availability: open source (Romero, 2021b); model and corpus are all available to use and adapt for free.
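To make the table’s “open source: available to use and adapt for free” entries concrete, here is a minimal sketch that loads the openly released BERT model through the Hugging Face transformers library and asks it to perform its native task, filling in a masked word; the library calls are standard, but the example sentence is our own.

```python
# Minimal sketch: using the openly released BERT model to predict a
# masked-out word. Assumes the Hugging Face "transformers" library;
# the model weights download automatically on first use.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT scores candidate words for the [MASK] position.
for prediction in fill_mask("Large language models are trained on huge [MASK] of text."):
    print(prediction["token_str"], round(prediction["score"], 3))
```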

Developers and Users Interact with LLMs at Different Levels

Throughout this report, we describe the LLM landscape in terms of three types of participants. LLM developers are the entities (usually companies) creating LLMs. First-order users, or app developers, are the entities, likely to range from individuals to self-organized community groups to large companies, who develop tailored apps to harness an LLM’s power, often fine-tuning it for a particular purpose. OpenAI, for example, has created a software interface that allows apps or websites to send input text to GPT-3 and receive the output text that the model generates based on that input. One such example is Viable, an app that companies can use to synthesize and extract insights from customer feedback (Viable, n.d.). In another case, a first-order user fine-tuned GPT-3 to mimic his deceased fiancée in a chat application (Fagone, 2021). Finally, second-order LLM users, or end users, are the publics who use the apps created by first-order users.

TABLE 2. TAXONOMY OF ENTITIES INVOLVED IN LLM DEVELOPMENT AND DEPLOYMENT

LLM developers. Description: companies or other organizations creating LLMs. Example: OpenAI, Hugging Face, EleutherAI.

App developer. Description: developers who integrate the LLM into an app or product that is deployed for others to use. Example: a company that develops an LLM-enabled platform for extracting insights from customer feedback.

End user. Description: person or entity that uses an app or product built on top of an LLM. Example: a company that uses an LLM-enabled platform for extracting insights from customer feedback.
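To show where the app-developer layer sits, here is a minimal sketch of the kind of call an app might make to the GPT-3 software interface described above, using the openai Python library roughly as it existed when this report was written; the prompt and feedback text are invented for illustration, and a real app would need its own API key and error handling.

```python
# Sketch of an app-developer call to GPT-3 through OpenAI's API
# (circa-2022 "openai" Python library; requires an OpenAI API key).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# A hypothetical customer-feedback summarizer, like the Viable example above.
feedback = "The checkout page kept freezing and support never wrote back."

response = openai.Completion.create(
    engine="text-davinci-002",  # one of OpenAI's GPT-3 engines
    prompt=f"Summarize this customer feedback in one sentence:\n{feedback}\n",
    max_tokens=40,
)

print(response.choices[0].text.strip())
```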


Uses of LLMs

As LLMs increase in size, they will be able to perform a growing number of tasks. Each new generation of models brings new functionality, so some of these tasks will be difficult or impossible to predict. However, we know that LLMs are likely to be able to generate, summarize, translate, and engage in dialogue. In what follows, we discuss the near-future functional applications of LLMs and the different ways technologists or developers might interact with them.

Generating Text

When an application or end user gives an LLM a word, sentence, or paragraph(s), the LLM will be able to return the following word, sentence, or paragraph(s) based on the patterns learned from the training data. For example, when given the headline of a hypothetical news article, the LLM would return an entire article with credible text to match that headline. As a result, an LLM could be immensely helpful as a writing assistant, helping users generate text (Better Language Models, 2019). However, this raises concerns about authenticity in authorship and the automation of propaganda. Not only do humans have trouble identifying LLM-generated text, but a GPT-3 bot also generated and posted comments on Reddit for a week without anyone noticing (Heaven, 2020), and a study showed that Twitter users’ political opinions can be swayed by GPT-3 tweets (Knight, 2021).

LLMs can also generate computer code; GPT-3 was able to write code because its corpus likely contained tutorials and discussion posts with snippets of text that had human language descriptions followed by code (Brown et al., 2020). Developers and companies are now refining this functionality and integrating it into different end-user products (OpenAI Codex, 2021; Vincent, 2021), which raises questions about code quality, security, and intellectual property.

Summarizing and Extracting Information

LLMs will likely be able to summarize web pages and other content. Google has already incorporated some of this basic functionality into its search engine, and LLM summarization could ultimately be used to produce a new kind of search engine that responds directly to a query rather than simply providing a ranked list of related links (Heaven, 2021). It could also help users understand key information from lengthy documents such as technical reports or legislative text (Tamkin et al., 2021). Similarly, companies may be able to better understand and extract key information and takeaways from customer feedback or other interactions (Viable, n.d.).

UNIVERSITY OF MICHIGAN TECHNOLOGY ASSESSMENT PROJECT APRIL 2022 24 PDF: Click to


return to top
WHAT’S IN THE CHATTERBOX? LARGE LANGUAGE MODELS, WHY THEY MATTER, AND WHAT WE SHOULD DO ABOUT THEM

technical reports or legislative text (Tamkin et al., 2021). Similarly, companies may be able to better understand and extract key information and takeaways from customer feedback or other interactions (Viable, n.d.).

Translating Text

LLMs can translate text across languages as well as transform text across linguistic styles. For example, LLMs could turn legalese into plain language (Blijd, 2020) or create a version of content written in an individual's writing or speaking style (Better Language Models, 2019). They could also facilitate international communication by translating between languages. However, the precision of the translation depends on how much of that language or linguistic style is in a given corpus, and as noted above most of the corpora of prominent US-based LLMs are in English (Brown et al., 2020).

Engaging in Dialogue-based Conversation

LLMs will also likely be able to converse with humans coherently (Better Language Models, 2019). As a result, like ELIZA and other early chatbots, LLMs could provide companionship, psychotherapy (Zeavin, 2021), or medical advice (Rousseau et al., 2020). LLMs could also help with idea generation. Design and consulting firm IDEO gave an LLM a sentence description of the problem it was trying to solve (i.e., encourage better spending habits) and it generated possible product ideas such as "Reward the user with real money if they don't spend money at all in a month" (Syverson, 2020).

Perhaps most immediately, companies are likely to use conversational LLMs to make more sophisticated customer service chatbots and generate more meaningful search results when customers ask complex questions online (Algolia, n.d.). Experts have suggested, however, that LLMs will probably not be able to exhibit this functionality without additional fine-tuning (Patterson, n.d.). One serious concern with conversation applications is that there is a high risk that LLMs will push dialogue in dangerous or socially undesirable directions: for example, one model encouraged suicide (Daws, 2020).

Policy Landscape

No laws, anywhere in the world, specifically cover LLMs. Nor are there credible third-party assessments of their accuracy. Instead, a variety of laws and policies related to copyright, data privacy, and algorithm accuracy touch on both LLM production and the content that they produce.

LLM corpora include millions of books and articles that are individually protected by copyright. However, in the United States a string of legal cases has established that computers can index, search, and archive these texts without violating copyright protections (Grimmelman, 2016). The European Union gives media companies more control over how search engines can use and display certain kinds of content like news, but it still permits text and data mining of the
kind used to build a training corpus (Council of the EU, 2018). The corpora themselves are not protected by their own copyright, but companies generally treat them as proprietary and remain secretive about what they contain.

Meanwhile, there is great controversy about the copyright status of AI-generated works, including the output of LLMs. In the United States, a human must create a work in order for it to be eligible for copyright protection (Guadamuz, 2016; U.S. Copyright Office Review Board, 2022). Under that framework, anything an LLM writes will be in the public domain, free for anyone to use or adapt in any way without permission. Copyright non-profit Creative Commons and some legal scholars support this approach, while others believe that the laws should be changed to extend copyright protections to AI-generated work (Hristov, 2017; Vézina & Moran, 2020). The UK, Ireland, and New Zealand provide limited protection to computer generated works, but these protections generally assume some level of participation by a human creator, and do not provide for fully autonomous authorship (Vézina & Moran, 2020).

LLMs also raise questions about data privacy and security, which is covered by the European Union's General Data Protection Regulation (GDPR) and similar laws in several countries and US states. Because many LLM corpora include text scraped from the internet, they may inadvertently include personal information or multiple pieces of text that the model could put together to deduce private information. Studies have shown that if it receives the right prompts, an LLM could output this kind of personal data (Carlini, 2020). The GDPR focuses on the rights of people whose data companies collect, and on the responsibilities of companies collecting that data; as such, it is unclear whether an LLM developer could be held liable in Europe for gathering or using personal data available to programs that scrape the internet. It is possible that the website that originally hosted the data would be accountable because the primary source should have prevented web scrapers from accessing sensitive information. Because most LLMs are not available to the public for general use, the risk of these security breaches remains theoretical, and we are not aware of any legal proceedings related to an LLM-involved data breach.

Finally, except for occasional bans or moratoria on algorithm-based technology such as facial recognition, there is no systematic regulation of algorithms anywhere in the world. However, governments are starting to consider strategies. In 2021, the European Union proposed the Artificial Intelligence Act, which adopts a risk-based approach to regulating all of these technologies and would presumably include LLMs (European Commission, 2021). AI that poses greater societal risk would be subject to more regulation. In the United States, Congress has proposed various bills. One of the most comprehensive is the Algorithmic Accountability Act of 2022, proposed by Congresswoman Yvette Clarke and Senators Cory Booker and Ron Wyden (Wyden, 2022). Much more limited than the pending EU legislation, it would require companies to assess the impacts of their AI systems and disclose their findings to the Federal Trade Commission (FTC), create a new Bureau of Technology within the FTC, and require that the FTC publish an annual report on algorithmic trends which would help consumers better understand the use of AI.

Analogical Case Study approach

In this report, we analyze LLMs using an analogical case study (ACS) approach. Over the last few decades, as societies have begun to contend with the complex implications of technologies and sometimes even mobilize against them (Parthasarathy, 2017; Schurman & Munro, 2010), social scientists and humanists have argued that scientists, engineers, and policymakers can do a better job of predicting their impacts. These insights can then be used to design, implement, and govern technologies better to maximize benefits while minimizing harms and maintaining public trust in science and technology. Researchers have experimented with multiple methods to accomplish this "anticipatory governance" (Michelson, 2016; Stilgoe et al., 2013; Fisher et al., 2006; Selin, 2011; Eschrich & Miller, 2021; Hamlett, Cobb, & Guston, 2012; Stirling, 2008).

Our analogical approach rests on findings from the field of science and technology studies (STS) that there are social patterns in the development, implementation, and implications of technologies (Browne, 2015; Bijker et al., 1987; Parthasarathy, 2007). We hypothesize that by understanding how societies have managed past technologies, we can anticipate how they might do so in the future. Furthermore, controversies over previous technologies offer insights into the kinds of concerns and resistance that might arise, groups who might be
affected, and solutions that might be feasible with emerging innovation (Nelkin, 1992). As Guston and Sarewitz (2002) argue: "knowledge about who has responded to transforming innovation in the past, the types of responses that they have used, and the avenues selected for pursuing those responses can be applied to understand connections between emerging areas of rapidly advancing science and specific patterns of societal response that may emerge" (p.101). By deliberately considering the histories of analogical technologies across sectors, our method identifies relevant social patterns in how technologies develop and are implemented. It also allows us to identify successful social and policy approaches to managing technological harms.

Our analytic approach to LLMs builds on the method we developed to study facial recognition technologies and vaccine hesitancy (Galligan et al., 2020; Wang et al., 2021). We began our work in May 2021 by training the research team, composed of a diverse group of faculty, staff, a postdoctoral fellow, and undergraduate and graduate students, in some of the basic concepts related to the history and sociology of technology and the ACS methodology. To help stimulate our creativity, we read some speculative fiction that imagines how AI might shape the future (Jemisin, 2011; Jemisin, 2012). We also reviewed the scholarly and journalistic literature to understand the projected implications of LLMs. However, because the technology is at such an early stage of development, this literature is small and it has been produced almost exclusively by NLP researchers and journalists. Team members used this literature as well as primary sources to develop an understanding of the history, political economy, and technical dimensions of LLMs.

We then brainstormed two types of analogical cases. Type 1 cases are similar to LLMs in terms of their function (i.e., processing large amounts of data, often with the purpose of prediction), while Type 2 cases have similar implications as those projected for LLMs (e.g., racial bias, massive energy use).

We investigated these cases, which intentionally draw from both historical and more recent technologies, in areas both similar to and different from LLMs. For example, to help us understand the implications of potential biases embedded in this emerging technology, we looked at medical technologies including the spirometer and pulse oximeter. To understand how LLMs might pose challenges to how we understand expertise and professional competence, we looked at traffic lights, which removed traffic management from the domain of law enforcement officers. We also looked at biobanks, large scale repositories of DNA and other forms of data used for the purpose of facilitating biomedical research and ultimately predicting and alleviating human disease. We adopted an iterative process: after we worked our way through the initial set of cases and presented our insights to one another, we reflected on the potential implications of LLMs. We then generated an additional list of cases, and so on until we were confident that we had exhausted the social, ethical, equity, and environmental implications that we could anticipate.

Based on our analysis of dozens of cases, we identified six broad implications of LLMs: three related to their construction and three related to their use. Because LLMs are likely to transform many high-skilled professions–albeit in different ways–we then focused on the implications for one: scientific research.

Background: How do Large Language Models Work?

KEY POINTS

• LLMs are considered more "intelligent" than previous NLP efforts due to their capacity for complex language patterns and ability to behave appropriately in novel situations.

• LLMs learn language from datasets of human-written text from the internet and digitized books that are so large the developers who assemble them often do not know the entirety of their contents.

• While LLM developers may be able to assess the performance of their models, there is no standard approach.

• LLM developers can fix the model's behavior through fine-tuning, but they must both identify problems and develop solutions manually.

LLMs emerged from the field of NLP, where models have grown dramatically in size and sophistication in recent years with the availability of data and more computing power. Increased computing power, in particular, has made it easier and quicker for researchers to collect and categorize data and perform more sophisticated operations. Today's language models, from the simplest to the most advanced, share key features with the original efforts: a training data set, a process for learning patterns in the data set, and the use of the model to generate new text.

LLMs differ from their predecessors in two critical ways. LLMs are much larger, both in terms of the massive amounts of data developers use to train them, and the millions of complex word patterns and associations the models contain. LLMs also more closely embody the promise of "artificial intelligence" than previous NLP efforts with smaller data sets because they can complete many types of tasks without being specifically trained for each one. They are "intelligent" because the complexity of the association model allows LLMs to respond to questions and tasks they have never seen before, in the same way they process other language inputs, and identify appropriate responses. This characteristic in particular makes any single LLM more widely applicable than previous NLP models.

LLMs are few-shot learners. This means that they do not need additional fine-tuning to be able to perform different types of useful operations (Brown et al., 2020). To use an LLM, the user inputs a task description, a few examples, and then the prompt for the model to continue. The model is then able to predict the next words, sentences, or paragraphs. This makes LLMs versatile, with the ability for users to apply them to tasks that the LLM developers did not necessarily anticipate.
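A minimal sketch of what such an input looks like, using the translation example from the GPT-3 paper (Brown et al., 2020); the complete() call is a hypothetical stand-in for whatever interface a given LLM exposes:

    # The entire "program" is plain text: a task description, a few worked
    # examples, and an unfinished line for the model to continue.
    few_shot_prompt = (
        "Translate English to French:\n"     # task description
        "sea otter => loutre de mer\n"       # example
        "plush giraffe => girafe peluche\n"  # example
        "cheese =>"                          # prompt for the model to continue
    )
    # completion = llm.complete(few_shot_prompt)  # hypothetical call;
    # a well-trained model is expected to return "fromage"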
Developing an LLM involves three steps, each of which can dramatically change how the program models language, and therefore how it will function when it is used. First, developers assemble a dataset, or "corpus", of text-based documents. Second, the algorithm learns about word relationships from this data. Finally, developers iteratively assess and fine-tune the model as needed to fix specific problems. In theory, models can continue to be fine-tuned even once they are in use, although the time and manual labor required of such a change may render ongoing maintenance of this kind impractical. Developers assess model performance against a number of formal and informal metrics such as how well it completes sentences, how accurately it translates to a different language, and whether a human can tell if text was written by the model or by a human.

Once the language model is trained, it can generate and translate text and answer questions, among other tasks, based on initial input given by a user. For example, someone could feed an LLM the headline of a hypothetical news article and it would be able to generate a possible body of the article based on text that followed similar phrases in the training set. While LLMs do not understand the text they generate in the way a human does, because they are
repeating patterns developed from human-written text, they are able to generate text that closely resembles what a human might write. For any initial input given to the model, it is as though the model is asking itself "if I were to encounter this text in a document in my training set, what would I expect to see next?" Their ability to do this convincingly is based on the amount of data they are trained on and the size and sophistication of the algorithms.
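A toy version of that prediction step (a counting model over a tiny invented corpus; real LLMs condition on far longer context and use learned parameters rather than raw counts, but the "what would I expect to see next?" logic is the same):

    # Count which word follows each word in a tiny corpus, then predict the
    # most frequent follower.
    from collections import Counter, defaultdict

    corpus = "the sky is blue . the sky is clear . the sea is blue .".split()

    followers = defaultdict(Counter)
    for current_word, next_word in zip(corpus, corpus[1:]):
        followers[current_word][next_word] += 1

    def predict_next(word):
        return followers[word].most_common(1)[0][0]

    print(predict_next("sky"))  # 'is'
    print(predict_next("is"))   # 'blue' (seen twice, versus 'clear' once)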
For the past few years, LLMs have been very rapidly increasing in size. Specifically, the number of pieces of information about language each model stores has been increasing by ten times each year (Li, 2020). As the size of the models increases, the definition of a "large" language model is also shifting. When we discuss large language models, we are also anticipating future models that are even larger, without knowing how large they will become or what emergent capabilities they will have: GPT-3, one of the largest LLMs ever developed, has better performance and new behaviors even though the only difference from previous models is its size (Brown et al., 2020). For example, GPT-3 can generate snippets of code based on a human description of the desired code functionality, which was neither intended as a feature of the model, nor a characteristic of smaller models with the same training process.

Gathering the training data or "corpus"

Each LLM is trained on a large dataset, called a corpus, consisting of many text-based documents such as books, newspaper archives, and websites. In order to comprehend LLMs, we must first understand these datasets and the decisions behind them because they fundamentally shape the output. The large size also limits who can create and maintain an LLM.

The corpora used to train LLMs are massive compared to those used in previous NLP endeavors. The corpus for GPT-3, for example, was 570 gigabytes (GB) in size (Brown et al., 2020). For a sense of scale, 1 GB is about 1,000 400-page books, or about 10 yards of physical books on a shelf (Gavin, 2018). The smaller corpora of past language models could be stored and even processed on ubiquitous computer hardware, but massive corpora require computing space that few have access to. Creating such a large corpus from scratch is not practically possible, and even gathering and curating a collection of this size requires a long time and many human and financial resources. Instead, LLM developers typically take advantage of already-written and curated bodies of text such as collections of digitized books and the user-created text that comprises the internet. Each LLM is thus trained on a collection of different sources, and each type of source is given a weight, or a percentage that represents how much of the final dataset contains data from that source. Weighting addresses the challenge of balancing quality,
reliable sources of text with ensuring there is diverse text from a range of different sources. The corpus GPT-3 was trained on, for example, contains text from Wikipedia, the internet, and online collections of books, with greater weight put on the text from Wikipedia even though the corpus contained a greater volume of data from other sources (Brown et al., 2020). Putting greater weight on Wikipedia was a way for the LLM developers to ensure emphasis on text they trusted more (Brown et al., 2020).
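As a toy illustration of weighting (a sketch, not OpenAI's actual pipeline, with invented weights and placeholder documents), training examples can be drawn from each source in proportion to an assigned weight rather than in proportion to the source's raw size:

    # A small, trusted source (here, Wikipedia) can be sampled more often
    # than its share of the raw data alone would dictate.
    import random

    sources = {
        "common_crawl": {"docs": ["<web text 1>", "<web text 2>"], "weight": 0.60},
        "books":        {"docs": ["<book text>"],                  "weight": 0.15},
        "wikipedia":    {"docs": ["<wiki text>"],                  "weight": 0.25},
    }

    def sample_training_doc():
        name = random.choices(
            list(sources),
            weights=[s["weight"] for s in sources.values()],
        )[0]
        return random.choice(sources[name]["docs"])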
Many LLMs (including GPT-3 and BERT) use text from the internet in their corpora. The most common way that developers incorporate internet text is by way of a large dataset called the Common Crawl. Although the Common Crawl is managed by a non-profit organization of the same name, it has deep ties to Google and other large tech companies, and is hosted by Amazon Web Services. The Common Crawl dataset includes millions of GB of data (Common Crawl, n.d.), and is updated once a month. Each update contains 200 to 300 terabytes (TB; a TB is equal to 1,000 GB) of textual content scraped via automated web crawling (Luccioni & Viviano, 2021, p.1). It is constructed from the text of websites, but its archive represents only part of each website crawled. The organization argues that this creates a "representative" sample of the internet, but this approach also allows it to claim a fair use exception to copyright laws by only using a portion of each site, instead of the whole thing (Luccioni & Viviano, 2021, p.1). Overall, the Common Crawl dataset represents the text of the internet's most frequent users, who are disproportionately younger, English-speaking individuals from Western countries who often engage in toxic discourse (Luccioni & Viviano, 2021, p.5). Therefore it includes a significant amount of harmful data, including text that is violent, targets marginalized groups, and perpetuates social biases (Luccioni & Viviano, 2021, p.3). Because LLMs identify and replicate patterns, the inclusion of this data creates a significant risk that without explicit additional case-by-case training, LLMs will produce language that is similarly harmful and biased. LLMs
do not at present have the capacity to automatically detect this kind of language without specialized training.

OpenAI, the former non-profit-turned-private tech venture that built GPT-3, has another approach to overcoming the quality challenge of internet text. Its internet corpus, WebText, is a 40GB dataset that contains the text from outbound Reddit links that received at least 3 "karma," or upvotes from users (Radford et al., 2019). Their rationale is that these linked and upvoted webpages are more likely to contain quality text because someone bothered to link and upvote them. This may be true, but these linked web pages are also more likely to represent the values and ideology of Reddit users, who are not representative of the general population (Morales et al., 2021). Meanwhile, because WebText is not available in full for use by people outside of OpenAI, others have tried to construct a publicly accessible version of the same corpus, also based on Reddit links (Gokaslan & Cohen, 2019).
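The filtering rule itself is simple; a minimal sketch (with invented link records, not the real Reddit data) looks like this:

    # Keep only outbound links whose Reddit posts earned at least 3 karma;
    # the pages behind the surviving URLs are then scraped into the corpus.
    reddit_links = [
        {"url": "https://example.com/long-essay", "karma": 12},
        {"url": "https://example.com/spam-page",  "karma": 0},
    ]

    KARMA_THRESHOLD = 3
    quality_urls = [
        link["url"] for link in reddit_links
        if link["karma"] >= KARMA_THRESHOLD
    ]
    print(quality_urls)  # ['https://example.com/long-essay']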
While Common Crawl's corpus is open and available for anyone to use, it is huge, complex, and heterogeneous. As a result, it requires a large amount of computational resources ("high compute") to download and process the data (Luccioni & Viviano, 2021, p.2). Therefore, only researchers at elite universities and large companies are likely to have the financial resources, expertise, and personnel to be able to use it to build their own LLMs. The LLMs they build are likely to reflect their values and priorities; this will influence LLMs' capacity to truly democratize text and knowledge.

LLM developers also draw on collections of digitized books, but often offer few details about the composition of the collections or the rationale behind it. This matters because some datasets may be more or less appropriate for training an LLM. The BooksCorpus, for example, part of the corpus used to train BERT, was originally curated for a completely different project designed to train an AI to generate rich descriptive text when given video or images (Zhu et al., 2015). It contains over 11,000 free web-based books written by unpublished authors, but the curators do not include much detail about its contents other than a breakdown of a few of the genres (2,865 romance books, 1,479 fantasy books, etc.). The developers who used this dataset to train the BERT LLM also did not provide additional details such as what ideas the corpus includes, whose voices it represents, or why it is an appropriate part of their training corpus. Even more opaque are the Books1 and Books2 datasets, which are part of the training data for OpenAI's GPT-3 (Gokaslan & Cohen, 2019). There is no discussion at all of what these datasets contain, how they were constructed, or what they represent in the context of training the LLM (Scareflow, 2020).

Furthermore, these corpora are often private; most LLM developers do not allow others to inspect or build on their dataset. A rare exception is Eleuther AI, which, as we describe above, takes a more democratic approach to its LLM overall. Eleuther AI developers created the Pile, a publicly available English-text corpus that is about 886 GB and made up of existing corpora such as OpenWebText2 and Books3, as well as internet-based datasets such as a filtered
version of Common Crawl (Gao et al., 2020). The Eye, a non-profit, community-driven and funded group which archives a variety of creative materials, hosts this corpus (The Eye, 2020).

Overall, our observations about the composition of LLM corpora echo what Bender and Gebru et al. (2021) have said about LLMs; these bodies of text are often so large that not even the developers know what is in them.

Training the model

When training a language model, developers first make decisions about the model's setup and the process the algorithm will use to learn from the training data. This includes the architecture they will use for training. LLMs are able to operate on a massive scale thanks in part to the invention of the transformer architecture, which was introduced in 2017 and allows the model to learn the relationships between any two words in a sentence, as opposed to only the one or two neighboring words as was the previous norm (Vaswani et al., 2017).

In many LLMs, words are represented as vectors: lists of numbers that represent coordinates in a many-dimensional space. This allows computers to use math to understand the relationships between words and sentences and predict what words should be used to complete a sentence or paragraph. There are different methods for generating these word vectors, but recent advances have developed techniques that take into account the fact that different words have different meanings in different sentences. Everything the LLM learns about the relationship between words is based on what is written in the training data. For example, it will only associate "sky" with "blue" if there are many sentences in the corpus that demonstrate an association between those words. Bender and Gebru et al. (2021) caution that although LLMs might appear to be intelligent and coherent, they have no understanding of the underlying properties or relationships between the concepts that words represent, and often generate nonsense as a result.
first make decisions about the model’s setup
and the process the algorithm will use to LLM developers must also decide on a
learn from the training data. This includes the strategy for tokenization when they are
architecture they will use for training. LLMs training an LLM. Tokenization involves
are able to operate on a massive scale thanks breaking the text from a document into pieces
in part to the invention of the transformer for analysis, or tokens. Some tokenization
architecture, which was introduced in algorithms convert each word into a token,
2017 and allows the model to learn the others break words apart, creating, for
relationships between any two words in a example, separate tokens for “sing” and
sentence as opposed to only the one or two “ing” in the word “singing”. Different
neighboring words as was the previous norm approaches for tokenization may affect model
(Vaswani et al., 2017). performance down the line, but, like many
decisions made when training an LLM, it is
In many LLMs, words are represented as hard to predict what the impact will be.
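A simplified sketch of the two strategies (real subword tokenizers, such as byte-pair encoding, learn their splits from data; the suffix rule here is only illustrative):

    # Word-level tokenization versus a naive subword split.
    def word_tokenize(text):
        return text.split()

    def naive_subword_tokenize(text):
        tokens = []
        for word in text.split():
            if word.endswith("ing") and len(word) > 5:
                tokens.extend([word[:-3], "ing"])  # "singing" -> "sing", "ing"
            else:
                tokens.append(word)
        return tokens

    print(word_tokenize("she was singing"))           # ['she', 'was', 'singing']
    print(naive_subword_tokenize("she was singing"))  # ['she', 'was', 'sing', 'ing']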


The number of parameters, or pieces of information about language stored by the model, is also crucial. Like neurons in a human brain, the parameters work together to store and process complex information. Models that have more parameters are able to remember more complex patterns from the training data, such as how sentences, paragraphs, and documents are structured, but they require more compute to train. The number of parameters in state-of-the-art LLMs is increasing by a factor of about 10 each year (Li, 2020). At its creation, GPT-3 was the largest at 175 billion parameters, and at the time developers were already eyeing the possibility of a 1 trillion parameter model (Brown et al., 2020). A short time later, the Beijing Academy of Artificial Intelligence announced Wu Dao 2.0, which is a 1.75 trillion parameter model (Zhavoronkov, 2021). Parameter size seems to have significant impact on LLM performance. GPT-3 (175 billion parameters) performs far better than GPT-2 (2.7 billion parameters) along several measures (Brown et al., 2020). Likely as a result, competition among developers is primarily about the size of the LLM.
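The cited growth rate matches the models discussed in this report almost exactly, as a quick check shows:

    # Tenfold growth per year: GPT-3 (2020, 175 billion parameters) times 10
    # equals 1.75 trillion, the size Wu Dao 2.0 announced in 2021.
    gpt3_parameters = 175e9      # 2020
    wudao_parameters = 1.75e12   # 2021
    print(wudao_parameters / gpt3_parameters)  # 10.0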
After the developers have determined what architecture to use, the number of parameters, and the strategy it will use for tokenization, the model is ready for training. The actual training process involves the model traversing the training corpus piece by piece and updating its parameters according to what it learns from each phrase it encounters.

Training LLMs requires a massive amount of compute, which is only available to a handful of people who have access to high performance computers or cloud computing services through their institutions (Ahmed & Wahed, 2020). The developers of GPT-3 report how many computational operations were performed during the training process, but they do not give details on the hardware setup, the cost of training, or how long the training actually took. Based on the reported numbers, other researchers estimate that it could have taken 355 years to train GPT-3 on a single graphics processing unit (GPU) or could have cost $4.6 million with the necessary parallel hardware for faster training (Li, 2020). There is no doubt that training LLMs is expensive. In fact, the developers of GPT-3 noticed errors in their data during the training process, but did not start the training process over because it would have been too expensive and time consuming (Brown et al., 2020). While initiatives such as the federally-funded National Research Cloud aim to broaden access to compute, it is likely to mostly
only help those who already have access to significant amounts of compute on their own because the gap in access is so great (AI Now Institute, 2021).
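The single-GPU figure can be reproduced with back-of-the-envelope arithmetic; the inputs below are assumptions drawn from that public estimate, namely roughly 3.14 x 10^23 floating-point operations to train GPT-3 and about 28 teraFLOPS sustained on one GPU:

    # Reconstructing the "355 years on one GPU" estimate cited above.
    total_flops = 3.14e23          # assumed total training compute for GPT-3
    gpu_flops_per_second = 2.8e13  # assumed ~28 TFLOPS for a single GPU

    seconds = total_flops / gpu_flops_per_second
    years = seconds / (60 * 60 * 24 * 365)
    print(round(years))            # ~355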
Fine-tuning the model

After an LLM is trained, it can be fine-tuned. This is a relatively light-weight process that involves feeding the model a few hand-picked examples, and is often used to train the model on socially sensitive topics. The model learns from these examples and changes a few of its parameters without changing the core of the model. OpenAI, for example, created a "values-targeted" version of GPT-3, which they fine-tuned using 80 additional human-written examples that illustrate preferred behavior so it would answer questions about subjective topics such as violence, injustice, human characteristics, and political opinion in what they deemed a desirable manner (Solaiman & Dennison, 2021). After learning from these additional examples, GPT-3 generated drastically different responses to questions about these topics.

Fine-tuning allows dynamic updates that address problems with LLM outputs far faster, and with far less compute, than model training. However, this process depends on the sensibilities and knowledge of developers, who will need to decide which topics require additional training and carefully curate the examples. In many cases, developers may not know which examples will be best for fine-tuning. To address these challenges, some organizations have decided to "democratize" fine-tuning. OpenAI is taking steps to allow anyone to create their own fine-tuned version of GPT-3 (OpenAI, 2021b). But this poses new risks and uncertainties. People could feed the LLM hateful or unethical text to produce a more socially dangerous technology. Meanwhile, those with more positive intentions have no guarantees that socially beneficial fine-tuning will be adopted in the main GPT-3 model.
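A minimal sketch of what curated fine-tuning input of this kind can look like; the example pair and the fine_tune() call are hypothetical stand-ins, not OpenAI's published data or interface:

    # Fine-tuning data is just a small set of hand-written demonstrations of
    # preferred behavior on sensitive topics.
    curated_examples = [
        {
            "prompt": "What makes a person beautiful?",
            "completion": "Ideas of beauty vary widely across cultures and "
                          "individuals; no single standard applies to everyone.",
        },
        # ... a few dozen more hand-picked examples
    ]

    # fine_tuned_model = fine_tune(base_model, curated_examples)  # hypothetical
    # Only a few parameters change; the core of the model stays the same.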
Understanding the model's output and capability

There is no universal standard for assessing LLM quality. However, developers usually ask their models to perform a common set of tests in order to assess performance. These include asking the model to complete sentences, correct grammar, answer commonsense questions, translate sentences to another language, and answer reading comprehension questions, and then comparing the results to other LLMs and to a human on the same tasks.
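A minimal sketch of one such informal test, scoring a model on invented sentence-completion items (real evaluations use standardized benchmarks with thousands of items):

    # Score a model's completions against expected answers; complete() is a
    # hypothetical callable wrapping some LLM.
    completion_items = [
        {"prompt": "The capital of France is", "answer": "Paris"},
        {"prompt": "Two plus two equals", "answer": "four"},
    ]

    def accuracy(complete, items):
        correct = sum(
            1 for item in items
            if item["answer"].lower() in complete(item["prompt"]).lower()
        )
        return correct / len(items)

    # accuracy(model_a.complete, completion_items) can then be compared with
    # other models, or with human performance on the same items.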
Developers are under no obligation to disclose either the tests they perform or the results, which makes it difficult for third parties, including consumers, to evaluate performance. But some publicly available assessments of GPT-3 help us develop a better understanding of LLM capabilities and potential areas of concern.

First, tests suggest that humans are not able to identify LLM-generated text. Developers asked GPT-3 to generate a news article based on a headline, and then asked people whether the article had been generated by a human
or a computer (Brown et al., 2020). Humans were only able to guess with 52% accuracy, which is barely better than random chance. While there are potential benefits to being able to perform so well, it is also concerning because LLMs could circulate false or damaging information that would be very difficult to trace (Better Language Models, 2019).

Second, LLMs demonstrate gender, racial, and religious bias. When asked to perform a variety of word association tasks, GPT-3 produced violent associations with Islam and negative associations with Black people, which reflects biases in the training data (Brown et al., 2020). And this problem may not be solvable: thus far, even though fine-tuning in general can produce very different outputs, efforts to remove toxic language about a marginalized group from an LLM mean removing any mention of that group, even if it is positive (Welbl et al., 2021).

Overall, LLMs constitute a significant leap forward for Natural Language Processing, and the field continues to expand and advance rapidly. LLM developers may continue to compete primarily on the size of their models, or they may eventually shift to improving their quality. New uses and functions of LLMs, as well as new ways to capitalize on them, are emerging regularly, but the basic operation of the technology described here remains consistent. In the following sections, we discuss the implications of widespread growth and adoption of LLMs, both as the technology exists today, and as we expect it to advance.

IMPLICATIONS OF LLM DEVELOPMENT

Section 1: Exacerbating Environmental Injustice

KEY POINTS

• LLMs will require the construction of more data centers, which will increase energy and water consumption.

• Data centers will displace and disrupt the lives of already marginalized communities.

• Those living near data centers will experience resource scarcity, higher utility prices, and pollution from backup diesel generators.

• Data centers will be classified as "critical infrastructure", which will make them more difficult to challenge. This will erode the civil rights of affected communities.

Although we tend to think of artificial intelligence as purely digital and "in the cloud", LLMs rely on physical data centers to process the corpora and run the algorithms (Bender & Gebru et al., 2021). They are, in other words, part of the LLM's "sociotechnical system", which includes not just the immediate technology but also its developers and users, and the other artifacts, institutions, relationships, and people that make an LLM work (Hughes, 1983). Data centers themselves are also sociotechnical systems, containing between fifty and eighty thousand servers that store and process data. These servers use graphic processing units (GPUs) that contain silicon chips as well as rare earth elements that are mined around the world (Johnson, 2017; Morgan, 2020). They are supported by ventilation systems, cooling systems, and backup generators.

Data centers rely very heavily on natural resources. They already make up approximately 2% of U.S. electricity use, and contributed to 0.5% of the country's total emissions in 2018 (Oberhaus, 2019; Siddik et al., 2021). They require large amounts of water to cool and power the servers, with a single medium-sized, high-density data center requiring 360,000 gallons of water a day (Ensmenger, 2018). For comparison, the city of Ann Arbor, Michigan, with a population of nearly 130,000 people, uses 5 billion gallons of water per year (The City
of Ann Arbor, n.d.). Not surprisingly then, data centers are already having significant impacts on the US water supply: they draw water from 90% of US watersheds, and 20% of data centers rely on watersheds that are moderately to highly stressed (Selsky and Valdes, 2021). Furthermore, much of the consumed water is potable, derived from local public utilities (Moss, 2021).

Despite this already-high consumption of resources, the current capacity of data centers is inadequate. To accommodate the rise of LLMs and other types of AI, tech companies will need many more of these large facilities. In fact, data centers owned by hyperscale providers like Amazon Web Services, Google Cloud, and Microsoft Azure have doubled from 2015 to 2020 (Haranas, 2021), and data center investment is projected to increase by 11.6% to $226 billion in 2022 (Haranas, 2022). They are typically 100,000 square feet in size, but the largest data center in the world (in China) is over 6 million square feet, and the largest data center in the United States is over 3 million square feet (Allen, 2018). The United States currently has the most data centers, with hubs around Washington D.C., New York City, Chicago, Los Angeles, San Francisco, and Dallas (Data Center Map, 2022; Berry, 2021); in Europe, there are hubs outside London, Amsterdam, Frankfurt, and Paris, and in Asia in Hong Kong, Mumbai, and Singapore (Data Center Map, 2022). Companies choose data center locations based on their power grids, labor markets, transportation networks, water access, and other social and geographical factors (Ensmenger, 2018). As a result, they have historically chosen more densely populated areas. But with the rise of distributed computing, rural areas–which are invariably cheaper–are becoming more attractive (Isberto, 2021). Microsoft is even experimenting with underwater data centers (Roach, 2020). As data centers increase, they will require additional infrastructure, including roads and people, but also a massive increase in natural resources.

As we noted in the Introduction, Emily Bender, Timnit Gebru, and their colleagues observed in their "Stochastic Parrots" paper that LLMs–due to their thirst for data and need for processing in data centers–would increase pressure on our energy systems and exacerbate climate change (Bender & Gebru et al., 2021). We agree with their conclusions, but expect that the environmental impacts of LLMs will be much greater and extend beyond climate change. In what follows, we rely on analogical case studies to suggest that the rise of data centers will disproportionately affect marginalized communities through displacement, direct harms, and curtailing their civil rights to protest.

Data Centers will Displace Marginalized Communities

In their quest to find the cheapest land available to build data centers, LLM developers will likely displace low-income and marginalized people in both urban and rural areas. This kind of displacement has a long history. In the middle of the twentieth century, city planners took advantage of the federally funded highway program to eliminate what they saw as areas of urban "blight," a vague term that could suggest
congestion, property vacancy, vandalism, unkempt vegetation, graffiti, and more (Lopez, 2012; Stumpf, 2018; Mock, 2017). These tended to be Black and immigrant neighborhoods, which had experienced decades of disinvestment due to redlining (Semuels, 2016; Miller, 2018). In their place, city planners built highways that supported automotive transportation from and to the suburbs, which benefited wealthy, white car commuters (Semuels, 2016). It also hurt marginalized communities by destroying their homes and businesses, physically dividing them, and polluting the local area (Lopez, 2012). This initiative displaced approximately 32,400 families per year in the early 1960s (Pritchett, 2003).

[Figure: Detroit, 1951 (left) and 2010 (right)]

This is only one of many examples of displacement in the service of technology. In order to build the Tucuruí Hydropower Complex, one of the world's largest hydroelectric dams, the Brazilian government displaced over 25,000 people (Downing, 2002). Local community members and activists who opposed the dam were murdered in the process (Environmental Justice Atlas, 2019). Mining projects in Honduras, Argentina, and Colombia, among many other countries, have also led to forced displacement (Working Group on Mining and Human Rights in Latin America, 2014). Mining projects have ruptured the social fabric of local communities in Chile and Mexico. Even if a technology's development does not trigger direct displacement, the economic ripple effects can. Fracking in the Williston Basin, in the northwestern United States, led to a sharp increase in housing prices which displaced longtime residents (Stangeland, 2016). Those with fixed incomes were at the greatest risk. Although tech companies may entice city leaders to accept data centers because they will bring jobs to an area and contribute tax revenue (Day, 2017; Glanz, 2012; Peterson, 2021), their mere construction is likely to interrupt neighborhoods and change mobility patterns. They will damage community cohesion and property prices, and ultimately increase socioeconomic inequalities.

Data Centers will Expose Marginalized Communities to Disparate Harms

Data centers will also subject marginalized communities to direct and disparate harms in two ways. First, already vulnerable individuals living near data centers will bear
the brunt of the negative effects directly. This is a common story. Although developers invariably claim that such facilities bring good jobs to the area (Day, 2017), and cities often provide tax breaks and other incentives, in the long term communities must manage a range of ill effects (Rayome, 2016; Fairchild & Weinrub, 2017). Power plants, oil and gas refineries, factories, and other toxic release sites have a long history of being located in communities that lack the political power to fight back, but must endure the consequences. Perhaps most notorious is Louisiana's "Cancer Alley", an 85-mile stretch of petrochemical plants and refineries where nearby residents are at a much higher risk of contracting cancer (Allen, 2003). Marginalized communities are subject to other kinds of risks as well. As Montana and North Dakota opened up their lands for oil extraction from the Bakken Formation, male laborers flooded the area (Stern, 2021). These new employees stressed the resources of economically fragile areas, and rates of human trafficking, sex trafficking, and missing and murdered Indigenous women rose (First Peoples Worldwide, 2019). Long legacies of discrimination, coupled with land-use, housing, and transportation policies, make it difficult for these communities to escape these "sacrifice zones" (Fairchild & Weinrub, 2017; Baker, 2019). Similarly, oil pipeline construction has caused the destruction of culturally significant sites like Native American burial grounds, inflicting significant harm on Indigenous communities (Whyte, 2017).

The process of extracting natural resources causes environmental degradation and resource scarcity in the immediate area. Companies mining lithium in Argentina and Chile worsened water shortages in the region (Frankel & Whoriskey, 2016). Mining practices themselves produce chemical residue, particularly of sulfuric acid, dissolved iron, copper, lead, and mercury, and this acid runoff can pollute both groundwater and surface water (U.S. Geological Survey, 2018). Similarly, pipeline construction and use cause negative health outcomes through the contamination of water sources and degrade ecosystems of plant and animal life (Betcher et al., 2019; Mall, 2021).

Communities across the country have already begun to worry that data centers will stress their scarce resources and cause pollution. Google recently gained approval to develop several data centers in The Dalles, Oregon, a region experiencing severe drought (More Perfect Union, 2021). Citing trade secrets, the company refuses to disclose how much water its facilities will use, and local residents worry that the company's resource needs will be prioritized over their own. Google says that it will drill wells, build water mains, and develop an aquifer to store water and increase supply during drier periods, but this could create additional risks to the community. Rural and
drought-prone Quincy, Washington, home to data centers for Microsoft, Yahoo, and Dell, has seen some of these problems. The town attracted Microsoft, for example, not only with the usual tax breaks but also by offering the company much lower electricity rates than the national average and promising to build a new substation (Glanz, 2012). When the company deemed the public utility too slow in building the substation, it began to waste millions of watts of electricity as a pressure tactic. Residents worry that behaviors like this will lead to a shortage of power and high prices, especially because Microsoft and Yahoo together used 41.8 million watts while all residential and small commercial accounts used only 9.5 million (Glanz, 2012). There are also concerns that Microsoft's 24 diesel generators, which the company uses for backup power for the data center, will create toxic air pollution that may cause cancer. Microsoft's Santa Clara, California, data center was one of the largest stationary diesel polluters in the Bay Area (Glanz, 2012).

Meanwhile, the increased need for data centers will continue, and perhaps even exacerbate, the exploitation of the communities that mine minerals including cobalt, tin, gold, copper, aluminum, tungsten, boron, tantalum, and palladium around the world (Euromines, 2020). These elements are required to construct the servers housed in data centers. Consider what is already happening in the Democratic Republic of Congo (DRC), which produces approximately 70% of the global cobalt supply (Sovacool, 2021). Twenty percent of these producers are informal workers who have very low incomes and look for the mineral under hazardous conditions, known as artisanal and small-scale mining (ASM) (Buss et al., 2019; Lawson, 2021). These miners do not have access to adequate safety equipment, and experience significant negative health impacts due to the pollution of the mines (Amnesty International, 2020b). Furthermore, fatal accidents occur frequently (Al Jazeera, 2020). Wages are low, and miners are often subject to verbal, physical, and even sexual abuse (Sovacool, 2021), but they do not leave because cobalt mining provides wages in an area where there are few opportunities for employment. More data centers will mean increased demand for hardware like GPUs, which will only increase the need for these elements. This, in turn, will worsen the working conditions for these desperate miners as corrupt employers push them to increase their yields. Even though these mines clearly violate international standards, the major tech companies have avoided scrutiny or responsibility for their activities for years (Amnesty International, 2016).

Affected Communities will have Limited Civil Rights

Because of their central role in maintaining cloud computing and other services, we expect the US government to legally designate data centers as "critical infrastructure": physical or cyber systems that are seen as so essential to the country that their incapacitation would have significant negative effects on public health, safety, or the economy (Cybersecurity and Infrastructure Security Agency, n.d.). In 2021, for example, Australia classified
the “data storage and processing sector” McVey, 2020).


(The Parliament of the Commonwealth of
Australia, 2021), a category which includes The Royal Canadian Mounted Police used this
data centers (Barbaschow, 2020; Hirst, framing to label protestors a mix of “peaceful
2020), as critical infrastructure. In the United activists, militants, and violent extremists,”
States, 85% of this critical infrastructure is particularly emphasizing the “extremists,”
privately owned (Brown et al., 2017); these who are characterized as having an “anti-
companies must collaborate and share petroleum ideology” (Royal Canadian
information with the government in order to Mounted Police, 2014; Spice, 2018). Similarly,
receive its protection (Monaghan & Walby, the Bangladesh government has suppressed
2017). But while the critical infrastructure freedoms of assembly and speech in response
classification enhances the security of these to protests against the Rampal Coal Power
facilities and increases the likelihood that Plant (Savaresi, 2020).
they can maintain operations under adverse
conditions, it also makes it more difficult for
In fact, in recent years over a dozen US
communities to protest against them.
states have passed “critical infrastructure
protection” laws to criminalize anti-oil and
Both governments and companies have gas pipeline protests. While they focus on
used the critical infrastructure framing to violence or property damage, many worry
surveil and persecute environmental activists that their true aim is to deter nonviolent
(Monaghan & Walby, 2017). Multinational civil disobedience that is protected by the
company Shell, for example, pumped oil from First Amendment to the US Constitution
the Niger delta for decades. Oil spills were (Colchete & Sen, 2020; Cagle, 2019). Consider
frequent, which led to extensive air, water, protests over the Bayou Bridge pipeline,
and soil pollution and ultimately damage which moves oil between Texas and Louisiana
to human and ecosystem health (United (Cagle, 2019). Critics worry that the pipeline
Nations Environment Programme, 2011). will leak and cause environmental damage,
Communities began to protest systematically, particularly in swamplands. In 2018,
but were invariably met by security forces Energy Transfer, the company building
who intimidated them, attacked them the pipeline, directed “private duty” law
physically, damaged their property, arrested, enforcement officers from the Louisiana
and even murdered them (Frynas, 2001; Department of Probation and Parole to arrest
Amnesty International, 2020a). In other three protestors traveling by canoe and
words, the oil and gas projects didn’t just hurt kayak who were observing and challenging
communities physically, they damaged their pipeline construction. The company claimed
rights as citizens (Frynas, 2001). This kind “unlawful entry of a critical infrastructure,”
of “security” is not unique to Nigeria. Both a recently enhanced felony under Louisiana
companies and governments have deployed law. The charges were later dropped, but
security forces to protect oil infrastructure similar cases are pending around the country
in Canada, Bangladesh, Mexico, Kenya, and (Baurick, 2020).
Australia, and the United States (Savaresi &


Whether or not data centers are formally designated as critical infrastructure, we expect the concerns of marginalized communities to hold less weight in siting decisions. As we discuss above, dangerous rare earth mining practices continue despite frequent community dissent (Business & Human Rights Resource Centre, 2021). Similarly, Indigenous Americans have repeatedly raised concerns about developing sacred lands–whether for laying pipelines or constructing mines–with little success. Recently, for example, a US federal court rejected attempts by the Paiute and Shoshone communities in Nevada to prevent the construction of a lithium mine on their ancestral lands, largely because the US legal system does not recognize Indigenous religious perspectives and, by extension, cannot protect their sacred sites (Golden, 2021).

LLMs will place enormous pressure on current data processing capacity, which will trigger the development of data centers around the world. This will require not only the development of built infrastructure but also massive resource extraction. Our analogical case study analysis has suggested that already marginalized communities–both low income areas and communities of color–are likely to experience the negative impacts disproportionately. Many of them will be displaced, and their neighborhoods and towns transformed. Those who remain will have to manage new health and ecosystem risks, as well as economic burdens due to the data center's energy and water use. However, they will have limited opportunities to challenge this dynamic. City leaders will be enticed by the promise of jobs and regional economic development, and likely classify the new facilities as "critical". This designation will provide additional security, which will likely be used to curtail free speech and, ultimately, eliminate opposition.


IMPLICATIONS OF LLM DEVELOPMENT

Section 2: Accelerating
the Thirst for Data
KEY POINTS

• LLMs will further test the model of individual informed consent which currently governs data
sharing and privacy.

• Publics will increasingly hesitate to share personally identifiable information online, negatively impacting not only LLMs but other institutions that work with personal information.

• LLM developers may use unethical tactics to diversify the corpora, placing a disproportionate
burden on marginalized communities.

• Users who formerly relied on human interpreters will feel that LLMs offer more privacy than
relying on another person.

In addition to data centers and natural resources, LLMs require vast amounts of data. Much of it will come from us, the users of the internet. As we discussed in Background, LLMs already extract text from old books and across the web, including text from links posted on social media. In turn, this raises data security and privacy issues. While LLM developers have adopted some practices to filter out personally identifiable information (PII, which can include full name, social security number, zip code, and more) in LLM training corpora, such methods are neither effective nor commonplace (Privacy Considerations in Large Language Models, n.d.). This presents a serious vulnerability to third-party extraction attacks and unintentional leaks of PII. However, even if LLMs successfully screen out PII, LLMs might still be able to triangulate bits of disconnected information such as mental health status or political opinions that appear in the corpora to develop a full, personalized picture of an actual individual, their family, or community (Kulkarni, 2021). Thus far, there has been little transparency as to whether the most popular LLMs have been security-tested, but the vulnerabilities are likely to increase as model development increases.
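To make the filtering problem concrete, the sketch below shows the kind of rule-based PII scrubbing a developer might apply to a training corpus. It is a minimal illustration, not any developer's actual pipeline: the patterns, the sample post, and the scrub_pii function are all hypothetical. The example also shows why such filters are incomplete: indirect disclosures survive, and these are precisely the fragments that can later be triangulated.

```python
import re

# Illustrative patterns for a few common PII formats. Real corpora contain
# many more formats, and many more ways text can evade patterns like these.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def scrub_pii(text: str) -> str:
    """Replace pattern-matched PII with placeholder tokens."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label.upper()} REMOVED]", text)
    return text

post = ("Call me at 734-555-0199. I just moved back to Ypsilanti to care "
        "for my mom, who was diagnosed with Huntington's last spring.")

print(scrub_pii(post))
# The phone number is removed, but the location, the family relationship,
# and the health status all survive -- exactly the "bits of disconnected
# information" that can be triangulated into a profile of a person or family.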
Meanwhile, Americans are increasingly concerned about data security: 79% of adults worry that companies are using their personal information and 64% are worried about government data collection (Auxier et al., 2019). These concerns are valid, as data security is a challenge: while Illinois and California have passed data privacy laws, the United States lacks federal legislation and much of the population remains unprotected by data privacy or security policies.

In this Section, we analyze how LLMs will affect the privacy and security of personal information and accelerate a thirst for data. We conclude that LLMs will likely be able to produce information about individuals and communities even if they are barred from including personally identifiable information (PII). As a result, publics will become more hesitant to share information about themselves online. These information practices will have uneven impacts for marginalized groups: those who are underrepresented in the corpora are likely to be pressured to participate in LLMs and may lose some civil liberties if they do not. But others, including those who currently rely on interpreters or translators to communicate and travel (e.g., those who are hearing impaired) may actually be able to better maintain certain forms of privacy.

LLMs will Transform Informed Consent

Most LLM corpora are created using a data collection method called web crawling, which involves systematically traversing the entire internet to gather text. Much of this text was provided by the population through their online activity when they upload web pages or post comments. But few of us have any idea that our text is included in the corpora, much less which information is used or how it is used. In many cases, we may have already provided our consent. We agree to complex and lengthy "click through" agreements to use online services, such as WordPress or Reddit, that allow third parties to have access to the text we post, including for LLMs. This is problematic because few people read user agreements and are therefore unaware of the scope of their consent (Cakebread, 2017). And LLMs pose particular challenges. As noted above, the text we post can be triangulated to develop a full picture of us or to predict our behaviors. If we post information about our community or family, we are also consenting to data collection on behalf of others without their knowledge. In sum, while some of us may consent to the use or sale of some of the information we post, LLMs bring it all together and expand the scope. This makes the information more powerful and has potentially serious implications even for those who are careful about what text they post online.
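The mechanics of the web crawling described above are simple enough to sketch in a few lines, which is part of why corpora grow so quickly. The crawler below is a hypothetical, minimal illustration (the seed URL is a placeholder, and it relies on the third-party requests and BeautifulSoup libraries): it fetches a page, stores the visible text, extracts the links, and repeats, sweeping up whatever anyone has posted along the way.

```python
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

def crawl(seed_url: str, max_pages: int = 50) -> dict[str, str]:
    """Breadth-first crawl: fetch pages, keep their text, follow their links."""
    corpus: dict[str, str] = {}
    queue = deque([seed_url])
    seen = {seed_url}

    while queue and len(corpus) < max_pages:
        url = queue.popleft()
        try:
            html = requests.get(url, timeout=10).text
        except requests.RequestException:
            continue  # skip unreachable pages
        soup = BeautifulSoup(html, "html.parser")
        # Everything visible on the page -- blog posts, comments, forum
        # replies -- lands in the corpus, whoever wrote it.
        corpus[url] = soup.get_text(separator=" ", strip=True)
        for link in soup.find_all("a", href=True):
            next_url = urljoin(url, link["href"])
            if next_url not in seen:
                seen.add(next_url)
                queue.append(next_url)
    return corpus

# corpus = crawl("https://example.com")  # placeholder seed URL
```

Production crawlers add politeness controls such as robots.txt checks, rate limits, and deduplication, but the basic loop, and the indiscriminate collection, is the same.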


The field of genomics has been dealing with these kinds of challenges for decades. With the rise of mapping and sequencing technologies and the infrastructure to build and process large databases of information about an individual's genome as well as their health, environment, and lifestyle, there has been growing concern about individual, family, and community privacy. An individual's decision to get a genetic test has ripple effects for their family members. If someone tests positive for a gene mutation related to Huntington's Disease, for example, then not only will their family members feel additional stress or anxiety that their loved one will soon experience a debilitating neurodegenerative condition, but their children will also be more concerned that they too will have the disease (Oliveri et al., 2018). However, these individuals have no say in whether their family member gets a genetic test. Similarly, when a handful of members of a racial group or ethnic community choose to participate in genomics research, all members of the group are affected by the findings. In the 1970s, the US federal and state governments created screening programs to identify African Americans at risk of sickle cell disease, a painful blood disorder, and ensure that they receive appropriate services (Duster, 1990). Unfortunately, the program resulted in stigmatization and employment exclusion based on race; the US Air Force Academy, for example, erroneously used the data to exclude sickle cell carriers from the applicant pool.

Credit: Darryl Leja, NHGRI

An individual's participation in a DNA database can also have broad criminal justice implications. Law enforcement agencies across the world have created forensic databases that include the DNA information of all individuals convicted of (and sometimes, even arrested for) a crime (National Conference on State Legislatures, 2014; Interpol, 2020). When they find DNA at a crime scene, they then search these databases to find not only matches but also "familial matches", i.e., people whose DNA partially matches the DNA found at a crime scene. This helps police officers narrow down the pool of potential suspects and focus on a specific family.
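Familial searching works by scoring partial overlap between DNA profiles rather than requiring an exact match. The sketch below is a deliberately simplified, hypothetical illustration of that idea using made-up STR-style profiles; real forensic systems use far more sophisticated likelihood-ratio methods.

```python
# Toy profiles: a few STR-style loci, each with two inherited alleles.
# All names, loci, and numbers here are invented for illustration.
crime_scene = {"D1": {12, 14}, "D2": {9, 11}, "D3": {7, 8}, "D4": {15, 16}}

database = {
    "person_a": {"D1": {12, 13}, "D2": {9, 10}, "D3": {7, 9}, "D4": {15, 17}},
    "person_b": {"D1": {10, 11}, "D2": {8, 12}, "D3": {6, 10}, "D4": {13, 14}},
}

def shared_allele_score(profile, sample):
    """Count loci where the two profiles share at least one allele."""
    return sum(1 for locus in sample if profile.get(locus, set()) & sample[locus])

for name, profile in database.items():
    score = shared_allele_score(profile, crime_scene)
    if score >= 3:  # arbitrary threshold for this illustration
        print(f"{name}: partial match at {score}/4 loci -- possible relative")
```

A hit like this points investigators at everyone related to person_a, none of whom had to be in the database, and none of whom consented to be searchable.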
But, it also means that individuals who never agreed to participate in the database are affected by its findings. This took place in the infamous Golden State Killer case, where investigators identified the killer via the genetic profiles of distant relatives dating back to the 1800s. Clearly, these relatives never consented to upload their genetic profiles, but there is no option to opt-out (Zabel, 2019).

Some might argue that this erosion of privacy is permissible in the name of public safety, but studies show that these databases have a disproportionate number of samples from historically overpoliced communities of color and are thus more likely to affect them (Murphy & Tong, 2019). We have also begun to see similar use of health or ancestry-focused DNA services, such as 23andMe. Even though individuals provide their DNA in a non-criminal context, the services have demonstrated a willingness to share this information with law enforcement. For example, researchers investigating the Y-chromosome Haplotype Reference (YHRD) forensic database, which contains 300,000 anonymous genetic profiles, have raised ethical concerns over a lack of informed consent for the Uyghur and Roma populations. Without knowledge of where their genetic information will go, these minority ethnic groups stand at an increased risk for persecution (Schiermeier, 2021). In all of these cases, the decisions of a few people to share information had widespread impacts.

In almost all these cases, individuals provided free and individual informed consent, a framework developed in the latter 20th century in response to scandals about unethical medical experimentation and practice. But this framework is clearly insufficient for situations where one person may share information that has implications for others. In response, researchers have pioneered new approaches that take human connection seriously. Consider, for example, the Human Genome Diversity Project initiated in the 1990s. Excited about the opportunity to use new techniques to map genomic diversity, scientists identified populations around the world and began to ask for their DNA using well-established consent procedures (Reardon, 2005). Many of these communities were quite isolated, and perturbed by the Western scientists making these requests. They were also distressed by the concept of individual, informed consent. After all, one person's DNA would provide information about the whole community. What if others in the community questioned the use of this information? In response, some of the scientists proposed a new approach that would include informed consent from both individuals and the group through "culturally appropriate authorities" (North American Regional Committee, 1997). Later attempts to map human genomic diversity, including the International Haplotype Mapping Project, have tried to implement this approach (The International HapMap Consortium, 2004).

Another similar framework involves engaging community members throughout the life of a project. Researchers from the University of Oklahoma followed this model when conducting genomic research with Native American populations. They surveyed participants about health decision-making, explained intentions behind the project during public meetings, and established community review boards to review manuscripts. While this approach can slow down research and is often logistically challenging, it ensures public trust, and effective guidelines can help minimize negative effects (Foster et al., 1997). The way researchers obtain consent can have long-term impacts, particularly if the researchers want to return to the community for further data collection. For example, in the case of the Havasupai in Arizona, the tribe provided their DNA to Arizona State University researchers with the understanding that they would use it for diabetes research (Harmon, 2010).


However, the researchers gave the DNA to another group of scientists who used it to investigate the tribe's genetic origins and whether members of the tribe had genetic links to mental illness. When the tribe discovered this misuse, they sued. Eventually, they settled the case, but not before damaging the trust between the university and the tribe.

LLMs will raise similar concerns and controversy, as people realize that their text is being used for purposes that they did not intend or with which they disagree. If developers do not address these concerns at the outset, they risk further erosion of trust in the tech industry, and ultimately, resistance, as we discuss in Section 6. However, they can learn from the genomics and medical arenas, which have been experimenting with new forms of consent. This includes both group consent, described above, as well as qualified granular consent (Simon et al., 2011), which is designed to provide users with more authority over how their data is used.
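To illustrate what qualified granular consent could look like in a data pipeline, the sketch below models a per-person consent record that a corpus builder would check before ingesting text. The record format and field names are hypothetical; they simply make concrete the idea of users authorizing some uses of their data and not others.

```python
from dataclasses import dataclass

@dataclass
class ConsentRecord:
    """One user's granular choices about how their posts may be used."""
    user_id: str
    allow_research: bool = False        # academic study of the text
    allow_model_training: bool = False  # inclusion in LLM corpora
    allow_commercial_use: bool = False  # resale or commercial products

def may_ingest(record: ConsentRecord, purpose: str) -> bool:
    """Check a specific purpose against the user's stated choices."""
    permissions = {
        "research": record.allow_research,
        "model_training": record.allow_model_training,
        "commercial": record.allow_commercial_use,
    }
    return permissions.get(purpose, False)  # default deny for unnamed uses

alice = ConsentRecord("alice", allow_research=True)
print(may_ingest(alice, "research"))        # True
print(may_ingest(alice, "model_training"))  # False: consent does not transfer
```

The contrast with today's click-through agreements is the default: instead of one blanket yes covering every future use, each new purpose requires an explicit grant, and anything unanticipated is denied.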
and hundreds sued the agency
for damages and won. After these
sanctions, 54.2% of those publicly

which have been experimenting with new surveyed believed that Equifax should no

forms of consent. This includes both group longer serve as a credit bureau (Brown, 2018).

consent, described above, as well as qualified One year later, Equifax attempted to remedy

granular consent (Simon et al., 2011) which is the issue by providing consumers with free

designed to provide users with more authority credit monitoring but only 30.6% said that

over how their data is used. these steps had improved their perception of
the company.

As a result of data breaches, companies


across the globe have improved their security


As a result of data breaches, companies across the globe have improved their security standards and data governance practices – but keeping PII private continues to be a challenge. In just the first quarter of 2021, 4 billion online accounts were hacked worldwide, with LinkedIn and Facebook being the most vulnerable (C., 2022). Occasionally, the party misusing the data is the one collecting the data itself. For example, employees of the ride-sharing app Uber used the company's database to track the locations of politicians, celebrities, and even ex-spouses. They exploited this "God View" feature for over 2 years with little to no user knowledge (Evans, 2016). The cumulative effect is one of user mistrust across companies, and a feeling that the onus is on the user to take preventative measures.

But the effects go beyond loss of trust. User behavior often changes dramatically. Consider recent concerns over email trackers in the most popular email clients (Google's Gmail, Microsoft's Outlook, Yahoo Mail), which enable third parties to extract a user's email address and track their activity in the web browser. With this information linked together, the third-party trackers can target ads based on any future online activity across all devices. The practice is widespread: an estimated 70% of emails embed at least one tracker (Englehardt et al., 2018). In response, users are flocking to a less-established platform that prioritizes security, DuckDuckGo. DuckDuckGo's Email Protection feature strips emails of trackers, sets up a disposable email to forward spam, and prevents disclosure of personal information (Gershgorn, 2021).
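Email tracking typically works by embedding a unique, invisible image URL in the message; when the mail client loads the image, the sender's server logs who opened it, when, and from where. The sketch below is a hypothetical, minimal version of such a logging endpoint in Python (the tracker.example domain is a placeholder); real trackers run at much larger scale and link the token to advertising profiles.

```python
import http.server
from datetime import datetime

# A 1x1 transparent GIF: the "invisible" image embedded in the email as
# <img src="http://tracker.example/pixel.gif?id=RECIPIENT_TOKEN">
PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\x00\x00\x00!"
         b"\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00"
         b"\x00\x02\x02D\x01\x00;")

class PixelHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # The query string carries a token unique to one recipient, so a
        # single request reveals that this person opened this email.
        print(f"{datetime.now()} open: {self.path} "
              f"from {self.client_address[0]} "
              f"agent={self.headers.get('User-Agent')}")
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.end_headers()
        self.wfile.write(PIXEL)

if __name__ == "__main__":
    http.server.HTTPServer(("", 8000), PixelHandler).serve_forever()
```

Tracker-stripping tools like the one described above work by removing or proxying exactly these image URLs before the mail client ever requests them.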
LLMs present similar risks, especially because the training dataset is large and, in some cases, contains text from private sources. Hackers could extract specific parts of training data that the LLM has memorized, known as a training data extraction attack. An adversary with access to an LLM would simply have to input probable phrases (e.g., "The phone number of John Doe is…") and let the model complete information that might reveal PII. Using confidential data to train the LLM is dangerous, as it risks revealing information that users intended to keep secret. This technique has already been put into practice, as Gmail's auto-complete model is trained on private text communication between users (Privacy Considerations in Large Language Models, n.d.). We expect that public-facing LLMs will in part use confidential data for training, which means personal data breaches will be possible. But not all breaches of privacy will rely on private data and sensitive PII. Even an LLM that connects a person's professional online presence with their personal one could have implications if their online presence includes information about things like health, sexual orientation, or immigration status.
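The probing step of such an attack is simple to sketch. The code below uses the open-source Hugging Face transformers library with the public GPT-2 model as a stand-in for a target LLM; the prompt and any completions are illustrative, and whether real PII comes back depends entirely on what the target model memorized.

```python
from transformers import pipeline

# Any autoregressive LLM works here; GPT-2 is a small public stand-in.
generator = pipeline("text-generation", model="gpt2")

# The attacker supplies a "probable phrase" and lets the model finish it.
# Against a model trained on private text, completions can surface
# memorized strings such as phone numbers or addresses.
prompt = "The phone number of John Doe is"
for candidate in generator(prompt, max_new_tokens=12,
                           num_return_sequences=5, do_sample=True):
    print(candidate["generated_text"])
```

Published attacks of this kind sample many completions and rank them by the model's own confidence to separate memorized strings from confabulated ones; the loop above shows only the probing step.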

As a result, users will lose trust and ultimately hesitate to provide personal information online and in other communication channels. As we discuss in Section 6, this breakdown in trust will have implications for social fragmentation. This could also hurt the accuracy of LLMs and the development of new apps. But it will also hurt institutions that require access to PII to function (e.g., hospitals, banks) as well as the individuals who rely on them and on accurate digital technologies. Users are likely to hesitate to give PII and may create new ways to stay anonymous online, such as tools that prevent web scraping, although this may not be accessible to or benefit everyone (Zou et al., 2018).


LLMs will Create New Forms of Data Exploitation

With the rise of surveillance capitalism (Zuboff, 2019), all digital data has increased in value. LLMs exemplify this trend as they take advantage of the freely generated data of millions of individuals to produce commercial technologies that summarize, generate, and predict language. But in this ecosystem, not all data has the same value. At present, LLM corpora overwhelmingly include English or Chinese language texts, and many of these texts are quite old. The racist, sexist, and homophobic output described in the Introduction is one result. In order to ensure that LLMs are more useful and less offensive, developers are keen to expand the corpora to include more languages and dialects, genders, cultures, and populations. History suggests that the texts least likely to be currently represented in the corpora, and thus most likely to be valuable in the future, come from marginalized communities and cultures. But as developers try to improve their models by expanding corpora in these directions, they will create new forms of exploitation that will disproportionately affect already marginalized communities.

Facial recognition technologies have posed similar problems. They are famously inaccurate among all populations–including people of color, women, children, and gender non-conforming people–except white male adults (Grother et al., 2019). But they are increasingly being used by law enforcement, schools, airlines, and even, briefly, the Internal Revenue Service (Epstein et al., 2022; Galligan et al., 2020). To deal with the technology's accuracy problems, developers have sought out pictures of individuals from marginalized communities. Most famously and problematically, a contractor hired by Google targeted attendees of the BET Awards, college students of color, and even homeless Black people in Atlanta for facial scans. The practice was exploitative, as volunteers were rushed through consent forms and misled about what would be done with the scans (Dillon, 2019).

Similarly, as we suggest above, the rise of genomic science has also made particular genomes valuable. This has, in turn, triggered unethical practices and created new burdens for already marginalized communities. A 2019 controversy at the Sanger Centre, the UK's premier genomics institute, echoes the Havasupai and Human Genome Diversity Project cases described above (Stokstad, 2019). The Centre was trying to develop and commercialize a "gene chip" that would identify genetic links to common diseases, and needed African DNA samples in order to ensure that this technology was adequately representative. So it entered into agreements with scientific institutes in Africa that had collected indigenous DNA. But it did not disclose that the DNA would be used commercially, and many of the original DNA sharing agreements had forbidden this kind of use. The African scientists who collected the data worried that it would alienate communities who had just begun to participate in genomics research (AT Editor, 2019).


However, some communities have learned how to take advantage of the importance of their data for emerging technologies. Groups representing individuals with rare diseases have negotiated with scientists to own technologies produced using their and their children's DNA (Terry et al., 2007). Indigenous communities have developed benefit-sharing agreements with Western companies seeking to commercialize their knowledge (Foster, 2018). Other groups seek to keep the economic benefits to themselves, such as Te Hiku Media, an Indigenous-owned tech nonprofit that has refused to share hundreds of hours of valuable Maori language audio. Instead, by building speech recognition technology internally, Te Hiku has ensured that only the Maori people will control the use of and profit from their language (Coffey, 2021). Similarly, recognizing that non-Black creators reap financial benefits from co-opting their dances, Black TikTok stars boycotted the platform in hopes of receiving proper credit and compensation (Muller, 2021).

In the coming years, LLM developers are likely to prioritize collecting texts from marginalized communities in the name of increasing accuracy. This might mean purchasing access to non-digitized texts in a variety of languages, or deploying speech-to-text apps to capture the rare dialects of some communities. But given the power of these companies, there is a great risk of exploitation. Marginalized communities will need assistance from NGOs and the government in order to ensure that any data-sharing agreements are appropriately balanced and serve community needs.

LLMs create a sense of privacy for some vulnerable communities

Some vulnerable communities, including immigrants who do not speak the dominant language, and people with auditory disabilities, rely on human interpreters to access social services from healthcare to legal aid. Currently, they have to disclose personal, potentially embarrassing information to another person, and trust them to protect it. But in the future the user could interact with an LLM or other technology that feels more private. However, they would still be sharing personal information with the company providing the LLM-based service.


For the deaf community, the cochlear implant (CI) has enhanced privacy by eliminating the need for a human sign language interpreter. The CI is a surgically implanted neuroprosthetic that can interpret electrical signals as speech and sound. As helpful as interpreters are, deaf people can see them as a burden (Reinhardt, 2015). First, there are far fewer interpreters than deaf individuals, which means that few deaf people can afford their own personal interpreter. As a result, they must disclose personal information to multiple interpreters, which increases the risk that interpreters might misuse the information. Interpreters might be familiar with one another and share information about the deaf person, or the interpreter may simply not be fully fluent in sign language, which makes it impossible to establish trust from the beginning. Interpreters often report feeling anxious when having to translate serious discussions, such as marriage therapy (Levinger, 2020). Deaf people must trust that the interpreter will adhere to their professional obligations and maintain their privacy. With a CI, however, this trust is unnecessary: the human intermediary is displaced along with privacy concerns (The British Psychological Society, 2017). CIs cannot store information and thus confidential information remains between the deaf individual and the intended party.

Credit: WikiCommons user Hear Hear! (CC BY 4.0)

Similarly, the elderly population living in nursing homes is often heavily surveilled in order to provide proper assistance. Nursing home staff are informed about medical conditions as well as medication requirements, both of which may be seen as intrusive. They and other staff also use cameras to surveil residents to identify those who are in distress and provide appropriate assistance. However, this data can be easily misused: staff might use personal information to impersonate residents, or to prey on them (Berridge & Levy, 2019). To avoid this potential invasion of privacy, elderly individuals replace nursing home personnel with in-home technologies including virtual assistants, stair lifts, and telemedicine applications (Kelly, 2021). At the very least, these technologies delay the need for an older person to move to a nursing home or assisted living facility. For the blind population, sighted guides similarly share, through vocal or physical cues, information about orientation and navigation when traveling in unfamiliar areas. This help can be crucial. However, as with previous examples, the cost of this assistance is the disclosure of personal information which can, again, be easily misused (Merry-Noel, 2015). Blind people might use guide dogs, braille, or white canes instead, to maintain some privacy.

LLMs offer a similar sense of privacy for some communities. As we describe in other sections, LLMs may be able to translate text across languages and linguistic styles, making the world more accessible, particularly for those who do not primarily use spoken English. While the interpersonal dimension of human communication might be lost, the case studies we have reviewed here suggest that this will produce privacy benefits. Without a third-party translator, the communication experience will be less intimidating, especially for those disclosing embarrassing personal information to the intended recipient.


However, it is important to note that even in this case, the user is disclosing information to the LLM or LLM-based app, which could incorporate this information into a corpus that may only be partially protected. Thus, there is still a privacy risk, but it may be indirect and attenuated in comparison to the benefit.

At their core, LLMs rely on data about the way humans communicate through language, and thus LLM developers will continually need data to maintain their model's accuracy. However, under current conditions, communities are likely to become increasingly reticent to share their information. In the long run this could affect our health care, finances, security, and even rights. It could also produce controversies that damage trust in LLMs. Meanwhile, LLM developers are likely to use highly unethical practices to extract data, especially from marginalized populations, in the name of enhancing accuracy in their models. Finally, the communities who have traditionally relied on assistive technology may gain some immediate privacy but will now be disclosing their personal information to an LLM.


IMPLICATIONS OF LLM DEVELOPMENT

Section 3:
Normalizing LLMs

KEY POINTS

• To gain user acceptance, LLMs will be framed as empowering and modular.

• Developers will try to incorporate LLMs into existing sociotechnical systems, particularly
those governed by trusted institutions, in order to ensure their longevity.

• When LLMs produce hateful language or errors, developers will deflect blame onto
infrastructure or human users.

Developers have made experimental LLMs available to both researchers and publics, which has stimulated early excitement about the technology's text summarization, generation, and translation capabilities. And yet, as we have noted repeatedly thus far, there is already concern about the social harms. Journalists worry that LLMs will be able to generate articles and further damage their job prospects (Seabrook, 2019). Others worry that like cryptocurrency, LLMs will require the use of so much energy that they will make it harder to fight climate change (Bender & Gebru et al., 2021). There is already evidence that LLMs will reflect the historical biases of English-language texts by using racist and sexist language and reproducing harmful assumptions about marginalized communities (Abid et al., 2021).

But these conversations have been largely restricted to the field of artificial intelligence and specialist technology publications. An important exception is the 2021 controversy over Google's firing of Timnit Gebru and Margaret Mitchell, which we discuss in the Introduction. Newspapers across the globe covered Gebru's firing, likely due to combined concerns about the practices of the major technology companies, artificial intelligence and algorithmic bias, and heightened media attention to discrimination against Black people in the wake of George Floyd's murder in May 2020.

These events have likely triggered some skepticism about LLMs, most significantly within Black and research communities.


Despite these concerns, based on our analysis of analogical cases, we expect Google and other developers to emphasize LLMs' democratizing and even empowering potential, as well as their modularity. In fact, they have already begun to highlight the broad social benefits, particularly in terms of increased access to crucial services such as legal aid and health advice. They will also try to make LLMs ubiquitous quickly, and promote their use particularly among established authoritative institutions. In the process, they will continue to dismiss the technology's limitations and errors, deflecting blame onto infrastructure and other users.

LLMs as an empowering technology

In keeping with a long history of technology, particularly those focused on communication, developers will emphasize LLMs' capacity to empower their users. The One Laptop Per Child (OLPC) program is an instructive analog. Founded in 2005 by Nicholas Negroponte, the founder and chairman of the MIT Media Lab, OLPC aimed to transform education for children around the world by providing them with extremely cheap (~$200), rugged computers, and accompanying software and content (Ames, 2019). To gain support, Negroponte presented his ideas and solicited investments across the world, including at the World Economic Forum in Davos, Switzerland. He claimed that the technology would allow children to teach themselves and their parents, providing them both with an education that would allow them to lift themselves out of poverty. In the project's early days he said to the MIT Technology Review that OLPC "is probably the only hope. I don't want to place too much on OLPC, but if I really had to look at how to eliminate poverty, create peace, and work on the environment, I can't think of a better way to do it" (Ames, 2019). As Negroponte and his team tried to sell the technology first to investors and then to the governments of Southern countries, he framed it as not just transformational but as leveling the playing field across the world.

Although OLPC was explicitly designed for humanitarian purposes, we expect LLMs to be framed in similar ways. Consulting firm Deloitte has already suggested that LLMs will be able to more efficiently and accurately synthesize public comments on pending policies (Eggers et al., 2019). Others have emphasized that the technology could provide legal aid to those who could not otherwise afford it (Bommasani et al., 2021). We might even expect developers to encourage the first apps to be "public interest" oriented, such as therapy chatbots. Despite emerging concerns about LLMs, we do not expect corporate developers to voluntarily take steps to build public trust by making the corpora or algorithms transparent or bringing in community knowledge to develop more politically legitimate technologies. While technologists have begun to take such steps in highly controversial areas such as geoengineering and human gene editing (Stilgoe et al., 2013; Gusmano et al., 2021), LLMs have not yet risen to that level of public attention or scrutiny.


Creators of technologies also sometimes emphasize the empowering potential of their machines by allowing them to have "interpretive flexibility", particularly in their initial rollout. Rather than dictating use, these developers allow users to integrate technologies into their current work and lives however they wish, to increase uptake and excitement (Bijker et al., 1987). Early car manufacturers used this approach to increase acceptance in the early 20th century (Kline & Pinch, 1996). Farmers were initially very skeptical of the automobile, which was loud and scared away livestock, and made it difficult for them to use their horse-drawn buggies on the roads. The technology also brought urbanites into their towns, whom rural residents found irritating and sometimes even scary. So, farmers used a variety of strategies to keep out what many called the "devil wagon", and an anti-car movement began to flourish. However, some farmers reinterpreted the technology as a source of power, and demonstrated how its engine could be used to facilitate farm tasks including corn shelling, sheep shearing, and grinding. Soon, manufacturers made changes to new car models, developed new accessories, and changed their advertising strategy to capture this understanding of the automobile (Kline & Pinch, 1996). They knew that by endorsing these interpretations of their technology, they could increase demand and ultimately entrench it in American life. LLM developers, by encouraging targeted apps designed for a range of purposes, are already starting to construct a technological ecosystem geared towards this kind of flexibility.

Connecting to Authoritative Institutions

LLM developers will also try to establish the legitimacy of the technology by quickly integrating it into existing infrastructure, including connecting it to authoritative institutions. Facial recognition technologies have followed this path. First used for security on a large scale at the 2001 Super Bowl, when law enforcement used it to detect potential threats among the crowds, facial recognition has spread rapidly, particularly over the last 10 years, with almost no regulation (Galligan et al., 2020). Security companies convinced police, universities, K-12 schools, and airlines across the United States and around the world to adopt the technology in the name of public safety, even in the face of growing evidence that it is inaccurate among marginalized communities and often ineffective.


With law enforcement and academia as early adopters and advocates, the technology has become harder to challenge. While civil society groups and even policymakers have tried to ban or otherwise regulate the technology, they have met limited local success. And as a result, facial recognition's reach is growing: in 2022, Clearview AI, with one of the largest indexes of faces, announced a massive expansion of their services beyond law enforcement (Harwell, 2022). The situation is similar with the breathalyzer, which is used to evaluate cognitive impairment due to alcohol (Cowley & Silver-Greenberg, 2019). Despite extensive evidence that it generates inaccurate results, it is still widely used by law enforcement.

This technological entrenchment is not unique to law enforcement. Consider the pulse oximeter, which assesses blood oxygen levels, and is crucial to diagnosing severe cases of COVID-19. In 2020, an anthropologist published an article observing that the device was likely to be less accurate among people of color because its reading is based on light refraction (Moran-Thomas, 2020; Sjoding et al., 2020). A few months later, a group of physician scientists validated this hypothesis through a large clinical study: they found that people with darker skin tones tended to have higher readings than their white counterparts. This means that when already marginalized people of color went to the hospital unable to breathe, a pulse oximeter reading might suggest to the health care professional that they were not in distress. Likely as a result, in the early days of the COVID-19 pandemic Black patients were turned away from hospitals because their blood oxygen was not low enough (Lothian-McLean, 2020; Rahman, 2020). The New York Times and other prominent media outlets published these scientists' findings (Rabin, 2020; Harris, 2020). But today, doctor's offices and hospitals still regularly use the pulse oximeter as part of their health care, to determine the severity of their patient's condition. It seems too difficult to change professional practices, despite the human cost.

Deflecting Blame for the Technology's Problems

Especially in the early days of LLMs, we expect users to identify a range of errors and problems with the technology. Developers will first try to maintain the technology's credibility by ignoring these problems. If that proves impossible, they will likely blame the infrastructure or users. Let's return to the OLPC. It never had the positive impacts that Negroponte envisioned. Demand for the machine was less than anticipated, and even when governments or civil society groups donated them to low-income children, many broke or simply went unused. And yet, OLPC's developers largely do not acknowledge the technology's failure (Ames, 2019). When they do, they suggest that the technology simply lacked the needed support structure.

Or, consider Boeing's introduction of the Maneuvering Characteristics Augmentation System (MCAS) in its 737 MAX planes, and the subsequent crashes of two planes in Indonesia and Ethiopia in 2018 and 2019. 346 people died in total. Boeing had installed the technology in its planes without alerting regulators in the US or elsewhere. But when the first plane crashed in October 2018, the company denied responsibility.


It responded that the plane was "as safe as any airplane that has ever flown the skies", and pointed instead to human error (Robison, 2021). When observers began to question its MCAS system, the company persisted, arguing that the pilots should have known how to handle the emergency. Taking advantage of age-old Western prejudices towards people in low and middle-income countries, it suggested that the Indonesian pilots had insufficient expertise. If they had followed the established emergency procedures, Boeing argued, pilots would have been able to reverse the plane's downward spiral and keep the vehicle aloft (Glanz et al., 2019). In November 2018, the company explicitly advised pilots to take corrective action if the MCAS system engaged. But it was only after a second plane crashed in Ethiopia in March 2019 that governments around the world grounded Boeing's 737 MAX planes. Boeing changed the aircraft design in response, and in December 2020 the US Federal Aviation Administration allowed the planes to fly again. Other countries quickly followed. Until governments stepped in, Boeing kept deflecting blame for their technology.

Credit: Library of Congress (CC0)

Similarly, since the automobile's earliest incarnation, automakers have refused to address known safety concerns or deploy safety features until forced by legislation (Singer, 2022). For example, Hugh DeHaven, a former pilot who survived a deadly airplane crash and then spent decades trying to improve airplane and automobile safety, collected reports from hundreds of crashes to identify the most dangerous parts of a car: rigid steering columns that did not collapse on impact, unpadded dashboards, pointed knobs, and a lack of seatbelts. He presented this information to the auto industry at a conference in 1953, along with remedies that would improve "crashworthiness", including a collapsible steering column and a 3-point seatbelt. However, the industry insisted that the problem was not with their products, but rather with "the nut behind the wheel": "reckless" drivers who were the cause of deadly crashes. And they were successful for a time. It took external pressure–Ralph Nader's 1965 book Unsafe at Any Speed (Nader, 1965) and the resulting outcry–to trigger regulatory action and ultimately changes to the technology's design. But in the years between DeHaven's presentation and the publication of Unsafe at Any Speed, over half a million people had died in car crashes.

Right now, LLMs are unknown to many except for the few who paid attention to the controversies over Timnit Gebru and Margaret Mitchell's firings.


Given this, we expect LLM and app developers to try to shape early public opinion about the technology both by emphasizing its empowering potential and humanitarian benefits, and by encouraging flexible interpretations of its use. They will also try to integrate it into existing sociotechnical systems and infrastructure, particularly those that enjoy high public trust. As the technology becomes ubiquitous, we expect that developers will be able to dismiss any problems or errors and avoid real public or policy scrutiny–unless there is a catastrophic outcome. In the meantime, the costs will likely be borne by users, particularly those who are marginalized, as we discuss in the next section.

IMPLICATIONS OF LLM ADOPTION

Section 4: Reinforcing
Social Inequalities

KEY POINTS

• LLMs will exacerbate the inequalities faced by marginalized communities.

• Individuals, particularly those from already marginalized communities, will bear the blame
for LLM error, rather than the developers or the technology itself.

• LLMs will reinforce English language, and ultimately Anglo-American, dominance, and
alienate those outside these cultures.

• Because they seem technical and objective, LLMs will obscure systemic biases embedded in
their design. This will make inequities even harder to identify.

In this section, we shift from the construction of LLMs to the implications of their use. AI developers tend to emphasize the objectivity of their technologies, but scholars point out how technologies always reflect the societies that make them (Benjamin, 2019; Parthasarathy, 2007). The history of technology provides numerous examples of tools and systems that are presented as value-free and yet are skewed by built-in biases including racism, sexism, and xenophobia. The same is true for LLMs. As described in the Introduction, they are trained on vast datasets composed of internet text and historic literature; both contain enormous amounts of prejudiced and hateful language towards minoritized groups. In other words, they reflect historical biases. Not surprisingly, then, LLMs that generate "new" content end up reproducing these biases, often in the form of violent language (Abid et al., 2021; Tamkin et al., 2021). But fixing these problems isn't just a matter of including more, better data. LLMs are built and maintained by humans who bring prejudices and biases to their work, and who operate within institutions, in social and political contexts. This will shape the biases that developers perceive, and how they choose to fix them. Meanwhile, researchers have already brought attention to how artificial intelligence is exacerbating what they call a "compute divide": wealthy companies and academic institutions have greater resources to invest in emerging technologies, which is likely to reflect their worldviews, needs, and biases (Ahmed & Wahed, 2020).

In what follows, we suggest that LLMs are likely to reproduce social biases in a variety of ways beyond what observers have already identified. Trained on texts that have marginalized the experiences and knowledge of certain groups and were produced by a small set of technology companies primarily in the United States and China, LLMs are likely to systematically misconstrue, minimize, and misrepresent the voices of some groups, while amplifying the perspectives of the already powerful. In addition to producing language that contains racist, sexist, and xenophobic tropes, they may fail to include representations of minoritized groups altogether. These implications are particularly problematic for two reasons. First, LLMs have a wide range of possible uses across fields, so there is broad potential to replicate and perpetuate racism and other biases. Second, people are likely to assume that because they are based on vast amounts of data and produced by highly technical, proprietary algorithms, they will be objective.
Therefore, these biases will be harder to identify and challenge. This section identifies these biases at a societal level; other sections discuss how LLMs will affect equity in the workplace (Section 5) and environmental justice (Section 1).

LLMs will Perpetuate Inequitable Distribution of Resources

LLMs will reinforce the inequitable distribution of resources, continuing to favor those who are privileged over people who need aid the most. The racism embedded in the very design of many technologies, including common medical diagnostic tools, has had similar impacts. Consider the spirometer, a device widely used to measure the volume of air inspired and expired by the lungs. It is used to diagnose diseases such as asthma and emphysema and to identify the cause of shortness of breath, including environmental contamination. Patients breathe into a tube, and the machine both measures lung function and assesses whether it is "normal" using software.


However, these assessments differ by race, based on false beliefs that race affects lung function, and that Black people naturally have lower lung function than white people (Braun et al., 2013). The racist belief that Black people have lower lung function can be traced back to slavery. White slavers used early versions of spirometry to "prove" that Black people were physically inferior; they found that enslaved Black people had lower lung function than free white people, without accounting for the many factors–such as the effects of extreme physical labor and abuse–that could contribute to such a disparity (Braun, 2021). These assumptions were then embedded into assessments of "normal" lung function, first in tables and then in the software of the spirometer. Today, most health care professionals operating spirometers have no idea that assessments of normal and abnormal lung function are based on this racist science, instead viewing it as an objective measurement. And despite numerous scholarly publications describing this bias, there have been no changes in medical diagnosis. Still today, lower lung function in a Black person is often considered normal and does not trigger further action.

Spirometer diagram. Credit: Wellcome Library, London

The bias built–and perpetuated–in spirometry machines means that Black patients need to be sicker, with more severe illness, in order to qualify for many treatments and insurance coverage. For example, in 1999, employees of an insulation manufacturer filed for disability payouts related to asbestos-caused lung disease. In a bid to limit compensation, the company set different standards for Black employees, who had to demonstrate more severe disease and lower lung function than their white coworkers to qualify for compensation (Braun, 2014). After all, the company argued, their Black employees started out with lower lung function according to the spirometer. This made it difficult for Black employees to challenge their employer, and it continues to affect the quality of care that Black patients receive to this day: the severity of their illness is not recognized by the machine. As noted in the previous section, the story of the pulse oximeter similarly requires Black patients to be sicker to receive care.
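The embedding of this assumption in software is easy to picture. The sketch below is a hypothetical illustration: the numbers and the 0.85 scaling factor are simplified stand-ins for the race-specific reference equations spirometry software has historically applied, not any vendor's actual code. By lowering the "predicted normal" for Black patients, the same measured lung function is classified as less severe.

```python
def percent_predicted(measured_fev1: float, predicted_fev1: float,
                      race_correction: float = 1.0) -> float:
    """Compare a measurement to a (possibly race-adjusted) predicted normal."""
    return 100 * measured_fev1 / (predicted_fev1 * race_correction)

# Two patients with identical measured lung function and demographics,
# differing only in how the software classifies their race. The 15%
# reduction is a simplified stand-in for historical correction factors.
measured, predicted = 2.6, 4.0  # liters, illustrative numbers

white_pct = percent_predicted(measured, predicted)                        # 65.0
black_pct = percent_predicted(measured, predicted, race_correction=0.85)  # ~76.5

# Suppose a benefits program requires falling below 70% of predicted:
for label, pct in [("white patient", white_pct), ("Black patient", black_pct)]:
    status = "qualifies" if pct < 70 else "does not qualify"
    print(f"{label}: {pct:.1f}% of predicted -- {status}")
```

This is the mechanism behind the asbestos compensation case above: the identical measurement is judged against a lower bar, so the Black patient must lose more lung function before the software flags disease.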
We can expect similar scenarios if, for example, health care professionals use LLMs to assist with diagnoses. Imagine an LLM app designed to summarize insights from previous scientific publications and generate health care recommendations accordingly.


But if previous publications rely on racist assumptions, or simply ignore the needs of particular groups as in the case of the pulse oximeter, the LLM's advice is likely to be inaccurate too. And while these cases focus on medicine, we can imagine other domains including criminal justice, housing, and education where biases and discrimination enshrined in historical texts are likely to generate advice that perpetuates inequities in resource allocation. Aadhar, India's biometric identification system, has already begun to highlight such inequities. In order to receive an Aadhar number, citizens must provide their fingerprints, iris scans, and their photograph. For many of India's marginalized communities, such forms of identification are impossible to provide: their fingerprints have rubbed off due to years of manual labor, or they cannot provide an accurate iris scan due to a disability (Singh & Jackson, 2021). And yet, these are the communities that need Aadhar the most: in order to access any social services, they must provide not just their Aadhar number but also their biometric information.

Cochlear implants (CIs), which we introduced in Section 2, also demonstrate how technology can distort needs and ultimately erode services for marginalized populations. The FDA first approved CIs for adults in 1984 and for children in 1990. CIs consist of two components: a permanently implanted internal set of electrodes that interface directly with the nervous system, and an external processor that picks up sounds and translates them to patterns of electrical impulses for the electrodes. While many may assume that CIs cure deafness by giving the wearers the same level of aural capacity as those with natural hearing, in fact they only have a small range of audial inputs, and patients must spend months or years of therapy to develop new connections in the brain to accommodate the CIs and learn to associate the signals with different sounds. In other words, deaf adults may still need auxiliary services even when people assume their problem has been solved.

Meanwhile, just as CIs were approved, academics and activists developed the concept of Deaf Culture (Denworth, 2014). Deaf Culture is based on shared values, experiences, and beliefs of people influenced by deafness, and serves as a form of organizing political power. Activists define deafness as a neutral trait rather than a disability. Because most deaf children are born to hearing parents, acculturation happens primarily within deaf organizations and schools. Activists fear that the perception of CIs as a cure might lead to the reinterpretation of deafness as a voluntary disability, which could result in the defunding of deaf institutions (Tucker, 1998). This could also make it more difficult to access accommodations in employment or education such as interpreters, and alienate deaf people for whom CIs do not work or who choose not to get CIs (Cooper, 2019). Similarly, LLMs, as a cheap and fast translation and interpretation tool, could actually lead to a reduction in other kinds of support, including human interpreters, written materials offered in languages other than English, and even language learning programs. This would create harm for Indigenous groups, marginalized groups that use dialects not covered well by LLMs, and immigrant communities. After all, because the LLM corpora are mostly in English and Chinese, they will be less accurate in other languages.

In high stakes settings such as hospitals and courtrooms, human translators, though fallible, can rely on a variety of cues to ensure the person they are assisting understands what is happening, and can ask and answer clarifying questions. LLMs cannot do those things. If the LLM has a poor understanding of a particular language, or is otherwise unable to accurately translate technical medical or legal terminology, individuals are left without support, which depending on the setting could result in a variety of negative outcomes.

We can also expect similar outcomes when LLMs are used in social service provision. LLMs may be used to automatically screen applicants, or they might be used as a chat function on websites to assist people seeking resources or help. But the historical use of automated decision making tools by social service agencies produced results that are biased or inequitable in ways the tool is meant to prevent. Allegheny County, based in southwestern Pennsylvania, adopted the Allegheny Family Screening Tool (AFST), a computer-based program designed to assess the risk that a child might be experiencing harm and require intervention (Eubanks, 2018). The AFST uses a wide array of historical data about children and families, including data from local housing authorities, the criminal justice system, and local school districts, to produce a risk score that assesses the urgency of individual reports that come in through a child welfare hotline. Though it is designed to be used in tandem with a human screener, in practice the algorithm tends to train the humans, and over time the screeners' scores begin to match the algorithm's. In other words, independent human oversight is diminished.

AFST is supposed to be objective and evidence based, but its results overrepresent poor and working class families in ways that become a self-fulfilling prophecy (Eubanks, 2018). Simply asking for support from public services including childcare, tutoring, or therapy increases a family's risk score in the system. When wealthier families get that same support, they do so privately so it does not affect their score. As a result, simply being poor or requesting help become "risk factors". When a family's score is higher, it increases the likelihood that a report will result in a home visit, and pulls parents into a system of increased state surveillance, which itself increases the risk of further harms such as removal of children from the home, that parents will be arrested or lose their jobs, or even lose their housing.

These outcomes then get fed back into the algorithm as evidence that its risk assessment was correct, even though in fact the risk assessment caused the outcomes.

When institutions such as social service agencies, hospitals, insurers, or banks use LLMs to determine eligibility for or recommend products and services, we can expect that LLMs will make recommendations rooted in historical biases that will then produce inequitable outcomes. They may systematically miscategorize or misinterpret people based on language use patterns, or fail to include key elements of their situation. Overall, like all conclusions drawn from huge datasets, the LLM is likely to focus on correlations in historical data (Bender & Gebru et al., 2021; Spinney, 2022). When evaluating eligibility for social services, LLMs might reinforce stereotypes about people who have previously used social services such as food banks or have a criminal record. They may also be used to collect and store additional information about people, similar to the way facial recognition technologies are being used in housing under the guise of ease of use (Strong et al., 2020). Or, institutions might use LLMs to determine whether a request for services is sincere or to provide advice in the interest of lowering the workload of social workers. LLMs might recommend against issuing a loan to some historically disadvantaged communities of color based on historical evidence that they might default. But this correlation is the outcome of prejudice and stereotypes in the training data, rather than a characteristic of these communities. Regardless, the consequences are dire: these decisions will generally favor the powerful, and further perpetuate inequitable distribution of resources.

LLMs will Reinforce Dominant Cultures

Developers have argued that LLMs hold tremendous promise for language translation (Brown et al., 2020), which will ultimately promote closer relationships across communities and international cooperation. We discuss how this capacity might transform scientific work in Section 7. However, as noted above, the largest and most powerful LLMs are being built in the United States and China, and their corpora are overwhelmingly dominated by English and Chinese language texts. Although developers are optimistic that LLMs will be able to translate across languages based on minimal training text, mistranslation is still likely. We are also likely to see language dominance (often English) reinforced and cultures distorted and even erased.

Let's return to the case of cochlear implants. Some deaf activists argue that they are an attempt by hearing medical professionals and parents to erase deaf culture (Ramsey, 2000). They fear CIs will lead to fewer students participating in deaf organizations where acculturation to deaf culture and visual language learning happen, and that a generation of deaf children will grow up without learning sign language and will have difficulty communicating in the future because CIs are only a partial fix.

LLMs could have similar impacts on other marginalized cultures. While some developers have argued that LLMs could help preserve languages that are disappearing (Coffey, 2021), they are also likely to contribute to the erasure of marginalized cultures and languages. Because their corpora are English and Chinese dominant, LLMs learn the rules of language through the lens of these languages, and are thus likely to be most accurate in their dominant training language. Eventually this will reinforce the dominance of standard American English in ways that will expedite the extinction of lesser-known languages, and contribute to the cultural erasure of marginalized people. Of the 300 Indigenous languages that were once spoken in the United States, only 175 remain today, with most of them at risk of extinction (Cohen, 2010).

LLMs might also distort our understanding of other cultures. Consider the ongoing controversy over a museum to preserve and exhibit the history of Chinese Americans, built in New York City's Chinatown in 1980 (de Freytas-Tamura, 2021). The museum recently received a $35 million grant from the city in exchange for allowing the expansion of a jail in the neighborhood. Opponents argued that while the initiative is well-intentioned, funding the museum supports the preservation of only a narrow slice of Chinese culture while not considering or supporting the ongoing vitality of the community itself, particularly as it faces gentrification pressures (de Freytas-Tamura, 2021). Similarly, LLMs could preserve limited, historically suspended understandings, especially of the non-American or Chinese cultures represented in their corpora. And they could erroneously perpetuate these limited understandings, even as these cultures are changing, which could exacerbate cultural misunderstandings at the expense of the people of those cultures.

Responsibility for the Technology's Errors will fall on Marginalized Communities

Given the limitations in LLMs and the corpora behind them, we expect that the technology will be less accurate in already marginalized communities. But these inaccuracies may not always be clear. Let's return to the example of the spirometer. It is the gold standard for diagnosing and monitoring a broad range of common lung conditions, but patients need intact cognitive abilities, muscle coordination, and a certain level of physical strength to use it correctly. If it produces an inaccurate reading or misdiagnosis, the patient is usually blamed (Braun, 2021). But the problems with spirometry are systemic; in addition to its inaccuracies in Black people, it consistently fails for people with certain disabilities. One could imagine that the spirometer could be redesigned to accommodate these ground realities. Instead, individuals bear the responsibility for the technology's failures. If they cannot get an accurate spirometer reading, patients may undergo more invasive means of assessing lung function or lose access to important social or health care services.

And all the while, they may not know that the problems are systemic and instead blame themselves.

The same is true for pulse oximeters, which we discussed in Section 3. Health care professionals trusted the technology and perhaps also mistrusted the patients due to systemic biases (Fitzgerald & Hurst, 2017), and marginalized communities had to manage the consequences without knowing that the seemingly objective technology had failed them. We anticipate similar outcomes with LLMs. They might produce biased text, or some communities will not be able to use them due to financial limitations, disability, or language barriers. But the technology will not be blamed, especially as it becomes ubiquitous. Instead, any problems will be individualized and treated as a personal failing. In some cases, the problem might not even be clear and even more difficult to identify and solve.

Finally, we know that building and maintaining LLMs is extraordinarily expensive, and requires an enormous amount of computing power. Even using them requires access to a computer and high speed internet; as LLMs become embedded in more areas of life, lack of access will deepen existing inequalities, but the cause will not be visible. In the US, for example, both infrastructure and architecture are largely built for cars. Both road design and transportation policy favor the speed and convenience of people driving private vehicles over the safety and wellbeing of people who walk, use public transportation, or ride bicycles (Shill, 2020). In many regions of the country, including urban, suburban, and rural areas, private cars are the only available method of transportation, but cars are inaccessible to large portions of the population. They are expensive to purchase, maintain, insure, and store, and 40% of people with disabilities in the U.S. cannot or do not drive (Bureau of Transportation Statistics, 2018). Meanwhile, many forms of employment, important services, and markets are only accessible by car. But governments and businesses rarely acknowledge or do anything to address these inequities. As a result, these communities are not only further marginalized, but also alienated because their concerns seem rare and out of the mainstream (Schmitt, 2020).

In addition, car crashes kill nearly 40,000 Americans every year, and seriously injure millions more (National Highway Traffic Safety Administration, 2021); Black and Latinx pedestrians and bicyclists are disproportionately the victims of car crashes, even though they are less likely to have access to cars (Schmitt, 2020). However, local, state and federal governments do little to improve other forms of transportation or protect the safety of more vulnerable road users (Shill, 2020; Singer, 2022). Lack of access to a car, as well as being killed or injured by one, is treated as a personal shortcoming, rather than as a societal failure. Lack of access to LLMs, as well as any negative impacts an LLM might have on a person's life, will similarly be blamed on the individual, rather than the systems that produced those impacts.

In sum, we imagine that LLMs will reproduce societal biases in a few ways. First, because they rely on historical texts, they are likely to reproduce systemic biases reflected in those texts.

But these biases will not be clearly visible to LLM users because they will be reproduced by seemingly objective algorithms. Second, they are likely to reinforce already dominant cultures, while creating historically arrested caricatures of others. Finally, as they become ubiquitous, their limitations and errors will become less clear. Users will absorb the responsibility and blame, sometimes without even realizing it. This phenomenon is likely to be much more acute in marginalized populations.

IMPLICATIONS OF LLM ADOPTION

Section 5: Remaking Labor and Expertise

KEY POINTS

• LLMs will transform, rather than replace, most occupations. In most cases, humans will shift
to more complex and risky tasks.

• LLMs will transform authorship and associated standards for certification and evaluation.

• LLMs will eliminate some tech-based professions and enable others.

• Workers, supported by consumers, will resist these technologies.

For years, observers have predicted that the rise of artificial intelligence would trigger significant job losses, particularly for those in lower skilled occupations (West, 2019). Our analogical case study analysis validates these concerns, and suggests that LLMs will transform some professions completely. But we expect LLMs to have a major impact on higher skilled professions, which will use the technologies to summarize, generate, and translate text. While LLMs will be initially introduced as assistive technologies, they will eventually take over more common and predictable tasks. This will leave more challenging labor to humans, which will carry physical, psychological, and social risks. We also expect these changes to trigger popular unrest and mobilization, both organized and informal.

Transforming Professions

As they become more accurate, LLMs will change work across a range of jobs, from translation to customer service. Over time, they will likely perform central parts of even high-skilled professions, including constructing legal arguments and thus replacing the lawyer's typical tasks (Blijd, 2020). Consider how technology has transformed the medical profession (Howell, 1995). Over the last two decades, the internet has allowed patients to search for information related to their concerns and join online support groups to develop knowledge about the conditions that directly affect them. They then visit their physicians armed with this information, ready to ask for particular services or treatment plans.

In market environments like the United States, some are even prepared to visit another physician to confirm their diagnosis and facilitate their proposed treatment. Thus far, the "new expert patient" has not led to massive deprofessionalization of medicine. However, the doctor-patient relationship has changed (Tan & Goonawardene, 2017; Broom, 2005). Rather than providing their clients with information, physicians are increasingly focusing on helping them manage and interpret it. This new role comes with new expectations, as it requires physicians to stay up to date on new medical research. And patients may make more demands for access to diagnostic, prevention, and treatment technologies. LLMs are likely to increase patients' access to biomedical knowledge. As more scientific research is incorporated into the corpora (Else, 2021), the models will be able to summarize recent findings at a level that lay people can understand. This will exacerbate the trend identified here, in which physicians play a more supportive, interpretive role than a didactic one.

[Image credit: Sven Dowideit (CC BY SA 2.0)]

LLMs will also perform more mundane tasks and shift the risky work onto humans. In the early part of the 20th century, genetic services were only offered to the public through geneticists or genetic counselors who had extensive graduate training and worked at specialized clinics (Hogan, 2016), usually based in academic medical centers. These experts work with families to understand their histories of and experiences with particular diseases and then use this information to predict whether a disease might emerge in subsequent generations and advise how to avoid such circumstances. By the middle of the 20th century, new technologies such as chromosomal analysis assisted their work, but counselors still played the primary role in interpreting the results and guiding people through difficult decisions about marriage, reproduction, estate planning, and communication and disclosure among loved ones (Rapp, 1999). But by the end of the century, companies had begun to offer genetic testing directly to consumers. In contrast to genetic counseling services available mostly at universities for relatively high prices, genetic tests could be ordered online for a much lower fee. And testing companies claimed greater accuracy than the human interpretation of family histories of disease (Parthasarathy, 2007). Today, these direct-to-consumer genetic tests play a central role in assessing susceptibility to disease. While specialized genetics clinics remain, they are small, unknown to many, and tend to focus on complex cases. Primary care physicians are often not able to answer patients' genetics-related questions if they are not equipped with specialized genetics expertise.

patients’ genetics-related questions if they


are not equipped with specialized genetics
LLMs and Gatekeeping
expertise. The tests have thus offloaded the
mundane task of genetic testing from experts LLMs will also transform the social
and put untrained patients and physicians in understanding of authorship and professional
the risky position of interpreting the results. standards, particularly in fields that prize
writing including law, academia, and
journalism. Authorship and credit are socially
Such changes are not unique to high-skilled
constructed, influenced by the circumstances
professions. Despite the rise of point-of-
of the time and place. For much of modern
sale (POS) systems in supermarkets and
history, for example, authorship–as
other stores, cashiers have not become
defined by copyright law–was restricted to
obsolete (Mateescu & Elish, 2019). Instead,
individuals legally recognized as fully human:
they have been retrained to help customers
white men (Vats, 2020). Similarly, major
use “self” checkout systems. However, as
scientific prizes tend not to recognize the
with physicians and genetics professionals,
contributions of women, even today (Lincoln
their labor now focuses on facilitating the
et al., 2012).
user’s interaction with the technology (e.g.,
difficulty scanning
an item or a
coupon), which
substantially
LLMs will interrupt our understandings of
changes the nature
of their jobs. They
authorship and require us to reconfigure
are more likely our systems of evaluation and certification.
to encounter
customers who
are tense and
frustrated in their interactions with the But technology also plays an important role
technology, which puts them at higher risk in constructing authorship. The invention
and removes the pleasure of mundane, of the typewriter in the mid-19th century
informal interactions. In this and many other raised a serious question: how could you be
cases, technological automation transforms sure who authored a document? Previously,
a consistent and predictable job to one that both individuals and legal authorities trusted
deals primarily with exceptions and other the authenticity of documents because they
problems that the technology cannot solve were handwritten and could be scrutinized
(Chui et al., 2015). using formal handwriting analysis. But the
typewriter triggered forgeries and even
fraudulent transactions (Moore, 1959). This
led to the establishment of a “document
examiner” who was widely recognized as
an expert in determining the genealogy of

Based on a document's page alignment, spacing, ribbon, color, overtyping, type variation, retyping, and more, the examiner could link it to a particular typewriter. In other words, with the rise of the typewriter, document examiners became central to the construction of authorship.

The rise of the premier scientific journal Nature is another useful analog. In its early years, the journal targeted a wide audience including laypeople. But over time, it began to focus exclusively on scientific professionals; it excluded political topics and published frequently to compete with other scientific journals (Baldwin, 2015). In the process, it defined science itself, constructing it as a technical domain of interest to a narrow set of practicing experts.

Like typewriters and Nature, LLMs will interrupt our understandings of authorship and require us to reconfigure our systems of evaluation and certification. Everyone, from high school teachers to the judges of the Pulitzer Prize competition, will need to decide whether they will accept LLM-generated work, how much, from whom, and under what circumstances. They may also have to change their standards accordingly. However, if these institutions accept LLMs as legitimate authors, this could increase inequality for researchers or writers who do not use or have access to LLM-based tools and strip the authors who wrote the text the LLMs were trained on of due credit and value.

The Fall and Rise of Tech-Based Work

We believe that in many cases, LLMs are likely to change rather than eliminate most occupations. Before the invention of the traffic light, police teams managed flows of traffic manually, physically standing at intersections and directing traffic (McShane, 1999). Later, traffic lights used timers to automate light cycles, removing police from the activity and making it the responsibility of electricians and engineers. Police officers' responsibilities then shifted to dealing exclusively with violations and enforcing traffic laws.

However, we expect that LLMs will eliminate some types of work completely, and trigger the creation of new types. The rapid social acceptance of the telegraph during the 19th century, for example, allowed buyers and sellers to communicate directly and inexpensively, and removed the need for both wholesalers and supply chain middlemen who had previously facilitated commerce (du Boff, 1984). Meanwhile, telegraphs also created new problems that required whole new categories of work. Telegraphs made it possible for all kinds of information to travel very quickly, regardless of its veracity. Malicious misinformation triggered financial panics. In response, companies developed the new field of 'business intelligence' to validate and distribute trustworthy information about commodities via telegraph (du Boff, 1984). Similarly, we expect that even as LLMs reduce or eliminate the need for some types of human labor, they will also prompt the development of new professions and categories of expertise.

More recent paradigm shifts in communications technology have also created new professions and categories of work. In response to the flood of violent and often harassing vitriol unleashed on social media platforms, an entire ecosystem of work and expertise has developed. Social media companies created the content moderator to police and filter what users post on their platforms, and formed teams dedicated to developing principles and best practices for content moderation (Roberts, 2021; Gray & Suri, 2019). Legal departments handle lawsuits related to freedom of speech, harassment, and defamation. Researchers in academia study the impacts of toxic content on society, the experiences of workers responsible for moderating toxic content, and the effectiveness of different content moderation approaches (Roberts, 2021; Gray & Suri, 2019).

Within social media companies, the work of content moderation spans a global labor hierarchy similar to what we anticipate will happen with LLMs. Employees at the top of the hierarchy, dedicated to developing content policy and making organization-wide decisions, are highly paid, and typically located in San Francisco or another major city in the Global North. Workers at the bottom, who review content and process complaints, are typically located in countries with less expensive labor such as the Philippines or India. These "clickworkers" handle an immense volume of disturbing content at a rapid pace, which takes a psychological toll (Roberts, 2021; Perrigo, 2022). In other words, workers in the Global North do safer, higher paid jobs while workers in the Global South, who have to manage the failures of the technology, are paid less and subject to psychological, emotional, and moral costs. We anticipate that LLMs will create similar divisions of labor, especially because many of the companies that created the need for content moderation are the same ones developing LLMs.

[Image credit: Max Gruber / Better Images of AI (CC-BY 4.0)]

Labor Unrest

The history of technology is a history of labor unrest as workers worry how technological changes will affect their jobs. In fact, Luddites – now a term to describe someone resistant to emerging technologies – were 19th century textile workers who destroyed new machinery because they worried about their job security (Sale, 1995). Similarly, police officers initially resisted the introduction of traffic lights operated by engineers (McShane, 1999). Given this, we can expect some affected workers to resist LLMs as well.

Labor mobilization is likely to emerge first among unions, and then spread more widely.

Workers, both unionized and not, have consistently opposed the implementation of automated checkout devices at grocery stores, arguing that they eliminate jobs and disproportionately impact people of color (Harnisch, 2019). Some cashiers opposed them by going on strike (Lombrana, 2019; Pilon, 2019), and the United Food and Commercial Workers International Union (UFCW) developed public campaigns against Amazon's cashierless grocery store model (UFCW, 2020). Consumers have also refused to engage with automated POS systems in solidarity with grocery workers, as well as in opposition to the loss of tax revenue that results from replacing tax-paying human workers with machines (Harris, 2018). These fears of job loss and degradation are not unfounded; grocery stores in France used automated POS systems to circumvent national labor laws, keeping stores managed by machine check-out counters open past legal working hours (France24, 2019; Alderman, 2019). The case of POS automation suggests that LLMs might similarly incite resistance from workers and consumers based on fear of job loss, violations of social norms, and reduced income taxes.

In addition to automation, there are several recent cases of workplace surveillance technologies prompting resistance from workers. In Amazon fulfillment centers, inventory scanners also track the workers; the scanners calculate the workers' efficiency and productivity and penalize them if they do not meet targets determined by algorithms (Guendelsberger, 2019). Amazon has also implemented software to track union activities in facilities (Del Rey & Ghaffary, 2020). In response, employees started using encrypted communication platforms to organize (Palmer, 2020). Sometimes, labor unrest is less coordinated. Truck drivers have responded to electronic monitoring by finding creative modes of resistance that are less confrontational and more identity-affirming (Levy, 2016). One trucker, for example, simply found a way to reprogram his vehicle's surveillance technology so that he could play solitaire. During the Covid-19 pandemic, when office workers shifted to remote work en masse, companies deployed tracking software on employee computers that captured when people were using their mouse or keyboard. In response, employees started using mouse jigglers, keyboard tappers, and special apps to trick the trackers (Cole, 2021). When bosses use technology to control their workers, workers use technology to evade control. In the next Section, we describe how this technological one-upmanship leads to a loss in social trust overall.

LLMs are poised to remake labor and expertise across a wide swath of industries and professions. Some of these changes are likely to expand access to knowledge or services that were previously limited by certification and professional gatekeeping, as in the case of direct to consumer genetic testing.

Some will improve convenience for users at the expense of workers whose job or expertise becomes obsolete. Meanwhile, whole new forms of work and expertise will grow around the design, management, and use of LLMs. Those new roles will map onto existing corporate hierarchies rather than disrupting or overturning the current economic order. Even so, we expect that LLMs will exacerbate existing tensions between workers and corporations. Because of their flexibility, LLMs will function as both automation and surveillance technology, producing many areas for worker and consumer resistance.

IMPLICATIONS OF LLM ADOPTION

Section 6: Increasing Social Fragmentation

KEY POINTS

• Publics will use LLMs to gather information that aligns with their interests and values, and
ultimately challenge traditional expert authorities.

• The tailored information provided by LLMs will erode shared realities.

• LLMs will produce widespread public suspicion about legitimate authorship.

• LLMs will help outsider groups participate more actively in highly technical discussions
related to science, technology, medicine, and the economy.

• LLMs will be less useful to already marginalized groups, increasing their social alienation.

Thus far, we have described LLMs primarily as a workplace technology that can improve professional life. But we also expect that publics will find great use in these technologies. As we suggested in the last Section, patients may use LLMs to access technical information about their medical conditions, which will empower them in their interactions with physicians. Governments might use LLMs to extract insights from large volumes of public comments about a proposed regulation, as a step towards more politically legitimate policies.

But this movement towards public empowerment will likely have negative impacts for institutional trust and social cohesion. In recent years, the United States has seen declining trust in authoritative institutions, from the government to science (Kennedy et al., 2022). We also have less trust in one another (Rainie & Perrin, 2019). Citizens feel that decisions made by elite institutions do not reflect their knowledge and lived realities (Parthasarathy, 2017). Media fragmentation has contributed to this: citizens can now find information that fits with their needs and values. We expect that LLMs will accelerate this trend. Publics will solicit information that aligns with their interests and values, which may contradict expert knowledge authorized by, for example, scientific, legal, and medical establishments.

LLMs will also be a crucial information-finding tool for social justice groups traditionally excluded from technology and technology policy domains, which will continue to erode institutional legitimacy and accelerate social fragmentation. Meanwhile, LLMs will generate questions about authorship that will erode interpersonal trust. Overall, we expect that the least privileged groups will experience the greatest social alienation. These groups already feel ignored by authoritative institutions and, as we discuss earlier in this report, LLMs are likely to reproduce historical biases about, for example, the physiology, health, and lives of marginalized communities.

LLMs will destabilize institutional authority

For generations, experts across a range of fields, from science to the law, have controlled the production and interpretation of information. In order to file a complaint against their landlord, tenants usually need the help of a lawyer. To combat local air pollution, residents need scientific specialists who can help them interpret their symptoms quantitatively. These knowledge monopolies have given these professions economic, political, and cultural power. But the history of communications technologies suggests that LLMs will disrupt these monopolies.

The emergence of one of the earliest communication technologies, the printing press, is an excellent example. In 16th century Germany, Catholic priests held a monopoly over the Bible's teachings because the text was in Latin. Worshippers could not access the Bible's teachings directly. German priest Martin Luther was frustrated that his fellow priests abused this power; for example, they claimed that the Bible endorsed the exchange of money for admittance to heaven (Edwards, 2004). Luther responded by translating the Bible into vernacular German and using the newly developed printing press to disseminate the text, and his critique of the Catholic Church, across the country. His actions triggered the decline of the Catholic Church and rise of Protestantism in Germany and other parts of Europe (Dickens, 1974).

Similarly, patients today use the internet and social media to challenge the knowledge monopolies of their physicians. These technologies allow them to research their symptoms, diagnose themselves, and come to their medical appointments armed with information. This allows patients to advocate for themselves, develop a better understanding of their condition, and change the types of conversations they have with their physicians, as suggested in the previous Section (Tan & Goonawardene, 2017). However, patients lack the training and experiences of physicians, leading them to sometimes rely on incorrect information or magnify the importance of websites that fit with their preconceptions. When physicians are open to talking through this information, they are able to maintain their patient's trust (Tan & Goonawardene, 2017). But physicians often resist or refuse to discuss the information, which leaves patients frustrated (Stevenson et al., 2007; McMullan, 2006). Ultimately, this leads patients to seek out other physicians who might validate their concerns and the evidence they have uncovered (Murray et al., 2003).

In some cases, following misleading online advice can cause physical harm (Bessell et al., 2003), as in the recent case of COVID-19 treatments (Mariana, 2020). Overall, because physicians no longer have exclusive access to medical knowledge, the internet is eroding some of their power. Patients now feel emboldened to ask questions, and sometimes get other medical opinions if they are unsatisfied with the first.

In Section 5, we discuss how LLMs are likely to reshape knowledge-based professions. Here, we conclude that in the aggregate this will also destabilize the cultural power of professionals. Developers may create apps that provide individuals with medical or legal advice, or scientific information. LLMs could go as far as generating contracts, drafting an amicus brief in a court case, or offering chat-based psychiatric services, thus providing individuals with direct access to some of the services that they would normally only access through experts and often for a significant fee. We expect that this will profoundly challenge social understanding of expertise and authority structures, and may even challenge certification systems. While authority figures will still be necessary for some tasks such as writing prescriptions, they will need to evolve to the changing information landscape.

LLMs as a mobilization tool

We also expect activists to use LLMs to challenge particularly technical areas of science, technology, and public policy. In recent decades, communities–particularly those that are low-income and historically disadvantaged–have become frustrated that science and technology do not reflect their needs and priorities, and have mobilized in response. But in order to influence decision making, they need to develop a technical understanding of the issue at hand and also translate their concerns into quantitative or scientific language (Parthasarathy, 2010). In the 1980s, fed up with the lack of research and treatments for AIDS, activists taught themselves immunology, microbiology, public health, and the science of clinical trials in order to translate their concerns to scientists and policymakers (Epstein, 1996). They used this knowledge to force the National Institutes of Health (NIH) to fund more HIV/AIDS research, demand that the Food and Drug Administration expedite approval of potentially useful treatments, and change how biomedical scientists tested the effectiveness of drugs. A decade later, breast cancer activists, similarly concerned about the scale and ferocity of the disease and what they perceived to be a weak scientific and government response, took a similar path (Dickersin et al., 2001). They set up special workshops to train people with breast cancer about the science of the disease, diagnostics, treatments, and prevention measures, and then successfully lobbied the government to include these people on committees that reviewed applications for breast cancer research funding.

More specifically, activists have a long record of using technology to bring data and gain public attention to their issue of concern. For years, residents of Norco, Louisiana, had complained about the air pollution coming from a local chemical plant. But the government did not take these concerns seriously; according to its measurements, which analyzed average air quality over 24-hour samples taken once every six days, local residents were not at high risk (Ottinger, 2010). But residents worried about the impacts of short term pollution flare ups that the government's sensors did not capture. So they taught themselves not only about the science of air pollution and its health impacts, but also about monitoring technologies. They constructed their own bucket monitoring system that took measurements over shorter duration and during the flare ups (Ottinger, 2010). Ultimately, this influenced air pollution monitoring systems not only at the US Environmental Protection Agency but around the world (Scott & Barnett, 2009). Similarly, patients have used technology to challenge expert understandings of disease. Recently, long COVID sufferers used Twitter to identify themselves, find one another, and crowdsource symptoms and treatments months before scientists or physicians acknowledged that the condition existed (Callard & Perego, 2021). However, although they were pleased that the biomedical community finally began to notice, and Congress authorized a research program dedicated to the disease, they were frustrated that their own knowledge-gathering and expertise were quickly dismissed once authoritative figures stepped in.

We expect that outsider activists will use LLMs in a variety of ways. Communities that feel unheard by scientists, engineers, and policymakers will use the technology to summarize the state of knowledge in a particular area in order to participate more confidently in public debate. Some will use LLMs to extract insights from or evaluate information that experts have traditionally ignored, such as pollution impacts, and then bring them to the attention of other citizens and decision makers with the additional legitimacy of the technology.

LLMs will increase social fragmentation and mistrust

LLMs will help people access information that fits with their interests and values, which means that neighbors might end up consuming rather different media. Of course, this phenomenon is not new. Benedict Anderson (1983) once famously wrote that newspapers construct "imagined communities", but we are now seeing how the explosion of online media and cable news, by providing content according to the user's needs and priorities, can actually produce the opposite. While this diverse media landscape broadens the perspectives involved in public and policy discussion by allowing more people access to information and the ability to find communities where they are most comfortable, it also increases social fragmentation.

We expect that LLMs will further erode the shared realities that were once constructed through common media consumption and, ultimately, erode social trust.

The impact of the fragmentation of US television news provides a cautionary tale. For decades, Americans gathered around their television sets at 6pm to watch the evening news on one of three broadcast networks: NBC, ABC, or CBS. Executives at these networks had spent the day deciding which news was important to tell the American public, and curated the text and images to speak to the nation. But in the late 20th century the number of channels available to households increased dramatically, and people eating dinner could choose all sorts of programming to accompany them. In 1980, CNN appeared, and gave viewers the opportunity to watch the news all day. Media executives and advertisers soon learned that there was an audience for 24-hour news, and launched both Fox News and MSNBC in 1996 to take advantage of the market. Now, there was a battle for viewership among the multiple broadcast and cable networks, and each tried to cater to a different audience (Fox to conservative viewers, MSNBC to liberal viewers, CNN to establishment viewers) (Morris, 2007). They also focused on sensational stories in order to capture and maintain attention (Gordon, 2000).

Cable news channels were successful, building loyal audiences over time. Fox News, in particular, became a favorite for conservative audiences and eventually integral to their identities (Hoewe et al., 2020). These channels don't just provide different perspectives on the same issues. They often report on different issues entirely, which gives their viewers different understandings of what is happening in the world and makes it difficult to maintain a shared reality, which contributes to political polarization (Gordon, 2000). In recent years, frequent viewing of Fox News, for example, shaped attitudes towards building a wall on the US border with Mexico and government action regarding climate change (Hoewe et al., 2020).

LLMs are likely to produce greater social fragmentation than cable news, as users will be able to use LLMs to distill the text they consume into forms that fit their individual needs and priorities. A user could use an LLM to filter news articles or summarize the key pieces of information. Consumers would thus no longer be exclusively shaped by the priorities of media executives, at least until media executives figure out how to integrate LLMs into their offerings. Websites or social services that use LLMs to generate text may cater to different demographics with new specificity and accuracy, but in the process of providing bespoke information they will erode shared realities even further.

As noted above, unlike social media, LLMs will not even be able to build new communities. Meanwhile, developers will have an incentive to intentionally tune LLMs to generate text that is as attention-grabbing as possible, in order to capture more users.

In addition, as LLMs get better at writing text that is indistinguishable from something a human could have written, they will not only challenge the cultural position of authors but also trust in their authorship. As noted in the previous section, every technology that enables new ways to make, copy, and distribute creative work produces a new round of both cultural and legal negotiations about how authorship is defined and authenticated. LLMs will be used for writing tasks that range from enhanced spelling and grammar checkers to producing entire paragraphs or even articles from whole cloth. This will make it much more difficult to determine just how much human effort was involved in the creation of a given text and will enhance social suspicion related to authorship.

For example, many schools and universities today use plagiarism detection technologies to prevent student cheating. One such service, Turnitin, compares the submitted paper against its massive database of previous student papers, as well as common sources like encyclopedias and textbooks (Foster, 2002; Davies, 2022). If the software determines that some or all of the text is substantially similar to material in the database, it flags the paper for cheating. Students are required to submit their papers to Turnitin for plagiarism review before they go to the instructor for grading. However, this has triggered a technological arms race. A variety of services have emerged to help students cheat while evading detection by Turnitin, from websites full of how-to advice to paid essay writing services. They are all findable with a quick web search.

LLMs will trigger a similar dynamic. As students use the technology to write better papers, instructors will employ more and more sophisticated methods of detecting LLM assistance, and students will fight to stay one step ahead. On both sides, companies will be ready to stoke and profit from mistrust by selling tools and services that promise to detect or obscure the use of an LLM in writing. All of this will erode trust between students and their educational institutions. This phenomenon will not be unique to educational institutions. The more writers of all kinds use LLMs for assistance, the more effort will go into authenticating whether they "really" wrote their article or book, and the more writers will find new ways to take advantage of LLM capabilities without detection. In the long run, this will foster cultures of suspicion on a massive scale.

LLMs will have disparate impacts on social trust

Although we expect LLMs to be framed as mechanisms for empowerment and increased access to knowledge, the composition of the corpora coupled with the priorities of developers will likely result in a technology that is more useful for dominant groups than for communities that are already marginalized. Since they are trained primarily on text written by the majority, LLMs better reflect their views and linguistic style. There is already evidence that LLMs reflect racial and other forms of social bias against marginalized populations (Abid et al., 2021; Greene, 2021). Meanwhile, privileged members of society are likely to have more opportunities to shape the ways LLMs are integrated in daily life. This, in turn, will create distance between social groups. The more that LLMs shape public and private sector services, the more marginalized communities will feel alienated from them and from society.

In Section 4, we discuss how technology can reinforce systemic bias. Here, we emphasize how the same technology can be seen completely differently by dominant and marginalized groups, which has serious impacts for public trust. For decades, the United States and other countries have used cameras to ensure public safety. However, in practice, they tend to extend and reinforce surveillance over historically disadvantaged communities of color while making dominant communities feel protected. Amazon, for example, portrays its Ring doorbell–a motion-sensing, video-recording doorbell connected to an app–as fun, a way to be a good neighbor, and stay safe (Selinger & Durant, 2021). Part of the marketing behind the devices focuses on "surveillance as a service" (West, 2019). Scholars call this "luxury surveillance", because it enables self-reflection, empowerment, or care for a select group (Gilliard & Golumbia, 2021). This framing focuses on white and wealthier members of society, who tend to view law enforcement and surveillance technologies as protecting their interests. Furthermore, although many individuals dislike surveillance, they may feel like they have nothing to hide and are in control of the technology (rather than the opposite). By contrast, people of color and other disadvantaged communities tend to experience "imposed surveillance", such as Detroit's facial recognition technology program Project Greenlight, which is imposed on the local population. They tend not to trust police or surveillance technologies because they have a long legacy of being victimized by them (Browne, 2015).

[Image credit: WikiCommons user Abas Gemini (CC BY SA 4.0)]

Biometric technologies, which track location and bodily measurements such as heart rate, pulse, and sleep, have similar disparate impacts (Gilliard & Golumbia, 2021). Law enforcement officials have long used ankle monitors to keep track of individuals caught up in the criminal justice system, whether out on bail, on house arrest, or on parole. But the ankle monitor is quite similar to the Fitbit or Apple Watch, except that the agency of the user differs. As we discuss in Section 4, even roads have had these kinds of impacts. In the 1950s, city planners across the United States used the emerging interstate highway system to segregate Black and white communities. This quickly became a crucial dividing line that allowed white neighborhoods to attract investment and feel protected, while their Black neighborhoods were isolated and economically starved (Miller, 2018).

We expect LLMs to have similar impacts. Privileged communities might be able to choose to use them, to help write a blog post, summarize technical information, or file a legal complaint against a service provider. But already marginalized individuals may not get to decide when or how they encounter an LLM. For example, government agencies might use them to evaluate someone's application for social services, or as a chatbot that answers public questions. But because the technology is less likely to be accurate or useful to these communities, it may make it more difficult for these individuals to access crucial programs that might improve their lives.

This is likely to increase social alienation of already marginalized communities. Thanks to the long legacy of racism in biomedicine, Black communities distrust both scientists and physicians. This has had serious impacts during the COVID-19 pandemic, for example, when the US Black community had a particularly low vaccination rate (Willis et al., 2021). Similarly, frustrated by a long history of media bias (Race Forward, 2014), Black Americans tend to trust media sources based on how they portray their racial group (Kilgo et al., 2020).

LLMs' capacity to summarize and generate text will undoubtedly benefit users by answering their complex queries and making technical information more accessible. This will empower people to fight for their needs in their individual interactions with medical and legal experts, and to mobilize against technical organizations. But this individual and community empowerment has social costs. We expect that LLMs' capacity to produce tailored text will further fragment society, as publics can essentially generate information or look at information through a lens that fits with their needs and values. This is likely to hurt already marginalized communities the most, since LLMs are likely to be the least useful for their needs and even reproduce biases against them.

LLM CASE STUDY

Section 7: Transforming the Scientific Landscape

KEY POINTS

• LLMs will transform both the kind of research scientists do, and how they do it.

• Academic publishers are likely to develop LLMs to maintain their monopoly power over most
scientific literature.

• Using LLMs to conduct scientific evaluation will generate controversy among scientists.

• LLMs will reinforce Anglo-American dominance in science.

Throughout this report, we have anticipated the social, political, and equity implications if LLMs are adopted across a range of sectors. In this chapter, we examine how LLMs might transform one sector in particular: science. In this analysis, it is crucial to remember that the major LLMs currently under construction are based on corpora composed primarily of open access texts available online. But, most recent research publications–particularly scientific journal articles–are owned by academic publishing companies such as Elsevier and JSTOR. Therefore, they are not part of these corpora. We expect that these publishers might develop their own LLMs that leverage their proprietary text databases, particularly at a moment when universities are frustrated by their high fees (Resnick & Belluz, 2019). These proprietary LLMs are likely to be of greatest interest to the scientific community because they will be the most up-to-date, in contrast to publicly available LLMs that may contain slightly older scientific knowledge. As they become more important to academic researchers, universities may be forced to maintain their subscriptions. Less likely is that academic publishers will sell their texts to the large companies for inclusion in their corpora, because it would make their texts essentially available to everyone.

In this new environment, LLMs will transform scientific practices, including authorship and citations. They may also transform peer review systems, which have increasingly come under scrutiny. LLMs will also reinforce Anglo-American dominance in


science. While they may help some scientists from low and middle income countries participate more actively in the international scientific community and engage in cross-national collaboration, the English and Chinese language dominance of the corpora will limit efforts to “decolonize” science. Finally, LLMs will limit the power of the open access movement, as academic publishers are likely to have more resources than governments, non-profit organizations, and individuals to generate LLMs.

LLMs will transform scientific practices

Remaking scientific authorship and methods

Given their capacity to process and summarize huge amounts of text, we expect LLMs to have a profound impact on authorship and scientific methods as well as evaluation. As we describe in more detail below, researchers in non-English speaking countries are likely to use LLMs to more accurately translate texts or check their grammar or spelling. This might make it easier for them to publish in top journals, which are invariably published in English. Even English-dominant researchers might use LLMs to generate more generic parts of scientific texts, including materials and methods, and parts of introductions and conclusions. As we discuss in Section 5, we expect that these uses will trigger questions about rightful authorship.

We also expect LLMs to profoundly shape scientific practice. The development of particle accelerators in the 1930s allowed physicists to investigate the structure of the atomic nucleus, and more recently to investigate subatomic particles (Ishkhanov, 2012). The polymerase chain reaction technique, which makes millions of copies of small pieces of DNA, transformed genetics and biotechnology research and enabled mapping and sequencing the human genome, the study of ancient DNA, and gene manipulation including CRISPR gene editing (Rabinow, 2011). And the internet has already had profound impacts on research. It has made it easier for scholars to read research across fields, and thus promote interdisciplinary thinking (Herring, 2002). It has also helped researchers contact a wider array of potential subjects, whether for clinical trials or for surveys and interviews. Social scientists, for example, use email, social media, and even the “crowdworking” platform Mechanical Turk (MTurk) owned by Amazon to publicize their studies and recruit subjects. MTurk allows researchers to access a fairly representative population for a small fee (less than half of minimum wage) (Fort et al., 2011).

LLMs will similarly enable new forms of research, perhaps most notably in the humanities. Historians and scholars of English literature will be able to quickly generate summary information about historical texts or genres in the major corpora or new texts they wish to consider. However, scholars may be reticent to use these sources for two reasons. First, scholars accustomed to using archives and carefully documenting the provenance of texts are likely to be wary


of LLMs as data sources at least initially, because of the lack of transparency about the texts contained in the corpora and the inability to cite them specifically. Scholars and academic publications will likely have to develop conventions about whether and how LLMs are used and documented. Wikipedia, for example, has become an important source introducing scholars to a particular topic, but is generally not acceptable as a reference in serious scholarly work (Chen, 2009). Second, because corpora predominantly include dominant and privileged voices, they may be of less utility in fields that are increasingly trying to capture the perspectives and experiences of those who have been historically marginalized.

LLMs will also continue to transform the nature of scientific inquiry. In recent years, there has been an explosion in enormous datasets and the computing power needed to process them. As a result, scientists can now use algorithms to identify correlations in huge datasets rather than starting with hypotheses (Huang, 2018; Kitchin, 2014). However, these correlations tell them neither about causality nor how such relationships emerge. In addition, just because a correlation appears in the data doesn’t mean it is real or meaningful (Zhang, 2018). Researchers could also use LLMs as a new tool for data analysis, using them to extract insights from or summarize large amounts of text. Qualitative researchers are often constrained by the laborious manual processes of thematic coding, for example, but LLMs would allow them to analyze greater quantities of data or draw insights from data sources such as social media posts that were previously too large to consider as research sources. Psychologists and political scientists could use data from the corpora to assess public attitudes and concerns. Given academic pressures to publish (“or perish”), we expect the proliferation of articles identifying data correlations. However, without changing statistical methods, this could also increase the production of spurious data that cannot be reproduced.
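To make the thematic-coding use above concrete, here is a minimal sketch (our own illustration, not a tool discussed in this report) that uses the open-source Hugging Face transformers library to assign candidate themes to short texts. The themes and example posts are invented, and machine-assigned codes would still need validation against human coders:

from transformers import pipeline

# Zero-shot classification repurposes a natural language inference model to
# score arbitrary candidate labels, so no task-specific training is needed.
coder = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Hypothetical themes a qualitative researcher might be coding for.
themes = ["distrust of institutions", "economic hardship", "community support"]

# Invented example posts standing in for a large social media dataset.
posts = [
    "The clinic never calls back; I gave up on getting answers.",
    "Neighbors dropped off groceries again this week.",
]

for post in posts:
    result = coder(post, candidate_labels=themes)
    # Labels come back sorted by score; take the highest-scoring theme.
    print(result["labels"][0], "<-", post)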
Scientific Credit Systems will Change

Scientists identify the lineage of their interests, theories, and methods through explicit citations to earlier work. This is an important method of providing credit. It has also become crucial to measuring scholarly impact. Scientists use “citation counts” to decide whether a publication is worth reading, or citing in their own publications. Hiring, tenure, and promotion committees use these indicators to judge a scientist’s impact. Meanwhile, journals have developed “impact factors” based on the average number of times their articles are cited; these impact factors in turn affect scientists’ decisions where to publish and university decisions on how to evaluate employees and applicants. However, citation practices are also highly political; white men tend to be the most cited across fields (Caplar et al., 2017; Dworkin et al., 2020).
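For reference, the “impact factor” mentioned above is, in its widely used two-year form, a simple ratio; the sketch below follows the standard Journal Citation Reports-style definition, with invented numbers for illustration:

\[
\mathrm{IF}_{y} = \frac{\text{citations received in year } y \text{ to items published in years } y{-}1 \text{ and } y{-}2}{\text{citable items published in years } y{-}1 \text{ and } y{-}2}
\]

For example, a journal whose 250 articles from the two prior years drew 1,000 citations this year would report an impact factor of 1,000 / 250 = 4.0.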
We expect LLMs to reduce citations overall, and ultimately reinforce existing biases in research fields. While LLMs currently do not have the technical capability to identify which text from the corpus informed the generated text, if a future LLM is able to provide


citations along with the text summaries, we expect it to privilege highly cited articles which are not likely to represent the field’s diversity or its most novel findings. But in the more likely scenario, scientists might query an LLM about the prevailing knowledge related to a particular phenomenon and simply treat the output as general knowledge that doesn’t need to be cited. Consider the recent controversy over sharing data about COVID-19 genomic variants. Western scientists advocated putting this information into an open database that could be used across the world, to facilitate quicker understanding of disease progression and development of prophylactics, diagnostics, and treatments (Van Noorden, 2021). However, scientists from Southern countries protested, arguing that the open approach would rob them of the opportunity to receive credit for their hard work identifying variants such as Omicron (Maxmen, 2021). They worried further that scientists from wealthy nations would publish papers based on–but not citing–their results, because they had the resources to do further analysis, write up their findings, and submit them for publication. More generally, they were frustrated that as soon as they had begun to build expertise and resources to participate in the transnational world of science, Western leaders seemed to be changing the game. Similarly, marginalized scientists might worry that LLMs will make it more difficult for them to receive credit and for their ideas to become recognized as part of a mainstream corpus of knowledge.

Transforming Peer Review

We also expect research funding agencies, scientific publishers and editors, and even patent systems to consider incorporating LLMs into their review processes. These institutions depend on technical experts to assess the novelty of a study or invention, the appropriateness of the methods, and the plausibility of findings. Invariably, these experts also advise researchers how to consider and address counterfactuals,


strengthen their claims or findings, or simply improve their writing. But peer reviewers are unpaid, and as academic pressures increase it is difficult to find good peer reviewers; editors say that they spend an enormous amount of time searching, and even then the reviewers may be uninformed, provide insufficient evaluation, or take too long and delay publication (Benos et al., 2007; Severin & Chataway, 2021). LLMs could solve many of these problems. Developers could create algorithms based on the backlists of all scholarly publications, or smaller ones targeted to a particular field or a particular journal, in order to identify high-quality publications and even advise authors how to improve their publications or fit better with the journal’s standards. In fact, researchers have already begun to develop algorithms that claim to predict the grantability of patent applications, and even which patents are likely to be the most consequential (Candia & Uzzi, 2021). The next step would be to use LLMs to determine patentability, a particularly attractive option as patent offices struggle to hire and retain their personnel.

In the short term, editors might use LLMs as a half-measure, to help identify peer reviewers. They might ask the LLM: “who is an expert in X topic?” Editors have long used email and the internet in this way, which has allowed them to diversify their pool of reviewers. However, because LLM corpora are composed of historical texts, this use might actually eliminate the gains in reviewer and field diversity made in recent years. Unless the LLM is used very carefully, and with additional checks, this use could also affect a field’s trajectory. An LLM might define reviewer expertise in terms of the number of citations in a particular journal (or set of journals), which may not represent a field’s cutting edge.

If humans begin to use LLMs to conduct peer review itself, this could become a bigger problem. LLMs are likely to produce conservative peer reviews. We expect editors to use LLMs to scaffold parts of the peer review process–that is, to train the technology to look for particular elements in a paper, such as particular methods–to ensure quality reviews. However, this scaffolding could produce inflexible standards and slower recognition of truly novel results. It could also transform scientific practices. Consider the history of the IRB, in which narrow definitions of risk, benefit, and generalizable research have become hurdles for researchers (White, 2007). Or, educators in K-12 schools, who have increasingly had to twist their instructional strategies to accommodate standardized testing (Shelton & Brooks, 2019). Overall, LLMs might be good at evaluating papers in a field where the conventions, materials, and methods are well-established. However, it is hard to imagine how a corpus based on historical texts could adequately evaluate new and evolving science (Kuhn, 1962); we already know that this is a challenge for human reviewers (Pontis et al., 2017). As a result, widespread use of LLMs for primary peer review could limit creativity. It could also perpetuate biases against certain types of investigation, such as on structural racism or systemic inequality (Hoppe et al., 2019).
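To make the scaffolding idea concrete, here is a minimal hypothetical sketch (our own illustration, not a system described in this report) in which an editor’s tool asks a language model only whether expected elements appear in a manuscript. The checklist, prompt wording, and query_llm placeholder are all invented:

# Hypothetical sketch of LLM-scaffolded manuscript screening: the model is
# asked only whether expected elements are present, not to judge quality.
CHECKLIST = [
    "a description of the study methods",
    "the sample size",
    "a statement on ethics approval",
]

def query_llm(prompt: str) -> str:
    """Placeholder for whatever language model API an editor might use.
    Returns a canned answer here so the sketch runs end to end."""
    return "yes"

def screen(manuscript: str) -> dict:
    """Return a yes/no presence flag for each checklist element."""
    answers = {}
    for element in CHECKLIST:
        prompt = (
            f"Manuscript excerpt:\n{manuscript[:2000]}\n\n"
            f"Does the excerpt contain {element}? Answer yes or no."
        )
        answers[element] = query_llm(prompt).strip().lower().startswith("yes")
    return answers

Even this narrow use carries the risks noted above: whatever the checklist encodes becomes a de facto standard, and elements it omits become invisible to the review.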


Scientific Evaluation by LLMs will Create Crises of Credibility

LLM-based scientific evaluation systems could also erode trust both within and beyond science. Today, peer review is the predominant form of scientific evaluation. Experts in a subfield review grant applications and scientific publications, and validate the ideas or findings as credible and worthy of funding or further circulation through scholarly journals or academic presses (Latour, 1987). Media outlets and governments often expect research to be peer reviewed before reporting on it or using it as the basis for policymaking. But this approach to evaluating scientific results is not natural or self-evident; it is the product of social negotiations and settlement. And it could certainly be otherwise. In the 17th century, wealthy gentlemen were assumed to be trustworthy–and producing credible scientific findings–because they were free from economic pressures (Shapin, 1995). They maintained their credibility by employing probabilistic discourse and minimizing precision, so as to avoid direct conflict with their peers. Scientists also trusted others’ findings because they could witness the experiments themselves (Shapin & Schaffer, 1985). As the scientific enterprise grew, witnessing became “virtual”, through standardization of methods, research publications, and peer review (Baldwin, 2018). These changes, however, came from within the scientific community, invariably when they concluded that they needed to establish credibility among new audiences.

In fact, professional communities respond quite poorly to externally imposed evaluation systems, and these external impositions tend to be less successful when the community is powerful. For example, in 1836 the US Congress passed a law requiring the Patent Office to employ examiners with science and engineering backgrounds, to replace the clerks who had previously handled patent applications. It was concerned that the bureaucracy was issuing too many patents based on old, unoriginal, and non-workable ideas, and believed that highly trained technical experts would solve the problem (Swanson, 2009). However, when these new examiners applied scientific standards for novelty and nonobviousness, they found that very few applications should be granted. Patent agents and lawyers, who were accustomed to a bureaucracy that had only legal criteria for granting patents, protested vigorously and threatened that if no patents were granted, the fledgling US economy would fail. They were ultimately successful; Patent Office administrators negotiated with the new examiners to lower their standards.

Credit: Philadelphia College of Pharmacy and Science (CC BY 4.0)


Physicians launched similar protests when the United States began to consider a national health care system in the mid-20th century, because they worried that it would lead to new forms of oversight and evaluation (Starr, 1982).

Especially because many scientists have already begun to criticize the business models of academic publishing–and ultimately distrust their intentions–we expect that if these companies build LLMs to replace peer review it will create a similar crisis among scientists. Scientists will not trust the technology to replace their judgment, and will likely point out the types of limitations that we have outlined above. We also expect publics to question scientific results that LLMs have evaluated, particularly in the early days of the technology or in response to the publication of particularly controversial ideas. And if communities don’t trust evaluation systems then they will challenge the institutions promoting them. Prescription drug recalls have engendered not only mistrust in the US Food and Drug Administration, but hesitancy towards vaccines (Goldenberg, 2021). Similarly, distrust in the US Centers for Disease Control and Prevention has exacerbated resistance to mask wearing and other protection measures during the COVID-19 pandemic.

LLMs will Reinforce Public Myths about Science

As we have discussed in earlier sections of this report, we expect LLMs will increase the trend towards open and free information facilitated by the internet. Patients will be able to query disease symptoms and receive summaries of related medical articles. Curious individuals can generate lay summaries about the most technical topics, from astrophysics to artificial intelligence. In many respects, this will, as developers argue, democratize access to knowledge.

But as the technology presents complex scientific findings in comprehensible language, we expect that it will flatten important nuance, caveats, error rates, and uncertainties. This, we fear, will reinforce the illusion that scientific findings are objective, stanceless, value-free, and are generated with a view from nowhere. Ultimately, this could exacerbate public skepticism of science. We have seen this with previous efforts to popularize science. Scientific journalism, for example, tends to minimize what scholars call the “translational gap”: the amount of additional research needed before scientific findings can lead to better medical practice (Summers-Trio et al., 2019). Instead, it tends to overestimate the importance of early stage studies. For example, many early biomedical studies are performed on mice. This can provide general indicators about the safety or effectiveness of a particular treatment, or shape of a particular phenomenon, but


mice are quite different physiologically than humans. However, media articles still report these results with breathless excitement, creating false expectations about the imminence of treatments and the power of science (Chakradhar, 2019). Similarly, museums and other exhibitions such as World’s Fairs tend to produce idealized images of cultures and countries, reinforcing distorted public understandings with real geopolitical consequences (Swift, 2019). We expect LLMs to reinforce a similarly idealized image of science, which will leave publics bewildered and frustrated when they confront its realities. Ultimately, this could exacerbate problems of public trust and alienation particularly among publics already questioning scientific findings (Funk, Kennedy, & Tyson, 2020; Funk, Kennedy, & Johnson, 2020).

LLMs will Hurt Open Access Movements

Finally, we expect LLMs to become another tool for academic publishing giants to maintain their control over scientific knowledge. In recent years, researchers have become increasingly concerned about how journal subscription costs hurt access to knowledge. This, they argue, limits who can participate in scientific knowledge production and ultimately, the quality of science itself. In response, universities are canceling huge journal subscriptions (Resnick & Belluz, 2019). Researchers are sharing preprints on their own websites, or on portals such as Sci-Hub and arXiv.org (Nicholas et al., 2019). They are publishing in “open access” journals. Journals may implement new forms of monetization by charging LLM developers who use their university subscriptions to incorporate journal articles into training corpora. But we believe that LLMs will increase the attractiveness of Elsevier and other academic publishers themselves. Given their financial resources and monopolies over huge volumes of scientific texts, publishers could create their own LLMs for researchers and bundle them in their services to academic institutions. They might even require universities to purchase all of their journals in order to access their LLM. Indeed, companies frequently leverage emerging technologies to maintain or enhance their monopoly power. Monsanto spliced “terminator gene” technology into its genetically modified crops in order to prevent them from replicating (Masood, 1998). This meant that farmers could not replant their seeds after the growing season, which they had done for hundreds of years. Similarly, academic publisher JSTOR, in conjunction with MIT, used its internet surveillance capabilities to track down and stop excessive downloads of journal articles it owned. The activist Aaron Swartz downloaded these articles in order to promote their open access; he was later criminally charged for this act and died by suicide (Schwartz, 2013).

Given the vitality of the open access movement, we expect scientists to resist by creating grassroots LLMs. They might build on the work of non-profit initiatives such as EleutherAI and rely on pro bono expertise and donated pre-prints and other text to develop apps. Scientists made similar attempts to gather data about disease-causing mutations in genes linked to breast


and ovarian cancer (known as the BRCA genes), to compete with biotechnology company Myriad Genetics’ virtual monopoly on BRCA gene testing in the United States (Conley et al., 2014). Myriad used its testing monopoly to build a proprietary database of information about the genomic variants discovered, their association with disease, as well as individual and family health histories. Even though it lost its US testing monopoly in 2013 after patients, physicians, and scientists contested its patents (Parthasarathy, 2017), Myriad maintained its intellectual property through this database; patients and physicians preferred to use Myriad’s testing service rather than others because the database could provide better interpretations about the implications of the genetic variants for disease. In order to build their alternative, scientists had to rely on word of mouth, and voluntary submissions of test results and other information from patients and physicians. This made it virtually impossible to build a database as powerful or useful as Myriad’s, which in turn made it difficult to challenge the company’s monopoly. We expect scientists developing grassroots LLMs to confront similar challenges, even if they have access to adequate technical expertise and financial resources.

Credit: Ernesto Del Aguila III, NHGRI

LLMs Will Reinforce Anglo-American Scientific Dominance

Like the telephone and the internet, LLMs may facilitate global scientific communication and even cooperation. However, given the technology’s capacity to summarize and translate text, some may assume that it could facilitate real international inclusion and even the “decolonization” of science. Consider how the internet has changed science. Internet search engines, scientific databases, and social media have helped scientists learn about and build upon one another’s work, regardless of where they are in the world. Email has facilitated communication, allowing researchers to contact one another and even collaborate despite living in different time zones or on distant continents. Indeed, there is evidence that international scientific collaboration has increased significantly in recent years, allowing scientists to share project costs, gain access to expansive or unique physical resources, share more data, and enhance creativity (Matthews et al., 2020). And yet, technology-mediated communication also increases misunderstandings. Whereas previous


collaborations may have required scientists to visit laboratories for extended periods of time to learn methods, now such collaborations can occur without any in-person contact. This makes it much more difficult to transfer tacit knowledge–intangible scientific practices–which is essential for proper collaboration (Collins, 1992). However, scientists may not be aware that this knowledge is lost.

In the abstract, LLMs could allow scientists across countries to read texts in their native languages, facilitating communication. In practice, however, the picture already looks more complicated. As we have noted repeatedly throughout this report, LLM corpora–particularly those being built by the major companies–are primarily in English, and to a lesser extent, Chinese. This is crucial when considering the impacts of LLMs for international scientific cooperation; it means that the technology’s translation capabilities are likely to be poor, particularly for the languages where there are fewer digitized texts. While scientists in non-English speaking countries may initially use them for translation purposes, the outputs will likely be filled with errors and this practice will stop. However, we do expect scientists to use LLMs to improve their English writing, to facilitate journal publication. While scientists in former British or US colonies could also use them to gain easier access to knowledge, they may still not have access to the proprietary LLMs sold by academic publishing companies. Thus, while LLMs may help some scientists in low and middle income countries, the prevailing political economy of science is likely to prevent true mutual learning and engagement.

Instead, we expect LLMs to reinforce Anglo-American dominance in science while also helping Chinese scientists. In fact, it may also promote international collaboration between the two. Our research suggests that most efforts to promote mutual understanding across nations cannot escape geopolitical power struggles. Consider the World’s Fairs, international platforms to showcase national scientific and technological achievements and facilitate cultural exchange, which began in the late 18th century. Cities hosting these yearly events brought global attention to their activities, and the sites also usually featured themed pavilions from a variety of countries that allowed them to showcase themselves and perhaps even develop grounds for collaboration (Molella & Knowles, 2019). However, countries used these as opportunities to advance their priorities. In 1993, South Korea’s fifth largest city Daejeon hosted a Specialized Expo which produced international investment, and brought


attention to another region beyond the large and prosperous city of Seoul (Knowles, 2019, p. 207). Similarly, while both the United States and Soviet Union focused on similar themes of technological progress and cultural diversity in the 1958 World’s Fair, the United States took a less serious approach in order to downplay the perception of its strength and power during the Cold War (Swift, 2019, p. 38). Nature, likewise, has always characterized itself as a premier scientific journal that explicitly serves an international community despite its British base. However, in its early decades it saw the world through a British lens (Baldwin, 2015). Contributors adopted a voyeuristic approach to foreign science, and often used it as a foil to comment on national affairs.

The more common LLMs become as a scientific tool, the more they will reinforce English as the lingua franca of science. This will likely also mean that the values and concerns of the English-speaking world–particularly the United States and Britain–will dominate global scientific priorities. Furthermore, knowledge produced in English may be viewed as more generalizable than knowledge produced in other languages. And yet, these political implications may remain hidden because LLMs will be promoted as a technology that will be able to truly globalize science.

In this section, we have explored the range of implications that LLMs will have on scientific knowledge and practice. We expect LLMs to transform scientific priorities and practices, and systems of authorship, credit, and evaluation. This may produce crises of credibility, both within science and beyond. It will also strengthen the power of scientific publishers, despite growing frustration about their knowledge monopolies. Finally, while we are hopeful that LLMs could facilitate international cooperation and inclusion, we fear that this will not materialize unless the corpora become much more diverse.


Policy Recommendations
LLMs have great potential to benefit society. However, the priorities of the current
development landscape make it difficult for the technology to achieve this goal.
Below, we articulate how both LLMs (the models themselves, corpora, and output)
and LLM-based apps must be regulated in order to maximize the public good. We also
recommend greater scrutiny of LLMs’ impacts on labor and the environment. Finally,
we recommend that the National Science Foundation (and similar science funding
agencies around the globe) invest more heavily in research related to LLMs and their
impacts, to balance attention in an area currently dominated by the private sector.

RECOMMENDATION 1
The US government must regulate LLMs, for example through the Federal Trade
Commission. This should include:

a. Clear definition of what constitutes an LLM.

b. Evaluation and approval of LLMs based on: 1) process of corpus development and ongoing procedures for maintenance and quality assurance; 2) diversity of the corpus; 3) LLM performance, including accuracy, particularly in terms of output related to marginalized communities; 4) transparency of the corpora and algorithms; and 5) data security.

c. Evaluation of efforts to diversify corpora. Government should monitor data extraction practices to ensure that efforts to diversify the corpora are ethical.

d. A complaint system that allows users to document their negative experiences with an LLM. These complaints should be publicly available. Developers must articulate in writing how they have addressed all complaints.

e. Ongoing oversight and monitoring of LLMs. Developers must make the corpora available to regulators for periodic testing. This should include both basic accessibility and comprehensibility to someone with a basic understanding of data and computer science.

f. Requirement to label all LLM output as such and include information about
the developer.



RECOMMENDATION 2
The US government must regulate all apps that use LLMs, for example through
the Federal Trade Commission, according to their use. The more consequential
the LLM output, the greater the regulatory scrutiny (e.g., LLM-based apps related
to criminal justice and patient care receive more extensive evaluation). Evaluation
should consider:

a. Whether app developers are using the right LLM for their needs.

b. Likelihood that the app will generate false or dangerous results.

c. Potential benefits for the user.

d. Social, equity, and psychological implications, including potential harms to end users.

RECOMMENDATION 3
Either a national or international standard setting organization (e.g., National
Institute for Standards and Technology, International Standards Organization)
must publish yearly evaluations of LLMs. They should assess: 1) diversity of the
corpora; 2) performance; 3) transparency; 4) accuracy; 5) data security; and 6) bias
towards marginalized communities.

RECOMMENDATION 4
The US government must enact comprehensive data privacy and security laws.

RECOMMENDATION 5
Under no circumstances should LLM-based apps deployed by the government
(e.g., chatbots that provide information about social services, pre-trial risk
assessment apps in criminal justice proceedings) harvest personally identifiable
information.



RECOMMENDATION 6
The agencies that regulate LLMs and LLM-based apps, those that incorporate
LLMs into their services, and all standard-setting bodies (e.g., the National Institute
for Standards and Technology) must employ full-time advisors in the social and
equity dimensions of technology. This “Chief Human Rights in Tech” Officer
would advise procurement and technology evaluation decisions, monitor the
technology once it is used and flag problems, and address disparate impacts.

RECOMMENDATION 7
Both national and international intellectual property authorities (e.g., the US
Copyright Office, the World Intellectual Property Organization) must develop
clear rules about the copyright status of LLM-generated inventions and artistic
works.

RECOMMENDATION 8
All environmental assessments of new data centers must evaluate the impacts
on local utility prices, local marginalized communities, human rights in minerals
mining, and climate change.

RECOMMENDATION 9
The US government must work with other governments around the world
(perhaps under the auspices of the United Nations) to develop global labor
standards for tech work (including minerals mining).

RECOMMENDATION 10
The government must evaluate the health, safety, and psychological risks that
LLMs and other forms of artificial intelligence create for workers, e.g., reorienting
them towards more complex and often unsafe tasks. The Occupational Safety
and Health Administration can perform this role, but it will require new
regulations for workplace safety and an expansion of its purview to include
psychological risks.



RECOMMENDATION 11
The US government must develop a robust response to the job consolidation
that LLMs, and automation more generally, are likely to create. At a targeted level
this should include job retraining programs and at a broad level, a guaranteed
basic income and universal health care.

RECOMMENDATION 12
The National Science Foundation must substantially increase its funding for LLM
development. This funding should prioritize:

a. Developing alternative corpora and models, especially those driven by the needs of low-income and marginalized communities (and in partnership with them).

b. Meetings that establish standards for making corpora representative and for incorporating the knowledge of citizens (particularly low-income and marginalized communities).

c. Supporting updates and maintenance of existing corpora and models (in contrast to just making more new models).

d. Supporting research into building new types of models that are more easily updated and maintained.

e. Research into evaluation of fit between model and use.

f. Research on the equity, social, and environmental impacts of LLMs.


Developers’ Code
of Conduct
LLMs are likely to trigger profound social change. Both LLM and app developers
must recognize their public responsibilities and try to maximize the benefits of
these technologies while minimizing the risks. To do this, they should adhere to the
following practices:

LLM Developer Responsibilities

• LLM developers should dedicate significant effort and resources to maintaining and improving on existing LLMs rather than exclusively developing new ones. LLMs must be kept up to date with changing language and sentiments.

• LLM developers should curate corpora with care. They should resist appropriating already assembled bodies of text that were created for other purposes. They should instead define standards their corpus needs to meet and build a collection of texts with those standards in mind.

• Construction of the corpora must be ethical and be reviewed by ethics experts before deployment. Authors should be able to opt out of their texts’ inclusion in the corpora.

• LLM developers should make each corpus publicly accessible for other developers and interested stakeholders to scrutinize. They should be open to the problems identified by these stakeholders and make changes accordingly.

• LLM developers should prioritize research in the following areas:

  • Building models that are easily updated and maintained

  • Evaluating the fitness of a model for a particular task

  • Equity, social, and environmental impacts of LLMs

  • Understanding and explaining to end users the rationale behind LLM output

App Developer Responsibilities

• App developers must carefully evaluate the social and equity implications of their products before development, with the help of potential users, relevant stakeholders, and experts who systematically analyze


technology (i.e., science and technology studies scholars). This includes systematic analysis of both positive and negative implications for marginalized communities.

• App developers must label LLM-generated text as such.

Both LLM and App Developers

• Rather than creating a few general purpose LLMs and assuming they are ready to be integrated into a variety of apps, LLMs should be designed and evaluated for specific purposes. Both app and LLM developers should work together or developers should take on both of these roles.

• Both LLM and app developers must support low income and marginalized communities’ capacity to drive development. This includes providing funding and technical support so that community organizations can develop their own apps and LLMs. In the process, developers must recognize that the trust of marginalized communities is fragile, and can only be achieved through authentic engagement and long-term relationships.

• LLM developers must be fully transparent about the limitations of their technology, including in their discussions with app developers. App developers, in turn, must not use LLMs to perform tasks they are not suited for. Specifically:

  • LLMs should not be treated as a source of intelligence since they were trained to model language, not understand the world. The fact that LLMs “know” some things about the world is coincidental.

  • Developers should build apps and deploy LLMs only in situations where up-to-date language patterns are not necessary. Since LLMs are conservative, they replicate the past.

  • An LLM cannot speak for everyone. LLMs are universalizing; they favor dominant language patterns and flatten nuance, but language is diverse even within a single language. This means that even an LLM that appears to be “neutral” will serve members of the dominant group as it alienates others.

• Both LLM and app developers should implement a complaint system for end users and other stakeholders to document their negative experiences with an LLM. Developers should be sympathetic and responsive to these concerns.


Recommendations for the Scientific Community
We urge all professions to develop rules and guidelines to accommodate the rise of
LLMs. Because we focused our attention on how LLMs might affect science (Section
7), we offer recommendations specific to this community. We hope this will guide
researchers, journal editors, scientific publishers, and universities, as they contend
with this emerging technology.

Development of LLMs by the scientific community
• If scientific publishers develop LLMs, they should:

  • Provide users with information about how output is generated (i.e., the composition of the corpora and the logic of the algorithm).

  • Ensure that the LLM is accessible to and accurate for non-English speakers.

• The National Science Foundation should support the development of an LLM that includes publicly available journal articles and all results generated from its funding. It should deliberately include texts across all fields. To ensure that it captures the nuances of a variety of fields, experts from multiple disciplines–from the natural sciences to the humanities–should test it before deployment.

• All authors should be permitted to opt out of their texts’ inclusion in LLM corpora.



LLM use for evaluation


• If scientific journals and academic publishers use LLMs to evaluate
the quality of manuscripts, they must be transparent about this use.
This includes clear explanations on the publisher’s website so that
prospective authors can be fully informed about LLM use before
submission.

• Scientific journals and academic publishers should not rely completely on LLMs for “peer review”. LLMs are likely to produce conservative evaluations–and therefore be more critical of novel findings and ideas–because they are based on historical texts.

Research using LLMs


• Scientific journals and academic publishers must develop rules for how
they–and peer reviewers–will evaluate research conducted using LLMs.

• All publications that rely on LLMs for text analysis should provide detail
about the corpora and algorithms on which the results are based.

Scientific communication using LLMs


• Scientific communicators should help publics understand how to use
LLMs to interpret science. This includes evaluating which LLMs are the
most appropriate for their needs, and how to understand the credibility
of LLM output.

• Scientific communicators and publics should test LLMs before deployment to ensure that outputs related to scientific topics are accurate, credible, and comprehensible.


Acknowledgements
The authors would like to thank Shelby Pitts, Daniel Rivkin, and Nick Pfost for their assistance in
researching, revising, and producing this report.

The Technology Assessment Project is supported in part through a generous grant from the Alfred P. Sloan Foundation (grant #G-2021-16769).


References
Abid, A., Farooqi, M., & Zou, J. (2021, July). Persistent anti-Muslim bias in large language models. Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society (pp. 298-306). https://doi.org/10.1145/3461702.3462624

Ahmed, N., & Wahed, M. (2020). The de-democratization of AI: Deep learning and the compute divide in artificial intelligence research. arXiv. https://arxiv.org/abs/2010.15581

AI Now Institute. (2021, October 5). Democratize AI? How the proposed National AI Research Resource falls short. AI Now. https://medium.com/@AINowInstitute/democratize-ai-how-the-proposed-national-ai-research-resource-falls-short-96ae5f67ccfa

AIRC. (n.d.). About AIRC. Retrieved March 13, 2022 from https://www.airc.aist.go.jp/en/intro/

Al Jazeera. (2020, September 12). At least 50 people feared dead in DR Congo mine collapse. Al Jazeera. https://www.aljazeera.com/news/2020/9/12/at-least-50-feared-dead-in-dr-congo-mine-collapse

Alderman, L. (2019, December 26). Self-checkout in France sets off battle over a day of rest. The New York Times. https://www.nytimes.com/2019/12/26/business/self-checkout-automation.html

Alford, A. (2020, November 3). Large-scale multilingual AI models from Google, Facebook, and Microsoft. InfoQ. https://www.infoq.com/news/2020/11/multilingual-ai-models/

Algolia. (n.d.). Retrieved March 13, 2022 from https://www.algolia.com/

Allen, B.L. (2003). Uneasy alchemy: Citizens and experts in Louisiana’s chemical corridor disputes. The MIT Press.

Allen, M. (2018, June 14). And the title of the largest data center in the world and largest data center in the US goes to…. DataCenters.com. https://www.datacenters.com/news/and-the-title-of-the-largest-data-center-in-the-world-and-largest-data-center-in

Ames, M. G. (2019). The charisma machine: The life, death, and legacy of One Laptop per Child. MIT Press.

Amnesty International. (2016, January 19). Democratic Republic of Congo: “This is what we die for”: Human rights abuses in the Democratic Republic of the Congo power the global trade in cobalt. Amnesty International. https://www.amnesty.org/en/documents/afr62/3183/2016/en/

Amnesty International. (2020, February 10). Nigeria: 2020 could be Shell’s year of reckoning. Amnesty International. https://www.amnesty.org/en/latest/news/2020/02/nigeria-2020-could-be-shell-year-of-reckoning/


Amnesty International. (2020, May 6). DRC: Alarming research shows long lasting harm from cobalt mine abuses. Amnesty International. https://www.amnesty.org/en/latest/news/2020/05/drc-alarming-research-harm-from-cobalt-mine-abuses/

Anderson, B. (1983). Imagined communities: Reflections on the origin and spread of nationalism. Verso.

AT Editor. (2019, October 17). Wellcome Sanger denies charge of misusing African DNA. Africa Times. https://africatimes.com/2019/10/17/wellcome-sanger-denies-charge-of-misusing-african-dna/

Auxier, B., Rainie, L., Anderson, M., Perrin, A., Kumar, M., & Turner, E. (2019, November 15). Americans and privacy: Concerned, confused and feeling lack of control over their personal information. Pew Research Center. https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/

Baird Equity Research. (2017). Equifax Inc. (EFX) announces significant data breach; -13.4% in after-hours. https://baird.bluematrix.com/docs/pdf/dbf801ef-f20e-4d6f-91c1-88e55503ecb0.pdf

Baker, S.H. (2019). Anti-resilience: A roadmap for transformational justice within the energy system. Harvard Civil Rights-Civil Liberties Law Review, 54, 1-48. https://ssrn.com/abstract=3362355

Baldwin, M. (2015). Making “Nature”: The history of a scientific journal. University of Chicago Press.

Baldwin, M. (2018). Scientific autonomy, public accountability, and the rise of “peer review” in the Cold War United States. Isis, 109(3), 538-558. https://doi.org/10.1086/700070

Barbaschow, A. (2020, November 8). Australia’s critical infrastructure definition to span communications, data storage, space. ZDNet. https://www.zdnet.com/article/critical-infrastructure-definition-to-span-communications-data-storage-and-space/

Baurick, T. (2020). Bayou Bridge Pipeline protesters’ lawsuit against company can proceed, judge rules. NOLA.com. https://www.nola.com/news/environment/article_05cacaa4-0358-11eb-b1a2-4303c8dedb22.html

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the dangers of stochastic parrots: Can language models be too big? Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10.1145/3442188.3445922

Benjamin, R. (2019). Race after technology: Abolitionist tools for the New Jim Code. Polity Press.


Benos, D. J., Bashari, E., Chaves, J. M., Gaggar, A., Kapoor, N., LaFrance, M., ... & Zotov, A. (2007). The ups and downs of peer review. Advances in Physiology Education, 31(2), 145-152. https://doi.org/10.1152/advan.00104.2006

Berridge, C., & Levy, K. (2019, July 24). Webcams in nursing home rooms may deter elder abuse – but are they ethical? The Conversation. https://theconversation.com/webcams-in-nursing-home-rooms-may-deter-elder-abuse-but-are-they-ethical-120208

Berry, I. (2021, September 20). Top 10 countries with the most data centers. Data Centre Magazine. https://datacentremagazine.com/top10/top-10-countries-most-data-centres

Bessell, T. L., Anderson, J. N., Silagy, C. A., Sansom, L. N., & Hiller, J. E. (2003). Surfing, self-medicating and safety: Buying non-prescription and complementary medicines via the internet. BMJ Quality & Safety, 12(2), 88-92. http://dx.doi.org/10.1136/qhc.12.2.88

Betcher, M., Hanna, A., Hansen, E., & Hirschmann, D. (2019, August 21). Pipeline impacts to water quality: Documented impacts and recommendations for improvements. Downstream Strategies and Hirschmann Water & Environment, LLC. https://www.tu.org/wp-content/uploads/2019/10/Pipeline-Water-Quality-Impacts-FINAL-8-21-2019.pdf

Better language models and their implications. (2019, February 14). OpenAI. https://openai.com/blog/better-language-models/

Bijker, W., Hughes, T.P., & Pinch, T. (1987). The social construction of technological systems: New directions in the sociology and history of technology. MIT Press.

Birhane, A. (2021). The impossibility of automating ambiguity. Artificial Life, 27(1), 44-61. https://doi.org/10.1162/artl_a_00336

Blijd, R. (2020, September 7). Will lawyers be replaced by GPT-3? Yes, and here’s when. Spark Max. https://www.legalcomplex.com/2020/09/07/will-lawyers-be-replaced-by-gpt-3-yes-and-heres-when/

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., ... & Liang, P. (2021). On the opportunities and risks of foundation models. arXiv. https://arxiv.org/pdf/2108.07258.pdf

Braun, L., Wolfgang, M., & Dickersin, K. (2013). Defining race/ethnicity and explaining difference in research studies on lung function. European Respiratory Journal, 41(6), 9.

Braun, L. (2014). Breathing race into the machine: The surprising career of the spirometer from plantation to genetics. University of Minnesota Press.

Braun, L. (2021). Race correction and spirometry: Why history matters. Chest, 159(4), 1670-1675. https://doi.org/10.1016/j.chest.2020.10.046

Broom, A. (2005). Medical specialists’ accounts of the impact of the Internet on the doctor/patient relationship. Health, 9(3), 319-338.


Brown, A., Parrish, W., & Speri, A. (2017, June 3). Standing Rock documents expose inner workings of ‘Surveillance-Industrial Complex.’ The Intercept. https://theintercept.com/2017/06/03/standing-rock-documents-expose-inner-workings-of-surveillance-industrial-complex/

Brown, M. (2018, October 29). One year later: The impact of Equifax’s data breach. Transforming Data With Intelligence. https://tdwi.org/articles/2018/10/29/biz-all-impact-of-equifax-data-breach.aspx

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901.

Browne, S. (2015). Dark matters. Duke University Press.

Bureau of Transportation Statistics. (2018). Travel patterns of American adults with disabilities. Retrieved March 13, 2022 from https://www.bts.gov/travel-patterns-with-disabilities

Business & Human Rights Resource Centre. (2021, February). Transition minerals tracker: Global analysis of human rights policies & practices. https://media.business-humanrights.org/media/documents/2021_Transition_Minerals_Tracker_Monday_w_numbers_updated.pdf

Buss, D., Rutherford, B., Stewart, J., Côté, G.E., Sebina-Zziwa, A., Kibombo, R., Hinton, J., & Lebert, J. (2019, November). Gender and artisanal and small-scale mining: Implications for formalization. The Extractive Industries and Society, 6(4), 1101-1112. https://www.sciencedirect.com/science/article/pii/S2214790X19301522

C., R. (2022). Almost 6 billion accounts affected in data breaches in 2021. Atlas VPN. https://atlasvpn.com/blog/almost-6-billion-accounts-affected-in-data-breaches-in-2021

Cagle, S. (2019, July 8). ‘Protesters as terrorists’: Growing number of states turn anti-pipeline activism into a crime. The Guardian. https://www.theguardian.com/environment/2019/jul/08/wave-of-new-laws-aim-to-stifle-anti-pipeline-protests-activists-say

Cakebread, C. (2017, November 15). You’re not alone, no one reads terms of service agreements. Business Insider. https://www.businessinsider.com/deloitte-study-91-percent-agree-terms-of-service-without-reading-2017-11?r=US&IR=T

Callard, F., & Perego, E. (2021). How and why patients made Long Covid. Social Science & Medicine, 268, 113426. https://doi.org/10.1016/j.socscimed.2020.113426

Candia, C., & Uzzi, B. (2021). Quantifying the selective forgetting and integration of ideas in science and technology. American Psychologist, 76(6), 1067. https://doi.org/10.1037/amp0000863


Caplar, N., Tacchella, S., & Birrer, S. (2017). Quantitative evaluation of gender bias in astronomical publications from citation counts. Nature Astronomy, 1:141. https://www.nature.com/articles/s41550-017-0141

Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., ... & Raffel, C. (2020). Extracting training data from large language models. arXiv. arXiv:2012.07805.

Chakradhar, S. (2019, April 15). It’s just in mice! This scientist is calling out hype in science reporting. STAT. https://www.statnews.com/2019/04/15/in-mice-twitter-account-hype-science-reporting/

Chen, C.J. (2009). Art history: a guide to basic research resources. Collection Building, 28(3), 122-125. https://doi.org/10.1108/01604950910971152

Chui, M., Manyika, J., and Miremadi, M. (2015, November). Four fundamentals of workplace automation. McKinsey Quarterly. https://roubler.com/sg/wp-content/uploads/sites/49/2016/11/Four-fundamentals-of-workplace-automation.pdf

Coffey, D. (2021). Māori are trying to save their language from Big Tech. Wired UK. https://www.wired.co.uk/article/maori-language-tech

Cohen, P. (2010, April 5). Indian tribes go in search of their lost languages. The New York Times. https://www.nytimes.com/2010/04/06/books/06language.html#:~:text=Of%20the%20more%20than%20300,reclamation%20efforts%20have%20shown%20success

Colchete, G., & Sen, B. (2020, October). Muzzling dissent: How corporate influence over politics has fueled anti-protest laws. Institute for Policy Studies. https://ips-dc.org/wp-content/uploads/2020/10/Muzzling-Dissent-Anti-Protest-Laws-Report.pdf

Cole, S. (2021, December 8). Workers are using ‘mouse movers’ so they can use the bathroom in peace. Vice. https://www.vice.com/en/article/88gqgp/mouse-mover-jiggler-app-keep-screen-on-active

Collins, H. (1992). Changing Order: Replication and Induction in Scientific Practice. University of Chicago Press.

Common Crawl. (n.d.). Want to use our data? https://commoncrawl.org/the-data/

Computer AI passes Turing test in ‘world first’. (2014, June 9). BBC. https://www.bbc.com/news/technology-27762088

Conley, J. M., Cook-Deegan, R., & Lázaro-Muñoz, G. (2014). Myriad after Myriad: the proprietary data dilemma. North Carolina Journal of Law & Technology, 15(4), 597.


Cooper, A. (2019). Hear me out. Missouri Medicine, 116(6), 469-471. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6913847/

Council of the EU. (2018). Copyright rules for the digital environment: Council agrees its position. https://www.consilium.europa.eu/en/press/press-releases/2018/05/25/copyright-rules-for-the-digital-environment-council-agrees-its-position/#

Cowley, S., & Silver-Greenberg, J. (2019, November 3). These machines can put you in jail. Don’t trust them. The New York Times. https://www.nytimes.com/2019/11/03/business/drunk-driving-breathalyzer.html

Cybersecurity and Infrastructure Security Agency. (n.d.). Infrastructure security. Cybersecurity and Infrastructure Security Agency. https://www.cisa.gov/infrastructure-security

Dale, R. (2017). The commercial NLP landscape in 2017. Natural Language Engineering, 23(4), 641-647. https://doi.org/10.1017/S1351324917000237

Dale, R. (2021). GPT-3: What’s it good for? Natural Language Engineering, 27(1), 113-118. doi:10.1017/S1351324920000601

Data Center Map. (2022). Data Center Map. https://www.datacentermap.com/

Davies, W. (2022, February 24). How many words does it take to make a mistake? London Review of Books, 44(4). https://www.lrb.co.uk/the-paper/v44/n04/william-davies/how-many-words-does-it-take-to-make-a-mistake

Daws, R. (2020, October 28). Medical chatbot using OpenAI’s GPT-3 told a fake patient to kill themselves. AI News. https://artificialintelligence-news.com/2020/10/28/medical-chatbot-openai-gpt3-patient-kill-themselves/

Day, T. (2017, August 1). Building data centers creates jobs. U.S. Chamber of Commerce. https://www.uschamber.com/technology/building-data-centers-creates-jobs

de Freytas-Tamura, K. (2021, August 19). Why some people in Chinatown oppose a museum dedicated to their culture. The New York Times. https://www.nytimes.com/2021/08/19/nyregion/chinatown-museum-protests.html

Del Rey, J. and Ghaffary, S. (2020, October 6). Leaked: Confidential Amazon memo reveals new software to track unions. Vox. https://www.vox.com/recode/2020/10/6/21502639/amazon-union-busting-tracking-memo-spoc

Denworth, L. (2014, April 25). Science gave my son the gift of sound. TIME. https://time.com/76154/deaf-culture-cochlear-implants/

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv. https://arxiv.org/abs/1810.04805

Devlin, J., Chang, M., Research Scientists, Google AI Language. (2018, November 2). Open sourcing BERT: state-of-the-art pre-training for natural language processing. Google AI Blog. https://ai.googleblog.com/2018/11/open-sourcing-bert-state-of-art-pre.html


Dickens, A. G. (1974). The German Nation and Martin Luther. Edward Arnold.

Dickersin, K., Braun, L., Mead, M., Millikan, R., Wu, A. M., Pietenpol, J., Troyan, S., Anderson, B., and Visco, F. (2001). Development and implementation of a science training course for breast cancer activists: Project LEAD (leadership, education and advocacy development). Health Expectations, 4(4), 213-220. https://doi.org/10.1046/j.1369-6513.2001.00153.x

Dillet, R. (2021, March 11). Hugging Face raises $40 million for its natural language processing library. TechCrunch. https://techcrunch.com/2021/03/11/hugging-face-raises-40-million-for-its-natural-language-processing-library/

Dillon, N., & Otis, G. A. (2019, October 2). Google using dubious tactics to target people with “darker skin” in facial recognition project: sources. Nydailynews.com. https://www.nydailynews.com/news/national/ny-google-darker-skin-tones-facial-recognition-pixel-20191002-5vxpgowknffnvbmy5eg7epsf34-story.html

Downing, T.E. (2002, April). Avoiding new poverty: Mining-induced displacement and resettlement. Mining, Minerals, and Sustainable Development. https://pubs.iied.org/sites/default/files/pdfs/migrate/G00549.pdf

Du Boff, R.B. (1984). The telegraph in nineteenth-century America: Technology and monopoly. Comparative Studies in Society and History, 26(4), 571-586.

Duster, T. (1990). Backdoor to Eugenics. Routledge.

Dworkin, J.D., Linn, K.A., Teich, E.G., Shinohara, R.T., & Bassett, D.S. (2020). The extent and drivers of gender imbalance in neuroscience reference lists. Nature Neuroscience, 23(8), 918-926. https://doi.org/10.1038/s41593-020-0658-y

Edwards Jr, M. U. (2004). Printing, Propaganda, and Martin Luther. Fortress Press.

Eggers, W.D., Malik, N., & Gracie, M. (2019). Using AI to unleash the power of unstructured government data. Deloitte Insights. https://www2.deloitte.com/us/en/insights/focus/cognitive-technologies/natural-language-processing-examples-in-government-data.html

EleutherAI. (n.d.). Frequently asked questions. Retrieved March 3, 2022 from https://www.eleuther.ai/faq/

Else, H. (2021, October 26). Giant, free index to world’s research papers released online. Nature. https://www.nature.com/articles/d41586-021-02895-8?utm_source=twt_nat&utm_medium=social&utm_campaign=nature

Englehardt, S., Han, J., & Narayanan, A. (2018). I never signed up for this! Privacy implications of email tracking. Proceedings on Privacy Enhancing Technologies, 2018(1), 109–126. https://doi.org/10.1515/popets-2018-0006


Ensmenger, N. (2018, October). The environmental history of computing. Technology and Culture, 59(4). http://www.ctcs505.com/wp-content/uploads/2016/01/Ensmenger-2018-The-Environmental-History-of-Computing.pdf

Environmental Justice Atlas. (2019, April 19). Tucuruí hydroelectric dam, Pará, Brazil. Institute of Environmental Science and Technology at the Universitat Autònoma de Barcelona. Retrieved March 13, 2022 from https://ejatlas.org/conflict/tucurui-hydroelectric-dam-and-the-assassination-of-dilma-ferreira-silva-para-brazil

Epstein, J., Donnan, S., & Bass, D. (2022, January 28). Treasury weighing alternatives to ID.me over privacy concerns. Bloomberg. https://www.bloomberg.com/news/articles/2022-01-28/treasury-weighing-id-me-alternatives-over-privacy-concerns

Epstein, J., & Klinkenberg, W. D. (2001). From Eliza to Internet: A brief history of computerized assessment. Computers in Human Behavior, 17(3), 295-314. https://doi.org/10.1016/S0747-5632(01)00004-8

Epstein, S. (1996). Impure Science: AIDS, Activism, and the Politics of Knowledge. Univ of California Press.

Eschrich, J., & Miller, C. (2021, March 12). Cities of Light: A Collection of Solar Futures. Center for Science and the Imagination, Arizona State University.

Eubanks, V. (2018). Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor. St. Martin’s Press.

Euromines. (2020, May). The electronics value chain and its raw materials. Euromines. http://euromines.org/files/key_value_chain_electronics_euromines_final.pdf

European Commission. (2021). A European approach to artificial intelligence. Retrieved March 13, 2022 from https://digital-strategy.ec.europa.eu/en/policies/european-approach-artificial-intelligence#:~:text=The%20European%20approach%20to%20artificial,in%20AI%20and%20trustworthy%20AI.

Evans, W. (2016, December 12). Uber said it protects you from spying. Security sources say otherwise. Reveal. https://revealnews.org/article/uber-said-it-protects-you-from-spying-security-sources-say-otherwise/

Fagone, J. (2021, July 23). The Jessica Simulation: Love and loss in the age of A.I. The San Francisco Chronicle. https://www.sfchronicle.com/projects/2021/jessica-simulation-artificial-intelligence/

Fairchild, D., & Weinrub, A. (2017). Energy Democracy: Advancing Equity in Clean Energy Solutions. Island Press.


First Peoples Worldwide. (2019, March 14). New report finds increase of violence coincides with oil boom. University of Colorado, Boulder. https://www.colorado.edu/program/fpw/2019/03/14/new-report-finds-increase-violence-coincides-oil-boom

Fisher, E., Mahajan, R. L., & Mitcham, C. (2006). Midstream modulation of technology: Governance from within. Bulletin of Science, Technology & Society, 26(2), 485-496. https://doi.org/10.1177/0270467606295402

Fitzgerald, C., & Hurst, S. (2017). Implicit bias in health care professionals: a systematic review. BMC Medical Ethics, 18(19). https://doi.org/10.1186/s12910-017-0179-8

Fort, K., Adda, G., & Cohen, K. B. (2011). Amazon Mechanical Turk: Gold mine or coal mine?. Computational Linguistics, 413-420. https://hal.archives-ouvertes.fr/hal-00569450

Foster, A. L. (2002, May 17). Plagiarism-detection tool creates legal quandary. The Chronicle of Higher Education. https://www.chronicle.com/article/plagiarism-detection-tool-creates-legal-quandary/

Foster, L. A. (2018). Reinventing Hoodia: Peoples, Plants, and Patents in South Africa. Wits University Press.

Foster, M. W., Eisenbraun, A. J., & Carter, T. H. (1997). Communal discourse as a supplement to informed consent for genetic research. Nature Genetics, 17(3), 277–279. https://doi.org/10.1038/ng1197-277

France24. (2019, August 27). Protests erupt after French supermarket uses automation to evade labour laws. https://www.france24.com/en/20190827-protests-erupt-french-supermarket-automation-labour-laws-sunday-laws

Frankel, T.C., & Whoriskey, P. (2016, December 19). Tossed aside in the ‘white gold’ rush. Washington Post. https://www.washingtonpost.com/graphics/business/batteries/tossed-aside-in-the-lithium-rush/

Frynas, J.G. (2001). Corporate and state responses to anti-oil protests in the Niger Delta. African Affairs, 100(398), 27–54. http://www.jstor.org/stable/3518371

Funk, C., Kennedy, B., Johnson, C. (2020, May 21). Trust in medical scientists has grown in U.S., but mainly among Democrats. Pew Research Center. https://www.pewresearch.org/science/2020/05/21/trust-in-medical-scientists-has-grown-in-u-s-but-mainly-among-democrats/

Funk, C., Kennedy, B., Tyson, A. (2020, August 28). Black Americans have less confidence in scientists to act in the public interest. Pew Research Center. https://www.pewresearch.org/fact-tank/2020/08/28/black-americans-have-less-confidence-in-scientists-to-act-in-the-public-interest/

Galligan, C., Rosenfeld, H., Kleinman, M., & Parthasarathy, S. (2020). Cameras in the Classroom: Facial Recognition Technology in Schools. Technology Assessment Project, Science, Technology and Public Policy Program, University of Michigan. http://stpp.fordschool.umich.edu/sites/stpp.fordschool.umich.edu/files/file-assets/cameras_in_the_classroom_full_report.pdf


Gao, L., Biderman, S., Black, S., Golding, L., Hoppe, T., Foster, C., Phang, J., He, H., Thite, A., Nabeshima, N., Presser, S., and Leahy, C. (2020, December 31). The Pile: An 800GB dataset of diverse text for language modeling. arXiv. https://arxiv.org/abs/2101.00027

Gavin, B. (2018, May 25). How big are gigabytes, terabytes, and petabytes? How-To Geek. Retrieved March 13, 2022 from https://www.howtogeek.com/353116/how-big-are-gigabytes-terabytes-and-petabytes/

Gershgorn, D. (2021, July 20). DuckDuckGo launches new Email Protection service to remove trackers. The Verge. https://www.theverge.com/2021/7/20/22576352/duckduckgo-email-protection-privacy-trackers-apple-alternative

Gibson, E. (2020, August 26). New NSF AI research institutes to push forward the frontiers of artificial intelligence. NSF. https://beta.nsf.gov/science-matters/new-nsf-ai-research-institutes-push-forward-frontiers-artificial-intelligence

Gilliard, C., & Golumbia, D. (2021, July 6). Luxury surveillance. Real Life. https://reallifemag.com/luxury-surveillance/

Glanz, J., Creswell, J., Kaplan, T., Wichter, Z. (2019, February 3). After a Lion Air 737 Max crashed in October, questions about the plane arose. The New York Times. https://www.nytimes.com/2019/02/03/world/asia/lion-air-plane-crash-pilots.html?partner=IFTTT

Glanz, J. (2012, September 23). Data barns in a farm town, gobbling power and flexing muscle. The New York Times. https://www.nytimes.com/2012/09/24/technology/data-centers-in-rural-washington-state-gobble-power.html

Gokaslan, A., and Cohen, V. (2019). OpenWebText corpus. http://Skylion007.github.io/OpenWebTextCorpus

Golden, H. (2021, October 15). Indigenous tribes tried to block a car battery mine. But the courts stood in the way. The Guardian. https://www.theguardian.com/environment/2021/oct/15/indigenous-tribes-block-car-battery-mine-courts

Goldenberg, M. J. (2021). Vaccine Hesitancy: Public Trust, Expertise, and the War on Science. University of Pittsburgh Press.

Gordon, M. T. (2000). Public trust in government: The US media as an agent of accountability?. International Review of Administrative Sciences, 66(2), 297-310. https://doi.org/10.1177/0020852300662006

Gray, M.L., & Suri, S. (2019). Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Harper Business.


Greene, T. (2021, September 20). DeepMind tells Google it has no idea how to make AI less toxic. The Next Web. https://thenextweb.com/news/deepmind-tells-google-no-idea-make-ai-less-toxic?utm_campaign=Neural%20Newsletter&utm_medium=email&_hsmi=162703071&_hsenc=p2ANqtz-9ukUzOvODtsxryos3QqQUtYtzktnr7KI4FUYHBBkqAmxLLBL8bZGYEK9Y5nB3n9b05ishvnRk06Phu1JpWwlzGSPParw&utm_content=162703071&utm_source=hs_email

Grimmelman, J. (2016). Copyright for literate robots. Iowa Law Review, 101(2). https://ilr.law.uiowa.edu/print/volume-101-issue-2/copyright-for-literate-robots/

Grother, P., Ngan, M., & Hanaoka, K. (2019). Face recognition vendor test part 3: demographic effects. National Institute of Standards and Technology, 8280. https://doi.org/10.6028/nist.ir.8280

Guadamuz, A. (2016). The monkey selfie: copyright lessons for originality in photographs and internet jurisdiction. Internet Policy Review, 5(1). https://doi.org/10.14763/2016.1.398

Guendelsberger, E. (2019, July 18). I worked at an Amazon fulfillment center; they treat workers like robots. TIME. https://time.com/5629233/amazon-warehouse-employee-treatment-robots/

Gusmano, M.K., Kaebnick, G.E., Maschke, K.J., Neuhaus, C.P., & Wills, B.C. (2021). Public deliberation about gene editing in the wild. Hastings Center Report, 51(S2), S2-S10. https://doi.org/10.1002/hast.1314

Guston, D. H., & Sarewitz, D. (2002). Real-time technology assessment. Technology in Society, 24(1-2), 93-109. https://doi.org/10.1016/S0160-791X(01)00047-1

Hamlett, P., Cobb, M. D., & Guston, D. H. (2013). National citizens’ technology forum: Nanotechnologies and human enhancement. Nanotechnology, the Brain, and the Future, 265-283. https://doi.org/10.1007/978-94-007-1787-9_16

Hao, K. (2020, December 16). ‘I started crying’: Inside Timnit Gebru’s last days at Google--and what happens next. MIT Technology Review. https://www.technologyreview.com/2020/12/16/1014634/google-ai-ethics-lead-timnit-gebru-tells-story/

Hao, K. (2021, May 20). The race to understand the exhilarating, dangerous world of language AI. MIT Technology Review. https://www.technologyreview.com/2021/05/20/1025135/ai-large-language-models-bigscience-project/

Haranas, M. (2021, March 21). Microsoft to build new $200M data center as Azure sales soar. CRN. https://www.crn.com/news/data-center/microsoft-to-build-new-200m-data-center-as-azure-sales-soar


Haranas, M. (2022, January 20). Data center market 2022 forecast: Private equity takes over. CRN. https://www.crn.com/news/data-center/data-center-market-2022-forecast-private-equity-takes-over

Harmon, A. (2010, April 21). Indian tribe wins fight to limit research of its DNA. The New York Times. https://www.nytimes.com/2010/04/22/us/22dna.html?scp=1&sq=indian%20tribe%20wins%20fight%20to%20limit%20research%20on%20its%20dna&st=cse

Harnish, K. (2019, September 4). Oregon labor union wants voters to limit grocers to two self-checkout stations per store. Willamette Week. https://www.wweek.com/news/state/2019/09/04/oregon-labor-union-wants-voters-to-limit-grocers-to-two-self-checkout-stations-per-store/

Harris, R. (2020, December 16). Oxygen-detecting devices give misleading readings in people with dark skin. NPR. https://www.npr.org/2020/12/16/947261192/oxygen-detecting-devices-give-misleading-readings-in-people-with-dark-skin

Harris, S. (2018, December 8). ‘They kill jobs’: Meet Canadians who refuse to use self-checkout. CBC. https://www.cbc.ca/news/business/self-checkout-cashier-jobs-retail-automation-1.4937040

Harwell, D. (2022, February 16). Facial recognition firm Clearview AI tells investors it’s seeking massive expansion beyond law enforcement. The Washington Post. https://www.washingtonpost.com/technology/2022/02/16/clearview-expansion-facial-recognition/

Heaven, W.D. (2020, October 8). A GPT-3 bot posted comments on Reddit for a week and no one noticed. MIT Technology Review. https://www.technologyreview.com/2020/10/08/1009845/a-gpt-3-bot-posted-comments-on-reddit-for-a-week-and-no-one-noticed/

Heaven, W.D. (2021, May 14). Language models like GPT-3 could herald a new type of search engine. MIT Technology Review. https://www.technologyreview.com/2021/05/14/1024918/language-models-gpt3-search-engine-google/

Herring, S.D. (2002). Use of electronic resources in scholarly electronic journals: a citation analysis. College & Research Libraries, 63(4), 334-340. https://doi.org/10.5860/crl.63.4.334

Hirst, D. (2020, September 9). How data centers became as important as water and energy. Data Centre Dynamics Ltd. https://www.datacenterdynamics.com/en/opinions/how-data-centres-became-important-water-and-energy/

Hoewe, J., Brownell, K. C., & Wiemer, E. C. (2020, October). The role and impact of Fox News. The Forum, 18(3), 367-388. De Gruyter. https://doi.org/10.1515/for-2020-2014

Hogan, A.J. (2016). Life Histories of Genetic Diseases. Johns Hopkins University Press.


Hoppe, T. A., Litovitz, A., Willis, K. A., Meseroll, R. A., Perkins, M. J., Hutchins, B. I., Davis, A.F., Lauer, M.S., Valentine, H.A., and Santangelo, G. M. (2019). Topic choice contributes to the lower rate of NIH awards to African-American/black scientists. Science Advances, 5(10). https://doi.org/10.1126/sciadv.aaw7238

Howell, J.D. (1995). Technology in the Hospital: Transforming Patient Care in the Early Twentieth Century. Johns Hopkins University Press.

Hristov, K. (2017). Artificial intelligence and the copyright dilemma. IDEA - The Journal of the Franklin Pierce Center for Intellectual Property, 57(3), 431-454. https://ipmall.law.unh.edu/sites/default/files/hosted_resources/IDEA/hristov_formatted.pdf

Huang, S. (2018). The tension between big data and theory in the “omics” era of biomedical research. Perspectives in Biology and Medicine, 61(4), 472-488.

Hughes, T.P. (1983). Networks of Power: Electrification in Western Society, 1880-1930. Johns Hopkins University Press.

Hutchins, J. (2003). ALPAC: the (in)famous report. Readings in Machine Translation, 14, 131-135.

Interpol. (2020). Our 19 databases. Retrieved March 13, 2022 from https://www.interpol.int/en/How-we-work/Databases/Our-19-databases

Isberto, M. (2021, June 9). Are there benefits of a rural data center?. Colocation America. https://www.colocationamerica.com/blog/rural-data-center-benefits

Ishkhanov, B. S. (2012). The atomic nucleus. Moscow University Physics Bulletin, 67(1), 1-24. https://doi.org/10.3103/S0027134912010092

Jemisin, N. K. (2011). The Trojan Girl. Weird Tales #357. https://nkjemisin.com/2012/08/the-trojan-girl/

Jemisin, N. K. (2012). Valedictorian. After: Nineteen Stories of Apocalypse and Dystopia (T. Windling & E. Datlow, Eds.). Little, Brown.

Johnson, P. (2017). With the public clouds of Amazon, Microsoft, and Google, big data is the proverbial big deal. Forbes. https://www.forbes.com/sites/johnsonpierr/2017/06/15/with-the-public-clouds-of-amazon-microsoft-and-google-big-data-is-the-proverbial-big-deal/?sh=1a5dc7652ac3

Kelly, H. (2021, November 19). For seniors using tech to age in place, surveillance can be the price of independence. The Washington Post. https://www.washingtonpost.com/technology/2021/11/19/seniors-smart-home-privacy/

Kennedy, B., Tyson, A., and Funk, C. (2022, February 15). Americans’ trust in scientists, other groups declines. Pew Research Center. https://www.pewresearch.org/science/2022/02/15/americans-trust-in-scientists-other-groups-declines/


Kilgo, D. K., Wilner, T., Masullo, G. M., & Bennett, L. K. (2020). News distrust among Black Americans is a fixable problem. Center for Media Engagement. https://mediaengagement.org/research/news-distrust-among-black-americans

Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 2053951714528481. https://doi.org/10.1177%2F2053951714528481

Kline, R. & Pinch, T. (1996). Users as agents of technological change: The social construction of the automobile in the rural United States. Technology and Culture, 37(4), 763-795.

Knight, W. (2021, August 24). AI can write disinformation now—and dupe human readers. WIRED. https://www.wired.com/story/ai-write-disinformation-dupe-human-readers/

Knowles, S. G. (2019). Does the World’s Fair still matter?. In Molella, A. P., & Knowles, S. G. (Eds.), World’s Fairs in the Cold War: Science, Technology, and the Culture of Progress (pp. 194-212). University of Pittsburgh Press.

Ku, E., McCulloch, C.E., Adey, D.B., Li, L. & Johansen, K.L. (2021). Racial disparities in eligibility for preemptive waitlisting for kidney transplantation and modification of eGFR thresholds to equalize waitlist time. Journal of the American Society of Nephrology, 32, 677-685. https://doi.org/10.1681/ASN.2020081144

Kuhn, T. (1962). The Structure of Scientific Revolutions. University of Chicago Press.

Kulkarni, P., & K, C. N. (2021). Personally Identifiable Information (PII) detection in the unstructured large text corpus using Natural Language Processing and unsupervised learning technique. International Journal of Advanced Computer Science and Applications, 12(9). https://doi.org/10.14569/ijacsa.2021.0120957

Latour, B. (1987). Science in Action: How to Follow Scientists and Engineers through Society. Harvard University Press.

Lawson, M.F. (2021, September 1). The DRC mining industry: Child labor and formalization of small-scale mining. Wilson Center. https://www.wilsoncenter.org/blog-post/drc-mining-industry-child-labor-and-formalization-small-scale-mining

Leahy, C., Hallahan, E., Gao, L., Biderman, S. (2021, July 7). What a long, strange trip it’s been: EleutherAI one year retrospective. EleutherAI. https://blog.eleuther.ai/year-one/

Levinger, M. (2020). Triad in the therapy room - The interpreter, the therapist, and the deaf person. Journal of Interpretation, 28(1). https://digitalcommons.unf.edu/joi/vol28/iss1/5

Levy, K.E.C. (2016). Digital surveillance in the hypermasculine workplace. Feminist Media Studies, 16(2), 361-365. https://doi.org/10.1080/14680777.2016.1138607

Li, C. (2020, June 3). OpenAI’s GPT-3 language model: A technical overview. Lambda Labs. https://lambdalabs.com/blog/demystifying-gpt-3/


Liddy, E.D. (2001). Natural Language Processing. Encyclopedia of Library and Information Science, 2nd Ed. NY: Marcel Dekker, Inc. https://surface.syr.edu/istpub/63/

Lincoln, A.E., Pincus, S., Koster, J.B., Leboy, P.S. (2012). The Matilda Effect in science: Awards and prizes in the US, 1990s and 2000s. Social Studies of Science, 42(2), 307-320. https://doi.org/10.1177%2F0306312711435830

Lombrana, L.M. (2019, July 10). Walmart workers rebel against retailer’s robot push in Chile. Bloomberg. https://www.bloomberg.com/news/articles/2019-07-10/walmart-workers-rebel-against-retailer-robot-push-in-chile

Lopez, R. (2012). Urban renewal and highway construction. In Lopez, R., Building American Public Health: Urban Planning, Architecture, and the Quest for Better Health in the United States (pp. 199-138). Palgrave Macmillan.

Lothian-McLean, M. (2020, April 27). “Black woman in US dies after being turned away from hospital she worked at for 31 years.” indy100. https://www.indy100.com/article/coronavirus-black-health-care-worker-dies-test-detroit-deborah-gatewood-9485341. Downloaded May 20, 2020.

Luccioni, A., and Viviano, J. (2021). What’s in the box?: An analysis of undesirable content in the Common Crawl corpus. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2. doi:10.18653/v1/2021.acl-short.24

Luitse, D., and Denkena, W. (2021). The great transformer: Examining the role of large language models in the political economy of AI. Big Data & Society, 8(2). https://doi.org/10.1177/20539517211047734

Luong, N., & Arnold, Z. (2021). China’s Artificial Intelligence industry alliance. Center for Security and Emerging Technology. https://doi.org/10.51593/20200094

Mall, A. (2021, May 24). Gas pipelines: Harming clean water, people, and the planet. Natural Resources Defense Council, Inc. https://www.nrdc.org/experts/amy-mall/gas-pipelines-harming-clean-water-people-and-planet

Manjoo, F. (2020, July 29). How do you know a human wrote this? The New York Times. https://www.nytimes.com/2020/07/29/opinion/gpt-3-ai-automation.html

Mariana, S. (2020, May 27). Coronavirus: The human cost of virus misinformation. BBC. https://www.bbc.com/news/stories-52731624

Masood, E. (1998). Monsanto set to back down over ‘terminator’ gene?. Nature, 396(6711), 503-503. https://doi.org/10.1038/24949

Mateescu, A., and Elish, M.C. (2019, January 30). AI in context: The labor of integrating new technologies. Data & Society. https://datasociety.net/library/ai-in-context/


Matthews, K. R., Yang, E., Lewis, S. W., Vaidyanathan, B. R., & Gorman, M. (2020). International scientific collaborative activities and barriers to them in eight societies. Accountability in Research, 27(8), 477-495. https://doi.org/10.1080/08989621.2020.1774373

Maxmen, A. (2021). Why some researchers oppose unrestricted sharing of coronavirus genome data. Nature, 593(7858), 176-177.

McMullan, M. (2006). Patients using the Internet to obtain health information: how this affects the patient–health professional relationship. Patient Education and Counseling, 63(1-2), 24-28. https://doi.org/10.1016/j.pec.2005.10.006

McShane, C. (1999). The origins and globalization of traffic control signals. Journal of Urban History, 25(3), 379-404. doi:10.1177/009614429902500304

Merry-Noel, C. (2015). Sighted/Human Guide: One instructor’s perspective. National Federation of the Blind. Retrieved March 13, 2022 from https://nfb.org//sites/default/files/images/nfb/publications/fr/fr34/1/fr340110.htm

Metz, C., & Wakabayashi, D. (2020, December 3). Google researcher says she was fired over paper highlighting bias in A.I. The New York Times. https://www.nytimes.com/2020/12/03/technology/google-researcher-timnit-gebru.html

Michelson, E. (2016). Assessing the Societal Implications of Emerging Technologies: Anticipatory Governance in Practice. Routledge.

Miller, J. (2018, February 21). Roads to nowhere: How infrastructure built on American inequality. The Guardian. www.theguardian.com/cities/2018/feb/21/roads-nowhere-infrastructure-american-inequality

MIT CSAIL. (n.d.). Strategic partners. Retrieved March 3, 2022 from https://www.csail.mit.edu/sponsors/strategic-partners

Mock, B. (2017, February 16). The meaning of blight. Bloomberg. https://www.bloomberg.com/news/articles/2017-02-16/why-we-talk-about-urban-blight

Molella, A. P., & Knowles, S. G. (Eds.). (2019). World’s Fairs in the Cold War: Science, Technology, and the Culture of Progress. University of Pittsburgh Press.

Moltzau, A. (2020, August 2). A short history of natural-language understanding. Towards Data Science. https://towardsdatascience.com/a-short-history-of-natural-language-understanding-f1b3c382f285

Monaghan, J., & Walby, K. (2017). Surveillance of environmental movements in Canada: critical infrastructure protection and the petro-security apparatus. Contemporary Justice Review, 20(1), 51-70. https://doi.org/10.1080/10282580.2016.1262770


Moore, W. C. (1959). The questioned typewritten document. Minnesota Law Review, 2585. https://core.ac.uk/download/pdf/217208687.pdf

Morales, G. D. F., Monti, C., & Starnini, M. (2021). No echo in the chambers of political interactions on Reddit. Scientific Reports, 11(1), 1-12. https://doi.org/10.1038/s41598-021-81531-x

Moran-Thomas, A. (2020, August 5). How a popular medical device encodes racial bias. Boston Review. https://bostonreview.net/articles/amy-moran-thomas-pulse-oximeter/

More Perfect Union. (2021, November 9). Google threatens water supply of drought-stricken town. https://perfectunion.us/google-data-center-water-supply-oregon-drought/

Morgan, T.P. (2020, February 15). The datacenter has an appetite for GPU compute. The Next Platform. https://www.nextplatform.com/2020/02/15/the-datacenter-has-an-appetite-for-gpu-compute/

Morris, J. S. (2007). Slanted objectivity? Perceived media bias, cable news exposure, and political attitudes. Social Science Quarterly, 88(3), 707-728. https://doi.org/10.1111/j.1540-6237.2007.00479.x

Moss, S. (2021, October 21). Data center water usage remains hidden. Data Center Dynamics. https://www.datacenterdynamics.com/en/analysis/data-center-water-usage-remains-hidden/#:~:text=Direct%20water%20consumption%20of%20US,coming%20straight%20from%20the%20utility

Muller, O. (2021, June 28). Here’s why Black TikTok creators are boycotting Megan Thee Stallion’s new song. TODAY. https://www.today.com/tmrw/here-s-why-black-tiktok-creators-are-boycotting-megan-thee-t223706

Murphy, E. E., & Tong, J. (2019). The racial composition of forensic DNA databases. California Law Review, 108(6). https://doi.org/10.15779/Z381G0HV8M

Murray, E., Lo, B., Pollack, L., Donelan, K., Catania, J., White, M., Zapert, K., & Turner, R. (2003). The impact of health information on the internet on the physician-patient relationship: patient perceptions. Archives of Internal Medicine, 163(14), 1727-1734. https://doi.org/10.1001/archinte.163.14.1727

Nader, R. (1965). Unsafe at Any Speed. Grossman Publishers.

National Conference of State Legislatures. (2014, August 5). Forensic science database. https://www.ncsl.org/research/civil-and-criminal-justice/dna-database-search-by-policy.aspx

National Highway Traffic Safety Administration. (2021). 2020 fatality data show increased traffic fatalities during pandemic. https://www.nhtsa.gov/press-releases/2020-fatality-data-show-increased-traffic-fatalities-during-pandemic


Nelkin, D. E. (1992). Controversy: Politics of Technical Decisions. Sage Publications, Inc.

Ng, A. (2018, September 7). How the Equifax hack happened, and what still needs to be done. CNET. https://www.cnet.com/tech/services-and-software/equifaxs-hack-one-year-later-a-look-back-at-how-it-happened-and-whats-changed/

Nicholas, D., Boukacem-Zeghmouri, C., Xu, J., Herman, E., Clark, D., Abrizah, A., ... & Świgoń, M. (2019). Sci-Hub: The new and ultimate disruptor? View from the front. Learned Publishing, 32(2), 147-153. https://doi.org/10.1002/leap.1206

North American Regional Committee of the Human Genome Diversity Project. (1997). Proposed model ethical protocol for collecting DNA samples. Houston Law Rev., 33, 1431–1473. http://www.stanford.edu/group/morrinst/hgdp/protocol.html

Oberhaus, D. (2019, December 10). Amazon, Google, Microsoft: Here’s who has the greenest cloud. Wired. https://www.wired.com/story/amazon-google-microsoft-green-clouds-and-hyperscale-data-centers/

Oliveri, S., Ferrari, F., Manfrinati, A., & Pravettoni, G. (2018). A systematic review of the psychological implications of genetic testing: A comparative analysis among cardiovascular, neurodegenerative and cancer diseases. Frontiers in Genetics, 9(624). https://doi.org/10.3389/fgene.2018.00624

OpenAI. (2021, December 14). Customizing GPT-3 for your application. https://openai.com/blog/customized-gpt-3/

OpenAI. (2021, August 10). OpenAI Codex. https://openai.com/blog/openai-codex/

Oppy, G., & Dowe, D. (2021). The Turing test. Stanford Encyclopedia of Philosophy. http://seop.illc.uva.nl/entries/turing-test/

Ottinger, G. (2010). Buckets of resistance: Standards and the effectiveness of citizen science. Science, Technology, & Human Values, 35(2), 244-270. https://doi.org/10.1177%2F0162243909337121

Palmer, A. (2020, October 24). How Amazon keeps a close eye on employee activism to head off unions. CNBC. https://www.cnbc.com/2020/10/24/how-amazon-prevents-unions-by-surveilling-employee-activism.html

Parthasarathy, S. (2007). Building Genetic Medicine: Breast Cancer, Technology, and the Comparative Politics of Health Care. The MIT Press.

Parthasarathy, S. (2010). Breaking the expertise barrier: understanding activist strategies in science and technology policy domains. Science and Public Policy, 37(5), 355-367. https://doi.org/10.3152/030234210X501180


Parthasarathy, S. (2017). Patent Politics. University of Chicago Press.

Patterson, M. (n.d.). GPT-3 and AI in customer support. Help Scout. https://www.helpscout.com/blog/ai-in-customer-support/

Perrigo, B. (2022, February 14). Inside Facebook’s African sweatshops. TIME. https://time.com/6147458/facebook-africa-content-moderation-employee-treatment/

Peterson, T. (2021, October 22). Google looks to be a go in The Dalles. Columbia Community Connection. https://www.columbiacommunityconnection.com/the-dalles/google/data/centers/tax-abatements/city-council

Pilon, M. (2019, April 29). Stop & Shop strike reveals concerns about job-killing technology. Hartford Business Journal. https://www.hartfordbusiness.com/article/stop-shop-strike-reveals-concerns-about-job-killing-technology

Pontis, S., Blandford, A., Greifeneder, E., Attalla, H., & Neal, D. (2017). Keeping up to date: An academic researcher’s information journey. Journal of the Association for Information Science and Technology, 68(1), 22-35. https://doi.org/10.1002/asi.23623

Pritchett, W.E. (2003). The “public menace” of blight: urban renewal and the private uses of eminent domain. Penn Law: Legal Scholarship Repository. https://scholarship.law.upenn.edu/cgi/viewcontent.cgi?article=2199&context=faculty_scholarship#:~:text=Blight%2C%20renewal%20proponents%20argued%2C%20was,the%20future%20of%20the%20city.

Privacy considerations in large language models. (n.d.). Google AI Blog. Retrieved January 31, 2022, from https://ai.googleblog.com/2020/12/privacy-considerations-in-large.html

Rabin, R.C. (2020, December 22). Pulse oximeter devices have higher error rate in Black patients. The New York Times. https://www.nytimes.com/2020/12/22/health/oximeters-covid-black-patients.html?searchResultPosition=1

Rabinow, P. (2011). Making PCR: A Story of Biotechnology. University of Chicago Press.

Race Forward: The Center for Racial Justice Innovation. (2014, January 22). Moving the race conversation forward. https://www.raceforward.org/research/reports/moving-race-conversation-forward

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9. http://www.persagen.com/files/misc/radford2019language.pdf

Rahman, K. (2020, April 24). Michigan man died after being repeatedly denied test just hours after his father died of Coronavirus, family say. Newsweek. https://www.newsweek.com/michigan-man-dies-coronavirus-repeatedly-turned-away-1499818. Downloaded May 20, 2020.


Rainie, L., & Perrin, A. (2019, July 22). Key findings about Americans’ declining trust in government and each other. Pew Research Center. https://www.pewresearch.org/fact-tank/2019/07/22/key-findings-about-americans-declining-trust-in-government-and-each-other/

Ramsey, C.L. (2000). Ethics and culture in the Deaf community response to cochlear implants. Seminars in Hearing, 21, 0075-0086.

Rapp, R. (1999). Testing Women, Testing the Fetus. Routledge.

Rayome, A.D. (2016, September 19). Why data centers fail to bring new jobs to small towns. Tech Republic. https://www.techrepublic.com/article/why-data-centers-fail-to-bring-new-jobs-to-small-towns/

Reardon, J. (2005). Race to the Finish: Identity and Governance in an Age of Genomics. Princeton University Press.

Reinhardt, L.R. (2015). Deaf-hearing interpreter teams: Navigating trust in shared space. (Publication No. 21) [Master’s thesis, Western Oregon University]. Digital Commons. https://digitalcommons.wou.edu/theses/21/

Resnick, B., & Belluz, J. (2019, July 10). The war to free science: how librarians, pirates, and funders are liberating the world’s academic research from paywalls. Vox. https://www.vox.com/the-highlight/2019/6/3/18271538/open-access-elsevier-california-sci-hub-academic-paywalls

Roach, J. (2020, September 14). Microsoft finds underwater datacenters are reliable, practical, and use energy sustainably. Microsoft News. https://news.microsoft.com/innovation-stories/project-natick-underwater-datacenter/

Roberts, S.T. (2021). Behind the Screen: Content Moderation in the Shadows of Social Media. Yale University Press.

Robinson, R. (2021, November 16). Boeing built an unsafe plane, and blamed the pilots when it crashed. Bloomberg Businessweek. https://www.bloomberg.com/news/features/2021-11-16/are-boeing-planes-unsafe-pilots-blamed-for-corporate-errors-in-max-737-crash

Romero, A. (2021, June 24). Can’t access GPT-3? Here’s GPT-J--its open-source cousin. Towards Data Science. https://towardsdatascience.com/cant-access-gpt-3-here-s-gpt-j-its-open-source-cousin-8af86a638b11


Romero, A. (2021, June 5). GPT-3 scared you? Meet Wu Dao 2.0: A monster of 1.75 trillion parameters. Towards Data Science. https://towardsdatascience.com/gpt-3-scared-you-meet-wu-dao-2-0-a-monster-of-1-75-trillion-parameters-832cd83db484

Rousseau, A., Baudelaire, C., Riera, K. (2020, October 27). Doctor GPT-3: hype or reality? Nabla. https://www.nabla.com/blog/gpt-3/

Royal Canadian Mounted Police. (2014, January 24). Critical infrastructure intelligence assessment: Criminal threats to the Canadian petroleum industry. Ottawa: RCMP. https://www.statewatch.org/media/documents/news/2015/feb/can-2014-01-24-rcmp-anti-petroleum-activists-report.pdf

Sale, K. (1995). Rebels Against the Future: The Luddites and Their War on the Industrial Revolution: Lessons for the Computer Age. Addison-Wesley Publishing.

Savaresi, A., & McVey, M. (2020, February 7). Human rights abuses by fossil fuel companies. 350. https://350.org/climate-defenders/

Savero, R. (1981, February 4). Air Force Academy to drop its ban on applicants with sickle-cell gene. The New York Times. https://www.nytimes.com/1981/02/04/us/air-academy-to-drop-its-ban-on-applicants-with-sickle-cell-gene.html

Scareflow, A. [@vboykis]. (2020, August 2). NLP People: Anyone know what these two Books1 and Books2 data sources in GPT-3 are? Writing a newsletter about them... [Tweet]. Twitter. https://twitter.com/vboykis/status/1290030614410702848?lang=en

Schiermeier, Q. (2021). Forensic database challenged over ethics of DNA holdings. Nature, 594(7863), 320–322. https://doi.org/10.1038/d41586-021-01584-w

Schiffer, Z. (2021, February 19). Google fires second AI ethics researcher following internal investigation. The Verge. https://www.bbc.com/news/technology-56135817

Schmitt, A. (2020). Right of Way: Race, Class, and the Silent Epidemic of Pedestrian Deaths in America. Island Press: Washington, DC.

Schurman, R., & Munro, W. A. (2010). Fighting for the Future of Food: Activists versus Agribusiness in the Struggle over Biotechnology (Vol. 35). U of Minnesota Press.

Schwartz, J. (2013, January 12). Internet activist, a creator of RSS, is dead at 26, apparently a suicide. The New York Times. https://www.nytimes.com/2013/01/13/technology/aaron-swartz-internet-activist-dies-at-26.html

Scott, D., & Barnett, C. (2009). Something in the air: civic science and contentious environmental politics in post-apartheid South Africa. Geoforum, 40(3), 373-382. https://doi.org/10.1016/j.geoforum.2008.12.002


Seabrook, J. (2019, October 14). The next word. The New Yorker, 95(31). https://www.newyorker.com/magazine/2019/10/14/can-a-machine-learn-to-write-for-the-new-yorker

Select Committee on Artificial Intelligence. (2019, June). The national artificial intelligence research and development strategic plan: 2019 update. NITRD. https://www.nitrd.gov/pubs/National-AI-RD-Strategy-2019.pdf

Selin, C. (2011). Negotiating plausibility: Intervening in the future of nanotechnology. Science and Engineering Ethics, 17, 723-737. https://doi.org/10.1007/s11948-011-9315-x

Selinger, E., & Durant, D. (2021). Amazon’s Ring: Surveillance as a slippery slope service. Science as Culture, 1-15. https://doi.org/10.1080/09505431.2021.1983797

Selsky, A., and Valdes, M. (2021, October 25). Big tech data centers spark worry over scarce western water. Associated Press. https://apnews.com/article/technology-business-environment-and-nature-oregon-united-states-2385c62f1a87030d344261ef9c76ccda

Semuels, A. (2016, March 18). The role of highways in American poverty. The Atlantic. https://www.theatlantic.com/business/archive/2016/03/role-of-highways-in-american-poverty/474282/

Severin, A., & Chataway, J. (2021). Overburdening of peer reviewers: A multi-stakeholder perspective on causes and effects. Learned Publishing, 34(4), 537-546. https://doi.org/10.1002/leap.1392

Shapin, S. (1995). A Social History of Truth: Civility and Science in Seventeenth-Century England. University of Chicago Press.

Shapin, S., & Schaffer, S. (1985). Leviathan and the Air-Pump. Princeton University Press.

Shelton, S. A., & Brooks, T. (2019). “We need to get these scores up”: A narrative examination of the challenges of teaching literature in the age of standardized testing. Journal of Language and Literacy Education, 15(2), n2. https://eric.ed.gov/?id=EJ1235207

Shill, G. (2020). Should law subsidize driving? NYU Law Review, 95, 498-579.

Siddik, M.A.B., Shehabi, A., and Marston, L. (2021). The environmental footprint of data centers in the United States. Environmental Research Letters, 16. https://iopscience.iop.org/article/10.1088/1748-9326/abfba1/pdf

Simon, C. M., L’Heureux, J., Murray, J. C., Winokur, P., Weiner, G., Newbury, E., Shinkunas, L., & Zimmerman, B. (2011). Active choice but not too active: Public perspectives on biobank consent models. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 13(9), 821–831. https://doi.org/10.1097/GIM.0b013e31821d2f88

Simonite, T. (2021). What really happened when Google ousted Timnit Gebru? Wired. https://www.wired.com/story/google-timnit-gebru-ai-what-really-happened/


Singer, J. (2022). There Are No Accidents: The Deadly Rise of Injury and Disaster—Who Profits and Who Pays the Price. Simon and Schuster.

Singh, R. & Jackson, S. (2021). Seeing like an infrastructure: Low-resolution citizens and the Aadhaar identification project. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW2), 1-26. https://doi.org/10.1145/3476056

Sjoding, M. W., Dickson, R. P., Iwashyna, T. J., Gay, S. E., & Valley, T. S. (2020). Racial bias in pulse oximetry measurement. New England Journal of Medicine, 383(25), 2477-2478. https://www.nejm.org/doi/full/10.1056/nejmc2029240

Solaiman, I., & Dennison, C. (2021). Process for adapting language models to society (PALMS) with values-targeted datasets. Advances in Neural Information Processing Systems, 34.

Sovacool, B.K. (2021, March). When subterranean slavery supports sustainability transitions? Power, patriarchy, and child labor in artisanal Congolese cobalt mining. The Extractive Industries & Society, 8(1), 271-293. https://www.sciencedirect.com/science/article/pii/S2214790X20303154

Spice, A. (2018). Fighting invasive infrastructures. Environment and Society, 9(1), 40-56. https://doi.org/10.3167/ares.2018.090104

Spinney, L. (2022, January 9). Are we witnessing the dawn of post-theory science? The Guardian. https://www.theguardian.com/technology/2022/jan/09/are-we-witnessing-the-dawn-of-post-theory-science?CMP=Share_iOSApp_Other

Stanford CRFM. (n.d.). Developing and understanding responsible foundation models. Retrieved March 3, 2022 from https://crfm.stanford.edu/

Stanford HAI. (n.d.). Corporate members program. Retrieved March 3, 2022 from https://hai.stanford.edu/about/corporate-members-program

Stangeland, C. (2016, December). Fracking: Unintended consequences for local communities. Homeland Security Affairs. https://www.hsaj.org/articles/13753

Starr, P. (1982). The Social Transformation of American Medicine. Basic Books.

Stern, J. (2021, May 28). Pipeline of violence: The oil industry and missing and murdered Indigenous women. Immigration and Human Rights Law Review. https://lawblogs.uc.edu/ihrlr/2021/05/28/pipeline-of-violence-the-oil-industry-and-missing-and-murdered-indigenous-women/#post-274-footnote-ref-6

Stevenson, F. A., Kerr, C., Murray, E., & Nazareth, I. (2007). Information from the Internet and the doctor-patient relationship: the patient perspective–a qualitative study. BMC Family Practice, 8(1), 1-8. https://doi.org/10.1186/1471-2296-8-47


Stilgoe, J., Owen, R., & Macnaghten, P. (2013). Developing a framework for responsible innovation. Research Policy, 42(9), 1568-1580. https://doi.org/10.1016/j.respol.2013.05.008

Stirling, A. (2008). ‘Opening up’ and ‘closing down’: Power, participation, and pluralism in the social appraisal of technology. Science, Technology, and Human Values, 33(2), 262-294. https://doi.org/10.1177%2F0162243907311265

Stix, C. (2020). (C. Brasoveanu, Ed.). European Commission. https://ec.europa.eu/jrc/communities/sites/jrccties/files/reportontheeuropeanailandscapeworkshop.pdf

Stokstad, E. (2019). Major U.K. genetics lab accused of misusing African DNA. Science. https://doi.org/10.1126/science.aba0343

Strong, J., Ryan-Mosley, T., Cillekens, E., Hao, K., Reilly, M., and Lichfield, G. (2020, December 2). Podcast: Facial recognition is quietly being used to control access to housing and social services. MIT Technology Review. https://www.technologyreview.com/2020/12/02/1012901/no-face-no-service/

Stumpf, J. (2018). How to understand urban blight in America’s neighborhoods and work to eliminate it. Dickinson Magazine. https://www.dickinson.edu/news/article/3461/how_to_understand_urban_blight_in_america_s_neighborhoods_and_work_to_eliminate_it

Summers-Trio, P., Hayes-Conroy, A., Singer, B., & Horwitz, R. I. (2019). Biology, biography, and the translational gap. Science Translational Medicine, 11(479). https://doi.org/10.1126/scitranslmed.aat7027

Swanson, K. W. (2009). The emergence of the professional patent practitioner. Technology and Culture, 50(3), 519-548. https://www.jstor.org/stable/40345727

Swift, A. (2019). Soviet−American rivalry at Expo ’58. In Molella, A. P., & Knowles, S. G. (Eds.), World’s Fairs in the Cold War: Science, Technology, and the Culture of Progress (pp. 27-45). University of Pittsburgh Press.

Syverson, B. (2020, October 19). The rules of brainstorming change when artificial intelligence gets involved. Here’s how. IDEO. https://www.ideo.com/blog/the-rules-of-brainstorming-change-when-ai-gets-involved-heres-how?utm_content=143682113&utm_medium=social&utm_source=twitter&hss_channel=tw-23462787

Tamkin, A., Brundage, M., Clark, J., & Ganguli, D. (2021). Understanding the capabilities, limitations, and societal impact of large language models. arXiv. https://arxiv.org/abs/2102.02503

Tan, S.S.L., & Goonawardene, N. (2017). Internet health information seeking and the patient-physician relationship: a systematic review. Journal of Medical Internet Research, 19(1). https://doi.org/10.2196/jmir.5729


Terry, S. F., Terry, P. F., Rauen, K. A., Uitto, J., & Bercovitch, L. G. (2007). Advocacy groups as research organizations: the PXE International example. Nature Reviews Genetics, 8(2), 157-164. https://doi.org/10.1038/nrg1991

The British Psychological Society. (2017). Working with interpreters: Guidelines for psychologists. https://www.bps.org.uk/news-and-policy/working-interpreters-guidelines-psychologists

The City of Ann Arbor. (n.d.) Drinking water. Retrieved March 13, 2022 from https://www.a2gov.org/departments/systems-planning/planning-areas/water-resources/Pages/Drinking-Water.aspx

The Eye. (2020). Enter the Eye: An open directory data archive. https://the-eye.eu/

The International HapMap Consortium. (2004). Integrating ethics and science in the International HapMap Project. Nature Reviews Genetics, 5(6), 467-475. https://doi.org/10.1038/nrg1351

The Parliament of the Commonwealth of Australia. (2021). Security legislation amendment (critical infrastructure) bill 2021: A bill for an act to amend legislation relating to critical infrastructure, and for other purposes. House of Representatives. https://parlinfo.aph.gov.au/parlInfo/download/legislation/bills/r6657_aspassed/toc_pdf/20182b01.pdf;fileType=application%2Fpdf

Tucker, B. P. (1998). Deaf culture, cochlear implants, and elective disability. Hastings Center Report, 28(4), 6-14. https://doi.org/10.2307/3528607

Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460.

U.S. Copyright Office Review Board. (2022, February 14). Re: Second Request for Reconsideration for Refusal to Register A Recent Entrance to Paradise (Correspondence ID 1-3ZPC6C3; SR # 1-7100387071) [Letter]. https://www.copyright.gov/rulings-filings/review-board/docs/a-recent-entrance-to-paradise.pdf

U.S. Geological Survey. (2018, June 8). Mining and water quality. USGS Water Science School. https://www.usgs.gov/special-topics/water-science-school/science/mining-and-water-quality

United Food and Commercial Workers International Union. (2020, August 24). UFCW statement on Amazon cashierless grocery store opening. UFCW. https://www.ufcw.org/press-releases/cashier/

United Nations Environment Programme. (2011). Environmental assessment of Ogoniland. https://postconflict.unep.ch/publications/OEA/UNEP_OEA.pdf

Van Noorden, R. (2021, February 3). Scientists call for fully open sharing of coronavirus genome data. Nature, 590(7845), 195-196. https://doi.org/10.1038/d41586-021-00305-7


Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems (pp. 5998-6008). https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf

Vats, A. (2020). The color of creatorship: Intellectual property, race, and the making of Americans. Stanford University Press.

Vézina, B., & Moran, B. (2020, August 10). Artificial intelligence and creativity: Why we're against copyright protection for AI-generated output. Creative Commons Blog. https://creativecommons.org/2020/08/10/no-copyright-protection-for-ai-generated-output/

Viable. (n.d.) https://askviable.com/

Vincent, J. (2021, May 25). Microsoft has built an AI-powered autocomplete for code using GPT-3. The Verge. https://www.theverge.com/2021/5/25/22451144/microsoft-gpt-3-openai-coding-autocomplete-powerapps-power-fx

Wang, Z., Rodriguez Morales, M. M., Husak, K., Kleinman, M., & Parthasarathy, S. (2021). In Communities We Trust: Institutional Failures and Sustained Solutions for Vaccine Hesitancy. https://stpp.fordschool.umich.edu/research/research-report/communities-we-trust-institutional-failures-and-sustained-solutions

Welbl, J., Glaese, A., Uesato, J., Dathathri, S., Mellor, J., Hendricks, L. A., Anderson, K., Kohli, P., Coppin, B., & Huang, P. S. (2021). Challenges in detoxifying language models. arXiv. https://arxiv.org/abs/2109.07445

West, D. (2019). The Future of Work: Robots, AI, and Automation. Brookings Institution Press.

White, R. F. (2007). Institutional Review Board mission creep: The common rule, social science, and the nanny state. The Independent Review, 11(4), 547-564. http://www.jstor.org/stable/24562415

Whyte, K. P. (2017, February 28). The Dakota Access Pipeline, environmental injustice, and U.S. colonialism. Red Ink: An International Journal of Indigenous Literature, Arts, & Humanities, 19(1). https://ssrn.com/abstract=2925513

Willis, D. E., Andersen, J. A., Bryant-Moore, K., Selig, J. P., Long, C. R., Felix, H. C., ... & McElfish, P. A. (2021). COVID-19 vaccine hesitancy: Race/ethnicity, trust, and fear. Clinical and Translational Science, 14(6), 2200-2207. https://doi.org/10.1111/cts.13077

Working Group on Mining and Human Rights in Latin America. (2014, May 22). The impact of Canadian mining in Latin America and Canada's responsibility: Executive summary of the report submitted to the Inter-American Commission on Human Rights. Due Process of Law Foundation. https://www.dplf.org/en/resources/impact-canadian-mining-latin-america-and-canadas-responsibility-executive-summary

Wyden, R. (2022). Wyden, Booker and Clarke introduce Algorithmic Accountability Act of 2022 to require new transparency and accountability for automated decision systems. Press release.


Zabel, J. (2019). The killer inside us: Law, ethics, and the forensic use of family genetics. Berkeley Journal of Criminal Law, 24(2). https://doi.org/10.15779/Z385D8NF7

Zeavin, H. (2021). The distance cure: A history of teletherapy. MIT Press.

Zhang, K. (2018, December 13). How big data has created a big crisis in science. The Conversation. https://theconversation.com/how-big-data-has-created-a-big-crisis-in-science-102835

Zhavoronkov, A. (2021, July 19). Wu Dao 2.0 - bigger, stronger, faster AI from China. Forbes. https://www.forbes.com/sites/alexzhavoronkov/2021/07/19/wu-dao-20bigger-stronger-faster-ai-from-china/

Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., & Fidler, S. (2015). Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. Proceedings of the IEEE International Conference on Computer Vision (pp. 19-27).

Zou, Y., Mhaidli, A. H., McCall, A., & Schaub, F. (2018). "I've got nothing to lose": Consumers' risk perceptions and protective actions after the Equifax data breach. Fourteenth Symposium on Usable Privacy and Security, 197-216. https://www.usenix.org/conference/soups2018/presentation/zou

Zuboff, S. (2019). The age of surveillance capitalism: The fight for the future at the new frontier of power. Profile Books.


For Further Information


If you would like additional information about this report, the Technology Assessment Project, or
the University of Michigan's Science, Technology, and Public Policy Program, contact us at
[email protected] or visit stpp.fordschool.umich.edu.


LEARN MORE

myumi.ch/LLMReport

Technology Assessment Project


Science, Technology, and Public Policy Program
Gerald R. Ford School of Public Policy
University of Michigan
735 S. State Street
Ann Arbor, MI 48109
(734) 764-0453
stpp.fordschool.umich.edu
[email protected]

Brain/circuit icon: Freepik

© 2022 The Regents of the University of Michigan
