
An investigation in September revealed that IBM was also working with the New York City Police Department (NYPD) to build an “ethnicity detection” feature to search faces based on race, using police camera footage of thousands of people in the streets of New York taken without their knowledge or permission.17

This is just a sampling of an extraordinary series of incidents from 2018.18 The response has included a growing wave of criticism, with demands for greater accountability from the technology industry for the systems it builds.19 In turn, some companies have made public calls for the U.S. to regulate
technologies like facial recognition.20 Others have published AI ethics principles and increased efforts to
produce technical fixes for issues of bias and discrimination in AI systems. But many of these ethical and
technical approaches define the problem space very narrowly, neither contending with the historical or
social context nor providing mechanisms for public accountability, oversight, and due process. This
makes it nearly impossible for the public to validate that any of the current problems have, in fact, been
addressed.

As numerous scholars have noted, one significant barrier to accountability is the culture of industrial and
legal secrecy that dominates AI development.21 Just as many AI technologies are “black boxes”, so are
the industrial cultures that create them.22 Many of the fundamental building blocks required to
understand AI systems and to ensure certain forms of accountability – from training data, to data
models, to the code dictating algorithmic functions, to implementation guidelines and software, to the
business decisions that directed design and development – are rarely accessible to review, hidden by
corporate secrecy laws.

The current accountability gap is also caused by the incentives driving the rapid pace of technical AI
research. The push to “innovate,” publish first, and present a novel addition to the technical domain has
created an accelerated cadence in the field of AI, and in technical disciplines more broadly. This comes
at the cost of considering empirical questions of context and use, or substantively engaging with ethical
concerns.23 Similarly, technology companies are driven by pressures to “launch and iterate,” which assume that complex social and political questions will be handled by policy and legal departments, leaving developers and sales departments free from the responsibility of considering potential downsides.
The “move fast and break things” culture provides little incentive for ensuring meaningful public
accountability or engaging the communities most likely to experience harm.24 This is particularly
problematic as the accelerated application of AI systems in sensitive social and political domains
presents risks to marginalized communities.

The challenge of creating better governance and greater accountability for AI poses particular problems when such systems are woven into the fabric of government and public institutions. The lack of transparency, notice, meaningful engagement, accountability, and oversight creates serious structural barriers to due process and redress for unjust and discriminatory decisions.

In this year’s report, we assess many pressing issues facing us as AI tools are deployed further into the institutions that govern everyday life. We focus on the biggest industry players, because the number of companies able to create AI at scale is very small, while their power and reach are global. We evaluate the
current range of responses from industry, governments, researchers, activists, and civil society at large.
We suggest a series of substantive approaches and make ten specific recommendations. Finally, we
share the latest research and policy strategies that can contribute to greater accountability, as well as a
richer understanding of AI systems in a wider social context.

1. THE INTENSIFYING PROBLEM SPACE


In identifying the most pressing social implications of AI this year, we look closely at the role of AI in
widespread surveillance in multiple countries around the world, and at the implications for rights and
liberties. In particular, we consider the increasing use of facial recognition, and a subclass of facial
recognition known as affect recognition, and assess the growing calls for regulation. Next, we share our
findings on the government use of automated decision systems, and what questions this raises for
fairness, transparency, and due process when such systems are protected by trade secrecy and other
laws that prevent auditing and close examination.25 Finally, we look at the practices of deploying
experimental systems “in the wild,” testing them on human populations. We analyze who has the most
to gain, and who is at greatest risk of experiencing harm.

1.1 AI is Amplifying Widespread Surveillance


This year, we have seen AI amplify large-scale surveillance through techniques that analyze video, audio,
images, and social media content across entire populations and identify and target individuals and
groups. While researchers and advocates have long warned about the dangers of mass data collection
and surveillance,26 AI raises the stakes in three areas: automation, scale of analysis, and predictive
capacity. Specifically, AI systems allow automation of surveillance capabilities far beyond the limits of
human review and hand-coded analytics. Thus, they can serve to further centralize these capabilities in
the hands of a small number of actors. These systems also exponentially scale analysis and tracking
across large quantities of data, attempting to make connections and inferences that would have been
difficult or impossible before their introduction. Finally, they provide new predictive capabilities to make
determinations about individual character and risk profiles, raising the possibility of granular population
controls.

China has offered several examples of alarming AI-enabled surveillance this year, which we know about
largely because the government openly acknowledges them. However, it’s important to note that many
of the same infrastructures already exist in the U.S. and elsewhere, often produced and promoted by
private companies whose marketing emphasizes beneficial use cases. In the U.S. the use of these tools
by law enforcement and government is rarely open to public scrutiny, as we will review, and there is
much we do not know. Such infrastructures and capabilities could easily be turned to more surveillant
ends in the U.S., without public disclosure and oversight, depending on market incentives and political
will.

In China, military and state-sanctioned automated surveillance technology is being deployed to monitor
large portions of the population, often targeting marginalized groups. Reports include the installation of facial recognition tools at the Hong Kong-Shenzhen border,27 the use of flocks of robotic dove-like drones in five provinces across the country,28 and the widely reported social credit monitoring system,29 each of
which illustrates how AI-enhanced surveillance systems can be mobilized as a means of far-reaching
social control.30

The most oppressive use of these systems is reportedly occurring in the Xinjiang Autonomous Region, described by The Economist as a “police state like no other.”31 Surveillance in this Uighur ethnic
minority area is pervasive, ranging from physical checkpoints and programs where Uighur households
are required to “adopt” Han Chinese officials into their family, to the widespread use of surveillance
cameras, spyware, Wi-Fi sniffers, and biometric data collection, sometimes by stealth. Machine learning
tools integrate these streams of data to generate extensive lists of suspects for detention in re-education
camps, built by the government to discipline the group. Estimates of the number of people detained in
these camps range from hundreds of thousands to nearly one million.32

These infrastructures are not unique to China. Venezuela announced the adoption of a new smart card
ID known as the “carnet de la patria,” which, by integrating government databases linked to social
programs, could enable the government to monitor citizens’ personal finances, medical history, and
voting activity.33 In the United States, we have seen similar efforts. The Pentagon has funded research
on AI-enabled social media surveillance to help predict large-scale population behaviors,34 and the U.S.
Immigration and Customs Enforcement (ICE) agency is using an Investigative Case Management System
developed by Palantir and powered by Amazon Web Services in its deportation operations.35 The
system integrates public data with information purchased from private data brokers to create profiles of
immigrants in order to aid the agency in profiling, tracking, and deporting individuals.36 These examples
show how AI systems increase integration of surveillance technologies into data-driven models of social
control and amplify the power of such data, magnifying the stakes of misuse and raising urgent and
important questions as to how basic rights and liberties will be protected.

The faulty science and dangerous history of affect recognition

We are also seeing new risks emerging from unregulated facial recognition systems. These systems
facilitate the detection and recognition of individual faces in images or video, and can be used in
combination with other tools to conduct more sophisticated forms of surveillance, such as automated
lip-reading, offering the ability to observe and interpret speech from a distance.37

Among a host of AI-enabled surveillance and tracking techniques, facial recognition raises particular civil
liberties concerns. Because facial features are a deeply personal form of biometric identification that is extremely difficult to change, it is hard to subvert facial recognition or “opt out” of its operations. And unlike other
tracking tools, facial recognition seeks to use AI for much more than simply recognizing faces. Once
identified, a face can be linked with other forms of personal records and identifiable data, such as credit
score, social graph, or criminal record.

Affect recognition, a subset of facial recognition, aims to interpret faces to automatically detect inner
emotional states or even hidden intentions. This approach promises a type of emotional weather
forecasting: analyzing hundreds of thousands of images of faces, detecting
“micro-expressions,” and mapping these expressions to “true feelings.”38 This reactivates a long
tradition of physiognomy – a pseudoscience that claims facial features can reveal innate aspects of our
character or personality. Physiognomy dates from ancient times, but scientific interest in it grew enormously in the nineteenth century, when it became a central method for scientific forms of racism and discrimination.39 Although physiognomy fell out of favor following its association with Nazi race science, researchers are worried about a reemergence of physiognomic ideas in affect recognition applications.40
The idea that AI systems might be able to tell us what a student, a customer, or a criminal suspect is
really feeling or what type of person they intrinsically are is proving attractive to both corporations and
governments, even though the scientific justifications for such claims are highly questionable, and the
history of their discriminatory purposes well-documented.

The case of affect detection reveals how machine learning systems can easily be used to intensify forms
of classification and discrimination, even when the basic foundations of these theories remain
controversial among psychologists. The scientist most closely associated with AI-enabled affect
detection is the psychologist Paul Ekman, who asserted that emotions can be grouped into a small set of
basic categories like anger, disgust, fear, happiness, sadness, and surprise.41 Studying faces, according to
Ekman, produces an objective reading of authentic interior states—a direct window to the soul.
Underlying his belief was the idea that emotions are fixed and universal, identical across individuals, and
clearly visible in observable biological mechanisms regardless of cultural context. But Ekman’s work has
been deeply criticized by psychologists, anthropologists, and other researchers who have found his
theories do not hold up under sustained scrutiny.42 The psychologist Lisa Feldman Barrett and her
colleagues have argued that an understanding of emotions in terms of these rigid categories and
simplistic physiological causes is no longer tenable.43 Nonetheless, AI researchers have taken his work
as fact, and used it as a basis for automating emotion detection.44

Contextual, social, and cultural factors — how, where, and by whom such emotional signifiers are
expressed — play a larger role in emotional expression than was believed by Ekman and his peers. In
light of this new scientific understanding of emotion, any simplistic mapping of a facial expression onto
basic emotional categories through AI is likely to reproduce the errors of an outdated scientific
paradigm. It also raises troubling ethical questions about locating the arbiter of someone’s “real”
character and emotions outside of the individual, and the potential abuse of power that can be justified
based on these faulty claims. Psychiatrist Jonathan Metzl documents a cautionary historical example: a pattern
in the 1960s of diagnosing Black people with schizophrenia if they supported the civil rights
movement.45 Affect detection combined with large-scale facial recognition has the potential to magnify
such political abuses of psychological profiling.

In the realm of education, some U.S. universities have considered using affect analysis software on
students.46 The University of St. Thomas, in Minnesota, looked at using a system based on Microsoft’s
facial recognition and affect detection tools to observe students in the classroom using a webcam. The
system predicts the students’ emotional state. An overview of student sentiment is viewable by the
teacher, who can then shift their teaching in a way that “ensures student engagement,” as judged by the
system. This raises serious questions on multiple levels: what if the system, with a simplistic emotional
model, simply cannot grasp more complex states? How would a student contest a determination made
by the system? What if some students register as “happy” while others register as “angry”: how should the teacher redirect the lesson? What are the privacy implications of such a system, particularly given
that, in the case of the pilot program, there is no evidence that students were informed of its use on
them?

Outside of the classroom, we are also seeing personal assistants, like Alexa and Siri, seeking to pick up
on the emotional undertones of human speech, with companies even going so far as to patent methods of marketing based on detected emotions, as well as on mental and physical health.47 The AI-enabled
emotion measurement company Affectiva now promises it can promote safer driving by monitoring
“driver and occupant emotions, cognitive states, and reactions to the driving experience...from face and
voice.”48 Yet there is little evidence that any of these systems actually work across different individuals,
contexts, and cultures, or have any safeguards put in place to mitigate concerns about privacy, bias, or
discrimination in their operation. Furthermore, as we have seen in the large literature on bias and
fairness, classifications of this nature not only have direct impacts on human lives, but also serve as data
to train and influence other AI systems. This raises the stakes for any use of affect recognition, further
emphasizing why it should be critically examined and its use severely restricted.

Facial recognition amplifies civil rights concerns

Concerns are intensifying that facial recognition increases racial discrimination and other biases in the
criminal justice system. Earlier this year, the American Civil Liberties Union (ACLU) disclosed that both
the Orlando Police Department and the Washington County Sheriff’s department were using Amazon’s
Rekognition system, which boasts that it can perform “real-time face recognition across tens of millions
of faces” and detect “up to 100 faces in challenging crowded photos.”49 In Washington County, Amazon
specifically worked with the Sheriff’s department to create a mobile app that could scan faces and
compare them against a database of at least 300,000 mugshots.50 An Amazon representative recently
revealed during a talk that they have been considering applications where Orlando’s network of
surveillance cameras could be used in conjunction with facial recognition technology to find a “person of
interest” wherever they might be in the city.51

In addition to the privacy and mass surveillance concerns commonly raised, the use of facial recognition
in law enforcement has also intersected with concerns of racial and other biases. Researchers at the
ACLU and the University of California (U.C.) Berkeley tested Amazon’s Rekognition tool by comparing photos of sitting members of the United States Congress with a database containing 25,000 photos of
people who had been arrested. The results showed significant levels of inaccuracy: Amazon’s
Rekognition incorrectly identified 28 members of Congress as people from the arrest database.
Moreover, the false positives disproportionately occurred among non-white members of Congress, with
an error rate of nearly 40% compared to only 5% for white members.52 Such results echo a string of
findings that have demonstrated that facial recognition technology is, on average, better at detecting
light-skinned people than dark-skinned people, and better at detecting men than women.53
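To make concrete what such disaggregated evaluation involves, the brief sketch below shows one way these disparities can be computed: a per-group false-match rate over a set of match results. It is purely illustrative; the function name, data, and group labels are hypothetical and are not drawn from the ACLU audit or from Amazon’s documentation.

```python
# Illustrative sketch only: computing disaggregated false-match rates by group.
# The inputs below are hypothetical and do not reproduce the ACLU audit data.
from collections import defaultdict

def false_match_rates(results, group_of):
    """results: iterable of (person_id, falsely_matched) pairs.
    group_of: dict mapping person_id to a demographic group label."""
    totals = defaultdict(int)         # number of people evaluated per group
    false_matches = defaultdict(int)  # number of false matches per group
    for person_id, falsely_matched in results:
        group = group_of[person_id]
        totals[group] += 1
        if falsely_matched:
            false_matches[group] += 1
    return {g: false_matches[g] / totals[g] for g in totals}

# Hypothetical usage: five individuals split across two groups.
results = [("A", True), ("B", False), ("C", False), ("D", True), ("E", False)]
group_of = {"A": "group_1", "B": "group_1", "C": "group_2", "D": "group_2", "E": "group_2"}
print(false_match_rates(results, group_of))
# -> {'group_1': 0.5, 'group_2': 0.3333333333333333}
```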

In its response to the ACLU, Amazon acknowledged that “the Rekognition results can be significantly
skewed by using a facial database that is not appropriately representative.”54 Given the deep and
historical racial biases in the criminal justice system, most law enforcement databases are unlikely to be
“appropriately representative.”55 Despite these serious flaws, ongoing pressure from civil rights groups,
and protests from Amazon employees over the potential for misuse of these technologies, Amazon Web
Services CEO Andrew Jassy recently told employees that “we feel really great and really strongly about
the value that Amazon Rekognition is providing our customers of all sizes and all types of industries in
law enforcement and out of law enforcement.”56

Nor is Amazon alone in implementing facial recognition technologies in unaccountable ways. Investigative journalists recently disclosed that IBM and the New York City Police Department (NYPD)
partnered to develop such a system that included “ethnicity search” as a custom feature, trained on
thousands of hours of NYPD surveillance footage.57 Use of facial recognition software in the private
sector has expanded as well.58 Major retailers and venues have already begun using these technologies
to detect shoplifters, monitor crowds, and even “scan for unhappy customers,” using facial recognition
systems instrumented with “affect detection” capabilities.59

These concerns are amplified by a lack of laws and regulations. There is currently no federal legislation
that seeks to provide standards, restrictions, requirements, or guidance regarding the development or
use of facial recognition technology. In fact, most existing federal legislation looks to promote the use of
facial recognition for surveillance, immigration enforcement, employment verification, and domestic
entry-exit systems.60 The laws that we do have are piecemeal, and none specifically address facial
recognition. Among these is the Biometric Information Privacy Act, a 2008 Illinois law that sets forth
stringent rules regarding the collection of biometrics. While the law does not mention facial recognition,
given that the technology was not widely available in 2008, many of its requirements, such as obtaining
consent, are reasonably interpreted to apply.61 More recently, several municipalities and a local transit system have adopted ordinances that create greater transparency, oversight, and data collection and use requirements for the acquisition of surveillance technologies, which, under the ordinances’ expansive definitions, include facial recognition.62

Opposition to the use of facial recognition tools by government agencies is growing. Earlier this year, AI Now joined the ACLU and over 30 other research and advocacy organizations in calling on Amazon to stop
selling facial recognition software to government agencies after the ACLU uncovered documents
showing law enforcement use of Amazon’s Rekognition API.63 Members of Congress are also pushing
Amazon to provide more information.64

Some have gone further, calling for an outright ban. Scholars Woodrow Hartzog and Evan Selinger argue
that facial recognition technology is a “tool for oppression that’s perfectly suited for governments to
display unprecedented authoritarian control and an all-out privacy-eviscerating machine,” necessitating
extreme caution and diligence before being applied in our contemporary digital ecosystem.65 Critiquing
the Stanford “gaydar” study that claimed its deep neural network was more accurate than humans at
predicting sexuality from facial images,66 Frank Pasquale wrote that “there are some scientific research
programs best not pursued - and this might be one of them.”67

Kade Crockford, Director of the Technology for Liberty Program at ACLU of Massachusetts, also wrote in
favor of a ban, stating that “artificial intelligence technologies like face recognition systems
fundamentally change the balance of power between the people and the government...some
technologies are so dangerous to that balance of power that they must be rejected.”68 Microsoft
President Brad Smith has called for government regulation of facial recognition, while Rick Smith, CEO of
law enforcement technology company Axon, recently stated that the “accuracy thresholds” of facial
recognition tools aren’t “where they need to be to be making operational decisions.”69

The events of this year have strongly underscored the urgent need for stricter regulation of both facial
and affect recognition technologies. Such regulations should severely restrict use by both the public and
the private sector, and ensure that communities affected by these technologies are the final arbiters of
whether they are used at all. This is especially important in situations where basic rights and liberties are
at risk, requiring stringent oversight, audits, and transparency. Linkages should not be permitted
between private and government databases. At this point, given the evidence in hand, policymakers
should not be funding or furthering the deployment of these systems in public spaces.

1.2 The Risks of Automated Decision Systems in Government


Over the past year, we have seen a substantial increase in the adoption of Automated Decision Systems
(ADS) across government domains, including criminal justice, child welfare, education, and immigration.
Often adopted under the theory that they will improve government efficiency or produce cost savings, ADS seek to aid or replace various decision-making processes and policy determinations. However, because the
underlying models are often proprietary and the systems frequently untested before deployment, many
community advocates have raised significant concerns about lack of due process, accountability,
community engagement, and auditing.70

Such was the case for Tammy Dobbs, who moved to Arkansas in 2008 and signed up for a state disability
program to help her with her cerebral palsy.71 Under the program, the state sent a qualified nurse to
assess Tammy to determine the number of caregiver hours she would need. Because Tammy spent most
of her waking hours in a wheelchair and had stiffness in her hands, her initial assessment allocated 56
hours of home care per week. Fast forward to 2016, when the state assessor arrived with a new ADS on
her laptop. Using a proprietary algorithm, this system calculated the number of hours Tammy would be
allotted. Without any explanation or opportunity for comment, discussion, or reassessment, the
program allotted Tammy 32 hours per week, a massive and sudden drop that Tammy had no chance to
prepare for and that severely reduced her quality of life.

Nor was Tammy’s situation exceptional. According to Legal Aid of Arkansas attorney Kevin De Liban,
hundreds of other individuals with disabilities also received dramatic reductions in hours, all without any
meaningful opportunity to understand or contest their allocations. Legal Aid subsequently sued the
State of Arkansas, eventually winning a ruling that the new algorithmic allocation program was
erroneous and unconstitutional. Yet by then, much of the damage to the lives of those affected had
been done.72

The Arkansas disability cases provide a concrete example of the substantial risks that occur when
governments use ADS in decisions that have immediate impacts on vulnerable populations. While
individual assessors may also suffer from bias or flawed logic, the impact of their case-by-case decisions
has nowhere near the magnitude or scale that a single flawed ADS can have across an entire population.

The increased introduction of such systems comes at a time when, according to the World Income
Inequality Database, the United States has the highest income inequality rate of all western countries.73
Moreover, Federal Reserve data shows wealth inequalities continue to grow, and racial wealth
disparities have more than tripled in the last 50 years, with current policies set to exacerbate such
problems.74 In 2018 alone, we have seen a U.S. executive order cutting funding for social programs that
serve the country’s poorest citizens,75 alongside a proposed federal budget that will significantly reduce funding for low-income and affordable housing,76 the implementation of onerous work requirements for
Medicaid,77 and a proposal to cut food assistance benefits for low-income seniors and people with
disabilities.78

In the context of such policies, agencies are under immense pressure to cut costs, and many are looking
to ADS as a means of automating hard decisions that have very real effects on those most in need.79 As such, many ADS are implemented with the goal of doing more with less in the context of austerity policies and cost-cutting. They are frequently designed and configured primarily to achieve these goals, with their ultimate effectiveness evaluated based on their ability to trim costs, often at the expense of the populations such tools are ostensibly intended to serve.80 As researcher Virginia
Eubanks argues, “What seems like an effort to lower program barriers and remove human bias often has
the opposite effect, blocking hundreds of thousands of people from receiving the services they
deserve.”81

When these problems arise, they are frequently difficult to remedy. Few ADS are designed or
implemented in ways that easily allow affected individuals to contest, mitigate, or fix adverse or
incorrect decisions. Additionally, human discretion and the ability to intervene or override a system’s
determination is often substantially limited or removed from case managers, social workers, and others
trained to understand the context and nuance of a particular person and situation.82 These front-line
workers become mere intermediaries, communicating inflexible decisions made by automated systems,
without the ability to alter them.

Unlike the civil servants who have historically been responsible for such decisions, many ADS come from
private vendors and are frequently implemented without thorough testing, review, or auditing to ensure
their fitness for a given domain.83 Nor are these systems typically built with any explicit form of
oversight or accountability. This makes discovery of problematic automated outcomes difficult,
especially since such errors and evidence of discrimination frequently manifest as collective harms, only
recognizable as a pattern across many individual cases. Detecting such problems requires oversight and
monitoring. It also requires access to data that is often neither available to advocates and the public nor
monitored by government agencies.

For example, the Houston Federation of Teachers sued the Houston Independent School District for
procuring a third-party ADS to use student test data to make teacher employment decisions, including
which teachers were promoted and which were terminated. It was revealed that no one in the district –
not a single employee – could explain or even replicate the determinations made by the system, even
though the district had access to all the underlying data.84 Teachers who sought to contest the
determinations were told that the “black box” system was simply to be believed and could not be
questioned. Even when the teachers brought a lawsuit, claiming constitutional, civil rights, and labor law
violations, the ADS vendor fought against providing any access to how its system worked. As a result, the
judge ruled that the use of this ADS in public employee cases could run afoul of constitutional due
process protections, especially when trade secrecy blocked employees’ ability to understand how
decisions were made. The case was subsequently settled, with the District agreeing to abandon the third-party ADS.

Similarly, in 2013, Los Angeles County adopted an ADS to assess imminent danger or harm to children,
and to predict the likelihood of a family being re-referred to the child welfare system within 12 to 18
months. The County did not perform a review of the system or assess the efficacy of using predictive
analytics for child safety and welfare. It was only after the death of a child whom the system failed to
identify as at-risk that County leadership directed a review, which raised serious questions regarding the
system’s validity. The review specifically noted that the system failed to provide a comprehensive picture
of a given family, “but instead focus[ed] on a few broad strokes without giving weight to important
nuance.”85 Virginia Eubanks found similar problems in her investigation of an ADS developed by the
same private vendor for use in Allegheny County, PA. This system produced biased outcomes because it
significantly oversampled poor children from working class communities, especially communities of
color, in effect subjecting poor parents and children to more frequent investigation.86

Even in the face of acknowledged issues of bias and the potential for error in high-stakes domains, these
systems are being rapidly adopted. The Ministry of Social Development in New Zealand supported the use of a predictive ADS to identify children at risk of maltreatment, despite recognizing that the system raised “significant ethical concerns.” The Ministry defended this on the grounds that the benefits
“plausibly outweighed” the potential harms, which included reconfiguring child welfare as a statistical
issue.87

These cases not only highlight the need for greater transparency, oversight, and accountability in the
adoption, development, and implementation of ADS, but also the need for examination of the
limitations of these systems overall, and of the economic and policy factors that accompany the push to
apply such systems. Virginia Eubanks, who investigated Allegheny County’s use of an ADS in child welfare, examined this and a number of other case studies to show how ADS are often adopted to avoid or obfuscate broader structural and systemic problems in society – problems that are often beyond the capacity of cash-strapped agencies to address meaningfully.88

Other automated systems have also been proposed as a strategy to combat pre-existing problems within
government systems. For years, criminal justice advocates and researchers have pushed for the
elimination of cash bail, which has been shown to disproportionately harm individuals based on race
and socioeconomic status while at the same time failing to enhance public safety.89 In response, New
Jersey and California recently passed legislation aimed at addressing this concern. However, instead of
simply ending cash bail, they replaced it with a pretrial assessment system designed to algorithmically
generate “risk” scores that claim to predict whether a person should go free or be detained in jail while
awaiting trial.90

The shift from policies such as cash bail to automated systems and risk assessment scoring is still
relatively new, and is proceeding even without substantial research examining the potential to amplify
discrimination within the criminal justice system. Yet there are some early indicators that raise concern.
New Jersey’s law went into effect in 2017, and while the state has experienced a decline in its pretrial
population, advocates have expressed worry that racial disparities in the risk assessment system
persist.91 Similarly, when California’s legislation passed earlier this year, many of the criminal justice
advocates who pushed for the end of cash bail, and supported an earlier version of the bill, opposed its
final version due to the risk assessment requirement.92

Education policy is also feeling the impact of automated decision systems. A University College London
professor is among those who argued for AI to replace standardized testing, suggesting that UCL
Knowledge Lab’s AIAssess can be “trusted...with the assessment of our children’s knowledge and
understanding,” and can serve to replace or augment more traditional testing.93 However, as with other forms of AI, a growing body of research shows that automated essay scoring systems may
encode bias against certain linguistic and ethnic groups in ways that replicate patterns of
marginalization.94 Unfair decisions based on automated scores assigned to students from historically
and systemically disadvantaged groups are likely to have profound consequences on children’s lives, and
to exacerbate existing disparities in access to employment opportunities and resources.95

The implications of educational ADS go beyond testing to other areas, such as school assignments and
even transportation. The City of Boston was in the spotlight this year after two failed efforts to address
school equity via automated systems. First, the school district adopted a geographically-driven school
assignment algorithm, intended to provide students access to higher quality schools closer to home. The
city’s goal was to increase the racial and geographic integration in the school district, but a report
assessing the impact of the system determined that it did the opposite: while it shortened student
commutes, it ultimately reduced school integration.96 Researchers noted that this was, in part, because
it was impossible for the system to meet its intended goal given the history and context within which it
was being used. The geographic distribution of quality schools in Boston was already inequitable, and
the pre-existing racial disparities that played a role in placement at these schools created complications
that could not be overcome by an algorithm.97

Following this, the Boston school district tried again to use an algorithmic system to address inequity,
this time designing it to reconfigure school start times – aiming to begin high school later, and middle
school earlier. This was done in an effort to improve student health and performance based on a
recognition of students’ circadian rhythms at different ages, and to optimize use of school buses to
produce cost savings. It also aimed to increase racial equity, since students of color primarily attended
schools with inconvenient start times compounded by long bus rides. The city developed an ADS that
optimized for these goals. However, it was never implemented because of significant public backlash,
which ultimately resulted in the resignation of the superintendent.98

In this case, the design process failed to adequately recognize the needs of families, or include them in
defining and reviewing system goals. Under the proposed system, parents with children in both high
school and middle school would need to reconfigure their schedules for vastly different start and end
times, putting strain on those without this flexibility. The National Association for the Advancement of
Colored People (NAACP) and the Lawyers’ Committee for Civil Rights and Economic Justice opposed the
plan because of the school district’s failure to appreciate that parents of color and lower-income parents
often rely on jobs that lack work schedule flexibility and may not be able to afford additional child care.99

These failed efforts demonstrate two important issues that policymakers must consider when evaluating
the use of these systems. First, pre-existing structural and systemic problems will persist and will likely undermine the potential benefits of these systems unless they are addressed prior to a system’s design and implementation. Second, robust and meaningful community engagement is essential before a
system is put in place and should be included in the process of establishing a system’s goals and
purpose.

In AI Now’s Algorithmic Impact Assessment (AIA) framework, community engagement is an integral part
of any ADS accountability process, both as part of the design stage as well as before, during, and after
implementation.100 When affected communities have the opportunity to assess, and potentially reject, a system that is not acceptable, and to call out fundamental flaws before it is put in place, the validity and legitimacy of the system are vastly improved. Such engagement serves
communities and government agencies: if parents of color and lower-income parents in Boston were
meaningfully engaged in assessing the goals of the school start time algorithmic intervention, their
concerns might have been accounted for in the design of the system, saving the city time and resources,
and providing a much-needed model of oversight.

Above all, accountability in the government use of algorithmic systems is impossible when the systems
making recommendations are “black boxes.” When third-party vendors insist on trade secrecy to keep
their systems opaque, it makes any path to redress or appeal extremely difficult.101 This is why
vendors should waive trade secrecy and other legal claims that would inhibit the ability to understand,
audit, or test their systems for bias, error, or other issues. It is important for both people in government
and those who study the effects of these systems to understand why automated recommendations are
made, and to be able to trust their validity. It is even more critical that those whose lives are negatively
impacted by these systems be able to contest and appeal adverse decisions.102

Governments should be cautious: while automated decision systems may promise short-term cost
savings and efficiencies, it is governments, not third party vendors, who will ultimately be held
responsible for their failings. Without adequate transparency, accountability, and oversight, these
systems risk introducing and reinforcing unfair and arbitrary practices in critical government
determinations and policies.103

1.3 Experimenting on Society: Who Bears the Burden?


Over the last ten years, funding for and focus on technical AI research and development have accelerated. But efforts to ensure that these systems are safe and non-discriminatory have not received the same resources or attention. Currently, there are few established methods for measuring, validating, and monitoring the effects of AI systems “in the wild.” AI systems tasked with significant decision-making are effectively tested on live populations, often with little oversight and no clear regulatory framework.
