Zeitgeist: 2023 AI Readiness Report
Contents
AI Year in Review
AI Adoption Trends
Top Use Cases By Industry
    Insurance
    Retail & eCommerce
    Financial Services
    Logistics & Supply Chain
ML Lifecycle
    Working with Foundation Models
    Data Challenges
    Data Best Practices
    Model Evaluation
Conclusion
About Scale
Methodology
AI Year in Review
65% either accelerated their existing strategies or created an AI strategy for the first time.
A New Era
Generative models are already transforming how we create art, understand our world, and conduct business.
Large language models help us write content
such as blogs, emails, or ad copy more quickly
and creatively. They summarize long-form
content so that we can quickly understand the
most critical information from reports and
news articles. Diffusion models streamline
marketing workflows, enabling marketers
to generate unlimited and infinitely creative
product imagery. Developers use LLMs to write code more efficiently and to identify and fix bugs quickly. Advanced chatbots enable
businesses to improve their customer service
at a lower cost. Finally, organizations are
unlocking the power of their knowledge bases
by customizing LLMs with their proprietary
data to perform better on tasks unique to their
business.
We will now look at a few key terms and trends
essential to understanding this new era of
Generative AI.
Over time, generative models have become more capable as they've increased in size. Model size is typically determined by its training dataset size, measured in tokens (parts of words), or by its number of parameters (the number of values the model can change as it learns).

• BERT (2018) was 3.7B tokens and 240 million parameters.
• GPT-2 (2019) was 9.5B tokens and 1.5 billion parameters.
• GPT-3 (2020) was 499B tokens and 175 billion parameters.
• PaLM (2022) was 780B tokens and 540 billion parameters.

As these models scale in size, they become increasingly capable, providing more incentive for companies to build applications on top. Generative models are now more widely available as many large model developers provide APIs or make them open-source, and companies are quickly adapting these large models to their specific business use cases.

Generative models are trained on a large amount of internet data, making them competent generalists. These models can write poetry, solve logic puzzles, and identify bugs in code. While generative models are great generalists, they are poor specialists when solving problems outside of their data distribution. Since a significant portion of data is proprietary to individual organizations, base large language models are not well adapted to these specific domains. To improve performance on the specific tasks of, say, an insurance company, an eCommerce company, or a logistics company, these models must be fine-tuned and aligned to excel at those particular tasks and provide responses that are useful to customers and employees.
Though Reinforcement Learning from Human Feedback (RLHF) is not new to the research community, in 2022 it catapulted in popularity as it was a critical ingredient in the success of ChatGPT.

Instead of attempting to write a loss function with which to train a model, RLHF involves soliciting feedback from human users and training a reward model on that feedback. This human-defined reward model is then used to train a base model. This also allows training on much more data, since the human feedback is mimicked by the reward model, so the dataset size is now constrained only by how many prompts you can create.

RLHF tuning results in models better aligned to human preferences, producing more detailed and factual responses.

RLHF also defines the "personality" and "mood" of the model, making it more helpful, friendly, and factual than the base model would be otherwise. This means we get responses from the model that feel more human and less like talking to a machine.

RLHF is a critical component of the success of recent LLMs and is also critical to ensuring that enterprises using Generative AI get model responses that align with their policies and brands.

ChatGPT is a large language model that has been tuned specifically for the task of conversational text generation. ChatGPT was trained with RLHF and data in dialogue formats to enable it to act as a conversational chatbot.

ChatGPT quickly became one of the most impactful product launches of all time, reaching 1 million users in just five days, and it currently sits at over 100 million users.

ChatGPT was initially launched with GPT-3.5 but now also includes GPT-4 for ChatGPT Plus subscribers. These models are highly capable at question answering, content generation, and summarization. While they provide more robust, informative, and creative responses than their predecessors, the real breakthrough for adoption was their ability to hold conversations with humans. The ability to interact with the models in an intuitive way increased their accessibility so that anyone can use them.
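The reward-model idea behind RLHF can be sketched in a few lines. The example below is a deliberately tiny, hypothetical illustration (the feature set, preference data, and Bradley-Terry-style update are invented for this sketch, not any production implementation): a reward model is fit to pairwise human preferences, then used to rank candidate responses.

```python
import math

# Toy feature extractor: maps a response string to numeric features.
# (These two features are hypothetical, chosen only for illustration.)
def features(response):
    return [
        len(response) / 100.0,           # rewards more detailed answers
        float("sorry" not in response),  # penalizes unhelpful refusals
    ]

def score(weights, response):
    return sum(w * f for w, f in zip(weights, features(response)))

# Bradley-Terry-style training on human preference pairs,
# where each pair is (preferred_response, rejected_response).
def train_reward_model(pairs, lr=0.5, epochs=200):
    weights = [0.0, 0.0]
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = score(weights, preferred) - score(weights, rejected)
            p = 1.0 / (1.0 + math.exp(-margin))  # P(human prefers "preferred")
            grad_scale = 1.0 - p                 # push the margin up
            fp, fr = features(preferred), features(rejected)
            weights = [w + lr * grad_scale * (a - b)
                       for w, a, b in zip(weights, fp, fr)]
    return weights

pairs = [
    ("Here is a detailed answer with steps...", "sorry, I can't help"),
    ("The fix is to close the file handle before reuse.", "sorry, no idea"),
]
reward = train_reward_model(pairs)

# The trained reward model then ranks candidate outputs (in full RLHF it
# would instead supply the reward signal for policy optimization).
candidates = ["sorry, I can't help", "Close the file handle, then retry the write."]
best = max(candidates, key=lambda r: score(reward, r))
```

In a real pipeline the features would be replaced by a neural network over the prompt and response, but the shape of the loop is the same: human comparisons in, a learned scoring function out.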
As Generative AI is now more capable and widely available, companies are quickly incorporating it into their operations. 72% of companies will significantly increase their investment in AI each year for the next three years.

On their own, base generative models are valuable tools. Paired with a business's proprietary data, they become strong differentiators, improving the customer experience, product development, and profitability.
Adoption Trends
Business leaders have identified that AI is critical to the future of their companies and are looking to adopt it as quickly and with as much impact as possible. We examine this trend and provide insights on best practices.

72% of companies are looking to increase their investment in AI each year for the next three years.
59% of companies view AI as critical or highly critical to their business in the next year, and 69% in the next three years. The increasing capabilities and availability of Generative AI will accelerate AI adoption.

As companies view AI as more critical to the future success of their business, they are increasing AI investments over the next three years. 72% of companies plan to increase their investment in AI each year for the next three years.
52% are investing heavily in LLMs, 36% in generative visual models, and 30% in computer vision applications. With the recently popularized capabilities of LLMs, companies have rapidly shifted their AI strategies to harness the power of Generative AI.
What outcomes are achieved by companies that adopt AI?
As mentioned previously, companies adopting AI are seeing positive outcomes from improved customer experiences, the ability to develop new products or services and improve existing products, and improved collaboration across business functions.
“I really believe that we are at a transformative moment today where ML is moving at an incredible
speed and problems that were thought to be too complex to solve with computers a few years ago,
are now being solved by applying machine learning. So we have this great opportunity. If machine
learning becomes more accessible, the world will move faster, our economy will move faster, science
will move faster.”
—FRANCOIS CHOLLET, AI ENGINEER AND RESEARCHER, GOOGLE
Companies that view AI as critical to their business indicate they have the executive support, strategy and vision, and budget they need to succeed in implementing their company's AI strategy. However, these companies generally lack the necessary expertise, software, and tools required to achieve success.

While leaders have identified the need to adopt AI, the execution of these strategies is difficult, nuanced, and heavily dependent on expertise. The field is moving so quickly that it is difficult to keep up with the pace of advancement. Highly talented people with expertise in Generative AI are simply not available to most organizations. Similarly, selecting, standardizing, and updating the software and tools associated with Generative AI, MLOps, and even DevOps is challenging for companies without dedicated teams to keep up with these changes as the requisite tech stacks are constantly evolving.
“As a product of this shortage in AI talent,
most businesses are missing out on a huge
opportunity to integrate this tech into
products and into their developers’ workflows.
Consumers are missing out on products that
have more magical, intuitive, and smart
experiences. The start of the fix comes with
the product folks making the decision about
what’s prioritized, looking at what they can
do, understanding where the technology is
today, and where they could insert it, and then
building it. I think we need to start making AI a
standard piece of every single product. I don’t
think consumers are going to tolerate dumb
products anymore. We need to make them
much, much smarter.”
—AIDAN GOMEZ,
CEO, COHERE
AI Adoption by Industry
Every industry is looking to increase its AI budgets over the next three years. Those that top the list are:

• Insurance: 80%
• Logistics & Supply Chain: 79%
• Financial Services: 77%
• Healthcare & Life Sciences: 75%
• Retail & eCommerce: 74%
Top Use Cases by Industry
Insurance
Insurance companies look to AI to help them improve customer experience and improve operational efficiency.
Financial Services
Financial services companies look to AI to help them enhance the customer experience, grow revenue, and increase operational efficiency.
Content summarization includes summarizing data sources such as financial statements, historical data, news, and social media. Trend detection is applying AI to data to help identify patterns humans are otherwise ill-equipped to detect.
Working with Foundation Models
As of late 2022, the most widely used foundation models were
BERT, GPT-3, Stable Diffusion, BLOOM, and T5/FLAN.
However, this landscape is quickly changing as more and more powerful generative models are being developed.*
BERT plays a critical but quieter role in many organizations today, providing natural language understanding capabilities at a significantly reduced cost compared to larger models such as GPT-3.

However, this trend is shifting, as BERT is not a generative model, so its use cases are limited compared to state-of-the-art models like GPT-4. The compute required to run these more sophisticated models will become cheaper over time, and companies will have more third-party tools and expertise available to help integrate these larger models into their operations. Open-source models such as LLaMA are already being optimized to run on consumer laptops.
*(ChatGPT/GPT-3.5/GPT-4 were not included in this survey as survey collection began in late 2022 before these models were launched)
64% of companies are fine-tuning foundation models using their proprietary data and knowledge bases. The biggest challenges for fine-tuning foundation models are acquiring training data and the necessary ML infrastructure.

Organizations working with generative models find it challenging and resource-intensive to fine-tune their models in-house. The most effective techniques, like RLHF, require humans to apply feedback to model outputs, custom software, and specialized skill sets to ensure high quality and limit human biases in models.
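The motivation for fine-tuning on proprietary data can be shown with a toy model. The sketch below is not a real fine-tuning pipeline (the "model" is just a unigram counter, and both corpora are invented for illustration); it only demonstrates why a model trained on general data assigns no probability to domain-specific terms until it is further trained on in-house text.

```python
from collections import Counter

# Toy stand-in for a language model: a unigram frequency model.
def train(tokens):
    counts = Counter(tokens)
    return counts, sum(counts.values())

def prob(model, token):
    counts, total = model
    return counts[token] / total if total else 0.0

# "Base" model: trained on general internet text (hypothetical corpus).
internet_corpus = "the cat sat on the mat while the dog ran outside".split()
base = train(internet_corpus)

# "Fine-tuned" model: continued training on proprietary claims data.
claims_corpus = "claim adjuster approved the claim after the deductible".split()
finetuned = train(internet_corpus + claims_corpus)

# The base model has never seen the domain term "claim";
# the fine-tuned model now assigns it real probability mass.
print(prob(base, "claim"))       # 0.0
print(prob(finetuned, "claim"))  # > 0
```

Real fine-tuning updates billions of neural network weights rather than token counts, which is exactly why the data and ML infrastructure challenges described above dominate.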
Data Challenges
Data Best Practices
Companies that invest in good data annotation infrastructure can develop new models, retrain existing ones, and deploy them into production faster.
Model Evaluation
While most of the report has focused on RLHF and Generative AI, companies apply machine learning in many different ways, from object detection to recommendation systems. One critical component of these production ML systems is how companies evaluate and monitor their performance.

Just as we found in 2022, measuring the business impact of models remains a challenge, especially for startups or very small companies (those with fewer than 250 employees). These companies rely more on aggregated model metrics as they are building products and a customer base, so the business impact is difficult to measure. However, small to medium-size organizations (500-9,999 employees) are measuring business impact more than they were even one year ago, at about 73% now (compared to 55% in last year's survey).
Model Evaluation Methods vs. Time to Deployment
As we found in last year's report, ML teams that identify issues with their models fastest are most likely to use A/B testing when deploying models.

Aggregate metrics are a useful baseline, but as enterprises develop a more robust modeling practice, tools such as "shadow deployments," ensembling, A/B testing, and even scenario tests can help validate models in challenging edge-case scenarios or rare classes.

Although small, agile teams at smaller companies may find failure modes, problems in their models, or problems in their data earlier than teams at large enterprises, their validation, testing, and deployment strategies are typically less sophisticated. With simpler models solving more uniform problems for customers and clients, it's easier to spot failures. When the system grows to include a large market or even a large engineering staff, complexity and technical debt begin to grow. At scale, scenario tests become essential, and even then, it may take longer to detect issues in a more complex system.
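Of the validation tools mentioned above, a shadow deployment is the simplest to sketch. In the toy example below (both models, the messages, and the class name are hypothetical), the production model's answer is served to users while a candidate model runs silently on the same traffic, with disagreements logged for offline review.

```python
# Hypothetical stand-ins for a production model and a candidate model.
def prod_model(text):
    return "spam" if "offer" in text else "ham"

def candidate_model(text):
    return "spam" if ("offer" in text or "free" in text) else "ham"

class ShadowDeployment:
    """Serve the production model; run the candidate in the shadow
    and record disagreements for offline review."""
    def __init__(self, prod, candidate):
        self.prod = prod
        self.candidate = candidate
        self.disagreements = []

    def predict(self, text):
        live = self.prod(text)          # the answer users actually see
        shadow = self.candidate(text)   # evaluated silently, never served
        if live != shadow:
            self.disagreements.append((text, live, shadow))
        return live

deploy = ShadowDeployment(prod_model, candidate_model)
for msg in ["free gift inside", "limited offer now", "meeting at 3pm"]:
    deploy.predict(msg)
# deploy.disagreements now holds ("free gift inside", "ham", "spam")
```

Because the candidate's output is never served, this pattern surfaces edge cases on real traffic with zero user-facing risk, which is why it pairs well with the A/B tests and scenario tests discussed above.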
While we have covered best practices for evaluating models, we must note that, for the first time, the performance of Generative AI models is nearly impossible to evaluate automatically. This is because there are many ways for the model to respond appropriately, and human judgment is required to evaluate correctness. That means any sound model evaluation, whether done by a generative model builder or an enterprise, will require human-in-the-loop validation and verification on an ongoing basis.
“Large generative models are already
giving people a productivity boost—
we’ve seen how these systems help
people write, code, learn, and more.
We expect the capabilities of these
models to rapidly improve, possibly
beyond our imagination. If we can
learn how to safely integrate AI
into businesses by creating helpful,
harmless, and honest systems, it
could have a transformative effect
on the economy and industries as we
know them.”
—JARED KAPLAN, CHIEF SCIENTIST, ANTHROPIC
Methodology
This survey was conducted online within the United States by Scale AI from December 15, 2022, to January 25, 2023. We received 2,909 responses from ML practitioners (e.g., ML engineers, data scientists, development operations, etc.) and leaders involved with AI in their companies. After data cleaning and filtering out those who indicated they are not involved with AI or ML projects and/or are not familiar with any steps of the ML development lifecycle, the dataset consisted of 1,699 respondents. We examined the data as follows:

When asked to describe their level of seniority in their organizations, over one-third of respondents (37%) reported they are an individual contributor, nearly one quarter (25%) said they function as a team lead, 33% are a department head or executive, and 4% are owners. Most come from small companies with fewer than 500 employees (39%) or large companies with more than 10,000 employees (21%). The industry breakdown includes … sciences (4%), media/entertainment/hospitality (3%), manufacturing (2%), and other (6%).

Many respondents (31%) represent organizations that are advanced in their AI/ML adoption—they have multiple models deployed to production that are regularly retrained. About 18% are slightly less advanced—they have multiple models deployed to production—while 8% have only one model deployed to production, 12% are developing their first model, and 11% are only evaluating use cases.