A Better AI Interface for Reviewing Contracts
Contextualized LLM output on the Screens platform

A defining feature of generative LLMs is their ability to produce text given a prompt. This has naturally led LLM-powered products to converge on a consensus user experience: the conversational chatbot. A chatbot whose prose looks and feels human is part of what took the world by storm with the launch of ChatGPT. In business applications, this interface is popular for use cases such as automated customer service and agent-like virtual assistants.

But try to use chatbots like ChatGPT for repeatable and scalable processes and you’ll quickly struggle with:

  1. Collaborating with your team

  2. Provisioning access to your private data

  3. Organizing historical conversations

  4. Producing a coherent work product

  5. Iterating on prompts

  6. Getting concise, decisive answers rather than verbose, wishy-washy output

  7. Building repeatable workflows

Entrepreneurs and product managers are working on solving these problems. But for many use cases, the chatbot interface starts to feel like a square peg in a round hole. Not all LLM-powered applications need to be conversational chatbots. When LLMs are being used to power repeatable workflows, verbose conversation is a bug rather than a feature. 

Beyond the Chatbot

Consider reframing the core appeal of LLMs: not chatbots, but scalable reasoning engines. Their ability to follow instructions and perform advanced reasoning across a breadth of domains is often more useful than plausibly human-sounding conversational prose.

A rich user experience that is purpose-built to solve a particular set of business problems isn’t a new idea: that’s a SaaS app. Today these are often hosted as web applications, with the user experience tailored to a specific job to be done. LLMs bring two novel improvements to traditional SaaS apps:

  1. A natural language instruction interface – This enables infinite user-driven customization. In the past, machine learning engineers defined application-level behavior by guiding the AI at train time; now the application’s actual users drive the AI at prediction time. AI is no longer narrowly confined within an application to the specific task it was trained to do. It is now general purpose.

  2. Real-time advanced reasoning abilities – Nearly anything you can imagine can be reasoned about, at least to some extent, within interactive software applications in real time. Use cases are no longer bound by the application developer’s ability to collect data and train a bespoke AI model at a discrete point in time.

Taking these two concepts beyond the chatbot will power a new generation of business applications that unlock more of what LLMs have to offer.

Reviewing Contracts with LLMs

At TermScout, we’ve been building contract review software on top of machine learning and AI for years. When ChatGPT came online in November 2022, we dropped everything we were doing to start reviewing contracts with it. Since then, LLMs have continued to improve, and we think that they are ready to review contracts out in the wild given appropriate guardrails. 

But we ran into all of the problems outlined above when we tried to do any serious contract review work with a chatbot. Even when it could perform the reasoning, the output was too verbose and repeatable analysis was hard to organize. It was tough to share, collaborate on, and collate the output into something useful.

More substantively, we experienced two major issues:

LLMs aren’t great at knowing what to look for in a contract. For example, I asked ChatGPT (using GPT-4) the following:

[Screenshot: ChatGPT's ideas for contract review criteria]

On the surface this response probably seems okay. When you consult an expert on this type of contract, as TermScout has done with a combination of in-house expertise and our contract advisory panel, you’ll get criteria that look more like this:

The customer must not assign any intellectual property to the vendor except for rights to feedback provided by the customer.

The vendor must not completely disclaim all liability.

If there are any restrictions on the customer's ability to compete with the vendor, such restrictions must only apply to the use of the vendor's services for purposes of competing.

...

These are more precise. The wording lends itself to a clear pass or fail for any specific contract, rather than a point of discussion, as ChatGPT’s wording implies. Phrasing is intentional and exceptions are made clear.

It’s hard to instruct LLMs to provide the answers that you really want. If a query is too vague, the answer will be too. If a query is too exact, the LLM takes it too literally. Too often, the answer is "it depends," and you have to plead with the LLM to get a yes or no. When reviewing a contract programmatically, you need yes or no answers; otherwise you are left with even more reading to do, which starts to defeat the purpose of automation.
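One way to make verdicts programmatic is to constrain the prompt and parse the reply defensively. The sketch below is illustrative only – the function names and prompt wording are our own invention, not any product's actual implementation. It demands a one-word verdict up front and routes anything hedged to human review instead of guessing:

```python
import re

def build_criterion_prompt(criterion: str, contract_text: str) -> str:
    """Wrap one pass/fail criterion in instructions that demand a decisive verdict."""
    return (
        "You are reviewing a contract against a single criterion.\n"
        f"Criterion: {criterion}\n\n"
        f"Contract:\n{contract_text}\n\n"
        "The first line of your answer must be exactly one word: PASS or FAIL.\n"
        "Then give a one-sentence justification citing the relevant clause."
    )

def parse_verdict(response: str) -> str:
    """Map the model's reply to PASS, FAIL, or UNCLEAR (route to human review)."""
    stripped = response.strip()
    first_line = stripped.splitlines()[0].upper() if stripped else ""
    match = re.match(r"(PASS|FAIL)\b", first_line)
    return match.group(1) if match else "UNCLEAR"
```

A reply that opens with "It depends..." comes back as UNCLEAR, so hedged answers surface for a human rather than silently counting as a pass or fail.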

We’ve found that to get a great contract review out of an LLM, you need to iterate on the language that you use to query it. You need to provide descriptions of how you and your team define terminology:

  • What do you mean when you say broad usage rights?

  • What exactly would constitute protection of confidential information?

  • What do you consider a commitment to security standards?

Even if you think wording like this is industry standard, if you don't clarify, the LLM will assume. If you are using an LLM that has been fine-tuned on legal terminology, you might disagree with the definitions it was trained on. We don't want to rely on the LLM's definitions; we want to guide it with our own, and enable our customers to do the same.
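One lightweight way to supply your own definitions is to inject a glossary ahead of every question. This is a hypothetical sketch under our own naming (`build_glossary_prompt` is not an actual API), just to show the shape of the idea:

```python
def build_glossary_prompt(question: str, definitions: dict) -> str:
    """Prepend team-specific definitions so the LLM uses our terminology,
    not whatever it would otherwise assume."""
    glossary = "\n".join(f"- {term}: {meaning}" for term, meaning in definitions.items())
    return (
        "When answering, interpret the terms below using ONLY these definitions:\n"
        f"{glossary}\n\n"
        f"Question: {question}"
    )

prompt = build_glossary_prompt(
    "Does this contract grant the customer broad usage rights?",
    {
        "broad usage rights": (
            "a license to use the product for any internal business purpose, "
            "without per-seat or per-location restrictions"
        ),
    },
)
```

Because the definitions travel with every query, two teams can ask the same question and get answers calibrated to their own reading of the terminology.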

We want to combine the LLM's ability to perform complex reasoning with expert opinions about how to interpret contracts – and deploy this to large numbers of contracts at scale.

This is what led us to develop Screens.   

Screens

Screens lets subject matter experts configure LLMs to review contracts. Anyone can then apply the embedded knowledge of those experts to any contract of their choosing in a fast, scalable, and repeatable way. These screens are portable and customizable. Users can create their own screens from scratch and they can be shared within private workspaces. The platform walks you through validating the accuracy of the standards that you set for a screen and helps you iterate on the language you are using to prompt the LLM. 
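Conceptually, a screen is a named, shareable bundle of precise criteria that can be applied to any contract. The sketch below is our own illustration of that idea, not the platform's actual data model – `Criterion`, `Screen`, and `run_screen` are hypothetical names:

```python
from dataclasses import dataclass, field

@dataclass
class Criterion:
    text: str                                         # precise pass/fail wording
    definitions: dict = field(default_factory=dict)   # team terminology for the LLM

@dataclass
class Screen:
    name: str
    criteria: list   # list of Criterion, reusable across contracts

def run_screen(screen: Screen, contract_text: str, ask_llm) -> dict:
    """Apply every criterion in a screen to one contract.
    Returns {criterion text: verdict}; ask_llm is any callable that
    takes (criterion, contract_text) and returns a verdict string."""
    return {c.text: ask_llm(c, contract_text) for c in screen.criteria}
```

Because the screen is just data, it can be shared within a workspace and rerun, unchanged, against a whole folder of contracts.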

The result is a contract review platform that focuses on expert consensus around what matters in contracts, precise language for querying LLMs, and calibrating for accuracy. It enables repeatable and consistent contract reviews, bulk contract analysis, tagging and organization, and much more. 

We hope you’ll give Screens a try.
