
Creating a 30-Million-Rule System: MCC and Cycorp

Douglas Lenat

IEEE Annals of the History of Computing, Volume 44, Number 1, January-March 2022, pp. 44-56 (Article)

Published by IEEE Computer Society

For additional information about this article: https://muse.jhu.edu/article/853382

THEME ARTICLE: EXPERT SYSTEMS: COMMERCIALIZING ARTIFICIAL INTELLIGENCE

Creating a 30-Million-Rule System: MCC and Cycorp

Douglas Lenat, Cycorp, Austin, TX, 78746, USA

Hard-won discoveries led to early expert systems (ESs) successes, then overhyping, and then disillusionment. The bottleneck was infrastructure: representation languages of limited expressivity, inefficient inference engines, inadequate ontologies, and a lack of common sense, general theories of the world, and argumentation and context mechanisms. At Microelectronics and Computer Technology Corporation and then at Cycorp, we have systematically codified much of the "obvious" knowledge of the world that one rarely articulates since "everyone" of course already knows it. Lack of that solid infrastructure limits AIs' trustworthiness: they make mistakes no human would make, and they cannot explain their reasoning. This is the story of my ESs experience and how 50 years of lessons learned led my team to steadily and successfully construct an enormous knowledge-based system that avoids such brittleness.

1966–1975: THE ADOLESCENCE OF EXPERT SYSTEMS (ESs)

Starting with DENDRAL [1] in 1965, a new software development paradigm began to take shape. The essence of it was to observe experts performing some tasks and get them to introspect and articulate; thereby, you could ferret out more and more of the if-then "rules of thumb" they utilize, which enable them to do that task. You could then represent such pieces of expertise as rules for an inference engine to run. In contrast to this new knowledge engineering paradigm, conventional software engineering (then and now) works by flowcharting and algorithmic design, then translating that worked-out algorithm into a procedural program. ESs, however, are built incrementally, one nugget of wisdom at a time, and eventually, hopefully, the expert system approaches competence at its task.

This approach requires much the same leap of faith that we take when we educate human beings to carry out a new task. There, and in ESs, we cannot be sure that the incremental education process will work, but it usually does—witness the many ES success stories [2], [3].

1058-6180 © 2022 IEEE. Digital Object Identifier 10.1109/MAHC.2022.3149468. Date of publication 14 February 2022; date of current version 18 March 2022.

44 IEEE Annals of the History of Computing Published by the IEEE Computer Society January-March 2022

One huge benefit of any ES is transparency: it just recounts the exact sequence of reasoning steps (i.e., rule firings) that led to its answer. In other words, ESs cannot help but be able to fully explain their conclusions.

This is much like a student working on a math problem with pencil and paper; unless they erase their work, it is available for inspection. A teacher or tutor will often be able to "debug" a student's mistake by following along that trace, identifying the exact step at which they made their mistake, and then determining the one or two most likely wrong procedural notions, or missing concepts, which will teach the student exactly what they need to learn, to avoid making that sort of error in the future. That same educational process applies when an ES gets an answer wrong: a human expert examines its reasoning steps, identifies the place where the ES went awry, and directs a knowledge engineer to fix (or add) the incorrect (or missing) if-then rule. Eventually, that ES performs well enough, case after case, and is trusted. This is not unlike the middle ages' apprentice/journeyman/master guild system, or our current interns/residents/attendings medical training system.

My personal research during this decade mirrored the path that the field as a whole was following. In high school, in the mid-1960s, I wrote procedural programs in Fortran to perform statistical tests on experimental data for the nearby Beaver College psychology
department. I painstakingly flowcharted and implemented a huge decision tree that answered operations manual questions for sailors aboard a U.S. aircraft carrier. And as a University of Pennsylvania undergraduate, in 1970, I programed pattern recognition algorithms to connect cloud-chamber "dots" into continuous particle tracks.

When I entered the Ph.D. program at Stanford in 1972, my advisor, Cordell Green, asked me to write a symbolic AI program that could in turn write other target programs. It was given a small set of representative input/output (I/O) pairs for the target program. For example, in one case, we gave it a few lists as inputs, and their reverses as outputs; the program-writing program, which I called PW1, was then able to automatically write a function that reversed a list. How did it accomplish this? PW1 had one short recursive LISP program schema with nine "blanks" to fill in—nine variable names—and a list of 12 auxiliary functions (such as first-element, plus, and append), which could fill in those nine blanks. Unlike Friedberg [4], who mutated machine-language programs, I was able to use LISP, a much higher level programing language, so a large array of target programs (Reverse, Fibonacci, Flatten, Intersect, Maximum, Sort, etc.) could be quickly and successfully synthesized, each starting from just a few I/O pairs, because every target program fit PW1's schema [5]. But, just as Friedberg had learned in 1958 [4], these early successes are misleading because the approach takes exponentially longer as the length of the target program increases.

I was able to improve the performance and sophistication of the target programs a little bit by adding a few heuristic rules, such as which of the auxiliary functions made things bigger (made numbers larger and made lists longer), and vice versa. Learning one of those LISP programs, given a few I/O pairs, took about 1 min of PDP KI-10 time, but learning a realistic-sized million-line software system that way would require 2^1,000,000 times as much computation, many orders of magnitude longer than the age of the universe, even on today's fastest supercomputers.

Taking more seriously how people write nontrivial programs, my 1973–1974 program-synthesizing program (PUP) [3] was more like an ES. It started not from a few I/O pairs, but rather it received the specification of—and wrote—the target program incrementally and hierarchically, by carrying on a back-and-forth dialog in English with the user (the person who wanted that program written). PUP was able to write LISP programs that were several pages long, not just a few lines long, including Pat Winston's recent thesis program (which itself was a machine learning program) [6] and an airline seat reservation system. PUP superficially seemed very successful, but analyzing its results led me to realize that much of that intelligence was in the eye of the beholder—much of the "I" in that AI was in the "I" of the observer. PUP succeeded because we first chose the target programs, then analyzed what dialogs could successfully synthesize them, and finally codified exactly the knowledge and rules and auxiliary functions needed to carry out those dialogs. Therefore, it is not surprising that PUP hit those targets, and it is not surprising that it could not successfully carry out dialogs to write much else besides those large but preanalyzed target programs.

My take-away from PW1 and PUP was that it was too easy to build an AI that accomplished a narrow, fixed, prespecified task, even a superficially large task. We needed to pick a much more open-ended task, where we did not know exactly where the AI would end up. The result, in 1975, was "Automated Mathematician" (AM) [7], [8], which was given some starting set-theory concepts and sent off to discover "anything interesting." AM was a theorem-proposer, not a theorem-prover. It had heuristics for gathering data, noticing regularities, and deciding when those might be interesting (or not). I expected it to develop more sophisticated set theory, but instead it quickly uncovered a unary representation for natural numbers, and then hared off discovering hundreds of things in number theory, many of which had been unknown (at least to me). On one memorable occasion, one of my advisors, George Polya, was looking at its results, and remarked "That reminds me of something a student of a friend of mine once did." He rummaged through an old trunk, found the relevant correspondence, and it turned out that his friend was G. H. Hardy, and the student was Srinivasa Ramanujan! Even though that regularity (involving highly composite numbers) has no practical significance, Polya and I were happy to see AM behaving much like the young self-taught Indian genius had, in his explorations in search of interesting regularities.

From PW1, I learned that an AGI—a general human-level AI—would take trillions of years to evolve; from PUP, I learned that it would be impressive but brittle to try to produce an AGI by prespecifying its behavior in great detail; but from AM, I learned that a general human-level AI might be built by educating it incrementally toward intelligence. This is much like how humans are educated, giving them facts, but more importantly, giving them rules of good judgment and good guessing, rules for carrying out some analog of the scientific method, and rules for judging the

interestingness of the results of those real-world and thought experiments.

1976–1985: THE HEYDAY OF "TRUE" ESs

Much like the 1956 Dartmouth Conference [9] exuberance, in 1976, we were all overoptimistic about how ESs would lead to human-level general AI in the ensuing decade.

Others can recount the dramatic splash of knowledge-based systems and companies during this decade [2], [3]; meanwhile, I was trying to "fix" AM: its rate of making new discoveries slowed down, because it never learned new heuristics beyond the 286 "interestingness" rules I originally programed into it. So, as a professor at Carnegie Mellon and Stanford Universities, my research focused on how ESs could learn new if-then rules by mutating existing ones, trying them out, and empirically judging how well the mutants did. The result, EURISKO [10], did make some clever discoveries of new rules, which led it to, e.g., design bizarre-but-lab-bench-verified three-dimensional VLSI chips and bizarre-but-winning fleets in national wargame tournaments, discovering new design heuristics even when the organizers, Game Designers Workshop, changed the tournament rules significantly, at nearly the last minute, specifically to try to prevent EURISKO from winning again [11].

After a few years, EURISKO began to slow down in its rate of discoveries, albeit not as quickly as AM had. What was going on? John Seely Brown and I analyzed the sources of power in AM and EURISKO, and (in an event sadly all too infrequent in academia) we wrote an article—"Why AM and EURISKO Appear to Work" [12]—with the bad news as well as the good news. Using a high-level programing language meant that there was a better impedance match between the syntax of a program and its semantics, so syntactic mutation of math concepts (AM) and heuristic if-then rules (EURISKO) were able to succeed relatively often. But that attenuated as the objects being mutated got farther and farther away from the lovingly hand-crafted ones we started with, where that syntax-semantics impedance match was very strong. This had occurred at the object level with AM, and then, more slowly, at the metalevel with EURISKO. Should we put in meta-meta-rules? That seemed too much like recapitulating the Ptolemaic rabbit hole of cycles, epicycles, and epi-epicycles.

Well, okay, but why don't humans slow down and "peter out" when dealing with novel situations, then? Why aren't we brittle like that? One thing we do is analogize to specific, but superficially far-flung

THE RISE OF ESs COMPANIES

In 1980, the Stanford Heuristic Programming Project (HPP) leaders—Feigenbaum, Buchanan, and myself—and staff members founded Teknowledge, Inc., to commercialize expert systems. We took MYCIN, discarded its 400+ if-then medical diagnosis rules, and licensed the result, called E-MYCIN (for "Empty MYCIN"), as a generic expert systems "shell": a platform one could use to build their own expert systems by adding new application-specific rules.

HPP also birthed a second 1980 start-up, Intelligenetics, commercializing the genetic-engineering-focused MOLGEN expert system. In 1983, it dropped that focus, genericized its name to Intellicorp, and thus began directly competing with Teknowledge. Faced with a conflict of interest, Ed Feigenbaum recused himself from Teknowledge's Board. Shortly thereafter, I grew uncomfortable with Teknowledge prioritizing profits over its mission, and also resigned from its Board. It eventually went public, but by that time our founders' stock was diluted homeopathically.

Other notable early expert systems commercialization experiments included the Carnegie Group (generalizing CMU researcher John McDermott's XCON expert system, which had originally been built to configure orders for Digital Equipment Corporation products); Inference Corporation (which pioneered some elements of explicit context, so some rules could be true in some contexts and false in others); AION; Neuron Data; and the tragically named AIDS (AI Decision Systems).

But the expert systems companies languished, ultimately getting acquired and absorbed, their staff reassigned, their technology mothballed. Below ("1985–1995"), I share my thoughts on why.

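The "shell" idea in the sidebar above (an inference engine plus an initially empty rule base, to which users add application-specific if-then rules) can be sketched in a few lines. The sketch below is a deliberately toy forward-chainer, not E-MYCIN's actual architecture: E-MYCIN itself was a backward-chaining engine with certainty factors, and all rule and fact names here are invented for illustration.

```python
# A minimal expert-system "shell": a domain-independent inference
# engine plus an initially empty rule base.  The shell itself knows
# nothing about medicine or any other domain; users add their own
# application-specific if-then rules.  (Toy forward-chainer for
# illustration only; E-MYCIN was a backward-chainer with certainty
# factors, and these rule/fact names are invented.)

class Shell:
    def __init__(self):
        self.rules = []  # list of (antecedent-set, consequent) pairs

    def add_rule(self, if_all, then):
        self.rules.append((frozenset(if_all), then))

    def run(self, initial_facts):
        """Fire rules until quiescence, recording each rule firing so
        the system can recount the reasoning steps behind an answer."""
        facts = set(initial_facts)
        trace = []
        changed = True
        while changed:
            changed = False
            for if_all, then in self.rules:
                if if_all <= facts and then not in facts:
                    facts.add(then)
                    trace.append((sorted(if_all), then))
                    changed = True
        return facts, trace

# Usage: a two-rule toy diagnostic ES built on the empty shell.
es = Shell()
es.add_rule({"fever", "cough"}, "flu-suspected")
es.add_rule({"flu-suspected", "short-of-breath"}, "see-doctor")
facts, trace = es.run({"fever", "cough", "short-of-breath"})
```

The `trace` returned by `run` is the transparency property described in the opening section: the ES can replay exactly which rule firings led to its conclusion.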

knowledge. And we fall back on more and more general knowledge, such as "unplug it and plug it back in again."

So, should we represent the knowledge of, say, an encyclopedia, and then build our ESs on top of that foundation? Alan Kay's Atari Labs team and I began researching that question in 1983 [13], just before America's videogame industry crashed. One casualty was that research project. But by then I had realized that general AI needed not just encyclopedic facts but also—and even more importantly—their prerequisites: the various things that everyone already knows but are not considered worth writing down because they are things that everyone already knows!

Starting MCC

I convened an informal meeting, a microversion of the Dartmouth Conference [9], at Stanford in the summer of 1983, to figure out just how much effort would be involved in such a pump-priming undertaking. Everyone approached the problem from a different angle. For example, Marvin Minsky did some calculations on the back of an envelope (he waited until we could procure an envelope for him!) taking the rate at which humans learn—a few ideas per minute get permanently "burned into" our brains—and multiplying that by the age of a toddler who has "enough" common sense. The natural language AI folks multiplied the number of words that a speaker typically knows by how "much" he/she typically knows about each word. I analyzed a few encyclopedia articles, ferreting out how much "there" there was—how much content, if expressed in if-then rule form—and multiplying by the total number of articles.

This cacophony of diverse methods yielded a surprisingly consistent result: a small number of millions of pieces of knowledge would need to be hand-coded to prime the knowledge pump, after which two nearly automatic methods should become more cost-effective: i) learning by natural language understanding—carrying on conversations and reading online texts; and ii) learning by the scientific method—designing and carrying out experiments and formulating new hypotheses based on the empirical results, much like AM and EURISKO did.

We then multiplied this by the rate at which we were able to write and test if-then rules in ESs (about 80 minutes per rule, including the overhead for knowledge elicitation up front, and the overhead for debugging and testing). Therefore, we estimated that hand-engineering those three million rules would take two person-millennia of effort. That was doable, but not readily doable within academia, where graduate students had to do theses that differentiated their work from others', and where faculty had to publish articles and grant proposals that differentiated their work from others'. We needed one unified Manhattan-Project-scale team effort. A whole new AI R&D framework would need to be created, and funded, somehow.

Fortuitously, from the point of view of catalyzing events central to this narrative, Japan had recently announced its Fifth-Generation Project [14]. Japan had been looking for its next high-payoff industrial frontier to conquer, to wrest control away from the West, much as it had just finished doing with consumer electronics and then automobile manufacturing and most recently, at that time, videogame consoles and games running on those consoles. What fit the bill perfectly were parallel processing and ESs-type AI. Japan's MITI budgeted $850M for the effort, and its large high-tech corporations pledged matching funding and personnel.

This greatly worried large U.S. high-tech companies (back in those days, they mostly were U.S. companies, not multinationals), and that in turn worried the U.S. government. The reaction from Congress was swift and bold and decisive (that is another way you can tell this was several decades ago!). It quickly passed NCRA, the National Cooperative Research Act; in effect it said: "Hey, you large U.S. high-technology companies, normally it would violate antitrust laws for two or more of you to collude on computer- and AI-related R&D, but for the next ten years we promise to ignore that."

This gave the green light to those companies to form research consortia. They lost no time doing so. The first was the Microelectronics and Computer Technology Corporation (MCC), formed by ten U.S. companies each of which contributed a few million dollars a year to fund it: Advanced Micro Devices, Control Data, Digital Equipment Corporation, Harris, Honeywell, NCR, National Semiconductor, RCA, Sperry-Univac, and Motorola. By 1992, 13 additional member companies joined (Bell Labs, Microsoft, Apple...) [15]. Bobby Ray Inman became its director, and despite attractive offers from Atlanta, Boston, Palo Alto, and Research Triangle Park in North Carolina, Admiral Inman chose Austin as the place to locate it. Not only was Austin much more affordable, but it already had many high-tech companies, and was a melting pot of education, government, and the arts, and it offered MCC a beautiful new building rent-free for a decade.

MCC was created and endowed to pursue large, high-risk, high-payoff, and long-term research projects. It was understood that many of those projects would not succeed, but the few successes would keep America competitive with Japan.

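That estimate is easy to check arithmetically. Taking three million rules at about 80 minutes each, and assuming roughly 2,000 working hours per person-year (that conversion factor is my assumption, not a figure from the 1983 meeting), the total does come out to about two person-millennia:

```python
# Back-of-envelope check of the pump-priming estimate.  The rule
# count and minutes-per-rule come from the text; the 2,000 working
# hours per person-year is an assumed conversion factor.
rules = 3_000_000
minutes_per_rule = 80
hours_per_person_year = 2_000  # assumption: ~40 h/week, 50 weeks/year

total_hours = rules * minutes_per_rule / 60
person_years = total_hours / hours_per_person_year
print(person_years)  # 2000.0, i.e., roughly two person-millennia
```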

Admiral Inman visited me and observed, "Professor, this big AI project you want to do, it's important, it's exactly what MCC has been set up to do. If you stay here with a few graduate students, it'll take you 300 years, or you can come to Texas, and you'll probably live to see it completed!" A compelling argument, and one which basically turned out to be exactly right. But the Stanford Computer Science Department had just voted that I could get tenure, one year early. Fortuitously, in hindsight, the dean, Gordon Bower, asked me to wait a year—he wanted to send a message to stop "all these early-tenure cases." That tipped the scales; I accepted the offer to become the Principal Scientist at MCC, at least for a year, and moved to Austin's burgeoning "Silicon Prairie." That year spoiled me: the rate of research progress was so great that I never returned fulltime to academia, though I flew back to teach a course every year or so at Stanford [16].

There were several related R&D threads MCC needed to pursue, including knowledge-based natural language understanding, ESs that were easier to build and debug, and user interfaces that learned to better anticipate what the user wanted to see when/where/why. There were also infrastructure projects that we launched, to develop more cost-effective storage devices, smaller microchip packaging, and scalable deductive databases to accommodate the inevitable tidal wave of big data, which did indeed arrive and inundate the world about as we expected. And out in front was my flagship AI project, the previously mentioned massive effort to codify the human common-sense knowledge and develop representation and reasoning techniques as broad, expressive, and efficient as humans used. I dubbed that project Cyc.

MCC 1984–1994: ESs TAKE AN EVOLUTIONARY LEAP

MCC provided orders of magnitude more resources than my academic tidepool: 400 researchers for ten years, a total of over $500M invested by ultimately 23 MCC member companies. I made good use of that. We never had to stop at "good," for commercial or government-program or publish-or-perish reasons—we could keep hypothesizing, developing, testing, and improving our understanding and our technology.

This was the most productive decade in my research life. My team and I identified and got a solid start at building the missing infrastructure without which ESs were doomed to be brittle, and without which ES building was doomed to remain a delicate art or cottage industry rather than an engineering practice.

As I said earlier, the "missing think" between people and AI systems was that people had common-sense knowledge and reasoning abilities, and AI systems did not. When confronted with unexpected situations—which happen all the time except in carefully prescribed applications—humans fall back on increasingly general knowledge and rules of thumb. How could we identify and represent all that, let alone build that into an efficient infrastructure for ESs?

My two tenets were as follows.

› What we are doing is not research; it is very long-term development, so we should tackle the hard problems head-on (rather than pursuing bump-on-a-log opportunities good enough for published articles). And we did!
› There will be failures and dead ends along the way. And there were!

Fortunately, the first few false starts were measured in months, not years.

Distracted by the Black Space Versus the White Space, and by Kids

I already mentioned that my naïve starting point was focusing on encyclopedia articles. Encyclopedia Britannica was the gold standard, back then. We started representing articles, but quickly realized they contained largely the complement of common knowledge. For example, Britannica's Water article [17] is 3744 words long, and none of those words are "drink"! What we really wanted was to codify what all the other 40,000 articles besides that one—not that one!—already assumed that the reader knew about water, which we could ferret out by examining each instance where those other articles used the word "water."

Another of our brief false starts involved looking at young children's books and talking with children to get a better handle on what they believed and how they reasoned. Children's books do have limited vocabularies. But complicated concepts and themes are still present, encoded into those smaller vocabularies, and that can actually make understanding harder, e.g., "smurf" can mean anything. They introduce one-off neologisms, e.g., in Whoville live "fah who foraze and dah who doraze." Authors freely employ anthropomorphisms violating common sense, e.g., animals talk and wear clothes with no explanation given at all, but (usually) do not fly. The writers can do all this because their young audience already has common sense, and very quickly learns the various fictional conventions of the genre. So, this was a dead-end.

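The "examine each instance where other articles used the word" idea can be mechanized as a first pass: collect every sentence in a corpus that mentions the term, then introspect on what each one takes for granted. A toy sketch, in which the three-sentence corpus and the helper name `mentions` are invented for illustration:

```python
import re

# Toy version of the "examine every use of the word" technique: pull
# each sentence that mentions a term, so one can ask what the writer
# assumed the reader already knew about it.  The three-sentence
# corpus is invented for illustration.
corpus = (
    "She drank a glass of water. "
    "The crops failed because no water fell for months. "
    "He wrung the water out of his shirt."
)

def mentions(term, text):
    """Return every sentence of `text` containing `term` as a word."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = rf"\b{re.escape(term)}\b"
    return [s for s in sentences if re.search(pattern, s, re.I)]

hits = mentions("water", corpus)
# Each hit points at unstated common sense: water is drinkable,
# crops need (rain) water, cloth soaks up water and can be wrung out.
```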

FIGURE 1. We spent more time than we should have, tinkering to get Cyc’s upper ontology “right.”

We held interactive sessions with young children, talking with them and studying their behavior. That was entertaining (for them and for us) and helped us identify several useful pieces of common sense, but it proved much too slow to adopt as our primary methodology for identifying the millions of pieces we knew would be needed.

Within a few months, we developed the following two powerful techniques, which we still use to identify the unstated common sense underlying almost any text—novels, ads, news, emails, etc., not just encyclopedia articles.

› Look at any ambiguity (e.g., a pronoun or polysemous word) and ask yourself: What did the writer assume I already knew, which would let me disambiguate that? In the sentence "The horse was led into the barn while its head was still wet," the phrase "its head" means the horse's head. But change "head" to "roof," and "its roof" clearly refers to the barn's roof. In each case: Why? Because horses have heads but not roofs, and vice versa for barns. That is common sense. For fun, but also because it proved useful, we would sometimes look at The Weekly World News, and introspect on what we already knew that caused us to disbelieve what it was saying.
› Look at the space between two sentences, and ask yourself: What did the writer assume I'd infer there? What did he/she omit, knowing I'd find those things unnecessary, tedious, confusing, even insulting? Consider: "The bank was robbed. Fred went to prison for 15 years." Between those two sentences, the writer expects us to infer that Fred was (probably) the robber, and that in that event, he was caught, arrested, arraigned, went to trial, a jury found him guilty, a judge sentenced him, etc. Even that one simple example of introspecting led us to articulating several elements of common sense about crime and punishment.

Distracted by Upper Ontology

Every ES makes distinctions, which carve up (its represented corner of) the world into a set of terms or concepts, what philosophers call an "ontology." To distinguish what we were doing, building Cyc, from what everyone else was doing, building ESs, we dubbed our activity "ontological engineering" to distinguish it


from “knowledge engineering.” OE versus KE. What symmetric, and it was extremely important to get it
started as a joke, a consciously pretentious borrowing right, or else you would be building your eventual soar-
of that term from philosophy, has since become the ing temple of knowledge on a faulty foundation.
standard way of referring to the skeleton of an ESs As we eventually came to realize, though, one’s
(i.e., sans the rules), so the joke, I suppose, is on us. upper ontology mostly just impacts efficiency. Even if
In 1984, we began building a broad “upper you have a suboptimal upper ontology, eventually you
ontology” of general terms, which could serve as an will build up a middle and lower ontology of increas-
interlingua between potentially any and all ESs. We ingly specialized concepts, which is adequate to get
continue enlarging that foundation for Cyc, but only by in the world, adequate to communicate with other
rarely nowadays do we have to go back and tinker people and databases, and applications and ESs.
with those tried-and-true concepts and distinctions This was driven home to us at the 1986 Fifth-Gen-
from decades ago. eration Project conference in Tokyo, when we saw the
ontology built by Japan’s answer to Cyc, named Elec-
tronic Dictionary Research (EDR). Their topmost dis-
tinction was between things with souls and things
THE IMPORTANT LESSON WAS: without souls. And large trees were in the former cate-
MAKING SUBOPTIMAL ONTOLOGY gory, whereas small trees were in the latter category.
CHOICES JUST MEANS THAT YOUR We were astonished. . . but. . . so what!? We could still
ONTOLOGY AND KNOWLEDGE BASE communicate with those people, and they with us;
MIGHT HAVE TO BE BIGGER, MORE they seemed to us to have common sense, and vice
VERBOSE, TO MAKE UP FOR THOSE versa. They and their EDR system knew that both
MISSED GENERALIZATION types of trees needed water and sunlight and had
OPPORTUNITIES. roots, etc., they just had to represent each of those
assertions as two separate rules instead of one, as we
did in Cyc. No big deal.
The important lesson was: Making suboptimal ontol-
Somewhat ironically—self-defeating the intent of ogy choices just means that your ontology and knowl-
having one standard—there have been many alterna- edge base might have to be bigger, more verbose, to
tive efforts over the past few decades that attempted make up for those missed generalization opportunities.
the same thing [18]. Almost all of these projects resulted As another extreme example, consider this Philos-
in a relatively small number of terms compared with ophy 101 example: Imagine we do not have the con-
Cyc’s—typically hundreds rather than our hundreds of cepts and words for familiar colors of the rainbow,
thousands—and a small number of relations intercon- instead we just have bizarre concepts like “grue”—the
necting the terms—typically tens rather than our tens property of being green by day and blue by night—-
of thousands. This is not bragging, we continue to and “bleen”—the property of being blue by day and
[100.0.196.127] Project MUSE (2024-04-18 01:15 GMT) Harvard Library

police the ontology and try to keep all those numbers green by night. Yes, those are terrible choices; nothing
as small as possible, but as I will explain below, their in the universe is grue or bleen! But we can still use
magnitude represents what we came up with despite those two terms to express “Grass is green,” just more
our efforts; we only kept those which proved genuinely verbosely—we would have to say “Grass is grue by day
cost-effective. It is akin to why English does not just and bleen by night.”
have a few hundred words; that would obviously be So why care about how good or terse an ontol-
an. . . ungood language in which to communicate. ogy is at all, then? It all comes down to efficiency.
We even wasted quite a bit of time trying to get the If you have grue/bleen-class inefficiencies in the
very most general tip of Cyc’s concept network “right,” way your ES carves up the world, you will have to
or at least as “right” as possible (see Figure 1). write longer if-then rules. And that means more
Part of this obsession might be blamed on Aristo- opportunities for the rule-writers to make mistakes.
tle. He was astoundingly correct about a lot of things, And since your rules will have more literals (e.g.,
so much so that many of his mistaken beliefs were more conjuncts on the if-part, as we saw with the
passed on as established truth, unchallenged, for mil- rule about grass, above), your inference engine will
lennia. One of the latter was his faith (there is no other have to work harder, and slower, and potentially
good word for it) that there does indeed exist one cor- less completely, as a result. And it will be harder to
rect, true, natural upper ontology; and, moreover, that automatically or semiautomatically translate from
that set of distinctions was very small, it was very natural language texts, and other ESs, to your

50 IEEE Annals of the History of Computing January-March 2022


ontology, because those others do not have terms corresponding one to one with the idiosyncratic inefficient terms you chose for your ESs.

On the flip side, a general ontology had better not make too few distinctions. Every time that happens, the inference engine may fail to derive some conclusion it otherwise would have, or at least not as easily and efficiently. If you smurf your smurfs too smurf, you will fail to deduce something that follows from the evidence in front of you. As an example, in 1984, Cyc’s ontology contained one relation—“in”—to denote physical containment. But just saying that x is in y, can we answer questions such as the following:

From the outside of Y, can I see any part of X?
If I turn Y over and shake it, will X fall out?
Is there room to put more things in Y?
Is X actually a part of Y?
How could I go about removing X from Y?

No, we cannot. The way that a dollar is in my wallet (completely contained) is different from the way I am in my shirt (not completely contained), which is different from the way the sugar is in my coffee (invisibly dissolved), etc. Within two years, we had to tease apart a graph of 60 different specializations of in; today, Cyc has 75 such relations.

As another example, the Cleveland Clinic hospital database system, to which we aligned Cyc’s ontology, listed the specific site of a patient’s infection (blood, urine, etc.), but if there was more than one site, then it just said “multiple,” instead of saying which sites they were, since the value could not be a list. Every time their database said “multiple,” there were missed opportunities for inference.

So, yes, while there may not be one clear correct/true/natural ontology, some ontological distinctions are better than others. For example, there was Jorge Luis Borges’ reporting (tongue in cheek) about an ancient “Celestial Emporium of Benevolent Knowledge” classifying animals into those that belong to the emperor, embalmed ones, those that are trained, suckling pigs, mermaids, fabulous ones, stray dogs, those that tremble, innumerable ones. . .

To minimize the size of Cyc’s ontology, we found it useful to allow term-denoting functions. For example, instead of a combinatorial explosion of terms, such as the following:

TheGovernmentOfFranceIn2009,

Cyc expresses that compositionally as:

(During (TheGovtOf France) (TheYear 2009)).

This is built compositionally out of terms, which are each used many orders of magnitude more often than the specialized term TheGovernmentOfFranceIn2009. Three of those terms are term-denoting functions as follows.

1) During, which takes two arguments. Its Argument1 represents something that extends/exists for some period of time. Its Argument2 is a smaller time period, which “clips” the first argument to that narrower time frame.
2) TheGovtOf, which takes a geopolitical entity and denotes its government.
3) TheYear, which takes an integer and denotes the year with that value, in our current calendar system.

A fourth term, France, denotes the overall geopolitical entity that is France throughout its existence. Even though we do not usually think of it this way, the integer 2009 is expressed in the “tens” notation, which means it is really built compositionally from a few other, much more frequently used, terms:

(TheSumOf (Thousands 2) (Hundreds 0) (Tens 0) (Ones 9))

Forced to Tradeoff Representational Efficiency Versus Expressive Power

The next major mistake we made was where to place our bet on this tradeoff between efficiency and expressiveness of representation.

› At one extreme are expressive representations, such as natural languages, such as English. But computers cannot understand those very well. If I tell you, say, three statements in English from which any reasonable human being would conclude X, there is very little likelihood that any computer program now or in the near future would conclude X. Cyc is sometimes able to do this, finally, but even when it is successful, it usually takes a very long time doing that reasoning.
› At the other end of the spectrum are programming languages, which are quite efficient with algorithms guaranteed to work, but the range of what can be expressed is quite limited. Think of representing—in Java, SQL, or C++—Juliet’s belief that Romeo would believe that she was still alive even after he heard of her apparent suicide. Imagine how difficult it would be for a program to automatically and quickly infer things involving that, e.g., to explain why Romeo did not believe she was still alive. By the way, Cyc successfully did this [19].
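The compositional flavor of the term-denoting functions described above (During, TheGovtOf, TheYear) can be mimicked in a few lines of Python. This is only an illustrative sketch of the idea, not Cyc's actual machinery or the CycL API; nested tuples stand in for terms, and the function names simply mirror the examples in the text.

```python
# Illustrative sketch of term-denoting functions (not Cyc's real
# representation or API): nested tuples stand in for compositional terms.

def During(entity, interval):
    """Denote `entity` restricted ("clipped") to the narrower time `interval`."""
    return ("During", entity, interval)

def TheGovtOf(geopolitical_entity):
    """Denote the government of a geopolitical entity."""
    return ("TheGovtOf", geopolitical_entity)

def TheYear(n):
    """Denote the calendar year numbered `n`."""
    return ("TheYear", n)

def render(term):
    """Render a nested term in CycL-like prefix notation."""
    if isinstance(term, tuple):
        return "(" + " ".join(render(part) for part in term) + ")"
    return str(term)

# A handful of reusable functions generates, rather than enumerates, the
# combinatorial space of one-off constants like TheGovernmentOfFranceIn2009:
term = During(TheGovtOf("France"), TheYear(2009))
print(render(term))  # (During (TheGovtOf France) (TheYear 2009))
```

Because each function is reusable across millions of such terms, the ontology only needs the few general functions, not the combinatorial product of their arguments.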


Our initial bet, in 1984, was at the point of the tradeoff curve in the middle, what today would be called Knowledge Graphs, but back then were known as Frames and Slots, or Object/Attribute/Value Triples, or Associative Triples, or Beings, or Actors, or Objects. This is not as efficient as compiled code and is not as expressive as English. It is able to represent three-word English sentences, or longer ones that are built up out of such triples using connectives, such as AND, OR, and NOT. But translating Romeo and Juliet into such a “triples” representation throws most of the baby out with the bath water. Over 80% of what Shakespeare was literally saying would be lost, and 99% of the things that every listener or reader is expected to infer would be lost (“lost” here meaning that a program would not infer them, no matter how slowly).

So, kicking and screaming, from 1984 to 1994, we adopted more and more expressive logical languages: description logic (with defaults), first-order predicate calculus (arbitrary arity; nested quantified variables), second-order (term-denoting functions; reasoning about relationships and statements), and then higher order logics with features, such as modals and reflection [20], pro- and con-argumentation, counterfactual hypotheticals, contexts as first-class objects in our ontology, several different useful “species” of negation, and dozens more such features.

But there is a reason that everyone else in AI has always shied away from that high-expressivity end of the tradeoff curve: even simple inferences can take a long time for the system to mechanically produce.

In 1986, we saw the way out. As War Operation Plan Response said in War Games, the only way to win is not to play. Instead of picking one representation along that expressivity versus efficiency tradeoff curve, pick two or more. Each piece of knowledge is therefore represented in Cyc in two or more ways. That is just a linear inefficiency, but it provides an exponential speedup. Whenever a problem is being worked on, there is a whole battery of specialized inference engines and representations that can be brought to bear to work on it and, when making progress, broadcasting the results to all the other inference engines. Whenever progress is made, all of them stop and work on the now-simpler subproblem.

Some of the inference engines are very general, and work on general representations—e.g., a theorem prover that works on first-order logic. The more specialized inference engines are much faster whenever they do apply. For example, if some subproblem requires solving N linear equations in N unknowns, there is a very efficient way of representing and solving that. In 1986, Cyc had two such representations; by 1990, there were 20, each with its own inference engine; today Cyc has over 1100. They work together as a community of agents, communicating by posting their intermediate results on a sort of blackboard that all the other agents can watch and react to if/when/as they see an opportunity that fits their specialty.

SUCCESS SPOILS ESs: 1986–1995

I will return to the Cyc story (and MCC story) below, but what about everyone else, during the 1986–1995 decade? In 1986, the world was buzzing about ESs being the future; let me capitalize that: The Future. By 1995, ESs were The Past. What happened?

Partly this “AI Winter” [21] was a correction for the previous decade’s overhyping; the ESs paradigm was now taking its proper place as just another software application development tool. A good analogy is: I’d hate to build a house without a saw, or with only a saw!

In this view, ESs and ESs companies did not fail; they got absorbed into larger organizations and paradigms. But I saw another reason why ESs platforms and tools and companies fell into disfavor.

Key decision-makers at large organizations found the ESs story compelling; they literally and metaphorically bought what the ESs companies were selling. They began to require their software engineers to write application programs that way, using the if-then rule platforms from Intellicorp (KEE), Carnegie Group (Knowledge Craft), Inference Corporation (ART), and Teknowledge (ABE). The coerced programmers down in the trenches were steeped in the old existing software development paradigm; they never learned, or they never really trusted, this new “incremental approach to competence” paradigm. They feared that if they tried to program that way, by teaching the computer, rather than by carefully flowcharting and engineering, their application would never succeed. So, yes, they used the ESs platforms, as ordered, but they used them to—tortuously—build traditional programs, which only superficially appeared to be ESs.

In one case, an application program was completely flowcharted out and coded in a procedural programming language, PL1. Explicit line numbers were assigned, and the whole large program was then methodically translated line by line into a set of superficially (i.e., syntactically) if-then rules, each one having the following form:

“IF line-number = 918
THEN replace x by 2x and set line-number to 919.”
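That anti-pattern is easy to reproduce. The sketch below is an invented Python toy (there is no real PL1 program behind it): each "rule" merely emulates one line of a procedural program, so the rule engine reproduces the procedural behavior exactly, and the only "explanation" it can offer is a line number.

```python
# Invented sketch of the "zombie ES" anti-pattern: procedural code
# disguised as if-then rules keyed on line numbers.

def make_rules():
    return [
        # IF line-number = 918 THEN replace x by 2x and go to 919.
        (lambda s: s["line"] == 918, lambda s: s.update(x=2 * s["x"], line=919)),
        # IF line-number = 919 THEN add 1 to x and go to 920.
        (lambda s: s["line"] == 919, lambda s: s.update(x=s["x"] + 1, line=920)),
    ]

def run(state, rules, trace):
    fired = True
    while fired:
        fired = False
        for condition, action in rules:
            if condition(state):
                # The entire "explanation" is which line number matched.
                trace.append(f"line {state['line']} fired")
                action(state)
                fired = True
                break
    return state

trace = []
state = run({"line": 918, "x": 3}, make_rules(), trace)
print(state, trace)  # {'line': 920, 'x': 7} ['line 918 fired', 'line 919 fired']
```

The behavior is identical to the flowcharted original, but every inference-engine cycle is pure overhead, and the trace explains nothing a human cares about.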


These programs took much longer to write and to debug than if they had just been programmed conventionally. And they had no better success rate—indeed, their behavior was identical to that of their original PL1/Fortran/C version. The result was technically an ES; so, could it explain its behavior? Sort of, but those “explanations” were of the form “because line 918 of the PL1 program told me so” and were of zero use to any human being! Also, they ran more slowly, because they were triggering rule-firings rather than compiling into efficient executable procedural code.

The best analogy I can think of here is that these were zombies superficially appearing to be ESs. They were unintentional parodies of ESs.

The ESs companies understood this; they understood the “real” ESs paradigm and its power. But they were hungry young startups, and their priority was to grow at any cost. So, they were complicit. They aided and abetted and even encouraged those heart-breaking “cargo-cult” misuses, resulting in zombie ESs.

Cargo-cultism also hurt the Japanese Fifth-Generation Project, along with other problems: restricting their representation to a knowledge tree; making decisions (e.g., selecting programming languages) intentionally different from the West; insisting on parallel processing even where it did not help much [22]; and staffing up almost entirely with assignees from rival companies—assignees told to keep their eyes open but not to contribute anything patentable.

1994–2000: THE END OF MCC AND THE END OF AN ERA

Like the Japanese Fifth-Generation Project, MCC was also initially staffed mostly with assignees from its member companies. Each company sent their “difficult” cases: employees too smart to fire but a pain to work with because of personality, habits, ego, communication skills, and/or personal hygiene. Having a whole enterprise comprised of hundreds of such individuals was not livable. After less than a year, CEO Bob Inman declared, “I didn’t come back here to Texas to run a turkey farm!” and sent them all back to their member companies. Most MCC staff members were then recruited externally, with each company stationing just one or two liaisons at MCC to report back to their home base.

There were a few systemic problems MCC could not escape. For instance, each member company received one board seat. In Year 1, this was fantastic: the CEOs of all the member companies actively participated! In Year 2, they sent their Executive Vice Presidents. In Year 3, that EVP sent someone more junior, who was only authorized to listen and report back, not make any actual governance decisions. The board became incapable of acting. Admiral Inman threatened to resign if that did not change. It did not, and he did resign, in 1986. Four more CEOs followed, some folksy (Grant Dove), some brilliant (Craig Fields), but none combined the wisdom, wit, warmth, and vision of Bob Inman.

Five other changes happened that eroded the underpinnings of the MCC model as follows.

› Large American companies became large multinational companies, reducing the whole raison d’être for their belonging to any “America-First” MCC consortium.
› Companies became loath to spend much on long-term R&D, lest their stock be down-valued by a new species of being called “mutual fund managers.”
› The Japanese Fifth-Generation Project failed, removing the common enemy.
› The decade of NCRA antitrust dispensations expired.
› Start-ups began recruiting and retaining employees with stock options, something MCC could not do.

A down-funding spiral occurred. Project leaders had to start going after U.S. Government grants and research contracts, eliminating one main advantage to being at MCC versus being at a university. MCC shrank from 400 employees to 300, 200, 100. . . Its overhead rate kept going up, making it even harder for the remaining technical staff to procure enough U.S. Government contract dollars. It finally dissolved in 2000.

1995+: FORMING CYCORP

We spun the Cyc-building project out of MCC at the end of 1994. The Member Companies agreed to let our new start-up, Cycorp, retain 95% of the ownership of all the IP, and when MCC dissolved, that became 100%.

Cycorp supported itself through a combination of commercial contracts (including from Microsoft and Apple) and government ones (including DARPA and the U.S. intelligence agencies).

That remained true for 27 years, but, as of January 2022, 100% of Cycorp’s funding comes from companies licensing Cyc as a common-sense platform for their applications, and as an interlingua to fully, semantically integrate all the data they generate and all the data they license from third parties.

This strategy has enabled us to avoid almost any stock dilution, and we remain almost entirely employee


owned. Our 50+ technical staff members generate ever more common-sense test questions and get Cyc to answer them by actively expanding the ontology, writing if-then rules, adding new specialized representations and inference engines, and aligning our ontology with external database schemata, OWL [23] ontologies, and web services APIs.

I could go through 148 more knowledge-based systems “lessons learned” stories, taking us from the MCC days through to the present, but that goes beyond the ESs history focus of this Annals Special Issue. But I will mention a few more ES-relevant lessons that we learned at Cycorp, which are as follows.

› Contexts are best made explicit, and are a broad necessity, not a frill: even ontological relationships are often true in one context and false in others (e.g., in other time periods, in other belief systems, at other levels of granularity).
› That means Cyc’s knowledge base is inconsistent; instead of requiring global KB consistency, each context is locally (more or less) consistent.
› Since contexts are first-class objects in Cyc’s ontology, it can reason about them just as it reasons about diseases, kids solving math problems, or relationships.
› Increasing efficiency is most powerfully accomplished by adding meta-rules (tactics) and meta–meta-rules (strategies) rather than by trying to limit what the system knows.
› It is cost-effective to separate out the efficiency-motivated pragmas from the logical warrant of each if-then rule.
› We found we could replace a proliferation of 100+ inference parameters (e.g., maximum reasoning depth, maximum time allotted) by empirically identifying a handful of settings, which could duplicate 99% of the past successful inferences it ever did.
› It has been highly cost-effective to build dozens of Cyc application-building tools powered by Cyc itself, so it can increasingly help with its own growth and application building and debugging, much in the spirit started by Teiresias [24].

FOUR ANSWERS TO: IS THE ESs STORY OVER?

Part of the truth is that ESs as we know them are dead; well, mostly dead.

Part of the truth is that they have been incorporated into software and software development methodology, much like a tool in a toolbox.

And part of the truth is that they have grown and expanded and live on in Cyc and its applications, which are each just small extensions of Cyc. All the Cyc-based applications today would be immediately recognizable as ESs; each one is a relatively tiny extension of the core Cyc system, just as each employee of our customer companies only had to learn a relatively tiny body of new information, rules of thumb, etc., for their latest job. Cyc is running in hospitals, tracking and predicting patient care and throughput; in manufacturing, reasoning about supply chain opportunities and vulnerabilities; helping robots be less brittle; etc.

In that sense, the ESs story is not over; it is continuing today on a much more solid grounding (i.e., the Cyc common-sense knowledge and reasoning platform) than the isolated stove-pipe ESs of the past, each of which had to pick one narrow ontology, one representation of knowledge, one specialized set of consistent if-then rules for a particular task, and one inference engine.

But there is a fourth truth. Perhaps the most important evolution has been a pairing, the synergy, with statistical machine learning technology. An early example of this is the Cyc application for Cleveland Clinic [25] and the follow-on we did for the National Library of Medicine. A machine learning system looked at longitudinal data about patients’ diseases and single nucleotide polymorphisms (SNPs). Genome-wide association studies identified A → Z correlations between these genotypic point mutations of their DNA (A) and the phenotypic medical conditions that brought them into the hospital (Z). But those correlations were often spurious, weak, and did not provide any causal explanation or pathway. That is where Cyc came in, pulling together biochemical and medical knowledge (e.g., what polymerizes what, what reactions occur where, with what inputs and outputs, what catalyzes what, etc.), general knowledge (e.g., what people with different hobbies and occupations do and come into contact with and where), and common-sense knowledge (restricting flow in a conduit causes pressure increases up-stream, decreases down-stream, and decreased flow, which applies to veins, roadways, straws, pipelines. . .).

Cyc would bring all that to bear, and come up with plausible causal chains of how that particular SNP A might have led to medical condition Z. For example, this SNP “A” is next to a gene which, when expressed, forms this protein, which catalyzes this reaction, which. . . ten steps later, leads to high levels of bioactive vitamin D in the patient’s blood, which. . . another ten steps later, leads to their developing osteoporosis, which is “Z,” the medical problem they came to the hospital to treat.
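That division of labor can be caricatured in a few lines of code. The sketch below is a hedged Python toy with invented data and a three-link causal "knowledge base"; real GWAS pipelines and Cyc's biochemical reasoning are vastly richer.

```python
# Toy sketch of the ML + knowledge-base synergy: keep only those
# statistical A -> Z correlations for which the knowledge layer can find
# a plausible causal chain. All names and links here are invented.

# 1) The statistical layer proposes correlations between genotypes (A)
#    and the medical conditions that brought patients in (Z).
correlations = [("SNP-1", "osteoporosis"), ("SNP-2", "osteoporosis")]

# 2) A miniature "knowledge base" of one-step causal links.
causal_links = {
    "SNP-1": "protein-P",
    "protein-P": "elevated bioactive vitamin D",
    "elevated bioactive vitamin D": "osteoporosis",
}

def causal_chain(a, z, links, limit=10):
    """Follow links from a toward z; None means no plausible pathway."""
    chain = [a]
    while chain[-1] != z and len(chain) <= limit:
        nxt = links.get(chain[-1])
        if nxt is None:
            return None  # no pathway found: the correlation may be spurious
        chain.append(nxt)
    return chain if chain[-1] == z else None

# 3) Each intermediate node (e.g., elevated vitamin D) is a testable
#    prediction that the statistical layer can confirm or disconfirm
#    against large patient databases, closing the loop.
explained = {(a, z): causal_chain(a, z, causal_links) for a, z in correlations}
```

Here only the SNP-1 correlation survives, because the knowledge layer can string together a pathway for it; SNP-2's correlation, having no pathway, is flagged as likely spurious.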


The power of this synergy loop is that it then goes full circle: Cyc takes its predictions, such as the one about elevated bioactive vitamin D levels, which seems irrelevant, but which, if confirmed, is strong evidence confirming that there is some causal linkage between A and Z, and it is likely the causal pathway Cyc came up with.

So, statistical machine-learning-like reasoning generates hypotheses; then ES-like reasoning generates plausible explanations for these and makes testable predictions about what else that might mean for this patient besides the problem that brought them in for treatment; and then statistical machine-learning-like reasoning checks to see if those predictions are confirmed or disconfirmed in large patient databases.

All four of the truths about ESs above are worth recording and hearing about, but this last back-and-forth cooperation loop is what I see as the final evolutionary step for ESs. We do not think much about our separate brain hemispheres, but rather our whole brains (and what they do!), and I see the same being true for ESs and non-ESs: together they can and increasingly are forming an integrated AI, which is what will lead to a general AI capability. Such general AIs will amplify what our brains can do, much as electricity and electrification amplified what our muscles can do; people will be able to solve harder problems faster, misunderstand each other less, and be more creative, both as individuals and as a species.

Such general AIs could jump-start a kind of national and worldwide knowledge utility, much like water and telecommunications and power utilities, qualitatively expanding what Google and Facebook are today; where individuals will contribute (and get micropayment credits when their contributions get successfully used as part of anyone’s calling on the AI)—think Mechanical Turk squared. No, cubed. That could transform the United States and the world economy, and almost all aspects of daily life, much as the other utilities did one century ago.

So, the future is dark for the caterpillar of stove-piped ESs, but bright for the butterfly of knowledge-based systems, which rest on an immense, solid platform of shared common sense and usually tacit wisdom.

REFERENCES/ENDNOTES
[1] R. K. Lindsay, B. G. Buchanan, E. A. Feigenbaum, and J. Lederberg, “DENDRAL: A case study of the first expert system for scientific hypothesis formation,” J. Artif. Intell., vol. 61, no. 2, pp. 209–261, 1993.
[2] D. Leonard-Barton and J. J. Sviokla, “Putting expert systems to work,” Harvard Bus. Rev., vol. 66, no. 2, pp. 91–98, Mar./Apr. 1988.
[3] D. A. Waterman and R. Hayes-Roth, Eds., Building Expert Systems. Boston, MA, USA: Addison-Wesley, 1983.
[4] R. M. Friedberg, “A learning machine,” IBM J. Res. Develop., vol. 2, pp. 2–13, Jan. 1958.
[5] C. C. Green and R. J. Waldinger, “Progress report on program-understanding systems,” Stanford Univ., Stanford, CA, USA, Tech. Rep. STAN-CS-74-444, Aug. 1974. [Online]. Available: https://exhibits.stanford.edu/stanford-pubs/catalog/jb609bz6579
[6] P. H. Winston, “Learning structural descriptions from examples,” Massachusetts Inst. Technol., Cambridge, MA, USA, Tech. Rep. AITR-231, 1970. [Online]. Available: https://dspace.mit.edu/bitstream/handle/1721.1/13800/24510999-MIT.pdf?sequence=2&isAllowed=y
[7] D. Lenat, Building Knowledge Based Systems. Boston, MA, USA: Addison-Wesley, 1976.
[8] D. Lenat, “The ubiquity of discovery,” J. Artif. Intell., vol. 9, no. 3, pp. 257–285, Dec. 1977.
[9] J. McCarthy, M. L. Minsky, N. Rochester, and C. E. Shannon, “A proposal for the Dartmouth summer research project on artificial intelligence, August 31, 1955,” AI Mag., vol. 27, no. 4, 2006, Art. no. 12.
[10] D. Lenat, “EURISKO: A program that learns new heuristics and domain concepts: The nature of heuristics III: Program design and results,” J. Artif. Intell., vol. 21, no. 1, pp. 61–98, Mar. 1983.
[11] M. Gladwell, “How David beats Goliath,” in The New Yorker. New York, NY, USA: Conde Nast, May 11, 2009. [Online]. Available: https://www.newyorker.com/magazine/2009/05/11/how-david-beats-goliath. The wargame competition organizers said that if EURISKO entered and won a third time, they would permanently discontinue the tournament, so we rested on our laurels and did not compete after those first two years.
[12] D. B. Lenat and J. S. Brown, “Why AM and EURISKO appear to work,” J. Artif. Intell., vol. 23, no. 3, pp. 269–294, Aug. 1984.
[13] D. Lenat, A. Borning, D. McDonald, C. Taylor, and S. Weyer, “Knoesphere: Building expert systems with encyclopedic knowledge,” in Proc. 8th Int. Joint Conf. Artif. Intell., 1983, pp. 167–169.
[14] E. Feigenbaum and P. McCorduck, The Fifth Generation: Artificial Intelligence and Japan’s Computer Challenge to the World. Boston, MA, USA: Addison-Wesley, 1983.
[15] D. Gibson and E. Rogers, R&D Collaboration on Trial: The Microelectronics and Computer Technology Corporation. Boston, MA, USA: Harvard Business School Press, 1994.


[16] Until 1995, when, on a single day: First, at a faculty meeting, 20 of the smartest people on Earth spent the time squabbling about how much window space their students would have in the new Gates CS building, and second, I gave a great lecture in class, at which the only question was “Will this be on the final?” Those events discharged my remaining guilt over not “professoring,” and I now have my 25-year chip.
[17] 2021. [Online]. Available: https://www.britannica.com/science/water
[18] 2021. [Online]. Available: https://en.wikipedia.org/wiki/Upper_ontology
[19] D. Lenat, “What AI can learn from Romeo & Juliet,” Forbes, Jul. 3, 2019. [Online]. Available: https://www.forbes.com/sites/cognitiveworld/2019/07/03/what-ai-can-learn-from-romeo--juliet/?sh=60d6d68c1bd0
[20] “Reflection” is the ability to represent and reason about what the knowledge-based system itself knows and is doing: what problem it is working on, what tactics it has tried and how they are turning out so far, etc. “Modals” are constructs such as believes, expects, wants, dreads, knows. . . Those get nested to represent more complicated situations, such as: “Israel believes that Hamas wants the U.S. to expect that. . ..”
[21] 2021. [Online]. Available: https://en.wikipedia.org/wiki/AI_winter
[22] At one 1986 5Gen conference I attended in Tokyo, they showcased a one-million-processor machine predicting the weather, highlighting that 50% of the time all 1,000,000 processors were utilized productively. When I inquired, it turned out that the other 50% of the time, only one processor was running, so the overall speedup was just a factor of 2.
[23] Dec. 11, 2021. [Online]. Available: https://www.w3.org/OWL/
[24] R. Davis and D. Lenat, Knowledge Based Systems in Artificial Intelligence. New York, NY, USA: McGraw-Hill, 1982.
[25] D. Lenat et al., “Harnessing Cyc to answer clinical researchers’ ad hoc queries,” AI Mag., vol. 31, no. 3, pp. 13–32, 2010, doi: 10.1609/aimag.v31i3.2299.

DOUGLAS LENAT received the Ph.D. degree in computer science from Stanford University, Stanford, CA, USA, in 1976, investigating machine learning, specifically automated discovery guided by hundreds of “interestingness” heuristics, for which he received the most coveted award in AI, the biennial IJCAI Computers and Thought Award, in 1977. He is one of the pioneers of artificial intelligence. His time professoring at Stanford University, as a principal scientist with MCC, and as the CEO of Cycorp is discussed in this article. He and his team there have spent more than 4 million person-hours building up this symbolic AI artifact, Cyc, spanning human common sense. He is the author of more than 100 books, chapters, and refereed articles, and has served as the principal investigator on dozens of successful efforts for DARPA, DTRA, ONR, NSF, MDA, CIA, NSA, IARPA, ADL, and NIH. He is a Fellow of the AAAS, AAAI, and Cognitive Science Society, and an Editor for Journal of Automated Reasoning, Journal of Learning Sciences, and Journal of Applied Ontology. He is also the only individual to have served on the Scientific Advisory Boards of both Microsoft and Apple. Contact him at [email protected].