
On the Web

Synthesis Explorer: A Chemical Reaction Tutorial System for Organic Synthesis Design and Mechanism Prediction
Jonathan H. Chen and Pierre Baldi*
Institute for Genomics and Bioinformatics and Department of Computer Science,
School of Information and Computer Sciences, University of California, Irvine, Irvine, CA 92697; *[email protected]

Cognitive theory and pedagogical experience have long indicated that practice problems with appropriate performance feedback are vital for students to master problem-solving skills (1). Paper homework assignments have traditionally been the means to challenge student knowledge and understanding, but manually grading paper homework for perhaps hundreds of students in a large university classroom is an unwieldy and labor-intensive task. Even if graded feedback is provided, corrections and useful comments commonly take up to a week after assignment submission, which limits the value of the feedback. Online problem sets and electronic tutorial systems can address many of these issues by offering automated grading capabilities, saving instructors time and providing immediate feedback to the student on the correctness of their answers to tighten the feedback loop. In fact, several online learning systems do exist to help address these needs in the chemistry classroom, including ACE Organic (2), LON-CAPA (3), MCWeb (4), OWL (5), WE_LEARN (6), and WebAssign (7).

While the existing systems already provide significant benefits over traditional instruction models, they are necessarily limited in scope by the quantity of problems they include. The problem sets available in the above systems for the most part simply transplant textbook problems into an electronic format. In particular, they consist of pre-constructed problems authored by human experts. Fixed problem sets inherently limit the replay value of such systems, as there is little reason to return to problems previously completed once the answers have been seen and memorized. A more valuable model would involve a system that can dynamically generate similar but non-identical problems on demand. Note that we are specifically interested in randomly generated problems that conceivably never existed before, as opposed to problems randomly selected from a pre-existing set. At least two of the systems mentioned above, OWL and MCWeb, support some dynamic generation of random problems, but this is only for general chemistry, where problems are relatively straightforward to model in terms of simple mathematical formulas. In comparison, organic chemistry deals less with numerical values and algebraic equations and more with chemical structure and reactivity.

Especially for organic chemistry, problems composed for existing learning systems have been relatively constrained to closed-ended designs such as multiple-choice and fill-in-the-blank. This limitation in the variety and flexibility of problems ultimately restricts the creativity of the student, as well as the amount of meaningful feedback the system can give to the student beyond terse correct–incorrect responses. For open-ended problems, such as multi-step organic synthesis design, it can be very discouraging when a system prevents students from proceeding as soon as they deviate even slightly from the intended solution. Particularly when it takes several steps to demonstrate alternative solutions, students should instead be free to experiment with different steps and combinations to find a solution.

Taking the idea of learning through experimentation further is the concept of inquiry-based learning (8). In this model, students gain mastery of a subject when they can move beyond passively absorbing information and instead explore the boundaries of their knowledge by actively asking questions. Having a qualified instructor answer questions and work through problems could be considered the gold standard for inquiry-based learning, but the combination of class discussion and instructors' office hours still suffers from scalability and accessibility issues similar to those of paper homework.

To support randomly generated problems and inquiry-based learning for organic chemistry, a tutorial system must be capable, at its core, of taking an arbitrary reactant–reagent combination and reliably predicting the major reaction product. No simple algebraic formula or computational construct can provide such power. The closest systems to embody such predictive power are the CAMEO (9) and EROS (10) systems, but these were built on previous-generation technologies and have largely fallen out of support. Since then, few projects, for example, ROBIA (11) and SOPHIA (12), have approached the computational reaction-prediction problem, and none have been applied towards modern chemical education.

Aims for a New System

Here we describe Synthesis Explorer, an online tutorial system for organic chemistry designed to enable learning in ways previously unrealized within existing models. The system focuses on some of the most important and challenging subjects in organic chemistry, including reaction product prediction, multi-step synthesis design, and reaction mechanism proposal. The system does not rely on a fixed collection of pre-authored problems, but instead relies on a collection of chemical reagent models with inherent predictive power. As a result, the system can dynamically generate and validate new problems at will, allow students to freely explore novel reaction combinations, and respond to student inquiries related to reaction and mechanism predictions.

System Design and Description

The core features of the system are all related to the reactivity of organic molecules, and thus content is organized around a collection of chemical reagent models with the built-in ability to predict the course of chemical reactions. This predictive power derives from an underlying expert system developed with the OEChem toolkit from OpenEye Scientific Software (13) and based on reaction transformation rules written in the SMIRKS language (14).

System content is organized into reaction categories corresponding to chapters from undergraduate organic chemistry textbooks (Table 1). The Bruice (15), Loudon (16), and Smith

Division of Chemical Education www.JCE.DivCHED.org Vol. 85 No. XX Month 2008 Journal of Chemical Education 1

(17) textbooks were used as models for chapter organization. This textbook organization is provided primarily for convenience; reference to a textbook is not necessary in order to use the system. Every chapter of material is shared by essentially every undergraduate textbook. The textbooks simply order and filter the complete list of possible chapters. Students and instructors using other textbooks can simply ignore the chapter numbers and instead attend only to the descriptive labels. Since the system can generate problems for chapters in any order and combination, it is flexible enough to fit virtually any textbook and lesson plan.

Table 1. List of Reaction Categories Currently Covered in the System Corresponding to Chapters of the Loudon Organic Chemistry Textbook

Chapter   Description
5         Alkenes
9.04      Substitution Reactions of Alkyl Halides
9.05      Elimination Reactions of Alkyl Halides
10        Alcohols and Epoxides
11.04     Epoxides and Organometallic Compounds
11.05     Oxidation of Alcohols and Alkenes
14        Alkynes
15        Dienes, Conjugation, Diels–Alder
16        Electrophilic Aromatic Substitution
17        Allylic and Benzylic Reactivity
17.02     Alkanes, Radical Reactions
18        Transition Metal (Pd) Catalysis
18.04     SnAr and Benzyne Reactions
19        Aldehydes and Ketones
20.1      Redox of Alcohols and Carbonyls
21        Carboxylic Acid Derivatives
22        Enolate Chemistry
22        Acetoacetic and Malonic Ester Synthesis
22.04     Aldol Chemistry and Michael Addition
22.05     Claisen Condensations
22.08     Organometallic Addition, Conjugate Addition
23        Amines
23.1      Arenediazonium Reactions
24        Naphthalene and Heteroaromatic EAS
24.05     Pyridine Derivatives
25        Pericyclic Reactions
26.04     Amino Acid Synthesis
26.07     Peptide Synthesis
27        Carbohydrates

Note that problems can be dynamically generated from these categories in any order and combination, thus the system is not intimately tied to any particular textbook or lesson plan. Please refer to the system Web site for the most current list.

Synthesis Design Workspace

The major functionalities of Synthesis Explorer revolve around multi-step synthesis design problems and begin with the student selecting one or more categories or chapters of content they wish to review through the Web interface. From the selected categories, the system presents a pool of available reagents and starting materials as well as a target synthetic product (Figure 1). The student may then interactively select any combination of these reactants and reagents and the system will predict what (intermediate) products result. Note that this product prediction is calculated by the system dynamically and is not based on pre-coded reaction examples. These intermediate products can then be carried over to further reactions to build increasingly varied and complex molecules. The student's goal is to reconstruct the target product using these tools.

Figure 1. Multi-step Synthesis Explorer screenshot from the chapter on electrophilic aromatic substitution. The student is presented with a target product molecule (top-right) to derive a synthesis pathway for, as well as several control buttons and a context-sensitive help box. Completing the synthesis involves selecting the proper sequence of reactants and reagents from the available, scrollable pools (top-left and top-middle, respectively). As the student selects different reactant–reagent combinations as possible steps in the synthesis pathway, these combinations are presented in the Pathway workspace area (bottom) along with each intermediate product the system predicts. Once the target product itself is borne out of one of these reactions, the system can validate and record that the student was able to solve the problem.

Figure 2. Learning by analogy. Progressively more complex reactions the system can solve. By asking the system to solve such a series of product prediction questions, students can learn how reagents work by pattern recognition. For common pitfall cases that deviate from the patterns, such as the carbocation rearrangement in the last example, the system will notify the student with a warning or hint message.

Figure 3. Mechanism Explorer interface. The applet (top-left) allows students to sketch and submit curved-arrow mechanism diagrams that the system can validate for correctness. If the submission is incorrect (as in the bottom-left diagram) the system provides feedback by calculating and displaying the malformed product (bottom-right) that would result.

Dynamically Generated and Customizable Problems

With its inherent predictive power, the system has the unique ability to offer randomly generated organic chemistry problems in addition to typical pre-constructed problems. It does so by taking the collection of reactants and reagents that will be available to the student and then applying them in a random sequence until it yields a reasonable target product that the student can retro-synthetically reconstruct using those starting materials. This provides a unique opportunity to tailor problems specifically to fit a student's needs and interests. In particular, the student can select any combination of chapters and the maximum number of steps of a synthesis problem, that is, the primary measure of its difficulty. The system will generate a customized problem fitting those specifications on demand. Currently the system leaves these customization choices to the student, but conceivably the system could automatically assess the student's competence and dynamically adapt the content and complexity of the generated problems to this competence in appropriately challenging ways (18, 19).
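
The random problem generation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real system predicts products with its SMIRKS-based expert system, whereas here a small hand-written lookup table (TOY_RULES, keyed by plain reactant and reagent names) stands in for the predictor, and the function names predict, generate_problem, and check_solution are hypothetical. The sketch also simplifies the random sequence by only drawing from reagents for which the toy table predicts a product.

```python
import random

# Hypothetical stand-in for the expert system (an assumption for this sketch):
# maps (reactant, reagent) to a major product, using plain names instead of structures.
TOY_RULES = {
    ("benzene", "HNO3/H2SO4"): "nitrobenzene",
    ("benzene", "Br2/FeBr3"): "bromobenzene",
    ("nitrobenzene", "Br2/FeBr3"): "m-bromonitrobenzene",
    ("nitrobenzene", "H2/Pd"): "aniline",
    ("bromobenzene", "HNO3/H2SO4"): "p-bromonitrobenzene",
}


def predict(reactant, reagent):
    """Return the toy model's product, or None when it has no rule."""
    return TOY_RULES.get((reactant, reagent))


def generate_problem(starting_materials, reagents, max_steps, rng):
    """Apply randomly chosen reagents to a random starting material.

    Returns (target, pathway), where pathway lists the (reactant, reagent,
    product) steps that produced the target, or None if nothing reacted.
    max_steps plays the role of the difficulty setting.
    """
    current = rng.choice(starting_materials)
    pathway = []
    for _ in range(max_steps):
        usable = [r for r in reagents if predict(current, r) is not None]
        if not usable:
            break
        reagent = rng.choice(usable)
        product = predict(current, reagent)
        pathway.append((current, reagent, product))
        current = product
    return (current, pathway) if pathway else None


def check_solution(target, steps):
    """Validate a student pathway given as (reactant, reagent) pairs.

    Every step must have a predicted product, each step must start from the
    previous product, and the last product must equal the target.
    """
    previous = None
    for reactant, reagent in steps:
        if previous is not None and reactant != previous:
            return False
        previous = predict(reactant, reagent)
        if previous is None:
            return False
    return previous == target


if __name__ == "__main__":
    problem = generate_problem(
        ["benzene"], ["HNO3/H2SO4", "Br2/FeBr3", "H2/Pd"], 2, random.Random()
    )
    print(problem)
```

Because the target is produced by the same predictor that later checks the student's answer, every generated problem is solvable by construction, and any alternative pathway that reaches the target is accepted as well.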
Free Form Exploration

When working with Synthesis Explorer, students are ultimately expected to combine reactants and reagents in a particular sequence towards a target synthesis product, but the system does not constrain the student to this purpose. The system provides the unique opportunity for the student to deviate from the intended correct pathway and to instead learn through experimentation. Students are allowed to combine arbitrary reactant and reagent combinations to observe the predicted products and any warning or hint messages that accompany them. These messages are generated to remind students of common pitfalls that are often overlooked, such as when carbocation rearrangements or allylic resonance occurs in the course of a reaction. Not only can students select any of the available reactants listed by the system, they can even input their own novel reactant structures through a chemical sketcher interface (JME Editor, courtesy of Peter Ertl of Novartis). For the kinds of structures that would be found in an undergraduate organic chemistry text, the system can still make reasonable predictions for the reactivity of novel compounds. As a result, common questions a student would ask an instructor can instead be answered by the system, providing a basis for inquiry-based learning (e.g., see Figure 2).

Figure 4. Reaction mechanism details page showing the sub-steps used by the system to predict the outcome of a hydrobromination reaction applied to 3-methyl-1-butene, including system-generated curved-arrow mechanism diagrams.

Mechanism Explorer and Reaction Details

When the reaction product and any accompanying cautionary messages are insufficient to help a student understand a reaction, they can simply view the system's expected solution for the synthesis or, even better, they can follow a link to the Mechanism Explorer interface (Figure 3). The Mechanism Explorer interface is a powerful new addition to the system that allows students to interactively work through the proposal of a complete curved-arrow mechanism diagram for any reaction, taking advantage of the capabilities of the MarvinSketch applet (20). Full mechanism solutions are available for most reactions in the form of a dynamically generated reaction detail information page (Figure 4). These pages expose the internal logic used by the system to predict the course of a reaction, enabling the student to ask not only what the final product of a reaction is, but also how the reaction proceeded. Detailed mechanisms are depicted via arrow-pushing diagrams showing the elementary sub-steps and intermediate products of the overall reaction. As with the synthesis problems, these diagrams are not based on a pre-constructed set of mechanism diagrams. It is rather the underlying expert system that can predict reaction mechanisms on demand for arbitrary reactant–reagent combinations.
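
One way to picture a reaction detail page is as an ordered trace of elementary steps, each recording the species before and after the step plus any hint message. The Python sketch below is a hand-written toy for the hydrobromination of 3-methyl-1-butene mentioned in the Figure 4 caption. The ElementaryStep class, the TOY_STEPS table, and the SMILES strings are assumptions made for illustration; they are not the system's SMIRKS rules or its actual output, and a real predictor would derive each step from the structure rather than look it up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ElementaryStep:
    name: str                    # label for the elementary sub-step
    reactant: str                # SMILES of the species before the step
    product: str                 # SMILES of the species after the step
    note: Optional[str] = None   # optional warning or hint for the student


# Hand-coded toy table (an assumption): species -> the step assumed to follow it.
TOY_STEPS = {
    "C=CC(C)C": ElementaryStep(
        "Markovnikov protonation of the alkene by HBr",
        "C=CC(C)C", "C[CH+]C(C)C"),
    "C[CH+]C(C)C": ElementaryStep(
        "1,2-hydride shift",
        "C[CH+]C(C)C", "CC[C+](C)C",
        note="Carbocation rearrangement: secondary cation becomes tertiary."),
    "CC[C+](C)C": ElementaryStep(
        "bromide adds to the carbocation",
        "CC[C+](C)C", "CCC(C)(C)Br"),
}


def trace_mechanism(start: str, max_steps: int = 10) -> List[ElementaryStep]:
    """Follow the toy step table from `start` until no further step applies."""
    steps: List[ElementaryStep] = []
    current = start
    while current in TOY_STEPS and len(steps) < max_steps:
        step = TOY_STEPS[current]
        steps.append(step)
        current = step.product
    return steps


def hints(steps: List[ElementaryStep]) -> List[str]:
    """Collect the hint messages attached to any step in the trace."""
    return [s.note for s in steps if s.note]


if __name__ == "__main__":
    for s in trace_mechanism("C=CC(C)C"):
        print(f"{s.reactant} -> {s.product}  ({s.name})")
    print(hints(trace_mechanism("C=CC(C)C")))
```

Keeping the intermediates in the trace is what lets a page answer "how did the reaction proceed" as well as "what is the product", and attaching the note to the rearrangement step mirrors the kind of pitfall warning described in the Free Form Exploration section.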


Classroom Trials

Subject Group Description

To assess the efficacy of Synthesis Explorer, we made it available to several undergraduate classes taught by the chemistry department at the University of California, Irvine. These classes, each taken by hundreds of students, represent the second quarter of a three-quarter undergraduate organic chemistry curriculum where students learn and use many chemical reactions and synthesis strategies. Subjects covered include reactions of alkanes (radical chemistry), alkyl halides, alcohols, epoxides, organometallics, benzene derivatives, and carboxylic acid derivatives. Students were offered an incentive of a ~2% boost to their final grade for completing multi-step synthesis problems generated by the system (~5 per week). None of these classes included written homework assignments, though non-graded paper problem sets were available for students to practice.

Performance Results

A typical final score histogram for one of these classes, separated into two groups, is shown in Figure 5. Students who completed at least 45 problems in Synthesis Explorer by the end of the course were classified as "participants" and offered the extra credit. All other students were classified as "non-participants". The chart illustrates that the participants' score distribution is distinctly up-shifted from that of the non-participants, in this case by 10.4% of their final grade (this is before the additional 2% of extra credit was offered). A similar pattern was observed for every class, with the score improvement ranging from 7% to 14%.

Figure 5. Final score distributions for two groups of students taking an organic chemistry course (student count versus total score in %). The "participants" curve represents students who completed at least 45 problems from the Synthesis Explorer system (~5 per week), while the "non-participants" curve represents all other students. The average score difference is 10.4% of their final score.

The above results offer a qualitative indication that the system can indeed improve student learning and performance. We realize that a causal relationship cannot be strictly established, in particular because in this experiment use of the system was voluntary and student motivation could be a confounding factor in the analysis. Other study designs were considered, such as comparing average scores against previous classes where the system was not available, or normalizing results against student scores in past courses. Unfortunately, all these designs can also be subject to criticism for some confounding factor, such as courses being taught by different professors, or examinations having different degrees of difficulty. The best experiment one could imagine would be to randomly select half of the students from a class and require them to use the system, while the remaining half is forbidden from using the system as a control group. However, this has the potential for creating an unfair learning environment for the control group since the control students are denied access to a resource that their peers can use. Ultimately, we felt that a fair learning environment where the system was open and optional to all students was more important than having a fully randomized control group for this analysis.

Summary and Discussion

The Synthesis Explorer system specializes in challenging problems of reactivity in organic chemistry, particularly reaction product prediction, mechanism prediction, and multi-step synthesis design. From our experience, the system's support of a mix of random and pre-constructed problems works best with random problems for extensive practice and pre-constructed problems for targeted testing and evaluation. The system is built on an underlying expert system whose inherent predictive power enables a richer learning experience through experimentation and interactive dialogue with customized feedback in the form of both text and predicted chemical structures.

The majority of reagents covered in a second-year organic chemistry curriculum are already modeled within the system, and new reagents and reaction mechanisms are being added periodically. Mechanism prediction problems have recently been added alongside the existing synthesis design problems. Other features that could be pursued in the future include the capability to automatically assess the student's abilities and dynamically tailor the problems to fit the student's particular strengths and weaknesses.

The expert system upon which Synthesis Explorer is built has been presented in an educational setting, to facilitate the learning of chemistry in ways previously unrealized. But the same expert system can be used in other applications in chemical informatics and modeling. For example, computerized retrosynthesis decision support systems (21, 22) could be based on the same technology. In fact, the underlying expert system is already being used to help solve the very kinds of synthesis problems generated by Synthesis Explorer (23). As the content and robustness of chemical expert systems expand, these will become useful not only to undergraduate students, but also to professional chemists.

Distribution and Access

Presently, the system is hosted on computer servers at UC Irvine and is freely accessible via the Internet to any user or institution. Students may work anonymously on the system or log in with a unique identifier (e.g., student ID number) to let the system record their progress automatically, while instructors may review the records of the students for whom they have the identifier. Walkthroughs are available on the Web site to guide users step-by-step through the basic functionalities. For more advanced instructor features, such as the ability to assign specific (vs random) problems with enforced due dates, instructors should contact the developers listed on the Web site to arrange an appropriate collaboration. Possible partnerships with companies, such as textbook publishers, or other organizations are being explored to facilitate a more widespread distribution of the system and its integration into more formalized assignment and assessment programs. The latest information on how to access the system will always be available via the Web site and the respective help page (24).

Acknowledgments

Work supported by an NIH Biomedical Informatics Training grant (LM-07443-01) and NSF grants EIA-0321390 and 0513376 to PB. We acknowledge OpenEye Scientific Software, Peter Ertl of Novartis (JME Editor), and ChemAxon for academic software licenses. We thank Suzanne Blum, David Van Vranken, Zhibin Guan, Elizabeth Jarvo, Susan King, Larry Overman, Mare Taagepera, Chris Vanderwal, and Gregory Weiss, who taught the undergraduate chemistry classes, and all the participating students for their feedback. We acknowledge Peter Phung and Paul Rigor for contributing to software design and development. We thank James Nowick, Scott Rychnovsky, and Kenneth Shea for additional feedback and comments.

Literature Cited

1. Frederiksen, N. Rev. Educ. Res. 1984, 54, 363–407.
2. Chamala, R. R.; Ciochina, R.; Grossman, R. B.; Finkel, R. A.; Kannan, S.; Ramachandran, P. J. Chem. Educ. 2006, 83, 164.
3. LON-CAPA. http://www.loncapa.org/ (accessed Sep 2008).
4. Arasasingham, R. D.; Taagepera, M.; Potter, F.; Martorell, I.; Lonjers, S. J. Chem. Educ. 2005, 82, 1251.
5. OWL. http://owl1.thomsonlearning.com (accessed Sep 2008).
6. Penn, J. H.; Nedeff, V. M.; Gozdzik, G. J. Chem. Educ. 2000, 77, 227–231.
7. WebAssign. http://www.webassign.com (accessed Sep 2008).
8. Joolingen, W. R. v.; Jong, T. d.; Dimitrakopoulou, A. J. Comput. Assist. Lear. 2007, 23, 111–119.
9. Jorgensen, W. L.; Laird, E. R.; Gushurst, A. J.; Fleischer, J. M.; Gothe, S. A.; Helson, H. E.; Paderes, G. D.; Sinclair, S. Pure Appl. Chem. 1990, 62, 1921–1932.
10. Gasteiger, J.; Pfortner, M.; Sitzmann, M.; Hollering, R.; Sacher, O.; Kostka, T.; Karg, N. Perspect. Drug Discov. 2000, 20, 245–264.
11. Socorro, I. M.; Taylor, K.; Goodman, J. M. Org. Lett. 2005, 7, 3541–3544.
12. Satoh, H.; Funatsu, K. J. Chem. Inf. Comp. Sci. 1995, 35, 34–44.
13. OpenEye. http://www.eyesopen.com (accessed Sep 2008).
14. James, C. A.; Weininger, D.; Delany, J. Daylight Theory Manual; Daylight Chemical Information Systems, Inc.: Aliso Viejo, CA, 2008; http://www.daylight.com/dayhtml/doc/theory/ (accessed Sep 2008).
15. Bruice, P. Y. Organic Chemistry, 4th ed.; Prentice-Hall: Upper Saddle River, NJ, 2004.
16. Loudon, M. Organic Chemistry, 4th ed.; Oxford University Press: New York, 2001.
17. Smith, J. G. Organic Chemistry, 2nd ed.; McGraw-Hill: New York, 2006.
18. Falmagne, J.-C.; Koppen, M.; Villano, M.; Doignon, J.-P.; Johannesen, L. Psychol. Rev. 1990, 97, 201–224.
19. Suppes, P. In Artificial Intelligence in Higher Education; Marik, V., Stepankova, O., Zdrahal, Z., Eds.; Springer Verlag: Berlin, 1990; pp 206–225.
20. ChemAxon. http://www.chemaxon.com (accessed Sep 2008).
21. Todd, M. H. Chem. Soc. Rev. 2004, 34, 247–266.
22. Hanessian, S. Curr. Opin. Drug Disc. 2005, 8, 798–819.
23. Chen, J. H.; Linstead, E.; Swamidass, S. J.; Wang, D.; Baldi, P. Bioinformatics 2007, 23 (17), 2348–2351.
24. ChemDB Web Interface Index. http://cdb.ics.uci.edu (accessed Sep 2008).

Supporting JCE Online Material

http://www.jce.divched.org/Journal/Issues/2008/XXX/absXXX.html
Abstract and keywords
Full text (PDF)
Links to cited URLs and JCE articles
Supplement
  Description of the reagent models, single-step reaction drills, and structure details
  User feedback and survey statistics
  Implementation details
