Applying Case-Based Reasoning To Code Understanding and Generation
1 Introduction
This paper briefly reviews the applicability of case-based reasoning (CBR) to code
understanding (and its inverse, code generation). This topic is of interest because it
suggests a practical route towards much more powerful software engineering
technology where, for example, reuse and interoperability might be managed
directly, by systems rather than by human operators.
The rest of this paper is organised as follows:
Section 2 introduces code understanding and code generation, and discusses the
motivations for them.
Section 3 discusses the benefits of CBR, and why it is appropriate to apply CBR
to code understanding and generation.
Section 4 reviews some of the existing approaches to code understanding and
generation from a CBR perspective - both those which claim to take a CBR
approach, and those that don't but have some affinity with CBR.
1. This work was supported by the Engineering and Physical Sciences Research Council
(EPSRC).
2. Although pedantically it could be argued that compilers extract knowledge that is at a lower
level of abstraction than the code itself!
knowledge about all domains [22]. That implies having a prohibitively large
case library. Analogical reasoning [23] should be able to help, because it would
allow cases to be used analogously across different domains.
Clichés are naturally also useful for code generation, as overlays suggest how
given specification clichés can be implemented. Clichés are used in this way in the
Programmer's Apprentice [36,37], a semi-automatic programming environment.
The underlying philosophy is the same as that of CBR, but the user has to tell the
system which cliché to use and how to adapt it - the system is merely carrying out
mechanical transformations, whereas a CBR system should be able to decide for
itself which cliché to use and what transformations to apply.
The cliché-based approaches to code understanding and generation are clearly in
the spirit of CBR, and share the same underlying philosophies. The key difference
is that clichés do not admit of any partial matching and adaptation, and thus do not
have the flexibility or efficiency (see footnote 3) that a CBR approach could offer.
Clichés also tend to be small chunks of knowledge, whereas cases would tend to be
large chunks - several clichés could be extracted all at once. That is why Quilici
calls his work a memory-based approach [1] instead of a case-based approach (cf. [38]).
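The contrast between all-or-nothing cliché recognition and CBR-style partial matching can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from any of the systems reviewed here: the `Case` record, the feature names, the use of Jaccard similarity as the matching measure, and the 0.5 threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    """A stored chunk of program knowledge (hypothetical representation)."""
    name: str
    features: frozenset  # abstract features expected in the code


def cliche_match(library, observed):
    """Exact matching: an entry is recognised only if ALL its features are present."""
    return [c.name for c in library if c.features <= observed]


def jaccard(a, b):
    """Jaccard similarity of two feature sets (an assumed similarity measure)."""
    return len(a & b) / len(a | b) if (a or b) else 1.0


def cbr_retrieve(library, observed, threshold=0.5):
    """Partial matching: rank cases by similarity and report what adaptation must fix."""
    ranked = []
    for case in library:
        score = jaccard(case.features, observed)
        if score >= threshold:
            missing = case.features - observed  # in the case but absent from the code
            extra = observed - case.features    # in the code but unexplained by the case
            ranked.append((score, case.name, missing, extra))
    return sorted(ranked, key=lambda r: r[0], reverse=True)


# Hypothetical library and observation, for illustration only.
library = [
    Case("linear-search", frozenset({"loop", "index", "compare", "early-exit"})),
    Case("accumulate-sum", frozenset({"loop", "index", "accumulator", "add"})),
]
observed = frozenset({"loop", "index", "compare", "flag"})

print(cliche_match(library, observed))  # no exact match for this observation
print(cbr_retrieve(library, observed))  # nearest case plus its mismatches
```

Here the observed code uses a flag instead of an early exit, so exact matching recognises nothing, whereas partial matching still retrieves the linear-search case and reports the differences (`early-exit` missing, `flag` unexplained) that an adaptation step would have to reconcile.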
The field of software reuse is strongly related to CBR (cf. Section 2.1) - the
terms "retrieval" and "adaptation" are part of the reuse community's vocabulary
(Section V of [11]), but the term "case-based reasoning" is not in their mainstream
vocabulary. As well as the explicit CBR approaches to reuse mentioned in Section
4.1, there is a lot of "covert" CBR (e.g. [39]).
Design patterns [40] share the philosophy which underlies CBR - they describe
solutions to recurring software problems which conform to the principles of good
object-oriented programming (flexibility, elegance and reusability). However, in
their current avatar, design patterns are intended as guidelines for human
programmers rather than automatic programmers, although it is quite easy to
conceive of automatically generating particular implementations of design patterns
or recognising them in existing code.
Matsuura et al. [41] describe a system called EVA which derivationally replays
the process of deriving a program from a specification when the specification
changes. It is thus doing adaptation without needing a retrieval phase first (evolution
as opposed to reuse). It is conceivable that EVA could be extended to be a reuse tool
by augmenting it with a retrieval phase, and it would not then be unreasonable to
call it a case-based automatic programming system.
3. The efficiency of CBR is that it should be able to recognise larger chunks at once and avoid
combinatorial explosion. This should outweigh the cost of retrieval, partial matching and
adaptation.
Maiden and Sutcliffe [42] describe an analogical reasoning approach to
specification reuse, i.e. developing new specifications by analogy with old ones.
Analogical reasoning [23] is like CBR except that the case which is used to solve
the new problem can be from a different domain (e.g. a heat flow problem could be
solved by analogy with a current electricity problem, and a situation involving
planets orbiting a star could be understood by analogy with a situation involving
electrons orbiting the nucleus of an atom). Analogical reasoning requires a process
of analogical mapping to determine the correspondences between two analogous
situations. Keane [43] suggests that much adaptation is latent in this analogical
mapping process.
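As a rough illustration of what analogical mapping involves, the sketch below searches for the entity correspondence that preserves the most relations between two domains, using the planets/electrons example. It is a brute-force toy under stated assumptions, not the method of [23], [42] or [43]: the triple representation, the relation names and the function `analogical_mapping` are all hypothetical, and the search assumes the source domain has no more entities than the target.

```python
from itertools import permutations


def analogical_mapping(source, target):
    """Find the entity correspondence that preserves the most relations.

    Each domain is a set of (relation, subject, object) triples. Assumes the
    source has no more entities than the target; otherwise nothing is tried.
    """
    src_entities = sorted({e for _, a, b in source for e in (a, b)})
    tgt_entities = sorted({e for _, a, b in target for e in (a, b)})
    best_map, best_score = {}, -1
    # Try every injective assignment of source entities to target entities.
    for perm in permutations(tgt_entities, len(src_entities)):
        mapping = dict(zip(src_entities, perm))
        # Count source relations that still hold after translation.
        score = sum((rel, mapping[a], mapping[b]) in target for rel, a, b in source)
        if score > best_score:
            best_map, best_score = mapping, score
    return best_map, best_score


# Hypothetical relational descriptions of the two analogous situations.
solar_system = {
    ("attracts", "star", "planet"),
    ("orbits", "planet", "star"),
    ("more_massive_than", "star", "planet"),
}
atom = {
    ("attracts", "nucleus", "electron"),
    ("orbits", "electron", "nucleus"),
    ("more_massive_than", "nucleus", "electron"),
}

mapping, score = analogical_mapping(solar_system, atom)
print(mapping, score)
```

Once the correspondence is known (planet to electron, star to nucleus), translating a solution from the source domain into the target domain is largely a matter of substituting mapped entities, which is one way to read the suggestion that much adaptation is latent in the mapping. The exhaustive search is exponential in the number of entities, so it is only workable for very small descriptions.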
5 Future Directions
Our research focuses on understanding the constraints in information models [44].
This is a special case of code understanding (information models are written in a
formal language, namely EXPRESS [5], which is a kind of code). The project aims
to take a CBR approach to extracting higher-level knowledge about constraints
from the EXPRESS code. The constraint-understanding system is also being
extended to other kinds of code. Candidates for this extension include:
other information modelling languages (e.g. UML [6]);
imperative program code (e.g. Java [45] or C/C++ [46]);
machine code;
functional languages (e.g. SML [47] or LISP [48]);
logic languages (e.g. Prolog [49]);
formal specification languages (e.g. VDM (Appendix B of [2]) or Z [3]);
database query languages (e.g. SQL [4]);
scanner/parser generators (e.g. lex and yacc [50] or JavaCC [51]).
References
1. Quilici A. A memory-based approach to recognizing programming plans.
Communications of the ACM 1994; 37(5):84-93
2. van Vliet JC. Software engineering: Principles and perspective. John Wiley & Sons,
Chichester, 1993
3. Spivey JM. The Z notation. Prentice Hall, New York, 1994
4. Elmasri R, Navathe SB. Fundamentals of database systems, 2nd edition. The Benjamin/
Cummings Publishing Company, Inc., Redwood City, 1994. Chapter 7
5. ISO TC184/SC4. Industrial automation systems and integration - Product data
representation and exchange - Part 11: Description methods: The EXPRESS language
reference manual. ISO standard, reference no. ISO 10303-11. ISO, Switzerland, 1994
6. Rational Software Corporation. UML Resource Center: Unified Modeling Language,
Standard Software Notation. http://www.rational.com/uml/index.jtmpl
7. Chikofsky EJ, Cross JH II. Reverse engineering and design recovery: A taxonomy. IEEE
Software 1990; 7(1):13-17
8. Rich C, Waters RC. Automatic programming: Myths and prospects. IEEE Computer
1988; 21(8):40-51
9. Ning JQ, Engberts A, Kozaczynski W. Recovering reusable components from legacy
systems by program segmentation. In: Waters RC, Chikofsky EJ (eds) Proceedings of the
1st IEEE Working Conference on Reverse Engineering, Baltimore, 21st-23rd May 1993.
IEEE Computer Society Press, 1993, pp 64-72
10. Ning JQ, Engberts A, Kozaczynski W. Automated support for legacy code
understanding. Communications of the ACM 1994; 37(5):50-57
11. Mili H, Mili F, Mili A. Reusing software: Issues and research directions. IEEE
Transactions on Software Engineering 1995; 21(6):528-562
12. Cimitile A, De Lucia A, Munro M. An overview of structural and specification driven
candidature criteria for reuse reengineering processes. Department of Computer Science,
University of Durham, 1995, technical report 7/95.
http://www.dur.ac.uk/CSM/projects/RE2/ps/TR7dur.ps
13. Cimitile A, De Lucia A, Munro M. Identifying reusable functions using specification
driven program slicing: A case study. In: Proceedings of the IEEE International
Conference on Software Maintenance, Nice, October 1995, pp 124-133.
http://www.dur.ac.uk/CSM/projects/RE2/ps/ICSM95.ps
14. Canfora G, Cimitile A, Munro M. An improved algorithm for identifying objects in code.
Software - Practice & Experience 1996; 26(1):25-48
15. Kolodner JL. Case-based reasoning. Morgan Kaufmann Publishers, Inc., San Mateo,
1993
16. Watson ID. Applying case-based reasoning: Techniques for enterprise systems. Morgan
Kaufmann Publishers, Inc., San Francisco, 1997
17. Brown MG. A memory model for case retrieval by activation passing. PhD thesis,
University of Manchester, Manchester, 1993, technical report UMCS-94-2-1.
ftp://ftp.cs.man.ac.uk/pub/TR/UMCS-94-2-1.ps.Z
18. Rich C, Shrobe HE. Design of a Programmer's Apprentice. In: Winston PH, Brown RH
(eds) Artificial intelligence: An MIT perspective, vol 1: Expert problem solving, natural
language understanding, intelligent computer coaches, representation and learning. MIT
Press, Cambridge, MA, 1979, pp 137-173
35. Quilici A. A hybrid approach to recognizing programming plans. In: Waters RC,
Chikofsky EJ (eds) Proceedings of the 1st IEEE Working Conference on Reverse
Engineering, Baltimore, 21st-23rd May 1993. IEEE Computer Society Press, 1993, pp
126-133. http://www-ee.eng.hawaii.edu/~alex/Research/Postscript/Papers/wcre93.ps
36. Waters RC. The Programmer's Apprentice: A session with KBEmacs. IEEE
Transactions on Software Engineering 1985; 11(11): 1296-1320
37. Rich C, Waters RC. The Programmer's Apprentice: A research overview. IEEE
Computer 1988; 21(11):10-25
38. Aamodt A, Plaza E. Case-based reasoning: Foundational issues, methodological
variations, and system approaches. AICom - Artificial Intelligence Communications
1994; 7(1):39-59. http://www.iiia.csic.es/People/enric/AICom.html
39. Jeng J-J, Cheng BHC. A formal approach to reusing more general components. In:
Proceedings of the 9th IEEE Knowledge-Based Software Engineering Conference,
Monterey, September 1994, pp 90-97.
ftp://ftp.cps.msu.edu/pub/serg/reuse/kbse94-reuse.ps.Z
40. Gamma E, Helm R, Johnson R, Vlissides J. Design patterns: Elements of reusable
object-oriented software. Addison-Wesley Longman, Inc., Reading, 1995
41. Matsuura S, Kuruma H, Honiden S. EVA: A flexible programming method for evolving
systems. IEEE Transactions on Software Engineering 1997; 23(5):296-313
42. Maiden NA, Sutcliffe AG. Exploiting reusable specifications through analogy.
Communications of the ACM 1992; 35(4):55-64
43. Keane MT. Analogical asides on case-based reasoning. In: Wess S, Althoff K-D, Richter
MM (eds) Topics in Case-Based Reasoning, Selected Papers from the First European
Workshop (EWCBR-93), Kaiserslautern, November 1993. Springer-Verlag, Berlin, 1994,
pp 21-32 (Lecture notes in computer science no. 837)
44. Broad AP. The application of case-based reasoning to the understanding of constraints on
information models. MPhil thesis, University of Manchester, Manchester, 1999.
http://www.cs.man.ac.uk/~broada/cs/mphil/thesis/
45. Grand M. Java language reference. O'Reilly & Associates, Inc., Sebastopol, 1997
46. Stroustrup B. The C++ programming language, 2nd edition. Addison-Wesley Publishing
Company, Inc., Reading, 1991
47. Myers C, Clack C, Poon E. Programming with Standard ML. Prentice Hall, New York,
1993
48. Wilensky R. Common LISPcraft. Norton Press, New York, 1986
49. Clocksin WF, Mellish CS. Programming in Prolog, 4th edition. Springer-Verlag, Berlin,
1994
50. Levine JR, Mason T, Brown D. Lex & yacc, 2nd edition. O'Reilly & Associates, Inc.,
Sebastopol, 1992
51. Sun Microsystems, Inc. Java Compiler Compiler (JavaCC) - The Java Parser Generator.
http://www.sun.com/suntest/products/JavaCC/