Keystone: A parser and front-end for ISO C++

Papers on Keystone

An Approach for Modeling the Name Lookup Problem in the C++ Programming Language

J. F. Power, B. A. Malloy

Formal grammars are well established for specifying the syntax of programming languages. However, the formal specification of programming language semantics has proven more elusive. A recent standard, the Unified Modeling Language (UML), has quickly become established as a common framework for the specification of large scale software applications. In this paper, we describe an approach for using the UML to solve the name lookup problem for the recently standardized C++ programming language. We apply our approach to C++ because a solution to the name lookup problem is required for parser construction and the development of analysis and testing tools.

Proceedings of the ACM Symposium on Applied Computing, (SAC 2000)

[ citeseer ]

Symbol table construction and name lookup in ISO C++

J. F. Power, B. A. Malloy

In this paper, we present an object-oriented model of symbol table construction and name lookup for ISO C++ using the Unified Modeling Language (UML). Our use of UML class, activity and sequence diagrams serves to explicate our model and our use of patterns such as decorator and facade increase the understandability of the model. Clause three of the ISO C++ standard describes the procedures and rules for performing name lookup; our activity and sequence diagrams serve to simulate these procedures in graphical fashion. An advantage of our approach is that our model can increase C++ understandability for those practitioners with a working UML knowledge.

An important contribution of our work is that our model forms the basis for construction of a parser front-end for ISO C++. Our explication of the name lookup problem represents a necessary first step in this construction and our component approach is independent of the compiler technology utilized. Our use of the UML in describing parser-driven applications demonstrates how front-end development can be integrated into an object-oriented framework. Construction of an ISO C++ front-end will help to increase the collection of tools for applications that use this popular language.

Proceedings of the International Conference on the Technology of Object-Oriented Languages and Systems, (TOOLS 2000)

[ citeseer ]

Decorating tokens to facilitate recognition of ambiguous language constructs

B. A. Malloy, J. F. Power, T. H. Gibbs

Software tools are fundamental to the comprehension, analysis, testing and debugging of application systems. A necessary first step in the development of many tools is the construction of a parser front-end that can recognize the implementation language of the system under development. In this paper, we describe our use of token decoration to facilitate recognition of ambiguous language constructs. We apply our approach to the C++ language since its grammar is replete with ambiguous derivations such as the declaration/expression and template-declaration/expression ambiguity. We describe our implementation of a parser front-end for C++, keystone, and we describe our results in decorating tokens for our test suite including the examples from Clause Three of the C++ standard. We are currently exploiting the keystone front-end to develop a taxonomy for implementation-based class testing and to reverse-engineer UML class diagrams.

Software, Practice and Experience, (Jan 2003)

The Design and Implementation of a Parser and Front-end for the ISO C++ Language and Validation of the Parser

T. H. Gibbs

In this thesis, we address the problems associated with the early phases of compiler development for object-oriented languages: lexical analysis, parsing and construction of a parser front-end. We first show that many language constructs for object-oriented languages cannot be parsed using only syntactic information. We then describe a technique for parsing ambiguous language constructs that exploits semantic information previously gathered in the parse. To demonstrate the effectiveness of the technique we apply it to a language that is notoriously difficult to parse, the C++ programming language. We show that the technique permits parsing of ambiguous C++ constructs in the grammar provided in the ISO C++ standard without modifying, refactoring or extending the grammar.

We then describe a dynamic, automated technique for validating the parser, including the development of a technique that permits validation of class invariants that are not initially valid. The algorithm for this technique accepts, as input, a formal specification of the invariants for the important classes and class hierarchies in the system. The algorithm produces, as output, a validator that exercises these invariants as part of the testing process.

Papers Utilizing Keystone

Reveal: A Tool to Reverse Engineer Class Diagrams

S. Matzko, P. J. Clarke, T. H. Gibbs, B. A. Malloy, J. F. Power

Many systems are constructed without the use of modeling and visualization artifacts, due to constraints imposed by deadlines or a shortage of manpower. Nevertheless, such systems might profit from the visualization provided by diagrams to facilitate maintenance of the constructed system. In this paper, we present a tool, Reveal, to reverse engineer a class diagram from the C++ source code representation of the software. In Reveal, we remain faithful to the UML standard definition of a class diagram wherever possible. However, to accommodate the vagaries of the C++ language, we offer some extensions to the standard notation to include representations for namespaces, stand-alone functions and friend functions. We compare our representation to three other tools that reverse-engineer class diagrams, for both compliance to the UML standard and for their ability to faithfully represent the software system under study.

Proceedings of Technology of Object-Oriented Languages and Systems, (2002)

[ citeseer ]

Identifying Implementation-Based Testing Techniques for Classes

P. J. Clarke, B. A. Malloy

We present an algorithm that automates the process of identifying implementation-based testing techniques that are suitable for testing a given class. The algorithm accepts a summary of the class under test and a set representing testing techniques available to the developer engaged in performing the test. The summary of the class is based on our taxonomy that maps the characteristics of a class in an object-oriented system into our taxonomy, each entry consisting of a nomenclature and feature properties. Each element in the set of testing techniques supplied to the algorithm uses our taxonomy to summarize the class characteristics favored by that technique. Using the nomenclature and feature properties of the class under test, together with a set of available testing techniques, the algorithm identifies a subset of techniques that are appropriate for the class under test.

International Journal of Computers and Information Systems, (Sep 2002)

Using A Taxonomy Tool To Identify Changes in OO Software

P. J. Clarke, B. A. Malloy, J. P. Gibson

In this paper, we present a taxonomy that allows the maintainer to catalog OO classes based on the characteristics of the class. The characteristics of a class include the properties of data items and methods, as well as the relationships with other classes in the application. We construct a tool to track changes across multiple releases of software applications containing hundreds of classes, providing information about each changed class. Our tool identifies class changes in terms of the characteristics exhibited by classes with the same name in different releases of an application.

Proceedings of 7th European Conference on Software Maintenance and Reengineering, (CSMR 2003)

[ citeseer ]

A Parameterized Cost Model to Order Classes for Class-based Testing of C++ Applications

B. A. Malloy, P. J. Clarke, E. L. Lloyd

In this paper we present the design and implementation of a Class Ordering System that is driven by a parameterized cost model. The parameters to the model assign weights to the edge types that describe the relationships between the classes in the graphical representation of the program. The nodes in the graph are classes and the edges express relationships between the classes. Previous research has included three or four edge types in the graph. However, to accommodate the full complement of C++ language constructs, which include template classes and functions and nested classes, we extend the graph to include six edge types. The parameters to the cost model can be tuned to remove certain types of edges in an attempt to reduce the cost of the testing effort or to reduce the cost of breaking cycles in the graph. Our case study indicates that inclusion of inheritance edges in cycle breaking considerations may reduce the number of edge removals by a factor of two or more.

Proceedings of International Symposium on Software Reliability Engineering, (ISSRE 2003)