Contributor Documentation

2. Overview

Scalpel behaves just like any other C++ compiler front-end. It consists of a preprocessor, a syntax analyzer and a semantic analyzer.

[Note] Missing feature

Currently, Scalpel can only handle a single translation unit. There might be an equivalent of a linker in the final version.

Before diving into the details, let's briefly define the different steps and data structures.

2.1. Raw C++ source code

Raw C++ source code is simply the code you write in your text editor. Say we wrote a simple library containing a single function, square(), which squares a given number. We carefully declared the function in a header file (square.hpp) and defined it in an implementation file (square.cpp):

#ifndef SQUARE_HPP
#define SQUARE_HPP

//returns the square of number
double square(const double number);

#endif

#include "square.hpp"

double square(const double number)
{
	return number * number;
}

2.2. Preprocessing and pure C++ source code

The preprocessor produces what's called pure C++ source code. It performs preliminary tasks such as resolving preprocessor directives (#include, #define, etc.) and removing comments. Here's what the preprocessor outputs for square.cpp:

double square(const double number);

double square(const double number)
{
	return number * number;
}

The preprocessor's output is a simple std::string.
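To make the comment removal step more concrete, here is a minimal sketch of how comments could be stripped from a std::string. The function strip_comments() is a hypothetical helper written for this page, not part of Scalpel's API, and it makes a simplifying assumption: string and character literals are not recognized, so comment markers inside them would be stripped too.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

//Hypothetical helper, not part of Scalpel's API.
//Removes // and /* */ comments from source.
//Simplification: string and character literals are not recognized.
std::string strip_comments(const std::string& source)
{
	std::string output;
	std::size_t i = 0;
	while(i < source.size())
	{
		if(source.compare(i, 2, "//") == 0)
		{
			//skip up to (but not including) the end of the line
			while(i < source.size() && source[i] != '\n')
				++i;
		}
		else if(source.compare(i, 2, "/*") == 0)
		{
			//skip up to and including the closing */
			//(an unterminated comment runs to the end of the input)
			const std::size_t end = source.find("*/", i + 2);
			i = (end == std::string::npos) ? source.size() : end + 2;
			//a block comment is replaced by a single space
			output += ' ';
		}
		else
		{
			output += source[i];
			++i;
		}
	}
	return output;
}
```

Replacing a block comment with a single space keeps the tokens on either side of it from being glued together. A real preprocessor would also have to resolve directives and handle literals, which this sketch deliberately leaves out.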

2.3. Syntax analysis and tree

Then comes the syntax analysis. It consists of constructing an object structure called a syntax tree.

The type of a node can be one of the two hundred syntax node types of the C++ language, among which are declaration, statement, function_definition, logical_or_expression, literal and many others. All of these types are located in the scalpel::cpp::syntax_nodes namespace.

The terminal nodes contain the text tokens of the input source code.

The type of the root node of the tree is scalpel::cpp::syntax_nodes::translation_unit.
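The idea of a tree whose terminal nodes carry the source tokens can be sketched as follows. This is only an illustration under stated assumptions: the single generic node struct and the helper functions are hypothetical, whereas Scalpel uses a distinct type per node kind in scalpel::cpp::syntax_nodes. The leaf node names are borrowed from the C++ grammar and may not match Scalpel's own names.

```cpp
#include <cassert>
#include <string>
#include <vector>

//Hypothetical generic node, for illustration only.
//Scalpel's real syntax nodes are separate types.
struct node
{
	std::string type; //e.g. "translation_unit", "declaration"
	std::string text; //token text, used by terminal nodes only
	std::vector<node> children;
};

//appends the token of every terminal node, from left to right
void collect_tokens(const node& n, std::vector<std::string>& tokens)
{
	if(n.children.empty())
	{
		tokens.push_back(n.text);
		return;
	}
	for(const node& child: n.children)
		collect_tokens(child, tokens);
}

//builds a drastically simplified tree for the tokens "double square"
node make_example_tree()
{
	node type_specifier{"simple_type_specifier", "double", {}};
	node identifier{"identifier", "square", {}};
	node declaration{"declaration", "", {type_specifier, identifier}};
	return node{"translation_unit", "", {declaration}};
}
```

Walking the tree depth-first and reading only the terminal nodes gives back the tokens in source order, which is why the syntax tree is a faithful, structured view of the pure C++ source code.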

The figure below shows a slightly simplified representation (UML object diagram) of the syntax tree of our example. More precisely, this is the syntax tree of the function declaration only (the whole syntax tree would probably not fit on your screen).

2.4. Semantic analysis and graph

The final step is the semantic analysis. It produces the semantic graph, the object structure most of Scalpel's users will deal with. Here we finally get back the concrete notions of namespaces, classes, functions and variables.

The root object will always be an anonymous namespace, which represents the implicit global namespace.
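As a rough picture of such a graph, the sketch below models a global namespace holding the square() function of our example. All type and function names here (namespace_entity, function_entity, find_function, make_example_graph) are assumptions made for illustration and do not reflect Scalpel's actual semantic classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

//Hypothetical entities, for illustration only
struct function_entity
{
	std::string name;
	std::string return_type;
	std::vector<std::string> parameter_types;
};

struct namespace_entity
{
	std::string name; //left empty for the global namespace
	std::vector<namespace_entity> namespaces;
	std::vector<function_entity> functions;
};

//returns the function declared directly in ns with the given name,
//or a null pointer if there is none
const function_entity* find_function
(
	const namespace_entity& ns,
	const std::string& name
)
{
	for(const function_entity& f: ns.functions)
		if(f.name == name)
			return &f;
	return nullptr;
}

//builds the graph of our example: a nameless root namespace
//containing the square() function
namespace_entity make_example_graph()
{
	namespace_entity global_namespace;
	global_namespace.functions.push_back({"square", "double", {"const double"}});
	return global_namespace;
}
```

Unlike the syntax tree, this structure no longer mirrors the grammar: the declaration and the definition of square() end up described by a single function object.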

Here is the UML object diagram of the semantic graph of our example:

[Note] Missing feature

Scalpel isn't able to analyze function bodies yet. This is why there's no object describing the return statement of the function.