Programming Macrocoder
Book 1
Lifecycles, grammars and phases
Macrocoder allows the development of macrocoding code generators in minutes. To allow this, a dedicated programming paradigm, called phase programming, has been developed.
This manual will go through all the basic concepts on which the Macrocoder programming is based.
The prerequisites required for this tutorial are:
When working with a tool like Macrocoder, some confusion about roles might arise. A programmer "A" writes an FCL program that implements a compiler for a new language "L" that she or he has invented. Then a programmer "B" writes a program in "L" using the compiler that "A" developed with Macrocoder. Both programmers use Macrocoder, but in two different roles. We shall call the one who develops the language (i.e. programmer "A") the rules developer, while the one who uses the new language will be the target developer.
This entire document is dedicated to the rules developer.
Some confusion can also arise when talking about source files. Macrocoder projects involve three kinds of source files and it is very important to always know which is which. These are the three categories, each with the name we shall use to refer to it:
Macrocoder does its job by executing a program written in a language, similar to Java or C++, called "FCL". The program is written within the Macrocoder IDE which will take care of compiling and executing it.
The FCL language is based on common concepts: it has classes, methods, attributes, inheritance and interfaces, like Java or C++, plus some extra unique features designed specifically for macro code generation.
The FCL language, like any other object-oriented one, supports the concepts of "class" and "object". The class is the description of the type: what methods and attributes it has. The object is a segment of memory allocated to host the data requested by the class definition. If a class is the design of a car, an object is one real car of that model. The action of creating an object that follows the rules defined in a class is called instantiation.
In FCL some classes are created by the developer while others are generated automatically by the Macrocoder environment. Finally, several utility classes are available from the included Macrocoder Library.
As we will see, this also applies to objects: some objects are created by the programmer, while others are automatically instantiated by the Macrocoder Runtime Environment.
A class definition in FCL is very similar to Java or C++:
class MyClass {
    // Numeric attribute
    Int myNumber;

    // Text attribute
    String myText;

    // Method
    Void showContents () const {
        system().msg << "Number:" << myNumber << " text:" << myText << endl;
    }
}
By default, all members are public. Although FCL supports public/protected/private as expected, they are seldom used because of the greater power of phase protection, which will be explained later.
One of the unusual concepts widely used in FCL is class extension. Class extension allows the definition of a class to be split across multiple files. For example, the following declaration generates the same class as the example above, but the three components (myNumber, myText and showContents) are added separately:
class MyClass {
    // Numeric attribute
    Int myNumber;
}

extend class MyClass {
    // Text attribute
    String myText;
}

extend class MyClass {
    // Method
    Void showContents () const {
        system().msg << "Number:" << myNumber << " text:" << myText << endl;
    }
}
The extension concept is very important in Macrocoder programming, because some parts of the process require adding methods to classes created internally by Macrocoder. Since internal classes do not have a source file with "class ...", the only way to add methods and attributes to them is by extension.
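As a rough analogy only, the effect of class extension can be pictured in Python, which has no `extend class` keyword but does allow attributes and methods to be attached to a class after it has been defined. The snippet below is Python, not FCL, and the snake_case names are illustrative stand-ins for myNumber, myText and showContents.

```python
# Python analogy for FCL class extension (this is not FCL syntax).
class MyClass:
    """Initial definition: only the numeric attribute."""

    def __init__(self):
        self.my_number = 0


# "Extension" 1: add a text attribute with a default value.
MyClass.my_text = ""


# "Extension" 2: add a method after the class has been defined.
def show_contents(self):
    return f"Number:{self.my_number} text:{self.my_text}"


MyClass.show_contents = show_contents

obj = MyClass()
obj.my_number = 7
obj.my_text = "seven"
print(obj.show_contents())
```

The point of the analogy is that the class ends up the same regardless of how many separate pieces contributed to it.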
Macrocoder supports some primitive types. They are:
Int | 32-bit signed integer values |
Huge | 64-bit signed integer values |
Float | 64-bit floating point values |
String | Unicode string of any length |
DecoString | Unicode string of any length with decoration (bold, italic, font size, etc.) |
Binary | Generic binary data |
Void | Used as a method return type for procedures that do not return any value |
There is no Boolean type: the boolean expressions are implemented with the Int type.
Composite types are a means of grouping different objects into a complex structure. Composite types are characterized by ownership: the container owns the contained objects. This means that if the container is deleted, so are the contained objects. Also, a given object can be owned by at most one container.
At this stage, we shall take a quick overview. We shall treat them in detail later on.
The class is obviously the first way to create composite types. A class can contain attributes of other types. In this example, class Bar contains two instances of class Foo called f1 and f2:
class Foo {
    Int x;
    Int y;
}

class Bar {
    Foo f1;
    Foo f2;
}
The array allows one class to own a variable number of instances of another class.
In the example below, the container class Bar can contain any number of instances of Foo (or any type derived from Foo) in an array named myFoos. The array owns the objects it contains.
class Foo {
    Int x;
    Int y;
}

class Bar {
    array of Foo myFoos;
}
The variant can contain zero or one element of the indicated type or of a type derived from it. It is exactly like an array, but limited to a maximum of one element.
class Foo {
    Int x;
    Int y;
}

class Bar {
    variant of Foo myFoo;
}
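The ownership rules described for composite types (a container owns its contents, deleting the container deletes them, and an object has at most one owner) can be sketched in Python. This is an illustrative model, not Macrocoder's implementation; the `Node` class and its `own` and `delete` methods are hypothetical names.

```python
# Python sketch of single ownership (an analogy, not Macrocoder's implementation).
class Node:
    def __init__(self, name):
        self.name = name
        self.owner = None      # at most one owning container
        self.children = []     # objects owned by this node

    def own(self, child):
        """Take ownership of child; refuse if it already has an owner."""
        if child.owner is not None:
            raise ValueError(f"{child.name} is already owned by {child.owner.name}")
        child.owner = self
        self.children.append(child)

    def delete(self):
        """Deleting a container deletes everything it owns; returns deleted names."""
        deleted = []
        for child in self.children:
            deleted.extend(child.delete())
        self.children.clear()
        deleted.append(self.name)
        return deleted


bar = Node("bar")
f1, f2 = Node("f1"), Node("f2")
bar.own(f1)
bar.own(f2)

other = Node("other")
try:
    other.own(f1)          # f1 already belongs to bar
    rejected = False
except ValueError:
    rejected = True

deleted_names = bar.delete()
```

Because every object has at most one owner, the ownership links can never form a cycle, which is why the objects end up arranged as a tree.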
One of the key concepts in Macrocoder programming is the lifeset.
A lifeset does two things:
A lifecycle is a container of lifesets. Most Macrocoder projects will work with one single lifecycle, whose default name is MAIN. For this tutorial we will assume that there is one single lifecycle with the default name MAIN.
In Macrocoder, types can be divided into three families:
The utility library classes are available globally and can be used anywhere. Although they can also be user-defined, in most projects they are represented only by the utility classes coming from the Macrocoder library. They include classes and functions to manipulate strings, files and so on.
The other two families are always created within a namespace called lifeset. A Macrocoder project always has at least one lifeset, but usually it has two (we'll see later why).
In this code snippet, we have defined the Foo class within a lifeset called CORE:
lifeset CORE;

class Foo {
    Int x;
    Int y;
}
Lifesets are created automatically as soon as they are "cited". As soon as the declaration lifeset X; is made, the lifeset X begins to exist in the project. The declaration lifeset X; can then be repeated any number of times (actually, it has to be repeated in every source file that refers to that lifeset).
The lifeset is not only a namespace for classes: it is also a root container for objects. Every lifeset has a singleton object, called the cauldron, that serves as root container for all the objects instantiated within the lifeset itself.
In the Macrocoder language, all the objects are dynamic: they are normally destroyed as soon as the function that created them returns. The only way to keep them alive is to assign them to an owning container: an array, a variant or the lifeset cauldron.
Thanks to the above composite data types, the Macrocoder objects form trees, i.e. acyclic graphs that start from a single root object. All the root objects are stored in the "cauldron".
A Macrocoder program actually implements a compiler. The first thing every compiler does is to read the source file it has to compile. We know from the chapter Rules and target that the compiler is developed by the rules developer and the source file it reads is written by the target developer.
The first step performed by Macrocoder is parsing, also known as syntax analysis. During this operation, the input source is verified against the syntax defined by the rules developer. Checking the syntax means verifying that the sequence of keywords and other elements conforms to the defined rules.
For example, this snippet conforms to Java syntax:
class HelloWorld { }
Instead, although it contains exactly the same information, this snippet does not conform to Java syntax:
HelloWorld class { }
Indeed, the syntax of Java requires the keyword class followed by an identifier representing the name of the class, then by an open brace {, and so on. The second example does not respect this sequence, thus violating the syntax rules. The Java compiler would complain, signaling a syntax error at line 1.
In order to be able to validate the target source and show any violation of the syntax rules, Macrocoder, like any other parser, must be aware of the syntax rules currently in force.
This is done by adding a grammar definition to the Macrocoder rules project (see the beginners' guide for step-by-step instructions on how to create and edit a grammar).
A grammar definition is a tree of parsing rules that starts from a given root node. Let's see a very simple Macrocoder grammar definition designed to parse exactly the Java example above:
The above grammar definition contains one single rule called MyRule (1). The small red box containing the text "java" (2) means that this is the root rule for files whose file name extension is .java. In other words, every *.java file added to the target project will be parsed starting from rule MyRule.
The rule specifies that a valid target source file must begin with the keyword class (3) followed by an identifier (4).
An identifier is any non-empty sequence of letters (A-Z, a-z), digits (0-9) or underscores (_) that does not begin with a digit and that does not match any defined keyword. So abc is a valid identifier, while x@y is not because it contains an invalid character (@).
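The identifier rule can be restated as a small Python check. This is only a sketch of the rule as described here, not Macrocoder's own lexer; the `KEYWORDS` set is an assumption covering just the keyword class used in this grammar.

```python
import re

# Assumed keyword set for the tiny grammar above (hypothetical).
KEYWORDS = {"class"}

# Letters, digits or underscores, not starting with a digit.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """True if text is a non-empty identifier that is not a keyword."""
    return IDENTIFIER_RE.fullmatch(text) is not None and text not in KEYWORDS
```

With this check, `is_identifier("abc")` holds, while `"x@y"`, `"1abc"` and the keyword `"class"` are all rejected.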
After the identifier, the grammar expects the keyword { (5) and the keyword } (6). Finally, the end of rule symbol (7) states that nothing else may follow.
The rule above is called a sequence because its items are to be expected once and in the specified sequence.
The grammar defined in the previous chapter accepts this target source file:
class FirstClass { }
However, it rejects this source file:
class FirstClass { } class SecondClass { }
The reason is very simple: the above grammar expects one class name{}, not class name1{} class name2{}.
If we want to be able to accept a sequence made of any number of class name{}, we must use a repetition:
The repetition symbol (1) means that the contained sub rule (2) can be repeated zero or more times. With this addition, a sequence of any number of class name{} is accepted.
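To make the difference between a sequence and a repetition concrete, here is a minimal hand-written parser in Python. It is an analogy for what the graphical grammar expresses, not how Macrocoder works internally; the function names `tokenize`, `parse_one_class` and `parse_classes` are made up for this sketch.

```python
import re

# An identifier-like word, or any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\S")
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(source):
    """Split the source into words and single-character symbols."""
    return TOKEN_RE.findall(source)


def parse_one_class(tokens, pos):
    """Sequence: 'class' identifier '{' '}'. Returns (name, new_pos)."""
    if tokens[pos:pos + 1] != ["class"]:
        raise SyntaxError(f"expected 'class' at token {pos}")
    name = tokens[pos + 1] if pos + 1 < len(tokens) else ""
    if not IDENT_RE.fullmatch(name) or name == "class":
        raise SyntaxError(f"expected identifier at token {pos + 1}")
    if tokens[pos + 2:pos + 4] != ["{", "}"]:
        raise SyntaxError(f"expected '{{ }}' at token {pos + 2}")
    return name, pos + 4


def parse_classes(source):
    """Repetition: zero or more class declarations, then end of input."""
    tokens = tokenize(source)
    pos, names = 0, []
    while pos < len(tokens):
        name, pos = parse_one_class(tokens, pos)
        names.append(name)
    return names
```

Here `parse_one_class` plays the role of the sequence, and the `while` loop in `parse_classes` plays the role of the repetition symbol: it accepts an empty file, one class, or many.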
The reference symbol allows splitting a long rule into smaller sub rules; it also allows reusing the same sub rule in multiple places.
In this case, the OneClass sub rule has been placed in its own standalone rule (2). It is then referenced (1) by the MyRule rule.
The rules shown above were able to recognize a fixed sequence. However, in almost every language, the grammar must be able to recognize different alternatives.
For example, in the snippet below we have a sequence of entries that can be either class or interface:
class FirstClass { } interface MyInterface { } class SecondClass { }
This can be achieved using the choice symbol:
As the graph lines intuitively suggest, at the choice symbol the flow splits among multiple valid choices. The parsing can continue with a class keyword or with interface. The effect of the specification above is that the grammar expects a sequence of any number of class ... or interface ... entries written in any order.
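In hand-written parser terms, a choice is simply a branch on what comes next. The Python sketch below illustrates this under two simplifying assumptions that are not part of Macrocoder: tokens are separated by whitespace, and names are not validated as identifiers. The function name `parse_entries` is hypothetical.

```python
def parse_entries(source):
    """Zero or more entries, each either 'class' or 'interface', name, '{', '}'."""
    tokens = source.split()
    entries, pos = [], 0
    while pos < len(tokens):
        kind = tokens[pos]
        # Choice: the flow splits on which keyword comes next.
        if kind not in ("class", "interface"):
            raise SyntaxError(f"expected 'class' or 'interface', got {kind!r}")
        if len(tokens) < pos + 4 or tokens[pos + 2:pos + 4] != ["{", "}"]:
            raise SyntaxError(f"malformed {kind} declaration")
        entries.append((kind, tokens[pos + 1]))
        pos += 4
    return entries


result = parse_entries(
    "class FirstClass { } interface MyInterface { } class SecondClass { }"
)
```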
Another very common case happens when an entry is optional. For example:
class Super { } class Sub extends Super { }
The extends Super part is a component that can be optionally present in a Java class definition.
This is nothing more than a choice in which one branch is an empty rule:
Here we have again a choice symbol (1), but we activated the optional line (2), which states that one of the options is simply to skip all the other options entirely.
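The "choice with an empty branch" idea can be sketched in Python as an `if` without an `else`: either the optional part is consumed, or nothing is. As before, this is an analogy with whitespace-separated tokens assumed, and `parse_class_header` is a hypothetical name.

```python
def parse_class_header(source):
    """Parse 'class' Name ['extends' Super] '{' '}' from space-separated tokens."""
    tokens = source.split()
    if len(tokens) < 2 or tokens[0] != "class":
        raise SyntaxError("expected 'class' followed by a name")
    name, pos = tokens[1], 2
    parent = None
    # Optional branch: either consume 'extends Super' or skip it entirely.
    if pos < len(tokens) and tokens[pos] == "extends":
        if pos + 1 >= len(tokens):
            raise SyntaxError("expected a name after 'extends'")
        parent = tokens[pos + 1]
        pos += 2
    if tokens[pos:] != ["{", "}"]:
        raise SyntaxError("expected '{ }'")
    return name, parent
```

Parsing `class Super { }` yields no parent, while `class Sub extends Super { }` yields `Super` as the parent; both are accepted by the same rule.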
In a grammar definition, a terminal is a rule that cannot be further expanded. Terminals are the leaves of the grammar tree. In the case of Macrocoder, terminals are keywords and fields.
The keyword terminal matches exactly the indicated text. It can be a word like class or a symbol like +.
The identifier terminal matches any sequence of ASCII characters in the range A-Z, a-z and 0-9 plus the underscore symbol _. To be a valid identifier, the sequence must not begin with a digit. Also, a valid identifier must not match words defined in the grammar as keywords. Valid examples are a and abc_123.
The quoted terminal matches a string enclosed within quotes, for example "Hello world!". The characters within the quotes support some escapes, exactly as in C or Java. For example, to represent the string My display is 15" wide, the quoted string must be written as "My display is 15\" wide".
The numeric terminal matches a number in various forms; a number always begins with a digit in the range 0...9. Numbers can be expressed as decimal (e.g. 123), floating point (1.23, 0.2E+4, etc.) or hexadecimal (0xAF12).
The freetext terminal matches a special kind of text managed by the Macrocoder editor. This text mode is activated within the editor and it is highlighted by a yellow background.
For example, this rule:
matches a string like this:
set myNumber = 1234
While this rule:
matches a string like this:
set myText = "Hello world"
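The numeric and quoted terminals used in these two rules can be approximated with regular expressions. The patterns below are a Python sketch based only on the examples given in this chapter (123, 1.23, 0.2E+4, 0xAF12 and backslash escapes in quoted strings); the exact forms accepted by Macrocoder may differ, and all names are illustrative.

```python
import re

IDENT = r"[A-Za-z_][A-Za-z0-9_]*"
# Numeric: hexadecimal, floating point (with optional exponent) or decimal.
NUMERIC = r"0[xX][0-9A-Fa-f]+|[0-9]+\.[0-9]+(?:[eE][+-]?[0-9]+)?|[0-9]+"
# Quoted: double quotes with backslash escapes allowed inside.
QUOTED = r'"(?:[^"\\]|\\.)*"'

# One pattern per rule: 'set' identifier '=' numeric, and 'set' identifier '=' quoted.
SET_NUMBER_RE = re.compile(rf"set\s+({IDENT})\s*=\s*({NUMERIC})")
SET_TEXT_RE = re.compile(rf"set\s+({IDENT})\s*=\s*({QUOTED})")

number_match = SET_NUMBER_RE.fullmatch("set myNumber = 1234")
text_match = SET_TEXT_RE.fullmatch('set myText = "Hello world"')
```

Each rule accepts only its own kind of value: the numeric pattern rejects the quoted statement and vice versa.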
So far we have seen how the grammar rules can be defined. We have learned that the Macrocoder parser will be able to verify if a target source file complies with the grammar rules and, in case of violation, it will report a syntax error message.
The next step is to learn how the information gathered by the Macrocoder parser can be used by the rules developer.
Let's take one of the previous examples:
set myText = "Hello world"
Thanks to the Macrocoder parser, we know that the target input source is correctly formed. That's certainly a good start, but now we need to know the name of the variable being set (myText) and the text being assigned to it (Hello world) so we can take the following steps in code generation. In other words, we want to access the parsing tree.
Every Macrocoder grammar definition is associated with a lifeset. The lifesets that host grammar rules, called grammar lifesets, have some extra features but otherwise behave exactly like every other lifeset. When defining grammar rules, the default name of the associated lifeset is GRAMMAR.
Macrocoder will make the parsing tree available through the automatic creation of classes and objects within its lifeset.
Let's start with a very simple example. We define this very simple grammar:
The Macrocoder grammar editor, using a green dotted box, shows us that for that rule it will create a class named MyRule (1) containing two attributes: varName (2) and value (3).
The class created automatically will look like this:
class MyRule: GBase {
    GString varName;
    GString value;
}
Since that class is defined internally, there is no actual source to be browsed. However, it can be seen in the insight view: click on the Insight tab and browse to MAIN, GRAMMAR and MyRule:
When a real target source file is fed to Macrocoder, its parser analyzes it and creates one or more objects instantiating the classes shown in the previous chapter. The newly created objects are bound to the grammar lifeset cauldron.
With this target source as input:
set myText = "Hello world"
the Macrocoder parser would instantiate a single object of class MyRule and it would store it in the cauldron objects container of the lifeset named GRAMMAR. That object would have its attributes set as follows:
varName = "myText"
value = "Hello world"
Let's now take a look at a slightly more complex grammar:
This grammar is able to recognize a sequence of zero or more set x="..." statements:
In this case, Macrocoder will generate the same MyRule class as before, plus a new class named ManySets containing an array of MyRule named setEntries:
When parsing the example above, Macrocoder will create one instance of ManySets associated with the cauldron; then, it will create three instances of MyRule that will be bound to the setEntries array of the ManySets instance:
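The resulting object structure can be pictured with a Python analogy: one ManySets object holding a list of MyRule objects. This is not what Macrocoder generates (the real attributes are GString values inside FCL classes); the input text below is hypothetical, since the tutorial's example file is not reproduced here, and `parse_many_sets` is an illustrative helper.

```python
import re
from dataclasses import dataclass, field


@dataclass
class MyRule:
    var_name: str   # corresponds to varName
    value: str      # corresponds to value


@dataclass
class ManySets:
    set_entries: list = field(default_factory=list)   # corresponds to setEntries


def parse_many_sets(source):
    """Build one ManySets holding one MyRule per 'set name = "text"' statement."""
    root = ManySets()
    for name, text in re.findall(r'set\s+(\w+)\s*=\s*"([^"]*)"', source):
        root.set_entries.append(MyRule(name, text))
    return root


# Hypothetical input with three statements.
tree = parse_many_sets('set a = "one"\nset b = "two"\nset c = "three"')
```

Three statements in the input produce one root object with three entries, mirroring the one ManySets and three MyRule instances described above.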
The Macrocoder environment creates the initial objects that contain the information that has been read from the user target source files. From now on, this information has to be processed by our code until the final production of the output.
This process evolves through a sequence of steps in which each step is based on the results obtained by the previous ones, where the first step is the parsing done by Macrocoder. At this stage we will not go through the goals of the various steps yet: that will be the subject of the following chapters. Instead, we shall concentrate on the mechanism that allows these steps to be performed: the phase.
So far we have learned that the Macrocoder objects are organized in a tree where the ultimate root is the lifeset cauldron. A phase is a feature associated with a lifeset; it consists of traversing the tree, reading/writing attributes or creating other objects to achieve a declared goal.
A lifeset can have any number of phases. To establish the execution order, i.e. which phase is to be executed first, every phase is assigned a unique number: lower numbers are executed first. For example, phase #24 will be executed before phase #1000. Phase numbers are floating point values: so if we want to add a phase between phases 1 and 2, we can use phase 1.5.
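The ordering rule amounts to sorting phases by their number. A minimal Python illustration follows; all phase names except ShowData are invented for the example, and the numbers are taken from the text above.

```python
# Hypothetical phase table: (phase number, phase name).
phases = [(1000.0, "Generate"), (24.0, "Check"), (1.0, "ShowData"), (2.0, "Link")]

# A new phase can be slotted between 1 and 2 thanks to fractional numbers.
phases.append((1.5, "Resolve"))

# Lower numbers run first.
execution_order = [name for number, name in sorted(phases)]
print(execution_order)
```

Using fractional numbers means existing phases never need to be renumbered when a new one is inserted between them.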
Phase execution starts as soon as Macrocoder has successfully completed the parsing of the target sources and has created the initial instances.
Executing a phase consists of scanning all the existing objects in order and calling a dedicated user-implemented method on each object.
The objects might have no method for a given phase: this is allowed because not all objects are always involved in all phases.
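The rule that objects without a phase method are simply skipped can be pictured with the following Python sketch. It is a model of the idea, not Macrocoder's runtime: the classes, the `show_data` method name and the `run_phase` helper are all hypothetical, and the traversal visits each parent before its children as an arbitrary choice.

```python
class Obj:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class Verbose(Obj):
    # Only this class implements the hypothetical "show_data" phase method.
    def show_data(self, log):
        log.append(self.name)


def run_phase(root, phase_name, log):
    """Visit every object in the tree; call the phase method where one exists."""
    method = getattr(root, phase_name, None)
    if callable(method):
        method(log)
    for child in root.children:
        run_phase(child, phase_name, log)


tree = Obj("root", [Verbose("a"), Obj("b", [Verbose("c")])])
log = []
run_phase(tree, "show_data", log)
```

Only the two `Verbose` objects contribute to the log; the plain `Obj` instances are traversed but otherwise ignored.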
Phases are executed by traversing the tree formed by the objects within a lifeset in a given order.
Macrocoder supports two execution orders:
The table below shows the same example with the execution order in the father-first and children-first modes:
Order: children-first | Order: father-first |
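Assuming, as the names suggest, that father-first means a parent is visited before its children and children-first means the opposite, the two orders correspond to the classic pre-order and post-order tree traversals. The Python sketch below uses a hypothetical four-node tree, and the order among siblings is also an assumption.

```python
def father_first(node, visit):
    """Visit a node before its children (pre-order)."""
    visit(node["name"])
    for child in node["children"]:
        father_first(child, visit)


def children_first(node, visit):
    """Visit all children before the node itself (post-order)."""
    for child in node["children"]:
        children_first(child, visit)
    visit(node["name"])


def leaf(name):
    return {"name": name, "children": []}


# Hypothetical tree: A owns B and C; B owns D.
tree = {"name": "A", "children": [
    {"name": "B", "children": [leaf("D")]},
    leaf("C"),
]}

pre, post = [], []
father_first(tree, pre.append)
children_first(tree, post.append)
```

Father-first suits phases where children need information prepared by their parent; children-first suits phases where a parent summarizes results computed by its children.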
The phase method is the method automatically invoked by Macrocoder during phase execution.
We shall now see an FCL example of a phase method. We consider again the example shown in chapter 4.3.3. In figure 4.3.3.3 we can see that Macrocoder has automatically created two classes: ManySets and MyRule.
The following code snippet implements a phase:
Let's comment on the various lines:
- the grammar GRAMMAR; declaration states that we are working in the GRAMMAR lifeset; we are using the keyword grammar instead of lifeset because this is a lifeset containing a grammar and it has some extra features;
- the phase declaration defines a father-first phase whose name is ShowData; the phase number is 1, which means that phases with a number greater than 1 will be executed after it and those with a number less than 1 will be executed before it;
- extend class MyRule orders Macrocoder to add the following contents to an existing class; in fact the MyRule class has been created by the grammar engine and the only way to add methods to it is by extension;
- the phase method for phase ShowData is then declared;
- the method body writes to system().msg (i.e. the Macrocoder output window) the contents of the fields varName and value.
The tree below reports the objects instantiated when parsing the source file shown in figure 4.3.3.2. The objects involved in phase ShowData (i.e. those having a phase method) are highlighted:
It is now time to run the project. The complete rules and target projects for this example can be downloaded at this link: ManySets1.zip.
Start the execution by pressing the "run" icon:
Macrocoder will parse the target source file. Once done, it will execute the only defined phase, i.e. ShowData. The only phase method for that phase is associated with class MyRule. Since there are three instances of MyRule (because we wrote three lines of "set..." in our source file), the method will be executed three times. We can see the method output in the red box. Note that the variable names and their contents are colored in blue and underlined: if you click on them, Macrocoder will bring you to the exact source spot where that string has been defined.
In this book we have covered the following concepts about Macrocoder programming: