3. Semantics
In specifying the semantics of the language, it is necessary to distinguish between symbols that are a part of the language, and other auxiliary mathematical functions that are used to define the semantics of the language.
3.1. Term Graphs
The language makes use of term graphs to store data. Term graphs are indicated by the letters G and H. A term graph is a set of triples (node, function symbol, list of children nodes) where there is at most one triple for any node. The notation indicates the triple . If G is a term graph then the set of nodes of G is . Nodes can be indicated by the letters u and v. The symbol f is the label of node v if the triple is in G, and f must be a constructor symbol. The list is the list of children of v. Now nodes is the set of nodes in G, is the label of node u in G, arity is the number of children of u and args is the list of children. Also arg is the node which is the child of u. These functions are easy to confuse with similar functions on terms such as atop, aarity, and aarg.
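As an illustration of the definitions above, a term graph can be sketched as a dictionary from nodes to (function symbol, children) pairs. This is only a sketch: the Python names, the use of integers as nodes, and the 1-based numbering of children are assumptions made here, not part of the language.

```python
from typing import Dict, List, Set, Tuple

Node = int
# node -> (function symbol, list of child nodes). Dictionary keys are unique,
# so there is at most one triple for any node.
TermGraph = Dict[Node, Tuple[str, List[Node]]]

def nodes(G: TermGraph) -> Set[Node]:
    return set(G)

def label(u: Node, G: TermGraph) -> str:
    return G[u][0]

def args(u: Node, G: TermGraph) -> List[Node]:
    return list(G[u][1])

def arity(u: Node, G: TermGraph) -> int:
    return len(G[u][1])

def arg(i: int, u: Node, G: TermGraph) -> Node:
    # Assumption: children are numbered starting from 1.
    if not 1 <= i <= arity(u, G):
        raise IndexError("no such child")
    return G[u][1][i - 1]

# Node 0 is labelled "cons"; its second child is node 0 itself, so G has a cycle.
G: TermGraph = {0: ("cons", [1, 0]), 1: ("a", [])}
```

Storing the triples in a dictionary keyed by node enforces the "at most one triple per node" condition by construction.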
Given a term graph G and a vertex v in G, define term to be the possibly infinite term defined by:
If v is in G then term is so that atop(term) = f and aarg. In this formalism term does not contain any variables. Therefore, for all v in G, term is a ground constructor term, possibly infinite if G has cycles. It is possible that two distinct nodes u and v satisfy term≡ term where ≡ means syntactic identity. Because terms are used also as arrays, it is possible to have more than one copy of an array with the same function symbol f.
For example, if then term. If then term. If then term.
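Because the term of a node may be infinite when G has cycles, a program can only unfold it to a finite depth. The following sketch does this on a dictionary representation of term graphs (node to (label, children)); the representation, the depth cut-off, and the "(...)" marker are assumptions made for illustration.

```python
def term(v, G, depth):
    """Unfold the term at node v, cutting it off below `depth` levels."""
    f, children = G[v]
    if not children:
        return f
    if depth == 0:
        return f + "(...)"
    inner = ", ".join(term(c, G, depth - 1) for c in children)
    return f + "(" + inner + ")"

# A cyclic graph: the second child of node 0 is node 0 itself.
cyclic = {0: ("f", [1, 0]), 1: ("a", [])}

# Two distinct nodes whose terms are syntactically identical.
twins = {0: ("a", []), 1: ("a", [])}
```

In `twins`, nodes 0 and 1 are different nodes with the same term, which is the situation described above for two copies of an array.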
The symbols and are constructors so these can be the labels of nodes in a term graph. We assume that all term graphs have a node with label .
Define Replace as G with (for node u in G) replaced by in G where is a node in G. This operation is used to define assignment to an element of an array (argument replacement). If or then Replace = G.
Suppose = Replace. Then term = term with the argument replaced by term if .
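A minimal sketch of the Replace operation on a dictionary representation of term graphs follows. The argument order, the 1-based index, and the reading of the boundary condition (an index outside the range 1 to the arity leaves G unchanged) are assumptions made here.

```python
def replace_arg(G, u, i, w):
    """Return G with the i-th child of node u replaced by node w.

    Assumptions: children are numbered from 1, w is already a node of G,
    and an out-of-range index leaves the graph unchanged.
    """
    f, children = G[u]
    if w not in G or i < 1 or i > len(children):
        return G
    H = dict(G)
    H[u] = (f, children[: i - 1] + [w] + children[i:])
    return H

G = {0: ("f", [1, 1]), 1: ("a", []), 2: ("b", [])}
H = replace_arg(G, 0, 2, 2)
```

The operation never adds or removes nodes; it only redirects one child pointer.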
There are also functions make.term, new.term and cache.term that are auxiliary functions used to define the semantics of the language. Define make.term to be a pair where v is a node and is . Depending on the implementation, the node v can be a new node, not present in G, or else it can be an existing node v of G if already is an element of G. Define new.term to be a pair where v is a new node not present in G and is . The operation new.term is needed for an explicit copy operation. A related operation cache.term will be described later. The functions make.term and new.term are the only functions that can change the set of nodes in the term graph and they can only do this by adding a node to the set of nodes. This implies that the set of nodes in the term graph never gets smaller and may only continually get larger as a program executes.
It would be possible to do garbage collection on the term graph, which could cause the set of nodes in the term graph to get smaller.
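The difference between make.term and new.term can be sketched as follows. Here make_term takes the option of reusing an existing node with an identical triple, which is only one of the implementations the text permits; the function names, signatures, and the node allocation scheme are assumptions.

```python
def new_term(f, children, G):
    """Always allocate a node that is not present in G."""
    v = max(G, default=-1) + 1
    H = dict(G)
    H[v] = (f, list(children))
    return v, H

def make_term(f, children, G):
    """Reuse a node carrying the same triple if one exists (one permitted choice)."""
    for v, (g, ch) in G.items():
        if g == f and ch == list(children):
            return v, G
    return new_term(f, children, G)

G = {0: ("a", [])}
shared, G1 = make_term("a", [], G)   # reuses node 0
copied, G2 = new_term("a", [], G)    # allocates node 1
```

In both cases the set of nodes of the result contains the set of nodes of G, matching the observation that the graph never shrinks.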
3.2. Environments
An environment is a function from program variables to nodes in a term graph G. It is required that undefined variables map to a node in G with label . Environments map from all of the denumerably infinite set of program variables to nodes in G. Only finitely many of them will be defined and these are the only ones that need to be explicitly stored.
3.3. States
A state consists of a term graph G and an environment e that maps program variables to nodes in G. States are indicated by S and T. Also, env is the environment of a state S and is the term graph of S so that env() = e and . The function nodes is extended to states by nodes = nodes
To show that a pair is a state it has to be shown that e maps program variables to nodes in G. A pair in which for some variable x, is not a node of G, is said to be non variable binding; otherwise it is variable binding. All but finitely many variables will be undefined at any time and so they will map to a node in G.
The function “term” is extended to states by term = term for program variables x. However, we often prefer the more complex notation term because it makes clear exactly how the term of a program variable is computed.
Sometimes in specifying the semantics it is necessary to modify the environment of a state; this can be done using the function fix.env defined by fix.env. The function fix.env is an auxiliary function and is not in the syntax of the language.
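The notions of environment, state, and fix.env can be sketched as below. The sketch assumes that fix.env takes an environment and a state and returns the state with its environment replaced, and it uses a hypothetical label name "undefined" for the node to which undefined variables map; only the finitely many defined variables are stored.

```python
UNDEFINED = "undefined"  # hypothetical name for the label of undefined variables

class Env:
    """Total function from variable names to nodes, stored finitely."""
    def __init__(self, undef_node, bindings=None):
        self.undef_node = undef_node
        self.bindings = dict(bindings or {})

    def __call__(self, x):
        # Every variable without an explicit binding maps to the undefined node.
        return self.bindings.get(x, self.undef_node)

def fix_env(e, S):
    """Assumed reading of fix.env: keep the graph of S, install environment e."""
    _, G = S
    return (e, G)

def is_variable_binding(S):
    """Check that every variable maps to a node of the graph."""
    e, G = S
    return e.undef_node in G and all(v in G for v in e.bindings.values())

G = {0: (UNDEFINED, []), 1: ("a", [])}
S = (Env(0, {"x": 1}), G)
T = fix_env(Env(0), S)
```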
We say that two environments e and conflict if for some variable .
Program variables x have a term value and a node value in a state S. If S is then the term value of x in S is term which is term and the node value is .
The function equal.node (a node equality test) tests if the nodes and are the same. This function is in the syntax of the language.
3.4. Undefined Variables
Now for some v with label for undefined variables. Formal parameters have initial values as specified by the call to the procedure. Also the environment is defined to be the environment e such that for all program variables x, term = . A more detailed semantics for procedures is given later.
We don’t specify how and are handled.
3.5. Denotational Semantics (Assuming Termination)
We define the denotational semantics of statements q in the language by induction on the length of a computation. We also do proofs of properties of the language by a similar induction. Note that the notation for the denotational semantics used here differs from that in the book [NK14], which includes Isabelle proofs of its results. However, that language does not have arrays or procedures.
Our semantics uses innermost left to right evaluation for functional expressions (terms).
First we give an informal description of the denotational semantics, then a more formal description.
If t is a functional expression (a term) then semantically it maps from states to (node,state) pairs. During the execution (evaluation) of a functional expression, the state can be modified by evaluating functions in the term and by creating structure in the term graph to store the result computed by the term. Other structure besides this may be added to the graph during the evaluation of the expression. The Replace operation also modifies existing structure in the term graph. In this way the functions (procedures) in the term can modify the term graph.
The original environment is restored after the evaluation of a functional expression, though other environments may be created for other functions during the evaluation.
(states → nodes × states) is the type of the semantics of functional expressions t; the environment does not change and if then v is a node in the graph of state . We often write as .
If P is an imperative statement then maps from states to states. The environment and graph may change. (states → states) is the type of the semantics of imperative statements P. We often write as .
3.6. Length of a Computation Sequence
We define the length of a computation sequence and use it to do proofs by induction of properties of the computation, assuming termination. To prove that something is true of a statement P on a state S we assume that it is true for all that are smaller than in the ordering and show that it is true for . Assuming termination, this shows that the property is true of all pairs .
We use the notation for the length of the computation sequence for of a statement or expression P operating on a state S. is a nonnegative integer value that decreases with each computation in a terminating computation. We define along with the semantics for each kind of statement P. For auxiliary functions used to define the semantics we can define for an expression E as the length of the computation it represents. The context should make this clear.
Another way to formalize this is not to require to have integer values but just abstractly to put a partial ordering on the pairs where if the computation of is a strict subcomputation of the computation for . Then we can say that the computation for is terminating if there are no infinite descending sequences starting from the initial pair . This works by König’s Lemma because each spawns only finitely many other pairs such that . The advantage of the partial ordering formalism is that it makes sense even for infinite computations and possibly can help to define a denotational semantics in that case. However, defining as an integer value is simpler for terminating computations.
3.7. Semantics of Imperative Statements
Now we define the semantics of imperative statements.
3.7.1. Assignment Statements (Note This Includes Simple Assignment Statements, Multiple Assignment Statements, Copy Statements, and Argument Replacements)
For an environment e define by and for .
= fix.env(env where
(Simple assignment statement)
.
For the partial order version, .
At the level of an assignment statement the environment of is the same as the environment of S by Theorem 4 which will be proved later. However env(S) has to be modified to reflect the new assignment to x.
Now the sequence makes two different arrays unless the term is in the set to be discussed later. But just makes one array pointed to by both x and y. In the former case an argument replacement statement in state (defined below) will not change term but in the latter case it will. Note that arg can be a term of the form , too.
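The difference between sharing a node and copying it can be made concrete with a small sketch. The graph representation (node to (label, children)), the variable names, and the in-place update used to model argument replacement are assumptions made for illustration.

```python
def term(v, G):
    """Term of node v; assumes the part of G reachable from v is acyclic."""
    f, children = G[v]
    if not children:
        return f
    return f + "(" + ", ".join(term(c, G) for c in children) + ")"

G = {0: ("a", []), 1: ("b", []), 2: ("f", [0, 0])}
env = {"x": 2}

env["y"] = env["x"]               # y points to the same node as x (one array)
G[3] = (G[2][0], list(G[2][1]))   # z gets a fresh node with an identical term
env["z"] = 3
before = (term(env["y"], G), term(env["z"], G))

# Argument replacement through x: the first child of x's node becomes node 1.
f, children = G[env["x"]]
G[env["x"]] = (f, [1] + children[1:])
after = (term(env["y"], G), term(env["z"], G))
```

After the replacement the term seen through y has changed, while the term seen through z has not, because z was bound to a distinct node.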
3.7.2. Semantics of (Multiple Assignment Statement)
For the partial order version,
.
The semantics of arg will be defined later. There is no need to define a tupling operation because one can use for that where f is a constructor, and then use the arg function to extract the . The arity function can be used to determine the number of arguments of f.
3.7.3. Copy Statement
If then for some such that = new.term. No partial order statement is needed here.
In the copy statement, the variable x is assigned a node distinct from v but having an identical term.
3.7.4. Argument Replacement
This statement is useful for implementing arrays efficiently.
Let and suppose . Let where H = Replace. Then . Here A is a program variable.
We informally speak of this statement as replacing the argument of term with term. We say that the argument replacement operation is of type (term,term, term.
. For the partial order version, .
3.7.5. Conditional Statements
[if t then P1 else P2](S) =
if term(v, T) is undefined then T else
if term(v, T) = true then [P1](T) else [P2](T)
where (v, T) = [t](S) and term(v, T) denotes the term of node v in the graph of T.
For the partial order version, [if t then else , and also if term = true then [if t then else , and if term true then [if t then else . From now on the partial order version will be omitted.
In a conditional statement, first t is evaluated. If it returns undefined then the state that results from the evaluation of t is returned and neither branch is executed. If t returns true then the first branch is executed; otherwise the second branch is executed.
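The informal description of the conditional can be sketched as a higher-order function. In this sketch an expression is a Python function from states to (node, state) pairs and a statement is a function from states to states; the helper names, the label "undefined", and the representation of states as (environment, graph) pairs are assumptions.

```python
UNDEFINED = "undefined"  # hypothetical label for the undefined value

def label(v, S):
    _, G = S
    return G[v][0]

def if_then_else(t, P1, P2):
    def run(S):
        v, T = t(S)                  # evaluate the test first
        value = label(v, T)
        if value == UNDEFINED:
            return T                 # neither branch is executed
        return P1(T) if value == "true" else P2(T)
    return run

def constant(name):
    """Expression returning the existing constant node labelled `name`."""
    def run(S):
        _, G = S
        v = next(u for u, (f, ch) in G.items() if f == name and not ch)
        return v, S
    return run

def assign(x, node):
    """Statement binding variable x to an existing node."""
    def run(S):
        env, G = S
        return ({**env, x: node}, G)
    return run

G = {0: (UNDEFINED, []), 1: ("true", []), 2: ("false", [])}
S = ({"x": 0}, G)

def stmt(c):
    return if_then_else(constant(c), assign("x", 1), assign("x", 2))
```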
If the second part of the statement is omitted the semantics is simpler.
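[if t then P](S) =
if term(v, T) is undefined then T else
if term(v, T) = true then [P](T) else T
where (v, T) = [t](S) and term(v, T) denotes the term of node v in the graph of T.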
[if t then ]: similar to [if t then else ].
3.7.6. Iterative Statements
[while t do P](S) =
if term(v, T) is undefined then T else
if term(v, T) ≠ true then T else [while t do P]([P](T))
where (v, T) = [t](S) and term(v, T) denotes the term of node v in the graph of T.
Here the term of the node returned by t is essentially the value resulting from evaluating t in state S, and the accompanying state is the state resulting from the execution of t. If t evaluates to undefined or to anything other than true, the loop exits in that state. Otherwise the loop iterates on the state resulting from the execution of P in that state.
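The loop semantics can be sketched iteratively, under the same modelling assumptions as before: an expression maps a state to a (node, state) pair, a statement maps a state to a state, and a state is an (environment, graph) pair. The counter example and all helper names are hypothetical.

```python
def fresh(G, f):
    """Add a new constant node labelled f and return it with the new graph."""
    v = max(G, default=-1) + 1
    H = dict(G)
    H[v] = (f, [])
    return v, H

def while_do(t, P):
    def run(S):
        while True:
            v, T = t(S)
            _, G = T
            if G[v][0] != "true":   # undefined or anything other than true
                return T
            S = P(T)
    return run

def n_below_3(S):
    """Test expression: its evaluation adds a node holding the truth value."""
    env, G = S
    n = G[env["n"]][0]
    v, H = fresh(G, "true" if n < 3 else "false")
    return v, (env, H)

def increment_n(S):
    """Models n := n + 1; the new node comes from evaluating n + 1."""
    env, G = S
    v, H = fresh(G, G[env["n"]][0] + 1)
    return ({**env, "n": v}, H)

start = ({"n": 0}, {0: (0, [])})
final_env, final_graph = while_do(n_below_3, increment_n)(start)
```

Note that the loop exits in the state produced by the last evaluation of the test, so nodes created by that evaluation remain in the graph.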
for step k until n do : Equivalent to the following:
;
while do
One might define until in a similar style.
3.7.7. Sequence of Statements
For two imperative statements define as .
.
3.8. Nontermination
To handle nontermination it would be necessary to make use of the symbol and possibly some form of denotational semantics using complete partially ordered sets. Note that no imperative statement changes the set of nodes in the term graph.
3.9. Evaluating Functional Expressions
For functional expressions in general, we assume leftmost innermost evaluation so that when a term is evaluated, the top-level subterms will evaluate to nodes of G that represent ground constructor terms. Computing where P is a functional expression, is an environment, and G is a term graph is done as follows.
If P is a variable x then and .
If P is a term where the are terms in then let , let where is a node in , let , …, and let . All the have the same environment and all the are nodes of . The environment is needed to evaluate the because they can contain program variables from the calling procedure.
If any of the fail to terminate then the whole expression does too and the value is ; of course, it may not be possible to compute this value because of nontermination. We do not specify how or are handled if some returns them.
Let S be the state with env and .
Then = finish.function where the are as above and finish.function remains to be defined. So finish.function assumes the arguments of f have been evaluated left to right to obtain the nodes in the term graph and to obtain the state S. Then finish.function applies f to these nodes and to the resulting state S in a way that depends on what kind of a function f is. Finish.function returns a (node, state) pair. The notation is convenient but the nodes cannot actually appear as arguments to f in a functional expression in the language. This notation could be more precisely written as . The nodes of G may appear as children of a node of the form eventually in the term graph. Also = +[finish.function. We want to compute finish.function where S is the state with env = and for various kinds of function symbols f. First it is necessary to define some auxiliary functions for rewriting.
None of these functional expressions directly change the term graph.
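The left-to-right, innermost evaluation order can be sketched for the special case in which every function symbol is a constructor. Expressions are modelled as either a variable name or a pair (f, list of subexpressions); this encoding, and the use of fresh node allocation in place of the general finish.function, are assumptions made for illustration.

```python
def finish_constructor(f, vs, S):
    """Stand-in for finish.function when f is a constructor: add one node."""
    env, G = S
    v = max(G, default=-1) + 1
    H = dict(G)
    H[v] = (f, list(vs))
    return v, (env, H)

def evaluate(P, S):
    env, _ = S
    if isinstance(P, str):          # a program variable
        return env[P], S
    f, subterms = P
    vs = []
    for t in subterms:              # left to right; each subterm is evaluated
        v, S = evaluate(t, S)       # in the state left by the previous one
        vs.append(v)
    return finish_constructor(f, vs, S)

S0 = ({"x": 0}, {0: ("a", [])})
v, T = evaluate(("f", ["x", ("g", ["x"])]), S0)
```

The inner term g(x) receives node 1 before the outer f receives node 2, and the environment of the resulting state is the one the evaluation started with.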
3.10. Rewriting Semantics
The use of term graphs makes pattern matching convenient. Therefore this language provides for pattern matching in a rewriting facility. The rewriting facility is not needed for Turing equivalence. The rewriting facility does one rewrite at the top level and then does recursive evaluation. The subterms are first evaluated left to right to normal forms. Recall that all normal forms are constructor terms and even if an implicit type error occurs, the normal form is , which is also a constructor. This language does not have an explicit type system. This paper does not have a formal discussion of term rewriting systems. There is an extensive literature on this topic; for example, see [BN98, DJ90, DP01]. The evaluation of functions defined by rewriting is performed by the function graph.rewrite defined as follows. How this relates to finish.function is explained below.
Suppose the function symbol f is defined by rewriting in a statement of the form in which the syntactic restrictions given earlier are obeyed. In particular the top symbols of the must all be f, the are linear, and all variables in appear also in .
Let E represent the expression . Then graph.rewrite returns a (node, state) pair and is defined as
if match.list then (match.list else if
match.list then (match.list else … else if
match.list then (match.list else .
In this last case there is an implicit type error.
Also as for the termination measure, [graph.rewrite =
if match.list then ,(match.list] else
if match.list then ,(match.list] else … else
if match.list then ,(match.list] else 1.
Match.list returns an environment (a substitution) and is defined as follows, assuming r is :
if then else if and then
(if match then
else if match then else …
else if match then
else match match).
match is defined as follows:
if r is a variable then [i.e. match] else match.list where is in G.
None of these functions directly change the term graph. However evaluating (match.list may change the term graph.
The basic idea for functions defined by a sequence of rewrite rules is to extract the terms term from ,, do a top level rewrite on the term f(term, … term using the first applicable rewrite rule for f from the list , and evaluate the resulting term recursively. The are obtained from the terms as indicated above by evaluating the left to right. Here we are not concerned with confluence issues. The function graph.rewrite performs this rewrite and makes use of match.list to get a substitution (an environment) mapping the left hand side r of a rewrite rule to f(term, … term. In evaluating match.list, r cannot be a variable because the left hand side of a rewrite rule cannot be a variable. The function match.list does not modify G and does not make use of the current environment e. In the routine match.list we make use of the routine match to match term r to node v in G. Here r is a subterm of the left-hand side of a rewrite rule. This routine match is similar to match.list but also permits the term r to be a variable. These definitions assume that r is left-linear so that the variables in the different are distinct. Because of left linearity there should be no conflicts in the union of environments in match or match.list. Match (and therefore match.list) terminates even if G has cycles because we assume r is a finite cycle-free term (rewrite rules are finite).
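The matching routines can be sketched as follows. A pattern is modelled either as a variable name (a string) or as a pair (f, list of subpatterns), and a failed match is represented by None; both choices, and the function signatures, are assumptions. The recursion is on the finite pattern r, so it terminates even when G has cycles.

```python
def match(r, v, G):
    """Match pattern r against node v of G; r may be a variable."""
    if isinstance(r, str):
        return {r: v}                    # bind the variable to the node
    f, children = G[v]
    return match_list(r, f, children, G)

def match_list(r, f, vs, G):
    """Match the non-variable pattern r against f applied to the nodes vs."""
    g, subpatterns = r
    if g != f or len(subpatterns) != len(vs):
        return None
    env = {}
    for ri, vi in zip(subpatterns, vs):
        sub = match(ri, vi, G)
        if sub is None:
            return None
        env.update(sub)                  # no conflicts: r is assumed left-linear
    return env

# Cyclic graph: node 0 is cons(a, node 0).
G = {0: ("cons", [1, 0]), 1: ("a", [])}
pattern = ("cons", ["x", ("cons", ["y", "z"])])
sigma = match_list(pattern, "cons", [1, 0], G)
```

The resulting substitution maps pattern variables to nodes of G rather than to terms, which is what allows matching against cyclic structure.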
3.11. Evaluation of Finish.Function
The evaluation of finish.function depends on what kind of a function is being specified.
3.11.1. Compiled Functions
If the function symbol f appears in a definition of the form then as mentioned earlier the instances of such a compiled function are expressions of the form where the and t are finite ground (variable free) constructor terms and the assertion A holds, that is, is a theorem of the appropriate logical theory. These instances can be considered as ground rewrite rules. Then the semantics are the same as if the compiled function were defined by the possibly infinite list of such rewrite rules. Alternatively, suppose = term . If the are all finite constructor terms and there is one and only one constructor term t such that is a theorem of the relevant logical theory, then finish.function . Otherwise, finish.function = . Also as for the termination measure, [finish.function = 1.
For all non-logical function symbols in the underlying theory there is a corresponding compiled function. For example, there is a compiled function
The occurrence of + on the left is considered as an infix defined symbol in ITGL and the occurrence on the right is considered as a symbol of the underlying theory.
3.11.2. Constructors
If f is a constructor then finish.function where = make.term (as defined earlier) and [finish.function] = 1. Thus env = e.
This is the place where the set of nodes in the term graph increases. This can happen in the functions make.term, new.term, or cache.term.
3.11.3. Rewriting
If f is defined by rewriting then finish.function = graph.rewrite. Note that the environment is not needed for graph.rewrite because the all represent ground terms and the variable case was covered above. So here also env and [finish.function ] = [graph.rewrite] as given earlier.
3.11.4. Procedures
Suppose f is an imperative procedure defined by where the are the formal parameters.
Then finish.function is defined as follows where : Let be defined by and for other program variables x. So no other values can be passed in to the procedure except by the formal parameters. Therefore free variables in the procedure do not pass any values in.
Let be fix.env.
Let be and note that B may change .
Let be and recall that does not change env.
Then finish.function = .
Later we will show that is variable binding because only refers to vertices in .
Thus in this case also env = so if program variables are changed during the evaluation of they will be restored. Also [finish.function.
None of these functions directly modify G but the evaluation of B and t and the in finish.function may modify G.
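The treatment of environments in a procedure call can be sketched as follows. The sketch assumes that a procedure consists of formal parameters, a body B (a statement), and a result expression t, and that states are (environment, graph) pairs with environments stored as dictionaries of defined variables; these are assumptions about details not fixed in this section.

```python
def call_procedure(formals, body, result, vs, S):
    """Sketch of finish.function for a procedure applied to the nodes vs."""
    e, G = S
    callee_env = dict(zip(formals, vs))  # only the formal parameters are defined
    S1 = (callee_env, G)                 # the caller's graph, the callee's environment
    S2 = body(S1)                        # B may change both graph and environment
    v, S3 = result(S2)                   # evaluate the result expression t
    return v, (e, S3[1])                 # restore the caller's environment

G = {0: ("undefined", []), 1: ("a", []), 2: ("b", [])}
caller = ({"tmp": 1}, G)

def body(S):
    # Hypothetical body: bind the local variable tmp to the value of parameter p.
    env, graph = S
    return ({**env, "tmp": env["p"]}, graph)

def result(S):
    # Hypothetical result expression: the variable tmp.
    env, _ = S
    return env["tmp"], S

v, after = call_procedure(["p"], body, result, [2], caller)
```

The caller's binding of tmp is unaffected by the assignment inside the procedure, while any change made to the graph would remain visible.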
3.11.5. Special Symbols
We now define some special defined function symbols. For these the constants true, false, , , and integers i are considered as constructor functions with arity zero. Although top, equal.top, equal.node, arg, and arity are technically constructors, they will never appear in the term graph because during their evaluation by finish.function they are removed. The function symbol copy also cannot appear in the term graph though it is technically a constructor.
finish.function(top = finish.function if label.
If f is an individual constant and label. then finish.function(top = finish.function .
finish.function(equal.top = if (label = label) then finish.function (true,S) else finish.function (false, S).
finish.function(equal.node = if () then finish.function(true,S) else finish.function (false,S).
finish.function(arg = if label, arity, and arg.
finish.function(arg = finish.function if label, arity or .
finish.function(arity = finish.function if arity. We are assuming that integers are individual constants here.
In all these cases [finish.function] = 1.
3.11.6. Array Initialization
It turns out that in all the t need not return the same value even though the environment is restored after each functional expression. This is because of the effect of argument replacement.
Consider this sequence:
procedure d
procedure
Suppose g and h are constructors (because they are not defined). Then when d is called it will call g. The first argument of g will return which will be arg or 1. The second argument of g is but now x is bound to because of the argument replacement statement in procedure f so the second argument of g will evaluate to 2. If instead were called its arguments would evaluate to 1, 2, and 3, respectively.