4.9 Trees
In this document,
unless otherwise stated,
- by
tree,
we mean a
labeled ordered tree; and
- by
tree node,
we mean a
labeled ordered tree node.
For brevity,
in contexts where the meaning is clear,
we refer to a tree node simply as a
node.
Especially when looked at from the point of view
of its labels,
a node
is often called an
instance.
A node is a pair of tuples:
- The first element tuple of a node is
a “label tuple”.
The label tuple is a triple
of symbol ID,
start Earley set ID, and
end Earley set ID.
For more about the Earley set IDs,
see Input.
- The second element tuple of a node is
a list (ordered set) of nodes.
We note that this definition of a tree node is recursive.
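The recursive definition above can be written down directly as a data type. The following is a minimal Python sketch, not part of any Marpa interface: the names `Label`, `Node` and `count_nodes`, and the symbol IDs in the example, are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple, Tuple


class Label(NamedTuple):
    """The label tuple: a triple of symbol ID and two Earley set IDs."""
    sym: int    # symbol ID
    start: int  # start Earley set ID
    end: int    # end Earley set ID


class Node(NamedTuple):
    """A tree node: a pair of a label tuple and an ordered list of nodes."""
    label: Label
    children: Tuple["Node", ...]  # recursive: each element is itself a Node


def count_nodes(nd: Node) -> int:
    """Count nd and every node below it, following the recursive definition."""
    return 1 + sum(count_nodes(child) for child in nd.children)


# Hypothetical symbol IDs 0, 1 and 2; the Earley set IDs are made up as well.
leaf_a = Node(Label(1, 0, 1), ())
leaf_b = Node(Label(2, 1, 2), ())
root = Node(Label(0, 0, 2), (leaf_a, leaf_b))
```

Because the children of a node are themselves nodes, any function over nodes, such as `count_nodes`, is naturally written recursively.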
In the following list of definitions and assertions, let
nd = [ [ sym, start, end ], children ]
be a tree node:
- We say that
sym is the symbol of nd.
- We say that nd is an
instance of the symbol with ID sym
starting at start and ending at end.
- We say that nd is an
instance of the symbol with ID sym at location end.
- We say that the
length of nd is the
difference between its start and end,
that is
end-start
.
- The length of nd is zero iff
start is the same as end.
Put another way,
the length of nd is zero iff
start = end
.
- We say that the elements of children
are the
children
of nd.
- We say that every element of children
is a
child
of nd.
- For brevity, we say
that the symbol sym is
at
end.
Note that this means we consider the location of a symbol
to be where it ends.
- nd is a
leaf node iff children
is the empty list.
A leaf node is also called a
leaf.
- nd is a
rule node iff it is not a leaf node.
- Every node is either a leaf node or a rule node,
but no node is both a leaf and a rule node.
- We say that nd is a
terminal node
iff nd is a leaf node
and sym is a terminal.
A terminal node is also called a
token node.
- We say that nd is a
nulled node
iff nd is a leaf node
and sym is not a terminal.
A nulled node is also called a
nulling node.
- Every leaf node is either a nulled node or a terminal node.
But, because nullable LHS terminals are not allowed,
no node is both nulled and terminal.
- We say that nd
is a
BNF node
iff
nd is not a terminal node
and sym is the LHS of a BNF rule.
- We say that nd
is a
sequence node
iff
nd is not a terminal node
and sym is the LHS of a sequence rule.
- Every node is a terminal node, a BNF node
or a sequence node.
But no node is more
than one of these three.
This is because sequence rules never share a
LHS with a BNF rule,
and no BNF node or sequence node is a terminal node.
- If nd is a rule node, its
LHS is sym.
- If nd is a rule node, its
RHS is the concatenation,
from first to last,
of the symbols of the nodes in children.
- All nulled nodes are zero-length.
No terminal node is zero-length.
- We say that nd is an instance of
sym starting at start and ending
at end.
We also say that nd is an instance of
sym at end or, simply,
that nd is an instance of sym.
- Let r be the rule whose LHS
is equal to the LHS of nd,
and whose RHS is equal to the RHS of nd.
If nd is a BNF rule node,
there must be such a rule.
In that case,
we say that nd is an instance of
r starting at start and ending
at end.
We also say that nd is an instance of
r at end or, simply,
that nd is an instance of r.
- Let r be the sequence rule whose LHS
is equal to the LHS of nd.
If nd is a sequence rule node,
there must be such a rule.
In that case,
we say that nd is an instance of
r starting at start and ending
at end.
We also say that nd is an instance of
r at end or, simply,
that nd is an instance of r.
-
If nd is a nulled instance,
we say that sym is
nulled at location end or, simply,
that the symbol sym is nulled.
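Several of the definitions in the list above can be restated as small predicates. The following Python sketch is illustrative only. It assumes that the set of terminal symbol IDs is supplied by the caller, because the definitions above take "is a terminal" as given. The names `Label`, `Node`, `TERMINALS`, the helper functions, and the symbol IDs are all hypothetical.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Label(NamedTuple):
    sym: int    # symbol ID
    start: int  # start Earley set ID
    end: int    # end Earley set ID


class Node(NamedTuple):
    label: Label
    children: Tuple["Node", ...] = ()


def length(nd: Node) -> int:
    """Length of nd: end - start."""
    return nd.label.end - nd.label.start


def is_leaf(nd: Node) -> bool:
    """nd is a leaf node iff children is the empty list."""
    return len(nd.children) == 0


def is_rule_node(nd: Node) -> bool:
    """nd is a rule node iff it is not a leaf node."""
    return not is_leaf(nd)


def is_terminal_node(nd: Node, terminals: FrozenSet[int]) -> bool:
    """Leaf node whose symbol is a terminal (a token node)."""
    return is_leaf(nd) and nd.label.sym in terminals


def is_nulled_node(nd: Node, terminals: FrozenSet[int]) -> bool:
    """Leaf node whose symbol is not a terminal."""
    return is_leaf(nd) and nd.label.sym not in terminals


def rhs(nd: Node) -> Tuple[int, ...]:
    """RHS of a rule node: the symbols of its children, first to last."""
    return tuple(child.label.sym for child in nd.children)


# Hypothetical symbol IDs: S and A are non-terminals, TOK_A is a terminal.
S, A, TOK_A = 0, 1, 2
TERMINALS = frozenset({TOK_A})

nulled_a = Node(Label(A, 0, 0))      # zero-length leaf, not a terminal
token_a = Node(Label(TOK_A, 0, 1))   # leaf whose symbol is a terminal
root = Node(Label(S, 0, 1), (nulled_a, token_a))
```

In this example, `root` is a rule node of length 1 whose RHS is `(A, TOK_A)`, `nulled_a` is a nulled node of length 0, and `token_a` is a terminal node of length 1.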
Let nd1 and nd2 be two nodes.
If nd2 is a child of nd1,
then nd1 is the
parent of nd2.
We define
ancestor
recursively
such that
nd1 is an ancestor
of a node nd2
iff one of the following is true:
- nd1 and nd2 are the same node.
In this case we say that nd1 is the
trivial ancestor
of nd2.
- nd1 is the parent of an ancestor of nd2.
In this case we say that nd1 is a
proper ancestor
of nd2.
Similarly, we define
descendant
recursively
such that
nd1 is a descendant
of a node nd2
iff one of the following is true:
- nd1 and nd2 are the same node.
In this case we say that nd1 is the
trivial descendant
of nd2.
- nd1 is a child of a descendant of nd2.
In this case we say that nd1 is a
proper descendant
of nd2.
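The recursive definition of ancestor translates directly into a recursive function. The sketch below is illustrative only and makes two assumptions: "the same node" is tested by value equality, since a node is defined as a pair of tuples, and descendant is taken to be the converse of ancestor. All names and symbol IDs are hypothetical.

```python
from typing import NamedTuple, Tuple


class Label(NamedTuple):
    sym: int    # symbol ID
    start: int  # start Earley set ID
    end: int    # end Earley set ID


class Node(NamedTuple):
    label: Label
    children: Tuple["Node", ...] = ()


def is_ancestor(nd1: Node, nd2: Node) -> bool:
    """True iff nd1 is nd2 itself (trivial ancestor) or nd1 is the parent
    of an ancestor of nd2 (proper ancestor)."""
    if nd1 == nd2:  # assumption: "same node" means equal as values
        return True
    return any(is_ancestor(child, nd2) for child in nd1.children)


def is_proper_ancestor(nd1: Node, nd2: Node) -> bool:
    """Ancestor, excluding the trivial case."""
    return nd1 != nd2 and is_ancestor(nd1, nd2)


def is_descendant(nd1: Node, nd2: Node) -> bool:
    """Assumption: nd1 is a descendant of nd2 iff nd2 is an ancestor of nd1."""
    return is_ancestor(nd2, nd1)


# Hypothetical three-level tree: root -> mid -> leaf.
leaf = Node(Label(2, 0, 1))
mid = Node(Label(1, 0, 1), (leaf,))
root = Node(Label(0, 0, 1), (mid,))
```

Here `root` is a proper ancestor of `leaf`, `leaf` is a descendant of `root`, and every node is its own trivial ancestor and trivial descendant.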
A tree is its own
root node.
That implies that, in fact, tree and node are just two different terms for the
same thing.
We usually speak of trees when we are thinking of the nodes/trees
as a collection of nodes,
and we
speak of nodes when we are more focused on the individual nodes.
A
parse forest
is a set of one or more
parse trees.
Each tree represents a
parse.
We have used “parse” as a noun in several senses.
Depending on context a “parse” may be
- the process of
parsing an input using a grammar,
- a parse tree, or
- a parse forest.
When the meaning of “parse” is not clear in context,
we will be explicit about which sense is intended.
[ TODO: give example of tree ]
[ TODO: define path ]
[ TODO: define left vs. right ]
[ TODO: define cut ]
[ TODO: define frontier ]
[ TODO: define top-down traversal ]
[ TODO: define bottom-up traversal ]