Regular expressions and grammars¶
Module tock.regexps¶
- class tock.regexps.RegularExpression(op, args, start=None, end=None)¶
A (abstract syntax tree of a) regular expression.
- Parameters:
op (str) – Possible values are: ‘union’, ‘concatenation’, ‘star’, ‘symbol’.
args – tuple of RegularExpression or Symbol objects.
The empty string is represented as RegularExpression(‘concatenation’, ()).
The empty set is represented as RegularExpression(‘union’, ()).
- classmethod from_str(s)¶
Constructs a RegularExpression from a str.
Regular expressions allow the following operations (precedence from lowest to highest):
Symbols
Empty string (
&
)Empty set (
∅
)Union (
|
)Concatenation: Two concatenated symbols must be separated by a space. For example,
a b
isa
concatenated withb
, butab
is a single symbol.Kleene star (
*
)
- tock.regexps.from_regexp(e, display_steps=False)¶
Convert a regular expression to a NFA.
- Parameters:
e (RegularExpression or str) – the regular expression to convert.
display_steps (bool) – if True and if run inside a Jupyter notebook, displays all steps of the conversion.
Module tock.grammars¶
- class tock.grammars.Grammar¶
A string-rewriting grammar.
- add_nonterminal(x)¶
Add x to the nonterminal alphabet.
- add_rule(lhs, rhs)¶
Add rule with left-hand side lhs and right-hand side rhs, where lhs and rhs are both Strings.
- compute_first(nullable=None)¶
Compute, for every terminal, nonterminal, and rhs suffix α, the set of terminals b where α ⇒* b γ for some γ.
- compute_follow(nullable=None, first=None)¶
Compute, for every nonterminal A, the set of terminals b where S →* γ A b δ for some γ, δ.
- compute_nullable()¶
Compute, for every nonterminal and rhs suffix α, whether α ⇒* ε.
- classmethod from_file(filename)¶
Read a grammar from a file.
- Parameters:
filename (str) – name of file to read from
- Returns:
a CFG
- Return type:
The file should contain one rule per line, for example:
S -> a S b S -> &
Currently the grammar must be a context-free grammar. The nonterminal symbols are exactly those that appear on a left-hand side. The left-hand side of the first rule is the start symbol.
- classmethod from_lines(lines)¶
Read a grammar from a list of strings (see from_file).
- Parameters:
lines – a list of strings
- Returns:
a CFG
- Return type:
- has_strict_start()¶
Returns True iff the start nonterminal does not appear in the rhs of any rule. I don’t know what the correct terminology for this is.
- is_contextfree()¶
Returns True iff the grammar is context-free.
- is_contextsensitive()¶
Returns True iff the grammar is context-sensitive, that is, each rule is of the form α A β → α B β where one of the following is true:
A is a nonterminal and len(B) > 0
A = S, α = β = B = ε, and S does not occur on the rhs of any rule
- is_leftlinear()¶
Returns True iff the grammar is left-linear, that is, it is context-free and every rule is of the form A → B w or A → w where w contains only terminals.
- is_noncontracting()¶
Returns True iff the grammar is essentially noncontracting, that is, each rule is of the form α → β where one of the following is true:
len(β) ≥ len(α)
α = S, β = ε, and S does not occur on the rhs of any rule
- is_rightlinear()¶
Returns True iff the grammar is left-linear, that is, it is context-free and every rule is of the form A → w B or A → w where w contains only terminals.
- remove_useless()¶
Returns a new grammar containing just useful rules.
- set_start_nonterminal(x)¶
Set the start symbol to x. If x is not already a nonterminal, it is added to the nonterminal alphabet.
- tock.grammars.from_grammar(g, mode='topdown')¶
Convert a CFG to a PDA.
- Parameters:
g (Grammar) – the grammar to convert, which must be a CFG.
mode (str) –
selects which algorithm to use. Possible values are:
"topdown"
: nondeterministic top-down, as in Sipser (3e) Lemma 2.21."bottomup"
: nondeterministic bottom-up."ll1"
: LL(1) deterministic top-down."lr0"
: LR(0) deterministic bottom-up."lr1"
: LR(1) deterministic bottom-up.
- Returns:
a PDA equivalent to g.
- Return type: