Regular expressions and grammars

Module tock.regexps

class tock.regexps.RegularExpression(op, args, start=None, end=None)

A (abstract syntax tree of a) regular expression.

Parameters:
  • op (str) – Possible values are: ‘union’, ‘concatenation’, ‘star’, ‘symbol’.

  • args – tuple of RegularExpression or Symbol objects.

The empty string is represented as RegularExpression(‘concatenation’, ()).

The empty set is represented as RegularExpression(‘union’, ()).

classmethod from_str(s)

Constructs a RegularExpression from a str.

Regular expressions allow the following operations (precedence from lowest to highest):

  • Symbols

  • Empty string (&)

  • Empty set ()

  • Union (|)

  • Concatenation: Two concatenated symbols must be separated by a space. For example, a b is a concatenated with b, but ab is a single symbol.

  • Kleene star (*)

tock.regexps.from_regexp(e, display_steps=False)

Convert a regular expression to a NFA.

Parameters:
  • e (RegularExpression or str) – the regular expression to convert.

  • display_steps (bool) – if True and if run inside a Jupyter notebook, displays all steps of the conversion.

tock.regexps.to_regexp(m, display_steps=False)

Convert a finite automaton to a regular expression.

Parameters:
  • m (Machine) – the automaton to convert, which must be a finite automaton.

  • display_steps (bool) – if True and if run inside a Jupyter notebook, displays all steps of the conversion.

Module tock.grammars

class tock.grammars.Grammar

A string-rewriting grammar.

add_nonterminal(x)

Add x to the nonterminal alphabet.

add_rule(lhs, rhs)

Add rule with left-hand side lhs and right-hand side rhs, where lhs and rhs are both Strings.

compute_first(nullable=None)

Compute, for every terminal, nonterminal, and rhs suffix α, the set of terminals b where α ⇒* b γ for some γ.

compute_follow(nullable=None, first=None)

Compute, for every nonterminal A, the set of terminals b where S →* γ A b δ for some γ, δ.

compute_nullable()

Compute, for every nonterminal and rhs suffix α, whether α ⇒* ε.

classmethod from_file(filename)

Read a grammar from a file.

Parameters:

filename (str) – name of file to read from

Returns:

a CFG

Return type:

Grammar

The file should contain one rule per line, for example:

S -> a S b
S -> &

Currently the grammar must be a context-free grammar. The nonterminal symbols are exactly those that appear on a left-hand side. The left-hand side of the first rule is the start symbol.

classmethod from_lines(lines)

Read a grammar from a list of strings (see from_file).

Parameters:

lines – a list of strings

Returns:

a CFG

Return type:

Grammar

has_strict_start()

Returns True iff the start nonterminal does not appear in the rhs of any rule. I don’t know what the correct terminology for this is.

is_contextfree()

Returns True iff the grammar is context-free.

is_contextsensitive()

Returns True iff the grammar is context-sensitive, that is, each rule is of the form α A β → α B β where one of the following is true:

  • A is a nonterminal and len(B) > 0

  • A = S, α = β = B = ε, and S does not occur on the rhs of any rule

is_leftlinear()

Returns True iff the grammar is left-linear, that is, it is context-free and every rule is of the form A → B w or A → w where w contains only terminals.

is_noncontracting()

Returns True iff the grammar is essentially noncontracting, that is, each rule is of the form α → β where one of the following is true:

  • len(β) ≥ len(α)

  • α = S, β = ε, and S does not occur on the rhs of any rule

is_rightlinear()

Returns True iff the grammar is left-linear, that is, it is context-free and every rule is of the form A → w B or A → w where w contains only terminals.

remove_useless()

Returns a new grammar containing just useful rules.

set_start_nonterminal(x)

Set the start symbol to x. If x is not already a nonterminal, it is added to the nonterminal alphabet.

tock.grammars.from_grammar(g, mode='topdown')

Convert a CFG to a PDA.

Parameters:
  • g (Grammar) – the grammar to convert, which must be a CFG.

  • mode (str) –

    selects which algorithm to use. Possible values are:

    • "topdown": nondeterministic top-down, as in Sipser (3e) Lemma 2.21.

    • "bottomup": nondeterministic bottom-up.

    • "ll1": LL(1) deterministic top-down.

    • "lr0": LR(0) deterministic bottom-up.

    • "lr1": LR(1) deterministic bottom-up.

Returns:

a PDA equivalent to g.

Return type:

Machine