Adding a new statement to Python's syntax in python_terp

This stackoverflow answer [1] shows how to add an until statement to Python. In this post I will show how to do that in python_terp and how to debug these modifications.

This will also serve as some of the missing documentation for python_terp.

The until statement is analogous to the while statement. Its block is executed until the condition is met. As in the stackoverflow answer, we expect

num = 3
until num == 0:
    print(num)
    num -= 1

to print

3
2
1

Adding until_stmt

Start the interpreter and try this.

$ ipython -i test/python_repl.py
p>> simport simple_ast
p>> num = 3
p>> until num == 0:
...     print(num)
...     num =- 1
...
...

Not too surprisingly, with no modifications, the parser isn't able to handle the until statement. Let's press Control-D and drop to the host interpreter to make our changes.

We will introduce a new grammar rule until_stmt similar to while_stmt [2].

In [1]: grammar = python_grammar.full_definition + python_grammar.extra
In [2]: grammar += r"""
...: until_stmt = "until" {test} ":" {suite}
...: """

and add it to the list of compound statements

In [3]: grammar += r"""
   ...: compound_stmt = if_stmt | while_true_stmt=while_true | while_stmt
   ...:           | until_stmt
   ...:           | simple_for_stmt | for_stmt | try_stmt | with_stmt
   ...:           | funcdef | classdef | decorated
   ...: """

and replace the parser with a new parser with the modified grammar

In [4]: pyterp.parser = python.Interpreter(i3.parse("grammar", grammar))

Now add the semantics by adding a function to simple_ast.py mimicking while_stmt.

def until_stmt(condition, block):
    while_true:
        if evaluate(condition):
            return None
        evaluate(block)

And that's it, we're done! Save simple_ast.py and test it out.

In [5]: pyterp.repl()
p>> simple_ast.reload_module(simple_ast)
p>> num = 3
p>> until num == 0:
...     print(num)
...     num -= 1
... 
3
2
1

Debugging

Now, we won't always be so lucky and get everything right on the first try. Let's see some examples of debugging.

Debugging the parser

To help debug the parsing step, we can add breakpoints to the grammar with !(bp()). For example

until_stmt = !(bp()) "until" {test} ":" {suite} !(bp())

bp() is just shorthand for pdb.set_trace() which is imported by default. We could write any Python expression between !(...).

Let's also set .source this time so we can get stack traces in the parser.

In [7]: grammar += r"""until_stmt = !(bp()) "until" {test} ":" {suite} !(bp())"""
In [8]: pyterp.parser = python.Interpreter(i3.parse("grammar", grammar))
In [9]: pyterp.parser.source = grammar

Now if we read input from a string or file (reading from the p>> prompt will break too often: once after each line). Instead we'll make a string with the test and parse that.

In [10]: s = """
  ....: until num == 0:
  ....:     print(num)
  ....:     num -= 1
  ....: """
In [11]: pyterp.parser.parse("grammar", s)

We hit the first breakpoint. Here, we can examine the input

--Return--
> <string>(1)<module>()->None
(Pdb) self.input
['\nuntil num == 0:\n    print(num)\n    num -= 1\n', 0]

(0 is the index in the input we are at) and the grammar matcher's stack to make sure we're at the right place.


(Pdb) self.st()
0  In file <grammar> line 162 function None (apply)
grammar = file_input
1  In file <grammar> line 7 function file_input (and)
file_input = (EMPTY_LINE | SAME_INDENT stmt)* ENDMARKER
2  In file <grammar> line 7 function file_input (quantified)
file_input = (EMPTY_LINE | SAME_INDENT stmt)* ENDMARKER
3  In file <grammar> line 7 function file_input (or)
file_input = (EMPTY_LINE | SAME_INDENT stmt)* ENDMARKER
4  In file <grammar> line 7 function file_input (and)
file_input = (EMPTY_LINE | SAME_INDENT stmt)* ENDMARKER
5  In file <grammar> line 7 function file_input (apply)
file_input = (EMPTY_LINE | SAME_INDENT stmt)* ENDMARKER
6  In file <grammar> line 21 function stmt (or)
stmt = compound_stmt | simple_stmt
7  In file <grammar> line 21 function stmt (apply)
stmt = compound_stmt | simple_stmt
8  In file <grammar> line 170 function compound_stmt (or)
compound_stmt = if_stmt | while_true_stmt=while_true | while_stmt
          | until_stmt
          | simple_for_stmt | for_stmt | try_stmt | with_stmt
          | funcdef | classdef | decorated
9  In file <grammar> line 171 function compound_stmt (apply)
          | until_stmt
10 In file <grammar> line 169 function until_stmt (and)
until_stmt = !(bp()) "until" {test} ":" {suite} !(bp()) 
11 In file <grammar> line 169 function until_stmt (action)
until_stmt = !(bp()) "until" {test} ":" {suite} !(bp()) 

Everything looks alright. Let's keep going.

(Pdb) c
--Return--
> <string>(1)<module>()->None
(Pdb) self.input
['\nuntil num == 0:\n    print(num)\n    num -= 1\n', 20]
(Pdb) self.st()
[...]

25 In file <grammar> line 169 function until_stmt (action)
until_stmt = !(bp()) "until" {test} ":" {suite} !(bp()) 

We're now at the first breakpoint in the nested suite of the until block. Continue until we're finally back and at the second breakpoint.

(Pdb) c
--Return--
> <string>(1)<module>()->None
(Pdb) c
--Return--
> <string>(1)<module>()->None
(Pdb) self.st()
[...]

11 In file <grammar> line 169 function until_stmt (action)
until_stmt = !(bp()) "until" {test} ":" {suite} !(bp())

We can look at the parent frame's cumulated outputs to see if it matches our expectation

(Pdb) self.stack[-2].outputs
[None, 'until', __binary__['==', NAME['num'], NUMBER[0]], ':',
suite[print_stmt[NAME['num']], aug_assign[NAME['num'],
operation['-='], NUMBER[1]]]]

Let's pretty print some of that to see better

(Pdb) self.stack[-2].outputs[2].pprint()
__binary__
  str '=='
  NAME
    str 'num'
  NUMBER
    int 0
(Pdb) self.stack[-2].outputs[4].pprint()
suite
  print_stmt
    NAME
      str 'num'
  aug_assign
    NAME
      str 'num'
    operation
      str '-='
    NUMBER
      int 1

That looks like the condition and block's body, as expected. Now continue until parsing is complete.

Debugging semantics

For debugging the newly defined until_stmt function, we can add breakpoints with bpoint.

def until_stmt(condition, block):
    while_true:
        bpoint()
        if evaluate(condition):
            return None
        evaluate(block)

This drops us in the pdb debugger when bpoint(). From there we can get a stack trace and other information

(Pdb) u
-> return func(*args_value)
(Pdb) self.stack[-1].scope.keys()
['__func_name__', '__caller__', '__break__', 'block',
'__parent__', '__return__', '__func__', '__continue__',
'condition']
(Pdb) self.st()

0  In file <console-3> line 0 function __main__ (until_stmt)
until num == 0:
    print(num)
    num -= 1


1  In file <console-3> line 0 function __main__ (__call__)
until num == 0:
    print(num)
    num -= 1


2  In file lib/simple_ast.py line 151 function until_stmt (while_true)
    while_true:
        bpoint()
        if evaluate(condition):
            return None
        evaluate(block)


3  In file lib/simple_ast.py line 151 function until_stmt (suite)
    while_true:
        bpoint()
        if evaluate(condition):
            return None
        evaluate(block)


4  In file lib/simple_ast.py line 152 function until_stmt (__call__)
        bpoint()
5  In file lib/simple_ast.py line 152 function until_stmt (NAME)
        bpoint()
(Pdb) p self.stack[1].scope['num']
3

We'll continue once we're done with examining values

(Pdb) c

We could modify the AST for later iterations. For example if we no longer want a breakpoint and instead want to print the value of the condition.

Examining the stack with self.st(), we see that the bpoint() call is the index 0 child of the root of self.stack[3].

(Pdb) self.stack[3].root.pprint()
suite
  __call__
    NAME
      str 'bpoint'
    arglist
  single_if
    __call__
      NAME
        str 'evaluate'
      arglist
        NAME
          str 'condition'
    return_stmt
      NAME
        str 'None'
  __call__
    NAME
      str 'evaluate'
    arglist
      NAME
        str 'block'
  EMPTY_LINE
    str '\n'

Let's set it to print(evaluate(condition)) to show the truth value of the condition.

(Pdb) self.stack[3].root[0] = Node("print_stmt",
[Node("__call__", [Node("NAME", ["evaluate"]),
Node("arglist", [Node("NAME", ["condition"])])])])

That's pretty tedious to type. Instead we can use the parser

(Pdb) self.stack[3].root[0] = self.parser.parse("grammar",
"print(evaluate(condition))\n")

There's actually a shortcut made for this:

(Pdb) self.stack[3].root[0] = self.ast("print(evaluate(condition))")

Let's check we made the change at the right place and then continue execution.

(Pdb) self.stack[3].root.pprint()
suite
  print_stmt
    __call__
      NAME
        str 'evaluate'
      arglist
        NAME
          str 'condition'
  single_if
    __call__
      NAME
        str 'evaluate'
      arglist
        NAME
          str 'condition'
    return_stmt
      NAME
        str 'None'
  __call__
    NAME
      str 'evaluate'
    arglist
      NAME
        str 'block'
  EMPTY_LINE
    str '\n'
(Pdb) c
2
False
1
True
p>>

The loop stopped when the condition is true, as expected.

Stepping through

To execute the next AST node in python_terp (rather than the host interpreter), we can enable the debugger after hitting the first breakpoint.

(Pdb) self.debug = True
(Pdb) c
(dbg)

This debugger is a bit buggy at the moment from lack of use and maintenance. It supports some of the usual commands such as

It can store multiple commands with the b prefix. So b n n n s s is the same as pressing n three times (followed by enter each time) and then s twice. Its intended to be used to come back to an earlier point from a breakpoint quickly (in cases where its not worth setting up the appropriate breakpoint at the destination instead).

Finally, a lone ! drops us in the python_terp REPL (while keeping the current call stack).

Adding statements in CPython

Since that stackoverflow answer, new features in Python 3.3+ allows more ways to modify CPython by using only Python. So to add until_stmt, we can probably use token transformations into while_stmt (and not) although I haven't thought this one through.

Footnotes

[1] That answer is a mirror of this blog post by the same author.

[2] For simplicity, we'll add until_stmt without an optional else block. Adding an else block would not be very difficult either.

Posted on Feb 4, 2017

Blog index