2009-07-14

Parsing numbers in LL(1) grammar.

Grammar for numbers:
NUMS --> NUM NUMS NUMS --> NUM --> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
This grammar is belongs in LL(1) grammar type.

Parser for this grammar done in parse.py:

#!/usr/bin/env python # -*- coding: utf-8 -*- # Copyright (C) 2009 by Oleksandr Gavenko class ctx: def __init__(self): self.__n = 0 def add_digit(self, digit): self.__n *= 10 self.__n += digit def get_n(self): return self.__n n = property(get_n) _ctx = ctx() _ctx.add_digit(5) _ctx.add_digit(6) print _ctx.n == 56 def num_token(str, pos, ctx): ctx.add_digit(int(str[pos])) def nums_token(str, pos, ctx): if len(str) > pos: num_token(str, pos, ctx) nums_token(str, pos+1, ctx) def parse(str): _ctx = ctx() nums_token(str, 0, _ctx) return _ctx.n str = "123" print parse(str) == 123
Here some significant design ideas:
  • nums_token and num_token functions represent production parser for non terminals NUMS and NUM.
  • nums_token and num_token have recursively relationships.
  • nums_token and num_token in there code change state of ctx (side effect, do useful job).