Parsing in Python

Did you ever feel the need to learn lex and yacc? I did.

I just recently found a great Python module for writing parsers: pyparsing. In contrast to traditional parser-generating approaches, this framework doesn't require you to learn a specific toolchain, and it doesn't generate any code. It's a class library: you construct your grammar by connecting objects.

When building very basic grammars, the code looks very similar to BNF. Thanks to Python's operator overloading, you can compose parse nodes (non-terminals) using operators like + (concatenation), ^ (Or, which picks the longest matching alternative) and | (MatchFirst, which picks the first alternative that matches). Here's what it looks like:

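from pyparsing import Group, Regex, White

# Optional sign followed by digits; the parse action turns the matched text into an int
IntLiteral = Regex(r'[+-]?\d+').setParseAction(lambda s, l, t: int(t[0]))
# A name made of word characters (letters, digits, underscore)
VariableName = Regex(r'\w+')
# '=' with optional surrounding whitespace; suppress() drops it from the results
EqualSign = Regex(r'\s*=\s*').suppress()
# Explicit whitespace token (not used below: pyparsing skips whitespace between tokens by default)
WS = White().suppress()

# Group() keeps the name and its value together as one sub-list
KeyValue = Group(VariableName + EqualSign + IntLiteral)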

Strings can now be parsed by calling parseString() on the grammar:

result = KeyValue.parseString('foo=234').asList()
assert result == [['foo', 234]]  # '=' was suppressed, 234 is already an int
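Two behaviours are worth knowing once you start calling parseString(). By default it stops after the first successful match and silently ignores whatever input is left over; passing parseAll=True makes leftover input an error. And the difference between | and ^ only shows up when one alternative is a prefix of another. The following is a small sketch of both, repeating the KeyValue grammar so it runs on its own. It uses the camelCase names from the example above; pyparsing 3 also provides snake_case equivalents such as parse_string, as_list and parse_all.

```python
from pyparsing import Group, Literal, ParseException, Regex

# Same KeyValue grammar as above, repeated so this snippet is self-contained
IntLiteral = Regex(r'[+-]?\d+').setParseAction(lambda t: int(t[0]))
VariableName = Regex(r'\w+')
EqualSign = Regex(r'\s*=\s*').suppress()
KeyValue = Group(VariableName + EqualSign + IntLiteral)

# Whitespace around '=' and a signed value are accepted
print(KeyValue.parseString('foo = -7').asList())      # [['foo', -7]]

# By default, input left over after a successful match is silently ignored
print(KeyValue.parseString('foo=234 junk').asList())  # [['foo', 234]]

# parseAll=True requires the grammar to consume the whole input
try:
    KeyValue.parseString('foo=234 junk', parseAll=True)
except ParseException as err:
    print('rejected:', err)

# '|' (MatchFirst) returns the first alternative that matches,
# '^' (Or) tries every alternative and keeps the longest match
first = Literal('=') | Literal('==')
longest = Literal('=') ^ Literal('==')
print(first.parseString('==').asList())    # ['=']
print(longest.parseString('==').asList())  # ['==']
```

With |, ordering matters: putting the longer alternative first gives the same result as ^ without having to try every alternative.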

For my needs, this approach to parsing works very well. It may not be as fast as a generated parser in C, but it's easy to learn and takes far less time to write.

Update: Shortly afterwards, I found out there is a name for this approach: it's a parser combinator library.
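To make the term concrete, here is a minimal hand-rolled sketch of the idea in plain Python, with no pyparsing involved. A parser is assumed to be a function taking (text, pos) and returning (value, new_pos) or None on failure; a combinator is a function that builds a new parser out of existing ones. All names here (literal, char_in, seq, alt, many1, map_result) are hypothetical and chosen for illustration; they are not pyparsing's API or how pyparsing is implemented internally.

```python
import string
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos), or None on failure
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def literal(s: str) -> Parser:
    """Match the exact string s."""
    def parse(text: str, pos: int):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def char_in(chars: str) -> Parser:
    """Match a single character contained in chars."""
    def parse(text: str, pos: int):
        if pos < len(text) and text[pos] in chars:
            return text[pos], pos + 1
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails (like +)."""
    def parse(text: str, pos: int):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def alt(*parsers: Parser) -> Parser:
    """Return the result of the first parser that succeeds (like |)."""
    def parse(text: str, pos: int):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def many1(p: Parser) -> Parser:
    """Apply p one or more times, collecting the values."""
    def parse(text: str, pos: int):
        values = []
        while True:
            result = p(text, pos)
            # Stop on failure, or if p consumed nothing (avoids an endless loop)
            if result is None or result[1] == pos:
                break
            value, pos = result
            values.append(value)
        return (values, pos) if values else None
    return parse


def map_result(p: Parser, fn: Callable[[Any], Any]) -> Parser:
    """Transform a successful result with fn (like a parse action)."""
    def parse(text: str, pos: int):
        result = p(text, pos)
        if result is None:
            return None
        value, pos = result
        return fn(value), pos
    return parse


# A key=value grammar built only from the combinators above
name = map_result(many1(char_in(string.ascii_letters)), "".join)
number = map_result(many1(char_in(string.digits)), lambda ds: int("".join(ds)))
value = alt(number, name)
key_value = map_result(seq(name, literal("="), value), lambda v: [v[0], v[2]])

print(key_value("foo=234", 0))  # (['foo', 234], 7)
print(key_value("x=y", 0))      # (['x', 'y'], 3)
print(key_value("=234", 0))     # None
```

The closures here play the role of pyparsing's classes. pyparsing wraps the same idea in objects so that operators such as + and | can be overloaded, which is what gives its grammars their BNF-like look.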