Parsing in Python
Did you ever feel the need to learn lex and yacc? I did.
I recently found a great Python module for parsing grammars: pyparsing. In contrast to traditional parser generators, this framework doesn't require you to learn a specific toolchain. It also doesn't generate any code. It's a class library: you construct your grammar by connecting objects.
When building very basic grammars, the code looks very similar to BNF. Thanks to Python's operator overloading, it's possible to compose parse nodes (non-terminals) using operators like + (concatenation), ^ (or: all alternatives are tried and the longest match wins) and | (match-first: the first alternative that matches wins). Here's what it looks like:
    from pyparsing import *

    IntLiteral = Regex('[\\+\\-]?\\d+').setParseAction(lambda s,l,t: int(t[0]))
    VariableName = Regex('\\w+')
    EqualSign = Regex('\\s*=\\s*').suppress()
    WS = White().suppress()
    KeyValue = Group(VariableName + EqualSign + IntLiteral)
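The difference between ^ and | is easiest to see when one alternative is a prefix of another. Here is a minimal sketch; the names first_match and longest are mine, chosen only for illustration, and it uses the same parseString() method as the rest of this post:

```python
from pyparsing import Literal

# "|" builds a MatchFirst: alternatives are tried left to right,
# and the first one that matches is used.
first_match = Literal("=") | Literal("==")

# "^" builds an Or: every alternative is tried,
# and the longest match is used.
longest = Literal("=") ^ Literal("==")

print(first_match.parseString("==").asList())  # ['=']
print(longest.parseString("==").asList())      # ['==']
```

With |, the order of the alternatives matters: putting Literal("==") first would make both expressions return ['==']. The key/value grammar above only needs +, so neither alternation operator appears in it.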
Strings can now be parsed by calling parseString() on the grammar:
    self.assertEqual([['foo', 234]], KeyValue.parseString('foo=234').asList())
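The assertion above lives inside a test method. If you want to run it on its own, here is a minimal self-contained sketch. The class name KeyValueTest and the second test are my additions for illustration. It imports only the names it needs, writes the patterns as raw strings, and uses assertEqual, because the assertEquals alias was removed from unittest in Python 3.12.

```python
import unittest

from pyparsing import Group, ParseException, Regex

# Same grammar as in the post, written with raw strings.
IntLiteral = Regex(r'[+-]?\d+').setParseAction(lambda s, l, t: int(t[0]))
VariableName = Regex(r'\w+')
EqualSign = Regex(r'\s*=\s*').suppress()
KeyValue = Group(VariableName + EqualSign + IntLiteral)


class KeyValueTest(unittest.TestCase):  # hypothetical test class name
    def test_parses_pair(self):
        # The parse action converts the matched digits to an int.
        self.assertEqual([['foo', 234]],
                         KeyValue.parseString('foo=234').asList())

    def test_rejects_non_integer_value(self):
        # 'bar' does not match IntLiteral, so pyparsing raises ParseException.
        with self.assertRaises(ParseException):
            KeyValue.parseString('foo=bar')
```

Saved as test_keyvalue.py (a file name I made up), this can be run with python -m unittest test_keyvalue.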
For my requirements, this is a very usable approach to parsing. It may not be as fast as a generated parser in C, but it's easy to learn and takes way less time to write.
Update: Shortly afterwards, I found out there is a name for this approach: it's a parser combinator library.