Wednesday, July 26, 2006

Symbols in Python

Jeff Shell follows up on my DSL post:

There is something about that that just looks kindof... nice. :all. In my editor, Symbols are colorized differently than strings, which helps them stand out even more. find :all. Not findAll() or find(all=True) or find('all') like one might have in Python (all of which are OK solutions, but man.. those Symbols).

Evidently, he's missed the availability of my SymbolType package via the Cheese Shop.  ;-)

Anyway, Python can do symbols easily enough without any language changes.  That's why I highlighted function application and block syntax as the important features of Ruby where DSLs are concerned.  And of the two, block syntax is by far the more critical.
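For example, a minimal symbol type takes only a few lines of plain Python. This is a sketch, not the SymbolType package's API: the `Symbol` class, its interning table, and the `find` function are hypothetical, chosen to show that `find(ALL, ...)` can dispatch on an identity-comparable constant instead of a string or a boolean flag.

```python
class Symbol(object):
    """A named, interned constant: Symbol('all') is Symbol('all')."""

    _interned = {}

    def __new__(cls, name):
        # Reuse the existing instance so symbols can be compared with `is`.
        try:
            return cls._interned[name]
        except KeyError:
            self = object.__new__(cls)
            self.name = name
            cls._interned[name] = self
            return self

    def __repr__(self):
        return ":" + self.name


ALL = Symbol("all")
FIRST = Symbol("first")


def find(which, records):
    """Hypothetical finder that dispatches on a symbol, not a string flag."""
    if which is ALL:
        return list(records)
    if which is FIRST:
        return list(records[:1])
    raise ValueError("unknown selector %r" % (which,))
```

With this, `find(ALL, ["a", "b"])` returns both records, and `ALL` prints as `:all`. Interning is what makes `is` comparisons safe; a misspelled selector still fails loudly through the `ValueError`.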

Interestingly enough, Python actually came very close to getting a DSL-usable block capability in PEP 340, authored by Guido himself!  At the last minute, however, it was rejected as too flexible.  More specifically, an article about the problems with control-flow macros in C convinced Guido that a statement whose execution semantics were defined at runtime was a bad idea.  (The original PEP 340 "block" statement would have allowed the block body to execute zero or more times, rebinding the variables in the "as" clause.)
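Today's Python can only approximate that zero-or-more, rebinding behavior with a `for` loop over a generator: the generator (the "template") decides how many times the body runs and what gets bound to the loop variables. Here is a sketch with two hypothetical templates. Note that, unlike a PEP 340 block, a plain `for` loop gives the template no way to see or handle exceptions raised in the body.

```python
def when_present(mapping, key):
    """Template that runs the body zero times or once."""
    if key in mapping:
        yield mapping[key]


def pairwise(items):
    """Template that runs the body once per adjacent pair, rebinding both names."""
    items = list(items)
    for i in range(len(items) - 1):
        yield items[i], items[i + 1]


settings = {"debug": True}
seen = []
for value in when_present(settings, "verbose"):
    seen.append(value)            # never runs: the template yields nothing

steps = []
for before, after in pairwise([1, 4, 9, 16]):
    steps.append(after - before)  # runs three times, rebinding before/after
```

After this runs, `seen` is empty and `steps` is `[3, 5, 7]`: the same loop syntax executed its body zero times in one case and three times in the other, which is exactly the runtime-defined control flow that worried Guido.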

In truth, for many DSLs, PEP 340 still wouldn't have been good enough, even if it had been accepted.  What you really want for a block is something that's basically a function definition that can share variables with its enclosing scope, and possibly rebind them as well.
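Python 3 later added the `nonlocal` statement (PEP 3104), which makes it possible to sketch that kind of block as a decorated nested function. The `each` helper below is hypothetical: the decorator runs the "block" once per item at definition time, and `nonlocal` lets the block rebind variables of the enclosing function. It is still not real block syntax. The body has to be spelled as a `def`, and a `return` inside it only ends the current call of the block, not the enclosing function.

```python
def each(items):
    """Hypothetical DSL helper: run the decorated function once per item."""
    def run(block):
        for item in items:
            block(item)
        return block          # the decorated name still ends up bound to the function
    return run


def summarize(words):
    total = 0
    longest = ""

    @each(words)
    def _(word):
        # `nonlocal` lets the block rebind variables owned by summarize()
        nonlocal total, longest
        total += len(word)
        if len(word) > len(longest):
            longest = word

    return total, longest
```

Calling `summarize(["ab", "cde", "f"])` gives `(6, "cde")`: the block both read and rebound `total` and `longest`, which is the sharing a block needs and a plain lambda cannot provide.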

All in all, the best solution for Python DSLs is probably a macro or language-extension toolkit that translates syntax-sugared code into pure Python while preserving debugging information (e.g. keeping the line number tables correct, so you can step through the sugared code in a debugger).  I've taken a few steps in that direction with my (unreleased) SCALE library and BytecodeAssembler, but it's going to be a while before I get around to doing much more with them.
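A small-scale sketch of what "debug-info preserving" means, using the `ast` module from later Python versions rather than SCALE or BytecodeAssembler: a hypothetical sugar that reads `~name` as the string `'name'` is rewritten at the syntax-tree level, each new node inherits the location of the expression it replaces, and the result is compiled under the original filename, so tracebacks still point at the right line of the sugared source. The sugar is a toy. It hijacks `~` for every bare name, so `~x` no longer means bitwise inversion.

```python
import ast


class TildeNames(ast.NodeTransformer):
    """Toy sugar (hypothetical): rewrite `~name` into the string literal 'name'."""

    def visit_UnaryOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Invert) and isinstance(node.operand, ast.Name):
            literal = ast.Constant(value=node.operand.id)
            # Keep the sugared expression's line/column numbers on the new node.
            return ast.copy_location(literal, node)
        return node


def run_sugared(source, filename="<sugared>"):
    """Translate, compile under the original filename, and execute."""
    tree = TildeNames().visit(ast.parse(source, filename))
    ast.fix_missing_locations(tree)   # fill in any location info still missing
    namespace = {}
    exec(compile(tree, filename, "exec"), namespace)
    return namespace


ns = run_sugared("mode = ~all\nlimit = ~first\n")
```

Here `ns["mode"]` is `"all"`, and if the sugared source raised an error on its third line, the innermost traceback entry would report `<sugared>`, line 3, because no line was added or removed by the translation.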

The "big idea" behind SCALE is simply that Python's lexical syntax and indentation rules can be used to implement a variety of domain-specific languages or extended variants of Python.  (Right now, there's only a parsing/unparsing library implemented, though.)  And, if coupled with an appropriate import mechanism, it could allow unlimited extension to Python in the style of Logix, but without compromising on syntax.  (Logix's base language isn't precisely Python in syntax or semantics; I would want any Python extension languages to be truly Python in their roots.)
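As a rough illustration of that idea (not SCALE's code, which is unreleased), the standard `tokenize` module will lex text that isn't valid Python, as long as it follows Python's lexical rules, and still report INDENT and DEDENT tokens. A hypothetical `parse_outline` function can use those tokens to turn an indented mini-language into a tree of logical lines:

```python
import io
import tokenize


def parse_outline(text):
    """Turn indented text into nested (words, children) pairs."""
    root = []
    stack = [root]   # stack[-1] is the child list that new lines go into
    words = []       # token strings of the logical line being collected
    last = None      # most recently completed (words, children) pair
    skip = (tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER)
    for tok in tokenize.generate_tokens(io.StringIO(text).readline):
        if tok.type == tokenize.NEWLINE:
            if words:
                last = (words, [])
                stack[-1].append(last)
                words = []
        elif tok.type == tokenize.INDENT:
            stack.append(last[1])   # the previous line owns the indented block
        elif tok.type == tokenize.DEDENT:
            stack.pop()
        elif tok.type not in skip:
            words.append(tok.string)
    return root


SCHEMA = """\
schema filesystem:
    FSObject: parent = Dir
    Dir < FSObject:
        contents = set[FSObject]
    File < FSObject
"""

tree = parse_outline(SCHEMA)
```

Because the tokenizer does the work, the mini-language automatically inherits Python's rules for comments, string literals, and implicit line joining inside brackets; only the meaning of each line is left to define.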

Luckily, Python provides us with enough hooks that language extensibility is possible.  It's just not practical to do it without writing parsing code at the moment.  But maybe that will change some day.
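The import system is one of those hooks. Here is a minimal sketch using the `importlib` machinery of later Python versions: a hypothetical `SugarFinder` on `sys.meta_path` looks for `NAME.sugar` files in one directory, runs a caller-supplied translation function over the source, and compiles the result under the `.sugar` filename so tracebacks refer to the file the programmer actually wrote. The `let NAME be EXPR` sugar is a made-up, line-preserving toy.

```python
import importlib.abc
import importlib.util
import os
import re


class SugarLoader(importlib.abc.Loader):
    """Loads one .sugar file by translating it to Python first."""

    def __init__(self, filename, translate):
        self.filename = filename
        self.translate = translate

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        with open(self.filename, encoding="utf-8") as f:
            source = self.translate(f.read())
        # Compile under the .sugar filename so tracebacks point at it.
        exec(compile(source, self.filename, "exec"), module.__dict__)


class SugarFinder(importlib.abc.MetaPathFinder):
    """Finds top-level modules stored as NAME.sugar in one directory."""

    def __init__(self, directory, translate):
        self.directory = directory
        self.translate = translate

    def find_spec(self, fullname, path, target=None):
        if path is not None:          # this sketch ignores submodules
            return None
        filename = os.path.join(self.directory, fullname + ".sugar")
        if not os.path.exists(filename):
            return None
        loader = SugarLoader(filename, self.translate)
        return importlib.util.spec_from_file_location(fullname, filename, loader=loader)


def let_to_assign(source):
    # Toy translation that keeps line numbers: "let x be 6 * 7" -> "x = 6 * 7"
    return re.sub(r"(?m)^([ \t]*)let (\w+) be ", r"\1\2 = ", source)
```

To use it, insert `SugarFinder(some_directory, let_to_assign)` at the front of `sys.meta_path`; an ordinary `import` statement then picks up `.sugar` files from that directory. The translation function is the only DSL-specific part, which is where a real parser would go.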

Saturday, July 15, 2006

Schema Analysis and the need for Python-based DSLs

I recently came across this interesting project:

"The Alloy Analyzer is a tool developed by the Software Design Group for analyzing models written in Alloy, a simple structural modeling language based on first-order logic. The tool can generate instances of invariants, simulate the execution of operations (even those defined implicitly), and check user-specified properties of a model. Alloy and its analyzer have been used primarily to explore abstract software designs. Its use in analyzing code for conformance to a specification and as an automatic test case generator are being investigated in ongoing research projects."

I took a little time to go through the tutorial, and discovered that the language is very much like the domain-relational calculus, meaning that it expresses a schema in terms of sets and relationships between them.  It actually seems like a terrific language for designing object schemas and expressing constraints over them.  It's got me wondering whether I could perhaps find a way to extend this Python cookbook recipe for expressing prolog-like rules to support the full expressiveness of Alloy, but using a nice Pythonic syntax.
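To make "sets and relationships between them" concrete, here is a minimal plain-Python sketch (not Alloy, and not the cookbook recipe): a type is a set of atoms, a relation is a set of pairs, and repeatedly composing a relation with itself gives the transitive closure needed to check a constraint like "no directory contains itself, directly or indirectly". All names and sample data are hypothetical.

```python
def join(r, s):
    """Relational composition: (a, b) in r and (b, c) in s give (a, c)."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def closure(r):
    """Transitive closure: pairs reachable by following r one or more times."""
    result = set(r)
    while True:
        bigger = result | join(result, r)
        if bigger == result:
            return result
        result = bigger


def no_dir_contains_itself(dirs, contents):
    """Constraint check: no directory is reachable from itself via contents."""
    reachable = closure(contents)
    return all((d, d) not in reachable for d in dirs)


# Types as sets of atoms, relations as sets of pairs (sample data).
Dir = {"root", "home", "docs"}
File = {"notes.txt"}
contents = {("root", "home"), ("home", "docs"), ("docs", "notes.txt")}
cyclic = contents | {("docs", "home")}
```

With this data, `no_dir_contains_itself(Dir, contents)` is true, while adding the `("docs", "home")` pair makes `home` reachable from itself and the check fails.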

One of the big problems in expressing schemas compactly in pure Python is that you often need to be able to refer to types before they are defined.  For example, if you want to say that a FileSystemObject has a parent that is a Directory, but Directory is a kind of FileSystemObject, then you have a forward reference that can't be easily expressed in today's Python.  The schema definition tools that I created for Chandler and PEAK both work around this either by requiring the later type (Directory) to define an inverse relationship that then links to the forward relationship, or else by using strings to refer to types not yet defined.  Neither of these approaches is particularly satisfactory.
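To make the string-based workaround concrete, here is a minimal sketch of deferring a type lookup until the whole schema has been defined. `Ref`, `register`, and `REGISTRY` are hypothetical illustrations, not the Chandler or PEAK APIs. The weakness shows up in the code: a misspelled name is only caught when the reference is first resolved, not when the class is defined.

```python
REGISTRY = {}


def register(cls):
    """Class decorator recording each schema type by name."""
    REGISTRY[cls.__name__] = cls
    return cls


class Ref(object):
    """A type reference that may name a class that doesn't exist yet."""

    def __init__(self, target):
        self.target = target   # either a class or a class name (string)

    def resolve(self):
        if isinstance(self.target, str):
            # Looked up lazily: forward references work, but a misspelled
            # name is only detected here, on first use.
            self.target = REGISTRY[self.target]
        return self.target


@register
class FSObject(object):
    parent = Ref("Directory")   # Directory is not defined yet


@register
class Directory(FSObject):
    pass
```

After both classes exist, `FSObject.parent.resolve()` returns `Directory`; `Ref("Dirctory").resolve()` would raise `KeyError` only at that point.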

So the advantage of the clever cookbook recipe is that it simply gives every name in a function body a symbolic meaning, automatically creating objects for those names and executing the function's code in a context where the names are bound.  Then, traditional operator-overloading techniques can take over from there.  It seems like one could use this to define a schema with something like:

@schema
def foo():
    + FSObject(parent = Dir)
    + Dir(contents = set[FSObject]) < FSObject
    + File < FSObject

Okay, so that's not exactly pretty. In fact, about the only good thing about it is that it allows forward references. But I do plan to study Alloy's reference manual and see if I can find a more natural mapping from its concepts to Python. The forward-reference issue is actually a pretty minor problem, compared to concisely expressing generalized set constraints like "no directory can be a child of itself" in Python. These are just a few of the things needed to implement my utopian dreams for peak.schema, which effectively require a logic or set (mini-)language to sort out.
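For the curious, here is one way the name-binding half of the `@schema` example could work. It is a sketch under assumptions, not the cookbook recipe's actual code: the hypothetical `bind_names` decorator rebuilds the function with a fresh globals dictionary that binds every global name listed in its code object's `co_names` to a symbolic `Name`, then calls it once. Only `<` is overloaded here, to record "is a kind of". Builtins mentioned in the body get shadowed too, and nested functions or comprehensions are not handled.

```python
import types


class Name(object):
    """A symbolic stand-in for an identifier used in a schema body."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __lt__(self, other):
        # `Dir < FSObject` records "Dir is a kind of FSObject".
        self.log.append((self.name, "isa", other.name))
        return True

    def __repr__(self):
        return self.name


def bind_names(func):
    """Run func's body with every global name it mentions bound to a Name."""
    log = []
    namespace = {name: Name(name, log) for name in func.__code__.co_names}
    # Same code object, new globals: each name lookup now finds a Name.
    types.FunctionType(func.__code__, namespace)()
    return log   # the decorated name ends up bound to what was recorded


@bind_names
def relations():
    Dir < FSObject
    File < FSObject
```

After decoration, `relations` is `[("Dir", "isa", "FSObject"), ("File", "isa", "FSObject")]`. Neither `Dir` nor `File` had to exist beforehand, which is the forward-reference benefit; the rest of the schema syntax would come from overloading more operators on `Name`.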

Unfortunately, Python isn't the most suitable language in the world for creating domain-specific languages like these. This is one place where Ruby really does have an advantage over Python -- as opposed to all the supposed advantages touted by Ruby enthusiasts who don't get that Python already has all the stuff they're jabbering about.

However, Ruby's advantage in this area basically boils down to two things: being able to apply functions without parentheses, and having code blocks. The first is nice because it makes it possible to create commands or pseudo-statements, and the second is a necessity because it allows those pseudo-statements to encompass code blocks. Without at least the second of these features, Python is never going to be suitable for heavy-duty DSL construction.

Blocks, however, are probably the one thing that Python will never get, due to concerns about people creating obfuscated code. That is, allowing every programmer to be a language designer means that every program can become its own language: the slippery slope that leads to Lisp. :-) One of the things that makes Python great is that it's a simple, easy-to-learn language. Sort of a "learn once, read anything" principle. Customizable syntax means that in the degenerate case there could be a new language to learn for every program you want to read. I'm honestly not sure I'd want to open those floodgates, despite the fact that it occasionally inconveniences me not to be able to create DSLs.