Tuesday, August 02, 2011

WSGI, Web Frameworks, and Requests: Explicit or Implicit?

In Python web programming and frameworks, there is a constant juggling act that takes place between "explicit" and "implicit".  Too explicit, and the code may get too verbose or unwieldy.  Too implicit, and the code may lose clarity or maintainability.

And nowhere can this tension be more clearly seen than in the area of "request" objects.  After all, nearly every web programming framework has some notion of a "request" at its core: usually some sort of object with an API.

Now, as you may recall from my previous article, the Web-SIG originally set out in 2003 to standardize a universal "request" API for Python, but I diverted this effort towards a different sort of request API -- the WSGI "environ" object.

Where web framework request APIs usually emphasize methods and properties, the WSGI "environ" object is just a big bag of data.  It doesn't have any operations or properties.

But the upside to this downside is that the environment is extensible, in a way that a request object is not.  You can add whatever you want to it, and you can call functions on it to do things that a request object would do with methods.  (Yay, freedom!)

But the new downside to that upside is that if you want to use library functions on the environ instead of framework "request" object methods, you now have to pass the environ back into the library functions!  (Boo, hiss.)
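To make the trade-off concrete, here's a hedged sketch (not taken from any particular library; all names are made up) of the same operation written both ways:

```python
# Illustrative only: the same lookup as a "function on the environ"
# versus a method on a request object that wraps the environ.

def get_host(environ):
    # Library-function style: the environ must be passed in explicitly.
    return environ.get('HTTP_HOST', environ.get('SERVER_NAME', ''))

class Request:
    # Wrapper style: bind the environ once, then call methods on it.
    def __init__(self, environ):
        self.environ = environ

    def host(self):
        return get_host(self.environ)

environ = {'HTTP_HOST': 'example.com'}
assert get_host(environ) == Request(environ).host() == 'example.com'
```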

Binding To The Environment

So, WSGI-era web libraries (like WebOb and Werkzeug) tend to define their next-generation "request" objects as wrappers bound to the environ.  As Ian Bicking put it:

"Everything WebOb does is basically functions on the environ"

Of course, this isn't the only strategy for managing request information.  Some web app frameworks dodge the argument-passing issue by using thread locals, or worse yet, global variables.  But they're still trying to solve the same problem: connecting actions that a web application needs to perform, with some notion of the "current request".

And in both cases, a key driver for the API design is brevity and ease-of-use (implicit) vs. clarity and consistency (explicit).

On the explicit side, it's annoying to be constantly saying "foo = bar(environ, etc)", if only because it somehow looks less Pythonic than "foo = request.bar(etc)".

So in effect, what we want in our frameworks is a way to (implicitly) bind operations to the "request", so that it isn't necessary to explicitly spell out the connection in every line of code.  (Even if we're still explicitly referencing the request object.)

In fact, we don't even want to have to include boilerplate like 'request = Request(environ)' at the top of our apps' code, and so we'd much rather have this binding take place outside our code entirely.

Now, this is where things get really interesting!  In order to get rid of this boilerplate, web libraries and frameworks will usually do one of two things.  Either:

  1. They provide a decorator to change the calling signature while keeping external WSGI compliance (like WebOb), or
  2. They ditch WSGI entirely and use a different calling signature  (like Django)
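Option 1 can be sketched roughly like this -- a hypothetical decorator in the spirit of WebOb's, with made-up names; the real libraries differ in detail:

```python
# Hypothetical sketch of option 1: inside the decorator, the app sees a
# request object; outside, it's still a plain WSGI callable.

class SimpleRequest:
    # Stand-in for a library's request wrapper.
    def __init__(self, environ):
        self.environ = environ

    @property
    def path(self):
        return self.environ.get('PATH_INFO', '/')

def request_style(func):
    # Wrap a request-taking function as a standard WSGI app.
    def wsgi_app(environ, start_response):
        body = func(SimpleRequest(environ))
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [body.encode('utf-8')]
    return wsgi_app

@request_style
def hello(request):
    return 'Hello from ' + request.path
```

From the server's point of view, `hello` is still an ordinary two-argument WSGI app; only its author sees the changed signature.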

And in either case, we're now more or less back where we started, pre-WSGI: you're writing code with a calling signature that's implicitly coupled to a specific library or framework.

Sure, you get certain benefits in exchange for making this commitment, and you're less tightly coupled to libraries using option 1.  But it's still a pretty exclusive commitment.  If you want to use code from more than one library, you're going to have to write the boilerplate for each of them, except for whichever one you choose to be your "primary" - the main one that calls you and/or decorates your code.

The Original Goal Of WSGI

Now, the original idea for WSGI (well, my original idea, anyway) was that by letting "request" objects wrap the environ, and using "functions on the environ", we could get out of this situation.  As I wrote in the original PEP 333 rationale section:

"If middleware can be both simple and robust, and WSGI is widely available in servers and frameworks, it allows for the possibility of an entirely new kind of Python web application framework: one consisting of loosely-coupled WSGI middleware components.

"Indeed, existing framework authors may even choose to refactor their frameworks' existing services to be provided in this way, becoming more like libraries used with WSGI, and less like monolithic frameworks. This would then allow application developers to choose "best-of-breed" components for specific functionality, rather than having to commit to all the pros and cons of a single framework."

But what I didn't understand then, was just how annoying it is to have to explicitly pass the environ into every library function you want to use!

(Actually, it's not just that it's annoying from a number-of-keystrokes point of view, it's also more foreign to a Python programmer's sensibilities.  We don't usually mind receiving an explicit "self", but for some reason, we seem to hate sending one!)

And that (in a somewhat roundabout way) is how I ended up adding the experimental "binding" protocol to WSGI Lite.

Specifically, what the binding protocol provides is a way to generically bind things to the environ dictionary, and pass them into your application's calling signature, while retaining WSGI compliance for any code that calls your function.

In other words, the binding protocol is a way to make it so that you can use as many libraries, functions, or objects for your request as you want, without needing to pass an 'environ' parameter to them over and over.

Now, in the simplest case, you can just use the binding protocol as a generic way to obtain any given library's request objects.  You can say, "my 'request' parameter maps to a WebOb request", for example.

But the really interesting cases come about, when you stop thinking in terms of "request" objects, and start thinking about what your application really does.

The Meaning of "Lite"

For example, why not bind a session object to your function's 'session' argument?  Or maybe what you really want is to just receive an authenticated user object in your 'user' parameter, and a cart object in your 'cart' parameter, instead of first getting a session, just so you can get to the user and cart.

In other words, what if you made your application goals more explicit?

Now currently, getting access to such application-specific objects requires either painfully-verbose boilerplate off of a raw WSGI environment, or an increasingly tight coupling to an increasingly monolithic framework that does more of the work for you.

But, with the Lite binding protocol, you can now represent anything that's tied to "the current request", just by creating a callable object that takes an environment parameter.

Which means you don't really need "request" objects any more in your main code, because you can simply arrange to be called with whatever objects you need, to do the thing you're actually doing.

And so your application code stops being about manipulating "web stuff", to focus more on whatever it is that your app actually does...  while still being just a WSGI app from the point of view of its caller.

(This by the way, is part of why I dubbed the concept "WSGI Lite", despite the fact that it adds new protocols to WSGI: it effectively lets you take most of the "WSGI" out of "WSGI applications".)

The Great "Apps vs. Controllers" Debate

Now, if you look at how non-WSGI-centric, "full-stack" frameworks (like Django, TurboGears, etc.) operate, they often have things they call "controllers": functions with more specialized signatures for doing this "more app, less web" kind of stuff.  However, these frameworks tend to end up being very un-WSGI internally, because plain WSGI doesn't handle this sort of thing very well.

However, with the WSGI Lite binding protocol, you can write controllers with whatever signature you like, while remaining "WSGI all the way down".  Anything you want as an argument, you can just create a binding rule for, which can be as simple as a string (to pull out an environ key), a short function that computes a value, or a tiny classmethod that returns an object wrapping the environ.

And, if it's a callable (like a function or a method), it too can use the binding protocol, and ask for its arguments to be calculated from the request.

And that means that you can take, say, a generic binding rule that fetches a parsed form, and use it to write an application-specific binding rule that looks up something in a database.

At which point, you can now write a controller that uses that binding rule to get something it needs as an argument.
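As a hedged sketch of that composition (with made-up names and a fake in-memory "database" standing in for a real one):

```python
from urllib.parse import parse_qs

# Generic rule: any callable taking the environ can serve as a binding.
def parsed_form(environ):
    return parse_qs(environ.get('QUERY_STRING', ''))

# Fake lookup table standing in for a real database.
TODO_DB = {'42': {'id': '42', 'title': 'Write the PEP'}}

# App-specific rule, built on top of the generic one.
def requested_todo(environ):
    form = parsed_form(environ)
    todo_id = form.get('todo', [''])[0]
    return TODO_DB.get(todo_id)
```

A controller could then ask for a `todo` argument bound to `requested_todo`, and never touch the form parsing itself.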

Where All This Is Going

Now, if you look at where all this is going, you'll see that you're going to end up with a very small application body: just the code that actually does things with the information that came in, and decides what to send back out.

Something, in fact, that looks very much like a "controller" would in a non-WSGI, full-stack web framework...  yet isn't locked in to one particular full stack framework.

Now, I don't know how clear any of the above was without code examples.  (Probably not very.)  But the endgame that I'm trying to describe is a future in which both "full stack" and "WSGI-centric" frameworks use a common protocol to provide their features to applications.

And, more importantly, a future where full-stack features do not require learning a full stack framework.

And where every application is its own framework.

In effect, the binding protocol is a tool that allows every app to define its own embedded DSL: the set of high-level data objects and operations that it needs in order to do whatever it does.

And these high-level, application-specific objects and operations are composed of lower-level, domain-generic objects and operations (such as form parsers and validators, URL parameter extractors, session and cookie managers, etc.), obtained from libraries or frameworks.

And all of these objects are passed around via the environment and binding rules, while retaining WSGI Lite calling signatures...  making the entire thing "WSGI all the way down".

And yet, the code contained in those applications would not look like "WSGI" as we know it today.  For example:

@lite(
    user = myapp.authorized_user,
    output = myapp.format_chooser,
)
def todo_list(environ, user, output):
    return output(user.todo_items())

Or, perhaps the Python 3 version would look like this:

@lite
def todo_list(
        environ,
        user:   myapp.authorized_user,
        output: myapp.format_chooser
    ):
    return output(user.todo_items())

Neither of these looks anything like "WSGI" code as we know it today - it's more like a full-stack framework's code. But, where the bindings in a full-stack framework are implicit (like automatically formatting the output with a template or turning it into JSON), all of the bindings here are explicit.

And not only is explicit better than implicit, but...

Readability Counts!

You can see right away, for example, that this app is using some sort of chooser to render the output in some request-determined format, and you can track down the relevant code, without having to first learn all of the implicit knowledge of a particular framework's construction.

And, the point of this app function is immediately obvious - it displays a user's todo list. (Something that would otherwise be hidden under a pile of web I/O code, if this were written to plain WSGI or with a WSGI-centric library or framework.)

And what this means is, if this approach becomes a focal point for Python web development, then being a Python web programmer would not be a matter of being a "Django developer" or "TurboGears developer" or "Pyramid Developer" or any other sort of developer...

Other than a Python developer.

Because any Python developer could pick this up, without having to have all the implicit, framework-specific knowledge already in their head.

And hopefully, this will help get us to a situation where, instead of people saying, "you should use Python for your web app because framework X is great"...

People will say, "you should use Python for your web app because it lets you focus on what your application is really doing, and no matter what libraries you use, your code will be readable and maintainable, even by people who haven't used those libraries."

Or maybe just, "you should use Python for your web app because it's a great language for web development!"

Plumbing The Pipe Dream

Now, is all that just a pipe dream?

Maybe so. After all, there are still a lot of hurdles between here and there!

(For starters, I think that the actual binding protocol probably still needs some work!)

But if you want to make a "pipe" dream real, you've got to start with the requirements for the plumbing.

So right now, I'm collecting use cases from frameworks as I encounter them, to see what services the popular frameworks provide, and how they could be expressed as bindings.

But I'm also really interested in the problems that such frameworks have, in terms of how they currently communicate state, configuration, and other information to user code. Are there any open issues the binding protocol could solve now, or could solve with some additions?

Because that's what's really going to make the difference to adoption here. The authors of established libraries and frameworks aren't going to change things just because I said this is a neat idea!

But if we can make the protocol solve some existing problems -- like helping to get rid of thread-local objects, for example -- then folks have another reason to get on board with a common protocol, besides it being a common protocol.

So, that's the interesting question that lies ahead:

Do you have any warts in your current app, library, or framework that this might help you solve? Or a feature you think it could help you add?

Leave me a comment here, or drop me an email via the Web-SIG!

Monday, August 01, 2011

Is WSGI Lite a Library or a Protocol? (And Why You Should Care)

In retrospect, my article yesterday about WSGI Lite made a rather glaring mistake: instead of carefully laying out the background rationale and explaining where WSGI Lite fits in to today's Python world, I threw a bunch of links at people and went "Whee!  It's neat!"

So, in hindsight, I should've expected reactions like "huh?"  "wha?" and "don't we already have WebOb and Werkzeug?"

My bad, guys.  I totally failed to highlight the really crucial point about WSGI Lite, and that is the distinction between "wsgi_lite" (the proof-of-concept/future reference library) and "WSGI Lite" (the PEPpable protocol).

See, in my mind, "wsgi_lite" the library is no more a competitor to WebOb and Werkzeug than the standard library's "wsgiref" package is a competitor to mod_wsgi: just because it has a server in there, doesn't mean it competes with servers!

I think it's a pretty safe bet to say that most WSGI (protocol) code does not use wsgiref (library), except maybe indirectly via something else.  And the same thing may well end up being true of wsgi_lite (the library) and WSGI Lite (the protocol).

Yeah, it's a little confusing.  I get that now.  When I was first writing the code, I called it "WSGI 2", and the decorators were "@wsgi2"  and "@wsgi1", instead of "@lite" and "lighten()".  I was even having the decorators change the "wsgi.version" environment key from (1,0) to (2,0) and back.

However, as the work progressed, the versioning didn't make a lot of sense to me, because in a sense, the core bits of the protocol weren't changing.  Instead, there were a handful of small protocols that, put together, make a new way of doing WSGI.  So I ended up deciding to call it "WSGI Lite", and dropped the version fudging.

But if you look at what is happening with the actual underlying protocol, I really am proposing something like a WSGI 2 here, or probably more like a 1.1.  (Sort of.)  The key point is that it's a protocol that can work in today's WSGI stacks, without needing a massive rewrite effort.

Granted, this means that if you have some pet gripes with WSGI, then Lite may or may not be able to solve them.  A couple people have approached me privately about those issues, and I'd like to start hashing them out on the Web-SIG shortly.

But in the meantime, I'd like to take the rest of this article to lay out just what (and why) WSGI Lite, the protocol, is.  (As opposed to wsgi_lite, the proof-of-concept implementation of the protocol.)

Why A New Protocol?

Because WSGI rots your brain.

Or, to put it less dramatically, it is damn near impossible to write correct WSGI middleware because there are too darn many things to think about.

In the Reddit thread about Armin's article, one person posted a bunch of links to the various patches they had to do to a piece of WSGI code in order to make it work correctly with various corner cases in the protocol, as bugs cropped up in interaction with other WSGI code.

And I took one quick look at one of those patches, and saw that it still had bugs.

Granted, it was a resource-leak bug, but that's not the point.  It shouldn't be so frickin' easy to make that kind of mistake.  (And the author was not exactly a newbie to either WSGI or web programming.)

And as I started writing my proof-of-concept (for what I originally thought of as "WSGI 2" rather than "WSGI Lite"), I discovered all kinds of other mistakes that people could make in their middleware, that had never even occurred to me before.

Even Ian Bicking, author of WebOb, realized after reading the WSGI Lite docs that WebOb contained a latent bug I described there!

So, something has to be done.  WebOb and Werkzeug are great libraries, but if libraries could solve the problem, it would already be solved.  That's why wsgi_lite (the library), is really just a test bed for WSGI Lite, (the protocol).

And the aim of WSGI Lite is not to solve all WSGI 1 problems, nor even the entire subset of WSGI 1 problems that can be addressed in a reasonably performant, backwards-compatible way using a pair of decorators.

Rather, the aim is to eliminate certain key obstacles to solving those problems.

Protocols, WSGI, and Game Theory

Back when I first proposed the idea that became WSGI (late 2003), the goal of the Web-SIG was to define standard "request" and "response" objects for the standard library.

So my counter-proposal to instead define a protocol, and not actually put any code for the protocol into the standard library, may have seemed a bit loopy to some folks.  Perhaps a bit like, "let's solve this problem by not solving this problem!"

But the reason that I did it -- and the reason it ended up working so well that damn near every dynamic language ends up more-or-less cloning WSGI these days -- is because of game theory.

Essentially, there was never any serious chance that a bunch of web framework developers with investment in existing APIs were ever going to get together and agree on the One True Request and One True Response: there were just too many differences in fundamental approaches, and way too much opportunity for bikeshedding.

In game theory terms, you could say there was no Schelling Point.  As Wikipedia puts it:

Consider a simple example: two people unable to communicate with each other are each shown a panel of four squares and asked to select one; if and only if they both select the same one, they will each receive a prize. Three of the squares are blue and one is red. Assuming they each know nothing about the other player, but that they each do want to win the prize, then they will, reasonably, both choose the red square. Of course, the red square is not in a sense a better square; they could win by both choosing any square. 

In other words, in trying to design the One True Request and One True Response, there was no single obvious "square" to choose: everything was up for grabs, so nobody could win the "prize" (i.e., the benefits of having a One True anything in common).

So what I did with my WSGI proposal was deliberately create a Schelling Point: a single red square in a board full of blues.

And the way that I did it, was to specifically remove any semblance of an API that would make WSGI look like another blue square.

Voila: the Web-SIG was able to shift from discussions about what color to paint the bikeshed, to substantive discussions about the guts of HTTP and what requirements we had for interfacing with it.

Now, notice that I'm not saying that I came up with WSGI by myself and I was a genius.  What I'm saying is, I gave the Web-SIG something to collaborate on, instead of something to compete over.

Let me repeat that: something to collaborate on, instead of something to compete over.

I could not have written the WSGI PEP by myself: I didn't have nearly enough information.  But the Web-SIG, in collaboration mode, could.

So what does all this have to do with WSGI Lite?

Well, once again, the idea is to create a collaborative Schelling Point: a protocol, rather than an API.  Because, once again, no one can agree on The One True WSGI Wrapper, when all we have are competing implementations with distinct APIs.

Granted, I may have shot myself in the foot this time, by starting with a proof-of-concept library rather than a PEP explaining the protocols!

Unfortunately, due to the nature of the requirements, I couldn't be sure the protocols would work without prototyping an implementation first, and still can't be sure the protocols really work without some community testing.  (And the shape of the protocols themselves evolved considerably over the last three days of implementing, documenting, realizing something sucked, then fixing it and trying again!)

But what are these protocols exactly?  What do they do, and why are they important?

The First Protocol: Calling Convention

The WSGI Lite protocol consists of a few basic elements working together:

  • A revised calling convention and return protocol
  • A server API extension for resource closing
  • An "argument binding" protocol

The first of these things is something that's been proposed for a long time, and there seems to be fairly widespread consensus that a Rack-style calling convention is a good idea.  WebOb, for example, already has some APIs that work on that calling convention, and I've never heard anybody saying that calling convention was bad, or that the current WSGI convention is better.

(Actually, the closest thing I've seen to somebody saying that, would be in the Hacker News thread about yesterday's article: somebody thought that WSGI Lite forces async code to use greenlets.  But that's a mistake, because WSGI Lite only requires greenlets or threads for code that uses write().  WSGI Lite response bodies can still be produced just as asynchronously as a standard WSGI response body can.)
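In rough outline -- a sketch of the style being discussed, not the exact wsgi_lite spec -- the difference between the two conventions looks like this:

```python
# Classic WSGI: the app receives start_response and must call it.
def classic_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']

# Rack-style: the app takes just the environ and returns a triple.
def lite_app(environ):
    return '200 OK', [('Content-Type', 'text/plain')], [b'hello']

# Adapting the triple style back to classic WSGI is mechanical:
def adapt(app):
    def wsgi_app(environ, start_response):
        status, headers, body = app(environ)
        start_response(status, headers)
        return body
    return wsgi_app
```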

Anyway, so, the first protocol is well-known to WSGIans, and largely uncontroversial, hence the "uhh... don't we already have that?" reaction from some quarters.  What's been lacking is a co-ordinated way to move forward on that.

To put it another way, since that protocol lacks any "official" status or name, it's not really possible to use it as a Schelling Point of co-ordination between users and library authors.  Ian can't point to WebOb and say, "WebOb lets you use the [thingy] protocol", instead, he has to say, "WebOb is cool, you should use it."  Meanwhile, Armin is over there on the other side of the room, saying, "Werkzeug is cool, you should use it" too.

Meanwhile, the poor user is left in the middle of the room, scratching his or her head and going, "Uh, so what should I use now?", with respect to any "enhanced WSGI" APIs.

So, as far as this first sub-protocol is concerned, the ultimate point of WSGI Lite is going to be to nail down and "bless" a detailed and specific flavor of the calling protocol, to provide that co-ordination point for libraries to say what they offer to people, and for people to make choices about using them.

I seriously doubt that this is a very controversial proposal.  After all, many people have said they want this calling protocol, and some leading WSGIans (hm, that term even has the word "Ian" in it!) have actually implemented more or less that protocol in their libraries.

What's more, people have been asking me to do something about getting this protocol "out there", reflecting their subconscious realization that a Schelling Point is indeed needed to do this, and that I'm the most obvious "red square" for co-ordination where WSGI is concerned.

So be it.

That's why Armin's article finally pushed me to actually implement something...  and that's when I discovered the need for the other two sub-protocols in WSGI Lite.

The Second Protocol: Resource Management

See, as I was writing the decorators (called @wsgi2 and @wsgi1 at the time), I quickly began to notice that the "close()" part of WSGI was even more of a problem than I previously thought.

I won't go into detail here about the specific problems, or the protocol itself, as they're both laid out in the README file for the wsgi_lite library.  Suffice to say here that under plain WSGI 1, resource closing is fragile because any one piece of middleware can inadvertently break the close() chain.  This is likely more of a problem for WSGI code running on non-refcounting Pythons, but it can cause headaches even on CPython.

So, in order to solve that problem, I created a new resource closing protocol that allows applications to close multiple resources and to bypass broken WSGI 1 middleware.

This, I also expect to be a fairly uncontroversial protocol proposal.  The problem it addresses is not widely understood, nor is there a big popular push for it, but it's an annoying little problem that can bite you in the butt and make debugging difficult, especially on "alternative implementations" of Python.

However, as I began trying to use this new protocol, and writing the early documentation for it, I discovered even more problems with WSGI!

Specifically, I noticed that it was damn hard to document my new closing protocol in such a way that it could actually be used correctly without having to learn even more arbitrary rules about what to call when and where to fetch it from.

Indeed, I ended up with something that looked just as hard to get right, as WSGI middleware was in the first place!

And when I looked at it more closely, I saw two things that were going on.

The first was that most people don't realize that when you pass a WSGI environment to a WSGI app, it's not yours any more.  The application is allowed to clear it, put junk in it, or whatever.  So you absolutely cannot use that environment dictionary once you pass it on.

And this put the closing protocol in a bit of a bind, because the closing protocol needed to be called late in an app or piece of middleware, but retrieved early.

So, if you wrote the natural thing, the obvious thing, and pulled the closing key out of the environment at the point nearest where you were going to use it, then your code would have a latent bug in it.
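Here's an illustration of the trap.  The extension key ('x.close_hook') is hypothetical, invented for this sketch; the point is only the ordering:

```python
# Hedged illustration: once you pass the environ downstream, its
# contents may have been clobbered, so any extension key you need
# later must be captured *before* the call.

def middleware(app):
    def wrapper(environ, start_response):
        # RIGHT: fetch the (hypothetical) extension key early...
        closer = environ.get('x.close_hook')
        result = app(environ, start_response)
        # ...because by now the downstream app may have cleared the
        # environ, so looking up 'x.close_hook' here would be a
        # latent bug -- the "natural" code that fails in the field.
        if closer is not None:
            closer()
        return result
    return wrapper
```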

And that's just evil.

This is the point at which I realized just how much brain rot the bare WSGI protocol has in it: there are lots of little things like this that will bite you in the butt, punishing you for doing the simple, obvious, straightforward thing.

And so that's when I realized that I needed...

The Third Protocol: Argument/Extension Binding

See, the new resource closing protocol I came up with is not the only WSGI environment extension out there -- there are lots of others.  But they share a few potential issues in common:

  1. Being pulled out of the environ at a point where they're no longer valid,
  2. Having to write boilerplate to check for their existence, and fall back to something else, and
  3. Mutually-incompatible decorators provided by libraries to fix problems 1 and 2!

That is, even if a library provides decorator support for its particular WSGI extension, you generally can't use more than one of them at a time.

And so, the argument/extension binding protocol fixes this by providing an argument-level decoration protocol, to replace function-level decoration as the way to solve problems 1 and 2 in existing libraries.

The idea here is that instead of trying to use a session decorator from library 1 and an authentication decorator from library 2, you can just use a single decorator with two keyword arguments.

This idea evolved gradually, as I first wrote a "@with_closing" decorator specifically to address the resource closing issue, and then noticed what having lots of decorators like that would lead to.  (And, sure enough, existing WSGI library wrappers have mutually-incompatible decorators for these purposes.)

Anyway, the argument binding protocol is basically a way to map keyword arguments to things that are derived in some way from the request environment.  It could be a parsed form, a session, an authenticated user, a cart...  you name it, you can have it.

In other words, the idea is once again to have a Schelling Point where libraries can be used collaboratively, instead of having to compete for users.  It also makes it easier for individual users to write one-offs for their particular application.  Writing a WSGI Lite argument binding is a few lines of code over top of whatever kind of request-based object(s) you have in your application, and you can then use them anywhere.

Or...  and this is the bigger point: you can then split out your nifty cart or session or whatever, and make it available to other users as a library, without needing to know dip about decorators.

And, again, it's a co-ordination point, because you can say, "Here's my new session library - and it supports WSGI Lite argument binding."  The binding protocol becomes something that libraries have in common, allowing users to focus on functionality instead of screwing around with which color bikeshed the decorator is.

Now, if you're not clear on the technical bits of what I'm on about here, the argument binding protocol is explained on the wsgi_lite homepage.  The basic idea, though, is that you can call @lite(keyword1=binding_rule1, keyword2=binding_rule2...), and bind your function's keyword arguments to objects like sessions, requests, carts, users, and arbitrary WSGI extensions.  The binding rules can be strings, callables, or sequences of the above, and the first rule that yields a result from the environment gets passed in to your function as a keyword argument.  And if no rule for that keyword yields a result, the keyword doesn't get passed to your function.

So, this allows you to use normal Python function argument defaults to fall back on if you don't get the object you're looking for, and it allows you to get a standard Python error when one of your arguments goes missing: you don't need to write code to check for the argument and raise your own error when it's missing.
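Here is a minimal, illustrative implementation of those rule semantics -- emphatically not the real wsgi_lite code, just the behavior described above: strings pull environ keys, callables compute values, sequences are tried in order, and keywords with no result are simply not passed, so normal Python defaults apply.

```python
# Toy model of WSGI Lite binding rules, for illustration only.

def apply_rule(rule, environ):
    if isinstance(rule, str):
        return environ.get(rule)        # string: an environ key
    if callable(rule):
        return rule(environ)            # callable: computed value
    for sub in rule:                    # sequence: first rule that
        value = apply_rule(sub, environ)  # yields a result wins
        if value is not None:
            return value
    return None

def lite(**rules):
    def decorate(func):
        def app(environ, *args):
            kw = {}
            for name, rule in rules.items():
                value = apply_rule(rule, environ)
                if value is not None:
                    kw[name] = value    # no result: keyword omitted
            return func(environ, *args, **kw)
        return app
    return decorate

@lite(user='REMOTE_USER', ua=['HTTP_USER_AGENT', lambda e: 'unknown'])
def view(environ, user='anon', ua=None):
    return user, ua
```

When no rule yields a result (as with `user` in an empty environ), the function's own default kicks in, with no boilerplate checks needed.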

Under Python 3, it might be that the decorator could just use argument annotations to do the same thing (instead of duplicating argument names in the decorator) but I haven't tried that yet.

The point, though, is that by defining a binding protocol, you can use it in lots of different ways.  Given the protocol I've specified, you could go out there right now and write yourself a Python 3 decorator that looks for binding rules in argument annotations, and applies them according to the rules of the binding protocol.  And users of your decorator would immediately be able to use anybody else's session, request, cart, or whatever other WSGI Lite argument bindings were out there, in their Python 3 argument annotations.

Likewise, you can, right now, write yourself a binding for your session, request, cart, or whatever objects, and be assured that people will be able to use them with any decorator (or other tool) that uses "WSGI Lite binding rules".

Even in tools that haven't been thought of yet, let alone implemented.

And that's the power of a protocol, versus a mere library.

Now, all in all, the argument binding sub-protocol is perhaps the most potentially-controversial part of the WSGI Lite protocol suite.  It's totally new, and as far as I know, unprecedented in the WSGI world.  And if you're the developer of a heavyweight WSGI library or framework, it might not seem very important to you.

However, the point of it isn't to re-solve a problem you've already solved for your own library, or to replace your API.  Rather, it's a way to allow people to make smaller libraries, by 1) shrinking the unit of reuse to the argument, rather than the decorator, and 2) lowering the entry barrier for a library to be written, by removing the need for a big API or a complex decorator.

And of course, it also lets you add binding rules on top of your existing big library, to offer users an enticement or "gateway drug" to using the rest of your library.  You can, in effect, begin advertising your library as a catalog of bindings, rather than trying to get people to drink all of your library's Kool-Aid at once.

So, Where Do We Go From Here?

Well, at this point, the protocols are out there, but they don't have any "official" standing, except for my attempt at declaring them "red squares".  That is, they're potential points of co-ordination, and they have my backing as a potential "way forward" for the next-generation of WSGI.

But this doesn't mean they won't change between now and any real "official" status (like a PEP).

My original effort at WSGI -- originally called "WCI" -- was not very much like WSGI at all.  The fundamental idea in WCI and WSGI was the same, sure, but the final implementation was very different.

And the same thing might happen with WSGI Lite, too.

Indeed, I've already gotten emails from a couple of big WSGIans about potential changes to WSGI Lite to fix other problems...  and so some things may well happen there.

Mostly, though, what I want to do with WSGI Lite is create protocols that allow lightweight, collaborative solutions to those problems.

For example, rather than trying to fix all of the warts in "wsgi.input" in the core Lite protocol, I'd rather see some proposals for bindings that people can use to fix those problems.

Instead of us trying, yet again, to create the One True Input Object!

Now, is that really possible with wsgi.input, or any of the other warts that people would like to see fixed in the "next generation" of WSGI?

I don't know.

But I think it's worth a shot at finding out.  And if there are some clear wins to be had by tweaking the three Lite sub-protocols, or adding some others to the mix, I'm all ears.

These are things that need to be hashed out a bit before the protocols are PEPpable, and yes, perhaps a bit of API bikeshedding may be needed as well.

And you know what?

I'm kind of looking forward to it.

See you on the Web-SIG!

Sunday, July 31, 2011

WSGI Is Dead: Long Live WSGI Lite!

Almost a decade ago, back when I first proposed the idea of WSGI to the Web-SIG, I had a rather idealistic vision of how WSGI could be a kind of "framework dissolver".  I envisioned a future in which everything was pluggable, and there would no longer be any reason to have monolithic application frameworks, because everything could be done with libraries, middleware, and decorators.

Alas, that idealistic future didn't come to pass.  In fact, as far back as 2007, I had already noticed it wasn't working, and proposed an idea for a WSGI 2 protocol that would resolve the problems...  and then proceeded to do nothing for the next few years.  (Well, I've been doing other things, like working on setuptools, Chandler, and my own business.  I just wasn't working on web apps or WSGI!)

Anyway, last week, Armin Ronacher wrote a great article on his blog called WSGI and the Pluggable Pipe Dream, about this very topic.  If you haven't read it, I urge you to do so, as it provides in-depth coverage of many of WSGI's dark corners and design decisions that are not widely understood by people who weren't involved in the original design, or who haven't spent a lot of time working with it.

But I was a little disappointed with the end of the article, because Armin's build-up led me to believe he had a solution to the problems of dealing with crud like start_response, write, close, and all that in WSGI middleware.  But really, his claim ended up being that even if somebody invented something better than WSGI, there would be no way to replace it, because of all the investment in the existing protocol.

So, I decided to do something about that.

CHALLENGE ACCEPTED!

Introducing WSGI Lite, WSGI's new younger brother.

WSGI Lite is a protocol that's basically the same thing as the "WSGI 2" calling convention I proposed four years ago, and pretty much the same as what other languages' versions of WSGI use.  There's no start_response, close, write, or exc_info to mess with, and I even threw in a massively improved way to manage post-request resource release and cleanup operations.

Now, if WSGI Lite were just a WSGI alternative, Armin's article would be right: nobody would use it, because it'd be in competition with WSGI, and we'd have to basically "Shut...  Down...  Everything"  in order to replace it.

But the WSGI Lite protocol is actually backwards compatible with WSGI.  You can write code to the WSGI Lite API, and transparently interoperate with existing WSGI servers, apps, and middleware.

Which means, you don't have to replace anything; you can just start using it, wherever it's appropriate or useful to do so.

All it takes is two decorators: one to declare an app as being a "lite" app, and one to allow you to call standard WSGI apps using the "lite" calling protocol.  (And, as a special bonus, the decorator you use for new code can also automatically bind environment keys, session/request objects, or other cool things to your app or middleware's keyword arguments.  It's très chic.)
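The real decorators live in the wsgi_lite package; but as a rough illustration of what the "lite" calling convention buys you, here's a sketch in which a lite app takes just the environ and returns a (status, headers, body) triple, plus a hand-rolled adapter back to standard WSGI.  The adapter name and details here are my own, not wsgi_lite's:

```python
def lite_to_wsgi(lite_app):
    """Wrap a 'lite'-style app (environ -> (status, headers, body))
    as a standard WSGI callable -- an illustration, not wsgi_lite itself."""
    def wsgi_app(environ, start_response):
        status, headers, body = lite_app(environ)
        start_response(status, headers)
        return body
    return wsgi_app

def hello(environ):
    # No start_response, write, or exc_info anywhere in sight:
    return "200 OK", [("Content-Type", "text/plain")], [b"Hello, world!"]

app = lite_to_wsgi(hello)

# Exercise it with a stub start_response:
calls = []
body = app({}, lambda status, headers: calls.append((status, headers)))
print(calls)  # [('200 OK', [('Content-Type', 'text/plain')])]
print(body)   # [b'Hello, world!']
```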

I'm hoping that this will revitalize the "pluggable pipe dream", and make it a little less dream, a little more pipe.

So try it out, and let me know what you think.

Update: on reflection, the above article is woefully inadequate to explain the actual rationale of the WSGI Lite protocol or its implementation, so I've written a follow-up piece to cover that.  Check it out!

Saturday, November 06, 2010

The simplest(?) way to do tree-based queries in SQL

The other day I was looking around at "tree" model libraries for Django, and I noticed that there are an awful lot of algorithms people use to manage hierarchies in SQL out there, but the one that is simplest (at least in my view) seems to not be very widely known at all.

There's the obvious way of simply giving records a "pointer" (foreign key) referring to their parent.  The upside is, it's easy and it's normalized: the data means exactly what it says and there's no duplication.  The downside is, you can't really retrieve an entire tree from the database, without doing lots of SQL queries.  Likewise if you need to be able to find all the children of X at any moment, or be able to check within another query whether any of X's ancestors are Y.  It really doesn't work for that sort of thing at all.

That's when people start coming up with all sorts of tricks, like storing a path string and using prefix searches (with BETWEEN or LIKE), or storing a beginning and end number (aka the "nested sets" approach), or having a fixed maximum hierarchy depth and representing the materialized path using individual fields (used in Drupal's menu system).

The advantage to these systems is that they don't take a lot of space, but the disadvantage is that nearly all the maintenance has to be done in the application -- often involving quite a few SQL queries!  So, in essence, these approaches trade lots of queries at retrieval time, for lots of queries at update and insert/delete time, and often some additional programming complexity.

But I don't think I've ever seen an article on my preferred way of managing hierarchies in SQL, and that's using what I call a closure table.

Closure Tables

A closure table gives you the ability to do all the same sorts of "find me all children of X, to depth N" queries as any of the other methods, just by doing a join against the closure table.

But the killer feature of the closure table, is that to add or remove a parent-child relationship, you only need to run *one* SQL query -- a query so simple that even the least-powerful SQL databases can be configured to run it as a trigger on the main table!

Let's take a look at how this works.  A closure table is simply a table that maintains the "transitive closure" of the parent-child relationships in the base table.  So, let's say you're modelling a directory structure, and you have a "directory" table, with a foreign key "parent_dir" pointing to each row's parent directory.

With this structure, you can only query direct (depth 1) relationships, but by adding a "closure" table with fields for "parent", "child", and "depth", you can represent the hierarchy to whatever depth is present.  So, if directory C is a child of directory B, and directory B is a child of A, then the base table would look like this:

id | parent_dir | name
 1 |          0 | A
 2 |          1 | B
 3 |          2 | C

And the closure table would look like this:

parent | child | depth
     1 |     1 |     0
     2 |     2 |     0
     3 |     3 |     0
     1 |     2 |     1
     2 |     3 |     1
     1 |     3 |     2

In other words, the closure table says, "A is a child of itself at depth 0", "B is a child of A at depth 1", and "C is a child of A at depth 2".  The total number of records in the closure table is equal to the number of records in the base table, times one more than the average depth of the tree(s) in the base table.  (That is, each base table row has a row in the closure table for itself, plus one row for each of its ancestors.)

Inserting Links

Now wait a minute, you may say.  This looks way more complicated and tricky than a materialized path or even the nested sets thingy.  I mean, how would you even begin to write the code to maintain such a thing?

Well, to make PARENT_ITEM a parent of CHILD_ITEM, the code looks like this:

insert into closure(parent, child, depth)
select p.parent, c.child, p.depth+c.depth+1
  from closure p, closure c
 where p.child=PARENT_ITEM and c.parent=CHILD_ITEM

In other words, it's something your average SQL database can do without breaking a sweat. I mean, it's like baby SQL, for crying out loud. Okay, so maybe deliberately doing a Cartesian product join isn't something you do every day, but every database can still do it.

And it's not even a particularly performance-intensive query! It basically just says, "make every parent of PARENT_ITEM (implicitly including itself) a parent of every child of CHILD_ITEM (again including itself), at the appropriate depth."

The cost of doing this operation is directly proportional to the size of the subtree under CHILD_ITEM, so if you're adding a leaf node to the tree, this will simply insert parent entries for the new leaf, and nothing else. If you're adding a new parent to an existing tree, on the other hand, then it will insert a new row for every node already in the subtree being "adopted".
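Here's that insert flow in runnable form, sketched with Python's sqlite3 (the table, columns, and ids follow the A/B/C example above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE closure (parent INT, child INT, depth INT)")

# Seed the (self, self, 0) rows for A=1, B=2, C=3 (a "flat" forest):
cur.executemany("INSERT INTO closure VALUES (?, ?, 0)", [(1, 1), (2, 2), (3, 3)])

def add_link(parent, child):
    # Every ancestor of `parent` (including itself) becomes an ancestor of
    # every descendant of `child` (including itself), at the sum of their
    # depths plus one -- the single insert query shown above.
    cur.execute("""
        INSERT INTO closure (parent, child, depth)
        SELECT p.parent, c.child, p.depth + c.depth + 1
          FROM closure p, closure c
         WHERE p.child = ? AND c.parent = ?""", (parent, child))

add_link(2, 3)  # make B the parent of C
add_link(1, 2)  # make A the parent of B

rows = sorted(cur.execute("SELECT * FROM closure WHERE depth > 0"))
print(rows)  # [(1, 2, 1), (1, 3, 2), (2, 3, 1)]

# And the payoff -- "all descendants of A (id=1)" in one trivial query:
desc = sorted(cur.execute(
    "SELECT child, depth FROM closure WHERE parent = 1 AND depth > 0"))
print(desc)  # [(2, 1), (3, 2)]
```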

Of course, in other "materialized path" strategies, you're still paying the same kinds of proportionate maintenance costs to update a deep tree, except that you're doing string manipulation or resetting a bunch of fields -- usually in code that then makes a bunch of SQL queries... and gods help you if your tree is actually too big to fit in memory.

(Even the nested sets model has similar update costs: you can certainly delegate most of the bulk operations to the database, but if you have a frequently-updated, very large tree, the nested sets approach will end up shuffling position pointers a lot more frequently than the closure table will end up with new rows.)

So how does this query actually work? And how do you delete or update relationships?

How This Works

Well, the real secret to making the closure table work is the rows with depth 0. Without that little innovation, you'd actually have to do four separate inserts, or a giant union query (yuck!) to add everything you need for a new link.

Let's start out with our tables in a "flat" structure, with nobody being a parent of anybody else, but with the (self,self,0) entries already in the closure table:

id | parent_dir | name
 1 |          0 | A
 2 |          0 | B
 3 |          0 | C

parent | child | depth
     1 |     1 |     0
     2 |     2 |     0
     3 |     3 |     0

Now, if we change the base table to make B into C's parent:

id | parent_dir | name
 1 |          0 | A
 2 |          0 | B
 3 |          2 | C

And then run the SELECT part of the link-insertion query with a parent of 2 and a child of 3, we get:

p.parent | c.child | p.depth+c.depth+1
       2 |       3 |                 1

Resulting in a new closure table entry for the added link:

parent | child | depth
     1 |     1 |     0
     2 |     2 |     0
     3 |     3 |     0
     2 |     3 |     1

Okay, that seems easy enough. But what if we now make B a child of A?

id | parent_dir | name
 1 |          0 | A
 2 |          1 | B
 3 |          2 | C

Now our SELECT (using PARENT_ITEM=1 and CHILD_ITEM=2) returns:

p.parent | c.child | p.depth+c.depth+1
       1 |       2 |                 1
       1 |       3 |                 2

And our closure table now looks like this:

parent | child | depth
     1 |     1 |     0
     2 |     2 |     0
     3 |     3 |     0
     1 |     2 |     1
     2 |     3 |     1
     1 |     3 |     2

Voila! Because the closure table contains correct information for every part of the tree that already exists, our query can make use of that information to fill in the rest of the new information that should exist.

Removing Links

Removing a link works similarly, using a query like this:

delete link
  from closure p, closure link, closure c
 where p.parent = link.parent and c.child = link.child
   and p.child=PARENT_ITEM    and c.parent=CHILD_ITEM

This is the exact same query, just flipped so as to delete the rows instead of inserting them.
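One caveat for testing it at home: the multi-table DELETE above is MySQL-flavored syntax.  In SQLite, the same operation can be phrased with a rowid subquery, as in this runnable sketch (names again follow the A/B/C example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE closure (parent INT, child INT, depth INT)")

# The full closure for A(1) -> B(2) -> C(3):
cur.executemany("INSERT INTO closure VALUES (?, ?, ?)",
                [(1, 1, 0), (2, 2, 0), (3, 3, 0),
                 (1, 2, 1), (2, 3, 1), (1, 3, 2)])

def remove_link(parent, child):
    # Delete every (ancestor-of-parent, descendant-of-child) row --
    # the same join as the insert query, flipped to a delete, and
    # phrased with a rowid subquery for SQLite's benefit.
    cur.execute("""
        DELETE FROM closure WHERE rowid IN (
            SELECT link.rowid
              FROM closure p, closure link, closure c
             WHERE p.parent = link.parent AND c.child = link.child
               AND p.child = ? AND c.parent = ?)""", (parent, child))

remove_link(1, 2)  # detach B (and its subtree) from A

rows = sorted(cur.execute("SELECT * FROM closure WHERE depth > 0"))
print(rows)  # [(2, 3, 1)]
```

Notice that the (self, self, 0) rows survive, so the detached subtree is immediately ready to be re-linked elsewhere.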

Now, imagine if you hooked these two queries up to triggers: you could simply do whatever operations you want on the base table, and let the database do the rest!

Of course, you'd need a couple of extra actions when a row was inserted or deleted in the base table: at insert time, you need to insert the (self,self,0) entry in the closure table (followed by an optional link addition if the new row was inserted with a parent), and at deletion time, you need to delete both the self-link, and all the links that depend on it:

delete link
  from closure p, closure link, closure c, closure to_delete
 where p.parent = link.parent      and c.child = link.child
   and p.child  = to_delete.parent and c.parent= to_delete.child
   and (to_delete.parent=DELETE_ITEM or to_delete.child=DELETE_ITEM)
   and to_delete.depth<2 

Essentially, this is just a fancy way of doing the earlier link-removal operation for each of the target item's neighboring links -- i.e., ones with depth 0 or 1. (The links with a greater depth are taken care of by the first few lines of the query.)

So now, you have just a little bit of logic to do in three triggers for your base table, to get this set up, and presto! No additional code needed. Just work on the base table normally, and run any hierarchical queries through the closure table. (And make sure you've got unique indexes on (parent, depth, child) and (child, parent, depth), so those queries can really fly.)
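As a minimal, runnable sketch of the insert-side trigger, here's the idea in SQLite via Python's sqlite3.  (Since a newly inserted row has no descendants yet, the trigger only needs to link the new row to its ancestors; the delete-side triggers would apply the link-removal queries shown earlier.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE directory (id INTEGER PRIMARY KEY, parent_dir INT, name TEXT);
CREATE TABLE closure (parent INT, child INT, depth INT);

-- Maintain the closure automatically whenever a directory is inserted:
CREATE TRIGGER directory_ai AFTER INSERT ON directory
BEGIN
    INSERT INTO closure (parent, child, depth)
    VALUES (NEW.id, NEW.id, 0);             -- the (self, self, 0) row
    INSERT INTO closure (parent, child, depth)
    SELECT p.parent, NEW.id, p.depth + 1    -- link to every ancestor
      FROM closure p WHERE p.child = NEW.parent_dir;
END;
""")

# Just work on the base table normally...
con.execute("INSERT INTO directory VALUES (1, NULL, 'A')")
con.execute("INSERT INTO directory VALUES (2, 1, 'B')")
con.execute("INSERT INTO directory VALUES (3, 2, 'C')")

rows = sorted(con.execute("SELECT * FROM closure"))
print(rows)
# [(1, 1, 0), (1, 2, 1), (1, 3, 2), (2, 2, 0), (2, 3, 1), (3, 3, 0)]
```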

So Why Aren't You Doing This Already?

Sure, I know, not every database supports triggers... or do they? Sure, maybe if you're on MySQL.... and it's still the 1990's! SQLite has triggers these days, for crying out loud.

Your database library doesn't support them? Okay, fine. Write the equivalent of triggers in whatever your library does support -- like a post-save signal in Django's case -- and enjoy. (You can probably even set up your closure table as a many-to-many "through" table in Django's case, though I haven't actually tried that yet.)

Really, there are only a few reasons not to use a closure table when hierarchical queries are required, and most of them boil down to, "I have a small tree (i.e., human-readable in its entirety) and I need to display the whole thing in hierarchical order using ORDER BY".

And even in the application I first used closure tables for (which had tens of thousands of items in various parent-child relationships), there were also some smaller hierarchies that could've used a materialized path or nested sets approach, and if I were doing that application over, I'd have probably used them.

But, if you have a huge set of hierarchical relationships (a large "forest" of large trees), and need to be able to query over the transitive closure (i.e., do "recursive" parent/child lookups from SQL), IMO a closure table is the only way to go. (And if you're in an "enterprisey" environment where lots of programs touch the data, using triggers or stored procedures to maintain that closure table is a must!)

Tuesday, August 10, 2010

Simplifying prioritized methods in PEAK-Rules

Recently, I've been scouting around the web for examples of what people have been doing with PEAK-Rules (and the older RuleDispatch package) to get an idea of what else I should put in (if anything) before making the first official release.

One of the interesting things I found was a package called prioritized_methods, which advertises itself as a "prioritized" version of PEAK-Rules, and appears to have been used in the ToscaWidgets project at one point.

Prioritized methods certainly seem like a useful idea, but I was a bit bothered by the specific implementation, because it showed just how weak PEAK-Rules' extensibility documentation is at this point.

Really, it shouldn't be that hard to implement manual method priorities in PEAK-Rules.  I mean, prioritized_methods is like 150 lines plus docstrings, it has to define several new method types and decorators to replace those in PEAK-Rules, and if you want to use it with a new method type of your own, you're already screwed by potential incompatibilities.

In short, I clearly wasn't exposing a good enough API or providing good enough examples.  ;-)

So, there had to be a better way, and in fact I immediately thought of one that ought to be doable in a dozen lines of code or so, that would make a perfect demo for PEAK-Rules' predicates module documentation:

from peak.rules import implies, when
from peak.rules.criteria import Test
from peak.rules.predicates import Const, expressionSignature

class priority(int):
    """A simple priority"""

# A higher priority "implies" (i.e., is more specific than) a lower one:
when(implies, (priority, priority))(lambda p1, p2: p1 > p2)

# When a constant priority appears in a rule expression, turn it into a
# criterion, so it participates in the implication ordering above:
@when(expressionSignature,
      "isinstance(expr, Const) and "
      "isinstance(expr.value, priority)")
def test_for_priority(expr):
    return Test(None, expr.value)

What this code does is create an integer subclass called priority, that can then be used in rule definitions, e.g.:

@when(some_func, "isinstance(foo, Bar) and priority(1)")

Then, between two otherwise-identical rules, the one with a priority will win over the one without, or the higher priority will win if both have priorities.

All you have to do to use it, is import the priority type into the modules where you want to use it. No new decorators or special method types are needed, and it will continue to work with any new method types added to PEAK-Rules or defined by third parties!

Pretty neat, huh?

There was just one downside to it, and that's that it didn't work. :-(

As it happens, PEAK-Rules' predicate dispatch engine barfed on using None as a test expression (in the Test(None, expr.value) part), and I had to tweak a few lines to make it skip over indexing and code generation for tests on None. But, once that was done, I was able to add a tested version of the above as a doctest demo.

Anyway, if you're doing anything interesting with PEAK-Rules, or find yourself needing to extend it in some way, I'd love to hear from you. Right now, it's pretty easy for me to add cool features like the one above, but I'm guessing that there are still some gaps in the current documentation for anybody else trying to implement nifty new features like the above.

So, I'm especially interested in any problems you had doing extensions, as well as success stories. (I'd really like to start firming up the extension APIs soon, as well as their docs!)

Wednesday, August 04, 2010

I Am A Complete Idiot

This is a total, absolute facepalm.  I cannot believe I have been this stupid.

Today on Reddit, I saw a complaint about infrequent setuptools releases.  And I was like, "well, why don't you just install from SVN?"

And then I thought to myself, you know, I really should release that thing more often, so that people can get the changes without needing to use SVN.  I mean, what if they don't have SVN installed, especially on Windows?  (Which the person posting the complaint was using.)

So, I start thinking to myself, "well, how often could I release an update?  Could I do it every month or so?"

And that's when I have the total facepalm.

I mean, couldn't I just release snapshot versions, like say, oh, I don't know... nightly?

You know, just like I've been doing for all these other projects?

For, like, the last five years?

Duh.

I have now set up a snapshot script, and edited setuptools' PyPI entry to point to the snapshots directory.  So, as of now "easy_install -U setuptools" will update you to the latest snapshot of the 0.6 line, until I switch over to offering 0.7 alphas as snapshots.

That is all.

Thursday, February 04, 2010

Don't use CGIHandler on Google App Engine

Update (March 12th): Guido says this problem only happens in the AppEngine SDK, and doesn't happen in production.  "In production, wsgiref.handlers is imported in an earlier stage,  before the request-specific environment is set. So the scope of the problem is much smaller."  Yay!

It seems that an early documentation example recommended that people use wsgiref.handlers.CGIHandler to run WSGI apps on Google App Engine, instead of the correctly-functioning google.appengine.ext.webapp.util.run_wsgi_app() function.

If you are doing this in your application or your web framework, you have a potentially-exploitable security hole and you should fix it at once.  (Or maybe not; see Guido's comment.)

The specific problem is that one of CGIHandler's base classes caches a copy of os.environ, for non-CGI use cases, and this makes it possible for certain CGI variables to "leak" from the request that started the process, into every subsequent request.

Of course, CGIHandler was never intended to be capable of handling long running processes like GAE, because CGI is not a long-running process.  The idea is that if you have a new kind of long-running process, you subclass BaseCGIHandler for your specific use case.

See, in a "traditional" long-running web app protocol (like FastCGI), process startup is distinct from request handling.  Even if a FastCGI app is started because there's a request ready for processing, there is still a separation between application initialization and the actual request processing.  (And wsgiref tries to cache the "startup" os.environ, separate from the "request" os.environ.)

App Engine, however, jams these two phases together, such that the "main" script is being re-run for each request, so there's no distinction between "startup" and "request".  This makes things convenient for people used to a CGI environment, but brings up problems for the CGIHandler, which expects that it will only be used once per process invocation, and so inherits a cached version of os.environ that also contains request content.

The fix is straightforward: switch from using wsgiref.handlers.CGIHandler to google.appengine.ext.webapp.util.run_wsgi_app().

However, if for some reason you can't do that, a quick monkeypatch fix is to add this line:

CGIHandler.os_environ = {}

somewhere in your code before the first use of CGIHandler.

It is possible that Google has already implemented the patch I provided them to fix this; but the bug opened for App Engine is still open, more than two months later, and some of the documentation is still recommending CGIHandler.  I don't know whether that means it's fixed and the docs are okay, or that it's unfixed and they're still recommending people use it.

Either way, though, recommending CGIHandler for use in the GAE environment was never a good idea, since GAE is not really CGI.  If it ain't CGI, don't use CGIHandler.  Subclass BaseCGIHandler instead, and make a GAEHandler or AWSHandler or whatever, and take advantage of the branding opportunity provided thereby.  ;-)

 

Tuesday, October 13, 2009

A Clarification Or Two

It seems that one aspect of the response to my announcement yesterday was one I didn't anticipate: people assuming that my statement about uninstalling Distribute was some sort of snark or an attempt at competition, power plays, etc.

But since the announcement was made to the Distutils-SIG mailing list, I assumed anyone reading it would already know that having both Distribute and setuptools on sys.path would in most cases cause you to still be using Distribute, and not setuptools.

And just as obviously, it would mean you were testing Distribute, rather than setuptools, thereby invalidating the usefulness of any test results.

As for the comment about bugs being fixed differently, I wanted to make the point that testing Distribute does not equal testing 0.6c10; people should not assume that 0.6c10 has already seen as wide usage as the Distribute code has.  I just wrote it this weekend, for heaven's sake.   (Conversely, if someone is concerned about some of the bugs that were on the setuptools tracker, they deserve to know that not all the patches on the tracker -- and used in Distribute -- were correct or complete, in my estimation.)

Was there a teeny bit of annoyance in there as well?  Might the tone of that paragraph have been a little off?  Perhaps.  I edited it several times, trying to minimize any show-through of annoyance, and keep it 100% neutral/factual, but I can certainly believe that a bit of it came through anyway.  I am annoyed, after all.

I'm annoyed that I had to prepare this release.  I'm annoyed that people rant about me not doing anything, and then the same people turn right back around and rant when I do do something.  I'm annoyed that setuptools has been widely blamed for yet another problem that it didn't actually create.  (Two, actually: the 2.6.3 problem itself, and then the subsequent brouhaha on Python-Dev.)

Sure, I'm annoyed about lots of things.

That doesn't mean I want anybody to uninstall Distribute, for any other reason than that they'd rather use setuptools.

After all, I'm making a new release of setuptools primarily so people have the option of not being forced to use Distribute -- and so that Python-Dev isn't forced to make a new Python release just for the benefit of setuptools users.

So it's certainly not my intention to force anyone to use setuptools.  I'm not even trying to persuade anyone to use it in place of Distribute, for heaven's sake.  (Hell, if you're interested in Python 3 support, Distribute is the only game in town right now. Oh, and I even suggested that Guido put Distribute in the stdlib...  and despite the smiley, I wasn't kidding.)

In short, not being forced to do one thing, is not the same as being forced or persuaded to do the opposite.  Capisce?

Okay, that's all, you can now return to your regularly scheduled blaming and flaming.  Pay no attention to the man in the corner, trying to do something useful.

P.S.  There are people I'd trust with maintainership (or at least committership) of setuptools who are working on Distribute.  It's the public nastiness of certain parties that torpedoed the negotiations on that topic back in July, despite the oft-repeated claims by some that I wouldn't turn over the reins to anyone.  See, for example, this attempt to open discussions on that line, or the last paragraph of this, where I expressed excitement at the idea of having 0.6 get cleaned up by someone else.  (OTOH, if I'd realized it would only take me a weekend rather than a couple of weeks to clean up the backlog, I'd have done it six months ago!  So, that bit of delay is my fault.)

Also, for anyone who thinks that my announcement of this release was completely out of the blue, please see the "new setuptools release" thread on Python-Dev, in particular, the post where I said I was planning to make a new release this week, by this Monday, in order to address the outstanding issue with 2.6.3.

P.P.S. Let's keep it constructive in the comments, shall we?  Comments that show no sign of their author's having read and grokked this entire post (and the items linked to above) will be summarily deleted, no matter how otherwise thoughtful the comment might be.

Frankly, anyone who can read all the links I just gave, and still think I'm deliberately trying to put one over on anybody, spring unannounced surprises, or hijack Distribute in some way, or that I'm unwilling to hand over significant chunks of setuptools responsibility, well...  let's just say that hypothetical person is not being very charitable, and leave it at that.  Here's hoping you're better than that.

Monday, October 12, 2009

That wasn't so difficult, was it?

So, a few weeks ago, Python 2.6.3 was released with a change to the distutils that broke backward build compatibility for a few libraries, like pywin32, setuptools, and every other package using setuptools to build C extensions in a package.

After wading through huge piles of Python-Dev and Distutils-SIG emails in my inbox every day for a week or so, it became clear to me that it'd be a lot more efficient to just make a new release of setuptools than to try to correct all the myths and misapprehensions, or even just the deliberate mischaracterizations, misappropriations, and outright bald-faced lies.  (As the old saying goes, a lie goes halfway 'round the world while the truth is still putting its shoes on!)

Indeed, spending the weekend bringing setuptools up to date was downright fun by comparison, and the results are now in SVN, ready for you to try them out.  Not only did I implement fixes for virtually all the outstanding bugs in the setuptools tracker, but I also tackled a few that aren't in there but personally annoyed me, like Vista's UAC warnings when you try to run easy_install.

Anyway, I'll be cutting an official release (0.6c10) in a few days, so please be ready if your project infrastructure relies on fetching the latest setuptools version from PyPI.  I'm actually pretty excited about this release, because it represents the finish-out of all the 0.6 maintenance cruft that I wanted to do before working on new features in 0.7a1, and starting on the improved package manager that I've been babbling about for years.

On the other hand, I'm less than excited to have people believing some of the rubbish that they do, and it's even less pleasant to imagine that it's spreading.

I do understand that some people are angry about the long wait between setuptools releases, and that's their right.

It's also their right to do something about it, and for the brief time when they were taking the high road -- i.e., 1) being courteous, and 2) not engaging in a public FUD campaign -- I tried to help and participate, and even offered to release the fork as an official version.  Heck, I was even excited about the possibility.

But apparently, my positive attitude was too "suspicious" for some people to accept, and others took my search for a qualified maintainer very personally.  And now, various people are claiming the fork is "official" or "blessed", when it never was...  that it's backward-compatible with setuptools, when it's not...  and that it fixes various bugs...  that in fact it doesn't.

And it seems that the collective anger in some quarters has reached such a fever pitch that anything positive I do is still considered "out of the blue" and suspicious.

It would be nice if I could say I was 100% above the no-good-deed-goes-unpunished philosophy of the programming herd.  Certainly, this sort of rampant negativity -- and the corresponding negativity in me that it tends to trigger -- is a big reason I wanted out of professional software development in the first place.  While there are a lot of cool people in the business, a large number of others are...  shall we say, rather unpleasant?

I had hoped to just move forward and pick things up where I left them last year, when I started taking time off to do some serious work on non-programming projects -- and I had no idea when I started it'd be a whole year.

But it seems that I've underestimated just how upset some formerly-(relatively)-happy users of setuptools are with my (relative) disappearance.  And so it's understandable that showing back up with nothing more than a, "Hey, long time no see, here's an update I'd like you to test" might be...  less than satisfying.

So, for all of you long-suffering setuptools users, I'm sorry.  I should've communicated better about my status and plans, and I would have if I'd had any idea what they were myself.

It didn't help that I fell prey to a bit of an entitlement attitude, after so many distutils-sig flamewars over all sorts of minutiae that tended to make me think of anyone proposing changes to setuptools as being an idiot by default.  Among other things, I forgot that plenty of other people being wrong doesn't necessarily make me right.

But you do need to understand: I'm not working on setuptools to get back into anyone's good graces but my own.

You see, over the last few weeks there've been a lot of emails on the distutils-sig and Python-Dev from people who are upset at having to leave setuptools.

And I want them to be able to have an actual choice about whether they start using another package or not.  Some have said that it's not fair game for the very people promoting the use of another package to force them to do so by changing Python.

And I agree, which is why I've done what I've done.

Now, I realize that this isn't going to do much for the people who're already mad as hell and don't want to take it any more.  But when it comes right down to it, being mad is your choice, not mine.

In the same way, I could choose to be forever angry about what seem like numerous unfair, unjust, inexcusable slights dealt out to me.  And I could keep lashing out in anger at the people I think are responsible for my pain.

But that won't make them stop, and it won't make anything better.  At some point, you have to decide what it is that you want instead, and move towards that, instead of trying to "solve" an existing "problem".

So, that's what I'm starting now.

See, for the last several months I've actually been walking on eggshells, trying to not say anything that anyone might be offended by, or announce any plans that anyone might accuse of being "suspicious".

Now, though, I realize that my silence and bending over backwards was actually adding to the problem.  It gave some people the idea I agreed with things I didn't agree with.  It didn't communicate anything to the people who needed to know what was going on with me.  And it even discouraged me from actually doing anything, because I wasn't sharing the excitement and ideas that motivate me.

I kept thinking that there'd be a "right" time, that if I just lay low long enough, I could go accomplish some things and come back and say, "ta da!" and make everything better.

Well, you can see how well that worked.

So, new plan starts now.

Stay tuned.

Saturday, September 06, 2008

Python Gets Out...

Python seems to keep turning up in the most unusual places.  Today I went to the library and borrowed a couple of books on graphic design to assist in making some layout decisions for the book I'm working on.  One was a book I'd read before, Editing By Design, which I'd used to help with the design of my earlier book, "You, Version 2.0".  The other was a book called (appropriately enough) "The Layout Book".  I was skimming through it, when I came across a page with this near the bottom (I've elided a few items from the middle):

"Simple is better than complicated.
Quiet is better than loud.
Unobtrusive is better than exciting.
Small is better than large.
...
The obvious is better than that which must be sought.
Few elements are better than many.
A system is better than single elements."

The block of text was a quote, attributed to one Dieter Rams.  "Wow," I thought, "I wonder if Tim Peters's Zen of Python was a play off of this..."

Then I turn the page.

At the very top of a collection of "methodologies", I see:

"Python philosophy

Derived from computer programming, the main points of the Python approach were presented by developer Tim Peters in The Zen of Python.   Key points include: beautiful is better than ugly, explicit is better than implicit..."

Small world, eh?

--PJ

P.S. I'm still amused by the mentions of Python in Charles Stross's science fiction novels, especially the one where the future hero is described as doing his game programming work in Python 3000, almost as if it were some highly-futuristic language.  ;-)

P.P.S. In case you hadn't guessed, the reason I'm not doing more programming (or blogging about programming) right now is because I'm working on the book...  in which, incidentally, I'm attempting to take a truly algorithmic approach -- not to mention a highly test-driven one -- to such diverse matters as motivation, belief, creativity, time management, and even optimism.