>>> ''.join(word[:3].lower() for word in 'David Isaac Glick'.split())
'davisagli'

David Glick – Plone developer

by admin posted Apr 05, 2010 11:48 PM

Deep thoughts from #plone IRC

by David Glick posted May 07, 2011 01:50 PM

kojiro: "end users" is another funny term. I mean, don't we all use our ends? 
kojiro: the end users justify the mean users.
Wyn: kojiro, I completly agree, I have seriously seen people on here who really do not know what a terminal is
kojiro: Wyn: it's the end of something
kojiro: so you can't be an end user unless you have a terminal

The making of zodb.ws

by David Glick posted Apr 03, 2011 09:55 PM

An explanation of the pile of hacks we used to get ZODB running in the browser.

On April 1st Matthew Wilkes and I announced the launch of ZODB Webscale Edition, which runs the ZODB in a Javascript-based Python interpreter backed by HTML localstorage. It was of course in honor of April Fools' Day, as the entire concept of running the ZODB in a browser is a bit silly, but the technology actually works. Here's how we did it.

The concept

Matthew first approached me about a month ago with the intent to pull off something epic for April Fools' Day this year. His goal, he explained, was to "make something that nutcases would find useful but everyone else knows is stupid." We discussed various ideas, such as supporting ymacs as a richtext editor in Plone, before I remembered I had seen a way to run Python in the browser. We quickly ruled out running all of Zope2 in the browser as too big a project, but Matthew suggested doing just the ZODB, and I realized that backing it with HTML localstorage could make for a fun, reasonably scoped, buzzword-compliant demo. The idea was born.

The Emscripten Python interpreter

The hardest part of the problem—getting a Python interpreter implemented in Javascript—was already basically solved by the Emscripten project. Their Python interpreter was generated by compiling CPython to LLVM bytecode using clang, then using their tools to translate that into Javascript. The result is a 2.8MB closure-compiled "python.js" which includes the logic of CPython as well as implementations of basic C library calls like sprintf and malloc in terms of operations on a heap which consists of a Javascript array. We didn't have time to get the whole Emscripten toolchain set up and working so that we could build a non-packed python.js, but we did need to understand the basics of this Python interpreter, so we used the Google Closure Compiler to unpack the whitespace of python.js so it was at least semi-readable.

Unfortunately this Python interpreter had a major limitation: it had no implementation of importing modules, so you were limited to things like "sys" which are included statically in CPython. Obviously that wouldn't be enough to get the ZODB running.

The import system

I wanted to allow dynamically importing as many things as possible, rather than simply bundling the ZODB code with the interpreter in some fashion. So it seemed like a good approach would be to write a small WSGI import server, and then make the interpreter fetch imports via AJAX in some way. But how, exactly?

I knew that importing in Python calls the __import__ builtin, so I could monkeypatch __builtins__.__import__ to make it somehow fetch the module being imported by name, then manually construct a module using imp.new_module() and exec the fetched code in the new module's namespace. However, this would be Python code running within the sandbox of the Javascript-based interpreter, without the ability to make Javascript calls like fetching via AJAX. So how could we do the actual "fetch the code" step?

We could have used the CPython API (as translated into Javascript) to create, from Javascript, a new "builtin" module with a function for loading a module's code via AJAX. But neither of us had worked with the Python C API much, let alone with its closure-compiled Javascript variant, and this seemed like too big a task. So we hit on a simpler hack: we wrote a Javascript function to do the fetching and then "hijacked" the raw_input builtin by replacing the interpreter's reference to it (we picked raw_input because the Emscripten Python interpreter didn't implement it anyway).

The result is a glorious mixture of CPython API (we had to use a bit after all to unpack the argument with the module name and to pack the string with the returned source code) and jQuery:

function raw_input(self, args) { 
    // stack management
    var b = a;
    a += 4;
    for(var d = b;d < a;d++) {
        i[d] = j[d] = 0
    }
    i[b] = 0;
    // unpack argument
    Module._PyArg_UnpackTuple(args, $ba, 0, 1, u([b, 0, 0, 0], 0, o));
    var name = ma(Module._PyString_AsString(Module._PyObject_Str(i[b]))); 

    // fetch via *synchronous* XMLHTTPRequest
    output('Importing ' + name + '...', 'status');
    var source = '';
    jQuery.ajax({
        url: 'lib/' + name,
        error: function(xhr, status, code) {},
        success: function(result) {
            source = result;
        },
        async: false,
        dataType: 'text',
        cache: true
    });

    // return the source as a pointer into the Python heap
    var h = Module.Pointer_make(Module.intArrayFromString(source))
    a = b;
    return Module._PyString_FromString(h);
}
// hijack the raw_input builtin
n[RMb] = Module._builtin_raw_input = raw_input;
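
The server side of that 'lib/' + name lookup isn't shown in this post. As a rough idea of what a small WSGI import server could look like, here is a minimal sketch in present-day Python 3. The make_import_app name, the directory layout, and the "empty body on failure" convention are all assumptions made for illustration (the last one chosen because the import hook treats an empty source string as a failed fetch); none of it is the project's actual server code.

```python
import os


def make_import_app(lib_dir):
    """Return a WSGI app serving module source for URLs like /lib/ZODB.DB.

    Hypothetical helper: the name and layout are illustrative only.
    """
    lib_dir = os.path.abspath(lib_dir)

    def app(environ, start_response):
        # The browser asks for 'lib/' + dotted module name.
        name = environ.get('PATH_INFO', '').rsplit('/', 1)[-1]
        parts = name.split('.')
        # Accept only plain dotted identifiers, so a request cannot
        # reach outside lib_dir with '..' or path separators.
        if not all(part.isidentifier() for part in parts):
            start_response('404 Not Found', [('Content-Type', 'text/plain')])
            return [b'']
        base = os.path.join(lib_dir, *parts)
        # A dotted name maps to a module file or a package __init__.py.
        for path in (base + '.py', os.path.join(base, '__init__.py')):
            if os.path.isfile(path):
                with open(path, 'rb') as source_file:
                    body = source_file.read()
                start_response('200 OK', [
                    ('Content-Type', 'text/plain'),
                    ('Content-Length', str(len(body))),
                ])
                return [body]
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'']

    return app
```

To try it locally, wsgiref.simple_server.make_server('', 8000, make_import_app('lib')).serve_forever() from the standard library will serve a lib directory on port 8000.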

The __import__ hook could then be implemented in terms of the new raw_input builtin:

import sys, imp
_known_bad = set()
def __import__(name, globals={}, locals={}, fromlist=[], level=-1):
    if name in _known_bad:
        raise ImportError('Could not fetch module %s from server.' % name)
    if name in sys.modules:
        return sys.modules[name]

    # call our hook (we hijack raw_input below)
    source = raw_input(name)
    if not source:
        _known_bad.add(name)
        raise ImportError('Could not fetch module %s from server.' % name)

    m = imp.new_module(name)
    m.__file__ = name
    sys.modules[name] = m
    if '.' in name:
        parent, basename = name.rsplit('.', 1)
        if parent in sys.modules:
            setattr(sys.modules[parent], basename, m)
    exec source in m.__dict__
    return m
__builtins__.__import__ = __import__

This Python code is included inline in the HTML, then found and executed during initialization of the interpreter. It is a bit buggy in its handling of packages, but worked well enough to let us move on to the more interesting aspects of the project.
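
For comparison, here is a sketch of the same idea in Python 3, where the imp module and the exec statement are gone. It is not the code that ran on zodb.ws. The fetch_source callable is a hypothetical stand-in for the hijacked raw_input, and make_import_hook is an invented name. The sketch imports parent packages before their children and, like the built-in __import__, returns the top-level package when fromlist is empty. Relative imports (level > 0) are ignored entirely.

```python
import sys
import types


def make_import_hook(fetch_source, modules=sys.modules):
    """Build an __import__ replacement that loads source via fetch_source.

    fetch_source(name) is a hypothetical callable returning module
    source text, or '' when the module cannot be fetched.
    """
    known_bad = set()

    def load(name):
        if name in modules:
            return modules[name]
        if name in known_bad:
            raise ImportError('Could not fetch module %s from server.' % name)
        source = fetch_source(name)
        if not source:
            known_bad.add(name)
            raise ImportError('Could not fetch module %s from server.' % name)
        module = types.ModuleType(name)
        module.__file__ = name
        # Register before executing so circular imports can find it.
        modules[name] = module
        try:
            exec(source, module.__dict__)
        except BaseException:
            # Don't leave a half-initialised module behind.
            del modules[name]
            raise
        return module

    def import_hook(name, globals=None, locals=None, fromlist=(), level=0):
        # 'a.b.c' loads 'a', then 'a.b', then 'a.b.c', attaching each
        # child to its parent as an attribute.
        parts = name.split('.')
        module = None
        for i in range(1, len(parts) + 1):
            child = load('.'.join(parts[:i]))
            if module is not None:
                setattr(module, parts[i - 1], child)
            module = child
        # Return the submodule only when names are pulled from it.
        return module if fromlist else modules[parts[0]]

    return import_hook
```

Installing it would be a matter of assigning the returned function to builtins.__import__, at which point every import of a module not already in sys.modules goes through fetch_source.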

Making the ZODB work

So we could import things. "import this" worked great. The ZODB? Not so much. You see, we soon found out that Emscripten's Python interpreter is really quite minimalistic in its builtin modules. "os" is not included, as sandboxed Javascript can't access the local filesystem, so anything like "logging" which depends on it was a problem. Things like "threading", "re", and "time" were similarly missing. Even more problematic was the omission of the following modules which are used in pickling (sort of the core function of the ZODB): cPickle, marshal, and struct.

So we started hacking up our copies of the ZODB and transaction packages. We removed all the logging. We took out the threading locks, with the justification that Javascript is single-threaded anyway. time.time() got replaced with a simple incrementing counter. Et cetera. As for cPickle and its dependencies, we borrowed the pure Python implementations from PyPy. We also needed Tres Seaver's branch of ZODB to provide a pure Python implementation of the 'persistent' module. It took a couple of evenings, but without too much effort we were eventually able to instantiate a DemoStorage, instantiate a DB, connect to it, and commit transactions on the root object. Major win!
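
The lock and clock replacements described above can be pictured with a couple of tiny stand-ins. This is an illustrative Python 3 sketch, not the patches that were actually applied; DummyLock and TickClock are invented names. The one property the clock keeps is that every call returns a larger value than the last, which matters because the ZODB derives transaction ids from timestamps.

```python
class DummyLock(object):
    """No-op stand-in for threading.Lock in a single-threaded runtime."""

    def acquire(self, blocking=True):
        # There is never another thread to wait for.
        return True

    def release(self):
        pass

    def __enter__(self):
        return True

    def __exit__(self, exc_type, exc_value, traceback):
        # Returning False lets exceptions propagate normally.
        return False


class TickClock(object):
    """Stand-in for the time module: a counter instead of a wall clock."""

    def __init__(self):
        self._ticks = 0

    def time(self):
        # Strictly increasing, but unrelated to the real time of day.
        self._ticks += 1
        return float(self._ticks)
```

A patched module would then create clock = TickClock() once and call clock.time() wherever it used to call time.time().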

The HTML localstorage backend

But it still wasn't a great demo. We wanted it to be possible to commit a transaction, then come back after leaving the page and be able to access the data that had been committed. And we wanted the persistence to happen in browser localstorage on the client side, rather than by passing values to the server. So we needed to find a way to modify or replace DemoStorage to pass its values to Javascript to be placed in localstorage, and to retrieve them again when the page is loaded.

After the CPython API hackery needed to get the imports working, I was a bit scared about doing a lot of passing values from Python to Javascript and back. So at this point I thought, "Wait. We have the entire Python interpreter runtime state in these Javascript arrays; why don't I just save and restore the whole interpreter?" Ultimately this led me down a rabbit hole to nowhere. I never quite figured out the correct bootstrapping process to re-initialize all the necessary Javascript variables on subsequent loads while replacing the Python heap, stack, etc. with the old state. And I was bumping up against the 5MB limit for what can be placed in localstorage.

Fortunately Matthew came along at this point with a different approach. He wrote a very simple ZODB storage class, the HTML5Storage (code), which stores pickles of modified objects and writes them to a (Python) global dict, keyed by object id, when a transaction is committed. And instead of messing around with the CPython API to interface Python with Javascript, he simply made a commit print out the JSON representation of that global dict, with a special identifier that the Javascript implementation of print() was modified to watch for and handle specially by parsing the JSON and stuffing it in localstorage. When the page is loaded, the stored values are passed back to Python by converting the localstorage contents to JSON, executing it as Python, and placing the values back in the Python global dict. (There is a bit of extra hackery to encode backslashes in the Python repr of the pickles, handled by the very Britishly named dodgy_encode function.)
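
The print-based bridge can be simulated entirely in Python to show the shape of the trick. Everything here is a sketch under assumptions: the MARKER value is made up (the real identifier isn't given in this post), a plain dict plays the role of browser localstorage, object ids are simplified to short strings, and base64 is swapped in for the repr-based dodgy_encode because it is the simplest way to get pickle bytes through JSON in Python 3.

```python
import base64
import json

# Hypothetical marker; the identifier the site actually used is not shown.
MARKER = '@@STORAGE@@'


def encode_commit(records):
    """Python side: turn {oid: pickle bytes} into one marked output line."""
    payload = {
        oid: base64.b64encode(data).decode('ascii')
        for oid, data in records.items()
    }
    return MARKER + json.dumps(payload, sort_keys=True)


def watch_output(line, local_storage):
    """Stand-in for the patched Javascript print().

    Returns True if the line was a commit and has been stored, or
    False if it is ordinary output that should be displayed.
    """
    if not line.startswith(MARKER):
        return False
    for oid, text in json.loads(line[len(MARKER):]).items():
        local_storage[oid] = text
    return True


def load_records(local_storage):
    """Page-load side: rebuild the {oid: pickle bytes} dict."""
    return {oid: base64.b64decode(text) for oid, text in local_storage.items()}
```

The appeal of the design is that the only channel between the two languages is text that the interpreter was already able to emit, so no CPython API calls are needed on the commit path.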

At this point, it should have worked. But there was one more hurdle.

Debugging the pickles

When we tried to reload the root object from the HTML5Storage, we were getting unexpected errors. I compared the pickles that had been generated in the Emscripten interpreter with those generated for a similar object on a real Python interpreter, and noticed that they were not the same. I used the pickletools.dis() function from the stdlib to examine the pickle bytecode, and figured out that the size of some strings in the pickle was getting recorded incorrectly, so the pickles were not being executed correctly during unpickling. Specifically, the size of some strings was getting recorded as \x02 regardless of the actual length of the string.
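
For anyone who hasn't used pickletools, this kind of comparison can be sketched as follows in Python 3. The two helper names are invented for the example. disassemble() captures the pickletools.dis() listing as text so two pickles can be diffed, and string_opcodes() lists each string-carrying opcode with the length of its argument, which is the part that was going wrong here.

```python
import io
import pickle
import pickletools


def disassemble(data):
    """Return the pickletools.dis() listing as a string, ready for diffing."""
    out = io.StringIO()
    pickletools.dis(data, out=out)
    return out.getvalue()


def string_opcodes(data):
    """Return (opcode name, argument length) for each string-like opcode."""
    found = []
    for opcode, arg, _position in pickletools.genops(data):
        if isinstance(arg, (str, bytes)):
            found.append((opcode.name, len(arg)))
    return found


if __name__ == '__main__':
    sample = pickle.dumps({'title': 'hello world'}, protocol=2)
    print(disassemble(sample))
    print(string_opcodes(sample))
```

Running both helpers over a known-good pickle and a suspect one, then comparing the results side by side, makes a mismatched length stand out quickly.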

I tracked the bug down to the repr() of the pickle that Matthew was doing in his dodgy_encode function. A variety of different bytes were all getting repr'd as \x02. I then traced this into the PyString_Repr implementation of the Javascriptified Python interpreter, to where it calls sprintf with a format of "\\x%02x". It turns out that Emscripten's implementation of sprintf was incomplete and supported neither the "02" used immediately after % to give a zero-padding width, nor the "x" used to specify printing a hex value. It was interpreting the "02" literally, and the x was getting dropped. Once I figured out where this issue was arising, it was a relatively simple matter to adjust the sprintf implementation to handle this format correctly.
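
Python's % operator follows the same printf conventions, so the intended behavior of that format is easy to show. The escape_byte_buggy function below is only an imitation, based on the description above, of what the incomplete sprintf ended up producing; it is not Emscripten's code.

```python
def escape_byte(value):
    # '%02x' means: hexadecimal, at least two digits, zero-padded.
    return '\\x%02x' % value


def escape_byte_buggy(value):
    # Imitation of the incomplete sprintf: the width '02' is copied out
    # literally and the 'x' conversion is dropped, so the byte's value
    # never reaches the output.
    return '\\x' + '02'


if __name__ == '__main__':
    for byte in (0x02, 0x0a, 0xff):
        print(byte, escape_byte(byte), escape_byte_buggy(byte))
```

With the correct format every byte gets its own two-digit escape, while the imitation collapses them all to the same text, which is how unrelated length bytes in the pickle ended up indistinguishable.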

And there was pickling, and there was unpickling: the first day. And Matthew and David looked on the hackery and saw that it was good.

The launch

Fortunately we still had a day left to put a bit of polish on the thing before April 1. Ryan Foster was kind enough to whip up a nice web-2.0-product-style design on short notice. (Kudos to Ryan; I basically said "we want something like evernote.com" and he magically figured out exactly the sort of color scheme and logo I had in mind.) We added a bit of varnish in front and added some social media bling to the footer. In a flash of inspiration, I dubbed the site a project of "POSKey Enterprises," a reference to the POSKeyError one gets when trying to load an object from the ZODB that does not exist. We revamped the input/output to be much nicer and closer to a real Python interpreter.

zodb.ws launched to much fanfare on the morning of April 1. Like, I mean, we got literally dozens of pageviews. (Okay, actually a few hundred. And we realize the thing is a bit esoteric.) :) Some people assumed the thing was just a light frontend to a server-side interpreter, so people were more impressed once we explained that everything was actually executing client-side. Thanks to everyone else who retweeted the link for helping spread the word a bit beyond the tiny Zope circles!

Ultimately, I think the project satisfied our goals; after all, hacking is about the journey, not the destination.


Inspect your ZODB with Eye

by David Glick posted Mar 21, 2011 10:50 AM

Eye is a utility for browsing the contents of a ZODB.

A fairly common complaint about the ZODB is that there's no generic tool for browsing its contents. In fact this is a bit of a lie, as there are at least three existing tools called "zodbbrowser," but they all depend on large parts of the Zope stack and are therefore a bit hard to install. So at the PyCon sprints I worked on adapting Roberto Allende's zope2.zodbbrowser into a Pyramid-based tool called Eye.

The result is easy to install and looks like it will be fairly useful for seeing all the objects present in a ZODB (not just the ones that the ZMI or some other app-level tool chooses to show). As an added bonus, it knows how to browse "broken" objects, so you don't have to have your application code in Eye's PYTHONPATH.

(Blue items are persistent objects; black ones are included in the ZODB only by virtue of being referenced by persistent objects, and do not get their own pickle.)

Eye can also be used to take a peek at any old set of Python objects that are not in a ZODB.

See the PyPI page for installation and usage instructions, or clone the project on github and send me pull requests. :)

David Glick

I am a problem solver trying to make websites easier to build.

Currently I do this in my spare time as a member of the Plone core team, and during the day as an independent web developer specializing in Plone and custom Python web applications.