>>> ''.join(word[:3].lower() for word in 'David Isaac Glick'.split())

'davisagli'

David Glick – Plone developer

by admin posted Apr 05, 2010 11:48 PM

Backporting a topic branch with git

by David Glick posted Nov 26, 2011 11:16 AM

As a maintainer of Plone and Dexterity, I frequently find myself with the need to merge a pull request not only to the master branch, but also to a maintenance branch used for bugfix releases to older versions of the software.

When the pull request involves a single commit, it's pretty straightforward: I merge the pull request to master through GitHub's UI, or via the command line with git merge. Then I check out the maintenance branch and use "git cherry-pick" to apply the changeset from that commit to the older branch.

But this quickly gets annoying if the branch I'm trying to merge involved multiple commits, and I have to cherry-pick each one in turn. So here's a different approach I used yesterday to backport a changeset from zedr (Rigel di Scala) to add an Italian translation to plone.dexterity.

His pull request was against the master branch, so I first merged that using the GitHub UI.

Then I needed to apply the same change to the 1.x branch of plone.dexterity.

In my local copy of the plone.dexterity repository, I added zedr's fork as a remote, and fetched it.

$ git remote add zedr https://github.com/zedr/plone.dexterity.git
$ git fetch zedr
From https://github.com/zedr/plone.dexterity
 * [new branch]      1.x        -> zedr/1.x
 * [new branch]      davisagli-extend-fsschema -> zedr/davisagli-extend-fsschema
 * [new branch]      jbaumann-locking -> zedr/jbaumann-locking
 * [new branch]      jmehring-drafts -> zedr/jmehring-drafts
 * [new branch]      master     -> zedr/master
 * [new branch]      toutpt-unicode -> zedr/toutpt-unicode

Next I created a new branch called "zedr-merge" specifically for carrying out the merge, based on zedr's forked master branch which contained the change. I needed a temporary branch for this in order to carry out the rebase in the next step.

$ git co -b zedr-merge zedr/master
Branch zedr-merge set up to track remote branch master from zedr.
Switched to a new branch 'zedr-merge'

Now for the fun part. I used rebase to modify the zedr-merge branch's history so that it contains the commits from the 1.x branch, followed by only the existing commits from the zedr-merge branch that I wanted.

$ git rebase -i --onto 1.x a36f40743d67da8e6d5c7b0aee81e786a2de9f5e

Let's break this down. I found the hash a36f40743d67da8e6d5c7b0aee81e786a2de9f5e using git log; it identifies the last commit prior to the changes I was trying to backport. The rebase command first stores all commits on zedr-merge after that commit, up to and including HEAD, in a temporary location. Next it resets the history and state of the zedr-merge branch to be equivalent to that of the 1.x branch (because of the "--onto 1.x"). Finally it reapplies the changes that were stored in the temporary location. The end result is that we have exactly the history we want (that is, the 1.x history plus the relevant portion of the zedr/master history), but it is on the zedr-merge branch rather than on 1.x where we want it to end up. We'll deal with that in a moment.

Notice one more thing about the rebase command. I used the -i flag, which means interactive rebase. This means that I'll be prompted in an editor with a list of commits, and can choose to "pick" (include), remove, or "squash" (merge into the prior commit) each commit. In my case, since zedr had a number of commits making small changes to his translation file that were really all part of the same change at the macro level (adding the Italian translation file), I squashed them all together so that I ended up with a single commit at the HEAD of the zedr-merge branch which accomplished the same changes as all of the commits zedr had made on his master branch.

pick 8562549 Added Italian translation
squash d035596 Correct timestamps
squash 9fa9904 Removed template header
squash 4a97700 cancel does not really mean that; fixed (thanks gborelli)
squash aef40f1 messags now coherent with the standard Italian translations found elsewhere
squash 621e185 Updated changelog

# Rebase a36f407..621e185 onto 9408d8d
#
# Commands:
#  p, pick = use commit
#  r, reword = use commit, but edit the commit message
#  e, edit = use commit, but stop for amending
#  s, squash = use commit, but meld into previous commit
#  f, fixup = like "squash", but discard this commit's log message
#  x, exec = run command (the rest of the line) using shell
#
# If you remove a line here THAT COMMIT WILL BE LOST.
# However, if you remove everything, the rebase will be aborted.

At this point, since I had a single commit that I cared about on zedr-merge, it was a simple matter to use git cherry-pick to apply it to the 1.x branch instead.

$ git co 1.x
$ git cherry-pick 6dec9429d7994f02e332e22f8009a687ee3944c0

And now git log shows the change all together nicely in one commit:

$ git log 1.x
commit 6dec9429d7994f02e332e22f8009a687ee3944c0
Author: zedr <zedr@zedr.com>
Date:   Mon Nov 21 12:29:32 2011 +0100

 Added Italian translation
 
 Correct timestamps
 
 Removed template header
 
 cancel does not really mean that; fixed (thanks gborelli)
 
 messags now coherent with the standard Italian translations found elsewhere
 
 Updated changelog

This is really the story of my first realization of the power of git rebase. But one word of warning: you should not rebase commits if they have been shared (e.g. by pushing to GitHub) and others may have based further work on them. It can lead to duplicate commits in the history if the derivative branch is later merged as well. In my case this is not a concern, though, since the maintenance branch will never be merged back to master.

I really don't know if this is the best method for my use case, but it at least had the end effect I was going for in this case. I'm still not quite sure what the simplest approach would be if I wanted to end up with all the commits from zedr/master separately in the history of the 1.x branch, instead of squashing them. I'd be interested to hear if other people are using different approaches.


Deep thoughts from #plone IRC

by David Glick posted May 07, 2011 01:50 PM

kojiro: "end users" is another funny term. I mean, don't we all use our ends? 
kojiro: the end users justify the mean users.
Wyn: kojiro, I completly agree, I have seriously seen people on here who really do not know what a terminal is
kojiro: Wyn: it's the end of something
kojiro: so you can't be an end user unless you have a terminal

The making of zodb.ws

by David Glick posted Apr 03, 2011 09:55 PM

An explanation of the pile of hacks we used to get ZODB running in the browser.

On April 1st Matthew Wilkes and I announced the launch of ZODB Webscale Edition, which runs the ZODB in a Javascript-based Python interpreter backed by HTML localstorage. It was of course in honor of April Fool's day, as the entire concept of running the ZODB in a browser is a bit silly, but the technology actually works. Here's how we did it.

The concept

Matthew first approached me about a month ago with the intent to pull off something epic for April Fool's day this year. His goal, he explained, was to "make something that nutcases would find useful but everyone else knows is stupid." We discussed various ideas such as supporting ymacs as a richtext editor in Plone before I remembered I had seen a way to run Python in the browser. We quickly ruled out running all of Zope2 in the browser as too big a project, but Matthew suggested doing just the ZODB, and I realized that making it be backed by HTML localstorage could make for a fun, reasonably scoped, buzzword-compliant demo. The idea was born.

The Emscripten Python interpreter

The hardest part of the problem, getting a Python interpreter implemented in Javascript, was already basically solved by the Emscripten project. Their Python interpreter was generated by compiling CPython to LLVM bitcode using clang, then using their tools to translate that into Javascript. The result is a 2.8MB closure-compiled "python.js" which includes the logic of CPython as well as implementations of basic C library calls like sprintf and malloc in terms of operations on a heap which consists of a Javascript array. We didn't have time to get the whole Emscripten toolchain set up and working so that we could build a non-packed python.js, but we did need to understand the basics of this Python interpreter, so we used the Google Closure Compiler to pretty-print python.js so it was at least semi-readable.

Unfortunately this Python interpreter had a major limitation—it had no implementation of importing modules, so you were limited to things like "sys" which are included statically in CPython. Obviously this wouldn't work for getting the ZODB working.

The import system

I wanted to allow dynamically importing as many things as possible, rather than simply bundling the ZODB code with the interpreter in some fashion. So it seemed like a good approach would be to write a small WSGI import server, and then make the interpreter fetch imports via AJAX in some way. But how, exactly?

I knew that importing in Python calls the __import__ builtin, so I could monkeypatch __builtins__.__import__ to make it somehow fetch the module being imported by name, then manually construct a module using imp.new_module() and exec the fetched code in the new module's namespace. However, this would be Python code running within the sandbox of the Javascript-based interpreter, without the ability to make Javascript calls like fetching via AJAX. So how could we do the actual "fetch the code" step?

We could have used the CPython API (as translated into Javascript) to, from Javascript, create a new "builtin" module with a function for loading a module's code via AJAX. But, neither of us had worked with the Python C API much, let alone with its closure-compiled Javascript variant, and this seemed like too big of a task. So we hit on a simpler hack: we wrote a Javascript function to do the fetching and then "hijacked" the raw_input builtin by replacing the interpreter's reference to it (we picked raw_input because the Emscripten Python interpreter didn't implement it anyway).

The result is a glorious mixture of CPython API (we had to use a bit after all to unpack the argument with the module name and to pack the string with the returned source code) and jQuery:

function raw_input(self, args) { 
    // stack management
    var b = a;
    a += 4;
    for(var d = b;d < a;d++) {
        i[d] = j[d] = 0
    }
    i[b] = 0;
    // unpack argument
    Module._PyArg_UnpackTuple(args, $ba, 0, 1, u([b, 0, 0, 0], 0, o));
    var name = ma(Module._PyString_AsString(Module._PyObject_Str(i[b]))); 

    // fetch via *synchronous* XMLHttpRequest
    output('Importing ' + name + '...', 'status');
    var source = '';
    jQuery.ajax({
        url: 'lib/' + name,
        error: function(xhr, status, code) {},
        success: function(result) {
            source = result;
        },
        async: false,
        dataType: 'text',
        cache: true
    });

    // return the source as a pointer into the Python heap
    var h = Module.Pointer_make(Module.intArrayFromString(source))
    a = b;
    return Module._PyString_FromString(h);
}
// hijack the raw_input builtin
n[RMb] = Module._builtin_raw_input = raw_input;

The __import__ hook could then be implemented in terms of the new raw_input builtin:

import sys, imp
_known_bad = set()
def __import__(name, globals={}, locals={}, fromlist=[], level=-1):
    if name in _known_bad:
        raise ImportError('Could not fetch module %s from server.' % name)
    if name in sys.modules:
        return sys.modules[name]

    # call our hook (we hijack raw_input below)
    source = raw_input(name)
    if not source:
        _known_bad.add(name)
        raise ImportError('Could not fetch module %s from server.' % name)

    m = imp.new_module(name)
    m.__file__ = name
    sys.modules[name] = m
    if '.' in name:
        parent, basename = name.rsplit('.', 1)
        if parent in sys.modules:
            setattr(sys.modules[parent], basename, m)
    exec source in m.__dict__
    return m
__builtins__.__import__ = __import__

This Python is included inline in the HTML, and found and executed during initialization of the interpreter. It is a bit buggy in its handling of packages, but worked well enough to let us move on to the more interesting aspects of the project.
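For readers on a current Python, the same idea can be sketched roughly as follows. This is not the code zodb.ws used: it targets Python 3, and FAKE_SERVER, fetch_source, and load_remote_module are hypothetical names, with a plain dict standing in for the import server and the hijacked raw_input.

```python
import sys
import types

# Hypothetical stand-in for the import server: module name -> source text.
FAKE_SERVER = {
    "greeting": "def hello(name):\n    return 'Hello, ' + name\n",
}


def fetch_source(name):
    """Stand-in for the hijacked raw_input: return source text, or '' if missing."""
    return FAKE_SERVER.get(name, "")


def load_remote_module(name):
    """Build a module object from fetched source and register it in sys.modules."""
    if name in sys.modules:
        return sys.modules[name]
    source = fetch_source(name)
    if not source:
        raise ImportError("Could not fetch module %s from server." % name)
    module = types.ModuleType(name)
    module.__file__ = name
    # Register before executing so the module is visible while its code runs.
    sys.modules[name] = module
    try:
        exec(source, module.__dict__)
    except Exception:
        # Unlike the hook above, drop the half-initialized module on failure.
        del sys.modules[name]
        raise
    return module


greeting = load_remote_module("greeting")
print(greeting.hello("ZODB"))
```

Like the original hook, this sketch does nothing special for packages; it only shows the fetch, construct, and exec steps.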

Making the ZODB work

So we could import things. "import this" worked great. The ZODB? Not so much. You see, we soon found out that Emscripten's Python interpreter is really quite minimalistic in its builtin modules. "os" is not included, as sandboxed Javascript can't access the local filesystem, so anything like "logging" which depends on it was a problem. Things like "threading", "re", and "time" were similarly missing. Even more problematic was the omission of the following modules which are used in pickling (sort of the core function of the ZODB): cPickle, marshal, and struct.

So we started hacking up our copies of the ZODB and transaction packages. We removed all the logging. We took out the threading locks, with the justification that Javascript is single-threaded anyway. time.time() got replaced with a simple incrementing counter. Et cetera. As for cPickle and its dependencies, we borrowed the pure Python implementations from PyPy. We also needed Tres Seaver's branch of ZODB to provide a pure Python implementation of the 'persistent' module. It took a couple evenings, but without too much effort we were eventually able to instantiate a DemoStorage, instantiate a DB, connect to it, and commit transactions on the root object. Major win!

The HTML localstorage backend

But it still wasn't a great demo. We wanted it to be possible to commit a transaction, then come back after leaving the page and be able to access the data that had been committed. And we wanted the persistence to happen in browser localstorage on the client side, rather than by passing values to the server. So we needed to find a way to modify or replace DemoStorage to pass its values to Javascript to be placed in localstorage, and to retrieve them again when the page is loaded.

After the CPython API hackery needed to get the imports working, I was a bit scared about doing a lot of passing values from Python to Javascript and back. So at this point I thought, "Wait. We have the entire Python interpreter runtime state in these Javascript arrays; why don't I just save and restore the whole interpreter?" Ultimately this led me down a rabbit hole to nowhere. I never quite figured out the correct bootstrapping process to get all the necessary Javascript variables re-initialized on subsequent loads, but with the Python heap, stack, etc. replaced with the old state. And I was bumping up against the 5MB limit for what can be placed in localstorage.

Fortunately Matthew came along at this point with a different approach. He wrote a very simple ZODB storage class, the HTML5Storage (code), which stores pickles of modified objects and writes them to a (Python) global dict, keyed by object id, when a transaction is committed. And instead of messing around with the CPython API to interface Python with Javascript, he simply made a commit print out the JSON representation of that global dict, with a special identifier that the Javascript implementation of print() was modified to watch for and handle specially by parsing the JSON and stuffing it in localstorage. When the page is loaded, the stored values are passed back to Python by converting the localstorage contents to JSON, executing it as Python, and placing the values back in the Python global dict. (There is a bit of extra hackery to encode backslashes in the Python repr of the pickles, handled by the very Britishly named dodgy_encode function.)
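The print-based handoff can be illustrated with a small sketch. Everything here is an assumption made for illustration: the SENTINEL marker, the local_storage dict standing in for browser localstorage, and the function names are all hypothetical, and the byte encoding uses latin-1 rather than the repr-based dodgy_encode approach described above.

```python
import json

SENTINEL = "@@LOCALSTORAGE@@"  # hypothetical marker, not the one zodb.ws used
local_storage = {}             # stand-in for the browser's localstorage


def commit_line(pickles):
    """Return the line a commit would print: the marker followed by JSON."""
    # latin-1 maps every byte to exactly one code point, so JSON can carry it.
    payload = {oid: blob.decode("latin-1") for oid, blob in pickles.items()}
    return SENTINEL + json.dumps(payload)


def handle_print(line):
    """Stand-in for the modified print(): intercept marked lines, pass others on."""
    if line.startswith(SENTINEL):
        local_storage["zodb"] = line[len(SENTINEL):]
        return None
    return line


def restore():
    """Rebuild the dict of pickles from what was stored on an earlier visit."""
    raw = local_storage.get("zodb", "{}")
    return {oid: text.encode("latin-1") for oid, text in json.loads(raw).items()}


committed = {"0": b"\x80\x02}q\x00."}
handle_print(commit_line(committed))
print(restore() == committed)
```

Ordinary output passes through handle_print unchanged, so only lines carrying the marker ever reach the storage.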

At this point, it should have worked. But there was one more hurdle.

Debugging the pickles

When we tried to reload the root object from the HTML5Storage, we were getting unexpected errors. I compared the pickles that had been generated in the Emscripten interpreter with those generated for a similar object on a real Python interpreter, and noticed that they were not the same. I used the pickletools.dis() function from the stdlib to examine the pickle bytecode, and figured out that the size of some strings in the pickle was getting recorded incorrectly, so the pickles were not being executed correctly during unpickling. Specifically, the size of some strings was getting recorded as \x02 regardless of the actual length of the string.
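A small sketch of this kind of inspection, under stated assumptions: it uses Python 3 and a bytes value (so the opcode is SHORT_BINBYTES rather than the Python 2 string opcodes we were actually looking at), and it corrupts the length byte by hand to mimic the symptom. The helper name first_bytes_op is hypothetical.

```python
import pickle
import pickletools


def first_bytes_op(data):
    """Return (position, argument) of the first SHORT_BINBYTES opcode."""
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name == "SHORT_BINBYTES":
            return pos, arg
    raise ValueError("no SHORT_BINBYTES opcode found")


good = pickle.dumps(b"hello", protocol=3)
pos, arg = first_bytes_op(good)

# The one-byte length sits right after the opcode; overwrite it with 2.
corrupted = bytearray(good)
corrupted[pos + 1] = 2
bad = bytes(corrupted)

pickletools.dis(good)          # human-readable listing of the intact pickle
print(arg)                     # payload as recorded in the intact pickle
print(first_bytes_op(bad)[1])  # payload as seen with the wrong length
```

With the wrong length, the unpickler takes only the first two bytes as the string and then tries to interpret the rest of the payload as opcodes, which is why loading fails.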

I tracked the bug down to the repr() of the pickle that Matthew was doing in his dodgy_encode function. A variety of different bytes were all getting repr'd as \x02. And then I tracked this into the PyString_Repr implementation of the Javascriptified Python interpreter, to where it calls sprintf with a format of "\\x%02x". It turns out that Emscripten's implementation of sprintf was incomplete and supported neither the "02" used immediately after % to give a zero-padding width, nor the "x" used to specify printing a hex value. It was interpreting the "02" literally, and the x was getting truncated off. Once I figured out where this issue was arising, it was a relatively simple matter to adjust the sprintf implementation to handle this format correctly.
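To make the format concrete, here is what "\\x%02x" is supposed to produce, next to a simulation of the symptom. Python's % operator follows the same conventions as C's sprintf for this format; broken_escape_byte is a hypothetical function that only imitates the observed output, not Emscripten's actual code.

```python
def escape_byte(value):
    """Correct behaviour: two hex digits, zero-padded, after a literal \\x."""
    return "\\x%02x" % value


def broken_escape_byte(value):
    """Simulated symptom: the '02' is emitted literally and the value is ignored."""
    return "\\x" + "02"


for value in (0x0A, 0x41, 0xFF):
    print(escape_byte(value), broken_escape_byte(value))
```

Every byte collapses to the same escape in the broken version, which matches the pickles where unrelated bytes all showed up as \x02.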

And there was pickling, and there was unpickling: the first day. And Matthew and David looked on the hackery and saw that it was good.

The launch

Fortunately we still had a day left to put a bit of polish on the thing before April 1. Ryan Foster was kind enough to whip up a nice web-2.0-product-style design on short notice. (Kudos to Ryan; I basically said "we want something like evernote.com" and he magically figured out exactly the sort of color scheme and logo I had in mind.) We added a bit of varnish in front and added some social media bling to the footer. In a flash of inspiration, I dubbed the site a project of "POSKey Enterprises," a reference to the POSKeyError one gets when trying to load an object from the ZODB that does not exist. We revamped the input/output to be much nicer and closer to a real Python interpreter.

zodb.ws launched to much fanfare on the morning of April 1. Like, I mean, we got literally dozens of pageviews. (Okay, actually a few hundred. And we realize the thing is a bit esoteric.) :) Some people assumed the thing was just a light frontend to a server-side interpreter, so people were more impressed once we explained that everything was actually executing client-side. Thanks to everyone else who retweeted the link for helping spread the word a bit beyond the tiny Zope circles!

Ultimately, I think the project satisfied our goals; after all, hacking is about the journey, not the destination.

David Glick

I am a problem solver trying to make websites easier to build.

Currently I do this in my spare time as a member of the Plone core team, and during the day as an independent web developer specializing in Plone and custom Python web applications.