
David Glick – Plone developer


Recombining ZODB storages

by David Glick posted Dec 02, 2009 05:15 PM

I recently faced the task of joining back together a Plone site composed of 4 ZODB filestorages that had been (mostly through cavalier naïveté on my part) split asunder some time ago.

Normally I would probably just do a ZEXP export of each of the folders that lived in its own mountpoint, then remove the mountpoints and reimport the ZEXP files into the main database. However, that wasn't going to work in this case because the database included some cross-database references.
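For reference, the normal single-database version of that export looks roughly like this from a bin/instance run script (the folder path here is hypothetical; exportFile is ZODB's standard ExportImport API):

# Sketch of a plain ZEXP export; app is the Zope root provided by
# ``bin/instance run``, and the folder path is made up.
folder = app.mysite.somefolder
folder._p_jar.exportFile(folder._p_oid, open('/tmp/somefolder.zexp', 'wb'))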

Some background: Normally in Zope, mountpoints are the only place where one filestorage references another one, but the ZODB has some support for *any* object to link to any other object in any other database, and this can happen within Zope if you copy an object from one filestorage to another. This is generally bad, since the ZODB's support for cross-database references is partial -- when you pack one filestorage, the garbage collection routine doesn't know about the cross-database references (unless you use zc.zodbdgc), so an object might get removed even if some other filestorage still refers to it, and you'll get POSKeyErrors. Also, in ZODB 3.7.x, the code that handles packing doesn't know about cross-database references, so you'll get KeyError: 'm' or KeyError: 'n' while packing.
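For reference, the persistent ids that ZODB's serializer can emit come in a few shapes (see the ZODB.serialize docstrings for the authoritative list); these are exactly the cases the persistent_load function in the script below has to decode:

# The shapes a persistent reference can take in a ZODB data record pickle:
#
#   oid                               - plain oid string, same database
#   (oid, class_metadata)             - same database, with class metadata
#   ['m', (dbname, oid, class_meta)]  - cross-database, with class metadata
#   ['n', (dbname, oid)]              - cross-database, no class metadata
#   [oid] or ['w', (oid, ...)]        - persistent weak references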

Well, this is what had happened to my multi-database, and I wanted to keep those cross-database references intact while I merged the site back into one monolithic filestorage. So I ended up adapting the ZEXP export code to:

  1. traverse cross-database references (the standard ZEXP export ignores them and will not include objects that live in a different filestorage than the starting object),
  2. traverse ZODB mountpoints (removing them in the process),
  3. and rewrite all the oids to avoid collisions in the new merged database.

Here is the script I ended up with. If you need to use it, you should:

  1. Edit the final line to pass the object you want to start traversing from, and the filename you want to write the ZEXP dump to.
  2. Run the script using bin/instance run multiexport.py

"""Support for export of multidatabases."""

##############################################################################
#
# Based on the ZODB import/export code.
# Copyright (c) 2009 David Glick.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL).  A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE
#
##############################################################################

import logging
import cPickle, cStringIO
from ZODB.utils import p64, u64
from ZODB.ExportImport import export_end_marker
from ZODB.DemoStorage import DemoStorage

logger = logging.getLogger('multiexport')

def export_zexp(context, fname):
    """Export context and everything reachable from it as a ZEXP dump."""
    f = open(fname, 'wb')
    f.write('ZEXP')
    for oid, p in flatten_multidatabase(context):
        f.writelines((oid, p64(len(p)), p))
    f.write(export_end_marker)
    f.close()

def flatten_multidatabase(context):
    """Walk a multidatabase and yield rewritten pickles with oids for a single database"""
    base_oid = context._p_oid
    base_conn = context._p_jar
    dbs = base_conn.connections
    
    dummy_storage = DemoStorage()

    oids = [(base_conn._db.database_name, base_oid)]
    done_oids = {}
    # table to keep track of mapping old oids to new oids
    ooid_to_oid = {oids[0]: dummy_storage.new_oid()}
    while oids:
        # loop while references remain to objects we haven't exported yet
        (dbname, ooid) = oids.pop(0)
        if (dbname, ooid) in done_oids:
            continue
        done_oids[(dbname, ooid)] = True

        db = dbs[dbname]
        try:
            # get pickle
            p, serial = db._storage.load(ooid, db._version)
        except:
            logger.debug("broken reference for db %s, oid %s", dbname, repr(ooid),
                         exc_info=True)
        else:
            def persistent_load(ref):
                """ Remap a persistent id to a new ID and create a ghost for it.
                
                This is called by the unpickler for each reference found.
                """

                # resolve the reference to a database name and oid
                if isinstance(ref, tuple):
                    # (oid, class metadata): reference within the same database
                    rdbname, roid = (dbname, ref[0])
                elif isinstance(ref, str):
                    # bare oid string: reference within the same database
                    rdbname, roid = (dbname, ref)
                else:
                    try:
                        ref_type, args = ref
                    except ValueError:
                        # a one-element list is a persistent weak reference;
                        # don't follow it
                        return
                    else:
                        if ref_type in ('m', 'n'):
                            # cross-database reference, with or without class
                            # metadata: args start with (database name, oid)
                            rdbname, roid = (args[0], args[1])
                        else:
                            return

                # traverse Products.ZODBMountPoint mountpoints to the mounted location
                rdb = dbs[rdbname]
                p, serial = rdb._storage.load(roid, rdb._version)
                klass = p.split()[0]
                if 'MountedObject' in klass:
                    mountpoint = rdb.get(roid)
                    # get the object with the root as a parent, then unwrap,
                    # since there's no API to get the unwrapped object
                    # (app is the Zope root provided by ``bin/instance run``)
                    mounted = mountpoint._getOrOpenObject(app).aq_base
                    rdbname = mounted._p_jar._db.database_name
                    roid = mounted._p_oid

                if roid:
                    print '%s:%s -> %s:%s' % (dbname, u64(ooid), rdbname, u64(roid))
                    oids.append((rdbname, roid))

                try:
                    oid = ooid_to_oid[(rdbname, roid)]
                except KeyError:
                    # generate a new oid and associate it with this old db/oid
                    ooid_to_oid[(rdbname, roid)] = oid = dummy_storage.new_oid()
                return Ghost(oid)

            # do the repickling dance to rewrite references
            
            pfile = cStringIO.StringIO(p)
            unpickler = cPickle.Unpickler(pfile)
            unpickler.persistent_load = persistent_load

            newp = cStringIO.StringIO()
            pickler = cPickle.Pickler(newp, 1)
            pickler.persistent_id = persistent_id

            # a ZODB data record pickle has two parts (class metadata and
            # object state), so unpickle and repickle both
            pickler.dump(unpickler.load())
            pickler.dump(unpickler.load())
            p = newp.getvalue()

            yield ooid_to_oid[(dbname, ooid)], p

class Ghost(object):
    __slots__ = ("oid",)
    def __init__(self, oid):
        self.oid = oid

def persistent_id(obj):
    if isinstance(obj, Ghost):
        return obj.oid

export_zexp(app.mysite, '/tmp/mysite.zexp')

Download multiexport.py

I've used this script with apparent success, but it has not been extensively tested and your mileage may of course vary.
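Once you have the dump, loading it into a fresh single filestorage is the normal import step. Roughly, again from a bin/instance run script (the names are hypothetical; importFile is ZODB's standard ExportImport API):

import transaction
# import the dump and graft it onto the root of the new database
imported = app._p_jar.importFile(open('/tmp/mysite.zexp', 'rb'))
app._setObject('mysite', imported)
transaction.commit()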


Seeing a real-time breakdown of web traffic by vhost

by David Glick posted Oct 01, 2009 01:54 AM

Occasionally our servers are hit by traffic spikes. Since we typically host a number of websites per server, we need a way to quickly determine which site is receiving the bulk of incoming requests. (Then we can improve caching on that site, perhaps.) In order to see a real-time indication of what vhosts are being requested, we use the following awk script:

histo.awk

# creates a histogram of values in the first column of piped-in data
function max(arr,   big, i) {
    big = 0;
    for (i in arr) {
        if (arr[i] > big) { big = arr[i]; }
    }
    return big
}

# count occurrences of the first field; remember the first and last
# timestamps ($6 is the timestamp field in our log format; adjust for yours)
NF > 0 {
    cat[$1]++;
    if (!start) { start = $6 }
    end = $6
}
END {
    printf "from %s to %s\n", start, end
    maxm = max(cat);
    for (i in cat) {
        scaled = 60 * cat[i] / maxm;
        printf "%-25.25s  [%8d]:", i, cat[i]
        for (j = 0; j < scaled; j++) {
            printf "#";
        }
        printf "\n";
    }
}

It can be used like this:

watch 'tail -n 100 /var/log/apache2/access_log | awk -f histo.awk | sort -nrk3'

This gives a histogram of the occurrence of vhosts in the last 100 lines of the Apache log, updated every 2 seconds and sorted with the most frequent vhosts at the top. (Note that this assumes an Apache log format which includes the vhost as the first column.) It looks something like this:

Every 2.0s: tail -n 100 /var/log/apache2/access_log | awk -f histo.awk | sort -nrk3       Thu Oct  1 09:51:41 2009

www.dogwoodinitiative.org  [      49]:############################################################
www.wildliferecreation.or  [      24]:##############################
www.earthministry.org      [      14]:##################
blogs.onenw.org            [       3]:####
www.tilth.org              [       2]:###
www.oeconline.org          [       2]:###
www.audubonportland.org    [       1]:##
oraction.org               [       1]:##
oeconline.org              [       1]:##
dogwoodinitiative.org      [       1]:##
bandon.onenw.org           [       1]:##
209.40.194.148             [       1]:##
from [01/Oct/2009:09:51:21 to [01/Oct/2009:09:48:40

(Another useful variant is to produce a histogram of requests by IP address instead, which can help determine what to block during a DoS attack; see the sketch below.)
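For example, if the client IP is the second field in your log format, something like this works (the from/to line will come out blank, since only the one column survives the projection):

watch 'tail -n 100 /var/log/apache2/access_log | awk "{ print \$2 }" | awk -f histo.awk | sort -nrk3'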


Extending kupu's initialization with a JavaScript wrapper decorator

by David Glick posted Jul 20, 2009 03:15 PM

Today I found myself struggling to do something in JavaScript that I'm used to doing with ease in Python -- replace an existing method (defined by code I don't want to touch) with a wrapper that calls the original method and then performs some additional actions. (Yeah, it's a monkey patch. But sometimes it's a cleaner and more maintainable way to extend something than the alternatives.)
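For comparison, here's the Python version of the pattern I had in mind (the module and method names are made up):

import somemodule

_orig_method = somemodule.SomeClass.method

def patched_method(self, *args, **kw):
    # call the original, then do the extra work
    result = _orig_method(self, *args, **kw)
    # ... additional actions on result here ...
    return result

somemodule.SomeClass.method = patched_method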

In particular, I was trying to adjust the default kupu configuration without overriding kupuploneinit.js to add commands directly to the initPloneKupu method. Here's the snippet that got me there:

var augmentKupuInit = function(orig_fn) {
  return function(){
    var editor = orig_fn.apply(this, arguments);
    // do what you need to on the editor object here.
    // For example, I was trying to prevent kupu from
    // filtering the 'fb:fan' tag of Facebook's "Fan Box"
    // widget, like so:
    editor.xhtmlvalid.tagAttributes['fb:fan'] = ['*'];
    return editor;
  };
};
initPloneKupu = augmentKupuInit(initPloneKupu);

This defines a decorator function called augmentKupuInit that can be used to wrap another function. It then wraps the original initPloneKupu method and assigns the result back to the initPloneKupu name. As long as this snippet is registered so that it loads after kupuploneinit.js and before initPloneKupu is called, it works like a charm!

(Many thanks to http://stackoverflow.com/questions/326596/how-do-i-wrap-a-function-in-javascript, which finally pointed me in the right direction.)


David Glick

I am a problem solver trying to make websites easier to build.

Currently I do this in my spare time as a member of the Plone core team, and during the day as an independent web developer specializing in Plone and custom Python web applications.