
Recombining ZODB storages

by David Glick posted Dec 02, 2009 05:15 PM
I recently faced the task of joining back together a Plone site composed of 4 ZODB filestorages that had been (mostly through cavalier naïveté on my part) split asunder some time ago.

Normally I would probably just do a ZEXP export of each of the folders that lived in its own mountpoint, then remove the mountpoints and reimport the ZEXP files into the main database. However, that wasn't going to work in this case because the database included some cross-database references.

Some background: Normally in Zope, mountpoints are the only place where one filestorage references another one, but the ZODB has some support for *any* object to link to any other object in any other database, and this can happen within Zope if you copy an object from one filestorage to another. This is generally bad, since the ZODB's support for cross-database references is partial -- when you pack one filestorage, the garbage collection routine doesn't know about the cross-database references (unless you use zc.zodbdgc), so an object might get removed even if some other filestorage still refers to it, and you'll get POSKeyErrors. Also, in ZODB 3.7.x, the code that handles packing doesn't know about cross-database references, so you'll get KeyError: 'm' or KeyError: 'n' while packing.
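To make the shapes of these references concrete: a persistent id stored in a ZODB pickle can be a bare oid, an (oid, class-metadata) tuple, or a typed list, where the `'m'` and `'n'` forms are the cross-database references described above. The dispatcher below is a minimal, ZODB-free sketch of how those forms can be classified (the `'local'` placeholder and the exact shapes of the typed forms are assumptions based on the ZODB versions of that era, not an exact reproduction of ZODB's own code):

```python
def classify_reference(ref):
    """Return (database_name, oid) for a persistent reference, or None
    for forms (like weakrefs) that should not be traversed.

    'local' stands in for the current database's name.
    """
    if isinstance(ref, tuple):      # (oid, class_meta): ordinary reference
        return ('local', ref[0])
    if isinstance(ref, bytes):      # bare 8-byte oid string
        return ('local', ref)
    try:
        ref_type, args = ref        # [type_code, args] list forms
    except ValueError:
        return None                 # unrecognized shape, e.g. a weakref
    if ref_type in ('m', 'n'):      # cross-database reference: args starts
        return (args[0], args[1])   # with (database_name, oid, ...)
    return None
```

The script's `persistent_load` below performs essentially this classification before deciding whether to follow a reference.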

Well, this is what had happened to my multi-database, and I wanted to keep those cross-database references intact while I merged the site back into one monolithic filestorage. So I ended up adapting the ZEXP export code to:

  1. traverse cross-database references (the standard ZEXP export ignores them and will not include objects in different filestorages from the starting object),
  2. traverse ZODB mountpoints (removing them in the process),
  3. and rewrite all the oids to avoid collisions in the new merged database.
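The oid-rewriting step boils down to a breadth-first walk that hands out fresh sequential oids as new (database, oid) pairs are discovered, which is what the script's `ooid_to_oid` table does. Here is a minimal stdlib-only sketch of that idea, using a hypothetical `references` mapping in place of real pickle traversal:

```python
from collections import deque
from itertools import count

def remap_oids(root, references):
    """Breadth-first walk from `root`, assigning each (dbname, oid) pair a
    fresh sequential oid for the merged database.

    `references` maps each (dbname, oid) pair to the pairs it points to.
    """
    fresh = count(1)
    mapping = {root: next(fresh)}   # old (db, oid) -> new oid
    queue = deque([root])
    seen = set()
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        for ref in references.get(node, []):
            if ref not in mapping:
                mapping[ref] = next(fresh)
            queue.append(ref)
    return mapping
```

Because the new oid is assigned the first time a pair is seen, every pickle that references the same old object gets rewritten to point at the same new oid, even across databases.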

Here is the script I ended up with. If you need to use it, you should:

  1. Edit the final line to pass the object you want to start traversing from, and the filename you want to write the ZEXP dump to.
  2. Run the script using bin/instance run multiexport.py

"""Support for export of multidatabases."""

##############################################################################
#
# Based on the ZODB import/export code.
# Copyright (c) 2009 David Glick.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL).  A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE
#
##############################################################################

import logging
import cPickle, cStringIO
from ZODB.utils import p64, u64
from ZODB.ExportImport import export_end_marker
from ZODB.DemoStorage import DemoStorage

logger = logging.getLogger('multiexport')

def export_zexp(self, fname):
    context = self
    f = open(fname, 'wb')
    f.write('ZEXP')
    for oid, p in flatten_multidatabase(context):
        f.writelines((oid, p64(len(p)), p))
    f.write(export_end_marker)
    f.close()

def flatten_multidatabase(context):
    """Walk a multidatabase and yield rewritten pickles with oids for a single database"""
    base_oid = context._p_oid
    base_conn = context._p_jar
    dbs = base_conn.connections
    
    dummy_storage = DemoStorage()

    oids = [(base_conn._db.database_name, base_oid)]
    done_oids = {}
    # table to keep track of mapping old oids to new oids
    ooid_to_oid = {oids[0]: dummy_storage.new_oid()}
    while oids:
        # loop while references remain to objects we haven't exported yet
        (dbname, ooid) = oids.pop(0)
        if (dbname, ooid) in done_oids:
            continue
        done_oids[(dbname, ooid)] = True

        db = dbs[dbname]
        try:
            # get pickle
            p, serial = db._storage.load(ooid, db._version)
        except Exception:
            logger.debug("broken reference for db %s, oid %s", dbname, repr(ooid),
                         exc_info=True)
        else:
            def persistent_load(ref):
                """ Remap a persistent id to a new ID and create a ghost for it.
                
                This is called by the unpickler for each reference found.
                """

                # resolve the reference to a database name and oid
                if isinstance(ref, tuple):
                    rdbname, roid = (dbname, ref[0])
                elif isinstance(ref, str):
                    rdbname, roid = (dbname, ref)
                else:
                    try:
                        ref_type, args = ref
                    except ValueError:
                        # weakref
                        return
                    else:
                        if ref_type in ('m', 'n'):
                            rdbname, roid = (args[0], args[1])
                        else:
                            return

                # traverse Products.ZODBMountpoint mountpoints to the mounted location
                rdb = dbs[rdbname]
                p, serial = rdb._storage.load(roid, rdb._version)
                klass = p.split()[0]
                if 'MountedObject' in klass:
                    mountpoint = rdb.get(roid)
                    # get the object with the root as a parent, then unwrap,
                    # since there's no API to get the unwrapped object
                    # 'app' is the Zope application root, available as a
                    # global when run via bin/instance run
                    mounted = mountpoint._getOrOpenObject(app).aq_base
                    rdbname = mounted._p_jar._db.database_name
                    roid = mounted._p_oid

                if roid:
                    print '%s:%s -> %s:%s' % (dbname, u64(ooid), rdbname, u64(roid))
                    oids.append((rdbname, roid))

                try:
                    oid = ooid_to_oid[(rdbname, roid)]
                except KeyError:
                    # generate a new oid and associate it with this old db/oid
                    ooid_to_oid[(rdbname, roid)] = oid = dummy_storage.new_oid()
                return Ghost(oid)

            # do the repickling dance to rewrite references
            
            pfile = cStringIO.StringIO(p)
            unpickler = cPickle.Unpickler(pfile)
            unpickler.persistent_load = persistent_load

            newp = cStringIO.StringIO()
            pickler = cPickle.Pickler(newp, 1)
            pickler.persistent_id = persistent_id

            # each ZODB data record contains two pickles: class metadata,
            # then object state
            pickler.dump(unpickler.load())
            pickler.dump(unpickler.load())
            p = newp.getvalue()

            yield ooid_to_oid[(dbname, ooid)], p

class Ghost(object):
    __slots__ = ("oid",)
    def __init__(self, oid):
        self.oid = oid

def persistent_id(obj):
    if isinstance(obj, Ghost):
        return obj.oid

export_zexp(app.mysite, '/tmp/mysite.zexp')
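The ZEXP container format the script writes is simple: a four-byte magic, then one record per object (8-byte oid, 8-byte big-endian length, pickle bytes), terminated by an end marker. A minimal sketch, independent of ZODB, with the end-marker value assumed to be sixteen 0xff bytes as in ZODB releases of that era:

```python
import struct

def p64(n):
    """Pack an integer into ZODB's 8-byte big-endian oid/length format."""
    return struct.pack('>Q', n)

def u64(b):
    """Unpack an 8-byte big-endian value back into an integer."""
    return struct.unpack('>Q', b)[0]

EXPORT_END_MARKER = b'\xff' * 16  # assumed value of ZODB's export_end_marker

def write_zexp(records, fileobj):
    """Write (oid, pickle_bytes) records in ZEXP framing."""
    fileobj.write(b'ZEXP')
    for oid, p in records:
        fileobj.write(oid)
        fileobj.write(p64(len(p)))
        fileobj.write(p)
    fileobj.write(EXPORT_END_MARKER)
```

This mirrors the framing done by `export_zexp` above, which reuses ZODB's own `p64` and `export_end_marker`.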

Download multiexport.py

I've used this script with apparent success, but it has not been extensively tested and your mileage may of course vary.

amleczko says:
Dec 09, 2009 11:41 PM
what about using ZODB 3.8.x? It handles multi-references quite well
David Glick says:
Dec 09, 2009 11:55 PM
ZODB 3.8 doesn't throw KeyErrors while packing anymore if it encounters cross-references, but it still isn't smart enough to keep from garbage collecting an object that is only referenced from another database. For that you need zc.zodbdgc, I think. In ZODB 3.9 there is an option to disallow cross-references entirely in the first place.
Linux_Blade_guy says:
Dec 02, 2009 11:09 PM
You are my hero, seriously!!

I inherited a Plone site with exactly this scenario - the portal_catalog has been split into another database, I get 'n' KeyErrors when packing, and it's had POSKeyError problems in the past.

I haven't tried out your solution yet, but this is the first bit of info I've seen that both explains the problem and tells what to do about it!
Jean Jordaan says:
Mar 14, 2010 12:10 AM
When is a good time to split ZODB? I want to have a filestorage per Plone instance. Reasons are that each Plone is backed up separately, and they're easy to move to another Zope if I get the urge. I can also easily see which ones are biggest. However, if they're going to cause breakage with cross-database cut'n'paste, it may be a bad idea.
David Glick says:
Mar 14, 2010 12:24 AM
@Jean: We routinely create a separate filestorage for each site, for the same reasons you describe. We've never had trouble with cross-references except when we were using several filestorages for the *same* site. (Any of our clients only has access to one site, and our staff knows not to copy and paste between sites.)