Visualizing the ZODB with graphviz

by David Glick posted May 22, 2009 11:25 AM

While digging around in the ZEXP export code, I realized that it wouldn't be too hard to modify it to dump a representation of a ZODB in graphviz .dot format. Here's a Zope external method I devised to do that:

 

# Generic ZODB walker and graphviz exporter

####################################################################
#
# Copyright (c) 2003 Zope Corporation and Contributors.
# All Rights Reserved.
#
# This software is subject to the provisions of the Zope Public License,
# Version 2.1 (ZPL).  A copy of the ZPL should accompany this distribution.
# THIS SOFTWARE IS PROVIDED "AS IS" AND ANY AND ALL EXPRESS OR IMPLIED
# WARRANTIES ARE DISCLAIMED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
# WARRANTIES OF TITLE, MERCHANTABILITY, AGAINST INFRINGEMENT, AND FITNESS
# FOR A PARTICULAR PURPOSE.
#
####################################################################

import logging
import cPickle, cStringIO
from ZODB.utils import u64

logger = logging.getLogger('ZODB.ExportImport')

def get_reference_dumper(refs):
    # This is a callback which will be called whenever a reference is found.
    def dump_reference(oid, roid):
        refs.append('%s -> %s\n' % (u64(oid), u64(roid)))
    return dump_reference

def export_graphviz(self):
    """
    Walks a ZODB database and dumps the object graph in graphviz .dot format.
    """
    context = self
    f = open('plone.dot', 'w')
    f.write('digraph plone {\n')
    refs = []
    reference_dumper = get_reference_dumper(refs)
    for oid, p in walk_database(context, reference_callback=reference_dumper):
        # Walk to all the objects in the database and examine their references.
        # Whenever a reference is found, it will be recorded via the
        # reference_dumper.  Whenever a new object is found, it will be yielded
        # to this loop.

        # Read the module and class from the pickle bytestream without actually
        # loading the object.
        module, klass = p.split('\n')[:2]
        module = module[2:]
        
        f.write('%s [label="%s.%s"]\n' % (u64(oid), module, klass))
    for ref in refs:
        f.write(ref)
    f.write('}\n')
    f.close()

def walk_database(context, reference_callback=None):
    # Get the object ID and database connection of the starting object.
    base_oid = context._p_oid
    conn = context._p_jar
    
    # oids is used to keep track of found oids that need to be visited.
    # done_oids is used to keep track of which oids have already been yielded.
    oids = [base_oid]
    done_oids = {}
    while oids:
        # loop while references remain to objects we haven't exported yet
        oid = oids.pop(0)
        if oid in done_oids:
            continue
        done_oids[oid] = True
        
        try:
            # fetch the pickle
            p, serial = conn._storage.load(oid, conn._version)
        except Exception:
            logger.debug("broken reference for oid %s", repr(oid),
                         exc_info=True)
        else:
            # If the Unpickler's persistent_load attribute is set to a list,
            # then that list will be populated with the references found in
            # the pickle when noload is called, without actually loading the
            # object.
            refs = []
            u = cPickle.Unpickler(cStringIO.StringIO(p))
            u.persistent_load = refs
            # noload must be called once for each pickle in the record: ZODB
            # writes two (the class metadata, then the object state)
            u.noload()
            u.noload()

            # loop through the references found on this object
            for ref in refs:

                # look for the various reference types supported by the ZODB
                # (see the docs in ZODB/serialize.py for details)
                if isinstance(ref, tuple):
                    roid = ref[0]
                elif isinstance(ref, str):
                    roid = ref
                else:
                    # Anything else is a list-style reference: a weak
                    # reference or a cross-database reference.  Neither is
                    # followed here, so skip it rather than fall through
                    # with a stale roid from the previous iteration.
                    continue
                if roid:
                    # record this reference
                    if reference_callback:
                        reference_callback(oid, roid)

                    # add the referenced object to the list of objects we need
                    # to visit
                    oids.append(roid)

            # yield the oid and pickle
            yield oid, p

Download graphviz_export.py
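
The reference-collecting step in walk_database depends on pickle's persistent-reference hooks. To show that mechanism in isolation, here is a minimal sketch using the standard-library pickle module in Python 3 rather than the Python 2 cPickle API used above. The Ref, RefPickler and RefCollector names and the integer ids are made up for illustration. Unlike noload, this version does unpickle the state; it only demonstrates how references embedded in a pickle can be gathered without following them.

```python
import io
import pickle


class Ref(object):
    """Hypothetical marker for a pointer to another stored object."""
    def __init__(self, oid):
        self.oid = oid


class RefPickler(pickle.Pickler):
    def persistent_id(self, obj):
        # Returning a non-None value makes pickle store a reference
        # instead of pickling the object itself.
        if isinstance(obj, Ref):
            return obj.oid
        return None


class RefCollector(pickle.Unpickler):
    def __init__(self, f):
        super().__init__(f)
        self.refs = []

    def persistent_load(self, pid):
        # Record the reference and hand back a placeholder rather than
        # loading the referenced object.
        self.refs.append(pid)
        return None


def dump_state(state):
    buf = io.BytesIO()
    RefPickler(buf, protocol=2).dump(state)
    return buf.getvalue()


def references_in(data):
    collector = RefCollector(io.BytesIO(data))
    collector.load()
    return collector.refs


data = dump_state({'title': 'root', 'children': [Ref(2), Ref(3)]})
print(references_in(data))
```

In the exporter above, each collected reference becomes one edge line in the .dot file, and the referenced oid is queued for a later visit.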

After running this on a fresh Plone site, passing the output through dot, and loading it in zgrviewer, here's what I got:

[Image: ZODB graphviz visualization]

The site root is toward the upper right; most of the graph is persistent tools and such rather than actual content, since there is minimal content in a fresh Plone installation. That hairy mess on the left is the mimetype registry. Any resemblance to the shape of the BFG logo is entirely coincidental.

I'm not really sure what sort of useful information one might be able to get using this sort of technique, but I'm sure there are some possibilities, so please let me know if you have ideas or if you modify this to do something cool.
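
One possibility, offered only as a rough sketch: since the exporter writes one node line and one edge line per record, the .dot text can be summarized without drawing it at all. The summarize_dot function below is hypothetical, and the sample graph is invented; it assumes the exact line shapes that export_graphviz produces and would not parse arbitrary graphviz files.

```python
import re
from collections import Counter

# Line shapes written by export_graphviz: '<id> [label="<class>"]'
# for nodes and '<id> -> <id>' for references.
NODE_RE = re.compile(r'^(\d+) \[label="(.*)"\]$')
EDGE_RE = re.compile(r'^(\d+) -> (\d+)$')


def summarize_dot(text):
    """Return (objects per class label, incoming references per node id)."""
    labels = {}
    incoming = Counter()
    for line in text.splitlines():
        line = line.strip()
        node = NODE_RE.match(line)
        if node:
            labels[node.group(1)] = node.group(2)
            continue
        edge = EDGE_RE.match(line)
        if edge:
            incoming[edge.group(2)] += 1
    return Counter(labels.values()), incoming


# Invented sample in the same format as plone.dot.
sample = """digraph plone {
0 [label="OFS.Application.Application"]
1 [label="OFS.Folder.Folder"]
2 [label="OFS.Folder.Folder"]
0 -> 1
0 -> 2
1 -> 2
}
"""
per_class, incoming = summarize_dot(sample)
print(per_class.most_common(1))
print(incoming.most_common(1))
```

Counting objects per class and finding the most-referenced oids might point at the parts of a database that are worth a closer look before rendering anything.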

I want to try this on a site that has real data in it, but at the moment I'm waiting for the latest Xcode to download so that I can build the newest graphviz. That release includes sfdp, which is supposed to be better at handling really big graphs.
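
For reference, the rendering step can be scripted too. This is a small sketch, assuming the graphviz programs are installed and on the PATH; layout_command and render are hypothetical helper names, and the file names simply match the plone.dot written by the exporter. Both dot and sfdp accept the -T (output format) and -o (output file) options used here.

```python
import subprocess


def layout_command(dot_path, out_path, engine='dot', fmt='svg'):
    """Build the command line for a graphviz layout program."""
    return [engine, '-T' + fmt, dot_path, '-o', out_path]


def render(dot_path, out_path, engine='dot', fmt='svg'):
    # Runs the layout program; raises CalledProcessError if it fails.
    subprocess.check_call(layout_command(dot_path, out_path, engine, fmt))


# Show the command that would be run for the sfdp layout engine.
print(' '.join(layout_command('plone.dot', 'plone.svg', engine='sfdp')))
```

Calling render('plone.dot', 'plone.svg', engine='sfdp') would then write the laid-out graph to plone.svg.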

Ken Lyle says:
May 23, 2009 11:09 PM
This looks really promising...but not sure for what, exactly, as the author says...ATM, it looks more like a picture than a map. When I started with the mapping company, they told me the diff. was a scale and a North arrow. Something analogous, and maybe some labels, would be great.
Brent Woodruff says:
May 23, 2009 01:34 AM
Perhaps instead of going straight to a visual representation you could load it up in NetworkX and use graph theory algorithms to obtain statistics.