
Recent improvements to Plone-Salesforce integration

by David Glick posted Jul 10, 2013 11:45 PM
The simple-salesforce client and a celery-based message queue are keys to making Plone-Salesforce integration simpler and more robust.

I've been working on yet another big project that involves synchronizing data between Plone and Salesforce. As in past projects, Plone provides a powerful web content management platform that can support lots of editors with a number of different roles without paying for additional license fees, while Salesforce's superior reporting tools provide motivation for replicating data there as well. With help from Jason Lantz, I've been working on modernizing the toolset used to achieve this replication.

The improvements are on several fronts:

  1. Using the simple-salesforce library instead of beatbox and suds
  2. Handling incoming and outgoing data via a message queue
  3. Using Plone ids as external ids in Salesforce

Let me cover each of these in more detail.

simple-salesforce vs. beatbox and suds

Salesforce originally had only a SOAP API, and for a long time the clients available for interacting with Salesforce from Python only worked with the SOAP API. There was beatbox, which supported the generic Salesforce partner webservice pretty well but didn't handle custom webservices. And there was suds, which could parse WSDL for custom webservices and allow interacting with them. This ecosystem was annoying: you needed different libraries for different tasks, they both were as error-prone as most SOAP clients are, and you needed to update WSDL files whenever someone made a change to the webservice.

These days Salesforce has a REST API—both for generic operations and available from Apex for custom webservices—and Jason called my attention to the simple-salesforce library from the team at New Organizing Institute. I read the code and was impressed: it's only about 500 lines of code and I don't think I would do much differently if I were writing it from scratch. (I'd probably add some tests, but it is working okay for me in practice.) Using simple-salesforce, inserting a contact looks something like this:

from simple_salesforce import Salesforce
sf = Salesforce(
    username='SF_USERNAME', password='SF_PASSWORD',
    security_token='SF_TOKEN', sandbox=True)
sf.Contact.insert({
    'FirstName': 'David',
    'LastName': 'Glick',
    'Email': 'david@glicksoftware.com',
})

Handling data via a message queue

The first generation of Plone-Salesforce integration tools made synchronous calls out to Salesforce. This was the easiest thing to get working, but had some problems:

  • High latency. This was a problem both for user experience and because it tied up server threads and increased the likelihood of database conflict errors.
  • Tight coupling. If Salesforce was temporarily unreachable due to network issues or a maintenance window, the Plone user would see errors.

To solve these problems we knew we needed a message queue to allow for asynchronous processing of data being relayed from one system to the other. We did some research and chose celery (with RabbitMQ as the message broker). Celery has a lot of features including support for multiple brokers and the flower UI for viewing the queue, and it seems to have some traction in other parts of the Python web programming world.

At first I was worried that celery would be difficult to integrate with Zope, but it turns out it's not very complicated and hardly even warrants the creation of a new library. Celery uses a connection pool, so it can be used from within a threaded environment like Zope. So the main consideration is making sure that it plays nicely with Zope's transaction management: you don't want a job to get queued twice if the code that places it on the queue runs within a transaction that hits a database conflict and gets retried. Using an after-commit hook makes this pretty easy to deal with. I ended up creating a @salesforce_task decorator which encapsulates the logic for deferring the queueing to the right time as well as creating a Salesforce connection. See the code.
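The after-commit pattern can be sketched without Zope or celery at all. Everything below is an illustrative stand-in, not the actual code: `delay_after_commit` and `FakeTransaction` are hypothetical names, and in the real thing the `transaction` package's addAfterCommitHook plays the role of `add_after_commit_hook` (its hook likewise receives the commit status as its first argument):

```python
queued = []  # stand-in for the real message queue

def delay_after_commit(txn, task, *args):
    """Queue `task(*args)` only if the transaction commits successfully."""
    def hook(success):
        if success:  # an aborted (e.g. conflicted) transaction queues nothing
            queued.append((task, args))
    txn.add_after_commit_hook(hook)

class FakeTransaction:
    """Minimal stand-in for a transaction that runs after-commit hooks."""
    def __init__(self):
        self._hooks = []
    def add_after_commit_hook(self, hook):
        self._hooks.append(hook)
    def commit(self):
        for hook in self._hooks:
            hook(True)
    def abort(self):
        for hook in self._hooks:
            hook(False)

# A transaction that hits a conflict and aborts queues nothing...
txn = FakeTransaction()
delay_after_commit(txn, 'do_upsert', 'Contact')
txn.abort()
assert queued == []

# ...while the successful retry queues the job exactly once.
txn = FakeTransaction()
delay_after_commit(txn, 'do_upsert', 'Contact')
txn.commit()
assert queued == [('do_upsert', ('Contact',))]
```

The point of the indirection is that queueing becomes part of the transaction's fate: a retried transaction re-registers the hook, but only the attempt that actually commits ever touches the queue.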

Then there's the other direction: polling Salesforce periodically for data that have been updated. I used to handle this with a cron job and "bin/instance run" script, but celery has support for periodic tasks so I figured I'd give that a try. The trick here is that this task needs to modify objects in the ZODB, so it needs a full Zope environment set up when it runs in the celery worker process. Doing this is non-obvious but not especially difficult; I created another decorator called @zope_task which handles the dirty details. See the code.
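For reference, a periodic task like this might be registered with celery's beat scheduler via the CELERYBEAT_SCHEDULE setting (celery 3.x era); the dotted task name here is hypothetical, not the project's actual module layout:

```python
from datetime import timedelta

CELERYBEAT_SCHEDULE = {
    'poll-salesforce-updates': {
        # hypothetical dotted name of the @zope_task-decorated poller
        'task': 'myproject.tasks.poll_salesforce_updates',
        'schedule': timedelta(minutes=15),  # poll every 15 minutes
    },
}
```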

Using Plone ids as external ids in Salesforce

To sync multiple changes to the same object between the two systems, we either need to store a Salesforce Id on the object in Plone, or a Plone id on the object in Salesforce. In the past I've stored a Salesforce Id in Plone, but this is annoying, because it means that when you create a new object you can't push it to Salesforce asynchronously: you need to wait for the id of the inserted object and store it. It turns out the Salesforce API has pretty good support for referring to objects by an "external id field" if you create one. So now we are adding an external id field called Plone_Id__c to our objects in Salesforce, and then it's pretty darn easy to push changes to Salesforce:

from five import grok
from zope.lifecycleevent.interfaces import (
    IObjectAddedEvent, IObjectModifiedEvent)

@salesforce_task
def do_upsert(sf, sobject_type, record_id, data):
    sobject = getattr(sf, sobject_type)
    return sobject.upsert(record_id, data)

# IContact is the project's own content type interface
@grok.subscribe(IContact, IObjectAddedEvent)
@grok.subscribe(IContact, IObjectModifiedEvent)
def handle_contact_modified(contact, event):
    data = {
        'FirstName': contact.first_name,
        'LastName': contact.last_name,
    }
    do_upsert.delay('Contact', 'Plone_Id__c/' + contact.UID(), data)

(In practice I have some extra checks to make sure that we don't call out to Salesforce while tests are running, or when those events are triggered due to pulling data in from Salesforce.)
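One way to implement the second of those checks (the names here are illustrative, not the actual code) is a thread-local flag that the import code sets while it is writing data that came from Salesforce, so the modified-object handlers know not to echo those changes straight back out:

```python
import threading

_sync_state = threading.local()  # per-thread suppression flag

class suppress_callouts:
    """Context manager set while pulling data in from Salesforce."""
    def __enter__(self):
        _sync_state.suppressed = True
    def __exit__(self, *exc):
        _sync_state.suppressed = False

def callouts_enabled():
    """Event handlers check this before calling do_upsert.delay()."""
    return not getattr(_sync_state, 'suppressed', False)

assert callouts_enabled()
with suppress_callouts():
    assert not callouts_enabled()  # no callout while importing
assert callouts_enabled()
```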

You can even use an external id when referencing related objects, so we could update a committee that references the Contact who is chairing it like this:

@grok.subscribe(ICommittee, IObjectAddedEvent)
@grok.subscribe(ICommittee, IObjectModifiedEvent)
def handle_committee_modified(committee, event):
    data = {
        'Name': committee.title,
        'Chair__r': {  # this is the relation to the chair in Salesforce
            'Plone_Id__c': committee.chair  # in Plone we store a UID of the chair
        },
    }
    do_upsert.delay('Committee__c', 'Plone_Id__c/' + committee.UID(), data)

Next steps

This is all working very nicely and I'm *much* happier than I was with the old approaches. I'm also quite happy that the new approaches are for the most part Zope-agnostic, and could easily be modified to handle Salesforce integration from other Python-based platforms. Potential directions for future development (if someone wants to pay me to work on them) are:

  • Package up the decorators for creating celery tasks that run within Zope or do callouts to external webservices into a reusable library.
  • Create a higher-level, reusable Plone add-on with a UI for configuring synchronization between Plone content types and Salesforce objects.
Gil Forcada says:
Jul 11, 2013 01:55 PM
I'm curious about it, but did you evaluate plone.app.async and/or collective.cron to do some of the tasks?
David Glick says:
Jul 11, 2013 03:15 PM
plone.app.async would probably work too, but it feels like a large amount of code that's not being actively maintained and improved. So I preferred to use celery which is used by a larger community.