Setting up AEM Author Workflow Offloading

July 10, 2018, by Tad Reeves

An issue with AEM that has persisted since the earliest days of the platform is that the Authoring environment has never been good at scaling horizontally. By "Authoring environment" I also mean all of the other things the AEM Author instance usually does, like image workflows, PDF rendering, video transcoding, and the like.

One model for attempting to horizontally scale the AEM Author is Workflow Offloading, where you offload the heavy-duty tasks from the AEM Author onto a separate AEM instance that exists only to process workflows and return the payload to the primary Author instance. The purported benefit is that major CPU-intensive and I/O-intensive operations get executed by a secondary server which is NOT the one your lag-sensitive authoring users are clicking around on.

However, please be warned: setting up offloading is fraught with pitfalls, and you’ll want to be very-super-extra-sure that you really want to go the offloading route before you try, because usually you’ll just be better served by beefing up your Author box and optimizing your workflows.

Diagrams of AEM Offloading Setup Architectures

AEM Offloading Author with Shared NFS/NAS Datastore

The diagram above shows the way Adobe recommends setting up workflow offloading, per its offloading best practices document. Specifically:

  • You’ve already split your segmentstore and datastore at AEM installation time.
  • You are using FileDataStore on an externalized NAS/NFS mount, and are sharing this datastore with the AEM Offload instance.
  • You are using binary-less replication to move the workflows and their payloads back and forth between the AEM Author master (leader) instance that your users log in to and the Offload Author(s) that handle the offloaded workflows.

It is also theoretically possible to do a similar setup using S3DataStore, though such a scenario isn’t explicitly documented by Adobe. This setup would look like the following:

Diagram of AEM Offloading with S3 Datastore

The appeal is easy to see. When an AEM Assets environment spans several (or tens of) terabytes, it’s obviously advantageous to use S3’s low-cost storage to store AEM assets once, rather than multiplying this storage across a shared-nothing publish environment, all on higher-cost EBS storage.

However, even getting workflow offloading to work at all using S3 is an undocumented and yet extremely effective source of pain and agony. One of the reasons is that when an item is uploaded to S3, it’s first written to a local S3 cache while an async call is sent off to persist the binary to S3. There’s no flag in the workflow to wait until the item has been persisted to S3 before kicking off the offload. The offload then attempts binary-less replication of the workflow to the Offload Author, which tries to access a binary that may not be available yet. Race condition excellence.
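
To make the race concrete, here is a minimal sketch of the kind of "wait until it's really there" guard the workflow lacks. This is not an AEM or Oak API: wait_until_persisted, binary_exists and FakeAsyncStore are hypothetical names invented for illustration. In a real setup the existence check would have to be something like a lookup against the S3 bucket, and the guard would need to run before the offload is triggered.

```python
import time
from typing import Callable


def wait_until_persisted(
    binary_exists: Callable[[str], bool],
    binary_id: str,
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the binary is visible in the shared store, or give up.

    binary_exists is a hypothetical callable supplied by the caller; it
    should return True once the async upload has actually completed.
    Returns True if the binary showed up before the timeout, else False.
    """
    deadline = clock() + timeout_s
    while True:
        if binary_exists(binary_id):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval_s)


class FakeAsyncStore:
    """Stand-in for an async-uploading datastore (illustration only).

    The binary becomes visible only after a given number of checks,
    mimicking an upload that is still in flight from the local cache.
    """

    def __init__(self, checks_until_visible: int) -> None:
        self.remaining = checks_until_visible

    def exists(self, binary_id: str) -> bool:
        if self.remaining > 0:
            self.remaining -= 1
            return False
        return True


if __name__ == "__main__":
    store = FakeAsyncStore(checks_until_visible=3)
    # Only hand the job to the offload instance once the binary is visible.
    ready = wait_until_persisted(store.exists, "asset-123", poll_interval_s=0.1)
    print("safe to offload:", ready)
```

Polling with a timeout is the bluntest possible fix. It trades a bit of latency on every offloaded job for not handing the Offload Author a reference to a binary that hasn't landed yet.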

Note: In case it’s not clear from the above, I’m presently in the midst of getting workflow offloading working on S3-backed EC2 Author instances, so when I’ve got that working, I’ll update this post.

Adobe Documentation on Workflow Offloading