~ 3 min read
Gearman – offloading Drupal tasks to a job server
At work, we’ve recently needed to offload some processing tasks to a background job server. I’ve worked with Gearman prior to my current position at HP as well as prior to my experience with Drupal and I’m happy to see there’s an extension for that available.
About Gearman
Gearman is an all-purpose job server that manages work between distributed workers for client which throw jobs at it. It has libraries implemented in many languages, whether it’s C, PHP, Python and others so you can quickly get started by installing it and the library for your language of choice. Due to this nature, it also allows you to offload processing of complex computational jobs for example, to workers written in C or C++ for efficiency or other reasons which is great as it doesn’t lock you in.
A Gearman module is available for Drupal 6 only (at this time) and is basically a developer’s module as it just provides an API for other modules to hook up and implement their queued job functions. Moreover, it provides a drush interface for both the worker and the client, meaning you can simply run drush gearman-worker
from the command line as well as triggering jobs from command line via the drush gearman-client
.
To integrate with gearman, it’s required to:
- Install the gearman server (work this out with your favorite Linux distribution)
- Install the gearman php library which is available through pecl to install
- Install the gearman drupal module (http://www.drupal.org/project/gearman)
Defining Gearman job functions
A module needs to implement hook_gearman_drush_function() as follows:
/**
* Implementation of hook_gearman_drush_function()
*
* @return an associative array with the data for GearmanWorker::addFunction
* so that a drush-based worker can know about them automatically.
*
* This array has the following parameters
*
* - 'function_name': The name of a function to register with the job server
* - 'function': A callback that gets called when a job for the registered
* function name is submitted
* - 'context': A reference to arbitrary application context data that can be
* modified by the worker function
* - 'timeout': An interval of time in seconds
*
* This implementation has a general function for wrapping other drush commands,
* as well as the classic "reverse text" example.
*/
function mymodule_gearman_drush_function() {
return array(
array(
'function_name' => 'notifications_send_users',
'function' => 'mymodule_notifications_send_users',
),
);
}
Where-as the hook implementation simply returns an array with the function names as they are required to be invoked with gearman and the actual function name for the callback (the docblock is pretty self-explaining on that).
Invoking a gearman job from Drupal can be easily done via calling
gearman_drush_client($function_name, $data)
with the function name as string (in context with the example above this would be ‘notifications_send_users’) and the data to send off to do this job as a string.
Hint: go ahead and serialize/jsonify the array/object to make your life easy
With gearman_drush_client() make sure that the job runs in the background via $gmclient->doBackground();
as otherwise you didn’t gain much by running it in blocking mode.
Next up, in the actual gearman job function definition, you can harvest the data payload such as:
/**
* Implements gearman notifications_send_users callback
*
* @param object $job gearman job object
*/
function mymodule_notifications_send_users($job) {
if (isset($job->workload()) && is_array($job->workload())) {
$workload = unserialize($job->workload());
$workload_size = $job->workloadSize();
}
// ... rest of the code
}
If you’ve serialized your objects when sending the job to the queue you’ll want to unserialize the value of
$job->workload();
first.