darkflib.github.io

Site Reliability Team Lead at News UK

26 September 2011

Gearman Coalescing With The Unique Id

by Mike

Many people, both on the mailing lists and across the net, seem slightly confused about the coalescing feature of Gearman. I will try to explain what it is and how it works here.

Suppose you are using Gearman to regenerate data after memcache cache misses for a web page. If many people hit that page at once, you will get many jobs all requesting the same thing.

Without coalescing, all these jobs would run one after the other, each one setting the same keys in memcache. This is obviously not ideal.

However, with coalescing, as long as all these jobs are submitted to the same function with the same unique id, they will all be merged into a single job and the result given back to all waiting clients.
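For the memcache case, a natural choice of unique id is something derived from the cache key, so that every request for the same missing key maps onto the same job. The following is only a minimal sketch, assuming PHP 7 or later; the helper names and the `rebuild_cache` function name are made up for illustration and are not part of Gearman.

```php
<?php

/**
 * Hypothetical helper: map a cache key to a Gearman unique id.
 * Hashing keeps the id a fixed length whatever the key looks like.
 */
function uniqueIdForCacheKey(string $cacheKey): string
{
    return 'cache-' . sha1($cacheKey);
}

/**
 * Hypothetical helper: queue a background job to rebuild one cache key.
 * 'rebuild_cache' is an assumed worker function name, not a Gearman built-in.
 */
function queueRebuild(GearmanClient $client, string $cacheKey): string
{
    // The third argument is the unique id; identical ids are what allow coalescing.
    return $client->doBackground('rebuild_cache', $cacheKey, uniqueIdForCacheKey($cacheKey));
}
```

Every web request that misses on the same key would then call `queueRebuild()` with the same key and so submit the same unique id.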

An example might help.

<?php

/* create our object */
$gmclient= new GearmanClient();

/* add the default server */
$gmclient->addServer();

/* start some background jobs and save the handles */
$handles = array();
$handles[0] = $gmclient->doBackground("reverse", "Hello World!");
$handles[1] = $gmclient->doLowBackground("reverse", "Aardvarks!");
$handles[2] = $gmclient->doHighBackground("reverse", "Foo");
$handles[3] = $gmclient->doLowBackground("reverse", "Foo");
$handles[4] = $gmclient->doBackground("reverse", "Foo");
$handles[5] = $gmclient->doHighBackground("reverse", "Foo");
$handles[6] = $gmclient->doHighBackground("reverse", "Foo");

$gmclient->setStatusCallback("reverse_status");

/* Poll the server to see when those background jobs finish; */
/* a better method would be to use event callbacks */
do
{
   /* Use the context variable to track how many tasks have completed */
   $done = 0;
   for ($i = 0; $i < count($handles); $i++) {
      /* $done is the context handed to the status callback */
      $gmclient->addTaskStatus($handles[$i], $done);
   }
   $gmclient->runTasks();
   echo "Done: $done\n";
   sleep(1);
}
while ($done != count($handles));

/* Take the context by reference so the increment is visible to the polling loop */
function reverse_status($task, &$done)
{
   if (!$task->isKnown())
      $done++;
}

?>

As you’d expect, with the code above, every job is run separately and the client sees them come back one by one (including the five ‘Foo’ tasks).

Now let us contrast that with the following, which is almost identical except that the ‘Foo’ tasks all have the same unique identifier.

 root@debian:~# cat gmclient3.php
<?php

/* create our object */
$gmclient= new GearmanClient();

/* add the default server */
$gmclient->addServer();

/* start some background jobs and save the handles */
$handles = array();
$handles[0] = $gmclient->doBackground("reverse", "Hello World!");
$handles[1] = $gmclient->doLowBackground("reverse", "Aardvarks!");
$handles[2] = $gmclient->doHighBackground("reverse", "Foo",'foo');
$handles[3] = $gmclient->doLowBackground("reverse", "Foo",'foo');
$handles[4] = $gmclient->doBackground("reverse", "Foo",'foo');
$handles[5] = $gmclient->doHighBackground("reverse", "Foo",'foo');
$handles[6] = $gmclient->doHighBackground("reverse", "Foo",'foo');

$gmclient->setStatusCallback("reverse_status");

/* Poll the server to see when those background jobs finish; */
/* a better method would be to use event callbacks */
do
{
   /* Use the context variable to track how many tasks have completed */
   $done = 0;
   for ($i = 0; $i < count($handles); $i++) {
      /* $done is the context handed to the status callback */
      $gmclient->addTaskStatus($handles[$i], $done);
   }
   $gmclient->runTasks();
   echo "Done: $done\n";
   sleep(1);
}
while ($done != count($handles));

/* Take the context by reference so the increment is visible to the polling loop */
function reverse_status($task, &$done)
{
   if (!$task->isKnown())
      $done++;
}

?>

Now when this is executed, all the ‘Foo’ tasks come back at the same time: they have been coalesced into a single task, which runs once, and the result is copied to each client.
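To make the merging a little more concrete, here is a toy in-memory model of the bookkeeping. It is only an illustration under my own assumptions (PHP 7.1 or later, a made-up `ToyJobQueue` class and handle format) and says nothing about how gearmand is actually implemented; it just shows seven submissions collapsing to three jobs when five of them share a unique id.

```php
<?php

/**
 * Toy in-memory model of unique-id coalescing. Illustration only:
 * this is not how gearmand is implemented.
 */
final class ToyJobQueue
{
    /** @var array<string, string> pending jobs: "function\0unique" => handle */
    private $pending = [];

    /** @var int counter used to mint handles and automatic unique ids */
    private $counter = 0;

    /**
     * Submit a job and return its handle. A submission whose function and
     * unique id match a pending job gets that job's handle back instead of
     * creating a new job. In this toy the workload is not compared at all.
     */
    public function submit(string $function, string $workload, ?string $unique = null): string
    {
        $this->counter++;

        // No unique id given: mint one so the job never merges with another.
        if ($unique === null) {
            $unique = 'auto-' . $this->counter;
        }

        $key = $function . "\0" . $unique;
        if (!isset($this->pending[$key])) {
            $this->pending[$key] = 'H:toy:' . $this->counter;
        }

        return $this->pending[$key];
    }

    /** Number of distinct jobs a worker would actually have to run. */
    public function pendingCount(): int
    {
        return count($this->pending);
    }
}

// Same submissions as the second listing: two one-offs and five 'foo' jobs.
$queue = new ToyJobQueue();
$queue->submit('reverse', 'Hello World!');
$queue->submit('reverse', 'Aardvarks!');
$first = $queue->submit('reverse', 'Foo', 'foo');
for ($i = 0; $i < 4; $i++) {
    $again = $queue->submit('reverse', 'Foo', 'foo');
}

echo $queue->pendingCount(), "\n";   // prints 3
var_dump($first === $again);         // prints bool(true)
```

Dropping the `'foo'` argument from the submissions gives each one an automatic id, and the toy queue ends up holding seven separate jobs, which matches the first listing.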

The drawback is that priorities don’t always work the way you expect them to. The priority (Low, Normal or High) of the first task submitted is the one given to every task coalesced with it. So if a request is urgent and Low priority tasks with the same unique id are already queued, it will wait at Low priority. To get it out as fast as possible, you might want to avoid coalescing, at the risk of duplicating work (unless you also have workers check memcache before regenerating anything).
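If you do skip coalescing for urgent requests, having the worker check memcache first keeps the duplicated jobs cheap. Below is a minimal sketch of that check, assuming PHP 7.1 or later, the Memcached extension, string values, and a worker function called `rebuild_cache`; the helper names, the function name and the 300 second expiry are my own choices for illustration.

```php
<?php

/**
 * Return the cached value for $key, generating and storing it only on a miss.
 * $cacheGet must return false on a miss (as Memcached::get() does), so a
 * stored value of false cannot be told apart from a miss.
 */
function fetchOrGenerate(string $key, callable $cacheGet, callable $cacheSet, callable $generate): string
{
    $cached = $cacheGet($key);
    if ($cached !== false) {
        return $cached;            // an earlier job already did the work
    }

    $value = $generate($key);      // the expensive part
    $cacheSet($key, $value);
    return $value;
}

/**
 * Hypothetical wiring into a Gearman worker. $generate is whatever
 * actually builds the data for a given cache key.
 */
function registerRebuildFunction(GearmanWorker $worker, Memcached $cache, callable $generate): void
{
    $worker->addFunction('rebuild_cache', function (GearmanJob $job) use ($cache, $generate) {
        return fetchOrGenerate(
            $job->workload(),      // the workload is assumed to be the cache key
            function ($key) use ($cache) { return $cache->get($key); },
            function ($key, $value) use ($cache) { $cache->set($key, $value, 300); },
            $generate
        );
    });
}
```

Keeping the cache access behind callables means the check itself can be exercised with a plain array in place of memcache. Two jobs that start at the same moment can still both miss and both regenerate, so this reduces duplicated work rather than eliminating it.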

I hope this is enlightening to those who have been wondering about the feature. If you have any comments, feel free to email me at mike at technomonk dot com.
