Distributed parallelization of tasks
- Install beanstalkd
- Download a glow binary and add it to $PATH
- Install Go
- Install the lentil beanstalkd client library
$ go get github.com/nutrun/glow
or
$ cd <path-to-glow-source> && go install
Start beanstalkd:
$ beanstalkd
Start a glow listener:
$ glow -listen
Submit a job:
$ glow -tube=test -out=/dev/stdout ls
The job's output should appear on the terminal running the glow listener. Invoke glow -h to list all available options.
A listener connects to the beanstalk queue specified by the environment variable GLOW_QUEUE (it defaults to 0.0.0.0:11300 if GLOW_QUEUE isn't specified), waits for jobs and executes them as they become available. In order to achieve parallelism, a glow system will have many hosts and a number of listeners on each host. The number of listeners per host should depend on the type of job and number of available cores.
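Starting several listeners on one host could be scripted. The following is only a minimal sketch: it assumes glow is on $PATH and that one listener per core is a reasonable starting point, which may not hold for every job type. The helper names queueAddr and listenerCount are hypothetical and not part of glow.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
)

// queueAddr returns the beanstalkd address a listener would connect to:
// GLOW_QUEUE if it is set, otherwise the documented default.
func queueAddr(getenv func(string) string) string {
	if addr := getenv("GLOW_QUEUE"); addr != "" {
		return addr
	}
	return "0.0.0.0:11300"
}

// listenerCount picks how many listeners to start for a number of cores.
// perCore is an assumption of this sketch and should be tuned per job type.
func listenerCount(cores int, perCore float64) int {
	n := int(float64(cores) * perCore)
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	if _, err := exec.LookPath("glow"); err != nil {
		fmt.Fprintln(os.Stderr, "glow not found on $PATH; nothing started")
		return
	}
	n := listenerCount(runtime.NumCPU(), 1.0)
	fmt.Fprintf(os.Stderr, "starting %d listeners on %s\n", n, queueAddr(os.Getenv))

	var procs []*exec.Cmd
	for i := 0; i < n; i++ {
		// Child processes inherit the environment, including GLOW_QUEUE.
		cmd := exec.Command("glow", "-listen")
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stderr
		if err := cmd.Start(); err != nil {
			fmt.Fprintln(os.Stderr, "could not start listener:", err)
			os.Exit(1)
		}
		procs = append(procs, cmd)
	}
	// Block until every listener has exited.
	for _, cmd := range procs {
		cmd.Wait()
	}
}
```

For I/O-bound jobs a perCore value above 1.0 may make sense, and for memory-heavy jobs a value below 1.0.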
Listen options:
$ glow -h 2>&1 | grep listen
Start a listener:
$ GLOW_QUEUE=10.0.0.4:11300 glow -listen
Enable verbose logging (log more than just errors):
$ glow -listen -v
A beanstalk tube is a priority-based FIFO queue of jobs. In glow, a tube can depend on one or more other tubes. Tube dependencies are specified in a JSON file:
$ cat > glow-deps.json
{
  "foo": ["bar"],
  "baz": ["foo", "bar"]
}
$ glow -listen -deps=glow-deps.json
- Tube foo depends on tube bar: no jobs from foo will run while there are ready/delayed/reserved jobs in bar
- Tube bar does not have any dependencies. Jobs from bar will run whenever there are free listeners available
- Tube baz depends on tubes bar and foo. It will block until bar and foo are done
- Dependencies are not transitive. If foo depends on bar and baz depends on foo, baz doesn't depend on bar
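The dependency rules above can be modelled in a few lines. This is an illustrative sketch of the rule as described here, not glow's actual implementation. The blocked function and the pending map (tube name to count of ready, delayed and reserved jobs) are hypothetical.

```go
package main

import "fmt"

// blocked reports whether jobs from tube have to wait. A tube is blocked
// while any tube it directly depends on still has ready, delayed or
// reserved jobs. Only direct dependencies are checked, so the relation is
// not transitive.
// deps mirrors the JSON file passed to -deps; pending is made-up input data.
func blocked(tube string, deps map[string][]string, pending map[string]int) bool {
	for _, dep := range deps[tube] {
		if pending[dep] > 0 {
			return true
		}
	}
	return false
}

func main() {
	deps := map[string][]string{"foo": {"bar"}, "baz": {"foo", "bar"}}
	pending := map[string]int{"bar": 2, "foo": 1, "baz": 3}
	for _, tube := range []string{"bar", "foo", "baz"} {
		fmt.Printf("%s blocked: %v\n", tube, blocked(tube, deps, pending))
	}
}
```

With a dependency map of foo on bar and baz on foo only, baz would not be blocked by pending jobs in bar, which matches the non-transitive behaviour described in the last bullet.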
A listener will not reserve jobs from any of the tubes specified by the -exclude flag:
$ glow -listen -exclude=foo,bar
The SMTP server and email FROM field can be configured for glow's job failure email notifications:
$ glow -listen -smtp-server=smtp.example.com [email protected]
Emails will only be sent when a list of recipients has been specified at job submission.
SIGTERM kills a listener and its running job immediately:
$ killall glow
Shut down gracefully (wait for job to finish) with SIGINT:
$ killall -SIGINT glow
Submit options:
$ glow -h 2>&1 | grep submit
Send a job to a tube on the beanstalkd queue to be executed by a listener (-tube is required):
$ glow -tube=mytube mycmd arg1 arg2 # [...argn]
Delay is an integer number of seconds to wait before making the job available to run:
$ glow -tube=mytube -delay=60 mycmd arg1 arg2
Failure notification emails are sent to the list of recipients given by -mailto:
$ glow -tube=mytube -mailto=[email protected],[email protected] mycmd arg1 arg2
Job stdout and stderr can be redirected to a file:
$ glow -tube=mytube -stdout=/tmp/mycmd.out -stderr=/tmp/mycmd.err mycmd arg1 arg2
By default, a job's stdout and stderr are sent to /dev/null.
Priority is an integer < 2**32. Jobs with smaller priority values will be scheduled before jobs with larger priorities:
$ glow -tube=mytube -pri=177 mycmd arg1 arg2
The -workdir flag sets the directory the job runs from. It defaults to /tmp. The listener will chdir to workdir before executing the job's command:
$ glow -tube=mytube -workdir=/home/bob/scripts mycmd arg1 arg2
For improved performance when queueing up a lot of jobs at once, a JSON list of jobs can be piped to glow's stdin:
$ echo '[{"cmd":"ls","arguments":["-l", "-a"],"pri":0,"tube":"foo","delay":0,"mailto":"[email protected]","out":"/tmp/glow.out","workdir":"/tmp/glow"},{"cmd":"ps","pri":1,"tube":"bar","delay":0,"mailto":"[email protected]","out":"/tmp/glow.out","workdir":"/tmp/glow"}]' | glow
Every time a job exits with a non-zero exit status, glow sends a message to a tube on GLOW_QUEUE called GLOW_ERRORS. beanstalkd clients can listen on GLOW_ERRORS to implement custom error handling.
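A custom error handler could look roughly like the sketch below. It speaks the plain-text beanstalkd protocol (watch, ignore, reserve, delete) over TCP using only the Go standard library. The format of the messages glow puts on GLOW_ERRORS is not described here, so the body is treated as opaque bytes. The function names consume, command and parseReserved are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"net"
	"os"
	"strings"
)

// parseReserved parses a beanstalkd "RESERVED <id> <bytes>" reply line.
func parseReserved(line string) (id uint64, size int, err error) {
	_, err = fmt.Sscanf(strings.TrimSpace(line), "RESERVED %d %d", &id, &size)
	return id, size, err
}

// command sends one protocol line and returns the server's reply line.
func command(conn net.Conn, r *bufio.Reader, cmd string) (string, error) {
	if _, err := fmt.Fprintf(conn, "%s\r\n", cmd); err != nil {
		return "", err
	}
	return r.ReadString('\n')
}

// consume reserves messages from GLOW_ERRORS one at a time, passes each
// body to handle, then deletes the message from the queue.
func consume(addr string, handle func(id uint64, body []byte)) error {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return err
	}
	defer conn.Close()
	r := bufio.NewReader(conn)

	// Watch only GLOW_ERRORS; new connections also watch "default".
	for _, cmd := range []string{"watch GLOW_ERRORS", "ignore default"} {
		if _, err := command(conn, r, cmd); err != nil {
			return err
		}
	}
	for {
		line, err := command(conn, r, "reserve") // blocks until a message arrives
		if err != nil {
			return err
		}
		id, size, err := parseReserved(line)
		if err != nil {
			return fmt.Errorf("unexpected reply %q: %v", line, err)
		}
		body := make([]byte, size+2) // payload plus trailing \r\n
		if _, err := io.ReadFull(r, body); err != nil {
			return err
		}
		handle(id, body[:size])
		if _, err := command(conn, r, fmt.Sprintf("delete %d", id)); err != nil {
			return err
		}
	}
}

func main() {
	addr := os.Getenv("GLOW_QUEUE")
	if addr == "" {
		addr = "0.0.0.0:11300" // documented default
	}
	err := consume(addr, func(id uint64, body []byte) {
		// Replace this with custom handling (alerting, retrying, ...).
		fmt.Printf("failure message %d: %s\n", id, body)
	})
	fmt.Fprintln(os.Stderr, err)
	os.Exit(1)
}
```

An existing beanstalkd client library would serve equally well. The raw protocol is used here only to keep the sketch free of third-party dependencies.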
If a listener was started with the -smtp-server flag set, failure emails will be sent to the list of recipients specified by the -mailto submit flag.
$ glow -listen -smtp-server=smtp.example.com:25
$ glow -tube=mytube -mailto=[email protected] mycmd arg1 arg2
$ glow -h 2>&1 | grep -v 'submit\|listen'
Delete all jobs from a list of tubes, subsequently killing the tubes:
$ glow -drain=tube1,tube2
The output of drain is JSON that can be used to requeue the jobs by piping to glow.
A list of tubes can be paused for a number of seconds specified by the integer -pause-delay flag, during which jobs on those tubes will not be available to be reserved by listeners:
$ glow -pause=tube1,tube2 -pause-delay=600
Show per tube beanstalkd queue statistics:
$ glow -stats