Normally, xargs runs one command at a time. This is called "serial"
execution; the commands happen in a series, one after another. If you’d
like xargs to do things in "parallel", you can ask it to do so, either
when you invoke it, or later while it is running. Running several
commands at one time can make the entire operation go more quickly, if
the commands are independent, and if your system has enough resources to
handle the load. When parallelism works in your application, xargs
provides an easy way to get your work done faster.
--max-procs=max-procs
-P max-procs
Run up to max-procs processes at a time; the default is 1. If
max-procs is 0, xargs will run as many processes as possible at a
time. Use the ‘-n’, ‘-s’, or ‘-L’ option with ‘-P’; otherwise chances
are that the command will be run only once. If a child process exits
with status 255, xargs will still wait for all child processes to exit
(before version 4.9.0 this might not happen).
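As a rough illustration of why ‘-n’ matters with ‘-P’, the following sketch counts how many echo commands xargs actually runs. The input letters and the variable names once and split are made up for this example, and echo merely stands in for a real command.

```shell
# Without -n, all four arguments fit on one command line, so xargs runs
# echo once and -P 2 has nothing to run in parallel.
once=$(printf '%s\n' a b c d | xargs -P 2 echo | wc -l)

# With -n 1, xargs runs one echo per argument, up to two at a time.
split=$(printf '%s\n' a b c d | xargs -n 1 -P 2 echo | wc -l)

echo "without -n: $((once)) line(s); with -n 1: $((split)) line(s)"
```

Each echo prints one line, so the first count is 1 and the second is 4.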
If xargs is run without the ‘-P’ option, it will not change the
handling of the SIGUSR1 and SIGUSR2 signals. This means they will
terminate the xargs program unless those signals were set to be ignored
in the parent process of xargs. If you do not want parallel execution
but you also do not want these signals to be fatal, you can specify
-P 1.
Suppose you have a directory tree of large image files and a
makeallsizes script that takes a single file name and creates
various sized images from it (thumbnail-sized, web-page-sized,
printer-sized, and the original large file). The script is doing
enough work that it takes significant time to run, even on a single
image. You could run:
find originals -name '*.jpg' | xargs -L 1 makeallsizes
This will run makeallsizes filename once for each .jpg file in the
originals directory. However, if your system has two central
processors, this script will only keep one of them busy. Instead, you
could probably finish in about half the time by running:
find originals -name '*.jpg' | xargs -L 1 -P 2 makeallsizes
xargs will run the first two commands in parallel, and then whenever
one of them terminates, it will start another one, until the entire
job is done.
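The effect can be observed without any image files. In this sketch, sleep is a stand-in for makeallsizes, each input line is a one-second "job", and ‘-n 1’ is used in place of ‘-L 1’ (equivalent here, since each line holds a single argument). The variable names are illustrative only, and the timings have one-second resolution.

```shell
# Four one-second jobs, run serially.
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n 1 sleep
serial=$(( $(date +%s) - start ))

# The same four jobs, at most two at a time.
start=$(date +%s)
printf '%s\n' 1 1 1 1 | xargs -n 1 -P 2 sleep
parallel=$(( $(date +%s) - start ))

echo "serial: ${serial}s, parallel: ${parallel}s"
```

On an otherwise idle system the serial run should take about four seconds and the parallel run about two.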
The same idea can be generalized to as many processors as you have handy.
It also generalizes to other resources besides processors. For example,
if xargs is running commands that are waiting for a response from a
distant network connection, running a few in parallel may reduce the
overall latency by overlapping their waiting time.
If you are running commands in parallel, you need to think about how they should arbitrate access to any resources that they share. For example, if more than one of them tries to print to stdout, the output will be produced in an indeterminate order (and very likely mixed up) unless the processes collaborate in some way to prevent this. Using some kind of locking scheme is one way to prevent such problems. In general, using a locking scheme will help ensure correct output but reduce performance. If you don’t want to tolerate the performance difference, simply arrange for each process to produce a separate output file (or otherwise use separate resources).
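One way to give each process its own output file is sketched below. The item names, the temporary directory, and the inline sh -c script are all hypothetical; a real job would replace the echo with actual work.

```shell
# Scratch directory that will hold one output file per job.
outdir=$(mktemp -d)

# "$outdir" becomes $0 of the inline script and each input item becomes $1,
# so no two parallel jobs ever write to the same file.
printf '%s\n' alpha beta gamma delta |
  xargs -n 1 -P 4 sh -c 'echo "processed $1" > "$0/$1.out"' "$outdir"

# Combine the results afterwards, in a fixed (sorted) order.
cat "$outdir"/*.out
```

Because the results are combined only after every job has finished, the final output does not depend on the order in which the jobs happened to run.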
xargs also allows “turning up” or “turning down” its parallelism
in the middle of a run. Suppose you are keeping your four-processor
system busy for hours, processing thousands of images using -P 4.
Now, in the middle of the run, you or someone else wants you to reduce
your load on the system, so that something else will run faster.
If you interrupt xargs, your job will be half-done, and it
may take significant manual work to resume it only for the remaining
images. If you suspend xargs using your shell’s job controls
(e.g. control-Z), then it will get no work done while suspended.
Instead, find out the process ID of the xargs process, either from
your shell or with the ps command. After you send it the signal
SIGUSR2, xargs will run one fewer command in parallel. If you send it
the signal SIGUSR1, it will run one more command in parallel. For
example:
shell$ xargs <allimages -L 1 -P 4 makeallsizes &
[4] 27643
   ... at some later point ...
shell$ kill -USR2 27643
shell$ kill -USR2 %4
The first kill command will cause xargs to wait for two commands to
terminate before starting the next command (reducing the parallelism
from 4 to 3). The second kill will reduce it from 3 to 2. (%4 works
in some shells as a shorthand for the process ID of the background job
labeled [4].)
Similarly, if you started a long xargs job with -P 1 (so that it
handles these signals but runs only one command at a time), you can
easily switch it to start running two commands in parallel by sending
it a SIGUSR1.
xargs will never terminate any existing commands when you ask it
to run fewer processes. It merely waits for the excess commands to
finish. If you ask it to run more commands, it will start the next
one immediately (if it has more work to do). If the degree of
parallelism is already 1, sending SIGUSR2 will have no further
effect (it cannot be reduced to 0, since --max-procs=0 means that
there should be no limit on the number of processes to run).
There is an implementation-defined limit on the number of processes.
This limit is shown with xargs --show-limits. The limit is at
least 127 on all systems (and on the author’s system it is
2147483647).
If you send several identical signals quickly, the operating system
does not guarantee that each of them will be delivered to xargs.
This means that you can’t rapidly increase or decrease the parallelism by
more than one command at a time. You can avoid this problem by sending
a signal, observing the result, then sending the next one; or merely by
delaying for a few seconds between signals (unless your system is very
heavily loaded).
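The advice about spacing out signals might look like the following sketch. It assumes GNU xargs (other implementations may not handle these signals at all), uses sleep as a stand-in for real work, and the file and variable names are hypothetical. The job is started with -P 1 so that the signals adjust the parallelism rather than terminate xargs.

```shell
# Six two-second jobs, listed one per line in a scratch file.
jobs_file=$(mktemp)
printf '%s\n' 2 2 2 2 2 2 > "$jobs_file"

# Start with one process at a time; -P 1 makes xargs handle SIGUSR1/SIGUSR2.
xargs -n 1 -P 1 sleep < "$jobs_file" &
xargs_pid=$!

sleep 1                   # let xargs start up before signalling it
kill -USR1 "$xargs_pid"   # raise the limit from 1 to 2
sleep 1                   # pause so the second signal is not lost
kill -USR1 "$xargs_pid"   # raise the limit from 2 to 3

wait "$xargs_pid"
status=$?
rm -f "$jobs_file"
echo "xargs exit status: $status"
```

The one-second pauses are shorter than the "few seconds" suggested above and suit only a lightly loaded system; a longer delay is the safer choice for real jobs.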
Whether or not parallel execution will work well for you depends on the nature of the command you are running in parallel, on the configuration of the system on which you are running the command, and on the other work being done on the system at the time.